Hybrid Real-time Framework for Detecting Adaptive Prompt Injection Attacks in Large Language Models
DOI: https://doi.org/10.62411/jcta.15254

Keywords: Adversarial Attacks, Artificial Intelligence, Heuristic Pre-filtering, Intrusion Detection, LLMs, Malicious Prompt Detection, Prompt Injection, Semantic Analysis

Abstract
Prompt injection has emerged as a critical security threat for Large Language Models (LLMs), exploiting their inability to reliably separate instructions from data within application contexts. This paper provides a structured review of current attack vectors, including direct and indirect prompt injection, and highlights the limitations of existing defenses, with particular attention to the fragility of Known-Answer Detection (KAD) against adaptive attacks such as DataFlip. To address these gaps, we propose a novel, hybrid, multi-layered detection framework that operates in real time. The architecture integrates heuristic pre-filtering for rapid elimination of obvious threats, semantic analysis using fine-tuned transformer embeddings to detect obfuscated prompts, and behavioral pattern recognition to capture subtle manipulations that evade the earlier layers. Our hybrid model achieved an accuracy of 0.974, a precision of 1.000, a recall of 0.950, and an F1 score of 0.974, indicating strong and balanced detection performance. Unlike prior siloed defenses, the framework provides coverage across the input, semantic, and behavioral dimensions. This layered approach offers a resilient and practical defense, advancing the state of security for LLM-integrated applications.
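The layered cascade described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the regex patterns, the `semantic_score` stub (which stands in for the fine-tuned transformer-embedding layer), and the threshold value are all hypothetical placeholders chosen for the example.

```python
import re

# Layer 1: heuristic pre-filter -- cheap regex rules that catch obvious
# injection phrasing before any expensive model is invoked.
# (Illustrative patterns only; the paper's actual rule set is not given here.)
HEURISTIC_PATTERNS = [
    re.compile(r"ignore (all|any|previous|prior) instructions", re.I),
    re.compile(r"you are now", re.I),
    re.compile(r"reveal (the|your) system prompt", re.I),
]

def heuristic_flag(prompt: str) -> bool:
    """Return True if any fast heuristic rule fires."""
    return any(p.search(prompt) for p in HEURISTIC_PATTERNS)

def semantic_score(prompt: str) -> float:
    """Placeholder for the semantic layer. A real implementation would
    embed the prompt with a fine-tuned transformer and score its
    similarity to known injection examples; here a trivial lexical
    overlap stub stands in so the cascade is runnable."""
    suspicious = {"override", "jailbreak", "disregard"}
    tokens = set(prompt.lower().split())
    return len(tokens & suspicious) / max(len(tokens), 1)

def detect(prompt: str, semantic_threshold: float = 0.1) -> str:
    """Cascade: run cheap heuristics first; escalate to semantic
    scoring only if the prompt passes the pre-filter."""
    if heuristic_flag(prompt):
        return "blocked:heuristic"
    if semantic_score(prompt) >= semantic_threshold:
        return "blocked:semantic"
    return "allowed"
```

The cascade ordering mirrors the framework's design goal: most malicious inputs are rejected at the cheapest layer, so the heavier semantic and behavioral layers only see the residue, which keeps per-request latency compatible with real-time operation.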
License
Copyright (c) 2026 Chandra Prakash, Mary Lind, Elyson De La Cruz

This work is licensed under a Creative Commons Attribution 4.0 International License.