Hybrid Real-time Framework for Detecting Adaptive Prompt Injection Attacks in Large Language Models
DOI: https://doi.org/10.62411/jcta.15254

Keywords: Adversarial Attacks, Artificial Intelligence, Heuristic Pre-filtering, Intrusion Detection, LLMs, Malicious Prompt Detection, Prompt Injection, Semantic Analysis

Abstract
Prompt injection has emerged as a critical security threat for Large Language Models (LLMs), exploiting their inability to reliably separate instructions from data within application contexts. This paper provides a structured review of current attack vectors, including direct and indirect prompt injection, and highlights the limitations of existing defenses, with particular attention to the fragility of Known-Answer Detection (KAD) against adaptive attacks such as DataFlip. To address these gaps, we propose a novel, hybrid, multi-layered detection framework that operates in real time. The architecture integrates heuristic pre-filtering for rapid elimination of obvious threats, semantic analysis using fine-tuned transformer embeddings to detect obfuscated prompts, and behavioral pattern recognition to capture subtle manipulations that evade the earlier layers. Our hybrid model achieved an accuracy of 0.974, a precision of 1.000, a recall of 0.950, and an F1 score of 0.974, indicating strong and balanced detection performance. Unlike prior siloed defenses, the framework provides coverage across the input, semantic, and behavioral dimensions. This layered approach offers a resilient and practical defense, advancing the state of security for LLM-integrated applications.
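The layered cascade described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the regex patterns, the `semantic_score` stub (which stands in for the fine-tuned transformer-embedding layer), and the threshold value are all hypothetical placeholders chosen for the example.

```python
import re

# Layer 1: heuristic pre-filter -- cheap regex rules that catch obvious
# injection phrasing before any expensive model is invoked.
# (Illustrative patterns only; the paper's actual rule set is not given here.)
HEURISTIC_PATTERNS = [
    re.compile(r"ignore (all|any|previous|prior) instructions", re.I),
    re.compile(r"you are now", re.I),
    re.compile(r"reveal (the|your) system prompt", re.I),
]

def heuristic_flag(prompt: str) -> bool:
    """Return True if any fast heuristic rule fires."""
    return any(p.search(prompt) for p in HEURISTIC_PATTERNS)

def semantic_score(prompt: str) -> float:
    """Placeholder for the semantic layer. A real implementation would
    embed the prompt with a fine-tuned transformer and score its
    similarity to known injection examples; here a trivial lexical
    overlap stub stands in so the cascade is runnable."""
    suspicious = {"override", "jailbreak", "disregard"}
    tokens = set(prompt.lower().split())
    return len(tokens & suspicious) / max(len(tokens), 1)

def detect(prompt: str, semantic_threshold: float = 0.1) -> str:
    """Cascade: run cheap heuristics first; escalate to semantic
    scoring only if the prompt passes the pre-filter."""
    if heuristic_flag(prompt):
        return "blocked:heuristic"
    if semantic_score(prompt) >= semantic_threshold:
        return "blocked:semantic"
    return "allowed"
```

The cascade ordering mirrors the framework's design goal: most malicious inputs are rejected at the cheapest layer, so the heavier semantic and behavioral layers only see the residue, which keeps per-request latency compatible with real-time operation.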
License
Copyright (c) 2026 Chandra Prakash, Mary Lind, Elyson De La Cruz

This work is licensed under a Creative Commons Attribution 4.0 International License.