Large Language Models for Cybersecurity Intelligence, Threat Hunting, and Decision Support

Authors

  • Shaochen Ren
  • Shiyang Chen

DOI:

https://doi.org/10.54097/7ysr5k17

Keywords:

Large Language Models, Cybersecurity, Threat Intelligence, Artificial Intelligence Security

Abstract

Large language models (LLMs) have emerged as transformative technologies in cybersecurity, offering unprecedented capabilities in threat detection, vulnerability analysis, and intelligent decision-making. This review examines the application of LLMs across critical cybersecurity domains, including cyber threat intelligence (CTI), threat hunting, vulnerability detection, malware analysis, and decision support systems. The integration of LLMs such as Generative Pre-trained Transformer 4 (GPT-4), Bidirectional Encoder Representations from Transformers (BERT), Large Language Model Meta AI (LLaMA), and domain-specific models like SecureFalcon has demonstrated remarkable potential in automating complex security tasks, enhancing analyst productivity, and enabling proactive defense mechanisms. However, the deployment of LLMs in cybersecurity contexts introduces unique challenges, including prompt injection vulnerabilities, data poisoning risks, hallucination concerns, and ethical considerations regarding adversarial use. This paper synthesizes recent research advances, evaluates current LLM architectures and their security applications, examines real-world implementation challenges, and identifies critical gaps requiring further investigation. Through comprehensive analysis of over sixty recent studies, we highlight how LLMs are reshaping cybersecurity practices while emphasizing the necessity for robust security frameworks, continuous model validation, and responsible deployment strategies to mitigate emerging risks associated with these powerful artificial intelligence (AI) systems.

References

[1] Brown T, Mann B, Ryder N, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems. 2020;33:1877-1901.

[2] Zhang J, Bu H, Wen H, et al. When LLMs meet cybersecurity: a systematic literature review. Cybersecurity. 2025;8:14.

[3] Motlagh NH, Khajavi SH, Jaribion A. A comprehensive overview of large language models for cyber defences: opportunities and directions. arXiv:2405.14487. 2024.

[4] Silva GJ, Westphall CB. A survey of large language models in cybersecurity. arXiv:2402.16968. 2024.

[5] Ji M, Shi J, Wang H, et al. SEvenLLM: benchmarking, eliciting, and enhancing abilities of large language models in cyber threat intelligence. arXiv:2405.03416. 2024.

[6] Gao P, Shao F, Liu X, et al. Enabling efficient cyber threat hunting with cyber threat intelligence. IEEE International Conference on Data Engineering. 2021:193-204.

[7] Shestov A, Cheshkov A, Levichev R, et al. Finetuning large language models for vulnerability detection. arXiv:2401.17010. 2024.

[8] Zhou X, Cao S, Sun X, Lo D. Large language model for vulnerability detection and repair: literature review and the road ahead. arXiv:2404.02525. 2024.

[9] Jiang Y, Sun W, Chen L, et al. CyberTeam: benchmarking LLMs in an embodied environment for blue team threat hunting. arXiv:2505.11901. 2025.

[10] Moongela H, Mayayise T. The impact of large language models on cybersecurity. Communications in Computer and Information Science. 2026;2583:150-165.

[11] Hasanov I, Virta S, Hakkala A, Isoaho J. Application of large language models in cybersecurity: a systematic literature review. IEEE Access. 2024;12:93331-93352.

[12] Greshake K, Abdelnabi S, Mishra S, et al. Not what you have signed up for: compromising real-world LLM-integrated applications with indirect prompt injection. arXiv:2302.12173. 2023.

[13] Wallace E, Zhao TZ, Feng S, Singh S. Concealed data poisoning attacks on NLP models. arXiv:2010.12563. 2020.

[14] Yao Y, Duan J, Xu K, et al. A survey on large language model security and privacy: the good, the bad, and the ugly. High-Confidence Computing. 2024;4:100211.

[15] Ferrag MA, Battah A, Tihanyi N, et al. Revolutionizing cyber threat detection with large language models: a privacy-preserving BERT-based lightweight model for IoT/IIoT devices. IEEE Access. 2024;12:18424-18441.

[16] Ferrag MA, Tihanyi N, Cordeiro LC, et al. Generative AI in cybersecurity: a comprehensive review of LLM applications and vulnerabilities. Future Generation Computer Systems. 2025;173:107877.

[17] Song Y, Zhang Z, Jiang X, et al. Vulnerability detection using BERT based LLM model with transparency obligation practice towards trustworthy AI. Machine Learning with Applications. 2024;18:100598.

[18] Xu H, Liu Y, Xing Y, et al. Large language models for cyber security: a systematic literature review. ACM Transactions on Software Engineering and Methodology. 2025;34(2):1-39.

[19] Gao Y, Xiong Y, Gao X, et al. Retrieval-augmented generation for large language models: a survey. arXiv:2312.10997. 2023.

[20] Bryce C, Kalousis A, Leroux I, et al. Exploring the dual role of LLMs in cybersecurity: threats and defenses. Large Language Models in Cybersecurity. Springer. 2024:563-594.

[21] Kasri W, Himeur Y, Alkhazaleh HA, et al. From vulnerability to defense: the role of large language models in enhancing cybersecurity. Computation. 2025;13(2):30.

[22] Devlin J, Chang MW, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. NAACL-HLT. 2019:4171-4186.

[23] Touvron H, Martin L, Stone K, et al. Llama 2: open foundation and fine-tuned chat models. arXiv:2307.09288. 2023.

[24] Lewis P, Perez E, Piktus A, et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. NeurIPS. 2020:9459-9474.

[25] Fang R, Bindu R, Gupta A, et al. LLM agents can autonomously exploit one-day vulnerabilities. arXiv:2404.08144. 2024.

[26] Wei J, Wang X, Schuurmans D, et al. Chain-of-thought prompting elicits reasoning in large language models. NeurIPS. 2022:24824-24837.

[27] Steenhoek B, Rahman MM, Jiles R, et al. LLMs cannot reliably identify and reason about security vulnerabilities yet: a comprehensive evaluation, framework, and benchmarks. IEEE Symposium on Security and Privacy. 2024:1-18.

[28] Mussabayev R, Khairullin R, Kassymbekov D, et al. Code vulnerability detection: a comparative analysis of emerging large language models. arXiv:2409.10490. 2024.

[29] Sun C, Wang Y, Wu S, et al. Everything you wanted to know about LLM-based vulnerability detection but were afraid to ask. arXiv:2504.13474. 2025.

[30] Zhou X, Cao S, Sun X, Lo D. Large language model for vulnerability detection and repair: literature review and the road ahead. ACM Transactions on Software Engineering and Methodology. 2025;34(5):1-31.

[31] Liu Z, Tang Z, Zhang J, Xia X, Yang X. Pre-training by predicting program dependencies for vulnerability analysis tasks. IEEE/ACM International Conference on Software Engineering. 2024:1-13.

[32] Jelodar H, Bai S, Hamedi P, et al. Large language model (LLM) for software security: code analysis, malware analysis, reverse engineering. arXiv:2504.07137. 2025.

[33] Jelodar H, Alavizadeh H, Esmaeili A, et al. Large language model for software security: code analysis, malware analysis, reverse engineering. arXiv:2504.07137. 2025.

[34] Qian X, Liu Y, Zhang H, et al. Exploring LLMs for malware detection: review, framework design, and countermeasure approaches. arXiv:2409.07587. 2024.

[35] Qian X, Chen B, Zhang Y, et al. LAMD: context-driven Android malware detection and classification with LLMs. arXiv:2502.18456. 2025.

[36] Omar M, Zangana HM, Al-Karaki JN, Mohammed D. Harnessing LLMs for IoT malware detection: a comparative analysis of BERT and GPT-2. IEEE ISMSIT. 2024:1-6.

[37] Zahan N, Burckhardt P, Lysenko M, et al. Leveraging large language models to detect npm malicious packages. arXiv:2403.12196. 2024.

[38] Jiang Y, Zhang W, Pang J, et al. Transformers and large language models for efficient intrusion detection systems: a comprehensive survey. arXiv:2408.09344. 2024.

[39] Lai C, Wang Y, Chen X, et al. Large language models in wireless application design: in-context learning-enhanced automatic network intrusion detection. arXiv:2405.17234. 2024.

[40] Bai J, Wang Y, Zhang L, et al. PhishDebate: an LLM-based multi-agent framework for phishing website detection. arXiv:2506.15656. 2025.

[41] Lee J, Lim P, Hooi B, Divakaran DM. Multimodal large language models for phishing webpage detection and identification. eCrime Symposium. 2024:1-12.

[42] Lim B, Kumar P, Tan A, et al. EXPLICATE: enhancing phishing detection through explainable AI and LLM-powered interpretability. arXiv:2503.20796. 2025.

[43] Afane K, Meli A, Khan LA, Hamlen KW. Next-generation phishing: how LLM agents empower cyber attackers. arXiv:2411.13874. 2024.

[44] Gioacchini L, Mellia M, Drago I, et al. AutoPenBench: benchmarking generative agents for penetration testing. arXiv:2410.03225. 2024.

[45] Deng G, Liu Y, Mayoral-Vilches V, et al. PentestGPT: evaluating and harnessing large language models for automated penetration testing. USENIX Security Symposium. 2024:847-864.

[46] Fang R, Bindu R, Gupta A, Kang D. Teams of LLM agents can exploit zero-day vulnerabilities. arXiv:2406.01637. 2024.

[47] Muzsai L, Imolai D, Lukács A. HackSynth: LLM agent and evaluation framework for autonomous penetration testing. arXiv:2412.01778. 2024.

[48] Shen X, Wang L, Li Z, et al. PentestAgent: incorporating LLM agents to automated penetration testing. ACM Asia Conference on Computer and Communications Security. 2025:375-391.

[49] Henke J. AutoPentest: enhancing vulnerability management with autonomous LLM agents. arXiv:2505.10321. 2025.

[50] An R, Chen K, Li H. Unsupervised low-dose CT reconstruction with one-way conditional normalizing flows. IEEE Transactions on Computational Imaging. 2025.

[51] Chen W, Zhang Y, Liu X, et al. IntellBot: retrieval augmented LLM chatbot for cyber threat knowledge delivery. arXiv:2411.08234. 2024.

[52] Chen W, Zhang Y, Liu X, et al. Labeling NIDS rules with MITRE ATT&CK techniques: machine learning vs large language models. arXiv:2412.12456. 2024.

[53] Kunwar D, Sharma P, Prakash I, et al. Leveraging LLMs for non-security experts in threat hunting: detecting living off the land techniques. Computers. 2025;7(2):31.

[54] Huang X, Ruan W, Huang W, et al. A survey of safety and trustworthiness of large language models through the lens of verification and validation. Artificial Intelligence Review. 2024;57(7):175.

[55] Liu Y, Deng G, Li Z, et al. Jailbreaking ChatGPT via prompt engineering: an empirical study. arXiv:2305.13860. 2023.

[56] Carlini N, Tramer F, Wallace E, et al. Extracting training data from large language models. USENIX Security. 2021:2633-2650.

[57] Ji Z, Lee N, Frieske R, et al. Survey of hallucination in natural language generation. ACM Computing Surveys. 2023;55(12):1-38.

[58] Solaiman I, Brundage M, Clark J, et al. Release strategies and the social impacts of language models. arXiv:1908.09203. 2019.

[59] Li Z, Yu X, Zhang Y, et al. Understanding the effectiveness of large language models in code vulnerability detection. arXiv:2311.16169. 2023.

[60] Ding Z, Xu M, Wang Y, et al. Beyond accuracy: evaluating LLMs for software vulnerability detection. arXiv:2402.15432. 2024.

[61] Zhang K, Li Y, Wang X, et al. Few-shot vulnerability detection with contrastive learning. arXiv:2403.15432. 2024.

[62] Chen X, Hao Z, Li L, et al. CruParamer: learning on parameter-augmented API sequences for malware detection. IEEE Transactions on Information Forensics and Security. 2022;17:788-803.

[63] Long S, Tan J, Mao B, et al. A survey on intelligent network operations and performance optimization based on large language models. IEEE Communications Surveys & Tutorials. 2025.

[64] Feng R, Chen H, Wang S, Karim MM, Jiang Q. LLM-MalDetect: a large language model-based method for Android malware detection. IEEE Access. 2025.

[65] Gibert D, Planes J, Le Q, Zizzo G. A wolf in sheep's clothing: query-free evasion attacks against machine learning-based malware detectors with GANs. IEEE EuroS&P. 2023:1-16.

Published

19-10-2025

Section

Articles

How to Cite

Ren, S., & Chen, S. (2025). Large Language Models for Cybersecurity Intelligence, Threat Hunting, and Decision Support. Computer Life, 13(3), 39-47. https://doi.org/10.54097/7ysr5k17