Real-Time Code Vulnerability Detection Using a Machine Learning-Integrated Language Server
DOI:
https://doi.org/10.46604/peti.2025.15065
Keywords:
code vulnerability detection, CWE, Language Server Protocol, machine learning
Abstract
The rapid growth of software development has improved productivity but also introduced security risks, especially when developers skip essential scans due to time constraints or limited tool support. This study proposes a real-time vulnerability detection system that integrates machine learning (ML) into a language server framework to enhance software security during coding. The system uses a Language Server Protocol (LSP) architecture with a Random Forest classifier that analyzes source code at the line level. Code is pre-processed through tokenization, abstract syntax tree (AST) traversal, and TF-IDF vectorization before being classified into four vulnerability types: CWE-79 (Cross-Site Scripting), CWE-89 (SQL Injection), CWE-22 (Path Traversal), and CWE-434 (Unrestricted File Upload). Using 20,000 labeled code lines, the model achieves 82.3% accuracy and an F1-score of 80.7%, performing best on CWE-79 and CWE-89 and showing weakest performance on CWE-434. The language server averages 72 ms per diagnostic, demonstrating its suitability for real-time developer workflows.
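As a rough illustration of the line-level pre-processing and reporting steps described in the abstract, the following minimal sketch tokenizes code lines, computes smoothed TF-IDF weights, and wraps a predicted label in an LSP-style diagnostic. It is not the authors' implementation: the tokenizer regex, the sample lines, the function names, and the `ml-vuln-sketch` source label are all assumptions made for illustration, and the AST traversal and Random Forest classification stages are omitted.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: identifiers, integers, quoted strings,
# then any other single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\"[^\"]*\"|'[^']*'|\S")

# Hypothetical sample lines; the study's dataset is not reproduced here.
SAMPLE_LINES = [
    'query = "SELECT * FROM users WHERE id = " + user_id',
    'cursor.execute(query)',
    'return render(request, "index.html")',
]


def tokenize(line):
    """Split one source line into lexical tokens."""
    return TOKEN_RE.findall(line)


def tfidf(lines):
    """Return one {token: weight} dict per line, treating each line as a document."""
    docs = [Counter(tokenize(line)) for line in lines]
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(doc.keys())  # count each token once per line
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({
            # term frequency times a smoothed inverse document frequency
            tok: (cnt / total) * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1)
            for tok, cnt in doc.items()
        })
    return vectors


def to_diagnostic(line_no, text, cwe_id):
    """Wrap an assumed classifier label in an LSP Diagnostic-shaped dict.

    Character offsets use len(text), which matches LSP's default UTF-16
    counting only for ASCII text (an assumption of this sketch).
    """
    return {
        "range": {
            "start": {"line": line_no, "character": 0},
            "end": {"line": line_no, "character": len(text)},
        },
        "severity": 2,  # 2 = Warning in the LSP DiagnosticSeverity enum
        "code": cwe_id,
        "source": "ml-vuln-sketch",  # hypothetical server name
        "message": f"Possible {cwe_id} weakness on this line",
    }


if __name__ == "__main__":
    vectors = tfidf(SAMPLE_LINES)
    print(vectors[0])
    # The label below is hard-coded; a trained classifier would supply it.
    print(to_diagnostic(0, SAMPLE_LINES[0], "CWE-89"))
```

In a full pipeline, the per-line vectors would be converted to fixed-length feature arrays and passed to the trained classifier, and the resulting diagnostics would be sent to the editor in a `textDocument/publishDiagnostics` notification.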
License
Copyright (c) 2025 Ariel Roy Luceño Reyes, Mark David Dayanan Prado, Raffy Beting Suarez, Rovenado Nesta Abellana Villotes

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Submission of a manuscript implies that the work described has not been published before and that it is not under consideration for publication elsewhere. Authors can retain copyright of their article with no restrictions. Authors can also post the final, peer-reviewed manuscript version (postprint) to any repository or website.

Since Oct. 01, 2015, PETI has published new articles under the Creative Commons Attribution Non-Commercial 4.0 International (CC BY-NC 4.0) License.
The Creative Commons Attribution Non-Commercial (CC BY-NC) License permits use, distribution, and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
