Adaptive Hybrid Learning for Websites Vulnerability prediction

Authors

  • Mohannad Hossain Hadi, Department of Computer Science, College of Science, Mustansiriyah University, Baghdad, Iraq
  • Karim Hashim Al-Saedi, Department of Computer Science, College of Science, Mustansiriyah University, Baghdad, Iraq

DOI:

https://doi.org/10.29304/jqcsm.2024.16.11433

Keywords:

Cybersecurity, Hybrid Modeling, Vulnerabilities Prediction, Data Mining

Abstract

This study presents a comprehensive exploration of machine learning (ML) techniques for predicting vulnerabilities in websites, a critical aspect of modern cybersecurity. As digital threats advance and cyber-attacks grow more complex, conventional security strategies have become increasingly inadequate. By employing machine learning algorithms such as Random Forest and Gradient Boosting, this study formulates models adept at identifying potential vulnerabilities within website code. This approach responds to the escalating demand for enhanced security measures in the face of increasingly sophisticated digital threats. By integrating anomaly detection findings from the Isolation Forest algorithm, this study enriches the training dataset, enabling the models to adapt to both known and emerging vulnerability patterns.
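The abstract does not specify how the Isolation Forest findings are merged into the training data. The following is a minimal sketch of one plausible reading, assuming the anomaly score and outlier flag are appended as extra feature columns before the two supervised models are trained. It uses scikit-learn; the function names `add_anomaly_features` and `compare_models`, the hyperparameters, and the synthetic data are illustrative assumptions, not the authors' implementation or dataset.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingClassifier,
    IsolationForest,
    RandomForestClassifier,
)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def add_anomaly_features(X_train, X_test, seed=0):
    """Fit an Isolation Forest on the training split only and append
    its score and outlier flag to both splits (assumed enrichment step)."""
    iso = IsolationForest(n_estimators=100, random_state=seed)
    iso.fit(X_train)

    def enrich(X):
        score = iso.decision_function(X)  # lower values are more anomalous
        flag = (iso.predict(X) == -1).astype(float)  # 1.0 marks an outlier
        return np.column_stack([X, score, flag])

    return enrich(X_train), enrich(X_test)


def compare_models(X, y, seed=0):
    """Train Random Forest and Gradient Boosting on the enriched features
    and return the held-out accuracy of each model."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    X_tr_e, X_te_e = add_anomaly_features(X_tr, X_te, seed)
    models = {
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "gradient_boosting": GradientBoostingClassifier(random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_tr_e, y_tr)
        results[name] = accuracy_score(y_te, model.predict(X_te_e))
    return results


# Synthetic placeholder data (hypothetical, NOT the dataset used in the study).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 6))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 1.0).astype(int)
scores = compare_models(X_demo, y_demo)
```

Fitting the Isolation Forest on the training split alone keeps information from the test split out of the enriched features, so the reported accuracies stay an honest held-out estimate.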

The Gradient Boosting model slightly outperformed the Random Forest model in overall accuracy: it achieved a precision of 97% for the non-vulnerability class and 90% for the vulnerability class, with an overall accuracy of 96.25%. This advantage is attributed to its ability to iteratively learn from previous errors, which enhances its adaptability to new vulnerabilities. This study underscores the significant potential of ML to enhance cybersecurity measures against website vulnerabilities.
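Per-class precision and overall accuracy measure different things, which is why the accuracy reported above is not simply the average of the two precision values. A minimal sketch of both definitions follows; the helper names and the ten labels are hypothetical and chosen only for illustration, not taken from the study's results.

```python
def per_class_precision(y_true, y_pred, label):
    """Fraction of samples predicted as `label` that truly belong to it."""
    truths_for_label = [t for t, p in zip(y_true, y_pred) if p == label]
    if not truths_for_label:
        return 0.0  # no predictions for this class
    correct = sum(1 for t in truths_for_label if t == label)
    return correct / len(truths_for_label)


def accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels: 0 = non-vulnerable, 1 = vulnerable.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 0, 1]

precision_non_vulnerable = per_class_precision(y_true, y_pred, 0)
precision_vulnerable = per_class_precision(y_true, y_pred, 1)
overall_accuracy = accuracy(y_true, y_pred)
```

Because each precision is normalized by the number of predictions for its own class, the majority class dominates overall accuracy when the classes are imbalanced.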

Published

2024-03-30

How to Cite

Hossain Hadi, M., & Hashim Al-Saedi, K. (2024). Adaptive Hybrid Learning for Websites Vulnerability prediction. Journal of Al-Qadisiyah for Computer Science and Mathematics, 16(1), Comp. 32–49. https://doi.org/10.29304/jqcsm.2024.16.11433

Section

Computer Articles