Models in Review for the Analysis of Phishing Website URLs
DOI:
https://doi.org/10.29304/jqcsm.2024.16.31641
Keywords:
Machine Learning, Phishing, Websites, XGBoost, URLs, Cybersecurity.
Abstract
In this paper, we compare our method on DEFRAUD with current online API services and state-of-the-art machine learning models for defending against phishing websites. Traditional rule-based methods tend to become outdated quickly and cannot keep up with the new tactics that malicious actors continually introduce. Over time, proactive approaches enabled by machine learning (ML) have become more important, because they are flexible and adaptable and can scan data from modern breaches for patterns across very large datasets. In this study, we explore several machine learning algorithms for phishing website detection: Logistic Regression (LR), K-Nearest Neighbors (KNN), Decision Trees (DT), Random Forest (RF), Support Vector Classifiers (SVC), and XGBoost. Ensemble methods such as Random Forest and XGBoost achieve better accuracy, precision, and recall. Although XGBoost is resource-hungry, it is well known for its out-of-the-box handling of high-dimensional data, for working alongside deep learning frameworks, and for avoiding overfitting. The study underscores the importance of integrating machine learning models into practical cybersecurity applications. Future research should focus on improving these models and expanding their application across different domains to enhance cybersecurity defenses.
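The abstract compares models on accuracy, precision, and recall. As a minimal sketch of how those three metrics are computed for a binary phishing classifier, the Python snippet below scores two sets of predictions against the same labels. The `Scores` class, the `score` function, and the label and prediction lists are hypothetical and made up for illustration; they are not the paper's data, models, or results.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Scores:
    accuracy: float
    precision: float
    recall: float


def score(y_true: Sequence[int], y_pred: Sequence[int]) -> Scores:
    """Score binary predictions, where 1 = phishing and 0 = legitimate."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty set of labels")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # phishing caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # phishing missed

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is empty (an assumed convention).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return Scores(accuracy, precision, recall)


# Made-up labels and predictions, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [1, 1, 0, 0, 0, 0, 0, 1]  # hypothetical weaker model
model_b = [1, 1, 1, 0, 0, 0, 0, 0]  # hypothetical stronger model

for name, preds in (("model_a", model_a), ("model_b", model_b)):
    print(name, score(y_true, preds))
```

Precision penalizes legitimate sites flagged as phishing, while recall penalizes phishing sites that slip through, so reporting both alongside accuracy gives a fuller picture than accuracy alone.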
License
Copyright (c) 2024 Ali Salam Al-jaberi, Sura Fadhil Rahman, Ihsan Faisal Raheem
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.