Robust and Accurate Phishing Detection Using Enhanced DistilBERT: A Transformer-Based Approach
DOI:
https://doi.org/10.29304/jqcsm.2025.17.32403
Keywords:
Email Security, Transformer Fine-tuning, Phishing Detection
Abstract
Phishing remains a persistent threat in email: attacks evolve continually and evade conventional filters through deceptive language and structure. To address this, we developed an improved system for detecting phishing emails. Our approach uses an enhanced version of DistilBERT, fine-tuned and optimized specifically for phishing detection. The proposed “Enhanced DistilBERT” was trained on a comprehensive Email Phishing Dataset, a carefully curated collection of around 82,500 emails drawn from multiple trusted sources, including Enron, Ling, CEAS, Nazario, Nigerian Fraud, and SpamAssassin. The dataset is roughly balanced, with 42,891 phishing emails and 39,595 legitimate messages, and it covers both subject lines and email body text. This diversity supports robust text analysis and helps the model learn the subtle distinctions between phishing and genuine emails. In evaluation, Enhanced DistilBERT outperformed other leading models, including RoBERTa and the original DistilBERT, achieving perfect scores in accuracy, precision, recall, and F1-score, alongside a ROC AUC of 99.76%. The model also remained resilient on noisy and adversarial data, maintaining high accuracy and low false-positive rates under these conditions. These results support the practicality of our approach and position Enhanced DistilBERT as a reliable, deployment-ready tool for combating phishing threats.
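As a rough illustration of the quantities named in the abstract, the sketch below computes the class balance from the reported email counts and derives accuracy, precision, recall, F1-score, and false-positive rate from a confusion matrix. The function names and the confusion-matrix counts in the demo are hypothetical and chosen only for illustration; they are not the paper's results or code. ROC AUC is not included because it requires per-email scores rather than confusion counts.

```python
# Class counts as reported in the abstract.
PHISHING = 42_891
LEGITIMATE = 39_595


def class_balance(phishing: int, legitimate: int) -> dict:
    """Return the total email count and the share of each class."""
    total = phishing + legitimate
    return {
        "total": total,
        "phishing_share": phishing / total,
        "legitimate_share": legitimate / total,
    }


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary metrics, treating phishing as the positive class."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Share of legitimate emails wrongly flagged as phishing.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


if __name__ == "__main__":
    print(class_balance(PHISHING, LEGITIMATE))
    # Hypothetical confusion counts, for illustration only.
    print(binary_metrics(tp=90, fp=10, tn=95, fn=5))
```

With the reported counts, the two classes sum to 82,486 emails, which matches the "around 82,500" figure, and phishing emails make up roughly 52% of the dataset.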
License
Copyright (c) 2025 Hamad Abed Farhan

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.