Overcoming Class Overlap and Imbalance in ECG Detection and Classification: A Deep Attention-Based Model on MIT-BIH

Authors

  • Asmaa Sami Mirdan, College of Computer Science and Information Technology, Kirkuk University, Kirkuk, Iraq.

DOI:

https://doi.org/10.29304/jqcsm.2025.17.22175

Keywords:

Machine Learning, Random Forest, Decision Tree, Support Vector Machine, ECG

Abstract

ECG signals are essential for monitoring and diagnosing cardiovascular conditions, particularly arrhythmias, which can lead to severe complications if undetected. This study introduces an AI-based approach for arrhythmia classification using the MIT-BIH arrhythmia dataset, addressing three challenges: class imbalance, class overlap, and intra-patient bias. To improve data quality, the dataset was balanced using the Synthetic Minority Oversampling Technique (SMOTE), and minority-class samples were augmented with Gaussian noise. A Conv1D-Attention network was employed during preprocessing to extract local ECG features and focus on key waveforms. Three classifiers were evaluated: decision tree, random forest, and support vector machine (SVM). Of these, the random forest achieved the highest accuracy, 91%. Although preprocessing reduced class imbalance and variance, a drop in performance was observed. This drop reflects a more realistic evaluation scenario: patient-independent segmentation prevents the data leakage that occurs when similar ECG segments from the same patient appear in both the training and test sets, and it compels the model to generalize beyond individual patterns, a critical step for real-world applications. The study highlights the importance of rigorous evaluation protocols in biomedical machine learning. Combining data augmentation with attention-based feature extraction significantly enhances model generalizability, particularly in handling overlapping and imbalanced classes. This approach shows promise for developing reliable, patient-independent diagnostic tools for early arrhythmia detection in clinical settings.
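The two evaluation ideas in the abstract, patient-independent splitting and Gaussian-noise augmentation of minority classes, can be illustrated with a minimal sketch. This is not the author's implementation: the function names, the toy beats, the patient identifiers, and the `sigma` and `test_fraction` values are all assumptions made for illustration. It also does not implement SMOTE, which interpolates between neighbouring minority samples; only the noise-based part is shown, in plain Python so that it runs without extra dependencies. Augmentation is applied to the training split only, so no synthetic copy of a test patient's beat can leak into training.

```python
import random
from collections import Counter


def patient_independent_split(patient_ids, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) such that no patient appears in both sets."""
    patients = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(patients)
    # Hold out whole patients, never individual beats.
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    train_idx = [i for i, p in enumerate(patient_ids) if p not in test_patients]
    test_idx = [i for i, p in enumerate(patient_ids) if p in test_patients]
    return train_idx, test_idx


def augment_minority_with_noise(beats, labels, sigma=0.01, seed=0):
    """Add Gaussian-noise copies of minority-class beats until every class
    has as many samples as the largest class (illustrative, not SMOTE)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_beats, out_labels = list(beats), list(labels)
    for cls, n in counts.items():
        pool = [b for b, y in zip(beats, labels) if y == cls]
        for _ in range(target - n):
            base = rng.choice(pool)
            out_beats.append([x + rng.gauss(0.0, sigma) for x in base])
            out_labels.append(cls)
    return out_beats, out_labels


# Toy data (hypothetical): 8 three-sample "beats" from 4 patients,
# labelled "N" (normal) or "V" (ventricular ectopic).
patient_ids = ["p1", "p1", "p2", "p2", "p3", "p3", "p4", "p4"]
labels = ["N", "N", "N", "V", "N", "N", "N", "V"]
beats = [[0.0, 1.0, 0.0] for _ in labels]

train_idx, test_idx = patient_independent_split(patient_ids)
train_beats = [beats[i] for i in train_idx]
train_labels = [labels[i] for i in train_idx]

# Balance the training split only; the test split stays untouched.
aug_beats, aug_labels = augment_minority_with_noise(train_beats, train_labels)
print(Counter(train_labels), "->", Counter(aug_labels))
```

Splitting by patient rather than by beat is what removes the leakage the abstract describes: a beat-level random split would place near-identical segments from one recording on both sides of the split and inflate the reported accuracy.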

Published

2025-06-30

How to Cite

Sami Mirdan, A. (2025). Overcoming Class Overlap and Imbalance in ECG Detection and Classification: A Deep Attention-Based Model on MIT-BIH. Journal of Al-Qadisiyah for Computer Science and Mathematics, 17(2), Comp. 29–47. https://doi.org/10.29304/jqcsm.2025.17.22175

Section

Computer Articles