Adaptive Features Selection Technique for Efficient Heart Disease Prediction
DOI: https://doi.org/10.29304/jqcm.2023.15.1.1137
Keywords: Heart disease; Features selection; Mutual information; Random Forest
Abstract
Heart disease is a common cause of death and is difficult to detect manually. Machine learning classification models that achieve higher accuracy have therefore attracted researchers' attention, and they play an important role in practical cardiology, where the aim is early detection of heart disease. This paper proposes an efficient and accurate heart disease detection system built on an adaptive feature selection technique and four machine learning methods: Support Vector Machine (SVM), Logistic Regression (LR), Decision Tree (DT), and Random Forest (RF). The technique combines two feature selection methods, mutual information (MI) and recursive feature elimination (RFE), to determine the optimal number of selected features, which increases the performance of the classification models and reduces the time complexity of model implementation. The technique was applied to two standard datasets from the UCI machine learning repository: Cleveland heart disease and heart Statlog Cleveland. The best model was selected using cross-validation and saved as the prediction model. The results show that the number of selected features differs for each dataset and depends on the classifier. For the first dataset, the Support Vector Machine-mutual information (SVM-MI) model achieved the highest classification accuracy among the classifiers used, approximately 96.755%, while the Random Forest-mutual information (RF-MI) model achieved an accuracy of 97.4% on the second dataset. The proposed technique produced the highest prediction performance in terms of accuracy, F1 score, and recall compared with the latest research in this field.
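The MI-based part of the idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes discrete feature values, uses toy data rather than the UCI datasets, and takes a hypothetical `score_fn` callback standing in for the cross-validated accuracy of a classifier (SVM, LR, DT, or RF in the paper). The names `mutual_information`, `rank_features`, and `choose_feature_count` are illustrative, and the RFE variant is not shown.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def rank_features(columns, labels):
    """Return feature indices sorted by decreasing MI with the labels."""
    scores = [mutual_information(col, labels) for col in columns]
    order = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return order, scores


def choose_feature_count(order, score_fn):
    """Score the top-k ranked features for every k and keep the best k.

    score_fn is a hypothetical callback: given a list of feature indices,
    it returns a quality score (e.g. cross-validated accuracy).
    """
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(order) + 1):
        score = score_fn(order[:k])
        if score > best_score:  # strict '>' keeps the smaller set on ties
            best_k, best_score = k, score
    return order[:best_k], best_score


# Toy data (assumption): 8 samples, 3 binary features, binary label.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the label
    [0, 0, 0, 1, 1, 1, 1, 0],  # partially informative
    [0, 0, 0, 0, 1, 1, 1, 1],  # identical to the label
]
order, scores = rank_features(columns, labels)


# Toy scorer (assumption): rewards including feature 2, penalizes set size.
def toy_score(indices):
    return (1.0 if 2 in indices else 0.5) - 0.01 * len(indices)


selected, best = choose_feature_count(order, toy_score)
```

In practice the toy scorer would be replaced by a function that trains the chosen classifier on the selected columns and returns its cross-validated accuracy, so the number of retained features can differ per dataset and per classifier, as the abstract reports.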