Investigating the Applicability of Logistic Regression and Artificial Neural Networks in Predicting Breast Cancer
DOI: https://doi.org/10.29304/jqcm.2020.12.2.697
Keywords: Logistic Regression, Artificial Neural Networks, Classification, Validation, Breast Cancer
Abstract
Breast cancer has recently become the most common cancer and a major cause of death among women worldwide, especially in developing countries such as Iraq. This study aims to predict whether a breast tumor is benign or malignant using models built with logistic regression and artificial neural networks, which are expected to help oncologists diagnose the tumor type. Four models were built using binary logistic regression and two types of artificial neural network, namely the multilayer perceptron (MLP) and the radial basis function (RBF) network. Both the trained and the validated models were evaluated with several performance metrics: accuracy, or correct classification rate (CCR); receiver operating characteristic (ROC) curves; area under the ROC curve (AUC); sensitivity; and specificity. The dataset was downloaded from the machine learning repository of the University of California, Irvine (UCI ML Repository) and consists of 9 attributes and 699 valid instances.
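The evaluation metrics listed above can all be computed from a model's predicted probabilities. The following is a minimal sketch in plain Python, not the study's code. It assumes that label 1 marks a malignant tumor (the positive class), that label 0 marks a benign one, and that the probability cut-off is 0.5. The function names and the toy data are hypothetical. AUC is computed here through its pairwise (Mann-Whitney) interpretation rather than by integrating the curve.

```python
# Minimal sketch: the evaluation metrics named above, computed from a
# model's predicted probabilities. Assumptions (not from the study):
# label 1 = malignant (positive class), label 0 = benign, cut-off 0.5.


def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (tp, tn, fp, fn) when p >= threshold is predicted malignant."""
    tp = tn = fp = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_malignant = p >= threshold
        if predicted_malignant and y == 1:
            tp += 1
        elif not predicted_malignant and y == 0:
            tn += 1
        elif predicted_malignant and y == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def classification_metrics(y_true, y_prob, threshold=0.5):
    """CCR (accuracy), sensitivity and specificity at one cut-off.
    Assumes both classes are present, otherwise a denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_prob, threshold)
    return {
        "ccr": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # share of malignant cases detected
        "specificity": tn / (tn + fp),  # share of benign cases cleared
    }


def roc_points(y_true, y_prob):
    """ROC curve as (1 - specificity, sensitivity) pairs, one per distinct
    predicted probability used as the cut-off, starting from (0, 0)."""
    points = [(0.0, 0.0)]
    for t in sorted(set(y_prob), reverse=True):
        m = classification_metrics(y_true, y_prob, t)
        points.append((1.0 - m["specificity"], m["sensitivity"]))
    return points


def auc(y_true, y_prob):
    """AUC as the probability that a random malignant case receives a
    higher score than a random benign case (ties count one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores, invented for illustration only.
    y_true = [1, 1, 1, 0, 0, 0]
    y_prob = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
    print(classification_metrics(y_true, y_prob))
    print(roc_points(y_true, y_prob))
    print(auc(y_true, y_prob))
```

For the toy data, CCR, sensitivity, and specificity are all 2/3 at the 0.5 cut-off. The AUC is 8/9, because eight of the nine malignant/benign pairs are ranked correctly. The pairwise AUC equals the trapezoidal area under the step curve that `roc_points` traces. This is why a single ROC curve and its AUC summarize performance across every possible cut-off, whereas CCR, sensitivity, and specificity each describe only one cut-off.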
First, the data were preprocessed to cleanse them. The models were then built using logistic regression and artificial neural networks and compared to determine which gives the highest performance. Each model was validated on a different dataset from the one used to develop it. The results showed that the radial basis function neural network model is the best classifier for predicting the type of breast tumor, since it recorded the highest correct classification rate (accuracy), sensitivity, specificity, and AUC of all the models.
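The workflow described above can be sketched in plain Python: cleanse the data, fit on one subset, then validate on a subset that was never used for fitting. This sketch is illustrative only. It assumes the comma-separated layout of the UCI file: a sample id, the nine attributes scored 1 to 10, and a class code of 2 for benign or 4 for malignant, with missing values written as `?`. It also assumes three choices that the study may not have made: incomplete rows are dropped as the cleansing step, attributes are scaled by 1/10, and a binary logistic regression is fitted by plain gradient descent. The split fraction, learning rate, epoch count, and function names are hypothetical.

```python
import math
import random

# Minimal sketch of the workflow: cleanse, split, fit, validate.
# Assumed (not stated in the abstract): the UCI file layout
#   sample id, 9 attributes scored 1-10, class (2 = benign, 4 = malignant)
# with missing values written as "?". Dropping incomplete rows is only
# one cleansing option; the study may have handled them differently.


def parse_rows(lines):
    """Return (X, y): attributes scaled to 0.1-1.0 and 1 = malignant."""
    X, y = [], []
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) != 11 or "?" in fields:
            continue  # skip malformed or incomplete rows
        X.append([float(v) / 10.0 for v in fields[1:10]])
        y.append(1 if fields[10] == "4" else 0)
    return X, y


def holdout_split(X, y, valid_fraction=0.3, seed=0):
    """Shuffle, then return ((X_train, y_train), (X_valid, y_valid));
    the validation part is never used for fitting."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_valid = int(round(len(idx) * valid_fraction))

    def pick(ids):
        return [X[i] for i in ids], [y[i] for i in ids]

    return pick(idx[n_valid:]), pick(idx[:n_valid])


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Binary logistic regression by batch gradient descent on mean log-loss.
    An illustrative optimizer, not the estimation routine used in the study."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, X):
    """Predicted probability of malignancy for each row of X."""
    w, b = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def ccr(y_true, y_prob, threshold=0.5):
    """Correct classification rate on any labelled set."""
    hits = sum((p >= threshold) == (t == 1) for t, p in zip(y_true, y_prob))
    return hits / len(y_true)
```

In use, `parse_rows` would receive the lines of the downloaded data file. `holdout_split` would set aside the validation part, and `ccr(y_valid, predict_proba(model, X_valid))` would report accuracy on data the model never saw. An MLP or RBF network that exposes the same fit and predict pair could be scored on the identical validation split, which is the kind of like-for-like comparison the study reports.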