A Review of the Overfitting Problem in Convolution Neural Network and Remedy Approaches
DOI: https://doi.org/10.29304/jqcm.2023.15.2.1240

Keywords: Convolutional Neural Network, Overfitting Problem, Deep Learning, Regularization Strategy, Data Expansion

Abstract
Deep learning methods have attracted much attention over the past few years following their breakthroughs in speech recognition and computer vision. The Convolutional Neural Network (CNN) is one of the most important of these networks, used especially in image classification; however, CNNs face an essential problem known as the "overfitting problem". Overfitting means that the model merely "memorizes" previously observed patterns rather than "learning" the patterns that matter for generalization. As a result, classification performance on unseen data degrades, which becomes a serious problem. Different remedy approaches have been suggested, and each can exhibit different behavior in reducing overfitting depending on the nature of the training data, so this variety of remedy approaches needs to be examined. This paper focuses on the factors that can cause the overfitting problem and then reviews the approaches to solving it.
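For illustration, the sketch below shows how three of the remedy families discussed in this review, data expansion (augmentation), dropout, and an L2 regularization penalty, are commonly combined when training a small CNN. This is a minimal sketch assuming PyTorch and torchvision; the `SmallCNN` architecture and all hyperparameter values are illustrative assumptions, not the paper's own method.

```python
# Minimal sketch (illustrative, not from the paper) of three common
# remedies for CNN overfitting: data expansion, dropout, and L2
# regularization via weight decay. Assumes PyTorch + torchvision.
import torch
import torch.nn as nn
from torchvision import transforms

# Data expansion: random flips and crops enlarge the effective training
# set, making it harder for the model to memorize individual images.
train_tf = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
])

class SmallCNN(nn.Module):
    """Toy CNN for 32x32 RGB inputs; sizes are illustrative assumptions."""
    def __init__(self, num_classes: int = 10, p_drop: float = 0.5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 32x32 -> 16x16
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 16x16 -> 8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(p_drop),   # dropout: randomly zeroes activations
            nn.Linear(64 * 8 * 8, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SmallCNN()
# weight_decay adds an L2 penalty on the weights (a regularization strategy).
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=5e-4)
```

Early stopping, another remedy the review covers, would be layered on top of such a loop by tracking validation loss across epochs and halting once it stops improving.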