A Survey for Emotion Recognition Based on Speech Signal
DOI:
https://doi.org/10.29304/jqcm.2022.14.1.905

Keywords:
Speech Analysis, Emotional Speech Databases, Feature Extraction, Classification, Audio Speech Characteristics

Abstract
Human beings possess a number of essential identifying characteristics, such as fingerprints, DNA, and retinal pigmentation, and a person's voice is likewise unique to each individual. Humans use speech to communicate their thoughts and feelings; expressing basic emotions in words is part of how a mental state is conveyed, and emotions therefore play a significant role in everyday life and in communicating one's thoughts and feelings to others. Because humans have an innate ability to perceive emotion in speech, speech carries information from which emotions can be discerned. The main hurdles in emotion recognition are selecting a suitable emotional speech corpus (database), identifying the numerous features connected to speech, and choosing an appropriate classification model. An emotion recognition system identifies emotions by analyzing the acoustic structure of speech. This survey draws on multiple research papers and includes an in-depth examination of their methodologies and data sets. It finds that emotion detection is accomplished through four distinct approaches: physiological signal recognition, facial expression recognition, analysis of speech signals, and text semantics, applied to both objective and subjective databases such as JAFFE, CK+, the Berlin emotional database, and SAVEE. In general, these techniques enable the identification of seven basic emotions. For audio expressions of eight emotions (happy, angry, sad, depressed, bored, anxious, afraid, and apprehensive), the published studies maintain only an average level of accuracy. The major goal of this survey is to compare and contrast the methodologies of numerous previous surveys, supported by empirical evidence. It covers signal acquisition and processing, feature extraction, and signal classification, together with the advantages and drawbacks of each approach, and reviews a number of strategies that may need to be tuned at each stage of speech emotion recognition.
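As a concrete illustration of the feature-extraction and classification stages discussed above, the sketch below pairs MFCC features with an SVM classifier, a combination that recurs in the surveyed literature. It is a minimal sketch, not the pipeline of any particular study: the use of the librosa and scikit-learn libraries, the corpus file paths, and the label set are all assumptions made for this example.

# Minimal sketch of a speech emotion recognition pipeline:
# MFCC feature extraction followed by SVM classification.
# Assumes librosa and scikit-learn are installed and that
# `files` / `labels` point at a labelled emotional speech corpus
# (e.g. one of the databases named in the survey).

import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report

def extract_features(path, n_mfcc=13):
    """Summarise one utterance as a fixed-length MFCC statistics vector."""
    signal, sr = librosa.load(path, sr=16000)            # signal acquisition
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    # Mean and standard deviation over time yield a fixed-size vector
    # regardless of utterance length.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical corpus: placeholder paths and labels, to be replaced
# with a real labelled database before running.
files = ["corpus/happy_01.wav", "corpus/angry_01.wav"]
labels = ["happy", "angry"]

X = np.array([extract_features(f) for f in files])
y = np.array(labels)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Feature scaling plus an RBF-kernel SVM, a common choice among the
# SVM-based systems covered in the survey.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10))
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))

Averaging the MFCCs over time trades temporal detail for a fixed-length vector, which a conventional SVM requires; sequence models such as HMMs or recurrent networks relax that constraint at the cost of a more complex classification stage.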