A Proposed Arabic Text Classification Model using Multi-Label System

Authors

  • Hussain A. Rahmana, College of Computer Science and Information Technology, University of Al-Qadisiyah
  • Salwa S. Baawi, College of Computer Science and Information Technology, University of Al-Qadisiyah

DOI:

https://doi.org/10.29304/jqcm.2023.15.3.1269

Keywords:

Text classification, Arabic text classification, Single-label text classification, Multi-label text classification, Feature selection

Abstract

Multi-label text classification, in which each document can be assigned several categories simultaneously, has grown in popularity in recent years. Arabic has a very complex morphology and a vibrant nature; nonetheless, more research on this topic is needed for the Arabic language. This study therefore presents a method for the multi-label classification of Arabic texts based on the binary relevance and label power-set transformation methods. Three classifiers, namely logistic regression (LR), random forest (RF), and multinomial naïve Bayes (MNB), were experimentally assessed in this study. Furthermore, chi-square feature selection was investigated to improve the performance of the proposed model. The experiments were implemented in Python using the "RTANews" multi-label Arabic text classification dataset. The results suggest that binary relevance combined with logistic regression performs best, with a micro-averaged recall of 0.8646. With the label power-set transformation, the best result for the suggested multi-label Arabic text classification model was obtained by the same algorithm on the same metric, at 0.8418.
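
The two problem transformations and the evaluation metric named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the label names, the toy documents, and the function names below are hypothetical, and no classifier or chi-square feature selection is included. It only shows how binary relevance produces one binary target per label, how label power-set maps each distinct label combination to a single class, and how micro-averaged recall pools true positives and false negatives over all labels.

```python
# Illustrative sketch only: toy labels and helper names are assumptions,
# not taken from the paper or from the RTANews dataset.

LABELS = ["politics", "economy", "sport"]

# Each document carries a set of category labels.
Y_TRUE = [
    {"politics", "economy"},
    {"sport"},
    {"economy"},
    {"politics", "economy"},
]


def binary_relevance_targets(label_sets, labels):
    """Binary relevance: one 0/1 target vector per label."""
    return {lab: [int(lab in s) for s in label_sets] for lab in labels}


def label_powerset_targets(label_sets):
    """Label power-set: each distinct label combination becomes one class id."""
    class_ids = {}
    targets = []
    for s in label_sets:
        key = frozenset(s)
        if key not in class_ids:
            class_ids[key] = len(class_ids)
        targets.append(class_ids[key])
    return targets, class_ids


def micro_recall(y_true, y_pred):
    """Micro-averaged recall: pooled TP / (pooled TP + pooled FN)."""
    tp = sum(len(t & p) for t, p in zip(y_true, y_pred))
    fn = sum(len(t - p) for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


br_targets = binary_relevance_targets(Y_TRUE, LABELS)
lp_targets, lp_classes = label_powerset_targets(Y_TRUE)

# Hypothetical predictions, used only to exercise the metric.
Y_PRED = [
    {"politics"},
    {"sport"},
    {"economy", "sport"},
    {"politics", "economy"},
]

print(br_targets)
print(lp_targets)
print(round(micro_recall(Y_TRUE, Y_PRED), 4))
```

Under binary relevance, a separate binary classifier would be trained on each target vector; under label power-set, a single multi-class classifier would be trained on the class ids. Either way, the predicted label sets can be scored with the same micro-averaged recall.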

Published

2023-09-30

How to Cite

Rahmana, H. A., & Baawi, S. S. (2023). A Proposed Arabic Text Classification Model using Multi-Label System. Journal of Al-Qadisiyah for Computer Science and Mathematics, 15(3), Comp Page 117–128. https://doi.org/10.29304/jqcm.2023.15.3.1269

Section

Computer Articles