Arabic Word Prediction for Next and Previous Word Using BERT & CBOW Algorithms
DOI:
https://doi.org/10.29304/jqcsm.2024.16.41778
Keywords:
Arabic Language, Word Prediction, CBOW, BERT Algorithm, F-Measure
Abstract
Next-word prediction, also known as language modeling, is one application of Natural Language Processing (NLP). It involves predicting the most likely word to follow in a sentence based on the preceding context. It has many widely used applications, such as auto-correct in emails and messages, and it can be used in Microsoft Word or Google Search to predict the next word based on past searches or global queries. The goal of Natural Language Generation (NLG) is to produce language that is natural and human-interpretable. Users find text generation, and next-word prediction in particular, convenient because it makes typing faster and reduces errors. Anticipating the next word in a sequence also minimizes the total number of keystrokes a user makes. Consequently, personalized text prediction is an important research topic for all languages. This paper proposes a novel approach for predicting the following word, and also the previous word, in an Arabic sentence using two models: BERT and Continuous Bag of Words (CBOW). BERT achieved its best accuracy of 90% for next-word prediction and 80% for previous-word prediction, while CBOW achieved its best accuracy of 100% for both next-word and previous-word prediction.
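The abstract does not describe how the CBOW model was configured, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a one-sided context window (the preceding words for next-word prediction, the following words for previous-word prediction), a full softmax output instead of negative sampling, and a two-sentence toy corpus. The names `train_cbow` and `predict`, the corpus, and all hyperparameters are hypothetical and chosen for illustration.

```python
import math
import random

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def hidden(W_in, ctx, dim):
    # CBOW hidden layer: the average of the context word embeddings.
    return [sum(W_in[c][k] for c in ctx) / len(ctx) for k in range(dim)]

def train_cbow(sentences, direction="next", window=2, dim=8,
               epochs=300, lr=0.1, seed=0):
    """Train a tiny one-sided CBOW model (illustrative assumption).

    direction="next":     context = preceding words, target = following word.
    direction="previous": context = following words, target = preceding word.
    """
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    idx = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    W_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(V)]
    W_out = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(V)]

    # Build (context ids, target id) training pairs.
    pairs = []
    for s in sentences:
        ids = [idx[w] for w in s]
        for t, target in enumerate(ids):
            if direction == "next":
                ctx = ids[max(0, t - window):t]
            else:
                ctx = ids[t + 1:t + 1 + window]
            if ctx:
                pairs.append((ctx, target))

    # Plain SGD on the cross-entropy loss.
    for _ in range(epochs):
        for ctx, target in pairs:
            h = hidden(W_in, ctx, dim)
            probs = softmax([sum(W_out[j][k] * h[k] for k in range(dim))
                             for j in range(V)])
            grad_h = [0.0] * dim
            for j in range(V):
                err = probs[j] - (1.0 if j == target else 0.0)
                for k in range(dim):
                    grad_h[k] += err * W_out[j][k]  # uses the pre-update weight
                    W_out[j][k] -= lr * err * h[k]
            for c in ctx:
                for k in range(dim):
                    W_in[c][k] -= lr * grad_h[k] / len(ctx)
    return {"vocab": vocab, "idx": idx, "W_in": W_in, "W_out": W_out, "dim": dim}

def predict(model, context_words, top_k=1):
    """Return the top_k (word, probability) candidates for the given context."""
    ctx = [model["idx"][w] for w in context_words if w in model["idx"]]
    if not ctx:
        return []  # no known context word, so nothing to predict from
    h = hidden(model["W_in"], ctx, model["dim"])
    probs = softmax([sum(row[k] * h[k] for k in range(model["dim"]))
                     for row in model["W_out"]])
    ranked = sorted(range(len(probs)), key=lambda j: probs[j], reverse=True)
    return [(model["vocab"][j], probs[j]) for j in ranked[:top_k]]

# Toy corpus (hypothetical): "the student reads the book", "the teacher writes the lesson".
corpus = [["الطالب", "يقرأ", "الكتاب"], ["المعلم", "يكتب", "الدرس"]]
next_model = train_cbow(corpus, direction="next")
prev_model = train_cbow(corpus, direction="previous")
print(predict(next_model, [corpus[0][0]]))  # candidate word after the first word
print(predict(prev_model, [corpus[0][2]]))  # candidate word before the last word
```

Training two separate models, one per direction, keeps the sketch simple. A corpus this small is memorized by the model, so its output illustrates the mechanics only and says nothing about accuracy on real Arabic text.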
License
Copyright (c) 2025 Hawraa Ali Taher
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.