Improving Ancient Language Classification with Deep Learning and Traditional Models for Class Imbalance

Authors

  • Hadeel M Saleh, Continuing Education Center, University of Anbar, Al-Anbar, Iraq

DOI:

https://doi.org/10.29304/jqcsm.2026.18.22612

Keywords:

DVFE, Transfer Learning, Class Imbalance, Ensemble Learning, Ancient Language Classification

Abstract

This paper proposes a hybrid framework for ancient language image classification that combines deep transfer learning with ensemble machine learning to address class imbalance in low-resource settings. A pre-trained Deep Visual Feature Extractor (DVFE) built on the ResNet-50 architecture produces high-level visual representations of the Ancient Language Images (ALI) dataset, which comprises eight historical scripts. To reduce the moderate imbalance in the deep feature space, imbalance-aware resampling algorithms such as SMOTE and ADASYN are applied to the training set only. The resampled features are then classified with Random Forest and LightGBM models whose hyperparameters are optimized via cross-validation. On a held-out test set, the framework shows strong and balanced performance: an overall accuracy of 0.89, macro-precision of 0.88, macro-recall of 0.87, macro-F1 of 0.88, and per-class AUC values from 0.88 to 0.97. These results indicate that correcting imbalance at the feature level improves fairness between majority and minority language classes while retaining good discriminative power. A comparison also indicates that the proposed DVFE-ensemble system outperforms traditional methods that handle imbalance at the raw-image level. Overall, the study demonstrates that deep feature extraction combined with imbalance-aware ensemble learning yields reliable ancient language classification, supporting the development of digital heritage preservation systems when data are limited.
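The resampling and classification stages described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's implementation: random Gaussian vectors stand in for the ResNet-50 (DVFE) features, the class counts are invented, a hand-written SMOTE-style interpolation (`smote_like_oversample`, a hypothetical helper) replaces a library resampler, and only Random Forest is shown. The point it illustrates is the ordering: split first, oversample the training portion only, then tune and evaluate with macro-averaged metrics on the untouched test set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import NearestNeighbors


def smote_like_oversample(X, y, k=5, seed=0):
    """Hypothetical helper: raise every class to the majority count by
    interpolating between a sample and one of its k nearest same-class
    neighbours. Assumes every class has at least two samples."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        n_new = target - count
        if n_new == 0:
            continue
        X_cls = X[y == cls]
        k_eff = min(k, len(X_cls) - 1)
        nn = NearestNeighbors(n_neighbors=k_eff + 1).fit(X_cls)
        # Column 0 is the query point itself, so it is dropped.
        neighbours = nn.kneighbors(X_cls, return_distance=False)[:, 1:]
        base = rng.integers(0, len(X_cls), size=n_new)
        picked = neighbours[base, rng.integers(0, k_eff, size=n_new)]
        gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
        X_parts.append(X_cls[base] + gap * (X_cls[picked] - X_cls[base]))
        y_parts.append(np.full(n_new, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Stand-in for deep features: 8 classes with invented, imbalanced counts.
rng = np.random.default_rng(0)
counts = [120, 100, 90, 80, 60, 50, 40, 30]
centers = rng.normal(0.0, 3.0, size=(8, 32))
X = np.vstack([c + rng.normal(size=(n, 32)) for c, n in zip(centers, counts)])
y = np.repeat(np.arange(8), counts)

# Split BEFORE resampling so no synthetic point can reach the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
X_bal, y_bal = smote_like_oversample(X_train, y_train)

# Cross-validated hyperparameter search on the balanced training features.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 10]},
    cv=3,
    scoring="f1_macro",
)
search.fit(X_bal, y_bal)

# Macro averaging weights all eight classes equally, whatever their size.
y_pred = search.predict(X_test)
macro_p, macro_r, macro_f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="macro", zero_division=0
)
print(f"macro-P={macro_p:.2f} macro-R={macro_r:.2f} macro-F1={macro_f1:.2f}")
```

In this simplified version the cross-validation folds are drawn from already oversampled data, so a synthetic point and its source can land in different folds. A stricter setup would place the resampler inside a pipeline so that it runs within each training fold.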

Published

2026-06-27

How to Cite

M Saleh, H. (2026). Improving Ancient Language Classification with Deep Learning and Traditional Models for Class Imbalance. Journal of Al-Qadisiyah for Computer Science and Mathematics, 18(2), Comp 257–277. https://doi.org/10.29304/jqcsm.2026.18.22612

Section

Computer Articles