Improving Ancient Language Classification with Deep Learning and Traditional Models for Class Imbalance
DOI:
https://doi.org/10.29304/jqcsm.2026.18.22612
Keywords:
DVFE, Transfer Learning, Class Imbalance, Ensemble Learning, Ancient Language Classification
Abstract
This paper proposes a hybrid framework for ancient language image classification that combines deep transfer learning with ensemble machine learning to address class imbalance in low-resource settings. A pre-trained Deep Visual Feature Extractor (DVFE) built on the ResNet-50 architecture produces high-level visual representations of the Ancient Language Images (ALI) dataset, which comprises eight historical scripts. To reduce the moderate class imbalance in the deep feature space, imbalance-aware resampling algorithms such as SMOTE and ADASYN are applied to the training set only. The resulting features are then classified with Random Forest and LightGBM models whose hyperparameters are tuned via cross-validation. On a held-out test set, the framework achieves strong and balanced performance: an overall accuracy of 0.89, macro-precision of 0.88, macro-recall of 0.87, macro-F1 of 0.88, and per-class AUC values ranging from 0.88 to 0.97. These results indicate that correcting imbalance at the feature level improves fairness between majority and minority language classes while retaining good discriminative power. A comparison further indicates that the proposed DVFE-ensemble system outperforms traditional methods that handle imbalance at the raw-image level. Overall, the study demonstrates the usefulness of deep feature extraction combined with imbalance-aware ensemble learning for reliable ancient language classification, supporting the development of digital heritage preservation systems when data are limited.
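The two ideas the abstract leans on most, oversampling only the training features and scoring with macro-averaged metrics, can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the function names `smote_like` and `macro_f1`, the class labels, and the toy 2-D vectors standing in for DVFE embeddings are all assumptions made for illustration. A real pipeline would more likely use the `SMOTE` or `ADASYN` classes from imbalanced-learn together with `RandomForestClassifier` or `LGBMClassifier`.

```python
import random
from collections import Counter


def smote_like(features, labels, k=3, seed=0):
    """SMOTE-style oversampling (illustrative): grow each minority class to the
    majority count by interpolating between a sample and one of its k nearest
    same-class neighbours."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    new_x, new_y = list(features), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(features, labels) if y == cls]
        if len(pool) < 2:
            continue  # a single sample has no neighbour to interpolate with
        for _ in range(target - n):
            base = rng.choice(pool)
            # Nearest same-class neighbours by squared Euclidean distance.
            neighbours = sorted(
                (p for p in pool if p is not base),
                key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
            )[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # position along the segment base -> other
            new_x.append([a + gap * (b - a) for a, b in zip(base, other)])
            new_y.append(cls)
    return new_x, new_y


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so minority classes count equally."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy vectors standing in for DVFE embeddings (6 vs. 2 samples).
train_x = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.2, 0.2], [0.1, 0.1],
           [2.0, 2.1], [2.2, 1.9]]
train_y = ["script_a"] * 6 + ["script_b"] * 2

# Resample the training split only; a held-out test split is never touched.
bal_x, bal_y = smote_like(train_x, train_y)
balanced_counts = Counter(bal_y)
print(balanced_counts)

# Macro-F1 on made-up predictions, to show how the average is formed.
print(round(macro_f1(["a", "a", "b", "b"], ["a", "a", "a", "b"]), 4))
```

Resampling after the train/test split matters because synthetic points interpolated from test samples would leak test information into training and inflate the reported scores.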
Copyright (c) 2026 Hadeel M Saleh

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.








