Image Captioning Generation Using Inception V3 and Attention Mechanism
DOI:
https://doi.org/10.29304/jqcm.2023.15.2.1228

Keywords:
Image Captioning Generation, Inception V3, LSTM, Attention Mechanism

Abstract
Image captioning is the process of combining a visual comprehension system with a language model to construct sentences that are meaningful and syntactically accurate descriptions of an image. The goal is to train a deep learning model to learn the correspondence between an image and its textual description. This is a challenging task due to the inherent complexity and subjectivity of language, as well as the visual variability of images, and it draws on both computer vision and natural language processing. In this paper, an end-to-end deep learning-based image captioning system using Inception V3 and Long Short-Term Memory (LSTM) with an attention mechanism is implemented. Extensive experiments were carried out on the benchmark MS COCO dataset, and the results show that the proposed system outperforms several related systems on the widely used evaluation measures, achieving scores of 0.543, 0.87, 0.66, 0.51, and 0.42 for METEOR and BLEU (B1-B4), respectively.
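To make the described pipeline concrete, the listing below is a minimal sketch of the kind of encoder-attention-decoder architecture the abstract outlines: InceptionV3 as the image feature extractor, an additive (Bahdanau-style) attention module, and an LSTM decoder. It is written in TensorFlow/Keras as an assumed framework; the class names, layer sizes (embedding_dim, units, vocab_size), and training details are illustrative assumptions, not code or hyperparameters reported in the paper.

    # Minimal sketch of an InceptionV3 + attention + LSTM captioner (assumed
    # TensorFlow/Keras implementation, not the authors' code).
    import tensorflow as tf

    # Image encoder: InceptionV3 without its classification head. Its final
    # 8x8x2048 feature map is reshaped to 64 spatial locations the decoder
    # can attend over.
    image_model = tf.keras.applications.InceptionV3(include_top=False, weights='imagenet')
    feature_extractor = tf.keras.Model(image_model.input, image_model.layers[-1].output)

    class Encoder(tf.keras.Model):
        # Projects the 2048-d InceptionV3 features to embedding_dim before attention.
        def __init__(self, embedding_dim):
            super().__init__()
            self.fc = tf.keras.layers.Dense(embedding_dim, activation='relu')

        def call(self, features):          # features: (batch, 64, 2048)
            return self.fc(features)       # -> (batch, 64, embedding_dim)

    class BahdanauAttention(tf.keras.Model):
        # Additive attention: scores each of the 64 image regions against the
        # decoder's previous hidden state and returns a weighted context vector.
        def __init__(self, units):
            super().__init__()
            self.W1 = tf.keras.layers.Dense(units)
            self.W2 = tf.keras.layers.Dense(units)
            self.V = tf.keras.layers.Dense(1)

        def call(self, features, hidden):
            hidden_with_time = tf.expand_dims(hidden, 1)           # (batch, 1, units)
            score = self.V(tf.nn.tanh(self.W1(features) + self.W2(hidden_with_time)))
            attention_weights = tf.nn.softmax(score, axis=1)       # (batch, 64, 1)
            context_vector = tf.reduce_sum(attention_weights * features, axis=1)
            return context_vector, attention_weights

    class Decoder(tf.keras.Model):
        # LSTM decoder: at each step it attends over the image features,
        # concatenates the context vector with the current word embedding,
        # and predicts a distribution over the next word.
        def __init__(self, embedding_dim, units, vocab_size):
            super().__init__()
            self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
            self.lstm = tf.keras.layers.LSTM(units, return_sequences=True, return_state=True)
            self.fc1 = tf.keras.layers.Dense(units)
            self.fc2 = tf.keras.layers.Dense(vocab_size)
            self.attention = BahdanauAttention(units)

        def call(self, word_ids, features, hidden, cell):
            context_vector, attention_weights = self.attention(features, hidden)
            x = self.embedding(word_ids)                           # (batch, 1, embedding_dim)
            x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)
            output, state_h, state_c = self.lstm(x, initial_state=[hidden, cell])
            logits = self.fc2(self.fc1(tf.reshape(output, (-1, output.shape[2]))))
            return logits, state_h, state_c, attention_weights

In such a setup, captions are generated at inference time word by word, feeding each predicted token back into the decoder until an end-of-sequence token is produced, after which BLEU and METEOR can be computed against the reference captions.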