LI Kuan, ZHANG Rongfen, LIU Yuhong, LU Xinxin. Visual question answering model based on attention feature fusion[J]. Microelectronics & Computer, 2022, 39(4): 83-90. DOI: 10.19304/J.ISSN1000-7180.2021.1102

Visual question answering model based on attention feature fusion

  • With the rise and continuing development of deep learning, significant progress has been made in the field of visual question answering (VQA). Most current VQA models introduce attention mechanisms and related iterative operations to extract correlations between image regions and high-frequency question words, but they capture the spatial-semantic association between the image and the question inefficiently, which limits answer accuracy. To address this, a visual question answering model based on the MobileNetV3 network and attention feature fusion is proposed. First, to optimize the image feature extraction module, the MobileNetV3 network is introduced and a spatial pyramid pooling structure is added, reducing the computational complexity of the network while preserving model accuracy. In addition, the output classifier is improved: its features are combined with an attention-based feature fusion method to raise question-answering accuracy. Finally, comparative experiments on the public VQA 2.0 dataset show that the proposed model outperforms current mainstream models.
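The two components named in the abstract can be sketched in plain NumPy. This is a minimal illustration, not the paper's implementation: the pyramid levels, the scalar sigmoid gate, and the parameter vector `w` are assumptions chosen for clarity. Spatial pyramid pooling turns a variable-size feature map into a fixed-length vector; the fusion function weights the image and question vectors with a learned gate instead of simple concatenation.

```python
import numpy as np

def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map over an n x n grid for each
    pyramid level n, then concatenate the results into one fixed-length
    vector. Assumes H and W are at least max(levels)."""
    C, H, W = fmap.shape
    pooled = []
    for n in levels:
        hs = np.linspace(0, H, n + 1).astype(int)
        ws = np.linspace(0, W, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                cell = fmap[:, hs[i]:hs[i + 1], ws[j]:ws[j + 1]]
                pooled.append(cell.max(axis=(1, 2)))  # per-channel max
    # Output length: C * sum(n*n for n in levels), independent of H, W.
    return np.concatenate(pooled)

def attention_fuse(v, q, w):
    """Hypothetical gated fusion of an image vector v and a question
    vector q (same dimension). A scalar gate g in (0, 1), produced from
    the learned parameter vector w, weights the two modalities."""
    g = 1.0 / (1.0 + np.exp(-(w @ (v * q))))  # sigmoid attention gate
    return g * v + (1.0 - g) * q
```

With `levels=(1, 2, 4)` the pooled vector has length `C * (1 + 4 + 16) = 21C` regardless of input resolution, which is what lets a backbone like MobileNetV3 feed a fixed-size classifier from images of varying size.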
