LI Kuan, ZHANG Rongfen, LIU Yuhong, LU Xinxin. Visual question answering model based on attention feature fusion[J]. Microelectronics & Computer, 2022, 39(4): 83-90. DOI: 10.19304/J.ISSN1000-7180.2021.1102

Visual question answering model based on attention feature fusion

  • With the rise and continuing development of deep learning, significant progress has been made in the field of visual question answering (VQA). Most current VQA models introduce attention mechanisms and related iterative operations to extract correlations between image regions and high-frequency question words, but they capture the spatial-semantic association between the image and the question inefficiently, which limits answer accuracy. To address this, a visual question answering model based on the MobileNetV3 network and attention feature fusion is proposed. First, to optimize the image feature extraction module, the MobileNetV3 network is introduced and a spatial pyramid pooling structure is added, reducing the computational complexity of the network while preserving model accuracy. In addition, the output classifier is improved: its features are combined with an attention-based feature fusion method to raise question-answering accuracy. Finally, comparative experiments on the public VQA 2.0 dataset show that the proposed model outperforms current mainstream models.
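The two components named in the abstract can be sketched in plain NumPy. This is a minimal illustration, not the paper's implementation: the pyramid levels, the scalar sigmoid gate, and the parameter vector `w` are assumptions chosen for clarity. Spatial pyramid pooling turns a variable-size feature map into a fixed-length vector; the fusion function weights the image and question vectors with a learned gate instead of simple concatenation.

```python
import numpy as np

def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map over an n x n grid for each
    pyramid level n, then concatenate the results into one fixed-length
    vector. Assumes H and W are at least max(levels)."""
    C, H, W = fmap.shape
    pooled = []
    for n in levels:
        hs = np.linspace(0, H, n + 1).astype(int)
        ws = np.linspace(0, W, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                cell = fmap[:, hs[i]:hs[i + 1], ws[j]:ws[j + 1]]
                pooled.append(cell.max(axis=(1, 2)))  # per-channel max
    # Output length: C * sum(n*n for n in levels), independent of H, W.
    return np.concatenate(pooled)

def attention_fuse(v, q, w):
    """Hypothetical gated fusion of an image vector v and a question
    vector q (same dimension). A scalar gate g in (0, 1), produced from
    the learned parameter vector w, weights the two modalities."""
    g = 1.0 / (1.0 + np.exp(-(w @ (v * q))))  # sigmoid attention gate
    return g * v + (1.0 - g) * q
```

With `levels=(1, 2, 4)` the pooled vector has length `C * (1 + 4 + 16) = 21C` regardless of input resolution, which is what lets a backbone like MobileNetV3 feed a fixed-size classifier from images of varying size.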
