QIAN Mengying, TIAN Shengwei, ZHANG Liqiang, ZHANG Xinyu, MA Yuanyuan. Multimodal sarcasm recognition based on RCBA model[J]. Microelectronics & Computer, 2022, 39(6): 12-21. DOI: 10.19304/J.ISSN1000-7180.2021.1286

Multimodal sarcasm recognition based on RCBA model

Abstract: At present, most sarcasm recognition models work on text data alone, and the images contained in tweets are not used effectively, which limits the accuracy of sarcasm recognition. To address this problem, a joint neural network model with attention mechanisms, RCBA, is proposed for multimodal sarcasm recognition on mixed image-text data. The RCBA model first uses a 101-layer deep residual network (ResNet101) combined with spatial and channel attention mechanisms to adaptively extract image features; at the same time, an image attribute classifier extracts image attribute features. Next, the image attribute features are used as the initial state of a bidirectional long short-term memory network (Bi-LSTM) to extract text features. The image features, image attribute features, and text features are then fused by a two-layer neural network. Finally, a two-layer backpropagation (BP) network serves as the classifier to complete the sarcasm recognition task. RCBA is evaluated on a public image-text Twitter sarcasm dataset. Compared with the baseline model for image-text sarcasm recognition, accuracy and F1 score improve by 6.19% and 5.29%, respectively. The experimental results show that the RCBA model extracts multimodal features effectively and performs better on the sarcasm recognition task.
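The last two stages described in the abstract (a two-layer fusion network followed by a two-layer classifier) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not say how the three feature vectors are combined, so plain concatenation is assumed here, and the vector sizes, layer widths, activations, random untrained weights, and the names `dense`, `build_params`, and `fuse_and_classify` are all hypothetical. The feature vectors are random stand-ins for the ResNet101, attribute classifier, and Bi-LSTM outputs.

```python
import math
import random

def dense(x, weights, bias, activation):
    """Fully connected layer: activation(W x + b), with W stored as a list of rows."""
    return [activation(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (untrained placeholders)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def build_params(rng, n_in, n_fused=8, n_hidden=4):
    """Two fusion layers, then two classifier layers (all sizes are assumptions)."""
    return {
        "fuse1": init_layer(rng, n_in, n_fused),
        "fuse2": init_layer(rng, n_fused, n_fused),
        "cls1": init_layer(rng, n_fused, n_hidden),
        "cls2": init_layer(rng, n_hidden, 1),
    }

def fuse_and_classify(image_feat, attr_feat, text_feat, params):
    # Assumption: the three modality vectors are concatenated before fusion.
    x = image_feat + attr_feat + text_feat
    h = dense(x, *params["fuse1"], math.tanh)
    fused = dense(h, *params["fuse2"], math.tanh)   # fused multimodal representation
    h = dense(fused, *params["cls1"], math.tanh)
    prob = dense(h, *params["cls2"], sigmoid)[0]    # probability of "sarcastic"
    return fused, prob

rng = random.Random(0)
image_feat = [rng.random() for _ in range(6)]  # stand-in for ResNet101 + attention output
attr_feat = [rng.random() for _ in range(3)]   # stand-in for image attribute features
text_feat = [rng.random() for _ in range(5)]   # stand-in for Bi-LSTM output
params = build_params(rng, n_in=6 + 3 + 5)
fused, prob = fuse_and_classify(image_feat, attr_feat, text_feat, params)
print(len(fused), 0.0 < prob < 1.0)
```

In a trained model the weights would be learned by backpropagation and a threshold on `prob` (for example 0.5) would give the sarcastic or non-sarcastic label; the sketch only shows how the three feature streams flow into a single decision.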

     
