KANG Pingping, HOU Jing, ZHOU Haoran, CHEN Zirui, LI Chen. Multi-label image classification algorithm based on spatial attention and graph convolution[J]. Microelectronics & Computer, 2022, 39(5): 10-19. DOI: 10.19304/J.ISSN1000-7180.2021.1166

Multi-label image classification algorithm based on spatial attention and graph convolution

  • Abstract: Traditional multi-label image classification models struggle to generate high-level image features that closely match the relevant labels, and they do not exploit the visual correlation between labels, which limits recognition accuracy. To address these problems, this paper proposes a multi-label image classification algorithm based on spatial attention and graph convolution. First, a graph convolutional network learns from the label adjacency graph and from label embeddings obtained from the label sequence with the GloVe algorithm. Second, an improved spatial attention network is applied to the high-level semantic information to recalibrate category-specific semantic features and suppress background and interference information. Finally, a classifier based on co-occurrence feature fusion integrates the high-level semantic information with the label co-occurrence features extracted by the graph convolutional network, and the final prediction is made in a channel-wise one-to-one manner. Experiments on two public datasets show that the algorithm reaches an average precision of 81.42% on MS-COCO and 94.3% on VOC-2007, which is 1.13 and 1.3 percentage points higher, respectively, than the baseline MLGCN network. The model has only one-eighth as many parameters as the original model and needs far fewer training iterations, which greatly reduces training cost.
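The three stages named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the improved spatial attention network or the fusion classifier, so the sketch assumes a standard symmetric-normalized graph convolution, a softmax-weighted pooling over spatial positions as a stand-in for spatial attention, and a per-label dot product as one possible reading of "channel-wise one-to-one". All function names, matrices, and values below are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of the label graph."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def gcn_layer(a_norm, h, w):
    """One graph convolution step: ReLU(A_norm @ H @ W)."""
    out = matmul(matmul(a_norm, h), w)
    return [[max(0.0, v) for v in row] for row in out]

def spatial_attention_pool(feature_map):
    """Softmax-weighted pooling over spatial positions (assumed attention form).

    feature_map is a list of positions, each a list of channel values.
    A position's score is its mean activation, which is an assumption.
    """
    scores = [sum(pos) / len(pos) for pos in feature_map]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    channels = len(feature_map[0])
    pooled = [sum(w * pos[c] for w, pos in zip(weights, feature_map)) for c in range(channels)]
    return pooled, weights

def classify(pooled, label_features):
    """Per-label probability: sigmoid of the dot product, paired channel by channel."""
    logits = [sum(p * f for p, f in zip(pooled, feats)) for feats in label_features]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

# Toy inputs (all hypothetical): 3 labels, labels 0 and 1 co-occur.
adj = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # stand-in for GloVe vectors
weight = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]         # maps 2 dims to 3 channels
feature_map = [[0.2, 0.1, 0.0], [2.0, 1.0, 0.5]]    # 2 positions x 3 channels

a_norm = normalize_adjacency(adj)
label_features = gcn_layer(a_norm, embeddings, weight)
pooled, weights = spatial_attention_pool(feature_map)
probs = classify(pooled, label_features)
```

In this toy graph, the two co-occurring labels end up with identical label features after one graph convolution, which shows how co-occurrence information propagates into the classifier, and the attention weights favor the more strongly activated position.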

     
