Person re-identification combining grid mask and residual coordinate attention

Zhou Chuanhua; Xia Xudong; Zhou Dongdong; Zhou Zihan

doi:10.19304/J.ISSN1000-7180.2021.1269

Abstract

Traditional person re-identification methods rely on hand-crafted features, which are costly to design and hard to apply to recognition tasks in complex scenes. Applying deep learning to person re-identification lets the model learn features on its own, which markedly improves recognition accuracy while reducing cost. Deeper networks have stronger feature representation ability, but gradients tend to vanish as the number of layers grows. Residual networks alleviate the vanishing-gradient problem, yet the features they extract are not always used effectively. This paper optimizes the residual network by introducing a coordinate attention module, which strengthens high-contribution feature information and weakens low-contribution feature information to improve the network's feature representation ability. Another important factor affecting re-identification accuracy is that pedestrians are partially occluded in some images. The paper therefore introduces grid mask data augmentation, which reduces overfitting, improves generalization, and effectively alleviates the occlusion problem found in real scenes. Finally, the network is trained under the supervision of a hard triplet loss. Experimental results show that on the CUHK03-Label, CUHK03-Detect, Market-1501 and DukeMTMC-reID datasets the algorithm reaches Rank-1 accuracies of 78.7%, 75.8%, 95.7% and 89.6%, and mAP values of 78.7%, 76.3%, 73.1% and 88.2%, respectively.
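To make two of the components named in the abstract more concrete, the following is a minimal pure-Python sketch of a grid-style occlusion mask and a batch-hard variant of the triplet loss. It is an illustration under stated assumptions, not the authors' implementation: the abstract gives no mask parameters, margin, or distance metric, so `period`, `keep`, `margin=0.3`, the Euclidean distance, and all function names here are hypothetical choices. The mask is also simplified (no rotation, no random offsets).

```python
import math


def grid_mask(height, width, period, keep):
    """Build a height x width mask of 1s (kept) and 0s (dropped).

    Assumption: every period x period tile keeps a leading band of
    `keep` pixels along each axis and drops the remaining square, so
    the dropped regions form a regular grid of squares.
    """
    return [
        [0 if (y % period >= keep and x % period >= keep) else 1
         for x in range(width)]
        for y in range(height)
    ]


def apply_mask(image, mask):
    """Multiply a single-channel image (list of rows) by the mask."""
    return [
        [pixel * m for pixel, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    """Batch-hard triplet loss over one mini-batch.

    For each anchor, take the farthest sample with the same identity
    and the closest sample with a different identity, then apply the
    hinge max(0, margin + d_pos - d_neg). Anchors lacking a positive
    or a negative are skipped. The margin value is an assumption.
    """
    losses = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        pos = [euclidean(anchor, e)
               for j, (e, lab) in enumerate(zip(embeddings, labels))
               if lab == label and j != i]
        neg = [euclidean(anchor, e)
               for e, lab in zip(embeddings, labels)
               if lab != label]
        if not pos or not neg:
            continue
        losses.append(max(0.0, margin + max(pos) - min(neg)))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    # Show a small 4x4 mask with 2x2 tiles, row by row.
    for row in grid_mask(4, 4, period=2, keep=1):
        print(row)
    # Toy batch: two identities with interleaved 2-D embeddings.
    feats = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
    ids = [0, 0, 1, 1]
    print(batch_hard_triplet_loss(feats, ids))
```

In a training pipeline, a mask like this would typically be applied to each input image with some probability before the forward pass, and the loss would be computed on the embeddings produced by the network. Both choices are assumptions here, since the abstract does not describe them.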
