ZHU Z Y, CHENG Y Y. Text detection in street scene based on deep learning[J]. Microelectronics & Computer, 2023, 40(2): 79-86. doi: 10.19304/J.ISSN1000-7180.2022.0329


Text detection in street scene based on deep learning


    Abstract: To address the characteristics of text in natural street scenes, such as skewed angles, curved shapes, and variable lengths, an attention-based text detection method is proposed. By exploiting the attention mechanism to compute weighted fusions of the features extracted by the backbone network, the detection performance of the overall network is improved. First, to address the loss of feature information in the lateral connections of the feature pyramid network (FPN), an Attention Feature Fusion Module (AFFM) is introduced. It replaces the simple element-wise addition used in the original FPN by computing fusion weights for high- and low-level features, which reduces the loss of text information during FPN feature fusion and strengthens the feature extraction capability of the network. Second, a Subspace Attention Module (SAM) is introduced to handle text features in feature maps of different scales. The multi-scale fused feature map is divided along the channel dimension into several subspace feature maps, and the text feature weights in each subspace are learned separately, so that the fused feature map contains more text features at different scales. This enhances the representation of text instances in the fused feature maps and thereby improves the detection performance of the network. The model is evaluated on the public Total-Text dataset; experimental results show that, compared with the fast and efficient DBNet, the proposed algorithm improves precision, recall, and F-measure by 0.5%, 0.4%, and 0.4%, respectively.
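The two modules described in the abstract can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's implementation: the actual AFFM and SAM learn their weights during training, whereas here the fusion weight and per-subspace attention map are simply computed from pooled feature statistics with a sigmoid, to show the structure of "weighted fusion instead of direct addition" and "channel-wise subspace reweighting". All function names and shapes are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_feature_fusion(high, low):
    """Toy AFFM: instead of FPN's element-wise addition (high + low),
    derive a per-channel fusion weight from globally pooled statistics
    of both inputs and blend the two feature maps with it.
    (In the paper the weight is learned; here it is hand-computed.)"""
    stats = high.mean(axis=(1, 2)) + low.mean(axis=(1, 2))  # (C,)
    w = sigmoid(stats)[:, None, None]                       # weight in (0, 1)
    return w * high + (1.0 - w) * low                       # weighted fusion

def subspace_attention(fmap, groups=4):
    """Toy SAM: split the C channels into `groups` subspaces, compute a
    spatial attention map per subspace, and reweight each subspace
    independently so multi-scale text features are emphasized."""
    c = fmap.shape[0]
    assert c % groups == 0, "channels must divide evenly into subspaces"
    step = c // groups
    out = np.empty_like(fmap)
    for g in range(groups):
        sub = fmap[g * step:(g + 1) * step]                  # one subspace
        attn = sigmoid(sub.mean(axis=0, keepdims=True))      # 1 x H x W map
        out[g * step:(g + 1) * step] = sub * attn            # reweight
    return out

# Usage with random (C, H, W) feature maps standing in for FPN levels.
rng = np.random.default_rng(0)
high = rng.standard_normal((8, 16, 16))
low = rng.standard_normal((8, 16, 16))
fused = attention_feature_fusion(high, low)
refined = subspace_attention(fused, groups=4)
print(fused.shape, refined.shape)  # (8, 16, 16) (8, 16, 16)
```

Note the design point the abstract makes: the fusion weight lets the network trade off high- and low-level information per channel rather than summing them blindly, and the subspace split lets different channel groups attend to text at different scales.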

     

