Text detection method based on text enhancement and multi-branch convolution
Abstract:
Text detection in natural scenes is a prerequisite for many industrial applications, yet the accuracy of common detection methods is unsatisfactory. This paper therefore proposes a neural network method based on text enhancement and multi-branch convolution for detecting text in natural-scene images. First, a text-region enhancement structure is added in front of the backbone network; it raises the feature values of text regions in the shallow layers, strengthening the network's ability to learn text features while suppressing the expression of background features. Second, to handle the large variation in the aspect ratio of scene text, a multi-branch convolution module is designed whose kernels approximate the shape of text to express differentiated receptive fields; a lightweight attention mechanism, whose parameter count is only six times the number of channels, supplements the network's learning of channel importance. Finally, the loss function is improved in both its classification and bounding-box terms: text pixels are weighted, and the smallest rectangle covering the predicted box and the label box is introduced to express their degree of overlap, improving the effectiveness of training on text datasets. Ablation and comparison experiments show that each improvement is effective: the method achieves F-measures of 83.3% on the ICDAR2015 dataset and 82.4% on the MSRA-TD500 dataset, and performs well on difficult samples such as blurred text, text under specular highlights, and dense text.
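The multi-branch convolution with text-shaped kernels and the lightweight channel attention described above can be sketched as follows. This is a hypothetical PyTorch reconstruction from the abstract alone: the branch kernel sizes (3×3, 1×5, 5×1), the module names, and the internal design of the attention are assumptions. The paper states only that the attention uses about six times as many parameters as there are channels, which the six per-channel weight vectors below reproduce.

```python
import torch
import torch.nn as nn

class LightweightChannelAttention(nn.Module):
    """Hypothetical channel attention with ~6*C parameters: six per-channel
    vectors and no fully connected layers, matching the reported budget."""
    def __init__(self, channels: int):
        super().__init__()
        self.w_avg = nn.Parameter(torch.ones(channels))   # weight for avg-pooled stats
        self.w_max = nn.Parameter(torch.zeros(channels))  # weight for max-pooled stats
        self.b1 = nn.Parameter(torch.zeros(channels))     # first-stage bias
        self.w_gate = nn.Parameter(torch.ones(channels))  # gate weight
        self.b2 = nn.Parameter(torch.zeros(channels))     # gate bias
        self.scale = nn.Parameter(torch.ones(channels))   # output rescaling

    def forward(self, x):
        # Global average- and max-pooled channel descriptors, shape (N, C).
        avg = x.mean(dim=(2, 3))
        mx = x.amax(dim=(2, 3))
        z = torch.relu(self.w_avg * avg + self.w_max * mx + self.b1)
        gate = torch.sigmoid(self.w_gate * z + self.b2)
        # Reweight channels; broadcast gate back to (N, C, 1, 1).
        return x * (self.scale * gate).unsqueeze(-1).unsqueeze(-1)

class MultiBranchConv(nn.Module):
    """Parallel branches with kernels elongated like text lines (1xk, kx1)
    to cover the wide range of aspect ratios found in scene text."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.square = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.wide = nn.Conv2d(in_ch, out_ch, (1, 5), padding=(0, 2))
        self.tall = nn.Conv2d(in_ch, out_ch, (5, 1), padding=(2, 0))
        self.attn = LightweightChannelAttention(out_ch)

    def forward(self, x):
        y = self.square(x) + self.wide(x) + self.tall(x)
        return self.attn(torch.relu(y))
```

All three branches preserve spatial resolution, so their outputs can be summed before the attention gate is applied.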
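The improved loss described in the abstract can be illustrated with a small self-contained sketch. The GIoU-style use of the smallest rectangle enclosing both boxes, and the fixed text-pixel weight `w_text`, are assumptions read off the abstract; the paper's exact formulas may differ.

```python
import math

def enclosing_iou_loss(pred, gt):
    """Box loss penalising IoU by the smallest axis-aligned rectangle
    enclosing both boxes (GIoU-style). Boxes are (x1, y1, x2, y2)."""
    # Intersection and union of the two boxes.
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_p + area_g - inter
    iou = inter / union
    # Smallest rectangle covering both the prediction and the label box.
    cx1, cy1 = min(pred[0], gt[0]), min(pred[1], gt[1])
    cx2, cy2 = max(pred[2], gt[2]), max(pred[3], gt[3])
    c_area = (cx2 - cx1) * (cy2 - cy1)
    giou = iou - (c_area - union) / c_area
    return 1.0 - giou

def weighted_bce(p, y, w_text=2.0):
    """Per-pixel cross-entropy that up-weights text pixels; w_text is an
    illustrative value, as the paper's weighting scheme is not given here."""
    eps = 1e-7
    w = w_text if y == 1 else 1.0
    return -w * (y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
```

For identical boxes the box loss is 0, and it grows past 1 as the boxes separate, so disjoint predictions still receive a useful gradient signal, which plain IoU loss cannot provide.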
Key words:
- neural networks
- text detection
- digital image processing
- attention mechanism
- loss function
Table 1. Experimental results on ICDAR2015 dataset
Table 2. Experimental results of each component
ResNet   Text enhancement   Multi-branch conv + LCEM   Improved loss   P/%    R/%    F/%
√        ×                  ×                          ×               85.8   82.8   84.3
√        √                  ×                          ×               86.2   83.6   84.9
√        ×                  √                          ×               88.3   82.4   85.2
√        √                  √                          ×               87.7   83.6   85.6
√        √                  √                          √               88.4   83.9   86.1
Table 3. Experimental results on MSRA-TD500 dataset