Natural scene text detection based on LFN
-
Abstract:
In the field of natural scene text detection, existing deep learning networks still suffer from false detections, missed detections, and inaccurate localization of text. To address this problem, a text detection algorithm based on a Large Receptive Field Feature Network (LFN) is designed. First, the lightweight ShuffleNet V2, which offers a better balance of speed and accuracy, is selected as the backbone, and a fine-grained feature fusion module is added to capture more hidden text feature information. Second, by analyzing the receptive fields of feature maps at different scales and comparing how the feature-map size obtained after normalizing these maps affects the results, a double fusion feature extraction module is constructed to extract multi-scale features from the input image, reducing the loss of text features and enlarging the receptive field. Finally, to handle the imbalance between positive and negative samples, Dice Loss is introduced into the differentiable binarization module, improving the accuracy of text localization. Experiments on the ICDAR2015 and CTW1500 datasets show that the network significantly improves text detection in both accuracy and speed. On ICDAR2015 the F1 score is 86.1%, 0.4% higher than the best-performing PSENet, at a speed of 50 fps, about 1.92 times that of DBNet, the fastest competing method; on CTW1500 the F1 score is 83.2%, 1% higher than PSENet, at 35 fps, about 1.65 times the speed of EAST.
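As a rough illustration of the loss design described above, the sketch below computes Dice Loss on the approximate binary map produced by a differentiable binarization head in the style of DBNet. It is a minimal sketch, not the paper's implementation: the amplification factor k = 50, the smoothing constant eps, and the tensor shapes are illustrative assumptions.

```python
import torch

def differentiable_binarization(prob_map, thresh_map, k=50.0):
    # Approximate binary map B = 1 / (1 + exp(-k * (P - T))); k is an assumed
    # amplification factor, following the DBNet formulation.
    return torch.sigmoid(k * (prob_map - thresh_map))

def dice_loss(pred, target, eps=1e-6):
    # Dice Loss = 1 - 2|X ∩ Y| / (|X| + |Y|), which is largely insensitive to
    # the imbalance between (few) text pixels and (many) background pixels.
    pred = pred.contiguous().view(pred.size(0), -1)
    target = target.contiguous().view(target.size(0), -1)
    inter = (pred * target).sum(dim=1)
    union = pred.sum(dim=1) + target.sum(dim=1)
    return (1.0 - (2.0 * inter + eps) / (union + eps)).mean()

# Hypothetical usage on a batch of single-channel maps (shapes are assumptions).
P = torch.rand(2, 1, 160, 160)                    # probability map
T = torch.rand(2, 1, 160, 160)                    # threshold map
gt = (torch.rand(2, 1, 160, 160) > 0.9).float()   # ground-truth text mask
loss = dice_loss(differentiable_binarization(P, T), gt)
```

Because the loss is driven by the overlap ratio rather than per-pixel counts, the large background area does not dominate the gradient, which is the motivation for using it against the positive/negative sample imbalance.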
-
Table 1. Overall structure of the ShuffleNet V2 network
Layer | Output size | KSize | S (stride) | R (repeat) | Output channels (0.5×) | Output channels (1×)
Image | 224×224 | | | | 3 | 3
Conv1 | 112×112 | 3×3 | 2 | 1 | 24 | 24
Max-Pool | 56×56 | 3×3 | 2 | 1 | 24 | 24
Stage 2 | 28×28 | | 2 | 1 | 48 | 116
  | 28×28 | | 1 | 3 | |
Stage 3 | 14×14 | | 2 | 1 | 96 | 232
  | 14×14 | | 1 | 7 | |
Stage 4 | 7×7 | | 2 | 1 | 192 | 464
  | 7×7 | | 1 | 3 | |
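The backbone in Table 1 follows the standard ShuffleNet V2 design, whose building block splits the channels into two branches and re-mixes them with a channel shuffle after concatenation. The snippet below is a minimal sketch of that shuffle operation only (the input shape and groups = 2 are illustrative assumptions), not the full backbone used in LFN.

```python
import torch

def channel_shuffle(x, groups=2):
    # Reshape (N, C, H, W) -> (N, g, C/g, H, W), swap the group and channel
    # axes, then flatten back so information mixes across the two branches.
    n, c, h, w = x.size()
    x = x.view(n, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

# Example: a Stage-2 feature map of the 1x model in Table 1 (116 channels, 28x28).
feat = torch.randn(1, 116, 28, 28)
shuffled = channel_shuffle(feat, groups=2)   # same shape, channels interleaved
```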
Table 2. Ablation experiment on ICDAR2015
Method | Precision (%) | Recall (%) | F1 (%) | FPS
LFN-FGFF | 83.6 | 84.7 | 84.2 | 36
LFN-DFFE | 89.1 | 81.4 | 85.1 | 28
LFN | 89.7 | 82.8 | 86.1 | 50
PSENet | 86.9 | 84.5 | 85.7 | 1.6
Table 3. Normalization analysis on ICDAR2015
Size | Precision (%) | Recall (%) | F1 (%) | FPS
P2 | 90.1 | 78.4 | 83.8 | 30
P3 | 89.7 | 82.8 | 86.1 | 50
P4 | 87.2 | 65.6 | 74.9 | 53
P5 | 89.7 | 26.7 | 41.2 | 45
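Table 3 compares which pyramid scale the multi-scale feature maps are normalized to before prediction; P3 gives the best balance of F1 and speed, while the fine P2 scale is slow and the coarse P5 scale loses most small text. A minimal sketch of such a normalize-and-fuse step is given below, assuming FPN-style maps P2-P5 with 256 channels at strides 4, 8, 16 and 32 for a 640×640 input; the shapes and the concatenation-based fusion are illustrative assumptions, not the exact DFFE module.

```python
import torch
import torch.nn.functional as F

def normalize_and_fuse(feats, target_idx=1):
    # Resize every pyramid map to the spatial size of feats[target_idx]
    # (P3 by default, per Table 3) and concatenate along the channel axis.
    h, w = feats[target_idx].shape[-2:]
    resized = [F.interpolate(f, size=(h, w), mode='bilinear',
                             align_corners=False) for f in feats]
    return torch.cat(resized, dim=1)

# Hypothetical P2-P5 maps for a 640x640 input (strides 4, 8, 16, 32).
P2 = torch.randn(1, 256, 160, 160)
P3 = torch.randn(1, 256, 80, 80)
P4 = torch.randn(1, 256, 40, 40)
P5 = torch.randn(1, 256, 20, 20)
fused = normalize_and_fuse([P2, P3, P4, P5])   # -> (1, 1024, 80, 80)
```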
Table 4. Detection results on the ICDAR2015 dataset
-
[1] WANG R M, SANG N, DING D, et al. Text detection in natural scene image: a survey[J]. Acta Automatica Sinica, 2018, 44(12): 2113-2141. DOI: 10.16383/j.aas.2018.c170572.
[2] EPSHTEIN B, OFEK E, WEXLER Y. Detecting text in natural scenes with stroke width transform[C]//2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco: IEEE, 2010: 2963-2970.
[3] MATAS J, CHUM O, URBAN M, et al. Robust wide-baseline stereo from maximally stable extremal regions[J]. Image and Vision Computing, 2004, 22(10): 761-767. DOI: 10.1016/j.imavis.2004.02.006.
[4] TIAN S X, PAN Y F, HUANG C, et al. Text flow: a unified text detection system in natural scene images[C]//2015 IEEE International Conference on Computer Vision (ICCV). Santiago: IEEE, 2015: 4651-4659.
[5] XIE B H, QIN Y L, ZHANG Y J. Scene text detection based on learning active center contour model[J]. Computer Engineering, 2022, 48(3): 244-252. DOI: 10.19678/j.issn.1000-3428.0060828.
[6] YI R H, YANG S Q, WANG X Y, et al. Key technology and application of natural scene text detection[J]. Digital Printing, 2020(4): 1-11. DOI: 10.19370/j.cnki.cn10-1304/ts.2020.04.001.
[7] LI Y H, YAN J H, HU L. Natural scene text detection based on local and global dual-feature fusion[J]. Journal of Data Acquisition and Processing, 2022, 37(2): 415-425. DOI: 10.16337/j.1004-9037.2022.02.014.
[8] MA J Q, SHAO W Y, YE H, et al. Arbitrary-oriented scene text detection via rotation proposals[J]. IEEE Transactions on Multimedia, 2018, 20(11): 3111-3122. DOI: 10.1109/TMM.2018.2818020.
[9] LIAO M, SHI B, BAI X. TextBoxes++: a single-shot oriented scene text detector[J]. IEEE Transactions on Image Processing, 2018, 27(8): 3676-3690. DOI: 10.1109/TIP.2018.2825107.
[10] TIAN Z, HUANG W L, HE T, et al. Detecting text in natural image with connectionist text proposal network[C]//14th European Conference on Computer Vision. Amsterdam: Springer, 2016: 56-72.
[11] WANG W H, XIE E Z, LI X, et al. Shape robust text detection with progressive scale expansion network[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach: IEEE, 2019: 9328-9337.
[12] DENG D, LIU H F, LI X L, et al. PixelLink: detecting scene text via instance segmentation[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2018.
[13] LONG S B, RUAN J Q, ZHANG W J, et al. TextSnake: a flexible representation for detecting text of arbitrary shapes[C]//15th European Conference on Computer Vision. Munich: Springer, 2018: 19-35.
[14] MA N N, ZHANG X Y, ZHENG H T, et al. ShuffleNet V2: practical guidelines for efficient CNN architecture design[C]//15th European Conference on Computer Vision. Munich: Springer, 2018: 122-138.
[15] ZHANG X Y, ZHOU X Y, LIN M X, et al. ShuffleNet: an extremely efficient convolutional neural network for mobile devices[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City: IEEE, 2018: 6848-6856.
[16] LIAO M H, WAN Z Y, YAO C, et al. Real-time scene text detection with differentiable binarization[C]//Proceedings of the 34th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2020: 11474-11481.
[17] VATTI B R. A generic solution to polygon clipping[J]. Communications of the ACM, 1992, 35(7): 56-63. DOI: 10.1145/129902.129906.
[18] MILLETARI F, NAVAB N, AHMADI S A. V-Net: fully convolutional neural networks for volumetric medical image segmentation[C]//4th International Conference on 3D Vision. Stanford: IEEE, 2016: 565-571.
[19] SHRIVASTAVA A, GUPTA A, GIRSHICK R. Training region-based object detectors with online hard example mining[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 761-769.
[20] KARATZAS D, GOMEZ-BIGORDA L, NICOLAOU A, et al. ICDAR 2015 competition on robust reading[C]//2015 13th International Conference on Document Analysis and Recognition. Tunis: IEEE, 2015: 1156-1160.
[21] LIU Y L, JIN L W, ZHANG S T, et al. Detecting curve text in the wild: new dataset and new solution[J]. arXiv: 1712.02170, 2017.
[22] LI H, WANG X L, XIANG X G. Scene text detection based on triple segmentation[J]. Computer Science, 2020, 47(11): 142-147. DOI: 10.11896/jsjkx.200800157.
[23] ZHOU X Y, YAO C, WEN H, et al. EAST: an efficient and accurate scene text detector[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu: IEEE, 2017: 2642-2651.
-