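• Peking University Core Journal (A Guide to the Core Journals of China, 2017 edition)
  • Chinese Science and Technology Core Journal (source journal for Chinese scientific and technical paper statistics)
  • Indexed in the JST (Japan Science and Technology Agency) database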

Natural Scene Text Detection Based on LFN

LI Yao, ZHANG Jianxin, WANG Lin

Citation: LI Y, ZHANG J X, WANG L. Natural scene text detection based on LFN[J]. Microelectronics & Computer, 2023, 40(6): 17-24. doi: 10.19304/J.ISSN1000-7180.2022.0595


doi: 10.19304/J.ISSN1000-7180.2022.0595
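Funding: National Natural Science Foundation of China (NSFC21868019)
Details
    About the authors:

    LI Yao: female, (1995-), master's student. Research interests: image processing, object detection

    WANG Lin: male, (1973-), Ph.D., professor. Research interests: process control

    Corresponding author:

    Male, (1974-), Ph.D., professor. Research interests: complex process modeling and optimal control, intelligent control of production processes, object detection. E-mail: zhangjianxin@imut.edu.cn

  • CLC number: TP183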

Natural scene text detection based on LFN

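  • Abstract:

    In natural scene text detection, existing deep learning networks still produce false detections, missed detections, and inaccurate localization. To address this problem, this paper designs a text detection algorithm based on a Large Receptive Field Feature Network (LFN). First, the lightweight backbone ShuffleNet V2 is selected for its better speed and accuracy, and a fine-grained feature fusion module is added to capture more hidden text feature information. Next, by analyzing how the receptive field differs across feature maps of different scales, and by comparing how the feature-map size obtained after normalizing feature maps of different scales affects the results, a dual-fusion feature extraction module is constructed. It extracts multi-scale features from the input image to reduce the loss of text features and to enlarge the receptive field. Finally, to handle the imbalance between positive and negative samples, Dice Loss is introduced into the differentiable binarization module, which improves text localization accuracy. Experiments on the ICDAR2015 and CTW1500 datasets show that the network improves text detection in both accuracy and speed. On ICDAR2015, F1 is 86.1%, 0.4 percentage points higher than PSENet, the most accurate baseline, and the speed reaches 50 fps, about 1.92 times that of DBNet, the fastest baseline. On CTW1500, F1 is 83.2%, 1 percentage point higher than PSENet, and the speed reaches 35 fps, about 1.65 times that of EAST.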
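The last step of the abstract, Dice Loss applied to the output of a differentiable binarization module, can be illustrated with a minimal sketch. This is not the paper's implementation: the binarization follows the form given in the DBNet paper [16], with k = 50 taken from that paper, and the loss is the common soft Dice form, which may differ from the exact variant the authors use. The toy pixel values and the function names are hypothetical.

```python
import math


def differentiable_binarization(p, t, k=50.0):
    """Approximate binary value 1 / (1 + exp(-k * (p - t))).

    p is a probability-map value, t the matching threshold-map value.
    k = 50 is the amplification factor reported for DBNet (assumption here).
    """
    return 1.0 / (1.0 + math.exp(-k * (p - t)))


def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss over two flat sequences with values in [0, 1].

    loss = 1 - 2 * sum(pred * target) / (sum(pred) + sum(target)).
    The ratio depends on overlap rather than on the pixel count, which is
    why it is less sensitive to a large background (negative) class.
    """
    intersection = sum(p * g for p, g in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


# Toy, imbalanced example: 2 text pixels among 8 (hypothetical values).
prob = [0.9, 0.8, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
thresh = [0.3] * 8
target = [1, 1, 0, 0, 0, 0, 0, 0]

binary = [differentiable_binarization(p, t) for p, t in zip(prob, thresh)]
loss = dice_loss(binary, target)
print(f"Dice loss on the toy example: {loss:.6f}")
```

Because the two text pixels sit well above the threshold and the background pixels well below it, the approximate binary map nearly matches the target and the loss is close to zero. Predicting all zeros for the same target would drive the loss close to one, even though 6 of 8 pixels would be "correct".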

     

  • Figure 1.  Text detection network model based on LFN

    Figure 2.  ShuffleNet V2 unit

    Figure 3.  The improved ShuffleNet V2 unit structure

    Figure 4.  Grids at different scales

    Figure 5.  Generation of probability map P

    Figure 6.  Generation of threshold map T

    Figure 7.  Partial experimental results on ICDAR2015

    Table 1.  Overall structure of the ShuffleNet V2 network

    Layer     Output size  KSize  Stride  Repeat  Output channels
                                                  0.5×    1×
    Image     224×224                             3       3
    Conv1     112×112      3×3    2               24      24
    Max-Pool  56×56        3×3    2       1       24      24
    Stage 2   28×28               2       1       48      116
              28×28               1       3
    Stage 3   14×14               2       1       96      232
              14×14               1       7
    Stage 4   7×7                 2       1       192     464
              7×7                 1       3
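The output sizes in Table 1 follow from repeated stride-2 downsampling of the 224×224 input. A small sketch of that arithmetic, using the standard convolution output-size formula; the padding of 1 and the 3×3 kernel for the stride-2 unit of each stage are assumptions, since the table leaves those cells blank:

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


# (layer name, kernel, stride, padding). padding=1 and the 3x3 kernel
# for Stage 2-4 are assumptions, not values stated in Table 1.
layers = [
    ("Conv1", 3, 2, 1),
    ("Max-Pool", 3, 2, 1),
    ("Stage 2", 3, 2, 1),
    ("Stage 3", 3, 2, 1),
    ("Stage 4", 3, 2, 1),
]

size = 224  # input image side length from Table 1
sizes = {}
for name, kernel, stride, padding in layers:
    size = conv_output_size(size, kernel, stride, padding)
    sizes[name] = size
    print(f"{name:<9} {size}x{size}")
```

Only the first (stride-2) unit of each stage changes the spatial size; the repeated stride-1 units keep it, which is why each stage lists the same output size twice.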

    Table 2.  Ablation experiment on ICDAR2015

    Method    Precision (%)  Recall (%)  F1 (%)  FPS
    LFN-FGFF  83.6           84.7        84.2    36
    LFN-DFFE  89.1           81.4        85.1    28
    LFN       89.7           82.8        86.1    50
    PSENet    86.9           84.5        85.7    1.6
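F1 is the harmonic mean of precision and recall, so the F1 column of Table 2 can be checked against the other two columns. A short sketch of that check; the tolerance of 0.1 is an assumption that allows for the inputs being rounded to one decimal place:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) in percent, copied from Table 2.
rows = {
    "LFN-FGFF": (83.6, 84.7, 84.2),
    "LFN-DFFE": (89.1, 81.4, 85.1),
    "LFN": (89.7, 82.8, 86.1),
    "PSENet": (86.9, 84.5, 85.7),
}

for method, (p, r, reported) in rows.items():
    computed = f1_score(p, r)
    consistent = abs(computed - reported) <= 0.1
    print(f"{method:<9} computed={computed:.2f} reported={reported} ok={consistent}")
```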

    Table 3.  Normalization analysis on ICDAR2015

    Size  Precision (%)  Recall (%)  F1 (%)  FPS
    P2    90.1           78.4        83.8    30
    P3    89.7           82.8        86.1    50
    P4    87.2           65.6        74.9    53
    P5    89.7           26.7        41.2    45

    Table 4.  Detection results on the ICDAR2015 dataset

    Method                Precision (%)  Recall (%)  F1 (%)  FPS
    CTPN[10] (2016)       74.2           51.2        60.9    7.1
    SegLink[22] (2017)    73.1           76.8        75.0    -
    EAST[23] (2017)       83.6           73.5        78.2    13.2
    Textboxes++[9] (2018) 87.2           76.7        81.7    -
    PSENet[11] (2019)     86.9           84.5        85.7    1.6
    DBNet[16] (2019)      88.2           82.7        85.4    26
    LFN                   89.7           82.8        86.1    50

    Table 5.  Detection results on the CTW1500 dataset

    Method                Precision (%)  Recall (%)  F1 (%)  FPS
    CTPN[10] (2016)       60.4           53.8        56.9    7.14
    SegLink[22] (2017)    42.3           40.0        40.8    10.7
    EAST[23] (2017)       78.7           49.1        60.4    21.2
    CTD+TLOC[21] (2017)   77.4           69.8        73.4    13.3
    Textsnake[13] (2018)  67.9           85.3        75.6    -
    PSENet[11] (2019)     84.8           79.7        82.2    3.9
    LFN                   87.3           79.5        83.2    35
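The speed comparisons stated in the abstract (about 1.92 times DBNet on ICDAR2015 and about 1.65 times EAST on CTW1500) can be reproduced from the FPS columns of Tables 4 and 5. A minimal sketch of that arithmetic, which reads the stated figures as ratios of frame rates; the dictionary layout and function name are illustrative only:

```python
# FPS values copied from Table 4 (ICDAR2015) and Table 5 (CTW1500).
fps = {
    "ICDAR2015": {"LFN": 50, "DBNet": 26},
    "CTW1500": {"LFN": 35, "EAST": 21.2},
}


def speed_ratio(dataset, baseline):
    """Ratio of LFN's frame rate to the baseline's on the given dataset."""
    return fps[dataset]["LFN"] / fps[dataset][baseline]


icdar_ratio = speed_ratio("ICDAR2015", "DBNet")
ctw_ratio = speed_ratio("CTW1500", "EAST")
print(f"ICDAR2015: LFN / DBNet = {icdar_ratio:.2f}")
print(f"CTW1500:   LFN / EAST  = {ctw_ratio:.2f}")
```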
  • [1] WANG R M, SANG N, DING D, et al. Text detection in natural scene image: a survey[J]. Acta Automatica Sinica,2018,44(12):2113-2141. DOI: 10.16383/j.aas.2018.c170572.
    [2] EPSHTEIN B, OFEK E, WEXLER Y. Detecting text in natural scenes with stroke width transform[C]//2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco: IEEE, 2010: 2963-2970.
    [3] MATAS J, CHUM O, URBAN M, et al. Robust wide-baseline stereo from maximally stable extremal regions[J]. Image and Vision Computing,2004,22(10):761-767. DOI: 10.1016/j.imavis.2004.02.006.
    [4] TIAN S X, PAN Y F, HUANG C, et al. Text flow: a unified text detection system in natural scene images[C]//2015 IEEE International Conference on Computer Vision (ICCV). Santiago: IEEE, 2015: 4651-4659.
    [5] XIE B H, QIN Y L, ZHANG Y J. Scene text detection based on learning active center contour model[J]. Computer Engineering,2022,48(3):244-252. DOI: 10.19678/j.issn.1000-3428.0060828.
    [6] YI Y H, YANG S Q, WANG X Y, et al. Key technology and application of natural scene text detection[J]. Digital Printing,2020(4):1-11. DOI: 10.19370/j.cnki.cn10-1304/ts.2020.04.001.
    [7] LI Y H, YAN J H, HU L. Natural scene text detection based on local and global dual-feature fusion[J]. Journal of Data Acquisition and Processing,2022,37(2):415-425. DOI: 10.16337/j.1004-9037.2022.02.014.
    [8] MA J Q, SHAO W Y, YE H, et al. Arbitrary-oriented scene text detection via rotation proposals[J]. IEEE Transactions on Multimedia,2018,20(11):3111-3122. DOI: 10.1109/TMM.2018.2818020.
    [9] LIAO M, SHI B, BAI X. TextBoxes++: a single-shot oriented scene text detector[J]. IEEE Transactions on Image Processing,2018,27(8):3676-3690. DOI: 10.1109/TIP.2018.2825107.
    [10] TIAN Z, HUANG W L, HE T, et al. Detecting text in natural image with connectionist text proposal network[C]//14th European Conference on Computer Vision. Amsterdam: Springer, 2016: 56-72.
    [11] WANG W H, XIE E Z, LI X, et al. Shape robust text detection with progressive scale expansion network[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach: IEEE, 2019: 9328-9337.
    [12] DENG D, LIU H F, LI X L, et al. PixelLink: detecting scene text via instance segmentation[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2018.
    [13] LONG S B, RUAN J Q, ZHANG W J, et al. TextSnake: a flexible representation for detecting text of arbitrary shapes[C]//15th European Conference on Computer Vision. Munich: Springer, 2018: 19-35.
    [14] MA N N, ZHANG X Y, ZHENG H T, et al. ShuffleNet V2: practical guidelines for efficient CNN architecture design[C]//15th European Conference on Computer Vision. Munich: Springer, 2018: 122-138.
    [15] ZHANG X Y, ZHOU X Y, LIN M X, et al. ShuffleNet: an extremely efficient convolutional neural network for mobile devices[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City: IEEE, 2018: 6848-6856.
    [16] LIAO M H, WAN Z Y, YAO C, et al. Real-time scene text detection with differentiable binarization[C]//Proceedings of the 34th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2020: 11474-11481.
    [17] VATTI B R. A generic solution to polygon clipping[J]. Communications of the ACM,1992,35(7):56-63. DOI: 10.1145/129902.129906.
    [18] MILLETARI F, NAVAB N, AHMADI S A. V-Net: fully convolutional neural networks for volumetric medical image segmentation[C]//4th International Conference on 3D Vision. Stanford: IEEE, 2016: 565-571.
    [19] SHRIVASTAVA A, GUPTA A, GIRSHICK R. Training region-based object detectors with online hard example mining[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 761-769.
    [20] KARATZAS D, GOMEZ-BIGORDA L, NICOLAOU A, et al. ICDAR 2015 competition on robust reading[C]//2015 13th International Conference on Document Analysis and Recognition. Tunis: IEEE, 2015: 1156-1160.
    [21] LIU Y L, JIN L W, ZHANG S T, et al. Detecting curve text in the wild: new dataset and new solution[J]. arXiv: 1712.02170, 2017.
    [22] LI H, WANG X L, XIANG X G. Scene text detection based on triple segmentation[J]. Computer Science,2020,47(11):142-147. DOI: 10.11896/jsjkx.200800157.
    [23] ZHOU X Y, YAO C, WEN H, et al. EAST: an efficient and accurate scene text detector[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu: IEEE, 2017: 2642-2651.
Publication history
  • Received:  2022-09-26
  • Revised:  2022-10-27
