

Text detection method based on text enhancement and multi-branch convolution

TU Chengli, CHEN Zhangjin, QIAO Dong

Citation: TU Chengli, CHEN Zhangjin, QIAO Dong. Text detection method based on text enhancement and multi-branch convolution[J]. Microelectronics & Computer, 2022, 39(11): 69-77. doi: 10.19304/J.ISSN1000-7180.2022.0239


doi: 10.19304/J.ISSN1000-7180.2022.0239
Funding Project: National Natural Science Foundation of China (61674100)

Article information
    Author biographies:

    TU Chengli, male, born in 1996, M.S. candidate. His research interest is neural networks. E-mail: tuchengliic@163.com

    CHEN Zhangjin, male, born in 1969, Ph.D., professor. His research interests include deep learning and large-screen display.

    QIAO Dong, male, born in 1995, M.S. candidate. His research interest is neural networks.

  • CLC number: TP183


  • Abstract:

    Text detection in natural scenes is a prerequisite for many industrial applications, but the accuracy of commonly used detection methods is unsatisfactory. To address this, a neural network method based on text enhancement and multi-branch convolution is proposed for detecting text in natural scene images. First, a text-region enhancement structure is added before the backbone network; it raises the feature values of text regions in the shallow layers, strengthening the network's ability to learn text features while suppressing the expression of background features. Second, to handle the large variation in the aspect ratios of scene text, a multi-branch convolution module is designed that uses convolution kernels close to the shape of text to provide differentiated receptive fields; a lightweight attention mechanism, whose parameter count is only six times the number of channels, supplements the network's learning of channel importance. Finally, the loss function is improved in both the classification loss and the bounding-box loss: text pixels are weighted, and the minimum rectangle covering the predicted box and the ground-truth box is introduced to express the degree of overlap, which improves the effectiveness of training on text datasets. Ablation and comparison experiments show that each improvement is effective; the method achieves F-scores of 83.3% and 82.4% on the ICDAR2015 and MSRA-TD500 datasets, respectively, and performs well in comparisons on difficult samples such as blurred text, text with light spots, and dense text.
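The multi-branch module and the lightweight channel attention described in the abstract can be pictured concretely. Below is a minimal PyTorch sketch of one possible arrangement, assuming branch kernels of 1×1, 3×3, 1×5 and 5×1 to mimic wide and tall text, and a channel attention that learns only a per-channel scale and bias over three pooled statistics (3 × 2C = 6C parameters). The class names, kernel sizes and pooling statistics are illustrative assumptions, not the paper's actual LCEM implementation.

```python
import torch
import torch.nn as nn


class LightweightChannelAttention(nn.Module):
    """Hypothetical stand-in for the paper's LCEM: a per-channel scale and bias
    over avg/max/std pooled statistics, i.e. 3 * 2C = 6C learnable parameters."""

    def __init__(self, channels: int):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(3, channels))   # 3C parameters
        self.bias = nn.Parameter(torch.zeros(3, channels))    # 3C parameters

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        flat = x.flatten(2)                                    # (B, C, H*W)
        stats = torch.stack(
            [flat.mean(-1), flat.amax(-1), flat.std(-1)], dim=0
        )                                                      # (3, B, C)
        gate = (stats * self.weight.unsqueeze(1) + self.bias.unsqueeze(1)).sum(0)
        return x * torch.sigmoid(gate).view(b, c, 1, 1)        # channel reweighting


class MultiBranchConv(nn.Module):
    """Parallel branches with text-shaped (asymmetric) kernels giving
    differentiated receptive fields, followed by the channel attention."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        assert out_ch % 4 == 0
        bc = out_ch // 4
        self.b1 = nn.Conv2d(in_ch, bc, kernel_size=1)
        self.b2 = nn.Conv2d(in_ch, bc, kernel_size=3, padding=1)
        self.b3 = nn.Conv2d(in_ch, bc, kernel_size=(1, 5), padding=(0, 2))  # wide text lines
        self.b4 = nn.Conv2d(in_ch, bc, kernel_size=(5, 1), padding=(2, 0))  # tall text lines
        self.attn = LightweightChannelAttention(out_ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)
        return self.attn(y)
```

Similarly, the box-regression term that measures overlap through the minimum rectangle covering the predicted and ground-truth boxes is in the spirit of GIoU/DIoU-style losses (cf. [17]). The sketch below illustrates the idea for axis-aligned boxes only; the paper targets oriented scene text, so its actual formulation, and the weighting of text pixels in the classification loss, are not reproduced here.

```python
import torch


def enclosing_rect_iou_loss(pred: torch.Tensor, gt: torch.Tensor,
                            eps: float = 1e-7) -> torch.Tensor:
    """GIoU-style loss sketch for axis-aligned boxes given as (x1, y1, x2, y2).

    The minimum rectangle enclosing both boxes penalizes predictions that overlap
    the ground truth poorly even when plain IoU is identical (see Figure 5).
    Illustrative only: not the paper's loss for oriented text boxes."""
    # Intersection area
    ix1 = torch.max(pred[:, 0], gt[:, 0])
    iy1 = torch.max(pred[:, 1], gt[:, 1])
    ix2 = torch.min(pred[:, 2], gt[:, 2])
    iy2 = torch.min(pred[:, 3], gt[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)

    # Union area and plain IoU
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    union = area_p + area_g - inter
    iou = inter / (union + eps)

    # Minimum rectangle covering both boxes
    ex1 = torch.min(pred[:, 0], gt[:, 0])
    ey1 = torch.min(pred[:, 1], gt[:, 1])
    ex2 = torch.max(pred[:, 2], gt[:, 2])
    ey2 = torch.max(pred[:, 3], gt[:, 3])
    enclose = (ex2 - ex1) * (ey2 - ey1)

    giou = iou - (enclose - union) / (enclose + eps)
    return (1.0 - giou).mean()
```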

     

  • Figure 1.  Text detection process

    Figure 2.  Four-stage network structure

    Figure 3.  Multi-branch convolution module structure

    Figure 4.  LCEM module structure

    Figure 5.  Three cases with the same IoU value

    Figure 6.  Text ground-truth optimization

    Figure 7.  Loss decline curve

    Figure 8.  Text region strengthening process

    Figure 9.  Comparison results of three difficult samples

    Figure 10.  Comparison results of dense text

    Table 1.  Experimental results on the ICDAR2015 dataset

    Method Recall/% Precision/% F-score/%
    DMPNet[18] 68.2 73.2 70.6
    SegLink[4] 76.8 73.1 74.9
    EAST PVANET[19] 71.3 80.8 75.7
    RRPN[5] 73.2 82.1 77.4
    PSENet[9] 79.7 81.5 80.6
    Ref. [20] 80.3 81.7 80.9
    PixelLink[21] 81.7 82.9 82.3
    TextField[22] 80.5 84.3 82.4
    TextSnake[23] 80.4 84.9 82.5
    TEMC 81.6 85.3 83.3

    Table 2.  Experimental results of each component

    ResNet  Text enhancement  Multi-branch conv + LCEM  Improved loss  P/%  R/%  F/%
    √  ×  ×  ×  85.8  82.8  84.3
    √  √  ×  ×  86.2  83.6  84.9
    √  ×  √  ×  88.3  82.4  85.2
    √  √  √  ×  87.7  83.6  85.6
    √  √  √  √  88.4  83.9  86.1

    Table 3.  Experimental results on the MSRA-TD500 dataset

    Method Recall/% Precision/% F-score/%
    RRPN[5] 68.0 82.0 74.0
    EAST[19] 67.1 83.5 74.4
    PixelLink[21] 73.2 83.0 77.8
    TextSnake[23] 73.9 83.2 78.3
    PSENet[9] 75.6 80.6 78.0
    Ref. [20] 83.5 74.8 79.0
    CRAFT[25] 78.2 88.2 82.9
    TEMC 80.4 84.5 82.4
  • [1] NING C C, ZHOU H J, SONG Y, et al. Inception single shot MultiBox detector for object detection[C]//2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW). Hong Kong, China: IEEE, 2017: 549-554. DOI: 10.1109/ICMEW.2017.8026312.
    [2] REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149. DOI: 10.1109/TPAMI.2016.2577031.
    [3] TIAN Z, HUANG W L, HE T, et al. Detecting text in natural image with connectionist text proposal network[C]//Proceedings of the 14th European Conference on Computer Vision. Amsterdam: Springer, 2016: 56-72. DOI: 10.1007/978-3-319-46484-8_4.
    [4] SHI B G, BAI X, BELONGIE S. Detecting oriented text in natural images by linking segments[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 3482-3490. DOI: 10.1109/CVPR.2017.371.
    [5] MA J Q, SHAO W Y, YE H, et al. Arbitrary-oriented scene text detection via rotation proposals[J]. IEEE Transactions on Multimedia, 2018, 20(11): 3111-3122. DOI: 10.1109/TMM.2018.2818020.
    [6] HE P, HUANG W L, HE T, et al. Single shot text detector with regional attention[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 3066-3074. DOI: 10.1109/ICCV.2017.331.
    [7] HU H, ZHANG C Q, LUO Y X, et al. WordSup: Exploiting word annotations for character based text detection[C]//Proceedings of the IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 4950-4959. DOI: 10.1109/ICCV.2017.529.
    [8] HE W H, ZHANG X Y, YIN F, et al. Deep direct regression for multi-oriented scene text detection[C]//Proceedings of the IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 745-753. DOI: 10.1109/ICCV.2017.87.
    [9] WANG W H, XIE E Z, LI X, et al. Shape robust text detection with progressive scale expansion network[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 9328-9337. DOI: 10.1109/CVPR.2019.00956.
    [10] DU C, WANG C H, WANG Y N, et al. TextEdge: multi-oriented scene text detection via region segmentation and edge classification[C]//2019 International Conference on Document Analysis and Recognition (ICDAR). Sydney: IEEE, 2019: 375-380. DOI: 10.1109/ICDAR.2019.00067.
    [11] LIAO M H, ZHU Z, SHI B G, et al. Rotation-sensitive regression for oriented scene text detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 5909-5918. DOI: 10.1109/CVPR.2018.00619.
    [12] LYU P Y, LIAO M H, YAO C, et al. Mask TextSpotter: an end-to-end trainable neural network for spotting text with arbitrary shapes[C]//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich: Springer, 2018: 71-88. DOI: 10.1007/978-3-030-01264-9_5.
    [13] GUO J M, ZHANG S F, LI J M. Hash learning with convolutional neural networks for semantic based image retrieval[C]//20th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining. Auckland: Springer, 2016: 227-238. DOI: 10.1007/978-3-319-31753-3_19.
    [14] WU S T, ZHONG S H, LIU Y. Deep residual learning for image steganalysis[J]. Multimedia Tools and Applications, 2018, 77(9): 10437-10453. DOI: 10.1007/s11042-017-4440-4.
    [15] SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 2818-2826. DOI: 10.1109/CVPR.2016.308.
    [16] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(2): 318-327. DOI: 10.1109/TPAMI.2018.2858826.
    [17] ZHENG Z H, WANG P, LIU W, et al. Distance-IoU loss: faster and better learning for bounding box regression[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2020: 12993-13000. DOI: 10.1609/aaai.v34i07.6999.
    [18] LIU Y L, JIN L W. Deep matching prior network: toward tighter multi-oriented text detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 3454-3461. DOI: 10.1109/CVPR.2017.368.
    [19] ZHOU X Y, YAO C, WEN H, et al. East: an efficient and accurate scene text detector[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 2642-2651. DOI: 10.1109/CVPR.2017.283.
    [20] ZHAO P, XU B P, YAN S, et al. A scene text detection based on dual-path feature fusion[J]. Control and Decision, 2021, 36(9): 2179-2186. DOI: 10.13195/j.kzyjc.2020.0002. (in Chinese)
    [21] DENG D, LIU H F, LI X L, et al. PixelLink: detecting scene text via instance segmentation[C]//Proceedings of the AAAI Conference on Artificial Intelligence. New Orleans: AAAI, 2018. DOI: 10.1609/aaai.v32i1.12269.
    [22] XU Y C, WANG Y K, ZHOU W, et al. TextField: learning a deep direction field for irregular scene text detection[J]. IEEE Transactions on Image Processing, 2019, 28(11): 5566-5579. DOI: 10.1109/TIP.2019.2900589.
    [23] LONG S B, RUAN J Q, ZHANG W J, et al. TextSnake: A flexible representation for detecting text of arbitrary shapes[C]//Proceedings of the 2018 15th European Conference on Computer Vision. Munich: Springer, 2018: 19-35. DOI: 10.1007/978-3-030-01216-8_2.
    [24] PENG S D, JIANG W, PI H J, et al. Deep snake for real-time instance segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 8530-8539. DOI: 10.1109/CVPR42600.2020.00856.
    [25] BAEK Y, LEE B, HAN D, et al. Character region awareness for text detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 9357-9366. DOI: 10.1109/CVPR.2019.00959.
Publication history
  • Received: 2022-04-11
  • Revised: 2022-05-10
  • Published online: 2022-11-29
