• Peking University Core Journal (A Guide to the Core Journals of China, 2017 edition)
  • China Science and Technology Core Journal (statistical source journal for Chinese sci-tech papers)
  • Indexed by the JST (Japan Science and Technology Agency) database


Research on OCR model compression scheme for AIoT chips

GAN Zhiying, XU Dawen

GAN Zhiying, XU Dawen. Research on OCR model compression scheme for AIoT chips[J]. Microelectronics & Computer, 2022, 39(11): 110-117. doi: 10.19304/J.ISSN1000-7180.2022.0241


doi: 10.19304/J.ISSN1000-7180.2022.0241
Funding:

National Natural Science Foundation of China (General Program) 61874124

Details
    Author biographies:

    GAN Zhiying, female, born in 1996, master's student. Her research interest is deep learning acceleration. E-mail: gan_zhiying@163.com

    XU Dawen, male, born in 1986, Ph.D., associate professor. His research interests are general-purpose GPU computing and embedded systems

  • CLC number: TP183; TN492


  • Abstract:

    Deep-learning-based OCR models typically combine a CNN with an RNN/LSTM. Their heavy computation and large number of weight parameters mean that inference on edge devices requires substantial computing resources to meet performance targets. General-purpose processors such as CPUs and GPUs cannot satisfy the speed and power requirements at the same time, and their cost is high. With the spread of deep learning, neural processing units (NPUs) have become common in embedded and edge devices; they provide high-throughput compute for the matrix operations that dominate neural networks. Taking a CRNN-based OCR model as an example, this paper presents a solution targeting AIoT chips: two compression techniques, pruning and quantization, reduce parameter redundancy and computational cost while still yielding a model with high accuracy and robustness, allowing deployment on an NPU. Experimental results show that after fine-tuning the pruned model and quantizing its parameters, the quantized model at 78% sparsity loses no more than 3% accuracy, and the model size shrinks from 15.87 MB to 3.13 MB. Deploying the compressed model on the NPU achieves 28.87× and 6.1× latency speedups over CPU and GPU implementations, respectively.
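The pruning step described in the abstract — zeroing low-magnitude weights to reach a target sparsity before fine-tuning — can be sketched with a generic magnitude-based unstructured pruning routine. This is an illustrative NumPy sketch of the general technique, not the authors' exact NPU-aware procedure; the function name and the random weight matrix are assumptions for demonstration:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until the given fraction
    of zeros (sparsity) is reached -- unstructured pruning."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)          # number of weights to remove
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
pruned = magnitude_prune(w, sparsity=0.78)  # 78% sparsity, as in the paper
print(1.0 - np.count_nonzero(pruned) / pruned.size)  # ≈ 0.78
```

In practice the surviving weights are then fine-tuned (retrained with the mask fixed) to recover accuracy, which is the step Figures 5 and 6 evaluate.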

     

  • Figure 1.  Example of a fully connected network with unstructured pruning

    Figure 2.  Overall block diagram of the NPU architecture

    Figure 3.  OCR model compression workflow

    Figure 4.  NPU pruning workflow

    Figure 5.  Effect of two pruning strategies on model accuracy

    Figure 6.  Effect of fine-tuning after network pruning on model accuracy

    Figure 7.  Effect of parameter quantization on model accuracy

    Figure 8.  Effect of different compression methods on model accuracy

    Figure 9.  Effect of different compression methods on model size
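The parameter-quantization step evaluated in Figure 7 maps float32 weights to low-bit integers. A common scheme is symmetric per-tensor INT8 quantization, sketched below; the paper does not specify its exact quantizer, so this is an illustrative assumption, and the function names and test tensor are hypothetical:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor INT8 quantization: w ≈ scale * q."""
    scale = np.abs(w).max() / 127.0        # map the largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(1)
w = rng.normal(size=(256, 256)).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(dequantize(q, s) - w).max()
# Storage drops 4x (float32 -> int8); rounding error is bounded by scale/2,
# consistent with the small (<3%) accuracy loss reported in the abstract.
```

Combining ~4× size reduction from INT8 with 78% weight sparsity is what brings the model from 15.87 MB down to the reported 3.13 MB range.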

    Table 1.  Inference time of models at different sparsity levels

    Sparsity /%          0     20    50    62.5  78
    Inference time /ms   55.1  53.3  48.6  45.2  38.7
Publication history
  • Received:  2022-04-12
  • Revised:  2022-05-19
  • Published online:  2022-11-29
