Research on OCR model compression scheme for AIoT chips
-
Abstract:
Deep learning-based OCR models are usually built from a CNN combined with an RNN/LSTM. They are computationally intensive and carry many weight parameters, so inference on edge devices demands substantial computational resources to meet performance requirements. General-purpose processors such as CPUs and GPUs cannot satisfy processing-speed and power constraints at the same time, and they are costly. With the spread of deep learning, neural processing units (NPUs) have become common in embedded and edge devices, offering high-throughput compute for the matrix operations that dominate neural networks. Taking a CRNN-based OCR model as an example, this paper presents a solution targeted at AIoT chips: two compression algorithms, pruning and quantization, reduce the redundancy of the network parameters and cut the computational overhead while still yielding a compressed model with high accuracy and robustness, allowing the model to be deployed on an NPU. Experimental results show that after parameter quantization of the pruned and fine-tuned model, the accuracy of the quantized model at 78% sparsity drops by no more than 3%, while the model size shrinks from 15.87 MB to 3.13 MB. Deployed on the NPU, the compressed model achieves latency speedups of 28.87x and 6.1x over the CPU and GPU implementations, respectively.
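The pipeline described above (magnitude pruning to high sparsity, then parameter quantization) can be sketched in a few lines of PyTorch. This is a minimal illustration only: the `TinyCRNN` module, its layer sizes, and the use of `torch.nn.utils.prune` with post-training dynamic int8 quantization are our own assumptions, not the paper's actual implementation, which fine-tunes after pruning and deploys through an NPU toolchain rather than PyTorch's quantized kernels.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

class TinyCRNN(nn.Module):
    """A toy CRNN: CNN feature extractor -> bidirectional LSTM -> per-step classifier."""
    def __init__(self, num_classes=37, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.rnn = nn.LSTM(64 * 8, hidden, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                     # x: (N, 1, 32, W) grayscale text line
        f = self.cnn(x)                       # (N, 64, 8, W/4)
        f = f.permute(0, 3, 1, 2).flatten(2)  # (N, W/4, 512): width becomes time axis
        seq, _ = self.rnn(f)                  # (N, W/4, 2*hidden)
        return self.fc(seq)                   # (N, W/4, num_classes) CTC-style logits

model = TinyCRNN()

# Step 1: magnitude pruning -- zero out the 78% smallest-magnitude weights in
# each conv/linear layer (the fine-tuning between pruning steps is omitted here).
for m in model.modules():
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        prune.l1_unstructured(m, name="weight", amount=0.78)
        prune.remove(m, "weight")             # bake the sparsity mask into the weights

# Step 2: post-training dynamic quantization of the LSTM/Linear layers to int8.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.LSTM, nn.Linear}, dtype=torch.qint8
)

dummy = torch.randn(1, 1, 32, 128)            # one 32x128 text-line image
print(quantized(dummy).shape)                 # torch.Size([1, 32, 37])
```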
-
Key words:
- AIoT
- OCR
- model compression
- parameter quantization
- network pruning
-
Table 1. Inference time of models with different sparsity levels
| Sparsity /% | 0 | 20 | 50 | 62.5 | 78 |
| :--- | ---: | ---: | ---: | ---: | ---: |
| Inference time /ms | 55.1 | 53.3 | 48.6 | 45.2 | 38.7 |
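For reference, the headline ratios implied by these numbers can be checked with a couple of lines of arithmetic. The derived ratios below are our own back-of-the-envelope figures computed from the abstract and Table 1, not results quoted from the paper.

```python
# Sizes from the abstract, times from Table 1.
size_fp32_mb, size_int8_mb = 15.87, 3.13
print(f"compression ratio: {size_fp32_mb / size_int8_mb:.2f}x")           # ~5.07x

t_dense_ms, t_sparse78_ms = 55.1, 38.7   # inference time at 0% vs 78% sparsity
print(f"speedup from pruning alone: {t_dense_ms / t_sparse78_ms:.2f}x")   # ~1.42x
```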