HLS-Based Design and Optimization Methodology for Convolutional Neural Networks

Abstract: Using the high-level synthesis (HLS) design methodology for FPGAs, this paper implements a convolutional neural network accelerator on the ZYNQ-7020. Loop unrolling and pipelining are applied to optimize the convolution kernel computation, balancing logic resource usage against computational efficiency to maximize accelerator performance. The accelerator is evaluated on the MNIST dataset at a clock frequency of 100 MHz. The results show that, relative to the ARM Cortex-A9 general-purpose platform, the accelerator achieves a 3.77x speedup on a single image and up to a 6.14x speedup when 1000 images are processed as a stream.
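To illustrate how loop unrolling and pipelining are typically expressed in an HLS flow, the following is a minimal sketch of a single-channel convolution in C++ with Vivado HLS style pragmas. It is not the accelerator described in this paper: the function name `conv2d`, the 5x5 kernel size, the `float` data type, and the placement of the pragmas are all assumptions chosen for illustration. Only the 28x28 input size is taken from the MNIST image format. An ordinary C++ compiler ignores the pragmas, so the function can be checked in software before synthesis.

```cpp
// Illustrative sizes: 28x28 matches an MNIST image; the 5x5 kernel is an assumption.
constexpr int IMG_DIM = 28;
constexpr int K_DIM = 5;
constexpr int OUT_DIM = IMG_DIM - K_DIM + 1;  // 24: valid convolution, no padding

// Hypothetical single-channel convolution. The HLS pragmas are hints to the
// synthesis tool and are ignored by a standard C++ compiler.
void conv2d(const float in[IMG_DIM][IMG_DIM],
            const float kernel[K_DIM][K_DIM],
            float out[OUT_DIM][OUT_DIM]) {
    for (int r = 0; r < OUT_DIM; ++r) {
        for (int c = 0; c < OUT_DIM; ++c) {
// Pipelining: request that a new output pixel be started every clock cycle.
// The tool may report a larger achieved initiation interval.
#pragma HLS PIPELINE II=1
            float acc = 0.0f;
            for (int i = 0; i < K_DIM; ++i) {
// Unrolling: replicate the multiply-accumulate hardware for each kernel row.
#pragma HLS UNROLL
                for (int j = 0; j < K_DIM; ++j) {
// Unrolling: and for each kernel column, giving 25 parallel multipliers.
#pragma HLS UNROLL
                    acc += in[r + i][c + j] * kernel[i][j];
                }
            }
            out[r][c] = acc;
        }
    }
}
```

In this sketch, full unrolling of the two kernel loops trades logic resources (25 multipliers) for throughput, while the pipeline directive overlaps successive output pixels. Reducing the unroll factor would lower resource usage at the cost of more cycles per pixel, which is the kind of resource/efficiency balance the abstract refers to.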