SUI Yuanfeng, CHANG Liang, ZHAO Simeng, CHANG Yuchun. Hardware acceleration of convolutional neural network based on 3D-cube structure[J]. Microelectronics & Computer, 2021, 38(8): 34-39.


Hardware acceleration of convolutional neural network based on 3D-cube structure


    Abstract: Traditional convolutional neural networks require a large number of computing units and frequent data accesses, resulting in slow computation and low efficiency. This paper designs a new data-block structure that makes full use of data reuse, greatly reducing the number of data reads, and fully exploits the parallel computing resources of the FPGA so that multiple multiply-accumulate operations are carried out simultaneously, realizing an efficient parallel convolution circuit. The weight and bias parameters are separately fused and optimally quantized to reduce memory usage. With VGG16 as the test network on the ImageNet dataset, accuracy drops by only 0.02%; at 200 MHz, the throughput reaches 129.6 GOPS with a power consumption of only 5.26 W.
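The abstract's core idea of reading each input block once and reusing it across many parallel multiply-accumulate units can be illustrated in software. The sketch below is a minimal NumPy analogue of that dataflow, not the paper's exact 3D-cube hardware design: each input patch is fetched a single time and multiplied against all output-channel kernels at once (one matrix-vector product standing in for the parallel MAC array), instead of re-reading the patch once per output channel.

```python
import numpy as np

def conv2d_block_reuse(x, w):
    """Direct convolution with input-block reuse.

    x: input feature map, shape (C_in, H, W)
    w: kernels, shape (C_out, C_in, K, K)
    Each K x K x C_in input block is read once and consumed by all
    C_out kernels simultaneously (the matrix-vector product plays the
    role of the parallel multiply-accumulate array).
    """
    c_out, c_in, k, _ = w.shape
    _, h, wd = x.shape
    oh, ow = h - k + 1, wd - k + 1          # valid (no-padding) output size
    wf = w.reshape(c_out, -1)               # flatten kernels once, up front
    y = np.zeros((c_out, oh, ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            patch = x[:, i:i + k, j:j + k].ravel()  # one block fetch ...
            y[:, i, j] = wf @ patch                 # ... reused by all C_out kernels
    return y
```

In hardware terms, the inner matrix-vector product corresponds to C_out MAC units firing in the same cycle on a shared input block, which is what cuts the number of memory reads relative to computing each output channel independently.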

     
