Abstract:
To accelerate inference of recurrent neural networks (RNNs), the elapsed time on CPU, the sparsity of input vectors, and the parameter size of RNNs are analyzed. An RNN acceleration core for parallel matrix-sparse-vector multiplication is designed. Multiple input vectors are stored on chip so that part of the weight matrix can be reused, reducing the data bandwidth between DDR memory and on-chip SRAM. The RNN acceleration core is implemented in RTL using Verilog HDL, and a behavioral simulation environment is built that uses the parameters of a speech recognition algorithm, Deep Speech 2, as inputs to the acceleration core. An acceleration SoC is built on an FPGA with a MicroBlaze CPU and the RNN acceleration core; the MicroBlaze is responsible for computations such as activation functions and element-wise multiplication of vectors. When accelerating the RNN part of Deep Speech 2, a 23x speedup and 9.4x higher energy efficiency are achieved compared to the MicroBlaze alone.
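The weight-reuse idea described above can be illustrated in software. The following sketch (not the paper's RTL design; function and variable names are invented for illustration) streams each weight column once and reuses it across a batch of sparse input vectors, mirroring how the core amortizes one DDR fetch of the weights over multiple on-chip vectors while skipping zero input elements:

```python
import numpy as np

def batched_sparse_matvec(W, X_batch):
    """Multiply weight matrix W (n_out x n_in) by several sparse input
    vectors, streaming each weight column once and reusing it across
    the whole batch (fewer weight fetches), and skipping zero inputs
    (sparse-vector benefit)."""
    n_out, n_in = W.shape
    results = np.zeros((len(X_batch), n_out))
    for j in range(n_in):                  # fetch column j of W once
        col = W[:, j]
        for b, x in enumerate(X_batch):    # reuse it for every stored vector
            if x[j] != 0.0:                # skip multiplications by zero
                results[b] += x[j] * col
    return results
```

With a batch of B vectors held on chip, each weight element is fetched once instead of B times, which is the bandwidth reduction the abstract refers to.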