XIE Shuai, JIANG Li, YE Yaoyao. Multidimensional parallel FPGA accelerator design for real-time object detection[J]. Microelectronics & Computer, 2021, 38(8): 13-19.

Multidimensional parallel FPGA accelerator design for real-time object detection

Abstract: Object detection places high demands on both accuracy and real-time performance, and the YOLOv3-tiny network performs well on both counts. However, its complex network structure means that practical deployment requires targeted optimization in both software and hardware. To meet real-time requirements, three optimization techniques are combined. At the software level, batch normalization layers are fused to reduce the amount of computation, and low bit-width data is used to increase resource utilization. Multidimensional parallel FPGA computation cores are designed to match multiple convolutional layers and improve overall throughput. Fine-grained inter-layer pipelining and a ping-pong buffer design reduce data transfer time. On a ZCU104 FPGA, the design achieves a detection latency of 21 ms for 418 × 418 images, outperforming comparable accelerator designs, and improves DSP efficiency by 2.86 times or 8.81 times.
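
The batch normalization fusion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the authors' implementation, so this assumes the common inference-time approach of folding each batch normalization layer into the preceding convolution; the function names (`fold_batchnorm`, `conv_at_one_position`, `batchnorm`) and the flat per-channel weight layout are hypothetical and chosen only for illustration.

```python
import math


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-output-channel batch-norm parameters into conv weights and bias.

    weights holds one flat list of kernel coefficients per output channel
    (an assumed layout for this sketch, not the accelerator's format).
    """
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, mu, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)          # per-channel BN scale
        folded_w.append([w * scale for w in w_c])
        folded_b.append((b_c - mu) * scale + bt)
    return folded_w, folded_b


def conv_at_one_position(weights, bias, patch):
    """Evaluate the convolution at a single output position (a dot product)."""
    return [sum(w * x for w, x in zip(w_c, patch)) + b_c
            for w_c, b_c in zip(weights, bias)]


def batchnorm(values, gamma, beta, mean, var, eps=1e-5):
    """Apply inference-time batch normalization per output channel."""
    return [g * (y - mu) / math.sqrt(v + eps) + bt
            for y, g, bt, mu, v in zip(values, gamma, beta, mean, var)]
```

With the folded parameters, a single convolution pass produces the same result as a convolution followed by batch normalization, so the separate normalization step disappears at inference time. Under the assumptions above, this is where the reduction in computation would come from.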

     
