ZOU Z W,SUN W H,CHEN S. Algorithm optimization and hardware acceleration for YOLO post processing[J]. Microelectronics & Computer,2024,41(4):31-37. doi: 10.19304/J.ISSN1000-7180.2022.0896

Algorithm optimization and hardware acceleration for YOLO post processing

The YOLO series of object detection networks has been widely adopted for its high precision and low latency, but how to accelerate its post processing has not been fully studied. This work exploits the characteristics of YOLO to optimize the post processing algorithm: (1) the detect layer and post processing are merged by applying the threshold judgement in advance, which avoids redundant computation and communication; (2) hardware acceleration for post processing is realized on the basis of model quantization and a systolic array. Experiments show that the convolution computation of the detect layer of YOLOv3 and YOLOv5 is reduced by 87.3% - 99.9%. The hardware design is implemented on the Virtex Ultrascale+ VCU112 at a clock frequency of 100 MHz. Compared with the traditional computation process, the detect layer and post processing are sped up by a factor of 7.2 - 9.3, and selecting the 5 best boxes out of 3 000 candidates takes 1 736 μs. Compared with previous works, the design achieves a 4.7 - 5.0 times speedup of the detect layer and post processing, while only 9.9% - 10.5% of FFs are used in post processing. The optimization improves the overall inference speed of sparse YOLOv3 by 1.2% - 1.3%.
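The idea in point (1), judging the threshold before the rest of the detect layer is computed, can be illustrated with a small sketch. This is not the authors' implementation. It assumes a single anchor per grid cell, a 1x1 detect convolution written as plain dot products, an output layout of [x, y, w, h, objectness, classes...], and hypothetical names such as `detect_with_early_threshold`. It relies on two facts: the sigmoid is monotonic, so the raw objectness output can be compared against the logit of the confidence threshold, and the final score (objectness times class probability) can never exceed the objectness, so a cell rejected on objectness alone could not have passed the threshold anyway.

```python
import math


def logit(p):
    """Inverse sigmoid: maps a probability threshold to a raw-output threshold."""
    return math.log(p / (1.0 - p))


def dot(w, x):
    """Dot product of two equal-length sequences (one output channel of a 1x1 conv)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def detect_with_early_threshold(features, weights, biases, conf_threshold):
    """Compute detect-layer outputs only for cells whose objectness passes the threshold.

    features: one input vector per grid cell (assumed: single anchor per cell).
    weights, biases: one row / value per output channel, assumed layout
        [x, y, w, h, objectness, class_0, ...].
    Returns (kept, macs_done, macs_full), where kept is a list of
    (cell_index, raw_outputs) and the MAC counts compare early-exit with full compute.
    """
    OBJ = 4  # assumed index of the objectness channel
    raw_threshold = logit(conf_threshold)
    n_out = len(weights)
    n_in = len(weights[0])
    kept = []
    macs_done = 0
    for cell, x in enumerate(features):
        # Step 1: compute only the objectness channel.
        obj_raw = dot(weights[OBJ], x) + biases[OBJ]
        macs_done += n_in
        if obj_raw < raw_threshold:
            continue  # rejected early: the other channels are never computed
        # Step 2: the cell survived, so compute the remaining channels.
        outputs = [
            obj_raw if c == OBJ else dot(weights[c], x) + biases[c]
            for c in range(n_out)
        ]
        macs_done += n_in * (n_out - 1)
        kept.append((cell, outputs))
    macs_full = len(features) * n_in * n_out
    return kept, macs_done, macs_full
```

In this toy model the saving grows with the fraction of cells rejected at step 1, which is typically large because most grid cells contain no object. The figures it produces are only illustrative and are not comparable to the 87.3% - 99.9% reduction reported above.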