Citation: LI Y (李雨), GONG L Q (龚龙庆), ZHAO H T (赵海婷). Optimization and implementation of FPGA-based H.264 variable block motion estimation algorithm[J]. Microelectronics & Computer (微电子学与计算机), 2024, 41(6): 11-19. doi: 10.19304/J.ISSN1000-7180.2023.0943

Optimization and implementation of FPGA-based H.264 variable block motion estimation algorithm

  • Abstract: Variable Block Size Motion Estimation (VBSME) is an important component of the H.264 standard, but it is computationally heavy and time-consuming. To reduce both the time and the amount of computation that motion estimation requires, this paper implements VBSME in hardware and proposes a data-reuse scheme built on a tree structure for the Sum of Absolute Differences (SAD) calculation. The scheme makes the data flow between units explicit and simplifies the overall structure. The paper also examines the feasibility of running inter-frame Mode Decision (MD) in parallel with the SAD calculation and designs a corresponding parallel pipeline structure. Simulation verification was carried out on a Xilinx xc7v585tffg1761-1 development board. The results show that the scheme processes all 256 input pixels at once, which improves real-time performance and achieves 100% data utilization. The scheme supports resolutions up to 1920 × 1080 at 60 frames/s with low encoding latency, which meets the real-time requirements of most applications.
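The abstract does not spell out the tree structure itself, so the following is only a software sketch of the general idea behind SAD-tree data reuse, not the authors' hardware design. It assumes the standard H.264 partitioning of a 16×16 macroblock (256 pixels) into seven block sizes, which gives 41 SADs per candidate position. Pixels are read once to form the sixteen 4×4 SADs, and every larger SAD is then obtained by adding two smaller ones. The names `sad_4x4_grid` and `vbsme_sads` are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple

Block = List[List[int]]  # 16x16 luma samples, indexed [row][col]


def sad_4x4_grid(cur: Block, ref: Block) -> List[List[int]]:
    """Leaf level: SADs of the sixteen 4x4 sub-blocks, indexed [row][col]."""
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[y][x])
    return grid


def vbsme_sads(cur: Block, ref: Block) -> Dict[Tuple[int, int], List[int]]:
    """Return the SADs of all seven partition sizes, keyed by (width, height).

    Only the 4x4 level touches pixel data; each larger SAD is the sum of
    two SADs that were already computed one level below it.
    """
    s4 = sad_4x4_grid(cur, ref)
    # 8x4: two horizontally adjacent 4x4 blocks (4 rows x 2 columns)
    s8x4 = [[s4[r][2 * c] + s4[r][2 * c + 1] for c in range(2)] for r in range(4)]
    # 4x8: two vertically adjacent 4x4 blocks (2 rows x 4 columns)
    s4x8 = [[s4[2 * r][c] + s4[2 * r + 1][c] for c in range(4)] for r in range(2)]
    # 8x8: two vertically adjacent 8x4 blocks (2 rows x 2 columns)
    s8x8 = [[s8x4[2 * r][c] + s8x4[2 * r + 1][c] for c in range(2)] for r in range(2)]
    # 16x8: two horizontally adjacent 8x8 blocks (top half, bottom half)
    s16x8 = [s8x8[r][0] + s8x8[r][1] for r in range(2)]
    # 8x16: two vertically adjacent 8x8 blocks (left half, right half)
    s8x16 = [s8x8[0][c] + s8x8[1][c] for c in range(2)]
    # 16x16: the two 16x8 halves
    s16x16 = s16x8[0] + s16x8[1]

    def flat(grid: List[List[int]]) -> List[int]:
        return [v for row in grid for v in row]

    return {
        (4, 4): flat(s4),
        (8, 4): flat(s8x4),
        (4, 8): flat(s4x8),
        (8, 8): flat(s8x8),
        (16, 8): s16x8,
        (8, 16): s8x16,
        (16, 16): [s16x16],
    }
```

In this sketch each of the 256 absolute differences is computed once and contributes to every partition size that covers it, which is the sense in which a tree of adders reuses data. A hardware version would replace the loops with parallel adder stages, and the exact stage arrangement in the paper may differ from the ordering chosen here.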

     
