Citation: HE Fei-long, JIANG Lin, LIU Xin-chuang, SHAN Rui, WANG Yu, WU Hao-yue. An intra-prediction dynamic reconfigurable implementation of early termination unit partitioning[J]. Microelectronics & Computer, 2020, 37(2): 15-19.

An intra-prediction dynamic reconfigurable implementation of early termination unit partitioning

    Abstract: Dedicated hardware implementations of the High Efficiency Video Coding (HEVC) intra prediction algorithm consume substantial resources, and those resources cannot be reused, which limits flexibility. To address this, a reconfigurable video array processor is proposed that dynamically maps the intra prediction algorithm according to the characteristics of the current video sequence. First, the characteristics of the HEVC intra prediction algorithm and the feasibility of reconfiguration are analyzed, and the threshold for early termination of coding block partitioning is taken as the basis for hardware reconfiguration of the processor. Second, the computed parameters drive the reconfigurable array processor to reconfigure its hardware. Finally, the intra prediction algorithm is mapped onto the reconfigured array processor. The Planar and DC prediction modes were implemented on a 4×4 reconfigurable array. The results show that resource usage is reduced by 65% compared with a dedicated hardware implementation, and latency is reduced by 32% compared with a multi-core processor implementation.
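
For readers unfamiliar with the two prediction modes named in the abstract, the following is a minimal software sketch of how Planar and DC intra prediction compute a block from its neighbouring reference samples, following the sample equations of the HEVC standard. It is not the authors' array-processor mapping. The function names, the argument layout (`top`, `left`, `top_right`, `bottom_left`), and the omission of the standard's DC edge smoothing are simplifying assumptions made for illustration.

```python
from typing import List


def dc_predict(top: List[int], left: List[int]) -> List[List[int]]:
    """DC mode: fill the block with the rounded mean of the top and left neighbours.

    Assumption: the boundary smoothing that the standard applies to small
    luma blocks is left out to keep the sketch short.
    """
    n = len(top)
    assert len(left) == n and n & (n - 1) == 0, "block size must be a power of two"
    shift = n.bit_length()  # equals log2(n) + 1, i.e. division by 2n
    dc = (sum(top) + sum(left) + n) >> shift  # +n rounds to nearest
    return [[dc] * n for _ in range(n)]


def planar_predict(
    top: List[int], left: List[int], top_right: int, bottom_left: int
) -> List[List[int]]:
    """Planar mode: average of a horizontal and a vertical linear interpolation."""
    n = len(top)
    assert len(left) == n and n & (n - 1) == 0, "block size must be a power of two"
    shift = n.bit_length()  # log2(n) + 1
    pred = [[0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            # Interpolate between the left neighbour and the top-right sample.
            horiz = (n - 1 - x) * left[y] + (x + 1) * top_right
            # Interpolate between the top neighbour and the bottom-left sample.
            vert = (n - 1 - y) * top[x] + (y + 1) * bottom_left
            pred[y][x] = (horiz + vert + n) >> shift
    return pred


if __name__ == "__main__":
    # Hypothetical 4x4 block with made-up reference samples.
    print(dc_predict([10, 10, 10, 10], [20, 20, 20, 20]))
    print(planar_predict([50, 50, 50, 50], [50, 50, 50, 50], 50, 50))
```

Both modes use only additions, small multiplications, and shifts over the same reference samples, which is one plausible reason they suit mapping onto a small array of processing elements.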

     
