Design and Implementation of High-Throughput Parallel Decoding for Arbitrary Unstructured LDPC Codes Based on CUDA

Wang Ruotian (王若天); Sha Jin (沙金)

doi:10.19304/J.ISSN1000-7180.2021.0461



Abstract: Unstructured low-density parity-check (LDPC) codes have attracted wide attention because of their superior error-correction performance. However, their non-zero elements are distributed irregularly and their parity-check matrices are not built from cyclic or quasi-cyclic submatrices, which makes the decoder harder to design and implement. This paper proposes a CUDA-based decoder design that supports high-throughput parallel decoding of arbitrary unstructured LDPC codes. By compressing and rearranging the parity-check matrix and optimizing message storage, it implements efficient parallel decoding kernels on the GPU that decode multiple frames at once. Results on a GTX1660Ti GPU show that the log-likelihood-ratio belief propagation (LLR-BP) and normalized min-sum algorithm (NMSA) decoding kernels based on the two-phase message passing (TPMP) schedule reach throughputs of 78.88 to 360.25 Mbps and 174.38 to 1,323.75 Mbps, respectively, achieving efficient parallel decoding for arbitrary unstructured LDPC codes.
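The abstract does not give the kernel internals, so the following is only a rough CPU-side sketch of the ideas it names, under stated assumptions. It stores an arbitrary parity-check matrix in compressed sparse row (CSR) form, which is one common way to compress an unstructured H and is assumed here rather than taken from the paper. It then runs NMSA decoding with a two-phase (flooding) schedule over several independent frames. The function names `to_csr`, `nmsa_decode`, and `decode_frames`, the normalization factor `alpha = 0.75`, and the toy (7,4) Hamming matrix are all hypothetical illustration choices. The paper's matrix rearrangement and GPU memory layout are not reproduced.

```python
from typing import List, Tuple


def to_csr(H: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Compress a dense 0/1 parity-check matrix into CSR form (hypothetical helper).

    Edges of check node m are row_ptr[m] .. row_ptr[m+1]-1, and col_idx[e] is the
    variable node on edge e. Only non-zeros are stored, so no QC structure is needed.
    """
    row_ptr, col_idx = [0], []
    for row in H:
        col_idx.extend(n for n, h in enumerate(row) if h)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx


def nmsa_decode(row_ptr: List[int], col_idx: List[int], llr: List[float],
                alpha: float = 0.75, max_iter: int = 20) -> Tuple[List[int], int]:
    """Normalized min-sum decoding with a two-phase (flooding) schedule.

    llr[n] = log(P(bit n = 0) / P(bit n = 1)). Assumes every check has degree >= 2.
    Returns (hard-decision bits, iterations used), or iterations = -1 on failure.
    """
    n_checks = len(row_ptr) - 1
    v2c = [llr[n] for n in col_idx]   # variable-to-check message per edge
    c2v = [0.0] * len(col_idx)        # check-to-variable message per edge
    bits = [int(x < 0) for x in llr]
    for it in range(1, max_iter + 1):
        # Phase 1: check-node update; reads only v2c, so all checks are independent.
        for m in range(n_checks):
            edges = range(row_ptr[m], row_ptr[m + 1])
            for e in edges:
                sign, mag = 1.0, float("inf")
                for f in edges:
                    if f != e:
                        sign = -sign if v2c[f] < 0 else sign
                        mag = min(mag, abs(v2c[f]))
                c2v[e] = alpha * sign * mag
        # Phase 2: variable-node update; reads only c2v.
        app = list(llr)                # a-posteriori LLR per variable node
        for e, n in enumerate(col_idx):
            app[n] += c2v[e]
        for e, n in enumerate(col_idx):
            v2c[e] = app[n] - c2v[e]   # exclude the message's own contribution
        # Hard decision plus syndrome check for early termination.
        bits = [int(x < 0) for x in app]
        if all(sum(bits[col_idx[e]] for e in range(row_ptr[m], row_ptr[m + 1])) % 2 == 0
               for m in range(n_checks)):
            return bits, it
    return bits, -1


def decode_frames(row_ptr, col_idx, frames, **kw):
    # Frames are independent; a GPU kernel could map them to separate threads or
    # blocks (an assumption), whereas here they simply run one after another.
    return [nmsa_decode(row_ptr, col_idx, f, **kw) for f in frames]


if __name__ == "__main__":
    H = [[1, 1, 0, 1, 1, 0, 0],   # toy (7,4) Hamming code standing in for an
         [1, 0, 1, 1, 0, 1, 0],   # unstructured H
         [0, 1, 1, 1, 0, 0, 1]]
    row_ptr, col_idx = to_csr(H)
    frames = [[-3.0, 3.0, -0.5, 3.0, -3.0, -3.0, 3.0],  # codeword 1000110, bit 2 flipped
              [3.0, 3.0, 3.0, 3.0, -1.0, 3.0, 3.0]]     # all-zero codeword, bit 4 flipped
    print(decode_frames(row_ptr, col_idx, frames))
    # → [([1, 0, 0, 0, 1, 1, 0], 1), ([0, 0, 0, 0, 0, 0, 0], 1)]
```

Because phase 1 reads only `v2c` and phase 2 reads only `c2v`, every check node (and then every variable node) can be updated independently, which is why a two-phase schedule maps naturally onto many GPU threads. Replacing the min-sum rule with the tanh rule, 2·atanh(∏ tanh(x/2)), over the same edges would turn this sketch into LLR-BP.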
