Citation: LI D J, FENG X, WANG G X, et al. Flex-DMA: design of high-performance multi-transfer mode DMA system[J]. Microelectronics & Computer, 2024, 41(6): 103-114. doi: 10.19304/J.ISSN1000-7180.2023.0330

Flex-DMA: design of high-performance multi-transfer mode DMA system

Abstract: With the rapid development of data-intensive science and high-throughput applications, application-specific integrated circuit (ASIC) designs continue to emerge, and transfer systems are now expected to do more than move data. Some existing direct memory access (DMA) designs support efficient matrix transpose transfers, but they cannot serve complex memory access patterns and lack flexible configurability, which lowers computational efficiency. To address these problems, a configurable multi-mode transfer system, Flex-DMA, is designed. It comprises configurable registers and transfer channels and provides a basic mode and a Single Instruction Multiple Data (SIMD) mode. Flex-DMA can therefore select a transfer mode according to the data transfer requirements, flexibly configure data scale and data format, and support functions such as data vectorization conversion and matrix transpose transfer. A performance evaluation in a massively parallel simulation framework shows that Flex-DMA achieves an average speedup of 5.14 times in data vectorization processing. In addition, compared with the MT-DMA structure, Flex-DMA achieves an average performance improvement of 2.52 times in matrix transpose transfers. The experiments show that Flex-DMA can meet complex memory access patterns and transfer requirements, and can reorganize and preprocess data at low transfer latency.

     
