An overview of SRAM in-memory computing
Abstract: When running data-intensive applications such as deep neural networks, the frequent transfer of large volumes of data between the processor and the memory causes severe performance loss and energy consumption, and has become the biggest bottleneck of the current von Neumann architecture. To address this limitation, SRAM-based in-memory computing integrates computing units into the memory so that data can be processed where it is stored, breaking through the von Neumann bottleneck; it is therefore expected to become a new generation of intelligent computing architecture. This paper explains, from an architectural perspective, the "power wall" and "memory wall" problems caused by the von Neumann architecture, and gives the reasons for the rise of in-memory computing. Focusing on recent research on SRAM-based in-memory computing architectures at home and abroad, the paper takes several classical architectures as examples to describe the working mechanism, advantages and disadvantages, and significance of each class of SRAM in-memory computing, and summarizes the key factors affecting current SRAM in-memory computing technology at the device, circuit, and architecture levels. SRAM-based in-memory computing is a promising and versatile technology that will provide efficient, low-energy architectural support for machine learning, graph computing, and genetic engineering applications. Finally, the paper looks forward to the development of SRAM in-memory computing technology in devices, circuits, and architectures over the coming years.
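To make the "compute where the data sits" idea concrete, the sketch below is a purely software model (all names are hypothetical, not taken from any cited chip) of the classic bit-line trick used by many compute-SRAM designs: activating two word lines simultaneously lets each bit-line pair sense the bitwise AND and NOR of the two stored words, and bit-serial arithmetic can then be composed from those column-parallel primitives.

```python
# Illustrative software model of a compute-SRAM subarray (hypothetical, for
# intuition only): rows are stored words, columns are bit lines.

WORD = 8  # assume 8-bit words for this sketch

def to_bits(x, width=WORD):
    """Decompose an integer into a list of bits, LSB first."""
    return [(x >> i) & 1 for i in range(width)]

def from_bits(bits):
    """Reassemble an LSB-first bit list into an integer."""
    return sum(b << i for i, b in enumerate(bits))

class ComputeSramArray:
    """Software stand-in for one SRAM subarray."""
    def __init__(self, rows):
        self.rows = [to_bits(r) for r in rows]

    def dual_wordline_read(self, i, j):
        """Model activating word lines i and j together: a bit line stays
        high only if both cells hold 1, so it senses bitwise AND, while the
        complementary bit line senses NOR."""
        a, b = self.rows[i], self.rows[j]
        and_bits = [x & y for x, y in zip(a, b)]
        nor_bits = [1 - (x | y) for x, y in zip(a, b)]
        return and_bits, nor_bits

def in_memory_add(sram, i, j):
    """Bit-serial ripple-carry addition composed from logic primitives,
    mimicking how bit-serial in-memory designs build arithmetic; the
    result is truncated to the word width, as a fixed-width array would."""
    a, b = sram.rows[i], sram.rows[j]
    carry, out = 0, []
    for x, y in zip(a, b):
        out.append(x ^ y ^ carry)
        carry = (x & y) | (carry & (x ^ y))
    return from_bits(out)

sram = ComputeSramArray([0b10110010, 0b01100110])
and_bits, nor_bits = sram.dual_wordline_read(0, 1)
print(from_bits(and_bits))        # → 34 (bitwise AND of the two words)
print(in_memory_add(sram, 0, 1))  # → 24 ((178 + 102) mod 256)
```

Because every column evaluates its bit position at once, one dual-word-line access produces a whole word of results; the energy win comes from never moving the operands out of the array, which is precisely the data-transfer cost the abstract identifies as the von Neumann bottleneck.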
Key words:
- data-intensive applications
- von Neumann architecture
- SRAM
- in-memory computing
References:
[1] RACONTEUR. A day in data[EB/OL]. [2020-06-04]. https://www.raconteur.net/infographics/a-day-in-data/.
[2] REINSEL D, GANTZ J, RYDNING J. Data age 2025: The digitization of the world from edge to core[R]. IDC White Paper #US44413318.
[3] WANG J C, WANG X W, ECKERT C, et al. A 28-nm compute SRAM with bit-serial logic/arithmetic operations for programmable in-memory vector computing[J]. IEEE Journal of Solid-State Circuits, 2020, 55(1): 76-86. DOI: 10.1109/JSSC.2019.2939682.
[4] SI X, KHWA W S, CHEN J J, et al. A dual-split 6T SRAM-based computing-in-memory unit-macro with fully parallel product-sum operation for binarized DNN edge processors[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2019, 66(11): 4172-4185. DOI: 10.1109/TCSI.2019.2928043.
[5] SI X, TU Y N, HUANG W H, et al. A 28nm 64Kb 6T SRAM computing-in-memory macro with 8b MAC operation for AI edge chips[C]//2020 IEEE International Solid-State Circuits Conference (ISSCC). San Francisco, CA, USA: IEEE, 2020: 246-248. DOI: 10.1109/ISSCC19947.2020.9062995.
[6] SI X, CHEN J J, TU Y N, et al. A twin-8T SRAM computation-in-memory macro for multiple-bit CNN-based machine learning[C]//2019 IEEE International Solid-State Circuits Conference (ISSCC). San Francisco, CA, USA: IEEE, 2019: 396-398. DOI: 10.1109/ISSCC.2019.8662392.
[7] MIYASHITA D, KOUSAI S, SUZUKI T, et al. A neuromorphic chip optimized for deep learning and CMOS technology with time-domain analog and digital mixed-signal processing[J]. IEEE Journal of Solid-State Circuits, 2017, 52(10): 2679-2689. DOI: 10.1109/JSSC.2017.2712626.
[8] YANG J, KONG Y Y, WANG Z, et al. Sandwich-RAM: An energy-efficient in-memory BWN architecture with pulse-width modulation[C]//2019 IEEE International Solid-State Circuits Conference (ISSCC). San Francisco, CA, USA: IEEE, 2019: 394-396. DOI: 10.1109/ISSCC.2019.8662435.
[9] NGUYEN V T, KIM J S, LEE J W. 10T SRAM computing-in-memory macros for binary and multibit MAC operation of DNN edge processors[J]. IEEE Access, 2021, 9: 71262-71276. DOI: 10.1109/ACCESS.2021.3079425.
[10] ZHANG Y Q, XU L, DONG Q, et al. Recryptor: A reconfigurable cryptographic Cortex-M0 processor with in-memory and near-memory computing for IoT security[J]. IEEE Journal of Solid-State Circuits, 2018, 53(4): 995-1005. DOI: 10.1109/JSSC.2017.2776302.
[11] JIANG Z W, YIN S H, SEOK M, et al. XNOR-SRAM: In-memory computing SRAM macro for binary/ternary deep neural networks[C]//2018 IEEE Symposium on VLSI Technology. Honolulu, HI, USA: IEEE, 2018: 173-174. DOI: 10.1109/VLSIT.2018.8510687.
[12] DONG Q, JELOKA S, SALIGANE M, et al. A 0.3 V VDDmin 4+2T SRAM for searching and in-memory computing using 55 nm DDC technology[C]//2017 Symposium on VLSI Circuits. Kyoto, Japan: IEEE, 2017: C160-C161. DOI: 10.23919/VLSIC.2017.8008465.
[13] BISWAS A, CHANDRAKASAN A P. Conv-RAM: An energy-efficient SRAM with embedded convolution computation for low-power CNN-based machine learning applications[C]//2018 IEEE International Solid-State Circuits Conference (ISSCC). San Francisco, CA, USA: IEEE, 2018: 488-490. DOI: 10.1109/ISSCC.2018.8310397.
[14] GONUGONDLA S K, KANG M G, SHANBHAG N. A 42 pJ/decision 3.12 TOPS/W robust in-memory machine learning classifier with on-chip training[C]//2018 IEEE International Solid-State Circuits Conference (ISSCC). San Francisco, CA, USA: IEEE, 2018: 490-492. DOI: 10.1109/ISSCC.2018.8310398.
[15] ZHANG Z X, CHEN J J, SI X, et al. A 55nm 1-to-8 bit configurable 6T SRAM based computing-in-memory unit-macro for CNN-based AI edge processors[C]//2019 IEEE Asian Solid-State Circuits Conference (A-SSCC). Macau, China: IEEE, 2019: 217-218. DOI: 10.1109/A-SSCC47793.2019.9056933.
[16] GUO R Q, LIU Y G, ZHENG S Y, et al. A 5.1 pJ/neuron 127.3 us/inference RNN-based speech recognition processor using 16 computing-in-memory SRAM macros in 65nm CMOS[C]//2019 Symposium on VLSI Circuits. Kyoto, Japan: IEEE, 2019: C120-C121. DOI: 10.23919/VLSIC.2019.8778028.
[17] ECKERT C, WANG X W, WANG J C, et al. Neural Cache: Bit-serial in-cache acceleration of deep neural networks[C]//2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). Los Angeles, CA, USA: IEEE, 2018: 383-396. DOI: 10.1109/ISCA.2018.00040.
[18] YU C S, YOO T, KIM T T, et al. A 16K current-based 8T SRAM compute-in-memory macro with decoupled read/write and 1-5bit column ADC[C]//2020 IEEE Custom Integrated Circuits Conference (CICC). Boston, MA, USA: IEEE, 2020: 1-4. DOI: 10.1109/CICC48029.2020.9075883.
[19] XUE C X, CHEN W H, LIU Y S, et al. A 1Mb multibit ReRAM computing-in-memory macro with 14.6ns parallel MAC computing time for CNN-based AI edge processors[C]//2019 IEEE International Solid-State Circuits Conference (ISSCC). San Francisco, CA, USA: IEEE, 2019: 388-390. DOI: 10.1109/ISSCC.2019.8662395.