Citation: XU Dan-ni (徐丹妮), HE Zhan-zhuang (贺占庄). A Fault Tolerance Method Based on GPGPU[J]. Microelectronics & Computer (微电子学与计算机), 2014, 31(2): 18-22.

A Fault Tolerance Method Based on GPGPU

  • Abstract: To ensure the reliability of general-purpose computation on graphics hardware (GPGPU) programs running on CPU-GPU heterogeneous platforms, this paper designs a fault-tolerance model implemented in software. After analyzing how transient faults arise during the execution of a GPGPU program and the paths along which the resulting errors propagate, fault-tolerant designs are developed separately for the CPU side and the GPU side on which the program depends. Based on the runtime characteristics of GPGPU programs, an optimization scheme is also designed that reduces the computational overhead of fault tolerance while improving the system's ability to work cooperatively, so that the reliability of GPGPU programs is improved while the extra overhead introduced by the fault-tolerant design is reduced. Tests on typical examples verify the feasibility and performance of the proposed scheme.
