Citation: CHEN Zhi-jia, ZHU Yuan-chang, DI Yan-qiang, FENG Shao-chong. Research on Fault Tolerance Based on Self-Adaptive Backup Strategy in Cloud Training System[J]. Microelectronics & Computer, 2016, 33(2): 39-43.

Research on Fault Tolerance Based on Self-Adaptive Backup Strategy in Cloud Training System

    Abstract: To improve fault tolerance and reduce its overhead, a self-adaptive backup strategy suited to cloud training systems is proposed. The concept of a cloud training system and its fault tolerance architecture are first analyzed. This analysis identifies three problems the strategy must solve: node selection, the number of backups, and their location distribution. A node activity degree is introduced to evaluate whether a node needs backups; the number of backups to generate is derived from the fault tolerance requirements; and a weighted ascending matching algorithm determines the location distribution. In the experiments, metrics such as fault tolerance degree are introduced to evaluate the strategy. The results indicate that the self-adaptive backup strategy effectively ensures the fault tolerance capability of cloud training while reducing fault tolerance overhead.

     
