A Parallel Algorithm for Identifying Maximal Tandem Repeats in Long Sequences on Cluster Systems

Abstract: This paper presents a parallel algorithm for identifying maximal tandem repeats in long sequences, based on partitioning the suffix array. An appropriate partitioning scheme divides the suffixes of the sequence into several groups. Each group is processed independently on the cluster to produce a partial result, and the union of the partial results gives the maximal tandem repeats of the complete sequence. The algorithm reduces both processing time and space consumption. Experimental results show that it is adaptable and scalable.
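The partition, process, and union scheme described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes that suffixes are grouped by their leading characters, that a brute-force scan stands in for the suffix-array-based search, and that a local thread pool stands in for the cluster nodes. A repeat is reported as a triple (start, period, copies) with at least two copies that cannot be extended by another whole copy of the unit on either side; non-primitive units are not filtered out. All function names and parameters are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_suffixes(s, prefix_len=1):
    """Group suffix start positions by their first prefix_len characters.

    Assumption: the paper's partitioning policy is not given in the abstract;
    grouping by a short prefix is used here only as an example.
    """
    groups = defaultdict(list)
    for i in range(len(s)):
        groups[s[i:i + prefix_len]].append(i)
    return list(groups.values())


def repeats_in_group(s, starts):
    """Brute-force search for tandem repeats that start at positions in one group."""
    n = len(s)
    found = set()
    for i in starts:
        for p in range(1, (n - i) // 2 + 1):
            unit = s[i:i + p]
            # Not left-maximal: another copy of the unit sits immediately before i.
            if i >= p and s[i - p:i] == unit:
                continue
            # Extend to the right one whole copy at a time.
            copies = 1
            while s[i + copies * p:i + (copies + 1) * p] == unit:
                copies += 1
            if copies >= 2:
                found.add((i, p, copies))
    return found


def maximal_tandem_repeats(s, prefix_len=1, workers=4):
    """Process each suffix group independently and take the union of the results."""
    groups = partition_suffixes(s, prefix_len)
    # Threads stand in for cluster nodes; each group needs only s and its own starts.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = pool.map(lambda g: repeats_in_group(s, g), groups)
        return set().union(*partial)
```

Because every start position belongs to exactly one group, the union of the partial results equals the result of a single scan over the whole sequence, which is the property that lets the groups be processed independently.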