周溜溜, 业宁, 徐昇, 严敏利. 基于频繁子树挖掘的DNA重复序列识别方法[J]. 微电子学与计算机, 2011, 28(9): 193-196,201.
ZHOU Liu-liu, YE Ning, XU Sheng, YAN Min-li. Algorithm of Identification the DNA Repeat Sequence Based on Frequent Subtree Mining[J]. Microelectronics & Computer, 2011, 28(9): 193-196,201.

基于频繁子树挖掘的DNA重复序列识别方法

Algorithm of Identification the DNA Repeat Sequence Based on Frequent Subtree Mining

  • Abstract: This paper proposes a method for identifying DNA repeat sequences based on frequent subtree mining. Instead of the traditional sequence alignment approach, the method organizes the sequences as suffix trees, reduces the suffix trees so that they are better suited to subtree mining, and then applies frequent subtree mining to them. The algorithm directly identifies repeat sequences that meet a preset threshold, which avoids the time wasted on splicing short repeats together. A "secondary identification" technique lets the algorithm also identify fuzzy repeats well, which improves the completeness of identification. Experiments show that the method improves time efficiency compared with mainstream algorithms, with the advantage most pronounced for longer repeats, while remaining highly comparable in identification completeness.
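The idea of reading threshold-satisfying repeats directly off a suffix structure, rather than stitching short repeats together, can be illustrated with a small sketch. This is not the authors' algorithm: it swaps in a naive depth-limited suffix trie with occurrence counts in place of a reduced suffix tree and frequent subtree mining, and it does not cover the "secondary identification" of fuzzy repeats. The names `build_suffix_trie`, `frequent_repeats`, `min_len`, `min_count`, and `max_depth` are hypothetical and chosen only for illustration.

```python
def build_suffix_trie(seq, max_depth):
    """Insert every suffix of seq (truncated to max_depth) into a trie.

    Each node's "count" is the number of times the substring spelled by
    the path from the root to that node occurs in seq.
    """
    root = {"count": 0, "children": {}}
    for start in range(len(seq)):
        node = root
        for ch in seq[start:start + max_depth]:
            node = node["children"].setdefault(ch, {"count": 0, "children": {}})
            node["count"] += 1
    return root


def frequent_repeats(seq, min_len, min_count, max_depth=32):
    """Return {substring: occurrences} for repeats meeting both thresholds.

    A substring is reported when it occurs at least min_count times, is at
    least min_len long, and cannot be extended to the right by one more
    base without dropping below min_count (illustrative assumption).
    """
    root = build_suffix_trie(seq, max_depth)
    results = {}
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        # Only descend into children that are still frequent enough.
        frequent_children = [
            (ch, child)
            for ch, child in node["children"].items()
            if child["count"] >= min_count
        ]
        if not frequent_children and len(prefix) >= min_len:
            results[prefix] = node["count"]
        for ch, child in frequent_children:
            stack.append((child, prefix + ch))
    return results


repeats = frequent_repeats("ACGTACGTTTACGT", min_len=4, min_count=3)
print(repeats)
```

In this toy input, `ACGT` occurs three times and is the only substring of length at least 4 that meets the count threshold. The sketch only checks extension to the right, so with a smaller `min_len` it would also report suffixes of a repeat such as `CGT`; a fuller treatment would filter those out as well.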

     
