Citation: YANG Xiao-jun (杨晓军), WANG Yi-li (王一莉). Research of Chinese Word Segmentation Algorithms for GIS[J]. Microelectronics & Computer (微电子学与计算机), 2010, 27(7): 173-176,180.

Research of Chinese Word Segmentation Algorithms for GIS

  • Abstract: This paper proposes a Chinese word segmentation algorithm for the GIS domain. The dictionary is held in a "first-last Hash-Trie" structure: the first character and the tail category word (the last word of a geographic term) are managed with hash tables, and the remaining middle string is stored in a Trie. This reduces the depth of the Trie and gives efficient access to the geoscience dictionary. On top of this dictionary, an improved forward maximum matching (MM) algorithm is used to handle segmentation ambiguity and unregistered (out-of-vocabulary) words. Experimental results indicate that the algorithm provides effective semantic information for correctly understanding Chinese GIS query statements.
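The abstract does not give the data structure in detail, so the following is only a minimal sketch of how a first/last Hash-Trie dictionary and forward maximum matching could fit together. It is not the authors' implementation: the class and function names, the sample tail category words, and the sample place names are all assumptions made for illustration, and the matcher is plain forward maximum matching rather than the improved variant the paper describes (no ambiguity resolution or unregistered-word handling).

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)  # next middle character -> TrieNode
    tails: set = field(default_factory=set)       # tail words that may close a term here


class HashTrieDict:
    """Hypothetical first/last Hash-Trie dictionary (illustrative only)."""

    def __init__(self, tail_words):
        # Tail category words, longest first so a longer suffix wins over a shorter one.
        self.tail_words = sorted(tail_words, key=len, reverse=True)
        # Hash table: first character -> root of the trie of middle strings.
        self.first = {}

    def add(self, term):
        # Split the term into first character + middle string + tail category word.
        tail = next((t for t in self.tail_words
                     if term.endswith(t) and len(term) > len(t)), None)
        if tail is None:
            raise ValueError(f"no known tail category word in {term!r}")
        middle = term[1:len(term) - len(tail)]
        node = self.first.setdefault(term[0], TrieNode())
        for ch in middle:  # only the middle string goes into the trie
            node = node.children.setdefault(ch, TrieNode())
        node.tails.add(tail)

    def longest_match(self, text, start):
        """Length of the longest dictionary term starting at text[start], or 0."""
        node = self.first.get(text[start])
        if node is None:
            return 0
        best, pos = 0, start + 1
        while True:
            for tail in node.tails:
                if text.startswith(tail, pos):
                    best = max(best, pos + len(tail) - start)
            if pos >= len(text) or text[pos] not in node.children:
                return best
            node = node.children[text[pos]]
            pos += 1


def forward_max_match(text, dictionary):
    """Plain forward maximum matching; unmatched characters become single tokens."""
    tokens, i = [], 0
    while i < len(text):
        step = dictionary.longest_match(text, i) or 1
        tokens.append(text[i:i + step])
        i += step
    return tokens


# Usage with made-up sample data.
geo = HashTrieDict({"市", "路", "河"})
for term in ["北京市", "中山路", "中山北路", "黄河"]:
    geo.add(term)
print(forward_max_match("北京市中山北路在哪", geo))
# ['北京市', '中山北路', '在', '哪']
```

Because the first character and the tail word are looked up outside the trie, a four-character term such as 中山北路 needs only two trie levels in this sketch, which is one way to read the abstract's claim that the structure reduces Trie depth.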

     
