The Level of K-means Clustering Algorithm Based on the Minimum Spanning Tree
-
摘要:
针对K-means算法初始化时需要指定聚类数目, 和随机选择初始聚类中心对聚类结果产生不稳定的问题, 结合图论中最小生成树和层次算法的分裂、凝聚思想, 提出一种基于最小生成树的层次K-means算法.该算法初始时根据数据样本生成一颗最小生成树, 然后利用层次分裂思想把数据分成多个较小的簇, 通过K-means算法迭代操作得到每次操作的评价函数值来判断是否进行簇的合并, 进一步确定聚类簇数目.实验结果证明, 该算法能够较准确地判断聚类数目, 并且聚类结果的稳定性比基本K-means算法要好.
Abstract:Aiming at the number of clusters and the problem of the clustering results instability result of random selection of the initial cluster centers.Combine minimum spanning tree(MST) with the thinking of split and cohesion of hierarchical algorithm, proposed a MST-based hierarchical K-means algorithm. A MST is generated by sample data at the time of the algorithm initialization, and then use the thought of disunion, divide the data into smaller cluster. Gain the evaluation function value of each operation by iterative operation of the
K -means algorithm, determine whether to cluster merging with value of evaluation function, to futher determine the clustering number of cluster.Experimental results show that the algorithm can more accurately determine the number of clusters, and the stability of the clustering results of the algorithm is better than the basic K-means. -
表 1 实验数据集
数据集 数据个数 属性值 Iris 150 4 Wine 178 13 Glass 214 9 表 2 100次试验J值
算法数据集 Iris Wine Glass K-means算法 最大值 149.52 18 436.95 259.7 最小值 97.32 16 530.58 213.22 平均值 111.576 17 343.54 236.86 改进层次K-means 最大值 97.34 16 555.68 220.34 最小值 97.19 16 480.97 213.69 平均值 97.227 16 541.99 217.97 -
[1] 雷小锋, 谢昆青, 林帆, 等. 一种基于K-Means局部最优的高效聚类算法[J]. 软件学报, 2008, 19(7): 1683-1692. https://www.cnki.com.cn/Article/CJFDTOTAL-RJXB200807014.htm [2] 贺玲, 吴玲达, 蔡益朝. 数据挖掘中的聚类算法综述[J]. 计算机应用研究, 2007, 23(1): 10-13. https://www.cnki.com.cn/Article/CJFDTOTAL-JSYJ200701002.htm [3] 孙吉贵, 刘杰, 赵连宇. 聚类算法研究[J]. 软件学报, 2008, 19(1): 48-61. https://www.cnki.com.cn/Article/CJFDTOTAL-RJXB200801009.htm [4] 胡伟. 改进的层次K均值聚类算法[J]. 计算机工程与应用, 2013, 49(2): 157-159. https://www.cnki.com.cn/Article/CJFDTOTAL-JSGG201302041.htm [5] 徐沁, 罗斌. 结合means-shift与MST的K-means聚类算法[J]. 计算机工程, 2013, 39(12): 214-210. https://www.cnki.com.cn/Article/CJFDTOTAL-JSJC201312045.htm [6] Galluccio L, Michel O, Comon P, et al, Graph based kmeans clustering[J]. Signal Processing, 2012, 92(9): 1970-1984. [7] 李翔宇, 王开军, 郭躬德. 基于网格最小生成树的聚类算法选择[J]. 模式识别与人工智能, 2013, 26(1): 34-41. https://www.cnki.com.cn/Article/CJFDTOTAL-MSSB201301010.htm -