K-means Optimal Clustering Number Determination Method Based on Clustering Center Optimization

Abstract: The k-means algorithm clusters a data set for a fixed number of clusters k. In practice, k is usually assumed from prior rules of thumb, which makes the choice highly subjective. In addition, the initial cluster centers are generally chosen at random, so k-means often converges to a local optimum of the clustering criterion and produces unstable results. To address these two problems, we improve the selection of the initial k-means cluster centers using a density-based method and, building on this, propose a method for determining the optimal number of clusters. Experimental results show that the proposed method yields better clustering results, with higher accuracy, better stability, and better convergence.
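The two ideas summarized above, density-based seeding and choosing k by comparing clusterings, can be illustrated with a minimal sketch. It is not the method proposed in this paper: the abstract does not specify the density measure or the criterion used to compare values of k, so the sketch assumes a neighbor-count density within a fixed `radius` and the mean silhouette coefficient as the validity index. All function names (`density_seeds`, `kmeans`, `mean_silhouette`, `best_k`) are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def density_seeds(points, k, radius):
    """Pick k initial centers deterministically (assumed scheme, not the paper's).

    Density of a point = number of points within `radius` of it.
    The first seed is the densest point; each later seed maximizes
    density * distance to the nearest seed already chosen.
    """
    density = [sum(1 for q in points if dist(p, q) <= radius) for p in points]
    idx = range(len(points))
    seeds = [points[max(idx, key=lambda i: density[i])]]
    while len(seeds) < k:
        best = max(idx, key=lambda i: density[i]
                   * min(dist(points[i], s) for s in seeds))
        seeds.append(points[best])
    return seeds

def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            clusters[j].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

def mean_silhouette(clusters):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    total, n = 0.0, 0
    for i, cl in enumerate(clusters):
        for p in cl:
            n += 1
            if len(cl) == 1:
                continue
            a = sum(dist(p, q) for q in cl) / (len(cl) - 1)
            b = min(sum(dist(p, q) for q in other) / len(other)
                    for j, other in enumerate(clusters) if j != i and other)
            if max(a, b) > 0:
                total += (b - a) / max(a, b)
    return total / n

def best_k(points, k_max, radius):
    """Run seeded k-means for k = 2..k_max and return the best-scoring k."""
    scores = {}
    for k in range(2, k_max + 1):
        _, clusters = kmeans(points, density_seeds(points, k, radius))
        scores[k] = mean_silhouette(clusters)
    return max(scores, key=scores.get), scores

# Three well-separated groups of four points each (illustrative data only).
demo_points = [(0, 0), (0, 1), (1, 0), (1, 1),
               (10, 10), (10, 11), (11, 10), (11, 11),
               (20, 0), (20, 1), (21, 0), (21, 1)]
k, scores = best_k(demo_points, k_max=4, radius=2.0)
print(k, scores)
```

Because the seeds are chosen deterministically, repeated runs on the same data give the same clustering, which is the stability property the abstract attributes to improved initialization. The `radius` parameter is an assumption of this sketch and would need to be tuned to the scale of the data.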