Abstract:
In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer K, and the problem is to determine a set of K points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. In the direct k-means algorithm the initial centers are chosen randomly, and different initial centers can lead to different results. In this paper, to address this deficiency of the direct k-means algorithm, we propose a novel initialization method based on sorting and partitioning. We apply it to both real and simulated data, and the results show that it is a simple and efficient way to improve clustering accuracy and efficiency.
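The Python sketch below shows the objective (mean squared distance to the nearest center) and the direct k-means (Lloyd) iteration with two seeding rules. The abstract does not give the exact sorting key or partition rule, so `sort_partition_init` is a hypothetical stand-in and not the method proposed in the paper. It sorts points by the sum of their coordinates, cuts the sorted list into K contiguous groups of near-equal size, and uses each group's mean as an initial center. `random_init` plays the role of the random seeding used by direct k-means.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    d = len(points[0])
    return tuple(sum(p[j] for p in points) / len(points) for j in range(d))


def objective(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Mean squared distance from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points) / len(points)


def random_init(points: Sequence[Point], k: int, seed: int = 0) -> List[Point]:
    """Direct k-means seeding: k distinct data points chosen at random."""
    return random.Random(seed).sample(list(points), k)


def sort_partition_init(points: Sequence[Point], k: int) -> List[Point]:
    """HYPOTHETICAL sort-and-partition seeding (an assumption, not the paper's rule).

    Sort by coordinate sum, split into k contiguous chunks, return chunk means.
    Requires k <= len(points) so that every chunk is non-empty.
    """
    ordered = sorted(points, key=sum)
    n = len(ordered)
    bounds = [i * n // k for i in range(k + 1)]  # integer cut points
    return [mean(ordered[bounds[i]:bounds[i + 1]]) for i in range(k)]


def lloyd(points: Sequence[Point], centers: Sequence[Point], max_iter: int = 100) -> List[Point]:
    """Alternate nearest-center assignment and mean update until centers stop changing."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters: List[List[Point]] = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # Keep the old center if its cluster became empty.
        new = [mean(cl) if cl else c for cl, c in zip(clusters, centers)]
        if new == centers:
            break
        centers = new
    return centers


# Toy data: three small groups around (0, 0), (5, 5) and (10, 10).
points = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.4),
          (5.0, 5.0), (5.3, 4.8), (4.9, 5.4),
          (10.0, 10.0), (10.2, 9.7), (9.8, 10.1)]

if __name__ == "__main__":
    for name, init in [("random", random_init(points, 3)),
                       ("sort-partition", sort_partition_init(points, 3))]:
        final = lloyd(points, init)
        print(name, round(objective(points, init), 4), round(objective(points, final), 4))
```

Unlike random seeding, the sort-and-partition stand-in is deterministic and costs one O(n log n) sort. How well it works depends entirely on whether the chosen sorting key separates the groups, as the coordinate sum happens to do for this toy data.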