Abstract:
To address the poor classification accuracy that classifiers achieve on minority classes in imbalanced data sets, an improved k-means bi-directional sampling algorithm, KMBS, is proposed, and ensemble learning is applied to the classification algorithm. First, an improved k-means clustering algorithm divides the original data set into clusters. Second, within each cluster, the minority class is oversampled using a modified SMOTE algorithm and the majority class is under-sampled, so that the data set becomes balanced. Because repeated executions of the algorithm produce data sets that differ substantially from one another, multiple diverse classifiers can be trained on them, which improves the effect of ensemble learning. The experimental results show that the algorithm improves not only the overall classification performance but also the classification performance on minority-class samples.
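
The cluster-then-resample idea summarized above can be illustrated with a minimal sketch. It is not the KMBS algorithm itself: it assumes plain Lloyd's k-means in place of the improved clustering step, a simplified SMOTE-style interpolation between two randomly chosen minority samples of the same cluster in place of the modified SMOTE, random under-sampling of the majority class, and a binary labelling. The names kmeans, interpolate, balance_cluster and kmbs_resample, and the rule that both classes in a cluster meet at the midpoint of their counts, are hypothetical choices made for illustration only.

```python
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means (a stand-in, not the improved variant).

    Returns one cluster index per point.
    """
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; empty clusters keep their center.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def interpolate(minority, rng):
    """SMOTE-style synthetic point on the segment between two minority samples.

    Assumption: the partner is drawn at random from the same cluster,
    not from the k nearest neighbours as in standard SMOTE.
    """
    a, b = rng.choice(minority), rng.choice(minority)
    gap = rng.random()
    return [ai + gap * (bi - ai) for ai, bi in zip(a, b)]


def balance_cluster(minority, majority, rng):
    """Bi-directional sampling inside one cluster.

    Assumption: both classes are brought to the midpoint of their counts.
    Clusters where the minority is absent or already at least as large
    as the majority are returned unchanged.
    """
    if not minority or len(minority) >= len(majority):
        return list(minority), list(majority)
    target = (len(minority) + len(majority)) // 2
    synthetic = [interpolate(minority, rng) for _ in range(target - len(minority))]
    return list(minority) + synthetic, rng.sample(majority, target)


def kmbs_resample(X, y, minority_label=1, majority_label=0, k=2, seed=0):
    """Cluster the data, then balance the two classes cluster by cluster."""
    rng = random.Random(seed)
    labels = kmeans(X, k, rng)
    X_out, y_out = [], []
    for c in range(k):
        mino = [x for x, t, lab in zip(X, y, labels) if lab == c and t == minority_label]
        majo = [x for x, t, lab in zip(X, y, labels) if lab == c and t == majority_label]
        mino, majo = balance_cluster(mino, majo, rng)
        X_out += mino + majo
        y_out += [minority_label] * len(mino) + [majority_label] * len(majo)
    return X_out, y_out
```

Calling kmbs_resample several times with different seed values yields different resampled data sets, since the initial centers, the synthetic points and the retained majority samples all depend on the random generator. Each such data set could then be used to train one member of an ensemble, which is the source of classifier diversity described above.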