Abstract:
Traditional classification algorithms and technologies perform poorly on massive heterogeneous data: they scale poorly, require heavy computation, run slowly, and classify inaccurately. To address these problems, a Parallel Classification Hybrid Algorithm is designed by fusing Map-Reduce with the Nearest Neighbor Algorithm and computing weighted Euclidean distances in parallel, which improves the efficiency of massive data classification, raises the classification rate, and reduces resource cost. A Hadoop platform is built to test the feasibility of the algorithm on multiple data sets. The experimental results demonstrate that the Parallel Classification Hybrid Algorithm classifies massive data well and is a feasible model for massive data classification.
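
The general idea of combining Map-Reduce with nearest neighbor classification under a weighted Euclidean distance can be illustrated with a minimal single-process sketch. The abstract does not specify how the algorithm splits work between the map and reduce phases, so the decomposition below is an assumption: each map call scores one data partition and keeps its k nearest candidates, and the reduce step merges the candidates and takes a majority vote. The function names (`weighted_euclidean`, `map_partition`, `reduce_neighbors`, `classify`), the feature weights, and the toy data are all hypothetical, and plain Python lists stand in for Hadoop input splits.

```python
import heapq
import math
from collections import Counter


def weighted_euclidean(x, y, weights):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (x_i - y_i)^2)."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def map_partition(partition, query, weights, k):
    """Map step (assumed): return the k nearest (distance, label) pairs in one partition."""
    scored = (
        (weighted_euclidean(features, query, weights), label)
        for features, label in partition
    )
    return heapq.nsmallest(k, scored)


def reduce_neighbors(local_results, k):
    """Reduce step (assumed): merge per-partition candidates, vote among the global k nearest."""
    merged = heapq.nsmallest(k, (pair for result in local_results for pair in result))
    votes = Counter(label for _, label in merged)
    return votes.most_common(1)[0][0]


def classify(partitions, query, weights, k=3):
    """Run the map step over every partition, then reduce to one predicted label."""
    # On a cluster the map calls would run in parallel; here they run sequentially.
    local_results = [map_partition(p, query, weights, k) for p in partitions]
    return reduce_neighbors(local_results, k)


# Toy data: two partitions of (features, label) records; weights are illustrative only.
partitions = [
    [((1.0, 1.0), "a"), ((1.2, 0.8), "a")],
    [((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((0.9, 1.1), "a")],
]
weights = (1.0, 0.5)

if __name__ == "__main__":
    print(classify(partitions, (1.0, 1.0), weights, k=3))
    print(classify(partitions, (5.1, 5.0), weights, k=3))
```

Keeping only k candidates per partition bounds the data passed to the reduce step at k records per partition, regardless of partition size, which is one plausible reason such a decomposition suits large data sets.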