Citation: CHEN Xue-bin (陈学斌), WANG Shi (王师), DONG Yan-yan (董岩岩). Research on Parallel Classification Hybrid Algorithm for Big Data[J]. Microelectronics & Computer (微电子学与计算机), 2016, 33(4): 138-140.

Research on Parallel Classification Hybrid Algorithm for Big Data

Abstract: Traditional classification algorithms and techniques, when applied to massive heterogeneous data, suffer from poor scalability, heavy computation, long running times, and unsatisfactory classification results. To address these problems, a parallel classification hybrid algorithm suited to big data processing is designed by combining Map-Reduce with the nearest neighbor classification algorithm. Weighted Euclidean distances are computed in parallel, with the aim of improving classification efficiency on massive data, raising the recognition rate, and reducing resource overhead. A Hadoop cluster is built, and the feasibility of the algorithm is tested on multiple data sets. The experimental results show that the parallel classification hybrid algorithm achieves good classification performance on massive data and is a feasible model for massive data classification.
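The abstract does not give implementation details, so the following is only a minimal sketch of how a Map-Reduce style nearest neighbor classifier with weighted Euclidean distance might be organized. It assumes a k-nearest-neighbor majority vote, simulates data partitions as in-memory lists rather than running on Hadoop, and uses hypothetical names (`map_partition`, `reduce_neighbors`, `classify`) and illustrative feature weights that do not come from the paper.

```python
import heapq
import math
from collections import Counter


def weighted_euclidean(x, y, weights):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (x_i - y_i)^2)."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def map_partition(partition, query, weights, k):
    """Map step (assumed design): score one partition of training records
    and emit only its k nearest (distance, label) pairs."""
    scored = (
        (weighted_euclidean(features, query, weights), label)
        for features, label in partition
    )
    return heapq.nsmallest(k, scored)


def reduce_neighbors(local_results, k):
    """Reduce step (assumed design): merge the per-partition candidates,
    keep the global k nearest, and take a majority vote over their labels."""
    merged = heapq.nsmallest(
        k, (pair for result in local_results for pair in result)
    )
    votes = Counter(label for _, label in merged)
    return votes.most_common(1)[0][0]


def classify(partitions, query, weights, k=3):
    """Run the map step over every partition, then reduce to one label.
    On a real cluster each map call would run on a separate data split."""
    local_results = [map_partition(p, query, weights, k) for p in partitions]
    return reduce_neighbors(local_results, k)


# Toy data: two partitions of (features, label) records (illustrative only).
partitions = [
    [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((8.0, 9.0), "b")],
    [((0.9, 1.1), "a"), ((9.0, 8.5), "b"), ((8.5, 9.5), "b")],
]
weights = (0.5, 0.5)  # hypothetical feature weights

print(classify(partitions, (1.0, 1.0), weights, k=3))
```

Because each partition forwards at most k candidates, the reduce step handles only k times the number of partitions records, which is one plausible way such a design could keep resource overhead low as the training set grows.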

     
