WANG Lin, CHEN Qingchao. Research on topic discovery based on hadoop gray wolf optimized K-means algorithm[J]. Microelectronics & Computer, 2022, 39(4): 24-32. DOI: 10.19304/J.ISSN1000-7180.2021.0862

Research on topic discovery based on Hadoop gray wolf optimized K-means algorithm

  • Quickly and accurately discovering hot topics in massive network data plays an important role in monitoring online public opinion. To address the K-means algorithm's sensitivity to the choice of initial center points and its insufficient global search ability, an improved gray wolf optimized K-means algorithm based on Hadoop (IGWO-KM) is proposed. First, the algorithm combines the gray wolf optimization algorithm with K-means, using the fast convergence and global optimization ability of gray wolf optimization to search for the best clustering centers; this reduces the instability caused by randomly selecting the initial center points and yields better clustering results. Second, nonlinear convergence factors are used to improve the gray wolf optimization algorithm and to balance its global and local search capabilities. Third, the sine cosine algorithm is introduced and improved to strengthen the global search ability of the gray wolf optimization algorithm, improve optimization accuracy and convergence speed, and avoid falling into local optima. Fourth, a nearest-neighbor space sphere is used to reduce redundant distance calculations during K-means clustering and so speed up convergence. Finally, the algorithm is parallelized on a Hadoop cluster, which processes the data in batches. Experimental results show that IGWO-KM achieves better optimization accuracy and stability. Compared with the GWO-KM and K-means algorithms, it significantly improves precision, recall, and F value, and shows good convergence speed and scalability.
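The first two steps of the abstract (gray wolf search for the initial centers, followed by ordinary K-means) can be illustrated with a minimal single-machine sketch. It is not the authors' IGWO-KM implementation: the abstract gives no formulas, so the nonlinear convergence factor `a = 2(1 - (t/T)^2)` is an assumed form, the position update is the standard gray wolf rule, and the sine cosine step, the nearest-neighbor sphere pruning, and the Hadoop parallelization are all left out. The function names `sse`, `decode`, `gwo_centers`, and `kmeans` are hypothetical.

```python
import random

def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sum((p - c) ** 2 for p, c in zip(pt, ctr)) for ctr in centers)
               for pt in points)

def decode(wolf, k, dim):
    """Split a flat position vector into k centers of length dim."""
    return [wolf[i * dim:(i + 1) * dim] for i in range(k)]

def gwo_centers(points, k, n_wolves=10, iters=30, seed=0):
    """Search for initial centers with a basic gray wolf optimizer."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Each wolf encodes k candidate centers, initialised from random data points.
    wolves = [[x for pt in rng.sample(points, k) for x in pt]
              for _ in range(n_wolves)]
    fitness = lambda w: sse(points, decode(w, k, dim))
    best, best_fit = None, float("inf")
    for t in range(iters):
        wolves.sort(key=fitness)
        if fitness(wolves[0]) < best_fit:          # remember the best wolf so far
            best, best_fit = wolves[0][:], fitness(wolves[0])
        leaders = [w[:] for w in wolves[:3]]       # alpha, beta, delta
        a = 2.0 * (1.0 - (t / iters) ** 2)         # ASSUMED nonlinear decay from 2 toward 0
        for w in wolves:
            for j in range(len(w)):
                pos = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    pos += leader[j] - A * abs(C * leader[j] - w[j])
                w[j] = pos / 3.0                   # average of the three pulls
    final = min(wolves, key=fitness)
    if fitness(final) < best_fit:
        best = final
    return decode(best, k, dim)

def kmeans(points, centers, iters=20):
    """Plain Lloyd iterations starting from the given centers."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for pt in points:
            idx = min(range(len(centers)),
                      key=lambda i: sum((p - c) ** 2 for p, c in zip(pt, centers[i])))
            clusters[idx].append(pt)
        # An empty cluster keeps its previous center.
        centers = [[sum(col) / len(cl) for col in zip(*cl)] if cl else ctr
                   for cl, ctr in zip(clusters, centers)]
    return centers

if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    start = gwo_centers(data, k=2)
    print(kmeans(data, start))
```

The sketch tracks the best wolf seen so far because the standard update also moves the leaders, so the final population is not guaranteed to contain the best solution found during the search.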