Text classification model based on multi-channel attention mechanism
Abstract: To address the loss of key feature information, poor model performance, and weak classification results that sparse text features cause when convolutional neural networks (CNN) and recurrent neural networks (RNN) are applied to text classification, this paper proposes a text classification model based on a multi-channel attention mechanism. First, vector representations are built by fusing character-level and word-level embeddings. Then, a CNN channel and a BiLSTM channel extract the local features and the contextual information of the text, respectively. An attention mechanism weights the output of each channel to highlight how important each feature word is in its context. Finally, the channel outputs are fused and softmax is used to compute the text category probabilities. Comparative experiments on the dataset show that the proposed model classifies more accurately: relative to the single-channel models, its F1 score improves by 1.44% and 1.16%, respectively, verifying its effectiveness for text classification. The model offsets the respective weaknesses of CNN and BiLSTM feature extraction, mitigating the CNN's loss of word-order information and the gradient problems of the BiLSTM on long text sequences. It can therefore integrate the local and global features of a text, highlight key information, and obtain a more comprehensive text representation, making it well suited to text classification tasks.
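The per-channel attention weighting and softmax fusion described in the abstract can be sketched in a few lines of pure Python. This is a minimal illustration, not the paper's exact formulation: the toy hidden-state values, attention scores, and the fixed two-class logits are all invented for the example.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_pool(hidden_states, scores):
    """Weight each time step's hidden vector by its attention weight and
    sum, so the most informative feature words dominate the channel output."""
    weights = softmax(scores)
    dim = len(hidden_states[0])
    return [sum(w * h[i] for w, h in zip(weights, hidden_states))
            for i in range(dim)]

# Toy channel outputs: 3 time steps, 2-dimensional hidden vectors.
cnn_channel    = attention_pool([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                                scores=[0.1, 0.2, 2.0])
bilstm_channel = attention_pool([[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]],
                                scores=[1.5, 0.3, 0.3])

# Fuse the two channels by concatenation, then compute class
# probabilities with softmax (fixed toy logits stand in for the
# final dense layer).
fused = cnn_channel + bilstm_channel
logits = [sum(fused), -sum(fused)]
probs = softmax(logits)
```

With uniform attention scores, `attention_pool` reduces to plain mean pooling; non-uniform scores shift the pooled vector toward the highlighted time steps, which is exactly the "feature weighting" role the attention layer plays in each channel.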
Table 1. Word2vec word vector model parameters

Parameter   Value   Description
vec_size    200     vector dimensionality
min_count   5       minimum word frequency
win         4       context window size
alpha       0.001   learning rate
sg          1       1 = skip-gram model
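Collected as a configuration dict, the Table 1 settings map onto the parameter names of gensim's `Word2Vec` constructor. The training call in the comment is a hypothetical sketch; the corpus `sentences` is a placeholder, not something the paper provides.

```python
# Table 1 as a config dict. With gensim (4.x) installed, these keys map
# onto the Word2Vec constructor (hypothetical sketch -- `sentences` is a
# placeholder for the tokenized corpus):
#
#     from gensim.models import Word2Vec
#     model = Word2Vec(sentences, vector_size=200, min_count=5,
#                      window=4, alpha=0.001, sg=1)
W2V_PARAMS = {
    "vector_size": 200,  # vec_size: embedding dimensionality
    "min_count": 5,      # drop words seen fewer than 5 times
    "window": 4,         # win: context window size
    "alpha": 0.001,      # initial learning rate
    "sg": 1,             # 1 selects skip-gram (0 would be CBOW)
}
```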
Table 2. Hyperparameter settings of the MCA-CL model
Parameter        Value      Parameter    Value
max_len          50         activation   relu
kernel_size      3, 4, 5    dropout      0.5
num_filters      200        batch_size   128
lstm_units       128        epochs       30
attention_size   50         optimizer    adam
Table 3. Classification results of each type of model
Model                    Precision   Recall   F1-score
CNN [14]                 0.9056      0.9047   0.9048
TextCNN [15]             0.9144      0.9121   0.9123
TextCNN-Attention [16]   0.9326      0.9319   0.9320
LSTM [17]                0.9030      0.9014   0.9015
BiLSTM [18]              0.9278      0.9273   0.9274
LSTM-Attention [19]      0.9171      0.9154   0.9155
BiLSTM-Attention [20]    0.9352      0.9347   0.9348
SCA-CL                   0.9382      0.9379   0.9380
MCA-CL                   0.9465      0.9464   0.9464
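The abstract's 1.44% and 1.16% F1 improvements can be checked directly against Table 3: they are the percentage-point differences between MCA-CL's F1 and the two attention-augmented single-channel baselines. Reading "single-channel models" as TextCNN-Attention and BiLSTM-Attention is an interpretation, but it is the only pairing consistent with the quoted figures.

```python
# F1 scores taken from Table 3.
f1 = {
    "TextCNN-Attention": 0.9320,
    "BiLSTM-Attention": 0.9348,
    "MCA-CL": 0.9464,
}

# Differences in percentage points, matching the 1.44% and 1.16%
# improvements quoted in the abstract.
gain_cnn    = round((f1["MCA-CL"] - f1["TextCNN-Attention"]) * 100, 2)
gain_bilstm = round((f1["MCA-CL"] - f1["BiLSTM-Attention"]) * 100, 2)
```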
[1] YU Y, FU Y, WU X P. Summary of text classification methods[J]. Chinese Journal of Network and Information Security, 2019, 5(5): 1-8. DOI: 10.11959/j.issn.2096-109x.2019045.
[2] TIAN Y, MA W. Attention-BiLSTM-based fault text classification for power grid equipment[J]. Journal of Computer Applications, 2020, 40(S2): 24-29. DOI: 10.11772/j.issn.1001-9081.2020020180.
[3] SHAH K, PATEL H, SANGHVI D, et al. A comparative analysis of logistic regression, random forest and KNN models for the text classification[J]. Augmented Human Research, 2020, 5(1): 12. DOI: 10.1007/s41133-020-00032-0.
[4] CHEN G N, DAI Z B, DUAN J T, et al. Improved naive Bayes with optimal correlation factor for text classification[J]. SN Applied Sciences, 2019, 1(9): 1129. DOI: 10.1007/s42452-019-1153-5.
[5] HU J, LIU W, MA K. Text categorization of hypertension medical records based on machine learning[J]. Science Technology and Engineering, 2019, 19(33): 296-301. DOI: 10.3969/j.issn.1671-1815.2019.33.043.
[6] MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space[J]. Computer Science, 2013. arXiv: 1301.3781.
[7] MIKOLOV T, SUTSKEVER I, CHEN K, et al. Distributed representations of words and phrases and their compositionality[C]//Proceedings of the 26th International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2013: 3111-3119. DOI: 10.5555/2999792.2999959.
[8] GAO Y L, WU C, ZHU M. Short text classification model based on improved convolutional neural network[J]. Journal of Jilin University (Science Edition), 2020, 58(4): 923-930. DOI: 10.13413/j.cnki.jdxblxb.2019422.
[9] LI Y, DONG H B. Text sentiment analysis based on feature fusion of convolution neural network and bidirectional long short-term memory network[J]. Journal of Computer Applications, 2018, 38(11): 3075-3080. DOI: 10.11772/j.issn.1001-9081.2018041289.
[10] WANG G S, HUANG X J. Convolution neural network text classification model based on Word2vec and improved TF-IDF[J]. Journal of Chinese Computer Systems, 2019, 40(5): 1120-1126. DOI: 10.3969/j.issn.1000-1220.2019.05.040.
[11] TAO Z Y, LI X B, LIU Y, et al. Classifying short texts with improved-attention based bidirectional long memory network[J]. Data Analysis and Knowledge Discovery, 2019, 3(12): 21-29. DOI: 10.11925/infotech.2096-3467.2019.0267.
[12] WAN Q B, DONG F M, SUN S F. Text classification method based on BiLSTM-Attention-CNN hybrid neural network[J]. Computer Applications and Software, 2020, 37(9): 94-98. DOI: 10.3969/j.issn.1000-386x.2020.09.016.
[13] WANG L Y, LIU C H, CAI D B, et al. Chinese text sentiment analysis based on CNN-BiGRU network with attention mechanism[J]. Journal of Computer Applications, 2019, 39(10): 2841-2846. DOI: 10.11772/j.issn.1001-9081.2019030579.
[14] HOU X P, GAO Y. Research on the application of convolutional neural network CNN algorithm on text classification[J]. Science and Technology & Innovation, 2019(4): 158-159. DOI: 10.15913/j.cnki.kjycx.2019.04.158.
[15] SHI P Z, CHEN K T, ZHONG Y K, et al. Research on the classification method of Chinese ancient poems based on TextCNN[J]. Electronic Technology and Software Engineering, 2021(10): 190-192. https://www.cnki.com.cn/Article/CJFDTOTAL-DZRU202110092.htm
[16] ZHAO Y S, DUAN Y X. Convolutional neural networks text classification model based on attention mechanism[J]. Journal of Applied Sciences, 2019, 37(4): 541-550. DOI: 10.3969/j.issn.0255-8297.2019.04.011.
[17] ZHAO M, DU H F, DONG C C, et al. Diet health text classification based on word2vec and LSTM[J]. Transactions of the Chinese Society for Agricultural Machinery, 2017, 48(10): 202-208. DOI: 10.6041/j.issn.1000-1298.2017.10.025.
[18] HE Z Q, YANG J, LUO C L. Combination characteristics based on BiLSTM for short text classification[J]. Intelligent Computer and Applications, 2019, 9(2): 21-27. DOI: 10.3969/j.issn.2095-2163.2019.02.005.
[19] LAN W F, XU W, WANG D Z, et al. Text classification of Chinese news based on LSTM-attention[J]. Journal of South-Central University for Nationalities (Natural Science Edition), 2018, 37(3): 129-133. DOI: 10.3969/j.issn.1672-4321.2018.03.026.
[20] FENG B, ZHANG Y W, TANG X, et al. Power equipment defect record text mining based on BiLSTM-attention neural network[J]. Proceedings of the CSEE, 2020, 40(S1): 1-10. DOI: 10.13334/j.0258-8013.pcsee.200530.
[21] ZHOU M, SONG Y R, SONG B, et al. A D-BGRU text classification model incorporating self-attention mechanism[J/OL]. Microelectronics and Computers, 2021: 1-9 [2021-10-20]. http://kns.cnki.net/kcms/detail/61.1123.TN.20210914.1630.018.html.
[22] WU H Y, YAN J, HUANG S B, et al. CNN_BiLSTM_Attention hybrid model for text classification[J]. Computer Science, 2020, 47(S2): 23-27. DOI: 10.11896/jsjkx.200400116.