Research on the identification of desert plants in Xinjiang based on CNN and Swin Transformer
Keywords:
- plant identification
- convolutional neural network (CNN)
- Swin Transformer
- attention mechanism
Abstract: Under the dual influence of climate and environment, the desert areas of Xinjiang are prone to drought disasters that disrupt agricultural and animal husbandry production, which is detrimental to the sustainable development of Xinjiang's economy. Identifying the desert plants of Xinjiang is the basis on which plant researchers assess plant growth status, and it is a prerequisite for ecological conservation research and the implementation of management measures. The task is made difficult by the characteristics of Xinjiang desert plant images: high inter-class similarity, complex image backgrounds, and imbalanced data samples. To improve recognition accuracy, accurately locate important local features, and account for complex global information, this paper proposes a plant image recognition method that fuses a convolutional neural network (CNN) with a Swin Transformer network. The method combines the CNN branch's strength at extracting local features with the Swin Transformer's strength at capturing global representations; an improved Convolutional Block Attention Module (CBAM) is embedded in the CNN branch to fully extract discriminative local key features, and the Focal Loss function is used to address the data sample imbalance. Experimental results show that, on the Xinjiang desert plant dataset, the proposed fusion method extracts image features more adequately than either single-branch network, reaching a recognition accuracy of 97.99%, with precision, recall, and F1 score all surpassing existing methods. Finally, visualization analysis and the confusion matrix further corroborate the effectiveness of the method.
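As a concrete illustration of the dual-branch design described above, the following is a minimal PyTorch sketch, assuming a ResNet34 backbone for the CNN branch, a Swin-T backbone for the global branch (torchvision ≥ 0.13), and concatenation of the two feature vectors before a linear classifier. The class name `DualBranchNet` and all layer choices are illustrative; this is not the authors' implementation, which additionally embeds the improved CBAM in the CNN branch.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet34, swin_t

class DualBranchNet(nn.Module):
    """Illustrative CNN + Swin Transformer fusion network (a sketch, not the paper's code)."""
    def __init__(self, num_classes):
        super().__init__()
        cnn = resnet34(weights=None)
        # Keep everything up to and including global average pooling -> (B, 512, 1, 1)
        self.cnn = nn.Sequential(*list(cnn.children())[:-1])
        swin = swin_t(weights=None)
        swin.head = nn.Identity()              # expose the 768-dim pooled features
        self.swin = swin
        self.classifier = nn.Linear(512 + 768, num_classes)

    def forward(self, x):
        f_local = self.cnn(x).flatten(1)       # local features from the CNN branch
        f_global = self.swin(x)                # global representation from the Swin branch
        return self.classifier(torch.cat([f_local, f_global], dim=1))
```

A forward pass on a batch of 224 × 224 RGB images, e.g. `DualBranchNet(num_classes=25)(torch.randn(2, 3, 224, 224))`, returns one logit per plant species; the actual number of classes in the dataset is not stated in this excerpt, so 25 is a placeholder.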
Table 1. Influence of different components on classification results

| Method | ResNet34 | Swin-Transformer | DA | Dy-CBAM | Accuracy/% | Precision/% | Recall/% | F1-Score/% |
|---|---|---|---|---|---|---|---|---|
| ① | √ | | | | 92.38 | 91.77 | 91.48 | 91.62 |
| ② | √ | | √ | | 94.46 | 94.40 | 93.87 | 94.12 |
| ③ | | √ | | | 96.74 | 96.82 | 96.33 | 96.57 |
| ④ | | √ | √ | | 97.10 | 96.83 | 96.85 | 96.84 |
| ⑤ | √ | √ | √ | | 97.87 | 97.75 | 97.66 | 97.70 |
| ⑥ | √ | √ | | √ | 97.75 | 97.74 | 97.59 | 97.66 |
| ⑦ (ours) | √ | √ | √ | √ | 97.99 | 97.89 | 97.85 | 97.87 |
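The excerpt does not detail the internals of the improved Dy-CBAM module. For reference, a standard CBAM block (channel attention followed by spatial attention) is sketched below, with a comment marking where a dynamic activation such as Dynamic ReLU would plausibly replace the plain ReLU in the channel MLP; that substitution is an assumption about what "Dy" denotes, not a statement of the authors' design.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),   # Dy-CBAM plausibly swaps this for Dynamic ReLU (assumption)
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))     # MLP over global-average-pooled features
        mx = self.mlp(x.amax(dim=(2, 3)))      # shared MLP over global-max-pooled features
        w = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * w

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)      # per-pixel channel average
        mx = x.amax(dim=1, keepdim=True)       # per-pixel channel maximum
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, applied sequentially."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention()

    def forward(self, x):
        return self.sa(self.ca(x))
```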
Table 2. Influence of loss function on the results
| Method | FL | Accuracy/% | Precision/% | Recall/% | F1-Score/% |
|---|---|---|---|---|---|
| ① | | 97.75 | 97.62 | 97.57 | 97.59 |
| ② (ours) | √ | 97.99 | 97.89 | 97.85 | 97.87 |
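FL in Table 2 denotes the Focal Loss. A minimal multi-class version is sketched below; the focusing parameter gamma = 2 is the common default from the original formulation, and the paper's actual hyper-parameters are not given in this excerpt.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Multi-class focal loss: FL(p_t) = -(1 - p_t)^gamma * log(p_t).
    Down-weights easy, well-classified examples so training focuses on
    hard examples and under-represented classes."""
    log_p = F.log_softmax(logits, dim=-1)
    ce = F.nll_loss(log_p, targets, reduction="none")          # -log(p_t) per sample
    p_t = log_p.gather(1, targets.unsqueeze(1)).squeeze(1).exp()
    return ((1.0 - p_t) ** gamma * ce).mean()
```

With gamma = 0 this reduces exactly to the standard cross-entropy loss, which is the baseline in row ① of Table 2.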
Table 3. Influence of different fusion methods on the results
| Method | Fuse | Accuracy/% | Precision/% | Recall/% | F1-Score/% |
|---|---|---|---|---|---|
| ① | Add | 97.68 | 97.62 | 97.53 | 97.57 |
| ② (ours) | Concat | 97.99 | 97.89 | 97.85 | 97.87 |
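The two fusion strategies compared in Table 3 reduce to a few lines of code. Element-wise addition requires both branches to produce features of the same dimensionality, so a projection layer is assumed here for the Add variant; concatenation avoids that constraint and preserves both feature sets intact, which is one plausible reason it performs slightly better.

```python
import torch
import torch.nn as nn

# Hypothetical projection so the 768-dim Swin features match the CNN's 512 dims;
# needed only for the "Add" variant.
proj = nn.Linear(768, 512)

def fuse_add(f_cnn, f_swin):
    return f_cnn + proj(f_swin)                 # element-wise addition ("Add" in Table 3)

def fuse_concat(f_cnn, f_swin):
    return torch.cat([f_cnn, f_swin], dim=1)    # channel concatenation ("Concat", ours)
```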
Table 4. Influence of different positions of Dy-CBAM
| Method | A | B | Accuracy/% | Precision/% | Recall/% | F1-Score/% |
|---|---|---|---|---|---|---|
| ① | | | 97.87 | 97.75 | 97.66 | 97.70 |
| ② | √ | | 97.82 | 97.57 | 97.74 | 97.65 |
| ③ (ours) | | √ | 97.99 | 97.89 | 97.85 | 97.87 |
| ④ | √ | √ | 97.91 | 97.81 | 97.74 | 97.77 |
Table 5. Comparison of different algorithms on this dataset
| Model | Accuracy/% | Precision/% | Recall/% | F1-Score/% |
|---|---|---|---|---|
| VGG19 | 89.87 | 88.94 | 89.08 | 89.01 |
| ResNet50 | 94.75 | 94.52 | 94.42 | 94.47 |
| ResNeXt50 | 95.30 | 95.02 | 94.89 | 94.95 |
| DenseNet121 | 94.11 | 93.45 | 93.41 | 93.43 |
| DenseNet169 | 94.79 | 94.43 | 94.50 | 94.46 |
| EfficientNet | 94.85 | 94.62 | 94.29 | 94.45 |
| ViT | 96.41 | 95.97 | 96.08 | 96.02 |
| BCNN | 93.09 | 92.84 | 92.50 | 92.67 |
| Ours | 97.99 | 97.89 | 97.85 | 97.87 |
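For completeness, the four reported metrics can be computed as follows with scikit-learn. Whether the paper averages precision, recall, and F1 per class (macro) or by class frequency (weighted) is not stated in this excerpt; macro averaging is assumed in this sketch, since it treats every species equally under class imbalance.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate(y_true, y_pred):
    """y_true / y_pred: 1-D arrays of ground-truth and predicted class indices."""
    acc = accuracy_score(y_true, y_pred)
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    # Report as percentages to match the tables above.
    return {"Accuracy": 100 * acc, "Precision": 100 * p,
            "Recall": 100 * r, "F1-Score": 100 * f1}
```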