Research on the identification of desert plants in Xinjiang based on CNN and Swin Transformer
Keywords:
- plant identification
- convolutional neural network (CNN)
- Swin Transformer
- attention mechanism
Abstract: Under the dual influence of climate and environment, the desert areas of Xinjiang are prone to drought disasters that disrupt agricultural and animal husbandry production, which is detrimental to the sustainable development of Xinjiang's economy. Identifying the desert plants of Xinjiang is the basis on which plant researchers assess plant growth status, and it is a prerequisite for ecological conservation research and the implementation of management measures. The task is made difficult by the characteristics of Xinjiang desert plant images: high inter-class similarity, complex image backgrounds, and imbalanced data samples. To improve recognition accuracy, accurately locate important local features, and account for complex global information, this paper proposes a plant image recognition method that fuses a convolutional neural network (CNN) with a Swin Transformer network. The method combines the CNN branch's strength at extracting local features with the Swin Transformer's strength at capturing global representations; an improved Convolutional Block Attention Module (CBAM) is embedded in the CNN branch to fully extract discriminative local key features, and the Focal Loss function is used to address the data sample imbalance. Experimental results show that, on the Xinjiang desert plant dataset, the proposed fusion method extracts image features more adequately than either single-branch network, reaching a recognition accuracy of 97.99%, with precision, recall, and F1 score all surpassing existing methods. Finally, visualization analysis and the confusion matrix further corroborate the effectiveness of the method.
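As a concrete illustration of the dual-branch design described above, the following is a minimal PyTorch sketch, assuming a ResNet34 backbone for the CNN branch, a Swin-T backbone for the global branch (torchvision ≥ 0.13), and concatenation of the two feature vectors before a linear classifier. The class name `DualBranchNet` and all layer choices are illustrative; this is not the authors' implementation, which additionally embeds the improved CBAM in the CNN branch.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet34, swin_t

class DualBranchNet(nn.Module):
    """Illustrative CNN + Swin Transformer fusion network (a sketch, not the paper's code)."""
    def __init__(self, num_classes):
        super().__init__()
        cnn = resnet34(weights=None)
        # Keep everything up to and including global average pooling -> (B, 512, 1, 1)
        self.cnn = nn.Sequential(*list(cnn.children())[:-1])
        swin = swin_t(weights=None)
        swin.head = nn.Identity()              # expose the 768-dim pooled features
        self.swin = swin
        self.classifier = nn.Linear(512 + 768, num_classes)

    def forward(self, x):
        f_local = self.cnn(x).flatten(1)       # local features from the CNN branch
        f_global = self.swin(x)                # global representation from the Swin branch
        return self.classifier(torch.cat([f_local, f_global], dim=1))
```

A forward pass on a batch of 224 × 224 RGB images, e.g. `DualBranchNet(num_classes=25)(torch.randn(2, 3, 224, 224))`, returns one logit per plant species; the actual number of classes in the dataset is not stated in this excerpt, so 25 is a placeholder.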
Table 1. Influence of different components on classification results

| Method | ResNet34 | Swin-Transformer | DA | Dy-CBAM | Accuracy/% | Precision/% | Recall/% | F1-Score/% |
|---|---|---|---|---|---|---|---|---|
| ① | √ | | | | 92.38 | 91.77 | 91.48 | 91.62 |
| ② | √ | | √ | | 94.46 | 94.40 | 93.87 | 94.12 |
| ③ | | √ | | | 96.74 | 96.82 | 96.33 | 96.57 |
| ④ | | √ | √ | | 97.10 | 96.83 | 96.85 | 96.84 |
| ⑤ | √ | √ | √ | | 97.87 | 97.75 | 97.66 | 97.70 |
| ⑥ | √ | √ | | √ | 97.75 | 97.74 | 97.59 | 97.66 |
| ⑦ (ours) | √ | √ | √ | √ | 97.99 | 97.89 | 97.85 | 97.87 |
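The excerpt does not detail the internals of the improved Dy-CBAM module. For reference, a standard CBAM block (channel attention followed by spatial attention) is sketched below, with a comment marking where a dynamic activation such as Dynamic ReLU would plausibly replace the plain ReLU in the channel MLP; that substitution is an assumption about what "Dy" denotes, not a statement of the authors' design.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),   # Dy-CBAM plausibly swaps this for Dynamic ReLU (assumption)
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))     # MLP over global-average-pooled features
        mx = self.mlp(x.amax(dim=(2, 3)))      # shared MLP over global-max-pooled features
        w = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * w

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)      # per-pixel channel average
        mx = x.amax(dim=1, keepdim=True)       # per-pixel channel maximum
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, applied sequentially."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention()

    def forward(self, x):
        return self.sa(self.ca(x))
```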
Table 2. Influence of loss function on the results
| Method | FL | Accuracy/% | Precision/% | Recall/% | F1-Score/% |
|---|---|---|---|---|---|
| ① | | 97.75 | 97.62 | 97.57 | 97.59 |
| ② (ours) | √ | 97.99 | 97.89 | 97.85 | 97.87 |
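FL in Table 2 denotes the Focal Loss. A minimal multi-class version is sketched below; the focusing parameter gamma = 2 is the common default from the original formulation, and the paper's actual hyper-parameters are not given in this excerpt.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Multi-class focal loss: FL(p_t) = -(1 - p_t)^gamma * log(p_t).
    Down-weights easy, well-classified examples so training focuses on
    hard examples and under-represented classes."""
    log_p = F.log_softmax(logits, dim=-1)
    ce = F.nll_loss(log_p, targets, reduction="none")          # -log(p_t) per sample
    p_t = log_p.gather(1, targets.unsqueeze(1)).squeeze(1).exp()
    return ((1.0 - p_t) ** gamma * ce).mean()
```

With gamma = 0 this reduces exactly to the standard cross-entropy loss, which is the baseline in row ① of Table 2.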
Table 3. Influence of different fusion methods on the results
| Method | Fuse | Accuracy/% | Precision/% | Recall/% | F1-Score/% |
|---|---|---|---|---|---|
| ① | Add | 97.68 | 97.62 | 97.53 | 97.57 |
| ② (ours) | Concat | 97.99 | 97.89 | 97.85 | 97.87 |
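The two fusion strategies compared in Table 3 reduce to a few lines of code. Element-wise addition requires both branches to produce features of the same dimensionality, so a projection layer is assumed here for the Add variant; concatenation avoids that constraint and preserves both feature sets intact, which is one plausible reason it performs slightly better.

```python
import torch
import torch.nn as nn

# Hypothetical projection so the 768-dim Swin features match the CNN's 512 dims;
# needed only for the "Add" variant.
proj = nn.Linear(768, 512)

def fuse_add(f_cnn, f_swin):
    return f_cnn + proj(f_swin)                 # element-wise addition ("Add" in Table 3)

def fuse_concat(f_cnn, f_swin):
    return torch.cat([f_cnn, f_swin], dim=1)    # channel concatenation ("Concat", ours)
```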
Table 4. Influence of different positions of Dy-CBAM
| Method | A | B | Accuracy/% | Precision/% | Recall/% | F1-Score/% |
|---|---|---|---|---|---|---|
| ① | | | 97.87 | 97.75 | 97.66 | 97.70 |
| ② | √ | | 97.82 | 97.57 | 97.74 | 97.65 |
| ③ (ours) | | √ | 97.99 | 97.89 | 97.85 | 97.87 |
| ④ | √ | √ | 97.91 | 97.81 | 97.74 | 97.77 |
Table 5. Comparison of different algorithms on this dataset
| Model | Accuracy/% | Precision/% | Recall/% | F1-Score/% |
|---|---|---|---|---|
| VGG19 | 89.87 | 88.94 | 89.08 | 89.01 |
| ResNet50 | 94.75 | 94.52 | 94.42 | 94.47 |
| ResNeXt50 | 95.30 | 95.02 | 94.89 | 94.95 |
| DenseNet121 | 94.11 | 93.45 | 93.41 | 93.43 |
| DenseNet169 | 94.79 | 94.43 | 94.50 | 94.46 |
| EfficientNet | 94.85 | 94.62 | 94.29 | 94.45 |
| ViT | 96.41 | 95.97 | 96.08 | 96.02 |
| BCNN | 93.09 | 92.84 | 92.50 | 92.67 |
| Ours | 97.99 | 97.89 | 97.85 | 97.87 |
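For completeness, the four reported metrics can be computed as follows with scikit-learn. Whether the paper averages precision, recall, and F1 per class (macro) or by class frequency (weighted) is not stated in this excerpt; macro averaging is assumed in this sketch, since it treats every species equally under class imbalance.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate(y_true, y_pred):
    """y_true / y_pred: 1-D arrays of ground-truth and predicted class indices."""
    acc = accuracy_score(y_true, y_pred)
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    # Report as percentages to match the tables above.
    return {"Accuracy": 100 * acc, "Precision": 100 * p,
            "Recall": 100 * r, "F1-Score": 100 * f1}
```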