Research on a lightweight object detection algorithm based on improved YOLOv4
Abstract: Aiming at the problems of the YOLOv4 object detection algorithm in some application scenarios, namely its large parameter count, complex network, and limited accuracy, an improved lightweight object detection algorithm, GD-YOLO, is proposed. First, the YOLOv4 backbone feature extraction network, CSPDarknet, is replaced with the lightweight network GhostNet, which greatly reduces the parameter count and computational cost of the algorithm and makes it more lightweight. Second, a dual attention mechanism (DATM) is proposed that strengthens both spatial and channel features while adding only a small number of parameters; it is applied to the three effective feature layers extracted by the backbone, making feature extraction more effective. Finally, the ACON activation function replaces the ReLU activation in GhostNet, further improving detection accuracy. Experimental results on the VOC2007+2012 dataset show that GD-YOLO achieves a mean average precision (mAP) of 84.28%, about 4 percentage points higher than YOLOv4 and about 1 percentage point lower than YOLOv5; its parameter count is 11 M lower than that of YOLOv4 and 3 M lower than that of YOLOv5. Compared with YOLOv4, the proposed GD-YOLO thus reduces the number of model parameters while preserving a high average precision, indicating that the algorithm is both more lightweight and more accurate.
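To make the backbone swap concrete, the sketch below shows a minimal PyTorch version of the Ghost module that GhostNet is built from (Han et al., 2020): a pointwise convolution produces a few intrinsic feature maps, and cheap depthwise convolutions derive the remaining "ghost" maps, which is where the parameter and computation savings come from. Class and argument names here are illustrative, not taken from the paper's implementation.

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Minimal Ghost module sketch; out_ch should be divisible by ratio."""
    def __init__(self, in_ch, out_ch, ratio=2, dw_kernel=3):
        super().__init__()
        init_ch = out_ch // ratio        # intrinsic maps from the primary conv
        cheap_ch = out_ch - init_ch      # "ghost" maps from cheap operations
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, init_ch, 1, bias=False),
            nn.BatchNorm2d(init_ch),
            nn.ReLU(inplace=True),
        )
        # Depthwise conv (groups=init_ch): one cheap filter set per intrinsic map.
        self.cheap = nn.Sequential(
            nn.Conv2d(init_ch, cheap_ch, dw_kernel, padding=dw_kernel // 2,
                      groups=init_ch, bias=False),
            nn.BatchNorm2d(cheap_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        intrinsic = self.primary(x)
        ghost = self.cheap(intrinsic)
        return torch.cat([intrinsic, ghost], dim=1)   # (N, out_ch, H, W)
```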
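The exact internal layout of DATM is not given in this excerpt, so the following is only a plausible reading, assuming a CBAM-like arrangement: an SE-style channel gate followed by a lightweight spatial gate. A block like this would be attached to each of the three effective feature layers produced by the backbone; all names and the reduction factor are hypothetical.

```python
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    """Hypothetical dual (channel + spatial) attention block in the spirit of DATM."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Channel gate: global average pool, bottleneck 1x1 convs, sigmoid weights.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Spatial gate: 2-channel (mean/max) map squeezed to one attention map.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel_gate(x)                 # reweight channels
        avg_map = x.mean(dim=1, keepdim=True)        # per-pixel mean over channels
        max_map = x.amax(dim=1, keepdim=True)        # per-pixel max over channels
        attn = self.spatial_gate(torch.cat([avg_map, max_map], dim=1))
        return x * attn                              # reweight spatial positions
```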
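The ACON activation is a published design (Ma et al., 2021); its ACON-C form is f(x) = (p1 − p2)·x·σ(β(p1 − p2)x) + p2·x with learnable per-channel p1, p2, and β. A minimal PyTorch version, as it might replace ReLU inside the GhostNet blocks:

```python
import torch
import torch.nn as nn

class AconC(nn.Module):
    """ACON-C: f(x) = (p1 - p2) * x * sigmoid(beta * (p1 - p2) * x) + p2 * x."""
    def __init__(self, channels):
        super().__init__()
        self.p1 = nn.Parameter(torch.randn(1, channels, 1, 1))
        self.p2 = nn.Parameter(torch.randn(1, channels, 1, 1))
        self.beta = nn.Parameter(torch.ones(1, channels, 1, 1))

    def forward(self, x):
        dx = (self.p1 - self.p2) * x
        return dx * torch.sigmoid(self.beta * dx) + self.p2 * x
```

In this sketch, each nn.ReLU in the Ghost module above would be swapped for AconC(channels). As β grows the unit approaches max(p1·x, p2·x), and as β shrinks it approaches a linear map, which is the learnable "activate or not" switching described in the ACON paper.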
Key words:
- object detection
- YOLOv4
- lightweight network
- GhostNet
- dual attention mechanism
Table 1. Comparison of various attention mechanisms and DATM under YOLOv4
| Algorithm | mAP/% |
| --- | --- |
| YOLOv4 | 79.66 |
| YOLOv4+SENet | 82.18 |
| YOLOv4+ECA-Net | 82.52 |
| YOLOv4+CBAM | 82.71 |
| YOLOv4+DATM | 83.23 |

Table 2. Ablation experiments
| Algorithm | mAP/% | Parameters | FPS |
| --- | --- | --- | --- |
| YOLOv4 | 79.66 | 23.9 M | 17 |
| YOLOv4+GhostNet | 80.28 | 11.3 M | 23 |
| YOLOv4+DATM | 83.23 | 24.0 M | 27 |
| YOLOv4+GhostNet+ACON | 80.71 | 11.3 M | 23 |
| YOLOv4+GhostNet+DATM | 83.98 | 11.4 M | 27 |
| GD-YOLO | 84.28 | 11.4 M | 27 |

Table 3. Test results of each algorithm on the VOC2007+2012 dataset
| Algorithm | Backbone | mAP/% | Parameters | FPS |
| --- | --- | --- | --- | --- |
| YOLOv4 | VGG | 79.66 | 23.9 M | 17 |
| YOLOv4 | MobileNet_v1 | 80.82 | 12.6 M | 44 |
| YOLOv4 | MobileNet_v2 | 81.03 | 10.8 M | 35 |
| YOLOv4 | MobileNet_v3 | 80.01 | 11.7 M | 31 |
| YOLOv4 | DenseNet121 | 84.02 | 16.4 M | 19 |
| YOLOv4 | ResNet50 | 82.41 | 33.6 M | 27 |
| YOLOv4 | CSPDarknet | 80.28 | 22.4 M | 23 |
| YOLOv5 | CSPDarknet | 85.08 | 14.2 M | 30 |
| GD-YOLO (ours) | GhostNet | 84.28 | 11.4 M | 27 |