Semantic segmentation algorithm based on separable dilated convolution and joint normalization method
-
Abstract: Image semantic segmentation is an important part of image understanding and is widely used in scenarios such as autonomous driving. To address information loss and slow segmentation speed, we propose a semantic segmentation algorithm based on separable dilated convolution and joint normalization, with ResNet101 as the backbone network. First, we combine separable convolution with dilated convolution to extract the outputs of the last three layers of ResNet101. Compared with standard dilated convolution, separable dilated convolution accelerates the training, validation and prediction of the network. Second, we apply instance normalization to semantic segmentation and compare it with batch normalization to verify the effectiveness of instance normalization. Finally, we propose two joint normalization methods that combine batch normalization and instance normalization, and verify that both improve segmentation quality. Experiments on the Pascal VOC 2012 dataset show that our method accelerates the training, validation and prediction of the network, and that its highest mean intersection over union (mIoU) reaches 80.62%.
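The speed-up attributed to separable dilated convolution can be illustrated with a simple weight count. The sketch below assumes that "separable convolution" means the usual depthwise separable factorization (a per-channel k x k depthwise convolution followed by a 1 x 1 pointwise convolution). The channel counts, kernel size and dilation rate are hypothetical values chosen for illustration, not the paper's configuration. It also shows that dilation widens the window a kernel covers without adding weights.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Extent along one axis covered by a dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def standard_conv_weights(c_in: int, c_out: int, kernel_size: int) -> int:
    """Weights of a standard 2-D convolution (bias excluded).

    Dilation spreads the taps apart but does not change this count.
    """
    return c_in * c_out * kernel_size * kernel_size


def separable_conv_weights(c_in: int, c_out: int, kernel_size: int) -> int:
    """Weights of a depthwise conv plus a 1x1 pointwise conv (bias excluded)."""
    depthwise = c_in * kernel_size * kernel_size  # one k x k filter per input channel
    pointwise = c_in * c_out                      # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, for illustration only (not taken from the paper).
C_IN, C_OUT, K, DILATION = 2048, 256, 3, 6

standard = standard_conv_weights(C_IN, C_OUT, K)
separable = separable_conv_weights(C_IN, C_OUT, K)
extent = effective_kernel_size(K, DILATION)

print(standard)   # 4718592
print(separable)  # 542720
print(extent)     # 13
```

Under these assumed sizes the separable layer holds roughly 8.7 times fewer weights while covering the same 13 x 13 window.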
-
Table 1 Comparison of standard dilated convolution and separable dilated convolution

Method                          mIoU (%)   Train (hours)   Val (fps)   Pred (fps)
Standard dilated convolution    78.77      91.94           32.45       28.54
Separable dilated convolution   78.48      83.19           35.17       33.09

Table 2 Comparison of normalization methods (per-class IoU, %)

Class        BN      IN      PBIN    CBIN
background   93.94   93.77   84.08   94.26
aeroplane    91.08   91.66   93.82   93.11
bike         63.47   62.74   45.76   46.95
bird         91.35   89.97   93.57   92.67
boat         74.24   74.10   75.26   72.72
bottle       77.92   76.13   85.06   84.14
bus          90.34   90.68   93.57   93.71
car          86.84   88.50   92.03   90.21
cat          91.34   92.15   95.16   91.77
chair        35.25   38.53   44.16   44.54
cow          89.90   91.79   93.15   91.80
table        64.82   61.90   54.04   50.44
dog          88.05   87.53   90.98   89.90
horse        87.47   89.41   90.75   90.55
motorbike    83.99   85.34   88.07   87.49
person       85.80   85.84   88.42   88.39
plant        58.69   60.80   69.41   72.86
sheep        87.23   84.93   90.08   84.85
sofa         45.46   51.53   51.67   55.26
train        85.34   85.03   89.70   90.47
television   75.53   75.19   74.29   76.31
mIoU (%)     78.48   78.93   80.62   80.11
-
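The mIoU row of Table 2 should equal the mean of the 21 per-class values in each column. A minimal Python check follows, using only the numbers as printed in the table; the dictionary and function names are illustrative.

```python
METHODS = ("BN", "IN", "PBIN", "CBIN")

# Per-class IoU (%) copied from Table 2, in the column order of METHODS.
TABLE_2 = {
    "background": (93.94, 93.77, 84.08, 94.26),
    "aeroplane":  (91.08, 91.66, 93.82, 93.11),
    "bike":       (63.47, 62.74, 45.76, 46.95),
    "bird":       (91.35, 89.97, 93.57, 92.67),
    "boat":       (74.24, 74.10, 75.26, 72.72),
    "bottle":     (77.92, 76.13, 85.06, 84.14),
    "bus":        (90.34, 90.68, 93.57, 93.71),
    "car":        (86.84, 88.50, 92.03, 90.21),
    "cat":        (91.34, 92.15, 95.16, 91.77),
    "chair":      (35.25, 38.53, 44.16, 44.54),
    "cow":        (89.90, 91.79, 93.15, 91.80),
    "table":      (64.82, 61.90, 54.04, 50.44),
    "dog":        (88.05, 87.53, 90.98, 89.90),
    "horse":      (87.47, 89.41, 90.75, 90.55),
    "motorbike":  (83.99, 85.34, 88.07, 87.49),
    "person":     (85.80, 85.84, 88.42, 88.39),
    "plant":      (58.69, 60.80, 69.41, 72.86),
    "sheep":      (87.23, 84.93, 90.08, 84.85),
    "sofa":       (45.46, 51.53, 51.67, 55.26),
    "train":      (85.34, 85.03, 89.70, 90.47),
    "television": (75.53, 75.19, 74.29, 76.31),
}

# mIoU (%) as reported in the last row of Table 2.
REPORTED_MIOU = {"BN": 78.48, "IN": 78.93, "PBIN": 80.62, "CBIN": 80.11}


def mean_iou(column: int) -> float:
    """Mean of one method's per-class IoU values over all classes."""
    values = [row[column] for row in TABLE_2.values()]
    return sum(values) / len(values)


computed = {name: round(mean_iou(i), 2) for i, name in enumerate(METHODS)}

for name in METHODS:
    print(name, computed[name], REPORTED_MIOU[name])
```

The BN, IN and CBIN columns reproduce the reported mIoU. The PBIN column as printed averages to 80.14 rather than 80.62, which suggests that either one PBIN entry or the reported mean was mistyped; the source does not indicate which.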