RGB-D saliency detection based on multi-scale feature fusion

KONG Demian; WU Jin

doi:10.19304/J.ISSN1000-7180.2021.0304

Abstract

Depth maps provide rich positional cues for RGB saliency detection, but low-quality depth maps can mislead the model's feature fitting. In addition, salient objects in the real world vary widely in scale, which makes prediction harder and increases error. To address these problems, this paper designs a new deep-learning-based RGB-D saliency detection model. VGG19 is used as the backbone network to extract features from the two modalities, the RGB image and the depth map. A serial adaptive fusion module then performs cross-modal fusion of the extracted features, so that the strengths of the RGB image and the depth map complement each other and depth features are selected automatically. Next, a multi-scale feature aggregation module with joint edge detection fuses the cross-modally fused features with edge information. Finally, a global guidance module guides the model with global features to produce the prediction. The method is evaluated on images from 4 public datasets and compared with 6 other methods; its predictions are closer to the manually annotated ground-truth maps. The PR (Precision-Recall) curves, the S-measure, the F-measure, and the MAE (Mean Absolute Error) indicate that the overall performance of the proposed method is higher than that of the 6 compared methods.
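As an illustration of two of the evaluation measures named in the abstract, the following is a minimal sketch of MAE and the F-measure for a single saliency map. It is not taken from the paper: the function names `mae` and `f_measure`, the fixed binarization threshold of 0.5, and the weight `beta_sq = 0.3` (a value commonly used in the saliency literature) are assumptions made for this example, and maps are represented as nested lists of values in [0, 1].

```python
def mae(pred, gt):
    """Mean absolute error between a saliency map and its ground truth.

    Both inputs are 2-D nested lists of the same shape with values in [0, 1].
    """
    total = sum(abs(p - g) for prow, grow in zip(pred, gt)
                for p, g in zip(prow, grow))
    count = sum(len(row) for row in gt)
    return total / count


def f_measure(pred, gt, threshold=0.5, beta_sq=0.3):
    """Weighted F-measure after binarizing the prediction.

    threshold and beta_sq are assumptions for this sketch, not values
    stated in the abstract.
    """
    tp = fp = fn = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            predicted = p >= threshold   # pixel predicted as salient
            actual = g >= 0.5            # pixel salient in the ground truth
            if predicted and actual:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
    if tp == 0:
        return 0.0  # no correctly detected salient pixel
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Toy 2x2 example: one salient pixel, one false positive.
pred = [[0.9, 0.2], [0.6, 0.1]]
gt = [[1.0, 0.0], [0.0, 0.0]]
print(mae(pred, gt), f_measure(pred, gt))
```

A lower MAE and a higher F-measure both indicate a prediction closer to the ground-truth map. In practice these scores would be averaged over every image in a dataset.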
