Citation: ZHANG Y, GAO X. Semantic SLAM building based on instance segmentation and optical flow in dynamic scenes[J]. Microelectronics & Computer, 2024, 41(2): 19-27. doi: 10.19304/J.ISSN1000-7180.2023.0033

Semantic SLAM building based on instance segmentation and optical flow in dynamic scenes

  • Abstract: Visual simultaneous localization and mapping (SLAM) is widely used for indoor robot navigation, but it estimates camera pose under the assumption of a static environment. To improve the robustness and real-time performance of visual SLAM localization and mapping in dynamic scenes, a dynamic region detection thread and a semantic point cloud thread are added to the original ORB-SLAM2. The dynamic region detection thread consists of an instance segmentation network and an optical flow estimation network. Instance segmentation attaches semantic information to the dynamic scene and generates masks for a priori dynamic objects. To address under-segmentation by the instance segmentation network, a lightweight optical flow estimation network assists in detecting dynamic regions, producing more accurate dynamic region masks. These masks are passed to the tracking thread, which removes feature points in dynamic regions in real time; the remaining static feature points in the map are then used to estimate the camera pose and to build a semantic point cloud map. Experiments on the public TUM dataset show that the improved SLAM system is more robust in localization and mapping in dynamic scenes while maintaining real-time performance.
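The mask-fusion and feature-rejection step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the two masks are combined, so everything here is an assumption made for illustration: the function names `dynamic_region_mask` and `keep_static_keypoints`, the `flow_threshold` parameter, the median-flow subtraction used as a crude stand-in for camera-motion compensation, and the choice to take the union of the two masks. It is not the authors' implementation.

```python
import numpy as np


def dynamic_region_mask(instance_mask, flow, flow_threshold=1.0):
    """Fuse a prior instance mask with an optical-flow motion mask.

    instance_mask: (H, W) boolean array, True on a priori dynamic objects.
    flow: (H, W, 2) array of per-pixel flow vectors (dx, dy) in pixels.
    flow_threshold: hypothetical residual-flow magnitude threshold.
    """
    # Assumption: the median flow approximates camera-induced motion,
    # so the residual highlights independently moving pixels.
    residual = flow - np.median(flow.reshape(-1, 2), axis=0)
    magnitude = np.linalg.norm(residual, axis=2)
    # Assumption: the union recovers moving pixels the segmentation missed.
    return instance_mask.astype(bool) | (magnitude > flow_threshold)


def keep_static_keypoints(keypoints, mask):
    """Drop keypoints that fall inside the dynamic-region mask.

    keypoints: (N, 2) array-like of (x, y) pixel coordinates.
    mask: (H, W) boolean array, True on dynamic regions.
    """
    pts = np.asarray(keypoints, dtype=float).reshape(-1, 2)
    cols = np.clip(np.round(pts[:, 0]).astype(int), 0, mask.shape[1] - 1)
    rows = np.clip(np.round(pts[:, 1]).astype(int), 0, mask.shape[0] - 1)
    return pts[~mask[rows, cols]]


if __name__ == "__main__":
    seg = np.zeros((6, 8), dtype=bool)
    seg[0:2, 0:2] = True                 # segmented a priori dynamic object
    flow = np.zeros((6, 8, 2))
    flow[4:6, 6:8] = [5.0, 0.0]          # moving region the segmentation missed
    mask = dynamic_region_mask(seg, flow)
    print(keep_static_keypoints([[0, 0], [3, 3], [7, 5]], mask))
```

In this toy input, one keypoint lies on the segmented object and one on the flow-only moving region, so only the keypoint at (3, 3) would be passed on to pose estimation.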

     
