Enhancing Visual SLAM Robustness in Dynamic Scenes with YOLOv5-Assisted ORB-SLAM3

Rajaa Wejood Ali; Heba Hakim; Dr. Mohammed Abd Ali Al-Ibadi

doi:10.46604/peti.2026.15235

Authors

Rajaa Wejood Ali, Department of Computer Engineering, University of Basrah, Basrah, Iraq
Heba Hakim, Department of Computer Engineering, University of Basrah, Basrah, Iraq (ORCID: https://orcid.org/0000-0002-5300-3323)
Dr. Mohammed Abd Ali Al-Ibadi, Department of Computer Engineering, University of Basrah, Basrah, Iraq (ORCID: https://orcid.org/0000-0002-1034-3475)

Keywords:

ORB-SLAM3, YOLOv5, dynamic environments, pose estimation, visual SLAM

Abstract

This study presents an enhanced visual SLAM (Simultaneous Localization and Mapping) framework that integrates ORB-SLAM3 with the YOLOv5 real-time object detector to improve pose accuracy in dynamic environments. Although ORB-SLAM3 performs robustly in static scenes, it tracks ORB features under the assumption that the scene is static, so feature points on moving objects can corrupt its pose estimates. To overcome this limitation, YOLOv5 is used to detect dynamic objects in each video frame, and feature points that fall inside the detected regions are discarded before matching. This filtering reduces the influence of dynamic objects on trajectory estimation and improves overall system robustness. The proposed method was evaluated on dynamic datasets, including the Bonn RGB-D Dynamic and TUM RGB-D datasets, and further validated in real-world experiments with an Intel RealSense D435i camera. Experimental results show substantial improvements in pose accuracy over both the baseline ORB-SLAM3 and the RTAB-Map system, confirming the effectiveness of the YOLOv5-assisted ORB-SLAM3 integration in dynamic scenes.
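
The detection-based filtering step described above can be sketched in a few lines. The following is a minimal Python illustration under stated assumptions, not the authors' implementation (ORB-SLAM3 itself is written in C++). It assumes detections arrive as axis-aligned pixel boxes carrying a class label and a confidence score, and that keypoints are plain (x, y) pixel coordinates standing in for ORB keypoints. The set of "dynamic" classes, the confidence threshold, and the optional box margin are hypothetical choices made only for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple

# Hypothetical choice: COCO class names treated as potentially dynamic.
DYNAMIC_CLASSES = {"person", "car", "bicycle", "dog", "cat"}


@dataclass
class Detection:
    """One detector output: axis-aligned box (pixel coords), label, confidence."""
    x1: float
    y1: float
    x2: float
    y2: float
    label: str
    confidence: float


def inside(pt: Tuple[float, float], det: Detection, margin: float = 0.0) -> bool:
    """True if the point lies within the box, optionally enlarged by `margin` pixels."""
    x, y = pt
    return (det.x1 - margin <= x <= det.x2 + margin
            and det.y1 - margin <= y <= det.y2 + margin)


def filter_static_keypoints(
    keypoints: Sequence[Tuple[float, float]],
    detections: Iterable[Detection],
    conf_threshold: float = 0.5,  # hypothetical threshold
    margin: float = 0.0,          # hypothetical box enlargement
) -> List[int]:
    """Return indices of keypoints lying outside every confident dynamic box."""
    # Keep only boxes whose class is considered dynamic and whose score is high enough.
    dynamic_boxes = [d for d in detections
                     if d.label in DYNAMIC_CLASSES and d.confidence >= conf_threshold]
    # A keypoint survives only if no dynamic box contains it.
    return [i for i, kp in enumerate(keypoints)
            if not any(inside(kp, d, margin) for d in dynamic_boxes)]


if __name__ == "__main__":
    kps = [(10.0, 10.0), (120.0, 80.0), (300.0, 200.0), (305.0, 60.0)]
    dets = [
        Detection(100, 50, 200, 150, "person", 0.9),  # dynamic and confident: masks kp 1
        Detection(290, 190, 320, 220, "chair", 0.8),  # static class: ignored
        Detection(300, 50, 310, 70, "person", 0.3),   # below threshold: ignored
    ]
    print(filter_static_keypoints(kps, dets))  # → [0, 2, 3]
```

In a real pipeline the boxes might be populated from YOLOv5's per-image output, for example the rows of `results.xyxy[0]` (box corners, confidence, class index, with class names available in `model.names`), and the surviving indices would select which ORB keypoints and descriptors are passed on to matching. Masking whole boxes is deliberately coarse: it also discards static points that happen to lie inside a box, trading some usable features for robustness.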
