Research on Multi-unmanned Vehicle Path Planning Based on the APF-MASAC Algorithm
Author:
Affiliation:

1. Nanjing University of Posts and Telecommunications; 2. College of Science, Hohai University, Nanjing 211100, Jiangsu, China; 3. Key Laboratory of Traffic Information and Safety of Higher Education Institutes

Fund Project:

Autonomous driving decision-making system based on deep reinforcement learning

    Abstract:

    Aiming at the path planning problem of multiple unmanned vehicles in real environments, this paper proposes an algorithm design scheme under the Multi-Agent Soft Actor-Critic (MASAC) framework and optimizes and verifies it in three respects. First, a dense reward function is designed using potential-based reward shaping, providing richer, more timely, and more effective feedback signals to the learning process and thereby significantly accelerating convergence. Second, the conventional experience replay buffer is improved with a double-consecutive-frame scheme that stores two consecutive observation frames as a single unit, effectively capturing the dynamics of environmental state changes and improving training efficiency and stability. Third, a highly realistic dynamic-obstacle environment is built on the Gazebo simulation platform, supplying diverse and challenging training samples and ensuring that the algorithm can be fully trained and optimized under near-real conditions. Finally, the effectiveness of the algorithm is verified through ablation experiments and robustness tests.
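
    The potential-based reward shaping mentioned above adds a term of the form r' = r + γΦ(s') − Φ(s), which densifies feedback without changing the optimal policy. A minimal sketch, assuming the potential Φ is the negative Euclidean distance from the vehicle's position to its goal (the function names and the γ value here are illustrative, not from the paper):

```python
import math

GAMMA = 0.99  # discount factor (assumed value)


def potential(state, goal):
    """Potential Phi(s): negative Euclidean distance to the goal.

    States closer to the goal have higher potential.
    """
    return -math.dist(state, goal)


def shaped_reward(base_reward, state, next_state, goal, gamma=GAMMA):
    """Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s).

    This additive form is known to preserve the optimal policy while
    giving the agent a dense progress signal at every step.
    """
    return base_reward + gamma * potential(next_state, goal) - potential(state, goal)
```

    With this shaping, a step that moves the vehicle toward its goal receives a positive bonus and a step that moves it away receives a penalty, even when the sparse base reward is zero.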
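
    The double-consecutive-frame replay buffer can likewise be sketched: instead of storing single observations, each stored transition stacks the previous and current observation so a sampled state carries short-term dynamics (e.g. the apparent velocity of a moving obstacle). The class and method names below are illustrative assumptions, not the paper's implementation:

```python
import random
from collections import deque


class TwoFrameReplayBuffer:
    """Replay buffer whose unit of storage stacks two consecutive
    observation frames, so each sampled state encodes how the
    environment is changing, not just a static snapshot."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)
        self.prev_obs = None  # most recent observation in the current episode

    def push(self, obs, action, reward, next_obs, done):
        if self.prev_obs is not None:
            # Concatenate the previous and current frames into one state unit.
            stacked = self.prev_obs + obs
            stacked_next = obs + next_obs
            self.buffer.append((stacked, action, reward, stacked_next, done))
        # Reset at episode boundaries so frames never straddle two episodes.
        self.prev_obs = None if done else obs

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)
```

    The episode-boundary reset matters: stacking the last frame of one episode with the first frame of the next would fabricate motion that never occurred.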

History
  • Received: December 15, 2024
  • Revised: March 24, 2025
  • Accepted: March 24, 2025