Abstract: To address the path planning problem for multiple unmanned vehicles in real environments, this paper proposes an algorithm design based on the Multi-Agent Soft Actor-Critic (MASAC) framework and improves it in three respects. First, a dense reward function is designed using potential-based reward shaping, which gives the learning process richer and more timely feedback and thereby accelerates convergence. Second, the conventional experience replay buffer is improved with a double-consecutive-frame technique: two consecutive frames of observation data are stored in the buffer as a single unit, which captures the dynamics of environmental state changes and improves training efficiency and stability. Third, a realistic environment with dynamic obstacles is built on the Gazebo simulation platform, providing diverse and challenging training samples so that the algorithm can be trained and optimized under conditions that approximate the real world. Finally, the effectiveness of the algorithm is verified through ablation experiments and robustness tests.
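The first two improvements named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. The following are assumptions made only for illustration: the potential function (negative Euclidean distance to the goal), the discount factor value, the names `potential`, `shaped_reward` and `TwoFrameReplayBuffer`, and the choice to represent a two-frame unit as a tuple of observations. The shaping term follows the standard potential-based form F(s, s') = γΦ(s') − Φ(s).

```python
import math
import random
from collections import deque

GAMMA = 0.99  # discount factor (assumed value, not taken from the paper)


def potential(pos, goal):
    """Hypothetical potential: negative Euclidean distance to the goal."""
    return -math.dist(pos, goal)


def shaped_reward(sparse_reward, pos, next_pos, goal, gamma=GAMMA):
    """Add the shaping term F = gamma * phi(s') - phi(s) to the sparse reward."""
    return sparse_reward + gamma * potential(next_pos, goal) - potential(pos, goal)


class TwoFrameReplayBuffer:
    """Replay buffer whose stored state is a pair of consecutive observations."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)
        self.prev_obs = None

    def reset(self, first_obs):
        # At episode start there is no earlier frame, so the first one is reused.
        self.prev_obs = first_obs

    def add(self, obs, action, reward, next_obs, done):
        state = (self.prev_obs, obs)   # frames t-1 and t stored as one unit
        next_state = (obs, next_obs)   # frames t and t+1 stored as one unit
        self.buffer.append((state, action, reward, next_state, done))
        self.prev_obs = obs

    def sample(self, batch_size):
        # Uniform sampling without replacement; batch_size must not exceed len(self).
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

In this sketch a step that moves a vehicle closer to its goal receives a positive shaping term even when the sparse reward is zero, and each sampled state carries two frames, so the change between them (for example an obstacle's displacement) is available to the networks.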