Abstract: Person re-identification (ReID) is a key technology in intelligent video surveillance that aims to retrieve the same person across cameras. Because monitoring scenarios are complex, traditional single-modal person re-identification is poorly suited to extreme conditions such as low light and fog. In recent years, driven by practical application needs and the rapid development of deep learning, multi-modal person re-identification based on deep learning has received widespread attention. This article reviews the recent development of deep-learning-based multi-modal person re-identification, describes the shortcomings of traditional single-modal person re-identification, and summarizes the common application scenarios and advantages of multi-modal person re-identification, as well as the composition of each dataset. It focuses on analyzing and classifying multi-modal person re-identification methods and discusses the hot spots and challenges of current research. Finally, it outlines the future development trends and potential application value of multi-modal person re-identification.