Abstract: To accurately detect road damage exhibiting large size variations and small-scale targets in vehicle-mounted images, this paper presents a real-time road damage detection model based on an improved YOLOv5s, termed VRD-YOLO (Vehicle-mounted-images Road Damage Detection YOLO). First, a Channel Mix Slide Transformer module is proposed, which enhances the model's global context modeling capability and strengthens the extraction of fine-grained semantic features of road damage. Second, a generalized feature pyramid with cross-layer and cross-scale fusion is introduced to enlarge the network's receptive field and strengthen the fusion of multi-scale damage features. Third, a dynamic detection head is designed to achieve scale awareness, spatial awareness, and task awareness, optimizing the model's feature responses and improving its detection performance. Finally, the Vehicle-mounted Images Road Damage Dataset (VIRDD) was constructed to expand the number and variety of existing road damage datasets, and ablation and comparison experiments were conducted on it. The experimental results show that VRD-YOLO achieves a detection accuracy of 74.45% mAP@0.5 on the VIRDD dataset at a detection speed of 28.56 FPS. Compared with YOLOv5s, VRD-YOLO improves precision, recall, F1 score, and mean average precision by 2.79, 2.32, 2.54, and 3.19 percentage points, respectively. Moreover, compared with six other classical and recent object detection models, VRD-YOLO attains the highest detection accuracy with the smallest parameter count of 9.68 million, verifying the effectiveness and superiority of the proposed method.