Abstract: Road crack detection algorithms often adopt a two-stage approach: they first obtain the bounding box of the crack region from the image and then segment the crack mask within that box using semantic segmentation. The two stages are independent of each other, which makes the pipeline inefficient in actual production. To address this issue, this paper proposes an end-to-end integrated road crack detection method that adopts a more lightweight backbone network for crack feature extraction. In addition, because the scenes in existing public road crack datasets are relatively simple, models trained on them generalize poorly in practical scenarios. To tackle this problem, a crack feature fusion method that combines a progressive pyramid with a spatial adaptive module is proposed, with the aim of improving the detection of small cracks in complex scenes. The proposed model was trained on a collected dataset of complex urban street scenes and achieved a test accuracy of 86.3% and a recall of 84.1%, demonstrating the feasibility of the proposed method in practical applications.