Abstract: To address the limited accuracy and computational efficiency of existing stereo matching algorithms, a novel stereo matching algorithm based on local self-attention is presented. The algorithm incorporates an enhanced local self-attention mechanism that strengthens the capture of local features, improving the precision of disparity estimation while reducing computational complexity. An Atrous Spatial Pyramid Pooling (ASPP) module is also integrated to extract multi-scale features efficiently. These features are fused to construct a matching cost volume, which is regularized by an hourglass-shaped encoder-decoder network to determine the correspondences of feature points at different disparity levels. Finally, a high-precision disparity map is generated by linear regression. Experiments on the KITTI 2015 and Scene Flow datasets show that the improved algorithm reduces the end-point error by 22.1%, decreases the number of parameters by 45%, and shortens the running time by approximately 15.6%. These results confirm its advantages in both accuracy and resource usage.
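
As a rough illustration of the last two steps named in the abstract, the sketch below regresses a disparity from a regularized cost volume and scores it with the end-point error. It assumes the regression step is the soft-argmin style expectation over candidate disparities that is common in cost-volume networks; the abstract does not confirm this. The function names `soft_argmin_disparity` and `end_point_error` are hypothetical, and the inputs are toy values, not data from the experiments.

```python
import math


def soft_argmin_disparity(costs):
    """Expected disparity for one pixel.

    `costs[d]` is the matching cost at candidate disparity d (lower is better).
    Assumption: the regression is a softmax-weighted average over disparities.
    """
    if not costs:
        raise ValueError("costs must not be empty")
    # Negate costs so that a low cost receives a high probability.
    negated = [-c for c in costs]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(negated)
    weights = [math.exp(v - peak) for v in negated]
    total = sum(weights)
    # Probability-weighted sum of the disparity indices 0, 1, 2, ...
    return sum(d * w / total for d, w in enumerate(weights))


def end_point_error(predicted, ground_truth):
    """Mean absolute difference between predicted and true disparities."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [abs(p - g) for p, g in zip(predicted, ground_truth)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Toy cost slice for a single pixel: the best match is at disparity 2.
    pixel_costs = [5.0, 5.0, 0.0, 5.0, 5.0]
    print(soft_argmin_disparity(pixel_costs))

    # Toy disparity maps flattened to lists (hypothetical values).
    print(end_point_error([1.0, 2.5], [1.5, 2.0]))
```

Because the result is a weighted average rather than a hard argmin, it can take sub-pixel values and stays differentiable, which is why this form is often used when the network is trained end to end.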