Abstract: RGB-Infrared person re-identification is a challenging task that aims to match person images across the visible and infrared modalities, and it plays a crucial role in criminal investigation and intelligent surveillance. To address the weak fine-grained feature extraction capability of current cross-modal person re-identification methods, this paper proposes a person re-identification model based on fused attention and feature enhancement. First, an automatic data augmentation technique is employed to mitigate viewpoint and scale differences among cameras. Second, a cross-attention multi-scale Vision Transformer is introduced to generate more discriminative feature representations by processing multi-scale features. Furthermore, channel attention and spatial attention mechanisms are incorporated to learn the information most useful for distinguishing identities when fusing visible and infrared image features. Finally, a loss function based on adaptive weights is designed, incorporating a hard triplet loss to strengthen the correlations among samples and improve the model's ability to distinguish different persons across visible and infrared images. Extensive experiments on the SYSU-MM01 and RegDB datasets show that the proposed method achieves mAP scores of 68.05% and 85.19%, respectively, outperforming previous works. Moreover, ablation studies and comparative analysis validate the superiority and effectiveness of the proposed model.