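Multi-modal person re-identification based on deep learning: a review
Authors: ZHANG Guoqing, YANG Shan, WANG Hairui, WANG Zhun, YANG Yan, ZHOU Jieqiong
Funding: National Natural Science Foundation of China (62172231); Natural Science Foundation of Jiangsu Province (BK20220107)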
    Abstract:

    Person re-identification (Re-ID), which retrieves the same person across cameras, is a key technology in intelligent video surveillance. Because surveillance scenes are complex, traditional single-modal approaches perform poorly in extreme conditions such as low light and fog. Driven by practical demands and the rapid advancement of deep learning, deep-learning-based multi-modal person Re-ID has received widespread attention. This article reviews recent progress in deep-learning-based multi-modal person Re-ID. It describes the shortcomings of traditional single-modal approaches; summarizes the common application scenarios and advantages of multi-modal person Re-ID, together with the composition of the available datasets; and analyzes in depth the methods proposed for each scenario and how they can be categorized, while exploring current research hotspots and challenges. Finally, it discusses future development trends and potential applications of multi-modal person Re-ID.
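To make the cross-camera retrieval described in the abstract concrete, the following minimal Python sketch ranks a gallery against a query by cosine similarity and computes rank-1 accuracy. It assumes that some hypothetical encoder, not specified here, has already mapped the query and the gallery images into one shared feature space. For example, the query might be an infrared image and the gallery visible-light images. The names `rank_gallery` and `rank1_accuracy` and the toy feature vectors are illustrative only and are not a method from this review. Following a common evaluation convention, the sketch discards gallery images that show the query's identity and were taken by the query's own camera, so a correct hit must be a cross-camera match.

```python
import math
from typing import List, Sequence, Tuple

# A gallery entry: (person_id, camera_id, feature_vector).
GalleryEntry = Tuple[str, int, Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_gallery(query_feat: Sequence[float], query_pid: str, query_cam: int,
                 gallery: List[GalleryEntry]) -> List[Tuple[str, int, float]]:
    """Return (person_id, camera_id, score) tuples sorted by descending similarity.

    Gallery images of the query's identity taken by the query's own camera are
    dropped, so a correct hit must be a cross-camera match.
    """
    scored = [
        (pid, cam, cosine(query_feat, feat))
        for pid, cam, feat in gallery
        if not (pid == query_pid and cam == query_cam)
    ]
    return sorted(scored, key=lambda item: item[2], reverse=True)


def rank1_accuracy(queries: List[Tuple[Sequence[float], str, int]],
                   gallery: List[GalleryEntry]) -> float:
    """Fraction of queries whose top-ranked gallery entry has the query's identity."""
    hits = 0
    for feat, pid, cam in queries:
        ranking = rank_gallery(feat, pid, cam, gallery)
        if ranking and ranking[0][0] == pid:
            hits += 1
    return hits / len(queries)


if __name__ == "__main__":
    # Toy 3-D features standing in for the output of a hypothetical shared encoder:
    # the query could be an infrared image, the gallery visible-light images.
    gallery = [
        ("A", 1, [0.9, 0.1, 0.0]),  # same identity and camera as the query: dropped
        ("B", 2, [0.1, 0.9, 0.1]),
        ("A", 2, [0.8, 0.2, 0.1]),
    ]
    for pid, cam, score in rank_gallery([1.0, 0.0, 0.0], "A", 1, gallery):
        print(pid, cam, round(score, 3))
```

In this toy run, identity A captured by camera 2 is ranked first, ahead of identity B. A real multi-modal system differs mainly in how the shared embedding is learned, which is the focus of the methods the review surveys. The ranking and evaluation step stays essentially the same.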

Cite this article:

ZHANG Guoqing, YANG Shan, WANG Hairui, WANG Zhun, YANG Yan, ZHOU Jieqiong. Multi-modal person re-identification based on deep learning: a review[J]. Journal of Nanjing University of Information Science & Technology (Natural Science Edition), 2024, 16(4): 437-450

Article metrics
  • Views: 577
  • Downloads: 1287
  • HTML views: 1389
  • Citations: 0
History
  • Received: 2024-04-13
  • Published online: 2024-08-07
  • Publication date: 2024-07-28
