An Overview of Visual Place Recognition Based on Street View Images

ZHANG Nuan; WANG Tao; ZHANG Yan; WEI Yibo; LI Liuwen; LIU Yichen

doi:10.12082/dqxxkx.2025.250137

Journal of Geo-information Science

2025, Vol. 27, Issue 8: 1751-1779

Theory and Methods of Geo-information Science

ZHANG Nuan¹,², WANG Tao¹,²,*, ZHANG Yan¹, WEI Yibo¹, LI Liuwen¹, LIU Yichen³

1. Information Engineering University, Zhengzhou 450001, China
2. National Key Laboratory of Intelligent Spatial Information, Zhengzhou 450001, China
3. Troop 61646, Beijing 100080, China

*WANG Tao (born 1975), male, from Liaocheng, Shandong; Ph.D., professor. His research focuses on aerospace remote sensing engineering and on remote sensing information processing and applications. E-mail: wangtaoynl@163.com

ZHANG Nuan (born 2002), female, from Tongling, Anhui; master's student. Her research focuses on remote sensing image positioning and visual place recognition. E-mail: 1263513899@qq.com

Author Contributions

ZHANG Nuan carried out the literature collection, sorting, and summarization, and wrote and revised the paper; WANG Tao, ZHANG Yan, WEI Yibo, LI Liuwen, and LIU Yichen participated in revising the paper. All authors have read the final version of the manuscript and consented to its submission.

Received date: 2025-03-25

Revised date: 2025-06-08

Online published: 2025-07-23

Supported by

National Key Laboratory of Intelligent Spatial Information Fund (a8235)

Abstract

[Significance] Street View Image-based Visual Place Recognition (SV-VPR) is a geolocation technique that relies on visual feature information. Its core task is to predict and precisely determine the geographic location of an unknown place by analyzing the visual features of street view images. The technique must cope with appearance changes across environmental conditions (e.g., day-night illumination differences and seasonal variation) and with viewpoint differences (e.g., the perspective gap between vehicle-mounted camera images and satellite images), and it achieves accurate recognition by computing image feature similarity, applying geometric constraints, and related means. As an interdisciplinary field spanning computer vision and geographic information science, SV-VPR is closely related to visual localization, image retrieval, and SLAM. It has significant application value in UAV autonomous navigation, high-precision positioning for autonomous driving, geofence construction in cyberspace, and augmented reality scene fusion, and it offers distinct positioning advantages where GPS signals are unavailable. [Analysis] This paper systematically reviews research progress in SV-VPR. First, it introduces the basic concepts and classification of visual place recognition and then discusses in depth the basic concepts and classification methods specific to SV-VPR. Second, it analyzes the key technologies of the field in detail. Third, it comprehensively reviews the datasets relevant to SV-VPR. Fourth, it summarizes the evaluation methods and metrics used in this domain. Finally, it outlines future research directions for SV-VPR.
[Purpose] This review aims to give researchers a systematic account of how SV-VPR has developed, helping them quickly grasp the current state of the field; a comparative analysis of key technologies and evaluation methods to support algorithm selection; and an assessment of open challenges and potential breakthrough directions to inspire innovative research.

Key words: street view image; visual place recognition; visual feature; location prediction; accurate positioning; visual positioning; image retrieval

Cite this article

ZHANG Nuan, WANG Tao, ZHANG Yan, WEI Yibo, LI Liuwen, LIU Yichen. An overview of visual place recognition based on street view images[J]. Journal of Geo-information Science, 2025, 27(8): 1751-1779. DOI: 10.12082/dqxxkx.2025.250137

Conflicts of Interest

All authors declare that they have no conflicts of interest.

