Most Viewed

  • QIN Qiming
    Journal of Geo-information Science. 2025, 27(10): 2283-2290. https://doi.org/10.12082/dqxxkx.2025.250426

    [Objectives] With the rapid increase in the number of Earth observation satellites in orbit worldwide, remote sensing data has been accumulating explosively, offering unprecedented opportunities for Earth system science research to dynamically monitor global change. At the same time, it also brings a series of challenges, including multi-source heterogeneity, scarcity of labeled data, insufficient task generalization, and data overload. [Methods] To address these bottlenecks, Google DeepMind has proposed AlphaEarth Foundations (AEF), which integrates multimodal data such as optical imagery, SAR, LiDAR, climate simulations, and textual sources to construct a unified 64-dimensional embedding field. This framework achieves cross-modal and spatiotemporal semantic consistency for data fusion and has been made openly available on platforms such as Google Earth Engine. [Results] The main contributions of AEF can be summarized as follows: (1) Mitigating the long-standing “data silos” problem by establishing globally consistent embedding layers; (2) Enhancing semantic similarity measurement through a von Mises-Fisher (vMF) spherical embedding mechanism, thereby supporting efficient retrieval and change detection; (3) Shifting complex preprocessing and feature engineering tasks into the pre-training stage, enabling downstream applications to become “analysis-ready” and significantly reducing application costs. The paper further highlights the application potential of AEF in three stages: (1) Initially in land cover classification and change detection; (2) Subsequently in deep coupling of embedding vectors with physical models to drive scientific discovery; (3) Ultimately evolving into a spatial intelligence infrastructure, serving as a foundational service for global geospatial intelligence. 
Nevertheless, AEF still faces several challenges: (1) Limited interpretability of embedding vectors, which constrains scientific attribution and causal analysis; (2) Uncertainties in domain transfer and cross-scenario adaptability, with robustness in extreme environments yet to be verified; (3) Performance advantages that require more empirical validation across regions and independent experiments. [Conclusions] Overall, AEF represents a new direction for research in remote sensing and geospatial artificial intelligence, with breakthroughs in data efficiency and cross-task generalization providing solid support for future Earth science studies. However, its further development will depend on continuous advances in interpretability, robustness, and empirical validation, as well as on transforming the 64-dimensional embedding vectors into widely usable data resources through different pathways.
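
To make the retrieval and change-detection use of unit-sphere embeddings concrete, the following is a minimal sketch rather than AEF's actual pipeline. It assumes embeddings arrive as plain lists of floats; the function names and the `threshold` value are hypothetical, chosen for illustration only.

```python
import math


def normalize(v):
    """Scale a vector to unit length so that it lies on the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Dot product of the unit-normalized vectors; the result lies in [-1, 1]."""
    ua, ub = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(ua, ub))


def most_similar(query, candidates):
    """Return the index of the candidate embedding closest to the query."""
    scores = [cosine_similarity(query, c) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


def has_changed(embedding_t0, embedding_t1, threshold=0.9):
    """Flag a location as changed when its embeddings from two dates diverge.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return cosine_similarity(embedding_t0, embedding_t1) < threshold
```

With real data each vector would have 64 components, one per embedding dimension; the logic is unchanged.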

  • YU Hanyang, LAN Chaozhen, WANG Longhao, WEI Zijun, GAO Tian, WANG Yiqiao, LIU Ruimeng
    Journal of Geo-information Science. 2025, 27(8): 1896-1919. https://doi.org/10.12082/dqxxkx.2025.250052

    [Significance] Multimodal remote sensing image matching has become a fundamental task in integrated Earth observation, enabling precise spatial alignment across heterogeneous image sources. [Progress] As the diversity of sensing modalities, acquisition geometries, and temporal conditions increases, traditional matching frameworks have proven inadequate for capturing complex variations in radiometric responses, geometric configurations, and semantic representations. This technological gap has driven a significant paradigm shift from handcrafted feature engineering to deep learning-based solutions, which now form the core of current research and application development. This paper provides a comprehensive and structured review of recent advances in deep learning methods for multimodal remote sensing image matching, with an emphasis on the evolution of methodological paradigms and technical frameworks. It establishes a clear dual-path classification: the single-session approach and the end-to-end approach. The former selectively replaces or enhances individual components of traditional pipelines, such as feature encoding or similarity estimation, using neural network modules. The latter integrates the entire matching process into a unified network architecture, enabling joint optimization of feature learning, transformation modeling, and correspondence inference within a closed loop. This progression reflects the field's transition from modular adaptation to holistic modeling, revealing a deeper integration of data-driven representation learning with geometric reasoning. The review further examines the development of architectural strategies supporting this evolution, including attention mechanisms, graph-based structures, hierarchical feature fusion, and modality-bridging transformations. These innovations contribute to improved robustness, semantic consistency, and adaptability across diverse matching scenarios. 
Recent trends also demonstrate a growing reliance on pretrained vision foundation models, which provide transferable feature spaces and reduce the dependence on large-scale labeled datasets. In addition to summarizing technical advancements, the paper analyzes representative datasets, performance evaluation strategies, and the current challenges that constrain real-world deployment. These include limited data availability, weak cross-scene generalization, computational inefficiency, and insufficient interpretability. [Prospect] By synthesizing methodological progress with practical demands, the review identifies key directions for future research, including the design of modality-invariant representations, physically-informed neural architectures, and lightweight solutions tailored for scalable, real-time image registration in complex operational environments.

  • ZHANG Nuan, WANG Tao, ZHANG Yan, WEI Yibo, LI Liuwen, LIU Yichen
    Journal of Geo-information Science. 2025, 27(8): 1751-1779. https://doi.org/10.12082/dqxxkx.2025.250137

    [Significance] Street View Image-based Visual Place Recognition (SV-VPR) is a geographical location recognition technology that relies on visual feature information. Its core task is to predict and accurately locate unknown locations by analyzing the visual features of street view images. This technology must overcome challenges such as appearance changes under different environmental conditions (e.g., lighting differences between day and night, seasonal variations) and viewpoint differences (e.g., perspective deviations between vehicle-mounted cameras and satellite images). Accurate recognition is achieved through calculating image feature similarity, applying geometric constraints, and related methods. As an interdisciplinary field of computer vision and geographic information science, SV-VPR is closely related to visual positioning, image retrieval, SLAM, and more. It has significant application value in areas such as UAV autonomous navigation, high-precision positioning for autonomous driving, construction of geographical boundaries in cyberspace, and integration of augmented reality environments. It is particularly advantageous in GPS-denied environments. [Analysis] This paper systematically reviews the research progress of visual location recognition based on street view images, covering the following aspects: First, the basic concepts and classifications of visual place recognition technologies are introduced. Second, the foundational principles and categorization methods specific to street view image-based visual place recognition are discussed in depth. Third, the key technologies in this field are analyzed in detail. Furthermore, relevant datasets for street view image-based visual place recognition are comprehensively reviewed. In addition, evaluation methods and index systems used in this domain are summarized. Finally, potential future research directions for SV-VPR are explored. 
[Purpose] This review aims to provide researchers with a systematic overview of the technological development trajectory of SV-VPR, helping them quickly understand the current research landscape. It also offers a comparative analysis of key technologies and evaluation methods to support algorithm selection, and identifies emerging challenges and potential breakthrough areas to inspire innovative research.

  • LI Junming, HU Yaxuan, WANG Nannan, WANG Siyaqi, WANG Ruolan, LYU Lin, FANG Ziqing
    Journal of Geo-information Science. 2025, 27(7): 1501-1519. https://doi.org/10.12082/dqxxkx.2025.250161

    [Objectives] Classical statistical inference typically relies on the assumptions of large sample sizes and independent, identically distributed (i.i.d.) observations, conditions that spatio-temporal data frequently violate, leading to inherent theoretical limitations in conventional approaches. In contrast, Bayesian spatio-temporal statistical methods integrate prior knowledge and treat all model parameters as random variables, thereby forming a unified probabilistic inference framework. This enables the incorporation of a broader range of uncertainties and offers robustness in modelling small samples and dependent structures, making Bayesian methods highly advantageous and increasingly influential in spatio-temporal analysis. [Progress] Tracing the evolution of these methods, this paper systematically reviews mainstream Bayesian spatio-temporal statistical models from two complementary perspectives: traditional Bayesian statistics and Bayesian machine learning. The former includes Bayesian Spatio-temporal Evolutionary Hierarchical Models, Bayesian Spatio-temporal Regression Hierarchical Models, Bayesian Spatial Panel Data Models, Bayesian Geographically Weighted Spatio-temporal Regression Models, Bayesian Spatio-temporal Varying Coefficient Models, and Bayesian Spatio-temporal Meshed Gaussian Process Models. The latter includes Bayesian Causal Forest Models, Bayesian Spatio-temporal Neural Networks, and Bayesian Graph Convolutional Neural Networks. In terms of application, the review highlights representative studies across domains such as public health, environmental sciences, socio-economics and public safety, as well as energy and engineering. [Prospect] Bayesian spatio-temporal statistical methods need to achieve breakthroughs in multi-source heterogeneous data modeling, integration with deep learning, incorporation of causal inference mechanisms, and optimization of high-performance computing. 
These advances are essential to balance theoretical rigor with practical adaptability and to promote the development of a next-generation spatio-temporal modeling paradigm characterized by causal inference, adaptive generalization, and intelligent analysis.

  • LIU Kang
    Journal of Geo-information Science. 2025, 27(7): 1520-1531. https://doi.org/10.12082/dqxxkx.2025.250196

    [Significance] Human mobility is closely tied to transportation, infectious disease spread, and public safety, making trajectory analysis and modeling a long-standing research focus. While numerous specialized trajectory models, such as interpolation, prediction, and classification models, have been developed using machine learning or deep learning, most are task-specific and trained on localized datasets, limiting their generalizability across tasks, regions, or trajectory data. Recent advances in generative AI have demonstrated the potential of foundation models in NLP and computer vision, motivating the need for a trajectory foundation model capable of learning universal patterns from large-scale mobility data to support diverse downstream applications. [Methods] This paper first reviews the research progress of various specialized trajectory models. It then categorizes trajectory modeling tasks into conventional tasks (e.g., trajectory similarity computation, interpolation, prediction, and classification) and generation tasks (i.e., trajectory generation), and elaborates on recent advances in trajectory foundation models for these two types of tasks. [Conclusions] The paper argues that trajectory foundation models for conventional tasks should enhance not only task generalization but also spatial and data generalization. Trajectory foundation models for generation tasks must address the challenge of spatial generalization, enabling the generation of large-scale trajectory data "from scratch" based on easily obtainable macro-level urban data or features. Furthermore, integrating trajectory data with other data types (e.g., text, maps, and other geospatial data) to construct multimodal geographic foundation models, as well as developing application-oriented trajectory foundation models for fields such as transportation, public health, and public safety, are promising research directions worthy of future exploration.

  • HE Li, WANG Rong
    Journal of Geo-information Science. 2025, 27(9): 2151-2164. https://doi.org/10.12082/dqxxkx.2025.250273

    [Significance] Space is not merely a physical place, but a productive arena of social relations. Social phenomena are inherently endowed with spatial attributes, making the spatial perspective a critical pathway for understanding complex social issues. With the deepening "spatial turn" in the social sciences and continuous advancements in Geographic Information Systems (GIS)—particularly in data acquisition, spatial analysis and modeling, and spatial visualization—GIS has become an essential tool for addressing social issues. However, disciplinary differences in theoretical paradigms, methodological logic, and scale cognition between geography and the social sciences constrain their deeper integration. Existing literature lacks a systematic synthesis of integration trends, underlying challenges, and empowerment pathways, necessitating a comprehensive clarification of fusion mechanisms, core obstacles, and emerging opportunities. [Progress] This paper identifies five key advantages of GIS in empowering social science research: expanding spatial analytical thinking, supporting spatiotemporal data, enhancing survey techniques, enriching representational forms, and strengthening analytical capabilities. We review representative GIS applications in economics, political science, and sociology. From dimensions such as spatial cognition, data capacity, methodological adoption, and research hotspots, we distill application characteristics across these disciplines, revealing both commonalities and differences. While all three disciplines recognize spatial effects, their theoretical orientations shape distinct technical approaches—economics emphasizes causal identification, political science focuses on geopolitical structures, and sociology prioritizes contextual representation. 
Through a three-dimensional analysis (data, methodology, and cognition), we examine three major challenges in addressing social issues: the mismatch between data and research questions, the difficulty of integrating methods with causal mechanisms, and the contextual misalignment of place and scale, which reflect deeper issues of data suitability, methodological coherence, and the validity of spatial reasoning. [Prospects] The advancement of artificial intelligence, especially large models, injects new methodological momentum into GIS-based spatial analysis and brings threefold opportunities for addressing social issues. First, large models are driving spatial analysis from correlation-based description toward transparent causal inference. Second, multi-source data fusion and the generation of "silicon-based samples" help overcome the limitations of traditional survey data. Third, an emerging "space-survey" integrated framework is constructing a "spatial cognitive infrastructure" to support social research. Future efforts should establish a synergistic "large model-spatial analysis" paradigm that integrates these three opportunities. By simultaneously addressing challenges of data matching, method integration, and contextual misalignment, this paradigm can elevate GIS from a supportive tool to a core engine for theory generation and mechanism interpretation. This transformation will enhance the scientific value and practical effectiveness of GIS and spatial analysis in addressing complex social issues, fostering a bidirectional interaction between methodological innovation and theoretical advancement.

  • SHI Shihao, SHI Qunshan, ZHOU Yang, HU Xiaofei, QI Kai
    Journal of Geo-information Science. 2025, 27(7): 1596-1607. https://doi.org/10.12082/dqxxkx.2025.250015

    [Objectives] Small object detection is of great significance in both military and civil applications. However, due to challenges such as low resolution, high noise environments, target occlusion, and complex backgrounds, traditional detection methods often struggle to achieve the necessary accuracy and robustness. The problem of detecting small objects in complex scenes remains highly challenging. Therefore, this paper proposes a hybrid feature and multi-scale fusion algorithm for small object detection. [Methods] First, a Hybrid Conv and Transformer Block (HCTB) is designed to fully utilize local and global context information, enhancing the network's perception of small objects while optimizing computational efficiency and feature extraction capability. Second, a Multi-Dilated Shared Kernel Conv (MDSKC) module is introduced to extend the receptive field of the backbone network using dilated convolutions with varying expansion rates, thereby enabling efficient multi-scale feature extraction. Finally, the Omni-Kernel Cross Stage Model (OKCSM), constructed based on the concepts of Omni-Kernel and Cross Stage Partial, is integrated to optimize the small target feature pyramid network. This approach helps preserve small object information and significantly improves detection performance. [Results] Ablation and comparison experiments were conducted on the VisDrone2019 and TinyPerson datasets. Compared to the baseline model YOLOv8n, the proposed method improves precision, recall, mAP@50, and mAP@50:95 by 1.3%, 3.1%, 3%, and 1.9%, respectively, on VisDrone2019, and by 3.6%, 1.3%, 2.1%, and 0.7%, respectively, on TinyPerson. Additionally, the model size and GFLOPs are only 6.3 MB and 11.3 G, demonstrating its efficiency. Furthermore, compared with classical algorithms such as HIC-YOLOv5, TPH-YOLOv5, and Drone-YOLO, the proposed algorithm demonstrates significant advantages and superior performance. 
[Conclusions] The algorithm effectively improves detection accuracy, confirming its strong performance in addressing small object detection in complex scenes.

  • SUN Baodi, CHEN Keying, CHEN Zhaohui, WANG Chun, YAN Yuxi, TANG Jingchao, LIU Yifeng
    Journal of Geo-information Science. 2025, 27(7): 1671-1686. https://doi.org/10.12082/dqxxkx.2025.250058

    [Significance] Communities are the basic units of a city, and their carbon emission levels and the accuracy of community-scale carbon accounting directly impact the overall effectiveness of emission reduction in the construction industry. This paper reviews the main methods of carbon accounting, evaluates their advantages and disadvantages, and proposes a new approach to enhance the accuracy and comprehensiveness of community carbon accounting using digital twin technology. [Progress] This paper first introduces three traditional carbon accounting methods, namely the carbon emission factor method, the mass balance method, and the direct measurement method, and discusses their applications. It then identifies digital twin technologies suitable for community-scale carbon accounting, including Building Information Modeling (BIM), Geographic Information System (GIS), and the Internet of Things (IoT). The paper analyzes current development trends, including: (i) expanding the scope of carbon accounting to the community level using digital twin technology, (ii) strengthening the integration and interoperability of digital twin systems, and (iii) establishing a community carbon accounting framework grounded in digital twin technology. It further proposes integrating BIM, GIS, and IoT into a unified system based on the city information model to build a comprehensive community carbon emission platform. [Prospect] Looking ahead, the application of digital twin technology holds promise for enabling accurate carbon accounting, emission forecasting, reduction pathway planning, and performance evaluation for communities of varying scales and geographical contexts. Furthermore, with advances in AI technology, it is anticipated that city information models for community carbon accounting will increasingly integrate AI agents, leveraging the power of big data, large models, and high-performance computing, to create intelligent carbon accounting systems for the smart city era.

  • HAO Yuanfei, LIU Zhe, ZHENG Xi, QIAN Yun
    Journal of Geo-information Science. 2025, 27(9): 2070-2085. https://doi.org/10.12082/dqxxkx.2025.250129

    [Objectives] Street space serves as the primary perceptual interface for pedestrians in urban environments, and the visual quality of these spaces plays a crucial role in enhancing their vitality. Traditional evaluation methods often rely on single-objective indicators, making it difficult to effectively link objective environmental features with pedestrians' subjective perceptions. [Methods] This study proposes a novel evaluation framework based on Large Language Models (LLMs), incorporating the style dimension of subjective perception and extending traditional single-indicator quantitative analysis to a comprehensive approach that integrates both quantification and stylization. This framework utilizes Baidu Street View imagery to quantitatively assess two objective indicators, namely green view index and sky view factor, through semantic segmentation techniques. Additionally, it evaluates six subjective indicators, including vegetation diversity, building typology, building continuity, sidewalk usage, roadway usage, and signage usage, by leveraging prompt-optimized LLMs. The study then categorizes street space visual quality features within the research area using the Latent Dirichlet Allocation (LDA) topic model, aiming to explore the spatial characteristics of different streets and identify optimization strategies. [Results] Using Beijing's Xicheng District as the study area, the results reveal spatial distribution patterns of vegetation density and sky openness, along with pedestrians' subjective evaluations of indicators such as vegetation diversity and building type. Cluster analysis identified comprehensive service streets centered around Xidan North Street, characteristic streets centered around Xihuangchenggen South Street, and mixed-type streets centered around Lingjing Hutong. [Conclusions] This study innovatively introduces a large language model with human-like perceptual capabilities, enhancing its performance through prompt engineering. 
The resulting framework enables efficient and integrated evaluation of street visual quality by combining both objective and subjective factors. This approach provides a practical reference for large-scale, automated analysis of street view imagery.
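
As a rough illustration of how an objective indicator can be read off a semantic segmentation result, the sketch below computes the share of pixels belonging to a class. It is not the study's implementation: the label values `VEGETATION` and `SKY` are hypothetical, the mask is assumed to be a list of rows of integer class labels, and the sky pixel share is only a simple proxy for a sky view factor.

```python
# Hypothetical class labels; a real segmentation model defines its own.
VEGETATION = 1
SKY = 2


def class_pixel_ratio(mask, class_ids):
    """Return the fraction of pixels in `mask` whose label is in `class_ids`.

    `mask` is a list of rows, each row a list of integer class labels.
    """
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("mask contains no pixels")
    hits = sum(1 for row in mask for label in row if label in class_ids)
    return hits / total


def green_view_index(mask):
    """Share of vegetation pixels in one street view image."""
    return class_pixel_ratio(mask, {VEGETATION})


def sky_pixel_share(mask):
    """Share of sky pixels, used here as a simple proxy for sky openness."""
    return class_pixel_ratio(mask, {SKY})
```

Averaging these ratios over the images sampled along a street segment would give one value per segment for mapping.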

  • ZHAO Luying, ZHOU Yang, HU Xiaofei, HUANG Gaoshuang, GAN Wenjian, HOU Mingbo
    Journal of Geo-information Science. 2025, 27(10): 2293-2315. https://doi.org/10.12082/dqxxkx.2024.240262

    [Significance] Cross-view geolocalization is the process of using a satellite image with coordinate metadata as reference to determine the geographic coordinates of an unknown ground-view image. This problem is often viewed as an image matching task, where an overhead satellite image is segmented into a number of square blocks of satellite patches, and the ground image is matched with candidate satellite patches to retrieve the most similar satellite patch, using the position of the center pixel in that patch as the query location. [Progress] With the development of cross-view geolocalization, the technique has been extended to fine-grained metric localization of ground imagery, i.e., identifying which image coordinates in a satellite patch correspond to a ground-measured location. Given that satellite images have global coverage and are easy to obtain, their application as reference images in image positioning has significantly broadened the application scope of image geolocation technology. This trend has prompted growing academic interest and attention to cross-view geolocalization research. Along with the development of various algorithmic techniques, cross-view geo-localization has evolved from the manual extraction of features, which was mainly based on the geometric features of buildings, to deep learning approaches that are applicable to richer scenarios, such as suburban and urban areas. The specific localization idea has progressed from the image-level cross-view localization, which uses the retrieval method to directly mark the retrieved center coordinate of the satellite image as the location of the ground image, to pixel-level fine-grained localization, which more accurately assigns the coordinates of the corresponding pixel location of the satellite image to the ground image. 
However, the drastic change in viewing angle between ground and satellite images produces large differences in visual content, making cross-view image localization more challenging. To improve the accuracy of cross-view geolocalization, scholars have made algorithmic improvements in areas such as representation learning and metric calculation. To handle the large viewpoint differences, some scholars study specialized geometric transformations, image generation, and other viewpoint conversion methods between cross-view images. Others improve localization accuracy with the help of directional information, UAV imagery used as an intermediate viewpoint, and more. [Purpose] This paper summarizes the development of cross-view geolocalization, the different methods for improving accuracy, the datasets involved, and the evaluation methods at different stages. On this basis, we discuss future development trends and provide a summary.

  • WANG Kaiqing, XIAO Yanyan, ZHANG Zhiwei, LI Yongle
    Journal of Geo-information Science. 2025, 27(7): 1738-1750. https://doi.org/10.12082/dqxxkx.2025.250148

    [Objectives] Points of Interest (POIs) have dual characteristics as geospatial entities and carriers of cultural information, serving as the data foundation for analyzing and identifying regional cultural expressions and functional traits. Identifying and analyzing the types and characteristics of tourism cultural scenes along the Grand Canal is of great significance for achieving differentiated and sustainable cultural tourism development. [Methods] By integrating POI data with scene theory, spatial entities are associated with cultural values, and quantitative statistics are combined with qualitative configuration analysis. A tourism-cultural amenity database was established using 476,968 POI records, categorized into 6 major categories and 24 sub-categories. The Delphi method was employed to determine scores for each subcategory related to tourism amenity scenes, which were then used to calculate the performance scores of tourism cultural scenes. Descriptive statistical analysis, K-means clustering, and hierarchical clustering were applied to identify types of tourism-cultural scenes. The clustering results were visualized on maps. Meanwhile, the characteristics, formation mechanisms, and corresponding countermeasures of these scene types were further analyzed. [Results] (1) The Jiangsu section of the Grand Canal exhibits distinctive local tourism-cultural characteristics, with strong regional identity and attractiveness. However, significant disparities exist in tourism-cultural value orientations, particularly in subcategories such as locality, glamour, exhibitionism, utilitarianism, and charisma, highlighting the heterogeneous features of tourism-cultural scenes in this area. (2) Cluster analysis classified 34 counties (cities or districts) along the Jiangsu section into four types: local scenes (10 regions), utilitarian scenes (8 regions), comfortable scenes (13 regions), and charming scenes (3 regions). 
Discriminant analysis validated the reliability of these clustering results. Each of the four scene types exhibits distinct characteristics. (3) The types of tourism-cultural scenes are influenced by the combined effects of multiple factors (economic development, urbanization, population, fiscal policy, transportation, and tourism resources), which can be summarized into three configuration-based influence paths. [Conclusions] This study introduces scene theory into cultural tourism research based on POI big data, offering a novel approach to promoting regionally differentiated and sustainable development of cultural tourism.

  • LI Xiao, WANG Shaohua, LIANG Haojian, ZHOU Liang, LIU Chang, WANG Runqiao, SU Cheng
    Journal of Geo-information Science. 2025, 27(8): 1822-1840. https://doi.org/10.12082/dqxxkx.2025.250144

    [Objectives] Sustainable development is an important issue for countries worldwide, encompassing key aspects such as sustainable transportation systems and inclusive, sustainable urbanization. As a crucial component of urban public service infrastructure, the public transportation network serves as a cornerstone of a city's stable operation, with the distribution of its stops and routes directly influencing residents' travel patterns. However, existing studies mainly focus on accessibility analysis, site selection optimization, and spatial coupling with factors such as population and land use, while lacking in-depth optimization approaches and clear mechanisms that address spatial heterogeneity and facility redundancy. [Methods] Taking Beijing as a case study, with a focus on Dongcheng and Xicheng Districts, this study constructs a system of influencing factors based on multi-source data, including public transportation networks, topography, and economic indicators, and employs the XGBoost machine learning method to reveal the impact weights of these driving factors on the distribution of bus stops. On this basis, a mathematical model incorporating stop redundancy is proposed to optimize the spatial layout of upstream and downstream stops, producing a spatial optimization map of bus stops in Beijing. [Results] The findings indicate that: (1) There is an imbalance in the distribution of public transportation facilities in Beijing, with the proportion of the population having convenient access to public transportation differing by more than 30% between central and peripheral urban areas. (2) Among the 19 influencing factors, population density is the key driving factor, accounting for 27.77%, while the number of scenic spots and parking facilities have minimal impact, with feature importance scores below 0.5%. 
(3) Compared to the p-median model, the proposed redundancy optimization model significantly reduces the redundancy of optimized stops while maintaining performance in minimizing weighted distance. The optimized stop layout is more evenly distributed along existing bus routes. [Conclusions] These findings provide valuable reference and theoretical support for the layout of bus stops and other public service facilities, contributing to the efficient utilization of public resources and promoting sustainable urban development.
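
For readers unfamiliar with the p-median baseline mentioned above, here is a minimal brute-force sketch of that baseline only. It assumes a one-dimensional toy setting (positions along a route) and invented demand weights; the paper's redundancy term is not described in the abstract and is not reproduced here.

```python
from itertools import combinations


def p_median(demand, candidates, p):
    """Choose p sites that minimize the total weighted distance to demand points.

    demand: list of (position, weight) pairs.
    candidates: list of candidate site positions.
    Exhaustive search, so only suitable for very small instances.
    """
    best_sites, best_cost = None, float("inf")
    for sites in combinations(candidates, p):
        # Each demand point is served by its nearest selected site.
        cost = sum(w * min(abs(x - s) for s in sites) for x, w in demand)
        if cost < best_cost:
            best_sites, best_cost = sites, cost
    return best_sites, best_cost


# Toy data: positions in arbitrary units, weights standing in for population.
demand = [(0, 10), (1, 5), (9, 8), (10, 2)]
sites, cost = p_median(demand, candidates=[0, 1, 9, 10], p=2)
print(sites, cost)  # (0, 9) 7
```

Realistic instances are usually solved with integer programming or heuristics, since the number of site combinations grows quickly with the number of candidates.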

  • PING Yifan, LU Jun, GUO Haitao, HOU Qingfeng, ZHU Kun, SANG Zehao, LIU Tong
    Journal of Geo-information Science. 2025, 27(7): 1608-1623. https://doi.org/10.12082/dqxxkx.2025.250051

    [Objectives] Cross-view image geolocation refers to a technology that determines the geographical location of an image by matching it with reference images taken from different perspectives and possessing precise location information. This technology plays a crucial role in real-world applications such as Unmanned Aerial Vehicle (UAV) navigation, environmental monitoring, and target positioning. Currently, most deep learning-based cross-view image retrieval and geolocation methods for drone-satellite tasks rely heavily on supervised learning. However, the scarcity of high-quality labeled data presents a significant limitation, hindering the generalization capability of these models. Moreover, existing methods often fail to effectively model the spatial layout of images, making it difficult to bridge the substantial domain gap between cross-view images, thereby limiting the accuracy and robustness of geolocation tasks. [Methods] To address these challenges, this paper proposes a novel cross-view image retrieval and localization architecture called DINO-MSRA. The architecture first employs the DINOv2 large model framework, fine-tuned by Conv-LoRA, as the feature encoder. This enhances the model's feature extraction capabilities with fewer parameters, improving both efficiency and accuracy. Second, we design a spatial relation-aware feature aggregator based on the Mamba module (MSRA) to more effectively aggregate image features. By embedding spatial configuration features into the global descriptor, this module significantly improves the model's performance in cross-view matching tasks, especially in complex scenarios where spatial relationships between objects are crucial. Finally, the InfoNCE loss function is adopted to train the model, optimizing contrastive learning and ensuring more accurate retrieval and localization results. [Results] Extensive comparative and ablation experiments were conducted on the University-1652 and SUES-200 datasets. 
The experimental results show that for drone-view target localization (drone→satellite) and drone navigation (satellite→drone) tasks, the proposed method achieves R@1 accuracies of 95.14% and 97.29%, respectively, on the University-1652 dataset, representing improvements of 0.68% and 1.14% over the current best algorithm, CAMP. On the SUES-200 dataset at an altitude of 150 meters, R@1 accuracies reach 97.2% and 98.75%, which are 1.8% and 2.5% higher than CAMP, respectively. Moreover, the proposed method requires significantly fewer parameters than existing algorithms, only 19.2% of those used by Sample4Geo. [Conclusions] In summary, the proposed DINO-MSRA architecture outperforms current state-of-the-art methods in cross-view image matching, achieving higher accuracy and faster inference speed. These results demonstrate its robustness and practical application potential in challenging real-world scenarios.
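
    As an illustration of the training objective and the R@1 metric named in this abstract, the following is a minimal sketch in plain Python. It assumes a square similarity matrix whose diagonal holds the matching drone-satellite pairs; the function names, the temperature value, and the one-directional form of the loss are assumptions made for illustration and are not taken from the DINO-MSRA paper.

```python
import math


def info_nce(sim, temperature=0.1):
    """Mean one-directional InfoNCE loss.

    sim[i][j] is the similarity between query i and reference j;
    the matching reference for query i is assumed to sit at index i.
    """
    losses = []
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        peak = max(logits)  # subtract the max for numerical stability
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


def recall_at_1(sim):
    """Fraction of queries whose top-ranked reference is the true match."""
    hits = 0
    for i, row in enumerate(sim):
        best = max(range(len(row)), key=row.__getitem__)
        hits += int(best == i)
    return hits / len(sim)


# Toy similarity matrices: one well aligned, one with the pairs swapped.
aligned = [[0.9, 0.1], [0.2, 0.8]]
swapped = [[0.1, 0.9], [0.8, 0.2]]
```

    A lower loss corresponds to a similarity matrix in which each query scores highest against its own reference, which is also what R@1 counts.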

  • DU Pei, SHEN Yangjie, LIU Zhenxia, YU Zhaoyuan
    Journal of Geo-information Science. 2025, 27(9): 2106-2116. https://doi.org/10.12082/dqxxkx.2025.250220

    [Objectives] Global climate change, accelerating sea-level rise, and intensifying anthropogenic pressures are rendering the intricate human-land-sea nexus within coastal zones increasingly complex, sensitive, and vulnerable. This growing challenge underscores the urgent need for integrated coastal research frameworks capable of synthesizing environmental sensing, dynamic process simulation, and scenario projection. Addressing this critical gap, Digital Twin (DT) technology emerges as a transformative paradigm. By integrating multi-source data, sophisticated models, and domain knowledge into intelligent systems, DT offers unprecedented potential for creating precise virtual replicas and enabling intelligent management of complex coastal socio-ecological systems. [Analysis] This paper systematically analyzes the state of coastal zone digitalization, highlighting the pressing need for robust digital frameworks that can effectively represent and analyze the strong coupling between natural processes and human activities under multifaceted pressures. Building on this foundation, we propose a novel conceptual framework and implementation pathway for constructing a Digital Twin Coastal Zone (DTCZ). This framework explicitly positions land-sea interface processes as the foundational scenario and centers on human-land-sea feedback mechanisms as the core analytical thread. The proposed DTCZ system architecture is articulated across four pivotal dimensions: (1) Comprehensive information integration and knowledge aggregation; (2) Simulation of natural processes integrated with coupled human-nature decision support; (3) Synergistic short-term forecasting and long-term monitoring capabilities; and (4) Realistic multidimensional representation enabling intelligent interaction. 
We critically discuss the key technological enablers supporting this vision, encompassing coastal data governance and fusion, multi-scale scenario modeling, predictive analytics for critical coastal elements, persistent long-term monitoring strategies, and the development of the integrated DTCZ platform itself. At its core, the envisioned DTCZ leverages spatiotemporally fused multi-source data as its foundation and prioritizes enhanced scenario simulation and intervention capabilities. [Prospects] This framework is designed to overcome the limitations, such as fragmented data and limited predictive power, that constrain traditional coastal digital systems. By significantly advancing the computational tractability and overall manageability of coastal systems, the DTCZ paradigm offers a powerful new methodological tool and operational framework. It holds strong potential for supporting sustainable coastal development and modernizing governance structures in the face of ongoing climate change, providing a robust platform for evidence-based planning and adaptive management.

  • PAN Jiechen, XING Shuai, CAO Jiayin, DAI Mofan, HUANG Gaoshuang, ZHI Lu
    Journal of Geo-information Science. 2025, 27(9): 1999-2020. https://doi.org/10.12082/dqxxkx.2025.250151

    [Significance] With rapid advances in remote sensing, surveying and mapping, and autonomous driving technologies, 3D point cloud semantic segmentation, a core technology of digital twin systems, is attracting increasing research attention. Airborne point cloud semantic segmentation is regarded as a key technology for enhancing the automation and intelligence of 3D geographic information systems. [Analysis] Driven by deep learning and sensing technologies such as LiDAR, depth cameras, and 3D laser scanners, point cloud semantic segmentation can automatically classify and accurately recognize large-scale point cloud data through precise feature extraction and efficient model training. However, compared with typical high-density, category-balanced point cloud datasets (e.g., those used in indoor scenes, autonomous driving, or robotics), airborne point clouds present significant challenges in areas such as registration and feature extraction. These challenges stem from their unique characteristics, including large-scale 3D terrain coverage, dynamic platform motion errors, considerable variations in ground-object spatial scales, and complex occlusions. Currently, deep-learning-based airborne point cloud semantic segmentation is still in its early stages. Due to heterogeneous data acquisition methods, varying resolutions, and diverse attribute information, there remains a gap between existing research and practical algorithm deployment. [Progress] This paper provides a comprehensive review of the field, covering adaptive algorithms, datasets, performance metrics, and emerging methods along with their advantages and limitations. It also offers quantitative comparisons with existing technologies, evaluating representative methods in terms of precision and applicability. 
[Prospect] A thorough analysis suggests that breakthroughs in airborne point cloud semantic segmentation necessitate systematic research innovations across multiple dimensions, including feature representation, multimodal fusion, few-shot learning, algorithm interpretability, and large-scale model benchmarking. These advancements are essential not only for overcoming current bottlenecks in real-world applications but also for establishing robust technical foundations for critical use cases such as digital twin cities and disaster emergency response.

  • CHEN Lijia, CHEN Honghui, XIE Yanqiu, HE Tianyou, YE Jing, WU Linhuang
    Journal of Geo-information Science. 2025, 27(7): 1624-1637. https://doi.org/10.12082/dqxxkx.2025.250092

    [Objectives] High-resolution remote sensing image segmentation provides essential data support for urban planning, land use, and land cover analysis by accurately extracting terrain information. However, traditional methods face challenges in predicting object categories at the pixel level due to the high computational cost of processing high-resolution images. Current segmentation approaches often divide remote sensing images into a series of standard blocks and perform multi-scale local segmentation, which captures semantic information at different granularities. However, these methods exhibit weak feature interaction between blocks, as they do not consider contextual prior knowledge, ultimately reducing local segmentation performance. [Methods] To address this issue, this paper proposes a high-resolution remote sensing image segmentation framework named CATrans (Cross-scale Attention Transformer), which combines cross-scale attention with a semantic-based visual Transformer. CATrans first predicts the segmentation results of local blocks and then merges them to produce the final global image segmentation. It introduces contextual prior knowledge to enhance local feature representation. Specifically, we propose a cross-scale attention mechanism to integrate contextual semantic information with multi-level features. The multi-branch parallel structure of the cross-scale attention module enhances focus on objects of varying granularities by analyzing shallow-deep and local-global dependencies. This mechanism aggregates cross-spatial information across various dimensions and weights multi-scale kernels to strengthen multi-level feature representations, enabling the model to avoid deep stacking and multiple sequential processes. Additionally, a semantic-based visual Transformer is adopted to couple multi-level contextual semantic information. Spatial attention is used to reinforce these semantic representations. 
The multi-level contextual information is grouped to form abstract semantic concepts, which are then fed into the Transformer for sequence modeling. The self-attention mechanism within the Transformer captures dependencies between different positions in the input sequence, thereby enhancing the correlation between contextual semantics and spatial positions. Finally, enhanced contextual semantics are generated through feature mapping. [Results] This paper conducts comparative experiments on the DeepGlobe, Inria Aerial, and LoveDA datasets. The results show that CATrans outperforms existing segmentation methods, including Discrete Wavelet Smooth Network (WSDNet) and Integrating Shallow and Deep Network (ISDNet). CATrans achieves a Mean Intersection over Union (mIoU) of 76.2%, 79.2%, and 54.2%, and a Mean F1 Score (mF1) of 86.5%, 87.8%, and 66.8%, with inference speeds of 38.1 FPS, 13.2 FPS, and 95.22 FPS on the respective datasets. Compared to the best-performing method, WSDNet, CATrans improves segmentation performance across all classes, with mIoU gains of 2.1%, 4.0%, and 5.3%, and mF1 gains of 1.3%, 1.8%, and 5.6%. [Conclusions] These findings highlight that the proposed CATrans framework significantly enhances high-resolution remote sensing image segmentation by incorporating contextual prior knowledge to improve local feature representation. It achieves an effective balance between segmentation performance and computational efficiency.
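
    For readers unfamiliar with the two metrics reported here, the sketch below shows how mIoU and mF1 are commonly derived from a per-class confusion matrix. The helper names and the toy matrix are illustrative assumptions; the abstract does not describe the authors' evaluation code.

```python
def per_class_iou_f1(conf):
    """Return (ious, f1s) from a square confusion matrix.

    conf[t][p] counts pixels whose true class is t and predicted class is p.
    """
    n = len(conf)
    ious, f1s = [], []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                       # true c, predicted otherwise
        fp = sum(conf[r][c] for r in range(n)) - tp  # predicted c, true otherwise
        iou_denominator = tp + fp + fn
        f1_denominator = 2 * tp + fp + fn
        ious.append(tp / iou_denominator if iou_denominator else 0.0)
        f1s.append(2 * tp / f1_denominator if f1_denominator else 0.0)
    return ious, f1s


def mean(values):
    return sum(values) / len(values)


# Toy two-class confusion matrix (rows: ground truth, columns: prediction).
toy_conf = [[8, 2], [1, 9]]
toy_ious, toy_f1s = per_class_iou_f1(toy_conf)
toy_miou, toy_mf1 = mean(toy_ious), mean(toy_f1s)
```

    Because F1 equals 2·IoU / (1 + IoU) for each class, mF1 is always at least as large as mIoU, which matches the pattern in the figures quoted above.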

  • ZHU Ge, ZHANG Zheng, CAO Lianshuai, MA Kunyang, XU Xinyue, CHENG Yi
    Journal of Geo-information Science. 2025, 27(9): 2165-2176. https://doi.org/10.12082/dqxxkx.2025.250207

    [Objectives] Map compilation involves professional operations such as element selection, symbolization, and notation configuration. However, the process is often complex and inefficient. Leveraging Large Language Models (LLMs), text-to-map technology significantly simplifies the mapping process, lowers the barrier to entry for non-experts, and improves mapping efficiency. Nevertheless, challenges remain, including heavy reliance on manual debugging and fragmented tool invocation. [Methods] This paper proposes a DeepSeek-based method for constructing text-to-map agents, which automates the entire process from user input to visualization output. This is achieved through the decomposition of natural language instructions and autonomous adaptation of tools. Centered on the DeepSeek model, the approach associates cartographic elements with specialized tools and usage descriptions, analyzes module structures and collaboration mechanisms, and organizes tools into five categories. By interpreting user instructions and reasoning through task-oriented chains of thought, the agent invokes appropriate visualization tools to achieve cross-modal mapping from natural language to maps, enabling autonomous task reasoning and automated map generation. [Results] To evaluate the agent's effectiveness, two types of mapping tasks (based on local map data and online map services) were conducted using the DeepSeek-V3-0324 and R1 models as decision-making cores. The experiments demonstrated that the agent could autonomously complete mapping tasks from natural language using both local and tile-based data. Local map visualization experiments confirmed the agent's ability to reuse tools effectively in low-complexity scenarios. Tile-based map visualization experiments indicated the agent's capability in handling high-complexity scenarios involving multi-toolchain invocations. 
It accurately decomposed subtasks, assigned appropriate tools, and performed structured string-based input variable transmission or direct invocation without variables, all presented to users in a semi-transparent manner. Across forty repeated experiments, the V3 model outperformed the R1 model, achieving 6.56 times greater execution efficiency with an average processing speed of approximately 6.29 seconds per step, and demonstrated better modular adaptability with the LangChain agent framework. [Conclusions] The proposed construction method validates the feasibility of using DeepSeek-based agents for intelligent cartography. The V3 model exhibits strong potential in this field, with its performance (6.29 s/step) comparable to that of professional cartographers. The text-to-map intelligent agent significantly reduces the entry barrier for map creation, promotes the broader adoption of mapping tools in everyday use, and provides a valuable technical reference for integrating autonomous cartography with professional software platforms such as ArcGIS and QGIS.

  • HU Sheng, WANG Zhenhua, XING Hanfa, LIU Wenkai, LIU Yefei, LI Jiaju, ZHANG Guanheng
    Journal of Geo-information Science. 2025, 27(7): 1687-1703. https://doi.org/10.12082/dqxxkx.2025.250064

    [Objectives] China's transport sector is one of the fastest-growing sources of carbon emissions, with Road Traffic Carbon Emissions (RTCE) accounting for a large share. The way an urban road network is laid out may strongly influence RTCE, yet existing studies often ignore spatial non-stationarity and nonlinear effects. [Methods] This study takes 302 urban functional areas in China as its research object. The experimental data include 2019 urban road network data, road traffic carbon emission grid data, and population and GDP grid data. First, ArcGIS and the OSMnx package were used to visualize the road traffic carbon emissions, road grade distribution, traffic network density, and traffic network structure indicators of the 302 urban functional areas, and the distribution characteristics of urban RTCE and urban road networks were analyzed. Then, the fitting performance of the OLS, GWR, and MGWR models was compared to identify the best model for relating road-network form to RTCE. Finally, based on the Multi-scale Geographically Weighted Regression (MGWR) model and SHAP analysis, the impact mechanism of road network morphology on RTCE was explored. [Results] ① The spatial distribution of RTCE exhibits a multi-center pattern, with emissions peaking in core areas such as the Beijing-Tianjin-Hebei region (1 003.604 t/km²), the Yangtze River Delta (849.074 t/km²), the Pearl River Delta (1 615.291 t/km²), and provincial capital cities (1 168.886 t/km²), and gradually decreasing toward the surrounding areas. The RTCE levels in the eastern region are generally higher than those in the central and western regions. In terms of the spatial distribution characteristics of road network morphology, the density of the traffic network and road hierarchy distribution resemble the RTCE distribution pattern. The southern regions exhibit higher Road Direction Richness (RDR), while the northern regions have higher road Grid Coefficients (GC). 
② The impact of road network morphology on road traffic carbon emissions shows significant spatial heterogeneity. For example, Road Network Density (RND) has a more pronounced impact in the Pearl River Delta (0.636), while Road Direction Richness (RDR) has a greater influence in the Yangtze River Delta (0.259). Additionally, different road network morphological indicators vary considerably in their impact on RTCE across regions. ③ Road network morphology exhibits spatial non-stationarity and nonlinear effects on RTCE. For instance, the bandwidth of RND is only 45, whereas that of RCR is 215, indicating that different morphological characteristics affect RTCE at different spatial scales. In the SHAP analysis based on machine learning, which accounts for nonlinear impacts, RND is identified as the most important feature influencing road traffic carbon emissions. [Conclusions] This study employs the MGWR model and SHAP method to reveal the spatial non-stationarity and nonlinear influence mechanisms of road network morphology on road traffic carbon emissions. The results indicate that the impact of road network characteristics on traffic carbon emissions varies significantly across different regions. These differences are reflected not only in spatial distribution but also in the underlying mechanisms of influence. Therefore, when formulating low-carbon road network planning strategies, it is essential to fully consider the spatial heterogeneity, non-stationarity, and nonlinear characteristics of the road network. A comprehensive analysis from the multidimensional perspective of "density-hierarchy-structure" is recommended to promote low-carbon urban transportation. These findings provide a scientific basis for urban transportation planning and low-carbon development, contributing to sustainable urban development, improved traffic efficiency, and enhanced quality of life for residents.

  • GUO Xuan, ZHANG Jinxue, WEI Yibing, YU Shutong, LIU Junnan, LIU Haiyan, XU Daozhu, XU Mingliang
    Journal of Geo-information Science. 2025, 27(12): 2789-2801. https://doi.org/10.12082/dqxxkx.2025.250239

    [Objectives] The trajectory knowledge graph effectively captures the deep semantic relationships between trajectories and geospatial entities, offering significant advantages in revealing complex associated information. However, traditional methods for constructing knowledge graphs from domain-specific data sources rely heavily on expert knowledge, involve extensive data preprocessing and entity-relationship extraction, and require high levels of professional expertise. [Methods] To address these challenges, this paper proposes a trajectory knowledge graph construction method that supports natural language-driven task execution through prompt learning with large language models. First, a prompt strategy for the preprocessing task is designed to guide large language models in automatically generating data processing code for cleaning abnormal trajectories. Second, a two-level system prompt strategy is developed to enable tool invocation by matching and calling the trajectory knowledge extraction tool. This strategy allows non-expert users to complete the graph construction process using simple natural language instructions, significantly reducing reliance on programming skills and deep semantic understanding. [Results] To evaluate the feasibility and effectiveness of the proposed prompt strategies, a set of test sentences was created for trajectory preprocessing and entity-relation extraction tasks. Real-world ship and vehicle trajectory datasets were used to support knowledge graph construction. Experiments conducted on two representative large language models, Tongyi Qianwen and Baidu Qianfan, achieved average accuracy rates exceeding 75% and 80%, respectively, demonstrating strong generalization ability and practical value. 
[Conclusions] This study verifies the effectiveness of combining large language models with prompt learning in constructing trajectory knowledge graphs with low technical barriers, demonstrating the strong generalization and application value of the proposed prompt strategy.

  • CUI Liqun, CHU Rubo, JIN Haibo
    Journal of Geo-information Science. 2026, 28(2): 420-435. https://doi.org/10.12082/dqxxkx.2026.250482

    [Objectives] This paper tackles the challenges of small target detection, complex background handling, and dense target distribution in remote sensing imagery, proposing an advanced solution to enhance detection performance. The research aims to improve the accuracy and robustness of target detection in high-resolution remote sensing images by addressing limitations in existing methods, particularly in scenarios with intricate backgrounds and densely packed or small-sized targets. [Methods] Based on the YOLOv11 framework, this paper proposes an advanced remote sensing object detection method that effectively integrates multi-scale feature collaboration and scenario-aware mechanisms. To achieve superior performance, three novel modules are specifically designed: the Parallel Kernel Feature Fusion Module (PKFFM), which performs cross-scale feature integration through parallel convolution kernels to significantly enhance feature representation capability; the Cascaded Dual-Branch Attention Module (CDBAM), which sequentially emphasizes critical spatial and channel-wise information to refine feature extraction; and the Scenario-Aware Module (SAM), which enables the network to better capture and utilize global contextual information in complex remote sensing scenes. Furthermore, the RS-WIoU (Remote Sensing Wise Intersection over Union) loss function is introduced to address the challenges of high-resolution imagery and varying object scales, leading to more accurate bounding box regression and substantially improved overall detection performance. [Results] To comprehensively validate the effectiveness of the proposed method, extensive experiments are conducted on three widely recognized high-resolution remote sensing datasets: TGRS-HRRSD, NWPU VHR-10, and DOTA-v1.0. 
The experimental results demonstrate that the proposed approach achieves outstanding mean Precision (mP) of 97.3%, 87.3%, and 84.3% on the respective datasets, significantly surpassing the baseline YOLOv11 model with relative improvements of 2.1%, 3.8%, and 2.9%. In terms of the more comprehensive metric mAP50-95, the proposed method further delivers gains of 3.0%, 1.2%, and 1.5% across the three datasets. Beyond superior accuracy, the model exhibits remarkable lightweight characteristics and strong robustness against complex backgrounds, varying scales, and dense object distributions, consistently outperforming other state-of-the-art remote sensing object detection algorithms. [Conclusions] The proposed method dramatically improves both precision and robustness in high-resolution remote sensing image object detection by synergistically combining the PKFFM, CDBAM, SAM, and RS-WIoU loss function, delivering a highly efficient and effective solution for real-world remote sensing applications. This collaborative framework enables better multi-scale feature fusion, enhanced channel and spatial attention, adaptive scenario understanding, and more accurate localization, leading to state-of-the-art results. Future work will focus on validating these modules on additional datasets and downstream tasks to further strengthen generalization performance and drive continued innovation in remote sensing technology.

  • LI Zihao, LI Xueying, LI Yongqing, YAN Jun, SUN Yuanyuan
    Journal of Geo-information Science. 2025, 27(8): 1920-1935. https://doi.org/10.12082/dqxxkx.2025.250186

    [Objectives] Change detection is a critical and challenging task in remote sensing image analysis, playing an increasingly important role in Earth observation. Although deep learning-based change detection techniques have achieved promising results, issues such as false detection and missed detection persist, especially in detailed and edge regions. [Methods] To address these challenges, this paper proposes a Multi-Scale Wavelet Transform Attention Network (WTANet) that integrates spatial-domain contextual information with frequency-domain high-frequency details. By leveraging complementary features from both spatial and frequency domains, and guiding the network through multi-scale feature differences, WTANet enhances the model’s ability to perceive subtle changes from both global semantic and local detail perspectives. WTANet introduces the Detail Capture Wavelet Module (DCWM), which combines the frequency-domain properties of wavelet transforms with attention mechanisms to effectively extract coarse-to-fine information from remote sensing images. This helps recover high-frequency details typically lost due to convolution or pooling operations, thereby improving the network's capability to detect fine-grained changes. Additionally, the Feature Difference Enhancement Decoder (FDED) emphasizes differences between multi-scale features, enriching the feature representations and boosting the model’s performance in complex scenarios. [Results] Experimental results on three high-resolution remote sensing change detection datasets, CDD, LEVIR-CD, and S2Looking, demonstrate that WTANet achieves F1 scores of 97.52%, 91.24%, and 65.43%, respectively. Compared with representative change detection models such as SNUNet and BIT, WTANet exhibits superior performance in detail and edge detection. [Conclusions] The WTANet proposed in this study effectively improves the accuracy of remote sensing image change detection by integrating spatial and frequency domain information. 
This approach not only provides new insights for future research in remote sensing image analysis, but also offers valuable technical references for urban planning, environmental monitoring, and related fields.
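
    To make the idea of frequency-domain high-frequency details concrete, the sketch below performs one level of a 2-D Haar wavelet decomposition in plain Python. The abstract does not state which wavelet basis WTANet uses, so the Haar basis, the function name, and the sub-band names are assumptions chosen for simplicity.

```python
def haar2d_level(img):
    """One level of an orthonormal 2-D Haar transform.

    img is a list of rows with even height and width. Returns four
    half-resolution sub-bands: (approx, horiz_diff, vert_diff, diag_diff).
    """
    height, width = len(img), len(img[0])
    if height % 2 or width % 2:
        raise ValueError("height and width must be even")
    approx, horiz_diff, vert_diff, diag_diff = [], [], [], []
    for r in range(0, height, 2):
        row_a, row_h, row_v, row_d = [], [], [], []
        for c in range(0, width, 2):
            a, b = img[r][c], img[r][c + 1]          # top-left, top-right
            d, e = img[r + 1][c], img[r + 1][c + 1]  # bottom-left, bottom-right
            row_a.append((a + b + d + e) / 2.0)  # low-frequency average
            row_h.append((a - b + d - e) / 2.0)  # left-right contrast
            row_v.append((a + b - d - e) / 2.0)  # top-bottom contrast
            row_d.append((a - b - d + e) / 2.0)  # diagonal contrast
        approx.append(row_a)
        horiz_diff.append(row_h)
        vert_diff.append(row_v)
        diag_diff.append(row_d)
    return approx, horiz_diff, vert_diff, diag_diff


flat_bands = haar2d_level([[3, 3], [3, 3]])  # no edges: detail bands are zero
edge_bands = haar2d_level([[1, 0], [1, 0]])  # vertical edge: left-right contrast
```

    A flat patch puts all of its energy into the approximation band, while an edge shows up in one of the detail bands; these detail bands carry the kind of edge information that pooling tends to discard.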

  • PENG Daifeng, LI Yaning, ZHOU Dingwei, GUAN Haiyan
    Journal of Geo-information Science. 2025, 27(9): 2250-2267. https://doi.org/10.12082/dqxxkx.2025.250160

    [Objectives] Traditional optical remote sensing Change Detection (CD) methods involve cumbersome procedures and exhibit low automation levels. In contrast, deep learning-based CD approaches offer hierarchical feature representation and automatic learning of change patterns, which facilitates end-to-end CD. This significantly enhances the accuracy and automation levels of CD algorithms, establishing them as the mainstream solutions in the era of Remote Sensing (RS) big data. However, high-resolution remote sensing images are characterized by high spatiotemporal complexity of ground objects. Meanwhile, existing deep learning CD methods typically employ Siamese encoder architectures to extract multi-temporal image features and calculate feature differences to identify changes. This conventional approach easily leads to insufficient utilization of differential information, limited modeling capacity, and susceptibility to interference from complex backgrounds, shadows, and illumination variations. [Methods] To address these limitations, this paper proposes a Multi-Scale Differential Feature Enhancement Network (MSDFENet) based on a fully convolutional architecture. MSDFENet employs a Siamese encoder architecture to extract multi-scale features from bi-temporal remote sensing images. By introducing an Asymmetric Partial Double Convolution (APDC) module, it reduces the number of parameters and minimizes redundant information. Furthermore, differential operations are utilized to extract differential features that capture multi-scale details of change information. During the decoding phase, a Multi-scale Feature Attention (MSFA) module is designed to achieve collaborative optimization of deep semantic features and shallow geometric features through the incorporation of a spatial coordinate attention mechanism. 
Finally, progressive up-sampling is applied to gradually restore fine-grained details of changed regions, and a simple convolutional layer is used to generate the change map. [Results] To validate the effectiveness of this method, extensive experiments and analyses are conducted against mainstream deep learning CD methods on the LEVIR-CD, CDD, and WHU-CD datasets. Quantitative results indicate that MSDFENet achieves the best accuracy metrics across all three datasets, with F1-scores reaching 90.68%, 94.65%, and 91.64%, and IoU values attaining 82.96%, 89.78%, and 84.56%, respectively. Visual results demonstrate that MSDFENet effectively suppresses complex background interference and enhances edge localization accuracy, yielding superior visual performance. Model complexity analysis confirms that MSDFENet achieves an optimal balance between CD accuracy and computational efficiency. [Conclusions] The proposed MSDFENet is capable of significantly enhancing differential feature representation, effectively suppressing complex background noise interference, and substantially improving multi-scale change capture capabilities, thereby advancing CD performance.
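
    The Siamese-plus-differencing pattern described in this abstract can be illustrated with a toy sketch. Here a shared "encoder" is stood in for by repeated 2×2 average pooling, and the differential feature at each scale is the absolute difference between the two dates. This is only an assumed stand-in: the actual MSDFENet encoder, the APDC module, and the MSFA module are learned components that are not reproduced here, and all names are hypothetical.

```python
def avg_pool2(img):
    """2x2 average pooling with stride 2 on a list-of-rows image."""
    height, width = len(img), len(img[0])
    return [
        [
            (img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
            for c in range(0, width - 1, 2)
        ]
        for r in range(0, height - 1, 2)
    ]


def pyramid(img, levels):
    """Shared toy encoder: the input plus (levels - 1) pooled versions."""
    feats = [img]
    for _ in range(levels - 1):
        feats.append(avg_pool2(feats[-1]))
    return feats


def multi_scale_difference(img_t1, img_t2, levels=3):
    """Absolute feature difference between two dates at every scale."""
    feats_t1 = pyramid(img_t1, levels)  # the same function encodes both dates,
    feats_t2 = pyramid(img_t2, levels)  # mirroring Siamese weight sharing
    return [
        [[abs(a - b) for a, b in zip(row1, row2)] for row1, row2 in zip(f1, f2)]
        for f1, f2 in zip(feats_t1, feats_t2)
    ]


before = [[0.0] * 4 for _ in range(4)]
after = [row[:] for row in before]
after[0][0] = 4.0  # a single changed pixel
diffs = multi_scale_difference(before, after, levels=3)
```

    In this toy case the change appears sharply at full resolution and as a weaker, spatially broader response at coarser scales, which is the complementary information that a decoder can fuse.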

  • YAN Qiuyu, WANG Shu, HUA Yixin, ZHANG Jiangshui
    Journal of Geo-information Science. 2025, 27(12): 2833-2849. https://doi.org/10.12082/dqxxkx.2025.250379

    [Objectives] Fine-grained object recognition in remote sensing is a fundamental yet highly challenging task within both Earth observation and computer vision. It involves the accurate localization and detailed classification of objects in High-Spatial-Resolution (HSR) imagery, which often features highly complex backgrounds, inter-class similarities, and intra-class variations. In recent years, notable progress has been driven by algorithms that jointly exploit pixel-level, object-level, and neighborhood-level information. These approaches combine semantic features, texture characteristics, and spatial contextual relationships to form multi-source and multi-scale feature representations. Despite these advances, existing methods remain inadequate for directly utilizing higher-level fine-grained knowledge such as scene composition, entity semantics, attribute descriptions, and temporal dynamics. The core limitation lies in the absence of a formalized knowledge organization and representation paradigm capable of systematically bridging low-level visual perception and higher-order semantic reasoning. [Methods] To address these limitations, this study proposes a multi-level knowledge graph-based organization and representation framework specifically designed for fine-grained remote sensing object recognition. The framework adopts a four-layer hierarchical structure encompassing scene, entity, feature, and change dimensions, enabling dynamic and semantically rich descriptions of remote sensing targets. In this structure, scene nodes provide contextual constraints, entity nodes capture essential connotations of objects, feature nodes encode visual and semantic attributes, and change nodes represent temporal evolution. [Results] By incorporating spatiotemporal references, spatial morphology, and inter-object relationships, the proposed approach enables knowledge organization under multiple constraints, including scene, entity, feature, and temporal conditions. 
In doing so, it moves beyond purely data-driven perception and establishes a mechanism for knowledge-driven reasoning in remote sensing interpretation. Extensive experiments were conducted to validate the effectiveness of the proposed framework. When integrated into the baseline model STD, the knowledge graph yielded an improvement of approximately 3.82% in mean Average Precision (mAP) and 3.92% in recall, demonstrating its ability to enhance detection accuracy. Beyond this single case, the universality and robustness of the framework were confirmed by consistent performance improvements across several representative neural networks, including Oriented R-CNN, Oriented RepPoints, LSKNet, and STD. These results indicate that the proposed method not only improves recognition performance but also enhances interpretability and adaptability across heterogeneous architectures and datasets. [Conclusions] Overall, this study demonstrates that a multi-level knowledge graph provides an effective pathway for advancing fine-grained object recognition in remote sensing, transitioning from feature perception to knowledge reasoning. The method not only increases recognition accuracy but also enhances semantic interpretability and dynamic adaptability, offering a scalable solution for intelligent remote sensing analysis. Importantly, it provides new theoretical and practical insights for applications in geospatial information extraction, environmental and urban monitoring, disaster assessment, and military intelligence analysis. By systematically integrating structured knowledge with data-driven models, the proposed framework enriches the semantic depth of remote sensing interpretation and demonstrates strong potential for future developments in intelligent Earth observation systems.
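
    The four-layer organization described in this abstract (scene, entity, feature, change) can be pictured as a typed graph. The sketch below is a hypothetical data structure written for illustration only; the class, the relation names, and the example nodes are assumptions and do not come from the paper.

```python
from dataclasses import dataclass, field

LAYERS = ("scene", "entity", "feature", "change")


@dataclass
class KnowledgeGraph:
    nodes: dict = field(default_factory=dict)  # node id -> layer name
    edges: list = field(default_factory=list)  # (head, relation, tail) triples

    def add_node(self, node_id, layer):
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer}")
        self.nodes[node_id] = layer

    def add_edge(self, head, relation, tail):
        if head not in self.nodes or tail not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append((head, relation, tail))

    def neighbors(self, node_id, layer=None):
        """Tails reachable from node_id, optionally restricted to one layer."""
        return [
            tail
            for head, _, tail in self.edges
            if head == node_id and (layer is None or self.nodes[tail] == layer)
        ]


def build_example():
    """Hypothetical example: an airport scene containing an airliner."""
    kg = KnowledgeGraph()
    kg.add_node("airport", "scene")
    kg.add_node("airliner", "entity")
    kg.add_node("swept_wings", "feature")
    kg.add_node("parked_to_departed", "change")
    kg.add_edge("airport", "contains", "airliner")
    kg.add_edge("airliner", "has_feature", "swept_wings")
    kg.add_edge("airliner", "undergoes", "parked_to_departed")
    return kg


example_kg = build_example()
```

    Querying `example_kg.neighbors("airliner", layer="feature")` returns the feature nodes attached to an entity, which is the kind of lookup a detector could use as an extra constraint.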

  • ZHAO Pengjun, YU Zexin, CHEN Rui
    Journal of Geo-information Science. 2026, 28(1): 1-14. https://doi.org/10.12082/dqxxkx.2025.250149

    [Significance] Urban digital twin models simulate comprehensive urban scenes by digitally mapping physical entities through real-time data integration. These models serve as visual, real-time representations of urban dynamics within smart cities, incorporating technologies such as the Internet of Things (IoT), spatial information systems, and artificial intelligence. Building on the Physical-Social-Information (PSI) three-dimensional framework, this paper reviews the current research progress of urban digital twin models and proposes a four-dimensional coupling framework: Physical-Social-Information-Time (PSIT). [Progress] The main research findings are as follows: (1) Since the introduction of digital twin technology into urban research in 2017, related literature has grown rapidly, with theoretical foundations and functional design frameworks gradually maturing. Urban digital twin models have initially been developed along the three PSI dimensions, covering the digital mapping of geographic entities, spatial analysis of human activities, and the fusion and mining of geographic big data. (2) To reflect real urban operations more accurately, current models require breakthroughs in data, technology, and algorithms. The PSI framework tends to overemphasize spatial features while oversimplifying the temporal dimension, lacking a representation of the spatiotemporal differentiation inherent in urban systems. (3) Recognizing the critical role of spatiotemporal coupling in urban modeling, this paper elevates time from a background variable to an independent dimension. This is based on the unidirectional nature of time, the temporal constraints on social behavior, the allometric time scales of urban element evolution, and the time-dependent mechanisms behind system phase transitions. 
Accordingly, the PSIT four-dimensional coupling framework is proposed to enhance the logic of urban system evolution and advance the theoretical paradigm of urban digital twin modeling. The CitySPS platform is presented as a case study for detailed illustration. [Prospect] The PSIT four-dimensional coupling framework offers the potential for more precise simulation and accurate prediction in digital urban spaces, representing a promising direction for future "intelligent" urban governance.

  • WANG Shumin, LI Yuanjun, YANG Tao, YU Qingying
    Journal of Geo-information Science. 2025, 27(8): 1780-1795. https://doi.org/10.12082/dqxxkx.2025.250134

    [Objectives] To address the limitation that existing trajectory anomaly detection methods often fail to fully consider road network constraints, this study proposes a trajectory anomaly detection algorithm designed to effectively identify potential fraudulent behavior by taxi drivers during passenger pickup. [Methods] The algorithm first performs map matching, aligning trajectory data with the actual road network to obtain a sequence of path segments. Then, a two-stage clustering approach is applied: the matched trajectory paths are initially clustered to extract and expand core road segments, forming multiple core paths. Next, the algorithm calculates the similarity between different core paths and assigns highly similar paths to the same cluster, thereby generating multiple path clusters. Finally, a CostThreshold is computed based on each path cluster. The travel cost of each trajectory, calculated by combining travel time and distance costs, is compared against the corresponding CostThreshold to determine whether the trajectory is anomalous. [Results] Compared with traditional anomaly detection methods on real-world trajectory datasets, the proposed approach demonstrates superior performance in detecting anomalous trajectories. It achieves significantly lower runtime and improves detection accuracy by up to 9.03% compared to the STADCS method. The F1 score also improves considerably compared to the Two-Phase and ATDC methods, with maximum gains of 6.67% and 9.45%, respectively. [Conclusions] This paper presents a detection method that integrates road network constraints with two-stage clustering and travel cost evaluation. The method enhances detection accuracy and efficiency while reducing the false positive rate. It is well-suited for complex urban road networks, offering valuable support for vehicle trajectory data mining and traffic management decision-making, with significant practical value in fraud detection and related fields.
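
The final step described above, comparing each trajectory's combined time-and-distance cost against the CostThreshold of its path cluster, can be sketched roughly as follows. The abstract gives neither the cost formula nor the threshold rule, so the weighted sum, the weights, and the mean-plus-k-standard-deviations threshold below are illustrative assumptions rather than the authors' definitions.

```python
from statistics import mean, pstdev

def travel_cost(time_s, dist_m, w_time=0.5, w_dist=0.5):
    """Hypothetical cost: weighted sum of travel time and travel distance."""
    return w_time * time_s + w_dist * dist_m

def cost_threshold(costs, k=1.5):
    """Hypothetical threshold: mean plus k population standard deviations."""
    return mean(costs) + k * pstdev(costs)

def flag_anomalies(trips, k=1.5):
    """trips: (time_s, dist_m) pairs from one path cluster -> list of bool."""
    costs = [travel_cost(t, d) for t, d in trips]
    threshold = cost_threshold(costs, k)
    return [c > threshold for c in costs]

# Made-up trips in one path cluster; the last one is a long detour.
cluster = [(600, 5000), (620, 5100), (610, 5050),
           (590, 4950), (605, 5020), (1500, 12000)]
flags = flag_anomalies(cluster)
```

Because the threshold is computed per cluster, a trip is only judged against trajectories that follow a similar route, which is the role the path clusters play in the abstract.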

  • ZHANG Zhengjia, JIN Qingguang, WANG Chao, ZHANG Hong, WANG Mengmeng, LIU Xiuguo
    Journal of Geo-information Science. 2025, 27(9): 2191-2212. https://doi.org/10.12082/dqxxkx.2025.250078

    [Significance] Permafrost monitoring represents one of the most critical application domains of Synthetic Aperture Radar Interferometry (InSAR) technology. In recent years, significant progress has been made in InSAR-based permafrost monitoring, driven by advancements in SAR satellite systems, the evolution of InSAR algorithms, and the integration of multi-source remote sensing technologies. These developments have established InSAR as a high-precision, large-scale technical approach for monitoring and assessing permafrost degradation under climate warming. [Progress] This paper systematically reviews recent innovations in InSAR-based permafrost monitoring and explores its interdisciplinary potential. First, we introduce commonly used Synthetic Aperture Radar (SAR) satellite systems, explain the fundamental principles and recent advancements in InSAR methodologies, and summarize both the geographical distribution of current permafrost study areas and quantitative trends in InSAR-related publications. Next, we comprehensively evaluate state-of-the-art applications of InSAR, including permafrost surface deformation monitoring, physically driven model construction, deformation prediction, and active layer thickness retrieval. Special emphasis is placed on addressing long-standing challenges such as interferometric decorrelation and imperfect permafrost parameter modeling through the integration of thermal-optical remote sensing data and hydrological models. [Prospect] In alignment with the sustainable development needs of permafrost regions, we analyze emerging research trends in InSAR permafrost studies, particularly in deep learning, multi-source data fusion, and sustainability assessment. This review not only provides a methodological framework and roadmap for addressing key challenges in InSAR-based permafrost research but also lays a technical foundation for tackling critical issues such as Arctic engineering safety and ecosystem stability evaluation.

  • XIE Xin, HU Zui
    Journal of Geo-information Science. 2025, 27(7): 1566-1581. https://doi.org/10.12082/dqxxkx.2025.250018

    [Objectives] Traditional settlements are rich in local cultural values and represent the culmination of long-standing agricultural civilizations and deep-seated wisdom. Revealing their traditional cultural connotations through the lens of information geography holds significant value. However, there is currently a lack of theoretical research on the informational attributes of the Cultural Landscape Genes of Traditional Settlements (CLGTS) from a social-cultural semiotic perspective. [Methods] To address this gap, this paper introduces the concept of Symbolic Information Entropy of CLGTS (SIE-CLGTS), grounded in information entropy theory. Based on the varying modes of expression of CLGTS symbols, this study defines corresponding entropy calculation methods, including gray-scale distance, spectral analysis, Bayesian probability statistics, adjacency relationships, and structural elements. Using the landscape gene symbols of Zhongtian Village in Hunan Province as a case study, we conducted experiments to measure SIE-CLGTS and analyze the results. Additionally, a simulated tourist guide route was designed to explore potential applications of SIE-CLGTS. [Results] The results indicate that: (1) The information entropy-based method can effectively quantify CLGTS; (2) SIE-CLGTS reflects the cultural attributes embodied in these symbols; (3) SIE-CLGTS shows great potential for applications such as the preservation and sustainable development of traditional villages. [Conclusions] This paper represents a scientific exploration of the informational characteristics of landscape genes from an information science perspective, contributing to the deeper application of landscape gene theory.
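
For orientation, the information entropy that SIE-CLGTS builds on is the Shannon entropy H = -Σ p_i log2 p_i of a symbol distribution. The sketch below computes it for a sequence of discrete values such as quantized gray levels. It is a generic illustration only: the paper's specific entropy definitions (gray-scale distance, spectral, Bayesian, adjacency, structural) are not given in the abstract and are not reproduced here.

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Shannon entropy in bits of the empirical distribution of `symbols`."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Made-up gray levels from a symbol image quantized to four levels.
uniform_patch = [0, 1, 2, 3, 0, 1, 2, 3]   # all levels equally frequent
flat_patch = [2, 2, 2, 2, 2, 2, 2, 2]      # a single level only

h_uniform = shannon_entropy(uniform_patch)
h_flat = shannon_entropy(flat_patch)
```

A uniform distribution over four levels reaches the maximum of 2 bits, while a single-valued patch carries no information in this sense.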

  • ZHANG Yu, ZHUANG Huifu, ZHANG Xiang, TAN Zhixiang, LIU Yuhao, SHANG Jingjie, GUO Mingming
    Journal of Geo-information Science. 2025, 27(9): 2213-2229. https://doi.org/10.12082/dqxxkx.2025.250269

    [Objectives] Unsupervised change detection is a research hotspot in Synthetic Aperture Radar (SAR) image information extraction. However, existing studies often rely on single-method pseudo-label generation, leading to limited reliability. Moreover, most current methods mainly utilize spatial-domain features of multi-temporal images, with relatively few explorations into the fusion and utilization of spatial-frequency dual-domain features. To address these challenges, this study proposes a Mamba-based spatial-frequency feature fusion U-Net model for unsupervised SAR change detection. [Methods] The proposed approach first uses a difference segmentation-clustering fusion approach to generate high-quality pseudo-label samples, reducing dependence on manually labeled sample data. Next, a spatial-frequency dual-domain feature fusion U-Net model, integrating Mamba and wavelet convolution, is constructed to extract change information. Mamba is employed to efficiently capture global features, which are then fused with local spatial features extracted by convolutional networks. Simultaneously, wavelet convolution is used to enhance frequency-domain feature extraction. The fusion of dual-domain features is performed during the upsampling stage of the U-Net architecture. [Results] To validate the effectiveness of the proposed method, experiments were conducted on two SAR image datasets. Both qualitative and quantitative comparisons were made with traditional and deep learning-based methods. Compared to the best-performing baseline, the proposed method improved the average F1_Score by 2.35% and the Kappa coefficient by 2.65% across the two datasets, significantly enhancing the reliability of change detection results. [Conclusions] The proposed method effectively improves the automation and reliability of SAR image change detection, offering strong technical support for applications such as environmental monitoring, urban expansion analysis, and disaster assessment.
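
The two metrics reported above can be computed from a binary change map and a reference map as in the sketch below. It uses the standard definitions, F1 = 2TP / (2TP + FP + FN) and Cohen's kappa = (p_o - p_e) / (1 - p_e). The pixel values are made up for illustration and have nothing to do with the paper's datasets.

```python
def confusion_counts(pred, truth):
    """Return (TP, FP, FN, TN) for two equal-length 0/1 sequences."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    return tp, fp, fn, tn

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the 'changed' class."""
    return 2 * tp / (2 * tp + fp + fn)

def kappa(tp, fp, fn, tn):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + fp + fn + tn
    p_o = (tp + tn) / n
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Made-up flattened change maps (1 = changed, 0 = unchanged).
pred = [1, 1, 0, 0, 1, 0, 0, 0]
truth = [1, 0, 0, 0, 1, 1, 0, 0]
tp, fp, fn, tn = confusion_counts(pred, truth)
f1 = f1_score(tp, fp, fn)
k = kappa(tp, fp, fn, tn)
```

Kappa is commonly reported alongside F1 in change detection because unchanged pixels usually dominate, which inflates plain accuracy.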

  • LIU Lin, ZHENG Senlin, XIAO Luzi
    Journal of Geo-information Science. 2025, 27(9): 2268-2282. https://doi.org/10.12082/dqxxkx.2025.250266

    [Objectives] In the context of rapid urbanization and increasing population mobility in China, spatial variations in crime location choices among offenders with different household registrations (hukou) have drawn growing attention. However, most existing studies have focused on the differences between local and non-local offenders, without further differentiating among local offenders. This study aims to refine the classification of local offenders into two subgroups, those with matching hukou and residence and those with unmatched hukou and residence within the same city, to reveal differences in offending rates across different hukou types. It also examines the spatial distribution characteristics of their crime location choices and the underlying influencing factors. [Methods] Using ZG City in China as a case study, this research integrates offender data, census data, POI data, mobile phone signaling data, and remote sensing imagery to classify street theft offenders into three categories: local offenders with matching hukou and residence, local offenders with unmatched hukou and residence within the city, and non-local offenders. Kernel density estimation and a discrete choice model are used to analyze differences in crime location choices among these offender types. [Results] The findings are as follows: (1) Offending rates are highest among non-locals, followed by local offenders with unmatched hukou and residence, and lowest among local offenders with matching hukou and residence. (2) Regarding journey-to-crime distance, non-locals travel the shortest distance, followed by unmatched locals, while matched local offenders travel the farthest. 
(3) All three offender types commit crimes in hotspots within the ring road, yet spatial patterns differ: matched local offenders tend to commit crimes in the old urban areas, unmatched local offenders are active in both old urban areas and Central Business Districts (CBDs), and non-local offenders mainly operate in CBDs. (4) Crime location preferences vary significantly by hukou type. A key finding is that the proportion of the migrant population has a positive effect on non-local offenders, no significant effect on unmatched locals, and a negative effect on matched local offenders. Matched local offenders tend to avoid neighborhoods with a high share of undergraduates and are more likely to commit crimes in urban villages, near hospitals, and near bus stops. For unmatched local offenders, the elderly population proportion and proximity to factories have significant positive effects. For non-local offenders, elderly population proportion, urban villages, and factories all have significant positive effects. [Conclusions] By subdividing local offenders, this study reveals distinct spatial distribution patterns and influencing factors for street theft offenders with different hukou statuses. The findings provide valuable insights for optimizing urban security planning, enhancing law enforcement efficiency, and developing targeted crime prevention strategies.
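
As a rough illustration of how a discrete choice model turns neighborhood attributes into location-choice probabilities, the sketch below uses the conditional logit form P_j = exp(U_j) / Σ_k exp(U_k), with U_j a linear combination of attributes. The abstract does not state which discrete choice specification was used, and the attribute names, coefficient values, and neighborhoods here are hypothetical. Only the sign of the migrant-share coefficient follows the direction reported for non-local offenders.

```python
import math

def utility(features, betas):
    """Linear utility: sum of coefficient * attribute over shared names."""
    return sum(betas[name] * value for name, value in features.items())

def choice_probabilities(utilities):
    """Conditional logit (softmax) probabilities over the alternatives."""
    m = max(utilities)                      # subtract max for numerical stability
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical coefficients for one offender group (not estimates from the paper).
betas = {"migrant_share": 1.2, "urban_village": 0.8}

# Hypothetical candidate neighborhoods.
neighborhoods = [
    {"migrant_share": 0.1, "urban_village": 0},
    {"migrant_share": 0.6, "urban_village": 1},
    {"migrant_share": 0.3, "urban_village": 0},
]

probs = choice_probabilities([utility(n, betas) for n in neighborhoods])
```

With positive coefficients on both attributes, the second neighborhood receives the highest predicted probability. Fitting the coefficients separately per offender group is what would allow effects to differ by hukou type.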

  • WANG Yi, XU Yisong, TANG Jiayu, ZHAO Huasen, HUANG Fenghua
    Journal of Geo-information Science. 2025, 27(8): 1965-1982. https://doi.org/10.12082/dqxxkx.2025.250174

    [Objectives] This study aims to investigate the spatiotemporal characteristics and driving mechanisms of landslide evolution in the Three Gorges Reservoir Area (TGRA) under compound extreme weather events in 2022, characterized by severe drought in the Yangtze River Basin and localized heavy rainfall within the reservoir area. It also seeks to address the knowledge gap in understanding landslide evolution under extreme climatic conditions. [Methods] Focusing on the Zigui-Fengjie section, this study utilized Sentinel-1 SAR data and the Small Baseline Subset (SBAS) InSAR technique to monitor surface deformation and identify active landslides. A dynamic evaluation of landslide susceptibility was conducted by combining the information value model with SBAS-InSAR results. Based on reservoir water level fluctuations, the study period was divided into two intervals: normal weather (July 2020 to July 2022) and extreme weather (July 2022 to September 2023), to comparatively analyze landslide evolution patterns and driving mechanisms. [Results] The results are as follows: (1) A total of 136 active landslides were identified. The most favorable geomorphic conditions for landslide development included slopes of 10°-30°, southeasterly to northwesterly orientations, elevations of 100-400 m, distances to rivers less than 200 m, and distributions mainly in clastic and mixed clastic-carbonate rock areas. Many landslides were located in rainfed farmland within 100 m of roads. (2) The combination of InSAR technology with traditional landslide susceptibility assessment models enabled dynamic assessment. The method can be updated synchronously with InSAR deformation data to reflect the current state of landslide evolution in a timely manner. (3) Under extreme weather conditions, landslide risks in the study area increased significantly, while the relatively low water level of the Three Gorges Reservoir had no significant negative impact on reservoir bank stability. 
(4) Precipitation was identified as the primary driver of dynamic landslide susceptibility evolution, with the susceptibility-precipitation response varying considerably across regions with different lithologies. [Conclusions] By integrating SBAS-InSAR time-series deformation monitoring with the information value model, this study reveals the spatiotemporal variability of landslide risk under extreme weather conditions. It addresses critical gaps in understanding landslide evolution mechanisms in the TGRA and provides a scientific foundation for landslide monitoring, early warning, and risk management.
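
For reference, the information value model in its commonly used form scores each class i of a conditioning factor as I_i = ln((N_i / N) / (S_i / S)), where N_i is the number of landslide cells in the class, N the total number of landslide cells, S_i the number of cells in the class, and S the total number of cells. The sketch below applies that formula to made-up slope classes. How the study couples these scores with SBAS-InSAR deformation is not described in the abstract and is not modeled here.

```python
import math

def information_values(landslide_cells, class_cells):
    """Information value per class; classes without landslide cells are skipped."""
    n_total = sum(landslide_cells.values())
    s_total = sum(class_cells.values())
    values = {}
    for name, s_i in class_cells.items():
        n_i = landslide_cells.get(name, 0)
        if n_i > 0:
            values[name] = math.log((n_i / n_total) / (s_i / s_total))
    return values

# Made-up cell counts per slope class (illustrative only).
class_cells = {"0-10 deg": 400, "10-30 deg": 400, ">30 deg": 200}
landslide_cells = {"0-10 deg": 10, "10-30 deg": 80, ">30 deg": 10}

iv = information_values(landslide_cells, class_cells)
```

A positive value means landslides are over-represented in that class relative to its area. A cell's susceptibility score is typically the sum of the information values of its classes across all conditioning factors.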