[Significance] Urban digital twin models simulate comprehensive urban scenes by digitally mapping physical entities through real-time data integration. These models serve as visual, real-time representations of urban dynamics within smart cities, incorporating technologies such as the Internet of Things (IoT), spatial information systems, and artificial intelligence. Building on the Physical-Social-Information (PSI) three-dimensional framework, this paper reviews the current research progress of urban digital twin models and innovatively proposes a four-dimensional coupling framework: Physical-Social-Information-Time (PSIT). [Progress] The main research findings are as follows: (1) Since the introduction of digital twin technology into urban research in 2017, related literature has grown rapidly, with theoretical foundations and functional design frameworks gradually maturing. Urban digital twin models have initially developed along the three PSI dimensions: the digital mapping of geographic entities, spatial analysis of human activities, and the fusion and mining of geographic big data. (2) To more accurately reflect real urban operations, current models require breakthroughs in data, technology, and algorithms. The PSI framework tends to overemphasize spatial features while oversimplifying the temporal dimension, lacking a representation of the spatiotemporal differentiation inherent in urban systems. (3) Recognizing the critical role of spatiotemporal coupling in urban modeling, this paper elevates time from a background variable to an independent dimension. This is based on the unidirectional nature of time, the temporal constraints on social behavior, the allometric time scales of urban element evolution, and the time-dependent mechanisms behind system phase transitions. Accordingly, the PSIT four-dimensional coupling framework is proposed to better represent the logic of urban system evolution and advance the theoretical paradigm of urban digital twin modeling. The CitySPS platform is presented as a case study for detailed illustration. [Prospect] The PSIT four-dimensional coupling framework offers the potential for more precise simulation and accurate prediction in digital urban spaces, representing a promising direction for future "intelligent" urban governance.
[Objectives] To systematically investigate the key pathways and application models for leveraging intelligent technologies in urban-rural integrated planning. It seeks to explore potential responses to the complex challenges arising from China's strategic developmental shift towards the renewal and quality enhancement of its existing urban and rural stock. In this new era, traditional planning methodologies, which often rely on static data and experience-driven decision-making, face significant limitations in addressing the intricate stakeholder relationships, intertwined land uses, and dynamic nature of established built environments. This review, therefore, explores how a more intelligent, data-informed approach could better support contemporary planning objectives. [Discussion] Addressing challenges such as the precise identification of multi-source features, the issue of persistent data silos, and the need for in-depth cognition of complex human-environment relationships, this paper reviews and proposes a closed-loop "Perception-Fusion-Cognition-Planning" conceptual framework. This framework is intended to guide the application of intelligent digitalization across the planning lifecycle. It begins with Intelligent Perception, which suggests integrating multi-modal data from sources like high-altitude cameras and drones with AI-driven algorithms for the automated semantic analysis and dynamic monitoring of urban-rural elements. This is followed by Spatio-temporal Data Fusion, which focuses on standardizing and integrating these heterogeneous data streams onto a unified baseline, creating a consistent and reliable digital foundation for analysis. The third stage, Cognition, employs knowledge graph technology to transform the integrated data into deep, systemic insights by explicitly modeling the implicit relationships between spatial entities, socio-economic factors, and regulatory policies. Finally, the Planning and Governance stage describes how an integrated enabling platform can translate these insights into actionable decision support, facilitating scenario analysis and collaborative workflows. A case study focused on farmland protection in Zengcheng District, Guangzhou, is presented to illustrate the potential and feasibility of this integrated technological pathway. The case demonstrates how the framework can be applied to achieve end-to-end governance, from automated change detection to policy-based reasoning and targeted enforcement. [Prospect] This systematic framework offers a potential conceptual approach for addressing the multifaceted challenges inherent in the intelligent planning for urban-rural integration. It is hoped that this work can help facilitate a paradigm shift in planning from traditional, experience-driven methods toward modern, data-informed "platform-based governance," characterized by more dynamic, evidence-based, and collaborative processes. By providing both theoretical references and practical guidance, this review aims to contribute to the ongoing efforts to modernize spatial governance capabilities, thereby supporting the overarching goals of sustainable and high-quality integrated development in China.
[Objectives] Teleconnections refer to remote interactions between climate variables within the Earth system and are crucial for understanding global climate change and predicting extreme weather events. With the application of complex network theory in climate science, the climate network approach has provided a novel methodological framework for studying teleconnections, helping to unravel the complex interaction mechanisms and dependencies within the climate system. [Analysis] This paper first reviews teleconnection research methods based on complex network theory, focusing on quantitative approaches for assessing interdependencies between variables, the application of network metrics to identify teleconnection patterns, and recent advances in the detection of teleconnection pathways. It then systematically summarizes research findings derived from climate network methods, highlighting their applications in both validating and discovering teleconnection phenomena, including the identification of univariate and multivariate patterns. The paper also discusses key insights into the underlying mechanisms of teleconnections, particularly the roles of atmospheric Rossby waves and ocean circulation. [Prospect] Finally, it emphasizes that current data and methodologies are still insufficient to fully capture nonlinear dynamics and cross-scale interactions. The paper suggests that future studies should aim to enhance the reliability and applicability of data and methods, give greater attention to the coupled relationships between Earth system subsystems, and explore the multi-scale temporal effects and dynamic evolution characteristics of teleconnections in greater depth.
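As a minimal illustration of the network-construction step reviewed above, the following Python sketch builds a toy climate network by thresholding pairwise Pearson correlations between grid-point time series and uses node degree as a simple teleconnection indicator; the array sizes, the random data, and the 0.5 threshold are illustrative assumptions rather than values drawn from the reviewed studies.

```python
import numpy as np

# Hypothetical example: build a simple correlation-based climate network
# from gridded time series (n_nodes grid points, n_time time steps).
rng = np.random.default_rng(0)
n_nodes, n_time = 50, 360           # assumed toy dimensions
series = rng.standard_normal((n_nodes, n_time))

# Pairwise Pearson correlation as the similarity measure between nodes.
corr = np.corrcoef(series)

# Threshold the absolute correlation to obtain an unweighted adjacency matrix;
# the 0.5 cutoff is purely illustrative.
adjacency = (np.abs(corr) > 0.5).astype(int)
np.fill_diagonal(adjacency, 0)

# Degree field: grid points with unusually high degree are often interpreted
# as teleconnection "hubs" in the climate-network literature.
degree = adjacency.sum(axis=1)
print(degree.max(), degree.mean())
```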
[Background] In recent years, confronted with emerging phenomena and new challenges in cartographic practice, an increasing number of scholars have called for a critical reassessment of existing paradigms in cartography, aiming to address both the disciplinary challenges and societal demands arising from technological transformations. [Objectives and Methods] Following a research approach that integrates critical inheritance and innovative transcendence, this study employs theoretical deduction to first review and synthesize existing paradigms in cartography. It then analyzes the predicaments encountered during the structural transformation of mapping practices and, finally, proposes a new paradigm for the discipline. [Results] Traditional cartographic research tends to equate “maps” with practices defined by specific professional norms, thereby endowing cartography with distinct characteristics of professional map-making in its knowledge sources, focal concerns, and practical pathways, ultimately forming what may be termed the professional map-making paradigm. However, this paradigm increasingly reveals two prominent dilemmas. On the one hand, the professional map-making paradigm struggles to capture the complexity and fluidity of maps as they are embedded in everyday social life, often resulting in theoretical lag and explanatory failure when addressing new forms of cartographic practices and their associated meaning-making mechanisms. On the other hand, the paradigm tends to operate within a closed cycle of internal knowledge reproduction, lacking substantial theoretical innovation and failing to cultivate problem-oriented thinking, thereby weakening its capacity to guide and regulate cartographical practice. In response, this study, grounded in a networked and relational understanding of the world, proposes a social practice paradigm for cartography. This paradigm conceptualizes maps as social practices embedded within social networks and linked to social actors, emphasizing the unique meanings and social values that maps generate in connecting individuals with the external world. [Conclusions] The social practice paradigm advocates for a transcendent perspective in understanding and interpreting maps, incorporates interdisciplinary integration and pluralistic methodological approaches, and promotes the coordination of local experiences with global perspectives. This paradigm not only deepens holistic understanding of mapping practices but also offers new theoretical resources and analytical frameworks for cartography to address pressing issues in contemporary digital, intelligent, and networked societies. Future research should systematically analyze the research context and theoretical foundations of cartography within the social practice paradigm. A conceptual, discursive, knowledge, theoretical, and methodological system, distinct from the professional map-making paradigm, needs to be gradually established to address the challenges faced by cartographic practice in complex and evolving contexts. The fundamental goal is to expand the boundaries of cartographic research and intellectual resources, transcending existing disciplinary categories and knowledge frameworks.
[Significance] Geomorphology, as a traditional discipline, has always focused on the quantification of internal and external forces in surface processes and the study of process mechanisms. In recent years, with the rapid advancement of artificial intelligence technology, deep learning models, owing to their excellent feature learning and information transfer capabilities, have effectively expanded the research paradigms of geomorphology and injected new vitality into the renewal of geomorphological research methods. At present, deep learning methods have demonstrated diverse application practices in geomorphology research and accumulated relatively rich achievements. It is therefore urgently necessary to systematically classify relevant cases, summarize their general application paradigms, and further analyze the deficiencies and challenges in current research by relating the advantages of deep learning methods to the predicaments of geomorphology research. [Progress] In view of this, this paper reviews the latest application progress of deep learning in the field of digital geomorphology research from 2018 to 2025 from three aspects: feature extraction, process research, and cause analysis. Research shows that in terms of feature recognition, deep learning has achieved high-precision automatic extraction of typical landform elements such as loess gullies, sand dunes, landslides, and glaciers. In terms of trend prediction, it can effectively capture the temporal evolution patterns of surface subsidence, water level changes, and disaster processes such as landslides and floods. In terms of model simulation, it has initially demonstrated the potential of geomorphic genesis modeling, such as the simulation of complex processes like tectonic movements and volcanic activities, promoting the advancement of geomorphological research from "descriptive analysis" to an integrated "prediction-interpretation" approach. Although deep learning has shown broad prospects in digital geomorphology research, there are still prominent limitations at present. Firstly, under the influence of geomorphic differences, model transfer and adaptation are limited: when applied to different climate zones or geomorphic types, prediction accuracy often drops significantly, and cross-regional and cross-scale generalization is restricted. Secondly, the models' interpretability of geomorphic process mechanisms is insufficient, making it difficult to reveal the dynamic mechanisms behind geomorphic evolution and limiting their application depth in theoretical modeling and scientific cognition. Thirdly, the bottlenecks of multi-source geomorphic data and computing power are obvious: the cost of obtaining high-quality, multi-scale geomorphic data is high, and the simulation of long time series and complex processes places a huge demand on computing resources, which restricts the wide application and promotion of related methods. [Prospect] Future research should, on the basis of strengthening the integration of data-driven and mechanism-based modeling, enhance models' cross-regional adaptability, physical interpretability, and applicability in resource-constrained environments. By building a standardized, multi-source, integrated global geomorphic database and promoting the combination of deep learning with new methods such as physics-constrained architectures and lightweight modeling, a new paradigm of geomorphic research oriented towards process simulation and mechanism cognition can gradually take shape.
At the same time, efforts should be made to promote the integrated application of digital geomorphology research with knowledge graphs and large AI models. With intelligent models at the core and knowledge enhancement as the link, an intelligent geomorphology system integrating data, mechanisms, and semantics should be constructed so that data and knowledge jointly drive analysis, providing solid support for the cognition of geomorphological processes and the exploration of their mechanisms.
[Significance] The extraction of semantic relations between geographic entities is a central task at the intersection of geographic information processing and natural language processing. Geographic entity semantic relations describe the associative expressions between geographic entities along conceptual, attributive, spatial, and temporal dimensions. The primary objective of this task is to identify geographic entities within unstructured text and extract the semantic associations among them. As a critical component in the evolution of geographic information science from geometric modeling towards cognitive intelligence, this technology establishes logical connections between entities by interpreting their spatiotemporal interaction mechanisms. Consequently, it plays a vital role in enriching the semantic content of geographic entity data, enabling human-computer compatible understanding, supporting complex spatial analysis, and advancing the intelligent application of geographic information. [Analysis] This paper conducts an extensive review of methods for geographic entity semantic relation extraction based on internet texts, tracing their progression through traditional rule-based approaches, conventional machine learning techniques, and modern deep learning-based artificial intelligence methods. Traditional rule-based methods, particularly those reliant on template matching, face bottlenecks in efficiency and generalization, constrained by expert-defined rules and low feature coverage. While conventional machine learning methods incorporate a broader range of features, they depend heavily on manual feature engineering and suffer from high computational complexity. In contrast, deep learning-based AI methods have achieved breakthroughs in general-domain relation extraction through automatic feature learning and end-to-end modeling, and their application is progressively expanding into the geographic domain. In the foreseeable future, deep learning models for geographic entity relation extraction will evolve towards few-shot learning, cross-domain transfer, enhanced interpretability, and deeper integration with knowledge graphs, progressing further in the direction of automation, dynamism, and multimodal collaboration. Future research must delve deeper into areas such as cross-modal collaborative cognition and reasoning, self-evolving modeling of dynamic spatial relationships, and adaptive learning mechanisms for low-resource scenarios. The application of Large Language Models (LLMs) is particularly noteworthy and warrants significant attention. [Purpose] This paper focuses on extraction algorithms for geographic entity relations from web texts, systematically reviewing and analyzing the latest algorithmic advances and application progress in related technical methods, and discussing current challenges and potential future research directions. It aims to provide researchers with a systematic overview of the technological development trajectory, facilitating a rapid grasp of the current research landscape; to serve as a decision-making reference for algorithm selection through a comparative analysis of key technologies; and to inspire innovative research ideas by forecasting frontier challenges and potential breakthrough directions.
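To make the template-matching style of the traditional rule-based methods concrete, the following sketch extracts two spatial relations from an invented sentence with hand-written regular expressions; the patterns, relation labels, and sentence are illustrative only and do not come from the reviewed systems.

```python
import re

# Minimal illustration of rule-based template matching for geographic
# relation extraction; patterns and the example sentence are invented.
sentence = "The Yangtze River flows through Wuhan, which is located in Hubei Province."

patterns = {
    "flows_through": re.compile(r"The (?P<head>[A-Z][\w ]+?) flows through (?P<tail>[A-Z][\w]+)"),
    "located_in": re.compile(r"(?P<head>[A-Z][\w]+), which is located in (?P<tail>[A-Z][\w ]+?)\."),
}

for relation, pattern in patterns.items():
    match = pattern.search(sentence)
    if match:
        print((match.group("head"), relation, match.group("tail")))
# ('Yangtze River', 'flows_through', 'Wuhan')
# ('Wuhan', 'located_in', 'Hubei Province')
```

Such templates are precise on the sentences they were written for but generalize poorly, which is exactly the efficiency and coverage bottleneck noted above.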
[Objectives] Administrative divisions serve as fundamental units for administrative management, resource allocation, and socioeconomic statistics. Frequent adjustments lead to continual changes in their names, types, codes, and spatial boundaries, which complicate cross-year statistical analysis and historical traceability, posing challenges to regional planning and policy implementation. To address this issue, this paper proposes a method for constructing an administrative division knowledge graph for complex spatiotemporal evolution, aiming to achieve a unified semantic representation and intelligent management of administrative division changes. [Methods] To address the limitations of traditional GIS in formally expressing complex evolutionary relationships and dynamic evolutionary processes, this study designs a semantically consistent ontology model for administrative divisions and proposes a "Subject-Predicate-Object-Time" quadruple framework to systematically define the concepts, attributes, and relationships of administrative division units. Using long time-series vector data of China's provinces, cities, and counties from 1949 to 2023, methods of spatial overlay and attribute matching are employed to identify evolutionary events such as name changes, type adjustments, code changes, mergers, splits, and spatial boundary changes. An administrative division evolution knowledge graph is then constructed in a Neo4j graph database to enable the structured management of this evolutionary knowledge. [Results] The knowledge graph contains 26 269 nodes and 406 744 triples, comprehensively documenting information on the names, codes, areas, and evolution types of provincial, municipal, and county-level administrative divisions from 1949 to 2023. Based on the graph, 33 507 evolutionary events were identified, enabling spatiotemporal evolution querying, cross-spatiotemporal logical reasoning for knowledge discovery, and cross-year statistical data correction. [Conclusions] This research overcomes the limitations of traditional GIS in representing complex spatiotemporal relationships, offering a new approach for the study of administrative division evolution and providing a reference for modeling complex spatiotemporal knowledge of other geographical entities.
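A minimal sketch of how one "Subject-Predicate-Object-Time" quadruple might be written into a Neo4j graph with the official Python driver is given below; the node label, property names, relationship type, connection settings, and the example merger event are assumptions for illustration and do not reproduce the paper's actual ontology.

```python
from neo4j import GraphDatabase

# Hedged sketch: persist one "Subject-Predicate-Object-Time" quadruple as a
# time-stamped relationship between two administrative-division nodes.
# Connection details are placeholders; the schema below is hypothetical.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

quadruple = {
    "subject": {"name": "County A", "code": "110000"},   # hypothetical unit
    "predicate": "MERGED_INTO",                          # evolution event type
    "object": {"name": "District B", "code": "110100"},  # hypothetical unit
    "time": "1997",                                      # year of the event
}

cypher = """
MERGE (s:AdminDivision {code: $s_code, name: $s_name})
MERGE (o:AdminDivision {code: $o_code, name: $o_name})
MERGE (s)-[r:MERGED_INTO {year: $year}]->(o)
"""

with driver.session() as session:
    session.run(
        cypher,
        s_code=quadruple["subject"]["code"], s_name=quadruple["subject"]["name"],
        o_code=quadruple["object"]["code"], o_name=quadruple["object"]["name"],
        year=quadruple["time"],
    )
driver.close()
```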
[Objectives] Wemap recommendation on real-world cartographic platforms is challenged by three persistent factors: sparse and fast-drifting user-item interactions, heterogeneous multimodal content, and frequent modality incompleteness, where items may lack images or contain only short, noisy textual descriptions. Systems that rely primarily on historical clicks or unimodal features struggle to learn stable user intent and often perform shallow cross-modal fusion, which limits ranking quality and reduces generalization to long-tail items. To address these issues, we present SMWRec, a graph-based multimodal recommendation framework that explicitly enforces cross-modal semantic consistency prior to feature fusion. [Methods] The core idea is an Align-Before-Fuse paradigm in which three auxiliary self-supervised objectives—Image-Text Contrastive (ITC), Image-Text Matching (ITM), and Masked Language Modeling (MLM)—regularize visual and textual embeddings to inhabit a coherent semantic space. On top of alignment, we introduce modality-agnostic feature perturbations (feature dropping and masking), which enhance robustness to incomplete inputs without adding significant architectural complexity. SMWRec employs lightweight graph message passing to aggregate user-item relational signals while keeping the fusion head simple and efficient. During training, the auxiliary objectives operate jointly with the main recommendation loss, stabilizing optimization and reducing overfitting under sparse data regimes; during inference, they are disabled, ensuring that the alignment introduces no additional latency. [Results] We evaluate the proposed framework under a full-ranking protocol on four multimodal datasets—MovieLens, TikTok, Kwai, and the domain-specific Wemaps—using Recall@K and NDCG@K as the primary metrics. We compare SMWRec against strong and representative baselines including VBPR, LightGCN, MMGCN, MMSSL, CAMP, and SLMRec, all under identical preprocessing, hyperparameters, and sampling strategies. Across all datasets, SMWRec consistently outperforms competing methods. On Wemaps, which contains user-generated micro-maps with diverse visual and textual cues, the model achieves a 31.48% improvement in Recall@10 and a 33.86% improvement in NDCG@10 over the strongest baseline, indicating enhanced candidate coverage and ranking accuracy. Comprehensive ablations confirm that both components—the Align-Before-Fuse module and the modality-agnostic perturbations—are major contributors to the performance gains; removing either leads to a marked decline in Recall and NDCG. Stress tests under missing-modality conditions further show that SMWRec maintains superior ranking quality when images are absent or text is truncated, demonstrating strong suitability for noisy, user-generated cartographic content. [Conclusions] Beyond accuracy, the framework offers practical advantages: it integrates easily with existing recommendation pipelines, scales naturally to large item collections through sparse graph operations, and introduces negligible inference overhead. Overall, our study demonstrates that explicit alignment prior to fusion, coupled with lightweight self-supervision and simple perturbations, provides an effective, reproducible, and extensible paradigm for Wemap recommendation. The proposed method mitigates representation degradation under sparsity and incompleteness, improves stability across datasets, and offers a clear pathway toward robust, multimodal, and user-centric mapping services.
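The following PyTorch sketch illustrates the flavor of the Align-Before-Fuse idea with a symmetric image-text contrastive (ITC) loss and a simple modality-agnostic feature-dropping perturbation; the embedding size, temperature, and drop probability are illustrative assumptions, and the ITM, MLM, graph message passing, and recommendation losses of SMWRec are not reproduced here.

```python
import torch
import torch.nn.functional as F

# Sketch of an Image-Text Contrastive (ITC) objective that pulls matched
# image/text embeddings together before fusion; hyperparameters are assumed.
def itc_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor, temperature: float = 0.07):
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature      # (B, B) similarity matrix
    targets = torch.arange(logits.size(0))            # matched pairs on the diagonal
    # Symmetric cross-entropy over image-to-text and text-to-image directions.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Modality-agnostic perturbation: randomly zero out feature dimensions to
# mimic incomplete or noisy modalities during training.
def perturb(features: torch.Tensor, drop_prob: float = 0.2):
    mask = (torch.rand_like(features) > drop_prob).float()
    return features * mask

img, txt = torch.randn(8, 64), torch.randn(8, 64)     # toy item embeddings
print(itc_loss(perturb(img), perturb(txt)).item())
```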
[Objectives] Delineation of spatial boundaries for geographic entities is a crucial process in resource planning, functional identification, and spatial anomaly analysis, and has been widely applied in studies on human-environment interactions. Accurate delineation of Points of Interest (POI) boundaries captures regional spatial characteristics more precisely, thereby improving the reliability of subsequent analyses. Current methodologies for generating boundaries of POI outliers face three major limitations: (i) sensitivity to parameters, such as fixed radii or density thresholds; (ii) insufficient adaptability to individual outliers; and (iii) excessive reliance on multi-source data integration. [Methods] To address these challenges, we propose an adaptive nucleus-constrained approach for delineating POI outlier boundaries. This methodology introduces three principal innovations: (1) a star topology focused on outliers, augmented by inserting Steiner points to optimize neighborhood configurations and generate adaptive nuclei; (2) integration of these nuclei through Constrained Delaunay Triangulation (CDT), refined using natural barriers, POI categories, and semantic correlations; and (3) precise boundary extraction to preserve spatial contextual structures. [Results] Our method outperforms traditional approaches in delineating outlier ranges, achieving an average F1-score of 0.69 across multiple POI types such as education, tourism, sports, and residential complexes, compared with growth-restricted Voronoi diagrams (0.42), adaptive buffers (0.62), and fixed-radius buffers (0.26/0.10). It demonstrated strong adaptability to spatial heterogeneity and complex urban contexts, maintaining consistent performance across diverse distributions and footprint combinations. The method was particularly effective for POIs with uniform surroundings and well-defined spatial boundaries, achieving maximum accuracy of 1.00. It also generalized well to semantically diverse and spatially dispersed outliers. [Conclusions] Overall, the proposed approach exhibited superior precision, robustness, and adaptability in delineation tasks, confirming its effectiveness and generalizability for modeling authentic spatial footprints of POI outliers.
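As a simplified illustration of the triangulate-then-extract idea, the sketch below applies plain (unconstrained) Delaunay triangulation to synthetic POI coordinates and prunes triangles with overly long edges; the paper's actual method additionally inserts Steiner points, uses Constrained Delaunay Triangulation, and refines the result with natural barriers, POI categories, and semantic correlations, none of which is implemented here.

```python
import numpy as np
from scipy.spatial import Delaunay

# Simplified triangulate-and-prune sketch with a fixed edge-length threshold;
# both the synthetic coordinates and the 25 m threshold are illustrative.
rng = np.random.default_rng(1)
points = rng.uniform(0, 100, size=(40, 2))       # hypothetical POI coordinates

tri = Delaunay(points)
max_edge = 25.0                                   # illustrative pruning threshold

kept = []
for simplex in tri.simplices:
    p = points[simplex]
    edges = np.linalg.norm(p - np.roll(p, 1, axis=0), axis=1)
    if edges.max() <= max_edge:                   # drop overly long triangles
        kept.append(simplex)

print(f"{len(kept)} of {len(tri.simplices)} triangles retained for the footprint")
```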
[Objectives] Human mobility flow generation refers to the technology of estimating mobility flow between two locations based on the characteristics of the origin and destination, in the absence of historical mobility flow data. This technology has significant application value in fields such as urban planning, traffic management, and commercial layout. However, classical activity models struggle to capture the complex factors influencing mobility behavior, while deep learning-based models offer higher accuracy but lack interpretability, limiting their practical utility. [Methods] This paper proposes a human mobility flow generation method based on disentangled representation learning. By constructing separate encoders for location-related, non-location-related, and residual factors, and using a mutual information minimization strategy, the model decouples the influencing factors into three independent latent codes. A comprehensive representation of travel flow is then formed by integrating an attention mechanism, enabling high-precision flow generation. [Results] Comparative and decoupling analysis experiments using 2020 population mobility data from New York and Pennsylvania demonstrate that, compared with existing methods, the proposed approach improves the Common Part of Commuters (CPC) indicator by 4.47% and 3.71%, respectively. It also outperforms baseline models in terms of flow generation accuracy and cross-regional generalization ability. Ablation studies and verification using unsupervised disentanglement indicators show that the disentangled representation module effectively separates the three latent factors influencing mobility flow generation. The relative contributions of each factor are quantified using the SHAP method and attention weights. [Conclusions] Compared with classical activity models and standard deep learning models, the proposed method not only improves the accuracy of mobility flow generation but also enhances model interpretability, offering a new perspective for optimizing mobility flow generation models.
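A minimal PyTorch sketch of the three-encoder design is shown below, with separate encoders for location-related, non-location-related, and residual factors and a simple attention-based fusion head; the layer sizes are assumptions, and the mutual-information minimization term used to decouple the latent codes is deliberately omitted.

```python
import torch
import torch.nn as nn

# Sketch of three factor-specific encoders fused by a lightweight attention
# head; dimensions are illustrative and the MI-minimization loss is omitted.
class FlowGenerator(nn.Module):
    def __init__(self, in_dim=32, latent=16):
        super().__init__()
        self.enc_loc = nn.Sequential(nn.Linear(in_dim, latent), nn.ReLU())
        self.enc_nonloc = nn.Sequential(nn.Linear(in_dim, latent), nn.ReLU())
        self.enc_res = nn.Sequential(nn.Linear(in_dim, latent), nn.ReLU())
        self.attn = nn.Linear(latent, 1)        # scores each latent code
        self.head = nn.Linear(latent, 1)        # predicts flow volume

    def forward(self, x):
        codes = torch.stack([self.enc_loc(x), self.enc_nonloc(x), self.enc_res(x)], dim=1)
        weights = torch.softmax(self.attn(codes), dim=1)   # (B, 3, 1) attention
        fused = (weights * codes).sum(dim=1)
        return self.head(fused).squeeze(-1)

model = FlowGenerator()
od_features = torch.randn(4, 32)                # toy origin-destination features
print(model(od_features).shape)                 # torch.Size([4])
```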
[Objectives] With the deep integration of information technologies in smart city development, GNSS data has experienced exponential growth. However, trajectory generation is susceptible to environmental interference and sensor malfunctions, inevitably introducing noise. This study aims to design a novel noise identification and recovery algorithm to enhance the positioning accuracy and data quality of raw GNSS measurements. [Methods] To address trajectory noise identification, we propose a density matrix-based adaptive DBSCAN algorithm. This parameter-agnostic approach effectively captures low-amplitude noise points while avoiding the misclassification of continuous turning points. For noise recovery, we introduce a trajectory segmentation-based functional reconstruction algorithm. First, trajectory partitioning is achieved via Douglas-Peucker (DP) data compression. Next, noise-contaminated segments are identified, and fitting functions are constructed using valid points within each segment. Finally, corrupted data are recovered by utilizing spatiotemporal attributes of neighboring points. Compared to mainstream interpolation methods (e.g., Lagrange, Newton, Hermite, linear, cubic spline, and nearest-neighbor), our approach better preserves local information features induced by noise by eliminating global feature dependencies. [Results] Utilizing raw GNSS trajectories from 1,500 volunteers in Changchun (August 19th - September 1st, 2024), we conducted two comparative experiments. Experiment 1 benchmarked our proposed identification algorithm against vanilla DBSCAN and its dominant variants (KANN-DBSCAN, BDT-ADBSCAN). Results demonstrate that the proposed algorithm achieves optimal values across all three quantitative metrics: Silhouette Coefficient (SC), Calinski-Harabasz Index (CHI), and Davies-Bouldin Index (DBI), with improvement ranges of 40.17%~381.80%, 20.03%~235.18%, and 23.42%~79.53% respectively. Experiment 2 compared the new recovery algorithm against the above-mentioned six mainstream interpolation methods. Our solution significantly outperformed all baselines in Dynamic Time Warping (DTW), yielding a comprehensive improvement of 43.18%~80.43%. [Conclusions] The proposed noise identification and recovery algorithm significantly enhances positioning accuracy in raw GNSS trajectories. It efficiently supports large-scale trajectory preprocessing tasks and provides high-quality data foundations for downstream spatiotemporal trajectory mining.
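The segment-wise functional recovery step can be illustrated with the short numpy sketch below, which fits a low-order polynomial of position versus time to the valid points of one trajectory segment and re-estimates a flagged noise point; the synthetic coordinates, the polynomial degree, and the single corrupted index are assumptions, and the density-matrix DBSCAN detection and Douglas-Peucker partitioning steps are not reproduced.

```python
import numpy as np

# Recover one corrupted point inside a trajectory segment by fitting a
# polynomial of longitude versus time to the segment's valid points only.
t = np.arange(10, dtype=float)                     # timestamps within the segment
lon = 125.30 + 0.001 * t + 0.00005 * t**2          # smooth "true" longitude (synthetic)
lon_obs = lon.copy()
lon_obs[5] += 0.01                                 # inject a noise point at index 5

valid = np.ones_like(t, dtype=bool)
valid[5] = False                                   # index flagged by noise detection

coeffs = np.polyfit(t[valid], lon_obs[valid], deg=2)   # fit on valid points only
lon_rec = np.polyval(coeffs, t[5])                     # re-estimate the corrupted value

print(abs(lon_rec - lon[5]))                       # small residual after recovery
```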
[Objectives] Detecting behavioral anomalies in complex traffic scenarios is essential for ensuring public safety. Current methods largely rely on real-time trajectory data to identify traffic violations, without fully leveraging the potential insights available from historical trajectory data. This limitation impedes the automatic identification of anomalous behaviors that deviate from typical patterns. [Methods] To address this issue, we propose a comprehensive detection method, TraB, for identifying anomalous target behaviors through the integration of traffic rules and behavioral patterns. The method first extracts target trajectory orientation information based on the topological structure of the road network and employs an orientation clustering algorithm to analyze multi-frame historical trajectories, thereby identifying target behavior patterns. Based on this foundation, a mapping relationship between video image space and geospatial space is established. By integrating traffic rules with behavior patterns, a unified detection framework is developed to enable collaborative analysis of both real-time and historical trajectories. This framework performs a multi-dimensional analysis of target anomalies across four dimensions: time, location, target type, and behavior pattern. [Results] Experimental results, derived from two real-world traffic surveillance video datasets collected in Xinyang, China, in 2023—covering approximately 1.5 hours of video and 1.2 million trajectory points—demonstrate that the TraB method significantly outperforms approaches based on low-level video features (LowF), moving object trajectories (TraM), and deep learning (DeeL). This superiority is reflected across multiple comprehensive detection metrics, including Precision (P), Recall (R), and F1-score. Specifically, compared with LowF, TraM, and DeeL, the TraB method achieved average performance improvements of 11.39%~17.81%, 14.09%~20.62%, and 10.06%~23.40%, respectively. Furthermore, TraB exhibited strong robustness in complex traffic scenarios, as evidenced by a maximum reduction of 60.93% in the standard deviation of its evaluation metrics compared with LowF, TraM, and DeeL. [Conclusions] The proposed TraB method possesses intelligent detection capabilities, allowing it to effectively identify anomalies that diverge from normal behavioral patterns, thereby providing a novel perspective for monitoring target behavior in complex traffic environments.
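The orientation-clustering idea can be sketched as below: segment headings are embedded on the unit circle so that angular periodicity is respected and then grouped into dominant movement directions; the use of k-means, the number of clusters, and the synthetic headings are illustrative assumptions rather than the specific clustering algorithm used by TraB.

```python
import numpy as np
from sklearn.cluster import KMeans

# Group trajectory-segment headings into dominant movement directions by
# clustering their (cos, sin) embedding on the unit circle.
rng = np.random.default_rng(2)
headings = np.concatenate([
    rng.normal(0, 0.1, 100),            # eastbound flow (radians), synthetic
    rng.normal(np.pi / 2, 0.1, 100),    # northbound flow, synthetic
])
circle = np.column_stack([np.cos(headings), np.sin(headings)])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(circle)
print(np.bincount(labels))              # sizes of the two behavior-pattern groups
```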
[Background] Current urban traffic environments are highly complex, and traffic flows exhibit strong nonlinear characteristics in both time and space. Although existing prediction models based on traditional statistical methods, deep learning architectures and attention mechanisms are capable of capturing short-term nonlinear dynamics over limited horizons, they still struggle at the city scale when confronted with long temporal horizons, strong noise and continuous multi-step prediction requirements. In such settings, these models often suffer from insufficient representation of long-term dependencies and significant error accumulation along the prediction horizon, which makes it difficult to achieve accurate and stable long-term vehicle trajectory prediction. [Methods] To solve these problems, this paper proposes a long-term vehicle trajectory prediction model based on a Transformer architecture with local self-attention. Considering that urban vehicle trajectories are characterized by strong local temporal correlation yet are easily affected by noise, the model replaces the global self-attention structure of the traditional Transformer with a local self-attention mechanism, so that each time step primarily attends to a limited historical window and the influence of irrelevant distant points is suppressed. In parallel, the overall framework is adjusted for vehicle trajectory prediction in terms of data preprocessing, the design of the embedding layer and the form of the output representation. Discrete high-dimensional embeddings are employed to enhance the spatial expression of the input trajectories, and a pair of independent embedding vectors and decoding structures is constructed for latitude and longitude to improve coordinate prediction accuracy. Through these designs, the proposed model strengthens its ability to capture the spatio-temporal characteristics of trajectory data under long prediction horizons. [Results] Extensive experiments are carried out on one month of continuous GPS trajectories collected from 320 taxis in Rome. The results show that, for short-, medium- and long-term prediction tasks, both the average multi-step error and the single-step error of the proposed model are consistently lower than those of mainstream baseline models. In particular, the average displacement error and root mean square error are reduced by up to 41% and 35%, respectively, indicating clear advantages in prediction accuracy. Further analysis of the local attention mechanism shows that an appropriate local self-attention time window can significantly improve the ability of the model to capture trajectory features. When the time window size is increased from the optimal 30 minutes to 35 and 40 minutes, the average displacement error rises by approximately 3.78% and 5.17%, respectively, which implies that an overly large window tends to introduce additional noise and redundant information and consequently weakens the predictive performance of the model. [Conclusions] The research results can therefore provide technical methods and data support for practical applications such as personalized navigation recommendation, real-time traffic management and trajectory data recovery.
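A minimal PyTorch sketch of the local self-attention constraint is given below: a banded boolean mask restricts each time step to a limited historical window before standard multi-head attention is applied; the sequence length, window size, and embedding dimension are illustrative, and the full embedding and decoding design of the proposed model is not reproduced.

```python
import torch
import torch.nn as nn

# Banded attention mask so that each time step attends only to itself and a
# limited historical window; all sizes below are illustrative assumptions.
seq_len, window, dim = 64, 6, 32

idx = torch.arange(seq_len)
dist = idx[None, :] - idx[:, None]                  # key index minus query index
# Allow attention only to the current step and the `window` preceding steps.
mask = ~((dist <= 0) & (dist >= -window))           # True = position is masked out

attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)
x = torch.randn(2, seq_len, dim)                    # toy embedded trajectory batch
out, _ = attn(x, x, x, attn_mask=mask)
print(out.shape)                                    # torch.Size([2, 64, 32])
```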
[Objectives] Addressing the challenges of weak generalization and heavy reliance on large amounts of city-specific annotated data in environment-dependent crime spatio-temporal prediction, this paper proposes a foundation model named ST-Crime based on generative pre-training and prompt learning, aiming to enhance the accuracy of environment-dependent crime spatio-temporal prediction and improve generalization performance in new environments. [Methods] The method first unifies crime data into a tensor representation, employs a Transformer backbone to capture global spatio-temporal dependencies, and introduces a crime spatio-temporal memory retrieval enhancement module. This module extracts common spatio-temporal patterns from multi-city data and generates prompt information through spatio-temporal memory, crime-type interaction, and adaptive graph learning mechanisms to strengthen the model's expressive capability. [Results] Experiments were conducted using crime data from the year 2019 across four cities—New York, Los Angeles, San Francisco, and Chicago—totaling over 300 000 records and covering four typical environment-dependent crime types: burglary, robbery, felony assault, and grand larceny. In the full-training scenario (using year-long data from New York, Los Angeles, and San Francisco), ST-Crime achieved Macro-F1 scores of 0.739 7, 0.643 3, and 0.665 2, and Micro-F1 scores of 0.687 1, 0.601 8, and 0.537 5 on the three cities respectively, significantly outperforming existing baseline models. Compared to the best-performing baseline model in each city, it achieved improvements of 1.57%, 4.30%, 6.45% in Macro-F1, and 1.15%, 6.06%, 9.63% in Micro-F1, demonstrating significant advancements. For Chicago in the few-shot (fine-tuned with only 20% of data) and zero-shot (direct inference) scenarios, it also achieved Macro-F1 scores of 0.658 6 and 0.603 1, and Micro-F1 scores of 0.596 9 and 0.565 3, which surpassed the best baseline by 7.02%, 7.73% in Macro-F1, and by 3.41%, 9.51% in Micro-F1, respectively, demonstrating superior cross-city generalization capability. [Conclusions] The study concludes that ST-Crime effectively captures crime spatio-temporal patterns and offers a unified solution for environment-dependent crime spatio-temporal prediction, achieving state-of-the-art performance across varying data conditions including full-training, few-shot, and zero-shot scenarios.
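The tensor unification step can be illustrated with the small numpy sketch below, which aggregates point-wise crime records into a (time, region, crime type) count tensor of the kind a Transformer backbone could consume; the grid size, time bins, and sample records are invented for illustration, and the pre-training, prompt-learning, and memory-retrieval components are not shown.

```python
import numpy as np

# Aggregate individual crime records into a (time, region, crime type) tensor;
# dimensions and the sample records below are invented for illustration.
n_days, n_regions = 7, 4
crime_types = ["burglary", "robbery", "assault", "larceny"]
tensor = np.zeros((n_days, n_regions, len(crime_types)), dtype=np.int32)

records = [                      # (day index, region index, crime type)
    (0, 1, "burglary"),
    (0, 1, "burglary"),
    (3, 2, "robbery"),
    (6, 0, "larceny"),
]
type_index = {name: k for k, name in enumerate(crime_types)}
for day, region, crime in records:
    tensor[day, region, type_index[crime]] += 1

print(tensor.shape, tensor.sum())   # (7, 4, 4) 4
```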
[Objectives] Land-Use and Land-Cover Maps (LULCMs) are essential carriers of spatial information, and quantifying their information content underpins cartographic information theory. Existing metrics have limitations: Shannon entropy, grounded in statistical features, inadequately captures spatial information. Spatial entropy and some landscape indices reflect spatial structural characteristics but yield measurement values independent of spatial disorder and lack thermodynamic consistency. Boltzmann entropy, derived from thermodynamics, is sensitive to structural disorder but conflates multiple spatial dimensions and fails to isolate components such as object geometry or spatial distribution. To address these shortcomings, we propose a novel spatial-information measurement method for LULCMs based on cartographic information theory. [Methods] The proposed approach measures the spatial information of LULCMs using landscape shape index and patch centroid distance features. It targets two key attributes—geometric complexity and spatial-distribution heterogeneity—leveraging LULCM's patch hierarchy. First, a land-cover classification system defines geometric-shape and spatial-distribution indicators for each patch. Specifically, for geometric shape features, this paper improves the traditional Landscape Shape Index (LSI) by incorporating fragmented space derived from the maximum rectangular space, enabling effective distinction of patches with different shapes under identical area and edge-length conditions. For spatial distribution features, patches are abstracted as point data, with centroids serving as integration points of spatial information. We then calculate average inter-class and intra-class distances between patches to characterize spatial distribution. Second, these multi-feature spatial indicators are input into the Feature-based Information Measurement Model (FIMM) to quantify geometric-shape and spatial-distribution information for each class. Finally, patch-area weighting aggregates class-level metrics into overall spatial information. [Results] Experiments on AID imagery and simulated datasets compare the proposed metric against Shannon entropy, spatial entropy, Boltzmann entropy, and landscape indices. Results show that the proposed method (1) clearly distinguishes spatial-information content across land-cover types, (2) aligns with human spatial cognition, (3) achieves thermodynamic consistency comparable to Boltzmann entropy (Pearson r = 0.93), and (4) identifies patch number as the core factor driving changes in multi-dimensional information components, showing a significant positive feedback effect. Application to 2010-2020 LULCMs for Wuhan and Changsha reveals overall information increases of 19% and 35%, respectively. [Conclusions] The proposed metric directly and accurately quantifies geometric shape and spatial clustering in LULCMs, offering both a new theoretical tool for image-information quantification and a practical approach for monitoring land-cover change and urban expansion.
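The two families of patch indicators can be sketched as follows, using the classical patch shape index and average centroid distances computed with numpy; the patch geometries are invented, and the paper's improved LSI based on the fragmented space of the maximum rectangle, as well as the FIMM aggregation, are not reproduced.

```python
import numpy as np

# Two simple patch-level indicators: a classical shape index (geometric
# complexity) and mean centroid distance (spatial distribution); the patch
# areas, perimeters, and centroids below are invented for illustration.
patches = [
    {"area": 400.0, "perimeter": 80.0, "centroid": np.array([10.0, 12.0])},
    {"area": 225.0, "perimeter": 90.0, "centroid": np.array([40.0, 35.0])},
    {"area": 100.0, "perimeter": 40.0, "centroid": np.array([15.0, 50.0])},
]

# Classical shape index: 1.0 for a square patch, larger for irregular shapes.
shape_index = [0.25 * p["perimeter"] / np.sqrt(p["area"]) for p in patches]

# Average inter-patch centroid distance as a simple distribution feature.
cents = np.stack([p["centroid"] for p in patches])
dists = np.linalg.norm(cents[:, None, :] - cents[None, :, :], axis=-1)
mean_dist = dists[np.triu_indices(len(patches), k=1)].mean()

print([round(s, 2) for s in shape_index], round(mean_dist, 2))
```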
[Objectives] Object detection technology in remote sensing images has been widely applied in critical fields such as military reconnaissance and disaster monitoring. However, due to the common characteristics of diverse orientations and varied morphologies of targets in remote sensing imagery, detection methods must possess adaptive capabilities in terms of both shape and orientation. Traditional oriented bounding box representations have significant limitations in fitting object shapes and orientations, often failing to delineate object contours accurately and introducing substantial background interference. [Methods] To address this issue, this paper proposes an object detection method for remote sensing images based on representative point set learning. This method employs representative point sets to replace traditional rotated bounding boxes, enabling flexible modeling and accurate localization of object geometries. These point sets adaptively distribute over key regions of the target. They are mapped to a 2D Gaussian distribution via a Gaussian conversion function, and a rotation regression loss based on this distribution is constructed to supervise the point sets, guiding them to align with the semantic and geometrically significant areas of the object. In the classification stage, to mitigate the large intra-class variation among objects of the same category, a large-margin cosine loss is introduced. Utilizing feature normalization and cosine decision boundary maximization strategies, this loss promotes a compact distribution of intra-class features. [Results] To validate the effectiveness of the proposed method, this paper conducts comparative experiments on the DIOR-R dataset and a self-constructed port ship detection dataset, comparing the proposed algorithm with current mainstream approaches. The experiments are implemented on a server platform based on the PyTorch framework. The results show that the mean Average Precision (mAP) of the proposed method reaches 66.43% and 79.80%, respectively, outperforming 12 typical remote sensing object detection methods proposed in recent years, such as GWDRetinaNet, Oriented RepPoints, and DODet. [Conclusions] Experimental results demonstrate that the proposed representative point set-based method exhibits good adaptability in handling the rotation and deformation of objects in remote sensing images, achieving competitive detection accuracy across multiple datasets. This method provides a feasible approach for modeling rotated objects in complex scenarios and holds certain reference value for the practical application of high-resolution remote sensing imagery.
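A compact PyTorch sketch of a large-margin cosine (CosFace-style) loss of the kind described for the classification stage is shown below; the scale and margin values, feature dimension, and number of categories are illustrative assumptions rather than the paper's settings.

```python
import torch
import torch.nn.functional as F

# Large-margin cosine loss: normalize features and class weights, subtract a
# margin from the target-class cosine, scale, then apply cross-entropy.
# Scale s and margin m below are illustrative, not the paper's values.
def large_margin_cosine_loss(features, weights, labels, s=30.0, m=0.35):
    f = F.normalize(features, dim=1)
    w = F.normalize(weights, dim=1)
    cosine = f @ w.t()                                   # (B, num_classes)
    one_hot = F.one_hot(labels, num_classes=cosine.size(1)).float()
    logits = s * (cosine - m * one_hot)                  # margin only on the true class
    return F.cross_entropy(logits, labels)

feats = torch.randn(8, 128)                              # toy detection-head features
cls_weights = torch.randn(20, 128)                       # 20 hypothetical categories
labels = torch.randint(0, 20, (8,))
print(large_margin_cosine_loss(feats, cls_weights, labels).item())
```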
[Objectives] Supraglacial lakes are a crucial component of the Greenland Ice Sheet hydrological system, and remote sensing monitoring of their spatiotemporal evolution is of great significance for assessing the mass balance and stability of the ice sheet. However, traditional water-index methods rely on scene-dependent thresholds that require manual tuning, whereas classical machine-learning and deep-learning approaches demand large training samples and suffer from weak spatiotemporal generalization, resulting in poor automation for large-scale, time-series monitoring of supraglacial lakes. [Methods] To address this, this study fine-tuned a visual foundation model, the Segment Anything Model (SAM), with a small training set (200 samples) to construct a model for the automated temporal extraction of supraglacial lakes. Combined with 242 scenes of multi-source remote sensing images from Sentinel-2 and Landsat 8/9, a temporal product of supraglacial lakes (May-September, 2019-2023) in the Isunnguata-Russell Glacier basin, southwestern Greenland, was developed. This study revealed the interannual/seasonal variation patterns of supraglacial lakes, and explored their elevation distribution characteristics as well as their responses to temperature changes. [Results] The results show that: ① The visual foundation model demonstrates high robustness in the temporal extraction of supraglacial lakes, effectively mitigating the impact of cloud cover, requiring few training samples, and operating without post-processing, thereby accurately capturing area changes during rapid lake expansion and peak periods; ② The ablation period shows significant interannual differences: the maximum area ranges from 109.521 0 km2 ± 5% to 282.852 9 km2 ± 5%, and the number increases from 1 247 to 2 549. The seasonal variation presents three stages: slow growth—rapid expansion, peak water storage, and sharp reduction. There are interannual differences in the formation and peak timing of supraglacial lakes, which usually disappear by the end of August or early September; only 23.5872 km2 of supraglacial lakes remained in September 2021; ③ Supraglacial lakes exhibit distinct elevation distribution patterns, and their responses to temperature changes (Positive Degree Days, PDD) also show a significant correlation with elevation. The middle elevation zone (1.0~1.6 km) concentrates most supraglacial lakes in both number and area, with a relatively high response to temperature changes; the high elevation zone (1.6~2.0 km) has fewer lakes due to low temperature and insufficient meltwater, while the anomalies in the extent of supraglacial lakes there show the highest sensitivity to PDD changes (R2 = 0.99); in contrast, the low elevation zone (0~0.8 km) has a weak correlation with PDD (R2 = 0.30), and its dynamics are mainly dominated by non-climatic factors. [Conclusions] This study verifies the great potential of visual foundation models in the temporal monitoring of supraglacial lakes and provides technical and data support for understanding the glacial meltwater process and its impact on glacial dynamics.
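A hedged sketch of a small-sample fine-tuning setup for SAM is given below, freezing the heavy image encoder and prompt encoder and updating only the lightweight mask decoder with the segment-anything package; the checkpoint path is a placeholder, and the data loading, prompt construction, and loss function used in the actual study are omitted.

```python
import torch
from segment_anything import sam_model_registry

# Small-sample fine-tuning setup: keep the pretrained image and prompt
# encoders fixed and train only the mask decoder. The checkpoint path is a
# placeholder and must point to a downloaded SAM ViT-B checkpoint.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")   # placeholder path

for param in sam.image_encoder.parameters():
    param.requires_grad = False          # freeze pretrained visual features
for param in sam.prompt_encoder.parameters():
    param.requires_grad = False          # freeze prompt embeddings as well

optimizer = torch.optim.AdamW(sam.mask_decoder.parameters(), lr=1e-4)
print(sum(p.numel() for p in sam.mask_decoder.parameters() if p.requires_grad))
```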