[Significance] 3D real-scene models are evolving from merely "representing reality" in the past to "connecting reality, cognizing reality, and even foreseeing reality" in the present. With the construction of the national 3D real scene of China, the integrated application paradigm of "3D Real-Scene +" has been applied in numerous industries. As a critical part of national new infrastructure, 3D real-scene models hold significant value for scientific research in areas such as geospatial cognition, virtual geographic environments, and spatiotemporal computing, as well as for national strategies and societal needs such as "Digital China" and the digital economy. [Analysis] This paper provides a systematic review of the conceptual framework, core methodology, application scenarios, and challenges of 3D real-scene models. First, it systematically reviews and synthesizes the concepts, connotations, and main types of 3D real-scene models, proposing the concepts of primary and secondary 3D real-scene models to facilitate their in-depth application. The research scope of 3D real-scene models is summarized across five levels: the data layer, model layer, platform layer, application layer, and theory-and-standard layer. Second, the paper comprehensively outlines current core methodologies from three perspectives: primary 3D real-scene modeling, secondary 3D real-scene modeling, and model optimization and assessment. State-of-the-art techniques in 3D real-scene modeling and their development directions are also systematically summarized. Next, the paper reviews the application scenarios of 3D real-scene models, summarized along four dimensions: spatio-temporal base, information extraction, connecting reality, and industry empowerment. Industry empowerment is the core purpose of 3D real-scene models.
Industry empowerment can be achieved through four paths: visualization and fusion of business data in 3D real-scene models, enhancement of business decision-making through 3D analysis, business process reengineering via online 3D platforms, and AI-agent decision-making. Finally, the main issues and challenges currently facing 3D real-scene models are summarized from the perspectives of the paradigm transformation of geographic information cognition, spatiotemporal dynamic modeling of complex scenes, intelligent transformation, and in-depth application in industries, and the future development directions of 3D real-scene models are outlined. [Purpose] This paper aims to provide a comprehensive academic perspective and a systematic overview of technological developments for related research, helping researchers and technical professionals quickly grasp up-to-date trends in the field of 3D real-scene models. It serves as a reference for technology R&D and for the integrated application of 3D real-scene models in industries, while also identifying potential breakthrough directions such as AI-enabled real-scene 3D construction, thereby inspiring innovative research and practical ideas.
[Objectives] Visible information accessible within urban three-dimensional space forms the basis for comprehensively understanding the systematic nature and cultural significance of cityscapes. To overcome the computational limitations of GIS-based viewshed analysis and the challenges of digitally representing the semantic complexity of heritage landscapes, this study aims to develop an efficient computational method for identifying 3D visible locations. The objective is to facilitate the detection and strategic establishment of key viewing positions and visible corridors within urban vertical space. [Methods] Using semantic feature points of cultural heritage landscapes as conceptual anchors, this research integrates building height data into a Digital Elevation Model (DEM) to construct a ground-referenced 3D visible location calculation matrix. This matrix enables the generation of a dataset specifically designed for analyzing visible locations relative to cultural heritage, which is subsequently used to systematically examine spatial distribution patterns and inform urban planning strategies. [Results] An empirical study was conducted in Zhangjiakou City, focusing on its Great Wall landscape. A 3D visible-location dataset was successfully generated for the Great Wall based on a 100 m horizontal grid and 3 m vertical intervals.
Analysis of this dataset revealed four distinct spatial patterns of visible locations: (1) Comprehensive High-Visibility Zones, areas with significantly superior Great Wall visibility, primarily in the southwestern and southern sections with low building density, suitable for designation as priority protection areas; (2) Vertical High-Visibility North-South Aerial Corridors, centered along the east side of the Qingshui River to Dongxing Street and Linyuan Road in the central city, showing strong potential for establishing elevated visible corridors; (3) Vertically Restricted Visibility Zones under Dual Building-Terrain Influence, mainly located in the southeastern areas affected by both structural and topographical constraints, where height restrictions could be strategically relaxed to "trade height for view"; and (4) Pervasive Low-Visibility Zones, where vertical position changes have minimal impact on visibility, located in the eastern region blocked by mountains, making them low-priority for landscape resource investment. The study also pinpointed 3 m, 9 m, and 12 m above ground level as key observation heights for perceiving the Great Wall in Zhangjiakou, providing a foundation for differentiated height control in urban planning and moving beyond one-size-fits-all approaches. While maintaining consistency with traditional analysis results, the proposed method effectively avoids the computational redundancy of conventional panoramic calculations, demonstrating its capability for rapid assessment of landscape visibility resources across large urban 3D spaces. [Conclusions] The proposed computational method for 3D visible locations ensures efficiency while capturing the semantic richness of heritage landscapes. The resulting dataset can effectively support 3D landscape design and spatial planning in urban contexts, offering a novel approach for leveraging the resource value of National Cultural Parks.
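The ground-referenced visibility matrix described above rests on repeated line-of-sight tests between candidate observer positions (grid cell by vertical interval) and feature points of the heritage landscape. The abstract gives no implementation details, so the following is a minimal illustrative sketch in Python; the function names, nearest-cell ray sampling, and brute-force loop are all assumptions for illustration, not the paper's method:

```python
import numpy as np

def is_visible(surface, obs_rc, obs_h, tgt_rc, tgt_h):
    """Line-of-sight test on a raster surface (DEM plus building heights).

    surface        : 2D array of surface elevations (m)
    obs_rc, tgt_rc : (row, col) grid indices of observer and target
    obs_h, tgt_h   : heights above the surface at each end (m)
    """
    r0, c0 = obs_rc
    r1, c1 = tgt_rc
    z0 = surface[r0, c0] + obs_h
    z1 = surface[r1, c1] + tgt_h
    n = int(max(abs(r1 - r0), abs(c1 - c0)))
    if n == 0:
        return True
    for i in range(1, n):               # sample interior points along the ray
        t = i / n
        r = r0 + t * (r1 - r0)
        c = c0 + t * (c1 - c0)
        z_surf = surface[int(round(r)), int(round(c))]  # nearest-cell sample
        z_ray = z0 + t * (z1 - z0)      # straight sight line, no refraction
        if z_surf > z_ray:
            return False
    return True

def visible_positions(surface, tgt_rc, tgt_h, obs_heights):
    """Count visible (cell, height) observer positions for one target point."""
    count = 0
    for r in range(surface.shape[0]):
        for c in range(surface.shape[1]):
            for h in obs_heights:       # e.g. 3 m vertical intervals
                if is_visible(surface, (r, c), h, tgt_rc, tgt_h):
                    count += 1
    return count
```

In practice the triple loop would be vectorized or pruned to candidate cells; the sketch only shows the core geometric test behind the visible-location dataset.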
[Objectives] Visual quality serves as a crucial perceptual indicator for evaluating the built environment of cities and holds significant importance for urban renewal and the improvement of human habitats. However, existing methods often overlook the influence of the built environment on the visual quality of indoor spaces, making it difficult to accurately capture the human perceptual experience within urban environments. [Methods] This study integrates multi-source laser point clouds to propose a visual quality evaluation method based on window-centered viewpoints. First, UAV and TLS point clouds were aligned, and feature categories were extracted to construct a viewshed image using the window center as the viewpoint, with both category and view distance assigned separately. Second, two novel three-dimensional evaluation factors, spatial visibility and the spatial layer balance index, were introduced, and nine existing two-dimensional evaluation factors were integrated to establish a visual quality evaluation system across three dimensions: naturalness, sociability, and complexity, based on the viewshed image. Finally, the Elo rating system was employed to quantify public subjective perceptions at the viewpoints, while a random forest model was used to analyze the relationships between evaluation factors and visual quality scores and to explore factor correlations. [Results] In the selected experimental area, the proposed method achieved high evaluation accuracy, with an R² value of 0.9693. Among the evaluation factors, spatial visibility, the sky view factor, and the spatial layer balance index were identified as key indicators for improving the visual quality of urban environments. While two-dimensional metrics dominated visual quality assessment, three-dimensional metrics proved more predictive and influential when the number of factors was limited. Most evaluation factors exhibited positive correlations with visual quality scores.
Areas with open viewsheds and higher feature diversity demonstrated superior visual quality ratings. [Conclusions] Based on window-centered viewpoints, this study innovatively introduces the concepts of spatial visibility and the spatial layer balance index and incorporates the Elo rating system to enhance the objectivity of evaluation results, achieving a multidimensional and multifaceted assessment of visual quality. The proposed method quantitatively evaluates the influence of the outdoor built environment on indoor visual quality, supports fine-scale urban assessments and key-area analyses from a visual perception perspective, and provides quantitative data support for urban planning and landscape design.
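The Elo rating system mentioned in [Methods] converts pairwise "which view looks better?" judgments into a continuous score per viewpoint. A minimal sketch of the standard Elo update follows; the K-factor, initial rating, and tie handling are conventional defaults, not values taken from the paper:

```python
def elo_update(r_a, r_b, winner, k=32):
    """One Elo update for a pairwise comparison.

    winner: 'a', 'b', or 'tie'. Returns the two new ratings; the total
    rating is conserved because the two deltas are equal and opposite.
    """
    exp_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))   # expected score of a
    score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
    delta = k * (score_a - exp_a)
    return r_a + delta, r_b - delta

def rate_viewpoints(n_viewpoints, comparisons, start=1500.0, k=32):
    """Aggregate crowd-sourced pairwise choices into per-viewpoint scores.

    comparisons: iterable of (index_a, index_b, winner) tuples.
    """
    ratings = [start] * n_viewpoints
    for a, b, winner in comparisons:
        ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], winner, k)
    return ratings
```

The resulting ratings are what a regressor such as a random forest would then be fit against, with the evaluation factors as predictors.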
[Objectives] As one of the key data products and organization methods in China's new fundamental surveying and mapping and 3D realistic geospatial scenes, geo-entity data and their models have expanded and improved existing geographic information products in terms of temporal accuracy, dimensionality, and granularity. Research on administrative division modeling, particularly under big data conditions involving multidimensional dynamic data, is of significant importance. However, traditional modeling methods fail to comprehensively describe the lifecycle, semantic composition relationships, and other critical information of administrative divisions from an entity-based perspective. Furthermore, there has been limited exploration of entity-based fusion and transformation methodologies for multi-source heterogeneous data. [Methods] To address these challenges, this study proposes an objectified modeling method for administrative divisions that considers composition relationships. Based on the multi-granularity spatio-temporal object data model description framework, the spatio-temporal entity description of administrative divisions is first abstracted, encompassing identification coding, spatio-temporal reference, spatial characteristics, attribute characteristics, and composition relationships. Formal descriptions are then provided at two levels: the spatio-temporal characteristics of the entity itself and the composition relationships among entities. Subsequently, considering the multidimensional dynamics and complex relationships of administrative division entities, a logical data model, referred to as "4 tuples & 1 operation", is proposed, consisting of the inherent tuple, spatial tuple, attribute tuple, relationship tuple, and version operation. Using standard administrative division boundaries, codes, and statistical data from official websites as examples, a multi-source data fusion and transformation process for geo-entities is constructed.
[Results] Experimental results from objectified modeling, implemented using open-source software and development platforms, demonstrate that the proposed method effectively supports the construction, storage, management, editing, querying, and visualization of administrative division entity data, incorporating lifecycle management and complex composition relationships. It successfully addresses the challenges of describing temporal composition relationships and multi-source data fusion in administrative division modeling. [Conclusions] This study establishes a coupled expression mechanism that integrates temporal evolution at the entity level and nested hierarchical relationships at the relational level for administrative divisions. The findings provide a foundational model for entity-based spatial cognition and multimodal applications.
[Background] With the comprehensive advancement of the Real-Scene 3D China initiative, rapidly establishing high-precision Digital Elevation Models (DEMs) based on existing real-scene 3D models has emerged as a novel approach to terrain modeling. In complex urban areas, terrain modeling faces additional challenges due to diverse land parcel types and the coexistence of both gradual and abrupt elevation changes. [Methods] To address these challenges, this study proposes a high-precision partitioning-and-blocking DEM modeling method based on real-scene 3D models. The implementation process consists of three key stages. First, elevation points are extracted from pre-established real-scene 3D models. This step provides the fundamental data foundation for subsequent DEM construction, ensuring that the source data accurately reflect the topographic characteristics of the study area. Second, the CoANet network is employed to extract road skeletons, identifying the linear features of urban roads and providing spatial references for regional division. Concurrently, the DeepLabV3 model is utilized to segment four major functional zones: buildings, woodlands, grasslands, and water bodies. Based on the distinct topographic characteristics of these functional zones, a differentiated modeling strategy is developed: For bare land areas, elevation data are extracted, and height-reduction processing is applied based on the average statistical height of street trees. This ensures the flatness of adjacent road sections and prevents scattered tree elevations from interfering with road terrain modeling. For woodland areas, an improved local-maximum algorithm is integrated with a neighborhood mean height-reduction algorithm to restore the understory terrain. This integration compensates for the limitations of using only a local-maximum algorithm and more accurately reproduces the topographic features beneath forest canopies. 
For building areas, extreme-value detection technology is adopted to reconstruct the transition zones between buildings and surrounding terrain, effectively solving the problem of elevation discontinuity at building edges. For water areas, DEM construction is performed based on water boundaries to ensure that modeled water surface elevations conform to actual hydrological conditions. Finally, the DEMs generated from the partitioned modeling of each functional zone are fused through a spatial data integration algorithm. This process eliminates data redundancy and inconsistencies between adjacent regions, ultimately generating a complete and unified DEM of the study area. [Results] Experimental results demonstrate the superior performance of the proposed differentiated modeling method. Specifically, in road areas, the modeling accuracy improved by 56.5% after average height-reduction processing, significantly improving the flatness and accuracy of road terrain representation. In grassland areas, accuracy improved by 17.8% compared with the traditional Cloth Simulation Filter (CSF) algorithm, due to better adaptability to subtle elevation fluctuations. In woodland areas, accuracy improved by 53.1% compared with the single improved local-maximum method, verifying the effectiveness of the algorithm-fusion strategy in restoring understory terrain. Overall, the DEM accuracy for the entire study area increased by 35% compared with conventional modeling methods. [Conclusions] The method proposed in this study provides an automated and high-fidelity DEM modeling solution for urban surfaces characterized by interlaced natural and artificial terrains, as well as the coexistence of abrupt and gradual elevation changes. It effectively overcomes key technical bottlenecks of traditional DEM modeling in complex urban environments and holds significant potential for advancing refined and intelligent urban terrain mapping with the Real-Scene 3D China initiative.
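Of the partition strategies above, the woodland case is the least obvious: canopy elevations must be lowered toward the understory terrain. The following is a toy stand-in for the paper's combined improved local-maximum / neighborhood-mean height-reduction scheme, assuming the local window minimum approximates a ground return; this simplification is ours, not the authors':

```python
import numpy as np

def understory_terrain(dsm, wood_mask, win=5):
    """Hypothetical neighborhood height-reduction for woodland cells.

    dsm       : 2D surface elevations extracted from the real-scene 3D model
    wood_mask : boolean array, True where a cell is classified as woodland
    win       : odd window size used to search for local ground evidence
    Returns a DEM where each woodland cell is lowered by the mean canopy
    excess of its neighborhood above the local minimum.
    """
    dem = dsm.copy()
    half = win // 2
    rows, cols = dsm.shape
    for r in range(rows):
        for c in range(cols):
            if not wood_mask[r, c]:
                continue
            r0, r1 = max(0, r - half), min(rows, r + half + 1)
            c0, c1 = max(0, c - half), min(cols, c + half + 1)
            block = dsm[r0:r1, c0:c1]
            local_min = block.min()                 # likely ground elevation
            mean_excess = (block - local_min).mean()
            dem[r, c] = dsm[r, c] - mean_excess     # mean height reduction
    return dem
```

Non-woodland zones would be handled by their own strategies (road flattening, building-edge transition, water-boundary constraints) before the per-zone DEMs are fused.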
[Significance] Dynamic simulation of surface processes is crucial for understanding the evolution of geographical processes and their impacts on the ecological environment. It has long been both a focal point and a challenge in the field of geographic information science. [Background] Existing research has mainly focused on 2D surface simulations, while dynamic 3D process simulations still require further development. The integration of Discrete Global Grid Systems (DGGs) and Cellular Automata (CA) offers a powerful tool and platform for simulating geographical processes dynamically. [Methods] A water body simulation platform for Eling Lake was built using a Spherical Geodesic Octree Grid (SGOG). Based on Fick's first law and implemented in C++ with the Open Scene Graph (OSG) 3D engine and an integrated development platform, a visualization model for water pollution diffusion was constructed. Analytical and design procedures for both 2D and 3D cellular neighborhood and transformation rules were established using triangular cells. Boundary conditions at the lake embankment were defined by extending virtual cells. Static pollution diffusion simulations were then conducted under 2D, 3D, and terrain-constrained conditions at the shoreline to visually display pollutant diffusion during accidental water pollution events, and the simulation results were analyzed. [Results] The experiments showed that: (1) When the pollution limits and total instantaneous pollutant input were the same, but the actual water volume represented by each cell differed between 2D and 3D simulations, the duration of high-concentration pollution cells in the 3D simulation was significantly longer than in the 2D simulation—by about 55 minutes. Notable differences were also observed in the total number of affected cells and in the corresponding water body volumes. 
For example, after 55 minutes of diffusion, 661 polluted cells were identified in the 2D simulation, compared to 3,546 in the 3D simulation. At that point, the water body volume represented in the 3D simulation was approximately 89.4% of that in the 2D simulation. (2) In both 2D and 3D simulations, the temporal trends in the proportion of cells with different pollution levels were quite similar. For instance, the proportion of heavily polluted areas in both cases fell below 10% after 33 minutes of diffusion, while the proportion of lightly polluted areas exceeded 70% at the same time. However, the vertical spatial distribution of polluted cells should not be overlooked, as the 3D simulation results more accurately reflect real-world pollution diffusion patterns. [Conclusions] This study provides a new technical framework for research on water environment pollution and offers valuable decision-making support for managing sudden water pollution incidents.
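The CA transition rule derived from Fick's first law amounts to moving mass between neighboring cells in proportion to their concentration differences. The paper implements this on triangular SGOG cells in C++ with OSG; the square-grid Python sketch below only illustrates the flux logic and the virtual-cell boundary idea, and is not the platform's code:

```python
import numpy as np

def diffuse_step(conc, d_coef=0.1):
    """One CA transition using discrete Fickian fluxes on a square grid.

    conc   : 2D array of pollutant concentration per cell
    d_coef : dimensionless diffusion number (D * dt / dx^2), kept <= 0.25
             for numerical stability
    Edge padding makes each boundary's virtual cell equal to the boundary
    cell, so no mass crosses the embankment (zero-flux boundary).
    """
    padded = np.pad(conc, 1, mode="edge")          # virtual boundary cells
    lap = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
           padded[1:-1, :-2] + padded[1:-1, 2:] - 4.0 * conc)
    return conc + d_coef * lap                     # mass-conserving update
```

Iterating `diffuse_step` from an instantaneous point release reproduces the qualitative behavior reported above: the peak concentration decays while the count of polluted cells grows, and a 3D version (six-neighbor flux) spreads the same mass over many more cells than the 2D one.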
[Objectives] Differences in parameters such as DEM accuracy and grid size can affect the results of surface runoff simulations. However, no studies have yet examined the impact of DEM grid center coordinate offsets on surface runoff simulation results. [Methods] In this study, a 0.2 m resolution DEM constructed from UAV tilt photography was used as the reference dataset. Various offset directions and distances were applied to generate DEM sequences with different raster center coordinates. Based on these, surface catchment simulations were conducted using DEMs with different resolutions and offset parameters to extract hydrological features such as catchment areas and drainage lines. These results were then compared with those obtained from the original DEM. [Results] The results are as follows: (1) Grid center offsets affect the extraction of watershed lines and catchment areas, and the degree of response varies significantly with DEM resolution. Under a 5 m resolution DEM, hydrological parameters remained generally stable, with only slight changes observed under certain offset conditions (e.g., +2 m in the x-direction or -2 m in both x and y directions). However, the 1 m resolution DEM was more sensitive to offsets: when the x-direction offset was -0.3 m and -0.4 m, line density increased to 8.4613 km/km² and the number of catchment areas decreased from 221 to 205. The maximum changes in line density for the 5 m and 1 m resolutions were 0.189 km/km² and 0.8493 km/km², respectively, indicating that high-resolution DEMs are more susceptible to offset effects. (2) The impact of offset direction on hydrological information extraction also varies. Multi-directional offsets have a greater influence than single-directional ones. For example, in the 1 m resolution DEM, when the x-direction offset was -0.2 m, line density was 8.3639 km/km² and the number of catchment areas was 226.
However, with combined x and y offsets, line density decreased to 7.503 7 km/km2 and the number of catchment areas to 197. These results show that high-resolution DEMs exhibit greater variability in hydrological outputs under the same offset conditions. (3) The results also indicate that terrain complexity significantly affects the sensitivity to grid center offsets. In areas with high terrain variability, offset effects caused substantial changes in drainage lines, whereas in flatter areas, the impact of DEM center coordinate offsets on drainage simulation results was relatively minor. [Conclusions] This study investigates the impact of DEM grid center coordinate offsets on surface catchment simulations, offering insights into how such offsets contribute to uncertainty in hydrological analyses. The findings highlight the need to account for grid alignment when using high-resolution DEMs in terrain-sensitive applications.
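The offset DEM sequences in [Methods] can be produced by resampling the reference DEM onto a grid whose cell centers are displaced by (dx, dy). The paper does not state its resampling scheme, so the bilinear sketch below, with edge clamping at the extent boundary, is only one plausible reading:

```python
import numpy as np

def offset_dem(dem, dx, dy, cell=1.0):
    """Resample a DEM onto a grid with cell centers shifted by (dx, dy).

    dem    : 2D elevation array
    dx, dy : offsets in metres (x along columns, y along rows)
    cell   : grid resolution in metres
    Uses bilinear interpolation; shifted centres that leave the original
    extent are clamped to the edge for simplicity.
    """
    rows, cols = dem.shape
    rr, cc = np.meshgrid(np.arange(rows, dtype=float),
                         np.arange(cols, dtype=float), indexing="ij")
    # shifted sample positions expressed in the source grid's index space
    r = np.clip(rr + dy / cell, 0, rows - 1)
    c = np.clip(cc + dx / cell, 0, cols - 1)
    r0, c0 = np.floor(r).astype(int), np.floor(c).astype(int)
    r1 = np.minimum(r0 + 1, rows - 1)
    c1 = np.minimum(c0 + 1, cols - 1)
    fr, fc = r - r0, c - c0
    top = dem[r0, c0] * (1 - fc) + dem[r0, c1] * fc
    bot = dem[r1, c0] * (1 - fc) + dem[r1, c1] * fc
    return top * (1 - fr) + bot * fr
```

Running the same flow-accumulation workflow on each `offset_dem` output and differencing the extracted drainage lines against the unshifted reference is the comparison the study performs.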
[Objectives] To improve the accuracy of vehicle trajectory prediction in highway scenarios, a vehicle trajectory prediction method based on frequency-domain decomposition and feature fusion is proposed to address the problems of multi-scale feature entanglement, sensitivity to noise disturbances, and insufficient interpretability of spatial contexts under dense traffic flow. In complex highway environments, vehicle trajectories are characterized by strong non-stationarity, multi-scale temporal variations, and complex spatial interactions among neighboring vehicles, which increase the difficulty of accurate prediction. Therefore, it is necessary to jointly model motion regularity, interaction dependence, and temporal evolution in a unified framework. [Methods] First, under a local road coordinate system and lane-based spatial reference, the historical trajectories of vehicles are represented in a structured form. An improved multivariate variational mode decomposition method optimized by the improved whale optimization algorithm, namely IWOA-MVMD, is employed to decompose non-stationary trajectory signals into Intrinsic Mode Functions (IMFs) at different frequency scales. This process helps separate long-term motion trends, local fluctuation patterns, and high-frequency disturbance components, thereby enhancing multi-scale feature extraction and suppressing noise interference. Second, a convolutional neural network and a directional attention mechanism are used to extract local trajectory features and enhance the representation of spatial interaction features among vehicles. Third, a bidirectional cross-attention mechanism is designed to establish bidirectional information exchange between frequency-domain features and interaction features, enabling deep fusion of heterogeneous features. 
Finally, a Bi-Mamba network is used to model temporal dependencies, and a mixture density network is introduced to output the probability density distribution of future trajectories. [Results] Experiments were conducted on the NGSIM dataset over prediction horizons from 2 s to 5 s. The results show that the proposed model achieves an average RMSE of 1.04 m, which is 25.2% lower than that of the competitive baseline GCNTA. At the 5 s prediction horizon, the prediction error is reduced by 40.6%, indicating that the proposed method shows better robustness in longer-term prediction. In addition, the ADE and FDE of the proposed model are 0.95 m and 1.66 m, respectively, which are 12.0% and 34.9% lower than those of the best comparative model STDAN. These results demonstrate that the proposed framework performs well in both overall trajectory fitting and final-position prediction. [Conclusions] The proposed method effectively improves the accuracy and stability of vehicle trajectory prediction in complex highway scenarios. By integrating frequency-domain decomposition, spatial interaction modeling, cross-feature fusion, and temporal dependency learning, it provides methodological support for trajectory prediction in intelligent transportation and autonomous driving applications.
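The reported RMSE, ADE, and FDE follow the standard trajectory-evaluation definitions: per-step Euclidean error for RMSE, its mean over all steps for ADE, and the final-step error for FDE. For reference, a minimal computation (not the paper's evaluation code):

```python
import numpy as np

def trajectory_errors(pred, truth):
    """RMSE / ADE / FDE for a batch of predicted 2D trajectories.

    pred, truth : arrays of shape (n_samples, horizon, 2) in metres.
    ADE averages displacement over all timesteps, FDE uses only the
    final position, RMSE is the root mean square of per-step errors.
    """
    disp = np.linalg.norm(pred - truth, axis=-1)   # (n_samples, horizon)
    ade = disp.mean()
    fde = disp[:, -1].mean()
    rmse = np.sqrt((disp ** 2).mean())
    return ade, fde, rmse
```

A mixture density head outputs a distribution rather than a point, so in practice these metrics are computed on the distribution's mode or expected position.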
[Objectives] Existing quadrilateral Discrete Global Grid Systems (DGGS) predominantly employ fixed, solidified subdivision rules. Although numerous grid systems have proliferated, they frequently operate independently of one another. This fragmentation has led to a series of critical issues, including incompatible standards, rigid subdivision rules, and poor algorithm reusability, which hinder the effective cross-grid reuse of upper-layer application algorithms in the geospatial domain. [Methods] To overcome these technical bottlenecks, this paper proposes a novel multi-scale universal quadrilateral DGGS generation method based on configurable subdivision radix sequences. This approach decouples the generic grid generation logic from specific subdivision structures. The method proceeds through the following steps: First, the core grid subdivision rules are abstracted into configurable external parameters, referred to as subdivision radix sequences, realizing the complete separation of abstract grid generation logic from specific physical structures. Second, a cardinality extension strategy is designed. By establishing a virtual base mapping mechanism, this strategy resolves the compatibility challenges associated with non-standard integer-multiple subdivisions found in complex grid systems, such as the GeoSOT grid system.
Finally, a unified logical index space is constructed to ensure that, even when heterogeneous subdivision strategies are adopted at different hierarchy levels, their internal logical structures, algebraic encoding operations, and topological relationships still adhere to a unified and consistent algorithmic workflow. [Results] Experiments concerning grid compatibility, on-demand generation capability, and algorithm reusability were conducted using global simulated discrete coordinate point data and real-world regional polygon data. The experimental results demonstrate the following: Regarding grid compatibility, the proposed method generates subdivision results identical in quantity to those of existing mainstream grid structures (GeoSOT and GMGICM), and its encoding and decoding time costs are comparable to those of mainstream methods, verifying compatibility with existing grid frameworks. Regarding flexibility, various novel grid systems, such as "all-3", "all-5", and "2-3-5 cyclic" subdivisions, can be constructed solely by adjusting external parameters, validating the method's capability for on-demand grid generation. Regarding algorithm reusability, the proposed algorithm supports multi-scale vector polygon gridding applications across seven heterogeneous grid systems using a single set of application code. This effectively obviates the linear growth of code volume associated with traditional methods, thereby validating the method's capability for low-cost migration and reuse across heterogeneous grids.
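A subdivision radix sequence makes the per-level branching factor an external parameter: level i splits each cell into b_i by b_i children, so a cell is addressed by one (row, col) digit per level and the digits pack into a single code by mixed-radix arithmetic. The illustrative encoder/decoder below shows this core idea only; the actual GeoSOT-compatible virtual-base mapping in the paper is more involved:

```python
def encode_cell(path, radices):
    """Encode a multi-level cell as one integer under a radix sequence.

    path    : list of (row, col) child positions, one per level
    radices : subdivision radix per level, e.g. [2, 3, 5] splits a cell
              into 2x2, then 3x3, then 5x5 children ("2-3-5 cyclic" style)
    """
    code = 0
    for (r, c), b in zip(path, radices):
        assert 0 <= r < b and 0 <= c < b, "digit outside level radix"
        code = code * (b * b) + (r * b + c)   # append one mixed-radix digit
    return code

def decode_cell(code, radices):
    """Recover the per-level (row, col) path from an integer code."""
    path = []
    for b in reversed(radices):
        digit = code % (b * b)
        code //= b * b
        path.append((digit // b, digit % b))
    return list(reversed(path))
```

Because application algorithms only see `encode_cell`/`decode_cell` and the radix list, swapping `[2, 2, 2, ...]` for `[3, 3, 3, ...]` or `[2, 3, 5, 2, 3, 5, ...]` changes the grid without touching the application code, which is the reusability property the experiments test.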
[Objectives] The rapid advancement of Large Language Models (LLMs) has created new opportunities for intelligent geospatial data processing. However, current integration approaches between LLMs and Geographic Information Systems (GIS) still face key challenges, such as data privacy risks associated with cloud-only architectures, incomplete integration with native GIS toolchains, and the lack of standardized communication protocols for cross-platform interoperability. To address these limitations, this study proposes Smart-QGIS, an agent prototype system for geospatial data processing and mapping built on the Model Context Protocol (MCP). The system supports localized deployment while maintaining open protocol compatibility, enabling flexible integration with both local and cloud-based LLMs. The primary objective is to establish a secure, extensible, and functionally complete intelligent GIS framework that bridges natural language interaction and professional spatial analysis workflows. [Methods] Smart-QGIS was developed on the QGIS platform and uses MCP as a standardized communication bridge between LLMs and native GIS functional interfaces. The system enables end-to-end task execution, allowing users to convert natural language instructions directly into executable spatial analysis operations. It adopts a multi-process modular architecture consisting of five coordinated layers: a user interaction layer, a plugin mediation layer, an MCP communication layer, a local execution layer, and a model inference layer. This architecture ensures system scalability, functional extensibility, and operational stability, while supporting integrated workflows including data loading, spatial analysis, and cartographic visualization. [Results] System performance was evaluated using the vector administrative boundary of Shaanxi Province and digital elevation model raster data.
Two model deployment strategies were tested: a locally deployed, OpenAI-compatible open-source GPT model served through Ollama, and a cloud-based Alibaba Qwen LLM. Through Smart-QGIS, representative GIS tasks such as data loading, clipping, slope calculation, layer visualization, and automated map production were executed interactively. Results demonstrated that Smart-QGIS can accurately interpret complex, multi-step instructions while maintaining an efficient response time, typically below 75 seconds. For routine geospatial processing and visualization tasks, system performance is generally equivalent to or exceeds that of typical professional GIS users, while cloud-based models show higher efficiency in complex task execution. [Conclusions] Overall, the MCP-based localized LLM-GIS integration framework demonstrated advantages in privacy protection, functional coverage, and protocol universality. The system significantly lowers the technical barrier for geospatial data processing, enabling non-specialist users to perform complex spatial analysis tasks efficiently. The proposed architecture provides a practical technical pathway toward building open, collaborative, and intelligent GIS ecosystems, with strong potential for applications in intelligent spatial decision support, automated geospatial data services, and next-generation human-AI collaborative geospatial analysis environments.
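At its core, the MCP communication layer exposes GIS operations as declared tools that the LLM invokes with structured arguments. The sketch below mimics that dispatch pattern with a hypothetical registry and stub handlers; it is neither the MCP SDK nor the Smart-QGIS code, and the tool names are invented purely for illustration:

```python
import json

# Hypothetical tool registry: each entry mimics an MCP-style tool
# declaration (name, expected parameters, and a local handler).
TOOLS = {}

def tool(name, params):
    """Decorator registering a handler as a callable tool."""
    def register(fn):
        TOOLS[name] = {"params": params, "handler": fn}
        return fn
    return register

@tool("load_layer", {"path": "string"})
def load_layer(path):
    # Stub: a real plugin would hand this to the GIS execution layer.
    return f"loaded {path}"

@tool("compute_slope", {"dem": "string", "out": "string"})
def compute_slope(dem, out):
    # Stub standing in for a native slope-calculation operation.
    return f"slope({dem}) -> {out}"

def dispatch(request_json):
    """Execute one LLM-emitted tool call expressed as JSON.

    Validates that every declared parameter is present before calling
    the local handler, so malformed model output fails fast.
    """
    req = json.loads(request_json)
    entry = TOOLS[req["tool"]]
    missing = set(entry["params"]) - set(req["args"])
    if missing:
        raise ValueError(f"missing arguments: {sorted(missing)}")
    return entry["handler"](**req["args"])
```

The separation between the registry (what the model may call) and the handlers (what actually runs locally) is what lets such a system keep data on the local machine while still accepting instructions from either a local or a cloud-hosted model.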
[Objectives] SWOT satellite altimetry provides a new capability for large-scale river network water level monitoring; however, the original products are constrained by outlier noise, near-shore mixed pixels, and inter-orbit datum inconsistencies, which limit their suitability for high-accuracy hydrological applications. [Methods] Targeting the middle and lower reaches of the Yangtze River characterized by low gradients, slow flow, and complex boundary conditions, this study proposes a SWOT-based water level harmonization and modeling framework. Inter-orbit alignment is achieved by constructing a bias field from orbit overlap regions, thereby mitigating systematic datum inconsistencies across tracks. High-confidence RiverSP nodes are employed as reference observations, and pixel-level errors in PIXC water levels are modeled using a random forest approach that integrates backscatter characteristics, terrain attributes, and river morphology features. On this basis, a variance-guided α-weighted adaptive fusion strategy is implemented to balance the structural stability of RiverSP with the spatial detail preserved by PIXC. The framework ultimately produces a water level field referenced to a unified vertical datum, spatially continuous along the river network, and suitable for hydrological analyses in large, low-slope river systems. [Results] The correction results demonstrate a substantial improvement in the accuracy and consistency of the SWOT water level products. Specifically, the MAE of the PIXC and RiverSP products decreases from 0.41 m and 0.32 m to 0.15 m and 0.09 m, respectively, while the agreement between pixel-level water levels and hydrological gauge observations is markedly enhanced, with the coefficient of determination increasing from 0.73 to 0.96. 
Based on the corrected water-level fields, monthly longitudinal profiles and intra-annual time series at representative cross-sections are constructed to examine the spatiotemporal behavior of river water levels. The results reveal distinct hydrological response characteristics, including the along-river attenuation of flood peaks and the backwater effects induced by tidal forcing in the estuarine region. In the middle reaches of the Yangtze River, the intra-annual water-level variability is pronounced, with peak-to-trough differences exceeding 20 m, reflecting the combined influence of seasonal runoff regulation and channel geometry. In contrast, the estuarine cross-section exhibits strong tidal modulation, with monthly mean water levels reaching a maximum of approximately 5.44 m and a minimum of about 0.8–0.9 m, corresponding to an annual amplitude of roughly 4–5 m. These findings indicate that the corrected SWOT-derived water-level fields are capable of capturing both fluvial flood dynamics and estuarine tidal influences, thereby providing a reliable basis for hydrological analysis in large river systems. [Conclusions] These results confirm that the framework can produce spatially continuous, datum-consistent water level time series, suitable for large-scale hydrological analysis and flood monitoring.
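The variance-guided weighting at the heart of the fusion step can be illustrated with a minimal inverse-variance sketch. The numbers and function names are illustrative only; the paper's α-weighted adaptive strategy is more elaborate, but the underlying idea is that the product with the lower estimated error variance receives the larger weight.

```python
# Illustrative sketch (assumed form): inverse-variance weighting of the two
# SWOT water surface elevation (WSE) products at one location.
def fuse(riversp_wse, pixc_wse, var_riversp, var_pixc):
    """Fuse two water level estimates; the lower-variance one gets more weight."""
    alpha = var_pixc / (var_riversp + var_pixc)  # weight on the RiverSP value
    return alpha * riversp_wse + (1.0 - alpha) * pixc_wse

# Equal variances give a simple average; a noisier PIXC pixel pulls the
# fused value toward the structurally stable RiverSP estimate.
print(fuse(10.0, 12.0, 0.01, 0.01))            # 11.0
print(round(fuse(10.0, 12.0, 0.01, 0.09), 6))  # 10.2
```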
[Significance] Fine-grained orientation and localization of ground images via cross-view image matching is a high-precision geographic positioning technology that achieves sub-meter position estimation and sub-degree heading estimation by matching ground images against satellite reference images. Its core challenge is overcoming the large geometric and appearance differences between the ground and satellite views, and achieving accurate position and heading estimation through view unification, feature matching, and pose estimation. The technology is closely related to visual positioning and image retrieval. It has important application value in GNSS-denied scenarios such as high-precision positioning for autonomous driving and mobile robot navigation, providing intelligent systems with an absolute positioning solution that does not depend on dedicated infrastructure. [Analysis] Through literature collection, this paper systematically summarizes the research progress on fine-grained orientation and localization of ground images via cross-view image matching from 2021 to 2025. Against the backdrop of technological development, the definition, technical route, and domain boundary of the technology are clarified, and the field's dynamic evolution from early exploration and technical differentiation to breakthroughs in supervision modes is revealed. A systematic classification scheme is constructed along three dimensions: view unification mode, orientation and localization task type, and supervision mode. The core principles, technical details, and strengths and weaknesses of the various algorithms are analyzed in depth, and the types of feature extraction backbone networks are systematically surveyed.
This paper summarizes four public dataset resources (VIGOR, Oxford RobotCar, KITTI, and Ford Multi-AV) and their evaluation metric systems, and quantitatively analyzes the performance of representative algorithms. In the analysis of core challenges, key problems are identified, including bridging cross-view geometric and semantic differences, inefficient fusion of weakly supervised signals, degradation of joint estimation accuracy under high orientation noise, limited generalization to dynamic scenes, and the limited timeliness and scene coverage of existing datasets. For future research, five breakthrough directions are proposed: constructing BEV representations and efficient matching mechanisms with geometric fidelity and semantic consistency; achieving hierarchical fusion of weak supervision signals and enhanced cross-domain generalization; solving robust joint estimation under high orientation noise and in symmetric scenes; improving algorithm adaptability in dynamic, complex environments; and advancing engineering practice while improving the evaluation system, thereby providing clear guidance for the development of the field. [Conclusions] This paper systematically clarifies the technological evolution of this field from 2021 to 2025, constructs a multi-dimensional algorithm classification scheme, quantitatively analyzes the performance of mainstream algorithms, extracts the core technical challenges, and puts forward five breakthrough directions, establishing a clear path for research in the field. This work provides a reference for theoretical research, algorithm design, and technology selection, and offers theoretical support for the engineering deployment and large-scale application of the technology in GNSS-denied scenarios.
[Objectives] Image feature matching is a method that establishes correspondences between images by identifying homologous image points across different images. High-precision image feature extraction and robust matching are fundamental tasks in photogrammetry, remote sensing, and computer vision. However, prevailing image feature extraction and matching algorithms primarily focus on improving matching completeness under various complex conditions, often neglecting the critical issues of insufficient feature localization accuracy and a scarcity of high-precision matching points. Although the sparse correspondences obtained by existing algorithms are sufficient for most tasks, these issues can still compromise the reliability of subsequent practical applications. [Methods] To address these limitations, this paper proposes a modular feature matching framework enhanced by arbitrary-scale super-resolution. A typical image feature matching workflow comprises three core steps: feature extraction, feature description, and feature matching. By integrating an image super-resolution reconstruction module into this traditional pipeline, the framework enhances the localization accuracy and detectability of image feature points. This approach effectively increases the number of high-precision matching points and improves overall matching accuracy. Furthermore, the modular architecture allows individual algorithmic components to be substituted to suit diverse matching scenarios, ensuring broad applicability. [Results] Comparative experiments were conducted against current mainstream feature extraction and matching methods on the public datasets MatrixCity and ETH3D.
On the MatrixCity dataset, our method increases both the number of high-precision match points and the feature matching accuracy, exceeding the AKAZE baseline by up to 7 times in match count and 58.65% in accuracy. Performance gains were observed across all tested super-resolution scales, with the 6× scale yielding the highest number of correct matches and the 20× scale achieving the lowest Root Mean Square Error (RMSE). On the ETH3D dataset, the number of feature points increased significantly while a high number of accurately matched points was maintained; with the AKAZE algorithm, the number of matching points increased by up to 54 times. [Conclusions] Compared to existing mainstream methods, the proposed approach shows significant advantages in both matching accuracy and the quantity of matching points. The relevant modular algorithms can be directly applied to enhance the performance of current feature extraction and matching pipelines. Experimental results in complex scenarios demonstrate the robustness of the proposed algorithm. This provides a reference for enabling high-precision feature matching technology to serve more robustly and extensively in practical applications that rely on high-accuracy matching, such as 3D reconstruction, remote sensing surveying, and visual positioning.
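The coordinate bookkeeping behind super-resolution-assisted matching can be sketched simply: keypoints detected on an s× upscaled image are mapped back into the original image frame by dividing by the scale factor, which is where the sub-pixel localization gain comes from. The helper name below is hypothetical and the sketch omits the detection and matching stages themselves.

```python
# Hypothetical sketch: mapping keypoints found on a super-resolved image back
# to original-image coordinates. A detector that localizes to whole pixels on
# a 6x upscale effectively localizes to a 1/6-pixel grid in the original.
def to_original_coords(keypoints_sr, scale):
    """Divide (x, y) keypoint coordinates by the super-resolution factor."""
    return [(x / scale, y / scale) for (x, y) in keypoints_sr]

kps_sr = [(601.0, 302.0), (1203.0, 904.0)]  # detected on the 6x upscale
print(to_original_coords(kps_sr, 6.0))
```

Because only the coordinate mapping couples the super-resolution module to the rest of the pipeline, the detector, descriptor, and matcher can each be swapped out, which is the modularity the framework relies on.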
[Objectives] Remote sensing change detection technology is crucial for urban management, environmental monitoring, and disaster emergency response, as it provides timely and accurate information on changes in the Earth's surface. However, traditional change detection methods based solely on remote sensing images face limitations such as the inability to fully capture complex changes and the lack of effective integration of multi-source data, leading to inaccurate detection results. The integration of remote sensing images with Digital Surface Models (DSM) is a promising solution, but challenges remain in effectively fusing data from different modalities and accurately aligning their spatiotemporal features. This paper aims to address these issues by proposing a novel multimodal remote sensing change detection method that enhances the accuracy and robustness of change detection through effective data fusion and spatiotemporal feature integration. [Methods] To address the limitations of traditional change detection methods, this paper proposes a multimodal remote sensing change detection method named Cot-FresUNet, based on a Siamese neural network and a spatiotemporal dependency model. The method leverages optical images and DSM data as inputs and employs a two-dimensional Siamese network architecture in the encoding stage. This architecture is enhanced by a multi-scale feature fusion mechanism, which significantly improves the model's ability to integrate common features from multimodal data and distinguish subtle similarities and differences in feature representations. Additionally, the proposed spatiotemporal dependency model fully integrates the temporal continuity and spatial correlation of surface changes through multi-temporal feature interaction, thereby enhancing the model's capacity to represent and detect changes accurately.
In the decoding stage, spatial pyramid pooling is utilized to obtain multi-scale features, which further enhances the deep fusion of local features with global contextual information. This comprehensive approach ensures that the model can effectively capture both fine-grained and large-scale changes in the monitored areas. [Results] The effectiveness of the proposed Cot-FresUNet method was evaluated on the publicly available 3DCD dataset and compared with eight existing methods. The results demonstrated significant improvements, with the proposed method achieving a 14.45% increase in F1-score and a 15.77% increase in Mean Intersection over Union (mIoU) compared to the previous best-performing method. Detailed analysis of the feature maps and learning effects at different stages of the model revealed the strengths of the proposed architecture. Furthermore, the performance of four fusion strategies, including early fusion and late fusion, was compared, and the fusion of optical imaging and DSM inputs was proven to achieve the highest accuracy in change detection through ablation experiments. Finally, the complexity metrics of eight types of models were compared, and the proposed method showed significant advantages in terms of parameter count and computational cost, making it highly efficient for practical applications. [Conclusions] The results indicate that the proposed Cot-FresUNet method can effectively improve the accuracy of multimodal change detection and address the issues of feature confusion, reliance on single-feature representations, and insufficient interaction of different spatiotemporal information that are commonly found in traditional Siamese network models. This study not only provides a robust solution for remote sensing change detection but also opens new avenues for future research in multimodal data integration and spatiotemporal analysis.
The proposed method's ability to integrate optical images and DSM data through an advanced fusion mechanism and its enhanced representation of spatiotemporal features make it a valuable tool for applications in urban management, environmental monitoring, and disaster response.
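The F1-score and mIoU gains reported above can be grounded in their standard definitions; a minimal sketch for binary change detection from confusion counts (the counts below are illustrative, not from the paper's experiments):

```python
# Standard definitions of F1 and IoU for the 'changed' class, computed from
# true positives (tp), false positives (fp), and false negatives (fn).
def f1_and_iou(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)  # intersection over union of the changed class
    return f1, iou

f1, iou = f1_and_iou(tp=80, fp=10, fn=10)
print(round(f1, 4), round(iou, 4))  # 0.8889 0.8
```

Note that mIoU averages the per-class IoU over both the changed and unchanged classes, so the single-class function above is only the building block.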
[Objectives] Points of Interest (POIs) constitute fundamental semantic units for representing and deconstructing complex urban systems and are widely used in applications such as urban functional zoning, economic geography analysis, urban vitality assessment, and commercial site selection. Despite their importance, existing POI-based studies are increasingly constrained by data-related challenges. In particular, POI datasets are often difficult to update in a timely manner, exhibit substantial variability in data quality, and are typically available only at coarse spatial scales (e.g., district or county level). These limitations hinder fine-grained urban analysis and restrict the applicability of POI data in dynamic urban sensing and micro-scale decision-making. Addressing the need for timely, accurate, and street-level POI information remains a critical challenge in urban spatial intelligence. [Methods] To overcome these limitations, this paper proposes an efficient and fully automated method for roadside POI collection in urban environments based on an edge-cloud collaborative architecture. At the edge layer, a lightweight yet high-performance perception pipeline is deployed on a self-developed edge computing device. Specifically, the YOLO11-SORT model is employed to achieve real-time detection, tracking, and geolocation of roadside advertising signs from continuous street-view video streams. A virtual tripwire mechanism is further introduced to associate visual trajectories with spatial coordinates in an event-driven manner, reducing redundant localization operations. Subsequently, a two-stage Optical Character Recognition (OCR) framework combining DBNet for text detection and SVTR for text recognition is applied to extract structured textual information from advertising signs. 
At the cloud layer, a multimodal Chain-of-Thought prompt (CoT-TP) is designed, integrating textual semantics with visual spatial features to guide large language models in semantic reasoning, noise correction, and POI information synthesis. This design enables the generation of clean, standardized, and semantically consistent POI records from noisy real-world text. [Results] Experiments conducted on a dataset of 17 700 Chinese urban street-scene images from the BDCI competition demonstrate the effectiveness and efficiency of the proposed approach. The real-time roadside billboard detection model achieved a detection accuracy of 78.94% while maintaining a processing speed of 45.2 FPS, satisfying practical deployment requirements for edge devices. In POI information extraction tasks, the multimodal CoT-TP strategy achieved an F1 score of 86.52%, outperforming pure text-based prompting strategies by 4.24%. Field validation conducted in Gulou District, Fuzhou further verified the robustness and adaptability of the proposed system under complex urban road and environmental conditions. Notably, the average end-to-end processing time for a single POI—from data acquisition to final output—was approximately 0.5 seconds, indicating excellent overall system efficiency. [Conclusions] The proposed method enables reliable generation of highly time-sensitive, fully traceable, and fine-grained street-level POI data. By tightly integrating edge perception, multimodal semantic reasoning, and cloud-based intelligence, this approach provides a scalable and practical solution for dynamic urban sensing, smart city applications, and micro-business spatial analysis.
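The virtual tripwire mechanism described above can be sketched with a standard segment-crossing test: a geolocation event fires only when a tracked sign's image trajectory crosses a fixed line, so localization runs once per object rather than per frame. The geometry below is a generic orientation-sign test under illustrative coordinates, not the paper's exact implementation.

```python
# Sketch: event-driven tripwire crossing via cross-product orientation signs.
def _side(a, b, p):
    """Sign of the cross product: which side of line a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def crosses_tripwire(prev_pt, curr_pt, wire_a, wire_b):
    """True if the track step prev->curr properly intersects segment a-b."""
    d1 = _side(wire_a, wire_b, prev_pt)
    d2 = _side(wire_a, wire_b, curr_pt)
    d3 = _side(prev_pt, curr_pt, wire_a)
    d4 = _side(prev_pt, curr_pt, wire_b)
    return (d1 * d2 < 0) and (d3 * d4 < 0)

# A track moving left to right across a vertical wire at x=5 triggers once;
# motion that stays on one side never does.
print(crosses_tripwire((4, 3), (6, 3), (5, 0), (5, 10)))  # True
print(crosses_tripwire((1, 3), (3, 3), (5, 0), (5, 10)))  # False
```

Gating the expensive geolocation and OCR stages on this boolean is what keeps the edge pipeline's per-frame cost low.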
[Objectives] As a core spatiotemporal resource in intelligent transportation systems, the security of trajectory data is crucial for achieving trustworthy data sharing. However, existing digital watermarking algorithms for trajectory data often fail to withstand data compression and geometric transformation attacks, and inevitably compromise data integrity during the embedding process. These limitations hinder the simultaneous realization of copyright protection and integrity verification, thereby restricting the sharing and utilization of trajectory data. To address this issue, this paper proposes a dual watermarking algorithm that integrates both robust and fragile watermarks. The algorithm leverages invisible characters to achieve covert embedding of watermark information, thereby enhancing the security and usability of trajectory data. [Methods] This algorithm fully utilizes the textual representation characteristics of trajectory data, mapping watermark information to zero-width Unicode control characters to embed it into the trajectory data. This ensures the concealment and invisibility of the watermark, meeting the dual requirements of high data accuracy and privacy protection in intelligent transportation systems. The robust watermarking component employs Hamming codes to enhance error correction capabilities, ensuring accurate extraction of watermark information even after data transmission or processing. The encoded robust watermark is appended to the end of specific data fields, ensuring no impact on the precision of the trajectory data. Concurrently, a fragile watermark is constructed by utilizing trajectory grouping features to generate a structured watermark uniquely corresponding to each trajectory record. This fragile watermark is also encoded into invisible Unicode control characters and embedded into specific data fields, serving the purpose of integrity verification and tamper detection. 
[Results] Experiments conducted on the GPS trajectory datasets T-Drive and Geolife demonstrate that the proposed method effectively preserves both the usability and integrity of the trajectory data while exhibiting strong resilience against various malicious operations, including geometric transformations, and random insertion and deletion attacks. The correlation coefficients between the extracted invisible watermark and the original watermark remained above 0.9 after undergoing attacks, indicating strong robustness of the watermarking scheme. Furthermore, under diverse data tampering scenarios with varying types and intensities, the computed True Positive Rate (TPR) for tamper detection consistently reached 1.0, confirming the algorithm's capability to accurately identify and localize tampering behaviors. [Conclusions] The dual-watermarking algorithm proposed in this paper possesses both robustness and fragility, enabling simultaneous copyright protection and integrity verification of trajectory data. This addresses the limitations of existing trajectory data watermarking techniques that struggle to balance these two functions. It is applicable to various smart transportation data security scenarios and can effectively promote the trustworthy sharing and collaborative utilization of trajectory data, thereby driving the secure development and data trustworthiness of smart transportation systems.
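The covert-embedding idea can be made concrete with zero-width characters. The bit-to-character mapping below (U+200B for 0, U+200C for 1) is one plausible choice, not necessarily the paper's; the sketch also omits the Hamming-code error correction and the fragile-watermark grouping that the algorithm layers on top.

```python
# Sketch (assumed mapping): watermark bits as invisible Unicode characters
# appended to a trajectory field, leaving the visible numeric value intact.
ZW0, ZW1 = "\u200b", "\u200c"  # zero-width space / zero-width non-joiner

def embed(field: str, watermark_bits: str) -> str:
    """Append the bit string to a data field as zero-width characters."""
    return field + "".join(ZW1 if b == "1" else ZW0 for b in watermark_bits)

def extract(field: str) -> str:
    """Recover the bit string from the embedded zero-width characters."""
    return "".join("1" if c == ZW1 else "0" for c in field if c in (ZW0, ZW1))

marked = embed("116.397428", "1011")
print(marked == "116.397428")  # False: the strings differ by invisible chars
print(extract(marked))         # 1011: the watermark survives round-trip
```

Because the watermark occupies characters with zero display width and the numeric prefix is untouched, data precision is preserved exactly, which is the property the scheme depends on.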
[Objectives] Urban traffic flow prediction constitutes a core and fundamental task within the realm of Intelligent Transportation Systems (ITS). However, in real-world scenarios, the spatiotemporal distribution of traffic flow inherently exhibits highly complex cross-scale characteristics. On the one hand, it encompasses macro-level dynamic evolution trends at the city-wide scale; on the other hand, it is significantly constrained by micro-level vehicle flow mobility and diffusion effects within local neighborhoods. Existing methodological approaches often encounter significant difficulties in accurately capturing and distinguishing these two distinct types of features at different scales within a single unified model. Consequently, this limitation inevitably restricts the prediction accuracy of the models, particularly when dealing with complex and volatile traffic patterns. [Methods] To solve the above problems, this paper proposes an Urban Traffic Flow Prediction Model Capturing Global Dynamic and Local Propagation Features (GDLP-Net). The proposed model adopts a sophisticated dual-branch collaborative framework to address the issue. Specifically, the Global Dynamic Branch utilizes Convolutional Long Short-Term Memory (ConvLSTM) networks to extract city-level temporal evolution rules and capture long-term dependencies, subsequently generating a global spatial attention map. In parallel, the Local Attention Branch is designed to explicitly simulate the physical processes of traffic flow propagation within neighborhoods of varying spatial ranges. Building upon this foundation, the model leverages the generated global attention map to adaptively enhance the local propagation features, thereby achieving an effective and synergistic integration of spatiotemporal traffic flow information across different spatial scales. 
[Results] Extensive empirical experiments conducted on three public benchmark datasets—TaxiBJ, BikeNYC, and TaxiNYC—demonstrate that the method proposed in this paper can effectively reduce traffic flow prediction errors, with prediction accuracy comprehensively outperforming existing state-of-the-art baseline models. Specifically, when compared with Sequential Periodic Network (SPN), which is the best-performing baseline model in current literature, GDLP-Net exhibits remarkable performance improvements. On the BikeNYC dataset, the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) are reduced by approximately 6.8% and 4.6%, respectively; similarly, on the TaxiBJ dataset, the MAE and RMSE are reduced by approximately 4.0% and 4.1%, respectively. Furthermore, on the fine-grained TaxiNYC dataset, which is characterized by a 15-minute time interval and drastic data fluctuations, the model still achieves significant reductions of approximately 2.9% in MAE and 6.6% in RMSE. These experimental results provide strong evidence of the model's robust generalization ability in handling diverse data scenarios. [Conclusions] By systematically integrating global evolution trends and local propagation information, the method proposed in this paper effectively strikes a balance between model prediction accuracy and interpretability. Consequently, this study provides key technical support and valuable insights for critical application domains, such as alleviating traffic congestion, monitoring urban planning dynamics, and assisting in the optimization of transportation planning.
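The coupling between the two branches can be sketched at the level of a single grid: a global attention map, squashed to (0, 1), reweights the local propagation features cell by cell before prediction. The numbers and the sigmoid gating are illustrative assumptions about the general mechanism, not GDLP-Net's exact formulation.

```python
# Illustrative sketch: global attention logits gate local propagation features.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def modulate(local_feat, global_map):
    """Scale each grid cell's local feature by sigmoid of its global score."""
    return [[f * sigmoid(g) for f, g in zip(frow, grow)]
            for frow, grow in zip(local_feat, global_map)]

local_feat = [[2.0, 4.0], [6.0, 8.0]]     # local propagation features
global_map = [[0.0, 10.0], [-10.0, 0.0]]  # global attention logits
out = modulate(local_feat, global_map)
print([[round(v, 3) for v in row] for row in out])  # [[1.0, 4.0], [0.0, 4.0]]
```

Cells the global branch deems unimportant are suppressed toward zero while strongly attended cells pass through nearly unchanged, which is how the global trend adaptively enhances the local features.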
[Objectives] Understanding the mismatch between subjective safety perception and objective crime risk is essential for developing data-driven urban safety and livability strategies. Although existing studies have explored the relationships between built environment characteristics, crime risk, and safety perception, two key limitations remain: first, most existing studies rely on grid-based units of analysis, which tend to obscure street-level heterogeneity and mask potentially opposing mismatch mechanisms; second, the predominant reliance on traditional linear methods limits the ability to capture nonlinear effects and complex interactions among environmental factors. Despite the recent application of machine learning methods to street-level crime prediction and environmental analysis, interpretable research on the formation mechanisms of perception-risk mismatch remains relatively scarce. Therefore, this study develops an interpretable machine learning analytical framework at the street scale to reveal how built environment factors influence the spatial coupling between safety perception and crime risk. [Methods] Built environment features extracted from Street View Imagery (SVI) and socioeconomic variables were mapped to street segments to construct explanatory variables. Safety perception scores were obtained using a CNN-based model. Through standardization methods, safety perception and objective crime risk were classified into three matching types as the dependent variables. Subsequently, the optimal XGBoost model was selected through comparison experiments, and the SHAP interpretation method was introduced to quantitatively analyze the influence mechanisms of different environmental factors on the formation of two mismatch types at both global and local levels. 
[Results] Taking Chaoyang District, Beijing as the study area, built environment data and records of theft and violent crimes were integrated to conduct classification prediction experiments and interpretability analysis of street-level safety perception and crime risk mismatch. The results indicate significant spatial heterogeneity in perception-risk mismatch: objectively unsafe streets in urban core areas predominantly exhibit "high perceived safety" mismatch, while objectively safe streets in peripheral areas display "low perceived safety" mismatch. Among streets characterized by the "objectively unsafe-high perceived safety" type, building area exerted the greatest influence on both theft and violent crime (SHAP values of 0.636 and 0.781, respectively), followed by street length (0.337) and road area (0.320), with nonlinear enhancement and conditional moderation effects observed among variables. [Conclusions] By integrating interpretable machine learning with street-scale spatial analysis, this study advances understanding of how built environment characteristics shape biases in safety perception through nonlinear and interactive mechanisms. The findings move beyond grid-based or traditional linear analyses by uncovering fine-grained, street-level mismatch patterns, thereby providing both methodological support and empirical evidence for targeted urban safety governance and context-sensitive spatial design interventions.
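The construction of the three matching types from standardized scores can be sketched as follows; the zero threshold and the toy values are illustrative assumptions, since the paper does not specify its cut-offs here. A street is a mismatch when perceived safety and objective risk point the same way (both high or both low), and matched when perception correctly tracks risk.

```python
# Sketch: z-standardize perception and risk, then label each street segment.
from statistics import mean, stdev

def zscores(values):
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def match_type(z_perception, z_risk, t=0.0):
    """Classify one street; higher risk z means objectively less safe."""
    if z_risk > t and z_perception > t:
        return "unsafe-high perceived safety"  # risky, yet felt safe
    if z_risk <= t and z_perception <= t:
        return "safe-low perceived safety"     # safe, yet felt unsafe
    return "matched"

perception = zscores([0.9, 0.2, 0.7, 0.1])  # perceived-safety scores
risk = zscores([0.8, 0.1, 0.2, 0.9])        # crime-risk scores
print([match_type(p, r) for p, r in zip(perception, risk)])
```

These labels then serve as the dependent variable for the XGBoost classifier, with SHAP values attributing each prediction back to built environment features.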
[Objectives] Mining-induced subsidence significantly constrains the safe operation of mines and threatens the stability of surface structures. The accuracy of surface subsidence prediction is influenced by the mine's geological conditions and mining parameters. The applicability of theoretical models and the engineering effectiveness of measurement techniques are constrained, making it challenging to meet the requirements for efficient and precise subsidence monitoring and early warning. [Methods] Leveraging the strong scalability of discrete element simulation samples and the data-driven advantages of deep learning, an ensemble learning time series prediction method based on discrete element simulation data is proposed to achieve accurate prediction of mining subsidence. Taking the 1208 working face of the Fengjiata Coal Mine as a case study, we independently developed parametric modeling and automated monitoring workflows using Rhino-Grasshopper and 3DEC, respectively. This facilitated high-precision discrete element parametric modeling, mining subsidence simulation, and automated monitoring of surface subsidence. Model parameters were calibrated through the integration of in-situ measurements and analyses of mining pressure behavior, resulting in the construction of a time series dataset for surface subsidence. Three deep learning algorithms—LSTM, Transformer, and the improved Temporal Sequence Transformer (TST)—are used to develop time series prediction models for surface subsidence under mining-induced effects. The optimal model is selected based on regression evaluation metrics, and LSTM is integrated as a front-end feature extractor with the optimal model, leading to the construction of an ensemble learning model for mining subsidence prediction. 
[Results] The results indicate that the TST model achieved superior predictive performance (for the inclination monitoring points: R2 = 0.969, REVS = 0.973, RMAE = 0.022, RMSE = 0.001; for the strike monitoring points: R2 = 0.965, REVS = 0.968, RMAE = 0.026, RMSE = 0.002). Its performance was further improved after optimization with the LSTM model (for the inclination monitoring points: improvements of 2.06% in R2 and 1.64% in RMSE were achieved; for the strike monitoring points: improvements of 1.66% in R2 and 1.45% in RMSE were achieved). The proposed TST-LSTM framework shows enhanced sensitivity and improved capture of short-term dynamics in time-series data. The measured subsidence data from the adjacent 1209 working face and the Yujialiang Coal Mine's 52309 working face were compared with the model predictions. The maximum subsidence errors in the strike and dip directions for the 1209 working face were 69.2 mm and 47.0 mm, respectively. For the 52309 working face, the maximum subsidence errors in the strike and dip directions were 81.2 mm and 74.5 mm, respectively. These results demonstrate that the model exhibits high prediction accuracy. [Conclusions] This study integrates automatic monitoring of surface subsidence time series data, dataset construction, and ensemble learning techniques, proposing an intelligent prediction method for mining subsidence.
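The time series dataset construction feeding the LSTM/Transformer predictors can be sketched as standard sliding-window slicing: each sample pairs a fixed-length input window with the next observation. The window length and subsidence values below are illustrative, not the study's configuration.

```python
# Sketch: slicing a simulated subsidence series into sequence-to-one samples.
def make_windows(series, window=3):
    """Return (input_window, next_value) pairs for supervised training."""
    samples = []
    for i in range(len(series) - window):
        samples.append((series[i:i + window], series[i + window]))
    return samples

subsidence = [0.0, -2.1, -5.4, -9.8, -14.5, -18.2]  # mm, illustrative values
for x, y in make_windows(subsidence):
    print(x, "->", y)
```

Because the discrete element simulations can generate arbitrarily many such monitored series, this slicing is what turns the parametric 3DEC runs into a scalable training dataset.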
[Objectives] Remote sensing target detection is a core technology in intelligent geographic information interpretation. It is vitally important in fields such as resource exploration, natural disaster assessment, and military target identification, providing decision-makers and researchers with rapid, accurate key surface information and playing an important supporting role in policy implementation and scientific decision-making. However, existing methods fall short in lightweight design and real-time performance on edge devices, and the detection accuracy for small targets urgently needs improvement. [Methods] In response to these issues, this paper proposes a lightweight remote sensing object detection model, DSNR-YOLO (Depthwise Separable Convolution with Normalization Attention and Residual Block-YOLO). The model introduces an improved DSNR block in the backbone, fusing depthwise separable convolution, the Normalization-based Attention Module (NAM), and residual connections through a dual-branch selection strategy, which maintains detection accuracy while reducing the parameter count and computational complexity. Ghost Convolution (GhostConv) is embedded in the feature extraction and fusion stages to further reduce the model's power consumption, and a Bidirectional Feature Pyramid Network (BiFPN) is deployed in the neck network to achieve efficient interaction of multi-scale features and enhance small-target detection. In addition, an adaptive weighted hybrid loss function is designed to optimize learning on hard and easy samples. [Results] Experiments on the DIOR dataset show that DSNR-YOLO reaches 66.9% mAP and 89.0% mAP50 with only 18.3 GFLOPs, 7.15 M parameters, and a frame rate of 128 Frames Per Second (FPS), improving detection accuracy while reducing hardware resource requirements.
Its detection accuracy surpasses remote sensing detection models published in recent years. Comparisons on the SIMD dataset with mainstream lightweight modules and networks show that, under the same deployment conditions, DSNR-YOLO achieves mAP gains of 2.5%–6.3% while remaining highly lightweight, demonstrating the superiority of its overall performance. [Conclusions] DSNR-YOLO is a lightweight deep learning model designed specifically for remote sensing target detection. It integrates the core advantages of lightweight architecture design, real-time inference optimization, and high-precision detection, achieving a precise balance across three dimensions: model size, running efficiency, and detection performance. The model not only demonstrates significant practical value in real remote sensing applications, but also has broad prospects in emerging fields such as intelligent remote sensing and real-time edge perception, providing strong technical support for the development of remote sensing technology toward lightweight, intelligent, and real-time operation.
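The lightweighting contributed by depthwise separable convolution can be made concrete by counting weights (biases ignored; channel counts below are illustrative, not DSNR-YOLO's actual layer sizes):

```python
# Weight counts for a standard k x k convolution versus its depthwise
# separable factorization (k x k depthwise + 1 x 1 pointwise).
def standard_conv_params(c_in, c_out, k):
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # one k x k filter per input channel, then a 1x1 conv to mix channels
    return c_in * k * k + c_in * c_out

c_in, c_out, k = 64, 128, 3
std = standard_conv_params(c_in, c_out, k)
dws = depthwise_separable_params(c_in, c_out, k)
print(std, dws, round(std / dws, 2))  # 73728 8768 8.41
```

An eightfold-plus reduction per layer of this kind is what makes the parameter and GFLOP budgets reported above attainable on edge hardware.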