    • QIN Qiming

      [Objectives] With the rapid increase in the number of Earth observation satellites in orbit worldwide, remote sensing data has been accumulating explosively, offering unprecedented opportunities for Earth system science research to dynamically monitor global change. At the same time, it also brings a series of challenges, including multi-source heterogeneity, scarcity of labeled data, insufficient task generalization, and data overload. [Methods] To address these bottlenecks, Google DeepMind has proposed AlphaEarth Foundations (AEF), which integrates multimodal data such as optical imagery, SAR, LiDAR, climate simulations, and textual sources to construct a unified 64-dimensional embedding field. This framework achieves cross-modal and spatiotemporal semantic consistency for data fusion and has been made openly available on platforms such as Google Earth Engine. [Results] The main contributions of AEF can be summarized as follows: (1) Mitigating the long-standing “data silos” problem by establishing globally consistent embedding layers; (2) Enhancing semantic similarity measurement through a von Mises-Fisher (vMF) spherical embedding mechanism, thereby supporting efficient retrieval and change detection; (3) Shifting complex preprocessing and feature engineering tasks into the pre-training stage, enabling downstream applications to become “analysis-ready” and significantly reducing application costs. The paper further highlights the application potential of AEF in three stages: (1) Initially in land cover classification and change detection; (2) Subsequently in deep coupling of embedding vectors with physical models to drive scientific discovery; (3) Ultimately evolving into a spatial intelligence infrastructure, serving as a foundational service for global geospatial intelligence. 
Nevertheless, AEF still faces several challenges: (1) Limited interpretability of embedding vectors, which constrains scientific attribution and causal analysis; (2) Uncertainties in domain transfer and cross-scenario adaptability, with robustness in extreme environments yet to be verified; (3) Performance advantages that require more empirical validation across regions and independent experiments. [Conclusions] Overall, AEF represents a new direction for research in remote sensing and geospatial artificial intelligence, with breakthroughs in data efficiency and cross-task generalization providing solid support for future Earth science studies. However, its further development will depend on continuous advances in interpretability, robustness, and empirical validation, as well as on transforming the 64-dimensional embedding vectors into widely usable data resources through different pathways.
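      The retrieval and change-detection uses of the vMF spherical embedding mechanism can be sketched in a few lines: because the embeddings are unit-norm 64-dimensional vectors, semantic similarity reduces to a dot product on the sphere. The arrays below are random stand-ins for real embedding fields, not actual AEF outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(v):
    # Project vectors onto the unit sphere, as in a vMF-style embedding.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Random stand-ins for per-pixel 64-dimensional embeddings.
database = normalize(rng.normal(size=(1000, 64)))
query = normalize(rng.normal(size=(64,)))

# Similarity search: on the unit sphere, cosine similarity is a dot product.
scores = database @ query
top5 = np.argsort(scores)[::-1][:5]

# Change detection between two dates: cosine distance of the two embeddings.
change_score = 1.0 - float(database[0] @ database[1])
```

The same dot-product machinery serves both retrieval (rank by score) and change detection (threshold the distance), which is why a single embedding field can back several downstream tasks.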

    • ZHAO Luying, ZHOU Yang, HU Xiaofei, HUANG Gaoshuang, GAN Wenjian, HOU Mingbo

      [Significance] Cross-view geolocalization is the process of using satellite imagery with coordinate metadata as a reference to determine the geographic coordinates of an unknown ground-view image. This problem is often treated as an image matching task: an overhead satellite image is segmented into a number of square satellite patches, and the ground image is matched against candidate patches to retrieve the most similar one, with the position of that patch's center pixel taken as the query location. [Progress] As cross-view geolocalization has developed, the technique has been extended to fine-grained metric localization of ground imagery, i.e., identifying which image coordinates in a satellite patch correspond to the location where the ground image was captured. Given that satellite images have global coverage and are easy to obtain, their use as reference images has significantly broadened the application scope of image geolocation technology, prompting growing academic interest in cross-view geolocalization research. Along with the development of various algorithmic techniques, cross-view geolocalization has evolved from the manual extraction of features, mainly based on the geometric features of buildings, to deep learning approaches applicable to richer scenarios such as suburban and urban areas. The localization paradigm has progressed from image-level localization, which uses retrieval to directly assign the center coordinate of the retrieved satellite patch as the location of the ground image, to pixel-level fine-grained localization, which more accurately assigns the coordinates of the corresponding pixel in the satellite patch to the ground image.
However, the drastic change in viewing angle between ground and satellite images results in a huge difference in visual content, making cross-view image localization more challenging. To improve the accuracy of cross-view geolocalization, scholars have made algorithmic improvements in areas such as representation learning and metric calculation. To bridge the huge viewpoint differences, some scholars study specialized geometric transformations, image generation, and other viewpoint conversion methods between cross-view images; others improve localization accuracy with the help of directional information, UAV imagery as an intermediate connecting viewpoint, and more. [Purpose] This paper summarizes the development of cross-view geolocalization, the different methods for improving accuracy, the various datasets involved, and the evaluation methods at different stages. On this basis, we discuss future development trends and provide corresponding summaries.
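      The image-level retrieval paradigm described above can be sketched as follows; the descriptors and patch centers are synthetic placeholders for the features a trained network would produce.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: each satellite patch has a feature descriptor and
# the geographic coordinate (lat, lon) of its center pixel.
patch_descriptors = rng.normal(size=(500, 128))
patch_centers = rng.uniform(low=[30.0, 110.0], high=[31.0, 111.0], size=(500, 2))

def locate(ground_descriptor, descriptors, centers):
    """Image-level localization: retrieve the most similar patch and
    return its center coordinate as the estimated query location."""
    d = descriptors / np.linalg.norm(descriptors, axis=1, keepdims=True)
    g = ground_descriptor / np.linalg.norm(ground_descriptor)
    best = int(np.argmax(d @ g))          # cosine-similarity retrieval
    return centers[best], best

# Simulate a ground image whose true match is patch 42.
ground = patch_descriptors[42] + 0.01 * rng.normal(size=128)
coord, idx = locate(ground, patch_descriptors, patch_centers)
```

Pixel-level fine-grained localization replaces the last step: instead of returning the patch center, it predicts which pixel inside the retrieved patch the ground camera occupies.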

    • HOU Qingfeng, LU Jun, GUO Haitao, PING Yifan

      [Objectives] Cross-view image geo-localization technology establishes a correlation between images and real geographic space, holding significant research value for exploring the various attributes behind the images. In recent years, most deep-learning-based cross-view image geo-localization algorithms have focused excessively on image content, leading to overfitting of low-level details within the network and limited capability in extracting geometric spatial layout; as a result, performance on evaluation datasets is suboptimal. [Methods] To address these issues and improve the performance of cross-view image geo-localization algorithms, this paper presents a cross-view image retrieval and localization algorithm based on geometric relation constraints. First, the imaging principle of ground panoramic images is derived, and the mapping relationship between the spherical and plane rectangular coordinate systems is used to convert ground panoramic images to bird's-eye views, achieving initial geometric alignment between cross-domain matched images. Next, a CNN-Transformer hybrid feature extraction operator is designed. This operator not only extracts visual content features but also captures geometric spatial configuration information between local features, thus mitigating content and scale discrepancies caused by viewpoint changes. Furthermore, to suppress distorted information in ground images following geometric mapping transformations, a feature self-interaction module based on a relational affinity matrix is proposed. This module separates foreground from background information by calculating the correlations among local features, thereby enhancing key foreground information. Finally, a feature aggregation module is introduced to generate a global feature descriptor and complete the matching process.
[Results] Experiments conducted on the CVACT_val, CVUSA, and VIGOR datasets demonstrate that the algorithm achieves superior results, with Top-1 image recall rates of 89.28%, 96.42%, and 62.21%, respectively. Compared to GeoDTR, a similar algorithm, the accuracy of our algorithm improves by 3.07%, 1.04%, and 3.2%, respectively. [Conclusions] These results highlight the algorithm's superiority and adaptability across various application scenarios.
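      The geometric alignment step, converting a ground panorama to a bird's-eye view, can be sketched as an inverse polar lookup: each BEV pixel is mapped back to a panorama column (azimuth) and row (angle below the horizon). Image sizes, camera height, and ground range below are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def panorama_to_bev(pano, bev_size=128, cam_height=2.0, max_dist=20.0):
    """Map an equirectangular panorama (rows = elevation from +90 deg at the
    top to -90 deg at the bottom, cols = azimuth) onto a ground-plane
    bird's-eye view by nearest-neighbour inverse lookup."""
    h, w = pano.shape[:2]
    c = (bev_size - 1) / 2.0
    ys, xs = np.mgrid[0:bev_size, 0:bev_size]
    dx = (xs - c) / c * max_dist              # metres east of the camera
    dy = (c - ys) / c * max_dist              # metres north of the camera
    dist = np.maximum(np.hypot(dx, dy), 1e-6)
    azimuth = np.arctan2(dx, dy) % (2 * np.pi)
    elevation = -np.arctan2(cam_height, dist)  # ground lies below the horizon
    cols = (azimuth / (2 * np.pi) * (w - 1)).astype(int)
    rows = ((np.pi / 2 - elevation) / np.pi * (h - 1)).astype(int)
    return pano[rows, cols]

pano = np.arange(256 * 512, dtype=float).reshape(256, 512)
bev = panorama_to_bev(pano)
```

Because the ground plane only ever samples the lower half of the panorama, the lookup touches rows below the horizon line, which is the intuition behind the initial cross-domain alignment.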

    • XU Lianrui, YOU Xiong, LIU Kangyu, JIA Fenli

      [Objectives] Visual saliency modeling in urban driving scenarios is a key direction for enhancing the intelligence of driving systems. Existing methods often overlook the feature extraction of combined objects and the dynamic variations in spatiotemporal features. This results in low efficiency in the interaction and transmission of multi-factor features, failing to meet the demands for high precision and efficiency in saliency prediction. [Methods] To address these challenges, we propose a novel method for visual saliency prediction and modeling in driving scenarios, based on the concept of multi-factor feature fusion. First, we employ computer vision techniques to extract and quantify color and texture features from scenario images. Next, we introduce high-difference thresholds, Hu moments, and Fourier descriptors to focus on combined objects and extract their shape features. Subsequently, a three-dimensional convolutional network is used to extract spatiotemporal features from continuous scenario images. Finally, we design a saliency fusion module based on a Long Short-Term Memory (LSTM) network, which integrates color, texture, shape, and spatiotemporal features. Saliency prediction and heatmap generation are accomplished through a saliency map decoding module. [Results] Comparative experiments were conducted using five different methods on a driving dataset from Zhengzhou. The proposed method was evaluated using the Area Under the Curve (AUC) metric, achieving a prediction accuracy of 91.12%, significantly outperforming other methods. It was also effectively validated on public datasets such as DADA, Dr(eye)ve, and Deng. [Conclusions] Experimental results demonstrate that the proposed method not only improves prediction accuracy but also maintains computational efficiency, while effectively capturing the distribution of combined objects and the details of spatiotemporal changes within the scenarios. 
This advancement supports the evolution of perception technology from isolated task models toward cognitively coupled systems.
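      Of the shape features mentioned above, the Fourier descriptor is easy to illustrate: the object outline is read as a complex signal x + iy, and FFT magnitudes normalised by the first harmonic give a signature invariant to translation, scale, and rotation. The contours below are synthetic.

```python
import numpy as np

def fourier_descriptors(contour_xy, n=8):
    """First n Fourier descriptors of a closed contour, normalised by the
    first harmonic for scale invariance; magnitudes discard phase, giving
    invariance to rotation and starting point."""
    z = contour_xy[:, 0] + 1j * contour_xy[:, 1]
    f = np.fft.fft(z)
    mags = np.abs(f[1:n + 1])     # skip f[0], which only encodes translation
    return mags / mags[0]

# Illustrative contours: a circle and a scaled, shifted copy of it.
t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
circle = np.stack([np.cos(t), np.sin(t)], axis=1)
bigger = 3.0 * circle + np.array([10.0, 5.0])

d1 = fourier_descriptors(circle)
d2 = fourier_descriptors(bigger)
```

The two descriptor vectors coincide despite the scale and position change, which is what makes such signatures useful for characterising combined objects regardless of where they appear in the scene.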

    • LI Limin, ZHANG Aizhen, SHEN Jie, LI Jing

      [Objectives] In recent years, researchers have begun to focus on individuals' psychological experience of interacting with maps. It has become evident that an individual's experience with maps is significantly influenced by their spatial cognitive style and the information they receive. [Methods] This study employed the Spatial Cognitive Style Test (SCS) to classify participants by spatial cognitive style and varied the mode and scale of the navigation maps. We aimed to identify the optimal alignment between an individual's spatial cognitive style and the presentation mode of navigation maps from an offline representation perspective. [Results] The results show that: (1) Participants with a landmark style exhibited the poorest performance, followed by those with a route style, while participants with a survey style achieved the best results; (2) Compared to Judgment of Relative Direction (JRD) tasks, participants demonstrated higher proficiency in Scene and Orientation-dependent Pointing (SOP) tasks; (3) Participants performed better after learning large-scale maps than after learning small-scale maps; (4) Regardless of whether they learned large-scale or small-scale maps, landmark-style individuals demonstrated significantly higher accuracy in SOP tasks than in JRD tasks. By contrast, route-style participants did not exhibit a significant difference in accuracy between these two tasks. Moreover, survey-style participants did not exhibit a significant difference in accuracy between these two tasks after learning large-scale maps, but their accuracy in SOP tasks was significantly higher than in JRD tasks after learning small-scale maps. [Conclusions] Our results indicate significant differences in task performance among individuals with the three spatial cognitive styles, and that individuals tend to judge target locations based on their current positions.
In addition, the research results also suggest that using large-scale maps is more beneficial for individuals to judge the target location, and that individuals with different spatial cognitive styles have different performance on different tasks after using different scale maps. These results indicate that there is an optimal adaptability between an individual's spatial cognitive style and the mode of presentation of navigation maps and that people with different spatial cognitive styles have preferences for specific modes of map presentation. These findings are of great importance for enhancing individuals' experience of navigation map usage and also provide valuable suggestions and guidance for map designers and developers.

    • ZHANG Caili, XIANG Longgang, LI Yali, WANG Limei, HOU Shaoyang, YU Qian

      [Objectives] The setting of lane-turning signs at planar intersections is a critical measure to achieve orderly vehicle flow on a large scale. These signs not only assist traffic management authorities in controlling traffic at planar intersections, but also help drivers avoid detours caused by selecting the wrong lane. Therefore, lane-level turning relationships provide crucial information for precise navigation services. Setting turning signs at intersection lanes is a fundamental aspect of traffic management that promotes safety, orderliness, and efficiency. Considering the low cost and fast updates possible with crowd-sourced trajectories, this paper presents a study on the recognition of lane-level turning relationships at planar intersections using crowd-sourced data. [Methods] First, the intersection lane space was determined. Road network topology processing, trajectory data cleaning, and map matching were performed to establish connections between trajectory points and their corresponding road segments. Based on processed road network and trajectory data, trajectories within the intersection guide area were extracted, and noise was removed from multiple perspectives. Using a Gaussian mixture model, cluster analysis was then conducted to detect lane information at road segment intersections. Second, intersection lane noise turnings were removed. Horizontal and vertical statistics of lane turning trajectories were analyzed, and trajectories falling below the threshold were identified as noise turnings and removed. Finally, intersection lane-turning identification rules were designed, taking into account the distribution of different lane-turning trajectories. An unsupervised classification method was used to detect lane-turning information based on these rules. 
[Results] To validate the effectiveness of the method, we selected OpenStreetMap road networks and crowd-sourced trajectories from two areas in Beijing for experiments, focusing on 10 representative intersections on major city thoroughfares. The main findings are as follows: (1) Based on the original trajectory data, lane-level turning relationships were identified using one-day trajectories, peak-period trajectories, and off-peak trajectories, with recognition accuracy rates of 74.3%, 72.7%, and 55.7%, respectively; (2) After trajectory densification and simplification, recognition accuracy gradually improved with increased sampling frequency, reaching a maximum of 77.0% at a 3-second sampling interval; (3) The proposed method demonstrated advantages over the threshold segmentation method, the method without noise-turning elimination, and the topological connection method. [Conclusions] This research on lane-level turning relationship recognition based on crowd-sourced trajectories has significant implications for intelligent transportation and autonomous driving. The algorithms, technologies, and findings in this paper provide a valuable reference for future research and foster continued innovation and application of intelligent transportation systems.
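      The Gaussian-mixture step for detecting lanes from trajectory clusters can be sketched in one dimension: lateral offsets of map-matched trajectory points cluster around the centre of each lane. The lane spacing, noise level, and component count below are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Simulated lateral offsets (metres) of trajectory points on an approach
# segment with three lanes 3.5 m apart; GPS noise has ~0.4 m spread.
lane_centers = np.array([0.0, 3.5, 7.0])
offsets = np.concatenate(
    [rng.normal(c, 0.4, size=200) for c in lane_centers]
).reshape(-1, 1)

# Fit a 3-component Gaussian mixture; the sorted component means are the
# recovered lane centres.
gmm = GaussianMixture(n_components=3, random_state=0).fit(offsets)
est = np.sort(gmm.means_.ravel())
```

In practice the number of components is itself unknown and is typically chosen by an information criterion (e.g. BIC) over candidate lane counts.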

    • DENG Yuanyuan, XIANG Longgang, JIAO Fengwei

      [Objectives] Overpasses serve as hubs within the urban traffic system and constitute a vital component of the transportation infrastructure. Accurate extraction of intricate overpass structures is of paramount significance for transportation planning and vehicle navigation. Crowdsourced trajectory data have the merits of low cost, large data volume, and high real-time performance. In particular, vehicle trajectories contain a large amount of semantic information about travelling vehicles and road connectivity, making them a significant data source for road network information extraction. Nevertheless, extracting fine overpass structure from trajectory data lacking elevation information presents many challenges, because an overpass has a multi-layered three-dimensional structure with dense and intertwined internal roads, and its hierarchical structure and turning relationships are more complex than those of a typical intersection. [Methods] Considering these challenges, this paper proposes a forward and backward tracking and fusion method for automatic overpass structure extraction, taking into account both the macroscopic ordered connection and the microscopic similarity clustering of trajectory points. Firstly, a set of high-confidence vehicle trajectories located on the main body of the overpass is filtered out based on feature constraint rules. Then, taking manually marked overpass entrances and exits as tracking seed points, the trajectory is tracked recursively forward or backward through potential road-junction perceiving and verifying, and trajectory diverting based on flow clustering. After removing redundant branches, the overpass substructure is obtained.
Finally, a two-stage fusion strategy is used: all substructures extracted from the forward or backward tracking are first fused separately based on road junctions sharing the same geographical location, and then, by incorporating the road confluence information obtained from the backward tracking into the fused structure derived from the forward tracking, a complete two-dimensional structure of the overpass at the roadway level is extracted. [Results] Taking crowdsourced trajectory data collected from Shenzhen as the data source, structural extraction experiments and analyses were conducted on seven types of overpasses distributed in urban and suburban areas, including three-forked trumpet overpasses, four-forked double trumpet overpasses, four-forked partial cloverleaf overpasses, four-forked combined overpasses, turbo overpasses, complete cloverleaf overpasses, and deformed cloverleaf overpasses. The extracted structures are essentially located at the center of the real overpass roads in the remote sensing images, with an overall GEO precision of 95.50% and an overall TOPO precision of 88.96%. [Conclusions] The results indicate that the proposed method can combine the connectivity of trajectories with the bearing divergence of local trajectory point groups, effectively extracting the geometric structure and topological information of overpasses without incorrect connections at road-crossing stacks.
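      The microscopic step above, splitting a diverging flow by the bearings of local trajectory points, can be sketched with a simple gap-based grouping. This is an illustrative simplification (a real implementation would also handle the 0/360 degree wrap-around and weight by flow volume):

```python
import numpy as np

def split_by_bearing(headings_deg, gap=30.0):
    """Group trajectory headings into diverging branches: sort them and
    cut wherever consecutive headings differ by more than `gap` degrees."""
    h = np.sort(np.asarray(headings_deg) % 360.0)
    cuts = np.where(np.diff(h) > gap)[0]
    return np.split(h, cuts + 1)

# Headings observed just after a candidate diverge point: one flow
# continuing at ~90 deg and one exiting ramp at ~150 deg.
headings = [88, 91, 89, 152, 148, 150, 90, 151]
branches = split_by_bearing(headings)
```

Two recovered branches indicate a diverge, which is the cue the tracking procedure uses to spawn a new recursive tracking direction.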

    • QIU Tianxu, WANG Tao, ZHANG Yan, ZOU Haoyang, WANG Buyun, CHEN Chijie

      [Objectives] Road object detection is a significant application of LiDAR (Light Detection and Ranging) point clouds. As a rapidly developing mapping system, LiDAR can quickly collect point cloud data of object surfaces in road environments. Currently, many detection methods based on LiDAR point clouds perform well for large objects, close-range objects, or objects in simple scenes, accurately detecting objects under such conditions. However, due to sensor limitations during the acquisition process, LiDAR point clouds are often sparse and lack the texture information present in RGB images. This limitation leads to high false-detection and omission rates when dealing with small objects, long-distance objects, or objects in complex road environments. The rich texture information in RGB images can compensate for these deficiencies in point clouds, making it necessary to explore road object detection technologies based on the fusion of RGB images and LiDAR point clouds. [Methods] To improve road object detection in complex scenes, this paper uses PointRCNN as the baseline network and proposes a two-branch multi-stage fusion detection network, named EPG2LFusion, based on RGB images and LiDAR point clouds. The network introduces two key innovations. Firstly, to address the limitation of existing convolutional neural networks in extracting image features due to receptive-field constraints, a convolutional module named WaveDSConv is designed in the image branch. This module combines wavelet-transform convolution and depthwise separable convolution to enhance global image feature extraction, thereby improving the performance of fused object detection. Secondly, to overcome challenges in fusing the two different modalities of point clouds and images, a fusion module named G2L-Fusion is proposed.
This module establishes accurate point-pixel correspondence between point clouds and images using a projection matrix and effectively employs a channel attention mechanism to fuse global information and local information across multiple stages. [Results] The proposed method was evaluated on the KITTI benchmark dataset for road object detection across multiple categories (cars, pedestrians, cyclists). Results indicate that the proposed method achieves an average detection accuracy of 65.21% for all categories on the KITTI test set, which is 4.88% higher than the baseline network. For the challenging pedestrian category under medium difficulty, an average detection accuracy of 45.86% was achieved, demonstrating competitive performance compared to existing advanced algorithms. [Conclusions] The results confirm that the proposed algorithm leverages the rich texture features of RGB images to address the sparsity of LiDAR point clouds, significantly improving the detection accuracy of common road objects.
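      The point-pixel correspondence step can be illustrated with a pinhole projection: homogeneous 3-D points are multiplied by a 3x4 projection matrix and divided by depth to obtain pixel coordinates. The matrix below is an illustrative camera, not KITTI's calibration.

```python
import numpy as np

# Illustrative intrinsics: 700 px focal length, principal point (320, 240).
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # camera at the origin

def project(points_xyz, P):
    """Return (u, v) pixel coordinates for each 3-D point in camera frame."""
    homo = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])
    uvw = homo @ P.T
    return uvw[:, :2] / uvw[:, 2:3]   # perspective divide by depth

pts = np.array([[0.0, 0.0, 10.0],     # straight ahead -> image centre
                [1.0, 0.0, 10.0]])    # 1 m to the right at 10 m depth
uv = project(pts, P)
```

Once each LiDAR point has a pixel address, image features at that location can be gathered and fused with the point's own features, which is the premise of the multi-stage fusion described above.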

    • ZHANG Wenhui, CHENG Shifen, PENG Chaoda, LU Feng

      [Objectives] Path planning for multiple Unmanned Aerial Vehicles (UAVs) is a pivotal technology enabling efficient and cooperative operation of UAV swarms in complex environments. Fundamentally, it constitutes a constrained multi-objective optimization problem characterized by a sparse feasible region. Due to their robust global search capabilities, evolutionary algorithms have been widely adopted to address this class of problems. However, existing methodologies frequently neglect the spatial structural attributes of paths during the initialization phase and rely primarily on generic stochastic search operators for path regeneration. These limitations restrict their ability to effectively exploit the fitness function and the spatial relationships between UAVs and their operational environment, thereby hindering the generation of high-quality, globally feasible solutions under constrained computational resources. [Methods] To address these challenges, this paper introduces a Multi-Source Heuristic Evolutionary Algorithm (MSHEA) for multi-UAV path planning. MSHEA systematically integrates multi-source heuristic information related to spatial path structure, fitness data, and environmental context to enhance both the path initialization and regeneration processes. Specifically, we propose a Sequential Directed Expansion-based Initialization (SDEI) strategy to generate high-quality initial paths that exhibit spatial rationality and structural diversity. Furthermore, we develop a path regeneration mechanism integrating Fitness and Flight Environment Information (FFEI), which substantially improves the repair efficiency of infeasible paths and the local optimization of feasible ones. [Results] Empirical validation was conducted using eight publicly available benchmark datasets for multi-UAV path planning, covering a range of complexity levels.
The experimental results demonstrate MSHEA's superior performance and strong stability across diverse flight scenarios: (1) Compared to the second-best benchmark algorithm, MSHEA achieved a 1%~6% improvement in hypervolume and a 6%~81% reduction in inverted generational distance; (2) Both the SDEI and FFEI components were confirmed to have significant positive impacts on the algorithm's overall performance; (3) MSHEA shows low sensitivity to its newly introduced hyperparameters, indicating strong adaptability and generalization capability. [Conclusions] In conclusion, MSHEA improves the effectiveness of solving multi-UAV path planning problems by incorporating multi-source heuristic information, leading to robust and reliable performance on the challenges inherent in collaborative UAV missions.
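      The Inverted Generational Distance (IGD) reported above can be sketched directly: it is the mean distance from each point of a reference Pareto front to its nearest obtained solution, so lower is better. The fronts below are illustrative two-objective examples.

```python
import numpy as np

def igd(reference_front, obtained_front):
    """Mean distance from each reference point to its nearest solution."""
    d = np.linalg.norm(
        reference_front[:, None, :] - obtained_front[None, :, :], axis=-1
    )
    return d.min(axis=1).mean()

ref = np.array([[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]])
good = ref.copy()                       # exactly recovers the reference front
worse = ref + np.array([0.2, 0.2])      # uniformly dominated front
```

Hypervolume, the companion metric, instead measures the objective-space volume dominated by the obtained front relative to a reference point, so larger is better.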

    • ZHAO Binbin, ZHU Zhe, LIU Guang

      [Objectives] Real-world terrain varies immensely, and when extracting terrain feature points, reasonable bend division is a key step; the division of pseudo bends, in particular, is crucial for accurately extracting terrain branches and characteristic lines. Existing research predominantly employs a single merging method to address pseudo bends. However, the complex and diverse forms of actual terrain limit the applicability of such a unified approach, making it difficult to accurately extract terrain branches across different topographic scenarios. Furthermore, constrained by data accuracy, methods based on geometric indicators are vulnerable to noise points and to mutual interference among the indicators themselves. Consequently, extracting ideal terrain feature points remains highly challenging. [Methods] To address these issues, this paper proposes a method based on constrained Delaunay triangulation to identify contour bends, establishing a multi-branch tree bend model that incorporates the geometric structures of bends on both sides of the contour. This model is then employed to identify three distinct types of pseudo bends and to handle pseudo bend phenomena under various topographic conditions. The method classifies pseudo bends according to the structure of the constrained Delaunay triangulation, enabling accurate identification of terrain branches and ensuring that the distribution of characteristic lines corresponds to the "leaf-vein-like" geomorphological structure. Furthermore, this study introduces a weighted indicator approach designed to mitigate the impact of noise points and mutual interference among indicators on terrain feature point extraction. [Results] Three topographically complex regions were selected for experimentation. By integrating the multi-branch tree bend model with constrained Delaunay triangulation, terrain branches were identified and terrain feature points were extracted.
Quantitative evaluations demonstrate that: (1) In terrain feature point extraction, the proposed weighted indicator method significantly reduces absolute error compared with single-indicator and multi-criteria methods, with residual distribution variance decreased by 10%~55%, effectively suppressing noise interference in feature point selection; (2) In terrain branch extraction, the characteristic lines extracted by the proposed method exhibit greater total length, and the completeness of terrain branches increased by 9%~44% compared with other methods, more comprehensively reflecting the complex branching structure of actual terrain. [Conclusions] The pseudo bend division method proposed in this paper enables accurate identification of terrain branches, with significantly improved precision in terrain feature point extraction compared with existing geometry-based methods. The characteristic lines extracted after pseudo bend division show higher conformity with actual terrain morphology and stronger structural coherence.
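      The weighted-indicator idea can be sketched simply: several geometric indicators for candidate feature points are min-max normalised and combined with weights, so no single noisy indicator dominates the selection. The indicator names, values, and weights below are illustrative, not the paper's configuration.

```python
import numpy as np

def weighted_score(indicators, weights):
    """Min-max normalise each indicator column, then weight-sum per row."""
    ind = np.asarray(indicators, dtype=float)
    lo, hi = ind.min(axis=0), ind.max(axis=0)
    norm = (ind - lo) / np.where(hi > lo, hi - lo, 1.0)
    return norm @ np.asarray(weights)

# Columns: curvature, bend depth, local relief (illustrative values).
candidates = [[0.9, 12.0, 35.0],
              [0.2, 3.0, 10.0],
              [0.7, 10.0, 30.0]]
scores = weighted_score(candidates, [0.5, 0.3, 0.2])
best = int(np.argmax(scores))
```

Normalising before weighting keeps indicators with large numeric ranges (here, relief in metres) from swamping dimensionless ones (curvature), which is one way mutual interference among indicators is mitigated.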

    • LU Yanxu, ZHANG Xueying, ZHANG Chunju

      [Objectives] As the core spatial unit and spatiotemporal benchmark of the national governance system, administrative divisions reflect, through their historical evolution, the dynamic responses of state spatial governance to changes in political, economic, social, and other factors. However, existing studies on the evolution of administrative divisions rely excessively on static data recorded at the annual level, which makes it difficult to effectively uncover the specific temporal sequences of actions and the underlying causal mechanisms during the process of change. [Methods] This paper proposes a knowledge graph construction method for the evolution of administrative divisions from an event-based perspective. By abstracting administrative division changes as events and categorizing them, it develops a knowledge representation model of administrative division evolution from three levels: objects, events, and processes. Furthermore, it designs a knowledge graph representation scheme comprising multiple types of nodes—such as objects, events, attributes, temporal markers, and legal bases—and employs multi-layered relational patterns to reveal the logical connections among events, objects, and attributes. [Results] Based on data of administrative division changes in Yangzhou since 1949, an event-driven knowledge graph of administrative division evolution was constructed. This graph enables the representation of the complete life-cycle trajectory of Yangzhou's administrative divisions since 1949, and clearly delineates the specific change actions and causal relationships with related administrative divisions. [Conclusions] This method breaks through the limitations of the traditional year-based static recording approach, achieving fine-grained and temporally ordered dynamic characterization of the evolution process. 
It allows for a more precise depiction of the dynamic processes and temporal logic underlying administrative division changes, thereby providing a traceable and inferable spatiotemporal knowledge foundation to support national spatial governance and the development of the digital economy.
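      The node-and-relation scheme described above can be sketched as plain triples linking an event to its participating objects, time, and legal basis; the entity names, date, and legal basis below are illustrative placeholders, not records from the Yangzhou dataset.

```python
# Minimal event-centred knowledge-graph sketch: (subject, relation, object).
triples = [
    ("Event_001", "type", "MergerEvent"),
    ("Event_001", "occurred_on", "1983-01-18"),
    ("Event_001", "source_object", "Prefecture_A"),
    ("Event_001", "result_object", "City_A"),
    ("Event_001", "legal_basis", "Approval_Document_X"),
]

def neighbours(graph, node):
    """All (relation, target) pairs for a node: a one-hop traversal."""
    return [(r, o) for s, r, o in graph if s == node]

# Temporal ordering of events falls out of sorting their time attributes.
timeline = sorted(o for s, r, o in triples if r == "occurred_on")
```

Because change actions are first-class event nodes rather than annual snapshots, queries such as "which legal document triggered this merger" become one-hop traversals instead of archival lookups.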

    • ZHONG Cheng, WU Sheng, WANG Peixiao, ZHANG Hengcai, CHENG Shifen, LU Feng

      [Significance] Accurate prediction of urban residents' activity intensity is a fundamental research topic in Geospatial Artificial Intelligence (GeoAI), with important applications in urban planning, traffic management, and public safety. Although numerous predictive models have been proposed, effectively mining functional similarity among urban mixed-use areas as a prior to guide prediction remains a major challenge. Due to their composite nature, mixed-use areas exert differentiated impacts on resident activities across different time periods. Measuring multi-functional similarity based on diverse functional features and incorporating it as a prior into spatial dependency modeling holds promise for improving prediction accuracy. [Methods] To address this, the study proposes a novel Prior- and Data-Guided Spatio-Temporal Prediction Model (PDGSTPM). First, a hyperedge construction mechanism is designed within a hypergraph theoretical framework to represent urban functional semantics. Through self-supervised learning, the mixed functional characteristics encoded by POIs are explicitly transformed into a quantifiable, high-order multi-relational network. This enables the construction of functional similarity priors and facilitates a shift in spatial dependency modeling from first-order pairwise relationships to high-order structures. Second, a multi-granular similarity measurement method based on the 1-Wasserstein distance is introduced to capture morphological consistency in historical observation sequences, enabling data-level similarity representation that complements the functional similarity prior. Finally, by integrating both prior-guided and data-driven modeling approaches, a dual-guided graph neural network architecture is developed to accurately model complex spatio-temporal dependencies. [Results] Experiments were conducted to predict urban human activity intensity using mobile phone data from Xiamen City in March 2023.
Compared with the best-performing baseline method, the proposed model reduced RMSE and MAE by 3.2% and 9.1%, respectively, for one-step prediction, and by 5.6% and 9.8%, respectively, for two-step prediction. [Conclusions] The experimental results validate the effectiveness of the proposed dual-guidance architecture in accurately modeling spatio-temporal dependencies.
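The 1-Wasserstein distance used above for data-level similarity can be sketched as follows. The sketch treats each observation sequence's values as an empirical distribution (an assumption; the paper's multi-granular variant is not detailed in the abstract), in which case W1 between two equal-length samples reduces to the mean absolute difference of their sorted values:

```python
import numpy as np

def wasserstein_1d(x, y):
    """1-Wasserstein distance between two equal-length 1-D samples.

    For empirical distributions with the same number of points, W1
    reduces to the mean absolute difference of the sorted samples.
    """
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    if x.shape != y.shape:
        raise ValueError("sequences must have equal length")
    return float(np.mean(np.abs(x - y)))

# Two hypothetical hourly activity-intensity sequences
a = [10, 12, 30, 55, 40]
b = [11, 13, 29, 54, 41]
c = [40, 55, 30, 12, 10]  # same values, reversed temporal order

print(wasserstein_1d(a, b))  # small: similar value distributions
print(wasserstein_1d(a, c))  # 0.0: W1 compares value distributions only
```

Note that W1 on raw value distributions is order-invariant, which is presumably why the paper measures it at multiple granularities to capture morphological consistency over time.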

    • WANG Qingrong, MU Zhuangzhuang, ZHU Changfeng, HE Runtian, GAO Huanyi

      [Objectives] Traffic flow prediction is crucial for urban management and intelligent transportation systems. This study addresses challenges posed by abnormal data resulting from external disturbances, sudden events, and the complex spatiotemporal characteristics of traffic flow data. [Methods] We propose a Multilayer Neural Network Prediction model (MLNN-CAD) that accounts for abnormal data. To overcome the imprecision that arises when the Isolation Forest algorithm relies on a single parameter, we introduce a multi-level Isolation Forest algorithm. This approach improves anomaly detection accuracy by considering inter-parameter constraints and the inherent structure and patterns in traffic flow. To more precisely capture the dynamic impact of anomalies, we construct anomalous-impact dynamic graphs using node distances, Pearson correlation coefficients, and traffic flow, surpassing the limitations of traditional M-order matrices. Additionally, to better identify key traffic areas, we build a dynamic graph of important nodes based on a traffic congestion index, rather than relying solely on node entry and exit data. This captures localized dynamic traffic information more effectively. We integrate Graph Convolutional Networks (GCN) and multi-layer Graph Attention Networks with residual connections (ResGAT) to extract global, anomaly-influenced, and key-node spatial information. For temporal modeling, Informer is used to capture global spatiotemporal dependencies, while Extended Long Short-Term Memory (XLSTM) networks extract anomaly- and key-node-specific temporal features. The final traffic flow prediction is produced through a convolution fusion layer. [Results] The proposed model is validated using real-world traffic flow data from the PeMS04 and PeMS08 datasets, covering the period from January 1 to February 18, 2018.
Experimental results show that the MLNN-CAD model outperforms existing models including Informer, XLSTM, STSGCN, STFGCN, and VMD-AGCGRN. On the PeMS04 dataset, it reduces MAE, RMSE, and MAPE by 7.68%, 10.36%, and 6.06%, respectively, compared to VMD-AGCGRN. [Conclusions] The MLNN-CAD model offers a robust theoretical foundation for short-term traffic flow prediction under abnormal data conditions.
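The standard (single-parameter) Isolation Forest that the paper extends scores a point by how quickly random axis-aligned splits isolate it: anomalies sit in sparse regions and are separated in few splits. A from-scratch 1-D sketch of this baseline (not the paper's multi-level variant, whose inter-parameter constraints are not specified in the abstract) looks like this:

```python
import math
import random

def _path_len(p, data, depth, limit):
    """Depth at which a random split tree isolates point p within data."""
    if depth >= limit or len(data) <= 1:
        return depth
    lo, hi = min(data), max(data)
    if lo == hi:
        return depth
    split = random.uniform(lo, hi)
    side = [v for v in data if v < split] if p < split else [v for v in data if v >= split]
    return _path_len(p, side, depth + 1, limit)

def anomaly_score(p, data, n_trees=100):
    """Isolation-forest style score in (0, 1); higher = more anomalous."""
    n = len(data)
    limit = math.ceil(math.log2(n))  # standard depth cap
    avg = sum(_path_len(p, data, 0, limit) for _ in range(n_trees)) / n_trees
    # c(n): average path length of an unsuccessful BST search (normalizer)
    c = 2 * (math.log(n - 1) + 0.5772156649) - 2 * (n - 1) / n
    return 2 ** (-avg / c)

random.seed(0)
flow = [98, 102, 100, 97, 103, 99, 101, 100, 98, 350]  # 350 is a sudden burst
print(anomaly_score(350, flow))  # isolated in ~1 split: score near 1
print(anomaly_score(100, flow))  # inside the dense cluster: lower score
```

The paper's multi-level version replaces the single contamination/threshold parameter of this baseline with several mutually constrained parameters tuned to traffic-flow structure.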

    • CHEN Wenjun, ZHOU Chenxin, Tom Lotz, FENG Yuqian, ZHU Mingyu, CHEN Min, HE Bin

      [Objectives] Geographical research, particularly the study of emerging geographical concepts, is shaped by the inherent complexity of the discipline and the diversity of research perspectives. These factors often lead to discrepancies in cognition and focus among researchers, resulting in diverse and heterogeneous conceptual expressions and hindering the effective retrieval and systematic integration of domain knowledge. Text topic models, as a representative method for geographical research topic extraction, provide a key technical approach to addressing these issues. However, existing models exhibit limitations in semantic parsing and expression; their limited interpretability and "black box" nature restrict practical application, necessitating further exploration and refinement. [Methods] This study investigates the emerging interdisciplinary concept of "small and micro wetlands" and proposes a geographical research topic extraction method based on an integrated BERT-LDA model. The method leverages the advantages of the BERT model in long-text semantic understanding and the topic interpretability of the LDA model. It extracts and uncovers research themes and their underlying relationships related to small and micro wetlands from a large corpus of journal articles. Moreover, by constructing tailored paper retrieval rules, this method facilitates the deepening and expansion of domain-specific knowledge. [Results] The findings indicate that embedding high-dimensional semantic feature vectors of terms from journal papers into a low-dimensional topic space, combined with the introduction of a feature fusion adjustment factor in the calculation of keyword-topic influence, enhances the semantic parsing and expression capabilities of text topic models. This approach also helps overcome the "black box" limitations of existing models.
In addition, the proposed iterative modeling process gradually improves the differentiation and representativeness of output topics, while optimizing the distribution structure of keywords in the corresponding topic semantic space. Based on 4 606 Chinese journal articles published between 2012 and 2022 from the Wanfang Database, the integrated model identifies wetland pollution purification, urban wetland parks, and pond aquaculture as the three main research themes in the study of small and micro wetlands. Furthermore, 112 paper retrieval rules, derived from 11 keywords such as plants, removal, wastewater, microorganisms, urban, and landscape, were generated, enabling effective literature retrieval without reliance on specific terminology or naming conventions. [Conclusions] In the integrative and interdisciplinary context of geographical research, the proposed method can effectively consolidate fragmented domain knowledge caused by terminological diversity and nomenclatural heterogeneity, from a knowledge engineering perspective. This approach provides a feasible solution to enhance the interpretability of academic knowledge mining. Moreover, the results offer valuable insights for the conservation and management of small and micro wetlands.
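The "feature fusion adjustment factor" for keyword-topic influence might be pictured as a convex blend of an LDA topic-word probability with a BERT-derived semantic similarity. The sketch below is a hypothetical illustration (the function name, the weight alpha, and the linear form are all assumptions; both inputs are assumed normalized to [0, 1]):

```python
import numpy as np

def fused_influence(p_lda, sim_bert, alpha=0.6):
    """Blend LDA keyword-topic probabilities with BERT-based semantic
    similarities via a fusion adjustment factor alpha in [0, 1].

    alpha = 1 recovers pure (interpretable) LDA; alpha = 0 relies
    entirely on the contextual embedding similarity.
    """
    return alpha * np.asarray(p_lda) + (1 - alpha) * np.asarray(sim_bert)

# Hypothetical scores for three keywords against one topic
p_lda = [0.40, 0.25, 0.10]  # LDA topic-word probabilities
sim = [0.30, 0.80, 0.20]    # cosine similarity of BERT embeddings to topic
scores = fused_influence(p_lda, sim, alpha=0.6)
print(scores)  # the second keyword is promoted by its semantic similarity
```

A blend of this kind keeps LDA's interpretable topic-word distribution as the backbone while letting contextual semantics re-rank keywords, which matches the abstract's stated goal of improving semantic expression without losing interpretability.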

    • ZHANG Jiadan, ZHU Kun, ZHANG Zhenchao, GONG Zhihui, GUO Haitao, DAI Chenguang

      [Objectives] Addressing the unique challenges of high-resolution remote sensing images, such as vast scene coverage, complex backgrounds, detailed ship targets, and susceptibility to interference from similar targets, this paper proposes a ship detection method based on multi-scale Gaussian contextual feature attention. [Methods] First, a Spatial and Channel-Gaussian Context Transformer (SC-GCT) module is designed to effectively mitigate background noise and interference from similar targets. Second, to further enhance the informativeness of the feature maps, a Multi-Scale Feature Adaptive Fusion (MFAF) method is introduced, enabling more accurate representation of target semantics and spatial position. [Results] Comparative experiments were conducted on the HRSC2016 dataset. The results show that our algorithm achieves an mAP of 96.8%, 1.3 percentage points higher than the YOLOv8 baseline, with both detection accuracy and efficiency surpassing existing mainstream ship detection algorithms. Incorporating the SC-GCT and MFAF modules into the baseline network increased mAP by 0.7 and 1.1 percentage points, respectively, verifying the effectiveness of each component. To further validate the practical value of the proposed algorithm, remote sensing images from real port scenes such as Yokosuka Port, Lushun Port, and Mayport were tested. The algorithm maintained stable performance in these complex real-world environments. [Conclusions] The proposed algorithm significantly improves ship detection accuracy while demonstrating strong practicality and generalization capability in complex background scenarios.
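The abstract does not specify the internals of SC-GCT, but the general idea of Gaussian contextual weighting can be illustrated with a toy channel-attention pass: channels whose global response deviates strongly from the cross-channel statistics (e.g. channels dominated by background clutter) are down-weighted by a Gaussian of their standardized deviation. Everything below (function name, the exact form, sigma) is an assumption for illustration only:

```python
import numpy as np

def gaussian_channel_weight(feat, sigma=1.0):
    """Re-weight channels of a (C, H, W) feature map with a Gaussian of
    each channel's standardized deviation from the mean channel response.

    Channels whose global response strays far from the cross-channel mean
    get weights near 0; typical channels keep weights near 1."""
    c_mean = feat.mean(axis=(1, 2))                # (C,) per-channel response
    z = (c_mean - c_mean.mean()) / (c_mean.std() + 1e-8)
    w = np.exp(-z**2 / (2 * sigma**2))             # Gaussian weights in (0, 1]
    return feat * w[:, None, None]

rng = np.random.default_rng(0)
feat = rng.normal(0.0, 1.0, (8, 4, 4))
feat[3] += 10.0  # one channel dominated by "clutter"
out = gaussian_channel_weight(feat)
print(np.abs(out[3]).mean() < np.abs(feat[3]).mean())  # outlier channel suppressed
```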

    • LUO Yansong, CHEN Fulong, ZHU Meng, GAO Sheng, LI Hongqiang, ZHANG Xinru, CHEN Caiyan, CHENG Yanni

      [Objectives] Amidst urbanization, protecting archaeological heritage is increasingly challenging, and traditional predictive models often lack accuracy in complex terrains. To address this, this study develops a new methodology to improve the accuracy and efficiency of archaeological site prediction by fusing remote sensing image texture features with traditional environmental factors. [Methods] Multi-source data from ASTER GDEM, OpenStreetMap, Sentinel-2, and Sentinel-1 SAR were integrated to extract 12 environmental factors (e.g., elevation, slope) and 9 texture features based on the Gray-Level Co-occurrence Matrix (GLCM). Three predictive models—Frequency Ratio (FR), Maximum Entropy (MaxEnt), and Random Forest (RF)—were developed. The FR model was used to reveal ancient settlement preferences, while MaxEnt and RF analyzed the relative importance of environmental factors influencing site distribution. Model performance was systematically evaluated using the Area Under the ROC Curve (AUC) for overall accuracy and Kvamme's Gain for predictive efficacy across different probability zones. [Results] In a case study of Xi'an, the impact of fusing remote sensing textures on model performance varied across models. The Random Forest model most effectively leveraged the new features, with its AUC increasing from 0.85 to 0.86, while the AUCs of the MaxEnt and FR models showed no significant change. In terms of spatial distribution, RF achieved efficient identification (92.56% of known sites) within a relatively large area (44.53%), with its gain value in the higher probability region (level 5) improving significantly from 0.59 to 0.67. The MaxEnt model demonstrated exceptional precision, achieving a gain of 0.90 in an extremely small area (0.1%). In contrast, the FR model had the weakest performance, capturing only 7.79% of sites in high-probability zones, indicating limited discriminatory power.
[Conclusions] This study confirms that integrating remote sensing image texture features is an effective pathway for enhancing archaeological prediction, but model selection must align with specific research objectives. Random Forest is most suitable for large-scale, high-efficiency survey planning, while MaxEnt is particularly valuable for precisely targeting small, high-accuracy site hotspots. The proposed multi-model evaluation framework provides a scientific basis for methodological selection tailored to different archaeological goals and offers significant practical value for the preemptive protection of cultural heritage.
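Kvamme's Gain, the efficacy measure used above, is the standard formula 1 minus the ratio of the fraction of total area flagged as site-likely to the fraction of known sites that fall inside it: values near 1 indicate a highly efficient prediction, values near 0 chance level. Applied to the overall RF figures reported above (the paper's level-5 gain of 0.67 is computed for that probability zone's own area and site shares, which the abstract does not give):

```python
def kvamme_gain(area_fraction, site_fraction):
    """Kvamme's Gain: 1 - (fraction of total area flagged as site-likely /
    fraction of known sites captured). 1 = perfectly efficient, 0 = chance."""
    return 1.0 - area_fraction / site_fraction

# RF overall figures reported above: 92.56% of sites within 44.53% of the area
print(round(kvamme_gain(0.4453, 0.9256), 3))
```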