Research on Complex Spatiotemporal Semantic Modeling Integrating Spatial Data Engines and Graph Databases

Author Contributions: The study was designed by YUE Zichen, ZHONG Shaobo, and MEI Xin. The experiments were carried out by YUE Zichen. The manuscript was drafted and revised by YUE Zichen and ZHONG Shaobo. All authors have read the final version of the manuscript and consented to its submission.

YUE Zichen (1998-), male, from Fuyang, Anhui Province, master's student; his research focuses on geographic knowledge graphs and spatiotemporal semantic modeling. E-mail: yuezc.gis@gmail.com

Received date: 2024-12-30
Revised date: 2025-04-08
Online published: 2025-06-06
Supported by: National Natural Science Foundation of China (72174031)

Citation: YUE Zichen, ZHONG Shaobo, MEI Xin. Research on Complex Spatiotemporal Semantic Modeling Integrating Spatial Data Engines and Graph Databases[J]. Journal of Geo-information Science (地球信息科学学报), 2025, 27(6): 1289-1304. DOI: 10.12082/dqxxkx.2025.240715
[Objectives] Knowledge graphs, a leading technology for integrating multimodal data sources, have attracted significant attention in the GIS domain. They are typically built on graph databases, yet mainstream graph databases still struggle to organize and analyze geospatial-temporal data effectively. [Methods] To address this issue, this paper proposes an approach to spatiotemporal semantic modeling and query optimization that bridges a graph database and a spatial data engine implemented within a relational database. In the graph database, geographic entities are stored as lightweight placeholder nodes (holding only mapping IDs) and linked to spatiotemporal index nodes (such as time trees and Geohash encodings) to enhance aggregation capabilities. The complete geospatial-temporal objects are stored in the relational database, where table partitioning is employed to improve retrieval efficiency. Unified identifiers and JDBC are used to route geographic entities between the two databases. When users invoke pre-registered spatiotemporal functions in the graph database, a query rewriter transforms the graph queries into SQL statements based on entity identifiers, pushes them down to the relational database for processing, and returns the results to the graph query pipeline. In addition, a two-phase commit protocol ensures data consistency across the heterogeneous databases. [Results] We implemented a prototype system integrating Neo4j and PostGIS and evaluated query and storage efficiency on a multisource spatiotemporal dataset from Shenzhen (taxi trajectories, bike-sharing trajectories, road networks, POIs, and remote sensing imagery). Compared with mainstream graph database systems (e.g., Neo4j and GraphDB), our approach significantly improves performance for geospatial-temporal queries, reducing response times by one to two orders of magnitude in complex computational scenarios and enabling raster computations that native graph databases do not support. With lightweight graph nodes and PostGIS data compression, storage space is reduced by a factor of approximately 3 to 5. Compared with virtual knowledge graph systems (e.g., Ontop), our method shows minimal differences in spatial query performance and storage overhead while achieving notably faster response times for large-scale spatiotemporal queries. [Conclusions] Unlike existing methods, our approach builds materialized spatiotemporal knowledge graphs on top of existing graph databases, improving modeling flexibility and query efficiency for geospatial-temporal data. It also supports user-defined extensions to the geospatial-temporal function library, offering a new framework for managing and analyzing such data efficiently within knowledge graphs.
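To make the consistency step concrete, the following is a minimal in-memory sketch of a two-phase commit across two stores. It is an illustration under stated assumptions, not the prototype's implementation: the `Participant` class, its method names, and `two_phase_write` are hypothetical, and real stores would stage changes inside database transactions rather than Python dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical stand-in for one store (graph side or relational side)."""
    name: str
    fail_on_prepare: bool = False
    committed: dict = field(default_factory=dict)
    staged: dict = field(default_factory=dict)

    def prepare(self, entity_id, payload):
        # Phase 1: stage the write and vote; nothing is visible yet.
        if self.fail_on_prepare:
            return False
        self.staged[entity_id] = payload
        return True

    def commit(self, entity_id):
        # Phase 2 (all voted yes): make the staged write visible.
        self.committed[entity_id] = self.staged.pop(entity_id)

    def rollback(self, entity_id):
        # Phase 2 (someone voted no): discard whatever was staged.
        self.staged.pop(entity_id, None)


def two_phase_write(entity_id, graph_payload, table_payload, graph, relational):
    """Write one entity to both stores, or to neither."""
    plan = [(graph, graph_payload), (relational, table_payload)]
    prepared = []
    for store, payload in plan:
        if store.prepare(entity_id, payload):
            prepared.append(store)
        else:
            # One "no" vote aborts the whole write on every prepared store.
            for done in prepared:
                done.rollback(entity_id)
            return False
    for store, _ in plan:
        store.commit(entity_id)
    return True
```

In this sketch the graph side would receive only the placeholder payload (the mapping ID), while the relational side would receive the full geospatial-temporal object; a failed vote on either side leaves both stores unchanged.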
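…) is linked to the RDBMS geographic entity (…), and an external interface is exposed for the other engines to call. The query conversion engine translates the spatiotemporal query functions pre-registered in the graph database into SQL statements and pushes them down to the RDBMS for execution. The data synchronizer engine relies on the 2PC (Two-Phase Commit) protocol to keep the GDBMS and the RDBMS consistent. This section focuses on these key technologies.

Tab. 1 Attribute mapping table for geospatial-temporal objects

| Identifier attribute | Graph database | Relational database |
|---|---|---|
| EntityID | Unique identifier of the geographic node | Primary key of the entity table (linked to the geographic node's ID) |
| EntityClass | Label of the geographic node | Name of the entity table (linked to the geographic node's label) |
| Time | Time index linked to the geographic node | Time column of the entity table |
| Geohash | Geohash index linked to the geographic node | Geohash column of the entity table |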
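The mapping in Tab. 1 suggests how a graph-side function call could be rewritten into SQL: the node label selects the entity table and the node IDs become a primary-key filter. The sketch below illustrates that idea only. The function name `rewrite_within_window` and the column names `entity_id` and `geom` are assumptions made for illustration, and the `%s` placeholders follow the Python DB-API style rather than the JDBC `?` style that the described system would use. `ST_Within` and `ST_MakeEnvelope` are standard PostGIS functions.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def rewrite_within_window(entity_class, entity_ids, window, srid=4326):
    """Turn a graph-side 'within window' call into (sql, params).

    entity_class : label of the placeholder nodes, used as the table name
    entity_ids   : EntityID values collected by the graph pattern match
    window       : (xmin, ymin, xmax, ymax) in the coordinate system of srid
    """
    # Table names cannot be bound as parameters, so validate the label first.
    if not _IDENTIFIER.match(entity_class):
        raise ValueError(f"unsafe entity class: {entity_class!r}")
    xmin, ymin, xmax, ymax = window
    sql = (
        f'SELECT entity_id FROM "{entity_class}" '
        "WHERE entity_id = ANY(%s) "
        "AND ST_Within(geom, ST_MakeEnvelope(%s, %s, %s, %s, %s))"
    )
    params = (list(entity_ids), xmin, ymin, xmax, ymax, srid)
    return sql, params
```

Only the matching IDs need to travel back to the graph query pipeline, which keeps the geometry itself on the relational side.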
Tab. 2 Comparison of the baseline systems (community activity data as of December 2024)

| Dimension | Neo4j | GraphDB | Ontop |
|---|---|---|---|
| Data model | Graph | Graph, RDF | RDB, RDF |
| Supported data | Point, line, polygon | Point, line, polygon | Point, line, polygon, raster |
| Spatial extension | Neo4j spatial | GeoSPARQL | Ontop-spatial |
| License type | Open source | Commercial | Open source |
| GitHub stars | 13.6 k | - | 675 |
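Tab. 3 Query cases

| No. | Category | Query setting |
|---|---|---|
| Q1 | Points within Polygon | POIs located within rectangular windows of different sizes around Shenzhen Railway Station; window side lengths are 1, 5, and 10 km |
| Q2 | Lines within Polygon | Roads located within rectangular windows of different sizes around Shenzhen Railway Station; window side lengths are 1, 5, and 10 km |
| Q3 | Polygons within Polygon | Areas of interest (polygons) located within rectangular windows of different sizes around Shenzhen Railway Station; window side lengths are 1, 5, and 10 km |
| Q4 | Polygon intersects with points | POIs that intersect rectangular windows of different sizes around Shenzhen Railway Station; window side lengths are 1, 5, and 10 km |
| Q5 | Polygon intersects with lines | Roads that intersect rectangular windows of different sizes around Shenzhen Railway Station; window side lengths are 1, 5, and 10 km |
| Q6 | Line intersection | Bike trajectories that intersect road-network subsets of different sizes; the subsets contain 30, 40, 50, and 60 roads |
| Q7 | Spatiotemporal query (Point) | Taxi GPS points within a 30 km rectangular window around Shenzhen Railway Station; time windows are 10:00-14:00, 08:00-16:00, 06:00-18:00, 04:00-20:00, 02:00-22:00, and 00:00-24:00 |
| Q8 | Spatiotemporal query (Line) | Bike trajectories within a 30 km rectangular window around Shenzhen Railway Station; time windows are 10:00-13:00, 08:00-16:00, 06:00-18:00, 04:00-20:00, 02:00-22:00, and 00:00-24:00 |
| Q9 | Raster value extraction | Mean vegetation cover (NDVI) and elevation of the raster cells inside Lianhuashan Park, Shenzhen |
| Q10 | Raster algebra | Areas in Lianhuashan Park, Shenzhen, with elevation > 500 m, slope > 30°, and vegetation index > 0.6 |
| Q11 | Empty query test | Baseline overhead of cross-database communication and processing, with no data retrieval or computation |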
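The window sequences used in Q1 to Q5 can be generated programmatically. The sketch below is one possible way to do so and rests on several assumptions that the source does not state: that each window is a square centred on the station, that a simple spherical approximation of metres per degree is adequate at this scale, and that the centre coordinate shown is a rough, illustrative value rather than the one used in the experiments.

```python
import math

METERS_PER_DEG_LAT = 111_320.0  # rough spherical approximation


def square_window(lon, lat, side_km):
    """Return (xmin, ymin, xmax, ymax) in degrees for a square centred on (lon, lat)."""
    half_m = side_km * 1000.0 / 2.0
    dlat = half_m / METERS_PER_DEG_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    dlon = half_m / (METERS_PER_DEG_LAT * math.cos(math.radians(lat)))
    return (lon - dlon, lat - dlat, lon + dlon, lat + dlat)


# Assumed, approximate lon/lat for illustration; not taken from the paper.
CENTER = (114.12, 22.53)

# One window per side length in the Q1-Q5 sequence (1, 5, and 10 km).
windows = {side_km: square_window(*CENTER, side_km) for side_km in (1, 5, 10)}
```

Each tuple can be passed directly as the envelope of a spatial predicate; for work that needs metric accuracy, a projected coordinate system would be the safer choice.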
Tab. 4 Stage-wise response time of multiple queries in the GraST system (s)

| Query | Parameter parsing | SQL conversion | Query execution | Result mapping | Total time |
|---|---|---|---|---|---|
| Q1 | 0.002 | 0.029 | 0.069 | 0.194 | 0.294 |
| Q6 | 0.004 | 0.031 | 19.396 | 0.021 | 19.452 |
| Q8 | 0.003 | 0.027 | 0.604 | 0.491 | 1.125 |
| Q9 | 0.002 | 0.033 | 1.094 | 0.004 | 1.133 |
| Q10 | 0.003 | 0.032 | 2.595 | 0.004 | 2.634 |
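The stage times in Tab. 4 can be checked and summarized with a few lines of arithmetic. The sketch below copies the table values, confirms that the four stages add up to the reported totals, and identifies which stage dominates each query. The stage keys and helper names are chosen here for illustration.

```python
STAGES = ("parse", "sql_conversion", "execute", "map_results")

# Seconds per stage followed by the reported total, copied from Tab. 4.
TAB4 = {
    "Q1": (0.002, 0.029, 0.069, 0.194, 0.294),
    "Q6": (0.004, 0.031, 19.396, 0.021, 19.452),
    "Q8": (0.003, 0.027, 0.604, 0.491, 1.125),
    "Q9": (0.002, 0.033, 1.094, 0.004, 1.133),
    "Q10": (0.003, 0.032, 2.595, 0.004, 2.634),
}


def stage_shares(row):
    """Fraction of the total time spent in each stage."""
    *stages, total = row
    return {name: seconds / total for name, seconds in zip(STAGES, stages)}


def dominant_stage(row):
    """Name of the stage with the largest share of the total time."""
    shares = stage_shares(row)
    return max(shares, key=shares.get)


# Largest absolute gap between the summed stages and the reported total.
max_gap = max(abs(sum(row[:-1]) - row[-1]) for row in TAB4.values())
dominant = {query: dominant_stage(row) for query, row in TAB4.items()}
```

On these figures, result mapping accounts for roughly two-thirds of Q1, whereas query execution accounts for more than 99% of Q6, so the fixed cost of parsing and SQL conversion matters mainly for lightweight queries.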
Tab. 5 Storage space consumption of GraST and comparison systems

| System | Vector data /MB | Relative change /% | Raster data /MB | Relative change /% |
|---|---|---|---|---|
| GraST | 1 065.4 | - | 35.4 | - |
| Neo4j | 6 105.5 | 473.1 | - | - |
| GraphDB | 3 975.3 | 273.2 | - | - |
| Ontop | 592.5 | -44.4 | 24.3 | -31.4 |
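The relative-change columns in Tab. 5 appear to be computed against GraST as the baseline, that is, (system − GraST) / GraST × 100. The sketch below recomputes them from the rounded sizes in the table under that assumption; the helper and variable names are illustrative.

```python
def relative_change(value_mb, baseline_mb):
    """Percentage change of value relative to baseline (positive means larger)."""
    return (value_mb - baseline_mb) / baseline_mb * 100.0


# Sizes in MB, copied from Tab. 5.
VECTOR_MB = {"GraST": 1065.4, "Neo4j": 6105.5, "GraphDB": 3975.3, "Ontop": 592.5}
RASTER_MB = {"GraST": 35.4, "Ontop": 24.3}

# Relative changes as reported in Tab. 5, for comparison.
REPORTED_VECTOR = {"Neo4j": 473.1, "GraphDB": 273.2, "Ontop": -44.4}
REPORTED_RASTER = {"Ontop": -31.4}

vector_change = {
    name: relative_change(size, VECTOR_MB["GraST"])
    for name, size in VECTOR_MB.items()
    if name != "GraST"
}
raster_change = {"Ontop": relative_change(RASTER_MB["Ontop"], RASTER_MB["GraST"])}
```

Recomputed from the rounded sizes, the values agree with the reported ones to within 0.1 percentage point; the small residual for GraphDB is presumably due to rounding of the published sizes.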
Conflicts of Interest: All authors declare that they have no conflicts of interest.