Journal of Geo-information Science >
Research of the Parallel Spatial Database Proto System Based on MPP Architecture
Received date: 2015-03-12
Request revised date: 2015-05-14
Online published: 2016-02-04
Copyright
The efficiency for querying complex spatial information resources is an important indicator to evaluate the performance of current spatial databases. Traditional single node relation spatial data management is difficult to meet the demand of high-performance in querying large amounts of spatial data, especially for the complex join query on multi-table. In order to solve this problem, we design and implement a spatial database middleware prototype system. This system takes full advantages of the massive parallel processing (MPP) and shared-nothing architecture. In consideration of the characteristics of spatial data, we design the spatial data parallel import, multi-spatial-tables join strategy, spatial data query optimization and other algorithms and models. This paper firstly introduces the development status of parallel database systems in recent years, and then elaborates its MPP architecture and its organizational model, and the strategy of the join query on multi-spatial-table. Finally, we made some query experiments on massive spatial data and analyzed the results of these inquiries. The experimental results show that this system indicates a good performance (nearly linear speedup) in processing the complex query of massive spatial data. Compared with the tradition single node database, this system can fully improve the efficiency of complex querying for large spatial data, and it is a more efficient solution to solve the complex spatial data queries.
Key words: MPP; spatial database; parallel processing; shared-nothing
CHEN Dalun , CHEN Rongguo , XIE Jiong . Research of the Parallel Spatial Database Proto System Based on MPP Architecture[J]. Journal of Geo-information Science, 2016 , 18(2) : 151 -159 . DOI: 10.3724/SP.J.1047.2016.00151
Fig. 1 The schematic diagram of the partition and import of spatial data图1 空间数据划分与导入示意图 |
Fig. 2 The schematic diagram indicating the joining of multi-tables with partitioned column图2 分区列多表并行连接 |
Tab. 1 Algorithm for joining un-partitioned multi-tables表1 非分区多表并行连接算法 |
算法:非分区列多表并行连接 |
---|
步骤1 select A.name, A.shape from A For i=1 to Count_node(从节点数) 将表A的查询的投影列(A.name)、连接列(几何列A.shape)发送至协调者。其中,空间列以二进制BLOB对象的形式传输 end for 步骤2 create temp table tmp1(name char(20),shape st_geometry ) 协调者节点将结果集打包,生成临时结果集TDR For i=1 to Count_node(从节点数) 在各个节点上创建临时表TempA*,以及临时空间索引批量插入临时结果集TDR到各个节点的临时表 end for 步骤3 select ta.name, B.name,st_astext( ta.shape) from TempA* as ta,B where st_within(ta.shape = B.shape)=1 For i= 1 to P do(并行地) 对于各个节点,将临时表TempA*与表B进行连接操作,将结果集发送到协调者节点上 对于空间列进行st_astext数据类型转换 end for 协调者归并结果集 步骤4 协调者执行查询计划中的下一步骤(归并数据进行下一个表的连接或格式化输出至客户端) |
Fig. 3 The schematic diagram indicating the joining of multi-tables with un-partitioned spatial column图3 非分区列空间多表并行连接示意图 |
Fig. 4 The GDMC system architecture图4 GDMC系统架构图 |
Fig. 5 The diagram of query processor and multi-nodes executor图5 查询处理器与多节点执行器示例图 |
Tab. 2 System hardware specifications表2 系统的硬件配置 |
节点名 | 硬件配置 | 操作系统/JDK版本 | 数量/台 | 设备用途 |
---|---|---|---|---|
协调节点 | CPU:I7-3610 QM 内存:8 GB | CentOS 6.4/ JDK 1.7.0 | 1 | 管理后端支持节点,解析SQL语句,制定查询计划,归并查询结果 |
后端从节点 | CPU:I3-3220 内存:4 GB | CentOS 6.4/ JDK 1.7.0 | 4 | 提供数据库计算功能,返回子查询结果集 |
物理元数据库节点 | CPU:I5-2400 内存: 6 GB | CentOS 6.4/ JDK 1.7.0 | 1 | 提供数据库元数据信息。包括表、视图、约束等元数据 |
客户端节点 | CPU:I7-3610 QM 内存: 8 GB | CentOS 6.4/ JDK 1.7.0 | 1 | 输入SQL查询语句,输出查询结果 |
Fig. 6 The diagram of GDMC system’s logical networking图6 GDMC系统的逻辑组网 |
Fig. 7 The diagram of the parallel selection query experiment of GDMC system图7 系统的并行选择查询实验结果图(四节点) |
Fig. 8 The result diagram of the experiment for querying data in different sizes of query windows图8 统计查询不同大小窗口内的记录条数实验结果图 |
Fig. 9 The result diagram of the experiment for querying all the spatial objects in different sizes of query windows图9 系统查询遍历窗口中所有空间实体的实验结果图 |
Tab. 3 Statistics showing the results for querying the number of records that belong to a specified region表3 查询统计属于某地区的记录行数 |
结果集记录数 / 条 | 单节点萍执行平均时间 / s | 四节点执行平均时间 / s |
---|---|---|
130 000 | 383.312 | 100.269 |
Tab. 4 The result for querying the total area of bare soil in a specified region表4 查询某地区土地利用类型为裸土的总面积 |
单节点执行平均时间 / s | 四节点执行平均时间 / s |
---|---|
307.3 | 56.32 |
The authors have declared that no competing interests exist.
[1] |
[
|
[2] |
[
|
[3] |
|
[4] |
[
|
[5] |
[
|
[6] |
[
|
[7] |
[
|
[8] |
[
|
[9] |
[
|
[10] |
[
|
[11] |
[
|
[12] |
[
|
[13] |
[
|
[14] |
[
|
[15] |
[
|
[16] |
[
|
[17] |
[
|
[18] |
|
[19] |
|
[20] |
[
|
[21] |
|
[22] |
[
|
[23] |
[
|
/
〈 |
|
〉 |