Journal of Geo-information Science, 2016, Vol. 18, Issue (2): 151-159. doi: 10.3724/SP.J.1047.2016.00151

Design and Implementation of a Parallel Spatial Database Prototype System Based on MPP Architecture

CHEN Dalun1,2, CHEN Rongguo1,**, XIE Jiong1

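  1. State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101
  2. University of Chinese Academy of Sciences, Beijing 100049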
  • Received: 2015-03-12 Revised: 2015-05-14 Online: 2016-02-10 Published: 2016-02-04
  • Corresponding author: CHEN Rongguo
  • About the author: CHEN Dalun (b. 1990), male, master's student, whose research focuses on the parallelization of spatial databases. E-mail: Xiaoking31@126.com

  • Funding: National High Technology Research and Development Program of China ("863" Program) (2013AA12A204, 2013AA122302)

Research of the Parallel Spatial Database Proto System Based on MPP Architecture

CHEN Dalun1,2, CHEN Rongguo1,*, XIE Jiong1   

  1. State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, CAS, Beijing 100101, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2015-03-12 Revised: 2015-05-14 Online: 2016-02-10 Published: 2016-02-04
  • Contact: CHEN Rongguo

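Abstract:

Fast and efficient information retrieval is one of the key indicators of the performance of current spatial databases. Traditional single-node relational spatial data management struggles to meet the demands of querying large volumes of spatial data, particularly high-performance complex spatial multi-table joins. To address this, this paper designs and implements a parallel spatial database middleware prototype system based on the Massively Parallel Processing (MPP) architecture. The system exploits the advantages of the shared-nothing architecture and, in view of the characteristics of spatial data, provides algorithms and models for parallel spatial data partitioning and import, parallel spatial multi-table joins, and spatial query optimization. The paper first reviews the recent development of parallel database systems, then describes the query planning algorithm and system architecture of the MPP-based parallel spatial database middleware, and finally reports query experiments on large-scale datasets together with an analysis of the results. The experiments show that, when processing large-scale data, the system achieves near-linear speedup; compared with a traditional single-node database, it substantially improves the performance of complex queries over massive spatial data and addresses the problem of parallelizing spatial databases for massive data.

Keywords: MPP, spatial database, parallel, shared-nothing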

Abstract:

Query efficiency over complex spatial information resources is an important indicator of the performance of current spatial databases. Traditional single-node relational spatial data management struggles to meet the demand for high-performance querying of large volumes of spatial data, especially for complex multi-table join queries. To address this problem, we design and implement a spatial database middleware prototype system that takes full advantage of the massively parallel processing (MPP), shared-nothing architecture. Taking the characteristics of spatial data into account, we design algorithms and models for parallel spatial data import, multi-table spatial joins, and spatial query optimization, among others. This paper first introduces the development status of parallel database systems in recent years, then describes the system's MPP architecture, its organizational model, and its strategy for join queries over multiple spatial tables. Finally, we run query experiments on massive spatial data and analyze the results. The experimental results show that the system performs well (nearly linear speedup) when processing complex queries over massive spatial data. Compared with a traditional single-node database, it substantially improves the efficiency of complex queries over large spatial datasets and offers a more efficient solution for complex spatial data queries.

Key words: MPP, spatial database, parallel processing, shared-nothing
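
The "nearly linear speedup" claimed in the abstract can be read against the standard definitions of speedup, S(n) = T(1) / T(n), and parallel efficiency, E(n) = S(n) / n. The following is a minimal sketch of that arithmetic only. The function names are hypothetical and the timings in the usage lines are invented for illustration; none of them come from the paper's experiments.

```python
def speedup(t_single: float, t_parallel: float) -> float:
    """Return S(n) = T(1) / T(n) for the same query workload."""
    if t_single <= 0 or t_parallel <= 0:
        raise ValueError("timings must be positive")
    return t_single / t_parallel


def parallel_efficiency(t_single: float, t_parallel: float, nodes: int) -> float:
    """Return E(n) = S(n) / n; a value near 1.0 means near-linear speedup."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return speedup(t_single, t_parallel) / nodes


if __name__ == "__main__":
    # Hypothetical timings in seconds, chosen only to illustrate the arithmetic.
    print(speedup(120.0, 30.0))
    print(parallel_efficiency(120.0, 30.0, 4))
```

An efficiency that stays close to 1.0 as nodes are added is what "near-linear" usually means; a shared-nothing cluster typically falls below that once data redistribution for joins starts to dominate.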