Journal of Geo-information Science ›› 2013, Vol. 15 ›› Issue (2): 166-174. doi: 10.3724/SP.J.1047.2013.00166

Cloud Storage and Service Method for Massive Spatial Data Based on NoSQL

陈崇成, 林剑峰, 吴小竹, 巫建伟, 连惠群   

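  1. Key Laboratory of Spatial Data Mining and Information Sharing of Ministry of Education, Spatial Information Research Centre of Fujian, Fuzhou University, Fuzhou 350002
  • Received: 2012-11-19 Revised: 2012-12-31 Online: 2013-04-25 Published: 2013-04-18
  • About the author: CHEN Chongcheng (born 1968), male, from Minqing County, Fujian, Ph.D., professor. His research interests are geoscientific visualization and virtual geographic environments, spatial data mining, and geographic knowledge services. E-mail: chencc@fzu.edu.cn
  • Supported by:

    National Key Technology R&D Program of China (2013BAH28F00); Science and Technology Program of Fujian Province (2010I0008, 2010HZ0004-1); European Union Seventh Framework Programme international cooperation project (FP7-2009-People-IRSES, No. 247608).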

Massive Geo-spatial Data Cloud Storage and Services Based on NoSQL Database Technique

CHEN Chongcheng, LIN Jianfeng, WU Xiaozhu, WU Jianwei, LIAN Huiqun   

  1. Laboratory of Spatial Data Mining and Information Sharing of Ministry of Education, Spatial Information Research Centre of Fujian, Fuzhou University, Fuzhou 350002, China
  • Received: 2012-11-19 Revised: 2012-12-31 Online: 2013-04-25 Published: 2013-04-18

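Abstract (Chinese version):

In recent years, efficient storage management and online services for massive spatial data have become a topic of growing concern in geo-information science. Based on the differing characteristics of vector and raster spatial data, this paper proposes and implements a distributed cloud storage management and access service scheme for massive spatial data that integrates vector and raster data, and introduces the distributed graph database Neo4J and a parallel graph computing framework into the storage and processing of massive vector data. On the basis of a three-tier spatial data cloud storage architecture, implementation strategies and methods for cloud storage of raster and vector data using NoSQL database technology are given, and a general data access interface is designed. Raster data are stored in the distributed file system HDFS, with a distributed spatial index built on them in the column-family database HBase; vector data are stored in the ACID-compliant distributed graph database Neo4J, with an R-tree spatial index. Within the framework of the self-developed geographical knowledge cloud platform GeoKSCloud, a preliminary implementation of its core component, the spatial data aggregation center (GeoDAC) software, provides distributed spatial data storage management and access services to all kinds of users. A testbed was built to compare GeoDAC with the open-source GIS software PostGIS in vector data read and write performance. The results show that, although GeoDAC gained no speedup in write performance, its read performance far exceeds that of PostGIS. GeoDAC spatially partitions massive data and distributes it across the cluster, so query requests can be processed in parallel, which greatly increases spatial query speed and gives the system broad application prospects.

Keywords (Chinese version): geographical knowledge cloud, data aggregation center, cloud storage, spatial data, NoSQL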

Abstract:

In recent years, with the explosive growth of Earth Observation System (EOS) data and the flourishing of the new geography paradigm, how to implement efficient storage management for massive geo-spatial data and, further, web services for a broad variety of users has become an increasingly hot issue in geographical information science. In light of the intrinsic differences between vector and raster formats, this paper proposes a cloud storage system that provides distributed, cloud-enabled storage management and services for massive geo-spatial data, integrating both formats. Based on a three-tier architecture, we put forward implementation strategies and methods for cloud storage management of raster and vector data, respectively, using NoSQL database systems, followed by a universal data access interface. Novel technologies, namely the distributed graph database Neo4J and a parallel graph computing framework, are introduced for massive vector data storage and processing. In our research, the distributed file system HDFS serves as a container for massive raster data, with the column-family database HBase providing a distributed spatial index, and the distributed graph database Neo4J, which satisfies ACID constraints, stores massive vector data with an R-tree spatial index. Under the unified framework of the Geographical Knowledge Cloud platform GeoKSCloud, developed by our research group, its core component, the spatial data aggregation centre (GeoDAC) software, has taken shape, with the aim of providing distributed spatial data storage management and access services for all types of end users. A testbed was established with 5 physical nodes and 7 virtual nodes, located in different areas and running different operating systems. We carried out a detailed comparison between GeoDAC and the open-source GIS software PostGIS to evaluate vector data read and write performance.
The preliminary results indicate that, although GeoDAC shows no write speedup over PostGIS, it achieves significantly better read and spatial query performance. Inside GeoDAC, massive data is spatially partitioned and distributed across the cluster, and spatial queries are executed in parallel, which increases spatial query speed. The techniques and system developed in this work offer a variety of users a powerful tool for further in-depth processing and have broad application prospects.

Key words: Geographical Knowledge Cloud, data aggregation, geo-spatial data, access service, vector data, cloud-enabled data storage, NoSQL
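
The abstract attributes GeoDAC's read speedup to spatially partitioning the data across the cluster and answering queries in parallel. The Python sketch below illustrates that general idea only; it is not the authors' implementation. It assumes a uniform grid partition (the paper uses an R-tree index on Neo4J), point features given as `(id, x, y)` tuples, and a thread pool standing in for cluster nodes. The names `CELL`, `cell_of`, `partition`, `query_partition` and `parallel_bbox_query` are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

CELL = 10.0  # hypothetical grid cell size, in coordinate units


def cell_of(x, y, cell=CELL):
    """Return the integer grid cell that contains the coordinate (x, y)."""
    return (int(x // cell), int(y // cell))


def partition(features, cell=CELL):
    """Group point features (id, x, y) by grid cell.

    Each cell stands in for the data held by one cluster node (an assumption).
    """
    parts = defaultdict(list)
    for fid, x, y in features:
        parts[cell_of(x, y, cell)].append((fid, x, y))
    return dict(parts)


def query_partition(rows, bbox):
    """Scan one partition and return the ids of points inside bbox."""
    xmin, ymin, xmax, ymax = bbox
    return [fid for fid, x, y in rows
            if xmin <= x <= xmax and ymin <= y <= ymax]


def parallel_bbox_query(parts, bbox, cell=CELL, workers=4):
    """Run a bounding-box query over only the partitions that overlap bbox."""
    xmin, ymin, xmax, ymax = bbox
    cx0, cy0 = cell_of(xmin, ymin, cell)
    cx1, cy1 = cell_of(xmax, ymax, cell)
    # Prune partitions whose cell cannot intersect the query box.
    hit = [rows for (cx, cy), rows in parts.items()
           if cx0 <= cx <= cx1 and cy0 <= cy <= cy1]
    # Scan the remaining partitions concurrently and merge the results.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda rows: query_partition(rows, bbox), hit))
    return sorted(fid for part in results for fid in part)


features = [(1, 1.0, 1.0), (2, 15.0, 5.0), (3, 25.0, 25.0), (4, -3.0, 2.0)]
parts = partition(features)
print(parallel_bbox_query(parts, (0.0, 0.0, 20.0, 10.0)))
```

In a real deployment each partition would live on a separate node, so the gain comes from both pruning partitions that cannot match and scanning the rest at the same time. Writes gain nothing from this arrangement, which is consistent with the reported lack of write speedup.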