Massive Geo-spatial Data Cloud Storage and Services Based on NoSQL Database Technique

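  • Laboratory of Spatial Data Mining and Information Sharing of Ministry of Education, Spatial Information Research Centre of Fujian, Fuzhou University, Fuzhou 350002, China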
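Chen Chongcheng (1968-), male, native of Minqing County, Fujian Province; Ph.D., professor. Research interests: geoscience visualization and virtual geographic environments, spatial data mining, and geographic knowledge services. E-mail: chencc@fzu.edu.cn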

Received date: 2012-11-19

Revised date: 2012-12-31

Online published: 2013-04-18

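Funding

National Key Technology R&D Program of China (2013BAH28F00); Science and Technology Program of Fujian Province (2010I0008, 2010HZ0004-1); EU Seventh Framework Programme international cooperation project (FP7-2009-People-IRSES, No. 247608).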

Abstract (translated from the Chinese)

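In recent years, efficient storage management and online services for massive spatial data have become an increasingly prominent topic in geographic information science. Based on the different characteristics of vector and raster spatial data, this paper proposes and implements a distributed cloud storage management and access service scheme for massive spatial data that integrates vector and raster data, and innovatively introduces the distributed graph database Neo4J and a parallel graph computing framework into the storage and processing of massive vector data. Building on a three-tier spatial data cloud storage architecture, it presents strategies and methods for cloud storage of raster and vector data using NoSQL database technology, and designs a universal data access interface. Raster data are stored in the distributed file system HDFS, with a distributed spatial index built in the column-family database HBase; vector data are stored in the ACID-compliant distributed graph database Neo4J, with an R-tree spatial index. Within the framework of GeoKSCloud, a geographic knowledge cloud platform developed in-house, the core component, the spatial data aggregation center (GeoDAC) software, has been preliminarily implemented to provide distributed spatial data storage management and access services for all kinds of users. A testbed was built to compare the vector read and write performance of GeoDAC with that of the open-source GIS software PostGIS. The results show that although GeoDAC provides no speedup for writes, its read performance far exceeds that of PostGIS. GeoDAC spatially partitions massive data and distributes it across the cluster, so it can process query requests in parallel and greatly increase spatial query speed, giving it broad application prospects.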
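The abstract states that vector data are indexed with an R-tree but does not say how the tree is built. As a minimal, hypothetical sketch, the Python below bulk-loads an R-tree with Sort-Tile-Recursive (STR) packing, one common construction chosen here only for brevity, and answers bounding-box window queries. It is not GeoDAC's or Neo4J's index code; the names `build_rtree` and `search` and the default node capacity of 4 are illustrative assumptions.

```python
from dataclasses import dataclass, field
from math import ceil, sqrt
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def union(boxes: List[BBox]) -> BBox:
    """Smallest axis-aligned box enclosing all given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def intersects(a: BBox, b: BBox) -> bool:
    """True if two boxes overlap; touching edges count as overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    bbox: BBox
    children: List["Node"] = field(default_factory=list)
    feature_id: Optional[str] = None  # set only on leaf entries


def center(n: Node, axis: int) -> float:
    """Center of a node's box on axis 0 (x) or 1 (y)."""
    return (n.bbox[axis] + n.bbox[axis + 2]) / 2


def str_pack(entries: List[Node], capacity: int) -> List[Node]:
    """One STR level: group entries into parents of at most `capacity` children."""
    num_parents = ceil(len(entries) / capacity)
    per_slice = ceil(sqrt(num_parents)) * capacity
    by_x = sorted(entries, key=lambda n: center(n, 0))
    parents = []
    for i in range(0, len(by_x), per_slice):
        # Each vertical slice is sorted by y and cut into runs of `capacity`.
        column = sorted(by_x[i:i + per_slice], key=lambda n: center(n, 1))
        for j in range(0, len(column), capacity):
            group = column[j:j + capacity]
            parents.append(Node(union([g.bbox for g in group]), group))
    return parents


def build_rtree(features: List[Tuple[str, BBox]], capacity: int = 4) -> Node:
    """Bulk-load an R-tree bottom-up from (feature_id, bbox) pairs."""
    if not features or capacity < 2:
        raise ValueError("need at least one feature and capacity >= 2")
    level = [Node(bbox, feature_id=fid) for fid, bbox in features]
    while len(level) > 1:
        level = str_pack(level, capacity)
    return level[0]


def search(node: Node, window: BBox) -> List[str]:
    """Ids of features whose bbox intersects the window; disjoint subtrees are pruned."""
    if not intersects(node.bbox, window):
        return []
    if node.feature_id is not None:
        return [node.feature_id]
    hits: List[str] = []
    for child in node.children:
        hits.extend(search(child, window))
    return hits


if __name__ == "__main__":
    features = [(f"f{i}", (float(i), float(i), i + 0.5, i + 0.5)) for i in range(20)]
    tree = build_rtree(features)
    print(sorted(search(tree, (2.0, 2.0, 4.2, 4.2))))  # ['f2', 'f3', 'f4']
```

Bulk loading suits a store that ingests large, mostly static vector datasets; a system accepting frequent single-feature inserts would more likely use incremental R-tree insertion with node splitting.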

Cite this article

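Chen Chongcheng, Lin Jianfeng, Wu Xiaozhu, Wu Jianwei, Lian Huiqun. Massive Geo-spatial Data Cloud Storage and Services Based on NoSQL Database Technique[J]. Journal of Geo-information Science, 2013, 15(2): 166-174. DOI: 10.3724/SP.J.1047.2013.00166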

Abstract

In recent years, with the explosive growth of Earth Observation System (EOS) data and the flourishing of the new geography paradigm, efficient storage management of massive geo-spatial data, and web services built on it for a broad variety of users, have become an increasingly hot issue in geographical information science. In light of the intrinsic differences between vector and raster data, this paper proposes a cloud storage system that provides distributed, cloud-enabled storage management and services for massive geo-spatial data, integrating both vector and raster formats. Based on a three-tier architecture, we present strategies and methods for NoSQL-based cloud storage management of raster and vector data respectively, followed by a universal data access interface. Novel technologies, namely the distributed graph database Neo4J and a parallel graph computing framework, are introduced for the storage and processing of massive vector data. Massive raster data are stored in the distributed file system HDFS, with a distributed spatial index built in the column-family database HBase, while massive vector data are stored in the ACID-compliant distributed graph database Neo4J with an R-tree spatial index. Under the unified framework of the Geographical Knowledge Cloud platform GeoKSCloud, developed by our research group as a successor of GeoKSGrid, the core component, the spatial data aggregation centre (GeoDAC) software, has been preliminarily implemented to provide distributed spatial data storage management and access services for all types of end users. A testbed was established with 5 physical nodes and, correspondingly, 7 virtual nodes located in different areas and running different operating systems. We carried out a detailed comparison between GeoDAC and the open-source GIS software PostGIS to evaluate vector data read and write performance.
Preliminary results indicate that, although GeoDAC offers no write speedup over PostGIS, its read (spatial query) performance is significantly higher. Within GeoDAC, massive data are spatially partitioned and distributed across the cluster, and spatial queries are executed in parallel, which increases spatial query speed. The techniques and system developed in this work give a wide range of users a powerful tool for further in-depth processing and have broad application prospects.
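The abstract attributes GeoDAC's read advantage to partitioning data spatially across the cluster and executing queries in parallel. The sketch below illustrates that scatter-gather pattern in a single process, under assumptions that are not from the paper: a fixed square grid stands in for its (unspecified) spatial partitioning, each grid cell stands in for one cluster node, and a thread pool stands in for nodes working concurrently. All function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Point = Tuple[str, float, float]          # (feature_id, x, y)
BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Cell = Tuple[int, int]                    # grid cell index (column, row)


def cell_of(x: float, y: float, cell_size: float) -> Cell:
    """Grid cell (partition) that owns a coordinate."""
    return (int(x // cell_size), int(y // cell_size))


def partition(points: List[Point], cell_size: float) -> Dict[Cell, List[Point]]:
    """Spatially partition points into grid cells; each cell stands in for one node."""
    parts: Dict[Cell, List[Point]] = defaultdict(list)
    for fid, x, y in points:
        parts[cell_of(x, y, cell_size)].append((fid, x, y))
    return dict(parts)


def cells_overlapping(window: BBox, cell_size: float) -> List[Cell]:
    """Cells whose extent overlaps the query window; all other partitions are skipped."""
    c0, r0 = cell_of(window[0], window[1], cell_size)
    c1, r1 = cell_of(window[2], window[3], cell_size)
    return [(c, r) for c in range(c0, c1 + 1) for r in range(r0, r1 + 1)]


def scan_partition(points: List[Point], window: BBox) -> List[str]:
    """Local filter executed independently on one partition."""
    return [fid for fid, x, y in points
            if window[0] <= x <= window[2] and window[1] <= y <= window[3]]


def parallel_range_query(parts: Dict[Cell, List[Point]], window: BBox,
                         cell_size: float, workers: int = 4) -> List[str]:
    """Scatter the query to overlapping partitions, scan them concurrently, gather ids."""
    targets = [parts[c] for c in cells_overlapping(window, cell_size) if c in parts]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = list(pool.map(scan_partition, targets, [window] * len(targets)))
    return sorted(fid for chunk in chunks for fid in chunk)


if __name__ == "__main__":
    pts = [(f"p{i}", i % 10 + 0.5, i // 10 + 0.5) for i in range(100)]
    parts = partition(pts, cell_size=2.5)
    print(parallel_range_query(parts, (1.0, 1.0, 3.0, 2.0), cell_size=2.5))  # ['p11', 'p12']
```

The gain comes from two effects: partitions outside the query window are never touched, and the remaining scans run independently. In CPython a thread pool will not speed up this CPU-bound filter; in a real cluster each scan would run on the node holding that partition, typically against a local index such as an R-tree.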

