ARTICLES

Massive Geo-spatial Data Cloud Storage and Services Based on NoSQL Database Technique

  • Laboratory of Spatial Data Mining and Information Sharing of Ministry of Education, Spatial Information Research Centre of Fujian, Fuzhou University, Fuzhou 350002, China

Received date: 2012-11-19

  Revised date: 2012-12-31

  Online published: 2013-04-18

Abstract

In recent years, with the explosive growth of Earth Observation System (EOS) data and the flourishing of the new geography paradigm, how to store and manage massive geo-spatial data efficiently and then serve it over the web to a broad variety of users has become an increasingly hot issue in geographical information science. This paper proposes a cloud storage system that provides distributed, cloud-enabled storage management and services for massive geo-spatial data, integrating both vector and raster formats while taking account of their intrinsic differences. Based on a three-tier architecture, we put forward implementation strategies and methods for cloud storage management of raster and vector data respectively, both built on NoSQL database systems, followed by a universal data access interface. Novel technologies are introduced for massive vector data storage and processing, including the distributed graph database Neo4J and a parallel graph computing framework. In our research, the distributed file system HDFS and the column-family database HBase serve as the container for massive raster data, with a distributed spatial index technique; the distributed graph database Neo4J stores massive vector data, in view of ACID constraints, with an R-tree spatial index. Under the unified framework of the Geographical Knowledge Cloud platform GeoKSCloud, developed by our research group as a successor of GeoKSGrid, the core component, the spatial data aggregation centre (GeoDAC) software, has taken shape, with the aim of providing distributed spatial data storage management and access services for all types of end users. A testbed was established with 5 physical nodes and 7 virtual nodes located in different areas and running different operating systems. We carried out a detailed comparison between GeoDAC and the open source GIS software PostGIS to validate vector data reading and writing performance.
The preliminary results indicate that, although GeoDAC offers no write speed-up over PostGIS, it achieves significantly better reading and spatial query performance. Inside GeoDAC, space-partitioned massive data is distributed across the cluster and spatial queries are executed in parallel, which yields a higher spatial query rate. The techniques and system developed in this work will give a variety of users a powerful tool for further in-depth processing and have broad application prospects.
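
The idea of space-partitioned data queried in parallel can be illustrated with a minimal Python sketch. It is not GeoDAC's implementation: it assumes a uniform grid in place of the R-tree index named in the abstract, uses in-memory lists to stand in for cluster nodes, and all names (`CELL`, `partition`, `scan`, `range_query`) are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

CELL = 10.0  # assumed grid cell size, in coordinate units


def cell_of(x, y, size=CELL):
    """Return the grid cell (column, row) that contains point (x, y)."""
    return (int(x // size), int(y // size))


def partition(points, size=CELL):
    """Group (id, x, y) records by grid cell; each cell stands in for one node."""
    parts = defaultdict(list)
    for pid, x, y in points:
        parts[cell_of(x, y, size)].append((pid, x, y))
    return parts


def scan(records, box):
    """Filter one partition against the query box (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    return [pid for pid, x, y in records
            if xmin <= x <= xmax and ymin <= y <= ymax]


def range_query(parts, box, size=CELL, workers=4):
    """Scan only the cells overlapping the box, in parallel, and merge the hits."""
    c0, r0 = cell_of(box[0], box[1], size)
    c1, r1 = cell_of(box[2], box[3], size)
    cells = [(c, r)
             for c in range(c0, c1 + 1)
             for r in range(r0, r1 + 1)
             if (c, r) in parts]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = pool.map(lambda cell: scan(parts[cell], box), cells)
    return sorted(pid for chunk in hits for pid in chunk)


# Toy data: (id, x, y) points spread over several grid cells.
points = [(1, 2.0, 3.0), (2, 12.0, 4.0), (3, 25.0, 25.0), (4, 9.5, 9.5)]
parts = partition(points)
result = range_query(parts, (0.0, 0.0, 15.0, 10.0))
```

The query touches only the partitions whose cells overlap the query box, and each partition is scanned independently, which is the property that lets a read-heavy workload scale across nodes while writes gain nothing from the partitioning.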

Cite this article

CHEN Chong-Cheng, LIN Jian-Feng, TUN Xiao-Zhu, WU Jian-Wei, LIAN Hui-Qun. Massive Geo-spatial Data Cloud Storage and Services Based on NoSQL Database Technique[J]. Journal of Geo-information Science, 2013, 15(2): 166-174. DOI: 10.3724/SP.J.1047.2013.00166

