
Design and Experiment of a Parallel Spatial Analysis System Based on MySQL Cluster and MPI

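  • 1. State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China;
    2. University of Chinese Academy of Sciences, Beijing 100049, China
Zhou Yuke (born 1984), male, PhD candidate, works on high-performance spatial analysis. E-mail: zyk@lreis.ac.cn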

Received date: 2012-03-17

  Revised date: 2012-07-26

  Online published: 2012-08-22

Funding

National Key Technology R&D Program of China (2011BAH06B03, 2011BAH24B10); National Natural Science Foundation of China (40830529, 41171307).

Design and Implementation of a Parallel Spatial Analysis System Based on MySQL and MPI

  • 1. State Key Laboratory of Resources and Environmental Information System (LREIS), Institute of Geographic Sciences and Natural Resources Research, CAS, Beijing 100101, China;
    2. University of Chinese Academy of Sciences, Beijing 100049, China

Received date: 2012-03-17

  Revised date: 2012-07-26

  Online published: 2012-08-22


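Abstract (Chinese version)

GIS applications face the twin challenges of ever-growing spatial data volumes and increasingly complex spatial analysis algorithms. This paper proposes a distributed spatial analysis framework built on a MySQL spatial database cluster and the MPI parallel computing library. The framework uses the MySQL spatial database cluster to store and manage large volumes of spatial data, relies on the Replication mechanism of MySQL Spatial to strengthen redundant backup and concurrent access control for spatial data, and uses MPI for communication between distributed computing nodes, which reduces the development cost of hand-coded communication. Task management and scheduling in the parallel framework use a priority queue: a Master node monitors the cluster state and distributes computing tasks so as to achieve load balancing and fault tolerance. Finally, taking the polygon Overlay algorithm as an example, the paper studies its parallelization strategy within this parallel spatial analysis system and runs tests in the framework using a data-parallel pipeline mode of operation. The results show that the parallel framework achieves a reliable speedup over the serial algorithm.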
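The priority-queue scheduling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class `MasterScheduler`, the `cost` estimate, and the "least-loaded node" rule are assumptions made for illustration, and plain Python objects stand in for the MPI processes of the real system.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class Task:
    priority: int                      # lower value is dispatched first
    seq: int                           # tie-breaker: submission order
    name: str = field(compare=False)
    cost: int = field(compare=False)   # hypothetical cost estimate


class MasterScheduler:
    """Hypothetical master node: queues tasks and tracks per-node load."""

    def __init__(self, workers):
        self._tasks = []               # heap ordered by (priority, seq)
        self._seq = count()
        self.load = {w: 0 for w in workers}
        self.assigned = {w: [] for w in workers}

    def submit(self, name, priority, cost):
        heapq.heappush(self._tasks, Task(priority, next(self._seq), name, cost))

    def mark_failed(self, worker):
        """Drop a failed node and requeue its tasks (fault tolerance)."""
        for task in self.assigned.pop(worker):
            heapq.heappush(self._tasks, task)
        del self.load[worker]

    def dispatch(self):
        """Assign queued tasks, best priority first, to the least-loaded live node."""
        while self._tasks:
            task = heapq.heappop(self._tasks)
            worker = min(self.load, key=self.load.get)
            self.load[worker] += task.cost
            self.assigned[worker].append(task)
        return {w: [t.name for t in ts] for w, ts in self.assigned.items()}
```

Calling `submit` for each task and then `dispatch` yields a node-to-task plan; after `mark_failed("node2")`, a second `dispatch` moves that node's tasks onto the surviving nodes.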

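Cite this article

Zhou Yuke, Ma Ting, Zhou Chenghu, Gao Xizhang, Fan Junfu. Design and experiment of a parallel spatial analysis system based on MySQL cluster and MPI[J]. Journal of Geo-information Science, 2012, 14(4): 448-453. DOI: 10.3724/SP.J.1047.2012.00448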

Abstract

With the rapid development of space survey technology, GIS faces the challenge of fast-growing spatial data volumes and increasingly complex spatial analysis algorithms. Traditional serial spatial analysis methods cannot cope well with this situation. High-performance computers and new computing methods offer an innovative approach to spatial data processing and analysis. Remote sensing data processing is data-intensive and an ideal domain for parallel computing, whereas vector data operations are compute-intensive and demand more computing power. This paper describes a distributed spatial analysis framework based on MySQL Spatial and MPI, and explores a cluster-based way of processing spatial vector data in parallel. The framework uses a MySQL spatial cluster to store and manage GIS data, which addresses fault tolerance and concurrent access to the same data block. MPI handles message passing between distributed network nodes, so communication between nodes does not have to be controlled manually. Task management and distribution use a priority queue and achieve load balancing and fault tolerance by monitoring the status of the cluster. Finally, a parallel polygon overlay operation is run on this distributed system to test the performance of the cluster. The parallel Overlay strategy works as a pipeline in which each node receives a subset of the polygons in the overlaid layers. This method achieved a relatively good speedup over the serial overlay operation.
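The data-parallel decomposition mentioned above, where each node receives a subset of the polygons, can be sketched as follows. This is an illustration under stated assumptions rather than the paper's code: axis-aligned boxes `(xmin, ymin, xmax, ymax)` stand in for general polygons, Python threads stand in for MPI ranks, and the function names are hypothetical. The sketch shows only that the partitioned result matches the serial one; it is not meant to demonstrate any speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def intersect(a, b):
    """Intersection of two boxes (xmin, ymin, xmax, ymax), or None if disjoint."""
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    if xmin < xmax and ymin < ymax:
        return (xmin, ymin, xmax, ymax)
    return None


def overlay(layer_a, layer_b):
    """Serial overlay: every non-empty pairwise intersection of the two layers."""
    out = []
    for a in layer_a:
        for b in layer_b:
            piece = intersect(a, b)
            if piece is not None:
                out.append(piece)
    return out


def split(items, n_parts):
    """Split items into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def parallel_overlay(layer_a, layer_b, n_workers):
    """Each worker overlays its own chunk of layer_a against all of layer_b."""
    chunks = split(layer_a, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves chunk order, so the merged output matches overlay().
        partials = pool.map(lambda chunk: overlay(chunk, layer_b), chunks)
        return [piece for part in partials for piece in part]
```

Partitioning only one layer keeps the workers independent, so no messages are needed until the partial results are merged; in an MPI setting the merge step would correspond to a gather on the master node.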

