Journal of Geo-information Science ›› 2012, Vol. 14 ›› Issue (4): 448-453. doi: 10.3724/SP.J.1047.2012.00448

Design and Experiments of a Parallel Spatial Analysis System Based on a MySQL Cluster and MPI

ZHOU Yuke1,2, MA Ting1, ZHOU Chenghu1, GAO Xizhang1, FAN Junfu1,2

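  1. State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101;
  2. University of Chinese Academy of Sciences, Beijing 100049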
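  • Received: 2012-03-17 Revised: 2012-07-26 Online: 2012-08-25 Published: 2012-08-22
  • About the author: ZHOU Yuke (born 1984), male, PhD candidate, working on high-performance spatial analysis. E-mail: zyk@lreis.ac.cn
  • Supported by:

    National Key Technology R&D Program of China (2011BAH06B03, 2011BAH24B10); National Natural Science Foundation of China (40830529, 41171307).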

Design and Implement of Parallel Spatial Analysis System Based on MySQL & MPI

ZHOU Yuke1,2, MA Ting1, ZHOU Chenghu1, GAO Xizhang1, FAN Junfu1,2

  1. State Key Laboratory of Resources and Environmental Information System (LREIS), Institute of Geographic Sciences and Natural Resources Research, CAS, Beijing 100101, China;
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2012-03-17 Revised: 2012-07-26 Online: 2012-08-25 Published: 2012-08-22

Abstract (Chinese version):

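GIS applications face the challenge of ever-growing spatial data volumes and increasingly complex spatial analysis algorithms. This paper proposes a distributed spatial analysis framework built on a MySQL spatial database cluster and the MPI parallel computing library. The framework uses the MySQL spatial database cluster to store and manage large volumes of spatial data, relies on the Replication mechanism of MySQL Spatial to strengthen redundant backup and concurrent access control of spatial data, and uses MPI to handle communication between distributed compute nodes, which reduces the development cost of hand-coded communication. The framework's task management and scheduling system is based on a priority queue: a Master node monitors the state of the cluster and distributes compute tasks so as to achieve load balancing and fault tolerance. Finally, taking the polygon Overlay algorithm as an example, the paper studies its parallelization strategy within this parallel spatial analysis system and runs tests in the framework using a data-parallel pipeline mode. The results show that the parallel framework achieves a reliable speedup over the serial algorithm.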

Key words: MPI, parallel GIS, parallel spatial database, MySQL cluster, overlay analysis
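
The priority-queue scheduling with a monitoring Master node described in the abstract can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation: the `Master` class, the `Task` fields, the cost-based load measure, and the node names are all hypothetical, and the actual node-to-node messaging (MPI in the paper) is left out.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class Task:
    """A unit of work; a lower priority value is served first (assumption)."""
    priority: int
    seq: int                                  # tie-breaker: submission order
    name: str = field(compare=False)
    cost: int = field(compare=False, default=1)


class Master:
    """Hypothetical master node: queues tasks and assigns them to workers."""

    def __init__(self, workers):
        self.queue = []                           # heap of pending tasks
        self.load = {w: 0 for w in workers}       # accumulated cost per live worker
        self.assigned = {w: [] for w in workers}  # tasks handed to each worker
        self._seq = count()

    def submit(self, name, priority, cost=1):
        heapq.heappush(self.queue, Task(priority, next(self._seq), name, cost))

    def dispatch(self):
        """Give each pending task, in priority order, to the least-loaded worker."""
        while self.queue and self.load:
            task = heapq.heappop(self.queue)
            worker = min(self.load, key=lambda w: (self.load[w], w))
            self.load[worker] += task.cost
            self.assigned[worker].append(task)

    def mark_failed(self, worker):
        """Fault tolerance: requeue a failed worker's tasks and redistribute them."""
        for task in self.assigned.pop(worker, []):
            heapq.heappush(self.queue, task)
        self.load.pop(worker, None)
        self.dispatch()
```

In this sketch, load balancing is approximated by always choosing the worker with the smallest accumulated cost, and fault tolerance by pushing a failed worker's tasks back onto the queue. How the paper's Master node measures load or detects failures is not stated in the abstract.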

Abstract:

With the rapid development of space survey technology, GIS faces the challenge of rapidly growing spatial data volumes and increasingly complex spatial analysis algorithms. Traditional serial spatial analysis methods cannot cope well with this situation. High-performance computers and new computing methods offer new ways to process and analyse spatial data. Remote sensing data processing is data-intensive and an ideal domain for parallel computing, whereas vector data operations are compute-intensive and demand more computing power. This paper describes a distributed spatial analysis framework based on MySQL Spatial and MPI, and explores a cluster-based approach to parallel processing of spatial vector data. The framework uses a MySQL spatial cluster to store and manage GIS data, which addresses fault tolerance and concurrent access to the same data block. MPI handles message passing between distributed network nodes, so communication between nodes does not have to be controlled manually. Task management and distribution use a priority queue and monitor the status of the cluster to achieve load balancing and fault tolerance. Finally, a parallel polygon overlay operation is run on this distributed system to test the performance of the cluster. The parallel Overlay operation follows a pipeline strategy in which each node receives a subset of the polygons in the overlaid layers. This method achieved a relatively good speedup over the serial overlay operation.

Key words: distributed spatial database, Overlay, MySQL cluster, MPI, parallel GIS
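
The data-parallel idea behind the overlay experiment, in which each node receives a subset of the polygons, can be illustrated with a small sketch. It rests on several simplifying assumptions that are not from the paper: axis-aligned rectangles stand in for general polygons, a thread pool stands in for MPI ranks, and all function names are hypothetical. It shows only the scatter/compute/gather partitioning, not the pipeline stages or the MySQL cluster access.

```python
from concurrent.futures import ThreadPoolExecutor

# A rectangle is (xmin, ymin, xmax, ymax); a stand-in for a real polygon.


def intersect(a, b):
    """Return the overlapping rectangle of a and b, or None if they do not overlap."""
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    if xmin < xmax and ymin < ymax:
        return (xmin, ymin, xmax, ymax)
    return None


def overlay_chunk(chunk, overlay_layer):
    """Work done by one node: overlay its share of the base layer."""
    results = []
    for a in chunk:
        for b in overlay_layer:
            r = intersect(a, b)
            if r is not None:
                results.append(r)
    return results


def scatter(items, n_parts):
    """Split items round-robin into n_parts chunks (one per worker)."""
    return [items[i::n_parts] for i in range(n_parts)]


def parallel_overlay(base_layer, overlay_layer, n_workers=2):
    """Scatter the base layer, overlay each chunk concurrently, gather the results."""
    chunks = scatter(base_layer, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(overlay_chunk, chunks, [overlay_layer] * len(chunks))
        return [r for part in partials for r in part]


def serial_overlay(base_layer, overlay_layer):
    """Reference serial version used to check the parallel result."""
    return overlay_chunk(base_layer, overlay_layer)
```

Because the chunks are independent, the gathered result contains the same intersections as the serial version, though possibly in a different order. A real deployment would replace the thread pool with message passing between cluster nodes and the rectangle test with a full polygon clipping routine.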