Journal of Geo-information Science, 2013, Vol. 15, Issue (1): 55-60. doi: 10.3724/SP.J.1047.2013.00055

• Theories and Methods of Geo-information Science •

A MapReduce-Based Multi-machine Parallel DP Algorithm and Its Experimental Analysis

ZHANG Donghai1,3, HUANG Lina2,3, LIU Hui1, TANG Jian1

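  1. Research Center of GNSS, Wuhan University, Wuhan 430079, China;
  2. School of Resource and Environmental Science, Wuhan University, Wuhan 430079, China;
  3. Key Laboratory of Geographic Information System, Ministry of Education, Wuhan University, Wuhan 430079, China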
  • Received: 2012-09-13  Revised: 2012-12-17  Published: 2013-02-25  Released: 2013-02-25
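  • About the author: ZHANG Donghai (b. 1989), male, from Chengde, Hebei; master's student; research interest: automated cartographic generalization of DEM maps. E-mail: zdh_zhangdonghai@163.com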
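  • Supported by:

    National Natural Science Foundation of China (41101448, 51008138); Independent Scientific Research Project of the Central Universities (274737); China Postdoctoral Science Foundation (2011M501230).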

Research on Multi-machine Parallel DP Algorithm Based on MapReduce

ZHANG Donghai1,3, HUANG Lina2,3, LIU Hui1, TANG Jian1   

  1. Research Center of GNSS, Wuhan University, Wuhan 430079, China;
  2. School of Resource and Environmental Science, Wuhan University, Wuhan 430079, China;
  3. Key Laboratory of Geographic Information System, Ministry of Education, Wuhan University, Wuhan 430079, China
  • Received: 2012-09-13  Revised: 2012-12-17  Online: 2013-02-25  Published: 2013-02-25

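Abstract (translated from the Chinese):

As web maps continue to develop, personalized web maps have also grown rapidly. Personalized web maps must be built on vector data to satisfy users' individual requirements for map colors, symbols, and so on, which calls for real-time, rapid simplification of large volumes of data. This paper takes the classic Douglas-Peucker algorithm as the curve simplification algorithm, uses the open-source cloud computing platform Hadoop to build a multi-machine cooperative framework for parallel curve simplification, designs and implements a multi-machine parallel Douglas-Peucker algorithm, and analyzes it experimentally on a cluster to verify its efficiency and applicability. The core of the algorithm is the design of logical data slices, which are distributed across the cluster following the MapReduce computing model so that they are processed in parallel. The experiments cover two aspects: (1) comparing the efficiency of the traditional DP algorithm and the multi-machine parallel DP algorithm with a fixed threshold and varying data volumes; (2) comparing the efficiency of the two algorithms with the same data volume and varying thresholds. The experiments show that the multi-machine parallel DP algorithm is more efficient for large data volumes and high computational complexity.

Key words: MapReduce, Douglas-Peucker algorithm, curve simplification, multi-machine parallel DP algorithm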

Abstract:

Real-time, rapid simplification of large-scale data, which is required by personalized WebGIS services based on vector data, is becoming increasingly important. This study is based on the Douglas-Peucker (DP) algorithm, one of the classical curve simplification algorithms; because of its low performance, however, the algorithm can hardly simplify large-scale data rapidly and in real time. Meanwhile, the development of cloud computing offers new storage technologies and computational methods for this task. This study therefore used Hadoop, an open-source cloud computing platform, to design and implement a multi-machine parallel Douglas-Peucker algorithm. In the algorithm, we designed logical slices of the data and assigned the slices to the cluster nodes through the MapReduce computing model, thereby achieving parallel simplification. To verify the efficiency of the algorithm, we designed experiments comparing the traditional DP algorithm with the multi-machine parallel DP algorithm in two respects: 1) the same threshold with different amounts of data; and 2) a fixed amount of data with different thresholds. The experiments showed that the multi-machine parallel DP algorithm was more efficient than the traditional DP algorithm for large-scale data and high-complexity computing. In this case, the data processing time was much longer than the time needed to allocate and transmit the data within the cluster, and every node took part in the computation, which improved efficiency. For small-scale data and low-complexity computing, however, the advantage of the multi-machine parallel DP algorithm was not obvious. This was mainly because some of the nodes did not participate in the computation, so the computing potential of the cluster was not fully exploited, while the data processing time itself was very short, so the data allocation and transmission time had a noticeable impact. To achieve real-time, rapid simplification of large-scale data, the multi-machine parallel DP algorithm should in future choose an appropriate simplification method according to the amount of data and the computational complexity.

Key words: Douglas-Peucker algorithm, multi-machine parallel DP algorithm, curve simplification, MapReduce
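
The slice-then-simplify idea summarized in the abstract can be illustrated with a minimal single-machine Python sketch. It is not the authors' Hadoop implementation: the abstract does not say how the logical slices are built, so the sketch assumes that each slice is a group of whole curves, and it replaces the MapReduce job with ordinary function calls. The names `make_slices`, `map_slice`, `reduce_slices`, and `parallel_dp` are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Curve = Sequence[Point]


def point_line_distance(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the line through a and b
    (distance to a when a and b coincide)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dx * (py - ay) - dy * (px - ax)) / length


def douglas_peucker(points: Curve, threshold: float) -> List[Point]:
    """Classic Douglas-Peucker simplification of one polyline (iterative form)."""
    n = len(points)
    if n < 3:
        return list(points)
    keep = [False] * n
    keep[0] = keep[-1] = True
    stack = [(0, n - 1)]
    while stack:
        first, last = stack.pop()
        max_dist, index = 0.0, -1
        # Find the interior point farthest from the chord first-last.
        for i in range(first + 1, last):
            d = point_line_distance(points[i], points[first], points[last])
            if d > max_dist:
                max_dist, index = d, i
        # Keep it and split only if it deviates by more than the threshold.
        if index != -1 and max_dist > threshold:
            keep[index] = True
            stack.append((first, index))
            stack.append((index, last))
    return [p for p, k in zip(points, keep) if k]


def make_slices(curves: Sequence[Curve], slice_size: int) -> List[Sequence[Curve]]:
    """Assumed slicing rule: group whole curves into fixed-size slices."""
    return [curves[i:i + slice_size] for i in range(0, len(curves), slice_size)]


def map_slice(slice_id: int, curves: Sequence[Curve], threshold: float):
    """Map step: simplify every curve in one slice, keyed by slice id."""
    return slice_id, [douglas_peucker(c, threshold) for c in curves]


def reduce_slices(mapped) -> List[List[Point]]:
    """Reduce step: merge slice results back into the original curve order."""
    result: List[List[Point]] = []
    for _, simplified in sorted(mapped, key=lambda kv: kv[0]):
        result.extend(simplified)
    return result


def parallel_dp(curves: Sequence[Curve], threshold: float, slice_size: int = 2):
    """Sequential stand-in for the cluster: on Hadoop each map_slice call
    would run on a different node."""
    slices = make_slices(curves, slice_size)
    mapped = [map_slice(i, s, threshold) for i, s in enumerate(slices)]
    return reduce_slices(mapped)
```

Because every curve is simplified independently in this sketch, the slices share no state, which is what would let them be spread over cluster nodes. The trade-off reported in the abstract also shows up here: with few or short curves, the work per slice is small compared with the cost of creating and distributing the slices.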