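A Parallel Mosaicking Method for Massive Remote Sensing Images Based on Self-defined RDD
About the author: Jing Weipeng (1979-), male, PhD, associate professor; research interests: parallel computing, distributed computing, and spatial data mining. E-mail: nefujwp@163.com
Received date: 2017-02-28
Request revised date: 2017-06-09
Online published: 2017-10-20
Funding
Key Project of the Natural Science Foundation of Heilongjiang Province (ZD201403)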
A Model of Parallel Mosaicking for Massive Remote Sensing Images Based on Self-defined RDD
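Image mosaicking is an important part of remote sensing image processing and plays an important role in cross-regional remote sensing image analysis. To address the low utilization of compute nodes and the frequent data I/O of traditional parallel algorithms for remote sensing images, this paper builds on the Spark distributed in-memory computing framework, exploits Spark's strength in iterative data processing, and proposes a parallel mosaicking method based on a self-defined Spark RDD (Resilient Distributed Dataset). The method first performs image overlap region estimation on multiple cluster nodes using the phase correlation method, so that overlap region estimation is computed in parallel across nodes. Next, it overrides the compute and getPartitions methods of Spark's RDD to define an RDD specific to remote sensing image processing, and implements the three key steps of image mosaicking (overlap region estimation, image registration, and image fusion) as Transformation-type operators of the self-defined RDD. Finally, it creates the self-defined RDD through an implicit conversion and calls its operators to mosaic images in parallel. Experimental results show that, compared with a traditional MPI-based parallel mosaicking algorithm, the method effectively improves mosaicking efficiency for large data volumes while preserving mosaicking quality.
Jing Weipeng, Huo Shuaiqi. A parallel mosaicking method for massive remote sensing images based on self-defined RDD[J]. Journal of Geo-information Science, 2017, 19(10): 1346-1354. DOI: 10.3724/SP.J.1047.2017.01346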
Image mosaicking is an important part of remote sensing image processing and plays a vital role in the analysis of trans-regional remote sensing images. To address the low node utilization and frequent data I/O of traditional parallel algorithms for remote sensing images, we propose a parallel mosaicking algorithm based on a self-defined RDD (Resilient Distributed Dataset) built on the Spark distributed in-memory computing framework. We take advantage of Spark's suitability for iterative data processing and build a parallel mosaicking model for remote sensing images from operations on Spark RDDs. First, exploiting the logical separability and data independence of the Fourier transform and inverse Fourier transform in the phase correlation method, we improve the traditional phase correlation method by executing the same instruction on multiple cluster nodes in parallel, which parallelizes image overlap region estimation across nodes. Then, we override the compute and getPartitions methods of RDD to define an RDD for remote sensing image processing, and implement the three key steps of image mosaicking (overlap region estimation, image registration, and image fusion) as transformation-type operators of the self-defined RDD. These operators perform no computation during parallel mosaicking until the final mosaicked image must be written to disk or a file system, which reduces the time consumed by parallel mosaicking. Finally, the self-defined RDD is created through an implicit conversion, and its operators are called to mosaic images in parallel.
Compared with the MPI-based parallel mosaicking algorithm, the experimental results show that the algorithm based on the self-defined RDD effectively improves the mosaicking efficiency for large volumes of remote sensing images while preserving mosaicking quality.
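The overlap estimation step relies on phase correlation: for two tiles related by a translation, the normalized cross-power spectrum of their Fourier transforms has an inverse transform that peaks at the shift. The following is a minimal single-node sketch in plain Scala, not the paper's implementation. It assumes two equally sized grayscale tiles related by a purely circular shift, and it uses a naive O(N^4) DFT in place of an FFT to stay self-contained. The names PhaseCorrelation, Cx, dft2, and estimateShift are illustrative.

```scala
object PhaseCorrelation {
  // Tiny complex-number helper so the sketch needs no external library.
  final case class Cx(re: Double, im: Double) {
    def +(o: Cx): Cx = Cx(re + o.re, im + o.im)
    def *(o: Cx): Cx = Cx(re * o.re - im * o.im, re * o.im + im * o.re)
    def conj: Cx = Cx(re, -im)
    def abs: Double = math.hypot(re, im)
  }

  // Naive 2-D DFT. sign = -1 is the forward transform,
  // sign = +1 is the inverse transform without the 1/(h*w) factor.
  def dft2(a: Array[Array[Cx]], sign: Int): Array[Array[Cx]] = {
    val h = a.length
    val w = a(0).length
    Array.tabulate(h, w) { (u, v) =>
      var acc = Cx(0.0, 0.0)
      for (y <- 0 until h; x <- 0 until w) {
        val ang = sign * 2.0 * math.Pi * (u.toDouble * y / h + v.toDouble * x / w)
        acc = acc + a(y)(x) * Cx(math.cos(ang), math.sin(ang))
      }
      acc
    }
  }

  // Returns (dy, dx) such that b(y)(x) == a(y - dy)(x - dx) under circular shift.
  def estimateShift(a: Array[Array[Double]], b: Array[Array[Double]]): (Int, Int) = {
    val h = a.length
    val w = a(0).length
    val fa = dft2(a.map(_.map(v => Cx(v, 0.0))), -1)
    val fb = dft2(b.map(_.map(v => Cx(v, 0.0))), -1)
    // Normalized cross-power spectrum: keep only the phase difference.
    val cross = Array.tabulate(h, w) { (u, v) =>
      val c = fb(u)(v) * fa(u)(v).conj
      val m = c.abs
      if (m < 1e-12) Cx(0.0, 0.0) else Cx(c.re / m, c.im / m)
    }
    // The inverse transform of the phase difference peaks at the shift.
    val corr = dft2(cross, 1)
    var best = (0, 0)
    var bestVal = Double.NegativeInfinity
    for (y <- 0 until h; x <- 0 until w) {
      if (corr(y)(x).re > bestVal) { bestVal = corr(y)(x).re; best = (y, x) }
    }
    best
  }
}
```

Because each image pair is processed independently, a function such as estimateShift could plausibly be applied inside a Spark map over pairs of adjacent images. That is one reading of the multi-node parallel estimation described in the abstract, not a statement of how the authors implemented it.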
Fig. 1 The architecture of the Spark cluster
Fig. 2 Implementation details of the self-defined RDD
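Algorithm 1:
Initialization: create a SparkConf object conf, pass conf to the SparkContext constructor to create a SparkContext object sc, and call sc's textFile method to create the initial RDD.
Stage 1: add operation methods to the self-defined RDD
    Iterator[BufferedImage] ← compute(split: Partition, context: TaskContext)  // calls the parent RDD's iterator method and returns an iterator whose elements are of type BufferedImage
    Array[Partition] ← firstParent[BufferedImage].partitions  // calls the parent RDD's partitions method and returns the parent RDD's partitions
    RDD[BufferedImage] ← Image overlap region estimation  // overlap region estimation method
    RDD[BufferedImage] ← Image registration  // image registration method
    RDD[BufferedImage] ← Image fusion  // image fusion method
Stage 2: invoke the implicit conversion
    self-definedRDD[rdd] ← exchange(rdd: RDD[String])  // the exchange method in the conversion class is marked with the implicit keyword; it takes an RDD as its parameter and returns the self-defined RDD
    import RDDtoSelf-definedRDD.exchange  // imports the declared implicit conversion method into the program
Stage 3: create the self-defined RDD object
    imageRDD ← fileRDD.exchange  // the initial RDD calls exchange to produce the self-defined RDD object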
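The stages of Algorithm 1 can be sketched in Scala against the Spark RDD API roughly as follows. This is an illustration under stated assumptions, not the authors' code. Tile is a hypothetical stand-in for a decoded image (BufferedImage is not serializable), and the three operators are placeholders that only tag each element. ImageRDD, RDDToImageRDD, and MosaicSketch are invented names, and sc.parallelize replaces textFile so that no input file is needed. The sketch needs spark-core on the classpath.

```scala
import org.apache.spark.{Partition, SparkConf, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD
import scala.language.implicitConversions

// Hypothetical element type: an image path plus the last processing stage applied.
final case class Tile(path: String, stage: String)

// Stage 1: self-defined RDD that reuses the parent's partitions and iterator.
class ImageRDD(prev: RDD[Tile]) extends RDD[Tile](prev) {
  override def compute(split: Partition, context: TaskContext): Iterator[Tile] =
    firstParent[Tile].iterator(split, context)

  override protected def getPartitions: Array[Partition] =
    firstParent[Tile].partitions

  // Placeholder transformations: each returns a new ImageRDD and starts no job.
  def estimateOverlap(): ImageRDD = new ImageRDD(this.map(t => t.copy(stage = "overlap")))
  def register(): ImageRDD = new ImageRDD(this.map(t => t.copy(stage = "registered")))
  def fuse(): ImageRDD = new ImageRDD(this.map(t => t.copy(stage = "fused")))
}

// Stage 2: implicit conversion from the initial RDD[String] to the self-defined RDD.
object RDDToImageRDD {
  implicit def exchange(rdd: RDD[String]): ImageRDD =
    new ImageRDD(rdd.map(p => Tile(p, "raw")))
}

object MosaicSketch {
  import RDDToImageRDD.exchange

  def run(sc: SparkContext, paths: Seq[String]): Array[Tile] = {
    val fileRDD: RDD[String] = sc.parallelize(paths)
    // Stage 3: calling an ImageRDD method on fileRDD triggers the implicit conversion.
    val mosaic = fileRDD.estimateOverlap().register().fuse() // still lazy
    mosaic.collect() // the action that actually runs the chained transformations
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("mosaic-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try run(sc, Seq("a.tif", "b.tif")).foreach(println)
    finally sc.stop()
  }
}
```

Algorithm 1 writes the last step as fileRDD.exchange. In Scala, the compiler inserts an imported implicit conversion automatically when a method of the target type is called, so the sketch calls estimateOverlap directly on fileRDD instead. Nothing is computed until collect runs, which matches the abstract's description of transformation-type operators that defer work until the result is needed.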
Fig. 3 Parallel mosaicking algorithm based on self-defined RDD
Fig. 4 Directed acyclic graph of parallel mosaicking of remote sensing images
Fig. 5 Mosaicking results of the parallel mosaicking algorithm based on Spark
Fig. 6 Speedup comparison chart (with increasing number of processes)
Fig. 7 Running time comparison chart (with increasing data size)
Fig. 8 Throughput comparison chart (with increasing data size)
The authors have declared that no competing interests exist.