Journal of Geo-information Science ›› 2017, Vol. 19 ›› Issue (10): 1346-1354. doi: 10.3724/SP.J.1047.2017.01346

• Remote Sensing Science and Application Technology •

A Parallel Mosaicking Method for Massive Remote Sensing Images Based on Self-defined RDD

JING Weipeng, HUO Shuaiqi

  1. College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
  • Received: 2017-02-28 Revised: 2017-06-09 Published: 2017-10-20 Online: 2017-10-20
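  • About the author:

    JING Weipeng (1979-), male, PhD, associate professor. His research interests are parallel computing, distributed computing, and spatial data mining. E-mail: nefujwp@163.com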

  • Supported by:
    Key Project of the Natural Science Foundation of Heilongjiang Province (ZD201403)

A Model of Parallel Mosaicking for Massive Remote Sensing Images Based on Self-defined RDD

JING Weipeng*, HUO Shuaiqi

  1. College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
  • Received: 2017-02-28 Revised: 2017-06-09 Online: 2017-10-20 Published: 2017-10-20
  • Contact: JING Weipeng, E-mail: nefujwp@163.com

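Abstract:

Image mosaicking is an important part of remote sensing image processing and plays an important role in cross-regional remote sensing image analysis. To address the low utilization of compute nodes and the frequent data I/O found in traditional parallel algorithms for remote sensing images, this paper builds on the Spark distributed in-memory computing framework, exploiting Spark's strength in iterative data processing, and proposes a parallel mosaicking method based on a Spark self-defined RDD (Resilient Distributed Dataset). The method first performs image overlap region estimation on multiple cluster nodes using the phase correlation method, so that overlap estimation runs in parallel across nodes. Then, by overriding the compute and getPartitions methods of the Spark RDD, it defines an RDD tailored to remote sensing image processing and exposes the three key steps of mosaicking (overlap region estimation, image registration, and image fusion) as Transformation-type operators of that RDD. Finally, it creates the self-defined RDD through implicit conversion and calls its operators to carry out the mosaicking in parallel. Experimental results show that, compared with the traditional MPI-based parallel mosaicking algorithm, the method effectively improves mosaicking efficiency for large data volumes while preserving mosaicking quality.

Keywords: remote sensing images, parallel mosaicking, Spark, phase correlation method, self-defined RDD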
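
The phase correlation step named in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the function names (`dft_1d`, `dft_2d`, `phase_correlation`, `circular_shift`) are hypothetical, a naive DFT stands in for an FFT, the test image is random, and the offset is assumed to be a pure circular translation. It shows only the core idea: the normalized cross-power spectrum of two translated images transforms back into a sharp peak located at the translation.

```python
import cmath
import random


def dft_1d(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = 0j
        for t, v in enumerate(values):
            total += v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(total / n if inverse else total)
    return out


def dft_2d(image, inverse=False):
    """2-D DFT computed separably: every row first, then every column."""
    rows = [dft_1d(row, inverse) for row in image]
    cols = [dft_1d(col, inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def phase_correlation(reference, shifted, eps=1e-12):
    """Return (dy, dx) with shifted[y][x] == reference[y - dy][x - dx] (indices wrap)."""
    spec_ref = dft_2d(reference)
    spec_shift = dft_2d(shifted)
    cross = []
    for ref_row, shift_row in zip(spec_ref, spec_shift):
        row = []
        for f, g in zip(ref_row, shift_row):
            c = g * f.conjugate()          # cross-power spectrum term
            mag = abs(c)
            row.append(c / mag if mag > eps else 0j)  # keep the phase only
        cross.append(row)
    surface = dft_2d(cross, inverse=True)  # correlation surface
    _, dy, dx = max(
        (surface[y][x].real, y, x)
        for y in range(len(surface))
        for x in range(len(surface[0]))
    )
    return dy, dx


def circular_shift(image, dy, dx):
    """Shift an image down by dy rows and right by dx columns, wrapping around."""
    h, w = len(image), len(image[0])
    return [[image[(y - dy) % h][(x - dx) % w] for x in range(w)] for y in range(h)]


rng = random.Random(0)
image = [[rng.random() for _ in range(8)] for _ in range(8)]
shifted = circular_shift(image, 2, 3)
print(phase_correlation(image, shifted))  # (2, 3)
```

The row and column transforms inside `dft_2d` are independent of one another, which is the kind of data independence that makes this step a candidate for distribution across nodes. Real scenes overlap only partially and do not wrap around, so a production estimator would need windowing and peak validation that this sketch omits.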

Abstract:

Image mosaicking is an important part of remote sensing image processing and plays a vital role in the analysis of trans-regional remote sensing images. To address the low node utilization and frequent data I/O of traditional parallel algorithms for remote sensing images, we propose a parallel mosaicking algorithm based on a self-defined RDD (Resilient Distributed Dataset) built on the Spark distributed in-memory computing framework. We take full advantage of Spark, which is well suited to iterative data processing, and build a parallel mosaicking model for remote sensing images from operations on Spark RDDs. First, exploiting the logical separability and data independence of the Fourier transform and inverse Fourier transform in the phase correlation method, we improve the traditional phase correlation method by executing a single instruction on multiple nodes in parallel across the cluster, which parallelizes the estimation of image overlapping regions over multiple nodes. Then, we override the compute and getPartitions methods of RDD to define an RDD for remote sensing image processing, and implement the three key steps of image mosaicking (overlapping region estimation, image registration, and image fusion) as transformation-type operators of the self-defined RDD. These operators perform no computation during parallel mosaicking until the final mosaicked image has to be written to disk or a file system, which reduces the time consumed by parallel mosaicking. Finally, the self-defined RDD is created through implicit conversion, and its operators are called to carry out the mosaicking in parallel. Experimental results show that, compared with the MPI-based parallel mosaicking algorithm, the proposed algorithm effectively improves mosaicking efficiency for large data volumes while preserving mosaicking quality.

Key words: remote sensing images, parallel mosaicking, Spark, phase correlation method, self-defined RDD
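
The deferred execution of transformation-type operators described in the abstract can be mimicked with a toy example. This is plain Python, not the Spark API and not the authors' RDD: the class `LazyTiles`, its methods, and the three placeholder step functions are all hypothetical. Each step only tags a tile record, so the sketch demonstrates when work happens, not how mosaicking is done.

```python
class LazyTiles:
    """Toy stand-in for an RDD: records transformations and runs them on demand."""

    def __init__(self, partitions, steps=()):
        self.partitions = partitions  # list of partitions, each a list of tile records
        self.steps = tuple(steps)     # per-tile functions that have not run yet

    def transform(self, fn):
        # Transformation: returns a new dataset; nothing is computed here.
        return LazyTiles(self.partitions, self.steps + (fn,))

    def compute(self, partition):
        # Apply every pending step to each tile of one partition.
        for tile in partition:
            for fn in self.steps:
                tile = fn(tile)
            yield tile

    def collect(self):
        # Action: the only place where the pending steps are executed.
        return [tile for part in self.partitions for tile in self.compute(part)]


calls = []  # records which placeholder steps actually ran


def estimate_overlap(tile):
    calls.append("overlap")
    return {**tile, "overlap": "estimated"}


def register(tile):
    calls.append("register")
    return {**tile, "registered": True}


def fuse(tile):
    calls.append("fuse")
    return {**tile, "fused": True}


tiles = LazyTiles([[{"id": 0}, {"id": 1}], [{"id": 2}]])
pipeline = tiles.transform(estimate_overlap).transform(register).transform(fuse)
calls_before_action = len(calls)  # still 0: the chain has only been described

result = pipeline.collect()       # the action triggers all three steps per tile
print(calls_before_action, len(calls), len(result))
```

Chaining the three steps costs nothing until `collect` is called, at which point every tile passes through all of them in a single pass over its partition. In Spark the same effect comes from transformations building a lineage that an action later executes.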