2015 , Vol. 17 >Issue 12: 1465 - 1473

Orginal Article

A Parallel Geostatistical Areal Interpolation Algorithm Suited for Heterogeneous Cluster Computing

• YUN Shuo ,
• GUAN Qingfeng , *
Expand
• 1. National Engineering Research Center of GeographicInformation System, China University of Geosciences, Wuhan 430074, China
• 2. School of Information Engineering, China University of Geosciences, Wuhan 430074, China
*Corresponding author: GUAN Qingfeng, E-mail:

Request revised date: 2015-11-11

Online published: 2015-12-20

《地球信息科学学报》编辑部 所有

### Abstract

The goal of geostatistical areal interpolation is to estimate the unknown attribute values of a group of areal units using another group of areal units with known attribute values. Most geostatistical areal interpolation algorithms are based on the Kriging interpolation and its derivatives. Kriging interpolation considers the spatial variability of attributes and the covariance between the spatial units. It is a typical computationally intensive algorithm. The computation of covariance between a pair of areal unitsis independent from the computation between the other pairs, thus it is parallelizable. In addition, the covariance can be calculated using fast Fourier transform (FFT), which is a computationally intensive algorithm and is very suitable for the parallel processing.This paper presents a parallel algorithm for geostatistical areal interpolation that is suited for CPU+GPU heterogeneous computing clusters. The algorithm was implemented using MPI and CUDA. The experiment results showed that the hybridparallel algorithm outperformed the MPI-basedparallel algorithm that uses only the CPUs, and it exhibited a good scalability.

YUN Shuo , GUAN Qingfeng . A Parallel Geostatistical Areal Interpolation Algorithm Suited for Heterogeneous Cluster Computing[J]. Journal of Geo-information Science, 2015 , 17(12) : 1465 -1473 .

### 2 地统计面插值算法原理

#### 2.1 简单克里金面插值和普通克里金面插值

$z ^ t p = μ Z a p + w p T z s - μ Z a s$ (1)

$∑ k = 1 K w p s k ' σ Z s k , s k ' = σ Z t p , s k$ (2)

$z ^ t p = ∑ k = 1 K w p s k z s k = w p T z s$ (3)

$∑ a s a s T 0 w p - ξ p = σ p a p$ (4)

(1) 求源要素之间的协方差矩阵,源要素和目标要素之间的协方差矩阵;
(2) 构建矩阵方程;
(3) 通过求解方程得到权值矩阵;
(4) 通过权值矩阵求得目标要素的属性值。

#### 2.2 面要素间协方差计算

$σ Z s k , s k ' = Cov Z s k , Z s k ' = ∫ u ∈ D ∫ u ' ∈ D g k u σ Z u - u ' g k u ' d u ' du$ (5)

$z ( s k ) ≈ ∑ i = 1 N g k u i z u i , k = 1,2 , ⋯ , K$ (6)

$σ Z s k , s k ' = Cov Z s k , Z s k ' ≈ ∑ i = 1 N ∑ j = 1 N g k u i σ Z u i - u j g k u j$ (7)

#### 2.3 基于快速傅里叶变换（FFT）的协方差计算

$∑ ss = G s ∑ ss G s T$ (8)

$∑ ts = G t ∑ ss G s T$ (9)

$∑ zz$ 扩展后在二维下是对称的块托普利茨（Toeplitz）矩阵,其矩阵由第一行的块移动得到[19]。这种循环行列式形式的矩阵可分解成式（10）。
$∑ ~ zz = F E ~ F H$ (10)

$e ~ = f ∑ ~ zz 1 , ⋅$ (11)

$σ Z t p , s k = F H g ~ p H E ~ F H g ~ k$ (12)
DFT可通过FFT更高效地进行计算。面插值的元素间协方差计算步骤如下：
(1) 用格网划分研究域格网;
(2) 计算格网格子之间的协方差,并扩展为符合公式的形式;
(3) 求扩展矩阵特征值;
(4) 对要素的采样函数向量进行FFT变换;
(5) 将计算结果通过式（12）计算得到要素之间的协方差。

### 3 并行算法研究

#### 3.1 算法复杂度和并行分析

(1) 获取源数据和目标数据;
(2) 对源面要素和目标面要素进行格网处理,离散成格子单元;
(3) 计算格网的所有格子间的协方差矩阵,并扩展该矩阵;
(4) 对该格网的协方差矩阵进行FFT变换,得到特征值矩阵,然后该要素的取样函数矢量进行FFT变换,然后求取这2个结果的积,再对该结果进行FFT变换,得到每个要素对于其他要素的协方差矩阵中间值;
(5) 通过每个要素和其它要素的协方差矩阵中间值,和要求取协方差的那个要素的采样函数矢量,计算源要素之间的协方差矩阵,源要素和目标要素之间的协方差矩阵;
(6) 根据选用的克里金面插值方式的原理构建协方差矩阵方程;
(7) 求解协方差矩阵方程,得到每个源要素对每个目标要素的权值矩阵;
(8) 通过权值矩阵和选用克里金原理公式计算得到每个目标的值。

#### 3.2 并行算法设计与实现

##### Fig. 1 Flow chart of the parallel algorithm

（1） 数据准备。这个流程中需读取数据,包括源要素和目标要素的空间信息及源要素的属性信息,然后构建相应的要素目标;读取协方差模型参数,构建协方差参数对象;设置格网大小,生成格网格网信息。通过格网信息计算每个源要素和目标要素基于格网的坐标集合。
（2） 广播协方差参数对象,数据对象集合、网格信息对象、数据基于格网的坐标集合。
（3） 开始并行计算要素之间局部的协方差。以某一要素与其他要素的协方差矩阵计算为计算单元,分配到相应的计算结点上。每一个要素间协方差计算具体步骤如图2所示。每个进程中首先使用协方差参数模型构建协方差模型,利用格网信息计算格网中的协方差矩阵,然后对协方差矩阵进行计算。通过将数据基于格网的坐标集合上载到协处理器进行FFT处理,在进程中计算以上2个FFT的结果的乘法,将结果上载到协处理器重新进行FFT处理,然后将获取的结果和其它的坐标集合矢量进行乘法运算获取结果。重复当前源要素与其所有源要素和目标要素的计算,获取局部的源要素之间的协方差矩阵和局部源要素与目标要素之间的协方差矩阵。
##### Fig. 2 Covariance calculation between blocks

（4） 将协方差矩阵通过广播或者消息发送的方式返回给主进程。
（5） 主进程开始使用相关算法进行方程分解,得到源要素对目标要素的权重矩阵。
（6） 主进程使用权重矩阵对目标要素属性 求值。
（7） 将计算结果写入目标文件。

### 4 并行算法实验和分析

#### 4.1 实验环境和实验设计

$S p = t 1 t p$ （13）

#### 4.2 算法结果和性能分析

##### Fig. 3 The results of interpolation (the source data (a) and the target data (b))

4.2.1 CPU+GPU异构单节点测试结果与分析

##### Fig. 7 Computationaltime of different settings

4.2.2 CPU+GPU异构并行测试结果与分析

### 5 结语

The authors have declared that no competing interests exist.

 [1] Seymour L.Spatial data analysis: Theory and Practice. Robert Haining[J]. Journal of the American Statistical Association, 2005,100(469):353-354.
 [2] Goodchild M F, Anselin L, Deichmann U.A framework for the areal interpolation of socioeconomic data[J]. Environment & Planning A, 1993,25(3):383-397.Spatial data are collected and represented as attributes of spatial objects embedded in a plane. Basis change is defined as the transfer of attributes from one set of objects to another. Methods of basis change for socioeconomic data are reviewed and are seen to differ in the assumptions made in each about underlying density surfaces. These methods are extended to more general cases, and an illustration is provided by using Californian data. The implementation of this framework within a geographical information system is discussed.
 [3] Gotway C, Waller L.Applied Spatial Statistics for Public Health Data[M]. Hoboken, NJ:Wiley-Interscience, 2004.
 [4] Flowerdew R, Green M.Areal interpolation and types of data[A]. In Fotheringham S, Rogerson P(eds.). Spatial Analysis and GIS[M]. London: Taylor and Francis, 1994.
 [5] Goodchild M F, Lam N S N. Areal interpolation: A variant of the traditional spatial problem[J]. Geo Processing, 1980,1(3):297-312.Geo-Processing, 1(1980) 297—312 Elsevier Scientific Publishing Company, Amsterdam 297 - Printed in The Netherlands AREAL INTERPOLATION: A VARIANT OF THE. TRADITIONAL SPATIAL PROBLEM MICHAEL F. GOODCHILD and NINA SIU-NG02N LAM Department of
 [6] Kyriakidis P C, Schneider P, Goodchild M F.Fast geostatistical areal interpolation[C]. 7th International Conference on Geocomputation, Ann Arbor Michigan, 2005.
 [7] 吴博,高超,谢健.基于混合并行的Kriging插值算法研究[J].计算技术与自动化,2014,33(1):65-68.普通Kriging方法是进行 空间降水插值的一种有效方法。然而一方面由于海量数据插值计算量大,另一方面该算法的时间复杂度大,为减少空间降水插值的计算时间,采用OpenMP和 MPI混合并行技术,实现Kriging并行算法。在Windows操作系统上搭建并行计算环境,实验数据表明,该并行算法能有效地节省计算时间。
 [8] 肖汉. 利用GPU计算的双线性插值并行算法[J].小型微型计算机系统,2010,32(11):2241-2245.双线性插值算法在数字图像处理 中有广泛的应用,但计算速度慢.为提高其计算速度,提出一种基于图形处理器加速的双线性插值并行算法.主要利用Wallis变换双线性插值中各分块之间的 独立性适合GPU并行处理架构的特点,把传统串行双线性插值算法映射到CUDA并行编程模型,并从线程分配,内存使用,硬件资源划分等方面进行优化,来充 分利用GPU的巨大运算能力.实验结果表明,随着图像分辨率的增大,双线性内插并行算法可以把计算速度提高28倍.
 [9] Guan Q, Kyriakidis P C, Goodchild M F.A parallel computing approach to fast geostatistical areal interpolation[J]. International Journal of Geographical Information Science, 2011,25(8):1241-1267Areal interpolation is the procedure of using known attribute values at a set of (source) areal units to predict unknown attribute values at another set of (target) units. Geostatistical areal interpolation employs spatial prediction algorithms, that is, variants of Kriging, which explicitly incorporate spatial autocorrelation and scale differences between source and target units in the interpolation endeavor. When all the available source measurements are used for interpolation, that is, when a global search neighborhood is adopted, geostatistical areal interpolation is extremely computationally intensive. Interpolation in this case requires huge memory space and massive computing power, even with the dramatic improvement introduced by the spectral algorithms developed by Kyriakidis et al. (2005. Improving spatial data interoperability using geostatistical support-to-support interpolation. In: Proceedings of geoComputation. Ann Arbor, MI: University of Michigan) and Liu et al. (2006. Calculation of average covariance using fast Fourier transform (FFT). Menlo Park, CA: Stanford Center for Reservoir Forecasting, Petroleum Engineering Department, Stanford University) based on the fast Fourier transform (FFT). In this study, a parallel FFT-based geostatistical areal interpolation algorithm was developed to tackle the computational challenge of such problems. The algorithm includes three parallel processes: (1) the computation of source-to-source and source-to-target covariance matrices by means of FFT; (2) the QR factorization of the source-to-source covariance matrix; and (3) the computation of source-to-target weights via Kriging, and the subsequent computation of predicted attribute values for the target supports. Experiments with real-world datasets (i.e., predicting population densities of watersheds from population densities of counties in the Eastern Time Zone and in the continental United States) showed that the parallel algorithm drastically reduced the computing time to a practical length that is feasible for actual spatial analysis applications, and achieved fairly high speed-ups and efficiencies. Experiments also showed the algorithm scaled reasonably well as the number of processors increased and as the problem size increased.
 [10] Bernaschi M, Salvadore F.Multi-Kepler GPU vs. multi-intel MIC: A two test case performance study[C]. IEEE 2014 International Conference on High Performance Computing & Simulation (HPCS), 2014:1-8.
 [11] 王握文,陈明."天河一号"超级计算机系统研制[J].国防科技,2009(6):1-4.2009年10月,国家首台千万亿次超级计算机系统--"天河一号"在国防科学技术大学诞生."天河一号"的诞生,是我国高性能计算机技术发展的又一重大突破,标志着我国超级计算机研制能力实现了从百万亿次到千万亿次的重大跨越,我国成为继美国之后第二个能研制千万亿次超级计算机系统的国家.
 [12] 王涛. “天河二号”超级计算机[J].科学,2013(4):52-52.2013年6月17日，国际超级计算机大会（International Supercomputer Conference）发布了全球最快的500台超级计算机排行榜，中国国防科技大学与浪潮公司共同研发的“天河二号”超级计算机位列第一。这是中国研制的超级计算机第二次荣获冠军，第一次获得冠军的是2010年中国国防科技大学研制的“天河一号”超级计算机。
 [13] Kirk D.NVIDIA CUDA software and GPU parallel computing architecture[C]. International Symposium on Memory Management Proceedings of International Symposium on Memory Management, 2007,27(5-6):578.
 [14] Felicie A L.Intel MIC[M]. Newport, RI: Salve Regina University, 2012.
 [15] Journel A G, Huijbregts C.Mining Geostatistics[M].London: Academic Press, 1978.
 [16] Coburn T C.Geostatistics for Natural Resources Evaluation[J]. Technometrics, 2000,42(4):437-438.
 [17] Anderson T W.An introduction to multivariate statistical analysis[J]. Wiley, 1958,148:164-165.
 [18] Strang G.Introduction to applied mathematics[J]. Journal of Vibration & Acoustics, 1986,110(2):428-429.
 [19] Davis P. Circulant Matrices.Second Edition[M]. New York, NY: Chelsea Publishing, 1994.
 [20] Liu Y, Jiang Y, Kyriakidis P .Calculation of average covariance using fast Fouriertransform (FFT)[R]. Menlo Park, CA: Stanford Center for Reservoir Forecasting, PetroleumEngineering Department, Stanford University, 2006.
Options
Outlines

/

 〈 〉