Theories and Methods of Geo-information Science

A Fast Compression Approach of Geo-raster Data for Network Transmission

  • JIANG Ling , * ,
  • WANG Chun ,
  • ZHAO Mingwei ,
  • YANG Cancan
  • Anhui Center for Collaborative Innovation in Geographical Information Integration and Application, Chuzhou University, Chuzhou 239000, China

About the author: JIANG Ling (1987-), male, born in Lu'an, Anhui Province, PhD, lecturer. His research interests cover digital terrain modeling and high-performance geo-computation. E-mail:

*Corresponding author: JIANG Ling, E-mail:

Received date: 2015-04-05

Request revised date: 2016-05-26

Online published: 2016-07-15

Foundation items

National Natural Science Foundation of China, No.41501445

Natural Science Research Project of Higher Education Institutions of Anhui Province, No.KJ2015A171

Open Fund of the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, No.14I02

Scientific Research Start-up Foundation of Chuzhou University, No.2014qd028


Copyright

Copyright reserved by the Editorial Office of Journal of Geo-information Science


Cite this article

JIANG Ling, WANG Chun, ZHAO Mingwei, YANG Cancan. A fast compression approach of geo-raster data for network transmission[J]. Journal of Geo-information Science, 2016, 18(7): 894-901. DOI: 10.3724/SP.J.1047.2016.00894

Abstract

As the main form of representing geographical information, geo-raster data contains abundant geographical knowledge. With the rapid development of earth observation technology, high-resolution geo-raster data has been widely applied in many research fields, such as landform, soil, environment and hydrology. In this context, the contradiction between the storage and transmission of massive geo-raster data and the limited channel capacity has become increasingly prominent as data sizes grow. Data compression techniques provide a way to solve this problem. This paper studies the compression of geo-raster data, taking gridded DEMs as the example, for the purpose of realizing the online transmission of massive data. By analyzing the characteristics of geo-raster data, this paper proposes a new compression method, named the two-phase compression method, which combines conversion compression and coding compression under the principles of data fidelity and real-time compression. Meanwhile, this paper establishes an assessment method for the two-phase compression method from the perspectives of accuracy and efficiency. To test and verify the data fidelity and compression performance of the two-phase compression method, several experiments were conducted on a 10-node server cluster running the Linux operating system, using gridded DEMs of different sizes. The experimental results show that the proposed two-phase compression method provides good data fidelity: it preserves accuracy in both the numerical values and the representation structure. At the same time, the compression ratio is generally above 50%, and the almost real-time compression/decompression efficiency also indicates good performance. The two-phase compression method can significantly reduce the time consumed by data transmission through the network and improve the efficiency of network transmission. In all, the two-phase compression method for geo-raster data presents good universality, and it can provide technical support for applications of geo-raster data such as high-performance geo-computation.

1 Introduction

As the main form of representing geo-spatial data, geo-raster data has been widely applied in fields such as landform, environment and hydrology [1-3]. With the rapid development of earth observation technology, the resolution of geo-raster data keeps improving, providing an effective way to acquire and analyze detailed land-surface information at the micro scale [4]. While high-resolution, short-cycle, multi-source geo-raster data supplies abundant geographical information, it also brings many difficulties to the storage of massive data, and especially to its fast network transmission. Data compression reorganizes given data by certain algorithms to reduce its information redundancy, which saves storage space and improves transmission efficiency; it thus offers an effective way to resolve the increasingly prominent contradiction between storing and transmitting massive geo-raster data and the limited channel capacity.
Geo-raster data is generally compressed by building a prediction model to identify and remove the hidden information redundancy in the data, and then encoding on this basis [5]. In recent years, many scholars at home and abroad have studied compression methods for geo-raster data, mainly involving standard image compression [6-7], wavelet-transform-based compression [8-10], artificial-neural-network-based compression [11-12], and compression exploiting the spatial correlation of the data [13-14]. These studies achieved good results; however, they mainly focus on the compression ratio while neglecting real-time compression efficiency, and some of the methods are irreversible in decoding accuracy, so they cannot meet the accuracy and immediacy requirements of data network communication in fields such as parallel geo-computation. To improve compression immediacy, some scholars have carried out related research from the perspective of parallel computing, concentrating on GPU-based parallel compression: the geo-raster data to be processed is divided into multiple parts, which are mapped onto different GPU pixel shaders for compression and decompression [15-17]. These parallel methods improve compression efficiency, but they increase the complexity of the compression algorithms and depend on specific computing devices, so they lack universality. Therefore, facing the network transmission demand of massive geo-raster data, this paper studies a fast compression method for geo-raster data from the dual perspectives of data fidelity and compression immediacy, providing methodological support for fields such as geo-data sharing and high-performance geo-computation.

2 Two-phase compression method

Geo-raster data comes in many formats based on different data organizations, such as GeoTIFF, ESRI Grid and ERDAS IMAGINE. From the viewpoint of representing geographical information and participating in geo-computation, the commonality of geo-raster data can be abstracted, which makes a universal compression method for geo-raster data possible. Taking the regular gridded DEM as an example, geo-raster data can be abstracted into two parts: (1) the data header, which defines the reference origin coordinates of the DEM, the no-data marker, the grid resolution, and the numbers of rows and columns; (2) the data body, which records the array of elevation values in row-column order, with the data type being integer, float or double. Compressing geo-raster data is essentially compressing its data body. Accordingly, this paper proposes a two-phase compression method that combines conversion compression and coding compression: the geo-raster data first undergoes conversion compression and then coding compression, while decompression proceeds in the reverse order. The sketch below illustrates this abstraction.
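
As a concrete illustration of this two-part abstraction, a minimal C++ sketch might look as follows; the type and field names are ours, chosen for illustration, and are not taken from the authors' implementation.

```cpp
#include <cstdint>
#include <vector>

struct RasterHeader {
    double originX, originY;   // reference origin of the grid
    double noDataValue;        // marker for cells with no value
    double cellSize;           // grid resolution
    int    rows, cols;         // grid dimensions
};

template <typename T>          // T: int32_t, float, or double
struct GeoRaster {
    RasterHeader   header;     // small, kept uncompressed
    std::vector<T> body;       // rows*cols cell values, row-major;
                               // this is what the two phases compress
};
```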

2.1 Conversion compression

As analyzed above, the cell values of geo-raster data usually come in three data types: integer (INT), single-precision floating point (FLOAT) and double-precision floating point (DOUBLE), of which the first two are common and float is the most widespread. On current 32-bit computer systems, both integer and float values are represented with 32 bits (i.e., 4 bytes), and double with 64 bits (i.e., 8 bytes). However, an unsigned short integer (UNSIGNED SHORT) is represented with 16 bits (i.e., 2 bytes) and has the value range [0, 65 535]. This suggests an idea: provided the geo-raster data still satisfies the accuracy demands of analysis and application, can the cell values be transformed so that they are represented as unsigned shorts, thereby compressing the data? In this paper, the process of converting the cell values of geo-raster data from a high-bit data type to a low-bit data type is called conversion compression. In fact, the idea of conversion compression has already been applied to vector data compression with good results [18-19], whereas studies of its use for geo-raster data, let alone of its combination with coding compression, are rare; this further motivated our use of conversion compression for geo-raster data. The key to conversion compression lies in the design of the kernel function, since different kernel functions yield markedly different conversion results. In numerical analysis, normalization maps data into the interval [0,1] to obtain a dimensionless expression. Borrowing from normalization, this paper designs the kernel function of conversion compression for geo-raster data as follows: the cell values are first transformed into [0, 0.6], and the significant digits of the transformed values are then expressed with unsigned short integers, thus mapping the cell values from a high-bit data type to a low-bit representation. The conversion compression mapping is given by Eq. (1).
$USE_i = \left( 0.6 \times \frac{e_i - e_{\min}}{e_{\max} - e_{\min}} + \varepsilon \right) \times 10^p$ (1)
where $USE_i$ is the unsigned short value of cell $i$ after conversion compression; $e_i$ is the value of cell $i$; $e_{\min}$ and $e_{\max}$ are the minimum and maximum values of the geo-raster data, respectively; $\varepsilon$ is the precision rounding coefficient, $\varepsilon = 0.5 \times 10^{-p}$; and $p$ is the significant-digit exponent. To minimize the loss of data precision, this paper takes $p = 5$, i.e., the cell values of the geo-raster data are finally mapped into the range [0, 60 000]. Decompression is the inverse of compression, as given by Eq. (2).
$e_i = \frac{USE_i \times (e_{\max} - e_{\min})}{0.6 \times 10^p} + e_{\min}$ (2)
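
For clarity, here is a minimal C++ sketch of the kernel in Eqs. (1)-(2), assuming $p = 5$ as chosen above; the function names are illustrative, not the authors' code.

```cpp
#include <cstdint>

constexpr int    p     = 5;       // significant-digit exponent (paper's choice)
constexpr double scale = 1e5;     // 10^p
constexpr double eps   = 0.5e-5;  // precision rounding coefficient, 0.5*10^-p

// Eq. (1): map a cell value into [0, 60 000] as an unsigned short.
uint16_t convertCompress(double e, double eMin, double eMax) {
    double t = 0.6 * (e - eMin) / (eMax - eMin) + eps;
    return static_cast<uint16_t>(t * scale);   // truncation after +eps rounds
}

// Eq. (2): recover an approximation of the original cell value.
double convertDecompress(uint16_t use, double eMin, double eMax) {
    return use * (eMax - eMin) / (0.6 * scale) + eMin;
}
```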

2.2 Coding compression

For geo-raster data, cell values exhibit a certain amount of redundancy across spatial positions, which can be effectively removed by suitable data encoding, thereby compressing the data. After conversion compression, the geo-raster data matrix changes from its original integer (or float) matrix into an unsigned short matrix. At this point the redundancy in the matrix has two sources: (1) the native redundancy of the geo-raster data matrix, which is passed on to the converted matrix; and (2) additional redundancy derived from the conversion compression itself. To remove or weaken this redundancy, the data matrix (the elevation matrix) is reorganized with a data encoding method, a process this paper calls coding compression.
Conventionally, the data-structure encoding methods for geo-raster data all take the repetition of cell values as the basis of encoding [20]. In fact, by the first law of geography, the cell values of geo-raster data within a neighborhood are usually similar rather than identical. As shown in Fig. 1, after conversion compression the values of the adjacent cells A and B become 22 092 and 22 064. Although the values differ, their binary representations in memory still share some redundancy (the first byte repeats); this is byte-level redundancy of cell values in memory. Since the character type (CHAR) is represented with 8 bits (i.e., 1 byte), this paper approaches the problem from character compression: each converted cell value is treated as characters (i.e., one cell value becomes two characters), and a character encoding compression method is used to remove this byte redundancy as far as possible. After the data transmission completes, the decompressed characters are combined pairwise back into unsigned short values, which finishes the coding decompression. A sketch of this byte view follows the figure below.
Fig. 1 Diagram of byte redundancy
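
The byte view itself can be sketched as follows: each converted cell is reinterpreted as two characters before the character compressor runs, and pairs of characters are recombined after decompression. This is an illustrative sketch, not the authors' code.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// One uint16_t cell becomes two chars, exposing byte-level redundancy
// between neighbouring cells (cf. Fig. 1) to a character compressor.
std::vector<unsigned char> cellsToChars(const std::vector<uint16_t>& cells) {
    std::vector<unsigned char> chars(cells.size() * 2);
    std::memcpy(chars.data(), cells.data(), chars.size());
    return chars;
}

// After transmission and character decompression, pair the chars
// back up into unsigned short cell values.
std::vector<uint16_t> charsToCells(const std::vector<unsigned char>& chars) {
    std::vector<uint16_t> cells(chars.size() / 2);
    std::memcpy(cells.data(), chars.data(), chars.size());
    return cells;
}
```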

Given that the transmission of geo-raster data demands both timeliness and fidelity, and that conversion compression is already lossy, the character encoding algorithm should be chosen with compression speed first, the compression ratio taken into account, and losslessness required. In terms of the compression model, current lossless character compression methods can be roughly divided into dictionary-based and statistics-based methods [21]. Dictionary-based methods encode in a way similar to looking up a dictionary: longer strings or recurring character combinations form the entry set of a dictionary and are represented by uniform tokens. Dictionary algorithms mainly include the LZ77 family (e.g., LZO, QUICKLZ, LZ4, LZFX, SNAPPY and FASTLZ) and the LZ78 family (e.g., LZW); the LZ77 family is often devoted to compression and decompression speed, sometimes even at the expense of the compression ratio. Statistics-based methods essentially re-encode the original characters according to their frequencies of occurrence, depending on the frequencies rather than the order of the characters; the main algorithms are run-length encoding (RLE), Huffman coding (HUFFMAN) and Shannon-Fano coding (SFANO). This paper compares the above lossless character compression algorithms through a series of experiments and then selects the best one as the method of the coding compression phase.

3 Experimental results and analysis

3.1 Experimental environment and data

The two-phase compression method was implemented in the cross-platform C++ programming language. To evaluate its compression performance, experiments were conducted on a cluster running the Linux operating system. Meanwhile, the efficiency of compressed data transmission with the two-phase method was tested on top of the MPI message passing interface. The test environment comprised 10 computing nodes, each configured with two 2.67 GHz six-core Intel(R) Xeon(R) X5650 processors and 24 GB of memory, interconnected by 1000 Mbps switched Ethernet.
The experimental area is part of the Jiuyuangou watershed in Suide County, northern Shaanxi, located in the hilly-gully region of the Loess Plateau. The area features rolling hills and crisscrossing gullies, with relatively complex terrain. The base data is a 1:10 000 DEM (float type, 5 m resolution, 1821 rows × 2134 columns) produced according to the national DEM production specification. On this basis, two datasets with different no-data coverage (2001 rows × 2285 columns and 2645 rows × 2759 columns) were obtained by rotating the data clockwise by 5° and 30°, as shown in Fig. 2. These three datasets, together with the integer DEMs generated from them by rounding, are referred to as data group 1, data group 2 and data group 3. In addition, based on the float DEM of data group 3, large DEMs of 1 m resolution (13 225 rows × 13 795 columns, about 700 MB, called sampled data 1) and 0.4 m resolution (33 063 rows × 34 488 columns, about 4 GB, called sampled data 2) were obtained by bilinear resampling for the compressed transmission experiments.
Fig. 2 DEMs of the study area

3.2 Evaluation methods

(1) Accuracy evaluation
Since the conversion compression in the two-phase method is lossy, compressing geo-raster data incurs a loss of precision. To evaluate the data fidelity of the two-phase compression method, the data before and after conversion compression is analyzed in terms of numerical accuracy, contour analysis and hydrological structure analysis.
For numerical accuracy, subtracting the data restored after conversion compression and decompression from the geo-raster data under evaluation (or from results derived from the original data by geo-computation) yields per-cell residuals; the overall statistics of the residuals include the minimum, maximum, mean and standard deviation.
Contour analysis is an effective tool for diagnosing errors in geo-raster data and their effect on the representation of geographic objects (e.g., terrain structure). Taking the contours extracted from the original geo-raster data as the baseline (called baseline contours), equally spaced contours extracted from the restored data (called converted contours) are overlaid on them for analysis. In the 2D plane, the distance from a converted contour to the nearest baseline contour of the same value is defined as the contour approximation degree. It reflects, in form, the consistency between converted and baseline contours, and numerically it equals the nearest distance from the nodes of a converted contour to the baseline contour. This paper takes the contour approximation degree as the quantitative factor of contour analysis, with the minimum, maximum, mean and standard deviation as the overall statistics.
Hydrological structure analysis is designed for DEMs in particular: the river network extracted from the original DEM serves as the baseline, and an equal-sized river network extracted from the restored DEM is overlaid on it for comparison. In a river network, runoff nodes are the confluences of rivers of each order with their higher-order rivers, and their spatial distribution reflects the basic skeleton of the network. This paper defines runoff node completeness as the percentage of converted runoff nodes consistent with the baseline among all baseline runoff nodes, where a converted node is deemed consistent with a baseline node if the distance between them is less than one quarter of the DEM grid size.
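
As a worked illustration of the numerical part of this evaluation, a minimal C++ sketch of the per-cell residual statistics (minimum, maximum, mean, standard deviation) might look as follows; it is not the evaluation code used in the experiments.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct ResidualStats { double minR, maxR, mean, stdDev; };

// Per-cell residuals between the original raster and the raster
// restored after conversion compression and decompression.
ResidualStats residualStats(const std::vector<double>& original,
                            const std::vector<double>& restored) {
    const std::size_t n = original.size();
    ResidualStats s{0.0, 0.0, 0.0, 0.0};
    for (std::size_t i = 0; i < n; ++i) {
        const double r = original[i] - restored[i];
        if (i == 0 || r < s.minR) s.minR = r;
        if (i == 0 || r > s.maxR) s.maxR = r;
        s.mean += r;
    }
    s.mean /= static_cast<double>(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double d = (original[i] - restored[i]) - s.mean;
        s.stdDev += d * d;
    }
    s.stdDev = std::sqrt(s.stdDev / static_cast<double>(n));
    return s;
}
```
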
(2) Efficiency evaluation
For a compression algorithm, the compression time, decompression time and compression ratio (Eq. (3)) are three common indicators quantifying performance. To assess performance comprehensively, this paper defines the net comprehensive speed as the volume of data removed by compression per unit of processing time, computed by Eq. (4).
$CE = \frac{DV_{pre} - DV_{com}}{DV_{pre}} \times 100\%$ (3)
$UCV = \frac{CE \times DV_{pre}}{CT + UCT}$ (4)
where $CE$ is the compression ratio; $DV_{pre}$ and $DV_{com}$ are the sizes of the geo-raster data before and after compression, respectively; $UCV$ is the net comprehensive speed; and $CT$ and $UCT$ are the compression time and decompression time.
When data transmission is the goal, the criterion for whether a compression algorithm is worthwhile is whether the time of direct transmission exceeds the time needed for compressed transmission. To quantify the performance of transmitting geo-raster data with a compression algorithm, this paper defines the compression-transmission ratio, i.e., the ratio of the time consumed by direct transmission of the geo-raster data to that consumed by compressed transmission, as given by Eq. (5).
$CS = \frac{DT}{CT + UCT + TT}$ (5)
where $CS$ is the compression-transmission ratio, and $DT$ and $TT$ are the times consumed transmitting the geo-raster data before and after compression, respectively.
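
Under the definitions of Eqs. (3)-(5), the indicators can be computed as in this small C++ sketch (sizes in MB, times in seconds; all names are illustrative):

```cpp
struct Metrics {
    double CE;   // compression ratio, Eq. (3) (fraction here, not %)
    double UCV;  // net comprehensive speed, Eq. (4), MB/s
    double CS;   // compression-transmission ratio, Eq. (5)
};

// dvPre/dvCom: data sizes before/after compression (MB);
// ct/uct: compression/decompression times; dt/tt: direct/compressed
// transmission times (all in seconds).
Metrics evaluate(double dvPre, double dvCom, double ct, double uct,
                 double dt, double tt) {
    Metrics m;
    m.CE  = (dvPre - dvCom) / dvPre;    // Eq. (3)
    m.UCV = m.CE * dvPre / (ct + uct);  // Eq. (4)
    m.CS  = dt / (ct + uct + tt);       // Eq. (5)
    return m;
}
```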

3.3 Accuracy analysis and algorithm selection

Based on the float DEM of data group 1, the accuracy fidelity of the two-phase compression method was analyzed; the results are shown in Fig. 3 and Tab. 1. Numerical accuracy was evaluated from the two angles of elevation and slope (Fig. 3(a)-(b)). Because significant digits are truncated during conversion compression, the elevation residuals deviate from a normal distribution and can be regarded as a "systematic error". The maximum elevation error is 0.003 m and the accuracy interval is approximately ±0.002 m; thus, although conversion compression is lossy, its effect on elevation accuracy is minimal. Slope, derived from elevation, has errors that are approximately normally distributed, indicating that the "systematic error" in elevation does not dominate the error of derived analyses. The maximum slope error is 0.034° and the accuracy interval is ±0.007°, so the loss of slope accuracy across conversion compression is very small. To analyze the accuracy of surface morphology before and after compression, contour overlay and river network overlay were evaluated (Fig. 3(c)-(d)). Visual inspection of the contour overlay shows almost no difference between the contours before and after compression. The mean contour approximation degree is 0.05 m with a standard deviation of 0.03 m, and even the maximum is only 0.45 m, less than one tenth of the DEM resolution, which further confirms the visual result. In the river network overlay analysis, the runoff node completeness reaches 98.55%, showing that the terrain structure hardly changes across the conversion. The places where runoff nodes do change are all flat valley-bottom areas, where the DEM representation is inherently uncertain. Combining the numerical and morphological accuracy results, the two-phase compression method preserves data fidelity well, and the decompressed data yields reliable results for both the representation of geographic objects and analytical applications.
Tab. 1 Performance of different lossless compression algorithms

          Data group 1 (1821 rows×2134 cols)    Data group 2 (2001 rows×2285 cols)    Data group 3 (2645 rows×2759 cols)
          CE/(%)  CT/10^-2 s  UCT/10^-2 s       CE/(%)  CT/10^-2 s  UCT/10^-2 s       CE/(%)  CT/10^-2 s  UCT/10^-2 s
LZO -0.36(39.03) 0.35( 2.93) 0.27( 2.31) 13.87(46.11) 0.47( 3.10) 0.36( 2.35) 45.49(62.75) 0.58( 3.45) 0.51( 2.58)
QUICKLZ 0.00(48.68) 1.66( 3.50) 0.24( 2.97) 0.00(53.93) 2.24( 3.65) 0.33( 3.14) 40.35(67.43) 3.49( 4.20) 4.27( 3.71)
LZ4 0.08(38.91) 0.70( 3.64) 0.28( 1.05) 14.90(45.63) 0.92( 3.79) 0.34( 1.14) 46.49(61.80) 1.07( 4.10) 0.51( 1.31)
LZFX -1.96(39.62) 6.42( 5.45) 0.61( 3.76) 13.08(46.04) 6.74( 5.79) 0.85( 3.86) 45.26(62.16) 7.14( 6.69) 1.91( 4.03)
SNAPPY 0.29(38.02) 0.44( 5.46) 0.30( 1.39) 14.33(44.36) 0.75( 5.53) 0.35( 1.49) 44.53(59.85) 0.80( 5.54) 0.61( 1.83)
FASTLZ -1.66(32.40) 5.02( 5.69) 0.67( 3.10) 13.40(40.43) 5.22( 6.04) 0.73( 3.14) 45.73(59.44) 5.32( 6.96) 0.86( 3.31)
LZW -39.79(34.14) 36.13(25.50) 7.55( 8.35) -19.11(16.00) 37.52(30.91) 8.50( 8.80) 24.85(46.12) 36.92(37.56) 9.65(11.25)
RLE -0.19( 0.08) 2.14( 2.28) 1.19( 1.36) 14.76( 0.07) 2.68( 2.67) 1.53( 1.55) 46.59( 0.05) 4.18( 4.01) 1.94( 2.26)
HUFFMAN 1.68( 7.58) 30.72(29.35) 32.30(29.17) 8.82(12.40) 33.60(33.15) 34.17(32.86) 35.28(33.00) 34.83(39.22) 35.69(31.96)
SFANO 1.22( 7.05) 31.26(28.56) 24.71(23.74) 8.50(12.02) 31.78(29.19) 25.69(26.54) 30.99(28.47) 36.99(37.34) 31.50(31.57)

Note: values in parentheses are the results for the corresponding integer data groups

Fig. 3 Error analysis of DEMs before and after the conversion compression

Tab. 1 and Tab. 2 present the experimental performance of the lossless character compression algorithms. As the no-data area grows (i.e., as data redundancy increases), the compression ratio of every algorithm increases substantially, and the compression and decompression times increase accordingly, which makes compressed transmission of large data volumes feasible. Compared with float data, every algorithm performs better on integer data, which relates to the different memory representations of integer and float values and also indirectly demonstrates the important role of conversion compression within the two-phase method. For geo-raster data, dictionary-based algorithms outperform statistics-based ones, with the LZ77 family standing out in particular, and their advantage becomes more pronounced as data redundancy increases. The net comprehensive speeds show that LZO, LZ4 and SNAPPY have good immediacy characteristics. Among the experimental data groups, the maximum net comprehensive speeds are 576.93 MB/s for LZO on float data and 158.93 MB/s for LZ4 on integer data, indicating that LZO is better suited to float data while LZ4 suits integer data. Tab. 2 also shows that the maximum net comprehensive speed is proportional to the degree of data redundancy, with a marked range of variation.
Tab. 2 Net comprehensive speeds of different lossless compression algorithms

          Data group 1 (1821 rows×2134 cols)    Data group 2 (2001 rows×2285 cols)    Data group 3 (2645 rows×2759 cols)
          Float UCV/(MB/s)  Integer UCV/(MB/s)  Float UCV/(MB/s)  Integer UCV/(MB/s)  Float UCV/(MB/s)  Integer UCV/(MB/s)
LZO -4.26 55.18 147.37 73.79 576.93 145.00
QUICKLZ 0.00 55.75 0.00 69.28 72.36 118.64
LZ4 0.62 61.48 103.15 80.80 409.46 158.93
LZFX -2.06 31.90 15.02 41.63 69.61 80.76
SNAPPY 2.89 41.15 114.11 55.08 438.75 112.94
FASTLZ -2.16 27.34 19.64 38.42 103.00 80.50
LZW -6.75 7.47 -3.62 3.51 7.43 13.15
RLE -0.42 0.17 30.54 0.14 105.92 0.10
HUFFMAN 0.20 0.96 1.14 1.64 6.96 6.45
SFANO 0.16 1.00 1.29 1.88 6.30 5.75
When geo-raster data is transmitted, the compression is usually performed in memory, so the chosen character compression algorithm must compress and decompress as fast as possible while not consuming excessive CPU resources. For different data characteristics (size, repetitiveness, etc.), LZO offers several variants, such as LZO1A, LZO1B, LZO1F and LZO1X, of which LZO1X performs well in most situations. With speed as the priority, LZO1X provides compression levels 1-9, 99 and 999 to reach different compression ratios: the higher the level, the higher the compression ratio, but the slower the compression and the more memory it requires. In general, level 1 needs only 64 KB of memory (the level used in the experiments of this paper). Therefore, this paper selects the LZO1X-1 character compression algorithm as the method of the coding compression phase.
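
For reference, a hedged sketch of how the coding phase might call LZO1X-1 through the open-source miniLZO library is shown below; the buffer sizing follows the miniLZO documentation, and the wrapper itself is our illustration rather than the authors' implementation.

```cpp
#include "minilzo.h"   // open-source miniLZO distribution of LZO1X-1
#include <vector>

// Compress the character stream produced by conversion compression.
std::vector<unsigned char> lzoCompress(const std::vector<unsigned char>& in) {
    static const bool ok = (lzo_init() == LZO_E_OK);    // one-time init
    static unsigned char wrkmem[LZO1X_1_MEM_COMPRESS];  // ~64 KB work memory
    // Worst-case output size recommended by the miniLZO documentation.
    std::vector<unsigned char> out(in.size() + in.size() / 16 + 64 + 3);
    lzo_uint outLen = 0;
    // LZO's C API is not const-qualified, hence the const_cast.
    if (ok && lzo1x_1_compress(const_cast<unsigned char*>(in.data()),
                               static_cast<lzo_uint>(in.size()),
                               out.data(), &outLen, wrkmem) == LZO_E_OK)
        out.resize(outLen);
    return out;   // decompression uses lzo1x_decompress analogously
}
```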

3.4 Efficiency analysis of the compression method

The performance of the two-phase compression method in Tab. 3 shows that, whether or not the geo-raster data contains information redundancy, the method achieves a compression ratio of at least nearly 50%. Its compression and decompression speeds also increase with data redundancy; on the experimental data, the compression speed reaches up to 1294.79 MB/s and the decompression speed up to 766.89 MB/s. Considering the compression ratio and the compression/decompression speeds together, the two-phase method outperforms the single character compression methods listed in Tab. 1. These figures fully demonstrate the clear advantages of the two-phase compression method in both compression ratio and immediacy.
Tab. 3 Performance of the two-phase compression method

                                    CE/(%)          CT/10^-2 s    UCT/10^-2 s
Data group 1 (1821 rows×2134 cols)  49.82 (69.51)   1.59 (3.85)   3.05 (4.65)
Data group 2 (2001 rows×2285 cols)  56.93 (73.05)   1.56 (4.16)   2.83 (5.11)
Data group 3 (2645 rows×2759 cols)  72.75 (81.38)   2.15 (5.21)   3.63 (6.92)

Note: values in parentheses are the results for the corresponding integer data groups

Based on the two-phase compression method, compressed transmission and direct transmission were compared using sampled data 1 and sampled data 2, as shown in Fig. 4 and Tab. 4. The compressed transmission proceeds as follows: the master process (master node) compresses the data with the two-phase method and sends it to the slave processes (slave nodes); each slave process receives the compressed data from the master and then decompresses it with the two-phase method. In the experiments, the geo-raster data was evenly partitioned by rows into sub-blocks according to the number of slave processes, and the master machine ran only one process that read the data and sent it to the slaves (the master node itself was assigned no data). Fig. 4 shows that, for every number of processes, compressed transmission takes less time than direct transmission, reducing the transmission time by more than 50% in all cases. Moreover, the share of compression and decompression time in the total cost of compressed transmission decreases as the number of processes grows; with 64 processes it is only about 16.27%. For sampled data 2 with 2 processes, direct transmission failed because the data exceeded the MPI message length limit (the maximum MPI message length is 2^31-1); after compression with the two-phase method, the message became short enough and the transmission succeeded. Tab. 4 shows that the compression-transmission ratio reaches 2.5 or more and tends to increase with data volume, indicating that the two-phase method is stable for network transmission among multiple computing nodes. All the above results show that the two-phase compression method can significantly reduce transmission time and improve network transmission efficiency, and is an effective means of overcoming the constraint of limited channel capacity.
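
The master-to-slave exchange described above can be sketched with standard MPI point-to-point calls; this is a simplified illustration (length prefix plus payload), not the experiment code, and the compression/decompression steps around it are assumed to be the two-phase pipeline.

```cpp
#include <mpi.h>
#include <vector>

// Master side: send one compressed sub-block to a slave process.
// Compression keeps the payload within the 2^31-1 element limit
// of a single MPI message, as noted above.
void sendCompressed(const std::vector<unsigned char>& packed, int slave) {
    long long n = static_cast<long long>(packed.size());
    MPI_Send(&n, 1, MPI_LONG_LONG, slave, 0, MPI_COMM_WORLD);
    MPI_Send(packed.data(), static_cast<int>(n), MPI_BYTE,
             slave, 1, MPI_COMM_WORLD);
}

// Slave side: receive the payload, then run two-phase decompression.
std::vector<unsigned char> recvCompressed(int master) {
    long long n = 0;
    MPI_Recv(&n, 1, MPI_LONG_LONG, master, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
    std::vector<unsigned char> packed(static_cast<std::size_t>(n));
    MPI_Recv(packed.data(), static_cast<int>(n), MPI_BYTE,
             master, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    return packed;
}
```
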
Fig. 4 Data transmission efficiency based on the two-phase compression method

Tab. 4 Compression-transmission ratios via the two-phase compression method

                                          Number of processes
                                          2      4      8      16     32     64
Sampled data 1 (13 225 rows×13 795 cols)  2.50   2.49   2.49   2.47   2.64   2.69
Sampled data 2 (33 063 rows×34 488 cols)  -      2.54   2.60   2.76   2.73   2.75

4 Conclusion

Against the background of the wide application of high-resolution geo-raster data, the contradiction between the demand for transmitting and storing massive data and the limited channel capacity has become increasingly prominent. Facing the demand for network transmission of massive geo-raster data, this paper has proposed, from the data compression perspective and under the principles of data fidelity and compression immediacy, a two-phase compression method for geo-raster data that combines conversion compression and coding compression. Experiments and application results show that the proposed two-phase compression method attains good accuracy both numerically and in representation structure (surface morphology), with high data fidelity. Meanwhile, its compression ratio is generally above 50%, and its compression/decompression speeds reach the real-time level, so it can significantly reduce the time consumed by data transmission and improve network transmission efficiency. The proposed method generalizes well across geo-raster data and can provide technical support for fields such as massive geo-raster data storage and high-performance parallel geo-computation. However, how to combine it with other compression methods for complementary advantages, and how to integrate it seamlessly into the various application domains of geo-raster data, require further study.

The authors have declared that no competing interests exist.

[1]
Xiong L., Tang G., Li F., et al. Modeling the evolution of loess-covered landforms in the Loess Plateau of China using a DEM of underground bedrock surface[J]. Geomorphology, 2014, 209: 18-26.

[2]
De Donato, Barres, Sausse, et al. Advances in 3-D infrared remote sensing gas monitoring: application to an urban atmospheric environment[J]. Remote Sensing of Environment, 2016, 175: 301-309.

[3]
Wulf H., Bookhagen B., Scherler D. Differentiating between rain, snow, and glacier contributions to river discharge in the western Himalaya using remote-sensing data and distributed hydrological modeling[J]. Advances in Water Resources, 2016, 88: 152-169.

[4]
宫鹏,黎夏,徐冰.高分辨率影像解译理论与应用方法中的一些研究问题[J].遥感学报,2006,10(1):1-5.

[ Gong P., Li X., Xu B. Interpretation theory and application method development for information extraction from high resolution remotely sensed data[J]. Journal of Remote Sensing, 2006, 10(1): 1-5. ]

[5]
Kidner D., Smith D.H. Advances in the data compression of digital elevation models[J]. Computers & Geosciences, 2003, 29(4): 985-1002.

[6]
Losasso F., Hoppe H. Geometry clipmaps: terrain rendering using nested regular grids[J]. ACM Transactions on Graphics, 2004, 23(3): 769-776.

[7]
李艳红,庞小平,李海亭.网络环境下的遥感影像金字塔纹理压缩算法与实验[J].地球信息科学学报,2012,14(1):109-115.

[ Li Y., Pang X., Li H.T. Research of remote sensing pyramid texture compression in network[J]. Journal of Geo-information Science, 2012, 14(1): 109-115. ]

[8]
卫俊霞,相里斌,段晓峰,等.基于EZW的高光谱图像压缩技术研究[J].光谱学与光谱分析,2011,31(8):2283-2286.

[ Wei J., Xiang L., Duan X., et al. Hyperspectral image compression technology research based on EZW[J]. Spectroscopy and Spectral Analysis, 2011, 31(8): 2283-2286. ]

[9]
Ouafi, Ahmed A., Barria, Zitouni A. A modified embedded zero tree wavelet (MEZW) algorithm for image compression[J]. Journal of Mathematical Imaging and Vision, 2008, 30(7): 298-307.

[10]
Tamilarasi, Palanisamy V. A novel embedded coding for medical image compression using contourlet transform[J]. International Journal of Signal and Imaging Systems Engineering, 2012, 5(3): 204-212.

[11]
Jiang J. Image compression with neural networks - a survey[J]. Signal Processing: Image Communication, 1999, 14(9): 737-760.

[12]
Toledo, Pinzolas, Ibarrola J., et al. Improvement of the neighborhood based Levenberg-Marquardt algorithm by local adaptation of the learning coefficient[J]. IEEE Transactions on Neural Networks, 2005, 16(4): 988-992.

[13]
Gerstner T. Multiresolution compression and visualization of global topographic data[J]. GeoInformatica, 2003, 7(1): 7-32.

[14]
Platings M., Day A.M. Compression of large-scale terrain data for real-time visualization using a tiled quad tree[J]. Computer Graphics Forum, 2004, 23(4): 741-759.

[15]
Yusov E., Shevtsov M. High-performance terrain rendering using hardware tessellation[J]. Journal of WSCG, 2011, 19(23): 85-92.

[16]
Zhao J., Tang M., Tong R.F. Connectivity-based segmentation for GPU-accelerated mesh decompression[J]. Journal of Computer Science and Technology, 2012, 27(6): 1110-1118.

[17]
Durdevic D., Tartalja I.I. HFPAC: GPU friendly height field parallel compression[J]. GeoInformatica, 2013, 17(1): 207-233.

[18]
陈志荣,尹天鹤,徐财江,等.面向移动用户的矢量地图数据压缩方法[J].浙江大学学报(理工版),2016,43(1):45-50.

[ Chen Z., Yin T., Xu C., et al. Vector data compression method for mobile users[J]. Journal of Zhejiang University (Science Edition), 2016, 43(1): 45-50. ]

[19]
蔡明,乔文孝,鞠晓东,等.一种新的数据无损压缩编码方法[J].电子与信息学报,2014,36(4):1008-1012.

[ Cai M., Qiao W., Ju X., et al. A new coding method for lossless data compression[J]. Journal of Electronics & Information Technology, 2014, 36(4): 1008-1012. ]

[20]
汤国安,刘学军,闾国年,等.数字高程模型及地学分析的原理与方法[M].北京:科学出版社,2005.

[ Tang G., Liu X., Lv G., et al. The principle and methodology of DEM-based geo-analysis[M]. Beijing: Science Press, 2005. ]

[21]
袁玫,袁文.数据压缩技术及其应用[M].北京:电子工业出版社,1994.

[ Yuan M., Yuan W. Data compression technology and its applications[M]. Beijing: Publishing House of Electronics Industry, 1994. ]
