Journal of Geo-information Science ›› 2021, Vol. 23 ›› Issue (1): 134-142. doi: 10.12082/dqxxkx.2021.200425

• Special column: "Research on Modeling and Analysis Methods and Applications of Pan-Spatial Information"

A Lossy Compression Method for AoT Sequence Data Based on Tensor Decomposition

YANG Chen1,2, GAO Hong1,2, ZHANG Liying1,2, HU Xu1,2, YU Zhaoyuan1,2,3, LI Dongshuang4,5,*

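  1. Key Laboratory of Virtual Geographic Environment of the Ministry of Education, Nanjing Normal University, Nanjing 210023, China
    2. Cultivation Base of State Key Laboratory of Geographical Environment Evolution, Jiangsu Province, Nanjing 210023, China
    3. Jiangsu Provincial Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023, China
    4. Jiangsu Key Laboratory of Crop Genetics and Physiology/Jiangsu Key Laboratory of Crop Cultivation and Physiology, Agricultural College of Yangzhou University, Yangzhou 225009, China
    5. Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, Yangzhou University, Yangzhou 225009, China
  • Received: 2020-07-31 Revised: 2020-12-29 Online: 2021-01-25 Published: 2021-03-25
  • Corresponding author: LI Dongshuang
  • About the author: YANG Chen (born 1997), female, from Datong, Shanxi, master's student, mainly engaged in research on geospatial data processing and analysis. E-mail: yangchen765@126.com
  • Supported by:
    National Key Research and Development Program of China (2016YFB0502301); National Natural Science Foundation of China (42001320); National Natural Science Foundation of China (41976186)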

A Lossy Compression Method for AoT Sequence Data Based on Tensor Decomposition

YANG Chen1,2, GAO Hong1,2, ZHANG Liying1,2, HU Xu1,2, YU Zhaoyuan1,2,3, LI Dongshuang4,5,*

  1. Key Laboratory of Virtual Geographic Environment of the Ministry of Education (Nanjing Normal University), Nanjing 210023, China
    2. Cultivation Base of State Key Laboratory of Geographical Environment Evolution, Jiangsu Province, Nanjing 210023, China
    3. Jiangsu Provincial Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023, China
    4. Jiangsu Key Laboratory of Crop Genetics and Physiology/Jiangsu Key Laboratory of Crop Cultivation and Physiology, Agricultural College of Yangzhou University, Yangzhou 225009, China
    5. Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, Yangzhou University, Yangzhou 225009, China
  • Received: 2020-07-31 Revised: 2020-12-29 Online: 2021-01-25 Published: 2021-03-25
  • Contact: LI Dongshuang
  • Supported by:
    National Key Research and Development Program of China (2016YFB0502301); National Natural Science Foundation of China (42001320); National Natural Science Foundation of China (41976186)

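Abstract:

Array of Things (AoT) continuously and dynamically observes urban systems through multiple sensors at a single location. AoT observations are large in volume and keep growing, so compressing and transmitting AoT sequence data with limited computing resources has become one of the key bottlenecks for its application. This paper proposes a lossy compression method for AoT sequence data based on tensor decomposition. To handle data that are massive, high-dimensional, and must be processed on the sensor side, the method first organizes the AoT sequence data into a high-dimensional tensor, then uses the CANDECOMP/PARAFAC (CP) tensor decomposition, which has relatively low algorithmic complexity, to extract the principal feature components along each dimension, and finally uses tensor reconstruction to achieve feature-preserving lossy compression. The method was tested on acoustic, optical, and electromagnetic data sensed within 24 h in downtown Chicago, USA, and the effects of different compression parameters on compression ratio, compression error, compression accuracy, compression time, memory usage during compression, and memory usage of the compressed result are discussed. The results show that the method achieves lossy compression of AoT sequence data and that its small memory footprint can support compression on the sensor side. A comparison with the original light-field intensity shows that the compressed data preserve the original spatio-temporal distribution characteristics. Compared with the traditional vector quantization coding compression method at the same compression accuracy, the compression ratio of the proposed method is about 27%~76% higher, the compression time is about 46%~73% shorter, and the memory occupied by the compressed result is about 17%~57% smaller. The proposed method therefore offers a higher compression ratio with lower compression time and memory usage, and can serve as a reference for large-scale lossy compression of AoT-type data.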
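
For orientation, the standard CP model for a third-order tensor can be written as below. The abstract does not state the tensor order, the mode layout, or the exact definition of the compression ratio used in the paper, so the symbols here (dimensions I, J, K, rank R, factor vectors a_r, b_r, c_r) and the storage-based ratio are illustrative assumptions only.

```latex
% CP approximation of a tensor X of size I x J x K with R components
\mathcal{X} \;\approx\; \hat{\mathcal{X}}
  \;=\; \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r ,
\qquad
\mathbf{a}_r \in \mathbb{R}^{I},\;
\mathbf{b}_r \in \mathbb{R}^{J},\;
\mathbf{c}_r \in \mathbb{R}^{K}

% Storing only the factor vectors needs R(I+J+K) values instead of IJK,
% which gives one possible (assumed) storage-based compression ratio:
\mathrm{CR} \;=\; \frac{I\,J\,K}{R\,(I+J+K)}
```

Under this reading, a smaller R gives a higher ratio at the cost of a larger reconstruction error, which matches the trade-off between compression parameter and compression error described in the abstract.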

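Key words: sensor, spatio-temporal sequence, AoT, lossy compression, multidimensional tensor, tensor decomposition, CP decomposition, tensor reconstruction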

Abstract:

Array of Things (AoT) provides continuous and dynamic observations of urban systems through multiple sensors at a single location. Compressing and transmitting AoT sequence data with limited computing resources has become one of the key bottlenecks of AoT applications. Because most AoT sequence data are massive, high-dimensional, and need to be processed on the sensor side, this work introduces a tensor decomposition method for the lossy compression of AoT sequence data. The method first organizes the AoT sequence data as a high-dimensional tensor to preserve the coupling relationships among the different dimensions. The CANDECOMP/PARAFAC (CP) decomposition, which has simple parameters, a relatively simple principle, and low algorithmic complexity, is then used to extract the principal feature components along each dimension of the AoT sequence data. Since these principal feature components absorb the multidimensional coupling relationships, they can be combined through tensor reconstruction to approximate the original data accurately. Because the approximation is obtained by removing redundant information, it achieves lossy data compression with feature preservation. The experiment was conducted on acousto-optic and electromagnetic data sensed within 24 hours in downtown Chicago, USA. The influences of different compression parameters on compression ratio, compression error, compression accuracy, compression time, and memory usage are discussed.
The experimental results show that, as the compression parameter increases, the compression error decreases markedly while the memory occupation increases only slightly. This demonstrates that the tensor-based method can achieve lossy compression of AoT sequence data, and that both the memory occupied during compression and the memory occupied by the final results are small enough to support data compression on the sensor side. A comparison with the original light-field intensity shows that the compressed data maintain the spatio-temporal distribution characteristics of the original data, so further data analysis would not be affected. In addition, compared with the traditional vector quantization coding compression method at the same compression accuracy, the compression ratio of this method is about 27%~76% higher, the compression time is about 46%~73% shorter, and the memory occupation of the compression result is about 17%~57% smaller. The tensor-based method therefore has a higher compression ratio, shorter compression time, and smaller memory occupation. It can also be applied to other data with multidimensional features, such as spatial dimensions with different locations, time dimensions at different time nodes, and attribute dimensions of different variables (temperature, humidity, etc.), and could provide a feasible approach for large-scale lossy compression of massive multidimensional geographic sensor sequence data such as AoT sequence data.
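
The decompose-then-reconstruct pipeline described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: it assumes a third-order tensor with a hypothetical (node x time x variable) layout, fits the CP factors with plain alternating least squares (the abstract does not say which fitting algorithm was used), runs on synthetic data, and measures compression as the ratio of stored values. The helper names `khatri_rao`, `unfold`, and `cp_als` are introduced here for illustration.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product; rows are ordered with C's index fastest."""
    R = B.shape[1]
    return (B[:, None, :] * C[None, :, :]).reshape(-1, R)

def unfold(X, mode):
    """Mode-n unfolding, remaining axes flattened in C order."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def cp_als(X, rank, n_iter=200, seed=0):
    """Fit a rank-`rank` CP model to a 3-way tensor by alternating least squares."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((n, rank)) for n in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            # The two factors that are held fixed, in increasing mode order
            f1, f2 = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(f1, f2)
            # (kr.T @ kr) equals the Hadamard product of the two Gram matrices
            gram = (f1.T @ f1) * (f2.T @ f2)
            factors[mode] = unfold(X, mode) @ kr @ np.linalg.pinv(gram)
    return factors

# Synthetic stand-in for AoT data (assumed layout: 8 nodes x 24 hours x 4 variables)
rng = np.random.default_rng(42)
true = [rng.standard_normal((n, 3)) for n in (8, 24, 4)]
X = np.einsum("ir,jr,kr->ijk", *true) + 0.01 * rng.standard_normal((8, 24, 4))

# "Compress": keep only the factor matrices; "decompress": rebuild the tensor
factors = cp_als(X, rank=3)
X_hat = np.einsum("ir,jr,kr->ijk", *factors)

rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
ratio = X.size / sum(f.size for f in factors)  # stored values before / after
print(f"relative error: {rel_err:.4f}, compression ratio: {ratio:.2f}")
```

Here the rank plays the role of the compression parameter: raising it stores more factor columns, which lowers the ratio and usually lowers the reconstruction error. Only the three small factor matrices need to be kept or transmitted, which is why the approach suits memory-limited sensor-side processing.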

Key words: sensor, spatio-temporal sequence, Array of Things (AoT), lossy compression, multidimensional tensor, tensor decomposition, CANDECOMP/PARAFAC decomposition, tensor reconstruction