Journal of Geo-information Science
A Lossy Compression Method for AoT Sequence Data based on Tensor Decomposition
Received date: 2020-07-31
Revised date: 2020-12-29
Online published: 2021-03-25
Supported by
National Key Research and Development Program of China (2016YFB0502301)
National Natural Science Foundation of China (42001320)
National Natural Science Foundation of China (41976186)
Array of Things (AoT) provides continuous, dynamic observations of urban systems through multiple sensors at a single location. Compressing and transmitting AoT sequence data with limited computing resources has become one of the key bottlenecks in AoT applications. Because most AoT sequence data are massive and high-dimensional and need to be processed at the sensor side, this work introduces a tensor decomposition method for the lossy compression of AoT sequence data. The method first organizes the AoT sequence data as a high-dimensional tensor, which preserves the coupling relationships among the different dimensions. CANDECOMP/PARAFAC (CP) decomposition, which has simple parameters, a relatively simple principle, and low algorithmic complexity, is then used to extract the principal feature components along each dimension of the data. Because these components absorb the multidimensional coupling relationships, they can be recombined through tensor reconstruction to approximate the original data accurately. Because the approximation is obtained by removing redundant information, it achieves lossy compression while preserving the features of the data. A simulation experiment is conducted on acoustic, optical, and electromagnetic data sensed over 24 hours in downtown Chicago, United States. The influence of the compression parameter on compression ratio, compression error, compression accuracy, compression time, and memory usage is discussed.
The experimental results show that, as the compression parameter increases, the compression error decreases markedly while memory occupation increases only slightly. This demonstrates that the tensor-based method can achieve lossy compression of AoT sequence data, and that both the memory occupied during execution and the memory occupied by the final results can support data compression at the sensor side. A comparison with the original light intensity field shows that the compressed data maintain the spatio-temporal distribution characteristics of the original data, so further data analysis is not affected. In addition, compared with the traditional vector quantization coding compression method, the compression ratio of this method is about 27% to 76% higher, the compression time is about 46% to 73% shorter, and the memory occupation of the compression result is about 17% to 57% smaller. The tensor-based method therefore offers a higher compression ratio, shorter compression time, and smaller memory occupation at the same compression accuracy. The method can also be applied to other data with multidimensional features, such as spatial dimensions over different locations, time dimensions over different time nodes, and attribute dimensions over different variables (temperature, humidity, etc.), and thus provides a feasible approach to the large-scale lossy compression of massive multidimensional geographic sensor sequence data, of which AoT sequence data are representative.
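The pipeline described in the abstract (organize the data as a tensor, decompose it with CP, store only the factors, reconstruct an approximation) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a three-way tensor of shape (space, attribute, time), uses alternating least squares as the solver (the abstract does not name one), and treats the CP rank as the "compression parameter". The tensor sizes, the synthetic data, and the function names `khatri_rao`, `cp_als`, and `cp_reconstruct` are hypothetical.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product; row index is j * K + k (k varies fastest)."""
    J, R = B.shape
    K, _ = C.shape
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)

def cp_als(X, rank, n_iter=200, seed=0):
    """Rank-`rank` CP decomposition of a 3-way tensor by alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, rank))  # overwritten in the first sweep
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    # Mode unfoldings in C order, matching the row ordering of khatri_rao.
    X0 = X.reshape(I, J * K)                     # columns indexed by (j, k)
    X1 = np.moveaxis(X, 1, 0).reshape(J, I * K)  # columns indexed by (i, k)
    X2 = np.moveaxis(X, 2, 0).reshape(K, I * J)  # columns indexed by (i, j)
    for _ in range(n_iter):
        # Each update is the least-squares solution for one factor, others fixed.
        A = X0 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = X1 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = X2 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

def cp_reconstruct(A, B, C):
    """Sum of rank-one terms: X_hat[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

# Synthetic stand-in for AoT data: 20 nodes x 6 attributes x 288 time steps
# (hypothetical sizes, not taken from the paper).
rng = np.random.default_rng(42)
I, J, K, true_rank = 20, 6, 288, 3
X = np.einsum("ir,jr,kr->ijk",
              rng.random((I, true_rank)),
              rng.random((J, true_rank)),
              rng.random((K, true_rank)))
X += 0.01 * rng.standard_normal(X.shape)  # small measurement noise

rank = 3  # assumed to play the role of the compression parameter
A, B, C = cp_als(X, rank)
X_hat = cp_reconstruct(A, B, C)

stored = A.size + B.size + C.size  # only the factor matrices need to be kept
rel_err = np.linalg.norm(X_hat - X) / np.linalg.norm(X)
print(f"original elements: {X.size}, stored elements: {stored}")
print(f"relative reconstruction error: {rel_err:.4f}")
```

In this sketch the compressed representation holds rank × (I + J + K) numbers instead of I × J × K, which is why a larger rank trades a slightly larger memory footprint for a lower reconstruction error.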
YANG Chen, GAO Hong, ZHANG Liying, HU Xu, YU Zhaoyuan, LI Dongshuang. A Lossy Compression Method for AoT Sequence Data based on Tensor Decomposition[J]. Journal of Geo-information Science, 2021, 23(1): 134-142. DOI: 10.12082/dqxxkx.2021.200425
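Tab. 1 Description of the datasets used in the study
Dataset name | Description | Dataset contents |
---|---|---|
data.csv | Sensed values for all nodes | Data collection timestamp, collecting node ID, sensor type, sensor name, sensed type, raw measurement from the electronic sensor, converted human-readable value (HRF value) |
nodes.csv | Nodes in the dataset and their metadata | Node ID, project ID, node serial number (visible on the physical enclosure), street address where the node is installed, node longitude, node latitude, more detailed description of the node build and configuration, node installation start timestamp, node installation end timestamp |
sensors.csv | Sensors and their metadata | Data collection content, sensor type, sensor name, sensed type, physical unit of the converted value, minimum HRF value from the datasheet (used as the lower bound of the range filter), maximum HRF value from the datasheet (used as the upper bound of the range filter), reference URL of the sensor datasheet |
provenance.csv | Metadata for the entire dataset | Data format version, project ID, data creation timestamp, data end timestamp, timestamp at which this digest was created, URL of this digest |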
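Tab. 2 Evaluation metrics of the compression method
Evaluation aspect | Metric | Formula number | Parameter description |
---|---|---|---|
Compression effect | Compression ratio | (4) | Memory size of the original data; memory size of the compressed result data |
Compression error | Root mean square error | (5) | Element of the reconstructed tensor at a given position; element of the original tensor at the same position; mean value of the original tensor; I, J, and K are the numbers of data points along the spatial, attribute, and time dimensions, respectively |
Compression accuracy | Coefficient of determination | (6) | Shares the parameter description of (5) |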
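The formulas numbered (4) to (6) did not survive in this copy of the table, so the sketch below falls back on the standard textbook definitions of the three metrics, which may differ in detail from the paper's exact expressions. In particular, it assumes the compression ratio is the original size divided by the compressed size, and that the root mean square error and coefficient of determination are computed over all I × J × K elements. The function names are hypothetical.

```python
import numpy as np

def compression_ratio(original_nbytes, compressed_nbytes):
    """Assumed definition: memory size of original data / memory size of result."""
    return original_nbytes / compressed_nbytes

def rmse(X, X_hat):
    """Root mean square error over all I * J * K tensor elements."""
    return float(np.sqrt(np.mean((X_hat - X) ** 2)))

def r_squared(X, X_hat):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    ss_res = np.sum((X - X_hat) ** 2)
    ss_tot = np.sum((X - X.mean()) ** 2)  # spread around the mean of the original tensor
    return float(1.0 - ss_res / ss_tot)

# Tiny illustrative tensor: 2 locations x 3 attributes x 4 time steps.
X = np.arange(24.0).reshape(2, 3, 4)
X_hat = X + 0.5  # pretend reconstruction with a constant offset

print("compression ratio:", compression_ratio(X.nbytes, X.nbytes // 4))
print("RMSE:", rmse(X, X_hat))
print("R^2:", r_squared(X, X_hat))
```

With these definitions, a perfect reconstruction gives an RMSE of 0 and a coefficient of determination of 1, while a reconstruction no better than the tensor mean gives a coefficient of determination of 0.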