A Bidirectional LSTM Spatiotemporal Interpolation Model with Self-attention Mechanism

ZHOU Xiaoyu; WANG Haiqi; WANG Qiong; SHAN Yufei; YAN Feng; LI Fadong; LIU Feng; CAO Yuanhao; OU Yawen; LI Xueying

doi:10.12082/dqxxkx.2024.230574

Journal of Geo-information Science, 2024, Vol. 26, Issue 8: 1827-1842

Theory and Methods of Geo-information Science

ZHOU Xiaoyu¹, WANG Haiqi¹,*, WANG Qiong², SHAN Yufei¹, YAN Feng¹, LI Fadong¹, LIU Feng¹, CAO Yuanhao¹, OU Yawen¹, LI Xueying¹

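1. College of Oceanography and Space Informatics, China University of Petroleum (East China), Qingdao 266580, China
2. Sunshine(Nanjing)pco Technology Co., Ltd., Nanjing 211100, China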

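*WANG Haiqi (born 1972), male, native of Nanyang, Henan; Ph.D., associate professor. His main research interests are geographic information and machine learning, and spatial and spatiotemporal statistical analysis. E-mail: wanghaiqi@upc.edu.cn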

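ZHOU Xiaoyu (born 2000), male, native of Dezhou, Shandong; master's student. His main research interest is spatial and spatiotemporal data analysis. E-mail: zxy17860709661@163.com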

Copy editors: JIANG Shufang, HUANG Guangyu

Received: 2023-09-23

Revised: 2024-05-12

Published online: 2024-07-24

Funding

Natural Science Foundation of Shandong Province, General Program (ZR2021MD068)

Cite this article

ZHOU Xiaoyu, WANG Haiqi, WANG Qiong, SHAN Yufei, YAN Feng, LI Fadong, LIU Feng, CAO Yuanhao, OU Yawen, LI Xueying. A Bidirectional LSTM Spatiotemporal Interpolation Model with Self-attention Mechanism[J]. Journal of Geo-information Science, 2024, 26(8): 1827-1842. DOI: 10.12082/dqxxkx.2024.230574

Abstract

Missing and sparse observations are common in spatiotemporal data, and spatiotemporal interpolation is an important way to address them. By capturing the dependencies in spatiotemporal data, interpolation can estimate how the geometry and attributes of geographical phenomena change over time. Current methods, whether based on statistics, machine learning, or deep learning, mostly fail to consider the long-term temporal correlation of the data and global spatial information at the same time. This study combines the Long Short-Term Memory (LSTM) network with the spatial characteristics of the data to build a spatiotemporal interpolation model: (1) A spatial layer removes weakly correlated information and feeds the more strongly correlated spatial information into the LSTM network. (2) Conventional Artificial Neural Network (ANN) models cannot account for the influence of time on interpolation, and a unidirectional LSTM can use only the influence of past moments on the current moment, not information from future moments; the model therefore uses a Bidirectional LSTM (BiLSTM) to represent temporal correlation. (3) To extract global spatial features effectively while keeping the advantages of bidirectional modeling, a self-attention mechanism is introduced into the BiLSTM, giving the BiLSTM interpolation model fused with a spatial layer and self-attention (SL-BiLSTM-SA). In the experiments, the model is applied to a PM2.5 concentration dataset from Shandong Province and compared with other models. The results show that SL-BiLSTM-SA has lower error metrics: relative to Spatio-Temporal Ordinary Kriging (STOK) and Genetic Algorithm-optimized Spatio-Temporal Kriging (GA-STK), accuracy improves by 39.83% and 36.63%, respectively (these percentages match the relative reduction in RMSE in the model comparison table), and the model predicts high and low values fairly accurately. By fusing spatial information and combining BiLSTM with self-attention, this work extends the set of interpolation methods for spatiotemporal data and provides some theoretical and methodological support for spatiotemporal data analysis.

Key words: spatiotemporal interpolation; spatiotemporal correlation; spatial layer; long-term correlation; bidirectional long short-term memory network; self-attention mechanism; PM2.5

1 Introduction

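Missing and sparsely distributed observations are common in spatiotemporal data, and spatiotemporal interpolation is an important means of addressing them. By capturing spatiotemporal dependencies, it uses observations at known locations and times to infer the values of a variable at unknown locations and times; in other words, spatiotemporal interpolation can estimate how the geometry and attribute data of geographical phenomena change over time [1]. With the development of geographic analysis techniques, current spatiotemporal interpolation methods model temporal and spatial factors mainly through statistics, machine learning, and deep learning, revealing the evolution and the spatiotemporal distribution patterns of the data [2].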

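Statistics-based interpolation models usually have an explicit mathematical description, as in the classic Inverse Distance Weighted (IDW) method and kriging [3]. Spatiotemporal kriging uses a variogram model to characterize the variation structure, or spatiotemporal continuity, of a random variable and thereby describes the spatiotemporal structure of the variable. Zoest et al. [4] used spatiotemporal regression kriging to predict nitrogen dioxide concentrations at unobserved space-time locations in Eindhoven, the Netherlands. Mei [5] used Ordinary Kriging (OK), Spatio-Temporal Ordinary Kriging (STOK), and spatiotemporal trend kriging to model and predict daily mean PM2.5 concentrations in Shandong Province in 2014. Xu et al. [6] proposed a method for reconstructing underground gas concentration fields that extends traditional spatial interpolation models with a spatiotemporal semivariogram, allowing sparse sensors to monitor the overall distribution of gas concentration in complex mine environments. Wang et al. [7] proposed a "trinity" theoretical framework for spatial statistics, composed of population properties, spatial sampling, and inference, which integrates spatial autocorrelation and stratified heterogeneity into spatial sampling and statistical inference. Building on this framework, Xu and Wang et al. [8] proposed a spatiotemporal point interpolation method that accounts for population correlation and heterogeneity and can compensate for sample bias; it clearly outperforms traditional methods such as kriging.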

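Machine learning and deep learning models can capture nonlinear relationships among spatiotemporal variables more accurately and are therefore increasingly used for spatiotemporal interpolation. Li et al. [9] used spatiotemporal kriging to estimate the fitting error of a random forest simulation, combining the two into the Random Forest-Spatiotemporal Kriging (RF-STK) model. Martínez-Comesaña et al. [10] proposed an interpolation method based on an optimized multilayer perceptron neural network for spatiotemporal estimation of indoor temperature, relative humidity, and carbon dioxide concentration in buildings, with the network optimized by the multi-objective genetic algorithm NSGA-II (Non-dominated Sorting Genetic Algorithm II). Wu et al. [11] proposed an inductive graph neural network kriging model that generates random subgraphs as samples and reconstructs the corresponding adjacency matrix for each sample in order to recover data at unsampled nodes of the graph. Li et al. [12] proposed a spatiotemporal kriging method based on multi-head attention, which uses a spatiotemporal mask matrix to model spatiotemporal dependencies and capture spatiotemporal features, and uses multi-head attention to learn multi-level spatial features.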

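Beyond spatial features, the characterization of temporal and spatiotemporal features is also a research focus for deep learning models in spatiotemporal interpolation and prediction. Shi et al. [13] used an LSTM network to predict future rainfall intensity in a local region over a relatively short period. Zhao et al. [14] applied the Long Short-Term Memory-Fully Connected (LSTM-FC) network to predict PM2.5 concentrations within 48 h at air quality monitoring stations. Fan et al. [15] proposed a spatiotemporal prediction framework for air pollutants based on a Deep Recurrent Neural Network (DRNN). However, these methods predict only at monitoring stations and do not interpolate at unsampled locations. Ma et al. [16] proposed a spatiotemporal interpolation model based on the Geographic Long Short-Term Memory (Geo-LSTM) network to generate spatial interpolation results for air pollutant concentrations; the model considers both the temporal trend and the spatial association of air pollutants. Considering the limitation of unidirectional LSTM modeling, Ma et al. [17] combined BiLSTM with IDW in the BiLSTM-IDW method for spatiotemporal interpolation of air pollutants at different temporal granularities: the BiLSTM effectively captures the long-term temporal mechanism of air pollution, while the IDW layer accounts for its spatial correlation and interpolates the spatial distribution.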

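In summary, current spatiotemporal interpolation methods have two limitations: (1) geostatistical methods such as kriging use predefined linear or nonlinear equations to define complex spatiotemporal relationships; (2) interpolation methods based on machine learning or deep learning mostly consider the short-term correlation or local spatial features of spatiotemporal data, and do not consider long-term temporal correlation and global spatial features together. To address these limitations, this paper fuses spatial information and combines BiLSTM with self-attention to build a spatiotemporal interpolation model, extending the means of interpolating spatiotemporal data and providing some theoretical and methodological support for spatiotemporal data analysis.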

2 Methods

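To capture spatial features with both global and local dependence together with long-term temporal features, this study combines the LSTM network with the spatial characteristics of the data to build SL-BiLSTM-SA, a bidirectional LSTM interpolation model based on the self-attention mechanism. In the implementation, the model first uses a spatial layer to remove weakly correlated spatial information and feeds the more strongly correlated spatial information into the LSTM network. Next, a BiLSTM layer captures the temporal correlation between the current moment and the adjacent moments before and after it. Finally, to extract global spatial features effectively while keeping the advantages of bidirectional modeling, a self-attention layer is introduced to capture global spatial dependencies during propagation through the LSTM units. The technical route of the SL-BiLSTM-SA interpolation method is shown in Figure 1.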
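The self-attention step described above can be illustrated with a minimal, dependency-free sketch of scaled dot-product self-attention. It is not the authors' implementation: the function names are hypothetical, the input values are made up, and queries, keys, and values are assumed to be the hidden states themselves rather than learned projections of them.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(states):
    """Scaled dot-product self-attention over a sequence of hidden states.

    `states` is a list of equal-length vectors, for example one BiLSTM output
    per step. Assumption: queries, keys and values are the states themselves
    (no learned projection matrices), which keeps the sketch self-contained.
    """
    dim = len(states[0])
    outputs, weights = [], []
    for query in states:
        # Similarity of this step to every step, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in states]
        attn = softmax(scores)
        weights.append(attn)
        # Each output mixes all states, so every step sees global context.
        outputs.append([sum(a * v[j] for a, v in zip(attn, states))
                        for j in range(dim)])
    return outputs, weights

# Hypothetical hidden states for three steps (illustrative values only).
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, attn_weights = self_attention(hidden)
```

Each row of `attn_weights` sums to 1, and every output vector is a weighted mix of all input states, which is what lets the layer pick up dependencies beyond adjacent steps.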

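Table 1 Meaning of air pollutant concentration indicators

Field	Description
so2_24h	24 h moving average of sulfur dioxide
no2_24h	24 h moving average of nitrogen dioxide
co_24h	24 h moving average of carbon monoxide
o3_8h_24h	Daily maximum 8 h moving average of ozone
o3_24h	Daily maximum 1 h average of ozone
pm10_24h	24 h moving average of particulate matter (particle size ≤ 10 μm)
pm2.5_24h	24 h moving average of particulate matter (particle size ≤ 2.5 μm)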

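Table 2 R² for different combinations of number of stations K and sliding window length r

Number of stations K	R² (r=3)	R² (r=5)	R² (r=7)	R² (r=9)
2	0.8207	0.8025	0.8012	0.7855
6	0.8243	0.8103	0.7957	0.7770
10	0.8353	0.8126	0.8176	0.8032
14	0.8282	0.8234	0.8056	0.8122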
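The table above varies two preprocessing choices: how many neighbouring stations K feed the model and how long the sliding window r is. The sketch below shows one way such samples could be assembled. It rests on assumptions: ranking stations by Pearson correlation is a stand-in for the spatial layer's selection rule, which this section does not specify, and the station names and PM2.5 values are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def top_k_stations(target, neighbours, k):
    """Keep the k neighbour series most correlated with the target series.

    Assumption: 'strong spatial correlation' is ranked by Pearson correlation;
    the paper's spatial layer may use a different criterion.
    """
    ranked = sorted(neighbours,
                    key=lambda name: pearson(target, neighbours[name]),
                    reverse=True)
    return ranked[:k]

def sliding_windows(series, r, step=1):
    """Cut a series into overlapping windows of length r."""
    return [series[i:i + r] for i in range(0, len(series) - r + 1, step)]

# Hypothetical daily PM2.5 values (not taken from the paper's dataset).
target = [35.0, 42.0, 50.0, 47.0, 39.0, 30.0]
neighbours = {
    "A": [33.0, 40.0, 52.0, 45.0, 41.0, 28.0],  # tracks the target closely
    "B": [60.0, 20.0, 55.0, 25.0, 58.0, 22.0],  # weakly related to the target
    "C": [36.0, 43.0, 49.0, 48.0, 38.0, 31.0],  # tracks the target closely
}
selected = top_k_stations(target, neighbours, k=2)
windows = sliding_windows(target, r=3)
```

With r=3 and a step of 1, a six-value series yields four overlapping windows, and station B is dropped because its series is only weakly correlated with the target.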

Table 3 R² for different combinations of number of BiLSTM layers L and neurons per layer

Neurons per layer	R² (L=1)	R² (L=2)	R² (L=3)	R² (L=4)	R² (L=5)
16	0.8698	0.8729	0.8690	0.8756	0.8681
32	0.8765	0.8783	0.8817	0.8820	0.8783
64	0.8734	0.8780	0.8813	0.8863	0.8823
128	0.8770	0.8819	0.8846	0.8904	0.8902
256	0.8781	0.8782	0.8852	0.8867	0.8828
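To read the grid above quickly, the following sketch transcribes its R² values and picks the highest-scoring combination. The helper name `best_configuration` is hypothetical, and the selection rule (plain maximum of R²) is an assumption about how the table would be used, not a statement of the authors' procedure.

```python
# R² values transcribed from the table above.
# Keys: neurons per layer; list position i holds the value for L = i + 1 layers.
r2_grid = {
    16:  [0.8698, 0.8729, 0.8690, 0.8756, 0.8681],
    32:  [0.8765, 0.8783, 0.8817, 0.8820, 0.8783],
    64:  [0.8734, 0.8780, 0.8813, 0.8863, 0.8823],
    128: [0.8770, 0.8819, 0.8846, 0.8904, 0.8902],
    256: [0.8781, 0.8782, 0.8852, 0.8867, 0.8828],
}

def best_configuration(grid):
    """Return (r2, neurons, layers) for the highest R² in the grid."""
    return max(
        (r2, neurons, index + 1)
        for neurons, row in grid.items()
        for index, r2 in enumerate(row)
    )

best = best_configuration(r2_grid)
print(best)  # (0.8904, 128, 4)
```

The top two cells, 4 and 5 layers at 128 neurons, differ by only 0.0002, so the ranking between them should be read with caution.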

Table 4 Accuracy comparison of different models

Interpolation model	R²	RMSE	MAE	MAPE
STOK	0.7412	13.9056	11.1655	19.3242
GA-STK	0.7816	13.2020	10.9060	18.6526
SL-LSTM	0.8904	8.6390	7.8550	10.5663
SL-BiLSTM-SA	0.9028	8.3665	7.7833	10.3424
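The improvements of 39.83% and 36.63% quoted in the abstract can be reproduced from the RMSE column of the table above as relative reductions. The short check below uses only those tabulated values; the function name `relative_reduction` is illustrative.

```python
def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` lowers an error metric relative to `baseline`."""
    return (baseline - proposed) / baseline * 100.0

# RMSE values transcribed from the table above.
rmse = {
    "STOK": 13.9056,
    "GA-STK": 13.2020,
    "SL-LSTM": 8.6390,
    "SL-BiLSTM-SA": 8.3665,
}

gains = {
    name: round(relative_reduction(value, rmse["SL-BiLSTM-SA"]), 2)
    for name, value in rmse.items()
    if name != "SL-BiLSTM-SA"
}
print(gains)  # {'STOK': 39.83, 'GA-STK': 36.63, 'SL-LSTM': 3.15}
```

By the same measure, adding bidirectionality and self-attention on top of SL-LSTM lowers RMSE by about 3.15%.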

Table 5 Accuracy evaluation of the SL-BiLSTM-SA model

Date	R²	RMSE	MAE	MAPE
2020-11-06	0.7819	8.4439	6.1063	13.7766
2020-12-06	0.8853	9.3665	7.3928	10.1942
2021-01-06	0.7813	6.1897	5.5547	14.9287
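For readers who want to compute the four metrics in these tables on their own data, here is a minimal sketch using their common textbook definitions. This section does not state the paper's exact formulas, so the definitions are assumptions; in particular, R² is taken as the coefficient of determination and MAPE is expressed in percent. The observed and predicted values are invented.

```python
import math

def regression_metrics(observed, predicted):
    """Return R², RMSE, MAE and MAPE (in percent) for paired observations."""
    n = len(observed)
    mean_obs = sum(observed) / n
    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        # MAPE is undefined when an observed value is zero.
        "MAPE": 100.0 * sum(abs(e) / abs(o) for e, o in zip(errors, observed)) / n,
    }

# Hypothetical observed and predicted PM2.5 values (illustration only).
observed = [40.0, 50.0, 60.0, 80.0]
predicted = [44.0, 45.0, 63.0, 76.0]
scores = regression_metrics(observed, predicted)
```

Because MAPE divides by the observed value, days with low concentrations can show a high MAPE even when RMSE and MAE are small, which is one way the last row of the table above could combine the lowest RMSE with the highest MAPE.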

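1 Introduction

2 Methods

Figure 1 Technical route of the SL-BiLSTM-SA interpolation method

2.1 Model structure

Figure 2 Structure of the SL-BiLSTM-SA model

2.2 Spatial layer design

Figure 3 Structure of the spatial layer for selecting strongly spatially correlated information

2.3 BiLSTM layer design

Figure 4 Comparison of ANN, RNN and LSTM structures

Figure 5 Spatiotemporal correlation as reflected in the SL-BiLSTM-SA model

Figure 6 Structure of the BiLSTM layer in the SL-BiLSTM-SA model

2.4 Self-attention layer design

Figure 7 Structure of the self-attention mechanism

3 Experiments and analysis

3.1 Study area and data sources

Figure 8 Distribution of national air quality monitoring stations in Shandong Province

Table 1 Meaning of air pollutant concentration indicators

3.2 Data preprocessing

Figure 9 Schematic of the sliding window

Figure 10 Example of time series samples with a window length of 3 and a sliding step of 1

3.3 Experimental design and implementation

Table 2 R² for different combinations of number of stations and sliding window length

Figure 11 Variation of R² under different combinations of number of neighboring stations K and sliding window length r

Table 3 R² for different combinations of number of BiLSTM layers and neurons per layer

Figure 12 Variation of R² under different combinations of number of BiLSTM layers and neurons per layer

3.4 Experimental results and analysis

Figure 13 Model loss versus number of iterations

Table 4 Accuracy comparison of different models

Table 5 Accuracy evaluation of the SL-BiLSTM-SA model

Figure 14 Line charts of predicted versus true values for the SL-LSTM (left) and SL-BiLSTM-SA (right) models

Figure 15 Scatter plots of predicted versus true values for the SL-LSTM (left) and SL-BiLSTM-SA (right) models

Figure 16 Spatial distribution of the differences between predicted and true values of the SL-BiLSTM-SA model on 6 November 2020

Figure 17 Comparison of the SL-BiLSTM-SA interpolation results with the actual distribution

Figure 18 Comparison of results from the SL-BiLSTM-SA model and other interpolation methods

4 Conclusions and discussion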
