Journal of Geo-information Science, 2019, Vol. 21, Issue (6): 799-813. doi: 10.12082/dqxxkx.2019.190014

• Theory and Methods of Geo-information Science •

Spatiotemporal Estimation of High-Accuracy and High-Resolution Meteorological Parameters Based on Machine Learning

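Ying FANG 1,2, Lianfa LI 1,2,*

  1. State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2019-01-19 Revised: 2019-03-04 Online: 2019-06-15 Published: 2019-07-02
  • Corresponding author: Lianfa LI E-mail: fangying@lreis.ac.cn; lilf@lreis.ac.cn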
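  • About the author:

    Ying FANG (b. 1995), female, from Xuancheng, Anhui Province; master's student whose research focuses on spatial data analysis. E-mail: fangying@lreis.ac.cn

  • Supported by:
    National Natural Science Foundation of China (No. 41471376, 41871351); Priority Research Program of the Chinese Academy of Sciences (No. XDA19040501)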

Spatiotemporal Estimation of High-Accuracy and High-Resolution Meteorological Parameters Based on Machine Learning

Ying FANG 1,2, Lianfa LI 1,2,*

  1. State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2019-01-19 Revised: 2019-03-04 Online: 2019-06-15 Published: 2019-07-02
  • Contact: Lianfa LI E-mail: fangying@lreis.ac.cn; lilf@lreis.ac.cn
  • Supported by:
    National Natural Science Foundation of China, No. 41471376, 41871351; Priority Research Program of the Chinese Academy of Sciences, No. XDA19040501

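Abstract:

Meteorological variables often act as important influencing factors in fields such as environmental pollution, disease and health, and agriculture, and high-resolution meteorological data can serve as base data for many studies, so they are of great significance for advancing related research. Taking Mainland China as the study area, this paper used three datasets (air temperature, relative humidity, and wind speed) from 824 meteorological stations in 2015, combined with different sets of explanatory variables, to build a GAM and a residual autoencoder neural network (hereafter, residual network) model for each variable, with 10-fold cross-validation used to check whether the models overfit. The results show that: (1) neither the GAM nor the residual network overfits; compared with the GAM, the residual network significantly improves prediction accuracy (averaged over the three meteorological factors, the cross-validation CV R² increased by 0.21 and CV RMSE decreased by 37%), with the relative humidity model improving the most (CV R²: 0.85 vs. 0.52; CV RMSE: 7.53% vs. 13.59%); (2) the residual network's results are closer to station observations than ordinary Kriging interpolation results and reanalysis data, indicating that the residual network can serve as a reliable method for producing high-resolution meteorological data. In addition, the study found that adding ozone concentration and air temperature to the relative humidity model, and GLDAS wind speed reanalysis data to the wind speed model, improves model performance.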

Keywords: meteorological factors, machine learning, residual autoencoder, Mainland China, GAM, deep learning, high resolution
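
As a quick check on the relative humidity figures quoted in the abstract, the short sketch below derives the absolute gain in CV R² and the relative drop in CV RMSE from the four reported numbers. The derived values are simple arithmetic and are not stated in the source; the variable names are illustrative. The RMSE values carry a percent sign because relative humidity is measured in percent, so the relative reduction is computed from those percentage-point values.

```python
# Reported cross-validation metrics for the relative humidity models
# (GAM vs. residual network), copied from the abstract.
gam = {"cv_r2": 0.52, "cv_rmse": 13.59}
residual_network = {"cv_r2": 0.85, "cv_rmse": 7.53}

# Absolute gain in CV R2 and relative reduction in CV RMSE (derived, not reported).
r2_gain = residual_network["cv_r2"] - gam["cv_r2"]
rmse_drop_pct = 100.0 * (gam["cv_rmse"] - residual_network["cv_rmse"]) / gam["cv_rmse"]

print(f"CV R2 gain: {r2_gain:.2f}")             # CV R2 gain: 0.33
print(f"CV RMSE drop: {rmse_drop_pct:.1f}%")    # CV RMSE drop: 44.6%
```

Both derived values exceed the three-variable averages quoted alongside them (0.21 and 37%), which is consistent with relative humidity being described as the model that improved the most.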

Abstract:

Meteorological stations are sparsely distributed across Mainland China. For generating high-resolution surfaces of meteorological parameters, the estimation accuracy of existing models is limited for air temperature and poor for relative humidity and wind speed, for which few studies have been reported. Using measurements from 824 monitoring stations covering Mainland China in 2015, this study compared a typical Generalized Additive Model (GAM) with an autoencoder-based residual neural network (hereafter, residual network) in predicting three meteorological parameters: air temperature, relative humidity, and wind speed. The performance of the two models was evaluated through 10-fold cross-validation (CV). The air temperature models used the basic variables: latitude, longitude, elevation, and day of the year. In addition to the basic variables, the relative humidity models used air temperature and ozone concentration as covariates, and the wind speed models used coarse-resolution wind speed reanalysis data as covariates. In these spatiotemporal models, the spatial coordinates capture spatial variation and the day-of-year index captures temporal variation. Compared with the GAM, the residual network significantly improved prediction accuracy: averaged over the three meteorological parameters, CV R² increased by 0.21 and CV RMSE decreased by 37%, and the relative humidity model improved the most (CV R²: 0.85 vs. 0.52; CV RMSE: 7.53% vs. 13.59%). Incorporating a monthly index into the relative humidity models greatly improved their accuracy, indicating that time factors at different levels are important for relative humidity. We also discuss the effectiveness and limitations of coarse-resolution reanalysis data and nearest-neighbor values as covariates. This study shows that, compared with traditional GAMs, the residual network can greatly improve the accuracy of national meteorological data at high spatial (1 km) and temporal (daily) resolution. These findings have implications for high-accuracy, high-resolution mapping of meteorological parameters in China.

Key words: meteorological factors, machine learning, residual autoencoder, Mainland China, GAM, deep learning, high resolution
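
The abstract evaluates both models by 10-fold cross-validation and reports CV R² and CV RMSE. The sketch below shows one way such metrics might be computed. It is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, the folds are drawn by random sample (the source does not say whether folds were split by sample or by station), and an ordinary least-squares fit stands in for the GAM and the residual network. The function names `kfold_indices` and `cv_r2_rmse` are hypothetical.

```python
import numpy as np


def kfold_indices(n_samples, n_folds=10, seed=0):
    """Shuffle sample indices and split them into n_folds nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_folds)


def cv_r2_rmse(X, y, n_folds=10, seed=0):
    """Return (CV R2, CV RMSE) from pooled out-of-fold predictions.

    Stand-in model (assumption): ordinary least squares with an intercept.
    """
    pred = np.empty(len(y), dtype=float)
    for test_idx in kfold_indices(len(y), n_folds, seed):
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        # Fit on the training folds only.
        A_train = np.column_stack([np.ones(train_mask.sum()), X[train_mask]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_mask], rcond=None)
        # Predict the held-out fold.
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        pred[test_idx] = A_test @ coef
    resid = y - pred
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    r2 = float(1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2))
    return r2, rmse


# Synthetic data: four hypothetical covariates (placeholders for, e.g.,
# latitude, longitude, elevation, and day of year) and a noisy linear response.
rng = np.random.default_rng(42)
n = 1000
X = rng.normal(size=(n, 4))
y = 2.0 + X @ np.array([1.5, -0.7, 0.3, 0.0]) + rng.normal(scale=0.5, size=n)

cv_r2, cv_rmse = cv_r2_rmse(X, y)
print(f"CV R2 = {cv_r2:.3f}, CV RMSE = {cv_rmse:.3f}")
```

Pooling the out-of-fold predictions before computing the metrics is one common convention; averaging per-fold metrics is another, and the two can differ slightly. Comparing these CV metrics with the corresponding training-set metrics is the usual way to judge overfitting.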