Journal of Geo-information Science, 2020, Vol. 22, Issue (9): 1799-1813. doi: 10.12082/dqxxkx.2020.190441

• Theory and Methods of Geo-information Science •

A Machine Learning Method for Estimating Soil Organic Matter under Sparse Samples

LIU Mingjie1,2, XU Zhuokui1,3, GAO Yunbing2,4,*, YANG Jing2,4, PAN Yuchun2,4, GAO Bingbo5, ZHOU Yanbing2,4, ZHOU Wanpeng2,6, WANG Ling7

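  1. School of Traffic and Transportation Engineering, Changsha University of Science and Technology, Changsha 410114, China
    2. National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
    3. Hunan Provincial Engineering Laboratory of Spatial Information Technology for Highway Geological Disaster Early Warning, Changsha University of Science and Technology, Changsha 410114, China
    4. Beijing Research Center for Information Technology in Agriculture, Beijing 100097, China
    5. China Agricultural University, Beijing 100083, China
    6. Henan Polytechnic University, Jiaozuo 454003, China
    7. Institute of Agricultural Resources and Environment, Hebei Academy of Agriculture and Forestry Sciences, Shijiazhuang 050051, China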
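  • Received: 2019-08-13 Revised: 2019-12-14 Publication date: 2020-09-25 Release date: 2020-11-25
  • Corresponding author: GAO Yunbing E-mail: 2210478688@qq.com; gybgis@163.com
  • About the first author: LIU Mingjie (born 1995), male, from Guiyang, Guizhou; master's student; research interest: geographic information systems. E-mail: 2210478688@qq.com
  • Funding:
    Project of the National Key Research and Development Program of China (2017YFD0801205); Special Project for Scientific and Technological Innovation Capacity Building of the Beijing Academy of Agriculture and Forestry Sciences (KJCX20170407); Special Project for Scientific and Technological Innovation Capacity Building of the Beijing Academy of Agriculture and Forestry Sciences (KJCX20200414); Scientific Research Project Funded by the Education Department of Hunan Province (13B129); Open Fund of the Hunan Provincial Engineering Laboratory (KFJ180602)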

Estimating Soil Organic Matter based on Machine Learning Under Sparse Sample

LIU Mingjie1,2, XU Zhuokui1,3, GAO Yunbing2,4,*, YANG Jing2,4, PAN Yuchun2,4, GAO Bingbo5, ZHOU Yanbing2,4, ZHOU Wanpeng2,6, WANG Ling7

  1. School of Traffic and Transportation Engineering, Changsha University of Science and Technology, Changsha 410114, China
    2. Beijing Research Center for Information Technology in Agriculture, Beijing 100097, China
    3. Engineering Laboratory of Spatial Information Technology of Highway Geological Disaster Early Warning in Hunan Province (Changsha University of Science & Technology), Changsha 410114, China
    4. National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
    5. China Agricultural University, Beijing 100083, China
    6. Henan Polytechnic University, Jiaozuo 454003, China
    7. Institute of Agricultural Resources and Environment, Hebei Academy of Agriculture and Forestry Sciences, Shijiazhuang 050051, China
  • Received: 2019-08-13 Revised: 2019-12-14 Online: 2020-09-25 Published: 2020-11-25
  • Contact: GAO Yunbing E-mail: 2210478688@qq.com; gybgis@163.com
  • Supported by:
    The National Key Research and Development Program of China (2017YFD0801205); The Science and Technology Innovation Capacity Building Project of Beijing Academy of Agriculture and Forestry Sciences (KJCX20170407); The Science and Technology Innovation Capacity Building Project of Beijing Academy of Agriculture and Forestry Sciences (KJCX20200414); Scientific Research Project Funded by the Education Department of Hunan Province (13B129); Project Supported by Open Fund of Hunan Engineering Laboratory (KFJ180602)

Abstract:

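Two machine learning methods, GRNN (Generalized Regression Neural Network) and RF (Random Forest), were used to build soil organic matter (SOM) prediction models in order to improve the accuracy of SOM estimation under sparse sampling. Using SOM sampling data collected in 2007 from agricultural land in Daxing District, Beijing, the samples were thinned according to the MMSD (Minimization of the Mean of the Shortest Distances) criterion into eight sets with different sampling densities (2703, 1352, 676, 339, 169, 85, 43, and 22 samples). GRNN, RF, and Ordinary Kriging were each used to predict unsampled points at every sampling density, and cross-validation was used to assess the prediction accuracy at each density. As sampling density decreased, spatial autocorrelation among sample points gradually weakened, the fitting accuracy of the semivariogram deteriorated, prediction errors increased, and prediction confidence fell. When the samples were thinned to 43 and 22 points, spatial autocorrelation among them nearly vanished, and the semivariogram fit had a low coefficient of determination and a large residual. Ordinary Kriging was clearly affected by the number and density of sample points and by their spatial structure: its prediction accuracy declined as the number of sample points decreased, and at 85 sample points or fewer its predicted values showed no significant correlation with the observed values. The prediction accuracy of GRNN and RF was little affected by sampling density and fluctuated within a narrow range; their predicted values oscillated within a bounded band around the observed values and correlated well with them. At 85 sample points or fewer, their prediction accuracy was substantially higher than that of Ordinary Kriging. Ordinary Kriging is unsuitable for spatial interpolation with sparse samples, especially when spatial autocorrelation is weak. The machine learning models can fully learn the environmental information of the soil and the spatial proximity effects among sample points, accounting for both attribute similarity and spatial autocorrelation. They are more stable and adaptable, are less affected by the number, configuration, and density of sample points, and maintain stable prediction accuracy even when spatial autocorrelation among sample points is very weak.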

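Keywords: soil organic matter, spatial interpolation, machine learning, attribute similarity, spatial autocorrelation, Daxing District, sparse samples, sampling density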
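The MMSD criterion used for thinning scores a sample set by the mean, over a set of evaluation locations, of the distance from each location to its nearest sample; a smaller value means better spatial coverage. The stdlib-only Python sketch below thins a point set under that criterion. It is an illustration under stated assumptions, not the authors' procedure: it uses greedy backward elimination (MMSD designs are often optimized with spatial simulated annealing instead), evaluates MMSD on the full original sample set, and halves each density level from the previous one, loosely mirroring the 2703, 1352, 676, ... sequence. The names `mmsd` and `thin_by_mmsd` and the random test field are hypothetical, and the search is brute force, so it only suits small demonstrations.

```python
import math
import random


def mmsd(eval_pts, sample_pts):
    """Mean, over eval_pts, of the distance to the nearest point in sample_pts."""
    total = 0.0
    for ex, ey in eval_pts:
        total += min(math.hypot(ex - sx, ey - sy) for sx, sy in sample_pts)
    return total / len(eval_pts)


def thin_by_mmsd(points, target_size, eval_pts=None):
    """Greedy backward elimination (an assumed optimizer, not the paper's):
    repeatedly drop the point whose removal yields the smallest MMSD."""
    if not 1 <= target_size <= len(points):
        raise ValueError("target_size must be between 1 and len(points)")
    eval_pts = points if eval_pts is None else eval_pts
    kept = list(points)
    while len(kept) > target_size:
        best_idx, best_score = 0, float("inf")
        for i in range(len(kept)):
            score = mmsd(eval_pts, kept[:i] + kept[i + 1:])
            if score < best_score:
                best_idx, best_score = i, score
        kept.pop(best_idx)
    return kept


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical field: 40 random sample locations in a 100 x 100 square.
    full = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(40)]
    levels = [full]
    while len(levels[-1]) > 5:
        prev = levels[-1]
        # Halve the previous level (rounding up), always scoring MMSD on the full set.
        levels.append(thin_by_mmsd(prev, math.ceil(len(prev) / 2), eval_pts=full))
    for level in levels:
        print(len(level), round(mmsd(full, level), 2))
```

Because each level is a subset of the one before it and MMSD is always scored on the same full set, the printed MMSD can only stay the same or grow as the sample count shrinks.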

Abstract:

To improve the accuracy of soil organic matter estimation with sparse samples, soil organic matter prediction models were constructed with two machine learning methods, GRNN (Generalized Regression Neural Network) and RF (Random Forest). Using soil organic matter sampling data collected in 2007 from agricultural land in Daxing, the samples were thinned according to the MMSD (Minimization of the Mean of the Shortest Distances) criterion into 8 sets with different sampling densities (2703, 1352, 676, 339, 169, 85, 43, and 22 samples). GRNN, RF, and Ordinary Kriging were each applied to predict unsampled points at every sampling density, and cross-validation was used to verify the prediction accuracy at each density. As sampling density decreases, the spatial correlation between sampling points gradually weakens, so the fitting precision of the semivariogram deteriorates, prediction errors increase, and prediction confidence decreases. When the samples are thinned to 43 and 22 points, the spatial correlation between sampling points nearly disappears, and the semivariogram fit has a low coefficient of determination and a large residual. Ordinary Kriging is clearly affected by the number of sampling points, the sampling density, and the spatial structure of the samples: its prediction accuracy decreases as the number of sampling points decreases, and at 85 sampling points or fewer there is no significant correlation between its predicted and observed values. The prediction accuracy of GRNN and RF is little affected by sampling density and fluctuates within a narrow range; their predicted values fluctuate within a bounded band around the observed values and correlate well with them. At 85 sampling points or fewer, their prediction accuracy is greatly improved compared with Ordinary Kriging.
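The weakening spatial correlation described above is usually read from the empirical semivariogram. Below is a minimal stdlib-only Python sketch of the classical Matheron estimator, γ(h) = Σ(z_i − z_j)² / (2N(h)), taken over the N(h) point pairs whose separation falls in lag bin h. The bin width, lag count, grid, and the helper name `empirical_semivariogram` are illustrative assumptions; fitting a model (for example a spherical one) and computing the R² and residual of that fit is not shown. Shuffling the values of a smooth field destroys its spatial structure, and the resulting curve stays near the sample variance from the first lag onward, which is the near-pure-nugget pattern expected when autocorrelation has almost vanished.

```python
import math
import random
from itertools import combinations


def empirical_semivariogram(points, values, lag_width, n_lags):
    """Matheron estimator: gamma(h) = sum((z_i - z_j)**2) / (2 * N(h)).

    Pairs are grouped into distance bins [k*lag_width, (k+1)*lag_width);
    returns (mean pair distance, gamma, pair count) for every non-empty bin.
    """
    sq_sums = [0.0] * n_lags
    dist_sums = [0.0] * n_lags
    counts = [0] * n_lags
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        d = math.dist(p, q)
        k = int(d // lag_width)
        if k < n_lags:  # pairs beyond the last bin are ignored
            sq_sums[k] += (values[i] - values[j]) ** 2
            dist_sums[k] += d
            counts[k] += 1
    return [
        (dist_sums[k] / counts[k], sq_sums[k] / (2 * counts[k]), counts[k])
        for k in range(n_lags)
        if counts[k]
    ]


if __name__ == "__main__":
    # Hypothetical 15 x 15 grid carrying a smooth trend (strong autocorrelation).
    pts = [(x, y) for x in range(15) for y in range(15)]
    smooth = [0.5 * x + 0.3 * y for x, y in pts]
    # Shuffling the same values removes the spatial structure.
    random.seed(0)
    shuffled = random.sample(smooth, len(smooth))
    for name, z in (("smooth", smooth), ("shuffled", shuffled)):
        print(name)
        for h, g, n in empirical_semivariogram(pts, z, lag_width=1.5, n_lags=6):
            print(f"  h={h:5.2f}  gamma={g:7.3f}  pairs={n}")
```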
Ordinary Kriging is not suitable for spatial interpolation with sparse samples, especially when spatial correlation is weak. The machine learning models can fully learn the environmental information and spatial proximity information of soil sampling points. They combine attribute similarity and spatial correlation, are more stable and adaptable, are less affected by the number, configuration, and density of sampling points, and maintain stable prediction accuracy even when the spatial autocorrelation between sampling points is very weak.

Key words: soil organic matter, spatial interpolation, machine learning, attribute similarity, spatial autocorrelation, Daxing District, sparse samples, sampling density
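GRNN, one of the two learners compared in this study, is a Gaussian-kernel (Nadaraya-Watson) regression: the prediction at input x is Σ w_i y_i / Σ w_i with w_i = exp(−D_i² / (2σ²)), where D_i is the distance from x to training sample i in the input space and σ is the smoothing parameter. The stdlib-only Python sketch below pairs that predictor with leave-one-out cross-validation and reports RMSE, MAE, and the Pearson correlation between predicted and observed values. The abstract does not specify the input variables or the cross-validation variant, so the synthetic data (coordinates plus one hypothetical covariate), the σ grid, and all function names are assumptions; Random Forest and Ordinary Kriging are not covered.

```python
import math
import random


def standardize(rows):
    """Z-score every column so coordinates and covariates share one scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column gets sd 1.0 to avoid division by zero.
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def grnn_predict(train_x, train_y, x, sigma):
    """GRNN output: Gaussian-kernel weighted mean of the training targets."""
    d2 = [sum((a - b) ** 2 for a, b in zip(xi, x)) for xi in train_x]
    d2_min = min(d2)
    # Shifting by the smallest squared distance keeps the nearest weight at 1,
    # so the weights cannot all underflow to zero when sigma is small.
    w = [math.exp(-(d - d2_min) / (2.0 * sigma ** 2)) for d in d2]
    return sum(wi * yi for wi, yi in zip(w, train_y)) / sum(w)


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def loocv_grnn(x, y, sigma):
    """Leave-one-out CV: predict each sample from all the others."""
    preds = [
        grnn_predict(x[:i] + x[i + 1:], y[:i] + y[i + 1:], x[i], sigma)
        for i in range(len(x))
    ]
    n = len(y)
    rmse = math.sqrt(sum((p - o) ** 2 for p, o in zip(preds, y)) / n)
    mae = sum(abs(p - o) for p, o in zip(preds, y)) / n
    return preds, rmse, mae


if __name__ == "__main__":
    random.seed(42)
    # Synthetic stand-in data: columns are x, y and one hypothetical covariate.
    raw, som = [], []
    for _ in range(40):
        px, py = random.uniform(0, 10), random.uniform(0, 10)
        covariate = random.gauss(0, 1)
        raw.append([px, py, covariate])
        som.append(15 + 0.4 * px + 2.0 * covariate + random.gauss(0, 0.5))
    # For simplicity the scaling uses all samples, including each held-out one.
    feats = standardize(raw)
    # Pick the smoothing parameter sigma by the lowest LOOCV RMSE.
    results = {s: loocv_grnn(feats, som, s) for s in (0.1, 0.2, 0.3, 0.5, 0.8, 1.2)}
    best = min(results, key=lambda s: results[s][1])
    preds, rmse, mae = results[best]
    print(f"sigma={best} RMSE={rmse:.3f} MAE={mae:.3f} r={pearson_r(preds, som):.3f}")
```

Because each prediction is a weighted mean of observed values, GRNN output always stays within the range of the training targets; σ controls how local that average is.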