Journal of Geo-information Science, 2021, Vol. 23, Issue (6): 1028-1039. doi: 10.12082/dqxxkx.2021.200522

• Comprehensive Applications of Geospatial Analysis •

Application of Machine Learning Methods in Predicting Potential Spring Exposure Locations

LI Huixiang1,2, PAN Yun1,2,*, GONG Huili1,2, SUN Ying3

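  1. College of Resources Environment and Tourism, Capital Normal University, Beijing 100048
  2. Beijing Laboratory of Water Resources Security, Beijing 100048
  3. Beijing Institute of Hydrogeology and Engineering Geology (Beijing Geological Environment Monitoring Station), Beijing 100195
  • Received: 2020-09-11 Revised: 2020-12-12 Online: 2021-06-25 Published: 2021-08-25
  • Corresponding author: PAN Yun
  • About the author: LI Huixiang (born 1995), female, from Ordos, Inner Mongolia, master's student, mainly engaged in research on remote-sensing hydrology. E-mail: 838593386@qq.com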

Application of Machine Learning Method in Prediction of Potential Exposure Position of Spring Water

LI Huixiang1,2, PAN Yun1,2,*, GONG Huili1,2, SUN Ying3

  1. Beijing Laboratory of Water Resources Security, Capital Normal University, Beijing 100048, China
  2. College of Resources Environment and Tourism, Capital Normal University, Beijing 100048, China
  3. Beijing Institute of Hydrogeology and Engineering Geology (Beijing Geological Environment Monitoring Station), Beijing 100195, China
  • Received: 2020-09-11 Revised: 2020-12-12 Online: 2021-06-25 Published: 2021-08-25
  • Contact: PAN Yun

Abstract:

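Spring exposure is influenced by many factors. Beyond traditional geological survey techniques, a variety of modeling methods and factor-based prediction approaches are increasingly being applied in spring research. This paper attempts to use machine learning methods to predict the locations where springs are exposed. Based on field surveys in Beijing, 1378 test sample points were identified, and elevation, slope, aspect, topographic wetness index, stream power index, distance to rivers, distance to faults, lithology, normalized difference vegetation index, and land use type were selected as influencing factors. The predictive performance of two machine learning methods (the random forest model and the classification and regression tree model) and a geostatistical method (the weight of evidence model) was compared. The results show that the random forest model performed best (area under the curve, AUC = 0.86), while the classification and regression tree model and the weight of evidence model performed comparably (AUC of 0.81 and 0.80, respectively). The random forest model also revealed that lithology, distance to faults, and distance to rivers are the three factors with the greatest influence on potential spring exposure. This study indicates that machine learning methods retain good predictive ability for spring exposure even under intense human activity, and they may provide a new technical approach for spring protection and restoration.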

Keywords: potential spring exposure, weight of evidence, random forest model, classification and regression tree model, Beijing
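
The weight of evidence model named in the abstract scores a class of an influencing factor (for example, one lithology unit) by how much more often springs fall inside that class than outside it. The following is a minimal sketch of that calculation for a single binary class, assuming the commonly used positive-weight, negative-weight, and contrast definitions. The function name and all cell counts are hypothetical and are not taken from the paper.

```python
import math


def weights_of_evidence(n_cells, n_springs, n_class_cells, n_class_springs):
    """Return (W+, W-, contrast) for one binary evidence class.

    n_cells         -- total number of unit cells in the study area
    n_springs       -- number of cells that contain a spring
    n_class_cells   -- number of cells that belong to the evidence class
    n_class_springs -- number of spring cells that belong to the class
    All counts are hypothetical inputs supplied by the caller.
    """
    n_non_springs = n_cells - n_springs
    n_class_non_springs = n_class_cells - n_class_springs

    # Conditional probabilities of being inside the class,
    # given a spring cell and given a non-spring cell.
    p_class_given_spring = n_class_springs / n_springs
    p_class_given_non_spring = n_class_non_springs / n_non_springs

    probabilities = (p_class_given_spring, p_class_given_non_spring)
    if any(p <= 0.0 or p >= 1.0 for p in probabilities):
        raise ValueError("weights are undefined when a probability is 0 or 1")

    # W+ measures the association when the class is present,
    # W- measures it when the class is absent.
    w_plus = math.log(p_class_given_spring / p_class_given_non_spring)
    w_minus = math.log(
        (1.0 - p_class_given_spring) / (1.0 - p_class_given_non_spring)
    )
    return w_plus, w_minus, w_plus - w_minus


# Hypothetical example: 60 of 100 springs fall in a class covering 2000 of 10000 cells.
w_plus, w_minus, contrast = weights_of_evidence(10000, 100, 2000, 60)
```

A positive contrast suggests that the class is favourable for spring occurrence, and a contrast near zero suggests no association.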

Abstract:

The exposure of springs is usually difficult to monitor in mountainous terrain. In this study, we investigated the performance of a statistical model (Weight of Evidence) and two machine learning models (Random Forest and Classification and Regression Tree) in predicting the potential exposure positions of spring water in Beijing. A total of 1378 springs from a field survey were used for model training and validation. The environmental factors included elevation, slope, aspect, topographic wetness index, stream power index, distance to rivers, distance to faults, lithology, normalized difference vegetation index, and land use. The predictions of the three models were validated using the receiver operating characteristic (ROC) curve. The area under the curve (AUC) was 0.80 for the Weight of Evidence model, 0.81 for the Classification and Regression Tree model, and 0.86 for the Random Forest model. The Random Forest model therefore had the best prediction performance. Moreover, the Random Forest model revealed that lithology, distance to faults, and distance to rivers had the greatest impact on spring exposure. This study shows that machine learning methods have good prediction ability and are expected to be applied in future research on spring protection and restoration.

Key words: groundwater spring potential map, weight of evidence, random forest, classification and regression tree, Beijing
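
The AUC values reported in the abstract can be read as the probability that a randomly chosen spring location receives a higher predicted score than a randomly chosen non-spring location. The following is a minimal sketch of that pairwise definition in plain Python, with ties counted as half a win. The function name, labels, and scores are hypothetical illustrations and are not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve from the pairwise-ranking definition.

    labels -- iterable of 1 (spring) or 0 (non-spring)
    scores -- iterable of predicted scores, same length as labels
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute AUC")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # spring ranked above non-spring
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for four locations (two springs, two non-springs).
example_auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.2, 0.1])
```

This quadratic-time form is chosen for readability; for sample sizes on the order of the 1378 points mentioned above it is still fast, and a rank-based formulation gives the same value more efficiently.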