Journal of Geo-information Science, 2019, Vol. 21, Issue (11): 1679-1688. doi: 10.12082/dqxxkx.2019.190185

• Theory and Methods of Geo-information Science •

Risk Assessment of Mountain Torrents based on Three Machine Learning Algorithms

ZHOU Chao, FANG Xiuqin*, WU Xiaojun, WANG Yuchen

  1. School of Earth Sciences and Engineering, Hohai University, Nanjing 211100, China
  • Received: 2019-04-23 Revised: 2019-06-17 Online: 2019-10-25 Published: 2019-12-11
  • Contact: FANG Xiuqin E-mail: kinkinfang@hhu.edu.cn
  • About the author: ZHOU Chao (1997-), male, from Suzhou, Anhui Province, master's student, mainly engaged in research on geographic information science. E-mail: 781124062@qq.com
  • Supported by:
    National Key Research and Development Program of China (No.2016YFA0601500)

Abstract:

In China, floods are among the most frequent natural disasters, threatening human safety and causing severe economic losses. We chose Jiangxi Province, which frequently suffers from mountain torrents, as the study area. Following the conceptual model of flood risk, we selected 12 flood risk assessment indexes covering three aspects: triggering factors, the hazard-inducing environment, and hazard-bearing bodies. Three flood risk assessment models were built with different machine learning algorithms: k-Nearest Neighbor (kNN), Random Forest (RF), and AdaBoost. To evaluate model performance, we applied three quantitative indexes: accuracy, the Kappa coefficient, and the ROC curve (AUC value). Index importance was analyzed jointly with the Random Forest algorithm and the Boruta feature selection algorithm. Finally, the mountain flood risk zoning maps of Jiangxi Province produced by the three models were compared, and the distribution pattern of mountain flood disasters was analyzed. The results show that: (1) the average accuracy, Kappa coefficient, and AUC of the AdaBoost model were 0.902, 0.870, and 0.826, respectively; its accuracy and Kappa coefficient were slightly higher than those of RF and its AUC was comparable to that of RF, while all three performance indexes of the kNN model were lower than those of the other two models; (2) five indexes play very important roles in the formation of flood risk: potential farmland productivity, mean annual maximum 6 h rainstorm, mean annual maximum 1 h rainstorm, the normalized difference vegetation index (NDVI), and mean annual rainfall; (3) the higher-risk and highest-risk zones together account for about 34.4% of the total area of Jiangxi Province and are mainly distributed in mountainous areas with high rainfall, heavy rainstorms, and high potential farmland productivity.

Key words: Random Forest Machine Learning Algorithm, AdaBoost Machine Learning Algorithm, ROC curve, Boruta algorithm, flood risk assessment, Jiangxi Province
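
The model comparison summarized in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn and uses a synthetic binary data set with 12 features as a stand-in for the 12 risk indexes. The data, the hyperparameters, and the names `models`, `scores`, and `ranking` are hypothetical and are not taken from the paper. Only the impurity-based importance of the Random Forest is shown; the Boruta step is not reproduced.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the 12 risk indexes (assumption: NOT the Jiangxi data).
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=5, n_redundant=2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Hyperparameters are illustrative guesses, not the settings used by the authors.
models = {
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)              # hard class labels
    prob = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    scores[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "kappa": cohen_kappa_score(y_test, pred),
        "auc": roc_auc_score(y_test, prob),
    }
    print(name, {k: round(v, 3) for k, v in scores[name].items()})

# Impurity-based importance from the fitted Random Forest,
# sorted from most to least important feature index.
importances = models["RF"].feature_importances_
ranking = np.argsort(importances)[::-1]
print("Feature indices by RF importance:", ranking.tolist())
```

The printed numbers depend on the synthetic data and will not match the values reported in the abstract; the sketch only shows how the three performance indexes and an importance ranking can be computed side by side.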