Journal of Geo-information Science, 2022, Vol. 24, Issue (4): 780-791. doi: 10.12082/dqxxkx.2022.210446

• Remote Sensing Science and Application Technology •

Remote Sensing Retrieval of River Suspended Matter Concentration Based on the Hyperparameter-Optimized CatBoost Algorithm

CHEN Diandian1, CHEN Yunzhi1,*, FENG Xianfeng2,3, WU Shuang2,3

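    1. Fuzhou University, National & Local Joint Engineering Research Center of Satellite Geospatial Information Technology, Key Laboratory of Spatial Data Mining and Information Sharing of Ministry of Education, The Academy of Digital China (Fujian), Fuzhou 350108, China
    2. State Key Laboratory of Resources and Environment Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
    3. University of Chinese Academy of Sciences, Beijing 100049, China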
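  • Received: 2021-08-03 Revised: 2021-09-21 Online: 2022-04-25 Published: 2022-06-25
  • Corresponding author: *CHEN Yunzhi (born 1982), female, from Lianjiang, Fujian; PhD, associate research fellow; research interests: monitoring of resources and the ecological environment. E-mail: chenyunzhi@fzu.edu.cn
  • About the author: CHEN Diandian (born 1997), female, from Linyi, Shandong; master's student; research interests: remote sensing of natural resources and the water environment. E-mail: 965519776@qq.com
  • Supported by:
    Strategic Priority Research Program of the Chinese Academy of Sciences (XDA23100503)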

Retrieving Suspended Matter Concentration in Rivers Based on a Hyperparameter-Optimized CatBoost Algorithm

CHEN Diandian1, CHEN Yunzhi1,*, FENG Xianfeng2,3, WU Shuang2,3

    1. Fuzhou University, National & Local Joint Engineering Research Center of Satellite Geospatial Information Technology, Key Laboratory of Spatial Data Mining and Information Sharing of Ministry of Education, The Academy of Digital China (Fujian), Fuzhou 350108, China
    2. State Key Laboratory of Resources and Environment Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
    3. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2021-08-03 Revised: 2021-09-21 Online: 2022-04-25 Published: 2022-06-25
  • Supported by:
    Subproject of the Strategic Priority Science and Technology Project of the Chinese Academy of Sciences (Class A) (XDA23100503)

Abstract:

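Total suspended matter (TSM) concentration is one of the key parameters for assessing the aquatic ecological environment, and timely information on the dynamics of river TSM concentration is essential for inland water quality monitoring and water environment management. Based on field-measured spectra and TSM concentration data, this study selected band-combination reflectances highly correlated with TSM concentration as independent variables and built remote sensing retrieval models of TSM concentration using the CatBoost, random forest, and multiple linear regression algorithms. Grid search with cross-validation was used to tune the hyperparameters of the two machine learning models, CatBoost and random forest, and to determine their optimal parameter configurations; the retrieval accuracies of the models were then compared to identify the best model. With the best model, multi-temporal Sentinel-2 MSI images from 2019 to 2020 were used to retrieve TSM concentration in the lower reaches of the Minjiang River and to analyse its spatiotemporal variation. The results show that: ① b4/b3, (b6-b3)/(b6+b3), (b4+b8)/b3, and (1/b3-1/b4)×b5 are the best band-combination reflectances for retrieving TSM concentration in the lower Minjiang River from MSI; ② compared with the other two models, the TSM retrieval model built with the hyperparameter-optimized CatBoost algorithm has the highest accuracy, with a coefficient of determination R² of 0.95, a root mean square error (RMSE) of 15.32 mg/L, and a mean absolute percentage error (MAPE) of 19.68%; ③ from 2019 to 2020, TSM concentration in the lower Minjiang River was "low in the west and high in the east", rising from Baisha to the Langqi estuary; ④ TSM concentration was highest in summer, followed by winter and autumn, and lowest in spring. This study provides an effective technical approach and a theoretical reference for monitoring TSM concentration in the lower Minjiang River and analysing its spatiotemporal variation.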
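The grid search with cross-validation named in the abstract can be sketched as follows. This is only an illustration of the tuning procedure, not the authors' code: the sample values are made up, and a small nearest-neighbour regressor (hypothetical `knn_predict`, with hypothetical hyperparameters `n_neighbors` and `weight_power`) stands in for CatBoost and random forest so that the sketch runs with the Python standard library alone.

```python
from itertools import product
from statistics import mean

# Hypothetical samples: one band-ratio feature and TSM in mg/L (made-up values).
xs = [0.62, 0.70, 0.75, 0.81, 0.88, 0.93, 1.01, 1.08, 1.15, 1.22, 1.30, 1.37]
ys = [12.0, 15.5, 19.0, 24.0, 31.0, 36.5, 47.0, 58.0, 70.5, 86.0, 104.0, 121.0]


def knn_predict(train_x, train_y, x, n_neighbors, weight_power):
    """Stand-in model: inverse-distance-weighted mean of the nearest samples."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    nearest = order[:n_neighbors]
    # weight_power = 0 gives uniform weights; larger values favour closer samples.
    weights = [1.0 / (abs(train_x[i] - x) + 1e-9) ** weight_power for i in nearest]
    return sum(w * train_y[i] for w, i in zip(weights, nearest)) / sum(weights)


def cv_rmse(xs, ys, params, k=4):
    """Mean RMSE over k interleaved folds for one hyperparameter setting."""
    fold_scores = []
    for f in range(k):
        test_idx = list(range(f, len(xs), k))
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        squared = [
            (knn_predict(train_x, train_y, xs[i], **params) - ys[i]) ** 2
            for i in test_idx
        ]
        fold_scores.append(mean(squared) ** 0.5)
    return mean(fold_scores)


def grid_search(xs, ys, grid, k=4):
    """Evaluate every combination in the grid and keep the lowest CV RMSE."""
    names = sorted(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cv_rmse(xs, ys, params, k)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


param_grid = {"n_neighbors": [1, 2, 3, 5], "weight_power": [0, 1, 2]}
best_params, best_score = grid_search(xs, ys, param_grid)
print(best_params, round(best_score, 2))
```

In practice the stand-in model would be replaced by the actual regressors and their own hyperparameter grids; the loop over parameter combinations and the fold-averaged score are the parts this sketch is meant to show.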

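Keywords: Sentinel-2 MSI, suspended matter, CatBoost, random forest, multiple linear regression, water color remote sensing, Minjiang River, spatiotemporal variation analysis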

Abstract:

Total Suspended Matter (TSM) concentration is one of the significant parameters of aquatic ecological environment assessment. Timely information on the dynamic change of river suspended matter concentration is necessary for inland water quality monitoring and water environment management. Based on field-measured spectra and suspended matter concentration data, this paper selects the band combination reflectances that are highly correlated with suspended matter concentration as the independent variables. Remote sensing inversion models of suspended matter concentration are constructed based on the CatBoost, random forest, and multiple linear regression algorithms. To determine the optimal parameter configuration of the models, the grid search method with cross-validation is used for hyperparameter tuning of the two machine learning models, i.e., CatBoost and random forest. The inversion accuracy of the different models is then compared to determine the optimal model. Based on the optimal model, multi-temporal Sentinel-2 MSI remote sensing images from 2019 to 2020 are used to invert suspended matter concentrations in the lower reaches of the Minjiang River and analyse their spatial and temporal variation characteristics.
The results indicate that: ① b4/b3, (b6-b3)/(b6+b3), (b4+b8)/b3, and (1/b3-1/b4)×b5 are the best band combination reflectances for MSI inversion of TSM concentrations in the lower Minjiang River; ② Compared with the other two models, the suspended matter concentration inversion model based on the hyperparameter-optimized CatBoost algorithm has the highest accuracy, with a coefficient of determination R² of 0.95, and a Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE) of 15.32 mg/L and 19.68%, respectively; ③ The distribution of suspended matter concentrations in the lower reaches of the Minjiang River from 2019 to 2020 is "low in the west and high in the east", with a rising trend from Baisha to the Langqi estuary; ④ The suspended matter concentration is highest in summer, followed by winter and autumn, and lowest in spring. This study provides an effective technical means and theoretical reference for the monitoring and spatio-temporal variation analysis of suspended matter concentration in the lower reaches of the Minjiang River.
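As a minimal sketch of the quantities reported above, the following computes the four band combination reflectances and the three accuracy metrics. The function names and all sample values are hypothetical, and R² is assumed here to be 1 - SS_res/SS_tot; the abstract does not state which definition the authors used.

```python
from math import sqrt


def band_features(b3, b4, b5, b6, b8):
    """Four band combinations from the abstract, given MSI band reflectances."""
    return {
        "b4/b3": b4 / b3,
        "(b6-b3)/(b6+b3)": (b6 - b3) / (b6 + b3),
        "(b4+b8)/b3": (b4 + b8) / b3,
        "(1/b3-1/b4)*b5": (1 / b3 - 1 / b4) * b5,
    }


def r2(observed, predicted):
    """Coefficient of determination, assumed as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error, in the units of the observations (mg/L here)."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mape(observed, predicted):
    """Mean absolute percentage error, in percent."""
    n = len(observed)
    return 100 * sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / n


# Made-up reflectances for one pixel and made-up TSM values in mg/L.
features = band_features(b3=0.10, b4=0.08, b5=0.06, b6=0.04, b8=0.02)
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 36.0]
print(features)
print(r2(observed, predicted), rmse(observed, predicted), mape(observed, predicted))
```

Each feature is a ratio or normalized difference of reflectances, so it is dimensionless; RMSE keeps the mg/L unit of the measured concentrations, while MAPE is relative and becomes unstable when observed concentrations are close to zero.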

Key words: Sentinel-2 MSI, total suspended matter, CatBoost, Random Forest, multiple linear regression, water color remote sensing, Minjiang, temporal and spatial distribution characteristics