Retrieving Suspended Matter Concentration in Rivers based on Hyperparameter Optimized CatBoost Algorithm
CHEN Diandian (1997- ), female, from Linyi, Shandong Province, master's student; research interests: remote sensing of natural resources and the water environment. E-mail: 965519776@qq.com
Received date: 2021-08-03
Revised date: 2021-09-21
Online published: 2022-06-25
Supported by
Subproject of the Strategic Priority Science and Technology Project of the Chinese Academy of Sciences (Class A) (XDA23100503)
CHEN Diandian, CHEN Yunzhi, FENG Xianfeng, WU Shuang. Retrieving Suspended Matter Concentration in Rivers based on Hyperparameter Optimized CatBoost Algorithm[J]. Journal of Geo-information Science, 2022, 24(4): 780-791. DOI: 10.12082/dqxxkx.2022.210446
Total suspended matter (TSM) concentration is an important parameter in assessing aquatic ecological environments, and timely information on its dynamics in rivers is needed for inland water quality monitoring and water environment management. Using field-measured spectra and TSM concentration data, this study selects band combination reflectances that are highly correlated with TSM concentration as independent variables and builds remote sensing inversion models with the CatBoost, random forest, and multiple linear regression algorithms. Grid search with cross-validation is used to tune the hyperparameters of the two machine learning models (CatBoost and random forest) and determine their optimal parameter configurations, and the inversion accuracies of the three models are compared to identify the best one. The best model is then applied to multi-temporal Sentinel-2 MSI images from 2019 to 2020 to retrieve TSM concentrations in the lower reaches of the Minjiang River and to analyse their spatial and temporal variation.
The results indicate that: ① b4/b3, (b6-b3)/(b6+b3), (b4+b8)/b3, and (1/b3-1/b4)×b5 are the best band combination reflectances for retrieving TSM concentration in the lower Minjiang River from MSI data; ② compared with the other two models, the TSM inversion model built on the hyperparameter-optimized CatBoost algorithm has the highest accuracy, with a coefficient of determination (R²) of 0.95, a Root Mean Square Error (RMSE) of 15.32 mg/L, and a Mean Absolute Percentage Error (MAPE) of 19.68%; ③ from 2019 to 2020, TSM concentration in the lower Minjiang River was "low in the west and high in the east", rising from Baisha to the Langqi estuary; ④ TSM concentration was highest in summer, followed by winter and autumn, and lowest in spring. This study provides an effective technical means and a theoretical reference for monitoring TSM concentration in the lower reaches of the Minjiang River and analysing its spatio-temporal variation.
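The three accuracy measures quoted in the abstract can be sketched as follows. This is a minimal illustration using the common textbook definitions; the exact formulas used in the paper are not shown in this section, and the `measured` and `predicted` values below are made-up numbers, not data from the study.

```python
import math


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error, in the units of y (mg/L for TSM)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be non-zero)."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


# Hypothetical TSM values in mg/L, for illustration only.
measured = [10.0, 20.0, 40.0, 80.0]
predicted = [12.0, 18.0, 44.0, 76.0]

print(f"R2   = {r2(measured, predicted):.3f}")
print(f"RMSE = {rmse(measured, predicted):.2f} mg/L")
print(f"MAPE = {mape(measured, predicted):.2f} %")
```

RMSE is dominated by the high-concentration samples, while MAPE weights every sample by its own magnitude, which is presumably why both are reported side by side.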
Table 1 Descriptive statistics of the measured TSM concentration
Sampling date | Number of samples | Maximum/(mg/L) | Minimum/(mg/L) | Mean/(mg/L) | Standard deviation/(mg/L) | Coefficient of variation/% |
---|---|---|---|---|---|---|
2014-10 | 10 | 383 | 5 | 116.30 | 128.85 | 110.79 |
2017-07 | 40 | 37 | 12 | 22.75 | 6.34 | 27.85 |
2019-12 | 40 | 211 | 10 | 84.18 | 55.76 | 66.25 |
2020-11 | 45 | 265 | 4 | 55.02 | 60.03 | 109.11 |
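The coefficient of variation in Table 1 is the standard deviation expressed as a percentage of the mean. The short check below recomputes it from the published mean and standard deviation; the recomputed values agree with the table to within a few hundredths of a percentage point, a difference that is presumably due to rounding of the published figures.

```python
# (mean, standard deviation, reported CV %) per sampling campaign, from Table 1.
table1 = {
    "2014-10": (116.30, 128.85, 110.79),
    "2017-07": (22.75, 6.34, 27.85),
    "2019-12": (84.18, 55.76, 66.25),
    "2020-11": (55.02, 60.03, 109.11),
}


def coefficient_of_variation(mean, std):
    """CV in percent: 100 * std / mean."""
    return 100.0 * std / mean


for date, (mean, std, reported) in table1.items():
    cv = coefficient_of_variation(mean, std)
    print(f"{date}: recomputed {cv:.2f} %, reported {reported:.2f} %")
```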
Table 2 Sentinel-2 MSI images
Season | No. | Date | Image type |
---|---|---|---|
Spring | 1 | 2019-03-20 | Sentinel-2A |
Spring | 2 | 2020-04-08 | Sentinel-2B |
Spring | 3 | 2020-04-13 | Sentinel-2A |
Spring | 4 | 2020-04-18 | Sentinel-2B |
Summer | 5 | 2020-06-12 | Sentinel-2A |
Summer | 6 | 2020-07-22 | Sentinel-2A |
Summer | 7 | 2020-08-06 | Sentinel-2B |
Summer | 8 | 2020-08-26 | Sentinel-2B |
Autumn | 9 | 2019-11-05 | Sentinel-2A |
Autumn | 10 | 2019-11-10 | Sentinel-2B |
Autumn | 11 | 2019-11-15 | Sentinel-2A |
Autumn | 12 | 2020-10-10 | Sentinel-2A |
Winter | 13 | 2019-01-24 | Sentinel-2B |
Winter | 14 | 2019-01-29 | Sentinel-2A |
Winter | 15 | 2020-02-18 | Sentinel-2B |
Winter | 16 | 2020-02-23 | Sentinel-2A |
Table 3 Correlation coefficients of MSI band combination reflectance with lg(TSM) and TSM
Band combination reflectance (vs. lg(TSM)) | Correlation coefficient | Band combination reflectance (vs. TSM) | Correlation coefficient |
---|---|---|---|
(1/b3-1/b4)×b5 | 0.86 | (1/b3-1/b8)×b8a | 0.80 |
(b4+b8)/b3 | 0.83 | (b5+b6)/b3 | 0.82 |
b4/b3 | 0.82 | b5/b3 | 0.75 |
(b6-b3)/(b6+b3) | 0.80 | (b7-b4)/(b7+b4) | 0.79 |
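The screening behind Table 3 can be sketched as computing each band combination from per-band reflectances and correlating it with lg(TSM). The snippet below is a minimal illustration, assuming the Pearson correlation coefficient (the section does not name the coefficient used). The function names and the reflectance values in `samples` are hypothetical and are not data from the study.

```python
import math


def band_features(b3, b4, b5, b6, b8):
    """The four band combinations listed in the lg(TSM) column of Table 3."""
    return {
        "(1/b3-1/b4)*b5": (1.0 / b3 - 1.0 / b4) * b5,
        "(b4+b8)/b3": (b4 + b8) / b3,
        "b4/b3": b4 / b3,
        "(b6-b3)/(b6+b3)": (b6 - b3) / (b6 + b3),
    }


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical samples: (b3, b4, b5, b6, b8, TSM in mg/L). Illustration only.
samples = [
    (0.050, 0.040, 0.035, 0.020, 0.015, 12.0),
    (0.060, 0.060, 0.050, 0.030, 0.025, 37.0),
    (0.070, 0.085, 0.075, 0.050, 0.040, 84.0),
    (0.080, 0.110, 0.100, 0.075, 0.060, 211.0),
]

lg_tsm = [math.log10(s[5]) for s in samples]
features = [band_features(*s[:5]) for s in samples]

for name in features[0]:
    values = [f[name] for f in features]
    print(f"{name}: r = {pearson_r(values, lg_tsm):.2f}")
```

Correlating against lg(TSM) rather than TSM compresses the wide concentration range seen in Table 1 (4 to 383 mg/L), so a few very turbid samples have less leverage on the coefficient.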
Fig. 5 Relationship between the regression error (RMSE) and n_estimators in the CatBoost model
Table 4 The necessary parameters and optimized values of the CatBoost model
Parameter | Default value | Value range | Optimal value |
---|---|---|---|
max_depth | 6 | [1,6] | 3 |
learning_rate | 0.03 | [0.01,0.05] | 0.01 |
n_estimators | 1000 | [50,1200] | 700 |
l2_leaf_reg | 3 | [1,3] | 1 |
loss_function | RMSE | RMSE, Logloss, MAE, MAPE, Poisson | RMSE |
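Grid search with cross-validation, the tuning procedure that produced the optimal values in Table 4, can be sketched in a few lines. The example below is a standard-library illustration of the procedure only: the estimator is a stand-in one-dimensional ridge regression, not CatBoost, and the grid, data, and function names are hypothetical. The section does not state which software or grid step sizes the authors used; in practice one would likely pass a CatBoost regressor and the Table 4 ranges to an existing grid-search utility.

```python
import itertools
import math


def fit_ridge_1d(xs, ys, l2):
    """Stand-in model: y ~ a*x + b with an L2 penalty on the slope a."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / (sxx + l2)
    return a, my - a * mx


def predict_ridge_1d(model, x):
    a, b = model
    return a * x + b


def grid_search_cv(fit, predict, param_grid, xs, ys, k=5):
    """Try every parameter combination; score each by mean RMSE over k folds."""
    n = len(xs)
    folds = [list(range(i, n, k)) for i in range(k)]  # interleaved folds
    names = sorted(param_grid)
    best_score, best_params = math.inf, None
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for fold in folds:
            held_out = set(fold)
            x_train = [x for i, x in enumerate(xs) if i not in held_out]
            y_train = [y for i, y in enumerate(ys) if i not in held_out]
            model = fit(x_train, y_train, **params)
            sq_err = [(predict(model, xs[i]) - ys[i]) ** 2 for i in fold]
            fold_scores.append(math.sqrt(sum(sq_err) / len(sq_err)))
        score = sum(fold_scores) / len(fold_scores)
        if score < best_score:
            best_score, best_params = score, params
    return best_params, best_score


# Toy data lying exactly on y = 2x + 1, so the unpenalised fit should win.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]

best_params, best_score = grid_search_cv(
    fit_ridge_1d, predict_ridge_1d, {"l2": [0.0, 1.0, 10.0]}, xs, ys, k=5
)
print(best_params, best_score)
```

Scoring each combination on held-out folds rather than on the training data is what keeps the search from simply favouring the most flexible configuration, which matters with a sample as small as the one summarised in Table 1.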