Journal of Geo-information Science
Retrieving Suspended Matter Concentration in Rivers based on Hyperparameter Optimized CatBoost Algorithm
Received date: 2021-08-03
Revised date: 2021-09-21
Online published: 2022-06-25
Supported by
Subproject of the Strategic Priority Science and Technology Project of the Chinese Academy of Sciences (Class A) (XDA23100503)
Total suspended matter (TSM) is a key parameter in assessing the aquatic ecological environment. Timely information on the dynamics of suspended matter concentration in rivers is needed for inland water quality monitoring and water environment management. Using field-measured spectra and suspended matter concentration data, this study selects the band combination reflectances most highly correlated with suspended matter concentration as independent variables. Remote sensing inversion models of suspended matter concentration are then constructed with the CatBoost, random forest, and multiple linear regression algorithms. To determine the optimal parameter configuration, grid search with cross-validation is used to tune the hyperparameters of the two machine learning models, CatBoost and random forest. The inversion accuracy of the three models is compared to identify the optimal model. Based on the optimal model, multi-temporal Sentinel-2 MSI images from 2019 to 2020 are used to retrieve suspended matter concentrations in the lower reaches of the Minjiang River and to analyse their spatial and temporal variation.
The results indicate that: ① b4/b3, (b6-b3)/(b6+b3), (b4+b8)/b3, and (1/b3-1/b4)×b5 are the best band combination reflectances for MSI inversion of TSM concentration in the lower Minjiang River; ② compared with the other two models, the inversion model based on the hyperparameter-optimized CatBoost algorithm has the highest accuracy, with a coefficient of determination (R²) of 0.95, a root mean square error (RMSE) of 15.32 mg/L, and a mean absolute percentage error (MAPE) of 19.68%; ③ from 2019 to 2020, suspended matter concentration in the lower reaches of the Minjiang River was "low in the west and high in the east", rising from Baisha to the mouth of the Langqi inlet; ④ suspended matter concentration is highest in summer, followed by winter and autumn, and lowest in spring. This study provides an effective technical means and a reference for monitoring suspended matter concentration in the lower reaches of the Minjiang River and analysing its spatio-temporal variation.
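The three accuracy measures quoted above can be written out as a minimal sketch. The abstract does not give the exact formulas the authors used, so the definitions below are the common textbook ones and are an assumption (R² in particular is sometimes computed as a squared correlation instead). The sample values are hypothetical and not taken from the study.

```python
import math

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean square error, in the units of the data (mg/L for TSM)."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; requires non-zero y_true."""
    ratios = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * ratios / len(y_true)

# Hypothetical measured and predicted TSM concentrations (mg/L).
measured = [10.0, 20.0, 40.0, 80.0]
predicted = [12.0, 18.0, 44.0, 76.0]

print(r2_score(measured, predicted), rmse(measured, predicted), mape(measured, predicted))
```

RMSE is dominated by the high-concentration samples, while MAPE weights every sample by its own magnitude, which is one reason both are usually reported together for data with a wide concentration range.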
CHEN Diandian, CHEN Yunzhi, FENG Xianfeng, WU Shuang. Retrieving Suspended Matter Concentration in Rivers based on Hyperparameter Optimized CatBoost Algorithm[J]. Journal of Geo-information Science, 2022, 24(4): 780-791. DOI: 10.12082/dqxxkx.2022.210446
Tab. 1 Descriptive statistics of the measured TSM concentration
Sampling date | Number of samples | Maximum/(mg/L) | Minimum/(mg/L) | Mean/(mg/L) | Standard deviation/(mg/L) | Coefficient of variation/% |
---|---|---|---|---|---|---|
October 2014 | 10 | 383 | 5 | 116.30 | 128.85 | 110.79 |
July 2017 | 40 | 37 | 12 | 22.75 | 6.34 | 27.85 |
December 2019 | 40 | 211 | 10 | 84.18 | 55.76 | 66.25 |
November 2020 | 45 | 265 | 4 | 55.02 | 60.03 | 109.11 |
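The last column of Table 1 follows from the two columns before it. As a small worked check, assuming the coefficient of variation is defined as standard deviation divided by mean, expressed in percent, it can be recomputed from the tabulated values. The dictionary name and keys are illustrative only.

```python
def coefficient_of_variation(std, mean):
    """Coefficient of variation in percent (assumed definition: 100 * std / mean)."""
    return 100.0 * std / mean

# (mean, standard deviation, reported CV) copied from Table 1, in mg/L and %.
table1 = {
    "2014-10": (116.30, 128.85, 110.79),
    "2017-07": (22.75, 6.34, 27.85),
    "2019-12": (84.18, 55.76, 66.25),
    "2020-11": (55.02, 60.03, 109.11),
}

for campaign, (mean, std, reported) in table1.items():
    recomputed = coefficient_of_variation(std, mean)
    print(f"{campaign}: recomputed {recomputed:.2f}%, reported {reported:.2f}%")
```

Differences of a few hundredths of a percent are expected, because the tabulated mean and standard deviation are themselves rounded.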
Tab. 2 Sentinel MSI images
Season | No. | Date | Image type |
---|---|---|---|
Spring | 1 | 2019-03-20 | Sentinel-2A |
Spring | 2 | 2020-04-08 | Sentinel-2B |
Spring | 3 | 2020-04-13 | Sentinel-2A |
Spring | 4 | 2020-04-18 | Sentinel-2B |
Summer | 5 | 2020-06-12 | Sentinel-2A |
Summer | 6 | 2020-07-22 | Sentinel-2A |
Summer | 7 | 2020-08-06 | Sentinel-2B |
Summer | 8 | 2020-08-26 | Sentinel-2B |
Autumn | 9 | 2019-11-05 | Sentinel-2A |
Autumn | 10 | 2019-11-10 | Sentinel-2B |
Autumn | 11 | 2019-11-15 | Sentinel-2A |
Autumn | 12 | 2020-10-10 | Sentinel-2A |
Winter | 13 | 2019-01-24 | Sentinel-2B |
Winter | 14 | 2019-01-29 | Sentinel-2A |
Winter | 15 | 2020-02-18 | Sentinel-2B |
Winter | 16 | 2020-02-23 | Sentinel-2A |
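The seasonal grouping in Table 2 is consistent with three-month meteorological seasons (spring March to May, summer June to August, autumn September to November, winter December to February). The text does not state the rule, so the mapping below is an assumption inferred from the dates. This sketch groups the 16 acquisition dates accordingly.

```python
from collections import defaultdict
from datetime import date

# Assumed month-to-season mapping; inferred from Table 2, not stated in the text.
SEASON_BY_MONTH = {
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
    12: "winter", 1: "winter", 2: "winter",
}

def season_of(day):
    """Return the assumed season label for a datetime.date."""
    return SEASON_BY_MONTH[day.month]

# Acquisition dates copied from Table 2.
acquisitions = [
    date(2019, 3, 20), date(2020, 4, 8), date(2020, 4, 13), date(2020, 4, 18),
    date(2020, 6, 12), date(2020, 7, 22), date(2020, 8, 6), date(2020, 8, 26),
    date(2019, 11, 5), date(2019, 11, 10), date(2019, 11, 15), date(2020, 10, 10),
    date(2019, 1, 24), date(2019, 1, 29), date(2020, 2, 18), date(2020, 2, 23),
]

images_by_season = defaultdict(list)
for day in acquisitions:
    images_by_season[season_of(day)].append(day)
```

Under this mapping each season receives four images, matching the table.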
Tab. 3 Correlation coefficients of MSI band combination reflectance with lg(TSM) and TSM
Band combination reflectance (vs. lg(TSM)) | Correlation coefficient | Band combination reflectance (vs. TSM) | Correlation coefficient |
---|---|---|---|
(1/b3-1/b4)×b5 | 0.86 | (1/b3-1/b8)×b8a | 0.80 |
(b4+b8)/b3 | 0.83 | (b5+b6)/b3 | 0.82 |
b4/b3 | 0.82 | b5/b3 | 0.75 |
(b6-b3)/(b6+b3) | 0.80 | (b7-b4)/(b7+b4) | 0.79 |
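As an illustration of how the lg(TSM) column of Table 3 could be produced, the sketch below computes the four selected band combinations from per-band reflectances and correlates one feature with the base-10 logarithm of TSM. It assumes the correlation coefficient is Pearson's r and that "lg" means log10. The function names and all reflectance values are hypothetical.

```python
import math

def band_features(b):
    """Compute the four selected band combinations.

    b: dict of MSI band reflectances keyed 'b3', 'b4', 'b5', 'b6', 'b8'.
    """
    return {
        "b4/b3": b["b4"] / b["b3"],
        "(b6-b3)/(b6+b3)": (b["b6"] - b["b3"]) / (b["b6"] + b["b3"]),
        "(b4+b8)/b3": (b["b4"] + b["b8"]) / b["b3"],
        "(1/b3-1/b4)*b5": (1.0 / b["b3"] - 1.0 / b["b4"]) * b["b5"],
    }

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (c - mean_y) for a, c in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((c - mean_y) ** 2 for c in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical reflectances for three samples, and hypothetical TSM (mg/L).
samples = [
    {"b3": 0.050, "b4": 0.040, "b5": 0.030, "b6": 0.020, "b8": 0.010},
    {"b3": 0.060, "b4": 0.065, "b5": 0.055, "b6": 0.030, "b8": 0.020},
    {"b3": 0.070, "b4": 0.095, "b5": 0.090, "b6": 0.050, "b8": 0.040},
]
tsm = [12.0, 60.0, 210.0]

ratio = [band_features(s)["b4/b3"] for s in samples]
lg_tsm = [math.log10(v) for v in tsm]
r = pearson_r(ratio, lg_tsm)
```

Correlating against lg(TSM) rather than TSM compresses the high-concentration tail, which may be why the log-transformed column shows the higher coefficients for the selected combinations.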
Fig. 5 The relationship between the regression error RMSE and n_estimators in the CatBoost model
Tab. 4 The necessary parameters and optimized values of the CatBoost model
Parameter | Default value | Search range | Optimal value |
---|---|---|---|
max_depth | 6 | [1,6] | 3 |
learning_rate | 0.03 | [0.01,0.05] | 0.01 |
n_estimators | 1000 | [50,1200] | 700 |
l2_leaf_reg | 3 | [1,3] | 1 |
loss_function | RMSE | RMSE, Logloss, MAE, MAPE, Poisson | RMSE |
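Grid search with cross-validation over the ranges in Table 4 can be sketched in plain Python. This is a sketch under stated assumptions and not the authors' implementation: Table 4 gives only the range of each parameter, so the candidate values in `PARAM_GRID` are assumed, the number of folds is not stated in the text, and `fit_predict` is a hypothetical callback that trains a model on the training fold and returns predictions for the held-out fold.

```python
import itertools
import math

# ASSUMED candidate values: Table 4 lists ranges only, not the search step.
PARAM_GRID = {
    "max_depth": [1, 2, 3, 4, 5, 6],
    "learning_rate": [0.01, 0.02, 0.03, 0.04, 0.05],
    "n_estimators": [50, 100, 300, 500, 700, 900, 1200],
    "l2_leaf_reg": [1, 2, 3],
}

def iter_param_combinations(grid):
    """Yield every combination of candidate values as a dict."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous folds (no shuffling)."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop

def grid_search_cv(X, y, grid, fit_predict, k=5):
    """Return (best_params, best_rmse), scoring by mean RMSE over k folds.

    fit_predict(params, X_train, y_train, X_test) -> list of predictions
    (hypothetical callback supplied by the caller).
    """
    best_params, best_rmse = None, math.inf
    for params in iter_param_combinations(grid):
        fold_rmse = []
        for train, test in k_fold_indices(len(y), k):
            preds = fit_predict(
                params,
                [X[i] for i in train],
                [y[i] for i in train],
                [X[i] for i in test],
            )
            mse = sum((y[i] - p) ** 2 for i, p in zip(test, preds)) / len(test)
            fold_rmse.append(math.sqrt(mse))
        mean_rmse = sum(fold_rmse) / len(fold_rmse)
        if mean_rmse < best_rmse:
            best_params, best_rmse = params, mean_rmse
    return best_params, best_rmse
```

With the `catboost` package installed, `fit_predict` could wrap `CatBoostRegressor(loss_function="RMSE", verbose=0, **params).fit(X_train, y_train).predict(X_test)`. Because the folds here are contiguous, samples ordered by field campaign should be shuffled before the search. The same procedure is available ready-made as scikit-learn's `GridSearchCV`.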