Journal of Geo-information Science
Correlation Analysis and Adaptive Genetic Algorithm based Feature Subset and Model Parameter Optimization in Salinization Monitoring
Received date: 2019-09-16
Revised date: 2019-11-19
Online published: 2020-09-25
Supported by
National Natural Science Foundation of China (41877012)
Team Project of the Chinese Academy of Sciences (2018-YDYLTD-002)
Characteristic Institutes Main Service Program of CAS, Program 1, Topic 3 (TSS-2015-014-FW-1-3)
Selecting the feature subset and optimizing the model parameters play an important role in improving the accuracy of soil salinization monitoring. However, studies that combine machine learning algorithms with remote sensing images and other data to predict Soil Salt Content (SSC) have paid little attention to optimizing the feature subset and the model parameters. In this paper, a Support Vector Regression (SVR) model whose feature subset and model parameters were optimized simultaneously by an Adaptive Genetic Algorithm (AGA) was developed to retrieve the SSC of the Sangong River Basin in 2016, and the distribution of SSC across land use types was analyzed. The synchronous optimization and the comparative experiments were designed as follows. First, 40 salinization-related factors in 7 categories (vegetation indices, salinity indices, underlying surface reflection factors, feature spaces, Tasselled Cap transformation factors, surface reflectance, and topographic factors) were extracted from Landsat 8 OLI and SRTM Digital Elevation Model (DEM) data, and the Candidate Feature Variables (CFVs) were preselected by correlation analysis, using statistical significance (p < 0.05) as the criterion. Then the CFVs were fed into the AGA, a Genetic Algorithm (GA), and Grid Search (GS) to simultaneously optimize the feature subset and model parameters of SVR, yielding three salinization monitoring models (AGA-SVR, GA-SVR, and GS-SVR). The results show that model performance ranked AGA-SVR > GA-SVR > GS-SVR. Compared with GS-SVR, both GA-SVR and AGA-SVR clearly improved the accuracy of salinization monitoring, and the R²/RMSE of AGA-SVR increased by 44.65%.
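The correlation-based preselection of CFVs can be illustrated with a minimal sketch. It assumes Pearson correlation (the abstract does not name the coefficient) and, to stay within the Python standard library, approximates the two-sided p-value with the Fisher z-transform rather than the exact t-test. The factor names and values are made-up toy data, not the study's measurements.

```python
import math
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value from the Fisher z-transform (normal approximation).

    This is an approximation that is rough for small n; an exact t-test
    would be preferable when a statistics library is available.
    """
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def select_cfvs(features, ssc, alpha=0.05):
    """Return the names of factors whose correlation with SSC has p < alpha."""
    selected = []
    for name, values in features.items():
        r = pearson_r(values, ssc)
        if approx_p_value(r, len(ssc)) < alpha:
            selected.append(name)
    return selected


# Hypothetical toy data: eight samples, two candidate factors.
ssc = [2.6, 5.1, 8.0, 12.3, 20.4, 31.0, 45.2, 60.7]
features = {
    "factor_a": [0.62, 0.55, 0.50, 0.41, 0.33, 0.25, 0.18, 0.10],  # strongly related
    "factor_b": [0.30, 0.12, 0.45, 0.20, 0.38, 0.15, 0.41, 0.27],  # close to noise
}
cfvs = select_cfvs(features, ssc)
print(cfvs)
```

Only the factors that pass the significance screen would then be handed to the optimizer, which keeps the search space of the later feature subset selection smaller.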
By salinization class, non-salinized soil, slightly salinized soil, moderately salinized soil, severely salinized soil, and saline soil accounted for 42.83%, 11.02%, 15.88%, 9.22%, and 21.05% of the Sangong River Basin, respectively. By land use type, unused land and grassland consisted mainly of non-salinized soil and saline soil, while non-salinized soil made up the largest share of farmland and forest land. Moreover, both the mean and the standard deviation of SSC ranked unused land > grassland > farmland > forest land. To some extent, the feature subset selection and model parameter optimization method proposed in this paper can improve the accuracy of salinization monitoring.
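How a genetic algorithm can optimize the feature subset and the SVR parameters at the same time is easiest to see from the chromosome layout. The sketch below is an illustration under stated assumptions, not the authors' implementation: the chromosome is assumed to be a binary feature mask followed by two genes in [0, 1] that are scaled to C and γ, the search ranges are hypothetical, and the adaptive crossover and mutation probabilities follow one widely used scheme (in the style of Srinivas and Patnaik), since the abstract does not specify which adaptive rule was used.

```python
N_FEATURES = 8              # hypothetical number of candidate feature variables
C_RANGE = (1.0, 100.0)      # hypothetical search range for the SVR penalty C
GAMMA_RANGE = (0.01, 10.0)  # hypothetical search range for the kernel gamma


def decode(chromosome):
    """Split a chromosome into selected feature indices and (C, gamma).

    Layout (assumed): N_FEATURES mask bits, then two real genes in [0, 1].
    """
    mask = chromosome[:N_FEATURES]
    u_c, u_gamma = chromosome[N_FEATURES:]
    selected = [i for i, bit in enumerate(mask) if bit == 1]
    c = C_RANGE[0] + u_c * (C_RANGE[1] - C_RANGE[0])
    gamma = GAMMA_RANGE[0] + u_gamma * (GAMMA_RANGE[1] - GAMMA_RANGE[0])
    return selected, c, gamma


def adaptive_rates(f_parent, f_individual, f_max, f_avg,
                   k1=1.0, k2=0.5, k3=1.0, k4=0.5):
    """Crossover and mutation probabilities that shrink for fitter individuals.

    f_parent is the larger fitness of the two parents chosen for crossover,
    f_individual the fitness of the individual considered for mutation.
    """
    spread = f_max - f_avg
    if spread <= 0:  # population has converged; fall back to the constants
        return k3, k4
    pc = k1 * (f_max - f_parent) / spread if f_parent >= f_avg else k3
    pm = k2 * (f_max - f_individual) / spread if f_individual >= f_avg else k4
    return pc, pm


selected, c, gamma = decode([1, 0, 1, 0, 0, 0, 0, 1, 0.0, 1.0])
pc, pm = adaptive_rates(f_parent=171.0, f_individual=171.0, f_max=192.0, f_avg=150.0)
print(selected, c, gamma, pc, pm)
```

In a full optimizer, each decoded chromosome would be scored by training an SVR on the selected features with the decoded C and γ and computing the fitness (R²/RMSE in this paper). Because one chromosome carries both parts, selection, crossover, and mutation act on the feature subset and the parameters together.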
XU Hongtao, CHEN Chunbo, ZHENG Hongwei, LUO Geping, YANG Liao, WANG Weisheng, WU Shixin. Correlation Analysis and Adaptive Genetic Algorithm based Feature Subset and Model Parameter Optimization in Salinization Monitoring[J]. Journal of Geo-information Science, 2020, 22(7): 1497-1509. DOI: 10.12082/dqxxkx.2020.190523
Tab. 1 Dataset for predicting soil salt content in the Sangong River Basin, 2016
Data | Spatial resolution/m | Data source | Acquisition date |
---|---|---|---|
Landsat 8 OLI data | 30 | USGS | 4 August 2016 |
Elevation data | 30 | USGS | 2000 |
Measured SSC data | n/a | Field sampling | 1-7 August 2016 |
Land use type data | 30 | Produced by the authors' team | 2015 |
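Tab. 2 Extracted salinization-related factors with their categories, names, equations, equation numbers and references
Category | Name | Equation | Equation No. | Reference |
---|---|---|---|---|
Vegetation indices | Normalized Difference Vegetation Index | NDVI = (NIR-R)/(NIR+R) | (1) | [4] |
Vegetation indices | Extended Normalized Difference Vegetation Index | ENDVI = (NIR+SWIRb2-R)/(NIR+SWIRb2+R) | (2) | [22] |
Vegetation indices | Enhanced Vegetation Index | EVI = 2.5×(NIR-R)/(NIR+6×R-7.5×B+1) | (3) | [4] |
Vegetation indices | Extended Enhanced Vegetation Index | EEVI = 2.5×(NIR+SWIRb1)/(NIR+SWIRb1+6×R-7.5×B+1) | (4) | [22] |
Vegetation indices | Soil-Adjusted Vegetation Index | SAVI = (1+L)×(NIR-R)/(NIR+R+L) | (5) | [4] |
Vegetation indices | Modified Soil-Adjusted Vegetation Index | MSAVI = ((2×NIR-1) - [term missing])/2 | (6) | [23] |
Vegetation indices | Difference Vegetation Index | DVI = NIR-R | (7) | [23] |
Vegetation indices | Ratio Vegetation Index | RVI = NIR/R | (8) | [23] |
Vegetation indices | Atmospherically Resistant Vegetation Index | ARVI = (NIR-(2×R-B))/(NIR+(2×R-B)) | (9) | [23] |
Vegetation indices | Generalized Difference Vegetation Index | GDVI = (NIR²-R²)/(NIR²+R²) | (10) | [4] |
Vegetation indices | Non-linear Vegetation Index | NLI = (NIR²-R)/(NIR²+R) | (11) | [4] |
Vegetation indices | Green Atmospherically Resistant Index | GARI = (NIR-(G+γ×(B-R)))/(NIR+(G+γ×(B-R))) | (12) | [4] |
Salinity indices | Salinity index | SI = [expression missing] | (13) | [4] |
Salinity indices | Salinity index 1 | SI1 = [expression missing] | (14) | [4] |
Salinity indices | Salinity index 2 | SI2 = [expression missing] | (15) | [4] |
Salinity indices | Salinity index 3 | SI3 = [expression missing] | (16) | [4] |
Salinity indices | Salinity index | S1 = B/R | (17) | [4] |
Salinity indices | Salinity index | S2 = (B-R)/(B+R) | (18) | [4] |
Salinity indices | Salinity index | S3 = G×R/B | (19) | [4] |
Salinity indices | Salinity index | S5 = B×R/G | (20) | [4] |
Salinity indices | Salinity index | S6 = NIR×R/G | (21) | [4] |
Salinity indices | Canopy Response Salinity Index | CRSI = [expression missing] | (22) | [4] |
Underlying surface factors | Shortwave infrared surface albedo | α_short = 0.356×B+0.13×R+0.373×NIR+0.085×SWIRb1+0.072×SWIRb2-0.002 | (23) | [24] |
Underlying surface factors | Visible surface albedo | α_vis = 0.443×B+0.170×G+0.240×R | (24) | [24] |
Feature spaces | Vegetation index-salinity index feature space | NSI = [expression missing] | (25) | [25] |
Feature spaces | Vegetation index-wetness index feature space | NWI = [expression missing] | (26) | [25] |
Feature spaces | Wetness index-salinity index feature space | WSI = [expression missing] | (27) | [25] |
Tasselled Cap transformation factors | Greenness index | GVI = -0.294×B_TOA-0.243×G_TOA-0.542×R_TOA+0.728×NIR_TOA+0.071×SWIRb1_TOA-0.161×SWIRb2_TOA | (28) | [26] |
Tasselled Cap transformation factors | Wetness index | WI = 0.151×R_TOA+0.197×G_TOA+0.328×B_TOA+0.341×NIR_TOA-0.712×SWIRb1_TOA-0.456×SWIRb2_TOA | (29) | [26] |
Tasselled Cap transformation factors | Brightness index | BI = 0.303×R_TOA+0.279×G_TOA+0.473×B_TOA+0.560×NIR_TOA+0.508×SWIRb1_TOA+0.187×SWIRb2_TOA | (30) | [26] |
Surface reflectance | B2/B3/B4/B5/B6/B10/B7 | B/G/R/NIR/SWIRb1/TIRSb1/SWIRb2 | (31)-(37) | [27] |
Topographic factors | Elevation/slope/surface roughness | elevation/slope/roughness | (38)-(40) | [28] |
Note: R, G, B, SWIRb1, TIRSb1 and SWIRb2 are the surface reflectances of the red, green, blue, shortwave infrared 1, thermal infrared 1 and shortwave infrared 2 bands, respectively; the subscript TOA denotes top-of-atmosphere apparent reflectance; L = 0.5 and γ = 0.9 are aerosol- and atmosphere-related parameters.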
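A few of the band-ratio factors in Tab. 2 whose equations are fully legible can be written as plain functions. This is a minimal per-pixel sketch: the reflectance values are hypothetical, L = 0.5 is taken from the table note, and zero denominators (for example over water or no-data pixels) are not handled.

```python
def ndvi(nir, r):
    """Eq. (1): Normalized Difference Vegetation Index."""
    return (nir - r) / (nir + r)


def savi(nir, r, L=0.5):
    """Eq. (5): Soil-Adjusted Vegetation Index, L = 0.5 as in the table note."""
    return (1 + L) * (nir - r) / (nir + r + L)


def arvi(nir, r, b):
    """Eq. (9): Atmospherically Resistant Vegetation Index."""
    rb = 2 * r - b
    return (nir - rb) / (nir + rb)


def s2(b, r):
    """Eq. (18): salinity index S2."""
    return (b - r) / (b + r)


def s3(g, r, b):
    """Eq. (19): salinity index S3."""
    return g * r / b


# Hypothetical surface reflectances of one pixel (not from the study).
pixel = {"b": 0.08, "g": 0.10, "r": 0.12, "nir": 0.30}

factors = {
    "NDVI": ndvi(pixel["nir"], pixel["r"]),
    "SAVI": savi(pixel["nir"], pixel["r"]),
    "ARVI": arvi(pixel["nir"], pixel["r"], pixel["b"]),
    "S2": s2(pixel["b"], pixel["r"]),
    "S3": s3(pixel["g"], pixel["r"], pixel["b"]),
}
for name, value in factors.items():
    print(f"{name}: {value:.4f}")
```

Applied to whole image bands instead of single values, the same expressions give one raster per factor, which is the form in which such factors are typically sampled at the field points.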
Tab. 3 Statistics of SSC of the soil samples in the Sangong River Basin (values in g/kg unless noted)
Dataset | n | Minimum | Maximum | Mean | Standard deviation | Coefficient of variation/% |
---|---|---|---|---|---|---|
All samples | 137 | 2.62 | 60.74 | 13.58 | 10.59 | 77.97 |
Training set | 103 | 2.62 | 60.74 | 13.46 | 10.70 | 79.53 |
Validation set | 34 | 4.18 | 38.08 | 13.49 | 10.03 | 74.40 |
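The coefficient of variation in Tab. 3 is the standard deviation divided by the mean, expressed in percent. As a quick consistency check, the sketch below recomputes it from the rounded means and standard deviations in the table; small differences from the reported values are expected because the published inputs are rounded to two decimals.

```python
# (n, mean, standard deviation, reported CV in %) copied from Tab. 3
stats = {
    "all": (137, 13.58, 10.59, 77.97),
    "training": (103, 13.46, 10.70, 79.53),
    "validation": (34, 13.49, 10.03, 74.40),
}


def coefficient_of_variation(mean, sd):
    """Coefficient of variation in percent."""
    return 100.0 * sd / mean


recomputed = {
    name: coefficient_of_variation(mean, sd)
    for name, (_, mean, sd, _) in stats.items()
}

for name, (n, _, _, reported) in stats.items():
    print(f"{name}: n={n}, CV={recomputed[name]:.2f}% (reported {reported:.2f}%)")

# The training and validation sets together make up the full sample.
assert stats["training"][0] + stats["validation"][0] == stats["all"][0]
```

The split of 103 training and 34 validation samples corresponds to roughly three quarters and one quarter of the 137 field samples.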
Tab. 4 Soil salinity estimation by different models in the Sangong River Basin
Method | Training R² | Training RMSE/(g/kg) | Training RPIQ | Validation R² | Validation RMSE/(g/kg) | Validation RPIQ | Fitness/(g/kg) | Model parameters (C, γ) | Number of modelling feature variables |
---|---|---|---|---|---|---|---|---|---|
AGA-SVR | 0.86 | 3.96 | 1.89 | 0.82 | 4.27 | 2.02 | 191.95 | 70, 8 | 15 |
GA-SVR | 0.96 | 2.02 | 3.72 | 0.76 | 4.93 | 1.75 | 153.74 | 70, 10 | 17 |
GS-SVR | 0.77 | 5.16 | 1.45 | 0.71 | 5.37 | 1.60 | 132.70 | 60, 4 | 33 |
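The 44.65% gain quoted in the abstract can be reproduced from the Fitness column of Tab. 4, assuming that this column is the R²/RMSE criterion named in the abstract. The sketch below computes the relative gain of each model over the GS-SVR baseline.

```python
# Fitness values copied from Tab. 4
fitness = {"AGA-SVR": 191.95, "GA-SVR": 153.74, "GS-SVR": 132.70}


def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


gains = {
    model: relative_gain(value, fitness["GS-SVR"])
    for model, value in fitness.items()
    if model != "GS-SVR"
}
for model, gain in gains.items():
    print(f"{model} vs GS-SVR: +{gain:.2f}%")
```

Under the same assumption, the GA-SVR gain over GS-SVR comes out at roughly a third of the AGA-SVR gain, which matches the reported ranking AGA-SVR > GA-SVR > GS-SVR.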
Tab. 5 Areas of the different types of soil salinization in the Sangong River Basin, 2016
Salinization type | Non-salinized soil | Slightly salinized soil | Moderately salinized soil | Severely salinized soil | Saline soil |
---|---|---|---|---|---|
Area/km² | 588.11 | 151.38 | 218.01 | 126.63 | 289.06 |
Percentage/% | 42.83 | 11.02 | 15.88 | 9.22 | 21.05 |
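The percentage row of Tab. 5 follows directly from the area row. A short check, using only the areas printed in the table (the English class labels are those used in the abstract):

```python
# Areas in km² copied from Tab. 5
areas = {
    "non-salinized": 588.11,
    "slightly salinized": 151.38,
    "moderately salinized": 218.01,
    "severely salinized": 126.63,
    "saline soil": 289.06,
}

total_area = sum(areas.values())
shares = {name: round(100.0 * area / total_area, 2) for name, area in areas.items()}

print(f"total: {total_area:.2f} km²")
for name, share in shares.items():
    print(f"{name}: {share:.2f}%")
```

Summing the three most affected classes (moderately salinized, severely salinized, and saline soil) gives a little under half of the mapped area.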
Tab. 6 Areas of the different land use types in the Sangong River Basin, 2015
Land use type | Farmland | Forest land | Grassland | Water | Urban and rural construction land | Unused land |
---|---|---|---|---|---|---|
Area/km² | 659.72 | 4.078 | 479.72 | 45.77 | 97.42 | 85.46 |
Percentage/% | 48.08 | 0.30 | 34.96 | 3.34 | 7.10 | 6.23 |
Tab. 7 Areas of the different types of saline soil within each land use type (area in km², proportion in %)
Salinization type | Farmland area | Farmland proportion | Forest land area | Forest land proportion | Grassland area | Grassland proportion | Unused land area | Unused land proportion |
---|---|---|---|---|---|---|---|---|
Non-salinized soil | 275.21 | 41.72 | 1.97 | 48.41 | 194.10 | 40.46 | 27.80 | 32.53 |
Slightly salinized soil | 86.35 | 13.09 | 0.93 | 22.78 | 38.38 | 8.00 | 6.32 | 7.40 |
Moderately salinized soil | 132.40 | 20.07 | 0.73 | 17.88 | 60.22 | 12.55 | 7.74 | 9.05 |
Severely salinized soil | 65.65 | 9.95 | 0.19 | 4.66 | 47.83 | 9.97 | 5.69 | 6.66 |
Saline soil | 100.11 | 15.17 | 0.26 | 6.27 | 139.19 | 29.02 | 37.91 | 44.36 |
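The statement in the abstract that unused land and grassland consist mainly of non-salinized soil and saline soil, while non-salinized soil dominates farmland and forest land, can be read off the proportions in Tab. 7. The sketch below ranks the salinization classes within each land use type, using the English class labels from the abstract.

```python
classes = ["non-salinized", "slightly salinized", "moderately salinized",
           "severely salinized", "saline soil"]

# Proportions in % copied from Tab. 7, in the order of `classes`
proportions = {
    "farmland": [41.72, 13.09, 20.07, 9.95, 15.17],
    "forest land": [48.41, 22.78, 17.88, 4.66, 6.27],
    "grassland": [40.46, 8.00, 12.55, 9.97, 29.02],
    "unused land": [32.53, 7.40, 9.05, 6.66, 44.36],
}


def top_classes(values, k=2):
    """Names of the k classes with the largest proportion, largest first."""
    ranked = sorted(zip(values, classes), reverse=True)
    return [name for _, name in ranked[:k]]


dominant = {land_use: top_classes(values) for land_use, values in proportions.items()}
for land_use, names in dominant.items():
    print(f"{land_use}: {', '.join(names)}")
```

For grassland and unused land the two leading classes are non-salinized soil and saline soil, whereas in farmland and forest land the second-largest class is moderately or slightly salinized soil.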
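[1] Jiang Hong, Yusufujiang Rusuli, Reyilai Kadeer, et al. Evaluation and analysis of soil salinization in arid-zone oases based on a neural network model[J]. Journal of Geo-information Science, 2017, 19(7): 983-993.
[9] Zhang Tongrui, Zhao Gengxing, Gao Mingxiu, et al. Salinity estimation and remote sensing inversion for winter wheat planting areas in the Yellow River Delta based on near-ground multispectral data and OLI imagery: A case study of Kenli County and Wudi County, Shandong Province[J]. Journal of Natural Resources, 2016, 31(6): 1051-1060.
[10] Wang Fei, Yang Shengtian, Ding Jianli, et al. Selection of environmentally sensitive variables and prediction of oasis soil salinity with machine learning algorithms[J]. Transactions of the Chinese Society of Agricultural Engineering, 2018, 34(22): 102-110.
[12] Yang Aixia, Ding Jianli, Li Yanhong, et al. Monitoring of soil salinity in arid-zone wetlands based on apparent electrical conductivity and measured spectra[J]. Journal of Desert Research, 2017, 36(20): 1365-1373.
[16] Wang Xinxin, Luo Geping, Ye Hui, et al. Construction of a daily mean air temperature dataset with high spatiotemporal resolution for the oasis-desert area on the northern slope of the Tianshan Mountains: A case study of the Sangong River Basin[J]. Geographical Research, 2017, 36(1): 49-60.
[19] Sun Hao, Liu Lijuan, Li Xiaoyu, et al. Effects of the pattern of agricultural shelterbelt network on evapotranspiration of oases in arid region: A case study from Sangong River Basin in Xinjiang[J]. Chinese Journal of Ecology, 2018, 37(8): 2436-2444.
[20] He Ke, Wu Shixin, Yang Yi, et al. Land use and oasis dynamics in Xinjiang over the past 40 years[J]. Arid Land Geography, 2018, 41(6): 193-200.
[22] Wang Fei, Ding Jianli, Wei Yang, et al. Response of salinity indices and vegetation indices derived from Landsat series data to soil salinity variability: A case study of typical oases north and south of the Tianshan Mountains, Xinjiang[J]. Acta Ecologica Sinica, 2017, 37(15): 5007-5022.
[25] Li Yanhua, Ding Jianli, Sun Yongmeng, et al. A remote sensing model of soil salinization based on three-dimensional feature space[J]. Research of Soil and Water Conservation, 2015, 22(4): 113-121.
[30] Wang Yating, Kong Jinling, Yang Liangyan, et al. SVR-based remote sensing inversion of soil moisture under sparse vegetation cover in arid areas[J]. Journal of Geo-information Science, 2019, 21(8): 1275-1283.
[33] Qiao Mu, Tian Changyan, Wang Xinping. Soil salinization in the irrigated areas of Xinjiang and models for its amelioration and management[M]. Urumqi: Xinjiang Science and Technology Press, 2008.
[34] Wang Xuemei, Kang Xuan, Zhao Feng. Characteristics of soil salinization under different land use patterns in the Wei-Ku Oasis[J]. Research of Soil and Water Conservation, 2016, 23(1): 160-164.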