Journal of Geo-information Science ›› 2020, Vol. 22 ›› Issue (7): 1497-1509. doi: 10.12082/dqxxkx.2020.190523

• Theory and Methods of Geo-information Science •

Correlation Analysis and Adaptive Genetic Algorithm based Feature Subset and Model Parameter Optimization in Salinization Monitoring

XU Hongtao1,2, CHEN Chunbo1,2, ZHENG Hongwei1,2,*, LUO Geping1,2, YANG Liao1,2, WANG Weisheng1,2, WU Shixin1,2

  1. State Key Laboratory of Desert and Oasis Ecology, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi 830011, China;
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2019-09-16 Revised: 2019-11-19 Online: 2020-07-25 Published: 2020-09-25
  • Contact: ZHENG Hongwei E-mail: xuhongtao17@mails.ucas.ac.cn; hzheng@ms.xjb.ac.cn
  • About the author: XU Hongtao (b. 1993), male, from Zhumadian, Henan; master's student, working mainly on remote sensing and geographic information systems. E-mail: xuhongtao17@mails.ucas.ac.cn
  • Supported by:
    National Natural Science Foundation of China (41877012); Chinese Academy of Sciences "Belt and Road" Team Project (2018-YDYLTD-002); Characteristic Institutes Main Service Program (Program 1, Topic 3) of CAS (TSS-2015-014-FW-1-3)

Abstract:

The selection of a feature subset and the optimization of model parameters play an important role in improving the accuracy of soil salinization monitoring. However, studies that combine machine learning algorithms with remote sensing images and other data to predict Soil Salt Content (SSC) have paid little attention to either. In this paper, a Support Vector Regression (SVR) model whose feature subset and model parameters are optimized synchronously by the Adaptive Genetic Algorithm (AGA) was developed to retrieve the SSC of the Sangong River Basin in 2016, and the distribution of SSC across land use types was analyzed. The synchronous optimization of the feature subset and model parameters and the comparative experiment were designed as follows. First, a total of 40 salinization-related factors in 7 categories (vegetation indices, salinity indices, underlying surface reflection factors, feature spaces, Tasselled Cap transformation factors, surface reflectance, and topographic factors) were extracted from Landsat 8 OLI and SRTM Digital Elevation Model (DEM) data, and the Candidate Feature Variables (CFVs) were initially screened by correlation analysis, using significance (p<0.05) as the criterion. The CFVs were then passed to AGA, the Genetic Algorithm (GA), and Grid Search (GS) to synchronously optimize the feature subset and model parameters of SVR, yielding three salinization monitoring models (AGA-SVR, GA-SVR, and GS-SVR). The results show that model performance ranked AGA-SVR > GA-SVR > GS-SVR. Compared with GS-SVR, both GA-SVR and AGA-SVR clearly improved the accuracy of salinization monitoring, and the R²/RMSE of AGA-SVR increased by 44.65%.
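To make the idea of synchronous optimization concrete, the following is a minimal sketch, not the implementation used in the paper. It assumes that each chromosome carries one selection bit per candidate feature variable plus two real-valued genes for the SVR parameters, and that the crossover and mutation probabilities shrink linearly for above-average individuals, which is one common adaptive scheme. The search ranges, the constants `k_c` and `k_m`, the feature names, and `toy_fitness` (a stand-in for a cross-validated SVR score) are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical log2 search ranges for the SVR parameters; the source gives none.
LOG2_C = (-5.0, 15.0)
LOG2_GAMMA = (-15.0, 3.0)


@dataclass(frozen=True)
class Chromosome:
    mask: Tuple[int, ...]  # one bit per candidate feature variable (1 = keep)
    log2_c: float          # SVR penalty C, stored as log2(C)
    log2_gamma: float      # RBF kernel gamma, stored as log2(gamma)


def decode(ch: Chromosome, names: Sequence[str]) -> Tuple[List[str], float, float]:
    """Turn a chromosome into (selected feature names, C, gamma)."""
    selected = [n for n, bit in zip(names, ch.mask) if bit]
    return selected, 2.0 ** ch.log2_c, 2.0 ** ch.log2_gamma


def adaptive_rate(f: float, f_avg: float, f_max: float, k: float) -> float:
    """Full rate k below average; shrinks linearly to 0 at the best fitness."""
    if f < f_avg or f_max <= f_avg:
        return k
    return k * (f_max - f) / (f_max - f_avg)


def random_chromosome(rng: random.Random, n: int) -> Chromosome:
    return Chromosome(tuple(rng.randint(0, 1) for _ in range(n)),
                      rng.uniform(*LOG2_C), rng.uniform(*LOG2_GAMMA))


def crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Chromosome:
    """Uniform crossover on the mask, arithmetic blend on the real genes."""
    mask = tuple(x if rng.random() < 0.5 else y for x, y in zip(a.mask, b.mask))
    w = rng.random()
    return Chromosome(mask, w * a.log2_c + (1 - w) * b.log2_c,
                      w * a.log2_gamma + (1 - w) * b.log2_gamma)


def mutate(ch: Chromosome, pm: float, rng: random.Random) -> Chromosome:
    """Flip each bit, and resample each real gene, with probability pm."""
    mask = tuple(1 - bit if rng.random() < pm else bit for bit in ch.mask)
    c = rng.uniform(*LOG2_C) if rng.random() < pm else ch.log2_c
    g = rng.uniform(*LOG2_GAMMA) if rng.random() < pm else ch.log2_gamma
    return Chromosome(mask, c, g)


def evolve(fitness: Callable[[Chromosome], float], n_features: int,
           pop_size: int = 20, generations: int = 40,
           k_c: float = 0.9, k_m: float = 0.1, seed: int = 0) -> Chromosome:
    """Return the best chromosome evaluated during the run."""
    rng = random.Random(seed)
    pop = [random_chromosome(rng, n_features) for _ in range(pop_size)]
    best, best_f = None, float("-inf")
    for _ in range(generations):
        scores = [fitness(ch) for ch in pop]
        f_avg, f_max = sum(scores) / pop_size, max(scores)
        if f_max > best_f:
            best, best_f = pop[scores.index(f_max)], f_max
        nxt = [best]  # elitism: carry the best individual forward unchanged
        while len(nxt) < pop_size:
            # Two binary tournaments pick the parents.
            i, j = (max(rng.sample(range(pop_size), 2), key=scores.__getitem__)
                    for _ in range(2))
            # The better parent's fitness is used as a proxy for the child's.
            f_parent = max(scores[i], scores[j])
            child = pop[i]
            if rng.random() < adaptive_rate(f_parent, f_avg, f_max, k_c):
                child = crossover(pop[i], pop[j], rng)
            nxt.append(mutate(child, adaptive_rate(f_parent, f_avg, f_max, k_m), rng))
        pop = nxt
    return best


def toy_fitness(ch: Chromosome) -> float:
    """Hypothetical score: peaks at mask (1,0,1,0,0), log2(C)=3, log2(gamma)=-4."""
    hits = sum(a == b for a, b in zip(ch.mask, (1, 0, 1, 0, 0)))
    return hits - 0.05 * abs(ch.log2_c - 3.0) - 0.05 * abs(ch.log2_gamma + 4.0)


best = evolve(toy_fitness, n_features=5)
print(decode(best, ["NDVI", "SI", "B4", "slope", "TCW"]))  # illustrative names
```

In a real run, `toy_fitness` would be replaced by a function that trains an SVR on the selected CFVs with the decoded parameters and returns a validation score. Because the feature mask and the parameters sit in the same chromosome, every candidate parameter set is evaluated on the subset it will actually be used with, which is the point of optimizing the two together rather than one after the other.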
In terms of salinization classes, non-salinized soil, slightly salinized soil, moderately salinized soil, severely salinized soil, and saline soil accounted for 42.83%, 11.02%, 15.88%, 9.22%, and 21.05% of the area of the Sangong River Basin, respectively. In terms of land use, unused land and grassland consisted mainly of non-salinized soil and saline soil, while non-salinized soil made up the largest share of both farmland and forest land. Moreover, both the mean and the standard deviation of SSC followed the order unused land > grassland > farmland > forest land. The proposed method for feature subset selection and model parameter optimization can improve the accuracy of salinization monitoring to some extent.
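The area shares and per-land-use statistics reported above are simple summaries of a classified SSC map. The sketch below shows one way such summaries could be computed. It assumes equal-area pixels, so pixel share equals area share, and it uses the population standard deviation; the abstract does not say which estimator was used. The pixel values are made up for illustration and are not data from the study.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev
from typing import Dict, Sequence, Tuple


def class_shares(classes: Sequence[str]) -> Dict[str, float]:
    """Percentage of pixels in each salinization class."""
    total = len(classes)
    return {c: 100.0 * n / total for c, n in Counter(classes).items()}


def ssc_by_land_use(land_use: Sequence[str],
                    ssc: Sequence[float]) -> Dict[str, Tuple[float, float]]:
    """Mean and population standard deviation of SSC per land use type."""
    groups = defaultdict(list)
    for lu, value in zip(land_use, ssc):
        groups[lu].append(value)
    return {lu: (mean(v), pstdev(v)) for lu, v in groups.items()}


# Made-up pixels (hypothetical class labels, land use labels, and SSC values).
classes = ["non", "non", "saline", "slight", "non", "saline", "moderate", "non"]
land_use = ["farmland", "farmland", "unused", "grassland",
            "forest", "unused", "grassland", "forest"]
ssc = [1.0, 3.0, 30.0, 6.0, 1.0, 20.0, 10.0, 2.0]

shares = class_shares(classes)
stats = ssc_by_land_use(land_use, ssc)
for lu, (m, s) in sorted(stats.items(), key=lambda kv: -kv[1][0]):
    print(f"{lu}: mean={m:.2f}, std={s:.2f}")
```

Sorting by the mean puts the land use types in descending order, which makes a ranking such as unused land > grassland > farmland > forest land easy to read off.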

Key words: soil salinization, adaptive genetic algorithm, machine learning, feature subset selection, parameter optimization, soil salt content, land use, correlation analysis