Urban Area Extraction based on Independent Component Analysis and Random Forest Algorithm
PU Dongchuan (born 1995), male, from Fengjie, Chongqing, Master's student; research interests include land cover classification and urban remote sensing. E-mail: pudc17@mails.jlu.edu.cn
Received date: 2019-07-19
Request revised date: 2019-11-25
Online published: 2020-10-25
Supported by
Key Program of the National Natural Science Foundation of China (61731022)
Strategic Priority Research Program (Class A) of the Chinese Academy of Sciences (XDA19090300)
National Key Research and Development Project (2016YFA0600302)
National Key Research and Development Project (2016YFB0501502)
PU Dongchuan, WANG Guizhou, ZHANG Zhaoming, NIU Xuefeng, HE Guojin, LONG Tengfei, YIN Ranyu, JIANG Wei, SUN Jiayue. Urban Area Extraction based on Independent Component Analysis and Random Forest Algorithm[J]. Journal of Geo-information Science, 2020, 22(8): 1597-1606. DOI: 10.12082/dqxxkx.2020.190385
Urban area information is one of the priorities of the United Nations (UN) 2030 Agenda for Sustainable Development. Cities are expanding rapidly worldwide, and accurate, timely urban area information is important for government decision-making. Land cover in urban areas is highly complex and includes artificial buildings, trees, grassland, and water bodies. Extracting urban land information by traditional manual survey is time-consuming, and the results are difficult to keep up to date. Freely available satellite remote sensing data such as Landsat provide a rich data source for urban area extraction, and the extracted information can supply basic scientific data for decision-making in future city construction and management. Supervised classification of satellite remote sensing data can extract urban areas quickly. However, the choice of feature variables is critical to extraction accuracy; in particular, linear correlation between features has a significant impact on it. Applying an independent component analysis (ICA) transformation to the image data yields linearly independent feature variables, which can effectively improve the accuracy of urban area extraction. To study how different feature combinations affect urban area extraction, Beijing was taken as the study area and a Landsat 8 Operational Land Imager (OLI) image (path/row: 123/32) acquired on July 10, 2017 as the data source. Preprocessing, texture extraction, ICA, and principal component analysis (PCA) produced 29 features in 4 dimensions, from which 7 feature combinations were selected. The Random Forest (RF) algorithm was chosen as the supervised classifier because of its stable performance, high classification accuracy, and convenient feature importance evaluation.
Based on the random forest algorithm, feature importance evaluation, urban area extraction, and accuracy assessment were carried out to determine the optimal feature combination for urban area extraction. The results show that: (1) combining spectral features with ICA-transformed features gives an overall accuracy of 93.1% and a Kappa coefficient of 0.86, which is better than the results obtained with the other feature combinations; (2) the normalized importance of each feature output by the trained random forest is similar to the standard deviation of the mean values of the features, so both the random forest importance estimate and the line chart of feature means can be used to evaluate variable importance.
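The second finding compares random forest importance with the standard deviation of feature means. The sketch below shows one way the latter could be computed, under the assumption that "mean values of the features" means the per-class mean of each feature and that the standard deviation is taken across those class means. The function name `class_mean_spread` and the toy samples are hypothetical and are not taken from the paper; the features are assumed to be on comparable scales.

```python
from statistics import mean, pstdev


def class_mean_spread(samples, labels):
    """Per feature: std. dev. of the class means, normalized to sum to 1.

    Assumption: a feature whose class means differ strongly separates the
    classes well, so a larger spread suggests a more important feature.
    """
    classes = sorted(set(labels))
    n_features = len(samples[0])
    spreads = []
    for j in range(n_features):
        # Mean of feature j within each class
        class_means = [
            mean(row[j] for row, lab in zip(samples, labels) if lab == c)
            for c in classes
        ]
        # Spread of those class means (population standard deviation)
        spreads.append(pstdev(class_means))
    total = sum(spreads)
    return [s / total for s in spreads] if total else spreads


# Hypothetical toy samples: feature 0 separates the classes, feature 1 does not.
samples = [
    [0.10, 0.50],
    [0.12, 0.52],
    [0.80, 0.49],
    [0.82, 0.51],
]
labels = ["urban", "urban", "non-urban", "non-urban"]
scores = class_mean_spread(samples, labels)
```

In this toy case the first feature receives almost all of the normalized score, which is the kind of ranking that could then be set beside a random forest importance estimate.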
Tab. 1 Four feature dimensions selected for the experiment and their corresponding features
No. | Feature dimension | Features |
---|---|---|
a | Spectral | Coastal, Blue, Green, Red, NIR, SWIR1, SWIR2 |
b | ICA | ICA1, ICA2, ICA3, ICA4, ICA5, ICA6, ICA7 |
c | PCA | PCA1, PCA2, PCA3, PCA4, PCA5, PCA6, PCA7 |
d | Texture | Mean, Var. (Variance), Hom. (Homogeneity), Con. (Contrast), Dis. (Dissimilarity), Ent. (Entropy), ASM (Angular Second Moment), Cor. (Correlation) |
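As a quick consistency check, the four dimensions above can be combined into the seven schemes listed in Tab. 2. The following is a minimal sketch; the dictionary keys and the helper `features_for` are illustrative names, not identifiers from the paper.

```python
# Feature names per dimension, as listed in Tab. 1
FEATURE_DIMENSIONS = {
    "spectral": ["Coastal", "Blue", "Green", "Red", "NIR", "SWIR1", "SWIR2"],
    "ica": [f"ICA{i}" for i in range(1, 8)],
    "pca": [f"PCA{i}" for i in range(1, 8)],
    "texture": ["Mean", "Var", "Hom", "Con", "Dis", "Ent", "ASM", "Cor"],
}

# Scheme number -> dimensions used, following the rows of Tab. 2
SCHEMES = {
    1: ["spectral"],
    2: ["texture"],
    3: ["pca"],
    4: ["ica"],
    5: ["spectral", "texture"],
    6: ["spectral", "pca"],
    7: ["spectral", "ica"],
}


def features_for(scheme):
    """Return the flat list of feature names used by a scheme."""
    return [name for dim in SCHEMES[scheme] for name in FEATURE_DIMENSIONS[dim]]


feature_counts = {s: len(features_for(s)) for s in SCHEMES}
total_features = sum(len(names) for names in FEATURE_DIMENSIONS.values())
```

The resulting counts (7, 8, 7, 7, 15, 14, 14) and the total of 29 features agree with the numbers reported in the abstract and in Tab. 2.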
Fig. 8 Spatial distribution of validation points for the urban area extraction results in Beijing in 2017
Tab. 2 Comparison of the accuracy of the 7 classification schemes for urban area extraction
No. | Number of features | Features | Overall accuracy/% | Kappa coefficient | Commission error/% | Omission error/% | Run time/s |
---|---|---|---|---|---|---|---|
1 | 7 | Spectral | 84.3 | 0.71 | 29.2 | 19.6 | 101.1 |
2 | 8 | Texture | 39.5 | 0.27 | 75.6 | 33.3 | 132.3 |
3 | 7 | PCA | 81.9 | 0.62 | 25.3 | 23.9 | 97.5 |
4 | 7 | ICA | 87.1 | 0.78 | 20.8 | 20.2 | 103.7 |
5 | 15 | Spectral + texture | 86.3 | 0.69 | 30.5 | 28.3 | 197.8 |
6 | 14 | Spectral + PCA | 89.2 | 0.79 | 23.3 | 31.5 | 164.2 |
7 | 14 | Spectral + ICA | 93.1 | 0.86 | 17.5 | 18.4 | 157.1 |
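The accuracy columns above are standard confusion-matrix measures. The sketch below shows how they are conventionally computed from validation points, assuming the usual definitions (commission error relative to mapped totals, omission error relative to reference totals). The confusion matrix is a made-up example, not the paper's validation data, and the function name `accuracy_metrics` is hypothetical.

```python
def accuracy_metrics(matrix, target=0):
    """Compute accuracy measures from a square confusion matrix.

    matrix[i][j] = number of validation points whose reference class is i
    and whose mapped (classified) class is j. `target` is the class index
    for which commission and omission errors are reported.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    ref_totals = [sum(matrix[i]) for i in range(k)]                       # row sums
    map_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # column sums

    observed = sum(matrix[i][i] for i in range(k)) / n                    # overall accuracy
    expected = sum(ref_totals[i] * map_totals[i] for i in range(k)) / (n * n)
    kappa = (observed - expected) / (1 - expected)

    correct = matrix[target][target]
    commission = 1 - correct / map_totals[target]  # mapped as target, but wrong
    omission = 1 - correct / ref_totals[target]    # true target, but missed
    return {
        "overall_accuracy": observed,
        "kappa": kappa,
        "commission_error": commission,
        "omission_error": omission,
    }


# Hypothetical 2-class example: index 0 = urban, index 1 = non-urban
example = [[40, 10],
           [5, 45]]
metrics = accuracy_metrics(example, target=0)
```

For this made-up matrix the overall accuracy is 0.85 and Kappa is 0.70; multiplying the accuracy and error values by 100 gives percentages in the form used in Tab. 2.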