Journal of Geo-information Science ›› 2020, Vol. 22 ›› Issue (8): 1597-1606. doi: 10.12082/dqxxkx.2020.190385

• Theory and Methods of Geo-information Science •

Urban Land Extraction Based on Independent Component Analysis and the Random Forest Algorithm

PU Dongchuan1,2, WANG Guizhou1,3,4, ZHANG Zhaoming1,3,4,*, NIU Xuefeng2, HE Guojin1,3,4, LONG Tengfei1,3,4, YIN Ranyu1,3,4, JIANG Wei1,3,4, SUN Jiayue2

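  1. Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094
    2. College of Geo-exploration Science and Technology, Jilin University, Changchun 130026
    3. Key Laboratory of Earth Observation Hainan Province, Sanya 572029
    4. Sanya Institute of Remote Sensing, Sanya 572029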
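  • Received: 2019-07-19 Revised: 2019-11-25 Online: 2020-08-25 Published: 2020-10-25
  • Corresponding author: ZHANG Zhaoming E-mail: pudc17@mails.jlu.edu.cn; zhangzm@radi.ac.cn
  • About the author: PU Dongchuan (b. 1995), male, from Fengjie, Chongqing, master's student, mainly engaged in research on land cover classification and urban remote sensing. E-mail: pudc17@mails.jlu.edu.cn
  • Supported by:
    Key Program of the National Natural Science Foundation of China (61731022); Strategic Priority Research Program (Category A) of the Chinese Academy of Sciences (XDA19090300); National Key Research and Development Project (2016YFA0600302); National Key Research and Development Project (2016YFB0501502)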

Urban Area Extraction based on Independent Component Analysis and Random Forest Algorithm

PU Dongchuan1,2, WANG Guizhou1,3,4, ZHANG Zhaoming1,3,4,*, NIU Xuefeng2, HE Guojin1,3,4, LONG Tengfei1,3,4, YIN Ranyu1,3,4, JIANG Wei1,3,4, SUN Jiayue2

  1. Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
    2. College of Geo-exploration Science and Technology, Jilin University, Changchun 130026, China
    3. Key Laboratory of Earth Observation Hainan Province, Sanya 572029, China
    4. Sanya Institute of Remote Sensing, Sanya 572029, China
  • Received: 2019-07-19 Revised: 2019-11-25 Online: 2020-08-25 Published: 2020-10-25
  • Contact: ZHANG Zhaoming E-mail: pudc17@mails.jlu.edu.cn; zhangzm@radi.ac.cn
  • Supported by:
    National Natural Science Foundation of China (61731022); Strategic Priority Research Program of the Chinese Academy of Sciences (XDA19090300); National Key Research and Development Project (2016YFA0600302); National Key Research and Development Project (2016YFB0501502)

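Abstract (Chinese version):

Urban land information is one of the priorities of the United Nations 2030 Agenda for Sustainable Development. Cities are expanding rapidly worldwide, and obtaining urban land information quickly and accurately plays an important role in government decision-making. Urban land cover is highly complex and includes artificial buildings, trees, grassland, water bodies, and other land cover types. Acquiring urban land information through traditional manual surveying is time-consuming and labor-intensive, and the results are difficult to update promptly. Remote sensing satellite data such as Landsat provide a rich data source for urban land extraction. Urban land information extracted from satellite remote sensing data can supply basic data for scientific decision-making in future urban construction and management. Supervised classification applied to satellite remote sensing data can extract urban land information quickly, but the choice of feature variables is especially important for high-accuracy extraction. To study how different combinations of feature variables affect urban land extraction, Beijing was taken as the study area and a Landsat 8 OLI image acquired on July 10, 2017 as the data source. Data preprocessing, texture extraction, independent component analysis, and principal component analysis yielded 29 features in 4 dimensions, from which 7 feature combination schemes were selected for urban land extraction. The random forest algorithm was chosen as the supervised classifier because of its stable performance, high classification accuracy, and convenient evaluation of feature importance; it was used to extract urban land information, and accuracy was assessed to determine the optimal feature combination. The study found that combining spectral features with the image features obtained from independent component analysis gives an overall accuracy of 93.1% and a Kappa coefficient of 0.86 for urban land extraction, better than the results obtained with other features; and that the normalized variable importance of each variable output by the random forest algorithm after training is similar to the standard deviation of the feature means, so both the random forest variable importance estimate and a line chart of feature means can be used to evaluate variable importance.

Key words: random forest, independent component analysis, principal component analysis, gray level co-occurrence matrix, satellite remote sensing, urban land, Landsat 8, feature importance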
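
The overall accuracy and Kappa coefficient reported in the abstract are both standard summaries of a confusion matrix. The following is a minimal sketch of how the two quantities are usually computed; the function name `overall_accuracy_and_kappa` and the two-class matrix are hypothetical illustrations and are not data or code from the paper.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's Kappa) for a square confusion matrix.

    confusion[i][j] = number of samples of reference class i assigned to class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)

    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total

    # Chance agreement: sum over classes of (row total * column total) / total^2.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)

    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical matrix (rows: reference urban / non-urban); NOT results from the paper.
confusion = [[45, 5],
             [5, 45]]
oa, kappa = overall_accuracy_and_kappa(confusion)
print(f"OA = {oa:.2f}, Kappa = {kappa:.2f}")  # prints "OA = 0.90, Kappa = 0.80"
```

Kappa discounts the agreement expected by chance, which is why it is lower than the overall accuracy for the same matrix.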

Abstract:

Urban area information is of great significance for human development and is a focus of the United Nations (UN) 2030 Agenda for Sustainable Development. Urban areas are expanding rapidly in many parts of the world, and accurate, timely urban area information is very important for decision makers. However, land cover in urban areas is highly complex, including artificial buildings, trees, grasslands, and water bodies. Extracting urban land cover information by traditional manual survey is time-consuming, and the results are difficult to keep up to date. Freely accessible remote sensing satellite data such as Landsat provide a rich data source for urban area extraction, and urban area information extracted from spaceborne remote sensing images can provide basic scientific data for decision-making and for city construction and management. Supervised classification of satellite remote sensing data makes it possible to extract urban areas quickly. However, choosing appropriate feature variables is very important for obtaining accurate results; in particular, linear correlation between features has a significant impact on extraction accuracy. Applying an independent component analysis (ICA) transformation to satellite remote sensing image data yields linearly independent feature variables, which can effectively improve the accuracy of urban area extraction. Taking Beijing as the study area and a Landsat 8 Operational Land Imager (OLI) image (path/row: 123/32) acquired on July 10, 2017 as the experimental data, preprocessing, texture extraction, independent component analysis, and principal component analysis were performed, yielding 29 features in 4 dimensions, from which 7 feature combinations were selected. The Random Forest (RF) algorithm was then chosen for urban area extraction owing to its stable performance, high classification accuracy, and ability to evaluate feature importance. Based on the random forest algorithm, feature importance evaluation, urban area extraction, and accuracy assessment were carried out to determine the optimal feature combination for urban area extraction. It was found that: (1) the overall accuracy of urban area extraction with spectral and ICA-transformed features is 93.1% and the Kappa coefficient is 0.86, which is superior to the results with other features; (2) training the random forest on the data yields a normalized importance for each feature, and this normalized importance is similar to the standard deviation of the mean values of the features, indicating that the two are closely related and that both can be used to estimate the importance of the variables.
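
One possible reading of "the standard deviation of mean values of the features" is: for each feature, take its mean value within each land cover class, then measure how much those class means differ. The sketch below follows that reading and is an assumption, not the authors' procedure; the helper `class_mean_spread`, the use of the population standard deviation, and the sample values are all hypothetical, and the feature values are assumed to be already normalized to a common range.

```python
from statistics import mean, pstdev


def class_mean_spread(samples, labels):
    """For each feature, return the standard deviation of its per-class means.

    samples: list of feature vectors (one list of floats per training pixel).
    labels:  class label for each sample, in the same order.
    """
    classes = sorted(set(labels))
    n_features = len(samples[0])
    spreads = []
    for f in range(n_features):
        # Mean of feature f within each class.
        class_means = [
            mean(s[f] for s, y in zip(samples, labels) if y == c)
            for c in classes
        ]
        # A larger spread means the classes are further apart along this feature.
        spreads.append(pstdev(class_means))
    return spreads


# Hypothetical normalized samples with two features; NOT data from the paper.
samples = [
    [0.9, 0.5],
    [0.8, 0.4],
    [0.2, 0.5],
    [0.1, 0.6],
]
labels = ["urban", "urban", "non-urban", "non-urban"]
spreads = class_mean_spread(samples, labels)
print(spreads)
```

In this toy example the first feature separates the two classes far more than the second, so it has the larger spread; a feature of this kind would also be expected to rank higher in a random forest importance estimate.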

Key words: random forest, independent component analysis, principal component analysis, gray level co-occurrence matrix, satellite remote sensing, urban area, Landsat 8, feature importance