Journal of Geo-information Science ›› 2021, Vol. 23 ›› Issue (3): 514-523. doi: 10.12082/dqxxkx.2021.190805

• Remote Sensing Science and Application Technology •

Building Extraction from High-Resolution Remote Sensing Images Based on Multitask Learning

ZHU Panpan1,2, LI Shuaipeng1,2, ZHANG Liqiang1,2,*, LI Yang1,2

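  1. Faculty of Geographical Science, Beijing Normal University, Beijing 100875
  2. Beijing Key Laboratory of Environmental Remote Sensing and Digital City, Beijing Normal University, Beijing 100875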
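  • Received: 2019-12-26 Revised: 2020-04-23 Online: 2021-03-25 Published: 2021-05-25
  • Corresponding author: ZHANG Liqiang E-mail: zlyxbmsl@163.com; zhanglq@bnu.edu.cn
  • About the author: ZHU Panpan (1989- ), female, from Zhoukou, Henan Province, PhD candidate; her research focuses on information extraction from high-resolution optical remote sensing images. E-mail: zlyxbmsl@163.com
  • Funding:
    National Natural Science Foundation of China (41371324)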

Multitask Learning-based Building Extraction from High-Resolution Remote Sensing Images

ZHU Panpan1,2, LI Shuaipeng1,2, ZHANG Liqiang1,2,*, LI Yang1,2

  1. Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China
  2. Beijing Key Laboratory of Environmental Remote Sensing and Digital City, Beijing Normal University, Beijing 100875, China
  • Received: 2019-12-26 Revised: 2020-04-23 Online: 2021-03-25 Published: 2021-05-25
  • Contact: ZHANG Liqiang E-mail: zlyxbmsl@163.com; zhanglq@bnu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (41371324)

Abstract:

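Automatic building extraction is of great significance to urban development and planning and to disaster prevention and early warning. Research on building extraction has achieved good results, but most existing studies treat it as a semantic segmentation problem, which cannot distinguish individual buildings, and extraction accuracy still has room for improvement. In recent years, deep learning methods based on multitask learning have been widely applied in computer vision, but their application to the automatic interpretation of high-resolution remote sensing images remains to be developed further. Drawing on the ideas of the classic instance segmentation algorithm Mask R-CNN and the semantic segmentation algorithm U-Net, this study designs a deep neural network that embeds a semantic segmentation module into an instance segmentation framework and exploits the complementary information among tasks to improve the model's generalization performance. A bottom-up path augmentation structure shortens the path along which low-level detail is passed upward. Adaptive feature pooling lets the instance segmentation network make full use of multi-scale information. Buildings in remote sensing images are segmented automatically under multitask training, and the method is validated on the classic remote sensing dataset SpaceNet. The results show that the proposed multitask learning-based method achieves a building instance segmentation accuracy of 58.8% on the Paris dataset and 60.7% on the Khartoum dataset, 1%~2% higher than Mask R-CNN and U-Net.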

Key words: deep learning, multitask learning, semantic segmentation, instance segmentation, remote sensing images, building extraction, Mask R-CNN, U-Net

Abstract:

Automatic extraction of buildings is of great significance to urban development and planning and to disaster prevention and early warning. Research on building extraction has achieved good results, but existing methods often treat building extraction as a semantic segmentation problem and therefore cannot distinguish individual buildings, and there is still room for improvement in extraction accuracy. In recent years, deep learning methods based on multitask learning have been widely used in computer vision, but their application to the automatic interpretation of high-resolution remote sensing images remains underdeveloped. The instance segmentation branch of Mask R-CNN is built on object detection and predicts a segmentation mask for each region of interest. However, some spatial detail and the contextual information around the edge pixels of each region of interest are inevitably lost. The semantic segmentation task can introduce more contextual information into the network, so integrating the semantic segmentation and instance segmentation tasks can improve the generalization performance of the whole network. Building on a classic instance segmentation method (Mask R-CNN) and a typical semantic segmentation method (U-Net), this research designs a deep neural network that embeds a semantic segmentation module into the instance segmentation framework and improves the model's generalization performance by exploiting the complementary information among tasks. A bottom-up path augmentation structure shortens the path along which low-level information is passed upward. Adaptive feature pooling allows the instance segmentation network to make full use of multi-scale information. Automatic building segmentation of remote sensing images is performed under multitask training, and the proposed method is verified on the classic remote sensing dataset SpaceNet.
The results show that the building instance segmentation accuracy of the proposed method is 58.8% on the Paris dataset and 60.7% on the Khartoum dataset, an increase of 1%~2% over Mask R-CNN and U-Net used individually. The proposed method has two limitations: false and missed extractions of small buildings remain relatively frequent, and the accuracy of the extracted building boundaries needs to be improved.
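The multitask training idea can be illustrated with a minimal sketch. It assumes the joint objective is a weighted sum of a per-region instance mask loss and a whole-image semantic segmentation loss, both computed as binary cross-entropy. The function names, the weights `w_instance` and `w_semantic`, and the toy values are hypothetical and are not taken from the paper. The classification and bounding-box regression terms that Mask R-CNN also optimizes are omitted for brevity.

```python
import math


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy over flattened per-pixel probabilities."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(pred)


def multitask_loss(instance_pred, instance_gt, semantic_pred, semantic_gt,
                   w_instance=1.0, w_semantic=1.0):
    """Weighted sum of the two task losses (weights are assumed, not from the paper)."""
    l_instance = binary_cross_entropy(instance_pred, instance_gt)
    l_semantic = binary_cross_entropy(semantic_pred, semantic_gt)
    return w_instance * l_instance + w_semantic * l_semantic


# Toy example: a flattened mask for one region of interest (instance branch)
# and a flattened whole-image building mask (semantic branch).
roi_pred = [0.9, 0.8, 0.2, 0.1]
roi_gt = [1, 1, 0, 0]
image_pred = [0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
image_gt = [1, 1, 0, 0, 0, 0]

loss = multitask_loss(roi_pred, roi_gt, image_pred, image_gt)
print(f"joint loss: {loss:.4f}")
```

Because both terms are minimized through the same shared backbone features, gradients from the semantic branch can supply image-level context that the per-region mask branch does not see on its own.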

Key words: deep learning, multi-task learning, semantic segmentation, instance segmentation, remote sensing, building extraction, Mask R-CNN, U-Net