Journal of Geo-Information Science ›› 2022, Vol. 24 ›› Issue (6): 1189-1203. doi: 10.12082/dqxxkx.2022.210727

• Remote Sensing Science and Application Technology •

基于MAEU-CNN的高分辨率遥感影像建筑物提取

张华1, 郑祥成1, 郑南山1,*, 史文中2

  1. 中国矿业大学环境与测绘学院,徐州 221116
    2. 香港理工大学土地测量及地理资讯学系,香港 999077
  • 收稿日期:2021-11-15 修回日期:2021-12-01 出版日期:2022-06-25 发布日期:2022-08-25
  • Corresponding author: *ZHENG Nanshan (1974— ), male, born in Anqing, Anhui, PhD, professor, mainly engaged in remote sensing data processing and application. E-mail: znshcumt@163.com
  • About the first author: ZHANG Hua (1979— ), male, born in Hefei, Anhui, PhD, associate professor, mainly engaged in intelligent interpretation of remote sensing data and GIS theory and application. E-mail: zhhua_79@163.com
  • 基金资助:
    国家自然科学基金项目(41971400);国家自然科学基金项目(41974039)

Building Extraction from High Spatial Resolution Imagery based on MAEU-CNN

ZHANG Hua1, ZHENG Xiangcheng1, ZHENG Nanshan1,*, SHI Wenzhong2

  1. School of Environment and Spatial Informatics, China University of Mining and Technology, Xuzhou 221116, China
    2. Department of Land Surveying and Geo-informatics, The Hong Kong Polytechnic University, Hong Kong 999077, China
  • Received:2021-11-15 Revised:2021-12-01 Online:2022-06-25 Published:2022-08-25
  • Supported by:
    National Natural Science Foundation of China(41971400);National Natural Science Foundation of China(41974039)

摘要:

从高空间分辨率图像(HSRI)中提取建筑物信息在遥感应用领域具有重要意义。然而,由于遥感影像中的建筑物尺度变化大、背景复杂和外观变化大等因素,从HSRI中自动提取建筑物仍然是一项具有挑战性的任务。特别是从影像中同时提取小型建筑物群和具有精确边界的大型建筑物时,难度更大。为解决这些问题,本文提出了一种端到端的编码器-解码器神经网络模型,用于从HSRI中自动提取建筑物。所设计的网络称为MAEU-CNN(Multiscale Feature Enhanced U-shaped CNN with Attention Block and Edge Constraint)。首先,在设计的网络编码部分加入多尺度特征融合(MFF)模块,使网络能够更好地聚集多个尺度特征。然后,在编码器和解码器部分之间添加了多尺度特征增强模块(MFEF),以获得不同尺寸的感受野,用于获取更多的多尺度上下文信息。在跳跃连接部分引入双重注意机制,自适应地选择具有代表性的特征图用于提取建筑物。最后,为了进一步解决MAEU-CNN中由于池化及卷积操作导致的分割结果边界模糊的问题,引入多任务学习机制,将建筑物的边界几何信息融入网络中以优化提取的建筑物边界,最终获得精确边界的建筑物信息。MAEU-CNN在ISPRS Vaihingen语义标记数据集和WHU航空影像数据集2种不同尺度建筑物数据集上进行了试验分析,在ISPRS Vaihingen语义标记数据集上,MAEU-CNN在精度、F1分数和IoU指标中获得了最高精度,分别达到了93.4%、93.62%和88.01%;在WHU航空影像数据集上,召回率、F1分数和IoU指标中也获得了最高精度,分别达到了95.45%、95.58%和91.54%。结果表明,本文所提出的MAEU-CNN从遥感图像中提取建筑物信息精度较高,并且对于不同尺度具有较强的鲁棒性。
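The MFEF module described above obtains receptive fields of different sizes between the encoder and decoder. A common way to realize this is a set of parallel dilated (atrous) convolutions, ASPP-style. The sketch below is a hypothetical illustration of that idea, not the authors' code; the module name, channel widths, and dilation rates are illustrative assumptions.

```python
# Hypothetical sketch of a multiscale feature enhancement block built from
# parallel dilated 3x3 convolutions. NOT the paper's implementation; the
# dilation rates (1, 2, 4, 8) and channel sizes are illustrative assumptions.
import torch
import torch.nn as nn

class MultiScaleFeatureEnhancement(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, dilations=(1, 2, 4, 8)):
        super().__init__()
        # One branch per dilation rate; padding == dilation keeps the spatial
        # size unchanged, so the branch outputs can be concatenated.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        )
        # A 1x1 convolution fuses the concatenated multiscale responses.
        self.fuse = nn.Conv2d(out_ch * len(dilations), out_ch, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

block = MultiScaleFeatureEnhancement(64, 64)
y = block(torch.randn(1, 64, 32, 32))  # spatial size is preserved
```

Larger dilation rates enlarge the receptive field without pooling, which is why such a block can gather multiscale context while keeping the feature-map resolution.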

关键词: 建筑物提取, 深度学习, 注意力机制, 多任务学习, 多尺度特征增强, 感受野, 边界约束, 高分辨率遥感影像

Abstract:

Extraction of buildings from High Spatial Resolution Imagery (HSRI) plays an important role in remote sensing applications. However, automatically extracting buildings from HSRI is still a challenging task due to factors such as the large scale variation of buildings, complex backgrounds, and large variation in appearance. In particular, it is difficult to simultaneously extract crowded small buildings and large buildings with accurate boundaries. To address these challenges, this paper presents an end-to-end encoder-decoder model to automatically extract buildings from HSRI. The designed network is called MAEU-CNN (Multiscale feature enhanced U-shaped CNN with Attention block and Edge constraint). Firstly, a Multiscale Feature Fusion (MFF) module is adopted in the encoder part of the network, which enables the network to better aggregate features from multiple scales. Secondly, a Multiscale Feature Enhancement (MFEF) module is added between the encoder and the decoder to obtain receptive fields of different sizes and thereby capture richer multiscale context information. Thirdly, a dual attention mechanism is introduced in place of direct skip connections to adaptively select representative feature maps for building extraction. Lastly, to further alleviate the blurred boundaries in the segmentation results caused by the pooling and convolution operations in MAEU-CNN, the geometric information of building boundaries is incorporated into the network through multi-task learning with a distance class map, producing fine-grained segmentations with precise boundaries. The performance of MAEU-CNN is examined on two data sets with different building scales. The results show that MAEU-CNN obtains the highest accuracy on each data set: Precision, F1 score, and IoU reach 93.4%, 93.62%, and 88.01%, respectively, on the ISPRS Vaihingen semantic labeling contest data set, and Recall, F1 score, and IoU reach 95.45%, 95.58%, and 91.54%, respectively, on the WHU aerial image data set.
Experimental results demonstrate that the proposed MAEU-CNN achieves high accuracy in extracting buildings from remotely sensed imagery and shows strong robustness across different building scales.
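The Precision, Recall, F1 score, and IoU figures quoted above are standard pixel-wise metrics for binary segmentation. A minimal sketch of how they are computed from prediction and ground-truth masks (the paper's exact evaluation protocol may differ in details such as boundary handling):

```python
# Pixel-wise Precision, Recall, F1 and IoU for binary building masks.
# Minimal illustrative sketch, not the authors' evaluation code.
def binary_metrics(pred, truth):
    """pred, truth: flat sequences of 0/1 labels (building = 1)."""
    tp = sum(p == 1 and t == 1 for p, t in zip(pred, truth))  # true positives
    fp = sum(p == 1 and t == 0 for p, t in zip(pred, truth))  # false positives
    fn = sum(p == 0 and t == 1 for p, t in zip(pred, truth))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1, iou

# Example with 4 pixels: TP = 2, FP = 1, FN = 1
p, r, f1, iou = binary_metrics([1, 1, 1, 0], [1, 1, 0, 1])
# -> precision = 2/3, recall = 2/3, f1 = 2/3, iou = 0.5
```

Note that IoU = TP / (TP + FP + FN) penalizes both over- and under-segmentation, which is why it is the strictest of the four figures reported.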

Key words: building extraction, deep learning, attention mechanism, multi-task learning, multiscale feature enhancement, receptive field, edge constraint, high spatial resolution imagery
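The edge constraint in the abstract is supervised through an auxiliary "distance class map". A hypothetical sketch of deriving such a training target from a binary building mask: each pixel's Euclidean distance to the nearest building boundary is binned into a small number of ordinal classes. The bin edges below are illustrative assumptions, not the paper's exact recipe.

```python
# Hypothetical construction of a distance class map for multi-task boundary
# supervision. Bin edges are illustrative; the paper's recipe may differ.
import numpy as np
from scipy.ndimage import distance_transform_edt

def distance_class_map(mask: np.ndarray, bins=(1, 3, 7, 15)) -> np.ndarray:
    """mask: 2-D array of 0/1 (building = 1). Returns an integer class per pixel."""
    # distance_transform_edt(mask) gives, inside buildings, the distance to the
    # nearest background pixel; distance_transform_edt(1 - mask) gives, outside,
    # the distance to the nearest building pixel. Their sum is the per-pixel
    # distance to the building boundary.
    dist = distance_transform_edt(mask) + distance_transform_edt(1 - mask)
    # np.digitize bins each distance into one of len(bins) + 1 ordinal classes,
    # so class 0 hugs the boundary and higher classes lie farther from it.
    return np.digitize(dist, bins)

mask = np.zeros((64, 64))
mask[16:48, 16:48] = 1  # one square "building"
classes = distance_class_map(mask)
```

Predicting these ordinal classes as a second task forces the network to stay sensitive to boundary geometry that pooling would otherwise blur away.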