Journal of Geo-information Science ›› 2019, Vol. 21 ›› Issue (11): 1779-1789. doi: 10.12082/dqxxkx.2019.190285

Building Extraction based on SE-Unet

LIU Hao1,2, LUO Jiancheng1,2,*, HUANG Bo3, YANG Haiping4, HU Xiaodong1, XU Nan1,2, XIA Liegang4

    1. State Key Laboratory of Remote Sensing Science, Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100101, China
    2. University of Chinese Academy of Sciences, Beijing 100049, China
    3. Department of Geography and Resource Management, The Chinese University of Hong Kong, Hong Kong 999077, China
    4. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310024, China
  • Received: 2019-06-09 Revised: 2019-08-12 Online: 2019-10-25 Published: 2019-12-11
  • Contact: LUO Jiancheng
  • Supported by:
    National Natural Science Foundation of China (No. 41631179); Zhejiang Provincial Natural Science Foundation of China (No. LQ19D010006); National Key Research and Development Program of China (No. 2017YFB0503600)


Automatic extraction of urban buildings is of great importance in applications such as urban planning and disaster prevention. High-resolution remote sensing imagery contains sufficient information for this purpose and is an ideal data source for precise extraction. Traditional approaches (excluding visual interpretation) require researchers to manually design features that describe buildings and distinguish them from other objects. Unfortunately, the complexity of high-resolution imagery makes these features fragile under changes in sensors, imaging conditions, and locations. Recently, convolutional neural networks, which have succeeded in many visual applications including image segmentation, have been used to extract buildings from high spatial resolution remote sensing imagery and have achieved desirable results. However, convolutional neural networks still have much room for improvement, especially in network architecture and loss functions. This paper proposes SE-Unet, a convolutional neural network that is based on the U-Net architecture and employs squeeze-and-excitation modules in its encoder. The squeeze-and-excitation modules activate useful features and deactivate useless features in an adaptively weighted manner, which can remarkably increase network capacity at the cost of only a few extra parameters and little extra memory. The decoder of SE-Unet concatenates the corresponding features from the encoder to recover spatial information, as U-Net does. A combined dice and cross-entropy loss function was used to train the network and successfully alleviated the sample imbalance problem in building extraction. All experiments were performed on the Massachusetts building dataset for evaluation. Compared with SegNet, LinkNet, U-Net, and other networks, SE-Unet showed the best results in all evaluation metrics, achieving 0.8704, 0.8496, 0.8599, and 0.9472 in terms of precision, recall, F1-score, and overall accuracy, respectively. SE-Unet also showed better precision when extracting buildings that vary in size and shape. Our findings show that squeeze-and-excitation modules can effectively strengthen network capability, and that the dice and cross-entropy loss function can be useful in other sample-imbalanced situations that involve high-resolution remote sensing imagery.
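
To illustrate why adding a dice term to cross-entropy can help with sample imbalance, the following is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not state how the two terms are weighted or smoothed, so the equal-weight sum, the `smooth` constant, and the function names `bce_loss`, `dice_loss`, and `dice_ce_loss` are all assumptions made for illustration. Inputs are flat lists of per-pixel building probabilities and 0/1 labels.

```python
import math


def bce_loss(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over all pixels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def dice_loss(probs, labels, smooth=1.0):
    """Soft dice loss: 1 - (2 * overlap + smooth) / (sum(P) + sum(G) + smooth).

    The smoothing constant is an assumed choice; it keeps the ratio
    defined when a tile contains no building pixels at all.
    """
    overlap = sum(p * y for p, y in zip(probs, labels))
    denom = sum(probs) + sum(labels)
    return 1.0 - (2.0 * overlap + smooth) / (denom + smooth)


def dice_ce_loss(probs, labels, dice_weight=1.0):
    """Assumed combination: cross-entropy plus a weighted dice term."""
    return bce_loss(probs, labels) + dice_weight * dice_loss(probs, labels)


# Toy imbalanced tile: 5 building pixels out of 100.
labels = [1] * 5 + [0] * 95
all_background = [0.01] * 100            # misses every building
good_prediction = [0.9] * 5 + [0.01] * 95

for name, probs in [("all background", all_background),
                    ("good prediction", good_prediction)]:
    print(name,
          "bce=%.4f" % bce_loss(probs, labels),
          "dice=%.4f" % dice_loss(probs, labels),
          "combined=%.4f" % dice_ce_loss(probs, labels))
```

In this toy tile, predicting background everywhere keeps the averaged cross-entropy fairly small because 95% of the pixels are correct, whereas the dice term stays close to 1 because it depends only on the overlap with the building pixels. The combined loss therefore penalizes missed buildings more strongly than cross-entropy alone.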

Key words: high spatial resolution remote sensing imagery, Massachusetts building dataset, building extraction, deep learning, convolutional neural network, SE-Unet, loss function