Journal of Geo-Information Science ›› 2023, Vol. 25 ›› Issue (3): 638-653. doi: 10.12082/dqxxkx.2023.220708

• Remote Sensing Science and Application Technology •

Object Detection in Remote Sensing Images by Fusing Multi-neuron Sparse Features and Hierarchical Depth Features

GAO Pengfei, CAO Xuefeng, LI Ke, YOU Xiong

  1. Institute of Geospatial Information, PLA Strategic Support Force Information Engineering University, Zhengzhou 450001, China
  • Received: 2022-09-20  Revised: 2022-12-02  Online: 2023-03-25  Published: 2023-04-19
  • Corresponding author: * LI Ke (b. 1977), male, from Qingxian, Hebei, PhD, professor. Research interests: intelligent perception of geographic environments and data engineering. E-mail: like19771223@163.com
  • First author: GAO Pengfei (b. 1997), male, from Kaifeng, Henan, master's student. Research interests: deep learning and geospatial data engineering. E-mail: gao_pengfei2020@163.com
  • Supported by:
    National Natural Science Foundation of China (41871322); National Natural Science Foundation of China (42130112)



Abstract:

Object detection in remote sensing images is of great significance to urban planning, natural resource survey, land surveying, and other fields. The rapid development of deep learning has greatly improved the accuracy of object detection, but remote sensing imagery still poses several challenges: object scales vary widely (object resolutions range from a dozen pixels to hundreds), backgrounds are highly complex because the images capture full-scene geographic information, different classes of targets can look very similar in appearance, and targets within a class are diverse. To address these problems, this paper proposes a deep convolutional network architecture that fuses a Multi-Neuron Sparse feature extraction Block (MNB) and a Hierarchical Deep Feature Fusion Block (HDFB). The MNB uses multiple convolutional branches to mimic the multiple synaptic structures of a neuron and extract sparsely distributed features; as network layers are stacked, sparse features are gathered over a larger receptive field, improving the quality of the captured multi-scale target features. The HDFB extracts contextual features at different depths with atrous (dilated) convolution and then fuses them through a purpose-built multi-receptive-field depth feature fusion network, realizing the fusion of local and global features at the feature-map level. Experiments are conducted on the large-scale public DIOR dataset.
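The abstract describes the two blocks only at a high level. As an illustration only, the following PyTorch sketch shows one plausible reading of each: parallel convolution branches whose outputs are summed (MNB), and atrous convolutions at increasing dilation rates merged pairwise in a small tree of fusions (HDFB). The branch count, kernel sizes, and dilation rates are assumptions, not the authors' actual settings.

```python
import torch
import torch.nn as nn


class MultiNeuronSparseBlock(nn.Module):
    """Sketch of the MNB: parallel conv branches (one per simulated
    'synapse') whose outputs are summed, so each layer captures sparse
    features at several receptive-field sizes at once."""

    def __init__(self, in_ch, out_ch, kernel_sizes=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.SiLU(),
            )
            for k in kernel_sizes
        )

    def forward(self, x):
        # Summing (rather than concatenating) keeps the channel count
        # fixed while fusing the branches' multi-scale responses.
        return sum(branch(x) for branch in self.branches)


class HierarchicalDeepFusionBlock(nn.Module):
    """Sketch of the HDFB: atrous convs at increasing dilation rates
    capture context at several depths; neighbouring levels are merged
    pairwise (a small 'tree' of fusions) down to a single map."""

    def __init__(self, ch, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.levels = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=d, dilation=d, bias=False)
            for d in dilations
        )
        self.fuse = nn.Conv2d(ch, ch, 1)

    def forward(self, x):
        feats = [level(x) for level in self.levels]
        # Tree-style fusion: merge adjacent levels until one map remains,
        # combining local (small-dilation) and global (large-dilation)
        # context at the feature-map level.
        while len(feats) > 1:
            feats = [feats[i] + feats[i + 1] for i in range(0, len(feats), 2)]
        return self.fuse(feats[0])
```

Both blocks preserve spatial resolution (padding matches kernel size and dilation), so they can be dropped into a backbone or a feature-fusion neck without reshaping.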
The results show that: (1) the overall accuracy of the method reaches 72.5%, with an average detection time of 3.8 ms per remote sensing image; the method detects multi-scale objects with high appearance similarity against complex backgrounds more accurately than other state-of-the-art methods; (2) the MNB improves detection accuracy for multi-scale and appearance-ambiguous targets, raising overall accuracy by 5.8% compared with detection using step-wise branches, and summing the outputs of the branches helps achieve better feature fusion; (3) the HDFB extracts hierarchical depth features through its multi-receptive-field fusion network, offering a new way to fuse local and global features at the feature-map level and improving the network's ability to exploit contextual information; (4) the reconstructed PANet feature fusion network uses the MNB to fuse sparse features at different scales, which effectively improves the PANet structure for object detection in remote sensing images. Many factors influence the final performance of the algorithm. On the one hand, a high-quality dataset is the basis of higher accuracy: image quality, target occlusion, and large intra-class variability profoundly affect how well the detector trains. On the other hand, model parameter settings are key to accuracy, such as clustering the dataset's bounding boxes to obtain anchor priors that improve the best recall, and ensuring that the receptive field of the HDFB covers the feature map. We conclude that the Multi-Neuron Sparse feature extraction network improves feature quality, while the Hierarchical Deep Feature Fusion Block fuses contextual information and reduces the impact of complex background noise, yielding better performance in object detection tasks on remote sensing images.
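The abstract notes that anchor priors are obtained by clustering the dataset's bounding boxes to improve the best recall, without giving the procedure. A common choice in anchor-based detectors is k-means over box dimensions with 1 − IoU as the distance; the NumPy sketch below illustrates that approach under those assumptions (the box data and the number of clusters are illustrative, not from the paper).

```python
import numpy as np


def iou_wh(boxes, anchors):
    """IoU between (w, h) pairs, treating boxes and anchors as if they
    shared a common corner (only shape matters for anchor priors)."""
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    union = boxes[:, 0:1] * boxes[:, 1:2] \
        + (anchors[:, 0] * anchors[:, 1])[None, :] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """k-means on box (w, h) with 1 - IoU as the distance: the usual way
    anchor priors are chosen so the best recall over the dataset rises."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        # Assign each box to the anchor it overlaps most (highest IoU).
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)
        new = np.array([
            boxes[assign == i].mean(axis=0) if np.any(assign == i) else anchors[i]
            for i in range(k)
        ])
        if np.allclose(new, anchors):
            break  # converged
        anchors = new
    return anchors[np.argsort(anchors.prod(axis=1))]  # sorted by area
```

The resulting anchors, sorted by area, can then be split across detection scales so that each feature-map level handles boxes near its own receptive-field size.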

Key words: remote sensing image, convolutional neural network, sparse feature, hierarchical depth feature, atrous convolution, multi-branch structure, receptive field, multi-scale object