An Optical Small Object Detection Algorithm Using Hybrid Features and Multi-Scale Fusion

SHI Shihao; SHI Qunshan*; ZHOU Yang; HU Xiaofei; QI Kai

doi:10.12082/dqxxkx.2025.250015

Journal of Geo-information Science, 2025, Vol. 27, Issue 7: 1596-1607


Remote Sensing Science and Application Technology



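*SHI Qunshan (born 1985), male, from Yancheng, Jiangsu; Ph.D., associate professor; research interests include photogrammetry and remote sensing. E-mail: hills1@163.com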

Author Contributions

The study was designed by SHI Shihao and SHI Qunshan. SHI Shihao and QI Kai conducted the experiments; SHI Shihao, ZHOU Yang and HU Xiaofei contributed to the writing and revision of the manuscript. All authors have read and approved the final manuscript.

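SHI Shihao (born 1999), male, from Kaifeng, Henan; master's student; research interests include photogrammetry and remote sensing, and object detection and tracking. E-mail: syw15690860529@163.com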



Institute of Geospatial Information, Information Engineering University, Zhengzhou 450001, China

*SHI Qunshan, E-mail: hills1@163.com

Received date: 2025-01-06

Revised date: 2025-04-22

Online published: 2025-07-07

Supported by

National Natural Science Foundation of China (42001338)

Natural Science Foundation of Henan Province (202300410536)

Joint Fund of Collaborative Innovation Center of Geo-Information Technology for Smart Central Plains, Henan Province and Key Laboratory of Spatiotemporal Perception and Intelligent Processing, Ministry of Natural Resources (212108)



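Cite this article

SHI Shihao, SHI Qunshan, ZHOU Yang, HU Xiaofei, QI Kai. An Optical Small Object Detection Algorithm Using Hybrid Features and Multi-Scale Fusion[J]. Journal of Geo-information Science, 2025, 27(7): 1596-1607. DOI: 10.12082/dqxxkx.2025.250015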

Abstract

[Objectives] Small object detection is of great significance in both military and civilian applications. However, owing to low resolution, high-noise environments, target occlusion, and complex backgrounds, traditional detection methods struggle to achieve the accuracy and robustness that practical use requires, and detecting small objects in complex scenes remains highly challenging. This paper therefore proposes a small object detection algorithm based on hybrid features and multi-scale fusion. [Methods] First, a Hybrid Conv and Transformer Block (HCTB) is designed to exploit both local and global context information, strengthening the network's perception of small objects while improving computational efficiency and feature extraction capability. Second, a Multi-Dilated Shared Kernel Conv (MDSKC) module is proposed, which extends the receptive field of the backbone through dilated convolutions with different dilation rates and thereby extracts multi-scale features efficiently. Finally, the Omni-Kernel Cross Stage Model (OKCSM), a full-kernel cross-stage feature fusion module built on the ideas of Omni-Kernel and Cross Stage Partial, is used to optimize the small-object feature pyramid network, preserving more small-object information and improving detection performance. [Results] Ablation and comparison experiments were conducted on the VisDrone2019 and TinyPerson datasets. Compared with the baseline model YOLOv8n, the proposed method improves precision, recall, mAP@50, and mAP@50:95 by 1.3%, 3.1%, 3%, and 1.9%, respectively, on VisDrone2019, and by 3.6%, 1.3%, 2.1%, and 0.7%, respectively, on TinyPerson, with a model size of only 6.3 MB and a computational cost of only 11.3 GFLOPs. In comparison experiments against classical algorithms such as HIC-YOLOv5, TPH-YOLOv5, and Drone-YOLO, the proposed algorithm outperforms the other methods. [Conclusions] The proposed algorithm effectively improves detection accuracy and performs well on small object detection in complex scenes.
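The abstract does not spell out the internal structure of MDSKC, so the following is only a minimal one-dimensional sketch of the general idea it names: one set of kernel weights reused at several dilation rates. The function names, the dilation rates (1, 2, 3), and the choice to sum the branches are illustrative assumptions, not the authors' implementation. The one fact it relies on is the standard relation that a kernel of size k with dilation d spans k + (k - 1)(d - 1) input positions.

```python
from typing import List, Sequence


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Input span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal: Sequence[float], kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """Same-length 1D dilated cross-correlation with zero padding.

    Assumes an odd kernel length so the kernel has a well-defined centre.
    """
    half = (len(kernel) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i + (j - half) * dilation  # taps are spaced `dilation` apart
            if 0 <= idx < len(signal):       # positions outside count as zero
                acc += weight * signal[idx]
        out.append(acc)
    return out


def shared_kernel_multi_dilation(signal: Sequence[float],
                                 kernel: Sequence[float],
                                 dilations: Sequence[int] = (1, 2, 3)) -> List[float]:
    """Apply ONE shared kernel at several dilation rates and sum the branches.

    Summation is an illustrative fusion choice (hypothetical); the paper may
    fuse the branches differently.
    """
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(values) for values in zip(*branches)]


if __name__ == "__main__":
    impulse = [0.0] * 4 + [1.0] + [0.0] * 4
    shared = [1.0, 1.0, 1.0]
    for d in (1, 2, 3):
        print("dilation", d, "-> effective size", effective_kernel_size(3, d))
    print(shared_kernel_multi_dilation(impulse, shared))
```

With a 3-tap kernel, dilation rates 1, 2, and 3 cover spans of 3, 5, and 7 positions while the number of learned weights stays at three, which is the sense in which a shared kernel widens the receptive field without adding parameters.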

Key words: small target detection; multiscale feature fusion; feature pyramid; dilated convolution; YOLOv8; multi-dilated; hybrid feature extraction

Conflicts of Interest

All authors declare that they have no relevant conflicts of interest.

