地球信息科学理论与方法

VoxTNT:基于多尺度Transformer的点云3D目标检测方法

  • 郑强文 1
  • 吴升 2,*
  • 魏婧卉 1
  • 1.福州大学计算机与大数据学院,福州 350100
  • 2.福州大学数字中国研究院(福建),福州 350100
*吴升(1972—),男,福建松溪人,博士,教授,主要从事大数据分析与可视化、数字化规划、数字政府研究。E-mail:

作者贡献:Author Contributions

郑强文和吴升参与方法设计;郑强文和魏婧卉参与实验设计;郑强文完成实验操作;郑强文和魏婧卉参与实验结果分析;郑强文完成论文初稿;郑强文和吴升参与论文的写作和修改;吴升提供资助经费。所有作者均阅读并同意最终稿件的提交。

ZHENG Qiangwen and WU Sheng contributed to methodology design; ZHENG Qiangwen and WEI Jinghui contributed to experimental design; ZHENG Qiangwen performed the experiments; ZHENG Qiangwen and WEI Jinghui analyzed the experimental results; ZHENG Qiangwen drafted the manuscript; ZHENG Qiangwen and WU Sheng contributed to writing and revising the manuscript; WU Sheng provided funding. All authors have read the final version of the manuscript and consented to its submission.

郑强文(1990—),男,福建龙岩人,博士生,主要从事自动驾驶领域感知技术研究。E-mail:

收稿日期: 2025-03-14

  修回日期: 2025-04-18

  网络出版日期: 2025-06-06

基金资助

公共数据开发利用科技创新团队(闽教科〔2023〕15号)

VoxTNT: A Multi-Scale Transformer-based Approach for 3D Object Detection in Point Clouds

  • ZHENG Qiangwen 1
  • WU Sheng 2,*
  • WEI Jinghui 1
  • 1. The College of Computer and Data Science, Fuzhou University, Fuzhou 350100, China
  • 2. The Academy of Digital China (Fujian), Fuzhou University, Fuzhou 350100, China
*WU Sheng, E-mail:

Received date: 2025-03-14

  Revised date: 2025-04-18

  Online published: 2025-06-06

Supported by

Fujian Provincial Program for Innovative Research Team, Fujian ES [2023] No.15.

摘要

【背景】传统方法因静态感受野设计较难适配城市自动驾驶场景中汽车、行人及骑行者等目标的显著尺度差异,且跨尺度特征融合易引发层级干扰。【方法】针对自动驾驶场景中多类别、多尺寸目标的3D检测中跨尺度表征一致性的关键挑战,本研究提出基于均衡化感受野的3D目标检测方法VoxTNT,通过局部-全局协同注意力机制提升检测性能。在局部层面,设计了PointSetFormer模块,引入诱导集注意力模块(Induced Set Attention Block, ISAB),通过约简的交叉注意力聚合高密度点云的细粒度几何特征,突破传统体素均值池化的信息损失瓶颈;在全局层面,设计了VoxelFormerFFN模块,将非空体素抽象为超点集并实施跨体素ISAB交互,建立长程上下文依赖关系,并将全局特征学习计算负载从O(N²)压缩至O(M²)(M<<N,M为非空体素数量),规避了将复杂的Transformer直接应用于原始点云所造成的高计算复杂度。该双域耦合架构实现了局部细粒度感知与全局语义关联的动态平衡,有效缓解固定感受野和多尺度融合导致的特征建模偏差。【结果】实验表明,该方法在KITTI数据集单阶段检测下,中等难度级别的行人检测精度AP(Average Precision)值达到59.56%,较SECOND基线提高约12.4%;两阶段检测下以66.54%的综合指标mAP(mean Average Precision)领先次优方法BSAODet的66.10%。同时,在WOD数据集中验证了方法的有效性,综合指标mAP达到66.09%,分别超越SECOND和PointPillars基线7.7%和8.5%。消融实验进一步表明,均衡化局部和全局感受野的3D特征学习机制能显著提升小目标检测精度(如在KITTI数据集中全组件消融的情况下,中等难度级别的行人和骑行者检测精度分别下降10.8%和10.0%),同时保持大目标检测的稳定性。【结论】本研究为解决自动驾驶多尺度目标检测难题提供了新思路,未来将优化模型结构以进一步提升效能。
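摘要中所述的"约简的交叉注意力"即Lee等人提出的诱导集注意力模块(ISAB,文献[76])。下面给出一个最小化的NumPy示意(仅为说明机制,省略了可学习投影与多头结构,变量名均为示意性命名,并非论文实现):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    # scaled dot-product attention: (nq,d) x (nk,d) -> (nq,d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def isab(x, inducing):
    # Induced Set Attention Block (projections omitted):
    # m inducing points first summarize the n input points (cost O(n*m)),
    # then the inputs attend back to that summary (cost O(n*m)),
    # so the block scales as O(n*m) instead of O(n^2) self-attention.
    h = attend(inducing, x, x)   # (m,d) summary of the point set
    return attend(x, h, h)       # (n,d) updated per-point features

rng = np.random.default_rng(0)
n, m, d = 1000, 16, 32           # n points in a voxel, m << n inducing points
x = rng.normal(size=(n, d))
inducing = rng.normal(size=(m, d))
out = isab(x, inducing)
assert out.shape == (n, d)
```

输出特征与输入点一一对应,且对输入点的排列保持等变,这正是点云特征学习所需的集合性质。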

本文引用格式

郑强文 , 吴升 , 魏婧卉 . VoxTNT:基于多尺度Transformer的点云3D目标检测方法[J]. 地球信息科学学报, 2025 , 27(6) : 1361 -1380 . DOI: 10.12082/dqxxkx.2025.250122

Abstract

[Background] Traditional methods, due to their static receptive field design, struggle to adapt to the significant scale differences among cars, pedestrians, and cyclists in urban autonomous driving scenarios. Moreover, cross-scale feature fusion often leads to hierarchical interference. [Methodology] To address the key challenge of cross-scale representation consistency in 3D object detection for multi-class, multi-scale objects in autonomous driving scenarios, this study proposes a novel method named VoxTNT. VoxTNT leverages an equalized receptive field and a local-global collaborative attention mechanism to enhance detection performance. At the local level, a PointSetFormer module is introduced, incorporating an Induced Set Attention Block (ISAB) to aggregate fine-grained geometric features from high-density point clouds through reduced cross-attention. This design overcomes the information loss typically associated with traditional voxel mean pooling. At the global level, a VoxelFormerFFN module is designed, which abstracts non-empty voxels into a super-point set and applies cross-voxel ISAB interactions to capture long-range contextual dependencies. This approach reduces the computational complexity of global feature learning from O(N²) to O(M²) (where M << N, M is the number of non-empty voxels), avoiding the high computational complexity associated with directly applying complex Transformers to raw point clouds. This dual-domain coupled architecture achieves a dynamic balance between local fine-grained perception and global semantic association, effectively mitigating modeling bias caused by fixed receptive fields and multi-scale fusion. [Results] Experiments demonstrate that the proposed method achieves a single-stage detection Average Precision (AP) of 59.56% for moderate-level pedestrian detection on the KITTI dataset, an improvement of approximately 12.4% over the SECOND baseline. For two-stage detection, it achieves a mean Average Precision (mAP) of 66.54%, outperforming the second-best method, BSAODet, which achieves 66.10%. Validation on the WOD dataset further confirms the method's effectiveness, achieving 66.09% mAP, which outperforms the SECOND and PointPillars baselines by 7.7% and 8.5%, respectively. Ablation studies demonstrate that the proposed equalized local-global receptive field mechanism significantly improves detection accuracy for small objects. For example, on the KITTI dataset, full component ablation resulted in a 10.8% and 10.0% drop in AP for moderate-level pedestrian and cyclist detection, respectively, while maintaining stable performance for large-object detection. [Conclusions] This study presents a novel approach to tackling the challenges of multi-scale object detection in autonomous driving scenarios. Future work will focus on optimizing the model architecture to further enhance efficiency.
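The O(N²)-to-O(M²) reduction rests on attending over non-empty voxels ("super-points") rather than raw points. A hypothetical NumPy sketch of that abstraction step — the voxel size is arbitrary, and mean pooling stands in for the learned per-voxel features the paper actually uses:

```python
import numpy as np

# N raw points collapse to M non-empty voxels, so global attention over
# voxels costs O(M^2) instead of O(N^2) over the raw point cloud.
rng = np.random.default_rng(1)
points = rng.uniform(0, 40, size=(20000, 3))   # N = 20 000 LiDAR-like points
voxel_size = 2.0                               # illustrative, not the paper's setting

# integer voxel coordinates, deduplicated to the set of non-empty voxels
voxel_idx = np.floor(points / voxel_size).astype(np.int64)
nonempty, inverse = np.unique(voxel_idx, axis=0, return_inverse=True)
inverse = inverse.ravel()                      # point -> voxel assignment
M, N = len(nonempty), len(points)
assert M < N                                   # M << N in practice

# mean-pool each voxel's points into one super-point (crude stand-in
# for the learned per-voxel features produced by PointSetFormer)
superpoints = np.zeros((M, 3))
np.add.at(superpoints, inverse, points)        # scatter-add points into voxels
superpoints /= np.bincount(inverse, minlength=M)[:, None]
```

Global attention then operates on the `(M, 3)` super-point set instead of the `(N, 3)` point cloud, which is where the quadratic cost saving comes from.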

利益冲突:Conflicts of Interest 所有作者声明不存在利益冲突。

All authors disclose no relevant conflicts of interest.

[1]
Mao J G, Shi S S, Wang X G, et al. 3D object detection for autonomous driving: A comprehensive survey[J]. International Journal of Computer Vision, 2023, 131(8):1909-1963. DOI:10.1007/s11263-023-01790-1

[2]
Qian R, Lai X, Li X R. 3D object detection for autonomous driving: A survey[J]. Pattern Recognition, 2022,130:108796. DOI:10.1016/j.patcog.2022.108796

[3]
Zamanakos G, Tsochatzidis L, Amanatiadis A, et al. A comprehensive survey of LIDAR-based 3D object detection methods with deep learning for autonomous driving[J]. Computers & Graphics, 2021, 99:153-181. DOI:10.1016/j.cag.2021.07.003

[4]
Lang B, Li X, Chuah M C. BEV-TP: End-to-end visual perception and trajectory prediction for autonomous driving[J]. IEEE Transactions on Intelligent Transportation Systems, 2024, 25(11):18537-18546. DOI:10.1109/TITS.2024.3433591

[5]
Zhang A, Eranki C, Zhang C, et al. Toward robust robot 3-D perception in urban environments: The UT campus object dataset[J]. IEEE Transactions on Robotics, 2024, 40:3322-3340. DOI: 10.1109/TRO.2024.3400831

[6]
Shreyas E, Sheth M H, Mohana. 3D object detection and tracking methods using deep learning for computer vision applications[C]// 2021 International Conference on Recent Trends on Electronics, Information, Communication & Technology (RTEICT). IEEE, 2021:735-738. DOI:10.1109/rteict52294.2021.9573964

[7]
张尧, 张艳, 王涛, 等. 大场景SAR影像舰船目标检测的轻量化研究[J]. 地球信息科学学报, 2025, 27(1):256-270.

[Zhang Y, Zhang Y, Wang T, et al. Lightweight research on ship target detection in large-scale SAR images[J]. Journal of Geo-information Science, 2025, 27(1):256-270.] DOI:10.12082/dqxxkx.2025.240574

[8]
高定, 李明, 范大昭, 等. 复杂背景下轻量级SAR影像船舶检测方法[J]. 地球信息科学学报, 2024, 26(11):2612-2625.

[Gao D, Li M, Fan D Z, et al. A ship detection method from lightweight SAR images under complex backgrounds[J]. Journal of Geo-information Science, 2024, 26(11):2612-2625.] DOI:10.12082/dqxxkx.2024.230544

[9]
Guo Y L, Wang H Y, Hu Q Y, et al. Deep learning for 3D point clouds: A survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(12):4338-4364. DOI:10.1109/TPAMI.2020.3005434

[10]
Zhou Y, Tuzel O. VoxelNet: End-to-end learning for point cloud based 3D object detection[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 2018:4490-4499. DOI:10.1109/CVPR.2018.00472

[11]
Bello S A, Yu S S, Wang C, et al. Review: Deep learning on 3D point clouds[J]. Remote Sensing, 2020, 12(11):1729. DOI:10.3390/rs12111729

[12]
Song Z Y, Liu L, Jia F Y, et al. Robustness-aware 3D object detection in autonomous driving: A review and outlook[J]. IEEE Transactions on Intelligent Transportation Systems, 2024, 25(11):15407-15436. DOI:10.1109/TITS.2024.3439557

[13]
Yang B, Luo W J, Urtasun R. PIXOR: Real-time 3D object detection from point clouds[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 2018:7652-7660. DOI:10.1109/CVPR.2018.00798

[14]
Chen X Z, Ma H M, Wan J, et al. Multi-view 3D object detection network for autonomous driving[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017:6526-6534. DOI:10.1109/CVPR.2017.691

[15]
Liang M, Yang B, Wang S L, et al. Deep continuous fusion for multi-sensor 3D object detection[M]// Computer Vision - ECCV 2018. Cham: Springer International Publishing, 2018:663-678. DOI:10.1007/978-3-030-01270-0_39

[16]
Liang M, Yang B, Chen Y, et al. Multi-task multi-sensor fusion for 3D object detection[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2019:7337-7345. DOI:10.1109/CVPR.2019.00752

[17]
Li J Z, Yang L, Shi Z, et al. SparseDet: Towards efficient multi-view 3D object detection via sparse scene representation[J]. Advanced Engineering Informatics, 2024,62:102955. DOI:10.1016/j.aei.2024.102955

[18]
Chen Y Q, Li N Y, Zhu D D, et al. BEVSOC: Self-supervised contrastive learning for calibration-free BEV 3-D object detection[J]. IEEE Internet of Things Journal, 2024, 11(12):22167-22182. DOI:10.1109/JIOT.2024.3379471

[19]
Yang L, Zhang X Y, Yu J X, et al. MonoGAE: Roadside monocular 3D object detection with ground-aware embeddings[J]. IEEE Transactions on Intelligent Transportation Systems, 2024, 25(11):17587-17601. DOI:10.1109/TITS.2024.3412759

[20]
Kuang H W, Wang B, An J P, et al. Voxel-FPN: Multi-scale voxel feature aggregation for 3D object detection from LIDAR point clouds[J]. Sensors, 2020, 20(3):704. DOI:10.3390/s20030704

[21]
He C H, Zeng H, Huang J Q, et al. Structure aware single-stage 3D object detection from point cloud[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020:11870-11879. DOI:10.1109/CVPR42600.2020.01189

[22]
Yan Y, Mao Y X, Li B. SECOND: Sparsely embedded convolutional detection[J]. Sensors, 2018, 18(10):3337. DOI: 10.3390/s18103337

[23]
Shi S S, Wang Z, Shi J P, et al. From points to parts: 3D object detection from point cloud with part-aware and part-aggregation network[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(8):2647-2664. DOI:10.1109/TPAMI.2020.2977026

[24]
Ye M S, Xu S J, Cao T Y. HVNet:Hybrid voxel network for LiDAR based 3D object detection[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020:1628-1637. DOI:10.1109/CVPR42600.2020.00170

[25]
Yin T W, Zhou X Y, Krahenbuhl P. Center-based 3D object detection and tracking[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2021:11779-11788. DOI:10.1109/cvpr46437.2021.01161

[26]
Shi S S, Wang X G, Li H S. PointRCNN:3D object proposal generation and detection from point cloud[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2019:770-779. DOI:10.1109/cvpr.2019.00086

[27]
Yang Z T, Sun Y N, Liu S, et al. 3DSSD: Point-based 3D single stage object detector[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020:11037-11045. DOI:10.1109/cvpr42600.2020.01105

[28]
Qi C R, Liu W, Wu C X, et al. Frustum PointNets for 3D object detection from RGB-D data[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 2018:918-927. DOI:10.1109/CVPR.2018.00102

[29]
Wang Z X, Jia K. Frustum ConvNet:Sliding Frustums to aggregate local point-wise features for amodal 3D object detection[C]//2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2019:1742-1749. DOI:10.1109/IROS40897.2019.8968513

[30]
Yang Z T, Sun Y N, Liu S, et al. STD:Sparse-to-dense 3D object detector for point cloud[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2019: 1951-1960. DOI:10.1109/iccv.2019.00204

[31]
Charles R Q, Hao S, Mo K C, et al. PointNet:Deep learning on point sets for 3D classification and segmentation[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017:77-85. DOI:10.1109/CVPR.2017.16

[32]
Qi C R, Yi L, Su H, et al. PointNet++: Deep hierarchical feature learning on point sets in a metric space[EB/OL]. 2017: 1706.02413. https://arxiv.org/abs/1706.02413v1

[33]
Luo Z P, Zhang G J, Zhou C Q, et al. TransPillars:Coarse-to-fine aggregation for multi-frame 3D object detection[C]//2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE, 2023:4219-4228. DOI:10.1109/WACV56688.2023.00421

[34]
Lang A H, Vora S, Caesar H, et al. PointPillars: Fast encoders for object detection from point clouds[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2019:12689-12697. DOI:10.1109/CVPR.2019.01298

[35]
Shi W J, Rajkumar R. Point-GNN:Graph neural network for 3D object detection in a point cloud[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020:1708-1716. DOI:10.1109/cvpr42600.2020.00178

[36]
Meraz M, Ansari M A, Javed M, et al. DC-GNN: Drop channel graph neural network for object classification and part segmentation in the point cloud[J]. International Journal of Multimedia Information Retrieval, 2022, 11(2):123-133. DOI:10.1007/s13735-022-00236-7

[37]
Xiong S M, Li B, Zhu S. DCGNN: A single-stage 3D object detection network based on density clustering and graph neural network[J]. Complex & Intelligent Systems, 2023, 9(3):3399-3408. DOI:10.1007/s40747-022-00926-z

[38]
Zarzar J, Giancola S, Ghanem B. PointRGCN: Graph convolution networks for 3D vehicles detection refinement[EB/OL]. 2019: 1911.12236. https://arxiv.org/abs/1911.12236v1

[39]
Wang X, Li K Q, Chehri A. Multi-sensor fusion technology for 3D object detection in autonomous driving: A review[J]. IEEE Transactions on Intelligent Transportation Systems, 2024, 25(2):1148-1165. DOI:10.1109/TITS.2023.3317372

[40]
Tang Q S, Bai X Y, Guo J T, et al. DFAF3D: A dual-feature-aware anchor-free single-stage 3D detector for point clouds[J]. Image and Vision Computing, 2023,129:104594. DOI:10.1016/j.imavis.2022.104594

[41]
Xiao W P, Peng Y, Liu C, et al. Balanced sample assignment and objective for single-model multi-class 3D object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(9):5036-5048. DOI: 10.1109/TCSVT.2023.3248656

[42]
Koo I, Lee I, Kim S H, et al. PG-RCNN: Semantic surface point generation for 3D object detection[C]//2023 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2023:18096-18105. DOI:10.1109/ICCV51070.2023.01663

[43]
Chen S T, Zhang H L, Zheng N N. Leveraging anchor-based LiDAR 3D object detection via point assisted sample selection[EB/OL]. 2024:2403.01978. https://arxiv.org/abs/2403.01978v1

[44]
Feng X Y, Du H M, Fan H H, et al. SEFormer: Structure embedding transformer for 3D object detection[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2023, 37(1):632-640. DOI:10.1609/aaai.v37i1.25139

[45]
He C H, Li R H, Li S, et al. Voxel set transformer:A set-to-set approach to 3D object detection from point clouds[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2022:8407-8417. DOI:10.1109/CVPR52688.2022.00823

[46]
Chen C, Chen Z, Zhang J, et al. SASA: Semantics-augmented set abstraction for point-based 3D object detection[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2022, 36(1):221-229. DOI:10.1609/aaai.v36i1.19897

[47]
Xie T, Wang L, Wang K, et al. FARP-net: Local-global feature aggregation and relation-aware proposals for 3D object detection[J]. IEEE Transactions on Multimedia, 2023, 26:1027-1040. DOI:10.1109/TMM.2023.3275366

[48]
Xia Q M, Chen Y D, Cai G R, et al. 3-D HANet: A flexible 3-D heatmap auxiliary network for object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023,61:5701113. DOI:10.1109/TGRS.2023.3250229

[49]
Dong Z C, Ji H, Huang X F, et al. PeP: A Point enhanced Painting method for unified point cloud tasks[EB/OL]. 2023:2310.07591. https://arxiv.org/abs/2310.07591v2.

[50]
Zheng W, Tang W L, Jiang L, et al. SE-SSD: Self-ensembling single-stage object detector from point cloud[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2021:14489-14498. DOI:10.1109/cvpr46437.2021.01426

[51]
Shi S S, Jiang L, Deng J J, et al. PV-RCNN++: Point-voxel feature set abstraction with local vector representation for 3D object detection[J]. International Journal of Computer Vision, 2023, 131(2):531-551. DOI:10.1007/s11263-022-01710-9

[52]
Fei H, Zhao J, Zhang Z, et al. PV-GNN: Point-voxel 3D object detection based on graph neural network[J]. Preprint (Version 1), Research Square, 2024. DOI:10.21203/rs.3.rs-4598182/v1

[53]
Deng J J, Shi S S, Li P W, et al. Voxel R-CNN: Towards high performance voxel-based 3D object detection[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 35(2):1201-1209. DOI:10.1609/aaai.v35i2.16207

[54]
Shi S, Guo C, Jiang L, et al. PV-RCNN: point-voxel feature set abstraction for 3d object detection[C]// Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR). IEEE, 2020:10526-10535. DOI:10.1109/CVPR42600.2020.01054

[55]
Sheng H L, Cai S J, Liu Y, et al. Improving 3D object detection with channel-wise transformer[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2021:2723-2732. DOI:10.1109/ICCV48922.2021.00274

[56]
孔德明, 李晓伟, 杨庆鑫. 基于伪点云特征增强的多模态三维目标检测方法[J]. 计算机学报, 2024, 47(4):759-775.

[Kong D M, Li X W, Yang Q X. Multimodal 3D object detection method based on pseudo point cloud feature enhancement[J]. Chinese Journal of Computers, 2024, 47(4):759-775.] DOI:10.11897/SP.J.1016.2024.00759

[57]
Liu Z, Tang H, Lin Y, et al. Point-voxel CNN for efficient 3D deep learning[J]. Advances in Neural Information Processing Systems, 2019, 32. DOI:10.48550/arXiv.1907.03739

[58]
Chen Y L, Liu S, Shen X Y, et al. Fast point R-CNN[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2019:9774-9783. DOI:10.1109/iccv.2019.00987

[59]
Vora S, Lang A H, Helou B, et al. PointPainting:Sequential fusion for 3D object detection[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020:4603-4611. DOI:10.1109/cvpr42600.2020.00466

[60]
Liu Z, Lin Y T, Cao Y, et al. Swin transformer: Hierarchical vision transformer using shifted windows[C]// 2021 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2021:9992-10002. DOI:10.1109/ICCV48922.2021.00986

[61]
Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[J]. arXiv preprint arXiv:2010.11929, 2020. DOI:10.48550/arXiv.2010.11929

[62]
彭颖, 张胜根, 黄俊富, 等. 基于自注意力机制的两阶段三维目标检测方法[J]. 科学技术与工程, 2024, 24(25):10825-10831.

[Peng Y, Zhang S G, Huang J F, et al. Two-stage 3D object detection method based on self-attention mechanism[J]. Science Technology and Engineering, 2024, 24(25):10825-10831.] DOI:10.12404/j.issn.1671-1815.2400232

[63]
鲁斌, 杨振宇, 孙洋, 等. 基于多通道交叉注意力融合的三维目标检测算法[J]. 智能系统学报, 2024, 19(4):885-897.

[Lu B, Yang Z Y, Sun Y, et al. 3D object detection algorithm with multi-channel cross attention fusion[J]. CAAI Transactions on Intelligent Systems, 2024, 19(4):885-897.] DOI:10.11992/tis.202305029

[64]
张素良, 张惊雷, 文彪. 基于交叉自注意力机制的LiDAR 点云三维目标检测[J]. 光电子·激光, 2024, 35(1):75-83.

[Zhang S L, Zhang J L, Wen B. LiDAR point cloud 3D object detection based on cross self-attention mechanism[J]. Journal of Optoelectronics·Laser, 2024, 35(1):75-83.] DOI:10.16136/j.joel.2024.01.0593

[65]
刘明阳, 杨啟明, 胡冠华, 等. 基于Transformer的3D点云目标检测算法[J]. 西北工业大学学报, 2023, 41(6):1190-1197.

[Liu M Y, Yang Q M, Hu G H, et al. 3D point cloud object detection algorithm based on Transformer[J]. Journal of Northwestern Polytechnical University, 2023, 41(6):1190-1197.] DOI:10.1051/jnwpu/20234161190

[66]
Zhao H S, Jiang L, Jia J Y, et al. Point transformer[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2021:16239-16248. DOI:10.1109/ICCV48922.2021.01595

[67]
Pei Y, Zhao X, Li H, et al. Clusterformer:Cluster-based transformer for 3D object detection in point clouds[C]//2023 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2023:6641-6650. DOI:10.1109/ICCV51070.2023.00613

[68]
Ren S Y, Pan X, Zhao W J, et al. Dynamic graph transformer for 3D object detection[J]. Knowledge-Based Systems, 2023,259:110085. DOI:10.1016/j.knosys.2022.110085

[69]
Lu B, Sun Y, Yang Z Y. Voxel graph attention for 3-D object detection from point clouds[J]. IEEE Transactions on Instrumentation and Measurement, 2023,72:5023012. DOI:10.1109/TIM.2023.3301907

[70]
Ai L M, Xie Z Y, Yao R X, et al. MVTr: Multi-feature voxel transformer for 3D object detection[J]. The Visual Computer, 2024, 40(3):1453-1466. DOI:10.1007/s00371-023-02860-8

[71]
Hoang H A, Bui D C, Yoo M. TSSTDet: Transformation-based 3-D object detection via a spatial shape transformer[J]. IEEE Sensors Journal, 2024, 24(5):7126-7139

[72]
Dong Y P, Kang C X, Zhang J L, et al. Benchmarking robustness of 3D object detection to common corruptions in autonomous driving[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2023:1022-1032. DOI:10.1109/CVPR52729.2023.00105

[73]
刘慧, 董振阳, 田帅华. 融合点云和体素信息的目标检测网络[J]. 计算机工程与设计, 2024, 45(9):2771-2778.

[Liu H, Dong Z Y, Tian S H. Object detection network fusing point cloud and voxel information[J]. Computer Engineering and Design, 2024, 45(9):2771-2778.] DOI:10.16208/j.issn1000-7024.2024.09.029

[74]
Wu H, Wen C L, Li W, et al. Transformation-equivariant 3D object detection for autonomous driving[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2023, 37(3):2795-2802. DOI:10.1609/aaai.v37i3.25380

[75]
Guo S, Cai J Y, Hu Y Z, et al. LCASAFormer: Cross-attention enhanced backbone network for 3D point cloud tasks[J]. Pattern Recognition, 2025,162:111361. DOI:10.1016/j.patcog.2025.111361

[76]
Lee J, Lee Y, Kim J, et al. Set transformer: A framework for attention-based permutation-invariant neural networks[C]// International Conference on Machine Learning (ICML). PMLR, 2019:3744-3753. DOI:10.48550/arXiv.1810.00825

[77]
Fan L, Pang Z Q, Zhang T Y, et al. Embracing single stride 3D object detector with sparse transformer[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2022:8448-8458. DOI: 10.1109/CVPR52688.2022.00827

[78]
Sun P, Tan M X, Wang W Y, et al. SWFormer: Sparse window transformer for 3D object detection in point clouds[M]//Computer Vision-ECCV 2022. Cham: Springer Nature Switzerland, 2022:426-442. DOI:10.1007/978-3-031-20080-9_25

[79]
Wang H Y, Shi C, Shi S S, et al. DSVT: Dynamic sparse voxel transformer with rotated sets[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2023:13520-13529. DOI:10.1109/CVPR52729.2023.01299

[80]
Han K, Xiao A, Wu E, et al. Transformer in transformer[J]. Advances in Neural Information Processing Systems, 2021,34:15908-15919

[81]
Ren S Q, He K M, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6):1137-1149. DOI:10.1109/TPAMI.2016.2577031

[82]
Te G S, Hu W, Zheng A M, et al. RGCNN: Regularized graph CNN for point cloud segmentation[C]// Proceedings of the 26th ACM International Conference on Multimedia. ACM, 2018:746-754. DOI:10.1145/3240508.3240621

[83]
NVIDIA Corporation. NVIDIA documentation hub[EB/OL]. [9-15]. https://docs.nvidia.com/#all-documents.

[84]
PyTorch. Torch.scatter[EB/OL].[5-24]. https://pytorch.org/docs/2.3/generated/torch.scatter.html#torch.scatter.

[85]
Qi C R, Litany O, He K M, et al. Deep Hough voting for 3D object detection in point clouds[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2019:9276-9285. DOI:10.1109/iccv.2019.00937

[86]
Carion N, Massa F, Synnaeve G, et al. End-to-end object detection with transformers[M]// Computer Vision-ECCV 2020. Cham: Springer International Publishing, 2020:213-229. DOI:10.1007/978-3-030-58452-8_13

[87]
Geiger A, Lenz P, Urtasun R. Are we ready for autonomous driving? The KITTI vision benchmark suite[C]//2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012:3354-3361. DOI:10.1109/CVPR.2012.6248074

[88]
Sun P, Kretzschmar H, Dotiwalla X, et al. Scalability in perception for autonomous driving: Waymo open dataset[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020:2443-2451. DOI:10.1109/cvpr42600.2020.00252

[89]
OpenPCDet Development Team. OpenPCDet: An open-source toolbox for 3D object detection from point clouds[EB/OL]. (2024-12-30)[10.1]. https://github.com/open-mmlab/OpenPCDet.
