Journal of Geo-information Science ›› 2020, Vol. 22 ›› Issue (1): 88-99. doi: 10.12082/dqxxkx.2020.190424

• Special Issue: Geospatial Intelligence

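Anchor-Free Traffic Sign Detection

FAN Hongchao1,*, LI Wanzhi2, ZHANG Chaoquan1,2

  1. Norwegian University of Science and Technology, Trondheim 7491
  2. Wuhan University, Wuhan 430072
  • Received: 2019-08-05 Revised: 2019-11-27 Online: 2020-01-25 Published: 2020-04-08
  • Corresponding author: FAN Hongchao E-mail: hongchao.fan@ntnu.no
  • About the author: FAN Hongchao (b. 1977), male, from Xiangyang, Hubei; PhD, professor; his research focuses on mining and analysis of crowdsourced geographic information data.
  • Funding:
    National Natural Science Foundation of China (41771484)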

Anchor-Free Traffic Sign Detection

FAN Hongchao1,*, LI Wanzhi2, ZHANG Chaoquan1,2

  1. Norwegian University of Science and Technology, Trondheim 7491, Norway
  2. Wuhan University, Wuhan 430072, China
  • Received: 2019-08-05 Revised: 2019-11-27 Online: 2020-01-25 Published: 2020-04-08
  • Contact: FAN Hongchao E-mail: hongchao.fan@ntnu.no
  • Supported by:
    National Natural Science Foundation of China (41771484)

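Abstract (translated from Chinese):

Traffic sign detection is an important research direction in autonomous driving, and detecting traffic signs from street-view images accurately and in real time is of great significance for autonomous driving and for the development of smart cities. Traditional algorithms detect signs from color and shape features, so they can extract only specific kinds of traffic signs and cannot detect different types at the same time. Algorithms that combine image features with a machine learning classifier require hand-designed features and are slow. Mainstream deep learning methods are mostly based on anchor boxes, which introduce extra hyperparameters into the network design and generate an excess of redundant bounding boxes during training, easily causing an imbalance between positive and negative samples. Inspired by the anchor-free idea, and drawing on the YOLO detector's approach of regressing object bounding boxes directly, this paper proposes AF-TSD (Anchor-free Traffic Sign Detection), an anchor-free real-time traffic sign detection network. AF-TSD abandons the anchor-box design and introduces deformable convolution with adaptive sampling positions together with an attention mechanism, which greatly improves the feature representation ability of the network. Extensive comparative experiments show that AF-TSD runs at a speed close to that of mainstream algorithms while achieving higher accuracy: it reaches 96.80% accuracy on the German GTSDB traffic sign detection dataset, with an average detection time of 32 ms per image, which meets the requirement for real-time detection.

Keywords (translated from Chinese): crowdsourced geographic information data, traffic sign detection, convolutional neural networks, deformable convolution, attention mechanism, Anchor-free, AF-TSD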

Abstract:

Traffic signs are essential elements of High Definition (HD) maps and are therefore very important for autonomous vehicles. Real-time, accurate detection of traffic signs from street-level images is of great significance for the development of autonomous driving. Conventional algorithms detect traffic signs from image color and shape features and work only for specific kinds of traffic signs. Algorithms that combine image features with a machine learning classifier need hand-crafted features, and their detection speed is slow. To date, many deep learning approaches have been built on anchor boxes, which introduce extra hyperparameters into the network design. When switching to a different detection task, the anchor boxes need to be redesigned. Anchor-based methods also generate massive numbers of redundant anchor boxes during model training, which easily causes an imbalance between positive and negative samples. Inspired by the anchor-free idea and by YOLO, this paper proposes a real-time traffic sign detection network called AF-TSD, which regresses object bounding boxes directly. AF-TSD adopts deformable convolution, an effective convolution module, to enhance the feature representation ability of convolutional neural networks. This module adds 2D offsets to the regular grid sampling locations of the standard convolution. It also modulates the input feature amplitudes from different spatial locations/bins. Both the offsets and the amplitudes are learned from the preceding feature maps via additional convolutional layers. In addition, AF-TSD introduces an attention mechanism. It is inserted after the fusion of the feature pyramid and adaptively recalibrates channel-wise feature responses by explicitly modeling the interdependencies between channels. This module first squeezes global spatial information into a channel descriptor. The excitation operator then maps the input-specific descriptor to a set of channel weights. The attention mechanism in this paper is lightweight and adds only slightly to model complexity and computational burden. To evaluate AF-TSD, extensive comparative experiments were carried out. We first evaluated the influence of the different modules on detection precision. The experimental results show that deformable convolution and the attention mechanism help extract the features of traffic signs. AF-TSD was then compared with mainstream detection networks, including Faster R-CNN, RetinaNet, and YOLOv3. The proposed AF-TSD network achieved an mAP of 96.80% on the GTSDB traffic sign detection dataset, which was superior to the mainstream detection algorithms. The average detection time was 32 ms per image, which meets the requirements of real-time detection.
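
The deformable convolution described in the abstract (2D offsets added to the regular sampling grid, plus a modulation amplitude per sampling location) can be illustrated with a small NumPy sketch. This is not the AF-TSD implementation: it evaluates a single-channel 3x3 kernel at one output position, and the function names `bilinear_sample` and `modulated_deform_conv_at` are hypothetical. In a real network the offsets and amplitudes would be predicted by additional convolutional layers; here they are simply passed in as arrays, which is an assumption made for illustration.

```python
import numpy as np


def bilinear_sample(image, y, x):
    """Bilinearly interpolate a 2D image at a real-valued (y, x).

    Positions outside the image contribute zero (an assumed padding rule).
    """
    h, w = image.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                weight = (1.0 - abs(y - yi)) * (1.0 - abs(x - xi))
                value += weight * image[yi, xi]
    return value


def modulated_deform_conv_at(image, kernel, offsets, masks, cy, cx):
    """Evaluate a 3x3 modulated deformable convolution at output pixel (cy, cx).

    offsets: shape (3, 3, 2), a (dy, dx) shift for each kernel tap.
    masks:   shape (3, 3), a modulation amplitude for each kernel tap.
    """
    out = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i, j]
            # Regular grid location (i - 1, j - 1) plus the learned offset.
            y = cy + (i - 1) + dy
            x = cx + (j - 1) + dx
            out += kernel[i, j] * masks[i, j] * bilinear_sample(image, y, x)
    return out


# Toy input: image[r, c] = 5 * r + c, averaged by a 3x3 box kernel.
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.full((3, 3), 1.0 / 9.0)
full_masks = np.ones((3, 3))

# Zero offsets and unit masks reduce to a standard convolution.
no_offsets = np.zeros((3, 3, 2))
plain = modulated_deform_conv_at(image, kernel, no_offsets, full_masks, 2, 2)

# Shifting every tap by half a pixel in x samples between grid points.
shifted_offsets = np.zeros((3, 3, 2))
shifted_offsets[..., 1] = 0.5
shifted = modulated_deform_conv_at(image, kernel, shifted_offsets, full_masks, 2, 2)

print(plain, shifted)  # approximately 12.0 and 12.5
```

With zero offsets and unit amplitudes the result matches an ordinary 3x3 convolution, which makes the sketch easy to check against a standard layer before non-trivial offsets are tried.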

Key words: VGI data, traffic sign detection, convolutional neural networks, deformable convolution, attention mechanism, anchor-free, AF-TSD
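
The channel attention summarized in the abstract (squeeze global spatial information into a channel descriptor, then map that descriptor to channel weights) can be sketched as follows. This is a minimal NumPy illustration that assumes the common squeeze-and-excitation layout: global average pooling followed by a two-layer bottleneck with ReLU and sigmoid. The function name `se_recalibrate`, the reduction ratio `r`, and the random weights are all hypothetical; the abstract does not specify the exact layer configuration used in AF-TSD.

```python
import numpy as np


def se_recalibrate(features, w1, b1, w2, b2):
    """Recalibrate a (C, H, W) feature map with channel attention.

    w1: (C // r, C), b1: (C // r,)  bottleneck layer (assumed layout).
    w2: (C, C // r), b2: (C,)       expansion layer (assumed layout).
    Returns the reweighted features and the per-channel weights.
    """
    # Squeeze: global average pooling gives one descriptor value per channel.
    descriptor = features.mean(axis=(1, 2))
    # Excitation: bottleneck with ReLU, then sigmoid to get weights in (0, 1).
    hidden = np.maximum(0.0, w1 @ descriptor + b1)
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))
    # Scale every channel of the input by its weight.
    return features * weights[:, None, None], weights


rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 4  # illustrative sizes only
features = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
b1 = np.zeros(C // r)
w2 = rng.standard_normal((C, C // r)) * 0.1
b2 = np.zeros(C)

out, weights = se_recalibrate(features, w1, b1, w2, b2)
print(out.shape, weights.shape)  # (8, 4, 4) (8,)
```

The bottleneck keeps the added parameter count small (two matrices of size C x C/r), which is consistent with the abstract's remark that the attention module is lightweight.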