Journal of Geo-information Science ›› 2020, Vol. 22 ›› Issue (1): 88-99.doi: 10.12082/dqxxkx.2020.190424

Anchor-Free Traffic Sign Detection

FAN Hongchao1,*, LI Wanzhi2, ZHANG Chaoquan1,2

  1. Norwegian University of Science and Technology, Trondheim 7491, Norway
  2. Wuhan University, Wuhan 430072, China
  • Received: 2019-08-05 Revised: 2019-11-27 Online: 2020-01-25 Published: 2020-04-08
  • Contact: FAN Hongchao
  • Supported by:
    National Natural Science Foundation of China (41771484)


Traffic signs are essential elements of High Definition (HD) maps and are therefore important for autonomous vehicles. Real-time, accurate detection of traffic signs in street-level images is of great significance for the development of autonomous driving. Conventional algorithms detect traffic signs from image color and shape features and work only for specific kinds of traffic signs. Algorithms that combine image features with machine learning classifiers require hand-crafted features and are slow at detection. To date, many deep learning approaches have been built on anchor boxes, which introduce extra hyperparameters into network design. When switching to a different detection task, the anchor boxes need to be redesigned. Anchor-based methods also generate massive numbers of redundant anchor boxes during model training, which easily causes an imbalance between positive and negative samples. Inspired by anchor-free detection and YOLO, this paper proposes a real-time traffic sign detection network called AF-TSD, which regresses object boundaries directly. AF-TSD adopts deformable convolution, an effective convolution module, to enhance the feature expression ability of convolutional neural networks. This module adds 2D offsets to the regular grid sampling locations of the standard convolution. It also modulates the input feature amplitudes from different spatial locations/bins. Both the offsets and the amplitudes are learned from the preceding feature maps via additional convolutional layers. In addition, AF-TSD introduces an attention mechanism. It is inserted after fusion of the feature pyramid and adaptively recalibrates channel-wise feature responses by explicitly modeling the interdependencies between channels. This module first squeezes global spatial information into a channel descriptor. The excitation operator then maps the input-specific descriptor to a set of channel weights.
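The squeeze and excitation steps described above can be illustrated with a minimal pure-Python sketch. It is not the AF-TSD implementation: the function names, the two-layer bottleneck with ReLU and sigmoid, and the weight arguments `w1`, `b1`, `w2`, `b2` are assumptions chosen for illustration, and a feature map is represented as nested lists of shape channels x height x width.

```python
import math


def squeeze(feature_map):
    """Global average pooling: reduce each H x W channel to one scalar."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def dense(vector, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [
        sum(w * v for w, v in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


def excite(descriptor, w1, b1, w2, b2):
    """Map the channel descriptor to per-channel weights in (0, 1).

    Assumed form: dense -> ReLU -> dense -> sigmoid.
    """
    hidden = [max(0.0, h) for h in dense(descriptor, w1, b1)]
    return [1.0 / (1.0 + math.exp(-z)) for z in dense(hidden, w2, b2)]


def recalibrate(feature_map, w1, b1, w2, b2):
    """Scale every channel by its learned weight; also return the weights."""
    scales = excite(squeeze(feature_map), w1, b1, w2, b2)
    scaled = [
        [[s * v for v in row] for row in channel]
        for channel, s in zip(feature_map, scales)
    ]
    return scaled, scales
```

In a trained network the four weight arguments would be learned; here they are plain inputs so that the data flow from descriptor to channel weights stays visible.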
The attention mechanism in this paper is lightweight and adds only slightly to model complexity and computational burden. To evaluate AF-TSD, extensive comparative experiments were carried out. We first evaluated the influence of the different modules on detection precision. The experimental results show that the deformable convolution and attention mechanism help extract features of traffic signs. AF-TSD was then compared with mainstream detection networks, including Faster R-CNN, RetinaNet, and YOLOv3. The proposed AF-TSD network achieved an mAP of 96.80% on the GTSDB traffic sign detection dataset, which was superior to the mainstream detection algorithms. The average detection time was 32 ms per image, which meets the requirements of real-time detection.
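The deformable convolution summarized in the abstract (2D offsets added to the regular sampling grid, plus a modulation amplitude per sampling location) can be sketched for a single output position as follows. This is a simplified illustration under stated assumptions, not the AF-TSD code: it handles one channel and one 3x3 kernel, treats positions outside the feature map as zero, and takes `offsets` and `masks` as given inputs, whereas in the network they would be predicted by additional convolutional layers. All function and parameter names are hypothetical.

```python
import math


def bilinear_sample(feature, y, x):
    """Bilinearly interpolate a 2D map at fractional (y, x); outside is 0."""
    height, width = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < height and 0 <= xi < width:
                weight = (1.0 - abs(y - yi)) * (1.0 - abs(x - xi))
                value += weight * feature[yi][xi]
    return value


# Regular 3x3 sampling grid of a standard convolution, as (dy, dx) pairs.
GRID = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def modulated_deformable_response(feature, center, kernel, offsets, masks):
    """Response of one 3x3 modulated deformable convolution at `center`.

    kernel:  9 weights, in the same order as GRID
    offsets: 9 (dy, dx) shifts added to the regular grid positions
    masks:   9 modulation amplitudes, typically in [0, 1]
    """
    cy, cx = center
    total = 0.0
    for (dy, dx), w, (oy, ox), m in zip(GRID, kernel, offsets, masks):
        total += w * m * bilinear_sample(feature, cy + dy + oy, cx + dx + ox)
    return total
```

With all offsets set to zero and all masks set to one, the function reduces to an ordinary 3x3 convolution response, which is a convenient sanity check for the sampling logic.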

Key words: VGI data, traffic sign detection, convolutional neural networks, deformable convolution, attention mechanism, anchor-free, AF-TSD