2021 Mar 10;21(6):1951. doi: 10.3390/s21061951

Table 2.

A summary of deep learning-based fusion methods using radar and vision data.

| Reference | Sensors | Sensor Signal Representation | Network Architecture | Level of Fusion | Fusion Operation | Problem | Object Type | Dataset |
|---|---|---|---|---|---|---|---|---|
| R. Nabati and H. Qi [47] | Radar and visual camera | RGB image and radar signal projections | Fast-R-CNN (two-stage) | Mid-level fusion | Region proposal | Object detection | 2D vehicle | nuScenes [41] |
| V. John et al. [48] | Radar and camera | RGB image and radar signal projections | YOLO object detector (Tiny YOLOv3) and encoder-decoder | Feature level | Feature concatenation | Vehicle detection and free-space segmentation | Vehicles and free space | nuScenes [41] |
| L. Teck-Yian et al. [53] | Radar and camera | RGB image and radar range-azimuth image | Modified SSD with two branches, one per sensor | Early-level fusion | Feature concatenation | Detection and classification | 3D vehicles | Self-recorded |
| S. Chadwick et al. [54] | Radar and visual camera | RGB image and radar range-velocity maps | One-stage detector | Middle | Feature concatenation and addition | Object detection | 2D vehicle | Self-recorded |
| F. Nobis et al. (CRF-Net) [55] | Radar and visual camera | RGB image and radar signal projections | RetinaNet with a VGG backbone | Deeper layers | Feature concatenation | Object detection | 2D road vehicles | nuScenes [41] |
| Meyer and Kuschk [56] | Radar and visual camera | RGB image and radar point clouds | Faster RCNN (two-stage) | Early and middle | Average mean | Object detection | 3D vehicle | Astyx hiRes 2019 [43] |
| Vijay John and Seiichi Mita [57] | Radar and camera | RGB image and radar signal projections | YOLO object detector (Tiny YOLOv3) | Feature level (late) | Feature concatenation | 2D image-based obstacle detection | Vehicles, pedestrians, two-wheelers, and objects (movable objects and debris) | nuScenes [41] |
| S. Chang et al. [58] | Radar and camera | RGB image and radar signal projections | Fully convolutional one-stage object detection framework (FCOS) | Feature level | Spatial attention feature fusion (SAF) | Obstacle detection | Bicycle, car, motorcycle, bus, train, truck | nuScenes [41] |
| W. Yizhou et al. (RODNet) [98] | Radar and stereo videos | 2D image and radar range-azimuth maps | 3D autoencoder, 3D stacked hourglass, and 3D stacked hourglass with temporal inception layers | Mid-level | Cross-modal learning and supervision | Object detection | Pedestrians, cyclists, and cars | CRUW [98] |
| V. Lekic and Z. Babic [100] | Radar and visual camera | RGB image and radar grid maps | GANs (CMGGAN model) | Mid-level | Feature fusion and semantic fusion | Segmentation | Free space | Self-recorded |
| Mario Bijelic et al. [156] | Camera, lidar, radar, and gated NIR sensor | Gated image, RGB image, lidar projection, and radar projection | Modified VGG [88] backbone and SSD blocks | Early feature fusion (adaptive fusion steered by entropy) | Feature concatenation | Object detection | Vehicles | Novel multimodal adverse-weather dataset [156] |
| Richard J. de Jong [157] | Radar and camera | RGB image and radar micro-Doppler spectrograms | CNN | Data, middle, and feature-level fusion | Feature concatenation | Human activity classification | Walking person | Self-recorded |
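
The fusion operations that recur in Table 2 (feature concatenation, element-wise addition, and averaging) can be illustrated on a single pair of feature vectors. The following is a minimal sketch and is not taken from any of the cited works: the function names `fuse_concat`, `fuse_add`, and `fuse_mean` are hypothetical, and the vectors are plain Python lists standing in for the per-location features that a real network would hold as tensors and fuse along the channel axis.

```python
from typing import List

Vector = List[float]


def fuse_concat(camera_feat: Vector, radar_feat: Vector) -> Vector:
    """Feature concatenation: keep both feature sets side by side.

    The output length is the sum of the two input lengths, so the
    layers that follow must accept the wider feature vector.
    """
    return list(camera_feat) + list(radar_feat)


def _check_same_length(a: Vector, b: Vector) -> None:
    # Addition and averaging are element-wise, so both branches must
    # produce features of the same size.
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")


def fuse_add(camera_feat: Vector, radar_feat: Vector) -> Vector:
    """Element-wise addition: the output keeps the input length."""
    _check_same_length(camera_feat, radar_feat)
    return [c + r for c, r in zip(camera_feat, radar_feat)]


def fuse_mean(camera_feat: Vector, radar_feat: Vector) -> Vector:
    """Element-wise average of the two feature vectors."""
    _check_same_length(camera_feat, radar_feat)
    return [(c + r) / 2.0 for c, r in zip(camera_feat, radar_feat)]


if __name__ == "__main__":
    camera = [1.0, 2.0, 3.0]  # illustrative values only
    radar = [0.5, 0.0, 1.0]   # illustrative values only
    print(fuse_concat(camera, radar))
    print(fuse_add(camera, radar))
    print(fuse_mean(camera, radar))
```

Concatenation preserves the information from both sensors at the cost of a wider feature vector, whereas addition and averaging keep the dimensionality fixed but require both branches to produce features of identical size.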