Table 2.
Reference | Sensors | Sensor Signal Representation | Network Architecture | Level of Fusion | Fusion Operation | Problem | Object Type | Dataset |
---|---|---|---|---|---|---|---|---|
R. Nabati and H. Qi [47] | Radar and visual camera | RGB image and radar signal projections | Fast R-CNN (two-stage) | Mid-level fusion | Region proposal | Object detection | 2D vehicle | nuScenes [41] |
V. John et al. [48] | Radar and camera | RGB image and radar signal projections | YOLO object detector (Tiny YOLOv3) and encoder-decoder | Feature level | Feature concatenation | Vehicle detection and free-space segmentation | Vehicles and free space | nuScenes [41] |
L. Teck-Yian et al. [53] | Radar and camera | RGB image and radar range-azimuth image | Modified SSD with two branches, one per sensor | Early-level fusion | Feature concatenation | Detection and classification | 3D vehicles | Self-recorded |
S. Chadwick et al. [54] | Radar and visual camera | RGB image and radar range-velocity maps | One-stage detector | Middle | Feature concatenation and addition | Object detection | 2D vehicle | Self-recorded |
F. Nobis et al. (CRF-Net) [55] | Radar and visual camera | RGB image and radar signal projections | RetinaNet with a VGG backbone | Deeper layers | Feature concatenation | Object detection | 2D road vehicles | nuScenes [41] |
Meyer and Kuschk [56] | Radar and visual camera | RGB image and radar point clouds | Faster R-CNN (two-stage) | Early and middle | Average mean | Object detection | 3D vehicle | Astyx hiRes 2019 [43] |
Vijay John and Seiichi Mita [57] | Radar and camera | RGB image and radar signal projections | YOLO object detector (Tiny YOLOv3) | Feature level (late) | Feature concatenation | 2D image-based obstacle detection | Vehicles, pedestrians, two-wheelers, and objects (movable objects and debris) | nuScenes [41] |
S. Chang et al. [58] | Radar and camera | RGB image and radar signal projections | Fully convolutional one-stage object detection framework (FCOS) | Feature level | Spatial attention feature fusion (SAF) | Obstacle detection | Bicycle, car, motorcycle, bus, train, truck | nuScenes [41] |
W. Yizhou et al. (RODNet) [98] | Radar and stereo videos | 2D image and radar range-azimuth maps | 3D autoencoder, 3D stacked hourglass, and 3D stacked hourglass with temporal inception layers | Mid-level | Cross-modal learning and supervision | Object detection | Pedestrians, cyclists, and cars | CRUW [98] |
V. Lekic and Z. Babic [100] | Radar and visual camera | RGB image and Radar grid maps | GANs (CMGGAN model) | Mid-level | Feature fusion and semantic fusion | Segmentation | Free space | Self-recorded |
Mario Bijelic et al. [156] | Camera, lidar, radar, and gated NIR sensor | Gated image, RGB image, lidar projection, and radar projection | Modified VGG [88] backbone and SSD blocks | Early feature fusion (adaptive fusion steered by entropy) | Feature concatenation | Object detection | Vehicles | A novel multimodal adverse-weather dataset [156] |
Richard J. de Jong [157] | Radar and camera | RGB image and radar micro-Doppler spectrograms | CNN | Data-, middle-, and feature-level fusion | Feature concatenation | Human activity classification | Walking person | Self-recorded |