Table 18. Literature Analysis: Camera-LiDAR Fusion-based 3D Object Recognition Methods.
| Model | Detector Category | Environment | Scenario | Fusion Level | Advantage(s) | Limitation(s) |
|---|---|---|---|---|---|---|
| MV3D [73] | Two-stage | Outdoor | Multi-view feature fusion and 3D object proposal generation | Early, Late, Deep | Introduces a deep fusion scheme that combines region-wise features from the bird's-eye and front views, enabling interaction between modalities | Low LiDAR point density prevents detection of distant objects that the camera still captures; the BEV-based region proposal network limits recognition; detects cars only |
| BEVLFVC [74] | One-stage | Outdoor | Fusion of the LiDAR point cloud and camera-captured images within a CNN | Middle | Exploits and fuses the whole feature map, in contrast to previous fusion-based networks; generates high-quality proposals through fusion while the fast one-stage design keeps inference speed high | Lacks a strong LiDAR input representation; detects pedestrians only |
| D3PD [75] | Two-stage | Outdoor | 3D person detection in automotive scenes | Early, Late, Deep | Performs end-to-end learning on camera-LiDAR data and yields a high-level sensor data representation | Depends on ground-plane estimation to generate 3D anchor proposals |
| MVX-Net [76] | One-stage | Outdoor | Integration of RGB and point-cloud modalities | Early, Middle | Reduces false positives and false negatives through effective multi-modal fusion | Does not provide a multi-class detection network |
| SharedNet [77] | One-stage | Outdoor | LiDAR-camera 3D object detection with a single neural network for autonomous vehicles | Early, Middle | Achieves a good balance between accuracy and efficiency; reduces memory requirements and model training time | Slightly inferior performance on car detection |
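To make the fusion-level column concrete, the sketch below contrasts early, middle, and late fusion in a toy camera-LiDAR detector. It is a minimal illustration only: the encoders, the 7-channel output head, and the assumption that the LiDAR bird's-eye-view raster is pre-aligned with the camera image grid are all hypothetical simplifications, not the architectures of MV3D [73], MVX-Net [76], or the other surveyed networks.

```python
# Minimal PyTorch-style sketch of the fusion levels named in Table 18.
# All modules and tensor shapes are hypothetical placeholders.
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Stand-in 2D encoder for either a camera image or a LiDAR BEV raster."""
    def __init__(self, in_ch, out_ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class FusionDetector(nn.Module):
    def __init__(self, level="middle"):
        super().__init__()
        self.level = level
        if level == "early":
            # Early fusion: concatenate raw inputs, run one shared encoder.
            self.encoder = TinyEncoder(in_ch=3 + 1)
            self.head = nn.Conv2d(32, 7, 1)   # e.g., per-cell box parameters
        elif level == "middle":
            # Middle fusion: separate encoders, fuse the feature maps.
            self.cam_enc = TinyEncoder(in_ch=3)
            self.lidar_enc = TinyEncoder(in_ch=1)
            self.head = nn.Conv2d(64, 7, 1)
        else:
            # Late fusion: independent detection branches, outputs combined.
            self.cam_enc, self.lidar_enc = TinyEncoder(3), TinyEncoder(1)
            self.cam_head = nn.Conv2d(32, 7, 1)
            self.lidar_head = nn.Conv2d(32, 7, 1)

    def forward(self, image, bev):
        # image: (B, 3, H, W) camera frame; bev: (B, 1, H, W) LiDAR BEV raster,
        # assumed here to be pre-aligned to a common grid for simplicity.
        if self.level == "early":
            return self.head(self.encoder(torch.cat([image, bev], dim=1)))
        if self.level == "middle":
            feats = torch.cat([self.cam_enc(image), self.lidar_enc(bev)], dim=1)
            return self.head(feats)
        # Late fusion: average the per-branch predictions.
        return 0.5 * (self.cam_head(self.cam_enc(image))
                      + self.lidar_head(self.lidar_enc(bev)))

image, bev = torch.rand(1, 3, 64, 64), torch.rand(1, 1, 64, 64)
for level in ("early", "middle", "late"):
    out = FusionDetector(level)(image, bev)
    print(level, tuple(out.shape))   # each prints (1, 7, 64, 64)
```

The deep fusion listed for MV3D [73] and D3PD [75] extends the middle-fusion case above by exchanging features between the two branches across multiple layers rather than fusing once.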