2020 Aug 31;20(17):4938. doi: 10.3390/s20174938

Table 2.

Comparison of Faster R-CNN, You Only Look Once v3 (YOLO v3) and Single Shot MultiBox Detector (SSD).

| Feature | Faster R-CNN | YOLO v3 | SSD |
| --- | --- | --- | --- |
| Phases | RPN + Fast R-CNN detector | Concurrent bounding-box regression and classification | Concurrent bounding-box regression and classification |
| Neural Network Type | Fully convolutional | Fully convolutional | Fully convolutional |
| Backbone Feature Extractor | VGG-16 or other feature extractors | Darknet-53 (53 convolutional layers) | VGG-16 or other feature extractors |
| Location Detection | Anchor-based | Anchor-based | Prior boxes / default boxes |
| Anchor Box | 9 anchor boxes with different scales and aspect ratios | 9 anchor boxes of different sizes, obtained by k-means clustering on COCO and VOC | A fixed number of bounding boxes with different scales and aspect ratios in each feature map |
| IoU Thresholds | Two (0.3 and 0.7) | One (0.5) | One (0.5) |
| Loss Function | Softmax loss for classification; Smooth L1 loss for regression | Binary cross-entropy loss | Softmax loss for confidence; Smooth L1 loss for localization |
| Input Size | Preserves the aspect ratio of the original image; resized dimensions range from 500 (smallest) to 1000 (largest) | Random multi-scale input | Resizes original images to a fixed size (300 × 300 or 512 × 512) |
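The IoU Thresholds row can be made concrete with a small sketch. The Python below computes IoU for corner-format boxes and labels an anchor from its best IoU with any ground-truth box, using two thresholds (0.3 and 0.7) in the Faster R-CNN style and a single 0.5 threshold as listed for YOLO v3 and SSD. The (x1, y1, x2, y2) box format and the function names are illustrative assumptions. The sketch also simplifies the real detectors: Faster R-CNN additionally marks the highest-IoU anchor for each ground-truth box as positive, and YOLO v3 uses 0.5 as an "ignore" threshold rather than a plain positive/negative split.

```python
from typing import Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) corners.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero so non-overlapping boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_two_thresholds(best_iou: float, low: float = 0.3, high: float = 0.7) -> int:
    """Two-threshold labeling: 1 = positive, 0 = negative, -1 = ignored."""
    if best_iou >= high:
        return 1
    if best_iou < low:
        return 0
    return -1  # between the thresholds: excluded from the loss


def label_one_threshold(best_iou: float, thr: float = 0.5) -> int:
    """Single-threshold labeling: 1 = positive, 0 = negative."""
    return 1 if best_iou >= thr else 0
```

For example, an anchor whose best IoU is 0.5 is ignored under the two-threshold rule but counted as positive under the single-threshold rule.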
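The Anchor Box row for Faster R-CNN lists 9 anchors per location. A minimal sketch of generating such a set as every combination of 3 scales and 3 aspect ratios, centred on one feature-map position, is shown below. The scales (128, 256, 512 pixels) and ratios (0.5, 1, 2) are assumed defaults that do not appear in Table 2, and `make_anchors` is a hypothetical helper.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def make_anchors(
    cx: float,
    cy: float,
    scales: Sequence[float] = (128, 256, 512),   # assumed anchor sizes (pixels)
    ratios: Sequence[float] = (0.5, 1.0, 2.0),   # assumed ratios, ratio = height / width
) -> List[Box]:
    """Return len(scales) * len(ratios) anchors centred on (cx, cy)."""
    anchors: List[Box] = []
    for s in scales:
        for r in ratios:
            # Keep the area fixed at s * s while changing the aspect ratio.
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors
```

In a full detector this set would be repeated at every position of the feature map, shifted by the feature stride.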
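For YOLO v3, the table states that the 9 anchors come from k-means clustering. One common way to do this, sketched below, clusters ground-truth (width, height) pairs using the distance d = 1 - IoU instead of Euclidean distance, so that large boxes do not dominate the error. The helper names, the mean-based centroid update, and the seeded random initialization are assumptions of this sketch rather than details taken from the paper.

```python
import random
from typing import List, Tuple

WH = Tuple[float, float]  # (width, height) of a ground-truth box


def wh_iou(a: WH, b: WH) -> float:
    """IoU of two boxes that share the same centre; only sizes matter."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(sizes: List[WH], k: int, iters: int = 100, seed: int = 0) -> List[WH]:
    """Cluster box sizes with d = 1 - IoU; return k anchors sorted by area."""
    rng = random.Random(seed)
    centroids = rng.sample(sizes, k)  # initial centroids drawn from the data
    for _ in range(iters):
        clusters: List[List[WH]] = [[] for _ in range(k)]
        for wh in sizes:
            # Smallest 1 - IoU distance is the same as the largest IoU.
            best = max(range(k), key=lambda i: wh_iou(wh, centroids[i]))
            clusters[best].append(wh)
        new_centroids: List[WH] = []
        for i, members in enumerate(clusters):
            if members:
                n = len(members)
                new_centroids.append((sum(w for w, _ in members) / n,
                                      sum(h for _, h in members) / n))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # assignments stopped changing
            break
        centroids = new_centroids
    return sorted(centroids, key=lambda c: c[0] * c[1])
```

On real data the input would be the (width, height) of every labeled box in the training set, with `k = 9`.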
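The Loss Function row names three losses. Scalar versions of each are sketched below: smooth L1 with transition point `beta = 1` (the form used in Faster R-CNN), softmax cross-entropy computed with the log-sum-exp trick, and binary cross-entropy on a raw logit. The function names are hypothetical, and these are per-element illustrations rather than the full multi-term objectives of any of the three detectors.

```python
import math
from typing import Sequence


def smooth_l1(x: float, beta: float = 1.0) -> float:
    """Quadratic for |x| < beta, linear beyond, so large errors are not squared."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """-log softmax(logits)[target], stabilised by subtracting the max logit."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def binary_cross_entropy(logit: float, target: float) -> float:
    """-[t log s(z) + (1 - t) log(1 - s(z))] with s = sigmoid, in a stable form."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))
```

Using one binary cross-entropy term per class, as in the YOLO v3 column, lets classes be predicted independently, whereas the softmax loss in the other two columns makes classes compete.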
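The Input Size row describes three resizing policies. The sketch below reads the Faster R-CNN entry as "scale the shorter side to 500 unless that pushes the longer side past 1000", which is an assumption about how the two numbers combine. The YOLO v3 multi-scale choice of multiples of 32 between 320 and 608 is also an assumption, since the table does not list the sizes. All function names are hypothetical.

```python
import random
from typing import Tuple


def resize_keep_aspect(w: int, h: int, short_target: int = 500, long_max: int = 1000) -> Tuple[int, int]:
    """Aspect-preserving resize: shorter side -> short_target, longer side capped at long_max."""
    scale = short_target / min(w, h)
    if max(w, h) * scale > long_max:
        scale = long_max / max(w, h)  # cap the longer side instead
    return round(w * scale), round(h * scale)


def resize_fixed(w: int, h: int, size: int = 300) -> Tuple[int, int]:
    """SSD-style warp to a fixed square; the original (w, h) and aspect ratio are discarded."""
    return size, size


def random_multiscale_size(rng: random.Random, low: int = 320, high: int = 608, stride: int = 32) -> int:
    """Pick a square training size that is a multiple of the stride (assumed range)."""
    return rng.choice(range(low, high + 1, stride))
```

For example, an 800 × 600 image becomes 667 × 500 under the aspect-preserving policy but 300 × 300 under the fixed SSD300 policy.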
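For SSD, the "fixed number of bounding boxes with different scales and aspect ratios in each feature map" comes from a linear scale schedule across feature maps. In this schedule, map k of m uses scale s_k = s_min + (s_max - s_min)(k - 1)/(m - 1), and each aspect ratio a gives a box of width s_k·sqrt(a) and height s_k/sqrt(a), relative to the image size. The sketch below assumes the commonly cited defaults (m = 6, s_min = 0.2, s_max = 0.9, ratios {1, 2, 3, 1/2, 1/3}). It omits the extra square box at scale sqrt(s_k·s_{k+1}) that SSD also adds. The helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def ssd_scales(m: int = 6, s_min: float = 0.2, s_max: float = 0.9) -> List[float]:
    """Linearly spaced default-box scales, one per feature map (assumed defaults)."""
    if m < 2:
        raise ValueError("need at least two feature maps")
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]


def default_box_shapes(
    s_k: float,
    ratios: Sequence[float] = (1.0, 2.0, 3.0, 1 / 2, 1 / 3),
) -> List[Tuple[float, float]]:
    """(width, height) of each default box at scale s_k, relative to image size."""
    return [(s_k * math.sqrt(a), s_k / math.sqrt(a)) for a in ratios]
```

Every box at a given scale has the same area s_k², so the ratios change only the shape, mirroring the Faster R-CNN anchor scheme but with one scale per feature map.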