Table 2.
Feature | Faster R-CNN | YOLO v3 | SSD |
---|---|---|---|
Phases | RPN + Fast R-CNN detector | Concurrent bounding-box regression and classification | Concurrent bounding-box regression and classification |
Neural Network Type | Fully convolutional | Fully convolutional | Fully convolutional |
Backbone Feature Extractor | VGG-16 or other feature extractors | Darknet-53 (53 convolutional layers) | VGG-16 or other feature extractors |
Location Detection | Anchor-based | Anchor-based | Prior boxes (default boxes) |
Anchor Box | 9 default boxes with different scales and aspect ratios | 9 anchor boxes of different sizes, obtained by K-means clustering on COCO and VOC | A fixed number of bounding boxes with different scales and aspect ratios in each feature map |
IoU Thresholds | Two (at 0.3 and 0.7) | One (at 0.5) | One (at 0.5) |
Loss Function | Softmax loss for classification; Smooth L1 loss for regression | Binary cross-entropy loss | Softmax loss for confidence; Smooth L1 loss for localization |
Input Size | Preserves the aspect ratio of the original image; resized dimensions range from 500 (smallest) to 1000 (largest) | Random multi-scale input | Resizes original images to a fixed size (300 × 300 or 512 × 512) |
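
The IoU thresholds and the Smooth L1 loss listed in the table can be illustrated with a minimal Python sketch. The function names (`iou`, `label_two_thresholds`, `label_one_threshold`, `smooth_l1`) and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration and are not taken from any of the three detectors' code. The labelling rules are simplified: the two-threshold rule follows the usual Faster R-CNN convention (positive at or above 0.7, negative below 0.3, ignored in between), while the single-threshold rule only shows the 0.5 cut-off and does not reproduce how YOLO v3 or SSD handle every unmatched box.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_two_thresholds(iou_value, low=0.3, high=0.7):
    """Two-threshold labelling (assumed Faster R-CNN style)."""
    if iou_value >= high:
        return "positive"
    if iou_value < low:
        return "negative"
    return "ignored"  # between the thresholds: not used for training


def label_one_threshold(iou_value, threshold=0.5):
    """Single-threshold labelling (simplified view of the 0.5 cut-off)."""
    return "positive" if iou_value > threshold else "negative"


def smooth_l1(x):
    """Smooth L1 loss for a single regression residual x."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


if __name__ == "__main__":
    anchor = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 1.0, 3.0, 3.0)
    overlap = iou(anchor, ground_truth)
    print(overlap, label_two_thresholds(overlap), label_one_threshold(overlap))
    print(smooth_l1(0.5), smooth_l1(2.0))
```

With two thresholds, anchors whose IoU falls between 0.3 and 0.7 are excluded from the loss, whereas a single threshold assigns every candidate to one side of the cut-off.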