2020 Aug 31;20(17):4938. doi: 10.3390/s20174938

Table 2.

Comparison of Faster R-CNN, You Only Look Once v3 (YOLO v3), and Single Shot MultiBox Detector (SSD).

	Faster R-CNN	YOLO v3	SSD
Phases	RPN + Fast R-CNN detector	Concurrent bounding-box regression and classification	Concurrent bounding-box regression and classification
Neural Network Type	Fully convolutional	Fully convolutional	Fully convolutional
Backbone Feature Extractor	VGG-16 or other feature extractors	Darknet-53 (53 convolutional layers)	VGG-16 or other feature extractors
Location Detection	Anchor-based	Anchor-based	Prior boxes/Default boxes
Anchor Box	9 default boxes with different scales and aspect ratios	9 anchor boxes of different sizes, obtained by k-means clustering on COCO and VOC	A fixed number of bounding boxes with different scales and aspect ratios in each feature map
IoU Thresholds	Two (at 0.3 and 0.7)	One (at 0.5)	One (at 0.5)
Loss Function	Softmax loss for classification; Smooth L1 loss for regression	Binary cross-entropy loss	Softmax loss for confidence; Smooth L1 loss for localization
Input Size	Preserves the aspect ratio of the original image; the resized dimensions range from a minimum of 500 to a maximum of 1000	Random multi-scale input	Resizes original images to a fixed size (300 × 300 or 512 × 512)
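
To make the IoU Thresholds row concrete, the following minimal Python sketch computes the intersection over union (IoU) of two boxes and labels a candidate box under the two schemes listed in the table. The function names, the (x1, y1, x2, y2) box format, and the "ignored" band between 0.3 and 0.7 are assumptions made for illustration; treating 0.5 as a plain positive/negative cutoff is also a simplification, and none of this is taken from the detectors' reference implementations.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_two_thresholds(overlap, low=0.3, high=0.7):
    """Two-threshold scheme (Faster R-CNN column): values between
    `low` and `high` are assumed to be left out of training."""
    if overlap >= high:
        return "positive"
    if overlap < low:
        return "negative"
    return "ignored"


def label_one_threshold(overlap, threshold=0.5):
    """Single-threshold scheme (YOLO v3 and SSD columns), simplified
    here to a plain positive/negative cutoff."""
    return "positive" if overlap >= threshold else "negative"


ground_truth = (0, 0, 10, 10)
candidate = (0, 0, 10, 5)  # covers the lower half of the ground truth
overlap = iou(ground_truth, candidate)
print(overlap, label_two_thresholds(overlap), label_one_threshold(overlap))
```

In this example the candidate box overlaps the ground truth with an IoU of 0.5, so it falls in the ignored band under the two-threshold scheme but counts as a positive match under the single 0.5 threshold.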