2022 Aug 8;82(6):9243–9275. doi: 10.1007/s11042-022-13644-y

Table 3.

Comparison of two-stage detectors (R-CNN and its successors) and single-stage detectors (YOLO and its successors)

| Features | R-CNN and its successors | YOLO and its successors |
|---|---|---|
| Region proposals | Region proposals (ROIs) are generated by a dedicated step, such as the selective search algorithm. | Candidate boxes are predicted directly by a single convolutional network, with no separate proposal step. |
| Feature extraction (backbone network) | The backbone network is heavyweight and time-consuming. | The backbone network is a lightweight, faster feature extractor. |
| Number of stages and their role | The first stage extracts region proposals. The second stage extracts feature vectors and then produces detections. | A single-stage network predicts bounding-box offsets, confidence scores, and class-conditional probabilities. |
| Speed and accuracy | Higher accuracy but lower speed. | Faster detection, with accuracy close to that of two-stage detectors. |
| Computational cost | They are computationally expensive and require powerful computing resources. | They do not require powerful computing resources and are less expensive. |
| Performance | They are effective at detecting both small and large objects. | They have mostly performed poorly on small objects and well on larger objects. |
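
To make the single-stage row concrete, the following is a minimal sketch of how one grid cell's outputs (bounding-box offsets, a confidence score, and class-conditional probabilities) could be combined into a scored detection. It assumes a YOLOv1-style encoding in which the centre offsets are relative to the cell and the width and height are relative to the image. The names `Detection` and `decode_cell` are hypothetical and used only for illustration; they do not come from any particular library, and the input values are invented.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    cx: float      # box centre x, as a fraction of image width
    cy: float      # box centre y, as a fraction of image height
    w: float       # box width, as a fraction of image width
    h: float       # box height, as a fraction of image height
    class_id: int  # index of the highest-scoring class
    score: float   # class-specific confidence


def decode_cell(row, col, grid_size, offsets, confidence, class_probs):
    """Turn one grid cell's outputs into a scored box (hypothetical helper).

    Assumes dx, dy lie in [0, 1] relative to the cell, and w, h are
    already expressed relative to the whole image.
    """
    dx, dy, w, h = offsets
    # Shift the in-cell offset by the cell index, then normalise to the image.
    cx = (col + dx) / grid_size
    cy = (row + dy) / grid_size
    # Class-specific score = confidence * class-conditional probability.
    scores = [confidence * p for p in class_probs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return Detection(cx, cy, w, h, best, scores[best])


# Illustrative values only: cell (row 3, col 2) of a 7 x 7 grid, three classes.
det = decode_cell(
    row=3, col=2, grid_size=7,
    offsets=(0.5, 0.5, 0.2, 0.4),
    confidence=0.8,
    class_probs=[0.1, 0.7, 0.2],
)
print(det)
```

Because every cell is decoded in the same forward pass, no separate proposal stage is needed, which is one reason single-stage detectors tend to run faster. A real detector would additionally apply a score threshold and non-maximum suppression across cells, which this sketch omits.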