Table 2.
Results on the COCO test-val2017.
| Method | Size | $AP_{50}$ (%) | $AP$ (%) | $AP_{75}$ (%) | $AP_S$ (%) | $AP_M$ (%) | $AP_L$ (%) |
|---|---|---|---|---|---|---|---|
| RetinaNet [13] | 500 | 53.1 | 34.4 | 36.8 | 14.7 | 38.5 | 49.1 |
| SSD [12] | 512 | 48.5 | 28.8 | 30.3 | 10.9 | 31.8 | 43.5 |
| CornerNet [26] | 512 | 57.8 | 40.5 | 45.3 | 20.8 | 44.8 | 56.7 |
| CenterNet [27] | 512 | 62.4 | 44.9 | 48.1 | 25.6 | 47.4 | 57.4 |
| YOLOv3 [15] | 608 | 57.9 | 33.0 | 34.3 | 18.3 | 35.4 | 41.9 |
| ASFF [33] | 608 | 63.0 | 42.4 | 47.4 | 25.5 | 45.7 | 52.3 |
| EfficientDet [32] | 896 | 65.0 | 45.8 | 49.3 | 26.6 | 49.4 | 59.8 |
| FCOS [28] | 800 × 1024 | 64.1 | 44.7 | 48.4 | 27.6 | 47.5 | 55.6 |
| DETR-DC5-R101 [30] | 800 × 1333 | 64.7 | 44.9 | 47.7 | 23.7 | 49.5 | 62.3 |
| YOLOv4 [17] | 512 | 64.9 | 45.4 | 48.7 | 23.0 | 50.7 | 62.7 |
| Ours | 512 | 65.4 | 45.6 | 48.9 | 25.4 | 51.0 | 62.0 |
| ViDT-tiny [31] | 800 × 1333 | 64.5 | 44.8 | 48.7 | 25.9 | 47.6 | 62.1 |
| ViDT-base [31] | 800 × 1333 | 69.4 | 49.2 |  |  |  |  |
| YOLOv8-S [21] | 640 | 61.8 | 44.9 | – | – | – | – |
| YOLOv8-L [21] | 640 |  |  | – | – | – | – |
The proposed method achieved the best detection results on $AP_{50}$, which refers to the mean average precision (mAP) at an intersection over union (IoU) threshold of 0.5. $AP$, $AP_S$, $AP_M$, and $AP_L$ refer to the mAP averaged across IoU thresholds (i.e., $AP$ is the average of the mAP values at 10 IoU thresholds). Bold-italic, italic, and boldfaced values represent the best, second-best, and third-best results in each column, respectively.
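
As a rough illustration of the metrics described in the table note, the following Python sketch computes the IoU of two axis-aligned boxes and averages per-threshold precision values over 10 IoU thresholds. The function names, the `(x1, y1, x2, y2)` box format, and the threshold grid from 0.50 to 0.95 in steps of 0.05 (the usual COCO convention) are assumptions made for illustration; they are not taken from the paper's implementation.

```python
# Illustrative sketch only: all names here are hypothetical, not from the paper.

def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if any).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def coco_iou_thresholds():
    """Assumed threshold grid: 0.50, 0.55, ..., 0.95 (10 values)."""
    return [round(0.50 + 0.05 * i, 2) for i in range(10)]


def mean_over_thresholds(ap_values):
    """Average per-threshold AP values, one value per IoU threshold."""
    thresholds = coco_iou_thresholds()
    if len(ap_values) != len(thresholds):
        raise ValueError("expected one AP value per IoU threshold")
    return sum(ap_values) / len(ap_values)


# A prediction covering 60% of the ground-truth box area, fully inside it.
ground_truth = (0, 0, 10, 10)
prediction = (0, 0, 10, 6)
overlap = iou(ground_truth, prediction)

# The same prediction counts as a match for AP50 but not for AP75.
matches_at_50 = overlap >= 0.50
matches_at_75 = overlap >= 0.75
```

In this example the prediction is a true positive at the 0.5 threshold but a miss at 0.75, which is why the threshold-averaged columns are typically lower than $AP_{50}$ for the same detector.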