2022 Jul 8;3(7):100520. doi: 10.1016/j.patter.2022.100520

Table 6.

Object detection and instance segmentation results of different backbones on the COCO val2017 dataset

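| Backbone (Mask R-CNN 1× [38]) | AP^b | AP^b_50 | AP^b_75 | AP^m | AP^m_50 | AP^m_75 | Params | FLOPs |
|---|---|---|---|---|---|---|---|---|
| CNN based | | | | | | | | |
| ResNet101 [97] | 40.4 | 61.1 | 44.2 | 36.4 | 57.7 | 38.8 | 63.2M | 336G |
| ResNeXt101 [126] | 42.8 | 63.8 | 47.3 | 38.4 | 60.6 | 41.3 | 101.9M | 493G |
| VAN-large [105] | 47.1 | 67.9 | 51.9 | 42.2 | 65.4 | 45.5 | 64.4M | - |
| Transformer based | | | | | | | | |
| PVT-large [112] | 42.9 | 65.0 | 46.6 | 39.5 | 61.9 | 42.5 | 81M | 364G |
| Swin-B [49] | 46.9 | - | - | 42.3 | - | - | 107M | 496G |
| CSWin-B [115] | 48.7 | 70.4 | 53.9 | 43.9 | 67.8 | 47.3 | 97M | 526G |
| MLP based | | | | | | | | |
| CycleMLP-B5 [83] | 44.1 | 65.5 | 48.4 | 40.1 | 62.8 | 43.0 | 95.3M | 421G |
| WaveMLP-B [76] | 45.7 | 67.5 | 50.1 | 27.8 | 49.2 | 59.7 | 75.1M | 353G |
| HireMLP-L [86] | 45.9 | 67.2 | 50.4 | 41.7 | 64.7 | 45.3 | 115.2M | 443G |
| MS-MLP-B [87] | 46.4 | 67.2 | 50.7 | 42.4 | 63.6 | 46.4 | 107.5M | 557G |
| ActiveMLP-L [85] | 47.4 | 69.9 | 52.0 | 43.2 | 67.3 | 46.5 | 96.0M | - |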

Results are obtained with Mask R-CNN [38]; “1×” means that a single-scale training schedule is used.

The best performance.
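As a rough aid for reading the table, the sketch below ranks the backbones by box AP and by box AP per 100 GFLOPs. The values are copied from the table rows (backbone names without their reference numbers); the helper names `best_by_box_ap` and `box_ap_per_100_gflops` and the efficiency ratio itself are illustrative assumptions, not metrics reported by the authors.

```python
# Rows copied from the table: (backbone, box AP, mask AP, params in M, FLOPs in G).
# FLOPs is None where the table leaves the cell empty.
ROWS = [
    ("ResNet101", 40.4, 36.4, 63.2, 336),
    ("ResNeXt101", 42.8, 38.4, 101.9, 493),
    ("VAN-large", 47.1, 42.2, 64.4, None),
    ("PVT-large", 42.9, 39.5, 81.0, 364),
    ("Swin-B", 46.9, 42.3, 107.0, 496),
    ("CSWin-B", 48.7, 43.9, 97.0, 526),
    ("CycleMLP-B5", 44.1, 40.1, 95.3, 421),
    ("WaveMLP-B", 45.7, 27.8, 75.1, 353),
    ("HireMLP-L", 45.9, 41.7, 115.2, 443),
    ("MS-MLP-B", 46.4, 42.4, 107.5, 557),
    ("ActiveMLP-L", 47.4, 43.2, 96.0, None),
]


def best_by_box_ap(rows):
    """Return the name of the backbone with the highest box AP."""
    return max(rows, key=lambda row: row[1])[0]


def box_ap_per_100_gflops(rows):
    """Map each backbone with a known FLOPs value to box AP per 100 GFLOPs.

    This ratio is a hypothetical convenience measure for this sketch only.
    """
    return {
        name: round(100 * box_ap / flops, 2)
        for name, box_ap, _mask_ap, _params, flops in rows
        if flops is not None
    }


if __name__ == "__main__":
    print("Highest box AP:", best_by_box_ap(ROWS))
    efficiency = box_ap_per_100_gflops(ROWS)
    for name in sorted(efficiency, key=efficiency.get, reverse=True):
        print(f"{name}: {efficiency[name]}")
```

Backbones whose FLOPs cell is empty (VAN-large and ActiveMLP-L) are left out of the ratio rather than assigned a guessed value.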