Table 6. Mask R-CNN [38] results (1× schedule).

| Backbone | APᵇ | APᵇ₅₀ | APᵇ₇₅ | APᵐ | APᵐ₅₀ | APᵐ₇₅ | Params | FLOPs |
|---|---|---|---|---|---|---|---|---|
| *CNN based* | | | | | | | | |
| ResNet101 [97] | 40.4 | 61.1 | 44.2 | 36.4 | 57.7 | 38.8 | 63.2M | 336G |
| ResNeXt101 [126] | 42.8 | 63.8 | 47.3 | 38.4 | 60.6 | 41.3 | 101.9M | 493G |
| VAN-large [105] | 47.1 | 67.9 | 51.9 | 42.2 | 65.4 | 45.5 | 64.4M | – |
| *Transformer based* | | | | | | | | |
| PVT-large [112] | 42.9 | 65.0 | 46.6 | 39.5 | 61.9 | 42.5 | 81M | 364G |
| Swin-B [49] | 46.9 | – | – | 42.3 | – | – | 107M | 496G |
| CSWin-B [115] | 48.7∗ | 70.4∗ | 53.9∗ | 43.9∗ | 67.8∗ | 47.3 | 97M | 526G |
| *MLP based* | | | | | | | | |
| CycleMLP-B5 [83] | 44.1 | 65.5 | 48.4 | 40.1 | 62.8 | 43.0 | 95.3M | 421G |
| WaveMLP-B [76] | 45.7 | 67.5 | 50.1 | 27.8 | 49.2 | 59.7∗ | 75.1M | 353G |
| HireMLP-L [86] | 45.9 | 67.2 | 50.4 | 41.7 | 64.7 | 45.3 | 115.2M | 443G |
| MS-MLP-B [87] | 46.4 | 67.2 | 50.7 | 42.4 | 63.6 | 46.4 | 107.5M | 557G |
| ActiveMLP-L [85] | 47.4 | 69.9 | 52.0 | 43.2 | 67.3 | 46.5 | 96.0M | – |
All results employ Mask R-CNN [38]; "1×" denotes the standard 12-epoch training schedule with single-scale training.
∗ Best result in each column.
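For readers unfamiliar with the "1×" label, the sketch below encodes the learning-rate rule that MMDetection uses by default for this schedule: 12 epochs, the rate divided by 10 from (0-indexed) epochs 8 and 11 onward, and a linear warmup over the first 500 iterations starting at 0.001 of the base rate. These values are assumptions taken from that common convention, not from the papers in the table; the base rate of 0.02 (SGD, batch size 16) and the function name `lr_1x` are hypothetical, and several of the transformer and MLP backbones above were trained with AdamW at a different base rate.

```python
# Sketch of the "1x" learning-rate schedule, assuming MMDetection's default
# values. Individual papers in Table 6 may use other optimizers or rates.
BASE_LR = 0.02          # hypothetical base rate (SGD, batch size 16)
MAX_EPOCHS = 12         # "1x" = 12 training epochs
DECAY_EPOCHS = (8, 11)  # LR is multiplied by GAMMA from these 0-indexed epochs on
GAMMA = 0.1
WARMUP_ITERS = 500
WARMUP_RATIO = 0.001    # warmup starts at base_lr * WARMUP_RATIO


def lr_1x(epoch: int, global_iter: int, base_lr: float = BASE_LR) -> float:
    """Return the learning rate for a 0-indexed epoch and global iteration."""
    if not 0 <= epoch < MAX_EPOCHS:
        raise ValueError(f"epoch must be in [0, {MAX_EPOCHS}), got {epoch}")
    # Step decay: one factor of GAMMA for each decay epoch already reached.
    num_decays = sum(epoch >= e for e in DECAY_EPOCHS)
    lr = base_lr * GAMMA ** num_decays
    # Linear warmup over the first WARMUP_ITERS iterations of training.
    if global_iter < WARMUP_ITERS:
        k = (1 - global_iter / WARMUP_ITERS) * (1 - WARMUP_RATIO)
        lr *= 1 - k
    return lr


if __name__ == "__main__":
    # Print the post-warmup rate at the epochs where the schedule changes.
    for epoch in (0, 7, 8, 10, 11):
        print(epoch, lr_1x(epoch, global_iter=10_000))
```

A "2×" or "3×" schedule stretches the same step pattern over 24 or 36 epochs, which is why results under different schedules are not directly comparable with the 1× numbers above.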