
Table 1.

Benchmarking transfer learning with transformer (i.e., ViT-B and Swin-B) models pre-trained on ImageNet across six medical image classification tasks. For comparison with CNNs, we evaluate four pre-trained models with a ResNet-50 backbone. To provide a comprehensive evaluation, we also include results for training each of the three architectures from scratch. Results are reported as mean AUC (%) ± standard deviation.

Backbone   Initialization     ChestX-ray14  CheXpert    Shenzhen    VinDr-CXR   RSNA Pneumonia  RSNA PE

ResNet-50  Scratch            80.40±0.05    86.62±0.15  89.03±1.82  87.39±0.42  70.00±0.50      90.37±1.32
           Supervised [20]    81.70±0.15    87.17±0.22  95.62±0.63  91.77±0.40  73.04±0.35      94.73±0.12
           Sup. (IN21K) [21]  81.40±0.27    87.23±0.86  94.64±0.39  91.66±0.56  73.63±0.45      94.66±0.18
           DINO [6]           81.41±0.35    87.37±0.45  96.38±0.48  90.96±0.68  73.58±0.35      95.60±0.10
           MoCo-v3 [8]        81.87±0.15    87.59±0.51  95.55±0.40  91.91±0.59  73.39±0.27      95.61±0.12

ViT-B      Scratch            71.69±0.32    80.78±0.03  82.24±0.60  70.22±1.95  66.59±0.39      84.68±0.09
           Sup. (IN21K) [10]  80.05±0.17    87.88±0.50  93.67±1.03  88.30±1.45  71.50±0.52      91.19±0.11
           DeiT [23]          79.46±0.24    87.49±0.43  95.35±0.80  89.64±2.97  72.93±0.62      91.95±0.07
           DINO [6]           78.37±0.47    87.01±0.62  90.39±4.29  82.89±1.10  71.27±0.45      88.99±0.08
           MoCo-v3 [8]        79.20±0.30    87.12±0.36  92.85±1.00  87.25±0.63  72.79±0.52      91.33±0.10
           BEiT (IN21K) [3]   79.91±0.24    87.77±0.38  92.87±1.08  85.93±1.98  72.78±0.37      91.31±0.10
           MAE [12]           79.01±0.58    87.12±0.54  92.52±4.98  87.00±1.74  72.85±0.50      91.96±0.12
           SimMIM [27]        79.55±0.56    88.07±0.43  93.47±2.48  88.91±0.55  72.08±0.47      91.39±0.10

Swin-B     Scratch            77.04±0.34    83.39±0.84  92.52±4.98  78.49±1.00  70.02±0.42      90.63±0.10
           Supervised [18]    81.73±0.14    87.80±0.42  93.35±0.77  90.35±0.31  73.44±0.46      94.85±0.07
           Sup. (IN21K) [18]  81.74±0.13    87.94±0.54  94.21±1.25  91.23±1.06  73.20±0.59      94.58±0.13
           SimMIM [27]        81.95±0.15    88.16±0.31  94.12±0.96  90.24±0.35  73.66±0.34      95.27±0.12

Abbreviations: Sup.: Supervised; IN21K: ImageNet-21K.

Unless otherwise mentioned, all models are pre-trained on ImageNet-1K.
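For orientation, a minimal sketch of the transfer-learning setup benchmarked in Table 1 is given below. It assumes the timm library and a multi-label chest X-ray task such as ChestX-ray14; the model name, hyperparameters, and random-tensor smoke test are illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal sketch: fine-tuning an ImageNet pre-trained backbone on a
# multi-label chest X-ray task. Assumes the timm library; hyperparameters
# and data are placeholders, not the paper's actual configuration.
import torch
import torch.nn as nn
import timm

NUM_CLASSES = 14  # e.g., the 14 pathology labels of ChestX-ray14

# Any of the benchmarked backbones can be created the same way, e.g.
# "resnet50", "vit_base_patch16_224", or "swin_base_patch4_window7_224".
# pretrained=False would correspond to the "Scratch" rows of Table 1.
model = timm.create_model(
    "vit_base_patch16_224",
    pretrained=True,          # ImageNet pre-trained initialization
    num_classes=NUM_CLASSES,  # replaces the classification head
)

criterion = nn.BCEWithLogitsLoss()  # multi-label classification loss
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def train_step(images: torch.Tensor, targets: torch.Tensor) -> float:
    """One fine-tuning step; `images` is (B, 3, 224, 224), `targets` is (B, 14)."""
    model.train()
    optimizer.zero_grad()
    logits = model(images)
    loss = criterion(logits, targets)
    loss.backward()
    optimizer.step()
    return loss.item()

# Smoke test with random data; a real run would iterate over a DataLoader.
loss = train_step(torch.randn(2, 3, 224, 224),
                  torch.randint(0, 2, (2, NUM_CLASSES)).float())
print(f"loss: {loss:.4f}")
```

The self-supervised rows (e.g., DINO, MoCo-v3, MAE, SimMIM) would instead initialize the backbone from the corresponding publicly released checkpoints before fine-tuning in the same way.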