Table 4.
Metric | Pretraining | VinDr-CXR | ChestX-ray14 | CheXpert | MIMIC-CXR | UKA-CXR | PadChest
---|---|---|---|---|---|---|---
ROC-AUC | DINOv2 | 88.92 ± 4.59 | 79.79 ± 6.55 | 80.02 ± 6.60 | 80.52 ± 6.17 | 89.74 ± 3.57 | 87.62 ± 4.86
 | ImageNet-21K | 86.38 ± 6.27 | 79.10 ± 6.34 | 79.56 ± 6.51 | 79.92 ± 6.35 | 89.45 ± 3.62 | 87.12 ± 5.05
Accuracy | DINOv2 | 82.49 ± 6.92 | 72.81 ± 7.43 | 72.37 ± 8.29 | 73.08 ± 5.32 | 80.68 ± 4.00 | 79.82 ± 6.69
 | ImageNet-21K | 81.92 ± 6.50 | 71.69 ± 7.29 | 71.36 ± 8.39 | 73.00 ± 5.37 | 79.94 ± 4.29 | 78.73 ± 7.49
Sensitivity | DINOv2 | 83.58 ± 6.93 | 73.14 ± 8.94 | 75.68 ± 6.45 | 74.87 ± 10.01 | 83.42 ± 4.57 | 81.66 ± 6.91
 | ImageNet-21K | 78.50 ± 8.97 | 73.04 ± 8.23 | 75.43 ± 6.00 | 73.91 ± 9.51 | 83.76 ± 4.37 | 81.80 ± 5.30
Specificity | DINOv2 | 81.69 ± 7.37 | 73.32 ± 8.00 | 70.95 ± 9.69 | 72.25 ± 6.04 | 80.32 ± 4.44 | 79.49 ± 6.97
 | ImageNet-21K | 81.80 ± 6.88 | 72.10 ± 7.94 | 70.23 ± 9.33 | 72.30 ± 6.16 | 79.39 ± 4.61 | 78.37 ± 7.80
ROC-AUC p-value | DINOv2 vs. ImageNet-21K | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001
The metrics used for comparison are the area under the receiver operating characteristic curve (ROC-AUC), accuracy, sensitivity, and specificity, given as percentages and averaged over all labels of each dataset. The two pretraining strategies compared are self-supervised pretraining on non-medical images (DINOv2 [18]) and fully supervised pretraining on non-medical images (ImageNet-21K [13]). The datasets are VinDr-CXR, ChestX-ray14, CheXpert, MIMIC-CXR, UKA-CXR, and PadChest, with n = 15,000, n = 86,524, n = 128,356, n = 170,153, n = 153,537, and n = 88,480 training images for fine-tuning and n = 3,000, n = 25,596, n = 39,824, n = 43,768, n = 39,824, and n = 22,045 test images, respectively. For the labels used in each dataset, see Table 3. The p-values refer to the comparison of the ROC-AUC values obtained with DINOv2 and ImageNet-21K pretraining weights.
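The label-averaged metrics described in the caption can be sketched as follows. This is a minimal illustration, assuming per-label binary ground truth, continuous model scores, and a per-label decision threshold supplied by the caller. The threshold-selection rule, the helper names (`roc_auc`, `binary_metrics`, `label_averaged`), and the toy data are hypothetical and not taken from the study. ROC-AUC is computed per label through its Mann-Whitney interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counting one half. The sketch does not reproduce the ± spread or the p-value computation, whose procedures are not described in this caption.

```python
from statistics import mean


def roc_auc(y_true, scores):
    """ROC-AUC for one label via the Mann-Whitney U statistic (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def binary_metrics(y_true, scores, threshold):
    """Accuracy, sensitivity, and specificity for one label at a fixed threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity


def label_averaged(labels, thresholds):
    """Average each metric over all labels and report it as a percentage.

    labels: dict mapping label name -> (y_true, scores)
    thresholds: dict mapping label name -> decision threshold
    Assumption (hypothetical): thresholds are chosen by the caller; the
    study's threshold rule is not stated in this caption.
    """
    per_label = {"ROC-AUC": [], "Accuracy": [], "Sensitivity": [], "Specificity": []}
    for name, (y_true, scores) in labels.items():
        # Compute ROC-AUC first so a label lacking positives or negatives
        # fails with a clear ValueError instead of a ZeroDivisionError.
        per_label["ROC-AUC"].append(roc_auc(y_true, scores))
        acc, sens, spec = binary_metrics(y_true, scores, thresholds[name])
        per_label["Accuracy"].append(acc)
        per_label["Sensitivity"].append(sens)
        per_label["Specificity"].append(spec)
    return {metric: 100 * mean(values) for metric, values in per_label.items()}


if __name__ == "__main__":
    # Toy data for two hypothetical labels "A" and "B".
    toy = {
        "A": ([1, 1, 0, 0], [0.9, 0.4, 0.3, 0.6]),
        "B": ([1, 0, 0, 1], [0.8, 0.1, 0.2, 0.7]),
    }
    print(label_averaged(toy, {"A": 0.5, "B": 0.5}))
```

On the toy data, label "A" has ROC-AUC 0.75 and label "B" has 1.0, so the label-averaged ROC-AUC is 87.5%. The pairwise AUC loop is quadratic in the number of cases and is meant only to make the definition explicit; for test sets of the sizes listed above, a rank-based or library implementation would be the practical choice.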