Table 2.
Verification performance on two subsets of the ChestX-ray14 dataset, one with and one without foreign material (first two rows), and on the CheXpert dataset and the COVID-19 Image Data Collection (last two rows). We report the AUC (with the lower and upper bounds of the 95% confidence interval from 10,000 bootstrap runs), accuracy, specificity, recall, precision, and F1-score.
| Dataset | Subset | AUC + 95% CI | Accuracy | Specificity | Recall | Precision | F1-score |
|---|---|---|---|---|---|---|---|
| ChestX-ray14 | w/ foreign material | 0.9795 | | | | | |
| ChestX-ray14 | w/o foreign material | 0.9862 | | | | | |
| CheXpert | – | 0.9429 | | | | | |
| COVID-19 | – | 0.9127 | | | | | |
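
The metrics named in the caption can be computed along the following lines. This is a minimal sketch, not the authors' evaluation code: the function names (`auc_score`, `bootstrap_auc_ci`, `threshold_metrics`), the percentile form of the bootstrap interval, and the decision threshold of 0.5 are all assumptions made for illustration. The source states only that the confidence intervals come from 10,000 bootstrap runs.

```python
import numpy as np


def auc_score(y_true, scores):
    """AUC via the rank-sum (Mann-Whitney U) formulation; ties get average ranks."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n_pos = int(y_true.sum())
    n_neg = y_true.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined unless both classes are present")
    # 1-based average rank of every score (tied scores share one rank).
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    avg_rank = np.cumsum(counts) - (counts - 1) / 2.0
    ranks = avg_rank[inverse]
    u_stat = ranks[y_true].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u_stat / (n_pos * n_neg))


def bootstrap_auc_ci(y_true, scores, n_runs=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed variant, see lead-in)."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    if y_true.all() or not y_true.any():
        raise ValueError("Both classes are required for bootstrapping the AUC")
    rng = np.random.default_rng(seed)
    n = y_true.size
    aucs = []
    while len(aucs) < n_runs:
        idx = rng.integers(0, n, size=n)  # resample pairs with replacement
        if y_true[idx].all() or not y_true[idx].any():
            continue  # skip resamples that lost one class
        aucs.append(auc_score(y_true[idx], scores[idx]))
    lower, upper = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lower), float(upper)


def threshold_metrics(y_true, scores, threshold=0.5):
    """Accuracy, specificity, recall, precision, F1 at an assumed threshold."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(scores, dtype=float) >= threshold
    tp = int(np.sum(y_pred & y_true))
    tn = int(np.sum(~y_pred & ~y_true))
    fp = int(np.sum(y_pred & ~y_true))
    fn = int(np.sum(~y_pred & y_true))

    def ratio(num, den):
        # Report 0.0 instead of dividing by zero when a count is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "specificity": ratio(tn, tn + fp),
        "recall": recall,
        "precision": precision,
        "f1": ratio(2 * precision * recall, precision + recall),
    }
```

Here `y_true` would hold 1 for image pairs from the same patient and 0 otherwise, and `scores` the model's similarity outputs. Resampling whole (label, score) pairs keeps each bootstrap replicate consistent with the original pairing.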