2022 Sep 1;12:14851. doi: 10.1038/s41598-022-19045-3

Table 2.

Comparison of the verification performance on two subsets of the ChestX-ray14 dataset, one containing images with foreign material and one containing images without it (first two rows). We also show the verification results for the CheXpert dataset and the COVID-19 Image Data Collection (last two rows). We report the AUC (with the lower and upper bounds of the 95% confidence interval from 10,000 bootstrap runs, given as subscript and superscript, respectively), the accuracy, the specificity, the recall, the precision, and the F1-score.

Dataset	Subset	AUC with 95% CI	Accuracy ($\frac{TP + TN}{P + N}$)	Specificity ($\frac{TN}{N}$)	Recall ($\frac{TP}{P}$)	Precision ($\frac{TP}{TP + FP}$)	F1-score
ChestX-ray14	w/ foreign material	$0.9970_{0.9938}^{0.9993}$	$0.9796\ (\frac{672}{686})$	$0.9854\ (\frac{338}{343})$	$0.9738\ (\frac{334}{343})$	$0.9853\ (\frac{334}{339})$	0.9795
ChestX-ray14	w/o foreign material	$0.9972_{0.9909}^{0.9999}$	$0.9862\ (\frac{430}{436})$	$0.9908\ (\frac{216}{218})$	$0.9817\ (\frac{214}{218})$	$0.9907\ (\frac{214}{216})$	0.9862
CheXpert	–	$0.9870_{0.9855}^{0.9884}$	$0.9440\ (\frac{15{,}562}{16{,}486})$	$0.9629\ (\frac{7{,}937}{8{,}243})$	$0.9250\ (\frac{7{,}625}{8{,}243})$	$0.9614\ (\frac{7{,}625}{7{,}931})$	0.9429
COVID-19	–	$0.9763_{0.9696}^{0.9825}$	$0.9180\ (\frac{1{,}421}{1{,}548})$	$0.9780\ (\frac{757}{774})$	$0.8579\ (\frac{664}{774})$	$0.9750\ (\frac{664}{681})$	0.9127
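
The threshold-based columns of the table can be recomputed from the fractions it reports. The following is a minimal sketch, not the authors' code: the names `Confusion`, `from_table`, `metrics`, and `rows` are hypothetical, and the F1-score is assumed to be the usual harmonic mean of precision and recall, which the table header does not spell out. The counts are taken from the table: TP and P from the recall column, TN and N from the specificity column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for a binary verification task."""
    tp: int
    tn: int
    fp: int
    fn: int

    @classmethod
    def from_table(cls, tp: int, tn: int, positives: int, negatives: int) -> "Confusion":
        # The table reports TP/P and TN/N, so FN and FP follow by subtraction.
        return cls(tp=tp, tn=tn, fp=negatives - tn, fn=positives - tp)


def metrics(c: Confusion) -> dict:
    """Return the five threshold-based metrics listed in the table header."""
    p = c.tp + c.fn  # number of positive pairs
    n = c.tn + c.fp  # number of negative pairs
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / p
    return {
        "accuracy": (c.tp + c.tn) / (p + n),
        "specificity": c.tn / n,
        "recall": recall,
        "precision": precision,
        # Assumption: F1 is the harmonic mean of precision and recall.
        "f1": 2 * precision * recall / (precision + recall),
    }


# Counts copied from the table rows (TP, TN, P, N).
rows = {
    "ChestX-ray14 w/ foreign material": Confusion.from_table(334, 338, 343, 343),
    "ChestX-ray14 w/o foreign material": Confusion.from_table(214, 216, 218, 218),
    "CheXpert": Confusion.from_table(7625, 7937, 8243, 8243),
    "COVID-19": Confusion.from_table(664, 757, 774, 774),
}

for name, counts in rows.items():
    rounded = {k: round(v, 4) for k, v in metrics(counts).items()}
    print(name, rounded)
```

Rounded to four decimals, these values can be compared against the corresponding table cells. The AUC and its bootstrap confidence interval cannot be recovered this way, since they depend on the per-pair scores rather than on the counts.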