2022 Sep 15;6(12):1399–1406. doi: 10.1038/s41551-022-00936-9

Table 5.

Impact of ensembling on performance

| Model | Mean AUC | Mean F1 | Mean MCC |
|---|---|---|---|
| Radiologists (mean) | N/A* | 0.619 | 0.530 |
| Best single model | 0.878 | 0.563 (0.527, 0.598) | 0.473 (0.434, 0.510) |
| Ensemble model | 0.889 | 0.606 (0.571, 0.638) | 0.523 (0.486, 0.561) |

Comparison between the ensemble of the top ten model checkpoints and the best single model on the CheXpert validation dataset. Results are averaged across the five CheXpert competition pathologies. Numbers in parentheses indicate 95% confidence intervals (CI). *The mean AUC of the radiologists is not available (N/A) because their binary predictions correspond to a single point on the receiver operating characteristic (ROC) curve, so no area under the curve can be computed.
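The caption combines two steps: pooling the top ten checkpoints into one ensemble, and averaging each metric over the five pathologies. The sketch below illustrates both in plain Python. It assumes that the ensemble averages the checkpoints' predicted probabilities and that binary predictions for F1 and MCC come from a fixed 0.5 threshold. Neither detail is stated in this table. The function names and the threshold are hypothetical. The sketch also shows why AUC needs continuous scores: it is computed from how positives and negatives are ranked, whereas F1 and MCC use only the thresholded labels. Confidence intervals are not reproduced here.

```python
import math
from statistics import mean


def ensemble_probs(checkpoint_probs):
    """Hypothetical ensembling rule: elementwise mean of per-checkpoint probabilities."""
    n = len(checkpoint_probs)
    return [sum(p) / n for p in zip(*checkpoint_probs)]


def auc(labels, scores):
    """ROC AUC as P(random positive scores above random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def confusion(labels, preds):
    """Return (tp, fp, fn, tn) for binary labels and binary predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    return tp, fp, fn, tn


def f1(labels, preds):
    tp, fp, fn, _ = confusion(labels, preds)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(labels, preds):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    tp, fp, fn, tn = confusion(labels, preds)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mean_metrics(labels_by_path, probs_by_path, threshold=0.5):
    """Macro-average AUC, F1 and MCC over pathologies (threshold is an assumption)."""
    aucs, f1s, mccs = [], [], []
    for path, labels in labels_by_path.items():
        scores = probs_by_path[path]
        preds = [1 if s >= threshold else 0 for s in scores]
        aucs.append(auc(labels, scores))  # uses continuous scores
        f1s.append(f1(labels, preds))     # uses binary predictions
        mccs.append(mcc(labels, preds))
    return mean(aucs), mean(f1s), mean(mccs)
```

To use it, map each pathology to its ground-truth labels and to the list of per-checkpoint probability vectors. Then call `mean_metrics(labels, {p: ensemble_probs(c) for p, c in ckpts.items()})`. A reader with only binary predictions, such as the radiologists here, can still call `f1` and `mcc`. `auc` is not applicable in that case, which matches the N/A entry in the table.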