2022 Sep 15;6(12):1399–1406. doi: 10.1038/s41551-022-00936-9

Table 5.

Impact of ensembling on performance

	Mean AUC	Mean F1	Mean MCC
Radiologists (mean)	N/A*	0.619	0.530
Best single model	0.878	0.563 (0.527, 0.598)	0.473 (0.434, 0.510)
Ensemble model	0.889	0.606 (0.571, 0.638)	0.523 (0.486, 0.561)

Comparison between the ensemble over the top ten model checkpoints and the best single model on the CheXpert validation dataset. Results are averaged across the five CheXpert competition pathologies. Numbers in parentheses indicate 95% CIs. *The mean AUC of the radiologists is not available (N/A) because binary radiologist predictions correspond to a single point on the receiver operating characteristic (ROC) curve, so an area under the curve cannot be computed.
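
To make the comparison concrete, the following is a minimal sketch of how a checkpoint ensemble and the F1 and MCC metrics could be computed for a single pathology. It is an illustration under stated assumptions, not the authors' implementation: the caption does not say how checkpoint outputs are combined, so averaging predicted probabilities is assumed here, as is the 0.5 decision threshold. The toy labels and probabilities and the helper names (`ensemble_probs`, `confusion`, `f1_score`, `mcc`) are hypothetical.

```python
from math import sqrt


def ensemble_probs(checkpoint_probs):
    """Average per-sample probabilities across checkpoints (assumed combination rule)."""
    n_models = len(checkpoint_probs)
    return [sum(column) / n_models for column in zip(*checkpoint_probs)]


def confusion(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(tp, fp, fn, tn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; defined as 0 when the denominator is 0."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def binarize(probs, threshold=0.5):
    """Turn probabilities into binary predictions (threshold is an assumption)."""
    return [1 if p >= threshold else 0 for p in probs]


# Hypothetical toy data: six studies, three checkpoints, one pathology.
y_true = [1, 0, 1, 0, 1, 0]
checkpoint_probs = [
    [0.9, 0.2, 0.4, 0.6, 0.7, 0.1],
    [0.8, 0.4, 0.7, 0.3, 0.4, 0.2],
    [0.7, 0.3, 0.6, 0.3, 0.7, 0.3],
]

single_pred = binarize(checkpoint_probs[0])
ensemble_pred = binarize(ensemble_probs(checkpoint_probs))

single_f1 = f1_score(*confusion(y_true, single_pred))
single_mcc = mcc(*confusion(y_true, single_pred))
ensemble_f1 = f1_score(*confusion(y_true, ensemble_pred))
ensemble_mcc = mcc(*confusion(y_true, ensemble_pred))

print(f"single:   F1={single_f1:.3f}  MCC={single_mcc:.3f}")
print(f"ensemble: F1={ensemble_f1:.3f}  MCC={ensemble_mcc:.3f}")
```

In this toy case each checkpoint makes different mistakes, and averaging cancels them out, so the ensemble scores higher than the first checkpoint on both metrics. The sketch also shows why the radiologists have no AUC: F1 and MCC need only binary predictions, whereas an ROC curve requires a continuous score that can be thresholded at many operating points.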