Table 2.
Model scores across the testing datasets, including achieved AUROCs (with 95% confidence intervals) and qualitative evaluation scores.
Team name | Team institution(s) | Hold-out AUROC (95% CI) | Two-site AUROC (95% CI) | AUROC difference | Mean AUROC | Qualitative score |
---|---|---|---|---|---|---|
Convalesco | University of Chicago | 0.879 (0.873, 0.884) | 0.911 (0.907, 0.915) | 0.032 | 0.895 | 8.29 |
GAIL | Geisinger | 0.889 (0.884, 0.894) | 0.805 (0.799, 0.812) | −0.084 | 0.847 | 7.52 |
UC Berkeley Center for Targeted Machine Learning | UC Berkeley | 0.864 (0.858, 0.874) | 0.859 (0.854, 0.865) | −0.005 | 0.862 | 7.39 |
UW-Madison-BMI | University of Wisconsin–Madison | 0.886 (0.880, 0.893) | 0.841 (0.835, 0.846) | −0.045 | 0.864 | 6.84 |
Ruvos | Ruvos | 0.851 (0.832, 0.844) | 0.838 (0.832, 0.844) | −0.013 | 0.844 | 6.77 |
Anonymous Group 1 | — | 0.884 (0.877, 0.891) | 0.835 (0.829, 0.841) | −0.05 | 0.86 | 5.78 |
Anonymous Group 2 | — | 0.853 (0.846, 0.860) | 0.824 (0.816, 0.830) | −0.029 | 0.839 | 5.57 |
Penn | Penn | 0.889 (0.883, 0.895) | 0.841 (0.834, 0.847) | −0.048 | 0.865 | 5.37 |
Anonymous Group 4 | — | 0.905 (0.900, 0.910) | 0.836 (0.830, 0.841) | −0.07 | 0.87 | 4.80 |
Anonymous Group 5 | — | 0.837 (0.832, 0.846) | 0.836 (0.830, 0.842) | −0.001 | 0.836 | 4.69 |
Models not explicitly named have been masked as anonymous groups. The final rankings were based on the qualitative scores, which combined aspects of reproducibility, interpretability, and translational feasibility (see Supplemental Materials for details).
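The two derived columns follow directly from the reported AUROCs: the difference is the two-site AUROC minus the hold-out AUROC, and the mean is their average. A minimal sketch of that arithmetic (an inference from the reported values, not the challenge organizers' code; the function name is illustrative):

```python
def derived_columns(holdout_auroc: float, twosite_auroc: float) -> tuple[float, float]:
    """Return (AUROC difference, mean AUROC), each rounded to 3 decimal places."""
    diff = round(twosite_auroc - holdout_auroc, 3)
    mean = round((holdout_auroc + twosite_auroc) / 2, 3)
    return diff, mean

# Check against the Convalesco row: hold-out 0.879, two-site 0.911.
print(derived_columns(0.879, 0.911))  # (0.032, 0.895)
```

A positive difference (as for Convalesco) indicates the model generalized better to the two-site dataset than to the hold-out set; the negative differences for most other teams indicate the reverse.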