Table 2.
Model scores across the testing datasets, including achieved AUROCs (with 95% confidence intervals) and qualitative evaluation scores.
Team name | Team institution(s) | Hold-out AUROC (95% CI) | Two-site AUROC (95% CI) | AUROC difference | Mean AUROC | Qualitative score |
---|---|---|---|---|---|---|
Convalesco | University of Chicago | 0.879 (0.873, 0.884) | 0.911 (0.907, 0.915) | 0.032 | 0.895 | 8.29 |
GAIL | Geisinger | 0.889 (0.884, 0.894) | 0.805 (0.799, 0.812) | −0.084 | 0.847 | 7.52 |
UC Berkeley Center for Targeted Machine Learning | UC Berkeley | 0.864 (0.858, 0.874) | 0.859 (0.854, 0.865) | −0.005 | 0.862 | 7.39 |
UW-Madison-BMI | University of Wisconsin–Madison | 0.886 (0.880, 0.893) | 0.841 (0.835, 0.846) | −0.045 | 0.864 | 6.84 |
Ruvos | Ruvos | 0.851 (0.832, 0.844) | 0.838 (0.832, 0.844) | −0.013 | 0.844 | 6.77 |
Anonymous Group 1 | — | 0.884 (0.877, 0.891) | 0.835 (0.829, 0.841) | −0.05 | 0.86 | 5.78 |
Anonymous Group 2 | — | 0.853 (0.846, 0.860) | 0.824 (0.816, 0.830) | −0.029 | 0.839 | 5.57 |
Penn | Penn | 0.889 (0.883, 0.895) | 0.841 (0.834, 0.847) | −0.048 | 0.865 | 5.37 |
Anonymous Group 4 | — | 0.905 (0.900, 0.910) | 0.836 (0.830, 0.841) | −0.07 | 0.87 | 4.80 |
Anonymous Group 5 | — | 0.837 (0.832, 0.846) | 0.836 (0.830, 0.842) | −0.001 | 0.836 | 4.69 |
Models not explicitly named have been masked as anonymous groups. The final rankings were based on the qualitative scores, which combined aspects of reproducibility, interpretability, and translational feasibility (see Supplemental Materials for details).
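The two derived columns follow directly from the reported AUROCs: the difference is the two-site AUROC minus the hold-out AUROC, and the mean is their average. A minimal sketch of that arithmetic (an inference from the reported values, not the challenge organizers' code; the function name is illustrative):

```python
def derived_columns(holdout_auroc: float, twosite_auroc: float) -> tuple[float, float]:
    """Return (AUROC difference, mean AUROC), each rounded to 3 decimal places."""
    diff = round(twosite_auroc - holdout_auroc, 3)
    mean = round((holdout_auroc + twosite_auroc) / 2, 3)
    return diff, mean

# Check against the Convalesco row: hold-out 0.879, two-site 0.911.
print(derived_columns(0.879, 0.911))  # (0.032, 0.895)
```

A positive difference (as for Convalesco) indicates the model generalized better to the two-site dataset than to the hold-out set; the negative differences for most other teams indicate the reverse.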