. 2023 Aug 23;12(17):17856–17865. doi: 10.1002/cam4.6418

TABLE 2.

Deep learning model performance: mean metrics (95% CI).

Performance metric	Hospital 1: Creatinine	Hospital 1: Bilirubin	Hospital 2: Creatinine	Hospital 2: Bilirubin
AUROC	0.99 (0.98, 1.00)	0.97 (0.95, 0.99)	0.76 (0.70, 0.82)	0.72 (0.68, 0.76)
F1 score	0.99 (0.98, 1.00)	0.66 (0.65, 0.67)	0.59 (0.54, 0.64)	0.24 (0.14, 0.33)
Sensitivity	0.99 (0.98, 1.00)	0.99 (0.98, 0.99)	0.60 (0.55, 0.64)	0.54 (0.52, 0.56)
Specificity	0.99 (0.98, 1.00)	0.91 (0.89, 0.93)	0.98 (0.96, 0.99)	0.90 (0.87, 0.94)
PPV	0.99 (0.98, 1.00)	0.50 (0.50, 0.50)	0.59 (0.47, 0.71)	0.24 (0.23, 0.25)
NPV	0.99 (0.98, 1.00)	0.99 (0.99, 0.99)	0.99 (0.98, 1.00)	0.92 (0.91, 0.94)
Cohen's kappa	0.99 (0.98, 1.00)	0.65 (0.60, 0.69)	0.57 (0.50, 0.64)	0.20 (0.10, 0.31)
FNR	0.01 (0.00, 0.02)	0.00 (0.00, 0.00)	0.31 (0.27, 0.36)	0.37 (0.28, 0.45)

Abbreviations: AUROC, area under the receiver operating characteristic curve; CI, confidence interval; FNR, false negative rate; NPV, negative predictive value; PPV, positive predictive value.
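The metrics in Table 2 follow standard binary-classification definitions. As a minimal sketch (not the authors' code; the excerpt does not describe their decision thresholds or how the confidence intervals were obtained), the threshold-based metrics can be computed from confusion-matrix counts, and AUROC from raw model scores as the probability that a randomly chosen positive case is scored above a randomly chosen negative one. Under these definitions FNR equals 1 − sensitivity, and F1 is the harmonic mean of PPV and sensitivity. The names `ConfusionCounts`, `threshold_metrics`, and `auroc`, and all counts and scores in the usage lines, are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ConfusionCounts:
    """Hypothetical container for binary confusion-matrix counts."""
    tp: int
    fp: int
    tn: int
    fn: int


def threshold_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Standard metrics at a fixed decision threshold (illustrative only)."""
    n = c.tp + c.fp + c.tn + c.fn
    sensitivity = c.tp / (c.tp + c.fn)  # true positive rate (recall)
    specificity = c.tn / (c.tn + c.fp)  # true negative rate
    ppv = c.tp / (c.tp + c.fp)          # positive predictive value (precision)
    npv = c.tn / (c.tn + c.fn)          # negative predictive value
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    fnr = c.fn / (c.tp + c.fn)          # false negative rate, equal to 1 - sensitivity

    # Cohen's kappa: observed agreement corrected for agreement expected by chance,
    # using the marginal totals of predicted and actual classes.
    p_obs = (c.tp + c.tn) / n
    p_exp = ((c.tp + c.fp) * (c.tp + c.fn) + (c.tn + c.fn) * (c.tn + c.fp)) / n**2
    kappa = (p_obs - p_exp) / (1 - p_exp)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
        "fnr": fnr,
        "kappa": kappa,
    }


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Usage with made-up numbers (not data from this study).
example = threshold_metrics(ConfusionCounts(tp=40, fp=10, tn=45, fn=5))
example_auc = auroc([0.9, 0.8, 0.4, 0.3, 0.2], [1, 0, 1, 0, 0])
for name, value in example.items():
    print(f"{name}: {value:.3f}")
print(f"auroc: {example_auc:.3f}")
```

The pairwise AUROC loop is quadratic in the number of cases and is meant only to make the definition concrete; for real cohorts a rank-based or library implementation would be the usual choice.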