Author manuscript; available in PMC: 2020 Feb 1.

Published in final edited form as: Nature. 2019 Jul 31;572(7767):116–119. doi: 10.1038/s41586-019-1390-1

Extended Data Table 5 | Future and cross-site generalisability experiments.

(a) Model performance when trained on data from before the time point t_p and tested on data from after t_p, reported for the entire future patient population and for subgroups of patients for which the model has or has not seen historical information during training. The model maintains a comparable level of performance on unseen future data, with a higher sensitivity of 59% for a time window of 48 hours ahead of time, at a precision of two false positives per step for each true positive. The ranges are bootstrap pivotal 95% confidence intervals with n=200. Note that this experiment is not a replacement for a prospective evaluation of the model. (b) Cohort statistics for (a), shown for both before and after the temporal split t_p that was used to simulate model performance on future data. (c) Comparison of model performance when applied to data from previously unseen hospital sites. Data was split across sites so that 80% of the data was in group A and 20% in group B. No site from group B was present in group A and vice versa. Within each group, the data was split into training, validation, calibration and test sets in the same way as in the other experiments. The table reports the performance of the model trained on site group A when evaluated on the test set within site group A versus the test set within site group B, for predicting all AKI severities up to 48 hours ahead of time. Comparable performance is seen across all key metrics. The 95% bootstrap pivotal confidence intervals are calculated using n=200 bootstrap samples. Note that the model would still need to be retrained to generalise outside of the VA population to a different demographic and a different set of clinical pathways and hospital processes elsewhere.
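The bootstrap pivotal intervals mentioned in the caption can be illustrated with a minimal sketch. It assumes the standard pivotal (basic) bootstrap, in which the interval is [2θ − q_hi, 2θ − q_lo] for point estimate θ and bootstrap quantiles q_lo and q_hi, and it resamples individual examples with replacement. The resampling unit, the function names and the toy data are all assumptions made for illustration; they are not taken from the paper.

```python
import random

def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def sensitivity(labels, preds):
    """Fraction of positive labels that were predicted positive."""
    true_pos = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    positives = sum(labels)
    return true_pos / positives if positives else float("nan")

def pivotal_bootstrap_ci(labels, preds, metric, n_boot=200, alpha=0.05, seed=0):
    """Pivotal (basic) bootstrap CI for metric(labels, preds).

    Assumption: examples are resampled independently; the paper does not
    state the resampling unit, so this is illustrative only.
    """
    rng = random.Random(seed)
    n = len(labels)
    point = metric(labels, preds)
    replicates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        replicates.append(metric([labels[i] for i in idx],
                                 [preds[i] for i in idx]))
    replicates.sort()
    # Reflect the bootstrap quantiles around the point estimate.
    lower = 2 * point - quantile(replicates, 1 - alpha / 2)
    upper = 2 * point - quantile(replicates, alpha / 2)
    return point, lower, upper

# Hypothetical toy data: 200 positives (120 detected) and 200 negatives.
toy_labels = [1] * 200 + [0] * 200
toy_preds = [1] * 120 + [0] * 80 + [0] * 200
point, lower, upper = pivotal_bootstrap_ci(toy_labels, toy_preds, sensitivity)
print(f"sensitivity {point:.3f} [{lower:.3f}, {upper:.3f}]")
```

Unlike the percentile bootstrap, the pivotal interval reflects the bootstrap quantiles around the point estimate, so the two can differ noticeably when the bootstrap distribution is skewed.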

a

	Patient cohorts
Metric [95% CI]	Before t_p (test)	New admissions after t_p (test)	Subsequent admissions after t_p	All patients after t_p
Sensitivity (AKI episode)	55.09 [54.01, 56.06]	59 [57.11, 60.71]	59.04 [58.38, 59.63]	58.97 [58.33, 59.52]
ROC AUC	92.25 [92.01, 92.42]	90.19 [89.76, 90.77]	89.98 [89.83, 90.17]	89.98 [89.81, 90.14]
PRAUC	29.97 [28.61, 31.15]	30.75 [28.65, 32.81]	31.54 [30.87, 32.30]	31.28 [30.44, 32.02]
Sensitivity (step)	34.26 [33.17, 35.28]	36.87 [35.2, 38.85]	37.23 [36.67, 37.88]	37.08 [36.40, 37.65]
Specificity (step)	98.55 [98.50, 98.60]	97.66 [97.54, 97.76]	97.63 [97.58, 97.68]	97.64 [97.59, 97.68]
Precision	32.51 [31.44, 33.21]	32.66 [31.2, 34.03]	32.97 [32.52, 33.47]	32.84 [32.28, 33.33]
b

		Before t_p		After t_p

Patients

Number of patients		599,871		246,406
Average age*		61.3		64.2

Admissions within a given period

Unique admissions		2,134,544		364,778
ICU admissions		226,585 (10.62%)		40,102 (10.99%)
Medical admissions		1,040,923 (48.77%)		170,383 (46.71%)
Surgical admissions		373,823 (17.51%)		67,617 (18.54%)
No creatinine measured		458,486 (21.48%)		52,115 (14.29%)
Any Chronic Kidney Disease		774,883 (36.30%)		156,181 (42.82%)
Any AKI present		282,398 (13.23%)		41,950 (14.59%)
c

Metric [95% CI]		Site group A		Site group B

Sensitivity (AKI episode)		55.6% [54.5, 56.6]		54.6% [52.8, 56.3]
ROC AUC		91.8% [91.6, 92.1]		91.3% [90.8, 91.7]
PRAUC		30.0% [28.6, 31.2]		30.6% [28.3, 32.7]
Sensitivity (step)		34.3% [33.1, 35.2]		34.7% [32.6, 36.2]
Specificity (step)		98.5% [98.4, 98.5]		98.3% [98.2, 98.4]
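The site-disjoint split described for panel (c) can be sketched as follows. The caption states only that about 80% of the data fell in group A, about 20% in group B, and that no site appears in both groups; the greedy assignment of whole sites below is an assumption chosen for illustration, and the site names and record counts are hypothetical.

```python
import random

def split_sites(site_sizes, frac_a=0.8, seed=0):
    """Assign whole sites to group A or B so that no site is shared.

    Sites are visited in a seeded random order and added to group A as long
    as group A stays within frac_a of all records; the rest go to group B.
    Assumption: this greedy rule is illustrative, not the paper's procedure.
    """
    rng = random.Random(seed)
    sites = sorted(site_sizes)  # sort first so the shuffle is reproducible
    rng.shuffle(sites)
    total = sum(site_sizes.values())
    group_a, group_b = set(), set()
    records_in_a = 0
    for site in sites:
        if records_in_a + site_sizes[site] <= frac_a * total:
            group_a.add(site)
            records_in_a += site_sizes[site]
        else:
            group_b.add(site)
    return group_a, group_b

# Hypothetical example: ten sites with 100 records each.
toy_sites = {f"site_{i:02d}": 100 for i in range(10)}
group_a, group_b = split_sites(toy_sites)
share_a = sum(toy_sites[s] for s in group_a) / sum(toy_sites.values())
print(sorted(group_a), sorted(group_b), share_a)
```

Splitting at the level of sites rather than individual records is what allows the group B test set to stand in for hospitals the model has never seen.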