Author manuscript; available in PMC: 2020 Feb 1.
Published in final edited form as: Nature. 2019 Jul 31;572(7767):116–119. doi: 10.1038/s41586-019-1390-1

Extended Data Table 5 | Future and cross-site generalisability experiments.

(a) Model performance when trained before the time point tP and tested after tP, both on the entire future patient population and on subgroups of patients for which the model has or hasn’t seen historical information during training. The model maintains a comparable level of performance on unseen future data, with a higher sensitivity of 59% for a time window of 48 hours ahead of time and a precision of two false positives per step for each true positive. The ranges correspond to bootstrap pivotal 95% confidence intervals with n=200. Note that this experiment is not a replacement for a prospective evaluation of the model. (b) Cohort statistics for (a), shown for both before and after the temporal split tP that was used to simulate model performance on future data. (c) Comparison of model performance when applied to data from previously unseen hospital sites. Data was split across sites so that 80% of the data was in group A and 20% in group B. No site from group B was present in group A and vice versa. The data was split into training, validation, calibration and test sets in the same way as in the other experiments. The table reports the performance of the model trained on site group A when evaluated on the test set within site group A versus the test set within site group B, for predicting all AKI severities up to 48 hours ahead of time. Comparable performance is seen across all key metrics. 95% bootstrap pivotal confidence intervals are calculated using n=200 bootstrap samples. Note that the model would still need to be retrained to generalise outside of the VA population to a different demographic and a different set of clinical pathways and hospital processes elsewhere.
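The pivotal bootstrap intervals mentioned in the caption can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `pivotal_bootstrap_ci`, the toy data, and the choice to resample individual predictions (rather than, for example, patients) are assumptions. Only the 95% level and the n=200 resamples are taken from the caption.

```python
import numpy as np

def pivotal_bootstrap_ci(values, statistic, n_boot=200, alpha=0.05, seed=0):
    """Pivotal (basic) bootstrap interval for statistic(values).

    The interval is [2*theta - q_hi, 2*theta - q_lo], where theta is the
    estimate on the full sample and q_lo, q_hi are the alpha/2 and
    1 - alpha/2 quantiles of the bootstrap replicates.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    theta_hat = statistic(values)
    n = len(values)
    replicates = np.empty(n_boot)
    for b in range(n_boot):
        # Resample with replacement (assumed resampling unit: one prediction).
        idx = rng.integers(0, n, size=n)
        replicates[b] = statistic(values[idx])
    q_lo, q_hi = np.quantile(replicates, [alpha / 2, 1 - alpha / 2])
    return theta_hat, 2 * theta_hat - q_hi, 2 * theta_hat - q_lo

# Toy example (hypothetical data): 59 detected episodes out of 100.
hits = np.array([1] * 59 + [0] * 41)
estimate, lower, upper = pivotal_bootstrap_ci(hits, np.mean)
print(f"sensitivity {100 * estimate:.2f} [{100 * lower:.2f}, {100 * upper:.2f}]")
```

Unlike the percentile interval, the pivotal interval reflects the bootstrap quantiles around the point estimate, so it need not be symmetric about the estimate.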

a

Patient cohorts

| Metric [95% CI] | Before tP (test) | New admissions after tP (test) | Subsequent admissions after tP | All patients after tP |
|---|---|---|---|---|
| Sensitivity (AKI episode) | 55.09 [54.01, 56.06] | 59 [57.11, 60.71] | 59.04 [58.38, 59.63] | 58.97 [58.33, 59.52] |
| ROC AUC | 92.25 [92.01, 92.42] | 90.19 [89.76, 90.77] | 89.98 [89.83, 90.17] | 89.98 [89.81, 90.14] |
| PRAUC | 29.97 [28.61, 31.15] | 30.75 [28.65, 32.81] | 31.54 [30.87, 32.30] | 31.28 [30.44, 32.02] |
| Sensitivity (step) | 34.26 [33.17, 35.28] | 36.87 [35.2, 38.85] | 37.23 [36.67, 37.88] | 37.08 [36.40, 37.65] |
| Specificity (step) | 98.55 [98.50, 98.60] | 97.66 [97.54, 97.76] | 97.63 [97.58, 97.68] | 97.64 [97.59, 97.68] |
| Precision | 32.51 [31.44, 33.21] | 32.66 [31.2, 34.03] | 32.97 [32.52, 33.47] | 32.84 [32.28, 33.33] |
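The caption's operating point of two false positives per true positive corresponds to a precision of roughly one third, which can be compared with the Precision row of panel a. A short worked version, with TP and FP denoting step-level true and false positive counts:

```latex
\text{precision} = \frac{TP}{TP + FP} = \frac{1}{1 + FP/TP}
\qquad\Rightarrow\qquad
\frac{FP}{TP} = 2 \;\Rightarrow\; \text{precision} = \tfrac{1}{3} \approx 33.3\%

\text{Conversely, for the reported } 32.84\%:\quad
\frac{FP}{TP} = \frac{1 - 0.3284}{0.3284} \approx 2.05
```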
b

| | Before tP | After tP |
|---|---|---|
| Patients | | |
| Number of patients | 599,871 | 246,406 |
| Average age* | 61.3 | 64.2 |
| Admissions within a given period | | |
| Unique admissions | 2,134,544 | 364,778 |
| ICU admissions | 226,585 (10.62%) | 40,102 (10.99%) |
| Medical admissions | 1,040,923 (48.77%) | 170,383 (46.71%) |
| Surgical admissions | 373,823 (17.51%) | 67,617 (18.54%) |
| No creatinine measured | 458,486 (21.48%) | 52,115 (14.29%) |
| Any Chronic Kidney Disease | 774,883 (36.30%) | 156,181 (42.82%)  |
| Any AKI present | 282,398 (13.23%) | 41,950 (14.59%) |
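As a quick arithmetic check on panel b, the percentages appear to be taken over the unique admissions in each period. The sketch below recomputes the "After tP" column under that assumption; the denominator choice and all variable names are illustrative and are not stated in the table itself.

```python
# Counts and printed percentages from the "After tP" column of panel b.
UNIQUE_ADMISSIONS_AFTER = 364_778
rows_after = {
    "ICU admissions": (40_102, 10.99),
    "Medical admissions": (170_383, 46.71),
    "Surgical admissions": (67_617, 18.54),
    "No creatinine measured": (52_115, 14.29),
    "Any Chronic Kidney Disease": (156_181, 42.82),
    "Any AKI present": (41_950, 14.59),
}

def percent_of(count, total):
    """Share of `total` represented by `count`, in percent, to 2 decimals."""
    return round(100 * count / total, 2)

# Rows whose recomputed share differs from the printed one by more than rounding.
mismatches = {
    name: percent_of(count, UNIQUE_ADMISSIONS_AFTER)
    for name, (count, printed) in rows_after.items()
    if abs(percent_of(count, UNIQUE_ADMISSIONS_AFTER) - printed) > 0.015
}
print(mismatches)
```

Under this assumption every row reproduces its printed percentage except "Any AKI present", where 41,950 / 364,778 comes to about 11.50% rather than the printed 14.59%. The table alone does not show whether the count or the percentage is the affected value.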
c

| Metric [95% CI] | Site group A | Site group B |
|---|---|---|
| Sensitivity (AKI episode) | 55.6% [54.5, 56.6] | 54.6% [52.8, 56.3] |
| ROC AUC | 91.8% [91.6, 92.1] | 91.3% [90.8, 91.7] |
| PRAUC | 30.0% [28.6, 31.2] | 30.6% [28.3, 32.7] |
| Sensitivity (step) | 34.3% [33.1, 35.2] | 34.7% [32.6, 36.2] |
| Specificity (step) | 98.5% [98.4, 98.5] | 98.3% [98.2, 98.4] |
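The site-disjoint split described in the caption for panel c (about 80% of the data in group A, 20% in group B, no site shared between groups) could be produced along these lines. This is a sketch under stated assumptions: the greedy assignment, the function name `split_sites`, and the toy site labels are hypothetical, and the caption does not say how the authors chose which sites went into each group.

```python
import random
from collections import Counter

def split_sites(site_of_record, frac_a=0.8, seed=0):
    """Assign whole sites to group A or B so that no site is in both.

    Sites are visited in a seeded random order; a site joins group A only
    if group A would still hold at most `frac_a` of all records.
    """
    counts = Counter(site_of_record)
    sites = sorted(counts)
    random.Random(seed).shuffle(sites)
    total = sum(counts.values())
    group_a, records_in_a = set(), 0
    for site in sites:
        if records_in_a + counts[site] <= frac_a * total:
            group_a.add(site)
            records_in_a += counts[site]
    group_b = set(sites) - group_a
    return group_a, group_b, records_in_a / total

# Toy example (hypothetical): 10 sites with 100 records each.
records = [f"site_{i % 10}" for i in range(1000)]
group_a, group_b, share_a = split_sites(records)
print(len(group_a), len(group_b), share_a)
```

Splitting by site rather than by record is what keeps group B genuinely unseen: every record from a group B site is held out together, so evaluation on group B cannot benefit from site-specific patterns learned during training.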