Extended Data Table 5 | Future and cross-site generalisability experiments.
(a) Model performance when trained on data before the time point tP and tested on data after tP, both on the entire future patient population and on subgroups of patients for which the model has or hasn’t seen historical information during training. The model maintains a comparable level of performance on unseen future data, with a higher sensitivity of 59% for a time window of 48 hours ahead of time at a precision of two false positives per step for each true positive. The ranges are bootstrap pivotal 95% confidence intervals calculated using n=200 bootstrap samples. Note that this experiment is not a replacement for a prospective evaluation of the model. (b) Cohort statistics for (a), shown both before and after the temporal split tP that was used to simulate model performance on future data. (c) Comparison of model performance when applied to data from previously unseen hospital sites. Data was split across sites so that 80% of the data was in group A and 20% in group B. No site from group B was present in group A and vice versa. The data was split into training, validation, calibration and test sets in the same way as in the other experiments. The table reports the performance of a model trained on site group A when evaluated on the test set within site group A versus the test set within site group B for predicting all AKI severities up to 48 hours ahead of time. Comparable performance is seen across all key metrics. Bootstrap pivotal 95% confidence intervals are calculated using n=200 bootstrap samples. Note that the model would still need to be retrained to generalise outside of the VA population to a different demographic and a different set of clinical pathways and hospital processes elsewhere.
a. Patient cohorts

Metric [95% CI] | Before tp (test) | New admissions after tp (test) | Subsequent admissions after tp | All patients after tp |
---|---|---|---|---|
Sensitivity (AKI episode) | 55.09 [54.01, 56.06] | 59 [57.11, 60.71] | 59.04 [58.38, 59.63] | 58.97 [58.33, 59.52] |
ROC AUC | 92.25 [92.01, 92.42] | 90.19 [89.76, 90.77] | 89.98 [89.83, 90.17] | 89.98 [89.81, 90.14] |
PRAUC | 29.97 [28.61, 31.15] | 30.75 [28.65, 32.81] | 31.54 [30.87, 32.30] | 31.28 [30.44, 32.02] |
Sensitivity (step) | 34.26 [33.17, 35.28] | 36.87 [35.2, 38.85] | 37.23 [36.67, 37.88] | 37.08 [36.40, 37.65] |
Specificity (step) | 98.55 [98.50, 98.60] | 97.66 [97.54, 97.76] | 97.63 [97.58, 97.68] | 97.64 [97.59, 97.68] |
Precision | 32.51 [31.44, 33.21] | 32.66 [31.2, 34.03] | 32.97 [32.52, 33.47] | 32.84 [32.28, 33.33] |
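
The bracketed ranges in panel a are described as bootstrap pivotal 95% confidence intervals with n=200. A minimal sketch of that kind of interval follows, assuming resampling with replacement over independent units. The function names `pivotal_bootstrap_ci`, `quantile` and `mean`, the synthetic data and the resampling unit are illustrative assumptions, not the authors' implementation.

```python
import random


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def mean(xs):
    return sum(xs) / len(xs)


def pivotal_bootstrap_ci(values, statistic, n_boot=200, alpha=0.05, seed=0):
    """Return (estimate, lower, upper) for a pivotal (basic) bootstrap interval.

    The pivotal interval reflects the bootstrap quantiles around the point
    estimate: [2 * theta_hat - q_hi, 2 * theta_hat - q_lo].
    """
    rng = random.Random(seed)
    theta_hat = statistic(values)
    n = len(values)
    # Recompute the statistic on n_boot resamples drawn with replacement.
    boot = sorted(statistic(rng.choices(values, k=n)) for _ in range(n_boot))
    q_lo = quantile(boot, alpha / 2)
    q_hi = quantile(boot, 1 - alpha / 2)
    return theta_hat, 2 * theta_hat - q_hi, 2 * theta_hat - q_lo


# Synthetic example (not study data): 1 = episode detected, 0 = missed.
data_rng = random.Random(1)
detected = [1 if data_rng.random() < 0.55 else 0 for _ in range(2000)]
estimate, lower, upper = pivotal_bootstrap_ci(detected, mean)
print(f"sensitivity {100 * estimate:.2f} [{100 * lower:.2f}, {100 * upper:.2f}]")
```

Unlike a percentile interval, the pivotal interval need not be symmetric around the estimate in the same direction as the bootstrap distribution, which is why the two methods can give slightly different bounds for skewed metrics.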
b

 | Before tp | After tp |
---|---|---|
Patients | | |
Number of patients | 599,871 | 246,406 |
Average age* | 61.3 | 64.2 |
Admissions within a given period | | |
Unique admissions | 2,134,544 | 364,778 |
ICU admissions | 226,585 (10.62%) | 40,102 (10.99%) |
Medical admissions | 1,040,923 (48.77%) | 170,383 (46.71%) |
Surgical admissions | 373,823 (17.51%) | 67,617 (18.54%) |
No creatinine measured | 458,486 (21.48%) | 52,115 (14.29%) |
Any Chronic Kidney Disease | 774,883 (36.30%) | 156,181 (42.82%) |
Any AKI present | 282,398 (13.23%) | 41,950 (14.59%) |
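
The temporal split behind panels a and b can be sketched as follows. This is a simplified illustration that assumes a patient counts as "seen" if they have any admission before tp; the caption defines the subgroups by whether the model saw the patient's history during training, which may be a narrower criterion. The function name `split_by_time_point`, the record layout and the dates are hypothetical.

```python
from datetime import date


def split_by_time_point(admissions, t_p):
    """Partition (patient_id, admission_date) records around the time point t_p.

    Returns three lists: admissions before t_p, admissions after t_p for
    patients with no earlier admission ("new"), and admissions after t_p for
    patients who also appear before t_p ("subsequent").
    """
    before = [(pid, day) for pid, day in admissions if day < t_p]
    # Assumption: any admission before t_p means the patient's history was seen.
    seen_patients = {pid for pid, _ in before}
    new, subsequent = [], []
    for pid, day in admissions:
        if day < t_p:
            continue
        if pid in seen_patients:
            subsequent.append((pid, day))
        else:
            new.append((pid, day))
    return before, new, subsequent


# Toy records with made-up identifiers and dates.
toy_admissions = [
    ("p1", date(2015, 1, 1)),
    ("p1", date(2017, 3, 1)),
    ("p2", date(2017, 5, 1)),
    ("p3", date(2014, 6, 1)),
]
before, new, subsequent = split_by_time_point(toy_admissions, date(2016, 1, 1))
print(len(before), len(new), len(subsequent))
```

The "All patients after tp" column then corresponds to the union of the `new` and `subsequent` groups.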
c

Metric [95% CI] | Site group A | Site group B |
---|---|---|
Sensitivity (AKI episode) | 55.6% [54.5, 56.6] | 54.6% [52.8, 56.3] |
ROC AUC | 91.8% [91.6, 92.1] | 91.3% [90.8, 91.7] |
PRAUC | 30.0% [28.6, 31.2] | 30.6% [28.3, 32.7] |
Sensitivity (step) | 34.3% [33.1, 35.2] | 34.7% [32.6, 36.2] |
Specificity (step) | 98.5% [98.4, 98.5] | 98.3% [98.2, 98.4] |
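
The site-level split in panel c requires that no hospital site appears in both groups while group A holds roughly 80% of the data. One way this could be done is sketched below, assuming a greedy assignment over randomly ordered sites. The caption does not say how sites were actually assigned, so the procedure, the function name `split_sites` and the site counts are all hypothetical.

```python
import random


def split_sites(site_counts, frac_a=0.8, seed=0):
    """Assign whole sites to group A or B so that A holds at most frac_a of the data.

    site_counts maps a site identifier to its number of records. Every site
    lands in exactly one group, so the groups are disjoint by construction.
    """
    rng = random.Random(seed)
    sites = sorted(site_counts)
    rng.shuffle(sites)
    budget = frac_a * sum(site_counts.values())
    group_a, group_b, in_a = set(), set(), 0
    for site in sites:
        # Greedy rule: add the site to A only if A stays within its budget.
        if in_a + site_counts[site] <= budget:
            group_a.add(site)
            in_a += site_counts[site]
        else:
            group_b.add(site)
    return group_a, group_b


# Made-up site sizes for illustration only.
toy_counts = {"site1": 300, "site2": 250, "site3": 200, "site4": 150, "site5": 100}
group_a, group_b = split_sites(toy_counts)
share_a = sum(toy_counts[s] for s in group_a) / sum(toy_counts.values())
print(sorted(group_a), sorted(group_b), round(share_a, 2))
```

Because whole sites are assigned, the realised share in group A generally falls slightly below the target rather than matching it exactly.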