Table 2. External Validation of Models Built on Each Participating Site.
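Values are AUCs[a]. Rows give the participating site on which the models were built; columns give the type of model and the test site.

| Participating Site | Baseline Model[b]: UCSF | Baseline Model[b]: MPMC | Baseline Model[b]: BIDMC | Clinical Trajectory–Augmented Model[c]: UCSF | Clinical Trajectory–Augmented Model[c]: MPMC | Clinical Trajectory–Augmented Model[c]: BIDMC | NLP-Augmented Model[d]: UCSF | NLP-Augmented Model[d]: MPMC | NLP-Augmented Model[d]: BIDMC |
|---|---|---|---|---|---|---|---|---|---|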
| UCSF | NA | 0.604 | 0.838 | NA | 0.801 | 0.876 | NA | 0.878 | 0.897 |
| MPMC | 0.781 | NA | 0.714 | 0.823 | NA | 0.803 | 0.894 | NA | 0.854 |
| BIDMC | 0.867 | 0.729 | NA | 0.888 | 0.814 | NA | 0.923 | 0.857 | NA |
Abbreviations: AUC, area under the receiver operating characteristic curve; BIDMC, Beth Israel Deaconess Medical Center; MPMC, Mills-Peninsula Medical Center; NA, not applicable; NLP, natural language processing; UCSF, University of California, San Francisco.
[a] Calculated using nested 10-fold cross-validation. All comparisons of the AUCs for each train and test pair between models (eg, trained on BIDMC, tested at UCSF for model 1 vs model 2: 0.867 vs 0.888) were statistically significant at P < .05.
[b] Uses the highest and lowest of all laboratory values and vital signs.
[c] Adds measures of distribution, variability, and trajectory of laboratory values and vital signs to models already using the highest and lowest values.
[d] Adds NLP to models already using all observed values and measures of distribution, variability, and trajectory of laboratory values and vital signs.
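
The difference between the baseline and clinical trajectory–augmented feature sets described in the footnotes can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of mean, standard deviation, and least-squares slope as the "distribution, variability, and trajectory" measures, and the sample creatinine values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def baseline_features(values):
    """Baseline feature set: highest and lowest observed value of one measure."""
    return {"highest": max(values), "lowest": min(values)}


def trajectory_features(times, values):
    """Baseline features plus illustrative distribution, variability,
    and trajectory measures (assumed here: mean, SD, least-squares slope)."""
    feats = baseline_features(values)
    feats["mean"] = mean(values)      # distribution (assumed measure)
    feats["sd"] = pstdev(values)      # variability (assumed measure)

    # Trajectory as the ordinary least-squares slope of value against time.
    t_bar = mean(times)
    v_bar = mean(values)
    denom = sum((t - t_bar) ** 2 for t in times)
    numer = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    feats["slope"] = numer / denom if denom else 0.0
    return feats


# Hypothetical creatinine series (mg/dL) sampled every 6 hours.
hours = [0, 6, 12, 18]
creatinine = [1.0, 1.2, 1.5, 1.9]

print(baseline_features(creatinine))
print(trajectory_features(hours, creatinine))
```

Under these assumptions, two patients with the same highest and lowest values but different slopes would be indistinguishable to the baseline feature set and distinguishable to the augmented one.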