Table 2.
Performance of the generalizable model at the development and external validation sites.
Site | Development Site | | | | | | Validation Site | |
---|---|---|---|---|---|---|---|---|
Dataset | Development | | Validation | | Test | | External Validation | |
Encounters (Cases/Controls), N | 10,457 (1,095/9,362) | | 3,486 (365/3,121) | | 4,152 (398/3,754) | | 6,825 (387/6,438) | |
Model | XGB | LR | XGB | LR | XGB | LR | XGB | LR |
Feature Selection | IG | IG | IG | IG | IG | IG | IG | IG |
AUROC | 1.00 | 0.90 | 0.81 | 0.81 | 0.87 | 0.86 | 0.81 | 0.82 |
AUPRC | 0.99 | 0.71 | 0.52 | 0.54 | 0.62 | 0.61 | 0.51 | 0.48 |
PPV | 1.00 | 0.84 | 0.67 | 0.69 | 0.82 | 0.80 | 0.26 | 0.18 |
NPV | 0.99 | 0.94 | 0.93 | 0.93 | 0.94 | 0.94 | 0.97 | 0.98 |
Sensitivity | 0.91 | 0.47 | 0.35 | 0.41 | 0.35 | 0.38 | 0.61 | 0.70 |
Specificity | 1.00 | 0.99 | 0.98 | 0.98 | 0.99 | 0.99 | 0.90 | 0.81 |
F1 score | 0.95 | 0.61 | 0.46 | 0.51 | 0.49 | 0.52 | 0.37 | 0.29 |
F2 score | 0.92 | 0.52 | 0.39 | 0.45 | 0.39 | 0.43 | 0.48 | 0.45 |
F3 score | 0.92 | 0.50 | 0.37 | 0.43 | 0.37 | 0.41 | 0.54 | 0.55 |
F0.5 score | 0.98 | 0.73 | 0.57 | 0.60 | 0.65 | 0.66 | 0.30 | 0.22 |
Abbreviations: AUROC, area under the receiver operating characteristic curve; AUPRC, area under the precision-recall curve; CFS, correlation-based feature selection; IG, information gain; LR, logistic regression; NB, naïve Bayes; NPV, negative predictive value; PPV, positive predictive value; XGB, XGBoost (extreme gradient boosting).
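The F1, F2, F3, and F0.5 rows can be related to the PPV and sensitivity rows through the standard F-beta definition, F_β = (1 + β²) · PPV · Sensitivity / (β² · PPV + Sensitivity). The sketch below is illustrative only: the function name `f_beta` is hypothetical, it is not taken from the authors' code, and it is applied here to the rounded external-validation XGB values, so recomputed scores may differ from the table in the last digit.

```python
def f_beta(ppv: float, sensitivity: float, beta: float) -> float:
    """Standard F-beta score: weighted harmonic mean of PPV (precision)
    and sensitivity (recall). beta > 1 weights sensitivity more heavily;
    beta < 1 weights PPV more heavily."""
    if ppv < 0 or sensitivity < 0 or beta <= 0:
        raise ValueError("ppv and sensitivity must be >= 0 and beta must be > 0")
    denominator = beta ** 2 * ppv + sensitivity
    if denominator == 0:
        # Both PPV and sensitivity are zero; the score is conventionally 0.
        return 0.0
    return (1 + beta ** 2) * ppv * sensitivity / denominator


# Rounded values from the external-validation XGB column of the table.
# Using rounded inputs is an assumption of this sketch; the reported
# scores were presumably computed from unrounded counts.
ppv, sensitivity = 0.26, 0.61

for beta in (1, 2, 3, 0.5):
    print(f"F{beta} score: {f_beta(ppv, sensitivity, beta):.2f}")
```

Because β > 1 favors sensitivity, the external-validation columns, where sensitivity is higher and PPV lower than at the development site, score better on F2 and F3 than on F0.5.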