Table 4.
Discrimination in development and prospective validation sets measured by AUC (95% CIs).
Training/test split | Cross-validation split | Model estimation | Logistic regression with LASSO: development test set† | Logistic regression with LASSO: prospective validation set‡ | Random forest: development test set† | Random forest: prospective validation set‡ |
---|---|---|---|---|---|---|
Visit | Visit | Observed cluster analysis | 0.867 (0.860, 0.873) | 0.849 (0.846, 0.851) | 0.950 (0.946, 0.954) | 0.836 (0.833, 0.838) |
Visit | Person | Observed cluster analysis | 0.862 (0.856, 0.868) | 0.853 (0.850, 0.855) | 0.907 (0.901, 0.912) | 0.853 (0.850, 0.855) |
Person | Person | Observed cluster analysis | 0.854 (0.847, 0.861) | 0.847 (0.845, 0.850) | 0.856 (0.849, 0.862) | 0.847 (0.844, 0.849) |
Person | Person | Within cluster resampling | 0.863 (0.857, 0.869) | 0.854 (0.852, 0.856) | 0.857 (0.851, 0.864) | 0.847 (0.845, 0.849) |
† Development test set includes 531,639 visits (141,968 people, 1,517 unique events) for the visit-level training/test split and 531,930 visits (72,771 people, 841 unique events) for the person-level training/test split.
‡ Prospective validation set includes 4,286,495 visits (660,659 people, 6,678 unique events).
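
The table distinguishes visit-level from person-level training/test splits. The following is a minimal sketch of that distinction, not the authors' code: the function names, the `person_id` and `visit_id` fields, the 0.3 test fraction, and the toy data are all hypothetical and chosen only for illustration.

```python
import random


def visit_level_split(visits, test_fraction=0.3, seed=0):
    """Assign individual visits to train or test at random.

    Visits from the same person can land on both sides of the split.
    """
    rng = random.Random(seed)
    shuffled = visits[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def person_level_split(visits, test_fraction=0.3, seed=0):
    """Assign people to train or test at random.

    All visits from one person stay on the same side of the split.
    """
    rng = random.Random(seed)
    people = sorted({v["person_id"] for v in visits})
    rng.shuffle(people)
    n_test = int(round(len(people) * test_fraction))
    test_people = set(people[:n_test])
    train = [v for v in visits if v["person_id"] not in test_people]
    test = [v for v in visits if v["person_id"] in test_people]
    return train, test


# Hypothetical toy data: 10 people with 3 visits each.
visits = [
    {"person_id": p, "visit_id": f"{p}-{k}"}
    for p in range(10)
    for k in range(3)
]

train, test = person_level_split(visits)

# People appearing on both sides of the person-level split (always empty).
shared = {v["person_id"] for v in train} & {v["person_id"] for v in test}
```

With a person-level split, `shared` is empty by construction, so no individual contributes visits to both the training and test sets. A visit-level split gives no such guarantee.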