Author manuscript; available in PMC: 2022 Oct 1.

Published in final edited form as: Biom J. 2021 May 24;63(7):1375–1388. doi: 10.1002/bimj.202000199

Table 4.

Discrimination in the development and prospective validation sets, measured by the area under the receiver operating characteristic curve (AUC) with 95% confidence intervals (CIs).

Sampling framework			Logistic regression with LASSO		Random forest
Training/test split	Cross-validation split	Model estimation	Development test set†	Prospective validation set‡	Development test set†	Prospective validation set‡
Visit	Visit	Observed cluster analysis	0.867 (0.860, 0.873)	0.849 (0.846, 0.851)	0.950 (0.946, 0.954)	0.836 (0.833, 0.838)
Visit	Person	Observed cluster analysis	0.862 (0.856, 0.868)	0.853 (0.850, 0.855)	0.907 (0.901, 0.912)	0.853 (0.850, 0.855)
Person	Person	Observed cluster analysis	0.854 (0.847, 0.861)	0.847 (0.845, 0.850)	0.856 (0.849, 0.862)	0.847 (0.844, 0.849)
Person	Person	Within cluster resampling	0.863 (0.857, 0.869)	0.854 (0.852, 0.856)	0.857 (0.851, 0.864)	0.847 (0.845, 0.849)

† The development test set includes 531,639 visits (141,968 people, 1,517 unique events) for the visit-level training/test split and 531,930 visits (72,771 people, 841 unique events) for the person-level training/test split.

‡ The prospective validation set includes 4,286,495 visits (660,659 people, 6,678 unique events).
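The sampling frameworks in Table 4 differ in whether individual visits or whole people are assigned to the training and test sets. The Python sketch below illustrates three ingredients under stated assumptions: an empirical AUC (the probability that a randomly chosen visit with an event is scored above a randomly chosen visit without one, ties counted as one half), visit-level versus person-level splitting, and a single within-cluster resampling draw that keeps one visit per person. The data layout (a list of dicts with hypothetical "person" and "event" keys) and all function names are illustrative assumptions, not the authors' code, and the 95% CIs reported in the table are not reproduced here.

```python
import random
from collections import defaultdict


def auc(labels, scores):
    """Empirical AUC: share of (positive, negative) pairs in which the
    positive scores higher; tied pairs count as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:  # O(n_pos * n_neg): fine for a small illustration
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def visit_level_split(visits, test_frac, seed=0):
    """Assign each visit independently, so one person's visits may
    land in both the training and the test set."""
    rng = random.Random(seed)
    train, test = [], []
    for v in visits:
        (test if rng.random() < test_frac else train).append(v)
    return train, test


def person_level_split(visits, test_frac, seed=0):
    """Assign whole people to one side, so no person appears in both sets.
    The hypothetical 'person' key identifies the cluster."""
    people = sorted({v["person"] for v in visits})
    rng = random.Random(seed)
    rng.shuffle(people)
    test_people = set(people[:round(len(people) * test_frac)])
    train = [v for v in visits if v["person"] not in test_people]
    test = [v for v in visits if v["person"] in test_people]
    return train, test


def one_visit_per_person(visits, rng):
    """One within-cluster resampling draw: keep a single randomly chosen
    visit per person. Fitting a model to each draw is omitted here."""
    by_person = defaultdict(list)
    for v in visits:
        by_person[v["person"]].append(v)
    return [rng.choice(by_person[p]) for p in sorted(by_person)]


# Toy usage: 6 hypothetical people with 3 visits each.
visits = [{"person": p, "event": int(p < 2 and k == 0)}
          for p in range(6) for k in range(3)]
train, test = person_level_split(visits, test_frac=0.5, seed=1)
print(auc([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.4]))  # prints 0.875
```

Because a visit-level split can place visits from the same person on both sides, a flexible model may partly learn person-specific patterns that reappear in the test set; this is one plausible reading of why the visit-level random forest shows the largest drop from the development test set (0.950) to the prospective validation set (0.836). In a full within-cluster resampling analysis, a model would typically be fit to each one-visit-per-person draw and the predictions averaged across draws.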