2020 Feb 4;10:1776. doi: 10.1038/s41598-020-58601-7

Table 3.

AUROC measures for each prediction model’s best parameterization. We applied one reference model and three register-based models to fifteen years of health register data, comprising hospital diagnoses, hospital procedures, drug prescriptions and interactions with primary care contractors, to predict the five-year risk of five T2D comorbidities. For each comorbidity, prediction was performed on a T2D population free of that comorbidity at the date of prediction (the date of the individual’s first T2D diagnosis). The reference model was a logistic ridge regression based on canonical features: age, sex, country or region of birth and date of first T2D diagnosis, as well as their interactions. The register-based models were logistic ridge regression, random forest and gradient boosting, based on the canonical features as well as hospital diagnoses, hospital procedures, drug prescriptions and interactions with primary care extracted from Danish health registers. Incidences are the proportions of cases within each comorbidity’s sub-population at the end of the prediction horizon. Value ranges in brackets represent 95% confidence intervals based on bootstrap sampling. For heart failure, myocardial infarction, cardiovascular disease and chronic kidney disease, the gradient boosting model outperformed the reference models. AUROC, area under receiver operating characteristic curve.

Heart failure (incidence: 0.04)

| Model | AUROC | ΔAUROC_RLR | ΔAUROC_LR | ΔAUROC_RF |
| --- | --- | --- | --- | --- |
| Reference, logistic regression (RLR) | 0.74 (0.72–0.75) | | | |
| Logistic regression (LR) | 0.77 (0.76–0.79) | 0.04 (0.02–0.05) | | |
| Random forest (RF) | 0.77 (0.75–0.78) | 0.03 (−0.01) | −0.01 (−0.02–0.01) | |
| Gradient boosting (GB) | 0.80 (0.78–0.81) | 0.06 (0.05–0.07) | 0.02 (0.01–0.03) | 0.03 (0.02–0.04) |

Myocardial infarction (incidence: 0.02)

| Model | AUROC | ΔAUROC_RLR | ΔAUROC_LR | ΔAUROC_RF |
| --- | --- | --- | --- | --- |
| Reference, logistic regression (RLR) | 0.68 (0.65–0.70) | | | |
| Logistic regression (LR) | 0.70 (0.68–0.73) | 0.03 (0.01–0.04) | | |
| Random forest (RF) | 0.67 (0.64–0.69) | −0.01 (−0.03–0.01) | −0.04 (−0.06–−0.02) | |
| Gradient boosting (GB) | 0.71 (0.69–0.73) | 0.03 (0.02–0.05) | 0.01 (0.00–0.02) | 0.04 (0.03–0.06) |

Stroke (incidence: 0.03)

| Model | AUROC | ΔAUROC_RLR | ΔAUROC_LR | ΔAUROC_RF |
| --- | --- | --- | --- | --- |
| Reference, logistic regression (RLR) | 0.71 (0.69–0.73) | | | |
| Logistic regression (LR) | 0.72 (0.70–0.74) | 0.01 (0.00–0.01) | | |
| Random forest (RF) | 0.69 (0.67–0.71) | −0.02 (−0.04–−0.01) | −0.03 (−0.04–−0.01) | |
| Gradient boosting (GB) | 0.72 (0.70–0.74) | 0.01 (0.00–0.02) | 0.01 (0.00–0.02) | 0.03 (0.02–0.05) |

Cardiovascular disease (incidence: 0.25)

| Model | AUROC | ΔAUROC_RLR | ΔAUROC_LR | ΔAUROC_RF |
| --- | --- | --- | --- | --- |
| Reference, logistic regression (RLR) | 0.66 (0.64–0.67) | | | |
| Logistic regression (LR) | 0.68 (0.67–0.69) | 0.02 (0.02–0.03) | | |
| Random forest (RF) | 0.68 (0.67–0.69) | 0.02 (0.02–0.03) | 0.00 (0.00–0.01) | |
| Gradient boosting (GB) | 0.69 (0.68–0.70) | 0.04 (0.03–0.05) | 0.02 (0.01–0.02) | 0.01 (0.01–0.02) |

Chronic kidney disease (incidence: 0.03)

| Model | AUROC | ΔAUROC_RLR | ΔAUROC_LR | ΔAUROC_RF |
| --- | --- | --- | --- | --- |
| Reference, logistic regression (RLR) | 0.71 (0.69–0.73) | | | |
| Logistic regression (LR) | 0.74 (0.72–0.76) | 0.04 (0.02–0.05) | | |
| Random forest (RF) | 0.74 (0.72–0.76) | 0.03 (0.01–0.05) | 0.00 (−0.02–0.01) | |
| Gradient boosting (GB) | 0.77 (0.76–0.79) | 0.07 (0.05–0.08) | 0.03 (0.02–0.04) | 0.04 (0.02–0.05) |
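
The caption states that the bracketed ranges are 95% confidence intervals from bootstrap sampling, but it does not give the exact procedure. The following is a minimal sketch of one common way such an interval for an AUROC difference between two models could be obtained. It assumes a paired percentile bootstrap, in which individuals are resampled with replacement and both models are scored on the same resample. The function names, the number of resamples and the toy data are hypothetical and are not taken from the study.

```python
import random


def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic; tied scores get average ranks."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one case and one non-case")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of an already sorted list, q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def bootstrap_delta_auroc(labels, scores_a, scores_b, n_boot=1000, seed=0):
    """Point estimate and 95% percentile interval for AUROC(a) - AUROC(b).

    Assumption: paired resampling of individuals, so both models are
    evaluated on the same bootstrap sample in every iteration.
    """
    point = auroc(labels, scores_a) - auroc(labels, scores_b)
    rng = random.Random(seed)
    n = len(labels)
    deltas = []
    while len(deltas) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if sum(y) in (0, n):
            continue  # skip resamples that contain only one class
        a = [scores_a[i] for i in idx]
        b = [scores_b[i] for i in idx]
        deltas.append(auroc(y, a) - auroc(y, b))
    deltas.sort()
    return point, percentile(deltas, 0.025), percentile(deltas, 0.975)


if __name__ == "__main__":
    # Synthetic toy data (hypothetical, not study data): model A is more
    # informative about the outcome than model B.
    rng = random.Random(42)
    y = [1 if rng.random() < 0.25 else 0 for _ in range(400)]
    model_a = [0.6 * label + rng.random() for label in y]
    model_b = [0.2 * label + rng.random() for label in y]
    delta, low, high = bootstrap_delta_auroc(y, model_a, model_b, n_boot=500)
    print(f"delta AUROC = {delta:.2f} ({low:.2f} to {high:.2f})")
```

Under this reading, an interval that excludes zero, such as the 0.06 (0.05–0.07) reported for gradient boosting against the reference model for heart failure, would indicate a difference that is unlikely to be due to sampling variation alone.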