Table 3: Logistic regression models and identified predictors

| GLM original^a (AUCint = 70.76%)^d (AUCext = 69.56%)^e | | GLM undersampled^b (AUCint = 70.30%)^d (AUCext = 69.01%)^e | | GLM oversampled^c (AUCint = 70.83%)^d (AUCext = 69.62%)^e | |
|---|---|---|---|---|---|
| Predictor | OR^h (95% CI) | Predictor | OR (95% CI) | Predictor | OR (95% CI) |
| Socio-economic | | Socio-economic | | Socio-economic | |
| Age | 1.02 (1.01–1.03) | Age | 1.02 (1.01–1.03) | Age | 1.02 (1.01–1.03) |
| Citizenship (ref=yes) | 1.38 (1.04–1.81) | | | Citizenship (ref=yes) | 1.23 (1.01–1.50) |
| Marital status (ref=unmarried) | 0.95 (0.90–1.00) | | | Marital status (ref=unmarried) | 0.95 (0.91–0.98) |
| Income-poverty ratio | 0.94 (0.88–1.00) | | | Income-poverty ratio | 0.92 (0.88–0.96) |
| Food security | 0.85 (0.76–0.94) | | | Food security | 0.82 (0.77–0.89) |
| Clinical | | Clinical | | Clinical | |
| Diagnosed HT (ref=no) | 1.26 (1.02–1.55) | Hepatitis C (ref=no) | 1.20 (1.02–1.43) | Diagnosed HT (ref=no) | 1.18 (1.02–1.37) |
| Mean SBP | 1.01 (1.00–1.02) | | | Vigorous exercise (ref=no) | 0.47 (0.28–0.76) |
| | | | | Hysterectomy (ref=no) | 1.42 (1.04–1.93) |
| Biochemical | | Biochemical | | Biochemical | |
| Monocyte count | 0.45 (0.24–0.82) | Monocyte count | 0.44 (0.20–0.94) | GGT | 1.10 (1.00–1.20) |
| Red cell count | 1.51 (1.14–2.01) | Serum potassium | 0.60 (0.43–0.83) | Monocyte count | 0.43 (0.28–0.65) |
| Serum calcium | 1.39 (1.06–1.82) | Uric acid | 1.14 (1.03–1.26) | Red cell count | 1.32 (1.07–1.62) |
| ALT | 1.33 (1.07–1.65) | | | Serum calcium | 1.39 (1.15–1.67) |
| Serum potassium | 0.58 (0.45–0.75) | | | ALT | 1.46 (1.24–1.71) |
| Triglycerides | 1.01 (1.00–1.02) | | | Serum potassium | 0.58 (0.49–0.70) |
| | | | | Osmolality | 1.02 (1.00–1.03) |
| | | | | Uric acid | 1.11 (1.04–1.20) |
| | | | | Triglycerides | 1.01 (1.00–1.02) |
| | | | | Hematocrit | 1.09 (1.03–1.14) |

a: logistic regression model on the original, un-resampled data; b: logistic regression model on the training data restructured by majority-class undersampling; c: logistic regression model on the training data restructured by minority-class oversampling; d: compared with the CDC prediabetes screening tool AUC on internal validation data (N=3172), i.e. 0.644; e: compared with the CDC prediabetes screening tool AUC on external validation data (N=3000), i.e. 0.628.
Abbreviations: ALT, serum alanine amino-transferase; AUCext, Area under receiver operating characteristic curve on the external validation data; AUCint, Area under receiver operating characteristic curve on the internal validation data; CI, confidence interval; GGT, serum gamma glutamyl transferase; HT, hypertension; OR, odds ratio; ref, reference level for categorical predictors; SBP, systolic blood pressure.
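
Footnotes b and c refer to restructuring the training data by majority-class undersampling and minority-class oversampling. The table does not say which resampling procedure was used, so the following is only a minimal sketch that assumes simple random resampling to a 1:1 class ratio. The function name `resample_binary`, the `strategy` argument, and the toy labels are hypothetical illustrations, not the study's code or data.

```python
import random

def resample_binary(labels, strategy, seed=0):
    """Return row indices after random resampling of a binary (0/1) label list.

    strategy="under": randomly drop majority-class rows down to the minority count.
    strategy="over":  randomly duplicate minority-class rows up to the majority count.
    Assumption: simple random resampling to a 1:1 ratio; the source does not
    specify the exact procedure.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)

    if strategy == "under":
        # Keep every minority row plus an equally sized random majority subset.
        idx = minority + rng.sample(majority, len(minority))
    elif strategy == "over":
        # Keep every row and add minority rows drawn with replacement.
        extra = rng.choices(minority, k=len(majority) - len(minority))
        idx = majority + minority + extra
    else:
        raise ValueError("strategy must be 'under' or 'over'")

    rng.shuffle(idx)
    return idx

# Toy labels (hypothetical, not the study data): 2 positives, 8 negatives.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
under_idx = resample_binary(labels, "under")
over_idx = resample_binary(labels, "over")
```

In this sketch, resampling would be applied to the training rows only, which matches the footnotes' wording; the internal and external validation sets would keep their original class balance.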