Table 5.
Performance of the new and the Korean undiagnosed diabetes screening methods in the development and validation datasets.
| Dataset | Screening method | Model | Features | AUC (95% CI) | Youden index (%) | Sensitivity (%) | Specificity (%) | PPV | NPV | PLR | NLR |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Train and internal validation set | Lee* + RHR | Risk score | Sex, age, WC, RHR, BMI, family history of diabetes, hypertension status, smoking status, alcohol consumption, physical activity, sleep time | 0.756 (0.728 to 0.784) | 39 | 70 | 69 | 0.09 | 0.98 | 2.24 | 0.44 |
| | Logistic Regression | Logistic Regression | | 0.801 (0.777 to 0.825) | 43.6 | 80.50 | 63.10 | 0.08 | 0.99 | 2.2 | 0.31 |
| | Random Forest | Random Forest Classifier | | 0.788 (0.763 to 0.813) | 44.8 | 82.30 | 62.40 | 0.09 | 0.98 | 2.35 | 0.19 |
| | LGBM | LightGBM Classifier | | 0.803 (0.779 to 0.827) | 45.9 | 80.70 | 65.20 | 0.09 | 0.99 | 2.58 | 0.17 |
| | XGB | XGBoost Classifier | | 0.797 (0.773 to 0.821) | 44.7 | 81.70 | 63.00 | 0.09 | 0.98 | 2.41 | 0.18 |
| | Ada | AdaBoost Classifier | | 0.786 (0.761 to 0.811) | 43.7 | 82.50 | 61.20 | 0.08 | 0.98 | 2.31 | 0.18 |
| External validation set | Lee + RHR | Risk score | Sex, age, WC, RHR, BMI, family history of diabetes, hypertension status, smoking status, alcohol consumption, physical activity, sleep time | 0.765 (0.748 to 0.782) | 42 | 78 | 64 | 0.11 | 0.98 | 2.17 | 0.35 |
| | Logistic Regression | Logistic Regression | | 0.814 (0.799 to 0.829) | 47.4 | 87.40 | 60.00 | 0.11 | 0.99 | 2.2 | 0.21 |
| | Random Forest | Random Forest Classifier | | 0.815 (0.800 to 0.830) | 48.7 | 88.70 | 60.00 | 0.1 | 0.99 | 2.2 | 0.19 |
| | LGBM | LightGBM Classifier | | 0.819 (0.805 to 0.833) | 49.6 | 84.80 | 64.80 | 0.11 | 0.99 | 2.41 | 0.23 |
| | XGB | XGBoost Classifier | | 0.818 (0.804 to 0.832) | 49.5 | 82.90 | 66.60 | 0.11 | 0.98 | 2.48 | 0.25 |
| | Ada | AdaBoost Classifier | | 0.809 (0.786 to 0.816) | 46.5 | 83.90 | 62.50 | 0.11 | 0.98 | 2.24 | 0.26 |

*Lee et al. 2012 [9] and Park et al. 2022 [10]. To test the performance of the Lee + RHR model (Park et al., 2022), data from 2019 and 2020 were used to build the prediction model and data from 2014 to 2018 were used to validate it.

WC: waist circumference; RHR: resting heart rate; BMI: body mass index; LGBM: Light Gradient Boosting Machine; XGB: Extreme Gradient Boosting; Ada: AdaBoost; AUC: area under the receiver operating characteristic curve; PPV: positive predictive value; NPV: negative predictive value; PLR: positive likelihood ratio; NLR: negative likelihood ratio.

For this study, five machine learning classification algorithms were used to predict undiagnosed diabetes. The results of the best-performing algorithm, as assessed by AUC, were used.
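
The Youden index and the likelihood ratios in Table 5 are conventionally derived from sensitivity and specificity. The sketch below uses the standard textbook definitions; the function name `screening_metrics` and its return keys are hypothetical and not taken from the study. It is meant only to illustrate how the columns relate, and likelihood ratios recomputed this way from the rounded table entries may not match the tabulated values exactly, since the study's own computation is not described here.

```python
def screening_metrics(sensitivity_pct, specificity_pct):
    """Derive summary metrics from sensitivity and specificity (both in %).

    Standard definitions (assumed, not confirmed as the study's method):
      Youden index = sensitivity + specificity - 100   (percentage points)
      PLR          = sensitivity / (1 - specificity)
      NLR          = (1 - sensitivity) / specificity
    """
    if not 0 <= sensitivity_pct <= 100:
        raise ValueError("sensitivity must be between 0 and 100")
    if not 0 < specificity_pct < 100:
        # Specificity of 0 or 100 would make NLR or PLR undefined.
        raise ValueError("specificity must be strictly between 0 and 100")

    sens = sensitivity_pct / 100.0
    spec = specificity_pct / 100.0
    return {
        "youden_index_pct": sensitivity_pct + specificity_pct - 100.0,
        "plr": sens / (1.0 - spec),
        "nlr": (1.0 - sens) / spec,
    }


# Example: the Lee + RHR risk score row of the train/internal validation set.
metrics = screening_metrics(70, 69)
print(metrics)
```

For the example row (sensitivity 70%, specificity 69%), the Youden index evaluates to 39, which agrees with the table.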