Table 3.
Area under the receiver operating characteristic (ROC) curve (AUC), accuracy, and k-fold cross-validation results of prediction models generated by machine-learning algorithms in the Ansan/Ansung cohort.
99 Features | Logistic Regression | XGBoost | Decision Tree | KNN | SVM | Random Forest | ANN |
---|---|---|---|---|---|---|---|
AUC of ROC | 0.866 (0.865–0.867) | 0.866 (0.865–0.867) | 0.647 (0.646–0.647) | 0.662 (0.661–0.663) | 0.597 (0.596–0.597) | 0.836 (0.835–0.836) | 0.816 |
Accuracy | 0.867 (0.867–0.868) | 0.868 (0.868–0.869) | 0.793 (0.792–0.793) | 0.826 (0.825–0.827) | 0.859 (0.858–0.859) | 0.841 (0.840–0.841) | |
k-fold | 0.858 (0.853–0.863) | 0.859 (0.856–0.863) | 0.786 (0.764–0.786) | 0.821 (0.818–0.825) | 0.851 (0.848–0.854) | 0.833 (0.831–0.834) | |
Top 15 features | | | | | | | |
AUC of ROC | 0.849 (0.848–0.850) | 0.853 (0.853–0.854) | 0.639 (0.638–0.640) | 0.694 (0.693–0.695) | 0.574 (0.574–0.575) | 0.831 (0.830–0.832) | 0.822 |
Accuracy | 0.868 (0.867–0.868) | 0.877 (0.876–0.877) | 0.798 (0.797–0.798) | 0.837 (0.836–0.837) | 0.855 (0.854–0.856) | 0.860 (0.859–0.860) | |
k-fold | 0.856 (0.850–0.862) | 0.861 (0.853–0.870) | 0.777 (0.768–0.785) | 0.827 (0.818–0.831) | 0.850 (0.846–0.852) | 0.856 (0.853–0.859) | |
Top 9 features | | | | | | | |
AUC of ROC | 0.849 (0.848–0.850) | 0.853 (0.852–0.853) | 0.636 (0.635–0.636) | 0.691 (0.690–0.692) | 0.561 (0.560–0.561) | 0.836 (0.835–0.837) | 0.862 |
Accuracy | 0.867 (0.867–0.868) | 0.868 (0.867–0.868) | 0.791 (0.790–0.792) | 0.834 (0.833–0.834) | 0.853 (0.852–0.853) | 0.862 (0.862–0.863) | |
k-fold | 0.856 (0.851–0.861) | 0.861 (0.857–0.864) | 0.779 (0.764–0.795) | 0.828 (0.824–0.835) | 0.848 (0.843–0.853) | 0.857 (0.853–0.859) | |
Prediction models were generated from a training set comprising 80% of the Ansan/Ansung cohort; the remaining 20% was used as a test set. KNN, K-nearest neighbor; SVM, support vector machine; ANN, artificial neural network. The top 15-feature prediction model generated from XGBoost included serum glucose, waist circumference, blood HbA1c, serum total bilirubin, season of enrollment in the study, body fat, pulse, hip circumference, serum HDL, serum ALT, serum γ-GTP, gender, serum creatinine, residence area, and PRS for insulin resistance. The top 9-feature prediction model generated from XGBoost contained serum glucose, waist circumference, body fat, serum ALT, serum total bilirubin, pulse, serum HDL, and gender.
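
For readers unfamiliar with the three metrics, the following is a minimal Python sketch of how an 80/20 split, the AUC of ROC, accuracy, and k-fold partitions can be computed. It is an illustration only and not the authors' pipeline: the function names (`train_test_split`, `roc_auc`, `accuracy`, `k_fold_indices`), the random seed, and the toy labels and scores are all assumptions introduced here, and the source does not state the value of k or how folds were formed.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and return (train, test); 20% test mirrors the footnote."""
    rng = random.Random(seed)  # the seed is an arbitrary assumption
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (rank-based definition)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    return sum(y == p for y, p in zip(labels, predictions)) / len(labels)


def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) for k contiguous, near-equal folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


# Toy data (hypothetical, not taken from the cohort).
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # 0.5 cutoff is an assumption

print(roc_auc(y_true, y_score))   # 0.75
print(accuracy(y_true, y_pred))   # 0.75
```

In a k-fold evaluation, a model would be refit on each `train_idx` partition and scored on the matching `val_idx` partition, and the per-fold scores would then be averaged.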