2022 Sep 28;19(19):12378. doi: 10.3390/ijerph191912378

Table 1.

Overview of ML-based methods used in the previous literature for diabetes prediction, including the year of publication, dataset used, missing value imputation technique, feature selection strategy, number of selected features, best-performing classifier, and corresponding performance evaluation metrics.

| Year [Ref] | Dataset | MVI ¹ | FS | NSF | BPC | Performance |
|---|---|---|---|---|---|---|
| 2016 [32] | ENRC | None | None | 9 | DT | Acc: 0.840 |
| 2018 [37] | LMHC | None | None | All | RF | Acc: 0.808, Sn: 0.849, Sp: 0.767 |
| 2018 [37] | PIDD | None | mRMR | 7 | RF | Acc: 0.772, Sn: 0.746, Sp: 0.799 |
| 2018 [30] | PIDD | None | None | 8 | NB | AUC: 0.819, Acc: 0.763, Sn: 0.763 |
| 2018 [38] | PIDD | KNN impute | BWA | 4 | Linear Kernel SVM | AUC: 0.920 |
| 2019 [39] | PIDD | NB | None | 8 | RF | AUC: 0.928, Acc: 0.871, Sn: 0.857 |
| 2019 [40] | PIDD | None | CRB | 11 | NB | Acc: 0.823 |
| 2019 [41] | PIDD | None | None | 8 | MLP | Acc: 0.775, Sn: 0.85, Sp: 0.68 |
| 2020 [31] | PIDD | Mean | CRB | 6 | Ensemble of AB, XGB | AUC: 0.950, Sn: 0.789, Sp: 0.789 |
| 2020 [42] | NHANES | None | LR | 7 | RF | AUC: 0.95, Acc: 0.943 |
| 2020 [43] | PIDD | Case deletion | None | 2 | SVM | AUC: 0.700, Acc: 0.750 |
| 2021 [44] | PIDD | None | None | 8 | Ensemble of J48, NBT, RF, Simple CART, RT | AUC: 0.832, Acc: 0.792, Sn: 0.786 |
| 2021 [45] | LMHC | Case deletion | ANOVA, GI | 16 | XGB | AUC: 0.876, Acc: 0.727, Sn: 0.738 |

¹ Note: MVI: Missing Value Imputation, FS: Feature Selection, NSF: Number of Selected Features, BPC: Best Performing Classifier, ENRC: Egyptian National Research Center, LMHC: Luzhou Municipal Health Commission, PIDD: PIMA Indian Diabetes Dataset, mRMR: Minimum Redundancy Maximum Relevance, BWA: Boruta Wrapper Algorithm, CRB: Correlation-Based, NHANES: National Health and Nutrition Examination Survey, ANOVA: Analysis of Variance, GI: Gini Impurity, NBT: Naive Bayes Tree, RT: Random Tree. Acc: Accuracy, Sn: Sensitivity, Sp: Specificity, AUC: Area Under the ROC Curve.
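
For readers comparing the Performance column, the following is a minimal sketch of how accuracy (Acc), sensitivity (Sn), and specificity (Sp) are conventionally derived from binary confusion-matrix counts. The function name `classification_metrics` and the counts in the usage example are hypothetical, chosen only for illustration; they are not taken from any study listed in the table, and the cited works may have computed their metrics under different protocols (for example, cross-validation averaging).

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute Acc, Sn, and Sp from binary confusion-matrix counts.

    tp/fn: diabetic cases predicted correctly/incorrectly.
    tn/fp: non-diabetic cases predicted correctly/incorrectly.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    accuracy = (tp + tn) / total
    # Sensitivity (recall on the positive class); undefined without positives.
    sensitivity = tp / (tp + fn) if (tp + fn) > 0 else math.nan
    # Specificity (recall on the negative class); undefined without negatives.
    specificity = tn / (tn + fp) if (tn + fp) > 0 else math.nan

    return {"Acc": accuracy, "Sn": sensitivity, "Sp": specificity}


# Hypothetical counts, for illustration only (not data from the table).
metrics = classification_metrics(tp=80, fp=30, tn=70, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC is omitted from this sketch because it depends on the classifier's ranked scores rather than on a single confusion matrix.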