2019 Apr 30;7(2):E246–E251. doi: 10.9778/cmajo.20180142

Table 2:

Ten-fold cross-validation results for each of 4 machine learning algorithms, minimizing or maximizing various metrics*

| Metric and algorithm | Sensitivity, % | Specificity, % | PPV, % | NPV, % | Accuracy, % |
|---|---|---|---|---|---|
| Misclassification rate | | | | | |
| C5.0 | 40.9 (31.8–50.7) | 99.3 (98.6–99.7) | 84.9 (71.9–92.8) | 94.8 (93.4–95.9) | 94.4 (93.0–95.5) |
| CaRT | 40.9 (31.8–50.7) | 99.3 (98.6–99.7) | 84.9 (71.9–92.8) | 94.8 (93.4–95.9) | 94.4 (93.0–95.5) |
| CHAID | 40.0 (30.9–49.8) | 99.3 (98.6–99.7) | 84.6 (71.4–92.7) | 94.7 (93.3–95.9) | 94.3 (92.9–95.5) |
| LASSO | 40.9 (31.8–50.7) | 99.3 (98.6–99.7) | 84.9 (71.9–92.8) | 94.8 (93.4–95.9) | 94.4 (93.0–95.5) |
| F1 score | | | | | |
| C5.0 | 61.8 (52.0–70.8) | 96.5 (95.2–97.4) | 61.8 (52.0–70.8) | 96.5 (95.2–97.4) | 93.5 (92.0–94.8) |
| CaRT | 60.9 (51.1–69.9) | 96.3 (95.0–97.3) | 60.4 (50.6–69.4) | 96.4 (95.1–97.3) | 93.3 (91.8–94.6) |
| CHAID | 51.8 (42.1–61.4) | 98.6 (97.7–99.1) | 77.0 (65.5–85.7) | 95.7 (94.3–96.7) | 94.6 (93.2–95.8) |
| LASSO | 40.9 (31.8–50.7) | 99.3 (98.6–99.7) | 84.9 (71.9–92.8) | 94.8 (93.4–95.9) | 94.4 (93.0–95.5) |
| PPV | | | | | |
| C5.0 | 43.6 (34.3–53.4) | 99.1 (98.3–99.5) | 81.4 (68.7–89.9) | 95.0 (93.6–96.1) | 94.4 (93.0–95.5) |
| CaRT | 40.9 (31.8–50.7) | 99.3 (98.6–99.7) | 84.9 (71.9–92.8) | 94.8 (93.4–95.9) | 94.4 (93.0–95.5) |
| CHAID | 42.7 (33.5–52.5) | 99.3 (98.6–99.7) | 85.5 (72.8–93.1) | 94.9 (93.5–96.1) | 94.5 (93.1–95.7) |
| LASSO | 40.9 (31.8–50.7) | 99.3 (98.6–99.7) | 84.9 (71.9–92.8) | 94.8 (93.4–95.9) | 94.4 (93.0–95.5) |
| Youden J statistic | | | | | |
| C5.0 | 85.5 (77.2–91.2) | 85.5 (83.4–87.5) | 35.3 (29.7–41.5) | 98.5 (97.4–99.1) | 85.5 (83.5–87.4) |
| CaRT | 80.9 (72.1–87.5) | 89.2 (87.2–90.8) | 40.8 (34.3–47.7) | 98.1 (97.0–98.8) | 88.5 (86.6–90.1) |
| CHAID | 52.7 (43.0–62.2) | 97.9 (96.9–98.6) | 69.9 (58.7–79.2) | 95.7 (94.4–96.8) | 94.1 (92.6–95.3) |
| LASSO | 87.3 (79.2–92.6) | 85.4 (83.2–87.3) | 35.6 (29.9–41.6) | 98.6 (97.7–99.2) | 85.5 (83.5–87.4) |

Note: CaRT = classification and regression tree, CHAID = chi-square automated interaction detection, LASSO = least absolute shrinkage and selection operator, NPV = negative predictive value, PPV = positive predictive value.

*The misclassification rate metric was minimized, whereas the F1 score, PPV and Youden J statistic metrics were maximized.
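The F1 score and Youden J statistic are optimization targets in the table but do not appear as columns. Both can be recovered from the reported point estimates using their standard definitions. The following is a minimal sketch, not the authors' code; the function names are illustrative, and the inputs are the rounded sensitivity, specificity and PPV values from the table, so the results are approximate.

```python
def f1_score(sensitivity: float, ppv: float) -> float:
    """Harmonic mean of sensitivity (recall) and PPV (precision)."""
    if sensitivity + ppv == 0:
        return 0.0
    return 2 * sensitivity * ppv / (sensitivity + ppv)


def youden_j(sensitivity: float, specificity: float) -> float:
    """Youden J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


# Rounded point estimates copied from the table, as proportions.
# C5.0 tuned to maximize F1: sensitivity 61.8%, PPV 61.8%.
f1_c50 = f1_score(0.618, 0.618)

# C5.0 and LASSO tuned to maximize Youden J.
j_c50 = youden_j(0.855, 0.855)
j_lasso = youden_j(0.873, 0.854)

print(f"F1, C5.0 (F1-tuned): {f1_c50:.3f}")
print(f"J, C5.0 (J-tuned): {j_c50:.3f}")
print(f"J, LASSO (J-tuned): {j_lasso:.3f}")
```

Because the harmonic mean of two equal values is that value, the F1-tuned C5.0 model has an F1 score equal to its sensitivity and PPV.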

A dummy classifier that assumes all cases were type 2 diabetes would achieve an accuracy of 91.6%.
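The baseline figure is useful for reading the accuracy column: a classifier that always predicts the majority class has an accuracy equal to that class's share of cases. The sketch below assumes, as the reported numbers suggest but the table does not state outright, that the positive class is the non-type-2 minority (about 8.4% of cases). It uses the identity accuracy = sensitivity × prevalence + specificity × (1 − prevalence); the function name is illustrative.

```python
def accuracy_from_rates(sensitivity: float, specificity: float,
                        prevalence: float) -> float:
    """Overall accuracy implied by sensitivity, specificity and prevalence."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


BASELINE_ACCURACY = 0.916  # dummy classifier accuracy from the footnote

# Assumption: the positive class is the minority class, so its prevalence
# is whatever the always-type-2 classifier gets wrong.
prevalence = 1.0 - BASELINE_ACCURACY

# The dummy classifier never predicts the positive class:
# sensitivity 0, specificity 1.
dummy = accuracy_from_rates(0.0, 1.0, prevalence)

# C5.0 tuned on misclassification rate (sensitivity 40.9%, specificity 99.3%).
c50_misclass = accuracy_from_rates(0.409, 0.993, prevalence)

# C5.0 tuned on Youden J (sensitivity 85.5%, specificity 85.5%).
c50_youden = accuracy_from_rates(0.855, 0.855, prevalence)

for name, acc in [("dummy", dummy),
                  ("C5.0, misclassification", c50_misclass),
                  ("C5.0, Youden J", c50_youden)]:
    print(f"{name}: accuracy {acc:.3f}, vs. baseline {acc - BASELINE_ACCURACY:+.3f}")
```

Under this assumption the implied accuracies match the table's accuracy column to within rounding. The models tuned on Youden J trade accuracy for sensitivity and fall below the 91.6% baseline, which is why accuracy alone is a poor guide on a dataset this imbalanced.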

Instances reported as final case definitions.