. 2019 Apr 30;7(2):E246–E251. doi: 10.9778/cmajo.20180142

Table 2: Ten-fold cross-validation results for each of 4 machine learning algorithms, minimizing or maximizing each of 4 metrics^*

Metric and algorithm	Sensitivity, %	Specificity, %	PPV, %	NPV, %	Accuracy, %^†
Misclassification rate
C5.0	40.9 (31.8–50.7)	99.3 (98.6–99.7)	84.9 (71.9–92.8)	94.8 (93.4–95.9)	94.4 (93.0–95.5)
CaRT	40.9 (31.8–50.7)	99.3 (98.6–99.7)	84.9 (71.9–92.8)	94.8 (93.4–95.9)	94.4 (93.0–95.5)
CHAID	40.0 (30.9–49.8)	99.3 (98.6–99.7)	84.6 (71.4–92.7)	94.7 (93.3–95.9)	94.3 (92.9–95.5)
LASSO	40.9 (31.8–50.7)	99.3 (98.6–99.7)	84.9 (71.9–92.8)	94.8 (93.4–95.9)	94.4 (93.0–95.5)
F1 score
C5.0	61.8 (52.0–70.8)	96.5 (95.2–97.4)	61.8 (52.0–70.8)	96.5 (95.2–97.4)	93.5 (92.0–94.8)
CaRT	60.9 (51.1–69.9)	96.3 (95.0–97.3)	60.4 (50.6–69.4)	96.4 (95.1–97.3)	93.3 (91.8–94.6)
CHAID	51.8 (42.1–61.4)	98.6 (97.7–99.1)	77.0 (65.5–85.7)	95.7 (94.3–96.7)	94.6 (93.2–95.8)
LASSO	40.9 (31.8–50.7)	99.3 (98.6–99.7)	84.9 (71.9–92.8)	94.8 (93.4–95.9)	94.4 (93.0–95.5)
PPV
C5.0	43.6 (34.3–53.4)	99.1 (98.3–99.5)	81.4 (68.7–89.9)	95.0 (93.6–96.1)	94.4 (93.0–95.5)
CaRT	40.9 (31.8–50.7)	99.3 (98.6–99.7)	84.9 (71.9–92.8)	94.8 (93.4–95.9)	94.4 (93.0–95.5)
CHAID^‡	42.7 (33.5–52.5)	99.3 (98.6–99.7)	85.5 (72.8–93.1)	94.9 (93.5–96.1)	94.5 (93.1–95.7)
LASSO	40.9 (31.8–50.7)	99.3 (98.6–99.7)	84.9 (71.9–92.8)	94.8 (93.4–95.9)	94.4 (93.0–95.5)
Youden J statistic
C5.0	85.5 (77.2–91.2)	85.5 (83.4–87.5)	35.3 (29.7–41.5)	98.5 (97.4–99.1)	85.5 (83.5–87.4)
CaRT	80.9 (72.1–87.5)	89.2 (87.2–90.8)	40.8 (34.3–47.7)	98.1 (97.0–98.8)	88.5 (86.6–90.1)
CHAID	52.7 (43.0–62.2)	97.9 (96.9–98.6)	69.9 (58.7–79.2)	95.7 (94.4–96.8)	94.1 (92.6–95.3)
LASSO^‡	87.3 (79.2–92.6)	85.4 (83.2–87.3)	35.6 (29.9–41.6)	98.6 (97.7–99.2)	85.5 (83.5–87.4)

Note: CaRT = classification and regression tree, CHAID = chi-square automated interaction detection, LASSO = least absolute shrinkage and selection operator, NPV = negative predictive value, PPV = positive predictive value.

^* The misclassification rate was minimized, whereas the F1 score, PPV and Youden J statistic were maximized.

^† A dummy classifier that labels every case as type 2 diabetes would achieve an accuracy of 91.6%.

^‡ Algorithm and metric combinations reported as the final case definitions.
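Every column in Table 2, and every metric used for model selection, can be computed from a binary confusion matrix. The Python sketch below is a minimal illustration under stated assumptions. The helpers `Confusion`, `confusion_at`, `best_threshold`, `f1_from_rates`, `youden_from_rates` and `majority_class_accuracy` are hypothetical names. The scores and labels are invented toy data, not study data. Choosing a decision threshold on a single score is an assumed stand-in for however each algorithm was actually tuned. The sketch also recomputes the F1 score and Youden J statistic for two table rows from their rounded percentages, so those values are approximate.

```python
# Hypothetical sketch: names, toy scores, and labels are illustrative only (not
# study data). Threshold sweeping is an assumed stand-in for the study's tuning.
from dataclasses import dataclass
from typing import Callable, Sequence


def f1_from_rates(sensitivity: float, ppv: float) -> float:
    """F1 score: harmonic mean of PPV (precision) and sensitivity (recall)."""
    return 0.0 if sensitivity + ppv == 0 else 2 * sensitivity * ppv / (sensitivity + ppv)


def youden_from_rates(sensitivity: float, specificity: float) -> float:
    """Youden J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def majority_class_accuracy(n_pos: int, n_neg: int) -> float:
    """Accuracy of a dummy classifier that always predicts the larger class."""
    return max(n_pos, n_neg) / (n_pos + n_neg)


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts; label 1 is the class being detected."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    @property
    def misclassification_rate(self) -> float:
        return 1.0 - self.accuracy

    @property
    def f1(self) -> float:
        return f1_from_rates(self.sensitivity, self.ppv)

    @property
    def youden_j(self) -> float:
        return youden_from_rates(self.sensitivity, self.specificity)


def confusion_at(scores: Sequence[float], labels: Sequence[int], threshold: float) -> Confusion:
    """Counts when every case with score >= threshold is predicted positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    tn = sum(1 for s, y in pairs if s < threshold and y == 0)
    return Confusion(tp, fp, fn, tn)


def best_threshold(scores: Sequence[float], labels: Sequence[int],
                   objective: Callable[[Confusion], float]) -> float:
    """Observed score that maximizes `objective`; the lowest one wins ties."""
    return max(sorted(set(scores)),
               key=lambda t: objective(confusion_at(scores, labels, t)))


if __name__ == "__main__":
    # Selection metrics recomputed from rounded Table 2 rates (approximate only).
    print("LASSO, Youden J row: J =", round(youden_from_rates(0.873, 0.854), 3))
    print("CHAID, F1 row: F1 =", round(f1_from_rates(0.518, 0.770), 3))

    # Toy data (hypothetical): 3 positives, 7 negatives.
    scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.85, 0.90]
    labels = [0, 0, 0, 0, 0, 1, 0, 0, 1, 1]
    print("dummy (majority-class) accuracy:", majority_class_accuracy(3, 7))
    objectives = {
        "misclassification rate (minimized)": lambda c: -c.misclassification_rate,
        "F1 score": lambda c: c.f1,
        "PPV": lambda c: c.ppv,
        "Youden J": lambda c: c.youden_j,
    }
    for name, objective in objectives.items():
        t = best_threshold(scores, labels, objective)
        c = confusion_at(scores, labels, t)
        print(f"{name}: threshold={t:.2f} sens={c.sensitivity:.2f} "
              f"spec={c.specificity:.2f} acc={c.accuracy:.2f}")
```

On the toy data, maximizing Youden J settles on threshold 0.30 (sensitivity 1.00, accuracy 0.80). The other three objectives settle on 0.85 (sensitivity 0.67, accuracy 0.90). That direction matches the C5.0, CaRT and LASSO rows of Table 2, where tuning for Youden J trades accuracy for sensitivity. The accuracies of those three Youden J rows (85.5% to 88.5%) fall below the 91.6% majority-class baseline given in the † footnote.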