Table 2. Confusion Matrix Metrics for Random Forest and Logistic Regression Models.
Population at highest risk, %^a | Specificity, %: Random forest | Specificity, %: Logistic regression | Specificity, %: Difference (95% CI)^b | Sensitivity, %: Random forest | Sensitivity, %: Logistic regression | Sensitivity, %: Difference (95% CI)^b | PPV, %: Random forest | PPV, %: Logistic regression | PPV, %: Difference (95% CI)^b |
---|---|---|---|---|---|---|---|---|---|
5 | 95.5 | 95.1 | 0.4 (0.0 to 0.7) | 16.2 | 8.1 | 8.1 (3.9 to 11.7) | 15.5 | 7.8 | 7.7 (3.7 to 11.3) |
10 | 90.4 | 90.1 | 0.2 (−0.2 to 0.7) | 27.3 | 19.9 | 7.4 (3.0 to 14.6) | 12.7 | 9.4 | 3.3 (1.3 to 6.7) |
20 | 80.3 | 79.9 | 0.3 (−0.1 to 1.4) | 42.4 | 38.4 | 4.1 (−1.1 to 12.5) | 9.9 | 8.9 | 1.0 (−0.1 to 3.0) |
Abbreviation: PPV, positive predictive value.
^a Binary predictions are obtained from continuous risk scores by classifying the stated percentage of the population with the highest risk scores as positive.
^b The 95% CIs were estimated using 10 000 bootstrap replications.
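The footnotes describe two steps: thresholding risk scores at a top-percentage cutoff, and bootstrapping the between-model difference. The following is a minimal sketch of how such metrics could be computed, not the authors' code. The function names, the stable tie-breaking, the rounding of the cutoff count, and the use of a percentile bootstrap are all assumptions; the source states only that 10 000 bootstrap replications were used.

```python
import numpy as np


def top_percent_predictions(scores, percent):
    """Flag the `percent`% of individuals with the highest scores as positive.

    Assumption: ties are broken by original order (stable sort) and the
    number of positives is rounded to the nearest integer.
    """
    scores = np.asarray(scores, dtype=float)
    n_positive = int(round(len(scores) * percent / 100))
    order = np.argsort(-scores, kind="stable")  # highest score first
    preds = np.zeros(len(scores), dtype=bool)
    preds[order[:n_positive]] = True
    return preds


def confusion_metrics(y_true, y_pred):
    """Return specificity, sensitivity, and PPV as percentages."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fn = np.sum(y_true & ~y_pred)
    return {
        "specificity": 100 * tn / (tn + fp),
        "sensitivity": 100 * tp / (tp + fn),
        "ppv": 100 * tp / (tp + fp),
    }


def bootstrap_difference_ci(y_true, scores_a, scores_b, percent, metric,
                            n_boot=10_000, seed=0):
    """Percentile bootstrap 95% CI for metric(model A) - metric(model B).

    Assumption: individuals are resampled with replacement and the cutoff
    is re-applied within each resample.
    """
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true, dtype=bool)
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)
    n = len(y_true)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample individuals
        pred_a = top_percent_predictions(scores_a[idx], percent)
        pred_b = top_percent_predictions(scores_b[idx], percent)
        m_a = confusion_metrics(y_true[idx], pred_a)[metric]
        m_b = confusion_metrics(y_true[idx], pred_b)[metric]
        diffs[i] = m_a - m_b
    # nanpercentile skips resamples where the metric was undefined
    return np.nanpercentile(diffs, [2.5, 97.5])
```

For example, `confusion_metrics(y, top_percent_predictions(scores, 5))` would give one model's entries for the 5% row, and `bootstrap_difference_ci(y, rf_scores, lr_scores, 5, "sensitivity")` would give an interval for the corresponding difference column, where `y`, `rf_scores`, and `lr_scores` are hypothetical arrays of outcomes and model risk scores.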