TABLE 3.
| Prediction Algorithm | Controls per Case (approximate) | Eligible Covariates | Notes |
|---|---|---|---|
| Logistic Regression (14 variants) | 50, 20, or 10 | All | Weighted |
| | 50, 20, or 10 | Pre-Selected | Weighted |
| | 50, 20, or 10 | All | |
| | 50, 20, or 10 | Pre-Selected | |
| | 50 | All | Backward selection |
| | 50 | Pre-Selected | Backward selection |
| Lasso (8 variants) | 20 or 10 | All | Deviance loss |
| | 20 or 10 | All | Negative AUC loss |
| | 20 or 10 | Pre-Selected | Deviance loss |
| | 20 or 10 | Pre-Selected | Negative AUC loss |
| Ridge Regression (8 variants) | 20 or 10 | All | Deviance loss |
| | 20 or 10 | All | Negative AUC loss |
| | 20 or 10 | Pre-Selected | Deviance loss |
| | 20 or 10 | Pre-Selected | Negative AUC loss |
| Random Forest (6 variants) | 50, 20, or 10 | All | 10,000 trees, 1/3 of covariates sampled per split |
| | 50, 20, or 10 | Pre-Selected | 10,000 trees, 1/3 of covariates sampled per split |
| Support Vector Machine (2 variants) | 50 | All | Tuning parameters chosen by cross-validation |
| | 50 | Pre-Selected | Tuning parameters chosen by cross-validation |
| Neural Network (4 variants) | 20 or 10 | Pre-Selected | 10 nodes in 1 hidden layer |
| | 20 or 10 | Pre-Selected | 5 nodes in 1 hidden layer |
Note: Risk scores were re-scaled to account for undersampling of controls, except for weighted logistic regression algorithms.
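
The re-scaling mentioned in the note can be illustrated with a standard odds-based correction for control undersampling. This is a minimal sketch and not necessarily the exact procedure used for these algorithms: it assumes every case was retained and controls were sampled at random with a known probability, so that the odds estimated on the sampled data are inflated by the inverse of that probability. The function name `rescale_risk`, the parameter `control_sampling_fraction`, and the numbers in the usage lines are hypothetical and chosen only for illustration.

```python
def rescale_risk(p_sampled: float, control_sampling_fraction: float) -> float:
    """Map a risk estimated on control-undersampled data to the full-cohort scale.

    Assumption (not taken from the table): all cases are kept and each control
    is kept with probability `control_sampling_fraction`, so
        odds_full = control_sampling_fraction * odds_sampled.
    """
    if not 0.0 <= p_sampled <= 1.0:
        raise ValueError("p_sampled must lie in [0, 1]")
    if not 0.0 < control_sampling_fraction <= 1.0:
        raise ValueError("control_sampling_fraction must lie in (0, 1]")
    # Algebraically equal to odds_full / (1 + odds_full), written so that
    # p_sampled == 1.0 does not cause a division by zero.
    scaled = control_sampling_fraction * p_sampled
    return scaled / (scaled + (1.0 - p_sampled))


# Hypothetical illustration: a cohort with about 1,000 controls per case,
# undersampled to about 10 controls per case, keeps roughly 1% of controls.
fraction_kept = 10 / 1000
risk_on_sampled_scale = 0.5
risk_on_full_scale = rescale_risk(risk_on_sampled_scale, fraction_kept)
print(f"{risk_on_full_scale:.4f}")
```

Under these assumptions, a score of 0.5 on the undersampled scale corresponds to a risk of roughly 0.0099 in the full cohort. Weighted logistic regression would not need this step, since the weights already restore the original case-to-control balance during fitting.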