Table 2. Accuracy of the logistic regression, decision tree, random forest and gradient boosting classifiers on the model and external test sets after recursive feature elimination with cross-validation (RFECV), hyperparameter tuning, multicollinearity reduction, and their combinations.
Model Name | Number of Variables | Hyperparameters | Model Accuracy | External Test Accuracy |
---|---|---|---|---|
After Recursive Feature Elimination with Cross-Validation (RFECV) | ||||
logistic regression | 135 | default | 0.82 | 0.66 |
decision tree | 200 | default | 0.75 | 0.69 |
random forest | 154 | default | 0.82 | 0.78 |
gradient boosting | 160 | default | 0.78 | 0.72 |
After Tuning Hyperparameters | ||||
logistic regression 1 | 200 | l1, 1 | 0.81 ± 0.06 | 0.63 ± 0.18 |
decision tree 2 | 200 | auto, 5, 1 | 0.70 ± 0.05 | 0.56 ± 0.20 |
random forest 3 | 200 | sqrt, 400, 5, 2 | 0.80 ± 0.07 | 0.75 ± 0.20 |
gradient boosting 4 | 200 | log2, 200, 10, 4 | 0.78 ± 0.07 | 0.69 ± 0.20 |
After Tuning Hyperparameters and RFECV | ||||
logistic regression 1 | 18 | l1, 1 | 0.82 | 0.63 |
decision tree 2 | 156 | auto, 5, 1 | 0.77 | 0.63 |
random forest 3 | 150 | sqrt, 400, 5, 2 | 0.82 | 0.72 |
gradient boosting 4 | 162 | log2, 200, 10, 4 | 0.82 | 0.75 |
After Multicollinearity Reduction and RFECV | ||||
logistic regression | 88 | default | 0.82 | 0.66 |
decision tree | 2 | default | 0.70 | 0.63 |
random forest | 127 | default | 0.79 | 0.78 |
gradient boosting | 69 | default | 0.78 | 0.69 |
After Multicollinearity Reduction and Tuning Hyperparameters | ||||
logistic regression 1 | 161 | l2, 10 | 0.81 ± 0.06 | 0.78 ± 0.18 |
decision tree 2 | 161 | auto, 6, 1 | 0.74 ± 0.04 | 0.60 ± 0.20 |
random forest 3 | 161 | log2, 700, 5, 2 | 0.80 ± 0.07 | 0.80 ± 0.20 |
gradient boosting 4 | 161 | log2, 300, 15, 4 | 0.80 ± 0.07 | 0.80 ± 0.20 |
After Multicollinearity Reduction, Tuning Hyperparameters and RFECV | ||||
logistic regression 1 | 99 | l2, 10 | 0.83 | 0.72 |
decision tree 2 | 85 | auto, 6, 1 | 0.76 | 0.63 |
random forest 3 | 150 | log2, 700, 5, 2 | 0.82 | 0.72 |
gradient boosting 4 | 161 | log2, 300, 15, 4 | 0.80 | 0.78 |
Hyperparameters tuned for 1 logistic regression {penalty, cost C}; 2 decision tree {max_features, max_depth, min_samples_leaf}; 3 random forest {max_features, n_estimators, max_depth, min_samples_leaf}; and 4 gradient boosting {max_features, n_estimators, max_depth, min_samples_leaf} classifiers. Values in the Hyperparameters column are given in the same order.
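The RFECV blocks of the table report only the number of variables retained, not the selection settings. Below is a minimal sketch of how such a step could be run with scikit-learn's RFECV; the synthetic data, 5-fold cross-validation and accuracy scoring are assumptions, not details taken from the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the study's 200-variable training set (assumption).
X, y = make_classification(n_samples=300, n_features=200, n_informative=30,
                           random_state=0)

# RFECV drops the weakest feature each round and keeps the feature count
# that maximises cross-validated accuracy.
selector = RFECV(
    estimator=RandomForestClassifier(random_state=0),  # default hyperparameters, as in the first table block
    step=1,                  # remove one feature per iteration
    cv=StratifiedKFold(5),   # 5-fold CV is an assumption, not stated in the table
    scoring="accuracy",
)
selector.fit(X, y)

print("Variables retained:", selector.n_features_)
X_reduced = selector.transform(X)   # data restricted to the selected variables
```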
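Likewise, tuned hyperparameter values such as {sqrt, 400, 5, 2} for the random forest could be obtained with a grid search over the parameters named in the footnote. The sketch below assumes GridSearchCV with illustrative candidate grids; the actual search strategy and ranges are not stated in the table.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in data (assumption); the study uses its own descriptor matrix.
X, y = make_classification(n_samples=300, n_features=200, n_informative=30,
                           random_state=0)

# Candidate values are assumptions chosen to bracket the tuned values in the table.
param_grid = {
    "max_features": ["sqrt", "log2"],
    "n_estimators": [200, 400, 700],
    "max_depth": [5, 10, 15],
    "min_samples_leaf": [1, 2, 4],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=StratifiedKFold(5),   # assumed CV scheme
    scoring="accuracy",
    n_jobs=-1,
)
search.fit(X, y)

print(search.best_params_)                       # tuned hyperparameters
print(f"CV accuracy: {search.best_score_:.2f}")  # comparable to the Model Accuracy column
```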
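The multicollinearity blocks start from 161 variables rather than 200, but the table does not say how collinear variables were identified. One common approach, sketched here, drops one variable from each highly correlated pair; the pairwise-correlation criterion and the 0.9 threshold are assumptions, not values from the study.

```python
import numpy as np
import pandas as pd

def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one column from every pair whose absolute correlation exceeds the threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

# Toy data with one deliberately collinear column (illustrative only).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 10)), columns=[f"x{i}" for i in range(10)])
X["x_dup"] = X["x0"] * 0.95 + rng.normal(scale=0.1, size=100)

print(X.shape, "->", drop_correlated(X).shape)   # the collinear column is dropped
```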