Table 1.
Model performance in cross-validation (mean with confidence intervals) and on the test set. ACC: accuracy; MCC: Matthews correlation coefficient; CI: 95% studentized bootstrap confidence interval.
Workflow | Feature Selection | No. of Features | Hyperparameters | Train MCC (CI) | Train Kappa (CI) | Train ACC (CI) | Test MCC | Test Kappa | Test ACC |
---|---|---|---|---|---|---|---|---|---|
ALL + RF | None | 323,564 | max.depth = 10; num.trees = 50; mtry = 569; min.node.size = 20 | 0.127 (0.09–0.163) | 0.113 (0.081–0.145) | 0.679 (0.668–0.690) | 0.157 | 0.120 | 0.695 |
IVF + RF | IVF | 161,782 | max.depth = 15; num.trees = 100; mtry = 402; min.node.size = 20 | 0.162 (0.128–0.197) | 0.146 (0.115–0.178) | 0.679 (0.665–0.694) | 0.138 | 0.115 | 0.686 |
RFE + RF | IVF + RFE | 415 | max.depth = 10; num.trees = 500; mtry = 24; min.node.size = 20 | 0.467 (0.431–0.503) | 0.455 (0.419–0.491) | 0.784 (0.771–0.798) | 0.428 | 0.371 | 0.773 |
Boruta + RF | IVF + Boruta | 200 | max.depth = 15; num.trees = 200; mtry = 17; min.node.size = 20 | 0.485 (0.453–0.518) | 0.473 (0.440–0.506) | 0.790 (0.777–0.803) | 0.415 | 0.394 | 0.767 |
RFE∩Boruta + RF | IVF + Intersect (RFE-Boruta) | 34 | max.depth = 15; num.trees = 500; mtry = 5; min.node.size = 20 | 0.533 (0.502–0.563) | 0.523 (0.493–0.553) | 0.806 (0.794–0.818) | 0.510 | 0.484 | 0.802 |
RFE∩Boruta + RF (randomized output) | IVF + Intersect (RFE-Boruta) | 34 | max.depth = 15; num.trees = 500; mtry = 5; min.node.size = 20 | 0.018 (−0.016–0.053) | 0.014 (−0.010–0.037) | 0.671 (0.663–0.680) | −0.065 | −0.042 | 0.648 |
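
As a reading aid for the three metrics reported in Table 1, the following is a minimal Python sketch. It assumes a binary outcome and that "Kappa" denotes Cohen's kappa; neither is stated in the table itself. The function name `binary_metrics` and the confusion-matrix counts are hypothetical and are not taken from the study. The example shows one plausible reason why ACC can stay near 0.65 to 0.67 in the randomized-output control while MCC and Kappa drop to about zero: with imbalanced classes, predicting the majority class yields high accuracy without any agreement beyond chance.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Return (ACC, MCC, Cohen's kappa) for a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    acc = (tp + tn) / n

    # MCC: correlation between predicted and true labels.
    # By convention it is set to 0 when any row or column total is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Cohen's kappa: observed agreement (acc) corrected for the
    # agreement p_e expected by chance from the marginal totals.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (acc - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return acc, mcc, kappa


# Hypothetical counts: 100 positives, 200 negatives,
# and a model that predicts "negative" for every sample.
acc, mcc, kappa = binary_metrics(tp=0, fp=0, fn=100, tn=200)
print(f"ACC={acc:.3f} MCC={mcc:.3f} Kappa={kappa:.3f}")
```

Under these assumed counts the accuracy equals the majority-class fraction (2/3), whereas MCC and Kappa are both zero, which is why the latter two are the more informative columns when comparing workflows against the randomized control.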