Table 8.
Hyperparameters of the optimal classifiers built on the base data set and used to calculate the tenfold cross-validation accuracy in Table 7. The “with D” and “no D” variants refer to the full and reduced (after removal of D) sets of features, respectively. N/A stands for “Not Applicable” (the first two parameters, Bootstrap and Criterion, are specific to random forests). The definitions of the feature sets are given in Table 1.
Features | Model | Variant | Bootstrap | Criterion | max_depth | max_features | min_samples_leaf | min_samples_split | n_estimators |
---|---|---|---|---|---|---|---|---|---|
Set A | RF | with D | True | gini | 80 | sqrt | 4 | 2 | 800 |
Set A | RF | no D | True | entropy | 10 | sqrt | 2 | 10 | 600 |
Set A | GB | with D | N/A | N/A | 50 | sqrt | 4 | 10 | 900 |
Set A | GB | no D | N/A | N/A | 10 | log2 | 2 | 2 | 100 |
Set B | RF | with D | True | entropy | None | None | 2 | 5 | 1000 |
Set B | RF | no D | True | entropy | None | log2 | 1 | 10 | 600 |
Set B | GB | with D | N/A | N/A | 110 | log2 | 2 | 10 | 400 |
Set B | GB | no D | N/A | N/A | 10 | log2 | 4 | 5 | 100 |
Set C | RF | with D | True | entropy | 60 | log2 | 4 | 2 | 900 |
Set C | RF | no D | True | entropy | 10 | sqrt | 2 | 10 | 600 |
Set C | GB | with D | N/A | N/A | 10 | log2 | 2 | 2 | 100 |
Set C | GB | no D | N/A | N/A | 10 | log2 | 2 | 2 | 100 |
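
As an illustration of how a row of Table 8 could be turned into a classifier, the sketch below transcribes the two Set A “with D” rows as keyword arguments. It assumes that the column names correspond one-to-one to scikit-learn parameters and that RF and GB denote `RandomForestClassifier` and `GradientBoostingClassifier`; the table itself does not state which library was used. The names `TABLE8_SET_A_WITH_D`, `to_kwargs`, and `build_classifier`, as well as the `random_state` argument, are hypothetical and introduced only for this example.

```python
# Set A, "with D" rows of Table 8, transcribed as keyword arguments.
# Assumption: column names map directly onto scikit-learn parameter names.
TABLE8_SET_A_WITH_D = {
    "RF": {"bootstrap": True, "criterion": "gini", "max_depth": 80,
           "max_features": "sqrt", "min_samples_leaf": 4,
           "min_samples_split": 2, "n_estimators": 800},
    "GB": {"bootstrap": "N/A", "criterion": "N/A", "max_depth": 50,
           "max_features": "sqrt", "min_samples_leaf": 4,
           "min_samples_split": 10, "n_estimators": 900},
}


def to_kwargs(row):
    """Drop N/A entries and convert the string "None" to Python None."""
    return {key: (None if value == "None" else value)
            for key, value in row.items() if value != "N/A"}


def build_classifier(model, row, random_state=0):
    """Instantiate the assumed scikit-learn estimator for one table row.

    random_state is a hypothetical addition for reproducibility; it is
    not a value reported in Table 8.
    """
    # Imported lazily so the table handling above works without scikit-learn.
    from sklearn.ensemble import (GradientBoostingClassifier,
                                  RandomForestClassifier)
    estimators = {"RF": RandomForestClassifier,
                  "GB": GradientBoostingClassifier}
    return estimators[model](random_state=random_state, **to_kwargs(row))
```

Under the same assumptions, a tenfold cross-validation score could then be obtained with `sklearn.model_selection.cross_val_score(build_classifier("RF", TABLE8_SET_A_WITH_D["RF"]), X, y, cv=10)`, where `X` and `y` are placeholders for the feature matrix and labels.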