Table 2.
Comparison of the parameters (random forest algorithm, resampling methodology, model hyperparameters, number of hyperparameter combinations tested) and the outputs (optimal model accuracy and hyperparameters, training set accuracy, training set kappa, test set accuracy, test set kappa) for three models (RF1, RF2, and RF3) fitted to the full animal isolate dataset, and for the RF1—no DT104 model, fitted to a dataset excluding the clonal DT104 animal isolates.
Model | Random forest algorithm | Resampling methodology | Model hyperparameters | Number of hyperparameter combinations tested | Optimal model accuracy and hyperparameters | Training set accuracy | Training set kappa | Test set accuracy | Test set kappa |
---|---|---|---|---|---|---|---|---|---|
RF1 | RandomForest | Out of bag sampling (10,000 iterations) | mtry (1 to 163 in increments of 1) | 163 | 0.786 (mtry = 109) | 0.929 (95% CI: 0.895–0.954) | 0.905 | 0.779 (95% CI: 0.670–0.866) | 0.700 |
RF2 | RandomForest | 10-times repeated 10-fold cross-validation | mtry (1 to 163 in increments of 1) | 163 | 0.775 (mtry = 49) | 0.901 (95% CI: 0.863–0.931) | 0.867 | 0.805 (95% CI: 0.699–0.887) | 0.727 |
RF3 | Ranger | 10-times repeated 10-fold cross-validation | mtry (2 to 162 in increments of 2), splitrule (gini or extratrees), min.node.size (1, 5 to 30 in increments of 5) | 1,134 | 0.778 (mtry = 40, splitrule = gini, min.node.size = 1) | 0.913 (95% CI: 0.877–0.941) | 0.884 | 0.805 (95% CI: 0.699–0.887) | 0.727 |
RF1—no DT104 | RandomForest | Out of bag sampling (10,000 iterations) | mtry (1 to 421 in increments of 1) | 421 | 0.818 (mtry = 82) | 0.989 (95% CI: 0.969–0.998) | 0.985 | 0.781 (95% CI: 0.660–0.875) | 0.663 |
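
The "Number of hyperparameter combinations tested" column can be checked against the stated ranges. The following is a minimal Python sketch, not the authors' tuning code. It assumes that each grid is the full Cartesian product of the listed values and that every range includes both endpoints. The function and dictionary names are hypothetical.

```python
from itertools import product


def rf3_grid():
    """Enumerate the RF3 tuning grid as described in the table (assumed full Cartesian product)."""
    mtry = range(2, 163, 2)                      # 2, 4, ..., 162 -> 81 values
    splitrule = ["gini", "extratrees"]           # 2 values
    min_node_size = [1] + list(range(5, 31, 5))  # 1, 5, 10, ..., 30 -> 7 values
    return [
        {"mtry": m, "splitrule": s, "min.node.size": n}
        for m, s, n in product(mtry, splitrule, min_node_size)
    ]


# Number of candidate hyperparameter combinations per model.
# "RF1_no_DT104" is a hypothetical key for the model fitted without DT104 isolates.
grid_sizes = {
    "RF1": len(range(1, 164)),           # mtry = 1..163
    "RF2": len(range(1, 164)),           # mtry = 1..163
    "RF3": len(rf3_grid()),              # 81 * 2 * 7
    "RF1_no_DT104": len(range(1, 422)),  # mtry = 1..421
}

print(grid_sizes)
```

Under these assumptions the counts come out to 163, 163, 1,134, and 421, which agrees with the values reported in the table.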