. 2024 Mar 12;14:1254860. doi: 10.3389/fmicb.2023.1254860

Table 2.

Comparison of the parameters (random forest algorithm, resampling methodology, model hyperparameters, and number of hyperparameter combinations tested) and outputs (optimal model accuracy and hyperparameters, training set accuracy and kappa, test set accuracy and kappa) of three models, RF1, RF2, and RF3, run on the full animal isolate dataset, and of the RF1—no DT104 model, run on a dataset excluding the clonal DT104 animal isolates.

| Model | Random forest algorithm | Resampling methodology | Model hyperparameters | Number of hyperparameter combinations tested | Optimal model accuracy and hyperparameters | Training set accuracy | Training set kappa | Test set accuracy | Test set kappa |
|---|---|---|---|---|---|---|---|---|---|
| RF1 | RandomForest | Out-of-bag sampling (10,000 iterations) | mtry (1 to 163 in increments of 1) | 163 | 0.786 (mtry = 109) | 0.929 (95% CI: 0.895–0.954) | 0.905 | 0.779 (95% CI: 0.670–0.866) | 0.700 |
| RF2 | RandomForest | 10-times repeated 10-fold cross-validation | mtry (1 to 163 in increments of 1) | 163 | 0.775 (mtry = 49) | 0.901 (95% CI: 0.863–0.931) | 0.867 | 0.805 (95% CI: 0.699–0.887) | 0.727 |
| RF3 | Ranger | 10-times repeated 10-fold cross-validation | mtry (2 to 162 in increments of 2), splitrule (gini or extratrees), min.node.size (1, 5 to 30 in increments of 5) | 1,134 | 0.778 (mtry = 40, splitrule = gini, min.node.size = 1) | 0.913 (95% CI: 0.877–0.941) | 0.884 | 0.805 (95% CI: 0.699–0.887) | 0.727 |
| RF1—no DT104 | RandomForest | Out-of-bag sampling (10,000 iterations) | mtry (1 to 421 in increments of 1) | 421 | 0.818 (mtry = 82) | 0.989 (95% CI: 0.969–0.998) | 0.985 | 0.781 (95% CI: 0.660–0.875) | 0.663 |
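As a quick consistency check on the "Number of hyperparameter combinations tested" column, the sketch below enumerates the tuning grids exactly as the table describes them, using Python's `itertools.product` for RF3's full factorial grid. The Python variable names are illustrative stand-ins for the `mtry`, `splitrule` and `min.node.size` settings. This only counts grid points; it is not the authors' tuning code.

```python
from itertools import product

# Tuning grids transcribed from Table 2 (Python names are illustrative only).
rf1_mtry = range(1, 164)            # RF1 and RF2: mtry = 1, 2, ..., 163
rf1_no_dt104_mtry = range(1, 422)   # no-DT104 model: mtry = 1, 2, ..., 421

rf3_mtry = range(2, 163, 2)                       # mtry = 2, 4, ..., 162 (81 values)
rf3_splitrule = ["gini", "extratrees"]            # 2 values
rf3_min_node_size = [1] + list(range(5, 31, 5))   # 1, 5, 10, ..., 30 (7 values)

# Full factorial grid: every combination of the three RF3 hyperparameters.
rf3_grid = list(product(rf3_mtry, rf3_splitrule, rf3_min_node_size))

print(len(rf1_mtry), len(rf3_grid), len(rf1_no_dt104_mtry))  # 163 1134 421
```

RF3's 1,134 combinations are 81 `mtry` values × 2 split rules × 7 minimum node sizes, so its search covers roughly seven times as many candidate models as the single-parameter `mtry` searches of RF1 and RF2.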
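The kappa columns report agreement corrected for chance. Assuming these are unweighted Cohen's kappa values (the statistic reported, for example, alongside accuracy by R caret's `confusionMatrix`), kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed accuracy and p_e is the agreement expected by chance from the row and column totals of the confusion matrix. The function below is a minimal sketch of that formula; the function name and the example matrix are hypothetical and are not data from this study.

```python
def accuracy_and_kappa(confusion):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples whose true class is i and
    whose predicted class is j. Kappa is undefined when p_e == 1.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal (= accuracy).
    p_o = sum(confusion[i][i] for i in range(k)) / n
    # Chance agreement computed from the marginal (row and column) totals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
    return p_o, (p_o - p_e) / (1 - p_e)


# Hypothetical two-class confusion matrix (not data from this study).
example = [[40, 10],
           [5, 45]]
acc, kappa = accuracy_and_kappa(example)
print(f"accuracy = {acc:.3f}, kappa = {kappa:.3f}")  # accuracy = 0.850, kappa = 0.700
```

Because p_e depends on the class distribution of the test set, kappa values are most directly comparable between models evaluated on the same test set.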
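The table does not say how its 95% confidence intervals for accuracy were obtained. One common choice is the exact Clopper-Pearson binomial interval, which is the interval R's `binom.test` reports. The sketch below assumes that method and computes it with only the standard library, by bisecting on the binomial tail probabilities; the helper names and the counts in the example are hypothetical.

```python
from math import comb


def prob_at_least(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))


def solve_increasing(g, target, iters=100):
    """Bisection on [0, 1] for g(p) == target, where g increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, conf=0.95):
    """Exact (Clopper-Pearson) CI for the proportion x / n.

    Float arithmetic on exact binomial coefficients is adequate for n up to
    roughly 1,000; larger n overflows the int-to-float conversion.
    """
    alpha = 1 - conf
    # Lower bound: the p at which P(X >= x) = alpha / 2.
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: prob_at_least(x, n, p), alpha / 2)
    # Upper bound: the p at which P(X <= x) = alpha / 2,
    # i.e. P(X >= x + 1) = 1 - alpha / 2.
    upper = 1.0 if x == n else solve_increasing(
        lambda p: prob_at_least(x + 1, n, p), 1 - alpha / 2)
    return lower, upper


# Hypothetical test set: 45 of 60 isolates classified correctly.
low, high = clopper_pearson(45, 60)
print(f"accuracy = {45 / 60:.3f} (95% CI: {low:.3f}-{high:.3f})")
```

Exact intervals like this are asymmetric around the point estimate when accuracy is close to 1, which is consistent with the narrow, skewed training-set intervals in the table, though the table itself does not confirm the method.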