2024 Mar 12;14:1254860. doi: 10.3389/fmicb.2023.1254860

Table 2.

Comparison of the parameters (random forest algorithm, resampling methodology, model hyperparameters, number of hyperparameter combinations tested) and the outputs (optimal model accuracy and hyperparameters, training set accuracy, training set kappa, test set accuracy, test set kappa) for the three models (RF1, RF2, and RF3) fitted to the full animal isolate dataset and for the RF1—no DT104 model fitted to a dataset that excludes the clonal DT104 animal isolates.

Model	Random forest algorithm	Resampling methodology	Model hyperparameters	Number of hyperparameter combinations tested	Optimal model accuracy and hyperparameters	Training set accuracy	Training set kappa	Test set accuracy	Test set kappa
RF1	RandomForest	Out of bag sampling (10,000 iterations)	mtry (1 to 163 in increments of 1)	163	0.786 (mtry = 109)	0.929 (95% CI: 0.895–0.954)	0.905	0.779 (95% CI: 0.670–0.866)	0.700
RF2	RandomForest	10-times repeated 10-fold cross-validation	mtry (1 to 163 in increments of 1)	163	0.775 (mtry = 49)	0.901 (95% CI: 0.863–0.931)	0.867	0.805 (95% CI: 0.699–0.887)	0.727
RF3	Ranger	10-times repeated 10-fold cross-validation	mtry (2 to 162 in increments of 2), splitrule (gini or extratrees), min.node.size (1, 5 to 30 in increments of 5)	1,134	0.778 (mtry = 40, splitrule = gini, min.node.size = 1)	0.913 (95% CI: 0.877–0.941)	0.884	0.805 (95% CI: 0.699–0.887)	0.727
RF1—no DT104	RandomForest	Out of bag sampling (10,000 iterations)	mtry (1 to 421 in increments of 1)	421	0.818 (mtry = 82)	0.989 (95% CI: 0.969–0.998)	0.985	0.781 (95% CI: 0.660–0.875)	0.663
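
The "Number of hyperparameter combinations tested" column can be reproduced from the grids listed under "Model hyperparameters": each count is the product of the number of candidate values per hyperparameter. The following is a minimal sketch of that arithmetic only, not the authors' tuning code. It is written in Python purely for illustration, and the names `grids` and `count_combinations` are hypothetical.

```python
from itertools import product

# Hyperparameter grids transcribed from Table 2.
# The dictionary layout and variable names are illustrative assumptions.
grids = {
    "RF1": {"mtry": range(1, 164)},            # 1 to 163 in increments of 1
    "RF2": {"mtry": range(1, 164)},            # 1 to 163 in increments of 1
    "RF3": {
        "mtry": range(2, 163, 2),              # 2 to 162 in increments of 2
        "splitrule": ["gini", "extratrees"],
        "min.node.size": [1] + list(range(5, 31, 5)),  # 1, then 5 to 30 by 5
    },
    "RF1—no DT104": {"mtry": range(1, 422)},   # 1 to 421 in increments of 1
}


def count_combinations(grid):
    """Count every combination in the full Cartesian grid."""
    return sum(1 for _ in product(*grid.values()))


counts = {model: count_combinations(grid) for model, grid in grids.items()}

for model, n in counts.items():
    print(f"{model}: {n} combinations")
```

For RF3 this gives 81 mtry values × 2 split rules × 7 minimum node sizes = 1,134, which matches the table. The other three models tune mtry alone, so their counts equal the number of mtry values.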