Table 2. Predictive performance of random forest models.
Model diagnostics summarizing overall performance in predicting the host category of held-out coronaviruses, i.e. sequences not used for model training. CI denotes confidence interval, Kappa denotes Cohen’s Kappa statistic, mAUC denotes multiclass area-under-curve statistic, and F1macro denotes F1 score calculated using macro-averaging (performance on each host category weighted equally).
Predictor features | Accuracy (95% CI) | Kappa | mAUC | F1macro |
---|---|---|---|---|
Spike protein | 0.735 (0.700, 0.769) | 0.696 | 0.898 | 0.757 |
Whole genome | 0.728 (0.687, 0.766) | 0.688 | 0.902 | 0.758 |
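
To make the column definitions concrete, the following minimal sketch computes accuracy, Cohen's Kappa, and macro-averaged F1 from true and predicted class labels. It is an illustration only, not the software used to produce Table 2: the function names and the toy host labels (`bat`, `human`, `swine`) are hypothetical, and mAUC and the accuracy confidence interval are left out because they need class probabilities and a choice of interval method that the table does not specify.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)  # observed agreement
    true_freq = Counter(y_true)
    pred_freq = Counter(y_pred)
    # Agreement expected by chance if labels were assigned independently
    # with the observed marginal frequencies.
    p_e = sum(true_freq[c] * pred_freq[c] for c in true_freq) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted members scores 0 by convention.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, not data from the study.
y_true = ["bat", "bat", "bat", "human", "human", "swine"]
y_pred = ["bat", "bat", "human", "human", "human", "swine"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"kappa    = {cohen_kappa(y_true, y_pred):.3f}")
print(f"F1macro  = {macro_f1(y_true, y_pred):.3f}")
```

Because macro-averaging gives each host category the same weight regardless of how many sequences it contains, F1macro can differ noticeably from accuracy when the categories are imbalanced.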