2021 Sep 3;7(9):000642. doi: 10.1099/mgen.0.000642

Table 2.

Summary of models trained

Models were optimized and evaluated via a nested cross-validation protocol. The prefix and suffix of each model name correspond to the dataset and the contamination reduction technique applied, respectively. Neat, SD and CR refer to the feature spaces with no decontamination, Simple Decontamination, and SHAP Decontamination applied, respectively (see Methods). Karius-Without corresponds to the SHAP-decontaminated feature space after the claimed ‘culture-confirmed’ pathogens are excluded. Karius-Only refers to the feature space containing only genera with ‘culture-confirmed’ pathogens as features.

No. of features	Feature space	Precision	Recall	AUROC
1564	Karius-Neat	0.976	0.983	0.995
1564	Karius-normalised	0.956	0.932	0.943
111	Karius-SD	0.896	0.787	0.942
25	Karius-CR	0.883	0.810	0.942
22	Karius-Without	0.803	0.727	0.915
22	Karius-Only	0.929	0.862	0.950
685	Pooled-Neat	0.950	0.939	0.982
21	Pooled-CR	0.870	0.796	0.904
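
The nested cross-validation protocol named in the caption can be illustrated with a minimal sketch. This is not the authors' pipeline: the 1-D k-nearest-neighbour toy model, the hyperparameter grid `k_grid`, the fold counts, and all function names below are assumptions chosen only to keep the example self-contained. The point it shows is the structure: an inner loop selects hyperparameters using development data only, and an outer loop reports precision and recall on folds that played no part in that selection.

```python
from statistics import mean


def k_folds(indices, k):
    """Partition a list of indices into k interleaved folds (no shuffling)."""
    return [indices[i::k] for i in range(k)]


def knn_predict(train_x, train_y, x, k):
    """Toy model (assumption): majority vote of the k nearest 1-D training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > k else 0


def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def inner_score(xs, ys, dev_idx, k, inner_k):
    """Mean validation accuracy of hyperparameter k over the inner folds."""
    scores = []
    for val_idx in k_folds(dev_idx, inner_k):
        held_out = set(val_idx)
        fit_idx = [i for i in dev_idx if i not in held_out]
        fit_x = [xs[i] for i in fit_idx]
        fit_y = [ys[i] for i in fit_idx]
        hits = [int(knn_predict(fit_x, fit_y, xs[i], k) == ys[i]) for i in val_idx]
        scores.append(mean(hits))
    return mean(scores)


def nested_cv(xs, ys, k_grid=(1, 3), outer_k=3, inner_k=2):
    """Return one (chosen_k, precision, recall) tuple per outer fold."""
    all_idx = list(range(len(xs)))
    results = []
    for test_idx in k_folds(all_idx, outer_k):
        held_out = set(test_idx)
        dev_idx = [i for i in all_idx if i not in held_out]
        # Inner loop: pick the hyperparameter without touching the outer test fold.
        best_k = max(k_grid, key=lambda k: inner_score(xs, ys, dev_idx, k, inner_k))
        # Refit on all development data, then score once on the outer test fold.
        dev_x = [xs[i] for i in dev_idx]
        dev_y = [ys[i] for i in dev_idx]
        preds = [knn_predict(dev_x, dev_y, xs[i], best_k) for i in test_idx]
        precision, recall = precision_recall([ys[i] for i in test_idx], preds)
        results.append((best_k, precision, recall))
    return results
```

Averaging the per-fold precision and recall returned by `nested_cv` gives a single estimate per feature space, comparable in form to one row of the table. Because hyperparameters are chosen inside each outer fold, that estimate is not inflated by tuning on the test data.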