2021 Sep 3;7(9):000642. doi: 10.1099/mgen.0.000642

Table 2.

Summary of models trained

Models were optimized and evaluated via a nested cross-validation protocol. The prefix and suffix of each model name correspond to the dataset and the contamination-reduction technique applied, respectively. Neat, SD and CR refer to the feature spaces with no decontamination, Simple Decontamination and SHAP Decontamination applied, respectively (see Methods). Karius-Without corresponds to the SHAP-decontaminated feature space after the claimed ‘culture-confirmed’ pathogens are excluded. Karius-Only refers to the feature space containing only genera with ‘culture-confirmed’ pathogens as features.

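Model performance is reported as precision, recall and AUROC.

| No. of features | Feature space | Precision | Recall | AUROC |
|---|---|---|---|---|
| 1564 | Karius-Neat | 0.976 | 0.983 | 0.995 |
| 1564 | Karius-normalised | 0.956 | 0.932 | 0.943 |
| 111 | Karius-SD | 0.896 | 0.787 | 0.942 |
| 25 | Karius-CR | 0.883 | 0.810 | 0.942 |
| 22 | Karius-Without | 0.803 | 0.727 | 0.915 |
| 22 | Karius-Only | 0.929 | 0.862 | 0.950 |
| 685 | Pooled-Neat | 0.950 | 0.939 | 0.982 |
| 21 | Pooled-CR | 0.870 | 0.796 | 0.904 |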
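For readers unfamiliar with the nested cross-validation protocol named in the caption, the following is a minimal, illustrative sketch in plain Python. It is not the authors' pipeline: the toy one-feature classifier, the synthetic data, the fold counts, the `alpha` hyperparameter grid and the use of accuracy for inner-loop selection are all assumptions made for illustration. It only shows the general structure, in which an inner loop selects a hyperparameter on the training portion and an outer loop reports precision and recall on held-out folds.

```python
import random
from statistics import mean


def make_toy_data(n=60, seed=0):
    """Synthetic 1-D data (assumption): class 0 ~ N(0, 1), class 1 ~ N(2, 1)."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    x = [rng.gauss(2.0 * label, 1.0) for label in y]
    return x, y


def k_folds(indices, k):
    """Partition a list of indices into k disjoint folds."""
    return [indices[i::k] for i in range(k)]


def fit(x, y):
    """Toy 'training' step: store the mean feature value of each class."""
    x0 = [v for v, t in zip(x, y) if t == 0]
    x1 = [v for v, t in zip(x, y) if t == 1]
    return (mean(x0) if x0 else 0.0, mean(x1) if x1 else 0.0)


def predict(model, x, alpha):
    """Predict 1 when x lies past a cut placed a fraction alpha from mean0 to mean1."""
    mean0, mean1 = model
    cut = mean0 + alpha * (mean1 - mean0)
    return [1 if v >= cut else 0 for v in x]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def nested_cv(x, y, alphas, outer_k=5, inner_k=3, seed=0):
    """Inner folds choose alpha; outer folds estimate held-out performance."""
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)
    outer = k_folds(indices, outer_k)
    results = []
    for i, test_idx in enumerate(outer):
        # Everything outside the current outer fold is available for tuning.
        train_idx = [j for f, fold in enumerate(outer) if f != i for j in fold]
        inner = k_folds(train_idx, inner_k)

        def inner_score(alpha):
            # Mean validation accuracy of this alpha across the inner folds.
            scores = []
            for m, val_idx in enumerate(inner):
                fit_idx = [j for f, fold in enumerate(inner) if f != m for j in fold]
                model = fit([x[j] for j in fit_idx], [y[j] for j in fit_idx])
                pred = predict(model, [x[j] for j in val_idx], alpha)
                scores.append(accuracy([y[j] for j in val_idx], pred))
            return mean(scores)

        best_alpha = max(alphas, key=inner_score)
        # Refit on the full outer-training portion, then score the unseen outer fold.
        model = fit([x[j] for j in train_idx], [y[j] for j in train_idx])
        pred = predict(model, [x[j] for j in test_idx], best_alpha)
        precision, recall = precision_recall([y[j] for j in test_idx], pred)
        results.append({"alpha": best_alpha, "precision": precision, "recall": recall})
    return results


if __name__ == "__main__":
    x, y = make_toy_data()
    for fold_result in nested_cv(x, y, alphas=[0.3, 0.5, 0.7]):
        print(fold_result)
```

Because the outer test fold is never seen during hyperparameter selection, averaging the per-fold precision and recall gives a performance estimate that is not biased by the tuning step. That is the usual motivation for nesting the two loops.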