Front Microbiol. 2018 May 3;9:872. doi: 10.3389/fmicb.2018.00872

Table 4. Comparison of the performance of different methods on the IBD and WT2D datasets.

IBD dataset

| Feature | No. of features | Classifier | Experiment | AUC |
| --- | --- | --- | --- | --- |
| **30-mer** | **1** | **Single logical feature predictor** | 20 runs of 10-fold cross-validation (25P+97H) | **ASS\* = 0.875 ± 0.004** |
| **30-mer** | **15** | **Random forests** | 20 runs of 10-fold cross-validation (25P+97H) | **0.990 ± 0.005** |
| Species abundance† | 443 | Random forests | 20 runs of 10-fold cross-validation (25P+97H) | 0.893 ± 0.080 |
| Presence of strain-specific markers† | 91756 | Support vector machine | 20 runs of 10-fold cross-validation (25P+97H) | 0.914 ± 0.084 |
| Abundance in contig bin†††† | Not mentioned | Logistic regression + LASSO | 20 runs of 10-fold cross-validation (25P+97H) | 0.967 |
| 7-mer†† | 200 | Support vector machine | Five runs of LOOCV (25P+25H) | Accuracy = 0.88 |

WT2D dataset

| Feature | No. of features | Classifier | Experiment | AUC |
| --- | --- | --- | --- | --- |
| **40-mer** | **1** | **Single logical feature predictor** | 20 runs of 10-fold cross-validation (52P+43H) | **ASS\* = 0.76 ± 0.003** |
| **40-mer** | **10** | **Random forests** | 20 runs of 10-fold cross-validation (52P+43H) | **0.939 ± 0.011** |
| Species abundance† | 381 | Random forests | 20 runs of 10-fold cross-validation (52P+43H) | 0.772 ± 0.116 |
| Presence of strain-specific markers† | 83456 | Support vector machine | 20 runs of 10-fold cross-validation (52P+43H) | 0.785 ± 0.104 |
| Gene markers††† | 50 | Support vector machine | 20 runs of 10-fold cross-validation (52P+43H) | 0.83 |
| Abundance of bins with MetaGen | 3 | Random forests | Training (20H+20P), testing (32P+13H) | 0.961 (training), 0.685 (testing) |
| **40-mer** | **3** | **Random forests** | Training (20H+20P), testing (32P+13H) | **0.979 (training), 0.782 (testing)** |

Using far fewer features, MetaGO achieved better results than the other methods. The results of MetaGO are shown in bold. There were two experimental settings for the IBD dataset; the samples in "Five runs of LOOCV" are a subset of those in our experiment, and LOOCV is a more relaxed setting than 10-fold cross-validation. For the WT2D dataset, 40-mers were tested under both experimental settings for comparison with the other methods. †(Pasolli et al., 2016); ††(Cui and Zhang, 2013); †††(Qin et al., 2014); ††††(Xing et al., 2017); *ASS, average of sensitivity and specificity.
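The table reports two metrics: AUC for most classifiers and ASS (the average of sensitivity and specificity) for the single-logical-feature predictors. A minimal sketch of both metrics, assuming toy labels and scores (the function names and data below are illustrative, not from the paper):

```python
# Illustrative sketch (toy data, not from the paper) of the two
# metrics reported in Table 4: ASS and AUC.

def ass_score(y_true, y_pred):
    """ASS: average of sensitivity (true-positive rate) and
    specificity (true-negative rate)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return (tp / (tp + fn) + tn / (tn + fp)) / 2

def auc(y_true, scores):
    """AUC via its Mann-Whitney interpretation: the probability that
    a random positive (patient) is scored above a random negative
    (healthy control), counting ties as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy cohort: 4 patients (label 1) and 4 healthy controls (label 0)
print(ass_score([1, 1, 1, 1, 0, 0, 0, 0],
                [1, 1, 1, 0, 0, 0, 1, 0]))      # 0.75
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

ASS equals AUC for a hard (0/1) classifier evaluated at a single threshold, which is why it is the natural metric for the single-feature predictor rows.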