Table 4.
Comparison of performance of different methods based on the IBD and WT2D datasets.
**IBD dataset**

| Experiment | 20 runs of 10-fold cross-validation (25P+97H) | | | | Five runs of LOOCV (25P+25H) | |
| --- | --- | --- | --- | --- | --- | --- |
| Feature | 30-mer | 30-mer | Species abundance† | Presence of strain-specific markers† | Abundance in contig bin†††† | 7-mer†† |
| Number of features | 1 | 15 | 443 | 91756 | Not mentioned | 200 |
| Classifier | Single logical feature predictor | Random forests | Random forests | Support vector machine | Logistic regression + LASSO | Support vector machine |
| AUC | **ASS\* = 0.875 ± 0.004** | **0.990 ± 0.005** | 0.893 ± 0.080 | 0.914 ± 0.084 | 0.967 | Accuracy = 0.88 |
**WT2D dataset**

| Experiment | 20 runs of 10-fold cross-validation (52P+43H) | | | | | Training (20H+20P), testing (32P+13H) | |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Feature | 40-mer | 40-mer | Species abundance† | Presence of strain-specific markers† | Gene markers††† | Abundance of bins with MetaGen | 40-mer |
| Number of features | 1 | 10 | 381 | 83456 | 50 | 3 | 3 |
| Classifier | Single logical feature predictor | Random forests | Random forests | Support vector machine | Support vector machine | Random forests | Random forests |
| AUC | **ASS = 0.76 ± 0.003** | **0.939 ± 0.011** | 0.772 ± 0.116 | 0.785 ± 0.104 | 0.83 | 0.961 (training), 0.685 (testing) | **0.979 (training), 0.782 (testing)** |
Using far fewer features, MetaGO achieved better results than the other methods. The results of MetaGO are shown in bold. There were two experimental settings for the IBD dataset; the samples in the "Five runs of LOOCV" setting are a subset of those in our experiment, and LOOCV is more relaxed than 10-fold cross-validation. For the WT2D dataset, 40-mers were tested under both experimental settings for comparison with the other methods. †(Pasolli et al., 2016); ††(Cui and Zhang, 2013); †††(Qin et al., 2014); ††††(Xing et al., 2017); \*average of sensitivity and specificity.
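The caption defines ASS as the average of sensitivity and specificity. As a minimal illustration of that metric (the function name and the toy labels below are ours, not from the paper; 1 = patient, 0 = healthy):

```python
# Sketch of ASS, the average of sensitivity and specificity, as defined in
# the table caption. Assumes binary labels: 1 = patient (P), 0 = healthy (H).

def ass_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # true-positive rate on patients
    specificity = tn / (tn + fp)  # true-negative rate on healthy controls
    return (sensitivity + specificity) / 2

# Toy example: 4 patients (3 detected), 4 healthy (3 correctly rejected)
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(ass_score(y_true, y_pred))  # → 0.75
```

Unlike AUC, which is computed from a ranking of prediction scores, ASS is computed from hard class assignments, which is why the single-feature predictor rows report ASS rather than AUC.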