2018 Jun 27;34(13):i32–i42. doi: 10.1093/bioinformatics/bty296

Table 3.

Results for the tasks of predicting one of 18 ecological environments and one of 5 organismal environments (the guts of 5 organisms).

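Micro-metrics are averaged over samples; macro-metrics are averaged over classes. All values are mean ± standard deviation.

| Step | Representation | Dataset | Classifier | Micro precision | Micro recall | Micro F1 | Macro precision | Macro recall | Macro F1 |
|---|---|---|---|---|---|---|---|---|---|
| (ii) | 3-mers | ECO-18K | RF | 0.6 ± 0.01 | 0.6 ± 0.01 | 0.6 ± 0.01 | 0.63 ± 0.02 | 0.6 ± 0.01 | 0.57 ± 0.01 |
| (ii) | 4-mers | ECO-18K | RF | 0.67 ± 0.01 | 0.67 ± 0.01 | 0.67 ± 0.01 | 0.7 ± 0.01 | 0.67 ± 0.01 | 0.65 ± 0.01 |
| (ii) | 5-mers | ECO-18K | RF | 0.72 ± 0.01 | 0.72 ± 0.01 | 0.72 ± 0.01 | 0.74 ± 0.01 | 0.72 ± 0.01 | 0.71 ± 0.01 |
| (ii) | 6-mers | ECO-18K | RF | 0.75 ± 0.01 | 0.75 ± 0.01 | 0.75 ± 0.01 | 0.76 ± 0.01 | 0.75 ± 0.01 | 0.73 ± 0.01 |
| (ii) | 7-mers | ECO-18K | RF | 0.74 ± 0.01 | 0.74 ± 0.01 | 0.74 ± 0.01 | 0.76 ± 0.01 | 0.74 ± 0.01 | 0.73 ± 0.01 |
| (ii) | 8-mers | ECO-18K | RF | 0.72 ± 0.01 | 0.72 ± 0.01 | 0.72 ± 0.01 | 0.74 ± 0.01 | 0.72 ± 0.01 | 0.71 ± 0.01 |
| (ii) | 3-mers | 5GUTS-3100 | RF | 0.8 ± 0.02 | 0.8 ± 0.02 | 0.8 ± 0.02 | 0.8 ± 0.02 | 0.8 ± 0.02 | 0.79 ± 0.02 |
| (ii) | 4-mers | 5GUTS-3100 | RF | 0.84 ± 0.01 | 0.84 ± 0.01 | 0.84 ± 0.01 | 0.84 ± 0.01 | 0.84 ± 0.01 | 0.83 ± 0.01 |
| (ii) | 5-mers | 5GUTS-3100 | RF | 0.86 ± 0.02 | 0.86 ± 0.02 | 0.86 ± 0.02 | 0.86 ± 0.02 | 0.86 ± 0.02 | 0.85 ± 0.02 |
| (ii) | 6-mers | 5GUTS-3100 | RF | 0.87 ± 0.01 | 0.87 ± 0.01 | 0.87 ± 0.01 | 0.87 ± 0.01 | 0.87 ± 0.01 | 0.87 ± 0.01 |
| (ii) | 7-mers | 5GUTS-3100 | RF | 0.87 ± 0.01 | 0.87 ± 0.01 | 0.87 ± 0.01 | 0.88 ± 0.02 | 0.87 ± 0.01 | 0.87 ± 0.01 |
| (ii) | 8-mers | 5GUTS-3100 | RF | 0.86 ± 0.01 | 0.86 ± 0.01 | 0.86 ± 0.01 | 0.87 ± 0.01 | 0.86 ± 0.01 | 0.86 ± 0.01 |
| (iv) | 6-mers | ECO-18K | RF | 0.75 ± 0.01 | 0.75 ± 0.01 | 0.75 ± 0.01 | 0.76 ± 0.01 | 0.75 ± 0.01 | 0.73 ± 0.01 |
| (iv) | 6-mers | ECO-18K | SVM | 0.79 ± 0.01 | 0.79 ± 0.01 | 0.79 ± 0.01 | 0.79 ± 0.01 | 0.79 ± 0.01 | 0.79 ± 0.01 |
| (iv) | 6-mers | ECO-18K | DNN-3L | 0.78 ± 0.01 | 0.78 ± 0.01 | 0.78 ± 0.01 | 0.78 ± 0.01 | 0.78 ± 0.01 | 0.78 ± 0.01 |
| (iv) | 6-mers | 5GUTS-3100 | RF | 0.87 ± 0.01 | 0.87 ± 0.01 | 0.87 ± 0.01 | 0.87 ± 0.01 | 0.87 ± 0.01 | 0.87 ± 0.01 |
| (iv) | 6-mers | 5GUTS-3100 | SVM | 0.88 ± 0.02 | 0.88 ± 0.02 | 0.88 ± 0.02 | 0.89 ± 0.01 | 0.88 ± 0.02 | 0.88 ± 0.02 |
| (iv) | 6-mers | 5GUTS-3100 | DNN-5L | 0.87 ± 0.01 | 0.87 ± 0.01 | 0.87 ± 0.01 | 0.87 ± 0.01 | 0.87 ± 0.01 | 0.87 ± 0.01 |
| (iv) | 6-mers | ECO-180K (10× larger) | RF | 0.83 ± 0.0 | 0.83 ± 0.0 | 0.83 ± 0.0 | 0.84 ± 0.0 | 0.83 ± 0.0 | 0.83 ± 0.0 |
| (iv) | 6-mers | ECO-180K (10× larger) | SVM | 0.86 ± 0.0 | 0.86 ± 0.0 | 0.86 ± 0.0 | 0.87 ± 0.01 | 0.86 ± 0.0 | 0.86 ± 0.0 |
| (iv) | 6-mers | ECO-180K (10× larger) | DNN-5L | 0.88 ± 0.0 | 0.88 ± 0.0 | 0.88 ± 0.0 | 0.88 ± 0.0 | 0.88 ± 0.0 | 0.88 ± 0.0 |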

Note: The classifiers (Random Forest, RF; Support Vector Machine, SVM; and neural network classifiers, DNN) are tuned and evaluated with stratified 10-fold cross-validation on three datasets: ECO-18K, 5GUTS-3100 and ECO-180K. The Step column refers to the steps in Figure 3.
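The difference between the micro- and macro-averaged columns can be illustrated with a minimal sketch. It is not the authors' evaluation code. The function name `micro_macro_prf` and the toy labels are hypothetical, and macro F1 is taken here as the mean of per-class F1 scores, which is one common convention and may differ from the one used for the table. In a single-label multi-class task every misclassification counts as one false positive and one false negative, so micro precision, recall and F1 coincide, which is consistent with the identical micro values within each row of the table.

```python
from collections import Counter

def micro_macro_prf(y_true, y_pred):
    """Return (micro, macro) tuples of (precision, recall, F1) for single-label data."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # the predicted class gains a false positive
            fn[t] += 1  # the true class gains a false negative

    def prf(tp_, fp_, fn_):
        precision = tp_ / (tp_ + fp_) if tp_ + fp_ else 0.0
        recall = tp_ / (tp_ + fn_) if tp_ + fn_ else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        return precision, recall, f1

    # Micro: pool the counts over all samples, then compute the metrics once.
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute the metrics per class, then take the unweighted mean.
    per_class = [prf(tp[c], fp[c], fn[c]) for c in classes]
    macro = tuple(sum(m[i] for m in per_class) / len(classes) for i in range(3))
    return micro, macro

# Hypothetical toy labels (not taken from the datasets in the table).
y_true = ["soil", "soil", "soil", "marine", "marine", "gut"]
y_pred = ["soil", "soil", "marine", "marine", "gut", "gut"]
micro, macro = micro_macro_prf(y_true, y_pred)
print("micro P/R/F1:", micro)
print("macro P/R/F1:", macro)
```

Because macro averaging weights every class equally, a rare class that is classified poorly lowers the macro scores more than the micro scores; this is one plausible reading of rows where macro F1 falls below micro F1.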