Table 2.
| Target | Condition | Method | Parameters | Mock F | Mock P | Mock R | Cross-validated F | Cross-validated P | Cross-validated R | Novel taxa F | Novel taxa P | Novel taxa R | Threshold |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16S rRNA gene | Balanced | NB-bespoke | [6,6]:0.9 | 0.705 | 0.98 | 0.582 | 0.827 | 0.931 | 0.744 | 0.165 | 0.243 | 0.125 | F = (0.49, 0.8, 0.1) |
| | | | [6,6]:0.92 | 0.705 | 0.98 | 0.581 | 0.825 | 0.936 | 0.737 | 0.165 | 0.251 | 0.123 | F = (0.7, 0.8, 0.15) |
| | | | [6,6]:0.94 | 0.703 | 0.98 | 0.579 | 0.822 | 0.942 | 0.729 | 0.162 | 0.259 | 0.118 | |
| | | | [7,7]:0.92 | 0.712 | 0.978 | 0.592 | 0.831 | 0.931 | 0.751 | 0.151 | 0.221 | 0.115 | |
| | | | [7,7]:0.94 | 0.708 | 0.978 | 0.586 | 0.829 | 0.936 | 0.743 | 0.157 | 0.239 | 0.117 | |
| | | Naive-Bayes | [7,7]:0.7 | 0.495 | 0.797 | 0.38 | 0.819 | 0.886 | 0.761 | 0.115 | 0.138 | 0.099 | |
| | | rdp | 0.6 | 0.564 | 0.798 | 0.457 | 0.815 | 0.868 | 0.768 | 0.102 | 0.128 | 0.084 | |
| | | | 0.7 | 0.55 | 0.799 | 0.438 | 0.812 | 0.892 | 0.746 | 0.124 | 0.173 | 0.096 | |
| | | Uclust | 0.51:0.9:3 | 0.498 | 0.746 | 0.392 | 0.846 | 0.876 | 0.817 | 0.154 | 0.201 | 0.126 | |
| | Precision | NB-bespoke | [6,6]:0.98 | 0.676 | 0.987 | 0.537 | 0.803 | 0.956 | 0.692 | 0.163 | 0.303 | 0.111 | P = (0.94, 0.95, 0.25) |
| | | | [7,7]:0.98 | 0.687 | 0.98 | 0.551 | 0.815 | 0.951 | 0.713 | 0.164 | 0.283 | 0.115 | |
| | | rdp | 1 | 0.239 | 0.941 | 0.16 | 0.632 | 0.968 | 0.469 | 0.12 | 0.457 | 0.069 | |
| | Recall | NB-bespoke | [12,12]:0.5 | 0.754 | 0.8 | 0.721 | 0.815 | 0.83 | 0.801 | 0.053 | 0.058 | 0.049 | R = (0.47, 0.75, 0.04) |
| | | | [14,14]:0.5 | 0.758 | 0.802 | 0.726 | 0.811 | 0.826 | 0.797 | 0.052 | 0.057 | 0.048 | R = (0.7, 0.75, 0.04) |
| | | | [16,16]:0.5 | 0.755 | 0.785 | 0.732 | 0.808 | 0.825 | 0.792 | 0.052 | 0.058 | 0.047 | |
| | | | [18,18]:0.5 | 0.772 | 0.803 | 0.748 | 0.805 | 0.823 | 0.789 | 0.055 | 0.061 | 0.05 | |
| | | | [32,32]:0.5 | 0.937 | 0.966 | 0.913 | 0.788 | 0.818 | 0.76 | 0.054 | 0.067 | 0.045 | |
| | | Naive-Bayes | [11,11]:0.5 | 0.567 | 0.77 | 0.479 | 0.793 | 0.82 | 0.768 | 0.059 | 0.065 | 0.055 | |
| | | | [12,12]:0.5 | 0.567 | 0.769 | 0.479 | 0.79 | 0.816 | 0.765 | 0.059 | 0.064 | 0.055 | |
| | | | [18,18]:0.5 | 0.564 | 0.764 | 0.477 | 0.779 | 0.807 | 0.753 | 0.057 | 0.063 | 0.051 | |
| | | rdp | 0.5 | 0.577 | 0.791 | 0.48 | 0.816 | 0.848 | 0.787 | 0.068 | 0.079 | 0.06 | |
| | Novel | Blast+ | 10:0.51:0.8 | 0.436 | 0.723 | 0.325 | 0.816 | 0.896 | 0.749 | 0.225 | 0.332 | 0.171 | F = (0.4, 0.8, 0.2) |
| | | Uclust | 0.76:0.9:5 | 0.467 | 0.775 | 0.348 | 0.84 | 0.938 | 0.76 | 0.219 | 0.358 | 0.158 | |
| | | VSEARCH | 10:0.51:0.8 | 0.45 | 0.74 | 0.342 | 0.814 | 0.891 | 0.75 | 0.226 | 0.333 | 0.171 | |
| | | | 10:0.51:0.9 | 0.45 | 0.74 | 0.342 | 0.82 | 0.896 | 0.755 | 0.219 | 0.338 | 0.162 | |
| Fungi | Balanced | Naive-Bayes | [6,6]:0.94 | 0.874 | 0.935 | 0.827 | 0.481 | 0.57 | 0.416 | 0.374 | 0.438 | 0.327 | F = (0.85, 0.45, 0.37) |
| | | | [6,6]:0.96 | 0.874 | 0.935 | 0.827 | 0.495 | 0.597 | 0.423 | 0.399 | 0.473 | 0.344 | |
| | | | [6,6]:0.98 | 0.874 | 0.935 | 0.827 | 0.505 | 0.629 | 0.423 | 0.426 | 0.52 | 0.361 | |
| | | | [7,7]:0.98 | 0.874 | 0.935 | 0.827 | 0.485 | 0.596 | 0.409 | 0.388 | 0.47 | 0.33 | |
| | | NB-bespoke | [6,6]:0.94 | 0.928 | 0.968 | 0.915 | 0.48 | 0.567 | 0.416 | 0.371 | 0.433 | 0.325 | |
| | | | [6,6]:0.96 | 0.928 | 0.968 | 0.915 | 0.491 | 0.59 | 0.42 | 0.393 | 0.466 | 0.34 | |
| | | | [6,6]:0.98 | 0.927 | 0.97 | 0.913 | 0.504 | 0.624 | 0.422 | 0.421 | 0.512 | 0.358 | |
| | | | [7,7]:0.98 | 0.935 | 0.97 | 0.921 | 0.487 | 0.596 | 0.412 | 0.386 | 0.466 | 0.329 | |
| | | rdp | 0.7 | 0.929 | 0.939 | 0.922 | 0.479 | 0.572 | 0.413 | 0.382 | 0.451 | 0.332 | |
| | | | 0.8 | 0.924 | 0.939 | 0.915 | 0.507 | 0.633 | 0.422 | 0.434 | 0.534 | 0.366 | |
| | | | 0.9 | 0.922 | 0.937 | 0.913 | 0.517 | 0.698 | 0.411 | 0.47 | 0.617 | 0.379 | |
| | Precision | Naive-Bayes | [6,6]:0.98 | 0.874 | 0.935 | 0.827 | 0.505 | 0.629 | 0.423 | 0.426 | 0.52 | 0.361 | P = (0.92, 0.6, 0.3) |
| | | NB-bespoke | [6,6]:0.98 | 0.927 | 0.97 | 0.913 | 0.504 | 0.624 | 0.422 | 0.421 | 0.512 | 0.358 | |
| | | rdp | 0.8 | 0.924 | 0.939 | 0.915 | 0.507 | 0.633 | 0.422 | 0.434 | 0.534 | 0.366 | |
| | | | 0.9 | 0.922 | 0.937 | 0.913 | 0.517 | 0.698 | 0.411 | 0.47 | 0.617 | 0.379 | |
| | | | 1 | 0.821 | 0.943 | 0.742 | 0.461 | 0.81 | 0.322 | 0.459 | 0.774 | 0.327 | |
| | Recall | NB-bespoke | [6,6]:0.92 | 0.938 | 0.971 | 0.924 | 0.467 | 0.544 | 0.409 | 0.353 | 0.407 | 0.312 | R = (0.9, 0.4, 0.3) |
| | | | [6,6]:0.94 | 0.928 | 0.968 | 0.915 | 0.48 | 0.567 | 0.416 | 0.371 | 0.433 | 0.325 | |
| | | | [6,6]:0.96 | 0.928 | 0.968 | 0.915 | 0.491 | 0.59 | 0.42 | 0.393 | 0.466 | 0.34 | |
| | | | [6,6]:0.98 | 0.927 | 0.97 | 0.913 | 0.504 | 0.624 | 0.422 | 0.421 | 0.512 | 0.358 | |
| | | | [7,7]:0.96 | 0.935 | 0.969 | 0.921 | 0.47 | 0.56 | 0.404 | 0.357 | 0.422 | 0.31 | |
| | | | [7,7]:0.98 | 0.935 | 0.97 | 0.921 | 0.487 | 0.596 | 0.412 | 0.386 | 0.466 | 0.329 | |
| | | rdp | 0.7 | 0.929 | 0.939 | 0.922 | 0.479 | 0.572 | 0.413 | 0.382 | 0.451 | 0.332 | |
| | | | 0.8 | 0.924 | 0.939 | 0.915 | 0.507 | 0.633 | 0.422 | 0.434 | 0.534 | 0.366 | |
| | | | 0.9 | 0.922 | 0.937 | 0.913 | 0.517 | 0.698 | 0.411 | 0.47 | 0.617 | 0.379 | |
| | Novel | Naive-Bayes | [6,6]:0.98 | 0.874 | 0.935 | 0.827 | 0.505 | 0.629 | 0.423 | 0.426 | 0.52 | 0.361 | F = (0.85, 0.45, 0.4) |
| | | NB-bespoke | [6,6]:0.98 | 0.927 | 0.97 | 0.913 | 0.504 | 0.624 | 0.422 | 0.421 | 0.512 | 0.358 | |
| | | rdp | 0.8 | 0.923 | 0.939 | 0.915 | 0.507 | 0.633 | 0.422 | 0.434 | 0.534 | 0.366 | |
| | | | 0.9 | 0.921 | 0.937 | 0.913 | 0.517 | 0.698 | 0.411 | 0.47 | 0.617 | 0.379 | |

a. F, F-measure; P, precision; R, recall.
b. Naive Bayes parameters: k-mer range, confidence.
c. RDP parameters: confidence.
d. BLAST+/VSEARCH parameters: max accepts, minimum consensus, minimum percent identity.
e. UCLUST parameters: minimum consensus, similarity, max accepts.
f. Threshold gives the score cut-offs used to define optimal method ranges, in the format metric = (mock score, cross-validated score, novel-taxa score). If two cut-offs are given, the second is a higher cut-off used to select parameters for the developmental NB-bespoke method. In that case the configurations listed are the union of the two selections: the second cut-off for NB-bespoke, the first for all other methods.
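
The selection rule in footnote f can be illustrated with a short sketch. It assumes that a configuration is kept when its score is greater than or equal to each of the three cut-offs; the footnote does not state whether the comparison is inclusive, although this reading is consistent with the rows listed above. The names `Scores`, `meets_cutoff`, and `select_configurations` are hypothetical and are not taken from the source. The example rows are the F-measures of four 16S rRNA gene "Balanced" configurations from the table.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

Cutoff = Tuple[float, float, float]  # (mock, cross-validated, novel-taxa)


class Scores(NamedTuple):
    """One method configuration and its score on each evaluation (hypothetical type)."""
    method: str
    parameters: str
    mock: float
    cross_validated: float
    novel_taxa: float


def meets_cutoff(row: Scores, cutoff: Cutoff) -> bool:
    """Assumption: a row passes when every score is >= its cut-off."""
    mock_min, cv_min, novel_min = cutoff
    return (row.mock >= mock_min
            and row.cross_validated >= cv_min
            and row.novel_taxa >= novel_min)


def select_configurations(rows: Sequence[Scores],
                          base_cutoff: Cutoff,
                          bespoke_cutoff: Optional[Cutoff] = None) -> list:
    """Union of the two selections: NB-bespoke rows are tested against the
    higher cut-off when one is given, all other methods against the first."""
    selected = []
    for row in rows:
        use_bespoke = bespoke_cutoff is not None and row.method == "NB-bespoke"
        cutoff = bespoke_cutoff if use_bespoke else base_cutoff
        if meets_cutoff(row, cutoff):
            selected.append(row)
    return selected


# F-measures copied from the 16S rRNA gene "Balanced" rows of the table.
rows = [
    Scores("NB-bespoke", "[6,6]:0.9", 0.705, 0.827, 0.165),
    Scores("Naive-Bayes", "[7,7]:0.7", 0.495, 0.819, 0.115),
    Scores("rdp", "0.6", 0.564, 0.815, 0.102),
    Scores("Uclust", "0.51:0.9:3", 0.498, 0.846, 0.154),
]

selected = select_configurations(rows, (0.49, 0.8, 0.1), (0.7, 0.8, 0.15))
for row in selected:
    print(row.method, row.parameters)
```

Under this reading all four rows are selected. Applying the higher NB-bespoke cut-off to every method would drop the Naive-Bayes, rdp, and Uclust rows, which is presumably why the two cut-offs are applied separately.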