Table 1. Assignment accuracy from ten-fold cross-validation.
| Method | Training/reference set | Fragment length | Genus accuracyᵃ | Family accuracyᵃ | Phylum accuracyᵃ |
|---|---|---|---|---|---|
| LCAᵇ | SilvaMod | F.L.ᵈ | **82%** | 92% | **99.9%** |
| LCAᵇ | SilvaMod | 450 bp | 62% | 88% | **99.7%** |
| LCAᵇ | SilvaMod | 100 bp | **38%** | 61% | **94%** |
| LCAᵇ | Greengenes | F.L.ᵈ | 69% | 94% | 99% |
| LCAᵇ | Greengenes | 450 bp | 48% | 87% | 99% |
| LCAᵇ | Greengenes | 100 bp | 33% | **65%** | **94%** |
| RDPᶜ | Greengenes | F.L.ᵈ | – | **97%** | 98% |
| RDPᶜ | Greengenes | 450 bp | – | **94%** | 95% |
| RDPᶜ | Greengenes | 100 bp | – | 49% | 51% |
| RDPᶜ | RDP v6 | F.L.ᵈ | 81% | 95% | 99% |
| RDPᶜ | RDP v6 | 450 bp | **73%** | 92% | 98% |
| RDPᶜ | RDP v6 | 100 bp | 35% | 56% | 90% |
ᵃ Assignment accuracy is defined as the number of correct assignments divided by the total number of sequences tested. It is given at three ranks. The best value for each combination of rank and fragment length is shown in bold.
ᵇ Classification using Megablast alignments and the CREST LCAClassifier. The lowest common ancestor (LCA) is taken over all hits within 2% of the highest bitscore, combined with percent-similarity filters.
ᶜ Naïve Bayes classification using the RDP Classifier with a bootstrap confidence cutoff of 0.8. With the Greengenes training set, the RDP Classifier was run via the QIIME script assign_taxonomy, which does not classify sequences beyond the family level.
ᵈ F.L., full length: uncropped full-length sequences from the reference or training dataset.
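Footnotes ᵃ and ᵇ together describe a simple pipeline. First, keep the hits whose bitscore lies within 2% of the best one. Next, assign their lowest common ancestor. Finally, score each rank as correct assignments over all sequences tested.

The Python below is a minimal sketch of that idea under stated assumptions, not the CREST LCAClassifier itself:

- Taxonomy paths are hypothetical fixed-depth tuples ordered from domain to genus.
- The percent-similarity filters are omitted.
- A match at a rank requires the whole path down to that rank to agree.
- The example taxa and bitscores are invented for illustration.

```python
"""Sketch: LCA classification within a bitscore range, plus per-rank accuracy.

Assumptions (not from the source): taxonomy paths are fixed-depth tuples
ordered domain -> genus; percent-similarity filters are omitted.
"""
from collections.abc import Sequence

# Hypothetical fixed rank order shared by every taxonomy path (root first).
RANKS = ("domain", "phylum", "class", "order", "family", "genus")

Path = tuple[str, ...]


def lowest_common_ancestor(paths: Sequence[Path]) -> Path:
    """Return the longest root-to-leaf prefix shared by all given paths."""
    common: list[str] = []
    for names in zip(*paths):
        if len(set(names)) != 1:  # paths diverge at this rank
            break
        common.append(names[0])
    return tuple(common)


def classify(hits: Sequence[tuple[float, Path]], bitscore_range: float = 0.02) -> Path:
    """Assign the LCA of all hits whose bitscore is within `bitscore_range`
    (a fraction) of the highest bitscore; return () if there are no hits."""
    if not hits:
        return ()
    best = max(score for score, _ in hits)
    kept = [path for score, path in hits if score >= best * (1.0 - bitscore_range)]
    return lowest_common_ancestor(kept)


def accuracy_at_rank(predicted: Sequence[Path], truth: Sequence[Path], rank: str) -> float:
    """Correct assignments at `rank` divided by all sequences tested.
    A sequence left unassigned at that rank counts as not correct."""
    depth = RANKS.index(rank)
    correct = sum(
        1
        for pred, true in zip(predicted, truth)
        if len(pred) > depth and pred[: depth + 1] == true[: depth + 1]
    )
    return correct / len(truth)


# Invented example taxa (for illustration only).
ECOLI = ("Bacteria", "Proteobacteria", "Gammaproteobacteria",
         "Enterobacterales", "Enterobacteriaceae", "Escherichia")
SALMO = ("Bacteria", "Proteobacteria", "Gammaproteobacteria",
         "Enterobacterales", "Enterobacteriaceae", "Salmonella")
BACIL = ("Bacteria", "Firmicutes", "Bacilli",
         "Bacillales", "Bacillaceae", "Bacillus")

queries = [
    [(500.0, ECOLI), (495.0, SALMO), (400.0, BACIL)],  # two hits within 2%: LCA stops at family
    [(300.0, BACIL)],                                  # single hit: full path assigned
]
truth = [ECOLI, BACIL]
predicted = [classify(hits) for hits in queries]

if __name__ == "__main__":
    for rank in ("genus", "family", "phylum"):
        print(rank, accuracy_at_rank(predicted, truth, rank))
    # prints: genus 0.5 / family 1.0 / phylum 1.0
```

In this toy run, the first query is left unassigned at genus level because its two top hits disagree there. Under the footnote ᵃ definition, that lowers genus accuracy without affecting the broader ranks.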