Table 1. The training sets used for naïve Bayesian classification of bacterial 16S rRNA sequences.
Training set | Abbreviation | Sequence database | Taxonomy mapping |
---|---|---|---|
RDP Training Set 6 | RDP TS6 | 8422 sequences (Cole et al., 2009)<sup>a</sup> | Based on Bergey's taxonomy |
SILVA bacteria subset distributed for Mothur | SILVA Subset | 14 956 bacterial sequences selected from an export of the SILVA database<sup>b,c</sup> | SILVA taxonomy |
Reduced SILVA subset, comparable in size to RDP TS6 | SILVA98.1 | 8572 bacterial sequences, >1.9% unique, from the SILVA subset | SILVA taxonomy |
Greengenes bacteria subset of 99% similar sequences | GG99 | 127 741 bacterial sequences, >1% unique, from the Greengenes database<sup>d</sup> | Greengenes taxonomy |
Reduced Greengenes training set, comparable in size to RDP TS6 | GG91.3 | 8275 bacterial sequences, >8.7% unique, from the full Greengenes database | Greengenes taxonomy |
Abbreviation: RDP, Ribosomal Database Project II.