2013 Jan 7;8(1):e53608. doi: 10.1371/journal.pone.0053608

Table 1. Training sets used for the naïve Bayesian classification of bacterial 16S rRNA gene sequences.

| Abbreviation | Description | Sequence Database (a) | Underlying Taxonomy |
| --- | --- | --- | --- |
| RDP TS6 | RDP classifier training set v. 6 (default for v. 2.3 of the RDP classifier) | 8,127 bacterial and 295 archaeal sequences | Based on “The Taxonomic Outline of Bacteria and Archaea” (TOBA) 7.7 [52] |
| LTP | Bacterial subset of “The Living Tree Project” v. 106 | 8,494 bacterial sequences | “List of Prokaryotic names with Standing in Nomenclature” (LPSN; http://www.bacterio.cict.fr/) |
| unfiltered RDP | All bacterial isolates in RDP database | 31,334 non-redundant bacterial sequences (b) | Based on “The Taxonomic Outline of Bacteria and Archaea” (TOBA) 7.7 [52] |
| filtered NCBI | All bacterial isolates in RDP database, filtered for annotation quality | 21,240 non-redundant bacterial sequences (b) | NCBI taxonomy [48] |
(a) Except for the ‘RDP TS6’ training set, which always trains on the full sequence, the numbers apply only to the testing of 100 nt single reads from the V4 region. For the three other training sets, which train only on the region to be classified, the number of sequences reflects both the number of sequences covering this region (all three training sets) and its degree of redundancy (‘unfiltered RDP’ and ‘filtered NCBI’).

(b) The numbers are for the ‘original non-redundant training set’ (see Methods section ‘Leave k out classification testing’); numbers for each leave-k-out iteration may vary slightly.
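
The following is a minimal Python sketch, not the authors' pipeline, of why a region-specific training set can contain fewer sequences than its source database: sequences that do not cover the region are dropped, and identical region sequences are collapsed. The function name `region_training_set`, the toy records, the coverage rule (no gaps inside the region), and the choice to remove redundancy per taxon are all assumptions made for illustration.

```python
def region_training_set(records, start, end):
    """Trim (taxon, aligned_sequence) records to columns [start, end),
    drop records that do not cover the region, and collapse duplicates."""
    unique = {}
    for taxon, seq in records:
        region = seq[start:end]
        # Assumed coverage rule: the region must be fully present,
        # with no alignment gaps ('-') and no missing positions.
        if len(region) < end - start or "-" in region:
            continue
        # Assumed redundancy rule: keep one copy per (taxon, region) pair.
        unique.setdefault((taxon, region), None)
    return list(unique)


# Toy aligned sequences (hypothetical data, not taken from any database).
records = [
    ("Escherichia", "ACGTACGTAC"),
    ("Escherichia", "TTGTACGTGG"),  # same region as the record above: redundant
    ("Bacillus", "ACGTACGTAC"),     # same region, different taxon: kept
    ("Vibrio", "ACG--CGTAC"),       # gap inside the region: not covered
    ("Pseudomonas", "ACGT"),        # too short to span the region: dropped
]

training = region_training_set(records, start=2, end=8)
print(len(training), "of", len(records), "records retained")
```

Under these assumptions, the retained count depends on both region coverage and redundancy, which is the dependence footnote (a) describes for the training sets that train only on the region to be classified.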