2013 Jan 7;8(1):e53608. doi: 10.1371/journal.pone.0053608

Table 2. Comparison of classification coverage of V4 reads from fecal samples among different training sets.

Training set    Genus                Family               Order                Class                Phylum
                DB     A      B      DB     A      B      DB     A      B      DB     A      B      DB     A      B
RDP TS6         41.7   46.2   29.6   64.2   73.8   59.0   84.1   95.8   79.3   91.0a  99.1a  84.6a  93.1   99.7   86.3
LTP             42.6   44.3   30.8   64.8   66.1   54.5   80.5   94.5   75.6   88.7   98.7   90.9   91.8   99.8   93.4
Unfiltered RDP  45.5   55.5   55.6   71.2   83.6   72.1   84.8a  96.5a  82.3a  87.1   96.7   82.3   93.9   99.9   94.7
Filtered NCBI   41.9   46.8   38.8   72.9   78.2   61.4   85.2   96.4   79.2   90.7   97.5   85.6   94.5   99.9   94.7

We used each of the four training sets to classify single 100 bp reads excised from environmental (uncultured) bacterial 16S rRNA gene sequences in the RDP database (DB), as well as single 100 bp reads from the same region sequenced from two fecal samples: A (6,298,382 sequences) and B (3,452,321 sequences). We then computed coverage at each rank (phylum, class, order, family, and genus), using per-rank confidence score thresholds that ensure a false positive rate (FPR) of at most 5%. The highest coverage in each column is underlined.
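The thresholding and coverage computation described in this legend can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that the FPR at a threshold is the fraction of reads scoring at or above that threshold whose assignment is wrong, and that coverage is the percentage of reads scoring at or above the threshold; the legend does not spell out either definition. The function names `pick_threshold` and `coverage` and the toy scores are hypothetical.

```python
from typing import Optional, Sequence


def pick_threshold(
    scores: Sequence[float],
    correct: Sequence[bool],
    max_fpr: float = 0.05,
) -> Optional[float]:
    """Return the lowest observed score that keeps the error rate within max_fpr.

    Assumption: the error rate (FPR) is the fraction of accepted reads
    (score >= threshold) whose classification is incorrect.
    """
    for t in sorted(set(scores)):
        accepted = [ok for s, ok in zip(scores, correct) if s >= t]
        errors = sum(1 for ok in accepted if not ok)
        # `accepted` is never empty here because t is itself one of the scores.
        if errors / len(accepted) <= max_fpr:
            return t
    return None  # no threshold satisfies the requested error rate


def coverage(scores: Sequence[float], threshold: float) -> float:
    """Percentage of reads classified at this rank (score >= threshold)."""
    return 100.0 * sum(1 for s in scores if s >= threshold) / len(scores)


# Toy validation reads with known truth (hypothetical values, one rank only).
val_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
val_correct = [True, True, True, False, True]

threshold = pick_threshold(val_scores, val_correct)
rank_coverage = coverage(val_scores, threshold)
```

In practice the threshold would be chosen separately for each rank on reads of known taxonomy and then applied to the reads being classified, which is why each rank in the table can have a different threshold.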

a The confidence score threshold for these cases was lower than that of one or more higher levels, so a sequence could be classified at the current level but not at the higher taxonomic levels. We found that the classification of such sequences is associated with a high error rate, and we recommend excluding them. We have therefore adjusted coverage accordingly.
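One way to express the adjustment described in this footnote is to count a read as classified at a rank only if it also passes the threshold at every higher rank. The sketch below assumes that reading of the footnote. The rank order, the function name `adjusted_coverage`, and all scores and thresholds are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Ranks ordered from highest to lowest taxonomic level.
RANKS = ["phylum", "class", "order", "family", "genus"]


def adjusted_coverage(
    reads: List[Dict[str, float]],
    thresholds: Dict[str, float],
    rank: str,
) -> float:
    """Percentage of reads passing the threshold at `rank` and all higher ranks.

    Reads that pass at `rank` but fail at a higher rank are excluded.
    """
    required = RANKS[: RANKS.index(rank) + 1]
    kept = sum(
        1
        for scores in reads
        if all(scores[r] >= thresholds[r] for r in required)
    )
    return 100.0 * kept / len(reads)


# Hypothetical thresholds in which the class threshold is lower than the phylum one.
thresholds = {"phylum": 0.8, "class": 0.5, "order": 0.5, "family": 0.5, "genus": 0.5}

reads = [
    {"phylum": 0.9, "class": 0.6, "order": 0.5, "family": 0.4, "genus": 0.3},
    # Passes at class (0.6 >= 0.5) but fails at phylum (0.7 < 0.8): excluded.
    {"phylum": 0.7, "class": 0.6, "order": 0.6, "family": 0.6, "genus": 0.6},
]

class_coverage = adjusted_coverage(reads, thresholds, "class")
```

Without the adjustment both toy reads would count as classified at the class level; with it, only the first read does.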