Table 1. Number of sequences failing each quality screening criterion and total number of sequences remaining (bold, italics), for the standard processing pipeline and for AmpliconNoise processing.
Screening Criteria | Standard Output | AmpliconNoise Output |
Initial | 409,997 | 232,792 |
<100% match to 5' primer | 31,948 | NA |
Sequence length <120 bp | 137,753 | NA |
Ambiguous bases present | 6,295 | 0 |
Homopolymers >6 bases | 149 | 12 |
Avg Qscore <25 | 8,985 | NA |
Poorly aligning to database | 138 | 106 |
Remainder after 1st stage screening | 253,973 | 232,674 |
Uniques | 22,351 | 3,053 |
Pre-cluster sequences differing by 1 bp | 10,529 uniques | NA |
Flagged as chimeras | 1,081 | 1,460 |
Phyla other than target | 25 | 25 |
Net sequence read yield | 252,867 | 231,189 |
Sequence reads per sample (± SE) | 4,214 ± 101 | 3,853 ± 99 |
Unique sequence reads | 10,250 | 2,751 |
OTUs | 1,166 | 792 |
OTUs containing shared reads | 769 | 739 |
Equalize sampling effort (subsample to 3,000 reads per sample) | | |
OTUs per sample (± SE) | 112 ± 2 | 120 ± 2 |
The counts of sequences failing individual criteria in the initial screening sum to more than the number of sequences dropped, because some sequences failed multiple criteria. AmpliconNoise processing includes a test of matching to the 5' primer and does not make use of quality scores. OTUs were defined at a 3% sequence dissimilarity threshold using the average neighbor method.
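
The bookkeeping in Table 1 can be checked with a few lines of arithmetic. The sketch below is illustrative only: the counts are transcribed from the table, while the variable and function names are hypothetical and not part of either pipeline. It assumes that the net read yield is the first-stage remainder minus the reads flagged as chimeras and the reads assigned to non-target phyla, which the published figures are consistent with. The "excess failures" value is the amount by which per-criterion failure counts exceed the number of reads actually dropped; it is not a direct count of reads failing more than one criterion, since a read failing three criteria contributes two to the excess.

```python
# Counts transcribed from Table 1; names are illustrative, not from the pipelines.
standard = {
    "initial": 409_997,
    # primer, length, ambiguous bases, homopolymers, Qscore, poor alignment
    "failures": [31_948, 137_753, 6_295, 149, 8_985, 138],
    "remainder": 253_973,
    "chimeras": 1_081,
    "non_target": 25,
    "net_yield": 252_867,
    "reads_per_sample": 4_214,
}
ampliconnoise = {
    "initial": 232_792,
    # ambiguous bases, homopolymers, poor alignment (other criteria are NA)
    "failures": [0, 12, 106],
    "remainder": 232_674,
    "chimeras": 1_460,
    "non_target": 25,
    "net_yield": 231_189,
    "reads_per_sample": 3_853,
}


def check_column(col):
    """Recompute the derived quantities for one column of the table."""
    dropped = col["initial"] - col["remainder"]
    # Failure counts beyond the reads dropped (multi-criterion failures).
    excess_failures = sum(col["failures"]) - dropped
    # Assumption: chimeras and non-target reads are removed from the remainder.
    net = col["remainder"] - col["chimeras"] - col["non_target"]
    # Net yield divided by mean reads per sample implies the sample count.
    implied_samples = col["net_yield"] / col["reads_per_sample"]
    return {
        "dropped": dropped,
        "excess_failures": excess_failures,
        "net_matches_table": net == col["net_yield"],
        "implied_samples": round(implied_samples, 1),
    }


for name, col in [("Standard", standard), ("AmpliconNoise", ampliconnoise)]:
    print(name, check_column(col))
```

Under these assumptions the standard pipeline drops 156,024 reads in the first stage against 185,268 criterion failures, while for AmpliconNoise the failures sum exactly to the 118 reads dropped. Both columns imply roughly 60 samples, a figure inferred from the table rather than stated in it.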