Applied and Environmental Microbiology. 2011 Nov;77(21):7846–7849. doi: 10.1128/AEM.05220-11

Barcoded Primers Used in Multiplex Amplicon Pyrosequencing Bias Amplification

David Berry 1, Karim Ben Mahfoudh 1, Michael Wagner 1, Alexander Loy 1,*
PMCID: PMC3209180  PMID: 21890669

Abstract

“Barcode-tagged” PCR primers used for multiplex amplicon sequencing generate a thus-far-overlooked amplification bias that produces variable terminal restriction fragment length polymorphism (T-RFLP) and pyrosequencing data from the same environmental DNA template. We propose a simple two-step PCR approach that increases reproducibility and consistently recovers higher genetic diversity in pyrosequencing libraries.

TEXT

Recent advances in DNA sequencing technologies have created opportunities for sequencing at an unprecedented depth and breadth (12), and multiplex sequencing has emerged as a popular strategy for parallel sequencing of many different samples (14). In multiplex sequencing, a unique sample-specific identifier, or “barcode” sequence, is added to the DNA that is to be sequenced. After sequencing, reads are sorted into sample libraries via detection of the appropriate barcode. Multiplexing in amplicon sequencing, which is widely performed for diversity surveys of 16S rRNA or functional genes, can be performed either by ligating barcodes and sequencing adapters to amplicons created with “conventional” PCR primers (primers that consist only of the template-specific sequence) (13) or, more simply, by using long oligonucleotides that, in addition to the conventional PCR primer sequence, already include 5′ tags with barcodes and sequencing adapters, thereby eliminating the ligation step (2, 8). The latter approach is referred to here as “barcoded primer” PCR (bcPCR).
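The read-sorting step described above can be illustrated with a minimal Python sketch. It assumes that the sequencing adapter has already been trimmed, that the barcode occupies the first 8 nucleotides of each read, and that only exact barcode matches are accepted; the barcode sequences, sample names, and the function name `demultiplex` are hypothetical and are not taken from this study.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_len=8):
    """Sort reads into per-sample libraries by exact match on the leading barcode.

    Assumption: the barcode is the first `barcode_len` bases of each read.
    Reads whose barcode is not recognized are collected under "unassigned".
    """
    libraries = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            libraries["unassigned"].append(read)
        else:
            # Store the read without its barcode.
            libraries[sample].append(insert)
    return dict(libraries)


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGTACGT": "sample_A", "TGCATGCA": "sample_B"}
reads = [
    "ACGTACGTGGATTAGATACCC",
    "TGCATGCAGGATTAGATACCC",
    "NNNNNNNNGGATTAGATACCC",
]
libraries = demultiplex(reads, barcodes)
```

Exact matching is the simplest possible rule. Error-correcting barcode sets such as the one cited below (7) allow some sequencing errors in the barcode to be tolerated, which this sketch does not attempt.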

The implicit assumption behind the bcPCR approach is that the adapter and barcode nucleotide sequence adjacent to the template-specific PCR primer does not interact with the template strand in such a way as to promote template sequence-dependent selective amplification. Pyrosequencing-based genetic diversity studies are known to be affected by a number of factors, including template sequence (1), amplicon size and target region, choice of primers (4), pyrosequencing errors (5, 10, 16), and OTU clustering procedure (9), and it was recently demonstrated that this widely used approach suffers from relatively low technical reproducibility (20). In order to test specifically whether bcPCR affects surveys of genetic diversity, we designed barcoded primers composed of the Titanium FLX sequencing adapters, randomly selected 8-nucleotide barcode sequences from a published and widely cited list (7) (Table S1), and primers targeting a fragment of the 16S rRNA gene of most bacteria and spanning regions V6 to V9, which is sufficient for accurate microbial community characterization (11) and captures genetic diversity similarly to full-length 16S rRNA (17) (for details, see the supplemental material). We amplified DNA isolated from the mouse gut lumen and analyzed the resulting 16S rRNA gene amplicons using terminal restriction fragment length polymorphism (T-RFLP) and 454 pyrosequencing.

Each primer variant out of 11 randomly selected barcoded primers was tested in triplicate using T-RFLP (for details, see the supplemental material). T-RFLP was also conducted for three replicate DNA extractions (using the same extraction protocol) from the same homogenized sample in order to compare barcode-induced variation to a known source of technical variation. T-RFLP profiles were significantly less reproducible for primers that had different barcodes than for replicates of the same barcoded primer (P < 0.0001) (Fig. 1A). The resulting average pairwise distance of profiles obtained with primers carrying different barcodes was even greater than that observed for amplification of multiple DNA extractions with a single barcoded primer (P < 0.0001), indicating that the variability associated with amplification using primers with different barcodes is greater than that observed with replicate DNA extractions (Fig. 1A). The detection of T-RF peaks, which is a presence/absence measurement, was not significantly different for the 1-step and 2-step bcPCR on average (P = 0.31), but the overall variation was higher among amplicons produced with different barcoded primers (F test, P = 0.029), which indicates that the barcode sequence did affect the detection of some peaks (Fig. 1B).
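The comparison above rests on average pairwise Euclidean distances between T-RFLP profiles, computed on relative abundances and on presence/absence. A minimal Python sketch of that calculation follows. The peak heights are hypothetical, and the normalization and peak-calling rules used in the study are described only in the supplemental material, so the simple choices here (divide by the profile total; any nonzero peak counts as present) are assumptions.

```python
from itertools import combinations
from math import sqrt


def euclidean(p, q):
    """Euclidean distance between two equal-length profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_pairwise_distance(profiles):
    """Average Euclidean distance over all unordered pairs of profiles."""
    pairs = list(combinations(profiles, 2))
    return sum(euclidean(p, q) for p, q in pairs) / len(pairs)


def to_relative(peaks):
    """Convert raw peak heights to relative abundances (assumed normalization)."""
    total = sum(peaks)
    return [x / total for x in peaks]


def to_presence(peaks):
    """Convert raw peak heights to presence/absence (assumed threshold: > 0)."""
    return [1 if x > 0 else 0 for x in peaks]


# Hypothetical T-RF peak heights for three profiles (one column per T-RF).
raw = [[50, 30, 20, 0], [45, 35, 20, 0], [40, 30, 20, 10]]
rel_dist = mean_pairwise_distance([to_relative(p) for p in raw])
pa_dist = mean_pairwise_distance([to_presence(p) for p in raw])
```

Computing this value separately for each group of profiles (same barcode, replicate extractions, different barcodes) gives one bar per group of the kind plotted in Fig. 1A and B.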

Fig. 1.

Barcoded pyrosequencing primers affect reproducibility of community profiles obtained via T-RFLP (A and B) or 454 sequencing (C and D). All T-RFLP experiments were performed in triplicate. (A and B) Average pairwise Euclidean distances of T-RFLP profiles are shown, as measured by T-RF relative abundances (A) and T-RF presence/absence (B). From left to right, the bars show comparisons made using T-RFLP replicates obtained from application of a single barcoded primer for bcPCR using DNA from a single extraction, T-RFLP replicates obtained from application of a single barcoded primer for bcPCR using DNA extracted separately for each replicate from the same homogenized gut sample, and T-RFLP profiles obtained after amplification of DNA from a single extraction using a mixture of 11 randomly chosen barcoded PCR primers with either 1-step or 2-step bcPCR. (C and D) Average pairwise community similarities from 454 sequencing libraries prepared from the same DNA extraction (also DNA from the mouse gut lumen, but different extraction than that used for panels A and B) using 16 barcoded primers with either 1-step or 2-step bcPCR are compared. Bray-Curtis (C) and unweighted UniFrac (D) distances are shown. Error bars indicate standard deviations, and asterisks indicate statistical significance at P values of <0.05 (*) and <0.001 (***).

In order to reduce the variability associated with different barcoded primers, we reasoned that the presence of the overhanging pyrosequencing adapter and barcode region should be minimized during amplification. We therefore implemented a 2-step PCR procedure in which conventional PCR primers amplify the template to the desired yield in the first step, and a dilution of the amplicons from this first step then serves as a template in a successive low-cycle-number amplification using the appropriate barcoded primers (see Fig. S1 in the supplemental material). This 2-step protocol is similar to “reconditioning PCR” and therefore may be expected to have the additional benefit of reducing heteroduplex formation in mixed-template reactions (19), although in the present study we did not observe a significant effect of PCR procedure on the percentage of 454 pyrosequencing reads (see below) detected as chimeras by Chimera Slayer (6) (9.0 ± 2.3%, n = 22, P = 0.25). This protocol, which we refer to as “2-step bcPCR” to distinguish it from standard “1-step” bcPCR, produces barcoded amplicons that can be directly used for pyrosequencing.

To test this approach, we performed 20 cycles of amplification with conventional PCR primers and then used 1 μl of the PCR product of the first reaction (1:50 dilution) as the template for a 5-cycle amplification with barcoded primers. Compared to the 1-step bcPCR, the 2-step bcPCR protocol indeed significantly improved the reproducibility of T-RFLP profiles obtained after use of the same 11 different barcoded primers with the same DNA extract (t test, P < 0.0001). The 2-step T-RFLP profiles were also more similar to one another than were the profiles obtained by 1-step bcPCR with a single barcoded primer from replicate DNA extractions of the same homogenized sample (P < 0.0001) (Fig. 1A). However, the profiles from the 2-step bcPCR were still slightly less reproducible than those obtained with 1-step bcPCR using a single barcoded primer. The reason for this remaining minor bias introduced by barcoded primers even in the 2-step bcPCR is unknown, but it is unlikely to be connected with interactions between the barcode and the template. This is because the first step of amplification produces amplicons removed from their genomic context, and therefore in the second step of amplification, a template with neighboring sequence regions should no longer be present at relevant concentrations. Amplification using barcoded primers in both steps of the 2-step protocol confirmed that the presence of the barcoded primer, rather than the lack of a reconditioning step (19), was responsible for the reduced reproducibility of the 1-step bcPCR T-RFLP profiles (see Fig. S2 in the supplemental material).

To test whether these results could be reproduced with sequencing data, we performed pyrosequencing with 16 different barcodes using either 1-step or 2-step bcPCR. Of the 16 barcodes, 6 were tested with both methods in order to make paired comparisons, and the other 10 were each tested with only one of the PCR methods (5 for each method). The pyrosequencing data confirmed the T-RFLP result that 2-step bcPCR improves reproducibility, as measured by community similarity assessments with Bray-Curtis distance as well as unweighted UniFrac distance (Fig. 1C and D).
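For readers unfamiliar with the first of these measures, the following Python sketch computes the Bray-Curtis dissimilarity between two libraries from their per-taxon read counts. The counts are hypothetical, and whether the libraries in this study were normalized or subsampled before comparison is not stated here, so the sketch operates on whatever abundance vectors it is given. Unweighted UniFrac additionally requires a phylogenetic tree and is not sketched.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two equal-length abundance vectors.

    Defined as sum(|u_i - v_i|) / sum(u_i + v_i); 0 means identical
    composition, 1 means no shared taxa.
    """
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    if denominator == 0:
        raise ValueError("both vectors are empty; dissimilarity is undefined")
    return numerator / denominator


# Hypothetical read counts for three taxa in two libraries.
library_1 = [10, 0, 5]
library_2 = [5, 5, 5]
distance = bray_curtis(library_1, library_2)
```

Averaging this value over all pairs of libraries prepared with the same PCR method gives a single reproducibility figure per method, analogous to the bars in Fig. 1C.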

A widely used approach for 16S rRNA gene surveys is to classify sequences as belonging to specific taxa based on reference databases and compare their relative abundances (3, 18). The average relative representations of abundant taxa (>1% on average, classified at the family level) in the 1-step and 2-step bcPCR pyrosequence data sets were similar (Fig. 2A). However, we found that 2-step bcPCR reduced the relative standard deviation of relative abundance data for abundant families (Fig. 2A). Comparison of 6 barcoded primers evaluated using both methods revealed that 1-step bcPCR yielded reduced species richness, evenness, and phylogenetic distance (UniFrac tree branch length) (Fig. 2B), indicating that 2-step bcPCR recovers some sequence diversity missed by 1-step bcPCR. The extra diversity recovered by 2-step bcPCR shared similarity with high-quality 16S rRNA sequences in the SILVA database (SSU r106 Ref) (15) (mean sequence similarity, 91%) (see Fig. S3 in the supplemental material) and included two reads with 100% similarity to sequences in the database that had been recovered from rat feces, which indicates that the extra diversity is real and not a methodological artifact.
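The relative standard deviation used above to compare the two methods can be sketched as follows. The family names and relative abundances are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not specify which estimator was applied.

```python
from statistics import mean, stdev


def relative_std_dev(values):
    """Relative standard deviation in percent: 100 * sample SD / mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; relative standard deviation is undefined")
    return 100.0 * stdev(values) / m


# Hypothetical relative abundances of one family across three barcodes.
one_step = [0.20, 0.30, 0.40]
two_step = [0.28, 0.30, 0.32]

rsd_one_step = relative_std_dev(one_step)
rsd_two_step = relative_std_dev(two_step)
```

In this invented example both methods give the same mean abundance (0.30), but the spread across barcodes is much smaller for the second list, which is the pattern reported for abundant families in Fig. 2A.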

Fig. 2.

The bcPCR method affects alpha diversity and reproducibility of taxonomic classification. (A) Comparison of 11 randomly selected barcoded primers (6 used for both bcPCR methods). Relative abundance for each taxon (family level) present on average at ≥1% is plotted on a heatmap for each barcode used. Average relative abundance and relative standard deviation are listed for each method and taxon. (B) A box plot of the paired difference for several alpha diversity metrics for operational taxonomic units (OTUs) is shown for 1-step and 2-step bcPCR using an identical set of 6 randomly selected barcoded primers (for details about metrics, see the supplemental material). The dashed line indicates a difference of zero.
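The paired comparison in panel B can be sketched in Python as below. The exact metrics used are given only in the supplemental material; this sketch assumes observed richness, the Shannon index, and Pielou's evenness as common stand-ins, and it takes the paired difference as 2-step minus 1-step, which is also an assumption. The barcode labels and OTU counts are hypothetical.

```python
from math import log


def richness(counts):
    """Number of OTUs observed at least once."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon diversity index H = -sum(p_i * ln p_i) over observed OTUs."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Pielou's evenness J = H / ln(S); defined as 0.0 when fewer than 2 OTUs."""
    s = richness(counts)
    return shannon(counts) / log(s) if s > 1 else 0.0


def paired_differences(one_step, two_step, metric):
    """Per-barcode difference (2-step minus 1-step) for barcodes used in both."""
    return {
        barcode: metric(two_step[barcode]) - metric(one_step[barcode])
        for barcode in one_step
        if barcode in two_step
    }


# Hypothetical OTU count vectors for two barcodes used with both methods.
one_step = {"bc01": [90, 10, 0, 0], "bc02": [80, 20, 0, 0]}
two_step = {"bc01": [70, 20, 5, 5], "bc02": [60, 20, 10, 10]}

richness_diff = paired_differences(one_step, two_step, richness)
evenness_diff = paired_differences(one_step, two_step, pielou_evenness)
```

Pairing by barcode removes barcode-to-barcode variation from the comparison, so a set of differences that lies consistently on one side of zero points to an effect of the PCR method itself.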

We explored whether any of the variation observed with different barcodes could be explained by known or predictable characteristics of the different barcoded oligonucleotides, but community structure was not determined by in silico folding stability, homodimer or heterodimer formation potential, or the identity of the nucleotide base on the 3′ end of the barcode (the base proximal to the template-specific PCR primer sequence) (perMANOVA, P > 0.05), and GC content was identical for all barcodes. This leads us to conclude that the bcPCR bias cannot be predicted by in silico secondary structure evaluation of the primer but is likely driven by selective or stochastic amplification caused by currently unknown template and barcoded primer interactions. While the present study evaluated the effect of varying the barcode region, the sequencing adapters would also be expected to contribute to selective amplification in bcPCR. This raises a possible concern for comparability of studies across different sequencing platforms as well as sequencing chemistries that use different adapters on the same platform.

The T-RFLP and pyrosequencing data clearly demonstrate that barcoded primers introduce biases in PCR that translate into less reproducible data sets. We have devised and evaluated a modified 2-step amplification procedure that mitigates this bias and outperforms the standard 1-step protocol. This modification can be easily incorporated into existing protocols and should be a valuable contribution to the production of high-quality multiplex amplicon libraries for high-throughput sequencing.

Acknowledgments

We thank Sebastian Lücker for design of bacterial primers, Holger Daims for helpful discussions, Christian Baranyi for technical assistance, and the Norwegian High-Throughput Sequencing Centre for pyrosequencing.

This work was financially supported by the Austrian Science Fund (P20185-B17 to A.L.) and the Austrian Federal Ministry of Science and Research (GEN-AU III InflammoBiota to D.B., M.W., and A.L.).

Footnotes

Supplemental material for this article may be found at http://aem.asm.org/.

Published ahead of print on 2 September 2011.

REFERENCES

1. Amend A. S., Seifert K. A., Bruns T. D. 2010. Quantifying microbial communities with 454 pyrosequencing: does read abundance count? Mol. Ecol. 19:5555–5565.
2. Binladen J., et al. 2007. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing. PLoS One 2:e197.
3. Caporaso J. G., et al. 2010. QIIME allows analysis of high-throughput community sequencing data. Nat. Methods 7:335–336.
4. Engelbrektson A., et al. 2010. Experimental factors affecting PCR-based estimates of microbial species richness and evenness. ISME J. 4:642–647.
5. Gomez-Alvarez V., Teal T. K., Schmidt T. M. 2009. Systematic artifacts in metagenomes from complex microbial communities. ISME J. 3:1314–1317.
6. Haas B. J., et al. 2011. Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Res. 21:494–504.
7. Hamady M., Walker J. J., Harris J. K., Gold N. J., Knight R. 2008. Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex. Nat. Methods 5:235–237.
8. Hoffmann C., et al. 2007. DNA bar coding and pyrosequencing to identify rare HIV drug resistance mutations. Nucleic Acids Res. 35:e91.
9. Huse S. M., Welch D. M., Morrison H. G., Sogin M. L. 2010. Ironing out the wrinkles in the rare biosphere through improved OTU clustering. Environ. Microbiol. 12:1889–1898.
10. Kunin V., Engelbrektson A., Ochman H., Hugenholtz P. 2010. Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environ. Microbiol. 12:118–123.
11. Liu Z., Lozupone C., Hamady M., Bushman F. D., Knight R. 2007. Short pyrosequencing reads suffice for accurate microbial community analysis. Nucleic Acids Res. 35:e120.
12. Margulies M., et al. 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437:376–380.
13. Meyer M., Stenzel U., Hofreiter M. 2008. Parallel tagged sequencing on the 454 platform. Nat. Protoc. 3:267–278.
14. Parameswaran P., et al. 2007. A pyrosequencing-tailored nucleotide barcode design unveils opportunities for large-scale sample multiplexing. Nucleic Acids Res. 35:e130.
15. Pruesse E., et al. 2007. SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res. 35:7188.
16. Quince C., et al. 2009. Accurate determination of microbial diversity from 454 pyrosequencing data. Nat. Methods 6:639–641.
17. Schloss P. D. 2010. The effects of alignment quality, distance calculation method, sequence filtering, and region on the analysis of 16S rRNA gene-based studies. PLoS Comput. Biol. 6:e1000844.
18. Schloss P. D., et al. 2009. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl. Environ. Microbiol. 75:7537–7541.
19. Thompson J. R., Marcelino L. A., Polz M. F. 2002. Heteroduplexes in mixed-template amplifications: formation, consequence and elimination by “reconditioning PCR.” Nucleic Acids Res. 30:2083–2088.
20. Zhou J., et al. 2011. Reproducibility and quantitation of amplicon sequencing-based detection. ISME J. 5:1303–1313.

Articles from Applied and Environmental Microbiology are provided here courtesy of American Society for Microbiology (ASM)
