PrimerProspector: de novo design and taxonomic analysis of barcoded polymerase chain reaction primers

William A Walters; J Gregory Caporaso; Christian L Lauber; Donna Berg-Lyons; Noah Fierer; Rob Knight

doi:10.1093/bioinformatics/btr087

Bioinformatics. 2011 Feb 23;27(8):1159–1161.


PMCID: PMC3072552 PMID: 21349862

Abstract

Motivation: PCR amplification of DNA is a key preliminary step in many applications of high-throughput sequencing technologies, yet design of novel barcoded primers and taxonomic analysis of novel or existing primers remains a challenging task.

Results: PrimerProspector is an open-source software package that allows researchers to develop new primers from collections of sequences and to evaluate existing primers in the context of taxonomic data.

Availability: PrimerProspector is open-source software available at http://pprospector.sourceforge.net

Contact: rob.knight@colorado.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

1 INTRODUCTION

Using next-generation sequencing methods to characterize hundreds of samples simultaneously in a single sequencing run has revolutionized microbial ecology (Tringe and Hugenholtz, 2008). However, primer design for such studies remains challenging. The primers must amplify an appropriate region of DNA that is the right length for sequencing and also taxonomically informative (Liu et al., 2008; Wang et al., 2007); a linker that is not complementary to the target in any one of many diverse species must be inserted before the barcode to avoid differential amplification (Hamady et al., 2008); and the set of barcodes must be checked to avoid formation of secondary structure within or between primers (i.e. primer-dimers) or between the barcodes and the primers. Additionally, the techniques need to be generic rather than tied to one taxonomic outline or database, so that many different target genes can be studied.

Here we present PrimerProspector, an open-source software package for primer design and analysis built using the PyCogent toolkit (Knight et al., 2007), that resolves these issues. We recently applied PrimerProspector to identify the 16S rRNA 515f/806r primer pair as nearly universal to archaea and bacteria, and to optimize this primer pair for increased sensitivity across these domains. This optimized primer pair, applied successfully in several recent studies (Bates et al., 2010; Caporaso et al., 2010; G.Bergmann et al., manuscript in preparation), has provided novel insight into archaeal and bacterial community membership in soils by allowing for more accurate determination of the abundances of taxa missed by many commonly used canonical primer pairs, e.g. the Verrucomicrobia.

No existing tools specifically address the issues associated with designing barcoded polymerase chain reaction (PCR) primers for community analysis. Primer design is a large field and we cannot survey it comprehensively in this article, but among a selection of related tools, Primer Validator (http://bioinfo.unice.fr/454) allows taxonomic assessment but neither generates de novo primers nor offers a customizable 3′-weighted scoring system to predict successful amplification of tested primers. BarCrawl (Frank, 2009) allows design of barcodes for specified PCR primers but not design of the primers themselves, so it is a useful complement to PrimerProspector. RDP's Probe Match (Cole et al., 2005) reports sequences matching a probe, as does Greengenes' probe function (DeSantis et al., 2006), but these tools are tied to their respective 16S rRNA databases and do not support barcodes. Primrose and OligoCheck (Ashelford et al., 2002) are useful for small numbers of target sequences, but do not scale well to the thousands or tens of thousands of sequences needed when designing universal or near-universal primers, and do not incorporate differential weighting of 5′ and 3′ bases in primer scoring. Primer BLAST uses Primer3 software to build primers of a specified length against one target sequence, and then BLASTs the results against other databases to ensure that putative primers do not target BLAST hits. This functionality is also a useful complement to that provided in PrimerProspector.

While applications of PrimerProspector to date have focused on SSU rRNA primer design, PrimerProspector can be used for any nucleic acid sequences and allows users to design de novo primers based upon arbitrary multiple sequence alignments. User-specifiable design parameters include primer length, degeneracy and targeted regions for generation of primers. Existing or de novo primers can be analyzed for predicted taxonomic coverage, as shown in Figure 1. Finally, common pitfalls in primer design can be identified, such as likely barcode-primer secondary structure, regions susceptible to primer dimerization and disparate GC content between primer pairs. Convenient reports show amplicons or simulated reads that cover regions of sequences that are not phylogenetically informative or are of unsuitable lengths for sequencing.
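Two of the primer properties mentioned above, degeneracy and GC content, can be illustrated with a short Python sketch. This is not PrimerProspector's own code: the function names degeneracy and gc_range are hypothetical, and the only outside fact relied on is the standard IUPAC nucleotide ambiguity table.

```python
# Illustrative sketch only; not taken from PrimerProspector.
# Standard IUPAC nucleotide codes mapped to the bases they can represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct non-degenerate sequences a primer encodes."""
    total = 1
    for base in primer.upper():
        total *= len(IUPAC[base])
    return total


def gc_range(primer):
    """(min, max) GC fraction over all expansions of a degenerate primer."""
    lo = hi = 0
    for base in primer.upper():
        is_gc = [b in "GC" for b in IUPAC[base]]
        lo += all(is_gc)  # counts only if every possible base is G or C
        hi += any(is_gc)  # counts if at least one possible base is G or C
    n = len(primer)
    return lo / n, hi / n


if __name__ == "__main__":
    # A made-up degenerate primer, used purely for demonstration.
    print(degeneracy("ACGTRYN"), gc_range("ACGTRYN"))
```

Comparing the GC ranges of a forward and a reverse primer in this way gives a rough indication of whether a pair has disparate GC content.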

Fig. 1. Taxonomic coverage summary of the 515f/806r 16S SSU rRNA primer pair at the phylum level for (A) archaea, (B) eukarya and (C) bacteria. The y-axes represent percent coverage and the value on top of each bar is the total number of reference sequences in each taxon. In this analysis, the reference sequences were derived from the Silva database, and filtered at 97% sequence identity with uclust (Edgar, 2010). Archaeal and bacterial sequences shorter than 1450 bases, and eukaryotic sequences shorter than 1800 bases, were excluded from the reference set. As illustrated, this primer pair is nearly universal for archaeal and bacterial 16S but is generally poor for eukaryotic (notably metazoan) 18S sequences. This plot and additional PrimerProspector analyses informed the decision to use this primer pair in Caporaso et al. (2010), Bates et al. (2010) and G.Bergmann et al. (2010). Comparisons with the unoptimized primer pair and with an alternative popular pair (27f/338r) are shown as Supplementary Figures S1 and S2, respectively.

2 METHODS

De novo design of primers is performed by finding short conserved sequences in a given multiple sequence alignment to act as a 3′ binding site for new primers. Once these sites have been identified, full-length forward or reverse de novo primers are generated by incorporating the N upstream or downstream bases, where N is 15 by default. De novo full-length primers can then be sorted according to sensitivity, specificity or degeneracy, and compared with known primers to find matches or significant overlap. Specificity for particular target groups, such as archaea, can be obtained by supplying an optional alignment of sequences from which to exclude matches.
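The idea of locating conserved 3′ sites and extending them upstream can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the PrimerProspector implementation: it requires exact k-mer matches within gap-free alignment windows, and the names conserved_sites, forward_primer, k and min_fraction are hypothetical. Only the default of 15 upstream bases comes from the text.

```python
from collections import Counter


def conserved_sites(alignment, k=5, min_fraction=0.6):
    """Return (start, kmer, fraction) for each alignment window in which a
    single gap-free k-mer is shared by at least min_fraction of sequences."""
    n_seqs = len(alignment)
    length = len(alignment[0])
    hits = []
    for start in range(length - k + 1):
        windows = [seq[start:start + k] for seq in alignment]
        counts = Counter(w for w in windows if "-" not in w)
        if not counts:
            continue  # every sequence has a gap in this window
        kmer, count = counts.most_common(1)[0]
        fraction = count / n_seqs
        if fraction >= min_fraction:
            hits.append((start, kmer, fraction))
    return hits


def forward_primer(aligned_seq, site_start, k, n_upstream=15):
    """Build a forward primer from one aligned sequence: up to n_upstream
    ungapped bases before the site, followed by the conserved site itself."""
    upstream = aligned_seq[:site_start].replace("-", "")
    prefix = upstream[-n_upstream:] if n_upstream > 0 else ""
    return prefix + aligned_seq[site_start:site_start + k]


if __name__ == "__main__":
    # Toy alignment, invented for demonstration.
    aln = ["ACGTACGGTA", "ACGTACGGTA", "TCGTACGGTT"]
    for start, kmer, frac in conserved_sites(aln, k=5, min_fraction=1.0):
        print(start, kmer, frac, forward_primer(aln[0], start, 5, n_upstream=3))
```

A reverse primer would be built analogously from the downstream bases and then reverse-complemented; that step is omitted here for brevity.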

Primer analyses, including the prediction of taxonomic coverage, rely upon scoring primers against target sequences. To predict its taxonomic coverage, a primer is locally aligned to full-length target sequences with known taxonomies, and scored based on gap, 3′ mismatch and non-3′ mismatch counts. An example of the graphical output is provided in Supplementary Figure S3. By default, the final five bases are treated as the 3′ region, which is considered the most important for PCR amplification. The scoring scheme is parameterizable. The RDP Classifier (Wang et al., 2007) is used to classify the resulting sequence fragments, and the accuracy is displayed both in terms of which taxa are amplified and in terms of classification level of the resulting fragments. PrimerProspector supports retraining of the RDP Classifier for taxonomic coverage analysis based on different reference taxonomies.
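A minimal Python sketch of a 3′-weighted mismatch score is given below. It makes several simplifying assumptions that are not from the source: the primer is slid along the target without gaps rather than locally aligned, so no gap penalty appears, and the penalty values 1.0 and 0.4 are placeholders chosen for illustration. The function name score_primer and its parameters are hypothetical; only the five-base 3′ region default comes from the text.

```python
# Illustrative sketch only; an ungapped scan stands in for local alignment.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def score_primer(primer, target, three_prime_len=5,
                 three_prime_penalty=1.0, other_penalty=0.4):
    """Find the best ungapped placement of primer on target.

    Returns (score, position, n_3prime_mismatches, n_other_mismatches),
    where a lower score is better, or None if the target is shorter than
    the primer. Penalty values are placeholders, not documented defaults.
    """
    primer = primer.upper()
    target = target.upper()
    n = len(primer)
    boundary = n - three_prime_len  # first index of the 3' region
    best = None
    for pos in range(len(target) - n + 1):
        mm3 = mm_other = 0
        for i, base in enumerate(primer):
            # A degenerate primer base matches any base it can represent.
            if target[pos + i] not in IUPAC[base]:
                if i >= boundary:
                    mm3 += 1
                else:
                    mm_other += 1
        score = mm3 * three_prime_penalty + mm_other * other_penalty
        if best is None or score < best[0]:
            best = (score, pos, mm3, mm_other)
    return best


if __name__ == "__main__":
    # Made-up primer and target, for demonstration only.
    print(score_primer("ACGTRCCGTA", "TTACGTGCCGTCTT"))
```

Weighting 3′ mismatches more heavily than the rest reflects the statement above that the 3′ region matters most for amplification; per-taxon coverage could then be estimated as the fraction of reference sequences in each taxon whose best score falls below a chosen threshold.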

Descriptions of the scripts included in PrimerProspector, the various outputs generated by PrimerProspector and an example based on the 515f/806r primer pair are included in the online documentation at http://pprospector.sourceforge.net/.

3 CONCLUSIONS

PCR amplification continues to be a key step in many high-throughput sequencing applications such as barcoded marker gene-based microbial community analyses. PrimerProspector represents a significant advance over prior work in this area by providing a single tool to facilitate primer design and analysis, including support for barcodes (and associated linkers). PrimerProspector is a fast and extensible framework for primer design and analysis, and has already been successfully applied to help researchers identify the most relevant and useful primers for their application, starting with multiple sequence alignments for any nucleic acid sequence.

Funding: Bill and Melinda Gates Foundation; Crohn's and Colitis Foundation of America; Howard Hughes Medical Institute; National Institutes of Health Signaling and Cell Cycle Regulation Training Grant (T32GM008759), in part.

Conflict of Interest: none declared.


REFERENCES

Ashelford K.E., et al. PRIMROSE: a computer program for generating and estimating the phylogenetic range of 16S rRNA oligonucleotide probes and primers in conjunction with the RDP-II database. Nucleic Acids Res. 2002;30:3481–3489. doi: 10.1093/nar/gkf450.
Bates S.T., et al. Examining the global distribution of dominant archaeal populations in soil. ISME J. 2010. doi: 10.1038/ismej.2010.171. [Epub ahead of print]
Caporaso J.G., et al. Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample. Proc. Natl Acad. Sci. USA. 2010. doi: 10.1073/pnas.1000080107. [Epub ahead of print]
Cole J.R., et al. The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis. Nucleic Acids Res. 2005;33:D294–D296. doi: 10.1093/nar/gki038.
DeSantis T.Z., et al. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl. Environ. Microbiol. 2006;72:5069–5072. doi: 10.1128/AEM.03006-05.
Edgar R.C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010;26:2460–2461. doi: 10.1093/bioinformatics/btq461.
Frank D.N. BARCRAWL and BARTAB: software tools for the design and implementation of barcoded primers for highly multiplexed DNA sequencing. BMC Bioinformatics. 2009;10:362. doi: 10.1186/1471-2105-10-362.
Hamady M., et al. Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex. Nat. Methods. 2008;5:235–237. doi: 10.1038/nmeth.1184.
Knight R., et al. PyCogent: a toolkit for making sense from sequence. Genome Biol. 2007;8:R171. doi: 10.1186/gb-2007-8-8-r171.
Liu Z., et al. Accurate taxonomy assignments from 16S rRNA sequences produced by highly parallel pyrosequencers. Nucleic Acids Res. 2008;36:e120. doi: 10.1093/nar/gkn491.
Tringe S.G., Hugenholtz P. A renaissance for the pioneering 16S rRNA gene. Curr. Opin. Microbiol. 2008;11:442–446. doi: 10.1016/j.mib.2008.09.011.
Wang Q., et al. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl. Environ. Microbiol. 2007;73:5261–5267. doi: 10.1128/AEM.00062-07.
