Video abstract
A video abstract by the authors of this paper is available.
Keywords: PCR primer prediction, degenerate primers, alignment, consensus sequence, homologous genes, paralogous genes
Abstract
The PCR amplification of unknown homologous or paralogous genes generally relies on PCR primers predicted from multiple sequence alignments. However, increasing sequence divergence can necessitate degenerate primers, which raises the problem of testing their characteristics, unwanted interactions and potential mispriming. Here I introduce easyPAC, new software for the prediction of degenerate primers from multiple sequence alignments or single consensus sequences. As a major innovation, easyPAC allows all customary primer test procedures to be applied to degenerate primer sequences, including fast mapping to reference files. Thus, easyPAC considerably simplifies and speeds up the design of specific degenerate primers. Degenerate primers suggested by easyPAC were used for PCR amplification and subsequent de novo sequencing of TDRD1 exon 11 homologs from several representatives of the haplorrhine primate phylogeny. The results demonstrate the efficient performance of the suggested primers and therefore show that easyPAC can advance upcoming comparative genetic studies.
Introduction
In typical population genetic and comparative studies of genes and genomes, the amplification of homologous genes, especially from non-model organisms, is based on PCR primers deduced from multi-species alignments of available sequences or from a corresponding consensus sequence. This approach is used in studies that aim to reveal the selective regimes acting on certain genes within a defined phylogeny.1,2
Moreover, the amplification of paralogous genes within one species, such as the olfactory receptor gene family in humans or the low-molecular-weight glutenin subunit gene family in bread wheat,3,4 is likewise based on multiple sequence alignments. In summary, conserved regions in alignments of available sequences serve as templates for the prediction of primers that are used to amplify unknown sequences (Fig. 1).
Figure 1.
Methodology of primer design.
Notes: Alignments include available sequences (blue) and serve as template for the prediction of primers that are used to PCR-amplify unknown sequences (red). This methodology can be applied for the amplification of homologous genes within a species tree (left side) as well as for paralogous genes within one species (right side).
However, with a growing number of sequences or an increasing degree of sequence divergence, even conserved regions within an alignment may exhibit sequence variants or unwanted sequence traits, so that PCR amplification depends on degenerate primers to ensure coverage of all sequences. An increasing level of degeneracy in primer sequences in turn raises the probability of mispriming, especially given the potential presence of paralogous genes, pseudogenes or multicopy transposable elements (TEs). On top of that, testing degenerate primers is cumbersome since it requires separate testing of each possible primer sequence. This makes it harder to rule out mispriming and complicates both the control of primer characteristics and the assembly of proper degenerate primer pairs.
Today, sophisticated tools for primer prediction (eg, Primer3Plus and OligoFaktory) provide a comprehensive range of functions, including the option to check primer specificity by searching databases with the primer sequences.5,6 However, they only support the prediction of nondegenerate primers from single nondegenerate input sequences. To overcome this shortcoming, several tools have been developed to predict degenerate primers from multiple sequence alignments, the most popular being GeneFisher2, CODEHOP, HYDEN, PriFi, Primaclade and Greene SCPrimer.7–14 However, these approaches cannot match the functionality of the first-mentioned tools. Besides more or less extensive restrictions in functional range, they do not support a comparison with reference databases to ensure specificity. Table 1 lists a selection of available parameter settings for popular primer prediction tools. Note that some of these tools were developed to address special biological issues and may offer many features not listed in the table. I developed a freely available primer pair prediction software, easyPAC (easy Primer prediction from Alignments and Consensus sequences), that for the first time combines all features and routines required to design specific degenerate primers. These primers pass all commonly applied primer and primer pair test procedures and are optionally mapped against an arbitrary number of user-defined reference files, which may contain paralogous genes, TE sequences or even whole genomes, to ensure primer specificity.
Table 1.
Comparison of available parameter settings for different popular primer prediction tools.
| Primer prediction tool | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GeneFisher2 | |||||||||||||||||
| CODEHOP | |||||||||||||||||
| HYDEN | |||||||||||||||||
| OligoFaktory | |||||||||||||||||
| PriFi | |||||||||||||||||
| Primaclade | |||||||||||||||||
| Primer3Plus | |||||||||||||||||
| SCPrimer | |||||||||||||||||
| easyPAC | |||||||||||||||||
Notes: Green dots indicate that the respective parameter can be adjusted by the user. Red dots indicate that the user has no control over the parameter. Yellow dots indicate restricted control (eg, CODEHOP allows only the primer concentration to be adjusted, not the Na+ or Mg2+ concentration). Column key: 1, single nondegenerate sequence; 2, alignment or consensus sequence; 3, Na+, Mg2+ and primer concentration; 4, predefine primer; 5, define PCR target; 6, exclude regions; 7, primer length; 8, GC content; 9, minimum/maximum Tm; 10, maximum ΔTm; 11, maximum degeneracy; 12, define primer 3′-end; 13, maximum poly-X; 14, maximum poly-XY; 15, maximum 3′-stability; 16, maximum 3′-complementarity; 17, map primers to databases.
Algorithm and Implementation
The accepted input format for the alignment and reference file(s) is FASTA. For each sequence the full IUPAC code is accepted. Instead of a multiple sequence alignment, the program alternatively accepts one degenerate sequence (consensus sequence). Users have full control over all customary primer and primer-pair parameters such as primer length (default: 18 nt–28 nt), Tm (default: 52 °C–65 °C), GC content (default: 20%–80%), maximum 3′-end complementarity (default: 3), maximum 3′-end stability (default: 6), maximum number of poly-X/poly-XY (default: 4/3) and the maximum degeneracy (default: 64), which is measured by the number of possible sequence combinations of a degenerate primer and is referred to in the following as the ambiguity factor (AF). Users can also declare an arbitrary number of regions to be excluded from the primer search and optionally use a predefined forward or reverse primer to find an adequate reverse or forward primer, respectively.
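For orientation, the default parameter values listed above can be collected in a small settings object. The following Python sketch is purely illustrative: easyPAC itself is a Perl program, and the class name `PrimerSettings` and all field names are hypothetical, chosen here only to mirror the defaults given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimerSettings:
    """Hypothetical container for the defaults named in the text."""
    min_length: int = 18                 # nt
    max_length: int = 28                 # nt
    min_tm: float = 52.0                 # degrees Celsius
    max_tm: float = 65.0                 # degrees Celsius
    min_gc: float = 20.0                 # percent
    max_gc: float = 80.0                 # percent
    max_3prime_complementarity: int = 3
    max_3prime_stability: int = 6
    max_poly_x: int = 4
    max_poly_xy: int = 3
    max_ambiguity_factor: int = 64       # maximum degeneracy (AF)


# Defaults as listed in the text, and a variant with a lower AF limit.
defaults = PrimerSettings()
stricter = PrimerSettings(max_ambiguity_factor=16)
```

Making the object immutable (`frozen=True`) is a design choice of this sketch so that a single search run cannot change its own thresholds halfway through.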
In essence, the easyPAC algorithm first assembles all possible primer candidates and then performs a series of tests, starting with computationally cheap ones so that many candidates have already been rejected by the time easyPAC performs tests that require more computational power.
The workflow starts with reading the alignment, from which a consensus sequence is created (this step is omitted if the user provides a single consensus sequence). Since easyPAC subsequently processes only the consensus sequence, the number of sequences in the input alignment is not limited and has virtually no influence on computation time. The consensus sequence is then split into all possible subsequences ranging from the user-defined minimum to maximum primer size. Sequences that are located within the declared target or excluded regions, or that are too far from the target to yield a PCR product of proper size, are rejected. Then easyPAC computes the AF for each of the remaining sequences (starting from 1, the value is multiplied by 2, 3 or 4 for every occurrence of R/Y/S/W/K/M, B/D/H/V or N, respectively, within the primer sequence), and primers with an improper AF are rejected.
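The window enumeration and the AF rule described above can be sketched in a few lines of Python. This is a minimal illustration and not the easyPAC source (which is written in Perl); the function names are hypothetical, and the filtering by target position and excluded regions is left out for brevity.

```python
# Number of bases each IUPAC code stands for (the multipliers named in the text).
IUPAC_DEGENERACY = {
    **dict.fromkeys("ACGT", 1),
    **dict.fromkeys("RYSWKM", 2),
    **dict.fromkeys("BDHV", 3),
    "N": 4,
}


def ambiguity_factor(primer: str) -> int:
    """Multiply the per-position degeneracies, starting from 1."""
    af = 1
    for base in primer.upper():
        af *= IUPAC_DEGENERACY[base]
    return af


def candidate_windows(consensus: str, min_len: int = 18, max_len: int = 28):
    """Yield (start, subsequence) for every window of an allowed length."""
    for length in range(min_len, max_len + 1):
        for start in range(len(consensus) - length + 1):
            yield start, consensus[start:start + length]


def filter_by_af(consensus: str, max_af: int = 64,
                 min_len: int = 18, max_len: int = 28):
    """Keep only windows whose ambiguity factor does not exceed max_af."""
    return [(start, seq)
            for start, seq in candidate_windows(consensus, min_len, max_len)
            if ambiguity_factor(seq) <= max_af]
```

Applied to the two primers reported in the Tests of the Program section, `ambiguity_factor` gives 2 for the forward primer (one R) and 8 for the reverse primer (one M and two Y), in agreement with the stated 2-fold and 8-fold degeneracy.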
In the next steps, all primer sequences that do not have the user-defined 3′-end sequence or that exhibit too high a 3′-end stability are sorted out. For the remaining candidates, Tm (by default calculated by the base-stacking method),15 GC content, poly-X/XY content and 3′-end self-complementarity (which also avoids hairpin formation) are computed, and primers that do not meet the user-defined requirements are discarded.
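Two of the simpler checks named above, GC content and poly-X/XY content, can be sketched as follows for a single nondegenerate sequence. The sketch rests on assumptions that the text does not spell out: poly-X is taken to be the longest run of one base, poly-XY the largest number of consecutive repeats of a dinucleotide with two different bases, and the thresholds are treated as inclusive upper limits. All function names are hypothetical, and the Tm, 3′-end stability and self-complementarity tests are not covered.

```python
import re


def gc_percent(seq: str) -> float:
    """Percentage of G and C in a nondegenerate sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def longest_poly_x(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    return max((len(m.group(0)) for m in re.finditer(r"(.)\1*", seq)), default=0)


def longest_poly_xy(seq: str) -> int:
    """Largest number of consecutive repeats of a dinucleotide XY (X != Y)."""
    best = 0
    for i in range(len(seq) - 1):
        unit = seq[i:i + 2]
        if unit[0] == unit[1]:
            continue  # runs such as AA are handled by the poly-X check
        repeats, j = 1, i + 2
        while seq[j:j + 2] == unit:
            repeats += 1
            j += 2
        best = max(best, repeats)
    return best


def passes_basic_checks(seq: str, min_gc: float = 20.0, max_gc: float = 80.0,
                        max_poly_x: int = 4, max_poly_xy: int = 3) -> bool:
    """Apply the GC and poly-X/XY limits (assumed inclusive) to one sequence."""
    return (min_gc <= gc_percent(seq) <= max_gc
            and longest_poly_x(seq) <= max_poly_x
            and longest_poly_xy(seq) <= max_poly_xy)
```

For a degenerate primer, such a check would be run on each nondegenerate variant in turn, in line with the testing strategy described for easyPAC.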
If desired, easyPAC now maps the primer candidate sequences to one or more reference files (which can contain, for instance, sequences of paralogous genes and/or TE sequences) and discards those that match any sequence in the reference file in sense or antisense orientation. Alternatively, the user can select the option ‘Allow primer to match once in reference’ and supply a whole genome of the species in question. In this case, primers matching the reference more than once in sense or antisense orientation are rejected. This ensures primer specificity and obviates the need for a subsequent primer BLAST. A reference genome should not be provided if the primers are intended for the amplification of paralogous genes within one species. By default, primer mapping is performed with the SeqMap algorithm, which is very fast (eg, much faster than BLAST) because SeqMap is specifically designed to map short sequences to large references.16 However, SeqMap only maps to nondegenerate characters (ATGC) within the reference, so this option is recommended if the reference is large and contains no or very few degenerate positions. Alternatively, mapping can be performed with an internal algorithm that is notably slower but will map, eg, an A to any of the following characters: ARWMDHVN. This option is recommended if the reference is small and degenerate (eg, a consensus sequence of a large number of paralogous genes). Depending on the applied mapping algorithm, a specific number of mismatches, including insertions or deletions, can be tolerated.
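The degenerate-aware matching rule (an A in the primer matches any reference character whose IUPAC set contains A, ie, ARWMDHVN) can be illustrated with a naive scan. This is a simplified sketch and neither SeqMap nor the internal easyPAC algorithm: it only counts substitution mismatches (no insertions or deletions), it expects a nondegenerate primer variant, and the function names as well as the `allow_single_match` flag are hypothetical stand-ins for the behaviour described in the text.

```python
# Bases represented by each IUPAC character.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    """Reverse complement, valid for the full IUPAC alphabet."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_hits(primer: str, reference: str, max_mismatches: int = 0) -> int:
    """Count windows of the reference that a nondegenerate primer matches."""
    hits, n = 0, len(primer)
    for start in range(len(reference) - n + 1):
        mismatches = 0
        for p, r in zip(primer, reference[start:start + n]):
            if p not in IUPAC[r]:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:  # the window was scanned without exceeding the mismatch limit
            hits += 1
    return hits


def is_specific(primer: str, references, allow_single_match: bool = False) -> bool:
    """Reject primers that match in sense or antisense orientation.

    Note: a primer identical to its own reverse complement is counted twice.
    """
    total = 0
    for ref in references:
        ref = ref.upper()
        total += count_hits(primer, ref)
        total += count_hits(reverse_complement(primer), ref)
    return total <= (1 if allow_single_match else 0)
```

A scan like this takes time proportional to reference length times primer length, which is consistent with the recommendation above to reserve degenerate-aware mapping for small references.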
Finally, easyPAC will sort forward and reverse primer candidates by their quality and assemble proper primer pairs on the basis of maximum allowed ΔTm, maximum product size and maximum 3′-end pair complementarity. Proper primer pairs are output in order of increasing AF sum and PCR product size.
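The pairing and ordering step can be sketched as follows. The sketch makes several simplifying assumptions that are not taken from the easyPAC source: each primer carries a single representative Tm value (a degenerate primer actually spans a Tm range), positions are 0-based alignment coordinates with an exclusive end, the `max_product` default is a placeholder, and the 3′-end pair complementarity test is omitted. The `Primer` class and `assemble_pairs` function are hypothetical names.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Primer:
    seq: str
    start: int   # 0-based position of the first covered alignment column
    end: int     # exclusive end position
    tm: float    # single representative Tm in degrees Celsius (assumption)
    af: int      # ambiguity factor


def assemble_pairs(forwards, reverses, max_delta_tm=6.0, max_product=1000):
    """Combine forward and reverse candidates and sort by AF sum, then size."""
    pairs = []
    for fwd, rev in product(forwards, reverses):
        size = rev.end - fwd.start          # product length incl. both primers
        if size <= 0 or size > max_product:
            continue
        if abs(fwd.tm - rev.tm) > max_delta_tm:
            continue
        pairs.append((fwd.af + rev.af, size, fwd, rev))
    pairs.sort(key=lambda pair: (pair[0], pair[1]))
    return pairs


# Illustrative candidates with made-up coordinates and Tm values.
fwd1 = Primer("F1", 100, 124, 56.0, 2)
fwd2 = Primer("F2", 50, 70, 64.0, 1)
rev1 = Primer("R1", 574, 600, 55.0, 8)
rev2 = Primer("R2", 380, 400, 57.0, 1)
pairs = assemble_pairs([fwd1, fwd2], [rev1, rev2])
```

In this toy example both pairs involving `fwd2` are dropped because their ΔTm exceeds 6 °C, and the remaining pairs are listed with the lower AF sum first.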
One major improvement introduced by easyPAC is the possibility of testing degenerate primers. To this end, every degenerate primer candidate is expanded into the group of all its possible sequence combinations before each test procedure is applied. Every sequence within this group is then tested separately, and the degenerate primer candidate is accepted only if all of its possible sequence combinations pass the entire test procedure, including the optional mapping against a reference (Fig. 2).
Figure 2.
easyPAC work flow.
Note: Degenerate primer candidates are assembled and each possible sequence of a degenerate primer is tested separately.
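The principle of expanding a degenerate primer and accepting it only if every variant passes every test can be written compactly in Python. This is a minimal sketch of the idea, not the easyPAC implementation; `expand` and `accept_degenerate` are hypothetical names, and the tests are passed in as arbitrary functions that take one nondegenerate sequence and return True or False.

```python
from itertools import product

# Bases represented by each IUPAC character.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(primer: str):
    """Return every nondegenerate sequence encoded by a degenerate primer."""
    pools = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*pools)]


def accept_degenerate(primer: str, tests) -> bool:
    """Accept the primer only if all variants pass all supplied tests."""
    return all(test(variant) for variant in expand(primer) for test in tests)


# Example: require a G at the 3' end of every variant.
ends_with_g = lambda seq: seq.endswith("G")
print(accept_degenerate("ACRG", [ends_with_g]))  # every variant ends in G
print(accept_degenerate("ACGR", [ends_with_g]))  # the variant ACGA fails
```

The number of variants equals the ambiguity factor, so the cost of each test grows with the degeneracy of the candidate; this is one reason why a limit on the AF also limits computation time.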
In addition to the textual output, and to further improve usability, easyPAC creates a graphical output that makes the selection of primer pairs easy to follow. The graphical output contains the alignment with the corresponding consensus sequence, a color-coded indication of sequence conservation, an alignment annotation marking the target sequence, regions excluded from the primer search, simple repeats, matches to the reference and internal duplications, and finally the best primer pairs and their locations within the alignment (Fig. 3).
Figure 3.
easyPAC output.
Notes: The graphical output comprises the annotated alignment and the suggested primers within the alignment (top). The textual output contains a list of suggested primer pairs with corresponding primer details.
By default, easyPAC uses fairly lenient parameter settings to maximize the number of suggested primers, which are in any case sorted by their quality in the final output. Using more stringent settings will not necessarily produce better results but may reduce computation time, since many primer sequences will be discarded during the initial test procedures.
Tests of the Program
An alignment containing genomic DNA sequences of TDRD1 exon 11 (87 bp ± 1000 bp of adjacent intronic sequence) from Human (Homo sapiens), Chimp (Pan troglodytes), Gorilla (Gorilla gorilla), Orang-Utan (Pongo pygmaeus), Rhesus macaque (Macaca mulatta) and Common marmoset (Callithrix jacchus) was used as template to predict primers for PCR amplification of TDRD1 exon 11 in primate species where TDRD1 sequence is completely unknown.
easyPAC was executed using the default settings. The coordinates of exon 11 were used to declare the target start and target end, and a file containing primate TE sequences obtained from Repbase served as the reference file.17 In total, easyPAC considered 12384 mostly degenerate primer sequences. The number of primer candidates that passed each test is shown in Table 2.
Table 2.
Number of remaining primer candidates after each applied test.
| Working step | Fwd. primer candidates | Rev. primer candidates |
|---|---|---|
| Assemble possible sequences | 6280 | 6104 |
| Check for proper AF and distance to target | 3370 | 2685 |
| Calculate Tm and GC content | 537 | 221 |
| Check for poly X/XY motifs | 514 | 81 |
| Calculate 3′-end stability | 487 | 81 |
| Check for 3′-selfcomplementarity | 271 | 27 |
| Map sequences to reference | 256 | 21 |
| Check for multiple occurrence in alignment | 256 | 2 |
| Assembling proper primer pairs | Found 19 proper pairs |
Primer prediction was finished after 50 seconds (computation carried out on a standard desktop PC with 2.67 GHz and 4 GB RAM), and 19 primer pairs were suggested, of which the first was used for subsequent PCR amplification (Forward primer: 5′-TCAAAGGATGCTTGAGRGGATGGT-3′, Tm: 55.8 °C–57.8 °C, degeneracy: 2-fold. Reverse primer: 5′-CAGTGMTAAAGYTGYGCCTTTGTTTA-3′, Tm: 52.9 °C–58.8 °C, degeneracy: 8-fold). Figure 4 shows the degree of sequence conservation along the alignment as well as the position of exon 11 and the position of the primers suggested by easyPAC.
Figure 4.
Conservation of the alignment.
Notes: The graph indicates the fraction of positions that are conserved in every species of the alignment within a 100 bp sliding window along the alignment. Bars indicate the position of exon 11 (yellow) and the position of the primers (blue) suggested by easyPAC.
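A conservation profile of the kind described in the figure notes can be computed with a simple sliding window. The sketch below is an illustration under stated assumptions and not the code used to produce the figure: the alignment is given as equally long, uppercase strings, a column counts as conserved only if all sequences carry the same character and that character is not a gap, and the function names are hypothetical.

```python
def conserved_columns(alignment):
    """Flag columns where all sequences share the same non-gap character."""
    return [len(set(column)) == 1 and column[0] != "-"
            for column in zip(*alignment)]


def sliding_conservation(alignment, window=100):
    """Fraction of conserved columns in each window, moved one column at a time."""
    flags = conserved_columns(alignment)
    if len(flags) < window:
        return []
    count = sum(flags[:window])
    fractions = [count / window]
    for i in range(window, len(flags)):
        # Update the running count: add the entering column, drop the leaving one.
        count += flags[i] - flags[i - window]
        fractions.append(count / window)
    return fractions


# Toy alignment with one variable column; window shortened for readability.
toy = ["ACGTAC", "ACGAAC", "ACG-AC"]
profile = sliding_conservation(toy, window=3)
```

Keeping a running count avoids re-summing each window, so the profile is obtained in a single pass over the alignment columns.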
Given an alignment that covered the primate phylogeny from New World monkeys to Old World monkeys and hominoids, the aim was to PCR-amplify TDRD1 exon 11 from other representatives of these groups, including Goeldi’s marmoset (Callimico goeldii, Platyrrhini), tufted capuchin (Cebus apella, Platyrrhini), mantled guereza (Colobus guereza, Cercopithecidae) and lar gibbon (Hylobates lar, Hylobatidae). In addition, PCRs were also performed with DNA from the 6 species included in the alignment to check the performance of the degenerate primers (Fig. 5).
Figure 5.
Phylogenetic relationship of the species involved in this study.
Notes: Six species covering haplorrhine phylogeny (black) were used for the alignment that served as template for primer prediction. Primers were used to PCR-amplify TDRD1 exon 11 from four additional representatives of haplorrhine phylogeny with hitherto unknown sequence (red).
All initial PCRs yielded a specific product of approximately 500 bp in length, which corresponds to the expected size of the desired DNA fragment (Fig. 6). The PCR products were cloned and sequenced in both directions. The obtained sequences were aligned, and the TDRD1 exon 11 sequence was found in all amplicons. TDRD1 exon 11 sequences of Callimico goeldii, Cebus apella, Colobus guereza and Hylobates lar are stored at NCBI GenBank under the accession numbers JN944162–JN944165. All methods are described in more detail in the Material and Methods section.
Figure 6.
PCR amplification of TDRD1 exon 11 using genomic DNA from 10 primate species and primers suggested by easyPAC.
Note: Red species names refer to species with hitherto unknown sequence (cf. Fig. 5).
Conclusion
The software introduced here, easyPAC, can be used for the fast prediction of specific and tested PCR primers from alignments exhibiting a high degree of sequence divergence. easyPAC is the first freely available primer design software that incorporates testing and mapping of degenerate primers. It thereby simplifies primer design considerably, reducing the time required from several hours to a few minutes while maintaining high result quality.
Although exhibiting up to 8-fold degeneracy, the predicted primers were found to perform well in wet lab experiments under the conditions suggested by the software. TDRD1 exon 11 could be PCR-amplified and sequenced from all reference species and de novo sequenced from four other representative primate species. This shows that easyPAC is well suited for upcoming comparative studies of homologous and paralogous genes.
Material and Methods
The sequence of human TDRD1 exon 11 (87 bp) including 1000 bp of 3′- and 5′-flanking intronic sequence respectively was used as basis sequence for an alignment including Homo sapiens, Pan troglodytes, Gorilla gorilla, Pongo pygmaeus, Macaca mulatta and Callithrix jacchus. The alignment was downloaded from the Ensembl database (6 primates EPO). Two Alu-SINE sequences that were unique to Callithrix jacchus were deleted from the alignment leading to an alignment with a total length of 2149 bp. This alignment was used for primer prediction with easyPAC using the following settings: Primer size: 18nt–28nt, Tm: 52 °C–65 °C, maximum ΔTm: 6 °C, GC-content: 20%–80%, maximum degeneracy: 16, maximum 3′-complementarity: 3, maximum 3′-stability: 6, maximum poly-X/XY: 4/3, reference file contained primate TE sequences from Repbase.
DNA of the respective primate species was isolated from different tissues with the QIAamp DNA Mini Kit (QIAGEN). PCR of TDRD1 exon 11 was performed using the Taq PCR Core Kit (QIAGEN, [60″ denaturing at 95 °C, 40″ annealing at 52 °C, 60″ elongation at 72 °C] × 35). The PCR products were ligated into the pGEM-T vector (Promega) and transformed into E. coli cells (TOP10 strain) via electroporation. Grown cell colonies were used as the DNA source for PCR amplification of the vector insert containing the initial PCR amplicon. The Cycle Sequencing Kit Version 3.1 (Applied Biosystems) was used to sequence these PCR products in both directions. Sequencing was performed on an ABI 3130 DNA Sequencer (Applied Biosystems). Sequences were aligned with T-Coffee and checked for correct translatability of the exonic sequence without stop codons or frame shifts.18
Availability and System Requirements
easyPAC is freely available at http://www.uni-mainz.de/FB/Biologie/Anthropologie/472_ENG_HTML.php and also at http://sourceforge.net/projects/easypac/.
easyPAC is written in Perl. The software is provided as a Perl script that contains the source code and will run on any platform (Windows, Macintosh, Linux) but requires the installation of a Perl distribution. Normally Perl is preinstalled on Macintosh and Linux systems. However, the installation of additional modules may be required (available at the Comprehensive Perl Archive Network: www.cpan.org) if they are not already part of the installed Perl distribution. Details can be found in the easyPAC documentation. There is also an executable file of easyPAC (easyPAC.exe) that runs on Windows systems without the need for a Perl installation.
Supplementary Data
easyPAC.zip
easyPAC.tar.gz
Both files contain the compressed easyPAC folder including the original Perl script (easyPAC.pl), the easyPAC documentation (easyPAC_documentation.pdf), an executable file for Windows systems (easyPAC.exe, tested for Windows 7 and Windows XP) and all files required for the execution of easyPAC.
easyPAC_results_image.png
easyPAC_results_text.txt
These files contain the original easyPAC output (picture and text) for primer search as described in the Tests of the program section.
Acknowledgements
Thanks go to Hui Jiang and Wing Hung Wong for friendly permission to use the SeqMap algorithm. Further thanks go to the German Primate Center, especially the Department of Veterinary Medicine and Animal Husbandry, for providing primate tissue samples. The staff of the working group Zischler is gratefully acknowledged for helpful discussion. Thanks are also given to Dr. Roland Stern for his assistance concerning the evaluation of thermodynamic parameters of oligonucleotides.
Footnotes
Disclosures and Ethics
As a requirement of publication author(s) have provided to the publisher signed confirmation of compliance with legal and ethical obligations including but not limited to the following: authorship and contributorship, conflicts of interest, privacy and confidentiality and (where applicable) protection of human and animal research subjects. The authors have read and confirmed their agreement with the ICMJE authorship and conflict of interest criteria. The authors have also confirmed that this article is unique and not under consideration or published in any other publication, and that they have permission from rights holders to reproduce any copyrighted material. Any disclosures are made in this section. The external blind peer reviewers report no conflicts of interest.
Funding
This work was supported by the “Schwerpunkt rechnergestuetzte Forschungsmethoden in den Naturwissenschaften” (SRFN), Johannes Gutenberg-University, Mainz.
References
- 1. Chapman MA, Leebens-Mack JH, Burke JM. Positive selection and expression divergence following gene duplication in the sunflower CYCLOIDEA gene family. Mol Biol Evol. 2008;25(7):1260–73. doi: 10.1093/molbev/msn001.
- 2. Herlyn H, Zischler H. Sequence evolution, processing, and posttranslational modification of zonadhesin D domains in primates, as inferred from cDNA data. Gene. 2005;362:85–97. doi: 10.1016/j.gene.2005.06.009.
- 3. Glusman G, Yanai I, Rubin I, Lancet D. The complete human olfactory subgenome. Genome Res. 2001;11:685–702. doi: 10.1101/gr.171001.
- 4. Zhang X, Liu D, Yang W, Liu K, Sun J, et al. Development of a new marker system for identifying the complex members of the low-molecular-weight glutenin subunit gene family in bread wheat (Triticum aestivum L.). Theor Appl Genet. 2011;122(8):1503–16. doi: 10.1007/s00122-011-1550-7.
- 5. Untergasser A, Nijveen H, Rao X, Bisseling T, Geurts R, et al. Primer3Plus, an enhanced web interface to Primer3. Nucleic Acids Res. 2007;35:W71–4. doi: 10.1093/nar/gkm306.
- 6. Schretter C, Milinkovitch MC. OligoFaktory: a visual tool for interactive oligonucleotide design. Bioinformatics. 2005;22(1):115–6. doi: 10.1093/bioinformatics/bti728.
- 7. Giegerich R, Meyer F, Schleiermacher C. GeneFisher--software support for the detection of postulated genes. Proc Int Conf Intell Syst Mol Biol. 1996;4:68–77.
- 8. Rose TM, Schultz ER, Henikoff JG, Pietrokovski S, McCallum CM, et al. Consensus-degenerate hybrid oligonucleotide primers for amplification of distantly-related sequences. Nucleic Acids Res. 1998;26:1628–35. doi: 10.1093/nar/26.7.1628.
- 9. Linhart C, Shamir R. The degenerate primer design problem. Bioinformatics. 2002;18(Suppl 1):S172–80. doi: 10.1093/bioinformatics/18.suppl_1.s172.
- 10. Linhart C, Shamir R. The degenerate primer design problem: theory and applications. J Comput Biol. 2005;12:431–56. doi: 10.1089/cmb.2005.12.431.
- 11. Fredslund J, Schauser L, Madsen LH, Sandal N, Stougaard J. PriFi: using a multiple alignment of related sequences to find primers for amplification of homologs. Nucleic Acids Res. 2005;33(Web Server Issue):W516–20. doi: 10.1093/nar/gki425.
- 12. Gadberry MD, Malcomber T, Doust AN, Kellogg EA. Primaclade--a flexible tool to find primers across multiple species. Bioinformatics. 2005;21:1263–4. doi: 10.1093/bioinformatics/bti134.
- 13. Boyce R, Chilana P, Rose TM. iCODEHOP: a new interactive program for designing COnsensus-DEgenerate Hybrid Oligonucleotide Primers from multiply aligned protein sequences. Nucleic Acids Res. 2009;37(Web Server Issue):W222–8. doi: 10.1093/nar/gkp379.
- 14. Gorron E, Rodriguez F, Bernal D, Rodriguez-Rojas LM, Bernal A, et al. A new method for designing degenerate primers and its use in the identification of sequences in Brachiaria showing similarity to apomixis-associated genes. Bioinformatics. 2010;26(16):2053–4. doi: 10.1093/bioinformatics/btq312.
- 15. SantaLucia J Jr. A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc Natl Acad Sci U S A. 1998;95(4):1460–5. doi: 10.1073/pnas.95.4.1460.
- 16. Jiang H, Wong WH. SeqMap: mapping massive amount of oligonucleotides to the genome. Bioinformatics. 2008;24:2395–6. doi: 10.1093/bioinformatics/btn429.
- 17. Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005;110(1–4):462–7. doi: 10.1159/000084979.
- 18. Notredame C, Higgins DG, Heringa J. T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol. 2000;302:205–17. doi: 10.1006/jmbi.2000.4042.