Nucleic Acids Research. 2016 Oct 7;45(Database issue):D56–D60. doi: 10.1093/nar/gkw913

GETPrime 2.0: gene- and transcript-specific qPCR primers for 13 species including polymorphisms

Fabrice PA David 1,2, Jacques Rougemont 1,2,*, Bart Deplancke 2,3,*
PMCID: PMC5210624  PMID: 28053161

Abstract

GETPrime (http://bbcftools.epfl.ch/getprime) is a database with a web frontend providing gene- and transcript-specific, pre-computed qPCR primer pairs. The primers have been optimized for genome-wide specificity and for allowing the selective amplification of one or several splice variants of most known genes. To ease selection, primers have also been ranked according to defined criteria such as genome-wide specificity (with BLAST), amplicon size, and isoform coverage. Here, we report a major upgrade (2.0) of the database: eight new species (yeast, chicken, macaque, chimpanzee, rat, platypus, pufferfish, and Anolis carolinensis) now complement the five already included in the previous version (human, mouse, zebrafish, fly, and worm). Furthermore, the genomic reference has been updated to Ensembl v81 (while keeping earlier versions for backward compatibility) as a result of re-designing the back-end database and automating the import of relevant sections of the Ensembl database in a species-independent fashion. This also allowed us to map known polymorphisms to the primers (on average three per primer for human), with the aim of reducing experimental error when targeting specific strains or individuals. Another consequence is that the inclusion of future Ensembl releases and other species has now become a relatively straightforward task.

INTRODUCTION

Genome-scale experiments have accumulated massive amounts of information over recent years and have greatly contributed to our understanding of gene expression and its regulatory mechanisms. These experiments have clearly revealed the ubiquitous nature of alternative splicing and isoform dosage effects (1,2). In this regard, it is key to perform precise, quantitative measurements of selected genes and transcripts to assess specific expression patterns or functions. Such experiments typically involve the quantitative real-time polymerase chain reaction (qPCR), and the value of these qPCR assays depends in large part on the quality of the primer pair selected for the targeted transcription unit (3).

We have therefore undertaken the systematic design of primer pairs for every known gene and transcript in organisms with well-annotated reference genomes, with in silico verification of optimal specificity. The design of these primer pairs follows the pipeline described in (4), which we briefly recall here. To design gene- or transcript-specific primer pairs, exon junctions that are included in, respectively, the largest or smallest number of isoforms of each gene are first identified, after which the corresponding transcript is processed with PerlPrimer (5) to find the best primer set that overlaps these junctions. Candidate primers are then filtered according to (i) genome-wide specificity (running BLAST with an E-value of 100) and (ii) not spanning 5′ or 3′ untranslated regions (UTR), and ranked according to the number of isoforms they cover, amplicon length, and other primer quality parameters that were previously discussed (3,4). The top three primer pairs are then retained and displayed in the database with a star-based quality flag corresponding to their rank in this list. If no pair passes the filters, the original primer design constraints are progressively relaxed until a candidate pair emerges, hence the warnings associated with some primers (the ‘warnings’ column in Figure 1).
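The filter-then-rank step recalled above can be sketched as follows for the gene-specific case. This is a minimal illustration, not the pipeline's actual code: the `Candidate` fields, the `select_top_pairs` function, the sort order beyond "isoform coverage, then amplicon length", and the 100 bp target amplicon length are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one candidate primer pair."""
    pair_id: str
    blast_specific: bool   # True if BLAST found no off-target genomic hit
    spans_utr: bool        # True if a primer lies in a 5' or 3' UTR
    isoforms_covered: int  # isoforms of the gene amplified by this pair
    amplicon_length: int   # amplicon size in base pairs


def select_top_pairs(candidates, n_isoforms, target_amplicon=100, keep=3):
    """Filter candidates, rank them, and return [(rank, candidate), ...].

    Gene-specific mode: pairs covering more isoforms rank first; ties are
    broken by closeness to an assumed target amplicon length, then by ID.
    """
    passing = [c for c in candidates if c.blast_specific and not c.spans_utr]
    ranked = sorted(
        passing,
        key=lambda c: (
            -c.isoforms_covered / n_isoforms,
            abs(c.amplicon_length - target_amplicon),
            c.pair_id,
        ),
    )
    return list(enumerate(ranked[:keep], start=1))


if __name__ == "__main__":
    demo = [
        Candidate("a", True, False, 2, 150),
        Candidate("b", True, False, 4, 300),
        Candidate("c", False, False, 4, 100),  # fails the specificity filter
        Candidate("d", True, True, 4, 100),    # fails the UTR filter
        Candidate("e", True, False, 4, 110),
    ]
    for rank, cand in select_top_pairs(demo, n_isoforms=4):
        print(rank, cand.pair_id)
```

An empty result from such a function would correspond to the case where the design constraints have to be relaxed and a warning attached to the resulting pair.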

Figure 1.

The GETPrime 2.0 search interface and tabular display. The figure shows several of the 30 primer pairs found for human gene MDM1. Results can be downloaded in tab-separated format through the ‘Download’ link. The search is restricted to an organism, an Ensembl release, and a maximum number of lines (the smaller the number, the faster the query). Each result line corresponds to a single primer pair and displays its unique ID, the gene and transcript(s) it targets, its star-based rank (among the best three pairs found for the gene), the fraction of isoforms it covers, the amplicon length, the primer sequences and their respective melting temperatures, and the Ensembl annotation for the gene (KNOWN or NOVEL). The last two columns provide, respectively, warnings if the primer search did not work with standard parameters and a link to the primer pair-specific page shown in Figure 3.

Since its inception in 2011, the database has been used continuously and access statistics show a large user base. For example, the GETPrime web interface received nearly 1800 visits (by 1000 users) over the first 6 months of 2016 alone. Individual users also provided constructive feedback to further improve GETPrime, which in large part prompted the major update of the database (2.0) that is presented here.

Data integration

GETPrime 2.0 cross-references a number of data sources to document gene structures, transcript sequences, genome sequences, and annotated variants. The database now incorporates data from three versions of Ensembl (6): 50 (July 2008), 61 (February 2011), and 81 (July 2015). The earlier versions are retained for backward compatibility with the first release of GETPrime, and updates will be performed on a regular basis. Relevant data from Ensembl are automatically imported into our PostgreSQL database (https://www.postgresql.org). Thanks to the uniform structure of the Ensembl database across species, we can now easily add species, and we currently host yeast, chicken, macaque, chimpanzee, rat, platypus, pufferfish, and Anolis carolinensis next to the previously established primer pairs for human, mouse, zebrafish, fly, and worm. Compared to version 1.0 (4), the database schema has been re-designed to improve the speed of queries via the web user interface and to provide two new interaction modes: a batch download capability and a programmatic interface (RESTful API).

User interface

The user interface of GETPrime 2.0 has been re-designed to make it faster, friendlier, and richer. It is based on a new 3-tier Ruby on Rails (RoR) (http://rubyonrails.org) application. Among many other features, this framework improves the efficiency of database queries and simplifies the rendering of web pages. It also implements a RESTful API that allows programmers to access the data directly (see documentation at http://bbcftools.epfl.ch/getprime/api_documentation). A new search engine allows searching by gene name, Ensembl gene ID or transcript ID, or directly by the internal primer pair ID (Figure 1). The search box accepts up to 10 identifiers per search. When only one identifier is provided and it does not match exactly, a regular expression search is performed. This search tool uses the jQuery (mostly its Ajax method) and DataTables JavaScript libraries. Ajax is used to update portions of the web pages following user selections without reloading the whole page, which improves the responsiveness and flexibility of the display.
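The search behaviour described in this paragraph (at most 10 identifiers, exact matching first, and a regular-expression fallback only when a single identifier fails to match) can be illustrated with a small sketch. It is written in Python for brevity and does not reflect the Rails implementation; the `search` function, the in-memory `index` dictionary, and the accepted separators are hypothetical.

```python
import re

MAX_IDENTIFIERS = 10  # limit stated for the search box


def search(index, query):
    """Return primer pair IDs matching a whitespace/comma-separated query.

    `index` is a hypothetical dict mapping an identifier (gene name,
    Ensembl gene or transcript ID, primer pair ID) to a list of pair IDs.
    """
    terms = [t for t in re.split(r"[\s,;]+", query.strip()) if t]
    if len(terms) > MAX_IDENTIFIERS:
        raise ValueError(f"at most {MAX_IDENTIFIERS} identifiers per search")

    hits = []
    for term in terms:
        hits.extend(index.get(term, []))  # exact match first

    # Fallback: a single identifier without an exact match is treated
    # as a regular expression over all known identifiers.
    if len(terms) == 1 and not hits:
        pattern = re.compile(terms[0])
        for key in sorted(index):
            if pattern.search(key):
                hits.extend(index[key])
    return hits


if __name__ == "__main__":
    toy_index = {"MDM1": [1, 2], "MDM2": [3]}
    print(search(toy_index, "MDM1"))  # exact match
    print(search(toy_index, "MDM"))   # regular-expression fallback
```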

Primers are linked to a view in the UCSC genome browser (7) where they are displayed in their genomic context. In the UCSC view, primer pairs are identified by a unique numeric ID, by the gene and transcript they target, and by their rank in the list of candidates (Figure 2). This UCSC display is generated by uploading a single custom track (as a BED file) generated for each organism and Ensembl version. Both the BED file and the full database (as tab-separated files) can be downloaded directly. Each primer pair is clickable and linked back to the GETPrime website, and more specifically to the page containing details about the primer. This page contains more information than in the previous version of GETPrime. For example, next to the position of the primer sequences in the genome, the position and length of the introns are reported when applicable.
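To make the custom-track idea concrete, the sketch below builds one line in the standard 12-column BED layout, in which each primer segment is a block. Whether the GETPrime track uses exactly this 12-column layout is an assumption, as are the `bed12_line` helper and the coordinates, which are illustrative values only.

```python
def bed12_line(chrom, strand, name, blocks, score=0, rgb="0,0,0"):
    """Build one BED12 line from non-overlapping genomic blocks.

    `blocks` holds (start, end) tuples in 0-based, half-open coordinates,
    one per primer segment. thickStart/thickEnd cover the whole feature
    so that every block is drawn as a thick bar.
    """
    blocks = sorted(blocks)
    chrom_start = blocks[0][0]
    chrom_end = blocks[-1][1]
    block_sizes = ",".join(str(end - start) for start, end in blocks)
    # Block starts are relative to chromStart, as the BED format requires.
    block_starts = ",".join(str(start - chrom_start) for start, _ in blocks)
    fields = [
        chrom, chrom_start, chrom_end, name, score, strand,
        chrom_start, chrom_end, rgb, len(blocks), block_sizes, block_starts,
    ]
    return "\t".join(str(f) for f in fields)


if __name__ == "__main__":
    # Hypothetical coordinates for a forward and a reverse primer.
    print(bed12_line("chr12", "-", "2111376|ENSG00000111554_3",
                     [(1000, 1020), (1100, 1120)]))
```

A primer that spans an intron would simply contribute two blocks instead of one.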

Figure 2.

The UCSC view of GETPrime 2.0 primer pairs. The two primers (in black) of each pair are displayed as thick bars connected by thin arrows revealing on which strand the pair of primers will amplify DNA. They are also mapped to their genomic coordinates, including the intron(s) that each primer potentially spans. In this example, six primer pairs are displayed. For the first three, both forward and reverse primers span an intron, whereas for the three other pairs, only the reverse primer spans an intron. Note that the format of the displayed identifier is the following: GETPrimeID|Ensembl-gene-ID_GETPrime-rank (e.g. 2111376|ENSG00000111554_3) and that the other primer pairs for MDM1 are not visible within this screenshot.
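For users who post-process the track, the identifier format given in this caption can be split into its three parts. The following is a small sketch; the `TrackName` type and the `parse_track_name` function are hypothetical helpers, not part of GETPrime.

```python
from typing import NamedTuple


class TrackName(NamedTuple):
    getprime_id: int
    ensembl_gene_id: str
    rank: int


def parse_track_name(name):
    """Parse 'GETPrimeID|Ensembl-gene-ID_GETPrime-rank' into its fields."""
    id_part, bar, rest = name.partition("|")
    gene_id, underscore, rank = rest.rpartition("_")
    if not (bar and underscore and gene_id
            and id_part.isdigit() and rank.isdigit()):
        raise ValueError(f"unexpected identifier format: {name!r}")
    return TrackName(int(id_part), gene_id, int(rank))


if __name__ == "__main__":
    print(parse_track_name("2111376|ENSG00000111554_3"))
```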

Sequence polymorphisms

Our knowledge of genomic variation within species and how such variants drive molecular and organismal diversity is rapidly increasing (8–12). One of the benefits of these advances is that we are now able to incorporate variant information (when available) in genomic experiments since such genetic variants may be an important source of experimental variability or even failure (13,14). Thus, to reduce experimental error, we decided to start displaying the presence of known SNPs within the GETPrime 2.0 primers to aid users in the design and interpretation of their experiments. So far, we were able to cover SNPs for human and mouse by importing them from dbSNP v145 (15) and to map these SNPs to the primers that overlap them. Corresponding positions in the primer sequences are then highlighted (Figure 3) and a link to the dbSNP-based evidence allows a more detailed evaluation of the nature and relevance of the polymorphism(s).
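Mapping a SNP onto a primer amounts to converting a genomic coordinate into an offset within the primer sequence, taking into account that a primer may span an intron and may lie on the minus strand. The sketch below shows one way to do this; the function names, the 0-based half-open coordinate convention, and the bracket-style highlighting are assumptions for illustration and do not describe the database's internal representation.

```python
def snp_offset_in_primer(blocks, strand, snp_pos):
    """Return the 0-based offset of a SNP in the primer (5'->3'), or None.

    `blocks` lists the primer's genomic segments as (start, end) tuples,
    0-based half-open and in ascending order (two segments when the primer
    spans an intron). `snp_pos` is a 0-based genomic position.
    """
    primer_length = sum(end - start for start, end in blocks)
    consumed = 0  # primer bases covered by the blocks already visited
    for start, end in blocks:
        if start <= snp_pos < end:
            plus_offset = consumed + (snp_pos - start)
            # A minus-strand primer is the reverse complement of the
            # genomic sequence, so its offsets run in the other direction.
            if strand == "+":
                return plus_offset
            return primer_length - 1 - plus_offset
        consumed += end - start
    return None  # SNP falls outside the primer (e.g. in the intron)


def highlight(sequence, offsets):
    """Wrap the bases at the given offsets in square brackets."""
    return "".join(f"[{base}]" if i in offsets else base
                   for i, base in enumerate(sequence))


if __name__ == "__main__":
    primer_blocks = [(100, 110), (200, 210)]  # hypothetical coordinates
    offset = snp_offset_in_primer(primer_blocks, "+", 205)
    print(offset, highlight("ACGTACGTACGTACGTACGT", {offset}))
```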

Figure 3.

The GETPrime 2.0 primer details page. All information about one particular primer pair is summarized in this page: gene and transcript IDs, GETPrime warnings, and detailed information about each forward and reverse primer. Particularly relevant are the indication of SNP positions (in red) and whether a primer spans an intron as well as the UCSC display link.

Database content

The GETPrime 2.0 database currently contains a total of 1 175 874 primer pairs (444 256 in human, 268 855 in mouse), corresponding to an average of six pairs per covered gene (across 13 species). In human, there are more than 20 pairs per gene, and in mouse 12. On average, 92% of Ensembl protein-coding genes are covered by our database, the remainder corresponding to non-unique sequences for which specific primers could not be designed (Table 1). Importantly, for human and mouse, this number exceeds 98%. However, some species are still only partially covered due to differences in the Ensembl annotation compared to the human database. In particular, for A. carolinensis and macaque, only a fraction of the annotated genes were processed in the pipeline (Table 1). Moreover, the incomplete status of the macaque assembly led to a high failure rate of the pipeline, probably due to the repetitive nature of unassembled contigs (Table 1). We plan to resolve both issues in a future release. Regarding polymorphisms, a total of 2 864 885 variants were mapped to human primers (492 968 in mouse), indicating that more than 80% of human primers overlap a documented variant, with an average of about three SNPs per primer. This illustrates the importance of considering this information when designing or using primers.

Table 1. Global statistics of GETPrime 2.0 for each of the 13 included species.

| Species | Number of genes in Ensembl v81 | Number of genes covered (% of total genes) | Number of primer pairs | Number of variants |
| --- | --- | --- | --- | --- |
| Anolis carolinensis | 19 | 19 (100%) | 57 | |
| Caenorhabditis elegans | 20 447 | 20 412 (99.8%) | 104 810 | |
| Danio rerio | 22 337 | 21 805 (97.6%) | 121 576 | |
| Drosophila melanogaster | 13 918 | 13 911 (99.9%) | 99 032 | |
| Gallus gallus | 5222 | 5204 (99.6%) | 18 791 | |
| Homo sapiens | 22 017 | 21 653 (98.3%) | 444 256 | 2 864 885 |
| Macaca mulatta | 8693 | 1154 (13.2%) | 5345 | |
| Mus musculus | 22 155 | 21 835 (98.6%) | 268 855 | 492 968 |
| Ornithorhynchus anatinus | 170 | 149 (87.6%) | 606 | |
| Pan troglodytes | 140 | 140 (100%) | 474 | |
| Rattus norvegicus | 21 470 | 20 841 (97.0%) | 88 311 | |
| Saccharomyces cerevisiae | 6692 | 6620 (98.9%) | 19 923 | |
| Tetraodon nigroviridis | 1130 | 1125 (99.6%) | 3838 | |
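The summary figures quoted in the text can be recomputed from Table 1 with a few lines of arithmetic. The sketch below assumes that each primer pair contributes two primers and that the reported variant count can be divided evenly over them; the variable names are illustrative.

```python
# (genes in Ensembl v81, genes covered, primer pairs), copied from Table 1
TABLE1 = {
    "Anolis carolinensis": (19, 19, 57),
    "Caenorhabditis elegans": (20447, 20412, 104810),
    "Danio rerio": (22337, 21805, 121576),
    "Drosophila melanogaster": (13918, 13911, 99032),
    "Gallus gallus": (5222, 5204, 18791),
    "Homo sapiens": (22017, 21653, 444256),
    "Macaca mulatta": (8693, 1154, 5345),
    "Mus musculus": (22155, 21835, 268855),
    "Ornithorhynchus anatinus": (170, 149, 606),
    "Pan troglodytes": (140, 140, 474),
    "Rattus norvegicus": (21470, 20841, 88311),
    "Saccharomyces cerevisiae": (6692, 6620, 19923),
    "Tetraodon nigroviridis": (1130, 1125, 3838),
}
HUMAN_VARIANTS = 2864885  # variants mapped to human primers

# Total number of primer pairs in the database.
total_pairs = sum(pairs for _, _, pairs in TABLE1.values())

# Primer pairs per covered gene, species by species.
pairs_per_gene = {
    species: pairs / covered
    for species, (_, covered, pairs) in TABLE1.items()
}

# Unweighted mean of the per-species ratios.
mean_pairs_per_gene = sum(pairs_per_gene.values()) / len(pairs_per_gene)

# SNPs per human primer, assuming two primers per pair.
human_pairs = TABLE1["Homo sapiens"][2]
snps_per_human_primer = HUMAN_VARIANTS / (2 * human_pairs)

print(f"total primer pairs: {total_pairs}")
print(f"human pairs per gene: {pairs_per_gene['Homo sapiens']:.1f}")
print(f"mouse pairs per gene: {pairs_per_gene['Mus musculus']:.1f}")
print(f"mean pairs per gene over species: {mean_pairs_per_gene:.1f}")
print(f"SNPs per human primer: {snps_per_human_primer:.1f}")
```

The per-species primer pair counts sum to the stated total of 1 175 874. The figure of six pairs per covered gene appears to correspond to the unweighted mean of the per-species ratios rather than to the pooled ratio, which is higher because human and mouse dominate the total.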

CONCLUSION AND PERSPECTIVE

The steady access statistics of the GETPrime database attest that the embedded primer information is useful, and the release of GETPrime 2.0 responds to the user feedback that we have received, namely: update the genomic data, extend to new species, and cross-reference new types of genomic data (polymorphisms). Our plan for the future is to maintain the availability of the database, keep it up-to-date, and add new species when possible. In addition, we intend for GETPrime to closely follow and reflect the growth of genomic data resources at Ensembl and elsewhere. One additional important aspect would be a broader experimental validation of our in silico-designed primers. One way to do so would be to accommodate user feedback. We intend to implement a system that would allow the flagging of primers that have been successfully (or possibly even unsuccessfully) used in experiments, including links to the respective papers.

FUNDING

Swiss National Science Foundation Grant [#31003A_162735 to B.D.]; SyBIT project of SystemsX.ch (to J.R.); Swiss Federal Institute of Technology in Lausanne (EPFL). The open access publication charge for this paper has been waived by Oxford University Press—NAR Editorial Board members are entitled to one free paper per year in recognition of their work on behalf of the journal.

Conflict of interest statement. None declared.

REFERENCES

  1. Pelechano V., Wei W., Steinmetz L.M. Extensive transcriptional heterogeneity revealed by isoform profiling. Nature. 2013;497:127–131. doi: 10.1038/nature12121.
  2. Kornblihtt A.R., Schor I.E., Alló M., Dujardin G., Petrillo E., Muñoz M.J. Alternative splicing: a pivotal step between eukaryotic transcription and translation. Nat. Rev. Mol. Cell Biol. 2013;14:153–165. doi: 10.1038/nrm3525.
  3. Derveaux S., Vandesompele J., Hellemans J. How to do successful gene expression analysis using real-time PCR. Methods. 2010;50:227–230. doi: 10.1016/j.ymeth.2009.11.001.
  4. Gubelmann C., Gattiker A., Massouras A., Hens K., David F., Decouttere F., Rougemont J., Deplancke B. GETPrime: a gene- or transcript-specific primer database for quantitative real-time PCR. Database. 2011;2011:bar040. doi: 10.1093/database/bar040.
  5. Marshall O.J. PerlPrimer: cross-platform, graphical primer design for standard, bisulphite and real-time PCR. Bioinformatics. 2004;20:2471–2472. doi: 10.1093/bioinformatics/bth254.
  6. Yates A., Akanni W., Amode M.R., Barrell D., Billis K., Carvalho-Silva D., Cummins C., Clapham P., Fitzgerald S., Gil L., et al. Ensembl 2016. Nucleic Acids Res. 2016;44:D710–D716. doi: 10.1093/nar/gkv1157.
  7. Speir M.L., Zweig A.S., Rosenbloom K.R., Raney B.J., Paten B., Nejad P., Lee B.T., Learned K., Karolchik D., Hinrichs A.S., et al. The UCSC Genome Browser database: 2016 update. Nucleic Acids Res. 2016;44:D717–D725. doi: 10.1093/nar/gkv1275.
  8. Auton A., Brooks L.D., Durbin R.M., Garrison E.P., Kang H.M., Korbel J.O., Marchini J.L., McCarthy S., McVean G.A., Abecasis G.R., et al. A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393.
  9. Deplancke B., Alpern D., Gardeux V. The genetics of transcription factor DNA binding variation. Cell. 2016;166:538–554. doi: 10.1016/j.cell.2016.07.012.
  10. Albert F.W., Kruglyak L. The role of regulatory variation in complex traits and disease. Nat. Rev. Genet. 2015;16:197–212. doi: 10.1038/nrg3891.
  11. Keane T.M., Goodstadt L., Danecek P., White M.A., Wong K., Yalcin B., Heger A., Agam A., Slater G., Goodson M., et al. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature. 2011;477:289–294. doi: 10.1038/nature10413.
  12. Huang W., Massouras A., Inoue Y., Peiffer J., Ràmia M., Tarone A.M., Turlapati L., Zichner T., Zhu D., Lyman R.F., et al. Natural variation in genome architecture among 205 Drosophila melanogaster Genetic Reference Panel lines. Genome Res. 2014;24:1193–1208. doi: 10.1101/gr.171546.113.
  13. Taris N., Lang R.P., Camara M.D. Sequence polymorphism can produce serious artefacts in real-time PCR assays: hard lessons from Pacific oysters. BMC Genomics. 2008;9:234. doi: 10.1186/1471-2164-9-234.
  14. Boyle B., Dallaire N., MacKay J. Evaluation of the impact of single nucleotide polymorphisms and primer mismatches on quantitative PCR. BMC Biotechnol. 2009;9:75. doi: 10.1186/1472-6750-9-75.
  15. Sherry S.T., Ward M.H., Kholodov M., Baker J., Phan L., Smigielski E.M., Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001;29:308–311. doi: 10.1093/nar/29.1.308.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
