Skip to main content
Nucleic Acids Research logoLink to Nucleic Acids Research
. 2013 Dec 6;42(Database issue):D749–D755. doi: 10.1093/nar/gkt1196

Ensembl 2014

Paul Flicek 1,2,*, M Ridwan Amode 2, Daniel Barrell 2, Kathryn Beal 1, Konstantinos Billis 2, Simon Brent 2, Denise Carvalho-Silva 1, Peter Clapham 2, Guy Coates 2, Stephen Fitzgerald 1, Laurent Gil 1, Carlos García Girón 2, Leo Gordon 1, Thibaut Hourlier 2, Sarah Hunt 1, Nathan Johnson 1, Thomas Juettemann 1, Andreas K Kähäri 2, Stephen Keenan 1, Eugene Kulesha 1, Fergal J Martin 2, Thomas Maurel 1, William M McLaren 1, Daniel N Murphy 2, Rishi Nag 2, Bert Overduin 1, Miguel Pignatelli 1, Bethan Pritchard 2, Emily Pritchard 1, Harpreet S Riat 2, Magali Ruffier 1, Daniel Sheppard 2, Kieron Taylor 1, Anja Thormann 1, Stephen J Trevanion 2, Alessandro Vullo 1, Steven P Wilder 1, Mark Wilson 2, Amonida Zadissa 1, Bronwen L Aken 2, Ewan Birney 1, Fiona Cunningham 1, Jennifer Harrow 2, Javier Herrero 1, Tim JP Hubbard 2, Rhoda Kinsella 1, Matthieu Muffato 1, Anne Parker 2, Giulietta Spudich 1, Andy Yates 1, Daniel R Zerbino 1, Stephen MJ Searle 2
PMCID: PMC3964975  PMID: 24316576

Abstract

Ensembl (http://www.ensembl.org) creates tools and data resources to facilitate genomic analysis in chordate species with an emphasis on human, major vertebrate model organisms and farm animals. Over the past year we have increased the number of species that we support to 77 and expanded our genome browser with a new scrollable overview and improved variation and phenotype views. We also report updates to our core datasets and improvements to our gene homology relationships from the addition of new species. Our REST service has been extended with additional support for comparative genomics and ontology information. Finally, we provide updated information about our methods for data access and resources for user training.

INTRODUCTION

The Ensembl project (http://www.ensembl.org) creates and distributes genome annotations and provides integrated views of other valuable genomic data for supported chordate genomes. Our resources are intended to serve as community reference datasets on which other genomic research can be built. As such, Ensembl provides unique tools, datasets and user support compared to similar projects such as the UCSC Genome Browser (1), while supporting community standards that promote interoperability in genomics. For example, we have developed and distribute an extensive, open software infrastructure with diverse analysis pipelines supporting a variety of genome analyses (2) and the artificial intelligence inspired eHive analysis management system (3); data mining and analysis tools that include BioMart (4) and the Ensembl Variant Effect Predictor (VEP) (5); supported and robust application programing interfaces (APIs) (6) and a unique genome browser interface (7). Our software is distributed using a permissive Apache-style open-source license meaning that, unlike similar software, it is free for all potential users. Additionally, our data is provided without restriction and we have the most comprehensive suite of training options of any public genomics tool to maximize usability. In common with the UCSC Genome Browser, Ensembl supports community standard file formats such as BAM, BED, wiggle and other common file types. We have also incorporated support for track hubs over the past year to enable researchers to set up and view large-scale datasets. For example, the data produced by the ENCODE consortium (8) can be viewed by loading the ENCODE track hub (9) and users can then access an experiment matrix within the Ensembl configuration menu and quickly select datasets by cell or experiment type. However, over and above simply displaying these data, Ensembl uses them as described below in our integrative Regulatory Build analysis resulting in an evidence-based annotation of whole-genome regulation.

Ensembl resources are available for a total of 77 species as of release 73 (September 2013) with human, mouse, zebrafish, rat and various farm animals having the most extensive support. For 60 chordate species, we have full support comprising evidence-based gene annotation and comparative genomics analysis. In addition, for 18 of these species, there are variation resources and regulatory annotation for human and mouse. At present 13 additional chordate species are accessible with basic support via Ensembl preview sites (available from http://pre.ensembl.org), which provide BLAST access to the genome data and genome visualization, but not a complete gene build. Three non-chordate model species are also fully supported by Ensembl—worm (Caenorhabditis elegans), fruit fly (Drosophila melanogaster) and yeast (Saccharomyces cerevisiae)—with imported annotation from their respective genome databases in partnership with the Ensembl Genomes project (10). All fully supported species are accessible via the Ensembl BioMart, the Ensembl APIs and web displays. All data are also available for querying via our public MySQL servers, as full data downloads and as an Amazon public dataset.

Since our last report (11), we have added two new species with full gene annotation and comparative genomics support: duck (Anas platyrhynchos) (12) and collared flycatcher (Ficedula albicollis) and one new species with variation support: gibbon. New assemblies with corresponding updates to the gene annotations, alignments and variation data were also provided for rat, cat and chicken. At the same time, we added seven new species with basic support on the Ensembl preview site: blind cave fish (Astyanax mexicanus), white rhinoceros (Ceratotherium simum simum), baboon (Papio anubis), prairie vole (Microtus ochrogaster), vervet monkey (Chlorocebus sabaeus), naked mole-rat (Heterocephalus glaber) and aardvark (Orycteropus afer) and updated the preview sites for common shrew (Sorex araneus), bottle-nosed dolphin (Tursiops truncatus), American pika (Ochotona princeps) and armadillo (Dasypus novemcinctus) with new assemblies. In addition, as the human and mouse genome assemblies are updated regularly by the Genome Reference Consortium (GRC) to include alternate sequences in the form of ‘fix’ and ‘novel’ assembly patches (13), we include these additional alternate sequences and annotate them with genes, variation and other features as appropriate. Ensembl release 73 (September 2013) included the human GRCh37.p12 assembly (i.e. the twelfth patch release of the GRCh37 assembly) and the GRCm38.p1 mouse assembly.

In addition to the newly support species and community standards reported above, the most important updates over the last year that have advanced the project since our last report (11) include new and more comprehensive phenotype annotations most valuable to those interested in human disease research, scrollable genome browsing designed to appeal to all users of our web interface and new REST endpoints supporting more flexible analysis options for those users that interact with the Ensembl resources programmatically. These and other features are described in more detail below.

Ensembl browser

This year we significantly updated Ensembl’s main Region in Detail page with the full incorporation of the Javascript-based, scrollable and zoomable browser, Genoverse, in place of the overview panel that had been a part of Ensembl for >10 years. Older, unsupported browsers fall back to the previous non-scrolling overview image. Genoverse allows users to scroll back and forth along the genome and update the main image below it to show the new region (Figure 1A). Our search engine was also upgraded from Lucene to Solr and we implemented a new search interface with features such as faceting and auto-completion.

Figure 1.

Figure 1.

(A) The scrollable Genoverse view (with view control icons in the upper right corner) provides the overview panel on the Region in Detail page. Image from URL: http://e73.ensembl.org/Homo_sapiens/Location/View?r=6:133017695-133161157. (B) Phenotype data from DDG2P, OMIM, Orphanet, HGMD and COSMIC for the human gene ATP2A2. Image created from URL http://e73.ensembl.org/Homo_sapiens/Gene/Phenotype?db=core;g=ENSG00000174437;r=12:110718561-110788898.

Our web displays dedicated to variation and phenotype data were also markedly improved. We specifically focused on displays for structural variants, which are now coloured by class and the higher quality structural variants from the 1000 Genomes Project are provided in a separate track. We introduced a page for structural variant phenotype data and now have additional phenotype data, including the variants from NCBI’s ClinVar project that are classified as being probable-pathogenic, pathogenic, drug-response or histocompatibility. Phenotype data from multiple sources are integrated and displayed for relevant genes (Figure 1B). Variants in regulatory regions are now annotated on all tracks and variant names are visible when zoomed in on all displays. We have also improved the VEP visual output in the form of summary pie charts.

Beyond these major developments, there have been other important improvements. In particular, we improved the handling of user data with a streamlined upload interface and support for uploading VEP output files. Additionally, configuration of complex data hubs has been made easier by displaying the track options in a matrix similar to the existing configurations for regulation data.

Ensembl annotations

All Ensembl annotations whether gene, variation or regulation, are based on integration of relevant data sources. We update the human gene set for every Ensembl release via a merge of the Ensembl evidence-based automatic annotation and Havana (14) manual annotation to produce an updated GENCODE gene set. This set also includes all current human Consensus Coding Sequence (CCDS) gene models (15). Manual annotation from Havana is additionally incorporated into our gene sets on alternate releases for zebrafish (16) and for mouse, which also includes all current CCDS gene models. Pig includes manual annotation from Havana on selected regions of the genome. The year 2013 has seen the inclusion of RNASeq data for seven species: human, chicken, cat, collared flycatcher, gibbon, rabbit and anole lizard. For gibbon, rabbit and anole lizard, the RNASeq data were used to update an existing standard gene annotation whereas for other species (except human) the data were integrated into the annotation as part of the primary gene-build process. Some of these species are provided with tissue-specific RNASeq samples which allow users to explore tissue-specific expression.

Ensembl variation annotation integrates all publicly available variation datasets to provide a coherent and complete resource for variome interpretation across 217 million variants in the 18 Ensembl species with supported variation resources. Basic variation data including genomic location, allele changes, allele and genotype frequencies and population data are imported for SNPs and indels from dbSNP (17) and for structural variants from DGVa (18). Additional human data are imported directly from the 1000 Genomes Project (19), the Exome Sequencing Project (20) and from 14 individual genomes that provide genotype information. In addition to the newly supported gibbon data listed above, over the past year, updated variation resources were released for human, platypus, cow, mouse, pig, zebrafish, opossum, orang-utan and macaque. We also extended cross-references to new and popular genotyping chips for human, chicken, horse and cow.

We significantly expanded our support for human phenotype data in Ensembl beyond the UniProt (21), OMIM (22), EGA, HGMD Public (23) (variation location only), COSMIC somatic mutations (24) and NHGRI GWAS Catalog (25) resources that we have supported in the past. New data from ClinVar, Orphanet (26), the Developmental Disorder Genotype—Phenotype Database (DDG2P) from DECIPHER (27), dbGaP, Phencode and the MAGIC and GIANT consortiums have now been fully integrated into Ensembl (Figure 1B). From these data sources, we select only the significant associations to display on the website and provide full datasets in the database and BioMart. For variants stored in LOVD (28), Ensembl queries LOVD directly and displays the information on the appropriate variation web page. In addition, we have now incorporated phenotype information for other species. Mouse phenotype data are provided from international projects including EuroPhenome (29) and IMPC (International Mouse Phenotyping Consortium) (30). For other animals, data are imported from the Online Mendelian Inheritance in Animals (OMIA) database (31). This year we have also developed a new pipeline to cross-reference publications citing variants from EuropePMC, NCBI and UCSC.

Regulatory annotation in Ensembl is currently available across multiple human and mouse cell lines. The main resource, the Ensembl Regulatory Build is a comprehensive synthesis of functional assays provide by a number of consortia, such as ENCODE (8) and Roadmap Epigenomics Mapping Consortium (32). Although the raw data from these projects can be displayed directly on Ensembl through dedicated track hubs, the Regulatory Build is a higher level integrated analysis that defines a collection of Regulatory Features (i.e. regions of the genome that display regulatory activity in one of 13 human or five murine cell lines). Where relevant, transcription-factor binding sites are predicted on these regions using the JASPAR binding motifs (33).

Additionally, Ensembl links out to relevant externally curated databases of regulatory data including enhancer regions from VISTA (34), miRNA binding sites from Microcosm http://www.ebi.ac.uk/enright-srv/microcosm/) and eQTLs from Genevar (35). Several reference DNA methylation experiments are also included.

COMPARATIVE GENOMICS RESOURCES

Whole-genome alignments of vertebrate species are provided within the Ensembl Compara database. Because all genome assemblies are not sequenced to the same level of completeness, we group the assemblies into two tiers for differential processing. High quality genomes from 13 species are aligned into a progressive multiple sequence alignment using the EPO (Enredo-Pecan-Ortheus) pipeline (36,37), which also estimates the underlying ancestral genome sequences. The low coverage genomes of an additional 23 species, that are much more fragmented, are inserted into the previous alignment by mapping them onto the human assembly with LASTZ. We also produce clade-specific multiple sequence alignments for primates, birds, fish and amniotes. In particular, the fish multiple alignment has been extended to eight species this year. From these alignments, we compute the conservation at every position, using GERP (38).

Our gene-based comparative genomics resources are updated every release to incorporate new species, updated assemblies and gene annotation sets. These include gene phylogenetic trees, gene families and gene dynamics. This year, the inclusion of duck and collared flycatcher and an update of the guide species tree have greatly improved the quality of the gene trees in the Sauria clade and reduced the number of poorly supported duplications from 74% to 30%. In close collaboration with the TreeFam (39) and Ensembl Genomes (10) projects, we will migrate to an HMM-based classification for GreeTree annotation, which will reduce a key quadratic complexity to a linear one.

Data access, data mining and quality control

Ensembl’s REST service, available at http://beta.rest.ensembl.org, continues to be actively developed as a public beta (Figure 2). This year has seen the addition of 12 new endpoints including access to translation features, SNVs and protein domains, as well as access to whole-genome alignments by region. Additionally, the REST API is now able to query GeneTrees by their containing member stable identifier or gene symbols such as HGNC. Other new endpoints include the ability to use a stable identifier to identify overlapping features and location information as well as access to NCBI taxonomy and ontology datasets. The ontology endpoints currently provide the gene ontology (GO) (40), sequence ontology (SO) (41) and experimental factor ontology (EFO) (42) information used within Ensembl. The REST service will move out of beta during the next year coinciding with the introduction of POST requests and improved VEP integration.

Figure 2.

Figure 2.

Usage and example output for the Ensembl REST server Fetch Variant Consequences endpoint.

The more established BioMart data-mining tool (43) provides users with a variety of ways of accessing the Ensembl data quickly and with relative ease. Users can choose to access the data via the MartView web interface or via the MartService routes including the BioMart Perl API, DAS server, SOAP, REST or BioConductor biomaRt package. The Ensembl BioMart databases (4) are built from scratch each release in order to incorporate the latest annotated and imported data, and they are current with the data resources described above.

Beyond programmatic and tool-based data-access methods, we continue to provide complete data downloads in a variety of formats including the VCF files that were introduced this year to distribute many subsets of the Ensembl variation data.

To manage the increasing size and complexity of Ensembl releases, we have increased our quality control (QC) procedures over the past year. These are an essential part of each release cycle and range from validation testing of the various APIs to methods for checking data and database integrity. The Ensembl gene set is also independently analyzed using a specific curation/QC pipeline run for all updates of the human and mouse gene sets. This procedure compares the set of Ensembl translations for a particular species directly to the publicly available sequence resources UniProt (21) and RefSeq (44) and reports the percentage identity of the alignments. In addition to the above species, the pipeline has been employed for the rat, zebrafish and chicken genomes.

Ensembl variation and regulatory resources also rely on comprehensive and flexible infrastructure to manage the growing amount of relevant datasets in the public domain. In preparation for the updated human reference genome (GRCh38) expected at the end of 2013, we consolidated these pipelines to work automatically over a large number of files with minimal supervision. Specifically, more of our pipelines are run using the eHive system (3) and employ both a modular structure to the analysis and on-the-fly calculation for specific data types stored in the databases.

Outreach and training

Ensembl supports users through worldwide face-to-face training workshops, our helpdesk@ensembl.org and dev@ensembl.org email lists, online training and social media. This year we made several changes to intensify and connect our user interactions on Twitter (https://twitter.com/Ensembl), Facebook (https://www.facebook.com/Ensembl.org) and the Ensembl Blog (http://www.ensembl.info/). Together these methods have proven extremely effective for supporting users beyond the traditional mailing list and FAQ model that we continue to maintain.

Distance and on-line training are provided through our Helpdesk channel on YouTube, which saw >70% growth in the number of subscribers and incorporated two new videos during the year. For those users wanting more intensive training, we filmed and now provide a one-day Ensembl browser workshop (http://www.ebi.ac.uk/training/online/course/ensembl-filmed-browser-workshop) and a three-day Ensembl API workshop (http://www.ebi.ac.uk/training/online/course/ensembl-filmed-api-workshop), both complete with videos and exercises. A Quick Ensembl course providing a short introduction to the browser (http://www.ebi.ac.uk/training/online/course/ensembl-quick-tour-0) complements the more complete workshops.

ACKNOWLEDGEMENTS

The authors wish to thank all of their users, the systems teams who maintain their computational infrastructure and those researchers who have provided data to Ensembl in advance of publication under the understandings of the Fort Lauderdale meeting discussing Community Resource Projects and the Toronto meeting on pre-publication data sharing.

FUNDING

The Wellcome Trust provides majority funding for the Ensembl project [WT095908 and WT098051] with additional funding for specific project components from the National Human Genome Research Institute [U41HG007234, 1R24RR032658 and 1R01HD074078], the BBSRC [BB/I025506/1, BB/I025360/1 and BB/K009524/1], the European Molecular Biology Laboratory and as specified: The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 222664. (“Quantomics”); This Publication reflects only the author’s views and the European Community is not liable for any use that may be made of the information contained herein. The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number [200754] - the GEN2PHEN project; The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement [n° 282510 - BLUEPRINT]; Work supported by the European Commission within the Framework Programme 7 Capacities Specific Programme, under Grant Agreement [no. 312301] (Helix Nebula - The Science Cloud; Rat genomics resources receive additional support as specified: The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement [N° HEALTH-F4-2010-241504] (EURATRANS). Funding for open access charge: The Wellcome Trust.

Conflict of interest statement. None declared.

REFERENCES

  • 1.Meyer LR, Zweig AS, Hinrichs AS, Karolchik D, Kuhn RM, Wong M, Sloan CA, Rosenbloom KR, Roe G, Rhead B, et al. The UCSC Genome Browser database: extensions and updates 2013. Nucleic Acids Res. 2013;41:D64–D69. doi: 10.1093/nar/gks1048. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Potter SC, Clarke L, Curwen V, Keenan S, Mongin E, Searle SMJ, Stabenau A, Storey R, Clamp M. The Ensembl analysis pipeline. Genome Res. 2004;14:934–941. doi: 10.1101/gr.1859804. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Severin J, Beal K, Vilella AJ, Fitzgerald S, Schuster M, Gordon L, Ureta-Vidal A, Flicek P, Herrero J. eHive: an artificial intelligence workflow system for genomic analysis. BMC Bioinform. 2010;11:240. doi: 10.1186/1471-2105-11-240. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Kinsella RJ, Kähäri A, Haider S, Zamora J, Proctor G, Spudich G, Almeida-King J, Staines D, Derwent P, Kerhornou A, et al. Ensembl BioMarts: a hub for data retrieval across taxonomic space. Database (Oxford) 2011;2011 doi: 10.1093/database/bar030. bar030. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.McLaren W, Pritchard B, Rios D, Chen Y, Flicek P, Cunningham F. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics. 2010;26:2069–2070. doi: 10.1093/bioinformatics/btq330. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Stabenau A, McVicker G, Melsopp C, Proctor G, Clamp M, Birney E. The Ensembl core software libraries. Genome Res. 2004;14:929–933. doi: 10.1101/gr.1857204. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Parker A, Bragin E, Brent S, Pritchard B, Smith JA, Trevanion S. Using caching and optimization techniques to improve performance of the Ensembl website. BMC Bioinform. 2010;11:239. doi: 10.1186/1471-2105-11-239. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Rosenbloom KR, Sloan CA, Malladi VS, Dreszer TR, Learned K, Kirkup VM, Wong MC, Maddren M, Fang R, Heitner SG, et al. ENCODE Data in the UCSC Genome Browser: year 5 update. Nucleic Acids Res. 2013;41:D56–D63. doi: 10.1093/nar/gks1172. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Kersey PJ, Staines DM, Lawson D, Kulesha E, Derwent P, Humphrey JC, Hughes DST, Keenan S, Kerhornou A, Koscielny G, et al. Ensembl Genomes: an integrative resource for genome-scale data from non-vertebrate species. Nucleic Acids Res. 2012;40:D91–D97. doi: 10.1093/nar/gkr895. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Flicek P, Ahmed I, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fairley S, et al. Ensembl 2013. Nucleic Acids Res. 2013;41:D48–D55. doi: 10.1093/nar/gks1236. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Huang Y, Li Y, Burt DW, Chen H, Zhang Y, Qian W, Kim H, Gan S, Zhao Y, Li J, et al. The duck genome and transcriptome provide insight into an avian influenza virus reservoir species. Nat Genet. 2013;45:776–783. doi: 10.1038/ng.2657. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Church DM, Schneider VA, Graves T, Auger K, Cunningham F, Bouk N, Chen H-C, Agarwala R, McLaren WM, Ritchie GRS, et al. Modernizing reference genome assemblies. PLoS Biol. 2011;9:e1001091. doi: 10.1371/journal.pbio.1001091. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Wilming LG, Gilbert JGR, Howe K, Trevanion S, Hubbard T, Harrow JL. The vertebrate genome annotation (Vega) database. Nucleic Acids Res. 2008;36:D753–D760. doi: 10.1093/nar/gkm987. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Harte RA, Farrell CM, Loveland JE, Suner M-M, Wilming L, Aken B, Barrell D, Frankish A, Wallin C, Searle S, et al. Tracking and coordinating an international curation effort for the CCDS Project. Database (Oxford) 2012;2012 doi: 10.1093/database/bas008. bas008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Howe K, Clark MD, Torroja CF, Torrance J, Berthelot C, Muffato M, Collins JE, Humphray S, McLaren K, Matthews L, et al. The zebrafish reference genome sequence and its relationship to the human genome. Nature. 2013;496:498–503. doi: 10.1038/nature12111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001;29:308–311. doi: 10.1093/nar/29.1.308. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Lappalainen I, Lopez J, Skipper L, Hefferon T, Spalding JD, Garner J, Chen C, Maguire M, Corbett M, Zhou G, et al. dbVar and DGVa: public archives for genomic structural variation. Nucleic Acids Res. 2013;41:D936–D941. doi: 10.1093/nar/gks1213. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56–65. doi: 10.1038/nature11632. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Fu W, O’Connor TD, Jun G, Kang HM, Abecasis G, Leal SM, Gabriel S, Altshuler D, Shendure J, Nickerson DA, et al. Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature. 2013;493:216–220. doi: 10.1038/nature11690. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.UniProt Consortium. Update on activities at the Universal Protein Resource (UniProt) in 2013. Nucleic Acids Res. 2013;41:D43–D47. doi: 10.1093/nar/gks1068. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Amberger J, Bocchini C, Hamosh A. A new face and new challenges for Online Mendelian Inheritance in Man (OMIM(®)) Hum Mutat. 2011;32:564–567. doi: 10.1002/humu.21466. [DOI] [PubMed] [Google Scholar]
  • 23.Stenson PD, Ball EV, Mort M, Phillips AD, Shiel JA, Thomas NST, Abeysinghe S, Krawczak M, Cooper DN. Human Gene Mutation Database (HGMD): 2003 update. Hum. Mutat. 2003;21:577–581. doi: 10.1002/humu.10212. [DOI] [PubMed] [Google Scholar]
  • 24.Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D, Jia M, Shepherd R, Leung K, Menzies A, et al. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res. 2011;39:D945–D950. doi: 10.1093/nar/gkq929. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, McCarthy MI, Ramos EM, Cardon LR, Chakravarti A, et al. Finding the missing heritability of complex diseases. Nature. 2009;461:747–753. doi: 10.1038/nature08494. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Rath A, Olry A, Dhombres F, Brandt MM, Urbero B, Ayme S. Representation of rare diseases in health information systems: the Orphanet approach to serve a wide range of end users. Hum. Mutat. 2012;33:803–808. doi: 10.1002/humu.22078. [DOI] [PubMed] [Google Scholar]
  • 27.Swaminathan GJ, Bragin E, Chatzimichali EA, Corpas M, Bevan AP, Wright CF, Carter NP, Hurles ME, Firth HV. DECIPHER: web-based, community resource for clinical interpretation of rare variants in developmental disorders. Hum Mol Genet. 2012;21:R37–R44. doi: 10.1093/hmg/dds362. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Fokkema IFAC, Taschner PEM, Schaafsma GCP, Celli J, Laros JFJ, den Dunnen JT. LOVD v.2.0: the next generation in gene variant databases. Hum. Mutat. 2011;32:557–563. doi: 10.1002/humu.21438. [DOI] [PubMed] [Google Scholar]
  • 29.Morgan H, Beck T, Blake A, Gates H, Adams N, Debouzy G, Leblanc S, Lengger C, Maier H, Melvin D, et al. EuroPhenome: a repository for high-throughput mouse phenotyping data. Nucleic Acids Res. 2010;38:D577–D585. doi: 10.1093/nar/gkp1007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Mallon A-M, Iyer V, Melvin D, Morgan H, Parkinson H, Brown SDM, Flicek P, Skarnes WC. Accessing data from the International Mouse Phenotyping Consortium: state of the art and future plans. Mamm. Genome. 2012;23:641–652. doi: 10.1007/s00335-012-9428-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Lenffer J, Nicholas FW, Castle K, Rao A, Gregory S, Poidinger M, Mailman MD, Ranganathan S. OMIA (Online Mendelian Inheritance in Animals): an enhanced platform and integration into the Entrez search interface at NCBI. Nucleic Acids Res. 2006;34:D599–D601. doi: 10.1093/nar/gkj152. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Bernstein BE, Stamatoyannopoulos JA, Costello JF, Ren B, Milosavljevic A, Meissner A, Kellis M, Marra MA, Beaudet AL, Ecker JR, et al. The NIH Roadmap Epigenomics Mapping Consortium. Nat. Biotechnol. 2010;28:1045–1048. doi: 10.1038/nbt1010-1045. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Portales-Casamar E, Thongjuea S, Kwon AT, Arenillas D, Zhao X, Valen E, Yusuf D, Lenhard B, Wasserman WW, Sandelin A. JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles. Nucleic Acids Res. 2010;38:D105–D110. doi: 10.1093/nar/gkp950. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Visel A, Minovitsky S, Dubchak I, Pennacchio LA. VISTA Enhancer Browser–a database of tissue-specific human enhancers. Nucleic Acids Res. 2007;35:D88–D92. doi: 10.1093/nar/gkl822. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Yang T-P, Beazley C, Montgomery SB, Dimas AS, Gutierrez-Arcelus M, Stranger BE, Deloukas P, Dermitzakis ET. Genevar: a database and Java application for the analysis and visualization of SNP-gene associations in eQTL studies. Bioinformatics. 2010;26:2474–2476. doi: 10.1093/bioinformatics/btq452. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Paten B, Herrero J, Beal K, Fitzgerald S, Birney E. Enredo and Pecan: Genome-wide mammalian consistency-based multiple alignment with paralogs. Genome Res. 2008;18:1814–1828. doi: 10.1101/gr.076554.108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Paten B, Herrero J, Fitzgerald S, Beal K, Flicek P, Holmes I, Birney E. Genome-wide nucleotide-level mammalian ancestor reconstruction. Genome Res. 2008;18:1829–1843. doi: 10.1101/gr.076521.108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. Cooper,G.M., Stone,E.A., Asimenos,G., NISC Comparative Sequencing Program, Green,E.D., Batzoglou,S. and Sidow,A. (2005) Distribution and intensity of constraint in mammalian genomic sequence. Genome Res., 15, 901–913. [DOI] [PMC free article] [PubMed]
  • 39.Ruan J, Li H, Chen Z, Coghlan A, Coin LJM, Guo Y, Hériché J-K, Hu Y, Kristiansen K, Li R, et al. TreeFam: 2008 Update. Nucleic Acids Res. 2008;36:D735–D740. doi: 10.1093/nar/gkm1005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25:25–29. doi: 10.1038/75556. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Eilbeck K, Lewis SE, Mungall CJ, Yandell M, Stein L, Durbin R, Ashburner M. The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol. 2005;6:R44. doi: 10.1186/gb-2005-6-5-r44. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Malone J, Holloway E, Adamusiak T, Kapushesky M, Zheng J, Kolesnikov N, Zhukova A, Brazma A, Parkinson H. Modeling sample variables with an Experimental Factor Ontology. Bioinformatics. 2010;26:1112–1118. doi: 10.1093/bioinformatics/btq099. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Smedley D, Haider S, Ballester B, Holland R, London D, Thorisson G, Kasprzyk A. BioMart–biological queries made easy. BMC Genom. 2009;10:22. doi: 10.1186/1471-2164-10-22. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Pruitt KD, Tatusova T, Brown GR, Maglott DR. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res. 2012;40:D130–D135. doi: 10.1093/nar/gkr1079. [DOI] [PMC free article] [PubMed] [Google Scholar]

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press

RESOURCES