Abstract
The Mouse Genome Database (MGD; http://www.informatics.jax.org) is the key community mouse database which supports basic, translational and computational research by providing integrated data on the genetics, genomics, and biology of the laboratory mouse. MGD serves as the source for biological reference data sets related to mouse genes, gene functions, phenotypes and disease models with an increasing emphasis on the association of these data to human biology and disease. We report here on recent enhancements to this resource, including improved access to mouse disease model and human phenotype data and enhanced relationships of mouse models to human disease.
INTRODUCTION
The laboratory mouse is used extensively as a model for investigating the etiopathogenesis of human disease. The relevance of the laboratory mouse to biomedical research stems from its extensive experimental genetics capabilities, fully sequenced inbred strain genomes, published genotype-to-phenotype associations, and genome-wide coverage of induced variation from large-scale mouse mutagenesis programs (1–4; reviewed in 5). Resources such as the Collaborative Cross (6) and Diversity Outbred projects (7) capture most of the genetic variation present in laboratory mouse strains and serve as unique platforms for studies of human-relevant quantitative traits and complex inherited syndromes. Furthermore, the International Mouse Phenotyping Consortium (IMPC) is generating a catalogue of gene function through systematic generation and phenotyping of a genome-wide collection of traditional gene knockout (KO) and CRISPR mutant lines in the mouse (8).
The Mouse Genome Database (MGD) is the primary community knowledgebase for mouse phenotype and gene function and mouse models of human disease (9). MGD’s goal is to advance understanding of human biology and disease by facilitating access to integrated genetics and genomic data for the laboratory mouse. To this end, MGD serves as an authoritative resource for the catalog of mouse genes and genome features, connecting reference genomic sequence information to mouse biological data, including (i) molecular function, biological process and cellular location information encoded using the Gene Ontology (GO); (ii) a comprehensive listing of mouse mutations, variants and human disease models, with mutant genotypes annotated to Mammalian Phenotype (MP) terms and Disease Ontology (DO) terms; and (iii) authoritative nomenclature and identifiers for mouse gene names, symbols, alleles and strains, for which MGD is the recognized primary community resource (Table 1). Standardization of mouse nomenclature and annotation of data with commonly used biological ontologies and standardized vocabularies ensure that data are consistently annotated, making precise data mining possible.
Table 1. Data for which MGD serves as an authoritative source. In addition to providing unique IDs and symbols for genes, alleles and strains, MGD expertly curates functional, phenotype and disease model data from literature into MGD.
Data type | Description |
---|---|
Unified mouse genome feature catalog | Catalog of integrated predictions from Ensembl, NCBI, Havana/Vega; used by NCBI, IMPC, etc. |
Gene Ontology (GO) annotations for mouse | Curated from literature and integrated from others |
Mouse phenotype annotations | Curated from literature, integrated with data from large scale projects |
Mouse models of human disease | Curated mouse models of human disease annotated with Disease Ontology |
Gene to nucleotide sequence association | Co-curation with MGA (Mouse Genome Annotation) group |
Gene to protein sequence association | Co-curation with UniProt and Protein Ontology groups |
Mammalian Phenotype (MP) Ontology | Developed, distributed and used by MGD; also used by RGD, IMPC, DMDD, etc. |
Gene and genome feature symbols, names and IDs | Created using International Nomenclature Guidelines in coordination with human and rat nomenclature groups |
Mutation symbols, names and IDs | Nomenclature and IDs for mouse mutations are assigned and provided by MGD |
Mouse strain nomenclature and IDs | Created and provided by MGD; nomenclature assistance is also provided to other mouse repositories |
MGD is a core component of the Mouse Genome Informatics (MGI) consortium (http://www.informatics.jax.org). Other database resources coordinated through the MGI consortium include the Gene Expression Database (GXD) (10), the Mouse Tumor Biology Database (MTB) (11), the Gene Ontology project (GO) (12), MouseMine (13), the International Mouse Strain Resource (IMSR) (14) and the CrePortal database of recombinase expressing mice (www.CrePortal.org, unpublished). Data and information for these resources are obtained through a combination of expert curation of the biomedical literature and automated or semi-automated processing of data sets downloaded from more than fifty other data resources. Metrics of current MGD content are shown in Table 2.
Table 2. Summary of MGD content, September 2017.
Data type | Count |
---|---|
Genes and genome features with nucleotide sequence data | 47 693 |
Genes with protein sequence data | 24 317 |
Genes with human orthologs | 17 089 |
Genes with rat orthologs | 18 509 |
Genes with GO annotations | 24 502 |
Total GO annotations | 312 109 |
Mutant alleles in mice | 51 378 |
Genes with mutant alleles in mice | 12 401 |
Mouse QTL | 6257 |
Genotypes with phenotype annotation (MP) | 60 951 |
Total MP annotations | 315 657 |
Mouse models (genotypes) associated with human diseases | 6027 |
References in the MGD bibliography | 237 578 |
In this report, we highlight several improvements to the capture, annotation, integration and presentation of data associated with mouse models of human disease. These include the incorporation of the Disease Ontology (DO), new disease detail pages and an ontology browser with listings of associated genes and mouse models, improved ontology vocabulary browsers and incorporation of a Human Phenotype Ontology (HPO) browser. We added human disease-to-phenotype relationships from Orphanet, which expands the number of human diseases annotated to HPO terms. We now include phenotype data from the Deciphering the Mechanisms of Developmental Disorders (DMDD) project. Finally, we are participating with the new Alliance of Genome Resources member groups in efforts to create a new data resource for comparative biology (http://www.alliancegenome.org).
NEW FEATURES AND IMPROVEMENTS
The three primary areas of improved and new functionality in MGD center on disease and phenotype annotation and the user interfaces used to deliver these annotations to the biomedical research community.
Disease Ontology (DO) is incorporated into MGD
The Disease Ontology (DO) is a community effort to provide standard terms for annotating phenotypic data (15). It is a hierarchical ontology, built on a Directed Acyclic Graph (DAG) structure, that integrates vocabularies from MeSH, ICD, NCI Thesaurus, SNOMED, UMLS, Orphanet, EFO and OMIM. Its hierarchical structure permits a range of detail from high-level, broadly descriptive terms to low-level, very specific terms. This range is useful for annotating mouse model data to the level of detail known and for searching for this information using either whole systems or specific terms as search criteria.
We have adopted the use of Disease Ontology in MGD to annotate mouse models of human disease and participated in updating and making new additions to DO. Previously we relied on disease titles from the Online Mendelian Inheritance in Man (OMIM) database (16) for annotations of mouse models of human disease. This approach limited the scope of disease terms because of the focus of OMIM on human diseases with familial inheritance. Furthermore, the OMIM term list does not exist in a complete hierarchical structure, limiting search and retrieval efficiency. Existing MGI mouse-human disease model annotations to OMIM terms have been translated to DO terms. The new MGD Disease Ontology Browser (Figure 1) allows users to navigate the ontology and see associated genes and mouse models, either in a tree or graphical view. Links to other disease vocabularies and ontologies, including OMIM, are provided when available. The additional genes tab and models tab for terms show all data for the selected term and also for any of the more specific subclasses in the ontology. The MGI Quick Search field, Human-Mouse Disease Connection, and other advanced query forms will support searches using the DO terms or IDs.
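The roll-up behaviour described above, where a term's genes and models tabs also include data annotated to more specific subclasses, can be sketched as a descendant traversal over a DAG. This is a minimal illustration under stated assumptions, not MGD's implementation: the term IDs (`A` to `E`), the model names, and the function names are all hypothetical placeholders and do not correspond to real DO terms or MGD records.

```python
from collections import defaultdict

# Hypothetical toy ontology: child term -> list of parent terms (is_a edges).
# Term "D" has two parents, which makes this a DAG rather than a tree.
PARENTS = {
    "B": ["A"],
    "C": ["A"],
    "D": ["B", "C"],
    "E": ["D"],
}

# Hypothetical direct annotations: term -> set of mouse model names.
ANNOTATIONS = {
    "B": {"model1"},
    "D": {"model2"},
    "E": {"model2", "model3"},
}


def build_children(parents):
    """Invert the child -> parents map into a parent -> children map."""
    children = defaultdict(set)
    for child, parent_list in parents.items():
        for parent in parent_list:
            children[parent].add(child)
    return children


def descendants(term, children):
    """Return every term reachable below `term`, visiting each term once."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def models_for(term, parents, annotations):
    """Collect models annotated to `term` or to any of its subclasses."""
    terms = {term} | descendants(term, build_children(parents))
    result = set()
    for t in terms:
        result |= annotations.get(t, set())
    return result
```

Because results are collected into a set, a model annotated to a term with several parents (here `D`) is reported only once when querying a shared ancestor such as `A`.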
Ontology browser redesign
Improvements to the Mammalian Phenotype (MP), Gene Ontology (GO), and Adult Mouse Anatomy browsers include a new format and new search options, among them an autocomplete option to assist in finding relevant terms (Figure 2). Tree views include easy navigation with links to superclasses and to annotated mouse phenotype, function and expression data. Toggles in the tree structure allow expansion or collapse of particular sections of interest.
MGD has also implemented a new browser created by MGI that features the Human Phenotype Ontology (17) developed by Peter Robinson and colleagues (http://www.informatics.jax.org/vocab/hp_ontology). Users can either browse or search the HPO browser to view terms, definitions and links to disease detail pages that are associated with the phenotypic feature at the Human-Mouse Disease Connection (HMDC) portal.
Improvements to the Human-Mouse Disease Connection (HMDC)
The Human-Mouse Disease Connection (HMDC, http://www.diseasemodel.org) is a translational tool designed for exploring and comparing human and mouse phenotypes and their associations with known human diseases. It also provides rapid access to mouse model resources and supporting references. Searches can be initiated based on human or mouse data using one or more parameters, including genes, genomic locations, phenotypes and diseases. New features and data include searches by Disease Ontology terms to group and display disease classes, and the incorporation of human disease and phenotype relationships from Orphanet (http://www.orpha.net). These new data sets add to the existing OMIM disease-to-phenotype data from the HPO project (17) implemented previously.
Inclusion of Deciphering the Mechanisms of Developmental Disorders (DMDD) data
MGD now integrates mouse embryonic mutant phenotype data generated by the Deciphering the Mechanisms of Developmental Disorders (DMDD) project (18). MGD currently has information for 63 mouse lines, and more phenotype data for ∼200 additional lines will be added as they become available. All of the DMDD mouse lines are derived from International Knockout Mouse Consortium (IKMC)’s Knockout Mouse Project (KOMP) or European Conditional Mouse Mutagenesis Program (EUCOMM) ES cells (19) or CRISPR-induced lines (20). Mouse mutations are annotated to MP terms by DMDD and are shown in parallel with other published and submitted data on these mutations. Expression data for these lines are also available from the Gene Expression Database (GXD) at MGI (10). Shown in Figure 3 is an example of DMDD data submitted for the Slc20a2<tm1a(EUCOMM)Wtsi> mutation together with data submitted by IMPC and from published literature curated at MGI.
Other enhancements
In addition to the major enhancements described above, several minor enhancements were implemented. Researchers can now search for genome features that are still on unlocalized and unplaced contigs in the reference genome assembly (GRCm38). MGD data loads from IMPC were modified to load and integrate phenotype data generated from CRISPR mutations in addition to ES cell line knockout mutations, increasing the number of phenotype-genotype annotations in MGD. MGD also updated the load of GO annotations from the GO Consortium site to use the new Gene Product Association Data (GPAD, http://www.geneontology.org/page/gene-product-association-data-gpad-format) annotation file format, which supports inclusion of additional metadata, in contrast to the previously used Gene Association File (GAF) format.
MGD and the Alliance of Genome Resources
MGD is one of the founding members of the Alliance of Genome Resources (AGR), a new data resource integration effort among the major model organism database (MOD) groups and the Gene Ontology Consortium (GOC). The founding members of the AGR are the Gene Ontology Consortium, Mouse Genome Database (MGD), FlyBase, WormBase, Saccharomyces Genome Database (SGD), Rat Genome Database (RGD) and Zebrafish Information Network (ZFIN). The AGR will work to standardize access to and display of common data types from different model organisms to better support comparative biology for biomedical researchers. The formation of the AGR builds on the collaborative activities between the MODs and GO over several years seeking to enhance data integration, exchange, and the use of common data standards. Now these groups will merge key activities and data representations, coordinating data retrieval and analysis within the comparative perspective. The initial release of the public web portal for the AGR (http://www.alliancegenome.org) is scheduled for October 2017. Among the data types to be included in the initial release are gene details such as gene name and symbol, genomic location, orthology, function annotations, and disease associations. Longer-term goals of the Alliance include adding other model organisms, data types, and analysis tools within a common shared infrastructure.
IMPLEMENTATION AND PUBLIC ACCESS
The primary MGI database (‘production’) is a highly normalized relational database designed and optimized for data integration and incremental updating. This database is the locus of data loads and ongoing expert data curation. It resides in a PostgreSQL server behind a firewall and is not accessible by the public. In contrast, our public web interface is backed by a combination of highly denormalized databases (also in PostgreSQL) and Solr/Lucene indexes, designed for high-performance query and display in a read-only environment. The front-end data stores are refreshed from the production master on a weekly basis. The separation between the public and production (private) architectures provides a large measure of flexibility in project planning, as either side can (and often does) change without affecting the other.
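The general idea of deriving a flattened, read-only store from a normalized curation database can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module purely for portability (the source states that MGD uses PostgreSQL), and every table, column and value below is a hypothetical placeholder rather than part of the actual MGI schema.

```python
import sqlite3


def build_demo():
    """Create a toy normalized schema, then derive a flat read-only table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Normalized side: each fact is stored once and linked by keys.
        CREATE TABLE marker (marker_key INTEGER PRIMARY KEY, symbol TEXT);
        CREATE TABLE term (term_key INTEGER PRIMARY KEY, label TEXT);
        CREATE TABLE annotation (marker_key INTEGER, term_key INTEGER);

        INSERT INTO marker VALUES (1, 'GeneA'), (2, 'GeneB');
        INSERT INTO term VALUES (10, 'term x'), (11, 'term y');
        INSERT INTO annotation VALUES (1, 10), (1, 11), (2, 10);

        -- Public side: joins are precomputed once, at refresh time.
        CREATE TABLE public_annotation AS
            SELECT m.symbol AS symbol, t.label AS label
            FROM annotation a
            JOIN marker m ON m.marker_key = a.marker_key
            JOIN term t ON t.term_key = a.term_key;
        CREATE INDEX idx_public_symbol ON public_annotation (symbol);
        """
    )
    return conn


def lookup(conn, symbol):
    """Read path: a single-table lookup, no joins needed at query time."""
    rows = conn.execute(
        "SELECT label FROM public_annotation WHERE symbol = ?", (symbol,)
    )
    return sorted(row[0] for row in rows)
```

In a design of this kind the flat table can be dropped and rebuilt on each refresh, so the normalized schema is free to evolve as long as the refresh step still produces the columns the read path expects.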
MGD broadcasts data in a variety of ways to support basic research communities, clinical researchers and advanced users interested in programmatic or bulk access. MGD provides free public web access to data from http://www.informatics.jax.org. The web interface provides a simple ‘Quick Search’, which is available from all web pages in the system and is the most used entry point for users. Various advanced query forms are provided to support precise parameter searching. Data may be retrieved from most results pages by downloading text or Excel files, or by forwarding results to the Batch Query or MouseMine analysis tools (see below).
MGD offers batch querying interfaces for data retrieval for users wishing to retrieve data in bulk. The Batch Query tool (http://www.informatics.jax.org/batch) (21) is used for retrieving bulk data about lists of genome features. Feature identifiers can be typed in or uploaded from a file. Gene IDs from MGI, NCBI GENE, Ensembl, VEGA, UniProt and other resources can be used. Users can choose the information set they wish to retrieve, such as genome location, GO annotations, list of mutant alleles, MP annotations, Reference SNP IDs or Disease Ontology (DO) terms. Results are returned as a web display, or in tab delimited text or Excel format. Results may also be forwarded to MouseMine (see below).
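Tab-delimited output of this kind can be post-processed with standard tools. The sketch below groups annotation terms by gene symbol using Python's `csv` module. The column headers (`Input`, `Symbol`, `Term`) and the sample rows are hypothetical placeholders chosen for illustration; the headers in an actual Batch Query download depend on the output options selected and should be checked against the file itself.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tab-delimited sample; not real MGD data.
SAMPLE = (
    "Input\tSymbol\tTerm\n"
    "ID-1\tGeneA\tterm x\n"
    "ID-1\tGeneA\tterm y\n"
    "ID-2\tGeneB\tterm x\n"
)


def group_terms_by_symbol(handle, key="Symbol", value="Term"):
    """Group the `value` column by the `key` column, preserving row order.

    A gene with several annotations appears on several rows, so the
    values are accumulated into one list per gene.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    grouped = defaultdict(list)
    for row in reader:
        if row[value]:  # skip rows where the annotation column is empty
            grouped[row[key]].append(row[value])
    return dict(grouped)
```

For a downloaded file, the same function can be called on a handle opened with `open(path, newline="")` instead of the in-memory `io.StringIO(SAMPLE)`.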
MGD data access is available through MouseMine (http://www.mousemine.org) (13), an instance of InterMine that offers flexible querying, templates, iterative querying of results and linking to other model organism InterMine instances. MouseMine access is also available via a RESTful API, with client libraries in Perl, Python, Ruby, Java and JavaScript.
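As a rough sketch of programmatic access, the code below assembles a request URL for an InterMine-style query using only the Python standard library; it does not contact any server. The service root, the `/query/results` endpoint, the `genomic` model name and the `Gene.*` paths follow general InterMine web service conventions as we understand them and are assumptions here, not details taken from this article. They should be verified against the MouseMine documentation, and the official InterMine client library may be preferable in practice.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr

# Assumed service root, following the usual InterMine URL layout.
SERVICE_ROOT = "http://www.mousemine.org/mousemine/service"


def build_query_xml(symbol):
    """Build a path-query XML string selecting one gene by symbol.

    quoteattr() escapes the value and wraps it in quotes, so the symbol
    cannot break out of the XML attribute.
    """
    return (
        '<query model="genomic" '
        'view="Gene.primaryIdentifier Gene.symbol Gene.name">'
        '<constraint path="Gene.symbol" op="=" value=%s/>'
        "</query>" % quoteattr(symbol)
    )


def build_results_url(symbol, fmt="tab"):
    """Return a URL that would request query results in the given format."""
    params = {"query": build_query_xml(symbol), "format": fmt}
    return SERVICE_ROOT + "/query/results?" + urlencode(params)
```

The resulting URL could then be fetched with any HTTP client. Building the URL separately from fetching it keeps the query construction easy to inspect and test offline.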
This year, MGD retired its FTP server as part of a set of back-end upgrades. The files that were accessible via FTP are now downloadable from http://www.informatics.jax.org/downloads/. MGD provides a large set of regularly updated database reports from this site. Direct SQL access to a read-only copy of the database is also offered (contact MGI User Support for an account). MGI User Support is also available to assist users in generating customized reports on request.
OUTREACH
MGD User Support staff provide on-site help and training on the use of MGD and other MGI data resources, as well as off-site workshop/tutorial programs (roadshows) that include lectures, demos and hands-on tutorials. To inquire about hosting an MGD roadshow, email mgi-help@jax.org.
Members of the MGD User Support team can be contacted via email, web requests, phone or fax.
World wide web: http://www.informatics.jax.org/mgihome/support/mgi_inbox.shtml
Facebook: https://www.facebook.com/mgi.informatics
Twitter: https://twitter.com/mgi_mouse and https://twitter.com/hmdc_mgi
Email access: mgi-help@jax.org
Telephone access: +1 207 288 6445.
Fax access: +1 207 288 6830.
CITING MGD
For a general citation of the MGI resource, researchers should cite this article. In addition, the following citation format is suggested when referring to datasets specific to the MGD component of MGI: Mouse Genome Database (MGD), MGI, The Jackson Laboratory, Bar Harbor, Maine (URL: http://www.informatics.jax.org). Include the date (month, year) on which you retrieved the data cited.
MOUSE GENOME DATABASE GROUP
A. Anagnostopoulos, A. Andrews, R.M. Baldarelli, J.S. Beal, S.M. Bello, O. Blodgett, N.E. Butler, K. Christie, L.E. Corbani, H.J. Drabkin, R. Espinoza, J. Franco, S.L. Giannatto, P. Hale, D.P. Hill, L. Hutchins, M. Law, J.R. Lewis, M. McAndrews, N. Mez, D. Miers, H. Motenko, L. Ni, H. Onda, M. Perry, J.M. Recla, D.J. Reed, B. Richards-Smith, D. Sitnikov, M. Tomczuk, L. Wilming and Y. Zhu.
ACKNOWLEDGEMENTS
We are grateful to our collaborators at Disease Ontology (DO) and Rat Genome Database (RGD) for providing updated disease ontology files and help with revisions and additions to the ontology. The use of Human Phenotype Ontology and Human Phenotype Annotation files from Dr Peter Robinson (The Jackson Laboratory) is gratefully acknowledged.
FUNDING
National Institutes of Health (NIH)/National Human Genome Research Institute (NHGRI) [U41 HG000330, R25 HG007053 to M.G.D., U41 HG002223 to Alliance for Genome Resources]. Funding for open access charge: NIH/NHGRI [U41 HG000330].
Conflict of interest statement. None declared.
REFERENCES
- 1. Brown F.C., Scott N., Rank G., Collinge J.E., Vadolas J., Vickaryous N., Whitelaw N., Whitelaw E., Kile B.T., Jane S.M. et al. ENU mutagenesis identifies the first mouse mutants reproducing human β-thalassemia at the genomic level. Blood Cells Mol. Dis. 2013; 50:86–92.
- 2. Li Y., Klena N.T., Gabriel G.C., Liu X., Kim A.J., Lemke K., Chen Y., Chatterjee B., Devine W., Damerla R.R. et al. Global genetic analysis in mice unveils central role for cilia in congenital heart disease. Nature. 2015; 521:520–524.
- 3. Ha S., Stottmann R.W., Furley A.J., Beier D.R. A forward genetic screen in mice identifies mutants with abnormal cortical patterning. Cereb. Cortex. 2015; 25:167–179.
- 4. Daxinger L., Harten S.K., Oey H., Epp T., Isbel L., Huang E., Whitelaw N., Apedaile A., Sorolla A., Yong J. et al. An ENU mutagenesis screen identifies novel and known genes involved in epigenetic processes in the mouse. Genome Biol. 2013; 14:R96.
- 5. Smith C.L., Eppig J.T. The Mammalian Phenotype Ontology as a unifying standard for experimental and high-throughput phenotyping data. Mamm. Genome. 2012; 23:653–668.
- 6. Threadgill D.W., Churchill G.A. Ten years of the collaborative cross. Genetics. 2012; 190:291–294.
- 7. Churchill G.A., Gatti D.M., Munger S.C., Svenson K.L. The Diversity Outbred mouse population. Mamm. Genome. 2012; 23:713–718.
- 8. Ring N., Meehan T.F., Blake A., Brown J., Chen C.K., Conte N., Di Fenza A., Fiegel T., Horner N., Jacobsen J.O.B. et al. A mouse informatics platform for phenotypic and translational discovery. Mamm. Genome. 2015; 26:413–421.
- 9. Blake J.A., Eppig J.T., Kadin J.A., Richardson J.E., Smith C.L., Bult C.J., Anagnostopoulos A., Baldarelli R.M., Beal J.S., Bello S.M. et al. Mouse Genome Database (MGD)-2017: Community knowledge resource for the laboratory mouse. Nucleic Acids Res. 2017; 45:D723–D729.
- 10. Finger J.H., Smith C.M., Hayamizu T.F., McCright I.J., Xu J., Law M., Shaw D.R., Baldarelli R.M., Beal J.S., Blodgett O. et al. The mouse Gene Expression Database (GXD): 2017 update. Nucleic Acids Res. 2017; 45:D730–D736.
- 11. Bult C.J., Krupke D.M., Begley D.A., Richardson J.E., Neuhauser S.B., Sundberg J.P., Eppig J.T. Mouse Tumor Biology (MTB): a database of mouse models for human cancer. Nucleic Acids Res. 2015; 43:D818–D824.
- 12. Gene Ontology Consortium. Gene Ontology Consortium: going forward. Nucleic Acids Res. 2015; 43:D1049–D1056.
- 13. Motenko H., Neuhauser S.B., O’Keefe M., Richardson J.E. MouseMine: a new data warehouse for MGI. Mamm. Genome. 2015; 26:325–330.
- 14. Eppig J.T., Motenko H., Richardson J.E., Richards-Smith B., Smith C.L. The International Mouse Strain Resource (IMSR): cataloging worldwide mouse and ES cell line resources. Mamm. Genome. 2015; 26:448–455.
- 15. Schriml L.M., Mitraka E. The Disease Ontology: fostering interoperability between biological and clinical human disease-related data. Mamm. Genome. 2015; 26:584–589.
- 16. Amberger J.S., Bocchini C.A., Schiettecatte F., Scott A.F., Hamosh A. OMIM.org: Online Mendelian Inheritance in Man (OMIM), an Online catalog of human genes and genetic disorders. Nucleic Acids Res. 2015; 43:D789–D798.
- 17. Köhler S., Vasilevsky N.A., Engelstad M., Foster E., McMurry J., Aymé S., Baynam G., Bello S.M., Boerkoel C.F., Boycott K.M. et al. The human phenotype ontology in 2017. Nucleic Acids Res. 2017; 45:D865–D876.
- 18. Mohun T., Adams D.J., Baldock R., Bhattacharya S., Copp A.J., Hemberger M., Houart C., Hurles M.E., Robertson E., Smith J.C. et al. Deciphering the Mechanisms of Developmental Disorders (DMDD): a new programme for phenotyping embryonic lethal mice. Dis. Model Mech. 2013; 6:562–566.
- 19. Bradley A., Anastassiadis K., Ayadi A., Battey J.F., Bell C., Birling M.C., Bottomley J., Brown S.D., Bürger A., Bult C.J. et al. The mammalian gene function resource: the International Knockout Mouse Consortium. Mamm. Genome. 2012; 23:580–586.
- 20. Rosen B., Schick J., Wurst W. Beyond knockouts: the International Knockout Mouse Consortium delivers modular and evolving tools for investigating mammalian genes. Mamm. Genome. 2015; 26:456–466.
- 21. Eppig J.T., Blake J.A., Bult C.J., Kadin J.A., Richardson J.E., Anagnostopoulos A., Babiuk R.P., Baldarelli R.M., Beal J.S., Bello S.M. et al. The Mouse Genome Database (MGD): Facilitating mouse as a model for human biology and disease. Nucleic Acids Res. 2015; 43:D726–D736.