Author manuscript; available in PMC: 2020 Sep 1.
Published in final edited form as: Curr Protoc Bioinformatics. 2019 Sep;67(1):e85. doi: 10.1002/cpbi.85

Using MARRVEL v1.2 for bioinformatics analysis of human genes and variant pathogenicity

Julia Wang 1, Dongxue Mao 2, Fatima Fazal 3, Seon-Young Kim 4, Shinya Yamamoto 5, Hugo Bellen 6, Zhandong Liu 7*
PMCID: PMC6750039  NIHMSID: NIHMS1037046  PMID: 31524990

Abstract

One of the greatest challenges in the bioinformatic analysis of human sequencing data is identifying which variants are pathogenic. Numerous databases and tools have been generated to address this challenge. However, these useful data and tools are spread out, requiring users to search for their variants of interest across human genetics databases, variant function prediction tools, and model organism databases. To solve this problem, we collected data from and observed human geneticists, clinicians, and model organism researchers to carefully select and display valuable information that facilitates the evaluation of whether or not a variant is likely pathogenic. This program, Model organism Aggregated Resources for Rare Variant ExpLoration (MARRVEL) v1.2, allows users to collect relevant data from 27 public sources for efficient bioinformatic analysis and prioritization of human variants.

Keywords: Genetics, Genomics databases, Variants of unknown significance, Genes of unknown significance, Model organisms

INTRODUCTION

As sequencing technology evolves, more genes and variants of unknown significance come to the attention of clinicians and human geneticists who try to understand the cause of genetic disorders. Many databases and prediction algorithms are useful for assessing the significance of genes and variants and for establishing a hypothesis that can be experimentally tested. However, compiling these dispersed data usually demands either a lot of time or bioinformatics skills. Model organism Aggregated Resources for Rare Variant ExpLoration (MARRVEL) (Wang et al., 2017) provides a simple, easy-to-use tool for non-computational users interested in gathering data dispersed throughout dozens of tools and databases across the World Wide Web.

This protocol includes methods to search MARRVEL v1.2 starting with a human gene and variant or human gene only. In addition, methods to start the search with a gene of interest in key model organisms are discussed. Finally, the support protocol describes how to use the MARRVEL API.

STRATEGIC PLANNING

Initiating a MARRVEL search requires only a gene name and/or variant information. The most likely source of error is either an outdated gene symbol or incorrect variant nomenclature. If you are unsure, please refer to the HUGO Gene Nomenclature Committee (HGNC) (Povey et al., 2001) for the correct human gene symbol, Mutalyzer (den Dunnen, 2016; Wildeman, van Ophuizen, den Dunnen, & Taschner, 2008) for the correct variant (nucleic acid) nomenclature, and TransVar (Zhou et al., 2015) for the correct variant (amino acid) nomenclature.

Please also note that MARRVEL v1.2 uses hg19/GRCh37, so variant information based on hg38/GRCh38 must be converted to hg19/GRCh37 using UCSC LiftOver (https://genome.sph.umich.edu/wiki/LiftOver) before initiating a variant-based search.

BASIC PROTOCOL 1

HUMAN GENE AND VARIANT INITIATED SEARCH

The purpose of this method is to query human genetics databases (OMIM, ExAC, gnomAD, Geno2MP, ClinVar, DGV, and DECIPHER), key eukaryotic genetic model organism and gene ontology databases (SGD, PomBase, WormBase, FlyBase, ZFIN, MGI, RGD, GO Central), and others (DIOPT, dbNSFP, Mutalyzer, TransVar, GTEx, Human Protein Atlas) using MARRVEL v1.2 (updated February 2019). Please see Table 1 for a comprehensive list and descriptions of each database/tool.

Table 1:

List of all databases and tools curated by MARRVEL v1.2

Name of Database | URL/Link to Database | Rationale for Inclusion into MARRVEL | Reference (PMID)
Human Genetics Databases
OMIM (Online Mendelian Inheritance in Man) | https://omim.org/ | The three main pieces of information that we draw from OMIM are: gene function, associated phenotypes, and reported alleles. It is helpful to know if a gene is associated with a known Mendelian phenotype (# entries) whose molecular basis is known. Genes without this knowledge are candidates for novel gene discovery. For genes that do have a known association, if the patient’s phenotype does not match the reported disease and phenotype or those of the patients in the literature, then this increases the opportunity to provide a phenotypic expansion for the gene of interest. | PMID: 28654725
gnomAD | http://gnomad.broadinstitute.org/ | gnomAD contains a total of 123,136 exome sequences and 15,496 whole-genome sequences from unrelated individuals sequenced as part of various disease-specific and population genetic studies. A significant portion of ExAC data is integrated into gnomAD. In MARRVEL, we currently display the population frequencies that pertain to the specific variant. | PMID: 27535533
ExAC | http://exac.broadinstitute.org/ | ExAC contains more than 60,000 exomes and is, other than gnomAD (http://gnomad.broadinstitute.org/), the largest public collection of exomes that have been selected against individuals with severe early-onset Mendelian phenotypes. For MARRVEL’s purposes, ExAC and gnomAD serve as the best control population datasets to calculate minor allele frequency. We provide two sets of outputs from ExAC. The first output is the gene-centric overview of the expected versus observed number of missense and loss-of-function (LOF) alleles. A metric called pLI (probability of LOF Intolerance), which ranges between 0.00 and 1.00, reflects the selective pressure on certain variants before reproductive age. A pLI score of 1.00 means that the gene is very intolerant of any LOF variants and that haploinsufficiency of this gene may cause disease in humans. The second output is data from ExAC that pertains to the specific variant. If an identical variant is seen in ExAC, MARRVEL will display the minor allele frequency. | PMID: 27535533
ClinVar | https://www.ncbi.nlm.nih.gov/clinvar/ | ClinVar is a public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. Variants with interpretations reported by researchers and clinicians are valuable for analyzing how likely a variant is to be pathogenic. | PMID: 29165669
Geno2MP (Genotype to Mendelian Phenotype) | http://geno2mp.gs.washington.edu/Geno2MP/ | Geno2MP is a collection of samples from the University of Washington Center for Mendelian Genomics. It contains ~9,650 exomes of affected individuals and unaffected relatives. This database links phenotypic as well as mode-of-inheritance information to specific alleles. For phenotype, by comparing the affected organ system of the patient of interest to the affected individuals in Geno2MP, one may find potential matches. A match in allele, mode of inheritance, and phenotype provides an increased probability that the variant is likely pathogenic. However, due to the small sample size, a negative association does not necessarily decrease a variant’s pathogenic priority. A mechanism to contact the primary physician of a patient of interest is provided in the original source. | N/A
DECIPHER | https://decipher.sanger.ac.uk/ | The DECIPHER data displayed on MARRVEL includes common variants from the control population. The data displayed includes structural variants that cover the genomic location of the input variant. DECIPHER also contains variant and phenotypic information for affected individuals, but this can only be accessed directly through their website. | PMID: 19344873
DGV (Database of Genomic Variants) | http://dgv.tcag.ca/dgv/app/home | To our knowledge, DGV is the largest public-access collection of structural variants from more than 54,000 individuals. The database includes samples of reportedly healthy individuals, at the time of ascertainment, from up to 72 different studies. Possible limitations of these data include variation in the source and method of data acquisition, the lack of information regarding incomplete penetrance of pathogenic CNVs, and uncertainty about whether individuals will develop associated diseases subsequent to data collection. | PMID: 24174537
Model Organisms Databases
FlyBase (Drosophila) | http://flybase.org | MARRVEL collects and displays data from multiple model organism databases. We provide a summary of the molecular, cellular and biological function of the gene using GO terms. The most likely ortholog is derived by DIOPT. | PMID: 26467478
IMPC (International Mouse Phenotyping Consortium) (mouse) | http://www.mousephenotype.org/ | MARRVEL provides a hyperlink to corresponding mouse gene pages on the IMPC website. If a knock-out mouse has been made by the IMPC, an exhaustive list of assays and their results is made publicly available and can provide insight into the phenotype when a gene is lost. Some information is curated in MGI, but there may be a time lag. | PMID: 27626380
MGI (Mouse Genome Informatics) (mouse) | http://www.informatics.jax.org/ | MARRVEL collects and displays data from multiple model organism databases. We provide a summary of the molecular, cellular and biological function of the gene using GO terms. The most likely ortholog is derived by DIOPT. | PMID: 25348401
Monarch Initiative | https://monarchinitiative.org/ | MARRVEL provides a link to the Phenogrid of a human gene on Monarch Initiative. This grid provides comparisons between the phenotype of model organisms and known human diseases. | PMID: 27899636
PomBase (fission yeast) | https://www.pombase.org/ | MARRVEL collects and displays data from multiple model organism databases. We provide a summary of the molecular, cellular and biological function of the gene using GO terms. The most likely ortholog is derived by DIOPT. | PMID: 22039153
WormBase (C. elegans) | http://wormbase.org | MARRVEL collects and displays data from multiple model organism databases. We provide a summary of the molecular, cellular and biological function of the gene using GO terms. The most likely ortholog is derived by DIOPT. | PMID: 26578572
ZFIN (zebrafish) | https://zfin.org/ | MARRVEL collects and displays data from multiple model organism databases. We provide a summary of the molecular, cellular and biological function of the gene using GO terms. The most likely ortholog is derived by DIOPT. | PMID: 26097180
RGD (Rat Genome Database) (rat) | https://rgd.mcw.edu/ | MARRVEL collects and displays data from multiple model organism databases. We provide a summary of the molecular, cellular and biological function of the gene using GO terms. The most likely ortholog is derived by DIOPT. | PMID: 25355511
GTEx (The Genotype-Tissue Expression Project) | https://gtexportal.org/home/ | MARRVEL displays both mRNA and protein expression patterns in human tissues for each gene. The expression pattern can add insight into the phenotypes observed in patients and/or model organisms. | PMID: 29019975, 23715323
The Human Protein Atlas | https://www.proteinatlas.org/ | MARRVEL displays both mRNA and protein expression patterns in human tissues for each gene. The expression pattern can add insight into the phenotypes observed in patients and/or model organisms. | PMID: 21752111
GO (Gene Ontology) Central | http://www.geneontology.org/ | MARRVEL displays only Gene Ontology (GO) terms (Molecular Function, Cellular Component, and Biological Process) derived from experimental evidence for each gene. They are filtered by “experimental evidence codes”; GO terms based on “computational analysis evidence codes” and “electronic annotation evidence codes” (predictions) are excluded. | PMID: 10802651, 25428369
SGD (Saccharomyces Genome Database) (budding yeast) | https://www.yeastgenome.org/ | MARRVEL collects and displays data from multiple model organism databases. We provide a summary of the molecular, cellular and biological function of the gene using GO terms. The most likely ortholog is derived by DIOPT. | PMID: 22110037
Nomenclature/Identifier Databases
HGNC (HUGO Gene Nomenclature Committee) | https://www.genenames.org/ | HGNC official gene symbols are used for MARRVEL searches. | PMID: 27799471
DIOPT (DRSC Integrative Ortholog Prediction Tool) | https://www.flyrnai.org/cgi-bin/DRSC_orthologs.pl | DIOPT provides a multiple protein sequence alignment of the best predicted orthologs in six model organisms against the protein sequence of the human gene of interest. The alignment provides information on the conservation of specific amino acids as well as functional protein domains. | PMID: 21880147
Mutalyzer | https://mutalyzer.nl/ | MARRVEL uses Mutalyzer’s API to convert different variant nomenclatures to genomic location. | PMID: 18000842
TransVar | https://bioinformatics.mdanderson.org/transvar/ | MARRVEL uses TransVar to convert protein (amino acid) variant nomenclature to genomic location nomenclature, which is used by most of the databases that MARRVEL links to. | PMID: 26513549
Gene2Function | http://www.gene2function.org/search/ | MARRVEL collaborates with DIOPT and Gene2Function to provide the “Model Organism Search” feature. A hyperlink is provided for users to access the Gene2Function website, which integrates a number of model organism databases and displays them in a different style from how MARRVEL does. | PMID: 28663344
Ensembl | https://useast.ensembl.org/ | Ensembl gene IDs are used to link the different databases. | PMID: 29155950
Miscellaneous Databases
PubMed | https://www.ncbi.nlm.nih.gov/pubmed/ | MARRVEL provides a hyperlink to a “Gene”-based PubMed search. Clicking this link allows one to search biomedical papers that refer to the gene of interest based on previous gene names and symbols. | N/A
dbNSFP | http://varianttools.sourceforge.net/Annotation/dbNSFP | MARRVEL uses dbNSFP to provide pathogenicity prediction scores. | PMID: 21520341

A case example will be used to illustrate how each of the elements may be useful to the user. The entire background of the case can be found in Ansar et al. (2018). In summary, homozygous loss-of-function variants in the gene Dynamin-binding protein (DNMBP) cause bilateral infantile cataracts. DNMBP:p.Arg271* is one of the variants identified as pathogenic in this autosomal recessive disease. In addition to the pathogenic variant reported in Ansar et al. (2018), a benign variant, DNMBP:p.Cys1413Trp, will be used to illustrate what to expect when a benign variant is queried. When these variants lead to negative data, additional examples will be used to illustrate alternative results that are possible.

Necessary Resources

Hardware

A computer with internet access.

Software

Web browser (Chrome, Firefox, etc).

Steps and Annotations
  • 1

    Navigate to http://marrvel.org on your web browser

  • 2

    Click on either “Gene” or “Protein Variant” according to the nomenclature of your variant (Figure 1).

    For “Gene”, enter HGNC (Povey et al., 2001) gene name and Human Genome Variation Society (HGVS) standard variant notation (den Dunnen, 2016).

    The search bar for “Gene” is compatible with two types of variant nomenclature: genomic location and transcript-based nomenclature. For genomic location nomenclature, use the coordinates according to hg19/GRCh37. Then, click on “SEARCH.” MARRVEL uses Mutalyzer for this function (den Dunnen, 2016; Wildeman et al., 2008).

    For example, 6:99365567 T>C / FBXL4 or 6:99365567 T>C or FBXL4 or NM_012160.3:c.541A>G

    For “Protein”, enter your variant in the following format: GENE NAME:p.[Reference Amino Acid][Amino Acid location][Variant Amino Acid].

    For example, TK2:p.Leu178Met, IRF2BPL:p.P372R, IRF2BPL:p.Gln126*. Then, click on “SEARCH.”

    When the amino acid change is ambiguous and can be matched to multiple transcripts and genomic locations, MARRVEL will provide the options for you to select the appropriate transcript. Then, click “MARRVEL IT.”

    Note that for Protein Variant, you can only search for missense or stop-gain variants. For all other variants, please use the nucleotide change nomenclature and the “Gene” search.
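The input formats described in this step can be checked before a query is pasted into the search bar. The following is a minimal sketch, not MARRVEL's own parser: the regular expressions and the `classify_query` function are illustrative assumptions built only from the example strings given above, and the combined "variant / gene" form is deliberately not handled.

```python
import re

# Hypothetical patterns inferred from the examples in this protocol;
# they are NOT taken from MARRVEL's source code.
GENOMIC = re.compile(r"^(?:chr)?(\d{1,2}|X|Y|MT?):(\d+)\s*([ACGT]+)>([ACGT]+)$", re.IGNORECASE)
TRANSCRIPT = re.compile(r"^(N[MR]_\d+(?:\.\d+)?):(c\.\S+)$")
PROTEIN = re.compile(r"^([A-Za-z0-9-]+):p\.([A-Za-z]{1,3})(\d+)([A-Za-z]{1,3}|\*)$")
GENE_ONLY = re.compile(r"^[A-Za-z][A-Za-z0-9-]*$")


def classify_query(text):
    """Return which search style a query string resembles.

    "genomic" and "transcript" correspond to the "Gene" search bar,
    "protein" to the "Protein Variant" search bar, and "gene" to a
    gene-only search. Anything else is reported as "unrecognized".
    """
    query = text.strip()
    if GENOMIC.match(query):
        return "genomic"
    if TRANSCRIPT.match(query):
        return "transcript"
    if PROTEIN.match(query):
        return "protein"
    if GENE_ONLY.match(query):
        return "gene"
    return "unrecognized"


if __name__ == "__main__":
    for example in ["6:99365567 T>C", "NM_012160.3:c.541A>G",
                    "TK2:p.Leu178Met", "IRF2BPL:p.Gln126*", "FBXL4"]:
        print(example, "->", classify_query(example))
```

A check like this only catches formatting slips; it cannot tell whether a gene symbol is current or whether coordinates are on hg19/GRCh37, so HGNC, Mutalyzer, and TransVar remain the authoritative references.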

Figure 1: Annotated screenshot of search page.

A) Click on “Model Organism Search” to start a search based on a gene in a model organism. The user will be directed to a page to select the model organism of interest and then enter the gene symbol. Suggestions of gene symbols will be available as the user types.

B) For variants with amino acid/protein nomenclature, click on “Protein Variant”. Users will see a single search bar where both the gene and variant can be entered. The search bar accepts either three letter amino acid code or single letter amino acid code. When there is ambiguity for which isoform the user is referring to, all options will be listed for the user to select.

C) For “About MARRVEL”, “FAQ”, “Feedback”, and “API”, click on “About” on the top menu. “About MARRVEL” includes information on the development team, acknowledgements of data sources, and more. “FAQ” includes the answers to frequently asked questions. “Feedback” page allows users to report errors or provide suggestions for new features in future updates. “API” page provides instructions on how to access MARRVEL API.

D) After searching for DNMBP:p.Cys1413Trp from the “Protein Variant” search bar, a box titled “Reverse Annotation Candidates” displays the gene name, genomic coordinate of the variant, variant type, transcripts, and a link to proceed to the MARRVEL results page.

Case example:

To search for the two variants, DNMBP:p.Arg271* and DNMBP:p.Cys1413Trp, click on “Protein Variant” and enter one of the variants. An intermediate page will display the genomic coordinates of the variant and the corresponding isoform. Click on “MARRVEL IT”.

The results page will contain data from all the databases that MARRVEL queries.

  • 3

    Click on OMIM (Online Mendelian Inheritance in Man)(Amberger, Bocchini, Scott, & Hamosh, 2019) from the menu on the left to navigate to OMIM data boxes as a starting point to determine if your gene of interest is associated with a human disease.

    Locate the “Human Gene Description” box from OMIM for a short summary of what is known about the gene and gene product.

    Locate the “Gene-Phenotype Relationships” box to determine if this gene is a known phenotype-associated gene or not.

    Locate the “Reported Alleles From OMIM” box to get a list of pathogenic variants curated by OMIM.

    Note that OMIM may still miss recently reported disease associations. Thus, we recommend that users also conduct PubMed searches.

Case example:

For both DNMBP:p.Arg271* and DNMBP:p.Cys1413Trp, the results from OMIM will be the same. Since this is a new disease only recently reported in 2018, OMIM has not yet curated it as a Gene-Phenotype Relationship. In the left box titled “Human Gene Description,” DNMBP is described as a protein involved in the regulation of cell junctions (Figure 2).

Figure 2: OMIM, gnomAD, and ExAC.

A) Searching for DNMBP:p.Cys1413Trp, MARRVEL’s result page shows that there is no known Gene-Phenotype Relationship curated by OMIM, meaning that the gene is not yet associated with a known disease in OMIM. Furthermore, this variant is commonly found in the control population database gnomAD. There are 16868 individuals who are homozygous for this variant, indicating that this variant is most likely benign.

B) Searching for IRF2BPL, MARRVEL’s result page shows that it is associated with a known autosomal dominant disease called Neurodevelopmental disorder with regression, abnormal movements, loss of speech, and seizures. In the Control Population Gene Summary box for the gnomAD database, the pLI score is 0.96 and the o/e score is 0.11, with an upper bound of 0.33. This means that this gene is highly intolerant of loss-of-function and that the loss of one copy of this gene may lead to haploinsufficiency phenotypes.

On the other hand, searching for the gene IRF2BPL (Figure 2) shows that it is associated with an autosomal dominant disease.

  • 4

    Click on gnomAD/ExAC on the left menu to access ExAC (Exome Aggregation Consortium) (Lek et al., 2016) and gnomAD (genome Aggregation Database) (Karczewski et al., 2019) to determine the prevalence of your variant of interest in a large population database.

    ExAC and gnomAD are large population genomics databases based on Whole Exome Sequencing (WES) and Whole Genome Sequencing (WGS) of people who are selected to exclude severe pediatric diseases. MARRVEL displays data from both sources. Both ExAC and gnomAD can be used as a “control population database”, especially for severe pediatric disorders, but their interpretation requires some degree of caution.

    Note that gnomAD includes most of the data in ExAC since gnomAD is an updated release of ExAC.

    Use the “Control Population Gene Summary” box to obtain gene-level statistics such as the probability of finding loss-of-function (LOF) alleles in the general population. Two important metrics are the upper bound of the confidence interval for the LOF observed/expected (o/e) ratio and the pLI (probability of LOF Intolerance) score (Karczewski et al., 2019). These metrics measure how intolerant a gene is to LOF variants.

    Locate the next two boxes titled “Population Allele Frequencies (ExAC Database)” and “Population Allele Frequencies (gnomAD Database)” to obtain the allele frequencies of the variant of interest in ExAC (Lek et al., 2016) and gnomAD (Karczewski et al., 2019), respectively. These boxes will only be displayed when the user inputs variant information when performing a MARRVEL search.

Case example:

DNMBP:p.Arg271* is not found in gnomAD, indicating that it is a very rare variant, which increases the likelihood that it is pathogenic.

For DNMBP:p.Cys1413Trp, there are 16868 individuals who are homozygous in the Population Allele Frequencies (gnomAD Database) box. This is critical information that indicates this variant is not likely to cause a disease (Figure 2).

When searching for the gene IRF2BPL, an important metric to note is the upper bound of the confidence interval of the loss-of-function o/e score. When this number is below 0.35, the gene is considered highly intolerant of loss-of-function; for IRF2BPL it is 0.33.
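The threshold reasoning in this case example can be written as a one-line check. This is a minimal sketch: the function name `is_lof_intolerant` and the constant `OE_UPPER_BOUND_CUTOFF` are hypothetical, and only the 0.35 cut-off and the IRF2BPL value of 0.33 come from the text above.

```python
# Cut-off quoted in the case example above; the names below are illustrative.
OE_UPPER_BOUND_CUTOFF = 0.35


def is_lof_intolerant(oe_upper_bound, cutoff=OE_UPPER_BOUND_CUTOFF):
    """Return True when the upper bound of the LOF o/e confidence
    interval falls below the cut-off, i.e. the gene is considered
    highly intolerant of loss-of-function variants."""
    if oe_upper_bound < 0:
        raise ValueError("o/e upper bound cannot be negative")
    return oe_upper_bound < cutoff


if __name__ == "__main__":
    # IRF2BPL: upper bound of 0.33, as shown in Figure 2B.
    print(is_lof_intolerant(0.33))
```

A cut-off of this kind is a screening aid rather than a diagnosis; a gene just above the threshold can still be disease-relevant, so the value is best read alongside pLI and the phenotype data.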

  • 5

    Locate the Pathogenicity Prediction Scores box to determine how likely it is that a variant is pathogenic (Figure 2).

    MARRVEL obtains variant function prediction scores from the database of human nonsynonymous SNPs and their functional predictions (dbNSFP) (Liu, Jian, & Boerwinkle, 2011). The scores we display include:
    1. Combined Annotation Dependent Depletion (CADD) (Rentzsch, Witten, Cooper, Shendure, & Kircher, 2019) uses 63 annotations (949 features) to predict how damaging a variant is likely to be. CADD_phred is displayed on MARRVEL, where the range is from 0 (least damaging) to 50 (most damaging).
    2. Rare Exome Variant Ensemble Learner (REVEL) (Ioannidis et al., 2016) is an ensemble method of 13 tools that predicts the pathogenicity of missense variants. The range of the output is 0 (least damaging) to 1 (most damaging).
    3. Mendelian Clinically Applicable Pathogenicity score (M-CAP) (Jagadeesh et al., 2016) uses conservation data and is trained on data from mutations linked to Mendelian diseases. The possible outputs are: “Tolerated” or “Damaging.”
    4. PolyPhen-2 (Adzhubei, Jordan, & Sunyaev, 2013) has two different scores that MARRVEL displays:
      1. PolyPhen-2 HumDiv uses eight sequence-based and three structure-based predictive features. It is trained with Mendelian disease mutation data and single nucleotide variation (SNV) data from close mammalian homolog proteins. The possible outputs are: “Benign”, “Possibly Damaging”, and “Probably Damaging”.
      2. PolyPhen-2 HumVar uses eight sequence-based and three structure-based predictive features. It is trained with disease-associated and common SNVs. The possible outputs are: “Benign”, “Possibly Damaging”, and “Probably Damaging”.
    5. Genomic Evolutionary Rate Profiling (GERP++) (Davydov et al., 2010) uses multiple protein sequence alignments and a phylogenetic tree of 34 mammalian species to report how conserved the amino acid of interest is. The range of the score is −12.3 (least conserved) to 6.17 (most conserved).
    6. phyloP 100way Vertebrate (Ramani, Krumholz, Huang, & Siepel, 2018) uses multiple alignments and a phylogenetic tree of 100 vertebrate species to report how conserved the amino acid of interest is. The range of the score is −20.0 (least conserved) to 10.003 (most conserved).

    For each prediction/conservation output, a Rank Score is provided by dbNSFP to allow comparison between different scores. It is between 0 and 1 and a score of 0.9 means it is more likely to be damaging than 90% of all potential non-synonymous single nucleotide variations (nsSNVs) predicted by that method.
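To make the Rank Score idea concrete, the sketch below computes the fraction of background scores that fall below a given raw score. It is a simplified illustration under stated assumptions, not dbNSFP's exact procedure: the function `rank_score` is hypothetical, the background list is toy data, ties are ignored, and a higher raw score is assumed to mean "more damaging".

```python
from bisect import bisect_left


def rank_score(raw_score, background_scores):
    """Fraction of background scores strictly below raw_score.

    A result of 0.9 means the variant scores higher than 90% of the
    background variants for that method (simplified; the real dbNSFP
    ranking covers all potential nsSNVs and handles ties).
    """
    ordered = sorted(background_scores)
    if not ordered:
        raise ValueError("background_scores must not be empty")
    return bisect_left(ordered, raw_score) / len(ordered)


if __name__ == "__main__":
    toy_background = list(range(1, 11))  # toy data, not real predictions
    print(rank_score(10, toy_background))
```

Because every method is mapped onto the same 0 to 1 scale, Rank Scores from tools with very different raw ranges, such as CADD and REVEL, can be compared side by side.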

Case example:

Please refer to Figure 3 for a screenshot of the results.

Figure 3: Screenshot of Single-Nucleotide Variant Functional Prediction results.

A) Benign variant example DNMBP:p.Cys1413Trp. The name of each prediction tool is listed on the left column and the corresponding scores are listed in the middle column. On the last column, the Rank Scores from dbNSFP are listed, providing an overview and comparison of all of the scores. Here, it indicates that all six prediction tools assign the variant as likely benign.

B) Pathogenic variant example DNMBP:p.Arg271*. Since this variant is an early stop, some of the prediction tool results are not available. For example, REVEL and PolyPhen-2 only provide scores for missense variants.

The Single-Nucleotide Variant Functional Prediction box lists the scores that each prediction tool provides for the variants. CADD indicates that DNMBP:p.Arg271* is likely highly damaging, with a score of 37 (range: 0–50) and a Rank Score of 0.97, which places it in the 97th percentile of predicted pathogenicity among all potential nsSNVs scored by CADD. GERP++ and phyloP both indicate that this residue is moderately well-conserved throughout evolution.

For the other variant, DNMBP:p.Cys1413Trp, the Rank Scores from the six tools fall between the 4.7th and 34.5th percentiles of predicted pathogenicity among all variants, indicating that this variant is likely benign.

  • 6

    Refer to data from Genotype to Mendelian Phenotype Browser (Geno2MP) (http://geno2mp.gs.washington.edu) in the “Disease population” and “Gene-Phenotype Relationships” boxes to check if there are other individuals with rare variants in the gene of interest and their phenotypic descriptions.

    Geno2MP contains about 9,600 exomes of individuals with a rare disease and their unaffected relatives enrolled in the University of Washington Center for Mendelian Genomics study (Chong et al., 2015). Some crude phenotypic descriptions are also provided.

    Locate the “Disease population” box to obtain the allele frequency of the variant of interest.

    Locate the “Gene-Phenotype Relationships” box to obtain Human Phenotype Ontology (HPO) terms (Kohler et al., 2017) for the individuals with the variant of interest.

Case example:

No matches with DNMBP:p.Arg271* or DNMBP:p.Cys1413Trp are found in Geno2MP.

However, if the user searches for variant 6:99365567 T>C in the gene FBXL4 (Figure 4), there are indeed individuals with this variant of interest and the respective HPO (Human Phenotype Ontology) profiles are listed in the “Gene-Phenotype Relationships (Geno2MP)” box.

Figure 4: Geno2MP and ClinVar.

Searching for the variant 6:99365567 T>C in the gene FBXL4, Geno2MP and ClinVar show the following.

A) In Geno2MP, there are seven Human Phenotype Ontology (HPO) profiles listed on the right. In addition, there are no individuals homozygous for the variant of interest and nine individuals heterozygous for the variant of interest.

B) However, only three out of the seven HPO profiles are actually from affected individuals.

C) In ClinVar, there are forty known variants that are pathogenic and disease-causing. All variants are listed below in a table.

D) Notice that the top variants are all highlighted in blue. These are the variants that are at or include the genomic coordinate of your variant of interest. This usually results in highlighting the copy number variants covering your variant. Thus, the single nucleotide variants are displayed only after scrolling past the copy number variants.

  • 7

    Refer to data from ClinVar (Landrum & Kattman, 2018) in the “Reported Alleles From ClinVar” box to check for the clinical significance of the variant of interest.

    ClinVar is a database supported by the National Institutes of Health (NIH) where researchers and clinicians submit variants with or without a determination of pathogenicity. Information on both SNVs and CNVs is collected in ClinVar.

    Locate the top row of colorful boxes (in green, blue, yellow, and orange) to review a summary of the number of each type of variant reported in ClinVar.

    Check the list of variants below in the box “Reported Alleles From ClinVar.” Note that if a variant was included in the initial search, the highlighted variants in teal are all variants that include the genomic location of the variant of interest. For many CNVs, the region will likely include many more genes than your gene of interest. The columns “Clinical Significance” and “Review Status” can inform you of the significance of the variant and how certain that designation is. Under “Clinical Significance” look for pathogenic and under “Review Status” look for multiple reviewers and no conflict.
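The two-column check described above ("Clinical Significance" and "Review Status") amounts to a simple filter. The sketch below is illustrative only: the record layout, the field names, the `EXAMPLE_RECORDS` entries, and the function `confident_pathogenic` are all hypothetical and do not represent MARRVEL's or ClinVar's data model; the review status strings follow the wording ClinVar commonly uses, which is an assumption here.

```python
# Hypothetical rows mimicking the "Reported Alleles From ClinVar" table;
# these are placeholders, not real ClinVar entries.
EXAMPLE_RECORDS = [
    {"variant": "hypothetical variant 1",
     "clinical_significance": "Pathogenic",
     "review_status": "criteria provided, multiple submitters, no conflicts"},
    {"variant": "hypothetical variant 2",
     "clinical_significance": "Pathogenic",
     "review_status": "criteria provided, single submitter"},
    {"variant": "hypothetical variant 3",
     "clinical_significance": "Benign",
     "review_status": "criteria provided, multiple submitters, no conflicts"},
]


def confident_pathogenic(records):
    """Keep rows called pathogenic by multiple submitters without conflict."""
    selected = []
    for record in records:
        significance = record["clinical_significance"].strip().lower()
        status = record["review_status"].lower()
        if (significance == "pathogenic"
                and "multiple submitters" in status
                and "no conflicts" in status):
            selected.append(record)
    return selected


if __name__ == "__main__":
    for row in confident_pathogenic(EXAMPLE_RECORDS):
        print(row["variant"])
```

Rows that fail this filter are not necessarily benign; a single-submitter pathogenic call may still merit follow-up, just with less certainty.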

Case example:

Only copy number variants are reported in ClinVar so there are no matches with DNMBP:p.Arg271* or DNMBP:p.Cys1413Trp.

However, if the user searches for variant 6:99365567 T>C in the gene FBXL4 (Figure 4) and scrolls further down in the ClinVar table, there are missense variants reported to be pathogenic and associated with Mitochondrial DNA depletion syndrome 13, with the criteria and submission status provided for each variant.

  • 8

    Click on DGV/DECIPHER on the left menu to access data from the Database of Genomic Variants (DGV) (MacDonald, Ziman, Yuen, Feuk, & Scherer, 2014) and DECIPHER_CONTROL (Firth et al., 2009) to check if your gene of interest is included in copy number variations present in the control population.

    DGV (Database of Genomic Variants, http://dgv.tcag.ca/dgv/app/home) and DECIPHER (DatabasE of genomiC varIation and Phenotype in Humans using Ensembl Resources, https://decipher.sanger.ac.uk/) are both collections of CNVs. DGV includes samples of reportedly healthy individuals, at the time of ascertainment, from up to 72 different studies.

    For DECIPHER, MARRVEL v1.2 displays population copy number variants (DECIPHER_CONTROL). In future updates, MARRVEL will display patient-derived data from DECIPHER (DECIPHER_DISEASE). For the time being, we recommend that users of MARRVEL v1.2 visit the original DECIPHER website to access potentially pathogenic CNV information.

Case example:

By clicking on the “Loss” column twice, control individuals with loss of DNMBP will appear on the top of the list (Figure 5). Only two entries have more than one allele with the loss of DNMBP. Users can select the respective references to read about the original study. In this case, even the studies with three alleles of DNMBP deletion have only a 0.00148075 allele frequency. This indicates that loss of DNMBP may cause disease.

Figure 5: Control copy number variants from DGV and DECIPHER.

A) Click on the “Loss” column twice to sort the variants by the largest number of loss of copy number variants. Only two entries have more than one allele with the loss of DNMBP.

B) The studies with three alleles of DNMBP deletion have a 0.00148075 allele frequency.

C) Searching for IRF2BPL:p.Pro372Arg, DECIPHER shows that there is one individual in the control population with a duplication of the gene.

Searching for another variant, IRF2BPL:p.Pro372Arg, DECIPHER’s control population (Common Copy Number Variants) does contain one individual with a duplication of that location (Figure 5).

  • 3
    Refer to data from “Gene Function Table” to obtain information on gene or protein expression patterns and Gene Ontology (GO) terms associated with the gene of interest in human (Carithers & Moore, 2015; The Gene Ontology, 2019; Thul & Lindskog, 2018), rat (Shimoyama et al., 2015), mouse (Law & Shaw, 2018), zebrafish (Ruzicka et al., 2018), fruit fly (Thurmond et al., 2019), nematode worm (Grove et al., 2018), budding yeast (Cherry, 2015) and fission yeast (Lock, Rutherford, Harris, & Wood, 2018).
    1. Gene name: Each gene name is hyperlinked to gene pages on respective model organism databases which provide further phenotypic information and the resources available for each model organism.
    2. PubMed link: Click on the PubMed link for a list of publications that relates to the gene of interest in each organism.
    3. DRSC Integrative Ortholog Prediction Tool (DIOPT) score (Hu et al., 2011): Check this column for the number of ortholog prediction algorithms that predict the specific model organism gene as an ortholog of the human gene of interest. A DIOPT score of 3 or above can be used as a reasonable cut-off to identify solid ortholog candidates. However, there are cases where genuine orthologs only have a DIOPT score of 1 due to limited homology. Users should keep in mind that one human gene may correspond to multiple model organism genes and vice versa. At the top of the “Gene Function Table”, un-check the “Show only best DIOPT score gene” box to display all potential candidates.
    4. Expression: Locate this column for a list of tissues where the gene or protein of interest has been reported to be expressed.
    5. GO terms: All GO terms are filtered by “experimental evidence codes.”
    6. Other links such as Monarch Initiative and IMPC:
      1. The “Monarch Initiative” (Mungall et al., 2017) hyperlink brings the user to the Phenogrid page for the specific human gene, a chart that provides phenotypic comparisons between human conditions linked to the gene and model organism genes (not necessarily the orthologs) that have similar conditions. For more information on Monarch Phenogrid, please visit https://monarchinitiative.org/.
      2. If a mouse gene has a knockout made and phenotypically characterized by the International Mouse Phenotyping Consortium (IMPC) (Munoz-Fuentes et al., 2018), the “IMPC” links to the page that details the phenotype of the knockout mouse and its availability from public stock centers. For more information on IMPC, please visit http://www.mousephenotype.org/.
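The DIOPT cut-off described in item 3 can be expressed as a small filter. This is only a sketch: the `OrthologCandidate` record, its field names, and the example rows are hypothetical and do not reflect MARRVEL's internal data model; the threshold of 3 and the "best score only" toggle are taken from the text above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OrthologCandidate:
    """Hypothetical record mirroring one row of the Gene Function Table."""
    organism: str
    gene: str
    diopt_score: int
    best_score: bool  # True if this is the best-scoring gene for the organism


def select_orthologs(candidates: List[OrthologCandidate],
                     min_score: int = 3,
                     best_only: bool = False) -> List[OrthologCandidate]:
    """Keep candidates at or above the cut-off, optionally best-scoring only."""
    kept = [
        c for c in candidates
        if c.diopt_score >= min_score and (c.best_score or not best_only)
    ]
    # Group by organism, highest score first within each organism.
    return sorted(kept, key=lambda c: (c.organism, -c.diopt_score))


# Made-up rows for illustration only; gene names are placeholders.
candidates = [
    OrthologCandidate("fly", "geneA", 9, True),
    OrthologCandidate("fly", "geneB", 2, False),
    OrthologCandidate("worm", "geneC", 4, True),
    OrthologCandidate("worm", "geneD", 3, False),
]

for c in select_orthologs(candidates):
    print(c.organism, c.gene, c.diopt_score)
```

Because genuine orthologs can score as low as 1, lowering `min_score` (for example `select_orthologs(candidates, min_score=1)`) mirrors un-checking the “Show only best DIOPT score gene” box and reviewing all candidates by hand.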

Case example

In the Gene Function Table, homologs of DNMBP in model organisms are listed (Figure 6). Notably, very little is known about this gene in humans and mice, but in zebrafish, fruit flies, worms, and yeast, there are several molecular functions and biological processes that DNMBP is reported to play a role in. This provides users with a starting point if they would like to pursue animal models for further study.

Figure 6: Gene Function Table.

The Gene Function Table has seven columns and several link-out buttons. The columns are: organism name, homolog, DIOPT score, expression, molecular function, cellular component, and biological process.

A) Clicking on the “Show all/GTEx” button under Expression for Human provides a pop-up with a bar graph displaying the expression levels of the gene of interest in human tissues.

B) Clicking on the “Show all” button under Expression for Drosophila provides a pop-up with boxes with color-coded expression levels of the gene of interest in each fly tissue in either adult or larval stage.

C) Clicking on the IMPC link leads the user to a page that describes the phenotype for the knock out mouse of the gene of interest (if available).

In addition, users can check where the gene is expressed by checking the Expression column and clicking the buttons in the column. For example, GTEx data will be displayed after the user clicks on the “Show all/GTEx” button. Then, data from the Protein Atlas and GTEx can be browsed. GTEx shows that this gene, DNMBP, is broadly expressed in all human tissues (Figure 6).

  • 3

    Locate the “Human Protein Domains” table to identify annotated domains in the gene of interest.

    Refer to the “human gene protein domains” box to obtain predicted protein domains of the human gene from Ensembl (Letunic & Bork, 2018) and Uniprot (Mitchell et al., 2019).

Case example:

From the “Human Gene Protein Domains” box, DNMBP has four Src homology 3 domains of Dynamin Binding Protein and a RhoGEF domain (Figure 7). These annotations provide clues to the molecular function of the protein and help answer the question of whether the variants of interest fall within protein domains. In this case, DNMBP:p.Arg271* is within the fourth Src homology domain, whereas DNMBP:p.Cys1413Trp is not within any protein domain.
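Checking whether a variant position falls inside an annotated domain is an interval lookup. The sketch below uses only the one domain whose coordinates are given in this protocol (the Src homology 3 domain at amino acids 247 to 296, Figure 7A); the `Domain` type and `domain_at` helper are hypothetical names introduced for illustration, and a real check would use the full domain table shown on MARRVEL.

```python
from typing import NamedTuple, Optional, Sequence


class Domain(NamedTuple):
    name: str
    start: int  # first residue, 1-based, inclusive
    end: int    # last residue, 1-based, inclusive


def domain_at(position: int, domains: Sequence[Domain]) -> Optional[Domain]:
    """Return the first domain that contains the residue, or None."""
    for domain in domains:
        if domain.start <= position <= domain.end:
            return domain
    return None


# Only this interval is stated in the protocol; the list is incomplete.
domains = [Domain("Src homology 3", 247, 296)]

for residue in (271, 1413):
    hit = domain_at(residue, domains)
    print(residue, hit.name if hit else "no listed domain")
```

With this single-entry list, a result of "no listed domain" only means the residue is outside the one interval entered here; the statement that p.Cys1413Trp lies outside all domains comes from the MARRVEL table itself.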

Figure 7: Human Gene Protein Domains and Multiple Protein Alignment.

A) One of the domains of DNMBP is a Src Homology 3 domain that is located at amino acids number 247 to 296.

B) Gene symbol of the gene queried.

C) Link-out to DIOPT, the source of the multiple alignment.

D) List of organisms included in the multiple alignment: hs (Homo sapiens), mm (Mus musculus), rn (Rattus norvegicus), dr (Danio rerio), dm (Drosophila melanogaster), ce (Caenorhabditis elegans), sp (Schizosaccharomyces pombe).

E) Numbering of the amino acid at the end of each row.

F) Enter the number for the amino acids of interest in the first two boxes and select the organisms of interest. The amino acids will be highlighted in blue.

  • 3

    Refer to the “Multiple Protein Alignment” box to assess the conservation level of your variant of interest (Figure 3).

    The amino acid multiple alignment is generated by DIOPT (Hu et al., 2011) using the MAFFT FFT-NS-2 (v7.305b) aligner (Nakamura, Yamada, Tomii, & Katoh, 2018) and includes human (hs), rat (rn), mouse (mm), zebrafish (dr), fruit fly (dm), worm (ce), and yeasts (sc and sp). To highlight the amino acid of interest, enter the amino acid numbers and select the organism(s) of interest in this box.

    Note that the alignment generally uses the longest transcript of each gene (usually but not always the canonical transcript), which is not necessarily the transcript of interest.

Case example:

By searching for amino acid 271 in the multiple alignment, we see that DNMBP:p.Arg271 is conserved from humans to rat, zebrafish, and worm but not in mice, fruit flies, or yeast (Figure 7). This helps the user decide which model organisms may be useful for further study of this variant.

In comparison, DNMBP:p.Cys1413 is conserved from humans to mouse, rat, and fly but not other model organisms.
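The conservation check in the case example amounts to locating the alignment column of a given residue and comparing the other species at that column. The following sketch assumes the alignment is available as gapped strings keyed by species code; the toy sequences are invented for illustration and are not DNMBP sequences, and the helper names are hypothetical.

```python
from typing import Dict

GAP = "-"


def residue_to_column(aligned_seq: str, residue_number: int) -> int:
    """Map a 1-based ungapped residue number to a 0-based alignment column."""
    seen = 0
    for column, char in enumerate(aligned_seq):
        if char != GAP:
            seen += 1
            if seen == residue_number:
                return column
    raise ValueError("residue number exceeds sequence length")


def conserved_in(alignment: Dict[str, str], reference: str,
                 residue_number: int) -> Dict[str, bool]:
    """For each other species, report whether it matches the reference residue."""
    column = residue_to_column(alignment[reference], residue_number)
    ref_residue = alignment[reference][column]
    return {
        species: seq[column] == ref_residue
        for species, seq in alignment.items()
        if species != reference
    }


# Toy alignment, invented for illustration only.
toy_alignment = {
    "hs": "MK-RLA",
    "mm": "MKQKLA",
    "dr": "M--RLA",
}

print(conserved_in(toy_alignment, "hs", 3))
```

Counting only non-gap characters matters because residue numbers refer to the unaligned protein, while columns in the display include gaps; this is also why the transcript used for the alignment must match the transcript used to number the variant.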

ALTERNATE PROTOCOL 1 (optional)

HUMAN GENE-ONLY SEARCH

For searches that start with only the human gene name and no variant of interest, please use this protocol.

Necessary Resources

Hardware

A computer with internet access.

Software

Web browser (Chrome, Firefox, etc).

Protocol steps—Step annotations
  1. Enter the HGNC gene symbol of your gene of interest (Povey et al., 2001) in the “Human gene symbol” search bar on MARRVEL.org.

  2. Click on SEARCH.

  3. Refer to Basic Protocol 1 Steps 3–11 for interpretation of data.

    Note that some data are only displayed when a variant is entered. For example, the allele frequency from gnomAD, ExAC, and Geno2MP won’t be displayed without variant information. The pathogenicity prediction algorithms will also not be available. For Geno2MP, a summary of the types of variants present in this database will be displayed instead. For ClinVar, there will be no variants highlighted.

ALTERNATE PROTOCOL 2 (optional)

MODEL ORGANISM GENE SEARCH

If you are interested in the information that MARRVEL gathers on the predicted human ortholog of a specific model organism gene, this protocol explains how to search MARRVEL starting with the model organism gene.

One reason why the search result may be negative is an incorrect gene symbol. Users should go to the specific model organism database to determine what the latest gene nomenclature is and try different gene aliases if necessary. In other cases, if the search result is negative, there may be no good predicted orthologs for the gene of interest.

Necessary Resources

Hardware

A computer with internet access.

Software

Web browser (Chrome, Firefox, etc).

Protocol steps—Step annotations
  1. Navigate to http://marrvel.org/model.

    One can go to this page by clicking the “Model Organism Search” tab on the top of the MARRVEL1.2 homepage.

  2. Select the model organism of interest and enter a gene symbol.

  3. Click on the gene symbol and then click search.

  4. Click on the “MARRVEL it” button for the predicted human ortholog of choice.

    The “DIOPT score,” the number of ortholog prediction algorithms that predict the gene as an ortholog of the human gene of interest (Hu et al., 2011), and “Best score from Human gene to model organism?” help in selecting the human genes that are likely to be true orthologs of the model organism gene of interest. For more information on DIOPT scores, please visit…

    When “Best score from Human gene to model organism?” is “Yes,” it indicates that the human gene is more likely to be a true human ortholog of the gene of interest. Note that due to evolutionary genome duplications and other phenomena, the relationships between orthologs are often not 1:1 and are often 1:2 or 1:4, etc.

  5. Follow Basic Protocol 1 Steps 3–11 for interpretation of data. Similar to the gene-only search, some data will not be displayed because a variant is not entered when starting with a model organism gene.

    Note that some data are only displayed when a variant is entered. For example, the allele frequency from gnomAD, ExAC, and Geno2MP won’t be displayed without variant information. The pathogenicity prediction algorithms will also not be available. For Geno2MP, a summary of the types of variants present in this database will be displayed instead. For ClinVar, there will be no variants highlighted.

SUPPORT PROTOCOL 1 (optional)

Using MARRVEL API

MARRVELv1.2 also provides Application Programming Interfaces (APIs) that allow users to retrieve the data aggregated by MARRVEL, such as OMIM, ExAC, and gnomAD, from their own applications without searching through the HTML interface. The link to the API documentation can be found in the navigation bar of the MARRVEL home page (MARRVEL API tab). We also provide a Python code example for users who want to call the API from Python.

Necessary Resources

Hardware

A computer with internet access.

Software

Web browser (Chrome, Firefox, etc).

Protocol steps—Step annotations
  1. Navigate to http://marrvel.org/doc/

    We have listed an example query and response for each database included in MARRVEL.

  2. For example, to see the URL and the parameters to query OMIM, click on /OMIM/
    1. Click on Request to see the Query Parameters.
    2. Click on Response to see the list of the output and example result.
  3. Find the database or information you are interested in and send an HTTP request to the URL with the proper method and parameters.

  4. To view the Python example of how to fetch data from MARRVEL through the API, click on “Python example” on the page http://marrvel.org/doc/
    1. Import the required Python libraries.
    2. Find the database or information you are interested in and execute the code for that database.
    3. Replace the string assigned to gene_id with another gene symbol of interest to see the corresponding result for that gene. Please be aware that these examples only show how to get the data and convert them to a DataFrame. You will need to modify the code slightly if you wish to collect data from the MARRVEL API in a different format or variable type.
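The request-and-convert pattern in steps 3 and 4 can be sketched with the Python standard library. This is not the official MARRVEL example: the base URL `https://example.org/api/` and the path template `omim/gene/{symbol}` are placeholders, and the real URL, method, and parameters must be copied from the documentation page. The helper names are hypothetical, and the response is assumed to be JSON.

```python
import json
from urllib.parse import quote, urljoin
from urllib.request import urlopen

# Placeholders only: replace with the URL and path shown in the API docs.
API_BASE = "https://example.org/api/"
OMIM_PATH = "omim/gene/{symbol}"


def build_url(base: str, path_template: str, **params: str) -> str:
    """Fill the path template with URL-escaped parameters."""
    escaped = {key: quote(str(value), safe="") for key, value in params.items()}
    return urljoin(base, path_template.format(**escaped))


def fetch_json(url: str, timeout: float = 30.0):
    """Send a GET request and decode the JSON body (assumes a JSON response)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def to_records(payload) -> list:
    """Normalise a JSON object or array into a list of row dictionaries."""
    if isinstance(payload, dict):
        return [payload]
    if isinstance(payload, list):
        return payload
    raise TypeError("expected a JSON object or array")


gene_id = "DNMBP"  # replace with the gene symbol of interest
url = build_url(API_BASE, OMIM_PATH, symbol=gene_id)
print(url)
# records = to_records(fetch_json(url))  # run once the real URL is filled in
```

The list returned by `to_records` can be passed directly to `pandas.DataFrame(records)` if a DataFrame is wanted, or written out in another format without the pandas dependency.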

GUIDELINES FOR UNDERSTANDING RESULTS

Anticipated results of these protocols include prioritization of likely pathogenic variants that are found in rare disease patients, elimination of likely benign variants, and collection of human and model organism gene and protein data curated by MARRVEL1.2 via API.

The conclusions that can be drawn from the results of the MARRVEL1.2 output will vary greatly depending on the purpose of each search. Bioinformaticians, human geneticists, model organism scientists, patients, and other users all have different reasons for obtaining data related to a variant and gene of interest through MARRVEL.

The example used throughout the protocol is summarized in Figure 8. Searching for the variant DNMBP:p.Arg271* on MARRVEL tells us that the gene is not associated with any human disease (OMIM has not curated publications in the last few months), that the variant is not seen in the normal population and is predicted to be pathogenic by multiple prediction tools, and that the gene is expressed widely in humans, including neuronal tissues. Together, these data indicate that this variant is a strong candidate for causing the disease in the patient of interest.

Figure 8: Could homozygous loss-of-function variants in DNMBP (p.Arg271*) be the cause of infantile cataracts?

Another example, from the perspective of model organism researchers, is illustrated in Figure 9. Here, the user starts from a fly gene, Ankle2, that the user knows causes decreased brain size in the fly. The human ortholog is ANKLE2, and it is associated with an autosomal recessive disease, which fits with the fly model. Furthermore, this gene is known to function in mitotic nuclear division, phosphatase 2A binding, and nervous system development in other model organisms. This gene contains Ankyrin repeat domains and is expressed in the human brain. These data may give the user a clue to understanding the mechanism underlying the decreased brain size in the fly and provide a link between fly and human phenotypes.

Figure 9: My fly model of Ankle2 loss-of-function shows decreased brain size – is this correlated in humans?

The main limitations of MARRVEL1.2 are that interpretation is left entirely to the user and that, because we aim to provide a concise set of data, much of each underlying dataset is not displayed. We strongly encourage users to consult with experts (e.g., clinicians, human geneticists, model organism biologists) before making any critical decisions using data aggregated and displayed on MARRVEL.

COMMENTARY

Background Information

The impetus for the development of MARRVEL (Model organism Aggregated Resources for Rare Variant ExpLoration) stemmed from our role as model organism researchers in the Undiagnosed Diseases Network (Ramoni et al., 2017; Splinter et al., 2018; Wangler et al., 2017).

As we developed MARRVEL and released our first version in 2017 (MARRVEL1.0), we discovered that in addition to clinicians and human geneticists, many model organism scientists benefit from data gathered in MARRVEL to quickly assess the biological and medical relevance of long lists of genes they obtained from genetic screens, transcriptome datasets such as RNA-seq, and large-scale protein-protein interaction studies. Furthermore, we have interacted with multiple patients’ family members who use MARRVEL to look up information on genes and variants that are found in the exome of a child with a rare disease. We continue to be surprised by the different ways that users find MARRVEL to be useful and will be updating MARRVEL to meet the needs of many users while keeping the website relatively simple and straightforward to use.

Critical Parameters and Troubleshooting

As mentioned in the main protocols, using the correct gene and variant nomenclature may be the main barrier to successful usage of MARRVEL. However, there are great resources such as Mutalyzer (den Dunnen, 2016; Wildeman et al., 2008), TransVar (Zhou et al., 2015), and HGNC (Povey et al., 2001) that can help users identify the issues and correct their search terms.
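Before turning to those resources, a quick local format check can catch obvious typing mistakes in a search term. The patterns below are inferred from the example queries used in this protocol (such as 6:99365567 T>C and DNMBP:p.Arg271*) and are an assumption, not MARRVEL's actual input parser; they do not verify that the gene symbol or the reference base is correct.

```python
import re

# Assumed patterns, modelled on the example queries in this protocol.
GENOMIC = re.compile(r"^(?:chr)?([0-9]{1,2}|X|Y|MT?):(\d+)\s+([ACGT]+)>([ACGT]+)$")
PROTEIN = re.compile(r"^([A-Z][A-Z0-9-]*):p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2}|\*)$")


def classify_query(text: str) -> str:
    """Label a search string as 'genomic', 'protein', or 'unrecognized'."""
    text = text.strip()
    if GENOMIC.match(text):
        return "genomic"
    if PROTEIN.match(text):
        return "protein"
    return "unrecognized"


for query in ("6:99365567 T>C", "DNMBP:p.Arg271*", "DNMBP p.R271X"):
    print(query, "->", classify_query(query))
```

A string labelled "unrecognized" is a prompt to check the nomenclature with a dedicated tool; a string that passes may still describe a variant that does not exist on the transcript MARRVEL uses.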

Suggestions for Further Analysis

Here we list useful literature that users may find helpful for subsequent or parallel analysis of rare disease cases:

Significance Statement.

The goal of MARRVEL.org is to provide user-friendly at-a-glance information from dozens of databases and tools for the purpose of prioritizing rare disease associated variants for further analysis in model organisms. This protocol guides users through the webtool as well as API access to the aggregated data.

ACKNOWLEDGEMENT

The initial development of MARRVEL was supported in part by the Undiagnosed Diseases Network Model Organisms Screening Center through the NIH Common Fund (U54NS093793) and through the NIH Office of Research Infrastructure Programs (ORIP) (R24OD022005).

JW is funded by the Eunice Kennedy Shriver National Institute Of Child Health & Human Development of the National Institutes of Health under Award Number F30HD094503 and The Robert and Janice McNair Foundation McNair MD/PhD Student Scholar Program.

FF is funded by the Stand By ELI Foundation.

SY receives support from the NIH National Institute on Deafness and other Communication Disorders (R01DC014932).

HJB is supported by the NIH National Institute of General Medical Sciences (R01GM067858) and is an Investigator of the Howard Hughes Medical Institute. ZL is supported by the NIH National Institute of General Medical Sciences (R01GM120033), National Institute on Aging (R01AG057339), and the Huffington Foundation.

ZL is funded by the Huffington Foundation, Ting Tsung Wei Fong Chao Foundation, and NIH 5R01GM120033 and 1R01AG057339. DM and SYK are funded by the Huffington Foundation.

LITERATURE CITED

  1. Adzhubei I, Jordan DM, & Sunyaev SR (2013). Predicting functional effect of human missense mutations using PolyPhen-2. Current Protocols in Human Genetics, Chapter 7, Unit7 20. doi: 10.1002/0471142905.hg0720s76
  2. Amberger JS, Bocchini CA, Scott AF, & Hamosh A (2019). OMIM.org: leveraging knowledge across phenotype-gene relationships. Nucleic Acids Research, 47(D1), D1038–D1043. doi: 10.1093/nar/gky1151
  3. Ansar M, Chung HL, Taylor RL, Nazir A, Imtiaz S, Sarwar MT, … Antonarakis SE (2018). Bi-allelic Loss-of-Function Variants in DNMBP Cause Infantile Cataracts. American Journal of Human Genetics, 103(4), 568–578. doi: 10.1016/j.ajhg.2018.09.004
  4. Carithers LJ, & Moore HM (2015). The Genotype-Tissue Expression (GTEx) Project. Biopreserv Biobank, 13(5), 307–308. doi: 10.1089/bio.2015.29031.hmm
  5. Cherry JM (2015). The Saccharomyces Genome Database: A Tool for Discovery. Cold Spring Harb Protoc, 2015(12), pdb top083840. doi: 10.1101/pdb.top083840
  6. Chong JX, Buckingham KJ, Jhangiani SN, Boehm C, Sobreira N, Smith JD, … Bamshad MJ (2015). The Genetic Basis of Mendelian Phenotypes: Discoveries, Challenges, and Opportunities. American Journal of Human Genetics, 97(2), 199–215. doi: 10.1016/j.ajhg.2015.06.009
  7. Davydov EV, Goode DL, Sirota M, Cooper GM, Sidow A, & Batzoglou S (2010). Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Computational Biology, 6(12), e1001025. doi: 10.1371/journal.pcbi.1001025
  8. den Dunnen JT (2016). Sequence Variant Descriptions: HGVS Nomenclature and Mutalyzer. Current Protocols in Human Genetics, 90, 7 13 11–17 13 19. doi: 10.1002/cphg.2
  9. Firth HV, Richards SM, Bevan AP, Clayton S, Corpas M, Rajan D, … Carter NP (2009). DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources. American Journal of Human Genetics, 84(4), 524–533. doi: 10.1016/j.ajhg.2009.03.010
  10. Grove C, Cain S, Chen WJ, Davis P, Harris T, Howe KL, … WormBase C (2018). Using WormBase: A Genome Biology Resource for Caenorhabditis elegans and Related Nematodes. Methods in Molecular Biology, 1757, 399–470. doi: 10.1007/978-1-4939-7737-6_14
  11. Harnish M, Deal S, Wangler M, & Yamamoto S (2019). In vivo functional study of disease-associated rare human variants using Drosophila. Journal of Visualized Experiments.
  12. Hu Y, Flockhart I, Vinayagam A, Bergwitz C, Berger B, Perrimon N, & Mohr SE (2011). An integrative approach to ortholog prediction for disease-focused and other functional studies. BMC Bioinformatics, 12, 357. doi: 10.1186/1471-2105-12-357
  13. Ioannidis NM, Rothstein JH, Pejaver V, Middha S, McDonnell SK, Baheti S, … Sieh W (2016). REVEL: An Ensemble Method for Predicting the Pathogenicity of Rare Missense Variants. American Journal of Human Genetics, 99(4), 877–885. doi: 10.1016/j.ajhg.2016.08.016
  14. Jagadeesh KA, Wenger AM, Berger MJ, Guturu H, Stenson PD, Cooper DN, … Bejerano G (2016). M-CAP eliminates a majority of variants of uncertain significance in clinical exomes at high sensitivity. Nature Genetics, 48(12), 1581–1586. doi: 10.1038/ng.3703
  15. Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, … MacArthur DG (2019). Variation across 141,456 human exomes and genomes reveals the spectrum of loss-of-function intolerance across human protein-coding genes. bioRxiv, 531210. doi: 10.1101/531210
  16. Kohler S, Vasilevsky NA, Engelstad M, Foster E, McMurry J, Ayme S, … Robinson PN (2017). The Human Phenotype Ontology in 2017. Nucleic Acids Research, 45(D1), D865–D876. doi: 10.1093/nar/gkw1039
  17. Landrum MJ, & Kattman BL (2018). ClinVar at five years: Delivering on the promise. Human Mutation, 39(11), 1623–1630. doi: 10.1002/humu.23641
  18. Law M, & Shaw DR (2018). Mouse Genome Informatics (MGI) Is the International Resource for Information on the Laboratory Mouse. Methods in Molecular Biology, 1757, 141–161. doi: 10.1007/978-1-4939-7737-6_7
  19. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, … Exome Aggregation C (2016). Analysis of protein-coding genetic variation in 60,706 humans. Nature, 536(7616), 285–291. doi: 10.1038/nature19057
  20. Letunic I, & Bork P (2018). 20 years of the SMART protein domain annotation resource. Nucleic Acids Research, 46(D1), D493–D496. doi: 10.1093/nar/gkx922
  21. Liu X, Jian X, & Boerwinkle E (2011). dbNSFP: a lightweight database of human nonsynonymous SNPs and their functional predictions. Human Mutation, 32(8), 894–899. doi: 10.1002/humu.21517
  22. Lock A, Rutherford K, Harris MA, & Wood V (2018). PomBase: The Scientific Resource for Fission Yeast. Methods in Molecular Biology, 1757, 49–68. doi: 10.1007/978-1-4939-7737-6_4
  23. MacDonald JR, Ziman R, Yuen RK, Feuk L, & Scherer SW (2014). The Database of Genomic Variants: a curated collection of structural variation in the human genome. Nucleic Acids Research, 42(Database issue), D986–992. doi: 10.1093/nar/gkt958
  24. Manolio TA, Fowler DM, Starita LM, Haendel MA, MacArthur DG, Biesecker LG, … Bult C (2017). Bedside Back to Bench: Building Bridges between Basic and Clinical Genomic Research. Cell, 169(1), 6–12. doi: 10.1016/j.cell.2017.03.005
  25. Mitchell AL, Attwood TK, Babbitt PC, Blum M, Bork P, Bridge A, … Finn RD (2019). InterPro in 2019: improving coverage, classification and access to protein sequence annotations. Nucleic Acids Research, 47(D1), D351–D360. doi: 10.1093/nar/gky1100
  26. Mungall CJ, McMurry JA, Kohler S, Balhoff JP, Borromeo C, Brush M, … Haendel MA (2017). The Monarch Initiative: an integrative data and analytic platform connecting phenotypes to genotypes across species. Nucleic Acids Research, 45(D1), D712–D722. doi: 10.1093/nar/gkw1128
  27. Munoz-Fuentes V, Cacheiro P, Meehan TF, Aguilar-Pimentel JA, Brown SDM, Flenniken AM, … consortium, I. (2018). The International Mouse Phenotyping Consortium (IMPC): a functional catalogue of the mammalian genome that informs conservation. Conservation Genetics, 19(4), 995–1005. doi: 10.1007/s10592-018-1072-9
  28. Nakamura T, Yamada KD, Tomii K, & Katoh K (2018). Parallelization of MAFFT for large-scale multiple sequence alignments. Bioinformatics, 34(14), 2490–2492. doi: 10.1093/bioinformatics/bty121
  29. Povey S, Lovering R, Bruford E, Wright M, Lush M, & Wain H (2001). The HUGO Gene Nomenclature Committee (HGNC). Human Genetics, 109(6), 678–680. doi: 10.1007/s00439-001-0615-0
  30. Ramani R, Krumholz K, Huang YF, & Siepel A (2018). PhastWeb: a web interface for evolutionary conservation scoring of multiple sequence alignments using phastCons and phyloP. Bioinformatics. doi: 10.1093/bioinformatics/bty966
  31. Ramoni RB, Mulvihill JJ, Adams DR, Allard P, Ashley EA, Bernstein JA, … Wise AL (2017). The Undiagnosed Diseases Network: Accelerating Discovery about Health and Disease. American Journal of Human Genetics, 100(2), 185–192. doi: 10.1016/j.ajhg.2017.01.006
  32. Rentzsch P, Witten D, Cooper GM, Shendure J, & Kircher M (2019). CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Research, 47(D1), D886–D894. doi: 10.1093/nar/gky1016
  33. Ruzicka L, Howe DG, Ramachandran S, Toro S, Van Slyke CE, Bradford YM, … Westerfield M (2018). The Zebrafish Information Network: new support for non-coding genes, richer Gene Ontology annotations and the Alliance of Genome Resources. Nucleic Acids Research. doi: 10.1093/nar/gky1090
  34. Shimoyama M, De Pons J, Hayman GT, Laulederkind SJ, Liu W, Nigam R, … Jacob H (2015). The Rat Genome Database 2015: genomic, phenotypic and environmental variations and disease. Nucleic Acids Research, 43(Database issue), D743–750. doi: 10.1093/nar/gku1026
  35. Splinter K, Adams DR, Bacino CA, Bellen HJ, Bernstein JA, Cheatle-Jarvela AM, … Undiagnosed Diseases, N. (2018). Effect of Genetic Diagnosis on Patients with Previously Undiagnosed Disease. New England Journal of Medicine, 379(22), 2131–2139. doi: 10.1056/NEJMoa1714458
  36. The Gene Ontology C (2019). The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Research, 47(D1), D330–D338. doi: 10.1093/nar/gky1055
  37. Thul PJ, & Lindskog C (2018). The human protein atlas: A spatial map of the human proteome. Protein Science, 27(1), 233–244. doi: 10.1002/pro.3307
  38. Thurmond J, Goodman JL, Strelets VB, Attrill H, Gramates LS, Marygold SJ, … FlyBase C (2019). FlyBase 2.0: the next generation. Nucleic Acids Research, 47(D1), D759–D765. doi: 10.1093/nar/gky1003
  39. Wang J, Al-Ouran R, Hu Y, Kim SY, Wan YW, Wangler MF, … Bellen HJ (2017). MARRVEL: Integration of Human and Model Organism Genetic Resources to Facilitate Functional Annotation of the Human Genome. American Journal of Human Genetics, 100(6), 843–853. doi: 10.1016/j.ajhg.2017.04.010
  40. Wangler MF, Yamamoto S, Chao HT, Posey JE, Westerfield M, Postlethwait J, … Bellen HJ (2017). Model Organisms Facilitate Rare Disease Diagnosis and Therapeutic Research. Genetics, 207(1), 9–27. doi: 10.1534/genetics.117.203067
  41. Wildeman M, van Ophuizen E, den Dunnen JT, & Taschner PE (2008). Improving sequence variant descriptions in mutation databases and literature using the Mutalyzer sequence variation nomenclature checker. Human Mutation, 29(1), 6–13. doi: 10.1002/humu.20654
  42. Zhou W, Chen T, Chong Z, Rohrdanz MA, Melott JM, Wakefield C, … Chen K (2015). TransVar: a multilevel variant annotator for precision genomics. Nature Methods, 12(11), 1002–1003. doi: 10.1038/nmeth.3622

RESOURCES