Abstract
DECIPHER (https://www.deciphergenomics.org) is a free web platform for sharing anonymized phenotype‐linked variant data from rare disease patients. Its dynamic interpretation interfaces contextualize genomic and phenotypic data to enable more informed variant interpretation, incorporating international standards for variant classification. DECIPHER supports almost all types of germline and mosaic variation in the nuclear and mitochondrial genome: sequence variants, short tandem repeats, copy‐number variants, and large structural variants. Patient phenotypes are deposited using Human Phenotype Ontology (HPO) terms, supplemented by quantitative data, which is aggregated to derive gene‐specific phenotypic summaries. It hosts data from >250 projects from ~40 countries, openly sharing >40,000 patient records containing >51,000 variants and >172,000 phenotype terms. The rich phenotype‐linked variant data in DECIPHER drives rare disease research and diagnosis by enabling patient matching within DECIPHER and with other resources, and has been cited in >2,600 publications. In this study, we describe the types of data deposited to DECIPHER, the variant interpretation tools, and patient matching interfaces which make DECIPHER an invaluable rare disease resource.
Keywords: genetic disorders, genomic medicine, genotype phenotype correlation, Matchmaker Exchange, rare diseases, variant interpretation, whole‐exome sequencing, whole‐genome sequencing
The DECIPHER web platform supports the sharing and interpretation of rare disease phenotype‐linked variant data to advance diagnosis and research.
1. INTRODUCTION
The population prevalence of rare diseases has recently been estimated to be 3.5%–5.9%, which equates to 263–446 million people affected globally. A large proportion of these rare diseases, approximately 72%, are known to have a genetic basis (Nguengang Wakap et al., 2020). Despite advances in genomic technologies, methods used to determine causal variants, such as whole‐exome sequencing, currently identify the genetic basis of disease for only 25%–47% of patients (Liu et al., 2019; Quaio et al., 2020; Sawyer et al., 2016; Stranneheim et al., 2021). As a result, many patients undergoing diagnostic genetic testing do not receive a molecular diagnosis, and often experience long delays, which have a substantial emotional impact on the family (Miller, 2021) and incur significant healthcare costs (Monroe et al., 2016). A molecular diagnosis has multiple benefits for the patient and their family, including better understanding of the prognosis, personalized treatment, tailored management and surveillance, improved access to health and social care, and increased reproductive choice (Liu et al., 2019; Wright et al., 2018).
The number of rare Mendelian diseases with known molecular etiology is estimated to be 5000–6000 (Hartley et al., 2018); however, for the majority of disease‐associated genes, it is not known which variants are disease‐causing, and which are benign. Different pathogenic variants in the same gene can cause different diseases, for example, variants in FGFR3 can cause multiple diseases including Muenke Syndrome, Hypochondroplasia, Achondroplasia, Camptodactyly Tall Stature and Hearing Loss syndrome (CATSHLS), Lacrimo‐Auriculo‐Dento‐Digital Syndrome (LADD Syndrome), Thanatophoric Dysplasia (types 1 & 2), SADDAN syndrome and Crouzon Syndrome with Acanthosis Nigricans. Different diseases caused by variants in the same gene must be considered distinct due to their disparate clinical presentation and different treatment options. The sharing of patient‐level variants and phenotypes is therefore essential to accelerate our understanding of the molecular basis of genetic disease.
DECIPHER (Bragin et al., 2014; Chatzimichali et al., 2015; Firth et al., 2009; Swaminathan et al., 2012) is a global web‐based platform that shares phenotype‐linked variant data from rare disease patients (Figure 1a). It is freely available via a web interface at https://www.deciphergenomics.org. Approximately 40,000 of the patient records held by DECIPHER have explicit patient consent for open sharing on the website (Figure 1b). These openly shared records contain more than 51,000 variants and more than 172,000 phenotype terms. The integration of this phenotype and variant data enables the discovery of new gene‐disease trait and variant‐disease trait relationships, driving molecular diagnosis and our understanding of human biology. Since DECIPHER was established in 2004, the platform has been used and cited in more than 2,600 published manuscripts.
Patient records in DECIPHER are deposited by academic clinical centers, which are affiliated both to a hospital that oversees the treatment of patients with genetic conditions, and to a local university department of human/clinical genetics. Eligible centers can apply to join DECIPHER using an online application form (https://www.deciphergenomics.org/join/overview). Data from a center is stored within a DECIPHER project, and a senior clinician at that center (clinical coordinator), sometimes in conjunction with a senior clinical scientist (lab coordinator), has the responsibility for approving/rejecting applications from individuals working at that center who wish to access the data in the project.
The platform supports the deposition of genetic and genomic variation (e.g., sequence variants, insertions and deletions, short tandem repeats [STRs], copy‐number variants [CNVs], complex and copy number neutral structural variants), including that observed in genomic conditions. Variant interpretation interfaces are provided, including genome and protein browsers, which contextualize genetic and phenotype information to enable accurate interpretation. These interfaces integrate external data sets such as the Genome Aggregation Database (gnomAD; Karczewski et al., 2020), which can be used to exclude variants seen at appreciable frequency in the general population, in addition to disease relevant data sets such as ClinVar (Landrum et al., 2018) and DECIPHER records themselves. DECIPHER also encourages the use of global standards to promote good practice, including the American College of Medical Genetics and Genomics and Association of Molecular Pathology (ACMG/AMP) guidelines for sequence variant interpretation (Richards et al., 2015) and ACMG/ClinGen technical standards for interpreting CNVs (Riggs et al., 2020).
In the following sections, we present examples of the genotype/phenotype data deposited and shared with the rare disease community. In addition, we present the tools provided by DECIPHER to assess the pathogenicity of variants according to international standards, and the utility of DECIPHER to map the clinically relevant part of the assayable human genome.
2. DECIPHER PATIENT RECORDS
DECIPHER associates variants and phenotypes through individual patient records, each of which is connected to a particular depositing center. DECIPHER itself cannot reidentify individuals, and technical and organizational measures are in place to safeguard data. These measures are reviewed and updated in line with evolving best practices.
On deposition, each patient record is given a DECIPHER Patient ID as a reference, which is shown on the website and forms part of the URL for the patient record (e.g., https://www.deciphergenomics.org/patient/283351—note that URLs of the form https://decipher.sanger.ac.uk/patient/283351 continue to be supported). Each patient record also has an internal ID (e.g., a lab number), which is only displayed to users of the depositing center. The internal ID allows the depositing center (only) to link the record to an individual patient.
Through the DECIPHER platform, it is possible to send a patient's clinician an email to request further information about the patient, for example in the case where there is a potential patient match, or if a researcher is carrying out a functional study on the gene in which that patient's variant is situated. Below we will describe in more detail the clinical and research utility of this notification system.
3. DEPOSITION AND BREADTH OF SHARING
DECIPHER has been carefully designed to ensure that the depth and breadth of sharing are proportionate to the scientific/clinical needs and level of consent. For example, a user who does not belong to a DECIPHER project can only access the openly shared patient data, while data that is visible to registered users who are logged in reflects their project and consortium memberships.
Patient genotype and phenotype data can be deposited to DECIPHER in three ways:
1. Via the web interface for an individual patient's data.
2. By uploading Excel or csv files via the web interface (bulk upload) for data from multiple patients.
3. Using the deposition API to allow programmatic uploading of data and synchronization of data across systems (e.g., synchronization between a center's electronic health records and the patient records in that center's DECIPHER project).
DECIPHER users at the depositing center determine the sharing level of each patient record and variant. Patient records, and individual variants within these records, can be kept private to the depositing center. This allows DECIPHER's tools to be used for assessing variant pathogenicity to inform the conversation with the patient before seeking consent for wider sharing. With explicit patient consent, patient records are shared openly, with the data available to anyone who visits the website. Consent forms approved by the English National Research Ethics Service (NRES) are available to download from the DECIPHER website. Since DECIPHER is an international database, depositing centers must ensure appropriate consent is obtained in accordance with local laws and regulations. DECIPHER also supports consortium sharing. This allows sharing of patient records between a defined group of centers, where there is an expectation of collaboration for patient care, again before explicit patient consent for open sharing has been obtained. DECIPHER currently hosts six consortia, which share more than 63,000 patient records. Consortia include the United Kingdom National Health Service consortium, the Deciphering Developmental Disorders (DDD) consortium which shares research data from the DDD study (Wright et al., 2015), and a data‐sharing consortium covering New South Wales and Western Australia.
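The three sharing levels described above (private to the depositing center, shared within a consortium, shared openly) can be illustrated with a minimal visibility check. This is only a sketch under stated assumptions: the class names, field names, and the `can_view` function are hypothetical and are not taken from DECIPHER's implementation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class Sharing(Enum):
    PRIVATE = "private"        # visible to the depositing center only
    CONSORTIUM = "consortium"  # visible to centers in a named consortium
    OPEN = "open"              # visible to anyone, with explicit patient consent


@dataclass
class Record:
    center: str
    sharing: Sharing
    consortium: Optional[str] = None


@dataclass
class User:
    center: str
    consortia: Set[str] = field(default_factory=set)


def can_view(user: Optional[User], record: Record) -> bool:
    """Return True if the (possibly anonymous) user may see the record."""
    if record.sharing is Sharing.OPEN:
        return True
    if user is None:  # anonymous visitors only see openly shared data
        return False
    if user.center == record.center:  # depositing center always sees its own data
        return True
    if record.sharing is Sharing.CONSORTIUM:
        return record.consortium in user.consortia
    return False
```

In this sketch an anonymous visitor is represented by `None`, so the open-access case is handled before any membership lookup.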
DECIPHER is a live interface and data deposited is available to view, interpret, and share in real time. Patient records can be added and edited iteratively as more information becomes available, for example, additional phenotype terms, the inheritance status of a variant, or new functional data. Depositors are encouraged to ensure complete and accurate data entry, for the benefit of all users of DECIPHER. If a patient is reported in a publication, submitters are requested to add the citation to the patient record to alleviate issues of double‐counting of cases. Information can be added to a record by a clinician and clinical scientist working asynchronously and in different locations.
4. GENETIC DATA
As our knowledge of rare disease genetics develops and the interaction between gene loci is more fully understood, there is a pressing need for the visualization of all types of genetic variation within a single interface. DECIPHER fulfills this need, supporting many types of genetic variation including sequence variants, CNVs, aneuploidy, uniparental disomy (UPD), inversions, insertions, and STRs (Figure 2). The visualization of Complex Genomic Rearrangements is challenging and thus not every genetic rearrangement can yet be supported.
4.1. Variant deposition
Variants are deposited using genomic coordinates. Sequence variants can also be deposited using a relevant subset of HGVS nomenclature (den Dunnen et al., 2016), and will be normalized (left aligned, parsimonious) during the deposition process (Tan et al., 2015). For known STRs, the disease‐relevant STR can be selected from a dropdown in the web interface. Additional information about the variant such as inheritance, genotype, pathogenicity, and contribution to phenotype can also be recorded.
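To make "left aligned, parsimonious" concrete, the following is a minimal sketch of the normalization procedure described by Tan et al. (2015) for a single alternate allele. It assumes the reference sequence is available as an in-memory string and uses 1-based coordinates; the function name and arguments are illustrative and do not represent DECIPHER's own code.

```python
def normalize_variant(ref_seq, pos, ref, alt):
    """Left-align and trim a variant.

    ref_seq: reference sequence of the chromosome (toy string here)
    pos:     1-based position of the first base of `ref`
    ref/alt: reference and alternate alleles
    """
    changed = True
    while changed:
        changed = False
        # Drop a shared rightmost base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both alleles one base to the left.
        if not ref or not alt:
            if pos == 1:  # cannot extend past the start of the sequence
                break
            pos -= 1
            base = ref_seq[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Drop shared leftmost bases while both alleles keep at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt
```

For example, in the toy sequence `GGGCACACACAGGG`, a CA deletion written as `CAC>C` at position 8 and the same deletion written as `ACA>A` at position 5 both normalize to `GCA>G` at position 3, which is what allows equivalent depositions to be recognized as the same variant.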
4.2. Mosaicism
For de novo mosaic variants, it is possible to record the mosaicism observed in each tissue, as a percentage. This information is clinically important as it can help explain the variability of clinical symptoms, for example, the difference between nevus sebaceous and Schimmelpenning syndrome (where extracutaneous abnormalities are present), caused by HRAS and KRAS variants (Groesser et al., 2012).
4.3. Mitochondrial variants
DECIPHER supports the deposition and interpretation of variants in the nuclear and mitochondrial genomes. Mitochondrial diseases are the most common form of inherited neuro‐metabolic disorders and are caused by mutations in the nuclear or mitochondrial genomes. In addition, nuclear genetic factors have been shown to influence clinical outcomes for mitochondrial DNA mutations (Boggan et al., 2019). Thus the display of both genomes in a single interface is clinically important. In DECIPHER it is possible to record homoplasmy or the percentage of heteroplasmy per tissue, which is clinically essential as it has been shown to contribute to disease progression (Grady et al., 2018).
4.4. Variant haplotypes
Variants may work in cis to create or modify a disease allele or in trans to cause a biallelic disorder. For this reason, DECIPHER users can assign variants to a haplotype, for example, for compound heterozygous variants, the variants will be shown as in trans. As our understanding of rare disease genetics improves, the representation of its complexity is becoming even more essential. It is known that genetic modifiers alleviate or exacerbate the severity of the disease (Rahit & Tarailo‐Graovac, 2020) and there are recent examples where rare pathogenic haplotypes have been shown to cause disease, such as an albinism‐causing TYR haplotype (Campbell et al., 2019).
4.5. Pathogenicity predictors
For all sequence variants deposited to DECIPHER, predictions from the Ensembl Variant Effect Predictor (VEP; McLaren et al., 2016) are displayed across all Ensembl/GENCODE transcripts. Predictions include the consequence (e.g., missense, frameshift), the protein change, and several pathogenicity scores: SIFT (Sim et al., 2012), PolyPhen‐2 (Adzhubei et al., 2013), CADD (Kircher et al., 2014), REVEL (Ioannidis et al., 2016), and SpliceAI (Jaganathan et al., 2019). DECIPHER seeks advice from experts in the field and refers to benchmarking studies for pathogenicity predictors (e.g., Gunning et al., 2021) before the inclusion of additional scores, assisting in the application of good practice.
4.6. Reference genome
All genomic information is displayed in the GRCh38 assembly version of the human genome, allowing the most up‐to‐date genome and transcript information to be used to enable accurate variant interpretation. The display of genomic data in GRCh38 permits DECIPHER to promote the use of Matched Annotation from NCBI and EMBL‐EBI (MANE) transcripts, for which the paired RefSeq and Ensembl/GENCODE transcripts of a protein‐coding gene are identical (5′ UTR, coding region, and 3′ UTR). DECIPHER currently promotes and highlights MANE Select transcripts, one high‐quality representative transcript per protein‐coding gene that is well‐supported by experimental data and represents the biology of the gene (https://tark.ensembl.org/web/mane_project). Describing variants relative to a single, recommended transcript, along with sequence variant normalization, assists in the standardization of variant reporting.
4.7. Reference conversion tools
Deposition with GRCh37/hg19 coordinates is still supported: before normalization, DECIPHER remaps GRCh37 coordinates onto the GRCh38 assembly, using an algorithm based on the UCSC LiftOver tool (https://genome.ucsc.edu/cgi-bin/hgLiftOver; Kuhn et al., 2013). A recent study comparing exome variant calls detected in GRCh37 and GRCh38 genome assemblies, with lifted over variants (GRCh37 to GRCh38), has shown that the majority of variants have concordant genotypes (>98% SNVs and >93% indels across all samples), with most discordant calls clustered within discrete discordant reference patches (Li et al., 2021). DECIPHER provides a range of tools to allow users to visualize the differences between assemblies and help identify regions of discordance between the assemblies. These include GRCh37 and GRCh38 comparative genome browsers, gene lists for variants lifted over by DECIPHER which display genes that no longer overlap the variant, and a liftover mapping genome browser track (Figure 3).
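The principle behind coordinate remapping can be sketched with a toy set of aligned blocks between two assemblies. The block coordinates below are invented for illustration and are not real chain data; real liftover chains also encode strand and alignment gaps, and DECIPHER's own algorithm is not reproduced here.

```python
from typing import List, NamedTuple, Optional, Tuple


class Block(NamedTuple):
    src_start: int  # 1-based inclusive start on the source assembly
    src_end: int    # 1-based inclusive end on the source assembly
    dst_start: int  # position on the target assembly that src_start maps to


# Hypothetical aligned blocks for one chromosome (NOT real chain data).
TOY_BLOCKS = [
    Block(1, 1000, 1),
    Block(1001, 2000, 1501),  # 500 bp of extra sequence in the target assembly
    Block(2101, 3000, 2501),  # source 2001-2100 has no counterpart in the target
]


def lift_position(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map a single source position, or return None if it is unmapped."""
    for b in blocks:
        if b.src_start <= pos <= b.src_end:
            return b.dst_start + (pos - b.src_start)
    return None


def lift_interval(start: int, end: int, blocks: List[Block]) -> Optional[Tuple[int, int]]:
    """Map an interval by its endpoints; fail if either endpoint is unmapped."""
    new_start = lift_position(start, blocks)
    new_end = lift_position(end, blocks)
    if new_start is None or new_end is None or new_end < new_start:
        return None
    return new_start, new_end
```

With these toy blocks, the interval 900-1100 maps to 900-1600: its length changes because it spans a region that differs between assemblies. This is the kind of situation in which the set of genes overlapping a lifted-over variant can change, and it motivates the comparison tools described above.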
5. PHENOTYPIC DATA
DECIPHER supports detailed phenotype data capture (Figure 4a) which enables the in‐depth comparison of patient phenotypes, as well as the delineation of new syndromes. Much of the phenotype is represented using Human Phenotype Ontology (HPO) terms—a standardized, controlled vocabulary that supports deep phenotyping (Köhler et al., 2019). This allows phenotypic information to be described unambiguously, and for phenotypic similarity between patients to be established computationally by comparing related terms in the ontology. This is essential for finding potential patient matches. The DECIPHER phenotype deposition interface provides a search tool, allowing HPO terms to be added to a patient record quickly and easily. DECIPHER also supports the recording of the absence of clinically relevant phenotypes, and of manifestations of HPO terms (clinical modifiers), such as severity, age of onset, and pace of progression. This information can be helpful to users trying to determine the accuracy of a patient match, especially when the number of patient phenotypes is small.
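As an illustration of how an ontology allows phenotypic similarity to be computed from related terms, the sketch below expands each patient's terms with their ancestors and compares the resulting sets using a Jaccard index. The hierarchy is a small, simplified stand-in with illustrative labels rather than real HPO identifiers, and the Jaccard measure is one simple choice of similarity; it is not presented as the measure DECIPHER uses.

```python
# Toy is-a hierarchy (child -> parents); simplified labels, not real HPO IDs.
TOY_PARENTS = {
    "Focal seizure": ["Seizure"],
    "Generalized seizure": ["Seizure"],
    "Seizure": ["Abnormal nervous system physiology"],
    "Intellectual disability": ["Abnormal nervous system physiology"],
    "Abnormal nervous system physiology": ["Phenotypic abnormality"],
    "Short stature": ["Abnormality of body height"],
    "Abnormality of body height": ["Phenotypic abnormality"],
    "Phenotypic abnormality": [],
}


def with_ancestors(terms):
    """Return the given terms together with all of their ancestors."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(TOY_PARENTS.get(term, []))
    return seen


def phenotype_similarity(terms_a, terms_b):
    """Jaccard index of the ancestor-expanded term sets (0.0 to 1.0)."""
    a, b = with_ancestors(terms_a), with_ancestors(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Two patients annotated with different seizure subtypes share no exact term, yet score higher against each other than against a patient with short stature, because the related terms meet at a common, specific ancestor.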
In collaboration with ophthalmologists, DECIPHER has developed forms for groups of HPO phenotypes for the eye community, to assist phenotyping in the clinic. These forms contain a predetermined list of HPO terms that can be marked absent or present, and include common retinal and non‐retinal disease, and symptoms and signs (extraocular features, ocular features, and electrodiagnostic testing and imaging). These forms are available to depositors as an optional addition to the phenotyping interface. Clinical data from >1500 individuals with inherited eye disorders have been deposited to DECIPHER using the relevant phenotype form. DECIPHER is working with other disease specialties to develop further forms.
5.1. Family history
In the case of inherited disorders, it is important to capture family phenotype history. In DECIPHER, users can record whether or not relevant family members are affected with similar or related phenotypes. Presence or absence of HPO terms can also be indicated for each family member if known.
5.2. Quantitative data
In addition to HPO terms, DECIPHER supports quantitative phenotype data capture (Figure 4b). Developmental milestones (age of social smile, sat independently, walked independently, and first words) and anthropometric measurements (growth, visual function, fundus imaging) can be deposited. Aggregated observations from open‐access patient records are shared openly (see Section 7.3). DECIPHER also provides an interface to record birth and pregnancy information, such as age of the mother/father at birth of the patient, consanguinity, maternal illness, and gestation (which is also used to adjust growth charts); this information is not currently shared openly, but is shared within a consortium.
6. GENOTYPIC SUMMARIES TO ASSIST VARIANT INTERPRETATION
DECIPHER provides a suite of tools to assist in assessing the pathogenicity of variants, including genome and protein browsers.
6.1. Protein browser
A protein browser is available for protein‐coding genes, showing a genotypic summary that helps users to determine if a variant is located in a mutational hot spot or established functional domain (Figure 5a). The protein browser is fully interactive and is customizable via a settings menu. In the center of the protein browser, Pfam domains (Mistry et al., 2021) are displayed, allowing users to identify distinct functional/structural elements of the protein. Clinically relevant variants from DECIPHER and ClinVar are plotted above and below the Pfam domains, with annotated pathogenicity and predicted molecular consequence (e.g., missense, likely loss‐of‐function [LOF]) indicated through coloring. In addition to the location of the variants being shown, for likely LOF variants, the location of the protein‐truncating codon is indicated, since this information is essential in determining if a transcript is likely to escape nonsense‐mediated decay (NMD). A predicted NMD track is also displayed. The location of variation in the general population is shown through display of gnomAD missense and LOF tracks. Regional missense constraint data are also available (Samocha et al., "Regional missense constraint improves variant deleteriousness prediction," https://www.biorxiv.org/content/10.1101/148353v1), in addition to exon structure. Protein secondary structures (e.g., locations of helices and turns) and the locations of 3D structures (experimental structures are available from the Protein Data Bank in Europe [PDBe] and predicted structures from AlphaFold [Jumper et al., 2021]) are displayed at the bottom of the protein browser. Clicking on these 3D structures will display an interactive 3D protein viewer (Marco Biasini, 2015, pv v1.8.1. Zenodo. 10.5281/zenodo.20980) which provides zooming, panning, and rotation; hovering over an amino acid with a pointing device identifies the visualized amino acid and position (similar behavior exists for ligands). DECIPHER variants are shown in this 3D view, allowing users to determine, for example, if the variants are all within a DNA binding pocket or enzyme active site.
When looking at the protein browser from a patient record with a sequence variant, the location of the patient's variant is displayed by a vertical line, allowing easy orientation. In the case of a patient with a CNV, the protein browser is accessible from the CNV's genes tab, which displays a table of genes that overlap the CNV, along with other relevant information such as gene/disease association information and predictive scores. Clicking on a row displays further information about that gene, including the protein browser. An additional track is shown on the protein browser, indicating which part of the protein overlaps the CNV.
6.2. Genome browser
The Genoverse genome browser (http://genoverse.org), developed by the DECIPHER team, is a portable, interactive, customizable genome browser that allows the user to explore data. It displays a number of tracks containing information relevant to variant pathogenicity assessment, such as genes associated with disease phenotypes (as curated and maintained by Online Mendelian Inheritance in Man [OMIM, https://omim.org; Amberger et al., 2019]), protein ortholog sequences from Ensembl indicating conservation, transcripts (as maintained by Ensembl), and regional missense constraint. Information from population resources such as gnomAD and Database of Genomic Variants (DGV) Gold (Church et al., 2010) is displayed to enable users to determine if their patient's variant has been observed in healthy individuals. Disease relevant variant tracks are also available, which include DECIPHER sequence variants and CNVs, ClinVar sequence and structural variants, and variants from Human Gene Mutation Database (HGMD) public (Stenson et al., 2020). The tracks which are displayed by default are tailored according to the type of variant being assessed.
7. TOOLS SUPPORTING MOLECULAR DIAGNOSTIC ASSESSMENT
7.1. Assessing pathogenicity according to international standards
DECIPHER supports the annotation and sharing of variant pathogenicity using ACMG guidelines for sequence variants and ACMG/ClinGen technical standards for CNVs, which helps to standardize the classification of variants across centers. When interpreting a CNV it is possible for users to choose to assess the variant using sequence variant guidelines, which may be more applicable for small CNVs since the distinction between a sequence variant and a CNV is blurred (Brandt et al., 2020).
7.1.1. Criteria selection
In both pathogenicity interfaces (Figure 5b,c), types of evidence (such as population data and functional data) are displayed, along with the relevant evidence criteria used to determine if data supports the variant being pathogenic or benign. Relevant criteria can be selected with a single click. Some of the criteria have additional information links. These either provide information about how the criteria can be used according to the original study (e.g., de novo CNV evidence), or in the case of sequence variants they provide information about ClinGen Sequence Variant Interpretation (SVI) Working Group guidelines (e.g., recommendation for functional assays (PS3/BS3); Brnich et al., 2019). As new guidelines become available these pathogenicity interfaces are updated to provide the latest relevant recommendations. The strength of each criterion can be modified as required in the interface.
7.1.2. Relevant evidence
Within the interfaces, there is a customized section displaying “evidence to consider” which provides information relating to the specific evidence type being assessed. For example, for computational and predictive data evidence, predictive pathogenicity scores (SIFT, PolyPhen‐2, CADD, REVEL, and SpliceAI) are displayed. Links are also provided to relevant DECIPHER interpretation interfaces, for example to the in‐built tolerated population variation calculator, which can be used to determine if a variant observed in the reference sample is too common to cause a given rare variant Mendelian disease trait (Whiffin et al., 2017). External links (e.g., PubMed literature search) are also provided.
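The idea behind a tolerated population variation calculation can be sketched as follows for a dominant disorder, following our reading of the approach of Whiffin et al. (2017): the maximum credible population allele frequency is derived from disease prevalence, the maximum proportion of cases attributable to a single allele, and penetrance. The function names and input values are illustrative assumptions, the comparison omits the sampling-variance adjustment used in the published method, and DECIPHER's calculator may differ in detail.

```python
def max_credible_allele_frequency(prevalence, max_allelic_contribution, penetrance):
    """Maximum credible population allele frequency for a dominant disorder.

    prevalence:               proportion of individuals affected
    max_allelic_contribution: largest share of cases due to any one allele
    penetrance:               probability that a carrier is affected
    The factor 1/2 converts a per-individual rate to a per-chromosome rate.
    """
    if not 0 < penetrance <= 1:
        raise ValueError("penetrance must be in (0, 1]")
    return prevalence * max_allelic_contribution / (2 * penetrance)


def is_too_common(observed_af, prevalence, max_allelic_contribution, penetrance):
    """Simplified check: compares frequencies directly, ignoring sampling error."""
    threshold = max_credible_allele_frequency(
        prevalence, max_allelic_contribution, penetrance
    )
    return observed_af > threshold
```

For instance, with an illustrative prevalence of 1 in 500, a maximum allelic contribution of 2%, and 50% penetrance, the threshold is 4 × 10^-5; a variant observed at a frequency of 1 × 10^-3 in a reference population would therefore be considered too common to cause the disorder under these assumptions.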
7.1.3. Calculation of variant pathogenicity
As criteria are added, DECIPHER uses these to calculate the variant pathogenicity. For sequence variants, this is calculated according to the combining rules detailed in the original 2015 ACMG guidelines. In addition, DECIPHER calculates the posterior probability of pathogenicity and classification according to the ClinGen SVI Working Group's Bayesian classification framework, which provides a mathematical foundation for the combining rules (Tavtigian et al., 2018). DECIPHER highlights cases where these classifications disagree, and ultimately all pathogenicity assessments are made by depositors using their professional discretion. For CNVs, the evidence can be scored according to ACMG/ClinGen technical standards instead.
7.1.4. ClinGen Expert Panel specifications
For some genes, there are ClinGen Variant Curation Expert Panel specifications, which recommend adaptations of the sequence variant ACMG guidelines (e.g., Rett and Angelman‐like Disorders Variant Curation Expert Panel for MECP2, CDKL5, FOXG1, UBE3A, SLC9A6, and TCF4). When interpreting variants in genes for which these recommendations exist, detailed information about how to apply the criteria is provided along with a link to the relevant Clinical Domain Working Group, so that patients with variants in these genes benefit from interpretation in accordance with these recommended standards.
7.2. Confirming variant‐phenotype association and making a molecular diagnosis
DECIPHER provides an assessment interface (Figure 5d) which is designed to be used in a multidisciplinary team meeting to evaluate whether one or more variants explain the clinical features seen in a patient, and record if a molecular diagnosis has been made (or excluded). Depositors can report evidence from several evidence lines, such as the age at presentation or additional clinical investigation, to weigh evidence for or against a genotype‐phenotype relationship. An OMIM gene‐disease pair and an assertion are recorded, for example, “genetic diagnosis confirmed,” “uncertain genetic diagnosis,” or “non‐penetrant (or presymptomatic) for a dominant genetic disorder.” The output of the assessment is a date‐stamped report providing the patient's variants and phenotypes, in addition to the diagnosis and evidence on which that diagnosis was made.
There are many published examples of patients having blended phenotypes due to pathogenic variants in more than one gene, for example, in Ferrer et al. (2019), the patient had three independent rare disease diagnoses due to pathogenic variants in SIN3A (Witteveen–Kolk syndrome), FLG (dermatitis), and EDAR (ectodermal dysplasia). A recent study has suggested that multiple molecular diagnoses occur in approximately 5% of cases in which a molecular diagnosis is elucidated (Posey et al., 2017). Blended phenotypes among patients with dual diagnoses include cases where individual phenotypic features are clearly attributable to only one of the two diagnoses, and cases where phenotypic features could be attributable to both of the diagnoses. The assessment interface allows multiple assessments to be created for a patient, allowing the genetic basis of blended phenotypes to be recorded and shared.
7.3. Quantitative phenotypic data to confirm fit with diagnosis
7.3.1. Quantitative phenotype data and gene‐specific centile charts
Quantitative phenotype data (developmental milestones or anthropometric measurements) can be recorded in DECIPHER, and are aggregated on a gene‐by‐gene basis and shared openly (Figure 6a). In order for this information to be shown for a given gene, there must be at least five patients with both quantitative phenotype data and openly shared sequence variants annotated as pathogenic/likely pathogenic. Once this threshold is met, DECIPHER automatically aggregates and shares the information as a series of graphs on which expectations for the predominantly healthy population (Normal), the DECIPHER population as a whole, and the gene‐specific data are plotted. Anthropometric measurements are plotted around the standard deviation (adjusted for sex and gestation, where possible), while developmental milestones are plotted against time. The standard deviation for each population is displayed at the bottom of the graph as a boxplot. For users logged into DECIPHER and looking at a patient record from their center, a vertical line indicates their patient's measurement or age at attainment of the milestone, allowing them to easily judge whether it is consistent with a pathogenic/likely pathogenic variant in the gene. The display of the DECIPHER population allows users to determine if a particular measurement is particularly discriminative for a given disorder. These gene‐specific centile charts can also be used in the clinic to determine how a child is developing relative to other children with the same disorder.
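The aggregation threshold described above can be sketched as a simple filter: only observations from openly shared records with a pathogenic/likely pathogenic sequence variant are counted, and a gene is summarized only once at least five distinct patients qualify. The record layout, field names, and the choice of the median as the summary statistic are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median

MIN_PATIENTS = 5  # threshold stated in the text
QUALIFYING = {"pathogenic", "likely pathogenic"}


def gene_level_summary(observations):
    """Summarize standard-deviation scores per gene.

    Each observation is a dict with hypothetical keys:
    patient_id, gene, sd_score, open (bool), pathogenicity (str).
    """
    by_gene = defaultdict(dict)
    for obs in observations:
        if obs["open"] and obs["pathogenicity"] in QUALIFYING:
            # Key by patient so each patient is counted once per gene.
            by_gene[obs["gene"]][obs["patient_id"]] = obs["sd_score"]
    return {
        gene: {"n": len(scores), "median_sd": median(scores.values())}
        for gene, scores in by_gene.items()
        if len(scores) >= MIN_PATIENTS
    }
```

Genes that fall below the threshold are simply absent from the result, which mirrors the behavior of withholding aggregated data until enough patients are available.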
7.3.2. Composite facial images
For certain genes, there are also composite faces, which highlight facial dysmorphologies specific to a gene. These anonymized composite face images have been created from individuals with de novo mutations in the affected genes that were collected through the DDD study (Deciphering Developmental Disorders Study, 2017). The image capture capability in DECIPHER will facilitate further development of this aspect.
8. IDENTIFYING PATIENT MATCHES TO SUPPORT DIAGNOSIS
8.1. Matchmaking within DECIPHER
The phenotype‐linked variant data in DECIPHER allows for effective patient matching. DECIPHER presents a powerful, flexible matching patient interface (Figure 6b), which allows users to view DECIPHER records that overlap a deposited copy‐number, sequence, or insertion variant, or a gene. The matching patient interface displays useful summary information about the potential matches, for example, for sequence variants, this is consequence, inheritance, and pathogenicity. To allow users to quickly identify the most prominent clinical features in overlapping patients, a list of phenotypes present in multiple matching patients is displayed. When viewing this interface from a patient record, additional lists showing which of the patient's phenotypes are present or not recorded in matching patients are also displayed. This allows users to easily determine if there is a good phenotypic match between their patient and other matching patients. Beneath this is a table containing information about the individual matching patient records. The table columns can be sorted and all matching phenotypes are shown in bold.
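The matching logic described above can be illustrated with a minimal sketch: records are matched by genomic overlap with a query variant, phenotypes seen in multiple matching patients are listed, and the query patient's phenotypes are split into those present and those not recorded in the matches. The record layout and function names are hypothetical, and exact term matching is used for simplicity.

```python
from collections import Counter


def overlaps(start_a, end_a, start_b, end_b):
    """True if two closed intervals share at least one position."""
    return start_a <= end_b and start_b <= end_a


def find_matches(query, records):
    """Records on the same chromosome that overlap the query interval."""
    return [
        r for r in records
        if r["chrom"] == query["chrom"]
        and overlaps(query["start"], query["end"], r["start"], r["end"])
    ]


def recurrent_phenotypes(matches, min_patients=2):
    """Phenotype terms present in at least `min_patients` matching records."""
    counts = Counter(t for r in matches for t in set(r["phenotypes"]))
    return sorted(t for t, n in counts.items() if n >= min_patients)


def compare_with_patient(patient_terms, matches):
    """Split the patient's terms into (present, not recorded) among the matches."""
    seen = set().union(*(r["phenotypes"] for r in matches))
    present = sorted(t for t in patient_terms if t in seen)
    not_recorded = sorted(t for t in patient_terms if t not in seen)
    return present, not_recorded
```

A fuller version would use ontology-aware comparison so that related, rather than only identical, terms count as matching.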
8.1.1. Customizable data display
A series of filters is provided in the matching patient interface so that users can drill into the most relevant patient data. Users can filter on, for example, functional similarity, consequence, inheritance, and/or pathogenicity. This can be particularly useful when different variant consequences are associated with different syndromes (e.g., SCN2A, where loss‐of‐function variants are associated with nonspecific severe intellectual disability, and missense variants with infantile epileptic encephalopathy).
8.1.2. Functionally identical variants
If the same variant has previously been deposited to DECIPHER, a “Functionally Identical Variant” interface is present, displaying variant pathogenicity and evidence, in addition to phenotype information from these patient records. This ensures that users are alerted to other patients carrying the same variant, and assists in the standardization of variant classification across centers.
8.1.3. Discriminative phenotypes
The wealth of phenotype‐linked genotype data in DECIPHER also allows the aggregation of data associated with pathogenic variants in disease genes. Within DECIPHER, aggregated phenotype data is used to identify the most discriminating phenotypes associated with disease genes (Figure 6c). Recognizing distinctive clinical characteristics associated with a disorder can be key to a diagnosis. The interface presents a table displaying the percentage of phenotyped patients with sequence variants in a gene of interest who have a particular phenotype, compared with the percentage of phenotyped patients in DECIPHER with the same phenotype. The odds ratio and p value from a Fisher's exact test are also shown, indicating the most discriminative phenotypes associated with the gene.
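As a worked illustration of the statistics named above, the following Python sketch computes the sample odds ratio and a two‐sided Fisher's exact test p value for a 2×2 table using only the standard library. It is a generic textbook implementation, not DECIPHER's code. The layout of the table (patients with the gene variant versus other phenotyped patients) is an assumption, since the text does not state whether gene‐specific patients are excluded from the comparison group, and the counts in the example are invented.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout:
        a = gene patients with the phenotype,    b = gene patients without it
        c = other patients with the phenotype,   d = other patients without it
    Returns (sample odds ratio, p value).
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)  # small tolerance for float ties
    )
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)


# Invented counts: 12 of 40 gene patients vs 300 of 10,000 other patients.
odds, p = fisher_exact_two_sided(12, 28, 300, 9700)
print(f"odds ratio = {odds:.1f}, p = {p:.3g}")
```

An odds ratio well above 1 with a small p value would flag the phenotype as enriched among patients with variants in the gene; in practice, testing many phenotypes per gene would also call for some multiple‐testing consideration, which is not covered here.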
8.1.4. Clinician contact
If a matching patient is discovered, it is possible to contact the clinician responsible for the patient's care through DECIPHER. DECIPHER depositors are able to send messages directly, and since October 2014, over 4500 collaboration requests have been sent amongst these registered DECIPHER users. In the case where a user is not registered with DECIPHER, the DECIPHER team first moderates such contact requests, and if the request appears to be legitimate and appropriate, forwards the message to the clinician responsible for the patient, asking them to contact the requestor directly to discuss collaboration. Over 2900 such contact requests have been sent since January 2018.
8.2. Matchmaking through the Matchmaker Exchange
DECIPHER is a founding member of the Matchmaker Exchange (MME; https://matchmakerexchange.org), a Global Alliance for Genomics and Health (GA4GH) driver project which enables the federated discovery of similar rare disease patient data in connected databases. This worldwide collaboration allows automated matchmaking of genetic and/or phenotypic data between databases, via an application programming interface (API). Through MME, DECIPHER is currently connected to Broad‐seqr (https://seqr.broadinstitute.org/matchmaker/matchbox; Arachchi et al., 2018), GeneMatcher (https://genematcher.org; Sobreira et al., 2015), MyGene2 (https://www.mygene2.org/; MyGene2, 2016), PhenomeCentral (https://phenomecentral.org; Buske et al., 2015), and RD‐Connect (https://platform.rd-connect.eu; Lochmüller et al., 2018). Since 2020, DECIPHER depositors have made approximately 1500 requests for matches from connected databases and received details of more than 4100 potential patient matches. In the same time period, DECIPHER has received more than 55,000 requests for matches from connected databases, and has returned details of more than 255,000 potential patient matches.
Within DECIPHER, users with write access to a patient are able to query the MME. It is essential that the patient record in DECIPHER has explicit consent for open sharing, as some connected databases have dual notification, that is, they provide their user with details of any potential patient match, and unshared patient records will not be available to users of the other databases. Once MME is queried and the connected databases have responded, details of potential patient matches are displayed within the DECIPHER interface. Potential matches from each database are displayed in a tabular format with matching phenotypes in bold, assisting users in determining the level of phenotype similarity (Figure 6d). DECIPHER supports the querying of MME for patients with at least one open‐access sequence or copy‐number variant that overlaps a single gene. Other types of variants present in the patient record will not be included in the MME request.
When an MME request containing genomic information is sent to DECIPHER, all open‐access patient sequence or copy‐number variants that overlap a single gene, and all DDD consortium research variants (see Section 9.3), are evaluated for similarity based on functional overlap. Many of the variant requests received from connected databases provide genomic coordinates in GRCh37, and in these cases, DECIPHER performs liftover to convert the coordinates to GRCh38 before identifying matches. A score for each potential patient match is provided, ranging from 0 to 1, with scores closer to 1 indicating a better match. DECIPHER's scoring algorithm for genomic matches takes into account the Ensembl VEP predicted consequence, assessing the severity and similarity of the consequence to those provided in the request.
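To make the shape of such a request concrete, the sketch below builds a minimal match request body and checks whether its coordinates would need converting to GRCh38. The field names (`patient`, `contact`, `features`, `genomicFeatures`, `variant.assembly`) follow the published MME API specification as commonly documented and should be checked against the current specification before use. The helper names, the patient identifier, and the contact address are hypothetical, and the sketch neither performs liftover nor reproduces DECIPHER's scoring.

```python
import json


def build_mme_request(patient_id, contact_name, contact_href,
                      hpo_ids, gene_id, assembly="GRCh37"):
    """Build a minimal MME-style match request body (illustrative only)."""
    return {
        "patient": {
            "id": patient_id,
            "contact": {"name": contact_name, "href": contact_href},
            "features": [{"id": hpo, "observed": "yes"} for hpo in hpo_ids],
            "genomicFeatures": [
                {"gene": {"id": gene_id}, "variant": {"assembly": assembly}}
            ],
        }
    }


def needs_liftover(request, target_assembly="GRCh38"):
    """True if any variant in the request is on a different assembly."""
    features = request["patient"].get("genomicFeatures", [])
    assemblies = (f.get("variant", {}).get("assembly") for f in features)
    return any(a is not None and a != target_assembly for a in assemblies)


request = build_mme_request(
    patient_id="example-patient-1",            # hypothetical identifier
    contact_name="Example Clinician",          # hypothetical contact
    contact_href="mailto:clinician@example.org",
    hpo_ids=["HP:0001250"],                    # Seizure
    gene_id="SCN2A",
)
print(json.dumps(request, indent=2))
print("liftover needed:", needs_liftover(request))
```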
If only phenotypic data is provided, all open‐access patients with phenotypes are evaluated for a match. This takes into account all HPO ancestor terms for both the patient in the request, and patients within DECIPHER. These matches are scored by generating an Intersection over Union score comparing the HPO ancestor terms of the request patient and the patient in DECIPHER.
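The Intersection over Union score described above can be sketched as follows. The ontology here is a small toy hierarchy with made‐up identifiers (prefixed `TOY:`), not the real HPO, and including each term alongside its ancestors is an assumption; the text does not say whether the terms themselves are part of the compared sets.

```python
# Toy child -> parents map standing in for the HPO hierarchy (hypothetical terms).
TOY_PARENTS = {
    "TOY:seizure": {"TOY:nervous_physiology"},
    "TOY:intellectual_disability": {"TOY:neurodevelopmental"},
    "TOY:neurodevelopmental": {"TOY:nervous_physiology"},
    "TOY:nervous_physiology": {"TOY:nervous_system"},
    "TOY:nervous_system": {"TOY:root"},
    "TOY:short_stature": {"TOY:growth"},
    "TOY:growth": {"TOY:root"},
    "TOY:root": set(),
}


def with_ancestors(terms, parents):
    """Return the given terms plus all of their ancestors."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, ()))
    return seen


def iou_score(query_terms, candidate_terms, parents):
    """Intersection over Union of the two ancestor-expanded term sets."""
    query = with_ancestors(query_terms, parents)
    candidate = with_ancestors(candidate_terms, parents)
    union = query | candidate
    return len(query & candidate) / len(union) if union else 0.0


print(iou_score({"TOY:seizure"}, {"TOY:intellectual_disability"}, TOY_PARENTS))
print(iou_score({"TOY:seizure"}, {"TOY:short_stature"}, TOY_PARENTS))
```

Expanding to ancestors means that two patients annotated with different but related terms still share part of their sets, so related phenotypes score higher than unrelated ones, which share only the most general terms.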
DECIPHER returns the 20 highest scoring matches per MME request. In the case where there are many matches, the patients' chromosomal sex is taken into account in addition to the score, to prioritize the best possible matches. The returned matches include variant, phenotype (including absent phenotypes), and diagnosis information.
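One way to picture this selection step is the Python sketch below. The limit of 20 comes from the text; how chromosomal sex enters the ranking is not specified, so using it as a tie‐breaker between equally scored matches is purely an assumption, as are the `Match` fields and the sex labels in the example.

```python
from typing import NamedTuple, Optional

MAX_RESULTS = 20  # number of matches returned per request, as stated in the text


class Match(NamedTuple):
    """Hypothetical scored candidate match."""
    patient_id: str
    score: float
    chromosomal_sex: Optional[str]


def select_matches(matches, query_sex=None, limit=MAX_RESULTS):
    """Return the top matches by score; ties prefer the same chromosomal sex.

    The tie-breaking rule is an assumption made for illustration.
    """
    def sort_key(match):
        same_sex = query_sex is not None and match.chromosomal_sex == query_sex
        # Higher score first, then same-sex matches, then a stable id order.
        return (-match.score, not same_sex, match.patient_id)

    return sorted(matches, key=sort_key)[:limit]


candidates = [
    Match(f"p{i:02d}", 0.5, "46XX" if i % 2 else "46XY") for i in range(25)
]
candidates.append(Match("best", 0.9, "46XY"))
for match in select_matches(candidates, query_sex="46XX")[:3]:
    print(match)
```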
9. DRIVING RARE DISEASE RESEARCH
The ~40,000 openly consented patient records in DECIPHER contain more than 51,000 variants and ~172,000 phenotypes, and represent a rich data set to drive rare disease research. Since its inception in 2004, DECIPHER has been cited more than 2600 times in peer‐reviewed publications (Figure 7a), a testament to its impact on rare disease research. For some genes, there is a large genotypic patient series, which allows, for example, the full spectrum of phenotypes associated with the gene to be recognized. At the time of writing, the genes with the most open‐access sequence variants were NF1 (162), ANKRD11 (123), ARID1B (107), KMT2A (107), and DDX3X (78) (Figure 7b).
9.1. Search
To identify the most relevant patient records and gene information, DECIPHER offers a powerful search function that allows users to search across many different categories, including gene, phenotype, HPO identifier, genomic position (in GRCh37 or GRCh38), chromosome band, pathogenicity, and inheritance. Advanced searches are supported, such as searching for multiple terms either from the same category (e.g., multiple phenotypes) or different categories (e.g., gene plus phenotype). Results are displayed in a tabular format, in addition to genome browser‐based representations.
9.2. Driving discovery
The genotype‐linked phenotypic data allows, for example, new variant‐disease associations to be discovered, such as loss‐of‐function variants in ARFGEF1 causing developmental delay and epilepsy (Thomas et al., 2021). The data set also enables the phenotypic spectrum to be extended, both for new syndromes (e.g., Witteveen–Kolk syndrome, a SIN3A‐related disorder; Balasubramanian et al., 2021) and for well‐established syndromes (e.g., ALG13 congenital disorder of glycosylation; Alsharhan et al., 2021). It also permits the understanding of contiguous gene effects, such as that around ERF, which causes a novel craniosynostosis syndrome with varying degrees of intellectual disability (Calpena et al., 2021).
9.3. DDD research variants
In addition to the openly consented patient data, DECIPHER openly shares the DDD research variants, which are variants of unknown significance identified in undiagnosed probands with developmental disorders in the DDD study. These include functional de novo variants and rare loss‐of‐function homozygous, compound heterozygous, and hemizygous variants in genes that are neither developmental disorder genes nor OMIM‐morbid genes. At present this data set comprises nearly 5000 variants. High‐level phenotype terms are provided for each variant (Figure 7c). The number of patients with each variant in the DDD data set is displayed, in addition to the number of patients identified in the GeneDx and Radboud University Medical Center de novo variant data set as described by Kaplanis et al. (2020). This data set enables the discovery of new gene‐disease associations.
9.4. Bulk data for research
The openly consented patient data is available for bulk download for research purposes, subject to a data access agreement. In bulk, the data can be used, for example, for developing new analytical methods, in understanding patterns of polymorphism, and in refining critical intervals to map genes involved in specific phenotypes and diseases. The data set has recently been used to associate phenotypes with functional systems (Jabato et al., 2021), and to develop a new tool to assist clinical interpretation of CNVs (Requena et al., 2021). DECIPHER also shares the data in bulk for display, subject to a Data Display Agreement. This allows third‐party variant analysis companies and academic genome browser providers such as Ensembl and UCSC to display the data, maximizing the possibility of finding patient matches.
10. SUMMARY AND FUTURE PLANS
DECIPHER is a free web‐based platform that enables the visualization of genomic and phenotypic relationships to aid variant interpretation, diagnosis, and discovery. The platform supports the interpretation and sharing of almost all types of genetic variation, providing variant interpretation interfaces that contextualize the genotypic and phenotypic data. These interfaces include a genome browser, protein browser, matching patient variant displays, and tools to assess the variant according to internationally‐accepted standards. Potential matching patients in other connected databases can also be identified through the MME. The platform enables the flexible and proportionate sharing of patient‐level data, so that the depth and breadth of sharing is tailored to the scientific/clinical needs and the level of patient consent attained. DECIPHER currently openly shares ~40,000 rare disease patient records, and supports the more limited sharing of >63,000. DECIPHER is under continuous development, ensuring that it keeps up to date with the fast‐moving field of rare genetic diseases. New user‐facing features are released approximately every 6 weeks, along with updates to reference data sources (such as the Ensembl/GENCODE gene set, HPO, ClinVar). Future plans for the platform include integration of datasets to further assist variant interpretation in the noncoding genome (e.g., regulatory datasets), inclusion of management resource information (e.g., treatment information and links to cellular pathway information), and integration of functional data (e.g., saturation genome editing).
DECIPHER enables clinical use of selected new datasets and tools developed by the research community. This makes them directly available to clinicians and clinical scientists, thereby assisting in the rapid translation of research into the diagnostic arena. Since its inception in 2004, the platform has made a huge impact on rare genetic disease research and is cited in more than 2600 publications. The rich phenotype‐linked variant data hosted by DECIPHER, and the tools it provides, enable DECIPHER to advance its mission of mapping the clinically relevant parts of the genome.
CONFLICT OF INTERESTS
Matthew Hurles is a cofounder, shareholder, and nonexecutive director of Congenica Ltd., a diagnostic software company.
ACKNOWLEDGMENTS
The authors thank the patients and their families for their permission to include their information in DECIPHER, and all registered DECIPHER users for depositing and seeking consent to share patient data. The authors would also like to thank Graeme Black and Panagiotis Sergouniotis, University of Manchester and Manchester University NHS Foundation Trust, for their invaluable input to the eye HPO phenotype forms. The DECIPHER project was given a favorable NRES REC opinion by Cambridge South (previously Cambridgeshire 4 REC), REC reference 04/MRE05/50, in November 2004. DECIPHER submits annual progress reports to ensure this favorable opinion applies for the duration of the research. DECIPHER is supported by Wellcome funding, grant WT206194. Helen Firth is supported by The Wellcome award WT200990/Z/16/Z Designing, developing, and delivering integrated foundations for genomic medicine. Fiona Cunningham and Sarah E. Hunt receive funding from the Wellcome Trust (Grant number WT108749/Z/15/Z) and the European Molecular Biology Laboratory. This study was funded in whole, or in part, by the Wellcome Trust [Grant numbers WT206194, WT200990/Z/16/Z, and WT108749/Z/15/Z]. For the purpose of open access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.
Foreman, J., Brent, S., Perrett, D., Bevan, A. P., Hunt, S. E., Cunningham, F., Hurles, M. E., & Firth, H. V. (2022). DECIPHER: Supporting the interpretation and sharing of rare disease phenotype‐linked variant data to advance diagnosis and research. Human Mutation, 43, 682–697. 10.1002/humu.24340
Contributor Information
Julia Foreman, Email: jf11@sanger.ac.uk.
Helen V. Firth, Email: hvf21@cam.ac.uk.
REFERENCES
- Adzhubei, I., Jordan, D. M., & Sunyaev, S. R. (2013). Predicting functional effect of human missense mutations using PolyPhen‐2. Current Protocols in Human Genetics, Chapter 7, Unit 7.20. 10.1002/0471142905.hg0720s76
- Alsharhan, H., He, M., Edmondson, A. C., Daniel, E., Chen, J., Donald, T., Bakhtiari, S., Amor, D. J., Jones, E. A., Vassallo, G., Vincent, M., Cogné, B., Deb, W., Werners, A. H., Jin, S. C., Bilguvar, K., Christodoulou, J., Webster, R. I., Yearwood, K. R., … Sobering, A. K. (2021). ALG13 X‐linked intellectual disability: New variants, glycosylation analysis, and expanded phenotypes. Journal of Inherited Metabolic Disease, 44(4), 1001–1012. 10.1002/jimd.12378
- Amberger, J. S., Bocchini, C. A., Scott, A. F., & Hamosh, A. (2019). OMIM.org: Leveraging knowledge across phenotype‐gene relationships. Nucleic Acids Research, 47(D1), D1038–D1043. 10.1093/nar/gky1151
- Arachchi, H., Wojcik, M. H., Weisburd, B., Jacobsen, J., Valkanas, E., Baxter, S., Byrne, A. B., O'Donnell‐Luria, A. H., Haendel, M., Smedley, D., MacArthur, D. G., Philippakis, A. A., & Rehm, H. L. (2018). matchbox: An open‐source tool for patient matching via the Matchmaker Exchange. Human Mutation, 39(12), 1827–1834. 10.1002/humu.23655
- Balasubramanian, M., Dingemans, A., Albaba, S., Richardson, R., Yates, T. M., Cox, H., Douzgou, S., Armstrong, R., Sansbury, F. H., Burke, K. B., Fry, A. E., Ragge, N., Sharif, S., Foster, A., De Sandre‐Giovannoli, A., Elouej, S., Vasudevan, P., Mansour, S., Wilson, K., … Kleefstra, T. (2021). Comprehensive study of 28 individuals with SIN3A‐related disorder underscoring the associated mild cognitive and distinctive facial phenotype. European Journal of Human Genetics, 29(4), 625–636. 10.1038/s41431-020-00769-7
- Boggan, R. M., Lim, A., Taylor, R. W., McFarland, R., & Pickett, S. J. (2019). Resolving complexity in mitochondrial disease: Towards precision medicine. Molecular Genetics and Metabolism, 128(1‐2), 19–29. 10.1016/j.ymgme.2019.09.003
- Bragin, E., Chatzimichali, E. A., Wright, C. F., Hurles, M. E., Firth, H. V., Bevan, A. P., & Swaminathan, G. J. (2014). DECIPHER: Database for the interpretation of phenotype‐linked plausibly pathogenic sequence and copy‐number variation. Nucleic Acids Research, 42, D993–D1000. 10.1093/nar/gkt937
- Brandt, T., Sack, L. M., Arjona, D., Tan, D., Mei, H., Cui, H., Gao, H., Bean, L., Ankala, A., Del Gaudio, D., Knight Johnson, A., Vincent, L. M., Reavey, C., Lai, A., Richard, G., & Meck, J. M. (2020). Adapting ACMG/AMP sequence variant classification guidelines for single‐gene copy number variants. Genetics in Medicine, 22(2), 336–344. 10.1038/s41436-019-0655-2
- Brnich, S. E., Abou Tayoun, A. N., Couch, F. J., Cutting, G. R., Greenblatt, M. S., Heinen, C. D., Luo, X., McNulty, S. M., Starita, L. M., Tavtigian, S. V., Wright, M. W., Harrison, S. M., Biesecker, L. G., Berg, J. S., & Clinical Genome Resource Sequence Variant Interpretation Working Group (2019). Recommendations for application of the functional evidence PS3/BS3 criterion using the ACMG/AMP sequence variant interpretation framework. Genome Medicine, 12(1), 3. 10.1186/s13073-019-0690-2
- Buske, O. J., Girdea, M., Dumitriu, S., Gallinger, B., Hartley, T., Trang, H., Misyura, A., Friedman, T., Beaulieu, C., Bone, W. P., Links, A. E., Washington, N. L., Haendel, M. A., Robinson, P. N., Boerkoel, C. F., Adams, D., Gahl, W. A., Boycott, K. M., & Brudno, M. (2015). PhenomeCentral: A portal for phenotypic and genotypic matchmaking of patients with rare genetic diseases. Human Mutation, 36(10), 931–940. 10.1002/humu.22851
- Calpena, E., McGowan, S. J., Blanco Kelly, F., Boudry‐Labis, E., Dieux‐Coeslier, A., Harrison, R., Johnson, D., Lachlan, K., Morton, J., Stewart, H., Vasudevan, P., Genomics England Research Consortium, Twigg, S., & Wilkie, A. (2021). Dissection of contiguous gene effects for deletions around ERF on chromosome 19. Human Mutation, 42(7), 811–817. 10.1002/humu.24213
- Campbell, P., Ellingford, J. M., Parry, N., Fletcher, T., Ramsden, S. C., Gale, T., Hall, G., Smith, K., Kasperaviciute, D., Thomas, E., Lloyd, I. C., Douzgou, S., Clayton‐Smith, J., Biswas, S., Ashworth, J. L., Black, G., & Sergouniotis, P. I. (2019). Clinical and genetic variability in children with partial albinism. Scientific Reports, 9(1), 16576. 10.1038/s41598-019-51768-8
- Chatzimichali, E. A., Brent, S., Hutton, B., Perrett, D., Wright, C. F., Bevan, A. P., Hurles, M. E., Firth, H. V., & Swaminathan, G. J. (2015). Facilitating collaboration in rare genetic disorders through effective matchmaking in DECIPHER. Human Mutation, 36(10), 941–949. 10.1002/humu.22842
- Church, D. M., Lappalainen, I., Sneddon, T. P., Hinton, J., Maguire, M., Lopez, J., Garner, J., Paschall, J., DiCuccio, M., Yaschenko, E., Scherer, S. W., Feuk, L., & Flicek, P. (2010). Public data archives for genomic structural variation. Nature Genetics, 42(10), 813–814. 10.1038/ng1010-813
- Deciphering Developmental Disorders Study. (2017). Prevalence and architecture of de novo mutations in developmental disorders. Nature, 542(7642), 433–438. 10.1038/nature21062
- den Dunnen, J. T., Dalgleish, R., Maglott, D. R., Hart, R. K., Greenblatt, M. S., McGowan‐Jordan, J., Roux, A. F., Smith, T., Antonarakis, S. E., & Taschner, P. E. (2016). HGVS recommendations for the description of sequence variants: 2016 update. Human Mutation, 37(6), 564–569. 10.1002/humu.22981
- Ferrer, A., Schultz‐Rogers, L., Kaiwar, C., Kemppainen, J. L., Klee, E. W., & Gavrilova, R. H. (2019). Three rare disease diagnoses in one patient through exome sequencing. Cold Spring Harbor Molecular Case Studies, 5(6), 10.1101/mcs.a004390
- Firth, H. V., Richards, S. M., Bevan, A. P., Clayton, S., Corpas, M., Rajan, D., Van Vooren, S., Moreau, Y., Pettett, R. M., & Carter, N. P. (2009). DECIPHER: Database of chromosomal imbalance and phenotype in humans using Ensembl resources. American Journal of Human Genetics, 84(4), 524–533. 10.1016/j.ajhg.2009.03.010
- Grady, J. P., Pickett, S. J., Ng, Y. S., Alston, C. L., Blakely, E. L., Hardy, S. A., Feeney, C. L., Bright, A. A., Schaefer, A. M., Gorman, G. S., McNally, R. J., Taylor, R. W., Turnbull, D. M., & McFarland, R. (2018). mtDNA heteroplasmy level and copy number indicate disease burden in m.3243A>G mitochondrial disease. EMBO Molecular Medicine, 10(6), 10.15252/emmm.201708262
- Groesser, L., Herschberger, E., Ruetten, A., Ruivenkamp, C., Lopriore, E., Zutt, M., Langmann, T., Singer, S., Klingseisen, L., Schneider‐Brachert, W., Toll, A., Real, F. X., Landthaler, M., & Hafner, C. (2012). Postzygotic HRAS and KRAS mutations cause nevus sebaceous and Schimmelpenning syndrome. Nature Genetics, 44(7), 783–787. 10.1038/ng.2316
- Gunning, A. C., Fryer, V., Fasham, J., Crosby, A. H., Ellard, S., Baple, E. L., & Wright, C. F. (2021). Assessing performance of pathogenicity predictors using clinically relevant variant datasets. Journal of Medical Genetics, 58(8), 547–555. 10.1136/jmedgenet-2020-107003
- Hartley, T., Balci, T. B., Rojas, S. K., Eaton, A., Canada, C. R., Dyment, D. A., & Boycott, K. M. (2018). The unsolved rare genetic disease atlas? An analysis of the unexplained phenotypic descriptions in OMIM®. American Journal of Medical Genetics. Part C, Seminars in Medical Genetics, 178(4), 458–463. 10.1002/ajmg.c.31662
- Ioannidis, N. M., Rothstein, J. H., Pejaver, V., Middha, S., McDonnell, S. K., Baheti, S., Musolf, A., Li, Q., Holzinger, E., Karyadi, D., Cannon‐Albright, L. A., Teerlink, C. C., Stanford, J. L., Isaacs, W. B., Xu, J., Cooney, K. A., Lange, E. M., Schleutker, J., Carpten, J. D., … Sieh, W. (2016). REVEL: An ensemble method for predicting the pathogenicity of rare missense variants. American Journal of Human Genetics, 99(4), 877–885. 10.1016/j.ajhg.2016.08.016
- Jabato, F. M., Seoane, P., Perkins, J. R., Rojano, E., García Moreno, A., Chagoyen, M., Pazos, F., & Ranea, J. (2021). Systematic identification of genetic systems associated with phenotypes in patients with rare genomic copy number variations. Human Genetics, 140(3), 457–475. 10.1007/s00439-020-02214-7
- Jaganathan, K., Kyriazopoulou Panagiotopoulou, S., McRae, J. F., Darbandi, S. F., Knowles, D., Li, Y. I., Kosmicki, J. A., Arbelaez, J., Cui, W., Schwartz, G. B., Chow, E. D., Kanterakis, E., Gao, H., Kia, A., Batzoglou, S., Sanders, S. J., & Farh, K. K. (2019). Predicting splicing from primary sequence with deep learning. Cell, 176(3), 535–548. 10.1016/j.cell.2018.12.015
- Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., Bridgland, A., Meyer, C., Kohl, S., Ballard, A. J., Cowie, A., Romera‐Paredes, B., Nikolov, S., Jain, R., Adler, J., … Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583–589. 10.1038/s41586-021-03819-2
- Kaplanis, J., Samocha, K. E., Wiel, L., Zhang, Z., Arvai, K. J., Eberhardt, R. Y., Gallone, G., Lelieveld, S. H., Martin, H. C., McRae, J. F., Short, P. J., Torene, R. I., de Boer, E., Danecek, P., Gardner, E. J., Huang, N., Lord, J., Martincorena, I., Pfundt, R., … Retterer, K. (2020). Evidence for 28 genetic disorders discovered by combining healthcare and research data. Nature, 586(7831), 757–762. 10.1038/s41586-020-2832-5
- Karczewski, K. J., Francioli, L. C., Tiao, G., Cummings, B. B., Alföldi, J., Wang, Q., Collins, R. L., Laricchia, K. M., Ganna, A., Birnbaum, D. P., Gauthier, L. D., Brand, H., Solomonson, M., Watts, N. A., Rhodes, D., Singer‐Berk, M., England, E. M., Seaby, E. G., Kosmicki, J. A., … MacArthur, D. G. (2020). The mutational constraint spectrum quantified from variation in 141,456 humans. Nature, 581(7809), 434–443. 10.1038/s41586-020-2308-7
- Kircher, M., Witten, D. M., Jain, P., O'Roak, B. J., Cooper, G. M., & Shendure, J. (2014). A general framework for estimating the relative pathogenicity of human genetic variants. Nature Genetics, 46(3), 310–315. 10.1038/ng.2892
- Köhler, S., Carmody, L., Vasilevsky, N., Jacobsen, J., Danis, D., Gourdine, J. P., Gargano, M., Harris, N. L., Matentzoglu, N., McMurry, J. A., Osumi‐Sutherland, D., Cipriani, V., Balhoff, J. P., Conlin, T., Blau, H., Baynam, G., Palmer, R., Gratian, D., Dawkins, H., … Robinson, P. N. (2019). Expansion of the Human Phenotype Ontology (HPO) knowledge base and resources. Nucleic Acids Research, 47(D1), D1018–D1027. 10.1093/nar/gky1105
- Kuhn, R. M., Haussler, D., & Kent, W. J. (2013). The UCSC genome browser and associated tools. Briefings in Bioinformatics, 14(2), 144–161. 10.1093/bib/bbs038
- Landrum, M. J., Lee, J. M., Benson, M., Brown, G. R., Chao, C., Chitipiralla, S., Gu, B., Hart, J., Hoffman, D., Jang, W., Karapetyan, K., Katz, K., Liu, C., Maddipatla, Z., Malheiro, A., McDaniel, K., Ovetsky, M., Riley, G., Zhou, G., … Maglott, D. R. (2018). ClinVar: Improving access to variant interpretations and supporting evidence. Nucleic Acids Research, 46(D1), D1062–D1067. 10.1093/nar/gkx1153
- Li, H., Dawood, M., Khayat, M. M., Farek, J. R., Jhangiani, S. N., Khan, Z. M., Mitani, T., Coban‐Akdemir, Z., Lupski, J. R., Venner, E., Posey, J. E., Sabo, A., & Gibbs, R. A. (2021). Exome variant discrepancies due to reference‐genome differences. American Journal of Human Genetics, 108(7), 1239–1250. 10.1016/j.ajhg.2021.05.011
- Liu, P., Meng, L., Normand, E. A., Xia, F., Song, X., Ghazi, A., Rosenfeld, J., Magoulas, P. L., Braxton, A., Ward, P., Dai, H., Yuan, B., Bi, W., Xiao, R., Wang, X., Chiang, T., Vetrini, F., He, W., Cheng, H., … Yang, Y. (2019). Reanalysis of clinical exome sequencing data. New England Journal of Medicine, 380(25), 2478–2480. 10.1056/NEJMc1812033
- Lochmüller, H., Badowska, D. M., Thompson, R., Knoers, N. V., Aartsma‐Rus, A., Gut, I., Wood, L., Harmuth, T., Durudas, A., Graessner, H., Schaefer, F., Riess, O., RD‐Connect consortium, NeurOmics consortium, & EURenOmics consortium (2018). RD‐Connect, NeurOmics and EURenOmics: Collaborative European initiative for rare diseases. European Journal of Human Genetics, 26(6), 778–785. 10.1038/s41431-018-0115-5
- McLaren, W., Gil, L., Hunt, S. E., Riat, H. S., Ritchie, G. R., Thormann, A., Flicek, P., & Cunningham, F. (2016). The Ensembl variant effect predictor. Genome Biology, 17(1), 122. 10.1186/s13059-016-0974-4
- Miller, D. (2021). The diagnostic odyssey: Our family's story. American Journal of Human Genetics, 108(2), 217–218. 10.1016/j.ajhg.2021.01.003
- Mistry, J., Chuguransky, S., Williams, L., Qureshi, M., Salazar, G. A., Sonnhammer, E., Tosatto, S., Paladin, L., Raj, S., Richardson, L. J., Finn, R. D., & Bateman, A. (2021). Pfam: The protein families database in 2021. Nucleic Acids Research, 49(D1), D412–D419. 10.1093/nar/gkaa913
- Monroe, G. R., Frederix, G. W., Savelberg, S. M., de Vries, T. I., Duran, K. J., van der Smagt, J. J., Terhal, P. A., van Hasselt, P. M., Kroes, H. Y., Verhoeven‐Duif, N. M., Nijman, I. J., Carbo, E. C., van Gassen, K. L., Knoers, N. V., Hövels, A. M., van Haelst, M. M., Visser, G., & van Haaften, G. (2016). Effectiveness of whole‐exome sequencing and costs of the traditional diagnostic trajectory in children with intellectual disability. Genetics in Medicine, 18(9), 949–956. 10.1038/gim.2015.200
- MyGene2. (2016). Website aims to accelerate gene discovery, diagnosis, treatment: MyGene2.org fosters open sharing among families, researchers, and clinicians. The American Journal of Medical Genetics ‐ Part A, 170(6), 1388–1389. 10.1002/ajmg.a.37746
- Nguengang Wakap, S. , Lambert, D. M. , Olry, A. , Rodwell, C. , Gueydan, C. , Lanneau, V. , Murphy, D. , Le Cam, Y. , & Rath, A. (2020). Estimating cumulative point prevalence of rare diseases: Analysis of the Orphanet database. European Journal of Human Genetics, 28(2), 165–173. 10.1038/s41431-019-0508-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Posey, J. E. , Harel, T. , Liu, P. , Rosenfeld, J. A. , James, R. A. , Coban Akdemir, Z. H. , Walkiewicz, M. , Bi, W. , Xiao, R. , Ding, Y. , Xia, F. , Beaudet, A. L. , Muzny, D. M. , Gibbs, R. A. , Boerwinkle, E. , Eng, C. M. , Sutton, V. R. , Shaw, C. A. , Plon, S. E. , … Lupski, J. R. (2017). Resolution of disease phenotypes resulting from multilocus genomic variation. New England Journal of Medicine, 376(1), 21–31. 10.1056/NEJMoa1516767 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Quaio, C. , Moreira, C. M. , Novo‐Filho, G. M. , Sacramento‐Bobotis, P. R. , Groenner Penna, M. , Perazzio, S. F. , Dutra, A. P. , da Silva, R. A. , Santos, M. , de Arruda, V. , Freitas, V. G. , Pereira, V. C. , Pintao, M. C. , Fornari, A. , Buzolin, A. L. , Oku, A. Y. , Burger, M. , Ramalho, R. F. , Marco Antonio, D. S. , … Baratela, W. (2020). Diagnostic power and clinical impact of exome sequencing in a cohort of 500 patients with rare diseases. American Journal of Medical Genetics. Part C, Seminars in Medical Genetics, 184(4), 955–964. 10.1002/ajmg.c.31860 [DOI] [PubMed] [Google Scholar]
- Rahit, K. , & Tarailo‐Graovac, M. (2020). Genetic modifiers and rare mendelian disease. Genes (Basel), 11(3), 10.3390/genes11030239 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Requena, F. , Abdallah, H. H. , Garcia, A. , Nitschke, P. , Romana, S. , Malan, V. , & Rausell, A. (2021). CNVxplorer: A web tool to assist clinical interpretation of CNVs in rare disease patients. Nucleic Acids Research, 49(W1), W93–W103. 10.1093/nar/gkab347 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Richards, S. , Aziz, N. , Bale, S. , Bick, D. , Das, S. , Gastier‐Foster, J. , Grody, W. W. , Hegde, M. , Lyon, E. , Spector, E. , Voelkerding, K. , Rehm, H. L. , & ACMG Laboratory Quality Assurance Committee (2015). Standards and guidelines for the interpretation of sequence variants: A joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genetics in Medicine, 17(5), 405–424. 10.1038/gim.2015.30 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Riggs, E. R., Andersen, E. F., Cherry, A. M., Kantarci, S., Kearney, H., Patel, A., Raca, G., Ritter, D. I., South, S. T., Thorland, E. C., Pineda‐Alvarez, D., Aradhya, S., & Martin, C. L. (2020). Technical standards for the interpretation and reporting of constitutional copy‐number variants: A joint consensus recommendation of the American College of Medical Genetics and Genomics (ACMG) and the Clinical Genome Resource (ClinGen). Genetics in Medicine, 22(2), 245–257. 10.1038/s41436-019-0686-8
- Sawyer, S. L., Hartley, T., Dyment, D. A., Beaulieu, C. L., Schwartzentruber, J., Smith, A., Bedford, H. M., Bernard, G., Bernier, F. P., Brais, B., Bulman, D. E., Warman Chardon, J., Chitayat, D., Deladoëy, J., Fernandez, B. A., Frosk, P., Geraghty, M. T., Gerull, B., Gibson, W., … Boycott, K. M. (2016). Utility of whole‐exome sequencing for those near the end of the diagnostic odyssey: Time to address gaps in care. Clinical Genetics, 89(3), 275–284. 10.1111/cge.12654
- Sim, N. L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., & Ng, P. C. (2012). SIFT web server: Predicting effects of amino acid substitutions on proteins. Nucleic Acids Research, 40, W452–W457. 10.1093/nar/gks539
- Sobreira, N., Schiettecatte, F., Valle, D., & Hamosh, A. (2015). GeneMatcher: A matching tool for connecting investigators with an interest in the same gene. Human Mutation, 36(10), 928–930. 10.1002/humu.22844
- Stenson, P. D., Mort, M., Ball, E. V., Chapman, M., Evans, K., Azevedo, L., Hayden, M., Heywood, S., Millar, D. S., Phillips, A. D., & Cooper, D. N. (2020). The Human Gene Mutation Database (HGMD®): Optimizing its use in a clinical diagnostic or research setting. Human Genetics, 139(10), 1197–1207. 10.1007/s00439-020-02199-3
- Stranneheim, H., Lagerstedt‐Robinson, K., Magnusson, M., Kvarnung, M., Nilsson, D., Lesko, N., Engvall, M., Anderlid, B. M., Arnell, H., Johansson, C. B., Barbaro, M., Björck, E., Bruhn, H., Eisfeldt, J., Freyer, C., Grigelioniene, G., Gustavsson, P., Hammarsjö, A., Hellström‐Pigg, M., … Wedell, A. (2021). Integration of whole genome sequencing into a healthcare setting: High diagnostic rates across multiple clinical entities in 3219 rare disease patients. Genome Medicine, 13(1), 40. 10.1186/s13073-021-00855-5
- Swaminathan, G. J., Bragin, E., Chatzimichali, E. A., Corpas, M., Bevan, A. P., Wright, C. F., Carter, N. P., Hurles, M. E., & Firth, H. V. (2012). DECIPHER: Web‐based, community resource for clinical interpretation of rare variants in developmental disorders. Human Molecular Genetics, 21(R1), R37–R44. 10.1093/hmg/dds362
- Tan, A., Abecasis, G. R., & Kang, H. M. (2015). Unified representation of genetic variants. Bioinformatics, 31(13), 2202–2204. 10.1093/bioinformatics/btv112
- Tavtigian, S. V., Greenblatt, M. S., Harrison, S. M., Nussbaum, R. L., Prabhu, S. A., Boucher, K. M., & ClinGen Sequence Variant Interpretation Working Group (ClinGen SVI) (2018). Modeling the ACMG/AMP variant classification guidelines as a Bayesian classification framework. Genetics in Medicine, 20(9), 1054–1060. 10.1038/gim.2017.210
- Thomas, Q., Gautier, T., Marafi, D., Besnard, T., Willems, M., Moutton, S., Isidor, B., Cogné, B., Conrad, S., Tenconi, R., Iascone, M., Sorlin, A., Masurel, A., Dabir, T., Jackson, A., Banka, S., Delanne, J., Lupski, J. R., Saadi, N. W., … Vitobello, A. (2021). Haploinsufficiency of ARFGEF1 is associated with developmental delay, intellectual disability, and epilepsy with variable expressivity. Genetics in Medicine, 23, 1901–1911. 10.1038/s41436-021-01218-6
- Whiffin, N., Minikel, E., Walsh, R., O'Donnell‐Luria, A. H., Karczewski, K., Ing, A. Y., Barton, P., Funke, B., Cook, S. A., MacArthur, D., & Ware, J. S. (2017). Using high‐resolution variant frequencies to empower clinical genome interpretation. Genetics in Medicine, 19(10), 1151–1158. 10.1038/gim.2017.26
- Wright, C. F., Fitzgerald, T. W., Jones, W. D., Clayton, S., McRae, J. F., van Kogelenberg, M., King, D. A., Ambridge, K., Barrett, D. M., Bayzetinova, T., Bevan, A. P., Bragin, E., Chatzimichali, E. A., Gribble, S., Jones, P., Krishnappa, N., Mason, L. E., Miller, R., Morley, K. I., … Firth, H. V. (2015). Genetic diagnosis of developmental disorders in the DDD study: A scalable analysis of genome‐wide research data. Lancet, 385(9975), 1305–1314. 10.1016/S0140-6736(14)61705-0
- Wright, C. F., FitzPatrick, D. R., & Firth, H. V. (2018). Paediatric genomics: Diagnosing rare disease in children. Nature Reviews Genetics, 19(5), 253–268. 10.1038/nrg.2017.116