Genome Medicine. 2021 Oct 14;13:153. doi: 10.1186/s13073-021-00965-0

Artificial intelligence enables comprehensive genome interpretation and nomination of candidate diagnoses for rare genetic diseases

Francisco M De La Vega 1,2,3, Shimul Chowdhury 4, Barry Moore 5, Erwin Frise 1, Jeanette McCarthy 1, Edgar Javier Hernandez 5, Terence Wong 4, Kiely James 4, Lucia Guidugli 4, Pankaj B Agrawal 6,7, Casie A Genetti 6, Catherine A Brownstein 6, Alan H Beggs 6, Britt-Sabina Löscher 8, Andre Franke 8, Braden Boone 9, Shawn E Levy 9, Katrin Õunap 10,11, Sander Pajusalu 10,11, Matt Huentelman 12, Keri Ramsey 12, Marcus Naymik 12, Vinodh Narayanan 12, Narayanan Veeraraghavan 4, Paul Billings 1, Martin G Reese 1, Mark Yandell 1,5, Stephen F Kingsmore 4
PMCID: PMC8515723  PMID: 34645491

Abstract

Background

Clinical interpretation of genetic variants in the context of the patient’s phenotype is becoming the largest component of cost and time expenditure for genome-based diagnosis of rare genetic diseases. Artificial intelligence (AI) holds promise to greatly simplify and speed genome interpretation by integrating predictive methods with the growing knowledge of genetic disease. Here we assess the diagnostic performance of Fabric GEM, a new, AI-based, clinical decision support tool for expediting genome interpretation.

Methods

We benchmarked GEM in a retrospective cohort of 119 probands, mostly NICU infants, diagnosed with rare genetic diseases, who received whole-genome or whole-exome sequencing (WGS, WES). We replicated our analyses in a separate cohort of 60 cases collected from five academic medical centers. For comparison, we also analyzed these cases with current state-of-the-art variant prioritization tools. Included in the comparisons were trio, duo, and singleton cases. Variants underpinning diagnoses spanned diverse modes of inheritance and types, including structural variants (SVs). Patient phenotypes were extracted from clinical notes by two means: manually and using an automated clinical natural language processing (CNLP) tool. Finally, 14 previously unsolved cases were reanalyzed.

Results

GEM ranked over 90% of the causal genes among the top or second candidate and prioritized for review a median of 3 candidate genes per case, using either manually curated or CNLP-derived phenotype descriptions. Ranking of trios and duos was unchanged when analyzed as singletons. In 17 of 20 cases with diagnostic SVs, GEM identified the causal SVs as the top candidate and in 19/20 within the top five, irrespective of whether SV calls were provided or inferred ab initio by GEM using its own internal SV detection algorithm. GEM showed similar performance in absence of parental genotypes. Analysis of 14 previously unsolved cases resulted in a novel finding for one case, candidates ultimately not advanced upon manual review for 3 cases, and no new findings for 10 cases.

Conclusions

GEM enabled diagnostic interpretation inclusive of all variant types through automated nomination of a very short list of candidate genes and disorders for final review and reporting. In combination with deep phenotyping by CNLP, GEM enables substantial automation of genetic disease diagnosis, potentially decreasing cost and expediting case review.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13073-021-00965-0.

Background

A central tenet of genomic medicine is that outcomes are improved when symptom-based diagnoses and treatments are augmented with genetic diagnoses and genotype-differentiated treatments. Worldwide, an estimated 7 million infants are born with serious genetic disorders every year [1]. The last decade witnessed a huge increase in the catalog of genes associated with Mendelian conditions, from about 2300 in 2010 [2], to over 6700 by the end of 2020 [3]. The translation of that knowledge, in conjunction with major improvements in WES and WGS and downstream analytical pipelines, has enabled increased rates of diagnosis, from about 10%, with single gene tests, to over 50% [4]. While limitations of read alignment and variant calling were major obstacles to early clinical implementations of WES and WGS [5], they have been largely removed by algorithmic advances, hardware acceleration, and parallelization through cloud computing [6, 7]. However, clinical interpretation of genetic variants in the context of the patient’s phenotype remains largely manual and extremely labor-intensive, requiring highly trained expert input. This remains a major barrier to widespread adoption and contributes to continued low rates of genomic testing for patients with suspected genetic disorders despite strong evidence for diagnostic and clinical utility and cost effectiveness [8].

The major challenge for genome-based diagnosis of rare genetic disease is to identify a putative disease-causing variant amid approximately four million benign variants in each genome, a problem akin to finding a needle in a haystack [9]. Clinical genome interpretation is, by necessity, performed by highly trained, scarce, genome analysts, genetic counselors, and laboratory directors [10]. For an average of 100 variants for review per case [11], this translates to 50–100 h of expert review per patient [10]. In practice, this has led to review of only about 10 variants per case, which somewhat defeats the purpose of genome-wide sequencing.

The genome interpretation process consists of iterative variant filtering, coupled with evidence-based review of candidate disease-causing variants [12]. This process was almost entirely manual until the advent of variant prioritization algorithms, such as Annovar [13] and VAAST [14], and was later improved by the integration of patient phenotypes in analyses, e.g., Phevor [15], Exomiser [16], Phen-Gen [17], Phenolyzer [18], and more recently Amelie [19]. While these tools accelerate review times, their stand-alone performance has been insufficient for widespread clinical adoption, in part due to their inability to appropriately interpret structural variants (SVs). SVs account for over 10% of Mendelian disease [20, 21], and about 20% of diagnoses in routine neonatal intensive care unit (NICU) [22] and pediatric patients [23]. Unified methods for prioritization of SVs, SNVs, and small indels are a fundamental requirement for further automation of genome interpretation.

The use of artificial intelligence (AI) has made significant inroads in healthcare [24], and a new class of genome interpretation methods [19, 25–28] is being developed with the promise of removing the interpretation bottleneck for rare genetic disease diagnosis through electronic clinical decision support systems (eCDSS) [29]. Speed and accuracy of interpretation are particularly important for seriously ill children in the NICU [27], where diagnosis in the first 24–48 h of life has been shown to maximally improve health outcomes [30]. The settings and extent to which AI facilitates diagnosis are still being investigated [27, 28]. Issues include what types of AI methods are most suitable (e.g., Bayesian networks, decision trees, neural nets [31]); how they compare with current variant prioritization approaches in terms of accuracy; their diagnostic performance across different clinical scenarios and variant types; their potential to offer new forms of decision support; and how well they integrate with automated patient phenotyping and clinical decision making [27, 28, 32].

Algorithmic benchmarking in this domain is no simple matter. Hitherto, most attempts have used simulated cases (created by adding known disease-causing variants to reference exomes and genomes), included only a few cases, derived from a single center, or were limited to certain variant types [17, 33, 34]. Such benchmarking is inherently limited, as it is not representative of the true diversity of genetic diseases and variant types (e.g., by omitting cases with causal SVs), and provides no means to evaluate the impact of different sequencing and variant calling pipelines on performance.

Here we describe and benchmark the diagnostic performance of Fabric GEM (hereafter referred to as “GEM”), a new AI-based eCDSS, and compare it to current variant prioritization approaches using a diverse cohort of retrospective pediatric cases from the Rady Children’s Institute for Genomic Medicine (RCIGM). These cases are largely comprised of seriously ill NICU infants; all were diagnosed with Mendelian conditions following WGS (or, in a few cases, WES), using a combination of filtering and variant prioritization approaches. These real-world cases encompass the breadth of phenotypes and disease-causing variants, including pathogenic SVs. We then sought to replicate the diagnostic performance of GEM in a second set of affected, diagnosed, and undiagnosed children outside the NICU. They were collected from five different academic medical centers, mostly consisting of WES, to examine the generalizability of GEM’s diagnostic performance to other sequencing, variant calling pipelines, and clinical settings. Finally, we reanalyzed a set of previously negative RCIGM cases to evaluate the ability of GEM to identify new diagnoses without suggesting numerous false positives that would lead to time-consuming case reviews. Our results show that rapid, accurate, and comprehensive WGS- and WES-based diagnosis is achievable through integration of new data modalities with algorithmic innovations made possible by AI.

Methods

Patient selection, phenotyping, and specimen sequencing

This retrospective study was designed to provide benchmark data to test the GEM eCDSS. We compiled 119 cases from Rady Children’s Hospital (the Benchmark cohort), consisting of mostly NICU admissions, and 60 additional cases from five academic medical centers (the Validation cohort), which consisted mostly of referrals from genetic clinics. None of the Validation cohort cases included causal structural variants, as described below.

Rady Children’s Hospital

In total, 119 cases with primary findings, deemed definitively solved using previously published methods [27, 30, 35], and 14 negative cases, were sequenced as part of the rapid-WGS (rWGS) sequencing program at the Rady Children’s Hospital Clinical Genome Center. These cases were a sample of convenience, drawn from the first symptomatic children who were enrolled in four previously published studies that examined the diagnostic rate, time to diagnosis, clinical utility of diagnosis, outcomes, and health care utilization of rWGS between 26 July 2016 and 25 September 2018 at Rady Children’s Hospital, San Diego, USA. One of the studies was a randomized controlled trial of genome and exome sequencing (ClinicalTrials.gov identifier: NCT03211039) [30]; the others were cohort studies (ClinicalTrials.gov identifiers: NCT02917460 and NCT03385876) [35–40]. All subjects had a symptomatic illness of unknown etiology in which a genetic disorder was suspected, had a Rady Children’s Hospital Epic EHR, and had clinical phenotype descriptions expressed as Human Phenotype Ontology terms, both manually curated by clinicians and automatically extracted by CNLP (Additional file 1: Table S1).

WGS (or in a few instances WES) was performed as previously described [35, 40]. Briefly, PCR-free WGS was performed to an average of 40× coverage in the Illumina HiSeq 2000, HiSeq 4000, and NovaSeq 6000 sequencers. Alignment and sequence variant calling were performed using the Illumina DRAGEN software, while copy number variation was identified through an approach that integrates the tools Manta [41] and CNVnator [42]. Structural variants were then filtered for recurrent artifacts observed in previous non-affected cases and only included in the input VCF file if they overlap a known disease gene (OMIM). All variants reported as primary findings were validated orthogonally by Sanger sequencing. In the case of trios, de novo origin of reported variants was established by comparing to their parents’ data. In some older cases, SV calling was not performed; any causal SVs therein were identified by an orthogonal CGH microarray or manual inspection of alignments. In what follows, we refer to these 119 cases with primary findings as the Benchmark cohort, and the 14 negative cases as the Unsolved cohort.

Boston Children’s Hospital

Eleven cases (all single probands) from the Beggs Lab, Congenital Myopathy Research Program laboratory, and Manton Center for Orphan Disease Research at Boston Children’s Hospital were included in the analysis [43–48].

Libraries (TruSeq DNA v2 Sample Preparation kit; Illumina, San Diego, CA) and whole-exome capture (EZ Exome 2.0, Roche) were performed according to manufacturer protocols from DNA extracted from blood samples. WES was carried out on an Illumina HiSeq 2000. Reads were aligned to the GRCh37/hg19 human genome assembly using an in-house assembler. Variants were called using the Genome Analysis Toolkit (GATK) version 3.1 or higher (Broad Institute, Cambridge, MA) and were Sanger confirmed by the Boston Children’s Hospital IDDRC Molecular Genetics Core Facility.

Christian-Albrechts University of Kiel

Twelve cases (all single probands) from the Institute of Clinical Molecular Biology (IKMB) were included in the analysis [49–55].

Illumina’s Nextera/TruSeq whole-exome target capture method was applied. WES was carried out on the Illumina HiSeq/NovaSeq platforms. Reads were aligned to the GRCh37/hg19 human genome assembly using BWA-MEM version 0.7.17 and variants called using GATK version 4.1.6.0 (Broad Institute, Cambridge, MA).

HudsonAlpha Institute for Biotechnology

Three cases (two trios and a single proband) from the Clinical Services Laboratory at HudsonAlpha Institute for Biotechnology, including cases from the Clinical Sequencing Evidence-Generating Research (CSER) consortium, were included in the analysis [56–59].

WGS was carried out on an Illumina HiSeq X. Reads were aligned to the GRCh37/hg19 human genome assembly followed by variant calling using the Illumina DRAGEN software version 3.2.8 (Illumina, Inc. San Diego, CA).

Translational Genomics Research Institute

Twenty-three cases (including singletons, duos, trios, and quads) from the Center for Rare Childhood Disorders at The Translational Genomics Research Institute (TGen) were included in the analysis [60, 61].

WES or WGS sequencing was carried out on an Illumina HiSeq 2000, HiSeq 2500, HiSeq 4000, or NovaSeq6000. For WES, the Agilent SureSelect Human All Exon V6 or CRE V2 target capture method was applied. Reads were aligned to reference GRCh37 version hs37d5 and variants called using GATK Haplotype caller version 3.3-0-g37228af (Broad Institute, Cambridge, MA).

Tartu University Hospital

Eleven cases from Tartu University Hospital in Estonia that had undergone WES were included in the analysis [62–64].

Nextera Rapid Capture Exome Kit-i (Illumina Inc.) target capture method was applied. WES was carried out on an Illumina HiSeq2500 sequencer. Reads were aligned to the GRCh37/hg19 human genome assembly using BWA-MEM version 0.5.9 and variants called using GATK Haplotype caller version 3.4 (Broad Institute, Cambridge, MA).

Variant annotation and data sources

All analyses were performed based on the GRCh37 human genome assembly. Variant consequences and annotations were obtained with VEP v.95 [65] utilizing ENSEMBL transcripts version 95 (excluding non-coding transcripts) and selecting the canonical transcript for analysis. Transcript-specific predictions of variant deleteriousness were calculated with VVP [66] and were also used as input for VAAST [14]. Variants were annotated with ClinVar (version 20200419) [67], ensuring exact position and base match. Gene conditions were extracted from OMIM (version 2020_07) [68] and HPO (obo file dated 2020-08-11) [69]. Gene symbols were harmonized using ENSEMBL and HGNC databases, controlling for synonymous gene symbols.

AI-based disease gene and condition prioritization

AI-based prioritization and scoring of candidate disease genes and diagnostic conditions was performed using Fabric GEM [70], which is a commercially available eCDSS part of the Fabric Enterprise platform (Fabric Genomics, Oakland, CA). GEM inputs are genetic variant calls in VCF format and case metadata, including (optional) parental affection status, and patient (proband) phenotypes in the form of Human Phenotype Ontology (HPO) terms. The VCF files can include “small variants” (single nucleotide, multiple nucleotide, and small insertion/deletion variants), and (optionally) structural variants (insertion/deletions of over 50 bp, inversions, and/or copy number variants with imprecise ends). This information can be provided via an application programming interface or manually in the user interface. Data analysis is typically carried out in minutes depending on inputs. GEM outputs are displayed in an interactive report (Additional file 2: Figure S1) that includes a list of candidate genes ranked by the GEM gene score (see below), detailed information of patient variants present in each candidate gene, and conditions associated with each candidate gene ranked by GEM’s condition match (CM) score (explained below).

GEM aggregates inputs from multiple variant prioritization algorithms with genomic and clinical database annotations, using Bayesian means to score and prioritize potentially damaged genes and candidate diseases. Briefly, the algorithm parametrizes itself using the proband’s called variants as one-time, run-time training data, inferring the states of multiple variables directly from the input variant distribution, e.g., sex. Additional static training parameters were derived from the 1000 Genomes Project [71] and CEPH [72] genome datasets. GEM reevaluates genotype calls and quality scores considering read support, genomic location, proband sex, and potentially overlapping SVs, augmenting the genotype calls with more nuanced posterior probabilities, computing ploidy for each variant. GEM also computes the likelihood that the proband belongs to any of several different ancestry groups using the input genotypes together with gnomAD sub-population variant frequency data [73]. The probabilities of other, internal, variables, conditioned on each state (sex and ancestry, etc.) are then obtained using naive Bayes, controlling for non-independence of variables by calculating a correlation matrix at run time using the proband’s data. For example, after conditioning variant scores on ancestry, known inheritance pattern for the gene in question, gene location, and proband sex, GEM may conclude that a de novo variant is unlikely to participate in a disease-causing genotype, even though it is predicted to be highly damaging. Thus, highly damaging and de novo variants, even frameshifting ones, do not automatically receive high GEM scores. GEM uses the same procedure to evaluate and score biallelic genotypes for known and novel disease-gene candidates. The only difference is that the global prior (e.g., relative proportion of known disease genes with autosomal recessive vs. autosomal dominant inheritance patterns), rather than OMIM and HPO support for a particular inheritance pattern at that locus, is used to evaluate possible biallelic cases in novel gene candidates.

GEM’s gene scores are Bayes factors (BF) [74]. Analogous to the likelihood ratio test, a Bayes factor presents the log10 ratio between the posterior probabilities of two models, summarizing the relative support for the hypothesis (in this case) that the prioritized genotype damages the gene in which it resides and explains the proband’s phenotype versus the contrapositional hypothesis that the variant neither damages the gene nor explains the proband’s phenotype. In keeping with established best practice [74], a log10 Bayes factor between 0 and 0.69 is considered moderate support, between 0.69 and 1.0 substantial support, between 1.0 and 2.0 strong support, and above 2.0, decisive support. A score less than 0 indicates that the counter hypothesis is more likely. For calculating the Bayes posterior p(M|D), the probability of the data given a model, p(D|M), is derived from GEM’s severity scoring protocol, which includes input from the VAAST and VVP algorithms, and any available prior variant classifications from the ClinVar database. This model is conditioned upon sex, ancestry, feasible inheritance model, gene location, and gene-phenotype priors derived by seeding the provided patient HPO terms to the HPO ontology graph and subsequently obtaining priors for all genes in the HPO and GO ontologies by belief propagation using Phevor’s previously described Bayesian network-based algorithm [15]. The prior probability for the model, p(M), is based upon known disease associations in the Mendelian conditions databases OMIM and/or HPO with the gene in question.
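The interpretation bands quoted above can be written as a small lookup. This is a minimal sketch for illustration, not GEM's implementation: the function name `bf_support` and the label strings are hypothetical, and assigning a boundary value (0.69, 1.0, 2.0) to the lower band is an assumption, since the text does not say which band a boundary score belongs to.

```python
def bf_support(log10_bf: float) -> str:
    """Map a log10 Bayes factor to the support bands quoted in the text.

    Assumption: a score exactly on a boundary (0.69, 1.0, 2.0) is placed
    in the lower band; the source does not specify boundary handling.
    """
    if log10_bf < 0:
        # Negative scores favor the counter hypothesis.
        return "counter hypothesis favored"
    if log10_bf <= 0.69:
        return "moderate"
    if log10_bf <= 1.0:
        return "substantial"
    if log10_bf <= 2.0:
        return "strong"
    return "decisive"


# Hypothetical scores, one per band.
for score in (-0.3, 0.58, 0.8, 1.46, 2.28):
    print(score, bf_support(score))
```

Keeping the thresholds in one place makes it easy to see that the bands are contiguous and that only the sign of the score decides which hypothesis is favored.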

GEM’s Bayes factor-based scoring system is designed for ease of explanation and to speed interpretation. GEM scores are not intended to be definitive, rather they are designed to provide guidance for succinct case reviews carried out by clinical geneticists. Thus, GEM outputs also include several additional scores that provide additional guidance and improve explainability. GEM gene scores, for example, are accompanied by VAAST [14], VVP [66], and Phevor [15] posterior probabilities, conditioned upon the potentially confounding variables of proband sex, gene location, and ancestry, together with common variant genomic and clinical annotations (Additional file 2: Figure S1). These scores further ease interpretation, as they allow users to assess the major drivers of a GEM score and their relative contributions to it.

GEM also provides means to assess the Mendelian conditions associated with putative disease-causing genes as possible diagnoses via its condition match (CM) scores. Like gene scores, CM scores are Bayes factors and are derived from the log10 ratio of the posterior probability that the HPO phenotype associations of a given Mendelian condition are consistent with the proband’s phenotype versus the contrapositional hypothesis. For these calculations, the probability of the data, p(D|M), is determined using Phevor’s Bayesian algorithm to obtain a probability for each disease, conditioned upon the proband’s phenotype. The prior probability for the model, p(M), is the probability that one or more genes associated with the Mendelian condition (as documented in OMIM and/or HPO) contain a damaging genotype as ascertained by GEM’s severity scoring protocol. Condition match scores are displayed alongside each gene-associated condition for review (Additional file 2: Figure S7).

Structural variant scoring and ab initio inference by GEM

At run time, GEM infers ab initio the existence of SVs, their coordinates, and their copy numbers (ploidy) in a probabilistic fashion using SNVs, short indel calls, read depths, zygosity, and gnomAD frequency data. GEM searches the proband’s genotypes for evidence of three types of SV: deletions, duplications, and CNVs. Regions exhibiting loss of heterozygosity (LOH), for example, are used as evidence for heterozygous deletions. Genomic spans lacking expected variants, the signature of homozygous deletions, are identified using gnomAD population frequencies [73] to derive point estimates that a given gnomAD variant would or would not be ascertained given its population frequency. Further evidence for duplications and deletions is derived from read support, e.g., approximately integer increases or decreases in depth across a region provide support for copy number variation. Point estimates at each site of a small variant call are further conditioned upon provided variables, such as genotype qualities, and inferred ones, such as sex and ancestry, to obtain more refined estimates. High scoring segments and their maximum likelihood start and end coordinates are identified using a Markov model [75]. The results are used to determine the degree of support for external SV calls, and as the basis for GEM’s own SV calls. For ease of reporting, ab initio SV calls that overlap an external SV call (default minimum reciprocal overlap of 33%) are replaced in the output by the external SV call as long as they still overlap the relevant scored genes.
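The reciprocal-overlap criterion mentioned above can be illustrated with a short sketch. It assumes the common definition in which the shared span must cover at least the threshold fraction of both intervals; the source does not spell out GEM's exact definition, and the function names and half-open coordinate convention here are hypothetical.

```python
def reciprocal_overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> float:
    """Return the smaller of the two overlap fractions for two intervals.

    Intervals are half-open [start, end) and assumed to lie on the same
    chromosome. A value of 0.0 means the intervals do not overlap.
    """
    len_a = a_end - a_start
    len_b = b_end - b_start
    if len_a <= 0 or len_b <= 0:
        raise ValueError("intervals must have positive length")
    shared = max(0, min(a_end, b_end) - max(a_start, b_start))
    return min(shared / len_a, shared / len_b)


def matches_external_call(ab_initio, external, min_reciprocal=0.33):
    """True if an ab initio call and an external call meet the threshold.

    Both arguments are (start, end) tuples; 0.33 mirrors the 33% default
    quoted in the text.
    """
    return reciprocal_overlap(*ab_initio, *external) >= min_reciprocal


# Hypothetical coordinates: half of each interval is shared.
print(reciprocal_overlap(0, 100, 50, 150))
print(matches_external_call((0, 100), (90, 1000)))
```

Taking the minimum of the two fractions prevents a small call nested inside a very large one from counting as a match, which a one-sided overlap test would allow.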

Benchmarking variant prioritization with VAAST, Phevor, and Exomiser

We used the Snakemake software [76] to create a workflow that analyzes cases with the VAAST, Phevor, and Exomiser algorithms. This workflow was only applied to the benchmark cohort to enable us to compare the performance of four genome interpretation tools with exactly the same inputs and annotations. The pipeline starts with a VCF file, family structure, affection status, and HPO terms and concludes with the outputs for each of the algorithms. VVP scores were obtained as described above and provided to VAAST as input. VAAST was provided pedigree information and affection status and was run in both dominant and recessive modes with results aggregated. Gene ranks for VAAST are reported for the highest scoring occurrence of the gene from aggregated outputs. Phevor was provided with HPO terms and VAAST scores as inputs. Ranks were selected as described for VAAST.

Exomiser [16] benchmark analyses were run with the same configuration used in the 100,000 Genomes Project [77], specifically (1) using the GRCh37 genome assembly; (2) analyzing autosomal and X-linked forms of dominant and recessive inheritance; (3) allele frequency sources from the 1000 Genomes Project [78], TopMed [79], UK10K [80], ESP, ExAC [81], and gnomAD [73] (except Ashkenazi Jewish); (4) pathogenicity sources from REVEL [82] and MVP [83]; and (5) including the steps failedVariantFilter, variantEffectFilter (remove non-coding variants), frequencyFilter with maxFrequency = 2.0, pathogenicityFilter with keepNonPathogenic = true, inheritanceFilter, omimPrioritiser, and hiPhivePrioritiser.

Exomiser was considered to have identified the diagnosed gene when it was scored as a candidate for any of the utilized modes of inheritance. None of the tools in this analysis were provided a target mode of inheritance (as it is unknown), and so the diagnostic gene rank for Exomiser was determined from its rank within the combined gene candidate list from all modes of inheritance (i.e., the same procedure used for VAAST and Phevor). The ranks within the combined list of candidate genes were generated by sorting gene-level candidates from all modes of inheritance on the Exomiser combinedScore in descending order with each candidate gene only added to the list on its first, highest scoring occurrence. Exomiser variant level scoring was not considered for determining candidates or ranking. All Exomiser analyses on the benchmark cohort ran to completion and successfully produced output; however, in 18 cases, Exomiser did not identify the true positive diagnostic gene as a scored candidate (i.e., it was absent from its output). A similar phenomenon was observed in 4 cases using VAAST. For both tools, these cases were considered false negatives.
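The ranking procedure described above (sort candidates from all modes of inheritance by score, keep each gene at its first, highest-scoring occurrence) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline code: the tuple layout, function names, gene labels, and scores are all hypothetical, and no Exomiser output format is implied.

```python
def combined_gene_ranks(candidates):
    """Rank genes across inheritance modes by score, keeping each gene once.

    `candidates` is an iterable of (gene, mode_of_inheritance, score)
    tuples. Returns {gene: rank}, with rank 1 for the top-scoring gene.
    """
    ordered = sorted(candidates, key=lambda c: c[2], reverse=True)
    ranks = {}
    for gene, _mode, _score in ordered:
        if gene not in ranks:  # first occurrence is the highest-scoring one
            ranks[gene] = len(ranks) + 1
    return ranks


def diagnostic_rank(candidates, diagnostic_gene):
    """Return the rank of the diagnosed gene, or None if it was not scored.

    A None result corresponds to the false-negative case in the text.
    """
    return combined_gene_ranks(candidates).get(diagnostic_gene)


# Hypothetical gene-level candidates from several inheritance modes.
example = [
    ("GENE_A", "AD", 0.91),
    ("GENE_B", "AR", 0.95),
    ("GENE_A", "AR", 0.40),
    ("GENE_C", "XD", 0.70),
]
print(combined_gene_ranks(example))
print(diagnostic_rank(example, "GENE_Z"))
```

Returning `None` for an absent gene, rather than a large sentinel rank, keeps false negatives distinguishable from genes that were scored but ranked poorly.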

Impact of deep phenotypes derived from clinical NLP

The utility of HPO terms was investigated by rerunning all analyses from the benchmark cohort with three sets of HPO terms. The motivations for these analyses were first to determine how sensitive GEM is to phenotyping errors; and second, to compare the utility of CNLP-derived descriptions to manual ones. For each case, the first set comprised the HPO terms manually curated by the analysis team when the case was originally solved. A second set of HPO terms was generated from NLP analysis of clinical notes related to the case using the CLiX ENRICH software (Clinithink, Alpharetta, GA) [28]. A third, randomized set of HPO terms was generated for each case whereby the number of HPO terms from the Clinithink analysis of that case was held constant, and alternate terms were randomly selected from the entire corpus of HPO terms across all samples, with each term’s selection probability determined by the number of times that term occurred in the corpus.
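The randomized control described above amounts to frequency-weighted sampling from the pooled term corpus. The sketch below is one possible reading, not the authors' code: the function name is hypothetical, the HPO identifiers are placeholders, and drawing terms without repeats within a case is an assumption the source does not confirm.

```python
import random
from collections import Counter


def randomized_hpo_terms(case_terms, corpus_terms, seed=None):
    """Draw a random term list the same size as `case_terms`.

    `corpus_terms` is the flat list of HPO terms over all cases, repeats
    included, so a term's sampling weight is its number of occurrences.
    Assumption: a term is drawn at most once per randomized case.
    """
    rng = random.Random(seed)
    counts = Counter(corpus_terms)
    n = len(case_terms)  # hold the number of terms constant
    if n > len(counts):
        raise ValueError("corpus has fewer distinct terms than requested")
    drawn = []
    for _ in range(n):
        terms = list(counts)
        weights = [counts[t] for t in terms]
        pick = rng.choices(terms, weights=weights, k=1)[0]
        drawn.append(pick)
        del counts[pick]  # remove so the same term is not drawn twice
    return drawn


# Placeholder identifiers standing in for a pooled corpus and one case.
corpus = ["HP:0001250"] * 5 + ["HP:0001263"] * 3 + ["HP:0000252"]
print(randomized_hpo_terms(["HP:0004322", "HP:0001508"], corpus, seed=1))
```

Weighting by corpus frequency keeps the random sets realistic: commonly recorded terms appear often, so the control tests the loss of case-specific signal and not the effect of rare, unusual terms.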

Results and discussion

GEM AI outperforms variant prioritization approaches

We benchmarked GEM, an AI-based eCDSS, using a cohort of 119 pediatric retrospective cases from Rady Children’s Institute for Genomic Medicine (RCIGM; benchmark cohort). Most of these were critically ill NICU infants who received genomic sequencing for diagnosis of genetic diseases. All had been diagnosed with one or more Mendelian conditions using a combination of manual filtering and variant prioritization approaches (“Methods”). To further validate performance, we also analyzed a second cohort comprised of 60 non-NICU, rare disease patients from five different academic medical centers (validation cohort). Finally, we reanalyzed a set of 14 previously analyzed probands that had remained undiagnosed by WGS. Our goal was to evaluate the ability of GEM to identify new diagnoses in these previously unsolved cases, without providing false positive findings that would result in time-consuming case reviews. To provide context for our performance benchmarks, we also ran three commonly used variant prioritization tools: VAAST [14], Phevor [15], and Exomiser [16].

The benchmark and validation cohorts included singleton probands, parent-offspring trios, different modes of inheritance, and both small causal variants (SNVs, and small insertions or deletions, indels; Table 1; Additional file 1: Table S1) and large structural variants (SV), some of which were causative (Table 2). In these retrospective analyses, we considered the variants, disease genes, and conditions that were included as primary findings in the clinical report as the “gold standard” truth set.

Table 1.

Characteristics of case cohorts. Benchmark cohort, 119 cases total. Validation cohort, 60 cases total. Grand total, 179 cases

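| Mode of inheritance | Assay: WGS | Assay: WES | Variant: SNV/Indel | Variant: SV | Sex: Male | Sex: Female | Pedigree: Single | Pedigree: Duos | Pedigree: Trios |
|---|---|---|---|---|---|---|---|---|---|
| Benchmark cohort | | | | | | | | | |
| Autosomal dominant | 70 | 11 | 66 | 15 | 36 | 45 | 35 | 6 | 40 |
| Autosomal recessive | 27 | | 23 | 4 | 14 | 13 | 9 | 1 | 17 |
| X-linked dominant | 6 | | 5 | 1 | 1 | 5 | 2 | | 4 |
| X-linked recessive | 5 | | 5 | | 5 | | 2 | 1 | 2 |
| Sub-totals | 108 | 11 | 99 | 20 | 56 | 63 | 48 | 8 | 63 |
| Validation cohort | | | | | | | | | |
| Autosomal dominant | 3 | 34 | 37 | | 10 | 27 | 15 | 2 | 20 |
| Autosomal recessive | 1 | 14 | 15 | | 5 | 10 | 9 | | 6 |
| X-linked dominant | 1 | 5 | 6 | | 3 | 3 | 1 | 3 | 2 |
| X-linked recessive | | 2 | 2 | | 2 | | 1 | | 1 |
| Sub-totals | 5 | 55 | 60 | 0 | 20 | 40 | 26 | 5 | 29 |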

Table 2.

Diagnostic structural variants identified by GEM in the benchmarking cohort (20 out of 119 cases). Structural variants are ranked by GEM based on the genes harbored by the variant and presented alongside other ranked genes with coding SNVs or small indels based on the top scored gene. The asterisk indicates genes that in the literature are candidates for the phenotype of the diagnostic disease/syndrome (as described in OMIM). The results show that GEM can analyze both deletions (del) and duplications (dup) of sizes as small as 4 kb and up to entire chromosome arms, diverse modes of inheritance, pedigree structure, and from either WGS or WES assay data. GEM also automatically identified compound heterozygotes between SVs and SNV/indels (cases 1, 2, and 8). Input SV calls can include breakpoint-based calls (here “SV”), or imprecise CNV calls based on read depth analysis. Notably, GEM can also infer SVs directly from the small variant data when external SV calls are not provided (cases 2, 10, 15, and 17), and score them appropriately, identifying diagnostic variants that in the original cases were found by microarrays and not by sequencing

| Case no. | Top scored gene(s) | Gene rank | GEM score | Variant(s) position | SV type | Length (kb) | Mode of inheritance | Pedigree type | Assay type | SV calls in input | Diagnosis |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 252268 | FANCA* | 1 | 2.28 | chr16:89847864-89863349; FANCA: c.3788_3790delTCT | Del | 15 | Recessive | Trio | WGS | SV | Fanconi anemia |
| 223449 | TANGO2* | 1 | 2.13 | chr22:20028937-20057143; TANGO2: c.605+1G>A | Del | 28 | Recessive | Trio | WGS | None | MECRCN |
| 266523 | BTRC* | 1 | 2.05 | chr10:102941001-103430600 | Dup | 490 | Dominant | Duo | WGS | SV | Split hand/foot malformation type 3 |
| 267392 | HIRA, TBX1* | 1 | 2.05 | chr22:18893883-21562619 | Del | 2669 | Dominant | Single | WES | CNV | DiGeorge syndrome; velocardiofacial syndrome |
| 267148 | KMT2A | 1 | 1.87 | chr11:116691508-126432828; chr22:17038511-20307516 | Dup | 9741; 3269 | Dominant | Trio | WES | CNV | Emanuel syndrome |
| 253691 | HIRA, TBX1* | 1 | 1.73 | chr22:18893883-20307516 | Del | 1414 | Dominant | Single | WES | CNV | DiGeorge syndrome; velocardiofacial syndrome |
| 256943 | MAGEL2* | 1 | 1.64 | chr15:22833478-28566610 | Del | 5733 | Dominant | Single | WES | CNV | Prader-Willi syndrome |
| 254012 | NDUFS3* | 1 | 1.56 | chr11:47605229-47609177; NDUFS3: c.374G>A | Del | 4 | Recessive | Trio | WGS | SV | Leigh syndrome |
| 254728 | EPHA4 | 2 | 1.46 | chr2:220309089-224580863 | Del | 4272 | Dominant | Single | WGS | SV | Pathogenic deletion in 2q35q36.1 |
| 44671 | NPAP1 | 1 | 1.42 | chr15 tetrasomy (broken in multiple dups) | Dup | 4542; 991; 358; 158 | Dominant | Trio | WGS | None | Isodicentric chromosome 15 syndrome |
| 360547 | FREM1 | 1 | 1.33 | chr9:1-18477200 | Del | 18437 | Dominant | Trio | WGS | SV | Chromosome 9p deletion syndrome |
| 259685 | TYROBP | 1 | 1.31 | chr19:23158251-33502767 | Dup | 10345 | Dominant | Trio | WES | SV | Partial trisomy 19p12.q13.11 |
| 266700 | TAB2 | 1 | 1.31 | chr6:144951601-150260400 | Del | 5309 | Dominant | Trio | WGS | SV | Chromosome 6q24-q25 syndrome |
| 244102 | MAGEL2* | 1 | 1.28 | chr15:23684685-26108259 | Del | 2424 | Dominant | Single | WES | CNV | Prader-Willi syndrome |
| 204560 | JAG1* | 2 | 1.21 | chr20:10471400-13459333 | Del | 44 | Dominant | Trio | WGS | None | Alagille syndrome |
| 246146 | HCN1 | 1 | 1.20 | chr5:213101-46270700 | Dup | 44 | Dominant | Single | WGS | SV | Trisomy 5p |
| 45020 | PCDH19* | 1 | 1.15 | chrX:92925011-99669272 | Del | 6744 | X-linked dominant | Trio | WGS | None | Developmental and epileptic encephalopathy 9 |
| 248678 | FANCC* | 1 | 1.14 | chr9:97998556-98009092 | Del | 11 | Recessive | Single | WGS | SV | Fanconi anemia |
| 352726 | THRA | 1 | 1.00 | chr17:32147833-79020944 | Dup | 46873 | Dominant | Proband | WGS | SV | Distal trisomy 17q |
| 251355 | TRIP11 | 4 | 0.58 | chr14:84783523-96907490 | Del | 12124 | Dominant | Duo | WGS | SV | Chromosome 14q31.2q32.2 syndrome |

GEM gene scores are Bayes factors (BF) [84]; these were used to rank gene candidates (Additional file 2: Figure S1). BFs are widely used in AI, as they concisely quantify the degree of support for a conclusion derived from diverse lines of evidence. In keeping with established practices [84], a BF of 0–0.69 was considered moderate support, 0.69–1.0 substantial support, 1.0–2.0 strong support, and above 2.0, decisive support [84]. Scores less than 0 indicated support for the counter hypothesis—that variants in that gene were not causal for the proband’s disease. GEM outputs also include several annotations and metrics that provide additional, supportive guidance for subsequent expert case review (Additional file 2: Figure S1). Experience has shown that such guidance is critical for adoption by experts who wish to review the evidence supporting automated variant assertions. These include VAAST, VVP, and Phevor posterior probabilities, conditioned upon proband sex, gene location, and ancestry. Annotations include variant consequence, ClinVar database pathogenicity assessments, and OMIM conditions associated with genes. This metadata enables expert users to review the major contributions underpinning a final GEM score. Moreover, GEM prioritizes diplotypes, rather than variants, which speeds interpretation of compound heterozygous variants in recessive diseases (Additional file 2: Figure S1B). Comparison of the diagnostic performance of GEM to variant prioritization methods utilized ranking of the correct diagnostic gene. We assumed that in the case of compound heterozygotes, variant prioritization methods such as Exomiser would rank one variant of the pair highly, leading to identification of the other upon manual review (“Methods”).
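The support categories quoted above can be written as a small lookup. The following is a minimal sketch, not part of GEM: the function name is hypothetical, and because the quoted ranges share their end points (0.69, 1.0, 2.0), the handling of scores that fall exactly on a boundary is an assumption made here for illustration.

```python
def support_category(gene_score: float) -> str:
    """Map a GEM gene score to the support label quoted in the text.

    Boundary handling is an assumption: 0.69 and 1.0 are assigned to the
    higher category, and 2.0 is treated as "strong" because the text says
    "above 2.0" for decisive support.
    """
    if gene_score < 0:
        return "counter-hypothesis"  # support for the gene NOT being causal
    if gene_score < 0.69:
        return "moderate"
    if gene_score < 1.0:
        return "substantial"
    if gene_score <= 2.0:
        return "strong"
    return "decisive"


# Scores taken from Table 2 (FANCA, KMT2A, TRIP11) and from the COL4A4 case.
for score in (2.28, 1.87, 0.58, -0.6):
    print(score, support_category(score))
```

Applied to Table 2, this labels the FANCA deletion (2.28) as decisive and the TRIP11 deletion (0.58) as moderate.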

GEM ranked 97% of previously reported causal gene(s) and variant(s) among the top 10 candidates in the 119 benchmark cohort cases. In 92% of cases, it ranked the correct gene and variant in the top 2 (Fig. 1A). By comparison, the next best algorithm, Phevor, identified 73% of causal variants in the top 10 candidates and 59% in the top 2. GEM, Phevor, and Exomiser prioritize results by patient phenotypes (provided as HPO terms) in addition to variant pathogenicity, whereas VAAST only utilizes genotype data, explaining its lower performance. Thus, these data also highlight that patient phenotypes improve the diagnostic performance of automated interpretation tools.

Fig. 1.

The diagnostic sensitivity of GEM was greater than that of the variant prioritization methods Phevor, Exomiser, and VAAST. A Proportion of the benchmark cohort of 119 cases where the true causal genes (or variants in the case of causal SVs) were identified among the top 1, 2, 5, or 10 gene candidates. Patient phenotypes were extracted manually from medical records by clinicians and provided as HPO term inputs to GEM, Exomiser, and Phevor. VAAST only considers variant information. It should be noted that GEM and Phevor ranks correspond to genes, which may include one or two variants (the latter in the case of a compound heterozygote), whereas Exomiser and VAAST ranks were for single variants. In the case of compound heterozygotes, the rank of the top-ranking variant is shown for Exomiser and VAAST. B Comparison of GEM performance in the benchmark cohort (excluding SV cases) versus the validation cohort (comprising 60 rare genetic disease cases from multiple sources)

The benchmark cohort included 3 cases for which two genes were reported to contribute to the patient phenotype. This rate (2.5%) is consistent with previous reports for digenic inheritance [85]. The statistics above use the top ranked genes in these cases, but Additional file 1: Table S3 shows that GEM also ranked the second causal gene among its top candidates, whereas Phevor reported poor ranks in one case, and Exomiser missed the second gene in two out of the three cases.

Next, we investigated whether the diagnostic performance of GEM extended to Mendelian diseases other than those of NICU infants, such as those of patients with later disease onset or less severe presentations, or cases with data produced by other variant calling pipelines or outpatient genetic clinics. For these analyses, we compiled a validation cohort largely consisting of WES cases from five different academic medical centers (Table 1; Additional file 1: Table S2). The diagnostic performance of GEM in the validation cohort was almost identical to that in the benchmark cohort (Fig. 1B). These data demonstrated that the diagnostic performance of GEM was not dependent on disease severity, age of onset, or genomic sequencing or variant detection methods.

An implication of these findings is that GEM achieved 97% recall (true positive rate) by review of 10 genes, whereas the other tools had < 78% recall by similar review (Fig. 1, Additional file 2: Figure S2). In part, this difference reflected the unique ability of GEM to prioritize SVs. Excluding SV cases, GEM, Phevor, and Exomiser achieved recall of 97%, 83%, and 76%, respectively, by review of 10 genes (Additional file 2: Figure S3A). Furthermore, VAAST and Exomiser failed to provide rankings for 4 and 18 true positive variants, respectively. Exclusion of false negatives and SV cases increased the top 10 recall of Exomiser to 93% (Additional file 2: Figure S3B), in agreement with previous reports [86]. These data show the importance of including all types of cases and causal variants in benchmarking to avoid overestimation of diagnostic performance in real-world clinical applications.
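The effect described above, where dropping unranked true positives inflates apparent recall, can be illustrated with a short calculation. This is a sketch on invented toy ranks, not the study data; the function names are hypothetical, and `None` is used here to mark a case for which a tool returned no rank at all.

```python
from typing import Optional, Sequence


def recall_at_k(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of all cases whose causal gene is ranked within the top k.

    A rank of None (tool produced no ranking) counts as a miss.
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


def recall_at_k_ranked_only(ranks: Sequence[Optional[int]], k: int) -> float:
    """Same metric, but silently dropping unranked cases (overestimates recall)."""
    ranked = [r for r in ranks if r is not None]
    return recall_at_k(ranked, k)


# Toy example: ten cases, two of which the tool failed to rank.
toy_ranks = [1, 1, 2, 4, 12, None, None, 1, 3, 1]
print(recall_at_k(toy_ranks, 10))              # 0.7
print(recall_at_k_ranked_only(toy_ranks, 10))  # 0.875
```

Keeping the unranked cases in the denominator is the more conservative choice, which is the point the paragraph above makes about benchmarking.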

Scoring of structural variants increases diagnostic rate

A major barrier to the incorporation of SV calls into genome diagnostic interpretation, whether manual or using eCDSS, is their low precision (high false positive, FP, rates) using short read alignments, with typical FP rates of 20–30% [87, 88]. This leads to overwhelmingly time-consuming, manual assessment of event quality and significance for large numbers of SVs. GEM minimizes the effect of low precision by scoring SVs using SV calls provided in the proband’s input VCF file and/or by inferring their existence ab initio from metadata associated with SNV and indel calls (“Methods”; see below). The benchmark cohort included 20 cases in which SVs were reported to be causative, reflecting a similar incidence to that in real-world experience (Fig. 1A, Table 2) [20–23]. In 17 of these, the causative SV was ranked first by GEM. In two, it was ranked second, and in one it was ranked fourth, demonstrating that GEM retains adequate diagnostic performance with imprecise SV calls. The disease-causing SVs in the benchmark set ranged from small (4 kb) to very large (e.g., entire chromosome arms). In three cases, the diagnosis was of an autosomal recessive disorder in which the SV was compound heterozygous with a SNV/indel. In each, GEM integrated the two variants correctly, automatically identifying the causative diplotypes (Additional file 2: Figure S5). With regard to the diagnostic specificity of GEM, the mean and median numbers of gene candidates with BF > 0 (any support) for these probands were 8.7 and 9.5, respectively, which was similar to probands whose VCF files contained no SVs, causative or otherwise.

Large SVs frequently affect more than one gene. For consistency with other variant classes, genes within multigenic SVs are grouped and sorted by GEM based upon the gene-centric Bayes factor score associated with the overlap of the proband phenotype and known Mendelian disorders (if any) associated with them (“Methods”). For example, Additional file 2: Figure S4 shows a case that highlights the practical utility of prioritizing genes harboring causative SVs together with SNVs and short indels in the same report, rather than separately cross-referencing with databases of microdeletion syndromes [89]. While it is often unknown which genes harbored in a pathogenic SV are causal for microdeletion/microduplication syndromes, GEM’s gene-by-gene rankings typically agreed with causal gene candidates suggested by the literature (asterisks in Table 2).

By default, GEM evaluates every gene and transcript for the presence of overlapping SVs. Notably, four benchmark cases did not include externally called SVs in their input VCFs (these had been previously diagnosed by manual inspection and orthogonal confirmatory tests; Table 2). Nevertheless, GEM inferred the existence of these four SVs using its ab initio SV identification algorithm and evaluated them jointly with SNVs and indels (“Methods”). To further demonstrate this innovative functionality, we removed all external SV calls from each input VCF file of the 14 WGS cases (as GEM’s ab initio SV imputation is currently limited to WGS data) and reran GEM. GEM re-identified 13 of the 14 causative SVs. Although GEM’s inferred SV termini were imprecise, an overlapping SV of the same class (duplication, deletion, or CNV) and ploidy as that in the original VCF was inferred, and the same high scoring gene and mode of inheritance/genotype (autosomal dominant, simple recessive, or compound heterozygote) was ranked first. SV recall within the top 1, 5, and 10 ranked GEM results was 71%, 86%, and 93%, respectively. The single false negative was a small (4 kb) homozygous deletion. GEM failed to identify this SV because it did not span sites with known variation in the gnomAD database [73], upon which ab initio SV inference is based (“Methods”). With regard to specificity, the mean and median numbers of results with genes with BF > 0 in these cases were 10.6 and 12.5, respectively. These values differed only slightly from the results obtained using external SV calls (8.7 and 9.5, respectively), despite the fact that every gene and transcript was evaluated for the presence of SVs.

Collectively, these results demonstrate the accuracy of GEM’s ab initio approach to identification and prioritization of SVs without recourse to external calls and databases of known causative SVs. Thus, GEM compensates, in part, for the low recall of SVs from short-read sequences. If an external SV calling pipeline fails to detect an SV, there is still the possibility that GEM will identify it via this ab initio approach. This capability, together with GEM’s ability to accurately prioritize SVs in the context of SNVs and short indels, addresses an unmet need for clinical applications. This characteristic also makes GEM well suited for reanalyses of older cases and/or pipelines lacking SV calling.

Leveraging automated phenotyping from clinical natural language processing

Ontology-based phenotype descriptions, using Human Phenotype Ontology (HPO) terms [69], are widely used to communicate the observed clinical features of disease in a machine-readable format. These lists of terms are usually derived by manual review of patient EHR data by trained personnel, a time-consuming, subjective process. A solution is automatic extraction of patient phenotypes from clinical notes using clinical natural language processing (CNLP) [28, 90]. One challenge has been that CNLP generates many more terms than manual extraction. For example, manual curation yielded an average of 4 HPO terms (min = 1, max = 12) in the benchmark cohort, while CNLP yielded an average of 177 HPO terms (min = 2, max = 684). Some of the extra CNLP terms are hierarchical parent terms of those observed, raising the concern that their inclusion diminishes the average information content in a manner that could impede diagnosis [27]. To investigate the effect of CNLP-derived HPO terms on GEM’s performance, we analyzed the benchmark cohort both with HPO terms extracted by commercial CNLP (“Methods”) and with manually extracted HPO terms.

Figure 2 shows the distributions and medians for ranks and GEM gene scores of true positives, as well as the number of gene candidates with BF ≥ 0.69 (moderate support), for manual and CNLP terms. The median rank of the causal genes did not significantly differ between CNLP- and manually derived phenotype descriptions (Fig. 2A). The median GEM gene score of true positives was higher with CNLP-derived phenotypes than with manual phenotypes (Fig. 2B). The number of candidates above the BF threshold was higher with manual phenotypes than CNLP (Fig. 2C). CNLP rescued a few true positives with low ranks and negative BF scores compared to manual phenotype descriptions (Fig. 2A, B). These results demonstrate that GEM performs somewhat better with CNLP-derived phenotype descriptions as part of an automated interpretation workflow, than with sparse, manual phenotypes.

Fig. 2.

Comparison of GEM performance with manually curated and CNLP-derived HPO terms in the benchmark cohort. Distribution of ranks for causal genes (A); GEM Bayes factors for causal genes (B); and number of candidates (hits) at BF ≥ 0.69 threshold (moderate support) (C). The black line in the graphs denotes the median. The asterisks represent statistical difference between the groups with p < 0.0001 from a two-tailed Wilcoxon matched pairs signed rank test (ranks showed no statistically significant difference)

Resilience to mis-phenotyping and gaps in clinical knowledge

Given the potentially noisy nature of the CNLP phenotype descriptions, it was important to examine the sensitivity of GEM to mis-phenotyping. To address this question, we randomly permuted CNLP-extracted HPO terms between cases, weighting by term frequency within the cohort, so that every case maintained the same number of HPO terms as CNLP originally provided. Permuting HPO terms resulted in lower gene scores, and several cases would have been lost for review had the gene score threshold of BF ≥ 0 still been used, but ranks were unaffected (98% in top 10; Fig. 3). These results represent lower-bound estimates of performance, as actual mis-phenotyping (short of data tracking issues) would be much less severe. It is also worth noting that even using randomly permuted phenotype descriptions, GEM’s performance still exceeded that of Phevor and Exomiser using the correct phenotypes (Additional file 2: Figure S2). We therefore conclude that GEM is resilient to mis-phenotyping.
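One way to picture the permutation experiment is the sketch below. It is an illustration under stated assumptions, not the procedure used in the study: the function name is hypothetical, term frequency is taken to be the number of cases in which a term occurs, terms within a case are assumed to be distinct, and each case redraws the same number of distinct terms from the cohort-wide pool with frequency weights.

```python
import random
from collections import Counter
from typing import Dict, List


def permute_hpo_terms(cases: Dict[str, List[str]], seed: int = 0) -> Dict[str, List[str]]:
    """Redraw each case's terms from the cohort pool, weighted by term frequency.

    Every case keeps the same number of (distinct) terms it started with.
    """
    rng = random.Random(seed)
    # Frequency = number of cases in which each term appears (an assumption).
    freq = Counter(term for terms in cases.values() for term in set(terms))
    vocab = sorted(freq)
    weights = [freq[term] for term in vocab]

    permuted: Dict[str, List[str]] = {}
    for case_id, terms in cases.items():
        n_terms = len(set(terms))
        drawn: List[str] = []
        seen = set()
        # Weighted sampling without replacement by rejecting repeats.
        while len(drawn) < n_terms:
            term = rng.choices(vocab, weights=weights, k=1)[0]
            if term not in seen:
                seen.add(term)
                drawn.append(term)
        permuted[case_id] = drawn
    return permuted


# Toy cohort with placeholder term labels (not real HPO identifiers).
toy = {
    "case_1": ["term_a", "term_b", "term_c"],
    "case_2": ["term_a", "term_d"],
    "case_3": ["term_a", "term_b", "term_e", "term_f"],
}
shuffled = permute_hpo_terms(toy, seed=42)
print({case_id: len(terms) for case_id, terms in shuffled.items()})
```

Frequent terms are more likely to be drawn for any given case, so the permuted descriptions keep the cohort's overall term distribution while breaking the link between a patient and their own phenotype.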

Fig. 3.

Impact of missing data and mis-phenotyping on GEM performance in the benchmark cohort. Causal gene rank (A); Bayes factors for causal genes (B); and number of candidates (hits) above gene BF ≥ 0.69 threshold (moderate support) (C) under standard conditions, withdrawing ClinVar information, and permuting HPO terms extracted by CNLP. The black line in the graphs denotes the median

We also evaluated the impact of gaps in clinical knowledge on GEM performance by withdrawing annotations from a key clinical database, ClinVar. Absence of ClinVar annotations had minimal impact in ranking, although it reduced median gene scores (1.1 vs. 2.7), resulting in 9 cases no longer meeting the minimum Bayes factor threshold ≥ 0 (any support; Fig. 3). Clearly, ClinVar provided GEM with valuable information. Nonetheless, without ClinVar, GEM’s top 10 maximal recall (88%) still exceeded that of Phevor (72%) and Exomiser (65%; Fig. 1). More broadly, these results show that integrating more datatypes in GEM improves diagnostic performance and results in greater algorithmic stability (Figs. 2 and 3).

About 70% (86/122) of the disease-causing variants in the benchmarking dataset are reported in ClinVar with pathogenic (P) or likely pathogenic (LP) clinical significance annotations. Moreover, each proband’s whole-genome variant set contained on average 1.9 variants with ClinVar P/LP annotations. These two facts underscore the importance of ClinVar annotations for assisting diagnosis. They also make clear that tools that leverage ClinVar information need to avoid false positives, which lengthen candidate lists because non-causal genes also contain ClinVar P/LP variants. Additional file 1: Table S4 breaks down results for the benchmark cohort with respect to ClinVar annotations of causal variants. Overall, mean and median ranks were slightly improved for diagnostic variants with ClinVar P/LP annotations vs. those without them (mean 1 vs. 3), with GEM showing the greatest improvement in ranks. Moreover, GEM maintained the same number of candidates with GEM gene score > 0 for both classes [10], demonstrating that GEM can use ClinVar status to improve diagnostic rates without increasing the number of candidates for review.

GEM performs equivalently on parent-offspring trios and single probands

Parent-offspring trios are widely used for molecular diagnosis of rare genetic disease. While a recent study showed that singleton proband sequencing returned a similar diagnostic yield as trios [91], interpretation of trio sequences is less labor-intensive. For example, trios enable facile identification of de novo variants, which are the leading mechanism of genetic disease in outbred populations [92]. Likewise, in recessive disorders, proband compound heterozygosity can be automatically distinguished from two variants in cis. However, these benefits are associated with increased sequencing costs. Moreover, both parents are not always available for sequencing, and some do not wish to have their genomes sequenced.

To understand how GEM performs in the absence of parental data, we reanalyzed the 63 trio and duo cases from the benchmark cohort as singleton proband cases. Surprisingly, we observed no significant differences in the mean rank of the causal gene (Fig. 4A), GEM score of the causal gene (Fig. 4B), or number of candidates with BF ≥ 0.69 (Fig. 4C), using either manually or CNLP-extracted HPO terms. In contrast, this reanalysis was associated with a decline in the performance of Exomiser (Additional file 2: Figure S6). These analyses demonstrated that GEM was resilient to the absence of parental genotypes, a feature that could increase the cost effectiveness and adoption of WGS.

Fig. 4.

Comparative performance of parent-offspring trios or duos vs. singleton probands in the benchmark cohort. Causal gene rank (A); Bayes factors (B); and number of candidates (hits) above gene BF ≥ 0.69 (moderate support) (C) for 63 cases analyzed as parent-offspring trios (n = 59) or duos (n = 4), as compared with analysis as single probands, using either manually curated or CNLP-derived HPO terms. The black line in the graphs denotes the median. No statistically significant difference was found between trios/duos and single probands in either the manual or the CNLP group using the two-tailed Wilcoxon matched pairs signed rank test

GEM scores optimize case review workflows

Conventional prioritization algorithms rank variants, enabling manual reviewers to start with the top ranked variants, and work their way down in the list until a convincing variant is identified for further curation, classification, and possible clinical reporting. This review process typically involves (a) assessing variant quality, deleteriousness, and prior clinical annotations; (b) evaluating whether there is a reasonable match between the phenotypes exhibited by the patient and those reported for condition(s) known to be associated with defects in the corresponding gene; and (c) considering the match in mode(s) of inheritance reported in the literature for the candidate disease and the patient’s diplotype.

GEM accelerates this process, because it intrinsically considers variant quality, deleteriousness, prior clinical annotations, and mode of inheritance. Furthermore, at manual review, GEM gene scores summarize the relative strength of evidence supporting the hypothesis that the gene is damaged and that this explains the proband’s phenotype.

GEM scores provide a logical framework for setting thresholds with regard to the optimal number of candidates that should be reviewed to achieve a desired diagnostic rate. This enables laboratory directors and clinicians to dynamically set optimal tradeoffs of interpretation time and diagnosis rate for specific patients, relative to their suspicion of a genetic etiology or results of other diagnostic tests.

We examined the effect of different BF thresholds on recall (true positive rate) and median number of gene candidates for review in the benchmark cohort (Fig. 5). In such analyses, it is germane to consider the concept of maximal true positive rate (or recall) to measure the theoretical proportion of true positive diagnoses recoverable by perfect interpretation when reviewing a set of N genes containing the true positive. For example, in the benchmark dataset, a GEM causal gene score threshold ≥ 0 would retain a median of ten candidates for review and provide a 99% maximal recall; whereas a threshold of ≥ 0.5 would retain a median of four candidates for review for a 97% maximal recall (Fig. 5).
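The trade-off described above can be computed directly once each case is reduced to its list of candidate gene scores and the score of its true causal gene. The sketch below uses invented toy scores rather than the benchmark data, and the function and type names are hypothetical.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple

# One case = (scores of all candidate genes, score of the true causal gene).
Case = Tuple[List[float], float]


def threshold_sweep(cases: Sequence[Case], thresholds: Sequence[float]) -> Dict[float, Tuple[float, float]]:
    """For each score threshold, return (maximal recall, median candidates to review)."""
    results: Dict[float, Tuple[float, float]] = {}
    for threshold in thresholds:
        # A case is recoverable if its causal gene survives the threshold.
        recovered = sum(1 for _, causal in cases if causal >= threshold)
        # Review burden: how many genes per case survive the threshold.
        burden = [sum(1 for s in scores if s >= threshold) for scores, _ in cases]
        results[threshold] = (recovered / len(cases), median(burden))
    return results


toy_cases: List[Case] = [
    ([2.1, 0.8, 0.3, 0.1], 2.1),
    ([1.2, 0.7, 0.2], 0.7),
    ([0.4, 0.1, -0.2], 0.4),
    ([1.5, 0.9, 0.6, 0.2, 0.0], 1.5),
]
for threshold, (recall, n_genes) in threshold_sweep(toy_cases, [0.0, 0.69]).items():
    print(f"threshold >= {threshold}: maximal recall {recall:.2f}, median genes {n_genes}")
```

Lowering the threshold can only keep or raise maximal recall, at the cost of more genes to review per case, which is the pattern Fig. 5 reports for the benchmark cohort.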

Fig. 5.

Trade-off between GEM gene scores, maximal true positive rates, and number of candidates for review in the benchmark cohort. GEM gene scores are Bayes factors (BF) that can be used to speed case review. A Gene maximal true positive rate achieved at the different BF thresholds (Y-axis). B Median number of candidate genes for review at each BF threshold. As the BF threshold is decreased, true positive rate increases, while the number of candidates to review remains manageable. Input HPO terms for this analysis were extracted by CNLP

These results illustrate how a tiered approach to case review using GEM gene scores could minimize the number of candidate genes to review and thereby manual interpretation effort. For example, a first pass review of candidates with a gene BF ≥ 0.69 provided an expected 95% diagnostic rate (and a corresponding median of 3 genes to be manually reviewed). If no convincing candidates are found, a second pass using a threshold > 0 would recover an additional 4% of possible diagnoses, involving review of a median increment of seven genes. Application of this two-tiered approach to the benchmark dataset of 119 cases (Fig. 1) required manual final review of 395 candidate genes (3 genes in 115 cases and 10 genes in 5 cases), or an average of 3.3 candidate genes per case, for a maximal recall of 99%. Finally, review of candidates with BF < 0 recovered the last true positive in the benchmark cohort (COL4A4, ranked 40th in the GEM report with a BF = −0.6). This case was a phenotypically and genotypically atypical autosomal dominant presentation of Alport syndrome 2 (MIM 203780).
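The two-pass review can be sketched as a small function that reports how many genes a reviewer would inspect for one case and whether the causal gene is among them. This is an illustrative sketch with hypothetical names and toy scores; the last lines simply redo the workload arithmetic quoted in the text.

```python
from typing import Dict, Tuple


def tiered_review(candidates: Dict[str, float], causal_gene: str,
                  first_pass: float = 0.69, second_pass: float = 0.0) -> Tuple[int, bool]:
    """Return (genes reviewed, causal gene found) for a two-pass review.

    Pass 1 reviews genes scoring >= first_pass. Only if the causal gene is
    not among them does pass 2 widen the review to genes scoring > second_pass.
    """
    tier1 = {gene for gene, score in candidates.items() if score >= first_pass}
    if causal_gene in tier1:
        return len(tier1), True
    tier2 = {gene for gene, score in candidates.items() if score > second_pass}
    return len(tier2), causal_gene in tier2


# Toy case with placeholder gene names.
toy_case = {"GENE_A": 1.4, "GENE_B": 0.8, "GENE_C": 0.3, "GENE_D": -0.2}
print(tiered_review(toy_case, "GENE_A"))  # (2, True): solved in the first pass
print(tiered_review(toy_case, "GENE_C"))  # (3, True): needs the second pass
print(tiered_review(toy_case, "GENE_D"))  # (3, False): negative score, missed by both passes

# Workload arithmetic as quoted in the text for the benchmark cohort.
total_genes = 3 * 115 + 10 * 5
print(total_genes, round(total_genes / 119, 1))  # 395 3.3
```

The third toy call mirrors the COL4A4 situation described above, where a causal gene with a negative score is only recovered by reviewing candidates below both thresholds.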

Clinical decision support for diagnosis

Quantifying how well the observed phenotypes in a patient match the expected phenotypes of Mendelian conditions associated with a candidate gene is challenging for clinical reviewers and is a major interpretation bottleneck. In practice, clinicians look for patterns of phenotypes, biasing their observations. In addition, patient phenotypes evolve as their disease progresses. And there is considerable, disease-specific heterogeneity in the range of expected phenotypes. Simply comparing exact matches of the patient’s observed HPO terms with those expected for that disease is suboptimal, because the observed and expected HPO terms are often hierarchical neighbors, rather than exact matches. Missing terms, particularly those considered pathognomonic for a condition, and “contradictory” terms further complicate such comparisons by clinicians. Thus, generation of quantitative, standardized, unbiased models of disease similarity has proven elusive.

GEM can automate or provide clinical decision support for this process via a condition match (CM) score (“Methods”). The GEM CM score summarizes the match between observed and expected HPO phenotypes for genetic diseases and considers the known mode(s) of inheritance, associated gene(s), their genome location(s), proband sex, the pathogenicity of observed diplotypes, and ClinVar annotations. Importantly, CM scores reflect relationships between phenotype terms as expressed in the HPO ontology graph, enabling inclusion of imprecise matches in similarity comparisons. CM scores can be used in a wide variety of clinical settings to prioritize and quickly assess possible Mendelian conditions as candidate diagnoses, a process we term diagnostic nomination.
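To make the idea of imprecise matching through the ontology graph concrete, the sketch below compares two term sets after expanding each to include all ancestor terms. It is not the GEM CM score, whose computation is described in “Methods”; the toy ontology, its term labels, and the Jaccard-style measure are all assumptions chosen for illustration.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy ontology (child -> parents); these are NOT real HPO terms.
PARENTS: Dict[str, List[str]] = {
    "focal_seizure": ["seizure"],
    "generalized_seizure": ["seizure"],
    "seizure": ["nervous_system_abnormality"],
    "hypotonia": ["nervous_system_abnormality"],
    "nervous_system_abnormality": ["phenotypic_abnormality"],
    "cardiomyopathy": ["heart_abnormality"],
    "heart_abnormality": ["phenotypic_abnormality"],
    "phenotypic_abnormality": [],
}


def with_ancestors(terms: Iterable[str]) -> Set[str]:
    """Return the terms together with all of their ancestors in the toy graph."""
    closure: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closure:
            closure.add(term)
            stack.extend(PARENTS.get(term, []))
    return closure


def ancestor_jaccard(observed: Iterable[str], expected: Iterable[str]) -> float:
    """Jaccard overlap of the two ancestor-expanded term sets (0 = unrelated, 1 = identical)."""
    a, b = with_ancestors(observed), with_ancestors(expected)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Neither pair shares an exact term, but the sibling seizure types score higher.
print(ancestor_jaccard(["focal_seizure"], ["generalized_seizure"]))  # 0.6
print(ancestor_jaccard(["focal_seizure"], ["cardiomyopathy"]))
```

An exact-match comparison would score both pairs as zero, whereas the ancestor-aware comparison separates hierarchical neighbors from unrelated terms.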

Specific, definitive, genetic disease diagnosis remains a significant challenge for clinical reviewers, even with the short, highly informative candidate gene lists provided by tools such as GEM. This is because many genes are associated with more than one Mendelian disease. For example, application of a GEM causal gene score threshold of ≥ 0.69 to the 119 probands in the benchmark cohort results in a median of 3 gene candidates per proband (cf. Fig. 5), associated with a maximal gene recall of 95%. However, because many genes are associated with more than one disease, clinical reviewers would actually need to consider around 12 candidate Mendelian conditions per proband (data not shown). This difficulty is exacerbated by the fact that most laboratory directors are not physicians and lack formal training in clinical diagnosis.

Determination of a specific, definitive genetic disease diagnosis among several candidates can be accomplished with a combination of GEM CM scores and causal gene scores (Fig. 6). Using the benchmark cohort’s true (reported) gene and disorder diagnoses as ground truth, we used a GEM gene score threshold ≥ 0.69 to recover gene candidates, and the associated CM scores to rank order the diseases associated with those gene candidates (Fig. 6A). Using CNLP-derived phenotypes, the true disease diagnosis was the top nomination by CM score in 75% of cases, within the top 5 in 91% of cases, and within the top 10 in 95% of cases. Performance was inferior with manually extracted phenotype terms. The area under the receiver-operator characteristic (ROC) curves (AUCs) were 0.90 and 0.88, for CNLP and manual terms, respectively (Fig. 6B). This implied that the larger number of CNLP-extracted terms conveyed greater information content, permitting better discrimination of the correct diagnostic condition, than sparse, manually extracted phenotypes [27].
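The nomination step described above (filter genes by score, then order their associated disorders by CM score) can be sketched as follows. The names and data layout are hypothetical, and the tie-break on gene score is an assumption. The SRP54 values are those reported in Table 3; the second gene is an invented entry included only to show the filter.

```python
from typing import Dict, List, NamedTuple, Tuple


class Nomination(NamedTuple):
    gene: str
    mim_id: str
    gene_score: float
    cm_score: float


def nominate(genes: Dict[str, Tuple[float, Dict[str, float]]],
             min_gene_score: float = 0.69) -> List[Nomination]:
    """Keep genes at or above the score threshold, then rank their disorders by CM score."""
    rows = [
        Nomination(gene, mim_id, gene_score, cm_score)
        for gene, (gene_score, disorders) in genes.items()
        if gene_score >= min_gene_score
        for mim_id, cm_score in disorders.items()
    ]
    # Highest CM score first; ties broken by gene score (an assumption).
    return sorted(rows, key=lambda row: (-row.cm_score, -row.gene_score))


case_244799 = {
    # Gene score and CM scores as listed in Table 3.
    "SRP54": (1.76, {"618752": 0.672, "260400": 0.893}),
    # Invented low-scoring gene: removed by the gene score filter.
    "HYPOTHETICAL_GENE": (0.20, {"000000": 1.500}),
}
for row in nominate(case_244799):
    print(row.gene, row.mim_id, row.cm_score)
```

With these inputs the Shwachman-Diamond entry (MIM 260400) is nominated ahead of severe congenital neutropenia (MIM 618752), matching the order of the CM scores in Table 3.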

Fig. 6.

Performance of GEM condition match scores for diagnostic nomination in the benchmark cohort. A Ranks for reported diagnostic conditions for the benchmark dataset, using a GEM gene BF score ≥ 0.69 and sorted by CM score, for HPO terms derived from CNLP or manual curation. B Receiver-operator characteristic curves for the condition match (CM) score for all hits with BF 0. CNLP All: HPO extracted from clinical notes by CNLP; AUC = 0.91. Manual: Manually curated HPO terms; AUC = 0.88. CNLP Multiple Dx: CNLP-derived CM score for the true positive disorder versus the other possible disorders associated with that gene; AUC = 0.68. Manual Multiple Dx: As for CNLP-derived CM but using manually curated HPO terms; AUC = 0.69

In the benchmark cohort, 58 of the 100 candidate genes (excluding cases with causal, multigenic SVs) were associated with 2 or more disorders (median of 3 gene-disorder associations, maximum of 15; Additional file 2: Figure S7 shows the example of ERCC6). We measured how well the CM score distinguished between multiple, alternative disorders associated with the same gene (Fig. 6B). In these 58 cases, the AUC was less than that for CNLP with the entire set of candidate genes in the benchmark cohort (0.68 vs 0.9). This decrease can be at least partially explained by the high similarity (and in some cases identity) of the clinical features of different disorders associated with the same gene. Thus, a combination of GEM gene and CM scores can refine candidate disorders for clinical reporting, further reducing review times.

Reanalysis of previously unsolved cases

Recent reports show that reanalysis of older unsolved cases suspected of rare genetic disease can yield new diagnoses supported by incremental increases in knowledge of pathogenic variants, disease-gene discoveries, and reports of phenotype expansion for known disorders [93, 94]. While worthwhile, there are barriers to reanalysis, such as limited reimbursement and low incremental diagnostic yield, that limit use to physician requests. Ideally, all unsolved cases would be automatically and periodically reanalyzed, and a subset with high likelihood of new findings would be prioritized for manual review. The strong correlation between true positive rates and GEM gene scores (Fig. 5) suggested a strategy for triaging reanalyzed cases for manual review: only cases for which the recalculated GEM score had increased sufficiently to suggest a high probability of a new diagnosis would pass the threshold for manual review. Likewise, GEM condition match scores could be used to search all prior cases to identify the subset of unsolved cases with support for particular Mendelian conditions, aiding cohort assembly for targeted reanalysis based upon particular proband phenotypes, or for review by particular medical specialists. Of note, an advantage of CNLP is that it is possible to automatically generate a new clinical feature list at time of reanalysis. This is particularly important in disorders whose clinical features evolve with time and where the observed features may be nondescript at presentation.

To test the utility of GEM for reanalysis, we selected 14 unsolved cases that had rWGS performed by RCIGM. For these reanalyses, we used CNLP-derived HPO terms (Table 3) and a more stringent gene BF threshold ≥ 1.5 to restrict the search to very strongly supported candidates. Ten cases yielded no hits. Four cases returned a total of 7 candidate genes. Review of three cases did not return new diagnoses. In the remaining case, a new likely diagnosis was made of autosomal dominant Shwachman-Diamond Syndrome (MIM: 260400) or severe congenital neutropenia (MIM: 618752) [95, 96], both of which are associated with pathogenic variants in SRP54. The respective CM scores using 261 CNLP-derived terms were relatively high (0.893 and 0.672, respectively). The association of SRP54 and these disorders was first reported in November 2017 [95] and entered in OMIM in January 2020 [97], which explained why it was not identified as the diagnosis originally in July 2017. The identified candidate p.Gly108Glu variant has been classified as “uncertain significance” by ACMG guidelines. However, if we were able to confirm de novo origin with parental genotypes (which are currently lacking for this single proband case), the variant could be reclassified as “likely pathogenic” (meeting PM2, PM1, PP3, and PM6 of the ACMG guidelines). This was a singleton proband sequence and confirmation is being pursued. Thus, GEM reanalysis of 14 unsolved cases led to 7 gene-disorder reviews (an average of 0.5 per case), and yielded one likely new diagnosis, which was consistent with prior reanalysis yields [93, 94].

Table 3.

Previously undiagnosed cases with potential leads. Cases with hits with a GEM gene score BF > 1.5. Zygo, zygosity; Hom, homozygous; Het, heterozygous; Dup, large duplication

| Case | Pedigree | Sex | Assay | Rank | Chr | Gene | Variant type | Variant ACMG | De novo | Zygo | GEM score | Mode of inheritance | MIM ID(s) | CM score(s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 244799 | Single | Male | WGS | 1 | 14 | SRP54 | SNV | Uncertain significance | Likely | Het | 1.76 | Dominant | 618752, 260400 | 0.672, 0.893 |
| 245237 | Trio | Male | WGS | 2 | X | GK | SNV | VUS | Yes | Het | 1.60 | X-linked recessive | 307030 | 1.119 |
| 245237 | Trio | Male | WGS | 3 | 16 | FANCA | SNV | VUS | No | Hom | 1.55 | Recessive | 227650 | 1.315 |
| 245768 | Single | Male | WGS | 1 | 16 | TSC2 | Dup | VUS | Likely | Het | 1.64 | Dominant | N/A | N/A |
| 247458 | Single | Male | WGS | 1 | 1 | SLC25A24 | SNV | VUS | Likely | Het | 1.86 | Dominant | 612289 | 1.995 |
| 247963 | Trio | Female | WGS | 1 | X | STAG2 | SNV | Likely pathogenic | Yes | Het | 1.53 | X-linked dominant | 301022 | 1.25 |

Conclusions

Here we described and benchmarked a Bayesian, AI-based gene prioritization tool for scalable diagnosis of rare genetic diseases by CNLP and WES or WGS. GEM improved upon prior, similar tools [19, 27, 28, 98, 99] by incorporating OMIM, HPO, and ClinVar knowledge explicitly; by automatically controlling for confounding factors, such as sex and ancestry; by being compatible with CNLP-derived phenotypes, SVs, and singleton probands; and by directly nominating diplotypes and disorders, rather than just prioritizing variants.

In the cohorts examined, GEM had maximal recall of 99%, requiring review of an average of 3 candidate genes, and less than one half of the associated disorders nominated by other widely used variant prioritization methods per case. Improved diagnostic performance is anticipated to enable faster and more cost-effective, tiered reviews. GEM recall was essentially unaltered in the absence of parental genotypes in our data, meaning that full trio-sequencing is not always a requirement for high diagnostic yield. However, our cohort includes only definitively solved cases with 70% of variants already classified as P/LP in ClinVar; identification of less certain candidate variants and genes may still benefit from parental genotypes for ascertaining de novo variants, and for phasing alleles in genes associated with recessive conditions.

Uniquely, GEM provided AI-based unified gene prioritization for SVs and small variants. Hitherto, this has been frustrated by the high false positive rates of SV calls using short-read sequences and lack of a suitable framework for AI-based SV pathogenicity assertions [87, 88]. Furthermore, GEM inferred SV calls ab initio from WGS when they were not provided. These functionalities are critical for reanalyzing older cases, and for pipelines lacking SV calls.

Finally, in a small data set, we showed that GEM can efficiently reanalyze cases, potentially permitting cost-effective, scalable reanalysis of previously unsolved cases as disease, gene, and variant knowledge evolves [94, 100]. Indeed, integration of GEM and CNLP could enable automatic surveillance for rare disease patients [101] from genomes obtained for research or other clinical tests performed in healthcare [102, 103]. These combined features hold promise for reduced time-to-diagnosis and greater scalability for critical applications, such as in seriously ill children in the NICU/PICU [27, 104].

Supplementary Information

13073_2021_965_MOESM1_ESM.xlsx (336KB, xlsx)

Additional file 1. Supplementary Tables (Tables S1-S14).

13073_2021_965_MOESM2_ESM.pdf (911KB, pdf)

Additional file 2. Supplementary Figures (Figures S1-S7).

Acknowledgements

We thank Jeff Rule and Birgit Crain for help in extracting case data for the RCH cases from Fabric Enterprise, Brent Lutz for project management, and Sandy White for interinstitutional coordination (Fabric Genomics, Oakland, CA). We are grateful to Joe Azure, Josh Grigonis, Jeff Rule, Bjoern Achilles-Stade, Peter Spiro, Gilberto De La Vega, Tara Friedrich (Fabric Genomics, Oakland, CA), Ray Drummond, Richard Littin, Aidan Scarlet, Mike Richdale, and Tim Dawson (NetValue Ltd, Hamilton, New Zealand) for software and architecture development efforts to implement GEM in the Fabric Enterprise platform. We also acknowledge Edward Kiruluta and Marco Falcioni (formerly Fabric Genomics) for useful discussions early in the project.

Abbreviations

AI: Artificial intelligence
BF: Bayes factor
eCDS: Electronic Clinical Support System
CNLP: Clinical Natural Language Processing
CM: Condition match score
GEM: Fabric GEM
HPO: Human Phenotype Ontology
OMIM: Online Mendelian Inheritance in Man
SNV: Single-nucleotide variant
Indel: Insertion-deletion
SV: Structural variant
VAAST: Variant Annotation, Analysis and Search Tool
VVP: VAAST Variant Prioritizer
WES: Whole-exome sequencing
WGS: Whole-genome sequencing

Authors’ contributions

FV, MY, and SC designed the study and analysis strategy. MY developed the GEM algorithms. FV guided requirements, designed UIs, and led the software implementation. BM and EF implemented analysis pipelines. BM, EF, EJH, JM, FV, and MY performed data analysis. AF, AHB, BB, BSL, CAB, CAG, JM, KJ, KÕ, KR, LG, MH, MN, NV, PBA, SC, SP, TW, and VN compiled cases and clinical evidence. PB provided feedback on features and development. MGR and SK sponsored the project and provided helpful discussions and edits of the manuscript. FV and MY wrote the manuscript. All authors reviewed and suggested edits for the final version of the manuscript. The authors read and approved the final manuscript.

Funding

MH, KR, MN, and VN were supported in part by The Center for Rare Childhood Disorders, funded through donations made to the TGen Foundation. AF and BSL were supported by the DFG Cluster of Excellence “Precision Medicine in Chronic Inflammation”. KÕ and SP were supported by Estonian Research Council grants PUT355, PRG471, MOBTP175, and PUTJD827. Sequencing and analysis were partially provided by the Broad Institute of MIT and Harvard Center for Mendelian Genomics (Broad CMG) and were funded by the National Human Genome Research Institute, the National Eye Institute, and the National Heart, Lung and Blood Institute grant UM1 HG008900 and in part by National Human Genome Research Institute grant R01 HG009141. The phenotyping and analysis of patients at Boston Children’s Hospital was funded by MDA602235 from the Muscular Dystrophy Association, and the Tommy Fuss Foundation, and the Yale Center for Mendelian Genomics. Sanger sequencing confirmations utilized the resources of the Boston Children’s Hospital IDDRC Molecular Genetics Core Facility supported by U54HD090255 from the National Institutes of Health.

Availability of data and materials

The datasets supporting the conclusions of this article are included within the article and its additional files. Due to patient privacy, data sharing consent, and HIPAA regulations, our raw data cannot be submitted to publicly available databases. Anonymized outputs from GEM [70], Phevor [15], VAAST [14], and Exomiser [16] for the benchmark dataset cases are tabulated in Additional file 1: Tables S5-S8, and GEM outputs for the validation dataset cases in Additional file 1: Table S10. Condition match scores for hits with gene BF > 0 used for Fig. 6 are tabulated in Additional file 1: Tables S11-S14. GEM, Phevor, and VAAST software implementations for versions used in this analysis are part of the Fabric Enterprise analysis platform and are commercially available [70]. Exomiser source code (version 12.1.0) is available on GitHub [105].

Declarations

Ethics approval and consent to participate

The need for Institutional Review Board Approval at Rady Children’s Hospital for the current study was waived as all data used from this project had previously been generated as part of IRB approved studies and none of the results reported in this manuscript can be used to identify individual patients. The studies from which cases were derived were previously approved by the Institutional Review Boards of Rady Children’s Hospital, Boston Children’s Hospital (IRB protocols 03-08-128R and 10-02-0053), Christian-Albrechts University of Kiel (approval #A-156/02), HudsonAlpha Institute for Biotechnology (Western Institutional Review Board #20130675 and the University of Alabama at Birmingham #X130201001), the Translational Genomics Research Institute (WIRB® Protocol #20120789), and the Research Ethics Committee of the University of Tartu (approvals #263/M-16 and #2871N). These studies were performed in accordance with the Declaration of Helsinki and informed consent was obtained from at least one parent or guardian.

Consent for publication

Not applicable.

Competing interests

FV, EF, JM, and MGR were employees of Fabric Genomics Inc. during the performance of this work and have received stock grants from Fabric Genomics Inc. BM, PB, and MY are consultants to Fabric Genomics Inc. and have received consulting fees and stock grants from Fabric Genomics Inc. The remaining authors declare that they have no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Francisco M. De La Vega, Email: francisco.delavega@stanford.edu

Martin G. Reese, Email: mreese@fabricgenomics.com

Mark Yandell, Email: myandell@genetics.utah.edu

References

  • 1.Church G. Compelling reasons for repairing human germlines. New Engl J Med. 2017;377:1909–1911. doi: 10.1056/NEJMp1710370. [DOI] [PubMed] [Google Scholar]
  • 2.Bamshad MJ, Nickerson DA, Chong JX. Mendelian gene discovery: fast and furious with no end in sight. Am J Hum Genet. 2019;105:448–455. doi: 10.1016/j.ajhg.2019.07.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Online Mendelian Inheritance in Man, OMIM®. McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (Baltimore, MD) (available at https://omim.org/).
  • 4.Wright CF, FitzPatrick DR, Firth HV. Paediatric genomics: diagnosing rare disease in children. Nat Rev Genet. 2018;10:1–16. doi: 10.1038/nrg.2018.12. [DOI] [PubMed] [Google Scholar]
  • 5.Mardis ER. The $1,000 genome, the $100,000 analysis? Genome Med. 2010;2:84. [DOI] [PMC free article] [PubMed]
  • 6.Lavenier D, Cimadomo R, Jodin R. Variant calling parallelization on processor-in-memory architecture. bioRxiv 2020.11.03.366237.
  • 7.Lee S, Min H, Yoon S. Will solid-state drives accelerate your bioinformatics? In-depth profiling, performance analysis and beyond. Brief Bioinform. 2015;17:713–727. doi: 10.1093/bib/bbv073. [DOI] [PubMed] [Google Scholar]
  • 8.Kiely B, Vettam S, Adesman A. Utilization of genetic testing among children with developmental disabilities in the United States. Appl Clin Genet. 2016;9:93–100. doi: 10.2147/TACG.S103975. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Markello TC, Adams DR. Current protocols in human genetics. Curr Protoc Hum Genet. 2013;79:6.13.1–6.13.19. doi: 10.1002/0471142905.hg0613s79. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Dewey FE, Grove ME, Pan C, Goldstein BA, Bernstein JA, Chaib H, Merker JD, Goldfeder RL, Enns GM, David SP, Pakdaman N, Ormond KE, Caleshu C, Kingham K, Klein TE, Whirl-Carrillo M, Sakamoto K, Wheeler MT, Butte AJ, Ford JM, Boxer L, Ioannidis JPA, Yeung AC, Altman RB, Assimes TL, Snyder M, Ashley EA, Quertermous T. Clinical interpretation and implications of whole-genome sequencing. Jama. 2014;311:1035–1045. doi: 10.1001/jama.2014.1717. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Jagadeesh KA, Wenger AM, Berger MJ, Guturu H, Stenson PD, Cooper DN, Bernstein JA, Bejerano G. M-CAP eliminates a majority of variants of uncertain significance in clinical exomes at high sensitivity. Nat Genet. 2016;48:1–8. doi: 10.1038/ng.3703. [DOI] [PubMed] [Google Scholar]
  • 12.Yang Y, Muzny DM, Reid JG, Bainbridge MN, Willis A, Ward PA, Braxton A, Beuten J, Xia F, Niu Z, Hardison M, Person R, Bekheirnia MR, Leduc MS, Kirby A, Pham P, Scull J, Wang M, Ding Y, Plon SE, Lupski JR, Beaudet AL, Gibbs RA, Eng CM. Clinical whole-exome sequencing for the diagnosis of mendelian disorders. New Engl J Med. 2013;369:1502–1511. doi: 10.1056/NEJMoa1306555. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38:e164. doi: 10.1093/nar/gkq603. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Yandell M, Huff C, Hu H, Singleton M, Moore B, Xing J, Jorde LB, Reese MG. A probabilistic disease-gene finder for personal genomes. Genome Res. 2011;21:1529–1542. doi: 10.1101/gr.123158.111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Singleton MV, Guthery SL, Voelkerding KV, Chen K, Kennedy B, Margraf RL, Durtschi J, Eilbeck K, Reese MG, Jorde LB, Huff CD, Yandell M. Phevor combines multiple biomedical ontologies for accurate identification of disease-causing alleles in single individuals and small nuclear families. Am J Hum Genet. 2014;94:599–610. doi: 10.1016/j.ajhg.2014.03.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Robinson P, Kohler S, Oellrich A, Project SMG, Wang K, Mungall C, et al. Improved exome prioritization of disease genes through cross species phenotype comparison. Genome Res. 2013;24. 10.1101/gr.160325.113. [DOI] [PMC free article] [PubMed]
  • 17.Agrawal S, Javed A, Ng PC. Phen-Gen: combining phenotype and genotype to analyze rare disorders. Nat Methods. 2014;11:1–7. doi: 10.1038/nmeth.2801. [DOI] [PubMed] [Google Scholar]
  • 18.Yang H, Robinson PN, Wang K. Phenolyzer: phenotype-based prioritization of candidate genes for human diseases. Nat Methods. 2015;12:841–843. doi: 10.1038/nmeth.3484. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Birgmeier J, Haeussler M, Deisseroth CA, Steinberg EH, Jagadeesh KA, Ratner AJ, Guturu H, Wenger AM, Diekhans ME, Stenson PD, Cooper DN, Ré C, Beggs AH, Bernstein JA, Bejerano G. AMELIE speeds Mendelian diagnosis by matching patient phenotype and genotype to primary literature. Sci Transl Med. 2020;12:eaau9113. doi: 10.1126/scitranslmed.aau9113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Cooper GM, Coe BP, Girirajan S, Rosenfeld JA, Vu TH, Baker C, Williams C, Stalker H, Hamid R, Hannig V, Abdel-Hamid H, Bader P, McCracken E, Niyazov D, Leppig K, Thiese H, Hummel M, Alexander N, Gorski J, Kussmann J, Shashi V, Johnson K, Rehder C, Ballif BC, Shaffer LG, Eichler EE. A copy number variation morbidity map of developmental delay. Nat Genet. 2011;43:838–846. doi: 10.1038/ng.909. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Miller DT, Adam MP, Aradhya S, Biesecker LG, Brothman AR, Carter NP, Church DM, Crolla JA, Eichler EE, Epstein CJ, Faucett WA, Feuk L, Friedman JM, Hamosh A, Jackson L, Kaminsky EB, Kok K, Krantz ID, Kuhn RM, Lee C, Ostell JM, Rosenberg C, Scherer SW, Spinner NB, Stavropoulos DJ, Tepperberg JH, Thorland EC, Vermeesch JR, Waggoner DJ, Watson MS, Martin CL, Ledbetter DH. Consensus statement: chromosomal microarray is a first-tier clinical diagnostic test for individuals with developmental disabilities or congenital anomalies. Am J Hum Genet. 2010;86:749–764. doi: 10.1016/j.ajhg.2010.04.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Clark MM, Stark Z, Farnaes L, Tan TY, White SM, Dimmock D, Kingsmore SF. Meta-analysis of the diagnostic and clinical utility of genome and exome sequencing and chromosomal microarray in children with suspected genetic diseases. Npj Genom Med. 2018;3:1–10. doi: 10.1038/s41525-018-0053-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Yuan H, Shangguan S, Li Z, Luo J, Su J, Yao R, et al. CNV profiles of Chinese pediatric patients with developmental disorders. Genet Med. 2021:1–10. [DOI] [PubMed]
  • 24.Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25:1–13. doi: 10.1038/s41591-018-0300-7. [DOI] [PubMed] [Google Scholar]
  • 25.Dias R, Torkamani A. Artificial intelligence in clinical and genomic diagnostics. Genome Med. 2019;11:70. doi: 10.1186/s13073-019-0689-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Birgmeier J, Deisseroth CA, Hayward LE, Galhardo LMT, Tierno AP, Jagadeesh KA, Stenson PD, Cooper DN, Bernstein JA, Haeussler M, Bejerano G. AVADA: toward automated pathogenic variant evidence retrieval directly from the full-text literature. Genet Med. 2020;22:362–370. doi: 10.1038/s41436-019-0643-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Clark MM, Hildreth A, Batalov S, Ding Y, Chowdhury S, Watkins K, Ellsworth K, Camp B, Kint CI, Yacoubian C, Farnaes L, Bainbridge MN, Beebe C, Braun JJA, Bray M, Carroll J, Cakici JA, Caylor SA, Clarke C, Creed MP, Friedman J, Frith A, Gain R, Gaughran M, George S, Gilmer S, Gleeson J, Gore J, Grunenwald H, Hovey RL, Janes ML, Lin K, McDonagh PD, McBride K, Mulrooney P, Nahas S, Oh D, Oriol A, Puckett L, Rady Z, Reese MG, Ryu J, Salz L, Sanford E, Stewart L, Sweeney N, Tokita M, Kraan LVD, White S, Wigby K, Williams B, Wong T, Wright MS, Yamada C, Schols P, Reynders J, Hall K, Dimmock D, Veeraraghavan N, Defay T, Kingsmore SF. Diagnosis of genetic diseases in seriously ill children by rapid whole-genome sequencing and automated phenotyping and interpretation. Sci Transl Med. 2019;11:eaat6177. doi: 10.1126/scitranslmed.aat6177. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.James KN, Clark MM, Camp B, Kint C, Schols P, Batalov S, Briggs B, Veeraraghavan N, Chowdhury S, Kingsmore SF. Partially automated whole-genome sequencing reanalysis of previously undiagnosed pediatric patients can efficiently yield new diagnoses. Npj Genom Med. 2020;5:33. doi: 10.1038/s41525-020-00140-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Shortliffe EH, Sepúlveda MJ. Clinical Decision Support in the Era of Artificial Intelligence. Jama. 2018;320:2199. doi: 10.1001/jama.2018.17163. [DOI] [PubMed] [Google Scholar]
  • 30.Kingsmore SF, Cakici JA, Clark MM, Gaughran M, Feddock M, Batalov S, et al. A randomized, controlled trial of the analytic and diagnostic performance of singleton and trio, rapid genome and exome sequencing in ill infants. Am J Hum Genet. 2019:1–17. [DOI] [PMC free article] [PubMed]
  • 31.Russell SJ, Norvig P. Artificial Intelligence: A Modern Approach. 4th ed. Hoboken: Pearson; 2020. [Google Scholar]
  • 32.Eilbeck K, Quinlan A, Yandell M. Settling the score: variant prioritization and Mendelian disease. Nat Rev Genet. 2017;18:1–14. doi: 10.1038/nrg.2017.52. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Smedley D, Jacobsen JOB, Jäger M, Köhler S, Holtgrewe M, Schubach M, Siragusa E, Zemojtel T, Buske OJ, Washington NL, Bone WP, Haendel MA, Robinson PN. Next-generation diagnostics and disease-gene discovery with the Exomiser. Nat Protoc. 2015;10:2004–2015. doi: 10.1038/nprot.2015.124. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Robinson PN, Ravanmehr V, Jacobsen JOB, Danis D, Zhang XA, Carmody L, et al. Interpretable clinical genomics with a likelihood ratio paradigm. Medrxiv. 2020:2020.01.25.19014803. [DOI] [PMC free article] [PubMed]
  • 35.Farnaes L, Hildreth A, Sweeney NM, Clark MM, Chowdhury S, Nahas S, Cakici JA, Benson W, Kaplan RH, Kronick R, Bainbridge MN, Friedman J, Gold JJ, Ding Y, Veeraraghavan N, Dimmock D, Kingsmore SF. Rapid whole-genome sequencing decreases infant morbidity and cost of hospitalization. Npj Genom Med. 2018;3:1–8. doi: 10.1038/s41525-018-0049-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Farnaes L, Nahas SA, Chowdhury S, Nelson J, Batalov S, Dimmock DM, Kingsmore SF, on behalf of the R. Investigators Rapid whole-genome sequencing identifies a novel GABRA1 variant associated with West syndrome. Mol Case Stud. 2017;3:a001776. doi: 10.1101/mcs.a001776. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Hildreth A, Wigby K, Chowdhury S, Nahas S, Barea J, Ordonez P, Batalov S, Dimmock D, Kingsmore S, on behalf of the R. Investigators Rapid whole-genome sequencing identifies a novel homozygous NPC1 variant associated with Niemann–Pick type C1 disease in a 7-week-old male with cholestasis. Mol Case Stud. 2017;3:a001966. doi: 10.1101/mcs.a001966. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Sanford E, Watkins K, Nahas S, Gottschalk M, Coufal NG, Farnaes L, Dimmock D, Kingsmore SF, on behalf of the R. Investigators Rapid whole-genome sequencing identifies a novel AIRE variant associated with autoimmune polyendocrine syndrome type 1. Mol Case Stud. 2018;4:a002485. doi: 10.1101/mcs.a002485. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Sanford E, Farnaes L, Batalov S, Bainbridge M, Laubach S, Worthen HM, Tokita M, Kingsmore SF, Bradley J. Concomitant diagnosis of immune deficiency and Pseudomonas sepsis in a 19 month old with ecthyma gangrenosum by host whole-genome sequencing. Mol Case Stud. 2018;4:a003244. doi: 10.1101/mcs.a003244. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Briggs B, James KN, Chowdhury S, Thornburg C, Farnaes L, Dimmock D, Kingsmore SF, on behalf of the R. Investigators Novel Factor XIII variant identified through whole-genome sequencing in a child with intracranial hemorrhage. Mol Case Stud. 2018;4:a003525. doi: 10.1101/mcs.a003525. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Chen X, Schulz-Trieglaff O, Shaw R, Barnes B, Schlesinger F, Källberg M, Cox AJ, Kruglyak S, Saunders CT. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics. 2016;32:1220–1222. doi: 10.1093/bioinformatics/btv710. [DOI] [PubMed] [Google Scholar]
  • 42.Abyzov A, Urban AE, Snyder M, Gerstein M. CNVnator: An approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res. 2011;21:974–984. doi: 10.1101/gr.114876.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Ravenscroft G, Miyatake S, Lehtokari V-L, Todd EJ, Vornanen P, Yau KS, Hayashi YK, Miyake N, Tsurusaki Y, Doi H, Saitsu H, Osaka H, Yamashita S, Ohya T, Sakamoto Y, Koshimizu E, Imamura S, Yamashita M, Ogata K, Shiina M, Bryson-Richardson RJ, Vaz R, Ceyhan O, Brownstein CA, Swanson LC, Monnot S, Romero NB, Amthor H, Kresoje N, Sivadorai P, Kiraly-Borri C, Haliloglu G, Talim B, Orhan D, Kale G, Charles AK, Fabian VA, Davis MR, Lammens M, Sewry CA, Manzur A, Muntoni F, Clarke NF, North KN, Bertini E, Nevo Y, Willichowski E, Silberg IE, Topaloglu H, Beggs AH, Allcock RJN, Nishino I, Wallgren-Pettersson C, Matsumoto N, Laing NG. Mutations in KLHL40 are a frequent cause of severe autosomal-recessive nemaline myopathy. Am J Hum Genet. 2013;93:6–18. doi: 10.1016/j.ajhg.2013.05.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Konersman CG, Freyermuth F, Winder TL, Lawlor MW, Lagier-Tourenne C, Patel SB. Novel autosomal dominant TNNT1 mutation causing nemaline myopathy. Mol Genet Genom Med. 2017;5:678–691. doi: 10.1002/mgg3.325. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Lehtokari V, Kiiski K, Sandaradura SA, Laporte J, Repo P, Frey JA, Donner K, Marttila M, Saunders C, Barth PG, den Dunnen JT, Beggs AH, Clarke NF, North KN, Laing NG, Romero NB, Winder TL, Pelin K, Wallgren-Pettersson C. Mutation update: the spectra of nebulin variants and associated myopathies. Hum Mutat. 2014;35:1418–1426. doi: 10.1002/humu.22693. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Laing NG, Dye DE, Wallgren-Pettersson C, Richard G, Monnier N, Lillis S, Winder TL, Lochmüller H, Graziano C, Mitrani-Rosenbaum S, Twomey D, Sparrow JC, Beggs AH, Nowak KJ. Mutations and polymorphisms of the skeletal muscle α-actin gene (ACTA1). Hum Mutat. 2009;30:1267–1277. doi: 10.1002/humu.21059. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Smedemark-Margulies N, Brownstein CA, Vargas S, Tembulkar SK, Towne MC, Shi J, Gonzalez-Cuevas E, Liu KX, Bilguvar K, Kleiman RJ, Han M-J, Torres A, Berry GT, Yu TW, Beggs AH, Agrawal PB, Gonzalez-Heydrich J. A novel de novo mutation in ATP1A3 and childhood-onset schizophrenia. Mol Case Stud. 2016;2:a001008. doi: 10.1101/mcs.a001008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Yuen M, Sandaradura SA, Dowling JJ, Kostyukova AS, Moroz N, Quinlan KG, Lehtokari V-L, Ravenscroft G, Todd EJ, Ceyhan-Birsoy O, Gokhin DS, Maluenda J, Lek M, Nolent F, Pappas CT, Novak SM, D’Amico A, Malfatti E, Thomas BP, Gabriel SB, Gupta N, Daly MJ, Ilkovski B, Houweling PJ, Davidson AE, Swanson LC, Brownstein CA, Gupta VA, Medne L, Shannon P, Martin N, Bick DP, Flisberg A, Holmberg E, den Bergh PV, Lapunzina P, Waddell LB, Sloboda DD, Bertini E, Chitayat D, Telfer WR, Laquerrière A, Gregorio CC, Ottenheijm CAC, Bönnemann CG, Pelin K, Beggs AH, Hayashi YK, Romero NB, Laing NG, Nishino I, Wallgren-Pettersson C, Melki J, Fowler VM, MacArthur DG, North KN, Clarke NF. Leiomodin-3 dysfunction results in thin filament disorganization and nemaline myopathy. J Clin Invest. 2015;125:456–457. doi: 10.1172/JCI80057. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.D. D. D. (DDD) Study. Zweier M, Begemann A, McWalter K, Cho MT, Abela L, Banka S, Behring B, Berger A, Brown CW, Carneiro M, Chen J, Cooper GM, Finnila CR, Sacoto MJG, Henderson A, Hüffmeier U, Joset P, Kerr B, Lesca G, Leszinski GS, McDermott JH, Meltzer MR, Monaghan KG, Mostafavi R, Õunap K, Plecko B, Powis Z, Purcarin G, Reimand T, Riedhammer KM, Schreiber JM, Sirsi D, Wierenga KJ, Wojcik MH, Papuc SM, Steindl K, Sticht H, Rauch A. Spatially clustering de novo variants in CYFIP2, encoding the cytoplasmic FMRP interacting protein 2, cause intellectual disability and seizures. Eur J Hum Genet. 2019;27:747–759. doi: 10.1038/s41431-018-0331-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Zeissig S, Petersen B-S, Tomczak M, Melum E, Huc-Claustre E, Dougan SK, Laerdahl JK, Stade B, Forster M, Schreiber S, Weir D, Leichtner AM, Franke A, Blumberg RS. Early-onset Crohn’s disease and autoimmunity associated with a variant in CTLA-4. Gut. 2015;64:1889. doi: 10.1136/gutjnl-2014-308541. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Zeissig Y, Petersen B-S, Milutinovic S, Bosse E, Mayr G, Peuker K, Hartwig J, Keller A, Kohl M, Laass MW, Billmann-Born S, Brandau H, Feller AC, Röcken C, Schrappe M, Rosenstiel P, Reed JC, Schreiber S, Franke A, Zeissig S. XIAP variants in male Crohn’s disease. Gut. 2015;64:66. doi: 10.1136/gutjnl-2013-306520. [DOI] [PubMed] [Google Scholar]
  • 52.Schubert D, Bode C, Kenefeck R, Hou TZ, Wing JB, Kennedy A, Bulashevska A, Petersen B-S, Schäffer AA, Grüning BA, Unger S, Frede N, Baumann U, Witte T, Schmidt RE, Dueckers G, Niehues T, Seneviratne S, Kanariou M, Speckmann C, Ehl S, Rensing-Ehl A, Warnatz K, Rakhmanov M, Thimme R, Hasselblatt P, Emmerich F, Cathomen T, Backofen R, Fisch P, Seidl M, May A, Schmitt-Graeff A, Ikemizu S, Salzer U, Franke A, Sakaguchi S, Walker LSK, Sansom DM, Grimbacher B. Autosomal dominant immune dysregulation syndrome in humans with CTLA4 mutations. Nat Med. 2014;20:1410–1416. doi: 10.1038/nm.3746. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Müller T, Rasool I, Heinz-Erian P, Mildenberger E, Hülstrunk C, Müller A, Michaud L, Koot BGP, Ballauff A, Vodopiutz J, Rosipal S, Petersen B-S, Franke A, Fuchs I, Witt H, Zoller H, Janecke AR, Visweswariah SS. Congenital secretory diarrhoea caused by activating germline mutations in GUCY2C. Gut. 2016;65:1306. doi: 10.1136/gutjnl-2015-309441. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Jung ES, Petersen B-S, Mayr G, Cheon JH, Kang Y, Lee SJ, Che X, Kim WH, Kim S, Schreiber S, Franke A, Koh H. Compound heterozygous mutations in IL10RA combined with a complement factor properdin mutation in infantile-onset inflammatory bowel disease. Eur J Gastroen Hepat. 2018;30:1491–1496. doi: 10.1097/MEG.0000000000001247. [DOI] [PubMed] [Google Scholar]
  • 55.Janecke AR, Heinz-Erian P, Yin J, Petersen B-S, Franke A, Lechner S, Fuchs I, Melancon S, Uhlig HH, Travis S, Marinier E, Perisic V, Ristic N, Gerner P, Booth IW, Wedenoja S, Baumgartner N, Vodopiutz J, Frechette-Duval M-C, Lafollie JD, Persad R, Warner N, Tse CM, Sud K, Zachos NC, Sarker R, Zhu X, Muise AM, Zimmer K-P, Witt H, Zoller H, Donowitz M, Müller T. Reduced sodium/proton exchanger NHE3 activity causes congenital sodium diarrhea. Hum Mol Genet. 2015;24:6614–6623. doi: 10.1093/hmg/ddv367. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Amendola LM, Berg JS, Horowitz CR, Angelo F, Bensen JT, Biesecker BB, Biesecker LG, Cooper GM, East K, Filipski K, Fullerton SM, Gelb BD, Goddard KAB, Hailu B, Hart R, Hassmiller-Lich K, Joseph G, Kenny EE, Koenig BA, Knight S, Kwok P-Y, Lewis KL, McGuire AL, Norton ME, Ou J, Parsons DW, Powell BC, Risch N, Robinson M, Rini C, Scollon S, Slavotinek AM, Veenstra DL, Wasserstein MP, Wilfond BS, Hindorff LA, Plon SE, Jarvik GP, C. consortium The Clinical Sequencing Evidence-Generating Research Consortium: Integrating Genomic Sequencing in Diverse and Medically Underserved Populations. Am J Hum Genet. 2018;103:319–327. doi: 10.1016/j.ajhg.2018.08.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Thompson ML, Finnila CR, Bowling KM, Brothers KB, Neu MB, Amaral MD, Hiatt SM, East KM, Gray DE, Lawlor JMJ, Kelley WV, Lose EJ, Rich CA, Simmons S, Levy SE, Myers RM, Barsh GS, Bebin EM, Cooper GM. Genomic sequencing identifies secondary findings in a cohort of parent study participants. Genet Med. 2018;20:1635–1643. doi: 10.1038/gim.2018.53. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.East KM, Kelley WV, Cannon A, Cochran ME, Moss IP, May T, Nakano-Okuno M, Sodeke SO, Edberg JC, Cimino JJ, Fouad M, Curry WA, Hurst ACE, Bowling KM, Thompson ML, Bebin EM, Johnson RD, Acemgil A, Crossman DK, Finnila CR, Gray DE, Greve V, Hardy S, Hiatt SM, Latner DR, Lawlor JMJ, Miskell EL, Narmore W, Schach JH, Cooper GM, Might M, Barsh GS, Korf BR. A state-based approach to genomics for rare disease and population screening. Genet Med. 2021;23:777–781. doi: 10.1038/s41436-020-01034-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Bowling KM, Thompson ML, Amaral MD, Finnila CR, Hiatt SM, Engel KL, Cochran JN, Brothers KB, East KM, Gray DE, Kelley WV, Lamb NE, Lose EJ, Rich CA, Simmons S, Whittle JS, Weaver BT, Nesmith AS, Myers RM, Barsh GS, Bebin EM, Cooper GM. Genomic diagnosis for children with intellectual disability and/or developmental delay. Genome Med. 2017;9:43. doi: 10.1186/s13073-017-0433-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Johnson BV, Kumar R, Oishi S, Alexander S, Kasherman M, Vega MS, Ivancevic A, Gardner A, Domingo D, Corbett M, Parnell E, Yoon S, Oh T, Lines M, Lefroy H, Kini U, Allen MV, Grønborg S, Mercier S, Küry S, Bézieau S, Pasquier L, Raynaud M, Afenjar A, de Villemeur TB, Keren B, Désir J, Maldergem LV, Marangoni M, Dikow N, Koolen DA, VanHasselt PM, Weiss M, Zwijnenburg P, Sa J, Reis CF, López-Otín C, Santiago-Fernández O, Fernández-Jaén A, Rauch A, Steindl K, Joset P, Goldstein A, Madan-Khetarpal S, Infante E, Zackai E, Mcdougall C, Narayanan V, Ramsey K, Mercimek-Andrews S, Pena L, Shashi V, Network UD, Pena L, Shashi V, Schoch K, Sullivan JA, Acosta MT, Adams DR, Aday A, Alejandro ME, Allard P, Ashley EA, Azamian MS, Bacino CA, Bademci G, Baker E, Balasubramanyam A, Baldridge D, Barbouth D, Batzli GF, Beggs AH, Bellen HJ, Bernstein JA, Berry GT, Bican A, Bick DP, Birch CL, Bivona S, Bonnenmann C, Bonner D, Boone BE, Bostwick BL, Briere LC, Brokamp E, Brown DM, Brush M, Burke EA, Burrage LC, Butte MJ, Carrasquillo O, Chang TCP, Chao H-T, Clark GD, Coakley TR, Cobban LA, Cogan JD, Cole FS, Colley HA, Cooper CM, Cope H, Craigen WJ, D’Souza P, Dasari S, Davids M, Davidson JM, Dayal JG, Dell’Angelica EC, Dhar SU, Dorrani N, Dorset DC, Douine ED, Draper DD, Dries AM, Duncan L, Eckstein DJ, Emrick LT, Eng CM, Enns GM, Esteves C, Estwick T, Fernandez L, Ferreira C, Fieg EL, Fisher PG, Fogel BL, Forghani I, Friedman ND, Gahl WA, Godfrey RA, Goldman AM, Goldstein DB, Gourdine J-PF, Grajewski A, Groden CA, Gropman AL, Haendel M, Hamid R, Hanchard NA, High F, Holm IA, Hom J, Huang A, Huang Y, Isasi R, Jamal F, Jiang Y, Johnston JM, Jones AL, Karaviti L, Kelley EG, Koeller DM, Kohane IS, Kohler JN, Krakow D, Krasnewich DM, Korrick S, Koziura M, Krier JB, Kyle JE, Lalani SR, Lam B, Lanpher BC, Lanza IR, Lau CC, Lazar J, LeBlanc K, Lee BH, Lee H, Levitt R, Levy SE, Lewis RA, Lincoln SA, Liu P, Liu XZ, Loo SK, Loscalzo J, Maas RL, Macnamara EF, MacRae CA, Maduro VV, Majcherska 
MM, Malicdan MCV, Mamounas LA, Manolio TA, Markello TC, Marom R, Martin MG, Martínez-Agosto JA, Marwaha S, May T, McCauley J, McConkie-Rosell A, McCormack CE, McCray AT, Merker JD, Metz TO, Might M, Morava-Kozicz E, Moretti PM, Morimoto M, Mulvihill JJ, Murdock DR, Nath A, Nelson SF, Newberry JS, Newman JH, Nicholas SK, Novacic D, Oglesbee D, Orengo JP, Pak S, Pallais JC, Palmer CGS, Papp JC, Parker NH, Phillips JA, Posey JE, Postlethwait JH, Potocki L, Pusey BN, Renteri G, Reuter CM, Rives L, Robertson AK, Rodan LH, Rosenfeld JA, Rowley RK, Sacco R, Sampson JB, Samson SL, Saporta M, Schaechter J, Schedl T, Scott DA, Shakachite L, Sharma P, Shields K, Shin J, Signer R, Sillari CH, Silverman EK, Sinsheimer JS, Smith KS, Solnica-Krezel L, Spillmann RC, Stoler JM, Stong N, Sweetser DA, Tamburro CP, Tan QK-G, Tekin M, Telischi F, Thorson W, Tifft CJ, Toro C, Tran AA, Urv TK, Vogel TP, Waggott DM, Wahl CE, Walley NM, Walsh CA, Walker M, Wambach J, Wan J, Wang L, Wangler MF, Ward PA, Waters KM, Webb-Robertson B-JM, Wegner D, Westerfield M, Wheeler MT, Wise AL, Wolfe LA, Woods JD, Worthey EA, Yamamoto S, Yang J, Yoon AJ, Yu G, Zastrow DB, Zhao C, Zuchner S, Gahl W, Schoch K, Sullivan JA, e Vairo FP, Pichurin PN, Ewing SA, Barnett SS, Klee EW, Perry MS, Koenig MK, Keegan CE, Schuette JL, Asher S, Perilla-Young Y, Smith LD, Rosenfeld JA, Bhoj E, Kaplan P, Li D, Oegema R, van Binsbergen E, van der Zwaag B, Smeland MF, Cutcutache I, Page M, Armstrong M, Lin AE, Steeves MA, den Hollander N, Hoffer MJV, Reijnders MRF, Demirdas S, Koboldt DC, Bartholomew D, Mosher TM, Hickey SE, Shieh C, Sanchez-Lara PA, Graham JM, Tezcan K, Schaefer GB, Danylchuk NR, Asamoah A, Jackson KE, Yachelevich N, Au M, Pérez-Jurado LA, Kleefstra T, Penzes P, Wood SA, Burne T, Pierson TM, Piper M, Gécz J, Jolly LA. Partial loss of USP9X function leads to a male neurodevelopmental and behavioral disorder converging on transforming growth factor β signaling. Biol Psychiat. 2020;87:100–112. 
doi: 10.1016/j.biopsych.2019.05.028.
  • 61.Jepsen WM, Ramsey K, Szelinger S, Llaci L, Balak C, Belnap N, Bilagody C, Both MD, Gupta R, Naymik M, Pandey R, Piras IS, Sanchez-Castillo M, Rangasamy S, Narayanan V, Huentelman MJ. Two additional males with X-linked, syndromic mental retardation carry de novo mutations in HNRNPH2. Clin Genet. 2019;96:183–185. doi: 10.1111/cge.13580.
  • 62.Puusepp S, Reinson K, Pajusalu S, Murumets Ü, Õiglane-Shlik E, Rein R, Talvik I, Rodenburg RJ, Õunap K. Effectiveness of whole exome sequencing in unsolved patients with a clinical suspicion of a mitochondrial disorder in Estonia. Mol Genet Metab Rep. 2018;15:80–89. doi: 10.1016/j.ymgmr.2018.03.004.
  • 63.Zimoń M, Baets J, Almeida-Souza L, Vriendt ED, Nikodinovic J, Parman Y, Battaloǧlu E, Matur Z, Guergueltcheva V, Tournev I, Auer-Grumbach M, Rijk PD, Petersen B-S, Müller T, Fransen E, Damme PV, Löscher WN, Barišić N, Mitrovic Z, Previtali SC, Topaloǧlu H, Bernert G, Beleza-Meireles A, Todorovic S, Savic-Pavicevic D, Ishpekova B, Lechner S, Peeters K, Ooms T, Hahn AF, Züchner S, Timmerman V, Dijck PV, Rasic VM, Janecke AR, Jonghe PD, Jordanova A. Loss-of-function mutations in HINT1 cause axonal neuropathy with neuromyotonia. Nat Genet. 2012;44:1080–1083. doi: 10.1038/ng.2406.
  • 64.Pravata VM, Gundogdu M, Bartual SG, Ferenbach AT, Stavridis M, Õunap K, Pajusalu S, Tur R, V.A. Wojcik MH, Aalten DMF. A missense mutation in the catalytic domain of O-GlcNAc transferase links perturbations in protein O-GlcNAcylation to X-linked intellectual disability. FEBS Lett. 2020;594:717–727. doi: 10.1002/1873-3468.13640.
  • 65.McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GRS, Thormann A, Flicek P, Cunningham F. The Ensembl variant effect predictor. Genome Biol. 2016;17:1–14. doi: 10.1186/s13059-016-0974-4.
  • 66.Flygare S, Hernandez EJ, Phan L, Moore B, Li M, Fejes A, Hu H, Eilbeck K, Huff C, Jorde L, Reese MG, Yandell M. The VAAST Variant Prioritizer (VVP): ultrafast, easy to use whole genome variant prioritization tool. BMC Bioinformatics. 2018;19:57. doi: 10.1186/s12859-018-2056-y.
  • 67.Landrum MJ, Lee JM, Benson M, Brown G, Chao C, Chitipiralla S, Gu B, Hart J, Hoffman D, Hoover J, Jang W, Katz K, Ovetsky M, Riley G, Sethi A, Tully R, Villamarin-Salomon R, Rubinstein W, Maglott DR. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 2016;44:D862–D868. doi: 10.1093/nar/gkv1222.
  • 68.Amberger JS, Bocchini CA, Scott AF, Hamosh A. OMIM.org: leveraging knowledge across phenotype–gene relationships. Nucleic Acids Res. 2018;47:gky1151. doi: 10.1093/nar/gky1151.
  • 69.Groza T, Köhler S, Moldenhauer D, Vasilevsky N, Baynam G, Zemojtel T, Schriml LM, Kibbe WA, Schofield PN, Beck T, Vasant D, Brookes AJ, Zankl A, Washington NL, Mungall CJ, Lewis SE, Haendel MA, Parkinson H, Robinson PN. The Human Phenotype Ontology: semantic unification of common and rare disease. Am J Hum Genet. 2015;97:111–124. doi: 10.1016/j.ajhg.2015.05.020.
  • 70.Fabric GEM (available at https://fabricgenomics.com/fabric-gem). Last accessed 22 Aug 2021.
  • 71.1000 Genomes Project Consortium, Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56–65. doi: 10.1038/nature11632.
  • 72.Sasani TA, Pedersen BS, Gao Z, Baird L, Przeworski M, Jorde LB, Quinlan AR. Large, three-generation human families reveal post-zygotic mosaicism and variability in germline mutation accumulation. eLife. 2019;8:e46922. doi: 10.7554/eLife.46922.
  • 73.Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, Collins RL, Laricchia KM, Ganna A, Birnbaum DP, Gauthier LD, Brand H, Solomonson M, Watts NA, Rhodes D, Singer-Berk M, England EM, Seaby EG, Kosmicki JA, Walters RK, Tashman K, Farjoun Y, Banks E, Poterba T, Wang A, Seed C, Whiffin N, Chong JX, Samocha KE, Pierce-Hoffman E, Zappala Z, O’Donnell-Luria AH, Minikel EV, Weisburd B, Lek M, Ware JS, Vittal C, Armean IM, Bergelson L, Cibulskis K, Connolly KM, Covarrubias M, Donnelly S, Ferriera S, Gabriel S, Gentry J, Gupta N, Jeandet T, Kaplan D, Llanwarne C, Munshi R, Novod S, Petrillo N, Roazen D, Ruano-Rubio V, Saltzman A, Schleicher M, Soto J, Tibbetts K, Tolonen C, Wade G, Talkowski ME, Salinas CAA, Ahmad T, Albert CM, Ardissino D, Atzmon G, Barnard J, Beaugerie L, Benjamin EJ, Boehnke M, Bonnycastle LL, Bottinger EP, Bowden DW, Bown MJ, Chambers JC, Chan JC, Chasman D, Cho J, Chung MK, Cohen B, Correa A, Dabelea D, Daly MJ, Darbar D, Duggirala R, Dupuis J, Ellinor PT, Elosua R, Erdmann J, Esko T, Färkkilä M, Florez J, Franke A, Getz G, Glaser B, Glatt SJ, Goldstein D, Gonzalez C, Groop L, Haiman C, Hanis C, Harms M, Hiltunen M, Holi MM, Hultman CM, Kallela M, Kaprio J, Kathiresan S, Kim B-J, Kim YJ, Kirov G, Kooner J, Koskinen S, Krumholz HM, Kugathasan S, Kwak SH, Laakso M, Lehtimäki T, Loos RJF, Lubitz SA, Ma RCW, MacArthur DG, Marrugat J, Mattila KM, McCarroll S, McCarthy MI, McGovern D, McPherson R, Meigs JB, Melander O, Metspalu A, Neale BM, Nilsson PM, O’Donovan MC, Ongur D, Orozco L, Owen MJ, Palmer CNA, Palotie A, Park KS, Pato C, Pulver AE, Rahman N, Remes AM, Rioux JD, Ripatti S, Roden DM, Saleheen D, Salomaa V, Samani NJ, Scharf J, Schunkert H, Shoemaker MB, Sklar P, Soininen H, Sokol H, Spector T, Sullivan PF, Suvisaari J, Tai ES, Teo YY, Tiinamaija T, Tsuang M, Turner D, Tusie-Luna T, Vartiainen E, Ware JS, Watkins H, Weersma RK, Wessman M, Wilson JG, Xavier RJ, Neale BM, Daly MJ, MacArthur DG. 
The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581:434–443. doi: 10.1038/s41586-020-2308-7.
  • 74.Hoijtink H, Mulder J, van Lissa C, Gu X. A tutorial on testing hypotheses using the Bayes factor. Psychol Methods. 2019. doi: 10.1037/met0000201.
  • 75.Rabiner LR. A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE. 1989;77:257–286. doi: 10.1109/5.18626.
  • 76.Köster J, Rahmann S. Snakemake—a scalable bioinformatics workflow engine. Bioinformatics. 2012;28:2520–2522. doi: 10.1093/bioinformatics/bts480.
  • 77.Turro E, Astle WJ, Megy K, Gräf S, Greene D, Shamardina O, Allen HL, Sanchis-Juan A, Frontini M, Thys C, Stephens J, Mapeta R, Burren OS, Downes K, Haimel M, Tuna S, Deevi SVV, Aitman TJ, Bennett DL, Calleja P, Carss K, Caulfield MJ, Chinnery PF, Dixon PH, Gale DP, James R, Koziell A, Laffan MA, Levine AP, Maher ER, Markus HS, Morales J, Morrell NW, Mumford AD, Ormondroyd E, Rankin S, Rendon A, Richardson S, Roberts I, Roy NBA, Saleem MA, Smith KGC, Stark H, Tan RYY, Themistocleous AC, Thrasher AJ, Watkins H, Webster AR, Wilkins MR, Williamson C, Whitworth J, Humphray S, Bentley DR, Abbs S, Abulhoul L, Adlard J, Ahmed M, Aitman TJ, Alachkar H, Allsup DJ, Almeida-King J, Ancliff P, Antrobus R, Armstrong R, Arno G, Ashford S, Astle WJ, Attwood A, Aurora P, Babbs C, Bacchelli C, Bakchoul T, Banka S, Bariana T, Barwell J, Batista J, Baxendale HE, Beales PL, Bennett DL, Bentley DR, Bierzynska A, Biss T, Bitner-Glindzicz MAK, Black GC, Bleda M, Blesneac I, Bockenhauer D, Bogaard H, Bourne CJ, Boyce S, Bradley JR, Bragin E, Breen G, Brennan P, Brewer C, Brown M, Browning AC, Browning MJ, Buchan RJ, Buckland MS, Bueser T, Diz CB, Burn J, Burns SO, Burren OS, Burrows N, Calleja P, Campbell C, Carr-White G, Carss K, Casey R, Caulfield MJ, Chambers J, Chambers J, Chan MMY, Cheah C, Cheng F, Chinnery PF, Chitre M, Christian MT, Church C, Clayton-Smith J, Cleary M, Brod NC, Coghlan G, Colby E, Cole TRP, Collins J, Collins PW, Colombo C, Compton CJ, Condliffe R, Cook S, Cook HT, Cooper N, Corris PA, Furnell A, Cunningham F, Curry NS, Cutler AJ, Daniels MJ, Dattani M, Daugherty LC, Davis J, Soyza AD, Deevi SVV, Dent T, Deshpande C, Dewhurst EF, Dixon PH, Douzgou S, Downes K, Drazyk AM, Drewe E, Duarte D, Dutt T, Edgar JDM, Edwards K, Egner W, Ekani MN, Elliott P, Erber WN, Erwood M, Estiu MC, Evans DG, Evans G, Everington T, Eyries M, Fassihi H, Favier R, Findhammer J, Fletcher D, Flinter FA, Floto RA, Fowler T, Fox J, Frary AJ, French CE, Freson K, Frontini M, Gale DP, 
Gall H, Ganesan V, Gattens M, Geoghegan C, Gerighty TSA, Gharavi AG, Ghio S, Ghofrani H-A, Gibbs JSR, Gibson K, Gilmour KC, Girerd B, Gleadall NS, Goddard S, Goldstein DB, Gomez K, Gordins P, Gosal D, Gräf S, Graham J, Grassi L, Greene D, Greenhalgh L, Greinacher A, Gresele P, Griffiths P, Grigoriadou S, Grocock RJ, Grozeva D, Gurnell M, Hackett S, Hadinnapola C, Hague WM, Hague R, Haimel M, Hall M, Hanson HL, Haque E, Harkness K, Harper AR, Harris CL, Hart D, Hassan A, Hayman G, Henderson A, Herwadkar A, Hoffman J, Holden S, Horvath R, Houlden H, Houweling AC, Howard LS, Hu F, Hudson G, Hughes J, Huissoon AP, Humbert M, Humphray S, Hunter S, Hurles M, Irving M, Izatt L, James R, Johnson SA, Jolles S, Jolley J, Josifova D, Jurkute N, Karten T, Karten J, Kasanicki MA, Kazkaz H, Kazmi R, Kelleher P, Kelly AM, Kelsall W, Kempster C, Kiely DG, Kingston N, Klima R, Koelling N, Kostadima M, Kovacs G, Koziell A, Kreuzhuber R, Kuijpers TW, Kumar A, Kumararatne D, Kurian MA, Laffan MA, Lalloo F, Lambert M, Allen HL, Lawrie A, Layton DM, Lench N, Lentaigne C, Lester T, Levine AP, Linger R, Longhurst H, Lorenzo LE, Louka E, Lyons PA, Machado RD, Ross RVM, Madan B, Maher ER, Maimaris J, Malka S, Mangles S, Mapeta R, Marchbank KJ, Marks S, Markus HS, Marschall H-U, Marshall A, Martin J, Mathias M, Matthews E, Maxwell H, McAlinden P, McCarthy MI, McKinney H, McMahon A, Meacham S, Mead AJ, Castello IM, Megy K, Mehta SG, Michaelides M, Millar C, Mohammed SN, Moledina S, Montani D, Moore AT, Morales J, Morrell NW, Mozere M, Muir KW, Mumford AD, Nemeth AH, Newman WG, Newnham M, Noorani S, Nurden P, O’Sullivan J, Obaji S, Odhams C, Okoli S, Olschewski A, Olschewski H, Ong KR, Oram SH, Ormondroyd E, Ouwehand WH, Palles C, Papadia S, Park S-M, Parry D, Patel S, Paterson J, Peacock A, Pearce SH, Peden J, Peerlinck K, Penkett CJ, Pepke-Zaba J, Petersen R, Pilkington C, Poole KES, Prathalingam R, Psaila B, Pyle A, Quinton R, Rahman S, Rankin S, Rao A, Raymond FL, Rayner-Matthews PJ, Rees 
C, Rendon A, Renton T, Rhodes CJ, Rice ASC, Richardson S, Richter A, Robert L, Roberts I, Rogers A, Rose SJ, Ross-Russell R, Roughley C, Roy NBA, Ruddy DM, Sadeghi-Alavijeh O, Saleem MA, Samani N, Samarghitean C, Sanchis-Juan A, Sargur RB, Sarkany RN, Satchell S, Savic S, Sayer JA, Sayer G, Scelsi L, Schaefer AM, Schulman S, Scott R, Scully M, Searle C, Seeger W, Sen A, Sewell WAC, Seyres D, Shah N, Shamardina O, Shapiro SE, Shaw AC, Short PJ, Sibson K, Side L, Simeoni I, Simpson MA, Sims MC, Sivapalaratnam S, Smedley D, Smith KR, Smith KGC, Snape K, Soranzo N, Soubrier F, Southgate L, Spasic-Boskovic O, Staines S, Staples E, Stark H, Stephens J, Steward C, Stirrups KE, Stuckey A, Suntharalingam J, Swietlik EM, Syrris P, Tait RC, Talks K, Tan RYY, Tate K, Taylor JM, Taylor JC, Thaventhiran JE, Themistocleous AC, Thomas E, Thomas D, Thomas MJ, Thomas P, Thomson K, Thrasher AJ, Threadgold G, Thys C, Tilly T, Tischkowitz M, Titterton C, Todd JA, Toh C-H, Tolhuis B, Tomlinson IP, Toshner M, Traylor M, Treacy C, Treadaway P, Trembath R, Tuna S, Turek W, Turro E, Twiss P, Vale T, Geet CV, van Zuydam N, Vandekuilen M, Vandersteen AM, Vazquez-Lopez M, von Ziegenweidt J, Noordegraaf AV, Wagner A, Waisfisz Q, Walker SM, Walker N, Walter K, Ware JS, Watkins H, Watt C, Webster AR, Wedderburn L, Wei W, Welch SB, Wessels J, Westbury SK, Westwood J-P, Wharton J, Whitehorn D, Whitworth J, Wilkie AOM, Wilkins MR, Williamson C, Wilson BT, Wong EKS, Wood N, Wood Y, Woods CG, Woodward ER, Wort SJ, Worth A, Wright M, Yates K, Yong PFK, Young T, Yu P, Yu-Wai-Man P, Zlamalova E, Kingston N, Walker N, Bradley JR, Ashford S, Penkett CJ, Freson K, Stirrups KE, Raymond FL, Ouwehand WH. Whole-genome sequencing of patients with rare diseases in a national health system. Nature. 2020;583:96–102. doi: 10.1038/s41586-020-2434-2.
  • 78.Auton A, Abecasis GR, Altshuler DM, Durbin RM, Bentley DR, Chakravarti A, Clark AG, Donnelly P, Eichler EE, Flicek P, Gabriel SB, Gibbs RA, Green ED, Hurles ME, Knoppers BM, Korbel JO, Lander ES, Lee C, Lehrach H, Mardis ER, Marth GT, McVean GA, Nickerson DA, Schmidt JP, Sherry ST, Wang J, Wilson RK, Boerwinkle E, Doddapaneni H, Han Y, Korchina V, Kovar C, Lee S, Muzny D, Reid JG, Zhu Y, Chang Y, Feng Q, Fang X, Guo X, Jian M, Jiang H, Jin X, Lan T, Li G, Li J, Li Y, Liu S, Liu X, Lu Y, Ma X, Tang M, Wang B, Wang G, Wu H, Wu R, Xu X, Yin Y, Zhang D, Zhang W, Zhao J, Zhao M, Zheng X, Gupta N, Gharani N, Toji LH, Gerry NP, Resch AM, Barker J, Clarke L, Gil L, Hunt SE, Kelman G, Kulesha E, Leinonen R, McLaren WM, Radhakrishnan R, Roa A, Smirnov D, Smith RE, Streeter I, Thormann A, Toneva I, Vaughan B, Zheng-Bradley X, Grocock R, Humphray S, James T, Kingsbury Z, Sudbrak R, Albrecht MW, Amstislavskiy VS, Borodina TA, Lienhard M, Mertes F, Sultan M, Timmermann B, Yaspo M-L, Fulton L, Fulton R, Ananiev V, Belaia Z, Beloslyudtsev D, Bouk N, Chen C, Church D, Cohen R, Cook C, Garner J, Hefferon T, Kimelman M, Liu C, Lopez J, Meric P, O’Sullivan C, Ostapchuk Y, Phan L, Ponomarov S, Schneider V, Shekhtman E, Sirotkin K, Slotta D, Zhang H, Balasubramaniam S, Burton J, Danecek P, Keane TM, Kolb-Kokocinski A, McCarthy S, Stalker J, Quail M, Davies CJ, Gollub J, Webster T, Wong B, Zhan Y, Campbell CL, Kong Y, Marcketta A, Yu F, Antunes L, Bainbridge M, Sabo A, Huang Z, Coin LJM, Fang L, Li Q, Li Z, Lin H, Liu B, Luo R, Shao H, Xie Y, Ye C, Yu C, Zhang F, Zheng H, Zhu H, Alkan C, Dal E, Kahveci F, Garrison EP, Kural D, Lee W-P, Leong WF, Stromberg M, Ward AN, Wu J, Zhang M, Daly MJ, DePristo MA, Handsaker RE, Banks E, Bhatia G, del Angel G, Genovese G, Li H, Kashin S, McCarroll SA, Nemesh JC, Poplin RE, Yoon SC, Lihm J, Makarov V, Gottipati S, Keinan A, Rodriguez-Flores JL, Rausch T, Fritz MH, Stütz AM, Beal K, Clarke L, Datta A, Herrero J, McLaren WM, Ritchie GRS, Smith 
RE, Zerbino D, Zheng-Bradley X, Sabeti PC, Shlyakhter I, Schaffner SF, Vitti J, Cooper DN, Ball EV, Stenson PD, Barnes B, Bauer M, Cheetham RK, Cox A, Eberle M, Kahn S, Murray L, Peden J, Shaw R, Kenny EE, Batzer MA, Konkel MK, Walker JA, MacArthur DG, Lek M, Sudbrak R, Amstislavskiy VS, Herwig R, Ding L, Koboldt DC, Larson D, Ye K, Gravel S, Swaroop A, Chew E, Lappalainen T, Erlich Y, Gymrek M, Willems TF, Simpson JT, Shriver MD, Rosenfeld JA, Bustamante CD, Montgomery SB, Vega FMDL, Byrnes JK, Carroll AW, DeGorter MK, Lacroute P, Maples BK, Martin AR, Moreno-Estrada A, Shringarpure SS, Zakharia F, Halperin E, Baran Y, Cerveira E, Hwang J, Malhotra A, Plewczynski D, Radew K, Romanovitch M, Zhang C, Hyland FCL, Craig DW, Christoforides A, Homer N, Izatt T, Kurdoglu AA, Sinari SA, Squire K, Xiao C, Sebat J, Antaki D, Gujral M, Noor A, Ye K, Burchard EG, Hernandez RD, Gignoux CR, Haussler D, Katzman SJ, Kent WJ, Howie B, Ruiz-Linares A, Dermitzakis ET, Devine SE, Kang HM, Kidd JM, Blackwell T, Caron S, Chen W, Emery S, Fritsche L, Fuchsberger C, Jun G, Li B, Lyons R, Scheller C, Sidore C, Song S, Sliwerska E, Taliun D, Tan A, Welch R, Wing MK, Zhan X, Awadalla P, Hodgkinson A, Li Y, Shi X, Quitadamo A, Lunter G, Marchini JL, Myers S, Churchhouse C, Delaneau O, Gupta-Hinch A, Kretzschmar W, Iqbal Z, Mathieson I, Menelaou A, Rimmer A, Xifara DK, Oleksyk TK, Fu Y, Liu X, Xiong M, Jorde L, Witherspoon D, Xing J, Browning BL, Browning SR, Hormozdiari F, Sudmant PH, Khurana E, Tyler-Smith C, Albers CA, Ayub Q, Balasubramaniam S, Chen Y, Colonna V, Danecek P, Jostins L, Keane TM, McCarthy S, Walter K, Xue Y, Gerstein MB, Abyzov A, Balasubramanian S, Chen J, Clarke D, Fu Y, Harmanci AO, Jin M, Lee D, Liu J, Mu XJ, Zhang J, Zhang Y, Zhu H, Dal E, Kahveci F, Ward AN, Wu J, Zhang M, del Angel G, Hartl C, Nemesh JC, Shakir K, Yoon SC, Lihm J, Degenhardt J, Fritz MH, Meiers S, Raeder B, Casale FP, Clarke L, Smith RE, Stegle O, Zheng-Bradley X, Barnes B, Cheetham RK, Eberle M, 
Kahn S, Shaw R, Lameijer E-W, Batzer MA, Konkel MK, Walker JA, Hall I, Lacroute P, Cerveira E, Hwang J, Plewczynski D, Radew K, Romanovitch M, Church D, Xiao C, Antaki D, Bafna V, Michaelson J, Ye K, Devine SE, Gardner EJ, Mills RE, Dayama G, Emery S, Shi X, Quitadamo A, Chen K, Fan X, Chong Z, Chen T, Chaisson MJ, Huddleston J, Malig M, Nelson BJ, Parrish NF, Blackburne B, Lindsay SJ, Ning Z, Walter K, Zhang Y, Chen J, Clarke D, Lam H, Mu XJ, Sisu C, Challis D, Evani US, Lu J, Nagaswamy U, Yu J, Li W, Leong WF, Ward AN, del Angel G, Hartl C, Poplin RE, Rodriguez-Flores JL, Clarke L, Smith RE, Zheng-Bradley X, Christoforides A, Izatt T, Xiao C, Kang HM, Balasubramanian S, Habegger L, Yu H, Clarke L, Cunningham F, Dunham I, Zheng-Bradley X, Lage K, Jespersen JB, Horn H, DeGorter MK, Balasubramanian S, Kim D, Marcketta A, Desalle R, Narechania A, Sayres MAW, Rodriguez-Flores JL, Clarke L, Zheng-Bradley X, Gymrek M, Willems TF, Mendez FL, Poznik GD, Underhill PA, Cerveira E, Romanovitch M, Coin L, Shao H, Mittelman D, Banerjee R, Cerezo M, Fitzgerald TW, Louzada S, Massaia A, McCarthy S, Ritchie GR, Yang F, Kalra D, Hale W, Dan X, Ye C, Zheng X, Clarke L, Zheng-Bradley X, Cox A, Kahn S, Sudbrak R, Albrecht MW, Lienhard M, Larson D, Izatt T, Kurdoglu AA, Xiao C, Balasubramaniam S, Keane TM, McCarthy S, Stalker J, Barnes KC, Beiswanger C, Cai H, Cao H, Gerry NP, Gharani N, Gignoux CR, Henn B, Jones D, Jorde L, Kaye JS, Kent A, Kerasidou A, Mathias R, Ossorio PN, Parker M, Resch AM, Rotimi CN, Royal CD, Sandoval K, Su Y, Sudbrak R, Tian Z, Tishkoff S, Toji LH, Via M, Wang Y, Yang H, Yang L, Zhu J, Bodmer W, Bedoya G, Cai Z, Gao Y, Chu J, Peltonen L, Garcia-Montero A, Orfao A, Dutil J, Martinez-Cruzado JC, Oleksyk TK, Mathias RA, Hennis A, Watson H, McKenzie C, Qadri F, LaRocque R, Sabeti PC, Zhu J, Deng X, Sabeti PC, Asogun D, Folarin O, Happi C, Omoniwa O, Stremlau M, Tariyal R, Jallow M, Sisay-Joof F, Corrah T, Rockett K, Kwiatkowski D, Kooner J, T. n T. H. n. 
Dunstan SJ, Hang NT, Fonnie R, Garry R, Kanneh L, Moses L, Sabeti PC, Schieffelin J, Grant DS, Gallo C, Poletti G, Saleheen D, Rasheed A, Brooks LD, Felsenfeld AL, McEwen JE, Vaydylevich Y, Duncanson A, Dunn M, Schloss JA, Kang HM, Marchini JL, McCarthy S. A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393.
  • 79.Taliun D, Harris DN, Kessler MD, Carlson J, Szpiech ZA, Torres R, Taliun SAG, Corvelo A, Gogarten SM, Kang HM, Pitsillides AN, LeFaive J, Lee S, Tian X, Browning BL, Das S, Emde A-K, Clarke WE, Loesch DP, Shetty AC, Blackwell TW, Smith AV, Wong Q, Liu X, Conomos MP, Bobo DM, Aguet F, Albert C, Alonso A, Ardlie KG, Arking DE, Aslibekyan S, Auer PL, Barnard J, Barr RG, Barwick L, Becker LC, Beer RL, Benjamin EJ, Bielak LF, Blangero J, Boehnke M, Bowden DW, Brody JA, Burchard EG, Cade BE, Casella JF, Chalazan B, Chasman DI, Chen Y-DI, Cho MH, Choi SH, Chung MK, Clish CB, Correa A, Curran JE, Custer B, Darbar D, Daya M, de Andrade M, DeMeo DL, Dutcher SK, Ellinor PT, Emery LS, Eng C, Fatkin D, Fingerlin T, Forer L, Fornage M, Franceschini N, Fuchsberger C, Fullerton SM, Germer S, Gladwin MT, Gottlieb DJ, Guo X, Hall ME, He J, Heard-Costa NL, Heckbert SR, Irvin MR, Johnsen JM, Johnson AD, Kaplan R, Kardia SLR, Kelly T, Kelly S, Kenny EE, Kiel DP, Klemmer R, Konkle BA, Kooperberg C, Köttgen A, Lange LA, Lasky-Su J, Levy D, Lin X, Lin K-H, Liu C, Loos RJF, Garman L, Gerszten R, Lubitz SA, Lunetta KL, Mak ACY, Manichaikul A, Manning AK, Mathias RA, McManus DD, McGarvey ST, Meigs JB, Meyers DA, Mikulla JL, Minear MA, Mitchell BD, Mohanty S, Montasser ME, Montgomery C, Morrison AC, Murabito JM, Natale A, Natarajan P, Nelson SC, North KE, O’Connell JR, Palmer ND, Pankratz N, Peloso GM, Peyser PA, Pleiness J, Post WS, Psaty BM, Rao DC, Redline S, Reiner AP, Roden D, Rotter JI, Ruczinski I, Sarnowski C, Schoenherr S, Schwartz DA, Seo J-S, Seshadri S, Sheehan VA, Sheu WH, Shoemaker MB, Smith NL, Smith JA, Sotoodehnia N, Stilp AM, Tang W, Taylor KD, Telen M, Thornton TA, Tracy RP, Berg DJVD, Vasan RS, Viaud-Martinez KA, Vrieze S, Weeks DE, Weir BS, Weiss ST, Weng L-C, Willer CJ, Zhang Y, Zhao X, Arnett DK, Ashley-Koch AE, Barnes KC, Boerwinkle E, Gabriel S, Gibbs R, Rice KM, Rich SS, Silverman EK, Qasba P, Gan W, Abe N, Almasy L, Ament S, Anderson P, Anugu P, 
Applebaum-Bowden D, Assimes T, Avramopoulos D, Barron-Casella E, Beaty T, Beck G, Becker D, Beitelshees A, Benos T, Bezerra M, Bis J, Bowler R, Broeckel U, Broome J, Bunting K, Bustamante C, Buth E, Cardwell J, Carey V, Carty C, Casaburi R, Castaldi P, Chaffin M, Chang C, Chang Y-C, Chavan S, Chen B-J, Chen W-M, Chuang L-M, Chung R-H, Comhair S, Cornell E, Crandall C, Crapo J, Curtis J, Damcott C, David S, Davis C, de las Fuentes L, DeBaun M, Deka R, Devine S, Duan Q, Duggirala R, Durda JP, Eaton C, Ekunwe L, Boueiz AE, Erzurum S, Farber C, Flickinger M, Fornage M, Frazar C, Fu M, Fulton L, Gao S, Gao Y, Gass M, Gelb B, Geng XP, Geraci M, Ghosh A, Gignoux C, Glahn D, Gong D-W, Goring H, Graw S, Grine D, Gu CC, Guan Y, Gupta N, Haessler J, Hawley NL, Heavner B, Herrington D, Hersh C, Hidalgo B, Hixson J, Hobbs B, Hokanson J, Hong E, Hoth K, Hsiung CA, Hung Y-J, Huston H, Hwu CM, Jackson R, Jain D, Jhun MA, Johnson C, Johnston R, Jones K, Kathiresan S, Khan A, Kim W, Kinney G, Kramer H, Lange C, Lange E, Lange L, Laurie C, LeBoff M, Lee J, Lee SS, Lee W-J, Levine D, Lewis J, Li X, Li Y, Lin H, Lin H, Lin KH, Liu S, Liu Y, Liu Y, Luo J, Mahaney M, Make B, Manson J, Margolin L, Martin L, Mathai S, May S, McArdle P, McDonald M-L, McFarland S, McGoldrick D, McHugh C, Mei H, Mestroni L, Min N, Minster RL, Moll M, Moscati A, Musani S, Mwasongwe S, Mychaleckyj JC, Nadkarni G, Naik R, Naseri T, Nekhai S, Neltner B, Ochs-Balcom H, Paik D, Pankow J, Parsa A, Peralta JM, Perez M, Perry J, Peters U, Phillips LS, Pollin T, Becker JP, Boorgula MP, Preuss M, Qiao D, Qin Z, Rafaels N, Raffield L, Rasmussen-Torvik L, Ratan A, Reed R, Regan E, Reupena MS, Roselli C, Russell P, Ruuska S, Ryan K, Sabino EC, Saleheen D, Salimi S, Salzberg S, Sandow K, Sankaran VG, Scheller C, Schmidt E, Schwander K, Sciurba F, Seidman C, Seidman J, Sherman SL, Shetty A, Sheu WH-H, Silver B, Smith J, Smith T, Smoller S, Snively B, Snyder M, Sofer T, Storm G, Streeten E, Sung YJ, Sylvia J, Szpiro A, 
Sztalryd C, Tang H, Taub M, Taylor M, Taylor S, Threlkeld M, Tinker L, Tirschwell D, Tishkoff S, Tiwari H, Tong C, Tsai M, Vaidya D, VandeHaar P, Walker T, Wallace R, Walts A, Wang FF, Wang H, Watson K, Wessel J, Williams K, Williams LK, Wilson C, Wu J, Xu H, Yanek L, Yang I, Yang R, Zaghloul N, Zekavat M, Zhao SX, Zhao W, Zhi D, Zhou X, Zhu X, Papanicolaou GJ, Nickerson DA, Browning SR, Zody MC, Zöllner S, Wilson JG, Cupples LA, Laurie CC, Jaquish CE, Hernandez RD, O’Connor TD, Abecasis GR. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature. 2021;590:290–299. doi: 10.1038/s41586-021-03205-y.
  • 80.Walter K, Min JL, Huang J, Crooks L, Memari Y, McCarthy S, Perry JRB, Xu C, Futema M, Lawson D, Iotchkova V, Schiffels S, Hendricks AE, Danecek P, Li R, Floyd J, Wain LV, Humphries SE, Plagnol V, Richards JB, Greenwood CMT, Soranzo N, Bala S, Clapham P, Coates G, Cox T, Daly A, Danecek P, Du Y, Edkins S, Ellis P, Flicek P, Guo X, Guo X, Huang L, Jackson DK, Joyce C, Keane T, Kolb-Kokocinski A, Langford C, Li Y, Liang J, Lin H, Liu R, Maslen J, McCarthy S, Quail MA, Stalker J, Sun J, Tian J, Wang G, Wang J, Wang Y, Wong K, Zhang P, Birney E, Boustred C, Chen L, Clement G, Cocca M, Danecek P, Smith GD, Day INM, Day-Williams A, Down T, Dunham I, Evans DM, Geihs M, Greenwood CMT, Hart D, Hendricks AE, Howie B, Hubbard T, Hysi P, Iotchkova V, Jamshidi Y, Karczewski KJ, Kemp JP, Lachance G, Lawson D, Lek M, Lopes M, MacArthur DG, Marchini J, Mangino M, Mathieson I, McCarthy S, Memari Y, Min JL, Moayyeri A, Northstone K, Panoutsopoulou K, Paternoster L, Quaye L, Richards JB, Ring S, Ritchie GRS, Shihab HA, Shin S-Y, Small KS, Artigas MS, Southam L, Pourcain BS, Surdulescu G, Tachmazidou I, Tobin MD, Valdes AM, Visscher PM, Wain LV, Walter K, Ward K, Wilson SG, Wong K, Yang J, Zhang F, Zheng H-F, Anney R, Ayub M, Blackwood D, Bolton PF, Breen G, Collier DA, Craddock N, Crooks L, Curran S, Curtis D, Gallagher L, Geschwind D, Gurling H, Holmans P, Lee I, Lönnqvist J, McCarthy S, McGuffin P, McIntosh AM, McKechanie AG, McQuillin A, Morris J, O’Donovan MC, Owen MJ, Parr JR, Paunio T, Pietilainen O, Rehnström K, Sharp SI, Skuse D, Clair DS, Suvisaari J, Walters JTR, Williams HJ, Barroso I, Bochukova E, Bounds R, Dominiczak A, Farooqi IS, Hendricks AE, Keogh J, Marenne G, McCarthy S, Morris A, Muddyman D, O’Rahilly S, Porteous DJ, Smith BH, Wheeler E, Turki SA, Anderson CA, Antony D, Beales P, Bentham J, Bhattacharya S, Calissano M, Carss K, Chatterjee K, Cirak S, Cosgrove C, Durbin R, Floyd J, Foley AR, Franklin CS, Futema M, Grozeva D, Humphries SE, McCarthy S, 
Mitchison HM, O’Rahilly S, Onoufriadis A, Parker V, Payne F, Raymond FL, Roberts N, Savage DB, Scambler P, Schmidts M, Schoenmakers N, Semple RK, Serra E, Spasic-Boskovic O, Stevens E, van Kogelenberg M, Vijayarangakannan P, Walter K, Williamson KA, Wilson C, Whyte T, Ciampi A, Greenwood CMT, Hendricks AE, Li R, Metrustry S, Oualkacha K, Xu C, Bobrow M, Bolton PF, Griffin H, Kent A, Muntoni F, Raymond FL, Semple RK, Smee C, Spector TD, Charlton R, Ekong R, Futema M, Humphries SE, Khawaja F, Lopes LR, Migone N, Payne SJ, Pollitt RC, Povey S, Ridout CK, Robinson RL, Scott RH, Shaw A, Syrris P, Taylor R, Vandersteen AM, Barrett JC, Farooqi IS, FitzPatrick DR, Hurles ME, Kaye J, Kennedy K, Langford C, McCarthy S, Owen MJ, Palotie A, Richards JB, Stalker J, Timpson NJ, Zeggini E, Amuzu A, Casas JP, Chambers JC, Dedoussis G, Gambaro G, Gasparini P, Gaunt TR, Iotchkova V, Isaacs A, Johnson J, Kleber ME, Kooner JS, Langenberg C, Luan J, Malerba G, März W, Matchan A, Min JL, Morris R, Nordestgaard BG, Benn M, Ring S, Scott RA, Toniolo D, Traglia M, Tybjærg-Hansen A, van Duijn CM, van Leeuwen EM, Varbo A, Whincup P, Zaza G, Zhang W. The UK10K project identifies rare variants in health and disease. Nature. 2015;526:82–90. doi: 10.1038/nature14962.
  • 81.Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O’Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB, Tukiainen T, Birnbaum DP, Kosmicki JA, Duncan LE, Estrada K, Zhao F, Zou J, Pierce-Hoffman E, Berghout J, Cooper DN, Deflaux N, DePristo M, Do R, Flannick J, Fromer M, Gauthier L, Goldstein J, Gupta N, Howrigan D, Kiezun A, Kurki MI, Moonshine AL, Natarajan P, Orozco L, Peloso GM, Poplin R, Rivas MA, Ruano-Rubio V, Rose SA, Ruderfer DM, Shakir K, Stenson PD, Stevens C, Thomas BP, Tiao G, Tusie-Luna MT, Weisburd B, Won H-H, Yu D, Altshuler DM, Ardissino D, Boehnke M, Danesh J, Donnelly S, Elosua R, Florez JC, Gabriel SB, Getz G, Glatt SJ, Hultman CM, Kathiresan S, Laakso M, McCarroll S, McCarthy MI, McGovern D, McPherson R, Neale BM, Palotie A, Purcell SM, Saleheen D, Scharf JM, Sklar P, Sullivan PF, Tuomilehto J, Tsuang MT, Watkins HC, Wilson JG, Daly MJ, MacArthur DG. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536:285–291. doi: 10.1038/nature19057.
  • 82.Ioannidis NM, Rothstein JH, Pejaver V, Middha S, McDonnell SK, Baheti S, Musolf A, Li Q, Holzinger E, Karyadi D, Cannon-Albright LA, Teerlink CC, Stanford JL, Isaacs WB, Xu J, Cooney KA, Lange EM, Schleutker J, Carpten JD, Powell IJ, Cussenot O, Cancel-Tassin G, Giles GG, MacInnis RJ, Maier C, Hsieh C-L, Wiklund F, Catalona WJ, Foulkes WD, Mandal D, Eeles RA, Kote-Jarai Z, Bustamante CD, Schaid DJ, Hastie T, Ostrander EA, Bailey-Wilson JE, Radivojac P, Thibodeau SN, Whittemore AS, Sieh W. REVEL: an ensemble method for predicting the pathogenicity of rare missense variants. Am J Hum Genet. 2016;99:877–885. doi: 10.1016/j.ajhg.2016.08.016.
  • 83.Qi H, Zhang H, Zhao Y, Chen C, Long JJ, Chung WK, Guan Y, Shen Y. MVP predicts the pathogenicity of missense variants by deep learning. Nat Commun. 2021;12:510. doi: 10.1038/s41467-020-20847-0.
  • 84.Kass RE, Raftery AE. Bayes factors. J Am Stat Assoc. 1995;90:773–795. doi: 10.1080/01621459.1995.10476572.
  • 85.Yang Y, Muzny DM, Xia F, Niu Z, Person R, Ding Y, et al. Molecular findings among patients referred for clinical whole-exome sequencing. JAMA. 2014. doi: 10.1001/jama.2014.14601.
  • 86.Cipriani V, Pontikos N, Arno G, Sergouniotis PI, Lenassi E, Thawong P, Danis D, Michaelides M, Webster AR, Moore AT, Robinson PN, Jacobsen JOB, Smedley D. An improved phenotype-driven tool for rare Mendelian variant prioritization: benchmarking Exomiser on real patient whole-exome data. Genes (Basel). 2020;11:460. doi: 10.3390/genes11040460. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 87.Sarwal V, Niehus S, Ayyala R, Chang S, Lu A, Darci-Maher N, et al. A comprehensive benchmarking of WGS-based structural variant callers. bioRxiv. 2020:2020.04.16.045120. [DOI] [PMC free article] [PubMed]
  • 88.Zare F, Dow M, Monteleone N, Hosny A, Nabavi S. An evaluation of copy number variation detection tools for cancer using whole exome sequencing data. BMC Bioinformatics. 2017;18:1–13. doi: 10.1186/s12859-017-1705-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89.Swaminathan GJ, Bragin E, Chatzimichali EA, Corpas M, Bevan AP, Wright CF, Carter NP, Hurles ME, Firth HV. DECIPHER: web-based, community resource for clinical interpretation of rare variants in developmental disorders. Hum Mol Genet. 2012;21:R37–R44. doi: 10.1093/hmg/dds362. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90.Yandell MD, Majoros WH. Genomics and natural language processing. Nat Rev Genet. 2002;3:601–610. doi: 10.1038/nrg861. [DOI] [PubMed] [Google Scholar]
  • 91.Kingsmore SF, Cakici JA, Clark MM, Gaughran M, Feddock M, Batalov S, Bainbridge MN, Carroll J, Caylor SA, Clarke C, Ding Y, Ellsworth K, Farnaes L, Hildreth A, Hobbs C, James K, Kint CI, Lenberg J, Nahas S, Prince L, Reyes I, Salz L, Sanford E, Schols P, Sweeney N, Tokita M, Veeraraghavan N, Watkins K, Wigby K, Wong T, Chowdhury S, Wright MS, Dimmock D, the R. Investigators. Bezares Z, Bloss C, Braun JJA, Diaz C, Mashburn D, Tamang D, Orendain D, Friedman J, Gleeson J, Barea J, Chiang G, Cohenmeyer C, Coufal NG, Evans M, Honold J, Hovey RL, Kimball A, Lane B, Le C, Le J, Leibel S, Moyer L, Mulrooney P, Oh D, Ordonez P, Oriol A, Ortiz-Arechiga M, Puckett L, Speziale M, Suttner D, Kraan LVD, Knight G, Sauer C, Song R, White S, Wise A, Yamada C. A randomized, controlled trial of the analytic and diagnostic performance of singleton and trio, rapid genome and exome sequencing in ill infants. Am J Hum Genet. 2019;105:719–733. doi: 10.1016/j.ajhg.2019.08.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 92.Samocha KE, Robinson EB, Sanders SJ, Stevens C, Sabo A, McGrath LM, Kosmicki JA, Rehnström K, Mallick S, Kirby A, Wall DP, MacArthur DG, Gabriel SB, DePristo M, Purcell SM, Palotie A, Boerwinkle E, Buxbaum JD, Cook EH, Gibbs RA, Schellenberg GD, Sutcliffe JS, Devlin B, Roeder K, Neale BM, Daly MJ. A framework for the interpretation of de novo mutation in human disease. Nat Genet. 2014;46:1–8. doi: 10.1038/ng.3050. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93.Liu P, Meng L, Normand EA, Xia F, Song X, Ghazi A, Rosenfeld J, Magoulas PL, Braxton A, Ward P, Dai H, Yuan B, Bi W, Xiao R, Wang X, Chiang T, Vetrini F, He W, Cheng H, Dong J, Gijavanekar C, Benke PJ, Bernstein JA, Eble T, Eroglu Y, Erwin D, Escobar L, Gibson JB, Gripp K, Kleppe S, Koenig MK, Lewis AM, Natowicz M, Mancias P, Minor L, Scaglia F, Schaaf CP, Streff H, Vernon H, Uhles CL, Zackai EH, Wu N, Sutton VR, Beaudet AL, Muzny D, Gibbs RA, Posey JE, Lalani S, Shaw C, Eng CM, Lupski JR, Yang Y. Reanalysis of Clinical Exome Sequencing Data. New Engl J Med. 2019;380:2478–2480. doi: 10.1056/NEJMc1812033. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 94.Wenger AM, Guturu H, Bernstein JA, Bejerano G. Systematic reanalysis of clinical exome data yields additional diagnoses: implications for providers. Genet Med. 2017;19:209–214. doi: 10.1038/gim.2016.88. [DOI] [PubMed] [Google Scholar]
  • 95.Carapito R, Konantz M, Paillard C, Miao Z, Pichot A, Leduc MS, Yang Y, Bergstrom KL, Mahoney DH, Shardy DL, Alsaleh G, Naegely L, Kolmer A, Paul N, Hanauer A, Rolli V, Müller JS, Alghisi E, Sauteur L, Macquin C, Morlon A, Sancho CS, Amati-Bonneau P, Procaccio V, Mosca-Boidron A-L, Marle N, Osmani N, Lefebvre O, Goetz JG, Unal S, Akarsu NA, Radosavljevic M, Chenard M-P, Rialland F, Grain A, Béné M-C, Eveillard M, Vincent M, Guy J, Faivre L, Thauvin-Robinet C, Thevenon J, Myers K, Fleming MD, Shimamura A, Bottollier-Lemallaz E, Westhof E, Lengerke C, Isidor B, Bahram S. Mutations in signal recognition particle SRP54 cause syndromic neutropenia with Shwachman-Diamond–like features. J Clin Invest. 2017;127:4090–4103. doi: 10.1172/JCI92876. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 96.Bellanné-Chantelot C, Schmaltz-Panneau B, Marty C, Fenneteau O, Callebaut I, Clauin S, Docet A, Damaj G-L, Leblanc T, Pellier I, Stoven C, Souquere S, Antony-Debré I, Beaupain B, Aladjidi N, Barlogis V, Bauduer F, Bensaid P, Boespflug-Tanguy O, Berger C, Bertrand Y, Carausu L, Fieschi C, Galambrun C, Schmidt A, Journel H, Mazingue F, Nelken B, Quah TC, Oksenhendler E, Ouachée M, Pasquet M, Saada V, Suarez F, Pierron G, Vainchenker W, Plo I, Donadieu J. Mutations in the SRP54 gene cause severe congenital neutropenia as well as Shwachman-Diamond–like syndrome. Blood. 2018;132:1318–1331. doi: 10.1182/blood-2017-12-820308. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 97.SIGNAL RECOGNITION PARTICLE, 54-KD; SRP54. Online Mendelian Inheritance in Man® (available at https://omim.org/entry/604857?search=srp54&highlight=srp54).
  • 98.Li Z, Zhang F, Wang Y, Qiu Y, Wu Y, Lu Y, Yang L, Qu WJ, Wang H, Zhou W, Tian W. PhenoPro: a novel toolkit for assisting in the diagnosis of Mendelian disease. Bioinformatics. 2019;35:btz100. doi: 10.1093/bioinformatics/btz100. [DOI] [PubMed] [Google Scholar]
  • 99.Deisseroth CA, Birgmeier J, Bodle EE, Kohler JN, Matalon DR, Nazarenko Y, Genetti CA, Brownstein CA, Schmitz-Abe K, Schoch K, Cope H, Signer R, Martinez-Agosto JA, Shashi V, Beggs AH, Wheeler MT, Bernstein JA, Bejerano G. ClinPhen extracts and prioritizes patient phenotypes directly from medical records to expedite genetic disease diagnosis. Genet Med. 2019;21:1585–1593. doi: 10.1038/s41436-018-0381-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 100.Baker SW, Murrell JR, Nesbitt AI, Pechter KB, Balciuniene J, Zhao X, Yu Z, Denenberg EH, DeChene ET, Wilkens AB, Bhoj EJ, Guan Q, Dulik MC, Conlin LK, Tayoun ANA, Luo M, Wu C, Cao K, Sarmady M, Bedoukian EC, Tarpinian J, Medne L, Skraban CM, Deardorff MA, Krantz ID, Krock BL, Santani AB. Automated clinical exome reanalysis reveals novel diagnoses. J Mol Diagn. 2019;21:38–48. doi: 10.1016/j.jmoldx.2018.07.008. [DOI] [PubMed] [Google Scholar]
  • 101.Son JH, Xie G, Yuan C, Ena L, Li Z, Goldstein A, Huang L, Wang L, Shen F, Liu H, Mehl K, Groopman EE, Marasa M, Kiryluk K, Gharavi AG, Chung WK, Hripcsak G, Friedman C, Weng C, Wang K. Deep phenotyping on electronic health records facilitates genetic diagnosis by clinical exomes. Am J Hum Genet. 2018;103:58–73. doi: 10.1016/j.ajhg.2018.05.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 102.G.-R. D. Collaboration. Center RG, Hout CVV, Tachmazidou I, Backman JD, Hoffman JD, Liu D, Pandey AK, Gonzaga-Jauregui C, Khalid S, Ye B, Banerjee N, Li AH, O’Dushlaine C, Marcketta A, Staples J, Schurmann C, Hawes A, Maxwell E, Barnard L, Lopez A, Penn J, Habegger L, Blumenfeld AL, Bai X, O’Keeffe S, Yadav A, Praveen K, Jones M, Salerno WJ, Chung WK, Surakka I, Willer CJ, Hveem K, Leader JB, Carey DJ, Ledbetter DH, Cardon L, Yancopoulos GD, Economides A, Coppola G, Shuldiner AR, Balasubramanian S, Cantor M, Nelson MR, Whittaker J, Reid JG, Marchini J, Overton JD, Scott RA, Abecasis GR, Yerges-Armstrong L, Baras A. Exome sequencing and characterization of 49,960 individuals in the UK Biobank. Nature. 2020;586:749–756. doi: 10.1038/s41586-020-2853-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 103.Carey DJ, Fetterolf SN, Davis FD, Faucett WA, Kirchner HL, Mirshahi U, Murray MF, Smelser DT, Gerhard GS, Ledbetter DH. The Geisinger MyCode community health initiative: an electronic health record–linked biobank for precision medicine research. Genet Med. 2016;18:906–913. doi: 10.1038/gim.2015.187. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 104.Dimmock DP, Clark MM, Gaughran M, Cakici JA, Caylor SA, Clarke C, Feddock M, Chowdhury S, Salz L, Cheung C, Bird LM, Hobbs C, Wigby K, Farnaes L, Bloss CS, Kingsmore SF, the R. Investigators. Bainbridge MN, Barea J, Batalov S, Bezares Z, Bird LM, Bloss CS, Braun JJA, Cakici JA, Campo MD, Carroll J, Cheung C, Cohenmeyer C, Coufal NG, Diaz C, Ding Y, Ellsworth K, Evans M, Feigenbaum A, Friedman J, Gleeson J, Hansen C, Honold J, James K, Jones MC, Kimball A, Knight G, Kraan LVD, Lane B, Le J, Leibel S, Lenberg J, Mashburn D, Moyer L, Mulrooney P, Nahas S, Oh D, Orendain D, Oriol A, Ortiz-Arechiga M, Prince L, Rego S, Reyes I, Sanford E, Sauer C, Schwanemann L, Speziale M, Suttner D, Sweeney N, Song R, Tokita M, Veeraraghavan N, Watkins K, Wong T, Wright MS, Yamada C. An RCT of rapid genomic sequencing among seriously ill infants results in high clinical utility, changes in management, and low perceived harm. Am J Hum Genet. 2020;107:942–952. doi: 10.1016/j.ajhg.2020.10.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 105.The Exomiser - a tool to annotate and prioritize exome variants (available at https://github.com/exomiser/Exomiser). Last accessed 22 Aug 2021.

Associated Data

Supplementary Materials

13073_2021_965_MOESM1_ESM.xlsx (336KB, xlsx)

Additional file 1. Supplementary Tables (Tables S1-S14).

13073_2021_965_MOESM2_ESM.pdf (911KB, pdf)

Additional file 2. Supplementary Figures (Figures S1-S7).

Data Availability Statement

The datasets supporting the conclusions of this article are included within the article and its additional files. Because of patient privacy, data sharing consent, and HIPAA regulations, our raw data cannot be submitted to publicly available databases. Anonymized outputs from GEM [70], Phevor [15], VAAST [14], and Exomiser [16] for the benchmark dataset cases are tabulated in Additional file 1: Tables S5-S8, and GEM outputs for the validation dataset cases are tabulated in Additional file 1: Table S10. Condition match scores for hits with gene BF > 0 used for Fig. 6 are tabulated in Additional file 1: Tables S11-S14. The versions of GEM, Phevor, and VAAST used in this analysis are part of the Fabric Enterprise analysis platform and are commercially available [70]. Exomiser source code (version 12.1.0) is available on GitHub [105].


Articles from Genome Medicine are provided here courtesy of BMC
