Abstract
Gene discovery for Mendelian conditions (MCs) offers a direct path to understanding genome function. Approaches based on next-generation sequencing applied at scale have dramatically accelerated gene discovery and transformed genetic medicine. Finding the genetic basis of ∼6,000–13,000 MCs yet to be delineated will require both technical and computational innovation, but will rely to a larger extent on meaningful data sharing.
Main Text
Most of what we understand about how the human genome encodes function and what constitutes a causal variant has been motivated by gene discovery for Mendelian conditions (MCs).1 Indeed, the vast majority of variants of known function in the genome underlie MCs, and study of MCs is currently the gold standard for adjudicating variants of unknown significance (VUSs). While new computational strategies as well as technologies (e.g., multiplexed assays for variant effects) and biological models that can be scaled to assess the impact of every possible variant offer an unprecedented opportunity to explore genome function,2 it is the study of natural genome variation in humans when manifested by MCs that still provides the most efficient and putatively cost-effective path to link genotype with human phenotype. Moreover, this path leads directly to development and testing of new preventive, diagnostic, and treatment strategies for rare diseases (e.g., cystic fibrosis transmembrane conductance regulator modulators).3 So it is not surprising that the overwhelming majority of genetic diagnostic tests, results returned to families, and results that inform reproductive options, guide clinical management, and enable selection of therapeutics are based on discoveries of the genes underlying MCs.
Prior to 2010, gene discovery was driven by positional cloning, which requires information about the genomic location and function of a candidate gene, the phenotype, or both, limiting its effectiveness. Introduction of computational approaches based on exome sequencing (ES) that required neither was a disruptive innovation that replaced not only positional cloning, but virtually all incumbent approaches to gene discovery.4, 5, 6 Accordingly, thousands of MCs that had been intractable to conventional gene discovery approaches for various reasons suddenly became solvable using ES. The impact has been stunning.7, 8
Rapid adoption of ES, and approaches using next-generation sequencing (NGS) in general, to identify genes associated with MCs (1) markedly accelerated the rate of novel gene discovery for MCs (i.e., a gene not previously known to underlie an MC [novel gene] or a gene found to underlie a novel condition or known but unexplained MC); (2) enabled identification of >1,000 new MCs; (3) replaced “phenotype-driven” with “genotype-driven” syndrome delineation; (4) led to the deconstruction of heuristic phenotypic classes (e.g., developmental disorders, autism, epilepsy, congenital heart defects) into separate and often distinct MCs with otherwise low clinical recognizability (LCR); and (5) expanded our understanding of the phenotypic effects of thousands of genotypes and MCs. Summed across all genes underlying MCs discovered in the past decade, application of ES and NGS has rapidly advanced our knowledge of genome function, transformed the clinical practice of genetic medicine, and challenged our understanding of fundamental concepts in human genetics (e.g., risk, penetrance, variable expressivity). However, gene discovery for MCs risks becoming a victim of its own success: There is a perception in some circles that the pace of discovery is leveling off or even declining, the number of unsolved MCs is small, the remaining MCs are unlikely to be solvable by existing ES-based approaches, and/or many of the remaining MCs will be solved in the course of clinical diagnostic testing alone. We offer a different perspective.
From 1900 to 1950, a handful of new MCs were characterized each year (Figures 1A–1C, Figure S1, see Supplemental Methods in Supplemental Data). In the 1950s, the rate at which new MCs were delineated increased coincident with the emergence of the disciplines of medical and biochemical genetics and dysmorphology, reaching a peak in the 1970s. Despite the growing number of rare MCs cataloged, a relatively small number (i.e., ∼40) of genes underlying MCs were known prior to the introduction of positional cloning in 1986.9 Subsequently, both the rate of MC delineation and the rate of reports (i.e., publications) of discovery of genes associated with MC increased steeply (Figure 1A). Specifically, between 1986 and 1997, the number of MCs delineated and the number of reported discoveries of genes underlying MCs increased annually by ∼3 (p = 5.9 × 10⁻⁴) and ∼10 (p = 1.2 × 10⁻⁶) per year, respectively. However, between 1998 and 2010, prior to the introduction of ES in 2009 (ref. 6) and its application toward gene discovery in 2010,4, 10 the rate of reports of gene discoveries for MCs had plateaued (Figure 1A). After 2010, the number of MCs delineated and the number of discoveries of genes underlying MCs reported each year markedly increased by ∼19 (p = 0.006) and ∼14 (p = 0.06), respectively, per year through 2015. The impact has been rapid and profound. NGS-based approaches (primarily ES) have led to ∼36% (1,268/3,549) of all reported Mendelian gene discoveries (Figures S1 and S2), and by the end of 2017, the majority (87%) of reported gene discoveries were made via NGS-based approaches (Figure 1C).
The annual number of MCs for which the genetic basis is reported peaked between 2012 and 2015 and declined slightly each year thereafter (Figures 1A and 1B), suggesting that perhaps the underlying rate of gene discovery is declining as well. To determine whether discovery trends parallel reporting trends, we reviewed annualized totals of novel gene discoveries, both published and unpublished, publicly reported by the National Institutes of Health (NIH) Centers for Mendelian Genomics (CMGs). By 2015, the CMGs had made 419 novel gene discoveries and published 93, a ratio of 4.5 gene discoveries per published report.8 By the end of 2018, the CMGs had made 1,937 discoveries, an increase from ∼120 discoveries per year to ∼244 per year, but reported only 287 (6.7 gene discoveries per reported discovery).7 In other words, despite a drop in publication rate, the rate of discovery has continued to increase. This reporting delay, of obvious concern, has multiple explanations, but is due in part to investigators spending time to ascertain additional affected families, deeply characterize phenotypes and delineate new MCs (an obligatory consequence of the shift to genotype-driven delineation), and generate functional data to establish causality, link variants to function and outcome, and leverage high-impact publications. Whether the pace of discovery of investigators collaborating with CMGs reflects the worldwide experience of all investigators is unclear, but the total number of novel gene discoveries published by the CMGs represents about one of every six publications of novel gene discoveries, which suggests that these data are a reasonable estimator of discovery trends. Public reporting of numbers of novel genes underlying MCs discovered each year by other large-scale programs would help to validate these results.
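The widening gap between discovery and reporting can be recomputed directly from the cumulative CMG totals quoted above. The short Python sketch below is illustrative only: the variable and function names are ours, and the backlog figure assumes, for simplicity, that each published report covers exactly one discovery.

```python
# Cumulative CMG totals quoted in the text:
# year -> (novel gene discoveries, published reports)
cmg_totals = {
    2015: (419, 93),
    2018: (1937, 287),
}


def discoveries_per_report(discoveries: int, reports: int) -> float:
    """Number of gene discoveries made for each published report."""
    return discoveries / reports


ratios = {
    year: discoveries_per_report(d, r) for year, (d, r) in cmg_totals.items()
}

# Assumption (not from the text): one report corresponds to one discovery,
# so the difference approximates the number of discoveries not yet published.
backlog = {year: d - r for year, (d, r) in cmg_totals.items()}

for year in sorted(cmg_totals):
    print(
        f"{year}: {ratios[year]:.1f} discoveries per report, "
        f"about {backlog[year]} discoveries not yet published"
    )
```

Under that simplifying assumption, the backlog of unreported discoveries grows several-fold between the two time points even though the number of published reports also rises.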
Online Mendelian Inheritance in Man (OMIM) and Orphanet both include only several hundred MCs for which the underlying gene is still unknown,11 so it is often alleged that at the current pace of discovery, the genetic basis of nearly all MCs will be identified within the next ten years or so. However, the pervasive use of ES and NGS to identify the genetic basis of MCs has also accelerated the pace of novel MC delineation (Figure 1D). Historically, delineation of new MCs has been phenotype-driven. That is, a person, persons, or a family with a recognizable but heretofore unreported pattern of phenotypic findings was ascertained, clinical characterization of additional persons with an overlapping pattern of findings established the canonical phenotype, and subsequently the underlying genetic basis of the canonical phenotype was sought. In contrast, new MCs are now delineated only after discovery of their genetic basis (i.e., delineation is genotype-driven); that is, the rates of syndrome delineation and gene discovery have become inextricably linked, with >80% of the novel gene discoveries reported each year representing genes for newly described MCs (Figure 1D and Figure S3). The historical totals of MCs described by genotype-driven (n = 2,023) versus phenotype-driven (n = 3,149) delineation are approaching equality (Figure S4), and ultimately most MCs will be ascertained via genotype-driven delineation.
What is the source of these new MCs? Foremost, NGS of large numbers of persons with a condition representative of a phenotypic class has enabled delineation, based on the underlying gene responsible, of hundreds of new LCR-MCs (e.g., intellectual disability, developmental disorders12). This splitting of phenotypic classes into separate LCR-MCs is, and will continue to be, a primary source of newly delineated MCs and novel disease-associated gene discoveries. Moreover, even many MCs considered to be of high clinical recognizability (HCR) (e.g., Brachmann-De Lange [MIM: 122470], Noonan [MIM: 163950], and Kabuki [MIM: 147920]) are being found to comprise multiple MCs caused by variants in several genes. For some such HCR-MCs (e.g., arguably, Coffin-Siris [MIM: 135900]), the canonical phenotype and distribution of phenotypic effects are, in large part, indistinguishable across different causal genes, although they may eventually be resolvable from one another by deep phenotyping. In other words, we suggest that genetic heterogeneity in an MC is often a reflection of our general lack of knowledge of gene/genotype-phenotype relationships. Thus, we predict that as our understanding of the phenotypic effects of variants that cause MCs improves, fewer and fewer MCs will be considered genetically heterogeneous. However, while burden analyses of increasingly large cohorts of proband-parent trios diagnosed by phenotypic class (e.g., autism, congenital heart defects) are predicted to increase the number of MCs found to be caused by de novo variants (DNVs),13 there may be diminishing returns because in many, if not most, persons categorized by rare disease phenotypic class, the condition is likely oligogenic or polygenic rather than an MC.14
While phenotypic classes might be a rich source of new MCs, how many as-of-yet unknown MCs might exist? Catalogues of hundreds of well-established, unexplained MCs, loci for many of which have been mapped, demonstrate that the opportunity for discoveries among inherited MCs remains high. Moreover, MCs with or without a known gene are still almost entirely ascertained from populations of European or Middle Eastern ancestry. To what extent this has limited our scope of knowledge of MCs in general is unclear but should be empirically assessed. Accordingly, there is widespread interest in prioritizing efforts to characterize MCs and their underlying genetic basis in under-represented populations, with particular emphasis on surveying population isolates and populations with high levels of consanguinity that are typically not included in large-scale efforts to discover genes for MCs. A dedicated large-scale effort would require extensive infrastructure, coordination, governance, and resources, but the return on investment could be substantial.
Orthogonal evidence from humans and mice suggests that there are conservatively at least twice as many MCs that have yet to be delineated as there are known MCs (Figure 2 and see Supplemental Methods in Supplemental Data). Analysis of large databases of human coding variation (e.g., gnomAD) using metrics (e.g., Constrained Coding Regions,15 LOEUF,16 missense OEUF,16 and nonsense-mediated decay escape rank17) that assess the depletion of classes of functional variation in specific regions or genes identifies 9,596 genes under strong selective constraint that are therefore priority candidates for MCs, 77% (i.e., 7,416) of which would be novel genes for MCs (Figure 2). Furthermore, of the 10,487 mouse genes in the Mouse Genome Database (MGI)18 that are each linked to at least one non-lethal phenotype in a mutant strain, the human orthologs for 72% (i.e., 7,501 priority candidate genes) have yet to be shown to underlie an MC (Figure 2). Taken together, mouse- and human-supported data yield a total of 13,737 priority candidate genes for MCs, 76% (10,467) of which would be novel (Figure 2). Even under the more conservative assumption that a gene must be considered a priority candidate in both human and mouse, there are 6,346 genes predicted to underlie an MC, of which 4,450 (70%) would be novel (Figure 2). Accordingly, if we assume that each candidate gene underlies a single MC, there are ∼1.3–3 times as many novel genes (4,450–10,467) for MCs yet to be discovered as there are genes (3,519) known already to underlie an MC. If we extrapolate that the same proportion of these genes underlie multiple MCs as is the case for known genes for MCs (i.e., 16% underlie two MCs, 4.7% underlie three, 1.8% underlie four, etc.), we predict that at a minimum, ∼6,100–14,400 MCs remain to be discovered.
And these figures are still a considerable underestimate of the number of unsolved MCs because we did not account for the fact that mutant phenotypes for over half (∼12,000) of all protein-coding mouse genes have yet to be assessed. Moreover, we used conservative cutoffs for defining constrained genes, and current human constraint metrics both lack power to detect constraint in ∼30% of all human genes and are underpowered to detect constraint against homozygous loss-of-function mutations.16 For example, our analysis identifies 1,393 known genes for MCs that are not constrained in humans. The majority (>75%) of these genes underlie MCs that are inherited in an autosomal recessive pattern.
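The candidate-gene totals above follow from simple set arithmetic on the quoted counts. The Python sketch below recomputes the union of the human- and mouse-supported sets by inclusion-exclusion, the share of novel genes in each set, and the fold range relative to known genes. It is a cross-check under stated assumptions rather than the analysis code used for Figure 2: the variable names are ours, and the per-gene multiplier uses only the three fractions quoted in the text, so it is a lower bound that omits the elided tail of genes underlying five or more MCs.

```python
# Gene counts as quoted in the text (Figure 2); variable names are illustrative.
human_total, human_novel = 9596, 7416    # constrained in human variation data
mouse_total, mouse_novel = 10487, 7501   # non-lethal mouse mutant phenotype
both_total, both_novel = 6346, 4450      # priority candidate in human AND mouse
known_genes = 3519                       # genes already known to underlie an MC


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole


# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
either_total = human_total + mouse_total - both_total
either_novel = human_novel + mouse_novel - both_novel

# Fold excess of novel candidate genes over known genes (one MC per gene).
fold_low = both_novel / known_genes
fold_high = either_novel / known_genes

# Lower bound on the mean number of MCs per gene, using only the fractions
# quoted in the text for genes underlying two, three, and four MCs.
# The tail beyond four MCs is not given in the text and is left out here.
quoted_fractions = {2: 0.16, 3: 0.047, 4: 0.018}
mean_mcs_lower_bound = 1 + sum(f * (k - 1) for k, f in quoted_fractions.items())

print(f"human or mouse: {either_total} candidates, {either_novel} novel "
      f"({pct(either_novel, either_total):.0f}%)")
print(f"human and mouse: {both_total} candidates, {both_novel} novel "
      f"({pct(both_novel, both_total):.0f}%)")
print(f"novel vs. known genes: {fold_low:.1f}x to {fold_high:.1f}x")
print(f"mean MCs per gene (lower bound): {mean_mcs_lower_bound:.3f}")
```

Because the multiplier omits genes that underlie more than four MCs, multiplying it by the novel gene counts gives totals somewhat below the range stated in the text, which reflects the full distribution.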
The widespread use of ES, particularly in diagnostic settings, has highlighted the contribution of de novo variants (DNVs) to risk of rare disease in general, and especially of MCs that markedly reduce fitness. Yet our analysis of OMIM suggests that each year prior to and since 2010, the majority (∼80%–90%; see Supplemental Methods in Supplemental Data) of discoveries of genes underlying MCs are for MCs that are typically inherited in an autosomal recessive, dominant, or X-linked pattern rather than due entirely to DNVs (Figure 1C and Figure S5). A similar estimate (76%) is obtained from analysis of discoveries of genes for MCs identified in the course of diagnostic testing via ES (K. Retterer, GeneDx, personal communication). In diagnostic settings, variants in novel candidate genes for MCs are usually resolved as pathogenic via collaboration with dedicated discovery efforts by research programs. Such successful informal collaboration between industry and academic gene discovery efforts represents an opportunity that could be, if not should be, leveraged at scale for mutual benefit (i.e., to further accelerate gene discovery and in turn increase diagnostic rates). More importantly, such a collaborative effort across diagnostic labs and researchers could translate into a big windfall for families with rare diseases.
Even with an accelerating pace of gene discovery for MCs, efforts to find a gene underlying an MC are successful only about half the time. This observation underscores the reality that there are myriad factors limiting the rate of MC gene discovery using current ES- and NGS-based approaches. Such limiting factors include: (1) inability to robustly predict the impact of missense, synonymous, intronic, splice, and non-coding variants; (2) limited access to high-throughput functional validation of candidate variants; (3) much slower co-evolution of the infrastructure and regulatory framework necessary to share genomic data openly and at scale worldwide (hundreds of putative gene discoveries are unreported); (4) technical limitations of approaches based on ES (e.g., identifying indels, copy number variants, repeat expansions, and structural variants); and (5) the challenges (insufficient resources, lack of organized efforts) of ascertaining and deeply phenotyping families with high priority candidate genotypes. The cost and impact of overcoming each limitation vary substantially, but to what extent and under what circumstances remains a topic of intense investigation in both the public and private sectors.
Many of the efforts to further improve the success of MC gene discovery have focused on application of new sequencing technologies and variant calling and/or annotation. Whole-genome sequencing (WGS), in particular, has been considered by many to be the logical tool to supplant ES for MC gene discovery. Yet to date, WGS has, after excluding coding, near-splice site, or structural variants (SVs) overlapping known MC-associated genes, yielded few discoveries of novel genes or loci underlying MCs.19 This is due in part to limited availability and utility of annotations for untranslated regions, enhancers, insulators, silencers, RNA genes, and microRNAs as well as callers for SVs and repeat expansions. However, it also underscores the observation that the vast majority of known MCs are caused predominantly by coding variants with large effect sizes, and both ES and NGS gene panels already cover such coding regions robustly. Indeed, with the exception of repeat expansions, pathogenic non-coding variants have been reported for only 156 MCs and of these conditions, 150 (∼96%) of the genes discovered were found via variants accessible to ES or microarrays.19
There are a handful of examples of successful gene discovery using WGS to identify pathogenic non-coding variants when ES failed, and these can inform judicious use of WGS for discovery. In virtually every case, the search space was reduced to a small fraction of the genome via linkage analysis, homozygosity mapping, or identification of shared chromosomal microdeletions or duplications. For a small number of recessive MCs, identification of only one protein-coding variant led to the search for a non-coding variant in trans via WGS.20, 21, 22, 23, 24 Additionally, most non-coding pathogenic variants identified to date are small to moderate-size (e.g., multiple nucleotides) deletions, insertions, mobile element, or repeat expansions and contractions that remove or alter the sequence of a large portion or all of a regulatory element,9, 10, 11, 12, 13 duplicate the element in its entirety,25 or translocate it out of its normal sequence context.26, 27 Enrichment for non-coding SNVs in regulatory regions has been detected in, for example, developmental disorders28 and autism,29 but proving the pathogenicity for any one specific SNV is challenging because most are unique and alter non-overlapping bases.
Use of transcriptome sequencing (RNA-seq) to identify abnormally spliced transcripts and/or assess transcript abundance facilitates identification of non-coding variants, deep intronic splice variants,16, 17, 18 and to a lesser extent, synonymous variants with unexpected effects on splicing. However, while useful for diagnosis or validating effects of variants detected by ES or WGS, successful applications of RNA-seq to novel gene discovery for MCs are currently constrained by lack of knowledge of and/or access to disease-relevant tissue or the expense of creating transdifferentiated cell lines19, 20 from affected individuals and controls.30 Thus, while WGS may be technologically superior to ES at detecting non-coding and structural variants, and RNA-seq at highlighting pathogenic variants that alter splicing or transcript abundance, there is little evidence to date that the predicted thousands of currently undiscovered MCs will be caused largely by non-coding and/or deep intronic variants that can only be detected by widespread application of RNA-seq or WGS.
Perhaps the principal bottleneck to discovering genes underlying MCs is the lack of meaningful sharing at scale of genetic data and phenotypic information from families with a known or suspected novel MC. Millions of people with rare diseases, particularly children, have undergone targeted genetic testing, and ES and/or WGS has been performed on hundreds of thousands of them.7, 31, 32 Yet most of these results are buried in medical records, proprietary or restricted-access databases, and scientific papers, and most are difficult to access, much less leverage for gene discovery. In the U.S., institutions that participate in research and/or clinical care, including diagnostic labs, must comply with federal regulations such as the Health Insurance Portability and Accountability Act (HIPAA) and the Common Rule, which place boundaries, as well as protections, on use of patient and research participant data. The ambiguity of these boundaries can make navigating regulatory and privacy issues surrounding data sharing challenging and expensive, but sharing of de-identified data (e.g., candidate gene and non-identifying phenotypic information) among researchers, clinicians, and scientists in academia and industry is relatively straightforward. Over the past decade, a growing number of web-based platforms and databases to support data sharing have been developed and linked to one another via a federated network called the MatchMaker Exchange (MME)33 in order to facilitate matching of candidate genes. Matching within and between nodes of the MME has facilitated hundreds of discoveries of novel genes underlying MCs.
But such sharing does not happen nearly as often as it could and should. In some instances, the barrier is fear of non-compliance with HIPAA or the Common Rule and the attendant risk of penalties such as fines or suspension, a lack of awareness of the power of sharing, or a lack of access to platforms for sharing. In other cases, the intangible incentives to share are offset by concern that sharing might result in losing priority to publish or diminish competitiveness for grant funding, which in turn could adversely affect professional recognition and career advancement. Lately, sharing has also been threatened by attempts to monetize the discovery process by, for example, commercial start-ups and advocacy groups who generate or aggregate data and then market it for profit or fundraising. Moreover, matching on candidate genes without additional data (e.g., phenotype, mode of inheritance, variant) is increasingly inefficient because, even now, most matches are false positives. This problem will only worsen as the number of candidate genes shared across MME approaches the total number of human genes. Finally, matching does not ensure public reporting, much less timely reporting, of the results, as demonstrated by the increasing ratio of discoveries to publications in the CMGs, so discoveries can remain unknown for years and the information cannot be used by diagnostic labs and clinicians.
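To illustrate why matching on a gene name alone yields many false positives, the Python sketch below contrasts gene-only matching with matching that also requires the same mode of inheritance and some phenotypic overlap. Everything in it is hypothetical: the record fields, the placeholder gene names, the Jaccard overlap score, and the 0.3 threshold are our assumptions for illustration and do not describe the MME API or any node's actual matching algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    """Hypothetical de-identified case record; field names are illustrative."""
    case_id: str
    gene: str               # candidate gene symbol (placeholders used below)
    inheritance: str        # e.g., "AR", "AD", "XL", "de novo"
    phenotypes: frozenset   # standardized phenotype terms


def phenotype_overlap(a: Submission, b: Submission) -> float:
    """Jaccard index of two phenotype term sets (0 = disjoint, 1 = identical)."""
    if not a.phenotypes or not b.phenotypes:
        return 0.0
    return len(a.phenotypes & b.phenotypes) / len(a.phenotypes | b.phenotypes)


def gene_only_matches(query: Submission, pool: list) -> list:
    """Every other case that names the same candidate gene."""
    return [s for s in pool if s.gene == query.gene and s.case_id != query.case_id]


def refined_matches(query: Submission, pool: list, min_overlap: float = 0.3) -> list:
    """Gene matches that also share inheritance mode and enough phenotype terms."""
    return [
        s for s in gene_only_matches(query, pool)
        if s.inheritance == query.inheritance
        and phenotype_overlap(query, s) >= min_overlap
    ]


query = Submission("Q", "GENE1", "AR",
                   frozenset({"seizures", "microcephaly", "hypotonia"}))
pool = [
    Submission("A", "GENE1", "AR", frozenset({"seizures", "microcephaly"})),
    Submission("B", "GENE1", "AD", frozenset({"cardiomyopathy"})),
    Submission("C", "GENE2", "AR",
               frozenset({"seizures", "microcephaly", "hypotonia"})),
]

print([s.case_id for s in gene_only_matches(query, pool)])  # same gene only
print([s.case_id for s in refined_matches(query, pool)])    # gene + context
```

In this toy example, case B shares the candidate gene but neither the inheritance pattern nor any phenotype term, so it is returned by gene-only matching and filtered out once the additional data are considered.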
Families with MCs are arguably eager to share health data34, 35 if it can improve their care or the care of other families with the same condition, and when patients share their own health data online, HIPAA and the Common Rule do not apply. However, use of MME is restricted to clinicians and researchers, who are often disincentivized to share data with one another, or who deprioritize it due to time constraints and the perception that it is unlikely to be of benefit, and are even less likely to share it publicly. Over the past several years, families have increasingly turned to social media to circumvent the obstacles that limit data sharing by clinicians and researchers and to advertise their child’s health information and candidate genes to the public at large to make themselves more discoverable. This approach has led to some notable successes that are widely cited in the popular press.36 However, most efforts to use social media to facilitate case-matching fail. Some families are unable to gain the attention of suitable researchers and clinicians, and others lack the expertise to prioritize the information that should be shared, releasing non-standardized health and genetic information that cannot be easily compared or interpreted. Newer family-facing platforms (e.g., MyGene2) aim to increase patient control over their data and create a public knowledge base of variant data linked to rare disease phenotypes in order to promote and facilitate data sharing directly from families while still allowing researchers and clinicians to share de-identified data.
Use of ES- and NGS-based strategies coupled with phenotype-driven delineation of MCs has brought us within reach of identifying genes for all known MCs that remain unsolved. But importantly, it has also revealed that the majority of MCs that exist likely have not yet been delineated because they are likely not recognizable as discrete entities by commonly employed clinical phenotyping approaches. Indeed, genotype-driven delineation of MCs has rekindled an emphasis on the need for deep phenotyping in families if we are to achieve the goal of understanding genome function and, more importantly, its links to human disease. Moreover, barring some currently unknown or unexpected biological mechanism that underlies the majority of MCs yet to be delineated, technical innovations will continue to yield only marginal improvements in rates of gene discovery. A deeper and more sustained impact on gene discovery for MCs will likely require a far broader commitment to more open, simpler, and more meaningful data sharing among all stakeholders in research and clinical care worldwide, as well as identifying resources to support a worldwide infrastructure to ascertain, sequence, and phenotype families with a broad range of clinical findings. The return on investment is nothing short of a keystone in the foundation of precision genomic medicine.
Declaration of Interests
The authors declare no competing interests.
Acknowledgments
We thank all of the families, clinicians, and investigators for their participation and support. We thank Kati Buckingham, John Carey, Katrina Dipple, Ada Hamosh, Colby Marvin, Jay Shendure, and Kathryn Shively for helpful discussion. This work was supported in part by grants from the National Human Genome Research Institute (NHGRI) and National Heart, Lung, and Blood Institute (NHLBI) grant HG006493 (to the University of Washington Center for Mendelian Genomics). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NHGRI and NHLBI or of the National Institutes of Health.
Footnotes
Supplemental Data can be found online at https://doi.org/10.1016/j.ajhg.2019.07.011.
Web Resources
Code for OMIM analysis and figures, https://github.com/jxchong/mendelian_commentary
MatchMaker Exchange, https://www.matchmakerexchange.org
Mouse Genome Database, http://www.informatics.jax.org
MyGene2, https://mygene2.org/MyGene2/
Online Mendelian Inheritance in Man, http://www.omim.org
References
- 1.Antonarakis S.E., Beckmann J.S. Mendelian disorders deserve more attention. Nat. Rev. Genet. 2006;7:277–282. doi: 10.1038/nrg1826. [DOI] [PubMed] [Google Scholar]
- 2.Starita L.M., Islam M.M., Banerjee T., Adamovich A.I., Gullingsrud J., Fields S., Shendure J., Parvin J.D. A multiplex homology-directed DNA repair assay reveals the impact of more than 1,000 BRCA1 missense substitution variants on protein function. Am. J. Hum. Genet. 2018;103:498–508. doi: 10.1016/j.ajhg.2018.07.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Ramsey B.W., Davies J., McElvaney N.G., Tullis E., Bell S.C., Dřevínek P., Griese M., McKone E.F., Wainwright C.E., Konstan M.W., VX08-770-102 Study Group A CFTR potentiator in patients with cystic fibrosis and the G551D mutation. N. Engl. J. Med. 2011;365:1663–1672. doi: 10.1056/NEJMoa1105185. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Ng S.B., Buckingham K.J., Lee C., Bigham A.W., Tabor H.K., Dent K.M., Huff C.D., Shannon P.T., Jabs E.W., Nickerson D.A. Exome sequencing identifies the cause of a mendelian disorder. Nat. Genet. 2010;42:30–35. doi: 10.1038/ng.499. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Ng S.B., Bigham A.W., Buckingham K.J., Hannibal M.C., McMillin M.J., Gildersleeve H.I., Beck A.E., Tabor H.K., Cooper G.M., Mefford H.C. Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome. Nat. Genet. 2010;42:790–793. doi: 10.1038/ng.646. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Ng S.B., Turner E.H., Robertson P.D., Flygare S.D., Bigham A.W., Lee C., Shaffer T., Wong M., Bhattacharjee A., Eichler E.E. Targeted capture and massively parallel sequencing of 12 human exomes. Nature. 2009;461:272–276. doi: 10.1038/nature08250. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Posey J.E., O’Donnell-Luria A.H., Chong J.X., Harel T., Jhangiani S.N., Coban Akdemir Z.H., Buyske S., Pehlivan D., Carvalho C.M.B., Baxter S., Centers for Mendelian Genomics Insights into genetics, human biology and disease gleaned from family based genomic studies. Genet. Med. 2019;21:798–812. doi: 10.1038/s41436-018-0408-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Chong J.X., Buckingham K.J., Jhangiani S.N., Boehm C., Sobreira N., Smith J.D., Harrell T.M., McMillin M.J., Wiszniewski W., Gambin T., Centers for Mendelian Genomics The genetic basis of Mendelian phenotypes: Discoveries, challenges, and opportunities. Am. J. Hum. Genet. 2015;97:199–215. doi: 10.1016/j.ajhg.2015.06.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Collins F.S. Positional cloning moves from perditional to traditional. Nat. Genet. 1995;9:347–350. doi: 10.1038/ng0495-347. [DOI] [PubMed] [Google Scholar]
- 10.Vissers L.E.L.M., de Ligt J., Gilissen C., Janssen I., Steehouwer M., de Vries P., van Lier B., Arts P., Wieskamp N., del Rosario M. A de novo paradigm for mental retardation. Nat. Genet. 2010;42:1109–1112. doi: 10.1038/ng.712. [DOI] [PubMed] [Google Scholar]
- 11.Hartley T., Balcı T.B., Rojas S.K., Eaton A., Canada C.R., Dyment D.A., Boycott K.M. The unsolved rare genetic disease atlas? An analysis of the unexplained phenotypic descriptions in OMIM®. Am. J. Med. Genet. C. Semin. Med. Genet. 2018;178:458–463. doi: 10.1002/ajmg.c.31662. [DOI] [PubMed] [Google Scholar]
- 12.Deciphering Developmental Disorders Study Large-scale discovery of novel genetic causes of developmental disorders. Nature. 2015;519:223–228. doi: 10.1038/nature14135. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Jin S.-C., Homsy J., Zaidi S., Lu Q., Morton S., DePalma S.R., Zeng X., Qi H., Chang W., Sierant M.C. Contribution of rare inherited and de novo variants in 2,871 congenital heart disease probands. Nat. Genet. 2017;49:1593–1601. doi: 10.1038/ng.3970. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Grove J., Ripke S., Als T.D., Mattheisen M., Walters R.K., Won H., Pallesen J., Agerbo E., Andreassen O.A., Anney R., Autism Spectrum Disorder Working Group of the Psychiatric Genomics Consortium. BUPGEN. Major Depressive Disorder Working Group of the Psychiatric Genomics Consortium. 23andMe Research Team Identification of common genetic risk variants for autism spectrum disorder. Nat. Genet. 2019;51:431–444. doi: 10.1038/s41588-019-0344-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Havrilla J.M., Pedersen B.S., Layer R.M., Quinlan A.R. A map of constrained coding regions in the human genome. Nat. Genet. 2019;51:88–95. doi: 10.1038/s41588-018-0294-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Karczewski, K.J., Francioli, L.C., Tiao, G., Cummings, B.B., Alfoldi, J., Wang, Q., Collins, R.L., Laricchia, K.M., Ganna, A., Birnbaum, D.P., et al. Variation across 141,456 human exomes and genomes reveals the spectrum of loss-of-function intolerance across human protein-coding genes. bioRxiv. 10.1101/531210.
- 17. Coban-Akdemir Z., White J.J., Song X., Jhangiani S.N., Fatih J.M., Gambin T., Bayram Y., Chinn I.K., Karaca E., Punetha J., Baylor-Hopkins Center for Mendelian Genomics. Identifying genes whose mutant transcripts cause dominant disease traits by potential gain-of-function alleles. Am. J. Hum. Genet. 2018;103:171–187. doi: 10.1016/j.ajhg.2018.06.009.
- 18. Bult C.J., Blake J.A., Smith C.L., Kadin J.A., Richardson J.E., Mouse Genome Database Group. Mouse Genome Database (MGD) 2019. Nucleic Acids Res. 2019;47(D1):D801–D806. doi: 10.1093/nar/gky1056.
- 19. Smedley D., Schubach M., Jacobsen J.O.B., Köhler S., Zemojtel T., Spielmann M., Jäger M., Hochheiser H., Washington N.L., McMurry J.A. A whole-genome analysis framework for effective identification of pathogenic regulatory variants in Mendelian disease. Am. J. Hum. Genet. 2016;99:595–606. doi: 10.1016/j.ajhg.2016.07.005.
- 20. LaCroix A.J., Stabley D., Sahraoui R., Adam M.P., Mehaffey M., Kernan K., Myers C.T., Fagerstrom C., Anadiotis G., Akkari Y.M., University of Washington Center for Mendelian Genomics. GGC repeat expansion and exon 1 methylation of XYLT1 is a common pathogenic variant in Baratela-Scott syndrome. Am. J. Hum. Genet. 2019;104:35–44. doi: 10.1016/j.ajhg.2018.11.005.
- 21. Karolak J.A., Vincent M., Deutsch G., Gambin T., Cogné B., Pichon O., Vetrini F., Mefford H.C., Dines J.N., Golden-Grant K. Complex compound inheritance of lethal lung developmental disorders due to disruption of the TBX-FGF pathway. Am. J. Hum. Genet. 2019;104:213–228. doi: 10.1016/j.ajhg.2018.12.010.
- 22. Wu N., Ming X., Xiao J., Wu Z., Chen X., Shinawi M., Shen Y., Yu G., Liu J., Xie H. TBX6 null variants and a common hypomorphic allele in congenital scoliosis. N. Engl. J. Med. 2015;372:341–350. doi: 10.1056/NEJMoa1406829.
- 23. Albers C.A., Paul D.S., Schulze H., Freson K., Stephens J.C., Smethurst P.A., Jolley J.D., Cvejic A., Kostadima M., Bertone P. Compound inheritance of a low-frequency regulatory SNP and a rare null mutation in exon-junction complex subunit RBM8A causes TAR syndrome. Nat. Genet. 2012;44:435–439, S1–S2. doi: 10.1038/ng.1083.
- 24. Wieczorek D., Newman W.G., Wieland T., Berulava T., Kaffe M., Falkenstein D., Beetz C., Graf E., Schwarzmayr T., Douzgou S. Compound heterozygosity of low-frequency promoter deletions and rare loss-of-function mutations in TXNL4A causes Burn-McKeown syndrome. Am. J. Hum. Genet. 2014;95:698–707. doi: 10.1016/j.ajhg.2014.10.014.
- 25. Ngcungcu T., Oti M., Sitek J.C., Haukanes B.I., Linghu B., Bruccoleri R., Stokowy T., Oakeley E.J., Yang F., Zhu J. Duplicated enhancer region increases expression of CTSB and segregates with keratolytic winter erythema in South African and Norwegian families. Am. J. Hum. Genet. 2017;100:737–750. doi: 10.1016/j.ajhg.2017.03.012.
- 26. Brewer M.H., Chaudhry R., Qi J., Kidambi A., Drew A.P., Menezes M.P., Ryan M.M., Farrar M.A., Mowat D., Subramanian G.M. Whole genome sequencing identifies a 78 kb insertion from chromosome 8 as the cause of Charcot-Marie-Tooth neuropathy CMTX3. PLoS Genet. 2016;12:e1006177. doi: 10.1371/journal.pgen.1006177.
- 27. Spielmann M., Brancati F., Krawitz P.M., Robinson P.N., Ibrahim D.M., Franke M., Hecht J., Lohan S., Dathe K., Nardone A.M. Homeotic arm-to-leg transformation associated with genomic rearrangements at the PITX1 locus. Am. J. Hum. Genet. 2012;91:629–635. doi: 10.1016/j.ajhg.2012.08.014.
- 28. Short P.J., McRae J.F., Gallone G., Sifrim A., Won H., Geschwind D.H., Wright C.F., Firth H.V., FitzPatrick D.R., Barrett J.C., Hurles M.E. De novo mutations in regulatory elements in neurodevelopmental disorders. Nature. 2018;555:611–616. doi: 10.1038/nature25983.
- 29. Turner T.N., Eichler E.E. The role of de novo noncoding regulatory mutations in neurodevelopmental disorders. Trends Neurosci. 2019;42:115–127. doi: 10.1016/j.tins.2018.11.002.
- 30. Brechtmann F., Mertes C., Matusevičiūtė A., Yépez V.A., Avsec Ž., Herzog M., Bader D.M., Prokisch H., Gagneur J. OUTRIDER: A statistical method for detecting aberrantly expressed genes in RNA sequencing data. Am. J. Hum. Genet. 2018;103:907–917. doi: 10.1016/j.ajhg.2018.10.025.
- 31. Stark Z., Dolman L., Manolio T.A., Ozenberger B., Hill S.L., Caulfield M.J., Levy Y., Glazer D., Wilson J., Lawler M. Integrating genomics into healthcare: A global responsibility. Am. J. Hum. Genet. 2019;104:13–20. doi: 10.1016/j.ajhg.2018.11.014.
- 32. GeneDx announces completion of 100,000 exome sequences. https://www.globenewswire.com/news-release/2018/06/12/1520222/0/en/GeneDx-Announces-Completion-of-100-000-Exome-Sequences.html.
- 33. Philippakis A.A., Azzariti D.R., Beltran S., Brookes A.J., Brownstein C.A., Brudno M., Brunner H.G., Buske O.J., Carey K., Doll C. The Matchmaker Exchange: a platform for rare disease gene discovery. Hum. Mutat. 2015;36:915–921. doi: 10.1002/humu.22858.
- 34. Lambertson K.F., Damiani S.A., Might M., Shelton R., Terry S.F. Participant-driven matchmaking in the genomic era. Hum. Mutat. 2015;36:965–973. doi: 10.1002/humu.22852.
- 35. Burstein M.D., Robinson J.O., Hilsenbeck S.G., McGuire A.L., Lau C.C. Pediatric data sharing in genomic research: attitudes and preferences of parents. Pediatrics. 2014;133:690–697. doi: 10.1542/peds.2013-1592.
- 36. Might M., Wilsey M. The shifting model in clinical diagnostics: how next-generation sequencing and families are altering the way rare diseases are discovered, studied, and treated. Genet. Med. 2014;16:736–737. doi: 10.1038/gim.2014.23.