Abstract
Genomic studies of human disorders are often performed by distinct research communities (i.e., focused on rare diseases, common diseases, or cancer). Despite underlying differences in the mechanistic origin of different disease categories, these studies share the goal of identifying causal genomic events that are critical for the clinical manifestation of the disease phenotype. Moreover, these studies face common challenges, including understanding the complex genetic architecture of the disease, deciphering the impact of variants on multiple scales, and interpreting non-coding mutations. Here, we highlight these challenges in depth and argue that properly addressing them will require a more unified vocabulary and approach across disease communities. Toward this goal, we present a unified perspective on relating variant impact to various genomic disorders.
Keywords: mutation impact, variant interpretation, regulatory mutations, genomic structural variations, driver mutations, cumulative impact
Opportunities and challenges in genomic studies of human diseases
In the last decade, genomics and related technologies have become essential for studying the complex genetic architectures of various human diseases[1–3]. The continued decline in the cost of sequencing and the rapid growth of computational methods for genome interpretation have enabled the regular use of genomics in the clinic[4–6]. Furthermore, genomics-based approaches promise to usher us into an era of precision medicine, where the genetic makeup of a patient can inform directed therapies[7,8]. Despite this excitement and promise, genomics-based human disease studies face common challenges. For instance, a key goal in disease genomics is to identify causal variants and associated genes. Yet, the quest to identify pathogenic variants often leads to simplistic variant dichotomies that do not capture the complex genetic architecture of a given disease. Most genomic studies classify germline mutations as common or rare based on their allele frequency[9]. Common variants are often implicated in common diseases, whereas rare mutations are considered critical in rare diseases[10,11]. Similarly, somatic mutations in the cancer genome are often binarized as drivers vs. passengers, where drivers are considered consequential for tumor growth and “passenger” mutations are assumed to play a negligible role[12]. Moreover, interpreting non-coding variants and missense mutations remains challenging. Canonical variant interpretation approaches assume that high-impact variants are likely to be rare germline or somatic driver mutations, yet there are many exceptions to this generalization: the high impact of many germline mutations does not necessarily correlate with population-level rarity, and a subset of high-impact somatic mutations might not drive tumor growth.
Beyond single-nucleotide variants (SNVs) and small insertions and deletions (INDELs), accurate detection and interpretation of genomic structural variations (SVs) pose significant challenges. The current issues associated with accurately interpreting molecular consequences and prioritizing regulatory mutations and SVs have led to debate in the disease genomics community regarding the utility of whole-genome sequencing (WGS) compared with much cheaper whole-exome sequencing platforms[13–15]. Despite these shared challenges, the genomics community works in distinct spheres of rare diseases, common diseases, and cancer with minimal cross-talk to address questions of common interest. Thus, there is an urgent need to break these artificial silos and encourage frequent exchange of ideas and methods to tackle common questions and advance our collective understanding of these diverse sets of diseases. Motivated by these considerations, we provide a brief overview of the challenges and our perspective on integrating efforts to address these issues cohesively.
Variant dichotomy and the complex genetic architecture of disease
A majority of genomic studies tend to binarize variants while considering their population-level frequency (common vs. rare), impact (low vs. high), and disease causality (driver vs. passenger). These canonical dichotomies reflect classification convenience rather than capturing the complex genetic architecture of various diseases. For instance, recent studies have highlighted the polygenic architecture of different common and rare diseases, where multiple genes determine the clinical manifestation of the disease phenotype[16–18]. In particular, the recently proposed omnigenic model[19] posits that many genomic variants with small molecular effects contribute toward a given complex trait or phenotype. Furthermore, these variants influence genes that are not directly relevant to a given disease; however, their cumulative impact is propagated to a smaller set of core genes directly relevant to disease through a cell-specific regulatory network. Similarly, the canonical model of Mendelian diseases posits that deleterious alleles clustered within a limited number of genomic loci are responsible for the origin of the Mendelian phenotype. However, recent works have shown clinical heterogeneity among Mendelian diseases that is attributed to allelic heterogeneity and the variant modifier effects contributed by the environment and genetic background[20,21].
Like rare and common diseases, the canonical approaches in cancer biology classify cancer mutations as drivers and passengers. The classic model of tumor growth assumes that a handful of drivers provide a selective growth advantage to tumor cells[22] and the remaining “passenger” mutations are inconsequential for tumor growth. However, there is increasing evidence for a continuum or more nuanced model, where low-frequency and weak drivers along with standard drivers can promote tumor growth[18]. In particular, we have shown that the cumulative impact of weak drivers provides significant contributions beyond standard drivers to predict a cancer phenotype[23]. Furthermore, researchers have argued that a subset of non-driver mutations can also be classified as “deleterious passengers” that impart adverse fitness effects to tumor cells and likely inhibit tumor growth[24].
Finally, the general assumption underlying the genetic architecture of common and rare diseases is that of two distinct extremes, where rare variants are responsible for Mendelian phenotypes and common variants confer significant risk to common disease phenotypes[25]. In contrast, cancer, which is primarily driven by somatic mutations, is often considered to be a distinct category of disease due to differences in the mutational processes[26] generating germline and somatic mutations. However, recent large-scale genomic studies suggest a complex interplay among common, rare, and somatic variations in various diseases. For example, recent work has shown that 20% of genes implicated in rare diseases contain or are near variants associated with various common diseases, indicating that both rare and common variants contribute to the genetic architecture of common diseases[27]. In particular, one such study indicated that de novo and rare mutations in the HSPA1L gene, which belongs to the heat shock protein family, are associated with inflammatory bowel disease[28]. Similarly, rare variants (particularly de novo mutations) and somatic mosaicism are often implicated in various neurodevelopmental diseases including autism, schizophrenia, bipolar disorders, and other intellectual disabilities[29].
Challenges in variant interpretation across human diseases
An essential goal in large-scale sequencing studies is to identify a small subset of causal variants and disease-associated genes. Current approaches to achieving this goal can be broadly classified into three categories: 1) disease association of variants using statistical methods that employ burden testing or recurrence-based approaches for germline and somatic mutations[30,31], respectively, 2) impact quantification for coding and non-coding mutations on molecular, organismal, and intra- and inter-species levels[32–34] and 3) a combination of these two approaches (see Box 1). While statistical methods are regularly applied to identify genome-wide association study (GWAS) loci and rare-variant-enriched causal genes in various disease studies, they provide limited insights into the molecular underpinnings of these diseases. Furthermore, these statistical approaches are often limited by statistical power and require many samples to identify causal genomic variants[35]. This approach is particularly challenging for detecting causal non-coding variants for which the underlying functional motifs/territories are not well defined[36,37]. A simplistic approach to address this issue is to increase the sample size. However, this strategy is not likely to be practical for rare or heterogeneous diseases, including rare cancers that lack large patient cohorts. Furthermore, multiple hypothesis testing would limit the ability to detect causal genomic loci. Alternatively, an accurate definition of the underlying functional territory of various regulatory elements (including enhancers, insulators, and promoters) and their correct connections to target genes can address these challenges to a certain extent. We note that employing tissue-specific non-coding annotations and epigenetic data is critical for identifying any disease-relevant regulatory mutation. 
For instance, certain regulatory elements are active across multiple tissues, whereas others regulate gene expression in a tissue-specific manner[38]. Similarly, the activity of genomic regulatory elements differs across different developmental stages[39].
Text Box 1.
Mutation burden and genetic-association tests are common approaches for identifying disease-associated variants and genes in cancer, common diseases, and rare diseases. For instance, most GWAS employ a linear or logistic regression model for association testing for continuous or binary phenotypes[70]. Typically, these models include various covariates (e.g., age, sex, ancestry) to avoid confounding effects as well as a random effect term to account for genetic relatedness among individuals in a given study[71]. Subsequently, a stringent multiple-hypothesis testing correction threshold (P < 5 × 10⁻⁸) is utilized to avoid false-positive associations. In contrast, genotype information for one individual or for a cohort is often collapsed to perform gene- or genomic-element-level burden testing in rare diseases and cancer. A simple implementation of such burden testing in rare diseases summarizes genotypes in a binary variable to capture the presence of at least one rare allele[72]. Other implementations utilize multivariate collapsing[73], nonparametric methods[74], and variance component testing[30], which consider a mixture of effect sizes among rare variants. Similarly, many cancer genomics studies have used mutation burden testing to identify cancer driver genes and genomic elements[31,75,76]. Briefly, cancer driver detection approaches search for positive selection signals by comparing the observed mutation rate with the expected mutation rate based on a background mutation rate. However, accurate modeling of background mutation rates remains challenging due to the significant variation across the cancer genome and a limited understanding of cancer mutational processes[77]. Furthermore, burden testing effectively detects frequently mutated drivers but fails to detect low-frequency or rare driver elements or genes.
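The two burden-testing ideas described above can be illustrated with a minimal Python sketch. It is not the implementation used by any of the cited methods. The function names (`carrier_indicator`, `fisher_one_sided`, `binomial_upper_tail`) are hypothetical, the choice of a one-sided Fisher's exact test and a plain binomial tail is our simplifying assumption, and the single uniform `background_rate` ignores the regional variation in mutation rate that real driver-detection methods must model.

```python
from math import comb


def carrier_indicator(genotypes):
    """Collapse one individual's rare-allele counts within a gene to 0/1.

    Returns 1 if the individual carries at least one rare allele.
    """
    return int(any(g > 0 for g in genotypes))


def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact test for carrier enrichment in cases.

    Computes P(X >= case_carriers) under the hypergeometric null in which
    carrier status is independent of case/control status.
    """
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    upper = min(carriers, n_cases)
    numerator = sum(
        comb(n_cases, k) * comb(n_controls, carriers - k)
        for k in range(case_carriers, upper + 1)
    )
    return numerator / comb(total, carriers)


def binomial_upper_tail(observed, n_sites, background_rate):
    """P(X >= observed) for X ~ Binomial(n_sites, background_rate).

    A toy stand-in for comparing observed and expected mutation counts;
    assumes one uniform background rate (a strong simplification).
    """
    return sum(
        comb(n_sites, k)
        * background_rate ** k
        * (1 - background_rate) ** (n_sites - k)
        for k in range(observed, n_sites + 1)
    )


# Made-up toy genotypes (rows: individuals, columns: rare variants in one gene).
cases = [[0, 1, 0], [1, 0, 0], [0, 0, 2], [0, 0, 0]]
controls = [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
case_carriers = sum(carrier_indicator(g) for g in cases)
control_carriers = sum(carrier_indicator(g) for g in controls)
p_burden = fisher_one_sided(case_carriers, len(cases), control_carriers, len(controls))
```

Collapsing to a binary carrier variable discards information about the number and effect size of rare alleles, which is one motivation for the multivariate and variance-component alternatives mentioned above.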
Beyond mutation burden testing, many efforts have also focused on quantifying the molecular impact of mutations to prioritize and detect causal mutations in different diseases[32,78–80]. Most of these methods utilize sequences, cross-species conservation scores, and protein structural changes to assess the molecular effects of coding mutations. In contrast, quantifying the molecular impact of non-coding mutations is not straightforward and requires further methodological development. Finally, multiple methods have included molecular impact as part of mutational burden testing to identify causal genes or genomic elements in cancer and various rare disease studies[81–83]. Typically, these methods employ functional annotations or the molecular impact of mutations as weights in their mutation burden testing frameworks. The inclusion of the molecular effect of mutations intends to facilitate the discovery of functionally relevant regions enriched in rare variants or somatic mutations that are likely to play pivotal roles in rare diseases or cancer growth.
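As a rough illustration of using molecular impact as weights in burden testing, the sketch below computes a weighted burden score per individual and compares cases with controls by permutation. This is a toy under stated assumptions rather than any cited method: the names `weighted_burden` and `permutation_pvalue` are hypothetical, the weights stand in for whatever impact score a real method would use, and the permutation test is our choice for simplicity.

```python
import random
from statistics import mean


def weighted_burden(genotypes, weights):
    """Sum of allele counts weighted by a per-variant impact score."""
    return sum(w * g for g, w in zip(genotypes, weights))


def permutation_pvalue(case_scores, control_scores, n_perm=2000, seed=0):
    """One-sided permutation p-value for higher mean burden in cases.

    Labels are shuffled n_perm times; the p-value is the fraction of
    shuffles whose mean difference is at least the observed difference
    (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = mean(case_scores) - mean(control_scores)
    pooled = list(case_scores) + list(control_scores)
    n_case = len(case_scores)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_case]) - mean(pooled[n_case:])
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Made-up impact weights for three variants in one gene (hypothetical values).
weights = [0.5, 1.0, 0.25]
case_scores = [weighted_burden(g, weights) for g in ([0, 1, 2], [1, 1, 0], [0, 2, 0])]
control_scores = [weighted_burden(g, weights) for g in ([0, 0, 0], [1, 0, 0], [0, 0, 1])]
p_value = permutation_pvalue(case_scores, control_scores)
```

With weights set to 1 for every variant, the score reduces to an unweighted count of rare alleles, so the weighting only changes the result when impact scores differ across variants.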
We can also quantify the impact of genomic variants to interpret their role in various diseases. However, the challenges of impact quantification vary for different categories of variants. For instance, assessing loss-of-function[40] and synonymous mutation effects is relatively straightforward compared with assessing nonsynonymous mutation effects. In turn, interpreting nonsynonymous mutations is comparatively easier than interpreting non-coding mutations. Similarly, among non-coding mutations, those that affect transcription factor binding motifs on regulatory elements in the genome are relatively simple to interpret[41].
Beyond SNVs and INDELs, disease genomes also harbor larger (> 50 bp) SVs. Due to their larger size, SVs are likely to disrupt multiple coding and regulatory elements in the genome[42], thus playing an essential role in the development of disease phenotypes[43]. Despite the vital role of SVs in various conditions, we currently lack methods for accurately evaluating the impact of SVs and precisely interpreting their role in various common and rare diseases. Some recent efforts to address this challenge have relied on the underlying functional annotation of the regions affected by SVs for prioritization[44,45]. Despite these initial efforts, we currently lack mechanistic insights into how SVs perturb genome function, thus necessitating more systematic approaches for evaluating the impact of SVs on the organismal endophenotype.
Finally, the current variant interpretation methods likely ignore the role of genetic buffering, epistatic interactions, and variant modifier effects, which play critical roles in dictating the final disease phenotype[46–48]. Similarly, our understanding of how different categories of variants (SNVs, INDELs, and SVs) cumulatively influence disease phenotype remains limited and is hardly considered in studies identifying causal genes. Thus, there is a need for novel computational approaches for variant interpretation that can quantify the cumulative impact of distinct categories of variants on a gene and network level to predict their role in various diseases.
Role of whole-genome sequencing and new technologies for variant interpretation
Due to the current limitations in interpreting non-coding variants and SVs, there are ongoing debates around the utility of WGS in large-scale disease studies, given its higher cost relative to exome- or array-based studies. In particular, in rare disease communities, some arguments favor the generation of patients’ exome and transcriptome sequencing profiles[14,21]. Similarly, large-scale cancer genomic studies have also favored the generation of exome and transcriptome profiles of tens of thousands of cancer patients because most clinically actionable mutations are likely to occupy the coding regions of the cancer genome[49]. Indeed, the coding region of a given gene is highly conserved, and coding mutations generally are under stronger negative selection and are likely to have significant cellular and molecular impact. However, a large fraction of ultra-conserved elements[50] are present in the non-coding regions, and various regulatory mutations have demonstrated important functional consequences on cellular, molecular, and organismal levels. Moreover, the cumulative effect of intermediate-impact non-coding mutations can also contribute to the manifestation of a disease phenotype. We note that there have been controversies[51] around defining the function of non-coding[52] regions in the genome. In particular, careful consideration is needed to distinguish between evolutionarily selected effects of certain non-coding elements and the causal role of other elements lacking the signature of selection[53]. Nonetheless, we think that generating whole-genome profiles for patients will help identify more novel biomarkers and better elucidate disease biology than concentrating on only 1% of the genome captured by exome or transcriptome sequencing platforms. Furthermore, the WGS platform makes it possible to identify copy number variants and SVs.
Similarly, prior studies have shown that WGS facilitates better uniformity in genomic coverage and thus allows for improved accuracy in detecting coding and non-coding variants[15] along with identifying somatic mosaicism[54] events in rare and common diseases.
Despite the clear advantage of WGS-based disease studies, the current short-read-based sequencing platform is not suitable for accurately identifying variants that fall in the repetitive regions of the genome[55]. This issue affects the accurate detection and interpretation of SVs in various diseases. The advent of long-read and linked-read sequencing technologies makes it possible to address these limitations[56]. However, these technologies remain more expensive and error-prone compared with short-read sequencing platforms and, thus, are not suitable for large-scale genome sequencing studies. We anticipate that continued improvement in long-read sequencing chemistry and a reduction in sequencing cost will allow us to generate a better genomic variation catalog for various diseases in the near future[42,57,58].
Charting common ground across genomics studies of human diseases
A comparative bibliometric analysis of some of the marquee publications in common diseases (the UK Biobank[59], Trans-Omics for Precision Medicine[1], and Genome Aggregation Database[60]), rare diseases (the Center for Mendelian Genomics[2,27,61] and Undiagnosed Disease Network[62]), and cancer genomics (The Cancer Genome Atlas[49,63], Pan Cancer Analysis of Whole Genomes[3], and Hartwig metastatic cohort[64]) highlights commonalities and differences in current research directions. For instance, we used the bibliometrix tool[65] to closely inspect keywords present in the references of marker papers of these distinct projects (Fig 1A) and publications citing them (Fig 1B). The results revealed various themes unique to rare (ACMG guidelines, de novo mutations, and intellectual disability disorders), common (genome-wide association, loci, and missing heritability), and cancer (somatic, survival, signature, and evolution) studies. However, various common keywords emerged across disease studies, including mutations, genes, SVs, genome, and identification. These common research themes highlight potential avenues for future collaboration among researchers across disease genomic studies.
Fig 1: Unified themes underlying human disease genomics and the variant impact landscape.
Bibliometric analysis of keywords in references cited by (A) or citing (B) marker papers of large-scale genomic studies in cancer, common diseases, and rare diseases highlights commonalities and differences in the overall themes. Common keywords across all three disease categories are shown as black segments of the dependency wheel. In contrast, keywords that are either unique or common in two disease categories but with unequal connectivity weights correspond to disease-specific color groups (red, green, and blue for cancer, rare diseases, and common diseases, respectively). Furthermore, panel C presents correlation patterns observed between mutational impact and population-level allele frequency or the cellular fraction of genomic variants for distinct disease categories (e.g., common disease, rare disease, and cancer). While the population-level allele frequency for somatic mutations is close to zero, the cellular fraction for germline mutations is close to one. Plane I on the proposed representation relates the normalized impact of mutations with their allele frequency for common and rare diseases. In contrast, plane II highlights the relationship between the normalized impact and cellular fraction for somatic mutations in cancer. Theoretically, plane III relates the cellular fraction with the allele frequency of germline mutations. The black text on these plots corresponds to variant categories that are easy to detect, while the red text corresponds to variants that are difficult to detect using canonical approaches.
Beyond bibliometric analyses, we note that, despite differences among the underlying mutational processes of germline and somatic mutations in various diseases, there are many similarities between their normalized impact and mutational frequencies on the population and cellular level. The normalized impact of a variant can be quantified at different resolutions, including the molecular, cellular, and organismal level, while considering inter- and intra-species conservation metrics. Similarly, while the population-level allele frequency measures the prevalence of a variant in a human population, the cellular-level fraction quantifies the prevalence of mutations in cells. One would expect the allele frequency of germline mutations to vary; however, their cellular-level fraction will be one (Fig 1C). In contrast, the allele frequency of somatic mutations is close to zero (Fig 1C), while their cellular fraction differs based on their clonal or subclonal status.
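The two frequency axes contrasted above can be made concrete with a short Python sketch. The function names are hypothetical. The allele-frequency calculation assumes diploid genotypes coded as alternate-allele counts. The cellular-fraction calculation uses one commonly used estimator based on variant allele fraction, tumor purity, and local copy number; this estimator is our assumption for illustration and is not taken from this article.

```python
def allele_frequency(genotypes):
    """Population-level alternate-allele frequency.

    genotypes: alternate-allele count (0, 1, or 2) per diploid individual.
    """
    return sum(genotypes) / (2 * len(genotypes))


def cancer_cell_fraction(vaf, purity, tumor_cn, multiplicity=1, normal_cn=2):
    """Estimate the fraction of tumor cells carrying a mutation.

    vaf: observed variant allele fraction in the sequenced sample.
    purity: fraction of cells in the sample that are tumor cells.
    tumor_cn / normal_cn: total copy number at the locus.
    multiplicity: number of mutated copies per mutated cell (assumed known).
    The estimate is capped at 1.0 because sampling noise can push it higher.
    """
    ccf = vaf * (purity * tumor_cn + (1 - purity) * normal_cn) / (purity * multiplicity)
    return min(ccf, 1.0)


# Made-up example values.
af = allele_frequency([0, 1, 2, 0])                 # 3 alternate alleles out of 8
subclonal = cancer_cell_fraction(0.2, 0.8, 2)       # below 1: present in a subset of tumor cells
germline_like = cancer_cell_fraction(0.5, 1.0, 2)   # heterozygous in every cell
```

Under these assumptions, a heterozygous germline variant in a diploid region yields a cellular fraction of one regardless of its population allele frequency, whereas a somatic mutation can take any cellular fraction depending on its clonal or subclonal status.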
Previous works have highlighted the relationship between the allele frequencies of germline mutations and their impact scores[66], where Mendelian, de novo, and rare variants have higher impact but lower allele frequencies (Fig 1C, plane I). In contrast, common variants in GWAS loci tend to have higher allele frequencies and lower normalized impact. Theoretically, one could also observe common variants with high impact near the extremum. However, common variants with a higher positive impact that provide a fitness advantage would undergo population-level fixation and thus not be observed. Moreover, detecting any high-impact deleterious mutations in this category is implausible, as purifying selection would remove such common variants from the human population. Between these extremes, we would also observe variants with intermediate impact and intermediate allele frequency. Finally, theoretically, there could be rare variants with low molecular impact in common diseases, but these variants would be challenging to detect, similar to common variants with high impact.
Compared with germline variants, the impact of somatic mutations relates to their cellular fractions (Fig 1C, plane II). We note that the cellular fraction is analogous to population-level frequency, but its relationship with the impact score of somatic mutations is inverted relative to that observed for germline variants. This inversion can be attributed to the primary role of positive selection in cancer[67] as opposed to the influence of negative selection[68] among germline variants. For instance, we expect the canonical clonal cancer drivers to have a high impact and high cellular fraction. In contrast, subclonal somatic variants (including low-frequency drivers) and weak cancer driver mutations are likely to have high and intermediate impacts, respectively. Similarly, weak and strong deleterious passengers would occupy distinct extrema on this plane with a negative impact leading to adverse fitness effects on tumor growth. However, detecting deleterious passengers with a high negative fitness effect would be challenging, as they would be selectively removed during tumor evolution. Finally, we note that these distinct categories of variants can work together in various diseases. For instance, common and rare variants can interact cooperatively to determine rare and common disease phenotypes. Similarly, germline and somatic variants in a cancer genome can synergistically interact to drive tumor progression[69].
Overall, the similarities observed in our bibliometric analysis and the presence of a variant impact continuum highlight the necessity for collaboration among researchers across disease studies. Although these common themes are expansive, a few specific topics within these themes would be an excellent starting point for cooperation among genomic researchers studying various human diseases. For instance, sharing ideas and methodologies for variant interpretation, particularly regulatory mutations, could serve as common ground for such interaction. In this context, the Impact of Genomic Variation on Function consortium and Atlas of Variant Effects Alliance provide roadmaps for further meaningful collaborations to address common challenges in disease genomic studies. Similarly, exchanging ideas and methods for addressing challenges associated with building accurate SV maps and SV interpretation would immensely benefit our understanding of the genomic architecture of various diseases. Finally, interactions among researchers to address questions related to integrating genomic information for clinical decision workflows, harmonizing clinical data, enabling secure data exchange, and developing genomic meta-analysis approaches will help achieve the goal of precision medicine across diseases.
Concluding remarks
Recent advances in sequencing and computational techniques are facilitating the goal of precision medicine for various human diseases. Previous genomic studies have frequently used canonical dichotomies to distinguish putative causal genomic variants from the remaining mutations across human diseases. However, recent studies have highlighted the complex genetic architecture of various human disorders, indicating a mutational continuum rather than a simplistic dichotomy. Furthermore, genomic studies encounter common challenges in variant interpretation and prioritization, particularly for non-coding mutations and genomic rearrangements. Despite these common challenges (see outstanding questions), there is an evident lack of interaction among various disease-specific research communities. This opinion article points out these commonalities and highlights potential avenues for collaborations to better understand the complex patterns of mutational impact and their role in different human diseases.
Outstanding questions.
Can additional insights related to human diseases be gained by variant continuum models rather than canonical mutation binarization approaches?
How can we precisely measure the impact of non-coding mutations and genomic rearrangements in various diseases?
How can we quantify the cumulative effect of genomic variants in the clinical manifestation of human disease phenotype?
Can we borrow ideas and methods from common and rare diseases to study tumor biology and vice versa?
Highlights.
Genomic variants in human diseases are often binarized, oversimplifying their underlying genetic architecture.
Interpretation of regulatory mutations and genomic structural variations remains a crucial challenge for genomic studies of various diseases.
Whole-genome sequencing approaches are more effective in capturing the entire variant spectrum in human diseases.
Common challenges in variant interpretation necessitate the development of unified frameworks to investigate the impact of genomic variants.
Acknowledgments
S.K. acknowledges support from the Canada Research Chair and the V Foundation for Cancer Research. M.G. acknowledges support from the NIH and from AL Williams Professorship funds.
Glossary
- WGS
Whole-genome sequencing, which captures all variants in a given genome
- Driver mutations
A handful of cancer mutations that provide a selective growth advantage to tumor cells
- Minor allele frequency
Population-wide frequency of minor alleles, which can vary across distinct human populations
- Structural variations
Genomic alterations with lengths greater than 50 bp that are likely to affect multiple coding and regulatory elements in the genome simultaneously
- Common variants
Genomic variants with MAF > 0.1% that are likely to be common in a given population
- Rare mutations
Mutations with MAF <= 0.05% that are rare in a given population and likely to be more deleterious compared with common variants
- Missense mutations
Coding mutations that change the identity of the amino acid residues affected by them
- Non-coding mutations
Mutations occupying non-coding regions of the genome, which account for the vast majority of mutations in a whole-genome sequencing study
Footnotes
Declaration of interests
The authors declare that they have no conflicts of interest.
References
- 1.Taliun D. et al. (2021) Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 2021 590:7845 590, 290–299 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Posey JE et al. (2019) Insights into genetics, human biology and disease gleaned from family based genomic studies. Genetics in Medicine 21, 798–812 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Campbell PJ et al. (2020) Pan-cancer analysis of whole genomes. Nature 578, 82–93 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Peter M. et al. (2022) Participant experiences of genome sequencing for rare diseases in the 100,000 Genomes Project: a mixed methods study. European Journal of Human Genetics 2022 30:5 30, 604–610 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Tamborero D. et al. (2022) The Molecular Tumor Board Portal supports clinical decisions and automated reporting for precision oncology. Nature Cancer 2022 3:2 3, 251–261 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Zehir A. et al. (2017) Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. Nature Medicine 2017 23:6 23, 703–713 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Collins FS and Varmus H. (2015) A New Initiative on Precision Medicine. The New England journal of medicine 372, 793–5 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Dugger SA et al. (2017) Drug development in the era of precision medicine. Nature Reviews Drug Discovery 2017 17:3 17, 183–196 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.1000 Genomes Project Consortium, A. et al. (2015) A global reference for human genetic variation. Nature 526, 68–74 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Pritchard JK and Cox NJ (2002) The allelic architecture of human disease genes: common disease–common variant… or not? Human Molecular Genetics 11, 2417–2423 [DOI] [PubMed] [Google Scholar]
- 11.Botstein D. and Risch N. (2003) Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nature Genetics 2003 33:3 33, 228–237 [DOI] [PubMed] [Google Scholar]
- 12.Vogelstein B. and Kinzler KW (2015) The Path to Cancer --Three Strikes and You’re Out. The New England journal of medicine 373, 1895–8 [DOI] [PubMed] [Google Scholar]
- 13. Schwarze K. et al. (2018) Are whole-exome and whole-genome sequencing approaches cost-effective? A systematic review of the literature. Genetics in Medicine 20, 1122–1130
- 14. Bamshad MJ et al. (2011) Exome sequencing as a tool for Mendelian disease gene discovery. Nature Reviews Genetics 12, 745–755
- 15. Belkadi A. et al. (2015) Whole-genome sequencing is more powerful than whole-exome sequencing for detecting exome variants. Proceedings of the National Academy of Sciences of the United States of America 112, 5473–5478
- 16. Oetjens MT et al. (2019) Quantifying the polygenic contribution to variable expressivity in eleven rare genetic disorders. Nature Communications 10, 1–10
- 17. Torkamani A. et al. (2018) The personal and clinical utility of polygenic risk scores. Nature Reviews Genetics 19, 581–590
- 18. Castro-Giner F. et al. (2015) The mini-driver model of polygenic cancer evolution. Nature Reviews Cancer 15, 680–685
- 19. Boyle EA et al. (2017) An Expanded View of Complex Traits: From Polygenic to Omnigenic. Cell 169, 1177–1186
- 20. Mohammadi P. et al. (2019) Genetic regulatory variation in populations informs transcriptome analysis in rare disease. Science 366, 351–356
- 21. Cummings BB et al. (2017) Improving genetic diagnosis in Mendelian disease with transcriptome sequencing. Science Translational Medicine 9
- 22. Vogelstein B. et al. (2013) Cancer Genome Landscapes. Science 339, 1546–1558
- 23. Kumar S. et al. (2020) Passenger Mutations in More Than 2,500 Cancer Genomes: Overall Molecular Functional Impact and Consequences. Cell 180, 915–927.e16
- 24. McFarland CD et al. (2013) Impact of deleterious passenger mutations on cancer progression. Proceedings of the National Academy of Sciences 110, 2910–2915
- 25. Hindorff LA et al. (2011) Genetic architecture of cancer and other complex diseases: lessons learned and future directions. Carcinogenesis 32, 945–954
- 26. Helleday T. et al. (2014) Mechanisms underlying mutational signatures in human cancers. Nature Reviews Genetics 15, 585–598
- 27. Chong JX et al. (2015) The Genetic Basis of Mendelian Phenotypes: Discoveries, Challenges, and Opportunities. American Journal of Human Genetics 97, 199–215
- 28. Takahashi S. et al. (2017) De novo and rare mutations in the HSPA1L heat shock gene associated with inflammatory bowel disease. Genome Medicine 9, 8
- 29. Erickson RP (2010) Somatic gene mutation and human disease other than cancer: An update. Mutation Research - Reviews in Mutation Research 705, 96–106
- 30. Wu MC et al. (2011) Rare-Variant Association Testing for Sequencing Data with the Sequence Kernel Association Test. American Journal of Human Genetics 89, 82
- 31. Lochovsky L. et al. (2015) LARVA: an integrative framework for large-scale analysis of recurrent variants in noncoding annotations. Nucleic Acids Research 43, 8123–8134
- 32. Fu Y. et al. (2014) FunSeq2: A framework for prioritizing noncoding regulatory variants in cancer. Genome Biology 15, 480
- 33. Cooper GM et al. (2005) Distribution and intensity of constraint in mammalian genomic sequence. Genome Research 15, 901–913
- 34. Peterson TA et al. (2012) Incorporating molecular and functional context into the analysis and prioritization of human variants associated with cancer. Journal of the American Medical Informatics Association 19, 275–283
- 35. Lawrence MS et al. (2014) Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495–501
- 36. Kumar S. and Gerstein M. (2017) Cancer genomics: Less is more in the hunt for driver mutations. Nature DOI: 10.1038/nature23085
- 37. Rheinbay E. et al. (2017) Recurrent and functional regulatory mutations in breast cancer. Nature DOI: 10.1038/nature22992
- 38. Rozowsky J. et al. (2021) Multi-tissue integrative analysis of personal epigenomes. bioRxiv DOI: 10.1101/2021.04.26.441442
- 39. Haniffa M. et al. (2021) A roadmap for the Human Developmental Cell Atlas. Nature 597, 196–205
- 40. Balasubramanian S. et al. (2017) Using ALoFT to determine the impact of putative loss-of-function variants in protein-coding genes. Nature Communications 8, 382
- 41. Khurana E. et al. (2016) Role of non-coding sequence variants in cancer. Nature Reviews Genetics 17, 93–108
- 42. Ebert P. et al. (2021) Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 372
- 43. Weischenfeldt J. et al. (2013) Phenotypic impact of genomic structural variation: insights from and for human disease. Nature Reviews Genetics 14, 125–138
- 44. Kumar S. et al. (2020) SVFX: a machine learning framework to quantify the pathogenicity of structural variants. Genome Biology 21, 1–21
- 45. Ganel L. et al. (2016) SVScore: an impact prediction tool for structural variation. Bioinformatics DOI: 10.1093/bioinformatics/btw789
- 46. Hartman IV JL et al. (2001) Principles for the Buffering of Genetic Variation. Science 291, 1001–1004
- 47. Castel SE et al. (2018) Modified penetrance of coding variants by cis-regulatory variation contributes to disease risk. Nature Genetics 50, 1327–1334
- 48. Domingo J. et al. (2019) The Causes and Consequences of Genetic Interactions (Epistasis). Annual Review of Genomics and Human Genetics 20, 433–460
- 49. Ding L. et al. (2018) Perspective on Oncogenic Processes at the End of the Beginning of Cancer Genomics. Cell 173, 305–320.e10
- 50. Bejerano G. et al. (2004) Ultraconserved elements in the human genome. Science 304, 1321–1325
- 51. Graur D. et al. (2013) On the immortality of television sets: “function” in the human genome according to the evolution-free gospel of ENCODE. Genome Biology and Evolution 5, 578–590
- 52. Kellis M. et al. (2014) Defining functional DNA elements in the human genome. Proceedings of the National Academy of Sciences of the United States of America 111, 6131–6138
- 53. Brunet TDP and Doolittle WF (2014) Getting “function” right. Proceedings of the National Academy of Sciences of the United States of America 111
- 54. Yizhak K. et al. (2019) RNA sequence analysis reveals macroscopic somatic clonal expansion across normal tissues. Science 364
- 55. Eichler EE (2019) Genetic Variation, Comparative Genomics, and the Diagnosis of Disease. New England Journal of Medicine 381, 64–74
- 56. Chaisson MJP et al. (2019) Multi-platform discovery of haplotype-resolved structural variation in human genomes. Nature Communications 10, 1784
- 57. Sedlazeck FJ et al. (2018) Piercing the dark matter: bioinformatics of long-range sequencing and mapping. Nature Reviews Genetics 19, 329–346
- 58. Nurk S. et al. (2022) The complete sequence of a human genome. Science 376, 44–53
- 59. Bycroft C. et al. (2018) The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209
- 60. Karczewski KJ et al. (2020) The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443
- 61. Baxter SM et al. (2022) Centers for Mendelian Genomics: A decade of facilitating gene discovery. Genetics in Medicine 24, 784–797
- 62. Ramoni RB et al. (2017) The Undiagnosed Diseases Network: Accelerating Discovery about Health and Disease. American Journal of Human Genetics 100, 185–192
- 63. Weinstein JN et al. (2013) The Cancer Genome Atlas Pan-Cancer analysis project. Nature Genetics 45, 1113–1120
- 64. Priestley P. et al. (2019) Pan-cancer whole-genome analyses of metastatic solid tumours. Nature 575, 210–216
- 65. Aria M. and Cuccurullo C. (2017) bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics 11, 959–975
- 66. Manolio TA et al. (2009) Finding the missing heritability of complex diseases. Nature 461, 747–753
- 67. Martincorena I. et al. (2017) Universal Patterns of Selection in Cancer and Somatic Tissues. Cell 171, 1029–1041.e21
- 68. Khurana E. et al. (2013) Integrative annotation of variants from 1092 humans: application to cancer genomics. Science 342, 1235587
- 69. Ramroop JR et al. (2019) Germline Variants Impact Somatic Events during Tumorigenesis. Trends in Genetics 35, 515–526
- 70. Purcell S. et al. (2007) PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. The American Journal of Human Genetics 81, 559–575
- 71. Zhou W. et al. (2018) Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nature Genetics 50, 1335–1341
- 72. Morgenthaler S. and Thilly WG (2007) A strategy to discover genes that carry multi-allelic or mono-allelic risk for common diseases: a cohort allelic sums test (CAST). Mutation Research 615, 28–56
- 73. Li B. and Leal SM (2008) Methods for Detecting Associations with Rare Variants for Common Diseases: Application to Analysis of Sequence Data. The American Journal of Human Genetics 83, 311–321
- 74. Madsen BE and Browning SR (2009) A Groupwise Association Test for Rare Mutations Using a Weighted Sum Statistic. PLOS Genetics 5, e1000384
- 75. Lawrence MS et al. (2013) Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218
- 76. Rheinbay E. et al. (2020) Analyses of non-coding somatic drivers in 2,658 cancer whole genomes. Nature 578, 102–111
- 77. Martínez-Jiménez F. et al. (2020) A compendium of mutational cancer driver genes. Nature Reviews Cancer 20, 555–572
- 78. Adzhubei I. et al. (2013) Predicting functional effect of human missense mutations using PolyPhen-2. Current Protocols in Human Genetics DOI: 10.1002/0471142905.hg0720s76
- 79. Kumar P. et al. (2009) Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nature Protocols 4, 1073–1081
- 80. Kircher M. et al. (2014) A general framework for estimating the relative pathogenicity of human genetic variants. Nature Genetics 46, 310–315
- 81. Mularoni L. et al. (2016) OncodriveFML: a general framework to identify coding and non-coding regions with cancer driver mutations. Genome Biology 17
- 82. Shuai S. et al. (2020) Combined burden and functional impact tests for cancer driver discovery using DriverPower. Nature Communications 11
- 83. Price AL et al. (2010) Pooled Association Tests for Rare Variants in Exon-Resequencing Studies. The American Journal of Human Genetics 86, 832–838

