Author manuscript; available in PMC: 2025 Jul 7.
Published in final edited form as: Trends Mol Med. 2023 Apr 17;29(7):554–566. doi: 10.1016/j.molmed.2023.03.007

Cancer driver mutations: predictions and reality

Daria Ostroverkhova 1, Teresa M Przytycka 2,*, Anna R Panchenko 1,3,4,5,*
PMCID: PMC12232956  NIHMSID: NIHMS2087632  PMID: 37076339

Abstract

Cancer cells accumulate many genetic alterations throughout their lifetime, but only a few of them, the so-called driver mutations, drive cancer progression. Driver mutations may vary between cancer types and patients, can remain latent for a long time and become drivers at certain cancer stages, or may drive oncogenesis only in conjunction with other mutations. The high mutational, biochemical, and histological heterogeneity of tumors makes the experimental identification and computational prediction of driver mutations very challenging. In this review we summarize the most recent efforts to identify driver mutations in cancer and annotate their effects. We underline the success of computational methods that predict driver mutations in finding novel cancer biomarkers, including in circulating tumor DNA. We also report on the boundaries of their applicability for clinical research.

Introduction

Cancer exploits an imbalance between cellular processes that lead to changes in DNA and those that repair them. As a result, cancer cells may accumulate many (epi)genetic alterations throughout their lifetime, one or two orders of magnitude more than germline and normal somatic cells [1]. It has long been recognized that cancerogenesis is a multi-stage process. One of the first pieces of evidence came from the observation that for certain cancer types the death rate increased as the sixth power of the patient age. Consequently, a mathematical model was proposed suggesting several successive driving mutations and stages of cancer [2]. Further studies confirmed that a small number of mutations drive cancerogenesis (driver mutations) [3,4], with about one driver mutation per patient in sarcomas, thyroid, and testicular cancers, and about four driver mutations per patient in bladder, endometrial, and colorectal cancers (Figure 1C) [5]. Most mutations in cancer, in contrast, are assumed to be largely neutral (passenger mutations) and do not contribute to tumorigenesis. The vast majority of driver mutations are single nucleotide substitutions, or point mutations, which are the topic of this review.

Figure 1.

Driver and passenger mutations in human cancer. A: Detection of somatic mutations in human cancers requires the extraction of DNA from tumor and normal samples, DNA sequencing, alignment of paired tumor-normal sequences to the human reference genome for subsequent mutation calling. B: Cancerogenesis is a multi-stage process, and cells accumulate a myriad of somatic mutations throughout their life. Driver mutations lead to a variety of genetic and epigenetic alterations beneficial to cancer cells. C: A proportion of driver mutations per cancer type; all mutations are shown in gray radial bars, colored radial bars correspond to driver mutations for each cancer type. D: In certain cancer types, patients harboring driver mutations are characterized by better or worse survival compared to patients without drivers suggesting clinical importance of driver mutation identification.

Cancer driver mutations may affect cell cycle control, lead to insensitivity to growth inhibitory signals, and enable escape from immune surveillance. Therefore, driver mutations confer a selective advantage and are usually under positive selection in cancer. Driver mutations can differ between cancer types and patients, and the same mutation may drive cancer progression under certain circumstances and be neutral in another environment. Moreover, the classification into driver and passenger mutations is not binary: some mutations (so-called “latent drivers” [6]) might become drivers at a certain stage of cancer evolution or when combined with other mutations in the same or different genes. Indeed, individually infrequent and functionally weak mutations may collectively account for a clonal selection of cancerogenic traits (Figure 1B) [7], whereas multiple mutations on the same allele can lead to increased activity, cell proliferation and tumor growth [8]. Besides mutations in coding genes, there are also many alterations affecting non-coding regions in cancer [9].

The distribution of driver mutations over human genes is not uniform and recent studies, attempting to reconstruct the evolutionary history of individual tumors, showed that 50% of all early clonal driver mutations are located in only nine driver genes, whereas subclonal mutations occur in 35 different genes, pointing to a diverse set of drivers in later evolution [10]. Driver genes are usually classified into oncogenes and tumor suppressor genes. Oncogenes usually harbor gain-of-function mutations, which activate a protein and lead to uncontrollable cell growth or proliferation. Tumor suppressor genes, on the other hand, are responsible for homeostasis during cell division and DNA replication with a strong positive selection in cancer for deactivating mutations. Some genes can have both tumor suppressor and oncogene characteristics under different circumstances.

Detecting driver events in cancer is necessary for understanding the molecular mechanisms of cancer and consequently for developing diagnostic, prognostic, and treatment strategies. Indeed, as we show, there is a causative relationship between the presence of driver mutations in a tumor genome and the clinical phenotype of a given patient. There are many computational methods designed to detect driver genes [11], while relatively few methods rank mutations with respect to their driver status. This is explained by the greater difficulty of the latter problem: the presence of a driver mutation in a gene is sufficient to call that gene a driver, but not every mutation in a driver gene is itself a driver. Here we review computational methods that predict driver mutations, identify mutational processes causing these mutations, and detect potential mutation-based clinical biomarkers. We evaluate experimental data sets used to annotate driver mutations and train computational models. Finally, we conclude our review with a comparison of methods, highlighting their prediction accuracy and limitations.

Estimating the background somatic mutation rate in cancer

Somatic mutations observed in cancer patients occur as a result of two major mechanisms: background mutability and natural selection. Background mutability refers to the somatic background mutation rate; it provides the necessary variability and is shaped by multiple mutational and repair processes that are devoid of a selection component. If mutations accumulate as a result of neutral background mutational processes, one should expect a positive linear correlation between the background mutation rate and the number of mutations observed in cancer patients. Such a correlation has indeed been observed in many genes, with the strongest trend found for synonymous mutations [12]. Mutations that occur at higher or lower frequencies than expected from the background mutation rate model might be under selection in cancer [5]. The ratio of non-synonymous to synonymous mutations (dN/dS) is routinely used in cancer evolutionary genomics to estimate the effect of selection. Genomic regions or nucleotide sites that are under positive selection would have a ratio greater than one, and the link between dN/dS values and fitness selection coefficients in somatic cancer evolution has been established [13]. There are also under-represented mutations which are under negative selection in cancer because they result in cell death or senescence. Estimates of the effects of negative selection in cancer have been controversial, mostly due to the low mutation counts, the presence of multiple gene copies and the recessiveness of deleterious mutations [14].
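The dN/dS idea can be illustrated with a minimal sketch. It assumes the simplest possible estimator, in which mutation counts are normalized by the number of available non-synonymous and synonymous sites; the function name, its arguments and the example numbers are hypothetical, and published methods (for example [13]) additionally correct for sequence-context mutability, which is ignored here.

```python
def dnds_ratio(n_nonsyn, n_syn, nonsyn_sites, syn_sites):
    """Naive dN/dS: per-site non-synonymous rate divided by per-site synonymous rate.

    All four arguments are illustrative inputs (observed mutation counts and
    the number of sites at which each mutation class could occur).
    """
    if n_syn <= 0 or nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("need at least one synonymous mutation and positive site counts")
    dn = n_nonsyn / nonsyn_sites  # non-synonymous mutations per non-synonymous site
    ds = n_syn / syn_sites        # synonymous mutations per synonymous site
    return dn / ds


# Hypothetical gene: 30 non-synonymous mutations over 750 sites,
# 5 synonymous mutations over 250 sites.
ratio = dnds_ratio(30, 5, 750, 250)
print(f"dN/dS = {ratio:.2f}")  # a value above 1 is suggestive of positive selection
```

With so few synonymous mutations the estimate is noisy, which is one reason why the low mutation counts mentioned above complicate estimates of selection.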

Accurate estimates of the neutral background mutation rate have proven to be difficult but crucial for identifying driver mutations. Background mutation rate depends on many endogenous and exogenous factors. It has been shown that cell-type specific (epi)genomic features, like replication timing, histone modifications and chromatin accessibility, may explain up to 86% of the variance in mutation rates in cancer genomes on a megabase scale [15]. Taking into account these large-scale covariates, various computational methods have been designed to estimate the regional variations of the background mutation rate and predict significantly mutated genes [16]. Yet, driver mutation prediction requires an estimate of the background mutation rate at the single nucleotide scale, which is more laborious. Previous studies pointed to the local DNA sequence context as a major factor describing the largest proportion of mutation rate variation [17], with hepta-nucleotide context explaining up to 80% variability in the per-nucleotide substitution rate [18]. Additionally, local DNA structures (non-B-DNA, such as DNA stem-loops or quadruplexes) also contribute to the mutation rate variation on a scale of single or ten base-pairs [19–21]. Computational modeling of these mutational processes will be described next but it is often challenging to separate neutral background and natural selection components from each other.

Driver mutations and mutational signatures

DNA molecules are constantly exposed to different mutagenic processes such as UV light, smoking, reactive oxygen species (ROS) and many others. These mutagenic processes leave characteristic mutational patterns, which can be analyzed using mutational signatures or mutational motif frameworks [22,23]. Mutational signatures are typically modeled as a multinomial distribution over a set of mutation categories. Most commonly, mutation categories are defined as triplets of nucleotides where the nucleotide in the central position of the triplet is mutated while the flanking nucleotides provide local context for the mutation (Figure 2). Mutational signatures are identified computationally based on mutation catalogs of big cohorts of cancer genomes using methods such as non-negative matrix factorization [24–26], latent Dirichlet allocation [27], topic models [28,29], and other approaches [30]. Currently, the most popular set of mutational signatures is provided in the COSMIC database [31]. Each mutational signature is intended to be linked to a different mutational process. For example, specific signatures have been linked to smoking, homologous recombination deficiency, mutagenic activities of the APOBEC enzyme family, and spontaneous or enzymatic deamination of 5-methylcytosine to thymine, among many others. In addition to mutational signatures, computational methods infer the so-called signature exposures, which measure the number of mutations attributed to each signature in a given cancer genome. This is done jointly with the de novo inference of the mutational signatures using the methods listed above or by dedicated approaches that utilize previously inferred mutational signatures [32–34]. With these concepts at hand, researchers are investigating whether some cancer driver mutations can be caused by specific mutagenic processes and, vice versa, whether cancer driver mutations can drive such mutagenic processes.
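The triplet-based mutation categories described above can be sketched in a few lines. The example assumes the widely used convention of reporting each substitution relative to the pyrimidine (C or T) strand, which yields 96 categories; the function names and the label format `T[C>T]A` are illustrative choices, not the API of any particular signature tool.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def mutation_category(trinucleotide, alt):
    """Label a substitution by its trinucleotide context, e.g. 'T[C>T]A'.

    `trinucleotide` is the reference triplet with the mutated base in the
    middle; `alt` is the mutant base. If the reference base is a purine,
    the mutation is mapped to the opposite (pyrimidine) strand.
    """
    tri, alt = trinucleotide.upper(), alt.upper()
    if len(tri) != 3 or alt not in COMPLEMENT or alt == tri[1]:
        raise ValueError("expected a reference triplet and a different mutant base")
    if tri[1] in "AG":
        tri = reverse_complement(tri)
        alt = COMPLEMENT[alt]
    return f"{tri[0]}[{tri[1]}>{alt}]{tri[2]}"


def mutation_catalog(mutations):
    """Count mutation categories for an iterable of (trinucleotide, alt) pairs."""
    return Counter(mutation_category(tri, alt) for tri, alt in mutations)


# Hypothetical mutations from one tumor sample.
catalog = mutation_catalog([("TCA", "T"), ("TGA", "A"), ("ACG", "T")])
print(catalog)
```

Stacking such catalogs for many genomes gives the count matrix that factorization methods decompose into signatures and exposures.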

Figure 2.

Schematic representation of mutational signature and mutational motif approaches to predict driver mutations. Computational methods are used to identify mutational processes represented in the form of mutational signatures or mutational motifs in cancer genomes. By analyzing mutational signatures/motifs, driver mutations can be predicted.

Indeed, there are several known examples where alterations in specific genes have been causally linked to specific signatures using either computational or experimental arguments (POLE [35], MUTYH [36], ERCC2 [37], MSH6 [38], and FHIT [39]). Interestingly, different mutations in the same gene can lead to distinct genome-wide mutational signatures, as was shown for mutations in the POLE gene [35,40]. Leveraging a systems biology perspective, a recent study linked some mutational signatures to mutated subnetworks [41]. Specifically, the authors adapted a new computational method, NETPHIX, which uses an integer linear programming technique to infer causal relations between mutations in specific gene networks and a continuous phenotype (here signature exposure) [42]. This study proposed, for example, that mutations in the gene subnetwork consisting of CDH10, CDH1, PIK3A contribute to the emergence of dispersed (as opposed to clustered) APOBEC mutations in breast cancer.

The causal relation between a cancer mutation and a mutagenic process can also go in the opposite direction. There are several examples where known driver mutations are proposed to be caused by specific mutagenic processes. A prominent example is a driver mutation in the PIK3CA gene caused by the APOBEC-induced mutagenesis observed across many cancers [43]. Recent computational studies performed large-scale statistical analyses to identify such cancer-related mutational hotspots that can be attributed to specific mutational signatures [44–46]. These methods utilize predefined mutational signatures and computationally derived exposures of these signatures across genomes to provide a statistical estimate of whether a given hotspot can be attributed to a given signature with high confidence. These studies linked several driver mutations to specific signatures. For example, consistent with previous studies [47], the KRAS G12C mutation (CCA > CAA) in lung adenocarcinoma has been linked to the smoking-related signatures, whereas a signature attributed to aflatoxin was associated with the TP53 R249S (GCC > GAC) mutation [46]. In addition, sun exposure in melanoma has been linked with the BRAF V600E mutation [48].

Another powerful and straightforward approach to analyze characteristic mutational patterns is to use mutational motifs. Mutational motifs represent a mutated nucleotide and its local DNA sequence where a number of flanking nucleotides and a location of a mutated site are varied (Figure 2). Historically, motifs have been derived from experimental studies and can be viewed as footprints of interactions between DNA sequence and mutagens, providing information about the underlying molecular mechanisms of mutations. A number of computational methods have been developed to extract, analyze and annotate mutational motifs from mutation data [22,49–51].
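A motif check of this kind can be sketched as follows. The example assumes a motif written in IUPAC nucleotide codes with the mutated base in the central position, and uses TCW (W = A or T), a motif commonly associated with APOBEC mutagenesis, as the default; the function names are hypothetical and the sketch does not reproduce any of the cited tools.

```python
# Subset of IUPAC nucleotide codes, sufficient for this illustration.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT", "S": "CG", "R": "AG", "Y": "CT", "N": "ACGT",
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def matches(seq, motif):
    """True if every base of `seq` is allowed by the IUPAC code at that position."""
    return len(seq) == len(motif) and all(b in IUPAC[m] for b, m in zip(seq, motif))


def in_motif(context, motif="TCW"):
    """Check a mutated site against a motif on either DNA strand.

    `context` is the reference sequence around the mutation, of the same odd
    length as `motif`, with the mutated base in the middle, so that the
    mutated position stays central after reverse complementation.
    """
    context = context.upper()
    if len(motif) % 2 == 0:
        raise ValueError("this sketch assumes an odd-length, centered motif")
    return matches(context, motif) or matches(reverse_complement(context), motif)


for ctx in ("TCA", "TGA", "ACG"):
    print(ctx, in_motif(ctx))
```

Counting how often mutated sites fall in a motif, relative to how often the motif occurs in the sequence overall, is one simple way to ask whether a mutagenic process has left a footprint in a sample.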

Computational methods to predict driver mutations

Many computational methods have been developed that look for an excess of the number of mutations per genomic site or per genomic region relative to the background mutation rate as an indicator of positive selection [5,9,12,52]. Several studies employed a probabilistic statistical framework with parameters pertaining to the number of observed mutations, the number of patient samples, the background mutation rate model, dN/dS, the ratio between the number of germline and somatic mutations, and others (Figure 3) [5,14,53–56]. For example, the MutaGene method first builds DNA context-dependent mutational profiles and uses these profiles as background somatic mutational models. Mutational profiles are cancer type specific and are constructed after removing the bias from mutational hotspots with recurring mutations [12]. Next, the MutaGene algorithm calculates the probability that a certain type of mutation (or amino acid substitution) would occur by chance at a given site more frequently than the number of times it is actually observed in a given cohort of patients. Because of the limited statistical power for single nucleotide mutations, the inferred mutation rate from genomic regions with similar covariate values can be used [57]. The above-mentioned methods have relatively high accuracy (see the next section), are intuitive, do not employ many parameters, and do not require explicit training on driver and passenger mutation sets. However, these methods rely on mutation occurrence statistics, and hotspot mutations occurring in highly mutable sites can still be neutral, whereas rare mutations can be drivers. Therefore, additional confounding factors should be explored in such cases.
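The comparison of observed counts with the expected background can be sketched with a binomial model. This is a simplified illustration, not the MutaGene implementation: it assumes that each of the sequenced samples independently acquires the mutation with the same per-sample background probability, and the function name and the numbers in the example are hypothetical.

```python
from math import comb


def binomial_tail(k_observed, n_samples, p_background):
    """P(X >= k_observed) for X ~ Binomial(n_samples, p_background).

    A small value means that seeing the mutation this often is unlikely
    under background mutability alone, which is suggestive of selection.
    The upper tail is summed directly to keep precision for tiny values.
    """
    if not 0.0 <= p_background <= 1.0:
        raise ValueError("p_background must be a probability")
    q = 1.0 - p_background
    return sum(
        comb(n_samples, i) * p_background**i * q ** (n_samples - i)
        for i in range(k_observed, n_samples + 1)
    )


# Hypothetical site: mutated in 6 of 500 tumors, with a background
# probability of 1e-4 per sample for this substitution in this context.
p_value = binomial_tail(6, 500, 1e-4)
print(f"P(at least 6 mutations by chance) = {p_value:.3e}")
```

Ranking sites by such a probability, rather than by raw frequency, is what allows a recurrent mutation at a highly mutable site to be down-weighted.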

Figure 3.

Probabilistic statistical framework to predict driver mutations. To identify a status of mutation, these methods compare observed mutational frequencies in a given cohort of patients to the expected background mutability. Afterwards, DNA mutations (or amino acid substitutions) are ranked according to their scores.

Another approach to identify driver mutations is to decipher their phenotypic, functional effects, with the main idea that mutations affecting functionally important regions would be more likely to drive cancerogenesis (Figure 4) [58]. Indeed, evidence comes from the disproportionate accumulation of cancer mutations in certain functional regions where functionally important sites may be located in close proximity to each other either on DNA or on protein structures [59]. Therefore, these methods look for individual or clustered mutations affecting functional sites [53,60–62] and overall have high precision but low recall [63]. Other methods try to identify whether mutations affect macromolecular interactions, including short-range or long-range interactions, or have potential allosteric effects [64–67]. It should be mentioned that loss-of-function driver mutations are relatively easier to predict than gain-of-function mutations, as there are many more ways to disrupt a function than to create one. Generally, methods deciphering functional effects of mutations heavily rely on evolutionary conservation estimates and therefore on the quality of multiple sequence alignments. Consequently, it is challenging to predict the effects of mutations on sites with low conservation or for those protein families which do not exhibit much variability in evolutionary conservation [68].

Figure 4.

Deciphering phenotypic and functional effects of mutations to identify their driver status. A: Effects of driver missense mutations on protein stability and binding. Driver mutations might have higher destabilizing (or in some cases over-stabilizing) effects compared to other mutations in a protein. B: Distribution of functional impact scores. Driver mutations might have a higher impact on function compared to other mutations in a protein. Red bars depict driver mutations. C: Loss-of-function driver mutations disrupt the function of a protein (e.g., altering a protein binding site), whereas gain-of-function driver mutations may lead to the increased functional activity of a protein or confer a new function (e.g., gaining a novel post-translational modification (PTM) site).

There are also machine learning methods that specifically aim at predicting cancer driver mutations. They differ from each other by the sets of positive and negative examples used for training, the sets of features, and types of algorithms (Figure 5). The most popular features are evolutionary conservation, physico-chemical properties of amino acids and their functional impacts, and genomic, epigenomic and structural contexts of mutated sites. The most common algorithms include Support Vector Machines (SVM) [67,68], Random Forests [69–72], Naïve Bayes classifiers [73,74], and Hidden Markov models [75]. For example, the CADD method [69] predicts deleterious mutations employing an SVM algorithm and incorporates a wide range of features, including evolutionary conservation, predicted effects on protein function, epigenetic annotations and genomic context. Similarly, another method, CanDrA [70], also uses SVM and takes into account a large set of (epi)genomic features as well as features computed by functional prediction algorithms. Although both methods employ an SVM algorithm, CADD focuses on predicting deleterious mutations, while CanDrA specifically identifies driver cancer mutations. The use of the Random Forest algorithm is widespread across tools that aim at predicting driver mutations [71,72]. For instance, the DEOGEN2 and CHASMPlus methods both use Random Forest but differ from each other in the features they use. The set of features in DEOGEN2 is larger than in CHASMPlus and includes pathway-related features, whereas CHASMPlus provides cancer type specific predictions, which allows it to perform better than DEOGEN2 in some cases, as can be seen in the next section. Apart from supervised learning techniques, some algorithms employ unsupervised learning without predefined labels. In this case labels can be learned by assuming that driver mutations are more uniformly distributed among different samples than passengers, and the obtained labels should be consistent with the functional impact score descriptors [73,74].

Figure 5.

Supervised machine learning workflow for driver mutation identification involves several steps. First, various features are selected and preprocessed from the input data. Next, a suitable algorithm is chosen to train the prediction model on the preprocessed labelled input data using features of interest. After the training process, the algorithm can make predictions of driver mutations on new data. Finally, the results are validated and interpreted to elucidate the status of mutations.

Several attempts have been made to apply a sub-class of machine learning methods, deep learning neural network approaches, to the problem of driver mutation annotation [75–79]. In one study a convolutional neural network (CNN) has been used to annotate the pathogenicity of mutations [79], whereas in another study [76] a novel deep learning architecture has been designed to distinguish cancer from non-cancer mutations. A recent method, Dig, combines the probabilistic approach with a deep learning framework to identify cancer drivers by estimating the background somatic mutation rates on both kilobase and single nucleotide scales and then by testing for signs of positive selection [75]. However, despite their novelty, most deep learning methods focus on predicting general pathogenic variants and not cancer driver mutations. One of the reasons is the limited amount of labelled training data available for driver and passenger cancer mutations.

Constructing experimental data sets and evaluating the performance of computational methods

Currently, there is no way to directly assess the tumorigenic effects of individual mutations in cancer patients. Therefore, various indirect experimental validation protocols have been proposed. Driver mutations in cancer can be inferred experimentally by inducing xenograft tumor formation, by monitoring drug sensitivity or by observing changes in gene expression profiles [80–82]. Previously, the transformative potential of specific alleles in mice was estimated on a large scale, where 474 cancer-mutated alleles were expressed in HA1E-M cells and then used in an in vivo pooled strategy [81]. Most recently, functional effects of cancer mutations were assessed on the proliferation of pre-cancerous non-tumorigenic bronchial epithelial cells using cytosine and adenine base editors [83,84], whereas another study tested 1049 mutations in two growth factor-dependent cell models, Ba/F3 and MCF10A, and mutations were identified as drivers if the mutant cell viability was higher than that of the wild type [84]. These data sets have been used for training and validation of computational methods.

Below we summarize several studies attempting a comprehensive, large-scale comparison of cancer driver mutation predictors. In one of the comparative studies, an integrative approach combining methods that predict driver genes with sequence-based methods distinguishing driver from passenger mutations (CTAT-cancer) was shown to outperform individual methods [84]. In another comparison on in vivo and in vitro tumour formation assays, PROVEAN [85] had the highest accuracy (area under the receiver operating characteristic curve, AUC = 0.72), followed by PrimateAI [86], DEOGEN2 [72] and CHASM [61,71]. Since the training and test sets are usually unbalanced with respect to the number of positives and negatives, some comparison studies calculated a balanced measure, the Matthews correlation coefficient (MCC). For example, the best classifiers based on this evaluation metric and comprehensive test sets were found to be CHASMplus, MutaGene and CanDrAplus (MCC = 0.64, 0.61, and 0.58 respectively) [12]. Importantly, it was shown that the probabilistic score with the correction by background mutability led to a much better performance than the frequency of mutated alleles in cancer patients, a metric widely used in clinical research to prioritize mutations with respect to their relevance. It should be mentioned that driver mutations are specific to cancer type and tissue, but very few computational methods provide cancer-type specific driver mutation predictions. Cancer-specific predictions were compared lately, and TVA and MutaGene were found to outperform other methods [52].
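Why a balanced measure matters for unbalanced test sets can be seen from a short sketch of the Matthews correlation coefficient computed from a confusion matrix. The function name and the counts in the example are hypothetical; returning 0 when the denominator vanishes is a common convention rather than part of the definition.

```python
from math import sqrt


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/fn: drivers predicted as drivers/passengers;
    tn/fp: passengers predicted as passengers/drivers.
    """
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


# Hypothetical test set with 5 drivers and 95 passengers, scored by a
# trivial predictor that labels every mutation as a passenger.
tp, fp, tn, fn = 0, 0, 95, 5
accuracy = (tp + tn) / (tp + fp + tn + fn)
print(f"accuracy = {accuracy:.2f}, MCC = {matthews_corrcoef(tp, fp, tn, fn):.2f}")
```

The trivial predictor reaches 95% accuracy while its MCC is zero, which is why accuracy alone can be misleading when passengers vastly outnumber drivers.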

Driver mutations as cancer biomarkers

The connection between a person’s genetic background and the response to treatment was established a long time ago. Nowadays this approach is heavily based on genetic testing, but only a small fraction of patients harbouring potential driver mutations (biomarkers) are enrolled in genotype-matched trials [87]. Prognostic biomarkers are used to derive the correct diagnosis, whereas pharmacogenomic biomarkers may predict the response to a drug based on a specific genomic makeup. Many mutation biomarkers have been identified so far. In lung adenocarcinoma patients, multiplexed assays showed that 25%, 17% and 8% of patients had KRAS, EGFR and ALK alterations respectively, and patients with driver mutations had 2.4 years median survival compared to 2.1 years for patients without biomarkers [88]. Similarly, patients harboring certain driver mutations in the DNA polymerase epsilon gene had an excellent prognosis and responded well to immunotherapy, allowing the use of this biomarker in pre-treatment triage [89]. Not only individual mutations can serve as clinical biomarkers: most recently, clustered mutations in the TP53, EGFR and BRAF genes have been found to be associated with overall survival [90].

The most recent evidence supports the view of mutations in epigenetic signalling machinery, including histones and chromatin remodelers, as potential new epigenetic biomarkers in cancer [91]. For example, children diagnosed with diffuse midline glioma who also carry K27M mutations in histone H3 have an overall survival of less than a year [92], compared to more than four years for patients with the wild type [93]. Based on these findings, the World Health Organization defined a new type of highly aggressive tumor and included H3K27M mutations as biomarkers of diffuse midline glioma. Mutations in other players of the epigenetic machinery, the chromatin-remodeling genes ARID1A and ARID1B, were also shown to be associated with treatment failure and decreased survival [94].

In addition, mutational signatures can be used as biomarkers and predictors of drug response [95]. Since many mutational signatures are caused by a malfunctioning DNA repair mechanism, they can be used as biomarkers for drugs which target a complementary DNA repair process by means of synthetic lethality [96]. APOBEC signatures, on the other hand, are predictive of a response to immunotherapy in several cancers [97,98]. This is attributed, in part, to high tumor mutation burden, which is known to be predictive of the response to immunotherapy [99], and, in part, to immune-related APOBEC3B upregulation [98].

Liquid biopsy is a new diagnostic tool which identifies tumor-related material, including circulating tumor DNA (ctDNA), proteins and antibodies, in blood. It has certain advantages over conventional tissue biopsy: it is non-invasive and permits more comprehensive characterization of tumor heterogeneity and clonal evolution during treatment. The development of ctDNA techniques has enabled the detection of new mutational and epigenetic cancer biomarkers in ctDNA. For example, the CancerSEEK protocol, which relies on levels of circulating proteins and mutations in ctDNA, reported a 70% correct detection rate for eight types of cancers [100]. A recent TARGET study employed ctDNA technology and identified cancer mutations in 41% of patients [101], whereas EGFR-sensitizing mutations were observed in 11% of patients [102]. ctDNA technologies reported from 25% up to 65% increase in novel biomarker detection compared to conventional sequencing studies [103,104], and these novel biomarkers were associated with short survival and included subclonal drivers of drug resistance [104]. Despite its promising clinical applications, liquid biopsy profiling using sequencing of ctDNA still suffers from limited sensitivity and specificity because of the low content of ctDNA in some patients [105] and false positives coming from mutations in hematopoietic cells. Computational approaches have been proposed to mitigate this issue and to distinguish somatic mutations explained by clonal hematopoiesis from the tumor-derived ones [106].

Boundaries of applicability

As shown in a previous section, predictions of driver mutations are critical as they pave the road for the subsequent characterization of the cancerogenic mechanisms and clinical applications. Here we would like to highlight the limitations of the methods and their range of applicability, with the ultimate goal of accelerating their improvement. A recent comparison of driver mutation predictors pointed out that the majority of methods actually learn to discriminate driver genes from non-driver genes, which is an easier problem than classifying individual variants [107]. Namely, the authors showed that if negative cases (passenger variants), which are usually derived from non-driver genes, were instead taken from driver genes, a drop of about 28% in prediction performance was observed. Subsequently, this study proposed to weight genes according to the amount of conditional entropy needed to describe the driver and passenger statuses of variants, given the driver gene label where these variants were found.
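The conditional entropy argument can be made concrete with a small sketch. It is not the weighting scheme of [107]; it only computes the entropy of the variant status given the gene label for a hypothetical benchmark, and the function name, labels and counts are illustrative assumptions.

```python
from collections import Counter
from math import log2


def conditional_entropy(pairs):
    """H(status | gene_label) in bits for (gene_label, status) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one labelled variant")
    n = len(pairs)
    joint = Counter(pairs)                        # counts of (gene_label, status)
    per_gene = Counter(gene for gene, _ in pairs)  # counts of gene_label alone
    entropy = 0.0
    for (gene, _status), count in joint.items():
        entropy -= (count / n) * log2(count / per_gene[gene])
    return entropy


# Benchmark A: all passengers come from non-driver genes, so the gene label
# alone determines the variant status.
benchmark_a = [("driver_gene", "driver")] * 10 + [("other_gene", "passenger")] * 10
# Benchmark B: passengers are drawn from the same driver genes as the drivers.
benchmark_b = [("driver_gene", "driver")] * 10 + [("driver_gene", "passenger")] * 10

print(conditional_entropy(benchmark_a))
print(conditional_entropy(benchmark_b))
```

On the first benchmark nothing remains to be learned once the gene is known, so a gene-level classifier would appear to perform perfectly, whereas the second benchmark forces a method to separate variants within the same gene.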

Another obstacle to the development of new methods and their fair comparison is the lack of an objective gold-standard benchmark in cancer research. Supervised learning methods rely on positive and negative sets (drivers and passengers), which are not clearly defined. In fact, different methods very often use non-congruent definitions of drivers and passengers. Passengers are defined either as single nucleotide polymorphisms (SNPs) or as deleterious germline variants, or can be sampled randomly. Moreover, many experimental data sets report variants with pronounced functionally disruptive effects and have a shortage of experimentally verified negative cases (with no effect). Databases such as ClinVar [108] and OncoKB [109] comprise many driver variants identified experimentally but also contain predicted drivers. This, in turn, may lead to circularity and error propagation.

As the number of pathogenic mutation predictors is large and their training is relatively easy to perform, cancer driver mutation predictors are very often inadequately compared with pathogenicity predictors. In addition, many machine learning methods suffer from overfitting and a lack of interpretability, and this issue also pertains to cancer driver predictors. Proposed models can be too complex and fit the noise in the data rather than describing the real pattern. Indeed, taking into account a large number of confounding factors and parameters can result in small training and validation errors, but can have nothing to do with actual biological mechanisms. Finally, many predictions are not cancer or patient-specific and have limited clinical significance in this regard.

Conclusion

Identification and interpretation of cancer driver mutations is one of the most important problems in cancer biology. Large-scale cancer sequencing projects have enabled the development of many computational methods addressing this challenge. Yet, despite significant progress, many open questions remain, and new challenges continue to emerge. First, although some cancer driver mutations can be detected as mutational hotspots, most potential cancer driver mutations are found at relatively low frequencies and are thus difficult to detect by computational approaches [110]. To boost the power of computational methods, additional sequencing efforts are needed. In addition, more biologically realistic models of the background mutation rate and models leveraging the concept of mutational signatures would likely lead to more confident predictions of cancer driver mutations.
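
Why low-frequency drivers are hard to detect, and why larger cohorts help, can be seen in a minimal recurrence test. The sketch assumes a single pre-specified site, a constant per-sample background mutation probability, and a binomial model. None of these reflects a realistic background model, and the numerical values are hypothetical.

```python
import math


def log_binom_pmf(i, n, p):
    """log P(X = i) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def recurrence_pvalue(k_observed, n_samples, background_rate):
    """P(X >= k_observed) if the site mutated only at the background rate."""
    return sum(math.exp(log_binom_pmf(i, n_samples, background_rate))
               for i in range(k_observed, n_samples + 1))


# Hypothetical per-sample probability that this site is mutated by chance.
BACKGROUND_RATE = 1e-4

# The same site, mutated in 0.2% of patients, seen in two cohort sizes.
p_small = recurrence_pvalue(1, 500, BACKGROUND_RATE)    # 1 of 500 samples
p_large = recurrence_pvalue(10, 5000, BACKGROUND_RATE)  # 10 of 5000 samples
```

Under these assumptions the same 0.2% mutation frequency gives a p-value of roughly 0.05 in 500 samples but one on the order of 1e-10 in 5000 samples. A genome-wide scan would also need a multiple-testing correction and a site-specific background rate, both omitted here.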

Next, it is appreciated that cancer driver mutations do not act in isolation. While many studies have considered dependencies between cancer driver genes, including their mutual exclusivity, co-occurrence, and functional dependencies (e.g. [111]), these studies typically focused on gene-level analyses. Larger sets of mutational data will allow these methods to be extended to mutation-level analyses. In addition, whether a given mutation contributes to cancer initiation and progression depends on the genetic background of the patient, the tissue of tumor origin, and the tumor's evolutionary history. Uncovering such dependencies remains challenging, especially in the context of rarely mutated genes, and requires sequencing of more tumor samples across diverse populations and tumor stages. Cancer driver mutations are also uncovered by functional analyses. Unfortunately, many putative cancer drivers fall into functionally poorly characterized regions [16]. While new machine learning approaches provide promising tools for such functional studies, these methods require large and unbiased training data. An additional challenge arises from the limited number of gold-standard sets for training and validation. However, as new data continue to emerge, machine learning methods are expected to become more powerful.
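
Mutual exclusivity and co-occurrence are often assessed with a simple overlap test, sketched below under stated assumptions. It is not the method of [111]: it is a plain hypergeometric test on the number of samples carrying both alterations, it assumes independent samples, and it ignores differences in mutation burden between samples. The function names and counts are hypothetical.

```python
from math import comb


def overlap_pmf(x, n_samples, n_a, n_b):
    """P(exactly x samples carry both alterations) under independence."""
    return comb(n_a, x) * comb(n_samples - n_a, n_b - x) / comb(n_samples, n_b)


def overlap_tests(n_both, n_samples, n_a, n_b):
    """Return (p_exclusive, p_cooccur) for an observed overlap n_both.

    p_exclusive = P(overlap <= n_both), small when alterations avoid each other.
    p_cooccur   = P(overlap >= n_both), small when alterations co-occur.
    """
    lo = max(0, n_a + n_b - n_samples)  # smallest possible overlap
    hi = min(n_a, n_b)                  # largest possible overlap
    pmf = {x: overlap_pmf(x, n_samples, n_a, n_b) for x in range(lo, hi + 1)}
    p_exclusive = sum(p for x, p in pmf.items() if x <= n_both)
    p_cooccur = sum(p for x, p in pmf.items() if x >= n_both)
    return p_exclusive, p_cooccur


# Hypothetical cohort: 100 samples, alterations A and B each present in 30,
# but only 2 samples carry both (about 9 would be expected by chance).
p_exclusive, p_cooccur = overlap_tests(2, 100, 30, 30)
```

Here `p_exclusive` is well below 0.01, which suggests mutual exclusivity. The same test applies whether the alteration is a mutated gene or a specific mutation, but counts for individual mutations are much smaller, which is one reason mutation-level analyses need larger data sets.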

Finally, in addition to functional validation, cancer-driving properties can be evaluated in terms of selection during tumor evolution. Understanding tumor evolution requires computational analysis of intratumor heterogeneity. Recent years have brought rapid progress in single-cell sequencing and other single-cell-level experiments. These new experimental techniques are increasingly leveraged to study tumor evolution [112] and are likely to provide additional tools to identify and study cancer driver mutations.

Acknowledgements

DO and ARP were supported by the Department of Pathology and Molecular Medicine, Queen’s University, Canada. ARP is the recipient of a Senior Canada Research Chair in Computational Biology and Biophysics and a Senior Investigator Award from the Ontario Institute for Cancer Research, Canada. ARP acknowledges the support of the Natural Sciences and Engineering Research Council of Canada (NSERC) (No. RGPIN/02972-2021 ARP). TMP was supported by the Intramural Research Program of the National Library of Medicine, NIH.

References

  • 1. Lynch M (2010) Rate, molecular spectrum, and consequences of human mutation. Proc Natl Acad Sci U S A 107, 961–968. doi: 10.1073/pnas.0912629107
  • 2. Armitage P and Doll R (1954) The age distribution of cancer and a multi-stage theory of carcinogenesis. Br J Cancer 8, 1–12. doi: 10.1038/bjc.1954.1
  • 3. Greenman C et al. (2007) Patterns of somatic mutation in human cancer genomes. Nature 446, 153–158. doi: 10.1038/nature05610
  • 4. Tomasetti C et al. (2015) Only three driver gene mutations are required for the development of lung and colorectal cancers. Proc Natl Acad Sci U S A 112, 118–123. doi: 10.1073/pnas.1421839112
  • 5. Martincorena I et al. (2017) Universal Patterns of Selection in Cancer and Somatic Tissues. Cell 171, 1029–1041 e1021. doi: 10.1016/j.cell.2017.09.042
  • 6. Nussinov R and Tsai CJ (2015) ‘Latent drivers’ expand the cancer mutational landscape. Curr Opin Struct Biol 32, 25–32. doi: 10.1016/j.sbi.2015.01.004
  • 7. Saito Y et al. (2020) Landscape and function of multiple mutations within individual oncogenes. Nature 582, 95–99. doi: 10.1038/s41586-020-2175-2
  • 8. Vasan N et al. (2019) Double PIK3CA mutations in cis increase oncogenicity and sensitivity to PI3Kalpha inhibitors. Science 366, 714–723. doi: 10.1126/science.aaw9032
  • 9. Rheinbay E et al. (2020) Analyses of non-coding somatic drivers in 2,658 cancer whole genomes. Nature 578, 102–111. doi: 10.1038/s41586-020-1965-x
  • 10. Gerstung M et al. (2020) The evolutionary history of 2,658 cancers. Nature 578, 122–128. doi: 10.1038/s41586-019-1907-7
  • 11. Tokheim CJ et al. (2016) Evaluating the evaluation of cancer driver genes. Proc Natl Acad Sci U S A 113, 14330–14335. doi: 10.1073/pnas.1616440113
  • 12. Brown AL et al. (2019) Finding driver mutations in cancer: Elucidating the role of background mutational processes. PLoS Comput Biol 15, e1006981. doi: 10.1371/journal.pcbi.1006981
  • 13. Williams MJ et al. (2020) Measuring the distribution of fitness effects in somatic evolution by combining clonal dynamics with dN/dS ratios. Elife 9. doi: 10.7554/eLife.48714
  • 14. Zapata L et al. (2018) Negative selection in tumor genome evolution acts on essential cellular functions and the immunopeptidome. Genome Biol 19, 67. doi: 10.1186/s13059-018-1434-0
  • 15. Polak P et al. (2015) Cell-of-origin chromatin organization shapes the mutational landscape of cancer. Nature 518, 360–364. doi: 10.1038/nature14221
  • 16. Shuai S et al. (2020) Combined burden and functional impact tests for cancer driver discovery using DriverPower. Nat Commun 11, 734. doi: 10.1038/s41467-019-13929-1
  • 17. Barlough JE et al. (1986) Antibodies to marine caliciviruses in the Pacific walrus (Odobenus rosmarus divergens Illiger). J Wildl Dis 22, 165–168. doi: 10.7589/0090-3558-22.2.165
  • 18. Aggarwala V and Voight BF (2016) An expanded sequence context model broadly explains variability in polymorphism levels across the human genome. Nat Genet 48, 349–355. doi: 10.1038/ng.3511
  • 19. Georgakopoulos-Soares I et al. (2018) Noncanonical secondary structures arising from non-B DNA motifs are determinants of mutagenesis. Genome Res 28, 1264–1271. doi: 10.1101/gr.231688.117
  • 20. Buisson R et al. (2019) Passenger hotspot mutations in cancer driven by APOBEC3A and mesoscale genomic features. Science 364. doi: 10.1126/science.aaw2872
  • 21. Du X et al. (2014) Potential non-B DNA regions in the human genome are associated with higher rates of nucleotide mutation and expression variation. Nucleic Acids Res 42, 12367–12379. doi: 10.1093/nar/gku921
  • 22. Rogozin IB et al. (2018) Mutational signatures and mutable motifs in cancer genomes. Brief Bioinform 19, 1085–1101. doi: 10.1093/bib/bbx049
  • 23. Kim YA et al. (2021) Mutational Signatures: From Methods to Mechanisms. Annu Rev Biomed Data Sci 4, 189–206. doi: 10.1146/annurev-biodatasci-122320-120920
  • 24. Alexandrov LB et al. (2013) Deciphering signatures of mutational processes operative in human cancer. Cell Rep 3, 246–259. doi: 10.1016/j.celrep.2012.12.008
  • 25. Nik-Zainal S et al. (2012) Mutational processes molding the genomes of 21 breast cancers. Cell 149, 979–993. doi: 10.1016/j.cell.2012.04.024
  • 26. Goncearenco A et al. (2017) Exploring background mutational processes to decipher cancer genetic heterogeneity. Nucleic Acids Res 45, W514–W522. doi: 10.1093/nar/gkx367
  • 27. Matsutani T et al. (2019) Discovering novel mutation signatures by latent Dirichlet allocation with variational Bayes inference. Bioinformatics 35, 4543–4552. doi: 10.1093/bioinformatics/btz266
  • 28. Robinson W et al. (2019) Modeling clinical and molecular covariates of mutational process activity in cancer. Bioinformatics 35, i492–i500. doi: 10.1093/bioinformatics/btz340
  • 29. Funnell T et al. (2019) Integrated structural variation and point mutation signatures in cancer genomes using correlated topic models. PLoS Comput Biol 15, e1006799. doi: 10.1371/journal.pcbi.1006799
  • 30. Wojtowicz D et al. (2021) RepairSig: Deconvolution of DNA damage and repair contributions to the mutational landscape of cancer. Cell Syst 12, 994–1003 e1004. doi: 10.1016/j.cels.2021.07.004
  • 31. Tate JG et al. (2019) COSMIC: the Catalogue Of Somatic Mutations In Cancer. Nucleic Acids Res 47, D941–D947. doi: 10.1093/nar/gky1015
  • 32. Rosenthal R et al. (2016) DeconstructSigs: delineating mutational processes in single tumors distinguishes DNA repair deficiencies and patterns of carcinoma evolution. Genome Biol 17, 31. doi: 10.1186/s13059-016-0893-4
  • 33. Huang X et al. (2018) Detecting presence of mutational signatures in cancer with confidence. Bioinformatics 34, 330–337. doi: 10.1093/bioinformatics/btx604
  • 34. Li S et al. (2020) Using sigLASSO to optimize cancer mutation signatures jointly with sampling likelihood. Nat Commun 11, 3575. doi: 10.1038/s41467-020-17388-x
  • 35. Fang H et al. (2020) Mutational processes of distinct POLE exonuclease domain mutants drive an enrichment of a specific TP53 mutation in colorectal cancer. PLoS Genet 16, e1008572. doi: 10.1371/journal.pgen.1008572
  • 36. Viel A et al. (2017) A Specific Mutational Signature Associated with DNA 8-Oxoguanine Persistence in MUTYH-defective Colorectal Cancer. EBioMedicine 20, 39–49. doi: 10.1016/j.ebiom.2017.04.022
  • 37. Kim J et al. (2016) Somatic ERCC2 mutations are associated with a distinct genomic signature in urothelial tumors. Nat Genet 48, 600–606. doi: 10.1038/ng.3557
  • 38. Zou X et al. (2018) Validating the concept of mutational signatures with isogenic cell models. Nat Commun 9, 1744. doi: 10.1038/s41467-018-04052-8
  • 39. Volinia S et al. (2017) The ubiquitous ‘cancer mutational signature’ 5 occurs specifically in cancers with deleted FHIT alleles. Oncotarget 8, 102199–102211. doi: 10.18632/oncotarget.22321
  • 40. Hodel KP et al. (2020) POLE Mutation Spectra Are Shaped by the Mutant Allele Identity, Its Abundance, and Mismatch Repair Status. Mol Cell 78, 1166–1177 e1166. doi: 10.1016/j.molcel.2020.05.012
  • 41. Kim YA et al. (2020) Network-based approaches elucidate differences within APOBEC and clock-like signatures in breast cancer. Genome Med 12, 52. doi: 10.1186/s13073-020-00745-2
  • 42. Kim YA et al. (2020) Identifying Drug Sensitivity Subnetworks with NETPHIX. iScience 23, 101619. doi: 10.1016/j.isci.2020.101619
  • 43. McGranahan N et al. (2015) Clonal status of actionable driver events and the timing of mutational processes in cancer evolution. Sci Transl Med 7, 283ra254. doi: 10.1126/scitranslmed.aaa1408
  • 44. Wong JKL et al. (2022) Association of mutation signature effectuating processes with mutation hotspots in driver genes and non-coding regions. Nat Commun 13, 178. doi: 10.1038/s41467-021-27792-6
  • 45. Poulos RC et al. (2018) Analysis of 7,815 cancer exomes reveals associations between mutational processes and somatic driver mutations. PLoS Genet 14, e1007779. doi: 10.1371/journal.pgen.1007779
  • 46. Temko D et al. (2018) The effects of mutational processes and selection on driver mutations across cancer types. Nat Commun 9, 1857. doi: 10.1038/s41467-018-04208-6
  • 47. Riely GJ et al. (2008) Frequency and distinctive spectrum of KRAS mutations in never smokers with lung adenocarcinoma. Clin Cancer Res 14, 5731–5734. doi: 10.1158/1078-0432.CCR-08-0646
  • 48. Greaves WO et al. (2013) Frequency and spectrum of BRAF mutations in a retrospective, single-institution study of 1112 cases of melanoma. J Mol Diagn 15, 220–226. doi: 10.1016/j.jmoldx.2012.10.002
  • 49. Rogozin IB and Pavlov YI (2003) Theoretical analysis of mutation hotspots and their DNA sequence context specificity. Mutat Res 544, 65–85. doi: 10.1016/s1383-5742(03)00032-2
  • 50. Stormo GD et al. (1986) Quantitative analysis of the relationship between nucleotide sequence and functional activity. Nucleic Acids Res 14, 6661–6679. doi: 10.1093/nar/14.16.6661
  • 51. Pham P et al. (2003) Processive AID-catalysed cytosine deamination on single-stranded DNA simulates somatic hypermutation. Nature 424, 103–107. doi: 10.1038/nature01760
  • 52. Landau J et al. (2023) Shared Cancer Dataset Analysis Identifies and Predicts the Quantitative Effects of Pan-Cancer Somatic Driver Variants. Cancer Res 83, 74–88. doi: 10.1158/0008-5472.CAN-22-1038
  • 53. Chang MT et al. (2016) Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity. Nat Biotechnol 34, 155–163. doi: 10.1038/nbt.3391
  • 54. Lochovsky L et al. (2015) LARVA: an integrative framework for large-scale analysis of recurrent variants in noncoding annotations. Nucleic Acids Res 43, 8123–8134. doi: 10.1093/nar/gkv803
  • 55. Vitsios D et al. (2022) Cancer-driving mutations are enriched in genic regions intolerant to germline variation. Sci Adv 8, eabo6371. doi: 10.1126/sciadv.abo6371
  • 56. Dietlein F et al. (2020) Identification of cancer driver genes based on nucleotide context. Nat Genet 52, 208–218. doi: 10.1038/s41588-019-0572-y
  • 57. Lawrence MS et al. (2013) Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218. doi: 10.1038/nature12213
  • 58. Li M et al. (2017) Annotating Mutational Effects on Proteins and Protein Interactions: Designing Novel and Revisiting Existing Protocols. Methods Mol Biol 1550, 235–260. doi: 10.1007/978-1-4939-6747-6_17
  • 59. Porta-Pardo E and Godzik A (2014) e-Driver: a novel method to identify protein regions driving cancer. Bioinformatics 30, 3109–3114. doi: 10.1093/bioinformatics/btu499
  • 60. Tamborero D et al. (2013) OncodriveCLUST: exploiting the positional clustering of somatic mutations to identify cancer genes. Bioinformatics 29, 2238–2244. doi: 10.1093/bioinformatics/btt395
  • 61. Chen H et al. (2020) Comprehensive assessment of computational algorithms in predicting cancer driver mutations. Genome Biol 21, 43. doi: 10.1186/s13059-020-01954-z
  • 62. Tamborero D et al. (2018) Cancer Genome Interpreter annotates the biological and clinical relevance of tumor alterations. Genome Med 10, 25. doi: 10.1186/s13073-018-0531-8
  • 63. Porta-Pardo E et al. (2017) Comparison of algorithms for the detection of cancer drivers at subgene resolution. Nat Methods 14, 782–788. doi: 10.1038/nmeth.4364
  • 64. Liu C et al. (2021) A network-based deep learning methodology for stratification of tumor mutations. Bioinformatics 37, 82–88. doi: 10.1093/bioinformatics/btaa1099
  • 65. Tan ZW et al. (2020) AlloSigMA 2: paving the way to designing allosteric effectors and to exploring allosteric effects of mutations. Nucleic Acids Res 48, W116–W124. doi: 10.1093/nar/gkaa338
  • 66. Nishi H et al. (2013) Cancer missense mutations alter binding properties of proteins and their interaction networks. PLoS One 8, e66273. doi: 10.1371/journal.pone.0066273
  • 67. Zhu H et al. (2020) Candidate Cancer Driver Mutations in Distal Regulatory Elements and Long-Range Chromatin Interaction Networks. Mol Cell 77, 1307–1321 e1310. doi: 10.1016/j.molcel.2019.12.027
  • 68. Li M et al. (2016) Balancing Protein Stability and Activity in Cancer: A New Approach for Identifying Driver Mutations Affecting CBL Ubiquitin Ligase Activation. Cancer Res 76, 561–571. doi: 10.1158/0008-5472.CAN-14-3812
  • 69. Rentzsch P et al. (2019) CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res 47, D886–D894. doi: 10.1093/nar/gky1016
  • 70. Mao Y et al. (2013) CanDrA: cancer-specific driver missense mutation annotation with optimized features. PLoS One 8, e77945. doi: 10.1371/journal.pone.0077945
  • 71. Carter H et al. (2009) Cancer-specific high-throughput annotation of somatic mutations: computational prediction of driver missense mutations. Cancer Res 69, 6660–6667. doi: 10.1158/0008-5472.CAN-09-1133
  • 72. Raimondi D et al. (2017) DEOGEN2: prediction and interactive visualization of single amino acid variant deleteriousness in human proteins. Nucleic Acids Res 45, W201–W206. doi: 10.1093/nar/gkx390
  • 73. Kumar RD et al. (2016) Unsupervised detection of cancer driver mutations with parsimony-guided learning. Nat Genet 48, 1288–1294. doi: 10.1038/ng.3658
  • 74. Bailey MH et al. (2018) Comprehensive Characterization of Cancer Driver Genes and Mutations. Cell 173, 371–385 e318. doi: 10.1016/j.cell.2018.02.060
  • 75. Sherman MA et al. (2022) Genome-wide mapping of somatic mutation rates uncovers drivers of cancer. Nat Biotechnol 40, 1634–1643. doi: 10.1038/s41587-022-01353-8
  • 76. Gupta P et al. (2022) A new deep learning technique reveals the exclusive functional contributions of individual cancer mutations. J Biol Chem 298, 102177. doi: 10.1016/j.jbc.2022.102177
  • 77. Luo P et al. (2019) deepDriver: Predicting Cancer Driver Genes Based on Somatic Mutations Using Deep Convolutional Neural Networks. Front Genet 10, 13. doi: 10.3389/fgene.2019.00013
  • 78. Zeng Z et al. (2021) Deep learning for cancer type classification and driver gene identification. BMC Bioinformatics 22, 491. doi: 10.1186/s12859-021-04400-4
  • 79. Quang D et al. (2015) DANN: a deep learning approach for annotating the pathogenicity of genetic variants. Bioinformatics 31, 761–763. doi: 10.1093/bioinformatics/btu703
  • 80. Berger AH et al. (2016) High-throughput Phenotyping of Lung Cancer Somatic Mutations. Cancer Cell 30, 214–228. doi: 10.1016/j.ccell.2016.06.022
  • 81. Kim E et al. (2016) Systematic Functional Interrogation of Rare Cancer Variants Identifies Oncogenic Alleles. Cancer Discov 6, 714–726. doi: 10.1158/2159-8290.CD-16-0160
  • 82. Kohsaka S et al. (2017) A method of high-throughput functional evaluation of EGFR gene variants of unknown significance in cancer. Sci Transl Med 9. doi: 10.1126/scitranslmed.aan6566
  • 83. Kim Y et al. (2022) High-throughput functional evaluation of human cancer-associated mutations using base editors. Nat Biotechnol 40, 874–884. doi: 10.1038/s41587-022-01276-4
  • 84. Bailey MH et al. (2018) Comprehensive Characterization of Cancer Driver Genes and Mutations. Cell 174, 1034–1035. doi: 10.1016/j.cell.2018.07.034
  • 85. Choi Y et al. (2012) Predicting the functional effect of amino acid substitutions and indels. PLoS One 7, e46688. doi: 10.1371/journal.pone.0046688
  • 86. Sundaram L et al. (2018) Predicting the clinical impact of human mutation with deep neural networks. Nat Genet 50, 1161–1170. doi: 10.1038/s41588-018-0167-z
  • 87. Meric-Bernstam F et al. (2015) Feasibility of Large-Scale Genomic Testing to Facilitate Enrollment Onto Genomically Matched Clinical Trials. J Clin Oncol 33, 2753–2762. doi: 10.1200/JCO.2014.60.4165
  • 88. Kris MG et al. (2014) Using multiplexed assays of oncogenic drivers in lung cancers to select targeted drugs. JAMA 311, 1998–2006. doi: 10.1001/jama.2014.3741
  • 89. Barbari SR and Shcherbakova PV (2017) Replicative DNA polymerase defects in human cancers: Consequences, mechanisms, and implications for therapy. DNA Repair (Amst) 56, 16–25. doi: 10.1016/j.dnarep.2017.06.003
  • 90. Bergstrom EN et al. (2022) Mapping clustered mutations in cancer reveals APOBEC3 mutagenesis of ecDNA. Nature 602, 510–517. doi: 10.1038/s41586-022-04398-6
  • 91. Espiritu D et al. (2021) Molecular Mechanisms of Oncogenesis through the Lens of Nucleosomes and Histones. J Phys Chem B 125, 3963–3976. doi: 10.1021/acs.jpcb.1c00694
  • 92. Wu G et al. (2012) Somatic histone H3 alterations in pediatric diffuse intrinsic pontine gliomas and non-brainstem glioblastomas. Nat Genet 44, 251–253. doi: 10.1038/ng.1102
  • 93. Khuong-Quang DA et al. (2012) K27M mutation in histone H3.3 defines clinically and biologically distinct subgroups of pediatric diffuse intrinsic pontine gliomas. Acta Neuropathol 124, 439–447. doi: 10.1007/s00401-012-0998-0
  • 94. Sausen M et al. (2013) Integrated genomic analyses identify ARID1A and ARID1B alterations in the childhood cancer neuroblastoma. Nat Genet 45, 12–17. doi: 10.1038/ng.2493
  • 95. Levatic J et al. (2022) Mutational signatures are markers of drug sensitivity of cancer cells. Nat Commun 13, 2926. doi: 10.1038/s41467-022-30582-3
  • 96. Gulhan DC et al. (2019) Detecting the mutational signature of homologous recombination deficiency in clinical samples. Nat Genet 51, 912–919. doi: 10.1038/s41588-019-0390-2
  • 97. Faden DL et al. (2019) APOBEC mutagenesis is tightly linked to the immune landscape and immunotherapy biomarkers in head and neck squamous cell carcinoma. Oral Oncol 96, 140–147. doi: 10.1016/j.oraloncology.2019.07.020
  • 98. Wang S et al. (2018) APOBEC3B and APOBEC mutational signature as potential predictive markers for immunotherapy response in non-small cell lung cancer. Oncogene 37, 3924–3936. doi: 10.1038/s41388-018-0245-9
  • 99. Chan TA et al. (2019) Development of tumor mutation burden as an immunotherapy biomarker: utility for the oncology clinic. Ann Oncol 30, 44–56. doi: 10.1093/annonc/mdy495
  • 100. Cohen JD et al. (2018) Detection and localization of surgically resectable cancers with a multi-analyte blood test. Science 359, 926–930. doi: 10.1126/science.aar3247
  • 101. Rothwell DG et al. (2019) Utility of ctDNA to support patient selection for early phase clinical trials: the TARGET study. Nat Med 25, 738–743. doi: 10.1038/s41591-019-0380-z
  • 102. Mayo-de-Las-Casas C et al. (2017) Large scale, prospective screening of EGFR mutations in the blood of advanced NSCLC patients to guide treatment decisions. Ann Oncol 28, 2248–2255. doi: 10.1093/annonc/mdx288
  • 103. Mack PC et al. (2020) Spectrum of driver mutations and clinical impact of circulating tumor DNA analysis in non-small cell lung cancer: Analysis of over 8000 cases. Cancer 126, 3219–3228. doi: 10.1002/cncr.32876
  • 104. Jee J et al. (2022) Overall survival with circulating tumor DNA-guided therapy in advanced non-small-cell lung cancer. Nat Med 28, 2353–2363. doi: 10.1038/s41591-022-02047-z
  • 105. Godsey JH et al. (2020) Generic Protocols for the Analytical Validation of Next-Generation Sequencing-Based ctDNA Assays: A Joint Consensus Recommendation of the BloodPAC’s Analytical Variables Working Group. Clin Chem 66, 1156–1166. doi: 10.1093/clinchem/hvaa164
  • 106. Razavi P et al. (2019) High-intensity sequencing reveals the sources of plasma circulating cell-free DNA variants. Nat Med 25, 1928–1937. doi: 10.1038/s41591-019-0652-7
  • 107. Raimondi D et al. (2021) Current cancer driver variant predictors learn to recognize driver genes instead of functional variants. BMC Biol 19, 3. doi: 10.1186/s12915-020-00930-0
  • 108. Landrum MJ et al. (2018) ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res 46, D1062–D1067. doi: 10.1093/nar/gkx1153
  • 109. Chakravarty D et al. (2017) OncoKB: A Precision Oncology Knowledge Base. JCO Precis Oncol 2017. doi: 10.1200/PO.17.00011
  • 110. Lawrence MS et al. (2014) Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495–501. doi: 10.1038/nature12912
  • 111. Dao P et al. (2017) BeWith: A Between-Within method to discover relationships between cancer modules via integrated analysis of mutual exclusivity, co-occurrence and functional interactions. PLoS Comput Biol 13, e1005695. doi: 10.1371/journal.pcbi.1005695
  • 112. Nam AS et al. (2021) Integrating genetic and non-genetic determinants of cancer evolution by single-cell multi-omics. Nat Rev Genet 22, 3–18. doi: 10.1038/s41576-020-0265-5
