Abstract
Cancer is an evolutionary disease driven by mutations in asexually reproducing somatic cells. In asexual microbes, bias reversals in the mutation spectrum can speed adaptation by increasing access to previously undersampled beneficial mutations. By analyzing tumors from 20 tissues, along with normal tissue and the germline, we demonstrate this effect in cancer. Nonhypermutated tumors reverse the germline mutation bias and have consistent spectra across tissues. These spectrum changes carry the signature of hypoxia, and they facilitate positive selection in cancer genes. Hypermutated and nonhypermutated tumors thus acquire driver mutations differently: hypermutated tumors by higher mutation rates and nonhypermutated tumors by changing the mutation spectrum to reverse the germline mutation bias.
Keywords: cancer, mutation bias, mutation spectrum, hypermutator, positive selection, hypoxia
Introduction
Cancer is an evolutionary disease arising from DNA mutations that allow cells to proliferate abnormally and invade other tissues (Brown et al. 2023). During the asexual reproduction of normal somatic cells, de novo mutations accumulate through time, which increases cancer risk with age (Cancer Research UK 2024). Cancer develops when mutations in specific genes or combinations of genes (so-called “cancer genes”) impair normal cell function by, for example, disabling the cell cycle checkpoints or activating signals which drive excessive cell division (National Cancer Institute 2021).
For both evolution within a tumor and evolution within a population of asexual organisms, the mutation rate is a key factor. In microbial evolution, lineages with an increased mutation rate (mutators) frequently emerge due to their improved access to rare beneficial mutations (Raynes and Sniegowski 2014). Along with increases in mutation rate, changes in mutation spectra also affect mutational supply (Storz et al. 2019; Gomez et al. 2020; Sane et al. 2023). We previously demonstrated that the interaction between mutation rate and spectrum powerfully influences the evolutionary trajectory of asexual populations (Tuffaha et al. 2023). Simply put, previously undersampled mutations may be reached by either making more mutations (mutation rate elevation, Fig. 1a), or by making different mutations (mutation spectrum change, Fig. 1b). If mutations have historically occurred with some bias (undersampling some classes of mutations while oversampling others), reversing this bias affords access to mutations that were previously unlikely to have occurred, such as undersampled beneficial mutations (Sane et al. 2023).
Fig. 1.
a,b) Schematic: a germline spectrum (green) under-represents some classes of mutations. Access to these classes increases either through an overall increase in mutation rate (a) or through reversing the bias (b). c) Example of a spectrum with three types of mutations, each with the same uniform frequency of 1/3. Type 1 is over-represented in the germline (green), while the other two are under-represented. Arrows show the direction that reverses the bias for each mutation type. An example of a bias-reversing spectrum is shown in blue. d) Distributions of the relative mutation rate (RMR) of transitions in normal tissues (yellow), HM tumors (red), and NHM tumors (blue) across tissues, compared with the uniform (purple) and germline (green) levels. e) The corresponding RMRs for each 1-mer mutation type. Stars indicate distribution means that differ significantly from the germline. f) Bias reversals (y axis) for passenger-gene spectra in different tissues (boxplots) are significantly higher in NHM than in HM or normal tissues at all three levels of analysis (x axis). The bias reversal measure for nonsynonymous mutations in cancer genes is also plotted (“x”). Germline results are shown as solid lines because their standard errors are smaller than the line thickness. Horizontal bars within boxplots indicate medians; whiskers indicate the confidence interval (CI); “+” signs represent outliers.
Cancer is a set of diseases that have similar hallmarks (Hanahan 2022), but there are many differences between cancer types, including different mutational processes and rates (Alexandrov et al. 2020). Tumors vary in mutational burden both within the same tissue (across patients) and between different tissues (Zhou et al. 2021). Hypermutation is usually caused by DNA mismatch repair defects, either inherited or acquired through somatic mutations; such defects are common in gastrointestinal cancers (Yuza et al. 2017). Exposure to UV light also elevates mutation rates in skin cancers (Garibyan and Fisher 2010). Thus, hypermutation is sometimes defined based on these etiologies rather than the exact number of mutations (Alexandrov et al. 2020). There is debate over whether a single universal threshold can be established to define “hypermutation” for all cancer types and sequencing techniques, but it is generally considered to fall within the range of 10 to 20 mutations per megabase (mut/Mb) (Mo et al. 2023; Swat et al. 2023; Haynes et al. 2024).
Along with these changes in mutation rate, biases in mutational spectra are widely observed in somatic mutations, where different tissues, individuals, and exposures show tendencies for some types of mutations to occur more often than others (Wellcome Sanger Institute 2024b). Distinct patterns of mutations, called “mutational signatures,” arise in the DNA of normal (Li et al. 2021; Moore et al. 2021; Park et al. 2021) and cancer (Alexandrov et al. 2020, 2013) cells due to various biological processes (Alexandrov et al. 2015; Boot et al. 2022), environmental exposures (Kucab et al. 2019; Chen et al. 2024), or intrinsic factors (Meier et al. 2018). Whether, and to what extent, the combined effects of these diverse mutational processes exhibit mutational biases, and how these biases compare with the mutation bias of the germline, have not yet been determined.
We investigate the interaction between mutation rate and mutation spectrum changes in cancer. In particular, we compare the mutation spectrum of hypermutator (HM) and nonhypermutator (NHM) cancers, demonstrating a highly conserved spectrum across NHM samples that consistently reverses the mutation bias of the germline. Cancer driver mutations thus occur through distinct mechanisms in HM and NHM cancers. In HM cancers, the supply of cancer driver mutations is increased through higher mutation rates, while in NHM cancers, the reversal of the mutation spectrum provides access to driver mutations. Our results emphasize that both the mutation rate and the mutation spectrum are critical in modeling genetic evolution in cancers, and they open new perspectives on the mechanisms that drive the development and progression of different cancers.
Materials and Methods
A summary workflow of our methodology is shown in supplementary fig. S1, Supplementary Material online.
Datasets
We use three sets of data in our analysis, described as follows.
- PCAWG: The Pan-Cancer Analysis of Whole Genomes (PCAWG) study (https://dcc.icgc.org/pcawg) is an international collaboration that provides whole-genome mutation data from the cancer tissues of over 2,700 donors. Overall, the dataset contains 55,657,793 SNVs, including 421,336 coding mutations, in 20 different primary tissues.
  Each sample in the PCAWG dataset was classified as an HM or an NHM cancer based on the classification provided by Alexandrov et al. (2020).
  We note that the hypermutated set consists mainly of skin tumors and samples with mismatch repair or putative polymerase epsilon defects. The distribution of coding mutations from hypermutated and nonhypermutated samples across tissues is shown in supplementary fig. S2, Supplementary Material online, while the distribution of whole-genome mutations and the number of donors in the dataset are shown in supplementary figs. S3 and S4, Supplementary Material online.
- Normal tissue mutations: We collected whole-genome and coding mutations from 19 and 9 normal tissues, as provided in the supplementary material of Moore et al. (2021) and Li et al. (2021), respectively.
- Germline de novo mutations: We consider the seven datasets used by Rodriguez-Galindo et al. (2020), which consist of 679,547 germline single-base substitutions from family-based whole-genome datasets from multiple centers in Europe, Asia, and North America (see Rodriguez-Galindo et al. (2020) for details). These include 10,750 coding mutations.
All datasets are based on the reference genome hg19, and so we map mutations to this genome to determine the effects and contexts of mutations.
Cancer Gene Identification
We use the Cancer Gene Census (CGC) as the primary reference for defining cancer-related genes because of its extensive curation and validation in cancer genomics. CGC genes are identified based on strong experimental and clinical evidence, which distinguishes them from passenger and other noncancer-associated genes. To assess whether CGC genes are consistently enriched for nonsynonymous mutations across tumor types, we compared the proportion of these mutations occurring in CGC genes with the CGC share of the coding genome. Across all tumor types, nonsynonymous mutations in CGC genes accounted for 5.7% of all nonsynonymous mutations, even though CGC genes comprise only 4.5% of potential nonsynonymous mutations at coding-genome sites. Every individual tissue has a higher percentage of nonsynonymous mutations in CGC genes than the expected 4.5%. No such enrichment is found for synonymous mutations in most tissues. This significant enrichment in nonsynonymous mutations (t-test) confirms that CGC genes are preferentially mutated across diverse cancers, supporting their use as a robust reference set for cancer-associated mutations.
Spectrum Calculation
We consider two types of spectra: absolute count spectra and mutation rate spectra. The methods we use to calculate these spectra can be generalized to any spectrum with N mutational categories, and thus we explain for a general spectrum. However, we only calculate spectra at three different levels of detail: (i) the transition:transversion spectrum represented by a single measure, the transition frequency; (ii) the 1-mer spectrum that includes the frequency of each 1-mer mutation type, and (iii) the more detailed 3-mer spectrum, in which the mutation rate of each nucleotide depends on the identity of adjacent nucleotides.
Because a mutation on one strand of the DNA corresponds to the complementary mutation at the same position on the other strand (for instance, a G>A mutation on one strand is a C>T mutation on the other), these two mutations are equivalent, and we count them in a single mutational category, denoted by convention with the pyrimidine (C or T) as the reference base. A 1-mer spectrum therefore consists of 6 categories of mutation (C>A, C>G, C>T, T>A, T>C, T>G), and adding the 4 possible 5' and 4 possible 3' flanking bases gives 6 × 4 × 4 = 96 categories in a 3-mer spectrum.
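The strand-collapsing rule can be illustrated with a minimal Python sketch. The function names and the input format (the reference trinucleotide on one strand plus the alternative base read on that same strand) are hypothetical choices for illustration, not part of the pipeline described here.

```python
# Complement map for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def classify_snv(context: str, alt: str) -> str:
    """Return the pyrimidine-referenced 3-mer category, e.g. 'A[C>T]G'.

    `context` is the reference trinucleotide with the mutated base in the
    middle; `alt` is the alternative base read on the same strand.
    """
    ref = context[1]
    if ref in "AG":
        # Purine reference: describe the same event on the other strand.
        context, alt = revcomp(context), alt.translate(COMPLEMENT)
        ref = context[1]
    return f"{context[0]}[{ref}>{alt}]{context[2]}"


def one_mer(category: str) -> str:
    """Drop the flanking bases: 'A[C>T]G' -> 'C>T'."""
    return category[2:5]


# Enumerate all categories: 6 single-base types, each with 4 x 4 flanks.
ONE_MERS = [f"{r}>{a}" for r in "CT" for a in "ACGT" if a != r]
THREE_MERS = [f"{x}[{m}]{y}" for m in ONE_MERS for x in "ACGT" for y in "ACGT"]

print(classify_snv("TGA", "A"))  # G>A in TGA is C>T in TCA: T[C>T]A
print(len(ONE_MERS), len(THREE_MERS))  # 6 96
```

Collapsing at the 3-mer level also reverse-complements the flanks, so that the 5' neighbor of the pyrimidine stays on the left.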
Assume $M$ mutations are observed in a (sub)dataset, of which $m_i$ belong to a given mutational category $i$, so that its absolute frequency is $f_i = m_i / M$. The vector of these absolute frequencies over all mutational categories is what we call the absolute count spectrum.
When comparing spectra, it is sometimes useful to correct these absolute frequencies for the occurrence opportunities of each mutation type in the underlying genome. For example, a very high frequency of mutations at G:C base pairs could be observed in a genome with high GC content, even if the underlying mutation rate is not elevated. Thus, to isolate changes in mutation rate, as opposed to genome content, we normalize the observed frequencies by mutational opportunities in the human genome and call the result the mutation rate spectrum. If mutational category $i$ has $O_i$ opportunities to occur in the genome, we define its genomic mutation rate as $r_i = m_i / O_i$. For 1-mer and 3-mer mutations, $O_i$ is simply given by the genome content; for transitions and transversions, however, each base in the genome provides two opportunities for a transversion and one opportunity for a transition, regardless of genome content. Normalizing the vector of genomic mutation rates to sum to one, $\hat{r}_i = r_i / \sum_j r_j$, gives the mutation rate spectrum, i.e. the mutational frequencies if all categories had the same opportunity to occur; we refer to these as relative mutation rates, as opposed to the absolute frequencies of the absolute count spectrum. We emphasize that both spectra are useful in different contexts, depending on whether we wish to compare the number of mutations of each type that actually occurred or the underlying mutation rate of each type.
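A minimal Python sketch of the two spectra follows, assuming that mutation counts and opportunities are already tabulated per category in dictionaries (a hypothetical input format). At the Ti:Tv level, opportunities are taken as 1 for transitions and 2 for transversions per site, as described above.

```python
def absolute_count_spectrum(counts: dict[str, int]) -> dict[str, float]:
    """Absolute frequencies: the share of all mutations in each category."""
    total = sum(counts.values())
    return {k: m / total for k, m in counts.items()}


def mutation_rate_spectrum(
    counts: dict[str, int], opportunities: dict[str, float]
) -> dict[str, float]:
    """Relative mutation rates: counts per opportunity, normalised to sum to one."""
    rates = {k: counts[k] / opportunities[k] for k in counts}
    norm = sum(rates.values())
    return {k: r / norm for k, r in rates.items()}


# Ti:Tv level: every site offers one transition and two transversions,
# so the opportunities are proportional to 1 and 2 regardless of GC content.
counts = {"Ti": 70, "Tv": 30}
print(absolute_count_spectrum(counts))
print(mutation_rate_spectrum(counts, {"Ti": 1, "Tv": 2}))
```

With 70 transitions and 30 transversions, the absolute transition frequency is 0.7, but the relative transition rate is 70/(70 + 30/2) ≈ 0.82, because transversions have twice as many opportunities to occur.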
Genes that are recurrently mutated across cancer patients are identified as putative cancer driver genes. Signals of positive selection have been previously detected in a few hundred of such genes (Wellcome Sanger Institute 2024a) using DNA sequencing data from tumor tissues. These signals measure an excess of amino-acid changing (nonsynonymous) mutations compared with a null model. Because mutation hotspots could also be recurrently mutated across patients, several sophisticated methods that account for the peculiarities of somatic mutation spectra have been proposed (Martincorena et al. 2017; Weghorn and Sunyaev 2017; Hess et al. 2019). When analyzing the spectra of the coding genome, we exclude mutations that occur in cancer genes identified by the Cancer Gene Census list (Wellcome Sanger Institute 2024a). Restricting our analysis to mutations in noncancer “passenger” genes eliminates the possibility that mutations driven by positive selection in cancer genes might distort the spectrum.
When comparing spectra across tissues or cancer types, we directly compare the corresponding transition:transversion (Ti:Tv) and 1-mer frequencies. To compare 3-mer spectra, we use correlation coefficients. Since both cosine similarity and Pearson correlation yield results dominated by CpG transitions (C>T mutations at a C followed by a G, i.e. N[C>T]G) because of their high frequencies, we use Spearman rank correlation, which reduces the influence of highly frequent mutation types.
We define the uniform spectrum as the mutation spectrum that accesses all possible mutations with equal probability. While not expected empirically, this spectrum is theoretically important because no mutational class is either over- or undersampled (Couce et al. 2013; Sane et al. 2023; Tuffaha et al. 2023). For a mutation rate spectrum with $N$ mutational categories, each category has frequency $1/N$ in the uniform mutation rate spectrum. The uniform absolute count spectrum, on the other hand, depends on the genome content: again, if mutational category $i$ has $O_i$ opportunities to occur in the genome, the uniform absolute count spectrum is simply $u_i = O_i / \sum_{j=1}^{N} O_j$.
Bias Reversal Measure
Consider a set of somatic mutations isolated from some cancer of interest. We seek to compare the spectrum of these mutations with the germline spectrum. While normal tissue spectra could serve as an alternative reference, we use the germline spectrum because it represents the inherited mutational background upon which all somatic mutations accumulate. This allows us to quantify fundamental shifts in mutational processes that occur during tumorigenesis, rather than tissue-specific mutations influenced by external factors. In particular, we would like to quantify the extent to which the cancer spectrum reverses (rather than reinforces) the mutational bias observed in the germline. As previously demonstrated (Sane et al. 2023; Tuffaha et al. 2023), the bias is reversed if a particular mutation frequency moves toward (or even past) the uniform frequency. For example, since each site offers one transition and two transversions, the unbiased absolute transition frequency is 1/3 ≈ 0.33. If the transition frequency in the germline is 0.7, a transition frequency of either 0.6 or 0.2 moves in the direction that reverses the germline bias, whereas a change to 0.8 reinforces it.
Mutations that occur more frequently, in absolute terms, have a greater effect on the degree to which the mutation bias is reversed. We therefore use absolute count spectra to compute the bias reversal. If we consider $n$ mutational categories, a new spectrum may be closer to the uniform level than the germline for some categories but further away for others. Let $d_i$ represent, for the $i$th mutational category, the direction that constitutes a bias reversal relative to the germline. Thus $d_i = +1$ if the germline has a lower frequency than the uniform level, so that an increase in frequency represents a bias reversal. Similarly, $d_i = -1$ when the germline has a higher frequency than the uniform level, so that reducing this frequency yields a bias reversal. (In the unlikely case that the germline and uniform levels are exactly equal, there is no bias to be reversed, and we take $d_i = 0$.) The overall bias reversal of a given spectrum is then measured by the quantity
$$B = \sum_{i=1}^{n} d_i \left( f_i^{\mathrm{S}} - f_i^{\mathrm{G}} \right) \qquad (1)$$
where $f_i^{\mathrm{G}}$ and $f_i^{\mathrm{S}}$ are the frequencies of the $i$th mutational category in the germline spectrum and the new spectrum, respectively. Supplementary fig. S5, Supplementary Material online illustrates how the bias reversal measure is calculated for two fictional spectra with three types of mutations. We note two things. First, a spectrum need not reverse the bias in every individual mutational class to have a positive bias reversal measure, because the effects are summed across classes; the extreme examples shown in this figure are for illustration. Second, the bias reversal measure does not always combine linearly across scales. For example, the measure at the Ti:Tv level equals the measure at the 1-mer level only if the frequencies of the two 1-mer transitions both shift in the same direction as the overall transition frequency, and similarly for the four 1-mer transversions. When this occurs, the measure combines linearly and is identical across scales, as observed in our data, for example, in Fig. 1f.
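The direction weights and the bias reversal measure can be sketched in Python using the Ti:Tv example from the text (germline transition frequency 0.7, uniform frequency 1/3). The function names are hypothetical. The sketch assumes that the measure is the sum, over categories, of the reversal direction times the change in absolute frequency relative to the germline, and that all spectra are dictionaries keyed by category.

```python
def uniform_absolute(opportunities: dict[str, float]) -> dict[str, float]:
    """Uniform absolute count spectrum: each category's share of opportunities."""
    total = sum(opportunities.values())
    return {k: o / total for k, o in opportunities.items()}


def reversal_directions(germline, uniform):
    """+1 if the germline is below uniform, -1 if above, 0 if exactly equal."""
    d = {}
    for k in germline:
        if germline[k] < uniform[k]:
            d[k] = 1
        elif germline[k] > uniform[k]:
            d[k] = -1
        else:
            d[k] = 0
    return d


def bias_reversal(new, germline, uniform):
    """Sum of direction-weighted frequency changes relative to the germline."""
    d = reversal_directions(germline, uniform)
    return sum(d[k] * (new[k] - germline[k]) for k in germline)


uniform = uniform_absolute({"Ti": 1, "Tv": 2})  # Ti = 1/3, Tv = 2/3
germline = {"Ti": 0.7, "Tv": 0.3}
for ti in (0.6, 0.2, 0.8):
    print(ti, bias_reversal({"Ti": ti, "Tv": 1 - ti}, germline, uniform))
```

Transition frequencies of 0.6 and 0.2 give positive values (bias reversal), whereas 0.8 gives a negative value (bias reinforcement), matching the example in the text. Overshooting the uniform level (0.2) still counts as a reversal, because the measure is not capped at the uniform frequency.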
Signature Decomposition
Different mutational processes generate distinct combinations of mutation types. Recent efforts have successfully identified particular spectra, “mutational signatures,” associated with distinct mutational processes in cancer, and established links to their underlying mechanisms (Alexandrov et al. 2020). A mutational spectrum from a particular tissue or tumor type can thus be decomposed into weighted contributions from known mutational signatures (Rosenthal et al. 2016; Li et al. 2020; Serrano Colome et al. 2023; Jin et al. 2024).
We use SigNet (Serrano Colome et al. 2023) to find the combination of mutational signatures, as identified in COSMIC v3.1 (Alexandrov et al. 2020), that best describe the observed mutation counts in our data. This algorithm leverages the strong correlations between mutational processes observed in cancer data, providing highly accurate decompositions even when the number of mutations is low (Serrano Colome et al. 2023).
In brief: we first count the number of observed mutations for each of the 96 mutation types. Since COSMIC signatures are derived from whole-genome data, when decomposing mutational spectra from coding regions, SigNet first corrects for the 3-mer abundances in the coding genome; the input data are rescaled by the ratio between the 3-mer abundances in the coding genome and the whole genome.
The output of SigNet includes estimates of the signature weights, as well as a classification score, which reflects the degree to which the decomposition is considered reliable. SigNet also assigns a weight to an “unknown” category, which pools the weights from any signatures with predicted weight lower than 0.01.
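The pooling of low-weight signatures into the “unknown” category can be sketched as follows. This is an illustrative reimplementation of the rule stated above, not SigNet's own code; the function name and the example weights are hypothetical.

```python
def pool_small_weights(
    weights: dict[str, float], threshold: float = 0.01
) -> dict[str, float]:
    """Move every signature weight below `threshold` into an 'unknown' bin."""
    pooled = {sig: w for sig, w in weights.items() if w >= threshold}
    small = sum(w for w in weights.values() if w < threshold)
    pooled["unknown"] = pooled.get("unknown", 0.0) + small
    return pooled


# Hypothetical weights for illustration only.
print(pool_small_weights({"SBS5": 0.6, "SBS40": 0.395, "SBS1": 0.004, "SBS18": 0.001}))
```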
We also perform a signature decomposition using SigProfilerAssignment (Díaz-Gay et al. 2023) to validate our results.
Positive Selection Detection in Cancer Genes
The ratio of nonsynonymous to synonymous mutations (dN/dS) is a standard measure used to detect the influence of natural selection; positive selection is inferred when dN/dS>1, and genes with strong evidence for positive selection in tumor samples are considered cancer genes (Martincorena et al. 2017). Here, rather than detecting cancer genes, we use a similar approach to check whether HM and NHM samples show signs of positive selection in known cancer genes. In other words, we check if selection acts differently in HM and NHM using the pooled coding 3-mer spectra across tissues in each of these two categories. To avoid the dominance of skin and colon (see supplementary fig. S2a, Supplementary Material online) in the pooled HM spectrum, mutations from skin and colon are downsampled so that the numbers of mutations from these two tissues are equal to the number of HM mutations in the next most prevalent tissue, in this case, stomach cancers (supplementary fig. S2b, Supplementary Material online).
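The downsampling step can be sketched as below. The sketch assumes that mutations are stored as per-tissue lists and that subsampling is uniform and without replacement; the text does not specify the sampling scheme, and the function name and arguments are hypothetical.

```python
import random


def downsample_tissues(mutations_by_tissue, dominant, reference, seed=0):
    """Pool mutations, subsampling each `dominant` tissue to the size of `reference`.

    mutations_by_tissue: dict mapping tissue name -> list of mutation records.
    Subsampling is uniform without replacement; other tissues are kept whole.
    """
    rng = random.Random(seed)
    target = len(mutations_by_tissue[reference])
    pooled = []
    for tissue, muts in mutations_by_tissue.items():
        if tissue in dominant and len(muts) > target:
            muts = rng.sample(muts, target)
        pooled.extend(muts)
    return pooled


# Toy HM data: skin and colon dominate, stomach is the next largest tissue.
hm = {"skin": list(range(1000)), "colon": list(range(500)),
      "stomach": list(range(100)), "uterus": list(range(50))}
print(len(downsample_tissues(hm, {"skin", "colon"}, "stomach")))  # 350
```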
Computing dN/dS is not possible for the four 3-mer mutation types that do not generate synonymous mutations. Also, the number of synonymous mutations in cancer genes in our dataset is low for some tissue types (<200 mutations for 10 of the 20 tissues). These data are clearly insufficient to compose a 96-category 3-mer mutation spectrum.
Therefore, instead of using synonymous mutations, dS, as a neutral proxy, we use the fact that in the absence of positive selection, both cancer and passenger genes in tumor samples should have the same genomic mutation rates. This method allows us to calculate expectations for synonymous and nonsynonymous mutations in all 3-mer contexts based on passenger genes, and compare these with the observed mutations in cancer genes. We interpret any observed differences as signs of positive selection because signals of negative selection are nearly absent in tumor evolution (but see Weghorn and Sunyaev (2017) and Tilk et al. (2022)).
Consider a particular 3-mer. We first count the number of times this 3-mer occurs in passenger genes in the human genome, $n^{\mathrm{P}}$. We then consider one of the three possible mutations of that 3-mer and count the number of observed mutations of this type in passenger genes in the PCAWG dataset, $m^{\mathrm{P}}$. Dividing the observed mutation count in passenger genes by the number of occurrences of the 3-mer in passenger genes yields the expected genomic mutation rate for this 3-mer mutation, $\mu = m^{\mathrm{P}} / n^{\mathrm{P}}$.
To determine the expected number of mutations of this 3-mer mutation type in cancer genes, we count the number of times this 3-mer occurs in cancer genes in the human genome, $n^{\mathrm{C}}$. In addition, we classify each of these instances as either synonymous or nonsynonymous, depending on the outcome if this particular 3-mer mutation occurred at that position, such that $n^{\mathrm{C}} = n^{\mathrm{C}}_{\mathrm{S}} + n^{\mathrm{C}}_{\mathrm{N}}$, where $n^{\mathrm{C}}_{\mathrm{S}}$ and $n^{\mathrm{C}}_{\mathrm{N}}$ are the synonymous and nonsynonymous instances, respectively.
Given these values, the expected numbers of synonymous and nonsynonymous mutations in cancer genes are $E_{\mathrm{S}} = \mu\, n^{\mathrm{C}}_{\mathrm{S}}$ and $E_{\mathrm{N}} = \mu\, n^{\mathrm{C}}_{\mathrm{N}}$, respectively. Finally, we count the number of observed mutations of this type in cancer genes in the PCAWG dataset, again differentiating synonymous ($m^{\mathrm{C}}_{\mathrm{S}}$) and nonsynonymous ($m^{\mathrm{C}}_{\mathrm{N}}$) mutations, such that $m^{\mathrm{C}} = m^{\mathrm{C}}_{\mathrm{S}} + m^{\mathrm{C}}_{\mathrm{N}}$. The relative difference between the observed and expected numbers of mutations then measures whether there is an excess of a particular type of mutation in cancer genes. We refer to this relative difference as the “excess measure,” computed for each 3-mer mutation as
$$\varepsilon_{\mathrm{S}} = \frac{m^{\mathrm{C}}_{\mathrm{S}} - E_{\mathrm{S}}}{E_{\mathrm{S}}}, \qquad \varepsilon_{\mathrm{N}} = \frac{m^{\mathrm{C}}_{\mathrm{N}} - E_{\mathrm{N}}}{E_{\mathrm{N}}} \qquad (2)$$
for synonymous and nonsynonymous mutations, respectively. If the excess measure is positive for a given 3-mer mutation type, then more mutations of that type are observed in cancer genes than expected from the passenger-gene mutation rate; for example, a value of 1 means that twice as many mutations are observed as expected. Because the four 3-mer mutation types mentioned above produce no synonymous mutations, the excess measures for synonymous mutations are defined for only 92 of the 96 mutation types.
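For a single 3-mer mutation type, the expected counts and excess measures reduce to a few lines of arithmetic. The Python sketch below uses hypothetical argument names and illustrative numbers; it returns `None` where no mutations are expected (for example, for a mutation type with no synonymous outcome).

```python
def excess_measures(m_pass, n_pass, n_syn, n_nonsyn, obs_syn, obs_nonsyn):
    """Synonymous and nonsynonymous excess measures for one 3-mer mutation type.

    m_pass: observed mutations of this type in passenger genes
    n_pass: occurrences of the 3-mer in passenger genes
    n_syn, n_nonsyn: cancer-gene occurrences where this mutation would be
        synonymous / nonsynonymous
    obs_syn, obs_nonsyn: observed synonymous / nonsynonymous mutations of
        this type in cancer genes
    """
    rate = m_pass / n_pass            # expected genomic mutation rate
    expected_syn = rate * n_syn
    expected_nonsyn = rate * n_nonsyn

    def excess(observed, expected):
        # Relative difference; undefined when nothing is expected.
        return (observed - expected) / expected if expected > 0 else None

    return excess(obs_syn, expected_syn), excess(obs_nonsyn, expected_nonsyn)


# Illustrative numbers: 10 synonymous and 30 nonsynonymous mutations expected.
print(excess_measures(64, 1024, 160, 480, 10, 60))  # (0.0, 1.0)
```

Here the nonsynonymous excess of 1.0 corresponds to twice as many observed mutations as expected, while the synonymous count matches its expectation.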
We use bootstrapping to assess the statistical significance of the excess measure for all 3-mer mutation types in particular types of cancer. The null hypothesis is that the observed numbers of synonymous and nonsynonymous mutations in cancer genes follow the same distribution as expected under random sampling, i.e. there is no selection on 3-mer mutation types in cancer genes. Since mutations in cancer genes form a fixed fraction of all observed coding mutations in PCAWG, we draw bootstrap samples from all coding mutations, each of the same size as the set of cancer-gene mutations (i.e. the same fraction of the total pool of coding mutations). Each of these samples contains a number of synonymous and nonsynonymous mutations in each 3-mer context; across the bootstrapped samples, these numbers form the null distributions $D_{\mathrm{S}}$ and $D_{\mathrm{N}}$, respectively.
To quantify deviation from the null expectation, we compute a z-score comparing the observed numbers of synonymous and nonsynonymous mutations in cancer genes with the bootstrapped distributions $D_{\mathrm{S}}$ and $D_{\mathrm{N}}$, respectively. Given that there are 96 comparisons, we apply a conservative Bonferroni correction and consider an excess measure statistically significant when it passes the Bonferroni-corrected P-value threshold, or equivalently the corresponding critical z-score.
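A minimal sketch of the bootstrap z-score and the Bonferroni-corrected critical value follows. It assumes resampling with replacement, a pool represented as a list of category labels, a family-wise significance level of 0.05, and a two-sided test; these details, and all names, are assumptions made for illustration.

```python
import random
import statistics
from statistics import NormalDist


def bootstrap_counts(pool, category, sample_size, n_boot=1000, seed=0):
    """Count `category` in n_boot resamples (with replacement) of `pool`."""
    rng = random.Random(seed)
    return [
        sum(1 for x in rng.choices(pool, k=sample_size) if x == category)
        for _ in range(n_boot)
    ]


def z_score(observed, null_counts):
    """Standardise an observed count against the bootstrapped null distribution."""
    mean = statistics.fmean(null_counts)
    sd = statistics.stdev(null_counts)
    return (observed - mean) / sd


def critical_z(n_tests=96, alpha=0.05):
    """Two-sided Bonferroni-corrected critical z-score (alpha is assumed)."""
    return NormalDist().inv_cdf(1 - alpha / (2 * n_tests))


# Toy pool of coding-mutation labels; "B" stands for one 3-mer category.
pool = ["A"] * 90 + ["B"] * 10
null = bootstrap_counts(pool, "B", sample_size=100, n_boot=500)
z = z_score(40, null)
print(round(z, 1), z > critical_z())
```

Under these assumptions, the critical value for 96 tests is roughly 3.47; a one-sided test would give a lower threshold.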
Results
Reduced Transition Bias in Nonhypermutated Samples
We define the mutation bias of any mutational type to be reversed, compared with the germline bias, if it differs from the germline bias in the direction of the unbiased state, which assumes a uniform probability of each k-mer mutation type (see Methods) (Tuffaha et al. 2023) (Fig. 1c). Using family-based datasets of de novo mutations (Rodriguez-Galindo et al. 2020), we find that the germline oversamples transitions (Ti) with a relative mutation rate (RMR) of 0.853 in coding mutations (Fig. 1d).
The Ti RMR in 9 normal tissue samples (Fig. 1d) is not significantly different from that of the germline (t-test). The liver is the only outlier, with a Ti RMR of 0.55, presumably related to the transversion-rich mutational process induced by aristolochic acid exposure (SBS22 in the COSMIC database), as observed in female donors (Li et al. 2021). Similarly, the Ti RMR in the pooled data from HM cancers does not differ significantly from that of the germline (t-test). In contrast, NHM samples show a significantly reduced Ti RMR (t-test).
These results are supported when the overall Ti RMR is decomposed at the 1-mer level (Fig. 1e). Note that the germline 1-mer RMRs are higher than the uniform level (1/6) for transitions and lower for transversions. Even for mutation types whose germline RMR is very close to the uniform level, NHMs show strong evidence of a reversed bias: for both transitions, they have a significantly reduced RMR compared with the germline, while two of the four transversions show a significantly elevated RMR (stars; Bonferroni-corrected). In contrast, the HM 1-mer spectra differ significantly from the germline only for a single transversion type, where the Tv bias is not reduced but significantly reinforced. None of the 1-mer mutation distributions for normal tissues have means significantly different from the germline. Analogous results hold for whole-genome mutations (supplementary fig. S6, Supplementary Material online).
To extend our analysis to the 3-mer level, we define a new metric, the bias reversal measure, to summarize spectrum changes across mutation types. For any spectrum, the bias reversal measure sums the degree to which the spectrum reverses the bias observed in the germline (see Methods). For passenger-gene (noncancer-gene) mutations at the Ti:Tv, 1-mer, and 3-mer levels, the distributions of this measure across tissues are significantly higher than zero for NHMs (Fig. 1f; t-test in all three cases), whereas those for normal tissues and HMs are not significantly different from zero (t-test in all six cases). The bias is thus significantly reversed in NHM, but no such pattern is observed in normal tissues or HM. Similar results hold for the whole genome (supplementary fig. S7, Supplementary Material online).
We restricted our analyses of coding spectra to passenger genes to reduce possible confounding effects of positive selection (see Methods). Returning to cancer genes, we find that in normal tissues, as well as in HM and NHM cancers, mutations in cancer driver genes tend to have a higher bias reversal than mutations in passenger genes (“x” in Fig. 1f). This difference is not driven by differences in the expected counts of 3-mer mutations, as these are highly correlated between the two gene classes.
We recognize the importance of ensuring that NHM vs. HM differences are not simply driven by tissue composition. To test this, we analyzed mutation spectra within the five tissues that contain sufficient NHM and HM cases (Brain, Colorectal, Liver, Stomach, and Uterus; see supplementary fig. S2, Supplementary Material online). In the analysis restricted to these five tissues, only 2 out of 6 mutation types in NHM samples showed significant differences from the germline—compared with 4 out of 6 when using data from all tissues—while the significance in HM samples (one mutation type) remained unchanged. The loss of significance in some mutational types is presumably due to the lack of information (only five tissues) and the strict Bonferroni correction we applied. As shown in supplementary figs. S8 and S9, Supplementary Material online, despite the reduced statistical power when using only five tissues, the overall trends observed in Fig. 1e and f are maintained, further supporting our conclusions that distinct mutational processes underlie NHM and HM cancers.
NHM 3-mer Spectrum is Highly Similar Across Tissues
To test for similarity among spectra, we computed the Spearman rank correlation coefficient between each tissue spectrum and the pooled spectrum of all other tissues in the same class. For instance, we correlated the spectrum from each tissue’s NHM samples with the spectrum computed by pooling NHM samples from all other tissues. NHM spectra are significantly more strongly correlated with one another than HM spectra are with one another (Fig. 2a; t-test). Considering unpooled pairs of tissues, NHM spectra also exhibit higher correlations than HM spectra (Fig. 2b). The overall positive correlation between NHM and HM spectra is shown in supplementary fig. S10, Supplementary Material online, and the full 3-mer spectra are shown in supplementary figs. S12 and S13, Supplementary Material online.
Fig. 2.
NHM spectra are highly similar across tissues, unlike HM spectra. a) Correlating the spectrum of each tissue with the pooled spectrum from all other tissues in the same class (HM or NHM) shows high Spearman correlation coefficients for NHM (blue) and lower values for HM (red). Center bar indicates distribution median and whiskers show CI; “+” signs represent outliers. b) Spearman correlation coefficients of the spectrum of each tissue with each other tissue in the same class (violin plots); horizontal lines represent the medians of the distributions. Correlation coefficients between each tissue and pooled samples from all other tissues in the same class (“x” symbols) are also shown for comparison; these are the values summarized in panel (a).
What Mutational Processes Characterize NHM Tumors?
To examine the mutational processes underlying mutation spectrum changes, we conducted a signature decomposition analysis (Serrano Colome et al. 2023) (Fig. 3a). For the pooled NHM spectrum, 52% of the mutations are attributed to signature SBS5, 16.8% to the AID/APOBEC family of cytidine deaminases (SBS2 and SBS13), and 14.4% to SBS40, while other signatures have lower weights. All signatures composing the NHM spectrum, except SBS1 and SBS2, have positive bias reversal measures (Fig. 3b).
Fig. 3.
Most signatures composing the NHM and colon HM spectra reverse the bias, while mutational processes tend to reinforce the bias in skin HM samples. a) The weights for each mutational signature identified as contributing to the three spectra are shown; gray proportions at the top represent mutations from unknown mutational processes. b) Bias reversal measures for each signature; circles have an area proportional to the weight of that signature within the respective spectrum.
Since the spectra of HM tumors vary substantially across tissues (Fig. 2), we aimed to perform tissue-specific signature decompositions of HM samples, but only skin and colon had sufficient mutations (supplementary fig. S2, Supplementary Material online; see also supplementary fig. S13, Supplementary Material online, for the full 3-mer spectra). As expected, UV light signatures (SBS7a through SBS7d) dominate the skin spectrum (Fig. 3a); the two dominant signatures, SBS7a and SBS7b, reinforce the bias (negative bias reversal measure; Fig. 3b). In contrast, the colon HM spectrum is dominated by signatures attributed to defective polymerase epsilon (SBS10a and SBS10b, and the associated SBS28; Fig. 3a). All signatures underlying the colon HM spectrum have a positive bias reversal measure (Fig. 3b), which explains why colon was the outlier among HM cancers, showing a strong transition bias reversal (Fig. 1d).
We validated these results with a second decomposition using SigProfilerAssignment (Díaz-Gay et al. 2023). Despite some minor differences in signature identities and weights (supplementary fig. S11, Supplementary Material online), the results are consistent with our SigNet findings: the signatures with higher weights in NHM and colon HM reverse the germline bias and are mostly attributed to DNA repair defects, while the most highly weighted signatures in skin HM are still the bias-reinforcing signatures SBS7a and SBS7b.
Positive Selection in Cancer Genes in NHM Cancers Is Anticorrelated with the Germline Spectrum
Previous work has identified stronger signals of positive selection in tumors with lower mutation rates (Martincorena et al. 2017; Tilk et al. 2022). Our results suggest that mutations that are positively selected in cancer, at least in NHM, may reverse germline mutation biases. To examine positive selection in both HM and NHM tumors, we compute the excess measure (equation (2)) for each 3-mer mutation (Fig. 4a). In both HM and NHM categories, we observe positive and negative excess measures for different 3-mer mutation types, but nonsynonymous mutations in NHM show the highest number of positive values, with a distribution mean that is significantly different from zero (Fig. 4a; t-test, ). The nonsynonymous distribution in NHM is significantly different from each of the other three distributions (Wilcoxon rank sum test, ). In agreement with previous work (Martincorena et al. 2017; Tilk et al. 2022), we thus find stronger evidence for positive selection acting on nonsynonymous mutations in NHM than in HM tumors. Note that the distribution mean for nonsynonymous mutations in HM is also significantly different from zero (t-test, ), but the distribution means for synonymous mutations are not significantly different from zero (t-test, Bonferroni-corrected, and , respectively).
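The rank-sum comparisons and Bonferroni correction mentioned above can be sketched in a few lines of Python. This is not the authors' MATLAB analysis: it assumes a two-sided test using the large-sample normal approximation without a tie correction, which is adequate for roughly 96 values per group but not for very small samples, and the function names are hypothetical.

```python
"""Wilcoxon rank-sum test (normal approximation) and Bonferroni flags (sketch)."""
from statistics import NormalDist
from typing import List, Tuple


def average_ranks(values: List[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Two-sided rank-sum test of x versus y; returns (z, p)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                    # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2        # expected rank sum under H0
    var_w = n1 * n2 * (n1 + n2 + 1) / 12   # its variance under H0 (no tie correction)
    z = (w - mean_w) / var_w ** 0.5
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


def bonferroni_significant(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Flag p-values that stay significant after Bonferroni correction."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]
```

Comparing the NHM nonsynonymous excess measures against each of the other three distributions yields three p-values, which `bonferroni_significant` then checks against α/3.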
Fig. 4.
Evidence of positive selection in cancer genes is stronger in NHMs than HMs. a) Excess measures comparing cancer and noncancer genes for all 3-mer mutation types are shown for synonymous and nonsynonymous mutations in both HM (red) and NHM (blue). Only the nonsynonymous distributions are significantly higher than zero. Boxplot whiskers include CI. The positive y axis is truncated for clarity; “+” signs represent outliers. b) Histograms of z-scores obtained by comparing the numbers of synonymous mutations of each 3-mer mutation type in cancer genes with results in 50,000 bootstrapped samples from all synonymous mutations. The Bonferroni-corrected CI from the bootstrapped dataset is shown in yellow. c) Analogous results for nonsynonymous mutations. d,e) Excess measure for nonsynonymous mutations in NHM and HM is correlated with the germline nonsynonymous passenger-gene spectrum (log scale). The 3-mer mutations that show significant positive selection for nonsynonymous NHM mutations (out of CI in panel (c), squares) show a significant anticorrelation only in the NHM case (red lines, (d) , (e) ), while all 3-mer mutations taken together (squares and circles) do not show any significant correlations (black lines, (d) , (e) ). The color of each point corresponds to the NHM z-score from panel (c). Open circles/squares (in panels (a) and (e)) indicate out-of-scope outliers.
To detect selection on individual 3-mer mutation types (and to account for the variance in the excess measure for rare mutation types), we compared the numbers of mutations observed in cancer genes with bootstrapped samples from all coding mutations. For all 3-mer contexts in HM, and almost all of those in NHM, the number of synonymous mutations in cancer genes falls within the confidence interval of the bootstrapped samples, showing no sign of selection (Fig. 4b). In contrast, nonsynonymous mutations show 24 positively selected 3-mer mutation types (out of 96) in NHM, but none in HM (Fig. 4c). In NHM, the nonsynonymous excess measures of these 24 3-mer mutation types (Fig. 4d) are anticorrelated with the corresponding mutation frequencies in the nonsynonymous germline spectrum, whereas the corresponding anticorrelation is not significant for HM cancers (Fig. 4e). This indicates that, among mutations under positive selection in NHM, those least likely to occur in the germline show the strongest selective effects. However, there is no significant correlation between the germline mutation spectrum and the measure of positive selection across all 3-mer mutation types.
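One way to implement the per-type bootstrap comparison is sketched below, in Python rather than the authors' MATLAB. The following assumptions are not taken from the paper. Each replicate draws, with replacement, as many mutations from the pool of all coding mutations as were observed in cancer genes. The z-score uses the bootstrap mean and standard deviation. The Bonferroni-corrected interval is approximated by a normal quantile rather than empirical percentiles. The function name, the default replicate count, and the seed are hypothetical (Fig. 4 reports 50,000 replicates).

```python
"""Bootstrap z-scores for per-type mutation counts in cancer genes (sketch)."""
import random
from statistics import NormalDist, mean, stdev
from typing import Dict, List, Tuple


def bootstrap_zscores(all_types: List[str], cancer_types: List[str],
                      n_boot: int = 2000, alpha: float = 0.05,
                      seed: int = 1) -> Dict[str, Tuple[float, str]]:
    """Compare per-type counts in cancer genes with resamples of all coding mutations.

    all_types:    3-mer type of every coding mutation (cancer and noncancer genes).
    cancer_types: 3-mer type of each mutation in a cancer gene (subset of all_types).
    Returns {type: (z_score, call)} with call in {"positive", "negative", "none"}.
    """
    rng = random.Random(seed)
    n = len(cancer_types)
    categories = sorted(set(all_types))
    observed = {c: 0 for c in categories}
    for c in cancer_types:
        observed[c] += 1
    boot = {c: [] for c in categories}
    for _ in range(n_boot):
        # Resample as many mutations as were observed in cancer genes.
        tally = {c: 0 for c in categories}
        for c in rng.choices(all_types, k=n):
            tally[c] += 1
        for c in categories:
            boot[c].append(tally[c])
    # Two-sided threshold, Bonferroni-corrected over all mutation types.
    z_crit = NormalDist().inv_cdf(1 - alpha / (2 * len(categories)))
    result = {}
    for c in categories:
        mu, sd = mean(boot[c]), stdev(boot[c])
        z = (observed[c] - mu) / sd if sd > 0 else 0.0
        call = "positive" if z > z_crit else "negative" if z < -z_crit else "none"
        result[c] = (z, call)
    return result
```

A type is flagged as positively selected when its count lies above the corrected interval; a count below it would indicate negative selection. A pure-Python loop like this becomes slow at 50,000 replicates over large mutation sets, so a vectorized implementation would be preferable in practice.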
Negative selection is typically detectable only in tumors with a very low mutation burden, as expected under strong clonal interference (Tilk et al. 2022). Consistent with this expectation, we found no 3-mer mutation type whose observed count falls below the confidence interval, in either HM or NHM (Fig. 4b).
Discussion
Viewed through an evolutionary lens, cancer driver mutations are “beneficial” to the cell, as they increase its reproductive rate, leading to local or distant invasion (Alberts et al. 2002; Brown et al. 2023). These driver mutations are prevalent across cancer types but rare in normal tissues (Ng and Chan 2023). While some cancers develop a mutator phenotype, which can explain their increased access to driver mutations, a mutator phenotype is not essential for carcinogenesis (Tomlinson et al. 1996). Our previous work (Tuffaha et al. 2023) suggests another mechanism that enhances access to previously rare mutations, namely shifts in mutational bias. Here we tested whether such bias shifts exist in cancer and, if so, whether they occur only in nonhypermutated cancers or also in hypermutated cancers.
We examined the mutational spectra of thousands of human tumor samples, computing spectra for HM and NHM cancers across 20 tissue types. We demonstrate that the mutational spectra of NHM tumors are highly correlated across tissues, an effect not observed in HM tumors (Fig. 2). Thus, while HM tumors show a dramatic increase in mutation rate, NHM tumors show only modest increases in mutation rate but have a distinct mutation spectrum that recurs across diverse tissues and donors.
Reversals of mutation bias, which push the mutation spectrum toward or past the unbiased/uniform state, offer access to undersampled classes of mutations (Couce et al. 2013; Sane et al. 2023; Tuffaha et al. 2023), which we hypothesized may include the mutations that drive cancer. Consistent with this prediction, whether analyzed at the Ts:Tv, 1-mer, or 3-mer level, the distinct mutation spectrum of NHM tumors (Fig. 2) significantly reverses the germline bias (Fig. 1), while HM tumors and normal tissues show no such effect.
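To make the three levels of resolution concrete, the sketch below collapses a 3-mer spectrum into 1-mer substitution classes and computes a Ts:Tv ratio. It assumes COSMIC-style SBS96 channel labels of the form 5'[REF>ALT]3' (e.g. A[C>T]G) with pyrimidine reference bases. It only shows how the levels nest; it does not compute the paper's bias reversal measure, whose definition is not reproduced here. The function names and the toy spectrum are hypothetical.

```python
"""Collapse a 96-channel 3-mer spectrum to 1-mer classes and Ts:Tv (sketch)."""
from collections import Counter
from typing import Dict

# Pyrimidine-referenced transitions; the other four classes are transversions.
TRANSITIONS = {"C>T", "T>C"}


def collapse_to_1mer(spectrum_3mer: Dict[str, float]) -> Dict[str, float]:
    """Sum channels labelled like 'A[C>T]G' into their central substitution."""
    totals: Counter = Counter()
    for label, count in spectrum_3mer.items():
        sub = label[label.index("[") + 1:label.index("]")]
        totals[sub] += count
    return dict(totals)


def ts_tv_ratio(spectrum_1mer: Dict[str, float]) -> float:
    """Ratio of transitions to transversions in a 1-mer spectrum."""
    ts = sum(v for k, v in spectrum_1mer.items() if k in TRANSITIONS)
    tv = sum(v for k, v in spectrum_1mer.items() if k not in TRANSITIONS)
    return ts / tv


if __name__ == "__main__":
    bases = "ACGT"
    subs = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
    # Hypothetical toy spectrum: C>T channels hold 4 mutations, T>C 2, others 1.
    weights = {"C>T": 4.0, "T>C": 2.0}
    toy = {f"{l}[{s}]{r}": weights.get(s, 1.0) for s in subs for l in bases for r in bases}
    print(collapse_to_1mer(toy), ts_tv_ratio(collapse_to_1mer(toy)))
```

In this toy example, the 96 channels collapse to six classes (C>T = 64, T>C = 32, and 16 for each transversion class), giving Ts:Tv = 96/64 = 1.5.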
These results suggest that while HM tumors access cancer driver mutations through elevated mutation rate, NHM tumors access driver mutations through changes in mutation spectrum that correspond to reversals in germline mutational biases (Fig. 5).
Fig. 5.
Cancer driver mutations are rarely accessed but “beneficial” in the fitness landscape (gray surface) of a cellular lineage. These can be accessed by two possible changes in mutational processes: an increase in mutation rates as seen in HMs, or a reversal in the mutational bias as seen in NHMs.
We further support this hypothesis by demonstrating that driver mutations in NHM are indeed stronger and/or more common for those types of mutations that are less likely to occur in the germline or in healthy tissues (Fig. 4d). This suggests that indirect selection on 3-mer mutation modifiers (Milligan et al. 2022) may tune the human germline and healthy somatic mutation spectrum such that the greater the effect of a cancer driver mutation, the less likely it is to occur and, if it does occur, the less likely it is to remain unrepaired.
Our results further justify using the germline spectrum as a baseline; normal tissue spectra have been shown to be more similar to the germline spectrum than to tumor spectra. Moreover, normal tissues undergo cell turnover and environmental exposure (Li et al. 2021; Moore et al. 2021), which can introduce tissue-specific mutational signatures unrelated to tumor evolution. Using the germline as a baseline therefore helps ensure that our bias reversal measure reflects oncogenic mutational processes rather than external mutagenic influences that vary across tissues.
Our signature decomposition analysis (Fig. 3) identifies SBS13 and SBS40 as mutational processes that reverse the germline bias and play a key role in shaping the NHM mutation spectrum. Although SBS5 makes the largest contribution to the NHM spectrum, it is a clock-like signature (its mutation burden increases with patient age) that shows only a modest bias reversal and is also found in the germline and in healthy tissues. In contrast, SBS13 is attributed to the activity of the AID/APOBEC family of cytidine deaminases (Nik-Zainal et al. 2012), which is known to promote carcinogenesis and cause genomic instability (Talluri et al. 2021; Pecori et al. 2022; Shilova et al. 2022). This signature has previously been detected in many cancer types (see Fig. 3 in Alexandrov et al. 2020), and it reverses the germline bias mainly because it is rich in transversions. Likewise, SBS40 is detectable in many cancers (Alexandrov et al. 2020), and it strongly reverses the germline transition bias because it resembles a uniform spectrum (Fig. 3). SBS40 is significantly positively correlated with hypoxia (Serrano Colome et al. 2023), and transient or chronic hypoxia, characterized by critically low tissue oxygen, is a defining characteristic of cancer (Hanahan and Weinberg 2000). The hypoxic state is associated with a significant increase in mutation rate, thought to result from the reduced effectiveness of various DNA repair mechanisms (Bhandari et al. 2020; Kaplan and Glazer 2020).
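The arithmetic behind a uniform spectrum's effect on transition bias can be made explicit under a simplifying assumption: treat SBS40 as exactly uniform over the 96 pyrimidine-referenced 3-mer channels. Two of the six substitution classes (C>T and T>C) are transitions, and each class spans 16 trinucleotide contexts. For comparison, the baseline below uses a genome-wide human Ts:Tv of roughly 2, a commonly quoted approximate figure used here only for illustration and not a value from this study.

```latex
\[
\left.\frac{\mathrm{Ts}}{\mathrm{Tv}}\right|_{\text{uniform}}
  = \frac{16 + 16}{4 \times 16} = \frac{32}{64} = 0.5,
\qquad
f_{\mathrm{Ts}}^{\text{uniform}} = \frac{32}{96} = \frac{1}{3},
\]
\[
\frac{\mathrm{Ts}}{\mathrm{Tv}} \approx 2
\;\Longrightarrow\;
f_{\mathrm{Ts}}^{\text{baseline}} = \frac{2}{1 + 2} \approx \frac{2}{3}.
\]
```

Under these assumptions, a purely uniform process would roughly halve the transition fraction relative to a transition-biased baseline, which is the direction of bias reversal attributed to SBS40 above.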
Further work is needed to test the potential temporal contributions of these two mutational signatures to tumorigenesis. While AID/APOBEC activity is mutagenic and likely contributes to initial tumor formation (Talluri et al. 2021; Shilova et al. 2022), many APOBEC3-induced mutations occur later in tumor evolution (Butler and Banday 2023). Similarly, hypoxic conditions in healthy tissue, due to inflammation, could also precede tumorigenesis (Pham et al. 2021). Once a solid tumor has reached a certain minimum size, we hypothesize that the hypoxic state could enable a second round of driver mutations required for the metabolic changes and angiogenesis that allow the tumor to survive and grow under these new conditions.
Further studies are also needed to determine whether the distinct and remarkably stable mutation spectrum observed in NHM tumors is due to alterations in DNA repair enzymes resulting from common hypoxic conditions among patients, from other unknown causes shared among patients, or from multifactorial convergent evolution at the mutation spectrum level.
In contrast with the NHM spectrum, we observed a wide variability in the spectra of HM tumors. While skin cancers show no evidence of reversing the germline bias, HM samples from colon show a strong bias reversal. The mechanisms underlying hypermutation differ between these tissues—UV exposure in skin and defective mismatch repair (MMR) or polymerase proofreading in the colon—leading to distinct mutational outcomes. In colon cancer, these mechanisms both increase the mutation rate and promote bias reversal, which may contribute to its high incidence by providing more opportunities for driver mutations to arise (Tuffaha et al. 2023). Interestingly, it has been observed that mutations in hypermutated and nonhypermutated colon cancers target different cancer genes (van Geel et al. 2015), and here we show that colon cancer has two distinct spectra in these categories.
By comparing observed and expected mutation counts, we show that NHM tumors have an excess of nonsynonymous mutations in cancer genes in several 3-mer contexts, which we interpret as evidence of positive selection (Martincorena et al. 2017; Weghorn and Sunyaev 2017; Dietlein et al. 2020). This excess is significant not only for nonsynonymous mutations but also for synonymous mutations in a few contexts. This is not entirely unexpected, since several synonymous mutations have previously been identified as cancer driver mutations because of their impact on splicing, RNA secondary structure, and expression levels (Supek et al. 2014; Sharma et al. 2019).
On the other hand, we find no strong signal of positive selection in HM samples. This could be due to the combined effects of clonal interference (Gerrish and Lenski 1998; Tilk et al. 2022) and deleterious load in HM tumors, such that only a small fraction of mutations successfully spread through the population. This effect could dilute signals of positive selection when mutation rates are very high. Since the small fraction of beneficial mutations that do spread are also expected to have very large selective effects, negative epistasis could further reduce the signals of positive selection in HM tumors. This can occur when a single large-effect mutation eliminates the need for further mutations that would otherwise be beneficial, for example when mutations in different cancer driver genes “break” the same regulatory pathway. Moreover, HM tumors are known to have longer sequence context dependencies than those accounted for by our 3-mer model (Pleasance et al. 2010; Dietlein et al. 2020), which may make our detection of positive selection across 3-mer mutations less reliable in HM tumors than in NHM tumors. Nevertheless, the excess of nonsynonymous mutations in the 24 3-mer mutation types that show positive selection in NHM also tends to be anticorrelated with the germline mutation spectrum in HM tumors, although this anticorrelation is significant only in NHM (Fig. 4d,e). This suggests that the underlying fitness landscape may be quite similar for HM and NHM, but that inferring positive selection is simply more difficult in HM.
Our work putatively identified 24 3-mer mutation types that show positive selection in NHM. Classifying the contribution of these mutations to oncogenes and tumor suppressor genes is a clear avenue for future work, as is characterizing the distribution of fitness effects of these mutation types in the germline. More generally, we hope that characterizing the distinct 3-mer spectrum observed in NHM cancers may shed further light on cancer driver mutations and their critical role in early oncogenesis.
Contributor Information
Marwa Z Tuffaha, Mathematics, Western University, London, ON, Canada N6A 5B7.
David Castellano, Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ 85721, USA.
Claudia Serrano Colomé, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain.
Ryan N Gutenkunst, Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ 85721, USA.
Lindi M Wahl, Mathematics, Western University, London, ON, Canada N6A 5B7.
Supplementary Material
Supplementary material is available at Molecular Biology and Evolution online.
Author Contributions
M.Z.T., D.C., R.N.G., and L.M.W. designed the study. M.Z.T. and D.C. performed data analysis; C.S.C. performed spectral decompositions. All authors contributed to data interpretation. M.Z.T., L.M.W., and R.N.G. created the figures. M.Z.T. and D.C. drafted the manuscript. All authors edited and finalized the manuscript.
Funding
This work was supported by the Natural Sciences and Engineering Research Council of Canada grant RGPIN-2019-06294, by the National Institute of General Medical Sciences of the National Institutes of Health through grants R01GM127348 and R35GM149235, and by the Spanish Ministry of Science and Innovation through the Centro de Excelencia Severo Ochoa (CEX2020-001049-S, MCIN/AEI/10.13039/501100011033), the Generalitat de Catalunya through the CERCA programme, and the European Union’s H2020 research and innovation program under Marie Sklodowska-Curie grant agreement No. 754422.
Data Availability
Functional annotations per coding site obtained with SnpEff, the supplementary tables, and the MATLAB code used for the coding-mutation analysis are available at https://github.com/MarwaTuffaha/CancerBiases. Data are reported in the Supplementary Material tables. Supplementary tables S1–S4, Supplementary Material online, provide the data shown in Figs. 1 to 4 in the main text, respectively. Supplementary table S5, Supplementary Material online, provides the 3-mer spectra plotted in supplementary figs. S9 and S10, Supplementary Material online.
References
- Alberts B, Johnson A, Lewis J, Raff M, Roberts K, Walter P. Cancer as a microevolutionary process. In: Molecular biology of the cell. 4th ed. New York: Garland Science; 2002.
- Alexandrov LB, Jones PH, Wedge DC, Sale JE, Campbell PJ, Nik-Zainal S, Stratton MR. Clock-like mutational processes in human somatic cells. Nat Genet. 2015:47(12):1402–1407. 10.1038/ng.3441.
- Alexandrov LB, Kim J, Haradhvala NJ, Huang MN, Tian Ng AW, Wu Y, Boot A, Covington KR, Gordenin DA, Bergstrom EN, et al. The repertoire of mutational signatures in human cancer. Nature. 2020:578(7793):94–101. 10.1038/s41586-020-1943-3.
- Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio SA, Behjati S, Biankin AV, Bignell GR, Bolli N, Borg A, Børresen-Dale A-L, et al. Signatures of mutational processes in human cancer. Nature. 2013:500(7463):415–421. 10.1038/nature12477.
- Bhandari V, Li CH, Bristow RG, Boutros PC. Divergent mutational processes distinguish hypoxic and normoxic tumours. Nat Commun. 2020:11(1):737. 10.1038/s41467-019-14052-x.
- Boot A, Liu M, Stantial N, Shah V, Yu W, Nitiss KC, Nitiss JL, Jinks-Robertson S, Rozen SG. Recurrent mutations in topoisomerase IIα cause a previously undescribed mutator phenotype in human cancers. Proc Natl Acad Sci U S A. 2022:119(4):e2114024119. 10.1073/pnas.2114024119.
- Brown JS, Amend SR, Austin RH, Gatenby RA, Hammarlund EU, Pienta KJ. Updating the definition of cancer. Mol Cancer Res. 2023:21(11):1142–1147. 10.1158/1541-7786.MCR-23-0411.
- Butler K, Banday AR. APOBEC3-mediated mutagenesis in cancer: causes, clinical significance and therapeutic potential. J Hematol Oncol. 2023:16(1):31. 10.1186/s13045-023-01425-5.
- Cancer Research UK. Age and cancer. [accessed 2024 Feb 13]. https://www.cancerresearchuk.org/about-cancer/causes-of-cancer/age-and-cancer. 2024.
- Chen Z-Y, Petetin H, Méndez Turrubiates RF, Achebak H, Pérez García-Pando C, Ballester J. Population exposure to multiple air pollutants and its compound episodes in Europe. Nat Commun. 2024:15(1):2094. 10.1038/s41467-024-46103-3.
- Couce A, Guelfo JR, Blázquez J. Mutational spectrum drives the rise of mutator bacteria. PLoS Genet. 2013:9(1):e1003167. 10.1371/journal.pgen.1003167.
- Díaz-Gay M, Vangara R, Barnes M, Wang X, Islam SMA, Vermes I, Duke S, Narasimman NB, Yang T, Jiang Z, et al. Assigning mutational signatures to individual samples and individual somatic mutations with SigProfilerAssignment. Bioinformatics. 2023:39(12):btad756. 10.1093/bioinformatics/btad756.
- Dietlein F, Weghorn D, Taylor-Weiner A, Richters A, Reardon B, Liu D, Lander ES, Van Allen EM, Sunyaev SR. Identification of cancer driver genes based on nucleotide context. Nat Genet. 2020:52(2):208–218. 10.1038/s41588-019-0572-y.
- Garibyan L, Fisher DE. How sunlight causes melanoma. Curr Oncol Rep. 2010:12(5):319–326. 10.1007/s11912-010-0119-y.
- Gerrish PJ, Lenski RE. The fate of competing beneficial mutations in an asexual population. Genetica. 1998:102/103:127–144. 10.1023/A:1017067816551.
- Gomez K, Bertram J, Masel J. Mutation bias can shape adaptation in large asexual populations experiencing clonal interference. Proc Biol Sci. 2020:287(1937):20201503. 10.1098/rspb.2020.1503.
- Hanahan D. Hallmarks of cancer: new dimensions. Cancer Discov. 2022:12(1):31–46. 10.1158/2159-8290.CD-21-1059.
- Hanahan D, Weinberg RA. The hallmarks of cancer. Cell. 2000:100(1):57–70. 10.1016/S0092-8674(00)81683-9.
- Haynes T, Gilbert MR, Breen K, Yang C. Pathways to hypermutation in high-grade gliomas: mechanisms, syndromes, and opportunities for immunotherapy. Neurooncol Adv. 2024:6(1):vdae105. 10.1093/noajnl/vdae105.
- Hess JM, Bernards A, Kim J, Miller M, Taylor-Weiner A, Haradhvala NJ, Lawrence MS, Getz G. Passenger hotspot mutations in cancer. Cancer Cell. 2019:36(3):288–301. 10.1016/j.ccell.2019.08.002.
- Jin H, Gulhan DC, Geiger B, Ben-Isvy D, Geng D, Ljungström V, Park PJ. Accurate and sensitive mutational signature analysis with MuSiCal. Nat Genet. 2024:56(3):541–552. 10.1038/s41588-024-01659-0.
- Kaplan AR, Glazer PM. Impact of hypoxia on DNA repair and genome integrity. Mutagenesis. 2020:35(1):61–68. 10.1093/mutage/gez019.
- Kucab JE, Zou X, Morganella S, Joel M, Nanda AS, Nagy E, Gomez C, Degasperi A, Harris R, Jackson SP, et al. A compendium of mutational signatures of environmental agents. Cell. 2019:177(4):821–836. 10.1016/j.cell.2019.03.001.
- Li R, Di L, Li J, Fan W, Liu Y, Guo W, Liu W, Liu L, Li Q, Chen L, et al. A body map of somatic mutagenesis in morphologically normal human tissues. Nature. 2021:597(7876):398–403. 10.1038/s41586-021-03836-1.
- Li S, Crawford FW, Gerstein MB. Using sigLASSO to optimize cancer mutation signatures jointly with sampling likelihood. Nat Commun. 2020:11(1):3575. 10.1038/s41467-020-17388-x.
- Martincorena I, Raine KM, Gerstung M, Dawson KJ, Haase K, Van Loo P, Davies H, Stratton MR, Campbell PJ. Universal patterns of selection in cancer and somatic tissues. Cell. 2017:171(5):1029–1041. 10.1016/j.cell.2017.09.042.
- Meier B, Volkova NV, Hong Y, Schofield P, Campbell PJ, Gerstung M, Gartner A. Mutational signatures of DNA mismatch repair deficiency in C. elegans and human cancers. Genome Res. 2018:28(5):666–675. 10.1101/gr.226845.117.
- Milligan WR, Amster G, Sella G. The impact of genetic modifiers on variation in germline mutation rates within and among human populations. Genetics. 2022:221(4):iyac087. 10.1093/genetics/iyac087.
- Mo S-F, Cai Z-Z, Kuai W-H, Li X, Chen Y-T. Universal cutoff for tumor mutational burden in predicting the efficacy of anti-PD-(L)1 therapy for advanced cancers. Front Cell Dev Biol. 2023:11:1209243. 10.3389/fcell.2023.1209243.
- Moore L, Cagan A, Coorens TH, Neville MD, Sanghvi R, Sanders MA, Oliver TR, Leongamornlert D, Ellis P, Noorani A, et al. The mutational landscape of human somatic and germline cells. Nature. 2021:597(7876):381–386. 10.1038/s41586-021-03822-7.
- National Cancer Institute. What is cancer? [accessed 2024 Jan 23]. https://www.cancer.gov/about-cancer/understanding/what-is-cancer. 2021.
- Ng AS, Chan DKH. Commonalities and differences in the mutational signature and somatic driver mutation landscape across solid and hollow viscus organs. Oncogene. 2023:42(37):2713–2724. 10.1038/s41388-023-02802-7.
- Nik-Zainal S, Alexandrov LB, Wedge DC, Van Loo P, Greenman CD, Raine K, Jones D, Hinton J, Marshall J, Stebbings LA, et al. Mutational processes molding the genomes of 21 breast cancers. Cell. 2012:149(5):979–993. 10.1016/j.cell.2012.04.024.
- Park S, Mali NM, Kim R, Choi J-W, Lee J, Lim J, Park JM, Park JW, Kim D, Kim T, et al. Clonal dynamics in early human embryogenesis inferred from somatic mutation. Nature. 2021:597(7876):393–397. 10.1038/s41586-021-03786-8.
- Pecori R, Di Giorgio S, Paulo Lorenzo J, Nina Papavasiliou F. Functions and consequences of AID/APOBEC-mediated DNA and RNA deamination. Nat Rev Genet. 2022:23(8):505–518. 10.1038/s41576-022-00459-8.
- Pham K, Parikh K, Heinrich EC. Hypoxia and inflammation: insights from high-altitude physiology. Front Physiol. 2021:12:676782. 10.3389/fphys.2021.676782.
- Pleasance ED, Cheetham RK, Stephens PJ, McBride DJ, Humphray SJ, Greenman CD, Varela I, Lin M-L, Ordóñez GR, Bignell GR, et al. A comprehensive catalogue of somatic mutations from a human cancer genome. Nature. 2010:463(7278):191–196. 10.1038/nature08658.
- Raynes Y, Sniegowski PD. Experimental evolution and the dynamics of genomic mutation rate modifiers. Heredity (Edinb). 2014:113(5):375–380. 10.1038/hdy.2014.49.
- Rodriguez-Galindo M, Casillas S, Weghorn D, Barbadilla A. Germline de novo mutation rates on exons versus introns in humans. Nat Commun. 2020:11(1):3304. 10.1038/s41467-020-17162-z.
- Rosenthal R, McGranahan N, Herrero J, Taylor BS, Swanton C. deconstructSigs: delineating mutational processes in single tumors distinguishes DNA repair deficiencies and patterns of carcinoma evolution. Genome Biol. 2016:17(1):31. 10.1186/s13059-016-0893-4.
- Sane M, Diwan GD, Bhat BA, Wahl LM, Agashe D. Shifts in mutation spectra enhance access to beneficial mutations. Proc Natl Acad Sci U S A. 2023:120(22):e2207355120. 10.1073/pnas.2207355120.
- Serrano Colome C, Canal Anton O, Seplyarskiy V, Weghorn D. Mutational signature decomposition with deep neural networks reveals origins of clock-like processes and hypoxia dependencies. bioRxiv. 2023. 10.1101/2023.12.06.570467. Preprint: not peer reviewed.
- Sharma Y, Miladi M, Dukare S, Boulay K, Caudron-Herger M, Groß M, Backofen R, Diederichs S. A pan-cancer analysis of synonymous mutations. Nat Commun. 2019:10(1):2569. 10.1038/s41467-019-10489-2.
- Shilova O, Tsyba D, Shilov E. Mutagenic activity of AID/APOBEC deaminases in antiviral defense and carcinogenesis. Mol Biol. 2022:56(1):46–58. 10.1134/S002689332201006X.
- Storz JF, Natarajan C, Signore AV, Witt CC, McCandlish DM, Stoltzfus A. The role of mutation bias in adaptive molecular evolution: insights from convergent changes in protein function. Phil Trans R Soc B. 2019:374(1777):20180238. 10.1098/rstb.2018.0238.
- Supek F, Miñana B, Valcárcel J, Gabaldón T, Lehner B. Synonymous mutations frequently act as driver mutations in human cancers. Cell. 2014:156(6):1324–1335. 10.1016/j.cell.2014.01.051.
- Swat W, Kumar D, Boyle TA, Dalton E, Comer S, Solomon J, Golem S, Eyring KR, Baskovich B, Vashisht A, et al. Distributional range of tumor mutational burden (TMB) across multiple cancer types emphasizing the importance of different cutoff values in different tumor types for clinical interpretation: A consensus statement from the SEQUOIA consortium. 2023.
- Talluri S, Samur MK, Buon L, Kumar S, Potluri LB, Shi J, Prabhala RH, Shammas MA, Munshi NC. Dysregulated APOBEC3G causes DNA damage and promotes genomic instability in multiple myeloma. Blood Cancer J. 2021:11(10):166. 10.1038/s41408-021-00554-9.
- Tilk S, Tkachenko S, Curtis C, Petrov DA, McFarland CD. Most cancers carry a substantial deleterious load due to Hill-Robertson interference. Elife. 2022:11:e67790. 10.7554/eLife.67790.
- Tomlinson IP, Novelli M, Bodmer W. The mutation rate and cancer. Proc Natl Acad Sci U S A. 1996:93(25):14800–14803. 10.1073/pnas.93.25.14800.
- Tuffaha MZ, Varakunan S, Castellano D, Gutenkunst RN, Wahl LM. Shifts in mutation bias promote mutators by altering the distribution of fitness effects. Am Nat. 2023:202(4):503–518. 10.1086/726010.
- van Geel RM, Beijnen JH, Bernards R, Schellens JH. Treatment individualization in colorectal cancer. Curr Colorectal Cancer Rep. 2015:11(6):335–344. 10.1007/s11888-015-0288-z.
- Weghorn D, Sunyaev S. Bayesian inference of negative and positive selection in human cancers. Nat Genet. 2017:49(12):1785–1788. 10.1038/ng.3987.
- Wellcome Sanger Institute. Cancer Gene Census. [accessed 2024 Feb 06]. https://cancer.sanger.ac.uk/census. 2024a.
- Wellcome Sanger Institute. Catalogue of Somatic Mutations in Cancer. [accessed 2024 Feb 06]. https://cancer.sanger.ac.uk/cosmic. 2024b.
- Yuza K, Nagahashi M, Watanabe S, Takabe K, Wakai T. Hypermutation and microsatellite instability in gastrointestinal cancers. Oncotarget. 2017:8(67):112103–112115. 10.18632/oncotarget.v8i67.
- Zhou C, Chen S, Xu F, Wei J, Zhou X, Wu Z, Zhao L, Liu J, Guo W. Estimating tumor mutational burden across multiple cancer types using whole-exome sequencing. Ann Transl Med. 2021:9(18):1437. 10.21037/atm.