Science Advances. 2021 Nov 3;7(45):eabg4398. doi: 10.1126/sciadv.abg4398

Deficiency of replication-independent DNA mismatch repair drives a 5-methylcytosine deamination mutational signature in cancer

Hu Fang 1, Xiaoqiang Zhu 1, Haocheng Yang 1, Jieun Oh 1, Jayne A Barbour 1, Jason W H Wong 1,*
PMCID: PMC8565909  PMID: 34730999

DNA mismatch repair is essential for protecting the human genome from damage induced by 5-methylcytosine deamination.

Abstract

Multiple mutational signatures have been associated with DNA mismatch repair (MMR)–deficient cancers, but the mechanisms underlying their origin remain unclear. Here, using mutation data from cancer genomes, we identify a previously unknown function of MMR that is able to protect genomes from 5-methylcytosine (5mC) deamination–induced somatic mutations in a replication-independent manner. Cancers with deficiency of MMR proteins MSH2/MSH6 (MutSα) exhibit mutational signature contributions distinct from those that are deficient in MLH1/PMS2 (MutLα). This disparity arises from unrepaired 5mC deamination–induced mismatches rather than replicative DNA polymerase errors. In cancers with biallelic loss of MBD4 DNA glycosylase, repair of 5mC deamination damage is strongly associated with H3K36me3 chromatin, implicating MutSα as the essential factor in its repair. We thus uncover a noncanonical role of MMR in the protection against 5mC deamination–induced mutation in human cancers.

INTRODUCTION

DNA mismatch repair (MMR) is a highly conserved DNA repair pathway, well known for its ability to recognize and remove errors from the newly synthesized DNA strand during replication (1). Cancer genomes with MMR deficiency (MMRd) usually display a high mutational burden and a microsatellite instability (MSI) phenotype (2). Inactivation of DNA MMR genes is frequently observed in MMRd cancers and predisposes to the most common form of hereditary colorectal cancer (CRC) known as Lynch syndrome (3). Previous studies have comprehensively characterized the landscape of MMRd cancer genomes, especially with a close inspection on MSI events and how these events influence cancer-associated genes and oncogenic pathways (4, 5). These results illuminate selective pressures and facilitate the identification of potential cancer drivers. Furthermore, tumors with MMRd have been shown to benefit from immune checkpoint blockade, with response associated with MSI status and potentially increased neoantigen presentation (6).

The general mechanism of MMR in the correction of DNA replication errors in humans is well established. The process is initiated by the MutSα heterodimer, composed of MSH2 and MSH6, which recognizes misincorporated bases in double-stranded DNA behind the replication fork. Subsequently, the MutLα heterodimer, which is composed of MLH1 and PMS2, is recruited to excise the sequence surrounding the mutated base. The mismatched section of the daughter strand is digested by exonuclease 1 (EXO1), and the gap is filled by a DNA polymerase (1). MMR is also known to have noncanonical roles outside the context of DNA replication (7). One of the better-known functions of noncanonical MMR (ncMMR) is facilitating somatic hypermutation of the immunoglobulin locus in lymphoid cells (8). ncMMR has also been shown to be activated by DNA lesions, resulting in an error-prone repair process that leads to the formation of A:T mutations (9–11). While these studies have found that ncMMR is generally associated with increased mutagenesis through the recruitment of error-prone DNA polymerase eta, more recently, it has also emerged that ncMMR is capable of protecting actively transcribed genes by removing DNA lesions in a transcription-dependent manner (12). The mechanistic details of how ncMMR achieves error-free repair remain unclear, but in an artificial in vitro model, it has been shown that ncMMR can indirectly facilitate error-free DNA demethylation (13), suggesting that high-fidelity ncMMR is at least possible.

The use of somatic mutational signatures has contributed substantially to our understanding of the underlying mutational processes in cancer (14). To date, seven single-base substitution (SBS) mutational signatures have been associated with MMRd across various cancer types (15). MMRd mutational signatures, SBS6 and SBS15, are characterized by a high frequency of C > T mutations; SBS21 and SBS26 have a dominant mutation spectrum of T > C, while SBS44 has relatively high contributions from C > A, C > T, and T > C mutations. SBS14 and SBS20 are associated with MMRd concurrent with polymerase epsilon (POLE) and polymerase delta (POLD1) exonuclease mutations, respectively. Recently, it was shown that two mutational processes largely underlie the MMRd-specific mutational signatures (16). Both of these mutational processes are believed to be associated with DNA polymerase–associated replication errors, yet only one has been reproducible in vitro using clonal models of MMRd cell lines (16, 17). Thus, the mechanisms underlying the mutational processes and the mutational signatures observed in human MMRd cancers remain unclear.

In this study, we provide evidence that ncMMR is required for the repair of 5-methylcytosine (5mC) deamination–induced G:T mismatches (we will refer to this as 5mC deamination damage) outside of the context of DNA replication. We show that mutations arising from unrepaired 5mC deamination events are prevalent in MMRd cancers, particularly those deficient in MutSα. Therefore, ncMMR activity is able to protect genomes from 5mC deamination–induced somatic mutations.

RESULTS

Profile of MMR genes in MSI-H tumors

We first investigated three tumor types [CRC, stomach adenocarcinoma (STAD), and uterine corpus endometrial carcinoma (UCEC)], as they have been recognized as MSI prone and contain most of the MSI events (4). A set of 316 cancer samples with MSI-high (MSI-H) phenotype was obtained from The Cancer Genome Atlas (TCGA) Pan-Cancer cohort for these tumor types. To focus on samples where the major mutational process is MMRd, we excluded samples with concurrent POLE and POLD1 exonuclease domain mutations as defined by high contributions from SBS14 and SBS20, respectively (18). The remaining samples were stratified into MutLα and MutSα mutants after careful review of the underlying mutations, RNA expression, and DNA methylation status of the four canonical MMR genes (MLH1, PMS2, MSH2, and MSH6) (see Materials and Methods, fig. S1, and table S1). In the end, 197 MutLα- and 18 MutSα-deficient cancer samples were identified. In addition, 14 samples were found with both MutLα and MutSα defects, and 37 samples were unable to be conclusively classified based on the available data.

As expected, all the MSI-H samples showed high mutation load for both indels and single-base substitutions with high proportions of C > T and T > C (Fig. 1A). MLH1 promoter methylation (defined as β > 0.25) was observed in 78% (149 of 191) of MSI-H samples with available methylation data, and this is in line with previous studies (19, 20). For the curated MutLα-deficient cancers, 93.4% (184 of 197) had aberrant expression of MLH1, while most with MutSα deficiency harbored truncating mutations in either MSH2 or MSH6 (Fig. 1A). To compare the mutation spectrum for MutLα- and MutSα-deficient samples, we performed principal components analysis based on trinucleotide context point mutation frequencies. Samples with MutSα deficiency were clustered together with a relatively high frequency of C > T mutations, while MutLα-deficient cancers were more distributed with a broader mutation spectrum (Fig. 1B). These results suggest that there may be differences underlying the mutational process of MutLα and MutSα deficiency.

Fig. 1. Landscape of MSI-H samples.

(A) Profile of mutation burden and six types of mutation frequency as well as the aberrant status of MMR genes including DNA mutation, RNA expression, and methylation. Cancer types and mutants classification are also indicated. (B) Principal components (PC) analysis of MSI-H cancer samples based on the frequency of 96 types of mutational contexts. The fractions of the six types of mutations are represented by the area of the sectors, and MutSα mutants are highlighted in red. SNV, single nucleotide variant; nDel, insertion and deletion.

MutSα- and MutLα-deficient cancers display differential mutational signatures

To determine whether MutSα- and MutLα-deficient cancers have different mutational processes, we used SigFit (21) to assign the somatic mutations of each sample to the five MMRd-associated SBS mutational signatures along with the age-associated SBS1, which is present in most cancers. SBS1 contributed to an unexpectedly high proportion of mutations in many MMRd samples (Fig. 2A). However, this may reflect the difficulty in resolving SBS1 and SBS6, as the two signatures show a high degree of similarity. Generally, MutSα-deficient cancers had the highest contribution from SBS1 + SBS6 (Fig. 2A). To simplify the representation of mutational processes in MMRd cancers, we adopted the use of two signatures proposed by Nemeth et al. (16). Using the nonnegative matrix factorization algorithm, two de novo signatures were decomposed from the mutations of the MMRd samples. In line with Nemeth et al. (16), two distinctive de novo signatures were obtained: signature A (SigA) was characterized by a high frequency of C > T mutations, particularly in CpG sites, while SigB showed a broader spectrum with C > A, C > T, and T > C mutations (Fig. 2B). We then calculated the cosine similarity between the reconstructed spectrum derived from these two signatures and the actual mutational spectrum for each sample. Most of the samples had a high cosine similarity above 0.85 (fig. S2A). Next, we compared the newly decomposed signatures with the previously reported MMRd-related signatures (15). SigA showed relatively high cosine similarity with SBS6 and SBS15, while SigB was more similar to SBS21 and SBS26 (fig. S2B).
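The reconstruction check described above can be illustrated with a minimal Python sketch. It rebuilds a spectrum as a weighted sum of two signature profiles and compares it with an observed spectrum by cosine similarity. The four-channel profiles, exposures, and counts are invented for illustration (real spectra use 96 trinucleotide channels), the function names are hypothetical, and the sketch does not reproduce the SigFit or nonnegative matrix factorization fitting itself.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for an all-zero vector")
    return dot / (norm_u * norm_v)


def reconstruct(signatures, exposures):
    """Weighted sum of signature profiles, one weight (exposure) per signature."""
    n_channels = len(signatures[0])
    return [
        sum(weight * sig[i] for sig, weight in zip(signatures, exposures))
        for i in range(n_channels)
    ]


# Toy example with 4 channels; all numbers are illustrative assumptions.
sig_a = [0.7, 0.1, 0.1, 0.1]   # hypothetical C > T dominated profile
sig_b = [0.1, 0.3, 0.3, 0.3]   # hypothetical broader profile
observed = [60, 14, 12, 14]    # hypothetical mutation counts for one sample

fitted = reconstruct([sig_a, sig_b], [0.8, 0.2])
similarity = cosine_similarity(observed, fitted)
print(f"cosine similarity = {similarity:.3f}")
```

Cosine similarity is insensitive to overall scale, so the observed counts and the fitted proportions can be compared directly without normalizing either vector.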

Fig. 2. Mutation signature contribution in MutSα and MutLα mutants.

(A) Fraction of MMRd-associated signatures and age-related signature SBS1 contribution in MutSα and MutLα mutants. (B) The spectrum of de novo signatures extracted from TCGA MSI-H cancer samples. (C to F) The boxplot of SigA contribution for MutSα and MutLα mutants in TCGA-MMRd, MSK-CRC, MSK-UCEC, and DepMap MMRd cohort. P values were calculated by two-tailed Student’s t test.

Because we found that the mutation spectrum of MutSα-deficient samples had a generally higher proportion of C > T mutations and SBS1 + SBS6 relative to MutLα-deficient samples (Figs. 1, A and B, and 2A), we sought to examine the contribution of the two de novo signatures in each MMRd sample. Samples with MutSα deficiency had significantly higher contribution of SigA relative to those with MutLα deficiency (P < 0.001, Student’s t test; Fig. 2C), with samples deficient in both complexes having SigA contribution lower than MutSα-deficient samples but not significantly different from MutLα-deficient samples (P < 0.01 and P = 0.41, respectively, Student’s t test; fig. S2C). To exclude any potential cancer-specific effect, we examined the signature contribution in CRC, STAD, and UCEC separately and found that SigA is significantly more enriched in MutSα-deficient samples compared with MutLα-deficient samples across all cancer types (P < 0.01, Student’s t test; fig. S2, D to F). We next expanded our data to three independent cohorts to validate this observation. Because of the limited availability of data types for these cohorts, the approach to classify MutSα and MutLα deficiency status is slightly different from the TCGA dataset (see Materials and Methods). MSK-CRC (22) and MSK-UCEC (23) cohorts contain 99 CRC and 22 UCEC MSI-H samples, respectively, with only targeted sequencing data available. After fitting the mutations from the samples to SigA and SigB, both cohorts showed significant enrichment of SigA in MutSα-deficient samples compared with MutLα-deficient samples (P < 0.001 and P = 0.04 for CRC and UCEC, respectively, Student’s t test; Fig. 2, D and E). Furthermore, the analysis was also performed on the DepMap (24) cohort comprising 99 MMRd cell lines across 16 cancer types. Again, MutSα-deficient samples had significant enrichment in SigA compared with MutLα-deficient samples (P = 0.001, Student’s t test; Fig. 2F).
The relative contribution of SigB for these cohorts is shown in fig. S2G. Last, as complex MSH2 and MSH6 mutations are a frequent mechanism of MSI in prostate cancer (25), we identified a further four MutSα mutant samples in the TCGA prostate adenocarcinoma cohort and confirmed them all to have high SigA contribution (>0.845; fig. S3). Together, these results suggest that these two MMR-associated mutational processes contribute to a different extent to the overall mutation spectrum of MutSα- and MutLα-deficient cancers.

CpG C > T mutations in MutSα mutants show no replication strand bias compared with non-CpG C > T mutations

Mutation density varies across cancer genomes (26). Because of differential MMR efficiency, strong correlation is found between DNA replication timing and mutation density, where late replicating regions have higher mutation density compared with early replicating regions of the genome (27). In MMRd cancers, the association of mutation density and replication timing becomes less apparent for MMR-dependent mutational processes. Given that we found different contributions of mutational processes in MutSα- and MutLα-deficient cancers, we sought to determine how the processes are influenced by replication timing. As expected, higher mutation density was observed in late replicating regions compared with early replicating regions in MMR-proficient microsatellite stable (MSS) cancers, but the difference was reduced in MutSα- and MutLα-deficient MSI samples (fig. S4A). We observed a significant difference in the dependence of the mutation load and replication time between MutSα and MutLα (fig. S4B) and ascribed this difference to CpG C > T mutations but not non-CpG C > T or T > C mutations (fig. S4, C to O). These data suggest that there may be differences in the dependence of CpG C > T mutations on replication compared with other types of substitution mutations.

We next investigated the replication strand bias of mutations to further examine the relationship between mutation formation in MMRd cancers and DNA replication. Mutations in MMRd samples have generally been attributed to unrepaired errors that have escaped from polymerase proofreading during DNA replication (28). Because of the differential fidelity of polymerases during DNA synthesis, mutation loads in the leading and lagging strands are expected to be asymmetric (29). We examined the replication asymmetry of C > T/G > A and A > G/T > C mutations in the leading and lagging strands. Both mutation types showed strand bias, with the leading strand generating more C > T and A > G mutations (P < 0.05 for both, chi-square test; Fig. 3A). The lack of strand bias for exome-wide–simulated mutations confirmed that this bias is related to replication rather than sequence composition (Fig. 3B). We next compared the strand bias for each individual MutSα- and MutLα-deficient sample. While there was no difference in replication bias in A > G/T > C mutations between the MutLα- and MutSα-deficient samples (P = 0.849, Student’s t test; Fig. 3C), the bias of C > T/G > A mutations was significantly different, with MutSα-deficient samples showing less asymmetry than MutLα-deficient samples (P = 0.002, Student’s t test; Fig. 3D). Furthermore, we compared CpG C > T and non-CpG C > T replication bias individually for MutSα- and MutLα-deficient samples. CpG C > T mutations showed significantly less bias compared with non-CpG C > T mutations (P < 0.001 for both MutSα and MutLα, Student’s t test; Fig. 3E). We further observed a significant difference in CpG C > T strand bias between MutSα- and MutLα-deficient samples (P < 0.001, Student’s t test), while there was no significant difference for non-CpG C > T mutations (Fig. 3E).
Although there were insufficient whole genome–sequenced (WGS) samples with MutSα deficiency (n = 1), the increased number of mutations in WGS data allowed us to assess strand bias across all mutation types (fig. S5, A to F), and consistent results were observed in the strand bias difference between CpG C > T and non-CpG C > T mutations (MutLα, P < 0.001, Student’s t test; fig. S5G). Reduced strand bias of CpG C > T mutations was also observed in WGS MSS samples (P < 0.001; fig. S5H). Furthermore, as some MutLα-deficient samples had relatively high SigA contribution, we directly correlated SigA contribution with the degree of CpG C > T replication bias. A negative correlation was observed, with samples with high proportions of SigA showing less bias [correlation coefficient (R) = −0.29, P < 0.0001; fig. S5I]. Together, these results suggest that CpG C > T mutations associated with SigA in human MMRd cancers may not have arisen from unrepaired DNA polymerase errors, as they do not exhibit the characteristic replication asymmetry found for other types of substitution mutations.
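This section does not state how replication strand bias was quantified, so the following Python sketch assumes one common convention: a signed log2 ratio of leading-strand to lagging-strand mutation counts, where zero means no asymmetry. The function name, the pseudocount, and all counts are hypothetical and chosen only to show how a weakly biased class (such as CpG C > T) would compare with a strongly biased one.

```python
import math


def strand_bias(leading, lagging, pseudocount=0.5):
    """Signed log2 ratio of leading- to lagging-strand mutation counts.

    Assumed convention (not taken from the source): 0 means symmetric,
    positive means an excess on the leading strand. The pseudocount
    avoids division by zero for sparse mutation classes.
    """
    if leading < 0 or lagging < 0:
        raise ValueError("counts must be non-negative")
    return math.log2((leading + pseudocount) / (lagging + pseudocount))


# Illustrative counts only; these are not values from the study.
cpg_bias = strand_bias(leading=510, lagging=490)
non_cpg_bias = strand_bias(leading=600, lagging=400)

print(f"CpG C > T bias:     {cpg_bias:+.3f}")
print(f"non-CpG C > T bias: {non_cpg_bias:+.3f}")
```

A per-sample value of this kind is what a two-group comparison, such as MutSα-deficient versus MutLα-deficient samples, would then be run on.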

Fig. 3. Replication asymmetry for MMRd samples.

The landscape of replication asymmetry for all observed mutations (A) and expected mutations (B) in MutSα and MutLα mutants. The expected mutations were obtained from simulation data that consider the abundance of trinucleotide mutational contexts. (C and D) Boxplot of replication strand bias for A > G/T > C and C > T/G > A mutations in MutSα and MutLα mutants. (E) Boxplot of replication strand bias for CpG C > T and non-CpG C > T mutations in MutSα and MutLα mutants. WXS, whole exome sequencing. ***P < 0.001; n.s., P > 0.05, two-tailed Student’s t test.

Most CpG C > T mutations in MMRd cancers are caused by the deamination of 5mC

5mC has the tendency to undergo spontaneous deamination into thymine and is a major mutagenic process in the human genome (30). Methyl-binding domain 4 (MBD4) is a key DNA glycosylase responsible for the removal of 5mC deamination damage (31). Cancer patients with biallelic germline MBD4 deficiency present with extremely high frequency of CpG C > T mutations, providing a model for mutations induced by 5mC deamination (32). CpG C > T mutations in MBD4-deficient cancers are likely to arise outside the context of DNA replication, thus replication asymmetry would not be expected. To test if this is the case, we then obtained somatic mutations from WGS MBD4-deficient cancers (32) and compared the CpG C > T replication strand bias with MMRd mutants. We also included POLE exonuclease domain mutant cancers as their somatic mutations are known to be predominantly leading strand biased (33), and it has been proposed that POLE mutants are particularly erroneous when replicating 5mC (34). MBD4 mutants show little strand bias for all mutation types, while there was substantial strand bias for POLE mutants (Fig. 4A), an observation not present in simulated mutations (Fig. 4B). The strand bias of CpG C > T mutations was close to zero for MBD4 mutants, while, as expected, POLE mutants presented strong leading strand bias for both CpG and non-CpG C > T mutations (Fig. 4, C and D). Although CpG C > T mutations for POLE mutants were highly enriched in TCG trinucleotide context, we observed consistent CpG C > T mutation strand bias in other contexts (fig. S6A). These results suggest that the pattern of MMRd CpG C > T mutations is more similar to MBD4 mutants where the mutations arise from the deamination of 5mC and are less dependent on the process of DNA replication.

Fig. 4. Replication asymmetry for MBD4 and POLE mutants.

The landscape of replication asymmetry for all observed mutations (A) and expected mutations (B) in MBD4 and POLE mutants. The expected mutations are obtained from simulation data that consider the abundance of trinucleotide mutational contexts. (C and D) Boxplot of replication strand bias for CpG C > T and non-CpG C > T mutations in MBD4 and POLE mutants. (E and F) Boxplot of replication strand bias for CpG C > T mutations in highly methylated and lowly methylated regions for MMRd samples and POLE mutants. Sites with a β > 0.3 are defined as highly methylated, while <0.3 as lowly methylated. The range of mutation counts in lowly and highly methylated sites for calculating strand bias in MMRd samples was (52 to 156) and (2347 to 6145), respectively, and for POLE mutants, (72 to 703) and (4032 to 39,580), respectively.

As CpG C > T mutation rate increases with methylation level in MMRd samples (35), we compared replication strand bias of CpG C > T mutations in lowly and highly methylated sites in MMRd and POLE mutant cancers. Mutations at highly methylated regions showed significantly less strand bias compared with lowly methylated sites for MMRd samples (P = 0.0233, Student’s t test; Fig. 4E), while POLE mutants presented comparably strong strand bias at both lowly and highly methylated regions (P = 0.80, Student’s t test; Fig. 4F). Although there were limited samples for MBD4 mutants, we observed low strand bias of CpG C > T mutations at highly methylated regions (fig. S6B). This further supports our hypothesis that CpG C > T mutations in MMR-deficient cancers arise from replication-independent deamination of 5mC.

MMR repairs 5mC deamination damage in an H3K36me3-dependent manner

The histone mark H3K36me3 is an important epigenetic modification involved in the recruitment of MMR to chromatin (36). One of the hallmarks of H3K36me3-dependent MMR is the differential repair of exons and introns, where exons have much fewer mutations than expected due to increased H3K36me3 and MMR activity compared with introns (37). To examine whether MMR might play a role in the repair of 5mC deamination damage, we compared the observed and expected CpG C > T mutation densities in exons and introns in MBD4-deficient cancers and compared this to MMRd (i.e., MSI-H), MSS, and POLE mutant cancers (Fig. 5, A to D). Because of the proximity of introns and exons in the genome, comparison of their mutation density automatically controls for transcriptional activity and replication timing, both of which are also correlated with H3K36me3 (38). Exons have more observed and expected number of CpG C > T mutations compared with introns due to their generally higher GC content and CpG methylation levels (39). Meanwhile, compared to introns, MSS and POLE showed a substantial and significant decrease in observed exonic CpG C > T compared to expected (37.3 and 31.2%; P < 0.0001, one sample t test; Fig. 5, A and B), while this decrease was substantially less in MSI cancers due to the loss of MMR (2.74%; Fig. 5C). Unexpectedly, a substantial and significant reduction in the observed exonic mutation density compared with expected exonic mutation density was also observed in MBD4 mutants (21.6%; P < 0.0001, one sample t test; Fig. 5D). As MBD4 is responsible for the repair of 5mC deamination damage, the decrease in observed exonic mutations in MBD4 mutants suggests that MMR may also be playing a role in the repair of G:T mismatches.
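The exonic decrease quoted above is a relative difference between observed and expected mutation density, calculated as (obs − exp)/exp according to the Fig. 5 legend. A minimal Python sketch of that calculation follows. The function name is hypothetical, and the densities are invented and chosen only so that the result matches the 37.3% reduction reported for MSS cancers; they are not values from the study.

```python
def relative_change(observed, expected):
    """Relative difference (obs - exp) / exp; negative values indicate a deficit."""
    if expected <= 0:
        raise ValueError("expected density must be positive")
    return (observed - expected) / expected


# Hypothetical exonic densities (arbitrary units), for illustration only.
change = relative_change(observed=627, expected=1000)
print(f"exonic change: {change:.1%}")
```

A negative value means that fewer exonic CpG C > T mutations were observed than the trinucleotide-context simulation predicts.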

Fig. 5. Association of mutation frequency with local determinants for different samples.

(A to D) Observed and expected mutation densities in exons and introns across MSS, POLE mutants, MMRd samples, and MBD4 mutants. The expected mutations are obtained from simulation data that consider the abundance of trinucleotide mutational contexts. The decrease of observed (obs) relative to expected (exp) mutation density in exonic regions is indicated and calculated as (obs − exp)/exp. (E) Correlation of CpG C > T mutation ratio (obs/exp) with gene expression for MMRd samples, MSS, and MBD4 mutants. The P values of the correlation are 7.7 × 10⁻⁴, <2.2 × 10⁻¹⁶, and 0.167 for MBD4 mutants, MMRd samples, and MSS, respectively, and they were obtained from the linear regression model by fitting observed mutation density with unbinned gene expression. (F to H) Boxplot of transcription strand bias for CpG C > T and non-CpG C > T mutations in MMRd samples, MSS samples, and MBD4 mutants. (I) The hazard ratio (HR) of different epigenetic marks for CpG C > T mutation formation from multivariable logistic regression model. The 95% confidence level is indicated. P value is calculated by Wald’s test. (J) Correlation between mutations in MBD4 mutants and H3K36me3 signal from mobilized CD34+ primary cells.

Recently, the MMR system has been reported to preferentially protect actively transcribed genes from mutations during transcription (12). Consistent with this, we found a negative correlation between CpG C > T mutation density and gene expression level for MSS and MBD4 mutants, while there were more mutations in highly expressed genes for MMRd samples (Fig. 5E). To determine whether transcription-coupled nucleotide excision repair (TC-NER) may also have a role in repairing 5mC deamination damage, we examined the transcription strand bias in MSI, MSS, and MBD4 mutant cancers. We found that the transcription strand bias of CpG C > T mutations was close to zero in all cases (Fig. 5, F to H), suggesting that TC-NER is not involved in its repair. In MBD4 mutant cancers, this lack of transcription strand bias further suggests that MMR is likely to be playing a role in the repair of the G:T mismatches. We next examined the association of CpG C > T mutation and different epigenetics marks for MBD4 mutants by multivariable logistic regression. Since CpG C > T mutations are highly dependent on methylation (fig. S7A), we only selected mutations with highly methylated CpG (>0.9) to ensure that we delineate the impact of histone modifications from DNA methylation. Apart from replication timing, we identified histone mark H3K36me3 had the lowest hazard ratio (HR) for CpG C > T mutation formation (HR = 0.88, P < 0.001; Fig. 5I and table S2). There were fewer mutations in the regions of high H3K36me3 signal, suggesting that, in the absence of MBD4, the repair of CpG C > T mutations is dependent on H3K36me3 (Fig. 5J). H3K36me3 also positively correlates with methylation level (fig. S7B). This suggests that, in MBD4 mutants, although CpG methylation is the strongest determinant of mutation density, H3K36me3 activity is also an important factor for accounting for CpG mutation density. 
Together, these results suggest that even in the absence of MBD4, MMR has some capacity to facilitate the repair of 5mC deamination damage in an H3K36me3-dependent manner.

MMR rather than MBD4 is essential for the repair of 5mC deamination damage in regions with high H3K36me3

While purified MBD4 can excise mismatched bases from DNA in vitro (40), it is unclear whether MBD4 can repair 5mC deamination damage in the absence of MMR proteins in vivo. Previous studies have shown that MBD4 binding in the genome is strongly associated with DNA methylation density but is only weakly associated with H3K36me3 (41). To determine whether MBD4 can facilitate the repair of 5mC deamination damage, we identified regions of the genome enriched in MBD4 and H3K36me3 based on chromatin immunoprecipitation sequencing (ChIP-seq) data from Encyclopedia of DNA Elements (ENCODE). Because the MBD4 ChIP-seq dataset is only available for the HepG2 cell line, we also used H3K36me3 ChIP-seq data from HepG2 to avoid cell type–specific bias. After removing regions of low mappability, we identified 119,237 1-kb windows in the genome that have either high (top 20%) MBD4 or H3K36me3 (Fig. 6A). We also identified windows with low MBD4 or low H3K36me3. Using the respective regions, we compared the observed/expected mutation density across the windows in MBD4 mutants, MSS, and MMRd (MSI) cancers. In high MBD4 regions, the observed/expected mutation rate was generally lower than the other regions (Fig. 6B). This is due to the lower methylation level of CpGs in these regions despite overall high density of CpG and the enrichment in early replicating regions (fig. S8A). The observed/expected mutation load was broadly similar across the three cancer types, suggesting that MBD4 alone has little impact on observed mutation density. By contrast, for high H3K36me3 regions, in line with the importance of H3K36me3 for MMR, MSI had much more observed/expected mutations than MBD4 mutants and MSS cancers. This supports our earlier results that MMR can repair 5mC deamination damage in the absence of MBD4. 
To exclude the influence of cell type on this observation, we also used H3K36me3 data from HCT116 to identify the top and bottom 20% H3K36me3-occupied regions and found that the differences in the observed/expected mutation ratio among MMRd, MSS, and MBD4 samples were very similar to those from HepG2 (fig. S9). To further quantify the impact of MBD4 binding on 5mC deamination damage, we generated multivariable regression models and found that, in MMRd cancers, MBD4 signal was not associated with lower likelihood of mutation [HR = 1.06; Fig. 6C; see MutLα and MutSα separately in fig. S8 (B and C)]. In MSS cancers, although a small decrease in HR was present for MBD4, H3K36me3 was a larger contributor to mutation likelihood (HR = 0.89 versus HR = 0.96; Fig. 6D).
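The window-based comparison above can be sketched in Python as two steps: rank genomic windows by ChIP-seq signal, keep the top and bottom 20%, and then compute an observed/expected mutation ratio within each group. This is a simplified illustration under stated assumptions. The window identifiers, signal values, and mutation counts are invented, the function names are hypothetical, and pooling counts across windows before taking the ratio is an assumption rather than the procedure confirmed by the text.

```python
def split_by_signal(signal_by_window, fraction=0.2):
    """Return (top, bottom) sets of window ids ranked by signal.

    Ties are broken by the stable sort order of the input mapping.
    """
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    ranked = sorted(signal_by_window, key=signal_by_window.get)
    k = int(len(ranked) * fraction)
    if k == 0:
        return set(), set()
    return set(ranked[-k:]), set(ranked[:k])


def obs_exp_ratio(windows, observed, expected):
    """Pooled observed/expected mutation ratio over a set of windows."""
    total_expected = sum(expected[w] for w in windows)
    if total_expected == 0:
        raise ValueError("expected count must be positive")
    return sum(observed[w] for w in windows) / total_expected


# Ten hypothetical 1-kb windows; all values are illustrative.
signal = {f"w{i}": float(i) for i in range(10)}       # e.g. H3K36me3 signal
observed = {f"w{i}": 10 - i for i in range(10)}       # observed mutations
expected = {w: 10 for w in signal}                    # simulated expectation

top, bottom = split_by_signal(signal)
top_ratio = obs_exp_ratio(top, observed, expected)
bottom_ratio = obs_exp_ratio(bottom, observed, expected)
print(f"high-signal obs/exp = {top_ratio:.2f}, low-signal obs/exp = {bottom_ratio:.2f}")
```

In this toy data the high-signal windows carry fewer mutations than expected, which is the pattern that intact H3K36me3-dependent repair would produce.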

Fig. 6. Association of MBD4 binding sites, histone mark H3K36me3, and mutations.

(A) Venn diagram indicating the number of regions classified as top and bottom MBD4 and H3K36me3 signal based on the HepG2 cell line. (B) The ratio of observed and expected CpG C > T mutations in different regions for MBD4 mutants, MSS, and MMRd cancers. The HR of different epigenetics marks for CpG C > T mutation formation from multivariable logistic regression model for MMRd (C) and MSS (D) cancers. The 95% confidence level is indicated. P value was calculated by Wald’s test. (E) Schematic of the proposed mechanism of mismatch formation, repair, and mutations.

It has been observed that truncating mutations in MBD4 are common in MSI cancers, and truncated MBD4 can exert a dominant-negative effect (42). We have previously shown that MSI cancers with and without MBD4 truncating mutations do not show any difference in C > T mutation density at methylated CpG sites (35). Examining this further in the context of MutLα- and MutSα-deficient MSI cancers, we found that 13.7% (27 of 197) of MutLα- and 11.1% (2 of 18) of MutSα-deficient cancers harbored MBD4 truncating mutations (fig. S8D and table S3). We did not find any difference in the mutational signature contributions between MBD4 wild-type and mutant samples for either MutLα- or MutSα-deficient cancers (fig. S8E). We also examined the expression of MBD4 in TCGA MSI and MSS samples. While MBD4 expression was significantly lower in MutLα-deficient cancers compared with MSS cancers (fig. S8F), no association between MBD4 expression and SigA contribution was observed for MutLα-deficient cancers (fig. S8G). This further suggests that MBD4 is not a major factor in limiting CpG C > T mutations in MMRd cancers.

As MutSα-deficient cancers have the highest SigA contribution and MutSα is responsible for recognizing mismatched bases in DNA, our data suggest that it is the essential component for the repair of 5mC deamination damage. We found TDG to be up-regulated in MutLα-deficient cancers (fig. S8H). This up-regulation suggests that TDG may be the alternative glycosylase that can collaborate with MMR in the repair of 5mC deamination damage independently of MutLα and MBD4 (Fig. 6E).

DISCUSSION

A number of previous studies have sought to use genomics to determine the mechanisms underlying the different mutational signatures in MMRd cancers (16, 17, 43–45). However, these efforts have been either restricted by cell line/organoid models (17, 45), nonmammalian models (44), a lack of samples (43), or incomplete assessment of all the data types available (16). In this study, we carefully determined the status of the canonical MMR genes and classified each sample as being MutSα or MutLα deficient. In doing so, we found substantial differences in the mutational signatures of MutSα and MutLα across four independent cohorts. Specifically, MutSα deficiency presents a high CpG C > T mutation spectrum, while MutLα mutants have a broader mutational spectrum including C > A, C > T, and T > C mutations. These results are consistent with a recently published study of whole exome–sequenced MSH2/MSH6-deficient gliomas, which were all found to have a high frequency of CpG C > T mutation similar to SigA (46). Another study based on the MSH6 null mouse model also reported elevated mutation frequency and predominance of G:C to A:T transition in MSH6-deficient small intestinal epithelium (47).

Because of the susceptibility of cytosine (at CpG sites) to a variety of chemical modifications, CpG C > T mutations are common in cancer genomes and are generally recognized to be the result of 5mC deamination. Nevertheless, 5mC can also lead to CpG mutations in other ways (30, 35, 48). While MMR is known to correct G:T and other mismatches that result from DNA replication errors, the repair of 5mC deamination should require excision of thymine from the G:T mismatch to restore the correct G:C pair (31). It has been well established that MBD4 plays a critical role in repairing mismatches caused by 5mC deamination (32). C > T mutations that arise from unrepaired 5mC deamination–induced G:T mismatches should show no replication strand asymmetry, as the deamination process should be largely independent of DNA replication (Fig. 4C). By contrast, CpG C > T mutations that arise as a result of DNA replication errors, such as those in POLE mutant cancers (34), display a high degree of asymmetry (Fig. 4D). The CpG C > T mutations observed in MMRd cancers, particularly those deficient in MutSα, lack strong replication strand asymmetry, which suggests that most of these mutations arise from replication-independent 5mC deamination.

Of note, we found that clonal mutations derived from two independent cultured MutSα mutant cell lines (49, 50) had a broader distribution of mutation types (fig. S10, A and B), with both presenting a low contribution of SigA (fig. S10C). As CpG C > T mutations generated by clonal expansion of these MMRd cell lines show significant replication strand bias (fig. S10, D and E), we propose that this discrepancy with cancer samples may be due to the rapid rate of cell division and the lack of time for 5mC deamination–induced mutations to accumulate in cell lines. In line with this hypothesis, Zou et al. (49) found approximately 133 mutations per round of cell division (approximately per day) in the MSH6-deficient cell lines, whereas, in MBD4-deficient mice, mCpG deamination–induced mutations only accumulated at a rate of ~1.4 per day (531 mutations over 382 days) (32). Thus, we propose that the two main mutational processes operational in MMRd cancers are a replication-independent 5mC deamination–induced C > T mutational signature (i.e., SigA) and a replication-dependent DNA polymerase error-driven signature (i.e., SigB) (Fig. 6E). As SigA contributes more than 50% of the mutations observed in most MMRd cancers (Fig. 2, C to F, and fig. S11), the inability to repair 5mC deamination damage can be considered the main mutational process in MMRd cancers.

Activities of MMR outside the context of DNA replication have been termed ncMMR and have generally been associated with the error-prone repair of DNA lesions through the recruitment of error-prone DNA polymerases (9–11). It is therefore unexpected that a form of ncMMR appears to participate in the error-free repair of 5mC deamination damage. Nevertheless, a recent study showed evidence of ncMMR-dependent error-free repair of oxidative damage in actively transcribed genes (12). Our observation that CpG C > T mutations show less replication strand bias in MutSα-deficient cancers supports a role of MMR outside of DNA replication. Furthermore, as the activity of MMR differs between exons and introns in a histone mark H3K36me3–dependent manner, we identified a substantial and significant decrease in observed exonic CpG C > T mutations compared with expected for MSS and POLE mutant cancers, while this decrease was substantially smaller in MSI cancers due to the loss of MMR. In addition, the reduced CpG C > T mutations in high-H3K36me3 regions for MBD4 mutants compared with MMRd cancers suggest that MMR acts in the absence of MBD4 to limit CpG C > T mutations. Together, we propose a role of ncMMR outside the context of DNA replication in repairing 5mC deamination damage.

Nevertheless, our data suggest that, for the repair of 5mC deamination damage, MMR only performs the function of mismatch/lesion recognition to promote the recruitment of other DNA repair factors or pathways, which places the focus on the role of the MutSα complex. The interaction of MMR and base excision repair has been studied in vitro in the context of active DNA demethylation (13), and the results support a model in which the two pathways collaborate to facilitate error-free removal of 5mC-induced lesions, including G:T mismatches, in an MSH2- and TDG-dependent manner. Our findings support this model. MutSα is known to function in DNA mismatch recognition, and its MSH6 subunit contains a PWWP domain that binds H3K36me3 to facilitate its recruitment (36). As MBD4 is known to interact with MLH1, the high contribution of SigA in MutSα-deficient cancers suggests that MutSα is essential for efficient mismatch recognition to recruit the MutLα-MBD4 complex for the repair of 5mC deamination damage. On the other hand, repair of CpG C > T mutations in MBD4 mutants is dependent on H3K36me3, which suggests that the recruitment of MutSα must still facilitate some level of repair even in the absence of MBD4. The most likely candidate involved in this repair process is TDG, as it is known to repair T:G mismatches. The increased expression of TDG in MutLα-deficient cancers supports the idea that TDG may be the alternative glycosylase that can collaborate with MMR in the repair of 5mC deamination damage independently of MutLα and MBD4 (fig. S8H). Although our data provide strong evidence for ncMMR in the repair of 5mC deamination damage at CpG sites, some of our results are based on correlations. Thus, direct experimental demonstration of the role of ncMMR in repairing methylcytosine deamination at CpG sites will be needed in the future to elucidate the precise underlying mechanism.

In summary, we demonstrate that replication-independent 5mC deamination contributes to most CpG C > T mutations in MMRd cancers. We find that MMR is essential for the repair of 5mC deamination–induced G:T mismatches. This ncMMR function is likely to be MutSα dependent, as MutSα-deficient cancers are highly enriched in CpG C > T mutations. These results provide previously unknown insights into mutational processes in MMRd cancers and further our understanding of the ever-important MMR pathway.

MATERIALS AND METHODS

Data collection

This study was approved by the Institutional Review Board of the University of Hong Kong/Hospital Authority Hong Kong West Cluster (approval no. UW 18-599). A list of 316 MSI-H cancer samples including CRC, STAD, and UCEC was obtained from the TCGA Pan-Cancer dataset (51). Three other independent cohorts were obtained for validation. The DepMap cohort comprises 99 MSI-H samples across 16 cancer types (24). The MSK-CRC and MSK-UCEC cohorts contain 99 MSI-H CRCs and 22 MSI-H endometrial cancers, respectively, both with targeted sequencing data (22, 23). All these data are summarized in table S4.

MutSα and MutLα classification

For the 316 samples from the TCGA Pan-Cancer cohort, signature contributions were assigned by fitting COSMIC mutational signatures v3 via the SigFit R package (21). Samples with high contributions (>10%) of signatures SBS10a/b, SBS14, and SBS20 were excluded to avoid the effect of mutational processes that are caused by POLE and POLD1. DNA mutation, RNA expression, and methylation data were applied to the remaining 266 samples for classification by the steps below: (i) Linear regression analysis was performed on the basis of MLH1 methylation and expression. The regression equation was obtained as y = 9.050 − 4.996*x, where, consistent with the cutoffs that follow, y is MLH1 expression and x is the MLH1 methylation β value. Hypermethylated MLH1 is defined as β > 0.25, and the low MLH1 expression cutoff value was obtained as 7.8 based on the equation. (ii) Then, MutSα and MutLα deficiency were determined on the basis of the RNA expression and truncating mutations of the MMR genes, as elaborated in fig. S1. For the 99 MSI-H samples from the DepMap cohort, we first classified samples with truncating mutations in MSH2/MSH6 as MutSα deficient if they had no aberration in MLH1/PMS2. Then, the remaining samples with no aberration in MSH2/MSH6 were classified as MutLα deficient (table S5). Because of the data available for the MSK-CRC cohort, the classification of MutSα and MutLα deficiency is based on DNA mutations of MMR genes. Samples with truncating mutations in MSH2/MSH6 and without truncating mutations in MLH1/PMS2 are classified as MutSα deficient. The remaining samples are classified as MutLα deficient (table S6). For samples from the MSK-UCEC cohort, the classification is based on immunohistochemistry of the four MMR proteins (table S7).
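As an illustration of step (i) and of the mutation-only rule described for the MSK-CRC cohort, the following minimal Python sketch recomputes the expression cutoff from the reported regression line and applies the truncation rule. The function names and the class labels are hypothetical and are not taken from the authors' scripts; the full TCGA decision scheme in fig. S1 is not reproduced here.

```python
# Coefficients of the regression reported in the text: y = 9.050 - 4.996 * x
INTERCEPT = 9.050
SLOPE = -4.996
BETA_CUTOFF = 0.25  # hypermethylated MLH1 is defined as beta > 0.25


def mlh1_expression_cutoff(beta=BETA_CUTOFF):
    """Expression value predicted by the regression line at a given beta value."""
    return INTERCEPT + SLOPE * beta


def classify_by_truncation(msh2_msh6_truncated, mlh1_pms2_truncated):
    """Mutation-only rule described for the MSK-CRC cohort (labels are hypothetical)."""
    if msh2_msh6_truncated and not mlh1_pms2_truncated:
        return "MutS_alpha_deficient"
    # All remaining samples are assigned to the MutL-alpha group.
    return "MutL_alpha_deficient"


if __name__ == "__main__":
    print(round(mlh1_expression_cutoff(), 1))
    print(classify_by_truncation(True, False))
```

Evaluating the line at β = 0.25 gives 9.050 − 1.249 = 7.801, which rounds to the 7.8 cutoff quoted above.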

Mutation simulation at trinucleotide resolution

Mutation simulation was performed by SigProfilerSimulator (52) to preclude the bias of trinucleotide composition, which could affect the mutation distribution in local regions. Briefly, the total number of simulated mutations for each sample is equal to the observed mutations, but the position of the mutation is relocated according to the frequency of trinucleotide context along the given region. Each sample was simulated 100 times, and all the mutations are combined as expected mutations for subsequent local mutation density analysis.
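The idea behind this simulation can be sketched in a few lines of Python. This is a simplified illustration and not the SigProfilerSimulator interface: it assumes a single in-memory sequence, ignores strand (reverse-complement contexts are not pooled), and uses hypothetical function names. Each observed mutation is moved to a randomly chosen position that shares its trinucleotide context, and the results of all rounds are pooled.

```python
import random
from collections import defaultdict


def index_trinucleotides(sequence):
    """Map each trinucleotide to the centre positions at which it occurs."""
    sites = defaultdict(list)
    for i in range(1, len(sequence) - 1):
        sites[sequence[i - 1:i + 2]].append(i)
    return sites


def simulate_expected(sequence, observed_positions, n_rounds=100, seed=0):
    """Pool n_rounds of context-preserving relocations of the observed mutations.

    observed_positions are 0-based indices that must not be the first or last base.
    """
    rng = random.Random(seed)
    sites = index_trinucleotides(sequence)
    pooled = []
    for _ in range(n_rounds):
        for pos in observed_positions:
            context = sequence[pos - 1:pos + 2]
            # Same number of mutations per round, same context, new position.
            pooled.append(rng.choice(sites[context]))
    return pooled


if __name__ == "__main__":
    seq = "ACGTACGTTACGA"
    expected = simulate_expected(seq, [2, 6])
    print(len(expected), sorted(set(expected)))
```

Because every round keeps the number of mutations and their contexts fixed, the pooled set can serve as a composition-matched expectation for local mutation density.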

Mutational signature analysis

The profile of each signature was represented by six substitution subtypes: C > A, C > G, C > T, T > A, T > C, and T > G. For signatures generated by trinucleotide context, each substitution on the cancer genome was examined by incorporating information on the bases immediately 5′ and 3′ to each mutated base to generate 96 possible mutational types. De novo signatures were extracted by SigFit, which applies a Bayesian inference algorithm (21). Mutational signatures were displayed and reported on the basis of the observed trinucleotide frequency of the human genome. For validation cohorts, contribution of de novo signatures for each sample was calculated by fitting the mutations to the extracted de novo signatures.
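The 96-type classification described above can be illustrated with a short Python sketch. It assumes the common convention of reporting each substitution from the pyrimidine (C or T) strand; the function names are hypothetical and the code is not part of SigFit.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]

# 6 substitution subtypes x 4 possible 5' bases x 4 possible 3' bases = 96 types
ALL_TYPES = [
    f"{five}[{sub}]{three}"
    for sub in SUBSTITUTIONS
    for five, three in product("ACGT", repeat=2)
]


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def mutation_type(context, alt):
    """Label a substitution given its reference trinucleotide and alternate base.

    context: three reference bases with the mutated base in the centre.
    alt: the alternate base on the same strand as context.
    """
    if context[1] in "AG":
        # Purine reference base: report the event from the opposite strand.
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


if __name__ == "__main__":
    print(len(ALL_TYPES))
    print(mutation_type("CGT", "A"))
```

A G > A change in the context CGT is therefore counted together with a C > T change in the context ACG, which is the CpG C > T class discussed throughout the paper.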

Replication timing and mutation density

The replication time of different chromosome regions was obtained for the HepG2 cell line from the ENCODE data portal (53). Exome sequence with known replication time was divided into five bins from late to early: [−3.88, 51.98], [51.98, 66.30], [66.30, 74.95], [74.95, 80.74], and [80.74, 87.95]. The counts of mutations within each bin were calculated as observed mutation. Similarly, the expected mutation counts were also computed for each bin based on simulated data. The slope was obtained from the linear regression model based on the correlation of mutation ratio [observed/expected (obs/exp)] and replication timing.
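A minimal Python sketch of this binning and regression step is given below. The bin edges are those listed above; the remaining choices are assumptions made for illustration: shared edges are assigned to the later bin, the pooled expected counts are divided by the number of simulations, and the bin rank (1 for latest to 5 for earliest) is used as the regression variable. All function names are hypothetical.

```python
import bisect

# Bin edges from late to early replication, as listed in the text.
BIN_EDGES = [-3.88, 51.98, 66.30, 74.95, 80.74, 87.95]
N_BINS = len(BIN_EDGES) - 1


def bin_index(timing):
    """Return 0 (latest) to 4 (earliest), or None if outside the covered range."""
    if timing < BIN_EDGES[0] or timing > BIN_EDGES[-1]:
        return None
    return min(bisect.bisect_right(BIN_EDGES, timing) - 1, N_BINS - 1)


def count_per_bin(timings):
    counts = [0] * N_BINS
    for t in timings:
        idx = bin_index(t)
        if idx is not None:
            counts[idx] += 1
    return counts


def ols_slope(x, y):
    """Slope of the ordinary least-squares line of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def obs_exp_slope(observed_timings, expected_timings, n_simulations=100):
    """Slope of obs/exp against bin rank; assumes no bin has zero expected count."""
    obs = count_per_bin(observed_timings)
    exp = [c / n_simulations for c in count_per_bin(expected_timings)]
    ratios = [o / e for o, e in zip(obs, exp)]
    return ols_slope(list(range(1, N_BINS + 1)), ratios)
```

Under these assumptions, a negative slope indicates that the obs/exp ratio falls from late- to early-replicating regions.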

Gene expression and mutation density

The general gene expression data were obtained from Genotype-Tissue Expression Portal, and all expressed genes were integrated into four bins according to the expression quartile. For each bin, only sites located within early replicated regions are adopted. The size of each bin was calculated on the basis of the length of exons of each gene. The count of observed and expected mutations was calculated for each bin to determine the mutation density.

Calculating strand asymmetries

Replication direction was defined using replication timing profiles from a previously published paper (54). Left (leading)– and right (lagging)–replicating regions were determined by the derivative of the profile. For a given mutation type in the right replication direction, the mutation counts (N1) in that region were calculated, and its complementary mutation was calculated as n1. Correspondingly, the mutation counts of this mutation type in left replication direction were calculated as N2, and its complementary mutation was calculated as n2. Then, asymmetry (A) was calculated as

A=log2((N1+n2)/(N2+n1))

For the transcription strand asymmetry, coding and template strand were obtained from a published study (55), and the asymmetry is reported as log2 ratio of (mutation count within template regions)/(mutation counts within coding regions).
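Both asymmetry measures reduce to a log2 ratio of counts, as the following small Python sketch shows. The argument names follow the symbols used above (N1, n1, N2, n2); the function names are hypothetical, and no pseudocount is added, so all denominators are assumed to be nonzero.

```python
import math


def replication_asymmetry(N1, n1, N2, n2):
    """A = log2((N1 + n2) / (N2 + n1)).

    N1, n1: counts of a mutation type and of its complement in right-replicating regions.
    N2, n2: the corresponding counts in left-replicating regions.
    """
    return math.log2((N1 + n2) / (N2 + n1))


def transcription_asymmetry(template_count, coding_count):
    """log2 ratio of mutation counts in template versus coding regions."""
    return math.log2(template_count / coding_count)


if __name__ == "__main__":
    # Equal counts everywhere give no asymmetry.
    print(replication_asymmetry(10, 10, 10, 10))
    print(replication_asymmetry(30, 10, 10, 30))
```

A value of 0 corresponds to no strand bias, which is what replication-independent 5mC deamination is expected to produce, whereas replication errors are expected to shift A away from 0.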

Computing mutation density in exonic and intronic regions

All gene coordinates were obtained from the UCSC (University of California Santa Cruz) table browser. Middle exons and middle introns were extracted for each gene. Then, the mutation density was calculated as mutation counts per megabase for both exonic and intronic regions.
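The density calculation can be sketched as follows in Python. The text does not define "middle" exons and introns; this sketch assumes they are all elements except the first and last of each gene, and it assumes half-open (start, end) coordinates. Function names are hypothetical.

```python
def middle_elements(elements):
    """Drop the first and last element of an ordered list (assumed meaning of 'middle')."""
    return elements[1:-1]


def total_length(intervals):
    """Total length in bp of half-open (start, end) intervals."""
    return sum(end - start for start, end in intervals)


def mutation_density_per_mb(mutation_count, region_length_bp):
    """Mutation counts per megabase of the region."""
    return mutation_count / (region_length_bp / 1e6)


if __name__ == "__main__":
    exons = [(0, 100), (200, 500), (700, 900), (1000, 1200)]
    middle = middle_elements(exons)
    print(mutation_density_per_mb(3, total_length(middle)))
```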

Associations among MBD4 mutant mutation density, histone marks, and CpG methylation

As MBD4 mutants are derived from acute myelocytic leukemia, we obtained whole-genome bisulfite sequencing data for E050 (mobilized CD34 primary cells). Histone marks including H3K36me3, H3K27me3, H3K9me3, H3K4me3, H3K4me1, and H3K27ac and deoxyribonuclease (DNase) I–hypersensitive sites are derived from common myeloid progenitor and CD34-positive samples. For the data used to estimate the regression model for MSS and MMRd (MSI) samples, the mutations are from TCGA CRC samples. Histone marks and DNase I–hypersensitive sites are obtained from Homo sapiens large intestine, male embryo (108 days). All these data were downloaded from the Roadmap Epigenomics Atlas (56). MBD4 ChIP-seq data were obtained from HepG2 cells from ENCODE (53). Only sites with a methylation value of >0.9 are adopted for fitting the logistic model. For the correlation of CpG methylation, MBD4 mutant mutations, and H3K36me3 signal, all cytosines in the CpG dinucleotide were merged into 12 bins according to their methylation level as: [0], [0, 0.1], ..., [0.9, 1.0], [1]. These bins were then used as intersected regions to calculate the mutation density at each methylation level. H3K36me3 bins were set on the basis of the H3K36me3 signal. For the grouping of MBD4- and H3K36me3-occupied regions, the H3K36me3 data were also obtained from HepG2 from ENCODE (53). MBD4 and H3K36me3 signal/input were calculated across 1-kb windows across the genome. Regions that had an average mappability score of <1 based on UCSC Duke Uniqueness 35 base pair and those that overlapped Data Analysis Center blacklisted regions were removed from the analysis.
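The 12-bin methylation grouping can be illustrated with the Python sketch below. The text lists the intervals with shared endpoints, so the boundary handling here is an assumption: exactly 0 and exactly 1 get their own bins, and every other value falls into a right-closed interval such as (0, 0.1]. Function names and the input layout are hypothetical.

```python
import bisect

# Upper edges of the ten intermediate bins: (0, 0.1], (0.1, 0.2], ..., (0.9, 1.0)
INNER_EDGES = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]


def methylation_bin(level):
    """Return a bin number from 0 to 11 for a methylation level in [0, 1]."""
    if level == 0:
        return 0
    if level == 1:
        return 11
    return 1 + bisect.bisect_left(INNER_EDGES, level)


def mutation_fraction_by_bin(sites):
    """sites: iterable of (methylation_level, is_mutated) pairs for CpG cytosines.

    Returns {bin: mutated sites / total sites} for every bin that contains sites.
    """
    totals, mutated = {}, {}
    for level, is_mutated in sites:
        b = methylation_bin(level)
        totals[b] = totals.get(b, 0) + 1
        mutated[b] = mutated.get(b, 0) + int(is_mutated)
    return {b: mutated[b] / totals[b] for b in totals}


if __name__ == "__main__":
    demo = [(0.0, False), (0.95, True), (0.95, False), (1.0, True)]
    print(mutation_fraction_by_bin(demo))
```

The same grouping pattern applies to the H3K36me3 bins once the signal thresholds are chosen; those thresholds are not given in the text and are therefore not sketched here.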

Acknowledgments

We thank I. Majewski and R. Poulos for valuable feedback on the manuscript.

Funding: The project was supported by the Research Grants Council, HK (17100920 and C7028-19G) and Seed Funding from the University Grants Council, The University of Hong Kong to J.W.H.W.

Author contributions: J.W.H.W. conceived and designed the research. H.F., H.Y., J.O., and J.W.H.W. developed methodology and performed research. H.F., X.Z., J.O., J.A.B., and J.W.H.W. analyzed data. H.F. and J.W.H.W. wrote the manuscript.

Competing interests: The authors declare that they have no competing interests.

Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Links to the dataset used and scripts to generate figures in the paper are listed in the key resource table in the supplementary tables.

Supplementary Materials

This PDF file includes:

Figs. S1 to S11

Tables S1 to S7

Other Supplementary Material for this manuscript includes the following:

Data file S1


REFERENCES AND NOTES

  • 1.Kunkel T. A., Erie D. A., Eukaryotic mismatch repair in relation to DNA replication. Annu. Rev. Genet. 49, 291–313 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Lynch H. T., Snyder C. L., Shaw T. G., Heinen C. D., Hitchins M. P., Milestones of Lynch syndrome: 1895-2015. Nat. Rev. Cancer 15, 181–194 (2015). [DOI] [PubMed] [Google Scholar]
  • 3.Valle L., Vilar E., Tavtigian S. V., Stoffel E. M., Genetic predisposition to colorectal cancer: Syndromes, genes, classification of genetic variants and implications for precision medicine. J. Pathol. 247, 574–588 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Cortes-Ciriano I., Lee S., Park W.-Y., Kim T.-M., Park P. J., A molecular portrait of microsatellite instability across multiple cancers. Nat. Commun. 8, 15180 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Hause R. J., Pritchard C. C., Shendure J., Salipante S. J., Classification and characterization of microsatellite instability across 18 cancer types. Nat. Med. 22, 1342–1350 (2016). [DOI] [PubMed] [Google Scholar]
  • 6.Mandal R., Samstein R. M., Lee K.-W., Havel J. J., Wang H., Krishna C., Sabio E. Y., Makarov V., Kuo F., Blecua P., Ramaswamy A. T., Durham J. N., Bartlett B., Ma X., Srivastava R., Middha S., Zehir A., Hechtman J. F., Morris L. G., Weinhold N., Riaz N., Le D. T., Diaz L. A. Jr., Chan T. A., CANCER diversity of tumors with mismatch repair deficiency influences anti-PD-1 immunotherapy response. Science 364, 485–491 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Crouse G. F., Non-canonical actions of mismatch repair. DNA Repair 38, 102–109 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Cascalho M., Wong J., Steinberg C., Wabl M., Mismatch repair co-opted by hypermutation. Science 279, 1207–1210 (1998). [DOI] [PubMed] [Google Scholar]
  • 9.Pena-Diaz J., Bregenhorn S., Ghodgaonkar M., Follonier C., Artola-Borán M., Castor D., Lopes M., Sartori A. A., Jiricny J., Noncanonical mismatch repair as a source of genomic instability in human cells. Mol. Cell 47, 669–680 (2012). [DOI] [PubMed] [Google Scholar]
  • 10.Zlatanou A., Despras E., Braz-Petta T., Boubakour-Azzouz I., Pouvelle C., Stewart G. S., Nakajima S., Yasui A., Ishchenko A. A., Kannouche P. L., The hMsh2-hMsh6 complex acts in concert with monoubiquitinated PCNA and Pol η in response to oxidative DNA damage in human cells. Mol. Cell 43, 649–662 (2011). [DOI] [PubMed] [Google Scholar]
  • 11.Supek F., Lehner B., Clustered mutation signatures reveal that error-prone DNA repair targets mutations to active genes. Cell 170, 534–547.e23 (2017). [DOI] [PubMed] [Google Scholar]
  • 12.Huang Y., Gu L., Li G.-M., H3K36me3-mediated mismatch repair preferentially protects actively transcribed genes from mutation. J. Biol. Chem. 293, 7811–7823 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Grin I., Ishchenko A. A., An interplay of the base excision repair and mismatch repair pathways in active DNA demethylation. Nucleic Acids Res. 44, 3713–3727 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Volkova N. V., Meier B., González-Huici V., Bertolini S., Gonzalez S., Vöhringer H., Abascal F., Martincorena I., Campbell P. J., Gartner A., Gerstung M., Mutational signatures are jointly shaped by DNA damage and repair. Nat. Commun. 11, 2169 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Alexandrov L. B., Kim J., Haradhvala N. J., Huang M. N., Wei Tian Ng A., Wu Y., Boot A., Covington K. R., Gordenin D. A., Bergstrom E. N., Islam S. M. A., Lopez-Bigas N., Klimczak L. J., Pherson J. R. M., Morganella S., Sabarinathan R., Wheeler D. A., Mustonen V.; PCAWG Mutational Signatures Working Group, Getz G., Rozen S. G., Stratton M. R.; PCAWG Consortium , The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Nemeth E., Lovrics A., Gervai J. Z., Seki M., Rospo G., Bardelli A., Szüts D., Two main mutational processes operate in the absence of DNA mismatch repair. DNA Repair 89, 102827 (2020). [DOI] [PubMed] [Google Scholar]
  • 17.Zou X., Koh G. C. C., Nanda A. S., Degasperi A., Urgo K., Roumeliotis T. I., Agu C. A., Side L., Brice G., Perez-Alonso V., Rueda D., Badja C., Young J., Gomez C., Bushell W., Harris R., Choudhary J. S., Jiricny J., Skarnes W. C., Nik-Zainal S., Dissecting mutational mechanisms underpinning signatures caused by replication errors and endogenous DNA damage. bioRxiv 10.1101/2020.08.04.234245 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Haradhvala N. J., Kim J., Maruvka Y. E., Polak P., Rosebrock D., Livitz D., Hess J. M., Leshchiner I., Kamburov A., Mouw K. W., Lawrence M. S., Getz G., Distinct mutational signatures characterize concurrent loss of polymerase proofreading and mismatch repair. Nat. Commun. 9, 1746 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Simpkins S. B., Bocker T., Swisher E. M., Mutch D. G., Gersell D. J., Kovatich A. J., Palazzo J. P., Fishel R., Goodfellow P. J., MLH1 promoter methylation and gene silencing is the primary cause of microsatellite instability in sporadic endometrial cancers. Hum. Mol. Genet. 8, 661–666 (1999). [DOI] [PubMed] [Google Scholar]
  • 20.Haraldsdottir S., Hampel H., Wu C., Weng D. Y., Shields P. G., Frankel W. L., Pan X., de la Chapelle A., Goldberg R. M., Bekaii-Saab T., Patients with colorectal cancer associated with Lynch syndrome and MLH1 promoter hypermethylation have similar prognoses. Genet. Med. 18, 863–868 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Gori K., Baez-Ortega A., SigFit: Flexible Bayesian inference of mutational signatures. bioRxiv, 372896 (2018). [Google Scholar]
  • 22.Yaeger R., Chatila W. K., Lipsyc M. D., Hechtman J. F., Cercek A., Sanchez-Vega F., Jayakumaran G., Middha S., Zehir A., Donoghue M. T. A., You D., Viale A., Kemeny N., Segal N. H., Stadler Z. K., Varghese A. M., Kundra R., Gao J., Syed A., Hyman D. M., Vakiani E., Rosen N., Taylor B. S., Ladanyi M., Berger M. F., Solit D. B., Shia J., Saltz L., Schultz N., Clinical sequencing defines the genomic landscape of metastatic colorectal cancer. Cancer Cell 33, 125–136.e3 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Soumerai T. E., Donoghue M. T. A., Bandlamudi C., Srinivasan P., Chang M. T., Zamarin D., Cadoo K. A., Grisham R. N., O’Cearbhaill R. E., Tew W. P., Konner J. A., Hensley M. L., Makker V., Sabbatini P., Spriggs D. R., Troso-Sandoval T. A., Charen A. S., Friedman C., Gorsky M., Schweber S. J., Middha S., Murali R., Chiang S., Park K. J., Soslow R. A., Ladanyi M., Li B. T., Mueller J., Weigelt B., Zehir A., Berger M. F., Abu-Rustum N. R., Aghajanian C., DeLair D. F., Solit D. B., Taylor B. S., Hyman D. M., Clinical utility of prospective molecular characterization in advanced endometrial cancer. Clin. Cancer Res. 24, 5939–5947 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Ghandi M., Huang F. W., Jané-Valbuena J., Kryukov G. V., Lo C. C., Mc Donald E. R. III, Barretina J., Gelfand E. T., Bielski C. M., Li H., Hu K., Andreev-Drakhlin A. Y., Kim J., Hess J. M., Haas B. J., Aguet F., Weir B. A., Rothberg M. V., Paolella B. R., Lawrence M. S., Akbani R., Lu Y., Tiv H. L., Gokhale P. C., de Weck A., Mansour A. A., Oh C., Shih J., Hadi K., Rosen Y., Bistline J., Venkatesan K., Reddy A., Sonkin D., Liu M., Lehar J., Korn J. M., Porter D. A., Jones M. D., Golji J., Caponigro G., Taylor J. E., Dunning C. M., Creech A. L., Warren A. C., Mc Farland J. M., Zamanighomi M., Kauffmann A., Stransky N., Imielinski M., Maruvka Y. E., Cherniack A. D., Tsherniak A., Vazquez F., Jaffe J. D., Lane A. A., Weinstock D. M., Johannessen C. M., Morrissey M. P., Stegmeier F., Schlegel R., Hahn W. C., Getz G., Mills G. B., Boehm J. S., Golub T. R., Garraway L. A., Sellers W. R., Next-generation characterization of the cancer cell line encyclopedia. Nature 569, 503–508 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Pritchard C. C., Morrissey C., Kumar A., Zhang X., Smith C., Coleman I., Salipante S. J., Milbank J., Yu M., Grady W. M., Tait J. F., Corey E., Vessella R. L., Walsh T., Shendure J., Nelson P. S., Complex MSH2 and MSH6 mutations in hypermutated microsatellite unstable advanced prostate cancer. Nat. Commun. 5, 4988 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Supek F., Lehner B., Scales and mechanisms of somatic mutation rate variation across the human genome. DNA Repair 81, 102647 (2019). [DOI] [PubMed] [Google Scholar]
  • 27.Lawrence M. S., Stojanov P., Polak P., Kryukov G. V., Cibulskis K., Sivachenko A., Carter S. L., Stewart C., Mermel C. H., Roberts S. A., Kiezun A., Hammerman P. S., McKenna A., Drier Y., Zou L., Ramos A. H., Pugh T. J., Stransky N., Helman E., Kim J., Sougnez C., Ambrogio L., Nickerson E., Shefler E., Cortés M. L., Auclair D., Saksena G., Voet D., Noble M., DiCara D., Lin P., Lichtenstein L., Heiman D. I., Fennell T., Imielinski M., Hernandez B., Hodis E., Baca S., Dulak A. M., Lohr J., Landau D. A., Wu C. J., Melendez-Zajgla J., Hidalgo-Miranda A., Koren A., McCarroll S. A., Mora J., Lee R. S., Crompton B., Onofrio R., Parkin M., Winckler W., Ardlie K., Gabriel S. B., Roberts C. W. M., Biegel J. A., Stegmaier K., Bass A. J., Garraway L. A., Meyerson M., Golub T. R., Gordenin D. A., Sunyaev S., Lander E. S., Getz G., Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Preston B. D., Albertson T. M., Herr A. J., DNA replication fidelity and cancer. Semin. Cancer Biol. 20, 281–293 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Haradhvala N. J., Polak P., Stojanov P., Covington K. R., Shinbrot E., Hess J. M., Rheinbay E., Kim J., Maruvka Y. E., Braunstein L. Z., Kamburov A., Hanawalt P. C., Wheeler D. A., Koren A., Lawrence M. S., Getz G., Mutational strand asymmetries in cancer genomes reveal mechanisms of DNA damage and repair. Cell 164, 538–549 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Tomkova M., Schuster-Bockler B., DNA modifications: Naturally more error prone? Trends Genet. 34, 627–638 (2018). [DOI] [PubMed] [Google Scholar]
  • 31.Bellacosa A., Drohat A. C., Role of base excision repair in maintaining the genetic and epigenetic integrity of CpG sites. DNA Repair 32, 33–42 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Sanders M. A., Chew E., Flensburg C., Zeilemaker A., Miller S. E., al Hinai A. S., Bajel A., Luiken B., Rijken M., Mclennan T., Hoogenboezem R. M., Kavelaars F. G., Fröhling S., Blewitt M. E., Bindels E. M., Alexander W. S., Löwenberg B., Roberts A. W., Valk P. J. M., Majewski I. J., MBD4 guards against methylation damage and germ line deficiency predisposes to clonal hematopoiesis and early-onset AML. Blood 132, 1526–1534 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Fang H., Barbour J. A., Poulos R. C., Katainen R., Aaltonen L. A., Wong J. W. H., Mutational processes of distinct POLE exonuclease domain mutants drive an enrichment of a specific TP53 mutation in colorectal cancer. PLOS Genet. 16, e1008572 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Tomkova M., Tomek J., Kriaucionis S., Schuster-Bockler B., Mutational signature distribution varies with DNA replication timing and strand asymmetry. Genome Biol. 19, 129 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Poulos R. C., Olivier J., Wong J. W. H., The interaction between cytosine methylation and processes of DNA replication and repair shape the mutational landscape of cancer genomes. Nucleic Acids Res. 45, 7786–7795 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Li F., Mao G., Tong D., Huang J., Gu L., Yang W., Li G.-M., The histone mark H3K36me3 regulates human DNA mismatch repair through its interaction with MutSα. Cell 153, 590–600 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Frigola J., Sabarinathan R., Mularoni L., Muiños F., Gonzalez-Perez A., López-Bigas N., Reduced mutation rate in exons due to differential mismatch repair. Nat. Genet. 49, 1684–1692 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Huang Y., Li G.-M., DNA mismatch repair preferentially safeguards actively transcribed genes. DNA Repair 71, 82–86 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Saxonov S., Berg P., Brutlag D. L., A genome-wide analysis of CpG dinucleotides in the human genome distinguishes two distinct classes of promoters. Proc. Natl. Acad. Sci. U.S.A. 103, 1412–1417 (2006). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Morera S., Grin I., Vigouroux A., Couvé S., Henriot V., Saparbaev M., Ishchenko A. A., Biochemical and structural characterization of the glycosylase domain of MBD4 bound to thymine and 5-hydroxymethyuracil-containing DNA. Nucleic Acids Res. 40, 9917–9926 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Baubec T., Ivanek R., Lienert F., Schubeler D., Methylation-dependent and -independent genomic targeting principles of the MBD protein family. Cell 153, 480–492 (2013). [DOI] [PubMed] [Google Scholar]
  • 42.Bader S. A., Walker M., Harrison D. J., A human cancer-associated truncation of MBD4 causes dominant negative impairment of DNA repair in colon cancer cells. Br. J. Cancer 96, 660–666 (2007). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Davies H., Morganella S., Purdie C. A., Jang S. J., Borgen E., Russnes H., Glodzik D., Zou X., Viari A., Richardson A. L., Børresen-Dale A. L., Thompson A., Eyfjord J. E., Kong G., Stratton M. R., Nik-Zainal S., Whole-genome sequencing reveals breast cancers with mismatch repair deficiency. Cancer Res. 77, 4755–4762 (2017). [DOI] [PubMed] [Google Scholar]
  • 44.Meier B., Volkova N. V., Hong Y., Schofield P., Campbell P. J., Gerstung M., Gartner A., Mutational signatures of DNA mismatch repair deficiency in C. elegans and human cancers. Genome Res. 28, 666–675 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Drost J., van Boxtel R., Blokzijl F., Mizutani T., Sasaki N., Sasselli V., de Ligt J., Behjati S., Grolleman J. E., van Wezel T., Nik-Zainal S., Kuiper R. P., Cuppen E., Clevers H., Use of CRISPR-modified human stem cell organoids to study the origin of mutational signatures in cancer. Science 358, 234–238 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Touat M., Li Y. Y., Boynton A. N., Spurr L. F., Iorgulescu J. B., Bohrson C. L., Cortes-Ciriano I., Birzu C., Geduldig J. E., Pelton K., Lim-Fat M. J., Pal S., Ferrer-Luna R., Ramkissoon S. H., Dubois F., Bellamy C., Currimjee N., Bonardi J., Qian K., Ho P., Malinowski S., Taquet L., Jones R. E., Shetty A., Chow K.-H., Sharaf R., Pavlick D., Albacker L. A., Younan N., Baldini C., Verreault M., Giry M., Guillerm E., Ammari S., Beuvon F., Mokhtari K., Alentorn A., Dehais C., Houillier C., Laigle-Donadey F., Psimaras D., Lee E. Q., Nayak L., Faline-Figueroa J. R. M., Carpentier A., Cornu P., Capelle L., Mathon B., Barnholtz-Sloan J. S., Chakravarti A., Bi W. L., Chiocca E. A., Fehnel K. P., Alexandrescu S., Chi S. N., Haas-Kogan D., Batchelor T. T., Frampton G. M., Alexander B. M., Huang R. Y., Ligon A. H., Coulet F., Delattre J.-Y., Hoang-Xuan K., Meredith D. M., Santagata S., Duval A., Sanson M., Cherniack A. D., Wen P. Y., Reardon D. A., Marabelle A., Park P. J., Idbaih A., Beroukhim R., Bandopadhayay P., Bielle F., Ligon K. L., Mechanisms and therapeutic implications of hypermutation in gliomas. Nature 580, 517–523 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Mark S. C., Sandercock L. E., Luchman H. A., Baross A., Edelmann W., Jirik F. R., Elevated mutant frequencies and predominance of G:C to A:T transition mutations in Msh6−/− small intestinal epithelium. Oncogene 21, 7126–7130 (2002). [DOI] [PubMed] [Google Scholar]
  • 48.Nabel C. S., Manning S. A., Kohli R. M., The curious chemical biology of cytosine: Deamination, methylation,and oxidation as modulators of genomic potential. ACS Chem. Biol. 7, 20–30 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Zou X., Owusu M., Harris R., Jackson S. P., Loizou J. I., Nik-Zainal S., Validating the concept of mutational signatures with isogenic cell models. Nat. Commun. 9, 1744 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Póti Á., Berta K., Xiao Y., Pipek O., Klus G. T., Ried T., Csabai I., Wilcoxen K., Mikule K., Szallasi Z., Szüts D., Long-term treatment with the PARP inhibitor niraparib does not increase the mutation load in cell line models and tumour xenografts. Br. J. Cancer 119, 1392–1400 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Wilks C., Cline M. S., Weiler E., Diehkans M., Craft B., Martin C., Murphy D., Pierce H., Black J., Nelson D., Litzinger B., Hatton T., Maltbie L., Ainsworth M., Allen P., Rosewood L., Mitchell E., Smith B., Warner J., Groboske J., Telc H., Wilson D., Sanford B., Schmidt H., Haussler D., Maltbie D., The Cancer Genomics Hub (CGHub): Overcoming cancer through the power of torrential data. Database 2014, bau093 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52. Bergstrom E. N., Barnes M., Martincorena I., Alexandrov L. B., Generating realistic null hypothesis of cancer mutational landscapes using SigProfilerSimulator. BMC Bioinformatics 21, 438 (2020).
  • 53. Sloan C. A., Chan E. T., Davidson J. M., Malladi V. S., Strattan J. S., Hitz B. C., Gabdank I., Narayanan A. K., Ho M., Lee B. T., Rowe L. D., Dreszer T. R., Roe G., Podduturi N. R., Tanaka F., Hong E. L., Cherry J. M., ENCODE data at the ENCODE portal. Nucleic Acids Res. 44, D726–D732 (2016).
  • 54. Koren A., Polak P., Nemesh J., Michaelson J. J., Sebat J., Sunyaev S. R., McCarroll S. A., Differential relationship of DNA replication timing to different forms of human mutation and variation. Am. J. Hum. Genet. 91, 1033–1040 (2012).
  • 55. Vohringer H., Gerstung M., Learning mutational signatures and their multidimensional genomic properties with TensorSignatures. bioRxiv, 850453 (2019).
  • 56. Roadmap Epigenomics Consortium, Kundaje A., Meuleman W., Ernst J., Bilenky M., Yen A., Heravi-Moussavi A., Kheradpour P., Zhang Z., Wang J., Ziller M. J., Amin V., Whitaker J. W., Schultz M. D., Ward L. D., Sarkar A., Quon G., Sandstrom R. S., Eaton M. L., Wu Y.-C., Pfenning A. R., Wang X., Claussnitzer M., Liu Y., Coarfa C., Harris R. A., Shoresh N., Epstein C. B., Gjoneska E., Leung D., Xie W., Hawkins R. D., Lister R., Hong C., Gascard P., Mungall A. J., Moore R., Chuah E., Tam A., Canfield T. K., Hansen R. S., Kaul R., Sabo P. J., Bansal M. S., Carles A., Dixon J. R., Farh K.-H., Feizi S., Karlic R., Kim A.-R., Kulkarni A., Li D., Lowdon R., Elliott G. N., Mercer T. R., Neph S. J., Onuchic V., Polak P., Rajagopal N., Ray P., Sallari R. C., Siebenthall K. T., Sinnott-Armstrong N. A., Stevens M., Thurman R. E., Wu J., Zhang B., Zhou X., Beaudet A. E., Boyer L. A., De Jager P. L., Farnham P. J., Fisher S. J., Haussler D., Jones S. J. M., Li W., Marra M. A., Mc Manus M. T., Sunyaev S., Thomson J. A., Tlsty T. D., Tsai L.-H., Wang W., Waterland R. A., Zhang M. Q., Chadwick L. H., Bernstein B. E., Costello J. F., Ecker J. R., Hirst M., Meissner A., Milosavljevic A., Ren B., Stamatoyannopoulos J. A., Wang T., Kellis M., Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).

Associated Data


Supplementary Materials

Figs. S1 to S11

Tables S1 to S7

Data file S1


Articles from Science Advances are provided here courtesy of American Association for the Advancement of Science
