Nucleic Acids Research. 2017 May 22;45(13):7786–7795. doi: 10.1093/nar/gkx463

The interaction between cytosine methylation and processes of DNA replication and repair shape the mutational landscape of cancer genomes

Rebecca C Poulos 1, Jake Olivier 2, Jason WH Wong 1,*
PMCID: PMC5737810  PMID: 28531315

Abstract

Methylated cytosines (5mCs) are frequently mutated in the genome. However, no studies have yet comprehensively analysed mutation–methylation associations across cancer types. Here we analyse 916 cancer genomes, together with tissue type-specific methylation and replication timing data. We describe a strong mutation–methylation association across colorectal cancer subtypes, most interestingly in samples with microsatellite instability (MSI) or Polymerase epsilon (POLE) exonuclease domain mutations. By analysing genomic regions with differential mismatch repair (MMR) efficiency, we suggest a possible role for MMR in the correction of 5mC deamination events, potentially accounting for the high rate of 5mC mutation accumulation in MSI tumours. Additionally, we propose that mutant POLE exerts a mutator phenotype specifically at 5mCs, and we find coding mutation hotspots in POLE-mutant cancers at highly-methylated CpGs in the tumour-suppressor genes APC and TP53. Finally, using multivariable regression models, we demonstrate that different cancers exhibit distinct mutation–methylation associations, with DNA repair influencing such associations in certain cancer genomes. Taken together, we find differential associations with methylation that are vital for accurately predicting expected mutation loads across cancer types. Our findings reveal links between methylation and common mutation and repair processes, with these mechanisms defining a key part of the mutational landscape of cancer genomes.

INTRODUCTION

Cancer develops as somatic mutations accumulate in cells, with certain driver mutations conferring a growth advantage to a sub-population (1). In some cancers, mutations develop primarily from exposure to exogenous mutagens such as ultraviolet (UV) light or cigarette smoke, while in other cancer types, most mutations accumulate after a cell develops defective replication or repair mechanisms (2). Mutation rates vary throughout the cancer genome due to factors such as trinucleotide composition (3), transcription factor binding (4,5), chromatin organisation (6), replication timing and mismatch repair (MMR) efficiency (7). However, the origin of many mutations within cancer cells remains unknown (3).

DNA methylation is an epigenetic mark, most commonly occurring in the genome at sites of CpG dinucleotides (8). Cytosine methylation involves the covalent attachment of a methyl group to the fifth carbon (C5) of the pyrimidine ring of cytosine, forming 5-methylcytosine (5mC) (9). Methylation has important functions within a cell, influencing development (10), gene expression and silencing (11), as well as being implicated in carcinogenesis (12).

Despite its crucial role in cellular function, however, CpG methylation can also be somewhat mutagenic: methylated cytosines are approximately fivefold more likely to undergo spontaneous deamination (loss of an amine group) than unmethylated cytosines (13). Deamination of 5mC yields thymine, creating a G•T mismatch in DNA that can be recognized by thymine DNA glycosylases and repaired through the base excision repair (BER) pathway (14,15). However, if a cell replicates before the mismatch can be repaired, a C>T mutation becomes encoded in its genome. A mutation signature arising from cytosine deamination at CpG sites (signatures 1A and 1B from Alexandrov et al. (3)) has been identified in many cancer types and is strongly correlated with age at diagnosis because, over time, more deamination events occur and their mutagenic effects accumulate (16). Methylated CpG dinucleotides (mCpGs) have also been found to be highly mutated in non-cancer tissues, with mutation rates likewise increasing with age (17).
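Because a deaminated 5mC becomes fixed as a C>T change only after replication, and the same event reads as G>A when the opposite strand's CpG is affected, CpG transitions are usually tallied on both strands. A minimal sketch of that bookkeeping, assuming plus-strand reference and alternative alleles together with their flanking bases (the function name is hypothetical, not taken from the paper):

```python
def is_cpg_c_to_t(ref: str, alt: str, left: str, right: str) -> bool:
    """Return True if a plus-strand substitution is a C>T transition at a CpG.

    ref/alt: reference and alternative bases on the plus strand.
    left/right: the plus-strand bases immediately 5' and 3' of the site.
    """
    ref, alt, left, right = (base.upper() for base in (ref, alt, left, right))
    # C of a plus-strand CpG mutated to T.
    if ref == "C" and alt == "T" and right == "G":
        return True
    # G of a plus-strand CpG mutated to A: a C>T on the minus-strand CpG.
    if ref == "G" and alt == "A" and left == "C":
        return True
    return False


# Example: the C in "ACG" becomes T, a plus-strand CpG transition.
print(is_cpg_c_to_t("C", "T", left="A", right="G"))  # True
```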

The commonly accepted mechanism of mCpG mutation is that mutations accumulate solely due to random spontaneous deamination of 5mC. However, other processes have also been associated with 5mC mutation or deamination, including exposure to UV light or to cigarette smoke (18). In addition, understanding the repair of G•T mismatches is crucial for determining how mutations at sites of 5mC accumulate within the genome (19). In this study, we analyse the association between methylation and mutation in 61 whole-genome sequenced (WGS) colorectal cancers, together with an additional 855 whole-genomes across 11 cancer types. We describe the association in detail within colorectal cancer subtypes, positing a potential role for MMR in the correction of deaminated 5mCs, and suggesting that Polymerase epsilon (POLE) exonuclease domain mutation increases mutagenesis specifically at 5mCs. We further define the influence of methylation and replication timing on mutation accumulation and repair in cancer, describing distinct mutation–methylation associations in different cancer types, and pinpointing nucleotide excision repair (NER) to be pertinent to mutation profiles at 5mCs in certain cancer genomes.

MATERIALS AND METHODS

Somatic mutations and sample classification

Raw data and somatic mutation calls were obtained from The Cancer Genome Atlas (TCGA) Cancer Genomics Hub (CGhub) (20), International Cancer Genome Consortium (ICGC) (21), or previously published datasets (3,22). Data sources and cancer samples for each cancer type are listed in Supplementary Table S1A and Supplementary Table S1B, with data processing as described (4). In brief, for cancer data obtained from TCGA, mutations were called from BAM files using Strelka (23), with only mutations listed as ‘PASS’ selected for analysis. For cancers with data obtained from ICGC, ‘single base substitutions’ were obtained directly from the ICGC data portal (release 16). Somatic mutations from Alexandrov et al. (3) were obtained from ftp://ftp.sanger.ac.uk/pub/cancer/AlexandrovEtAl and mutations from Zheng et al. (22) were obtained from the database of Genotypes and Phenotypes (dbGaP) (phs000830). Colorectal cancers with microsatellite instability (MSI) were selected if they were designated as MSI high (MSI-H) via annotations from TCGA. Tumours with POLE exonuclease domain mutations (POLE-mutant tumours) were designated as such if they had both a somatic mutation in the exonuclease domain (amino acids 268–471) (24) and a Pearson's correlation >0.85 with signature 10 (3). See Supplementary Materials and Methods for further details of cancer subtype classifications.
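The POLE-mutant criterion combines a positional test with a spectrum-similarity test. The sketch below is an illustration under stated assumptions, not the authors' code: it assumes the sample's mutation spectrum and the signature 10 profile are already expressed over the same ordered channels (for example the 96 trinucleotide substitution classes). Loading signature 10 is not shown, and the function names are hypothetical.

```python
from math import sqrt

# POLE exonuclease domain as defined in the text: amino acids 268-471 inclusive.
EXO_DOMAIN = range(268, 472)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation undefined for a constant vector; treat as none
    return sxy / sqrt(sxx * syy)


def is_pole_mutant(pole_residues, spectrum, signature10, threshold=0.85):
    """Both criteria from the text must hold.

    pole_residues: amino-acid positions of somatic POLE mutations in the sample.
    spectrum, signature10: mutation profiles over the same (assumed aligned) channels.
    """
    in_exo_domain = any(pos in EXO_DOMAIN for pos in pole_residues)
    return in_exo_domain and pearson_r(spectrum, signature10) > threshold
```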

Methylation, replication timing and repair data

Methylation data from normal sigmoid colon tissue were downloaded from the Roadmap Epigenomics Atlas (25) (Gene Expression Omnibus [GEO]: GSM983645). These data were from whole genome bisulfite sequencing (WGBS) obtained as a wig file and converted to BED format using ‘convert2bed’. Methylation values and chromosome coordinates for individual nucleotides in each CpG were merged, taking the value for methylation as that from the cytosine of each CpG dinucleotide. This value was then used for all methylation calculations relating to colorectal cancer mutations throughout this study. Additional methylation datasets were obtained from the Roadmap Epigenomics Atlas (25) and analysed similarly. These datasets were matched to various cancer types and subtypes as listed in Supplementary Table S2, together with their GEO accession numbers. Normal tissue methylation data have been used for all analyses, which is a limitation of our study. However, while sample-matched methylation data are available for some samples, these bisulfite-based technologies are unable to distinguish between a bisulfite-converted cytosine to uracil change (read as C>T) at unmethylated cytosines and methylation-induced spontaneous mCpG deamination resulting in a C>T change, and hence these sample-matched data are inappropriate for use in this study.

Genome-wide replication timing datasets were downloaded from the UCSC Genome Browser (also available through GEO as GSE34399). GM12878 was the only lymphoblastoid cell-line used, to avoid biasing the sample through inclusion of multiple lymphoblastoid cell-lines, as previously described (7). The remaining datasets contained replication timing values for 11 cell-types. The genome was divided into megabase windows using BEDtools (26), with replication timing averaged across cell-types within these windows.
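The windowing step was performed with BEDtools; purely as an illustration, a pure-Python equivalent might look like the following. The input layout and the two-stage averaging (first within a cell type, then across cell types) are assumptions, since the text does not state how multiple values per window were combined. Excluding extra lymphoblastoid lines is assumed to have happened upstream.

```python
from collections import defaultdict
from statistics import mean

WINDOW_BP = 1_000_000  # megabase windows


def window_replication_timing(tracks):
    """Average replication timing per 1 Mb window across cell types.

    tracks: dict mapping cell type -> list of (chrom, pos, timing) tuples,
    with 0-based positions. Values are averaged within each cell type in a
    window, then across cell types, so a cell type with denser measurements
    does not dominate the window average.
    """
    per_window = defaultdict(lambda: defaultdict(list))
    for cell_type, values in tracks.items():
        for chrom, pos, timing in values:
            per_window[(chrom, pos // WINDOW_BP)][cell_type].append(timing)
    return {
        window: mean(mean(vals) for vals in by_cell.values())
        for window, by_cell in per_window.items()
    }
```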

Excision repair sequencing (XR-seq) data for skin fibroblast cell line NHF1 (27) were obtained in Sequence Read Archive (SRA) format (GEO: GSE67941), and processed as previously described (28).

Statistical analyses

Regression models and other statistical analyses were performed in R. For each cancer type or subtype, the binary logistic regression model predicting mutation probability incorporated methylation (with a quadratic term), replication timing and an interaction between methylation and replication timing, as shown below:

$$\ln\left(\frac{P_{mut}}{1 - P_{mut}}\right) = b_0 + b_1 M + b_2 M^2 + b_3 R + b_4 M R$$

where Pmut = probability of mutation; M = methylation; R = replication timing; b0, b1, b2, b3 and b4 represent constants estimated from logistic regression.
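For a fixed replication timing R, the log-odds is quadratic in M, so each model's turning point ("vertex") is M* = -(b1 + b4 R)/(2 b2). A vertex above 1 means mutation probability rises monotonically across the observable methylation range. A vertex between 0 and 1 with b2 < 0 means mutation probability peaks at intermediate methylation. The sketch below evaluates both quantities. The section does not say at which value of R the reported vertices were computed, so R is left as a parameter, and the coefficients in the usage line are invented.

```python
from math import exp


def mutation_probability(m, r, b):
    """Evaluate the fitted logistic model at methylation m and replication timing r.

    b = (b0, b1, b2, b3, b4), matching the coefficients in the model equation.
    """
    b0, b1, b2, b3, b4 = b
    log_odds = b0 + b1 * m + b2 * m ** 2 + b3 * r + b4 * m * r
    return 1.0 / (1.0 + exp(-log_odds))


def methylation_vertex(r, b):
    """Methylation value at which the log-odds is extremal for fixed r.

    Setting d(log-odds)/dM = b1 + 2*b2*M + b4*R to zero gives
    M* = -(b1 + b4*R) / (2*b2).
    """
    _, b1, b2, _, b4 = b
    if b2 == 0:
        raise ValueError("no quadratic term: the log-odds is linear in methylation")
    return -(b1 + b4 * r) / (2 * b2)


b = (-5.0, 2.0, -2.0, 0.0, 0.0)   # hypothetical coefficients with b2 < 0
print(methylation_vertex(0.0, b))  # 0.5
```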

This model was selected for use as it significantly improved upon nested binary logistic regression models with fewer terms (data not shown). A significant improvement was determined by use of both a Likelihood Ratio Test (LRT; ‘lrtest’ function from the ‘lmtest’ package (29); model selected if LRT showed significant improvement by P < 0.05 at all steps between nested models) and the Akaike Information Criterion (AIC; model with smallest AIC was selected). Regression models were constructed using data for autosomes only. Mutations were considered a binary outcome, with each CpG designated as either never mutated in any sample, or mutated in at least one sample, within a given cancer type or subtype. The area under the curve (AUC) was calculated using the ROCR package (30). Equations predicted by the regression models, together with the predicted vertex and AUC from relevant nested models, are recorded in Table 1 and Supplementary Table S3. For further details of regression modelling or other statistical analyses, refer to Supplementary Materials and Methods.
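The selection and evaluation steps can be illustrated outside R. The sketch below stands in for `lmtest::lrtest`, AIC and ROCR rather than reimplementing them. It assumes each nested comparison adds exactly one term, so the likelihood-ratio statistic is referred to a chi-square distribution on 1 degree of freedom. It also assumes log-likelihoods come from already-fitted models. AUC is computed by the rank (Mann-Whitney) formulation, with each CpG labelled 1 if mutated in at least one sample of the cohort and 0 otherwise. Function names are hypothetical.

```python
from math import erfc, sqrt


def lrt_pvalue_1df(loglik_reduced, loglik_full):
    """Likelihood-ratio test for nested models differing by one parameter."""
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # Upper tail of a chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    return erfc(sqrt(stat / 2.0))


def aic(loglik, n_params):
    """Akaike Information Criterion; smaller is preferred."""
    return 2 * n_params - 2 * loglik


def auc(scores, labels):
    """Area under the ROC curve via average ranks (tied scores share a rank)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of this tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    rank_sum = sum(rank for rank, y in zip(ranks, labels) if y)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

Under these assumptions, an AUC improvement for a nested step would be the difference between `auc` values computed from the predictions of the smaller and larger models.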

Table 1. Regression equation from multivariable models predicting mutation probability across colorectal and squamous cell carcinoma subtypes, together with vertex and area under curve (AUC) predictions.


RESULTS AND DISCUSSION

Methylation and mutation associations in colorectal cancer

Recent studies investigating the accumulation of somatic mutations in cancer have shown that mutations in many cancer types increase at promoters due to inhibition of NER at transcription factor binding sites (4,5). Colon cancers have the lowest relative rate of mutations at promoters, attributable to the reduced importance of NER in the repair of mutations accumulating in such tissues (4). In this study, we have investigated the reduction of promoter mutations in colorectal cancer further. To do so, we constructed mutation profiles around transcription start sites (TSSs) using 61 WGS colorectal cancer samples from TCGA and observed a decrease in mutation load in the region immediately surrounding the TSS (Figure 1A). To understand this feature across colorectal cancer subtypes, we separated these colorectal cancer samples into those which were microsatellite stable (MSS), MSI or POLE-mutant. We found each of the subtypes to exhibit reduced mutation loads at the TSS, with more pronounced relative hypo-mutation in MSI and POLE-mutant samples (Supplementary Figure S1A).

Figure 1.


Association between mutation accumulation and methylation in colorectal cancer subtypes. (A) Colorectal cancer (n = 61) mutation profile and average methylation profile from normal colon whole genome bisulfite sequencing (WGBS) data, around transcription start sites (TSSs). Nucleotide-resolution mutation data (light beige), together with mutation data in 25 bp bins (dark beige) is shown. (B) Correlation between mutations per megabase (Mb) at CpG dinucleotides and fractions of CpGs methylated (using normal sigmoid colon tissue WGBS methylation data) across autosomes in Polymerase epsilon exonuclease domain mutation (POLE-mutant) colorectal cancers, those with microsatellite instability (MSI) or those that are microsatellite stable (MSS). Genome-wide data is binned for each colorectal cancer subtype (bins of 0.1 methylation), along with r² and significance from Pearson's correlation. The comparison of MSI and POLE-mutant slopes with MSS slopes was calculated by linear regression on binned data, with MSS as the reference factor.

As CpG methylation is typically lower at CpG island (CGI)-associated promoter elements (31), we investigated methylation around the TSS using normal sigmoid colon WGBS data (25). Mapping average CpG methylation, we observed a corresponding decrease in methylation in the region immediately surrounding the TSS (Figure 1A; see also Supplementary Figure S1B for DNase I hypersensitivity (DHS) and H3K4me3 profiles around the TSS, which indicate promoter activity). Given this association, we correlated normal colon tissue methylation with mutations per megabase (Mb) of CpG dinucleotides across autosomes in colorectal cancer to determine whether more highly methylated sites are more frequently mutated. We found a significant association between mutation load and methylation (with CpGs binned by increasing fraction methylated) in each colorectal cancer subtype (P < 0.0001, Pearson's correlation; Figure 1B). Further, we observed significantly steeper slopes of association for POLE-mutant and MSI samples than for MSS samples [P < 0.0001 (POLE-mutant) and P < 0.05 (MSI), linear regression; Figure 1B]. We propose that the baseline association observed in MSS samples represents the endogenous rate of mCpG deamination and repair in colon tissue. Therefore, this finding demonstrates that the increased CpG mutation loads of MSI and POLE-mutant colorectal cancers must be attributable either to methylation-associated mutagenesis or to methylation-associated repair deficiencies.
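A minimal sketch of the binning behind these correlations follows. It assumes per-CpG methylation fractions from the normal-tissue WGBS data and per-CpG mutation counts pooled across a cohort. The text reports mutations per megabase of CpG dinucleotides; here the rate is expressed per million CpG sites, which is an assumed normalisation. Function names are hypothetical.

```python
def binned_mutation_rates(cpgs, n_bins=10):
    """Mutation rate per bin of methylation fraction.

    cpgs: iterable of (methylation_fraction, mutation_count) per CpG,
    where mutation_count is summed over the samples in a cohort.
    Returns bin centres and mutations per million CpG sites in each bin.
    """
    sites = [0] * n_bins
    mutations = [0] * n_bins
    for meth, count in cpgs:
        b = min(int(meth * n_bins), n_bins - 1)  # methylation of 1.0 goes to the top bin
        sites[b] += 1
        mutations[b] += count
    centres, rates = [], []
    for b in range(n_bins):
        if sites[b]:  # skip empty bins
            centres.append((b + 0.5) / n_bins)
            rates.append(mutations[b] / sites[b] * 1e6)
    return centres, rates


def r_squared(x, y):
    """Square of the Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)
```

Calling `r_squared(*binned_mutation_rates(cpgs))` on a cohort's CpGs would give the binned r² for that subtype.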

Potential role for mismatch repair in the correction of 5mC deamination-induced mismatches

We find no evidence in the literature to suggest that loss of MMR increases the rate of spontaneous deamination at mCpG dinucleotides. We therefore suggest that the difference we observed between the rates of mutation accumulation at mCpGs in MSS and MSI colorectal cancer genomes must instead be due to a methylation-associated repair deficiency.

MMR efficiency differs with replication timing, as MMR is more active in early-replicating regions (7). Therefore, if MMR is involved to a significant extent in the repair of mCpG deamination-induced mismatches, mutations would accumulate at a greater rate at highly-methylated sites in later-replicating regions where MMR efficiency is poorer. We found this to be the case in MSS (Figure 2A) and POLE-mutant genomes (Supplementary Figure S2), both of which are MMR-proficient. In these genomes, we found the slope of the mutation–methylation association to be increased in mid- and late-replicating regions, when compared with early-replicating regions [MSS: P = 0.0546 (mid) and P < 0.0001 (late); POLE-mutant: P < 0.01 (mid) and P < 0.0001 (late), linear regression; Figure 2A and Supplementary Figure S2]. In contrast, in MSI cancers, where most mutations accumulate in the absence of MMR, we do not observe the effects of differing replication timing on the correction of mCpG deamination-induced mismatches. In these cancers, we found there to be no difference in the slope of mutation–methylation associations in mid- or late-replicating regions, when compared with early-replicating regions [P = 0.6237 (mid) and P = 0.5342 (late), linear regression; Figure 2B]. Taken together, these data would support a role for MMR in the repair of deamination-induced mismatches at mCpG dinucleotides.
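With early-replicating regions as the reference level, the interaction coefficient in a regression of binned mutation rate on methylation, replication group and their product equals the difference between slopes. For a comparison of two groups, that model is equivalent to fitting each group separately and pooling the residual variance, which the sketch below does with the standard library. It returns the slope difference, its t statistic and the degrees of freedom. Converting t to a P value needs a t distribution (for example `scipy.stats.t.sf`), which is omitted here. Fitting early, mid and late groups in one model would instead pool variance across all three groups; which form was used is not stated in this section.

```python
from math import sqrt


def _fit_line(x, y):
    """Ordinary least-squares fit; returns slope, Sxx, residual sum of squares, n."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return slope, sxx, sse, n


def slope_difference(x_ref, y_ref, x_cmp, y_cmp):
    """Slope of the comparison group minus slope of the reference group.

    Equivalent to the interaction coefficient in y ~ x * group for two groups,
    with residual variance pooled across both groups.
    Returns (difference, t statistic, degrees of freedom).
    """
    b_ref, sxx_ref, sse_ref, n_ref = _fit_line(x_ref, y_ref)
    b_cmp, sxx_cmp, sse_cmp, n_cmp = _fit_line(x_cmp, y_cmp)
    df = n_ref + n_cmp - 4  # two intercepts and two slopes estimated
    pooled_var = (sse_ref + sse_cmp) / df
    se = sqrt(pooled_var * (1 / sxx_ref + 1 / sxx_cmp))
    diff = b_cmp - b_ref
    return diff, diff / se, df
```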

Figure 2.


Association between mutation accumulation and methylation across changes in replication timing in colorectal cancers with differential mismatch repair. Correlation between mutations per megabase (Mb) at CpG dinucleotides and fractions of CpGs methylated (using normal sigmoid colon tissue WGBS methylation data) across autosomes in (A) microsatellite stable (MSS) colorectal cancers, and (B) colorectal cancers with microsatellite instability (MSI). Genome-wide data is binned for each colorectal cancer subtype (bins of 0.1 methylation), along with r² and significance from Pearson's correlation. The comparison of mid- and late-replicating slopes with early-replicating slopes was calculated by linear regression on binned data, with ‘early-replicating’ as the reference factor.

It is worth noting, however, that BER via methyl-CpG binding domain 4 (MBD4) and thymine-DNA glycosylase (TDG) is more commonly associated with the repair of 5mC deamination-induced G•T mismatches (14,15). Studies have shown that impairment of these glycosylases can cause increased transition mutations at mCpGs (32–34). In particular, MBD4 is altered at high rates in MSI colorectal cancers owing to MSI-induced mutations in polynucleotide tracts within its coding region (35–37). Indeed, we found that 4/10 of our MSI samples harboured a truncation of MBD4. However, as the mutation–methylation association does not differ between MSI samples with and without MBD4 truncation, we consider it unlikely that MBD4 inactivation alone is responsible for the increased rate of mutation accumulation that we observe at 5mCs in MSI cancers (see Supplementary Data Note). Alternatively, the samples without MBD4 truncations may harbour other BER defects that we have not detected, or loss of MMR may simultaneously impair BER activity. Further research is required to determine whether MMR plays a direct role in repairing errors at mCpG dinucleotides or indirectly affects their correction.

Mutagenesis at 5mC nucleotides in POLE-mutant colorectal cancers

Focusing next on the association between methylation and mutation accumulation in POLE-mutant tumours, we computed the correlation coefficient between CpG mutations and methylation for individual POLE-mutant colorectal cancer samples. The slope of the line of best fit from binned data comparing CpG mutations with fractions of CpGs methylated ranged from 521.7 to 3090 (Supplementary Figure S3), and each sample's slope correlated significantly and positively with its total number of mutations (r² = 0.67, P < 0.05, Pearson's correlation; Figure 3A), confirming our observation that much of the increased mutagenesis at CpGs in POLE-mutant cancers is methylation-associated (see Figure 1B). POLE-mutant samples carry POLE with an inactivated exonuclease domain, leading to a loss of proofreading of newly synthesized DNA (38,39). Samples with greater absolute numbers of mutations will therefore generally either have a stronger mutator phenotype or have acquired the POLE exonuclease domain mutation earlier. However, as neither of these factors is expected to alter the rate of 5mC deamination, we hypothesized that exonuclease domain-mutated POLE may instead make replication errors more often when encountering a site requiring the insertion of guanine in a mCpG context, although other mechanisms of mutagenesis related to defective POLE are also possible (40). (We note that these data could also be explained if errors are introduced by wild-type POLE when encountering a mCpG context, but we find no evidence in the literature from in vitro studies to suggest that wild-type replicative polymerases typically make such errors in the context of 5mCs.)

Figure 3.


Methylation-associated mutations in POLE-mutant colorectal cancers. (A) Correlation of total mutations per Polymerase epsilon exonuclease domain mutant (POLE-mutant) colorectal cancer sample, with the slope of line of best fit from the mutation–methylation association at Supplementary Figure S3. r² and significance are by Pearson's correlation. (B) Percentage of candidate sites which harbour C>T mutations in a TCG context per sample for low (<25%) and high (>75%) methylated CpGs (using normal colon tissue methylation data) per POLE-mutant colorectal cancer sample. Mean and standard deviation are shown; significance is by unpaired t-test where **P < 0.01. (C) Strand-specificity of T[C>A]T (left) and T[C>T]G (right) mutations in the regions 5′ and 3′ to origins of replication (oriC) near TOP1 (top) and LMNB2 (bottom). Significance is by Fisher's exact test. (D) Excerpt from the UCSC genome browser, depicting strand specificity of T[C>A]T and T[C>T]G mutations 5′ and 3′ to the oriC near TOP1 (top) and LMNB2 (bottom).

With the TCG trinucleotide being the most highly mutated CpG context in POLE-mutant tumours (3,41), we found a significantly greater proportion of T[C>T]G mutations at highly methylated than at lowly methylated TCG sites (P < 0.01, paired t-test; Figure 3B). Given POLE's role in leading-strand replication (42,43), we investigated the strand specificity of T[C>A]T and T[C>T]G mutations, both of which are common in POLE-mutant cancer genomes (3,41). We found significant strand asymmetry in both trinucleotide contexts around two known origins of replication (P < 0.0001 and P < 0.05, Fisher's exact test; Figure 3C and D). Because this asymmetry ties the mutations to replication of a specific strand, the finding suggests that CpG mutations in POLE-mutant tumours arise from replication-associated mutagenesis rather than from some unknown, genome-wide repair deficiency. As these mutations occur more often at methylated than unmethylated cytosines (Figure 1B and Figure 3B), we propose that this mutagenesis is specifically associated with 5mCs. Further support for our hypothesis comes from prior evidence that POLE exonuclease domain mutation can result in a mutator phenotype greater than that from proofreading deficiency alone (39), with some variants increasing mutation load even above that from catalytic domain inactivation (41).
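Strand asymmetry around an origin can be summarised as a 2×2 table, with rows for the regions 5′ and 3′ of the origin and columns for whether the mutated pyrimidine lies on the plus or the minus strand. That table layout is an assumption made for illustration. A minimal two-sided Fisher's exact test using only the standard library is sketched below. It sums the probabilities of all tables no more likely than the observed one, the convention documented for R's `fisher.test`. The counts in the tests are made up.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def table_prob(x):
        # Hypergeometric probability that the top-left cell equals x
        # with all row and column totals held fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_observed = table_prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Small relative tolerance so floating-point ties with the observed table count.
    p = sum(table_prob(x) for x in range(lo, hi + 1)
            if table_prob(x) <= p_observed * (1 + 1e-7))
    return min(p, 1.0)
```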

mCpG mutations as potential driver events in POLE-mutant colorectal cancers

Many mutations responsible for genetic diseases are C>T transitions occurring at CpG dinucleotides (15,44). Additionally, methylated CpGs are hotspots for somatic cancer mutations in driver genes such as TP53, RB1 and EGFR (15,45–47). POLE-mutant colorectal cancers harbour specific mutation hotspots in the key tumour suppressors tumor protein p53 (TP53) and adenomatous polyposis coli (APC) (41,48), a finding we have confirmed in our samples (Figure 4A). As POLE exonuclease domain mutation is thought to be an early event in tumours (38), these POLE-mutant-signature mutations could also occur early in oncogenesis and serve as gatekeeper mutations, conferring a growth advantage to cellular subpopulations and driving tumour growth. We observed these mutation hotspots (truncating C>T mutations at TP53 R213X and APC R1114X) to occur at TCG trinucleotides, leading us to hypothesize that these sites may be mutated more often in POLE-mutant tumours specifically because of the strong mutation–methylation association that we have observed in this cancer subtype.

Figure 4.


Mutation hotspots in cancer driver genes in POLE-mutant tumors. (A) Contingency table and significance from Fisher's exact test of Polymerase epsilon exonuclease domain mutant (POLE-mutant) and microsatellite stable (MSS) colorectal cancer samples which are wild-type or mutant at tumor protein p53 (TP53) R213 and adenomatous polyposis coli (APC) R1114 codons. (B) Methylation status in normal colon tissue for each CpG site within coding exons of TP53 and APC, together with significance by one-sample t-test against methylation at R213 and R1114 codons respectively. Mean and standard deviation are shown. ** denotes P < 0.01.

We found these sites to be highly methylated in normal colon tissue, with the CpG at TP53 R213 methylated in 97.1% of reads and that at APC R1114 methylated in 98.7% of reads (Figure 4B). However, while these sites are methylated to a significantly greater extent than other codons in the same gene (P < 0.01, one-sample t-test; Figure 4B), other locations in TP53 or APC may be equally likely to become mutated when considering methylation alone. To investigate this, we considered all possible C>T mutations at TCG trinucleotides that would immediately truncate either TP53 or APC. We found that R213 is the only site in TP53 fulfilling these criteria (Supplementary Figure S4A), potentially explaining its hotspot mutation status in POLE-mutant samples. In APC, we found three additional sites fulfilling these criteria closer to the N-terminus of the protein, together with five sites at or beyond codon 1920, towards the C-terminus (Supplementary Figure S4B). Taken together, our findings suggest that methylation may be responsible for the formation of specific mutation hotspots in POLE-mutant cancers, with other factors likely also contributing to mutation occurrence and selection within cells, perhaps because mutations at specific sites confer a phenotype that makes them more likely to be observed in cancer sequencing data (15).
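The enumeration described above can be sketched as a scan over an in-frame coding sequence. In practice only one change qualifies: a C>T at the first base of a CGA (arginine) codon preceded by T yields the stop codon TGA. A C>T on the template strand, seen as G>A on the coding strand, cannot create a stop codon at a CpG. The function below still checks every C in a TCG context so that this reasoning is explicit. It assumes the input is the spliced coding sequence beginning at the start codon; the function name and toy sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def truncating_tcg_sites(cds):
    """Return 1-based codon numbers where a coding-strand C>T at a TCG creates a stop.

    cds: in-frame coding sequence (A/C/G/T), beginning at the start codon.
    """
    cds = cds.upper()
    hits = []
    for start in range(0, len(cds) - 2, 3):
        codon = cds[start:start + 3]
        for offset, base in enumerate(codon):
            pos = start + offset
            # Require a C with T immediately 5' and G immediately 3' (a TCG context).
            if base != "C" or pos == 0 or pos + 1 >= len(cds):
                continue
            if cds[pos - 1] == "T" and cds[pos + 1] == "G":
                mutant = codon[:offset] + "T" + codon[offset + 1:]
                if mutant in STOP_CODONS:
                    hits.append(start // 3 + 1)
    return hits
```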

Differential influence of methylation on mutation accumulation across cancer types and subtypes

Having described a strong mutation–methylation association across colorectal cancer subtypes, we sought to investigate whether any such association exists in other cancer types. To do so, we incorporated into our analyses somatic mutations from an additional 855 whole genomes across 11 cancer types available from TCGA, ICGC and previously published datasets (3,22) (Supplementary Table S1A and Supplementary Table S1B). We developed regression models using both tissue type-specific methylation data (Supplementary Table S2) and average cell-type replication timing data, plotting actual mutations together with the function predicted by multivariable regression models (see Materials and Methods).

To first validate our regression models, we investigated the predicted associations in colorectal cancer, finding a positive association between mutation probability and fraction of CpGs methylated across colorectal cancer subtypes for all possible methylation values (function vertex > 1 fraction of CpGs methylated; Table 1), consistent with what we have already demonstrated. Also confirming previous findings (7), we found that mutation probability varied little with replication timing in MSI colorectal cancers compared with MSS and POLE-mutant subtypes (rightmost graphs; Figure 5). This is also evident in the small improvement in the area under the curve (AUC) when replication timing was added to nested models (MSI: 2.0%, compared with 16.4% in MSS and 12.6% in POLE-mutant subtypes; Table 1).

Figure 5.


Actual and predicted mutation rates, according to methylation and replication timing for colorectal cancer subtypes. Graphs depict actual and predicted (by regression model; see Materials and Methods) mutation probability and log odds of mutation probability by methylation or replication timing, for (A) microsatellite stable (MSS) colorectal cancer, (B) colorectal cancers with microsatellite instability (MSI) and (C) colorectal cancers with Polymerase epsilon exonuclease domain mutation (POLE-mutant). Graphs from left to right are: mutation probability by fraction of CpGs methylated (actual and predicted), log odds of mutation probability by fraction of CpGs methylated (actual and predicted), log odds of mutation probability by fraction of CpGs methylated (predicted, using overall average replication timing in all bins), log odds of mutation probability by replication timing (actual and predicted) and log odds of mutation probability by replication timing (predicted, using overall average methylation in all bins). Binned data is shown (bins of 0.1 for methylation or 10 for replication timing). See Table 1 for regression output, predicted vertex and area under curve values.

Having validated our regression models in this way, we then examined the mutation–methylation association in skin cancer subtypes, as skin cancers are subject to well-defined mutation and repair processes associated with UV light. The propensity for mutagenic cyclobutane pyrimidine dimer (CPD) DNA lesion formation following UV light exposure is known to increase at mCpGs (49,50), and hence we would expect that the underlying association between CpG mutation rate and methylation in UV light-induced cancers should be both positive and linear. However, in both squamous cell carcinoma (SCC) and melanoma we found that the association between mutation rate and methylation was non-linear (Figure 6A and Supplementary Figure S5G). The vertex predicted by our multivariable regression model was at 0.51 (SCC; Table 1) and 0.50 (melanoma; Supplementary Table S3) fraction of CpGs methylated, meaning that at methylation fractions greater than ∼0.5, increasing methylation was actually associated with decreasing mutation probability (Figure 6A and Supplementary Figure S5G).

Figure 6.


Actual and predicted mutation probability and regression function vertex according to methylation and replication timing for squamous cell carcinoma (SCC) subtypes. Actual and predicted (by regression model; see Materials and Methods) mutation probability by fraction of CpGs methylated for (A) Xeroderma pigmentosum complementation group C wild-type (XPC wild-type) and (B) XPC mutant (XPC−/−) SCC. Binned data is shown (bins of 0.1 for methylation), with the function's vertex indicated by a dotted line. See Table 1 for regression output, predicted vertex and area under curve values. (C) Vertex predicted by regression model on individual XPC wild-type and XPC−/− SCC samples, with significance by unpaired t-test. Mean and standard deviation are shown. **** denotes P < 0.0001.

To determine why this might be the case, we examined the efficiency of NER in the context of replication timing. NER is a key repair mechanism in many skin cancers due to its role in the removal of UV light-induced DNA lesions. We find levels of NER (in response to UV light exposure) to be increased in early-replicating regions (CPD: r² = 0.78, and (6–4)pyrimidine–pyrimidone photoproduct ((6–4)PP): r² = 0.57, P < 0.0001, Pearson's correlation; Supplementary Figure S6A), likely because early-replicating regions tend to be more highly transcribed (51) and are therefore more frequently subject to transcription-coupled NER and domain-associated global genome NER (22). We found that the mutation–methylation pattern in the skin cancers (Figure 6A and Supplementary Figure S5G) closely mimics the replication timing-methylation pattern in NHEK cells (Supplementary Figure S6B), suggesting that NER may underlie the non-linear relationship between methylation and CpG mutation rate in skin cancers. To investigate this, we examined the mutation–methylation association in global genome NER-deficient Xeroderma pigmentosum complementation group C mutant (XPC−/−) SCCs (Figure 6B and Supplementary Figure S6D), and compared this with XPC wild-type SCCs (Figure 6A and Supplementary Figure S6C). Although the mutation–methylation association remained non-linear in the NER-deficient XPC−/− subtype, the vertex of the function predicting mutation probability shifted upwards from 0.51 fraction of CpGs methylated in XPC wild-type SCC to 0.64 in XPC−/− SCC (Table 1). [This shift can be reproduced with vertices predicted by regression models using individual XPC wild-type and XPC−/− SCC samples (P < 0.0001, unpaired t-test; Figure 6C).] Further, adding methylation to a nested model improved the AUC by 4.9% in XPC−/− SCC but by only 2.4% in XPC wild-type SCC (Table 1).
Taken together, our results suggest that the negative association between mutation rate and methylation at high fractions of methylation is, at least in part, driven by the underlying association between mutation rate and replication timing that arises from reliance on NER following UV light exposure. Because some highly methylated regions are active gene bodies, which tend to be both early-replicating (52) and subject to transcription-coupled NER (22), these regions likely carry a reduced overall mutation load in skin cancers.

For other cancer types, the multivariable regression models predicted the regression function's vertex to lie between 0 and 1 fraction of CpGs methylated for breast, liver, ovarian and pancreatic cancers, as well as chronic lymphocytic leukaemia (Supplementary Table S3, Supplementary Figure S5). The primary mutation and repair processes are not well understood in many of these cancers, whose samples harbour varied mutation signatures and many mutations of unknown origin (3). It is possible that our regression models cannot completely separate the association between replication timing and methylation (both of which significantly affect mutation rate), or that tumour-specific methylation changes substantially alter the mutation–methylation associations we observe. However, it may also be that in some cancer types the underlying association is such that highly methylated CpGs are in fact less likely to become mutated, owing to the specific mutation and repair processes inherent in each tissue type. Consistent with this, a previous analysis found that the genome-wide rate of C>T single nucleotide polymorphisms (SNPs) increases only at sites with low and intermediate (20–60%) methylation, and not at highly methylated sites (53).
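The regression function's vertex can be illustrated under an assumption: that the linear predictor contains methylation both linearly and squared, logit(p) = b0 + b1·m + b2·m², with any other covariates (such as replication timing) entering additively. The turning point in m is then m* = −b1/(2·b2), a maximum when b2 < 0, and a vertex strictly between 0 and 1 means that predicted mutation probability turns within the observed methylation range, as reported above for several cancer types. The exact model specification is given in Materials and Methods and is not reproduced here. The hand-rolled Newton-Raphson fit below exists only to keep the sketch self-contained (a standard GLM routine would normally be used), and all names are hypothetical.

```python
import math


def _solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_quadratic_logistic(meth, y, max_iter=50, tol=1e-10):
    """Fit logit(p) = b0 + b1*m + b2*m**2 by Newton-Raphson; returns [b0, b1, b2].

    y holds 0/1 mutation indicators (fractional values are also accepted).
    """
    beta = [0.0, 0.0, 0.0]
    for _ in range(max_iter):
        grad = [0.0] * 3
        hess = [[0.0] * 3 for _ in range(3)]
        for m, yi in zip(meth, y):
            x = (1.0, m, m * m)
            eta = beta[0] + beta[1] * m + beta[2] * m * m
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for j in range(3):
                grad[j] += (yi - p) * x[j]  # gradient of the log-likelihood
                for k in range(3):
                    hess[j][k] += w * x[j] * x[k]  # information matrix X^T W X
        step = _solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def vertex(beta):
    """Turning point of the methylation term, m* = -b1 / (2 * b2), and its type."""
    _, b1, b2 = beta
    if b2 == 0:
        return None, "linear"
    kind = "maximum" if b2 < 0 else "minimum"
    return -b1 / (2.0 * b2), kind
```

For a fitted model, `m_star, kind = vertex(beta)` followed by a check such as `0.0 < m_star < 1.0` distinguishes a function that turns within the observed methylation range from one that is monotonic over it.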

CONCLUSION

In this study, we analysed 61 colorectal cancer whole genomes, together with data from an additional 11 cancer types. Using tissue-specific methylation data, we describe a strong association between C>T mutations and methylation at CpG dinucleotides in many cancer types, driving patterns of mutation formation throughout the genome. Our evidence suggests that MMR may play a role in correcting G•T mismatches that result from deamination of 5mC. We also propose a mutator phenotype occurring specifically at 5mCs that results from POLE exonuclease domain mutation, a phenotype that we implicate in potentially driving tumour growth through the formation of specific mutation hotspots in key cancer-associated genes. Additionally, we reveal distinct associations between mutation and methylation across cancer types, highlighting the influence of DNA repair on mutation–methylation associations in some genomes. Together, our findings advance our understanding of mutation formation and repair at CpG dinucleotides.

Our study describes distinct mutation–methylation associations in cancer genomes, which must be understood in order to effectively predict expected mutation loads across cancer types and subtypes. We emphasise the need to stratify cancer subtypes according to relevant mutation and repair mechanisms when developing predictive models of mutation rate based on genetic and epigenetic features of the genome. Doing so will allow researchers to distinguish driver from passenger mutations more accurately. Our findings reveal novel links between methylation and common mutation and repair processes, and show these to be key mechanisms that define the mutational landscape of cancer genomes.

ACKNOWLEDGEMENTS

The authors thank TCGA, ICGC and other groups who have made data publicly available.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

National Health and Medical Research Council (NHMRC, Australia) [APP1119932]; Cancer Institute NSW [13/DATA/1-02]; Cure Cancer Foundation Australia with the assistance of Cancer Australia through the Priority-driven Collaborative Cancer Research Scheme [APP1057921 to J.W.H.W.]; Australian Postgraduate Award (to R.C.P.); Australian Research Council Future Fellowship [FT130100096 to J.W.H.W.]. Funding for open access charge: NHMRC.

Conflict of interest statement. None declared.

REFERENCES

  • 1. Nowell P.C. The clonal evolution of tumor cell populations. Science. 1976; 194:23–28.
  • 2. Vogelstein B., Papadopoulos N., Velculescu V.E., Zhou S., Diaz L.A., Kinzler K.W. Cancer genome landscapes. Science. 2013; 339:1546–1558.
  • 3. Alexandrov L.B., Nik-Zainal S., Wedge D.C., Aparicio S.A.J.R., Behjati S., Biankin A.V., Bignell G.R., Bolli N., Borg A., Børresen-Dale A.-L. et al. Signatures of mutational processes in human cancer. Nature. 2013; 500:415–421.
  • 4. Perera D., Poulos R.C., Shah A., Beck D., Pimanda J.E., Wong J.W.H. Differential DNA repair underlies mutation hotspots at active promoters in cancer genomes. Nature. 2016; 532:259–263.
  • 5. Sabarinathan R., Mularoni L., Deu-Pons J., Gonzalez-Perez A., López-Bigas N. Nucleotide excision repair is impaired by binding of transcription factors to DNA. Nature. 2016; 532:264–267.
  • 6. Schuster-Böckler B., Lehner B. Chromatin organization is a major influence on regional mutation rates in human cancer cells. Nature. 2012; 488:504–507.
  • 7. Supek F., Lehner B. Differential DNA mismatch repair underlies mutation rate variation across the human genome. Nature. 2015; 521:81–84.
  • 8. Riggs A.D., Jones P.A. In: Klein G., Weinhouse S., editors. Advances in Cancer Research. Academic Press; 1983; 40:1–30.
  • 9. Brero A., Leonhardt H., Cardoso M.C. Replication and translation of epigenetic information. Curr. Top. Microbiol. Immunol. 2006; 301:21–44.
  • 10. Smith Z.D., Meissner A. DNA methylation: roles in mammalian development. Nat. Rev. Genet. 2013; 14:204–220.
  • 11. Doerfler W. De novo methylation, long-term promoter silencing, methylation patterns in the human genome, and consequences of foreign DNA insertion. Curr. Top. Microbiol. Immunol. 2006; 301:125–175.
  • 12. Jones P.A., Baylin S.B. The fundamental role of epigenetic events in cancer. Nat. Rev. Genet. 2002; 3:415–428.
  • 13. Ehrlich M., Norris K.F., Wang R.Y., Kuo K.C., Gehrke C.W. DNA cytosine methylation and heat-induced deamination. Biosci. Rep. 1986; 6:387–393.
  • 14. Jacobs A.L., Schär P. DNA glycosylases: in DNA repair and beyond. Chromosoma. 2012; 121:1–20.
  • 15. Walsh C.P., Xu G.L. Cytosine methylation and DNA repair. Curr. Top. Microbiol. Immunol. 2006; 301:283–315.
  • 16. Alexandrov L.B., Jones P.H., Wedge D.C., Sale J.E., Campbell P.J., Nik-Zainal S., Stratton M.R. Clock-like mutational processes in human somatic cells. Nat. Genet. 2015; 47:1402–1407.
  • 17. Rahbari R., Wuster A., Lindsay S.J., Hardwick R.J., Alexandrov L.B., Turki S.A., Dominiczak A., Morris A., Porteous D., Smith B. et al. Timing, rates and spectra of human germline mutation. Nat. Genet. 2016; 48:126–133.
  • 18. Pfeifer G.P. Mutagenesis at methylated CpG sequences. Curr. Top. Microbiol. Immunol. 2006; 301:259–281.
  • 19. Wiebauer K., Neddermann P., Hughes M., Jiricny J. In: Jost J.-P., Saluz H.-P., editors. DNA Methylation: Molecular Biology and Biological Significance. Birkhäuser Basel, Basel; 1993; 510–522.
  • 20. Wilks C., Cline M.S., Weiler E., Diekhans M., Craft B., Martin C., Murphy D., Pierce H., Black J., Nelson D. et al. The Cancer Genomics Hub (CGHub): overcoming cancer through the power of torrential data. Database. 2014; 2014:bau093.
  • 21. International Cancer Genome Consortium. International network of cancer genome projects. Nature. 2010; 464:993–998.
  • 22. Zheng C.L., Wang N.J., Chung J., Moslehi H., Sanborn J.Z., Hur J.S., Collisson E.A., Vemula S.S., Naujokas A., Chiotti K.E. et al. Transcription restores DNA repair to heterochromatin, determining regional mutation rates in cancer genomes. Cell Rep. 2014; 9:1228–1234.
  • 23. Saunders C.T., Wong W.S., Swamy S., Becq J., Murray L.J., Cheetham R.K. Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics. 2012; 28:1811–1817.
  • 24. Heitzer E., Tomlinson I. Replicative DNA polymerase mutations in cancer. Curr. Opin. Genet. Dev. 2014; 24:107–113.
  • 25. Roadmap Epigenomics Consortium, Kundaje A., Meuleman W., Ernst J., Bilenky M., Yen A., Heravi-Moussavi A., Kheradpour P., Zhang Z., Wang J. et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015; 518:317–330.
  • 26. Quinlan A.R., Hall I.M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010; 26:841–842.
  • 27. Hu J., Adar S., Selby C.P., Lieb J.D., Sancar A. Genome-wide analysis of human global and transcription-coupled excision repair of UV damage at single-nucleotide resolution. Genes Dev. 2015; 29:948–960.
  • 28. Poulos R.C., Thoms J.A.I., Guan Y.F., Unnikrishnan A., Pimanda J.E., Wong J.W.H. Functional mutations form at CTCF-cohesin binding sites in melanoma due to uneven nucleotide excision repair across the motif. Cell Rep. 2016; 17:2865–2872.
  • 29. Zeileis A., Hothorn T. Diagnostic checking in regression relationships. R News. 2002; 2:7–10.
  • 30. Sing T., Sander O., Beerenwinkel N., Lengauer T. ROCR: visualizing classifier performance in R. Bioinformatics. 2005; 21:3940–3941.
  • 31. Long H.K., King H.W., Patient R.K., Odom D.T., Klose R.J. Protection of CpG islands from DNA methylation is DNA-encoded and evolutionarily conserved. Nucleic Acids Res. 2016.
  • 32. Wong E., Yang K., Kuraguchi M., Werling U., Avdievich E., Fan K., Fazzari M., Jin B., Brown A.M.C., Lipkin M. et al. Mbd4 inactivation increases C→T transition mutations and promotes gastrointestinal tumor formation. Proc. Natl. Acad. Sci. U.S.A. 2002; 99:14937–14942.
  • 33. Millar C.B., Guy J., Sansom O.J., Selfridge J., MacDougall E., Hendrich B., Keightley P.D., Bishop S.M., Clarke A.R., Bird A. Enhanced CpG mutability and tumorigenesis in MBD4-deficient mice. Science. 2002; 297:403.
  • 34. Vasovcak P., Krepelova A., Menigatti M., Puchmajerova A., Skapa P., Augustinakova A., Amann G., Wernstedt A., Jiricny J., Marra G. et al. Unique mutational profile associated with a loss of TDG expression in the rectal cancer of a patient with a constitutional PMS2 deficiency. DNA Repair. 2012; 11:616–623.
  • 35. Bader S., Walker M., Hendrich B., Bird A., Bird C., Hooper M., Wyllie A. Somatic frameshift mutations in the MBD4 gene of sporadic colon cancers with mismatch repair deficiency. Oncogene. 1999; 18:8044–8047.
  • 36. Evertson S., Wallin A., Arbman G., Rutten S., Emterling A., Zhang H., Sun X.F. Microsatellite instability and MBD4 mutation in unselected colorectal cancer. Anticancer Res. 2003; 23:3569–3574.
  • 37. Riccio A., Aaltonen L.A., Godwin A.K., Loukola A., Percesepe A., Salovaara R., Masciullo V., Genuardi M., Paravatou-Petsotas M., Bassi D.E. et al. The DNA repair gene MBD4 (MED1) is mutated in human carcinomas with microsatellite instability. Nat. Genet. 1999; 23:266–268.
  • 38. Rayner E., van Gool I.C., Palles C., Kearsey S.E., Bosse T., Tomlinson I., Church D.N. A panoply of errors: polymerase proofreading domain mutations in cancer. Nat. Rev. Cancer. 2016; 16:71–81.
  • 39. Kane D.P., Shcherbakova P.V. A common cancer-associated DNA polymerase ε mutation causes an exceptionally strong mutator phenotype, indicating fidelity defects distinct from loss of proofreading. Cancer Res. 2014; 74:1895–1901.
  • 40. Aksenova A., Volkov K., Maceluch J., Pursell Z.F., Rogozin I.B., Kunkel T.A., Pavlov Y.I., Johansson E. Mismatch repair–independent increase in spontaneous mutagenesis in yeast lacking non-essential subunits of DNA polymerase ε. PLOS Genet. 2010; 6:e1001209.
  • 41. Shinbrot E., Henninger E.E., Weinhold N., Covington K.R., Göksenin A.Y., Schultz N., Chao H., Doddapaneni H., Muzny D.M., Gibbs R.A. et al. Exonuclease mutations in DNA polymerase epsilon reveal replication strand specific mutation patterns and human origins of replication. Genome Res. 2014; 24:1740–1750.
  • 42. Pursell Z.F., Isoz I., Lundström E.B., Johansson E., Kunkel T.A. Yeast DNA polymerase epsilon participates in leading-strand DNA replication. Science. 2007; 317:127–130.
  • 43. Miyabe I., Kunkel T.A., Carr A.M. The major roles of DNA polymerases epsilon and delta at the eukaryotic replication fork are evolutionarily conserved. PLOS Genet. 2011; 7:e1002407.
  • 44. Cooper D.N., Bacolla A., Férec C., Vasquez K.M., Kehrer-Sawatzki H., Chen J.-M. On the sequence-directed nature of human gene mutation: the role of genomic architecture and the local DNA sequence environment in mediating gene mutations underlying human inherited disease. Hum. Mutat. 2011; 32:1075–1099.
  • 45. Fujii A., Harada T., Iwama E., Ota K., Furuyama K., Ijichi K., Okamoto T., Okamoto I., Takayama K., Nakanishi Y. Hypermethylation of the CpG dinucleotide in epidermal growth factor receptor codon 790: implications for a mutational hotspot leading to the T790M mutation in non-small-cell lung cancer. Cancer Genet. 2015; 208:271–278.
  • 46. Holliday R., Grigg G.W. DNA methylation and mutation. Mutat. Res./Fundam. Mol. Mech. Mutagen. 1993; 285:61–67.
  • 47. Jones P.A., Rideout W.M., Shen J.-C., Spruck C.H., Tsai Y.C. Methylation, mutation and cancer. BioEssays. 1992; 14:33–36.
  • 48. Palles C., Cazier J.-B., Howarth K.M., Domingo E., Jones A.M., Broderick P., Kemp Z., Spain S.L., Almeida E.G., Salguero I. et al. Germline mutations in the proof-reading domains of POLE and POLD1 predispose to colorectal adenomas and carcinomas. Nat. Genet. 2013; 45:136–144.
  • 49. Rochette P.J., Lacoste S., Therrien J.P., Bastien N., Brash D.E., Drouin R. Influence of cytosine methylation on ultraviolet-induced cyclobutane pyrimidine dimer formation in genomic DNA. Mutat. Res. 2009; 665:7–13.
  • 50. Cannistraro V.J., Pondugula S., Song Q., Taylor J.S. Rapid deamination of cyclobutane pyrimidine dimer photoproducts at TCG sites in a translationally and rotationally positioned nucleosome in vivo. J. Biol. Chem. 2015; 290:26597–26609.
  • 51. Schübeler D., Scalzo D., Kooperberg C., van Steensel B., Delrow J., Groudine M. Genome-wide DNA replication profile for Drosophila melanogaster: a link between transcription and replication timing. Nat. Genet. 2002; 32:438–442.
  • 52. Aran D., Toperoff G., Rosenberg M., Hellman A. Replication timing-related and gene body-specific methylation of active human genes. Hum. Mol. Genet. 2010; 20:670–680.
  • 53. Xia J., Han L., Zhao Z. Investigating the relationship of DNA methylation with mutation rate and allele frequency in the human genome. BMC Genomics. 2012; 13(Suppl. 8):S7.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
