Abstract
Methylated cytosines deaminate at higher rates than unmethylated cytosines, and the lesions they produce are repaired less efficiently. As a result, methylated cytosines are mutational hotspots. Here, combining rare polymorphism and base-resolution methylation data in humans, Arabidopsis thaliana, and rice (Oryza sativa), we present evidence that methylation state affects mutation dynamics not only at the focal cytosine but also at neighboring nucleotides. In humans, contrary to prior suggestions, we find that nucleotides in the close vicinity (±3 bp) of methylated cytosines mutate less frequently. Reduced mutability around methylated CpGs is also observed in cancer genomes, considering single nucleotide variants alongside tissue-of-origin-matched methylation data. In contrast, methylation is associated with increased neighborhood mutation risk in A. thaliana and rice. The difference in neighborhood mutation risk is less pronounced further away from the focal CpG and modulated by regional GC content. Our results are consistent with a model where altered risk at neighboring bases is linked to lesion formation at the focal CpG and subsequent long-patch repair. Our findings indicate that cytosine methylation has a broader mutational footprint than is commonly assumed.
Keywords: Cytosine methylation, mutation, base excision repair, Arabidopsis thaliana
5-METHYLCYTOSINE (5mC) is found in bacteria, archaea (Blow et al. 2016), and diverse eukaryotes, including vertebrates (Goll and Halpern 2011; Li and Zhang 2014), many invertebrates (Regev et al. 1998; Bewick et al. 2017), plants (Zhang et al. 2018), and fungi (Bewick et al. 2019). Introduced by methyltransferases that target specific nucleotide contexts, the modification marks the underlying sequence for differential treatment, preventing the binding of some proteins, and facilitating recruitment of others, often—but not always—in the context of transcriptional silencing.
In plants, methylation is prominently associated with repeat sequences and the silencing of transposable elements (TEs), but is also found in the bodies of broadly expressed genes (Schmitz et al. 2019). In vertebrates, methylation is present at TEs and gene bodies too, but not specific to these regions. Rather, methylation is ubiquitous (Suzuki and Bird 2008; Feng et al. 2010). In humans, for example, >75% of CpGs, the principal context for methylation in vertebrates, are methylated (Lister et al. 2009). In contrast, only 22% of CpGs are methylated in Arabidopsis thaliana (Feng et al. 2010).
In both vertebrates and plants, methylation is functionally important beyond transposon control. Locally, methylation can repress transcription from a given promoter, and contributes to dynamic regulation of individual genes (Zhang et al. 2018). On a regional scale, acting in conjunction with histone modifications, methylation can help establish and maintain larger silent domains, including during X chromosome inactivation in mammals (Goto and Monk 1998).
Despite its importance for genome regulation, cytosine methylation comes at a cost: 5mCs are more liable to spontaneous deamination than unmethylated cytosines (Coulondre et al. 1978; Duncan and Miller 1980; Wang et al. 1982; Ehrlich et al. 1986; Shen et al. 1994; Zhang and Mathews 1994), and the resulting T:G mismatches are less efficiently repaired than the U:G mismatches that arise from deamination of unmethylated cytosines (Schmutte et al. 1995; Krokan and Bjoras 2013). Methylated cytosines therefore carry a double burden: a higher rate of lesion formation, and less efficient repair. Consequently, 5mCs are more likely to give rise to mutations, specifically C to T transitions (Lutsenko and Bhagwat 1999).
The elevated mutability of 5mC has shaped patterns of genetic variation and genome evolution. CpGs are more likely than other dinucleotides to be found polymorphic in mammalian populations (Barker et al. 1984; Xia et al. 2012), and CpG to TpG changes are disproportionately common among variants associated with both disease (Cooper and Krawczak 1989; Denissenko et al. 1997; Mancini et al. 1997; Zemojtel et al. 2009, 2011; Cooper et al. 2010) and adaptation to novel environments (Stoltzfus and McCandlish 2017; Storz et al. 2019). CpG to TpG transitions can dominate substitution profiles between species (Ebersberger et al. 2002; Hwang and Green 2004), and genomes where CpG methylation is common are depleted of CpGs (Josse et al. 1961; Russell et al. 1976; Salser 1978; Bird 1980; Simmen 2008).
Importantly, higher transition rates at CpGs are also evident in data from parent-child trios (Kong et al. 2012; Francioli et al. 2015; Rahbari et al. 2016; Jónsson et al. 2017), somatic mutations in healthy tissues (Hoang et al. 2016; Martincorena et al. 2018), mutation accumulation lines (Ossowski et al. 2010; Lee et al. 2012; Weng et al. 2019), and rare SNP spectra (Rahbari et al. 2016), strongly supporting mutational processes as the driving force. Finally, whereas early studies had to rely on CpGs as a (reasonable) proxy for methylation, more recent analyses have tethered elevated rates directly to methylation by integrating base-resolution methylation maps with polymorphism/somatic mutation data and comparing rates of evolution or SNP incidence at methylated and unmethylated CpGs explicitly (Ossowski et al. 2010; Mugal and Ellegren 2011; Lee et al. 2012; Xia et al. 2012; Supek et al. 2014; Tomkova et al. 2016; Weng et al. 2019). There is, in short, overwhelming evidence that cytosine methylation strongly impacts the emergence of novel variants, the spectrum of standing genetic variation, genome-wide base composition, and longer-term patterns of genome evolution.
The focal point in understanding the effects of methylation on genome fragility and evolution has, quite naturally, been the methylated cytosine itself. Methylation, however, can affect the rates of lesion formation, recognition and repair beyond the methylated cytosine. For specific mutational processes, this has been well documented. Notably, methylation increases UV-induced formation of pyrimidine dimers (Tommasi et al. 1997; Ikehata and Ono 2007; Banyasz et al. 2016) and slows their subsequent repair (Tornaletti and Pfeifer 1996). 5mC also affects the formation and repair of other directly adjacent lesions, including oxidation damage at neighboring guanines (Tomkova and Schuster-Böckler 2018). But might methylation cast a longer shadow still?
Previously, Qu and colleagues reported a ∼1.5-fold higher incidence of SNPs around (±10 bp) methylated compared to unmethylated CpGs in both humans and medaka fish (Oryzias latipes) (Qu et al. 2012), consistent with a role for methylation in increasing, through an unknown mechanism, the mutability of nucleotides in its vicinity. Implicating methylation as the causative force behind increased mutation rates, however, requires careful control of context. Mutation rate varies at multiple scales across the genome, depending on local and regional sequence composition, chromatin state, replication timing, and functional context (Hodgkinson and Eyre-Walker 2011; Ségurel et al. 2014; Makova and Hardison 2015). Methylated cytosines are unevenly distributed across these contexts, which must be taken into account when trying to determine whether methylation has left a mark on genome variation and evolution beyond the methylated base itself.
Here, we use data on rare polymorphisms in humans, A. thaliana, and rice to quantify the mutational effect methylation has on adjacent bases. Rare SNPs constitute a better proxy for germline mutational processes than common SNPs or substitutions, which more strongly reflect longer-term selection and gene conversion (Rahbari et al. 2016; Zhu et al. 2017). Controlling for sequence context and chromatin state, and considering a range of potential confounders, we find, in contrast to previous results, that methylation is associated with reduced SNP incidence at CpG-neighboring sites in humans. We find a similar reduction in neighborhood mutability when we consider single nucleotide variants in human cancer genomes. In contrast, in both A. thaliana and rice, methylation is positively associated with SNP incidence. In A. thaliana and humans, excess mutability associated with methylation (or lack of methylation, respectively) appears confined to close neighbors (±3 bp), and decays with distance to the methylated site, supporting a mechanism that is contingent on lesion formation at the focal CpG. Our work suggests that methylation casts a longer mutational shadow than commonly assumed, acting in a manner that depends on species-specific coupling between lesion formation and downstream choice of repair pathway.
Methods
Analysis of relative mutational risk associated with methylation in humans
To isolate the specific impact of methylation on SNP incidence, we pursued a matched pairs approach, broadly as previously described (Supek et al. 2014) and described further below. First, CpGs in the human genome were classified as methylated or unmethylated based on base-resolution methylation information in H1 human embryonic stem cells (Lister et al. 2009) (http://neomorph.salk.edu/human_methylome/data.html). Methylated sites were defined as cytosines with a ≥0.7 ratio of methylated to unmethylated reads; unmethylated sites as those with a ratio ≤0.2. CpGs with intermediate methylation levels (0.2 < x < 0.7) were excluded from further analysis. Note that, as methylation stoichiometry is strongly bimodal, this excludes relatively few sites (see Supek et al. 2014 for details). To provide robust classification of methylated/unmethylated sites, we only considered cytosines with ≥10 read coverage in the bisulfite sequencing data. As the X chromosome is differentially affected by methylation in males and females, and because mutations that arise on sex chromosomes are subject to distinct evolutionary regimes, analysis was confined to autosomes. To enable intersection with polymorphism data, coordinates for all eligible methylated/unmethylated CpG sites were converted to hg19 using the UCSC LiftOver utility (https://genome.ucsc.edu/cgi-bin/hgLiftOver). Each unmethylated CpG was then paired with the nearest methylated CpG that matched the following criteria: (a) identical ±4 bp flanking sequence context in the reconstructed ancestral human genome (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/supporting/ancestral_alignments/), (b) identical chromatin state as defined by a hidden Markov model for H1 (https://personal.broadinstitute.org/anshul/projects/wfh/ihmm/chmmBED/human_H1/), and (c) located on the same chromosome.
Subsequently, any pairs whose flanking sequence context contained one or more additional CpG dinucleotides were removed to avoid possible confounding effects of proximal CpGs on local mutation risk. For the surviving pairs of methylated and unmethylated CpGs, we then calculated the incidence of singleton SNPs from the gnomAD database (Karczewski et al. 2019) (https://gnomad.broadinstitute.org/). In calculating the incidence of transition SNPs, we removed T to C SNPs that had occurred in the TpG context, and A to G SNPs that had occurred in the CpA context, principally to enable comparison to the analysis of Qu et al. (2012), where—given the data available at the time—it was prudent to exclude these sites because of potential polarization errors. Note that, below, we only consider SNPs at positions ±3 bp from the focal CpG despite pairing for ±4 bp. This is because, at the ±4 position, the ±5 base is unknown, and so including this position could lead to skews due to unknown neighboring bases. Repeat sequences were masked using RepeatMasker v4.0.8 (www.repeatmasker.org).
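The classification and pairing steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the analysis: the function names, the dictionary layout of a site, and the decision to let one methylated CpG serve as partner for several unmethylated CpGs are all hypothetical choices made for the example. The thresholds (coverage ≥10, ≥0.7, ≤0.2) are the ones given in the text.

```python
from bisect import bisect_left
from collections import defaultdict


def classify_cpg(meth_reads, total_reads, min_cov=10, hi=0.7, lo=0.2):
    """Return 'M' (methylated), 'U' (unmethylated), or None (excluded)."""
    if total_reads < min_cov:
        return None  # too few bisulfite reads for a robust call
    frac = meth_reads / total_reads
    if frac >= hi:
        return "M"
    if frac <= lo:
        return "U"
    return None  # intermediate methylation is dropped


def pair_sites(sites):
    """Pair each unmethylated CpG with the nearest matching methylated CpG.

    Each site is a dict with keys chrom, pos, context (4 bp + CG + 4 bp),
    state (chromatin state label), and status ('M' or 'U').
    Assumption: a methylated CpG may be reused across pairs.
    """
    meth = defaultdict(list)
    for s in sites:
        if s["status"] == "M":
            meth[(s["chrom"], s["context"], s["state"])].append(s["pos"])
    for positions in meth.values():
        positions.sort()

    pairs = []
    for s in sites:
        if s["status"] != "U":
            continue
        if s["context"].count("CG") > 1:
            continue  # an extra CpG in the flanks: such pairs are discarded
        candidates = meth.get((s["chrom"], s["context"], s["state"]))
        if not candidates:
            continue
        # Only the sorted neighbors on either side of pos can be nearest.
        i = bisect_left(candidates, s["pos"])
        nearest = min(candidates[max(i - 1, 0):i + 1],
                      key=lambda p: abs(p - s["pos"]))
        pairs.append((s["chrom"], s["pos"], nearest))
    return pairs
```

Because CpG is its own reverse complement, counting "CG" on one strand is enough to detect additional CpGs in the matched context.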
Analysis of cancer genomes
We applied the approach just described to data from melanoma (MELA, n = 65 genomes), breast cancer (BRCA, n = 198), liver cancer (LIRI, n = 237), and esophageal adenocarcinoma (ESAD, n = 91), chosen due to the availability of sufficient sample sizes and good-quality matched-tissue methylation maps. Filtering steps were identical with two exceptions: we did not apply RepeatMasker, and we did not filter out T to C mutations in the TpG context or A to G mutations in the CpA context. SNV data were obtained from the PCAWG project (Campbell et al. 2020; https://dcc.icgc.org/pcawg). Methylation and chromatin state maps from the closest available healthy tissue were sourced from the Roadmap Epigenomics Project (http://www.roadmapepigenomics.org/): keratinocyte (GSM1127056, E059), breast luminal epithelium (GSM1127125, E027), liver (GSM916049, E066), and esophageal cells (GSM983649, E079).
Analysis of relative mutational risk associated with methylation in Arabidopsis and rice
Classification of CpGs (but also CHG and CHH sites) into methylated and unmethylated sites was based on high-coverage bisulfite data of Ws0 globular stage seed for A. thaliana (GSM1664380) (Lin et al. 2017) and data from 3-week-old leaf tissue in rice (GSM1039487) (Stroud et al. 2013). During initial exploration, we obtained very similar methylation maps from different tissues, developmental stages, and accessions. We therefore carried out our final analysis using a high-coverage bisulfite sequencing dataset, which allows us to apply the same conservative coverage cut-off applied to human data. The 1001 Arabidopsis genomes (https://1001genomes.org/data/GMI-MPI/releases/v3.1/) and the 3000 rice genomes projects (http://snp-seek.irri.org/), respectively, were the sources of polymorphism data. To calculate rare SNP incidence, we considered homozygous SNPs confined to a single accession in the selfing A. thaliana as singletons. HMM-defined chromatin states for both A. thaliana and rice were taken from the Plant Chromatin State Database (Liu et al. 2018) (http://systemsbiology.cau.edu.cn/chromstates/). Note here that chromatin states in A. thaliana, rice, and humans are independently defined, and, while it is useful to compare states enriched for similar marks or sequence types, there is no strict equivalence of chromatin states between species. Repeat sequences in both plants were masked using RepeatMasker v4.0.8.
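The singleton definition used for the selfing A. thaliana (a homozygous SNP confined to a single accession) can be illustrated with a short sketch. The function name and the VCF-style genotype strings are assumptions made for the example, as is the choice to ignore missing calls ("./.") rather than reject the site.

```python
def is_selfing_singleton(genotypes):
    """True if exactly one accession carries the alternate allele,
    and that accession is homozygous for it.

    genotypes: list of VCF-style strings such as '0/0', '1/1', '0|1', './.'.
    Assumption: missing calls are ignored rather than disqualifying.
    """
    carriers = []
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if "1" in alleles:
            carriers.append(alleles)
    return len(carriers) == 1 and set(carriers[0]) == {"1"}
```

For example, a site called '1/1' in one accession and '0/0' in all others counts as a singleton, whereas a site that is heterozygous in its only carrier does not.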
Methylation/demethylation mutants
We obtained base-resolution methylation data for knock-out mutants of ros1 (GSM1859475), nrpd1 (GSM1859476), and a ros1/nrpd1 (GSM1859478) double knock-out (Wierzbicki et al. 2012) as well as for DDM (GSM1014117) and DRD (GSM1014120) mutants (Zemach et al. 2013). Base calls were lifted over to TAIR10 as required. Sites affected by a given deletion were classified as follows: a site targeted by ROS1 was one where methylation was greater in the ros1 mutant than in the WT; a site targeted by NRPD1 was one where methylation was lower in the nrpd1/ros1 double mutant compared to the ros1 deletion strain; a site requiring DDM for methylation was one where the ddm mutant had lower methylation and the drd mutant had no effect.
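The target definitions above amount to three comparisons of per-site methylation levels between strains. A minimal sketch follows; the function name, the argument names, and in particular the numeric threshold `delta` are hypothetical, since the text specifies only the direction of each comparison and not how large a difference was required.

```python
def classify_targets(wt, ros1, ros1_nrpd1, ddm, drd, delta=0.1):
    """Classify one cytosine from its methylation level (0 to 1) in each strain.

    delta is an illustrative threshold (an assumption, not from the text):
    a change must exceed it to count, and 'no effect' means a change
    no larger than it.
    """
    return {
        # ROS1 target: methylation rises when the demethylase is removed.
        "ROS1": (ros1 - wt) > delta,
        # NRPD1 target: methylation drops in the double mutant relative to ros1.
        "NRPD1": (ros1 - ros1_nrpd1) > delta,
        # DDM-dependent: lost in ddm, unaffected in drd.
        "DDM": (wt - ddm) > delta and abs(wt - drd) <= delta,
    }
```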
Replicating prior estimates of relative mutational risk associated with methylation
To track down divergent relative mutational risk associated with methylation (RRmet) estimates compared to the study of Qu et al. (2012), we additionally considered base-resolution bisulfite sequencing data from sperm (Molaro et al. 2012) (GSE30340), lifted over to hg19, and re-implemented their original protocol based on the published methods and feedback provided by the lead author, Wei Qu. This included consideration of sites with ≥5 read coverage, with methylated/unmethylated CpGs defined as those with methylation stoichiometries of ≥0.8 and ≤0.2, respectively. Nucleotides flanking the focal CpG (±10 bp) were considered part of methylated or unmethylated “blocks” based on the methylation status of the focal CpG. These site blocks, defined according to either H1 or sperm methylation data, were then overlapped with polymorphism data, either hg19-converted HapMap (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/), 1000 Genomes (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/), or gnomAD (https://gnomad.broadinstitute.org/) (Karczewski et al. 2019). As in the original study, SNPs that occurred in CpG/TpG/CpA dinucleotides were excluded.
Data availability statement
The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article. The authors affirm that all data analyzed in this article are publicly available and links to the original data sources are provided throughout. Code used to carry out all analyses is available at https://github.com/vfkusmartsev/Proximal_Mutations_5mC. Prior datasets are available as follows: Human polymorphism data: HapMap (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/), 1000 Genomes (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/), gnomAD (https://gnomad.broadinstitute.org/); SNV data: https://dcc.icgc.org/pcawg; A. thaliana polymorphism data: https://1001genomes.org/data/GMI-MPI/releases/v3.1/; rice polymorphism data: http://snp-seek.irri.org/; H1 hESC methylation data: http://neomorph.salk.edu/human_methylome/data.html; Cancer methylation and chromatin state maps: http://www.roadmapepigenomics.org/ for keratinocyte (GEO GSM1127056, E059), breast luminal epithelium (GEO GSM1127125, E027), liver (GEO GSM916049, E066), and esophageal cells (GEO GSM983649, E079). Methylation data for A. thaliana: wildtype (GEO GSM1664380), ros1 (GEO GSM1859475), nrpd1 (GEO GSM1859476), ros1/nrpd1 (GEO GSM1859478), ddm (GEO GSM1014117), and drd (GEO GSM1014120); Methylation data for rice: GEO GSM1039487. Supplemental material available at figshare: https://doi.org/10.25386/genetics.11833482.
Results
Methylation is associated with decreased mutability of neighboring bases in humans
To establish whether SNP incidence varies as a function of methylation at nearby CpGs, we combined data from large-scale surveys of population genomic variation with base-resolution methylation data. For humans, we defined methylated (unmethylated) sites as those with ≥70% (≤20%) methylated reads in H1 human embryonic stem cells (hESC) (Lister et al. 2009), previously shown to be a reasonable proxy for germline methylation (Prendergast et al. 2014; Supek et al. 2014). As in prior work (Supek et al. 2014), we paired methylated and unmethylated CpGs according to multiple criteria, which were applied simultaneously:
First, as mutation rate varies strongly with local sequence context (Blake et al. 1992; Zhao and Boerwinkle 2002; Hwang and Green 2004; Carlson et al. 2018), we required the four nucleotides on either side (±4 bp) of the focal CpG to be the same. Matching the sequence context in this manner also controls for local GC content, previously shown to correlate inversely with CpG mutability (Fryxell and Moon 2005), and sequence complexity, which is an important determinant of indel formation propensity. As heptanucleotide context was previously shown to account for >80% of variability in mutation rates in humans (Aggarwala and Voight 2016), we did not extend matching to even longer sequence motifs, which would have drastically reduced sample size.
Second, we required each member of the methylated/unmethylated pair to be in the same chromatin state as defined by a widely used hidden Markov model for H1 (see Methods), which integrates signals from multiple histone marks, methylation, and DNA accessibility (Ho et al. 2014). We match by chromatin state because mutation rates vary substantially with chromatin environment (Schuster-Böckler and Lehner 2012; Makova and Hardison 2015). In particular, heterochromatic regions accumulate more mutations compared to euchromatic, actively transcribed regions (Schuster-Böckler and Lehner 2012), which are generally more accessible to or specifically targeted by DNA repair machinery (Supek and Lehner 2015; Frigola et al. 2017). Chromatin states also capture other determinants of mutation rate heterogeneity, including replication timing and transcriptional activity, the latter being important in the context of this work because deamination risk is more than two orders of magnitude higher in single- compared to double-stranded DNA (Frederico et al. 1990).
Matching pairs by sequence and chromatin context, and choosing the closest available match along the same chromosome, we then excluded sequence contexts that contained any CpG other than the focal CpG. This allows us to assess neighborhood mutability without the confounding effects of methylation clustering, for example in the context of CpG islands where focal unmethylated CpGs are likely to be surrounded by other unmethylated CpGs.
After filtering, we retained 60,589 matched pairs for downstream analysis. For these pairs, we computed the relative mutational risk associated with methylation, RRmet, as
$$\mathrm{RR}_{met} = \frac{\mathrm{SNP}_{met} / N_{met}}{\mathrm{SNP}_{unmet} / N_{unmet}}$$
where SNPmet and SNPunmet are SNPs observed in methylated and unmethylated contexts across all pairs, respectively, and Nmet (= Nunmet) is the total number of pairs. Thus, RRmet >1 indicates higher mutability and RRmet <1 lower mutability in the methylated context.
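As an illustration of how such a relative risk, its confidence interval, and a test for a difference in proportions might be computed, consider the sketch below. It rests on assumptions that the text does not spell out: the interval is the standard delta-method (log-scale) interval for a ratio of two proportions, the test is a pooled two-proportion Z-test, and the denominators are taken to be the number of sites at risk in each class. The function name is hypothetical, and the counts in the usage note are invented for illustration, not taken from the analysis.

```python
import math


def rr_met(snp_met, snp_unmet, n_met, n_unmet, z_crit=1.96):
    """Relative risk, delta-method CI, and two-proportion Z-test.

    snp_*: SNP counts in the methylated / unmethylated class.
    n_*:   number of sites at risk in each class (assumed denominators).
    """
    p_met = snp_met / n_met
    p_unmet = snp_unmet / n_unmet
    rr = p_met / p_unmet

    # Delta method: approximate standard error of log(RR).
    se_log = math.sqrt(1 / snp_met - 1 / n_met + 1 / snp_unmet - 1 / n_unmet)
    ci = (rr * math.exp(-z_crit * se_log), rr * math.exp(z_crit * se_log))

    # Pooled two-proportion Z-test, two-sided P from the normal tail.
    pooled = (snp_met + snp_unmet) / (n_met + n_unmet)
    z = (p_met - p_unmet) / math.sqrt(
        pooled * (1 - pooled) * (1 / n_met + 1 / n_unmet))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rr, ci, z, p_value
```

With invented counts of 886 and 1000 SNPs across 100,000 sites in each class, `rr_met(886, 1000, 100_000, 100_000)` gives a relative risk of 0.886 with an interval that lies entirely below 1.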
Considering singleton SNPs observed across 15,708 whole genomes from the gnomAD database (Lek et al. 2016; Karczewski et al. 2019), we find a reduced incidence of SNPs adjacent to methylated compared to unmethylated CpGs (Figure 1). Across all sites (±1–3 bp from the focal CpG) and mutation types (transitions and transversions), RRmet is 0.886 (P = 2.73 × 10⁻²¹, Z-test for proportions). In other words, there are 886 SNPs in the ±1–3 bp neighborhood of methylated CpGs for every 1000 SNPs surrounding unmethylated CpGs.
Figure 1.
The relative mutational risk of methylation (RRmet) at CpGs and neighboring nucleotides in humans. Mutation risk at CpG-neighboring nucleotides is lower when the focal CpG is methylated. Classification of CpGs into methylated and unmethylated is based on base-resolution methylation data from H1 human embryonic stem cells (see Methods). Vertical bars are confidence intervals on RRmet computed using the delta method.
Methylation is associated with decreased mutability of neighboring bases during somatic evolution in cancer
To investigate whether reduced RRmet around CpGs reflects germline-specific mutational processes, we considered single nucleotide variants (SNVs) that arose during somatic evolution in cancer. Using methylation maps from matched healthy tissues and pairing methylated and unmethylated sites as described above (see Methods for data sources), we first considered SNVs across 65 melanoma genomes. We focused on transition mutations, which dominate the mutation spectrum in skin cancer (Alexandrov et al. 2013). Melanomas provide a useful test case because of the high prevalence of mutations that originate from UV-induced cyclobutane pyrimidine dimers (CPDs). These lesions involve neighboring pyrimidines, particularly cytosines, and frequently occur in a CpCpG context (Alexandrov et al. 2013). Once a CPD has formed, both cytosines lose aromatic stabilization and deaminate at a higher rate, resulting in C to T mutations (Peng and Shaw 1996; Cannistraro and Taylor 2009). Importantly, the likelihood of CPD formation is up to 15-fold higher when the CpG is methylated (Tommasi et al. 1997). For CpGs flanked by an upstream cytosine, we therefore expect RRmet >1 at the ±1 bp position. Splitting pairs into those with a proximal cytosine and those where the CpG is flanked by a purine (RpCpG), we find that CpCpG contexts indeed show elevated RRmet at the ±1 bp position (Figure 2A). In contrast, RRmet around RpCpGs is reminiscent of patterns observed in germline (Figure 2A).
Figure 2.
The relative mutational risk of methylation (RRmet) at CpGs and neighboring nucleotides in cancer. (A) Position-specific RRmet estimates computed across 65 melanoma genomes. (B) Joint neighborhood (±1–3 bp) RRmet estimates in different cancers. Vertical bars are confidence intervals on RRmet computed using the delta method.
We extended analysis to three additional cancer types: liver cancer, breast cancer, and esophageal adenocarcinoma. To increase statistical power, we computed a single joint RRmet estimate across sites ±1–3 bp from the focal CpG. We find a germline-like relative risk pattern (RRmet < 1) in all cases (Figure 2B). This suggests that the mutagenic processes at play here, like many others (Chen et al. 2017), are shared between soma and germline.
Methylation is associated with increased SNP incidence in plants
In contrast to human germline and soma, we find mutability of CpG-neighboring bases to be positively associated with methylation in two plants: A. thaliana and rice. Applying the same matching protocol as above (and confining analysis to repeat-masked sequence, see Methods), we find an overall RRmet of 1.28 (P = 2.06 × 10⁻⁸⁹) in A. thaliana and 1.31 in O. sativa (P = 5.43 × 10⁻¹⁵), where estimates are noisier because of the smaller number of matched pairs (Figure 3A; N = 121,774 pairs in A. thaliana, N = 42,779 pairs in O. sativa). The strongest increase in mutational risk was associated with C to T transitions (RRmet = 1.38 in A. thaliana; RRmet = 1.71 in O. sativa). Interestingly, RRmet appears to level off as a function of distance from the focal CpG, with the greatest deviation from random expectation (RRmet = 1) at the nucleotide directly adjacent to the CpG. A similar (albeit inverted) trend is also evident for human germline SNPs and SNVs in skin cancer (see Figure 1 and Figure 2).
Figure 3.
(A) The relative mutational risk of methylation (RRmet) at CpGs and neighboring nucleotides in humans, A. thaliana, and rice. For humans, pairing is based on H1 hESC methylation data. In contrast to Figure 1, only SNPs in repeat-masked sequence were included for all species. (B) RRmet as applied to indels in humans and A. thaliana. Vertical bars are confidence intervals on RRmet computed using the delta method. * P < 0.05, ** P < 0.005.
Increased mutability around methylated sites in plants is not limited to the CpG context
As methylation in plants is not limited to CpGs but also occurs in CpHpG and CpHpH contexts (where H = A, C, or T), we independently collated matched pairs for these contexts. In both cases, there is a tendency for RRmet to be >1 (Supplemental Material, Figure S1). However, as methylation at CpHpG and CpHpH sites is rarer (6 and 1.5% compared to 24% for CpG in A. thaliana, 21 and 2.2% compared to 59% for CpG in O. sativa; Feng et al. 2010) and more strongly skewed to certain functional contexts, sample sizes are much smaller. We therefore focus on CpGs, but note that elevated RRmet across all contexts suggests that differential risk is likely not associated with machinery that acts exclusively at CpGs vs. other contexts.
Methylation is associated with an altered incidence of insertions and deletions
With a view to understanding the mechanism of altered neighborhood mutability, we also investigated whether the presence of methylation affects insertion and deletion (indel) rates around the focal CpG, using the same matched pairs as above. As for SNPs, we focused on singleton indels, and also confined analysis to single-base insertions and deletions, thereby ensuring that the sequence context of the indel is comparable. As indels are rarer than SNPs, we compute only two RRmet estimates: one for the CpG itself and one for all neighboring bases upstream and downstream (±1–3 bp) combined. In humans, echoing results from SNPs, we find a reduced incidence of indels for methylated sites, both at the CpG itself and in neighboring sequence (Figure 3B). In A. thaliana, RRmet is higher across the board, with a tendency for RRmet >1 for deletions affecting CpG-proximal sites. There were too few events to carry out an informative analysis of indel patterns in rice or the cancer genomes analyzed above.
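Restricting the analysis to single-base insertions and deletions can be expressed as a simple filter on variant alleles. The sketch below assumes VCF-style, left-anchored REF/ALT strings (for example REF "A", ALT "AT" for a one-base insertion); the function name is hypothetical and multi-allelic records are assumed to have been split beforehand.

```python
def single_base_indel(ref, alt):
    """Return 'insertion', 'deletion', or None for a REF/ALT allele pair.

    Assumes left-anchored VCF alleles, so the shorter allele is a prefix
    of the longer one.
    """
    diff = len(alt) - len(ref)
    if diff == 1 and alt.startswith(ref):
        return "insertion"
    if diff == -1 and ref.startswith(alt):
        return "deletion"
    return None  # SNP, longer indel, or complex variant
```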
Methylation-associated changes in mutability are not sequence- or chromatin state-specific
To identify potential drivers of methylation-associated mutability (and pinpoint differences between plants and human), we stratified RRmet by sequence context, local/regional GC content, and chromatin state. We find that lower mutability around methylated CpGs in humans (and higher mutability in A. thaliana) remains evident when RRmet is calculated for individual sequence contexts (see Figure 4, B and F). This suggests that, while there is variability associated with the surrounding sequence, global RRmet estimates are not driven by a specific nucleotide context or set of contexts. This is in line with the fact that, for all species being compared, a large number of unique sequence contexts (>15,000) contribute to global RRmet estimates (Figure 4, A, E, and I). The diversity of motifs involved also provides a first pointer that differential representation of specific sequence contexts across species does not explain why observations in plants differ from those in humans. To rule this out explicitly, we subsampled matched pairs to achieve an identical representation of nucleotide contexts in humans and A. thaliana. Doing so, RRmet remains >1 in A. thaliana and <1 in humans (Figure S2).
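One way to achieve an identical representation of nucleotide contexts in two species is to downsample, context by context, to the smaller of the two pair counts. The sketch below illustrates that idea only; the function name, the (context, pair) tuple layout, and the use of a fixed random seed are assumptions, and the subsampling scheme actually used may differ.

```python
import random
from collections import defaultdict


def match_context_representation(pairs_a, pairs_b, seed=0):
    """Subsample two lists of (context, pair) tuples so that every
    sequence context is represented by the same number of pairs in both.

    Contexts present in only one list are dropped.
    """
    rng = random.Random(seed)
    by_ctx_a, by_ctx_b = defaultdict(list), defaultdict(list)
    for ctx, pair in pairs_a:
        by_ctx_a[ctx].append(pair)
    for ctx, pair in pairs_b:
        by_ctx_b[ctx].append(pair)

    out_a, out_b = [], []
    for ctx in sorted(by_ctx_a.keys() & by_ctx_b.keys()):
        k = min(len(by_ctx_a[ctx]), len(by_ctx_b[ctx]))
        out_a += [(ctx, p) for p in rng.sample(by_ctx_a[ctx], k)]
        out_b += [(ctx, p) for p in rng.sample(by_ctx_b[ctx], k)]
    return out_a, out_b
```

RRmet can then be recomputed on each subsampled set, so that any remaining difference between species cannot be attributed to context composition.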
Figure 4.
The relative mutational risk of methylation (RRmet) at CpG-neighboring nucleotides in humans, A. thaliana, and rice as a function of sequence context (B, F, and J), regional (±100 bp) GC content (C, G, and K), and chromatin state classes (D, H, and L). (A, E, and I) show the number of unique sequence contexts that contribute to genome-wide RRmet estimates in each species. RRmet estimates for each sequence context, grouped by core sequence context (XpCpGpY) (B, F, and J), are displayed as weighted boxplots, with weights corresponding to the number of pairs associated with a given context. To calculate dependency on GC content, methylated and unmethylated members of a given pair were independently binned based on their regional GC content and RRmet calculated based on these bins. The distribution of regional GC content for methylated and unmethylated sites is almost identical, as shown by the histograms above each panel. Broad chromatin state classes were defined based on the underlying original chromatin states (see Figure S4). *** P < 0.001. Vertical bars are confidence intervals on RRmet calculated using the delta method.
We then considered to what extent RRmet is influenced by local and regional GC content. GC content has previously been suggested to impact focal deamination rates, as DNA duplexes of high GC content are less likely to spontaneously become single-stranded (Leroy et al. 1988). In line with this model, observed CpG content is lower than expected in regions of low GC but not in regions of high GC content (Adams and Eason 1984) and substitution rates at CpGs in primates and Arabidopsis correlate inversely with GC content (Fryxell and Moon 2005; DeRose-Wilson and Gaut 2007). Considering local (motif-wide) and regional (±100 bp either side of the focal CpG) GC content, we find that RRmet in plants is highest at low-to-intermediate GC content and becomes less pronounced at higher GC content (Figure 4, C and K and Figure S3). This is consistent with a collateral damage model of mutagenesis, where initial deamination events at methylated focal CpGs put neighboring sites at risk, and do so more frequently in a low GC context, where spontaneous deamination is more likely to occur. In humans, we observe no such relationship (Figure 4G), in line with the absence of 5mC deamination as the driver of mutational liability.
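The stratification by regional GC content can be sketched as two steps: compute GC content in a ±100 bp window around each focal CpG, then bin methylated and unmethylated sites independently and take the ratio of per-site SNP rates within each bin. The function names, the bin edges, and the choice to include the focal CpG in the window are assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def regional_gc(seq, cpg_start, flank=100):
    """GC fraction in a window of +/- flank bp around a CpG.

    cpg_start is the 0-based index of the C; the focal CpG itself is
    included in the window (an assumption).
    """
    lo = max(cpg_start - flank, 0)
    hi = min(cpg_start + 2 + flank, len(seq))
    window = seq[lo:hi].upper()
    return (window.count("G") + window.count("C")) / len(window)


def rr_by_gc_bin(sites, edges=(0.3, 0.4, 0.5, 0.6)):
    """Relative risk per GC bin.

    sites: iterable of (status, gc, snp_count) with status 'M' or 'U'.
    Returns {bin_index: RR}; bins lacking either class, or with no SNPs
    in the unmethylated class, are omitted.
    """
    tallies = defaultdict(lambda: {"M": [0, 0], "U": [0, 0]})  # [snps, sites]
    for status, gc, snps in sites:
        tally = tallies[bisect_right(edges, gc)][status]
        tally[0] += snps
        tally[1] += 1

    out = {}
    for b, t in tallies.items():
        (snp_m, n_m), (snp_u, n_u) = t["M"], t["U"]
        if n_m and n_u and snp_u:
            out[b] = (snp_m / n_m) / (snp_u / n_u)
    return out
```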
Next, we considered variability in RRmet across different chromatin states. Again, we find a consistent signal of RRmet >1 in plants and RRmet <1 in humans (Figure 4, D, H, and L and Figure S4), suggesting that the processes responsible for methylation-associated neighborhood mutability can be modulated, but are not driven, by chromatin state. In particular, we note that RRmet is comparatively low (∼1) in A. thaliana genic sequence, perhaps as a result of transcription-coupled repair. These results also highlight that differences between species are not owing to differential enrichment of certain uniquely divergent chromatin states.
Sites accessible to the (de)methylation machinery show reduced relative risk
In both mammals and plants, methylated cytosines are subject to active demethylation. In mammals, ten-eleven translocation (TET) enzymes oxidize 5mC residues to yield 5-hydroxymethylcytosine, which—along with further-oxidized derivatives—can be excised by DNA glycosylases. In plants, active demethylation is more direct. Different glycosylases—such as DEMETER (DME) and REPRESSOR OF SILENCING 1 (ROS1) in A. thaliana—target and excise 5mC itself (Zhu 2009). Both direct and indirect demethylation pathways, being reliant on DNA glycosylases to generate abasic sites, effectively constitute instances of programmed lesion formation. For mammals, our laboratories have previously shown that cytosines that spend a larger fraction of their time in a hydroxymethylated state are subject to different mutation dynamics (Supek et al. 2014; Tomkova et al. 2016). In a similar vein, we wanted to know whether repeated activity of the A. thaliana demethylation machinery might be associated with altered mutagenesis, including at neighboring sites. In particular, we hypothesized that sites that undergo frequent methylation/demethylation/remethylation cycles might suffer from a greater cumulative risk of mutation if base excision repair (BER) that follows glycosylase activity is mutagenic. Sites that undergo such cycles are not uncommon in A. thaliana. ROS1 in particular has been implicated in preventing, through ongoing active removal of 5mCs, the spread of methylation from transposable elements into active genes (Zhang et al. 2018).
We therefore considered RRmet in the context of ROS1 activity, classifying sites as ROS1 targets if they showed increased methylation in a ros1 knock-out strain (see Methods). Similarly, we considered sites to be regular targets of the RNA-directed DNA methylation (RdDM) machinery, which is responsible for de novo methylation, if they showed decreased methylation in strains deleted for nrpd1, a polymerase IV subunit critical for methylation. In order to include effects of ROS1-targeted sites here, we considered changes in methylation in nrpd1/ros1 double mutants compared to the ros1 mutant. We paired sites by sequence context and by ROS1/NRPD1 target status. As RRmet is >1 across different chromatin contexts, we jettisoned chromatin state pairing to maintain a sufficient sample size for analysis. We find that sites targeted by ROS1 experience a substantially lower excess mutation risk associated with methylation than sites not targeted by ROS1 (Figure 5A). We observe very similar results for nrpd1 (Figure 5A). Post hoc analysis suggests that these differences are also observed when chromatin state is controlled for (data not shown). We note that this effect is quantitative: sites that experience greater gain (loss) in methylation in ros1 (nrpd1) knock-outs show lower RRmet (Figure 5B). Finally, cytosines that are dependent on the chromatin remodeler DDM1 for methylation show higher RRmet than sites where methylation can be established and subsequently maintained solely through RdDM and maintenance methyltransferase activity (Figure 5A, see Methods for how DDM1 dependency was defined).
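The target definitions above can be sketched as a simple comparison of per-site methylation levels across strains. This is an illustration only: the function name and the `min_delta` threshold are hypothetical, and the actual criteria are those given in the Methods.

```python
from typing import Tuple


def classify_targets(meth_wt: float, meth_ros1: float,
                     meth_nrpd1_ros1: float,
                     min_delta: float = 0.1) -> Tuple[bool, bool]:
    """Return (is_ros1_target, is_rddm_target) for one cytosine.

    Methylation levels are fractions in [0, 1]. `min_delta` is a
    hypothetical cut-off chosen for this example.
    """
    # ROS1 target: methylation rises when ROS1 is absent.
    ros1_gain = meth_ros1 - meth_wt
    # RdDM target: methylation falls further when nrpd1 is also deleted.
    rddm_loss = meth_ros1 - meth_nrpd1_ros1
    return ros1_gain >= min_delta, rddm_loss >= min_delta
```

The same two differences, kept as continuous values rather than thresholded, correspond to the relative methylation changes against which RRmet is plotted in Figure 5B.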
Figure 5.
(A) The relative mutational risk of methylation (RRmet) at CpG-neighboring (±1–3 bp) nucleotides in A. thaliana as a function of whether a given site is targeted by RNA-directed DNA methylation (which involves nrpd1), the DNA glycosylase ROS1, or the chromatin remodeler DDM1 (see main text for how targets and nontargets were defined). All P < 10^−60. Vertical bars are confidence intervals on RRmet computed using the delta method. (B) RRmet as a function of relative methylation change in ros1 deletion mutants compared to the corresponding wild-type (WT) strain and compared to the nrpd1/ros1 double mutants. Sites with more methylation in ros1 compared to the WT are sites of high ROS1 activity, where methylation is normally erased by ROS1 but increases in the ros1 knock-out. Sites with more methylation in ros1 compared to nrpd1/ros1 are sites where the RNA-directed DNA methylation machinery is active, i.e., where deletion of nrpd1 leads to further loss of methylation.
Taken together, these results are inconsistent with a model where higher mutability of neighboring bases is a consequence of frequent methylation/demethylation cycles. Rather, they reinforce the impression that RRmet is enhanced in regions that are less accessible to machinery that affects methylation, demethylation, and, presumably, demethylation-coupled repair.
Prior contradictory results are owing to mapping artifacts
Our results in humans (RRmet <1 at sites flanking the CpG) appear to contradict a prior analysis of genome-wide human polymorphism and methylation data, which found more, not fewer, mutations in the vicinity of methylated sites (Qu et al. 2012). The analytical pipeline of Qu and colleagues (hereafter simply referred to as Qu) differs from our approach in several potentially important ways. First, Qu used methylation data from sperm (Molaro et al. 2012) rather than H1 hESC (Lister et al. 2009). Second, whereas we pursue the matching approach described above, Qu assigned nucleotides in the vicinity (±10 bp) of a given CpG to one of two categories: after joining blocks of overlapping CpG dinucleotides, they averaged methylation levels across CpGs in the resulting larger block and considered methylated (unmethylated) blocks to be those with overall CpG methylation levels ≥80% (≤20%). They then computed an overall SNP rate for each category. Third, Qu analyzed sites with ≥5 reads, whereas we require ≥10 reads. Finally, they considered SNPs from the HapMap project (CEU population), which sampled predominantly common SNPs (>5% allele frequency), while we consider SNPs from the much larger gnomAD database, which are predominantly rare.
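The block-based categorization attributed to Qu above can be sketched as follows. This is our reading of the description in this section, not a reproduction of the original pipeline: the function name, the 0-based coordinates, and the rule that windows are merged only when they overlap are assumptions of the example.

```python
from typing import List, Optional, Tuple

Block = Tuple[int, int, float, Optional[str]]


def build_blocks(cpgs: List[Tuple[int, float]], flank: int = 10,
                 high: float = 0.8, low: float = 0.2) -> List[Block]:
    """Merge overlapping ±`flank` bp windows around CpGs into blocks.

    `cpgs` holds (position of the C, methylation level in [0, 1]).
    Returns (start, end, mean methylation, label) per block, with
    inclusive coordinates and label None for intermediate blocks.
    """
    merged: List[dict] = []
    for pos, level in sorted(cpgs):
        start, end = pos - flank, pos + 1 + flank  # CpG spans pos..pos+1
        if merged and start <= merged[-1]["end"]:
            merged[-1]["end"] = max(merged[-1]["end"], end)
            merged[-1]["levels"].append(level)
        else:
            merged.append({"start": start, "end": end, "levels": [level]})

    blocks: List[Block] = []
    for block in merged:
        mean = sum(block["levels"]) / len(block["levels"])
        if mean >= high:
            label: Optional[str] = "methylated"
        elif mean <= low:
            label = "unmethylated"
        else:
            label = None
        blocks.append((block["start"], block["end"], mean, label))
    return blocks
```

An overall SNP rate per category would then be the number of SNPs falling in blocks with a given label divided by the total length of those blocks, in contrast to the pairwise matching used in our approach.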
To understand which, if any, of these analytical choices explain the discrepancy, we started by reimplementing our matching approach with the sperm methylation data used by Qu. We obtained results very similar to those we obtained with H1 hESC methylation data (RRmet = 0.877, P = 1.68 × 10^−160, Figure S5), suggesting that the different methylation datasets are not the source of the discrepancy. We also obtained very similar results when considering sites with ≥5 instead of ≥10 reads (overall RRmet = 0.889, P = 3.19 × 10^−81), which increases sample size to 162,156 pairs. Unsurprisingly, there is also no substantive change when we require ≥80% rather than ≥70% read support to call a site methylated (data not shown). Next, we sought to reproduce the original finding by implementing the approach of Qu in full, using HapMap (CEU) polymorphisms, sperm methylation data, sites covered by ≥5 reads, a threshold of ≥80% for calling methylated sites, and calculating RRmet as the SNP incidence in blocks around focal CpGs as done by Qu. Doing so, we can replicate their results, at least qualitatively (RRmet = 1.4, Figure 6A). Using H1 instead of sperm data again yielded very similar results (Figure 6A), reconfirming that differences in methylation data are not pertinent. Excluding repeats or poorly accessible regions, as defined by the 1000 Genomes project (see Methods), also has limited effect, with RRmet consistently >1 (Figure 6A). In contrast, we obtain very different results when considering SNPs from the 1000 Genomes project (overall RRmet = 0.971) or gnomAD (overall RRmet = 0.87, Figure 6A) instead of HapMap (CEU) polymorphisms.
Figure 6.
Unraveling conflicting estimates of the impact of methylation on mutation risk at neighboring nucleotides in humans. (A) Effects of dataset choice and filtering on the relative mutational risk of methylation (RRmet) at CpG-neighboring nucleotides in humans. Using polymorphism data from the HapMap project (CEU) consistently yields RRmet estimates >1, which are robust to inclusion of SNPs from poorly accessible or repeat regions. RRmet estimates drop below one when considering SNPs from the 1000 Genomes Project or gnomAD. RRmet estimates are for all SNPs, i.e., transitions and transversions combined. Vertical bars are confidence intervals on RRmet computed using the delta method. (B) RRmet at focal CpGs as a function of allele frequency.
Prompted by these results, we considered, for our paired sites, RRmet at focal CpGs as a function of allele frequency. When we do so, a striking pattern emerges: for SNPs that are relatively rare (<10%), we see what we expect: more SNPs occur in the methylated context, and, consequently, RRmet is >1 (Figure 6B). For more common alleles, however, the situation reverses sharply (Figure 6B), with more SNPs in unmethylated than methylated CpGs, to the point where >90% of SNPs with allele frequency >30% appear to be contributed by unmethylated CpGs. Given what we know about the mutability of methylated vs. unmethylated CpGs, this is unlikely to be correct. Indeed, the driver of this unexpected pattern is a technical artifact we dissected previously (Supek et al. 2014). To understand what is happening, first recall that bisulfite treatment leaves methylated cytosines unaffected, but converts unmethylated cytosines to uracils, which will be read as thymines during sequencing. This can cause problems when the bisulfite-treated DNA and the genome against which the resulting reads are mapped are not identical. This is the case for both human methylation maps considered here (Lister et al. 2009; Molaro et al. 2012), where methylation was assayed in H1 hESC and pooled DNA from two anonymous sperm donors, respectively, but mapped against the human reference genome. This leads to wrong inferences whenever there is a CpG in the reference but a TpG SNP in H1/the donor genomes. The bisulfite data are blind to this SNP, and the site will be called as an unmethylated CpG. This is doubly problematic given that such TpG SNPs likely resulted, via deamination, from methylated CpGs. The result of this misidentification is that the set of presumed unmethylated CpGs becomes enriched for the most mutable, methylated CpGs, i.e., those that have already given rise to a TpG SNP in the human population.
Critically, this issue will only confound analysis when variants are relatively common on average, as is the case in HapMap, but is negligible when the large majority of variants are rare, as is the case for 1000 Genomes, and, especially, gnomAD data. We have previously demonstrated this explicitly for H1, by establishing its genotype from publicly available short-read data (Supek et al. 2014).
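The mapping artifact described in this section can be illustrated with a deliberately idealized sketch. It assumes complete bisulfite conversion, no sequencing error, and reads from a single strand; the function names and read depth are hypothetical.

```python
def bisulfite_read_base(sample_base: str, methylated: bool) -> str:
    """Base read at a reference-C position after bisulfite treatment."""
    if sample_base == "C":
        # Methylated C is protected; unmethylated C is read as T.
        return "C" if methylated else "T"
    # Any other base (e.g., a T allele at a TpG SNP) is unaffected.
    return sample_base


def called_methylation(sample_base: str, methylated: bool,
                       depth: int = 10) -> float:
    """Fraction of reads supporting C at a site annotated as C
    in the reference genome."""
    reads = [bisulfite_read_base(sample_base, methylated)
             for _ in range(depth)]
    return reads.count("C") / depth
```

In this idealized setting, a sample that carries T where the reference has C yields the same call (0% methylation) as a genuinely unmethylated cytosine, which is why the two cases cannot be distinguished without knowing the genotype of the assayed sample.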
Discussion
Our results suggest that CpG methylation state affects the mutability of neighboring nucleotides in humans, A. thaliana and rice. Notably, we find RRmet, the relative mutational risk associated with methylation, to be >1 in plants but <1 in humans, which is not explained by differential representation of specific sequence or chromatin contexts. We therefore suggest that RRmet is not linked to intrinsic differences between methylated and unmethylated sequences, which would be species-invariant, but instead reflects species-specific lesion formation rates and how the cellular machinery in each species responds to these lesions.
In humans and A. thaliana, the spatial pattern of RRmet around the focal CpG returns relatively quickly toward the baseline. Even though our matching approach curtails examination of longer-range effects, this suggests that the mutational effects are relatively local. We think that this locally confined signal is suggestive of BER, which has previously been tipped as a potential culprit behind altered neighborhood mutability (Qu et al. 2012). Experiments in cell extracts revealed the existence of different flavors of BER in both humans and plants (Córdoba-Cañero et al. 2009; Martínez-Macías et al. 2013), which differ in the number of bases excised and resynthesized during the repair process. In short-patch BER (SP-BER) only a single nucleotide is added, whereas during long-patch BER (LP-BER) multiple bases are excised and resynthesized. In A. thaliana, tracking repair at U:G mismatches, repair tract lengths of up to 3 bp have been observed (Córdoba-Cañero et al. 2009; Martínez-Macías et al. 2013). In mammals, tracts removed during LP-BER are similarly short, at 2–13 bp (Fortini and Dogliotti 2007; Krokan and Bjoras 2013). This contrasts with nucleotide excision repair (NER) and mismatch repair (MMR), where excision tract lengths are typically much longer, from >20 bp for NER (Sancar 1996) to several hundred bases or more during MMR (Fang and Modrich 1993). The involvement of LP-BER is also consistent with increased single-nucleotide deletion rates (Bennett et al. 2001; Lyons and O’Brien 2010).
But why would trends be inverted for humans and plants? We posit that RRmet reflects the combination of two distinct risk factors: the pathway chosen downstream of a given lesion (which determines sign; e.g., whether U:G or T:G is handled more efficiently), and the relative risk of lesion formation (which affects amplitude and should be higher for T:G). In humans, while some lesions, including 8-oxoguanine, are thought to mostly trigger SP-BER (Fortini et al. 1999), others have been associated with LP-BER. Importantly, the latter category includes U:G mismatches, which result from deamination of unmethylated cytosines. Studying mouse embryonic fibroblasts, Bennett and colleagues showed that, where uracil removal is triggered by uracil DNA glycosylase (UNG), subsequent resynthesis exceeded a single nucleotide in 80% of cases, suggesting that UNG, perhaps by virtue of interacting with PCNA, biases downstream repair pathway choice toward long-patch repair (Bennett et al. 2001; Fortini and Dogliotti 2007).
Importantly, such “uracil-initiated” (Bennett et al. 2001) LP-BER has already been associated with higher mutation risk at neighboring sites: Chen and colleagues introduced mismatches into an SV40 episome capable of replicating in human cells to monitor mutagenic effects either side of the mismatch. While they found that both T:G and U:G mismatches were associated with collateral damage to the neighborhood, liability was ∼sevenfold higher for U:G (Chen et al. 2014). Thus, even though, in a physiological context, the rate of lesion formation might be higher for T:G, the mutagenesis risk associated with repair might be greater for U:G. This is in line with our findings. The precise mechanism(s) behind increased rates surrounding unmethylated CpGs in our data, however, remains unclear. Chen et al. (2014) showed that both BER and MMR were required for elevated mutability, and proposed a model inspired by events during somatic hypermutation at immunoglobulin genes, where BER-generated lesions are hijacked by the MMR machinery, as previously demonstrated for U:G repair (Schanz et al. 2009; Peña-Diaz et al. 2012), and known to occur in the context of active demethylation (Grin and Ishchenko 2016). Specifically, they suggested that excess mutability in their experimental model is consistent with APOBEC enzymes tagging along with the MMR machinery to attack and deaminate single-stranded cytosines. In our data, however, we find no enrichment of the tell-tale APOBEC mutational signature (C to T changes in a TpCpN context). APOBEC-related excess mutational risk might therefore be a specific manifestation of a more general liability of being more fragile in a single-stranded state.
In plants, both LP- and SP-BER have been observed in A. thaliana. How different lesions (or associated glycosylases) predispose to either short- or long-patch repair, however, remains poorly understood (Lee et al. 2014) and—as the repertoire of glycosylases differs substantially between humans and plants—it is certainly conceivable that U:G and T:G mismatches might be associated with a different propensity for LP- vs. SP-BER to what is seen in humans.
We speculate that choice of repair (sub)pathways in different species—and the ensuing mutational burden—might be evolved rather than random. Methylation in plants is relatively specific with regard to functional genomic context, being found principally at TEs and gene bodies (Schmitz et al. 2019). Gene bodies might be protected by transcription-coupled repair (Figure 4D), which likely evolved to counteract the mutagenic effects of transcription, including the excess mutational liability that comes with DNA being single-stranded. Mutations in TEs might, on average, be neutral or even beneficial, if they speed up the process of rendering those elements inactive. Thus, selection to reduce mutagenic effects of methylation on the focal cytosine or its neighborhood might, in global terms, be limited.
In mammals, on the other hand, methylation is ubiquitous, and the underlying sequence often remains important, notably in the context of X inactivation and imprinting. Elevated mutation loads in the neighborhood of methylated cytosines might therefore be much less well tolerated in mammals, so that better repair of these lesions evolved. It is interesting to note in this regard that, during postreplicative repair in mammals, G:T mismatches in the context of hemi-methylated sites are preferentially corrected to G:C with high (∼90%) efficiency (Brown and Jiricny 1987; Bill et al. 1998), perhaps reflective of an evolved bias to counter frequent deamination at methylated CpGs. To our knowledge, no such bias has been observed in plants, although direct evidence is limited to tobacco protoplasts (Inamdar et al. 1992), and this issue deserves further investigation.
To understand the mechanistic underpinnings of methylation-associated mutability and the involvement of particular DNA repair pathways, further analysis of somatic mutation data will also be desirable, in particular from cancers where specific repair pathways are compromised. Irrespective of the precise mechanisms at play, our study provides strong evidence that the mutational impact of methylation extends beyond the methylated cytosine itself, and shapes the emergence of novel variants across the genomes of different eukaryotes.
Acknowledgments
The authors are grateful to Wei Qu for help in replicating her original protocol, members of the Molecular Systems group for discussions, and three anonymous reviewers for comments. This work was supported by UK Medical Research Council core funding to T.W. V.K. carried out all analyses, except for the analysis of cancer genomes, which was carried out by M.D. V.K., M.D., B.S.-B., and T.W. designed analyses and interpreted results. B.S.-B. and T.W. supervised the study. T.W. conceived the study and wrote the manuscript with input from all authors.
Footnotes
Supplemental material available at figshare: https://doi.org/10.25386/genetics.11833482.
Communicating editor: N. Springer
Literature Cited
- Adams R. L. P., and Eason R., 1984. Increased G + C content of DNA stabilizes methyl CpG dinucleotides. Nucleic Acids Res. 12: 5869–5877. 10.1093/nar/12.14.5869
- Aggarwala V., and Voight B. F., 2016. An expanded sequence context model broadly explains variability in polymorphism levels across the human genome. Nat. Genet. 48: 349–355. 10.1038/ng.3511
- Alexandrov L. B., Nik-Zainal S., Wedge D. C., Aparicio S. A. J. R., Behjati S. et al., 2013. Signatures of mutational processes in human cancer. Nature 500: 415–421 (erratum: Nature 502: 258). 10.1038/nature12477
- Banyasz A., Esposito L., Douki T., Perron M., Lepori C. et al., 2016. Effect of C5-methylation of cytosine on the UV-induced reactivity of duplex DNA: conformational and electronic factors. J. Phys. Chem. B 120: 4232–4242. 10.1021/acs.jpcb.6b03340
- Barker D., Schafer M., and White R., 1984. Restriction sites containing CpG show a higher frequency of polymorphism in human DNA. Cell 36: 131–138. 10.1016/0092-8674(84)90081-3
- Bennett S. E., Sung J. S., and Mosbaugh D. W., 2001. Fidelity of uracil-initiated base excision DNA repair in DNA polymerase beta-proficient and -deficient mouse embryonic fibroblast cell extracts. J. Biol. Chem. 276: 42588–42600. 10.1074/jbc.M106212200
- Bewick A. J., Vogel K. J., Moore A. J., and Schmitz R. J., 2017. Evolution of DNA methylation across insects. Mol. Biol. Evol. 34: 654–665.
- Bewick A. J., Hofmeister B. T., Powers R. A., Mondo S. J., Grigoriev I. V. et al., 2019. Diversity of cytosine methylation across the fungal tree of life. Nat. Ecol. Evol. 3: 479–490. 10.1038/s41559-019-0810-9
- Bill C. A., Duran W. A., Miselis N. R., and Nickoloff J. A., 1998. Efficient repair of all types of single-base mismatches in recombination intermediates in Chinese hamster ovary cells: competition between long-patch and G-T glycosylase-mediated repair of G-T mismatches. Genetics 149: 1935–1943.
- Bird A. P., 1980. DNA methylation and the frequency of CpG in animal DNA. Nucleic Acids Res. 8: 1499–1504. 10.1093/nar/8.7.1499
- Blake R. D., Hess S. T., and Nicholson-Tuell J., 1992. The influence of nearest neighbors on the rate and pattern of spontaneous point mutations. J. Mol. Evol. 34: 189–200. 10.1007/BF00162968
- Blow M. J., Clark T. A., Daum C. G., Deutschbauer A. M., Fomenkov A. et al., 2016. The epigenomic landscape of prokaryotes. PLoS Genet. 12: e1005854 (erratum: PLoS Genet. 12: e1006064). 10.1371/journal.pgen.1005854
- Brown T. C., and Jiricny J., 1987. A specific mismatch repair event protects mammalian cells from loss of 5-methylcytosine. Cell 50: 945–950. 10.1016/0092-8674(87)90521-6
- Campbell P. J., Getz G., Korbel J. O. et al., 2020. Pan-cancer analysis of whole genomes. Nature 578: 82–93. 10.1038/s41586-020-1969-6
- Cannistraro V. J., and Taylor J.-S., 2009. Acceleration of 5-methylcytosine deamination in cyclobutane dimers by G and its implications for UV-induced C-to-T mutation hotspots. J. Mol. Biol. 392: 1145–1157. 10.1016/j.jmb.2009.07.048
- Carlson J., Locke A. E., Flickinger M., Zawistowski M., Levy S. et al., 2018. Extremely rare variants reveal patterns of germline mutation rate heterogeneity in humans. Nat. Commun. 9: 3753. 10.1038/s41467-018-05936-5
- Chen C., Qi H., Shen Y., Pickrell J., and Przeworski M., 2017. Contrasting determinants of mutation rates in germline and soma. Genetics 207: 255–267. 10.1534/genetics.117.1114
- Chen J., Miller B. F., and Furano A. V., 2014. Repair of naturally occurring mismatches can induce mutations in flanking DNA. eLife 3: e02001. 10.7554/eLife.02001
- Cooper D. N., and Krawczak M., 1989. Cytosine methylation and the fate of CpG dinucleotides in vertebrate genomes. Hum. Genet. 83: 181–188. 10.1007/BF00286715
- Cooper D. N., Mort M., Stenson P. D., Ball E. V., and Chuzhanova N. A., 2010. Methylation-mediated deamination of 5-methylcytosine appears to give rise to mutations causing human inherited disease in CpNpG trinucleotides, as well as in CpG dinucleotides. Hum. Genomics 4: 406–410. 10.1186/1479-7364-4-6-406
- Córdoba-Cañero D., Morales-Ruiz T., Roldán-Arjona T., and Ariza R. R., 2009. Single-nucleotide and long-patch base excision repair of DNA damage in plants. Plant J. 60: 716–728. 10.1111/j.1365-313X.2009.03994.x
- Coulondre C., Miller J. H., Farabaugh P. J., and Gilbert W., 1978. Molecular basis of base substitution hotspots in Escherichia coli. Nature 274: 775–780. 10.1038/274775a0
- Denissenko M. F., Chen J. X., Tang M.-S., and Pfeifer G. P., 1997. Cytosine methylation determines hot spots of DNA damage in the human P53 gene. Proc. Natl. Acad. Sci. USA 94: 3893–3898. 10.1073/pnas.94.8.3893
- DeRose-Wilson L. J., and Gaut B. S., 2007. Transcription-related mutations and GC content drive variation in nucleotide substitution rates across the genomes of Arabidopsis thaliana and Arabidopsis lyrata. BMC Evol. Biol. 7: 66. 10.1186/1471-2148-7-66
- Duncan B. K., and Miller J. H., 1980. Mutagenic deamination of cytosine residues in DNA. Nature 287: 560–561. 10.1038/287560a0
- Ebersberger I., Metzler D., Schwarz C., and Pääbo S., 2002. Genomewide comparison of DNA sequences between humans and chimpanzees. Am. J. Hum. Genet. 70: 1490–1497. 10.1086/340787
- Ehrlich M., Norris K. F., Wang R. Y., Kuo K. C., and Gehrke C. W., 1986. DNA cytosine methylation and heat-induced deamination. Biosci. Rep. 6: 387–393. 10.1007/BF01116426
- Fang W. H., and Modrich P., 1993. Human strand-specific mismatch repair occurs by a bidirectional mechanism similar to that of the bacterial reaction. J. Biol. Chem. 268: 11838–11844.
- Feng S., Cokus S. J., Zhang X., Chen P.-Y., Bostick M. et al., 2010. Conservation and divergence of methylation patterning in plants and animals. Proc. Natl. Acad. Sci. USA 107: 8689–8694. 10.1073/pnas.1002720107
- Fortini P., and Dogliotti E., 2007. Base damage and single-strand break repair: mechanisms and functional significance of short- and long-patch repair subpathways. DNA Repair (Amst.) 6: 398–409. 10.1016/j.dnarep.2006.10.008
- Fortini P., Parlanti E., Sidorkina O. M., Laval J., and Dogliotti E., 1999. The type of DNA glycosylase determines the base excision repair pathway in mammalian cells. J. Biol. Chem. 274: 15230–15236. 10.1074/jbc.274.21.15230
- Francioli L. C., Polak P. P., Koren A., Menelaou A., Chun S. et al., 2015. Genome-wide patterns and properties of de novo mutations in humans. Nat. Genet. 47: 822–826. 10.1038/ng.3292
- Frederico L. A., Kunkel T. A., and Shaw B. R., 1990. A sensitive genetic assay for the detection of cytosine deamination: determination of rate constants and the activation energy. Biochemistry 29: 2532–2537. 10.1021/bi00462a015
- Frigola J., Sabarinathan R., Mularoni L., Muiños F., Gonzalez-Perez A. et al., 2017. Reduced mutation rate in exons due to differential mismatch repair. Nat. Genet. 49: 1684–1692 [corrigenda: Nat. Genet. 50: 1196 (2018)]. 10.1038/ng.3991
- Fryxell K. J., and Moon W. J., 2005. CpG mutation rates in the human genome are highly dependent on local GC content. Mol. Biol. Evol. 22: 650–658 (erratum: Mol. Biol. Evol. 22: 1159). 10.1093/molbev/msi043
- Goll M. G., and Halpern M. E., 2011. DNA methylation in zebrafish. Prog. Mol. Biol. Transl. Sci. 101: 193–218. 10.1016/B978-0-12-387685-0.00005-6
- Goto T., and Monk M., 1998. Regulation of X-chromosome inactivation in development in mice and humans. Microbiol. Mol. Biol. Rev. 62: 362–378. 10.1128/MMBR.62.2.362-378.1998
- Grin I., and Ishchenko A. A., 2016. An interplay of the base excision repair and mismatch repair pathways in active DNA demethylation. Nucleic Acids Res. 44: 3713–3727. 10.1093/nar/gkw059
- Ho J. W. K., Jung Y. L., Liu T., Alver B. H., Lee S. et al., 2014. Comparative analysis of metazoan chromatin organization. Nature 512: 449–452. 10.1038/nature13415
- Hoang M. L., Kinde I., Tomasetti C., McMahon K. W., Rosenquist T. A. et al., 2016. Genome-wide quantification of rare somatic mutations in normal human tissues using massively parallel sequencing. Proc. Natl. Acad. Sci. USA 113: 9846–9851. 10.1073/pnas.1607794113
- Hodgkinson A., and Eyre-Walker A., 2011. Variation in the mutation rate across mammalian genomes. Nat. Rev. Genet. 12: 756–766. 10.1038/nrg3098
- Hwang D. G., and Green P., 2004. Bayesian Markov chain Monte Carlo sequence analysis reveals varying neutral substitution patterns in mammalian evolution. Proc. Natl. Acad. Sci. USA 101: 13994–14001. 10.1073/pnas.0404142101
- Ikehata H., and Ono T., 2007. Significance of CpG methylation for solar UV‐induced mutagenesis and carcinogenesis in skin. Photochem. Photobiol. 83: 196–204.
- Inamdar N. M., Zhang X.-Y., Brough C. L., Gardiner W. E., Bisaro D. M. et al., 1992. Transfection of heteroduplexes containing uracil.guanine or thymine.guanine mispairs into plant cells. Plant Mol. Biol. 20: 123–131. 10.1007/BF00029155
- Jónsson H., Sulem P., Kehr B., Kristmundsdottir S., Zink F. et al., 2017. Parental influence on human germline de novo mutations in 1,548 trios from Iceland. Nature 549: 519–522. 10.1038/nature24018
- Josse J., Kaiser A. D., and Kornberg A., 1961. Enzymatic synthesis of deoxyribonucleic acid. VIII. Frequencies of nearest neighbor base sequences in deoxyribonucleic acid. J. Biol. Chem. 236: 864–875.
- Karczewski K. J., Francioli L. C., Tiao G., Cummings B. B., Alföldi J. et al., 2019. Variation across 141,456 human exomes and genomes reveals the spectrum of loss-of-function intolerance across human protein-coding genes. bioRxiv. 10.1101/531210
- Kong A., Frigge M. L., Masson G., Besenbacher S., Sulem P. et al. , 2012. Rate of de novo mutations and the importance of father’s age to disease risk. Nature 488: 471–475. 10.1038/nature11396 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Krokan H. E., and Bjoras M., 2013. Base excision repair. Cold Spring Harb. Perspect. Biol. 5: a012583 10.1101/cshperspect.a012583 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lee H., Popodi E., Tang H., and Foster P. L., 2012. Rate and molecular spectrum of spontaneous mutations in the bacterium Escherichia coli as determined by whole-genome sequencing. Proc. Natl. Acad. Sci. USA 109: E2774–E2783. 10.1073/pnas.1210309109 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lee J., Jang H., Shin H., Choi W. L., Mok Y. G. et al. , 2014. AP endonucleases process 5-methylcytosine excision intermediates during active DNA demethylation in Arabidopsis. Nucleic Acids Res. 42: 11408–11418. 10.1093/nar/gku834 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lek M., Karczewski K. J., Minikel E. V., Samocha K. E., Banks E. et al. , 2016. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536: 285–291. 10.1038/nature19057 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Leroy J. L., Kochoyan M., Huynh-Dinh T., and Guéron M., 1988. Characterization of base-pair opening in deoxynucleotide duplexes using catalyzed exchange of the imino proton. J. Mol. Biol. 200: 223–238. 10.1016/0022-2836(88)90236-7 [DOI] [PubMed] [Google Scholar]
- Li E., and Zhang Y., 2014. DNA methylation in mammals. Cold Spring Harb. Perspect. Biol. 6: a019133 10.1101/cshperspect.a019133
- Lin J.-Y., Le B. H., Chen M., Henry K. F., Hur J. et al., 2017. Similarity between soybean and Arabidopsis seed methylomes and loss of non-CG methylation does not affect seed development. Proc. Natl. Acad. Sci. USA 114: E9730–E9739. 10.1073/pnas.1716758114
- Lister R., Pelizzola M., Dowen R. H., Hawkins R. D., Hon G. et al., 2009. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462: 315–322. 10.1038/nature08514
- Liu Y., Tian T., Zhang K., You Q., Yan H. et al., 2018. PCSD: a plant chromatin state database. Nucleic Acids Res. 46: D1157–D1167. 10.1093/nar/gkx919
- Lutsenko E., and Bhagwat A. S., 1999. Principal causes of hot spots for cytosine to thymine mutations at sites of cytosine methylation in growing cells. A model, its experimental support and implications. Mutat. Res. 437: 11–20. 10.1016/S1383-5742(99)00065-4
- Lyons D. M., and O’Brien P. J., 2010. Human base excision repair creates a bias toward -1 frameshift mutations. J. Biol. Chem. 285: 25203–25212. 10.1074/jbc.M110.118596
- Makova K. D., and Hardison R. C., 2015. The effects of chromatin organization on variation in mutation rates in the genome. Nat. Rev. Genet. 16: 213–223. 10.1038/nrg3890
- Mancini D., Singh S., Ainsworth P., and Rodenhiser D., 1997. Constitutively methylated CpG dinucleotides as mutation hot spots in the retinoblastoma gene (RB1). Am. J. Hum. Genet. 61: 80–87. 10.1086/513898
- Martincorena I., Fowler J. C., Wabik A., Lawson A. R. J., Abascal F. et al., 2018. Somatic mutant clones colonize the human esophagus with age. Science 362: 911–917. 10.1126/science.aau3879
- Martínez-Macías M. I., Córdoba-Cañero D., Ariza R. R., and Roldán-Arjona T., 2013. The DNA repair protein XRCC1 functions in the plant DNA demethylation pathway by stimulating cytosine methylation (5-meC) excision, gap tailoring, and DNA ligation. J. Biol. Chem. 288: 5496–5505. 10.1074/jbc.M112.427617
- Molaro A., Hodges E., Fang F., Song Q., McCombie W. R. et al., 2012. Sperm methylation profiles reveal features of epigenetic inheritance and evolution in primates. Cell 146: 1029–1041. 10.1016/j.cell.2011.08.016
- Mugal C. F., and Ellegren H., 2011. Substitution rate variation at human CpG sites correlates with non-CpG divergence, methylation level and GC content. Genome Biol. 12: R58 10.1186/gb-2011-12-6-r58
- Ossowski S., Schneeberger K., Lucas-Lledó J. I., Warthmann N., Clark R. M. et al., 2010. The rate and molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science 327: 92–94. 10.1126/science.1180677
- Peña-Diaz J., Bregenhorn S., Ghodgaonkar M., Follonier C., Artola-Borán M. et al., 2012. Noncanonical mismatch repair as a source of genomic instability in human cells. Mol. Cell 47: 669–680 [corrigenda: Mol. Cell 67: 162 (2017)]. 10.1016/j.molcel.2012.07.006
- Peng W., and Shaw B. R., 1996. Accelerated deamination of cytosine residues in UV-induced cyclobutane pyrimidine dimers leads to CC→TT transitions. Biochemistry 35: 10172–10181. 10.1021/bi960001x
- Prendergast J. G. D., Chambers E. V., and Semple C. A. M., 2014. Sequence-level mechanisms of human epigenome evolution. Genome Biol. Evol. 6: 1758–1771. 10.1093/gbe/evu142
- Qu W., Hashimoto S. I., Shimada A., Nakatani Y., Ichikawa K. et al., 2012. Genome-wide genetic variations are highly correlated with proximal DNA methylation patterns. Genome Res. 22: 1419–1425. 10.1101/gr.140236.112
- Rahbari R., Wuster A., Lindsay S. J., Hardwick R. J., Alexandrov L. B. et al.; UK10K Consortium, 2016. Timing, rates and spectra of human germline mutation. Nat. Genet. 48: 126–133. 10.1038/ng.3469
- Regev A., Lamb M. J., and Jablonka E., 1998. The role of DNA methylation in invertebrates: developmental regulation or genome defense? Mol. Biol. Evol. 15: 880–891. 10.1093/oxfordjournals.molbev.a025992
- Russell G. J., Walker P. M. B., Elton R. A., and Subak-Sharpe J. H., 1976. Doublet frequency analysis of fractionated vertebrate nuclear DNA. J. Mol. Biol. 108: 1–20. 10.1016/S0022-2836(76)80090-3
- Salser W., 1978. Globin mRNA sequences: analysis of base pairing and evolutionary implications. Cold Spring Harb. Symp. Quant. Biol. 42: 985–1002. 10.1101/SQB.1978.042.01.099
- Sancar A., 1996. DNA excision repair. Annu. Rev. Biochem. 65: 43–81. 10.1146/annurev.bi.65.070196.000355
- Schanz S., Castor D., Fischer F., and Jiricny J., 2009. Interference of mismatch and base excision repair during the processing of adjacent U/G mispairs may play a key role in somatic hypermutation. Proc. Natl. Acad. Sci. USA 106: 5593–5598. 10.1073/pnas.0901726106
- Schmitz R. J., Lewis Z. A., and Goll M. G., 2019. DNA methylation: shared and divergent features across eukaryotes. Trends Genet. 35: 818–827. 10.1016/j.tig.2019.07.007
- Schmutte C., Yang A. S., Beart R. W., and Jones P. A., 1995. Base excision repair of U:G mismatches at a mutational hotspot in the p53 gene is more efficient than base excision repair of T:G mismatches in extracts of human colon tumors. Cancer Res. 55: 3742–3746.
- Schuster-Böckler B., and Lehner B., 2012. Chromatin organization is a major influence on regional mutation rates in human cancer cells. Nature 488: 504–507. 10.1038/nature11273
- Ségurel L., Wyman M. J., and Przeworski M., 2014. Determinants of mutation rate variation in the human germline. Annu. Rev. Genomics Hum. Genet. 15: 47–70. 10.1146/annurev-genom-031714-125740
- Shen J.-C., Rideout W. M., and Jones P. A., 1994. The rate of hydrolytic deamination of 5-methylcytosine in double-stranded DNA. Nucleic Acids Res. 22: 972–976. 10.1093/nar/22.6.972
- Simmen M. W., 2008. Genome-scale relationships between cytosine methylation and dinucleotide abundances in animals. Genomics 92: 33–40. 10.1016/j.ygeno.2008.03.009
- Stoltzfus A., and McCandlish D. M., 2017. Mutational biases influence parallel adaptation. Mol. Biol. Evol. 34: 2163–2172. 10.1093/molbev/msx180
- Storz J. F., Natarajan C., Signore A. V., Witt C. C., McCandlish D. M. et al., 2019. The role of mutation bias in adaptive molecular evolution: insights from convergent changes in protein function. Philos. Trans. R. Soc. Lond. B Biol. Sci. 374: 20180238 10.1098/rstb.2018.0238
- Stroud H., Ding B., Simon S. A., Feng S., Bellizzi M. et al., 2013. Plants regenerated from tissue culture contain stable epigenome changes in rice. eLife 2: e00354 10.7554/eLife.00354
- Supek F., and Lehner B., 2015. Differential DNA mismatch repair underlies mutation rate variation across the human genome. Nature 521: 81–84. 10.1038/nature14173
- Supek F., Lehner B., Hajkova P., and Warnecke T., 2014. Hydroxymethylated cytosines are associated with elevated C to G transversion rates. PLoS Genet. 10: e1004585 10.1371/journal.pgen.1004585
- Suzuki M. M., and Bird A., 2008. DNA methylation landscapes: provocative insights from epigenomics. Nat. Rev. Genet. 9: 465–476. 10.1038/nrg2341
- Tomkova M., and Schuster-Böckler B., 2018. DNA modifications: naturally more error prone? Trends Genet. 34: 627–638. 10.1016/j.tig.2018.04.005
- Tomkova M., McClellan M., Kriaucionis S., and Schuster-Böckler B., 2016. 5-hydroxymethylcytosine marks regions with reduced mutation frequency in human DNA. eLife 5: e17082 10.7554/eLife.17082
- Tommasi S., Denissenko M. F., and Pfeifer G. P., 1997. Sunlight induces pyrimidine dimers preferentially at 5-methylcytosine bases. Cancer Res. 57: 4727–4730.
- Tornaletti S., and Pfeifer G. P., 1996. UV damage and repair mechanisms in mammalian cells. BioEssays 18: 221–228. 10.1002/bies.950180309
- Wang R. Y. H., Kuo K. C., Gehrke C. W., Huang L.-H., and Ehrlich M., 1982. Heat- and alkali-induced deamination of 5-methylcytosine and cytosine residues in DNA. Biochim. Biophys. Acta 697: 371–377. 10.1016/0167-4781(82)90101-4
- Weng M.-L., Becker C., Hildebrandt J., Neumann M., Rutter M. T. et al., 2019. Fine-grained analysis of spontaneous mutation spectrum and frequency in Arabidopsis thaliana. Genetics 211: 703–714. 10.1534/genetics.118.301721
- Wierzbicki A. T., Cocklin R., Mayampurath A., Lister R., Rowley M. J. et al., 2012. Spatial and functional relationships among Pol V-associated loci, Pol IV-dependent siRNAs, and cytosine methylation in the Arabidopsis epigenome. Genes Dev. 26: 1825–1836. 10.1101/gad.197772.112
- Xia J., Han L., and Zhao Z., 2012. Investigating the relationship of DNA methylation with mutation rate and allele frequency in the human genome. BMC Genomics 13: S7 10.1186/1471-2164-13-S8-S7
- Zemach A., Kim M. Y., Hsieh P.-H., Coleman-Derr D., Eshed-Williams L. et al., 2013. The Arabidopsis nucleosome remodeler DDM1 allows DNA methyltransferases to access H1-containing heterochromatin. Cell 153: 193–205. 10.1016/j.cell.2013.02.033
- Zemojtel T., Kielbasa S. M., Arndt P. F., Chung H.-R., and Vingron M., 2009. Methylation and deamination of CpGs generate p53-binding sites on a genomic scale. Trends Genet. 25: 63–66. 10.1016/j.tig.2008.11.005
- Zemojtel T., Kielbasa S. M., Arndt P. F., Behrens S., Bourque G., and Vingron M., 2011. CpG deamination creates transcription factor-binding sites with high efficiency. Genome Biol. Evol. 3: 1304–1311. 10.1093/gbe/evr107
- Zhang H., Lang Z., and Zhu J.-K., 2018. Dynamics and function of DNA methylation in plants. Nat. Rev. Mol. Cell Biol. 19: 489–506.
- Zhang X., and Mathews C. K., 1994. Effect of DNA cytosine methylation upon deamination-induced mutagenesis in a natural target sequence in duplex DNA. J. Biol. Chem. 269: 7066–7069.
- Zhao Z., and Boerwinkle E., 2002. Neighboring-nucleotide effects on single nucleotide polymorphisms: a study of 2.6 million polymorphisms across the human genome. Genome Res. 12: 1679–1686. 10.1101/gr.287302
- Zhu J.-K., 2009. Active DNA demethylation mediated by DNA glycosylases. Annu. Rev. Genet. 43: 143–166. 10.1146/annurev-genet-102108-134205
- Zhu Y. O., Sherlock G., and Petrov D. A., 2017. Extremely rare polymorphisms in Saccharomyces cerevisiae allow inference of the mutational spectrum. PLoS Genet. 13: e1006455 10.1371/journal.pgen.1006455
Data Availability Statement
The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article. The authors affirm that all data analyzed in this article are publicly available and that links to the original data sources are provided throughout. Code used to carry out all analyses is available at https://github.com/vfkusmartsev/Proximal_Mutations_5mC. Prior datasets are available as follows. Human polymorphism data: HapMap (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/), 1000 Genomes (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/), gnomAD (https://gnomad.broadinstitute.org/); cancer SNV data: https://dcc.icgc.org/pcawg; A. thaliana polymorphism data: https://1001genomes.org/data/GMI-MPI/releases/v3.1/; rice polymorphism data: http://snp-seek.irri.org/; H1 hESC methylation data: http://neomorph.salk.edu/human_methylome/data.html; cancer methylation and chromatin state maps: http://www.roadmapepigenomics.org/ for keratinocyte (GEO GSM1127056, E059), breast luminal epithelium (GEO GSM1127125, E027), liver (GEO GSM916049, E066), and esophageal cells (GEO GSM983649, E079). Methylation data for A. thaliana: wildtype (GEO GSM1664380), ros1 (GEO GSM1859475), nrpd1 (GEO GSM1859476), ros1/nrpd1 (GEO GSM1859478), ddm (GEO GSM1014117), and drd (GEO GSM1014120); methylation data for rice: GEO GSM1039487. Supplemental material is available at figshare: https://doi.org/10.25386/genetics.11833482.






