Human Genomics. 2010 Aug 1;4(6):406–410. doi: 10.1186/1479-7364-4-6-406

Methylation-mediated deamination of 5-methylcytosine appears to give rise to mutations causing human inherited disease in CpNpG trinucleotides, as well as in CpG dinucleotides

David N Cooper 1, Matthew Mort 1, Peter D Stenson 1, Edward V Ball 1, Nadia A Chuzhanova 2
PMCID: PMC3525222  PMID: 20846930

Abstract

The cytosine-guanine (CpG) dinucleotide has long been known to be a hotspot for pathological mutation in the human genome. This hypermutability is related to its role as the major site of cytosine methylation with the attendant risk of spontaneous deamination of 5-methylcytosine (5mC) to yield thymine. Cytosine methylation, however, also occurs in the context of CpNpG sites in the human genome, an unsurprising finding since the intrinsic symmetry of CpNpG renders it capable of supporting a semi-conservative model of replication of the methylation pattern. Recently, it has become clear that significant DNA methylation occurs in a CpHpG context (where H = A, C or T) in a variety of human somatic tissues. If we assume that CpHpG methylation also occurs in the germline, and that 5mC deamination can occur within a CpHpG context, then we might surmise that methylated CpHpG sites could also constitute mutation hotspots causing human genetic disease. To test this postulate, 54,625 missense and nonsense mutations from 2,113 genes causing inherited disease were retrieved from the Human Gene Mutation Database http://www.hgmd.org. Some 18.2 per cent of these pathological lesions were found to be C → T and G → A transitions located in CpG dinucleotides (compatible with a model of methylation-mediated deamination of 5mC), an approximately ten-fold higher proportion than would have been expected by chance alone. The corresponding proportion for the CpHpG trinucleotide was 9.9 per cent, an approximately two-fold higher proportion than would have been expected by chance. We therefore estimate that ~5 per cent of missense/nonsense mutations causing human inherited disease may be attributable to methylation-mediated deamination of 5mC within a CpHpG context.

Keywords: CpG dinucleotide, CpNpG trinucleotide, cytosine methylation, 5-methylcytosine deamination, mutation hotspots, human inherited disease, missense/nonsense mutations

Man's yesterday may ne'er be like his morrow; Nought may endure but Mutability.

Percy Bysshe Shelley (1816) Mutability

The first hint that the cytosine-guanine (CpG) dinucleotide might constitute a hotspot for pathological mutations in the human genome came nearly 25 years ago with the finding that two different CGA → TGA (Arg → Term) nonsense mutations had recurred quite independently at different locations in the factor VIII (F8) gene causing haemophilia A [1]. The potential generality of this phenomenon was supported by the finding that 12 of the 34 (35 per cent) single base-pair (bp) substitutions then known to cause human inherited disease were C → T and G → A transitions within CpG dinucleotides [2]. Further studies soon confirmed that the CpG dinucleotide was a mutation hotspot in a variety of different human disease genes, including PAH [3], F9 [4], LDLR [5], RB1 [6], HPRT1 [7] and DMD [8]. As mutation data accumulated, CGA → TGA transitions were encountered particularly frequently as a cause of human genetic disease; such nonsense mutations are inherently more likely to come to clinical attention than missense mutations [9,10].

From the outset, it was realised that the hypermutability of the CpG dinucleotide was related to its role as the major site of cytosine methylation in the human genome. The explanation traditionally put forward for this association is that, while cytosine spontaneously deaminates to uracil (which is efficiently recognised as a non-DNA base and removed by uracil-DNA glycosylase), the spontaneous deamination of 5-methylcytosine (5mC) yields thymine [11], a normal DNA base. This creates G•T mismatches whose removal by methyl-CpG binding domain protein 4 (MBD4) and/or thymine DNA glycosylase, followed by base excision repair, is error prone [12-16]. It remains possible, however, that transitions at methylated CpG (mCpG) sites are not exclusively caused by the spontaneous deamination of 5mC and may also arise through other mechanisms and processes [17-19]. Irrespective of the precise nature of the underlying mechanism, Krawczak et al. (1998) [9] estimated that the rate of CG → TG transitions (and of CG → CA transitions, which result from deamination on the other strand) was five times the base mutation rate. Subsequent estimates of 5mC hypermutability, derived from various studies of polymorphism, disease mutations or evolutionary divergence, have ranged from four-fold to 15-fold [20-26].
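
Why one chemical event is recorded as either CG → TG or CG → CA depends only on which strand carried the deaminated 5mC. The minimal Python sketch below is our own illustration, not the authors' code: the function names are hypothetical, and it models only the final mutated top-strand sequence, assuming the G•T mismatch escapes correct repair.

```python
# Illustration only: how deamination of a methylated C on either strand of a
# CpG appears on the top (reported) strand. Function names are hypothetical.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def deaminate_top(seq: str, i: int) -> str:
    """Model deamination of a methylated C at top-strand index i (5mC -> T).

    Only the end result is modelled: the G:T mismatch is assumed to escape
    correct repair, so the C -> T change is fixed after replication.
    """
    if seq[i] != "C":
        raise ValueError(f"expected C at position {i}, found {seq[i]}")
    return seq[:i] + "T" + seq[i + 1:]


def deaminate_bottom(seq: str, i: int) -> str:
    """Model deamination of the methylated C paired with the top-strand G at i.

    The bottom-strand C -> T change is read as G -> A on the top strand.
    """
    if seq[i] != "G":
        raise ValueError(f"expected G at position {i}, found {seq[i]}")
    bottom = revcomp(seq)
    j = len(seq) - 1 - i  # position of the paired base on the bottom strand
    return revcomp(deaminate_top(bottom, j))


site = "TTCGAA"  # a CpG at indices 2-3
print(deaminate_top(site, 2))     # CG -> TG on the top strand
print(deaminate_bottom(site, 3))  # CG -> CA when read on the top strand
```

Working through the reverse complement, rather than hard-coding G → A, keeps the sketch faithful to the mechanism: on each strand the chemistry is the same C → T event.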

It has been known for some time that cytosine methylation also occurs in the context of CpNpG sites in mammalian genomes, where N represents any nucleotide [27,28]. Since the intrinsic symmetry of the CpNpG trinucleotide (which reads CpNpG on both strands) would support a semi-conservative model of replication of the methylation pattern, as with the CpG dinucleotide, it comes as no surprise that both maintenance and de novo methylation occur at CpNpG sites in mammalian cells [28]. In their recent paper on the human methylome, Lister et al. [29] reported abundant DNA methylation in CpHpG trinucleotides (where H = A, C or T). Specifically, some 17.3 per cent of 5mC in embryonic stem cells was found to occur within CpApG, CpCpG and CpTpG, with a further 7.2 per cent of 5mC occurring in CpHpH. Although Lister et al. [29] suggested that non-CpG methylation is almost entirely lost upon differentiation (a conclusion based solely upon the analysis of foetal lung fibroblasts), others have noted CpNpG methylation within human genes in a variety of different somatic tissues [30,31]. The extent of non-CpG methylation in the germline remains unclear. However, if we were to assume not only that CpHpG methylation occurs in the germline, but also that 5mC deamination can occur within a CpHpG context, then it is very likely that methylated CpHpG sites would constitute mutation hotspots. Indirect evidence that this might indeed be the case has come from studies of the human NF1 [32] and BRCA1 [33] genes, which found a disproportionately high number of C → T and G → A transitions at CpNpG sites. In the light of the above, we have revisited the question of CpG dinucleotide hypermutability and explored the potential contribution that CpHpG transitions might make to human inherited disease.
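
The symmetry argument and the CpG / CpHpG / CpHpH classes can be checked mechanically. The short Python sketch below is illustrative only. The helpers `revcomp` and `cytosine_context` are hypothetical names, and the sketch assumes uppercase A/C/G/T input with at least two bases downstream of the cytosine being classified.

```python
# Hypothetical helpers illustrating CpNpG symmetry and the CpG / CpHpG / CpHpH
# classification of a cytosine (H = A, C or T). Assumes uppercase A/C/G/T only.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def cytosine_context(seq: str, i: int) -> str:
    """Classify the C at seq[i] by its two downstream bases on the same strand."""
    if seq[i] != "C":
        raise ValueError(f"expected C at position {i}, found {seq[i]}")
    if i + 2 >= len(seq):
        raise ValueError("need two bases downstream of the C")
    if seq[i + 1] == "G":
        return "CpG"
    if seq[i + 2] == "G":  # seq[i + 1] is A, C or T here, i.e. H
        return "CpHpG"
    return "CpHpH"


# Every CpNpG trinucleotide also reads C-N'-G on the opposite strand, so both
# strands carry a C at the equivalent position that can bear a methyl group.
for n in "ACGT":
    tri = "C" + n + "G"
    mirror = revcomp(tri)
    print(tri, "<->", mirror, cytosine_context(tri, 0), cytosine_context(mirror, 0))
```

The printed pairs show that CpApG and CpTpG map onto each other, whereas CpCpG maps onto CpGpG, whose cytosine is classed as CpG. The CpHpG label therefore describes a strand-specific context, even though the underlying CpNpG motif is symmetric.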

According to the April 2010 release of the Human Gene Mutation Database (HGMD; http://www.hgmd.org) [34], 56,457 pathological mutations have been reported in a total of 2,242 human genes. A subset of 54,625 pathological missense and nonsense mutations in 2,113 genes, each with 2 bp of genomic DNA sequence flanking the mutated site on either side, was retrieved from HGMD. The numbers of C → T and G → A mutations in this dataset that were located within either CpG dinucleotides or CpHpG trinucleotides were counted and termed 'mutations in di/trinucleotide' (Table 1). Only these C → T and G → A transitions, found in the context of a CpG dinucleotide or CpHpG trinucleotide, would be compatible with a model of methylation-mediated deamination of 5mC. All remaining mutations in the dataset (those located outside CpG dinucleotides or CpHpG trinucleotides, together with substitutions other than C → T and G → A within them) were counted and termed 'mutations not in di/trinucleotide'; for each context, the two counts therefore sum to 54,625 (Table 1). Thus, 18.2 per cent (9,947/54,625) of the studied missense/nonsense mutations causing human inherited disease are C → T or G → A transitions located in CpG dinucleotides, while the corresponding proportion for the CpHpG trinucleotide is 9.9 per cent (5,402/54,625). To assess the significance of these figures, all possible C → T and G → A mutations within CpG dinucleotides or CpHpG trinucleotides in the coding regions of the mutated genes were also counted and termed 'possible mutations in di/trinucleotide' (Table 1). In parallel, all possible single bp substitutions in a non-CpG or non-CpHpG context (as well as substitutions other than C → T and G → A within CpG and CpHpG) within the coding regions of the mutated genes were counted as 'possible mutations not in di/trinucleotide' (Table 1).
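
The paper does not publish its counting procedure. The Python sketch below shows one way the classification just described could be implemented, under our own assumptions: each mutation is supplied as top-strand reference and alternative bases with exactly 2 bp of flank on each side, and positions beyond the ends of a sequence are padded with the placeholder 'N'. The function names `classify` and `count_possible` are hypothetical.

```python
from collections import Counter

# Hypothetical sketch of the context classification. Each substitution is
# given on the top strand with exactly 2 bp of flank on each side; 'N' stands
# for an unknown base beyond the end of a sequence.


def classify(flank5: str, ref: str, alt: str, flank3: str) -> str:
    """Return 'CpG', 'CpHpG' or 'other' for a single-base substitution."""
    if len(flank5) != 2 or len(flank3) != 2:
        raise ValueError("expected exactly 2 bp of flanking sequence on each side")
    if ref == "C" and alt == "T":
        if flank3[0] == "G":          # mutated C followed by G: CpG
            return "CpG"
        if flank3[1] == "G":          # C, then A/C/T, then G: CpHpG
            return "CpHpG"
    elif ref == "G" and alt == "A":   # a C -> T change on the opposite strand
        if flank5[1] == "C":          # C immediately 5' of the G: CpG
            return "CpG"
        if flank5[0] == "C":          # C-D-G here (D = A, G or T) is C-H-G on the other strand
            return "CpHpG"
    return "other"


def count_possible(seq: str) -> Counter:
    """Classify every possible single-base substitution (3 per position) in seq."""
    padded = "NN" + seq + "NN"
    counts = Counter()
    for k, ref in enumerate(seq):
        flank5 = padded[k:k + 2]      # seq[k-2], seq[k-1]
        flank3 = padded[k + 3:k + 5]  # seq[k+1], seq[k+2]
        for alt in "ACGT":
            if alt != ref:
                counts[classify(flank5, ref, alt, flank3)] += 1
    return counts
```

Because a CpG call requires a G immediately 3' of the mutated C while a CpHpG call requires a non-G there, the two labels are mutually exclusive for any given mutation.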

A weak positive correlation was noted between the number of CpG mutations in each of the 2,113 genes analysed and the number of possible CpG mutations in that gene (Pearson's correlation 0.129, p = 2.45 × 10^-9), implying that the CpG mutation frequency is influenced to some extent by the frequency of occurrence of the underlying CpG dinucleotide. Unsurprisingly, a significantly greater proportion (approximately ten-fold) of observed pathological missense/nonsense mutations within these genes were C → T and G → A transitions within CpG dinucleotides than would have been expected by chance alone from the distribution of all possible mutations (Table 1; p < 10^-230). A weak positive correlation (Pearson's correlation 0.251, p = 1.01 × 10^-31) was also noted between the number of CpHpG-located mutations and the number of CpHpG trinucleotides in these genes, implying that the CpHpG mutation frequency is influenced to some extent by the frequency of occurrence of the underlying CpHpG trinucleotide. Once again, a significantly greater proportion (approximately two-fold) of observed pathological missense/nonsense mutations within these genes were C → T and G → A transitions within CpHpG trinucleotides than would have been expected by chance alone from the distribution of all possible mutations (Table 1; p < 10^-230).
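
The p-values above are reported without naming the test. As an assumption on our part, a Pearson chi-square test (1 degree of freedom, no continuity correction) on each 2 × 2 table of observed versus possible counts is a natural candidate. Applied to the regulatory counts of Table 2, this sketch gives p ≈ 6.0 × 10^-9 for CpG and p ≈ 0.011 for CpHpG, in line with the reported values, but we cannot confirm that this was the authors' method. The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (1 df, no continuity correction) on the table

        [[a, b],    observed: in / not in di/trinucleotide
         [c, d]]    possible: in / not in di/trinucleotide

    Returns (statistic, two-sided p-value).
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2          # exact integer arithmetic
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    stat = numerator / denominator
    # With 1 df the statistic is Z**2, so P(chi2 > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Table 1, CpG row: the p-value underflows to 0.0 in double precision,
# so it can only be reported as an upper bound such as < 10^-230.
print(chi_square_2x2(9_947, 44_678, 292_147, 13_269_850))
# Table 2, CpG and CpHpG rows.
print(chi_square_2x2(94, 1_672, 1_940, 64_213))
print(chi_square_2x2(54, 1_712, 2_838, 63_315))
```

For very large statistics a log-scale p-value would be needed to report an exact figure; the bound is all a floating-point `erfc` can deliver here.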

Table 1.

Numbers of C → T and G → A mutations found in CpG dinucleotides and CpHpG trinucleotides in a dataset of 54,625 missense and nonsense mutations in 2,113 different human genes (HGMD) and the numbers of possible C → T and G → A mutations in CpG dinucleotides and CpHpG trinucleotides within the coding regions of the mutated genes.

Di/trinucleotide   Dataset    Mutations in        Mutations not in     p-value
                              di/trinucleotide    di/trinucleotide
CpG                HGMD       9,947               44,678               < 10^-230
                   Possible   292,147             13,269,850
CpHpG              HGMD       5,402               49,223               < 10^-230
                   Possible   610,714             12,951,283

From the data presented in Table 1, we estimate that ~11.8 per cent of the 9,947 CpG mutations (i.e. ~1,177, calculated as 54,625 × 292,147/13,561,997) occurred within this dinucleotide by chance alone and hence would not necessarily have originated via the methylation-mediated deamination of 5mC. Similarly, we estimate that ~46 per cent of the 5,402 CpHpG mutations (i.e. ~2,460, calculated as 54,625 × 610,714/13,561,997) occurred within these trinucleotides by chance alone and hence may not have originated via methylation-mediated deamination of 5mC. The other side of this particular coin, however, is that the remaining 54 per cent of the 5,402 observed CpHpG mutations in HGMD (i.e. the excess of 2,942 over expectation, or ~5 per cent of all the missense/nonsense mutations analysed) may well be attributable to methylation-mediated deamination of 5mC within a CpHpG context. As far as we are aware, this is the first (albeit crude) estimate of the potential impact of CpHpG mutations on human inherited disease.
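
The chance expectation used above can be reproduced directly from the Table 1 counts. In this Python sketch, which is our reconstruction rather than the authors' code, the expected count is the total number of observed mutations multiplied by the fraction of all possible substitutions that fall in the di/trinucleotide context. The function name `chance_expectation` is hypothetical.

```python
# Sketch of the "expected by chance" calculation (our reconstruction):
# expected = total observed x fraction of possible substitutions in the context.


def chance_expectation(observed_in: int, observed_total: int,
                       possible_in: int, possible_not: int) -> dict:
    fraction_possible = possible_in / (possible_in + possible_not)
    expected = observed_total * fraction_possible
    excess = observed_in - expected
    return {
        "expected": expected,                          # count expected by chance
        "chance_fraction": expected / observed_in,     # share of observed explained by chance
        "excess": excess,                              # observed minus expected
        "excess_fraction_of_all": excess / observed_total,
    }


TOTAL = 54_625  # missense/nonsense mutations analysed (Table 1)
cpg = chance_expectation(9_947, TOTAL, 292_147, 13_269_850)
chg = chance_expectation(5_402, TOTAL, 610_714, 12_951_283)

for name, r in (("CpG", cpg), ("CpHpG", chg)):
    print(f"{name}: expected {r['expected']:.1f}, by chance {r['chance_fraction']:.1%}, "
          f"excess {r['excess']:.0f} ({r['excess_fraction_of_all']:.1%} of all mutations)")
```

With these counts the sketch gives about 1,176.7 expected CpG mutations (11.8 per cent of 9,947) and about 2,459.8 expected CpHpG mutations (45.5 per cent of 5,402). That leaves a CpHpG excess of about 2,942, or 5.4 per cent of all 54,625 mutations.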

A similar analysis was performed for 1,766 regulatory mutations (identified in the promoters of 191 human genes) retrieved from HGMD. The numbers of actual and possible CpG and CpHpG mutations were counted as before, using the promoter sequence of each gene. To determine the total numbers of possible CpG/CpHpG and non-CpG/CpHpG mutations, the wild-type promoter sequences of these genes (total length 22,051 bp, i.e. 3 × 22,051 = 66,153 possible single bp substitutions) were used (Table 2). As with the missense/nonsense mutations, a higher proportion of observed pathological regulatory mutations were C → T and G → A transitions within CpG dinucleotides than would have been expected by chance alone, although here the excess was only approximately two-fold (Table 2; p = 6.03 × 10^-9). We estimate that ~55 per cent of the 94 CpG mutations (i.e. ~52) probably occurred within these dinucleotides by chance alone rather than via methylation-mediated deamination of 5mC. By contrast, a lower than expected proportion of C → T and G → A regulatory mutations was located in CpHpG trinucleotides (p = 0.011). The absence of any excess of such mutations in CpHpG trinucleotides indicates that most, if not all, of the promoter CpHpG mutations probably occurred by chance alone, making it unnecessary to invoke methylation-mediated deamination of 5mC to account for them. Since neither CpG nor CpHpG was found to be under-represented in the examined promoter regions relative to the coding regions, we surmise that the reduced (CpG) or absent (CpHpG) excess of C → T and G → A promoter mutations in these methylatable di/trinucleotides may reflect the relative paucity of cytosine methylation within promoter regions [35], which would render unmethylated CpG and CpHpG di/trinucleotides no more mutable than any other di/trinucleotide.
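
The "possible" rows of Table 2 are consistent with enumerating all three alternative bases at every position of the 22,051 bp of promoter sequence, since both rows sum to 3 × 22,051 = 66,153. The Python sketch below checks that and recomputes the chance expectations. The enumeration rule is our inference from the totals, and the names are hypothetical.

```python
# Sketch (our inference): Table 2's "possible" counts enumerate all three
# alternative bases at every position of the promoter sequences.

PROMOTER_BP = 22_051        # total promoter length reported in the text
REGULATORY_TOTAL = 1_766    # observed regulatory mutations (Table 2)


def expected_by_chance(possible_in: int, possible_total: int, observed_total: int) -> float:
    """Observed total x fraction of possible substitutions lying in the context."""
    return observed_total * possible_in / possible_total


possible_total = 3 * PROMOTER_BP  # each base can change to any of 3 others
# Both Table 2 "possible" rows add up to this total.
assert possible_total == 1_940 + 64_213 == 2_838 + 63_315

for name, observed, possible_in in (("CpG", 94, 1_940), ("CpHpG", 54, 2_838)):
    exp = expected_by_chance(possible_in, possible_total, REGULATORY_TOTAL)
    print(f"{name}: observed {observed}, expected {exp:.1f}, ratio {observed / exp:.2f}")
```

This gives roughly 52 expected versus 94 observed CpG mutations (about 1.8-fold), and roughly 76 expected versus 54 observed CpHpG mutations, the deficit noted in the text.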

Table 2.

Numbers of C → T and G → A mutations found in CpG dinucleotides and CpHpG trinucleotides in a dataset of 1,766 regulatory mutations of 191 gene promoters (HGMD) and the numbers of possible C → T and G → A mutations in CpG dinucleotides and CpHpG trinucleotides.

Di/trinucleotide   Dataset    Mutations in        Mutations not in     p-value
                              di/trinucleotide    di/trinucleotide
CpG                HGMD       94                  1,672                6.03 × 10^-9
                   Possible   1,940               64,213
CpHpG              HGMD       54                  1,712                0.011
                   Possible   2,838               63,315

Although two specific examples of non-CpG methylation altering the binding of transcription factors to promoter elements within human genes have so far been reported [36,37], the functional role of most non-CpG methylation in the human genome is still unclear. Irrespective of the functionality or otherwise of this specific type of post-synthetic DNA modification in the human genome, it would appear that methylation of the CpHpG trinucleotide may leave a significant imprint on the spectrum of missense/nonsense mutations causing human genetic disease.

References

  1. Youssoufian H, Kazazian HH Jr, Phillips DG, Aronis S, et al. Recurrent mutations in haemophilia A give evidence for CpG mutation hotspots. Nature. 1986;324:380–382. doi: 10.1038/324380a0.
  2. Cooper DN, Youssoufian H. The CpG dinucleotide and human genetic disease. Hum Genet. 1988;78:151–155. doi: 10.1007/BF00278187.
  3. Abadie V, Lyonnet S, Maurin N, Berthelon M, et al. CpG dinucleotides are mutation hot spots in phenylketonuria. Genomics. 1989;5:936–939. doi: 10.1016/0888-7543(89)90137-7.
  4. Koeberl DD, Bottema CD, Ketterling RP, Bridge PJ, et al. Mutations causing hemophilia B: Direct estimate of the underlying rates of spontaneous germ-line transitions, transversions, and deletions in a human gene. Am J Hum Genet. 1990;47:202–217.
  5. Rideout WM, Coetzee GA, Olumi AF, Jones PA. 5-Methylcytosine as an endogenous mutagen in the human LDL receptor and p53 genes. Science. 1990;249:1288–1290. doi: 10.1126/science.1697983.
  6. Mancini D, Singh S, Ainsworth P, Rodenhiser D. Constitutively methylated CpG dinucleotides as mutation hot spots in the retinoblastoma gene (RB1). Am J Hum Genet. 1997;61:80–87. doi: 10.1086/513898.
  7. O'Neill JP, Finette BA. Transition mutations at CpG dinucleotides are the most frequent in vivo spontaneous single-based substitution mutation in the human HPRT gene. Environ Mol Mutagen. 1998;32:188–191. doi: 10.1002/(SICI)1098-2280(1998)32:2<188::AID-EM16>3.0.CO;2-Y.
  8. Buzin CH, Feng J, Yan J, Scaringe W, et al. Mutation rates in the dystrophin gene: A hotspot of mutation at a CpG dinucleotide. Hum Mutat. 2005;25:177–188. doi: 10.1002/humu.20132.
  9. Krawczak M, Ball EV, Cooper DN. Neighboring-nucleotide effects on the rates of germ-line single-base-pair substitution in human genes. Am J Hum Genet. 1998;63:474–488. doi: 10.1086/301965.
  10. Mort M, Ivanov D, Cooper DN, Chuzhanova NA. A meta-analysis of nonsense mutations causing human genetic disease. Hum Mutat. 2008;29:1037–1047. doi: 10.1002/humu.20763.
  11. Shen JC, Rideout WM, Jones PA. The rate of hydrolytic deamination of 5-methylcytosine in double-stranded DNA. Nucleic Acids Res. 1994;22:972–976. doi: 10.1093/nar/22.6.972.
  12. Hendrich B, Hardeland U, Ng HH, Jiricny J, et al. The thymine glycosylase MBD4 can bind to the product of deamination at methylated CpG sites. Nature. 1999;401:301–304. doi: 10.1038/45843.
  13. Waters TR, Swann PF. Thymine-DNA glycosylase and G to A transition mutations at CpG sites. Mutat Res. 2000;462:137–147. doi: 10.1016/S1383-5742(00)00031-4.
  14. Walsh CP, Xu GL. Cytosine methylation and DNA repair. Curr Top Microbiol Immunol. 2006;301:283–315. doi: 10.1007/3-540-31390-7_11.
  15. Cortázar D, Kunz C, Saito Y, Steinacher R, et al. The enigmatic thymine DNA glycosylase. DNA Repair. 2007;6:489–504. doi: 10.1016/j.dnarep.2006.10.013.
  16. Boland MJ, Christman JK. Characterization of Dnmt3b:thymine-DNA glycosylase interaction and stimulation of thymine glycosylase-mediated repair by DNA methyltransferase(s) and RNA. J Mol Biol. 2008;379:492–504. doi: 10.1016/j.jmb.2008.02.049.
  17. Shen JC, Rideout WM, Jones PA. High frequency mutagenesis by a DNA methyltransferase. Cell. 1992;71:1073–1080. doi: 10.1016/S0092-8674(05)80057-1.
  18. Zhang X, Mathews CK. Effect of DNA cytosine methylation upon deamination-induced mutagenesis in a natural target sequence in duplex DNA. J Biol Chem. 1994;269:7066–7069.
  19. Pfeifer GP. Mutagenesis at methylated CpG sequences. Curr Top Microbiol Immunol. 2006;301:259–281. doi: 10.1007/3-540-31390-7_10.
  20. Nachman MW, Crowell SL. Estimate of the mutation rate per nucleotide in humans. Genetics. 2000;156:297–304. doi: 10.1093/genetics/156.1.297.
  21. Kondrashov AS. Direct estimates of human per nucleotide mutation rates at 20 loci causing Mendelian diseases. Hum Mutat. 2003;21:12–27. doi: 10.1002/humu.10147.
  22. Tomso DJ, Bell DA. Sequence context at human single nucleotide polymorphisms: Overrepresentation of CpG dinucleotide at polymorphic sites and suppression of variation in CpG islands. J Mol Biol. 2003;327:303–308. doi: 10.1016/S0022-2836(03)00120-7.
  23. Jiang C, Zhao Z. Directionality of point mutation and 5-methylcytosine deamination rates in the chimpanzee genome. BMC Genomics. 2006;7:316. doi: 10.1186/1471-2164-7-316.
  24. Elango N, Kim SH, Vigoda E, Yi SV. Mutations of different molecular origins exhibit contrasting patterns of regional substitution rate variation. PLoS Comput Biol. 2008;4:e1000015. doi: 10.1371/journal.pcbi.1000015.
  25. Misawa K, Kikuno RF. Evaluation of the effect of CpG hypermutability on human codon substitution. Gene. 2009;431:18–22. doi: 10.1016/j.gene.2008.11.006.
  26. Li JB, Gao Y, Aach J, Zhang K, et al. Multiplex padlock targeted sequencing reveals human hypermutable CpG variations. Genome Res. 2009;19:1606–1615. doi: 10.1101/gr.092213.109.
  27. Woodcock DM, Crowther PJ, Diver WP. The majority of methylated deoxycytidines in human DNA are not in the CpG dinucleotide. Biochem Biophys Res Commun. 1987;145:888–894. doi: 10.1016/0006-291X(87)91048-5.
  28. Clark SJ, Harrison J, Frommer M. CpNpG methylation in mammalian cells. Nat Genet. 1995;10:20–27. doi: 10.1038/ng0595-20.
  29. Lister R, Pelizzola M, Dowen RH, Hawkins RD, et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature. 2009;462:315–322. doi: 10.1038/nature08514.
  30. Lee J, Jang SJ, Benoit N, Hoque MO, et al. Presence of 5-methylcytosine in CpNpG trinucleotides in the human genome. Genomics. 2010;96:67–72. doi: 10.1016/j.ygeno.2010.03.013.
  31. Laurent L, Wong E, Li G, Huynh T, et al. Dynamic changes in the human methylome during differentiation. Genome Res. 2010;20:320–331. doi: 10.1101/gr.101907.109.
  32. Rodenhiser DI, Andrews JD, Mancini DN, Jung JH, et al. Homonucleotide tracts, short repeats and CpG/CpNpG motifs are frequent sites for heterogeneous mutations in the neurofibromatosis type 1 (NF1) tumour-suppressor gene. Mutat Res. 1997;373:185–195. doi: 10.1016/S0027-5107(96)00171-6.
  33. Cheung LW, Lee YF, Ng TW, Ching WK, et al. CpG/CpNpG motifs in the coding region are preferred sites for mutagenesis in the breast cancer susceptibility genes. FEBS Lett. 2007;581:4668–4674. doi: 10.1016/j.febslet.2007.08.061.
  34. Stenson PD, Mort M, Ball EV, Howells K, et al. The Human Gene Mutation Database: 2008 update. Genome Med. 2009;1:13. doi: 10.1186/gm13.
  35. Illingworth RS, Bird AP. CpG islands – "A rough guide". FEBS Lett. 2009;583:1713–1720. doi: 10.1016/j.febslet.2009.04.012.
  36. Clark SJ, Harrison J, Molloy PL. Sp1 binding is inhibited by (m)Cp(m)CpG methylation. Gene. 1997;195:67–71. doi: 10.1016/S0378-1119(97)00164-9.
  37. Inoue S, Oishi M. Effects of methylation of non-CpG sequence in the promoter region on the expression of human synaptotagmin XI (syt11). Gene. 2005;348:123–134. doi: 10.1016/j.gene.2004.12.044.

Articles from Human Genomics are provided here courtesy of BMC
