Abstract
Epigenetics represents a secondary inheritance system that has been poorly investigated in human biology. The objective of this study was to perform a comprehensive analysis of DNA methylation variation between and within the germlines of normal males. First, methylated cytosines were mapped using bisulphite modification–based sequencing in the promoter regions of the following disease genes: presenilins (PSEN1 and PSEN2), breast cancer (BRCA1 and BRCA2), myotonic dystrophy (DM1), and Huntington disease (HD). Major epigenetic variation was detected within samples, since the majority of sperm cells of the same individual exhibited unique DNA methylation profiles. In the interindividual analysis, 41 of 61 pairwise comparisons revealed distinct DNA methylation profiles (P=.036 to 6.8 × 10⁻¹⁴). Second, a microarray-based epigenetic profiling of the same sperm samples was performed using a 12,198-feature CpG island microarray. The microarray analysis has identified numerous DNA methylation–variable positions in the germ cell genome. The largest degree of variation was detected within the promoter CpG islands and pericentromeric satellites among the single-copy DNA fragments and repetitive elements, respectively. A number of genes, such as EED, CTNNA2, CALM1, CDH13, and STMN2, exhibited age-related DNA methylation changes. Finally, allele-specific methylation patterns in CDH13 were detected. This study provides evidence for significant epigenetic variability in human germ cells, which warrants further research to determine whether such epigenetic patterns can be efficiently transmitted across generations and what impact inherited epigenetic individuality may have on phenotypic outcomes in health and disease.
Phenotypic differences among individuals have traditionally been attributed to genetic (DNA sequence) variation and environmental differences. Over the past several decades, documentation of DNA sequence variants has been one of the top priorities in biomedical research. Numerous major international projects—from the sequencing of the Human Genome1,2 to the creation of SNP databases (dbSNP, now called “Entrez SNP”) and the Haplotype Map3 (HapMap)—have contributed significantly to the understanding of the position, degree, and structure of DNA polymorphisms. However, SNPs and other DNA sequence differences are relatively rare, and DNA sequences of two unrelated individuals are 99.5% identical. Furthermore, only a small fraction of these polymorphisms are functional—that is, polymorphisms that change amino acid sequence in the protein or have an impact on gene expression. Sequencing of the chimpanzee (Pan troglodytes) genome revealed 98.67% DNA sequence identity to the human genome, and, again, only a fraction of polymorphisms appear to result in structural or functional gene differences.4 Such findings raise the question: is this low DNA sequence variation across unrelated individuals and our closest related species sufficient to account for all major differences in physiological and psychological phenotypic outcomes?
One potential, although poorly investigated, source of phenotypic differences is epigenetic variation. By definition, “epigenetics” refers to the regulation of various genomic functions that are controlled by partially stable modifications of DNA and chromatin proteins.5 Epigenetic signals are critical to the proper functioning of the genome, as seen in Dnmt1-knockout mice that die in early embryogenesis,6 in several rare pediatric syndromes, and in cancer.7 One important feature of epigenetic regulation is partial epigenetic stability, or metastability. Epigenetic profiles in different cells of the same organism can be quite different, and developmental programs, environmental factors, or stochastic events in the nucleus of a cell can induce this variation. The first systematic effort to document DNA methylation differences and similarities across different genome regions, the Human Epigenome Project, was recently launched. The pilot study of the MHC locus on chromosome 6 investigated seven cell types (adipose, brain, breast, lung, liver, prostate, and muscle) across 32 individuals.8 In this study, which was not controlled for sex and age, around half (118/253) of the tested loci showed some interindividual variability in at least one tissue. The next phase of the Human Epigenome Project, which will be controlled for the above parameters, will provide coverage of >5,000 loci (∼3,000 genes) from chromosomes 6, 13, 20, and 22 across >20 tissue types.8 The Human Epigenome Project and other smaller-scale studies have investigated epigenetic variation primarily in somatic cells. However, there has been very little effort to document epigenetic variation in the germline, apart from imprinted genes9,10 and isolated cases of germ cell epimutations.11,12
There are several reasons to believe that the germline may contain substantial epigenetic variation. Epigenetic reprogramming during gametogenesis, fertilization, and embryogenesis involves dramatic chromatin remodeling.13 Methylation reprogramming during gametogenesis involves the erasure and reestablishment of methylation of imprinted genes and other nonimprinted genes and, then, a second wave of reprogramming during fertilization (paternal) and embryogenesis (maternal).13 This process is thought to (1) ensure that both gametes acquire the appropriate sex-specific epigenetic states and establish the epigenetic states required for early embryonic development and toti- or pluripotency and (2) allow the erasure of epimutations that adult germ cells may have inherited or developed during their lifetime.14,15 In parallel with DNA methylation, chromatin changes during spermatogenesis involve the compaction of the haploid genome by the replacement of the core histones via transition proteins to the much smaller basic protamines 1 and 2.16 However, a number of testis-specific histones and histone variants—such as TSH2B, histones H2A, H3, and H4, variants of H2B, and CENP-A—are present, to some extent, in the mature spermatozoa.17–19 How these remaining histones are arranged and to what extent interindividual variability in histone placement and modification can affect development and phenotype are subjects yet to be investigated. Despite dramatic changes, not all epigenetic signals are erased in the germline, and recent studies in mice have suggested that this phenomenon could underlie epigenetic inheritance.20,21 Therefore, there is ample opportunity during these phases of reprogramming to either maintain or generate substantial epigenetic variability in the germ cells.
Although there is evidence that some individual loci exhibit partial epigenetic stability during meiosis in mice and in other organisms, to further understand the degree, mechanisms, and importance of epigenetic inheritance across generations in humans, three main questions need to be addressed in the following order. (1) Is there any evidence for epigenetic variation in the germ cells? (2) To what extent is the epigenetic variation meiotically stable? (3) What is the impact of epigenetic variation in germ cells on phenotypic differences? In this study, we attempted to answer the first question by estimating the intra- and interindividual epigenetic variation detectable in the mature sperm of healthy individuals. For this goal, we used two different laboratory strategies. The first approach focused on promoter regions of several disease-related genes—such as PSEN1 (MIM 104311), PSEN2 (MIM 600759), BRCA1 (MIM 113705), BRCA2 (MIM 600185), DM1 (MIM 160900), and HD (MIM 143100)—in healthy individuals, with the use of bisulphite modification–based mapping of methylated cytosines and measured epigenetic “distances” between individuals. The second strategy was to perform a microarray-based epigenetic profiling of sperm DNA with the use of a CpG island microarray, which provides genomewide information on methylation variability across different unique and repetitive DNA sequences. Several loci of interest identified in the microarray experiments were further investigated using methylation-sensitive single-nucleotide primer extension (MS-SNuPE) reaction.
Material and Methods
Samples
Two sperm sample sets were used in this study. The first sample set was received from the Fairfax Cryobank, Genetics & IVF Institute, in Fairfax, VA, and consisted of 25 sperm samples collected from healthy white sperm donors at an average age of 27 years (range 22–35 years). The second set of sperm samples was collected at the Centre for Addiction and Mental Health in Toronto from 21 healthy white individuals at an average age of 39 years (range 24–56 years). This study was approved by an institutional ethics board, and informed consent was obtained from all participants. Some aspects of sperm DNA data analysis required a nonsperm tissue of reference; for this purpose, postmortem brain tissues were used (donated by The Stanley Medical Research Institute’s brain collection, courtesy of Drs. M. B. Knable, E. F. Torrey, M. J. Webster, S. Weis, and R. H. Yolken). These brain samples were from 22 white males who had an average age at death of 46 years (range 31–66 years). Extraction of DNA was performed using standard salt and phenol/chloroform extraction.
Bisulphite Modification–Based Mapping of Methylated Cytosines
Bisulphite modification–based mapping of methylated cytosines was performed as described elsewhere.22 In brief, genomic DNA (700 ng) was digested with BglII (Fermentas) for 1 h at 37°C, was denatured at 100°C for 5 min, was chilled on ice, and was then incubated at 50°C for 15 min in 0.3 M NaOH. The DNA was then mixed with 2% low–melting point agarose (SeaPlaque Agarose, FMC) and was dropped into ice-cold mineral oil to form seven beads of ∼10 μl, and, finally, the beads were placed into a freshly prepared solution containing 2.5 M sodium bisulphite (pH 5.0) plus 1 mM hydroquinone (both from Sigma). The beads were then incubated on ice for 30 min, followed by incubation at 50°C for 3.5 h. The beads were washed in four changes of Tris-EDTA (TE) (pH 8.0) for 1 h and then were desulphonated in 0.2 M NaOH for 30 min. After desulphonation, the beads were washed a second time in three changes of TE for 30 min. Before amplification, the beads were washed in H2O for 30 min. PCR amplification of the target sequences consisted of 5 μl of agarose beads containing the bisulphite-treated DNA, 2 mM MgCl2, 0.2 mM deoxynucleotide triphosphates (dNTPs), 0.4 μM each of forward and reverse primer, 250 ng/ml BSA, and 2.5 U Taq polymerase (New England Biolabs) in 1× PCR buffer, to a total volume of 50 μl. PCR was performed using either a seminested or a fully nested approach, with the first PCR consisting of one cycle at 97°C for 4 min, 53°C for 2 min, and 72°C for 2 min, followed by 24 cycles at 94°C for 45 s, 53°C for 1 min, and 72°C for 1 min. The second PCR used 5 μl of the first PCR as a template and consisted of one cycle at 97°C for 2 min, 53°C for 2 min, and 72°C for 1 min, followed by 24 cycles at 94°C for 45 s, 55°C for 45 s, and 72°C for 1 min. 
CpG islands in the 5′ promoter sequences were analyzed in six genes: PSEN1 (Entrez GeneID 5663; chromosome 14: 72672525–72673163), PSEN2 (GeneID 5664; chromosome 1: 223365273–223365990), BRCA1 (GeneID 672; chromosome 17: 38530561–38531181), BRCA2 (GeneID 675; chromosome 13: 31787367–31788153), HD (GeneID 3064; chromosome 4: 3113281–3113816), and DM1 (GeneID 1760; chromosome 19: 50964670–50965254). An intronic CpG island within the CDH13 gene (GeneID 1012; chromosome 16: 81218597–81218988) was also analyzed by bisulphite genomic sequencing. Nucleotide positions given are from the May 2004 Genome (hg17) version (UCSC Genome Browser). The primers used for amplification of bisulphite-modified DNA fragments are available in table 1.
Table 1. Primer Sequences
| Gene and Primer Name | Method^a | Sequence (5′→3′) |
| --- | --- | --- |
| BRCA1: | ||
| prBRCA1_for | Bis-PCR | GAGTAGAGGTTAGAGGGTAGGT |
| prBRCA1_rev1 | Bis-PCR | CAAAACATATTCCAATTCCTATCAC |
| prBRCA1_rev2 | Bis-PCR | TCAATACCCCCAACCTAATCCTC |
| BRCA2: | ||
| prBRCA2_for | Bis-PCR | GGGGGATTTGGAGTAGGTATAGG |
| prBRCA2_rev1 | Bis-PCR | CACTTCCCCAAAACAACAATATTCC |
| prBRCA2_rev2 | Bis-PCR | AACCCACTACCACCACCACTAACC |
| PSEN1: | ||
| prPSEN1for_1 | Bis-PCR | GATTTATTGAGTGGTGGGAGAG |
| prPSEN1for_2 | Bis-PCR | TGATAGGTGTTAAATTTAGGATGG |
| prPSEN1rev | Bis-PCR | CCCCTCATCTTTTAAAACACC |
| PSEN2: | ||
| prPSEN2_for | Bis-PCR | GGGGTGGAGAGAGGAGAGTGT |
| prPSEN2_rev1 | Bis-PCR | AAATACAATTACTTCACTCAACACC |
| prPSEN2_rev2 | Bis-PCR | AACTCTATAACCTCAATTTCTTCATC |
| HD: | ||
| prHDfor_1 | Bis-PCR | GGTTTTTTGGTTAGTTATTGGTAGAG |
| prHDfor_2 | Bis-PCR | GTAGGTTAGGGTTGTTAATTATGTTGG |
| prHDrev_1 | Bis-PCR | CAATACAACAACTCCTCAACCACAACC |
| DM1: | ||
| prDM1for_1 | Bis-PCR | GTGGATGGGTAAATTGTAGGTTTGG |
| prDM1rev_1 | Bis-PCR | AACATTCCCAACTACAAAAACCCTTC |
| prDM1rev_2 | Bis-PCR | CTTTTCCTCCCCCAACCCTAATTC |
| CDH13: | ||
| CDH13 44g5F1 | Bis-PCR | ATAAAATTTAAGTTAGGATGGGAGATATAG |
| CDH13 44g5R1 | Bis-PCR | ATAAATAAACCAAAACAATACTTTACCTA |
| CDH13 44g5F2 | Bis-PCR | TTTGGTATTTAGTAGTTGTTTAATAAAGTT |
| CDH13 44g5R2 | Bis-PCR | TACAAAATATCATACTCTAATCACTAAACC |
| ARH1: | ||
| 22_F_7F1 | Bis-PCR | AGTGGATATTGTGTTAATTTTTAAG |
| 22_F_7R1 | Bis-PCR | ATATCCTATCTTTTCAATATCATTT |
| 22_F_7F2 | Bis-PCR | TATTTAAAGTGTAGTTGTAGGATTTGTTAG |
| 22_F_7R2 | Bis-PCR | ACTACTTTTAAACTTTCATCTAAAATCAAT |
| NELL2: | ||
| 44_A_4F1 | Bis-PCR | TTGTTTTATTATAATGTTGATGAGA |
| 44_A_4R1 | Bis-PCR | TTCTTAAACTCTACATTACCTCTATTT |
| 44_A_4F2 | Bis-PCR | GGGGTTATAGTTTGTGTGGAGT |
| 44_A_4R2 | Bis-PCR | CAATATATAATAAAATACAATAAAACCTCCTC |
| 44_A_4F3 | Bis-PCR | TTATTTATTATGGGTTTTTGTTTG |
| 44_A_4R3 | Bis-PCR | CTACTCTTTCTTATAAAAATAACTAAATTC |
| 44_A_4F4 | Bis-PCR | GTGTAGTGGTGTAATTATGTTTTATTGTAG |
| 44_A_4R4 | Bis-PCR | TCTAAACATACTAACTCATACCTATAATCC |
| RHOQ: | ||
| 50_H_2F1 | Bis-PCR | ATTAGATTTGGAGTTTGAGAGTTTAG |
| 50_H_2R1 | Bis-PCR | AAAAACAAAAACCCTAACTTATTACAT |
| 50_H_2F2 | Bis-PCR | GATAGTGAGGAATGGGAATATAATAG |
| 50_H_2R2 | Bis-PCR | CCATAACACAAACTTTAATCATTTACTA |
| SCAM: | ||
| 31_F_12F1 | Bis-PCR | GTATAGAGGAAAGGAATGTTATTTTTATT |
| 31_F_12R1 | Bis-PCR | TAATACTAAAACTCTAAATAATCACCCAAA |
| 31_F_12F2 | Bis-PCR | TTTTGTGTTATTAGGTTGGAATTT |
| 31_F_12R2 | Bis-PCR | AATACTTTCCTCATATACCTCCTCTAC |
| PDE: | ||
| 9_H_3F1 | Bis-PCR | GGAGATGAAATGGGTAATATTTTT |
| 9_H_3R1 | Bis-PCR | ATATTCTTATAACTACCATCAACCAAAAC |
| 9_H_3F2 | Bis-PCR | GAGGAGAGTTGGTTAGTTAAGATTT |
| 9_H_3R2 | Bis-PCR | AAACTCAACTTAAATTCCAAAAATAC |
| MKL: | ||
| 22_A_4F1 | Bis-PCR | TTAGTGTTTATTTTGATTGTAGAGTTG |
| 22_A_4R1 | Bis-PCR | ATTCTAACAAAATTAAAAACCACTATTC |
| 22_A_4F2 | Bis-PCR | GTTTATGGAGTTTTTGTTGTGTG |
| 22_A_4R2 | Bis-PCR | CTAAACTCCAATATTCCACTTCATTA |
| NEIL: | ||
| 43_B_5F1 | Bis-PCR | GTAGAAGAGGATTAGGTATTTAATTGGTTA |
| 43_B_5R1 | Bis-PCR | AACTATAAACCTCTAACACCTCCTAACT |
| 43_B_5F2 | Bis-PCR | AAAGTTTGATGAGGGGAAATAGTA |
| 43_B_5R2 | Bis-PCR | ATTCTAACCCACTACTACCAACTTATT |
| DSCAM: | ||
| 103_C_9F1 | Bis-PCR | ATATTTTATTGATGATAGAAGAGAAGGTAG |
| 103_C_9R1 | Bis-PCR | AACCTACTAATACAATACAAAATATAACCA |
| 103_C_9F2 | Bis-PCR | GGAGATGTAGGTAATATGTGTATTTAGTT |
| 103_C_9R2 | Bis-PCR | ACATTAAAACACTTTCCTAAAATAACAA |
| 103_C_9F3 | Bis-PCR | AAAGTTTAGTTGGATTTATAGTTTT |
| 103_C_9R3 | Bis-PCR | TTACTATATTAATCTATTTCCACACTACTA |
| 103_C_9F4 | Bis-PCR | TAGAATTATGGTGGGAGGTAAAAG |
| 103_C_9R4 | Bis-PCR | TAAATTAAACTTACAATTCCACATAACTAA |
| OLR1: | ||
| 53_H_11F1 | Bis-PCR | TTAATTTTTGTATTTTTAGTAGAGATAGGG |
| 53_H_11R1 | Bis-PCR | ATTACAATAAACTAAAATCACACCACTAC |
| 53_H_11F2 | Bis-PCR | TTTTTAAAGTGTTAGGATTATAGG |
| 53_H_11R2 | Bis-PCR | AAAACTTAAATCCACCAAAAA |
| 53_H_11F3 | Bis-PCR | TATTTGATTTTAATTTTTGGAGATG |
| 53_H_11R3 | Bis-PCR | ATAAATAAACTTCTTAAACTCCTCATATTT |
| 53_H_11F4 | Bis-PCR | TTTAGGTTGGAATGTAGTGGTTT |
| 53_H_11R4 | Bis-PCR | ACCAATCTACCTCCATTAACTCTATT |
| AHR: | ||
| 55_F_9F1 | Bis-PCR | GTTTTGTTAGAATGTTTTAAAGTTGTTT |
| 55_F_9R1 | Bis-PCR | CAAATAACTCCCACTTTTAATAAATATC |
| 55_F_9F2 | Bis-PCR | GATGTGTATAGGTATTTTTATATTTATTTTTAGG |
| 55_F_9R2 | Bis-PCR | ATTCTATCAATTACCAATATCCACATACT |
| 55_F_9F3 | Bis-PCR | GGATATAAGTTATGGAAATAATTAGAAAAT |
| 55_F_9R3 | Bis-PCR | CTAATCAACACAAACAATATATACATAAAA |
| 55_F_9F4 | Bis-PCR | AGTATTATAGGAATTTGAAGTAGAGAAAAA |
| 55_F_9R4 | Bis-PCR | TTTACACAATATTTACTTCAATTATTTACC |
| CDH13: | ||
| 44_G_5F1 | Bis-PCR | ATAAAATTTAAGTTAGGATGGGAGATATAG |
| 44_G_5R1 | Bis-PCR | ATAAATAAACCAAAACAATACTTTACCTA |
| 44_G_5F2 | Bis-PCR | TTTGGTATTTAGTAGTTGTTTAATAAAGTT |
| 44_G_5R2 | Bis-PCR | TACAAAATATCATACTCTAATCACTAAACC |
| 44_G_5F3 | Bis-PCR | GGATAGTTTTGATGTTGTATAATAAATAGT |
| 44_G_5R3 | Bis-PCR | ATAAATTTAACCAAATCTATATCTCAAAAC |
| 44_G_5F4 | Bis-PCR | TAATTTTAAGGATAGTGATTATGTAATTGG |
| 44_G_5R4 | Bis-PCR | ACTCCCATATCCCACCAAAA |
| FBN1: | ||
| 52_A_2F1 | Bis-PCR | ATTTTATGATAGATAGGATATAGGTATTGA |
| 52_A_2R1 | Bis-PCR | TACTAAATATATACAAATAAACATCCTTCC |
| 52_A_2F2 | Bis-PCR | GTAGTAGGGTAGAAATTTATAGTTAGGTTT |
| 52_A_2R2 | Bis-PCR | CCACTTTTATCCACCTATTTTCTAAT |
| 52_A_2F3 | Bis-PCR | TTTTTATTATTTTTAGATTGATGGTAGG |
| 52_A_2R3 | Bis-PCR | TAATATAAAACTACCTTTCAAATATCACAT |
| 52_A_2F4 | Bis-PCR | TGATTTAAATAATGAGAATAGATAGGTTT |
| 52_A_2R4 | Bis-PCR | TACCACTACAAACTTAATACTTTAATAACC |
| ARH1: | ||
| 22_F_7Sp1 | MS-SNuPE | (GACT)X6GYGGGTGGTTTTTGYGTAATYG |
| 22_F_7Sp2 | MS-SNuPE | GACTTTTAGGGATAATTYGTTTATAAATTTTTATTG |
| 22_F_7Sp5 | MS-SNuPE | AGYGGGGTGYGGGGG |
| 22_F_7Sp6 | MS-SNuPE | GGGTAATAGGTATAGATTTYGTTT |
| 22_F_7Sp3 | MS-SNuPE | GACTTAYGAATTAAGTAGTTTAGAAGATAAATG |
| 22_F_7Sp4 | MS-SNuPE | (GACT)X3TTTATTTTTTYGYGGTTTTATATTTYGATTTG |
| 22_F_7Sp7 | MS-SNuPE | GGTTGGTTTAAYGTAGAGYGG |
| 22_F_7Sp8 | MS-SNuPE | (GACT)X5AATAAGTTTTAGGTAAAATTTTGTTAATAAAAAT |
| NELL2: | ||
| 44_A_4Sp1 | MS-SNuPE | GTTTTTGTTTTGTTTTTTGGGTTGTT |
| 44_A_4Sp2 | MS-SNuPE | GACTGACTGTAGGGTTTTATTGTGTTATTATGTT |
| RHOQ: | ||
| 50_H_2Sp4 | MS-SNuPE | ATTTGGTTAGATGGGTGTGTTT |
| 50_H_2Sp3 | MS-SNuPE | (GACT)X3GGTGTTAGYGTAGAGTGTATTTT |
| 50_H_2Sp2 | MS-SNuPE | (GACT)X6AGGGTGATGGTTTAGTTGATTTT |
| 50_H_2Sp5 | MS-SNuPE | TAGAGGGGTAGGGGTTGT |
| 50_H_2Sp7 | MS-SNuPE | GACTGAGTTTAGGTATATTTGTYGGGTT |
| 50_H_2Sp1 | MS-SNuPE | (GACT)X4GTAGTTGGAAGTTTTTGGGATAAT |
| 50_H_2Sp6 | MS-SNuPE | TTTTAGAGTATTAGTGTGTATTAGTTTTT |
| 50_H_2Sp8 | MS-SNuPE | GACTGACTTGGTTTAATAYGTTGTATTTTTTTTAGTTAT |
| SCAM: | ||
| 31_F_12Sp5 | MS-SNuPE | GTTGTGGTTAYGAGGGGG |
| 31_F_12Sp1 | MS-SNuPE | GTTTTTTTTTAGGTTTTAAAGTGGATAG |
| 31_F_12Sp6 | MS-SNuPE | GACTGACTTGTGTTTATTTATTGGAAGATGTTGT |
| 31_F_12Sp4 | MS-SNuPE | GTTGGGTTTYGAGGTTGTGT |
| 31_F_12Sp3 | MS-SNuPE | GACTGACTTTATAGGYGTTTTTGGTTAGGAG |
| 31_F_12Sp2 | MS-SNuPE | (GACT)X3TGGTYGTTTTATTTTYGGTTTTATAAGAT |
| PDE: | ||
| 9_H_3Sp8 | MS-SNuPE | YGGYGGGAGYGATGGAG |
| 9_H_3Sp4 | MS-SNuPE | GACTGAGATTTGAATGAGTTAAAGTYGG |
| 9_H_3Sp5 | MS-SNuPE | (GACT)X4ATAGTAGGTYGTTGATYGGTYG |
| 9_H_3Sp3 | MS-SNuPE | (GACT)X4TYGGAAGTATTTTATTTTTTTTTTTYGTTAGT |
| 9_H_3Sp7 | MS-SNuPE | GGYGGTGGAGAAGTTGAGT |
| 9_H_3Sp2 | MS-SNuPE | GACTGAGTATTATAGATATGYGTGTTTAYG |
| 9_H_3Sp6 | MS-SNuPE | (GACT)X5AGGTTTTTAGGYGTTYGYGTYG |
| 9_H_3Sp1 | MS-SNuPE | (GACT)X5AGYGGAATAYGTGATATTATTTTATTTATTTT |
| MKL: | ||
| 22_A_4Sp3 | MS-SNuPE | GGYGGTGGGAGGGGAT |
| 22_A_4Sp5 | MS-SNuPE | GACTGACTGAGGTGGGTTGAGAGTAG |
| 22_A_4Sp2 | MS-SNuPE | (GACT)X4TGTATAGAGAGGTGGAGYGTT |
| 22_A_4Sp4 | MS-SNuPE | AGGGGGTGTTTGGGAG |
| 22_A_4Sp1 | MS-SNuPE | GACTTTGTGTGTYGGGTATTTAAGTTG |
| NEIL: | ||
| 43_B_5Sp6 | MS-SNuPE | GTYGGYGGTTTTGGAGGG |
| 43_B_5Sp5 | MS-SNuPE | GATTAGAGATAATTGTTTGTAGTTATGT |
| 43_B_5Sp3 | MS-SNuPE | (GACT)X3TGGGAGTTTTTTTTGTTYGAATAGTT |
| 43_B_5Sp1 | MS-SNuPE | TYGGTATTAGYGAGTGTAAGATG |
| 43_B_5Sp2 | MS-SNuPE | (GACT)X3AATAGGTGGTTAGGTAGTTGTTT |
| 43_B_5Sp4 | MS-SNuPE | (GACT)X4TGTATATTATTTAGATTGTTTTATGTAGG |
| DSCAM: | ||
| 103_C_9Sp1 | MS-SNuPE | TAGTTTTTTATTGAAGAGTGTTTAATTATTTT |
| 103_C_9Sp2 | MS-SNuPE | GGTAAAAGGTATTTTTTATATGGTAG |
| 103_C_9Sp3 | MS-SNuPE | GTTTTTGTTTTTATTTTATTTTTATTTTTTTTTTGT |
| OLR1: | ||
| 53_H_11Sp1 | MS-SNuPE | TGTTAGGATTATAGGTATGAGTTAT |
| 53_H_11Sp2 | MS-SNuPE | GGTTGGTTTTAAATTTTTTATTTTAAGTTATT |
| 53_H_11Sp3 | MS-SNuPE | TTGGGATTATAGGTGTGAGTTAT |
| 53_H_11SpN4 | MS-SNuPE | TAATTCTAATTAAAAATTTAAAACTTCTTACC |
| 53_H_11SpN5 | MS-SNuPE | AACATAAAAATTACTTAAACCCTAAAAAC |
| 53_H_11SpN6 | MS-SNuPE | CCTCTCAAAAAAAAAAAAAAAAAATTAACC |
| 53_H_11Sp7 | MS-SNuPE | GACTGACTGTGGTATATTTTYGGTTTATTGTAATTTT |
| 53_H_11Sp12 | MS-SNuPE | GATTATAGGTGTGAGTTATTGTGT |
| 53_H_11SpN8 | MS-SNuPE | AAACAAAAAAATCCCTCRAACCC |
| 53_H_11SpN9 | MS-SNuPE | CTAAAAATACAAAAATTAACCAAATATAATAAC |
| 53_H_11Sp10 | MS-SNuPE | GTTTTGAATTTTTGATTTYGGGTGATT |
| 53_H_11SpN11 | MS-SNuPE | CCCAACACTTTAAAAAACCRAAAC |
| AHR: | ||
| 55_F_9Sp1 | MS-SNuPE | GACTAGGTATTATATTTTGTAAAGTGGTTTTTT |
| 55_F_9Sp2 | MS-SNuPE | TGTATTTATAGTTTGGGAGGAAG |
| 55_F_9Sp3 | MS-SNuPE | TAGGAAAAGTGATAAGTTTATTTGG |
| CDH13: | ||
| 44_G_5Sp1 | MS-SNuPE | GACTGACTGYGTGATTTYGGTTTATTGTAAGTTT |
| 44_G_5Sp3 | MS-SNuPE | TTTYGAGTAGTTGGGATTATAGG |
| 44_G_5SpN2 | MS-SNuPE | AAACTAAAACAAAAAAATAACRTAAACCC |
| 44_G_5SpN4 | MS-SNuPE | ATCTCTACTAAAAATACAAAAAATTAACC |
| 44_G_5SpN5 | MS-SNuPE | ATAATCCTAACACTTTAAAAAACTAAAAC |
| 44_G_5Sp6 | MS-SNuPE | GTTAGGATTATAGGYGTGAGTTAT |
| 44_G_5SpN7 | MS-SNuPE | ATTTTAACTTTTTTAAAAAAAAAAAAAAAAACCC |
| 44_G_5Sp8 | MS-SNuPE | TGGTATTGGAGTTTGTGGTAG |
| FBN1: | ||
| 52_A_2Sp1 | MS-SNuPE | TTGTTGGATTGTAAAGGTTATTTATG |
| 52_A_2Sp2 | MS-SNuPE | TGAATTTTTGTAATGTAGAGTTTGTATT |
^a Bis-PCR = treated with sodium bisulfite and PCR amplified.
PCR products were electrophoresed on an agarose gel. DNA fragments were excised, were cleaned using Qiagen Gel Extraction Kit (Qiagen), and were cloned into the pGEM-T vector (Promega). Thirty clones from each PCR product (locus/individual) were sequenced. To evaluate the degree of intraindividual variation, we sequenced an additional 30 clones from separate bisulphite reactions in five cases: two in BRCA1, one in BRCA2, and two in PSEN2. A total of 1,020 clones were analyzed, which required >1,500 sequencing reactions, since some longer fragments had to be sequenced from both ends.
Analyses of DNA Methylation Variation in Bisulphite Modification–Based Experiments
The degree of epigenetic diversity within and across individuals was evaluated using the concept of epigenetic “distance.”23 Each of the 30 sequenced clones was binary coded, with “0” for an unmethylated cytosine and “1” for a methylated cytosine. Each clone was, therefore, represented by a row vector of n 0 and 1, where n is the number of cytosines in the tested region.
Estimation of intraindividual variation
Unique methylation profiles were identified for each set of 30 clones. For example, a set of clones 0101, 0101, 0111, and 1100 exhibits three types of methylation profiles (1/2, 3, and 4), and, therefore, the proportion of unique methylation profiles is 3/4. This calculation was performed for every set of 30 clones, and then the mean and SD of the proportion of unique clones across individuals were calculated for each locus. In the second round of analysis, because of possible imperfect C→T conversion with bisulphite treatment, two clones different by a single position were treated as identical. With use of the above example, profiles 0101 and 0111 are now treated as identical, and the degree of uniqueness is 2/4. In the final analysis, the tolerance was increased to two differences—that is, the clones that exhibited two or fewer differences were treated as identical.
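The uniqueness calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and the greedy grouping (each clone is compared with the first clone of every existing group, in input order) is an assumption, since the text does not say how chains of near-identical clones were resolved.

```python
def hamming(a, b):
    """Number of cytosine positions at which two binary-coded clones differ."""
    return sum(x != y for x, y in zip(a, b))


def unique_fraction(clones, tolerance=0):
    """Proportion of distinct methylation profiles among the clones.

    Assumption: clones are grouped greedily against the first clone of each
    existing group; two clones count as identical when they differ at
    `tolerance` or fewer positions.
    """
    representatives = []
    for clone in clones:
        if not any(hamming(clone, rep) <= tolerance for rep in representatives):
            representatives.append(clone)
    return len(representatives) / len(clones)


# Worked example from the text: three profile types among four clones.
clones = ["0101", "0101", "0111", "1100"]
print(unique_fraction(clones, tolerance=0))  # 0.75
print(unique_fraction(clones, tolerance=1))  # 0.5
```

With the four example clones, the sketch reproduces the proportions 3/4 and 2/4 given in the text for zero and one tolerated difference.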
Comparison of DNA methylation distances across individuals
The average methylation-intensity vector for each locus/individual was calculated by dividing the sum of the methylated cytosines by 30 for each different cytosine position. The degree of epigenetic dissimilarity was measured by Euclidean distance, by use of the following equation:
$$d_{12} = \lVert m_1 - m_2 \rVert = \sqrt{\sum_{i=1}^{n} \left(m_{1i} - m_{2i}\right)^2}$$
where m1 is the average methylation vector of individual 1, m2 is the average methylation vector of individual 2, and d12 is the Euclidean DNA methylation distance between individuals 1 and 2. The larger the distance, the more dissimilar the two individuals’ methylation profiles are to each other. With this metric, we calculated the distances between all possible pairs of individuals for each promoter locus of BRCA1, BRCA2, HD, DM1, PSEN1, and PSEN2. To test statistical significance of methylation differences, we performed the following analysis. For each locus, all clones from all individuals were pooled together, and two sets of 30 randomly selected clones from the pool formed the methylation profiles of two pseudo-individuals. The epigenetic distance between the two pseudo-individuals was then calculated with the same procedure as above, and this procedure was repeated 100,000 times to generate 100,000 distances, the density distribution of which was plotted, and the mean (±2 SD) was calculated. The (one-tailed) P value of a distance was then obtained by finding the area under the distribution curve, from the left up to the calculated distance. An epigenetic distance in two real individuals with P<.05 (i.e., >2 SD) indicates that difference in the DNA methylation of two individuals is statistically significant.
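The distance and the resampling test can be sketched as follows. This is an illustrative sketch under stated assumptions, not the published implementation: the function names are hypothetical, the two pseudo-individuals are drawn from the pool without replacement and without overlap, and the P value is computed as the fraction of resampled distances at least as large as the observed one, which is one reading of the one-tailed test described above. The default number of iterations is reduced from the 100,000 used in the study to keep the example fast.

```python
import math
import random


def mean_methylation(clones):
    """Average methylation per cytosine position across binary-coded clones."""
    n = len(clones)
    return [sum(int(c[i]) for c in clones) / n for i in range(len(clones[0]))]


def methylation_distance(clones_1, clones_2):
    """Euclidean distance between the average methylation vectors."""
    m1, m2 = mean_methylation(clones_1), mean_methylation(clones_2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(m1, m2)))


def resampling_p_value(clones_1, clones_2, pool, n_iter=10_000, seed=0):
    """Fraction of pseudo-individual pairs at least as distant as the real pair.

    Assumptions: both individuals have the same number of clones, and the pool
    holds at least twice that many clones (all clones from all individuals).
    """
    rng = random.Random(seed)
    size = len(clones_1)
    observed = methylation_distance(clones_1, clones_2)
    hits = 0
    for _ in range(n_iter):
        draw = rng.sample(pool, 2 * size)  # two non-overlapping pseudo-individuals
        if methylation_distance(draw[:size], draw[size:]) >= observed:
            hits += 1
    return hits / n_iter
```

In the study, `pool` would hold every clone sequenced at the locus, and each individual would contribute 30 clones.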
Microarray-Based DNA Methylation Analysis
Microarrays
Genomewide epigenetic profiling was performed using the 12,192 CpG island microarrays24 purchased from the University Health Network Microarray Facility in Toronto.
Enrichment of unmethylated DNA
We used a technology developed in our laboratory for enrichment of the unmethylated DNA fraction and for epigenetic profiling, described in detail elsewhere.25 The general principle of the DNA methylation profiling consists of interrogation of the unmethylated fraction of genomic DNA on the microarray. Intensity of hybridization inversely correlates with the DNA methylation status at the genomic locus homologous to a specific DNA fragment on the array. In brief, methylation-sensitive restriction enzymes were used to digest 1 μg of genomic DNA, and two enzyme scenarios were used in this project. First, sperm DNA samples from 25 individuals were analyzed using methylation-sensitive enzymes HpaII, Hin6I, and AciI (designated “sperm DNA–HHA array” set). This enzyme “cocktail” strategy, however, is not ideal for GC-rich regions, such as CpG islands, since these three enzymes would generate DNA fragments too small for efficient amplification and hybridization. Therefore, a single-digestion approach with HpaII alone was used on a second set of sperm DNA samples from 21 individuals (designated “sperm DNA–HpaII” array set). DNA adaptors (annealing products of two primers, U-CG1a [5′-CGTGGAGACTGACTACCAGAT-3′] and U-CG1b [5′-AGTTACATCTGGTAGTCAGTCTCCA-3′]) were ligated to the restricted DNA fragments, followed by treatment with McrBC (New England Biolabs), which cleaves the fragments containing two or more methylated cytosines, thereby further enriching the unmethylated fraction.
Adaptor-PCR amplification of the ligated products, with the use of primers complementary to the adaptor sequence, consisted of 250 ng of ligated DNA, 2.5 mM MgCl2, 0.2 mM aminoallyl-dNTPs (15 mM aminoallyl–2′-deoxyuridine 5′-triphosphate, 10 mM 2′-deoxythymidine 5′-triphosphate, and 25 mM each of 2′-deoxycytidine 5′-triphosphate, 2′-deoxyguanosine 5′-triphosphate, and 2′-deoxyadenosine 5′-triphosphate), 200 pmol primer U-CG1b, and 5 U Taq polymerase (New England Biolabs) in 1× PCR reaction buffer (Sigma), to a final volume of 100 μl. PCR conditions were adjusted so that fragments <1.5 kb (i.e., short, digested, and, therefore, unmethylated) amplified preferentially. Cycling consisted of an initial cycle at 72°C for 5 min and 95°C for 1 min, 25 cycles at 95°C for 40 s and 68°C for 2 min 30 s, and a final extension at 72°C for 5 min. Equal amounts of amplicons from each sample were mixed to form the pooled control, which was labeled with Cy3 and was cohybridized against each individual amplicon labeled with Cy5. Hybridization was performed at 42°C with the use of standard procedure.25
For comparison with the sperm DNA methylation profiles, DNA samples from postmortem brains of 22 individuals who did not have any known brain disease were subjected to the same microarray-based DNA methylation profiling that used a single-digestion approach with HpaII (designated “brain DNA–HpaII” array set).
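The reason the three-enzyme cocktail yields shorter fragments than HpaII alone in GC-rich sequence can be illustrated with a small in silico digestion. This sketch is not part of the published protocol. It assumes a fully unmethylated sequence, uses the recognition sites CCGG (HpaII), GCGC (Hin6I), and CCGC/GCGG (AciI, listed in both orientations because the site is asymmetric), and simplifies the cut position to one base into each site; the function names and the toy sequence are hypothetical.

```python
SITES = {
    "HpaII": ["CCGG"],
    "Hin6I": ["GCGC"],
    "AciI": ["CCGC", "GCGG"],  # asymmetric site, so both orientations are listed
}


def cut_positions(sequence, enzymes):
    """Sorted cut coordinates for the chosen enzymes on an unmethylated sequence."""
    cuts = set()
    for enzyme in enzymes:
        for site in SITES[enzyme]:
            start = sequence.find(site)
            while start != -1:
                cuts.add(start + 1)  # simplification: cut after the first base of the site
                start = sequence.find(site, start + 1)  # allow overlapping sites
    return sorted(cuts)


def fragment_lengths(sequence, enzymes):
    """Lengths of the fragments left after complete digestion."""
    bounds = [0] + cut_positions(sequence, enzymes) + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


toy = "AACCGGTTGCGCAA"  # hypothetical sequence, not a real locus
print(fragment_lengths(toy, ["HpaII"]))                   # [3, 11]
print(fragment_lengths(toy, ["HpaII", "Hin6I", "AciI"]))  # [3, 6, 5]
```

Applied to a real CpG island sequence, the same comparison would show the cocktail producing many more, and much shorter, fragments than the single-enzyme digest.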
Microarray data processing and analysis
Methylation differences between the individuals and the pooled control were analyzed by the ratio of hybridization intensities of Cy5 (individual samples) over Cy3 (pooled control). As we have learned from our previous analyses of arrays used for DNA methylation analysis, such ratios show normal distribution; therefore, the data can be treated similarly to those in classical microarray experiments. The array data were normalized in two steps—first, in a global intensity normalization, to adjust the Cy5:Cy3 ratio to 1:1 across the entire array, followed by a block-by-block LOWESS normalization. The data were trimmed to remove spots with ambiguous genome locations, including spots with no sequence or annotation (647 spots), spots with >30% repetitive elements (2,706 spots), and translocation hotspots (633 spots). The spots for which the microarray clones represented identical sequences were averaged, which resulted in ∼4,970 unique loci. Coefficient of variation (CV) was calculated for each remaining spot by dividing the SD in Cy5/Cy3 by the mean of Cy5/Cy3 across all individuals. The sperm DNA–HHA experiments were performed in duplicate, and the data were averaged ratios. The sperm DNA–HpaII and the brain DNA–HpaII data sets consisted of one array per individual, because we opted for increased biological replicates rather than for increased technical replicates for the number of microarrays available.
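The per-spot coefficient of variation can be sketched as below. The function names and the example spot identifiers are hypothetical, and the use of the sample SD (n − 1 denominator) is an assumption, since the text does not state which estimator was used.

```python
import statistics


def coefficient_of_variation(ratios):
    """SD of the Cy5/Cy3 ratios across individuals divided by their mean."""
    return statistics.stdev(ratios) / statistics.mean(ratios)


def rank_spots_by_cv(ratio_table):
    """Order spots from most to least variable across individuals.

    `ratio_table` maps a spot identifier to its normalized Cy5/Cy3 ratios,
    one value per individual.
    """
    cvs = {spot: coefficient_of_variation(r) for spot, r in ratio_table.items()}
    return sorted(cvs.items(), key=lambda item: item[1], reverse=True)


# Hypothetical ratios for two spots across three individuals.
table = {"spot_A": [1.0, 1.0, 1.0], "spot_B": [0.5, 1.0, 1.5]}
print(rank_spots_by_cv(table))  # spot_B (CV 0.5) ranks above spot_A (CV 0.0)
```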
Age-covariate analysis
For the CpG island microarray experiment, the age-covariate analysis was performed using a correlation coefficient between two series of quantities, to measure the linear relationship between the series. Pearson correlation coefficient was calculated between the mean fold change (log Cy5/Cy3) across individuals and the ages across individuals, for each spot on the microarray. A large absolute value (|r|>0.5) of the coefficient indicates that the methylation intensity at the locus covaries with age in a positive or a negative way. To test their statistical significance, the ages across individuals were permuted, and, again, the coefficients were computed using the permuted age series. For each spot, the permutation was repeated 5,000 times to get 5,000 coefficients. The one-tailed P value of the coefficient was then obtained by finding the fraction of times that the coefficients were larger (or smaller) than the original coefficient. Although a P-value cutoff at .05 may lead to many false positives, the adjustment of P value for multiple testing, by controlling the probability of making at least one false positive, dramatically lowers the power of the experiment and is also considered too conservative for microarray studies.26 “False discovery rate” (FDR) is defined as the expected proportion of false-positive predictions among the positive predictions. For example, in the 100 positives declared by FDR at 0.1, 90 are expected to be true positives on average. The FDR criterion has increasingly been adopted over P value in microarray analysis. We have, therefore, applied FDR for the findings described in this article.
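The permutation test for a single spot, followed by an FDR adjustment across spots, might look like the sketch below. Function names are hypothetical. The text does not name the FDR procedure that was applied; the Benjamini-Hochberg step-up adjustment is used here purely as an assumption, and the comparison against the observed coefficient is made inclusive (≥ or ≤) for simplicity.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(log_ratios, ages, n_perm=5000, seed=0):
    """Observed r and the one-tailed P value from permuting the ages."""
    rng = random.Random(seed)
    observed = pearson(log_ratios, ages)
    shuffled = list(ages)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        r = pearson(log_ratios, shuffled)
        # Count permuted coefficients at least as extreme in the observed direction.
        if (observed >= 0 and r >= observed) or (observed < 0 and r <= observed):
            extreme += 1
    return observed, extreme / n_perm


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted values (an assumed choice of FDR method)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q
```

In practice, `permutation_p_value` would be run once per spot and the resulting P values passed together to `benjamini_hochberg`, with spots declared significant below the chosen FDR threshold.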
The autocorrelation clustering analysis for the CpG island microarray experiment was performed using the autocorrelation function ACF(x), which measures how strongly two methylation intensities x loci apart influence each other.
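A common sample estimator of the autocorrelation at lag x is sketched below. The text does not give the exact form of ACF(x) that was used, so this definition (lagged covariance divided by the total variance, with spots ordered by genomic position) is an assumption, and the function name is hypothetical.

```python
def acf(values, lag):
    """Sample autocorrelation of a position-ordered series at the given lag."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    num = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag))
    return num / denom


# Hypothetical methylation intensities that alternate along the chromosome.
series = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
print(acf(series, 0))  # 1.0
print(acf(series, 1))  # negative: neighbouring loci are anticorrelated
```

Values near zero at all nonzero lags would indicate that methylation intensities at neighbouring loci vary independently.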
Measurement of Densities of Methylated Cytosines in the Selected Loci
Further analysis of a selected set of DNA fragments identified as the most variable was performed using the MS-SNuPE reaction on the ABI SNapShot platform accommodated for measuring the C/T ratios in the bisulphite-treated genomic DNA.27 In brief, genomic DNA was digested with NdeI (Fermentas), followed by treatment with sodium bisulphite, as described above. The loci of interest were amplified using nested PCR (primers available in table 1). Typical PCR amplification consisted of one cycle at 95°C for 1 min, then 40 cycles at 95°C for 30 s, 50°C for 30 s, and 72°C for 40 s, followed by a final extension at 72°C for 5 min. Quantitative interrogation of the bisulphite-induced C→T transition at CpG dinucleotides in such amplicons was performed with primers targeted to the CpG dinucleotides within the restriction sites for HpaII, Hin6I, or AciI.
Results
Intra- and Interindividual DNA Methylation Differences in the Promoters of BRCA1, BRCA2, HD, DM1, PSEN1, and PSEN2
The bisulphite modification–based mapping of methylated cytosines for all of these genes showed that numerous individual clones (representing individual sperm cells) had quite different DNA methylation profiles within individuals (fig. 1A). This finding was confirmed by the analysis of the degree of uniqueness of DNA methylation profiles (fig. 1B). In the case of HD, ∼80% of all clones exhibited unique patterns of methylated cytosine distribution. This estimate did not change dramatically when potential bisulphite modification–induced artifacts were taken into account; on average, 72% of clones were different from one another when one methylated cytosine difference was tolerated, and up to 53% were different when two differences were allowed. The latter situation is a very conservative estimate of the degree of uniqueness, since such a high artifactual C→T nonconversion rate is unrealistic. In our experiments, the artifactual C→T bisulphite conversion was always <1%. The lowest degree of intraindividual DNA methylation uniqueness was detected for PSEN2: 36%, 20%, and 13% for 0, 1, and 2 levels of tolerance, respectively. This analysis of uniqueness is, however, related to the clone length and correlates specifically with the density of the CpGs analyzed (Pearson R=0.64, 0.93, and 0.98 for 0, 1, and 2 levels of tolerance, respectively), since more methylatable CpG sites allow more opportunity for variation.
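The uniqueness estimate with 0, 1, and 2 levels of tolerance can be illustrated with a short sketch. It assumes one plausible reading of the text: a clone counts as unique at tolerance t when every other clone from the same individual differs from it at more than t CpG positions. The names `hamming` and `unique_fraction` are hypothetical, and clones are encoded as strings of "1" (methylated) and "0" (unmethylated).

```python
def hamming(a, b):
    """Number of CpG positions at which two clone profiles differ."""
    return sum(x != y for x, y in zip(a, b))

def unique_fraction(clones, tolerance=0):
    """Fraction of clones whose nearest other clone differs at more than `tolerance` sites."""
    n = len(clones)
    unique = 0
    for i, clone in enumerate(clones):
        nearest = min(
            hamming(clone, other) for j, other in enumerate(clones) if j != i
        )
        if nearest > tolerance:
            unique += 1
    return unique / n
```

For the four toy clones `["0000", "0000", "0001", "1110"]`, the fraction falls from 0.5 at tolerance 0 to 0.25 at tolerance 1, mirroring the way the reported percentages shrink as more differences are tolerated.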
Figure 1.
Intraindividual variability of DNA methylation. A, DNA methylation profiles of the promoter CpG islands of BRCA1 and PSEN2, determined on the basis of sequencing 60 clones of bisulphite-modified sperm DNA. The BRCA1 locus covered 32 CpGs, and the PSEN2 region included 45 CpGs. Nine monomorphic (unmethylated) CpGs (BRCA1 or PSEN2) were excluded from the figure. Each individual is represented, with individual CpG dinucleotides from left to right (black = methylated cytosines; white = unmethylated cytosines) and individual clones from top to bottom. Like the presented BRCA1 and PSEN2 cases, a substantial proportion of clones in other loci (HD, DM1, BRCA2, and PSEN1) revealed unique DNA methylation profiles. B, Estimates of the proportion of unique methylation profiles in the promoter regions of the six analyzed genes. The Y-axis shows the proportion of clones carrying unique methylation profiles over the total number of sequenced clones; the X-axis shows the proportion of unique profiles that contain at least 1, 2, and 3 differences (left, middle, and right bars, respectively), compared with the other profiles at the same locus in the same individual.
Whereas the intraindividual analysis can show variability within an individual, significantly variable methylation patterns between individuals were also revealed (fig. 2). The gene-specific results were as follows: in BRCA1, n=4, 32 CpGs were analyzed, and 5/6 pairwise comparisons exhibited statistically significant differences (average P=2.53×10⁻⁵); in BRCA2, n=4, 36 CpGs were analyzed, and 3/6 pairwise comparisons were significant (average P=8.56×10⁻⁷); in PSEN1, n=3, 43 CpGs were analyzed, and 2/3 pairwise comparisons were significant (average P=1.89×10⁻⁴); in PSEN2, n=5, 45 CpGs were analyzed, and 6/10 pairwise comparisons were significant (average P=5.11×10⁻³); in DM1, n=7, 99 CpGs were analyzed, and 13/21 pairwise comparisons were significant (average P=5.60×10⁻⁴); and, in HD, n=6, 108 CpGs were analyzed, and 12/15 pairwise comparisons were significant (average P=1.66×10⁻³). Overall, 67% (41/61) of the pairwise comparisons were significantly different, which suggests a high overall level of interindividual variability in the methylation patterns of the tested genes. For the five cases in which 60 sequenced clones were available for BRCA1, BRCA2, and PSEN2, the comparisons were performed using two randomly selected groups of 30 clones (fig. 2). As a validation of this statistical method, the additional sets of 30 clones representing BRCA1, BRCA2, and PSEN2 were compared with the primary sets of 30 clones of the same individuals. In all cases, the results (compare pairs 55′ and 66′ in BRCA1 in fig. 2) showed that their profiles were not different, which is to be expected since they are from the same individual.
Figure 2.
Interindividual variability of DNA methylation in six human disease genes. Bisulphite modification–based mapping of methylated cytosines in BRCA1, BRCA2, HD, DM1, PSEN1, and PSEN2. Thirty individual clones were sequenced from three to seven individuals. Analysis for each gene is represented in two panels. Left panels, graphical profile of the percentage of methylation (Y-axis, ranging from 0% to 40%) for every CpG dinucleotide (X-axis, ranging from 32 to 108 CpG dinucleotides), out of the total number of clones for each individual. Right panels, Euclidean distances (Y-axis) of pairwise comparisons between individual methylation profiles (X-axis). The blue line is the mean distance, and red lines are ±2 SD from the mean, both obtained for each gene from the permutation study (see the “Material and Methods” section). Pairwise comparisons are annotated—for example, as “16”—for the comparison of the Euclidean distance of individual 1 with that of individual 6. Primed individual numbers (e.g., 4′) represent a second set of 30 clones from those individuals. The error bars on some data points represent SDs from 100,000 permutations of 30 clone groups from the individuals from whom 60 clones were sequenced.
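The pairwise comparison shown in the right panels can be sketched as follows. The permutation scheme itself is described in the “Material and Methods” section and is not reproduced here, so pooling the clones of two individuals and reshuffling them into two groups is an assumption of this sketch. All function names are hypothetical, and each clone is a list of 0/1 methylation calls, one per CpG.

```python
import math
import random

def methylation_profile(clones):
    """Fraction of clones methylated at each CpG position."""
    n = len(clones)
    return [sum(column) / n for column in zip(*clones)]

def euclidean(p, q):
    """Euclidean distance between two per-CpG methylation profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def permutation_null(clones_a, clones_b, n_perm=1000, seed=0):
    """Mean and SD of the distance when clones are reassigned at random (assumed scheme)."""
    rng = random.Random(seed)
    pooled = list(clones_a) + list(clones_b)
    k = len(clones_a)
    dists = []
    for _ in range(n_perm):
        rng.shuffle(pooled)
        dists.append(
            euclidean(methylation_profile(pooled[:k]), methylation_profile(pooled[k:]))
        )
    mean = sum(dists) / n_perm
    sd = math.sqrt(sum((d - mean) ** 2 for d in dists) / (n_perm - 1))
    return mean, sd
```

An observed distance lying more than 2 SD above the permutation mean would correspond to a point above the upper red line in the figure.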
DNA Methylation Differences Detected by the CpG Island Microarrays
This CpG island microarray contains 12,192 DNA fragments; however, unique sequences are represented by 4,970 distinct loci, of which only about half meet the commonly used criteria for CpG islands: GC content of ⩾50%, length >200 bp, and observed/expected CG dinucleotide ratio >0.6.28 As described in the “Material and Methods” section, we used two strategies to increase the informativeness of our microarray analysis. The first strategy was to use an enzyme cocktail of HpaII, Hin6I, and AciI (sperm DNA–HHA data set), which is more informative for lower GC content loci, such as those that do not meet the CpG island criteria. The second strategy was to use HpaII alone (sperm DNA–HpaII data set), which would be more informative for the higher GC-containing loci. Lastly, we analyzed a brain DNA microarray data set (brain DNA–HpaII data set), to compare tissue-specific differences. As a measure of methylation variation, we have calculated the CV across individuals for each array set. The CV is calculated by dividing the SD in the Cy5/Cy3 ratio by the mean of the Cy5/Cy3 ratio, and it is expressed as a percentage. The variation in CV among individuals across the genome ranged from 2.1% to 30.5% (mean=6.7%), from 0.8% to 66.2% (mean=9.2%), and from 2.1% to 97.4% (mean=10.9%) for the sperm DNA–HHA, sperm DNA–HpaII, and brain DNA–HpaII data sets, respectively (table 2). We considered the loci within the top 10% of CVs (90th percentile) as highly variable regions.
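The CV calculation and the 90th-percentile cutoff can be illustrated with the Python standard library. This is a sketch only: the text does not say whether the sample or the population SD was used (the sample SD is assumed here), the percentile interpolation rule is whatever `statistics.quantiles` applies by default, and the names `cv_percent` and `highly_variable_loci` are hypothetical.

```python
import statistics

def cv_percent(ratios):
    """Coefficient of variation of the Cy5/Cy3 ratios for one locus, as a percentage."""
    return statistics.stdev(ratios) / statistics.mean(ratios) * 100

def highly_variable_loci(ratios_by_locus):
    """Return the loci whose CV exceeds the 90th percentile, plus that threshold."""
    cvs = {locus: cv_percent(r) for locus, r in ratios_by_locus.items()}
    # The last of the nine decile cut points is the 90th percentile.
    threshold = statistics.quantiles(cvs.values(), n=10)[-1]
    top = {locus for locus, cv in cvs.items() if cv > threshold}
    return top, threshold
```

Here `ratios_by_locus` maps each locus identifier to the list of Cy5/Cy3 ratios observed across individuals for that locus.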
Table 2.
Statistical Analysis of Microarray Data from the Sperm DNA–HHA, Sperm DNA–HpaII, and Brain DNA–HpaII Data Sets
| Descriptive Statistics | Sperm DNA–HHA (n=25) | Sperm DNA–HpaIIᵃ (n=21) | Brain DNA–HpaII (n=22) |
| --- | --- | --- | --- |
| Mean CV (±SD)ᵇ | 6.72 (2.48) | 9.23 (4.92) | 10.89 (5.40) |
| Loci countᶜ | 4,969 | 4,947 | 4,952 |
| 90th percentile (%)ᵈ | >9.53 | >13.71 | >16.33 |
| 10th percentile (%)ᵈ | <4.33 | <5.16 | <6.44 |
| SNP analysisᵉ: | | | |
| 90th percentile: | | | |
| No. with SNPs | 78 | 32 | ND |
| No. with no SNPs | 72 | 118 | ND |
| 10th percentile: | | | |
| No. with SNPs | 74 | 22 | ND |
| No. with no SNPs | 76 | 128 | ND |
| χ2 | .12 | 1.829 | ND |
| P value | .729 | .176 | ND |
| CGI analysisᶠ: | | | |
| Count: | | | |
| CGI | 2,523 | 2,512 | 2,478 |
| Non-CGI | 2,446 | 2,435 | 2,401 |
| Mean CV (±SD): | | | |
| CGI | 6.69 (2.57) | 9.6 (5.28) | 11.06 (4.94) |
| Non-CGI | 6.79 (2.32) | 8.97 (4.39) | 11.05 (5.60) |
| t Test P value | .14815 | 4.92×10⁻⁶ | .93995 |
| CGI χ2 testᵍ: | | | |
| 90th percentile: | | | |
| No. CGI | 235 | 296 | 256 |
| No. non-CGI | 217 | 198 | 238 |
| 10th percentile: | | | |
| No. CGI | 226 | 218 | 255 |
| No. non-CGI | 209 | 277 | 238 |
| χ2 | .003 | 24.34 | .001 |
| P value | .955 | 5.81×10⁻⁷ | .974 |
| Promoter χ2 testᵍ: | | | |
| 90th percentile: | | | |
| No. promoter CGI | NA | 245 | NA |
| No. non–promoter CGI | NA | 52 | NA |
| 10th percentile: | | | |
| No. promoter CGI | NA | 152 | NA |
| No. non–promoter CGI | NA | 67 | NA |
| χ2 | NA | 11.44 | NA |
| P value | NA | 4.87×10⁻⁴ | NA |
ᵃ Values in bold type are statistically significant.
ᵇ The mean (±SD) of the CV in ratio Cy5/Cy3 across the individuals (n) for each data set was calculated.
ᶜ The count represents the number of unique loci remaining after data trimming (see the “Material and Methods” section).
ᵈ Loci with CVs >90th percentile are in the top 10% of methylation-variable regions, and loci with CVs <10th percentile are the least variable loci.
ᵉ The SNP analysis was performed to test for the effects of SNPs from our DNA methylation analysis. We randomly selected 300 loci from the 10th and 90th percentile loci, and the clone sequence plus 1-kb flanking regions were screened for the presence of SNPs—in particular, SNPs that create or disrupt HpaII, Hin6I, and AciI restriction sites. ND = not done.
ᶠ The CpG island (CGI) analyses were performed by separating loci into either CGI or non-CGI categories. A Student's t test was performed to analyze the difference in mean CV. The numbers of loci within each group in the 90th and 10th percentile loci were counted, and χ2 analysis was performed.
ᵍ The CGI loci were further subdivided into loci within promoter regions of genes or loci not within promoters, the numbers of loci in the 90th and 10th percentiles were counted, and χ2 analysis was performed. NA = not applicable.
The data for each locus were plotted on the genome (figs. 3 and 4) and are also available online as a custom annotation track (Center for Addiction and Mental Health) with use of the UCSC Genome Browser (fig. 3B). Figure 3A depicts the sperm DNA–HpaII data set and highlights the highly variable regions on the genome. To assess whether this distribution of highly variable spots is nonrandom, we performed an autocorrelation analysis; however, this analysis did not identify any evidence for autocorrelation, most likely because of the large genomic distance between microarray clones (average 0.6 Mb). Other analyses included testing if the detected variability is confounded by DNA sequence variation; comparison of DNA methylation variation in CpG islands and in non–CpG islands, as well as across different classes of repetitive elements; and assessing if DNA methylation variation correlates with the GC content, clone length, or particular chromosomal cytobands.
Figure 3.
Chromosomal view of methylation variability by CpG island microarray analysis. A, Unmethylated fraction of genomic DNA extracted from sperm samples (n=21) hybridized individually (Cy5), in contrast to the pooled reference control (Cy3). The CV of the Cy5/Cy3 ratio was calculated for each spot across the 21 individuals and was mapped to the corresponding genomic location. Each chromosome ideogram is overlaid with red bars that represent the position of each clone on the CpG island microarray. The bars highlighted in green are the loci that showed variance in the 90th percentile (the top 10% of loci exhibiting the largest degree of DNA methylation variation). B, Screenshot of the custom annotation track on the UCSC Genome Browser (available from the Center for Addiction and Mental Health). Shown is chromosome 6, which includes the major histocompatibility complex locus that was screened for epigenetic variability by the Human Epigenome Project pilot study.8
Figure 4.
Genomewide view of brain DNA–HpaII (A) and sperm DNA–HHA (B) data sets. The unmethylated fraction of genomic DNA was enriched from brain DNA (n=22) or sperm samples (n=25), and each was hybridized individually (Cy5), in contrast to the pooled samples (Cy3). The CV of ratio Cy5/Cy3 was calculated for each spot across the 22 or 25 individuals and was mapped to the corresponding genomic location. Each chromosome ideogram is overlaid with red bars that represent the position of each clone on the array. The bars highlighted in green are the loci that showed statistically significant variance (90th percentile).
Exclusion of Genetic Confounding Effects: SNPs and Copy-Number Polymorphisms
Any method that relies on restriction-enzyme digestion to differentiate between methylated and unmethylated DNA can be influenced by SNPs within the enzyme-restriction sites. Therefore, from each of the sperm DNA–HHA and sperm DNA–HpaII data sets, we selected 150 highly variable loci and 150 conserved loci and performed in silico screening, to identify all known SNPs within a 2-kb region of the selected clone that disrupt or create HpaII, Hin6I, or AciI enzyme sites for the sperm DNA–HHA data set or just HpaII sites for the sperm DNA–HpaII data set (SNP annotation of the UCSC Genome Browser). If SNPs were a significant confounding factor in DNA methylation variation, we would expect a higher proportion of SNPs in highly variable loci (e.g., 90th percentile) compared with the lowest variable loci (e.g., 10th percentile). The χ2 analysis revealed no association between the number of potentially disruptive enzyme-restriction sites and the degree of variability in either data set (sperm DNA–HHA χ2=0.12, P=.729; sperm DNA–HpaII χ2=1.83, P=.176). This finding suggests that the degree of variability in the sperm DNA microarray analysis is more dependent on DNA methylation differences than on DNA sequence differences.
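The χ2 values quoted above can be checked against the counts in table 2. The text does not state which form of the test was used. The sketch below assumes a 2×2 χ2 test with Yates' continuity correction, which reproduces the reported statistics, and the function name `yates_chi2` is hypothetical.

```python
import math

def yates_chi2(a, b, c, d):
    """Chi-square (1 df) with Yates' correction for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Continuity correction: shrink |ad - bc| by n/2, never below zero.
    diff = max(abs(a * d - b * c) - n / 2, 0)
    chi2 = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # Upper-tail probability of the chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Rows: 90th- and 10th-percentile loci; columns: with SNPs, without SNPs.
hha = yates_chi2(78, 72, 74, 76)
hpaii = yates_chi2(32, 118, 22, 128)
```

Under this assumption, `hha` and `hpaii` round to the χ2 and P values given in the text. Applied to the CGI and promoter counts in table 2, the same function returns χ2 statistics that round to 24.34 and 11.44.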
Recent reports have identified >200 copy-number polymorphisms (CNPs) that represent large duplications and deletions that contribute significantly to genomic variation between individuals.29–31 Like SNPs, CNPs could simulate DNA methylation variability in the microarray analysis. We have cross-referenced the CNPs identified in these studies with the CpG island microarray loci and have identified 25 microarray loci that occur within known CNP regions. These include large CNPs in chromosomes 3 (covering the genes OSTα, AB018337, UNQ3030, BC015560, and DLG1), 16 (BC008967, XYLT1, ARL6IP, MIR16, MGC16943, and CDR2), and 17 (AY302137, BHD, RAI1, FLJ20308, TOP3A, and SMCR8) and smaller CNPs on chromosomes 1 (NEGR1), 2 (AK024244), 6 (RDBP), 8 (TSTA3), 9 (LHX2), 11 (TNNT3), and 14 (AK090461). Microarray results for the genes listed could, therefore, be influenced by deletions or duplications as much as by methylation variability; however, none of these loci appear in the list of highly variable (>90th percentile) loci.
CpG Island Analysis
Not all DNA fragments on the CpG island microarray met the criteria for CpG islands. The loci were divided into “CpG islands” and “non–CpG islands,” and a Student's t test was performed to test for any statistically significant difference in the mean CV (table 2). A significantly increased DNA methylation variability was found in loci defined as CpG islands in the sperm DNA–HpaII data set (t test P=4.92×10⁻⁶), and this variability was exemplified by a bias towards CpG islands in the 90th percentile (highly variable regions) (χ2=24.34; P=5.81×10⁻⁷). In addition, when the CpG islands were split into promoter CpG islands and CpG islands not associated with known gene promoters, significantly higher variability in promoter CpG islands (χ2=11.44; P=4.87×10⁻⁴) was detected. However, analyses of methylation variability with other measures, including GC percent alone and clone length, did not reveal any association. No evidence for higher DNA methylation variation was detected in the promoter CpG islands in the brain DNA–HpaII data set, and there also was no association with SNPs. Therefore, this sperm DNA–HpaII experiment appears to have revealed genuine increased methylation differences in the promoter CpG islands.
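The split into CpG islands and non-CpG islands rests on the three thresholds quoted earlier in this section (GC content of at least 50%, length >200 bp, and observed/expected CpG ratio >0.6). A minimal sketch of such a classifier follows. The observed/expected formula used here (number of CpGs × length / (number of C × number of G)) is the commonly used definition and is an assumption, since the text gives only the thresholds. The function names are hypothetical.

```python
def cpg_island_stats(seq):
    """Return (length, GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Assumed definition: observed CpGs relative to those expected from base composition.
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return n, gc_fraction, obs_exp

def meets_cpg_island_criteria(seq):
    """Apply the three thresholds: length >200 bp, GC >= 50%, obs/exp CpG > 0.6."""
    n, gc_fraction, obs_exp = cpg_island_stats(seq)
    return n > 200 and gc_fraction >= 0.5 and obs_exp > 0.6
```

A fragment that fails any one of the three tests would fall into the non-CpG island group used for the t test and χ2 comparisons.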
Cytoband Analysis
It has been well described that different cytobands could have evolved in different ways and that the genes within each band could have evolutionary similarities.32,33 Since these bands are based on, among other things, GC content and Alu content, we sought to identify whether methylation variability was one of the aspects that showed similarities within bands. The CpG island microarray annotation includes the division of loci into different cytobands, including the G bands (Giemsa negative: gneg) and the four classes of R bands (Giemsa positive: gpos25, gpos50, gpos75, and gpos100). These bands are defined as follows. The darkest R bands, gpos100, are very rich in GC and Alu; the next darkest, gpos75, are very rich in GC but not in Alu; gpos50 are not rich in GC but are rich in Alu; and gpos25 are Giemsa-dark bands that are rich in neither GC nor Alu.34 Mean CVs for all the loci within each of these cytobands were calculated, and a Student's t test was performed to identify any statistically significant differences. In each of the data sets, marginally significant associations with certain cytobands were identified. In the sperm DNA–HHA data set, significant decreases in variability between gpos75 band loci (CV = 6.51) and the other three R bands—gpos25, gpos50, and gpos100 (average CV = 6.83; average P=.023)—were detected. In the sperm DNA–HpaII data set, gpos25 exhibited a lower degree of methylation variability compared with gpos50 (CV = 8.97 and CV = 9.50, respectively; P=.041). Although the significance of these statistical tests diminished when corrected for multiple testing, the result is suggestive of an increase in variability in the Alu-rich cytobands, such as the gpos50 and gpos100 cytobands, compared with the Alu-poorer bands gpos25 and gpos75.
Age-Dependent DNA Methylation Changes in the Sperm
Methylation dynamics with age (sperm DNA–HHA age range 22–35 years; sperm DNA–HpaII age range 24–56 years) as a covariate were investigated. Individuals were ordered by increasing age, and the Pearson correlation between the age and relative methylation-signal intensity (ratio of case to reference) was calculated for each locus. In the sperm DNA–HpaII and sperm DNA–HHA data sets, 105 and 8 loci, respectively, were found, whose absolute correlation coefficients were >0.5 and whose P values were <.05. Numerous genes were identified in the germ cell data that corresponded to genes involved in spermatogenesis and development (e.g., INSM1, TZFP, and EED) and neurogenesis (e.g., CALM1, STMN2, ARHGEF9, and ARX) or to disease-related genes (e.g., MAF, DCC, and CDH13 [MIM 601364]). A number of examples are shown in figure 5. The lists of genes for each data set are available in tables 3 and 4.
Figure 5.
Age-related DNA methylation changes in the sperm. Individuals were ordered by increasing age (top left panel), and gene-specific DNA methylation dynamics were investigated using the individual ages (sperm DNA–HpaII age range 24–56 years) as a covariate. Pearson correlation was calculated for each locus, and the one-tailed P value of the coefficient was obtained. In the sperm DNA–HpaII data set, 105 loci were identified as significantly (P<.05) correlated (r>0.5) or inversely correlated (r<-0.5) with age. Since the unmethylated fraction of DNA was interrogated, positive correlation indicates decreasing DNA methylation with age, whereas inverse correlation reflects increasing methylation with age. The genes CTNNA2, EED, CALM1, CDH13, and STMN2 are shown as examples. Other genes for the sperm DNA–HpaII, sperm DNA–HHA, and brain DNA–HpaII data sets are available in tables 3 and 4.
Table 3.
Age-Related Correlation in Sperm DNA–HpaII Data Set[Note]
| University Health Network Accession Number | Age Correlationᵃ | Genome Locationᵇ | Nearest Gene Distance (bp) | Nearest Gene Entrez GeneID | Nearest Gene |
| --- | --- | --- | --- | --- | --- |
| UHNhscpg0007432 | −.918421653 | chr 14: 73296338-73296842 | 0 | 91748 | C14orf43 |
| UHNhscpg0004878 | −.644590225 | chr 18: 58533334-58533572 | 1,366 | 23239 | PHLPP |
| UHNhscpg0001279 | −.635499301 | chr 2: 15682160-15683012 | 0 | 1653 | DDX1 |
| UHNhscpg0010337 | −.633961562 | chr 4: 24692990-24693129 | 0 | 55203 | LGI2 |
| UHNhscpg0000523 | −.628700319 | chr 14: 89918538-89919063 | 14,066 | 801 | CALM1ᵈ |
| UHNhscpg0009687 | −.61929126 | chr 11: 71430395-71430772 | 0 | 4926 | NUMA1 |
| UHNhscpg0002060 | −.607457801 | chr 7: 117447932-117448126 | 10,775 | 56311 | ANKRD7 |
| UHNhscpg0006282 | −.603136381 | chr 14: 50203675-50203889 | 0 | 60485 | SAV1 |
| UHNhscpg0008336 | −.60134971 | chr 3: 4211475-4211590 | 0 | | AY358092 |
| UHNhscpg0002269 | −.594487623 | chr 16: 18719890-18721070 | 0 | 23204 | ARL6IP |
| UHNhscpg0006883 | −.590701745 | chr 7: 20603836-20604518 | 4,091 | 221833 | SP8 |
| UHNhscpg0003324 | −.57876246 | chr 8: 80412773-80412879 | 272,989 | 11075 | STMN2ᶜ |
| UHNhscpg0007072 | −.578612647 | chr 13: 20930559-20930802 | 0 | 253832 | FLJ25952 |
| UHNhscpg0002392 | −.577309667 | chr 12: 55758348-55759109 | 0 | 23306 | AB006624 |
| UHNhscpg0010814 | −.574206485 | chr 9: 111503780-111504547 | 0 | 548645 | GNG10 |
| UHNhscpg0002833 | −.568147866 | chr 11: 62202212-62203556 | 0 | | LOC51035 |
| UHNhscpg0007366 | −.562953631 | chr 13: 20770028-20770950 | 73,766 | | FLJ25952 |
| UHNhscpg0010640 | −.55975393 | chr 16: 81828071-81828454 | 0 | 1012 | CDH13ᵈ |
| UHNhscpg0003417 | −.55808056 | chr 8: 80412825-80412879 | 272,989 | 11075 | STMN2ᶜ |
| UHNhscpg0007351 | −.557144063 | chr 11: 133162749-133163293 | 52,433 | 219938 | SPATA19ᵉ |
| UHNhscpg0011679 | −.553844494 | chr 19: 5999835-6000028 | 0 | 5990 | RFX2ᵉ |
| UHNhscpg0011303 | −.551957348 | chr 1: 114639352-114639536 | 12,748 | 51592 | TRIM33 |
| UHNhscpg0000547 | −.551565288 | chr 9: 86126053-86126501 | 0 | 81689 | HBLD2ᶜ |
| UHNhscpg0010993 | −.550180272 | chr 1: 46478949-46480024 | 0 | 10489 | AF370430 |
| UHNhscpg0000380 | −.545084964 | chr X: 134996172-134996628 | 0 | 2273 | FHL1 |
| UHNhscpg0003556 | −.544679103 | chr X: 62745590-62745765 | 0 | 23229 | ARHGEF9ᶜ |
| UHNhscpg0002314 | −.539967585 | chr 3: 157491673-157492091 | 0 | 7881 | KCNAB1 |
| UHNhscpg0003596 | −.531788057 | chr 4: 172320378-172320459 | 934,465 | 51166 | AADAT |
| UHNhscpg0001705 | −.526787621 | chr 15: 99009623-99010314 | 2,859 | 140460 | ASB7 |
| UHNhscpg0009822 | −.526562678 | chr 1: 21855364-21855679 | 370 | | AK026930 |
| UHNhscpg0000063 | −.52421876 | chr 3: 72584222-72584350 | 5,758 | 23429 | RYBP |
| UHNhscpg0001205 | −.521667613 | chr 17: 38793230-38793666 | 8,954 | | AK128207 |
| UHNhscpg0009465 | −.518509703 | chr 5: 92982058-92982196 | 0 | 83989 | DKFZP564D172 |
| UHNhscpg0009804 | −.517599705 | chr 6: 95040381-95040442 | 854,395 | 2045 | EPHA7ᵈ |
| UHNhscpg0010111 | −.516980672 | chr 3: 45163070-45163300 | 152 | 64866 | CDCP1 |
| UHNhscpg0008321 | −.51607039 | chr 4: 32393339-32393439 | 1,572,342 | 5099 | PCDH7 |
| UHNhscpg0006596 | −.515801694 | chr 12: 6667579-6668374 | 0 | 171017 | ZNF384ᶜ |
| UHNhscpg0007045 | −.514160426 | chr 6: 39190376-39191144 | 0 | 55776 | C6orf64 |
| UHNhscpg0003064 | −.513774745 | chr 15: 50863986-50864266 | 0 | 3175 | ONECUT1 |
| UHNhscpg0008779 | −.512617631 | chr 12: 64004487-64004759 | 0 | 253827 | LOC253827 |
| UHNhscpg0002807 | −.510358672 | chr 9: 91266376-91266791 | 859 | 4783 | NFIL3 |
| UHNhscpg0007252 | −.510001953 | chr 13: 110249836-110250880 | 68,680 | 283487 | LOC283487 |
| UHNhscpg0008979 | −.509333714 | chr 3: 97915795-97915891 | 100,433 | | AY358738 |
| UHNhscpg0002130 | −.508146229 | chr 20: 61676052-61676691 | 6,181 | 85441 | PRIC285 |
| UHNhscpg0000750 | −.506824133 | chr 11: 111449826-111451103 | 0 | 55216 | FLJ10726 |
| UHNhscpg0004733 | −.506433015 | chr 11: 77963050-77963233 | 0 | 79731 | FLJ23441 |
| UHNhscpg0007997 | −.504269038 | chr 11: 19691578-19692280 | 0 | 89797 | AJ488207 |
| UHNhscpg0011253 | −.503229452 | chr 5: 5123770-5123955 | 111,307 | 170690 | ADAMTS16 |
| UHNhscpg0001019 | −.502788473 | chr 17: 38793230-38793666 | 8,954 | | AK128207 |
| UHNhscpg0000830 | −.502263623 | chr 20: 11257478-11257784 | 588,782 | 22903 | BTBD3 |
| UHNhscpg0008888 | −.501752931 | chr 14: 46575274-46575377 | 0 | 161357 | MAMDC1ᶜ |
| UHNhscpg0005027 | −.501245356 | chr 10: 127671190-127671371 | 0 | 92565 | AY251163 |
| UHNhscpg0008252 | .50313828 | chr X: 24795508-24795821 | 1,997 | 170302 | ARXᶜ |
| UHNhscpg0005849 | .505284858 | chr 9: 69169771-69170443 | 3,351 | 9413 | C9orf61 |
| UHNhscpg0002396 | .505405308 | chr 9: 15499859-15499960 | 0 | 11168 | PSIP1 |
| UHNhscpg0000331 | .505543224 | chr 1: 114065861-114066820 | 0 | 54665 | FLJ11220 |
| UHNhscpg0004028 | .505850764 | chr 6: 109920727-109920873 | 0 | | FLJ25791 |
| UHNhscpg0003238 | .508910182 | chr 5: 72642966-72643484 | 134,358 | 2297 | FOXD1 |
| UHNhscpg0005599 | .512086932 | chr 11: 57161641-57162347 | 6,789 | 219539 | YPEL4 |
| UHNhscpg0004696 | .515663002 | chr 2: 26012699-26013466 | 0 | 55252 | ASXL2 |
| UHNhscpg0008470 | .516009324 | chr 1: 225950806-225950903 | 0 | 55746 | NUP133 |
| UHNhscpg0008340 | .51741056 | chr 4: 84313172-84313686 | 0 | 51138 | COPS4 |
| UHNhscpg0003990 | .517872516 | chr 12: 43376666-43376983 | 0 | 4753 | NELL2ᶜ |
| UHNhscpg0008557 | .518846707 | chr 10: 239933-240068 | 0 | 10771 | ZMYND11 |
| UHNhscpg0000482 | .520984085 | chr 11: 85649369-85649478 | 0 | 8726 | EEDᵉ |
| UHNhscpg0005884 | .520992457 | chr 3: 128381578-128381626 | 13,045 | 285311 | AK097460 |
| UHNhscpg0008792 | .521754696 | chr 4: 145190267-145191203 | 5,950 | 2996 | GYPE |
| UHNhscpg0009520 | .525317277 | chr 1: 91678271-91678520 | 0 | 8317 | CDC7ᵈ |
| UHNhscpg0004355 | .526505907 | chr 6: 27464330-27464706 | 0 | 441136 | AK092633 |
| UHNhscpg0001611 | .526967248 | chr 6: 30289787-30290357 | 683 | 7726 | TRIM26 |
| UHNhscpg0004783 | .529777441 | chr 22: 15638898-15638988 | 0 | 150165 | MGC57211 |
| UHNhscpg0000025 | .532588588 | chr 10: 75370203-75371061 | 22,943 | 5328 | PLAUᵈ |
| UHNhscpg0005928 | .535021909 | chr 1: 212554059-212554157 | 0 | 7399 | USH2A |
| UHNhscpg0001828 | .537335195 | chr 17: 17507093-17507808 | 17,703 | 10743 | RAI1 |
| UHNhscpg0005479 | .543975527 | chr 1: 215895774-215895935 | 122,221 | 127018 | LYPLAL1 |
| UHNhscpg0002565 | .545509304 | chr 6: 116998748-116999090 | 1,641 | 51389 | RWDD1 |
| UHNhscpg0008258 | .545922515 | chr 1: 114408830-114409330 | 316 | 148281 | SYT6 |
| UHNhscpg0010487 | .547254207 | chr 12: 92273678-92274142 | 234 | 11163 | NUDT4 |
| UHNhscpg0002717 | .55323384 | chr 12: 48302621-48302970 | 649 | | AK123353 |
| UHNhscpg0002673 | .553941933 | chr 15: 81825124-81825456 | 80,654 | 646 | BNC1 |
| UHNhscpg0010939 | .554017064 | chr 10: 8108807-8109019 | 23,399 | | FLJ45983 |
| UHNhscpg0001474 | .557227088 | chr 8: 53488737-53489332 | 3,881 | 9705 | ST18ᵈ |
| UHNhscpg0008206 | .559375672 | chr 14: 89155480-89155723 | 41,909 | 29018 | AF118074 |
| UHNhscpg0004409 | .560612707 | chr 12: 63850471-63850714 | 0 | 23592 | MAN1 |
| UHNhscpg0005280 | .566882321 | chr 8: 114808371-114808497 | 289,953 | 114788 | CSMD3 |
| UHNhscpg0002623 | .566952142 | chr 12: 14237814-14238207 | 171,686 | 55729 | BC063855ᵉ |
| UHNhscpg0003738 | .567376726 | chr 2: 88155376-88155675 | 10,310 | 51315 | LOC51315 |
| UHNhscpg0008444 | .570746223 | chr 10: 122728932-122729159 | 69,907 | 55717 | WDR11ᵈ |
| UHNhscpg0007259 | .575414924 | chr 1: 208139197-208139388 | 390 | 7779 | SLC30A1 |
| UHNhscpg0002607 | .582792699 | chr 2: 26481086-26481443 | 0 | 165082 | GPR113 |
| UHNhscpg0008656 | .58347414 | chr 1: 115293188-115293360 | 4,205 | 7252 | TSHB |
| UHNhscpg0002406 | .584415421 | chr 18: 48120746-48121468 | 0 | 1630 | DCCᵈ |
| UHNhscpg0009002 | .5849705 | chr 16: 78361983-78362481 | 169,871 | 4094 | MAFᵈ |
| UHNhscpg0007649 | .586330872 | chr 14: 80296231-80296366 | 0 | 145508 | C14orf145 |
| UHNhscpg0004597 | .587453741 | chr 22: 15638898-15638988 | 0 | | MGC57211 |
| UHNhscpg0003289 | .613126037 | chr 14: 20640986-20641802 | 4,239 | 554207 | BC031469 |
| UHNhscpg0002376 | .616193621 | chr 18: 33116047-33117280 | 0 | 56853 | BRUNOL4 |
| UHNhscpg0002145 | .617261027 | chr 1: 116672902-116673528 | 13,466 | 476 | ATP1A1 |
| UHNhscpg0000928 | .628741946 | chr 18: 32114987-32115360 | 12,305 | 55034 | MOCOS |
| UHNhscpg0008601 | .636604373 | chr 19: 40897102-40898011 | 0 | 27033 | TZFPᵉ |
| UHNhscpg0002312 | .640025401 | chr 12: 52959901-52960341 | 457 | 3178 | HNRPA1 |
| UHNhscpg0004507 | .640711129 | chr 2: 80187757-80187860 | 0 | 1496 | CTNNA2ᶜ |
| UHNhscpg0002864 | .644197159 | chr 20: 20293918-20294056 | 2,708 | 3642 | INSM1ᵉ |
| UHNhscpg0008280 | .693689486 | chr 3: 111231569-111231663 | 692,505 | 55211 | DPPA4ᵉ |
| UHNhscpg0003180 | .698353684 | chr 2: 45078887-45079081 | 1,606 | 6496 | SIX3 |
Note.— Individuals were ordered by increasing age, and gene-specific DNA methylation dynamics were investigated using the individual ages (sperm DNA–HpaII age range 24–56 years) as a covariate.
ᵃ Negative score = increasing methylation with respect to age. Positive score = decreasing methylation with respect to age.
ᵇ chr = chromosome.
ᶜ Genes related to brain/neuronal development.
ᵈ Genes related to cancer or other disease.
ᵉ Genes related to spermatogenesis, embryogenesis, and development.
Table 4.
Age-Related Correlation in Sperm DNA–HHA Data Set[Note]
| University Health Network Accession Number | Age Correlationᵃ | Genome Locationᵇ | Nearest Gene Distance (bp) | Nearest Gene Entrez GeneID | Nearest Gene |
| --- | --- | --- | --- | --- | --- |
| UHNhscpg0006311 | −.571461 | chr 16: 73575728-73575882 | 0 | 79726 | BC004519 |
| UHNhscpg0000757 | −.5414097 | chr 16: 3084575-3084850 | 1,713 | 84891 | ZNF206 |
| UHNhscpg0002369 | −.5080423 | chr 12: 102467385-102467934 | 15,583 | 55576 | STAB2 |
| UHNhscpg0001087 | −.5049054 | chr 2: 222992625-222992945 | 0 | 0 | FLJ32447 |
| UHNhscpg0000495 | −.4991295 | chr 2: 38215365-38216278 | 448 | 1545 | CYP1B1 |
| UHNhscpg0000562 | −.4981613 | chr 10: 21822844-21823534 | 18,875 | 387640 | FLJ45187 |
| UHNhscpg0009206 | −.4953882 | chr 2: 120996167-120996370 | 56,240 | 84931 | FLJ14816 |
| UHNhscpg0005717 | −.4945551 | chr 21: 46567171-46567396 | 1,086 | 5116 | PCNT2 |
| UHNhscpg0007805 | −.49356 | chr 2: 223623949-223624362 | 0 | 2181 | ACSL3 |
| UHNhscpg0006512 | −.4900278 | chr 12: 21985242-21985363 | 0 | 10060 | BC033804 |
| UHNhscpg0001346 | −.4890173 | chr 12: 94687435-94687958 | 0 | 0 | METAP2 |
| UHNhscpg0000367 | −.4675992 | chr 9: 123770922-123771907 | 0 | 57706 | AK024782 |
| UHNhscpg0009946 | −.4518449 | chr 17: 70661243-70661746 | 0 | 51155 | HN1 |
| UHNhscpg0007913 | −.449313 | chr 3: 44011141-44011398 | 246,983 | 375337 | AK093476 |
| UHNhscpg0002168 | −.437548 | chr 7: 13802034-13802474 | 0 | 2115 | ETV1 |
| UHNhscpg0000973 | −.4372439 | chr 18: 54013705-54014057 | 0 | 23327 | NEDD4Lᶜ |
| UHNhscpg0001825 | −.432339 | chr 12: 94687426-94687979 | 0 | 0 | METAP2 |
| UHNhscpg0008872 | −.428427 | chr 3: 25681850-25682051 | 1,058 | 7155 | TOP2Bᵈ |
| UHNhscpg0004998 | −.425298 | chr 18: 24676574-24676664 | 665,482 | 1000 | CDH2ᶜ |
| UHNhscpg0003543 | −.4174782 | chr 13: 93951550-93951661 | 21,626 | 1638 | DCT |
| UHNhscpg0000752 | −.4161375 | chr 19: 52308085-52308632 | 0 | 23211 | C19orf7 |
| UHNhscpg0000402 | −.4145143 | chr 10: 70330981-70331591 | 0 | 0 | AK056044 |
| UHNhscpg0000851 | −.4097783 | chr 15: 43466690-43467020 | 8,677 | 2628 | GATM |
| UHNhscpg0008103 | −.4090966 | chr 5: 78724780-78724896 | 0 | 9456 | HOMER1 |
| UHNhscpg0000874 | −.4090138 | chr 13: 45524409-45524808 | 0 | 23091 | BC019000 |
| UHNhscpg0003075 | −.4080136 | chr 12: 22379747-22379948 | 832 | 0 | SIAT8A |
| UHNhscpg0007389 | −.405673 | chr 1: 192571739-192571804 | 354,765 | 343450 | SLICK |
| UHNhscpg0001303 | −.405598 | chr 12: 50749103-50749772 | 254 | 60673 | FLJ11773 |
| UHNhscpg0008902 | −.4048977 | chr 14: 46575274-46575748 | 0 | 161357 | MAMDC1ᶜ |
| UHNhscpg0001410 | −.4042417 | chr 15: 43466690-43467020 | 8,677 | 2628 | GATM |
| UHNhscpg0001498 | −.4035585 | chr 19: 14052743-14053182 | 5,870 | 113230 | BC011002 |
| UHNhscpg0001795 | −.4022476 | chr 22: 45478515-45478970 | 97 | 25771 | C22orf4 |
| UHNhscpg0000092 | −.4017695 | chr 1: 243700095-243700378 | 45,036 | 317705 | VN1R5 |
| UHNhscpg0000407 | −.4016029 | chr 3: 159310225-159310601 | 0 | 51319 | MGC12197 |
| UHNhscpg0001152 | −.401599 | chr 4: 85773951-85774343 | 0 | 4825 | NKX6-1 |
| UHNhscpg0011244 | .400513 | chr 1: 233050707-233050741 | 0 | 55127 | AK098212 |
| UHNhscpg0008036 | .4027472 | chr 22: 22430339-22430668 | 0 | 150248 | FLJ36561 |
| UHNhscpg0009859 | .4084672 | chr 20: 38753133-38753612 | 1,843 | 9935 | MAFBᵈ |
| UHNhscpg0011733 | .4095515 | chr 5: 78381324-78381406 | 0 | 29958 | DMGDH |
| UHNhscpg0010258 | .4143755 | chr 1: 10393997-10394274 | 0 | 5226 | PGD |
| UHNhscpg0005083 | .4183838 | chr 6: 122167296-122167347 | 354,726 | 2697 | GJA1 |
| UHNhscpg0008843 | .422049 | chr 3: 54282033-54282522 | 0 | 55799 | AF516696 |
| UHNhscpg0003506 | .4240404 | chr 2: 36496443-36496693 | 0 | 0 | CRIM1ᶜ |
| UHNhscpg0005633 | .4240989 | chr 18: 30341908-30341962 | 85,357 | 1837 | DTNAᶜ |
| UHNhscpg0005683 | .4247309 | chr 4: 84391329-84391815 | 0 | 51316 | PLAC8 |
| UHNhscpg0010991 | .4284163 | chr 12: 78167517-78167778 | 0 | 6857 | SYT1ᶜ |
| UHNhscpg0010414 | .428641 | chr 14: 69723455-69723658 | 0 | 6547 | SLC8A3 |
| UHNhscpg0011471 | .4289618 | chr 19: 63431698-63431877 | 765 | 27300 | ZNF544 |
| UHNhscpg0011163 | .4336394 | chr 19: 58188067-58188175 | 0 | 90338 | ZNF160 |
| UHNhscpg0003691 | .4439328 | chr 8: 116422851-116423033 | 66,866 | 7227 | TRPS1 |
| UHNhscpg0011833 | .4440889 | chr 13: 99427912-99428530 | 3,789 | 0 | ZIC2ᶜ |
| UHNhscpg0005658 | .4461968 | chr 7: 4636276-4636727 | 0 | 55698 | FLJ10324 |
| UHNhscpg0003824 | .4462915 | chr 2: 107673648-107673883 | 238,538 | 285190 | BX537861 |
| UHNhscpg0006149 | .4473537 | chr 1: 193309526-193310390 | 370 | 343450 | SLICK |
| UHNhscpg0005850 | .4475083 | chr 11: 959952-960191 | 0 | 161 | AP2A2 |
| UHNhscpg0009680 | .4498291 | chr 14: 38935426-38935732 | 978 | 254170 | FBXO33 |
| UHNhscpg0009704 | .4509138 | chr 19: 45765931-45766515 | 0 | 57731 | SPTBN4 |
| UHNhscpg0007755 | .4520231 | chr 15: 76517964-76518338 | 0 | 0 | IREB2 |
| UHNhscpg0008364 | .4540589 | chr 3: 172679809-172679986 | 18,989 | 23043 | AB011123 |
| UHNhscpg0008495 | .4544119 | chr 21: 36354670-36354909 | 10 | 54093 | C21orf18 |
| UHNhscpg0006360 | .4566517 | chr 10: 102972869-102973961 | 2,762 | 10660 | LBX1ᵉ |
| UHNhscpg0005626 | .4603208 | chr 1: 43493737-43494422 | 0 | 991 | CDC20 |
| UHNhscpg0011755 | .4645775 | chr 11: 18684679-18684913 | 0 | 0 | FLJ37794 |
| UHNhscpg0009584 | .4827054 | chr 14: 38935426-38935732 | 978 | 254170 | FBXO33 |
| UHNhscpg0010541 | .4897825 | chr 20: 1875274-1875325 | 6,734 | 140885 | PTPNS1ᶜ |
| UHNhscpg0007861 | .4988212 | chr 19: 8563202-8563420 | 0 | 81794 | ADAMTS10ᵈ |
| UHNhscpg0010637 | .5227931 | chr 20: 1875274-1875325 | 6,734 | 140885 | PTPNS1ᶜ |
| UHNhscpg0005162 | .5267622 | chr 4: 123228934-123229348 | 16,718 | 0 | TRPC3 |
| UHNhscpg0011884 | .5376706 | chr 8: 39748508-39748595 | 0 | 2515 | ADAM2ᵉ |
| UHNhscpg0005387 | .5521166 | chr 19: 19634927-19635609 | 0 | 57130 | ATP13A |
Note.— Individuals were ordered by increasing age, and gene-specific DNA methylation dynamics were investigated using the individual ages (sperm DNA–HHA age range 22–35 years) as a covariate.
Negative score = increasing methylation with respect to age. Positive score = decreasing methylation with respect to age.
chr = chromosome.
ᶜ Genes related to brain/neuronal development.
ᵈ Genes related to cancer or other disease.
ᵉ Genes related to spermatogenesis, embryogenesis, and development.
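The scoring statistic behind the table above is not defined in this section. Purely to illustrate the sign convention in the note, the following sketch substitutes a Pearson correlation between age and a hypothetical microarray signal; it also assumes that the signal falls as methylation rises, which is what the stated convention implies under that substitution. The ages, signal values, and function names are illustrative assumptions and are not taken from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def methylation_trend(score):
    """Translate the sign of a score using the convention stated in the table note."""
    if score < 0:
        return "increasing methylation with age"
    if score > 0:
        return "decreasing methylation with age"
    return "no trend"


# Hypothetical ages (within the 22-35 year range given in the note) and
# hypothetical array signals for one locus; not data from the study.
ages = [22, 25, 28, 31, 35]
signal = [1.9, 1.7, 1.6, 1.2, 1.0]

score = pearson_r(ages, signal)
print(f"score = {score:.3f}: {methylation_trend(score)}")
```

Under these assumptions, a locus whose signal declines in older donors receives a negative score and is read as gaining methylation with age.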
DNA Methylation in the Repetitive Elements
All the above analyses were performed on unique DNA sequences; however, the CpG island microarray also contains a large number of clones containing repetitive elements, which, as a rule, are heavily methylated.35 Although it is difficult to directly distinguish between methylation and copy-number differences, one possible approach is to compare methylation of repetitive elements in the sperm with that in other tissues. For this reason, the sperm DNA–HpaII data set was analyzed in comparison with the brain DNA–HpaII data set. The microarray loci that contain a single repetitive element were separated by repeat class, and the mean CV (±SD) was calculated for each class. If the repetitive elements were influencing the methylation variability, one would expect loci containing repetitive elements to display significantly different mean CVs from those of nonrepetitive loci. This analysis revealed an overall average repetitive-element CV of 10.5 in the sperm, compared with an overall average CV of 9.6 in nonrepetitive elements. The breakdown of CV for each type of repetitive element represented on the microarray is shown in figure 6. Satellite DNA repeats were significantly more variable than other repetitive elements in the sperm DNA–HpaII data set (P=6.12×10⁻¹⁷). In comparison, this effect was far less pronounced in the brain DNA–HpaII data set (P=.0027) (fig. 6A). When the satellite repeats were further separated into specific repeat classes, several classes, predominantly centromeric or pericentromeric satellite repeats, were identified as responsible for this increase in interindividual variability, including (GAATTC)n (CV=18.5), ALR/α (human α-repetitive DNA [CV=25.0]), CER (human D22Z3-centromeric–repetitive DNA [CV=18.7]), and HSATII repeats (human satellite II DNA [CV=34.8]) (fig. 6B).
Figure 6.
Repetitive-element analysis in sperm DNA–HpaII and brain DNA–HpaII data sets. A, The microarray loci that contain a single repetitive element were separated by repeat class, and the mean CV (±SD) was calculated. The repeat classes include DNA transposons (n=209), long interspersed transposable elements (LINEs [n=771]), low-complexity repeats (n=461), long terminal repeats (LTRs [n=360]), satellites (n=208), simple repeats (n=346), SINEs (n=1,058), small nuclear RNA (snRNA [n=30]), and tRNA (n=40); the nonrepetitive loci (n=6,976) are presented for comparison. The satellite repeats were the only class to show significantly increased variability in the sperm DNA–HpaII data set (P=6.12×10⁻¹⁷); the increase was less significant in the brain DNA–HpaII data set (P=.0027). B, Breakdown of the satellite repeats into specific satellite-repeat classes, which reveals several repeat classes with increased variability, predominantly the centromeric satellite repeats, including (GAATTC)n (P=8.44×10⁻¹⁷; n=55), ALR/α (P=4.08×10⁻²⁵; n=119), CER (P=.0026; n=6), and HSATII repeats (P=3.91×10⁻⁵; n=19) but not BSR/β repeats (P>.05; n=7).
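The per-class variability summary described above can be sketched as follows. This is a minimal illustration rather than the analysis pipeline used in the study: it assumes that the CV of a locus is the sample SD of its signal across individuals divided by the mean and expressed as a percentage, and that the class value is the mean of the per-locus CVs. The function names and signal values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV of one locus across individuals: sample SD / mean, as a percentage."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(values) / m


def mean_cv_by_class(loci):
    """Group per-locus CVs by repeat class.

    `loci` is an iterable of (repeat_class, signals_across_individuals).
    Returns {repeat_class: (mean CV, SD of the CVs, number of loci)}.
    """
    cvs_by_class = {}
    for repeat_class, signals in loci:
        cv = coefficient_of_variation(signals)
        cvs_by_class.setdefault(repeat_class, []).append(cv)
    summary = {}
    for repeat_class, cvs in cvs_by_class.items():
        # The SD of a single CV is not defined; report 0.0 in that case.
        spread = stdev(cvs) if len(cvs) > 1 else 0.0
        summary[repeat_class] = (mean(cvs), spread, len(cvs))
    return summary


# Hypothetical signal intensities, one value per individual; not study data.
example_loci = [
    ("satellite", [80.0, 120.0, 100.0, 140.0]),
    ("satellite", [60.0, 100.0, 90.0, 150.0]),
    ("SINE", [100.0, 105.0, 95.0, 100.0]),
    ("nonrepetitive", [100.0, 102.0, 98.0, 100.0]),
]

summary = mean_cv_by_class(example_loci)
for repeat_class, (cv_mean, cv_sd, n) in summary.items():
    print(f"{repeat_class}: mean CV = {cv_mean:.1f} (SD {cv_sd:.1f}, n = {n})")
```

Comparing the class means against the nonrepetitive loci, as in panel A, would then require a significance test that this section does not name.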
Validation of the Microarray Data with Use of Bisulphite Modification–Based Methylated/Unmethylated Cytosine Analysis
For validation of the microarray data, 12 loci that were detected as variable in the CpG island microarray analysis (table 5) were analyzed using the MS-SNuPE reaction on the ABI SNaPshot platform27 at the CpG dinucleotides in the HpaII and the Hin6I or AciI restriction sites. Initially, such loci were selected on the basis of increased variability (>90th percentile) in the sperm DNA–HHA data set; in addition, a number of these loci were also highly variable in the sperm DNA–HpaII data set (CDH13, SCAM1, MKL2, and DIRAS3). Each of the 12 selected loci was first resequenced to confirm the identity of the sequence. DNA samples from 11 individuals were treated with sodium bisulphite and were PCR amplified, and primer extension reactions were performed to interrogate 65 CpG dinucleotides within the 12 sequences. Examples of six loci are presented in figure 7. This analysis revealed methylation differences of varying magnitude at one or more restriction-enzyme sites in 11 of the 12 loci tested. Note that a DNA methylation difference at a single restriction site may be sufficient to generate significant differences in the microarray analysis. Only one locus (DIRAS3) showed no methylation differences among the 11 individuals; however, we were able to test only 5 of 20 CpG sites at this locus, so methylation variation in the untested CpG sites cannot be ruled out. To assess the replicability of the assay, we repeated the MS-SNuPE/SNaPshot experiment on five loci in five individuals. Consistent with published data,27 the results in this second round of experiments were within 5% of the first experiment, on average (range 1.7%–9.9%).
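The replicability check at the end of the preceding paragraph can be illustrated with a short sketch. It assumes that agreement is measured as the absolute difference, in percentage points, between the methylation estimates of the two runs at each CpG site; the text does not say whether the reported 5% is absolute or relative. The site labels and values are hypothetical.

```python
def replicate_differences(first_run, second_run):
    """Absolute difference (percentage points) between two runs at each CpG site."""
    if first_run.keys() != second_run.keys():
        raise ValueError("both runs must cover the same CpG sites")
    return {site: abs(first_run[site] - second_run[site]) for site in first_run}


# Hypothetical percent-methylation estimates from two rounds of experiments.
run1 = {"locusA_CpG1": 42.0, "locusA_CpG2": 10.0, "locusB_CpG1": 88.0}
run2 = {"locusA_CpG1": 45.0, "locusA_CpG2": 8.0, "locusB_CpG1": 81.0}

diffs = replicate_differences(run1, run2)
mean_diff = sum(diffs.values()) / len(diffs)
print(f"mean difference = {mean_diff:.1f} points "
      f"(range {min(diffs.values()):.1f} to {max(diffs.values()):.1f})")
```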
Table 5.
List of Clones Selected for Bisulphite-Modification MS-SNuPE Analysis
| University Health Network Accession Number | Gene | Gene Description | Chromosome Location | Total Enzyme CpGsᵃ | CpGs Testedᵇ | Variable Methylated CpGsᶜ |
| UHNhscpg0004931 | OLR1 | Oxidized low-density lipoprotein receptor 1 | 12p13.2 | 12 | 11 | 9 |
| UHNhscpg0004063 | CDH13 | Cadherin 13 preproprotein | 16q23.3 | 8 | 8 | 8 |
| UHNhscpg0002847 | SCAM1 | Sorbin and SH3 domain–containing 3 | 8p21.3 | 13 | 6 | 5 |
| UHNhscpg0003990 | NELL2 | NEL-like 2 (chicken) | 12q12 | 2 | 2 | 2 |
| UHNhscpg0003907 | NEIL2 | Nei-like 2 (Escherichia coli) | 8p23.1 | 7 | 6 | 4 |
| UHNhscpg0001947 | MKL2 | Megakaryoblastic leukemia 2 | 16p13.12 | 10 | 5 | 2 |
| UHNhscpg0000823 | 2-PDE | 2′-Phosphodiesterase | 3p14.3 | 20 | 6 | 1 |
| UHNhscpg0004641 | RHOQ | Ras-related GTP-binding protein TC10 | 2p21 | 12 | 6 | 1 |
| UHNhscpg0002006 | DIRAS3 | Ras homolog gene family, member I | 1p31.2 | 20 | 5 | 0 |
| UHNhscpg0005090 | AHR | RWD domain–containing 3 | 1p21.3 | 3 | 3 | 2 |
| UHNhscpg0009548 | DSCAM | Down syndrome cell-adhesion molecule | 21q22.2 | 4 | 3 | 3 |
| UHNhscpg0004745 | FBN1 | Fibrillin 1 (Marfan syndrome) | 15q21.1 | 2 | 2 | 2 |
ᵃ The number of CpG sites within the recognition sequence of restriction enzymes HpaII, Hin6I, and AciI.
ᵇ The number of CpG sites tested by MS-SNuPE.
ᶜ The number of CpG sites that showed variable methylation between individuals.
Figure 7.
MS-SNuPE analysis of densities of methylated cytosines in CpG dinucleotides of selected genes. Genomic DNA from 11 individuals was treated with sodium bisulphite and then was PCR amplified for each gene. The genes NELL2, SCAM1, NEIL2, MKL2, CDH13, and OLR1 are represented. The methylation status of CpG dinucleotides within each of the restriction-enzyme sites was interrogated using the primer-extension reactions. Methylation of each of the CpG dinucleotides is represented as a percentage of methylated PCR products: completely unmethylated (white circles), partially methylated (partially black circles), or completely methylated (black circles).
Finally, for further validation of the MS-SNuPE method and microarray results, we performed bisulphite genomic sequencing of 30 clones from each of five individuals on a locus within the gene that encodes cadherin 13 (CDH13 [University Health Network accession number UHNhscpg0004063]). This analysis revealed a clear-cut bimodal distribution of epialleles, with the majority of clone sequences being either mostly methylated across all 16 CpG dinucleotides tested or predominantly unmethylated. In addition, this sequencing analysis identified a C/G SNP (listed in dbSNP as rs16961372) with a minor (C) allele frequency of 0.396 in whites. Of the five individuals sequenced, one was homozygous C, one was homozygous G, and the other three were C/G heterozygous (fig. 8). Of particular interest is the substantially higher density of methylated cytosines on the G allele, whereas the C alleles predominantly exhibit a low degree of methylation. Counting all clones across all five individuals together, we found that 67 (77%) of the 87 sequences with the G allele were methylated, whereas only 14 (22%) of the 63 sequences containing the C allele were methylated (χ²=40.4; P=2.08×10⁻¹⁰).
Figure 8.
Methylation profiles of CDH13. Methylation status of 16 CpG sites surrounding the CDH13 C/G SNP across 30 clones sequenced in each of five tested individuals. Seventy-seven percent (67/87) of the G alleles are methylated (four or more methylated CpGs), whereas 78% (49/63) of the C (bisulphite-converted to T) alleles are unmethylated. The first seven CpG dinucleotides interrogated by MS-SNuPE in figure 7 are represented in this figure as CpGs 5, 6, 9, 10, 13, 15, and 16. CpG 9 corresponds to the third MS-SNuPE primer, which was predominantly unmethylated in all individuals. Each individual is represented, with single CpG dinucleotides from left to right (black = methylated; white = unmethylated) and with individual clones from top to bottom.
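The association between allele and methylation status can be examined with a 2×2 contingency test on the pooled clone counts given above. The sketch below uses a plain Pearson χ² without continuity correction and one degree of freedom. This choice is an assumption: the text does not state which variant of the test was applied or how the clones were pooled, so the statistic from this sketch should not be expected to coincide with the reported value. The function names are illustrative.

```python
import math


def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def chi_square_p_value_1df(statistic):
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    # For 1 df, the chi-square variable is the square of a standard normal one.
    return math.erfc(math.sqrt(statistic / 2.0))


# Pooled clone counts as reported in the text (methylated clones / all clones).
g_methylated, g_total = 67, 87
c_methylated, c_total = 14, 63

chi2 = pearson_chi_square_2x2(
    g_methylated, g_total - g_methylated,
    c_methylated, c_total - c_methylated,
)
p_value = chi_square_p_value_1df(chi2)

print(f"G allele methylated: {100 * g_methylated / g_total:.0f}%")
print(f"C allele methylated: {100 * c_methylated / c_total:.0f}%")
print(f"chi-square = {chi2:.1f}, P = {p_value:.2e}")
```

Pooling clones across individuals treats every clone as an independent observation; a per-individual analysis of the three heterozygotes would be a stricter alternative.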
Given that the microarray analysis suggested that promoter CpG islands were significantly more variable, we also performed bisulphite genomic sequencing of the promoter CpG island of CDH13, which was not represented on the CpG microarray. This analysis, however, found that the promoter CpG island of CDH13 is predominantly unmethylated in all individuals, with only solitary methylation sites present in one to three clones for each of the individuals.
Discussion
In this study, we performed an in-depth analysis to address the question of epigenetic variability in the germline. The main conclusions are that (1) the male germline exhibits locus-, cell-, and age-dependent DNA methylation differences and that (2) DNA methylation variation is significant across unrelated individuals, at a level that, by far, exceeds DNA sequence variation. These findings are interesting from both basic molecular biological and biomedical points of view.
First, our study contributes to the understanding of epigenetic peculiarities of gene regulatory regions in the germline. It has been generally accepted that CpG islands are predominantly unmethylated,36 which implies that DNA methylation differences would not be expected there. From our studies, we find that even relatively low densities of methylated cytosines in the CpG islands are sufficient to generate unique epigenetic profiles in DNA regions that do not exhibit any DNA sequence variation, both in different cells of the same individual and across individuals. Fine-mapping of methylated cytosines in relatively short DNA fragments of BRCA1, BRCA2, PSEN1, PSEN2, DM1, and HD suggests that each sperm cell is unique not only in terms of DNA sequence but also in epigenomic profile, and variation of the latter by far exceeds the former.
At the genomewide level, unexpectedly, promoter CpG islands exhibited larger interindividual variation compared with other single-copy DNA sequences, including the non–promoter CpG islands. This epigenetic phenomenon seems to be discordant with the general rule that functionally important loci exhibit a low degree of DNA variation, as is seen in the case of SNPs being less common in promoters and exonic sequences than in introns and intergenic regions. In addition, promoter CpG–rich regions are often highly conserved between species; for instance, the mouse genome contains 15,500 CpG islands, of which ∼10,000 are highly conserved.37 Therefore, if the epigenetic variability were just “noise” of little functional relevance, one would expect more variability in the less biologically important regions, such as introns and intergenic sequences. Evidence of the opposite (increased epigenetic variability in the regions that directly control gene activity) may indicate some peculiarities of DNA methylation machinery during gametogenesis that may or may not be of functional importance in the somatic cells (see below for discussion of the postzygotic [in]stability of inherited epigenetic profiles).
Our study has also identified a larger degree of interindividual variability of centromeric satellite repeats. Although we cannot strictly rule out the possibility of DNA copy-number differences, which are common in centromeric satellite repeats,38 the fact that the germ cell data set showed substantially larger CVs in comparison with the brain DNA data set suggests that satellite methylation differences in the germ cells could be a genuine biological phenomenon. Interindividual methylation variability in satellite repeats is consistent with current knowledge39 and may contribute to phenotypic variability in immunodeficiency-centromeric instability-facial anomalies syndrome (ICF [MIM 242860]), a disease that is associated with methylation defects in pericentromeric satellites.40 In addition, microRNAs (or siRNAs) regulate gene expression, heterochromatin formation, and genome stability and often arise from demethylation of tandem repeats that are common in pericentromeric sequences.41 Therefore, interindividual methylation variability in tandem repeats that give rise to microRNAs could also be involved in the variability in gene expression that results in inherited phenotypic variation. A recent study has described increased interindividual variability in the methylation of Alu repeats42 in whole-blood DNA, a finding that was not obvious in our analysis of germ cells, in which short interspersed transposable elements (SINEs) were not significantly different from nonrepetitive elements. However, Sandovici et al. noted that the parental-origin differences in methylation were identified only for Alu elements in pericentromeric chromosomal bands,42 which is consistent with our results.
Second, epigenetic variation within and across germline samples could be of significant interest in human morbid genetics, which, thus far, has nearly exclusively concentrated on DNA sequence differences. Inherited epigenetic variation may provide the basis for new hypotheses and experimental designs in the studies of various human diseases, where the traditional DNA sequence–based studies are reaching the limit of explanatory power. For example, although Huntington disease (HD) is caused by trinucleotide-repeat expansion in the HD gene, the correlation between the number of trinucleotide repeats and age at onset for patients with later-onset HD (at age >50 years) is low.43 The epigenetic status of the HD promoter region may contribute to the steady-state HD mRNA levels and, therefore, to the production of toxic polyglutamine-containing proteins. HD genes containing identical trinucleotide-repeat expansion but differential DNA methylation and chromatin compaction in the promoter region may exhibit significant differences, in terms of their pathogenic potential reflected in the age at disease onset and severity of disease.
The role of differential germline epigenetic modification in complex non-Mendelian disease may be even more critical. Despite significant efforts over the past several decades, DNA sequence–based risk factors have been uncovered in only a small fraction of complex diseases, such as familial breast cancer and early-onset Alzheimer disease. For a number of complex diseases, genetic epidemiological studies showed that DNA sequence differences account for only a small proportion of phenotypic variance among relatives, whereas the substantial remaining fraction of phenotypic differences (in some cancers, 58%–82%44) is typically attributed to environment. Identification of causal environmental factors is very difficult because methodologically impeccable designs in epidemiological studies, as a rule, cannot be applied to humans.45 At the same time, there is an increasing body of evidence that environmental factors play a minimal role in a number of complex traits and disease conditions.46 In this context, epigenetic variation in the germline arises as a new molecular mechanism that may aid the understanding of complex phenotypes that are not the outcome of DNA sequence variation or differential environment. The recent finding of germline epimutations of MLH1 in three individuals affected with multiple cancers11,47 provides a good starting point for a systematic search for disease-specific epimutations in the germline.12
In our bisulphite modification–based analyses, the overwhelming majority of loci exhibited rather subtle DNA methylation differences (“shades of gray” type), whereas methylation of the locus within CDH13 is clearly bimodal (“black or white” type). The cadherin gene is a putative mediator of cell-cell interaction in the heart and may act as a negative regulator of neural cell growth. This gene is not imprinted; however, the promoter is hypermethylated in numerous cancers.48–53 Of particular interest is the finding that DNA methylation profiles are associated with DNA alleles; the C allele of CDH13 is predominantly unmethylated, whereas the G allele is predominantly methylated. To our knowledge, only a few human studies have identified associations between DNA sequence and epigenetic profiles. In Beckwith-Wiedemann syndrome, loss of maternal allele–specific methylation was more common on the G allele at the T382G SNP (CAGA haplotype) of the differentially methylated region KvDMR1.54 A common variant, 677C→T, in the 5,10-methylenetetrahydrofolate reductase gene (MTHFR) is associated with an increased risk of imprinting defects in the Prader-Willi syndrome/Angelman syndrome region of 15q.55 A comprehensive screen of chromosome 21q has identified a single CpG island with a C/G SNP that was methylated in peripheral blood DNA on the C allele, regardless of the parent of origin.56 Finally, the C102T polymorphism in the serotonin 5-HT2A receptor gene (5HT2AR), which has been associated with several psychiatric disorders, was methylated specifically on the C allele.57 In mice, a recent study identified a number of genes that, on in vitro mutation, affect epigenetic reprogramming during gametogenesis and early development on a genomewide level, suggesting a further mechanism by which DNA sequence (mutations) in trans can affect the epigenetic state.58 A comprehensive epigenetic analysis of SNPs is warranted, and this effort may shed new light on rather inconsistent genetic association studies in complex disease. Epialleles and epihaplotypes that combine both DNA sequence and epigenetic information may be better predictors of the risk for a disease than either of the two components analyzed separately.
A number of genes in the sperm exhibited DNA methylation changes that correlate with age (fig. 5 and tables 3 and 4). This finding is particularly interesting in light of the evidence that older paternal age is associated with risk for schizophrenia in the offspring.59,60 Although it has been hypothesized that such effects could be due to epigenetic changes in the paternal genome, no locus-specific and age-dependent epigenetic changes in the human male germline have been identified thus far. In this study, a number of genes that show age-related changes in their DNA methylation have been detected, including a number of important developmental genes. The embryonic ectoderm development gene, EED, is a polycomb group gene involved in maintaining the epigenetically regulated repressive state of developmental genes over successive cell generations.61 CTNNA2, or catenin, is a neuronal cadherin-associated protein and may play a major role in the folding and lamination of the cerebral cortex.62 CALM1, or calmodulin, is a key calcium-modulated protein that functions in growth and in the cell cycle, as well as in signal transduction and in the synthesis and release of neurotransmitters. STMN2, or stathmin-like 2, is a neuronal growth-associated protein that shares significant amino acid–sequence similarity with the phosphoprotein stathmin, and CDH13, as described above, is the heart cadherin and is hypermethylated in a number of cancers.
All the above phenotype-related aspects were discussed under the assumption that the epigenetic peculiarities of the germline are, at least to some extent, reflected in the somatic cells after birth. What proportion and to what extent these inherited epigenetic signals can “survive” the reprogramming that immediately follows fertilization, as well as during the later stages of embryogenesis,13,63,64 remain to be investigated. The methylation clearing is not complete and, on a global DNA level, is reduced to ∼10%.65,66 That could represent 90% of all methylation for each gene being erased or could mean that 90% of methylated genes are completely cleared and that 10% of genes retain their methylation, or there could be numerous combinations of the two. It is also unknown what happens to the histone modifications through these phases of loss of DNA methylation signals. Since modifications of DNA and histones are codependent, even if the DNA methylation signals are erased, the histones may be able to carry on specific epigenetic messages to the next stage until the DNA gets remethylated. This concept of cellular memory through histone modifications has been demonstrated for polycomb group proteins through H3K27 trimethylation.67 A combined analysis of both histone modifications and DNA methylation dynamics, from zygote to postnatal stage, is required for the understanding of the importance of germline epigenetics to phenotypic outcomes.
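The ambiguity in the global figure can be made concrete with a toy calculation. In the sketch below, two extreme scenarios leave the same ∼10% of global methylation: in one, every gene keeps 10% of its methylation; in the other, 1 gene in 10 keeps all of its methylation and the rest are cleared. The gene count, the starting levels, and the choice of which genes are spared are arbitrary assumptions made for illustration only.

```python
import math


def global_methylation(levels):
    """Average methylation level across all genes (0.0 to 1.0)."""
    return sum(levels) / len(levels)


n_genes = 1000
# Hypothetical starting point: every gene fully methylated before reprogramming.
before = [1.0] * n_genes

# Scenario 1: 90% of the methylation of each gene is erased.
uniform_erasure = [level * 0.10 for level in before]

# Scenario 2: 90% of genes are cleared completely; every 10th gene is spared.
selective_erasure = [level if i % 10 == 0 else 0.0 for i, level in enumerate(before)]

for name, levels in [("uniform", uniform_erasure), ("selective", selective_erasure)]:
    retained_genes = sum(1 for level in levels if level > 0)
    print(f"{name}: global level = {global_methylation(levels):.2f}, "
          f"genes with any methylation = {retained_genes}")
```

Both scenarios give the same global level, yet they differ tenfold in the number of genes that keep any methylation signal, which is why a global measurement alone cannot distinguish them.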
The second aspect that will determine biological importance of the epigenetic variation in the germline is transgenerational epigenetic inheritance: can complex DNA methylation patterns, at least to some extent, be inherited from the parents and transmitted to the offspring? There is already experimental evidence demonstrating epigenetic meiotic inheritance across different species, such as yeast,68 Arabidopsis,69 Drosophila,70,71 and mice.20,21 Although there is no doubt that transgenerational epigenetic inheritance does exist, it is not clear if this is limited to a few loci or if it is a common genomewide phenomenon.
Acknowledgments
This research has been supported by the Special Initiative grant from the Ontario Mental Health Foundation, Canadian Institutes of Health Research, National Institute of Mental Health, and by the National Alliance for Research on Schizophrenia and Depression, the Stanley Foundation, and the Crohn’s and Colitis Foundation of Canada. We acknowledge Sigrid Ziegler for technical assistance.
Web Resources
Accession numbers and URLs for data presented herein are as follows:
- Center for Addiction and Mental Health: Epigenomics, http://www.epigenomics.ca (for the online data linking the germline epigenetic variation to the genome, by use of the UCSC Genome Browser)
- dbSNP, http://www.ncbi.nlm.nih.gov/SNP/ (for rs16961372)
- Entrez Gene, http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene (for PSEN1 [accession number 5663], PSEN2 [accession number 5664], BRCA1 [accession number 672], BRCA2 [accession number 675], HD [accession number 3064], DM1 [accession number 1760], CDH13 [accession number 1012], and genes in tables and )
- HapMap, http://www.hapmap.org/
- Human Epigenome Project, http://www.epigenome.org/index.php
- Online Mendelian Inheritance in Man (OMIM), http://www.ncbi.nlm.nih.gov/Omim/ (for PSEN1, PSEN2, BRCA1, BRCA2, DM1, HD, CDH13, and ICF)
- UCSC Genome Browser, http://genome.ucsc.edu/
- University Health Network CpG Island Microarray Database, http://data.microarrays.ca/cpg/index.htm
References
- 1.Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, et al (2001) Initial sequencing and analysis of the human genome. Nature 409:860–921 10.1038/35057062 [DOI] [PubMed] [Google Scholar]
- 2.Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, et al (2001) The sequence of the human genome. Science 291:1304–1351 10.1126/science.1058040 [DOI] [PubMed] [Google Scholar]
- 3.Altshuler D, Brooks LD, Chakravarti A, Collins FS, Daly MJ, Donnelly P (2005) A haplotype map of the human genome. Nature 437:1299–1320 10.1038/nature04226 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Chimpanzee Sequencing and Analysis Consortium (2005) Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437:69–87 10.1038/nature04072 [DOI] [PubMed] [Google Scholar]
- 5.Henikoff S, Matzke MA (1997) Exploring and explaining epigenetic effects. Trends Genet 13:293–295 10.1016/S0168-9525(97)01219-5 [DOI] [PubMed] [Google Scholar]
- 6.Li E, Bestor TH, Jaenisch R (1992) Targeted mutation of the DNA methyltransferase gene results in embryonic lethality. Cell 69:915–926 10.1016/0092-8674(92)90611-F [DOI] [PubMed] [Google Scholar]
- 7.Jaenisch R, Bird A (2003) Epigenetic regulation of gene expression: how the genome integrates intrinsic and environmental signals. Nat Genet 33:S245–S254 10.1038/ng1089 [DOI] [PubMed] [Google Scholar]
- 8.Rakyan VK, Hildmann T, Novik KL, Lewin J, Tost J, Cox AV, Andrews TD, Howe KL, Otto T, Olek A, Fischer J, Gut IG, Berlin K, Beck S (2004) DNA methylation profiling of the human major histocompatibility complex: a pilot study for the human epigenome project. PLoS Biol 2:e405 10.1371/journal.pbio.0020405 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Li E (2002) Chromatin modification and epigenetic reprogramming in mammalian development. Nat Rev Genet 3:662–673 10.1038/nrg887 [DOI] [PubMed] [Google Scholar]
- 10.Reik W, Walter J (2001) Genomic imprinting: parental influence on the genome. Nat Rev Genet 2:21–32 10.1038/35047554 [DOI] [PubMed] [Google Scholar]
- 11.Suter CM, Martin DI, Ward RL (2004) Germline epimutation of MLH1 in individuals with multiple cancers. Nat Genet 36:497–501 10.1038/ng1342 [DOI] [PubMed] [Google Scholar]
- 12.Martin DI, Ward R, Suter CM (2005) Germline epimutation: a basis for epigenetic disease in humans. Ann N Y Acad Sci 1054:68–77 10.1196/annals.1345.009 [DOI] [PubMed] [Google Scholar]
- 13.Reik W, Dean W, Walter J (2001) Epigenetic reprogramming in mammalian development. Science 293:1089–1093 10.1126/science.1063443 [DOI] [PubMed] [Google Scholar]
- 14.Morgan HD, Sutherland HG, Martin DI, Whitelaw E (1999) Epigenetic inheritance at the agouti locus in the mouse. Nat Genet 23:314–318 10.1038/15490 [DOI] [PubMed] [Google Scholar]
- 15.Allegrucci C, Thurston A, Lucas E, Young L (2005) Epigenetics and the germline. Reproduction 129:137–149 10.1530/rep.1.00360 [DOI] [PubMed] [Google Scholar]
- 16. Kimmins S, Sassone-Corsi P (2005) Chromatin remodelling and epigenetic features of germ cells. Nature 434:583–589 doi:10.1038/nature03368
- 17. Zalensky AO, Siino JS, Gineitis AA, Zalenskaya IA, Tomilin NV, Yau P, Bradbury EM (2002) Human testis/sperm-specific histone H2B (hTSH2B): molecular cloning and characterization. J Biol Chem 277:43474–43480 doi:10.1074/jbc.M206065200
- 18. Churikov D, Zalenskaya IA, Zalensky AO (2004) Male germline-specific histones in mouse and man. Cytogenet Genome Res 105:203–214 doi:10.1159/000078190
- 19. Churikov D, Siino J, Svetlova M, Zhang K, Gineitis A, Morton Bradbury E, Zalensky A (2004) Novel human testis-specific histone H2B encoded by the interrupted gene on the X chromosome. Genomics 84:745–756 doi:10.1016/j.ygeno.2004.06.001
- 20. Rakyan V, Whitelaw E (2003) Transgenerational epigenetic inheritance. Curr Biol 13:R6 doi:10.1016/S0960-9822(02)01377-5
- 21. Chong S, Whitelaw E (2004) Epigenetic germline inheritance. Curr Opin Genet Dev 14:692–696 doi:10.1016/j.gde.2004.09.001
- 22. Hajkova P, el-Maarri O, Engemann S, Oswald J, Olek A, Walter J (2002) DNA-methylation analysis by the bisulfite-assisted genomic sequencing method. Methods Mol Biol 200:143–154
- 23. Yatabe Y, Tavare S, Shibata D (2001) Investigating stem cells in human colon by using methylation patterns. Proc Natl Acad Sci USA 98:10839–10844 doi:10.1073/pnas.191225998
- 24. Heisler LE, Torti D, Boutros PC, Watson J, Chan C, Winegarden N, Takahashi M, Yau P, Huang TH, Farnham PJ, Jurisica I, Woodgett JR, Bremner R, Penn LZ, Der SD (2005) CpG island microarray probe sequences derived from a physical library are representative of CpG islands annotated on the human genome. Nucleic Acids Res 33:2952–2961 doi:10.1093/nar/gki582
- 25. Schumacher A, Kapranov P, Kaminsky Z, Flanagan J, Assadzadeh A, Yau P, Virtanen C, Winegarden N, Cheng J, Gingeras T, Petronis A (2006) Microarray-based DNA methylation profiling: technology and applications. Nucleic Acids Res 34:528–542 doi:10.1093/nar/gkj461
- 26. Dudoit S, Fridlyand J (2002) A prediction-based resampling method for estimating the number of clusters in a dataset. Genome Biol 3:RESEARCH0036
- 27. Kaminsky ZA, Assadzadeh A, Flanagan J, Petronis A (2005) Single nucleotide extension technology for quantitative site-specific evaluation of metC/C in GC-rich regions. Nucleic Acids Res 33:e95 doi:10.1093/nar/gni094
- 28. Gardiner-Garden M, Frommer M (1987) CpG islands in vertebrate genomes. J Mol Biol 196:261–282 doi:10.1016/0022-2836(87)90689-9
- 29. Sebat J, Lakshmi B, Troge J, Alexander J, Young J, Lundin P, Maner S, Massa H, Walker M, Chi M, Navin N, Lucito R, Healy J, Hicks J, Ye K, Reiner A, Gilliam TC, Trask B, Patterson N, Zetterberg A, Wigler M (2004) Large-scale copy number polymorphism in the human genome. Science 305:525–528 doi:10.1126/science.1098918
- 30. Tuzun E, Sharp AJ, Bailey JA, Kaul R, Morrison VA, Pertz LM, Haugen E, Hayden H, Albertson D, Pinkel D, Olson MV, Eichler EE (2005) Fine-scale structural variation of the human genome. Nat Genet 37:727–732 doi:10.1038/ng1562
- 31. Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, Scherer SW, Lee C (2004) Detection of large-scale variation in the human genome. Nat Genet 36:949–951 doi:10.1038/ng1416
- 32. Craig JM, Bickmore WA (1993) Chromosome bands: flavours to savour. Bioessays 15:349–354 doi:10.1002/bies.950150510
- 33. Furey TS, Haussler D (2003) Integration of the cytogenetic map with the draft human genome sequence. Hum Mol Genet 12:1037–1044 doi:10.1093/hmg/ddg113
- 34. Holmquist GP (1992) Chromosome bands, their chromatin flavors, and their functional features. Am J Hum Genet 51:17–37
- 35. Yoder JA, Walsh CP, Bestor TH (1997) Cytosine methylation and the ecology of intragenomic parasites. Trends Genet 13:335–340 doi:10.1016/S0168-9525(97)01181-5
- 36. Bird AP (1986) CpG-rich islands and the function of DNA methylation. Nature 321:209–213 doi:10.1038/321209a0
- 37. Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, et al (2002) Initial sequencing and comparative analysis of the mouse genome. Nature 420:520–562 doi:10.1038/nature01262
- 38. Jabs EW, Goble CA, Cutting GR (1989) Macromolecular organization of human centromeric regions reveals high-frequency, polymorphic macro DNA repeats. Proc Natl Acad Sci USA 86:202–206 doi:10.1073/pnas.86.1.202
- 39. Miniou P, Jeanpierre M, Bourc’his D, Coutinho Barbosa AC, Blanquet V, Viegas-Pequignot E (1997) α-Satellite DNA methylation in normal individuals and in ICF patients: heterogeneous methylation of constitutive heterochromatin in adult and fetal tissues. Hum Genet 99:738–745 doi:10.1007/s004390050441
- 40. Gisselsson D, Shao C, Tuck-Muller CM, Sogorovic S, Palsson E, Smeets D, Ehrlich M (2005) Interphase chromosomal abnormalities and mitotic missegregation of hypomethylated sequences in ICF syndrome cells. Chromosoma 114:118–126 doi:10.1007/s00412-005-0343-7
- 41. Lippman Z, Martienssen R (2004) The role of RNA interference in heterochromatic silencing. Nature 431:364–370 doi:10.1038/nature02875
- 42. Sandovici I, Kassovska-Bratinova S, Loredo-Osti JC, Leppert M, Suarez A, Stewart R, Bautista FD, Schiraldi M, Sapienza C (2005) Interindividual variability and parent of origin DNA methylation differences at specific human Alu elements. Hum Mol Genet 14:2135–2143 doi:10.1093/hmg/ddi218
- 43. Wexler NS, Lorimer J, Porter J, Gomez F, Moskowitz C, Shackell E, Marder K, et al (2004) Venezuelan kindreds reveal that genetic and environmental factors modulate Huntington’s disease age of onset. Proc Natl Acad Sci USA 101:3498–3503 doi:10.1073/pnas.0308679101
- 44. Lichtenstein P, Holm NV, Verkasalo PK, Iliadou A, Kaprio J, Koskenvuo M, Pukkala E, Skytthe A, Hemminki K (2000) Environmental and heritable factors in the causation of cancer: analyses of cohorts of twins from Sweden, Denmark, and Finland. N Engl J Med 343:78–85 doi:10.1056/NEJM200007133430201
- 45. Taubes G (1995) Epidemiology faces its limits. Science 269:164–169
- 46. Wong AH, Gottesman II, Petronis A (2005) Phenotypic differences in genetically identical organisms: the epigenetic perspective. Hum Mol Genet 14:R11–R18 doi:10.1093/hmg/ddi116
- 47. Hitchins M, Williams R, Cheong K, Halani N, Lin VA, Packham D, Ku S, Buckle A, Hawkins N, Burn J, Gallinger S, Goldblatt J, Kirk J, Tomlinson I, Scott R, Spigelman A, Suter C, Martin D, Suthers G, Ward R (2005) MLH1 germline epimutations as a factor in hereditary nonpolyposis colorectal cancer. Gastroenterology 129:1392–1399 doi:10.1053/j.gastro.2005.09.003
- 48. Hibi K, Kodera Y, Ito K, Akiyama S, Nakao A (2004) Methylation pattern of CDH13 gene in digestive tract cancers. Br J Cancer 91:1139–1142
- 49. Ogama Y, Ouchida M, Yoshino T, Ito S, Takimoto H, Shiote Y, Ishimaru F, Harada M, Tanimoto M, Shimizu K (2004) Prevalent hyper-methylation of the CDH13 gene promoter in malignant B cell lymphomas. Int J Oncol 25:685–691
- 50. Sakai M, Hibi K, Koshikawa K, Inoue S, Takeda S, Kaneko T, Nakao A (2004) Frequent promoter methylation and gene silencing of CDH13 in pancreatic cancer. Cancer Sci 95:588–591 doi:10.1111/j.1349-7006.2004.tb02491.x
- 51. Hibi K, Nakayama H, Kodera Y, Ito K, Akiyama S, Nakao A (2004) CDH13 promoter region is specifically methylated in poorly differentiated colorectal cancer. Br J Cancer 90:1030–1033 doi:10.1038/sj.bjc.6601647
- 52. Widschwendter A, Ivarsson L, Blassnig A, Muller HM, Fiegl H, Wiedemair A, Muller-Holzner E, Goebel G, Marth C, Widschwendter M (2004) CDH1 and CDH13 methylation in serum is an independent prognostic marker in cervical cancer patients. Int J Cancer 109:163–166 doi:10.1002/ijc.11706
- 53. Roman-Gomez J, Castillejo JA, Jimenez A, Cervantes F, Boque C, Hermosin L, Leon A, Granena A, Colomer D, Heiniger A, Torres A (2003) Cadherin-13, a mediator of calcium-dependent cell-cell adhesion, is silenced by methylation in chronic myeloid leukemia and correlates with pretreatment risk profile and cytogenetic response to interferon alfa. J Clin Oncol 21:1472–1479 doi:10.1200/JCO.2003.08.166
- 54. Murrell A, Heeson S, Cooper WN, Douglas E, Apostolidou S, Moore GE, Maher ER, Reik W (2004) An association between variants in the IGF2 gene and Beckwith-Wiedemann syndrome: interaction between genotype and epigenotype. Hum Mol Genet 13:247–255 doi:10.1093/hmg/ddh013
- 55. Zogel C, Bohringer S, Gross S, Varon R, Buiting K, Horsthemke B (2006) Identification of cis- and trans-acting factors possibly modifying the risk of epimutations on chromosome 15. Eur J Hum Genet (http://www.nature.com/ejhg/journal/vaop/ncurrent/full/5201602a.html) (electronically published April 5, 2006; accessed May 22, 2006)
- 56. Yamada Y, Watanabe H, Miura F, Soejima H, Uchiyama M, Iwasaka T, Mukai T, Sakaki Y, Ito T (2004) A comprehensive analysis of allelic methylation status of CpG islands on human chromosome 21q. Genome Res 14:247–266 doi:10.1101/gr.1351604
- 57. Polesskaya OO, Aston C, Sokolov BP (2006) Allele C-specific methylation of the 5-HT2A receptor gene: evidence for correlation with its expression and expression of DNA methylase DNMT1. J Neurosci Res 83:362–373 doi:10.1002/jnr.20732
- 58. Blewitt ME, Vickaryous NK, Hemley SJ, Ashe A, Bruxner TJ, Preis JI, Arkell R, Whitelaw E (2005) An N-ethyl-N-nitrosourea screen for genes involved in variegation in the mouse. Proc Natl Acad Sci USA 102:7629–7634 doi:10.1073/pnas.0409375102
- 59. Malaspina D, Harlap S, Fennig S, Heiman D, Nahon D, Feldman D, Susser ES (2001) Advancing paternal age and the risk of schizophrenia. Arch Gen Psychiatry 58:361–367 doi:10.1001/archpsyc.58.4.361
- 60. Byrne M, Agerbo E, Ewald H, Eaton WW, Mortensen PB (2003) Parental age and risk of schizophrenia: a case-control study. Arch Gen Psychiatry 60:673–678 doi:10.1001/archpsyc.60.7.673
- 61. Montgomery ND, Yee D, Chen A, Kalantry S, Chamberlain SJ, Otte AP, Magnuson T (2005) The murine polycomb group protein Eed is required for global histone H3 lysine-27 methylation. Curr Biol 15:942–947 doi:10.1016/j.cub.2005.04.051
- 62. Smith A, Bourdeau I, Wang J, Bondy CA (2005) Expression of catenin family members CTNNA1, CTNNA2, CTNNB1 and JUP in the primate prefrontal cortex and hippocampus. Brain Res Mol Brain Res 135:225–231 doi:10.1016/j.molbrainres.2004.12.025
- 63. Oswald J, Engemann S, Lane N, Mayer W, Olek A, Fundele R, Dean W, Reik W, Walter J (2000) Active demethylation of the paternal genome in the mouse zygote. Curr Biol 10:475–478 doi:10.1016/S0960-9822(00)00448-6
- 64. Mayer W, Niveleau A, Walter J, Fundele R, Haaf T (2000) Demethylation of the zygotic paternal genome. Nature 403:501–502
- 65. Walsh CP, Chaillet JR, Bestor TH (1998) Transcription of IAP endogenous retroviruses is constrained by cytosine methylation. Nat Genet 20:116–117 doi:10.1038/2413
- 66. Hajkova P, Erhardt S, Lane N, Haaf T, El-Maarri O, Reik W, Walter J, Surani MA (2002) Epigenetic reprogramming in mouse primordial germ cells. Mech Dev 117:15–23 doi:10.1016/S0925-4773(02)00181-8
- 67. Czermin B, Imhof A (2003) The sounds of silence: histone deacetylation meets histone methylation. Genetica 117:159–164 doi:10.1023/A:1022927725945
- 68. Klar AJ (1998) Propagating epigenetic states through meiosis: where Mendel’s gene is more than a DNA moiety. Trends Genet 14:299–301 doi:10.1016/S0168-9525(98)01535-2
- 69. Mittelsten Scheid O, Afsar K, Paszkowski J (2003) Formation of stable epialleles and their paramutation-like interaction in tetraploid Arabidopsis thaliana. Nat Genet 34:450–454 doi:10.1038/ng1210
- 70. Cavalli G, Paro R (1998) The Drosophila Fab-7 chromosomal element conveys epigenetic inheritance during mitosis and meiosis. Cell 93:505–518 doi:10.1016/S0092-8674(00)81181-2
- 71. Cavalli G, Paro R (1999) Epigenetic inheritance of active chromatin after removal of the main transactivator. Science 286:955–958 doi:10.1126/science.286.5441.955









