Significance
The main insight from this study is that the role of 5-hydroxymethylcytosine (5hmC) in postmitotic neurons is to sculpt the genome occupancy of the very abundant 5-methylcytosine binding protein 2 (MeCP2). Accumulation of 5hmCG in transcribed genes replaces high-affinity 5mCG binding sites with low-affinity sites, decreasing MeCP2 occupancy over the transcription unit and removing its repressive effect. We refer to this role for 5hmCG as “functional demethylation” because its biochemical effect with respect to MeCP2 is equivalent to chemical demethylation: Loss of high-affinity sites for interaction in the genome. This concept reinforces the roles of 5hmC in demethylation in dividing cells by a mechanism that achieves the same goal without requiring cell division or DNA damage.
Keywords: epigenetics, 5-hydroxymethylcytosine, neuron, MeCP2
Abstract
5-hydroxymethylcytosine (5hmC) occurs at maximal levels in postmitotic neurons, where its accumulation is cell-specific and correlated with gene expression. Here we demonstrate that the distribution of 5hmC in CG and non-CG dinucleotides is distinct and that it reflects the binding specificity and genome occupancy of methylcytosine binding protein 2 (MeCP2). In expressed gene bodies, accumulation of 5hmCG acts in opposition to 5mCG, resulting in “functional” demethylation and diminished MeCP2 binding, thus facilitating transcription. Non-CG hydroxymethylation occurs predominantly in CA dinucleotides (5hmCA) and it accumulates in regions flanking active enhancers. In these domains, oxidation of 5mCA to 5hmCA does not alter MeCP2 binding or expression of adjacent genes. We conclude that the role of 5-hydroxymethylcytosine in postmitotic neurons is to functionally demethylate expressed gene bodies while retaining the role of MeCP2 in chromatin organization.
Proper maintenance of genomic cytosine methylation is essential for the normal functions of mammalian cells (1). The discovery of 5-hydroxymethylcytosine (5hmC) and the ten-eleven translocation (TET) proteins (2, 3) has led to important insights into 5-methylcytosine (5mC) oxidation and its roles in DNA demethylation (4). Passive demethylation occurs following oxidation of 5mC to 5hmC in dividing cells because 5hmC prevents the maintenance DNA methyltransferase complex (DNMT1/UHRF1) from acting on newly synthesized hemimethylated DNA to restore 5mC on the nascent strand. This results in dilution of 5hmC in genomic DNA and, in successive rounds of division, demethylation (4). Oxidation of 5hmC to 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC) can also occur (5). These derivatives are removed by thymine-DNA glycosylase and replaced with cytosine by the mammalian base excision repair mechanism, thus resulting in active DNA demethylation (6). Taken together, these mechanisms provide for both genome-wide and local cytosine demethylation in dividing cell populations.
In postmitotic neurons, 5hmC accumulates to approximately 10 times the levels present in peripheral cell types (2, 7, 8) and it serves as a stable epigenetic mark. Maximal accumulation of 5hmC occurs in expressed gene bodies (9–13). Although the distribution of 5hmC is cell-specific, the ratio of 5hmC/5mC correlates well with gene expression in all characterized neuronal cell types (11). The discovery that the most abundant 5mC binding protein in neurons, the methyl CpG binding protein 2 (MeCP2), can bind 5mC and 5hmC with similar affinities led to the hypothesis that MeCP2 is an important “reader” of this new epigenetic mark, and that together they constitute a cell-specific mechanism for the regulation of chromatin structure and gene expression (11). Recent data indicate, however, that the binding of recombinant MeCP2 to 5hmC containing DNA depends on the context in which it occurs (14–16). Thus, MeCP2 binds with similar high affinities to 5mCG, 5mCA, and 5hmCA containing DNA, but its binding to 5hmCG occurs at a low affinity similar to that measured for unmethylated DNA. Given the importance of MeCP2 for normal neuronal function demonstrated through studies of Rett syndrome mouse models (17, 18) and the differences in MeCP2 binding to 5hmCG and 5hmCA containing DNA (14–16), we have employed 5hmC mapping using bisulfite-sequencing data (MethylC-Seq) (10) and oxidative bisulfite sequencing (oxBS-Seq) (19), high-resolution mapping of native MeCP2 binding sites using the Occupied Regions of Genomes from Affinity-purified Naturally Isolated Chromatin (ORGANIC) approach (20), and ChIP mapping of informative histone modifications to understand the biochemical consequences of 5hmC accumulation in a single neuronal cell type in vivo.
Results
5hmCG and 5hmCH Occur Predominantly in Euchromatin.
To investigate further the distributions and functions of 5hmC, we have chosen to focus on cerebellar granule cells. Granule cells are an abundant, postmitotic neuron type that provides an opportunity to examine and distinguish between the highly condensed, inactive domains characteristic of heterochromatin (HC), and the less compact, functionally active domains of euchromatin (EC) (21). Since standard MethylC-Seq does not distinguish between methylated and hydroxymethylated sites (10, 19) we used in parallel genome-wide oxBS-Seq because comparative analysis of these datasets allows 5hmC levels to be determined with single-nucleotide resolution (19). MethylC-Seq data and oxBS-Seq data were collected from three independent biological replicates yielding average coverages of 13.5 and 25.4 per cytosine, respectively (cutoff of three reads, Dataset S1). The correlation coefficients for three oxBS-Seq technical replicates within the same biological replicate (pairwise Pearson r ≥ 0.99 for 5mCG and ≥0.92 for 5mCA in 1-kb bins) demonstrated that the oxidation and bisulfite conversion were highly reproducible. Analysis of the spike-in controls for the MethylC-Seq and oxBS-Seq biological replicates (Fig. S1A and Dataset S1) showed CG nonconversion rates of 0.59% for MethylC-Seq and 0.21% for oxBS-Seq and CH nonconversion rates of 0.46% and 0.21%, respectively. The oxBS-Seq 5hmC efficiency of oxidation (measured as the difference between 5hmC conversion in oxBS-Seq and MethylC-Seq) was 91.93% and 92.94% for CG and CH, respectively. Together these data demonstrate that oxBS-Seq can be used to assess accurately genome-wide CG and non-CG methylation and hydroxymethylation in granule cell DNA samples (Fig. S1A and Dataset S1).
To estimate the percentage of methylation and hydroxymethylation at each cytosine and to understand their distribution in the genome we used the maximum likelihood methylation levels (MLML) method described by Qu et al. (22). This approach combines information from MethylC-Seq and oxBS-Seq to arrive at maximum likelihood estimates for the 5mC and 5hmC levels per cytosine. A binomial test is performed for each methylation level calculated. If the estimated methylation level falls out of the confidence interval calculated from input coverage and methylation level, then this is counted as a conflict and removed from the analysis.
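For illustration only, the way the two assays combine at a single cytosine can be sketched as follows. MethylC-Seq reports 5mC plus 5hmC, whereas oxBS-Seq reports 5mC alone, so when the two readings are consistent the 5hmC level is their difference. The sketch is a simplified stand-in and is not the MLML implementation of Qu et al.: the function names, the handling of the inconsistent case (pooling both assays with 5hmC set to zero), and the use of a 95% Wilson interval as the binomial confidence interval are all assumptions made for the example.

```python
import math


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial proportion k/n (z=1.96 ~ 95%)."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def estimate_levels(bs_meth, bs_total, ox_meth, ox_total):
    """Return (5mC level, 5hmC level, conflict flag) for one cytosine.

    bs_*: unconverted / total reads in MethylC-Seq (reports 5mC + 5hmC)
    ox_*: unconverted / total reads in oxBS-Seq   (reports 5mC only)
    Totals are assumed to be > 0 (the text applies a three-read cutoff).
    """
    bs_level = bs_meth / bs_total
    ox_level = ox_meth / ox_total
    if ox_level <= bs_level:
        # Consistent readings: 5hmC is the difference between the assays.
        return ox_level, bs_level - ox_level, False
    # Inconsistent readings (assumed handling): set 5hmC to 0, pool the
    # assays, and flag a conflict if the pooled level lies outside the
    # confidence interval of either input.
    pooled = (bs_meth + ox_meth) / (bs_total + ox_total)
    lo_bs, hi_bs = wilson_interval(bs_meth, bs_total)
    lo_ox, hi_ox = wilson_interval(ox_meth, ox_total)
    conflict = not (lo_bs <= pooled <= hi_bs and lo_ox <= pooled <= hi_ox)
    return pooled, 0.0, conflict
```

With 16 of 20 unconverted reads in MethylC-Seq and 12 of 20 in oxBS-Seq, this sketch yields 60% 5mC and 20% 5hmC; a site where oxBS-Seq reads far exceed MethylC-Seq reads is flagged as a conflict.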
Our data (Fig. 1A) are generally consistent with previous reports of neuronal cytosine modification (9, 10, 12). Thus, of the modified CG dinucleotides, methylcytosine (∼60%) is ∼4.6 times more abundant than hydroxymethylcytosine CG (∼13%) (Fig. 1A). The extent of cytosine modification in the CH context (defined as C followed by A, C, or T, ∼1.5%) also agrees well with previous MethylC-Seq studies of total non-CG methylation in neurons (9, 10, 12) (Fig. 1A), as does our consensus nucleotide context indicating that non-CG methylation occurs predominantly as 5mCA (Fig. S1C). Our data demonstrate that 5hmCH (0.57%) occurs at reduced but substantial levels in granule cell genomes relative to 5mCH (0.94%). Although these values are higher than the levels of 5hmCA determined using Tet-assisted bisulfite sequencing to study 5hmC in neurons in the prefrontal cortex (10), the levels of 5hmC detection by oxBS-Seq are consistent with recent studies that used Pvu-Seal-Seq to determine that ∼24% of 5hmC in mouse embryonic stem cells occurs in the non-CpG context (23, 24).
To determine the distribution of cytosine modifications in the granule cell genome we calculated the average percentage for each modification in 100-kb windows across the genome (Fig. 1B). As expected, 5mCG was the most abundant modification genome-wide (median value 68.5%), and the majority of the 100-kb genomic intervals were heavily methylated. 5hmCG was significantly less abundant in each interval (median value 16.9%). The DNA fragment distribution for cytosine modifications in the CA context revealed that the majority of the genome contained low levels of both 5mCA (1.0%) and 5hmCA (0.65%) with few large intervals containing levels of modified CH that approach those seen in the CG context (Fig. 1B).
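The windowed averaging described above can be sketched as below. This is a minimal example under stated assumptions: the input layout (chromosome, position, per-cytosine level) and the function name are hypothetical, and the sketch takes an unweighted mean of per-site levels, whereas the analysis in the study may have weighted sites by read coverage.

```python
from collections import defaultdict

WINDOW = 100_000  # 100-kb windows, as in the text


def window_means(sites, window=WINDOW):
    """Average per-cytosine levels within fixed genomic windows.

    sites: iterable of (chrom, position, level) tuples (hypothetical layout)
    returns: {(chrom, window_index): mean level in that window}
    """
    sums = defaultdict(lambda: [0.0, 0])  # running [total, count] per window
    for chrom, pos, level in sites:
        acc = sums[(chrom, pos // window)]
        acc[0] += level
        acc[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}
```

The median of the returned values across all windows corresponds to the per-modification median values quoted in the text.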
To determine whether the distribution of each form of cytosine modification in granule cell genomes was highly correlated in independent biological replicates, a Pearson’s correlation coefficient matrix for the triplicate samples was generated (Fig. 1C). These data illustrate that the gross distributions of cytosine modifications mapped using MethylC-Seq and oxBS-Seq data are highly reproducible. They also reveal two features of these distributions that are of interest. First, although the distributions of 5mCH and 5hmCH are strongly correlated, the distributions of 5mCG and 5hmCG are inversely correlated; second, the average values per region of 5hmCG are more closely correlated with 5mCA and 5hmCA than they are with 5mCG. These data suggest that the accumulation of 5hmCG, 5mCA, and 5hmCA reflects aspects of granule cell development and nuclear structure that are not shared by 5mCG.
Given previous studies demonstrating enrichment of 5mCG in HC (25), we were interested in whether the distinct distributions of 5hmCG, 5mCA, and 5hmCA might reflect their accumulation in EC. To map heterochromatic and euchromatic domains in granule cell nuclei, ChIP-Seq data were collected for histone marks characteristic of EC (H3K27Ac) and HC (H3K9me3) (26). The ratio between these histone marks (H3K27Ac/H3K9me3) was used as a measurement of chromatin organization (EC/HC). DNA accessibility in euchromatic domains was confirmed using Assay for Transposase-Accessible Chromatin with high-throughput sequencing (27) (ATAC-Seq; Fig. 1E). We then rank-ordered the genomic segments analyzed above based on EC/HC and plotted cytosine methylation status for each replicate relative to this measure. As expected, 5mCG accumulates in heterochromatic domains in granule cell nuclei, and it is relatively depleted in EC (Spearman rho = −0.62) (Fig. 1D). In contrast, 5hmCG (rho = 0.65), 5mCA (rho = 0.25), and 5hmCA (rho = 0.51) are all enriched in euchromatic regions and depleted in heterochromatic segments.
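The rank-based comparison above can be sketched as follows, assuming per-window signal values for the two histone marks and per-window modification levels are already available. The pseudocount that guards against division by zero and all function names are assumptions for this example, and the Spearman coefficient is computed in the standard way, as the Pearson correlation of average ranks.

```python
import math


def rank(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rho: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def ec_hc_ratio(h3k27ac, h3k9me3, pseudocount=1.0):
    """Per-window H3K27Ac/H3K9me3 ratio; the pseudocount is an assumption."""
    return [(a + pseudocount) / (h + pseudocount)
            for a, h in zip(h3k27ac, h3k9me3)]
```

A negative rho between the EC/HC ratio and a modification indicates enrichment in heterochromatic windows, as reported for 5mCG, and a positive rho indicates enrichment in euchromatic windows.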
The distributions presented above are consistent with the observation that modifications in CA dinucleotides are abundant in postmitotic neurons and depleted in regions with low accessibility (10, 12). However, inspection of cytosine modification data relative to chromatin organization and gene expression in the genome browser provides additional insights into the relationships between 5hmCG and 5hmCA (Fig. 1E and Fig. S2A). Three properties are evident. First, although reproducible distributions of each modification are clearly revealed in both of the biological replicates shown, the data do not correspond precisely at each genomic position. Rather, it is their enrichment or depletion over specific genomic features, such as gene bodies, promoters, or enhancers, that is conserved. Second, as expected, the accumulation of 5hmCG and depletion of 5mCG over expressed gene bodies (e.g., Grm4) is clearly evident in the MethylC-Seq and oxBS-Seq combined data. Third, although 5hmCG, 5mCA, and 5hmCA are all depleted in heavily heterochromatic foci (Fig. S3B), the distributions of 5mCA and 5hmCA in EC are complex and differ from 5hmCG. For example, both 5mCA and 5hmCA are depleted from the highly expressed genes in Fig. 1E, indicating that the functions of 5hmC may vary depending on sequence context.
5hmCG Accumulation in Expressed Gene Bodies Results in “Functional Demethylation.”
The observation that 5hmCG accumulates in active gene bodies is interesting given recent data demonstrating that recombinant MeCP2 does not bind to 5hmCG (14–16). To determine whether this binding specificity is retained in vivo, we performed pull-down assays in cerebellar extracts using beads carrying control, methylated, or hydroxymethylated cytosine residues in a DNA fragment modeled on an endogenous sequence containing both CG and CA dinucleotides, or in precisely sequence-matched DNA probes containing only CA or CG dinucleotides (Fig. 2A). These data confirm the in vitro binding specificity of MeCP2 and demonstrate that brain-expressed MeCP2 binds with high affinity to 5mCA, 5hmCA, and 5mCG but that it does not bind to 5hmCG.
To understand the importance of sequence context, methylation, and hydroxymethylation on gene expression we next evaluated the accumulation of each cytosine modification in expressed euchromatic genes relative to those present in HC (Fig. 2B and Dataset S2). Highly expressed gene bodies in EC contain elevated levels of 5hmCG, which does not bind MeCP2, and they are depleted for the high-affinity MeCP2 binding residues 5mCG, 5mCA, and 5hmCA (Fig. 2B, Upper). Genes that are expressed poorly or not at all in euchromatic domains show the opposite properties: They accumulate high-affinity MeCP2 binding sites (5mCG, 5mCA, and 5hmCA), and 5hmCG is not abundant in these transcription units. In HC, most genes are either not expressed or expressed at a very low level (Fig. S2B). As expected, these genes are highly methylated, and 5hmCG and 5hmCA do not accumulate (Fig. 2B, Lower). Granule cell promoters are demethylated relative to gene bodies in both EC and HC (Fig. 2B). As previously reported for cortical neurons (12), granule cells also contain a class of developmental regulatory genes that are hypomethylated, for example the HOX group of homeobox genes (Fig. 2F, Right). These are present in HC, they are not expressed, and they are nearly completely demethylated—thus, they have little or no modified cytosines that can serve as high-affinity binding sites for MeCP2.
Taken together with the binding specificity of MeCP2, the contrasting relationships of 5hmCG and 5hmCA to gene expression suggest distinct roles for these two forms of hydroxymethylation. To determine whether these differences in gene body cytosine methylation predict MeCP2 genome occupancy in granule cells we used the ORGANIC profiling methodology (28). This is a modified native chromatin immunoprecipitation assay that does not use cross-linking or sonication, resulting in improved sensitivity and specificity for mapping of native protein–DNA interactions (Fig. S3 B and C). We then used the ORGANIC sequencing data to quantify MeCP2 enrichment in the gene body for each of the expressed transcript categories defined above (Dataset S2). As expected, we observed an inverse relationship between gene expression and the level of MeCP2 binding in the gene body (Fig. 2C). Thus, highly expressed genes bind low levels of MeCP2 and poorly expressed genes are enriched in MeCP2 binding, in agreement with a repressive role for MeCP2 in transcription (17, 18, 29). To reveal the relative contribution of each type of modified cytosine to MeCP2 binding in expressed genes we applied the random forest regressor algorithm (30). As shown in Fig. 2D, this analysis indicates that the most important contributions to MeCP2 gene occupancy come from 5mCG and 5hmCG, and that a significant but less important contribution comes from the levels of 5mCA and 5hmCA. Since 5mCG and 5hmCG act in opposition with respect to MeCP2 binding, we compared MeCP2 binding with the ratios of 5hmCG/5mCG and 5hmCA/5mCA. In each context, genes were binned into quintiles based on the ratio of 5hmC/5mC, and MeCP2 enrichment in each quintile plotted over the gene body. We find that there is a very strong inverse correlation between 5hmCG/5mCG and binding: Euchromatic genes with a high ratio of 5hmCG/5mCG bind the lowest levels of MeCP2, and those with low ratios of 5hmCG/5mCG are enriched in MeCP2 binding (Fig. 2 E and F). In contrast, the relative levels of 5hmCA and 5mCA in gene bodies are not predictive. Although depletion of 5mCA and 5hmCA over expressed gene bodies also occurs (Fig. 2B), these data demonstrate that the levels of 5hmCA and 5mCA in expressed genes must at most play a contributory role for MeCP2 binding in active genes. Taken together, our data suggest that the main role of 5hmCG in postmitotic neurons is to convert high-affinity 5mCG MeCP2 binding sites to 5hmCG low-affinity binding sites, thereby reducing the binding of MeCP2 to 5mCG and facilitating transcription. We refer to this process as “functional” demethylation because the consequence of 5mCG oxidation to 5hmCG is functionally equivalent to its demethylation with respect to MeCP2 binding.
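The quintile comparison described above can be sketched in a few lines. This is an illustrative example only: the per-gene record layout (keys `hmCG`, `mCG`, and `mecp2`) and the function name are hypothetical, genes with no measurable 5mCG are dropped to avoid division by zero, and the sketch reports one mean MeCP2 enrichment value per quintile rather than a profile plotted over the gene body.

```python
def quintile_means(genes, n_bins=5):
    """Mean MeCP2 enrichment per 5hmCG/5mCG quintile, lowest ratio first.

    genes: list of dicts with hypothetical keys
           'hmCG' (5hmCG level), 'mCG' (5mCG level), 'mecp2' (enrichment)
    returns: list of n_bins means (None for an empty bin)
    """
    usable = [g for g in genes if g["mCG"] > 0]  # ratio undefined otherwise
    ranked = sorted(usable, key=lambda g: g["hmCG"] / g["mCG"])
    means = []
    for b in range(n_bins):
        start = b * len(ranked) // n_bins
        stop = (b + 1) * len(ranked) // n_bins
        chunk = ranked[start:stop]
        if chunk:
            means.append(sum(g["mecp2"] for g in chunk) / len(chunk))
        else:
            means.append(None)
    return means
```

Under the relationship reported for euchromatic genes, the returned means would be expected to decrease from the first quintile (low 5hmCG/5mCG) to the last (high 5hmCG/5mCG); applying the same function to 5hmCA and 5mCA levels gives the corresponding comparison for the CA context.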
In HC, 5mCG is the most important feature for the binding of MeCP2 (Fig. 2D, Lower). The levels of 5hmCG and 5hmCA are very low, and these genes generally bind high levels of MeCP2. However, there is a class of heterochromatic genes with very low binding of MeCP2 (Fig. 2E, Lower and Fig. 2F). These genes correspond to the hypo- or demethylated developmental regulatory genes previously mentioned (12). The fact that they are not expressed despite very low binding of MeCP2 indicates that the mechanisms responsible for repression of heterochromatic genes do not depend solely on the levels of MeCP2.
Functional Demethylation of Expressed Genes Contributes to the Modulation of Gene Expression by MeCP2.
It has been established that loss of MeCP2 in specific neurons leads to dysregulation of a large number of genes (11, 15–18, 30, 31). If functional demethylation of expressed genes contributes to the modulation of gene expression by MeCP2, then both the magnitude and direction of change in MeCP2null granule cells should be related to methylation status and MeCP2 binding in the gene body. To determine if this is the case, we calculated the effect sizes for the change in expression observed in granule cells upon loss of MeCP2 in euchromatic (Fig. 3A) and heterochromatic (Fig. S4A) genes (Dataset S2). Positive effect sizes are associated with those genes whose expression is increased in the MeCP2null. Negative effect sizes represent those genes whose expression decreases in MeCP2null granule cells. Comparison of effect size with features of granule cell genes revealed several aspects of MeCP2 function in neurons. As expected, MeCP2 enrichment in euchromatic gene bodies significantly correlates with both the magnitude and direction of the change in gene expression in MeCP2null granule cells. Thus, expression of genes enriched for MeCP2 increases as a consequence of its loss, whereas those depleted in MeCP2 relative to the genome average decrease expression in MeCP2null neurons. We conclude that MeCP2 binding has a repressive effect on transcription in euchromatic, expressed genes, and that those genes whose expression increases in MeCP2null granule cells are likely direct binding targets of MeCP2. Since the ratio of 5hmCG/5mCG is inversely correlated with MeCP2 occupancy, this relationship is also predictive for alterations in expression in response to loss of MeCP2. Interestingly, neither the level of expression nor the ratio of 5hmCA to 5mCA determines the magnitude or the direction of change in the MeCP2null. Taken together, our data support a model in which accumulation of 5hmCG over gene bodies facilitates transcription as a result of release from the repressive effects of MeCP2 binding.
A second influence on MeCP2 binding and neuronal gene expression, demethylation of promoters and adjacent domains, was also evident in our data (Fig. 4B). Given recent studies demonstrating that methylation of H3 lysine 4 can specifically block DNA methyltransferase activity (32, 33), and that genes marked by broad H3K4me3 domains around transcription start sites (TSS) have increased transcriptional consistency (34, 35), we were interested in the relative contributions of promoter demethylation and functional demethylation in expressed neuronal genes. To complement the oxBS data, H3K4me3 coverage was measured by ChIP-Seq and DNA accessibility was assessed by ATAC-Seq. We then calculated H3K4me3 gene coverage as the ratio of H3K4me3 ChIP-Seq peak breadth to gene length (Fig. 3B and Fig. S3B) and confirmed that H3K4me3 coverage reflects gene body accessibility by ATAC-Seq (Fig. 3D). We observed that the presence of H3K4me3 within the gene body is associated with strong decreases in methylation, hydroxymethylation, and MeCP2 binding over the region of the gene covered by H3K4me3 (Fig. 3C, Lower and Fig. 3E). This phenomenon does not overcome the influence of 5hmCG for the majority of granule cell expressed genes because 5hmCG accumulation, as well as 5mCG, 5mCA, and 5hmCA depletion, occurs 3′ to the H3K4me3 marked domains in most genes (Fig. 3C and Fig. S4B). However, a small class of active genes that are covered substantially by H3K4me3 marks, that are demethylated, and bind little MeCP2 escape modulation by 5hmCG-mediated functional demethylation.
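As a rough sketch of the H3K4me3 gene coverage measure (peak breadth divided by gene length), the calculation below clips peaks to the gene interval and merges overlapping peaks before summing their breadth. The clipping and merging steps, the half-open coordinate convention, and the function name are assumptions for this example; the text specifies only the ratio itself.

```python
def h3k4me3_coverage(gene_start, gene_end, peaks):
    """Fraction of the gene [gene_start, gene_end) covered by peaks.

    peaks: list of (start, end) half-open intervals on the same chromosome
    """
    length = gene_end - gene_start
    if length <= 0:
        raise ValueError("gene_end must be greater than gene_start")
    # Keep only peaks that overlap the gene, clipped to its boundaries.
    clipped = sorted(
        (max(s, gene_start), min(e, gene_end))
        for s, e in peaks
        if e > gene_start and s < gene_end
    )
    covered = 0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Start a new merged block; bank the previous one first.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or adjacent peak: extend the current block.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / length
```

For a 10-kb gene with two overlapping peaks that together span its first 3 kb, the function returns 0.3; values near 1 would correspond to the small class of genes covered substantially by H3K4me3.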
5mCA and 5hmCA Accumulate in Intergenic DNA Adjacent to Active Enhancers and Promoters.
Non-CG methylation is a prominent feature of neuronal genomes that accumulates as cells become postmitotic and terminally differentiate (10). Recent studies have established that the distribution of modified CH dinucleotides plays an important role in gene expression (9, 12) and MeCP2 function (15). Given the relatively minor contributions of 5mCA and 5hmCA to MeCP2 binding and function within gene bodies, we were interested in analyzing in detail the levels and distribution of 5hmCA relative to 5mCA in intergenic regions (Fig. S5A) and its potential impact on MeCP2 regulatory function. We mapped H3K4me1, H3K27Ac, and H3K4me3 in granule cells and then used a multivariate hidden Markov model (ChromHMM) (36) to characterize chromatin states based on these markers. This allowed us to parse the genome into four categories based on previous characterization of the relationships of these marks to regulatory domains: active promoters, active enhancers, primed enhancers, and nonregulatory domains (Fig. 4 A and B). To confirm that the categories assigned properly distinguish active from inactive regulatory domains, ATAC-Seq was used to identify accessible regions (Fig. 4C). As predicted, active enhancers and promoters were highly accessible relative to primed enhancers and nonregulatory regions in the ATAC-Seq data (Fig. 4 B and C). To understand whether 5mCA and 5hmCA play distinct roles in enhancer or promoter activity, we next studied the distribution of modified cytosines relative to intergenic active enhancers (Fig. 4 B and D and Dataset S3). As previously reported (12), the core domains of both active enhancers and promoters are demethylated (Fig. 4D). The core of active enhancers, but not primed enhancers, also displays a decrease in 5hmC levels that corresponds with an increase of DNA accessibility measured by ATAC-Seq (Fig. 4 C and D). Consistent with their location in EC (Fig. S5B), regions flanking active enhancers in granule cells contain reduced levels of 5mCG relative to the genome average. While there is an enrichment of 5hmCG in these domains, its accumulation near active enhancers is minor relative to its accumulation in expressed gene bodies. In contrast, 5mCA and 5hmCA are highly enriched in 5′ and 3′ flanking DNA in close proximity to active enhancers, similar to the accumulation levels found in silent genes (Fig. 4D).
To assess whether 5hmCA accumulation in enhancer flanking DNA is associated with the level of CG methylation, intergenic enhancers situated at different distances upstream and downstream of a TSS (Fig. S5 C and D) were rank-ordered based on the level of 5mCA or 5hmCA occurring 2 kb upstream and downstream of the enhancer core (Fig. 5 A and B and Dataset S3). Using these rank orders to plot the occurrence in the remaining cytosine modifications, it is evident that 5hmCA and 5mCA accumulation is correlated, and that neither 5mCG nor 5hmCG levels reflect the level of 5hmCA or 5mCA. To understand the significance of 5hmCA and 5mCA accumulation flanking active enhancers we binned them into quartiles based on the levels of 5mCA or 5hmCA (Fig. 5 B and C and Fig. S5A). For each group, the levels of enhancer RNA (eRNA) (37) and TSS RNA for the genes closest to each enhancer were calculated and plotted (Fig. 5C). These data demonstrate that genes with high levels of 5mCA or 5hmCA in their enhancers are associated with low levels of eRNA (Spearman rho = −0.33 and −0.28, respectively) and TSS RNA (Spearman rho = −0.13 and −0.10, respectively), whereas those genes with high levels of transcription do not accumulate 5mCA or 5hmCA in nearby enhancers. Although the 5mCG enrichment in these regions is subtle compared with CA modifications, higher levels of 5mCG also correlate with low levels of eRNA and TSS RNA (Spearman rho = −0.3 and −0.15, respectively). Interestingly, 5hmCG accumulation is not significantly associated with either a decrease or an increase of eRNA or TSS RNA (Spearman rho = 0.01 and 0.008, respectively).
Oxidation of 5mCA to 5hmCA in Enhancer Shores Does Not Alter MeCP2 Binding.
Given the accumulation of 5hmCA and 5mCA adjacent to active enhancers, two important questions arise: Which cytosine modifications contribute to MeCP2 binding in these regulatory domains, and does conversion of 5mCA to 5hmCA in these domains disturb MeCP2 binding? To understand the relative importance of each cytosine modification to MeCP2 binding in regions flanking active enhancers, we again used the random forest classifier algorithm. In contrast to the minor contributions of non-CG modification to MeCP2 binding in expressed gene bodies (Fig. 2D), 5hmCA and 5mCA were important determinants of MeCP2 binding in active enhancer flanking regions (Fig. 5D). Furthermore, the very strong influence of 5hmCG on MeCP2 binding in gene bodies was not apparent in enhancers. Rather, the high-affinity binding sites for MeCP2 (5mCG, 5hmCA, and 5mCA) ranked first, second, and third, respectively, in predicting MeCP2 occupancy of these regulatory domains. Given these data, and the fact that 5hmCG acts in opposition to 5mCG in expressed gene bodies, we were interested in the effect of 5hmCA on MeCP2 binding in these enriched domains. If 5hmCA is a “neutral” modification with respect to MeCP2 binding, then the ratio of 5hmCA to 5mCA should have no impact on binding. As shown in Fig. 5F, the level of MeCP2 binding does not vary significantly as the ratio of 5hmCA to 5mCA increases in these enhancer domains. This is consistent with the biochemical data demonstrating that MeCP2 binds with high affinity to both 5mCA and 5hmCA. We conclude that oxidation of 5mCA to 5hmCA in intergenic regulatory domains is neutral for MeCP2 binding, thus maintaining functional DNA methylation.
Discussion
The discovery of 5hmC in the mammalian genome (2) has led to important insights into its role as an intermediate in passive and active cytosine demethylation in dividing cell populations (4, 38). Despite the very high levels of 5hmC found in neurons, its role has been difficult to decipher. Here we report two functions for 5hmC in postmitotic neurons. Oxidation of 5mCG to 5hmCG occurs predominantly in expressed gene bodies. This conversion of high-affinity MeCP2 binding sites (5mCG) to low-affinity sites (5hmCG) results in diminished MeCP2 binding, release of its repressive effect, and increased transcription. We refer to this role for 5hmC as functional demethylation because 5hmCG is stably present in these domains, yet it inhibits binding and function of the most abundant neuronal 5mC binding protein, MeCP2. 5hmCA, however, is present in active enhancers and promoters that are also enriched in 5mCA. Since both 5mCA and 5hmCA are high-affinity binding sites for MeCP2, oxidation of 5mCA to 5hmCA maintains MeCP2 binding and its repressive effects on gene expression. We refer to this role for 5hmC as neutral with respect to MeCP2 function because 5hmCA is a stable mark in postmitotic neurons that allows MeCP2 to retain its repressive functions at these important regulatory sites. We propose that 5hmCG-mediated functional demethylation of expressed genes has arisen both to facilitate and stabilize gene expression in postmitotic neurons and that retention of high-affinity binding of 5hmCA by MeCP2 allows functional demethylation to occur without disturbing MeCP2 function in other genomic domains.
The functions of 5hmC identified here are consistent with several recent studies that have established an important relationship between cytosine methylation and MeCP2 function (14–16). Thus, our data confirm the influence of both CG and non-CG methylation on MeCP2 genome occupancy (14–16), and they support the general conclusion that MeCP2 binding within genes and regulatory domains has a repressive effect on gene expression (18, 29). However, the identification of distinct functions for 5hmCG and 5hmCA enhances considerably our understanding of the important roles of hydroxymethylation in sculpting the final landscape of functional cytosine methylation in the brain. Furthermore, our data highlight the fact that an understanding of cytosine modification at CG dinucleotides must incorporate analysis of both 5mCG and 5hmCG because they have opposing impacts on MeCP2 binding, genome occupancy, and function. The demonstration that 5mCG and 5hmCG are the most important determinants of MeCP2 function in expressed gene bodies is consistent with previous studies reporting a positive correlation of 5hmC accumulation and gene expression (9–13, 15). Given the present demonstration that functional demethylation is the mechanism regulating the impact of MeCP2 on expressed gene bodies, our data also provide support for the inverse relationship between non-CG gene body methylation and expression (9, 10, 12, 15), since depletion of these high-affinity binding sites for MeCP2 can also contribute to functional demethylation.
Our understanding of relative roles of 5mCH and 5hmCH in neuronal genomes is less well developed. The present findings confirm reports that 5mCH occurs at maximal levels in CA dinucleotides (9, 10, 12), and they extend these results to identify active enhancers as highly enriched in both 5mCA and 5hmCA. They demonstrate the functional equivalence of 5mCA and 5hmCA for MeCP2 binding and gene expression. Based on these findings, we have proposed that 5hmCA function is neutral with respect to MeCP2 binding and function in vivo. While this model may explain retention of high-affinity binding of MeCP2 to 5hmCA, our data do not exclude other roles for 5hmCA in neurons. It will be interesting to assess, for example, whether enhancers targeted by the MeCP2/NCoR complex (18) are enriched in 5mCA and 5hmCA, whether their relative abundance in these targets influences enhancer activity, and whether the functions of other transcription factors are altered by accumulation of non-CG methylation and hydroxymethylation (39).
The level of expression of any gene reflects the combined action of a large number of complex and interacting genetic and epigenetic factors. We have identified two roles for 5hmC in neurons that help sculpt MeCP2 genome occupancy and determine its consequences for gene expression. Although our data highlight the differential effects of 5hmCG and 5hmCA on the binding of this critical neuronal protein, we do not yet understand why the additional regulatory functions of 5hmC and MeCP2 are required principally in vertebrate neurons. Insight into this important issue will require elucidation of the mechanisms targeting Tet oxidases to specific genomic domains and definition of precise roles of MeCP2 in the organization of EC.
Materials and Methods
Mice.
All procedures were in accordance with protocols approved by the Rockefeller University Institute Animal Care and Use Committee in accordance with the National Institute of Health guidelines.
Ox-BS-Seq and MethylC-Seq.
Ox-BS-Seq and MethylC-Seq experiments were performed using TrueMethyl-Seq reagents (CEGX) and workflow, following the manufacturer’s instructions. A total of eight cerebella (four males and four females) were pooled before sorting. Nuclei were digested with 100 μg/mL Proteinase K in the presence of SDS 1% for 2 h at 50 °C. Then, the lysate was treated with RNase A/T1 mix (Thermo Fisher) for 1 h at 37 °C and DNA was extracted with phenol:chloroform:isoamyl alcohol (25:24:1), precipitated in 70% ethanol and dissolved in Tris-EDTA buffer. At least 1 μg of DNA per sample was sonicated using a Covaris-S2 system, and DNA fragments of ∼200 bp were end-repaired using TruSeq DNA Sample kit (Illumina) as per the manufacturer’s instructions. Four nanograms of TrueMethyl sequencing spike-in controls were added to the DNA sample before adapter ligation. After TruSeq DNA adapters ligation, libraries were repurified to eliminate potential contaminating compounds using 80% acetonitrile (Sigma-Aldrich) and TrueMethyl-Seq magnetic beads. DNA libraries in bead pellets were then incubated 5 min at 37 °C in 20 µL of denaturing solution. ssDNA libraries were then transferred into two independent tubes: One microliter of oxidation solution (19) was added to the oxBS sample and 1 μL of ultrapure water was added to MethylC sample. After the oxidation step, bisulfite conversion, desulfonation, and purification were performed in both the oxBS library and MethylC library as per the manufacturer’s instructions. PCR amplification suitable for HiSeq2000 sequencing was performed using TruSeq DNA primers and TrueMethyl-Seq reagents. Quality of libraries was assessed using High Sensitivity D1000 ScreenTape (Agilent) for the 2200 TapeStation system.
Nuclear RNA-Seq.
One cerebellum was immediately homogenized in ice-cold homogenization buffer as described above, in the presence of 0.5 mM DTT and RNasin RNase inhibitors (Promega). RNeasy Micro kit (Qiagen) with in-column DNase digestion was used. Ten nanograms of total RNA per sample were converted to cDNA using the NuGEN Ovation RNA-Seq kit (NuGEN) following the manufacturer’s instructions. cDNA obtained was quality-scored by RNA 6000 PicoChip (Agilent) for Agilent 2100 Bioanalyzer. One milligram of cDNA per sample was sonicated using Covaris-S2, cDNA fragments of 200 bp were end-repaired, and adapters were ligated for HiSeq 2000 technology using TruSeq DNA Sample kit and following the manufacturer’s instructions.
MeCP2 ORGANIC ChIP-Seq.
ORGANIC profiling protocol in granule cells was adapted from ref. 20. After sorting, 3 × 10⁶ nuclei were centrifuged (1,200 × g, 15 min at 4 °C) and resuspended in 500 μL of prewarmed MNase digestion buffer: 15 mM Tris⋅HCl, pH 8, 1 mM CaCl2, 15 mM NaCl, 60 mM KCl, and 0.5 mM spermidine. Nuclei were digested with 2.5 U of micrococcal nuclease (Worthington) at 37 °C for 6 min. Digestion was stopped on ice by adding 5 mM of EDTA. After spinning, the pellet was resuspended in 500 μL of extraction buffer: 10 mM Tris⋅HCl, pH 8, 150 mM of NaCl, 0.1% Triton-X, 1.5 mM EDTA, pH 8, and 0.5 mM spermidine. Supernatant (S1) was kept at 4 °C after NaCl and Triton-X were adjusted to 150 mM and 0.1%, respectively. DNA was mechanically extracted using a 26-gauge needle 10 times on ice before 4 °C end-over-end incubation for 2 h. The extract was then centrifuged at 9,500 × g and 4 °C for 10 min and supernatant was kept as fraction S2. S1 and S2 were combined and 25 μL was reserved as Input. Immunoprecipitation was performed at 4 °C overnight in an end-to-end rotator by combining the extract (S1+S2) with anti-MeCP2 antibody (AB1)-coated beads [50 μL of protein G and 50 μL of protein A Dynabeads (Thermo Fisher) per sample]. After incubation, beads were washed in extraction buffer, DNA was extracted, and a library was prepared for Illumina sequencing [for both immunoprecipitation (IP) and input] as described in SI Materials and Methods for ChIP-Seq.
SI Materials and Methods
Mice.
All procedures were in accordance with protocols approved by the Rockefeller University Institute Animal Care and Use Committee in accordance with the National Institute of Health guidelines. NeuroD1-EGFP-Rpl10a-JP241 male and female mice (Gensat) were generated as described (40), bred to C57BL/6J mice, and maintained as transheterozygotes until they were killed at 7–12 wk of age for experimentation. Mecp2tm1.1Bird hemizygous females (The Jackson Laboratory) were purchased at 4 wk of age and bred to C57BL/6J mice. Males were killed at 7–12 wk of age for experimentation. All mice were maintained on a 12-h light/dark cycle and given ad libitum access to food and water.
Tissue Preparation and Nuclei Isolation.
Nuclei isolation and sorting were performed as described in ref. 2. Briefly, cerebella were dissected as described above and homogenized in homogenization buffer (0.25 M sucrose, 150 mM KCl, 5 mM MgCl2, 20 mM Tricine, pH 7.8, 0.15 mM spermine, 0.5 mM spermidine, and EDTA-free protease inhibitor mixture) using loose (A) and tight (B) glass-glass dounce. The homogenate was supplemented with 50% iodixanol (OptiPrep; Sigma-Aldrich), 150 mM KCl, 5 mM MgCl2, 20 mM Tricine, pH 7.8, and laid on a 29% iodixanol cushion. Nuclei were pelleted by centrifugation for 30 min at 10,000 × g and at 4 °C in a swinging bucket rotor (SW41) in a Beckman Coulter XL-70 ultracentrifuge. The nuclear pellet was resuspended in homogenization buffer and costained with DyeCycle Ruby (Thermo Fisher) to 20 μM final concentration.
FACS Sorting Conditions for Granule Cell Nuclei.
Nuclei were sorted in a BD FACSAria cell sorter using 635-nm and 488-nm excitation lasers and by gating with two parameters: high GFP signal (compared with WT mice) indicating bacTRAP-positive cells and lowest signal for DyeCycle Ruby indicating singlets. Alternatively, for H3K27Ac and H3K4me1 ChIP-Seq, granule cell nuclei from C57BL/6J were sorted by selecting a specific FSC/SSC population identified from GC EGFP+ mice. We observed that this FSC/SSC gated population in granule cell EGFP+ animals resulted in >92% of the sorted nuclei being EGFP-positive. Thus, the gate selected allows us to enrich the sample from 65–70% to 92–96% granule cells, as described in ref. 11.
ChIP-Seq.
A total of four cerebella (two males and two females) were pooled before sorting. Throughout the protocol, buffers were supplemented with EDTA-free protease inhibitor mixture and 10 mM of sodium butyrate (Sigma-Aldrich). ChIP-Seq homogenates were fixed before the addition of 50% iodixanol by adding 1% paraformaldehyde (PFA) with end-to-end rotation for 8 min. PFA was then quenched by adding 125 μL of glycine for 5 min. After sorting, nuclei were processed using LowCell# ChIP kit (Diagenode). Nuclei (10⁵) were sonicated in a Bioruptor water bath sonicator for 15 min at intervals of 30 s on and 30 s off to obtain ∼350-bp fragments, and chromatin was frozen in liquid nitrogen and kept at −80 °C. Each IP was performed in chromatin aliquots of 5 × 10⁵ nuclei. Immunoprecipitation was performed following LowCell# ChIP kit manufacturer’s instructions, mixing 50:50 IgA and IgG coated beads to 3 μg of antibodies: H3K4me3 (ab8580; Abcam), H3K9me3 (ab8898; Abcam), H3K27Ac (ab4729; Abcam), H3K4me1 (ab8895; Abcam), MeCP2 AB1 (D4F3; Cell Signaling), or MeCP2 AB2 (ab2828; Abcam). A 1:10 ratio of resuspended chromatin per sonicated nuclei was saved as input sample of the procedure. Immunoprecipitation was performed at 4 °C overnight in an end-to-end rotator. After washing beads with IP DNA, both input and IP samples were digested with 100 μg/mL Proteinase K in the presence of 1% SDS for 2 h at 50 °C. Then, the lysate was treated with RNase A/T1 mix for 1 h at 37 °C and DNA was extracted with phenol:chloroform:isoamyl alcohol (25:24:1), precipitated in 70% ethanol, and dissolved in TruSeq DNA Sample kit resuspension buffer. Libraries were prepared and sequenced as described above.
ATAC-Seq.
ATAC-Seq experiments were performed following ref. 27. Two male cerebella were pooled before sorting. Nuclei were directly processed after sorting. After centrifugation (as described above for ChIP-Seq) 50,000 nuclei were resuspended in 22.5 μL of reaction mix: 25 μL 2× TD Buffer, 2.5 μL Tn5 Transposase, and 22.5 μL nuclease-free water (Nextera DNA Library Preparation Kit; Illumina). The reaction was incubated for 30 min at 37 °C. Immediately following transposition, DNA was purified using Qiagen MinElute Kit (Qiagen) and DNA eluted in 10 μL of elution buffer. Transposed DNA fragments were then amplified a total of eight cycles using Nextera PCR Primers and Thermo HF Phusion polymerase (Thermo). Library was purified using Qiagen PCR Cleanup Kit (Qiagen) and resuspended in 20 μL of resuspension buffer for HiSeq 2000 Illumina sequencing.
DNA Pulldown Assay.
Mouse cerebellar nuclei were resuspended in Nuclear Extraction Buffer (Millipore) containing 0.5 mM DTT and protease inhibitor and incubated on an orbital shaker for 60 min at 4 °C. Samples were centrifuged at 16,000 × g for 5 min at 4 °C to collect the nuclear protein extract in the supernatant. M280 streptavidin beads (Invitrogen) (8 μL per sample) were washed once in PBS 0.1% Triton X-100 and then incubated with 200 ng of biotinylated DNA probe (discussed below) in 300 μL of PBS, overnight at 4 °C. Beads were washed twice in PBS 1% Triton X-100, three times in wash buffer (0.2 mM EDTA, 20% glycerol, 20 mM Hepes-KOH, pH 7.9, 0.1 M KCl, 1 mM DTT, 1 mM protease inhibitor PMSF, and 0.1% Triton X-100), and incubated with nuclear protein extract in precipitation buffer (0.05 mM EDTA, 5% glycerol, 5 mM Hepes-KOH, pH 7.9, 150 mM KCl, 1 mM DTT, 1 mM protease inhibitor PMSF, and 0.025% Triton X-100 in PBS) for 15 min at 4 °C. Beads were washed five times in wash buffer and once in PBS and eluted in LDS at 95 °C for 10 min. MeCP2 abundance in the eluted fractions was then determined by Western blot.
Endogenous DNA probe: cagtgcctagaagatgaacttggacctctgcggcatgttctggatttgactcaaagcaaaagctttcaattggaagatgctgagaatttcatcagcaatatcagagtaactgttgtaaaactaaaggtaaggtgttgctttatttgctaatctggaaataaaatagagaagaaatgcatttttaagtggcttgccatttctggtctttgatgggttctgtgcatttagtcagccaaagtttaaagtcactgtgcaagtgaatccttccactgta.
CA DNA probe: ctaggccacagaattgaaagatcttacgtatatacgatttacgttatacgattacgatatacgatttacgttaatacgtttacgattattacgaatttacgtttttacgaatatacgaaatacgtttaatacgtaattacgtatattacgtatatacgatttacgaattacgggatgatgctagaatttccacctac.
CG DNA probe: ctaggccacagaattgaaagatcttgcatatatgcaatttgcattatgcaattgcaatatgcaatttgcattaatgcatttgcaattattgcaaatttgcatttttgcaaatatgcaaaatgcatttaatgcataattgcatatattgcatatatgcaatttgcaaattgcaggatgatgctagaatttccacctac.
SI Data and Software Availability
We used Integrative Genomics Viewer (IGV, Version 2.3) (41) to represent our data using mm10 Refseq transcript annotation as reference.
Ox-BS-Seq and MethylC-Seq.
Sequencing data were aligned to UCSC mm10 Mus musculus genome using bsmap aligner (v2.87) (42). MOABS (43) methylation ratio calling module (mcall) was used to summarize the methylation levels of individual cytosines. We then used the statistical tool MLML (22) for simultaneous estimation of 5-mC and 5-hmC, integrated in the MethPipe software (44). Files were further processed using SAMtools (45) and BEDtools (46). HOMER (v4.8) (47) scripts were used for quantification of percentage of modification in RefSeq annotated TSS, TTS, and transcripts.
To assess how methylation and hydroxymethylation in each sequence context determine MeCP2 enrichment in DNA, we used the random forest algorithm in the caret (V 6.0–72) predictive model building package for R.
RNA-Seq and Nuclear RNA-Seq.
Sequencing data were aligned to mm10 M. musculus genome using STAR (v2.4.2a) (49) and processed with SAMtools (45). HOMER (v4.8) (47) scripts were used for quality control and to quantify gene expression in gene bodies allowing multiple isoforms per gene.
ChIP-Seq, ORGANIC ChIP-Seq, and ATAC-Seq.
Sequencing data were aligned to mm10 M. musculus genome using Bowtie2 (v2.1.0) (50) and processed with SAMtools (45) and BEDtools (46). HOMER (v4.8) (47) scripts were used for tag quantification in RefSeq annotated TSS, TTS, and transcripts, peak calling, and tag quantification in regions of interests.
SI Quantification and Statistical Analysis
Ox-BS-Seq and MethylC-Seq.
For oxBS quality-control evaluation we performed three technical replicas for biological replicas 2 and 3 by carrying out three independent oxidation reactions from the same DNA libraries. In this study, only autosomes are considered. One hundred-base pair paired-end sequencing data were aligned with bsmap to UCSC mm10 M. musculus genome with default parameters and unstranded configuration, allowing a number of mismatches of up to 8% of the read length. The total number of cytosines covered were 538 million and 835 million in MethylC-Seq and oxBS-Seq, respectively, with an average read depth of 13.15 and 25.4, respectively.
For genome-wide analysis, only good-quality base calls (Phred score of 20 or greater) and total read count greater than five per site were considered. Each base call was paired between MethylC-Seq and oxBS-Seq per replica and 5hmC estimated by the difference between the two fractions of modification. Negative values were then removed from the analysis. The majority of our analyses of DNA methylation are based on the level 5(h)mCG/CG or 5(h)mCA/CA, which are estimates of the fraction of cytosines in the sequenced population that are methylated.
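The subtraction step described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the function name `estimate_5hmc`, the argument layout, and the example counts are hypothetical, and base-quality filtering (Phred score of 20 or greater) is assumed to have been applied upstream when the counts were tallied.

```python
from typing import Optional

MIN_READS = 6  # "total read count greater than five per site"


def estimate_5hmc(bs_mod: int, bs_total: int,
                  ox_mod: int, ox_total: int) -> Optional[float]:
    """Estimate the 5hmC fraction at one cytosine, or return None if filtered.

    bs_* are MethylC-Seq counts (unconverted C reports 5mC + 5hmC);
    ox_* are oxBS-Seq counts (unconverted C reports 5mC only).
    """
    if bs_total < MIN_READS or ox_total < MIN_READS:
        return None  # insufficient coverage in one of the two libraries
    hmc = bs_mod / bs_total - ox_mod / ox_total
    if hmc < 0:
        return None  # negative estimates are removed from the analysis
    return hmc


# Hypothetical site: 8/10 modified calls in MethylC-Seq, 5/10 in oxBS-Seq.
print(estimate_5hmc(8, 10, 5, 10))
```

A site where the oxBS-Seq fraction exceeds the MethylC-Seq fraction returns `None` here, mirroring the removal of negative values.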
We used the TrueMethyl kit provided spike-in controls to estimate our experimental biases. Both MethylC-Seq and oxBS-Seq are subject to a low error rate due to the failure of the chemical conversion of unmethylated cytosine to uracil. Such errors lead to false-positive detections of modified cytosines. In addition, oxBS-Seq suffers from a small rate of error due to nonoxidation (Eox), which is the probability that a hydroxymethylated cytosine is not oxidized, and thus not converted by the bisulfite reaction, and leads to a false detection of 5mC. These rates were calibrated as described previously (10) and summarized as described below.
The frequency of C base calls at spike-in controls reference positions was calculated from reads uniquely mapped to the spike-in control sequences references, following removal of clonal reads, as described for mm10. In MethylC-Seq, the cytosine nonconversion rate per context (CGBS-nonconversion, CHBS-nonconversion, and CABS-nonconversion) represents the averaged frequency of unmethylated cytosine failure of bisulfite conversion in all positions on control sequences. The oxidation frequency per context was estimated by calculating the difference between the 5hmC nonconversion rate in MethylC-Seq and oxBS-Seq. Thus, the 5hmC context-specific oxidation error (Eox = 1 − oxidation frequency) represents the rate at which 5hmC bases failed to be oxidized and subsequently were not converted by bisulfite. These values are summarized in Dataset S1.
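As a small worked illustration of the oxidation error defined above, the sketch below computes Eox from two spike-in nonconversion rates. The function name and the example rates are hypothetical and are not taken from Dataset S1.

```python
def oxidation_error(hmc_nonconv_methylc: float, hmc_nonconv_oxbs: float) -> float:
    """Return E_ox = 1 - oxidation frequency for one sequence context.

    The oxidation frequency is taken as the difference between the 5hmC
    nonconversion rate in MethylC-Seq and the rate in oxBS-Seq, both
    measured at 5hmC positions of the spike-in controls.
    """
    oxidation_frequency = hmc_nonconv_methylc - hmc_nonconv_oxbs
    return 1.0 - oxidation_frequency


# Hypothetical rates: 98% of spike-in 5hmC reads as C after bisulfite alone,
# 5% still reads as C after oxidation plus bisulfite.
print(round(oxidation_error(0.98, 0.05), 3))
```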
We then took advantage of the MLML statistical tool (22) for simultaneous estimation of 5mC and 5hmC, integrated in the MethPipe software (44). This approach arrives at maximum likelihood estimates for the 5mC and 5hmC levels by combining data from MethylC-Seq and oxBS-Seq. The results are consistent in that 5mC and 5hmC levels are nonnegative. From the MLML output, only positions with 0 conflicts were considered for further analysis.
In Fig. S1B we calculated the fraction of sequences within the sample that are methylated and/or hydroxymethylated per each modified site (mod C/C > 0) with read depth >20 in both MethylC-Seq and oxBS-Seq. Base calls from repetitive regions were excluded from the analysis. The x axis indicates the frequency of modification level by binning them in 20 bins.
In Fig. 1B technical oxBS-Seq replicas were merged in a single biological replica. We calculated the percentage of 5mCG/CG, 5hmCG/CG, 5mCA/CA, and 5hmCA/CA in 100-kb consecutive, nonoverlapping windows genome-wide.
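The windowed summary can be sketched as below. The text does not state whether calls were pooled within a window or per-site fractions were averaged; this sketch assumes pooling, and the function name and input layout are hypothetical.

```python
from collections import defaultdict

WINDOW = 100_000  # 100-kb consecutive, nonoverlapping windows


def window_percentages(sites):
    """Summarize modification per window.

    sites: iterable of (chrom, pos, modified_calls, total_calls).
    Returns {(chrom, window_start): percent of calls that are modified}.
    Assumption: calls are pooled within each window before taking the ratio.
    """
    mod = defaultdict(int)
    tot = defaultdict(int)
    for chrom, pos, m, t in sites:
        key = (chrom, (pos // WINDOW) * WINDOW)
        mod[key] += m
        tot[key] += t
    return {k: 100.0 * mod[k] / tot[k] for k in tot if tot[k] > 0}


# Hypothetical CG sites: two fall in the first window, one in the second.
example = [("chr1", 10, 5, 10), ("chr1", 99_999, 5, 10), ("chr1", 100_000, 1, 10)]
print(window_percentages(example))
```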
In Fig. 1C we calculated the pairwise Pearson correlation coefficient (r) between the replicas and modifications and plot a correlation matrix. Cytosine modifications were organized based on hierarchical clustering order.
In Fig. 2B we divided genes in EC (EC/HC > 1.1) and HC (EC/HC < 0.9) (see the ChIP-Seq section for details). We then subdivided them according to their level of expression as described in the figure legend. For each cytosine modification, average percentage of modification per bin was analyzed in contiguous 100-bp bins from 10 kb upstream of the TSS to TSS and from TSS to 10 kb downstream and gene bodies (divided each individual transcript in 100 fragments).
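Dividing each transcript into 100 equal fragments, as described above, amounts to a length-normalized binning. A minimal sketch follows, assuming half-open coordinates (`start` inclusive, `end` exclusive) and per-site modification levels as input; the function name and the handling of the minus strand are illustrative choices, not details taken from the study.

```python
def gene_body_profile(sites, start, end, strand="+", n_bins=100):
    """Average modification level in n_bins equal fragments of one transcript.

    sites: iterable of (pos, level) with level as a fraction or percentage.
    Bins are ordered 5' to 3', so minus-strand transcripts are flipped.
    Bins without any covered site are reported as None.
    """
    length = end - start
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for pos, level in sites:
        if not (start <= pos < end):
            continue  # site lies outside the transcript
        idx = (pos - start) * n_bins // length
        if strand == "-":
            idx = n_bins - 1 - idx
        sums[idx] += level
        counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Hypothetical 1-kb transcript summarized in 10 fragments.
profile = gene_body_profile([(50, 0.2), (60, 0.4), (950, 1.0)], 0, 1000, "+", 10)
print(profile[0], profile[9])
```

Profiles computed this way have the same length for every transcript, so they can be averaged across the genes in an expression group.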
In Fig. 4D each cytosine modification and average percentages of modification per bin were calculated for positions (subdivided in 100 equal fragments) of each chromatin state. For enhancers and promoters each fragment was plotted and the analysis extended to 10 kb upstream and downstream. Nonregulatory regions were divided in 1-kb fragments and percentage of modification was calculated as described above and the distributions are shown in box plots.
In Fig. 5B we calculated 5mCA/CA, 5hmCA/CA, 5mCG/CG, and 5hmCG/CG normalized enrichments on each 2-kb upstream and downstream defined enhancers (flanking regions). Only intergenic enhancers were selected for the analysis. Heat-map representation of modifications in enhancer regions (divided in 50 bins) extended 10 kb upstream and downstream divided in 200-bp windows are shown. Enhancers were organized by the normalized 5mCA/CA, 5hmCA/CA, 5mCG/CG, and 5hmCG/CG mean value of their upstream and downstream flanking regions, respectively, from high to low (Fig. 5B, Upper). This order was used to plot 5mCA/CA, 5hmCA/CA, 5mCG/CG, and 5hmCG/CG normalized enrichments in the regions.
In Fig. S1C ideograms show sequence composition and relative frequency at positions flanking the most highly methylated and hydroxymethylated CH sites (third quartile or greater of the modification ratio) on each DNA strand.
A pairwise Pearson correlation coefficient (r) was used to study correlations between replicas and Spearman rho between cytosine modification enrichment and ChIP or effect size data.
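For readers unfamiliar with the two statistics, the sketch below computes Pearson r and Spearman rho with the standard library only. Spearman rho is obtained as the Pearson correlation of average ranks; the function names and the example vectors are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho as the Pearson correlation of the rank-transformed data."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical monotonic but nonlinear relationship.
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
print(pearson_r(x, y), spearman_rho(x, y))
```

The example shows why a rank-based measure suits comparisons between modification enrichment and ChIP signal: a monotonic but nonlinear relationship gives a Spearman rho of 1 while Pearson r stays below 1.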
MeCP2 ORGANIC ChIP-Seq.
One hundred-base pair paired-end reads were aligned using Bowtie2 (50) with default parameters in a sensitive-local configuration with a minimum alignment score of G,20,8. Duplicate reads were removed before processing.
Enrichment measurements for MeCP2 ChIP and ORGANIC MeCP2 ChIP in Figs. 2 and 3 and Fig. S4 were calculated per transcript length as the fraction of normalized values in fragments per kilobase per million to the mean value of the genome of IP and input:
Enrichment of each pair was calculated individually. Final MeCP2 enrichment value is the average of individual replicas. Transcripts were then grouped according to their gene expression as described in the figure legend. Violins’ horizontal lines indicate first quartile, median, and third quartile.
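The enrichment formula is not reproduced in the text above. One reading consistent with the verbal description is that, for each IP/input pair, the transcript FPKM is divided by the genome-wide mean FPKM of the same library, and the IP value is then divided by the input value. The sketch below implements that reading; it is an assumption rather than the published equation, and all names and numbers are hypothetical.

```python
from statistics import mean


def pair_enrichment(ip_fpkm, input_fpkm, ip_genome_mean, input_genome_mean):
    """Enrichment of one transcript for a single IP/input pair.

    Assumed form: (IP / genome mean of IP) / (input / genome mean of input).
    """
    return (ip_fpkm / ip_genome_mean) / (input_fpkm / input_genome_mean)


def mecp2_enrichment(pairs):
    """Average the per-pair enrichments, one pair per replica.

    pairs: list of (ip_fpkm, input_fpkm, ip_genome_mean, input_genome_mean).
    """
    return mean(pair_enrichment(*p) for p in pairs)


# Two hypothetical replicas for one transcript.
print(mecp2_enrichment([(4.0, 2.0, 2.0, 2.0), (3.0, 1.0, 1.0, 1.0)]))
```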
In Fig. 2E we divided genes into EC (EC/HC ratio >1.1 in gene bodies) and HC (ratio <0.9 in gene bodies) (see the ChIP-Seq section for details). We then subdivided them according to their ratio of 5hmC/5mC enrichment per context. For each quintile, ORGANIC MeCP2 ChIP and input reads per million (RPM) were calculated in contiguous 100-bp bins from 2 kb upstream to TSS, from TTS to 2 kb downstream and gene bodies (dividing each individual transcript in 100 fragments). The obtained IP profiles were divided by their respective input control samples and enrichment was calculated as described above.
In Figs. 2D and 5E we applied the random forest regressor algorithm, a nonparametric supervised learning method used for classification and regression that is particularly suitable for our purpose, because it can capture the complex relationships between sequence DNA methylation and hydroxymethylation levels and MeCP2 binding (30). This algorithm determines the relative importance of CG and CA modifications in a given sequence to predict MeCP2 enrichment. In Fig. 2D we used 10,000 randomly sampled transcripts in EC (EC/HC > 1.1) and HC (EC/HC < 0.9) as training sets for a 100-tree random forest regressor. We used the fractions of 5mCG/CG, 5hmCG/CG, 5mCA/CA, and 5hmCA/CA per transcript as predictors of the averaged enrichment from the pooled ORGANIC MeCP2 ChIP-Seq replicas. The procedure was repeated 10 times and the relative importance of each modification was averaged. Mean ± SD is indicated in the plots. In Fig. 5E we chose 2,754 intragenic enhancer shores as training sets for a 100-tree random forest regressor. We used the fractions of 5mCG/CG, 5hmCG/CG, 5mCA/CA, and 5hmCA/CA per 2-kb region as predictors of the averaged enrichment from the pooled ORGANIC MeCP2 ChIP-Seq replicas. The procedure was repeated 10 times and the relative importance of each modification was averaged. Mean ± SD is indicated in the plots.
ChIP-Seq.
Fifty-base pair paired-end reads were aligned using Bowtie2 (50) with default parameters in a sensitive-local configuration with a minimum alignment score of G,20,8. Duplicate reads were removed before processing.
In Fig. 1D, log2 enrichment (fraction of each 1-Mb window normalized to the average fraction of the genome) of each modification and replica was calculated individually and plotted in a heat map. Cytosine modifications were organized following the chromatin accessibility ratio (EC/HC). To obtain this ratio, we first calculated the normalized (RPM) enrichment as follows:
These values were then used to obtain the ratio EC/HC and an estimation of euchromatic (>1.1, EC) and heterochromatic (<0.9, HC) DNA.
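The thresholding step can be sketched as below. The normalization that yields the EC and HC values is not reproduced here, so the function takes an already computed ratio; the function name and the label used for ratios between the two cutoffs are illustrative assumptions.

```python
def classify_chromatin(ec_hc_ratio: float) -> str:
    """Label a region from its EC/HC ratio using the cutoffs in the text."""
    if ec_hc_ratio > 1.1:
        return "EC"  # estimated euchromatic DNA
    if ec_hc_ratio < 0.9:
        return "HC"  # estimated heterochromatic DNA
    return "intermediate"  # hypothetical label for ratios from 0.9 to 1.1


for ratio in (1.4, 1.0, 0.6):
    print(ratio, classify_chromatin(ratio))
```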
ChromHMM is a program for the learning and analysis of chromatin states using a multivariate hidden Markov model that explicitly models the observed combination of marks (36). We included H3K4me3, H3K4me1, and H3K27Ac ChIP aligned reads and defined a learning model of four chromatin states with a resolution of 200-bp bins. After confirmation of genomic localization and -Seq peak correlation in the genome browser (Fig. 4 B and C), we defined emitted state 1 as active promoters, state 2 as nonregulatory DNA, state 3 as primed enhancers, and state 4 as active enhancers.
ATAC-Seq.
One hundred-base pair paired-end reads were aligned using Bowtie2 (50) with default parameters in a sensitive-local configuration with a minimum alignment score of G,20,8. Duplicate reads were removed before processing.
In Fig. 3C we divided genes according to their H3K4me3 coverage levels. For each group, ATAC-Seq fragments per million (FPM) were calculated in contiguous 100-bp bins from 5 kb upstream to TSS, from TTS to 5 kb downstream and gene bodies (dividing each individual transcript into 100 fragments).
In Fig. 4C, ATAC-Seq FPM values were calculated for each chromatin state (subdivided in 100 equal fragments). Plots were extended 10 kb upstream and downstream and divided into 100-bp windows.
RNA-Seq and Nuclear RNA-Seq.
Fifty-base pair single-end reads were aligned using STAR (49) to mm10 M. musculus genome with default parameters and allowing a ratio of mismatches to mapped length less than 0.06.
To better measure the magnitude and direction of gene expression dysregulation in KO animals (MeCP2null) compared with the WT we defined the degree of expression divergence as the effect size (51, 52):
This metric compares the extent of difference between mean MeCP2null expression and WT expression [in reads per kilobase per million (RPKM)] per each gene, scaled by the degree of variation of both phenotypes (expression noise or measurement error).
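The effect-size formula itself is not reproduced in the text above. As a rough sketch of the idea described (a mean difference scaled by the variation in both genotypes), the snippet below uses a Cohen's d-style statistic whose denominator is the square root of the mean of the two sample variances. That denominator is an assumption and may differ from the definition in refs. 51 and 52; the function name and the RPKM values are hypothetical.

```python
import math
from statistics import mean, variance


def effect_size(ko_rpkm, wt_rpkm):
    """Standardized mean difference for one gene (KO minus WT).

    Assumed scale: square root of the average of the two sample variances.
    Each group needs at least two replicates and nonzero overall variance.
    """
    pooled_sd = math.sqrt((variance(ko_rpkm) + variance(wt_rpkm)) / 2)
    return (mean(ko_rpkm) - mean(wt_rpkm)) / pooled_sd


# Hypothetical gene with higher expression in MeCP2null than in WT.
print(effect_size([3, 4, 5], [1, 2, 3]))
```

Under this convention a positive value indicates higher expression in MeCP2null than in WT, and a negative value indicates lower expression.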
In Fig. 5C we divided enhancers (±5 kb) and TSS (−5 kb, +5 kb) in quintiles based on either the normalized 5mCA/CA mean value (Fig. 5C, Upper) or 5hmCA/CA (Fig. 5C, Lower) of their corresponding enhancer flanking regions. Nuclear RNA-Seq Log10 RPKM values were calculated in contiguous 100-bp bins or in 100 equal fragments in the enhancer region.
Browser Representations.
In Fig. 1E and Fig. S2A percentage of cytosine methylation and hydroxymethylation is represented as a vertical column whose height corresponds to the percentage of modification in 100-bp contiguous windows for CG or CA sites. The two deepest sequenced biological replicas are shown. In Figs. 2F, 3B, and 4B methylation and hydroxymethylation in CG and CA tracks represent the average percentage of modification as the fraction of basecalls in 100-bp contiguous windows that are methylated or hydroxymethylated in CG and CA from the merged biological and technical replicas.
We also summarized the following data through the figures: (i) nuclear RNA-Seq, ATAC-Seq, MeCP2 ORGANIC ChIP-Seq, MeCP2, H3K27Ac, H3K9me3, H3K4me1, and H3K4me3 ChIP-Seq: normalized read counts; (ii) EC/HC ratio calculated in 10-kb contiguous windows as for Fig. 1D; and (iii) MeCP2 enrichment: ratio calculated in 10-kb contiguous windows as described below for Fig. 2C.
In Figs. 2C and 3E we used the pairwise Wilcoxon–Mann–Whitney test, a nonparametric statistical hypothesis test, and considered P values equal to or smaller than 0.05 statistically significant. The range of P values is indicated in the legend by asterisks.
The number of replicas and sequencing metrics of all samples are summarized in Dataset S1.
Acknowledgments
We thank I. Ibañez for comments on the manuscript; B. López for technical assistance with animal breeding; A. Mousa for bioinformatics support; and E. Stoyanova, X. Xu, and F. Piccolo for discussion. We also thank C. Zhao, C. Lai, and N. Nnatubeugo from the Rockefeller University Genomics Resource Center; S. Mazel, S. Han, S. Semova, and S. Tadesse from the Rockefeller University Flow Cytometry Resource Center; and Y. Zhang and G. E. Zentner for technical discussions. This work was supported by the Howard Hughes Medical Institute (N.H.).
Footnotes
The authors declare no conflict of interest.
Data deposition: The sequence data reported in this paper have been deposited in the Gene Expression Omnibus (GEO) database [accession nos. GSE95628 and GSE42880 (cerebellum MeCP2 KO RNA-Seq data)].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1708044114/-/DCSupplemental.
References
- 1.Suzuki MM, Bird A. DNA methylation landscapes: Provocative insights from epigenomics. Nat Rev Genet. 2008;9:465–476. doi: 10.1038/nrg2341. [DOI] [PubMed] [Google Scholar]
- 2.Kriaucionis S, Heintz N. The nuclear DNA base 5-hydroxymethylcytosine is present in Purkinje neurons and the brain. Science. 2009;324:929–930. doi: 10.1126/science.1169786. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Tahiliani M, et al. Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science. 2009;324:930–935. doi: 10.1126/science.1170116. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Wu H, Zhang Y. Reversing DNA methylation: Mechanisms, genomics, and biological functions. Cell. 2014;156:45–68. doi: 10.1016/j.cell.2013.12.019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Ito S, et al. Tet proteins can convert 5-methylcytosine to 5-formylcytosine and 5-carboxylcytosine. Science. 2011;333:1300–1303. doi: 10.1126/science.1210597. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.He YF, et al. Tet-mediated formation of 5-carboxylcytosine and its excision by TDG in mammalian DNA. Science. 2011;333:1303–1307. doi: 10.1126/science.1210944. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Globisch D, et al. Tissue distribution of 5-hydroxymethylcytosine and search for active demethylation intermediates. PLoS One. 2010;5:e15367. doi: 10.1371/journal.pone.0015367. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Münzel M, et al. Quantification of the sixth DNA base hydroxymethylcytosine in the brain. Angew Chem Int Ed Engl. 2010;49:5375–5377. doi: 10.1002/anie.201002033. [DOI] [PubMed] [Google Scholar]
- 9.Guo JU, et al. Distribution, recognition and regulation of non-CpG methylation in the adult mammalian brain. Nat Neurosci. 2014;17:215–222. doi: 10.1038/nn.3607. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Lister R, et al. Global epigenomic reconfiguration during mammalian brain development. Science. 2013;341:1237905. doi: 10.1126/science.1237905. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Mellén M, Ayata P, Dewell S, Kriaucionis S, Heintz N. MeCP2 binds to 5hmC enriched within active genes and accessible chromatin in the nervous system. Cell. 2012;151:1417–1430. doi: 10.1016/j.cell.2012.11.022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Mo A, et al. Epigenomic signatures of neuronal diversity in the mammalian brain. Neuron. 2015;86:1369–1384. doi: 10.1016/j.neuron.2015.05.018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Song CX, et al. Selective chemical labeling reveals the genome-wide distribution of 5-hydroxymethylcytosine. Nat Biotechnol. 2011;29:68–72. doi: 10.1038/nbt.1732. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Ayata P. 2013. Decoding 5hmC as an active chromatin mark in the brain and its link to Rett Syndrome. PhD thesis (The Rockefeller Univ, New York)
- 15.Chen L, et al. MeCP2 binds to non-CG methylated DNA as neurons mature, influencing transcription and the timing of onset for Rett syndrome. Proc Natl Acad Sci USA. 2015;112:5509–5514. doi: 10.1073/pnas.1505909112. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Gabel HW, et al. Disruption of DNA-methylation-dependent long gene repression in Rett syndrome. Nature. 2015;522:89–93. doi: 10.1038/nature14319. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Lombardi LM, Baker SA, Zoghbi HY. MECP2 disorders: From the clinic to mice and back. J Clin Invest. 2015;125:2914–2923. doi: 10.1172/JCI78167. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Lyst MJ, et al. Rett syndrome mutations abolish the interaction of MeCP2 with the NCoR/SMRT co-repressor. Nat Neurosci. 2013;16:898–902. doi: 10.1038/nn.3434.
- 19.Booth MJ, et al. Quantitative sequencing of 5-methylcytosine and 5-hydroxymethylcytosine at single-base resolution. Science. 2012;336:934–937. doi: 10.1126/science.1220671.
- 20.Kasinathan S, Orsi GA, Zentner GE, Ahmad K, Henikoff S. High-resolution mapping of transcription factor binding sites on native chromatin. Nat Methods. 2014;11:203–209. doi: 10.1038/nmeth.2766.
- 21.Palay SL, Chan-Palay V. Cerebellar Cortex: Cytology and Organization. Springer; New York: 1974.
- 22.Qu J, Zhou M, Song Q, Hong EE, Smith AD. MLML: Consistent simultaneous estimates of DNA methylation and hydroxymethylation. Bioinformatics. 2013;29:2645–2646. doi: 10.1093/bioinformatics/btt459.
- 23.Wu H, Zhang Y. Charting oxidized methylcytosines at base resolution. Nat Struct Mol Biol. 2015;22:656–661. doi: 10.1038/nsmb.3071.
- 24.Sun Z, et al. A sensitive approach to map genome-wide 5-hydroxymethylcytosine and 5-formylcytosine at single-base resolution. Mol Cell. 2015;57:750–761. doi: 10.1016/j.molcel.2014.12.035.
- 25.Cedar H, Bergman Y. Linking DNA methylation and histone modification: Patterns and paradigms. Nat Rev Genet. 2009;10:295–304. doi: 10.1038/nrg2540.
- 26.Barth TK, Imhof A. Fast signals and slow marks: The dynamics of histone modifications. Trends Biochem Sci. 2010;35:618–626. doi: 10.1016/j.tibs.2010.05.006.
- 27.Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods. 2013;10:1213–1218. doi: 10.1038/nmeth.2688.
- 28.Orsi GA, Kasinathan S, Zentner GE, Henikoff S, Ahmad K. Mapping regulatory factors by immunoprecipitation from native chromatin. Curr Protoc Mol Biol. 2015;110:21.31.1–25. doi: 10.1002/0471142727.mb2131s110.
- 29.Lyst MJ, Bird A. Rett syndrome: A complex disorder with simple roots. Nat Rev Genet. 2015;16:261–275. doi: 10.1038/nrg3897.
- 30.Rube HT, et al. Sequence features accurately predict genome-wide MeCP2 binding in vivo. Nat Commun. 2016;7:11025. doi: 10.1038/ncomms11025.
- 31.Ben-Shachar S, Chahrour M, Thaller C, Shaw CA, Zoghbi HY. Mouse models of MeCP2 disorders share gene expression changes in the cerebellum and hypothalamus. Hum Mol Genet. 2009;18:2431–2442. doi: 10.1093/hmg/ddp181.
- 32.Ooi SK, et al. DNMT3L connects unmethylated lysine 4 of histone H3 to de novo methylation of DNA. Nature. 2007;448:714–717. doi: 10.1038/nature05987.
- 33.Rose NR, Klose RJ. Understanding the relationship between DNA methylation and histone lysine methylation. Biochim Biophys Acta. 2014;1839:1362–1372. doi: 10.1016/j.bbagrm.2014.02.007.
- 34.Benayoun BA, et al. H3K4me3 breadth is linked to cell identity and transcriptional consistency. Cell. 2014;158:673–688. doi: 10.1016/j.cell.2014.06.027.
- 35.Chen K, et al. Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor-suppressor genes. Nat Genet. 2015;47:1149–1157. doi: 10.1038/ng.3385.
- 36.Ernst J, Kellis M. ChromHMM: Automating chromatin-state discovery and characterization. Nat Methods. 2012;9:215–216. doi: 10.1038/nmeth.1906.
- 37.Kim TK, et al. Widespread transcription at neuronal activity-regulated enhancers. Nature. 2010;465:182–187. doi: 10.1038/nature09033.
- 38.Lee HJ, Hore TA, Reik W. Reprogramming the methylome: Erasing memory and creating diversity. Cell Stem Cell. 2014;14:710–719. doi: 10.1016/j.stem.2014.05.008.
- 39.Sayeed SK, Zhao J, Sathyanarayana BK, Golla JP, Vinson C. C/EBPβ (CEBPB) protein binding to the C/EBP|CRE DNA 8-mer TTGC|GTCA is inhibited by 5hmC and enhanced by 5mC, 5fC, and 5caC in the CG dinucleotide. Biochim Biophys Acta. 2015;1849:583–589. doi: 10.1016/j.bbagrm.2015.03.002.
- 40.Doyle JP, et al. Application of a translational profiling approach for the comparative analysis of CNS cell types. Cell. 2008;135:749–762. doi: 10.1016/j.cell.2008.10.029.
- 41.Thorvaldsdóttir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): High-performance genomics data visualization and exploration. Brief Bioinform. 2013;14:178–192. doi: 10.1093/bib/bbs017.
- 42.Xi Y, Li W. BSMAP: Whole genome bisulfite sequence MAPping program. BMC Bioinformatics. 2009;10:232. doi: 10.1186/1471-2105-10-232.
- 43.Sun D, et al. MOABS: Model based analysis of bisulfite sequencing data. Genome Biol. 2014;15:R38. doi: 10.1186/gb-2014-15-2-r38.
- 44.Song Q, et al. A reference methylome database and analysis pipeline to facilitate integrative and comparative epigenomics. PLoS One. 2013;8:e81148. doi: 10.1371/journal.pone.0081148.
- 45.Li H, et al.; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.
- 46.Quinlan AR, Hall IM. BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033.
- 47.Heinz S, et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell. 2010;38:576–589. doi: 10.1016/j.molcel.2010.05.004.
- 48.Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: A sequence logo generator. Genome Res. 2004;14:1188–1190. doi: 10.1101/gr.849004.
- 49.Dobin A, et al. STAR: Ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635.
- 50.Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923.
- 51.Ghanbarian AT, Hurst LD. Neighboring genes show correlated evolution in gene expression. Mol Biol Evol. 2015;32:1748–1766. doi: 10.1093/molbev/msv053.
- 52.Sullivan GM, Feinn R. Using effect size-or why the P value is not enough. J Grad Med Educ. 2012;4:279–282. doi: 10.4300/JGME-D-12-00156.1.
Associated Data