Abstract
The relationship between DNA methylation and chromatin structure is still largely unknown. By analyzing a large set of published sequencing data, we observed a long-range power law correlation of DNA methylation, with cell class-specific scaling exponents over genomic distances of tens of kilobases. We showed that these cell class-specific scaling exponents are caused by different degrees of patchiness of DNA methylation in different cells. By modeling chromatin structure using high-resolution chromosome conformation capture data and mapping the methylation level onto the modeled structure, we demonstrated that the patchiness of DNA methylation is related to chromatin structure. The scaling exponents of the power law correlation therefore reflect the spatial organization of chromatin. Besides the long-range correlation, we also showed that the local correlation of DNA methylation is associated with nucleosome positioning. The local correlation in partially methylated domains differs from that in nonpartially methylated domains, suggesting that their chromatin structures differ at the scale of several hundred base pairs (covering a few nucleosomes). Our study provides a novel, to our knowledge, view of the spatial organization of chromatin from the perspective of DNA methylation, in which both long-range and local correlations of DNA methylation along the genome reflect the spatial organization of chromatin.
Introduction
Composed of DNA and histones, chromatin has a three-dimensional (3D) structure at different hierarchical levels (1). The spatial organization of chromatin plays an essential role in many genomic functions, including gene expression, DNA replication, and cell mitosis (2, 3, 4, 5, 6). Several lines of evidence show that epigenetics can remodel chromatin structure at different levels (7, 8, 9, 10, 11, 12). Super-resolution imaging recently showed that chromatin folding varies for different epigenetic states (9).
DNA methylation, the most abundant epigenetic modification in eukaryotic chromosomes, is also thought to influence chromatin structure (10). DNA methylation is closely related to nucleosome positioning (11), and the binding of CCCTC-binding factor (CTCF) can be partly influenced by DNA methylation, which in turn changes chromatin structure (12). Recently, DNA methylation was also used to reconstruct the A/B compartments of chromatin revealed by high-resolution chromosome conformation capture (Hi-C) experiments (13). Nevertheless, how DNA methylation relates to chromatin structure remains largely unknown.
On the other hand, the distribution of DNA methylation in chromatin, and thus the correlation of DNA methylation levels between different genomic segments, may provide hints about the spatial organization of chromatin. Here, we investigate long-range and local correlations in the DNA methylation landscape using published whole-genome bisulfite sequencing (WGBS) data, which we expect to reflect the packing of DNA in 3D space, and try to obtain information on the underlying chromatin structure. We find that DNA methylation exhibits a long-range power law correlation with a cell class-specific scaling exponent, and that this exponent can be used to discern cell classes. The degree of DNA methylation patchiness is cell-specific, and this patchy methylation pattern contributes to the different scaling exponents in different cells. Using polymer modeling with Hi-C data, we show that the partially methylated domains (PMDs) spatially segregate from the non-PMDs (genomic regions that are not classified as PMDs) in the IMR90 cell line, giving it a patchiness of DNA methylation that differs from that of the h1 cell line. In this way, the cell class-specific exponents of the long-range DNA methylation correlation reflect the spatial organization of chromatin. We also demonstrate that the local DNA methylation correlation is related to nucleosome occupancy, and suggest that PMDs and non-PMDs have different chromatin structures at the nucleosome level. Therefore, both long-range and local DNA methylation correlations can reflect the spatial organization of chromatin.
Materials and Methods
Sources of WGBS data
In this work, we used WGBS data for different cells, including 36 somatic cells, 49 cancer cells and the corresponding normal cells, 8 human brain cells, 1 mouse brain cell, 12 embryonic stem cell lines and related cells, and 6 cells with neurodegenerative diseases (NDDs) (105 in total). All the methylomes are summarized in Tables S1–S5, including the references, URLs, and sample details. The methylomes of cancer samples were downloaded from The Cancer Genome Atlas (TCGA) project. Of all the cancer types in TCGA, nine have WGBS data, and all nine were used.
The Hg18 reference genome was used for human brain cells and human embryonic stem cells (ESCs). The other cells used Hg19 as the reference genome. We determined that the reference genome used had little effect on the methylation correlation found here (Fig. S9 A).
Identification of PMDs, non-PMDs, and PMD-like regions
PMDs were identified genome-wide in tumor samples using a sliding window approach, adopting the parameters of (14). The window size was set to 10 kb. A window was classified as a PMD window if it contained at least 10 methylated (β value > 0) CpG dinucleotides whose average methylation level was < 0.7. Contiguous PMD windows were then merged into longer PMDs. Only PMDs longer than 100 kb were used in the subsequent analysis. Non-PMDs were identified as the complement of the PMDs, and only non-PMDs longer than 100 kb were used. PMD-like regions were defined in noncancer samples as the genomic regions corresponding to cancer PMDs. Thus, PMD-like regions are defined only for the noncancer samples that have corresponding cancer samples.
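The window-based PMD calling described above can be sketched as follows. This is a minimal illustration, not the pipeline of (14): it assumes non-overlapping 10 kb windows (the step of the sliding window is not stated here), averages β only over CpGs with β > 0, and takes non-PMDs as the gaps between the retained PMDs. The function names are hypothetical.

```python
import numpy as np

def call_pmds(positions, betas, chrom_length, window=10_000, min_cpgs=10,
              max_mean=0.7, min_length=100_000):
    """Return PMD intervals (start, end) in bp for one chromosome (hypothetical helper).

    Assumptions: non-overlapping windows; the mean is taken over CpGs with beta > 0.
    """
    positions = np.asarray(positions, dtype=np.int64)
    betas = np.asarray(betas, dtype=float)
    n_windows = int(np.ceil(chrom_length / window))
    methylated = betas > 0
    win_idx = positions[methylated] // window
    counts = np.bincount(win_idx, minlength=n_windows)
    sums = np.bincount(win_idx, weights=betas[methylated], minlength=n_windows)
    means = np.full(n_windows, np.inf)          # empty windows never qualify
    np.divide(sums, counts, out=means, where=counts > 0)
    is_pmd = (counts >= min_cpgs) & (means < max_mean)

    # Merge runs of contiguous PMD windows into longer domains.
    intervals, w = [], 0
    while w < n_windows:
        if is_pmd[w]:
            start = w
            while w < n_windows and is_pmd[w]:
                w += 1
            intervals.append((start * window, min(w * window, chrom_length)))
        else:
            w += 1
    return [(s, e) for s, e in intervals if e - s > min_length]


def complement(intervals, chrom_length, min_length=100_000):
    """Gaps between intervals (non-PMDs), keeping only those longer than min_length."""
    gaps, prev = [], 0
    for s, e in sorted(intervals):
        if s > prev:
            gaps.append((prev, s))
        prev = max(prev, e)
    if prev < chrom_length:
        gaps.append((prev, chrom_length))
    return [(s, e) for s, e in gaps if e - s > min_length]


# Toy chromosome: one CpG every 100 bp, a hypomethylated block at 300-500 kb.
pos = np.arange(0, 1_000_000, 100)
beta = np.full(pos.size, 0.9)
beta[(pos >= 300_000) & (pos < 500_000)] = 0.4
pmds = call_pmds(pos, beta, 1_000_000)      # → [(300000, 500000)]
non_pmds = complement(pmds, 1_000_000)      # → [(0, 300000), (500000, 1000000)]
```

Using `np.bincount` keeps the per-window statistics vectorized, so the cost is linear in the number of CpGs.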
Calculation of scaling exponents
The scaling exponents of the long-range power law correlations were calculated as the maximum slope of the fitted double-log correlation curve over genomic distances of tens of kilobases. To systematically identify chromosomes whose double-log methylation correlation plots do not have a well-defined slope in this region, we calculated the SD of the first-order derivative of each fitted double-log curve. A low SD indicates linear behavior with small slope fluctuations, whereas a high SD indicates large fluctuations.
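A minimal sketch of the slope estimate and the SD-based linearity check follows. It assumes the double-log curve is smoothed with a cubic polynomial over 10–100 kb and reads "maximum slope" as the steepest (largest-magnitude) slope; the polynomial degree, distance range, and function name are our assumptions, not the authors' settings.

```python
import numpy as np

def scaling_exponent(distances, corr, d_min=1e4, d_max=1e5, degree=3, n_eval=50):
    """Return (steepest slope, SD of slopes) of the log10-log10 correlation curve.

    Hypothetical helper: fits a polynomial to log10(corr) vs log10(distance)
    within [d_min, d_max] and evaluates its first derivative on a grid.
    """
    d = np.asarray(distances, dtype=float)
    c = np.asarray(corr, dtype=float)
    keep = (d >= d_min) & (d <= d_max) & (c > 0)   # log needs positive values
    x, y = np.log10(d[keep]), np.log10(c[keep])
    coeffs = np.polyfit(x, y, degree)
    grid = np.linspace(x.min(), x.max(), n_eval)
    slopes = np.polyval(np.polyder(coeffs), grid)  # first-order derivative
    steepest = slopes[np.argmax(np.abs(slopes))]
    return float(steepest), float(np.std(slopes))


d = np.logspace(3, 6, 200)
exponent, sd = scaling_exponent(d, d ** -0.26)                 # clean power law
_, sd_curved = scaling_exponent(d, np.exp(-d / 2e4))          # exponential decay
```

For an exact power law the derivative is constant, so the SD is near zero; a curved (non-power-law) decay yields a large SD, which is the signal used to flag atypical chromosomes.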
Fast Fourier transform of the local correlation
Fast Fourier transform (FFT) was performed separately on the local correlations of the two types of genomic regions (PMDs and non-PMDs) in IMR90. To avoid finite length effects and the influence of the length distribution of genomic regions, we used only PMDs and non-PMDs longer than 0.1 Mb.
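The periodicity search can be illustrated with NumPy's real FFT (`numpy.fft.rfft`). In this sketch the local correlation is assumed to be sampled at every base pair of genomic distance; a linear trend is removed before the transform, and the peak is searched only between 100 and 300 bp. These choices, the synthetic inputs, and the function name are illustrative assumptions.

```python
import numpy as np

def dominant_period(corr, min_period=100.0, max_period=300.0):
    """Return (period in bp, peak power) of a correlation curve sampled per bp.

    Hypothetical helper: detrend linearly, take |rfft|^2, and pick the
    strongest peak whose period lies in [min_period, max_period].
    """
    y = np.asarray(corr, dtype=float)
    lags = np.arange(y.size)
    y = y - np.polyval(np.polyfit(lags, y, 1), lags)   # remove linear trend
    power = np.abs(np.fft.rfft(y)) ** 2
    freqs = np.fft.rfftfreq(y.size, d=1.0)             # cycles per bp
    periods = np.full_like(freqs, np.inf)
    periods[1:] = 1.0 / freqs[1:]                      # skip the zero frequency
    band = (periods >= min_period) & (periods <= max_period)
    best = int(np.argmax(np.where(band, power, -np.inf)))
    return float(periods[best]), float(power[best])


# Synthetic curves: same decay, strong vs weak 181 bp modulation.
d = np.arange(1810)
strong = 0.2 * np.exp(-d / 1000) + 0.05 * np.cos(2 * np.pi * d / 181)
weak = 0.2 * np.exp(-d / 1000) + 0.01 * np.cos(2 * np.pi * d / 181)
p_strong, s_strong = dominant_period(strong)
p_weak, s_weak = dominant_period(weak)
```

Comparing the peak power between region types (as for PMDs versus non-PMDs) only makes sense when the curves have equal length, since FFT power scales with the number of samples.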
Detrended fluctuation analysis
A brief explanation of detrended fluctuation analysis (DFA) is given here; further details can be found in the Supporting Material (Detrended Fluctuation Analysis for different cell classes). In DFA, the root mean-square fluctuation $F(l)$ of DNA methylation is defined as a function of genomic distance $l$. For purely uncorrelated random sequences, $F(l) \propto l^{1/2}$, corresponding to a slope of ∼0.5 in a double-log plot. If the correlation of a sequence decays exponentially, indicating a finite-range correlation, the fluctuation scaling exponent at large $l$ is also 0.5. Only when there is a long-range correlation without a finite characteristic length does the scaling exponent deviate from 0.5, so that the correlation may be described by a power law.
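For concreteness, the standard first-order DFA procedure can be sketched as follows: integrate the mean-subtracted series into a profile, split it into boxes of size $l$, remove a least-squares linear trend in each box, and take the root mean-square residual as $F(l)$; the slope of $\log F(l)$ versus $\log l$ is the scaling exponent. Applying it to methylation levels averaged in 200 bp bins is our assumption about the input, and the function name is hypothetical.

```python
import numpy as np

def dfa(x, box_sizes):
    """First-order DFA: return (F(l) for each box size, scaling exponent alpha)."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())                 # integrated, mean-subtracted series
    fluct = []
    for l in box_sizes:
        n_boxes = profile.size // l
        segments = profile[: n_boxes * l].reshape(n_boxes, l)
        t = np.arange(l)
        coeffs = np.polyfit(t, segments.T, 1)         # linear fit in every box, shape (2, n_boxes)
        trend = np.outer(coeffs[0], t) + coeffs[1][:, None]
        fluct.append(np.sqrt(np.mean((segments - trend) ** 2)))
    fluct = np.array(fluct)
    alpha = np.polyfit(np.log10(box_sizes), np.log10(fluct), 1)[0]
    return fluct, float(alpha)


rng = np.random.default_rng(0)
noise = rng.standard_normal(2 ** 14)
sizes = np.array([16, 32, 64, 128, 256, 512, 1024])
_, alpha_white = dfa(noise, sizes)             # uncorrelated input: alpha near 0.5
_, alpha_walk = dfa(np.cumsum(noise), sizes)   # strongly correlated input: alpha near 1.5
```

Because each box is detrended, slow drifts in the mean methylation level do not inflate $F(l)$, which is why DFA is often preferred over direct correlation estimates for long-range effects.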
Gene expression analysis
The level 3 RNA-seq by expectation maximization (RSEM) data from TCGA RNAseq version 2 were downloaded from https://portal.gdc.cancer.gov. The RSEM data were then converted to transcripts per million (TPM) by multiplying by $10^6$. To compare differentially expressed genes between tumor and normal samples, we chose tumor-normal sample pairs that were taken from the same patient and for which gene expression data were available for both the tumor and the normal sample. We finally obtained four tumor-normal pairs, namely, brca_t5-brca_n5, coad_t2-coad_n2, luad_t5-luad_n5, and ucec_t5-ucec_n5, and compared gene expression within each of these pairs.
Genes whose intragenic regions intersect tumor PMDs (or PMD-like regions in the corresponding normal samples) were identified. The gene expression fold change was then calculated for each gene as the ratio of tumor to normal expression, $\mathrm{TPM}_{\mathrm{tumor}}/\mathrm{TPM}_{\mathrm{normal}}$. These genes were divided into four categories: activated genes (fold change ≥ 2), repressed genes (fold change ≤ 0.5), genes specifically expressed in the tumor sample, and genes specifically expressed in the normal sample. We also defined the gene density of the genome and of specific genomic regions such as PMDs as the number of genes per million base pairs. Gene functional classification was carried out using the Database for Annotation, Visualization and Integrated Discovery (DAVID) (15, 16). The list of housekeeping genes can be downloaded from https://www.tau.ac.il/∼elieis/HKG/ (17).
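The gene categorization can be written as a small function. The fold-change thresholds (≥ 2 and ≤ 0.5) follow the text, but the criteria for tumor- or normal-specific expression (nonzero TPM in one sample and zero in the other) and the label for intermediate fold changes are our assumptions; the function names are hypothetical. Gene density is computed as genes per million base pairs.

```python
def classify_gene(tpm_tumor: float, tpm_normal: float,
                  up: float = 2.0, down: float = 0.5) -> str:
    """Assign a gene to an expression-change category (hypothetical helper)."""
    # Assumed specificity criterion: expressed in one sample, zero TPM in the other.
    if tpm_normal == 0 and tpm_tumor > 0:
        return "tumor-specific"
    if tpm_tumor == 0 and tpm_normal > 0:
        return "normal-specific"
    if tpm_tumor == 0 and tpm_normal == 0:
        return "not expressed"
    fold_change = tpm_tumor / tpm_normal
    if fold_change >= up:
        return "activated"
    if fold_change <= down:
        return "repressed"
    return "unchanged"


def gene_density(n_genes: int, region_length_bp: int) -> float:
    """Number of genes per million base pairs."""
    return n_genes / (region_length_bp / 1e6)


print(classify_gene(10.0, 2.0), classify_gene(1.0, 4.0), gene_density(10, 4_000_000))
```

Handling the zero-TPM cases first avoids a division by zero when computing the fold change.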
Structural modeling using Hi-C data
We developed a restraint-based method to construct an ensemble of 3D chromosome models (18), which was verified by its reproduction of experimental Hi-C contact frequencies. In our method, each chromosome is coarse-grained as a polymer chain consisting of a string of beads. The Hi-C data for the IMR90 and h1 cell lines were obtained from Rao et al. (19) and Dixon et al. (20), respectively. Consistent with the resolution of these Hi-C data, each bead represents a 50 kb genomic region in the IMR90 model and a 40 kb region in the h1 model. The polymer structure was optimized according to distance restraints derived from the Hi-C data. To achieve this, we first converted the contact frequency matrix measured by the Hi-C experiment into a distance matrix that provides spatial restraints for the coarse-grained beads. We then performed molecular dynamics (MD) simulations with biased potentials, starting from randomly generated initial conformations, to generate an ensemble of conformations consistent with the restraint distance matrix. Further modeling details and validation are presented in Xie et al. (18).
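The two steps of a restraint-based approach (contact frequency to distance, then structure optimization against the distance restraints) can be illustrated with a toy example. This is not the method of Xie et al. (18): we assume a power-law conversion $d_{ij} = f_{ij}^{-1/3}$ (the exponent is a placeholder), treat zero-contact pairs as unrestrained, and replace the biased MD simulations with plain gradient descent on the harmonic restraint energy $\sum_{i<j}(r_{ij}-d_{ij})^2$, producing a single conformation rather than an ensemble. All function names are hypothetical.

```python
import numpy as np

def contacts_to_distances(contacts, alpha=1.0 / 3.0):
    """Assumed power-law conversion d = f**(-alpha); zero contacts become NaN (no restraint)."""
    c = np.asarray(contacts, dtype=float)
    d = np.full_like(c, np.nan)
    mask = c > 0
    d[mask] = c[mask] ** (-alpha)
    np.fill_diagonal(d, np.nan)
    return d

def restraint_rms(coords, target):
    """Root mean-square violation of the distance restraints."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    mask = ~np.isnan(target)
    return float(np.sqrt(np.mean((dist[mask] - target[mask]) ** 2)))

def fit_structure(target, n_steps=3000, lr=0.005, seed=0):
    """Gradient descent on sum_{i<j} (r_ij - d_ij)^2 from a random start (stand-in for MD)."""
    rng = np.random.default_rng(seed)
    coords = rng.standard_normal((target.shape[0], 3))
    mask = ~np.isnan(target)
    tgt = np.where(mask, target, 0.0)
    for _ in range(n_steps):
        diff = coords[:, None, :] - coords[None, :, :]
        dist = np.linalg.norm(diff, axis=-1) + 1e-9          # avoid division by zero
        err = np.where(mask, dist - tgt, 0.0)
        grad = 2.0 * np.sum((err / dist)[:, :, None] * diff, axis=1)
        coords -= lr * grad
    return coords


# Toy "chromosome": 20 beads on a random walk; contacts generated as d**-3.
rng = np.random.default_rng(1)
true_xyz = np.cumsum(rng.standard_normal((20, 3)), axis=0)
true_d = np.linalg.norm(true_xyz[:, None] - true_xyz[None, :], axis=-1)
with np.errstate(divide="ignore"):
    contacts = true_d ** -3.0
np.fill_diagonal(contacts, 0.0)
target = contacts_to_distances(contacts)
initial = np.random.default_rng(0).standard_normal((20, 3))   # same start as fit_structure(seed=0)
fitted = fit_structure(target)
```

The toy model recovers a conformation whose restraint violation is lower than that of the random starting structure; a real pipeline would also handle noisy, sparse contacts and sample many conformations.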
Results
DNA methylation shows long-range power law correlation
We compared the Pearson correlation coefficients of DNA methylation levels (β values) within the methylomes of a wide range of human cells, including normal somatic cells, cancer cells, brain cells, gland cells, and stem cells. In calculating the long-range correlation, the methylation level was first averaged over 200 bp windows. The sources of the WGBS data used are summarized in Tables S1–S5.
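The binning and correlation steps can be sketched as follows, assuming CpG positions in bp and β values as NumPy arrays. Bins without CpGs are left as NaN and excluded pairwise, which is our assumption about missing-data handling; the correlation at lag $k$ corresponds to a genomic distance of $200k$ bp. Function names and the synthetic profile are hypothetical.

```python
import numpy as np

def bin_methylation(positions, betas, chrom_length, bin_size=200):
    """Average beta values in fixed-size bins; empty bins are NaN."""
    idx = np.asarray(positions) // bin_size
    n_bins = int(np.ceil(chrom_length / bin_size))
    counts = np.bincount(idx, minlength=n_bins)
    sums = np.bincount(idx, weights=np.asarray(betas, dtype=float), minlength=n_bins)
    binned = np.full(n_bins, np.nan)
    np.divide(sums, counts, out=binned, where=counts > 0)
    return binned

def lagged_correlation(binned, lags):
    """Pearson correlation between bins k apart, for each lag k >= 1."""
    out = []
    for k in lags:
        a, b = binned[:-k], binned[k:]
        ok = ~np.isnan(a) & ~np.isnan(b)             # pairwise deletion of empty bins
        out.append(np.corrcoef(a[ok], b[ok])[0, 1])
    return np.array(out)


small = bin_methylation(np.array([0, 50, 250]), [0.2, 0.4, 1.0], 600)   # [0.3, 1.0, nan]

# Hypothetical patchy profile: 50-bin blocks alternating between high and low methylation.
rng = np.random.default_rng(0)
n = 5000
profile = np.where((np.arange(n) // 50) % 2 == 0, 0.9, 0.2) + 0.05 * rng.standard_normal(n)
corr = lagged_correlation(profile, [1, 50])
```

Plotting `corr` against `200 * lag` on double-log axes gives the kind of curve whose slope is discussed below.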
Taking chromosome 1 as a representative example, the methylation correlations in all cells show a striking long-range power law decay with increasing genomic distance (Fig. 1). The power law correlation implies a scale-free property of DNA methylation, with the scale-invariant regime spanning tens of kilobases. Power law scaling is of general interest (21) and is often observed in evolving systems, where it may arise from hierarchical structures spanning several length scales (22). The scale-invariant genomic range (tens of kilobases) also encompasses the sizes of genes and chromatin domains, and can be important for a variety of genomic functions (19, 23).
The correlation coefficients still have finite values of order 0.01–0.1 even at 1 Mb genomic separation (Fig. 1). To verify the statistical significance of the power law decay found here, we also calculated the correlation of a randomly methylated DNA sequence for comparison. Specifically, we generated a randomized methylation pattern by randomly assigning the methylation level of each CpG according to the overall distribution of the original sample. The correlation coefficient for the random sample immediately drops to zero and the power law decay disappears (Fig. S1). This comparison clearly shows the nonrandom nature of DNA methylation in the cells and that the methylation levels of CpGs separated by very long genomic distances are indeed significantly correlated.
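The randomization control can be sketched by resampling β values from the sample's own empirical distribution (sampling with replacement, our reading of "according to the overall distribution"). This keeps the value distribution but destroys any positional correlation. The patchy input below is synthetic and purely illustrative.

```python
import numpy as np

def lag_corr(x, k):
    """Pearson correlation between values k positions apart."""
    return float(np.corrcoef(x[:-k], x[k:])[0, 1])

def randomize(betas, seed=0):
    """Resample values with replacement from the empirical distribution."""
    rng = np.random.default_rng(seed)
    return rng.choice(betas, size=len(betas), replace=True)


rng = np.random.default_rng(1)
n = 20_000
# Hypothetical patchy profile: alternating 100-unit high/low blocks plus noise.
patchy = np.where((np.arange(n) // 100) % 2 == 0, 0.8, 0.3) + 0.05 * rng.standard_normal(n)
shuffled = randomize(patchy)
print(lag_corr(patchy, 20), lag_corr(shuffled, 20))
```

The original profile stays strongly correlated at a lag of 20 units, while the resampled one fluctuates around zero, with a spread of roughly $1/\sqrt{n}$.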
Interestingly, the scaling exponents of long-range DNA methylation correlation differ substantially between normal somatic cells and cancer cells, with respective values of −0.26 ± 0.02 and −0.06 ± 0.02. The magnitude of the exponent for cancer cells is significantly smaller than that for normal somatic cells; that is, the correlation decays more slowly in cancer cells. Small SDs show that the scaling exponents are conserved within normal somatic cells and within cancer cells (Fig. 1, A and B), although the methylation levels of individual CpGs (and even the average values over all CpGs) vary greatly (24). Similarly, the differences between normal somatic cells and brain or gland cells are substantial but consistent within each cell class (Fig. 1, C and D), suggesting that cellular differentiation causes systematic variations in the DNA methylation landscape. We also found that the scaling exponents for chromosome 1 of normal somatic cells from three different individuals are conserved (Fig. 2 A; Figs. S2 A and S3 A) and that the power law scaling is also present in mouse brain cells (Fig. S2 B).
In addition, DFA (25) was performed, which again shows the long-range correlation in the DNA methylome (Fig. S4). DFA was previously used to describe long-range correlations in DNA sequences (25) and is more robust than direct correlation calculation for determining the average behavior of a long-range effect. Average scaling exponents of 0.76 ± 0.01 and 0.92 ± 0.02 are observed for normal somatic cells (Fig. S4 A) and cancer cells (Fig. S4 B), respectively. Their deviation from 0.5 and their small variances indicate a uniform power law decay within a given cell state across different types of tissues. Cancer cells have an obviously higher scaling exponent, in accordance with their flatter double-log correlation curves (Fig. 1 B). The DFA results for gland cells, brain cells, and ESCs are presented in Fig. S4, C–E, respectively.
The significant difference between cell classes, which share essentially the same genome sequence, demonstrates that the long-range correlations in the DNA methylome cannot simply originate from the DNA sequence (25, 26). DNA methylation was previously shown to have long-range correlations that are firmly linked to A/B compartments (13), suggesting that the scale-free property of DNA methylation found here originates from chromatin structure, as discussed in more detail below.
Clustering on the scaling exponents of chromosomes can be used to discern different cell types
The power law scaling behavior is observed in almost all chromosomes across a large variety of samples (Fig. S2 C). Hierarchical clustering of the scaling exponents of all autosomes demonstrates that most chromosomes behave similarly within each cell class, whereas chromosomes 14 and 21 consistently have higher scaling exponents (Fig. 2). When all chromosomes are compared, cancer cells are clearly distinguished from normal somatic cells (Fig. 2 B), consistent with the clustering on cell types (Fig. S3 B). Systematic differences are also clearly seen among normal brain cells, glioblastoma, and NDDs (Fig. 2 C). Different types of NDDs have similar scaling exponents while differing significantly from glioblastoma, possibly highlighting their different pathogenesis (Fig. 2 C). In addition, the scaling exponent clearly distinguishes ESCs and induced pluripotent stem cells from somatic cell lines and adult stem cell lines (Fig. 2 D).
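To illustrate how per-chromosome scaling exponents can separate cell classes, here is a small average-linkage hierarchical clustering written with NumPy on a samples × chromosomes matrix. The linkage method, the Euclidean distance, and the synthetic exponents are assumptions for illustration only; in practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would normally be used. Transposing the matrix clusters chromosomes instead of samples.

```python
import numpy as np

def average_linkage(X, n_clusters):
    """Agglomerative clustering of rows of X with average linkage; returns labels."""
    X = np.asarray(X, dtype=float)
    clusters = [[i] for i in range(len(X))]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    while len(clusters) > n_clusters:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()   # average linkage
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]   # merge b into a (b > a, so a stays valid)
        del clusters[b]
    labels = np.empty(len(X), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


# Synthetic exponents for 22 autosomes: 5 "normal-like" and 4 "cancer-like" samples.
rng = np.random.default_rng(2)
normal_like = -0.26 + 0.02 * rng.standard_normal((5, 22))
cancer_like = -0.06 + 0.02 * rng.standard_normal((4, 22))
labels = average_linkage(np.vstack([normal_like, cancer_like]), n_clusters=2)
```

The naive search over cluster pairs is cubic in the number of samples, which is fine for the ~100 methylomes considered here but not for much larger collections.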
Compared with normal brain cells, all NDD samples analyzed here have more negative scaling exponents for chromosomes 2, 3, 5, and 15, suggesting that these chromosomes may play common roles in these neurodegenerative diseases. In contrast, chromosome 19 shows little variation among all samples.
A small number of chromosomes have correlations that deviate from simple power law scaling (Fig. S2 D). We systematically identified such chromosomes (see Materials and Methods); interestingly, they are mainly found in certain cells and particular chromosomes, namely chromosome 22 of the brain samples and chromosomes 4 and 10 of ESCs and induced pluripotent stem cells. Although the atypical power law behavior of these chromosomes may reflect large fluctuations in the original methylation data, this clustered behavior could also suggest that these particular chromosomes have peculiar structures and functions that call for further study. For example, genes on chromosome 22 are known to be dense, and genetic disorders of chromosome 22 are associated with brain abnormalities (27).
Patchiness of DNA methylation is found along the genome and contributes to the power law scaling
Extensive changes in DNA methylation take place during tumorigenesis (28, 29). In cancer cells, a large amount of long-range DNA hypomethylation was identified, distinct from the DNA methylation of normal cells (28). A domain with long-range DNA hypomethylation is termed a PMD (14). The DNA methylation profile illustrating the PMD formation in cancer cells can also be seen in Fig. 6 C. The IMR90 cell line also has such DNA hypomethylation character (14, 28, 30).
For cancer cells or the IMR90 cell line, in contrast to other cells, the whole chromosome can be viewed as composed of alternating low-methylation domains (i.e., PMDs) and high-methylation domains (i.e., non-PMDs). That is, the patchiness of DNA methylation is more apparent in cancer cells and the IMR90 cell line than in normal somatic and stem cells. The scaling exponents for IMR90 and cancer cells are similar to each other (Fig. 2). This coincidence prompted us to investigate whether the patchiness of DNA methylation contributes to the different scaling exponents of long-range DNA methylation correlation in different cell classes.
Chromosome 1 in the IMR90 and h1 cell lines is taken as an example. In IMR90, 34% of chromosome 1 is covered by PMDs, whereas h1 lacks PMDs; that is, IMR90 and h1 have different degrees of DNA methylation patchiness. To understand how such patchiness arises in IMR90 but not in h1 cells, their available Hi-C data (19, 20) are used for structural modeling in the next section.
Here, we show that the alternating high-low pattern of DNA methylation is sufficient to mathematically reproduce the slowly decaying correlation in IMR90. We discretized the DNA methylation levels of IMR90 and h1 into 1 and 0, using the average methylation level as the reference value. Specifically, for chromosome 1 of each cell type, we assigned a value of 1 to every 200 bp unit with a methylation level greater than the chromosome average and 0 to those with a methylation level smaller than the average. The correlations of the two discrete model series were calculated and are shown in Fig. 3 B; the corresponding correlations of the experimental DNA methylation levels are plotted in Fig. 3 A for comparison. The discrete model series also shows power law scaling at the tens of kilobases scale (Fig. 3 B). The comparison between Fig. 3, A and B shows that the discrete model reproduces the different scaling exponents of the IMR90 and h1 cell lines, indicating that the difference mainly comes from the different patchiness of their DNA methylation patterns. That is, the alternation of low and high methylation along the genome in IMR90 results in a power law scaling exponent of smaller magnitude than that of the h1 cell line.
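The discretization step can be sketched as below. Units exactly equal to the chromosome average are assigned 0 here and empty bins stay NaN; both choices are our assumptions. The example contrasts a hypothetical patchy profile with an unstructured one to show that the binary series retains correlation over long lags only when the methylation is patchy.

```python
import numpy as np

def discretize(binned):
    """Map each 200 bp unit to 1 if above the chromosome mean, else 0; NaN stays NaN."""
    b = np.asarray(binned, dtype=float)
    out = np.where(b > np.nanmean(b), 1.0, 0.0)
    out[np.isnan(b)] = np.nan
    return out

def lag_corr(x, k):
    """Pearson correlation between units k apart, ignoring NaN pairs."""
    a, b = x[:-k], x[k:]
    ok = ~np.isnan(a) & ~np.isnan(b)
    return float(np.corrcoef(a[ok], b[ok])[0, 1])


rng = np.random.default_rng(0)
n = 20_000
# Hypothetical profiles: 100-unit high/low blocks vs. independent random levels.
patchy = np.where((np.arange(n) // 100) % 2 == 0, 0.8, 0.3) + 0.05 * rng.standard_normal(n)
uniform = rng.uniform(0.0, 1.0, n)
patchy_bin, uniform_bin = discretize(patchy), discretize(uniform)
print(lag_corr(patchy_bin, 10), lag_corr(uniform_bin, 10))
```

For blocks of length $L$, the binary series has correlation of about $1 - 2k/L$ at lag $k < L$, so the decay rate is set by the block (domain) sizes rather than by the exact methylation values.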
In addition, we calculated the correlations of the discrete model series for all the samples used in Fig. 1 (Fig. S5). The resulting scaling exponents for normal somatic cells, cancer cells, normal brain cells, and gland cells are −0.18 ± 0.02, −0.06 ± 0.02, −0.26 ± 0.03, and −0.13 ± 0.03, respectively. For cancer cells, the scaling exponent of the discrete model series is the same as that of the experimental DNA methylation data; for the other three cell classes, the two values differ, but only slightly. The order of the scaling exponents among the four cell classes is maintained after discretization, again demonstrating that the alternating high-low pattern of DNA methylation accounts for the cell class-specific scaling exponents.
Patchiness of DNA methylation in IMR90 is related to chromatin structure
Next, we show that the patchiness of DNA methylation reflects the packing of DNA in the 3D chromatin structure by mapping methylation levels onto the chromatin structure modeled using Hi-C data.
We have developed a polymer modeling strategy that uses Hi-C data to construct chromatin structures (see Materials and Methods and (18)). Hi-C data provide the frequency of physical interactions between any two genomic loci (31), and these frequencies can be further related to spatial distances (32). We used structural optimization to obtain coarse-grained chromatin conformations that satisfy the distance restraints derived from Hi-C data.
We modeled the structures of chromosome 1 from IMR90 and h1 cells and mapped their DNA methylation levels onto the structures, which are shown in Fig. 4, A and B, respectively. The two chromatin structures have obviously different organizations. Chromosome 1 of IMR90 has a somewhat spherical appearance (Fig. 4 A), whereas the h1 chromosome adopts a scissor-like conformation (Fig. 4 B), suggesting structural changes during cellular differentiation. The mapping of DNA methylation levels might provide a clue to how the different patchiness of DNA methylation in the two cell lines arises. Interestingly, genomic regions with low methylation levels (colored blue in Fig. 4 A) are largely located close to each other in the IMR90 chromatin model reconstructed from the Hi-C data. In contrast, segregation by DNA methylation level is not obvious in the h1 cell line (Fig. 4 B).
In our previous work, we showed that the segregated low-methylation regions (PMDs) in IMR90 are related to lamina-associated domains and chromatin compartment B, as well as to other genomic features (18), suggesting that PMD formation may be caused by impaired DNA methyltransferase function in chromatin compartment B and may be the origin of the different patchiness in IMR90 compared with h1. Nearly all PMDs are located in chromatin compartment B and segregate from other genomic regions (chromatin compartment A) (18). This confirms the spatial segregation of DNA methylation levels in IMR90 seen qualitatively in the rendered chromosome structure (Fig. 4 A). These results suggest that the different patchiness of DNA methylation in different cells is related to their different chromatin structures. Thus, the long-range power law correlation of DNA methylation can reflect the spatial organization of chromatin, which is itself hierarchical.
Local methylation correlations suggest the different chromatin structure in PMDs and non-PMDs
In the previous sections, we showed that the long-range correlation of DNA methylation reflects the global packing of DNA in chromatin. Next, we show that the local methylation correlations in PMDs and non-PMDs reflect their different structures. IMR90 cells, whose DNA methylation and nucleosome occupancy were measured simultaneously using nucleosome occupancy and methylome sequencing (NOMe-seq), are used as an example (33).
We compared the local correlation of CpG methylation in PMDs and non-PMDs in IMR90 cells (Fig. 5 A). Consistent with previous studies of the IMR90 cell line (34), the PMD correlation decays with an obvious periodic behavior at base resolution, whereas the non-PMD regions show very weak periodic behavior.
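Base-resolution local correlation can be computed by correlating all pairs of CpGs that are exactly $d$ bp apart, for each $d$. The sketch below places β values on a per-base track (NaN where there is no CpG) and is meant for a single region such as one PMD; the minimum pair count, the synthetic 180 bp modulation, and the function name are assumptions for illustration.

```python
import numpy as np

def local_correlation(positions, betas, max_distance):
    """Pearson correlation of CpG pairs exactly d bp apart, for d = 2..max_distance.

    Hypothetical helper; entries with too few pairs are NaN.
    """
    pos = np.asarray(positions)
    start = pos.min()
    track = np.full(pos.max() - start + 1, np.nan)   # one slot per base pair
    track[pos - start] = betas
    corr = np.full(max_distance + 1, np.nan)
    for d in range(2, max_distance + 1):             # adjacent CpGs are at least 2 bp apart
        a, b = track[:-d], track[d:]
        ok = ~np.isnan(a) & ~np.isnan(b)
        if ok.sum() > 2:
            corr[d] = np.corrcoef(a[ok], b[ok])[0, 1]
    return corr


# Synthetic region: a CpG every 10 bp, methylation modulated with a 180 bp period.
rng = np.random.default_rng(0)
pos = np.arange(0, 20_000, 10)
beta = 0.5 + 0.3 * np.cos(2 * np.pi * pos / 180) + 0.05 * rng.standard_normal(pos.size)
corr = local_correlation(pos, beta, 400)
```

Distances with no CpG pairs remain NaN (here every distance that is not a multiple of 10 bp); these gaps need to be interpolated or handled explicitly before applying an FFT.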
The periodicities in different genomic regions were then quantified using FFTs of their local correlations. For PMDs, the FFT of the local correlation shows a strong peak at 181 bp, whereas a much weaker peak was found at a similar position for non-PMD regions (Fig. 5 B). The period of 181 bp is consistent with the nucleosome repeat length (NRL), suggesting that this periodicity may come from the regular organization of nucleosomes in PMDs and that the nucleosomes in non-PMDs are relatively irregularly spaced.
We further analyzed the nucleosome occupancy data from the same nucleosome occupancy and methylome sequencing experiment (33). The local correlation of nucleosome occupancy in PMD and non-PMD regions is shown in Fig. 5 C, and their FFTs are plotted in Fig. 5 D. The local correlation of nucleosome occupancy has a 182 bp period in both PMD and non-PMD regions, as seen from the FFTs of the local correlations (Fig. 5 D). Interestingly, the periodicity is stronger in PMDs than in non-PMDs (Fig. 5 C), mirroring their difference in methylation patterns but to a lesser extent. This result is consistent with the possibility that local DNA methylation correlates with nucleosome organization. The regularity of nucleosome arrangement can be weakened by nucleosome depletion or by variable NRLs, and both factors strongly affect chromatin organization and compaction. Nucleosome depletion strongly influences chromatin flexibility (26, 35, 36). Depending on the NRL, nucleosomes can form the 30 nm higher-order chromatin structure or other chromatin fibers (37, 38). Therefore, the local correlation of DNA methylation suggests that the chromatin structures of PMDs and non-PMDs differ at the nucleosome level, i.e., at the scale of several hundred base pairs.
This conclusion is also consistent with our previous analysis of Hi-C data (18), in which we found that the Hi-C patterns of PMDs and non-PMDs are obviously different, again showing that these two types of domains have different spatial organizations. In the Hi-C data, all PMDs show uniform physical contacts within their interiors, whereas most non-PMDs contain localized interaction domains.
Gene expression in PMDs and PMD-like regions is repressed
To understand how the patchiness of DNA methylation relates to biological function, we analyzed gene expression in PMDs and non-PMDs, using the four TCGA tumor-normal sample pairs described in Materials and Methods. It was previously found that the PMDs in IMR90 correlate with repressive histone marks and anticorrelate with active histone marks (28). In addition, CpG island (CGI) promoters are hypermethylated in PMDs (28).
Consistent with earlier studies (24, 30, 39), we find that genes within PMDs in cancer samples tend to be transcriptionally repressed (Fig. S6; Tables S6 and S7) and, interestingly, these genes are related to specific functions. Genes within cancer PMDs mainly relate to Gene Ontology terms such as cell membrane, glycoprotein, disulfide bond, olfaction, cadherin, and receptor (Table S8), which suggests that some intra-PMD genes regulating cell communication tend to be repressed. In addition, almost all housekeeping genes (3794 of 3796) are located outside PMDs, consistent with their essential role in fundamental cellular function (Table S8).
Taking the brca_t5 tumor sample as an example, 473 genes intersect with PMDs, of which 305 are located within the PMD body and, in particular, 156 are in the PMD center (defined as the central 60% of the PMD), indicating that most of these genes are embedded in the PMD body. In addition, 57.7% of the 473 PMD-intersecting genes have non-CGI promoters (CGI and non-CGI promoters as defined in (40)), a proportion significantly higher than that among all genes (34.2%, 8420 of 24,630 genes), which indicates that genes with non-CGI promoters are enriched in PMDs (Table S8).
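One way to check the stated enrichment is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test on the corresponding 2 × 2 table), using only the counts given above. The count of 273 non-CGI-promoter genes is our rounding of 57.7% × 473, and the text does not state which test was used, so this is an illustrative check rather than a reproduction of the analysis.

```python
from math import comb

def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return upper / comb(N, n)   # exact big-integer sums, one final division

N_ALL, K_NONCGI = 24_630, 8_420    # all genes / genes with non-CGI promoters
N_PMD = 473                        # genes intersecting PMDs in brca_t5
k_obs = round(0.577 * N_PMD)       # approximated from the reported 57.7%
expected = N_PMD * K_NONCGI / N_ALL
p_value = hypergeom_sf(k_obs, N_ALL, K_NONCGI, N_PMD)
print(k_obs, round(expected, 1), p_value)
```

Python's arbitrary-precision integers keep the binomial coefficients exact, so the tail probability is not affected by overflow even though the individual terms have about a thousand digits.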
Besides the overall repression of gene expression, we also find that the degree of repression correlates with PMD length in the four tumor-normal sample pairs. Fig. 6 A shows the correlation between gene repression levels and PMD lengths in brca, and the results for two other tumor-normal sample pairs (colon adenocarcinoma (coad) and uterine corpus endometrial carcinoma (ucec)) are presented in Fig. S7. With increasing length of PMDs or PMD-like regions, the percentage of repressed genes increases, which may result from these genomic regions being buried in compact chromatin in 3D space, which probably also impedes the binding of transcription factors, RNA polymerase, or other regulators. The gene density of PMDs (2.737 genes per Mb in the brca_t5 sample) is also much lower than that of non-PMDs (6.526 genes per Mb in the brca_t5 sample), indicating the gene sparsity of PMDs.
We plotted the local methylation correlation of different genomic regions of breast cells in Fig. 6 B. The decay of the DNA methylation correlation in the PMDs of breast cancer cells clearly shows periodic behavior at base resolution, as in the IMR90 cell line. The genomic regions in normal cells corresponding to cancer PMDs are defined as PMD-like regions. We found that PMD-like regions in breast cells have an average methylation level higher than that of PMDs and lower than that of non-PMDs (Fig. S8), and a less obvious periodicity of methylation correlation (Fig. 6 B). Furthermore, the average expression level of genes in PMD-like regions is lower than in non-PMDs and higher than in PMDs, consistent with the intermediate behavior of PMD-like regions in DNA methylation level (Fig. S8) and in the periodicity of local methylation correlation (Fig. 6 B).
The intermediate properties of PMD-like regions and their genomic locations, which match those of PMDs, suggest a role for PMD-like regions in tumor development: PMD-like regions may be precursors of tumor PMDs, in which the genes regulating cell communication are further repressed. Interestingly, PMDs and PMD-like domains tend to lie in genomic regions with lower CpG density (Fig. 6 C), suggesting that they belong to different isochores (41). The results of Fisher's exact tests comparing expression levels between PMDs and PMD-like regions, and between PMDs and non-PMDs, are given in Table S7. The average expression levels in PMDs, PMD-like regions, and non-PMDs are given in Table S6, and the comparison of expression levels in PMDs and PMD-like regions for the four tumor-normal sample pairs is shown in Fig. S6.
Discussion
The distinct power law scaling in cancer cells is not caused by their lower average methylation level or by copy number variation (CNV). One difference between cancer and normal somatic cells in DNA methylation is that the former appear to be demethylated in PMDs compared with the latter. To show that the more sustained correlation of DNA methylation in cancer cells is not caused by this overall demethylation, we checked the scaling exponents of methylation correlations among cells with large variations in methylation levels. We calculated the methylation correlations of human inner cell mass (42) and primordial germ cells (43). The average methylation levels of these cells are both significantly lower than that of normal somatic cells, as is also the case for cancer cells (Fig. S9 B). However, the scaling exponents for inner cell mass and primordial germ cells are nearly the same as those of normal cells and much lower than those of cancer cells (Fig. S9 C). Thus, the average methylation level does not determine the different scaling exponents across cell classes.
Because tumor samples are enriched in CNVs, we also verified that the DNA methylation correlation in cancer cells is little affected by CNVs. For example, the long-range methylation correlation for chromosome 1 behaves similarly whether the CpG sites within CNVs of the brca_t5 tumor sample are included in or excluded from the correlation calculations (Fig. S9 D). We also checked single-cell WGBS data and found that the long-range correlation pattern is well conserved (Fig. S9 E). Therefore, the DNA methylation correlation found in this work is also conserved among individual cells.
Conclusion
Through exploiting the chromatin structure modeled based on Hi-C data and the underlying long-range and local correlations of the DNA methylome, our study provides a comprehensive view of the flow of genetic information, connecting DNA sequence, CpG methylation, local and long-range chromatin structure, and gene expression. In normal somatic cells, DNA sequences with low CpG density correlate with low methylation levels and low expression levels (PMD-like). The development of cancers is associated with further decreases of the average methylation level in PMD-like regions, some of which turn into PMDs containing further suppressed genes. The correlation of methylation shows consistent differences among different classes of cells, including normal somatic cells, cancer cells, brain cells, gland cells, and stem cells, that are highly conserved within each class. The clear cell class dependence of the long-range power law scaling in methylation correlation shows that it can serve as a simple measure to discriminate cells at normal and pathological states. Such a finding points to a new direction, to our knowledge, in the analysis of the development of different diseases, such as cancers and NDDs, at the chromatin level.
Author Contributions
Y.Q.G. designed the research. L.Z., W.J.X., S.L., and L.M. performed research. W.J.X., L.Z., S.L., and L.M. wrote the manuscript. C.G. contributed to the analysis of the data.
Acknowledgments
We thank Fuchou Tang, Chengqi Yi, and Xiaoliang S. Xie for helpful comments on the manuscript. The results shown here are partly based upon data generated by the TCGA Research Network: http://cancergenome.nih.gov/. We are grateful to Manel Esteller for providing the methylome of neurodegenerative diseases. This work was supported by the National Natural Science Foundation of China (grants 21573006, 21233002, and 91427304 to Y.Q.G.).
Editor: Anatoly Kolomeisky.
Footnotes
Ling Zhang and Wen Jun Xie contributed equally to this work.
Supporting Materials and Methods, nine figures, and eight tables are available at http://www.biophysj.org/biophysj/supplemental/S0006-3495(17)30911-6.
Supporting Material
References
- 1. Gibcus J.H., Dekker J. The hierarchy of the 3D genome. Mol. Cell. 2013;49:773–782. doi: 10.1016/j.molcel.2013.02.011.
- 2. Levine M., Cattoglio C., Tjian R. Looping back to leap forward: transcription enters a new era. Cell. 2014;157:13–25. doi: 10.1016/j.cell.2014.02.009.
- 3. Galupa R., Heard E. X-chromosome inactivation: new insights into cis and trans regulation. Curr. Opin. Genet. Dev. 2015;31:57–66. doi: 10.1016/j.gde.2015.04.002.
- 4. Naumova N., Imakaev M., Dekker J. Organization of the mitotic chromosome. Science. 2013;342:948–953. doi: 10.1126/science.1236083.
- 5. Pombo A., Dillon N. Three-dimensional genome architecture: players and mechanisms. Nat. Rev. Mol. Cell Biol. 2015;16:245–257. doi: 10.1038/nrm3965.
- 6. Dekker J. Gene regulation in the third dimension. Science. 2008;319:1793–1794. doi: 10.1126/science.1152850.
- 7. Jenuwein T., Allis C.D. Translating the histone code. Science. 2001;293:1074–1080. doi: 10.1126/science.1063127.
- 8. Aranda S., Mas G., Di Croce L. Regulation of gene transcription by Polycomb proteins. Sci. Adv. 2015;1:e1500737. doi: 10.1126/sciadv.1500737.
- 9. Boettiger A.N., Bintu B., Zhuang X. Super-resolution imaging reveals distinct chromatin folding for different epigenetic states. Nature. 2016;529:418–422. doi: 10.1038/nature16496.
- 10. Cedar H., Bergman Y. Linking DNA methylation and histone modification: patterns and paradigms. Nat. Rev. Genet. 2009;10:295–304. doi: 10.1038/nrg2540.
- 11. Chodavarapu R.K., Feng S., Pellegrini M. Relationship between nucleosome positioning and DNA methylation. Nature. 2010;466:388–392. doi: 10.1038/nature09147.
- 12. Bell A.C., Felsenfeld G. Methylation of a CTCF-dependent boundary controls imprinted expression of the Igf2 gene. Nature. 2000;405:482–485. doi: 10.1038/35013100.
- 13. Fortin J.P., Hansen K.D. Reconstructing A/B compartments as revealed by Hi-C using long-range correlations in epigenetic data. Genome Biol. 2015;16:180. doi: 10.1186/s13059-015-0741-y.
- 14. Lister R., Pelizzola M., Ecker J.R. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature. 2009;462:315–322. doi: 10.1038/nature08514.
- 15. Huang W., Sherman B.T., Lempicki R.A. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009;37:1–13. doi: 10.1093/nar/gkn923.
- 16. Huang W., Sherman B.T., Lempicki R.A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 2009;4:44–57. doi: 10.1038/nprot.2008.211.
- 17. Eisenberg E., Levanon E.Y. Human housekeeping genes, revisited. Trends Genet. 2013;29:569–574. doi: 10.1016/j.tig.2013.05.010.
- 18. Xie W.J., Meng L., Gao Y.Q. Structural modeling of chromatin integrates genome features and reveals chromosome folding principle. Sci. Rep. 2017;7:2818. doi: 10.1038/s41598-017-02923-6.
- 19. Rao S.S., Huntley M.H., Aiden E.L. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159:1665–1680. doi: 10.1016/j.cell.2014.11.021.
- 20. Dixon J.R., Selvaraj S., Ren B. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012;485:376–380. doi: 10.1038/nature11082.
- 21. Clauset A., Shalizi C.R., Newman M.E.J. Power-law distributions in empirical data. SIAM Rev. 2009;51:661–703.
- 22. Koonin E.V., Wolf Y.I., Karev G.P. Power Laws, Scale-Free Networks and Genome Biology. Springer; New York: 2006.
- 23. Schneider R., Grosschedl R. Dynamics and interplay of nuclear architecture, genome organization, and gene expression. Genes Dev. 2007;21:3027–3043. doi: 10.1101/gad.1604607.
- 24. Schultz M.D., He Y., Ecker J.R. Human body epigenome maps reveal noncanonical DNA methylation variation. Nature. 2015;523:212–216. doi: 10.1038/nature14465.
- 25. Peng C.K., Buldyrev S.V., Stanley H.E. Long-range correlations in nucleotide sequences. Nature. 1992;356:168–170. doi: 10.1038/356168a0.
- 26. Arneodo A., Vaillant C., Thermes C. Multi-scale coding of genomic information: from DNA sequence to genome structure and function. Phys. Rep. 2011;498:45–188.
- 27. McDermid H.E., Morrow B.E. Genomic disorders on 22q11. Am. J. Hum. Genet. 2002;70:1077–1088. doi: 10.1086/340363.
- 28. Berman B.P., Weisenberger D.J., Laird P.W. Regions of focal DNA hypermethylation and long-range hypomethylation in colorectal cancer coincide with nuclear lamina-associated domains. Nat. Genet. 2011;44:40–46. doi: 10.1038/ng.969.
- 29. Hon G.C., Hawkins R.D., Ren B. Global DNA hypomethylation coupled to repressive chromatin domain formation and gene silencing in breast cancer. Genome Res. 2012;22:246–258. doi: 10.1101/gr.125872.111.
- 30. Lister R., Pelizzola M., Ecker J.R. Hotspots of aberrant epigenomic reprogramming in human induced pluripotent stem cells. Nature. 2011;471:68–73. doi: 10.1038/nature09798.
- 31. Lieberman-Aiden E., van Berkum N.L., Dekker J. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009;326:289–293. doi: 10.1126/science.1181369.
- 32. Serra F., Di Stefano M., Marti-Renom M.A. Restraint-based three-dimensional modeling of genomes and genomic domains. FEBS Lett. 2015;589(20 Pt A):2987–2995. doi: 10.1016/j.febslet.2015.05.012.
- 33. Collings C.K., Anderson J.N. Links between DNA methylation and nucleosome occupancy in the human genome. Epigenet. Chromatin. 2017;10:18. doi: 10.1186/s13072-017-0125-5.
- 34. Gaidatzis D., Burger L., Stadler M.B. DNA sequence explains seemingly disordered methylation levels in partially methylated domains of Mammalian genomes. PLoS Genet. 2014;10:e1004143. doi: 10.1371/journal.pgen.1004143.
- 35. Diesinger P.M., Heermann D.W. Depletion effects massively change chromatin properties and influence genome folding. Biophys. J. 2009;97:2146–2153. doi: 10.1016/j.bpj.2009.06.057.
- 36. Ricci M.A., Manzo C., Cosma M.P. Chromatin fibers are formed by heterogeneous groups of nucleosomes in vivo. Cell. 2015;160:1145–1158. doi: 10.1016/j.cell.2015.01.054.
- 37. Grigoryev S.A. Nucleosome spacing and chromatin higher-order folding. Nucleus. 2012;3:493–499. doi: 10.4161/nucl.22168.
- 38. Routh A., Sandin S., Rhodes D. Nucleosome repeat length and linker histone stoichiometry determine chromatin fiber structure. Proc. Natl. Acad. Sci. USA. 2008;105:8872–8877. doi: 10.1073/pnas.0802336105.
- 39. Schroeder D.I., Blair J.D., LaSalle J.M. The human placenta methylome. Proc. Natl. Acad. Sci. USA. 2013;110:6037–6042. doi: 10.1073/pnas.1215145110.
- 40. Saxonov S., Berg P., Brutlag D.L. A genome-wide analysis of CpG dinucleotides in the human genome distinguishes two distinct classes of promoters. Proc. Natl. Acad. Sci. USA. 2006;103:1412–1417. doi: 10.1073/pnas.0510310103.
- 41. Costantini M., Clay O., Bernardi G. An isochore map of human chromosomes. Genome Res. 2006;16:536–541. doi: 10.1101/gr.4910606.
- 42. Guo H., Zhu P., Qiao J. The DNA methylation landscape of human early embryos. Nature. 2014;511:606–610. doi: 10.1038/nature13544.
- 43. Guo F., Yan L., Qiao J. The transcriptome and DNA methylome landscapes of human primordial germ cells. Cell. 2015;161:1437–1452. doi: 10.1016/j.cell.2015.05.015.