A Quantitative Analysis of the Impact on Chromatin Accessibility by Histone Modifications and Binding of Transcription Factors in DNase I Hypersensitive Sites

Peng Cui; Jing Li; Bo Sun; Menghuan Zhang; Baofeng Lian; Yixue Li; Lu Xie

doi:10.1155/2013/914971

. 2013 Oct 22;2013:914971. doi: 10.1155/2013/914971

A Quantitative Analysis of the Impact on Chromatin Accessibility by Histone Modifications and Binding of Transcription Factors in DNase I Hypersensitive Sites

Peng Cui ^1,², Jing Li ^1,², Bo Sun ^1,², Menghuan Zhang ¹, Baofeng Lian ^1,², Yixue Li ^1,^2,^*, Lu Xie ^2,^*

PMCID: PMC3819824 PMID: 24236298

Abstract

It is known that chromatin features such as histone modifications and the binding of transcription factors exert a significant impact on the “openness” of chromatin. In this study, we present a quantitative analysis of the genome-wide relationship between chromatin features and chromatin accessibility in DNase I hypersensitive sites. We found that these features show distinct preference to localize in open chromatin. In order to elucidate the exact impact, we derived quantitative models to directly predict the “openness” of chromatin using histone modification features and transcription factor binding features, respectively. We show that these two types of features are highly predictive for chromatin accessibility in a statistical viewpoint. Moreover, our results indicate that these features are highly redundant and only a small number of features are needed to achieve a very high predictive power. Our study provides new insights into the true biological phenomena and the combinatorial effects of chromatin features to differential DNase I hypersensitivity.

1. Introduction

In eukaryotes, DNA is organized into chains of nucleosomes, each of which consists of about 146 bp of DNA wrapped around an octamer of four types of histones [1]. The packaging of chromatin into nucleosomes provides a repressive environment for many DNA-binding proteins and plays an important role in the regulation of transcription [2]. However, some domains in chromatin are depleted of nucleosomes and exhibit highly accessible structure. These nucleosome-free regions are supersensitive to the cleavage of DNase I [3] and are known as DNase I hypersensitive sites (DHSs). They are predominantly found in many active genes and cis-regulatory elements [4]. The dynamic alterations of “openness” in chromatin play important roles in many biological processes, including transcription [5], replication [2], and differentiation [6].

Traditionally, the experimental technique of choice to discover the DNase I hypersensitive sites is Southern blotting [7]. However, this low-throughput method is not able to study large chromosomal regions at a time and cannot represent the “openness” of chromatin in a quantitative manner. The significance of differential accessibility in DNase I hypersensitive sites is unknown, but it may reflect some important biological phenomena like histone modifications and protein occupation [8]. Even until now genome-wide quantitative analyses of the relationship between chromatin accessibility and chromatin features in DNase I hypersensitive sites are rare. By taking advantage of the abundant datasets of the ENCODE project [9], we analyzed genome-wide localization data of DNase I hypersensitive sites and 33 chromatin features in human embryonic stem cell (H1hesc) cell line. All datasets were generated by recently developed genome-wide high throughput experimental techniques, such as Chip-seq [10, 11] and DNase-seq [12].

It is generally accepted that histone modifications and the binding of transcription factors are two main effectors for the “openness” of chromatin. Previous studies have shown that histone modifications and transcription factors tend to occur near or just in the DNase I hypersensitive sites [8, 13]. Recently, two studies, one in K562 cell line and the other in Drosophila embryonic cells, have demonstrated that transcription factor binding sites and the chromatin accessibility are highly correlated with each other [6, 13]. Although these studies provided important information, so far, quantitative analysis of the combinatorial effects of different chromatin features and the biological significance of differential hypersensitivity is still unclear. In this work, we built support vector regression (SVR) models to directly predict the “openness” of chromatin in DNase I hypersensitive sites using combined chromatin features. Our results indicate that both histone modification features and transcription factor binding features are predictive for chromatin accessibility with high accuracy and these chromatin features are highly redundant.

2. Materials and Methods

2.1. Datasets

All datasets are from ENCODE project, which aims to build a comprehensive list of functional elements in the human genome [9]. The 10 histone modifications (HMs) and binding sites of 23 transcription factors (TFs) were quantified using Chip-seq and downloaded from the tracks of UCSC Genome Browser at ENCODE/Broad Institute and ENCODE/Stanford/Yale/USC/Harvard. The chromatin accessibility dataset was measured using DNase-seq and downloaded from ENCODE/OpenChrom (Duke/UNC/UTA). Each dataset includes the genome-wide sequencing signals and regions of statistically enriched signal (peaks). Peaks can be viewed as locations of chromatin features and DNase I hypersensitive sites, respectively, and the values of DNase-seq signals represent chromatin accessibility. We must note that DNase I hypersensitivity could not be simply viewed as binary property (peaks versus nonpeaks) but rather continuous values (sequencing signals) representing differential chromatin accessibility. These datasets come from the common H1hesc cell line.

2.2. Mapping HM and TF Binding Peaks to the DNase I Hypersensitive Sites

We obtained genomic locations of 33 chromatin feature profiles, all together including 582489 histone modification peaks (10 HMs) and 443217 transcription factor binding peaks (23 TFs). For each profile, we mapped the peaks of feature onto the genome and examined whether it localized in open chromatin or not. The presence or absence of chromatin feature within accessible chromatin was decided by overlap or nonoverlap with DNase-seq peaks. If there was any amount of overlap within accessible chromatin (DNase-seq peaks), we counted as a presence [13]. Then, we calculated the percentage of the peaks occurring in the DNase I hypersensitive sites for each feature.

2.3. Supervised Learning Methods for Chromatin Accessibility Prediction

To investigate the quantitative relationship between chromatin accessibility and these chromatin features in DNase I hypersensitive sites, we constructed support vector regression (SVR) models for HM and TF binding features, respectively. Concretely, in every DNase I hypersensitive site, we calculated the maximum signal of DNase-Seq and the corresponding maximum signal of Chip-Seq for each chromatin feature. For the sake of figuring out whether the maximum signal exhibits largest prediction power or not, as a comparison, we also calculated the average signal of Chip-seq and DNase-seq for each hypersensitive region. Then, SVR model was built to predict the chromatin accessibility using signals of these chromatin features. SVR is a machine learning algorithm based on statistical theory for regression problems [14, 15]. We implemented this algorithm using the “e1071” R package [16].

In order to reduce the computation cost, we randomly selected 5000 DHSs for our samples. The sample size analysis indicated that the prediction power increased only moderately after the size reached 2000. So, the sample size of 5000 is big enough to represent the entire dataset (Supporting Information S1 which is available online at http://dx.doi.org/10.1155/2013/914971). We used the 10-fold cross-validation method to evaluate the prediction power. Specifically, we randomly split our sample dataset into 10 equal size subsets. Among them, 9 subsets were used as training data and the remaining subset was treated as the validation data for testing the model. This process was repeated 10 times and each subset could only be used once as the validation data. After that, we combined the results and plotted the regression relationship between predicted signals and the actual DNase-seq signals. Then, the coefficient of determination (R ²) [17] was computed indicating how well these data points fit the line. R ² is also a frequently used measure of the proportion of total variation of outcomes explained by the model. We chose the square root of the coefficient of determination (R) as our prediction power.

2.4. Analysis of the Importance for Each Chromatin Feature and the Combinatorial Effects of Different Features

To estimate which feature exhibits the maximal prediction power, we predicted the chromatin accessibility using only one feature. And to investigate whether HM features and TF binding features are redundant, we next predicted the “openness” of chromatin using all features. We also explored the combinatorial effects of these features. All possible one-feature (C₃₃ ¹), two-feature (C₃₃ ²), and three-feature (C₃₃ ³) models were evaluated by their performance.

2.5. Model Comparison Analysis

Instead of SVR algorithm, we also explored the quantitative relationship between chromatin features and chromatin accessibility with linear regression model. Similarly, HM features, TF binding features, and HM+TF feature combinations were applied to linear regression model, respectively. The coefficient of determination of the predicted signals and the actual DNase-seq signals were calculated and compared with the SVR models. In order to identify whether the maximum signals or the average signals exhibit largest prediction power, we also applied these models with the average signals of chromatin features to predict the average signals of DNase-seq.

3. Results

3.1. The Localization Preference of Chromatin Features

We analyzed genome-wide localizations of 33 Chip-seq profiles in the human embryonic stem cell line (H1hesc) from ENCODE project [9], including 10 histone modifications, and the binding sites of 23 transcription factors. For each profile, we mapped the peaks of Chip-seq dataset to the DNase I hypersensitive sites (see Section 2). Figure 1 shows the percentage of the peaks within the accessible chromatin for each feature. We observed that different chromatin feature exhibits different preference to chromatin accessibility. For histone modifications, H3k4me3 exerts the largest preference of accessible chromatin. 82.2% H3k4me3 peaks located in DHS. On the contrary, most H3k9me3 occurred out of DHS (93.7%), which indicated that H3k9me3 was associated with heterochromatin [18]. Compared to histone modifications, a majority of transcription factors tend to bind onto accessible chromatin, which suggests that the process of transcription requires an open chromatin structure [19]. The mean percentage of transcription factors locating in DHS is 60.5%, higher than that of histone modifications (45.1%).

The percentages of histone modification (HM) features and transcription factor (TF) binding features within accessible chromatin regions. The black circle and green triangle represent HM features and TF binding features, respectively. The two red lines represent the mean percentages for HMs and TFs, respectively.

3.2. Predicting Chromatin Accessibility Using Histone Modification Features

In order to examine the quantitative relationship between chromatin accessibility and HM features in a combinatorial manner, we constructed SVR model to predict the “openness” of chromatin in DNase I hypersensitive sites using all histone modification features. We can see from Figure 2(a) that there is a linear relationship between predicted signals and the actual DNase-Seq signals. The coefficient of determination (R ²) is 0.58 indicating that histone modification features explain about 58% variance of chromatin accessibility.

Prediction accuracy of chromatin accessibility using HM features. (a) Scatter plot of predicted versus experimentally measured DNase-seq signals using all HM features. The black line represents the linear fit between predicted and measured signals (R ², coefficient of determination). (b) Prediction powers (R, the square root of coefficient of determination) of the SVR models using only one particular HM feature.

We next examined the prediction power for every histone modification feature. Figure 2(b) shows that H3k4me2, H3k4me3, and H3k9ac exhibit the most important effects to chromatin accessibility (R = 0.67, 0.66,0.63, resp.). These histone modifications are generally enriched in the promoters of expressed genes [20] and the open chromatin structure plays an important role in regulating the complex transcription process. On the other hand, H3k9me3 and H3k36me3 exhibit the least prediction powers (R = 0.30,0.23, resp.), which suggests that these modifications are associated with heterochromatin [21, 22]. Interestingly, H3k27ac and H4k20me1, which are the most predictive histone modifications for gene expression levels [23], are not the most important features associated with chromatin accessibility.

3.3. Predicting Chromatin Accessibility Using Transcription Factor Binding Features

Previous studies have shown that transcription factors tend to bind onto open chromatin and they are highly correlated with each other [6, 13]. To investigate the quantitative relationship of the binding of transcription factors and the chromatin accessibility in a combinatorial manner, we next applied our SVR model to all TF binding features. As shown in Figure 3(a), the TF model achieves a coefficient of determination (R ²) of 0.58 which is equal to that achieved by HM model. These TF binding features can also explain about 58% variance of chromatin accessibility.

Prediction accuracy of chromatin accessibility using TF binding features. (a) Scatter plot of predicted versus experimentally measured DNase-seq signals using all TF binding features. The black line represents the linear fit between predicted and measured signals (R ², coefficient of determination). (b) Prediction powers (R, the square root of coefficient of determination) of the SVR models using only one particular TF binding feature.

For the prediction power of particular TF binding feature, there is a difference with that of histone modifications; that is, most transcription factors exhibit important effects to chromatin accessibility (Figure 3(b)). This is consistent with their functions because transcription factors directly control the complex transcription process [24] which requires an open chromatin environment. However, a small group of features exhibit lower prediction powers, such as SUZ12, CTCF, and ZNF274 (R = 0.37,0.36,0.33, resp.). ZNF274 and SUZ12 are known to be transcriptional repressors [25, 26]. CTCF has many roles, such as transcriptional repression, insulator function, and imprinting genetic information [27]. These factors are not so important to contribute to the “openness” of chromatin.

3.4. Chromatin Features Are Highly Redundant to Chromatin Accessibility

The previous analyses suggest that both histone modification features and transcription factor binding features are predictive for chromatin accessibility with high accuracy in DNase I hypersensitive sites. So, there is a question that whether the prediction power will increase if we use all these features. To address this question, we directly predicted the “openness” of chromatin using all features. As shown in Figure 4(a), the coefficient of determination (R ² = 0.66) is a little higher (8%) than using only HM or TF binding features, which indicates that these two types of features are highly redundant. To check the importance of different features and their combinatorial effects, we tried to build models with all possible combinations of one to three features (Figure 4(b)). Focusing on the three-feature combinations (5456 models), we found that the least prediction power combinations (H3k36me3, H3k9me3, and ZNF274, R = 0.45) could achieve about 56% prediction power of the full model (R = 0.81). And there are 110 combinations achieving more than 90% prediction power of the full one. These analyses indicate that most of these features are highly redundant for chromatin accessibility.

Redundancy of HM features and TF binding features. (a) Scatter plot of predicted versus experimentally measured DNase-seq signals using all HM and TF binding features. The black line represents the linear fit between predicted and measured signals (R ², coefficient of determination). (b) Comparison of prediction powers (R, the square root of coefficient of determination) between all possible one-feature, two-feature, three-feature models, and the full model in H1hesc.

By examining the 110 high prediction power combinations, we found that seven chromatin features, H3k4me2, H3k4me3, H3k9ac, H4k20me1, SIN3A, ZNF143, SUZ12, were significantly enriched (P < 0.01, hypergeometric test) in the set of 110 models. Interestingly, all these features showed high prediction powers in the one-feature models except H4k20me1 and SUZ12. H4k20me1 is a particular one, which has been reported for the most predictive histone modification for gene expression [23]. SUZ12 is a part of Polycomb Repressive Complex 2 (PRC2) and may be involved in chromatin silencing with noncoding RNA [25]. The mechanisms of how SUZ12 influences chromatin structure are unknown; however, it may exert distinct impact on chromatin accessibility compared with other features.

3.5. Comparison with Other Models

In this study, we chose the SVR algorithm and the maximum signal in every hypersensitive region to model the relationship between chromatin features and chromatin accessibility. Generally, the SVR algorithm is a nonlinear regression method. We also have explored modeling using linear regression model and the average signal in every region. As shown in Table 1, prediction powers of models using average signal are significantly lower than the corresponding maximum signal models. And in either situation, the SVR models exhibit higher prediction power than linear models. Our results indicate that the “openness” of chromatin is determined by the maximum signal of features and their relationships are assumed as a nonlinear relevance.

Table 1.

Comparison of prediction powers with different models. The prediction power is represented as the square root of the coefficient of determination (R) for predicted and the actual DNase-seq signals. LM: linear regression model.

Model	SVR	LM	SVR	LM
Model	(max_signal)	(max_signal)	(avg_signal)	(avg_signal)
HM	0.76	0.69	0.70	0.56
TF	0.76	0.63	0.69	0.53
HM + TF	0.81	0.73	0.75	0.61

Open in a new tab

4. Discussion

In this work, we presented quantitative analyses of the relationship of histone modifications and the binding of transcription factors to chromatin accessibility separately and combinedly in DNase I hypersensitive sites. We first examined the percentage of feature peaks within DNase hypersensitive sites (DHSs) in human embryonic stem cell (H1hesc) line. We found that different chromatin features showed different location preference in DHS. This may be due to the particular function of different chromatin features. Thurman et al. have done similar analysis in K562 cell line [13] for TF binding features. In our analysis, we find that the percentage of transcription factors within DHS is significantly lower in the H1hesc cell line than that in K562 cell line. The reason may be as follows: in order to maintain the “stemness” state, most genes are repressed in the stem cell compared to the cancer cell line K562. This phenomenon means that the degree to which chromatin features occur in accessible chromatin may differ according to different cellular circumstances.

Our results demonstrate that both histone modification (HM) features and transcription factor (TF) binding features account for nearly 58% variance of chromatin accessibility in H1hesc cell line. For histone hallmarks, many activators of gene expression exhibited important impact on the “openness” of chromatin, such as H3k4me [21] and histone acetylations [28]. The hallmarks of repressors for gene expression such as H3k9me3 [21] show lower prediction powers. Unexpectedly, the transcription elongation hallmark H3k36me3 [29] shows the least prediction power. This is consistent with the viewpoint of a recently published paper. Chantalat et al. [22] argued that H3k36me3 is associated with constitutive and facultative heterochromatin. For TF binding features, the majority of TFs showed an important impact on chromatin accessibility except some transcriptional repressors, such as ZNF274 and SUZ12. This may indicate that the complex transcription process requires open chromatin environment [19].

It is generally accepted that cellular factors regulate the complex dynamic change of chromatin structure in a collective manner. We have shown that these features is highly redundant to predict chromatin accessibility and a small subgroup of features are able to achieve a very high prediction power. However, the mechanism of how these features cooperatively impact the openness of chromatin is still unclear, and we must note that our analysis could not reveal the “cause” or “consequence” relationship of HM and TF binding features to chromatin accessibility. Histone modifications play an important role in creating and maintaining the accessible chromatin environment [30] and may act as docking sites for transcription factors [31]. Some pioneer TFs tend to bind onto the genome and create an accessible site, such as FoxA1 [32] which is the best known pioneer transcription factor. Then, more transcription factors tend to bind onto the opening site and the DNase I hypersensitive site is created. As an extension, future work could explore the mechanisms of how these features cooperatively regulate open chromatin structure and their causal relationships, based on increased datasets.

5. Conclusion

We present genome-wide quantitative analysis of the impact of chromatin features to chromatin accessibility in DNase I hypersensitive sites. Our findings indicate that both histone modifications and the binding of transcription factors could explain nearly 58% variation of the “openness” of chromatin structure. The combinatorial effect analyses reveal that these chromatin features are highly redundant for prediction and H3k4me2, H3k4me3, H3k9ac, SIN3A, and ZNF143 show closest association with chromatin accessibility. Our results provide insights into the systematic effects of chromatin features to differential chromatin accessibility.

Supplementary Material

Online Supporting Information S1. The prediction powers using all features with different sample sizes. In each sample size, we randomly sampled 500 times and modeling according to the procedure discussed in method 2.3 in each sample.

Online Supporting Information S2. The input maximum signal dataset including the maximum signals of Chip-seq and DNase-seq in 5000 DNase I hypersensitive sites.

Online Supporting Information S3. The input average signal dataset including the average signals of Chip-seq and DNase-seq in 5000 DNase I hypersensitive sites.

Online Supporting Information S4. The prediction powers (R) using only one feature.

Online Supporting Information S5. The prediction powers (R) using all possible two-feature combinations.

Online Supporting Information S6. The prediction powers (R) using all possible three-feature combinations.

Online Supporting Information S7. The related R codes are available.

Click here for additional data file.^{(1.9MB, zip)}

Acknowledgments

This work was funded by National Key Basic Research Program (2010CB912702, 2011CB910204), National Hi-Tech Program (2012AA020201), Key Infectious Disease Project (2012ZX10002012-014), and National Natural Science Foundation of China (31070752, 31000582).

Abbreviations

DHS:: DNase I hypersensitive site
HM:: Histone modification
SVR:: Support vector regression.

References

1.Luger K, Mäder AW, Richmond RK, Sargent DF, Richmond TJ. Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature. 1997;389(6648):251–260. doi: 10.1038/38444. [DOI] [PubMed] [Google Scholar]
2.Anderson JD, Widom J. Sequence and position-dependence of the equilibrium accessibility of nucleosomal DNA target sites. Journal of Molecular Biology. 2000;296(4):979–987. doi: 10.1006/jmbi.2000.3531. [DOI] [PubMed] [Google Scholar]
3.Dingwall C, Lomonossoff GP, Laskey RA. High sequence specificity of micrococcal nuclease. Nucleic Acids Research. 1981;9(12):2659–2674. doi: 10.1093/nar/9.12.2659. [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Gross DS, Garrard WT. Nuclease hypersensitive sites in chromatin. Annual Review of Biochemistry. 1988;57:159–197. doi: 10.1146/annurev.bi.57.070188.001111. [DOI] [PubMed] [Google Scholar]
5.Cockerill PN. Structure and function of active chromatin and DNase I hypersensitive sites. FEBS Journal. 2011;278(13):2182–2210. doi: 10.1111/j.1742-4658.2011.08128.x. [DOI] [PubMed] [Google Scholar]
6.Li X-Y, Thomas S, Sabo PJ, Eisen MB, Stamatoyannopoulos JA, Biggin MD. The role of chromatin accessibility in directing the widespread, overlapping patterns of Drosophila transcription factor binding. Genome Biology. 2011;12(article R34) doi: 10.1186/gb-2011-12-4-r34. [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Lu Q, Bruce R. Epigenetics Protocols. Humana Press; 2004. DNaseI hypersensitivity analysis of chromatin structure; pp. 77–86. [DOI] [PubMed] [Google Scholar]
8.Boyle AP, Davis S, Shulha HP, et al. High-resolution mapping and characterization of open chromatin across the genome. Cell. 2008;132(2):311–322. doi: 10.1016/j.cell.2007.12.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Dunham I, Kundaje A, Aldred SF, et al. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247. [DOI] [PMC free article] [PubMed] [Google Scholar]
10.Park PJ. ChIP-seq: advantages and challenges of a maturing technology. Nature Reviews Genetics. 2009;10(10):669–680. doi: 10.1038/nrg2641. [DOI] [PMC free article] [PubMed] [Google Scholar]
11.Mardis ER. ChIP-seq: welcome to the new frontier. Nature Methods. 2007;4(8):613–614. doi: 10.1038/nmeth0807-613. [DOI] [PubMed] [Google Scholar]
12.Song L, Crawford GE. DNase-seq: a high-resolution technique for mapping active gene regulatory elements across the genome from mammalian cells. Cold Spring Harbor Protocols. 2010 doi: 10.1101/pdb.prot5384. [DOI] [PMC free article] [PubMed] [Google Scholar]
13.Thurman RE, Rynes E, Humbert R, et al. The accessible chromatin landscape of the human genome. Nature. 2012;489:75–82. doi: 10.1038/nature11232. [DOI] [PMC free article] [PubMed] [Google Scholar]
14.Nello C, Shawe-Taylor J. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press; 2000. [Google Scholar]
15.Cheng C, Yan K-K, Yip KY, et al. A statistical framework for modeling gene expression using chromatin features and application to modENCODE datasets. Genome Biology. 2011;12(2, article R15) doi: 10.1186/gb-2011-12-2-r15. [DOI] [PMC free article] [PubMed] [Google Scholar]
16.Evgenia D, Hornik K, Leisch F, Meyer D, Weingessel A. Misc functions of the Department of Statistics (e1071), TU Wien. R package pp. 1–5, 2008.
17.Everitt BS. Cambridge Dictionary of Statistics. 2nd edition. New York, NY, USA: Cambridge University Press; 2002. [Google Scholar]
18.Bartkova J, Moudry P, Hodny Z, Lukas J, Meyts RD, Bartek J. Heterochromatin marks HP1γ, HP1α and H3K9me3, and DNA damage response activation in human testis development and germ cell tumours. International Journal of Andrology. 2011;34(4):e103–e113. doi: 10.1111/j.1365-2605.2010.01096.x. [DOI] [PubMed] [Google Scholar]
19.Sproul D, Gilbert N, Bickmore WA. The role of chromatin structure in regulating the expression of clustered genes. Nature Reviews Genetics. 2005;6(10):775–781. doi: 10.1038/nrg1688. [DOI] [PubMed] [Google Scholar]
20.Regha K, Sloane MA, Huang R, et al. Active and repressive chromatin are interspersed without spreading in an imprinted gene cluster in the mammalian genome. Molecular Cell. 2007;27(3):353–366. doi: 10.1016/j.molcel.2007.06.024. [DOI] [PMC free article] [PubMed] [Google Scholar]
21.Margueron R, Reinberg D. Chromatin structure and the inheritance of epigenetic information. Nature Reviews Genetics. 2010;11(4):285–296. doi: 10.1038/nrg2752. [DOI] [PMC free article] [PubMed] [Google Scholar]
22.Chantalat S, Depaux A, Héry P, et al. Histone H3 trimethylation at lysine 36 is associated with constitutive and facultative heterochromatin. Genome Research. 2011;21(9):1426–1437. doi: 10.1101/gr.118091.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
23.Karlić R, Chung H-R, Lasserre J, Vlahoviček K, Vingron M. Histone modification levels are predictive for gene expression. Proceedings of the National Academy of Sciences of the United States of America. 2010;107(7):2926–2931. doi: 10.1073/pnas.0909344107. [DOI] [PMC free article] [PubMed] [Google Scholar]
24.Gill G. Regulation of the initiation of eukaryotic transcription. Essays in Biochemistry. 2001;37:33–43. doi: 10.1042/bse0370033. [DOI] [PubMed] [Google Scholar]
25.Rinn JL, Kertesz M, Wang JK, et al. Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell. 2007;129(7):1311–1323. doi: 10.1016/j.cell.2007.05.022. [DOI] [PMC free article] [PubMed] [Google Scholar]
26.Yano K, Ueki N, Oda T, Seki N, Masuho Y, Muramatsu M-A. Identification and characterization of human ZNF274 cDNA, which encodes a novel Kruppel-type zinc-finger protein having nucleolar targeting ability. Genomics. 2000;65(1):75–80. doi: 10.1006/geno.2000.6140. [DOI] [PubMed] [Google Scholar]
27.Dunn KL, Davie JR. The many roles of the transcriptional regulator CTCF. Biochemistry and Cell Biology. 2003;81(3):161–167. doi: 10.1139/o03-052. [DOI] [PubMed] [Google Scholar]
28.Turner BM. Histone acetylation and an epigenetic code. BioEssays. 2000;22(9):836–845. doi: 10.1002/1521-1878(200009)22:9<836::AID-BIES9>3.0.CO;2-X. [DOI] [PubMed] [Google Scholar]
29.Fuchs SM, Laribee RN, Strahl BD. Protein modifications in transcription elongation. Biochimica et Biophysica Acta. 2009;1789(1):26–36. doi: 10.1016/j.bbagrm.2008.07.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
30.Marx J. Molecular biology. Protein tail modification opens way for gene activity. Science. 2006;311(5762):p. 757. doi: 10.1126/science.311.5762.757a. [DOI] [PubMed] [Google Scholar]
31.Bell O, Tiwari VK, Thomä NH, Schübeler D. Determinants and dynamics of genome accessibility. Nature Reviews. 2011;12:554–564. doi: 10.1038/nrg3017. [DOI] [PubMed] [Google Scholar]
32.Cirillo LA, Lin FR, Cuesta I, Friedman D, Jarnik M, Zaret KS. Opening of compacted chromatin by early developmental transcription factors HNF3 (FoxA) and GATA-4. Molecular Cell. 2002;9(2):279–289. doi: 10.1016/s1097-2765(02)00459-8. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Online Supporting Information S2. The input maximum signal dataset including the maximum signals of Chip-seq and DNase-seq in 5000 DNase I hypersensitive sites.

Online Supporting Information S3. The input average signal dataset including the average signals of Chip-seq and DNase-seq in 5000 DNase I hypersensitive sites.

Online Supporting Information S4. The prediction powers (R) using only one feature.

Online Supporting Information S5. The prediction powers (R) using all possible two-feature combinations.

Online Supporting Information S6. The prediction powers (R) using all possible three-feature combinations.

Online Supporting Information S7. The related R codes are available.

Click here for additional data file.^{(1.9MB, zip)}

[B1] 1.Luger K, Mäder AW, Richmond RK, Sargent DF, Richmond TJ. Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature. 1997;389(6648):251–260. doi: 10.1038/38444. [DOI] [PubMed] [Google Scholar]

[B2] 2.Anderson JD, Widom J. Sequence and position-dependence of the equilibrium accessibility of nucleosomal DNA target sites. Journal of Molecular Biology. 2000;296(4):979–987. doi: 10.1006/jmbi.2000.3531. [DOI] [PubMed] [Google Scholar]

[B3] 3.Dingwall C, Lomonossoff GP, Laskey RA. High sequence specificity of micrococcal nuclease. Nucleic Acids Research. 1981;9(12):2659–2674. doi: 10.1093/nar/9.12.2659. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B4] 4.Gross DS, Garrard WT. Nuclease hypersensitive sites in chromatin. Annual Review of Biochemistry. 1988;57:159–197. doi: 10.1146/annurev.bi.57.070188.001111. [DOI] [PubMed] [Google Scholar]

[B5] 5.Cockerill PN. Structure and function of active chromatin and DNase I hypersensitive sites. FEBS Journal. 2011;278(13):2182–2210. doi: 10.1111/j.1742-4658.2011.08128.x. [DOI] [PubMed] [Google Scholar]

[B6] 6.Li X-Y, Thomas S, Sabo PJ, Eisen MB, Stamatoyannopoulos JA, Biggin MD. The role of chromatin accessibility in directing the widespread, overlapping patterns of Drosophila transcription factor binding. Genome Biology. 2011;12(article R34) doi: 10.1186/gb-2011-12-4-r34. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B7] 7.Lu Q, Bruce R. Epigenetics Protocols. Humana Press; 2004. DNaseI hypersensitivity analysis of chromatin structure; pp. 77–86. [DOI] [PubMed] [Google Scholar]

[B8] 8.Boyle AP, Davis S, Shulha HP, et al. High-resolution mapping and characterization of open chromatin across the genome. Cell. 2008;132(2):311–322. doi: 10.1016/j.cell.2007.12.014. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B9] 9.Dunham I, Kundaje A, Aldred SF, et al. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B10] 10.Park PJ. ChIP-seq: advantages and challenges of a maturing technology. Nature Reviews Genetics. 2009;10(10):669–680. doi: 10.1038/nrg2641. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B11] 11.Mardis ER. ChIP-seq: welcome to the new frontier. Nature Methods. 2007;4(8):613–614. doi: 10.1038/nmeth0807-613. [DOI] [PubMed] [Google Scholar]

[B12] 12.Song L, Crawford GE. DNase-seq: a high-resolution technique for mapping active gene regulatory elements across the genome from mammalian cells. Cold Spring Harbor Protocols. 2010 doi: 10.1101/pdb.prot5384. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B13] 13.Thurman RE, Rynes E, Humbert R, et al. The accessible chromatin landscape of the human genome. Nature. 2012;489:75–82. doi: 10.1038/nature11232. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B14] 14.Nello C, Shawe-Taylor J. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press; 2000. [Google Scholar]

[B15] 15.Cheng C, Yan K-K, Yip KY, et al. A statistical framework for modeling gene expression using chromatin features and application to modENCODE datasets. Genome Biology. 2011;12(2, article R15) doi: 10.1186/gb-2011-12-2-r15. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B16] 16.Evgenia D, Hornik K, Leisch F, Meyer D, Weingessel A. Misc functions of the Department of Statistics (e1071), TU Wien. R package pp. 1–5, 2008.

[B17] 17.Everitt BS. Cambridge Dictionary of Statistics. 2nd edition. New York, NY, USA: Cambridge University Press; 2002. [Google Scholar]

[B18] 18.Bartkova J, Moudry P, Hodny Z, Lukas J, Meyts RD, Bartek J. Heterochromatin marks HP1γ, HP1α and H3K9me3, and DNA damage response activation in human testis development and germ cell tumours. International Journal of Andrology. 2011;34(4):e103–e113. doi: 10.1111/j.1365-2605.2010.01096.x. [DOI] [PubMed] [Google Scholar]

[B19] 19.Sproul D, Gilbert N, Bickmore WA. The role of chromatin structure in regulating the expression of clustered genes. Nature Reviews Genetics. 2005;6(10):775–781. doi: 10.1038/nrg1688. [DOI] [PubMed] [Google Scholar]

[B20] 20.Regha K, Sloane MA, Huang R, et al. Active and repressive chromatin are interspersed without spreading in an imprinted gene cluster in the mammalian genome. Molecular Cell. 2007;27(3):353–366. doi: 10.1016/j.molcel.2007.06.024. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B21] 21.Margueron R, Reinberg D. Chromatin structure and the inheritance of epigenetic information. Nature Reviews Genetics. 2010;11(4):285–296. doi: 10.1038/nrg2752. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B22] 22.Chantalat S, Depaux A, Héry P, et al. Histone H3 trimethylation at lysine 36 is associated with constitutive and facultative heterochromatin. Genome Research. 2011;21(9):1426–1437. doi: 10.1101/gr.118091.110. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B23] 23.Karlić R, Chung H-R, Lasserre J, Vlahoviček K, Vingron M. Histone modification levels are predictive for gene expression. Proceedings of the National Academy of Sciences of the United States of America. 2010;107(7):2926–2931. doi: 10.1073/pnas.0909344107. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B24] 24.Gill G. Regulation of the initiation of eukaryotic transcription. Essays in Biochemistry. 2001;37:33–43. doi: 10.1042/bse0370033. [DOI] [PubMed] [Google Scholar]

[B25] 25.Rinn JL, Kertesz M, Wang JK, et al. Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell. 2007;129(7):1311–1323. doi: 10.1016/j.cell.2007.05.022. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B26] 26.Yano K, Ueki N, Oda T, Seki N, Masuho Y, Muramatsu M-A. Identification and characterization of human ZNF274 cDNA, which encodes a novel Kruppel-type zinc-finger protein having nucleolar targeting ability. Genomics. 2000;65(1):75–80. doi: 10.1006/geno.2000.6140. [DOI] [PubMed] [Google Scholar]

[B27] 27.Dunn KL, Davie JR. The many roles of the transcriptional regulator CTCF. Biochemistry and Cell Biology. 2003;81(3):161–167. doi: 10.1139/o03-052. [DOI] [PubMed] [Google Scholar]

[B28] 28.Turner BM. Histone acetylation and an epigenetic code. BioEssays. 2000;22(9):836–845. doi: 10.1002/1521-1878(200009)22:9<836::AID-BIES9>3.0.CO;2-X. [DOI] [PubMed] [Google Scholar]

[B29] 29.Fuchs SM, Laribee RN, Strahl BD. Protein modifications in transcription elongation. Biochimica et Biophysica Acta. 2009;1789(1):26–36. doi: 10.1016/j.bbagrm.2008.07.008. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B30] 30.Marx J. Molecular biology. Protein tail modification opens way for gene activity. Science. 2006;311(5762):p. 757. doi: 10.1126/science.311.5762.757a. [DOI] [PubMed] [Google Scholar]

[B31] 31.Bell O, Tiwari VK, Thomä NH, Schübeler D. Determinants and dynamics of genome accessibility. Nature Reviews. 2011;12:554–564. doi: 10.1038/nrg3017. [DOI] [PubMed] [Google Scholar]

[B32] 32.Cirillo LA, Lin FR, Cuesta I, Friedman D, Jarnik M, Zaret KS. Opening of compacted chromatin by early developmental transcription factors HNF3 (FoxA) and GATA-4. Molecular Cell. 2002;9(2):279–289. doi: 10.1016/s1097-2765(02)00459-8. [DOI] [PubMed] [Google Scholar]

PERMALINK

A Quantitative Analysis of the Impact on Chromatin Accessibility by Histone Modifications and Binding of Transcription Factors in DNase I Hypersensitive Sites

Peng Cui

Jing Li

Bo Sun

Menghuan Zhang

Baofeng Lian

Yixue Li

Lu Xie

Abstract

1. Introduction

2. Materials and Methods

2.1. Datasets

2.2. Mapping HM and TF Binding Peaks to the DNase I Hypersensitive Sites

2.3. Supervised Learning Methods for Chromatin Accessibility Prediction

2.4. Analysis of the Importance for Each Chromatin Feature and the Combinatorial Effects of Different Features

2.5. Model Comparison Analysis

3. Results

3.1. The Localization Preference of Chromatin Features

Figure 1.

3.2. Predicting Chromatin Accessibility Using Histone Modification Features

Figure 2.

3.3. Predicting Chromatin Accessibility Using Transcription Factor Binding Features

Figure 3.

3.4. Chromatin Features Are Highly Redundant to Chromatin Accessibility

Figure 4.

3.5. Comparison with Other Models

Table 1.

4. Discussion

5. Conclusion

Supplementary Material

Acknowledgments

Abbreviations

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases