Significance
Chromatin consists of DNA and hundreds of proteins that interact with the genetic material. In vivo, chromatin folds into nonrandom structures. The physical mechanism leading to these characteristic conformations, however, remains poorly understood. Here, we introduce a model that generates chromosome conformations by using the idea that chromatin can be subdivided into types based on its biochemical interactions. Chromatin types, which are distinct from DNA sequence, are partially epigenetically controlled and change during cell differentiation, thus constituting a link between epigenetics, chromosomal organization, and cell development. The degree of accuracy achieved by this model supports the viability of the proposed physical mechanism of chromatin folding and makes the computational model a powerful tool for future investigations.
Keywords: human genome, genome architecture, maximum entropy, molecular dynamics, Hi-C
Abstract
In vivo, the human genome folds into a characteristic ensemble of 3D structures. The mechanism driving the folding process remains unknown. We report a theoretical model for chromatin (Minimal Chromatin Model) that explains the folding of interphase chromosomes and generates chromosome conformations consistent with experimental data. The energy landscape of the model was derived by using the maximum entropy principle and relies on two experimentally derived inputs: a classification of loci into chromatin types and a catalog of the positions of chromatin loops. First, we trained our energy function using the Hi-C contact map of chromosome 10 from human GM12878 lymphoblastoid cells. Then, we used the model to perform molecular dynamics simulations producing an ensemble of 3D structures for all GM12878 autosomes. Finally, we used these 3D structures to generate contact maps. We found that simulated contact maps closely agree with experimental results for all GM12878 autosomes. The ensemble of structures resulting from these simulations exhibited unknotted chromosomes, phase separation of chromatin types, and a tendency for open chromatin to lie at the periphery of chromosome territories.
Chromatin is a highly flexible polymer of nucleosomes (DNA wrapped around histone proteins) connected to one another by linker regions of 20–50 bp. Hundreds of associated structural and regulatory proteins interact with the genetic material, coordinating the way chromatin folds to fit inside the nucleus of eukaryotic cells.
The resulting ensemble of partially organized structures brings sections of DNA separated by a great genomic distance into close spatial proximity, and plays an important role in controlling gene transcription (1, 2). Although some of the features of this ensemble can be explained using simple polymer physics (3–6), there is now ample evidence that specific biochemical interactions play a crucial role (7–10). Understanding the interplay between biochemistry, genome architecture, and transcriptional regulation is a major outstanding challenge.
For over two decades, molecular biology techniques that combine chromatin fragmentation and proximity ligation have given us quantitative information about how chromatin is organized in vivo (5, 11–13). In recent years, Hi-C experiments have made it possible to measure the frequency of contact between all pairs of genomic loci using a single experiment.
Here, we explore a physical model by which local interactions between genomic loci can lead to the conformations of human chromosomes in interphase. Specifically, we propose a theoretical energy landscape model for chromatin folding, designated the Minimal Chromatin Model (MiChroM), which uses the maximum entropy principle (14, 15) in combination with a minimal number of assumptions to model the structural consequences of the aforementioned biochemical interactions. Importantly, MiChroM can be used to model biochemical interactions even though the identity of the interacting biomolecules is unknown. MiChroM suggests a mechanism that is sufficient to explain chromatin organization and can be used to generate ensembles of 3D structures describing whole genomes. As we will show, contact maps generated in silico from these ensembles of structures reproduce in detail the maps from Hi-C.
The first assumption made in MiChroM is that the genome is partitioned into intervals of a handful of types, such that each type of interval is marked by characteristic histone modifications and interacts with a characteristic combination of nuclear proteins. As a result, when two segments of chromatin come into contact, the effective free energy change due to this contact depends, to first order, on the chromatin type of each segment [see also Jost et al. (16)].
This assumption is supported by both biochemical and structural data. For instance, five distinct types of chromatin have been found in Drosophila cells based on the binding patterns of nuclear proteins (17). Further, analysis of original Hi-C maps (5) suggested that human chromatin is partitioned into two compartments, A and B, each associated with distinct long-range contact patterns. More recently, Rao et al. (9) used kilobase-resolution Hi-C experiments to show that the human genome can be further partitioned into six subcompartments (A1 and A2 and B1, B2, B3, and B4), each correlated with particular histone marks and associated with a particular pattern of long-range contacts. A similar partitioning of the genome was observed also in the mouse (9, 18) and Drosophila (19, 20). Both the boundaries of these genomic intervals and their chromatin types may change along with changes in cell state (9). The close association between interval types and long-range contact patterns suggests that intervals of the same type segregate together in the nucleus.
The second assumption made in MiChroM is that certain pairs of genomic “anchor” loci tend to form loops. This tendency is encoded in the model as a change in the effective free energy of a chromatin configuration when the two anchor loci are in contact. This assumption is well-supported by historical literature (8), and has been further confirmed by recent high-resolution Hi-C maps of the human genome, where loops are visible as peaks in the contact probability map (9). Most loops are associated with convergent pairs of CCCTC-binding factor (CTCF)–binding motifs, which have been proposed to help orchestrate loop formation via extrusion (21). MiChroM, however, makes no assumption about the particular mechanism of loop formation. Loops associated with the presence of CTCFs typically enclose a few hundred kilobases of DNA, and there is evidence that such structures are involved in diverse regulatory functions, including activation, repression, and insulation (8).
Finally, MiChroM assumes that every time a pair of loci comes into contact, there is a gain/loss of effective free energy, $\gamma(d)$, that depends only on the genomic distance, $d$, between the two loci. This “ideal chromosome” term models the local structure of chromatin in the absence of compartmentalization or looping (15), and is translationally invariant along the sequence by construction. The form of the ideal chromosome potential is supported by the widespread evidence that chromatin can behave like a liquid crystal (22–24), and is consistent with the popular notion of the existence of a higher order fiber in chromatin (25–27), although remaining more general.
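To build a physical model for chromatin, we use the maximum entropy principle to convert the above three assumptions into an information theoretical energy function. The effective energy that maximizes the information theoretical entropy takes the following form (SI Appendix):

$$
U_{\mathrm{MiChroM}}(\vec{r}) = U_{\mathrm{HP}}(\vec{r}) + \sum_{\substack{k \ge l \\ k,l \in \mathrm{Types}}} \alpha_{kl} \sum_{\substack{i \in \{\text{loci of type } k\} \\ j \in \{\text{loci of type } l\}}} f(r_{ij}) + \chi \sum_{(i,j) \in \{\text{loop sites}\}} f(r_{ij}) + \sum_{d} \gamma(d) \sum_{i} f(r_{i,i+d})
$$

where $f(r_{ij})$ is a contact function of the spatial distance $r_{ij}$ between loci $i$ and $j$. The energy includes, respectively, the potential energy, $U_{\mathrm{HP}}$, characterizing a generic homopolymer; the interactions between chromatin types (assumption 1); the interactions between loop anchors (assumption 2); and the translationally invariant compaction term (assumption 3).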
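A minimal Python sketch of how the three contact-dependent contributions (type-type, loop, and ideal chromosome) could be evaluated for a single conformation is given here for illustration. It is not the authors' implementation: the homopolymer term is left out, the sigmoidal contact function `contact_f` with its steepness `mu` and cutoff `rc` is an assumed form (the form actually used is specified in SI Appendix), and the names `alpha`, `chi`, and `gamma`, together with all numerical values, are placeholders.

```python
import math
from itertools import combinations


def contact_f(r, mu=3.0, rc=1.5):
    """Smooth contact indicator: near 1 for r << rc, near 0 for r >> rc.

    The tanh form and the default mu and rc are assumptions for illustration.
    """
    return 0.5 * (1.0 + math.tanh(mu * (rc - r)))


def michrom_like_energy(coords, types, loops, alpha, chi, gamma):
    """Return (type-type, loop, ideal-chromosome) energies for one structure.

    coords: list of (x, y, z) positions, one per monomer
    types:  list of chromatin-type labels, one per monomer
    loops:  set of (i, j) anchor pairs with i < j
    alpha:  dict mapping frozenset of one or two type labels -> energy per contact
    chi:    energy per loop-anchor contact
    gamma:  function of genomic separation d -> energy per contact
    The homopolymer term is deliberately omitted from this sketch.
    """
    e_types = e_loops = e_ideal = 0.0
    for i, j in combinations(range(len(coords)), 2):
        f = contact_f(math.dist(coords[i], coords[j]))
        e_types += alpha[frozenset((types[i], types[j]))] * f
        e_ideal += gamma(j - i) * f
        if (i, j) in loops:
            e_loops += chi * f
    return e_types, e_loops, e_ideal


# Toy example: two monomers of type "A" placed exactly at the cutoff distance.
terms = michrom_like_energy(
    coords=[(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    types=["A", "A"],
    loops={(0, 1)},
    alpha={frozenset({"A"}): -2.0},
    chi=-1.0,
    gamma=lambda d: 0.5,
)
print(terms)
```

In this toy call the contact function equals 0.5, so each term is half of its per-contact energy: the three values are -1.0, -0.5, and 0.25.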
This potential function contains 27 parameters that must be provided to specify the model fully. Once the potential function is fully specified, it is possible to perform molecular dynamics simulations of chromatin using as input the classification of loci into chromatin types and the location of loops. This procedure is directly analogous to the simulation of protein folding using amino acid sequence and disulfide bond positions as the only input.
Determining the optimal value for these 27 parameters requires a training dataset. In this case, we iteratively adjusted the parameter set to reproduce data extracted from a Hi-C contact map of chromosome 10 generated using GM12878 cells (9). To do so, we modeled human chromosome 10, which is 136 Mbp long, as a polymer containing 2,712 monomers, each representing 50 kb of DNA. We used the annotations generated by Rao et al. (9) to assign each monomer a chromatin type, as well as to specify the positions of loops between pairs of monomers. In each iteration, we combined these polymer specifications with the current parameter set to generate an ensemble of structures. We then used this ensemble to generate a simulated map of pairwise intermonomer contact frequencies, and compared this contact map with the one obtained by Rao et al. (9) experimentally to choose the next set of parameters (SI Appendix).
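The bookkeeping behind one training iteration can be sketched as follows. This is a simplified illustration, not the procedure of SI Appendix: the hard distance `cutoff`, the learning `rate`, and the first-order update rule in `update_parameter` are assumptions introduced here, and all function names are hypothetical.

```python
import math

BIN_SIZE_BP = 50_000  # one monomer per 50 kb, as stated in the text


def n_monomers(chrom_length_bp, bin_size=BIN_SIZE_BP):
    """Monomers needed to cover a chromosome; the last one may be partly filled."""
    return -(-chrom_length_bp // bin_size)  # integer ceiling division


def contact_map(ensemble, cutoff=1.5):
    """Fraction of structures in which monomers i and j lie within `cutoff`.

    ensemble: list of structures, each a list of (x, y, z) monomer positions.
    A hard cutoff is used here for simplicity (an assumption).
    """
    n = len(ensemble[0])
    weight = 1.0 / len(ensemble)
    freq = [[0.0] * n for _ in range(n)]
    for coords in ensemble:
        for i in range(n):
            for j in range(i + 1, n):
                if math.dist(coords[i], coords[j]) < cutoff:
                    freq[i][j] += weight
                    freq[j][i] = freq[i][j]
    return freq


def update_parameter(value, simulated, experimental, rate=0.1):
    """First-order step: make a contact energy less favorable when the
    simulated contact frequency exceeds the experimental one, and vice versa."""
    return value + rate * (simulated - experimental)


# Toy ensemble of two three-monomer structures.
ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
]
simulated = contact_map(ensemble)
print(simulated[0][1], update_parameter(-0.2, simulated[0][1], 0.4))
```

With 50-kb monomers, `n_monomers(135_600_000)` gives 2,712, which is the bead count quoted above for chromosome 10.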
The simulated contact maps obtained using the final set of parameters correspond closely to the experimental contact maps obtained for chromosome 10 (Pearson’s r = 0.95). This correspondence goes beyond the visually obvious “checkerboard” pattern in the simulated contact map (Fig. 1). In general, all features larger than 300–400 kb in the experimental contact map (i.e., features that are about an order of magnitude larger than the size of an individual monomer in our simulations) appear to be accurately recapitulated by the MiChroM model. Notably, the power law scaling relationship between the probability of forming contacts and genomic distance, often used to justify the nonequilibrium fractal globule model, is also reproduced with great accuracy by this equilibrium model (Fig. 1E).
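The two comparisons mentioned here, Pearson's r between maps and the scaling of contact probability with genomic distance, can be illustrated with a short Python sketch. Which matrix entries enter the correlation is not stated in the text, so the choice of the upper triangle without the diagonal is an assumption, as are the function names and the toy map.

```python
import math


def pearson_r(map_a, map_b):
    """Pearson correlation over the entries with i < j of two square maps."""
    n = len(map_a)
    xs = [map_a[i][j] for i in range(n) for j in range(i + 1, n)]
    ys = [map_b[i][j] for i in range(n) for j in range(i + 1, n)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def contact_probability_vs_distance(cmap):
    """Mean contact frequency P(d) along each diagonal d = |i - j| >= 1."""
    n = len(cmap)
    return {
        d: sum(cmap[i][i + d] for i in range(n - d)) / (n - d)
        for d in range(1, n)
    }


def power_law_exponent(p_of_d):
    """Least-squares slope of log P(d) against log d (zero entries skipped)."""
    pts = [(math.log(d), math.log(p)) for d, p in p_of_d.items() if p > 0]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    return sxy / sxx


# Toy map whose contact frequency decays as 1/d by construction.
n = 8
toy = [[0.0 if i == j else 1.0 / abs(i - j) for j in range(n)] for i in range(n)]
p_of_d = contact_probability_vs_distance(toy)
print(power_law_exponent(p_of_d), pearson_r(toy, toy))
```

For the toy map the fitted exponent is -1 and the self-correlation is 1, up to floating-point rounding. Applied to real data, both functions would be called on the simulated and experimental maps binned at the same resolution.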
Next, we applied the MiChroM model to the remaining GM12878 autosomes by combining the potential function with the experimentally derived monomer type and loop annotations. When each chromosome is simulated separately, the resulting intrachromosomal contact map closely corresponds to the experimental contact map in every case. Notably, the correspondence for autosomes that were not used to train the potential function was typically as close (Pearson’s r = 0.95) as the correspondence for chromosome 10 (Fig. 2 and SI Appendix, Supplementary Text and Figs. S2–S47).
When we examined the ensemble of 3D structures for each individual chromosome, we observed that each chromosome formed a compact chromosome territory. We also observed the phase separation of chromatin types within this territory, leading to subvolumes comprising only a single type of genomic interval (Fig. 3A). Usually, only a single subvolume formed for each subcompartment, although we observed multiple subvolumes of a single type in some cases. Similarly, we observed that highly expressed genes [as measured by RNA sequencing (28)] tend to colocalize within spatial subvolumes, which is expected, given that highly expressed genes lie predominantly in the A compartment. Overall, these findings are consistent with the notion that intervals of different types segregate into distinct spatial compartments. Interestingly, the A compartment tends to be less densely packed and to lie at the periphery of the chromosome territory. These observations are consistent with the findings of prior studies using both microscopy and Hi-C (9, 29, 30). Notably, a control model composed of a simple self-avoiding homopolymer chain failed to exhibit any of these results and, instead, recapitulated the expected properties for an equilibrium globule (Fig. 3 A and B and SI Appendix, Fig. S3).
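One simple way to quantify whether a chromatin type tends to lie at the periphery of a territory is to compare mean distances from the territory centroid, as in the Python sketch below. This measure is an illustrative choice made here, not necessarily the analysis behind Fig. 3, and the function name and toy coordinates are hypothetical.

```python
import math
from collections import defaultdict


def mean_radius_by_type(coords, types):
    """Mean distance from the structure centroid for each chromatin type.

    coords: list of (x, y, z) monomer positions for one structure
    types:  list of type labels, one per monomer
    """
    n = len(coords)
    centroid = tuple(sum(c[k] for c in coords) / n for k in range(3))
    radii = defaultdict(list)
    for xyz, label in zip(coords, types):
        radii[label].append(math.dist(xyz, centroid))
    return {label: sum(r) / len(r) for label, r in radii.items()}


# Toy structure: "B" monomers near the center, "A" monomers further out.
coords = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 3.0, 0.0), (0.0, -3.0, 0.0)]
types = ["B", "B", "A", "A"]
radial = mean_radius_by_type(coords, types)
print(radial)
```

Averaging this quantity over every structure in an ensemble would give a per-type radial profile; a larger mean radius for A-type monomers would correspond to the peripheral positioning described above.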
It is commonly assumed that one essential feature of chromosomes is the absence of knots, because one might suppose that a highly knotted structure could create obstacles to the transcription process. We studied the extent of knotting in the ensemble of chromosome structures sampled from the optimized energy landscape and from the homopolymer potential. To quantify knotting in a particular conformation of the chromosome, we used two different knot invariants: the Alexander polynomial and the minimal rope length required to generate a topologically equivalent knot (15, 31). Both measures show that the configurations produced by MiChroM are largely devoid of knots. In contrast, the homopolymer control system tended to form extraordinarily complex knots (Fig. 3C). This topological feature is a direct result of inferring the energy landscape from the three physical assumptions explained above. Remarkably, the simple equilibrium mechanism underlying MiChroM produces ensembles of structures that are devoid of knots.
Finally, we used MiChroM to simulate chromosomes 17 and 18 jointly (SI Appendix, Fig. S1). This simulation allowed us to explore whether the MiChroM potential function, which was trained using a single intrachromosomal contact map for chromosome 10, could successfully reproduce genome architecture at a larger scale. The resulting intrachromosomal contact maps are essentially the same as those obtained when each chromosome is simulated in isolation (Pearson’s r = 0.96). The phenomenon of phase separation of chromatin types now extends to both chromosomes, creating larger regions of space occupied by a single type. Spatial confinement introduces artifacts in the frequency of interchromosomal contacts; therefore, the interchromosomal contact map from simulation shows somewhat increased probabilities with respect to Hi-C. Even with the biased intensity, the two-chromosome map shows a correct pattern of interchromosomal interactions.
When we examined the 3D ensemble, we found that, despite the extensive contacts between the chromosomes, the chromosomes were not entangled with one another (SI Appendix, Fig. S1B); instead, we observed the formation of nonoverlapping chromosome territories. This last result highlights the fact that MiChroM can successfully recapitulate features of the nucleus as a whole.
MiChroM assumes that chromosomes fold under the action of a cloud of proteins that bind with different selectivity to different sections of chromatin, and offers a simple strategy for recapitulating the energy landscape created by such interactions. This energy landscape brings about transient contacts rather than permanent ones, which is consistent with the fact that most of the experimentally observed contacts between two genetic loci only occur in a small fraction of cells at a given time (5, 32). Contacts associated with loop formation tend to be more frequent; accordingly, our optimization algorithm assigns them a larger free energy gain upon formation. In humans, we find that six types of chromatin are sufficient to reproduce the arrangement of interphase DNA in vivo. The fact that our model can be reliably transferred from one chromosome to the rest suggests the plausibility of the proposed energetic mechanism, even if the underlying biochemical details remain unclear at the present time.
As shown, MiChroM is able to explain and reproduce the results of DNA proximity ligation experiments. Nevertheless, caution must be applied in the interpretation of these results. Hi-C experiments are performed using millions of cells at once, and report only a population average. We know little about what happens in individual cells at specific moments in time. For instance, a typical cell population interrogated by Hi-C may contain entirely separate subpopulations, as well as fluctuating or even oscillating configurations. These subpopulations and configurations would be lost in MiChroM.
The classification of loci into chromatin types and the position of chromatin loops, which are inputs of our model, are strongly associated with epigenetic features (histone modifications and bound CTCF motifs in convergent orientation) that can be directly and inexpensively assayed by ChIP sequencing. Exploiting these associations along with MiChroM opens up the possibility of predicting in silico the 3D structure of whole genomes starting from 1D genomics data, which are often already publicly available.
Acknowledgments
We thank Ryan R. Cheng, Davit Potoyan, and Lena Simine for many useful discussions and Erica J. Di Pierro for help in editing the manuscript. This work was supported by the Center for Theoretical Biological Physics sponsored by the National Science Foundation (Grants PHY-1427654 and NSF-CHE-1614101) and by the Cancer Prevention and Research Institute of Texas (Grant R1110). Additional support to P.G.W. was provided by the D. R. Bullard-Welch Chair at Rice University (Grant C-0016). M.D.P. was also supported by the Welch Foundation (Grant C-1792). E.L.A. was also supported by an NIH New Innovator Award (1DP2OD008540-01), an NIH 4D Nucleome Grant (U01HL130010), the NHGRI Center for Excellence for Genomic Sciences (Grant HG006193), the Welch Foundation (Grant Q-1866), an NVIDIA Research Center Award, an IBM University Challenge Award, a Google Research Award, a Cancer Prevention Research Institute of Texas Scholar Award (R1304), a McNair Medical Institute Scholar Award, and the President's Early Career Award in Science and Engineering.
Footnotes
The authors declare no conflict of interest.
See Commentary on page 11991.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1613607113/-/DCSupplemental.
References
1. Cremer T, Cremer C. Chromosome territories, nuclear architecture and gene regulation in mammalian cells. Nat Rev Genet. 2001;2(4):292–301. doi: 10.1038/35066075.
2. Bickmore WA. The spatial organization of the human genome. Annu Rev Genomics Hum Genet. 2013;14:67–84. doi: 10.1146/annurev-genom-091212-153515.
3. Grosberg AY, Nechaev SK, Shakhnovich EI. The role of topological constraints in the kinetics of collapse of macromolecules. J Phys (Paris). 1988;49(12):2095–2100.
4. Grosberg A, Rabin Y, Havlin S, Neer A. Crumpled globule model of the 3-dimensional structure of DNA. Europhys Lett. 1993;23(5):373–378.
5. Lieberman-Aiden E, et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009;326(5950):289–293. doi: 10.1126/science.1181369.
6. Gürsoy G, Xu Y, Kenter AL, Liang J. Spatial confinement is a major determinant of the folding landscape of human chromosomes. Nucleic Acids Res. 2014;42(13):8223–8230. doi: 10.1093/nar/gku462.
7. van Steensel B. Chromatin: Constructing the big picture. EMBO J. 2011;30(10):1885–1895. doi: 10.1038/emboj.2011.135.
8. Phillips JE, Corces VG. CTCF: Master weaver of the genome. Cell. 2009;137(7):1194–1211. doi: 10.1016/j.cell.2009.06.001.
9. Rao SSP, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159(7):1665–1680. doi: 10.1016/j.cell.2014.11.021.
10. Collepardo-Guevara R, et al. Chromatin unfolding by epigenetic modifications explained by dramatic impairment of internucleosome interactions: A multiscale computational study. J Am Chem Soc. 2015;137(32):10205–10215. doi: 10.1021/jacs.5b04086.
11. Cullen KE, Kladde MP, Seyfred MA. Interaction between transcription regulatory regions of prolactin chromatin. Science. 1993;261(5118):203–206. doi: 10.1126/science.8327891.
12. Dekker J, Rippe K, Dekker M, Kleckner N. Capturing chromosome conformation. Science. 2002;295(5558):1306–1311. doi: 10.1126/science.1067799.
13. Dostie J, et al. Chromosome Conformation Capture Carbon Copy (5C): A massively parallel solution for mapping interactions between genomic elements. Genome Res. 2006;16(10):1299–1309. doi: 10.1101/gr.5571506.
14. Jaynes ET. Information theory and statistical mechanics. Phys Rev. 1957;106(4):620–630.
15. Zhang B, Wolynes PG. Topology, structures, and energy landscapes of human chromosomes. Proc Natl Acad Sci USA. 2015;112(19):6062–6067. doi: 10.1073/pnas.1506257112.
16. Jost D, Carrivain P, Cavalli G, Vaillant C. Modeling epigenome folding: Formation and dynamics of topologically associated chromatin domains. Nucleic Acids Res. 2014;42(15):9553–9561. doi: 10.1093/nar/gku698.
17. Filion GJ, et al. Systematic protein location mapping reveals five principal chromatin types in Drosophila cells. Cell. 2010;143(2):212–224. doi: 10.1016/j.cell.2010.09.009.
18. Dixon JR, et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012;485(7398):376–380. doi: 10.1038/nature11082.
19. Eagen KP, Hartl TA, Kornberg RD. Stable chromosome condensation revealed by chromosome conformation capture. Cell. 2015;163(4):934–946. doi: 10.1016/j.cell.2015.10.026.
20. Sexton T, et al. Three-dimensional folding and functional organization principles of the Drosophila genome. Cell. 2012;148(3):458–472. doi: 10.1016/j.cell.2012.01.010.
21. Sanborn AL, et al. Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomes. Proc Natl Acad Sci USA. 2015;112(47):E6456–E6465. doi: 10.1073/pnas.1518552112.
22. Boy de la Tour E, Laemmli UK. The metaphase scaffold is helically folded: Sister chromatids have predominantly opposite helical handedness. Cell. 1988;55(6):937–944. doi: 10.1016/0092-8674(88)90239-5.
23. Naumova N, et al. Organization of the mitotic chromosome. Science. 2013;342(6161):948–953. doi: 10.1126/science.1236083.
24. Zhang B, Wolynes PG. Shape transitions and chiral symmetry breaking in the energy landscape of the mitotic chromosome. Phys Rev Lett. 2016;116(24):248101. doi: 10.1103/PhysRevLett.116.248101.
25. Bascom GD, Sanbonmatsu KY, Schlick T. Mesoscale modeling reveals hierarchical looping of chromatin fibers near gene regulatory elements. J Phys Chem B. 2016;120(33):8642–8653. doi: 10.1021/acs.jpcb.6b03197.
26. Grigoryev SA, et al. Hierarchical looping of zigzag nucleosome chains in metaphase chromosomes. Proc Natl Acad Sci USA. 2016;113(5):1238–1243. doi: 10.1073/pnas.1518280113.
27. Maeshima K, Hihara S, Eltsov M. Chromatin structure: Does the 30-nm fibre exist in vivo? Curr Opin Cell Biol. 2010;22(3):291–297. doi: 10.1016/j.ceb.2010.03.001.
28. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5(7):621–628. doi: 10.1038/nmeth.1226.
29. Boettiger AN, et al. Super-resolution imaging reveals distinct chromatin folding for different epigenetic states. Nature. 2016;529(7586):418–422. doi: 10.1038/nature16496.
30. Hübner MR, Spector DL. Chromatin dynamics. Annu Rev Biophys. 2010;39:471–489. doi: 10.1146/annurev.biophys.093008.131348.
31. Stasiak A, Katritch V, Kauffman LH. Ideal Knots. World Scientific; River Edge, NJ: 1998.
32. Bantignies F, et al. Polycomb-dependent regulatory contacts between distant Hox loci in Drosophila. Cell. 2011;144(2):214–226. doi: 10.1016/j.cell.2010.12.026.