De novo prediction of human chromosome structures: Epigenetic marking patterns encode genome architecture

Michele Di Pierro; Ryan R Cheng; Erez Lieberman Aiden; Peter G Wolynes; José N Onuchic

doi:10.1073/pnas.1714980114

2017 Oct 31;114(46):12126–12131. doi: 10.1073/pnas.1714980114

PMCID: PMC5699090 PMID: 29087948

Significance

In the nucleus of eukaryotic cells, the genome is organized in three dimensions in an architecture that depends on cell type. This organization is a key element of transcriptional regulation, and its disruption often leads to disease. We demonstrate that it is possible to predict how a genome will fold based on the epigenetic marks that decorate chromatin. Epigenetic marking patterns are used to predict the corresponding ensemble of 3D structures by leveraging both energy landscape theory and neural network-based machine learning. These predictions are extensively validated by the results of DNA-DNA ligation assays and fluorescence microscopy, which are found to be in exceptionally good agreement with theory.

Keywords: epigenetics, machine learning, energy landscape theory, genomic architecture, Hi-C

Abstract

Inside the cell nucleus, genomes fold into organized structures that are characteristic of cell type. Here, we show that this chromatin architecture can be predicted de novo using epigenetic data derived from chromatin immunoprecipitation-sequencing (ChIP-Seq). We exploit the idea that chromosomes encode a 1D sequence of chromatin structural types. Interactions between these chromatin types determine the 3D structural ensemble of chromosomes through a process similar to phase separation. First, a neural network is used to infer the relation between the epigenetic marks present at a locus, as assayed by ChIP-Seq, and the genomic compartment in which those loci reside, as measured by DNA-DNA proximity ligation (Hi-C). Next, types inferred from this neural network are used as an input to an energy landscape model for chromatin organization [Minimal Chromatin Model (MiChroM)] to generate an ensemble of 3D chromosome conformations at a resolution of 50 kilobases (kb). After training the model, dubbed Maximum Entropy Genomic Annotation from Biomarkers Associated to Structural Ensembles (MEGABASE), on odd-numbered chromosomes, we predict the sequences of chromatin types and the subsequent 3D conformational ensembles for the even chromosomes. We validate these structural ensembles by using ChIP-Seq tracks alone to predict Hi-C maps, as well as distances measured using 3D fluorescence in situ hybridization (FISH) experiments. Both sets of experiments support the hypothesis of phase separation being the driving process behind compartmentalization. These findings strongly suggest that epigenetic marking patterns encode sufficient information to determine the global architecture of chromosomes and that de novo structure prediction for whole genomes may be increasingly possible.

In the nucleus of eukaryotic cells, the 1D information of the genome is organized in three dimensions (1, 2). It is increasingly evident that genomic spatial organization is a key element of transcriptional regulation (1, 3, 4). During interphase, the 3D arrangement of chromatin brings into close spatial proximity sections of DNA separated by great genomic distance, introducing interactions between genes and regulatory elements. These folding patterns are cell type-specific (5, 6), and their disruption can lead to disease (7–10).

The use of high-resolution contact mapping experiments (Hi-C) has revealed that, at the large scale, genome structure is dominated by the segregation of human chromatin into compartments. Initial analysis of Hi-C experiments revealed that loci typically exhibited one of two long-range contact patterns, suggesting the presence of two spatial neighborhoods, dubbed the A and B compartments (11). Subsequently, higher resolution experiments have shown the presence of six distinct long-range patterns, indicating the presence of six subcompartments (A1, A2, B1, B2, B3, and B4) in human lymphoblastoid cells (GM12878) (6). The compartmentalization of the genome has been observed in many organisms [including mouse (6, 12) and Drosophila (13–15)], and has been confirmed by microscopy experiments (16). Crucially, the long-range contact pattern seen at a locus is cell type-specific, and is strongly associated with particular chromatin marks.

To model this structure, we recently introduced an effective energy landscape model for chromatin structure called the Minimal Chromatin Model (MiChroM) (17). This model combines a generic polymer potential with additional interaction terms governing compartment formation, as well as other processes involved in chromatin organization (4, 18–22) [i.e., the local helical structural tendency of the chromatin filament (17, 23–25) and the chromatin loops associated with the presence of CCCTC-binding factor (CTCF) (6, 26–28)]. The formation of compartments (as well as any other interaction in the MiChroM) is assumed to operate only through direct protein-mediated contacts bringing about segregation of chromatin types through a process of phase separation (17, 29). The MiChroM shows that the compartmentalization patterns that Hi-C maps reveal can be transformed into 3D models of genome structure at 50-kb resolution.

Here, we extend the earlier work by demonstrating that the structure of chromosomes can be predicted, de novo, by inferring chromatin types from chromatin immunoprecipitation-sequencing (ChIP-Seq) data and then using these inferences as an input into an effective energy landscape model. The workflow behind this approach is broadly described in Fig. 1.

Fig. 1. Schematic illustration of the MEGABASE + MiChroM computational pipeline. (1) ChIP-Seq data constitute the only input to our pipeline. ChIP-Seq tracks obtained from a publicly available resource (ENCODE) are converted into a sequence of chromatin structural types using a neural network dubbed MEGABASE. The neural network encodes the relationship between compartmentalization and the biochemical state of each locus along the genome. (2) Sequences of chromatin structural types are used as input to a physical model for chromatin folding (MiChroM) to obtain the ensembles of 3D structures of specific chromosomes (17). MiChroM is an effective energy landscape model consisting of a generic polymer with chromatin-type interactions and a translationally invariant local ordering term (Ideal Chromosome). (3) Ensembles of 3D structures are validated by comparing the predicted contact maps with those experimentally determined by using Hi-C.

Although the compartments and subcompartments visible in Hi-C maps correlate with a handful of specific epigenetic modifications present at those loci (see also ref. 6), the distributions of epigenetic markers found in each compartment are broad and largely overlap. It is therefore impossible to assign any given locus correctly to a specific compartment using the frequency of any single epigenetic modification. To overcome this difficulty, we use a machine learning approach to extract information from the raw ChIP-Seq data. We first obtained ChIP-Seq profiles available from the Encyclopedia of DNA Elements (ENCODE) project for the GM12878 lymphoblastoid cell line, encompassing 84 protein-binding experiments and 11 histone marks. Next, we discretized each of these profiles, partitioning them into 50-kb loci, each of which is assigned a value from 1 (weakest signal) to 20 (strongest signal). We then constructed a neural network to uncover the relationship between compartment annotations and epigenetic markings. We use a neural network in which each data type available at a given locus corresponds to a single neuron (30). The state of the network is represented by the state vector $\vec{σ} (l) = (C (l), {Exp}_{1} (l), {Exp}_{2} (l), \dots, {Exp}_{L} (l))$, which represents all of the data available at locus $l$, with $C$ being the subcompartment annotation and ${Exp}_{i}$ being the result of the $i$th ChIP-Seq experiment. The data at each locus are further assumed to be distributed according to a Boltzmann distribution for a Potts model:

$$
\begin{array}{l}
H (\vec{σ}) = - \sum_{i < j} J_{i j} (σ_{i}, σ_{j}) - \sum_{i} h_{i} (σ_{i}), \\
P (\vec{σ}) = \frac{1}{Z} \exp (- H (\vec{σ})),
\end{array}
$$

where $P (\vec{σ})$ indicates the probability of observing the state vector $\vec{σ}$ at any given locus $l$, the $J_{i j}$ interactions capture local pairwise correlations between epigenetic marks or between marks and chromatin types, and the $h_{i}$ terms determine the individual frequencies of chromatin types and markers. This procedure is equivalent to training a Boltzmann machine to encode the information contained in the dataset. The learning strategy is based on the idea that the parameters of the neural network should maximize the likelihood of observing the set of state vectors representing a particular training set. A similar strategy has been previously introduced to quantify the correlated mutational patterns observed in amino acid sequence data of protein families occurring under natural selection to aid protein structure prediction (31, 32).
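To make the Potts form above concrete, here is a minimal Python sketch (not the authors' implementation) that evaluates the energy and the normalized Boltzmann probability for a toy network by brute-force enumeration. The function names, the layout chosen for `J` and `h`, and every parameter value are illustrative assumptions. Exhaustive enumeration is feasible only at toy sizes, and in the paper the parameters are learned by likelihood maximization rather than set by hand.

```python
import itertools
import math


def potts_energy(state, J, h):
    """H(sigma) = -sum_{i<j} J_ij(s_i, s_j) - sum_i h_i(s_i)."""
    n = len(state)
    energy = -sum(h[i][state[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            energy -= J[(i, j)][state[i]][state[j]]
    return energy


def boltzmann_distribution(n_states, J, h):
    """Enumerate every state (toy sizes only) and return {state: P(state)}."""
    states = list(itertools.product(*(range(q) for q in n_states)))
    weights = [math.exp(-potts_energy(s, J, h)) for s in states]
    Z = sum(weights)  # normalizing constant
    return {s: w / Z for s, w in zip(states, weights)}


# Hypothetical toy system: neuron 0 is the chromatin type (2 states),
# neurons 1 and 2 are marks discretized to 3 levels each.
n_states = [2, 3, 3]
h = [[0.0, 0.2], [0.1, 0.0, -0.1], [0.0, 0.0, 0.0]]
J = {
    (0, 1): [[0.5, 0.0, -0.5], [-0.5, 0.0, 0.5]],  # type-mark coupling
    (0, 2): [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],    # uninformative mark
    (1, 2): [[0.0] * 3 for _ in range(3)],         # no mark-mark coupling
}
P = boltzmann_distribution(n_states, J, h)
```

In this toy setting, states in which the type and the first mark are favorably coupled receive lower energy and therefore higher probability, which is the mechanism by which the trained couplings link marks to chromatin types.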

The quality of compartment prediction is improved when we include in the Potts model interactions that do not just refer to a single 50-kb locus but also to interactions encoding correlations between markings and annotations of nearest neighbors and next nearest neighbors (i.e., the neural network correlates information from loci l − 2, l − 1, l, l + 1, l + 2). Through these couplings, the probability of observing a specific state vector at a given locus is correlated with the states of the adjacent segments, thus minimizing the effect of uncorrelated noise. This strategy is analogous to the construction of secondary structure predictors in protein folding using helix–coil models (33).
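The five-locus window described above can be sketched as a simple data-preparation step. The following Python snippet is an illustration under assumptions: the function name `windowed_vectors`, the flat concatenation layout, and the choice to skip loci too close to the chromosome ends are not taken from the source.

```python
def windowed_vectors(tracks, half_width=2):
    """Concatenate the discretized signals of loci l-2 .. l+2 for each locus l.

    tracks[l] is the list of discretized ChIP-Seq levels at locus l.
    Loci closer than half_width to either end are skipped (an assumption).
    """
    n = len(tracks)
    windows = {}
    for l in range(half_width, n - half_width):
        window = []
        for k in range(l - half_width, l + half_width + 1):
            window.extend(tracks[k])
        windows[l] = tuple(window)
    return windows


# Hypothetical data: six loci, two marks per locus.
tracks = [[1, 5], [2, 6], [3, 7], [4, 8], [5, 9], [6, 10]]
windows = windowed_vectors(tracks)
```

Each windowed vector carries the marks of the central locus together with those of its nearest and next nearest neighbors, so a model acting on it can average out noise that is uncorrelated between adjacent 50-kb segments.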

The inferred probabilistic model is then marginalized to predict the most probable chromatin type for a given locus $l$ when given the experimental ChIP-Seq measurements of loci $(l - 2, l - 1, l, l + 1, l + 2)$ :

$$
\mathrm{CST} (l) = \arg \max_{C} P (C \mid {Exp}_{1, \dots, L} (l - 2, l - 1, l, l + 1, l + 2)) .
$$

We refer to the resulting probabilistic predictor of chromatin structural types (CST) as the Maximum Entropy Genomic Annotation from Biomarkers Associated to Structural Ensembles (MEGABASE). Once trained, the model can find the most probable sequence of compartment annotations for any new input sequence of epigenetic marks.
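The arg max step can be illustrated with a simplified Python sketch. It assumes a Potts model in which only the field on the type neuron and the type-mark couplings matter once the marks are observed, and it ignores the annotations of neighboring loci, which the full model also has to account for. The names `predict_type`, `h_type`, and `J_type_mark`, and all numerical values, are hypothetical.

```python
import math


def predict_type(marks, h_type, J_type_mark, labels):
    """Return (most probable label, posterior) for one locus.

    marks[j]              discretized level of mark j (0-based index)
    h_type[c]             field acting on chromatin type c
    J_type_mark[j][c][m]  coupling between type c and mark j at level m
    With the marks fixed, P(c | marks) is proportional to
    exp(h_type[c] + sum_j J_type_mark[j][c][marks[j]]).
    """
    log_w = [
        h_type[c] + sum(J_type_mark[j][c][m] for j, m in enumerate(marks))
        for c in range(len(labels))
    ]
    shift = max(log_w)  # subtract the maximum for numerical stability
    weights = [math.exp(x - shift) for x in log_w]
    Z = sum(weights)
    posterior = {labels[c]: weights[c] / Z for c in range(len(labels))}
    return max(posterior, key=posterior.get), posterior


labels = ["A", "B"]
h_type = [0.0, 0.0]
J_type_mark = [
    [[-1.0, 0.0, 1.0], [1.0, 0.0, -1.0]],  # informative mark
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],    # uninformative mark
]
best_high, post_high = predict_type([2, 1], h_type, J_type_mark, labels)
best_low, post_low = predict_type([0, 1], h_type, J_type_mark, labels)
```

With these toy couplings, a strong signal on the first mark favors type A and a weak signal favors type B, while the second mark does not shift the posterior at all.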

The state vectors of every locus of the odd-numbered chromosomes comprise the training set. The state vectors of the even-numbered chromosomes then provide a test set to quantify the performance of the trained model.
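As an illustration of how such training and test sets might be assembled, the Python sketch below discretizes a per-locus signal into levels 1 to 20 and splits chromosomes by parity. The equal-width binning rule is an assumption, since the text specifies only the range of levels, and the function names are hypothetical.

```python
def discretize(signal, n_levels=20):
    """Map per-locus signal values to integers 1 (weakest) .. n_levels (strongest).

    Uses equal-width bins over the observed range; this binning rule is an
    assumption made for the sketch.
    """
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return [1] * len(signal)
    width = (hi - lo) / n_levels
    return [min(n_levels, int((x - lo) / width) + 1) for x in signal]


def split_by_parity(chromosomes):
    """Split {chromosome number: data} into odd (training) and even (test) sets."""
    train = {c: v for c, v in chromosomes.items() if c % 2 == 1}
    test = {c: v for c, v in chromosomes.items() if c % 2 == 0}
    return train, test


levels = discretize([0.0, 5.0, 10.0])
train, test = split_by_parity({c: [] for c in range(1, 23)})
```

Holding out whole chromosomes, rather than random loci, keeps neighboring and therefore correlated loci from appearing on both sides of the split.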

After training on the odd-numbered chromosomes, we used our statistical model to predict the chromatin types for the independent set of the even chromosomes of the cell line GM12878 from their epigenetic marking profiles. For the test set, the predicted type assignments are in broad agreement with the experimentally determined structural annotations in the study by Rao et al. (6). Specifically, the model is very accurate in predicting the assignments to compartments (A vs. B), while producing a larger number of mismatches between the predicted chromatin types and the published subcompartment annotations, which are more fine-grained (A1 vs. A2, B1 vs. B2 vs. B3) (SI Appendix, Fig. S1).

Once predicted sequences of type annotations are available, we use our previously introduced MiChroM to sample the predicted conformational ensembles of 3D structures. To highlight the relationship between chromatin types and compartmentalization, we use the MiChroM Hamiltonian with the same parameters that had already been determined, but omit the term in that energy function that models the CTCF-mediated looping interactions. These looping interactions seem to arise from a process distinct from compartmentalization, and omitting them does not disrupt the large-scale architecture of chromosomes (17) (the results of additional simulations that also include the CTCF-mediated looping interactions are provided in SI Appendix, Fig. S2).

All simulations start from a random collapsed polymer of the proper length, confined in a spherical region at the correct density (SI Appendix). After equilibration, we collect an ensemble of 3D structures representing the chromosome-specific energy landscape as shaped by the inferred chromatin-type sequences (used as input) and by the MiChroM effective interactions.

From the ensemble of equilibrium conformations, we calculate the contact probabilities between any pair of loci within each chromosome. We compare the resulting contact maps from the simulated ensemble of 3D structures with the experimental Hi-C maps reported by Rao et al. (6). The overall agreement between the experimental and simulated contact probabilities is visually evident. The comparison between the simulated and experimental contact maps is shown in Fig. 2 for representative chromosomes in the test set (i.e., the even autosomes). The Pearson correlation coefficient between simulated and experimental maps is ∼0.9 or higher for all of the chromosomes, whether in the training set or the test set, and the analysis of the Pearson correlation as a function of genomic distance (SI Appendix, Figs. S3–S24) confirms that the two sets of maps are correlated exceptionally well. The power law scaling of the contact probability between two loci as a function of their genomic distance is reproduced well at all genomic distances in a comparison with Hi-C data (SI Appendix, Figs. S3–S24).
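The analysis described in this paragraph can be sketched in a few lines of Python. This is an illustration under assumptions, not the published analysis code: contacts are defined here by a hard distance cutoff chosen for brevity (this section does not state the contact criterion), and the coordinates and the reference map `hic` are made-up toy values.

```python
import math


def contact_map(ensemble, cutoff):
    """Contact frequency between every pair of beads over an ensemble.

    ensemble is a list of conformations, each a list of (x, y, z) positions.
    """
    n = len(ensemble[0])
    counts = [[0] * n for _ in range(n)]
    for conf in ensemble:
        for i in range(n):
            for j in range(i, n):
                if math.dist(conf[i], conf[j]) < cutoff:
                    counts[i][j] += 1
                    counts[j][i] = counts[i][j]
    return [[c / len(ensemble) for c in row] for row in counts]


def upper_triangle(matrix):
    """Flatten the off-diagonal upper triangle of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def contact_vs_distance(matrix):
    """Mean contact probability at each genomic separation d = 1 .. n-1."""
    n = len(matrix)
    return [sum(matrix[i][i + d] for i in range(n - d)) / (n - d)
            for d in range(1, n)]


# Toy ensemble: two conformations of a three-bead chain.
ensemble = [
    [(0, 0, 0), (1, 0, 0), (2, 0, 0)],
    [(0, 0, 0), (1, 0, 0), (1, 1, 0)],
]
sim = contact_map(ensemble, cutoff=1.5)
hic = [[1.0, 0.9, 0.4], [0.9, 1.0, 0.8], [0.4, 0.8, 1.0]]  # hypothetical
r = pearson(upper_triangle(sim), upper_triangle(hic))
scaling = contact_vs_distance(sim)
```

The same `contact_vs_distance` curve, computed for both maps and plotted on logarithmic axes, is what one would inspect to compare the power law decay of contact probability with genomic distance.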

Fig. 2. Predicting the 1D chromatin sequences, 3D conformations, and 2D contact probabilities of human chromosomes from epigenetic marking patterns. We apply MEGABASE + MiChroM to obtain an ensemble of 3D structures for all of the autosomes of cell line GM12878. For illustrative purposes, predictions for chromosome 2 (*Left*) and chromosome 10 (*Right*) are shown. (A) Ninety-five ChIP-Seq tracks are downloaded from the ENCODE database and used as input for MEGABASE to predict 1D sequences of chromatin types (shown in B). The 3D structure of each chromosome is encoded in its specific 1D sequence of chromatin structural types. (C) Typical 3D conformation obtained by MiChroM is shown for chromosomes 2 and 10. (D) Approximately 50,000 structures are collected from simulation to generate high-quality contact maps. These contact maps are compared with the Hi-C maps shown in E. The simulations correctly predict the long-range contact probability patterns that are observed in Hi-C maps, as seen in the magnified regions.

Finally, we compare the Cartesian distances between multiple pairs of loci as predicted through the use of our computational model with those measured by using 3D fluorescence in situ hybridization (FISH), and reported by Rao et al. (6) for the cell line GM12878 and by Lieberman-Aiden et al. (11) for the closely related cell line GM06990. FISH experiments in Fig. 3 show that chromatin belonging to the same structural type tends to come into contact more frequently than otherwise, supporting the idea that compartmentalization is induced by a process of phase separation. This behavior is predicted with quantitative accuracy by our ChIP-Seq–based simulation. Remarkably, simulations predict all of the experimentally determined average distances, together with their variances (Fig. 3 and SI Appendix, Figs. S25–S27).

Fig. 3. Simulated conformational ensembles predict the distances measured by 3D FISH experiments. Simulations and 3D FISH experiments support the idea that the compartmentalization observed in Hi-C maps emerges from the phase separation of chromatin structural types. (A and B) Cartesian distances between four loci (L1, L2, L3, and L4) in chromosome 14 (cell line GM06990) were measured in two distinct 3D FISH experiments reported by Lieberman-Aiden et al. (11). The same distances were computed using the MEGABASE + MiChroM pipeline. The positions of the fluorescent probes are illustrated in representative 3D configurations from simulations, as well as along the chromosome. As illustrated by the annotations from MEGABASE shown in the figure, the four loci are composed of chromatin of alternating types: L1 and L3 are composed of type A chromatin, and L2 and L4 are composed of type B chromatin. (C and D) Cumulative distribution functions (CDF) show that loci composed of chromatin belonging to the same type tend to be closer in space than otherwise, despite the interlaced order and despite lying at greater genomic distances. This phenomenon is observed in FISH experiments, and it is correctly predicted by our ChIP-Seq–based modeling. The comparison between the predicted and measured probability distributions shows excellent agreement for both the average distance and the distance fluctuations (more examples of validation with FISH data are provided in *SI Appendix*). The average ratio between simulated distances and FISH-measured distances has been used to calibrate the length scale of simulation. One unit of length in simulation corresponds to 0.17 μm, which implies a simulated chromosomal territory ∼2–3 μm across, consistent with what was previously reported by Cremer and Cremer (2).
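The length-scale calibration mentioned in the caption can be sketched as follows. The caption states only that an average ratio between simulated and FISH-measured distances was used, so taking the mean of per-pair ratios (rather than a ratio of means) is an assumption, and the distance values below are hypothetical numbers chosen to reproduce the reported scale of 0.17 μm per simulation unit.

```python
def calibrate_length_scale(sim_distances, fish_distances_um):
    """Micrometres per simulation length unit, as the mean of FISH/simulated ratios."""
    ratios = [f / s for s, f in zip(sim_distances, fish_distances_um)]
    return sum(ratios) / len(ratios)


# Hypothetical mean distances for two locus pairs.
sim_distances = [5.0, 10.0]      # simulation units
fish_distances_um = [0.85, 1.70]  # micrometres
scale = calibrate_length_scale(sim_distances, fish_distances_um)

# Express a 2-3 micrometre territory in simulation units.
territory_units = (2.0 / scale, 3.0 / scale)
```

At 0.17 μm per unit, a territory 2–3 μm across corresponds to roughly 12 to 18 simulation length units.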

Representative predicted 3D conformations for chromosome 2 and chromosome 10 are shown in Fig. 2.

As previously observed by Di Pierro et al. (17), analysis of the conformational ensembles shows the existence of microphase separation between chromatin of different types, leading to the formation of the characteristic patterns of interactions seen in Hi-C maps. Examples of the long-range patterns that are captured by our predictions are shown in Fig. 2. The more transcriptionally active segments of chromatin (compartments A1 and A2 in Fig. 2) are more frequently found on the outer surface, while the inactive segments (compartments B1, B2, and B3 in Fig. 2) typically reside in the core of chromosomes.

The quality of the structural predictions achieved using the chromatin annotation inferred by MEGABASE shows that there exists a clear sequence-to-structure relationship between the sequences of chromatin types predicted from epigenetic marks and genome architecture. The accuracy achieved by using our energy landscape model in predicting the effects of compartmentalization, as seen by Hi-C and 3D FISH, supports the plausibility of microphase separation being the physical process driving compartmentalization in chromosomes (17, 34–36) (Fig. 4).

Fig. 4. Process of microphase separation explains compartmentalization in chromosomes. The MEGABASE + MiChroM model hypothesizes that chromatin characterized by homogeneous epigenetic markings undergoes a process similar to phase separation under the action of the proteome present in the nucleus. In simulations, we observe that segments of chromatin belonging to the same structural type tend to segregate, forming liquid droplets, which rearrange dynamically by splitting and fusing. This simple process of phase separation is sufficient to explain the emergence of compartmentalization in genomes as observed in DNA-DNA ligation assays and microscopy experiments.

The success achieved in reliably predicting chromosome architecture indicates that our probabilistic model captures the essential features of epigenetic marks that are associated with compartmentalization. Hence, we further exploit MEGABASE to study this relationship by calculating the content of mutual information shared between markers and compartments, thereby quantifying which of the markers are the best predictors of compartmentalization. It is immediately evident that certain biochemical markers share a high content of mutual information with chromatin structural types, while others do not. According to our neural network, histone methylations H3K36me3, H3K27me3, H3K4me1, and H4K20me1 and nuclear proteins EED, ZBED1, TRIM22, and HCFC1 carry most of the information associated with identifying the chromatin types (SI Appendix, Fig. S28). In contrast, we see that although compartment A, for example, has a very high content of H3K27ac, that marker by itself is a poor predictor owing to its modest mutual information value.
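For readers unfamiliar with the quantity, the mutual information between a discretized marker and the chromatin type can be estimated from co-occurrence counts. The Python sketch below uses a plain plug-in estimator on toy data; how the paper computes the quantity from the trained model is not detailed in this section, so this should be read as a generic illustration with hypothetical values.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two sequences of discrete labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


types = ["A", "A", "B", "B"]
informative_mark = [20, 20, 1, 1]   # tracks the type perfectly
uninformative_mark = [5, 9, 5, 9]   # independent of the type
mi_good = mutual_information(types, informative_mark)
mi_poor = mutual_information(types, uninformative_mark)
```

A marker can be abundant in one compartment and still score low on this measure if its distribution overlaps heavily with that in the other compartments, which is the situation described for H3K27ac.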

Histone modifications alone carry enough information to predict genome architecture. To illustrate the disproportionate predictive value of histone marks, we created a reduced model by training MEGABASE using only the 11 patterns of histone modifications out of the 95 tracks available in the ENCODE database. The sequences of chromatin types predicted by this reduced model turn out to be only marginally different from those obtained by the full dataset of ChIP-Seq tracks (SI Appendix).

Our results demonstrate clearly that it is possible to generate de novo predictions of the genome’s 3D structure, as well as specific predictions about the results of Hi-C and FISH experiments, using only ChIP-Seq data on histone modifications as an input. The faithfulness of the predicted conformational ensembles underlines the existence of a sequence-to-structure relationship between patterns of histone modifications and the 3D spatial arrangement of chromosomes.

These findings offer great hope that, like the problem of protein folding before it, the puzzle of genome folding may be amenable to computational predictions (37). However, despite the success of the neural network-based prediction algorithm, the details of the mechanism underlying chromatin folding remain unclear. Does chromatin fold into a specific conformation because of the particular sequence of epigenetic markers or, vice versa, do compartments share similar epigenetic markers because of chromosome architecture? Dynamical studies using Hi-C and other methods will doubtless be essential in addressing these questions.

Supplementary Material

Supplementary File

pnas.1714980114.sapp.pdf (36.4 MB, PDF)

Acknowledgments

We thank Erica J. Di Pierro for help in editing the manuscript. This work was supported by the Center for Theoretical Biological Physics sponsored by National Science Foundation (NSF) Grant PHY-1427654. J.N.O. was also supported by the NSF Grant CHE-1614101 and by the Welch Foundation (Grant C-1792). Additional support to P.G.W. was provided by the D. R. Bullard-Welch Chair at Rice University (Grant C-0016). E.L.A. was also supported by an NIH New Innovator Award (1DP2OD008540-01), the National Human Genome Research Institute (NHGRI) Center for Excellence for Genomic Sciences (HG006193), the Welch Foundation (Q-1866), an NVIDIA Research Center Award, an International Business Machines Corporation (IBM) University Challenge Award, a Google Research Award, a Cancer Prevention Research Institute of Texas Scholar Award (R1304), a McNair Medical Institute Scholar Award, an NIH 4D Nucleome Grant (U01HL130010), an NIH Encyclopedia of DNA Elements Mapping Center Award (UM1HG009375), and the President’s Early Career Award in Science and Engineering.

Footnotes

The authors declare no conflict of interest.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1714980114/-/DCSupplemental.

References

1. Bickmore WA. The spatial organization of the human genome. Annu Rev Genomics Hum Genet. 2013;14:67–84. doi: 10.1146/annurev-genom-091212-153515.
2. Cremer T, Cremer C. Chromosome territories, nuclear architecture and gene regulation in mammalian cells. Nat Rev Genet. 2001;2:292–301. doi: 10.1038/35066075.
3. Whalen S, Truty RM, Pollard KS. Enhancer-promoter interactions are encoded by complex genomic signatures on looping chromatin. Nat Genet. 2016;48:488–496. doi: 10.1038/ng.3539.
4. Gürsoy G, Xu Y, Liang J. Spatial organization of the budding yeast genome in the cell nucleus and identification of specific chromatin interactions from multi-chromosome constrained chromatin model. PLoS Comput Biol. 2017;13:e1005658. doi: 10.1371/journal.pcbi.1005658.
5. Krijger PH, et al. Cell-of-origin-specific 3D genome structure acquired during somatic cell reprogramming. Cell Stem Cell. 2016;18:597–610. doi: 10.1016/j.stem.2016.01.007.
6. Rao SSP, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159:1665–1680. doi: 10.1016/j.cell.2014.11.021.
7. Göndör A. Dynamic chromatin loops bridge health and disease in the nuclear landscape. Semin Cancer Biol. 2013;23:90–98. doi: 10.1016/j.semcancer.2013.01.002.
8. Krijger PH, de Laat W. Regulation of disease-associated gene expression in the 3D genome. Nat Rev Mol Cell Biol. 2016;17:771–782. doi: 10.1038/nrm.2016.138.
9. Fullwood MJ, et al. An oestrogen-receptor-alpha-bound human chromatin interactome. Nature. 2009;462:58–64. doi: 10.1038/nature08497.
10. Montefiori L, et al. Extremely long-range chromatin loops link topological domains to facilitate a diverse antibody repertoire. Cell Rep. 2016;14:896–906. doi: 10.1016/j.celrep.2015.12.083.
11. Lieberman-Aiden E, et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009;326:289–293. doi: 10.1126/science.1181369.
12. Dixon JR, et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012;485:376–380. doi: 10.1038/nature11082.
13. Eagen KP, Hartl TA, Kornberg RD. Stable chromosome condensation revealed by chromosome conformation capture. Cell. 2015;163:934–946. doi: 10.1016/j.cell.2015.10.026.
14. Sexton T, et al. Three-dimensional folding and functional organization principles of the Drosophila genome. Cell. 2012;148:458–472. doi: 10.1016/j.cell.2012.01.010.
15. Li QJ, et al. The three-dimensional genome organization of Drosophila melanogaster through data integration. Genome Biol. 2017;18:145. doi: 10.1186/s13059-017-1264-5.
16. Wang S, et al. Spatial organization of chromatin domains and compartments in single chromosomes. Science. 2016;353:598–602. doi: 10.1126/science.aaf8084.
17. Di Pierro M, Zhang B, Aiden EL, Wolynes PG, Onuchic JN. Transferable model for chromosome architecture. Proc Natl Acad Sci USA. 2016;113:12168–12173. doi: 10.1073/pnas.1613607113.
18. Barbieri M, et al. Complexity of chromatin folding is captured by the strings and binders switch model. Proc Natl Acad Sci USA. 2012;109:16173–16178. doi: 10.1073/pnas.1204799109.
19. Brackley CA, Johnson J, Kelly S, Cook PR, Marenduzzo D. Simulated binding of transcription factors to active and inactive regions folds human chromosomes into loops, rosettes and topological domains. Nucleic Acids Res. 2016;44:3503–3512. doi: 10.1093/nar/gkw135.
20. Jost D, Carrivain P, Cavalli G, Vaillant C. Modeling epigenome folding: Formation and dynamics of topologically associated chromatin domains. Nucleic Acids Res. 2014;42:9553–9561. doi: 10.1093/nar/gku698.
21. Wong H, et al. A predictive computational model of the dynamic 3D interphase yeast nucleus. Curr Biol. 2012;22:1881–1890. doi: 10.1016/j.cub.2012.07.069.
22. Tjong H, Gong K, Chen L, Alber F. Physical tethering and volume exclusion determine higher-order genome organization in budding yeast. Genome Res. 2012;22:1295–1305. doi: 10.1101/gr.129437.111.
23. Zhang B, Wolynes PG. Shape transitions and chiral symmetry breaking in the energy landscape of the mitotic chromosome. Phys Rev Lett. 2016;116:248101. doi: 10.1103/PhysRevLett.116.248101.
24. Zhang B, Wolynes PG. Topology, structures, and energy landscapes of human chromosomes. Proc Natl Acad Sci USA. 2015;112:6062–6067. doi: 10.1073/pnas.1506257112.
25. Grigoryev SA, et al. Hierarchical looping of zigzag nucleosome chains in metaphase chromosomes. Proc Natl Acad Sci USA. 2016;113:1238–1243. doi: 10.1073/pnas.1518280113.
26. Sanborn AL, et al. Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomes. Proc Natl Acad Sci USA. 2015;112:E6456–E6465. doi: 10.1073/pnas.1518552112.
27. Phillips JE, Corces VG. CTCF: Master weaver of the genome. Cell. 2009;137:1194–1211. doi: 10.1016/j.cell.2009.06.001.
28. Nichols MH, Corces VG. A CTCF code for 3D genome architecture. Cell. 2015;162:703–705. doi: 10.1016/j.cell.2015.07.053.
29. Zhang B, Wolynes PG. Genomic energy landscapes. Biophys J. 2017;112:427–433. doi: 10.1016/j.bpj.2016.08.046.
30. Hopfield JJ. Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci USA. 1982;79:2554–2558. doi: 10.1073/pnas.79.8.2554.
31. Lapedes A, Giraud B, Jarzynski C. Using sequence alignments to predict protein structure and stability with high accuracy. 2002. arXiv:1207.2484.
32. Ekeberg M, Lövkvist C, Lan Y, Weigt M, Aurell E. Improved contact prediction in proteins: Using pseudolikelihoods to infer Potts models. Phys Rev E Stat Nonlin Soft Matter Phys. 2013;87:012707. doi: 10.1103/PhysRevE.87.012707.
33. Bryngelson JD, Hopfield JJ, Southard SN. A protein structure predictor based on an energy model with learned parameters. Tetrahedron Comput Methodol. 1990;3:129–141.
34. Hnisz D, Shrinivas K, Young RA, Chakraborty AK, Sharp PA. A phase separation model for transcriptional control. Cell. 2017;169:13–23. doi: 10.1016/j.cell.2017.02.007.
35. Larson AG, et al. Liquid droplet formation by HP1α suggests a role for phase separation in heterochromatin. Nature. 2017;547:236–240. doi: 10.1038/nature22822.
36. Strom AR, et al. Phase separation drives heterochromatin domain formation. Nature. 2017;547:241–245. doi: 10.1038/nature22989.
37. Wolynes PG. Evolution, energy landscapes and the paradoxes of protein folding. Biochimie. 2015;119:218–230. doi: 10.1016/j.biochi.2014.12.007.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary File

pnas.1714980114.sapp.pdf^{(36.4MB, pdf)}
