Abstract
Adapting a well-established formalism in polymer physics, we develop a minimalist approach to infer three-dimensional folding of chromatin from Hi-C data. The three-dimensional chromosome structures generated from our heterogeneous loop model (HLM) are used to visualize chromosome organizations that can substantiate the measurements from fluorescence in situ hybridization, chromatin interaction analysis by paired-end tag sequencing, and RNA-seq signals. We demonstrate the utility of the HLM with several case studies. Specifically, the HLM-generated chromosome structures, which reproduce the spatial distribution of topologically associated domains from fluorescence in situ hybridization measurement, show the phase segregation between two types of topologically associated domains explicitly. We discuss the origin of cell-type-dependent gene-expression level by modeling the chromatin globules of α-globin and SOX2 gene loci for two different cell lines. We also use the HLM to discuss how the chromatin folding and gene-expression level of Pax6 loci, associated with mouse neural development, are modulated by interactions with two enhancers. Finally, HLM-generated structures of chromosome 19 of mouse embryonic stem cells, based on single-cell Hi-C data collected over each cell-cycle phase, visualize changes in chromosome conformation along the cell cycle. Given a contact frequency map between chromatin loci supplied from Hi-C, the HLM is a computationally efficient and versatile modeling tool to generate chromosome structures that can complement the interpretation of other experimental data.
Significance
The packaging of chromosomes, giant macromolecules made of hundreds-of-megabase-long DNA, into a micrometer-sized cell nucleus is truly remarkable. Recent advances in Hi-C techniques have ushered in a new era of research on genome organization. We developed a computationally efficient and versatile approach, called the heterogeneous loop model, to generate a chromosome structural ensemble from Hi-C. The heterogeneous-loop-model-generated three-dimensional chromosome structures not only substantiate the chromosome organizations implicated by diverse experimental data but also allow us to decipher the structural origin of genome function and variation of gene-expression level along the cell cycle and across different cell types.
Introduction
Recent advances in chromosome conformation capture techniques combined with parallel sequencing (1, 2, 3, 4, 5) and fluorescence imaging microscopies have ushered in a new era of chromosome research over the past decade. Along with post-translational histone modifications, which have led to the conceptualization of epigenomes (6), the critical findings from fluorescence imaging and Hi-C data that the spatial organization of chromatin varies with the tissue or cell types (7, 8), cell cycle (4), and pathological states (9, 10, 11) have brought a new dimension to our understanding of genome functions.
Among others, maps of genome-wide contact frequencies, quantified by Hi-C data, offer unprecedented opportunities to infer 3D chromosome structures in cell nuclei (12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22). In a nutshell, Hi-C provides the contact frequencies of genomic loci pairs based on the statistics of PCR-amplified DNA fragments digested from formaldehyde cross-linked cells (1, 2). One could interpret Hi-C as measuring the population-sampled contact probability pij between a pair of genomic loci, say i and j. A proper mathematical mapping of pij to the spatial distance rij is of critical importance for interpreting fluorescence imaging data (23, 24) in comparison with Hi-C data.
The advent of fluorescence in situ hybridization (FISH) followed by C-based techniques has engendered much devotion to capturing the principle underlying the three-dimensional (3D) folding of chromosomes. This has led to the development of a series of polymer-based models over the decades, which include the “multiloop subcompartment model” (25, 26), the “random loop model” (RLM) (27, 28, 29), the “strings and binders switch” model (12, 15, 30) and its derivative (17, 31, 32), the “loop extrusion model” (13, 14, 15, 33), the “minimal chromatin model” (34), and, more recently, the “chromosome copolymer model” (22). Among them, although applicability is limited to the associated spatiotemporal scale of the model being considered, some were developed by keeping a specific molecular mechanism in mind or by incorporating “one-dimensional” information of epigenetic modification and/or DNA accessibility along genomic loci as an input to a heteropolymer model (22, 32, 35). On the other hand, partly sacrificing model simplicity, others were developed solely for the purpose of reconstructing more precise 3D chromatin structures from Hi-C (20, 36, 37, 38) and other experiments (39).
As the cell imaging data over different cell types are rapidly growing, comparative study of chromosome conformations has become imperative. In the abovementioned models, however, a physically sound mapping of pij from Hi-C to the spatial distance rij (see review (40)) is still lacking, and computational costs are still high. To this end, here we develop a minimalist model that allows us to generate chromatin conformations from Hi-C data in a most efficient way and to study the structural characteristics of chromosome at a length scale of interest corresponding to the resolution of the given data. To achieve such a goal in the most simplifying manner, one could learn much from the literature of generic polymer problems, such as the collapse transition of an isolated polymer chain or macromolecular networks with increasing numbers of internal bonds (41, 42, 43, 44) and polymer conformation and dynamics inside confinement (45, 46, 47).
Pushing the polymer physics idea to its extreme, we propose a minimalist approach, termed the heterogeneous loop model (HLM), that allows us to build 3D structures of chromosomes from Hi-C data. The HLM adapts the RLM, which was originally developed based on a randomly cross-linked polymer chain (27, 28, 48). In the RLM, which represents chromosome conformation in terms of the sum of harmonic potentials, pairwise contact probabilities are expressed analytically in terms of a few model parameters. Here, without sacrificing the mathematical tractability and simplicity of the RLM, we extend the RLM to the HLM by allowing the loop interactions to be nonuniform and heterogeneous such that the resulting loop interactions can best represent a given Hi-C data set.
In this study, we apply the HLM to various regions of human and mouse genomes that span 1–100 Mb at 5–500 kb resolution and generate the corresponding conformational ensemble of chromosomes. We demonstrate the utilities of the HLM by comparing the structural information extracted from an HLM-generated chromosome ensemble with those implicated by the measurements from FISH (23, 24, 28), chromatin interaction analysis by paired-end tag sequencing (49, 50), and previous modeling studies (28, 32, 37, 51, 52). Through multiple examples, this study will demonstrate that the HLM is an excellent approach to infer 3D structures from Hi-C data.
Methods
Description of the HLM
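The full energy potential of the HLM consists of two parts.
$$U_{\mathrm{HLM}}(\mathbf{r}) = U_{\mathcal{K}}(\mathbf{r}) + U_{\mathrm{nb}}(\mathbf{r}) \tag{1}$$
In what follows, we delineate the first and second terms of Eq. 1 (see Supporting Materials and Methods for technical details).
First, $U_{\mathcal{K}}(\mathbf{r})$, decomposed into two parts, describes the harmonic constraints on a chain of N monomers (27),
$$\begin{aligned} U_{\mathcal{K}}(\mathbf{r}) &= \sum_{i=0}^{N-2} \frac{k_{i,i+1}}{2}\,(\mathbf{r}_i - \mathbf{r}_{i+1})^2 + \sum_{i=0}^{N-3}\sum_{j=i+2}^{N-1} \frac{k_{ij}}{2}\,(\mathbf{r}_i - \mathbf{r}_j)^2 \\ &= \frac{1}{2}\,\mathbf{r}^{T}\mathbf{K}\,\mathbf{r} \end{aligned} \tag{2}$$
where successive monomers along the backbone and nonsuccessive monomers forming loops are both harmonically restrained. In the second line, $U_{\mathcal{K}}(\mathbf{r})$ is written in a compact form with $\mathbf{r} = (\mathbf{r}_0, \mathbf{r}_1, \ldots, \mathbf{r}_{N-1})^{T}$ and K representing the Kirchhoff matrix. K can be built from the interaction strength matrix $\mathcal{K}$, which takes kij as its matrix element: the off-diagonal elements of K are −kij, and each diagonal element is the sum of the interaction strengths involving that monomer. The interaction strengths ought to be non-negative (kij ≥ 0) for all i and j-th monomer pairs. In the HLM, if kij > 0, then the i and j-th monomers have the potential to form a (chromatin) loop. After removing the translational degrees of freedom by setting $\mathbf{r}_0 = 0$ in Eq. 2, we obtain the probability density of pairwise distance as (27)
$$P(r_{ij}) = 4\pi \left(\frac{\gamma_{ij}}{\pi}\right)^{3/2} r_{ij}^{2}\, e^{-\gamma_{ij} r_{ij}^{2}} \tag{3}$$
where
$$\gamma_{ij} = \frac{1}{2\,(\sigma_{ii} + \sigma_{jj} - 2\sigma_{ij})} \tag{4}$$
and σij is the covariance between the positions of i and j-th monomers (per Cartesian component), which can be obtained from an inverse of K-matrix (with the row and column of the fixed monomer removed) as
$$\sigma_{ij} = k_B T\,(\mathbf{K}^{-1})_{ij} \tag{5}$$
One can obtain the contact probability pij by integrating the pairwise distance distribution P(rij) (Eq. 3) up to a certain capture radius (rc) (53, 54), $p_{ij} = \int_0^{r_c} P(r)\,dr$, which gives
$$p_{ij} = \mathrm{erf}\!\left(\sqrt{\gamma_{ij}}\, r_c\right) - 2\sqrt{\frac{\gamma_{ij}}{\pi}}\; r_c\, e^{-\gamma_{ij} r_c^{2}} \tag{6}$$
where $\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^{x} e^{-t^{2}}\,dt$. Therefore, a one-to-one analytical mapping between pij and kij follows from the precise mappings between pij and σij from Eqs. 4 to 6 and between σij and kij from Eq. 5.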
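The chain of mappings from interaction strengths to contact probabilities can be sketched numerically. The snippet below is a minimal illustration, not the authors' implementation. It assumes the standard Gaussian-network results: the per-component covariance equals kBT times the inverse of the Kirchhoff matrix with monomer 0 pinned at the origin, γij = 1/[2(σii + σjj − 2σij)], and pij = erf(√γij rc) − 2√(γij/π) rc exp(−γij rc²). The function names, the choice kBT = 1, and the example chain are hypothetical.

```python
import math
import numpy as np

def kirchhoff(k):
    """Kirchhoff matrix from a symmetric, non-negative interaction strength matrix."""
    K = -np.asarray(k, dtype=float)
    np.fill_diagonal(K, 0.0)
    np.fill_diagonal(K, -K.sum(axis=1))  # diagonal = sum of strengths per monomer
    return K

def gamma_matrix(k, kBT=1.0):
    """gamma_ij = 1 / (2 * per-component variance of r_i - r_j), monomer 0 pinned."""
    K = kirchhoff(k)
    n = K.shape[0]
    sigma = np.zeros((n, n))
    # Pinning monomer 0 removes the zero mode, so the reduced matrix is invertible
    # for a connected chain.
    sigma[1:, 1:] = kBT * np.linalg.inv(K[1:, 1:])
    d = np.diag(sigma)
    var = d[:, None] + d[None, :] - 2.0 * sigma
    gamma = np.full((n, n), np.inf)      # diagonal left at infinity (zero distance)
    off = ~np.eye(n, dtype=bool)
    gamma[off] = 1.0 / (2.0 * var[off])
    return gamma

def contact_probability(gamma, rc):
    """Probability that the pairwise distance is below the capture radius rc."""
    x = math.sqrt(gamma) * rc
    return math.erf(x) - 2.0 * x / math.sqrt(math.pi) * math.exp(-x * x)

# Example: a 5-monomer Rouse chain with backbone strength 3 (in units of kBT/a^2).
k_example = np.zeros((5, 5))
for i in range(4):
    k_example[i, i + 1] = k_example[i + 1, i] = 3.0
gamma_example = gamma_matrix(k_example)
p_neighbors = contact_probability(gamma_example[0, 1], rc=1.0)
```

For a plain Rouse chain the sketch gives γij = k/(2|i − j|), so the contact probability decays with genomic separation, as expected for an ideal chain.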
Although it is tempting to directly use the mathematical relation between pij and kij to obtain the interaction strength matrix from Hi-C data, there is an unavoidable numerical issue (see Supporting Materials and Methods and Figs. S3–S5 for details). In practice, we calculate an interaction strength matrix that approximately reproduces the Hi-C contact probability matrix by selecting only the significant contacts in the latter. More specifically, we evaluate the significance of contact probability pij by calculating zij, which is defined as (see the matrix elements in the upper diagonal part of Fig. 1 B)
| (7) |
where P(s) is the mean contact probability for monomer pairs separated by the arc length s along the contour. The greater the value of zij, the more significant the contact is deemed. We then select the top 2N (i, j) pairs ranked by the value of zij (>1) (the matrix elements in the lower diagonal part of Fig. 1 B). For these 2N pairs, whose contact probabilities pij are given by the Hi-C data, the precise target value of the parameter that sets the width of the pairwise distance distribution (Eq. 3) can be determined using Eq. 6. Then, starting from a Rouse chain configuration as an initial input, we add nonsuccessive bonds with varying interaction strengths (0 ≤ kij ≤ 10 kBT/a²) until we minimize the objective function
| (8) |
so as to determine the optimal values of the interaction strengths kij. Here, the weight factor ωij, which is used to normalize the statistical bias from chromatin loops of different sizes, is defined as
| (9) |
where the normalization is set by the number of loops of size s. A gradient-based optimizer (the L-BFGS-B method in the SciPy package) was used to determine the optimal parameters kij. A fully converged interaction strength matrix (Fig. 1 C) could be obtained within a few minutes when N was not too large (≤200). This process of determining the interaction strength matrix, termed “constrained optimization,” faithfully reproduces the original Hi-C contact probability matrix with a relative error smaller than 5% (see also Figs. S3–S5).
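One ingredient of this procedure, converting a measured contact probability into the width parameter of the pairwise distance distribution, amounts to inverting Eq. 6. The sketch below does this by bisection. It assumes Eq. 6 has the standard closed form for a Gaussian distance distribution, p = erf(x) − (2x/√π) exp(−x²) with x = √γ rc, which increases monotonically with x. The function names and tolerance are hypothetical, and this is not the optimizer used by the authors.

```python
import math

def p_of_x(x):
    """Contact probability as a function of x = sqrt(gamma) * rc (assumed form of Eq. 6)."""
    return math.erf(x) - 2.0 * x / math.sqrt(math.pi) * math.exp(-x * x)

def gamma_from_p(p, rc, tol=1e-12):
    """Invert p(gamma) by bisection; valid because p_of_x is monotonically increasing."""
    if not 0.0 < p < 1.0:
        raise ValueError("contact probability must lie strictly between 0 and 1")
    lo, hi = 0.0, 10.0           # p_of_x(10) equals 1 to double precision
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if p_of_x(mid) < p:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    return (x / rc) ** 2

# Round trip: start from a known gamma, compute p, and recover gamma.
gamma_true = 1.5
p_example = p_of_x(math.sqrt(gamma_true) * 1.0)
gamma_recovered = gamma_from_p(p_example, rc=1.0)
```

Bisection is used here only because it is robust and needs no derivative; any bracketing root finder would serve equally well.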
Figure 1.
The pipeline of the HLM. (A) The contact probability matrix from Hi-C for a 10-Mb genomic region of chr5 in GM12878 cells is shown. (B) The significance matrix (zij) calculated from Eq. 7 is shown above the diagonal. The significant contacts selected from the Hi-C matrix are shown below the diagonal. The sign of the first principal component of the significance matrix is provided on the left-hand side of the panel, and we divide the whole chromosome domain into “L” (purple), “M” (green), and “N” (orange) accordingly. (C) The interaction strength matrix (kij) calculated by the constrained optimization is shown. (D) The conformational ensemble of chromosomes generated from the HLM potential, defined by the harmonic and nonbonded terms (Eq. 1), is illustrated with the L, M, and N domains colored in purple, green, and orange, respectively, following the domain labels assigned in (B). (E) The contact probability matrix calculated using a conformational ensemble produced from molecular dynamics simulations is shown. The Pearson correlation (PC) between (A) and (E) is 0.96. To see this figure in color, go online.
In fact, the number of selected top contact pairs (nc) could have been 3N, N, or even N/2. However, we found that the Pearson correlation (PC) between the original Hi-C contact probability matrix and the one obtained from the optimized interaction strengths saturates for nc ≥ 2N (Fig. S6). Thus, to build the interaction strength matrix while keeping the computational cost low, we chose nc = 2N.
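The selection of the top nc significant contacts can be sketched as follows. The exact definition of zij is given by Eq. 7; this illustration assumes, purely for the sake of a runnable example, that significance is the ratio of pij to the mean contact probability P(s) at the same separation. The function names, the `min_sep` parameter (restricting the choice to nonsuccessive pairs), and the toy matrix are hypothetical.

```python
import numpy as np

def mean_contact_by_separation(P):
    """P(s): mean of the s-th diagonal of a symmetric contact probability matrix."""
    n = P.shape[0]
    return np.array([np.diagonal(P, offset=s).mean() for s in range(n)])

def select_significant_pairs(P, n_pairs, min_sep=2, z_min=1.0):
    """Return up to n_pairs (i, j) pairs with the largest assumed significance."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    Ps = mean_contact_by_separation(P)
    scored = []
    for i in range(n):
        for j in range(i + min_sep, n):
            if Ps[j - i] > 0.0:
                z = P[i, j] / Ps[j - i]   # ASSUMED stand-in for Eq. 7
                if z > z_min:
                    scored.append((z, i, j))
    scored.sort(reverse=True)             # most significant first
    return [(i, j) for _, i, j in scored[:n_pairs]]

# Toy example: a smooth decay with one enriched long-range contact.
idx = np.arange(6)
P_toy = 1.0 / (1.0 + np.abs(idx[:, None] - idx[None, :]))
P_toy[0, 4] = P_toy[4, 0] = 0.9
top_pairs = select_significant_pairs(P_toy, n_pairs=2 * 6)
```

With nc = 2N as in the text, one would call `select_significant_pairs(P, n_pairs=2 * N)`; fewer pairs are returned when fewer than 2N exceed the threshold.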
After obtaining the interaction strength matrix (Fig. 1 C), and hence the harmonic part of the potential, we added a nonbonded interaction term Unb(r), defined for all i and j pairs, to the full energy potential UHLM(r) (Eq. 1):
$$U_{\mathrm{nb}}(\mathbf{r}) = \sum_{i<j} \chi_{t_i,t_j}\, u_{\mathrm{LJ}}(r_{ij}) \tag{10}$$
where uLJ(r) is the Lennard-Jones potential truncated for r ≥ rc where rc = 5a/2 with ϵ = 0.45 kBT,
| (11) |
If ϵ = ϵθ (= 0.34 kBT) with , then Unb(r) leads to θ-solvent condition for an infinitely long chain, putting the second virial coefficient to zero, i.e., . We chose ε (= 0.45 kBT) slightly greater than εθ and assigned a loci-pair-type-dependent prefactor . Each monomer i is assigned with a type t, either “−” or “+,” based on the sign of the first principal component of (see the track on top of Fig. 1 B). The value of prefactor (>0)—depending on the types of two loci i and j, which are titj = ++, − −, or −+ —is evaluated by averaging over all the monomer pairs of the corresponding types, such that . The values of are determined based on a given Hi-C data set. For the case shown in Fig. 1, we obtain χ−,− = 1.18, χ−,+ = 0.79, and χ+,+ = 1.19. According to the Flory-Huggins theory (55), the condition leads to spatial separation between + and − type loci, which indeed is realized and reflected in the characteristic checkerboard pattern of Hi-C data. It should be noted, however, that the classification of type −/+ is not necessarily identical to the A/B compartment of chromatin. Whereas A/B compartments are genome-wide characteristics usually defined based on Hi-C data at low (Mb) resolutions (2, 3), the monomers in the HLM can be always classified into types −/+ regardless of the resolution of the model.
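The assignment of a “+” or “−” type to each monomer from the sign of a first principal component can be sketched as below. This is a generic PCA via singular value decomposition, offered as an illustration rather than the authors' exact procedure; the input matrix, the column centering, and the function name are assumptions. Note that the overall sign of a principal component is arbitrary, so only the partition into two groups is meaningful.

```python
import numpy as np

def first_pc_types(M):
    """Label each row '+' or '-' by the sign of its score on the first principal component."""
    M = np.asarray(M, dtype=float)
    X = M - M.mean(axis=0, keepdims=True)      # center each column
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    pc1 = U[:, 0] * S[0]                       # scores along the first component
    types = np.where(pc1 >= 0.0, "+", "-")
    return types, pc1

# Toy checkerboard: monomers 0-2 interact among themselves, as do monomers 3-5.
group = np.array([0, 0, 0, 1, 1, 1])
M_toy = (group[:, None] == group[None, :]).astype(float)
types_toy, pc1_toy = first_pc_types(M_toy)
```

On a checkerboard-like matrix the two blocks receive opposite signs, which mirrors how a compartment-like pattern translates into two monomer types.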
Finally, we sampled 3D chromosome structures using molecular dynamics simulation, implementing the full energy potential UHLM(r), and calculated the contact probability matrix from the HLM-generated conformational ensemble. In the specific example demonstrated for the Hi-C data of the 10-Mb genomic region of chr5 in the GM12878 cell line (Fig. 1), the contact probability matrix (Fig. 1 E) obtained from HLM-generated chromosome conformations (Fig. 1 D; see also the clustering analysis that highlights the conformational variability of chromosomes in Supporting Materials and Methods; Fig. S7) closely resembles the input Hi-C matrix (Fig. 1 A) (PC of 0.96; Spearman correlation of 0.92). Despite the simplicity of the HLM potential (Eq. 1), the similarity between the modeled and measured contact probability matrices, as well as the ensemble of chromosome conformations generated during the procedure, is remarkable.
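A comparison of this kind can be computed as a Pearson correlation over the upper-triangular elements of the two matrices. The sketch below is one plausible way to do so; whether the diagonal or the nearest-neighbor elements are excluded is an assumption controlled by the hypothetical `min_sep` parameter.

```python
import numpy as np

def matrix_pearson(A, B, min_sep=1):
    """Pearson correlation between the upper triangles of two symmetric matrices."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    if A.shape != B.shape:
        raise ValueError("matrices must have the same shape")
    iu = np.triu_indices(A.shape[0], k=min_sep)   # pairs with j - i >= min_sep
    return float(np.corrcoef(A[iu], B[iu])[0, 1])

# Toy check with a random symmetric matrix and a linearly rescaled copy.
rng = np.random.default_rng(0)
R = rng.random((8, 8))
A_toy = 0.5 * (R + R.T)
pc_same = matrix_pearson(A_toy, 2.0 * A_toy + 1.0)
```

Because the Pearson correlation is invariant under linear rescaling, it is insensitive to the overall normalization of the Hi-C counts.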
Structure characterization
We quantified the structural feature of HLM-generated chromosome ensemble by means of several quantities:
1) The compactness of a (sub)chain of length N is quantified in terms of , where rg is the gyration radius of the (sub)chain.
2) The asphericity (A) is calculated by , where λi (i = 1, 2, 3) are the three eigenvalues of the moment of inertia tensor and is their mean (56, 57). A = 0 for a sphere, and A > 0 for a nonspherical shape.
3) The roughness of the surface of a (sub)chain was evaluated using the Voronoi diagram (58), which tessellates the 3D space occupied by the chain. An upper bound for the volume of each monomer was set using a dodecahedron with a diameter of 2a. The Voronoi diagram provides a well-defined volume V and surface area S of the (sub)chain. Because the surface area of a perfect sphere with the volume V is $S_0 = (36\pi V^{2})^{1/3}$, we quantified the surface roughness using $S/S_0 \ge 1$.
4) To visualize an ensemble of structures with considerable variability, we first divided the chain into a few segments (domains). Next, the distribution of the distances between the geometric centers of these domains was computed based on the ensemble of structures. Several configurations of chromosomes were then randomly selected from the most populated state (in terms of interdomain distances), aligned, and rendered.
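The first two quantities can be computed directly from monomer coordinates. The sketch below assumes a commonly used convention: the shape is characterized by the eigenvalues of the gyration tensor, and the asphericity is A = (3/2) Σ(λi − λ̄)² / (Σλi)², normalized so that a rod gives A = 1 and a sphere gives A = 0. This convention and the function names are assumptions and may differ in detail from the definitions used in the paper.

```python
import numpy as np

def gyration_tensor(coords):
    """3x3 gyration tensor of a set of points (equal masses assumed)."""
    X = np.asarray(coords, dtype=float)
    X = X - X.mean(axis=0)                 # move the geometric center to the origin
    return X.T @ X / X.shape[0]

def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their geometric center."""
    return float(np.sqrt(np.trace(gyration_tensor(coords))))

def asphericity(coords):
    """0 for a spherically symmetric arrangement, 1 for a straight rod (assumed convention)."""
    lam = np.linalg.eigvalsh(gyration_tensor(coords))
    return float(1.5 * np.sum((lam - lam.mean()) ** 2) / np.sum(lam) ** 2)

# Toy shapes: three collinear points and the six vertices of an octahedron.
rod = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
octahedron = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
A_rod = asphericity(rod)
A_oct = asphericity(octahedron)
```

Applied to each frame of a conformational ensemble, these functions give the distributions of compactness-related and shape-related quantities for a chosen (sub)chain.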
Results
The HLM is effectively a multiblock copolymer model in which monomer-monomer interactions (loops) are harmonically restrained with varying interaction strengths (kij) (Methods; Supporting Materials and Methods). Mapping the pairwise contact probabilities pij from Hi-C to the model parameters kij is the essence of the HLM. By incorporating a standard Lennard-Jones nonbonded potential slightly below the θ-condition, which takes into account the short-range excluded-volume interaction between monomers as well as the global thermodynamic driving force that induces spatial separation between different monomer types, the HLM allows us to generate a conformational ensemble of chromosome structures that reproduces a contact probability matrix that displays close resemblance to an original inputted Hi-C data set. We used the HLM to model various genomic regions (see Table 1). HLM-generated chromosome conformations were used to interpret the currently available experimental results.
Table 1.
Genomic Regions Simulated in This Work
| Species | Cell line | Hi-C experiment | Chromosome | Start (bp) | End (bp) | Resolution (kb) | N | PC^a | Time (min)^b | Figure |
|---|---|---|---|---|---|---|---|---|---|---|
| Human | GM12878 | (3) | chr5 | 90,000,000 | 100,000,000 | 50 | 200 | 0.96 | 4.8 | Fig. 1 |
| Human | IMR90 | (3) | chr21 | 14,000,000 | 48,000,000 | 250 | 137 | 0.97 | 0.8 | Fig. 2 |
| Human | IMR90 | (3) | chr11 | 59,000,000 | 94,000,000 | 250 | 140 | 0.98 | 1.7 | Fig. S8 |
| Human | IMR90 | (3) | chr1 | 150,000,000 | 180,000,000 | 250 | 120 | 0.98 | 0.8 | Fig. S9 |
| Human | K562 | (3) | chr16 | 60,000 | 560,000 | 5 | 100 | 0.94 | 0.2 | Fig. 3 |
| Human | GM12878 | (3) | chr16 | 60,000 | 560,000 | 5 | 100 | 0.92 | 0.4 | Fig. 3 |
| Human | hESC | (88) | chr3 | 179,000,000 | 184,000,000 | 40 | 125 | 0.94 | 1.4 | Fig. 4 |
| Human | HUVEC | (3) | chr3 | 179,000,000 | 184,000,000 | 40 | 125 | 0.95 | 1.8 | Fig. 4 |
| Mouse | mESC | (75) | chr2 | 105,000,000 | 106,000,000 | 8 | 125 | 0.94 | 1.4 | Fig. 5 |
| Mouse | NPC | (75) | chr2 | 105,000,000 | 106,000,000 | 8 | 125 | 0.96 | 1.2 | Fig. 5 |
| Mouse | CN | (75) | chr2 | 105,000,000 | 106,000,000 | 8 | 125 | 0.97 | 1.3 | Fig. 5 |
| Mouse | ncx_NPC | (75) | chr2 | 105,000,000 | 106,000,000 | 8 | 125 | 0.97 | 0.8 | Fig. 5 |
| Mouse | ncx_CN | (75) | chr2 | 105,000,000 | 106,000,000 | 8 | 125 | 0.97 | 1.1 | Fig. 5 |
| Mouse | mESC | (4) | chr19 | 1 | 61,342,430 | 500 | 117 | 0.92^c | 1.4 | Fig. 6 |

^a The similarity between contact probabilities (pij) from Hi-C and those from modeling is quantified by the PC (see also discussions in Supporting Materials and Methods).
^b It takes a few minutes to determine the interaction strength parameters by the constrained optimization, namely, obtaining the interaction strength matrix from the Hi-C contact probability matrix.
^c From the post-M to pre-M phase, PC of mESCs is 0.77, 0.96, 0.96, 0.96, 0.97, and 0.91, respectively.
Spatial distribution of TADs inferred from HLM in comparison with FISH measurement
Intrachromosomal distances between topologically associated domains (TADs) in human IMR90 cells, measured by Wang et al. through a multiplexed FISH method (23), have been used as a benchmark for different models (38). To show the utility of the HLM, we model a 34-Mb genomic region on chr21 of IMR90 cells, which contains 33 labeled TADs (Table S1 provides the genomic positions of these TADs).
First, the contact probability matrix constructed from HLM-generated structures captures the characteristic checkerboard pattern of the Hi-C heatmap; the mean contact probability PHLM(s) of the HLM is consistent with PHi-C(s) calculated from Hi-C over all length scales, including the wiggly pattern at large s (Fig. 2, A and B).
Figure 2.
A 34-Mb genomic region of chr21 in IMR90 cells modeled by the HLM. (A) A heatmap of contact probabilities from Hi-C (upper diagonal part) and the HLM (lower diagonal part) is given. The PC between the two matrices is 0.97. The positions of TADs, labeled by sticks, are displayed above the heatmap. The type of each domain, A or B, is depicted in red or blue, respectively. (B) Plotted are the mean contact probabilities P(s) calculated from Hi-C (orange data) and the HLM (blue line). (C) The heatmap of inter-TAD distances measured by FISH (upper diagonal part) is compared with that calculated from the HLM (lower diagonal part). (D) Distributions of A- and B-type TADs projected on the x axis, along which the geometric centers of different types of TADs are aligned, are indicative of the spatial separation between the two TAD types. An ensemble of structures is also shown. (E) Intrachain end-to-end distances r(s) as a function of arc length s from FISH (orange data) and the HLM (blue line) are shown. The inset shows r(s) in log-log scale. (F) The inverse of the pairwise contact probability between TADs, 1/pij, versus the inter-TAD distance rij is shown in log-log scale. To see this figure in color, go online.
The heatmap calculated for inter-TAD distances using the HLM-generated conformational ensemble (lower diagonal part of Fig. 2 C) can directly be compared with the FISH measurement (upper diagonal part). The square block pattern along the diagonal axis of the heatmap indicates that four to five adjacent TADs constitute an aggregate, reminiscent of meta-TAD (30), and the patterns in the off-diagonal part (highlighted by the magenta boxes) suggest long-range clustering of TADs. The error of the inter-TAD distance heatmap relative to FISH is 0.184, which is comparable to the value of the GEM model (38) and better than others (see Fig. 4 D in (38)). A principal component analysis of this matrix (top left part of the matrix in Fig. 2 C) divides TADs into A/B types (23). Fig. 2 D demonstrates the polarized organization of A- and B-type TADs by aligning the geometric centers of HLM-generated A- and B-type TADs along an axis that best separates the two types of TADs (23).
Figure 4.
Comparison of a 5-Mb genomic region on chr3 modeled by the HLM between hESCs and HUVECs, which includes the SOX2 gene. (A) Genes annotated in this region are aligned with RNA-seq (68) and H3K27ac signals (89) of two cell lines. The genomic positions of three “simulated” FISH probes (32) are labeled in the bottom track. (B) The distance between upstream and SOX2, (C) the distance between SOX2 and downstream, and (D) the gyration radius calculated from our model are given. (E) A heatmap of contact probabilities for hESCs measured by Hi-C (88) (upper diagonal part) and calculated from the HLM (lower diagonal part) is displayed. Based on the first principal component of the significance matrix (track on the left side of heatmap), we divided the region into three domains and colored the chromatin chain accordingly in the snapshot of a typical structural ensemble. (F) Analysis was carried out for HUVECs with Hi-C data from (3). To see this figure in color, go online.
The intrachain end-to-end distance displays a scale-dependent scaling relationship with the genomic distance s, $r(s) \sim s^{\nu}$ (Fig. 2 E). In qualitative agreement with the FISH measurement (23), there is a crossover around s = 7 Mb, such that ν ≈ 1/3 for s < 7 Mb and ν ≈ 0.21 for s > 7 Mb.
We explore the relationship between the contact probability pij and the corresponding distance rij of two loci. The looping probability of a polymer is expected to be inversely proportional to the volume of space (V) explored by the two loci, Ploop ∼ 1/V. Because the volume V scales with the spatial separation (R) between the two loci in d dimensions as $V \sim R^{d}$, it follows that (59, 60, 61)
$$P_{\mathrm{loop}} \sim R^{-(d+g)} \tag{12}$$
The correlation hole exponent g is g = 0 for a Gaussian chain (55). According to the Flory theorem (62, 63, 64, 65), the ideal chain statistics is a good approximation for a chain in polymer melts or for a subchain in a fully equilibrated globule. Because d = 3 for 3D, we expect $P_{\mathrm{loop}} \sim R^{-3}$, or equivalently, $p_{ij}^{-1} \sim r_{ij}^{3}$ (see also Fig. S1 B). In fact, this scaling relation is observed for the data points generated by the HLM for rij < 1 μm (Fig. 2 F). Although Wang et al., who combined Hi-C and FISH data, reported a scaling relation of $p_{ij}^{-1} \sim r_{ij}^{4}$ for the entire range, it is not clear whether the relation can straightforwardly be extended to the range of rij < 1 μm, in which the data points from their measurement might be less accurate. According to the HLM-generated data, a more proper description is $p_{ij}^{-1} \sim r_{ij}^{3}$ for rij < 1 μm and a different power law for rij > 1 μm.
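A scaling exponent of this kind is typically estimated from the slope of a straight-line fit in log-log coordinates. The sketch below shows the idea on synthetic data; the function name and the synthetic values are hypothetical, and a real analysis would restrict the fit to the distance range of interest (for example, rij < 1 μm).

```python
import numpy as np

def power_law_exponent(x, y):
    """Fit y = c * x**alpha by least squares in log-log space; return (alpha, c)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
    return float(slope), float(np.exp(intercept))

# Synthetic example: inverse contact probability growing as the cube of distance.
r_toy = np.array([0.2, 0.4, 0.6, 0.8])
inv_p_toy = 5.0 * r_toy ** 3
alpha_toy, c_toy = power_law_exponent(r_toy, inv_p_toy)
```

Fitting the two distance ranges separately and comparing the slopes is a simple way to check for a crossover between scaling regimes.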
Next, to demonstrate another analysis on FISH measurement, we applied the HLM to the q-arm of chr11 in IMR90 cells, whose intrachain pairwise distances between genomic loci had been measured with FISH (28, 66) (see Table S2 for the position of FISH probes in the genome and in the model). The model produces the contact probability matrix with a PC of 0.98 relative to Hi-C data (see Fig. S8, A and B). The HLM enables us to calculate the spatial distances between specific pairs of loci (Fig. S8 C), with a mean relative error of 0.189 (with respect to FISH data). The HLM-generated structural ensemble also indicates that compared to the gene-poor and transcriptionally inactive antiridge domain, the transcriptionally active ridge domain is less compact, less spherical, and has a rougher domain surface (Fig. S8, D–F), all of which are in agreement with the FISH experiment (66). Modeling another 30-Mb region on chr1 of IMR90 cells leads to similar results (Fig. S9; Table S3).
Visualization of chromatin globules
α-Globin gene
Cis-regulatory elements generally mediate the transcription of neighboring genes within a range smaller than 1 Mb (67). The α-globin gene domain, a 500-kb genomic region known as ENm008 located at the left telomere of human chr16, has previously been studied to decipher the relationship between chromatin structure and transcription activity (37, 51, 52). RNA-seq data (68, 69, 70) indicate that the α-globin genes (including ζ-, μ-, α2-, α1-, and θ-globin genes) are expressed in K562 cell lines but silenced in GM12878 (tracks on the left side of the Hi-C heatmaps in Fig. 3 A). According to 3C/5C measurements (51, 71), the α-globin gene forms long-range looping interactions with multiple regulatory elements upon gene activation. Among them, of particular interest is one of the DNase I-hypersensitive sites (DHSs), HS40, located at ∼70 kb upstream of the α1 gene.
Figure 3.
α-Globin gene domain modeled by the HLM for two different cell lines. (A) Shown are the heatmaps of contact probabilities measured by Hi-C (upper diagonal part) and the corresponding maps obtained from the HLM (lower diagonal part) in K562 (left) and GM12878 (right) cells. RNA-seq signals (68) are displayed on the left side of the heatmaps, and the location of the α-globin gene is depicted in gray shading. (B) Mean contact probability P(s) is plotted. (C) Compactness and (D) asphericity of the domain are shown. (E) Genomic positions of the loci, closer to the α1 gene in K562 than in GM12878 cells, are labeled using red sticks. Contrasted below are the distance distributions between the α1 gene and HS40, P(rα,HS40), for two cell lines. For each cell line, an ensemble of structures is shown for comparison with chains colored by the genomic position from the telomere (blue) to centromere (red). The α-globin gene and HS40 are rendered using a black and an orange sphere, respectively. (F) Pol-II-mediated chromatin interactions (49), involving α-globin genes and specific to K562 cells, are compared with the model. To see this figure in color, go online.
The HLM-generated structural ensemble at 5-kb resolution for ENm008 of two cell lines (K562 and GM12878) suggests that the contact probability P(s) decreases slightly faster in K562 than in GM12878 cells at large s (Fig. 3 B). The α-globin domains of K562 and GM12878 cell lines visualized with FISH (51) indicate that K562 is less compact than GM12878, which is confirmed straightforwardly by the compactness calculated using the HLM-generated structures (Fig. 3 C). Compared with GM12878 cells, the α-globin domain in K562 cells adopts a less spherical shape (Fig. 3 D; (51, 52)) (see also Fig. S10, where individual loci are classified into different groups based on their 3D coordinates, clarifying the spatially separated domains of K562 cells).
Next, we examined the changes in the distances between the α-globin gene and other loci upon activation of the gene. Even though the whole domain in K562 cells is relatively more expanded, HS40 is closer to the α1 gene in K562 than in GM12878 cells (Fig. 3 E), which is consistent with the expectation based on the higher contact enrichment between HS40 and the α1 gene observed in K562 by 3C/5C measurements (e.g., Fig. 2 in (51)). Through inter-cell-line comparison between K562 and GM12878 for the rest of the region using the distribution of distances to the α-globin gene locus, we identified a group of loci other than HS40 that are significantly closer to α-globin genes in K562 cells (Mann-Whitney U test, p < 1 × 10⁻⁵). Their genomic positions are marked using red sticks in Fig. 3 E. According to the independent chromatin interaction analysis by paired-end tag sequencing experiments (49, 50) designed to capture the chromatin loop interactions mediated by specific protein factors, the structural variation associated with α-globin genes is mainly orchestrated by Pol II (see Table S4). The HLM captures 83% of Pol-II-mediated chromatin loops specific to K562 cells (Fig. 3 F).
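The kind of two-sample comparison used here can be illustrated with a bare-bones Mann-Whitney U test. The sketch below counts pairwise orderings and uses a normal approximation for a two-sided p-value, without tie or continuity corrections; these simplifications and the function name are assumptions. In practice, a library routine such as `scipy.stats.mannwhitneyu` would be preferable.

```python
import math

def mann_whitney_u(x, y):
    """Return (U, two-sided p) where U counts pairs with x < y (ties count 0.5)."""
    n1, n2 = len(x), len(y)
    u = 0.0
    for a in x:
        for b in y:
            if a < b:
                u += 1.0
            elif a == b:
                u += 0.5
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)   # no tie correction
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))           # normal approximation
    return u, p

# Toy example: distances to a reference locus in two hypothetical cell lines.
dist_line_a = [float(v) for v in range(1, 11)]       # consistently shorter
dist_line_b = [float(v) for v in range(11, 21)]      # consistently longer
u_toy, p_toy = mann_whitney_u(dist_line_a, dist_line_b)
```

For each locus, the two samples would be the distances to the α-globin gene collected over the conformational ensembles of the two cell lines.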
These results indicate that the HLM captures both the tissue-specific variation in the global packing of the α-globin gene domain and variation in the structure of the gene locus. The multiple K562-specific interactions, substantiated by the HLM, suggest that a cooperative action of multiple regulatory elements, including HS40, is responsible for the activation of α-globin genes (37). HLM-generated conformations indeed confirm the notion of chromatin globule proposed in (51).
SOX2 gene
As another example of transcription-dependent chromatin folding, we studied the human SOX2 gene locus, which encodes a transcription factor involved in the regulation of embryonic development. The SOX2 gene is transcribed in human embryonic stem cells (hESCs) but not in human umbilical vein endothelial cells (HUVECs) (Fig. 4 A). To compare the results from the HLM with a recent modeling study (32), we measured the distances between the SOX2 gene and two possible regulatory elements located at regions ∼800 kb upstream and ∼650 kb downstream. Whereas both elements are closer to the SOX2 locus in transcriptionally active hESCs than in inactive HUVECs, the chromatin fiber is less compact in hESCs (Fig. 4 D; see also the snapshots in Fig. 4, E and F). HLM-generated structures demonstrate the dependence of chromatin folding on the transcription level at the SOX2 gene locus, and this trend comports well with the prediction made in (32), which also employed polymer model simulation.
Chromatin interaction at Pax6 gene loci
The efficacy of the HLM was further tested for the genomic locus of the Pax6 gene, which is involved in the development of mouse neural tissues. The Pax6 gene is flanked by two neighboring genes (Pax6os1 and Elp4), and its expression level is considered to be regulated by multiple long-range elements, including two regulatory regions located at ∼50 kb upstream (URR) and ∼95 kb downstream (DRR) (Fig. 5 A). The DRR contains several DHSs and the SIMO enhancer, which was identified in transgenic reporter gene studies of developing mouse embryos (72, 73). Another cis-regulatory element within the URR, PE3, has recently been identified from mouse pancreatic β cells (β-TC3) (74).
Figure 5.
Pax6 gene locus modeled by the HLM for five different mouse cell types. (A) Genes in a 1-Mb region on chr2, where the genomic positions of the URR, Pax6, and the DRR are labeled with gray shading, are shown in alignment with the fragments per kilobase of transcript per million score measured from RNA-seq analysis (75). Pax6 gene promoters and enhancers and nearby DHSs are zoomed in at the bottom (74, 90), where the positions of the upstream enhancer (UE), promoter (P), and downstream enhancer (DE) are marked with arrows. (B) Expression levels of the Pax6 gene are provided for different cell types. (C) Contact profiles for five different cell types are shown. The profiles were constructed using Hi-C data (gray bars) for the URR, Pax6, and the DRR (from top to bottom) relative to other genomic regions and calculated using the HLM (solid lines). (D) Pax6 expression level is shown as a function of the average distance between two enhancers (UE and DE) and the promoter (P). (E) Percentage of chromatin conformations belonging to each group classified based on the distances between UE, P, and DE is shown. (F) Shown is the scatter plot of the distances rP–DE and rUE–P of 200 structures, which were randomly selected from the conformational ensemble of ncx_NPCs. Typical structures are presented for each group in which the three sites are labeled using different colors. To see this figure in color, go online.
A study combining Capture-C, FISH, and simulations (32) has reported a nontrivial correlation between the expression level of the Pax6 gene and the spatial separation from the Pax6 gene to the URR and DRR. Among the three types of mouse cells (β-TC3, MV+, and RAG cells) studied in (32), the Pax6 gene maintained the largest separation from the DRR in the β-TC3 cells that displayed the highest expression level of Pax6. Therefore, it was suggested (32) that the enhancer at the DRR is not involved in upregulation of Pax6 in β-TC3 cells or that some unclear upregulation mechanisms that do not require the spatial proximity to enhancers are responsible for the activity of the Pax6 gene.
To study the origin of the possible complex interplay between the Pax6 gene and neighboring genetic elements, we applied the HLM to the same genomic region of five different mouse cell types whose Hi-C data are currently available: 1) embryonic stem cells (mESCs), 2) neural progenitors (NPCs), 3) cortical neurons (CNs), 4) ncx_NPCs, and 5) ncx_CNs, with the prefix “ncx_ ” indicating that the cells are directly purified from the developing mouse embryonic neocortex in vivo. Each cell type displays distinct transcriptional activity patterns of Pax6 and its neighboring genes (75) (Fig. 5 A). According to the fragments per kilobase of transcript per million scores from RNA-seq analysis, which is higher when the gene transcription is more active, the five cell types display Pax6 activity in the following order: ncx_NPC > NPC > CN > ES > ncx_CN (Fig. 5 B).
The contact probabilities calculated from our HLM-generated conformations reasonably reproduce the Hi-C data at 8-kb resolution (75) (see Fig. S11; Table 1). The Hi-C contact profiles of three genomic loci (URR, Pax6, and DRR) with other genomic regions (histograms in Fig. 5 C) are well-captured by HLM-generated conformations (lines in Fig. 5 C). Compared with the distance of the Pax6 gene promoter (P) to the upstream enhancer (UE), Pax6 gene activity is better correlated with the distance to the downstream enhancer (DE) (see Fig. 5 D); the closer the promoter is to the DE, the higher the Pax6 gene activity. The highest Pax6 gene activity is seen in ncx_NPCs. Notice that the most enriched Hi-C contacts between Pax6 and the DRR are indeed found in ncx_NPCs, which is marked with a red star in Fig. 5 C. We note that our finding on contacts between Pax6 and the DRR using a different set of cell lines differs from the result based on β-TC3 cells (see Fig. 2 A in (32)). This, however, underscores that the mechanism or the chromatin conformations responsible for the Pax6 gene activity depend strongly on the cell type. It is clear that the mechanism of Pax6 gene regulation in ncx_NPCs differs from that in β-TC3 cells.
Next, given that Hi-C data are obtained from a collection of millions of cells, heterogeneity of chromatin conformations is inevitable in analyses, which has indeed been highlighted in (32). To characterize the heterogeneity in the HLM-generated conformational ensembles, we classified each chromatin structure into five groups based on the separations between the Pax6 gene promoter (P) and two enhancers (UE and DE) (Fig. 5 E). To visualize the conformational diversity, we randomly selected 200 structures and characterized them by the promoter-enhancer distances (Fig. 5 F). Except for the “gray” group, in which all three separations are large, the conformational ensemble consists mainly of the “black” group (P is close to DE but not to UE) and the “purple” group (P is close to UE but not to DE), which are suspected to be responsible for the high expression level of the Pax6 gene. Consistent with our analysis of the ensemble-averaged distance to enhancers for different cells (Fig. 5 D), the proportion of the “black” group shows a decreasing trend as Pax6 becomes less active (Fig. 5 E), suggesting a more important role of DE than UE in regulating the Pax6 gene for the five cell lines.
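The grouping described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' procedure: the distance cutoff that defines "close" is a hypothetical parameter, the function names are invented here, and the labels for the two groups that the text does not describe ("P-near-both" and "enhancers-only") are placeholders.

```python
import math
from collections import Counter

def classify(ue, p, de, cutoff):
    """Assign one conformation to a group from the 3D positions of UE, P, and DE.

    `cutoff` is a hypothetical distance threshold for calling two sites "close".
    """
    p_ue = math.dist(p, ue) < cutoff
    p_de = math.dist(p, de) < cutoff
    ue_de = math.dist(ue, de) < cutoff
    if p_de and not p_ue:
        return "black"            # P close to DE but not to UE (as in the text)
    if p_ue and not p_de:
        return "purple"           # P close to UE but not to DE (as in the text)
    if not (p_ue or p_de or ue_de):
        return "gray"             # all three separations are large (as in the text)
    if p_ue and p_de:
        return "P-near-both"      # placeholder label, not from the text
    return "enhancers-only"       # placeholder label: only UE and DE are close

def group_fractions(structures, cutoff):
    """Fraction of conformations in each group; `structures` holds (ue, p, de) tuples."""
    counts = Counter(classify(ue, p, de, cutoff) for ue, p, de in structures)
    total = len(structures)
    return {group: count / total for group, count in counts.items()}
```

Applied to an ensemble of sampled conformations, `group_fractions` gives the kind of per-group percentages summarized in Fig. 5 E, up to the choice of cutoff.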
Although an indirect upregulation of the Pax6 gene by the DRR, as seen in β-TC3 cells (32), cannot entirely be ruled out, the correlation of gene activity level with the spatial proximity of the Pax6 gene to the DRR is clearly demonstrated, at least across the five cell lines that we studied using the HLM. The mechanism of indirect upregulation and the mechanism of cell-type-dependent choices deserve further study.
Chromosome in different phases of the cell cycle
Most Hi-C data are obtained over a population of “unphased” cells. Here, we employ the HLM to model the global architecture of the chromosome at different phases of the cell cycle during interphase based on single-cell Hi-C (4). Accumulating the data from tens to hundreds of binary contact matrices of single cells into an input contact matrix, we built a 500-kb-resolution model of the chromosome for the post-M, G1, early-S, mid-S, late-S/G2, and pre-M phases of chr19 in mESCs (above the diagonal in Fig. 6 A). The contact matrices computed using the HLM (below the diagonal in Fig. 6 A) display reasonable correlation with the original Hi-C data (PC > 0.9), except for the post-M phase (PC = 0.77); unlike the other phases, the lower PC value at the post-M phase, whose contact matrix is characterized by a uniform and featureless pattern, is due to the smaller number of sampled cells (Nc).
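The accumulation step and the Pearson correlation quoted above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the division by Nc is an assumed normalization (the PC is unaffected by it), and restricting the PC to entries above the diagonal is likewise an assumption.

```python
import math

def accumulate(binary_matrices):
    """Sum Nc binary single-cell contact matrices and divide by Nc (assumed normalization)."""
    nc = len(binary_matrices)
    n = len(binary_matrices[0])
    return [[sum(m[i][j] for m in binary_matrices) / nc for j in range(n)]
            for i in range(n)]

def pearson_upper(a, b, min_sep=1):
    """Pearson correlation over the entries (i, j) with j - i >= min_sep."""
    n = len(a)
    xs = [a[i][j] for i in range(n) for j in range(i + min_sep, n)]
    ys = [b[i][j] for i in range(n) for j in range(i + min_sep, n)]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

With few cells, many entries of the accumulated matrix take the same small set of values, which is one way to see why a small Nc can lower the correlation with a smooth model matrix.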
Figure 6.
Chr19 of mESCs modeled by the HLM at 500-kb resolution. (A) A heatmap of contact probabilities from Hi-C (upper diagonal part) and HLM (lower diagonal part) is given. From post-M phase to pre-M phase, Pearson correlations (PCs) are 0.77, 0.96, 0.96, 0.96, 0.97, and 0.91, respectively. The Hi-C matrices are the outcomes from a sum of Nc binary contact matrices of single cells in the same phases of the cell cycle. (B) Plotted are the gyration radius (rg) and the average volume occupied by a single monomer. (C) Asphericity of the chromosome in different phases along the cell cycle is calculated. Depicted at post-M, G1, and pre-M phases are the snapshots of the HLM-generated structure, which are colored from the centromere (blue) to telomere (red). To see this figure in color, go online.
The local compactness of the chromosome conformation was quantified in terms of the average volume occupied by a single monomer based on the Voronoi tessellation (Fig. 6 B). After mitosis, the chromosome continues to expand until the late-S/G2 phase. The gyration radius (rg) also captures this trend (Fig. 6 B), except that the model has the largest value of rg in the post-M phase. A partial condensation of the chain (decreases in both the monomer volume and rg) is observed before entering the pre-M phase. This decondensation-condensation cycle is also captured with the asphericity of structures generated from the HLM (Fig. 6 C), which decreases dramatically from the post-M to G1 phases and then increases gradually after the G1 phase. The same conclusion can be drawn from the probability density of pairwise distance between monomers (see Fig. S12).
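For readers who want to compute the global shape measures discussed above from a set of monomer coordinates, the following sketch may help. It assumes the common gyration-tensor definitions (rg² equal to the trace of the tensor, and asphericity equal to 3/2 times the trace of the squared traceless tensor divided by the squared trace); whether these match the exact conventions used for Fig. 6 is an assumption, and the function names are illustrative. The Voronoi-based monomer volume is not covered here.

```python
import math

def gyration_tensor(coords):
    """3x3 gyration tensor of a list of (x, y, z) positions with equal weights."""
    n = len(coords)
    cm = [sum(r[a] for r in coords) / n for a in range(3)]
    return [[sum((r[a] - cm[a]) * (r[b] - cm[b]) for r in coords) / n
             for b in range(3)] for a in range(3)]

def radius_of_gyration(coords):
    """Square root of the trace of the gyration tensor."""
    t = gyration_tensor(coords)
    return math.sqrt(t[0][0] + t[1][1] + t[2][2])

def asphericity(coords):
    """0 for a spherically symmetric arrangement, 1 for a perfectly straight rod."""
    t = gyration_tensor(coords)
    trace = t[0][0] + t[1][1] + t[2][2]
    # Traceless (deviatoric) part of the tensor.
    dev = [[t[a][b] - (trace / 3.0 if a == b else 0.0) for b in range(3)]
           for a in range(3)]
    # The tensor is symmetric, so Tr(dev^2) is the sum of squared entries.
    trace_dev_sq = sum(dev[a][b] ** 2 for a in range(3) for b in range(3))
    return 1.5 * trace_dev_sq / trace ** 2
```

Averaging these quantities over the conformational ensemble of each phase would give curves of the kind shown in Fig. 6, B and C.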
Discussion
The HLM is similar to previous polymer models of chromatin that also convert information on spatial proximity into effective harmonic restraints between monomers (25, 76, 77). In fact, our use of harmonic potential is based on our observation that the pairwise loci distance distributions measured in many FISH experiments (23, 36, 78, 79) are reasonably represented by the variations under harmonic restraints. For example, the distance distribution between seven pairs of FISH probes in mESCs (36) can be reasonably represented by the probability density of the pairwise distance of the HLM (Fig. S1 C). Of course, we cannot rule out the possibility that the cell population is too heterogeneous to be captured by a single harmonic restraint.
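As a worked illustration of what "variations under harmonic restraints" implies, consider a pair of loci whose separation vector is an isotropic three-dimensional Gaussian, as is the case for any network of purely harmonic springs. The symbols σ (standard deviation of each Cartesian component) and r_c (contact cutoff) are introduced here for illustration and are not necessarily the notation of the Methods. Under this assumption the pairwise distance follows a Maxwell-type density, and the probability of finding the pair within r_c follows in closed form:

```latex
\begin{align}
P(r) &= 4\pi r^{2}\left(\frac{1}{2\pi\sigma^{2}}\right)^{3/2}
        \exp\!\left(-\frac{r^{2}}{2\sigma^{2}}\right),
        \qquad \langle r^{2}\rangle = 3\sigma^{2}, \\
p(r_c) &= \int_{0}^{r_c} P(r)\,dr
        = \operatorname{erf}\!\left(\frac{r_c}{\sqrt{2}\,\sigma}\right)
        - \sqrt{\frac{2}{\pi}}\,\frac{r_c}{\sigma}
          \exp\!\left(-\frac{r_c^{2}}{2\sigma^{2}}\right).
\end{align}
```

A single value of σ therefore fixes both the mean-square distance and the contact probability of a pair, which is why a strongly bimodal FISH distance distribution could not be captured by one harmonic restraint.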
The HLM adopts a “mean-field” approach of using a population-sampled Hi-C map as the sole input data. Fundamental concerns as to the use of single-input data in solving the inverse problem can still be raised for many modeling approaches employing information such as epigenetic marks and DNA accessibility, which are also population-averaged, not single-cell based. Nevertheless, the nature of contact pairs is still probabilistic, giving rise to variations in pairwise distances (Fig. S1 A). More importantly, topological and energetic frustrations that arise from the competition between the chain connectivity and long-range interaction defined in Hi-C data are inherent in the polymeric system (80). It is generally not possible to obtain a single chromatin structure that satisfies all the probabilistic constraints given in the Hi-C map. As a result of computationally solving the inverse problem of inferring 3D structures from population-sampled Hi-C data, we always observe structural heterogeneity in the chromosome conformation ensemble (e.g., see Fig. S7). Of course, it is in principle questionable whether or not such a heterogeneous structural ensemble acquired from a Hi-C-map-based HLM represents the true heterogeneous population of chromosomes; however, as demonstrated in this study, using the HLM, we can still extract meaningful information that can complement diverse experimental measurements.
To demonstrate that the choice of energy potential in the HLM is optimal over similar alternatives, we examined the HLM and its three variants on a 10-Mb genomic region on chr5 of GM12878 cells (Fig. S13). Unlike the HLM, which faithfully reproduced the domain edges of enriched contacts observed by Hi-C (highlighted by cyan boxes in Fig. S13 A) that were regarded as a distinct feature of loop extrusion (14), two alternative copolymer models, which retain uniform strength of loop interaction, could not properly reproduce the diagonal-block patterns of Hi-C data (Fig. S13, B and C). In a homopolymer model, in which χ−,−, χ−,+, and χ+,+ are all set to 1 (see Methods), the long-range checkerboard pattern was not reproduced (Fig. S13 D). The PC of contact probabilities contrasted between Hi-C and other models at different genomic separation, PC(s), shows that the HLM outperforms the others (Fig. S13 E).
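A separation-resolved correlation of this kind can be sketched as below. The sketch assumes that PC(s) is the Pearson correlation between the Hi-C and model contact probabilities taken over all pairs of bins (i, i + s) at a fixed separation s, with s measured in bins; the function names are hypothetical, and returning NaN for diagonals with fewer than two entries or no variance is a choice made here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; NaN if it is undefined (too few points or zero variance)."""
    n = len(xs)
    if n < 2:
        return float("nan")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

def pc_of_s(hic, model, s):
    """Correlation between the s-th diagonals of two square contact matrices."""
    n = len(hic)
    xs = [hic[i][i + s] for i in range(n - s)]
    ys = [model[i][i + s] for i in range(n - s)]
    return pearson(xs, ys)
```

Evaluating `pc_of_s` for s = 1, 2, ... yields a curve that separates short-range from long-range agreement, which a single overall PC dominated by near-diagonal entries cannot do.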
Di Stefano et al. have performed steered molecular dynamics simulations of a polymer model of the whole genome of hESC and IMR90 cells, based on physical restraints derived from Hi-C (21). Their model features the nuclear positioning of different chromosomes and functional genomic regions observed in vivo. To compare the two models, we computed the Kendall rank correlation between the Hi-C contact matrix and the HLM distance matrix in the simulated genomic regions of both cell types (Fig. S13 F). Because a higher contact frequency should correspond to a shorter distance, the Kendall rank correlation value gets closer to −1 as the agreement between the model and Hi-C increases. In comparison with steered molecular dynamics, structures generated with the HLM are better correlated with Hi-C, especially at short genomic distance.
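The rank-based comparison between a contact matrix and a distance matrix can be illustrated with a small, dependency-free sketch. It uses the tie-corrected Kendall coefficient (tau-b), which is an assumption about the variant, and a simple O(n²) pair loop that is adequate only for small matrices; the helper names are hypothetical.

```python
import math

def upper_triangle(matrix):
    """Flatten the entries above the main diagonal, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def kendall_tau_b(xs, ys):
    """Tie-corrected Kendall rank correlation between two equal-length sequences."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            dx = xs[i] - xs[j]
            dy = ys[i] - ys[j]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            sign = dx * dy
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    n_pairs = n * (n - 1) // 2
    return (concordant - discordant) / math.sqrt((n_pairs - ties_x) * (n_pairs - ties_y))

def contact_distance_tau(contact, distance):
    """Kendall tau-b between a contact matrix and a mean-distance matrix."""
    return kendall_tau_b(upper_triangle(contact), upper_triangle(distance))
```

A value near −1 means that the pairs ranked as most frequently in contact are also ranked as closest in the model, independent of the functional form that links distance to contact probability.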
The minimal chromosome model (MiChroM) is another interesting Hi-C-based polymer model (34), which we have used in our previous study (35). We simulated chr5 of GM12878 cells at 50-kb resolution with MiChroM (Fig. S13 G) and compared PC(s) of a 10-Mb region based on both models (Fig. S13 H). Even though MiChroM considers six types of monomers to describe nonbonded interactions, the HLM outperforms MiChroM in terms of short-range correlations. It also shows comparable long-range correlations with Hi-C.
In addition to the overall PCs listed in Table 1, we calculated PC(s) for all the genomic regions discussed in this study (Fig. S14). Compared with the modeling based on ensemble Hi-C data (Fig. S14, A and B), the model of chr19 of mESC based on single-cell Hi-C shows lower correlation (Fig. S14 C). We found that the PC in general decreases at large genomic separation, but there are two groups of exceptional cases in which the models maintain high correlation with Hi-C at large values of s. The first group is the model of IMR90 cells, which has the lowest model resolution (i.e., the largest genomic size of each monomer) among the human genome models in Fig. S14 A. The second group is the model of mouse neuron cells (Fig. S14 B), in which the input Hi-C library has higher depth of coverage than the Hi-C data used in the others (75). These results suggest that the quality of the resulting structures depends on the accuracy of Hi-C contact probabilities, which can be improved by lowering the model resolution and choosing ultradeep Hi-C. For a specific genomic region of interest, one may improve the model quality further by fine-tuning the value of χ.
As shown for different chromosomes, cell types, and species with a flexible choice of model resolution, one of the greatest advantages of the HLM is its versatile application. Although all of the output conformations exhibit great variability (see discussions in Figs. S7 and 5 F), the population-sampled contact map faithfully reproduces the input Hi-C data. For a given Hi-C data set, the two sets of model parameters can be determined in a few minutes using a personal computer without any manual intervention (Table 1).
In summary, we demonstrated that the HLM is a computationally efficient approach with which to investigate genome function. The conformational ensemble generated by the HLM shows that depending on the chromatin states, different types of chromatin domains have different compactness and shapes, and spatial phase separation between domains takes place in the human genome. The inter-cell-line comparison of human α-globin and SOX2 loci shows that although the submegabase gene domain becomes less compact upon gene activation, the most critical regulatory element comes closer to the gene, and that its expression is likely affected by many other elements. The activity of the Pax6 gene during mouse neuron development is mostly modulated by the proximity between the Pax6 promoter and the DE, whereas the distance to the upstream regulatory element shows nonmonotonic variations with its activity for the cell types we studied. The HLM was also used to visualize the cell-cycle dynamics of chromosome organization based on single-cell Hi-C. Although the HLM is not designed based on assumptions of molecular mechanisms of genome organization, the principle of transcription regulation can be inferred from the changes of chromatin conformations. With Hi-C data being accumulated, the HLM would be of great use to provide complementary structural information that is not easily accessible to current experiments.
Author Contributions
L.L., M.H.K., and C.H. designed and performed the research, analyzed the data, and wrote the manuscript.
Acknowledgments
We thank the Center for Advanced Computation in Korea Institute for Advanced Study for providing computing resources.
C.H. acknowledges partial support from the National Research Foundation of Korea (NRF-2018R1A2B3001690).
Editor: Wilma Olson.
Footnotes
Supporting Material can be found online at https://doi.org/10.1016/j.bpj.2019.06.032.
Supporting Citations
References (81, 82, 83, 84, 85, 86, 87) appear in the Supporting Material.
References
- 1. Dekker J., Rippe K., Kleckner N. Capturing chromosome conformation. Science. 2002;295:1306–1311. doi: 10.1126/science.1067799.
- 2. Lieberman-Aiden E., van Berkum N.L., Dekker J. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009;326:289–293. doi: 10.1126/science.1181369.
- 3. Rao S.S., Huntley M.H., Aiden E.L. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159:1665–1680. doi: 10.1016/j.cell.2014.11.021.
- 4. Nagano T., Lubling Y., Tanay A. Cell-cycle dynamics of chromosomal organization at single-cell resolution. Nature. 2017;547:61–67. doi: 10.1038/nature23001.
- 5. Du Z., Zheng H., Xie W. Allelic reprogramming of 3D chromatin architecture during early mammalian development. Nature. 2017;547:232–235. doi: 10.1038/nature23263.
- 6. Rivera C.M., Ren B. Mapping human epigenomes. Cell. 2013;155:39–55. doi: 10.1016/j.cell.2013.09.011.
- 7. Jin F., Li Y., Ren B. A high-resolution map of the three-dimensional chromatin interactome in human cells. Nature. 2013;503:290–294. doi: 10.1038/nature12644.
- 8. Leung D., Jung I., Ren B. Integrative analysis of haplotype-resolved epigenomes across human tissues. Nature. 2015;518:350–354. doi: 10.1038/nature14217.
- 9. Dryden N.H., Broome L.R., Fletcher O. Unbiased analysis of potential targets of breast cancer susceptibility loci by Capture Hi-C. Genome Res. 2014;24:1854–1868. doi: 10.1101/gr.175034.114.
- 10. Jäger R., Migliorini G., Houlston R.S. Capture Hi-C identifies the chromatin interactome of colorectal cancer risk loci. Nat. Commun. 2015;6:6178. doi: 10.1038/ncomms7178.
- 11. Baca S.C., Prandi D., Garraway L.A. Punctuated evolution of prostate cancer genomes. Cell. 2013;153:666–677. doi: 10.1016/j.cell.2013.03.021.
- 12. Barbieri M., Chotalia M., Nicodemi M. Complexity of chromatin folding is captured by the strings and binders switch model. Proc. Natl. Acad. Sci. USA. 2012;109:16173–16178. doi: 10.1073/pnas.1204799109.
- 13. Sanborn A.L., Rao S.S., Aiden E.L. Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomes. Proc. Natl. Acad. Sci. USA. 2015;112:E6456–E6465. doi: 10.1073/pnas.1518552112.
- 14. Fudenberg G., Imakaev M., Mirny L.A. Formation of chromosomal domains by loop extrusion. Cell Rep. 2016;15:2038–2049. doi: 10.1016/j.celrep.2016.04.085.
- 15. Bianco S., Lupiáñez D.G., Nicodemi M. Polymer physics predicts the effects of structural variants on chromatin architecture. Nat. Genet. 2018;50:662–667. doi: 10.1038/s41588-018-0098-8.
- 16. Jost D., Carrivain P., Vaillant C. Modeling epigenome folding: formation and dynamics of topologically associated chromatin domains. Nucleic Acids Res. 2014;42:9553–9561. doi: 10.1093/nar/gku698.
- 17. Brackley C.A., Brown J.M., Marenduzzo D. Predicting the three-dimensional folding of cis-regulatory regions in mammalian genomes using bioinformatic data and polymer models. Genome Biol. 2016;17:59. doi: 10.1186/s13059-016-0909-0.
- 18. Wang S., Xu J., Zeng J. Inferential modeling of 3D chromatin structure. Nucleic Acids Res. 2015;43:e54. doi: 10.1093/nar/gkv100.
- 19. Szalaj P., Michalski P.J., Plewczynski D. 3D-GNOME: an integrated web service for structural modeling of the 3D genome. Nucleic Acids Res. 2016;44:W288–W293. doi: 10.1093/nar/gkw437.
- 20. Tjong H., Li W., Alber F. Population-based 3D genome structure analysis reveals driving forces in spatial genome organization. Proc. Natl. Acad. Sci. USA. 2016;113:E1663–E1672. doi: 10.1073/pnas.1512577113.
- 21. Di Stefano M., Paulsen J., Micheletti C. Hi-C-constrained physical models of human chromosomes recover functionally-related properties of genome organization. Sci. Rep. 2016;6:35985. doi: 10.1038/srep35985.
- 22. Shi G., Liu L., Thirumalai D. Interphase human chromosome exhibits out of equilibrium glassy dynamics. Nat. Commun. 2018;9:3161. doi: 10.1038/s41467-018-05606-6.
- 23. Wang S., Su J.H., Zhuang X. Spatial organization of chromatin domains and compartments in single chromosomes. Science. 2016;353:598–602. doi: 10.1126/science.aaf8084.
- 24. Boettiger A.N., Bintu B., Zhuang X. Super-resolution imaging reveals distinct chromatin folding for different epigenetic states. Nature. 2016;529:418–422. doi: 10.1038/nature16496.
- 25. Münkel C., Langowski J. Chromosome structure predicted by a polymer model. Phys. Rev. E Stat. Phys. Plasmas Fluids Relat. Interdiscip. Topics. 1998;57:5888–5896.
- 26. Münkel C., Eils R., Langowski J. Compartmentalization of interphase chromosomes observed in simulation and experiment. J. Mol. Biol. 1999;285:1053–1065. doi: 10.1006/jmbi.1998.2361.
- 27. Bohn M., Heermann D.W., van Driel R. Random loop model for long polymers. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 2007;76:051805. doi: 10.1103/PhysRevE.76.051805.
- 28. Mateos-Langerak J., Bohn M., Goetze S. Spatially confined folding of chromatin in the interphase nucleus. Proc. Natl. Acad. Sci. USA. 2009;106:3812–3817. doi: 10.1073/pnas.0809501106.
- 29. Hofmann A., Heermann D.W. The role of loops on the order of eukaryotes and prokaryotes. FEBS Lett. 2015;589:2958–2965. doi: 10.1016/j.febslet.2015.04.021.
- 30. Fraser J., Ferrai C., Nicodemi M., FANTOM Consortium. Hierarchical folding and reorganization of chromosomes are linked to transcriptional changes in cellular differentiation. Mol. Syst. Biol. 2015;11:852. doi: 10.15252/msb.20156492.
- 31. Brackley C.A., Johnson J., Marenduzzo D. Simulated binding of transcription factors to active and inactive regions folds human chromosomes into loops, rosettes and topological domains. Nucleic Acids Res. 2016;44:3503–3512. doi: 10.1093/nar/gkw135.
- 32. Buckle A., Brackley C.A., Gilbert N. Polymer simulations of heteromorphic chromatin predict the 3D folding of complex genomic loci. Mol. Cell. 2018;72:786–797.e11. doi: 10.1016/j.molcel.2018.09.016.
- 33. Chiariello A.M., Annunziatella C., Nicodemi M. Polymer physics of chromosome large-scale 3D organisation. Sci. Rep. 2016;6:29775. doi: 10.1038/srep29775.
- 34. Di Pierro M., Zhang B., Onuchic J.N. Transferable model for chromosome architecture. Proc. Natl. Acad. Sci. USA. 2016;113:12168–12173. doi: 10.1073/pnas.1613607113.
- 35. Liu L., Shi G., Hyeon C. Chain organization of human interphase chromosome determines the spatiotemporal dynamics of chromatin loci. PLoS Comput. Biol. 2018;14:e1006617. doi: 10.1371/journal.pcbi.1006617.
- 36. Giorgetti L., Galupa R., Heard E. Predictive polymer modeling reveals coupled fluctuations in chromosome conformation and transcription. Cell. 2014;157:950–963. doi: 10.1016/j.cell.2014.03.025.
- 37. Gürsoy G., Xu Y., Liang J. Computational construction of 3D chromatin ensembles and prediction of functional interactions of alpha-globin locus from 5C data. Nucleic Acids Res. 2017;45:11547–11558. doi: 10.1093/nar/gkx784.
- 38. Zhu G., Deng W., Zeng J. Reconstructing spatial organizations of chromosomes through manifold learning. Nucleic Acids Res. 2018;46:e50. doi: 10.1093/nar/gky065.
- 39. Li Q., Tjong H., Alber F. The three-dimensional genome organization of Drosophila melanogaster through data integration. Genome Biol. 2017;18:145. doi: 10.1186/s13059-017-1264-5.
- 40. Serra F., Di Stefano M., Marti-Renom M.A. Restraint-based three-dimensional modeling of genomes and genomic domains. FEBS Lett. 2015;589:2987–2995. doi: 10.1016/j.febslet.2015.05.012.
- 41. Goldbart P.M., Zippelius A. Amorphous solid state of vulcanized macromolecules: a variational approach. Phys. Rev. Lett. 1993;71:2256–2259. doi: 10.1103/PhysRevLett.71.2256.
- 42. Solf M.P., Vilgis T.A. Statistical mechanics of macromolecular networks without replicas. J. Phys. Math. Gen. 1995;28:6655.
- 43. Bryngelson J.D., Thirumalai D. Internal constraints induce localization in an isolated polymer molecule. Phys. Rev. Lett. 1996;76:542–545. doi: 10.1103/PhysRevLett.76.542.
- 44. Zwanzig R. Effect of close contacts on the radius of gyration of a polymer. J. Chem. Phys. 1997;106:2824–2827.
- 45. Cacciuto A., Luijten E. Self-avoiding flexible polymers under spherical confinement. Nano Lett. 2006;6:901–905. doi: 10.1021/nl052351n.
- 46. Kang H., Yoon Y.G., Hyeon C. Confinement-induced glassy dynamics in a model for chromosome organization. Phys. Rev. Lett. 2015;115:198102. doi: 10.1103/PhysRevLett.115.198102.
- 47. Gürsoy G., Xu Y., Liang J. Spatial confinement is a major determinant of the folding landscape of human chromosomes. Nucleic Acids Res. 2014;42:8223–8230. doi: 10.1093/nar/gku462.
- 48. Zhang Y., Heermann D.W. Loops determine the mechanical properties of mitotic chromosomes. PLoS One. 2011;6:e29225. doi: 10.1371/journal.pone.0029225.
- 49. Li G., Ruan X., Ruan Y. Extensive promoter-centered chromatin interactions provide a topological basis for transcription regulation. Cell. 2012;148:84–98. doi: 10.1016/j.cell.2011.12.014.
- 50. Tang Z., Luo O.J., Ruan Y. CTCF-mediated human 3D genome architecture reveals chromatin topology for transcription. Cell. 2015;163:1611–1627. doi: 10.1016/j.cell.2015.11.024.
- 51. Baù D., Sanyal A., Marti-Renom M.A. The three-dimensional folding of the α-globin gene domain reveals formation of chromatin globules. Nat. Struct. Mol. Biol. 2011;18:107–114. doi: 10.1038/nsmb.1936.
- 52. Peng C., Fu L.Y., Zhang H.Y. The sequencing bias relaxed characteristics of Hi-C derived data and implications for chromatin 3D modeling. Nucleic Acids Res. 2013;41:e183. doi: 10.1093/nar/gkt745.
- 53. Meluzzi D., Arya G. Efficient estimation of contact probabilities from inter-bead distance distributions in simulated polymer chains. J. Phys. Condens. Matter. 2015;27:064120. doi: 10.1088/0953-8984/27/6/064120.
- 54. Fudenberg G., Imakaev M. FISH-ing for captured contacts: towards reconciling FISH and 3C. Nat. Methods. 2017;14:673–678. doi: 10.1038/nmeth.4329.
- 55. de Gennes P.G. Scaling Concepts in Polymer Physics. Cornell University Press; Ithaca and London: 1979.
- 56. Aronovitz J.A., Nelson D.R. Universal features of polymer shapes. J. Phys. (Paris). 1986;47:1445–1456.
- 57. Hyeon C., Dima R.I., Thirumalai D. Size, shape, and flexibility of RNA structures. J. Chem. Phys. 2006;125:194905. doi: 10.1063/1.2364190.
- 58. Rycroft C.H. VORO++: a three-dimensional voronoi cell library in C++. Chaos. 2009;19:041111. doi: 10.1063/1.3215722.
- 59. Friedman B., O’Shaughnessy B. Short time behavior and universal relations in polymer cyclization. J. Phys. II. 1991;1:471–486.
- 60. Hyeon C., Thirumalai D. Kinetics of interior loop formation in semiflexible chains. J. Chem. Phys. 2006;124:104905. doi: 10.1063/1.2178805.
- 61. Toan N.M., Morrison G., Thirumalai D. Kinetics of loop formation in polymer chains. J. Phys. Chem. B. 2008;112:6094–6106. doi: 10.1021/jp076510y.
- 62. Flory P.J. The configuration of real polymer chains. J. Chem. Phys. 1949;17:303–310.
- 63. Grosberg A.Y., Khokhlov A.R. Statistical Physics of Macromolecules. AIP Press; New York: 1994.
- 64. Halverson J.D., Smrek J., Grosberg A.Y. From a melt of rings to chromosome territories: the role of topological constraints in genome folding. Rep. Prog. Phys. 2014;77:022601. doi: 10.1088/0034-4885/77/2/022601.
- 65. Liu L., Hyeon C. Contact statistics highlight distinct organizing principles of proteins and RNA. Biophys. J. 2016;110:2320–2327. doi: 10.1016/j.bpj.2016.04.020.
- 66. Goetze S., Mateos-Langerak J., van Driel R. The three-dimensional structure of human interphase chromosomes is related to the transcriptome map. Mol. Cell. Biol. 2007;27:4475–4487. doi: 10.1128/MCB.00208-07.
- 67. Levine M., Cattoglio C., Tjian R. Looping back to leap forward: transcription enters a new era. Cell. 2014;157:13–25. doi: 10.1016/j.cell.2014.02.009.
- 68. Mortazavi A., Williams B.A., Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods. 2008;5:621–628. doi: 10.1038/nmeth.1226.
- 69. Trapnell C., Williams B.A., Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 2010;28:511–515. doi: 10.1038/nbt.1621.
- 70. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247.
- 71. Vernimmen D., Marques-Kranc F., Higgs D.R. Chromosome looping at the human α-globin locus is mediated via the major upstream regulatory element (HS -40). Blood. 2009;114:4253–4260. doi: 10.1182/blood-2009-03-213439.
- 72. Kleinjan D.A., Seawright A., van Heyningen V. Aniridia-associated translocations, DNase hypersensitivity, sequence comparison and transgenic analysis redefine the functional domain of PAX6. Hum. Mol. Genet. 2001;10:2049–2059. doi: 10.1093/hmg/10.19.2049.
- 73. Bhatia S., Bengani H., Kleinjan D.A. Disruption of autoregulatory feedback by a mutation in a remote, ultraconserved PAX6 enhancer causes aniridia. Am. J. Hum. Genet. 2013;93:1126–1134. doi: 10.1016/j.ajhg.2013.10.028.
- 74. Buckle A., Nozawa R.S., Gilbert N. Functional characteristics of novel pancreatic Pax6 regulatory elements. Hum. Mol. Genet. 2018;27:3434–3448. doi: 10.1093/hmg/ddy255.
- 75. Bonev B., Mendelson Cohen N., Cavalli G. Multiscale 3D genome rewiring during mouse neural development. Cell. 2017;171:557–572.e24. doi: 10.1016/j.cell.2017.09.043.
- 76. Fritsche M., Li S., Wiggins P.A. A model for Escherichia coli chromosome packaging supports transcription factor-induced DNA domain formation. Nucleic Acids Res. 2012;40:972–980. doi: 10.1093/nar/gkr779.
- 77. Di Stefano M., Rosa A., Micheletti C. Colocalization of coregulated genes: a steered molecular dynamics study of human chromosome 19. PLoS Comput. Biol. 2013;9:e1003019. doi: 10.1371/journal.pcbi.1003019.
- 78. Finn E.H., Pegoraro G., Misteli T. Comparative analysis of 2D and 3D distance measurements to study spatial genome organization. Methods. 2017;123:47–55. doi: 10.1016/j.ymeth.2017.01.007.
- 79. Szabo Q., Jost D., Cavalli G. TADs are 3D structural units of higher-order chromosome organization in Drosophila. Sci. Adv. 2018;4:eaar8082. doi: 10.1126/sciadv.aar8082.
- 80. Thirumalai D., Hyeon C. RNA and protein folding: common themes and variations. Biochemistry. 2005;44:4957–4970. doi: 10.1021/bi047314+.
- 81. Knight P.A., Ruiz D. A fast algorithm for matrix balancing. IMA J. Numer. Anal. 2013;33:1029.
- 82. Veitshans T., Klimov D., Thirumalai D. Protein folding kinetics: timescales, pathways and energy landscapes in terms of sequence-dependent properties. Fold. Des. 1997;2:1–22. doi: 10.1016/S1359-0278(97)00002-3.
- 83. Limbach H.J., Arnold A., Holm C. ESPResSo – an extensible simulation package for research on soft matter systems. Comput. Phys. Commun. 2006;174:704–727.
- 84. Yang T., Zhang F., Li Q. HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient. Genome Res. 2017;27:1939–1949. doi: 10.1101/gr.220640.117.
- 85. Ester M., Kriegel H.-P., Xu X. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Simoudis E., Han J., Fayyad U., editors. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining. The AAAI Press; 1996. pp. 226–231.
- 86. Pedregosa F., Varoquaux G., Duchesnay E. Scikit-learn: machine learning in python. J. Mach. Learn. Res. 2011;12:2825–2830.
- 87. Kuhn R.M., Haussler D., Kent W.J. The UCSC genome browser and associated tools. Brief. Bioinform. 2013;14:144–161. doi: 10.1093/bib/bbs038.
- 88. Dixon J.R., Selvaraj S., Ren B. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012;485:376–380. doi: 10.1038/nature11082.
- 89.Ernst J., Kheradpour P., Bernstein B.E. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. 2011;473:43–49. doi: 10.1038/nature09906. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 90.Pruitt K.D., Brown G.R., Ostell J.M. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 2014;42:D756–D763. doi: 10.1093/nar/gkt1114. [DOI] [PMC free article] [PubMed] [Google Scholar]