Author manuscript; available in PMC: 2021 Mar 1.
Published in final edited form as: IEEE/ACM Trans Comput Biol Bioinform. 2018 Aug 15;17(2):647–656. doi: 10.1109/TCBB.2018.2865349

Inferring Spatial Organization of Individual Topologically Associated Domains via Piecewise Helical Model

Zhang Rongrong 1, Ming Hu 2, Yu Zhu 3, Zhaohui Qin 4, Ke Deng 5, Jun S Liu 6
PMCID: PMC7202374  NIHMSID: NIHMS1582098  PMID: 30113897

Abstract

The recently developed Hi-C technology enables a genome-wide view of chromosome spatial organizations, and has provided deep insights into genome structure and genome function. However, multiple sources of uncertainty make downstream data analysis and interpretation challenging. Specifically, statistical models for inferring three-dimensional (3D) chromosomal structure from Hi-C data are still far from mature. Most existing methods are highly over-parameterized, lack clear interpretations, and are sensitive to outliers. In this study, we propose a parsimonious, easy to interpret, and robust piecewise helical model for the inference of the 3D chromosomal structure of individual topologically associated domains from Hi-C data. When applied to a real Hi-C dataset, the piecewise helical model not only achieves much better model fitting than existing models, but also reveals that geometric properties of chromatin spatial organization are closely related to genome function.

Index Terms: chromosome spatial organizations, Bayesian model, piecewise helical model, Hi-C data, MCMC

1. INTRODUCTION

Spatial organizations of chromosomes form the three-dimensional (3D) structural basis of gene expression regulation, DNA replication, and DNA repair [1]. Understanding how chromatin folds and its functional implications has attracted wide attention from the scientific community for many decades. Since the early 1980s, scientists have been using microscopy-based methods, such as fluorescence in situ hybridization (FISH) [2], to study the relative positioning of genomic loci in each cell. Although very successful, these microscopy-based methods are limited by their low throughput. To achieve higher throughput, Dekker et al. developed the revolutionary chromosome conformation capture (3C) technology, which simultaneously interrogates the entire cell population [3]. Lieberman-Aiden et al. further coupled 3C with next generation sequencing technologies, named the resulting method Hi-C, and obtained the first genome-wide view of chromosome spatial organizations [4]. Recently, Hi-C and related technologies, such as ChIA-PET [5], TCC [6], and single-cell Hi-C [7], have been widely used and have provided deep insights into genome structure and genome function [8, 9].

In Hi-C experiments, two chromatin regions in close spatial proximity are first cross-linked and then cut into fragments by a restriction enzyme. The ends of two nearby fragments are fused together to form a ligation product. Each ligation product is further sheared into short pieces, and sequenced from both ends by next generation sequencing technologies. Following sequencing, the resulting pair-end reads are aligned to the reference genome. Hi-C experiments measure chromatin interactions in millions of cells simultaneously. A higher number of aligned pair-end reads indicates more frequent chromatin interactions and closer spatial proximity.

A fundamental question in Hi-C data analysis is to study spatial organizations of chromosomes. Biophysicists have proposed several polymer models to study biophysical principles governing chromatin folding [10–15]. The basic idea of polymer models is that chromatin interaction energy, derived from the underlying biophysical principles, determines the specific conformation of chromosome folding. However, these polymer models cannot fully accommodate and explain the inherent randomness in Hi-C data. To fill in this gap, data-driven statistical approaches have been developed to model uncertainties of Hi-C data and study chromatin dynamics in the cell population [16, 17].

Despite their different perspectives, both polymer models and statistical models characterize spatial organizations of chromosomes via a similar beads-on-a-string representation: they first partition a genomic region of interest into non-overlapping, consecutive bins (beads) based on their genomic order (string), and then estimate the Euclidean coordinates of each bin (bead). Without any specific geometric assumptions, such a beads-on-a-string representation is highly over-parameterized and sensitive to outliers. It is also difficult to take full advantage of the knowledge of biophysical constraints on the resulting structure.

To overcome the limitations of the beads-on-a-string representation, we propose a parsimonious, easy to interpret, and robust piecewise helical model for inferring spatial organizations of individual topologically associated domains (TADs) from Hi-C data. Similar helix models have been successfully applied to studies of protein folding [18, 19] and gene expression regulation [20]. To the best of our knowledge, such modeling strategies have not been employed in Hi-C data analysis yet.

The motivation for the proposed piecewise helical model comes from two perspectives. First of all, existing methods, such as BACH [17], ChromSDE [15] and pastis [21], have shown that although large scale chromatin folding is highly dynamic, local scale chromatin folding, especially within topologically associated domains (TADs) [22], can be very stable. Therefore, we assume that chromatin within a TAD exhibits a consensus spatial organization among the cell population. Notably, two recent Hi-C studies [23, 24] have shown that the paternal allele and maternal allele exhibit highly similar chromatin organization features for autosomes. Therefore, in this paper, we assume that two homologous autosomal alleles share the same 3D consensus structure, and do not account for allele-specific chromatin spatial organizations. Additionally, it is known from geometry that any 3D curve can be uniquely determined by its local curvature and torsion. As a special case, constant curvature and constant torsion lead to a helical curve. Since any continuous function can be approximated by a piecewise constant function, the curvature and torsion of an arbitrary 3D curve can be approximated in the same way. Therefore, any continuous 3D curve can be approximated by several well-connected helixes, which we refer to as a piecewise helical curve. From these two motivations, we propose a piecewise helical model to characterize the consensus 3D chromosomal structure within TADs.

We have performed exploratory analysis on a real Hi-C dataset [25] using the existing method BACH, and the results confirmed our motivation for using a piecewise helical curve to locally characterize the 3D chromatin folding structure. Specifically, Figure 4 in Section 4 of this paper shows a BACH-predicted 3D structure of a 1Mb TAD (mouse chromosome 18, 33,960,001 – 34,960,000) at 40kb resolution, which can be well approximated by a piecewise helical curve. More importantly, compared to the beads-on-a-string model, our proposed piecewise helical model only involves curvatures and torsions as primary unknown parameters, resulting in a highly efficient computational algorithm with straightforward geometric interpretations. In addition, the genomic distance between any two loci can be easily incorporated in a piecewise helical model as the arc length between them to protect against potential outliers, leading to a robust 3D structural model for chromatin folding.

Figure 4.

BACH-predicted 3D chromosomal structures under the beads-on-a-string representation. 3D model of a topological domain in mouse chromosome 18, 33,960,001 – 34,960,000. Each bead represents one 40kb bin.

2. METHODS

2.1. Piecewise helical curve representation

Let r represent a 3D curve, and r(s) = (r1(s), r2(s), r3(s)) represent the Euclidean coordinates of a point on the curve, where s ∈ [0, S] is the arc length parameter such that |r′(s)| = 1.

From curve geometry, the Frenet frame [26], as a local coordinate system, determines the dynamics and geometry of a curve. The Frenet frame of r consists of the tangent vector t(s) = (t1(s), t2(s), t3(s)), the normal vector n(s) = (n1(s), n2(s), n3(s)) and the binormal vector b(s) = (b1(s), b2(s), b3(s)), and these three vectors are orthonormal. At a given s, the collection of t(s), n(s), and b(s), denoted as {t(s), n(s), b(s)}, is referred to as the orientation of the curve at s. Furthermore, together with r(s), the vectors satisfy a system of differential equations called the Frenet equations, given as follows:

$$
\begin{aligned}
r'(s) &= t(s),\\
t'(s) &= \kappa(s)\,n(s),\\
n'(s) &= -\kappa(s)\,t(s) + \tau(s)\,b(s),\\
b'(s) &= -\tau(s)\,n(s).
\end{aligned}
$$

Here (·)′ denotes differentiation with respect to the arc length parameter s. κ(s) is the curvature of the curve at s, which represents how fast the curve deviates from a straight line at s, and τ(s) is the torsion of the curve at s, which represents how fast it twists out of a flat plane at s. As noted previously, the geometry of the curve is controlled by the Frenet frame, the curvature and the torsion. Once the curvature function κ(s), the torsion function τ(s), the starting point r(0), and the orientation {t(0), n(0), b(0)} at the starting point are given, the entire curve can be constructed by solving the Frenet equations.

When both κ(s) and τ(s) are constants, the resulting curve is a helix. Helixes are arguably the simplest 3D curves. They, however, can give rise to more complex curves. One way to construct more complex 3D curves from helixes is to connect the latter piece by piece, each of which may have different constant curvatures and torsions. We denote a curve constructed in such way as a piecewise helical curve.

Figure 1 shows an example of a piecewise helical curve consisting of two helixes. The first helix starts at r(0) and its orientation at s = 0 is given by {t(0), n(0), b(0)}. The curvature and torsion of the first helix are 0.5 and 0.3, respectively. The ending point of the first helix is r(c1) and the orientation at the ending point is {t(c1), n(c1), b(c1)}. The starting point of the second helix is r(c1). Therefore, the two helixes are connected continuously. The initial orientation of the second helix at r(c1) is {t(1), n(1), b(1)}. If it is identical to {t(c1), n(c1), b(c1)}, then the two helixes are said to be smoothly connected. Otherwise, they are said to be flexibly connected. Note that the two helixes in Figure 1 are flexibly connected. The curvature of the second helix is 0.5 while the torsion is 0.2. The ending point is r(c2). In summary, a piecewise helical curve of two helixes is uniquely determined by the starting point r(0), the location c1 where the two helixes are connected, the location c2 where the second helix ends, a list of pairs giving the curvatures and torsions, {(κ(1), τ(1)), (κ(2), τ(2))}, and a list of triples giving the orientations, {(t(0), n(0), b(0)), (t(1), n(1), b(1))}. The curve can be constructed by solving the Frenet equations.

Figure 1.

A piecewise helical curve consisting of two helices.

We use the notations from the previous work [18] and define a 12-dimensional vector:

$$
Y = (t_1, n_1, b_1, t_2, n_2, b_2, t_3, n_3, b_3, r_1, r_2, r_3)^{\top},
$$

whose entries consist of the nine components of the three basis vectors in the Frenet frame and the three coordinates of the point on the curve. Then the Frenet equations can be summarized in the following matrix form:

$$
Y' = M(s)\,Y,
$$

where

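$$
M = \begin{bmatrix} F & 0 & 0 & 0\\ 0 & F & 0 & 0\\ 0 & 0 & F & 0\\ V_1 & V_2 & V_3 & 0 \end{bmatrix},\quad
F = \begin{bmatrix} 0 & \kappa(s) & 0\\ -\kappa(s) & 0 & \tau(s)\\ 0 & -\tau(s) & 0 \end{bmatrix},
$$

$$
V_1 = \begin{bmatrix} 1 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{bmatrix},\quad
V_2 = \begin{bmatrix} 0 & 0 & 0\\ 1 & 0 & 0\\ 0 & 0 & 0 \end{bmatrix},\quad
V_3 = \begin{bmatrix} 0 & 0 & 0\\ 0 & 0 & 0\\ 1 & 0 & 0 \end{bmatrix}.
$$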

For the first helix with constant curvature κ(1) and torsion τ(1) in Figure 1, the matrix M is a constant matrix, and the Frenet equations become a system of homogeneous first order linear differential equations with constant coefficients. Given the initial value Y(0), the solution [18] to the Frenet equations is given by:

$$
Y^{(1)}(s) = A(\kappa^{(1)}, \tau^{(1)}, s)\,Y(0), \quad 0 \le s \le c_1,
$$

where

$$
Y(0) = \big(t_1(0), n_1(0), b_1(0), t_2(0), n_2(0), b_2(0), t_3(0), n_3(0), b_3(0), r_1(0), r_2(0), r_3(0)\big)^{\top},
$$

and

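$$
A(\kappa,\tau,s) = \begin{bmatrix} a & 0 & 0 & 0\\ 0 & a & 0 & 0\\ 0 & 0 & a & 0\\ b_1 & b_2 & b_3 & I_3 \end{bmatrix},
$$

$$
a = \begin{bmatrix}
\dfrac{\tau^2+\kappa^2\cos(\alpha s)}{\alpha^2} & \dfrac{\kappa\sin(\alpha s)}{\alpha} & \dfrac{\kappa\tau\,(1-\cos(\alpha s))}{\alpha^2}\\[2ex]
-\dfrac{\kappa\sin(\alpha s)}{\alpha} & \cos(\alpha s) & \dfrac{\tau\sin(\alpha s)}{\alpha}\\[2ex]
\dfrac{\kappa\tau\,(1-\cos(\alpha s))}{\alpha^2} & -\dfrac{\tau\sin(\alpha s)}{\alpha} & \dfrac{\kappa^2+\tau^2\cos(\alpha s)}{\alpha^2}
\end{bmatrix},
$$

$$
b_1 = \begin{bmatrix}
\dfrac{\alpha s\,\tau^2+\kappa^2\sin(\alpha s)}{\alpha^3} & \dfrac{\kappa\,(1-\cos(\alpha s))}{\alpha^2} & \dfrac{\kappa\tau\,(\alpha s-\sin(\alpha s))}{\alpha^3}\\
0 & 0 & 0\\
0 & 0 & 0
\end{bmatrix},
$$

$$
b_2 = \begin{bmatrix}
0 & 0 & 0\\
\dfrac{\alpha s\,\tau^2+\kappa^2\sin(\alpha s)}{\alpha^3} & \dfrac{\kappa\,(1-\cos(\alpha s))}{\alpha^2} & \dfrac{\kappa\tau\,(\alpha s-\sin(\alpha s))}{\alpha^3}\\
0 & 0 & 0
\end{bmatrix},
$$

$$
b_3 = \begin{bmatrix}
0 & 0 & 0\\
0 & 0 & 0\\
\dfrac{\alpha s\,\tau^2+\kappa^2\sin(\alpha s)}{\alpha^3} & \dfrac{\kappa\,(1-\cos(\alpha s))}{\alpha^2} & \dfrac{\kappa\tau\,(\alpha s-\sin(\alpha s))}{\alpha^3}
\end{bmatrix},
$$

$$
\alpha = \sqrt{\kappa^2+\tau^2},
$$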

and I3 is the 3 × 3 identity matrix.
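As a sanity check on the closed-form solution, the blocks of A(κ, τ, s) can be coded directly. The sketch below is illustrative only and is not the authors' implementation; the function names `helix_blocks` and `matmul` are hypothetical. It builds the 3 × 3 block a and the nonzero row shared by b1, b2 and b3, so that one can confirm numerically that a is a rotation matrix and that a(s1 + s2) = a(s2)a(s1) for fixed (κ, τ).

```python
import math

def helix_blocks(kappa, tau, s):
    """Return (a, row): the 3x3 block a of A(kappa, tau, s) and the
    nonzero row that appears in b1, b2 and b3."""
    alpha = math.sqrt(kappa**2 + tau**2)
    c, sn = math.cos(alpha * s), math.sin(alpha * s)
    a = [
        [(tau**2 + kappa**2 * c) / alpha**2, kappa * sn / alpha, kappa * tau * (1 - c) / alpha**2],
        [-kappa * sn / alpha, c, tau * sn / alpha],
        [kappa * tau * (1 - c) / alpha**2, -tau * sn / alpha, (kappa**2 + tau**2 * c) / alpha**2],
    ]
    row = [
        (alpha * s * tau**2 + kappa**2 * sn) / alpha**3,
        kappa * (1 - c) / alpha**2,
        kappa * tau * (alpha * s - sn) / alpha**3,
    ]
    return a, row

def matmul(x, y):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

# Curvature 0.5 and torsion 0.3 are the values quoted for the first helix in Figure 1.
a_first, row_first = helix_blocks(0.5, 0.3, 1.2)
a_second, _ = helix_blocks(0.5, 0.3, 0.8)
a_total, _ = helix_blocks(0.5, 0.3, 2.0)
composed = matmul(a_second, a_first)  # should agree with a_total
```

The row returned alongside a is the integral of the first row of a over [0, s], which is why it carries the tangent direction into the position coordinates.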

Y(1)(c1) explicitly gives the orientation at the ending point, and the coordinates of the ending point of the first helix are r(c1) = (r1(1), r2(1), r3(1)).

If the starting point r(0) is the origin, i.e. r(0) = (0,0,0), and the orientation at 0 is given by the standard basis of ℝ³, i.e. {t = (1,0,0), n = (0,1,0), b = (0,0,1)}, the parametric expression of the first helix can be simplified, up to a rigid motion that leaves all pairwise distances unchanged, as follows:

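$$
r(s) = \begin{pmatrix} r_1(s)\\ r_2(s)\\ r_3(s) \end{pmatrix}
= \begin{pmatrix} \dfrac{\kappa}{\alpha^2}\cos(\alpha s)\\[1.5ex] \dfrac{\kappa}{\alpha^2}\sin(\alpha s)\\[1.5ex] \dfrac{\tau}{\alpha}\,s \end{pmatrix},
\quad \alpha = \sqrt{\kappa^2+\tau^2}, \quad 0 \le s \le c_1.
$$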
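The simplified helix can be checked numerically against the properties it is meant to have. The following sketch is an illustration, not part of the original method; `helix_point`, `speed` and `curvature` are hypothetical helper names, and the step h is an arbitrary choice. It uses central finite differences to confirm that the curve is traversed at unit speed (so s is the arc length) and that the norm of its second derivative equals κ.

```python
import math

def helix_point(kappa, tau, s):
    """Point on the helix with constant curvature kappa and torsion tau at arc length s."""
    alpha = math.sqrt(kappa**2 + tau**2)
    return (kappa / alpha**2 * math.cos(alpha * s),
            kappa / alpha**2 * math.sin(alpha * s),
            tau / alpha * s)

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def speed(kappa, tau, s, h=1e-4):
    """|r'(s)| estimated by a central difference; expected to be close to 1."""
    p, m = helix_point(kappa, tau, s + h), helix_point(kappa, tau, s - h)
    return norm([(pi - mi) / (2 * h) for pi, mi in zip(p, m)])

def curvature(kappa, tau, s, h=1e-4):
    """|r''(s)| estimated by a second difference; expected to be close to kappa."""
    p, c, m = (helix_point(kappa, tau, s + d) for d in (h, 0.0, -h))
    return norm([(pi - 2 * ci + mi) / h**2 for pi, ci, mi in zip(p, c, m)])

# Values quoted for the first helix in Figure 1.
print(speed(0.5, 0.3, 2.0), curvature(0.5, 0.3, 2.0))
```

Because the first two coordinates trace a circle of radius κ/α², one full turn corresponds to an arc length of 2π/α.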

As discussed previously, to construct the second helix, r(c1) serves as its starting point, and the orientation at the starting point is {t(1), n(1), b(1)}. The solution of the Frenet equations for the second helix is given by:

$$
Y^{(2)}(s) = A(\kappa^{(2)}, \tau^{(2)}, s - c_1)\,Y(c_1), \quad c_1 < s \le c_2,
$$

where

$$
Y(c_1) = \big(t_1^{(1)}, n_1^{(1)}, b_1^{(1)}, t_2^{(1)}, n_2^{(1)}, b_2^{(1)}, t_3^{(1)}, n_3^{(1)}, b_3^{(1)}, r_1^{(1)}, r_2^{(1)}, r_3^{(1)}\big)^{\top}.
$$

The last three components of the vector Y(2)(c2) give the coordinates of the ending point r(c2).

It is straightforward to extend the piecewise helical curve with two helixes to one with H helixes. Suppose the latter is defined for s ∈ [0, S], r(c0) = r(0) is the starting point, c1, c2, …, cH−1 are the H − 1 locations where the H helixes connect to each other consecutively, and r(cH) = r(S) is the ending point.

The j-th helix is defined from cj−1 to cj, with starting point r(j−1) = r(cj−1), curvature κ(j) and torsion τ(j), and orientation {t(j−1), n(j−1), b(j−1)} at r(j−1). By recursively applying the procedure used for two helixes, the solution to the Frenet equations of the j-th helix is given by:

$$
Y^{(j)}(s) = A(\kappa^{(j)}, \tau^{(j)}, s - c_{j-1})\,Y(c_{j-1}), \quad c_{j-1} < s \le c_j,
$$

where

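$$
Y(c_i) = \big(t_1^{(i)}, n_1^{(i)}, b_1^{(i)}, t_2^{(i)}, n_2^{(i)}, b_2^{(i)}, t_3^{(i)}, n_3^{(i)}, b_3^{(i)}, r_1^{(i)}, r_2^{(i)}, r_3^{(i)}\big)^{\top},
\quad i = j-1, \quad j = 1, 2, \ldots, H.
$$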

If all the helixes are smoothly connected, the recursive solution to the Frenet equations could be simplified as

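$$
Y^{(j)}(s) = A(\kappa^{(j)}, \tau^{(j)}, s - c_{j-1}) \times \left[\prod_{k=1}^{j-1} A(\kappa^{(k)}, \tau^{(k)}, c_k - c_{k-1})\right] Y(0),
\quad c_{j-1} < s \le c_j, \quad j \ge 2.
$$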
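The recursion for smoothly connected helixes can be sketched as a small routine that carries the frame and the position from one helix to the next. This is a minimal illustration under stated assumptions, not the authors' code: the names `step` and `piecewise_helix` are hypothetical, the curve starts at the origin with the standard basis as its orientation, and only smooth connections are handled (a flexible connection would replace the frame at each junction with its own orientation parameters).

```python
import math

def step(kappa, tau, ds, frame, r):
    """Advance by arc length ds along a helix with curvature kappa and torsion tau.
    frame[m] = [t_m, n_m, b_m] holds the m-th coordinates of t, n, b; r is the position."""
    alpha = math.sqrt(kappa**2 + tau**2)
    c, sn = math.cos(alpha * ds), math.sin(alpha * ds)
    a = [
        [(tau**2 + kappa**2 * c) / alpha**2, kappa * sn / alpha, kappa * tau * (1 - c) / alpha**2],
        [-kappa * sn / alpha, c, tau * sn / alpha],
        [kappa * tau * (1 - c) / alpha**2, -tau * sn / alpha, (kappa**2 + tau**2 * c) / alpha**2],
    ]
    row = [(alpha * ds * tau**2 + kappa**2 * sn) / alpha**3,
           kappa * (1 - c) / alpha**2,
           kappa * tau * (alpha * ds - sn) / alpha**3]
    new_frame = [[sum(a[i][k] * f[k] for k in range(3)) for i in range(3)] for f in frame]
    new_r = [r[m] + sum(row[k] * frame[m][k] for k in range(3)) for m in range(3)]
    return new_frame, new_r

def piecewise_helix(params, breaks, s_values):
    """Positions r(s) for ascending s_values on a smoothly connected piecewise helical curve.
    params = [(kappa_1, tau_1), ...]; breaks = [c_0 = 0, c_1, ..., c_H]."""
    frame = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # t, n, b = standard basis
    r = [0.0, 0.0, 0.0]
    points = []
    for j, (kappa, tau) in enumerate(params):
        lo, hi = breaks[j], breaks[j + 1]
        for s in s_values:
            if lo < s <= hi or (j == 0 and s == lo):
                points.append(step(kappa, tau, s - lo, frame, r)[1])
        frame, r = step(kappa, tau, hi - lo, frame, r)  # state at the junction c_j
    return points

# Curvatures and torsions from Figure 1; the break points 5 and 10 are made-up values.
pts = piecewise_helix([(0.5, 0.3), (0.5, 0.2)], [0.0, 5.0, 10.0], [0.0, 2.5, 5.0, 7.5, 10.0])
```

Each junction state is obtained by applying the newest helix's propagator to the state accumulated so far, so the factors of the product act in order of increasing k, with the latest helix applied last.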

Since any continuous 1D function can be approximated by a piecewise constant function, the curvature and torsion functions of any 3D curve can be approximated in the same way. As discussed above, a curve with piecewise constant curvature and torsion functions is a piecewise helical curve. Hence the piecewise helical curve representation is flexible enough to model any 3D curve.

Compared with the over-parameterized beads-on-a-string representation, the piecewise helical curve representation is much simpler. It is a parsimonious representation with a finite number of parameters, which can be much smaller than the total number of coordinates of loci in the TAD of interest. Moreover, the piecewise helical curve representation is mathematically tractable. The spatial distance between any two genomic loci can be expressed recursively as a function of a series of curvatures, torsions and orientation vectors, and all these parameters have clear geometric interpretations. The proposed piecewise helical representation is a model for a 3D curve, and does not need additional spatial constraints. On the other hand, the beads-on-a-string model, used by existing methods such as ChromSDE and pastis, does not impose any spatial constraints and relies entirely on the spatial pattern present in Hi-C data, making the model unstable and sensitive to outliers. Therefore, the piecewise helical model is preferable to the beads-on-a-string model in terms of model robustness.

2.2. Piecewise helical model for contact frequencies

In this work, we are interested in modeling the consensus 3D structure at the topological domain scale based on Hi-C data. Using the piecewise helical curve representation, we propose a statistical model to link Hi-C data to pair-wise spatial distances between genomic loci.

We use the piecewise helical curve with H helixes to approximate the 3D structure of the TAD of interest. Suppose the total number of loci in the TAD is N, and their corresponding positions on the curve are r(s1), …, r(sN). Then, the Euclidean distance between any two loci i and j, denoted as dij = d(si, sj), is given by:

$$
d_{ij} = \sqrt{\big(r_1(s_i) - r_1(s_j)\big)^2 + \big(r_2(s_i) - r_2(s_j)\big)^2 + \big(r_3(s_i) - r_3(s_j)\big)^2}.
$$

Let Yij represent the number of pair-end reads spanning the two genomic loci i and j, i, j ∈ {1, …, N}. To account for the over-dispersion in Hi-C data (see Section 4), we propose the following negative binomial model for Yij:

$$
Y_{ij} \sim \mathrm{NB}(\theta_{ij}, \phi).
$$

In this model, Yij has mean θij, variance θij + θij²/ϕ, and dispersion parameter ϕ. Furthermore, we model the relationship between log θij, log dij and local genomic features as follows:

$$
\log \theta_{ij} = \log e_{ij} + \beta_0 + \beta_1 \log d_{ij},
$$

where

$$
e_{ij} = \exp\big(\beta_{enz} \log(E_i E_j) + \beta_{gcc} \log(G_i G_j) + \beta_{map} \log(M_i M_j)\big).
$$

In this model, Ei and Ej, Gi and Gj, and Mi and Mj are the restriction enzyme cutting frequencies, GC contents, and sequence mappabilities of loci i and j, respectively, and βenz, βgcc, βmap, β0 and β1 are unknown coefficients. β0 represents the overall sequencing depth, and β1 represents the link coefficient between spatial distance and chromatin interaction frequency. Note that β1 is expected to be negative.

We assume that all Yij are independent. The joint likelihood is given as follows:

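$$
L\big(\{Y_{ij}\}_{1 \le i,j \le N} \mid \{d_{ij}\}_{1 \le i,j \le N}, \phi\big)
= \prod_{1 \le i < j \le N} \left(\frac{\phi}{\phi + \theta_{ij}}\right)^{\phi} \frac{\Gamma(\phi + Y_{ij})}{\Gamma(\phi)\, Y_{ij}!} \times \left(\frac{\theta_{ij}}{\phi + \theta_{ij}}\right)^{Y_{ij}},
$$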

where θij = exp(log eij + β0 + β1 log dij).
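To make the likelihood concrete, the log of the joint likelihood can be evaluated with the log-gamma function. The sketch below is a minimal illustration rather than the authors' implementation; the function names and the dictionary-based data layout (keys are locus pairs (i, j) with i < j) are assumptions made for this example.

```python
import math

def nb_log_pmf(y, theta, phi):
    """Log probability of count y under NB with mean theta and dispersion phi."""
    return (math.lgamma(phi + y) - math.lgamma(phi) - math.lgamma(y + 1)
            + phi * math.log(phi / (phi + theta))
            + y * math.log(theta / (phi + theta)))

def log_theta(d, e, beta0, beta1):
    """log theta_ij = log e_ij + beta0 + beta1 * log d_ij."""
    return math.log(e) + beta0 + beta1 * math.log(d)

def log_likelihood(counts, dists, effects, beta0, beta1, phi):
    """Sum of NB log probabilities over all locus pairs (i, j), i < j.
    counts, dists and effects are dicts keyed by (i, j)."""
    total = 0.0
    for pair, y in counts.items():
        theta = math.exp(log_theta(dists[pair], effects[pair], beta0, beta1))
        total += nb_log_pmf(y, theta, phi)
    return total

# Toy input with made-up numbers, only to show the call shape.
counts = {(1, 2): 40, (1, 3): 12, (2, 3): 35}
dists = {(1, 2): 0.8, (1, 3): 2.1, (2, 3): 0.9}
effects = {pair: 1.0 for pair in counts}
print(log_likelihood(counts, dists, effects, beta0=3.5, beta1=-1.0, phi=20.0))
```

Working on the log scale avoids overflow in Γ(ϕ + Yij) when read counts are large.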

2.3. Model implementation

Since Hi-C data do not contain information about the absolute spatial distance, piecewise helical models are identifiable only up to a scaling factor. For example, assume that the helix has curvature κ, torsion τ and total arc length S. For any positive constant C, the two sets of parameters (β0, β1, κ, τ, S) and (β0 + β1 log C, β1, Cκ, Cτ, S/C) achieve the same likelihood value in our posited negative binomial model. To resolve this non-identifiability issue without loss of generality, we normalize the absolute scale of the piecewise helical curve such that

$$
\sum_{1 \le i < j \le N} \log d_{ij} = 0.
$$
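The normalization amounts to dividing all distances by their geometric mean and adjusting the other parameters so that every θij is unchanged. The following sketch illustrates this under the scaling relation stated above; the function name `normalize_scale` and the flat list of pairwise distances are assumptions for the example.

```python
import math

def normalize_scale(dists, beta0, beta1, kappa, tau, S):
    """Rescale so that the sum of log pairwise distances is zero.
    Returns the rescaled distances and the transformed (beta0, beta1, kappa, tau, S)."""
    logs = [math.log(d) for d in dists]
    C = math.exp(sum(logs) / len(logs))      # geometric mean of the distances
    new_dists = [d / C for d in dists]       # the curve shrinks by a factor C
    return new_dists, (beta0 + beta1 * math.log(C), beta1, C * kappa, C * tau, S / C)

# Made-up distances whose geometric mean is 2.
new_dists, params = normalize_scale([0.5, 2.0, 8.0], beta0=5.0, beta1=-1.0,
                                    kappa=0.6, tau=0.2, S=10.0)
print(new_dists, params)
```

Curvature and torsion scale by C while arc length scales by 1/C, so the number of loops, which depends only on the product of √(κ² + τ²) and S, is unaffected by the normalization.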

Recall the formula for a helix starting from the origin in Section 2.1. The first two coordinates of the helix are periodical, and we refer to each period as a loop of this helix. For a helix with curvature κ, torsion τ and total arc length S, the number of loops, which is denoted as L, is given as follows:

$$
L = \frac{\sqrt{\kappa^2 + \tau^2}\; S}{2\pi}.
$$

For any helix that contains N genomic loci, we add the following constraint on L:

$$
L \le N/2.
$$

The purpose of imposing this constraint is to exclude helixes that have an excessive number of loops. Equivalently, this constraint requires that on average each loop contains at least two genomic loci.
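The loop count and the constraint are simple to compute. The sketch below is illustrative; the function names are hypothetical and the numerical inputs are made up.

```python
import math

def num_loops(kappa, tau, S):
    """Number of loops L of a helix with curvature kappa, torsion tau and arc length S."""
    return math.sqrt(kappa**2 + tau**2) * S / (2 * math.pi)

def satisfies_loop_constraint(kappa, tau, S, n_loci):
    """True when L <= N / 2, i.e. each loop holds at least two loci on average."""
    return num_loops(kappa, tau, S) <= n_loci / 2

# A helix with sqrt(kappa^2 + tau^2) = 5 and arc length 2*pi makes five loops.
print(num_loops(3.0, 4.0, 2 * math.pi))
print(satisfies_loop_constraint(3.0, 4.0, 2 * math.pi, n_loci=12))
```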

Additional assumptions are imposed on the piecewise constant curvatures and torsions in the piecewise helical model to avoid over-fitting. We require the helixes within a piecewise helical curve to have equal length, which is equivalent to requiring that the helixes contain an equal number of genomic loci. We use the Bayesian Information Criterion (BIC) [27] to determine the number of helixes for a TAD. Note that the curvatures and torsions for all helixes and the orientation vectors t, n, b at the connection points are collectively the primary parameters of the piecewise helical model.
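Model selection by BIC can be sketched as follows, using the standard definition BIC = k log n − 2 log L̂ and preferring the lowest value. The text does not state how parameters and observations are counted, so the counts passed in below, like the log-likelihood values, are purely hypothetical placeholders; `bic` and `select_num_helixes` are likewise made-up names.

```python
import math

def bic(max_log_lik, n_params, n_obs):
    """Standard Bayesian Information Criterion: k * log(n) - 2 * log-likelihood."""
    return n_params * math.log(n_obs) - 2.0 * max_log_lik

def select_num_helixes(candidates, n_obs):
    """candidates maps a number of helixes H to (max_log_lik, n_params).
    Returns the H with the lowest BIC together with all BIC values."""
    scores = {H: bic(ll, k, n_obs) for H, (ll, k) in candidates.items()}
    return min(scores, key=scores.get), scores

# Hypothetical fits for a TAD with 24 loci, i.e. 24 * 23 / 2 = 276 locus pairs.
candidates = {1: (-950.0, 8), 2: (-930.0, 13), 3: (-928.0, 18)}
best_H, scores = select_num_helixes(candidates, n_obs=276)
print(best_H, scores)
```

In this toy input the third helix improves the log-likelihood by only 2 units while adding five parameters, so the penalty term outweighs the gain.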

A fully Bayesian approach with Markov Chain Monte Carlo (MCMC) techniques is adopted for fitting the piecewise helical model and performing the subsequent statistical inference. A uniform prior Unif[0.001, 1000] is chosen for curvature κ, torsion τ and over-dispersion parameter ϕ. Non-informative priors are used for the other parameters in the piecewise helical model. The random walk Metropolis within Gibbs algorithm [28] is implemented to sample the joint posterior distribution. Figure S1 in the supplementary document (in the rest of this paper, figures and tables with labels starting with an S can be found in the supplementary document) shows high correlation between curvature and torsion in the single helix model. In order to improve the sampling efficiency, the multiple-try Metropolis algorithm [28] is applied to sample curvature and torsion simultaneously. When fitting the piecewise helical model, we ran 50,000 MCMC iterations. The first 10,000 samples were discarded as the burn-in stage, and then every 50th sample in the last 40,000 samples was used for posterior inference.
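The burn-in and thinning schedule can be illustrated with a deliberately simplified sampler. The sketch below is a plain one-parameter random-walk Metropolis update on a toy target, not the Metropolis-within-Gibbs or multiple-try Metropolis scheme used in the paper; the function names, the proposal step size, and the toy log-posterior are all assumptions made for illustration. Only the Unif[0.001, 1000] bounds and the 50,000 / 10,000 / 50 schedule come from the text.

```python
import math
import random

def rw_metropolis(log_post, x0, step, n_iter, lower=0.001, upper=1000.0, seed=0):
    """Random-walk Metropolis for one parameter with a Unif[lower, upper] prior."""
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    chain = []
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, step)
        if lower <= proposal <= upper:  # outside the prior support the density is zero
            lp_new = log_post(proposal)
            if rng.random() < math.exp(min(0.0, lp_new - lp)):
                x, lp = proposal, lp_new
        chain.append(x)
    return chain

def thin(chain, burn_in, every):
    """Drop the first burn_in samples, then keep every `every`-th sample."""
    return chain[burn_in::every]

# Toy log-posterior (a Gaussian bump around 2); not the model's actual posterior.
chain = rw_metropolis(lambda x: -0.5 * ((x - 2.0) / 0.5) ** 2, x0=1.0, step=0.5, n_iter=50000)
kept = thin(chain, burn_in=10000, every=50)
print(len(kept))
```

With 50,000 iterations, a burn-in of 10,000 and a thinning interval of 50, the schedule leaves 40,000 / 50 = 800 samples for posterior inference.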

The computation time for fitting the piecewise helical model depends on the number of MCMC iterations and the number of loci in the genomic region of interest. All MCMC calculations are conducted on computing nodes in Purdue Linux cluster “Radon”, each with hyper-threading quad-core Xeon E3–1284L 2.3 GHz processors and 32 GB RAM. Under the default setting, it takes 42 seconds to predict a 3D piecewise helical chromosomal structure which contains 2 helixes with 33 loci. The computation time increases in a quadratic fashion as the number of loci in the genomic region of interest increases.

In the rest of this paper, we refer to our proposed method of using the piecewise helical model to reconstruct 3D structure of TADs as the PHM method.

3. SIMULATION STUDIES

3.1. Simulation study when Hi-C data is simulated from a single helix

First, we conducted a simulation study to test the performance of the single helix model using a TAD in mouse chromosome 18, 33,960,001 – 34,960,000. Each 40kb bin is treated as a bead in the beads-on-a-string representation, and the chromatin is assumed to fold as a single helix with curvature κ = 0.56 and torsion τ = 0.28. The arc length between the center of the i-th bin and the center of the j-th bin is proportional to the genomic distance between them (|i − j|). The parameters in the negative binomial model are set as: β0 = 6, β1 = −1, ϕ = 20, βenz = 0.1, βgcc = −0.1, βmap = 0.1. The local genomic features of each bin, including restriction enzyme cutting frequency, GC content and sequence mappability, are calculated from the UCSC reference genome mm9. The Hi-C contact matrix {yij}, 1 ≤ i, j ≤ 25, is then simulated from the posited negative binomial model. The constant shifted log likelihood of the simulated data at the true parameter values, referred to as the true log likelihood, is 142811.16. We normalize the absolute scale of the piecewise helical curve as mentioned previously and transform the curvature, torsion and arc length parameters accordingly. After parameter transformation, curvature κ is 1.79, and torsion τ is 0.90.

The proposed algorithm is applied to the simulated Hi-C data, and statistical inference is conducted on the unknown parameters. Figure S2A shows the convergence of ten parallel chains. The Gelman-Rubin statistic [29] is 1.00, suggesting good mixing. Among the ten parallel chains, chain 8 achieves the highest posterior mode 142813.94, which is higher than the true log likelihood 142811.16. Therefore, chain 8 is used for posterior inference. Figure S2B shows the autocorrelation plot of every 50th posterior sample in the last 40,000 iterations in chain 8. After thinning, the posterior samples are independent. Table S1A lists the summary statistics of the posterior distribution of all unknown parameters. For each parameter, the 95% credible interval covers the true value. Figure 2A shows the structural alignment between the inferred helix (blue line) and the true helix (red line). The root mean squared distance [2] between the two structures is 0.02, suggesting high similarity. The residual plot (Figure 2B) shows no obvious trend between the standardized residuals and the fitted values, suggesting good model fit.
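For readers unfamiliar with the convergence diagnostic, the classic form of the Gelman-Rubin statistic compares between-chain and within-chain variance for a scalar quantity. The sketch below implements that textbook form; whether the paper used exactly this variant is not stated, and the function name and toy chains are illustrative.

```python
import math
import statistics

def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains of one scalar quantity."""
    n = len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    W = statistics.fmean([statistics.variance(c) for c in chains])  # within-chain variance
    B = n * statistics.variance(means)                               # between-chain variance
    var_plus = (n - 1) / n * W + B / n                               # pooled variance estimate
    return math.sqrt(var_plus / W)

# Two toy chains stuck around different values give a statistic far above 1.
print(gelman_rubin([[0.0, 1.0, 2.0], [10.0, 11.0, 12.0]]))
```

Values close to 1 indicate that the parallel chains are sampling the same distribution, which is how the reported value of 1.00 is interpreted.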

Figure 2.

A. Structural alignment of the inferred helix (blue) and the true helix (red). Root mean square distance = 0.02. B. Residual plot. The solid black line marks a standardized residual of zero. The two dashed black lines mark absolute standardized residuals of 1.96.

Posterior predictive check [29] (Figure S2C) is also conducted to evaluate the goodness of fit. Nine summary statistics of the Hi-C contact matrix are used: minimal value, 10% percentile, 25% percentile, median, 75% percentile, 90% percentile, maximum value, mean and variance. For every 50th sample in the last 40,000 iterations in MCMC, a Hi-C contact matrix is simulated based on the posterior samples of parameters, and the summary statistics of the simulated Hi-C contact matrix are recorded. The summary statistics of the input Hi-C contact matrix are compared with the distribution of the summary statistics of the simulated Hi-C contact matrix. All p-values are between 0.05 and 0.95, suggesting good model fit.
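The posterior predictive check can be sketched as follows. This is an illustration under assumptions, not the authors' code: the p-value is taken to be the fraction of simulated statistics at least as large as the observed one, percentiles use the "inclusive" interpolation rule of Python's `statistics.quantiles`, and the contact matrix is represented as a flat list of counts.

```python
import statistics

def summary_stats(counts):
    """The nine summary statistics of a contact matrix given as a flat list of counts."""
    q = statistics.quantiles(counts, n=100, method="inclusive")  # q[k - 1] is the k-th percentile
    return {
        "min": min(counts), "p10": q[9], "p25": q[24],
        "median": statistics.median(counts), "p75": q[74], "p90": q[89],
        "max": max(counts), "mean": statistics.fmean(counts),
        "var": statistics.variance(counts),
    }

def ppc_p_values(observed, simulated_list):
    """For each statistic, the fraction of simulated matrices whose value is >= the observed one."""
    obs = summary_stats(observed)
    sims = [summary_stats(sim) for sim in simulated_list]
    return {name: sum(s[name] >= obs[name] for s in sims) / len(sims) for name in obs}

# Toy data: one simulated matrix shifted up by 1 and one shifted down by 1.
observed = list(range(101))
simulated = [[y + 1 for y in observed], [y - 1 for y in observed]]
print(ppc_p_values(observed, simulated))
```

In practice one simulated matrix would be drawn from the negative binomial model for each retained posterior sample, and p-values near 0 or 1 would flag a statistic that the model fails to reproduce.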

3.2. Simulation study when Hi-C data is simulated from a piecewise helical curve

Next, we conducted a simulation study to test the performance of the piecewise helical model using a TAD in mouse chromosome 1, 92,800,001 – 94,800,000. Each 40kb bin is treated as a bead in the beads-on-a-string representation, and chromatin is assumed to fold as a piecewise helical curve, which consists of two equal-sized helixes with curvatures and torsions listed below:

 helix 1: κ(1)=0.60,τ(1)=0.20;
 helix 2:κ(2)=0.50,τ(2)=0.10.

The arc length between the center of the i-th bin and the center of the j-th bin is proportional to the genomic distance between them (|i − j|). A linear link function is used to connect Hi-C data and spatial proximity, with β0 = 5, β1 = −1. We further set ϕ = 15, βenz = 0.1, βgcc = −0.1, βmap = 0.1. The local genomic features of each bin, including restriction enzyme cutting frequency, GC content and sequence mappability, are calculated from the UCSC reference genome mm9. The Hi-C contact matrix {yij}, 1 ≤ i, j ≤ 50, is simulated from the posited negative binomial model. The true log likelihood is 179911.19. To account for the multi-collinearity in the regression, we normalize the absolute scale of the piecewise helical curve as mentioned previously and transform the curvature, torsion and arc length parameters accordingly. The parameters become: β0 = 3.73, β1 = −1,

 helix 1: κ(1)=2.13,τ(1)=0.71;
 helix 2: κ(2)=1.77,τ(2)=0.35.

Our algorithm is applied to the simulated data, and statistical inference is conducted on the unknown parameters. Figure S3A shows the convergence of ten parallel chains. The Gelman-Rubin statistic is 1.00, suggesting good mixing. Among the ten parallel chains, chain 5 achieves the highest posterior mode 179915.52, which is higher than the true log likelihood 179911.19. Therefore, chain 5 is used for posterior inference. Figure S3B shows the autocorrelation plot of every 50th posterior sample in the last 40,000 iterations in chain 5. After thinning, the posterior samples are independent. Table S1B lists the summary statistics of the posterior distributions of some primary parameters. For each parameter, the 95% credible interval covers the true value.

Figure 3A shows the structural alignment between the inferred helical curve (blue line) and the true helical curve (red line). The root mean squared distance [16] between the two structures is 0.06, suggesting high similarity. Figure 3B shows the residual plot and suggests good model fit.

Figure 3.

A. Structural alignment of the inferred helical curve (blue) and the true helical curve (red). Root mean square distance = 0.06. B. Residual plot. The solid black line marks a standardized residual of zero. The two dashed black lines mark absolute standardized residuals of 1.96.

4. RESULTS

4.1. Data description

Simulation studies have shown that the piecewise helical model works well on the synthetic datasets. To evaluate the performance of the piecewise helical model on real Hi-C data, we choose a published Hi-C dataset on mouse embryonic stem (ES) cells [25]. The Hi-C experiments were conducted using two restriction enzymes, HindIII and NcoI. The data sets obtained with HindIII and NcoI are referred to as the HindIII sample and the NcoI sample, respectively.

We first investigate the log average number of Hi-C reads spanning two loci as a function of the log genomic distance between the loci. We observe that the function is approximately linear when the genomic distance is around the size of a TAD (Figure S4). Therefore, a linear link function is used to analyze Hi-C data within each TAD.

Next, we evaluate the additional variation in Hi-C data after adjusting for genomic distance. We select all loci pairs with the same genomic distance, and plot the mean versus the variance of Hi-C read count spanning two loci. Interestingly, the variance is a quadratic function of the mean, suggesting a strong over-dispersion pattern (Figure S4B). Similar over-dispersion patterns have been observed in other types of next-generation sequencing count data, such as RNA-Seq data [30] and ChIP-Seq data [31]. Using the negative binomial model is a popular approach to analyzing such over-dispersed count data. Therefore, we apply the same approach to account for the over-dispersion phenomenon in our proposed model. In addition, we apply BACH to TADs in mouse chromosome 18. The 3D chromosomal structures predicted by BACH demonstrate a helix-like shape (Figure 4). This observation motivates us to use the parsimonious piecewise helical curve representations for chromatin folding.

4.2. Results of PHM and model comparisons

The piecewise helical model is applied to TADs of mouse ES cells [22] to infer the geometric properties of chromatin folding. Among all 2,200 TADs in mouse ES cells, 1,639 TADs with size larger than 480kb (12 loci) are analyzed. For each TAD, the predicted piecewise helical curve is obtained and the number of loops is used to measure the geometric property (i.e. compactness) of the piecewise helical curve. Note that the number of loops only reflects a small part of the characteristics of the piecewise helical curve structure. A larger number of loops indicates a more compact curve, while a smaller number of loops indicates a looser curve. In Figure 5, we compare the predicted piecewise helical models of two TADs, one in mouse chromosome 2, 6,360,001 – 8,600,000, and one in chromosome 8, 22,040,001 – 24,280,000. Both TADs have 56 bins of 40kb each. We can see that the TAD in chromosome 2 contains 24 loops, and it is more compact than the TAD in chromosome 8 with 6 loops.

Figure 5.

3D chromosomal structures predicted by the piecewise helical model. A. 3D model of a topological domain in mouse chromosome 2, 6,360,001 – 8,600,000. B. 3D model of a topological domain in mouse chromosome 8, 22,040,001 – 24,280,000.

We next evaluate how the compactness of chromatin folding, as measured by the number of loops, is correlated with genetic and epigenetic features. We collect eleven markers for each TAD, including gene density (UCSC reference genome mm9), gene expression [32], promoter marker H3K4me3 [32], active chromatin markers RNA polymerase II [32] and H3K36me3 [33], repressive chromatin markers H3K27me3 [34], H3K9me3 [35] and H4K20me3 [34], DNA hypersensitivity marker DNaseI [36], DNA replication timing [37] and genome-nuclear lamina interaction [38]. The estimated Pearson correlation coefficients between the compactness of chromatin folding and the genetic and epigenetic features are presented in Table 1. Statistical hypothesis testing shows that they are all significant. It is clear that gene-rich, highly expressed and early replicating genomic regions are much looser, whereas gene-poor, lowly expressed and late replicating genomic regions are more compact.

TABLE 1.

PEARSON CORRELATION COEFFICIENTS BETWEEN THE NUMBER OF LOOPS AND THE GENETIC AND EPIGENETIC FEATURES.

| Feature | Number of loops (HindIII) | Number of loops (NcoI) |
|---|---|---|
| Gene density | −0.26 | −0.25 |
| Gene expression | −0.09 | −0.08 |
| RNA polymerase II | −0.22 | −0.21 |
| H3K36me3 | −0.33 | −0.31 |
| H3K27me3 | −0.30 | −0.29 |
| H3K4me3 | −0.31 | −0.30 |
| DNaseI | −0.38 | −0.36 |
| Replication timing | −0.39 | −0.37 |
| Lamina | 0.39 | 0.37 |
| H3K9me3 | 0.34 | 0.31 |
| H4K20me3 | 0.30 | 0.29 |

To show the advantage of allowing a flexible connection between two consecutive helixes within the piecewise helical curve, we use the TAD in mouse chromosome 1, 57,440,001 – 58,440,000 as an illustrative example. For this TAD, PHM prefers the piecewise helical curve with two helixes, which achieves the lowest BIC. Figure 6A shows the raw Hi-C contact heatmap, which exhibits long-range chromatin interactions. In the reconstructed structure shown in Figure 6B, the two helixes bend at the connection point to account for these long-range interactions.
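The BIC-based choice of the number of helixes can be sketched as follows, using the standard definition BIC = −2 log L + k log n [27]. This is a minimal illustration only: the candidate log-likelihoods and parameter counts must come from fitted models, and the function names and the dictionary layout are hypothetical rather than part of the PHM software.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: -2 * log L + k * log(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def select_by_bic(candidates, n_obs):
    """Pick the candidate with the lowest BIC.

    `candidates` maps the number of helixes to a tuple
    (maximized log-likelihood, number of free parameters).
    Returns the selected number of helixes and all BIC scores.
    """
    scores = {k: bic(ll, p, n_obs) for k, (ll, p) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

A model with more helixes is selected only when its gain in log-likelihood outweighs the log(n) penalty paid for each additional parameter.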

Figure 6.

3D chromosomal structures predicted by the piecewise helical model. A. Raw Hi-C contact matrix heatmap of a topological domain in mouse chromosome 1, 57,440,001 – 58,440,000. B. Reconstructed 3D model from PHM. Each bead represents one 40kb bin.

To demonstrate that PHM outperforms BACH in terms of goodness-of-fit, we compare the two methods using the TAD in mouse chromosome 18, 33,960,001 – 34,960,000 as an example. For this TAD, BIC is used to determine the number of helixes within the piecewise helical curve in PHM, and it selects the single helix model. The 3D chromatin structures predicted by PHM and BACH are depicted in Figure 7 and are similar to each other. We further check the residual plots of the two methods. For PHM, 17 out of 276 (6.16%) normalized residuals have absolute values above 1.96, and the residuals show no obvious pattern as a function of the predicted values (Figure S5A). For BACH, 63 out of 276 (22.83%) normalized residuals have absolute values above 1.96, and the residuals show larger variance when the predicted values are larger (Figure S5B). The Poisson distribution used in BACH cannot fully capture the over-dispersion pattern in Hi-C data. By explicitly modeling over-dispersion via the negative binomial distribution, PHM provides a better fit to real Hi-C data than BACH.
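The residual check described above can be sketched as follows. The sketch assumes that normalized residuals are Pearson residuals, (y − μ)/√Var(y), with Var(y) = μ under the Poisson model and Var(y) = μ + μ²/φ under a negative binomial model with size parameter φ. This parameterization and the function names are assumptions made for illustration; the exact definitions used by PHM and BACH may differ.

```python
import math


def pearson_residuals(counts, means, size=None):
    """Normalized residuals (y - mu) / sqrt(Var(y)).

    size=None uses the Poisson variance mu; otherwise the negative
    binomial variance mu + mu^2 / size is used (assumed parameterization).
    """
    residuals = []
    for y, mu in zip(counts, means):
        variance = mu if size is None else mu + mu * mu / size
        residuals.append((y - mu) / math.sqrt(variance))
    return residuals


def fraction_outside(residuals, cutoff=1.96):
    """Fraction of residuals whose absolute value exceeds the cutoff."""
    return sum(abs(r) > cutoff for r in residuals) / len(residuals)
```

If the model is adequate, roughly 5% of the normalized residuals are expected to exceed 1.96 in absolute value. A much larger fraction that grows with the predicted value, as reported for BACH, is the typical signature of unmodeled over-dispersion.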

Figure 7.

Alignment of 3D chromosomal structures predicted by PHM (black dots and black lines) and BACH (white dots and white lines). 3D model of a topological domain in mouse chromosome 18, 33,960,001 – 34,960,000. Each bead represents one 40kb bin.

PHM is then applied to both the HindIII and NcoI samples of the 500 longest TADs in mouse ES cells [22] to predict their 3D structures. We calculate the pairwise Euclidean distance matrices of the structures predicted from the two samples. The Spearman correlation between the two matrices is then used to assess the reproducibility of PHM. To compare with existing methods, we repeat this procedure and obtain the corresponding Spearman correlations for ChromSDE and three variants of Pastis (Pastis-mds, Pastis-nmds, and Pastis-pm1). The results are shown in Figure S6A and Table 2. PHM achieves the highest correlation, indicating the highest reproducibility among the compared methods.
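The reproducibility measure described above can be sketched as follows: compute all pairwise Euclidean distances for each predicted structure, then take the Spearman correlation between the two sets of distances. This is a minimal standard-library illustration (it requires Python 3.8 or later for `math.dist`); the function names are hypothetical, and the sketch uses only the upper triangle of each distance matrix, which is an assumption about how the matrices are compared.

```python
import math


def pairwise_distances(coords):
    """Upper-triangle Euclidean distances for a list of (x, y, z) points."""
    n = len(coords)
    return [math.dist(coords[i], coords[j])
            for i in range(n) for j in range(i + 1, n)]


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def structure_reproducibility(coords_a, coords_b):
    """Spearman correlation between the pairwise distances of two structures."""
    return spearman(pairwise_distances(coords_a), pairwise_distances(coords_b))
```

Because the measure depends only on the rank order of distances, it is unaffected by rotation, translation, or uniform rescaling of either structure, so the two predictions do not need to be aligned first.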

TABLE 2.

SPEARMAN CORRELATION COEFFICIENTS BETWEEN THE HINDIII SAMPLE AND THE NCOI SAMPLE

| ChromSDE | Pastis-mds | Pastis-nmds | Pastis-pm1 | PHM |
|---|---|---|---|---|
| 0.9691 | 0.5969 | 0.6621 | 0.5401 | 0.9878 |

Zhang et al. [15] showed that the mapping from contact counts to physical distance can vary from one resolution to another, and proposed a procedure to assess the stability of a 3D structure reconstruction method with respect to a change in resolution. The procedure is as follows. Let X (a 3 × N matrix) and Y (a 3 × M matrix) denote the structures predicted at two resolution levels with N < M, so that X is constructed at the lower resolution. A downsampled structure Y* (a 3 × N matrix), at the same resolution as X, is obtained from Y by averaging the coordinates of loci. The Spearman correlation between the distance matrices of X and Y* is then calculated and used as a stability measure. We separately apply this procedure to PHM, ChromSDE, and the three variants of Pastis for the 500 longest TADs at 40kb and 80kb resolutions. The resulting Spearman correlations are presented in Figure S6B, C and Table 3. PHM achieves Spearman correlation coefficients comparable to those of ChromSDE and outperforms the three Pastis variants.
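The downsampling step of this procedure can be sketched as follows. The sketch assumes that each lower-resolution locus corresponds to a fixed number of consecutive higher-resolution loci (two when going from 40kb to 80kb) and that a trailing incomplete group is averaged over the loci it contains. Both the function name and the treatment of the last group are assumptions made for illustration, not details taken from [15].

```python
def downsample_structure(coords, factor):
    """Average the coordinates of every `factor` consecutive loci.

    `coords` is a list of (x, y, z) tuples at the higher resolution.
    The last group may contain fewer than `factor` loci.
    """
    downsampled = []
    for start in range(0, len(coords), factor):
        group = coords[start:start + factor]
        downsampled.append(
            tuple(sum(point[d] for point in group) / len(group) for d in range(3))
        )
    return downsampled
```

The resulting structure Y* has the same number of loci as X when both cover the same region, so their pairwise distance matrices can be compared entry by entry with a rank correlation.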

TABLE 3.

SPEARMAN CORRELATION COEFFICIENTS BETWEEN INFERRED STRUCTURES AT 40KB AND 80KB RESOLUTIONS

| Sample | ChromSDE | Pastis-mds | Pastis-nmds | Pastis-pm1 | PHM |
|---|---|---|---|---|---|
| HindIII | 0.9901 | 0.8419 | 0.9419 | 0.7997 | 0.9773 |
| NcoI | 0.9822 | 0.8249 | 0.8836 | 0.7773 | 0.9697 |

4.3. Model validation with gold standard FISH data

We further use a FISH dataset independently generated by [39] for the same mouse ES cells to validate our PHM method. In the FISH experiment, spatial distances of six pairs of genes, which belong to four TADs, were directly measured. These distances are referred to as the FISH distances and used as the gold standard. The detailed information about the FISH experiment and the four TADs (1–4) can be found in Table S2.

We apply PHM with two helixes to TADs 1 and 2, and PHM with one helix to TADs 3 and 4. The HindIII and NcoI samples are analyzed separately. The Pearson correlation coefficients between the predicted spatial distances and the gold standard FISH distances are 0.90 for the HindIII sample (Figure S7A) and 0.82 for the NcoI sample (Figure S7B).

As a comparison, we also apply BACH, ChromSDE, and the three variants of Pastis to infer the structures of the four TADs. The resulting Pearson correlations with the gold standard FISH distances for all methods are presented in Table 4. For the HindIII sample, the top three performers are Pastis-mds, Pastis-pm1, and PHM. For the NcoI sample, the top three performers are PHM, BACH, and Pastis-mds. Although Pastis-mds and Pastis-pm1 perform better than PHM for the HindIII sample, their performance for the NcoI sample is much worse. Considering both samples, PHM achieves better overall performance than the competing methods.

TABLE 4.

PEARSON CORRELATION WITH FISH DATA

| Method | HindIII | NcoI |
|---|---|---|
| PHM | 0.90 | 0.82 |
| BACH | 0.88 | 0.78 |
| ChromSDE | −0.14 | −0.14 |
| Pastis-mds | 0.99 | 0.48 |
| Pastis-nmds | 0.83 | 0.03 |
| Pastis-pm1 | 0.97 | 0.46 |

4.4. Comparison of 3D structures between mouse ES cells and mouse cortex cells

We further investigate the difference in 3D chromosomal structures between mouse ES cells and mouse cortex cells, and how this structural difference is related to cell type specific gene expression. The Hi-C experiment on mouse cortex cells was conducted using the restriction enzyme HindIII [22]. Therefore, we compare only the HindIII sample of mouse ES cells with the mouse cortex Hi-C data. Dixon et al. reported 2,200 and 1,518 TADs in mouse ES cells and mouse cortex cells, respectively [22]. We select the TADs that overlap between the two cell types and divide them into 451 1-Mb bins. Each 1-Mb bin contains 25 40kb loci, so we apply PHM with one helix and BACH to these 1-Mb bins to predict their 3D structures.

Similar to mouse ES cells, we observe that genomic regions consisting of highly expressed genes tend to be more elongated in mouse cortex cells. In addition to elongation, the helix model can be used to evaluate the magnitude of over-dispersion in the Hi-C data. A higher magnitude of over-dispersion indicates higher chromatin dynamics among the cell population. We observe that the magnitude of over-dispersion is positively correlated with gene expression. The Pearson correlation coefficients are 0.41 and 0.37 in mouse ES cells and mouse cortex cells, respectively. Our observation is consistent with a recent study, which shows that Hi-C data generated from TADs with active genes exhibit higher variance than data from TADs with silent genes [40]. More interestingly, the difference in over-dispersion between mouse ES cells and mouse cortex cells is positively associated with the difference in gene expression between these two cell types (Pearson correlation = 0.27). Together, these results suggest that genomic regions with higher gene expression tend to have higher chromatin dynamics.

5. Discussions

Hi-C technology has substantially advanced our understanding of the spatial organization of chromosomes and its implications for genome function. However, appropriate statistical models for analyzing Hi-C data and inferring chromatin folding are still lacking. In this paper, we propose a piecewise helical model to reconstruct the 3D chromosomal structure within each topologically associated domain. Compared with existing approaches based on the over-parameterized beads-on-a-string representation, the piecewise helical model has the following four advantages: (i) it is parsimonious, yet captures key features of the 3D chromosomal structure; (ii) its unknown parameters, curvature and torsion, have clear geometric interpretations; (iii) it is straightforward to incorporate the genomic distance into the arc length of the piecewise helical model, so that the inferred 3D chromosomal structures are robust to outliers and show high reproducibility between different samples; and (iv) the computational cost of the parsimonious model is significantly lower than that of more complex models. Additionally, the simplicity of the piecewise helical model makes it affordable to employ a computationally more demanding, yet much more suitable, negative binomial regression model to link the spatial distance with Hi-C data.

Applied to real Hi-C data, the piecewise helical model coupled with negative binomial regression achieves higher consistency with the gold standard FISH data than BACH. In addition, we confirm that geometric properties of TADs, such as the number of loops, are significantly associated with genomic features. Interestingly, we observe that the magnitude of over-dispersion, a measure of chromatin dynamics, is positively correlated with gene expression.

We note a few limitations of the piecewise helical model. First, we use BIC to select the number of helixes, but assume that all helixes within the same helical curve are of equal size. In theory, we can treat the helix boundary points as unknown parameters. Statistical inference of the number of helixes and the helix boundary points can then be formulated as a change point estimation problem. However, such a model is over-parameterized, and its results are usually unstable. Second, in the piecewise helical model, we use piecewise constant functions to model changes in curvature and torsion, whereas for an arbitrary 3D curve they can be any continuous functions. It is possible to model the curvature and torsion functions by smoothing splines, but this may result in a much higher computational cost. Modeling 3D chromosomal structure via an arbitrary continuous curve deserves further effort in future research.

Although local chromatin folding, especially within TADs, can be very stable, this assumption may not hold when modeling the 3D structure of the whole chromosome in the cell population. The whole chromosome, especially the euchromatin/open chromatin regions, is highly dynamic, and it is unclear whether the cell population contains one dominant structure or multiple distinct 3D chromosomal structures with comparable mixture proportions. In another ongoing project of our group, we propose to use a mixture of piecewise helical models to characterize the structural variability of the whole chromosome, and we hope to report the results in a future publication.

Supplementary Material

Supplementary Document

Acknowledgment

This work has utilized computing resources at Purdue's Community Cluster Program.

Funding: National Science Foundation grant (DMS 1000443 to Y.Z.); National Institutes of Health grant (U54DK107977 to M.H.).

Biographies

Rongrong Zhang is currently a Ph.D. candidate in the Department of Statistics at Purdue University.

Ming Hu received the Ph.D. degree in Biostatistics from University of Michigan, Ann Arbor, Michigan in 2010. He is currently an assistant staff member in the Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation. His main research interests include the development of Bayesian statistical methodologies and efficient computational tools for genomic and biomedical research.

Yu Zhu received the Ph.D. degree in Statistics from University of Michigan, Ann Arbor, Michigan in 2000. He is currently a professor of statistics with the Department of Statistics, Purdue University, and with the Center for Statistical Science, the Department of Industrial Engineering, Tsinghua University, Beijing, China. His main research interests include dimension reduction and variable selection for high dimensional data analysis, statistical bioinformatics and next generation sequencing technologies, design and analysis of experiments, theoretical and computational statistics.

Zhaohui Qin received the Ph.D. degree in Statistics from University of Michigan, Ann Arbor, Michigan in 2000. He is currently an associate professor of biostatistics and bioinformatics with the Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University. His main research interests include developing and evaluating model-based methods to analyze genetics and genomics data, in particular, next generation sequencing data.

Ke Deng received the Ph.D. degree in Statistics from Peking University, Beijing, China in 2008. He is currently an associate professor of statistics with the Center for Statistical Science, the Department of Industrial Engineering, Tsinghua University, Beijing, China. His main research interests include Bayesian methodologies, sequential Monte Carlo, bioinformatics, statistical genetics, text mining, network tomography and social sciences.

Jun S. Liu received the Ph.D. degree in Statistics from the University of Chicago, Chicago, Illinois in 1991. He is currently a professor of statistics with the Department of Statistics, Harvard University. He received the COPSS Presidents’ Award in 2002. He was an IMS Medallion Lecturer in 2002 and a Bernoulli Lecturer in 2004. He was elected a fellow of the Institute of Mathematical Statistics in 2004 and of the American Statistical Association in 2005. His main research interests include Bayesian methodology, computational biology and bioinformatics.

Footnotes

Conflict of Interests: none declared.

The software program implementing the helix models can be freely downloaded at: http://www.stat.purdue.edu/~zhanl602/phm

Contributor Information

Zhang Rongrong, Department of Statistics, Purdue University, West Lafayette, Indiana.

Ming Hu, Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio.

Yu Zhu, Department of Statistics, Purdue University, West Lafayette, Indiana, and the Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, China.

Zhaohui Qin, Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, Georgia.

Ke Deng, Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, China.

Jun S. Liu, Department of Statistics, Harvard University, Cambridge, Massachusetts.

References

  • [1]. Dekker J, “Gene regulation in the third dimension,” Science, vol. 319, no. 5871, pp. 1793–4, March 28, 2008.
  • [2]. Lanctot C, Cheutin T, Cremer M, Cavalli G, and Cremer T, “Dynamic genome architecture in the nuclear space: regulation of gene expression in three dimensions,” Nat Rev Genet, vol. 8, no. 2, pp. 104–15, February, 2007.
  • [3]. Dekker J, Rippe K, Dekker M, and Kleckner N, “Capturing chromosome conformation,” Science, vol. 295, no. 5558, pp. 1306–11, February 15, 2002.
  • [4]. Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO, Sandstrom R, Bernstein B, Bender MA, Groudine M, Gnirke A, Stamatoyannopoulos J, Mirny LA, Lander ES, and Dekker J, “Comprehensive mapping of long-range interactions reveals folding principles of the human genome,” Science, vol. 326, no. 5950, pp. 289–93, October 9, 2009.
  • [5]. Fullwood MJ, Liu MH, Pan YF, Liu J, Xu H, Mohamed YB, Orlov YL, Velkov S, Ho A, Mei PH, Chew EG, Huang PY, Welboren WJ, Han Y, Ooi HS, Ariyaratne PN, Vega VB, Luo Y, Tan PY, Choy PY, Wansa KD, Zhao B, Lim KS, Leow SC, Yow JS, Joseph R, Li H, Desai KV, Thomsen JS, Lee YK, Karuturi RK, Herve T, Bourque G, Stunnenberg HG, Ruan X, Cacheux-Rataboul V, Sung WK, Liu ET, Wei CL, Cheung E, and Ruan Y, “An oestrogen-receptor-alpha-bound human chromatin interactome,” Nature, vol. 462, no. 7269, pp. 58–64, November 5, 2009.
  • [6]. Kalhor R, Tjong H, Jayathilaka N, Alber F, and Chen L, “Genome architectures revealed by tethered chromosome conformation capture and population-based modeling,” Nat Biotechnol, December 25, 2011.
  • [7]. Nagano T, Lubling Y, Stevens TJ, Schoenfelder S, Yaffe E, Dean W, Laue ED, Tanay A, and Fraser P, “Single-cell Hi-C reveals cell-to-cell variability in chromosome structure,” Nature, vol. 502, no. 7469, pp. 59–64, October 3, 2013.
  • [8]. Dekker J, Marti-Renom MA, and Mirny LA, “Exploring the three-dimensional organization of genomes: interpreting chromatin interaction data,” Nat Rev Genet, vol. 14, no. 6, pp. 390–403, June, 2013.
  • [9]. Schmitt AD, Hu M, and Ren B, “Genome-wide mapping and analysis of chromosome architecture,” Nature Reviews Molecular Cell Biology, vol. 17, pp. 743–755, September, 2016.
  • [10]. Mateos-Langerak J, Bohn M, de Leeuw W, Giromus O, Manders EM, Verschure PJ, Indemans MH, Gierman HJ, Heermann DW, van Driel R, and Goetze S, “Spatially confined folding of chromatin in the interphase nucleus,” Proc Natl Acad Sci U S A, vol. 106, no. 10, pp. 3812–7, March 10, 2009.
  • [11]. Bohn M, and Heermann DW, “Diffusion-driven looping provides a consistent framework for chromatin organization,” PLoS One, vol. 5, no. 8, pp. e12218, 2010.
  • [12]. Barbieri M, Chotalia M, Fraser J, Lavitas LM, Dostie J, Pombo A, and Nicodemi M, “Complexity of chromatin folding is captured by the strings and binders switch model,” Proc Natl Acad Sci U S A, vol. 109, no. 40, pp. 16173–8, October 2, 2012.
  • [13]. Barbieri M, Fraser J, Lavitas LM, Chotalia M, Dostie J, Pombo A, and Nicodemi M, “A polymer model explains the complexity of large-scale chromatin folding,” Nucleus, vol. 4, no. 4, pp. 267–73, Jul-Aug, 2013.
  • [14]. Lesne A, Riposo J, Roger P, Cournac A, and Mozziconacci J, “3D genome reconstruction from chromosomal contacts,” Nat Methods, vol. 11, no. 11, pp. 1141–3, November, 2014.
  • [15]. Zhang Z, Li G, Toh KC, and Sung WK, “3D chromosome modeling with semi-definite programming and Hi-C data,” J Comput Biol, vol. 20, no. 11, pp. 831–46, November, 2013.
  • [16]. Rousseau M, Fraser J, Ferraiuolo MA, Dostie J, and Blanchette M, “Three-dimensional modeling of chromatin structure from interaction frequency data using Markov chain Monte Carlo sampling,” BMC Bioinformatics, vol. 12, pp. 414, 2011.
  • [17]. Hu M, Deng K, Qin Z, Dixon J, Selvaraj S, Fang J, Ren B, and Liu JS, “Bayesian inference of spatial organizations of chromosomes,” PLoS Comput Biol, vol. 9, no. 1, pp. e1002893, 2013.
  • [18]. Hausrath AC, and Goriely A, “Repeat protein architectures predicted by a continuum representation of fold space,” Protein Sci, vol. 15, no. 4, pp. 753–60, April, 2006.
  • [19]. Hausrath AC, and Goriely A, “Continuous representations of proteins: construction of coordinate models from curvature profiles,” J Struct Biol, vol. 158, no. 3, pp. 267–81, June, 2007.
  • [20]. Xiao G, Wang X, and Khodursky AB, “Modeling Three-Dimensional Chromosome Structures Using Gene Expression Data,” J Am Stat Assoc, vol. 106, no. 493, pp. 61–72, March, 2011.
  • [21]. Varoquaux N, Ay F, Noble WS, and Vert J-P, “A statistical approach for inferring the 3D structure of the genome,” Bioinformatics, vol. 30, pp. i26–33, 2014.
  • [22]. Dixon J, Selvaraj S, Yue F, Kim A, Li Y, Shen Y, Hu M, Liu J, and Ren B, “Topological domains in mammalian genomes identified by analysis of chromatin interactions,” Nature, vol. 485, no. 7398, pp. 376–380, 2012.
  • [23]. Rao SS, Huntley MH, Durand NC, Stamenova EK, Bochkov ID, Robinson JT, Sanborn AL, Machol I, Omer AD, Lander ES, and Aiden EL, “A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping,” Cell, vol. 159, pp. 1665–1680, 2014.
  • [24]. Dixon JR, Jung I, Selvaraj S, Shen Y, Antosiewicz-Bourget JE, Lee AY, Ye Z, Kim A, Rajagopal N, Xie W, et al., “Chromatin architecture reorganization during stem cell differentiation,” Nature, vol. 518, pp. 331–336, 2015.
  • [25]. Dixon J, Selvaraj S, Yue F, Kim A, Li Y, Shen Y, Hu M, Liu J, and Ren B, “Topological domains in mammalian genomes identified by analysis of chromatin interactions,” Nature, vol. 485, no. 7398, pp. 376–380, 2012.
  • [26]. Guggenheimer H, Differential Geometry, Dover, 1977. ISBN 0-486-63433-7.
  • [27]. Schwarz G, “Estimating the dimension of a model,” Annals of Statistics, vol. 6, no. 2, pp. 461–464, 1978.
  • [28]. Liu J, Monte Carlo Strategies in Scientific Computing, New York: Springer-Verlag, 2001.
  • [29]. Gelman A, Carlin JB, Stern HS, and Rubin DB, Bayesian data analysis, Reprinted 1997. ed., London: Chapman & Hall, 1995.
  • [30]. Bullard JH, Purdom E, Hansen KD, and Dudoit S, “Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments,” BMC Bioinformatics, vol. 11, pp. 94, 2010.
  • [31]. Qin ZS, Yu J, Shen J, Maher CA, Hu M, Kalyana-Sundaram S, Yu J, and Chinnaiyan AM, “HPeak: an HMM-based algorithm for defining read-enriched regions in ChIP-Seq data,” BMC Bioinformatics, vol. 11, no. 1, pp. 369, 2010.
  • [32]. Shen Y, Yue F, McCleary DF, Ye Z, Edsall L, Kuan S, Wagner U, Dixon J, Lee L, Lobanenkov VV, and Ren B, “A map of the cis-regulatory sequences in the mouse genome,” Nature, vol. 488, no. 7409, pp. 116–20, August 2, 2012.
  • [33]. Marson A, Levine SS, Cole MF, Frampton GM, Brambrink T, Johnstone S, Guenther MG, Johnston WK, Wernig M, Newman J, Calabrese JM, Dennis LM, Volkert TL, Gupta S, Love J, Hannett N, Sharp PA, Bartel DP, Jaenisch R, and Young RA, “Connecting microRNA genes to the core transcriptional regulatory circuitry of embryonic stem cells,” Cell, vol. 134, no. 3, pp. 521–33, August 8, 2008.
  • [34]. Mikkelsen TS, Ku M, Jaffe DB, Issac B, Lieberman E, Giannoukos G, Alvarez P, Brockman W, Kim TK, Koche RP, Lee W, Mendenhall E, O’Donovan A, Presser A, Russ C, Xie X, Meissner A, Wernig M, Jaenisch R, Nusbaum C, Lander ES, and Bernstein BE, “Genome-wide maps of chromatin state in pluripotent and lineage-committed cells,” Nature, vol. 448, no. 7153, pp. 553–60, August 2, 2007.
  • [35]. Bilodeau S, Kagey MH, Frampton GM, Rahl PB, and Young RA, “SetDB1 contributes to repression of genes encoding developmental regulators and maintenance of ES cell state,” Genes Dev, vol. 23, no. 21, pp. 2484–9, November 1, 2009.
  • [36]. Schnetz MP, Handoko L, Akhtar-Zaidi B, Bartels CF, Pereira CF, Fisher AG, Adams DJ, Flicek P, Crawford GE, Laframboise T, Tesar P, Wei CL, and Scacheri PC, “CHD7 targets active gene enhancer elements to modulate ES cell-specific gene expression,” PLoS Genet, vol. 6, no. 7, pp. e1001023, July, 2010.
  • [37]. Hiratani I, Ryba T, Itoh M, Rathjen J, Kulik M, Papp B, Fussner E, Bazett-Jones DP, Plath K, Dalton S, Rathjen PD, and Gilbert DM, “Genome-wide dynamics of replication timing revealed by in vitro models of mouse embryogenesis,” Genome Res, vol. 20, no. 2, pp. 155–69, February, 2010.
  • [38]. Peric-Hupkes D, Meuleman W, Pagie L, Bruggeman S, Solovei I, Brugman W, Graf S, Flicek P, Kerkhoven RM, Lohuizen M, Reinders M, Wessels L, and Steensel B, “Molecular Maps of the Reorganization of Genome-Nuclear Lamina Interactions during Differentiation,” Mol Cell, vol. 38, no. 4, pp. 603–613, 2010.
  • [39]. Eskeland R, Leeb M, Grimes GR, Kress C, Boyle S, Sproul D, Gilbert N, Fan Y, Skoultchi AI, Wutz A, and Bickmore WA, “Ring1B compacts chromatin structure and represses gene expression independent of histone ubiquitination,” Mol Cell, vol. 38, no. 3, pp. 452–64, May 14, 2010.
  • [40]. Sofueva S, Yaffe E, Chan WC, Georgopoulou D, Vietri Rudan M, Mira-Bontenbal H, Pollard SM, Schroth GP, Tanay A, and Hadjur S, “Cohesin-mediated interactions organize chromosomal domain architecture,” EMBO J, November 1, 2013.
