Author manuscript; available in PMC: 2012 Jan 10.
Published in final edited form as: Phys Rev E Stat Nonlin Soft Matter Phys. 2011 May 13;83(5 Pt 1):050903. doi: 10.1103/PhysRevE.83.050903

Statistical mechanics of nucleosome ordering by chromatin-structure-induced two-body interactions

Răzvan V Chereji 1, Denis Tolkunov 1,2, George Locke 1, Alexandre V Morozov 1,2,*
PMCID: PMC3254185  NIHMSID: NIHMS347038  PMID: 21728479

Abstract

One-dimensional arrays of nucleosomes (DNA-bound histone octamers separated by stretches of linker DNA) fold into higher-order chromatin structures which ultimately make up eukaryotic chromosomes. Chromatin structure formation leads to 10–11 base pair (bp) discretization of linker lengths caused by the smaller free energy cost of packaging nucleosomes into regular chromatin fibers if their rotational setting (defined by the DNA helical twist) is conserved. We describe nucleosome positions along the fiber using a thermodynamic model of finite-size particles with both intrinsic histone-DNA interactions and an effective two-body potential. We infer one- and two-body energies directly from high-throughput maps of nucleosome positions. We show that higher-order chromatin structure helps explain in vitro and in vivo nucleosome ordering in transcribed regions, and plays a leading role in establishing the well-known 10–11 bp genome-wide periodicity of nucleosome positions.


In living cells, eukaryotic DNA is found in a compact, multiscale chromatin state [1]. The fundamental unit of chromatin is a nucleosome: 147 base pairs (bp) of DNA wrapped around a histone octamer [2]. In addition to its primary function of DNA compaction, chromatin modulates DNA accessibility to transcription factors and other molecular machines in response to external signals, exerting a profound influence on numerous DNA-mediated biological processes such as gene transcription, DNA repair, and replication [3].

Equilibrium thermodynamic models that account for intrinsic histone-DNA sequence preferences and nearest-neighbor steric exclusion have been used to predict nucleosome positions and formation energies [4–6]. However, structural regularity of the chromatin fiber imposes additional constraints, leading to discretization of linker lengths between neighboring nucleosomes with the 10–11 bp periodicity of the DNA double helix [7,8]. The discretization is required to avoid steric clashes caused by the nucleosome rotating around the linker DNA axis as the linker is extended [9], and more generally to minimize the free energy costs associated with maintaining a regular pattern of protein-protein and protein-DNA contacts in the chromatin fiber [8]. Indeed, adding a short DNA segment to the linker rotates the nucleosome with respect to the rest of the fiber, disrupting its periodic structure. This additional twist has to be compensated unless the segment is 10–11 bp in length, which brings the nucleosome into an equivalent rotational position.
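The rotational argument above can be made concrete with a few lines of arithmetic. The sketch below is only an illustration: it assumes a helical repeat of 10.5 bp per turn (a value chosen inside the 10–11 bp range quoted in the text), and the function name `rotational_offset_deg` is hypothetical.

```python
# Assumed helical repeat, picked inside the 10-11 bp range quoted in the text.
HELICAL_REPEAT_BP = 10.5

def rotational_offset_deg(extra_bp, repeat=HELICAL_REPEAT_BP):
    """Smallest rotation (in degrees) separating the old and new rotational
    settings of a nucleosome after the linker is extended by extra_bp."""
    angle = (360.0 * extra_bp / repeat) % 360.0
    return min(angle, 360.0 - angle)

# Extensions close to a multiple of the helical repeat leave the rotational
# setting nearly unchanged; a half-turn extension flips it almost completely.
for extra in (1, 5, 10, 11, 21):
    print(extra, round(rotational_offset_deg(extra), 1))
```

Under this assumption, extensions of 10 or 11 bp leave a residual twist of less than 20 degrees, whereas a 5 bp extension rotates the nucleosome to nearly the opposite face of the fiber.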

Large-scale maps of in vivo and in vitro nucleosome positions in yeast reveal nucleosome-depleted regions (NDRs) in the vicinity of transcription start and termination sites (TSS and TTS) [5,10,11]. In these experiments, chromatin is digested with micrococcal nuclease to obtain mononucleosome core particles, and the mononucleosomal DNA is purified and either sequenced or hybridized to microarrays [12]. 5′ NDRs play a key role in gene regulation [10]. NDRs are also observed in vitro, where they are defined by poly(dA:dT) tracts and other nucleosome-disfavoring sequences. Surprisingly, there are no oscillations in nucleosome occupancy around in vitro NDRs and, on average, just a ~25% depletion of the occupancy over 5′ NDRs compared with the genome-wide mean [5,11] (the occupancy of a bp is defined as the probability that it is covered by a nucleosome). This is true even if genomic DNA from S. cerevisiae is combined with purified histones in a 1:1 mass ratio, yielding a maximum nucleosome occupancy of 0.82, which is close to the in vivo value [11]. This behavior is in sharp contrast with in vivo chromatin, in which the action of transcription factors, chromatin remodeling enzymes, and components of transcriptional machinery results in well-positioned genic nucleosomes and highly pronounced 5′ NDRs (~70% depletion on average with respect to the mean) [5,10]. Because occupancy oscillations are a generic feature of one-dimensional liquids of finite-size particles in the vicinity of potential barriers and wells [13], the absence of such oscillations in vitro and the shallow NDRs strongly suggest that sequence-specific histone-DNA interaction energies are on average comparable to kBT. Consequently, nucleosome-positioning and nucleosome-disfavoring sequences are expected to play a minor role in establishing in vivo localization of genic nucleosomes.

Here we focus on how nucleosome positions are affected by effective two-body interactions imposed on neighboring particles by regular chromatin structure. We map a three-dimensional chromatin fiber onto a system of nonoverlapping particles of length a = 147 bp with both histone-DNA and short-range nearest-neighbor interactions. The particles are confined to a one-dimensional lattice of length L. We develop a theory in which the two-body interaction (reflected in linker discretization) is deduced exactly from the two-particle distribution, even in the presence of 10–11 bp periodic one-body energies related to the rotational positioning of the nucleosome [10,11].

Let u(k) be the external potential energy of a particle that occupies positions k through k + a − 1 on the DNA, and let Φ(k,l) be the two-body interaction between a pair of nearest-neighbor particles with starting positions k and l, respectively. Here u(k) describes intrinsic histone-DNA interactions, whereas Φ(k,l) accounts for the effects of chromatin structure. The grand-canonical partition function is given by

$$Z = 1 + \sum_{N=1}^{N_{\max}} \langle J|(zw)^{N-1}z|J\rangle = 1 + \langle J|(I - zw)^{-1}z|J\rangle, \tag{1}$$

where $N_{\max}$ is the maximum number of particles that can be positioned on L bp, I is the identity matrix, |j〉 is a unit vector of dimension L − a + 1 with 1 at position j, and $|J\rangle = \sum_{j=1}^{L-a+1} |j\rangle$. In matrix notation, $\langle k|z|l\rangle = e^{\beta[\mu - u(k)]}\,\delta_{k,l}$ and $\langle k|w|l\rangle = e^{-\beta\Phi(k,l)}\,\Theta(l-k)$, where μ is the chemical potential, $\delta_{k,l}$ is the Kronecker delta, β is the inverse temperature, and Θ is the Heaviside step function.
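As a sanity check of Eq. (1), the following minimal sketch evaluates Z on a tiny lattice in two ways: through the matrix element 〈J|(I − zw)^{-1} z|J〉, computed by a forward recursion that is possible because w only connects k to l > k, and through brute-force enumeration of all configurations. Everything specific here is an assumption made for illustration: the toy energy landscape, β = 1, the helper names, and the choice to encode steric exclusion as w(k,l) = 0 whenever l − k < a (that is, Φ = +∞ for overlapping particles).

```python
import math
from itertools import combinations

def build_toy_model(L=12, a=3, mu=0.5):
    """Hypothetical toy landscape with beta = 1 (not the paper's parameters)."""
    m = L - a + 1                      # number of allowed starting positions
    z = [math.exp(mu - 0.3 * math.sin(k)) for k in range(m)]
    w = [[0.0] * m for _ in range(m)]  # w[k][l] = 0 encodes overlap
    for k in range(m):
        for l in range(k + a, m):
            linker = l - k - a
            phi = 0.5 * math.cos(2 * math.pi * linker / 10)
            w[k][l] = math.exp(-phi)
    return z, w

def partition_function(z, w):
    """Z = 1 + <J|(I - zw)^{-1} z|J>, using left[l] = <J|(I - zw)^{-1}|l>."""
    m = len(z)
    left = [0.0] * m
    for l in range(m):
        # <J|A = <J| + <J|A z w, with A = (I - zw)^{-1} and w[k][l] = 0 for k >= l
        left[l] = 1.0 + sum(left[k] * z[k] * w[k][l] for k in range(l))
    return 1.0 + sum(left[l] * z[l] for l in range(m))

def brute_force_partition_function(z, w):
    """Sum Boltzmann weights over every set of starting positions."""
    m = len(z)
    total = 1.0                        # empty lattice
    for size in range(1, m + 1):
        for starts in combinations(range(m), size):
            weight = 1.0
            for k in starts:
                weight *= z[k]
            for k, l in zip(starts, starts[1:]):
                weight *= w[k][l]      # vanishes for overlapping neighbors
            total += weight
    return total

z, w = build_toy_model()
Z_matrix = partition_function(z, w)
Z_enum = brute_force_partition_function(z, w)
print(Z_matrix, Z_enum)
```

The recursion costs O(m²) operations for m starting positions, whereas the enumeration grows exponentially and is only useful as a check on very short lattices.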

The one-particle and nearest-neighbor pair distribution functions are

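$$n(i) = \frac{1}{Z}\,\langle J|(I - zw)^{-1}|i\rangle\,\langle i|z|i\rangle\,\langle i|(I - wz)^{-1}|J\rangle, \tag{2}$$
$$\bar{n}_2(i,j) = \frac{1}{Z}\,\langle J|(I - zw)^{-1}|i\rangle\,\langle i|zwz|j\rangle\,\langle j|(I - wz)^{-1}|J\rangle. \tag{3}$$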

Note that for 0 < j − i < 2a, n̄2(i,j) = n2(i,j), where n2 is the ordinary two-particle distribution function. Defining two matrices, 〈i|N|j〉 = n(i)δi,j and 〈i|N2|j〉 = n̄2(i,j), we rewrite the partition function as

$$Z = \frac{1}{1 - \langle J|(I - N_2N^{-1})N|J\rangle}. \tag{4}$$
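Equations (2)–(4) can be checked numerically on a short lattice. The sketch below computes n and n̄2 from left and right recursions that are equivalent to the matrix inverses in Eqs. (2) and (3), compares them with exhaustive enumeration, and then verifies Eq. (4) using 〈J|(I − N2N^{-1})N|J〉 = Σ n(i) − Σ n̄2(i,j). The toy parameters, β = 1, the function names, and the encoding of steric exclusion as w = 0 for l − k < a are assumptions made for this illustration only.

```python
import math
from itertools import combinations

def distributions(z, w):
    """Return Z, n(i) and nbar2(i,j) of Eqs. (1)-(3) via left/right recursions."""
    m = len(z)
    left = [0.0] * m     # left[i]  = <J|(I - zw)^{-1}|i>
    right = [0.0] * m    # right[i] = <i|(I - wz)^{-1}|J>
    for i in range(m):
        left[i] = 1.0 + sum(left[k] * z[k] * w[k][i] for k in range(i))
    for i in reversed(range(m)):
        right[i] = 1.0 + sum(w[i][l] * z[l] * right[l] for l in range(i + 1, m))
    Z = 1.0 + sum(left[i] * z[i] for i in range(m))
    n = [left[i] * z[i] * right[i] / Z for i in range(m)]
    n2 = [[left[i] * z[i] * w[i][j] * z[j] * right[j] / Z for j in range(m)]
          for i in range(m)]
    return Z, n, n2

def enumerate_distributions(z, w):
    """Same quantities from a brute-force sum over all configurations."""
    m = len(z)
    Z = 1.0
    n = [0.0] * m
    n2 = [[0.0] * m for _ in range(m)]
    for size in range(1, m + 1):
        for starts in combinations(range(m), size):
            weight = 1.0
            for k in starts:
                weight *= z[k]
            for k, l in zip(starts, starts[1:]):
                weight *= w[k][l]
            Z += weight
            for k in starts:
                n[k] += weight
            for k, l in zip(starts, starts[1:]):
                n2[k][l] += weight       # nearest-neighbor pairs only
    return Z, [x / Z for x in n], [[x / Z for x in row] for row in n2]

# Hypothetical toy model: particle length a, m allowed starting positions.
a, m = 3, 10
z = [math.exp(0.5 - 0.3 * math.sin(k)) for k in range(m)]
w = [[math.exp(-0.5 * math.cos(2 * math.pi * (l - k - a) / 10)) if l - k >= a else 0.0
      for l in range(m)] for k in range(m)]

Z, n, n2 = distributions(z, w)
Z_ref, n_ref, n2_ref = enumerate_distributions(z, w)
Z_eq4 = 1.0 / (1.0 - (sum(n) - sum(sum(row) for row in n2)))   # Eq. (4)
max_n_error = max(abs(x - y) for x, y in zip(n, n_ref))
max_n2_error = max(abs(n2[i][j] - n2_ref[i][j]) for i in range(m) for j in range(m))
print(Z, Z_ref, Z_eq4, max_n_error, max_n2_error)
```

Equation (4) has a simple interpretation in this setting: a configuration with N ≥ 1 particles contains N − 1 nearest-neighbor pairs, so Σ n − Σ n̄2 is the probability that the lattice is not empty, which equals 1 − 1/Z.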

By inverting Eqs. (2) and (3) we obtain the exact expressions for one- and two-body energies [14,15]:

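$$\beta[u(k) - \mu] = -\ln\left(\frac{\langle J|I - N_2N^{-1}|k\rangle\,\langle k|N|k\rangle\,\langle k|I - N^{-1}N_2|J\rangle}{1 - \langle J|(I - N_2N^{-1})N|J\rangle}\right), \tag{5}$$
$$\beta\Phi(k,l) = -\ln\left(\frac{\langle k|N^{-1}N_2N^{-1}|l\rangle\,\left[1 - \langle J|(I - N_2N^{-1})N|J\rangle\right]}{\langle k|I - N^{-1}N_2|J\rangle\,\langle J|I - N_2N^{-1}|l\rangle}\right). \tag{6}$$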
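The inversion in Eqs. (4)–(6) can be illustrated with a round trip on a toy model: generate n and n̄2 from known energies, then recover the energies from the distributions alone. In the sketch below the matrix elements are written as row and column sums, since n(i)〈i|I − N^{-1}N2|J〉 = n(i) − Σ_j n̄2(i,j) and 〈J|I − N2N^{-1}|j〉n(j) = n(j) − Σ_i n̄2(i,j). The toy energies, β = 1, the function names, and the treatment of overlap as w = 0 are assumptions made for illustration.

```python
import math

def forward(z, w):
    """Eqs. (1)-(3): n(i) and nbar2(i,j) for fugacities z and bond factors w."""
    m = len(z)
    left, right = [0.0] * m, [0.0] * m
    for i in range(m):
        left[i] = 1.0 + sum(left[k] * z[k] * w[k][i] for k in range(i))
    for i in reversed(range(m)):
        right[i] = 1.0 + sum(w[i][l] * z[l] * right[l] for l in range(i + 1, m))
    Z = 1.0 + sum(left[i] * z[i] for i in range(m))
    n = [left[i] * z[i] * right[i] / Z for i in range(m)]
    n2 = [[left[i] * z[i] * w[i][j] * z[j] * right[j] / Z for j in range(m)]
          for i in range(m)]
    return n, n2

def invert(n, n2):
    """Recover Z, z(k) = exp(-beta[u(k) - mu]) and w(k,l) = exp(-beta Phi(k,l))."""
    m = len(n)
    row = [n[i] - sum(n2[i]) for i in range(m)]                       # n(i) <i|I - N^-1 N2|J>
    col = [n[j] - sum(n2[i][j] for i in range(m)) for j in range(m)]  # <J|I - N2 N^-1|j> n(j)
    Z = 1.0 / (1.0 - sum(col))                                        # Eq. (4)
    z = [Z * col[k] * row[k] / n[k] for k in range(m)]                # Eq. (5)
    w = [[n2[k][l] / (Z * row[k] * col[l]) for l in range(m)]         # Eq. (6)
         for k in range(m)]
    return Z, z, w

# Hypothetical "true" energies in units of kBT.
a, m = 3, 10
u_minus_mu = [0.3 * math.sin(k) - 0.5 for k in range(m)]
phi = {(k, l): 0.5 * math.cos(2 * math.pi * (l - k - a) / 10)
       for k in range(m) for l in range(k + a, m)}
z_true = [math.exp(-e) for e in u_minus_mu]
w_true = [[math.exp(-phi[k, l]) if (k, l) in phi else 0.0 for l in range(m)]
          for k in range(m)]

n, n2 = forward(z_true, w_true)
Z_rec, z_rec, w_rec = invert(n, n2)
u_rec = [-math.log(x) for x in z_rec]
phi_rec = {kl: -math.log(w_rec[kl[0]][kl[1]]) for kl in phi}
max_u_error = max(abs(x - y) for x, y in zip(u_rec, u_minus_mu))
max_phi_error = max(abs(phi_rec[kl] - phi[kl]) for kl in phi)
print(max_u_error, max_phi_error)
```

In this noise-free setting the recovered energies agree with the input to within rounding error. With real data the logarithms would amplify noise wherever n or n̄2 is small, which this sketch does not address.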

Note that if the two-body interactions are neglected, Eq. (5) reduces to [6]

$$e^{-\beta[u(i) - \mu]} = \frac{n(i)}{1 - O(i) + n(i)} \prod_{j=i}^{i+a-1} \frac{1 - O(j) + n(j)}{1 - O(j)}, \tag{7}$$

where O(i) is the nucleosome occupancy of bp i, $O(i) = \sum_{j=i-a+1}^{i} n(j)$.
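Equation (7) can be tested in the same spirit for particles that interact through steric exclusion only. The sketch below generates n(i) for hard rods of length a from a known fugacity profile and then applies Eq. (7) to recover that profile. The toy energies, β = 1, the 0-based indexing, and the function names are assumptions made for this illustration; n(j) is set to zero for the last a − 1 bp, where no particle can start.

```python
import math

def hard_rod_density(z, a):
    """Start probabilities n(i) for rods of length a with steric exclusion only."""
    m = len(z)
    left, right = [0.0] * m, [0.0] * m
    for i in range(m):
        # predecessors must start at k <= i - a to avoid overlap
        left[i] = 1.0 + sum(left[k] * z[k] for k in range(0, i - a + 1))
    for i in reversed(range(m)):
        # successors must start at l >= i + a
        right[i] = 1.0 + sum(z[l] * right[l] for l in range(i + a, m))
    Z = 1.0 + sum(left[i] * z[i] for i in range(m))
    return [left[i] * z[i] * right[i] / Z for i in range(m)]

def fugacity_from_density(n, a):
    """Apply Eq. (7) to obtain exp(-beta[u(i) - mu]) from n(i)."""
    m = len(n)
    L = m + a - 1                            # lattice length in bp
    start = list(n) + [0.0] * (a - 1)        # no starts in the last a - 1 bp
    occ = [sum(start[max(0, i - a + 1): i + 1]) for i in range(L)]  # O(i)
    z = []
    for i in range(m):
        value = start[i] / (1.0 - occ[i] + start[i])
        for j in range(i, i + a):
            value *= (1.0 - occ[j] + start[j]) / (1.0 - occ[j])
        z.append(value)
    return z

# Hypothetical one-body landscape on a 30 bp lattice with 4 bp particles.
L, a = 30, 4
z_true = [math.exp(0.5 - 0.8 * math.sin(0.7 * k)) for k in range(L - a + 1)]
n = hard_rod_density(z_true, a)
z_rec = fugacity_from_density(n, a)
max_rel_error = max(abs(r - t) / t for r, t in zip(z_rec, z_true))
print(max_rel_error)
```

For a = 1 the product in Eq. (7) collapses to a single factor and the expression reduces to the lattice-gas result n(i)/[1 − n(i)].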

If one-body energies u and two-body interactions Φ are known, Eqs. (2) and (3) allow us to construct particle distributions n and n̄2 exactly. Conversely, we can use Eqs. (5) and (6) to find u and Φ from one- and two-particle distributions. However, the two-particle distribution is not directly measured in current high-throughput experiments, in which chromatin from many cells is mixed together before mononucleosomes are isolated and sequenced. In other words, it is not known which particular genome a given nucleosome comes from. This is irrelevant for n but may present a problem for n̄2, which requires two-nucleosome configurations. Nonetheless, we can build a model for n̄2 which allows us to approximate the two-body interaction.

Let g(i,j) be the pair distribution n2(i,j)/[n(i)n(j)]. Without one-body energies, the system is homogeneous and g is a function of only the relative distance between the nucleosomes: g(i,j) ≡ g(j − i). In this case Eq. (6) reduces to

$$\beta\Phi(i,j) = -\ln[g(j-i)] + \alpha(j-i) + \ln C \tag{8}$$

for arbitrary interactions Φ [16]. The constants C and α can be determined from the asymptotic condition $\lim_{(j-i)\to\infty} \Phi(i,j) = 0$. However, position-dependent one-body energies break translational invariance of the pair distribution g. Assuming that Φ is translationally invariant, we introduce $P_{\text{linker}}(\Delta) = \langle g(i, i+\Delta+a)\rangle_i$ and approximate Φ as

$$\beta\Phi(i,j) \approx -\ln[P_{\text{linker}}(j-(i+a))] + \alpha(j-i) + \ln C. \tag{9}$$

This step is reminiscent of replacing the ensemble average with the time average in statistical mechanics. Here, the average is taken over all initial positions i. Our numerical tests show this to be an excellent approximation, even if one- and two-body energies are comparable in magnitude, making the system strongly inhomogeneous.
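One possible way to apply Eq. (9) in practice is sketched below. The text fixes α and C through the condition that Φ vanishes at large separation but does not spell out a numerical recipe, so the tail fit used here is an assumption of this sketch rather than the authors' procedure: a straight line is fitted to −ln P_linker over large linker lengths, where Φ is taken to be zero, and subtracted from the whole curve. The constant αa that arises from j − i = Δ + a is absorbed into the intercept. The function names, the synthetic P_linker, and the choice of `tail_start` are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def two_body_from_linkers(p_linker, tail_start):
    """beta*Phi(Delta) from Eq. (9), with the linear term fitted on the tail.

    p_linker must be strictly positive; tail_start marks the linker length
    beyond which Phi is assumed to be negligible.
    """
    minus_log = [-math.log(p) for p in p_linker]
    xs = list(range(tail_start, len(p_linker)))
    slope, intercept = fit_line(xs, minus_log[tail_start:])
    # Subtracting the fitted line plays the role of alpha*(j - i) + ln C.
    return [y - (slope * x + intercept) for x, y in enumerate(minus_log)]

# Synthetic test case: two wells on top of an exponential linker distribution.
phi_true = [0.0] * 50
phi_true[5], phi_true[15] = -2.0, -1.0
weights = [math.exp(-phi_true[d] - 0.05 * d) for d in range(50)]
norm = sum(weights)
p_linker = [x / norm for x in weights]

phi_rec = two_body_from_linkers(p_linker, tail_start=25)
max_error = max(abs(r - t) for r, t in zip(phi_rec, phi_true))
print(max_error)
```

The recovery is exact here only because the synthetic potential vanishes identically in the fitted tail. For a slowly decaying potential the fit window would bias the constants, so the choice of `tail_start` deserves care.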

Each experimental nucleosome positioning data set consists of a histogram of the number of nucleosomes starting at each genomic bp i. We preprocess these data by removing all counts of height 1 from the histogram and smoothing the remaining profile with a σ = 2 Gaussian kernel. Next, we compute n(i) by rescaling the smoothed profile so that the maximum occupancy for each chromosome is 1. Finally, we identify all local maxima on the n profile and assume that they mark prevalent nucleosome positions. For each maximum at bp i we find subsequent maxima at positions i + 146 < j1 < j2 < j3 < … in the 50 bp window. To each pair of maxima (i,j1), (i,j2), … we assign the probability that they represent neighboring nucleosomes: n(i)n(j1), n(i)[1 − n(j1)]n(j2), and so on. By summing over all initial positions i and normalizing, we obtain the linker length probability, which gives us an empirical estimate of Plinker.
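The preprocessing steps described above can be sketched as follows. Several details are assumptions of this sketch rather than statements from the text: the Gaussian kernel is truncated at 4σ, a local maximum is a point strictly higher than its left neighbor and at least as high as its right neighbor, the 50 bp window is interpreted as linker lengths 0 to 49 bp, and all function names are hypothetical.

```python
import math
from bisect import bisect_left

A = 147  # nucleosome footprint in bp

def smooth(counts, sigma=2.0):
    """Convolve with a normalized Gaussian kernel truncated at 4 sigma."""
    radius = int(4 * sigma)
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (d / sigma) ** 2) for d in offsets]
    norm = sum(kernel)
    out = [0.0] * len(counts)
    for i, c in enumerate(counts):
        if c:
            for d, k in zip(offsets, kernel):
                if 0 <= i + d < len(counts):
                    out[i + d] += c * k / norm
    return out

def start_probability(counts, sigma=2.0):
    """Drop singleton counts, smooth, and rescale so the maximum occupancy is 1."""
    profile = smooth([0 if c == 1 else c for c in counts], sigma)
    peak, running = 0.0, 0.0
    for i, p in enumerate(profile):          # running sum over the last A starts
        running += p
        if i >= A:
            running -= profile[i - A]
        peak = max(peak, running)
    return [p / peak for p in profile]

def linker_distribution(n, window=50):
    """Empirical P_linker from neighboring local maxima of n."""
    maxima = [i for i in range(1, len(n) - 1) if n[i] > n[i - 1] and n[i] >= n[i + 1]]
    hist = [0.0] * window
    for i in maxima:
        vacant = 1.0                         # probability that closer maxima are empty
        pos = bisect_left(maxima, i + A)     # first maximum with j > i + 146
        while pos < len(maxima) and maxima[pos] < i + A + window:
            j = maxima[pos]
            hist[j - i - A] += n[i] * vacant * n[j]
            vacant *= 1.0 - n[j]
            pos += 1
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist

# Synthetic histogram: three equally strong positions with 5 and 15 bp linkers,
# plus one singleton read that the preprocessing should discard.
counts = [0] * 600
counts[100] = counts[252] = counts[414] = 20
counts[50] = 1
n = start_probability(counts)
P = linker_distribution(n)
print(P[5], P[15])
```

In this synthetic example the two linker lengths are recovered with equal weight, and the isolated read at bp 50 does not contribute.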

Figure 1 demonstrates our procedure in a model system, with preprocessing and rescaling steps skipped since the simulated n profile is noise-free and already properly normalized. Specifically, we use local maxima in the nucleosome starting probability profile [inset of Fig. 1(a)] to obtain Plinker [Fig. 1(b)]. Figure 1(d) shows that the two-body interaction can be reconstructed using Eq. (9), even in the presence of one-body energies with the same periodicity. The reconstruction is facilitated by the presence of potential wells or barriers in the one-body energy profile that are strong enough to create nonuniform density of nearby nucleosomes. To find the one-body energies, we substitute predicted Φ into Eq. (2), which we solve numerically for z [Fig. 1(c)]. Nucleosome occupancies inferred from predicted u and Φ are virtually identical to the exact profile [Fig. 1(a)].

FIG. 1.


(Color online) A model with 10 bp oscillations in both one-body and two-body energies. The two-body interaction is $\Phi(x) = A\cos(2\pi x/10)\,e^{-x/b}$, where A = 5kBT and b = 50 bp. For the one-body potential, 10 bp oscillations with the 0.5kBT amplitude were superimposed onto a smooth energy profile with two −5kBT potential wells separated by 1000 bp. DNA length of 2416 bp was chosen to be able to position 16 nucleosomes with 151 bp repeat length. The occupancy profile (a), the linker length distribution (b), the one-body energy (c), and the two-body interaction (d): exact (solid blue line) and predicted (dashed green line). μ − 〈u〉 = −1kBT in (a)–(d). Inset of (a): probability of starting a nucleosome at a given bp. (e) Average number of nucleosomes 〈Ntot〉 vs μ − 〈u〉. Insets: Occupancy profiles corresponding to three different chemical potentials, computed with Φ. (f) Linker length distributions for three values of 〈Ntot〉 shown as points in (e), with and without two-body interactions.

As the chemical potential is increased, nucleosomes undergo a transition in which their average number goes up in a steplike fashion [Fig. 1(e)] [17]. In contrast to the Φ = 0 case in which linkers are distributed exponentially, two-body interactions lead to the pronounced discretization of linker lengths [Fig. 1(f)]. The first minimum of Φ becomes more dominant as the number of nucleosomes increases, leading to a well-positioned array with 4-bp-long linkers.

We now use Eq. (9) to predict nearest-neighbor interactions from genome-wide nucleosome maps [Fig. 2(a)]. We find that despite significant experiment-to-experiment variations, all two-body potentials have minima within 1–2 bp of 5 + 10m bp, m = 0,1, … [18]. Surprisingly, there are substantial differences between the two in vitro replicates of Kaplan et al. [5], with one replicate exhibiting higher values of Φ due to pronounced depletion of nucleosomes separated by <10 bp. Apparently, chromatin structure can undergo subtle uncontrolled changes from experiment to experiment.
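For comparison with the 5 + 10m bp pattern, the model potential from the caption of Fig. 1, Φ(x) = A cos(2πx/10) e^{−x/b} with A = 5kBT and b = 50 bp, can be evaluated on the integer lattice. The short sketch below locates its local minima; the function names and the restriction to x < 50 are choices made for this illustration.

```python
import math

def model_phi(x, amplitude=5.0, decay=50.0, period=10.0):
    """Phi(x) = A cos(2 pi x / 10) exp(-x / b) in units of kBT (Fig. 1 caption)."""
    return amplitude * math.cos(2 * math.pi * x / period) * math.exp(-x / decay)

def lattice_minima(values):
    """Indices that lie strictly below both of their neighbors."""
    return [x for x in range(1, len(values) - 1)
            if values[x] < values[x - 1] and values[x] < values[x + 1]]

phi = [model_phi(x) for x in range(50)]
minima = lattice_minima(phi)
print(minima)
print([round(phi[x], 2) for x in minima])
```

On the lattice the minima of this model potential sit at x = 5 + 10m, and the exponential envelope makes the first minimum the deepest.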

FIG. 2.


(Color online) (a) Two-body interaction Φ inferred from in vitro maps of nucleosome positions [5,11]. Gray bars indicate consensus positions of the minima. (b) Autocorrelation of nucleosome starting positions in one of the in vitro data sets [11], and of starting positions predicted using sequence-specific one-body energies from the “spatially resolved” model [6], with and without Φ. The two-body potential is from Fig. 1, consistent with the minima of Φ observed in (a). The one-body energies have σ = 0.23kBT. To account for the limited size of the in vitro data set, model output was degraded by randomly removing 1% of predicted nucleosome probabilities.

Two-body interactions are reflected in the autocorrelation of nucleosome starting positions [Fig. 2(b)]. The oscillations in the autocorrelation function are suppressed when nucleosome positions are predicted using a sequence-specific model which neglects two-body interactions [6]. This “spatially resolved” model assigns mono- and dinucleotide energies independently at each position within the 147 bp nucleosomal site and is thus capable of capturing the 10–11 bp periodicity of one-body interactions. We find that the autocorrelation function is much closer to experiment if the two-body potential is included into the model [Fig. 2(b)].
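A minimal sketch of such an autocorrelation calculation is given below. The normalization (mean-subtracted covariance at each lag divided by the variance) is an assumption of this sketch, since the text does not specify the convention used for Fig. 2(b). The synthetic input profile with a 10 bp period and the function name are hypothetical.

```python
import math

def autocorrelation(profile, max_lag):
    """Normalized autocorrelation of a starting-position profile for lags 0..max_lag."""
    size = len(profile)
    mean = sum(profile) / size
    dev = [p - mean for p in profile]
    var = sum(d * d for d in dev) / size
    result = []
    for lag in range(max_lag + 1):
        # average the product over all pairs of positions separated by `lag`
        cov = sum(dev[i] * dev[i + lag] for i in range(size - lag)) / (size - lag)
        result.append(cov / var)
    return result

# Synthetic profile whose start probability oscillates with a 10 bp period.
profile = [1.0 + math.cos(2 * math.pi * i / 10) for i in range(1000)]
acf = autocorrelation(profile, 30)
print([round(acf[lag], 3) for lag in (0, 5, 10, 20)])
```

For a profile with 10 bp periodicity the autocorrelation peaks at lags that are multiples of 10 bp and is most negative half a period away, which is the kind of oscillation discussed in the text.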

Two-body interactions are also essential for reconstructing nucleosome occupancy profiles over transcribed regions (Fig. 3). Sequence-specific energy barriers over NDRs must be low in vitro to account for the lack of occupancy oscillations induced by steric exclusion at 1:1 DNA:histone mass ratio [11]. Even with the low barriers shown in Fig. 3(a), the interaction-free model yields an oscillatory profile which is not observed in the data. The oscillations are suppressed by the two-body potential, and the resulting profile increases toward the center of the gene, in contrast with the pure steric exclusion scenario in which nucleosomes adjacent to the barriers are always the most localized [13]. This behavior is also observed in vivo where the +2 nucleosome is higher than the +1 nucleosome [Fig. 3(b)]. The in vivo barriers are more pronounced to account for additional nucleosome depletion in the NDRs due to effects other than intrinsic histone-DNA interactions. Finally, in agreement with a previous hypothesis [11], a potential well is added to localize the +1 nucleosome in vivo. The well makes the TSS profile asymmetric with respect to the center of the NDR [compare to the more symmetric TTS profile in Fig. 3(b)].

FIG. 3.


(Color online) A minimal model of nucleosome ordering in genic regions. (a) Dashed red and dotted orange lines: average nucleosome occupancy in vitro around TSS and TTS [11]. Solid blue and dash-dot black lines: model predictions with and without Φ from Fig. 1. Both models have the average occupancy of 0.60 (less than the maximum possible occupancy of 0.82 because some histone octamers are not DNA bound). Inset: one-body energy landscape with barrier heights, widths, and shapes adjusted to reproduce observed NDRs. (b) Same as (a), for in vivo nucleosomes (YPD medium) [19]. Φ is from Fig. 1 with A = 7kBT. The log intensities from the microarray were exponentiated and normalized separately for each gene, yielding the average occupancy of 0.70, which was also used in the models.

In summary, our study shows that short-range two-body interactions induced by chromatin fiber formation play a major role in genome-wide nucleosome ordering. We demonstrate that large-scale mononucleosome maps contain evidence of the two-body potential. This potential is more important than intrinsic histone-DNA interactions for predicting 10–11 bp periodicity in genome-wide nucleosome positions and for understanding nucleosome occupancy in transcribed regions. Clearly, two-body interactions should be an integral part of genome-wide models of nucleosome occupancy. Our study also underscores the need for future experiments focused on multinucleosome distributions, which can be analyzed using our exact theory [Eqs. (5) and (6)].

Acknowledgments

This research was supported by the National Institutes of Health (Grant No. HG 004708) and by the Alfred P. Sloan Foundation (A.V.M.).

References
