Proceedings of the National Academy of Sciences of the United States of America. 2013 Jun 19;110(27):10988–10993. doi: 10.1073/pnas.1210887110

Fitness landscape for nucleosome positioning

Donate Weghorn, Michael Lässig
PMCID: PMC3704022  PMID: 23784778

Abstract

Histone–DNA complexes, so-called nucleosomes, are the building blocks of DNA packaging in eukaryotic cells. The histone-binding affinity of a local DNA segment depends on its elastic properties and determines its accessibility within the nucleus, which plays an important role in the regulation of gene expression. Here, we derive a fitness landscape for intergenic DNA segments in yeast as a function of two molecular phenotypes: their elasticity-dependent histone affinity and their coverage with transcription factor binding sites. This landscape reveals substantial selection against nucleosome formation over a wide range of both phenotypes. We use it as the core component of a quantitative evolutionary model for intergenic DNA segments. This model consistently predicts the observed diversity of histone affinities within wild Saccharomyces paradoxus populations, as well as the affinity divergence between neighboring Saccharomyces species. Our analysis establishes histone binding and transcription factor binding as two separable modes of sequence evolution, each of which is a direct target of natural selection.

Keywords: biophysics, nucleosome-depleted regions, evolution of regulation, quantitative traits, inference of selection


The positional organization of nucleosomes in eukaryotic cells is of key importance for the overall chromatin structure and, thus, for the regulation of gene expression (1–3). Nucleosomes form through binding of a histone octamer to a DNA sequence segment of average length 146 base pairs (bp), which wraps around the protein complex (4). Histone-bound DNA segments are interspersed with unbound “linker” segments. Particularly prominent features of this pattern are so-called nucleosome-depleted regions (NDRs). These are extended troughs in occupancy at least ∼100 bp long, primarily located in intergenic DNA. Changes in nucleosome positioning affect the accessibility of local DNA segments for binding interactions with transcription factors and lead to observable changes of gene expression in yeast (3, 5).

Explaining two correlated molecular functions (histone binding and transcriptional regulation) in the same sequence segment may be seen as a chicken-and-egg problem (6–9). Is transcription factor binding the primary function, which displaces nucleosomes to sequence segments in which transcription is neutral or deleterious? Or, conversely, does nucleosome positioning constrain transcriptional interactions? Here, we address this problem by a quantitative evolutionary analysis of yeast genomes. We infer a fitness landscape for intergenic sequence segments that measures selection on their regulatory interactions and on local nucleosome formation. We capture these functions by two molecular phenotypes, the regulatory binding site content and the histone binding affinity, which reflect distinct biophysical characteristics of a DNA segment. The fitness landscape resulting from our analysis shows substantial selection acting jointly on transcriptional interactions and on nucleosome formation. Specifically, we find broad selection against histone binding (that is, in favor of nucleosome depletion) in sequence segments ∼100 bp long, although individual nucleotides within these segments are under only weak selection. Our inference of selection on nucleosome positioning is corroborated by an evolutionary analysis within and across yeast species. We model the evolution of sequence segments by mutations, genetic drift, and selection given by our fitness landscape. This model explains the observed intraspecies diversity as well as the cross-species divergence of nucleosome positioning in a quantitative way. At the end of the paper, we discuss the implications of our findings for the functional and evolutionary relationship between nucleosome positioning and transcriptional regulation and, in a broader context, for the inference of selection on correlated molecular functions.

Our evolutionary analysis is based on established biophysical models that relate the histone binding affinity and the regulatory site content of a DNA segment to its nucleotide sequence. Several mechanisms are known to influence the local probability of nucleosome formation (8). Histone-affine DNA has a specific nucleotide composition that facilitates superhelical turns around the cylinder-shaped octamer (10, 11). In contrast, histone-repelling sequence contains homopolymeric adenine segments on one strand paired with thymine segments on the other strand; these A:T tracts confer a high rigidity to the DNA double strand (12, 13). In addition, competition with other DNA-binding proteins (3, 14, 15), as well as active rearrangement through chromatin remodelers (16, 17), may alter histone binding to DNA. All these factors contribute, to different degrees, to the positioning of nucleosomes in vivo (15). Here, we choose one particular biophysical phenotype, the elasticity-mediated histone binding affinity, to map direct selection on nucleosome formation in yeast intergenic regions. Our finding of broad selection in favor of nucleosome depletion is consistent with the known functional role of NDRs. They reflect stable barriers in the histone binding energy landscape, which constrain the positioning of nucleosomes between them (18–21). To infer regulatory binding sites in the yeast genome, we use standard statistical models of the position-dependent binding energy profile for specific transcription factors (22).

Our findings are consistent with previous results on the evolution of nucleosome positioning. About 70% of interspecific nucleosome architecture changes in yeast are caused by cis effects as opposed to trans-acting factors (23), which supports our inference of a local histone binding phenotype. At the level of sequence evolution, it has been shown that linker regions in yeast coding sequence are more conserved than regions of higher nucleosome occupancy (24, 25), in agreement with a previous analysis of chromosome III promoters (15). More specifically, A:T-loss nucleotide changes are reduced in NDRs compared with high-occupancy regions (26), which is consistent with A:T-rich sequence disfavoring nucleosome formation. Similar signatures of selection acting on nucleotide frequencies also have been found in the human lineage (27). It is important to note, however, that observations of sequence conservation do not distinguish the evolutionary signal of direct selection acting on a specific function, in this case nucleosome formation or transcriptional interactions, from selection acting on other, potentially unrelated functions encoded in the same sequence segment. This is why we base our study on biophysically grounded models: The statistics of a biophysical trait associated with a specific function will prove to be less confounded by apparent selection than summary sequence measures. Our inference method can be applied to other quantitative traits with a large sequence target, even if individual nucleotide changes are under only weak selection.

Results

Phenotypes of Histone Binding and of Transcription Factor Binding.

Wrapping DNA around histones necessitates specific elastic deformations of its double strand. We evaluate the energy cost of these deformations using the model of references (20, 28). The local energy cost depends on sequence content, because different nucleotide triplets have different a priori deformations in the unbound state. Given the genomic landscape of energy costs, the resulting mean nucleosome occupancy ω of a given sequence segment is determined by equilibrium thermodynamics. We call this phenotype the histone binding affinity of the segment. Our analysis uses the thermodynamic model and algorithm of references (20, 28) (for details, see Methods). This model successfully predicts the nucleosome positioning observed under in vitro conditions, that is, without the competitive binding of transcription factors (20). As expected, the ensemble average of ω decreases with increasing energy cost and increases with increasing histone density (or equivalently, with the associated chemical potential) (Fig. S1). For our genomic analysis, we use a chemical potential that reproduces the genome-wide occupancy average in vivo of about 80%. With these settings, we take ω as the best computable phenotype to measure the elasticity-mediated histone binding affinity of a given sequence segment. By definition, this phenotype is independent of the regulatory interactions encoded in that segment. We measure these interactions by an independent phenotype, n, given by the number of annotated transcription factor binding sites (Methods).

We can relate these phenotypes to the in vivo nucleosome positioning in Saccharomyces cerevisiae, which was measured in (3). In Fig. 1A, we evaluate the mean in vivo occupancy score Inline graphic for intergenic sequence segments of length 100 bp. We find a strong dependence on both phenotypes: Inline graphic is an increasing function of ω and a decreasing function of n. We conclude that DNA rigidity and transcription factor binding jointly contribute to nucleosome depletion in living yeast cells. This motivates our joint analysis of selection on exactly these phenotypes, to which we now turn.

Fig. 1.


In vivo nucleosome occupancy and fitness for yeast intergenic sequence segments. (A) The mean nucleosome occupancy, Inline graphic, is plotted against two molecular phenotypes: the elasticity-mediated histone binding affinity, ω, and the number of transcription factor binding sites, n. Occupancy data in S. cerevisiae are taken from (3) and shown for nonoverlapping intergenic sequence segments of length 100 bp. Data points not shown reflect insufficient phenotype counts Inline graphic. (B) The scaled fitness landscape Inline graphic inferred from the genomic phenotype distribution (by Eq. 1). This landscape shows that direct selection acts on both phenotypes and establishes sequence-dictated nucleosome positioning as a primary mode of the evolution of intergenic DNA.

Phenotype-Dependent Fitness Landscape.

To infer a map between phenotype and fitness, we compare the genomic distribution of phenotype value pairs, Inline graphic, with the corresponding distribution Inline graphic evaluated in a suitable null model. To obtain Inline graphic, we construct a tiling of the yeast genome into nonoverlapping segments of fixed length Inline graphic bp. This procedure is designed to avoid overcounting in longer NDRs and to make the phenotype data comparable between segments (for details, see Methods). The resulting distribution Inline graphic for intergenic sequence in S. cerevisiae is shown in Fig. S2A. As a genomic null model, we use uncorrelated random sequence, which implies that nucleotide triplets conferring specific local elasticity properties are scrambled in the null model. The resulting phenotypic null distribution may be approximated as a product, Inline graphic. We obtain the marginal distribution Inline graphic using the same tiling procedure as in the actual yeast genome (which ensures that our results are insensitive to its bioinformatic details). This distribution is shown as a black line in Fig. 2. The marginal distribution Inline graphic can even be evaluated analytically, using the information content (or relative entropy) of the binding motifs of individual transcription factors. Details on both components of the null model are given in SI Text. The resulting joint distribution Inline graphic is shown in Fig. S2B.
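The two ingredients of this null model can be illustrated with a minimal Python sketch. It assumes independent nucleotides drawn from fixed base frequencies and builds the joint null as the product of two marginal distributions. The function names and the base frequencies used in the example are illustrative assumptions, not values taken from the paper; the affinity model that maps a random segment to ω is not reproduced here.

```python
import random

def random_segment(length, base_freqs, rng):
    """Draw an uncorrelated random sequence: each nucleotide is sampled
    independently, so any triplet-level correlations are scrambled."""
    bases = list(base_freqs)
    weights = [base_freqs[b] for b in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))

def product_null(p_omega, p_n):
    """Joint null distribution approximated as a product of marginals.
    p_omega[i] is the null probability of affinity bin i,
    p_n[j] the null probability of j binding sites."""
    return [[pw * pn for pn in p_n] for pw in p_omega]
```

For example, `random_segment(100, {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}, random.Random(1))` yields one 100-bp null segment; scoring many such segments with the affinity model would give the marginal null distribution of ω.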

Fig. 2.


Selection against nucleosome formation. Distribution of histone binding affinity for nonoverlapping intergenic segments of length 100 bp in S. cerevisiae, Inline graphic (purple ●), compared with the analogous distribution from random sequence Inline graphic (solid black line). Both distributions are evaluated in bins of width 0.05. The effective scaled fitness landscape for histone binding affinity, Inline graphic (red line), is the log-likelihood of the distributions Inline graphic and Inline graphic.

We now can infer the scaled phenotype-fitness map $F(\omega, n)$ as the log-likelihood score of the genomic phenotype distribution $Q(\omega, n)$ and the null distribution $P_0(\omega, n)$ (29, 30):

$$F(\omega, n) = \log \frac{Q(\omega, n)}{P_0(\omega, n)} + \text{const.} \qquad [1]$$

All fitness values on the left-hand side are measured in units of Inline graphic, where N is the effective population size. This landscape is defined up to an arbitrary constant, because only fitness differences (selection coefficients) enter the evolution of phenotype frequencies. Our inference of selection involves several assumptions. First, Eq. 1 is valid if nucleosome positioning is at an evolutionary equilibrium of mutations, genetic drift, and selection. This assumption is corroborated by our cross-species analysis described below. Second, the landscape Inline graphic is inferred from all intergenic sequence segments. The underlying uniformity assumption may be relaxed: If the fraction of segments under selection against histone binding is anywhere above ∼20%, our inference of selection essentially remains unchanged in the regime of reduced affinity, Inline graphic (SI Text and Fig. S3A). Similarly, our results are insensitive to variations of the tiling length Inline graphic within the length range of functional NDRs, as shown in Fig. S3B.
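As a rough illustration of this inference step, the following Python sketch computes a per-bin log ratio between an observed sample of affinity values and a null sample. It is a one-dimensional simplification under stated assumptions: the bin width of 0.05 follows the Fig. 2 legend, whereas the function names and the `min_count` cutoff for sparse bins are hypothetical choices made for this example.

```python
import math

def histogram(values, width=0.05, lo=0.0, hi=1.0):
    """Count values in equal-width bins on [lo, hi]; hi falls in the last bin."""
    n_bins = round((hi - lo) / width)
    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts

def scaled_fitness(observed, null, width=0.05, min_count=5):
    """Per-bin log ratio of observed to null frequencies.
    Bins with fewer than min_count entries in either sample yield None.
    The result is defined only up to an additive constant."""
    q = histogram(observed, width)
    p = histogram(null, width)
    q_tot, p_tot = sum(q), sum(p)
    out = []
    for qc, pc in zip(q, p):
        if qc >= min_count and pc >= min_count:
            out.append(math.log((qc / q_tot) / (pc / p_tot)))
        else:
            out.append(None)
    return out
```

Bins in which the observed sample is enriched relative to the null receive positive values, and depleted bins receive negative values. Applying the function to two identical samples returns zero in every populated bin, which is a quick sanity check.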

The scaled fitness landscape Inline graphic inferred for S. cerevisiae intergenic sequence is shown in Fig. 1B. It reveals substantial selection on both histone binding affinity and transcriptional regulation: We find scaled fitness differences Inline graphic in our set of intergenic segments. Importantly, the selection on histone binding affinity is a primary effect; that is, the overrepresentation of NDRs in the yeast genome cannot be explained by direct selection on regulatory site content alone. Our finding of substantial direct selection on ω gives an a posteriori justification for our choice of this phenotype. Before we discuss the implications of the inferred fitness landscape, we test its predictions for evolution of sequence-dictated nucleosome positioning within and across species.

Selection Against Nucleosome Formation.

As shown in Fig. 1B, the selection on histone binding affinity does not depend strongly on the regulatory phenotype n. Therefore, it can be evaluated in good approximation from an effective fitness landscape for histone binding affinity, Inline graphic, which is most convenient for our subsequent evolutionary analysis. This landscape is inferred from the marginal distributions Inline graphic and Inline graphic by an equilibrium relation analogous to Eq. 1, and is shown in Fig. 2. Again, the function Inline graphic is insensitive to the fraction of segments under selection and to the choice of tiling length (SI Text and Fig. S3).

The effective fitness landscape shows that selection in favor of nucleosome depletion acts across a broad range of affinity values, beyond what commonly would be considered a nucleosome-free region. This implies that there is predominantly directional selection on affinity changes,

$$\Delta F \approx -c\,\Delta\omega, \qquad [2]$$

with an average proportionality constant Inline graphic obtained from a linear fit to the function Inline graphic in the range Inline graphic. Affinity changes of Inline graphic are under substantial selection, i.e., they lead to fitness changes of magnitude Inline graphic. However, most point mutations confer smaller affinity changes and are only weakly selected. The efficacy of selection on nucleosome formation is not caused by large effects of single mutations, but by the multitude of elasticity-changing mutations in an extended sequence segment.
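The linear fit described here can be sketched in a few lines of Python. The code assumes that the effective landscape is available as a list of bin centers and fitness values (with `None` marking bins without data) and returns the slope with its sign flipped, so that a positive constant corresponds to selection against higher affinity. The function names and the fit range passed by the caller are assumptions for illustration; the paper's fit range and fitted value are not reproduced.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def selection_slope(omega_centers, fitness, lo, hi):
    """Proportionality constant of directional selection on affinity:
    minus the fitted slope of fitness versus omega within [lo, hi]."""
    pts = [(w, f) for w, f in zip(omega_centers, fitness)
           if f is not None and lo <= w <= hi]
    slope, _ = linear_fit([w for w, _ in pts], [f for _, f in pts])
    return -slope
```

With this constant in hand, the scaled fitness effect of a mutation that changes the affinity by a given amount is approximated by the product of the constant and the affinity change, with a negative sign for affinity gains.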

Selection on Affinity Polymorphisms.

We now show that the fitness landscape of Eq. 2 correctly predicts the frequency bias of intergenic single-nucleotide polymorphisms (SNPs) that is related to selection against nucleosome formation. From the Saccharomyces Genome Resequencing Project, we obtained the genomes of 35 Saccharomyces paradoxus isolates and their alignments (Methods). We choose this species for the analysis because it has a simpler population structure than S. cerevisiae (31). We analyze SNPs in nonoverlapping intergenic NDRs with Inline graphic identified on the S. paradoxus reference genome. To determine the SNP allele frequency as a function of the associated phenotypic effect, we compute the average binding affinity in the two subpopulations carrying either allele. In this way, we obtain a polarized phenotype difference Inline graphic, where Inline graphic denotes the larger and Inline graphic the smaller of the two subpopulation averages. Under selection against histone binding, we expect a decrease in the average frequency of the high-affinity allele, Inline graphic, with increasing deleterious effect. Fig. 3 shows the data points Inline graphic and the resulting average frequencies in bins of the affinity difference. These data permit a linear fit of Inline graphic as a function of Inline graphic,
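The polarization step can be made concrete with a short Python sketch. For one SNP it takes, per isolate, the allele carried (coded 0 or 1) and the binding affinity of the surrounding segment, and returns the affinity difference between the two allele groups together with the sample frequency of the high-affinity allele. The data layout and the function names are assumptions made for this example; the bin size of 0.05 follows the Fig. 3 legend.

```python
def polarize_snp(alleles, affinities):
    """alleles: 0/1 per isolate; affinities: omega of the segment per isolate.
    Returns (affinity difference, frequency of the high-affinity allele)."""
    groups = {0: [], 1: []}
    for a, w in zip(alleles, affinities):
        groups[a].append(w)
    if not groups[0] or not groups[1]:
        raise ValueError("site is not polymorphic in this sample")
    means = {a: sum(ws) / len(ws) for a, ws in groups.items()}
    high = 0 if means[0] > means[1] else 1
    delta = means[high] - means[1 - high]
    freq = len(groups[high]) / len(alleles)
    return delta, freq

def binned_mean_frequency(points, width=0.05):
    """Average high-affinity allele frequency per bin of the affinity difference.
    points: iterable of (delta, freq); keys of the result are bin indices."""
    bins = {}
    for delta, freq in points:
        bins.setdefault(int(delta / width), []).append(freq)
    return {k: sum(v) / len(v) for k, v in sorted(bins.items())}
```

Because the alleles are polarized by their phenotypic effect rather than by ancestral state, selection on unrelated sequence features is expected to average out in the binned frequencies.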

$$\langle x \rangle \approx \tfrac{1}{2} - \lambda\,\Delta\omega, \qquad [3]$$

with a proportionality constant Inline graphic. On the other hand, our fitness landscape predicts the scaled selection coefficient Inline graphic for each of these SNPs according to Eq. 2. Assuming approximate linkage equilibrium, the classic equilibrium allele frequency distribution Inline graphic then determines the expected frequency of the deleterious allele, Inline graphic (32) (Methods). To leading order, we obtain a linear dependence as in Eq. 3 with a predicted value Inline graphic. This is in good agreement with the observed value for S. paradoxus polymorphisms. Here, we treat the S. paradoxus isolates as a mixed population. Performing this analysis separately for the three major subpopulations in the sample (31), we find that population structure has only a minor influence on the signal of selection (Fig. S4).

Fig. 3.


Selection on SNPs. The data points show the frequency of the high-affinity allele, Inline graphic, as a function of the phenotypic effect (i.e., the difference Inline graphic between both alleles) for SNPs in intergenic S. paradoxus NDRs with Inline graphic (green dots, with size indicating the number of SNPs contributing to the data point). From these data, we evaluated the effect-dependent average frequency Inline graphic (in Inline graphic-bins of size 0.05; green dots with error bars, joined by solid green line). Its approximately linear decrease follows Eq. 3 (least-squares fit, dashed green line) and shows that there is weak selection against alleles of higher affinity. The prediction from the fitness landscape Inline graphic (dashed red line; see text) is in good agreement with the data. The expectation under neutrality is a constant, Inline graphic (dashed blue line), and is inconsistent with the data.

Our polymorphism analysis establishes a quantitative inference of selection on NDRs on a microevolutionary timescale, despite the fact that individual mutations are under only weak to moderate selection. Importantly, apparent selection acting on sequence traits other than those relevant to nucleosome depletion is generally random with respect to the phenotype polarization. Therefore, the expectation value of the frequency of the deleterious allele as a function of the selection coefficient, Inline graphic, is affected only to a small extent by sequence conservation, say, due to the presence of transcription factor binding sites.

Conservation of Histone Binding Affinity and Equilibrium.

Our equilibrium theory of nucleosome positioning makes a definite prediction for cross-species evolution: The phenotype distribution Inline graphic and, hence, the number of NDRs below a given affinity threshold are conserved. Fig. 4A compares the genomic distributions Inline graphic for S. cerevisiae and S. paradoxus intergenic regions. These distributions indeed are strikingly similar between the two species. We can compare this conservation with simulated neutral evolution of an ensemble of sequence segments with the S. cerevisiae distribution Inline graphic as the initial condition (Methods and SI Text). Already over the distance between S. cerevisiae and S. paradoxus, the neutrally evolved sequences show a significant decrease in low-affinity counts, which is inconsistent with the data. For example, we obtain a conserved number of about Inline graphic nonoverlapping intergenic NDRs with length 100 bp and Inline graphic in the actual S. cerevisiae and S. paradoxus genomes. In contrast, the count of NDRs with the same characteristics drops to about 980 for simulated neutral evolution over the evolutionary distance between S. cerevisiae and S. paradoxus, and to 170 at neutral equilibrium. Similar results are obtained in a three-species comparison of S. cerevisiae, S. paradoxus, and Saccharomyces bayanus.

Fig. 4.


Cross-species evolution of histone binding affinity. (A) Distribution of histone binding affinity, Inline graphic, for intergenic segments of length 100 bp with Inline graphic in S. paradoxus (green ●) and in S. cerevisiae (purple ●, same as Fig. 2). These distributions are very similar, which is consistent with evolutionary equilibrium under selection given by the fitness landscape Inline graphic. In contrast, simulated neutral evolution (blue ●) already leads to a significant reduction of low-affinity counts over the same evolutionary distance, and would approach the neutral equilibrium distribution Inline graphic (black line, same as Fig. 2) in the long-time limit. (B) Cross-species distribution of affinity pairs Inline graphic for NDRs in S. cerevisiae and their aligned sequences in S. paradoxus (gray contour areas). The conditional average (green line) and standard deviation (green bars) of Inline graphic is plotted as a function of Inline graphic. We compare these data with the conditional distributions Inline graphic for simulated evolution in the fitness landscape Inline graphic and under neutrality (average, red and blue lines; standard deviation, red and blue bars). The cross-species data are consistent with evolution under directional selection against nucleosome formation. At the same time, the near-neutral standard deviation shows the variability of cross-species affinity evolution under this fitness model.

The observed cross-species conservation of affinity distribution Inline graphic and NDR number corroborates the assumption of evolutionary equilibrium underlying our analysis. The equilibrium state is characterized by detailed balance: Between two species, the number of genome segments increasing in affinity above a given threshold equals the number of segments decreasing below the same threshold. As we show below, this turnover describes the occupancy variability of individual NDRs between species.

To test the predictions of our fitness model for the divergence statistics of histone binding affinity, we mapped the set of intergenic NDR segments with Inline graphic in S. cerevisiae onto their aligned segments in S. paradoxus (Methods and SI Text). Fig. 4B shows the contour lines and binned averages of the resulting scatter plot Inline graphic. These pairs have lower mean affinity values in S. cerevisiae compared with S. paradoxus. This merely reflects our choice of base species (the opposite effect is observed if the alignment is constructed from a base set of S. paradoxus NDRs).

We can compare the actual process with in silico evolution under selection, using a Wright–Fisher simulation of the S. cerevisiae NDR sequences in the fitness landscape Inline graphic (for details, see Methods and SI Text). Fig. 4B shows the binned average and standard deviation of the resulting conditional distribution Inline graphic for cross-species phenotype evolution. We find both quantities to be in quantitative agreement with the observed divergence statistics between S. cerevisiae and S. paradoxus. We conclude that our fitness landscape captures selection in favor of nucleosome depletion also over longer evolutionary times.

We also can compare the cross-species data to simulations of neutral evolution. Across the whole range of affinity values on S. cerevisiae NDRs, neutral evolution leads to an average affinity gain—i.e., an average loss of NDR function—that is inconsistent with the observed process. At the same time, the standard deviation of the cross-species affinity change is similar to the neutral value; i.e., the fitness landscape does not strongly constrain phenotype variability. This is in accordance with previous findings showing a high variance across loci in the divergence of both NDR occupancy and A:T enrichment (3).

Discussion

We have inferred a phenotype-fitness map Inline graphic for yeast intergenic sequence segments, which measures selection depending on histone binding affinity and regulatory site content (Fig. 1B). This map offers a quantitative solution to the chicken-and-egg problem posed in the introduction: Can we rank nucleosome positioning and transcriptional regulation with respect to their selective effects on intergenic sequence? As shown in Fig. 1B, fitness has a genuinely two-dimensional phenotype target: there are two chickens. Histone binding and transcription factor binding are separable primary modes of the evolution of intergenic DNA, subject to direct selection of comparable strength. The selection on histone binding spans an extended set of nucleosome-depleted intergenic segments, which have affinity values up to above 50%. This result contrasts with the merely passive role of DNA methylation that has been inferred from cell-type specific variations of the methylation pattern in human and mouse (33, 34).

Direct selection on nucleosome affinity has an important biological consequence. It establishes a set of nucleosome-depleted regions that are earmarked for interactions with transcription factors. The reduced nucleosome affinity not only increases the equilibrium coverage with transcription factors, but also may speed up the search kinetics of factor molecules toward their binding sites. Because these effects are largely independent of the actual coverage with binding sites, they facilitate binding site turnover and the adaptive formation of new sites. At the same time, the directional selection against histone binding given by our fitness landscape does not favor a specific affinity value, which is consistent with the observed cross-species variability of the affinity phenotype. This may suggest a two-tier model of selection on nucleosome-depleted intergenic regions: Elasticity-mediated directional selection broadly reduces nucleosome coverage, whereas balancing selection jointly tunes nucleosome and transcription factor coverage to gene-specific values.

The phenotypes used in this paper, histone binding affinity and regulatory site content, are distilled from the underlying cellular biophysics. A phenotype-based inference of selection is particularly relevant for histone binding, a quantitative trait that has extended (>100 bp) sequence targets with small phenotypic effects of individual mutations. Only by mapping nucleotide changes onto an affinity phenotype can we infer substantial aggregate selection against nucleosome formation. However, given the complexity of the molecular machinery of transcriptional regulation and chromatin organization, our analysis in terms of just two phenotypes is necessarily incomplete. For example, histone binding in vivo is expected to depend on additional sequence features besides our elasticity-mediated binding phenotype (10). Integrating additional phenotypes into the inference of selection leads to a higher-dimensional fitness landscape, which can be analyzed for its principal directions of selection. The projection on the two phenotypes used in this paper likely will lead to an underestimate but will not generate a spurious signal of selection. A more comprehensive analysis can also address fitness interactions or interference selection; our results suggest an avenue to infer these effects by a phenotype-based approach.

From a broader perspective, this paper is a case study analyzing quantitative traits that are encoded in overlapping sequence and represent coupled molecular functions. This scenario is at some distance from idealized models of population genetics and quantitative genetics but probably is typical—at least in the densely packed genomes of prokaryotes and unicellular eukaryotes. We have shown that a joint phenotype-fitness map can disentangle selective effects on such functions, i.e., distinguish direct from apparent selection. We expect this method to be applicable to a broader class of complex molecular functions, for which we can measure or infer at least some key phenotypes.

Methods

Histone Binding Affinity.

The biophysical model for histone binding underlying our analysis follows (20, 28). This model defines a histone-binding free energy landscape Inline graphic as a function of the 5′ genomic coordinate r of a nucleosome. The free energy of a DNA sequence segment Inline graphic is given by

graphic file with name pnas.1210887110uneq4.jpg

where Inline graphic denote trinucleotide subsegments; Inline graphic are the roll, twist, and tilt deformations in the nucleosome state, Inline graphic are the intrinsic deformations in the unbound state (35), Inline graphic denotes the corresponding elastic constants, and we use a core binding length Inline graphic bp (28). The statistics of nucleosome positioning is then given by standard equilibrium thermodynamics. It may be derived from the grand canonical partition function

graphic file with name pnas.1210887110uneq5.jpg

with the no-overlap constraint Inline graphic. The partition function depends on the temperature via Inline graphic and on the chemical potential η, which are adjusted to in vivo conditions. This determines the expected single-nucleotide nucleosome occupancies (36),

graphic file with name pnas.1210887110uneq6.jpg

and the expected mean occupancy

graphic file with name pnas.1210887110uneq7.jpg

over sequence segments of length Inline graphic. The dependence of ω on local binding energies and on the chemical potential is shown in Fig. S1.
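The equilibrium statistics of nonoverlapping nucleosomes on a one-dimensional lattice can be evaluated by a standard forward-backward recursion over partial partition functions. The Python sketch below illustrates that generic technique; it is not the algorithm of references (20, 28). It assumes a statistical weight exp[-β(E(r) - η)] for a nucleosome whose 5′ end sits at position r, and the names `occupancy_profile`, `mean_occupancy`, `wrap_len`, and `chem_pot` are hypothetical.

```python
import math

def occupancy_profile(energies, wrap_len, chem_pot, beta=1.0):
    """Per-nucleotide nucleosome occupancy for hard rods of length wrap_len.
    energies[r] is the binding free energy for a nucleosome starting at r
    (one entry per allowed start position). Plain floats are used, so very
    long sequences would need log-space arithmetic to avoid overflow."""
    n_starts = len(energies)
    n = n_starts + wrap_len - 1                      # sequence length
    w = [math.exp(-beta * (e - chem_pot)) for e in energies]
    fwd = [1.0] * (n + 1)                            # configurations on [0, i)
    for i in range(1, n + 1):
        fwd[i] = fwd[i - 1]
        if i >= wrap_len:
            fwd[i] += w[i - wrap_len] * fwd[i - wrap_len]
    bwd = [1.0] * (n + 1)                            # configurations on [i, n)
    for i in range(n - 1, -1, -1):
        bwd[i] = bwd[i + 1]
        if i + wrap_len <= n:
            bwd[i] += w[i] * bwd[i + wrap_len]
    z = fwd[n]                                       # grand partition function
    occ = [0.0] * n
    for r in range(n_starts):
        p_start = fwd[r] * w[r] * bwd[r + wrap_len] / z
        for i in range(r, r + wrap_len):
            occ[i] += p_start
    return occ

def mean_occupancy(occ, start, length):
    """Average occupancy over a segment, i.e. the affinity phenotype omega."""
    return sum(occ[start:start + length]) / length
```

In this convention, raising the chemical potential increases the mean occupancy and raising the local energy cost lowers it, in line with the dependence described for Fig. S1.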

Data Analysis.

We used genomic sequences and their alignments from the University of California, Santa Cruz (UCSC) Genome Browser (sacCer3) for the interspecific analysis of S. cerevisiae and S. paradoxus. Up to a threshold, insertions and deletions were corrected to exclude alignment uncertainties. This procedure did not affect our cross-species analysis (for details, see SI Text and Fig. S5). The resulting total sequence length was Inline graphic bp, with Inline graphic bp in intergenic regions (37). The second dataset, obtained from the Saccharomyces Genome Resequencing Project, contains aligned genomes of 35 S. paradoxus strains, including SNPs. This dataset has a well-separable substructure (31). To control for demographic effects, we partitioned this dataset into three groups (European, Far Eastern, and American). We obtained annotated transcription factor binding sites on S. cerevisiae from the SwissRegulon Portal (Feb 2012) (22). Only nonoverlapping binding sites with a posterior probability >0.5 were used. To identify low-occupancy regions predicted by our affinity model, we constructed a tiling of the genome into nonoverlapping segments of fixed length Inline graphic bp, using a dynamic programming algorithm with an upper bound of 0.95 of the predicted mean nucleosome occupancy ω in each individual segment. Experimental in vivo nucleosome occupancy scores for S. cerevisiae were obtained from the Gene Expression Omnibus database (accession series GSE22211) (3) and processed to reduce the effects of measurement uncertainties (SI Text).

Polymorphism Statistics.

To predict the expected deleterious allele frequency given by the fitness landscape, we use the equilibrium allele frequency spectrum for a two-allele locus, Inline graphic, where Inline graphic is the scaled selection coefficient, Inline graphic is the scaled neutral mutation rate, and Inline graphic is a normalization factor. From this distribution, we determine the allele frequency spectrum for polymorphic loci, Inline graphic, in a set of m isolates by binomial sampling Inline graphic. This distribution produces an average frequency of the deleterious allele, Inline graphic, with a proportionality constant Inline graphic (for Inline graphic).
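The sampling calculation can be illustrated numerically. Because the formulas are not reproduced in this text, the sketch below assumes the standard Wright form of the two-allele equilibrium spectrum, Q(x) ∝ [x(1 − x)]^(μ − 1) exp(σx), with x the frequency of the allele carrying scaled selection coefficient σ. Under that assumption, the binomially sampled counts have a closed form in terms of the Beta function and Kummer's confluent hypergeometric function. All function names are hypothetical.

```python
import math

def hyp1f1(a, c, z, tol=1e-15, max_terms=10000):
    """Kummer's function 1F1(a; c; z) from its power series (a, c > 0)."""
    if z < 0:
        # Kummer transformation keeps all series terms positive.
        return math.exp(z) * hyp1f1(c - a, c, -z, tol, max_terms)
    term, total = 1.0, 1.0
    for n in range(max_terms):
        term *= (a + n) / (c + n) * z / (n + 1)
        total += term
        if abs(term) < tol * abs(total):
            break
    return total

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def polymorphic_spectrum(m, sigma, mu):
    """Distribution of allele counts k = 1 .. m-1 in a sample of m isolates.

    Assumed population spectrum: Q(x) ~ [x(1-x)]**(mu-1) * exp(sigma*x).
    Integrating the binomial sampling probability against Q(x) gives
    C(m,k) * B(k+mu, m-k+mu) * 1F1(k+mu; m+2mu; sigma), up to a constant
    that cancels when conditioning on polymorphic sites.
    """
    weights = []
    for k in range(1, m):
        a, b = k + mu, m - k + mu
        log_w = math.log(math.comb(m, k)) + log_beta(a, b)
        weights.append(math.exp(log_w) * hyp1f1(a, a + b, sigma))
    total = sum(weights)
    return [w / total for w in weights]

def mean_frequency(m, sigma, mu):
    """Average sample frequency k/m of the selected allele at polymorphic sites."""
    spectrum = polymorphic_spectrum(m, sigma, mu)
    return sum(k * p for k, p in zip(range(1, m), spectrum)) / m

# A deleterious allele (sigma < 0) is shifted below frequency 1/2.
for sigma in (0.0, -1.0, -5.0):
    print(sigma, round(mean_frequency(12, sigma, 0.05), 4))
```

The sample size of 12 and the mutation rate of 0.05 are arbitrary illustration values, not parameters taken from the analysis.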

Modeling Sequence Evolution.

We use a Wright–Fisher simulation for a population of NDR sequences evolving under mutations, genetic drift, and selection given by the fitness landscape Inline graphic. The evolutionary time for the simulation of cross-species evolution is chosen so that the average sequence divergence in the set of predicted NDRs equals the observed value of 13%. Simulations of neutral evolution use the same model, but without selection. More details are given in SI Text and Fig. S6.
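A Wright–Fisher process of this kind can be sketched in a few lines. This is a generic illustration, not the simulation code used for the analysis: the fitness function `toy_fitness` is a hypothetical stand-in for the fitness landscape, fitness is assumed to enter as a sampling weight exp(F), and the population size, mutation rate, and number of generations are arbitrary.

```python
import math
import random

ALPHABET = "ACGT"

def mutate(seq, u, rng):
    """Replace each base by a different one with probability u."""
    bases = list(seq)
    for i, base in enumerate(bases):
        if rng.random() < u:
            bases[i] = rng.choice([c for c in ALPHABET if c != base])
    return "".join(bases)

def wright_fisher(population, fitness, u, generations, rng):
    """Evolve a fixed-size population under mutation, selection, and drift.

    Assumption: `fitness(seq)` returns a log-fitness F, so that parents
    are sampled with weight exp(F). Passing fitness=None gives neutral
    evolution under the same mutation and sampling scheme.
    """
    size = len(population)
    for _ in range(generations):
        mutated = [mutate(seq, u, rng) for seq in population]
        if fitness is None:
            weights = None
        else:
            weights = [math.exp(fitness(seq)) for seq in mutated]
        # Multinomial resampling implements genetic drift.
        population = rng.choices(mutated, weights=weights, k=size)
    return population

def mean_divergence(population, ancestor):
    """Average fraction of sites that differ from the ancestral sequence."""
    total = sum(sum(a != b for a, b in zip(seq, ancestor)) for seq in population)
    return total / (len(population) * len(ancestor))

def toy_fitness(seq):
    """Hypothetical placeholder: a small penalty per G or C base."""
    return -0.1 * (seq.count("G") + seq.count("C"))

rng = random.Random(1)
ancestor = "ACGT" * 10
selected = wright_fisher([ancestor] * 50, toy_fitness, 0.005, 100, rng)
neutral = wright_fisher([ancestor] * 50, None, 0.005, 100, rng)
print(mean_divergence(selected, ancestor), mean_divergence(neutral, ancestor))
```

To mimic the calibration described above, one could keep adding generations until `mean_divergence` reaches the target value, then compare the selected and neutral runs at that time point.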

Acknowledgments

We thank Alain Arneodo for providing the sequence-based algorithm to compute histone binding energies and nucleosome occupancy (28) and Stephan Schiffels for kindly making his Wright–Fisher evolution model algorithm available. We are also grateful for stimulating discussions with Ville Mustonen. This work was supported by Deutsche Forschungsgemeinschaft Grant SFB 680, by German Federal Ministry of Education and Research Grant 0315893-Sybacol, and in part by the National Science Foundation (NSF) under Grant NSF PHY05-51164 during a visit to the Kavli Institute for Theoretical Physics (Santa Barbara, CA).

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1210887110/-/DCSupplemental.

References

  • 1. Lee W, et al. A high-resolution atlas of nucleosome occupancy in yeast. Nat Genet. 2007;39(10):1235–1244. doi: 10.1038/ng2117.
  • 2. Bai L, Morozov AV. Gene regulation by nucleosome positioning. Trends Genet. 2010;26(11):476–483. doi: 10.1016/j.tig.2010.08.003.
  • 3. Tsankov AM, Thompson DA, Socha A, Regev A, Rando OJ. The role of nucleosome positioning in the evolution of gene regulation. PLoS Biol. 2010;8(7):e1000414. doi: 10.1371/journal.pbio.1000414.
  • 4. Luger K, Mäder AW, Richmond RK, Sargent DF, Richmond TJ. Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature. 1997;389(6648):251–260. doi: 10.1038/38444.
  • 5. Field Y, et al. Gene expression divergence in yeast is coupled to evolution of DNA-encoded nucleosome organization. Nat Genet. 2009;41(4):438–445. doi: 10.1038/ng.324.
  • 6. Shivaswamy S, et al. Dynamic remodeling of individual nucleosomes across a eukaryotic genome in response to transcriptional perturbation. PLoS Biol. 2008;6(3):e65. doi: 10.1371/journal.pbio.0060065.
  • 7. Jiang C, Pugh BF. Nucleosome positioning and gene regulation: Advances through genomics. Nat Rev Genet. 2009;10(3):161–172. doi: 10.1038/nrg2522.
  • 8. Radman-Livaja M, Rando OJ. Nucleosome positioning: How is it established, and why does it matter? Dev Biol. 2010;339(2):258–266. doi: 10.1016/j.ydbio.2009.06.012.
  • 9. Swamy KBS, Chu W-Y, Wang C-Y, Tsai H-K, Wang D. Evidence of association between nucleosome occupancy and the evolution of transcription factor binding sites in yeast. BMC Evol Biol. 2011;11:150. doi: 10.1186/1471-2148-11-150.
  • 10. Segal E, et al. A genomic code for nucleosome positioning. Nature. 2006;442(7104):772–778. doi: 10.1038/nature04979.
  • 11. Thåström A, et al. Sequence motifs and free energies of selected natural and non-natural nucleosome positioning DNA sequences. J Mol Biol. 1999;288(2):213–229. doi: 10.1006/jmbi.1999.2686.
  • 12. Widom J. Role of DNA sequence in nucleosome stability and dynamics. Q Rev Biophys. 2001;34(3):269–324. doi: 10.1017/s0033583501003699.
  • 13. Segal E, Widom J. Poly(dA:dT) tracts: Major determinants of nucleosome organization. Curr Opin Struct Biol. 2009;19(1):65–71. doi: 10.1016/j.sbi.2009.01.004.
  • 14. Schones DE, et al. Dynamic regulation of nucleosome positioning in the human genome. Cell. 2008;132(5):887–898. doi: 10.1016/j.cell.2008.02.022.
  • 15. Yuan G-C, et al. Genome-scale identification of nucleosome positions in S. cerevisiae. Science. 2005;309(5734):626–630. doi: 10.1126/science.1112178.
  • 16. Cairns BR. The logic of chromatin architecture and remodelling at promoters. Nature. 2009;461(7261):193–198. doi: 10.1038/nature08450.
  • 17. Whitehouse I, Rando OJ, Delrow J, Tsukiyama T. Chromatin remodelling at promoters suppresses antisense transcription. Nature. 2007;450(7172):1031–1035. doi: 10.1038/nature06391.
  • 18. Kornberg RD, Stryer L. Statistical distributions of nucleosomes: Nonrandom locations by a stochastic mechanism. Nucleic Acids Res. 1988;16(14A):6677–6690. doi: 10.1093/nar/16.14.6677.
  • 19. Mavrich TN, et al. A barrier nucleosome model for statistical positioning of nucleosomes throughout the yeast genome. Genome Res. 2008;18(7):1073–1083. doi: 10.1101/gr.078261.108.
  • 20. Milani P, et al. Nucleosome positioning by genomic excluding-energy barriers. Proc Natl Acad Sci USA. 2009;106(52):22257–22262. doi: 10.1073/pnas.0909511106.
  • 21. Möbius W, Gerland U. Quantitative test of the barrier nucleosome model for statistical positioning of nucleosomes up- and downstream of transcription start sites. PLoS Comput Biol. 2010;6(8):e1000891. doi: 10.1371/journal.pcbi.1000891.
  • 22. van Nimwegen E. Finding regulatory elements and regulatory motifs: A general probabilistic framework. BMC Bioinformatics. 2007;8(Suppl 6):S4. doi: 10.1186/1471-2105-8-S6-S4.
  • 23. Tirosh I, Sigal N, Barkai N. Divergence of nucleosome positioning between two closely related yeast species: Genetic basis and functional consequences. Mol Syst Biol. 2010;6:365. doi: 10.1038/msb.2010.20.
  • 24. Warnecke T, Batada NN, Hurst LD. The impact of the nucleosome code on protein-coding sequence evolution in yeast. PLoS Genet. 2008;4(11):e1000250. doi: 10.1371/journal.pgen.1000250.
  • 25. Washietl S, Machné R, Goldman N. Evolutionary footprints of nucleosome positions in yeast. Trends Genet. 2008;24(12):583–587. doi: 10.1016/j.tig.2008.09.003.
  • 26. Kenigsberg E, Bar A, Segal E, Tanay A. Widespread compensatory evolution conserves DNA-encoded nucleosome organization in yeast. PLoS Comput Biol. 2010;6(12):e1001039. doi: 10.1371/journal.pcbi.1001039.
  • 27. Prendergast JG, Semple CA. Widespread signatures of recent selection linked to nucleosome positioning in the human lineage. Genome Res. 2011;21(11):1777–1787. doi: 10.1101/gr.122275.111.
  • 28. Vaillant C, Audit B, Arneodo A. Experiments confirm the influence of genome long-range correlations on nucleosome positioning. Phys Rev Lett. 2007;99(21):218103. doi: 10.1103/PhysRevLett.99.218103.
  • 29. Berg J, Willmann S, Lässig M. Adaptive evolution of transcription factor binding sites. BMC Evol Biol. 2004;4:42. doi: 10.1186/1471-2148-4-42.
  • 30. Mustonen V, Kinney J, Callan CG, Jr, Lässig M. Energy-dependent fitness: A quantitative model for the evolution of yeast transcription factor binding sites. Proc Natl Acad Sci USA. 2008;105(34):12376–12381. doi: 10.1073/pnas.0805909105.
  • 31. Liti G, et al. Population genomics of domestic and wild yeasts. Nature. 2009;458(7236):337–341. doi: 10.1038/nature07743.
  • 32. Wright S. The distribution of gene frequencies in populations. Proc Natl Acad Sci USA. 1937;23(6):307–320. doi: 10.1073/pnas.23.6.307.
  • 33. Thurman RE, et al. The accessible chromatin landscape of the human genome. Nature. 2012;489(7414):75–82. doi: 10.1038/nature11232.
  • 34. Stadler M, et al. DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature. 2011;480(7378):490–495. doi: 10.1038/nature10716.
  • 35. Goodsell DS, Dickerson RE. Bending and curvature calculations in B-DNA. Nucleic Acids Res. 1994;22(24):5497–5503. doi: 10.1093/nar/22.24.5497.
  • 36. Percus JK. Equilibrium state of a classical fluid of hard rods in an external field. J Stat Phys. 1976;15(6):505–511.
  • 37. Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature. 2003;423(6937):241–254. doi: 10.1038/nature01644.
