SUMMARY
Nucleosome organization influences gene activity by controlling DNA accessibility to transcription machinery. Here, we develop a chemical biology approach to determine mammalian nucleosome positions genome-wide. We uncover surprising features of nucleosome organization in mouse embryonic stem cells. In contrast to the prevailing model, we observe that for nearly all mouse genes a class of fragile nucleosomes occupies previously designated nucleosome-depleted regions around transcription start sites and transcription termination sites. We show that nucleosomes occupy DNA targets for a subset of DNA-binding proteins, including CTCF and pluripotency factors. Furthermore, we provide evidence that promoter-proximal nucleosomes, particularly the +1 nucleosome, contribute to the pausing of RNA polymerase II. Lastly, we find a characteristic preference for nucleosomes at exon-intron junctions. Taken together, we establish an accurate method for defining the nucleosome landscape and provide a valuable resource for studying nucleosome-mediated gene regulation in mammalian cells.
Graphical abstract
A chemical approach to accurately map nucleosome positions in mouse embryonic stem cells reveals unexpected insights into nucleosome organization at transcription start and stop sites, as well as at DNA-binding sites for CTCF and pluripotency factors.

INTRODUCTION
The positioning of nucleosomes encodes epigenetic information that controls the output of the genome. A general notion is that nucleosomes compete with cellular machinery for access to DNA and function mainly to inhibit DNA-dependent processes (Kornberg and Lorch, 1999). In the context of transcriptional regulation, such competition manifests itself in two ways. First, nucleosomes compete with transcription factors (TFs): early studies showed that TF binding is sufficient to evict nucleosomes, but nucleosome assembly can also antagonize TF binding to a promoter (Workman and Kingston, 1998). The competition between nucleosomes and TFs in vivo has been best demonstrated at the inducible Pho5 promoter in S. cerevisiae (Korber and Barbaric, 2014). Under conditions non-permissive for Pho5 expression, nucleosomes are well positioned to occlude Pho4 binding sites. Upon phosphate starvation, the TFs Pho2 and Pho4 outcompete nucleosomes, leading to transcriptional activation of Pho5. Meanwhile, in vitro competition experiments showed that the precise location of a nucleosome relative to a TF binding site can influence TF occupancy by orders of magnitude (Polach and Widom, 1995). More recently, the exclusion of nucleosomes by poly(dA) sequences near TF binding sites in yeast promoters further illustrated that nucleosome-TF competition can produce large changes in transcriptional output in vivo (Raveh-Sadka et al., 2012). Second, nucleosomes can act as a physical barrier to RNA polymerase II (RNAPII) (Teves et al., 2014). In vitro transcription studies on chromatin templates have revealed that nucleosomes impede elongating RNAPII (Kulaeva et al., 2013). On defined nucleosomal templates, RNAPII often pauses in front of the dyad and at the dyad axis of the +1 nucleosome (Bondarenko et al., 2006; Kulaeva et al., 2013).
With the advent of high-throughput sequencing technology, nucleosome positions across the genome have been determined for a variety of species [for review, see (Hughes and Rando, 2014)]. Some common themes emerge from these genome-wide analyses (Hughes and Rando, 2014; Struhl and Segal, 2013). Most notably, promoters are often devoid of nucleosomes in all species. Nucleosome occupancy at promoters typically anti-correlates with gene expression level: highly expressed genes are more depleted of nucleosomes than poorly expressed genes. In addition to promoters, other regulatory elements, including enhancers, insulators, and transcription termination sites, are depleted of nucleosomes. These observations support the view that TFs and nucleosomes broadly compete for access to the genome. However, regarding the barrier function of nucleosomes, although in vitro studies have clearly shown that the +1 nucleosome imposes a strong barrier to RNAPII, genome-wide studies have been equivocal as to whether this role is realized in vivo. For example, global analysis of RNAPII occupancy by PRO-seq and NET-seq in Drosophila showed that RNAPII accumulates in front of the +1 nucleosome, suggesting that the +1 nucleosome acts as a transcriptional barrier (Kwak et al., 2013; Mavrich et al., 2008; Weber et al., 2014). In contrast, other studies using RNAPII ChIP-seq found that RNAPII pausing at the proximal promoter correlated with low occupancy of the +1 nucleosome, arguing against a barrier function of nucleosomes (Gilchrist et al., 2010).
Most nucleosome positioning maps have been generated by micrococcal nuclease digestion followed by high-throughput sequencing (MNase-seq). This method infers nucleosome positions indirectly from nucleosome-protected genomic fragments. Systematic biases, including the sequence preference of MNase (Chung et al., 2010), spontaneous unwrapping of nucleosomes, and binding of other chromatin factors, limit the accuracy of these nucleosome maps. We recently developed a genome-wide nucleosome mapping approach that directly determines nucleosome center positions at base-pair resolution in yeast (Brogaard et al., 2012; Moyle-Heyrman et al., 2013), based on a chemical cleavage reaction described by Flaus and Richmond (Flaus et al., 1996). In this approach, replacing the wild-type histone H4 gene with an H4S47C mutant allows site-directed DNA cleavage near the nucleosome dyad by hydroxyl radicals generated in a Fenton reaction, and thus direct determination of nucleosome centers.
Here we devised a chemical mapping strategy to determine genome-wide nucleosome positions in mouse embryonic stem (ES) cells with unprecedented resolution. Chemical mapping enables us to identify a class of fragile nucleosomes at distinct genomic landmarks, including transcription start sites (TSS), transcription termination sites (TTS), insulator CTCF binding sites, and pluripotency factor binding sites. Furthermore, our analysis provides evidence that nucleosomes located near promoters or intron-exon junctions contribute to RNAPII pausing at corresponding genomic locations. Thus, the first chemically-defined mouse nucleosome map provides a new perspective on nucleosome-mediated regulation of transcription and splicing in mammalian cells.
RESULTS
A chemical method for mapping genome-wide nucleosome positions in mouse ES cells
The chemical mapping method requires introducing a cysteine substitution at serine 47 in histone H4 to localize free radical-mediated cleavage of nucleosome DNA near the dyad (Brogaard et al., 2012; Flaus et al., 1996) (Figure 1A). We designed a strategy to substitute a majority of endogenous H4 proteins with mutant H4S47C. First, we identified shRNAs against a common region shared by all H4 genes (Figure S1A–C). Second, we simultaneously expressed H4-shRNA and RNAi-resistant H4S47C stably in mouse ES cells, minimizing the deleterious effects from knockdown of H4 (Figure S1B, D). Clones with high levels of H4S47C and low levels of endogenous H4 were selected to validate the functionality of H4S47C in mouse ES cells (Figure S1D). In selected clones, the overall histone expression in H4S47C ES cells was similar to WT cells (Figure 1B). By pulse labeling with the methionine analog, azidohomoalanine (Deal et al., 2010), we showed that H4S47C and WT ES cells displayed a similar synthesis rate for H4 (Figure S1E). These cells also exhibited a normal growth rate and normal levels of pluripotency markers (Figure S1F–G). Furthermore, RNA-seq analysis showed that global gene expression profiles were highly consistent between H4S47C and WT ES cells under 2i and Lif culture conditions (Figure 1C, S1H). In addition, H4S47C ES cells could differentiate in an embryoid body (EB) differentiation assay (Figure S1I). Lastly, we experimentally determined nucleosome positions in H4S47C and WT ES cells using MNase. Comparison of genome-wide nucleosome positions showed that MNase-defined nucleosome centers in H4S47C ES cells agreed with nucleosome centers in WT cells (Figure 1D). Taken together, our analysis demonstrates that H4S47C substitution in mouse ES cells functionally replaces the WT H4, thereby enabling us to perform chemical mapping of nucleosome positions for the mouse genome.
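The comparison in Figure 1D amounts to finding, for each nucleosome center in one map, the nearest center in the other map and recording the offset. A minimal sketch in Python, assuming both maps are given as integer dyad coordinates on a single chromosome; the function name `nearest_signed_distances` and the toy coordinates are hypothetical and not part of any published pipeline:

```python
from bisect import bisect_left
from collections import Counter


def nearest_signed_distances(query_centers, reference_centers):
    """Signed offset (reference - query, in bp) from each query dyad to the
    closest reference dyad on the same chromosome.

    Ties are resolved toward the smaller (upstream) reference coordinate.
    """
    ref = sorted(reference_centers)
    if not ref:
        raise ValueError("reference map is empty")
    offsets = []
    for q in query_centers:
        i = bisect_left(ref, q)  # index of first reference dyad >= q
        candidates = ref[max(i - 1, 0):i + 1]  # neighbors on either side
        best = min(candidates, key=lambda r: (abs(r - q), r))
        offsets.append(best - q)
    return offsets


# Toy dyad coordinates, for illustration only.
wt_centers = [1000, 1200, 1395]
s47c_centers = [1002, 1190, 1400]
offsets = nearest_signed_distances(s47c_centers, wt_centers)
histogram = Counter(offsets)  # distance distribution of the kind plotted in Figure 1D
print(offsets)  # [-2, 10, -5]
```

Sorting the reference once and using binary search keeps the comparison at O(n log n), which matters for maps with millions of nucleosomes.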
Figure 1. Generation of a Chemical Nucleosome Map for Mouse ES cells.

(A) Experimental design to chemically map nucleosomes in mouse ES cells. (B) Top, total histones purified from WT and H4S47C ES cells resolved by SDS-PAGE. Bottom, western blot analysis on H4 protein levels in both cell lines. (C) Scatterplots of FPKM expression values from RNA-seq show pairwise comparison for 23,808 genes between WT and H4S47C datasets under 2i (left) or Lif (right) culture conditions. (D) Distance between unique nucleosomes from MNase maps of WT vs H4S47C ES cells. (E) A DNA agarose gel image shows specific cleavage products occur only in cells with H4S47C. (F) An exemplary genomic region displaying tracks from the chemical NCP score, chemical center-weighted nucleosome occupancy, and MNase center-weighted nucleosome occupancy, with two A/T-rich regions indicated with blue arrows. (G) The linker length distribution uncovered by the chemical map shows a preferential 10n + 5 bp pattern. See also Figure S1.
Next, we optimized experimental procedures for a copper ion-mediated Fenton reaction to cleave nucleosomal DNA in “A12” ES cells expressing H4S47C. The cleavage reaction produced the characteristic DNA ladder in the presence of the copper chelator (Figure 1E). We purified cleaved DNA fragments of 140–300 bp for Illumina HiSeq paired-end sequencing. From two independent libraries, we obtained over 8.0 billion cleavages on both strands. The major cleavage sites on the Crick and Watson strands are separated by characteristic distances, with peaks at +2, −5, and −12 bp (Figure S1J), as previously observed in yeast (Brogaard et al., 2012; Moyle-Heyrman et al., 2013). Using a recently improved deconvolution algorithm (Xi et al., 2014), we calculated the nucleosome center positioning (NCP) score at every genomic location and generated a genome-wide map of 10.6 million unique nucleosomes. Although in general agreement with the MNase map, the chemical map offers several pronounced advantages. First, at base-pair resolution, the NCP scores reveal that many MNase-defined nucleosome centers represent several alternative nucleosome positions spaced by 10n bp (Figure 1F). Second, the chemical map faithfully detects nucleosome occupancy in A/T-rich regions where the MNase map often fails (indicated in Figure 1F). Third, the chemical map shows that linker lengths in mouse ES cells preferentially take the form ~10n + 5 bp (Figure 1G).
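The ~10n + 5 bp linker preference can be checked from a list of unique dyad positions by looking at linker lengths modulo 10. A minimal sketch, assuming a 147 bp nucleosome core so that linker length = (distance between adjacent dyads) − 147; the core length, the helper names, and the choice to skip overlapping calls are illustrative assumptions rather than the published procedure:

```python
from collections import Counter

CORE_BP = 147  # assumed length of nucleosome core DNA


def linker_lengths(centers, core_bp=CORE_BP):
    """Linker lengths between adjacent unique dyads on one chromosome.

    Dyad pairs closer than core_bp (overlapping calls) are skipped.
    """
    s = sorted(centers)
    return [b - a - core_bp for a, b in zip(s, s[1:]) if b - a >= core_bp]


def residue_profile(linkers, period=10):
    """Fraction of linkers with each length modulo `period`.

    A 10n + 5 preference shows up as an excess at residue 5.
    """
    counts = Counter(length % period for length in linkers)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {r: counts.get(r, 0) / total for r in range(period)}


linkers = linker_lengths([0, 167, 339, 521])  # toy dyads
profile = residue_profile(linkers)
print(linkers, round(profile[5], 3))  # [20, 25, 35] 0.667
```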
DNA sequence features and nucleosome positioning
DNA sequence features have been shown to significantly affect nucleosome positioning in vivo (Struhl and Segal, 2013). Notably, nucleosomes favor two classes of dinucleotide motifs, AA/TT/AT/TA and GC/CG, positioned periodically and in anti-phase with each other along the nucleosomal DNA. Previous MNase-determined nucleosome positioning maps in higher organisms show relatively weak periodic dinucleotide signals within the nucleosome, raising the question of to what extent DNA sequence contributes to nucleosome positioning in vivo (Zhang et al., 2010). By analyzing ~11 million unique nucleosomes in the chemical map, we show that chemically-defined nucleosomes in the mouse genome possess much stronger AA/TT/AT/TA dinucleotide signals than the MNase-generated map at periodic locations flanking the nucleosome dyad (Figure 2A). This result provides further evidence for the importance of such DNA sequence features in exact nucleosome positioning in higher organisms and illustrates the improved accuracy of chemical mapping in mammalian cells. Furthermore, the 10 bp periodicity of AA/TT/AT/TA signals appears to be a universal feature of nucleosomes, whether in genic or intergenic regions or in genes of different expression levels (Figure S2AB). The variation in average dinucleotide signals among genomic regions is caused mainly by differences in the A/T content of the DNA sequences.
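The dinucleotide analysis in Figure 2A amounts to counting, at each offset from the dyad, how often an AA/TT/AT/TA step occurs across all mapped nucleosomes. A minimal sketch, assuming the genome is a plain string and dyads are 0-based coordinates; the function name is hypothetical, and the published analysis may additionally smooth or symmetrize the profile:

```python
WW_STEPS = {"AA", "TT", "AT", "TA"}


def ww_profile(genome, centers, flank=100):
    """Fraction of nucleosomes carrying an AA/TT/AT/TA step that starts at
    each offset in [-flank, flank) relative to the dyad."""
    offsets = range(-flank, flank)
    counts = dict.fromkeys(offsets, 0)
    used = 0
    for c in centers:
        # skip dyads whose window would run off either end of the sequence
        if c - flank < 0 or c + flank + 1 > len(genome):
            continue
        used += 1
        for o in offsets:
            if genome[c + o:c + o + 2].upper() in WW_STEPS:
                counts[o] += 1
    if used == 0:
        return {}
    return {o: counts[o] / used for o in offsets}


# Toy sequence with an "AA" step every 10 bp.
genome = "AAGCGCGCGC" * 30
profile = ww_profile(genome, [150], flank=50)
print(profile[0], profile[5], profile[10])  # 1.0 0.0 1.0
```

On real data, a 10 bp oscillation in this profile is the signature that Figure 2A compares between the chemical and MNase maps.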
Figure 2. DNA Sequence Features of Nucleosome Positioning in Mouse ES cells.

(A) AA/TT/AT/TA dinucleotide frequency within nucleosomes and its flanking regions based on the unique chemical map and the unique MNase map. (B) Normalized nucleosome occupancy score over poly(dA-dT) and poly(dG-dC) tracts (with zero mismatch) as a function of tract length in the log scale for S. cerevisiae, S. pombe, and mouse ES cells (mESC). The zero line indicates the genome average. (C) Distance between the centers of poly(dA-dT) or poly(dG-dC) tracts and uniquely defined nucleosomes in the chemical map of mouse ES cells for tract lengths equal to 5 bp and 15 bp. See also Figure S2.
By contrast, some DNA sequence features, such as poly(dA-dT) tracts, have been found to be associated with nucleosome-depleted regions (NDRs) in multiple species (Hughes et al., 2012; Segal and Widom, 2009; Struhl and Segal, 2013). Yet a recent chemical nucleosome map in S. pombe suggests that, in contrast to S. cerevisiae, these tracts do not significantly affect nucleosome positioning in that species (Moyle-Heyrman et al., 2013). In our chemical map, compared with the genome average, nucleosome occupancy was reduced by only 10% over a 20 bp poly(dA-dT) tract and by 26% over a poly(dG-dC) tract of the same length, as opposed to 75% and 84%, respectively, in the MNase map (Figure 2B, S2C). This result is in clear contrast to what has been previously described for S. cerevisiae.
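The percentages above compare mean nucleosome occupancy over tracts of a given length with the genome-wide mean. A minimal sketch under two stated assumptions: a tract is a maximal run of exactly `length` bases drawn from `alphabet` (zero mismatches), and occupancy is a per-base list aligned to the sequence. Whether poly(dA-dT) denotes mixed A/T runs or homopolymeric runs is an assumption here, so the alphabet is left as a parameter; all names are hypothetical:

```python
import re


def find_tracts(genome, alphabet, length):
    """Half-open (start, end) spans of maximal runs of exactly `length` bases
    drawn from `alphabet` (e.g. "AT" or "GC"), with zero mismatches."""
    run = re.compile(f"[{alphabet}]+")
    return [(m.start(), m.end()) for m in run.finditer(genome.upper())
            if m.end() - m.start() == length]


def relative_occupancy(occupancy, tracts):
    """Mean occupancy over tract bases divided by the genome-wide mean, minus 1.

    For example, -0.10 corresponds to a 10% reduction relative to the genome average.
    """
    genome_mean = sum(occupancy) / len(occupancy)
    values = [v for start, end in tracts for v in occupancy[start:end]]
    if not values:
        return None
    return sum(values) / len(values) / genome_mean - 1.0


genome = "GGGG" + "ATTA" + "GGGG"
occupancy = [1.0] * 4 + [0.5] * 4 + [1.0] * 4
tracts = find_tracts(genome, "AT", 4)
print(tracts, round(relative_occupancy(occupancy, tracts), 2))  # [(4, 8)] -0.4
```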
In S. cerevisiae, the centers of poly(dA-dT) tracts are preferentially aligned with AA/TT/AT/TA motif sites (Figure S2D, top right). The poly(dG-dC) tracts also show a periodic preference, anti-phased with poly(dA-dT) tracts (Figure S2D, bottom right). We observed similar distribution patterns of these tracts in the chemical map of mouse ES cells (Figure S2D, left). In both species, shorter tracts (dA-dT and dG-dC) are uniformly distributed across the nucleosome region, whereas longer tracts of 15 bp in mouse ES cells are predominantly centered at ±54 bp from the dyad (Figure 2C). However, unlike in S. cerevisiae, these tracts do not appreciably deplete nucleosomes in mouse cells.
Nucleosome positioning over the TSS
Genome-wide MNase mapping studies in all species have shown that actively transcribed genes exhibit a well-defined NDR located upstream of the TSS and flanked downstream by strongly phased nucleosomes (Hughes and Rando, 2014). While the MNase map from H4S47C-expressing ES cells confirmed the characteristic depletion of nucleosomes directly upstream of the TSS of 21,677 mouse genes, our chemical map revealed an unprecedented nucleosome pattern at the TSS (Figure 3A, 3B, S3A): high nucleosome occupancy in the previously designated −1 NDR. In principle, cysteine-bearing transcription factors could also direct hydroxyl radicals to cleave DNA, yielding false nucleosome signals. To test this possibility, we first calculated the cross-correlation of cleavages on the two strands in the −1 nucleosome region (−150 bp to 0 bp). The result shows the characteristic pattern with peaks at −12, −5, and +2 bp (Figure S3B; compare Figure S1J), suggesting that cleavages in the −1 nucleosome region are predominantly nucleosome-specific. Furthermore, the −1 nucleosomes defined in the chemical map exhibit strong periodic dinucleotide signals (Figure S3C), confirming the accuracy of the mapped nucleosomes.
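The cross-strand check can be sketched as a raw cross-correlation between per-base cleavage counts on the two strands within a window. In this minimal sketch the lag is defined as (Crick position − Watson position); that sign convention, the function name, and the toy counts are assumptions, so reproducing the sign of the reported peaks (+2, −5, −12 bp) depends on matching the convention of the original analysis:

```python
def cross_strand_xcorr(watson, crick, max_lag=20):
    """Raw cross-correlation sum_i watson[i] * crick[i + lag] for each lag in
    [-max_lag, max_lag]; both inputs are per-base cleavage counts over the
    same genomic window (e.g. -150 to 0 bp from the TSS)."""
    n = len(watson)
    if len(crick) != n:
        raise ValueError("strand count vectors must cover the same window")
    xcorr = {}
    for lag in range(-max_lag, max_lag + 1):
        # keep only positions i where both i and i + lag fall inside the window
        lo, hi = max(0, -lag), min(n, n - lag)
        xcorr[lag] = sum(watson[i] * crick[i + lag] for i in range(lo, hi))
    return xcorr


# Toy window: one Watson cleavage, Crick cleavages 2 bp after and 5 bp before it.
watson = [0] * 30
crick = [0] * 30
watson[10] = 1
crick[12], crick[5] = 1, 2
xc = cross_strand_xcorr(watson, crick)
top_lags = sorted(xc, key=xc.get, reverse=True)[:2]
print(top_lags)  # [-5, 2]
```

Summing such profiles over many windows (for example, all −1 nucleosome regions) gives an aggregate pattern that can be compared with the genome-wide one.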
Figure 3. Nucleosome Occupancy Over the TSS in the Mouse Genome.

(A) Nucleosome occupancy around the TSS of the Zik1 locus in mouse ES cells from chemical and MNase maps. (B) Nucleosome occupancy from chemical versus MNase maps for three species: mouse ES cells (ESC), S. cerevisiae, and S. pombe. Only genes with unique TSSs were included to avoid ambiguity. (C and D) TSSs are categorized into eight clusters using a k-means clustering algorithm based on the NCP score pattern in the [−150, 250] bp region of the TSS (see details in Figure S3). (C) Cluster 1 (2595 genes) has a stronger −1 nucleosome signal than +1 nucleosome signal. (D) Cluster 8 (2975 genes) exhibits a stronger +1 nucleosome signal and periodic alternative nucleosome positions, synchronized at base-pair resolution, downstream of the TSS. (E) Center-weighted nucleosome occupancy around the TSS in gene expression quartiles (FPKM) in mouse ES cells: chemical map vs MNase map. (F) Read coverage scores around the TSS in quartiles of gene expression from partial MNase digestion (5 U/mL) of mouse ES cells. (G) Center-weighted short read occupancy score (51–80 bp) around the TSS in gene expression quartiles (MNase data from Carone et al., 2014). (H) Distance between MNase map short reads (64–84 bp) and all unique nucleosomes or all −1 nucleosomes (−1 nucleosomes were defined as those with a center in the [−150, +5] bp region of a unique TSS). See also Figures S3, S4, S5, and S6.
This unexpected nucleosome pattern over the TSS appears to be species-specific. For example, in S. cerevisiae the nucleosomal organization at the TSS agrees between MNase and chemical maps (Brogaard et al., 2012), whereas in S. pombe a modest nucleosome peak upstream of the TSS was observed in the chemical map but was absent from the MNase map (Moyle-Heyrman et al., 2013).
Compared with yeast, the nucleosome phasing pattern (peak-to-valley distance) downstream of the TSS is noticeably weaker in mouse ES cells, possibly because of population averaging of genes with different oscillation phases. We therefore performed k-means clustering of NCP scores in the −150 to 250 bp region around the TSS and identified 8 clusters with distinct nucleosome phasing patterns (Figure S3D). These gene clusters differ from one another primarily in the relative occupancy of the −1, 0, and +1 nucleosomes or in their distances from the TSS. First, none of the clusters shows nucleosome depletion at the −1 position. In fact, most clusters exhibit a strong or well-positioned nucleosome peak at the −1 position (see Cluster 1 in Figure 3C as an example). Second, nucleosomes in each cluster oscillate in different phases relative to the TSS, which explains the degraded phasing pattern when all genes are superimposed at the TSS (Figure 3B). Third, many genes display alternative positioning of the −1/+1 nucleosomes with 10n bp spacing (Figure 3D, S3E).
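The clustering step can be sketched with a plain Lloyd's k-means over per-gene NCP profiles, each profile being a list of scores at positions −150 to +250 bp from the TSS. This is a minimal stdlib-only sketch: the deterministic farthest-first initialization, the iteration cap, and all names are assumptions, and the published analysis may have used a different implementation, initialization, or distance measure:

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(rows):
    """Position-wise mean of a non-empty list of profiles."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def kmeans(profiles, k, max_iter=100):
    """Lloyd's k-means on equal-length numeric profiles.

    Returns (labels, centroids). Requires 1 <= k <= len(profiles).
    """
    # Farthest-first initialization: start from the first profile, then
    # repeatedly add the profile farthest from all chosen centroids.
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        far = max(profiles, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in profiles]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = mean_vector(members)
    return labels, centroids


# Toy 3-position profiles forming two pairs of similar shapes.
profiles = [[0, 0, 1], [0, 0.1, 1], [1, 0, 0], [0.9, 0, 0.1]]
labels, _ = kmeans(profiles, k=2)
print(labels)  # [0, 0, 1, 1]
```

Averaging profiles within each cluster, rather than across all genes, is what preserves phase-specific oscillations that cancel out in the genome-wide average.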
Existing MNase maps have shown that nucleosome occupancy in promoter regions is negatively correlated with gene expression level (Hughes and Rando, 2014). Surprisingly, the chemical map exhibited a strong positive correlation between nucleosome occupancy and gene expression over the TSS (Figure 3E), in marked contrast to our MNase map as well as other published MNase studies (Carone et al., 2014; Teif et al., 2012). Our result also differs from the yeast chemical maps, in which higher gene expression is commonly associated with lower nucleosome occupancy at the TSS (Brogaard et al., 2012; Moyle-Heyrman et al., 2013; Figure S4A). We found that the positive correlation between nucleosome occupancy and gene activity holds for all 8 clusters from the chemical map, indicating a genome-wide phenomenon (Figure S4B).
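The quartile plots in Figure 3E group genes by expression level and average their occupancy profiles within each group. A minimal sketch, assuming one profile per gene and one FPKM value per gene in the same order; ties in FPKM are broken by input order, which is an arbitrary choice, and the function names are hypothetical:

```python
def quartile_labels(values):
    """Quartile index per item: 0 = lowest 25%, 3 = highest 25% (by rank)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * 4 // n  # ranks 0..n-1 map onto quartiles 0..3
    return labels


def mean_profile_by_quartile(profiles, expression):
    """Average occupancy profile of the genes in each expression quartile."""
    labels = quartile_labels(expression)
    result = {}
    for q in range(4):
        rows = [p for p, lab in zip(profiles, labels) if lab == q]
        if rows:
            result[q] = [sum(col) / len(rows) for col in zip(*rows)]
    return result


fpkm = [5, 1, 8, 3, 10, 2, 7, 4]
print(quartile_labels(fpkm))  # [2, 0, 3, 1, 3, 0, 2, 1]
```

A positive occupancy-expression relationship appears as quartile-3 profiles lying above quartile-0 profiles near the TSS.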
We asked whether MNase overdigestion in active promoter regions might account for the loss of nucleosome-protected DNA near the TSS. Consistent with this possibility, transcriptionally active promoters are well-known DNase I hypersensitive sites (Figure S5A; ENCODE GSM1014154; Vierstra et al., 2014). Thus, the higher accessibility of active promoters to MNase could deplete nucleosomal fragments. To test this idea, we performed partial MNase digestions and recovered mononucleosomal DNA for deep sequencing. The read occupancy from the partial MNase digestions showed that the region around the TSS was highly sensitive to MNase digestion (Figure 3F, S5BC). The resulting read occupancy pattern showed the same positive correlation with gene expression as the chemical map, and the occupancy score of more highly expressed genes degraded faster than that of less expressed genes as digestion proceeded from 5 U/mL to 15 U/mL (Figure 3F, S5C). If these nucleosomes were indeed overdigested in a complete MNase mapping experiment, their DNA was likely converted into subnucleosomal fragments (Henikoff et al., 2011). We therefore analyzed 51–80 bp MNase short reads from a published mouse ES cell MNase footprinting dataset (Carone et al., 2014). We recapitulated the results of Carone et al. and showed that the MNase short reads were enriched around the TSS and positively correlated with gene expression (Figure 3G). Importantly, a high proportion of short reads (64–84 bp) had centers aligned with those of −1 nucleosomes (Figure 3H). The edges of MNase short reads were significantly enriched for A or T, reflecting the cleavage preference of MNase (Figure S5D). Although 51–80 bp MNase short reads have been considered footprints of transcription factors and RNAPII (Henikoff et al., 2011), some of the MNase short reads in ES cells might originate from overdigested nucleosomal fragments.
Consistent with this analysis, a recently available MPE-Fe(II) coupled histone ChIP-seq dataset confirmed the occupancy of H3 and H2B at such sites (Ishii et al., 2015) (Figure S5E). Taken together, we argue that the most plausible explanation for the differential nucleosome occupancy patterns in the mouse MNase and chemical maps is MNase overdigestion of nucleosomes around the TSS. In line with our interpretation, a class of MNase-sensitive “fragile nucleosomes” was recently discovered through differential MNase digestion in the promoter region (Chereji et al., 2016; Kubik et al., 2015; Xi et al., 2011). Compared with MNase and MPE-Fe(II) mapping, an apparent benefit of the chemical mapping is its ability to identify and map “fragile” nucleosomes with high accuracy and quantify their occupancy at promoters and elsewhere in the mouse genome.
Nucleosome positioning over the TTS and gene body
The transcription termination site (TTS) is another classical NDR, as shown in the MNase map of H4S47C ES cells (Figure S6A). However, as at the TSS, the chemical map revealed a well-positioned nucleosome at the TTS (Figure S6A). Furthermore, nucleosome occupancy at the TTS showed a similar positive correlation with gene expression in the chemical map but not in the MNase map (Figure S6B). Thus, regarding the relationship between nucleosome occupancy and transcriptional activity, the chemical map provides a consistent view at the TSS and TTS of mouse genes.
On a genome-wide scale, both the chemical and MNase maps agreed that intragenic regions have higher nucleosome occupancy than intergenic regions (Figure S6C) and that nucleosome occupancy in intragenic regions (exons and introns) correlates positively with gene expression level (Figure S6D), although this correlation is less pronounced for MNase-mapped exons. The chemical map further shows that short nucleosome linkers (<30 bp) tend to be enriched in highly expressed genes (Figure S6E). Collectively, the chemical nucleosome map shows that transcriptionally active genes in ES cells are often associated with high nucleosome occupancy throughout the entire gene, spanning the TSS, coding sequence, and TTS.
Nucleosome occupancy at functional CTCF binding sites
Systematic mapping of nucleosomes by MNase has shown that functional TF binding sites tend to be located in NDRs, a reflection of the competition between nucleosomes and TFs. The target site for CCCTC-binding factor (CTCF), the major insulator protein in vertebrates, is one of the best examples of these studies (Ong and Corces, 2014). It has been shown that CTCF-binding sites are normally located in the NDRs surrounded by well-positioned nucleosomes (Carone et al., 2014; Teif et al., 2012). However, recent in vivo ChIP analysis reveals that histone variants H3.3 and H2A.Z are enriched at CTCF binding sites (Jin et al., 2009). Thus, the question remains as to whether nucleosomes and CTCF can co-occupy their target sites.
We compiled approximately 36,800 CTCF-binding sites previously identified by ChIP-seq in mouse ES cells (Chen et al., 2008). We generated a heat map of chemical NCP scores over CTCF binding sites, sorted in descending order of ChIP-seq density, together with heat maps of nucleosome read coverage from complete and partial MNase maps and of short reads from MNase footprinting (Figure 4A). In agreement with previous studies (Carone et al., 2014; Teif et al., 2012), the complete MNase map showed CTCF binding sites as NDRs flanked by arrays of strongly phased nucleosomes. In contrast, the chemical map revealed high nucleosome occupancy centered on CTCF binding sites. Moreover, these chemically-defined nucleosomes appeared to be extremely susceptible to MNase digestion, as indicated by the partial MNase digestion maps and by short reads from MNase footprinting data in mouse ES cells (Carone et al., 2014).
Figure 4. Nucleosome Occupancy Around the CTCF Binding Sites.

(A) Heat maps of complete MNase digestion map read coverage scores, chemical map NCP scores, partial MNase digestion map (15 U/mL) read coverage scores, and MNase short read coverage scores (51–80 bp), sorted in descending order of ChIP-seq density at CTCF binding sites. CTCF ChIP-seq data are from GEO accession GSM288351 (Chen et al., 2008). (B, C) Plots of center-weighted nucleosome occupancy in the chemical map and the complete MNase digestion map in quartiles of CTCF ChIP-seq signal. (D) Cross-strand cleavage cross-correlation in the ±75 bp region of CTCF sites shows the nucleosome-specific cleavage pattern. (E) AA/TT/AT/TA frequency plot for unique nucleosomes defined in the ±75 bp region of CTCF sites. (F, G) Read coverage score from partial MNase digestion maps (15 U/mL) and MNase short read coverage score ordered by quartiles of factor ChIP-seq density. (H) Distance from CTCF sites to unique nucleosomes and to MNase short reads (64–84 bp). (I) Distance from CTCF sites to unique nucleosomes in quartiles of CTCF ChIP-seq density.
To further test the current CTCF-nucleosome competition model, we divided CTCF sites into quartiles by ChIP-seq signal, with the top 25% representing the sites most bound by the factor. If CTCF outcompetes nucleosomes for binding at its cognate sites, nucleosome signals should decrease as CTCF ChIP-seq signals increase, and vice versa. Instead, the quartile plot showed that ChIP-seq signals were positively correlated with chemically-defined nucleosome occupancy: the binding sites most occupied by CTCF displayed relatively higher nucleosome enrichment (Figure 4BC). Moreover, the chemical cleavages around CTCF sites show characteristic patterns consistent with nucleosome configuration, and the defined nucleosomes possess strong dinucleotide signals (Figure 4DE). These results indicate that the hydroxyl radical cleavages at these sites are directed mainly by the cysteine of H4S47C rather than by other cysteine-bearing proteins. The NCP score coverage around CTCF sites is noticeably broader (Figure 4A), and the cross-correlation peaks at −5 bp and +2 bp are weaker than for other regularly positioned nucleosomes, possibly indicating fuzzier or more dynamic nucleosome positioning around CTCF. As observed at the TSS, partial MNase reads and short MNase reads are aligned and enriched at CTCF sites (Figure 4FG). In addition, we found that CTCF sites tend to be positioned near the nucleosome center, where short reads are concentrated, or at 10n bp from the dyad owing to alternative nucleosome positioning (Figure 4H). Indeed, the higher the CTCF ChIP-seq signal, the closer the CTCF site is to the nucleosome dyad (Figure 4I). Together, these results suggest that nucleosomes and CTCF may co-occupy the same target DNA sequences. In this scenario, CTCF may function as a pioneer factor. However, the presented data do not exclude an alternative sequential model in which CTCF might recruit nucleosomes to its cognate binding sites (Jin et al., 2009).
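The competition model makes a sign prediction that can be tested with a single number: across CTCF sites, ChIP-seq signal and nucleosome occupancy should be negatively rank-correlated under competition and positively rank-correlated under co-occupancy. A minimal stdlib-only sketch of Spearman's rho (the Pearson correlation of average ranks) as a summary of the same relationship the quartile plot shows; the per-site input lists are hypothetical toy values, and this is not the statistic reported in the text:

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# Toy per-site values: competition predicts rho < 0, co-occupancy rho > 0.
chip_signal = [2.0, 5.0, 9.0, 14.0]
nuc_occupancy = [0.8, 1.1, 1.3, 1.6]
print(spearman(chip_signal, nuc_occupancy))  # 1.0
```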
Nucleosome occupancy at pluripotency transcription factor-binding sites in ES cells
Emerging evidence suggests that the pluripotency transcription factors Oct4, Sox2, and Klf4 can function as pioneer factors that bind their target sites on nucleosomes in vitro (Soufi et al., 2015). Additionally, during reprogramming to iPS cells, these ectopically expressed factors occupy silent target sites that are enriched for nucleosomes in fibroblasts (Soufi et al., 2012). In ES cells, some studies showed that these factors preferentially occupy NDRs (West et al., 2014; You et al., 2011), whereas others identified an overlap of nucleosome occupancy with such TF-bound sites (Teif et al., 2012). Thus, whether nucleosomes and pluripotency factors co-occupy target sites in ES cells requires further investigation.
Two possible scenarios explain how TFs bind target sites in chromatin. In the first, TFs and nucleosomes occupy the DNA independently. In this case, nucleosomes bound to DNA not occupied by TFs are not expected to align into a phasing pattern around factor sites (Ganapathi et al., 2011; Garcia et al., 2014). Indeed, nucleosome occupancy predicted from DNA sequence preferences (Xi et al., 2010) shows no phasing pattern around ChIP-seq-called TF binding sites (Figure S7A, cyan). In the second, TF binding influences nucleosome positioning and vice versa. For many TFs, phased nucleosomes have been observed in the regions flanking functional TF binding sites (Gaffney et al., 2012; Koerber et al., 2009). A previous MNase map revealed nucleosome occupancy but not phasing at binding sites for Oct4, Nanog, and Sox2 in mouse ES cells (Teif et al., 2012). Thus, it is possible that these TFs bind target DNA independently of neighboring nucleosomes.
Using the high-resolution chemical map, we investigated nucleosome positioning over the binding sites of four pluripotency factors: Oct4, Sox2, Nanog, and Klf4. Our analysis shows clear nucleosome phasing patterns around these factors' binding sites, an indication of TF-modulated nucleosome positioning (Figure S7A, magenta). As at CTCF sites, we observed nucleosome-specific cleavage patterns and strong periodic AA/TT/AT/TA signals in the unique nucleosomes around these factor sites (Figure S7BC). Additionally, these factors preferentially bind target sites within nucleosomes rather than in linker regions (Figure 5A), suggesting that nucleosomes and TFs may co-occupy these target sites. The ChIP-seq quartile plot further revealed profound nucleosome enrichment at Oct4, Sox2, Nanog, and Klf4 sites and demonstrated that TF ChIP-seq signals positively correlate with nucleosome occupancy [Figure 5B, Figure S7D, data from (Chen et al., 2008; Whyte et al., 2013)]. Our results support the view that these four pluripotency TFs function as pioneer factors that occupy their target sites on nucleosomes in ES cells, resulting in nucleosome phasing in the flanking regions.
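Whether a factor site falls within a nucleosome or in a linker (Figure 5A) can be approximated from the distance between each site and its nearest unique dyad. A minimal sketch, assuming a site counts as nucleosomal when it lies within ±73 bp of a dyad (half of an assumed 147 bp core); the threshold, helper names, and toy coordinates are illustrative:

```python
from bisect import bisect_left

HALF_CORE_BP = 73  # assumed half-width of a 147 bp nucleosome core


def distance_to_nearest_dyad(site, dyads_sorted):
    """Absolute distance (bp) from a site to the closest dyad."""
    i = bisect_left(dyads_sorted, site)
    neighbors = dyads_sorted[max(i - 1, 0):i + 1]
    return min(abs(d - site) for d in neighbors)


def fraction_nucleosomal(sites, dyads, half_width=HALF_CORE_BP):
    """Fraction of sites lying within half_width bp of a unique dyad."""
    dyads_sorted = sorted(dyads)
    if not sites or not dyads_sorted:
        return None
    inside = sum(distance_to_nearest_dyad(s, dyads_sorted) <= half_width
                 for s in sites)
    return inside / len(sites)


# Toy coordinates: two sites near a dyad, two more than 73 bp from any dyad.
dyads = [1000, 1200]
sites = [1010, 1100, 1195, 1280]
print(fraction_nucleosomal(sites, dyads))  # 0.5
```

Comparing this fraction between factors (for example, the four pluripotency factors versus Zfx) gives a rough numeric counterpart to the distance distributions in Figure 5A.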
Figure 5. Nucleosome Occupancy Around the Pluripotency Factor Binding Sites.

(A) Distance between factor binding sites and unique nucleosomes. (B and C) Center-weighted nucleosome occupancy plotted in quartiles of ChIP-seq signal for five ES cell transcription factors (ChIP-seq data are from GEO accession GSE11431 (Klf4 and Zfx) and GEO accession GSE44286 (Oct4, Sox2, Nanog)). See also Figure S7.
By contrast, our complete MNase map in H4S47C ES cells showed nucleosome depletion at the corresponding TF binding sites, flanked by minimally phased nucleosomes (Figure 5C). Again, we interpreted the nucleosome depletion as a result of enzyme overdigestion, since partial MNase digestion showed increased sensitivity of nucleosomal DNA at these sites (Figure S7D). As a control, we examined nucleosome occupancy at binding sites for the conventional transcription factor Zfx. Nucleosome occupancy dipped at Zfx sites in both the MNase and chemical maps, indicating that Zfx prefers to bind NDRs or that its binding excludes nucleosomes. In agreement, Zfx was more likely than the four pluripotency factors to bind DNA at nucleosome edges or in linker regions (Figure 5A). Furthermore, Zfx exhibited an inverse correlation between ChIP-seq signal and nucleosome occupancy, suggesting that Zfx outcompetes nucleosomes for binding.
A role of nucleosomes in RNAPII pausing in the promoter region
RNAPII pausing is a widespread regulatory mechanism conserved in metazoans (Jonkers and Lis, 2015). Using the chemical map and an available GRO-seq dataset from mouse ES cells (Williams et al., 2015), we investigated how promoter-proximal nucleosome patterns correlate with RNAPII pausing. We adopted the definition of the pausing index (PI) from Williams et al.: the ratio of the average GRO-seq density in the promoter region (TSS ± 150 bp) to the average GRO-seq density in the gene body (+250 to +2,250 bp). Because GRO-seq read counts at the TSS and in the gene body of the bottom 50% of expressed genes were too low to yield a meaningful PI, we restricted the analysis to genes in the top 50% by TSS GRO-seq density.
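The PI definition above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each gene's GRO-seq density is supplied as a per-base list in transcript orientation that starts 150 bp upstream of the TSS, and it treats both window endpoints as inclusive. `pausing_index` and `top_half_by_promoter_density` are hypothetical helper names.

```python
# Sketch of the pausing index (PI) and the top-50% gene filter.
# Assumed layout: each profile is a per-base GRO-seq density list in
# transcript orientation, starting 150 bp upstream of the TSS.

TSS_OFFSET = 150  # index of the TSS within each profile (assumption)


def pausing_index(profile, tss=TSS_OFFSET):
    """Mean density in TSS +/-150 bp divided by mean density in +250..+2250 bp."""
    promoter = profile[tss - 150: tss + 151]   # TSS-150 .. TSS+150, inclusive
    body = profile[tss + 250: tss + 2251]      # +250 .. +2250, inclusive
    if len(body) < 2001:
        raise ValueError("profile too short for the +250..+2250 bp gene-body window")
    body_mean = sum(body) / len(body)
    if body_mean == 0:
        return None  # PI is undefined when the gene body carries no signal
    return (sum(promoter) / len(promoter)) / body_mean


def top_half_by_promoter_density(profiles, tss=TSS_OFFSET):
    """Return gene IDs in the top 50% of total GRO-seq density in TSS +/-150 bp."""
    totals = {gene: sum(p[tss - 150: tss + 151]) for gene, p in profiles.items()}
    ranked = sorted(totals, key=totals.get, reverse=True)
    return ranked[: len(ranked) // 2]
```

Filtering on promoter density before computing PI mirrors the rationale in the text: genes with very little signal give an unstable ratio.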
The heat map of NCP scores showed that highly paused genes had higher nucleosome occupancy and stronger phasing patterns downstream of the TSS (Figure 6A). The same relationship appears when nucleosome occupancy around the TSS is plotted in PI quartiles (Figure 6B), suggesting that nucleosomes around the TSS play a role in regulating RNAPII pausing. Moreover, in the 8 TSS clusters we generated, higher PI appeared to be associated with higher nucleosome occupancy downstream of the TSS (Figure 6C). Regression of PI on +1 nucleosome occupancy, with cluster ID as a dummy variable, gives an overall statistically significant positive effect of +1 nucleosome occupancy on pausing (p = 1.6 × 10⁻¹³). A recent NET-seq study showed that RNAPII stalls at the entry to the +1 nucleosome in Drosophila S2 cells (Weber et al., 2014). Consistent with this finding, GRO-seq signals were enriched in front of or at the dyad of the +1 nucleosome defined by chemical mapping (Figure 6D). Collectively, the chemical map provides evidence supporting a barrier function for the +1 nucleosome in promoter-proximal RNAPII pausing.
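The dummy-variable regression can be sketched with ordinary least squares. This sketch makes several assumptions: one row per gene, one indicator column per cluster, and no separate global intercept, so the indicator columns act as cluster-specific intercepts. It returns point estimates only. The p-value reported in the text would also need the coefficient's standard error and a t distribution, and the sketch does not compute those. `plus1_effect` is a hypothetical name.

```python
import numpy as np


def plus1_effect(pi, plus1_occupancy, cluster_id):
    """OLS of PI on +1 nucleosome occupancy with one indicator column per cluster.

    The indicator columns play the role of cluster-specific intercepts, so the
    occupancy slope is estimated from variation within clusters.
    Returns (slope, {cluster: intercept}).
    """
    y = np.asarray(pi, dtype=float)
    occ = np.asarray(plus1_occupancy, dtype=float)
    clusters = np.asarray(cluster_id)
    levels = np.unique(clusters)
    # n_genes x n_clusters matrix of 0/1 cluster indicators
    dummies = (clusters[:, None] == levels[None, :]).astype(float)
    X = np.column_stack([occ, dummies])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    intercepts = {lvl: float(c) for lvl, c in zip(levels.tolist(), coef[1:])}
    return float(coef[0]), intercepts
```

With this coding, a positive slope means that within a cluster, genes with more +1 nucleosome occupancy tend to have higher PI.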
Figure 6. Nucleosome Occupancy and RNAPII Pausing Around the TSS.

(A) Heat map of NCP scores around the TSS sorted in descending order of Pausing Index (PI). Shown are genes in the top 50% of total GRO-seq density in TSS ± 150 bp with lengths ≥2,250 bp (also for B and C). (B) Nucleosome occupancy around the TSS in PI quartiles. (C) Nucleosome occupancy around the TSS plotted for genes corresponding to top and bottom halves of PI in each cluster obtained in Figure S3D. In Cluster 5, four genes with extremely high NCP scores around TSS were removed. (D) Plots of GRO-seq density (gray) and normalized NCP scores (red) aligned at the TSS. Arrow indicates +1 nucleosome location in each cluster.
Nucleosome positioning at exon-intron junctions
Recent studies of intragenic nucleosome positioning by MNase mapping have found that nucleosomes are preferentially centered within exons, while flanking splice sites frequently fall in linker regions (Andersson et al., 2009; Schwartz et al., 2009; Tilgner et al., 2009). Our MNase map confirmed a significant enrichment of nucleosome reads at the centers of 169,358 mouse internal exons compared with the flanking introns (Figure 7A). In contrast, the chemical map reveals that nucleosomes are enriched at exon boundaries (Figure 7B). As shown in Figure S6D, nucleosome occupancy in exons and introns increases with gene expression level. However, the occupancy pattern of each map is consistent across gene expression levels (Figure 7C).
Figure 7. Nucleosome Occupancy at Intron-exon and Exon-intron Junctions.

(A and B) NCP scores and MNase read center scores aligned at 169,358 internal exon centers. Ovals: nucleosomes; boxes: internal exons with average length of 149 bp. (C) NCP scores and MNase read center scores aligned at centers of internal exons, as in (A and B), and intron centers by FPKM quartiles. (D) Top, plots of chemical NCP scores aligned at 189,474 intron → exon and exon → intron boundaries in a 500 bp window. Bottom, GRO-seq density at splice junctions shows that GRO-seq enrichment coincides with NCP scores at exon boundaries. NCP scores show distinct peaks (arrows) corresponding to the most favored nucleosome positions. (E and F) NCP scores or GRO-seq density aligned at splice junctions sorted by FPKM gene expression quartiles. (G) A model illustrating nucleosome-mediated regulation of RNAPII elongation at exon boundaries. RNAPII stalls where well-phased nucleosomes demarcate intron → exon and exon → intron boundaries.
Further examination of nucleosome positions at exon boundaries shows that, at the intron-to-exon junction, nucleosome centers are most favorably positioned at the second base pair of the exon (+2 bp), with additional preferred positions spaced periodically at 10n bp upstream and downstream of +2 bp (Figure 7D). At the exon-to-intron junction, the two most favored dyad positions are the second-to-last base pair of the exon (−2 bp) and the seventh base pair of the intron (+7 bp), and no strong periodicity in positioning preference is observed. In sum, our results show that nucleosomes are well positioned at exon-intron junctions rather than at exon centers, as previous MNase maps suggested.
The rate of transcription is known to influence the recognition of splice sites by the RNAPII-coupled splicing machinery (Saldi et al., 2016). Nucleosomes at exon-intron junctions might serve as speed bumps that pause or slow an elongating RNAPII, thus favoring exon definition. To investigate the role of nucleosomes at exon boundaries in this process, we overlaid GRO-seq density (Williams et al., 2015) with chemical NCP scores in a ±140 bp window around 189,474 mouse internal exon-intron and intron-exon junctions. The accumulation of GRO-seq signal coincided almost perfectly with the peaks of NCP scores around exon boundaries (Figure 7D, black). Furthermore, the characteristic NCP score patterns remain the same regardless of gene expression level (Figure 7E). For all expressed genes in ES cells, GRO-seq density accumulates at the junctions, although with different intensity in each expression group (Figure 7F). Because GRO-seq signals typically correlate with RNAPII ChIP-seq signals (Williams et al., 2015), the overlap of GRO-seq signal and NCP scores at exon boundaries suggests that RNAPII likely stalls close to the nucleosome dyad at exon-intron junctions.
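The junction-aligned profiles described above (and the TSS- and exon-center-aligned plots elsewhere) amount to averaging a per-base score in a fixed window around each anchor. The sketch below makes three assumptions: one chromosome at a time, minus-strand windows reversed so that offsets always follow the direction of transcription, and anchors whose window runs off the chromosome skipped. `aggregate_profile` is a hypothetical name, and the sketch does not define where +2 bp or +7 bp fall; that depends on the junction coordinate convention.

```python
def aggregate_profile(scores, anchors, half_window=140):
    """Average per-base scores in a +/-half_window window around each anchor.

    scores  : per-base scores (e.g. NCP or GRO-seq density) along one chromosome
    anchors : (position, strand) pairs; '-' windows are reversed so that
              offsets are always in the direction of transcription
    Returns a list of length 2*half_window + 1 (index half_window = anchor).
    """
    width = 2 * half_window + 1
    total = [0.0] * width
    used = 0
    for pos, strand in anchors:
        lo, hi = pos - half_window, pos + half_window
        if lo < 0 or hi >= len(scores):
            continue  # skip anchors whose window runs off the chromosome
        window = scores[lo:hi + 1]
        if strand == "-":
            window = window[::-1]
        for i, value in enumerate(window):
            total[i] += value
        used += 1
    return [t / used for t in total] if used else total
```

Running the same function on NCP scores and on GRO-seq density with the same anchors produces two profiles on a shared offset axis, which is the kind of overlay shown in Figure 7D.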
DISCUSSION
In this study, we have established a chemical mapping method to determine precise nucleosome positions for mammalian cells. The chemical nucleosome map reveals several features of mammalian nucleosome landscape on a genome-wide scale, particularly with regard to the role of nucleosomes in transcriptional regulation.
First, many stereotypical NDRs, most noticeably in mouse promoters, are abundantly occupied by nucleosomes in the chemical map. This observation directly conflicts with the prevailing view of promoter nucleosome architecture based on extensive MNase-seq mapping studies (Hughes and Rando, 2014). However, our study is not alone in this regard; several pioneering studies have provided evidence that challenges the existing model. In yeast, differential MNase digestion identified over 2,000 promoters containing MNase-sensitive “fragile” nucleosomes upstream of the TSS (Kubik et al., 2015). Similar observations have also been made in flies, plants, and mice (Chereji et al., 2016; Iwafuchi-Doi et al., 2016; Vera et al., 2014). Moreover, using MPE-ChIP-seq, Ishii et al. recently showed that promoters are occupied by MNase-sensitive nucleosomes in ES cells (Ishii et al., 2015). The partial MNase digestion data in this study support these previous reports. Differential MNase digestion can identify “fragile” nucleosomes. However, the occupancies of “fragile” and stable nucleosomes are hard to compare because they are measured under different experimental variables. In addition, dyad positions of “fragile” nucleosomes are expected to be less accurate because partial MNase maps have poor resolution. In contrast, H4S47C-mediated chemical mapping gives accurate positional information on “fragile” nucleosomes. Using the chemical map, we extend this observation to other NDRs such as the TTS, showing that MNase-sensitive nucleosomes occupy previously designated NDRs in the mouse genome. Thus, together with accumulating studies on “fragile” nucleosomes from other groups (Chereji et al., 2016; Ishii et al., 2015; Kubik et al., 2015; Xi et al., 2011), our chemical map challenges the current model and reveals an unexpected feature of nucleosome architecture over the TSS and TTS. However, the nature of “fragile” nucleosomes remains unknown, and future investigations are needed to examine several possibilities (Kubik et al., 2015).
Second, CTCF and pluripotency TF binding sites in mouse ES cells show high nucleosome occupancy in the chemical map. To our surprise, binding of CTCF and pluripotency factors appears to increase the sensitivity of nucleosomes to MNase. It has recently been shown that the canonical pioneer factor FoxA displaces linker histone H1 and increases MNase accessibility of nucleosomes at liver-specific enhancers (Iwafuchi-Doi et al., 2016). We hypothesize that a similar mechanism may operate at nucleosomes co-bound by CTCF, Oct4, Nanog, Sox2, or Klf4 in ES cells. Indeed, recent data have shown that three of these factors (Oct4, Sox2, and Klf4) can bind directly to nucleosomal target DNA in vitro (Soufi et al., 2015). Positive ChIP-seq signals were previously identified for H3.3 and H2A.Z at CTCF-binding sites (Jin et al., 2009). MPE-ChIP-seq has also identified H3 and H2B occupancy at these factor-binding sites in ES cells [CTCF shown in (Ishii et al., 2015) and the other four by our re-analysis (data not shown)]. These results, together with the in vivo evidence provided by our chemical map, further support a pioneering function for these factors, adding to an increasingly diverse repertoire of pioneer factors in mammals (Zaret and Mango, 2016).
The differential nucleosome architecture at the promoter described above led us to re-examine the in vivo barrier function of nucleosomes using the chemical map. GRO-seq signals capture both NELF/DSIF-regulated and nucleosome-induced pausing events in the proximal promoter region. Even so, the chemical map reveals a statistically significant positive correlation between +1 nucleosome occupancy and RNAPII pausing in ES cells (Figures 6B and 6C). Extending this line of investigation to exon-intron junctions, the chemical map calls for a revision of the current exon-centered nucleosome model based on MNase mapping: on a genomic scale, nucleosomes occupy both junctions of exons and introns. Our observation is reminiscent of a hypothesis put forward in earlier work based on the common sequence periodicity around splice junctions (Kogan and Trifonov, 2005). It has previously been shown that RNAPII pauses at 5′ intron-exon and 3′ exon-intron junctions (Jonkers et al., 2014; Kwak et al., 2013). Our chemical map illustrates that splice acceptors and donors are preferentially positioned near nucleosome dyads, where the strongest DNA-histone interactions occur (Kulaeva et al., 2013). Thus, the two-nucleosome model at exon-intron junctions provides a better explanation for the two pausing events observed around an exon. Exon definition is a co-transcriptional event regulated by transcription kinetics and epigenetic histone modifications (Bentley, 2014). Given this, the accumulation of steady-state RNAPII and the preference of nucleosome dyads for both splice junctions suggest that nucleosome positioning at the DNA level regulates exon definition. In this scenario, nucleosomes act as a transcription barrier that influences the elongation rate. Preferentially positioned nucleosomes demarcate exon junctions by stalling RNAPII twice at exon boundaries (Figure 7G).
In conclusion, we have constructed a high-resolution chemical nucleosome map in mouse ES cells. As demonstrated here, the resulting map reveals new dynamic features of nucleosome architecture in transcriptional regulation. Given the method's clear advantages in accuracy and consistency, a natural next step is to apply this approach to other mammalian cell types.
STAR★METHODS
CONTACT FOR REAGENT AND RESOURCE SHARING
For experimental reagents and resources contact awang@northwestern.edu; for computational data analysis resources contact jzwang@northwestern.edu
EXPERIMENTAL MODEL AND SUBJECT DETAILS
Generation of H4S47C-expressing mouse ES cells
Parental mouse AB2.2 ES cells were used to construct H4S47C ES cell lines for chemical mapping. We designed a strategy to replace most endogenous H4 with H4S47C in AB2.2 cells using a combination of RNAi knockdown and cDNA expression. We generated and validated three independent shRNAs against two regions of common DNA sequence shared by 13 mouse H4 genes. To construct mU6-driven H4-shRNAs targeting all mouse H4 genes, the following oligo pairs were phosphorylated, annealed, and ligated into the BbsI- and XhoI-digested PB-mU6∷PGK-Puro vector: H4-Sh1 sense and H4-Sh1 antisense, H4-Sh2 sense and H4-Sh2 antisense, and H4-Sh3 sense and H4-Sh3 antisense (sequences available in the Key Resources Table). To express H4S47C in the presence of H4-shRNA, we synthesized a codon-modified, RNAi-resistant H4S47C cDNA expression vector, PB-CAG-H4S47C∷PGK-Hygro. The H4-shRNA and H4S47C transgenes were simultaneously introduced into mouse AB2.2 ES cells by PiggyBac transgenesis (Su et al., 2011). After sequential drug selection with hygromycin and puromycin, we established multiple stable ES cell lines. For each line, the expression levels of endogenous H4 and synthetic H4S47C were analyzed by RT-PCR. Lines with relatively high levels of H4S47C and low levels of endogenous H4 were selected for further validation and characterization. The “A12” H4S47C clone was used for chemical mapping in this study.
KEY RESOURCES TABLE
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Antibodies | ||
| Histone H4 antibody | Proteintech | Cat# 16047-1-AP |
| Biological Samples | ||
| Chemicals, Peptides, and Recombinant Proteins | ||
| L-α-Lysophosphatidylcholine from egg yolk | Sigma | Cat# L4129-25mg |
| N-(1,10-phenanthrolin-5-yl)iodoacetamide | Biotium | Cat# 92015 |
| Hydrogen peroxide | Fisher Chemical | Cat# H325-100 |
| Neocuproine, 98% | Alfa Aesar | Cat# J63698 |
| 3-mercaptopropionic acid, 99% | Alfa Aesar | Cat# A13261 |
| Nuclease micrococcal from Staphylococcus aureus (MNase) | Sigma | Cat# N3755-500UN |
| Azidohomoalanine | AnaSpec | Cat# AS-63669 |
| Critical Commercial Assays | ||
| Histone Purification Kit | Active Motif | Cat# 40025 |
| DNATerminator® End Repair Kit | Lucigen | Cat# 40035-2 |
| DNA Polymerase I, Large (Klenow) Fragment | New England BioLabs | Cat# M0210S |
| Deposited Data | ||
| Raw and analyzed sequence data for DNA-seq and RNA-seq | This paper | GEO: GSE82127 |
| Experimental Models: Cell Lines | ||
| Clone A12 H4S47C mouse ES cells | This paper | N/A |
| Experimental Models: Organisms/Strains | ||
| Recombinant DNA | ||
| PB-mU6∷PGK-Puro vector | This paper | N/A |
| PB-CAG-H4S47C∷PGK-Hygro | This paper | N/A |
| Sequence-Based Reagents | ||
| H4-sh1 sense: TTT GGC CTC ATC TAC GAG GAG AAT TAA GTC TCC TCG TAG ATG AGG CCT TTT TC | This paper | N/A |
| H4-sh1 antisense: TCG AGA AAA AGG CCT CAT CTA CGA GGA GAC TTA ATT CTC CTC GTA GAT GAG GC | This paper | N/A |
| H4-sh2 sense: TTT GCG ACG CCG TCA CCT ACA CAT ATA GAG GTG TAG GTG ACG GCG TCG CTT TTT C | This paper | N/A |
| H4-sh2 antisense: TCG AGA AAA AGC GAC GCC GTC ACC TAC ACC TCT ATA TGT GTA GGT GAC GGC GTC G | This paper | N/A |
| H4-sh3 sense: TTT GAG CAC GTC AAG CGT AAG AAT TAA TGT CTT GCG CTT GGC GTG CTC TTT TTC | This paper | N/A |
| H4-sh3 antisense: TCG AGA AAA AGA GCA CGC CAA GCG CAA GAC ATT AAT TCT TAC GCT TGA CGT GCT | This paper | N/A |
| H4S47C: ATG TCT GGT CGC GGT AAG GGA GGA AAA GGC CTG GGC AAA GGT GGC GCT AAA CGC CAC CGC AAA GTG CTC CGC GAT AAC ATC CAG GGC ATC ACC AAG CCT GCC ATT CGC CGC CTT GCT CGG CGC GGC GGT GTC AAA CGC ATT TGC GGA TTG ATT TAT GAA GAA ACT AGA GGC GTG CTC AAA GTC TTT CTG GAG AAC GTC ATT AGA GAT GCA GTG ACT TAT ACC GAA CAT GCA AAA AGA AAA ACA GTC ACA GCC ATG GAT GTT GTG TAT GCC CTC AAA CGC CAG GGC CGC ACC CTC TAC GGA TTC GGC GGC TAA | This paper | N/A |
| Software and Algorithms | ||
| Bowtie | Langmead et al., 2009 | http://bowtie-bio.sourceforge.net/manual.shtml |
| Tophat | Trapnell et al., 2009 | https://ccb.jhu.edu/software/tophat/manual.shtml |
| Cufflinks | Trapnell et al., 2010 | http://cole-trapnell-lab.github.io/cufflinks |
| MACS | Zhang et al., 2008 | http://liulab.dfci.harvard.edu/MACS |
| R | The R Development Core Team | https://www.r-project.org/ |
| NuPoP (R-package) | Xi et al., 2010 | https://www.bioconductor.org/packages/release/bioc/html/NuPoP.html |
| NuCMap (R-package) | Xi et al., 2014 | http://bioinfo.stats.northwestern.edu/~jzwang/NuCMap.html |
| Other | ||
METHOD DETAILS
ES cell culture
WT and H4S47C ES cells were cultured using a standard procedure as previously described (Su et al., 2011). To test the differentiation potential of H4S47C ES cells, we followed a widely used embryoid body (EB) formation protocol (Wobus et al., 2002). In brief, ES cell aggregates were first cultured as hanging drops for 2 days and subsequently grown in suspension for 3 days. Differentiation of EBs was induced with 10⁻⁸ M retinoic acid on gelatin-coated plates.
RT-PCR and RNA-seq
Total RNA was isolated from cultured mouse ES cells with TRIzol Reagent (Life Technologies) according to the manufacturer’s instructions. Reverse transcription was performed using Superscript III with an oligo-dT primer (Invitrogen). For RNA-seq analysis, total RNA samples from WT and H4S47C ES cells were used to construct RNA-seq libraries with a TruSeq Stranded Total RNA Library Prep Kit (Illumina). The library sequencing was performed using a HiSeq 2000 platform at the Genomics Facility of University of Chicago.
Chemical Mapping of H4S47C ES cells
The “A12” H4S47C-expressing ES cells were cultured without feeders in M15 medium with LIF and 2i before the mapping experiments. The cells were harvested for mapping reactions as previously described, with specific modifications for mouse ES cells (Brogaard et al., 2012). First, trypsinized ES cells were pelleted and washed in permeabilization buffer (150 mM sucrose, 80 mM KCl, 35 mM HEPES pH 7.4, 5 mM K2HPO4, 5 mM MgCl2). Second, cells were permeabilized with L-α-lysophosphatidylcholine at a final concentration of 100 μg/mL for 5 min. Permeabilized cells were then washed in wash buffer (150 mM sucrose, 5 mM MgCl2, 0.01% NP-40). Third, the cell pellets were resuspended in labeling buffer (150 mM sucrose, 10 mM Tris-HCl pH 7.4, 15 mM NaCl, 60 mM KCl, 5 mM MgCl2, 0.01% NP-40, 0.5 mM spermidine, 0.15 mM spermine) and incubated with 1.4 mM N-(1,10-phenanthrolin-5-yl)iodoacetamide (Biotium) for 2 h at room temperature. Fourth, following washes with mapping buffer (150 mM sucrose, 50 mM Tris-HCl pH 7.5, 2.5 mM NaCl, 60 mM KCl, 5 mM MgCl2, 0.01% NP-40, 0.5 mM spermidine, 0.15 mM spermine), the labeled cells were incubated with 0.15 mM CuCl2 for 2 min. The pellets were washed again, resuspended in mapping buffer, and subjected to 20 min of hydroxyl radical cleavage using 6 mM 3-mercaptopropanoic acid (Sigma) and 6 mM H2O2. The mapping reaction was quenched with 2.8 mM neocuproine (Alfa Aesar). Lastly, to extract chemically cleaved DNA, the cell pellets were lysed at 55°C for 4 h in a buffer containing 10 mM Tris-HCl pH 8.0, 25 mM EDTA, 100 mM NaCl, 0.5% SDS, and 0.2 mg/mL proteinase K. Following phenol/chloroform extraction, ethanol precipitation, and RNase A treatment at 40 μg/mL, the purified DNA was resolved on a 2.25% NuSieve 3:1 (Lonza) agarose gel. DNA fragments between two adjacent nucleosomes (~100 to 300 bp) were gel-purified and prepared for paired-end sequencing.
Complete and partial MNase mapping
Complete MNase digestion of nucleosomal DNA in mouse ES cells was performed as recently described (Sebeson et al., 2015). The resulting mononucleosomal DNA was gel purified for library construction. For partial MNase digestion, ES cells were harvested and permeabilized as described for chemical mapping. Cells were resuspended in MNase buffer (150 mM sucrose, 50 mM Tris-HCl pH 7.4, 50 mM NaCl, 0.5 mM MgCl2, 2 mM CaCl2, 0.01% NP-40, 0.15 mM spermine, 0.5 mM spermidine) and incubated with 5 U/mL or 15 U/mL MNase at 37°C. The reactions were terminated with Stop Buffer (10 mM Tris-HCl pH 7.4, 100 mM NaCl, 10 mM EDTA, 1% SDS) after a 5 min digestion, when roughly 15–20% of genomic DNA was cleaved into mononucleosome-sized fragments. The mononucleosomal DNA fragment was resolved on a 1.5% agarose gel and gel-purified for sequencing.
Metabolic labeling and histone purification
Newly synthesized histones were metabolically labeled according to a previously described protocol (Deal et al., 2010). Briefly, WT and H4S47C ES cells were starved of methionine for 30 min in methionine-free DMEM and then incubated for 30 min with 4 mM azidohomoalanine (AnaSpec). After labeling, cells were harvested, washed, resuspended in cold NE1 buffer (10 mM Tris-HCl pH 7.4, 10 mM KCl, 1 mM MgCl2, 0.1% Triton X-100, 0.1 mM DTT, 20% glycerol), gently vortexed, and centrifuged to isolate nuclei. Nuclei were resuspended in cold HB125 buffer (0.125 mM sucrose, 15 mM Tris pH 7.5, 15 mM NaCl, 40 mM KCl), followed by biotin coupling and chromatin extraction.
We extracted core histones from WT and H4S47C ES cells with the Histone Purification Kit (Active Motif) according to the manufacturer’s protocols. Histones were separated by a 15% SDS-PAGE gel and visualized by Coomassie Blue staining or western blot using rabbit anti-histone H4 (16047-1-AP, Proteintech).
QUANTIFICATION AND STATISTICAL ANALYSIS
Sequencing and read alignment
All DNA libraries from chemical and MNase mapping were prepared for paired-end sequencing using the standard Illumina protocol. The libraries were sequenced on the Illumina HiSeq 2000 platform at the Genomics Facility of University of Chicago. For chemical mapping, we generated 5,610 million paired-end reads from two independent chemical mapping experiments. For MNase mapping of WT and H4S47C ES cells, we generated 1,062 million and 1,392 million paired-end reads, respectively. For partial MNase mapping, we generated 172.9 million paired-end reads for 5 U/mL MNase and 181.1 million paired-end reads for 15 U/mL MNase.
Reads were mapped to the mouse (mm9) genome using Bowtie. We mapped the paired-end reads allowing 0 to 3 mismatches progressively. The reads were first aligned under the most stringent criterion (i.e., 0 mismatches). In each subsequent step, reads left unaligned by the previous run were re-aligned under a less stringent mismatch criterion. This strategy balances the alignment success rate (given sequencing errors) against the chance of mis-alignment due to sequence homology. If a read pair aligned to n locations, each location was assigned a read frequency of 1/n. The alignment yielded ~4,390 million cleavages on each strand. For the MNase maps, we applied the same alignment strategy, but only uniquely mapped reads were used in subsequent analyses.
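The progressive-mismatch scheme and the 1/n weighting can be sketched independently of any particular aligner. In this sketch `align(read_id, k)` is a hypothetical wrapper, not a Bowtie API: it stands for whatever call returns the hit locations of a read pair when k mismatches are allowed. `progressive_align`, `fractional_counts`, and `unique_only` are also illustrative names.

```python
from collections import defaultdict


def progressive_align(read_ids, align, max_mismatches=3):
    """Align with 0, 1, ..., max_mismatches mismatches, retrying only the reads
    that failed under the previous, stricter setting.

    `align(read_id, k)` is a hypothetical aligner wrapper that returns the list
    of hit locations found when k mismatches are allowed.
    Returns (aligned: read_id -> hits, reads that never aligned).
    """
    aligned, remaining = {}, list(read_ids)
    for k in range(max_mismatches + 1):
        unaligned = []
        for read_id in remaining:
            hits = align(read_id, k)
            if hits:
                aligned[read_id] = hits
            else:
                unaligned.append(read_id)
        remaining = unaligned
    return aligned, remaining


def fractional_counts(aligned):
    """Chemical map: each of the n hit locations of a read pair gets weight 1/n."""
    counts = defaultdict(float)
    for hits in aligned.values():
        weight = 1.0 / len(hits)
        for location in hits:
            counts[location] += weight
    return dict(counts)


def unique_only(aligned):
    """MNase maps: keep only read pairs with exactly one hit location."""
    return {read_id: hits for read_id, hits in aligned.items() if len(hits) == 1}
```

Retrying only the failures means a read that aligns perfectly is never exposed to looser criteria. This reflects the trade-off in the text between recovering reads with sequencing errors and avoiding spurious hits in homologous regions.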
NCP score calculation and unique nucleosome set in the chemical map
We followed the approach of Brogaard et al. to identify the cleavage pattern around the nucleosome dyad in the chemical map (Brogaard et al., 2012). The resulting cleavage patterns were summarized in four distinct templates that characterize the amount of cleavage at positions around the dyad. These templates were fed into the R package NuCMap, which implements the deconvolution algorithm developed by Xi et al. (available at http://bioinfo.stats.northwestern.edu/~jzwang), to compute the nucleosome center positioning (NCP) score at every genomic location (Xi et al., 2014). The NCP score measures the abundance of nucleosomes centered at a given genomic location. Because the mouse genome is larger, the average cleavage depth (1.6 per base pair) in the mouse chemical map is lower than in the S. cerevisiae and S. pombe maps. Thus, the chemical map for ES cells was constructed using the deconvolution model without the noise term.
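The idea behind deconvolution can be illustrated with a deliberately simplified stand-in. The sketch below is not the model of Xi et al. or NuCMap. It uses a single hypothetical cleavage template with hypothetical offsets from the dyad, treats the observed cleavage as template-weighted sums of nucleosome-center abundances, and recovers those abundances by plain least squares clipped at zero. The real method uses four strand-specific templates and its own estimation procedure. `cleavage_matrix` and `ncp_scores` are illustrative names.

```python
import numpy as np


def cleavage_matrix(n_centers, template, offsets):
    """Column c = expected cleavage counts produced by one nucleosome centred at c.

    `template` (weights) and `offsets` (bp from the dyad) are hypothetical.
    Rows span positions -pad .. n_centers - 1 + pad so that cleavages falling
    beyond the candidate-centre range are still represented.
    """
    pad = max(abs(o) for o in offsets)
    A = np.zeros((n_centers + 2 * pad, n_centers))
    for c in range(n_centers):
        for weight, off in zip(template, offsets):
            A[c + pad + off, c] += weight
    return A


def ncp_scores(cleavage, n_centers, template, offsets):
    """Least-squares estimate of nucleosome-centre abundance, clipped at zero."""
    A = cleavage_matrix(n_centers, template, offsets)
    s, *_ = np.linalg.lstsq(A, np.asarray(cleavage, dtype=float), rcond=None)
    return np.clip(s, 0.0, None)
```

On noiseless simulated input the centers are recovered exactly. With real, sparse cleavage data (about 1.6 cleavages per bp here), a likelihood-based model with appropriate count noise is the more principled choice.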
We defined the unique nucleosome map based on the magnitude of NCP scores, avoiding overlaps between neighboring nucleosomes. First, the genomic location with the largest NCP score was called as the center of a nucleosome. Then the location with the next largest NCP score that was at least 147 bp away from the first selected nucleosome was called as the second nucleosome center. This step was repeated until no further nucleosomes could be defined, with each newly selected center required to be at least 147 bp from all previously selected centers. From all defined nucleosomes, we selected the 95% with the highest NCP scores (~10.6 million) as the final unique set representing the most abundant nucleosomes.
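The greedy selection can be sketched as follows, assuming per-base NCP scores for one chromosome. Two details are our assumptions: positions with a score of zero are never called, and the 95% cutoff is applied by truncating the score-ordered list of accepted centers. `unique_nucleosomes` is a hypothetical name. Sorting every base of a mammalian chromosome in pure Python would be slow, so this only illustrates the logic.

```python
from bisect import bisect_left, insort


def unique_nucleosomes(ncp, min_sep=147, keep_fraction=0.95):
    """Greedy non-overlapping dyad calls from per-base NCP scores.

    Positions are visited in decreasing NCP score; a position is accepted only
    if it is at least min_sep bp from every centre accepted so far. The top
    keep_fraction of accepted centres (by score) is returned in genomic order.
    """
    order = sorted(range(len(ncp)), key=lambda i: ncp[i], reverse=True)
    accepted_sorted = []    # accepted centres in genomic order (for neighbour lookup)
    accepted_by_score = []  # accepted centres in decreasing-score order
    for i in order:
        if ncp[i] <= 0:
            break  # assumption: zero-score positions are never called
        j = bisect_left(accepted_sorted, i)
        if j > 0 and i - accepted_sorted[j - 1] < min_sep:
            continue  # too close to the nearest accepted centre on the left
        if j < len(accepted_sorted) and accepted_sorted[j] - i < min_sep:
            continue  # too close to the nearest accepted centre on the right
        insort(accepted_sorted, i)
        accepted_by_score.append(i)
    n_keep = int(len(accepted_by_score) * keep_fraction)
    return sorted(accepted_by_score[:n_keep])
```

Only the nearest accepted neighbor on each side needs to be checked. Any other accepted center lies further away, so the sorted list plus `bisect` makes each check logarithmic rather than linear.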
Nucleosome occupancy and center-weighted occupancy
We defined nucleosome occupancy in two ways throughout this paper. For the chemical map, we defined nucleosome occupancy at position k as $O_k = \sum_{j=-73}^{73} S_{k+j}$, where $S_{k+j}$ is the NCP score at position k + j. In addition, we defined the center-weighted occupancy at position k as $C_k = \sum_{j=-73}^{73} w_j S_{k+j}$, where $w_j$ is the Gaussian weight $w_j = \exp[-(j/20)^2/2]$. The center-weighted occupancy is a smoothed version of the NCP score, which often provides a better definition of nucleosome boundaries.
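Both occupancy measures can be sketched as sliding-window sums over per-base scores. Two choices here are our assumptions: a half-width of 73 bp (a 147-bp footprint centered on the dyad) for both measures, and treating positions beyond the chromosome ends as zero. The Gaussian scale of 20 bp follows the weight $w_j$ given in the text. The same functions apply unchanged to the MNase read center scores.

```python
import math

HALF_WIDTH = 73   # assumed: 147-bp footprint centred on the dyad
SIGMA = 20.0      # Gaussian scale from w_j = exp[-(j/20)^2 / 2]


def occupancy(scores, half_width=HALF_WIDTH):
    """Sum of scores within +/-half_width of each position (via prefix sums)."""
    n = len(scores)
    prefix = [0.0]
    for s in scores:
        prefix.append(prefix[-1] + s)
    return [prefix[min(n, k + half_width + 1)] - prefix[max(0, k - half_width)]
            for k in range(n)]


def center_weighted_occupancy(scores, half_width=HALF_WIDTH, sigma=SIGMA):
    """Gaussian-weighted sum of scores within +/-half_width of each position."""
    n = len(scores)
    weights = {j: math.exp(-((j / sigma) ** 2) / 2)
               for j in range(-half_width, half_width + 1)}
    return [sum(w * scores[k + j] for j, w in weights.items() if 0 <= k + j < n)
            for k in range(n)]
```

A single dyad contributes equally to every position within ±73 bp under plain occupancy. Under the center-weighted version its contribution decays with distance, which is why the latter gives sharper nucleosome boundaries.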
For the MNase maps, nucleosome occupancy and center-weighted occupancy were computed in the same way as for the chemical map. We first selected reads 127–167 bp in length and calculated the read center score, defined as the number of reads centered at each genomic position (for a read of even length, we assigned 0.5 to each of its two central positions). The read center score is analogous to the NCP score in the chemical map: it measures nucleosome center occupancy at a given genomic location. Substituting the read center score for the NCP score in the above formulas gives the nucleosome occupancy and center-weighted occupancy for the MNase maps.
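The read center score can be sketched as follows, assuming paired-end fragments are given as 0-based, half-open `(start, end)` intervals on one chromosome. `read_center_scores` is a hypothetical name.

```python
def read_center_scores(fragments, chrom_length, min_len=127, max_len=167):
    """Per-base count of fragment centres for fragments of 127-167 bp.

    fragments: (start, end) pairs, 0-based half-open (assumed convention).
    Odd-length fragments add 1.0 at their single central base; even-length
    fragments add 0.5 to each of their two central bases.
    """
    score = [0.0] * chrom_length
    for start, end in fragments:
        length = end - start
        if not (min_len <= length <= max_len):
            continue  # outside the mononucleosome size range used here
        mid = start + length // 2
        if length % 2:
            score[mid] += 1.0
        else:
            score[mid - 1] += 0.5
            score[mid] += 0.5
    return score
```

For example, a 147-bp fragment starting at 0 covers bases 0 to 146 and contributes 1.0 at base 73. A 148-bp fragment starting at 10 splits its weight between bases 83 and 84.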
Read coverage score in partial MNase maps
Because the center of a read in the partial MNase maps (5 U/mL and 15 U/mL) does not necessarily align with the nucleosome center, we defined the read coverage score at a given genomic location for partial MNase maps as the total number of reads that cover the location.
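The read coverage score is the per-base count of overlapping reads. A difference array computes it in a single pass, assuming 0-based, half-open `(start, end)` fragments. `read_coverage` is a hypothetical name.

```python
def read_coverage(fragments, chrom_length):
    """Number of fragments covering each base (0-based, half-open intervals)."""
    diff = [0] * (chrom_length + 1)
    for start, end in fragments:
        diff[start] += 1   # coverage rises at the first covered base
        diff[end] -= 1     # and falls just past the last covered base
    coverage, running = [], 0
    for i in range(chrom_length):
        running += diff[i]
        coverage.append(running)
    return coverage
```

Unlike the read center score, this counts every base a read spans. That makes it robust to partially digested fragments whose midpoints need not coincide with dyads.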
Unique nucleosome set for MNase maps of WT and H4S47C ES cells
For the MNase maps of WT and H4S47C mouse ES cells, we first selected all local peaks of the center-weighted occupancy score obtained above, requiring each selected peak to be the maximum score within ±35 bp. This requirement helps avoid nucleosome calls that would otherwise correspond to extremely low scores in the following steps. From these peaks, we applied the same algorithm used for the chemical map, as described above, and progressively selected ~10 million well-defined, non-overlapping nucleosomes for WT and H4S47C ES cells.
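The ±35 bp local-maximum filter can be sketched as follows. Ignoring zero-score positions is our assumption. On flat plateaus, several adjacent positions can each qualify as a peak; the subsequent 147-bp greedy step keeps only one of them. `local_peaks` is a hypothetical name, and the simple window scan is O(n·w).

```python
def local_peaks(scores, half_window=35):
    """Positions whose score is the maximum within +/-half_window bp.

    Zero-score positions are ignored (assumption).
    """
    n = len(scores)
    peaks = []
    for k in range(n):
        lo, hi = max(0, k - half_window), min(n, k + half_window + 1)
        if scores[k] > 0 and scores[k] == max(scores[lo:hi]):
            peaks.append(k)
    return peaks
```

The surviving peaks would then be the only candidates for the greedy non-overlapping selection, ordered by their center-weighted score.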
To calculate the center-to-center distance between nucleosomes in the two unique sets, we used the first set as a reference for calculating how many nucleosomes in the second set were d bp away from the closest nucleosome in the reference set for −73 ≤ d ≤ 73. The difference due to which set is used as the reference is negligible when the selected numbers of nucleosomes in the two sets are close.
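The center-to-center comparison can be sketched as a nearest-neighbor search against the sorted reference set, tallying signed offsets d = query − reference for −73 ≤ d ≤ 73. How ties between two equidistant reference centers are broken is not specified in the text. Here the upstream reference wins, which is an assumption. `offset_histogram` is a hypothetical name.

```python
from bisect import bisect_left
from collections import Counter


def offset_histogram(reference, query, max_offset=73):
    """Count signed offsets from each query centre to its nearest reference centre.

    Offsets outside [-max_offset, max_offset] are discarded. Ties between two
    equidistant reference centres go to the upstream one (assumption).
    """
    ref = sorted(reference)
    hist = Counter()
    for q in query:
        j = bisect_left(ref, q)
        candidates = [ref[i] for i in (j - 1, j) if 0 <= i < len(ref)]
        if not candidates:
            continue
        nearest = min(candidates, key=lambda r: abs(q - r))
        d = q - nearest
        if -max_offset <= d <= max_offset:
            hist[d] += 1
    return hist
```

Swapping `reference` and `query` flips the sign of each offset and can change which pairs are matched. As the text notes, the effect is negligible when the two sets have similar sizes.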
AA/TT/AT/TA signal for MNase map
The AA/TT/AT/TA signal for the MNase map was calculated from a published data set (Teif et al., 2012). We first computed the nucleosome center-weighted occupancy score and then defined a unique set of nucleosomes using the algorithm described above. The selected center-weighted occupancy peaks represent the approximate center positions of the nucleosomes, although with systematic errors because of the well-known sequence preference of MNase. Hence, if the DNA were aligned at these peak positions, the AA/TT/AT/TA signal would become much weaker or disappear entirely. Instead, we searched the raw data for 147-bp sequence reads in the ±20 bp region nearest the peak position. If no such read existed, we searched for reads of 148, 146, 149, 145, and 150 bp, in that order, within the ±20 bp region until the first read was identified. The center of the identified read was treated as the nucleosome center to generate the AA/TT/AT/TA frequency plot. If no read was identified in the ±20 bp region, the peak position was treated as the true nucleosome center for the alignment.
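The length-prioritized search can be sketched as follows. Several choices here are assumptions: reads are 0-based, half-open `(start, end)` fragments; "nearest" means smallest distance between read center and peak; and an even-length read's center is the midpoint between its two central bases (a half-integer). In practice that midpoint would need rounding before sequences are aligned. `refine_center` is a hypothetical name.

```python
def refine_center(peak, reads, window=20,
                  preferred_lengths=(147, 148, 146, 149, 145, 150)):
    """Replace an MNase occupancy peak by the centre of a nearby read.

    Lengths are tried in priority order; within a length, the read whose
    centre is nearest the peak (and within +/-window bp) wins. If no read
    qualifies, the peak position itself is returned.
    """
    for length in preferred_lengths:
        best = None
        for start, end in reads:
            if end - start != length:
                continue
            center = (start + end - 1) / 2  # midpoint of first and last base
            dist = abs(center - peak)
            if dist <= window and (best is None or dist < abs(best - peak)):
                best = center
        if best is not None:
            return best
    return float(peak)
```

Trying 147 bp first, then lengths that step outward from it, prefers reads whose ends most likely coincide with the nucleosome boundaries. For such reads the midpoint is the least biased estimate of the dyad.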
Definition and analysis of poly(dA-dT) and -(dG-dC) tracts
A poly(dA-dT) tract is defined as a DNA segment with a run of As on one strand and, correspondingly, a run of Ts on the complementary strand. We defined a poly(dA-dT) tract without mismatches as A0, a poly(dA-dT) tract with one internal mismatch as A1, and so forth. The base pair immediately upstream or downstream of a poly(dA-dT) tract must be a non-A nucleotide (or non-T on the other strand). For example, an A1 tract of length 5 is a 5-nt sequence in which four nucleotides are A and one internal nucleotide is not A (at position 2, 3, or 4, but not the first or last position of the 5-mer; e.g., AAGAA, AGAAA, or AAAGA). Again, an A1 tract of length 5 must be immediately preceded or followed by a non-A nucleotide (or non-T on the other strand). The same rules apply to poly(dG-dC) tracts.
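The tract definition can be sketched as a window scan on the forward strand. We read the flanking rule as applying to both sides of a tract and treat sequence ends as valid flanks; both are assumptions. For m ≥ 1, overlapping windows such as the two AAGAA tracts in AAGAAGAA are each reported, since the text does not specify how overlaps are resolved. A poly(dA-dT) scan calls the function with `"A"` and with `"T"` (a T run on the forward strand is an A run on the reverse strand); poly(dG-dC) uses `"G"` and `"C"`. `find_tracts` is a hypothetical name.

```python
def find_tracts(seq, base, length, mismatches):
    """Start positions of A_m-style tracts of exactly `length` nt.

    A hit starts and ends with `base`, contains exactly `mismatches` internal
    non-`base` nucleotides, and is flanked on both sides by a non-`base`
    nucleotide or a sequence end (assumed reading of the flanking rule).
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - length + 1):
        window = seq[i:i + length]
        if window[0] != base or window[-1] != base:
            continue  # mismatches must be internal
        if sum(1 for b in window if b != base) != mismatches:
            continue
        if i > 0 and seq[i - 1] == base:
            continue  # tract would extend further upstream
        if i + length < len(seq) and seq[i + length] == base:
            continue  # tract would extend further downstream
        hits.append(i)
    return hits
```

Requiring non-matching flanks means an A0 call always reports a maximal run. For example, AAA is never also counted as two overlapping AA tracts.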
To calculate the average occupancy over a poly(dA-dT) or -(dG-dC) tract, first, the nucleosome occupancy score at every base pair of a given tract was averaged. The average occupancy was further averaged across all tracts of the same kind identified genome-wide. The final average occupancy score over the tracts was normalized by genome average nucleosome occupancy and presented in the log2 scale.
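The normalized tract occupancy can be sketched as a mean of per-tract means, divided by the genome-wide mean occupancy and expressed in log2. Here tracts are assumed to be `(start, length)` pairs on a per-base occupancy list for one sequence. `tract_enrichment` is a hypothetical name.

```python
import math


def tract_enrichment(occupancy, tracts, genome_mean=None):
    """log2 of (average per-tract mean occupancy / genome mean occupancy).

    occupancy : per-base nucleosome occupancy scores
    tracts    : (start, length) pairs, 0-based (assumed representation)
    """
    per_tract = [sum(occupancy[s:s + n]) / n for s, n in tracts]
    average = sum(per_tract) / len(per_tract)
    if genome_mean is None:
        genome_mean = sum(occupancy) / len(occupancy)
    return math.log2(average / genome_mean)
```

Averaging within each tract first gives every tract equal weight regardless of its length. A value below 0 indicates that the tracts are depleted of nucleosomes relative to the genome average.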
To calculate the distance between nucleosomes and poly(dA-dT) or -(dG-dC) tracts, we searched for the nearest unique nucleosome in the chemical map for each tract. For a tract of even length (which has two centers), the shorter distance was used in the frequency plot. The resulting distance frequencies were symmetrized, because a position downstream of the nucleosome dyad with respect to one strand is upstream with respect to the other strand.
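The tract-to-nucleosome distance and the symmetrization step can be sketched as below. Assumptions: tracts are 0-based `(start, length)` pairs, distances are signed as tract center minus dyad, and symmetrizing means averaging the counts at +d and −d. The nucleosome list must be non-empty. `nearest_offset` and `tract_distance_histogram` are hypothetical names.

```python
from bisect import bisect_left
from collections import Counter


def nearest_offset(center, dyads):
    """Signed offset (center - dyad) to the nearest dyad in a sorted, non-empty list."""
    j = bisect_left(dyads, center)
    candidates = [dyads[i] for i in (j - 1, j) if 0 <= i < len(dyads)]
    return min((center - d for d in candidates), key=abs)


def tract_distance_histogram(tracts, dyads):
    """Symmetrized frequency of tract-centre-to-nearest-dyad offsets."""
    dyads = sorted(dyads)
    counts = Counter()
    for start, length in tracts:
        if length % 2:
            centers = [start + length // 2]
        else:  # even length: two centres, keep the shorter distance
            centers = [start + length // 2 - 1, start + length // 2]
        offset = min((nearest_offset(c, dyads) for c in centers), key=abs)
        counts[offset] += 1
    offsets = set(counts) | {-d for d in counts}
    # +d on one strand is -d on the other, so average the two directions
    return {d: (counts[d] + counts[-d]) / 2 for d in offsets}
```

After symmetrization the plot depends only on the absolute distance to the dyad. This reflects the point in the text that "upstream" and "downstream" swap between strands for a strand-symmetric tract.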
Quantification of factor binding sites
The CTCF binding sites (peak locations) and ChIP-seq reads were obtained from a published study by Chen et al. (2008; GEO: GSM288351). We used the UCSC Genome Browser tool “liftOver” to convert the site and read coordinates from mouse genome mm8 to mm9. Following Chen et al., we extended each single-end read 200 bp downstream from its start position and calculated the ChIP-seq density as the total read coverage in the [−100, 100] bp region around each CTCF peak center.
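The read extension and density calculation can be sketched as follows. For each single-end read, `pos` is assumed to be its 5′ end, which is the leftmost base for "+" reads and the rightmost base for "−" reads. Each read is extended 200 bp in its own downstream direction, and the density is the total number of covered bases inside the inclusive [−100, +100] window around the peak center. `chip_density` is a hypothetical name.

```python
def chip_density(reads, peak_center, fragment_length=200, half_window=100):
    """Total extended-read coverage within peak_center +/- half_window (inclusive).

    reads: (pos, strand) pairs, where pos is the read's 5' end (assumed
    convention: leftmost base for '+', rightmost base for '-').
    """
    lo, hi = peak_center - half_window, peak_center + half_window
    total = 0
    for pos, strand in reads:
        if strand == "+":
            a, b = pos, pos + fragment_length - 1   # extend rightwards
        else:
            a, b = pos - fragment_length + 1, pos   # extend leftwards
        overlap = min(b, hi) - max(a, lo) + 1        # bases shared with window
        if overlap > 0:
            total += overlap
    return total
```

Summing covered bases, rather than counting reads, is one reading of "total reads coverage". A read that only clips the window edge contributes proportionally less.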
For factors Oct4, Sox2, and Nanog, we downloaded the raw ChIP-seq data by Whyte et al., 2013 (GEO: GSE44286). We re-aligned the reads to mouse genome mm9 with Bowtie (with the same parameters suggested by Langmead et al., 2009. Then we applied the ChIP-seq peak-calling software tool MACS (Zhang et al., 2008) to generate the map of putative factor binding sites (using p-value threshold 1e-9). The binding sites (peak location) for Klf4 and Zfx were downloaded from Chen et al.(2008; GEO: GSE11431) and their genome coordinates (on mm8) were converted mm9 using “liftOver”.
To calculate the center-to-center distance between factor sites and nucleosomes, we used the factor peak-centers as the reference set and searched for the nearest unique nucleosome in the ±100 bp region. The ChIP-seq data provides only an approximate region where the factors bind to DNA. To refine the map of factor binding sites, we first downloaded the motif position weight matrix for each factor from “The MEME Suite” (http://meme-suite.org/db/motifs, we used the JASPAR_CORE_2014.meme). We scanned the ±50 bp region of the ChIP-seq peak using the position weight matrix, and the position that achieved the highest score was regarded as the true factor binding site. The adjusted factor center positions were chosen as the reference set for calculation of the distance to nucleosomes. The distance frequency plot presented was symmetrized and smoothed (kernel smoothing with bandwidth= 2).
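The motif-based refinement can be sketched as a log-odds PWM scan. This is a minimal sketch under stated assumptions: the matrix is a letter-probability matrix with columns ordered A, C, G, T (as in MEME motif files), scores use a uniform 0.25 background with a small pseudocount, both strands are scanned, and the motif must lie entirely within ±50 bp of the peak. The source does not specify the scoring scheme, so these choices and the function names are hypothetical.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds(pwm, pseudo=0.01, background=0.25):
    """Convert rows of [pA, pC, pG, pT] probabilities into log2-odds scores."""
    return [
        {b: math.log2((row[i] + pseudo) / (1 + 4 * pseudo) / background)
         for i, b in enumerate(BASES)}
        for row in pwm
    ]


def score(lo_matrix, kmer):
    # Non-ACGT bases (e.g., N) score -inf, so such windows are never chosen.
    return sum(row.get(base, -math.inf) for row, base in zip(lo_matrix, kmer))


def refine_site(seq, peak_center, pwm, flank=50):
    """Return (motif_center, best_score) for the best hit fully inside
    [peak_center - flank, peak_center + flank], scanning both strands."""
    lo_matrix = log_odds(pwm)
    width = len(pwm)
    first = max(0, peak_center - flank)
    last = min(len(seq) - width, peak_center + flank - width + 1)
    best_center, best_score = None, -math.inf
    for start in range(first, last + 1):
        kmer = seq[start:start + width].upper()
        rev_comp = kmer.translate(COMPLEMENT)[::-1]
        s = max(score(lo_matrix, kmer), score(lo_matrix, rev_comp))
        if s > best_score:
            best_center, best_score = start + width // 2, s
    return best_center, best_score
```

The returned motif center then replaces the ChIP-seq peak center as the reference coordinate for the nucleosome distance calculation.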
Linker length calculation
Genome-wide linker DNA length was calculated from the unique set of nucleosomes in the chemical map as the distance between two consecutive nucleosome centers minus 147 bp. The distribution shown includes only linker lengths from 1 to 100 bp.
For the calculation of linker DNA length in genic regions, a linker was counted as genic if both of its neighboring nucleosome centers, as defined in the unique map, lie in the genic region.
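The linker definition reduces to a few lines. In this sketch, `dyads` holds the centers of unique nucleosomes from the chemical map and 147 bp is the nucleosome core length used in the text; the optional `genes` filter (half-open intervals, a hypothetical input) requires both flanking centers to fall in the same gene, which is one interpretation of "in the genic region".

```python
def linker_lengths(dyads, nucleosome_bp=147, min_len=1, max_len=100, genes=None):
    """Linker length = distance between consecutive nucleosome centers - 147 bp."""
    centers = sorted(dyads)
    lengths = []
    for left, right in zip(centers, centers[1:]):
        linker = right - left - nucleosome_bp
        if not (min_len <= linker <= max_len):
            continue  # the reported distribution covers 1-100 bp only
        if genes is not None and not any(
            s <= left < e and s <= right < e for s, e in genes
        ):
            continue  # keep only linkers flanked by two genic nucleosome centers
        lengths.append(linker)
    return lengths
```

For instance, centers at 0, 160, 400, and 560 bp give linkers of 13, 93, and 13 bp; restricting to a gene spanning 100 to 600 bp drops the first pair and leaves 93 and 13 bp.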
DATA AND SOFTWARE AVAILABILITY
Raw and processed data are available from GEO under accession number GSE82127.
Supplementary Material
(A) The mouse genome encodes 13 histone H4 genes, which differ in DNA sequence but encode an identical protein. Sequence alignment of the 13 mouse histone H4 genes identifies two regions of common sequence (1 and 2, highlighted in red) used to design shRNAs against all H4 genes. (B) Schematic of PiggyBac vectors for expressing shRNAs against H4 and a synthesized RNAi-resistant H4S47C cDNA. (C) Co-expression of H4-shRNA1 and H4-shRNA3 in HEK 293T cells efficiently down-regulated FLAG-tagged mouse H4 but not FLAG-tagged H4S47C, showing that the shRNAs were specific for endogenous H4 and not for synthetic H4S47C. (D) RT-PCR analysis confirmed that endogenous H4 was knocked down in H4S47C clones expressing the H4S47C transgene. Representative clones 11, 7, 9, and 1 are shown. (E) WT and H4S47C ES cells showed equal rates of H4 protein synthesis. Both cell lines were pulse-labeled with azidohomoalanine. Newly synthesized histone proteins were biotinylated and detected by streptavidin and Coomassie blue staining. Western blot with rabbit anti-H4 demonstrated equal H4 levels in both cell lines. (F) Western blots showing similar levels of Oct4, Nanog, and Ago2 in WT and H4S47C ES cells. (G) The growth curve demonstrates that H4S47C ES cells grow at the same rate as WT cells. Data represent the mean of three independent biological replicates (n=3); error bars represent the standard error of the mean (SEM). (H) Volcano plots of p-value (−log10 scale) vs fold change (log2 scale) showing differential expression of ~21,800 genes between WT and H4S47C ES cells in 2i (left) or Lif (right) culture conditions. The dotted horizontal line at y=1.3 represents the threshold of statistical significance (p ≤ 0.05). The vertical dotted lines at x=±1 mark a 2-fold change. Genes are identified as significant if p ≤ 0.05 and the fold change is ≥2 (|log2 fold change| ≥ 1). The black arrows indicate the number of genes significantly up- or down-regulated. In either 2i or Lif, only a small number of genes are significantly up- or down-regulated between WT and H4S47C ES cells. (I) Left, representative phase-contrast images show that WT and H4S47C ES cells exhibited the same cellular morphology and differentiation patterns. Day 0: cells were cultured on gelatin-treated plates in the presence of serum and Lif. Day 5: embryoid bodies (EBs) were generated by the hanging drop method for 2 days, followed by suspension culture for 3 days in the absence of Lif. Day 10: differentiation of EBs after transfer into adherent culture with retinoic acid (RA) for 5 days. Right, corresponding RT-PCR shows a loss of Oct4 by Day 5. Reactions without RT were used as a negative control. (J) Crick-Watson cleavage peak-peak distance plot showing three dominant distances of −12, −5, and +2 nucleotides, confirming the primary and secondary cleavage sites at −1 and +6, respectively. Local cleavage peaks were selected from the Watson and Crick strands separately. A position was selected as a peak if it was the local maximum cleavage site within ±73 bp on its strand and its cleavage value was ≥10.
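The peak selection and Crick-Watson distance calculation in (J) can be sketched as below. Assumptions: per-base cleavage counts for both strands are indexed by the same (Watson) coordinate, "≥10" is read as a minimum cleavage count, and peaks are paired only within a hypothetical `max_offset` of 20 bp; only the ±73 bp window and the threshold of 10 come from the legend.

```python
from collections import Counter


def local_peaks(counts, half_window=73, min_count=10):
    """Positions whose count is >= min_count and is the maximum within +/-73 bp."""
    n = len(counts)
    peaks = []
    for i, c in enumerate(counts):
        if c < min_count:
            continue
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        if c == max(counts[lo:hi]):
            peaks.append(i)
    return peaks


def crick_watson_distances(watson, crick, max_offset=20, **peak_kwargs):
    """Histogram of (Crick peak - Watson peak) offsets within +/-max_offset bp."""
    watson_peaks = local_peaks(watson, **peak_kwargs)
    crick_peaks = local_peaks(crick, **peak_kwargs)
    hist = Counter()
    for w in watson_peaks:
        for c in crick_peaks:
            d = c - w
            if abs(d) <= max_offset:
                hist[d] += 1
    return dict(sorted(hist.items()))
```

On real chemical-map data the histogram would be expected to show the −12, −5, and +2 modes described in (J); the toy arrays in the tests only check the mechanics.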
(A) Frequency of AA/TT/AT/TA dinucleotide motifs in aligned nucleosomes in intergenic vs intragenic regions. A unique nucleosome is designated as intragenic if its center is in a gene body; otherwise, it is defined as intergenic. The zero x-coordinate is the dyad of unique nucleosomes defined by the chemical map. The periodicity of the dinucleotide motif signal is comparable between intergenic and intragenic regions, while the difference in average motif frequency is attributable to the difference in A/T composition between the two regions. (B) Same plot as in (A), but only unique nucleosomes in gene bodies were selected and divided into quartiles of FPKM gene expression values. As in (A), the periodicity is comparable among all groups, while the difference in average frequency is due to A/T composition. (C) Normalized nucleosome occupancy (log2 scale) over poly(dA-dT) and poly(dG-dC) tracts with 0–4 mismatches as a function of tract length in mouse ES cells in the chemical and MNase maps, and in S. cerevisiae by chemical mapping. In each plot, A0 (or G0) denotes a poly(dA-dT) (or poly(dG-dC)) tract with 0 mismatches, and so forth. (D) Distance from poly(dA-dT) or poly(dG-dC) tract centers to unique nucleosomes defined by the chemical maps in mouse ES cells and S. cerevisiae.
(A) Center-weighted nucleosome occupancy of mouse ES cell genes aligned at the TSS in the chemical map compared with MNase maps for H4S47C and WT from this study and an MNase map from a previous study (Teif et al., 2012). Nucleosomes are enumerated in terms of distance relative to the 0 bp position at the TSS. The chemical map reveals that some genes are occupied by nucleosomes at the TSS. In this work, a “−1” nucleosome refers to a nucleosome that is positioned directly upstream of the TSS, and a “0” nucleosome refers to one that overlaps with the TSS. (B) Cross-correlation of cleavages across the two strands around −1 nucleosomes (−150 bp to TSS) showing nucleosome-specific characteristic pattern with peaks at −12, −5 and +2. (C) Plot of AA/TT/AT/TA frequency of uniquely defined −1 nucleosomes. (D) Eight clusters from k-means clustering of the chemical NCP score pattern in the [−150,250] bp region around the TSS. For each unique TSS (out of ~21,000), we normalized the NCP score in the [−150, 200] bp window by the average NCP score in this region. We used the “kmeans” function in R to perform the k-means clustering of the normalized NCP score in the [−150, 200] bp region and obtained eight clusters with distinct patterns of nucleosome positioning around the TSS. (E) A zoomed-in view of the eight clusters −250 to 250 bp around the TSS shows genes within each cluster have synchronized, periodic alternative positioning.
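The clustering in (D) can be approximated as follows. The legend states that R's `kmeans` was used; R defaults to the Hartigan-Wong algorithm with random starts, whereas this dependency-free sketch swaps in Lloyd's algorithm with a deterministic farthest-first initialization, so cluster assignments need not match R's output exactly. Each TSS profile is divided by its own mean before clustering, as described; the toy example uses k=2 instead of the eight clusters in the figure.

```python
def normalize_profile(profile):
    """Divide an NCP-score profile by its own mean over the window."""
    mean = sum(profile) / len(profile)
    return [x / mean for x in profile] if mean > 0 else list(profile)


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    centers = [list(rows[0])]
    while len(centers) < k:
        # Next center: the row farthest from its nearest existing center.
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centers[j])) for r in rows]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the previous center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

Usage: `rows = [normalize_profile(p) for p in tss_profiles]` followed by `labels, centers = kmeans(rows, 8)`, where `tss_profiles` (a hypothetical name) holds one NCP-score vector per unique TSS over the chosen window.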
(A) Comparison of center-weighted nucleosome occupancy defined by the chemical map and the MNase map around the TSS for S. cerevisiae and S. pombe, separated by FPKM gene expression quartiles. The RNA-seq data for S. cerevisiae were taken from Nagalakshmi et al. (2008). (B) Center-weighted nucleosome occupancy plots around the TSS by expression level in FPKM (top half versus bottom half) within each cluster from Figure S3D.
(A) Plot of DNase I hypersensitivity sites around the TSS by quartiles of gene expression. DNase I data were taken from the mouse ENCODE project (GSM1014154) (Vierstra et al., 2014). (B) Partial MNase digestion with 5 U/mL and 15 U/mL generates more enriched read coverage around the TSS than the complete MNase map. (C) Partial MNase digestion at 15 U/mL shows that more highly expressed genes have higher read coverage around the TSS. (D) Short reads from MNase footprinting data (Carone et al., 2014) show high A/T frequency when aligned at read ends, demonstrating the MNase digestion bias (i.e., its preference for cleaving at an AA/TT/AT/TA dinucleotide). (E) Nucleosome Positioning Index by MPE-ChIP H3 and MPE-ChIP H2B (Ishii et al., 2015) around the TSS, sorted by FPKM quartiles.
(A) Center-weighted nucleosome occupancy of genes aligned at the TTS (based on ~22,800 genes with a unique TTS) for the chemical and MNase maps. The chemical map shows a well-positioned nucleosome at the TTS, in contrast to the substantial depletion seen in the MNase map. High A/T frequency (green) might contribute to the dramatic depletion of nucleosomes in the MNase map. (B) As in (A), the chemical map shows phased nucleosome arrays around the TTS, where gene expression level is positively correlated with average nucleosome occupancy. (C) Normalized average NCP scores from the chemical map and read center scores from the MNase map in intergenic vs intragenic regions. Intragenic regions show increased occupancy over intergenic regions in both maps. (D) Nucleosome occupancy increases with gene expression (FPKM) in exons and introns in both the chemical and MNase maps. Total average NCP score or read center score is shown. (E) Linker length distribution in genic regions by gene expression quartiles (FPKM) shows that more highly expressed genes are enriched for shorter linkers.
(A) Center-weighted nucleosome occupancy defined by the chemical map and the predicted nucleosome map (NuPoP R package; Xi et al., 2010), centered on binding sites of Oct4, Sox2, Nanog, and Klf4 (Chen et al., 2008; Whyte et al., 2013). A/T frequency for the region is shown in yellow. (B) Cross-strand cross-correlation of chemical cleavages in the ±75 bp region around factor sites. (C) AA/TT/AT/TA dinucleotide frequency of unique nucleosomes defined in the ±75 bp region around factor sites. The A/T frequency for the region surrounding the peak site is shown. (D) Read coverage scores from 5 U/mL partial MNase maps plotted by quartiles of ChIP-seq signal for Oct4, Sox2, Nanog, and Klf4.
Highlights.
Chemical mapping determines nucleosome landscape in embryonic stem cells.
Chemically defined nucleosomes occupy NDRs at the TSS and TTS.
Nucleosomes position at DNA targets for CTCF and pluripotency factors.
Nucleosomes are preferentially localized to exon-intron junctions.
Acknowledgments
We thank K. Brogaard and R. Holmgren for advice. We acknowledge the University of Chicago Genomics Core for Illumina HiSeq sequencing. L.N.V. was supported in part by the CMBD training grant NIH T32 GM08061 and is a recipient of an NSF Graduate Research Fellowship. The work was supported by pilot projects under U54CA143869 and U54CA193419 and a grant from NIGMS R01GM107177 to J.W. and X.W.
SUPPLEMENTAL INFORMATION
Supplemental Information includes 7 supplemental figures.
AUTHOR CONTRIBUTIONS
X.W. and J.W. conceptually designed the study. L.N.V., A.C.S., and X.W. performed experiments; L.X. and B.X. performed computational analysis; L.N.V., L.X., J.W., and X.W. analyzed data and wrote the manuscript.
References
- Andersson R, Enroth S, Rada-Iglesias A, Wadelius C, Komorowski J. Nucleosomes are well positioned in exons and carry characteristic histone modifications. Genome Res. 2009;19:1732–1741. doi: 10.1101/gr.092353.109.
- Bentley DL. Coupling mRNA processing with transcription in time and space. Nat Rev Genet. 2014;15:163–175. doi: 10.1038/nrg3662.
- Bondarenko VA, Steele LM, Ujvari A, Gaykalova DA, Kulaeva OI, Polikanov YS, Luse DS, Studitsky VM. Nucleosomes can form a polar barrier to transcript elongation by RNA polymerase II. Mol Cell. 2006;24:469–479. doi: 10.1016/j.molcel.2006.09.009.
- Brogaard K, Xi L, Wang JP, Widom J. A map of nucleosome positions in yeast at base-pair resolution. Nature. 2012;486:496–501. doi: 10.1038/nature11142.
- Carone BR, Hung JH, Hainer SJ, Chou MT, Carone DM, Weng Z, Fazzio TG, Rando OJ. High-resolution mapping of chromatin packaging in mouse embryonic stem cells and sperm. Dev Cell. 2014;30:11–22. doi: 10.1016/j.devcel.2014.05.024.
- Chen X, Xu H, Yuan P, Fang F, Huss M, Vega VB, Wong E, Orlov YL, Zhang W, Jiang J, et al. Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell. 2008;133:1106–1117. doi: 10.1016/j.cell.2008.04.043.
- Chereji RV, Kan TW, Grudniewska MK, Romashchenko AV, Berezikov E, Zhimulev IF, Guryev V, Morozov AV, Moshkin YM. Genome-wide profiling of nucleosome sensitivity and chromatin accessibility in Drosophila melanogaster. Nucleic acids research. 2016;44:1036–1051. doi: 10.1093/nar/gkv978.
- Chung HR, Dunkel I, Heise F, Linke C, Krobitsch S, Ehrenhofer-Murray AE, Sperling SR, Vingron M. The effect of micrococcal nuclease digestion on nucleosome positioning data. PLoS One. 2010;5:e15754. doi: 10.1371/journal.pone.0015754.
- Deal RB, Henikoff JG, Henikoff S. Genome-wide kinetics of nucleosome turnover determined by metabolic labeling of histones. Science. 2010;328:1161–1164. doi: 10.1126/science.1186777.
- Flaus A, Luger K, Tan S, Richmond TJ. Mapping nucleosome position at single base-pair resolution by using site-directed hydroxyl radicals. Proc Natl Acad Sci U S A. 1996;93:1370–1375. doi: 10.1073/pnas.93.4.1370.
- Gaffney DJ, McVicker G, Pai AA, Fondufe-Mittendorf YN, Lewellen N, Michelini K, Widom J, Gilad Y, Pritchard JK. Controls of nucleosome positioning in the human genome. PLoS Genet. 2012;8:e1003036. doi: 10.1371/journal.pgen.1003036.
- Ganapathi M, Palumbo MJ, Ansari SA, He Q, Tsui K, Nislow C, Morse RH. Extensive role of the general regulatory factors, Abf1 and Rap1, in determining genome-wide chromatin structure in budding yeast. Nucleic acids research. 2011;39:2032–2044. doi: 10.1093/nar/gkq1161.
- Garcia P, Paulo E, Gao J, Wahls WP, Ayte J, Lowy E, Hidalgo E. Binding of the transcription factor Atf1 to promoters serves as a barrier to phase nucleosome arrays and avoid cryptic transcription. Nucleic acids research. 2014;42:10351–10359. doi: 10.1093/nar/gku704.
- Gilchrist DA, Dos Santos G, Fargo DC, Xie B, Gao Y, Li L, Adelman K. Pausing of RNA polymerase II disrupts DNA-specified nucleosome organization to enable precise gene regulation. Cell. 2010;143:540–551. doi: 10.1016/j.cell.2010.10.004.
- Henikoff JG, Belsky JA, Krassovsky K, MacAlpine DM, Henikoff S. Epigenome characterization at single base-pair resolution. Proc Natl Acad Sci U S A. 2011;108:18318–18323. doi: 10.1073/pnas.1110731108.
- Hughes AL, Jin Y, Rando OJ, Struhl K. A functional evolutionary approach to identify determinants of nucleosome positioning: a unifying model for establishing the genome-wide pattern. Mol Cell. 2012;48:5–15. doi: 10.1016/j.molcel.2012.07.003.
- Hughes AL, Rando OJ. Mechanisms underlying nucleosome positioning in vivo. Annu Rev Biophys. 2014;43:41–63. doi: 10.1146/annurev-biophys-051013-023114.
- Ishii H, Kadonaga JT, Ren B. MPE-seq, a new method for the genome-wide analysis of chromatin structure. Proc Natl Acad Sci U S A. 2015;112:E3457–3465. doi: 10.1073/pnas.1424804112.
- Iwafuchi-Doi M, Donahue G, Kakumanu A, Watts JA, Mahony S, Pugh BF, Lee D, Kaestner KH, Zaret KS. The Pioneer Transcription Factor FoxA Maintains an Accessible Nucleosome Configuration at Enhancers for Tissue-Specific Gene Activation. Mol Cell. 2016;62:79–91. doi: 10.1016/j.molcel.2016.03.001.
- Jin C, Zang C, Wei G, Cui K, Peng W, Zhao K, Felsenfeld G. H3.3/H2A.Z double variant-containing nucleosomes mark ‘nucleosome-free regions’ of active promoters and other regulatory regions. Nat Genet. 2009;41:941–945. doi: 10.1038/ng.409.
- Jonkers I, Kwak H, Lis JT. Genome-wide dynamics of Pol II elongation and its interplay with promoter proximal pausing, chromatin, and exons. Elife. 2014;3:e02407. doi: 10.7554/eLife.02407.
- Jonkers I, Lis JT. Getting up to speed with transcription elongation by RNA polymerase II. Nat Rev Mol Cell Biol. 2015;16:167–177. doi: 10.1038/nrm3953.
- Koerber RT, Rhee HS, Jiang C, Pugh BF. Interaction of transcriptional regulators with specific nucleosomes across the Saccharomyces genome. Mol Cell. 2009;35:889–902. doi: 10.1016/j.molcel.2009.09.011.
- Kogan S, Trifonov EN. Gene splice sites correlate with nucleosome positions. Gene. 2005;352:57–62. doi: 10.1016/j.gene.2005.03.004.
- Korber P, Barbaric S. The yeast PHO5 promoter: from single locus to systems biology of a paradigm for gene regulation through chromatin. Nucleic acids research. 2014;42:10888–10902. doi: 10.1093/nar/gku784.
- Kornberg RD, Lorch Y. Twenty-five years of the nucleosome, fundamental particle of the eukaryote chromosome. Cell. 1999;98:285–294. doi: 10.1016/s0092-8674(00)81958-3.
- Kubik S, Bruzzone MJ, Jacquet P, Falcone JL, Rougemont J, Shore D. Nucleosome Stability Distinguishes Two Different Promoter Types at All Protein-Coding Genes in Yeast. Mol Cell. 2015;60:422–434. doi: 10.1016/j.molcel.2015.10.002.
- Kulaeva OI, Hsieh FK, Chang HW, Luse DS, Studitsky VM. Mechanism of transcription through a nucleosome by RNA polymerase II. Biochim Biophys Acta. 2013;1829:76–83. doi: 10.1016/j.bbagrm.2012.08.015.
- Kwak H, Fuda NJ, Core LJ, Lis JT. Precise maps of RNA polymerase reveal how promoters direct initiation and pausing. Science. 2013;339:950–953. doi: 10.1126/science.1229386.
- Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome biology. 2009;10:R25. doi: 10.1186/gb-2009-10-3-r25.
- Mavrich TN, Jiang C, Ioshikhes IP, Li X, Venters BJ, Zanton SJ, Tomsho LP, Qi J, Glaser RL, Schuster SC, et al. Nucleosome organization in the Drosophila genome. Nature. 2008;453:358–362. doi: 10.1038/nature06929.
- Moyle-Heyrman G, Zaichuk T, Xi L, Zhang Q, Uhlenbeck OC, Holmgren R, Widom J, Wang JP. Chemical map of Schizosaccharomyces pombe reveals species-specific features in nucleosome positioning. Proc Natl Acad Sci U S A. 2013;110:20158–20163. doi: 10.1073/pnas.1315809110.
- Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science. 2008;320:1344–1349. doi: 10.1126/science.1158441.
- Ong CT, Corces VG. CTCF: an architectural protein bridging genome topology and function. Nat Rev Genet. 2014;15:234–246. doi: 10.1038/nrg3663.
- Polach KJ, Widom J. Mechanism of protein access to specific DNA sequences in chromatin: a dynamic equilibrium model for gene regulation. J Mol Biol. 1995;254:130–149. doi: 10.1006/jmbi.1995.0606.
- Raveh-Sadka T, Levo M, Shabi U, Shany B, Keren L, Lotan-Pompan M, Zeevi D, Sharon E, Weinberger A, Segal E. Manipulating nucleosome disfavoring sequences allows fine-tune regulation of gene expression in yeast. Nat Genet. 2012;44:743–750. doi: 10.1038/ng.2305.
- Saldi T, Cortazar MA, Sheridan RM, Bentley DL. Coupling of RNA Polymerase II Transcription Elongation with Pre-mRNA Splicing. J Mol Biol. 2016. doi: 10.1016/j.jmb.2016.04.017.
- Schwartz S, Meshorer E, Ast G. Chromatin organization marks exon-intron structure. Nat Struct Mol Biol. 2009;16:990–995. doi: 10.1038/nsmb.1659.
- Sebeson A, Xi L, Zhang Q, Sigmund A, Wang JP, Widom J, Wang X. Differential Nucleosome Occupancies across Oct4-Sox2 Binding Sites in Murine Embryonic Stem Cells. PLoS One. 2015;10:e0127214. doi: 10.1371/journal.pone.0127214.
- Segal E, Widom J. Poly(dA:dT) tracts: major determinants of nucleosome organization. Curr Opin Struct Biol. 2009;19:65–71. doi: 10.1016/j.sbi.2009.01.004.
- Soufi A, Donahue G, Zaret KS. Facilitators and impediments of the pluripotency reprogramming factors’ initial engagement with the genome. Cell. 2012;151:994–1004. doi: 10.1016/j.cell.2012.09.045.
- Soufi A, Garcia MF, Jaroszewicz A, Osman N, Pellegrini M, Zaret KS. Pioneer transcription factors target partial DNA motifs on nucleosomes to initiate reprogramming. Cell. 2015;161:555–568. doi: 10.1016/j.cell.2015.03.017.
- Struhl K, Segal E. Determinants of nucleosome positioning. Nat Struct Mol Biol. 2013;20:267–273. doi: 10.1038/nsmb.2506.
- Su H, Meng S, Lu Y, Trombly MI, Chen J, Lin C, Turk A, Wang X. Mammalian hyperplastic discs homolog EDD regulates miRNA-mediated gene silencing. Mol Cell. 2011;43:97–109. doi: 10.1016/j.molcel.2011.06.013.
- Teif VB, Vainshtein Y, Caudron-Herger M, Mallm JP, Marth C, Hofer T, Rippe K. Genome-wide nucleosome positioning during embryonic stem cell development. Nat Struct Mol Biol. 2012;19:1185–1192. doi: 10.1038/nsmb.2419.
- Teves SS, Weber CM, Henikoff S. Transcribing through the nucleosome. Trends Biochem Sci. 2014;39:577–586. doi: 10.1016/j.tibs.2014.10.004.
- Tilgner H, Nikolaou C, Althammer S, Sammeth M, Beato M, Valcarcel J, Guigo R. Nucleosome positioning as a determinant of exon recognition. Nat Struct Mol Biol. 2009;16:996–1001. doi: 10.1038/nsmb.1658.
- Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009;25:1105–1111. doi: 10.1093/bioinformatics/btp120.
- Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature biotechnology. 2010;28:511–515. doi: 10.1038/nbt.1621.
- Vera DL, Madzima TF, Labonne JD, Alam MP, Hoffman GG, Girimurugan SB, Zhang J, McGinnis KM, Dennis JH, Bass HW. Differential nuclease sensitivity profiling of chromatin reveals biochemical footprints coupled to gene expression and functional DNA elements in maize. Plant Cell. 2014;26:3883–3893. doi: 10.1105/tpc.114.130609.
- Vierstra J, Rynes E, Sandstrom R, Zhang M, Canfield T, Hansen RS, Stehling-Sun S, Sabo PJ, Byron R, Humbert R, et al. Mouse regulatory DNA landscapes reveal global principles of cis-regulatory evolution. Science. 2014;346:1007–1012. doi: 10.1126/science.1246426.
- Weber CM, Ramachandran S, Henikoff S. Nucleosomes are context-specific, H2A.Z-modulated barriers to RNA polymerase. Mol Cell. 2014;53:819–830. doi: 10.1016/j.molcel.2014.02.014.
- West JA, Cook A, Alver BH, Stadtfeld M, Deaton AM, Hochedlinger K, Park PJ, Tolstorukov MY, Kingston RE. Nucleosomal occupancy changes locally over key regulatory regions during cell differentiation and reprogramming. Nat Commun. 2014;5:4719. doi: 10.1038/ncomms5719.
- Whyte WA, Orlando DA, Hnisz D, Abraham BJ, Lin CY, Kagey MH, Rahl PB, Lee TI, Young RA. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell. 2013;153:307–319. doi: 10.1016/j.cell.2013.03.035.
- Williams LH, Fromm G, Gokey NG, Henriques T, Muse GW, Burkholder A, Fargo DC, Hu G, Adelman K. Pausing of RNA polymerase II regulates mammalian developmental potential through control of signaling networks. Mol Cell. 2015;58:311–322. doi: 10.1016/j.molcel.2015.02.003.
- Wobus AM, Guan K, Yang HT, Boheler KR. Embryonic stem cells as a model to study cardiac, skeletal muscle, and vascular smooth muscle cell differentiation. Methods Mol Biol. 2002;185:127–156. doi: 10.1385/1-59259-241-4:127.
- Workman JL, Kingston RE. Alteration of nucleosome structure as a mechanism of transcriptional regulation. Annual review of biochemistry. 1998;67:545–579. doi: 10.1146/annurev.biochem.67.1.545.
- Xi L, Brogaard K, Zhang Q, Lindsay B, Widom J, Wang JP. A locally convoluted cluster model for nucleosome positioning signals in chemical map. J Am Stat Assoc. 2014;109:48–62. doi: 10.1080/01621459.2013.862169.
- Xi L, Fondufe-Mittendorf Y, Xia L, Flatow J, Widom J, Wang JP. Predicting nucleosome positioning using a duration Hidden Markov Model. BMC bioinformatics. 2010;11:346. doi: 10.1186/1471-2105-11-346.
- Xi Y, Yao J, Chen R, Li W, He X. Nucleosome fragility reveals novel functional states of chromatin and poises genes for activation. Genome Res. 2011;21:718–724. doi: 10.1101/gr.117101.110.
- You JS, Kelly TK, De Carvalho DD, Taberlay PC, Liang G, Jones PA. OCT4 establishes and maintains nucleosome-depleted regions that provide additional layers of epigenetic regulation of its target genes. Proc Natl Acad Sci U S A. 2011;108:14497–14502. doi: 10.1073/pnas.1111309108.
- Zaret KS, Mango SE. Pioneer transcription factors, chromatin dynamics, and cell fate control. Curr Opin Genet Dev. 2016;37:76–81. doi: 10.1016/j.gde.2015.12.003.
- Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, et al. Model-based analysis of ChIP-Seq (MACS). Genome biology. 2008;9:R137. doi: 10.1186/gb-2008-9-9-r137.
- Zhang Y, Moqtaderi Z, Rattner BP, Euskirchen G, Snyder M, Kadonaga JT, Liu XS, Struhl K. Evidence against a genomic code for nucleosome positioning. Reply to “Nucleosome sequence preferences influence in vivo nucleosome organization”. Nat Struct Mol Biol. 2010;17:920–923. doi: 10.1038/nsmb0810-920.
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
(A) The mouse genome encodes 13 histone H4 genes, each of which has a variable DNA sequence but encodes an identical protein. Sequence alignment of 13 mouse histone H4 genes identifies two regions of common sequences (1 and 2 highlighted in red) for designing shRNA against all H4 genes. (B) Schematic of PiggyBac vectors for expressing shRNAs against H4 and a synthesized RNAi-resistant H4S47C cDNA. (C) Co-expression of H4-shRNA1 and H4-shRNA3 in HEK 293T cells efficiently down-regulated FLAG-tagged mouse H4 but not FLAG-tagged H4S47C, showing that shRNAs were specific for endogenous H4 and not for synthetic H4S47C. (D) RT-PCR analysis confirmed endogenous H4 was knocked down in H4S47C clones while expressing the H4S47C transgene. Representative clones 11, 7, 9, and 1 are shown. (E) WT and H4S47C ES cells showed equal rates of H4 protein synthesis. Both cells were pulse-labeled with azidohomoalanine. Newly synthesized histone proteins were biotinylated and detected by Streptavidin and Coomassie blue stain. Western blot with rabbit anti-H4 demonstrated equal H4 levels between both cell lines. (F) Western blots showing similar levels of Oct4, Nanog, and Ago2 between WT and H4S47C ES cells. (G) The growth curve demonstrates that H4S47C ES cells grow at the same rate as WT cells. Data represents the mean of three independent biological replicates (n=3) and error bars represent the standard error of the mean (SEM). (H) Volcano plots of p-value (−log10 scale) vs fold change (log2 scale) showing differential gene expression of ~21,800 genes between WT vs H4S47C ES cells in 2i (left) or Lif (right) culture conditions. Dotted horizontal line at y=1.3 represents the threshold of statistical significance (p ≤ 0.05). The vertical dotted lines at x=±1 mark the 2-fold change. Genes are identified as significant if p≤0.05 with a log2 fold change =2. The black arrows indicate the number of genes up- or down-regulated to statistical significance. 
In either 2i or Lif, only a small number of genes are significantly up- or down- regulated between WT and H4S47C ES cells. (I) Left, representative phase-contrast images show that WT and H4S47C ES cells exhibited the same cellular morphology and differentiation patterns. Day 0: Cells were cultured on gelatin-treated plates in the presence of serum and Lif. Day 5: Embryoid bodies (EBs) were generated under the hanging drop method for 2 days followed by suspension culture for 3 days in the absence of Lif. Day 10: Differentiation of EBs after transfer into adherent culture with retinoic acid (RA) for 5 days. Right, corresponding RT-PCR shows a loss of Oct4 by Day 5. Reactions without RT were used as a negative control. (J) Crick-Watson cleavage peak-peak distance plot showing 3 dominant distances equal to −12, −5 and +2 nucleotides, confirming the primary and secondary cleavage sites at −1 and +6, respectively. Local cleavage peaks were selected from Watson and Crick strands separately. A peak was selected if it represented a local maximum cleavage site (also >=10) within ±73 bp on one strand.
(A) Frequency of AA/TT/AT/TA dinucleotide motifs in aligned nucleosomes in intergenic vs intragenic regions. A unique nucleosome is designated intragenic if its center lies within a gene body; otherwise it is designated intergenic. The x-coordinate 0 corresponds to the dyad of unique nucleosomes defined by the chemical map. The periodicity of the dinucleotide motif signal is comparable between intergenic and intragenic regions, whereas the difference in average motif frequency is attributable to the different A/T composition of the two regions. (B) Same as (A), but restricted to unique nucleosomes in gene bodies, divided into quartiles by the FPKM expression values of their genes. As in (A), the periodicity is comparable across all groups, while the difference in average frequency reflects A/T composition. (C) Normalized nucleosome occupancy (log2 scale) over poly(dA-dT) and poly(dG-dC) tracts with 0–4 mismatches as a function of tract length, in mouse ES cells from the chemical and MNase maps and in S. cerevisiae from chemical mapping. In each plot, A0 (or G0) denotes a poly(dA-dT) (or poly(dG-dC)) tract with 0 mismatches, and so forth. (D) Distance from poly(dA-dT) or poly(dG-dC) tract centers to unique nucleosomes defined by the chemical maps in mouse ES cells and S. cerevisiae.
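The dyad-aligned dinucleotide profile in (A) and (B) can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes a single genomic sequence string and 0-based dyad coordinates, counts the dinucleotide starting at each offset from the dyad on one strand only, and uses a hypothetical function name. Because the motif set {AA, TT, AT, TA} is its own reverse complement, counting on one strand is a reasonable simplification.

```python
def dinucleotide_profile(seq, dyads, flank=73, motifs=("AA", "TT", "AT", "TA")):
    """For each offset o in [-flank, flank), the fraction of nucleosomes whose
    dinucleotide starting at dyad + o is one of `motifs` (one strand only, assumed)."""
    seq = seq.upper()
    motif_set = set(motifs)
    # keep only dyads whose full window lies inside the sequence
    usable = [d for d in dyads if d - flank >= 0 and d + flank < len(seq)]
    profile = {}
    for o in range(-flank, flank):
        hits = sum(1 for d in usable if seq[d + o:d + o + 2] in motif_set)
        profile[o] = hits / len(usable) if usable else 0.0
    return profile


print(dinucleotide_profile("GGAATTGG", [4], flank=2))
# {-2: 1.0, -1: 1.0, 0: 1.0, 1: 0.0}
```

Splitting the input dyads into intergenic vs intragenic sets (A), or by expression quartile (B), and calling the function once per set gives one curve per group.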
(A) Center-weighted nucleosome occupancy of mouse ES cell genes aligned at the TSS in the chemical map, compared with MNase maps for H4S47C and WT cells from this study and an MNase map from a previous study (Teif et al., 2012). Nucleosomes are numbered according to their position relative to the TSS (0 bp). The chemical map reveals that some genes have a nucleosome occupying the TSS. In this work, a “−1” nucleosome refers to a nucleosome positioned directly upstream of the TSS, and a “0” nucleosome refers to one that overlaps the TSS. (B) Cross-correlation of cleavages across the two strands around −1 nucleosomes (−150 bp to the TSS), showing the characteristic nucleosome-specific pattern with peaks at −12, −5, and +2. (C) AA/TT/AT/TA dinucleotide frequency of uniquely defined −1 nucleosomes. (D) Eight clusters from k-means clustering of the chemical NCP score pattern in the [−150, 250] bp region around the TSS. For each unique TSS (out of ~21,000), we normalized the NCP score in the [−150, 200] bp window by the average NCP score in this region. We used the “kmeans” function in R to perform k-means clustering of the normalized NCP scores in the [−150, 200] bp region and obtained eight clusters with distinct patterns of nucleosome positioning around the TSS. (E) A zoomed-in view of the eight clusters from −250 to 250 bp around the TSS shows that genes within each cluster have synchronized, periodic alternative nucleosome positions.
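The per-TSS normalization and clustering in (D) can be sketched in pure Python as follows. This is an illustration under assumptions, not a reproduction of the analysis: each TSS is assumed to be represented by a list of NCP scores over the window, and the clustering uses plain Lloyd's k-means with a seeded random initialization, which differs from R's `kmeans` default (Hartigan-Wong), so cluster assignments may not match exactly. All names are hypothetical.

```python
import random


def normalize_by_mean(profiles):
    """Divide each TSS-window NCP-score profile by its own mean score."""
    out = []
    for row in profiles:
        m = sum(row) / len(row)
        out.append([x / m for x in row] if m > 0 else [0.0] * len(row))
    return out


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, n_iter=100, seed=0):
    """Lloyd's k-means (not R's default Hartigan-Wong); returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(n_iter):
        # assignment step: nearest centroid by squared Euclidean distance
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centroids[j])) for r in rows]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # update step: each centroid becomes the mean of its member profiles
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy profiles: two flat-ish rows and two rows with signal concentrated at the start.
raw = [[1, 1, 1, 1], [1, 1, 1, 1.2], [4, 0, 0, 0], [3.6, 0.4, 0, 0]]
labels, centroids = kmeans(normalize_by_mean(raw), k=2)
print(labels)
```

With this toy input the first two rows fall into one cluster and the last two into the other; the numeric cluster IDs depend on the seed. The study used k = 8 on real profiles.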
(A) Comparison of center-weighted nucleosome occupancy defined by the chemical and MNase maps around the TSS for S. cerevisiae and S. pombe, separated by FPKM gene expression quartiles. The RNA-seq data for S. cerevisiae were taken from Nagalakshmi et al. (2008). (B) Center-weighted nucleosome occupancy around the TSS by expression level in FPKM (top half vs bottom half) within each cluster from Figure S3D.
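Many panels split genes by FPKM (quartiles, or top vs bottom half as in (B)) and average a per-gene profile within each group. A minimal sketch, assuming equal-count bins by expression rank with ties broken by input order; the legends do not state how bin boundaries were handled, and the function names are hypothetical.

```python
def split_by_expression(fpkm, n_groups=4):
    """Rank genes by FPKM (ascending) and cut them into n_groups equal-sized bins.
    n_groups=4 gives quartiles (bin 0 = lowest); n_groups=2 gives bottom/top halves."""
    order = sorted(fpkm, key=fpkm.get)  # stable sort: ties keep input order (assumption)
    n = len(order)
    groups = {g: [] for g in range(n_groups)}
    for rank, gene in enumerate(order):
        groups[rank * n_groups // n].append(gene)
    return groups


def mean_profile(profiles, genes):
    """Position-wise mean of per-gene occupancy profiles (equal-length lists)."""
    rows = [profiles[g] for g in genes]
    return [sum(col) / len(rows) for col in zip(*rows)]


fpkm = {"a": 0.1, "b": 5.0, "c": 2.0, "d": 10.0}
halves = split_by_expression(fpkm, n_groups=2)
print(halves)  # {0: ['a', 'c'], 1: ['b', 'd']}
```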
(A) Plot of DNase I hypersensitivity sites around the TSS by quartiles of gene expression. DNase I data were taken from the mouse ENCODE project (GSM1014154) (Vierstra et al., 2014). (B) Partial MNase digestion with 5 U/mL and 15 U/mL yields greater read coverage around the TSS than the complete MNase map. (C) Partial MNase digestion at 15 U/mL shows that more highly expressed genes have higher read coverage around the TSS. (D) Short reads from MNase footprinting data (Carone et al., 2014) show high A/T frequency when aligned at read ends, demonstrating the MNase digestion bias (i.e., a preference for cleaving within AA/TT/AT/TA dinucleotides). (E) Nucleosome Positioning Index from MPE-ChIP H3 and MPE-ChIP H2B (Ishii et al., 2015) around the TSS, sorted by FPKM quartiles.
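The read-end alignment in (D) can be sketched as below. This is a simplified illustration, not the analysis used for the figure: it assumes 0-based left-end coordinates of fragments on a single forward-strand sequence, so the MNase cut lies between offsets −1 and 0. Right ends of fragments would need reverse-complement handling, which is omitted, and the function name is hypothetical.

```python
def at_frequency_at_read_ends(genome, read_starts, flank=5):
    """Fraction of reads with A or T at each offset o in [-flank, flank) from the
    read's left end (offset 0 = first base of the read, -1 = base just before the cut).
    Assumes 0-based forward-strand coordinates."""
    genome = genome.upper()
    # keep only reads whose full window lies inside the sequence
    usable = [s for s in read_starts if s - flank >= 0 and s + flank <= len(genome)]
    profile = {}
    for o in range(-flank, flank):
        at = sum(1 for s in usable if genome[s + o] in "AT")
        profile[o] = at / len(usable) if usable else 0.0
    return profile


print(at_frequency_at_read_ends("GGGATCCC", [3], flank=2))
# {-2: 0.0, -1: 0.0, 0: 1.0, 1: 1.0}
```

An unbiased nuclease would give a flat profile near the genomic A/T fraction; a spike around offsets −1 and 0 is the signature of the cleavage preference described in (D).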
(A) Center-weighted nucleosome occupancy of genes aligned at the TTS (based on ~22,800 genes with a unique TTS) for the chemical and MNase maps. The chemical map shows a well-positioned nucleosome at the TTS, whereas the MNase map shows substantial depletion. High A/T frequency (green) may contribute to the dramatic nucleosome depletion in the MNase map. (B) As in (A), the chemical map shows phased nucleosome arrays around the TTS, and average nucleosome occupancy correlates positively with gene expression level. (C) Normalized average NCP scores and read center scores from the chemical and MNase maps, respectively, in intergenic vs intragenic regions. Intragenic regions show higher occupancy than intergenic regions in both maps. (D) Nucleosome occupancy increases with gene expression (FPKM) in both exons and introns, as shown by the chemical and MNase maps. The total average NCP score or read center score is shown. (E) Linker length distribution in genic regions by FPKM gene expression quartile, showing that more highly expressed genes are enriched for shorter linkers.
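Linker lengths as in (E) can be derived from adjacent dyad positions. A minimal sketch, assuming sorted 0-based dyads of unique nucleosomes on one chromosome, a 147 bp core particle, and a hypothetical 200 bp cap that discards gaps too long to be single linkers; the function names and binning are illustrative only.

```python
from collections import Counter

CORE_BP = 147  # DNA wrapped around one nucleosome core particle


def linker_lengths(dyads, max_linker=200):
    """Linker = distance between adjacent dyads minus the 147 bp core.
    Overlapping calls (negative linkers) and gaps above max_linker (assumed cap) are dropped."""
    d = sorted(dyads)
    out = []
    for left, right in zip(d, d[1:]):
        linker = (right - left) - CORE_BP
        if 0 <= linker <= max_linker:
            out.append(linker)
    return out


def linker_histogram(dyads, bin_size=10):
    """Counts of linker lengths grouped into bins of bin_size bp."""
    return Counter(length // bin_size * bin_size for length in linker_lengths(dyads))


dyads = [1000, 1187, 1400, 1500]
print(linker_lengths(dyads))  # [40, 66]
```

Running this per expression quartile (for example, with gene groups from an FPKM split) yields one distribution per quartile for comparison.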
(A) Center-weighted nucleosome occupancy defined by the chemical map and by a predicted nucleosome map (NuPoP R package; Xi et al., 2010), centered on binding sites of Oct4, Sox2, Nanog, and Klf4 (Chen et al., 2008; Whyte et al., 2013). A/T frequency for the region is shown in yellow. (B) Cross-strand cross-correlation of chemical cleavages in the ±75 bp region around factor sites. (C) AA/TT/AT/TA dinucleotide frequency of unique nucleosomes defined in the ±75 bp region around factor sites. The A/T frequency for the region surrounding the peak site is shown. (D) Read coverage scores from 5 U/mL partial MNase maps, plotted by quartile of ChIP-seq signal for Oct4, Sox2, Nanog, and Klf4.
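The cross-strand cross-correlation in (B) (and in Figure S3B) can be sketched as an unnormalized lagged product of the two strands' cleavage counts. This is an assumed formulation, not necessarily the authors' exact statistic: the lag is taken as Crick position minus Watson position, no mean subtraction or normalization is applied, and the function name is hypothetical. Summing the per-window results over all factor sites would give an aggregate curve, where the −12, −5, and +2 peaks would be expected for nucleosomal cleavage.

```python
def cross_strand_xcorr(watson, crick, max_lag=20):
    """Unnormalized cross-correlation sum_i W[i] * C[i + lag] for each lag in
    [-max_lag, max_lag]; W and C are per-base cleavage counts over one window.
    The lag sign convention (Crick minus Watson) is an assumption."""
    if len(watson) != len(crick):
        raise ValueError("strand profiles must have equal length")
    n = len(watson)
    return {
        lag: sum(watson[i] * crick[i + lag] for i in range(n) if 0 <= i + lag < n)
        for lag in range(-max_lag, max_lag + 1)
    }


watson = [0] * 150  # a +/-75 bp window around one factor site
crick = [0] * 150
watson[75], crick[63] = 2, 3
xc = cross_strand_xcorr(watson, crick)
print(max(xc, key=xc.get), xc[-12])  # -12 6
```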
