A simple method for generating high-resolution maps of genome-wide protein binding

Peter J Skene; Steven Henikoff

doi:10.7554/eLife.09225

eLife. 2015 Jun 16;4:e09225. doi: 10.7554/eLife.09225

Editor: Joaquín M Espinosa

PMCID: PMC4480131 PMID: 26079792

Abstract

Chromatin immunoprecipitation (ChIP) and its derivatives are the main techniques used to determine transcription factor binding sites. However, conventional ChIP with sequencing (ChIP-seq) has problems with poor resolution, and newer techniques require significant experimental alterations and complex bioinformatics. Previously, we have used a new crosslinking ChIP-seq protocol (X-ChIP-seq) to perform high-resolution mapping of RNA Polymerase II (Skene et al., 2014). Here, we build upon this work and compare X-ChIP-seq to existing methodologies. By using micrococcal nuclease, which has both endo- and exo-nuclease activity, to fragment the chromatin and thereby generate precise protein–DNA footprints, high-resolution X-ChIP-seq achieves single base-pair resolution of transcription factor binding. A significant advantage of this protocol is the minimal alteration to the conventional ChIP-seq workflow and simple bioinformatic processing.

DOI: http://dx.doi.org/10.7554/eLife.09225.001

Research organism: D. melanogaster, human

Main text

The diverse gene expression programs that allow for differentiation and response to environmental stimuli result from the regulated binding of transcription factors to DNA. The prevalent technique used in chromatin biology for mapping protein–DNA interactions is chromatin immunoprecipitation (ChIP), but little has changed since it was first described 27 years ago (Solomon et al., 1988). Despite recent advances in read-out technologies for ChIP, such as high-throughput sequencing (ChIP-seq), the basic chromatin preparation protocol remains the same and has a number of limitations. For example, sonication is typically used to fragment the chromatin. This, however, has been shown to be non-random, with heterochromatic regions showing increased resistance to fragmentation, leading to bias in the experiment (Teytelman et al., 2009). In addition, sonication typically produces chromatin fragments between 200 and 500 bp, whereas the footprint of a typical chromatin-associated protein is ∼10-fold smaller, indicating the limited resolution currently obtained by ChIP-seq. Even extensive sonication only yields fragments with an average length of 200 bp, suggesting that sonication is of limited use in generating high-resolution maps of genome-wide protein binding (Fan et al., 2008). In a previous study, we were interested in how RNA Polymerase II transcribes through nucleosomes at the promoter (Skene et al., 2014). Answering this question required the precise mapping of PolII with respect to the position of nucleosomes, but conventional ChIP-seq, which uses sonication to fragment the chromatin, yields fragments approximately twice the size of a nucleosome. Additionally, it has been shown that PolII can crosslink to nearby nucleosomes, and therefore mapping the immunoprecipitated DNA fragments from these composite PolII:nucleosome:DNA complexes fails to precisely map the position of PolII on the DNA (Koerber et al., 2009; Skene et al., 2014). 
By using micrococcal nuclease (MNase) to digest unprotected DNA, we were able to achieve high resolution in a ChIP experiment, mapping the precise location of PolII and chromatin remodelers on the DNA (Figure 1A; a detailed protocol is provided as Supplementary file 1) (Skene et al., 2014). Optimization of this simple protocol for the high-resolution mapping of protein–DNA interactions has the potential to revolutionize our understanding of genome-wide protein binding. High resolution is especially important at closely spaced transcription factor binding sites, such as locus control regions and super-enhancers, which have been shown to be vital to cell fate decisions and human diseases (Hnisz et al., 2013; Pott and Lieb, 2014).

Figure 1. — (A) Experimental workflow using MNase to fragment the chromatin. (B) Average PolII profile across TSS in Drosophila S2 cells as measured by conventional ChIP (Core et al., 2012), high-resolution X-ChIP-seq (fragment lengths 20–70 bp) and 3′NT that maps the position of the polymerase active site via the last ribonucleotide incorporated into the nascent chain (Weber et al., 2014). With 3′NT, the RNA has to be transcribed at least 25 nt in length to be mapped. (C) Length distribution of the mapped paired-end reads from a total PolII high-resolution X-ChIP-seq experiment (Skene et al., 2014).

**DOI:** http://dx.doi.org/10.7554/eLife.09225.002

Figure 1—figure supplement 1.

To evaluate high-resolution crosslinking ChIP-seq (X-ChIP-seq), we compared it to existing methodologies. Initially, we focused on mapping PolII at the transcriptional start site (TSS) in Drosophila S2 cells, where there are existing data sets at both low and high resolution. We performed high-resolution X-ChIP-seq using the same antibody against total PolII (Rpb3 subunit) as previously used in a conventional sonication ChIP experiment (Figure 1B) (Core et al., 2012). Using this cell line also allowed a comparison with the single base-pair resolution technique that maps the last ribonucleotide incorporated into the nascent RNA chain (3′NT), thereby mapping the exact position of the PolII active site (Weber et al., 2014). By using paired-end sequencing, we can selectively study specific lengths of immunoprecipitated fragments. Analyzing sequenced fragments with lengths 20–70 bp, which more closely represent the footprint of PolII (Samkurashvili and Luse, 1996), avoids the complication of mapping fragments consisting of PolII crosslinked to adjacent nucleosomes. Using this technique, we find that the maximal peak of PolII signal coincides with the position of the polymerase's active site at ∼+35 bp, as measured by 3′NT. This is consistent with evidence suggesting that the vast majority of Drosophila genes have a productively engaged PolII enzyme stalled just downstream of the promoter rather than PolII stably bound at the pre-initiation complex (Core et al., 2012). In contrast, the PolII distribution measured by conventional ChIP, with the chromatin fragmented by sonication, shows a distinct distribution at the promoter, with a broader peak centered at the TSS and maximal density at −5 bp. 
This discrepancy likely comes from biases in the probability of sonication breaking the DNA at the nucleosome-depleted region of the promoter, as accessible regions such as DNase I sites and promoters of active genes have been shown to be sonicated at higher probability than inactive genomic regions (Teytelman et al., 2009). Analysis of a published sonicated input chromatin sample indicates a strong sonication bias at the promoter region (Figure 1—figure supplement 1). In contrast, by predominantly fragmenting the chromatin with MNase, it is possible to generate footprints corresponding to nucleosomes and other DNA-bound factors (Henikoff et al., 2011; Skene et al., 2014). Overall, this shows that using a high-resolution ChIP technique to map the protected footprint of PolII achieves comparable resolution to the single base-pair resolution achieved by mapping the position of the active site of PolII via nascent chain mapping. In comparison to conventional ChIP-seq, high-resolution X-ChIP-seq achieves both higher resolution, as indicated by the width of the ChIP peak, and higher accuracy by avoiding sonication bias, as shown by high similarity to 3′NT. Moreover, the depth of sequencing indicates the cost-effectiveness of this high-resolution ChIP approach, with the 3′NT profile based on 150 million reads (Weber et al., 2014), whereas our method required only 7 million paired-end reads with a fragment length of 20–70 bp. For comparison, the PolII profile generated by conventional ChIP was based on 13 million mapped reads (Core et al., 2012).
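The length-based selection and TSS-centered averaging described above can be illustrated with a minimal Python sketch. It assumes that paired-end alignments have already been reduced to (chromosome, start, end) fragments in 0-based, half-open coordinates. The function names, the ±100 bp window and the data layout are hypothetical choices for illustration and are not taken from the published pipeline.

```python
from collections import defaultdict


def select_by_length(fragments, min_len=20, max_len=70):
    """Keep fragments whose length (end - start) lies in [min_len, max_len] bp."""
    return [f for f in fragments if min_len <= f[2] - f[1] <= max_len]


def average_profile(fragments, tss_sites, flank=100):
    """Average per-base fragment coverage from -flank to +flank around each TSS.

    fragments: list of (chrom, start, end), 0-based half-open (assumed layout).
    tss_sites: list of (chrom, position, strand), strand being '+' or '-'.
    Returns a list of length 2 * flank + 1; index `flank` is the TSS itself.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in fragments:
        by_chrom[chrom].append((start, end))

    profile = [0.0] * (2 * flank + 1)
    for chrom, pos, strand in tss_sites:
        # Naive scan over all fragments on the chromosome; fine for a toy input.
        for start, end in by_chrom[chrom]:
            first = max(start, pos - flank)
            last = min(end, pos + flank + 1)
            for base in range(first, last):
                # Flip coordinates for minus-strand genes so that positive
                # offsets always point downstream of the TSS.
                offset = base - pos if strand == "+" else pos - base
                profile[offset + flank] += 1

    n_sites = len(tss_sites)
    return [v / n_sites for v in profile] if n_sites else profile
```

Restricting the input to a size class before building the profile is the only step that differs from a conventional coverage plot, which is why paired-end sequencing (giving both fragment ends) is needed.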

A limitation of high-resolution X-ChIP-seq is that a minority of the immunoprecipitated fragments represent the footprint of PolII on DNA, likely as a consequence of formaldehyde readily forming protein–protein crosslinks generating complexes such as PolII crosslinked to nucleosomes (Koerber et al., 2009; Skene et al., 2014). In our previous study, mapping murine PolII, only 10% of the fragments were 20–70 bp in length and less than 3% were under 50 bp (Figure 1C) (Skene et al., 2014). Therefore, to improve the cost-effectiveness of this technique and make it more applicable to transcription factors, which typically have a <50-bp footprint, we have further optimized the method to enrich for short fragments prior to sequencing. Previously, Agencourt AMpure beads have been used to select for short fragments prior to linker ligation (Orsi et al., 2015). In agreement, initial attempts indicated that Agencourt AMpure beads could enrich for DNA fragments below 100 bp from a complex mixture, but were unable to selectively purify fragments of ∼50 bp. However, by adjusting the volumetric ratio of beads to DNA, we could reproducibly control the selection within the 100–200 bp range with a ratio of 1.1×, leaving fragments of ∼170 bp in the unbound fraction (Figure 2A). Given that the ligation of the Illumina adapters to the immunoprecipitated DNA adds ∼125 bp, by using this ratio of AMpure beads, we could selectively enrich for ligated products containing short inserts (Figure 2B). Using this approach on input DNA from a MNase ChIP experiment, where the vast majority of the DNA fragments are from mono-nucleosomes, we find a 25-fold enrichment of fragments below 50 bp (Figure 2C). Therefore, combining this modification to the existing library preparation protocol with the MNase X-ChIP approach yields cost-effective high-resolution data.
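The size arithmetic behind this selection can be made explicit with a short Python sketch. It takes the ∼125 bp adapter contribution from the text and the ∼200 bp retention limit from the Figure 2A legend, and treats that limit as a sharp cutoff, which is an idealization because bead-based selection is gradual. The 147 bp mono-nucleosome length and the function names are assumptions added for illustration.

```python
ADAPTER_BP = 125  # approximate length added by adapter ligation (from the text)
CUTOFF_BP = 200   # approximate upper limit of the unbound fraction (Figure 2A legend)


def ligated_length(insert_bp, adapter_bp=ADAPTER_BP):
    """Length of the ligated product for a given insert length."""
    return insert_bp + adapter_bp


def stays_unbound(insert_bp, cutoff_bp=CUTOFF_BP, adapter_bp=ADAPTER_BP):
    """True if the ligated product is short enough to remain in the unbound
    fraction, under the simplifying assumption of a sharp size cutoff."""
    return ligated_length(insert_bp, adapter_bp) < cutoff_bp


if __name__ == "__main__":
    # 50 bp: footprint-sized insert; 147 bp: assumed mono-nucleosome length.
    for insert in (50, 147):
        print(insert, ligated_length(insert), stays_unbound(insert))
```

Under these assumptions a 50 bp insert gives a 175 bp ligated product, which stays in the unbound fraction, whereas a mono-nucleosomal insert gives a product well above the limit and is depleted.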

Figure 2. — (A) Volumetric ratio of AMpure beads to DNA was optimized to selectively retain fragments below 200 bp in the unbound fraction using a 10-bp ladder as a test case. The cartoon indicates the size of ligated product containing a 50-bp insert. (B) Library preparation workflow to enrich for short insert sizes between the ligated linkers. (C) Fold enrichment of sequenced ChIP fragments less than 50 bp after the AMpure size selection as depicted in panel B.

**DOI:** http://dx.doi.org/10.7554/eLife.09225.004

This method of enriching for short fragments is specifically applicable to high-resolution X-ChIP-seq, where MNase has been used to generate minimally protected DNA footprints. In contrast, in conventional ChIP-seq where sonication is used, the enrichment of short size classes would be inappropriate as typically fragments are between 200 and 500 bp in length and even extensive sonication can only further fragment chromatin to an average size of 200 bp (Fan et al., 2008).

To assess the resolution of this method, we chose the well-characterized transcription factor CCCTC-binding factor (CTCF). We performed high-resolution X-ChIP-seq in K562 cells and analyzed 20–50 bp fragments, and compared this to conventional ChIP as used in the ENCODE project (Figure 3A). To avoid the complexities of peak-calling algorithms that might be biased for differing data types, we used an unbiased approach of centering the data on CTCF motifs that were found within DNase I sites and therefore most likely bound by CTCF. High-resolution X-ChIP-seq of CTCF yielded a more focused distribution of reads centered over the CTCF motif. To quantify this, we measured the width of the ChIP peak at its half-height for each individual CTCF site (Figure 3B). A conventional ChIP approach using sonication gave a half-height width of 200 bp. In contrast, analysis of 20–50 bp fragments from MNase ChIP gave much higher resolution, with a half-height width of only 50 bp, suggesting that genome-wide MNase is chewing back to a minimal footprint of CTCF bound to the DNA. By analyzing different ranges of fragment lengths, it was possible to see that shorter fragments gave the highest resolution and smallest range in peak widths (Figure 3B and Figure 3—figure supplement 1).
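The site selection used for centering (motifs lying within DNase I sites and at least 500 bp apart, as detailed in the Figure 3 legend) can be sketched in Python as follows. The interval layout and function names are hypothetical, and the legend does not say how closely spaced motifs were resolved. Keeping the first motif and skipping neighbours that start within 500 bp is an assumption made here for illustration.

```python
def motifs_in_sites(motifs, sites):
    """Keep motifs that lie entirely within at least one DNase I site.

    motifs, sites: lists of (chrom, start, end), 0-based half-open (assumed).
    """
    kept = []
    for chrom, m_start, m_end in motifs:
        if any(c == chrom and s <= m_start and m_end <= e for c, s, e in sites):
            kept.append((chrom, m_start, m_end))
    return kept


def enforce_spacing(motifs, min_gap=500):
    """Greedy filter: walk motifs in sorted order and skip any motif that
    starts within min_gap bp of the previously kept motif on the same
    chromosome. This tie-breaking rule is an assumption, not the authors'."""
    kept = []
    for motif in sorted(motifs):
        if kept and kept[-1][0] == motif[0] and motif[1] - kept[-1][1] < min_gap:
            continue
        kept.append(motif)
    return kept
```

Centering on motifs chosen this way avoids relying on a peak caller, so the same site list can be applied unchanged to each data type being compared.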

Figure 3. — (A) Average CTCF profile at DNase I sites that contain the motif in K562 cells, as measured by conventional ChIP in the ENCODE project and high-resolution X-ChIP-seq (20–50 bp fragments). Sites were determined by identifying the DNase I sites common to K562 and HeLa cells, as defined by the ENCODE project, that contained the 19 bp CTCF consensus binding motif (MA0139.1) by using FIMO analysis with a false discovery rate of 0.01 (Grant et al., 2011). This identified 9403 such 19 bp CTCF motifs within DNase I sites that were at least 500 bp apart. (B) Box plots indicating half-height widths of ChIP peaks at each individual CTCF motif for different size classes of immunoprecipitated fragments in high-resolution X-ChIP-seq and conventional ChIP. (C) ChIP profiles at a typical CTCF motif. For ChIP-exo, the 5′ ends of forward and reverse strands are plotted. (D) The upper graph displays the average profile mapping the position of both of the ends of paired-end reads for the 20–50 bp immunoprecipitated CTCF fragments in high-resolution X-ChIP-seq centered over the CTCF motif. For comparison, the ends for the forward and reverse strands are shown for ChIP-exo. The heatmaps below show the signal ±40 bp for each CTCF motif (defined as CTCF motifs with DNase I sites in both HeLa and K562 cells; n = 9403). The 19 bp between the identified peaks is highlighted and the 19 bp CTCF motif indicated.

**DOI:** http://dx.doi.org/10.7554/eLife.09225.005

Figure 3—figure supplement 1.

We also compared our CTCF high-resolution X-ChIP-seq results to CTCF profiles obtained using ChIP-exo, which is based upon the sonication of crosslinked chromatin, followed by exonuclease digestion of the immunoprecipitated complexes (Rhee and Pugh, 2011). In ChIP-exo, sequential ligation reactions allow the demarcation of 5′ and 3′ ends and bioinformatic analysis is used to identify ‘peak pairs’ that flank the transcription factor binding site. Figure 3C shows profiles based on ENCODE X-ChIP-seq data as processed by the ENCODE uniform processing pipelines and downloaded as ‘raw signal’, our high-resolution X-ChIP-seq stacked read data, and raw ChIP-exo data around a representative CTCF motif. Due to the low amounts of noise, high-resolution X-ChIP-seq is amenable to a very simple thresholding algorithm to identify peaks (Kasinathan et al., 2014), requiring only 13 million paired-end reads to obtain a crisp peak feature. This is in contrast to ChIP-exo, where more complex analysis is required, including dedicated software (Rhee and Pugh, 2011; Starick et al., 2015). Based on 82 million reads, the ChIP-exo raw data show significant signal at a distance from the CTCF motif. This might be a consequence of immunoprecipitating sonicated 200–300 bp chromatin fragments containing more than one protein, which would block the subsequent exonuclease cleavage. However, by using MNase to fragment the chromatin, which has both endo- and exo-nuclease activity, high-resolution X-ChIP-seq should be able to discriminate between nearby proteins. An additional limitation of ChIP-exo is that the input chromatin is not subjected to the same exonuclease treatment and therefore subsequent analyses cannot be normalized to input. With high-resolution X-ChIP-seq, however, all the steps in chromatin fragmentation are prepared prior to immunoprecipitation, thereby allowing input normalization. 
Moreover, high-resolution X-ChIP-seq requires only minimal alteration to the existing conventional ChIP workflow and library preparation, whereas other techniques require more extensive changes (Starick et al., 2015). To more directly compare to ChIP-exo, we plotted the end positions of each of our 20–50 bp paired-end reads (Figure 3D). We find two predominant sharp peaks on either side of the 19-bp CTCF motif that are separated by 19 bp, indicating that on average for each of our immunoprecipitated fragments, MNase has chewed back to one side of the minimal sequence motif. In contrast, the signal for ChIP-exo is relatively diffuse when centered around the CTCF motif, with an average distance of 52 bp between peak pairs for the peak-called sites (Rhee and Pugh, 2011). DNase I footprinting is often used to generate maps of global transcription factor binding at nucleotide resolution, with the drawback that the technique is not targeted to a specific transcription factor (Hesselberth et al., 2009; Neph et al., 2012). By comparing the ends of the DNA fragments released by DNase I footprinting and that of high-resolution X-ChIP-seq, we find that both techniques identify protected fragments with ends separated by 19 bp at the 19 bp consensus CTCF motifs (Figure 3—figure supplement 2). This therefore suggests that high-resolution X-ChIP-seq can achieve single nucleotide resolution, and by using immunoprecipitation, has the advantage that it can be used to interrogate individual transcription factors.
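The end-position analysis described above can be sketched in Python as follows. The sketch assumes fragments are given as (chromosome, start, end) in 0-based, half-open coordinates and that each motif is represented by the coordinate of its central base. These conventions and the function names are hypothetical and chosen only for illustration.

```python
def end_position_profile(fragments, motif_centers, flank=40):
    """Count left and right fragment ends at each offset from a motif center.

    fragments: list of (chrom, start, end), 0-based half-open (assumed layout).
    motif_centers: list of (chrom, center) giving the central base of each motif.
    Returns (left_counts, right_counts), each of length 2 * flank + 1, where
    index `flank` corresponds to the motif center.
    """
    left = [0] * (2 * flank + 1)
    right = [0] * (2 * flank + 1)
    for chrom, center in motif_centers:
        for f_chrom, start, end in fragments:
            if f_chrom != chrom:
                continue
            left_offset = start - center        # leftmost base of the fragment
            right_offset = (end - 1) - center   # rightmost base of the fragment
            if -flank <= left_offset <= flank:
                left[left_offset + flank] += 1
            if -flank <= right_offset <= flank:
                right[right_offset + flank] += 1
    return left, right


def peak_offset(counts, flank=40):
    """Offset (relative to the motif center) with the highest end count;
    ties resolve to the leftmost offset."""
    best = max(range(len(counts)), key=counts.__getitem__)
    return best - flank
```

The distance between `peak_offset` of the right-end and left-end profiles gives the separation between the two end peaks, which is the quantity compared with the motif length in the text.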

By harnessing the endo- and exo-nuclease activity of MNase to fragment chromatin, high-resolution X-ChIP-seq has key advantages over conventional ChIP-seq and ChIP-exo in terms of the resolution obtained (Figure 4). Overall, the combination of the improvements to enrich for short immunoprecipitated fragments and the unparalleled ChIP resolution for PolII and transcription factor binding indicate that high-resolution X-ChIP-seq is a cost-effective and simple approach that easily fits within existing ChIP-seq pipelines for determining precise genome-wide maps of protein–DNA binding.

Figure 4. — The fragmentation strategy is shown for (A) conventional ChIP-seq, (B) ChIP-exo and (C) high-resolution X-ChIP-seq. In high-resolution X-ChIP-seq, MNase generates minimally protected DNA fragments that are represented by the lengths of the extracted DNA fragments, which can be obtained by paired-end sequencing. By using an AMpure size selection, it is possible to enrich for these short fragments and increase the cost-effectiveness of the technique. In contrast, conventional ChIP and ChIP-exo are designed for single-end sequencing. Furthermore, the protocols used to generate sequencing libraries for conventional ChIP-seq and ChIP-exo select against fragments below 100 bp.

**DOI:** http://dx.doi.org/10.7554/eLife.09225.008

Materials and methods

Cell lines

Drosophila S2 cells and K562 cells were cultured under standard conditions.

ChIP

High-resolution X-ChIP-seq was performed as described in Supplementary file 1. Libraries were prepared from the isolated DNA and, following cluster generation, 25 rounds of paired-end sequencing were performed by the FHCRC Genomics Shared Resource on the Illumina HiSeq 2500 platform (Henikoff et al., 2011). Details of the library protocol have previously been published (Orsi et al., 2015). After processing and base-calling by Illumina software, paired-end sequencing data were aligned to the hg19 or dmel_r5_51 genome build using Bowtie or Novoalign, respectively. Counts per base pair were normalized as previously described, with the fraction of mapped reads spanning each base-pair position multiplied by the genome size (Kasinathan et al., 2014). To analyze reads by length, we divided paired-end fragments into distinct size classes, as indicated in the figure legends. V-plot construction has been previously described (Henikoff et al., 2011). Half-height width for each individual site was calculated as follows: the half height was calculated by dividing the maximum ChIP signal within ±1000 bp of each 19 bp CTCF motif by the background signal, which was defined as the median ChIP signal between −1000 to −900 and +900 to +1000 bp relative to the motif. The half-height width for each motif was calculated by counting the number of contiguous base pairs that had ChIP signal greater than or equal to the half-height.
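The normalization and half-height width calculations can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code. In particular, the half-height is taken here as the midpoint between the background and the maximum signal, which is one common convention and may differ from the exact formula used, and the width is measured as the contiguous run that contains the maximum. The function names and the input layout (a per-base signal list spanning −1000 to +1000 bp around the motif) are hypothetical.

```python
from statistics import median


def normalize_counts(counts, total_mapped, genome_size):
    """Scale per-base counts as (fraction of mapped reads) * genome size."""
    return [c / total_mapped * genome_size for c in counts]


def half_height_width(signal, bg_width=100):
    """Width (bp) of the contiguous run around the maximum that stays at or
    above the half-height.

    signal: per-base ChIP signal centred on the motif (e.g. 2001 values for
    -1000..+1000 bp). Background is the median of the outermost bg_width + 1
    positions on each side (-1000..-900 and +900..+1000 for a 2001 bp window).
    ASSUMPTION: half-height = background + (max - background) / 2.
    """
    background = median(signal[: bg_width + 1] + signal[-(bg_width + 1):])
    peak_idx = max(range(len(signal)), key=signal.__getitem__)
    half = background + (signal[peak_idx] - background) / 2

    # Extend outwards from the maximum while the signal stays >= half-height.
    left = peak_idx
    while left > 0 and signal[left - 1] >= half:
        left -= 1
    right = peak_idx
    while right < len(signal) - 1 and signal[right + 1] >= half:
        right += 1
    return right - left + 1
```

Computing the width per site, rather than on the averaged profile, is what allows the distribution of widths to be summarized as box plots for each fragment size class.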

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Funding Information

This paper was supported by the following grants:

Howard Hughes Medical Institute (HHMI) to Steven Henikoff.
Damon Runyon Cancer Research Foundation DRG-2110-12 to Peter J Skene.

Additional information

Competing interests

The authors declare that no competing interests exist.

Author contributions

PJS, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article.

SH, Conception and design, Drafting or revising the article.

Additional files

Supplementary file 1.

Protocol for high-resolution X-ChIP-seq. Detailed protocol for performing high-resolution X-ChIP-seq in cell lines.

DOI: http://dx.doi.org/10.7554/eLife.09225.009

Major datasets

The following datasets were generated:

Skene PJ, Henikoff S, 2015, A simple method for generating high-resolution maps of genome wide protein binding, http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE67454, Publicly available at the NCBI Gene Expression Omnibus (Accession no: GSE67454).

Skene PJ, Hernandez AE, Groudine M, Henikoff S, 2014, The nucleosomal barrier to promoter escape by RNA Polymerase II is overcome by the chromatin remodeler Chd1, http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE52349, Publicly available at the NCBI Gene Expression Omnibus (Accession no: GSE52349).

References

Core LJ, Waterfall JJ, Gilchrist DA, Fargo DC, Kwak H, Adelman K, Lis JT. Defining the status of RNA polymerase at promoters. Cell Reports. 2012;2:1025–1035. doi: 10.1016/j.celrep.2012.08.034.
Fan X, Lamarre-Vincent N, Wang Q, Struhl K. Extensive chromatin fragmentation improves enrichment of protein binding sites in chromatin immunoprecipitation experiments. Nucleic Acids Research. 2008;36:e125. doi: 10.1093/nar/gkn535.
Grant CE, Bailey TL, Noble WS. FIMO: Scanning for occurrences of a given motif. Bioinformatics. 2011;27:1017–1018. doi: 10.1093/bioinformatics/btr064.
Henikoff JG, Belsky JA, Krassovsky K, MacAlpine DM, Henikoff S. Epigenome characterization at single base-pair resolution. Proceedings of the National Academy of Sciences of USA. 2011;108:18318–18323. doi: 10.1073/pnas.1110731108.
Hesselberth JR, Chen X, Zhang Z, Sabo PJ, Sandstrom R, Reynolds AP, Thurman RE, Neph S, Kuehn MS, Noble WS, Fields S, Stamatoyannopoulos JA. Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nature Methods. 2009;6:283–289. doi: 10.1038/nmeth.1313.
Hnisz D, Abraham BJ, Lee TI, Lau A, Saint-André V, Sigova AA, Hoke HA, Young RA. Super-enhancers in the control of cell identity and disease. Cell. 2013;155:934–947. doi: 10.1016/j.cell.2013.09.053.
Kasinathan S, Orsi GA, Zentner GE, Ahmad K, Henikoff S. High-resolution mapping of transcription factor binding sites on native chromatin. Nature Methods. 2014;11:203–209. doi: 10.1038/nmeth.2766.
Koerber RT, Rhee HS, Jiang C, Pugh BF. Interaction of transcriptional regulators with specific nucleosomes across the Saccharomyces genome. Molecular Cell. 2009;35:889–902. doi: 10.1016/j.molcel.2009.09.011.
Neph S, Vierstra J, Stergachis AB, Reynolds AP, Haugen E, Vernot B, Thurman RE, John S, Sandstrom R, Johnson AK, Maurano MT, Humbert R, Rynes E, Wang H, Vong S, Lee K, Bates D, Diegel M, Roach V, Dunn D, Neri J, Schafer A, Hansen RS, Kutyavin T, Giste E, Weaver M, Canfield T, Sabo P, Zhang M, Balasundaram G, Byron R, MacCoss MJ, Akey JM, Bender MA, Groudine M, Kaul R, Stamatoyannopoulos JA. An expansive human regulatory lexicon encoded in transcription factor footprints. Nature. 2012;489:83–90. doi: 10.1038/nature11212.
Orsi GA, Kasinathan S, Zentner GE, Henikoff S, Ahmad K. Mapping regulatory factors by immunoprecipitation from native chromatin. Current Protocols in Molecular Biology. 2015;110:21.31.1–21.31.25. doi: 10.1002/0471142727.mb2131s110.
Pott S, Lieb JD. What are super-enhancers? Nature Genetics. 2014;47:8–12. doi: 10.1038/ng.3167.
Rhee HS, Pugh BF. Comprehensive genome-wide protein-DNA interactions detected at single-nucleotide resolution. Cell. 2011;147:1408–1419. doi: 10.1016/j.cell.2011.11.013.
Samkurashvili I, Luse DS. Translocation and transcriptional arrest during transcript elongation by RNA polymerase II. The Journal of Biological Chemistry. 1996;271:23495–23505. doi: 10.1074/jbc.271.38.23495.
Skene PJ, Hernandez AE, Groudine M, Henikoff S. The nucleosomal barrier to promoter escape by RNA polymerase II is overcome by the chromatin remodeler Chd1. eLife. 2014;3:e02042. doi: 10.7554/eLife.02042.
Solomon MJ, Larsen PL, Varshavsky A. Mapping protein-DNA interactions in vivo with formaldehyde: evidence that histone H4 is retained on a highly transcribed gene. Cell. 1988;53:937–947. doi: 10.1016/S0092-8674(88)90469-2.
Starick SR, Ibn-Salem J, Jurk M, Hernandez C, Love MI, Chung HR, Vingron M, Thomas-Chollier M, Meijsing SH. ChIP-exo signal associated with DNA-binding motifs provides insight into the genomic binding of the glucocorticoid receptor and cooperating transcription factors. Genome Research. 2015;25:825–835. doi: 10.1101/gr.185157.114.
Teytelman L, Ozaydin B, Zill O, Lefrançois P, Snyder M, Rine J, Eisen MB. Impact of chromatin structures on DNA processing for genomic analyses. PLOS ONE. 2009;4:e6700. doi: 10.1371/journal.pone.0006700.
Weber CM, Ramachandran S, Henikoff S. Nucleosomes are context-specific, H2A.Z-modulated barriers to RNA polymerase. Molecular Cell. 2014;53:819–830. doi: 10.1016/j.molcel.2014.02.014.

eLife. 2015 Jun 16;4:e09225. doi: 10.7554/eLife.09225.010

Decision letter

Editor: Joaquín M Espinosa

eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see review process). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.

Thank you for submitting your work entitled “A simple method for generating high‐resolution maps of genome wide protein binding” for peer review at eLife. Your submission has been reviewed by James Manley (Senior editor) and a Reviewing editor. We feel the work as it stands is not yet fully developed to a level that we could consider for publication as a Research Advance in eLife. However, if you are able to address these concerns with additional data and significant revisions of the text to include more detail, we would be prepared to consider a resubmission on this topic that would be evaluated by the same editors.

The authors present modifications to the widely used ChIP-seq technique as a new method entitled X-ChIP-seq. These modifications simply involve the addition of a micrococcal nuclease digestion step to reduce background DNA fragments and a size-selection step to enrich for short fragments prior to sequencing. It is claimed that these modifications both greatly reduce background and increase resolution, relative to conventional ChIP-seq.

The authors highlight the simplicity of their modifications and bioinformatic analysis relative to other recent modifications to ChIP-seq such as ChIP-exo and ChIP-nexus, making this of interest to the field. The authors aim to show the superiority of X-ChIP-seq through limited comparisons to other datasets (such as conventional ChIP, ChIP-exo, 3'NT, and DNaseI footprinting).

The major areas of concern are:

1) The claim that X-ChIP-seq provides “near base-pair resolution” is not convincingly supported and several of the comparisons with other datasets are incomplete (as detailed below), making it difficult to directly compare the X-ChIP-seq data with other techniques.

2) There is a severe lack of information provided on methods used for the processing and analysis of sequencing data. This makes it difficult to verify the validity of methods used and does not support the claim of “simple bioinformatic processing”.

Main text, first paragraph: Clarify the reasoning/evidence that this is “near base-pair resolution”. This is not clearly established in Skene et al. 2014 and is not readily apparent from Figure 1. Although the positioning in Figure 1B looks different from conventional ChIP-seq, the width (resolution) of the peak looks very similar. Are the authors arguing that it is higher resolution because it is a closer match to the 3'NT data, which is nucleotide resolution? (This should actually be called accuracy.)

Related to Figure 1: Would it not be better to demonstrate the resolution of this technique at a single locus (rather than averaging all TSS) and/or with a DNA-binding protein that has a more defined position on DNA? See Figure 3 (CTCF).

What do ChIP-exo and ChIP-nexus look like for PolII?

Figure 1–figure supplement 1: It would be very helpful to show end positions for 1st reads of X-ChIP input for comparison. This would allow for clear demonstration of any reduction in bias. Also, it is unclear if this analysis uses reads aligned to both strands; this could change interpretation of the lower half of the figure, since a left shift of + strand reads would correspond to a right shift of - strand reads. How would this affect the plots?

Main text, third paragraph: How would adding this step alter conventional ChIP-seq? That is, what fraction of reads from conventional ChIP-seq are within these size ranges, and what happens to resolution/accuracy when only these are analyzed? Again, there is insufficient comparison between the new and old techniques. This is important for demonstrating an improvement and/or less bias.

Main text, fourth paragraph: clarify “unbiased approach” (at least in Methods). How many sites were used? What was their average size, etc.? (Some of this information is in the Figure 3 legend.) Give the source for the DNase data.

Main text, fourth paragraph: clarify or show data to support the claim that “shorter fragments gave the highest resolution and smallest range in peak widths”. Again, this would be better supported with a comparison of both X-ChIP-seq and conventional ChIP-seq (at least by bioinformatically selecting shorter insert sizes).

Main text, fifth paragraph: Clarify “achieve single nucleotide resolution” and comment on offset between the two techniques.

Main text, last paragraph: It would be helpful to see more analysis of low background as in many cases (i.e. transcription-related proteins that do not directly bind DNA) this could be as important as the claimed increase in resolution.

eLife. 2015 Jun 16;4:e09225. doi: 10.7554/eLife.09225.011

Author response

We believe that our initial submission did not clearly state the limitations of ChIP-seq and how the bias and length of fragments obtained by sonication prevent detailed analyses to provide high-resolution data. We have edited the manuscript to make it clear how the approaches used here to generate near single base-pair resolution are not applicable to conventional ChIP-seq.

The major areas of concern are:

The most direct evidence for near base-pair resolution is that we identify DNA fragment ends separated by 19 bp that precisely flank the 19 bp CTCF motif (Figure 4D). We now provide further comparisons to other datasets as described below.

We thank the editors for pointing out these oversights. Further details of the bioinformatic processing, including calculation of half-height widths and generation of V-plots, are now included in the figure legends and the Materials and Methods section, as described below. In addition, we now refer to the detailed step-by-step library preparation protocol that we had previously published in Current Protocols in Molecular Biology (PMID: 25827087).

Main text, first paragraph: Clarify the reasoning/evidence that this is “near base-pair resolution”. This is not clearly established in Skene et al. 2014 and is not readily apparent from Figure 1. Although the positioning in Figure 1B looks different from conventional ChIP-seq, the width (resolution) of the peak looks very similar. Are the authors arguing that it is higher resolution because it is a closer match to the 3'NT data, which is nucleotide resolution? (This should actually be called accuracy.)

As pointed out above, the evidence for near base-pair resolution derives primarily from the fact that the 19 bp CTCF motif corresponded to fragment end maxima that were 19 bp apart, but also from the close correspondence of the PolII envelope obtained using our method to that obtained using 3’NT, which maps the single 3’ base in the active site of PolII. However, we have softened the statement by altering the text to:

“By using micrococcal nuclease (MNase) to digest unprotected DNA, we were able to achieve high resolution in a ChIP experiment, mapping the precise location of PolII and chromatin remodelers on the DNA (Figure 1A).”

In addition to the higher resolution, we also achieve higher accuracy, by avoiding sonication bias, as indicated by the close match to 3’NT. The text has been altered to include this discussion:

“In comparison to conventional ChIP-seq, using high-resolution X-ChIP-seq achieves both higher resolution, as indicated by the width of the ChIP peak and higher accuracy by avoiding sonication bias, as shown by high similarity to 3’NT.”

An intrinsic limitation of the 3’NT technique is that the RNA must be at least 25 nucleotides long to be mapped, and therefore 3’NT cannot determine the PolII distribution upstream of +25 bp relative to the TSS. The figure legend now points out this fact.

Related to Figure 1: Would it not be better to demonstrate the resolution of this technique at a single locus (rather than averaging all TSS) and/or with a DNA-binding protein that has a more defined position on DNA? See Figure 3 CTCF.

We agree. However, because PolII is a processive enzyme with a broad distribution along the length of genes, it is not ideal for this type of analysis. Rather, we analyzed CTCF at individual loci, because it binds to discrete sites.

What do ChIP-exo and ChIP-nexus look like for PolII?

There are no available ChIP-exo or ChIP-nexus datasets for PolII in Drosophila S2 cells. Despite ChIP-exo being first published in 2011, very few other labs have adopted the technique. Indeed the complexities of getting ChIP-exo to work seemed to be the motivation for developing a derivative of ChIP-exo: ChIP-nexus (“However, we found significant technical hurdles in applying ChIP-exo.” PMID: 25751057). This is why we used the example of CTCF to directly compare conventional ChIP-seq, high-resolution X-ChIP-seq and ChIP-exo.

Figure 1–figure supplement 1: It would be very helpful to show end positions for 1st reads of X-ChIP input for comparison. This would allow for clear demonstration of any reduction in bias. Also, it is unclear if this analysis uses reads aligned to both strands; this could change interpretation of the lower half of the figure, since a left shift of + strand reads would correspond to a right shift of - strand reads. How would this affect the plots?

Thank you for bringing up this point. We have now clarified Figure 1–figure supplement 1 to show the left ends (5’ position on forward strand reads) and right ends (3’ position on reverse strand reads) of fragments from single end sequencing data of sonicated input chromatin, which has been aligned to the transcriptional start site (reads corresponding to genes on the reverse strand were flipped).

A similar analysis of input chromatin from high-resolution X-ChIP-seq will not be comparable, as here the chromatin has been predominantly fragmented by MNase and therefore will provide footprints corresponding to nucleosomes and other DNA bound factors (PMID: 22025700). Indeed in a previous publication, we used mapping of the input chromatin to determine nucleosome positions (PMID: 24737864). The main text has been altered to include this:

“Analysis of a published sonicated input DNA sample indicates a strong sonication bias at the promoter region (Figure 1–figure supplement 1). In contrast, by predominantly fragmenting the chromatin with MNase it is possible to generate footprints corresponding to nucleosomes and other DNA-bound factors.”
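The strand-aware end mapping described for Figure 1–figure supplement 1 can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `end_offsets`, the tuple layout of the reads, and the 1-based inclusive coordinates are all assumptions made for the example. It shows why flipping genes on the reverse strand also swaps which fragment end counts as "left".

```python
from collections import Counter

def end_offsets(reads, tss, gene_strand):
    """Count fragment ends relative to a TSS, in gene orientation.

    reads: iterable of (aln_start, aln_end, strand) for single-end reads,
           in 1-based inclusive genomic coordinates (assumed layout).
    Returns (left_ends, right_ends) as Counters keyed by offset from the TSS.
    """
    left, right = Counter(), Counter()
    for aln_start, aln_end, strand in reads:
        if strand == "+":
            # A forward-strand read begins at the fragment's left end.
            pos, is_left = aln_start, True
        else:
            # A reverse-strand read begins at the fragment's right end,
            # which is the rightmost aligned genomic coordinate.
            pos, is_left = aln_end, False
        if gene_strand == "+":
            offset = pos - tss
        else:
            # Flip coordinates for reverse-strand genes; a genomic left end
            # becomes a right end in gene orientation, and vice versa.
            offset = tss - pos
            is_left = not is_left
        (left if is_left else right)[offset] += 1
    return left, right

# Toy usage with made-up reads around a TSS at position 100.
left, right = end_offsets([(90, 125, "+"), (80, 110, "-")], tss=100, gene_strand="+")
print(dict(left), dict(right))
```

Summing these counters over all genes would give an aggregate profile of left and right ends around the TSS.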

The application of this enrichment for short size classes to conventional ChIP-seq is not appropriate for two reasons: 1) The bias in sonication means that recovered DNA fragments in conventional ChIP-seq do not relate to the footprint of bound factors. Therefore, the length of the DNA fragments provides no information as to the location of the minimally protected footprint and as such, single end sequencing is appropriate for conventional ChIP-seq. 2) Typical sonication yields fragments between 200-500 bp, as indicated for the conventional ChIP-seq ENCODE CTCF dataset analyzed here (GSM749690) and even extensive sonication only further fragments chromatin down to an average of 200 bp (PMID: 18765474). Therefore, it is not possible to get high-resolution data from conventional ChIP-seq. The main text has been altered to include this discussion.

Main text, fourth paragraph: clarify “unbiased approach” (at least in Methods). How many sites were used? What was their average size, etc.? (Some of this information is in the Figure 3 legend.) Give the source for the DNase data.

The Figure 3 legend has been altered to include these details:

“Sites were determined by identifying the DNase1 sites common to K562 and HeLa cells, as defined by the ENCODE project, that contained the 19 bp CTCF consensus binding motif (MA0139.1) by using FIMO analysis with a false discovery rate of 0.01. This identified 9403 such 19 bp CTCF motifs within DNase1 sites that were at least 500 bp apart.”
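One way to picture the site selection in this legend is the sketch below. It is only an illustration under stated assumptions: the motif hits are taken to be already parsed into tuples (this is not FIMO's output format), "at least 500 bp apart" is read as spacing between motif start positions, and a hit closer than that to the previously kept hit is simply skipped. The source does not specify any of these details, and `select_sites` is a hypothetical name.

```python
def select_sites(motif_hits, dnase_a, dnase_b, max_q=0.01, min_gap=500):
    """Keep motif hits that pass a q-value cutoff, lie inside DNase sites
    from both cell types, and are at least min_gap bp from the last kept hit.

    motif_hits: list of (chrom, start, end, q_value)   (assumed layout)
    dnase_a, dnase_b: lists of (chrom, start, end) intervals
    """
    def inside(hit, sites):
        chrom, start, end, _ = hit
        return any(c == chrom and s <= start and end <= e for c, s, e in sites)

    kept = []
    for hit in sorted(motif_hits):  # sorts by chromosome, then start
        if hit[3] > max_q:
            continue  # fails the false discovery rate cutoff
        if not (inside(hit, dnase_a) and inside(hit, dnase_b)):
            continue  # not in a DNase site common to both cell types
        if kept and kept[-1][0] == hit[0] and hit[1] - kept[-1][1] < min_gap:
            continue  # too close to the previously kept motif (assumed rule)
        kept.append(hit)
    return kept

# Toy usage with made-up coordinates.
hits = [("chr1", 1000, 1019, 0.001), ("chr1", 1200, 1219, 0.001),
        ("chr1", 1600, 1619, 0.001), ("chr1", 5000, 5019, 0.5),
        ("chr2", 100, 119, 0.001)]
k562 = [("chr1", 900, 1700), ("chr1", 4900, 5100), ("chr2", 50, 200)]
hela = [("chr1", 950, 1650), ("chr1", 4900, 5100)]
print(select_sites(hits, k562, hela))
```

The linear scan over DNase intervals is fine for a toy example; a real analysis over thousands of sites would more likely use an interval-intersection tool.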

Figure 3B shows that smaller fragments provide the highest resolution, as measured by the half-height width of the peak at each CTCF site. Here we show a comparison of the half-height widths for high-resolution X-ChIP-seq and conventional ChIP-seq. As discussed above, it is not possible to further separate conventional ChIP-seq data by fragment size, because the ENCODE project used single-end sequencing. That choice is appropriate: sonication yields 200-500 bp fragments, and because of the bias introduced by sonication, the length of a fragment does not reflect the DNA footprint of the bound protein.
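As a rough illustration of a half-height width, the sketch below measures the width of a peak at half its maximum on a per-base signal profile. The exact procedure used for Figure 3B is not given in this response, so the details here (zero baseline, no interpolation, a contiguous walk outward from the summit, and the name `half_height_width`) are assumptions.

```python
def half_height_width(profile):
    """Width in bp of the peak at half its maximum height.

    profile: list of signal values, one per base pair, around a site.
    Assumes a zero baseline and a single dominant peak.
    """
    peak = max(range(len(profile)), key=profile.__getitem__)
    half = profile[peak] / 2
    # Walk outward from the summit while the signal stays at or above half.
    left = peak
    while left > 0 and profile[left - 1] >= half:
        left -= 1
    right = peak
    while right < len(profile) - 1 and profile[right + 1] >= half:
        right += 1
    return right - left + 1

# Toy profiles: a narrow peak and a broader one of the same height.
narrow = [0, 1, 2, 8, 10, 8, 2, 1, 0]
broad = [0, 3, 6, 8, 10, 8, 6, 3, 0]
print(half_height_width(narrow), half_height_width(broad))  # 3 5
```

A smaller half-height width corresponds to a tighter peak, which is the sense in which shorter fragments are said to give higher resolution.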

To further illustrate this point, we have now included a V-plot analysis (PMID: 22025700) as a supplement, indicating that the shorter DNA fragments are more tightly grouped and closely centered over the CTCF motif. We have also included a discussion as to why this V-plot analysis is not possible for conventional ChIP-seq in the figure legend.
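A V-plot places each paired-end fragment at a point given by its midpoint position relative to a feature center and its length. The sketch below tallies those points for a single site; it is a simplified illustration rather than the published implementation, and the function name, the 1-based inclusive coordinates, and the `max_offset` window are assumptions.

```python
from collections import Counter

def vplot_counts(fragments, motif_center, max_offset=100):
    """Tally fragments by (midpoint offset from motif center, fragment length).

    fragments: iterable of (start, end) in 1-based inclusive coordinates,
               as recovered from paired-end alignments (assumed layout).
    """
    counts = Counter()
    for start, end in fragments:
        length = end - start + 1
        midpoint = (start + end) // 2
        offset = midpoint - motif_center
        if abs(offset) <= max_offset:  # ignore fragments far from the site
            counts[(offset, length)] += 1
    return counts

# Toy usage: two short fragments centered on the motif, one long offset
# fragment, and one fragment outside the window.
frags = [(991, 1009), (991, 1009), (950, 1100), (2000, 2050)]
print(vplot_counts(frags, motif_center=1000))
```

Plotting offset on the x-axis against length on the y-axis, with counts as intensity, gives the V-plot. The fragment length needed for the y-axis comes from paired-end data, which is consistent with the point that this analysis is not possible for single-end conventional ChIP-seq.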

Main text, fifth paragraph: Clarify “achieve single nucleotide resolution” and comment on offset between the two techniques.

This statement was based upon the fact that by mapping the ends of the fragments captured by high-resolution X-ChIP-seq, we find that MNase chews back to give ends separated by 19 bp either side of the 19 bp CTCF consensus motif (Figure 3D). The text has been altered to make this clearer. In addition, the x-axes of Figure 3D and Figure 3–figure supplement 2 have been altered to more clearly indicate that we are showing ChIP signal plotted at single nucleotide resolution.

The offset observed for DNase I likely reflects intrinsic differences in the steric hindrance that CTCF poses to DNase I versus MNase during chromatin digestion. This is commented upon in the figure legend.

Here, we used the term background to refer to the broad, sprawling ChIP signal that, in conventional ChIP-seq, flanks the minimal DNA-binding motif. This statement has been amended.
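The fragment-end argument can be made concrete with a small sketch: count where fragment left and right ends fall relative to the motif and report the distance between the two most frequent positions. This is a toy illustration with made-up fragments, not the analysis behind Figure 3D. It assumes half-open coordinates (start inclusive, end exclusive), so that end minus start equals the protected length, and `end_maxima` is a hypothetical name.

```python
from collections import Counter

def end_maxima(fragments, motif_start):
    """Return (modal left-end offset, modal right-end offset, separation).

    fragments: iterable of (start, end), half-open genomic coordinates.
    Offsets are measured from motif_start.
    """
    left = Counter(start - motif_start for start, _ in fragments)
    right = Counter(end - motif_start for _, end in fragments)
    left_mode = max(left, key=left.get)
    right_mode = max(right, key=right.get)
    return left_mode, right_mode, right_mode - left_mode

# Toy fragments: most are trimmed to a 19 bp motif starting at 500,
# a few extend further on one or both sides.
frags = [(500, 519), (500, 519), (500, 530), (490, 519), (480, 540)]
print(end_maxima(frags, motif_start=500))  # (0, 19, 19)
```

With real data, the same tally plotted per base pair gives left- and right-end profiles whose maxima can be compared against the motif boundaries.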

Supplementary Materials

Supplementary file 1.

Protocol for high-resolution X-ChIP-seq. Detailed protocol for performing high-resolution X-ChIP-seq in cell lines.

DOI: http://dx.doi.org/10.7554/eLife.09225.009

[bib1] Core LJ, Waterfall JJ, Gilchrist DA, Fargo DC, Kwak H, Adelman K, Lis JT. Defining the status of RNA polymerase at promoters. Cell Reports. 2012;2:1025–1035. doi: 10.1016/j.celrep.2012.08.034.

[bib2] Fan X, Lamarre-Vincent N, Wang Q, Struhl K. Extensive chromatin fragmentation improves enrichment of protein binding sites in chromatin immunoprecipitation experiments. Nucleic Acids Research. 2008;36:e125. doi: 10.1093/nar/gkn535.

[bib19] Grant CE, Bailey TL, Noble WS. FIMO: Scanning for occurrences of a given motif. Bioinformatics. 2011;27:1017–1018. doi: 10.1093/bioinformatics/btr064.

[bib3] Henikoff JG, Belsky JA, Krassovsky K, MacAlpine DM, Henikoff S. Epigenome characterization at single base-pair resolution. Proceedings of the National Academy of Sciences of USA. 2011;108:18318–18323. doi: 10.1073/pnas.1110731108.

[bib4] Hesselberth JR, Chen X, Zhang Z, Sabo PJ, Sandstrom R, Reynolds AP, Thurman RE, Neph S, Kuehn MS, Noble WS, Fields S, Stamatoyannopoulos JA. Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nature Methods. 2009;6:283–289. doi: 10.1038/nmeth.1313.

[bib5] Hnisz D, Abraham BJ, Lee TI, Lau A, Saint-André V, Sigova AA, Hoke HA, Young RA. Super-enhancers in the control of cell identity and disease. Cell. 2013;155:934–947. doi: 10.1016/j.cell.2013.09.053.

[bib6] Kasinathan S, Orsi GA, Zentner GE, Ahmad K, Henikoff S. High-resolution mapping of transcription factor binding sites on native chromatin. Nature Methods. 2014;11:203–209. doi: 10.1038/nmeth.2766.

[bib7] Koerber RT, Rhee HS, Jiang C, Pugh BF. Interaction of transcriptional regulators with specific nucleosomes across the Saccharomyces genome. Molecular Cell. 2009;35:889–902. doi: 10.1016/j.molcel.2009.09.011.

[bib8] Neph S, Vierstra J, Stergachis AB, Reynolds AP, Haugen E, Vernot B, Thurman RE, John S, Sandstrom R, Johnson AK, Maurano MT, Humbert R, Rynes E, Wang H, Vong S, Lee K, Bates D, Diegel M, Roach V, Dunn D, Neri J, Schafer A, Hansen RS, Kutyavin T, Giste E, Weaver M, Canfield T, Sabo P, Zhang M, Balasundaram G, Byron R, MacCoss MJ, Akey JM, Bender MA, Groudine M, Kaul R, Stamatoyannopoulos JA. An expansive human regulatory lexicon encoded in transcription factor footprints. Nature. 2012;489:83–90. doi: 10.1038/nature11212.

[bib9] Orsi GA, Kasinathan S, Zentner GE, Henikoff S, Ahmad K. Mapping regulatory factors by immunoprecipitation from native chromatin. Current Protocols in Molecular Biology. 2015;110:21.31.1–21.31.25. doi: 10.1002/0471142727.mb2131s110.

[bib11] Pott S, Lieb JD. What are super-enhancers? Nature Genetics. 2014;47:8–12. doi: 10.1038/ng.3167.

[bib12] Rhee HS, Pugh BF. Comprehensive genome-wide protein-DNA interactions detected at single-nucleotide resolution. Cell. 2011;147:1408–1419. doi: 10.1016/j.cell.2011.11.013.

[bib13] Samkurashvili I, Luse DS. Translocation and transcriptional arrest during transcript elongation by RNA polymerase II. The Journal of Biological Chemistry. 1996;271:23495–23505. doi: 10.1074/jbc.271.38.23495.

[bib14] Skene PJ, Hernandez AE, Groudine M, Henikoff S. The nucleosomal barrier to promoter escape by RNA polymerase II is overcome by the chromatin remodeler Chd1. eLife. 2014;3:e02042. doi: 10.7554/eLife.02042.

[bib15] Solomon MJ, Larsen PL, Varshavsky A. Mapping protein-DNA interactions in vivo with formaldehyde: evidence that histone H4 is retained on a highly transcribed gene. Cell. 1988;53:937–947. doi: 10.1016/S0092-8674(88)90469-2.

[bib16] Starick SR, Ibn-Salem J, Jurk M, Hernandez C, Love MI, Chung HR, Vingron M, Thomas-Chollier M, Meijsing SH. ChIP-exo signal associated with DNA-binding motifs provides insight into the genomic binding of the glucocorticoid receptor and cooperating transcription factors. Genome Research. 2015;25:825–835. doi: 10.1101/gr.185157.114.

[bib17] Teytelman L, Ozaydin B, Zill O, Lefrançois P, Snyder M, Rine J, Eisen MB. Impact of chromatin structures on DNA processing for genomic analyses. PLOS ONE. 2009;4:e6700. doi: 10.1371/journal.pone.0006700.

[bib18] Weber CM, Ramachandran S, Henikoff S. Nucleosomes are context-specific, H2A.Z-modulated barriers to RNA polymerase. Molecular Cell. 2014;53:819–830. doi: 10.1016/j.molcel.2014.02.014.
