Bioinformatics. 2019 Sep 30;36(4):1270–1272. doi: 10.1093/bioinformatics/btz720

ChIPseqSpikeInFree: a ChIP-seq normalization approach to reveal global changes in histone modifications without spike-in

Hongjian Jin, Lawryn H Kasper, Jon D Larson, Gang Wu, Suzanne J Baker, Jinghui Zhang, Yiping Fan
Editor: Bonnie Berger
PMCID: PMC7523640  PMID: 31566663

Abstract

Motivation

The traditional reads per million normalization method is inappropriate for the evaluation of ChIP-seq data when treatments or mutations have global effects. Changes in global levels of histone modifications can be detected with exogenous reference spike-in controls. However, most ChIP-seq studies overlook the normalization bias that spike-in is needed to correct. A method that retrospectively renormalizes datasets generated without spike-in is lacking.

Results

ChIPseqSpikeInFree is a novel ChIP-seq normalization method to effectively determine scaling factors for samples across various conditions and treatments, which does not rely on exogenous spike-in chromatin or peak detection to reveal global changes in histone modification occupancy. Application of ChIPseqSpikeInFree on five datasets demonstrates that this in silico approach reveals a similar magnitude of global changes as the spike-in method does.

Availability and implementation

St. Jude Cloud (https://pecan.stjude.cloud/permalink/spikefree) and St. Jude GitHub (https://github.com/stjude/ChIPseqSpikeInFree).

Supplementary information

Supplementary data are available at Bioinformatics online.

1 Introduction

Chromatin immunoprecipitation (ChIP) combined with next-generation sequencing (ChIP-seq) identifies genome-wide binding sites of DNA-associated proteins, including transcription factors, histones, or their cofactors. ChIP-seq is also useful for comparing occupancy changes at a set of loci across samples obtained from different biological and chemical perturbations. Global changes in histone modifications were reported in several pediatric brain tumors, such as globally reduced trimethylation of H3K27 (H3K27me3) in diffuse intrinsic pontine gliomas (DIPGs) (Bender et al., 2013; Chan et al., 2013; Harutyunyan et al., 2019; Larson et al., 2019; Lewis et al., 2013; Venneti et al., 2013), medulloblastomas (Dubuc et al., 2013), and ependymomas (Bayliss et al., 2016; Pajtler et al., 2018), as well as gain of H3K36me2 (Stafford et al., 2018) and H3K27ac (Harutyunyan et al., 2019; Larson et al., 2019; Lewis et al., 2013; Stafford et al., 2018) in H3K27M-expressing DIPGs. Globally reduced H3K36 methylation also occurs in chondrocytes and chondroblastoma cells with H3.3K36M mutations (Fang et al., 2016; Lu et al., 2016), and H3K79me2 is aberrantly elevated in MLL-rearranged leukemia (Bu et al., 2018; Daigle et al., 2011; Wong et al., 2015). Although differences in histone modification levels can be measured at individual loci in ChIP samples normalized to input chromatin, library preparation methods for ChIP-seq uncouple enrichment from the amount of starting chromatin.

Reliable cross-sample normalization was very challenging until the advent of the spike-in method (Bonhoure et al., 2014; Egan et al., 2016; Orlando et al., 2014). By adding a constant, low amount of reference exogenous chromatin as an internal control to each sample before immunoprecipitation, the spike-in method permits adjustment of signals in each sample to those of an internal control, thereby permitting direct comparison between ChIP-seq samples. Because the spike-in method requires a prerequisite experimental procedure, it also introduces several pitfalls. The proportion of spiked-in chromatin to chromatin of interest must be empirically optimized for different histone marks or transcription factors. Experimental variation and species cross-reactivity of antibodies may also be introduced, affecting subsequent data interpretation.

Therefore, we developed ChIPseqSpikeInFree, a novel ChIP-seq normalization method to effectively determine the scaling factors for samples obtained under various conditions or treatments, which does not rely on exogenous spike-in or peak detection to reveal global changes in histone modification occupancy.

2 Materials and methods

ChIP-seq datasets were collected from public sources and met the following criteria: (i) global changes in histone marks were confirmed by immunoblot, (ii) ChIP was performed with reference exogenous genomes (ChIP-Rx) (Orlando et al., 2014) and (iii) SPP-called Qtags were ≥ 1 (Marinov et al., 2014). Spike-in ChIP-seq samples were first aligned to a combined reference genome (hg19/sacCer3 for yeast chromatin spike-in, hg19/dm3 or mm9/dm3 for Drosophila chromatin spike-in), and reads were separated by organism after alignment. Duplicate reads were then marked with Picard (version 1.65). Only uniquely mapped, non-duplicate reads (Samtools version 1.3.1, parameters ‘-q 1 -F 1024’) were used for analysis. We selected five high-quality datasets, including 67 samples of six histone marks (H3K27me3, H3K27ac, H3K4me3, H3K36me2, H3K36me3 and H3K79me2) relevant to various cancer models (Supplementary Material).
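As an illustration of the read filter described above, the following minimal Python sketch reproduces the logic of the Samtools options `-q 1 -F 1024` on individual alignment records: `-q` drops alignments with mapping quality below the threshold, and `-F` drops alignments carrying any of the listed flag bits (1024, or 0x400, is the duplicate flag). The function name `passes_filter` and the example records are hypothetical and are not part of the published pipeline; whether a mapping quality of at least 1 corresponds to unique mapping depends on the aligner.

```python
# SAM flag bit set by duplicate-marking tools such as Picard (0x400 == 1024).
DUPLICATE_FLAG = 0x400


def passes_filter(flag: int, mapq: int, min_mapq: int = 1,
                  exclude_flags: int = DUPLICATE_FLAG) -> bool:
    """Mimic `samtools view -q 1 -F 1024` for one alignment record.

    -q keeps records with MAPQ >= min_mapq; -F drops records that have
    any of the bits in exclude_flags set.
    """
    return mapq >= min_mapq and (flag & exclude_flags) == 0


# Hypothetical (flag, MAPQ) pairs, for illustration only.
records = [(0, 42), (16, 0), (1024, 60), (1040, 30), (16, 1)]
kept = [record for record in records if passes_filter(*record)]
print(kept)
```

In this toy input, the record with MAPQ 0 and the two records with the duplicate bit set are discarded.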

To develop a ChIP-seq normalization method without an exogenous spike-in reference, we first explored global reduction of H3K27me3 by a histone H3.3 K27M mutation occurring at high frequency in DIPGs (Bender et al., 2013; Larson et al., 2019; Pathania et al., 2017; Venneti et al., 2013; Wu et al., 2012). H3.3K27M globally reduces H3K27 methylation by inhibiting polycomb repressive complex 2 (PRC2) activity (Lewis et al., 2013), leading to failure to spread H3K27me from PRC2 recruitment sites and abrogating PRC2-established H3K27me2-3–repressive chromatin domains (Harutyunyan et al., 2019; Justin et al., 2016; Stafford et al., 2018).

To comprehensively survey genome-wide coverage of H3K27me3, we used a sliding window to scan the genome and counted the reads falling into each window. The window size was 1 kb and the step size was 1 kb. We defined enrichment as count per million for each 1-kb window (CPMW). Then, we calculated the proportion of reads below a given CPMW cutoff and visualized the results by density plots and cumulative distribution (Fig. 1) by using one pair of H3.3WT and H3.3K27M ChIP-seq samples derived from E15.5 hindbrain neural stem cells (H-NSCs). We further defined two points for each sample: the turning point where the enrichment signal starts and the last summit (Fig. 1). Cumulative curves revealed distinct distributions of H3.3K27M and H3.3WT (Fig. 1A; Supplementary Fig. S1). We classified the windows into highly enriched regions in H3.3WT (CPMW > Xb) and low signal intensity regions (CPMW ≤ Xb) (Fig. 1B). The majority of ChIP-seq input samples (> 95%) exhibited uniformly low coverage across the genome, and > 98% of reads fell within low signal intensity regions (Supplementary Material).
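The window counting and cumulative proportion described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the ChIPseqSpikeInFree implementation: reads are assigned to a window by a single coordinate, windows with CPMW equal to the cutoff are counted as "below", and the function names (`window_counts`, `cpmw`, `proportion_below`) and toy reads are hypothetical.

```python
from collections import Counter

WINDOW = 1000  # 1-kb window; step equals window size, so windows do not overlap


def window_counts(read_positions, window=WINDOW):
    """Count reads per non-overlapping window, keyed by (chrom, window index)."""
    return Counter((chrom, pos // window) for chrom, pos in read_positions)


def cpmw(counts, n_windows):
    """Convert raw window counts to counts per million (CPMW).

    n_windows is the total number of windows scanned, so that windows
    without any reads are represented as zeros.
    """
    total = sum(counts.values())
    values = [c * 1e6 / total for c in counts.values()]
    return values + [0.0] * (n_windows - len(values))


def proportion_below(cpmw_values, cutoff):
    """Fraction of all reads that fall in windows with CPMW <= cutoff."""
    return sum(v for v in cpmw_values if v <= cutoff) / sum(cpmw_values)


# Hypothetical reads (chromosome, position): 1, 1, 2 and 6 reads in four windows.
reads = ([("chr1", 100), ("chr1", 1500), ("chr1", 2100), ("chr1", 2200)]
         + [("chr1", 3000 + i) for i in range(6)])
values = cpmw(window_counts(reads), n_windows=5)

# One point of the cumulative curve per CPMW cutoff.
for cutoff in sorted(set(values)):
    print(cutoff, proportion_below(values, cutoff))
```

Evaluating `proportion_below` over an increasing series of cutoffs yields the cumulative curve from which the two points per sample are taken.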

Fig. 1.


ChIPseqSpikeInFree normalization faithfully detects global change of histone modifications. (A) Density plot of cumulative proportion of reads below a specific CPMW cutoff in one pair of H3.3WT and H3.3K27M ChIP-seq samples derived from H-NSCs (dataset 1). Two turning points were determined based on density distribution. (B) Proportion of reads as a function of CPMW. A slope calculation is illustrated for H3.3WT samples with the two points a and b plotted. (C) Proportion of reads in two categories defined by the CPMW cutoff in H3.3WT cells (Xb in B). The proportions of reads within two different CPMW ranges were calculated and plotted as bar graphs. Average fold change (FC) represents the ratio of H3.3K27M to H3.3WT in highly enriched regions. (D) Comparison of scaling factors derived from ChIPseqSpikeInFree and spike-in normalization. Scaling factors were grouped by histone H3 genotypes, ChIP antibodies and datasets. CB represents the T/C28a2 chondroblastoma cells (dataset 4). For spike-in normalization, the ChIP-Rx ratio was calculated for each sample and then transformed to a scaling factor according to Pathania et al. (2017). (E) The change was calculated as the log2 ratio of the average scaling factor of H3.3 K27M versus H3.3 WT (or K27M-KO, K27R) in DIPG datasets or H3.3 K36M versus H3.3 WT in the CB dataset. The x-axis depicts the log2 ratio of the scaling factor derived by Drosophila or yeast spike-in. The y-axis depicts the log2 ratio of the scaling factor derived by ChIPseqSpikeInFree normalization. The Pearson correlation coefficient between the two methods was 0.96, with a P value of 1.2e-5. Symbol shape indicates ChIP-seq antibody, and symbol color matches the x-axis label in panel D.

The proportion of H3K27me3 reads in highly enriched regions was higher in H3.3K27M than in H3.3WT (Fig. 1C; Supplementary Material). This may reflect a restriction of H3K27me3 peaks to PRC2 recruitment sites in H3.3K27M samples. In the absence of efficient spreading of repressive marks, a larger proportion of reads fell within these focused regions. In contrast, reads were distributed across broader regions in H3.3WT samples and were less concentrated in the highly enriched regions. We confirmed this pattern in additional independent H3K27me3 ChIP-seq datasets (Supplementary Material) and also measured global changes in other histone marks in other disease models (Fang et al., 2016; Orlando et al., 2014) (Supplementary Material).

We observed that some highly enriched regions were retained despite global changes by oncogenic mutations or drug treatment and that the proportion of reads within these regions was inversely associated with total histone mark levels. To quantitatively measure this global change, we calculated the slope between two points (Fig. 1B) of the cumulative curve for each sample. For any given sample i, the slope of the line (βi) connecting ai to bi was calculated as

$$\beta_i = \frac{Y_b - Y_a}{X_b - X_a}$$

where X and Y represent the coordinates of points a and b on the x- and y-axes. Given a set of n samples immunoprecipitated with the same antibody, we chose one sample as a reference (r). We derived the scaling factor (S) for any other sample i as Si = βr/βi, where βr and βi are the slopes of the reference sample and sample i, respectively. The effective ChIP-seq library size for sample i was then calculated as Ni × Si, where Ni is the original library size. The effective library size was then used to normalize the read count from sample i during the downstream analysis. Consequently, larger scaling factors reduced normalized read counts, indicating a global loss of ChIP-seq signals, and vice versa.
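The slope, scaling factor and effective library size defined above can be illustrated with a short Python sketch. The coordinates of points a and b, the library sizes and the function names are hypothetical values chosen for illustration; how the two points are located on each cumulative curve is not reproduced here.

```python
def slope(point_a, point_b):
    """Slope of the line through a = (Xa, Ya) and b = (Xb, Yb)."""
    (xa, ya), (xb, yb) = point_a, point_b
    return (yb - ya) / (xb - xa)


def scaling_factors(slopes, reference):
    """S_i = beta_r / beta_i for every sample, given a reference sample r."""
    beta_r = slopes[reference]
    return {sample: beta_r / beta_i for sample, beta_i in slopes.items()}


def effective_library_sizes(library_sizes, factors):
    """Effective library size N_i * S_i for every sample."""
    return {sample: n * factors[sample] for sample, n in library_sizes.items()}


# Hypothetical points (CPMW, cumulative proportion of reads) per sample.
points = {
    "H3.3WT": ((1.0, 0.25), (5.0, 1.0)),
    "H3.3K27M": ((1.0, 0.25), (9.0, 1.0)),
}
slopes = {sample: slope(a, b) for sample, (a, b) in points.items()}
factors = scaling_factors(slopes, reference="H3.3WT")
sizes = effective_library_sizes(
    {"H3.3WT": 20_000_000, "H3.3K27M": 20_000_000}, factors
)
print(factors)
print(sizes)
```

With these toy values the H3.3K27M curve has half the slope of the reference, so its scaling factor is 2 and its effective library size doubles, which halves its normalized read counts and corresponds to a global loss of signal.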

3 Results

We comprehensively tested the algorithm for six histone marks in 67 ChIP samples that passed our criteria (Supplementary Material). ChIPseqSpikeInFree reliably detected no changes in H3K4me3, moderate global gains of H3K27ac and dramatic global losses of H3K27me3 in K27M in mouse DIPGs, H-NSCs and mouse neural progenitor cells (Fig. 1D and E; Supplementary Material). ChIPseqSpikeInFree also reliably detected dramatic global gains of H3K27me3 with K27M knockout in the BT245 and DIPGXIII cell lines, as compared with K27M knockin or K27R knockin in G477 cells (Fig. 1D and E; Supplementary Material). ChIPseqSpikeInFree detected globally reduced H3K36me2 and H3K36me3 by H3.3 K36M in human benign chondroblastoma cells, no changes in H3K4me3, and globally reduced H3K79me2 in MV4-11 cells treated with phenylbutazone (Fig. 1D and E; Supplementary Material). These global changes determined by ChIPseqSpikeInFree and the spike-in–based method were of a similar magnitude and highly correlated (r > 0.9) (Fig. 1E; Supplementary Material). The cumulative curves from the training datasets of 119 input samples (Supplementary Material) indicated that ChIPseqSpikeInFree can perform quality control testing to detect ChIP failure, input or input-like samples with poor enrichment, or complete loss of enrichment.
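The comparison summarized in Fig. 1E, a log2 ratio of average scaling factors per condition followed by a Pearson correlation between the two methods, could be computed along the following lines. This is only a sketch: the scaling factors below are invented placeholders, not values from the datasets, and the helper names are hypothetical.

```python
import math
from statistics import fmean


def log2_ratio_of_means(condition, control):
    """log2 of (mean scaling factor in condition / mean scaling factor in control)."""
    return math.log2(fmean(condition) / fmean(control))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Placeholder scaling factors (mutant replicates, wild-type replicates) per comparison.
spike_in = [([2.0, 2.2], [1.0, 1.1]), ([0.9, 1.0], [1.0, 1.0]), ([0.5, 0.6], [1.0, 1.2])]
spike_free = [([1.8, 2.1], [1.0, 1.0]), ([1.0, 1.1], [1.0, 1.1]), ([0.6, 0.6], [1.0, 1.1])]

x = [log2_ratio_of_means(mut, wt) for mut, wt in spike_in]    # x-axis: spike-in
y = [log2_ratio_of_means(mut, wt) for mut, wt in spike_free]  # y-axis: spike-in free
print(pearson(x, y))
```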

Funding

This study was supported in part by a National Cancer Institute award (P30CA021765), National Institutes of Health grant (P01CA096832) and American Lebanese Syrian Associated Charities (ALSAC).

Conflict of Interest: none declared.

Supplementary Material

btz720_Supplementary_Data

Contributor Information

Hongjian Jin, Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA.

Lawryn H Kasper, Department of Developmental Neurobiology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA.

Jon D Larson, Department of Developmental Neurobiology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA.

Gang Wu, Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA.

Suzanne J Baker, Department of Developmental Neurobiology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA.

Jinghui Zhang, Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA.

Yiping Fan, Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA.

References

  1. Bayliss J. et al. (2016) Lowered H3K27me3 and DNA hypomethylation define poorly prognostic pediatric posterior fossa ependymomas. Sci. Transl. Med., 8, 366ra161.
  2. Bender S. et al. (2013) Reduced H3K27me3 and DNA hypomethylation are major drivers of gene expression in K27M mutant pediatric high-grade gliomas. Cancer Cell, 24, 660–672.
  3. Bonhoure N. et al. (2014) Quantifying ChIP-seq data: a spiking method providing an internal reference for sample-to-sample normalization. Genome Res., 24, 1157–1168.
  4. Bu J. et al. (2018) SETD2-mediated crosstalk between H3K36me3 and H3K79me2 in MLL-rearranged leukemia. Leukemia, 32, 890–899.
  5. Chan K.M. et al. (2013) The histone H3.3K27M mutation in pediatric glioma reprograms H3K27 methylation and gene expression. Genes Dev., 27, 985–990.
  6. Daigle S.R. et al. (2011) Selective killing of mixed lineage leukemia cells by a potent small-molecule DOT1L inhibitor. Cancer Cell, 20, 53–65.
  7. Dubuc A.M. et al. (2013) Aberrant patterns of H3K4 and H3K27 histone lysine methylation occur across subgroups in medulloblastoma. Acta Neuropathol., 125, 373–384.
  8. Egan B. et al. (2016) An alternative approach to ChIP-seq normalization enables detection of genome-wide changes in histone H3 lysine 27 trimethylation upon EZH2 inhibition. PLoS One, 11, e0166438.
  9. Fang D. et al. (2016) The histone H3.3K36M mutation reprograms the epigenome of chondroblastomas. Science, 352, 1344–1348.
  10. Harutyunyan A.S. et al. (2019) H3K27M induces defective chromatin spread of PRC2-mediated repressive H3K27me2/me3 and is essential for glioma tumorigenesis. Nat. Commun., 10, 1262.
  11. Justin N. et al. (2016) Structural basis of oncogenic histone H3K27M inhibition of human polycomb repressive complex 2. Nat. Commun., 7, 11316.
  12. Larson J.D. et al. (2019) Histone H3.3 K27M accelerates spontaneous brainstem glioma and drives restricted changes in bivalent gene expression. Cancer Cell, 35, 140–155.e7.
  13. Lewis P.W. et al. (2013) Inhibition of PRC2 activity by a gain-of-function H3 mutation found in pediatric glioblastoma. Science, 340, 857–861.
  14. Lu C. et al. (2016) Histone H3K36 mutations promote sarcomagenesis through altered histone methylation landscape. Science, 352, 844–849.
  15. Marinov G.K. et al. (2014) Large-scale quality analysis of published ChIP-seq data. G3 (Bethesda), 4, 209–223.
  16. Orlando D.A. et al. (2014) Quantitative ChIP-Seq normalization reveals global modulation of the epigenome. Cell Rep., 9, 1163–1170.
  17. Pajtler K.W. et al. (2018) Molecular heterogeneity and CXorf67 alterations in posterior fossa group A (PFA) ependymomas. Acta Neuropathol., 136, 211–226.
  18. Pathania M. et al. (2017) H3.3(K27M) cooperates with Trp53 loss and PDGFRA gain in mouse embryonic neural progenitor cells to induce invasive high-grade gliomas. Cancer Cell, 32, 684.
  19. Stafford J.M. et al. (2018) Multiple modes of PRC2 inhibition elicit global chromatin alterations in H3K27M pediatric glioma. Sci. Adv., 4, eaau5935.
  20. Venneti S. et al. (2013) Evaluation of histone 3 lysine 27 trimethylation (H3K27me3) and enhancer of Zest 2 (EZH2) in pediatric glial and glioneuronal tumors shows decreased H3K27me3 in H3F3A K27M mutant glioblastomas. Brain Pathol., 23, 558–564.
  21. Wong S.H. et al. (2015) The H3K4-methyl epigenome regulates leukemia stem cell oncogenic potential. Cancer Cell, 28, 198–209.
  22. Wu G. et al. (2012) Somatic histone H3 alterations in pediatric diffuse intrinsic pontine gliomas and non-brainstem glioblastomas. Nat. Genet., 44, 251–253.


Articles from Bioinformatics are provided here courtesy of Oxford University Press
