Abstract
Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is currently the method of choice to determine binding sites of chromatin-associated factors in a genome-wide manner. Here, we describe a method to investigate the binding preferences of mammalian DNA methyltransferases (DNMT) based on ChIP-seq using biotin-tagging. Stringent ChIP of DNMT proteins based on the strong interaction between biotin and avidin circumvents limitations arising from low antibody specificity and ensures reproducible enrichment. DNMT-bound DNA fragments are ligated to sequencing adaptors, amplified and sequenced on a high-throughput sequencing instrument. Bioinformatic analysis gives valuable information about the binding preferences of DNMTs genome-wide and around promoter regions. This method is unconventional due to the use of genetically engineered cells; however, it allows specific and reliable determination of DNMT binding.
Keywords: ChIP-seq, Immunoprecipitation, in vivo biotinylation, Next-generation sequencing, DNA methyltransferases, CpG islands
1. Introduction
Methylation of cytosine bases is one of the mechanistically best understood epigenetic modifications and plays various roles in genome regulation. Three conserved enzymes are responsible for the deposition of methyl groups on cytosine bases in mammals - the de novo DNA methyltransferases 3A (DNMT3A) and 3B (DNMT3B), as well as the maintenance DNA methyltransferase 1 (DNMT1) [1,2]. In mammals, DNA methylation occurs at the majority of CpG dinucleotides throughout the genome, and only CpG islands remain largely protected from methylation [3]. Based on biochemical studies, the mechanism of DNA methylation has been largely elucidated, but how DNA methylation patterns are precisely set along the genome, and to what extent they cause further regulation, remains to be understood in full detail.
Genome-wide localization studies of epigenetic marks and their regulatory factors are essential for elucidating many biological processes and disease states. Recent technological advances in high-throughput sequencing have enabled a genomics revolution, making studies of chromatin-associated factors in a genome-wide manner affordable, faster and more precise. Amongst others, chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) can be utilized to determine how and where chromatin-associated proteins, such as DNMTs, bind to the genome (Figure 1). In this method, proteins of interest are directly cross-linked to their site of interaction on chromatin, followed by selective enrichment using antibodies and high-throughput sequencing of bound DNA. Obtained sequencing reads are aligned to the genome and high-frequency interaction sites are determined based on local coverage (Figure 1). However, this technique is often limited by various problems that result in unspecific or absent binding signals. In particular, the performance of antibodies can greatly vary and thus limit the applicability of ChIP for a wide range of proteins [4]. This problem of inadequate enrichment is in particular relevant for antibodies directed against DNA methyltransferases, of which only a few ChIP-grade antibodies exist, therefore limiting genome-wide profiling of DNMTs.
Figure 1.
Standard ChIP-seq workflow. Proteins of interest are first cross-linked to chromatin (fixation). Cells are lysed and chromatin is extracted. Fragmentation of chromatin is achieved by ultrasonication or enzymatic digest. Specific antibodies are used to enrich genomic regions bound by the protein of interest. Eluted DNA is sequenced and obtained reads are aligned to a reference genome for subsequent analysis.
Epitope tags have been proven useful for ChIP-seq applications where suitable antibodies are not available [5–8]. Among these, we recommend biotin tagging as an attractive technique for the purification of proteins of choice based on strong avidin-biotin interaction that can resist a large range of salt and detergent concentrations, temperatures and pH levels [9,10]. This is especially beneficial for DNMTs that, unlike transcription factors, display a broad binding preference to the genome and where sequencing analysis requires robust signal-to-noise ratios [11]. Additionally, biotinylated proteins are rare in nature, dramatically minimizing the chances of cross-reactions that could distort the result of a performed precipitation assay [12].
Here, we first present a step-by-step protocol for genome-wide profiling of DNA-methyltransferases by biotin-ChIP followed by high-throughput sequencing. As a requirement, a short biotin acceptor site (16-23aa) needs to be added to the N- or C-terminus of the DNMT protein of interest, and the bacterial biotin ligase BirA has to be stably expressed in the cell line or tissue of interest (Figure 2). Upon translation of the tagged DNMT protein, the acceptor site is specifically recognized and biotinylated by BirA in vivo [13,14]. The biotin acceptor sequence can be either introduced to the endogenous Dnmt locus (via genome editing - see Flemr and Buhler 2015 for details [15]), or expressed as a tagged variant from a heterologous site [11]. Both approaches allow reliable measurements of DNMT-genome interactions and should be used depending on the biological question.
Figure 2.
(A) Expression of DNMT proteins fused to the biotin acceptor peptide can be either achieved from endogenously modified Dnmt genes or from heterologous integration sites. (B) The biotin acceptor peptide is covalently biotinylated in vivo by the BirA ligase. (C) This allows stringent and reproducible immunoprecipitation based on strong biotin-avidin interactions.
For the subsequent ChIP, cells are fixed with formaldehyde, and their nuclei extracted and lysed. Following sonication of chromatin, streptavidin-coupled magnetic beads are used to enrich biotinylated-DNMT proteins. Stringent washing steps ensure removal of all unspecific interactions prior to DNA elution and high-throughput sequencing. Finally, the obtained sequencing reads from DNMT ChIP and input samples are aligned to the genome and genome-wide binding properties of DNMTs are extracted and visualized. We furthermore indicate potential strategies to analyze binding of DNMTs genome-wide, at promoters and at CpG islands, with particular interest in different genomic locations, DNA methylation and CpG densities (Figure 4).
Figure 4.
Examples of data analysis to identify DNMT localisation at promoters and CpG islands. A) Scatter plot indicating variation in CpG density and CpG methylation at mouse gene promoters. Promoters are binned based on CpG densities into high, intermediate and low CpG promoters, according to Weber et al 2007 (HCP, ICP, LCP). B) Violin plots indicating the distribution of DNA methylation and DNMT3A2 binding at all CpG islands (all CGi), at CpG islands overlapping with promoters (TSS CGi) and CpG islands overlapping with gene bodies (GB CGi). C) Heatmap indicating coverage of DNMT3A2 reads around methylated (> 80% m-CpG) CpG islands. Each row represents one +/- 4kb interval surrounding individual CpG islands. k-means clustering identifies CpG islands with similar DNMT3A2 binding properties. D) Average density profiles of DNMT3A2 binding at all CpG islands (all CGi), at CpG islands overlapping with promoters (TSS CGi) and CpG islands overlapping with gene bodies (GB CGi).
2. Materials
Prepare all solutions using ultrapure water and analytical grade reagents. Prepare and store all reagents at room temperature, unless indicated otherwise. Diligently follow all waste disposal regulations when disposing of chemicals. See Notes 1 and 2 for further general instructions.
2.1. Chromatin Cross-linking
2.2. Chromatin Extraction
Phosphate-buffered Saline (DPBS, Invitrogen #14190144) at 4 °C
EDTA-free Protease-inhibitor cocktail (PIC, make 25 x stock in water, Sigma #11836170001)
Cell scraper (TPP, #99003)
Refrigerated swing-out benchtop centrifuge for 15 mL Falcon tubes
Chromatin Extraction Buffer 1: 10 mM EDTA (pH 8); 10 mM Tris-HCl (pH 8) (see Note 3); 0.5 mM EGTA (pH 8); 0.25 % Triton X-100; ad H2O to 500 mL
Chromatin Extraction Buffer 2: 1 mM EDTA (pH 8); 10 mM TRIS (pH 8); 0.5 mM EGTA (pH 8); 200 mM NaCl; ad H2O to 500 mL
ChIP Dilution Buffer: 50 mM HEPES (pH 7.5), 1 mM EDTA (pH 8); 1 % Triton X-100; 0.1 % Sodium deoxycholate.
ChIP Lysis Buffer: ChIP Dilution Buffer with 0.2 % Sodium dodecyl sulfate (SDS) (see Note 5); 300 mM NaCl.
ChIP Buffer: ChIP Dilution Buffer with 0.1 % Sodium dodecyl sulfate (SDS) (see Note 5); 150 mM NaCl.
Optional: 1 mL syringe (Braun Injekt®-F #9166017V) and hypodermic-needle (Braun Sterican® 25G x 5/8”, # 4658302)
2.3. Sonication
Bioruptor® Pico sonication device (Diagenode B01060001).
1.5 mL Bioruptor® Pico Microtubes with Caps (Diagenode #C30010016).
Thermomixer (Eppendorf Thermomixer C).
PCR MinElute Kit (Qiagen #28004)
NanoDrop (ThermoFisher #ND-2000)
Agarose, 1 x TBE, RedSafe Stain (20,000 x, ChemBio #21141)
6x gel loading dye (NEB #B7022).
Gel electrophoresis chamber (BioRad #1704406), power supply (BioRad #1645070).
2.4. Bead preparation
Dynabeads® M-280 Streptavidin (ThermoFisher #11206D)
IP Buffer: ChIP Dilution Buffer with 0.1 % SDS, 150 mM NaCl.
EDTA-free Protease-inhibitor cocktail (PIC, Sigma #11836170001).
Blocking buffer: 1 mL ChIP Buffer, 1 % cold fish skin gelatine (Sigma #G7765), 100 μL S. cerevisiae tRNA (10 mg/mL, Sigma R5636-5X; see Note 6), 1 x PIC.
Overhead rotator (ThermoFisher # 15920D).
Magnetic rack (ThermoFisher #12321D).
2.5. Chromatin Immunoprecipitation (IP)
2 % SDS in Tris-EDTA buffer (pH 8)
DOC Buffer: 250 mM LiCl; 0.5 % 4-Nonylphenyl-polyethylene glycol (NP-40); 0.5 % Sodium-deoxycholate; 1 mM EDTA (pH 8); 10 mM TRIS (pH 8)
High salt buffer (HSB): ChIP Dilution Buffer with 0.2 % Sodium dodecyl sulfate (SDS); 500 mM NaCl
Tris-EDTA buffer (pH 8)
2.6. DNA Elution
Elution buffer : 1 % SDS; 100 mM Sodium bicarbonate (see Note 7)
RNase A (10 mg/mL, Sigma #1010969001).
Proteinase K (10 mg/mL, Sigma #03115828001).
2.7. DNA purification
Option 1: PCR MinElute Kit (Qiagen #28004).
Option 2: Phenol solution (Sigma #P4557) (see Note 1)
Chloroform and Isoamylalcohol (24:1 mix, Sigma # C0549).
Glycogen (20 μg/μL, Sigma #10901393001).
3 M Sodium Acetate, ice-cold Ethanol (70 % and 100 %).
Eppendorf® LoBind DNA/RNA microcentrifuge 1.5 mL tubes (Sigma #Z666548).
2.8. Library Preparation and Sequencing
NEBNext ULTRA DNA library prep kit (NEB #E7370)
NEBNext Multiplex Oligos for Illumina (NEB #E7335).
MinElute PCR Purification Kit (Qiagen #28004).
Nuclease-free water.
Ampure® XP Beads (Beckman #A63880)
80% Ethanol (see Note 8)
Magnetic stand
DNA LoBind Tubes (Eppendorf #022431021).
PCR cycler
Agilent DNA 1000 Kit (Agilent #5067-1504), including Ladder and Sample Buffer; D1000 ScreenTape (Agilent #5067-5582); 2200 TapeStation Instrument (Agilent #G2964AA).
Scalpel, UV-table.
2.9. Required software and R libraries
3. Methods
3.1. Chromatin Cross-linking
Culture cells under the chosen conditions on standard cell culture dishes - e.g. one 10 cm dish of mouse embryonic stem cells is sufficient for one ChIP, for which around 100 μg chromatin is needed (see Notes 9 and 10).
Add 1/10 vol. of 11 % (v/v) formaldehyde solution to medium and ensure even distribution by shaking plate slightly. For 8 mL medium add 800 μL 11 % FA solution to obtain 1 %. Incubate 8 min at room temperature.
Add 440 μL 2.5 M glycine (0.125 M final concentration) to the 10 cm dish and shake. Incubate 10 min on ice to quench the formaldehyde. Avoid drying of the cells during incubation.
Rinse cells twice with 10 mL ice-cold PBS. Keep dishes on ice and avoid drying. When handling several plates rinse plates one after the other.
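The volumes used for fixation and quenching follow from a simple dilution calculation. As a minimal sketch (the function name is illustrative and not part of the protocol), the following Python snippet checks the final formaldehyde and glycine concentrations for a dish containing 8 mL medium:

```python
def final_concentration(stock_conc, stock_vol_ml, start_vol_ml):
    """Concentration after adding stock_vol_ml of stock to start_vol_ml of liquid."""
    return stock_conc * stock_vol_ml / (start_vol_ml + stock_vol_ml)

# Fixation: 0.8 mL of 11 % formaldehyde added to 8 mL medium
formaldehyde_percent = final_concentration(11.0, 0.8, 8.0)

# Quenching: 0.44 mL of 2.5 M glycine added to the 8.8 mL of fixed culture
glycine_molar = final_concentration(2.5, 0.44, 8.8)

print(f"formaldehyde: {formaldehyde_percent:.2f} %")
print(f"glycine: {glycine_molar * 1000:.0f} mM")
```

For other dish sizes, scaling both additions with the medium volume keeps the final concentrations unchanged.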
3.2. Chromatin Extraction
Add 1 mL PBS + 1 x PIC and scrape the cells off. Resuspend and collect the cells by flushing the plate, and transfer them to an appropriate tube depending on the number of plates used.
Spin cells at 600 x g for 5 min at 4 °C and remove supernatant carefully. At this stage cells can be stored at -80 °C.
Resuspend in 5 mL/dish Chromatin Extraction Buffer 1, incubate 10 min on ice.
Spin cells at 600 x g for 5 min at 4 °C and remove supernatant carefully.
Resuspend in 5 mL/dish Chromatin Extraction Buffer 2 (see Note 11), incubate 10 min on ice.
Spin cells at 600 x g for 5 min at 4 °C and remove supernatant carefully.
Resuspend in 900 μL ChIP Lysis Buffer + 1 x PIC per ChIP and incubate for 1-2 hours on ice, inverting the tube occasionally to resuspend the cell nuclei.
3.3. Sonication
Aliquot to pre-chilled tubes indicated by the manufacturer - a maximum of 300 μL per tube for 1.5 mL Diagenode reaction tubes
Sonicate 300 μL aliquots in sonicator for 30 sec ON and 45 sec OFF for as many cycles as needed (see Note 12).
Pellet cell debris for 10 min at 12 000 x g and 4 °C, pool supernatants to a new, pre-chilled 1.5 mL-reaction tube and keep on ice.
Take out 40 μL of each sonicated chromatin for sonication test (see Note 13). Store remaining chromatin at 4 °C overnight.
Reverse cross-link all collected 40 μL aliquots overnight (procedure as described in step 3.6 - DNA Elution - Option 1).
Purify de-crosslinked chromatin with MinElute columns according to the manufacturer’s protocol. Elute in 2 x 20 μL EB and measure the concentration on a NanoDrop.
Pour a 1.2 % agarose gel with 1 x TBE and 1 x RedSafe.
Add 6 x Loading Dye to the DNA samples and load 500 and 1200 ng on a 1.2 % agarose gel to test sonication. Add a DNA standard in another lane. Start electrophoresis at 60 V until the sample has entered the gel and then continue at 100 V until the dye front reaches the middle of the gel.
If sonicated chromatin ranges around 200-500 bp, continue with ChIP (Figure 3). Otherwise sonicate as many more cycles as necessary to reach the recommended size (see Note 14).
Figure 3.
DNA of sonicated chromatin from mouse embryonic stem cells loaded on 1.2 % agarose gel. 500 and 1200 ng of DNA loaded on lane 1 and 3, respectively. Lower amounts of DNA allow better estimation of fragmentation efficiency. Furthermore, gel was restained with 1 x RedSafe in water for 1 hour after electrophoresis to ensure homogenous detection of higher molecular weight DNA molecules.
3.4. Bead preparation
Wash 30 μL streptavidin-magnetic beads per ChIP 2 x 5 min in 1 mL ChIP Buffer. Bind magnetic beads on magnetic rack for 2 min each time and remove the buffer.
Resuspend beads in 1 mL blocking buffer and incubate 1-2 h at 4 °C with overhead rotation.
Wash blocked beads 3 x with 1 mL ChIP buffer and take up in 30 μL ChIP buffer per ChIP after final washing step (see Note 15).
3.5. Chromatin Immunoprecipitation
Dilute NaCl- and SDS-concentration of lysed and sonicated chromatin by adding 1 x volume of ChIP dilution buffer including 1 x PIC. Aliquot ~100 μg chromatin based on the measured concentration obtained from the 5 % de-crosslinked input elute (step 3.3.6) and adjust sample volumes with ChIP buffer, supplemented with 1 x PIC to 1 mL.
Store 5 % of chromatin at −20 °C as an input sample for each cell line/tissue (see Note 16).
Add 30 μL pre-blocked Streptavidin-magnetic beads for up to 2 mL volume of chromatin sample and incubate overnight at 4 °C on an overhead rotator (see Note 17).
Wash 2 x 8 min with 2 % SDS in 1 x TE.
Pulse-spin and collect beads on magnetic rack for 2 minutes in order to remove supernatant between each step.
Wash 1 x 8 min with High salt buffer.
Wash 1 x 8 min with DOC buffer.
Wash 2 x 8 min with 1 mL 1 x TE. Change tube to 2 mL reaction tube at second wash to avoid carryover of unspecific material from the walls of the tube.
3.6. DNA elution
Resuspend ChIP beads and saved input samples (from step 3.5.2) in fresh elution buffer to a final volume of 300 μL.
Add 6 μL RNaseA and mix by inverting the tube.
Incubate 30 min at 37 °C with shaking at 750 rpm to avoid settling of the magnetic beads at the bottom of the tube.
Adjust elution buffer by adding 6 μL 0.5 M EDTA, 12 μL 1 M Tris-HCl (pH 8), and 6 μL Proteinase K.
Incubate for 3 hours at 55 °C, then de-crosslink overnight at 65 °C with mixing at 750 rpm to avoid settling of the beads at the bottom of the tube.
3.7. DNA purification
a) Option 1
- Purify DNA from ChIP and input elutes with MinElute Kit following the manufacturer’s instructions. Use 10 μL EB, and elute twice. Collect both elutes in DNA Low-bind tubes. Alternatively, use phenol-chloroform extraction as indicated below (see Note 18).
b) Option 2
- Add 300 μL Phenol and mix by vortexing (see Note 1). The solution should become white throughout, although successive rapid phase separation can be observed.
- Centrifuge for 3 min at 12 000 x g at room temperature. Transfer upper phase to new tube and add 300 μL of Chloroform:Isoamylalcohol solution and mix by vortexing.
- Centrifuge again for 3 min at 12 000 x g and transfer upper phase to new 1.5 mL LoBind-tube.
- Add 1 μL (20 μg) glycogen, 30 μL 3 M NaOAc and 700 μL 100 % Ethanol cooled to -20 °C. Do not prepare master mix and add in given order, otherwise glycogen precipitates in Ethanol.
- Mix well and spin for about 2 hours at 12 000 x g at 4 °C to ensure maximum recovery of precipitated DNA.
- Remove supernatant and add 1 mL ice-cold 70 % EtOH, vortex shortly and centrifuge for 30 min at 12 000 x g at 4 °C.
- Remove supernatant completely through pipetting (see Note 19) and air-dry pellet for 10 min.
- Take up pellet in 20 μL nuclease-free H2O.
- Measure DNA concentration using Qubit dsDNA HighSensitivity for ChIP material and nanodrop for input material.
- Samples can be stored at -20 °C for months or directly used for library preparation.
3.8. Library Preparation
End Repair: Add 6.5 μL 10x NEBNext End Repair Reaction Buffer and 3 μL NEBNext End Prep Enzyme Mix to 10 - 20 ng of ChIP or input DNA (diluted in water to final volume of 55.5 μL).
Incubate for 30 minutes at 20 °C, followed by 30 minutes at 65 °C.
Purify DNA with MinElute PCR Purification Kit according to manufacturer’s instructions including 250 μL of PB buffer. Elute in 2 x 22 μL EB.
Adaptor Ligation: Add 15 μL of Ligase Master Mix, 2.5 μL diluted adapter oligo mix (1:10 in water) and 1 μL Ligation enhancer to 65 μL sample obtained in step 2.
Incubate for 15 minutes at 20 °C
Add 3 μL of USER enzyme, pipet up/down and incubate 15 minutes at 37 °C
Purify ligated DNA via Ampure® XP Beads in a beads to DNA ratio of 1.2 to 1 according to manufacturer’s instructions and take up DNA in 20 μL water.
PCR-Amplification: Add 2.5 μL of Universal primer to 25 μL of NEBNext Q5 Hot Start HiFi PCR 2 x MasterMix. Mix with eluted DNA and add 2.5 μL of an Index primer of choice to gain a total volume of 50 μL (see Note 20).
Amplify the library using the following cycling conditions: initial denaturation for 30 seconds at 98 °C, 15 amplification cycles of 10 seconds at 98 °C and 75 seconds at 65 °C, followed by a final extension of 5 minutes at 65 °C.
Purify samples again with Ampure® XP Beads as above.
Quality Control: Run 1 μL of each sample on a 2200 TapeStation - D1000 ScreenTape to check library size and concentration, and identify presence of primer dimers or self-ligated adapters.
Sample Pooling: The sample concentration is calculated according to the TapeStation result for fragments within a size of 150 - 400 bp. Samples with different indices can now be pooled at equimolar ratios.
Adapter removal: If required, the library can be purified from self-ligated adapters through a 2 % agarose gel. Electrophorese at 60 V until the sample has entered the gel and then continue at 100 V until the dye front reaches the middle of the gel (see Note 21).
Check DNA size on a UV table and cut out fragments with a size of 150-400 bp. Extract DNA with the help of MinElute Gel Extraction Kit using 2 x 11 μL EB. A total molarity of approximately 10 nM is sufficient for further processing and Illumina sequencing.
For this application, libraries are sequenced on an Illumina HiSeq2500 instrument with 125-bp single-end reads (see Note 22).
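Pooling at equimolar ratios requires converting the measured mass concentration and the average fragment size of each library into a molar concentration. The sketch below assumes an average mass of 660 g/mol per base pair of double-stranded DNA, a common approximation that is not stated in the protocol; the function names, sample names and example values are hypothetical.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (approximation, assumed here)

def library_molarity_nM(conc_ng_per_ul, mean_fragment_bp):
    """Convert a concentration in ng/uL and a mean fragment length in bp to nmol/L."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_fragment_bp)

def equimolar_volumes(molarities_nM, fmol_per_library):
    """Volume (uL) of each library that contributes the same molar amount.

    1 nM equals 1 fmol/uL, so volume = fmol / nM.
    """
    return {name: fmol_per_library / nM for name, nM in molarities_nM.items()}

# Hypothetical TapeStation readings for the 150-400 bp region
libraries = {
    "DNMT3A2_ChIP": library_molarity_nM(2.0, 300),
    "input": library_molarity_nM(4.0, 300),
}
volumes = equimolar_volumes(libraries, fmol_per_library=20.0)

for name in libraries:
    print(f"{name}: {libraries[name]:.1f} nM, pool {volumes[name]:.2f} uL")
```

In this example the input library is twice as concentrated as the ChIP library, so half the volume is pooled.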
3.9. Sequencing depth and read alignments
Because DNMT proteins bind pervasively throughout the genome, we recommend sequencing at least 50 million reads per sample to ensure sufficient coverage for downstream analysis. This is easily reached with current Illumina sequencing platforms. We also recommend including the corresponding input sample in the same sequencing run. This makes it possible to identify and normalize potential biases in chromatin fragmentation or library preparation [16], and to calculate DNMT enrichments (see later steps). Once sequencing reads are obtained, the .fastq files should first be filtered for low-quality reads and PCR duplicates, and adapter sequences should be removed. This can be achieved using the FASTX-Toolkit [17] or similar tools. After preprocessing, the reads are aligned to the reference genome using Bowtie [18] or similar aligners, allowing two mismatches and excluding reads that map to multiple locations in the genome (see Note 23). The QuasR package provides a simple, R-based interface for alignment of sequencing reads, extraction of count-based summaries in the GenomicRanges format and simple visualization of genomic alignments [19].
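To illustrate what the preprocessing step does, the following sketch filters FASTQ records by mean base quality and collapses reads with identical sequences. It is a simplified stand-in written with the Python standard library, not the FASTX-Toolkit itself; the quality threshold of 20, the Phred+33 encoding and the function names are assumptions made for this example, and adapter trimming is omitted.

```python
import io

def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        sequence = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        quality = handle.readline().rstrip()
        yield header, sequence, quality

def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(char) - offset for char in quality) / len(quality)

def filter_reads(records, min_mean_quality=20):
    """Drop low-quality reads and keep only the first copy of each sequence."""
    seen = set()
    for header, sequence, quality in records:
        if not quality or mean_phred(quality) < min_mean_quality:
            continue
        if sequence in seen:
            continue
        seen.add(sequence)
        yield header, sequence, quality

example = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n@r3\nTTTT\n+\n!!!!\n"
kept = list(filter_reads(read_fastq(io.StringIO(example))))
print([header for header, _, _ in kept])
```

Here r2 is discarded as a duplicate of r1 and r3 is discarded for low quality. For real data, a dedicated tool remains the better choice.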
3.10. Detection of DNMT-enriched sites
Because DNMTs bind broadly across the genome, standard peak-calling algorithms fail to produce meaningful results, and enriched sites have to be defined using alternative approaches. We suggest binning the genome into 1-kb intervals and calculating the log2-fold enrichment (LFE) of the DNMT signal over the corresponding input sample (see Note 24). Binning and selection of genomic intervals can be achieved with the GenomicRanges package in R [21]. This package is especially useful for representing and manipulating genomic annotations, and for storing sequencing data along the defined annotations. Once the genomic intervals and the aligned sequencing reads are stored as GenomicRanges objects, the number of sequencing reads overlapping each 1-kb genomic interval can be calculated directly for the ChIP and input samples and stored as a variable. LFE can be calculated using the following formula:
$$\mathrm{LFE} = \log_2\left(\frac{\dfrac{n_{IP}}{N_{IP}} \cdot \min(N_{IP}, N_{inp}) + p}{\dfrac{n_{inp}}{N_{inp}} \cdot \min(N_{IP}, N_{inp}) + p}\right)$$
where n_IP and n_inp represent the number of ChIP or input reads overlapping each 1-kb interval, respectively, N_IP and N_inp are the library sizes of the ChIP and input samples, and p = 8 is a pseudocount that stabilizes sampling noise. The obtained LFE can be used to rank and identify the enriched regions in the genome. Alternatively, if replicates are available, the read counts per interval can be used to detect significantly enriched tiles with the edgeR or DESeq2 packages in R [22,23]. For this we suggest first removing all genomic intervals with an LFE between −0.5 and 0.5. This filtering step removes tiles that have no chance of passing significance tests, and therefore increases detection power by reducing the multiple-testing burden. Alternatively, the genefilter package in R provides functions that can be used to determine the most suitable cutoff for filtering [24].
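The tile-based enrichment calculation can be sketched in a few lines. The example below uses plain Python rather than GenomicRanges, represents each aligned read by a hypothetical (chromosome, start) pair, and assumes that both libraries are scaled to the smaller of the two library sizes before the pseudocount is added; that scaling choice and all function names are assumptions of this sketch.

```python
import math
from collections import Counter

TILE_SIZE = 1000   # 1-kb genomic intervals
PSEUDOCOUNT = 8    # p in the text

def count_reads_per_tile(read_positions, tile_size=TILE_SIZE):
    """Count reads per tile; read_positions holds (chrom, 0-based start) pairs."""
    counts = Counter()
    for chrom, position in read_positions:
        counts[(chrom, position // tile_size)] += 1
    return counts

def log2_fold_enrichment(n_ip, n_inp, lib_ip, lib_inp, p=PSEUDOCOUNT):
    """Log2 ratio of library-size-scaled ChIP and input counts plus pseudocount."""
    scale = min(lib_ip, lib_inp)
    return math.log2((n_ip / lib_ip * scale + p) / (n_inp / lib_inp * scale + p))

def enriched_or_depleted(lfe_by_tile, cutoff=0.5):
    """Keep tiles whose absolute LFE reaches the cutoff."""
    return {tile: lfe for tile, lfe in lfe_by_tile.items() if abs(lfe) >= cutoff}

# Toy data; real libraries contain tens of millions of reads
ip_reads = [("chr1", 120), ("chr1", 870), ("chr1", 1500)]
input_reads = [("chr1", 400), ("chr1", 1999)]
ip_counts = count_reads_per_tile(ip_reads)
input_counts = count_reads_per_tile(input_reads)

lfe = {
    tile: log2_fold_enrichment(ip_counts[tile], input_counts[tile],
                               len(ip_reads), len(input_reads))
    for tile in set(ip_counts) | set(input_counts)
}
print(lfe)
```

With so few reads the pseudocount dominates and all values stay close to zero, which is the intended damping effect for sparsely covered tiles.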
3.11. Analysis of DNMT binding at promoters and CpG islands
In order to analyze DNMT binding at promoters or CpG islands, the correct genomic coordinates first have to be defined. For promoters, this can be achieved with the GenomicFeatures package in R, which provides the regions of interest surrounding transcriptional start sites (TSS) in the appropriate GenomicRanges format [21]. Alternatively, promoter and TSS regions can be downloaded directly from the UCSC table browser [25]. The same is true for CpG island definitions, which can also be downloaded from UCSC. However, the UCSC definitions are based on Gardiner-Garden et al. (1987) [26], and more accurate CpG island detection algorithms exist [27,28]. Again, the choice of definition should be guided by the biological question. We suggest converting the obtained genomic coordinates to the GenomicRanges format in R [21]. This allows straightforward calculation and storage of CpG densities, GC percentages, CpG observed/expected ratios as well as calculations of DNA methylation percentages [29], DNA methylation densities [30], or DNMT protein enrichments and other chromatin features such as bivalency of H3K27me3 and H3K4me3 marks. For the detection and calculation of DNA sequence features (including occurrences of transcription factor binding motifs) we recommend the Biostrings package in R [31]. This allows DNA sequence information to be extracted and used in calculations in a straightforward manner. When calculating LFE for DNMTs or other proteins/histone modifications, the different sizes of the analyzed features have to be taken into account. This is especially important for CpG island intervals, which vary strongly in size.
Therefore we recommend performing a reads per kilobase (RPK) normalization prior to calculating the LFE (see Note 25). Once these measurements are obtained for the promoters and CpG islands of interest, binning or selection of regions of interest can be performed according to the research question. For example, CpG islands can be binned into islands of CpG high, intermediate and low densities ([32], Figure 4A), high or low DNA methylation [30], or association with annotated promoters or orphan CpG islands [33]. Total DNMT protein enrichment at promoters and CpG islands can now be visualized using various methods, including box plots, density plots or violin plots (Figure 4B) and compared to other genomic and epigenomic features extracted at the same promoter regions using scatterplots.
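One plausible way to combine RPK normalization with the enrichment calculation is sketched below. The pseudocount of 8 and the scaling to the smaller library mirror the assumptions used for the tile-based LFE; the function names and the example read counts are hypothetical.

```python
import math

def reads_per_kilobase(read_count, feature_length_bp):
    """Normalize a read count to the length of the feature in kilobases."""
    return read_count * 1000.0 / feature_length_bp

def feature_lfe(n_ip, n_inp, length_bp, lib_ip, lib_inp, p=8):
    """Log2-fold enrichment computed on RPK values instead of raw counts."""
    scale = min(lib_ip, lib_inp)
    ip = reads_per_kilobase(n_ip, length_bp) / lib_ip * scale
    inp = reads_per_kilobase(n_inp, length_bp) / lib_inp * scale
    return math.log2((ip + p) / (inp + p))

# Two CpG islands with the same read density but different lengths
short_island = feature_lfe(n_ip=28, n_inp=4, length_bp=500,
                           lib_ip=10**6, lib_inp=10**6)
long_island = feature_lfe(n_ip=112, n_inp=16, length_bp=2000,
                          lib_ip=10**6, lib_inp=10**6)
print(round(short_island, 3), round(long_island, 3))
```

Without the length normalization, the pseudocount would weigh more heavily on the short island and the two features would receive different enrichment values despite identical read densities.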
3.12. Visualization of DNMT protein localization along promoters and CpG islands
However, the calculated DNMT enrichments do not necessarily provide information about the distribution of DNMT proteins along the analyzed genomic features. Such information is often required to understand the position of DNMT proteins along promoters and CpG islands and to identify local binding preferences. Inspecting these sites one by one in a genome browser is not feasible for large datasets, so the results for individual intervals need to be visualized in bulk. For this we recommend the genomation package in R [34]. Heatmap profiles allow visualizing the coverage of DNMT proteins around all promoters and CpG islands of interest simultaneously, and furthermore to arrange or cluster the intervals of interest based on coverage or binding patterns, respectively (Figure 4C). Clustering is a straightforward method to separate promoters and CpG islands based on similar DNMT binding or other epigenetic properties. Additionally, the entire set of intervals, or selected intervals (e.g. by CpG density, genomic location or clustering), can be visualised as average metaprofiles that summarise general binding properties of DNMTs (Figure 4D). Both heatmap and metaprofile visualisation of DNMT coverage require that the genomic intervals of interest are aligned to each other in order to obtain meaningful results. For promoters, this requires that all intervals are centred and oriented based on a fixed position, such as the TSS. CpG island intervals extracted based on DNA sequence properties usually do not contain strand information and can be centred either on the midpoint or the border of the interval. For CpG islands that overlap with promoters, the orientation can be defined based on the directionality of the underlying gene. Similar strategies can also be applied when analysing DNMT coverage at exon-intron borders, TF binding sites or repetitive elements.
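The alignment and orientation of intervals described above can be illustrated with a small sketch. It is not the genomation API: coverage is represented as a hypothetical per-base list for each chromosome, anchors are (chromosome, position, strand) tuples, and windows that run off the end of a chromosome are simply skipped.

```python
def oriented_window(coverage, anchor, strand, flank):
    """Coverage from anchor - flank to anchor + flank, flipped for minus-strand features."""
    start, end = anchor - flank, anchor + flank + 1
    if start < 0 or end > len(coverage):
        return None
    window = coverage[start:end]
    return window[::-1] if strand == "-" else window

def metaprofile(coverage_by_chrom, anchors, flank):
    """Return the heatmap rows and their column-wise mean (the metaprofile)."""
    rows = []
    for chrom, position, strand in anchors:
        row = oriented_window(coverage_by_chrom[chrom], position, strand, flank)
        if row is not None:
            rows.append(row)
    profile = [sum(column) / len(rows) for column in zip(*rows)] if rows else []
    return rows, profile

# Toy coverage that increases along the chromosome
coverage_by_chrom = {"chr1": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]}
anchors = [("chr1", 3, "+"), ("chr1", 6, "-")]
rows, profile = metaprofile(coverage_by_chrom, anchors, flank=2)
print(rows)
print(profile)
```

Each row corresponds to one line of a heatmap such as Figure 4C, and the averaged profile corresponds to a metaprofile such as Figure 4D. Flipping the minus-strand window ensures that upstream and downstream positions line up across genes.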
Acknowledgements
We thank Isabel Schwarz and Joël Wirz for carefully reading the manuscript prior to submission. Research in the Baubec lab is supported by an SNSF Professorship (SNF157488) and Systems-X.ch Special Opportunities Grant (2015_322) to T.B., and by the University of Zurich.
4. Notes
Do not handle chemicals until all safety precautions have been read and understood. Obtain special instructions before use. For instance, wear protective gloves, clothes and eye protection when handling sodium dodecyl sulfate (SDS), phenol, chloroform and sodium azide. In order to avoid formation of dust and/or vapors, work under a fume hood only is highly recommended. Be especially careful when handling phenol, always use SafeLock reaction tubes at all times.
All buffers can be stored at 4 °C unless stated otherwise.
pH adjustments: EDTA and EGTA are adjusted with NaOH; Tris solutions are adjusted with HCl; Hepes solution is adjusted with KOH or NaOH, as indicated.
Formaldehyde solution should be fresh and methanol-free.
SDS precipitates at 4 °C. Buffers containing more than 0.1% SDS should be stored at room temperature. These are stable for one year.
S. cerevisiae tRNA has to be denatured for 5 min at 95 °C prior to use. Keep on ice after denaturation.
We find that it is best to prepare the elution buffer fresh prior to de-crosslinking the chromatin.
Solutions containing ethanol should be prepared fresh each time. Provide good ventilation in process area to prevent formation of aerosols.
Depending on the study purpose, a wild-type or mock control should always be included as a separate sample for later evaluation.
Protocol performance varies from cell to cell and tissue type, since protein levels can vary and not all cell types are equally amenable to genetic engineering. Thus, the amount of cells/material and sonication cycles required for an optimal result is highly dependent on the cell line/tissue sample itself as well as the protein of interest in the study, and need to be tested beforehand.
Each step can be followed under the microscope, especially when performed for the first time (stain nuclei with trypan blue). Optionally, the extract can be passed 3 times through pre-chilled G26 syringe to dissociate cell clumps. In the end no clumps should be visible.
A sonication test run using different numbers of cycles is greatly recommended prior to start due to cell type variability and properties of the sonicator in use.
Testing a successful sonication of the chromatin is an important step that should be skipped on no account. Fragmenting the chromatin to a correct size range is crucial for an optimal precipitation and sonication efficiency can vary greatly depending on used cell concentrations. Be aware of these fluctuations and ensure good chromatin ranges prior to immunoprecipitation. If sonication is sufficient, DNA fragment should show highest signal in a small range of suitable sizes. For longer sonications, mix chromatin every 10 cycles by inverting the tubes. Allow the machine and samples to cool down every 30 minutes of continuous usage, if not indicated by manufacturer otherwise.
Extended sonication should be treated with caution, since overheating of the sample could affect the integrity of chromatin and associated proteins.
Always avoid drying of beads by closing lid and handling liquids quickly.
This input sample will be used for sequencing as it reflects the chromatin state at the point of the ChIP start.
Bead blocking can be done for up to eight ChIP reactions at once and stored for one week at 4 °C with 0.01 % NaAzide. Adjust volumes accordingly.
These options to isolate chromatin associated DNA exist due to the varying yield received from different starting material and/or the abundance of precipitated protein of interest. For first-time ChIP samples always use phenol-chloroform extraction.
Be careful not to disrupt the pellet.
Index primer sequences are individually added to each sample, allowing their identification from a pooled sequencing reactions.
Do not let the gel run for too long to avoid excessive separation of DNA and to concentrate the DNA in a small gel piece as much as possible.
The choice between single- or paired-end sequencing depends on the individual research questions to be addressed in the study, on the depth of sequencing coverage, and also the budget. Single-read sequencing involves sequencing from only one end of the DNA fragment and is a simpler way to utilize sequencing whereas paired-end sequencing allows reading both ends of a fragment. The resulting longer reads can provide more reliable information about the protein locations or mapping along repetitive sequence elements. However, this degree of accuracy may not be required for all experiments.
These parameters are only a recommendation and can be changed depending on the biological question.
To avoid biases introduced by incorrect genome annotations, different genetic backgrounds, repetitive elements, etc., we recommend removing 1 kb intervals that overlap with satellite repeats, simple repeats or so-called “blacklisted regions” from ENCODE [20]. Furthermore, removing genomic intervals with insufficient coverage in the input sample helps to reduce false positives.
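The interval filtering described in the last note can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions, not the analysis pipeline itself: the tile coordinates, the excluded region, the input read counts and the threshold of 10 reads are all hypothetical, and a real analysis would typically use a dedicated genomic-interval library rather than the naive pairwise overlap test shown here.

```python
from typing import Dict, List, Tuple

# (chromosome, start, end), 0-based and half-open; a convention assumed for this sketch
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two half-open intervals on the same chromosome overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def filter_tiles(
    tiles: List[Interval],
    excluded: List[Interval],
    input_counts: Dict[Interval, int],
    min_input_reads: int,
) -> List[Interval]:
    """Keep tiles that avoid all excluded regions and have enough input reads."""
    kept = []
    for tile in tiles:
        # Drop tiles touching repeats or blacklisted regions
        if any(overlaps(tile, region) for region in excluded):
            continue
        # Drop tiles with insufficient coverage in the input sample
        if input_counts.get(tile, 0) < min_input_reads:
            continue
        kept.append(tile)
    return kept


# Hypothetical 1 kb tiles, one excluded region and made-up input read counts
tiles = [("chr1", 0, 1000), ("chr1", 1000, 2000),
         ("chr1", 2000, 3000), ("chr1", 3000, 4000)]
excluded = [("chr1", 1500, 1600)]
input_counts = {tiles[0]: 25, tiles[1]: 40, tiles[2]: 3, tiles[3]: 18}

kept = filter_tiles(tiles, excluded, input_counts, min_input_reads=10)
print(kept)
```

In this toy example the second tile is removed because it overlaps the excluded region, and the third because its input coverage falls below the assumed threshold. The threshold should be chosen to suit the sequencing depth of the input sample.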
References
1. Okano M, Bell DW, Haber DA, Li E. DNA methyltransferases Dnmt3a and Dnmt3b are essential for de novo methylation and mammalian development. Cell. 1999;99(3):247–257. doi: 10.1016/s0092-8674(00)81656-6.
2. Smith ZD, Meissner A. DNA methylation: roles in mammalian development. Nat Rev Genet. 2013;14(3):204–220. doi: 10.1038/nrg3354.
3. Suzuki MM, Bird A. DNA methylation landscapes: provocative insights from epigenomics. Nat Rev Genet. 2008;9(6):465–476. doi: 10.1038/nrg2341.
4. Landt SG, Marinov GK, Kundaje A, Kheradpour P, Pauli F, Batzoglou S, Bernstein BE, Bickel P, Brown JB, Cayting P, Chen Y, et al. ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res. 2012;22(9):1813–1831. doi: 10.1101/gr.136184.111.
5. Einhauer A, Jungbauer A. The FLAG peptide, a versatile fusion tag for the purification of recombinant proteins. J Biochem Biophys Methods. 2001;49(1-3):455–465. doi: 10.1016/s0165-022x(01)00213-5.
6. Kolodziej KE, Pourfarzad F, de Boer E, Krpic S, Grosveld F, Strouboulis J. Optimal use of tandem biotin and V5 tags in ChIP assays. BMC Mol Biol. 2009;10:6. doi: 10.1186/1471-2199-10-6.
7. Wilbanks EG, Larsen DJ, Neches RY, Yao AI, Wu CY, Kjolby RA, Facciotti MT. A workflow for genome-wide mapping of archaeal transcription factors with ChIP-seq. Nucleic Acids Res. 2012;40(10):e74. doi: 10.1093/nar/gks063.
8. Kidder BL, Hu G, Zhao K. ChIP-Seq: technical considerations for obtaining high-quality data. Nat Immunol. 2011;12(10):918–922. doi: 10.1038/ni.2117.
9. de Boer E, Rodriguez P, Bonte E, Krijgsveld J, Katsantoni E, Heck A, Grosveld F, Strouboulis J. Efficient biotinylation and single-step purification of tagged transcription factors in mammalian cells and transgenic mice. Proc Natl Acad Sci U S A. 2003;100(13):7480–7485. doi: 10.1073/pnas.1332608100.
10. Green NM. Avidin and streptavidin. Methods Enzymol. 1990;184:51–67. doi: 10.1016/0076-6879(90)84259-j.
11. Baubec T, Colombo DF, Wirbelauer C, Schmidt J, Burger L, Krebs AR, Akalin A, Schubeler D. Genomic profiling of DNA methyltransferases reveals a role for DNMT3B in genic methylation. Nature. 2015;520(7546):243–247. doi: 10.1038/nature14176.
12. Lindqvist Y, Schneider G. Protein-biotin interactions. Curr Opin Struct Biol. 1996;6(6):798–803. doi: 10.1016/s0959-440x(96)80010-8.
13. Schatz PJ. Use of peptide libraries to map the substrate specificity of a peptide-modifying enzyme: a 13 residue consensus peptide specifies biotinylation in Escherichia coli. Biotechnology (N Y). 1993;11(10):1138–1143. doi: 10.1038/nbt1093-1138.
14. Kim J, Cantor AB, Orkin SH, Wang J. Use of in vivo biotinylation to study protein–protein and protein–DNA interactions in mouse embryonic stem cells. Nat Protoc. 2009;4:506–517. doi: 10.1038/nprot.2009.23.
15. Flemr M, Buhler M. Single-Step Generation of Conditional Knockout Mouse Embryonic Stem Cells. Cell Rep. 2015;12(4):709–716. doi: 10.1016/j.celrep.2015.06.051.
16. Meyer CA, Liu XS. Identifying and mitigating bias in next-generation sequencing methods for chromatin biology. Nat Rev Genet. 2014;15(11):709–721. doi: 10.1038/nrg3788.
17. FASTX-Toolkit. 2010. http://hannonlab.cshl.edu/fastx_toolkit/
18. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25. doi: 10.1186/gb-2009-10-3-r25.
19. Gaidatzis D, Lerch A, Hahne F, Stadler MB. QuasR: quantification and annotation of short reads in R. Bioinformatics. 2015;31(7):1130–1132. doi: 10.1093/bioinformatics/btu781.
20. mod/mouse/humanENCODE: Blacklisted genomic regions for functional genomics analysis. 2014. https://sites.google.com/site/anshulkundaje/projects/blacklists
21. Lawrence M, Huber W, Pages H, Aboyoun P, Carlson M, Gentleman R, Morgan MT, Carey VJ. Software for computing and annotating genomic ranges. PLoS Comput Biol. 2013;9(8):e1003118. doi: 10.1371/journal.pcbi.1003118.
22. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139–140. doi: 10.1093/bioinformatics/btp616.
23. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550. doi: 10.1186/s13059-014-0550-8.
24. Gentleman R, Carey V, Huber W, Hahne F. genefilter: methods for filtering genes from high-throughput experiments. 2016. https://www.bioconductor.org/packages/release/bioc/html/genefilter.html
25. Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, Kent WJ. The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 2004;32(Database issue):D493–496. doi: 10.1093/nar/gkh103.
26. Gardiner-Garden M, Frommer M. CpG islands in vertebrate genomes. J Mol Biol. 1987;196(2):261–282. doi: 10.1016/0022-2836(87)90689-9.
27. Takai D, Jones PA. Comprehensive analysis of CpG islands in human chromosomes 21 and 22. Proc Natl Acad Sci U S A. 2002;99(6):3740–3745. doi: 10.1073/pnas.052410099.
28. Hackenberg M, Carpena P, Bernaola-Galvan P, Barturen G, Alganza AM, Oliver JL. WordCluster: detecting clusters of DNA words and genomic elements. Algorithms Mol Biol. 2011;6:2. doi: 10.1186/1748-7188-6-2.
29. Stadler MB, Murr R, Burger L, Ivanek R, Lienert F, Scholer A, van Nimwegen E, Wirbelauer C, Oakeley EJ, Gaidatzis D, Tiwari VK, et al. DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature. 2011;480(7378):490–495. doi: 10.1038/nature10716.
30. Baubec T, Ivanek R, Lienert F, Schubeler D. Methylation-dependent and -independent genomic targeting principles of the MBD protein family. Cell. 2013;153(2):480–492. doi: 10.1016/j.cell.2013.03.011.
31. Pagès H, Aboyoun P, Gentleman R, DebRoy S. Biostrings: String objects representing biological sequences, and matching algorithms. 2016. https://bioconductor.org/packages/release/bioc/html/Biostrings.html
32. Weber M, Hellmann I, Stadler MB, Ramos L, Paabo S, Rebhan M, Schubeler D. Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome. Nat Genet. 2007;39(4):457–466. doi: 10.1038/ng1990.
33. Illingworth RS, Gruenewald-Schneider U, Webb S, Kerr AR, James KD, Turner DJ, Smith C, Harrison DJ, Andrews R, Bird AP. Orphan CpG islands identify numerous conserved promoters in the mammalian genome. PLoS Genet. 2010;6(9):e1001134. doi: 10.1371/journal.pgen.1001134.
34. Akalin A, Franke V, Vlahovicek K, Mason CE, Schubeler D. Genomation: a toolkit to summarize, annotate and visualize genomic intervals. Bioinformatics. 2015;31(7):1127–1129. doi: 10.1093/bioinformatics/btu775.




