Main Text
A commonly used genome fragmentation method in next-generation sequencing, restriction endonuclease (RE) digestion, may severely compromise genomic mapping resolution and prevent the functional annotation of certain chromosomal regions unless REs are applied in combinations that sample all genomic regions with equal probability.
Genome fragmentation by REs is routinely used in multiple genomic mapping technologies, including RNA-DNA hybrid (R-loop) immunoprecipitation sequencing (DRIP-seq),1,2 chromosome conformation capture (4C/5C, Hi-C),3 reduced-representation bisulfite sequencing (RRBS),4 and restriction site-associated DNA (RAD) marker genotyping.5 The performance of these approaches depends on (1) the length distribution of the restriction fragments, which determines the spatial resolution of the assay, and (2) the randomness of RE digestion, which ensures that all genomic regions are sampled with equal probability.6 Selecting the proper combination of REs for genome fragmentation is therefore crucial for obtaining representative next-generation sequencing (NGS) libraries and for assigning clear biological functions to the mapped regions.
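A back-of-the-envelope model makes the first criterion concrete. If the four bases were independent and equally frequent (an idealizing assumption of ours, not a property of real genomes), an n-bp recognition site would occur on average once every 4^n bp, and the cutting rates of enzymes combined in a cocktail would simply add. The Python sketch below computes these idealized mean fragment lengths for the five enzymes named in the Figure 1 legend; it is an illustration only, not part of the authors' workflow.

```python
# Minimal sketch, assuming an idealized genome in which the four bases are
# independent and equally frequent (real chromosomes violate this).


def site_probability(site: str) -> float:
    """Chance that a specific recognition site starts at a given position."""
    return 0.25 ** len(site)


def expected_fragment_length(sites: list[str]) -> float:
    """Idealized mean fragment length (bp) for one enzyme or a cocktail.

    Cutting rates of different enzymes add up; overlaps between their
    recognition sites are ignored.
    """
    return 1.0 / sum(site_probability(s) for s in sites)


if __name__ == "__main__":
    # Recognition sequences of the enzymes listed in the Figure 1 legend.
    cocktail = {
        "HindIII": "AAGCTT",
        "EcoRI": "GAATTC",
        "BsrGI": "TGTACA",
        "XbaI": "TCTAGA",
        "SspI": "AATATT",
    }
    for name, site in cocktail.items():
        print(f"{name}: ~{expected_fragment_length([site]):.0f} bp")
    print(f"cocktail: ~{expected_fragment_length(list(cocktail.values())):.0f} bp")
```

Under this model each six-cutter gives a mean of 4,096 bp and the five-enzyme cocktail about 819 bp. Large departures of real in silico digests from these values may indicate a non-random distribution of cutting sites.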
Using DRIP-seq, we have recently shown that this technique carries inherent biases related to RE digestion that might prevent the functional annotation of a significant fraction of R-loops.2 R-loops, nucleic acid structures composed of an RNA-DNA hybrid and the displaced single-stranded DNA, are involved in multiple cellular processes and may also mediate genomic instability in pathological contexts. The DRIP method uses an anti-RNA-DNA hybrid antibody to capture RNA-DNA hybrids associated with RE-fragmented DNA or chromatin; the captured fragments are then mapped to the genome. The main reason for the over-representation of long DRIP fragments may be that RE cutting sites are not randomly distributed in the human genome.7 This bias is particularly pronounced over first exons. The over-representation of first exons in RE-fragmented samples may also be an issue in other species and sequencing methods; for instance, the mouse genome also contains long intronic sequences that may cause similar biases. As with DRIP, a suboptimal RE fragment size distribution and first-exon bias might affect the outcome and interpretation of other frequently used genomic technologies (e.g., all C-based methods [4C, 5C, and Hi-C]), potentially introducing false-positive spatial contacts proximal to open reading frames (ORFs), especially to first exons. Finally, estimates of the evolutionary conservation of R-loop binding sites between species8 are potentially problematic as well, because they may reflect the sequence homology or divergence of exonic DRIP fragments rather than that of precisely located R-loop binding sites.
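Why long fragments cost resolution can be illustrated with a simple lookup: an immunoprecipitated fragment can only place an R-loop somewhere within the restriction fragment that carries it, so the length of that fragment is the effective mapping resolution at the locus. The sketch below uses a hypothetical helper, fragment_containing, and made-up cut positions; it finds the enclosing fragment by binary search over sorted cut coordinates.

```python
import bisect


def fragment_containing(cuts: list[int], chrom_length: int, position: int) -> tuple[int, int]:
    """Return the half-open restriction fragment [start, end) containing `position`.

    Hypothetical helper. `cuts` must be sorted 0-based cut coordinates; a cut
    at c separates base c - 1 from base c. Chromosome ends act as borders.
    """
    if not 0 <= position < chrom_length:
        raise ValueError("position lies outside the chromosome")
    i = bisect.bisect_right(cuts, position)  # number of cuts at or before position
    start = cuts[i - 1] if i > 0 else 0
    end = cuts[i] if i < len(cuts) else chrom_length
    return start, end


if __name__ == "__main__":
    # Made-up cut sites on a 10 kb chromosome: dense on the left, sparse elsewhere.
    cuts = [800, 1000, 1200, 9000]
    for rloop in (1100, 5000):
        start, end = fragment_containing(cuts, 10_000, rloop)
        print(f"R-loop at {rloop}: fragment {start}-{end} ({end - start} bp resolution)")
```

The two hypothetical R-loops are equally real, yet one is localized to a 200 bp fragment and the other only to a 7,800 bp fragment, purely because of where the cutting sites happen to fall.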
Superimposed on the RE bias, several other genome characteristics can affect the efficiency of RE digestion. DNA methylation is present in higher organisms, and most REs do not cut at methylated cytosines. Furthermore, most REs do not cut DNA-RNA hybrids, which are prevalent across chromosomes (constituting 5%–8% of the eukaryotic genome). RE accessibility is also limited by chromatin (nucleosome) structure, which inherently favors cutting within linker DNA.
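The effect of CpG methylation on an in silico digest can be approximated by discarding every recognition site that contains a methylated CpG when the enzyme is methylation-sensitive. The sketch below uses the isoschizomer pair HpaII and MspI as an example: both recognize C^CGG, but HpaII is blocked by methylation of the internal CpG, whereas MspI is not. The function name, its parameters and the toy sequence are hypothetical, only the top strand is scanned (adequate for palindromic sites), and the methylation sensitivity of any real enzyme should be checked against the supplier's or a database's specifications.

```python
import re


def cut_positions(seq: str, site: str, cut_offset: int,
                  methylated_c: frozenset[int] = frozenset(),
                  cpg_methylation_sensitive: bool = False) -> list[int]:
    """Top-strand cut coordinates for a palindromic recognition site.

    Hypothetical helper. `methylated_c` holds 0-based coordinates of the C in
    each methylated CpG; a methylation-sensitive enzyme skips every site that
    contains such a CpG.
    """
    seq = seq.upper()
    cuts = []
    # The lookahead lets re.finditer report overlapping occurrences as well.
    for match in re.finditer(f"(?={re.escape(site)})", seq):
        start = match.start()
        site_cpgs = {start + i for i in range(len(site) - 1) if site[i:i + 2] == "CG"}
        if cpg_methylation_sensitive and site_cpgs & methylated_c:
            continue  # site blocked by CpG methylation
        cuts.append(start + cut_offset)
    return cuts


if __name__ == "__main__":
    seq = "AACCGGTTTTCCGGAA"   # two CCGG sites, starting at positions 2 and 10
    methylated = frozenset({3})  # only the CpG of the first site is methylated
    print("HpaII:", cut_positions(seq, "CCGG", 1, methylated, cpg_methylation_sensitive=True))
    print("MspI: ", cut_positions(seq, "CCGG", 1, methylated, cpg_methylation_sensitive=False))
```

With only the first CpG methylated, the sensitive enzyme (HpaII) cuts once, at position 11, while the insensitive isoschizomer (MspI) cuts at both 3 and 11.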
The randomness and uniformity of restriction fragment length distributions can be tested for any combination of REs using in silico restriction digests (Figure 1), and RE cocktails with theoretically justified cutting parameters can then be selected for experiments. We recommend the DECIPHER R package, which is available from Bioconductor.9 Its DigestDNA function performs in silico restriction digestion of given DNA sequences and can be used to predict the expected DNA fragments. Issues related to CpG methylation can be addressed experimentally by using methylation-insensitive REs that also cleave methylated DNA. RNase H1 digestion of nucleic acid preparations can also be applied to remove RNA from DNA-RNA hybrids. Furthermore, short treatment of live cells with chromatin decompaction agents (e.g., HDAC inhibitors) may increase RE accessibility in experiments involving in situ RE fragmentation (e.g., Hi-C). Collectively, these recommendations can help identify RE cocktails and experimental conditions that yield proper DNA fragment size distributions and optimal resolution in genomic sequencing technologies.
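For readers working outside R, the idea behind an in silico digest can be sketched in a few lines of Python. This is not DECIPHER's implementation or API; it is a simplified stand-in that scans one strand (valid here because all five cocktail enzymes have palindromic recognition sites), ignores methylation and chromatin, and reports the fragment count, mean length and interquartile range summarized in Figure 1. The random test sequence is only a placeholder for a real chromosome.

```python
import random
import re
import statistics

# Recognition site and top-strand cut offset (position of "^") per enzyme.
# All five sites are palindromic, so one strand is enough to find every site.
COCKTAIL = {
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "BsrGI": ("TGTACA", 1),    # T^GTACA
    "XbaI": ("TCTAGA", 1),     # T^CTAGA
    "SspI": ("AATATT", 3),     # AAT^ATT (blunt end)
}


def digest(seq: str, enzymes: dict[str, tuple[str, int]]) -> list[int]:
    """Fragment lengths after cutting `seq` with all `enzymes` simultaneously."""
    seq = seq.upper()
    cuts = set()
    for site, offset in enzymes.values():
        for match in re.finditer(f"(?={site})", seq):
            cuts.add(match.start() + offset)
    # Fragment borders: both sequence ends plus every internal cut.
    borders = [0] + sorted(c for c in cuts if 0 < c < len(seq)) + [len(seq)]
    return [end - start for start, end in zip(borders, borders[1:])]


def summarize(lengths: list[int]) -> tuple[int, float, float]:
    """Number of fragments, mean fragment length and IQR (all in bp)."""
    if len(lengths) < 2:
        raise ValueError("need at least two fragments for quartiles")
    q1, _, q3 = statistics.quantiles(lengths, n=4)
    return len(lengths), statistics.mean(lengths), q3 - q1


if __name__ == "__main__":
    random.seed(1)
    genome = "".join(random.choices("ACGT", k=200_000))  # placeholder sequence
    for name, enzyme in COCKTAIL.items():
        n, mean, iqr = summarize(digest(genome, {name: enzyme}))
        print(f"{name:8s} n={n:4d} mean={mean:7.0f} IQR={iqr:7.0f}")
    n, mean, iqr = summarize(digest(genome, COCKTAIL))
    print(f"{'cocktail':8s} n={n:4d} mean={mean:7.0f} IQR={iqr:7.0f}")
```

On a uniform random sequence the single enzymes should give means in the vicinity of 4,096 bp and the cocktail a markedly shorter mean; on real chromosomes, wide IQRs and long tails point to the non-random site distribution discussed above.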
Figure 1.
Genome Fragmentation by In Silico Restriction Enzyme Digestion in Species That Were Analyzed by DRIP-seq or Hi-C
The absolute number of restriction fragments is shown in relation to the average fragment length (mean and interquartile range [IQR]) obtained with the indicated restriction enzymes, applied alone or in combination. "RE cocktail" denotes the combination of HindIII, EcoRI, BsrGI, XbaI, and SspI.
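Mean and IQR describe fragment length, but the second criterion from the main text, the randomness of cutting, can also be quantified. Under a random-cutting model, fragment lengths are approximately exponentially distributed with the observed mean, so one descriptive check (our suggestion, not a procedure from this study) is the Kolmogorov-Smirnov distance D between the observed lengths and that exponential curve. Because the mean is estimated from the same data, standard KS p-values do not apply; the sketch below is meant only for ranking enzyme combinations, with smaller D indicating more random-like cutting.

```python
import math
import random


def ks_distance_to_exponential(lengths: list[float]) -> float:
    """Kolmogorov-Smirnov distance between observed fragment lengths and an
    exponential distribution with the same mean (the random-cutting model)."""
    data = sorted(lengths)
    n = len(data)
    if n == 0:
        raise ValueError("no fragment lengths given")
    mean = sum(data) / n
    d = 0.0
    for i, x in enumerate(data):
        model_cdf = 1.0 - math.exp(-x / mean)
        # The empirical CDF steps from i/n to (i + 1)/n at x; check both sides.
        d = max(d, (i + 1) / n - model_cdf, model_cdf - i / n)
    return d


if __name__ == "__main__":
    random.seed(0)
    random_like = [random.expovariate(1 / 800) for _ in range(5000)]
    evenly_spaced = [800.0] * 5000  # perfectly regular cutting, far from random
    print(f"random-like:   D = {ks_distance_to_exponential(random_like):.3f}")
    print(f"evenly spaced: D = {ks_distance_to_exponential(evenly_spaced):.3f}")
```

The fragment lengths returned by any in silico digest can be passed directly to this function, so different RE combinations can be compared on the same chromosome.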
Acknowledgments
This work was supported by the Hungarian Academy of Sciences (Lendület programme, LP2015-9/2015) and grants from the National Research, Development and Innovation Office (NKFIH-ERC-HU-117670 and GINOP-2.3.2-15-2016-00024).
References
- 1. Ginno P.A., Lott P.L., Christensen H.C., Korf I., Chédin F. R-loop formation is a distinctive characteristic of unmethylated human CpG island promoters. Mol. Cell. 2012;45:814–825. doi: 10.1016/j.molcel.2012.01.017.
- 2. Halász L., Karányi Z., Boros-Oláh B., Kuik-Rózsa T., Sipos É., Nagy É., Mosolygó-L Á., Mázló A., Rajnavölgyi É., Halmos G., Székvölgyi L. RNA-DNA hybrid (R-loop) immunoprecipitation mapping: an analytical workflow to evaluate inherent biases. Genome Res. 2017;27:1063–1073. doi: 10.1101/gr.219394.116.
- 3. Lieberman-Aiden E., van Berkum N.L., Williams L., Imakaev M., Ragoczy T., Telling A., Amit I., Lajoie B.R., Sabo P.J., Dorschner M.O. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009;326:289–293. doi: 10.1126/science.1181369.
- 4. Meissner A., Gnirke A., Bell G.W., Ramsahoye B., Lander E.S., Jaenisch R. Reduced representation bisulfite sequencing for comparative high-resolution DNA methylation analysis. Nucleic Acids Res. 2005;33:5868–5877. doi: 10.1093/nar/gki901.
- 5. Miller M.R., Dunham J.P., Amores A., Cresko W.A., Johnson E.A. Rapid and cost-effective polymorphism identification and genotyping using restriction site associated DNA (RAD) markers. Genome Res. 2007;17:240–248. doi: 10.1101/gr.5681207.
- 6. Bystrykh L.V. A combinatorial approach to the restriction of a mouse genome. BMC Res. Notes. 2013;6:284. doi: 10.1186/1756-0500-6-284.
- 7. Hartono S.R., Korf I.F., Chédin F. GC skew is a conserved property of unmethylated CpG island promoters across vertebrates. Nucleic Acids Res. 2015;43:9729–9741. doi: 10.1093/nar/gkv811.
- 8. Sanz L.A., Hartono S.R., Lim Y.W., Steyaert S., Rajpurkar A., Ginno P.A., Xu X., Chédin F. Prevalent, Dynamic, and Conserved R-Loop Structures Associate with Specific Epigenomic Signatures in Mammals. Mol. Cell. 2016;63:167–178. doi: 10.1016/j.molcel.2016.05.032.
- 9. Wright E.S. Using DECIPHER v2.0 to analyze big biological sequence data in R. R J. 2016;8:352–359.

