Author manuscript; available in PMC: 2013 Mar 30.
Published in final edited form as: Mol Cell. 2012 Mar 1;45(6):814–825. doi: 10.1016/j.molcel.2012.01.017

R-loop formation is a distinctive characteristic of unmethylated human CpG island promoters

Paul A Ginno 1, Paul L Lott 2, Holly C Christensen 1, Ian Korf 1,2, Frédéric Chédin 1,2
PMCID: PMC3319272  NIHMSID: NIHMS362480  PMID: 22387027

Summary

CpG islands (CGIs) function as promoters for approximately 60% of human genes. Most of these elements remain protected from CpG methylation, a prevalent epigenetic modification associated with transcriptional silencing. Here, we report that methylation-resistant CGI promoters are characterized by significant strand asymmetry in the distribution of guanines and cytosines (GC skew) immediately downstream from their transcription start sites. Using innovative genomic methodologies, we show that transcription through regions of GC skew leads to the formation of long R-loop structures. Furthermore, we show that GC skew and R-loop formation potential are correlated with and predictive of the unmethylated state of CGIs. Finally, we provide evidence that R-loop formation protects against methylation by DNMT3B1, the primary de novo DNA methyltransferase in early development. Altogether, these results suggest that protection from DNA methylation is a built-in characteristic of the DNA sequence of CGI promoters that is revealed by the co-transcriptional formation of R-loop structures.

INTRODUCTION

In mammals, DNA methylation at CpG dinucleotides is a prevalent epigenetic modification, affecting 70–80% of all target sites (Lister et al., 2009). Methylation is distributed throughout the genome, though it is particularly abundant at repeated DNA elements, where it contributes to stable transcriptional silencing (Yoder et al., 1997). Due to the high rate of deamination of 5-methylcytosine to thymine, CpG-rich regions are scarce in mammalian genomes and predominantly correspond to unmethylated DNA segments called CpG islands (CGIs). Many CGIs reside at the 5′ ends of genes, where they function as promoter elements (Illingworth and Bird, 2009). Approximately 60% of all human genes, particularly ubiquitously expressed housekeeping genes, are transcribed from CGI promoters, making these loci critical functional elements in the human genome.

CGI promoter methylation is associated with heritable transcriptional silencing, as seen at hundreds of genes on the inactive X chromosome in females (Payer and Lee, 2008), at imprinted genes, and at genes expressed in a tissue-specific manner (Guibert et al., 2009). Aberrant methylation and silencing of CGI promoters are often observed in cancer (Jones and Baylin, 2002). Genome-level studies confirm that the large majority (82–94%, depending on the study) of promoter CGIs are unmethylated in normal tissues (Illingworth and Bird, 2009). However, the mechanism by which CGI promoters remain protected from this otherwise prevalent epigenetic modification is a major outstanding question. This question is particularly pertinent for early developmental stages, which are characterized by a strong, global wave of de novo methylation occurring concomitantly with initial differentiation events.

Multiple lines of evidence suggest that transcriptional activity is required for protecting CGI promoters from DNA methylation (Bird, 2002). For instance, a strong active promoter is required to maintain the unmethylated paternal allele of the imprinted Airn CGI in murine embryonic stem cells (ESCs) (Stricker et al., 2008). Likewise, impaired promoter function at the MAGEA1 and APRT genes leads to acquisition of DNA methylation (De Smet et al., 2004; Macleod et al., 1994). More globally, the presence of RNA polymerase II at CGIs is associated with resistance to DNA methylation (Takeshima et al., 2009), consistent with the finding that the location of a CGI relative to a TSS is a powerful predictor of its DNA methylation status (Straussman et al., 2009). A transcription-based model is further supported by observations that transcriptional silencing occurs prior to the onset of de novo methylation (Bird, 2002; Mutskov and Felsenfeld, 2004). In the well-studied case of X-inactivation, de novo DNA methylation is not required for the initiation of silencing (Payer and Lee, 2008). In the context of cancer, DNA methylation only occurs after a gene has become transcriptionally inactive (Bachman et al., 2003; Brock et al., 2007). Finally, recent human methylome data show that the level of protection against DNA methylation at promoter regions is directly correlated with transcriptional output (Laurent et al., 2010; Lister et al., 2009). Taken together, these findings imply that transcription initiation at CGI promoters is crucial for resisting DNA methylation. The exact mechanism by which transcription confers this protection remains unknown.

DNA sequence features may also be correlated with the methylation status of CGIs. Importantly, GC-content, CGI length, and CpG density are not accurate predictors (Feltus et al., 2003; Straussman et al., 2009). However, some sequence motifs, particularly degenerate G-rich sequences, have been associated with unmethylated CGIs (Bock et al., 2006; Straussman et al., 2009). This suggests that the protection mechanism operating at CGIs might require particular DNA sequence arrangements.

Here we present computational and experimental data that support a model for the protection of promoter CGIs against DNA methylation. Namely, we show that the majority of unmethylated CGI promoters in the human genome display significant strand asymmetry in the distribution of guanines and cytosines, a property known as GC skew. This property, in turn, confers upon transcription the ability to form long, stable, three-stranded nucleic acid structures called R-loops. R-loop structures form in cis when the newly transcribed G-rich RNA strand reanneals to the template C-rich DNA strand, forcing the non-template G-rich DNA strand into a largely single-stranded (ss) conformation (Figure 1A). The driving force behind R-loop formation is the superior thermodynamic stability of a G-rich RNA bound to a C-rich DNA template relative to the corresponding DNA duplex (Ratmeyer et al., 1994; Roberts and Crothers, 1992). Such R-loop structures have long been reported in organisms ranging from bacteria and bacteriophages to yeast and mammals (Li and Manley, 2006). Our data further suggest that R-loop formation at CGIs can prevent methylation of the underlying DNA sequence.

Figure 1. Co-oriented positive GC skew is a common property of strong human CGI promoters.


A Transcription through regions of GC skew such that a G-rich RNA is generated can lead to R-loop formation (top). In contrast, transcription through the same region such that a C-rich RNA is produced does not lead to R-loop formation (bottom). The G-rich and C-rich strands are color-coded in red and blue, respectively; open lollipops represent unmethylated CpG sites. Note that transcription through regions devoid of GC skew does not give rise to R-loop formation either. B Percent of human genes (RefSeq) showing overlap with GC skew at their 5′ (−500 to +1500 relative to the beginning of the gene) or 3′ (−500 to +1500 relative to the end of the gene) extremities, as determined by the SkewR algorithm. The overlap was calculated for each chromosome and the average and standard deviation for the genome are shown. C Metagene analysis of the 7,820 genes showing positive GC skew co-oriented with transcription. All genes were oriented from left to right (as denoted by the arrow above) and aligned at their TSSs. The graph shows the aggregate GC%, CpG obs/exp ratio, and GC skew values calculated for a 50 nucleotide sliding window. The grey shaded area highlights the portion of the region corresponding to a CpG island (GC%>50% and CpG o/e>0.60).

RESULTS

GC skew is a common characteristic of human CGI promoters

To determine whether human CGI promoters carry sequence signatures for R-loop formation, we developed a Hidden Markov Model-based algorithm called “SkewR” to identify GC-rich regions displaying strand asymmetry in the distribution of G and C residues. In brief, SkewR scans the entire human genome and assigns each nucleotide to one of four distinct states representing either the average sequence composition (Genomic state), a GC-rich state devoid of GC skew (GC state), or two skewed states (G-skew or C-skew depending on whether guanines or cytosines are enriched on the particular strand being analyzed; see Supplementary Text and Figure S1 for details). Under the most stringent parameters, 16,694 non-repetitive skewed loci were detected in the human genome (see Table S1 for a list), 75% of which mapped to genic regions. Furthermore, regions showing GC skew displayed a striking preference for the 5′ ends of genes: 45.9% of all human genes (based on RefSeq annotation), representing over 10,400 loci, showed GC skew at their core promoters (defined here as −500 to +1500 base pairs (bp) around the transcription start site; Figure 1B). Importantly, 97% of all GC-skewed promoter regions corresponded to CGIs, suggesting that GC skew is a common characteristic of this class of elements. In contrast, only 3.7% of genes showed similar skew at their 3′ end. Likewise, gene bodies showed negligible propensity towards GC skew (data not shown). We conclude from this analysis that GC skew is strongly enriched at CGI promoters.
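To make the four-state segmentation concrete, the following is a minimal sketch of Viterbi decoding over the same four states (Genomic, GC, G-skew, C-skew). It is not the published SkewR implementation: the emission and transition probabilities are hypothetical placeholder values chosen only for illustration, the input is assumed to be an A/C/G/T string, and only one strand is scanned (a G-skew block on one strand is a C-skew block on the other).

```python
import math

# The four state names follow the text; the probabilities are HYPOTHETICAL
# illustrative values, not the trained SkewR parameters.
STATES = ("Genomic", "GC", "G-skew", "C-skew")
EMIT = {
    "Genomic": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
    "GC":      {"A": 0.20, "C": 0.30, "G": 0.30, "T": 0.20},
    "G-skew":  {"A": 0.15, "C": 0.20, "G": 0.50, "T": 0.15},
    "C-skew":  {"A": 0.15, "C": 0.50, "G": 0.20, "T": 0.15},
}


def viterbi(seq, stay=0.999):
    """Most likely state for each nucleotide of an A/C/G/T string (one strand)."""
    seq = seq.upper()
    if not seq:
        return []
    k = len(STATES)
    switch = (1.0 - stay) / (k - 1)
    log_trans = [[math.log(stay if i == j else switch) for j in range(k)] for i in range(k)]
    log_emit = [{b: math.log(p) for b, p in EMIT[s].items()} for s in STATES]

    # Uniform start distribution, then dynamic programming in log space.
    scores = [math.log(1.0 / k) + log_emit[j][seq[0]] for j in range(k)]
    backptrs = []
    for base in seq[1:]:
        prev = scores
        ptrs = [max(range(k), key=lambda i, j=j: prev[i] + log_trans[i][j]) for j in range(k)]
        scores = [prev[ptrs[j]] + log_trans[ptrs[j]][j] + log_emit[j][base] for j in range(k)]
        backptrs.append(ptrs)

    # Trace back from the best-scoring final state.
    state = max(range(k), key=lambda j: scores[j])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]


def segments(labels):
    """Collapse per-base labels into (start, end, state) runs; end is exclusive."""
    runs, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            runs.append((start, i, labels[start]))
            start = i
    return runs
```

With these toy parameters, an AT-rich flank, a G-rich stretch such as `"GGGAGGCGGG" * 20`, and a second AT-rich flank decode into three runs (Genomic, G-skew, Genomic). A genome-scale tool would additionally need trained parameters, handling of N bases, and filtering of short blocks.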

Positive GC skew downstream of transcription is a hallmark of strong CGI promoters

In 75% of cases, promoters with GC skew (7,820 loci; Table S1) were oriented such that the non-template strand for transcription showed an excess of G over C residues (positive GC skew). Transcription through these regions could in principle give rise to R-loops, provided the GC-skewed region is located downstream of the TSS. To test this, we aligned all corresponding genes at their transcription start sites (TSSs) and recovered 2 kilobases (kb) of DNA sequence on each side of the TSS. This dataset was then used to compute the GC%, CpG density (measured as the CpG observed versus expected ratio, or CpG o/e), and GC skew over a 50 nucleotide sliding window. This analysis revealed that promoters with positive GC skew define a set of highly GC-rich and CpG-dense CGIs (Figure 1C). The GC% rose nearly symmetrically around the TSS, while the CpG o/e ratio rose close to 1 just upstream of the TSSs, reflecting a strong clustering of CpG sites at this position. Using the “standard” criteria for CpG islands (Gardiner-Garden and Frommer, 1987), the CGIs defined by this promoter set showed an average footprint of nearly 1,300 bp. In contrast to GC% and CpG density, GC skew rose in a markedly asymmetric manner around the TSS (Figure 1C). Prior to the TSS, the GC skew hovered around zero, reflecting the absence of any significant GC strand bias. About 200 bp upstream of the TSS, however, the GC skew rose abruptly and reached two maxima around 30 and 250 bp downstream of the TSS. The positive GC skew then gradually decreased with distance but could still be clearly detected 4 kb downstream (Figure 1C and data not shown). Altogether, this analysis revealed that (i) GC skew is a hallmark of a large set of strong CGI promoters; (ii) the TSS of these promoters is characterized by an abrupt transition in GC skew; and (iii) GC skew imposes an intrinsic asymmetry on these loci such that positive GC skew downstream of the TSS is co-oriented with the transcribed gene.
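The three per-window statistics used in this metagene analysis can be sketched as follows. GC skew is taken as (G − C)/(G + C) on the non-template strand, and the CpG o/e ratio as (CpG count × window length)/(C count × G count), following the Gardiner-Garden and Frommer definition. The function names, the 1-nt step, and the assumption that inputs are equal-length, TSS-aligned sequences are illustrative choices, not details of the authors' pipeline.

```python
def window_stats(window):
    """Return (GC fraction, CpG obs/exp, GC skew) for one window of the non-template strand."""
    w = window.upper()
    n = len(w)
    g, c = w.count("G"), w.count("C")
    cpg = w.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_frac = (g + c) / n
    cpg_oe = cpg * n / (c * g) if c and g else 0.0
    skew = (g - c) / (g + c) if g + c else 0.0
    return gc_frac, cpg_oe, skew


def looks_like_cgi(gc_frac, cpg_oe):
    """Thresholds quoted in the Figure 1C legend: GC% > 50% and CpG o/e > 0.60."""
    return gc_frac > 0.5 and cpg_oe > 0.6


def sliding_profile(seq, size=50):
    """Statistics for every 1-nt step of a `size`-nt sliding window."""
    return [window_stats(seq[i:i + size]) for i in range(len(seq) - size + 1)]


def metagene(seqs, size=50):
    """Average the sliding-window profiles of equal-length, TSS-aligned sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must be aligned windows of equal length")
    profiles = [sliding_profile(s, size) for s in seqs]
    n = len(profiles)
    return [tuple(sum(p[i][k] for p in profiles) / n for k in range(3))
            for i in range(len(profiles[0]))]
```

For example, `window_stats("GGGGGGGGCC")` gives a GC fraction of 1.0, a CpG o/e of 0.0, and a skew of 0.6. Feeding 4-kb windows centred on each TSS into `metagene` would give curves of the kind plotted in Figure 1C, with window `i` covering positions `i` to `i + size - 1` of the aligned region.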

Not surprisingly, the genes associated with these CGI promoters showed strong enrichment in gene ontology categories corresponding mostly to “housekeeping” genes (Table S1). Likewise, a clear majority of these genes was characterized by high expression levels and broad tissue-specificity (data not shown).

Formation of long, stable R-loops at the endogenous human SNRPN CGI

To detect R-loops at endogenous loci, we used a previously described R-loop footprinting method (Yu et al., 2003). This procedure combines non-denaturing bisulfite treatment with ribonuclease H (RNase H) digestion to search for RNA:DNA hybrid-dependent ssDNA footprints in native genomic DNA samples. Because non-denaturing bisulfite preferentially deaminates unpaired cytosines, the displaced strand of an R-loop appears as a tract of C-to-T conversions after PCR.

The imprinted SNRPN CGI was the first region chosen for this analysis. This CGI overlaps with the Prader-Willi syndrome Imprinting Center and serves to maintain the paternal allele in an unmethylated state (El-Maarri et al., 2001); it is also highly GC-skewed (Figure 2A,B and Figure S2A). In vitro transcription through cloned SNRPN CGI fragments in the physiological orientation led to efficient R-loop formation, as judged from topological and antibody gel mobility shift assays and from primer extension assays (Figure S2 and data not shown). To assess R-loop formation at the endogenous SNRPN locus, we extracted genomic DNA from undifferentiated human H1 ESCs and pluripotent Ntera2 cells and subjected it to non-denaturing bisulfite treatment. The treated DNA was then PCR-amplified with primer pairs targeting the 5′ and 3′ boundaries of R-loop structures, cloned, and sequenced. This yielded a series of independent DNA molecules with extensive stretches of C-to-T conversion, indicative of the presence of ssDNA (Figure 2C). Importantly, treatment of the genomic DNA with RNase H (which specifically degrades RNA when base-paired to DNA) prior to bisulfite conversion abolished our ability to recover any SNRPN amplicon despite numerous attempts (amplification of the same region with non-converted primers was highly efficient; data not shown). Thus, our ability to detect single-strandedness on the non-template strand was dependent upon RNA:DNA hybrid formation. The single-stranded footprints observed on individual DNA molecules varied in size from ~150 to over 600 bp of contiguous C-to-T conversion. Their 5′ ends were distributed over a ~300 bp initiation region coinciding with a sharp rise in GC skew (Figure 2B,C). At the 3′ end, the ssDNA footprints terminated within a ~150 bp region over which GC skew gradually returned to the genomic average (Figure 2B,C). The maximal span of these R-loop footprints was ~670 bp, closely matching the GC-skewed area.

Long ssDNA footprints were also detected at the SNRPN CGI in DNA extracted from human post-mortem brain tissue (Brodmann area 9) and whole blood. However, only a minority (~10%) of the molecules recovered from these two differentiated tissues corresponded to long R-loop structures; most showed only background conversion due to spontaneous DNA breathing (Figure S3). In contrast, every molecule recovered from human ESCs carried a long R-loop footprint. This raises the possibility that pluripotent cells may be particularly adept at either forming or preserving R-loops compared with differentiated cell types.
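The footprints described above are read from bisulfite-sequenced copies of the non-template strand: a reference C sequenced as T is scored as converted (single-stranded), and a C sequenced as C as protected. A minimal sketch of calling footprints as maximal runs of consecutive converted cytosines follows. It assumes a gap-free alignment of read to reference. By default it skips CpG cytosines, which can stay unconverted because of methylation rather than base pairing, and it uses a hypothetical minimum of five consecutive converted Cs. None of these choices is specified in the text.

```python
def conversion_calls(ref, read, skip_cpg=True):
    """(position, converted) for each informative reference C; assumes gap-free alignment."""
    calls = []
    for i, (r, b) in enumerate(zip(ref.upper(), read.upper())):
        if r != "C":
            continue
        # Assumption: CpG Cs are ignored because methylation, not pairing, may protect them.
        if skip_cpg and ref[i + 1:i + 2].upper() == "G":
            continue
        if b in ("C", "T"):
            calls.append((i, b == "T"))
    return calls


def footprints(ref, read, min_cs=5):
    """Maximal runs of consecutive converted Cs as (first_pos, last_pos, n_cs), inclusive."""
    runs, current = [], []
    for pos, converted in conversion_calls(ref, read):
        if converted:
            current.append(pos)
            continue
        # An unconverted C closes the current run.
        if len(current) >= min_cs:
            runs.append((current[0], current[-1], len(current)))
        current = []
    if len(current) >= min_cs:
        runs.append((current[0], current[-1], len(current)))
    return runs
```

The footprint length in base pairs is `last_pos - first_pos + 1`. Because a single protected cytosine splits a tract here, a tolerance for isolated unconverted Cs might be preferable in practice. That is a further design choice, not something stated in the source.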

Figure 2. Formation of genomic R-loops at the endogenous SNRPN CGI promoter.


A Schematic representation of the 4.5 kb region surrounding the SNRPN CGI (green). The major transcription start site and first untranslated exon are indicated by a broken arrow and blue box, respectively. The CpG o/e ratio, GC percent, and GC skew are indicated in UCSC Genome Browser dense track format; strong SkewR blocks are indicated by black boxes. The analyzed region is indicated by dashed lines and expanded in the panel below. B The GC skew over the analyzed region is plotted using a 100 nucleotide sliding window. The solid line represents the average genomic GC skew with the standard deviation shown as dotted lines. C This panel depicts the distribution of ssDNA footprints over the analyzed region in a stack format. This was generated from the analysis of 21 individual DNA molecules recovered from H1 ESCs, Ntera2 cells, blood, and brain. Vertical tick marks indicate when a given cytosine on the non-template DNA strand was sequenced as thymine, indicative of a single-stranded conformation. Green tick marks indicate converted cytosines in CpG dinucleotides. The position of all cytosines along the region is indicated on the line at the bottom of the stack (All Cs). The gray shaded area highlights the span of the longest ssDNA footprints. The primers used to generate the sequenced amplicons are indicated at the bottom (red primers corresponding to “converted” primers matching bisulfite-modified DNA).

Since the SNRPN CGI is imprinted in hESCs, we sought to determine whether the R-loop tracts preferentially associated with the transcribed paternal allele. The two parental alleles could be distinguished by an informative single nucleotide polymorphism (SNP) in the H1 hESC line (rs12916854, A > G). In all cases, R-loops were associated with the A allele at rs12916854. We infer that this allele corresponds to the paternal chromosome since ~96% of the cytosines coinciding with CpG sites (Figure 2C, green ticks) within the bisulfite-converted R-loop tracts were unmethylated. This indicates that R-loops form in cis upon transcription of the unmethylated paternal SNRPN CGI.
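The ~96% figure pools CpG cytosines across all R-loop tracts. Because bisulfite converts only unmethylated cytosines, a CpG cytosine read as T inside a footprint is scored as unmethylated. The following sketch pools those calls across molecules and groups molecules by their base at an informative SNP. The `(read, start, end)` tuple layout and the function names are hypothetical. The SNP is assumed not to be a C/T variant on the analyzed strand, because bisulfite conversion would blur such a variant (rs12916854 is A > G).

```python
from collections import defaultdict


def pooled_cpg_unmethylated(ref, molecules):
    """Fraction of CpG Cs read as T within each molecule's footprint (inclusive coordinates)."""
    converted = total = 0
    for read, start, end in molecules:
        for i in range(start, end + 1):
            if ref[i] == "C" and ref[i + 1:i + 2] == "G" and read[i] in ("C", "T"):
                total += 1
                converted += int(read[i] == "T")  # T = unmethylated after bisulfite
    return converted / total if total else float("nan")


def group_by_allele(molecules, snp_pos):
    """Bucket molecules by the base they carry at the SNP position."""
    groups = defaultdict(list)
    for read, start, end in molecules:
        groups[read[snp_pos]].append((read, start, end))
    return dict(groups)
```

Grouping first and then calling `pooled_cpg_unmethylated` on each bucket yields per-allele methylation estimates, which is the kind of comparison used to link the R-loop-bearing allele to the unmethylated paternal chromosome.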

R-loop formation at the endogenous human APOE promoter

To confirm R-loop formation at a non-imprinted locus, we selected the promoter of the human APOE gene, which was identified by SkewR (Figure 3A,B). Although this promoter shows only intermediate CpG density, it belongs to a class of unmethylated CpG island-like regions (UMRs) (Straussman et al., 2009). As expected, non-denaturing bisulfite footprinting revealed long ssDNA tracts on the non-template G-rich strand extending up to 650 bp (Figure 3C). Pre-treatment of the genomic DNA with RNase H abolished our ability to amplify the region, indicating that the ssDNA footprints were dependent on RNA:DNA hybrid formation. As was the case at SNRPN, the large majority (~94%) of CpG sites within the R-loop footprints were unmethylated. Using an informative SNP (rs440446), we confirmed that R-loop formation at APOE occurs on both parental alleles, consistent with bi-allelic expression.

Figure 3. R-loop formation at the endogenous APOE promoter.


All symbols are as described in Figure 2. DNA was recovered from human H1 ESCs and Ntera2 cells and ssDNA footprints determined from 12 independent DNA molecules.

Transcription induces R-loops at the murine Airn CGI

To further establish R-loop formation at CGIs, we focused on the well-studied murine Airn locus. The Airn CGI shows oriented GC skew (Figure 4A,B) and undergoes efficient R-loop formation upon in vitro transcription (Figure S4). In vivo, Airn expression is weak in undifferentiated mouse ESCs and is induced only upon differentiation (Latos et al., 2009). We extracted genomic DNA from undifferentiated and differentiated E14 mESCs and assayed R-loop formation as described above. In undifferentiated cells, PCR was inefficient and only two independent molecules could be recovered (Figure 4C). One carried a short (~170 bp) ssDNA footprint, while the other showed only sporadic background conversion. In contrast, multiple independent molecules with long ssDNA footprints were readily recovered from differentiated cells. These footprints reached up to 600 bp in length and initiated with the rise of GC skew. In this particular case, four independent molecules were recovered after pre-treatment of the genomic DNA from differentiated cells with RNase H. None of these molecules carried C-to-T conversion tracts, indicating that the single-strandedness detected on the Airn non-template strand was dependent on the formation of an RNA:DNA hybrid.

Figure 4. Formation of genomic R-loops at the endogenous mouse Airn CGI promoter.


A Schematic description of the 4.5 kb region surrounding the Airn promoter. B GC skew over the analyzed region. C Analysis of R-loop-derived ssDNA footprints. Symbols are as in Figure 2 except each line corresponds to one individual DNA molecule. DNA was analyzed from mESCs differentiated along a neural path by addition of retinoic acid (−LIF+RA, high Airn expression) with (+H) or without (−H) RNase H pre-treatment prior to bisulfite footprinting. Amplicons were also recovered from undifferentiated mESCs (+LIF, little to no Airn expression). Brackets indicate regions that underwent short deletions due to instability of the DNA sequence in E. coli.

Genomic profiling methods reveal widespread R-loop formation at CpG island promoters

To further demonstrate that R-loop structures form broadly in the human genome, we used the monoclonal S9.6 antibody. This antibody recognizes RNA:DNA hybrids in a sequence-independent manner (Boguslawski et al., 1986; Hu et al., 2006) and binds strongly to R-loops (see below). S9.6 immunocytochemistry staining patterns on H1 hESCs showed an extensive nuclear localization characterized by thousands of small spots distributed throughout the nucleoplasm (Figure 5A). Bright staining in DAPI-poor nucleolar regions was also clearly evident, compatible with R-loop formation at the highly transcribed and inherently R-loop-prone ribosomal DNA arrays (El Hage et al., 2010). Finally, extra-nuclear staining was consistently observed in the immediate nuclear periphery. This signal coincides with mitochondrial staining (data not shown) and is likely to result from mitochondrial replication (Brown et al., 2008). Pre-treatment of fixed cells with RNase H decreased the S9.6 signal to background levels (data not shown), indicating that S9.6 localization reflects the presence of endogenous RNA:DNA hybrids. Importantly, a transiently expressed HA-tagged human RNase H1 protein lacking its mitochondrial localization signal showed a nuclear distribution nearly identical to that of S9.6 (Figure 5A, bottom right). This suggests that S9.6 and RNASEH1 recognize an abundance of endogenous RNA:DNA hybrid targets in the human genome.

Figure 5. Widespread R-loop formation at human promoters.


A H1 hESCs were stained with the S9.6 antibody (top left) and counterstained with DAPI (top right). A merge of both channels is shown (bottom left). Nucleolar and mitochondrial staining are indicated by arrows while the boxed inset shows a magnified view of the nucleoplasm. The bottom right panel shows the cellular distribution of the HA-tagged human RNASEH1(ΔMLS) protein (red) upon transfection in HEK293 cells. Cells were counterstained with DAPI (blue). B The aggregate GC skew for all newly identified R-loop forming promoters is graphed in red. All genes were aligned at their TSS and the GC skew computed using a 50 nt sliding window over the −500/+1500 region. The overall GC skew predicted for all 7,820 highly skewed promoters (Figure 1C) is shown in green for comparison. C and D Examples of DRIP-seq data. Each panel shows a schematic description of the region analyzed with TSSs indicated by broken arrows (minor TSSs are shown by dashed lines), exons by blue boxes, CpG islands by green boxes. The SkewR track shows the position of GC skew blocks, with red indicating G-rich blocks and blue C-rich blocks. The RE track indicates cut sites for the 5 restriction enzymes used to fragment the genome. Below are two tracks representing DRIP-seq read density in the absence (top) or presence (bottom) of RNase H pre-treatment. The red box indicates restriction fragments that coincide with RNase H-sensitive DRIP-seq peaks and therefore harbor R-loop forming regions. The coordinates of each region analyzed (Hg19) are given. The two regions shown here also correspond to DRIVE-seq peaks (data not shown).

To characterize these targets globally, we developed two independent genome-wide R-loop profiling methods. The first, which we termed DRIP (DNA:RNA ImmunoPrecipitation), relies on the intrinsic specificity of the S9.6 antibody for R-loop molecules. The second method, which we termed DRIVE (DNA:RNA in vitro Enrichment), uses a catalytically deficient but binding-competent human RNASEH1 mutant protein in affinity pulldown assays. Both methods enable the specific and near-quantitative recovery of R-loop molecules from complex nucleic acid mixtures (Figure S5A and B) and lend themselves to genome-wide R-loop characterization. For this, genomic DNA was first gently extracted from human pluripotent Ntera2 cells and fragmented using a cocktail of restriction enzymes. The samples were then incubated with S9.6 or the MBP-RNASEH1 protein (see Experimental Procedures for details), and regions of RNA:DNA hybrids were recovered after a series of washes and elutions. In parallel, the same procedure was carried out on genomic DNA that had been pre-treated with RNase H. After pulldown or immunoprecipitation, enrichment was assayed by qPCR at the SNRPN and APOE promoters (Figure S5C). The recovered material was ligated to barcoded Illumina adaptors and used to build sequencing libraries for high-throughput DNA sequencing.

In total, DRIP-seq and DRIVE-seq identified 20,862 and 1,224 peaks, respectively. In all, 1,972 unique genes carried RNase H-sensitive DRIP or DRIVE peaks overlapping their core promoter region (−500/+1500 around the TSS), representing a highly significant enrichment. Importantly, DRIP and DRIVE showed a strong overlap at these promoter regions (Figure S5D). Gene ontology analysis revealed that this RNA:DNA hybrid-forming gene set was enriched for functional categories representing “housekeeping” functions such as cellular metabolic processes, translational elongation, and gene expression. Consistent with our computational analysis, 84% of these promoters corresponded to CpG islands, and these loci showed pronounced GC skew downstream of the TSS (Figure 5B). Within the CGI class, 65% of the RNA:DNA hybrid-forming regions identified here mapped onto highly skewed CGIs previously identified by SkewR. The other 19% mapped onto a weaker class of CGI promoters devoid of a strong SkewR block (see below). Only 2.1% of the newly identified promoters corresponded to GC-poor, CpG-poor promoters devoid of GC skew, reinforcing the notion that GC skew is an accurate predictor of R-loop formation. Taken together, these data strongly support the notion that R-loop formation is a widespread characteristic of thousands of human promoters, mostly CGIs. Two representative examples of such loci are provided (Figure 5C, D).
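Assigning peaks to genes requires a strand-aware version of the −500/+1500 core-promoter window, because "upstream" lies at higher coordinates for minus-strand genes. The following sketch assumes 0-based TSS coordinates, half-open intervals, and peaks that have already been filtered for RNase H sensitivity. The helper names are hypothetical.

```python
from collections import defaultdict


def core_promoter(tss, strand, up=500, down=1500):
    """Half-open [start, end) covering offsets -up .. +(down - 1) around a 0-based TSS."""
    if strand == "+":
        return max(0, tss - up), tss + down
    # Minus strand: downstream runs toward lower coordinates.
    return max(0, tss - down + 1), tss + up + 1


def genes_with_peaks(genes, peaks):
    """genes: {name: (chrom, tss, strand)}; peaks: [(chrom, start, end)], half-open."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    hits = set()
    for name, (chrom, tss, strand) in genes.items():
        p_start, p_end = core_promoter(tss, strand)
        # Two half-open intervals overlap when each starts before the other ends.
        if any(s < p_end and p_start < e for s, e in by_chrom.get(chrom, [])):
            hits.add(name)
    return hits
```

A linear scan per gene keeps the sketch short. At genome scale, sorted peaks or an interval tree would be the usual choice.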

GC skew is predictive of the epigenetic state of CGIs

We next wished to test whether R-loop formation and the ability of CGI promoters to remain protected from DNA methylation are interrelated. We first asked whether the DNA methylation status of CGIs was reflected by their potential for R-loop formation, as measured by GC skew. To this end, we compared promoter CGIs, which tend to resist DNA methylation, with gene body CGIs, which are often methylated. Significantly, 65% of the promoter CGI class (n=13,636) overlapped with strong SkewR blocks, while only 16.3% of the gene body CGI class (n=4,598) did (Figure 6A), indicating that these two classes of CGIs can be discriminated by their GC skew. We next asked whether CGIs showing distinct epigenetic states would also show distinct GC skew profiles. For this, we used published data (Straussman et al., 2009) to derive two sets of ~1,600 CGIs each, corresponding to the least and the most methylated CGIs in two hESC lines. Strikingly, 67.4% of unmethylated CGIs showed overlap with strong SkewR blocks, while only 9.4% of methylated CGIs showed similar overlap (Figure 6B). We conclude that unmethylated CGI promoters are highly associated with strong GC skew, and therefore with significant R-loop formation potential.
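According to the Figure 6 legend, these percentages are computed per chromosome and then summarized as a mean and standard deviation across the genome. The following sketch computes, for one CGI class, the percentage of CGIs that overlap at least one SkewR block on each chromosome, and then summarizes across chromosomes. The interval layout (chromosome, start, end; half-open) and the use of the sample standard deviation (`statistics.stdev`) are assumptions.

```python
import statistics
from collections import defaultdict


def percent_overlapping(cgis, blocks):
    """Per-chromosome % of CGIs (half-open intervals) overlapping at least one block."""
    blocks_by_chrom = defaultdict(list)
    for chrom, start, end in blocks:
        blocks_by_chrom[chrom].append((start, end))
    counts = defaultdict(lambda: [0, 0])  # chrom -> [overlapping, total]
    for chrom, start, end in cgis:
        counts[chrom][1] += 1
        if any(s < end and start < e for s, e in blocks_by_chrom[chrom]):
            counts[chrom][0] += 1
    return {chrom: 100 * hit / total for chrom, (hit, total) in counts.items()}


def summarize(per_chrom):
    """Mean and sample standard deviation across chromosomes (needs >= 2 chromosomes)."""
    values = list(per_chrom.values())
    return statistics.mean(values), statistics.stdev(values)
```

Running the two functions separately on the promoter and gene body CGI sets, or on the unmethylated and methylated sets, gives paired summaries of the kind plotted in Figure 6A and B.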

Figure 6. R-loop formation potential is predictive of the unmethylated status of CGI promoters.


A Percent of promoter CGIs (n=13,636) and gene body CGIs (n=4,598) with GC skew overlap. B Percent of unmethylated (n=1,785) and methylated (n=1,594) CGIs showing GC skew overlap (methylation data from the hESC dataset of Straussman et al., 2009). Unmethylated CGIs showed a methylation score <−0.8, while methylated CGIs had a score >1.3, in both hESC lines. In both panels, the overlap was calculated for each chromosome and the average and standard deviation for the genome are shown. C Metagene analysis of a subset of 4,528 promoter CGIs lacking strong GC skew. Symbols and analysis are as described for Figure 1C. D Aggregate CpG methylation levels around the TSS of genes corresponding respectively to strong CGIs characterized by high GC skew (n=7,820, red), weak CGIs characterized by intermediate GC skew (n=4,526, orange), and “CpG-poor” promoters characterized by little to no skew (n=7,570, green). DNA methylation data were from Laurent et al. (2010).

Lastly, we investigated whether the strength of GC skew was correlated with the degree of protection against DNA methylation. For this, we focused on three classes of promoters distinguished by their GC skew. The first corresponded to the group of strong, highly skewed, promoters described in Figure 1C (n=7,820). The second class corresponded to promoter CGIs lacking significant GC skew overlap (the “missing” promoter CGIs from Figure 6A; n=4,526). This second class defines a set of weaker CGIs characterized by lower CpG densities and GC content, and intermediate levels of GC skew (Figure 6C). The third class corresponded to GC-poor and CpG-poor promoters (n=7,570) characterized by minimal shifts in GC skew around the TSS compared to both CGI groups (data not shown). Analysis of human methylome data (Laurent et al., 2010) revealed that highly skewed CGIs showed the most significant reduction of DNA methylation around the TSS both in terms of absolute levels and size of protected region (Figure 6D). Weaker CGIs with intermediate GC skew, while still protected, showed an overall reduced level of protection compared to the previous group. Finally, “CpG-poor” promoters showed little to no protection against DNA methylation. We also observed a similar correlation between GC skew and H3K4me3 signal (Figure S6A), a hallmark of CGI promoters thought to protect against DNA methylation (Ooi et al., 2007). The strength of GC skew, or R-loop formation potential, is therefore predictive of the DNA methylation status at human promoters.

R-loop formation on episomal templates can protect against de novo DNA methylation

To directly test the hypothesis that R-loop formation protects the underlying DNA sequence from the activity of DNA methyltransferases (DNMTs), we used a well-described episomal system (Chedin et al., 2002) in which the R-loop forming portion of the human SNRPN CGI was cloned in both orientations under the control of a constitutive viral promoter. As expected, episomal R-loop formation occurred only in the orientation that generates a G-rich RNA (Figure S7A). These episomes were transfected into HEK293c18 cells together with expression vectors for DNMT3B1, the main de novo DNMT expressed during early development (Takahashi et al., 2007), and/or for the DNMT3L stimulatory factor (Chedin et al., 2002). The resulting episomal methylation was then measured using methyl-sensitive restriction enzymes and Southern blotting. Methylation of the non-R-loop forming episome was readily observed upon DNMT3B1 expression, and clear stimulation was achieved upon co-expression with DNMT3L, as expected (Figure 7A; left). This indicates that constitutive transcription in the non-permissive orientation for R-loop formation is not sufficient to protect from DNA methylation. In contrast, methylation of the R-loop forming episome appeared reduced even in the presence of DNMT3L (Figure 7A; right). We observed similar behavior for an episome containing an R-loop forming portion of the mAirn CGI (Figure 7B and Figure S7B). In both cases, the reduction in DNA methylation was particularly visible for high molecular weight species corresponding to highly methylated molecules. Importantly, deletion of the constitutive promoter transcribing the SNRPN region restored the ability of DNMT3B1 to methylate the CGI irrespective of orientation (Figure 7C). We confirmed these observations through careful quantification of Southern blot data on three independent replicates.
This analysis showed that, on average, highly methylated molecules were reduced 2.7-fold between the R-loop forming and the non-R-loop forming orientations of the SNRPN fragment (Figure 7D). In contrast, promoter deletion abolished this difference. Interestingly, no significant difference was observed when the active DNMT3A2 methyltransferase was used instead of DNMT3B1, suggesting that the two active de novo enzymes do not respond to R-loop formation in the same manner (Figure 7D). Finally, bisulfite sequencing confirmed a statistically significant 2-fold reduction in DNA methylation efficiency upon R-loop formation. This analysis focused on a 536 bp region that maps to a highly GC-skewed portion of the SNRPN G-rich strand and encompasses 20 CpG sites (Figure 7E). By contrast, a 532 bp untranscribed and non-skewed region encompassing 48 CpG sites, located 3.5 kb downstream of the SNRPN and Airn regions, was equally methylated regardless of the orientation of the R-loop forming region (Figure S7C). Altogether, these data indicate that transcription of GC-skewed, CpG-rich regions in an orientation compatible with R-loop formation confers at least partial protection from DNMT3B1-mediated DNA methylation to the underlying DNA sequence.
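The densitometry summary in Figure 7D (a fold reduction per sample, reported as mean and standard error over three experiments) can be sketched as follows. This is a minimal illustration under an assumption the text does not state: a ratio is computed within each paired replicate (non-R-loop signal divided by R-loop signal) and only then averaged. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def fold_reductions(non_rloop, rloop):
    """Per-replicate fold reduction: non-R-loop signal / R-loop signal.

    Each list holds one densitometry value per replicate (e.g. the
    high-molecular-weight, highly methylated species), paired by index.
    """
    if len(non_rloop) != len(rloop):
        raise ValueError("replicates must be paired")
    return [n / r for n, r in zip(non_rloop, rloop)]

def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n)); needs n >= 2."""
    return mean(values), stdev(values) / sqrt(len(values))
```

A fold reduction of 1 means the two orientations were methylated equally, which is the pattern expected after promoter deletion. Averaging the per-replicate ratios keeps each experiment's internal pairing. Taking the ratio of averaged signals would be an equally defensible alternative.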

Figure 7. R-loop-mediated protection from DNA methylation.

The ability of DNMT3B1 to methylate the SNRPN (A) and Airn (B) CGIs in the presence or absence of the stimulatory factor DNMT3L is shown. CGIs were cloned in episomes in an R-loop forming or non-R-loop forming orientation as graphically indicated above. The episomes were harvested 7 days post-transfection, and methylation was analyzed after cleavage by the methyl-sensitive HpaII enzyme, gel electrophoresis and Southern blotting with SNRPN and Airn CGI probes, respectively. Regions showing a clear lack of methylation are highlighted by brackets and an asterisk. (C) Identical to (A) except that the constitutive CMV promoter driving transcription through the SNRPN region was deleted. (D) For each sample, the graph depicts the fold reduction in methylation comparing the R-loop to the non-R-loop forming orientation, as determined by band densitometry. Values were calculated from three independent experiments and are shown as means with standard error. (E) The average methylation levels (in %) measured by bisulfite sequencing on the G-rich strand of the SNRPN insert are presented for both orientations. The number of independent molecules sequenced in each case is indicated together with the standard deviation for each sample. (F) Model for the function of R-loops in the protection against de novo DNA methylation and epigenetic silencing at GC-skewed CGIs. See text for details.
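Panel E reports per-orientation averages of bisulfite methylation across individually sequenced molecules. The sketch below shows one way such numbers can be derived and is hedged accordingly. It assumes each clone's read is already aligned to the reference strand without indels, and that the summary is the mean and standard deviation of per-molecule percentages. The function names are hypothetical, and the caption does not specify the authors' exact procedure.

```python
from statistics import mean, stdev

def cpg_calls(reference, read):
    """Methylation call at each CpG of `reference` from one bisulfite read.

    Assumes `read` is already aligned to `reference` without indels.
    After conventional (denaturing) bisulfite treatment, methylated C
    stays C while unmethylated C is read as T.
    Returns True (methylated), False (unmethylated) or None (other base).
    """
    ref, rd = reference.upper(), read.upper()
    calls = []
    for i in range(len(ref) - 1):
        if ref[i] == "C" and ref[i + 1] == "G":
            calls.append({"C": True, "T": False}.get(rd[i]))
    return calls

def percent_methylated(calls):
    """Percent methylated among informative CpGs; None if none are informative."""
    informative = [c for c in calls if c is not None]
    return 100.0 * sum(informative) / len(informative) if informative else None

def summarize(reference, reads):
    """Mean and SD of per-molecule % methylation, plus the molecule count."""
    per_mol = [percent_methylated(cpg_calls(reference, r)) for r in reads]
    per_mol = [p for p in per_mol if p is not None]
    return mean(per_mol), stdev(per_mol), len(per_mol)
```

Running `summarize` separately on clones from the R-loop forming and non-R-loop forming episomes gives the two bars of a panel like Figure 7E, along with the number of molecules each bar is based on.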

DISCUSSION

Here we present evidence for widespread R-loop formation at CGI promoters in the human genome. This conclusion is supported by computational analyses that revealed that positive GC skew, the main sequence attribute required for R-loop formation, is a hallmark of a large subset of strong CGI promoters. Importantly, the TSS of these promoters is characterized by an abrupt transition in GC skew such that positive GC skew is maximal at the 5′-end of the transcribed RNA (Figure 1). These observations are in agreement with prior studies showing that the 5′-end of human genes is characterized by GC strand asymmetries and G clustering on the coding strand downstream from promoters (Aerts et al., 2004; Touchon et al., 2003). Likewise, global R-loop formation is consistent with prior suggestions that the compositional asymmetries observed at human promoters are due to transcription-coupled mutational processes (Green et al., 2003; Polak and Arndt, 2008). The combination of GC skew and G clustering near the 5′-end of transcribed genes offers an optimal genomic context for the formation of R-loops, as demonstrated by careful in vitro studies (Roy and Lieber, 2009; Roy et al., 2008). In agreement, we provide base pair-resolution data demonstrating R-loop formation at two loci in the human genome and one locus in the mouse genome (Figures 2–4). In all cases, R-loop tracts closely matched regions of GC skew and were located within the boundaries of unmethylated promoter regions. Using innovative genomic methods (Figure 5), we further showed that R-loop formation is a widespread structural feature of a large fraction of human CGI promoters.
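The abrupt transition in GC skew at the TSS can be visualized with a strand-aware, sliding-window profile. GC skew is computed in the standard way as (G − C)/(G + C) on the sense (non-template) strand. The Python sketch below makes the following assumptions:

- the reference is supplied as a plus-strand string;
- TSS coordinates are 0-based;
- windows are an illustrative 100 bp with a 50 bp step.

These are not the parameters used for Figure 1, and all function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def gc_skew(seq):
    """GC skew (G - C) / (G + C); 0.0 for a window with no G or C."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return (g - c) / (g + c) if g + c else 0.0

def oriented_region(genome, tss, strand, flank):
    """2*flank bases around the TSS, read 5'->3' on the sense strand.

    genome: plus-strand sequence; tss: 0-based index of the first
    transcribed base. The TSS lands at index `flank` for either strand.
    """
    if strand == "+":
        lo, hi = tss - flank, tss + flank
    else:
        # Minus-strand genes are transcribed leftwards on plus coordinates.
        lo, hi = tss - flank + 1, tss + flank + 1
    if lo < 0 or hi > len(genome):
        raise ValueError("TSS too close to sequence end for this flank")
    region = genome[lo:hi]
    return region if strand == "+" else region.translate(COMPLEMENT)[::-1]

def skew_profile(genome, tss, strand, flank=500, window=100, step=50):
    """List of (window start relative to TSS, GC skew) along the oriented region."""
    region = oriented_region(genome, tss, strand, flank)
    return [(start - flank, gc_skew(region[start:start + window]))
            for start in range(0, len(region) - window + 1, step)]
```

Reverse-complementing minus-strand promoters means that positive offsets always point downstream in the direction of transcription. Profiles from many promoters can then be averaged position by position into a metaplot like Figure 1C.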

R-loop formation at CGI promoters is consistent with several important aspects of CGI biology. First, CGI promoters are intrinsically open chromatin regions characterized by elevated nuclease accessibility and lower nucleosome occupancy (Tazi and Bird, 1990). The presence of extensive ssDNA on the non-template strand of R-loops might account, at least in part, for higher nuclease sensitivity. Likewise, it is possible that the formation of long RNA:DNA hybrids at these loci might render them less likely to wrap around nucleosomes (Dunn and Griffith, 1980). Second, R-loop formation is consistent with accumulating evidence showing that CGI promoters represent DNA replication origins (Cadoret et al., 2008; Delgado et al., 1998; Sequeira-Mendes et al., 2009). As noted earlier, R-loops have traditionally been observed at replication origins in multiple systems, including bacterial plasmids and chromosomes, bacteriophages, and mitochondria. Third, while CGI promoter sequences are nearly symmetrical when analyzed for CpG density and GC content, GC skew is, by contrast, intrinsically asymmetric around the TSS (Figure 1C). It is possible that this asymmetry, and subsequent R-loop formation, may serve to correct the lack of directionality in the initial steps of transcription (Core et al., 2008; Seila et al., 2008). This correction might be mediated by the ability of R-loops to elicit transcriptional pausing (Dominguez-Sanchez et al., 2011; El Hage et al., 2010; Tous and Aguilera, 2007). Interestingly, we note that the first peak of GC skew, ~30 nt downstream of the TSS, coincides closely with the peak of paused RNA polymerase II (Figure S6B). More broadly, a number of proteins involved in mRNA capping, spliceosome assembly, RNA splicing, mRNA surveillance, and mRNA export contribute to regulating R-loop formation (Aguilera and Gomez-Gonzalez, 2008; Li and Manley, 2006; Paulsen et al., 2009), suggesting that R-loop formation may be mechanistically tied to these processes.

Using computational analyses, we show that the potential for R-loop formation is correlated with and predictive of the methylation status of a CGI (Figure 6). This suggests that R-loop formation may be directly involved in maintaining the unmethylated state of CGI promoters. In support of this notion, we provide direct evidence that R-loop formation can protect from DNMT3B1-mediated DNA methylation (Figure 7). The extent of protection (ranging from 2- to 4-fold in independent assays) may be underestimated in this system, given that episomal R-loop formation efficiency was only 10–15% at steady state (Figure S7A). The exact mechanism by which R-loop structures maintain an unmethylated state remains to be fully described. One possibility is that R-loops represent inappropriate substrates for DNMT3B1 activity, as recently shown using purified recombinant enzyme (Ross et al., 2010). It is unlikely, however, that a substrate-only mechanism could account for the protection, because R-loop formation efficiency is likely to be low (Huang et al., 2006) and R-loops are likely to exist only transiently due to the activity of endogenous RNase H-type enzymes or RNA:DNA helicases. Two alternative, non-exclusive models could account for R-loop function. The first model is based on the report that several members of the H3K4 methyltransferase family avidly bind to ssDNA (Krajewski et al., 2005). This raises the possibility that R-loop formation may contribute to recruiting the protective H3K4 trimethyl mark. In support of this model, H3K4me3 levels are positively correlated with transcriptional activity (Barski et al., 2007), and GC skew coincides with the peak of H3K4me3 nucleosomes downstream of the TSS (Figure S6B). Therefore, R-loop formation may serve as an initial, transient nucleic acid-based signal that is converted into a more permanent and heritable chromatin-based landmark.
Additional recruitment of H3K4me3 to unmethylated CGIs by the CFP1 protein (Thomson et al., 2010) may enable the maintenance of an unmethylated state irrespective of the transcriptional activity of a given promoter in differentiated somatic tissues. The second model relies on the possibility that R-loops may also signal the recruitment of DNA demethylating complexes to CGI promoters. The AID cytosine deaminase, a leading contender for initiating DNA demethylation (Popp et al., 2010), requires ssDNA substrates and targets transcribed loci, including R-loop-forming regions (Chaudhuri et al., 2003; Pham et al., 2003; Yu et al., 2005). Recent ChIP-seq analyses show that AID is enriched at H3K4me3-marked promoter-proximal sequences that broadly overlap with CGIs (Yamane et al., 2011). Altogether, our results suggest that protection from DNA methylation and epigenetic silencing at the majority of promoter CGIs in the human genome is a built-in characteristic of the DNA sequence that is revealed by the co-transcriptional formation of R-loop structures.

Experimental Procedures

Detection of R-loops using Non-denaturing Bisulfite Treatment

Single-stranded R-loop footprinting was carried out using a previously reported method (Yu et al., 2003). In short, genomic DNA was gently extracted from cells or tissues and treated with sodium bisulfite under non-denaturing conditions. Putative R-loop regions were PCR amplified, and the resulting PCR products were gel purified, cloned and sequenced. As a control, genomic DNA was pre-treated with RNase H (New England Biolabs) overnight at 37°C before bisulfite conversion. For each locus analyzed, the data shown here were merged from at least two independent biological replicates.
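Under non-denaturing conditions, bisulfite can only convert cytosines that are single-stranded; cytosines in double-stranded regions are largely protected. On the displaced strand of an R-loop, this shows up as a long, uninterrupted run of C→T conversions in an individual cloned molecule. The Python sketch below scans one aligned read for such runs and carries several assumptions. The reference is the strand being assayed, the read is aligned to it without indels, and the `min_run` threshold is a hypothetical parameter; the text does not state the run-calling criteria. In this scheme, reads from the RNase H-pretreated control should lack such runs.

```python
def conversion_runs(reference, read, min_run=5):
    """Find stretches of consecutive converted cytosines on one strand.

    Every C in `reference` is examined in order; a T at that position in
    `read` counts as converted (single-stranded at treatment time).
    Returns (first_c_index, last_c_index, n_converted) for each run of at
    least `min_run` consecutive converted Cs.
    """
    ref, rd = reference.upper(), read.upper()
    runs = []
    start = last = None
    count = 0
    for i, base in enumerate(ref):
        if base != "C":
            continue
        if rd[i] == "T":            # converted C: extend (or open) the current run
            if count == 0:
                start = i
            last = i
            count += 1
        else:                       # unconverted or ambiguous C: close the run
            if count >= min_run:
                runs.append((start, last, count))
            count = 0
    if count >= min_run:            # run reaching the end of the read
        runs.append((start, last, count))
    return runs
```

Because non-C positions are skipped, a run is interrupted only by a C that stayed C. This mirrors the logic of reading footprints as contiguous tracts of conversion on one strand.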

Immunocytochemistry

Cells were fixed and permeabilized in pure, ice-cold methanol and stained according to standard protocols using the murine S9.6 primary antibody (Hu et al., 2006). The human RNase H1 cDNA was amplified from a full-length clone so as to remove the first 26 amino acids, which carry the mitochondrial localization sequence, and to add an N-terminal HA tag (Cerritelli et al., 2003). This PCR fragment was recloned into pcDNA3 (Invitrogen), resulting in pRH1(ΔMLS)HA. The distribution of HA-RNASEH1 upon transfection of HEK293 cells was analyzed by immunocytochemistry using a mouse anti-HA primary antibody.

Purification of the S9.6 antibody and MBP-RNASEH1 protein

The S9.6-producing hybridoma cell line (HB-8730) was purchased from ATCC (Manassas, VA), and the antibody was recovered from ascites fluid and purified to homogeneity by Antibodies Inc. (Davis, CA). The human RNASEH1(ΔMLS) cDNA was cloned in-frame in a modified pMALc-2x expression vector. An Asp-to-Asn substitution at position 145 (D145N) was then introduced by site-directed mutagenesis. This mutation abolishes the catalytic activity of the enzyme but leaves intact its ability to specifically bind to RNA:DNA hybrids (Wu et al., 2001; Figure S5B). The resulting MBP-RNASEH1 protein was expressed in E. coli Rosetta cells and purified to near homogeneity through an amylose affinity column and an S ion-exchange column.

Genome-wide R-loop profiling using DRIP-seq and DRIVE-seq

Total nucleic acids were extracted from pluripotent Ntera2 cells by SDS/Proteinase K treatment at 37°C followed by phenol-chloroform extraction and ethanol precipitation. DNA was fragmented using HindIII, EcoRI, BsrGI, XbaI and SspI and was either pre-treated with RNase H overnight or left untreated. For DRIP, DNA was further processed essentially as described for MeDIP (Weber et al., 2005), except that the S9.6 antibody was used and the denaturation step was omitted. For DRIVE, the MBP-RNASEH1 protein was added to DNA (2:1 w/w ratio) and allowed to bind for two hours. Bound DNA fragments were recovered by addition of 50 μl amylose beads (New England Biolabs), followed by two washes and elution in a maltose-containing buffer. DNA fragments were treated with Proteinase K and recovered after phenol-chloroform extraction and ethanol precipitation. Validation of the DRIP and DRIVE procedures was performed by qPCR (primers available upon request). The pulled-down material (with and without RNase H treatment) and input DNA were then sonicated, size-selected and ligated to Illumina barcoded adaptors for sequencing on Illumina GAIIx and HiSeq platforms. Between 4.8 and 6 million mapped reads were obtained for each sequenced library. Alignment to the hg19 build was carried out using BWA (Li and Durbin, 2009), and peak calling was done using MACS (Zhang et al., 2008). For DRIP-seq, peaks were called using all mapped reads, enforcing a greater than 10-fold enrichment above both the input and the RNase H-pretreated control datasets. For DRIVE-seq, peaks were called using uniquely mapped reads, enforcing a greater than 5-fold enrichment above both the input and the RNase H-pretreated samples, as well as an FDR of <0.1. Gene ontology analysis was performed using the DAVID website (Huang da et al., 2009); overlap analyses were performed using the UCSC genome browser and custom Perl scripts.
Consistent with immunocytochemistry data (Figure 5A), the mitochondrial genome and ribosomal DNA arrays were two prominent targets in our sequencing datasets (data not shown).
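The peak thresholds and overlap analysis described above can be sketched in Python. Python is used here in place of the custom Perl scripts, whose code is not given. The sketch assumes that, for each candidate peak, fold enrichment over input and over the RNase H-pretreated control, plus an FDR, have already been computed, for example from separate MACS runs against each control. The `Peak` fields and the function names are hypothetical, not MACS output columns. Coordinates are treated as 0-based and half-open, as in BED files.

```python
from dataclasses import dataclass

@dataclass
class Peak:
    chrom: str
    start: int              # 0-based, half-open [start, end)
    end: int
    fold_vs_input: float    # hypothetical: enrichment over input DNA
    fold_vs_rnaseh: float   # hypothetical: enrichment over RNase H-pretreated sample
    fdr: float = 0.0

def filter_peaks(peaks, min_fold, max_fdr=None):
    """Keep peaks enriched more than min_fold over BOTH controls (and FDR < max_fdr)."""
    return [p for p in peaks
            if p.fold_vs_input > min_fold and p.fold_vs_rnaseh > min_fold
            and (max_fdr is None or p.fdr < max_fdr)]

def overlapping(peaks, intervals):
    """Peaks overlapping any (chrom, start, end) interval, e.g. promoter CGIs.

    Brute-force scan per chromosome; fine for a sketch, but an interval
    tree or sorted sweep would be preferable at genome scale.
    """
    by_chrom = {}
    for chrom, s, e in intervals:
        by_chrom.setdefault(chrom, []).append((s, e))
    return [p for p in peaks
            if any(p.start < e and s < p.end for s, e in by_chrom.get(p.chrom, []))]

# Thresholds as stated in the text:
# drip_peaks  = filter_peaks(drip_candidates, min_fold=10)
# drive_peaks = filter_peaks(drive_candidates, min_fold=5, max_fdr=0.1)
```

Requiring enrichment over the RNase H-pretreated sample, and not just over input, is what makes these peaks specific to RNA:DNA hybrids rather than to general pull-down bias. The overlap step then measures what fraction of those peaks fall on promoter CGIs.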

Accession numbers

DRIP and DRIVE-seq raw sequence files have been deposited on the NCBI Sequence Read Archive database under the accession number SRA048940.1.

Supplementary Material

01
02

Highlights.

  • CpG island (CGI) promoters are characterized by GC skew downstream of TSS

  • Transcription through skewed CGI promoters leads to R-loop formation

  • R-loop formation potential is correlated with and predictive of unmethylated state of CGIs

  • R-loops may establish a protective chromatin environment against DNA methylation

Acknowledgments

We are grateful to Dr. M. R. Lieber, in whose laboratory this project was initiated. We thank our colleagues Drs. J. LaSalle, R. Wu and P. Hagerman for their help. We thank Dr. Leppla (NICHD) for sharing purified S9.6 samples. We thank members of the Chédin laboratory and Dr. J. Roth for critical reading. This work was supported by research grants from the Foundation for Prader-Willi Research (to F.C.) and from the National Institutes of Health (1R01HG004348 to I.K.; 1R01GM094299 to F.C.). P.A.G. was supported by a predoctoral NIH Training Grant (5T32GM007377).

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  1. Aerts S, Thijs G, Dabrowski M, Moreau Y, De Moor B. Comprehensive analysis of the base composition around the transcription start site in Metazoa. BMC Genomics. 2004;5:34. doi: 10.1186/1471-2164-5-34.
  2. Aguilera A, Gomez-Gonzalez B. Genome instability: a mechanistic view of its causes and consequences. Nat Rev Genet. 2008;9:204–217. doi: 10.1038/nrg2268.
  3. Bachman KE, Park BH, Rhee I, Rajagopalan H, Herman JG, Baylin SB, Kinzler KW, Vogelstein B. Histone modifications and silencing prior to DNA methylation of a tumor suppressor gene. Cancer Cell. 2003;3:89–95. doi: 10.1016/s1535-6108(02)00234-9.
  4. Barski A, Cuddapah S, Cui K, Roh TY, Schones DE, Wang Z, Wei G, Chepelev I, Zhao K. High-resolution profiling of histone methylations in the human genome. Cell. 2007;129:823–837. doi: 10.1016/j.cell.2007.05.009.
  5. Bird A. DNA methylation patterns and epigenetic memory. Genes Dev. 2002;16:6–21. doi: 10.1101/gad.947102.
  6. Bock C, Paulsen M, Tierling S, Mikeska T, Lengauer T, Walter J. CpG island methylation in human lymphocytes is highly correlated with DNA sequence, repeats, and predicted DNA structure. PLoS Genet. 2006;2:e26. doi: 10.1371/journal.pgen.0020026.
  7. Boguslawski SJ, Smith DE, Michalak MA, Mickelson KE, Yehle CO, Patterson WL, Carrico RJ. Characterization of monoclonal antibody to DNA·RNA and its application to immunodetection of hybrids. J Immunol Methods. 1986;89:123–130. doi: 10.1016/0022-1759(86)90040-2.
  8. Brock MV, Herman JG, Baylin SB. Cancer as a manifestation of aberrant chromatin structure. Cancer J. 2007;13:3–8. doi: 10.1097/PPO.0b013e31803c5415.
  9. Brown TA, Tkachuk AN, Clayton DA. Native R-loops persist throughout the mouse mitochondrial DNA genome. J Biol Chem. 2008;283:36743–36751. doi: 10.1074/jbc.M806174200.
  10. Cadoret JC, Meisch F, Hassan-Zadeh V, Luyten I, Guillet C, Duret L, Quesneville H, Prioleau MN. Genome-wide studies highlight indirect links between human replication origins and gene regulation. Proc Natl Acad Sci U S A. 2008;105:15837–15842. doi: 10.1073/pnas.0805208105.
  11. Cerritelli SM, Frolova EG, Feng C, Grinberg A, Love PE, Crouch RJ. Failure to produce mitochondrial DNA results in embryonic lethality in Rnaseh1 null mice. Mol Cell. 2003;11:807–815. doi: 10.1016/s1097-2765(03)00088-1.
  12. Chaudhuri J, Tian M, Khuong C, Chua K, Pinaud E, Alt FW. Transcription-targeted DNA deamination by the AID antibody diversification enzyme. Nature. 2003;422:726–730. doi: 10.1038/nature01574.
  13. Chedin F, Lieber MR, Hsieh CL. The DNA methyltransferase-like protein DNMT3L stimulates de novo methylation by Dnmt3a. Proc Natl Acad Sci U S A. 2002;99:16916–16921. doi: 10.1073/pnas.262443999.
  14. Core LJ, Waterfall JJ, Lis JT. Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science. 2008;322:1845–1848. doi: 10.1126/science.1162228.
  15. De Smet C, Loriot A, Boon T. Promoter-dependent mechanism leading to selective hypomethylation within the 5′ region of gene MAGE-A1 in tumor cells. Mol Cell Biol. 2004;24:4781–4790. doi: 10.1128/MCB.24.11.4781-4790.2004.
  16. Delgado S, Gomez M, Bird A, Antequera F. Initiation of DNA replication at CpG islands in mammalian chromosomes. EMBO J. 1998;17:2426–2435. doi: 10.1093/emboj/17.8.2426.
  17. Dominguez-Sanchez MS, Barroso S, Gomez-Gonzalez B, Luna R, Aguilera A. Genome Instability and Transcription Elongation Impairment in Human Cells Depleted of THO/TREX. PLoS Genet. 2011;7:e1002386. doi: 10.1371/journal.pgen.1002386.
  18. Dunn K, Griffith JD. The presence of RNA in a double helix inhibits its interaction with histone protein. Nucleic Acids Res. 1980;8:555–566. doi: 10.1093/nar/8.3.555.
  19. El-Maarri O, Buiting K, Peery EG, Kroisel PM, Balaban B, Wagner K, Urman B, Heyd J, Lich C, Brannan CI, et al. Maternal methylation imprints on human chromosome 15 are established during or after fertilization. Nat Genet. 2001;27:341–344. doi: 10.1038/85927.
  20. El Hage A, French SL, Beyer AL, Tollervey D. Loss of Topoisomerase I leads to R-loop-mediated transcriptional blocks during ribosomal RNA synthesis. Genes Dev. 2010;24:1546–1558. doi: 10.1101/gad.573310.
  21. Feltus FA, Lee EK, Costello JF, Plass C, Vertino PM. Predicting aberrant CpG island methylation. Proc Natl Acad Sci U S A. 2003;100:12253–12258. doi: 10.1073/pnas.2037852100.
  22. Gardiner-Garden M, Frommer M. CpG islands in vertebrate genomes. J Mol Biol. 1987;196:261–282. doi: 10.1016/0022-2836(87)90689-9.
  23. Green P, Ewing B, Miller W, Thomas PJ, Green ED. Transcription-associated mutational asymmetry in mammalian evolution. Nat Genet. 2003;33:514–517. doi: 10.1038/ng1103.
  24. Guibert S, Forne T, Weber M. Dynamic regulation of DNA methylation during mammalian development. Epigenomics. 2009;1:81–98. doi: 10.2217/epi.09.5.
  25. Hu Z, Zhang A, Storz G, Gottesman S, Leppla SH. An antibody-based microarray assay for small RNA detection. Nucleic Acids Res. 2006;34:e52. doi: 10.1093/nar/gkl142.
  26. Huang da W, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009;4:44–57. doi: 10.1038/nprot.2008.211.
  27. Huang FT, Yu K, Hsieh CL, Lieber MR. Downstream boundary of chromosomal R-loops at murine switch regions: implications for the mechanism of class switch recombination. Proc Natl Acad Sci U S A. 2006;103:5030–5035. doi: 10.1073/pnas.0506548103.
  28. Illingworth RS, Bird AP. CpG islands--‘a rough guide’. FEBS Lett. 2009;583:1713–1720. doi: 10.1016/j.febslet.2009.04.012.
  29. Jones PA, Baylin SB. The fundamental role of epigenetic events in cancer. Nat Rev Genet. 2002;3:415–428. doi: 10.1038/nrg816.
  30. Krajewski WA, Nakamura T, Mazo A, Canaani E. A motif within SET-domain proteins binds single-stranded nucleic acids and transcribed and supercoiled DNAs and can interfere with assembly of nucleosomes. Mol Cell Biol. 2005;25:1891–1899. doi: 10.1128/MCB.25.5.1891-1899.2005.
  31. Latos PA, Stricker SH, Steenpass L, Pauler FM, Huang R, Senergin BH, Regha K, Koerner MV, Warczok KE, Unger C, et al. An in vitro ES cell imprinting model shows that imprinted expression of the Igf2r gene arises from an allele-specific expression bias. Development. 2009;136:437–448. doi: 10.1242/dev.032060.
  32. Laurent L, Wong E, Li G, Huynh T, Tsirigos A, Ong CT, Low HM, Kin Sung KW, Rigoutsos I, Loring J, et al. Dynamic changes in the human methylome during differentiation. Genome Res. 2010;20:320–331. doi: 10.1101/gr.101907.109.
  33. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
  34. Li X, Manley JL. Cotranscriptional processes and their influence on genome stability. Genes Dev. 2006;20:1838–1847. doi: 10.1101/gad.1438306.
  35. Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, Tonti-Filippini J, Nery JR, Lee L, Ye Z, Ngo QM, et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature. 2009;462:315–322. doi: 10.1038/nature08514.
  36. Macleod D, Charlton J, Mullins J, Bird AP. Sp1 sites in the mouse aprt gene promoter are required to prevent methylation of the CpG island. Genes Dev. 1994;8:2282–2292. doi: 10.1101/gad.8.19.2282.
  37. Mutskov V, Felsenfeld G. Silencing of transgene transcription precedes methylation of promoter DNA and histone H3 lysine 9. EMBO J. 2004;23:138–149. doi: 10.1038/sj.emboj.7600013.
  38. Ooi SK, Qiu C, Bernstein E, Li K, Jia D, Yang Z, Erdjument-Bromage H, Tempst P, Lin SP, Allis CD, et al. DNMT3L connects unmethylated lysine 4 of histone H3 to de novo methylation of DNA. Nature. 2007;448:714–717. doi: 10.1038/nature05987.
  39. Paulsen RD, Soni DV, Wollman R, Hahn AT, Yee MC, Guan A, Hesley JA, Miller SC, Cromwell EF, Solow-Cordero DE, et al. A genome-wide siRNA screen reveals diverse cellular processes and pathways that mediate genome stability. Mol Cell. 2009;35:228–239. doi: 10.1016/j.molcel.2009.06.021.
  40. Payer B, Lee JT. X chromosome dosage compensation: how mammals keep the balance. Annu Rev Genet. 2008;42:733–772. doi: 10.1146/annurev.genet.42.110807.091711.
  41. Pham P, Bransteitter R, Petruska J, Goodman MF. Processive AID-catalysed cytosine deamination on single-stranded DNA simulates somatic hypermutation. Nature. 2003;424:103–107. doi: 10.1038/nature01760.
  42. Polak P, Arndt PF. Transcription induces strand-specific mutations at the 5′ end of human genes. Genome Res. 2008;18:1216–1223. doi: 10.1101/gr.076570.108.
  43. Popp C, Dean W, Feng S, Cokus SJ, Andrews S, Pellegrini M, Jacobsen SE, Reik W. Genome-wide erasure of DNA methylation in mouse primordial germ cells is affected by AID deficiency. Nature. 2010;463:1101–1105. doi: 10.1038/nature08829.
  44. Ratmeyer L, Vinayak R, Zhong YY, Zon G, Wilson WD. Sequence specific thermodynamic and structural properties for DNA·RNA duplexes. Biochemistry. 1994;33:5298–5304. doi: 10.1021/bi00183a037.
  45. Roberts RW, Crothers DM. Stability and properties of double and triple helices: dramatic effects of RNA or DNA backbone composition. Science. 1992;258:1463–1466. doi: 10.1126/science.1279808.
  46. Ross JP, Suetake I, Tajima S, Molloy PL. Recombinant mammalian DNA methyltransferase activity on model transcriptional gene silencing short RNA-DNA heteroduplex substrates. Biochem J. 2010;432:323–332. doi: 10.1042/BJ20100579.
  47. Roy D, Lieber MR. G clustering is important for the initiation of transcription-induced R-loops in vitro, whereas high G density without clustering is sufficient thereafter. Mol Cell Biol. 2009;29:3124–3133. doi: 10.1128/MCB.00139-09.
  48. Roy D, Yu K, Lieber MR. Mechanism of R-loop formation at immunoglobulin class switch sequences. Mol Cell Biol. 2008;28:50–60. doi: 10.1128/MCB.01251-07.
  49. Seila AC, Calabrese JM, Levine SS, Yeo GW, Rahl PB, Flynn RA, Young RA, Sharp PA. Divergent transcription from active promoters. Science. 2008;322:1849–1851. doi: 10.1126/science.1162253.
  50. Sequeira-Mendes J, Diaz-Uriarte R, Apedaile A, Huntley D, Brockdorff N, Gomez M. Transcription initiation activity sets replication origin efficiency in mammalian cells. PLoS Genet. 2009;5:e1000446. doi: 10.1371/journal.pgen.1000446.
  51. Straussman R, Nejman D, Roberts D, Steinfeld I, Blum B, Benvenisty N, Simon I, Yakhini Z, Cedar H. Developmental programming of CpG island methylation profiles in the human genome. Nat Struct Mol Biol. 2009;16:564–571. doi: 10.1038/nsmb.1594.
  52. Stricker SH, Steenpass L, Pauler FM, Santoro F, Latos PA, Huang R, Koerner MV, Sloane MA, Warczok KE, Barlow DP. Silencing and transcriptional properties of the imprinted Airn ncRNA are independent of the endogenous promoter. EMBO J. 2008;27:3116–3128. doi: 10.1038/emboj.2008.239.
  53. Takahashi K, Tanabe K, Ohnuki M, Narita M, Ichisaka T, Tomoda K, Yamanaka S. Induction of pluripotent stem cells from adult human fibroblasts by defined factors. Cell. 2007;131:861–872. doi: 10.1016/j.cell.2007.11.019.
  54. Takeshima H, Yamashita S, Shimazu T, Niwa T, Ushijima T. The presence of RNA polymerase II, active or stalled, predicts epigenetic fate of promoter CpG islands. Genome Res. 2009;19:1974–1982. doi: 10.1101/gr.093310.109.
  55. Tazi J, Bird A. Alternative chromatin structure at CpG islands. Cell. 1990;60:909–920. doi: 10.1016/0092-8674(90)90339-g.
  56. Thomson JP, Skene PJ, Selfridge J, Clouaire T, Guy J, Webb S, Kerr AR, Deaton A, Andrews R, James KD, et al. CpG islands influence chromatin structure via the CpG-binding protein Cfp1. Nature. 2010;464:1082–1086. doi: 10.1038/nature08924.
  57. Touchon M, Nicolay S, Arneodo A, d’Aubenton-Carafa Y, Thermes C. Transcription-coupled TA and GC strand asymmetries in the human genome. FEBS Lett. 2003;555:579–582. doi: 10.1016/s0014-5793(03)01306-1.
  58. Tous C, Aguilera A. Impairment of transcription elongation by R-loops in vitro. Biochem Biophys Res Commun. 2007;360:428–432. doi: 10.1016/j.bbrc.2007.06.098.
  59. Weber M, Davies JJ, Wittig D, Oakeley EJ, Haase M, Lam WL, Schubeler D. Chromosome-wide and promoter-specific analyses identify sites of differential DNA methylation in normal and transformed human cells. Nat Genet. 2005;37:853–862. doi: 10.1038/ng1598.
  60. Wu H, Lima WF, Crooke ST. Investigating the structure of human RNase H1 by site-directed mutagenesis. J Biol Chem. 2001;276:23547–23553. doi: 10.1074/jbc.M009676200.
  61. Yamane A, Resch W, Kuo N, Kuchen S, Li Z, Sun HW, Robbiani DF, McBride K, Nussenzweig MC, Casellas R. Deep-sequencing identification of the genomic targets of the cytidine deaminase AID and its cofactor RPA in B lymphocytes. Nat Immunol. 2011;12:62–69. doi: 10.1038/ni.1964.
  62. Yoder JA, Walsh CP, Bestor TH. Cytosine methylation and the ecology of intragenomic parasites. Trends Genet. 1997;13:335–340. doi: 10.1016/s0168-9525(97)01181-5.
  63. Yu K, Chedin F, Hsieh CL, Wilson TE, Lieber MR. R-loops at immunoglobulin class switch regions in the chromosomes of stimulated B cells. Nat Immunol. 2003;4:442–451. doi: 10.1038/ni919.
  64. Yu K, Roy D, Bayramyan M, Haworth IS, Lieber MR. Fine-structure analysis of activation-induced deaminase accessibility to class switch region R-loops. Mol Cell Biol. 2005;25:1730–1736. doi: 10.1128/MCB.25.5.1730-1736.2005.
  65. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9:R137. doi: 10.1186/gb-2008-9-9-r137.
