Abstract
RNA is modified by hundreds of chemical reactions and folds into innumerable shapes. However, the regulatory role of RNA sequence and structure and how dysregulation leads to diseases remain largely unknown. Here, we uncovered a mechanism where RNA abasic sites in R-loops regulate transcription by pausing RNA polymerase II. We found an enhancer RNA, AANCR, that regulates the transcription and expression of apolipoprotein E (APOE). In some human cells such as fibroblasts, AANCR is folded into an R-loop and modified by N-glycosidic cleavage; in this form, AANCR is a partially transcribed nonfunctional enhancer and APOE is not expressed. In contrast, in other cell types including hepatocytes and under stress, AANCR does not form a stable R-loop as its sequence is not modified, so it is transcribed into a full-length enhancer that promotes APOE expression. DNA sequence variants in AANCR are associated significantly with APOE expression and Alzheimer's Disease, thus AANCR is a modifier of Alzheimer's Disease. Besides AANCR, thousands of noncoding RNAs are regulated by abasic sites in R-loops. Together our data reveal the essentiality of the folding and modification of RNA in cellular regulation and demonstrate that dysregulation underlies common complex diseases such as Alzheimer's disease.
INTRODUCTION
DNA is the genetic code while RNA is the regulatory code of all organisms. DNA sequence differences account for individual variation from gene expression (1,2) to disease susceptibility. While technological advances have made it easier to identify DNA sequence mutations and variants that account for the genetic bases of diseases, molecular understanding remains a challenge. To gain deeper insight, it is necessary to delve into both the genetic and regulatory codes. RNA sequence and structure comprise the regulatory code. RNA is composed of four canonical bases (A, C, G and U) and over 150 modified bases (3) that form a myriad of structures. In an organism, every cell has largely the same DNA, but different RNA species. RNA is what confers cell identity and function. It is templated from DNA, but during and after synthesis, it is highly processed by hundreds of chemical steps that modify the bases and sugar of RNA. Chemical modifications of RNA range from methylation that generates N6-methyladenosine to multiple-step reactions that form wybutosine.
RNA sequence and structure are closely related; they regulate each other and co-regulate gene expression and function. RNA sequence affects its structure and conversely, its structure can further alter the sequence, as in RNA editing in which adenosine-uridine (AU)-rich sequences form stem–loop structures that are bound by adenosine deaminase RNA-specific (ADAR) proteins to deaminate adenosine to inosine (4,5). RNA sequence and structure also mediate the interaction between RNA and regulatory nucleic acids and proteins; for example, the repression of mRNA by microRNA and Argonaute-2 is dependent on both sequence and structure (6).
Our interest in RNA sequence and structure deepened as we studied amyotrophic lateral sclerosis type 4 (ALS4), a juvenile-onset ALS due to heterozygous senataxin mutations (7). The patients have significantly fewer R-loops (8,9), three-stranded nucleic acid structures, each with an RNA/DNA hybrid and a displaced single-stranded DNA (10–13). To understand how the deficiency of R-loops affects cell function, we looked for proteins that bind to these nucleic acid structures. We and other groups have identified hundreds of R-loop binding proteins (14–16). These studies use different methods, yet the results consistently show the same set of several hundred proteins. These include a significant number of enzymes that modify nucleic acids, such as METTL3 and METTL14 that methylate adenosine in RNA to form N6-methyladenosine, as well as methylpurine glycosylase (MPG) and apurinic/apyrimidinic endonuclease 1 (APE1) that were known to process DNA (17). These results suggest that R-loops serve as platforms for processing nucleic acids, including RNA modification. Yet to confirm their regulatory roles, the mechanism by which each protein processes nucleic acid, individually and jointly, must be determined.
We first focused on two proteins, MPG and APE1, and found that MPG not only cleaves the N-glycosidic bond on DNA but also on RNA, leading to RNA abasic sites (18). APE1 then processes the RNA by cleaving the sugar-phosphate backbone at the abasic sites. The activity of MPG and APE1 on RNA occurs only when the RNA is hybridized to a DNA strand, as in an R-loop, thus further illustrating the co-dependence of sequence and structure. Mass spectrometry analysis shows that abasic sites are not rare in RNA; there are about four RNA abasic sites per million ribonucleotides in human cells such as primary fibroblasts (18) so there are hundreds of thousands of RNA abasic sites in a cell. Given their abundance, it is necessary to study them beyond knowing how they form.
In the current study, we set out to examine RNA abasic sites in R-loops and uncovered a gene regulatory mechanism by which RNA abasic sites stabilize R-loops to pause RNA Polymerase II transcription. We found a noncoding enhancer RNA of apolipoprotein E (APOE), which we refer to as APOE-activating noncoding RNA, AANCR, whose expression and function are regulated dynamically by pausing RNA Polymerase II. When this noncoding enhancer RNA is full-length, it activates APOE expression. When this RNA is only partially transcribed, it is nonfunctional and APOE is not expressed. In some cells, the noncoding AANCR RNA is not fully transcribed since RNA Polymerase II elongation is paused by R-loops that are stabilized by RNA abasic sites. In response to hypertonic stress, the R-loops that pause transcription resolve, and AANCR RNA is transcribed into a full-length enhancer that activates APOE expression. By genetic analysis, we showed that sequence variants in the enhancer region affect APOE expression in several tissues, including the hippocampus, and are associated with Alzheimer's disease. Besides this noncoding enhancer RNA of APOE, transcription elongation of more than 1,000 other noncoding RNAs is regulated by R-loops with RNA abasic sites. Thus, we have revealed a gene regulatory mechanism in which the sequences and structures of noncoding RNAs regulate their transcription, function, and consequently disease susceptibility.
MATERIALS AND METHODS
Astrocyte expression
Paired-end RNA-seq data from human astrocytes (19) were downloaded from the NCBI GEO database and aligned to the hg19 reference genome using STAR version 2.6.0c. Read coverage for AANCR and APOE was obtained using the samtools bedcov function on the aligned BAM files. The reads for each gene were normalized to the total number of reads in each sample, in millions (reads per million, RPM). The hg19 chromosome coordinates used for AANCR were chr19:45406985–45408892 and for APOE were chr19:45409039–45412650.
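The RPM normalization described above can be illustrated with a minimal sketch. The function name and the read counts below are hypothetical and are not data from this study; the sketch only shows the scaling step, not the alignment or coverage extraction.

```python
def reads_per_million(region_reads: float, total_reads: int) -> float:
    """Scale a region's read count by the sample's total reads, in millions."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return region_reads / (total_reads / 1_000_000)


# Hypothetical counts for a single sample (illustrative values only).
sample_total = 40_000_000
counts = {"AANCR": 200, "APOE": 12_000}

rpm = {gene: reads_per_million(n, sample_total) for gene, n in counts.items()}
```

Because each sample is scaled by its own sequencing depth, RPM values for the same region can be compared across samples.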
Cell culture
Foreskin fibroblasts from healthy newborns were cultured in MEM medium (Thermo-Fisher, Cat# 11095080) supplemented with 10% fetal bovine serum, 1% L-glutamine, and 1% penicillin–streptomycin. HK-2 cells (ATCC, Cat# CRL-2190) were cultured in DMEM/F12 supplemented with 10% fetal bovine serum and 1% penicillin–streptomycin. A549 (ATCC, Cat# CCL-185) and HepG2 (ATCC, Cat# HB-8065) cells were cultured in DMEM supplemented with 10% fetal bovine serum and 1% penicillin–streptomycin. B-lymphoblasts (Coriell) were cultured in RPMI 1640 supplemented with 15% fetal bovine serum and 1% penicillin–streptomycin. All cells were grown at 37°C with 5% CO2. Adherent cells were passaged every 72 h using trypsin–EDTA (0.05%), or trypsin–EDTA (0.25%) for HepG2 cells. Where indicated, media were supplemented with estrogen (Sigma, Cat# E-060-1ML) dissolved in ethanol or an equal volume of vehicle.
Chromatin immunoprecipitation
Foreskin fibroblasts were cross-linked with 1% formaldehyde for 10 min. Cross-linking was stopped with 2.5 M glycine for 5 min. Nuclei were isolated by rotating cross-linked cells for 10 min at 4°C in 5 ml lysis buffer 1 (50 mM HEPES pH 7.6, 140 mM NaCl, 1 mM EDTA, 10% glycerol, 0.5% NP-40, 0.25% Triton X-100) followed by pelleting, and a 10 min rotation in 5 ml lysis buffer 2 (200 mM NaCl, 1 mM EDTA, 0.5 mM EGTA, 10 mM Tris, pH 8). Nuclei were pelleted, then swelled in lysis buffer 3 (10 mM Tris, pH 8, 1 mM EDTA, 0.5 mM EGTA, 100 mM NaCl, 0.1% deoxycholic acid, 10% N-lauryl sarcosine) for 10 min, then sonicated with a Bioruptor (Diagenode) on high setting (30 s on, 30 s off) for 15 min to shear chromatin to <500 nt. After pelleting the insoluble fraction, the supernatant was pre-cleared with Protein A/G agarose beads (Thermo Fisher, Cat# 78609) and anti-rabbit IgG (Sigma, Cat# I5006). Sheared chromatin (50 μg) was incubated in RIPA buffer (50 mM Tris, pH 8, 150 mM NaCl, 1% NP-40, 0.5% sodium deoxycholate, 0.1% SDS) with either 5 μg rabbit IgG (Sigma, Cat# I5006), 5 μg mouse IgG (Santa Cruz, Cat# SC2025), or 5 μg of antibodies against NELFA (Santa Cruz, Cat# sc365004), H3K27ac (Abcam, Cat# ab4729) or H3K4me1 (Abcam, Cat# ab8895) and recovered with Protein A/G beads. Beads were washed twice with low salt RIPA (150 mM NaCl) and twice in high salt RIPA (300 mM NaCl), then eluted in 100 μl 1% SDS plus 100 mM sodium bicarbonate. After cross-link reversal, DNA was purified with a QIAquick PCR Purification Kit (Qiagen) and quantified by qPCR or by sequencing. ChIP-seq libraries were prepared using the Ovation Ultralow Library system (NuGen). Libraries were sequenced on the HiSeq 2500 instrument (Illumina) and ∼40 million 100-nt reads were generated per ChIP sample. Sequence pre-processing and alignment were performed as described for PRO-seq.
Coexpression analysis
Entrez gene IDs for APOE (348), APOC1 (341) and TOMM40 (10452) were submitted to COEXPRESSdb and expression data were plotted under default settings. Pearson correlation coefficients are reported.
Expression analysis
Total RNA was isolated using the RNeasy Mini-Kit (Qiagen) and 0.5 μg RNA was converted to cDNA using the Taqman RT reagents kit (ThermoFisher) with random hexamer priming. Gene expression was determined by SYBR green qPCR on an ABI 7900HT or BioRad CFX384 instrument using the delta-Ct method. Gene expression primers are listed in Supplementary methods.
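For readers unfamiliar with the delta-Ct method, a minimal sketch follows. It assumes the common form of the calculation, in which the target Ct is compared with the Ct of a reference gene and amplification efficiency is taken as 100% (a doubling per cycle). The reference gene is not named in this section, and the Ct values below are hypothetical.

```python
def delta_ct_expression(ct_target: float, ct_reference: float) -> float:
    """Relative expression of a target gene versus a reference gene.

    Computes 2 ** -(Ct_target - Ct_reference), assuming one doubling per cycle.
    """
    return 2.0 ** -(ct_target - ct_reference)


# Hypothetical Ct values (illustrative only, not data from the study).
control = delta_ct_expression(ct_target=28.0, ct_reference=20.0)
knockdown = delta_ct_expression(ct_target=22.0, ct_reference=20.0)

# Fold change of the target between the two conditions.
fold_change = knockdown / control
```

A lower Ct means the target crossed the detection threshold earlier, so a drop of six cycles relative to the reference corresponds to a 64-fold increase under these assumptions.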
Hypertonic stress treatment
HK-2 cells were cultured in DMEM-F12 media supplemented with 50 mM NaCl to a final 400 mOsm. To generate conditioned media HK-2 cells were maintained in isotonic media (300 mOsm) or hypertonic media (400 mOsm) for 6 h. After 6 h the conditioned hypertonic media was collected. The isotonic media was collected and 50 mM NaCl was added to yield hypertonic media. The hypertonic media and hypertonic conditioned media were confirmed to be 400 mOsm using a 6002 μOsm Osmometer (Precision systems). HK-2 cells were seeded in 96-well plates and maintained in hypertonic media or hypertonic conditioned media for 24 h. Apoptosis was monitored using RealTime-Glo™ Annexin V Apoptosis and Necrosis Assay (Promega, Cat# JA1011) according to manufacturer specifications and quantified in a Cytation 5 (Agilent) plate reader.
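The osmolarity values above follow from simple arithmetic if NaCl is treated as fully dissociated into two osmotically active ions. That idealization is an assumption made here for illustration and is not stated in the text; the measured values reported above come from the osmometer, not from this calculation.

```python
def added_osmolarity_mosm(conc_mm: float, ions_per_formula_unit: int = 2) -> float:
    """Ideal osmolarity (mOsm) contributed by a fully dissociated salt."""
    return conc_mm * ions_per_formula_unit


isotonic_mosm = 300.0                              # isotonic media, from the text
nacl_added_mm = 50.0                               # supplemental NaCl, from the text
hypertonic_mosm = isotonic_mosm + added_osmolarity_mosm(nacl_added_mm)
```

Under this assumption, 50 mM NaCl contributes 100 mOsm, which takes 300 mOsm media to the stated 400 mOsm.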
RNA abasic site detection and quantitation with ARP
RNA samples were incubated in 2 mM ARP (N-(aminooxyacetyl)-N'-(d-Biotinoyl) hydrazine) (Thermo Fisher Scientific, Cat# A10550) in 20 mM Tris–HCl, 1 mM DTT, 1 mM EDTA, pH 8.0 at 37°C with agitation for 1 h. Formaldehyde was then added to 50 mM and incubated for 10 min at 37°C to quench ARP. RNA was precipitated in 0.3 M NaAc (pH 5.5) and 3 volumes of 100% EtOH, followed by 75% EtOH washes. The samples were analyzed on 1% formaldehyde agarose gel in 1× MOPS buffer (20 mM MOPS, 5 mM sodium acetate, 1 mM EDTA, pH 7.0). RNA was transferred to Hybond N+ nylon membrane by overnight capillary transfer in 10× SSC buffer (1.5 M NaCl, 150 mM sodium citrate, pH 7.0). Biotin signal on nylon membrane was detected using a streptavidin-based chemiluminescent method (Thermo Fisher Scientific, Cat# 89880).
For site-specific quantification, ARP-labelled abasic RNA was recovered with M280 streptavidin beads. RNA was eluted from the beads using Trizol and converted to cDNA by reverse transcription using random hexamer priming. Enrichment of RNA containing abasic sites was quantified by qPCR.
DNA–RNA hybrid immunoprecipitation (DRIP)
The immunoprecipitation procedure was adapted from previous studies (Skourti-Stathaki et al., 2011). Primary fibroblasts (5 × 10⁶) were lysed in 600 μl cell lysis buffer (50 mM PIPES, pH 8.0, 100 mM KCl, 0.5% NP-40) and nuclei were collected by centrifugation. Pelleted nuclei were resuspended in 300 μl nuclear lysis buffer (25 mM Tris–HCl, pH 8.0, 1% SDS, 5 mM EDTA). Genomic DNA, along with R-loops, was then extracted by phenol:chloroform and ethanol precipitation. Purified DNA was resuspended in IP dilution buffer (16.7 mM Tris–HCl, pH 8.0, 1 mM EDTA, 0.01% SDS, 1% Triton-X100, 167 mM NaCl) and sonicated for 15 min in a Bioruptor (Hi setting, 30 s on/30 s off) to fragments with an average size of 500 nt. Three μg of S9.6 monoclonal antibody (gift from Dr. Stephen H. Leppla at NIH) or non-specific mouse IgG (Santa Cruz, Cat# SC2025) was used for each immunoprecipitation. Input and precipitates were analyzed by quantitative PCR using primers (see Supplementary methods) or by sequencing. Sequencing libraries were prepared from input and DRIP DNA using the Ovation Ultralow System (NuGen, Cat# 0344) and sequenced on a HiSeq 2500 (Illumina). An average of 100 million 100 nt reads per sample was generated. Sequencing reads were pre-processed to remove the adapter sequences from the end of reads using the program fastx_clipper from FASTX-Toolkit (Hannon Lab). Low-quality sequences at the ends of reads, as represented by stretches of ‘#’ in the quality score string in the FASTQ file, were also removed. Reads shorter than 35 nt after trimming were excluded from the analysis. Sequencing reads were then aligned to human reference hg18 using GSNAP (Version 2013-10-28) (20) with the following parameters: mismatches ≤ [(read length + 2)/12 – 2]; mapping score ≥20; soft-clipping on (-trim-mismatch-score = –3). Reads with identical sequences were compressed into one unique sequence. BigWig tracks were computed using bedtools and converted to hg19 coordinates using CrossMap.
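As a worked example of the read filters above, the sketch below evaluates the mismatch allowance for a given read length and applies the 35-nt length cutoff. It assumes that the square brackets in the threshold denote rounding down to an integer, which the text does not state; the function names are illustrative and are not part of GSNAP.

```python
import math

MIN_READ_LENGTH = 35  # reads shorter than this after trimming were excluded


def max_mismatches(read_length: int) -> int:
    """Mismatch allowance (read length + 2)/12 - 2, rounded down (assumed)."""
    return max(0, math.floor((read_length + 2) / 12 - 2))


def keep_read(trimmed_length: int) -> bool:
    """Apply the minimum-length filter to a trimmed read."""
    return trimmed_length >= MIN_READ_LENGTH


for length in (30, 35, 50, 100):
    print(length, keep_read(length), max_mismatches(length))
```

Under this reading, an untrimmed 100-nt read is allowed up to 6 mismatches, and a read trimmed to the 35-nt minimum is allowed 1.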
Genetic analysis
Genetic analysis of APOE expression was carried out using the eQTL calculator in GTEx (https://gtexportal.org/home/testyourown). We tested the allelic association of two SNPs in AANCR, rs449647 and rs405509, with APOE expression in the brain, colon, liver, and testis; each SNP and expression pair was tested individually. The nominal P-values (<0.05) are shown in Supplementary Table S1. The allelic association of rs449647 with APOE plasma level was obtained from a previous publication (21). Association of rs449647 and rs405509 with Alzheimer's disease was analyzed by entering each SNP ID individually in the search field of the GWAS Catalog (https://www.ebi.ac.uk/gwas/) and of the NIA Genomics of Alzheimer's disease (https://www.niagads.org/genomics/home.jsp); representative results are shown in Supplementary Table S2.
iPSC hepatocyte differentiation
Hepatic differentiation of human pluripotent stem cells was performed following a three-step protocol (22). First, iPSCs at 60–70% confluence were treated for 3 days with 100 ng/ml Activin A (R&D Systems, Cat# 338-AC) and 100 ng/ml bFGF (R&D Systems, Cat# 3718-FB) in the presence of increasing levels of FBS (0% on day 1, 0.2% on day 2, and 2% on day 3) to generate definitive endoderm. Confluent definitive endoderm cells were then passaged 1:3 in the presence of ROCK inhibitor (Tocris, Cat# 1254) on growth factor-reduced Matrigel (BD Biosciences, Cat# 354277) and cultured for 8 days in differentiation medium (DMEM F12, 10% KOSR (Sigma, Cat# 10828010), 1% NEAA (Thermo Fisher, Cat# 11140050), 1% glutamine, 100 ng/ml HGF (Peprotech, Cat# 100–39H) and 1% DMSO (Sigma Aldrich, Cat# D2650)) to promote hepatic specification. Finally, the hepatoblasts were matured in DMSO-free differentiation medium with 10⁻⁷ M dexamethasone (Sigma Aldrich, Cat# D8893) for 3 days. Hepatocytes were then maintained for up to 1 week in hepatocyte culture medium: L15 medium (Cat# 11415064), 8.4% FCS with 1% glutamine, 10% tryptose phosphate (Life Technologies, Cat# 18050039) containing 1 μM insulin (Sigma, Cat# I3536), 10 μM hydrocortisone (Sigma, Cat# H0888) and 0.1 μM dexamethasone.
MPG electrophoretic mobility shift assay
RNA/DNA hybrid substrates (see Supplementary methods) containing a single N6-methyl-adenosine within the RNA strand were incubated with increasing concentrations of human recombinant MPG. MPG protein was purified as described previously (23). Binding incubations were performed on ice for 15 min in buffer containing 20 mM Tris–HCl, pH 8.8, 10 mM (NH4)2SO4, 10 mM KCl, 2 mM MgSO4, 0.1% Triton X-100 and a range of 0–800 nM of MPG. The binding mixtures were immediately subjected to non-denaturing 6% polyacrylamide gel (acrylamide:bis-acrylamide, 37.5:1) electrophoresis. To maintain the integrity of bound complexes during electrophoresis, the gel was run at 4°C. The fraction bound relative to the total signal was then determined and plotted using Kaleidagraph v.4.1. The data from ≥3 binding experiments were fitted to a modified Hill binding equation, where the fraction of substrate bound, f, is related to the Kd as described (24):
$$f = f_{\min} + \left(f_{\max} - f_{\min}\right)\frac{[E]^{n}}{K_d^{\,n} + [E]^{n}}$$
where fmax and fmin are normalization factors that represent the fraction of substrate bound at the highest and lowest asymptotes of the titration, [E] is the total enzyme concentration, and n is the Hill coefficient. The Hill coefficient measures the cooperativity of binding. Kd values were estimated from the fitted data.
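To make the roles of these parameters concrete, the sketch below evaluates a Hill-type binding curve using the symbols defined above. The functional form is an assumption (a standard Hill expression rescaled between fmin and fmax), and the parameter values are hypothetical; the curve fitting itself, which was done in Kaleidagraph, is not reproduced here.

```python
def fraction_bound(e_conc: float, kd: float, n: float,
                   f_min: float = 0.0, f_max: float = 1.0) -> float:
    """Fraction of substrate bound at total enzyme concentration e_conc.

    Assumed form: f = f_min + (f_max - f_min) * E^n / (Kd^n + E^n).
    """
    if e_conc < 0 or kd <= 0 or n <= 0:
        raise ValueError("e_conc must be >= 0; kd and n must be > 0")
    occupancy = e_conc ** n / (kd ** n + e_conc ** n)
    return f_min + (f_max - f_min) * occupancy


# Hypothetical parameters (illustrative only): Kd = 100 nM, n = 2.
titration_nm = [0, 25, 50, 100, 200, 400, 800]   # spans the 0-800 nM range used
curve = [fraction_bound(e, kd=100.0, n=2.0) for e in titration_nm]
```

In this form, the fraction bound sits halfway between fmin and fmax when [E] equals Kd, and n > 1 makes the transition steeper, which is how the Hill coefficient reflects cooperativity.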
MPG enzymatic activity assay
RNA/DNA hybrid substrates (see Supplementary methods) containing a single N6-methyl-adenosine were formed by heating to 90°C for 1 min and slowly cooling to 4°C in a buffer containing 30 mM Tris, pH 7.5, and 100 mM potassium acetate. The strand containing the methylated adenosine was 5′-end labeled with fluorescein (FAM). Annealed substrates (200 nM) were incubated with (+) or without (–) 20 units of MPG in a reaction buffer containing 20 mM Tris–HCl, pH 8.8, 10 mM (NH4)2SO4, 10 mM KCl, 2 mM MgSO4, 0.1% Triton X-100 for 60 min at 37°C. Substrates were then treated with 1 μM APE1 (Wilson lab) for an additional 2 min at 37°C. Enzymes were subsequently inactivated by a 5-min incubation at 75°C. Samples were then denatured with formamide (1:1 vol/vol) and a 2-min incubation at 95°C before loading onto a 7 M urea 15% denaturing polyacrylamide gel.
MPG RIP-seq
For MPG RIP-seq, following isolation of RNA, sequencing libraries were prepared using TruSeq Stranded Total RNA Library Prep Kit (Illumina). Sequencing was performed on Illumina HiSeq 2500. Sequencing adapters and low-quality read ends were trimmed using the FASTX-Toolkit, and reads shorter than 35 nt were discarded. Reads were mapped to GRCh37 (hg19) using GSNAP (v20190912) with parameters ‘-B 4 -N 1 -M 1 -n 10 -Q --max-mismatches=5’. Reads with identical sequences were collapsed to a single read. Input read depth was smoothed using a 1000 nt window, and MPG peaks were called by computing the ratio of RIP coverage to smoothed input coverage at each base. Regions larger than 50 bases with ratio ≥5 (represented by three or more reads) were considered RIP peaks.
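The peak-calling rule above can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: the smoothing is taken to be a centered moving average, the smoothed input is floored at a hypothetical minimum of 1 to avoid division by zero, and "three or more reads" is interpreted as RIP coverage of at least 3 at each base of the region. The coverage arrays are synthetic.

```python
from typing import List, Tuple


def smooth(depth: List[float], window: int = 1000) -> List[float]:
    """Centered moving average; windows are truncated at the array edges."""
    prefix = [0.0]
    for d in depth:
        prefix.append(prefix[-1] + d)
    half = window // 2
    out = []
    for i in range(len(depth)):
        lo = max(0, i - half)
        hi = min(len(depth), i + half + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out


def call_peaks(rip: List[float], input_depth: List[float],
               min_ratio: float = 5.0, min_len: int = 51,
               min_reads: float = 3.0, window: int = 1000,
               input_floor: float = 1.0) -> List[Tuple[int, int]]:
    """Return half-open (start, end) runs that pass the ratio, depth and length rules."""
    smoothed = smooth(input_depth, window)
    peaks: List[Tuple[int, int]] = []
    start = None
    for i, (r, s) in enumerate(zip(rip, smoothed)):
        passing = r >= min_reads and r / max(s, input_floor) >= min_ratio
        if passing and start is None:
            start = i                      # open a new candidate region
        elif not passing and start is not None:
            if i - start >= min_len:       # keep regions larger than 50 bases
                peaks.append((start, i))
            start = None
    if start is not None and len(rip) - start >= min_len:
        peaks.append((start, len(rip)))
    return peaks


# Synthetic coverage: one 100-base enriched region and one 30-base region.
input_cov = [1.0] * 2000
rip_cov = [0.0] * 2000
rip_cov[500:600] = [20.0] * 100
rip_cov[1000:1030] = [20.0] * 30

peaks = call_peaks(rip_cov, input_cov)
```

In this synthetic example only the 100-base region is reported, because the 30-base region fails the length requirement even though its ratio is high.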
m6A RIP-seq
For m6A RIP-seq, following isolation of RNA, sequencing libraries were prepared using TruSeq Stranded Total RNA Library Prep Kit (Illumina). Sequencing was performed on Illumina MiSeq. Sequencing adapters and low-quality read ends were trimmed using the FASTX-Toolkit, and reads shorter than 35 nt were discarded. Reads were mapped to GRCh37 (hg19) using GSNAP (v20190912) with parameters ‘-B 4 -N 1 -M 1 -n 10 -Q’; no maximum-mismatch limit of 5 was set. Reads with identical sequences were collapsed to a single read, then bigWig files were generated and plotted as a heatmap using deepTools 3.5.1.
RNA-sequencing
Sequencing libraries were prepared from cells treated with siRNA against MPG or non-target siRNA using the TruSeq Stranded Total RNA Library Prep Kit (Illumina). Sequencing was performed on Illumina HiSeq 2500 and >150 million 100-nt reads were generated from each sample. Low-quality bases were trimmed from the 3′ end of reads and the 3′ adapter was trimmed using FASTQ/A Clipper with default settings (Hannon lab). Reads shorter than 35 nt were excluded from analysis. Sequencing reads were aligned to the human reference (hg19) using GSNAP (v20190912) (Wu and Nacu, 2010) with the following parameters: mismatches ≤ [(read length + 2)/12 – 2]; mapping score ≥ 20; soft-clipping on (-trim-mismatch-score = –3).
Precision run-on sequencing (PRO-seq)
PRO-seq libraries were prepared as described previously (25,26). Fibroblast nuclei (5 × 10⁶) were added to 2× nuclear run-on (NRO) reaction mixture (final concentrations: 10 mM Tris–HCl pH 8.0, 300 mM KCl, 1% sarkosyl, 5 mM MgCl2, 1 mM DTT, 0.03 mM each of biotin-11-A/C/G/UTP (Perkin-Elmer, Cat# NEL544001EA, Cat# NEL543001EA, Cat# NEL542001EA and Cat# NEL545001EA), 0.8 u/μl RNase inhibitor) and incubated for 3 min at 37°C. Nascent RNA was extracted by phenol (Trizol LS)/chloroform and then fragmented by base hydrolysis in 0.2 N NaOH on ice for 15 min. The reaction was neutralized by adding 0.7× volume of 1 M Tris–HCl pH 6.8. The fragmented nascent RNA was purified using 30 μl of Streptavidin M-280 magnetic beads (Thermo Fisher Scientific, Cat# 11206D) and ligated with 3′ RNA adapter (5′p-GAUCGUCGGACUGUAGAACUCUGAAC-/3InvdT/). Biotin-labeled products were recovered by streptavidin beads. The RNA products were successively treated with 5′ pyrophosphohydrolase (NEB, Cat# M0356) and polynucleotide kinase (NEB, Cat# M0201) to repair the 5′ end. RNA was ligated to the 5′ RNA adapter (5′-CCUUGGCACCCGAGAAUUCCA-3′). The products were further purified by the streptavidin beads. RNA was reverse transcribed using RT primer (5′-AATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACAGTCCGA-3′). The product was PCR amplified, and the resulting amplicons between 150 and 250 bp (insert > 70 bp) were purified by agarose gel electrophoresis on the BluePippin (Sage Science) and then sequenced on the HiSeq 2500 instrument (Illumina) to a depth of >150 million reads per sample. PRO-seq data were aligned to the human genome using GSNAP (20); results corresponding to GRCh37 (hg19) are shown. BAM files were generated and normalized to reads per million mapped reads (RPM). For comparison between conditions, RPM-normalized signal was plotted for the AANCR-APOE locus.
The pausing index (PI) was calculated as the ratio of the read density in the 50 nt interval of AANCR with the greatest number of reads to the read density over the rest of the noncoding transcript. A region with a PI > 3 was considered to have paused polymerase.
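The pausing index defined above can be sketched as follows. The sketch assumes per-base PRO-seq read counts along one transcript and non-overlapping 50-nt windows; whether the intervals were tiled or slid along the transcript is not specified in this paragraph, and the coverage profile below is synthetic.

```python
from typing import Sequence

PAUSE_THRESHOLD = 3.0  # a PI above this value is considered a pause


def pausing_index(per_base_reads: Sequence[float], window: int = 50) -> float:
    """Read density of the densest window divided by the density elsewhere."""
    n = len(per_base_reads)
    if n <= window:
        raise ValueError("transcript must be longer than one window")
    total = sum(per_base_reads)
    # Non-overlapping windows; a trailing partial window is ignored.
    best = max(sum(per_base_reads[i:i + window])
               for i in range(0, n - window + 1, window))
    rest_density = (total - best) / (n - window)
    if rest_density == 0:
        return float("inf")
    return (best / window) / rest_density


# Synthetic profile: uniform coverage of 1 with an 8-fold pile-up at 200-250.
profile = [1] * 1000
profile[200:250] = [8] * 50

pi = pausing_index(profile)
is_paused = pi > PAUSE_THRESHOLD
```

Because the densest window is excluded from the denominator, the index reflects how far the pile-up stands above the background over the remainder of the transcript.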
RNA immunoprecipitation
Primary human fibroblasts (5 × 10⁶ cells per experiment) were treated with lysis buffer (10 mM Tris–HCl pH 7.4, 10 mM NaCl, 0.5% NP-40, 1 mM DTT, 200 units/ml RNase OUT and EDTA-free protease inhibitor cocktail), and the lysate was mixed with a freshly made preparation of protein A/G magnetic agarose beads bound to anti-MPG, anti-m6A or anti-METTL3 antibody. Protein–RNA complexes in the cell lysates were allowed to bind to their respective antibody–bead preparation, followed by proteinase K treatment of the immunoprecipitates, RNA extraction and DNase treatment of the extracted RNA. Samples were converted to cDNA using random hexamers and enrichment was assessed by qPCR.
Elongating RNA Pol II pausing in noncoding transcripts
To identify noncoding transcripts with paused RNA Pol II, we used data from the DBKERO TSS-seq database (Release 1.2.7), PRO-cap and corresponding PRO-seq (26,27). A noncoding transcript was defined as one that has a TSS and nascent RNA (from PRO-seq) and is located at least 200 bases from an annotated coding gene.
For each noncoding transcript, we assessed evidence of paused RNA Pol II. Each transcript was divided into 50-nucleotide windows and the average PRO-seq read count in each window was determined. Paused RNA Pol II was identified when the average count in a 50-nucleotide window was at least three times greater than the average read count in the remaining transcript. Pauses in 50-nucleotide windows at least 500 bases from the TSS were considered to be RNA Pol II pauses in gene bodies (during elongation). We then identified the gene body pauses that overlap R-loops (annotated consensus locations from RLBase (28), version 1.0.1) and MPG binding sites (MPG RIP-seq). The regions in B-cells, fibroblasts and renal proximal tubule cells where RNA Pol II pauses in gene bodies and colocalizes with R-loops and MPG binding are reported.
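The filtering and overlap steps in this analysis can be sketched with simple interval logic. The coordinates below are hypothetical, the interval convention (0-based, half-open) is an assumption, and the helper names are illustrative; the actual analysis used RLBase consensus R-loop locations and MPG RIP-seq peaks.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def is_gene_body(pause: Interval, tss_pos: int, min_dist: int = 500) -> bool:
    """True if the whole pause window lies at least min_dist bases from the TSS."""
    nearest_edge = min(abs(pause.start - tss_pos), abs(pause.end - 1 - tss_pos))
    return nearest_edge >= min_dist


def colocalized(pauses: List[Interval], rloops: List[Interval],
                mpg_sites: List[Interval]) -> List[Interval]:
    """Keep pauses that overlap both an R-loop and an MPG binding site."""
    return [p for p in pauses
            if any(overlaps(p, r) for r in rloops)
            and any(overlaps(p, m) for m in mpg_sites)]


# Hypothetical coordinates (illustrative only).
tss = 1000
pause_windows = [Interval("chr19", 1100, 1150),   # too close to the TSS
                 Interval("chr19", 1600, 1650),
                 Interval("chr19", 3000, 3050)]
rloop_regions = [Interval("chr19", 1620, 2200)]
mpg_peaks = [Interval("chr19", 1500, 1640), Interval("chr19", 2990, 3100)]

body_pauses = [p for p in pause_windows if is_gene_body(p, tss)]
hits = colocalized(body_pauses, rloop_regions, mpg_peaks)
```

Only the window at 1600–1650 survives in this example: it lies more than 500 bases from the TSS and overlaps both an R-loop and an MPG peak, whereas the window at 3000–3050 overlaps an MPG peak but no R-loop.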
siRNA knockdown
Primary fibroblasts were seeded at 2 × 10⁵ per well in 6-well dishes. Cells were transfected with siRNA using Lipofectamine RNAiMax to a final concentration of 12 nM on day 0 and day 3. Cells were harvested for expression or protein analysis 3 and/or 7 days post-transfection. Catalog numbers for siRNA targeting MPG, METTL3 and control siRNA are listed in Supplementary methods.
S9.6 dot blot
S9.6 dot blot to assess genome-wide R-loop abundance was carried out as before (29). Briefly, genomic DNA containing R-loops was incubated with 1 μl of RNase H1 or mock digestion in 1× RNase H reaction buffer (10 mM Tris–HCl pH 8.0, 50 mM NaCl, 10 mM MgCl2, 10 mM DTT) at 37°C for 30 min. DNA was phenol extracted, ethanol precipitated, and reconstituted in 10 μl TE buffer. DNA solution (5 μl) was loaded onto Hybond N+ nylon membranes (GE Life Sciences, Cat# RPN203B) presoaked with PBS, and crosslinked in a UV Stratalinker 2400 (Stratagene) at the ‘Auto Crosslink’ setting (1200 μJ × 100). The membrane was blocked in 5% milk in PBS–0.1% Tween-20 for one hour and incubated with 1:1000 S9.6 antibody overnight at 4°C to detect RNA/DNA hybrids. A duplicate blot was incubated with anti-dsDNA antibody (Abcam, Cat# ab27156) as a loading control. Signals were then detected by horseradish peroxidase (HRP)-conjugated secondary antibody and enhanced chemiluminescence. The S9.6 signals normalized to dsDNA signals were determined.
Secreted APOE detection
To measure secreted APOE, cell culture media was concentrated 10-fold using an Amicon ultra centrifugal device (Sigma). Concentrated media (500 μl) was mixed with an equal volume of 2× RIPA buffer with protease inhibitor (Sigma, Cat# 11836170001) and PMSF (Sigma, Cat# P7626) and rotated overnight with 10 μg APOE antibody (Sigma, Cat# AB947) or IgG (Sigma, Cat# I5006). Antibody–protein complexes were recovered with protein A/G beads (Fisher, Cat# PI78609) after 2 h of rotation at 4°C and three washes in 1× RIPA. Bound protein was released in 30 μl sample loading buffer, and western blot was performed by standard procedure. APOE was detected with 1:1000 primary antibody (Abcam) and 1:5000 secondary antibody. Ponceau (Sigma, Cat# P3504) staining of the western membrane was used as the loading control.
White blood cell isolation
White blood cells (WBCs) were obtained from a subject who received clinical evaluations at the National Institutes of Health (NIH) in Bethesda, MD under IRB-approved protocol 00-N-0043 ‘Clinical and Molecular Manifestations of Inherited Neurological Disorders.’ Written informed consent was received from the participant before inclusion in the study. Venous blood was collected in a 10 ml lavender-top K2EDTA tube. RBC lysis solution (30 ml; Qiagen) was added to 10 ml of whole blood and mixed by inverting 10 times. The sample was incubated for 5 min at room temperature. WBCs were pelleted by centrifugation for 2 min at 2000 × g.
Statistics
This study included various statistical approaches, which are detailed in the appropriate subsections of the methods. Publicly available sequencing data that support the conclusions of the study are listed in Supplementary methods. The deep sequencing data reported in this paper have been deposited in the NCBI sequence read archive (PRJNA801792) and the NCBI database of Genotypes and Phenotypes archive (Phs001322.v2.p1). The number of biologically independent experiments, sample size, statistical tests, and P-values are indicated in the main text or the figure legends. The significance level was set at P < 0.05.
RESULTS
Methylpurine glycosylase knockdown induces APOE expression
To study RNA abasic sites, we knocked down MPG by RNA interference in skin fibroblasts and carried out RNA sequencing, which showed that the most highly induced gene is APOE. APOE is not expressed, or is expressed at a very low level, in skin fibroblasts, but when MPG is knocked down, APOE gene expression is highly induced (Figure 1A, >50-fold; Supplementary Figure S1A). The immunoblot shows that APOE protein expression is also upregulated significantly (Figure 1B, P << 0.001, >20-fold). The resultant APOE is then secreted, as the APOE protein level is significantly higher (Figure 1C, P << 0.001, >50-fold) in the cell culture media of fibroblasts in which MPG is knocked down. We also knocked down MPG in lung epithelial cells and found that APOE protein expression is induced in these cells as well (Supplementary Figure S1B). Since APOE is not expressed, or is expressed at a very low level, in skin fibroblasts, the increase in APOE expression is most likely a result of increased transcription. To measure RNA polymerase II (RNA Pol II) transcription, we carried out Precision nuclear Run-On sequencing (PRO-seq), which identifies nascent RNA with actively transcribing RNA Pol II. The PRO-seq results show that upon MPG knockdown, there is indeed an increase in the abundance of RNA Pol II in the APOE promoter and active transcription of APOE (Figure 1D). Thus, MPG knockdown induces transcription and expression of APOE.
Figure 1.
APOE is regulated by MPG. (A) Representative RNA-seq showing APOE expression in fibroblasts before and after MPG knockdown (days 3 and 7). Y-axis is RPM (N = 2). (B) Immunoblot of MPG and APOE expression before and after MPG knockdown (day 7). GAPDH is a loading control (N > 5). (C) Densitometry quantification of immunoblots of secreted APOE protein before and after MPG knockdown (day 7) (N = 3, P << 0.001, t-test, error bars = S.E.M.). NTC is the non-target control. (D) PRO-seq results for APOE before and after MPG knockdown (day 7) are plotted. The bar at each nucleotide location represents the abundance of RNA Pol II. APOE transcription is induced after MPG knockdown. Y-axis is RPM (N = 2).
R-loops form upstream of APOE and regulate its expression
Our previous study showed that MPG forms abasic sites in the RNA of RNA/DNA hybrids but not in double-stranded RNA (18). Here, we asked whether MPG affects APOE expression through binding to R-loops. Using the S9.6 antibody that specifically recognizes R-loops (30–32), we carried out DNA–RNA immunoprecipitation (DRIP) followed by sequencing to enrich and map R-loops. Since APOE is not transcribed, or is transcribed at a very low level, in fibroblasts, there was no RNA to form R-loops in APOE; rather, we detected R-loops upstream of APOE (Figure 2A, top panel), in an intergenic region. This upstream region is transcribed by RNA Pol II until the polymerases pause (pausing index = 7.5), just 5′ to the R-loops (Figure 2A, bottom panel). To assess whether the R-loops or the negative elongation factor (NELF) protein complex pauses RNA Pol II transcription (33), we carried out immunoprecipitation against NELFA. In NELFA-IP followed by PCR, we did not detect NELFA binding to the region upstream of APOE, whereas NELFA binding was readily detected at the known NELF-mediated RNA Pol II pause site in the HSP70 promoter (Figure 2B). This suggests that in the region upstream of APOE, RNA Pol II transcription is paused by R-loops.
Figure 2.
Upstream R-loops regulate APOE expression. (A) Average S9.6 DRIP-seq data in fibroblasts (N = 5) are plotted (top panel) and show R-loops upstream (5′) of APOE. Average PRO-seq results in fibroblasts (N = 5) are plotted (bottom panel), showing active transcription in the intergenic region upstream of APOE, and RNA Pol II pausing upstream of the R-loops (pausing index = 7.5). Y-axis is RPM. Arrows indicate qPCR primers used in subsequent figures. Genomic coordinates (hg19) are indicated. (B) Data from NELFA chromatin immunoprecipitation followed by qPCR with primers indicated by arrows in (A) are shown as fold enrichment over IgG in arbitrary units (a.u.) (N = 2, error bars are S.E.M). (C) APOE expression levels in fibroblasts from family-control and ALS4 patients are plotted, each dot represents the APOE expression of one person (*P < 0.05, t-test).
We wondered if, in this region, the R-loops affect APOE expression. We examined APOE expression in ALS4 patient cells. ALS4 is caused by heterozygous senataxin mutations that lead to a hyperactive senataxin (RNA/DNA helicase) that reduces R-loops without affecting transcription initiation (7,8). We compared APOE expression in fibroblasts from ALS4 patients and their family controls. We found that on average, ALS4 patients with fewer R-loops (8) have higher APOE expression (Figure 2C). While APOE is not expressed in most of the fibroblasts from controls, it is expressed in many of the fibroblasts from ALS4 patients. Thus, cells with fewer stable R-loops have higher APOE expression. Like MPG expression, R-loop abundance is negatively correlated with APOE expression.
RNA abasic sites form in the R-loops and pause RNA Pol II transcription upstream of APOE
Next, we asked whether MPG binds to the R-loops near APOE and, if so, whether RNA abasic sites are generated. We carried out MPG RNA-IP and ARP RNA-pulldown. By S9.6 DRIP-PCR, we first confirmed the presence of the R-loops upstream of APOE shown in Figure 2A (Figure 3A, left panel, Supplementary Figure S1C). MPG RNA-IP then showed that MPG binds to the RNA in those R-loops (Figure 3A, middle panel, Supplementary Figure S1D), and ARP RNA-pulldown (Figure 3A, right panel, Supplementary Figure S1E) showed abasic sites in the RNA. Together, the results show that MPG binds to the R-loops upstream of APOE and forms RNA abasic sites.
Figure 3.
Upstream of APOE, RNA abasic sites stabilize R-loops. (A) Data from S9.6 DRIP, MPG RNA-IP, and ARP RNA-pulldown followed by PCR are shown as fold enrichment compared to input in arbitrary units (a.u.). Primers corresponding to the R-loops are shown in Figure 2A. S9.6 DRIP-PCR confirmed the location of the R-loops (N = 3, ***P < 0.001, t-test, error bars = S.E.M.), MPG RIP-PCR shows binding of MPG to RNA of the R-loops (N = 3, ****P < 0.0001; t-test, error bars = S.E.M.), and ARP RNA-pulldown-PCR shows abasic sites in the RNA of the R-loops (N = 2, ***P < 0.001; t-test, error bars = S.E.M.). (B) R-loop abundance measured by S9.6 dot blot (see Materials and Methods, Supplementary Figure S2A–C) shows a dose-dependent increase of R-loops at 24 h following treatment with estrogen (P < 0.0001; ANOVA, error bars = S.E.M.) and a time-dependent increase following 100 nM estrogen treatment (P < 0.05; ANOVA, error bars = S.E.M.). (C) RNA abasic site abundance as measured by ARP-labeling shows a dose-dependent (N = 3, ****P < 0.0001; ANOVA, error bars = S.E.M.) and time-dependent (N = 3, *P < 0.05; ANOVA, error bars = S.E.M.) increase following treatment with estrogen. Y-axis is fold-enrichment relative to control. (D) R-loop abundance was measured by S9.6 dot-blot in fibroblasts treated with scrambled siRNA (nontarget control, NTC) and siRNA against MPG; the cells were then given estrogen (0–100 nM) (N = 2, error bars = S.E.M.). R-loops do not accumulate in response to estrogen when MPG is knocked down. (E) More R-loops are present upstream of APOE as measured by S9.6 DRIP-PCR in fibroblasts following estrogen treatment (100 nM). Location of the assessed R-loops is shown in the schematic and corresponds to the R-loops in Figure 2A (N = 3; ****P < 0.0001, t-test, error bars = S.E.M.).
To further examine the relationship between R-loops and RNA abasic sites, we induced R-loops with estrogen (34). In primary fibroblasts, estrogen significantly (P < 0.0001) increases R-loops (Figure 3B, Supplementary Figure S2A–C) and RNA abasic sites (Figure 3C) in a dose and time-dependent manner genome-wide. Since RNA abasic sites form on R-loops and given that estrogen increases R-loops, it is not surprising that the abundance of RNA abasic sites also increases following estrogen. To assess if there is a reciprocal relationship between R-loops and RNA abasic sites, we knocked down MPG and treated those fibroblasts with estrogen. Results show that when MPG is knocked down, R-loops do not accumulate in response to estrogen (Figure 3D), indicating that the stability of R-loops may depend on RNA abasic sites.
The estrogen treatment increases R-loops genome-wide, including upstream of APOE (Figure 3E). We assessed the effect of the stabilized R-loops on transcription upstream of APOE by performing PRO-seq in fibroblasts at different timepoints following estrogen treatment. The estrogen-induced R-loops led to increased pausing of RNA Pol II in the intergenic region, and we observed a time-dependent increase in RNA Pol II pausing (pausing index increased from 4.8 in resting cells to 11.8 after 6 h in estrogen; Supplementary Figure S2D).
Together, these results show that upstream of APOE, nascent RNA forms R-loops, and MPG then generates RNA abasic sites, which likely stabilize the R-loops that in turn pause transcription of the intergenic RNA.
m6A is likely a precursor to RNA abasic site in R-loops
We next asked what attracts MPG to the RNA of the R-loops upstream of APOE. Since MPG is a methylpurine glycosylase, we focused on methylated purines. In Modomics (3), there are 50 different types of methylated purines in RNA. Among them, m6A is the most abundant (35–38). Given their abundance and based on findings from previous studies that identified m6A in R-loops (39–41), we focused on m6A to ask if they are present in the RNA in the R-loops upstream of APOE and if so, whether they are a target of MPG.
We began by searching that RNA for the DRACH motif (D = A/G/U, R = A/G, H = A/C/U) (42,43), where methyltransferases such as METTL3 and METTL14 methylate adenosines, and found it (Supplementary Figure S3A). We then carried out m6A RNA-IP, which identified m6As in the RNA upstream of APOE (Figure 4A, Supplementary Figure S3B); they coincide with where MPG binds (Figure 3A, middle panel). The colocalization data suggest that m6As in the R-loops may be the substrate of MPG and therefore the precursor to RNA abasic sites.
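A motif search of this kind can be sketched in a few lines of Python. This is an illustration only, not the procedure used in the study: the function name `find_drach` and the example sequence are hypothetical, and the only element taken from the text is the DRACH definition (D = A/G/U, R = A/G, H = A/C/U).

```python
import re

# DRACH consensus as defined in the text: D = A/G/U, R = A/G, H = A/C/U.
# The zero-width lookahead lets overlapping motifs be reported as well.
DRACH = re.compile(r"(?=([AGU][AG]AC[ACU]))")


def find_drach(rna):
    """Return (0-based start, motif) for every DRACH match in a sequence."""
    rna = rna.upper().replace("T", "U")  # also accept DNA-style input
    return [(m.start(), m.group(1)) for m in DRACH.finditer(rna)]


# Hypothetical sequence for illustration (not the AANCR sequence).
example = "CCGGACUUUAAACAGG"
for start, motif in find_drach(example):
    # The candidate methylated adenosine is the third base of the motif.
    print(f"motif {motif} at position {start}, candidate m6A at {start + 2}")
```

A match only marks a candidate site; whether the adenosine is actually methylated has to be established experimentally, as was done here by m6A RNA-IP.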
Figure 4.
m6A is the precursor to RNA abasic site. (A) m6A in the R-loops upstream of APOE are identified by m6A-RIP followed by PCR with primers shown in the schematic, which correspond to the R-loops in Figure 2A (N = 3, ***P < 0.001; t-test, error bars = S.E.M.). m6A enrichment is shown as fold-over input in arbitrary units (a.u.). See also Supplementary Figure S3A. (B) Fewer R-loops upstream of APOE (N = 3, ****P < 0.0001; t-test, error bars = S.E.M.) and (C) higher APOE expression levels (N = 3, *P < 0.05; t-test, error bars = S.E.M.) in cells treated with siRNA targeting METTL3. Y-axis is fold-expression relative to non-target control (NTC). (D) MPG expression in fibroblasts with and without METTL3 siRNA knockdown as measured by RT-PCR (N = 2, P > 0.5, error bars = S.E.M.). (E) MPG binding to m6ARNA/DNA hybrids (top; Kd = 460 ± 22 nM), RNA/m6ADNA hybrids (middle; Kd = 562 ± 51 nM), or unmodified RNA/DNA hybrids (bottom; Kd >> 661 ± 89) was measured by gel shift, and the data were fitted (see Materials and Methods) to the modified Hill equation (N = 3, error bars = S.D.).
If m6A is a substrate of MPG, then the METTL3/METTL14 complex that methylates adenosine to form m6A (44,45) should also regulate APOE expression. We knocked down METTL3 (Supplementary Figure S3C) and measured the abundance of the R-loops upstream of APOE and APOE expression. In cells whose METTL3 was knocked down, we found significantly (P < 0.001) fewer R-loops (Figure 4B) and significantly (P < 0.05) higher APOE expression (Figure 4C) without affecting the expression of MPG (Figure 4D). Hence, METTL3 knockdown phenocopies MPG knockdown, and METTL3 likely acts upstream of MPG in the regulation of APOE.
We further assessed if m6A is a substrate of MPG by biochemical assays. In our previous study (18), we knocked down MPG and measured RNA abasic sites by mass spectrometry which showed that RNA abasic sites decreased significantly. We found that MPG excised the hypoxanthine in RNA/DNA hybrids when inosine was in the RNA strand while MPG activity was minimal in the RNA/DNA hybrid without inosine (18). We then mapped the cleavage site with AP-endonuclease 1 which incised at the RNA abasic sites of RNA/DNA hybrid but not in double-stranded RNA (18). Here, we extended those findings and assessed for binding and cleavage of the same RNA/DNA hybrid as in the previous study (18) except the RNA strand contains an m6A and not an inosine. In electromobility shift assays, we found that MPG has a stronger affinity for the hybrid with m6A RNA than its affinity for the hybrid with m6A DNA, or unmodified RNA/DNA hybrid (Figure 4E, and Supplementary Figure S3D). We then assessed if MPG cleaves the m6A by incubating the RNA/DNA hybrid with MPG followed by AP endonuclease 1 (18) which incises the sugar-phosphate backbone at abasic sites. The results showed incision of the RNA/DNA hybrid with m6A in the RNA, but there was no incision of the hybrid with m6A in the DNA strand nor the RNA/DNA hybrid with no modification (Supplementary Figure S3E, F). This suggests that m6A is removed by MPG to form an RNA abasic site. Taken together the genetic and biochemical findings point to m6A as a likely precursor of RNA abasic site in the R-loop upstream of APOE.
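To make the affinity comparison concrete, the sketch below evaluates the fraction of hybrid bound at a given MPG concentration using the standard Hill form, θ = [P]^n / (Kd^n + [P]^n). This is an assumption for illustration: the modified Hill equation used for the fits is described in Materials and Methods and is not reproduced here, and the Hill coefficient n = 1 and the 500 nM concentration are hypothetical choices. Only the two Kd values come from Figure 4E.

```python
def fraction_bound(conc_nM, kd_nM, hill_n=1.0):
    """Standard Hill binding curve: fraction of substrate bound by protein.

    conc_nM : protein concentration (nM)
    kd_nM   : apparent dissociation constant (nM)
    hill_n  : Hill coefficient (assumed 1.0 here, i.e. non-cooperative)
    """
    if conc_nM < 0 or kd_nM <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return conc_nM ** hill_n / (kd_nM ** hill_n + conc_nM ** hill_n)


# Kd values reported in Figure 4E; the 500 nM concentration is hypothetical.
kd_m6a_rna = 460.0  # m6A in the RNA strand
kd_m6a_dna = 562.0  # m6A in the DNA strand
for label, kd in [("m6A-RNA/DNA", kd_m6a_rna), ("RNA/m6A-DNA", kd_m6a_dna)]:
    print(f"{label}: fraction bound at 500 nM = {fraction_bound(500.0, kd):.2f}")
```

Under this simple form, a lower Kd means a larger bound fraction at any fixed protein concentration, which is the sense in which MPG has a stronger affinity for the hybrid carrying m6A in the RNA strand.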
AANCR, a non-coding RNA upstream of APOE
Thus far, we have discussed the R-loops that form upstream of APOE, without characterizing the RNA. Our PRO-seq data show that the intergenic region is more actively transcribed than the upstream gene, Translocase of Outer Mitochondrial Membrane 40 (TOMM40), and the downstream gene, APOE. There are more RNA Pol IIs in the intergenic region than in TOMM40 and APOE; thus, the RNA is likely an independent transcript and not part of TOMM40 or APOE (Supplementary Figure S4A). With data from Coordinated Precision Run-on sequencing (CoPRO; Figure 5A top panel) (46) and PRO-cap (Supplementary Figure S4B top panel) (47,48) that map the 5′cap of RNA, we identified a capped RNA corresponding to the intergenic transcript. We then confirmed that the capped RNA coincides with the nascent RNA identified by PRO-seq (Figure 5A, Supplementary Figure S4B bottom panel). We searched for a polyadenylation motif and did not find one in this RNA.
Figure 5.
AANCR functions as an R-loop-dependent enhancer. (A) CoPRO identified capped RNA in the intergenic region upstream of APOE, the cap coincides with nascent transcription from PRO-seq. The transcription start site of the non-coding RNA, AANCR, is marked based on the cap location. Genomic coordinates (hg19) are indicated. (B) RNA-seq data showing full-length AANCR in iPSC-derived hepatocytes and a partial transcript of AANCR in white blood cells (WBC). BRU-seq data show full-length AANCR in HepG2 liver cells and partial AANCR transcript in cultured B-cells. (C, D) At the AANCR locus, ChIP-seq results show enhancer marks H3K27ac and H3K4me1, and DNase mapping results show DNase hypersensitivity in fibroblasts (C) and HepG2 cells (D). (E) Expression of full-length AANCR (N = 3, P < 0.05; t-test, error bars = S.E.M.) and APOE (N = 3, **P < 0.01; t-test, error bars = S.E.M.) are induced in response to hypertonic stress with the addition of 50 mM NaCl (400 mOsm final) to the culture media of HK-2 renal proximal tubule cells. (F) Representative immunoblot showing APOE expression in HK-2 cells following addition of 50 mM NaCl (N = 2). (G) Immunoblot showing increased APOE secretion into culture media following hypertonic stress (N = 1). (H) Luminescent detection of Annexin V signal in arbitrary units (a.u.) following hypertonic stress. HK-2 cells cultured in hypertonic conditioned media with more APOE protein, have less apoptosis than cells cultured in hypertonic media with less APOE protein (N = 3; P < 0.0001, ANOVA, error bars = S.E.M.). (I) Schematic of the SNPs in AANCR and APOE.
Next, we asked if this capped RNA is noncoding. Sequence analysis by BLASTX (49) and PFAM (50) did not identify similar proteins. Additionally, we analyzed the sequence by several algorithms that test for coding potentials (CPAT, CPC, CNIT (51–53)). These analyses determined that this is a noncoding RNA (Table 1). Together, the results show that upstream of APOE is a capped noncoding RNA, which we have named APOE-associated noncoding RNA (AANCR).
Table 1.
The RNA upstream of APOE is a noncoding RNA
| Method | Results |
|---|---|
| BLASTX | Only 3 sequences showed > 80% identity, two are hypothetical proteins, one is a low-quality protein. |
| PFAM | No significant hits by PFAM-A HMM and GA cutoffs |
| CPAT | Fickett score = 0, hexamer score = 0, coding potential = 0.003; classified as noncoding |
| CNIT | Score = 0.31, classified as noncoding |
| CPC | Coding probability = 0.04, classified as noncoding |
The analysis thus far focused on primary skin fibroblasts that do not express APOE. Next, we extended the analysis by performing RNA-seq of other cell types, including liver cells that express APOE and white blood cells that do not express APOE, to assess if the length of AANCR correlates with APOE expression. Figure 5B shows that in iPSC-derived hepatocytes (22), AANCR is transcribed as a full-length transcript, but in white blood cells (from the same individual from which the hepatocytes were derived), AANCR is only partially transcribed. To confirm that APOE is expressed in cells with full-length AANCR but not in cells with partial transcription of AANCR, we turned to BRU-seq, which maps nascent RNA by bromouridine tagging. We used the BRU-seq data (54) in the ENCODE data portal (55). Those data also show full-length AANCR in liver cells that express APOE, whereas the partially transcribed AANCR is found in cultured B-cells where APOE is not expressed (Figure 5B). In the liver cells where AANCR is full-length, we did not find R-loops, m6As, or RNA abasic sites (Supplementary Figure S4C), confirming that without stable R-loops, AANCR is transcribed as a full-length RNA that enhances APOE expression.
In summary, we have uncovered a noncoding RNA, AANCR, upstream of APOE and that AANCR is regulated by R-loops with RNA base modifications. In liver cells, AANCR is full-length and APOE is expressed; in contrast, in fibroblasts and B-cells, R-loops pause RNA Pol II transcription so AANCR is only partially transcribed, and APOE is not expressed.
AANCR is an enhancer RNA
Next, we assessed if this noncoding RNA is an enhancer of APOE. The ENCODE registry of candidate cis-regulatory elements (56) had identified the region upstream of APOE as a potential enhancer based on DNase hypersensitivity and histone modification. GeneHancer also annotated AANCR as an elite enhancer based on the criteria that the region has features of enhancer chromatin and a strong enhancer-gene association (57). In primary fibroblasts, we performed chromatin immunoprecipitation followed by sequencing (ChIP-seq) to assess if the AANCR region has features consistent with an enhancer. In the AANCR region, we found H3K27ac and H3K4me1 marks, and DNase hypersensitive chromatin that characterize enhancers (58,59) (Figure 5C). In HepG2 cells where APOE is expressed, as expected, this region is also marked with open chromatin and the enhancer marks H3K27ac and H3K4me1 (Figure 5D). Thus, AANCR is an enhancer RNA, and the region that encompasses AANCR and APOE is poised for transcription even in cells where APOE is not expressed.
We then asked whether AANCR physically interacts with APOE. We utilized Hi-C data to map chromatin folding at high resolution (60). The results showed that AANCR is in a ∼18 kb topology-associated domain with APOE and the downstream gene apolipoprotein C-I (APOC1) (Supplementary Figure S4D). In contrast, TOMM40 which is 12 kb upstream of AANCR is outside of that topology-associated domain. Following MPG knockdown, like APOE, APOC1 is significantly induced (P < 0.001, 17-fold), whereas TOMM40 is only slightly increased (1.4-fold) (Supplementary Figure S4E). Using the CoXpresDB (61) which allows users to query co-expression in >25 000 datasets from microarray and RNA-seq experiments in the public data bank (62), we found that the expression levels of APOE and APOC1 are highly correlated, while APOE and TOMM40 expression are much less correlated (Supplementary Figure S4F). Thus, our finding that AANCR is an enhancer of APOE and APOC1 is a general phenomenon, beyond the cell types that we examined in this study.
Together, we have identified a non-coding RNA, AANCR, which enhances APOE and APOC1 expression. In cells where base modifications stabilize R-loops, AANCR is only partially transcribed and cannot promote APOE and APOC1 expression, while in cells such as those in the liver, AANCR is a full-length enhancer RNA that induces transcription and expression of APOE and APOC1.
AANCR and APOE expression levels are stress-responsive
Data such as DNase hypersensitivity from the ENCODE consortium and in our labs show that the APOE region is poised for expression in many cell types. This is somewhat surprising since APOE is known to be expressed only in several cell types, such as hepatocytes, macrophages, and astrocytes.
We posited that APOE is poised to be transcribed in response to stress. To test this, we turned to osmotic stress. The kidney is subjected to hypertonic stress from the administration of medications and disease states such as hyperglycemia, which can lead to injury of the renal proximal tubule cells. We treated renal proximal tubule cells with hypertonic salt and measured AANCR and APOE expression. The results showed that in response to osmotic stress, renal proximal tubule cells express full-length AANCR which induces APOE expression (Figure 5E). The resultant APOE transcripts are translated into APOE proteins and secreted into the culture media (Figure 5F and G). We then treated cells in hypertonic conditions and collected the media with the secreted APOE to culture cells. We found in the conditioned media which has more APOE proteins, that cells are more resistant to apoptosis despite the hypertonic stress (Figure 5H). Together, these results show that AANCR acts as an enhancer to facilitate APOE expression in response to cellular stress.
AANCR is a modifier of Alzheimer's disease
Since AANCR regulates APOE expression, we wondered if it may affect susceptibility to Alzheimer's disease given that the ϵ4 allelic form of APOE is a major risk factor for Alzheimer's disease (63). As a complex human disease, Alzheimer's disease has multiple contributing factors. Besides the ϵ4 variant, other genetic factors have been identified, including those upstream of APOE where AANCR is encoded (Figure 5I). Studies have reported an allelic association of rs449647 (also referred to as –491) with Alzheimer's disease (64–66). The GWAS catalog also shows multiple studies that have identified significant allelic associations for SNPs in the AANCR region, including rs449647, with Alzheimer's disease (Supplementary Table S1). A study of 42 034 patients and 272 244 controls in the United Kingdom found a significant allelic association (P < 10⁻⁵⁸) of the SNP rs449647 with Alzheimer's disease (67). To assess how the sequence variants in AANCR affect susceptibility to dementia, we asked if the genetic variants affect the function of AANCR and therefore APOE expression. Using data from the GTEx consortium (68), despite the small sample size of a few hundred samples, we found several SNPs in the AANCR region that have significant allelic association with APOE expression, including in the hippocampus (Supplementary Table S2). The APOE signal from the hippocampus is most likely contributed by astrocytes, where AANCR is expressed and its expression is highly correlated with APOE expression (r = 0.66; P < 0.002, Spearman) (19).
While APOE expression is important in understanding transcriptional regulation, it is not practical to measure APOE gene or protein expression in tissues from a large number of individuals, so we asked if our finding can be extended to plasma APOE level, a more feasible clinical measurement. We turned to a Danish study that measured plasma APOE of 106 652 individuals (21). The results show a significant (P < 10⁻⁶) association of SNP variants, including rs449647 in AANCR, with plasma APOE level. Together, the genetic data confirm the role of AANCR in regulating the transcription of APOE. The genetic variants in AANCR that contribute to individual differences in APOE expression also affect risk of developing Alzheimer's disease.
Next, we assessed if polymorphisms in AANCR contribute additional risk factors to Alzheimer's disease beyond the ϵ4 variant in APOE. Despite the proximity of the region of AANCR to APOE, the extent of linkage disequilibrium is modest. From the different populations studied by the HapMap Consortium (69), the r² values for rs449647 (in AANCR) and rs429358 (in APOE) are 0.054 in Western Europeans (CEU), 0.068 in Yoruba from Ibadan (YRI), 0.10 in Japanese from Tokyo (JPT) and 0.08 in Han Chinese from Beijing (CHB). Thus, the allelic associations of the SNPs in the AANCR region with APOE expression and Alzheimer's disease are most likely independent of those of the ϵ4 allele of rs429358. This shows that by regulating APOE expression, AANCR is a modifier of Alzheimer's disease.
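For readers less familiar with the linkage disequilibrium statistic, r² between two biallelic SNPs is conventionally computed from the haplotype frequency and the two allele frequencies as r² = D² / (pA(1 − pA)pB(1 − pB)), with D = pAB − pA·pB. The sketch below implements this textbook definition; the function name and the input frequencies are hypothetical and are not HapMap values for rs449647 or rs429358.

```python
def ld_r2(p_ab, p_a, p_b):
    """Textbook r^2 between two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A at locus 1
           and allele B at locus 2
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom


# Hypothetical frequencies for illustration only.
print(ld_r2(p_ab=0.06, p_a=0.20, p_b=0.30))  # haplotype at its independence value
print(ld_r2(p_ab=0.30, p_a=0.30, p_b=0.30))  # alleles always inherited together
```

Values near 0 indicate that the alleles at the two sites are inherited almost independently, whereas values near 1 indicate that one site is nearly a proxy for the other.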
Transcription regulation by R-loops with modified RNA is common and prevalent
As shown above, R-loops with RNA abasic sites regulate AANCR transcription by pausing the elongating RNA Pol II. In addition to AANCR, we identified 1,966 noncoding RNAs that are regulated by this mechanism. In these transcripts, R-loops form, and in the RNAs, the adenosines are methylated (Figure 6A, left) and then MPG likely removes the N6-methyladenosine to form abasic sites (Figure 6A, middle panel). The resultant R-loops then pause the elongating RNA Pol II (Figure 6A, right panel) such that the densities of the RNA Pol II 5′ to the R-loops are at least three times higher (pausing index > 3) compared to the rest of the noncoding transcripts. This mechanism regulates RNA Pol II elongation of the 1,966 noncoding transcripts in all three cell types that we have studied, B-cells (median PI = 14.5), primary fibroblasts (median PI = 5.6), and renal proximal tubule cells (median PI = 6.3). Figure 6B shows three examples of these non-coding transcripts with paused elongating RNA Pol II in B-cells, fibroblasts, and renal proximal tubule cells. Unlike the RNA Pol II that pauses in the promoter, the elongating RNA Pol II is not paused by the NELF protein complexes. Figure 6C shows that the NELF protein complex is found behind (5′) the paused RNA Pol II while the R-loops are found ahead (3′; Figure 6C) to the paused RNA Pol II. Like AANCR, the noncoding RNAs regulated by this mechanism are enhancers characterized by H3K27ac and H3K4me1 histone marks (Figure 6D and E). Together we have identified a regulatory mechanism where RNA modifications stabilize R-loops to pause transcription elongation of enhancer RNAs.
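The pausing-index criterion described above can be illustrated with a minimal sketch. It assumes that the pausing index is the mean read density in a window just 5′ to the R-loop divided by the mean density over the rest of the transcript; the exact window definition used in the study is not given in this section, so the window coordinates, function name and read counts below are hypothetical.

```python
def pausing_index(counts, pause_start, pause_end):
    """Mean density in the pause window divided by mean density elsewhere.

    counts      : per-position (or per-bin) PRO-seq read counts along one
                  transcript, ordered 5' to 3'
    pause_start : first index of the pause window (inclusive)
    pause_end   : end index of the pause window (exclusive)
    """
    counts = list(counts)
    if not 0 <= pause_start < pause_end <= len(counts):
        raise ValueError("pause window must lie within the transcript")
    window = counts[pause_start:pause_end]
    rest = counts[:pause_start] + counts[pause_end:]
    if not rest:
        raise ValueError("pause window covers the whole transcript")
    rest_density = sum(rest) / len(rest)
    if rest_density == 0:
        return float("inf")  # signal only in the window
    return (sum(window) / len(window)) / rest_density


# Hypothetical read counts: a 5-bin pile-up inside a 50-bin transcript.
toy_counts = [1] * 20 + [8] * 5 + [1] * 25
pi = pausing_index(toy_counts, 20, 25)
print(pi, "paused" if pi > 3 else "not paused")  # threshold of 3 from the text
```

In practice the counts would be normalized for sequencing depth before comparison across samples, but because the index is a ratio within one transcript, that normalization cancels for a single library.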
Figure 6.
R-loops with RNA modifications pause elongating RNA Pol II. (A) Heatmaps of m6A modified nascent RNA, MPG RIP-seq results, and PRO-seq (27) at 1,966 non-coding RNAs with paused RNA Pol II in the gene body in B-cells. Average signals for m6A (gold), MPG (orange), and PRO-seq (blue) are plotted. Heatmap signals are scaled to maximal signal per row in arbitrary units. The transcripts are in the same order across panels with AANCR located on row 637. (B) Elongating RNA Pol II pauses in three cell types. PRO-seq reads for three representative transcripts are plotted for B-cells, fibroblasts, and HK-2 proximal tubule cells. Genomic locations are hg19. Y-axis is reads per million. (C) Average ChIP-seq results (black line) show NELFA 5′ to paused RNA Pol II (PRO-seq data, blue line), with R-loop (DRIP-seq, red line) 3′ of paused RNA Pol II. (D and E) Average H3K27ac and H3K4me1 ChIP-seq results indicate these are enhancer RNAs. (C–E) Data from fibroblasts are shown for the same 1,966 noncoding transcripts as shown in panel A. PRO-seq results are scaled as in panel A and ChIP-seq data are scaled to reads per million, with error bands = S.E.M.
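The per-row scaling mentioned in the legend (each heatmap row divided by its own maximum) can be sketched as follows. This is a generic illustration under that stated assumption, not the plotting code used for the figure; the function name and the toy matrix are hypothetical.

```python
def scale_rows_to_max(matrix):
    """Scale every row of a signal matrix to its own maximum.

    Each row is one transcript and each column one position bin.
    Rows without positive signal are returned as zeros to avoid
    dividing by zero.
    """
    scaled = []
    for row in matrix:
        peak = max(row) if row else 0
        scaled.append([value / peak if peak > 0 else 0.0 for value in row])
    return scaled


# Hypothetical signal matrix: two transcripts, three bins each.
toy_signal = [[1, 2, 4], [0, 0, 0]]
print(scale_rows_to_max(toy_signal))
```

Scaling by row makes the position of the peak comparable across transcripts with very different expression levels, at the cost of hiding differences in absolute signal between rows.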
DISCUSSION
In this study, we showed that R-loops with RNA abasic sites regulate transcription elongation. To our knowledge, this is the first identification of the regulation of RNA Pol II elongation by RNA sequence and structure. Although there have been tremendous advances in our understanding of RNA, we are still a long way from knowing its components and likely most of its function. RNA was simply considered an essential intermediary between DNA instructions and protein synthesis. We now know that RNA is much more than a go-between for DNA and protein, and that RNA carries important regulatory information. RNAs are also more varied than originally thought. Nearly every nucleotide in DNA is transcribed into RNA (70,71), so in addition to mRNA and tRNA, there are myriad noncoding RNAs (72–74). Co-transcriptionally and post-transcriptionally, the bases and sugar of RNA are modified, and the transcripts are spliced and polyadenylated with different lengths of adenosine. These processing steps add to the complexity of RNA by generating different transcripts from the same DNA template. The processing of RNA can differ by cell type and cellular environments resulting in a vast array of RNA species with different regulatory roles.
This study was motivated by our previous work that uncovered abasic sites in RNA that are present in R-loops (18). When RNA hybridizes with DNA to form R-loops, the RNA is exposed to enzymes that modify its bases including MPG which cleaves the glycosidic bond leading to RNA abasic sites. The discovery of DNA abasic sites and the enzymes such as methylpurine glycosylase which form and process abasic sites, led to the elucidation of base excision repair in response to DNA damage (75). However, the abasic sites in RNA differ from those in DNA due to the additional hydroxyl group. Studies have shown that RNA with abasic sites is more stable than DNA with abasic sites, as cleavage at the abasic site in RNA is at least 17 times less likely than at the abasic site in DNA (76). RNA abasic sites are rather abundant; by mass spectrometry, we found that there are about 4 RNA abasic sites per million ribonucleotides (18), so with over 50 billion ribonucleotides in a cell, there are hundreds of thousands of RNA abasic sites in each cell. Many thousands of R-loops have also been mapped genome-wide (77–80). We are just beginning to understand some of the features and functions of modifications on RNA and R-loops, but many aspects remain to be elucidated, including how they affect the stability and role of RNA.
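The per-cell estimate above follows from simple arithmetic on the two figures quoted in the text (about 4 abasic sites per million ribonucleotides and over 50 billion ribonucleotides per cell). The short sketch below makes the calculation explicit; the function name is hypothetical, and 50 billion is treated as a lower bound.

```python
def abasic_sites_per_cell(sites_per_million, ribonucleotides_per_cell):
    """Scale a per-million rate to a whole-cell count (integer arithmetic)."""
    return sites_per_million * ribonucleotides_per_cell // 1_000_000


# Figures quoted in the text: ~4 sites per million ribonucleotides and
# more than 50 billion ribonucleotides per cell (used as a lower bound).
estimate = abasic_sites_per_cell(4, 50_000_000_000)
print(f"at least about {estimate:,} RNA abasic sites per cell")
```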
In this study, we showed how R-loops with RNA abasic sites regulate the transcription of noncoding RNA. We have detailed the mechanism that regulates AANCR, the enhancer of APOE. In cells such as hepatocytes and astrocytes, AANCR is transcribed into a full-length enhancer that activates APOE expression. In other cells, AANCR is not fully synthesized so APOE is not expressed. In these cells, the transcription and function of AANCR are dynamically regulated by RNA sequence modification and R-loops. The nascent RNA of AANCR forms R-loops, and in the R-loops, the RNA is modified by N6-adenine methylation and then the resultant N6-methyladenosines are likely removed by methylpurine glycosylase to become abasic sites that stabilize the R-loops. The stabilized R-loops pause RNA Pol II and prevent the synthesis of the full-length enhancer RNA, yet upon hypertonic stress, the R-loops resolve, and the enhancer RNA is fully transcribed to allow rapid activation of APOE. The nucleic-acid-mediated pausing keeps AANCR poised for rapid transcriptional responses. Like the hammerhead ribozyme whose catalytic function is dependent on its structure (81,82), AANCR is also dependent on its structure to function as an enhancer of APOE and APOC1 expression.
Beyond AANCR, this mechanism where RNA modifications and R-loops regulate transcription elongation determines the synthesis of over 1,000 noncoding RNAs in three cell types. RNA Pol II pausing in the proximal promoter region mediated by the protein complexes NELF and DSIF is well characterized (83–85). The RNA Pol II pauses identified in this study are different from the pauses in the promoter: they occur in the gene bodies and are mediated by R-loops that are stabilized by RNA abasic sites, not by pausing protein complexes. These R-loops regulate the transcription of enhancer RNAs; thus, they have a broad impact on gene expression.
In studying the RNA abasic sites and R-loops in AANCR, we revealed the mechanism that activates APOE transcription and expression. The induction of APOE in response to lipid load was previously well characterized; it involves the 3′ enhancer elements (86) and LXR/RXR signaling (87). Here, we show how cell-type and stress-responsive expression of APOE are regulated. Understanding the regulation of APOE expression is critical given its role in lipid transport (88), maintenance of cognitive function (63,89,90), and response to infection (91,92) as well as immunotherapies (93,94). Our study shows that sequence variants in AANCR explain individual variation in APOE expression and APOE plasma level. The variants in the AANCR region that affect APOE level also confer susceptibility to Alzheimer's disease. Based on the extent of linkage disequilibrium between the variants in AANCR and APOE, the effect of polymorphisms in AANCR is independent of the risk conferred by the APOE ϵ4 allele. By regulating APOE expression, AANCR acts as a modifier of Alzheimer's disease.
Genetic studies of Alzheimer's disease found that the A-allele of –491 SNP (rs449647) is the risk allele (64,95) and in the GTEx data, the A-allele is associated with lower APOE expression in different cell types including cells in the hippocampus. While studies have shown that Alzheimer's patients have low plasma APOE levels (21,96), other studies report more complex findings including higher APOE levels in the patients (97–100). Future studies can assess how variation in AANCR and APOE interact to affect Alzheimer's disease and those results may guide the modulation of AANCR in RNA-based therapeutics in the prevention and treatment of Alzheimer's disease.
In conclusion, we identified a regulatory mechanism in which the METTL3/METTL14 protein complex and methylpurine glycosylase act in sequence: N6-methyladenosines are formed and then likely cleaved, resulting in RNA abasic sites on R-loops that pause RNA Pol II transcription. We identified over 1,000 noncoding RNAs that are regulated by this mechanism and detailed how this pathway regulates the expression of APOE. We surmise that RNA modification and RNA structure play a critical role in the dynamic regulation of gene expression. Studies of nucleic acid sequence and structure will advance the basic understanding of gene regulation and the genetic basis of complex human diseases.
DATA AND AVAILABILITY
The deep sequencing data reported in this paper have been deposited in the NCBI sequence read archive, PRJNA801792 and the NCBI database of Genotypes and Phenotypes archive, Phs001322.v2.p1. Downloaded data are listed in the Supplementary Methods.
ACKNOWLEDGEMENTS
We thank Annelise Comai and Don Delker for assistance with data processing and analysis. We thank Drs. William Copeland, Paul Doetsch and Philip Hanawalt for comments and suggestions. We thank the University of Pennsylvania Skin Biology and Diseases Resource-based Center (Philadelphia, PA) for skin fibroblasts.
We dedicate this article to the memory of Samuel Wilson who passed during this project.
Contributor Information
Jason A Watts, Department of Medicine, University of Michigan, Ann Arbor, MI 48109, USA; Epigenetics and Stem Cell Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC 27709, USA.
Christopher Grunseich, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA.
Yesenia Rodriguez, Genome Integrity and Structural Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC 27709, USA.
Yaojuan Liu, Department of Pediatrics and Life Sciences Institute, University of Michigan, Ann Arbor, MI 48109, USA.
Dongjun Li, Department of Pediatrics and Life Sciences Institute, University of Michigan, Ann Arbor, MI 48109, USA.
Joshua T Burdick, Department of Pediatrics and Life Sciences Institute, University of Michigan, Ann Arbor, MI 48109, USA.
Alan Bruzel, Department of Pediatrics and Life Sciences Institute, University of Michigan, Ann Arbor, MI 48109, USA.
Robert J Crouch, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD 20892, USA.
Robert W Mahley, Gladstone Institute of Neurological Disease, San Francisco, CA, USA; Departments of Pathology and Medicine, University of California, San Francisco, CA, USA.
Samuel H Wilson, Genome Integrity and Structural Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC 27709, USA.
Vivian G Cheung, Department of Pediatrics and Life Sciences Institute, University of Michigan, Ann Arbor, MI 48109, USA.
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
FUNDING
Howard Hughes Medical Institute and University of Michigan (to V.G.C.); K99ES031662 (to Y.R.); 1RF1AG059751 (to R.W.M.); ASN-Kidney Cure career development award (to J.A.W.); ES103361 (to J.A.W.); Z01ES050158 and Z01ES050159 (to S.H.W.); ZIA HD000068-48 (to R.J.C.); 1ZIANS002974 (to C.G.). Open access charge: NIH intramural funds.
Conflict of interest statement. None declared.
REFERENCES
- 1. Cheung V.G., Spielman R.S., Ewens K.G., Weber T.M., Morley M., Burdick J.T. Mapping determinants of human gene expression by regional and genome-wide association. Nature. 2005; 437:1365–1369.
- 2. Morley M., Molony C.M., Weber T.M., Devlin J.L., Ewens K.G., Spielman R.S., Cheung V.G. Genetic analysis of genome-wide variation in human gene expression. Nature. 2004; 430:743–747.
- 3. Boccaletto P., Machnicka M.A., Purta E., Piatkowski P., Baginski B., Wirecki T.K., de Crécy-Lagard V., Ross R., Limbach P.A., Kotter A. et al. MODOMICS: a database of RNA modification pathways. 2017 update. Nucleic Acids Res. 2018; 46:D303–D307.
- 4. Mannion N., Arieti F., Gallo A., Keegan L.P., O’Connell M.A. New insights into the biological role of mammalian ADARs; the RNA editing proteins. Biomolecules. 2015; 5:2338–2362.
- 5. Stefl R., Oberstrass F.C., Hood J.L., Jourdan M., Zimmermann M., Skrisovska L., Maris C., Peng L., Hofr C., Emeson R.B. et al. The solution structure of the ADAR2 dsRBM-RNA complex reveals a sequence-specific readout of the minor groove. Cell. 2010; 143:225–237.
- 6. Schirle N.T., Sheu-Gruttadauria J., MacRae I.J. Structural basis for microRNA targeting. Science. 2014; 346:608–613.
- 7. Chen Y.-Z., Bennett C.L., Huynh H.M., Blair I.P., Puls I., Irobi J., Dierick I., Abel A., Kennerson M.L., Rabin B.A. et al. DNA/RNA helicase gene mutations in a form of juvenile amyotrophic lateral sclerosis (ALS4). Am. J. Hum. Genet. 2004; 74:1128–1135.
- 8. Grunseich C., Wang I.X., Watts J.A., Burdick J.T., Guber R.D., Zhu Z., Bruzel A., Lanman T., Chen K., Schindler A.B. et al. Senataxin mutation reveals how R-Loops promote transcription by blocking DNA methylation at gene promoters. Mol. Cell. 2018; 69:426–437.
- 9. Grunseich C., Patankar A., Amaya J., Watts J.A., Li D., Ramirez P., Schindler A.B., Fischbeck K.H., Cheung V.G. Clinical and molecular aspects of senataxin mutations in ALS4. Ann. Neurol. 2020; 87:547–555.
- 10. Aguilera A., Garcia-Muse T. R loops: from transcription byproducts to threats to genome stability. Mol. Cell. 2012; 46:115–124.
- 11. Castillo-Guzman D., Chédin F. Defining R-loop classes and their contributions to genome instability. DNA Repair (Amst.). 2021; 106:103182.
- 12. Niehrs C., Luke B. Regulatory R-loops as facilitators of gene expression and genome stability. Nat. Rev. Mol. Cell Biol. 2020; 21:167–178.
- 13. Thomas M., White R.L., Davis R.W. Hybridization of RNA to double-stranded DNA: formation of R-loops. Proc. Natl. Acad. Sci. U.S.A. 1976; 73:2294–2298.
- 14. Cristini A., Groh M., Kristiansen M.S., Gromak N. RNA/DNA hybrid interactome identifies DXH9 as a molecular player in transcriptional termination and R-Loop-Associated DNA damage. Cell Rep. 2018; 23:1891–1905.
- 15. Wang I.X., Grunseich C., Fox J., Burdick J., Zhu Z., Ravazian N., Hafner M., Cheung V.G. Human proteins that interact with RNA/DNA hybrids. Genome Res. 2018; 28:1405–1414.
- 16. Yan Q., Wulfridge P., Doherty J., Fernandez-Luna J.L., Real P.J., Tang H.-Y., Sarma K. Proximity labeling identifies a repertoire of site-specific R-loop modulators. Nat. Commun. 2022; 13:53.
- 17. Lindahl T. An N-glycosidase from Escherichia coli that releases free uracil from DNA containing deaminated cytosine residues. Proc. Natl. Acad. Sci. U.S.A. 1974; 71:3649–3653.
- 18. Liu Y., Rodriguez Y., Ross R.L., Zhao R., Watts J.A., Grunseich C., Bruzel A., Li D., Burdick J.T., Prasad R. et al. RNA abasic sites in yeast and human cells. Proc. Natl. Acad. Sci. U.S.A. 2020; 117:20689–20695.
- 19. Zhang Y., Sloan S.A., Clarke L.E., Caneda C., Plaza C.A., Blumenthal P.D., Vogel H., Steinberg G.K., Edwards M.S.B., Li G. et al. Purification and characterization of progenitor and mature human astrocytes reveals transcriptional and functional differences with mouse. Neuron. 2016; 89:37–53.
- 20. Wu T.D., Nacu S. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics. 2010; 26:873–881.
- 21. Rasmussen K.L., Tybjærg-Hansen A., Nordestgaard B.G., Frikke-Schmidt R. Plasma apolipoprotein E levels and risk of dementia: a Mendelian randomization study of 106,562 individuals. Alzheimers Dement. 2018; 14:71–80.
- 22. Carpentier A., Tesfaye A., Chu V., Nimgaonkar I., Zhang F., Lee S.B., Thorgeirsson S.S., Feinstone S.M., Liang T.J. Engrafted human stem cell–derived hepatocytes establish an infectious HCV murine model. J. Clin. Invest. 2014; 124:4953–4964.
- 23. Adhikari S., Manthena P.V., Uren A., Roy R. Expression, purification and characterization of codon-optimized human N-methylpurine-DNA glycosylase from Escherichia coli. Protein Expr. Purif. 2008; 58:257–262.
- 24. Ryder S.P., Recht M.I., Williamson J.R. Quantitative analysis of protein-RNA interactions by gel mobility shift. Methods Mol. Biol. 2008; 488:99–115.
- 25. Kwak H., Fuda N.J., Core L.J., Lis J.T. Precise maps of RNA polymerase reveal how promoters direct initiation and pausing. Science. 2013; 339:950–953.
- 26. Watts J.A., Burdick J., Daigneault J., Zhu Z., Grunseich C., Bruzel A., Cheung V.G. Cis elements that mediate RNA polymerase II pausing regulate human gene expression. Am. J. Hum. Genet. 2019; 105:677–688.
- 27. Kristjánsdóttir K., Dziubek A., Kang H.M., Kwak H. Population-scale study of eRNA transcription reveals bipartite functional enhancer architecture. Nat. Commun. 2020; 11:5963.
- 28. Miller H.E., Montemayor D., Li J., Levy S., Pawar R., Hartono S., Sharma K., Frost B., Chedin F., Bishop A.J.R. Exploration and analysis of R-loop mapping data with RLBase. Nucleic Acids Res. 2022; gkac732.
- 29. Ramirez P., Crouch R.J., Cheung V.G., Grunseich C. R-Loop analysis by dot-blot. J. Vis. Exp. 2021; 10.3791/62069.
- 30. Bou-Nader C., Bothra A., Garboczi D.N., Leppla S.H., Zhang J. Structural basis of R-loop recognition by the S9.6 monoclonal antibody. Nat. Commun. 2022; 13:1641.
- 31. Hu Z., Zhang A., Storz G., Gottesman S., Leppla S.H. An antibody-based microarray assay for small RNA detection. Nucleic Acids Res. 2006; 34:e52.
- 32. Kosar M., Piccini D., Foiani M., Giannattasio M. A rapid method to visualize human mitochondrial DNA replication through rotary shadowing and transmission electron microscopy. Nucleic Acids Res. 2021; 49:e121.
- 33. Yamaguchi Y., Takagi T., Wada T., Yano K., Furuya A., Sugimoto S., Hasegawa J., Handa H. NELF, a multisubunit complex containing RD, cooperates with DSIF to repress RNA polymerase II elongation. Cell. 1999; 97:41–51.
- 34. Stork C.T., Bocek M., Crossley M.P., Sollier J., Sanz L.A., Chédin F., Swigut T., Cimprich K.A. Co-transcriptional R-loops are the main cause of estrogen-induced DNA damage. Elife. 2016; 5:17548.
- 35. Frye M., Jaffrey S.R., Pan T., Rechavi G., Suzuki T. RNA modifications: what have we learned and where are we headed? Nat. Rev. Genet. 2016; 17:365–372.
- 36. Pan T. N6-methyl-adenosine modification in messenger and long non-coding RNA. Trends Biochem. Sci. 2013; 38:204–209.
- 37. Shi H., Wei J., He C. Where, when, and how: context-dependent functions of RNA methylation writers, readers, and erasers. Mol. Cell. 2019; 74:640–650.
- 38. Zaccara S., Ries R.J., Jaffrey S.R. Reading, writing and erasing mRNA methylation. Nat. Rev. Mol. Cell Biol. 2019; 20:608–624.
- 39. Abakir A., Giles T.C., Cristini A., Foster J.M., Dai N., Starczak M., Rubio-Roldan A., Li M., Eleftheriou M., Crutchley J. et al. N6-methyladenosine regulates the stability of RNA:DNA hybrids in human cells. Nat. Genet. 2020; 52:48–55.
- 40. Kang H.J., Cheon N.Y., Park H., Jeong G.W., Ye B.J., Yoo E.J., Lee J.H., Hur J.-H., Lee E.-A., Kim H. et al. TonEBP recognizes R-loops and initiates m6A RNA methylation for R-loop resolution. Nucleic Acids Res. 2021; 49:269–284.
- 41. Zhang C., Chen L., Peng D., Jiang A., He Y., Zeng Y., Xie C., Zhou H., Luo X., Liu H. et al. METTL3 and N6-Methyladenosine promote homologous recombination-mediated repair of DSBs by modulating DNA-RNA hybrid accumulation. Mol. Cell. 2020; 79:425–442.
- 42. Harper J.E., Miceli S.M., Roberts R.J., Manley J.L. Sequence specificity of the human mRNA N6-adenosine methylase in vitro. Nucleic Acids Res. 1990; 18:5735–5741.
- 43. Csepany T., Lin A., Baldick C.J., Beemon K. Sequence specificity of mRNA N6-adenosine methyltransferase. J. Biol. Chem. 1990; 265:20117–20122.
- 44. Liu J., Yue Y., Han D., Wang X., Fu Y., Zhang L., Jia G., Yu M., Lu Z., Deng X. et al. A METTL3-METTL14 complex mediates mammalian nuclear RNA N6-adenosine methylation. Nat. Chem. Biol. 2014; 10:93–95.
- 45. Wang X., Feng J., Xue Y., Guan Z., Zhang D., Liu Z., Gong Z., Wang Q., Huang J., Tang C. et al. Structural basis of N(6)-adenosine methylation by the METTL3-METTL14 complex. Nature. 2016; 534:575–578.
- 46. Tome J.M., Tippens N.D., Lis J.T. Single-molecule nascent RNA sequencing reveals regulatory domain architecture at promoters and enhancers. Nat. Genet. 2018; 50:1533–1541.
- 47. Core L.J., Martins A.L., Danko C.G., Waters C.T., Siepel A., Lis J.T. Analysis of nascent RNA identifies a unified architecture of initiation regions at mammalian promoters and enhancers. Nat. Genet. 2014; 46:1311–1320.
- 48. Mahat D.B., Kwak H., Booth G.T., Jonkers I.H., Danko C.G., Patel R.K., Waters C.T., Munson K., Core L.J., Lis J.T. Base-pair resolution genome-wide mapping of active RNA polymerases using precision nuclear run-on (PRO-seq). Nat. Protoc. 2016; 11:1455–1476.
- 49. Gish W., States D.J. Identification of protein coding regions by database similarity search. Nat. Genet. 1993; 3:266–272.
- 50. Bateman A., Coin L., Durbin R., Finn R.D., Hollich V., Griffiths-Jones S., Khanna A., Marshall M., Moxon S., Sonnhammer E.L.L. et al. The Pfam protein families database. Nucleic Acids Res. 2004; 32:D138–D141.
- 51. Guo J.-C., Fang S.-S., Wu Y., Zhang J.-H., Chen Y., Liu J., Wu B., Wu J.-R., Li E.-M., Xu L.-Y. et al. CNIT: a fast and accurate web tool for identifying protein-coding and long non-coding transcripts based on intrinsic sequence composition. Nucleic Acids Res. 2019; 47:W516–W522.
- 52. Kong L., Zhang Y., Ye Z.-Q., Liu X.-Q., Zhao S.-Q., Wei L., Gao G. CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res. 2007; 35:W345–W349.
- 53. Wang L., Park H.J., Dasari S., Wang S., Kocher J.-P., Li W. CPAT: coding-potential assessment tool using an alignment-free logistic regression model. Nucleic Acids Res. 2013; 41:e74.
- 54. Paulsen M.T., Veloso A., Prasad J., Bedi K., Ljungman E.A., Tsan Y.-C., Chang C.-W., Tarrier B., Washburn J.G., Lyons R. et al. Coordinated regulation of synthesis and stability of RNA during the acute TNF-induced proinflammatory response. Proc. Natl. Acad. Sci. U.S.A. 2013; 110:2240–2245.
- 55. Davis C.A., Hitz B.C., Sloan C.A., Chan E.T., Davidson J.M., Gabdank I., Hilton J.A., Jain K., Baymuradov U.K., Narayanan A.K. et al. The Encyclopedia of DNA Elements (ENCODE): data portal update. Nucleic Acids Res. 2018; 46:D794–D801.
- 56. Moore J.E., Purcaro M.J., Pratt H.E., Epstein C.B., Shoresh N., Adrian J., Kawli T., Davis C.A., Dobin A., Kaul R. et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature. 2020; 583:699–710.
- 57. Fishilevich S., Nudel R., Rappaport N., Hadar R., Plaschkes I., Iny Stein T., Rosen N., Kohn A., Twik M., Safran M. et al. GeneHancer: genome-wide integration of enhancers and target genes in GeneCards. Database (Oxford). 2017; 2017:bax028.
- 58. Heintzman N.D., Stuart R.K., Hon G., Fu Y., Ching C.W., Hawkins R.D., Barrera L.O., Van Calcar S., Qu C., Ching K.A. et al. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat. Genet. 2007; 39:311–318.
- 59. Heintzman N.D., Hon G.C., Hawkins R.D., Kheradpour P., Stark A., Harp L.F., Ye Z., Lee L.K., Stuart R.K., Ching C.W. et al. Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature. 2009; 459:108–112.
- 60. Krietenstein N., Abraham S., Venev S.V., Abdennur N., Gibcus J., Hsieh T.-H.S., Parsi K.M., Yang L., Maehr R., Mirny L.A. et al. Ultrastructural details of mammalian chromosome architecture. Mol. Cell. 2020; 78:554–565.
- 61. Obayashi T., Kagaya Y., Aoki Y., Tadaka S., Kinoshita K. COXPRESdb v7: a gene coexpression database for 11 animal species supported by 23 coexpression platforms for technical evaluation and evolutionary inference. Nucleic Acids Res. 2019; 47:D55–D62.
- 62. Kodama Y., Mashima J., Kosuge T., Kaminuma E., Ogasawara O., Okubo K., Nakamura Y., Takagi T. DNA Data Bank of Japan: 30th anniversary. Nucleic Acids Res. 2018; 46:D30–D35.
- 63. Corder E.H., Saunders A.M., Strittmatter W.J., Schmechel D.E., Gaskell P.C., Small G.W., Roses A.D., Haines J.L., Pericak-Vance M.A. Gene dose of apolipoprotein E type 4 allele and the risk of Alzheimer's disease in late onset families. Science. 1993; 261:921–923.
- 64. Bullido M.J., Artiga M.J., Recuero M., Sastre I., García M.A., Aldudo J., Lendon C., Han S.W., Morris J.C., Frank A. et al. A polymorphism in the regulatory region of APOE associated with risk for Alzheimer's dementia. Nat. Genet. 1998; 18:69–71.
- 65. Lambert J.C., Berr C., Pasquier F., Delacourte A., Frigard B., Cottel D., Pérez-Tur J., Mouroux V., Mohr M., Cécyre D. et al. Pronounced impact of Th1/E47cs mutation compared with -491 AT mutation on neural APOE gene expression and risk of developing Alzheimer's disease. Hum. Mol. Genet. 1998; 7:1511–1516.
- 66. Wang J.C., Kwon J.M., Shah P., Morris J.C., Goate A. Effect of APOE genotype and promoter polymorphism on risk of Alzheimer's disease. Neurology. 2000; 55:1644–1649.
- 67. Marioni R.E., Harris S.E., Zhang Q., McRae A.F., Hagenaars S.P., Hill W.D., Davies G., Ritchie C.W., Gale C.R., Starr J.M. et al. GWAS on family history of Alzheimer's disease. Transl Psychiatry. 2018; 8:99.
- 68. Ardlie K.G., DeLuca D.S., Segre A.V., Sullivan T.J., Young T.R., Gelfand E.T., Trowbridge C.A., Maller J.B., Tukiainen T., Lek M. et al. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science. 2015; 348:648–660.
- 69. International HapMap Consortium, Frazer K.A., Ballinger D.G., Cox D.R., Hinds D.A., Stuve L.L., Gibbs R.A., Belmont J.W., Boudreau A., Hardenbol P. et al. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007; 449:851–861.
- 70. ENCODE Project Consortium, Birney E., Stamatoyannopoulos J.A., Dutta A., Guigó R., Gingeras T.R., Margulies E.H., Weng Z., Snyder M., Dermitzakis E.T. et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007; 447:799–816.
- 71. Hangauer M.J., Vaughn I.W., McManus M.T. Pervasive transcription of the human genome produces thousands of previously unidentified long intergenic noncoding RNAs. PLoS Genet. 2013; 9:e1003569.
- 72. Cech T.R., Steitz J.A. The noncoding RNA revolution-trashing old rules to forge new ones. Cell. 2014; 157:77–94.
- 73. Mattick J.S. The state of long non-coding RNA biology. Noncoding RNA. 2018; 4:E17.
- 74. Mercer T.R., Dinger M.E., Mattick J.S. Long non-coding RNAs: insights into functions. Nat. Rev. Genet. 2009; 10:155–159.
- 75. Lindahl T. Instability and decay of the primary structure of DNA. Nature. 1993; 362:709–715.
- 76. Küpfer P.A., Leumann C.J. The chemical stability of abasic RNA compared to abasic DNA. Nucleic Acids Res. 2007; 35:58–68.
- 77. Crossley M.P., Bocek M.J., Hamperl S., Swigut T., Cimprich K.A. qDRIP: a method to quantitatively assess RNA-DNA hybrid formation genome-wide. Nucleic Acids Res. 2020; 48:e84.
- 78. Ginno P.A., Lott P.L., Christensen H.C., Korf I., Chédin F. R-loop formation is a distinctive characteristic of unmethylated human CpG island promoters. Mol. Cell. 2012; 45:814–825.
- 79. Sanz L.A., Hartono S.R., Lim Y.W., Steyaert S., Rajpurkar A., Ginno P.A., Xu X., Chédin F. Prevalent, dynamic, and conserved R-Loop structures associate with specific epigenomic signatures in mammals. Mol. Cell. 2016; 63:167–178.
- 80. Wahba L., Costantino L., Tan F.J., Zimmer A., Koshland D. S1-DRIP-seq identifies high expression and polyA tracts as major contributors to R-loop formation. Genes Dev. 2016; 30:1327–1338.
- 81. Pley H.W., Flaherty K.M., McKay D.B. Three-dimensional structure of a hammerhead ribozyme. Nature. 1994; 372:68–74.
- 82. Scott W.G., Finch J.T., Klug A. The crystal structure of an all-RNA hammerhead ribozyme: a proposed mechanism for RNA catalytic cleavage. Cell. 1995; 81:991–1002.
- 83. Gilchrist D.A., Nechaev S., Lee C., Ghosh S.K.B., Collins J.B., Li L., Gilmour D.S., Adelman K. NELF-mediated stalling of Pol II can enhance gene expression by blocking promoter-proximal nucleosome assembly. Genes Dev. 2008; 22:1921–1933.
- 84. Vos S.M., Farnung L., Urlaub H., Cramer P. Structure of paused transcription complex Pol II-DSIF-NELF. Nature. 2018; 560:601–606.
- 85. Wu C.-H., Yamaguchi Y., Benjamin L.R., Horvat-Gordon M., Washinsky J., Enerly E., Larsson J., Lambertsson A., Handa H., Gilmour D. NELF and DSIF cause promoter proximal pausing on the hsp70 promoter in Drosophila. Genes Dev. 2003; 17:1402–1414.
- 86. Shih S.-J., Allan C., Grehan S., Tse E., Moran C., Taylor J.M. Duplicated downstream enhancers control expression of the human apolipoprotein E gene in macrophages and adipose tissue. J. Biol. Chem. 2000; 275:31567–31572.
- 87. Laffitte B.A., Repa J.J., Joseph S.B., Wilpitz D.C., Kast H.R., Mangelsdorf D.J., Tontonoz P. LXRs control lipid-inducible expression of the apolipoprotein E gene in macrophages and adipocytes. Proc. Natl. Acad. Sci. U.S.A. 2001; 98:507–512.
- 88. Mahley R.W. Apolipoprotein E: cholesterol transport protein with expanding role in cell biology. Science. 1988; 240:622–630.
- 89. Roses A.D. Apolipoprotein E alleles as risk factors in Alzheimer's disease. Annu. Rev. Med. 1996; 47:387–400.
- 90. Mahley R.W., Huang Y. Apolipoprotein E sets the stage: response to injury triggers neuropathology. Neuron. 2012; 76:871–885.
- 91. Jiang J., Luo G. Apolipoprotein E but not B is required for the formation of infectious hepatitis C virus particles. J. Virol. 2009; 83:12680–12691.
- 92. Kuo C.-L., Pilling L.C., Atkins J.L., Masoli J.A.H., Delgado J., Kuchel G.A., Melzer D. APOE e4 genotype predicts severe COVID-19 in the UK Biobank community cohort. J. Gerontol. A. 2020; 75:2231–2232.
- 93. Pencheva N., Buss C.G., Posada J., Merghoub T., Tavazoie S.F. Broad-spectrum therapeutic suppression of metastatic melanoma through nuclear hormone receptor activation. Cell. 2014; 156:986–1001.
- 94. Tavazoie M.F., Pollack I., Tanqueco R., Ostendorf B.N., Reis B.S., Gonsalves F.C., Kurth I., Andreu-Agullo C., Derbyshire M.L., Posada J. et al. LXR/ApoE activation restricts innate immune suppression in cancer. Cell. 2018; 172:825–840.
- 95. Lambert J., Ibrahim-Verbaas C., Harold D., Naj A., Sims R., Bellenguez C., DeStafano A., Bis J., Beecham G., Grenier-Boley B. et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer's disease. Nat. Genet. 2013; 45:1452–1458.
- 96. Bertrand P., Poirier J., Oda T., Finch C.E., Pasinetti G.M. Association of apolipoprotein E genotype with brain levels of apolipoprotein E and apolipoprotein J (clusterin) in Alzheimer disease. Brain Res. Mol. Brain Res. 1995; 33:174–178.
- 97. Koch M., DeKosky S.T., Goodman M., Sun J., Furtado J.D., Fitzpatrick A.L., Mackey R.H., Cai T., Lopez O.L., Kuller L.H. et al. Association of apolipoprotein E in lipoprotein subspecies with risk of dementia. JAMA Network Open. 2020; 3:e209250.
- 98. Laws S.M., Hone E., Taddei K., Harper C., Dean B., McClean C., Masters C., Lautenschlager N., Gandy S.E., Martins R.N. Variation at the APOE -491 promoter locus is associated with altered brain levels of apolipoprotein E. Mol. Psychiatry. 2002; 7:886–890.
- 99. Lehtimäki T., Pirttilä T., Mehta P.D., Wisniewski H.M., Frey H., Nikkari T. Apolipoprotein E (apoE) polymorphism and its influence on ApoE concentrations in the cerebrospinal fluid in Finnish patients with Alzheimer's disease. Hum. Genet. 1995; 95:39–42.
- 100. Taddei K., Clarnette R., Gandy S.E., Martins R.N. Increased plasma apolipoprotein E (apoE) levels in Alzheimer's disease. Neurosci. Lett. 1997; 223:29–32.