Proceedings of the National Academy of Sciences of the United States of America. 2016 Oct 11;113(43):12316–12321. doi: 10.1073/pnas.1603217113

Nascent RNA sequencing reveals distinct features in plant transcription

Jonathan Hetzel a,b,c,1, Sascha H Duttke a,b,c,1, Christopher Benner d,2, Joanne Chory a,b,c,2
PMCID: PMC5087027  PMID: 27729530

Significance

Transcription is a fundamental and dynamic step in the regulation of gene expression, but the characteristics of plant transcription are poorly understood. We adapted the global nuclear run-on sequencing (GRO-seq) and 5′GRO-seq methods for plants and provide a plant version of the next-generation sequencing software HOMER (homer.ucsd.edu/homer/plants) to facilitate data analysis. Mapping nascent transcripts in Arabidopsis thaliana seedlings enabled identification of known and novel transcripts and precisely mapped their start sites, revealing distinct characteristics in plant transcription. Our modified method to map engaged RNA polymerases and nascent transcripts in primary tissues paves the way for comparative and response studies.

Keywords: plant transcription, nascent transcripts, RNA polymerase pausing, 5′GRO-seq, GRO-seq

Abstract

Transcriptional regulation of gene expression is a major mechanism used by plants to confer phenotypic plasticity, and yet compared with other eukaryotes or bacteria, little is known about the design principles. We generated an extensive catalog of nascent and steady-state transcripts in Arabidopsis thaliana seedlings using global nuclear run-on sequencing (GRO-seq), 5′GRO-seq, and RNA-seq and reanalyzed published maize data to capture characteristics of plant transcription. De novo annotation of nascent transcripts accurately mapped start sites and unstable transcripts. Examining the promoters of coding and noncoding transcripts identified comparable chromatin signatures, a conserved “TGT” core promoter motif and unreported transcription factor-binding sites. Mapping of engaged RNA polymerases showed a lack of enhancer RNAs, promoter-proximal pausing, and divergent transcription in Arabidopsis seedlings and maize, which are commonly present in yeast and humans. In contrast, Arabidopsis and maize genes accumulate RNA polymerases in proximity of the polyadenylation site, a trend that coincided with longer genes and CpG hypomethylation. Lack of promoter-proximal pausing and a higher correlation of nascent and steady-state transcripts indicate Arabidopsis may regulate transcription predominantly at the level of initiation. Our findings provide insight into plant transcription and eukaryotic gene expression as a whole.


Gene expression is a hallmark of life and subject to adaptation in changing environments. Steady-state transcript levels are a result of transcription initiation, elongation, and termination, followed by maturation and decay. Much has been learned about transcriptional mechanisms using yeast and animal models. In contrast, owing to technical difficulties created by plant cell extracts, there remains a large gap in knowledge in plant transcription. Plants and animals diverged more than 1.6 billion years ago. Studying plant transcription therefore not only contributes to a better understanding of the world’s largest food source but also the evolution of eukaryotic gene expression.

The signals initiating transcription are ultimately integrated at the promoter. Sequence-specific transcription factors (TFs) commonly bind the proximal promoter around −150 to −50 bp upstream of the transcriptional start site (TSS) (1, 2). At the core promoter, located approximately ±50 bp relative to the TSS, basal TFs cooperate with conserved DNA sequence motifs to orchestrate recruitment of the RNA polymerase (RNAP) (1, 3). Transcription has been studied extensively in a number of species (1–3) but not in plant model systems. Studies focusing on promoter-enriched sequences were hindered by the lack of precise TSSs (4, 5). TSS mapping has improved dramatically through techniques such as paired-end analysis of transcription start sites (3PEAT) (6) and cap analysis gene expression (CAGE) (7), but both methods are affected by RNA processing and transcript stability.

To comprehensively study global transcription it is essential to map all transcripts, regardless of RNA stability. Nascent RNA sequencing by global nuclear run-on sequencing (GRO-seq) (8), precision nuclear run-on sequencing (PRO-seq) (9), or native elongating transcript sequencing (NET-seq) (10) highlighted the abundance of unstable transcripts in some eukaryotes such as yeast and mammals (11), and yet these methods have been difficult to perform in plants. GRO-seq was recently used in maize seedlings and provided important insight into monocot transcription (12) but with limited TSS data and the omission of sarkosyl during the run-on reaction. Sarkosyl blocks new RNAP initiation and is required for unhindered elongation and efficient pause release (13, 14). We thus sought to optimize traditional GRO-seq for plants using Arabidopsis as a model, with the aim of making it readily available to the community.

Here, we report an adapted GRO-seq method (8), as well as a new version of HOMER (15), to facilitate analysis of plant next-generation sequencing (NGS) data. In this study, we focus on 7meG-capped transcripts as generated by RNAP II from 6-day-old Arabidopsis seedlings to identify transcripts encoding protein-coding genes, microRNAs (miRNAs), and other noncoding RNAs. De novo annotation of nascent transcripts revealed many unstable noncoding transcripts, although these transcripts were underrepresented in Arabidopsis compared with mammals. Motif analysis identified previously unreported promoter motifs and revealed comparable structures for promoters of coding and noncoding transcripts. Nascent RNA sequencing highlighted the lack of divergent transcripts and promoter-proximal pausing but prominent 3′ pausing that was also apparent in maize. Together, these data affirm distinct features of plant transcription and demonstrate remarkable diversity in the regulation of eukaryotic transcription.

Results

Nascent Transcript Profiles in Arabidopsis thaliana.

To comprehensively characterize the general features of transcription in plants, we adapted GRO-seq and 5′GRO-seq for use with 6-day-old A. thaliana seedlings (Fig. 1A and Fig. S1; see SI Materials and Methods for a detailed protocol). GRO-seq captures nascent RNA independent of RNA stability, thereby providing precise maps of engaged RNAP in a strand specific manner (8); 5′GRO-seq specifically enriches for cap-protected 5′ ends, facilitating TSS mapping of nascent transcripts at single-nucleotide resolution (16). Through enzymatic modifications, we enriched nascent transcripts produced by RNAP II (for details, see Fig. S2). We further profiled steady-state transcripts by conventional RNA-seq for comparison with nascent transcript levels. As exemplified for the gene At4g10180, GRO-seq reads align to the full transcript including introns, 5′GRO-seq enriches for 5′ fragments of the gene, and RNA-seq maps the mature, intron-less transcript (Fig. 1B). For our analysis, we expanded the HOMER (15) software for plants. In total, we observed active transcription covering ∼40% of the genome by GRO-seq and 28% by RNA-seq in 6-d-old Arabidopsis seedlings at 33 million reads. Although this number is in part dependent on sequencing depth (Fig. S3A), it notably differs from humans, where ∼75% of the genome was found to be transcribed across different cell lines, with no individual line transcribing more than 57% (17). The ratio of GRO-seq/RNA-seq coverage was 1.39 in Arabidopsis, which is significantly smaller than in humans at 1.93 (Wilcoxon P value < 0.01; Fig. 1C), suggesting there are fewer unstable transcripts and introns in Arabidopsis.

Fig. 1.


GRO-seq reveals distinct features in A. thaliana transcription. (A) GRO-seq method in Arabidopsis. (B) Browser shot of sample gene At4g10180.1 with normalized read densities along the y axis. (C) Ratio of nascent/steady-state transcript genome coverage as a function of GRO-seq/RNA-seq coverage for Arabidopsis seedlings and human IMR-90 cells (8). The Wilcoxon test was used to calculate P value. (D) Distribution of RNA-seq and GRO-seq reads relative to annotations or extended annotations (±500 bp) (Right) for Arabidopsis and human IMR-90 cells. (E) Metaplot of GRO-seq signal from annotated genes normalized for reads per bp per gene along y axis for Arabidopsis and human IMR-90 cells. (F) Intergenic sites were defined by DNase-seq peaks [FEA4 ChIP-seq for maize (19)], and heat maps were generated ±1 kb from intergenic sites for signal from DNase-seq, GRO-seq, RNA-seq, H3K4me3, H3K9/27ac, and input in Arabidopsis, maize, and IMR-90 cells (8, 12, 18–25). Sites are sorted based on the total GRO-seq signal observed within 400 bp of the intergenic peak.

Fig. S1.


Summary of sequencing experiments performed and published data reanalyzed for this study.

Fig. S2.


(A) Effect of enzymes on 5′ monophosphorylated (5′Pi) or capped RNA (CAP). T7 RNAP-synthesized RNA (264 nt) was kinased using T4 PNK and [γ-32P]ATP or capped with the Vaccinia Capping System (M2080) and [α-32P]GTP, as described by the manufacturer. (B) Comparison of RppH activity on 32P-capped RNA in buffer NEB II vs. NEB T4 RNA ligase buffer. (C) 32P-capped RNA (10 pmol) (264 nt) incubated with 0.5 U of RppH at 37 °C and 20 °C. (D) [32P]5′-adenylated oligo (20 pmol) (55 nt) incubated with 2 U of RppH at 20 °C and 37 °C in T4 RNA ligase buffer. (E) Assessment of run-on length: nuclei were run on using the described run-on conditions (20 nM CTP-limiting) for the indicated time in the presence and absence of 4 ng/µL α-amanitin, a concentration efficiently inhibiting RNAP II transcription. For visualization of actual run-on length, nuclei were incubated in Freezing Buffer + RNase A (0.25 mg/mL) for 20 min at 4 °C followed by 5 min at RT and consecutively washed three times before run-on.

Fig. S3.


(A) A. thaliana genome coverage at a given GRO-seq or RNA-seq depth with SDs. (B) Metaplot of GRO-seq and RNA-seq signal from unidirectional promoters of annotated genes. Only isolated TSSs where the closest TSS for another gene is at least 1 kb away were used. GRO-seq/RNA-seq data are presented normalized for reads per base pair per gene along the y axis for Arabidopsis, corn, and human IMR-90 cells (refs. 8, 12, and 21; NCBI GEO database accession no. GSE76939). The distance from the TSS is plotted along the x axis.

In addition to the nuclear and mitochondrial genomes, plants contain a third, densely packed chloroplast genome derived from the cyanobacterial lineage. Although we depleted chloroplasts during nuclei isolation and selected against 5′-monophosphorylated RNAs, the organelle is so abundant that a substantial number of nascent transcripts were still captured. We found ∼76% of the chloroplast genome to be actively engaged, with notable bidirectional transcription, demonstrating pervasive transcription in this organelle. This result demonstrates the potential for characterizing prokaryotic or viral polymerases using GRO-seq, even though this was not the goal of our study.

GRO-seq revealed that 83% and 68% of engaged RNAPs occupy the sense strand of genes in the Arabidopsis nuclear and chloroplast genomes, respectively. On the nuclear genome, ∼4% occupied the antisense strand of genes, and 13% mapped to unannotated regions. These numbers were significantly higher for the chloroplast genome, with 7% and 25% mapping to antisense genic and unannotated regions (Fig. 1D). By comparison, 98% of the RNA-seq reads mapped to the sense strand of nuclear genes. Together, these findings suggest that nuclear RNAPs are heavily engaged on the sense strand in Arabidopsis, particularly compared with humans (8). Indeed, expanding the annotations by 500 bp to either side increases the fraction of nuclear engaged RNAP within annotations to 95%, suggesting the majority of nonannotated transcription occurs directly adjacent to the annotated TSS and transcription termination site.

Notably, Arabidopsis seedlings lack significant divergent transcription as well as promoter-proximal pausing (Fig. 1E). To investigate these findings more thoroughly, we removed promoters within 1 kb of each other to prevent signal overlap appearing as promoter antisense transcription. Replotting the Arabidopsis and maize GRO-seq data (12) revealed striking directionality (Fig. S3B). It is important to note that the run-on reactions in maize were performed in the absence of sarkosyl, which blocks the initiation but not elongation of RNAP complexes (13) and strips off DNA-associated proteins such as histones (14). The prevalence of promoter-proximal pausing can thus not be ruled out in maize. However, the apparent lack of promoter-proximal pausing in Arabidopsis suggests that transcription in this species may be regulated predominantly at the level of initiation.
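The promoter filtering step described above (keeping only TSSs whose nearest neighboring TSS is at least 1 kb away) can be sketched as follows. This is a minimal illustration, not the HOMER implementation used in the study; the function name `isolated_tss`, the `(gene_id, chrom, position)` input layout, and the assumption of one TSS per gene are all hypothetical.

```python
from collections import defaultdict


def isolated_tss(tss_list, min_gap=1000):
    """Keep TSSs whose nearest neighboring TSS on the same chromosome
    is at least `min_gap` bp away, regardless of strand.

    tss_list: iterable of (gene_id, chrom, position) tuples.
    Assumes one TSS per gene (an assumption made for this sketch).
    """
    by_chrom = defaultdict(list)
    for gene, chrom, pos in tss_list:
        by_chrom[chrom].append((pos, gene))

    kept = []
    for chrom, entries in by_chrom.items():
        entries.sort()  # sort by position so neighbors are adjacent
        for i, (pos, gene) in enumerate(entries):
            left_ok = i == 0 or pos - entries[i - 1][0] >= min_gap
            right_ok = (i == len(entries) - 1
                        or entries[i + 1][0] - pos >= min_gap)
            if left_ok and right_ok:
                kept.append((gene, chrom, pos))
    return kept


example = [
    ("g1", "chr1", 1000),
    ("g2", "chr1", 1500),   # within 1 kb of g1, so both are dropped
    ("g3", "chr1", 5000),
    ("g4", "chr2", 1200),
]
print(isolated_tss(example))
```

Ignoring strand when measuring the gap is deliberate here: signal from a nearby promoter on the opposite strand is exactly what could be mistaken for promoter antisense transcription.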

To investigate the presence of enhancer RNAs in plants, we mapped intergenic open chromatin regions using published DNaseI hypersensitivity (DNase-seq) data for Arabidopsis (18) and FASCIATED EAR 4 (FEA4) ChIP-seq (19) peaks for maize (due to a lack of DNase-seq data). In total, 2,467 putative intergenic enhancers were identified in Arabidopsis and 4,665 in maize compared with 21,847 in the human lung fibroblast IMR-90 cell line. Sites were sorted by their GRO-seq signal, and heat maps were generated for ±1 kb around each intergenic site of DNaseI chromatin accessibility. Very little GRO-seq signal, RNA-seq signal, or enhancer-associated chromatin marks (H3K9/27ac) were found in Arabidopsis and maize compared with humans, and both plants lack the distinctive bidirectional transcription common at mammalian enhancers (Fig. 1F) (8, 12, 18–25). Given these data, it appears that if plants have enhancer elements, they rarely, if at all, produce transcripts and therefore differ from mammalian enhancers.

Nascent Transcript Identification.

Unlike RNA-seq, which measures steady-state levels of RNA species, GRO-seq captures nascent transcripts independent of transcript stability (8, 16). This property is exemplified by the microRNA MIR158A (Fig. 2A) (26). The annotated miRNA used in previous studies is 100 bp, whereas the actual primary transcript as mapped by GRO-seq is more than 1 kb in length and initiates several hundred base pairs upstream of the current annotation (26). Additionally, GRO-seq captures transcripts previously undetected by RNA-seq or 3PEAT (Fig. 2A). We therefore created an unbiased atlas of Arabidopsis transcription using de novo transcript discovery based on GRO-seq expression and 5′GRO-seq to annotate the 5′ ends of each transcript. This identified 9,200 high-confidence transcripts defined by a continuous transcribed region (>10 reads GRO-seq) with a TSS defined by 5′GRO (>threefold-enriched) for 8,767 annotated protein-coding genes, 264 gene-antisense, 16 promoter-antisense, 117 annotated noncoding, and 36 unannotated intergenic transcripts (Fig. 2B and Dataset S1). Intergenic transcripts were unstable and significantly less abundant than described in human cell lines (8, 27, 28). However, gene-antisense RNAs, which were described as modulators of gene expression (29), are more enriched in Arabidopsis, suggesting an increase in antisense gene regulation in plants compared with humans (12, 27, 29). Comparison of GRO-seq and RNA-seq transcript levels at each transcript provides an estimate of transcript stability because nascent transcripts are unaffected by degradation. Plotting the de novo transcripts with respect to their GRO-seq and RNA-seq levels revealed a range of stability for annotated protein-coding genes compared with general instability for noncoding annotated RNAs, miRNAs, promoter-antisense transcripts, and unannotated noncoding transcripts (Fig. 2C). The transcript start sites largely agreed with those defined using 3PEAT in roots (6), but noncoding transcripts were more exclusive to 5′GRO-seq (Fig. S4).

Fig. 2.


Genome-wide identification and characterization of nascent transcripts in A. thaliana. (A) Example browser shot for noncoding transcripts MIR158a and At2g30520. Relative read densities for 5′GRO/GRO and RNA-seq were scaled by 10% to enable visualization alongside the 3PEAT TSS mapping data at these loci. (B) Classification of de novo-identified high-confidence transcripts with HOMER. (C) RNA stability plot of RNA-seq reads vs. GRO-seq reads for identified transcript groups in B. (D) Representative list of GO terms with Benjamini values (multiple testing-corrected) identified for the most stable and unstable transcripts. Stable transcripts were defined as having >eightfold more RNA-seq than GRO-seq over exons. Unstable transcripts were defined as having >eightfold more GRO-seq than RNA-seq. (E) Metaplot at the TSSs of 8,767 annotated genes and 153 noncoding transcripts from B with normalized reads per base pair per TSS for chromatin modifications (18, 23, 25) (Top) and transcripts (Bottom).

Fig. S4.


Comparison of 5′GRO and 3PEAT methods. (A) Distribution of TSSs identified from 6-d-old Arabidopsis seedling 5′GRO-seq (blue) and 7-d-old Arabidopsis root 3PEAT (red) data relative to the TAIR10 TSS annotation. The diagram depicts the number of TSSs called and the overlap among the datasets. (B) Mapping of 3PEAT TSSs against 5′GRO TSSs. (C) Comparison of 5′GRO vs. 3PEAT TSSs: all TSSs from non–protein-coding genes as a function of the 3PEAT/5′GRO-seq ratio. Note that the datasets were derived from different tissues, and yet TSSs specific to 3PEAT were predominantly protein-coding, whereas TSSs specific to 5′GRO-seq were associated with both noncoding and protein-coding genes.

Comparison of RNA-seq vs. GRO-seq for exon coverage of human IMR-90 cells revealed a higher variance and lower correlation than in Arabidopsis (Arabidopsis, r2 = 0.57; Human, r2 = 0.32; Fig. S5A), underlining a much tighter correlation between transcription and steady-state RNA levels in Arabidopsis. Only exons were used to avoid bias associated with differential intron length between species. Together with the absence of promoter-proximal pausing, this correlation suggests that Arabidopsis transcription is regulated more predominantly at the level of initiation than human transcription.

Fig. S5.


(A) RNA stability plot using exons. [log2]-normalized RNA-seq reads vs. [log2]-normalized GRO-seq reads for RefSeq genes in IMR-90 cells (8, 21) and Arabidopsis seedlings. (B) GO analysis of terms enriched for genes with TGT-containing (motif 5) promoters over Inr-containing (motif 1) promoters. (C) Motif finding on TGT-containing promoters. (D) The percentage of promoters containing each motif in the region 200 bp upstream and 100 bp downstream of the TSSs. Each motif was analyzed for the transcript classes of coding genes (purple), noncoding transcripts (green), and gene-antisense transcripts (red) as well as random genomic regions as a control (blue).

To investigate whether RNA stability was associated with biological functions, we performed gene ontology (GO) analysis of the most stable and unstable transcripts (>eightfold enriched; Dataset S2). Stable transcripts were associated with translation, photosynthesis, and metabolic functions, whereas unstable transcripts were enriched for stimulus response genes, signal transduction, and hormones. These findings are consistent with the biological theme that transcripts associated with essential processes are stable, whereas regulated genes tend to be less stable.
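The eightfold stability cutoff used for this GO analysis can be illustrated with a short sketch. It assumes normalized exon read counts per transcript as input; the function name `classify_stability` and the pseudocount (added to avoid division by zero) are assumptions made for illustration and are not taken from the study.

```python
import math


def classify_stability(gro_reads, rna_reads, fold=8.0, pseudocount=1.0):
    """Label a transcript by comparing steady-state (RNA-seq) to nascent
    (GRO-seq) signal over exons.

    "stable":   more than `fold` times more RNA-seq than GRO-seq
    "unstable": more than `fold` times more GRO-seq than RNA-seq
    The pseudocount is a hypothetical choice for this sketch.
    """
    log2_ratio = math.log2((rna_reads + pseudocount) /
                           (gro_reads + pseudocount))
    threshold = math.log2(fold)  # log2(8) = 3
    if log2_ratio > threshold:
        return "stable"
    if log2_ratio < -threshold:
        return "unstable"
    return "intermediate"


for gro, rna in [(10, 200), (200, 10), (100, 100)]:
    print(gro, rna, classify_stability(gro, rna))
```

Working on the log2 scale makes the two cutoffs symmetric around zero, which matches how the stability plots compare log2-normalized RNA-seq and GRO-seq reads.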

Differential analysis of the epigenetic landscape revealed unifying signatures at annotated protein-coding genes and de novo noncoding transcripts. H3K4me3, H3K27ac, and H3K36me3 peaked in proximity to the +1 nucleosome and H3K4me2 slightly downstream (23, 25). DNase hypersensitivity overlapped with the promoter region, and H3K4me1 and H3K36me2 increased along the body of the gene (18, 23, 25), as expected (30, 31) (Fig. 2E).

Arabidopsis Promoter Structures and Identification of the TGT Core Promoter Motif.

High-resolution nascent TSS data are provided by 5′GRO-seq, enabling investigation of promoter elements in a distance-specific manner. The core promoter region of the 9,200 identified Arabidopsis transcripts was remarkably GC-poor, with a strong AT enrichment around −30 bp, suggesting a predominant role of the TATA-box (Fig. 3A). This finding contrasts with human core promoters that are ordinarily GC-rich and only slightly enriched for the TATA-box (∼10%). De novo motif analysis of initiation sites using HOMER underlined the strong prevalence of an Initiator element (Inr)-like motif (44.8%) and variations thereof (Fig. 3B). Notably, the Arabidopsis Inr consensus sequences “TYA(+1)YYN” and “TYA(+1)GGG” differ from the traditional Inr “TCAKTY” in Drosophila (32). Our analysis further identified an “initiator” that we termed the “TGT motif” at ∼4% of Arabidopsis TSSs (Fig. 3B). Analysis of HeLa (28) and Drosophila S2 5′GRO-seq data revealed the TGT motif to be conserved in humans and flies (Fig. 3C). Compared with the enriched GO terms of genes associated with Inr-containing promoters, TGT-associated genes were slightly enriched for terms related to negative regulation, chromatin organization, gene silencing, and dsRNA response (Fig. S5B).
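To make the position-anchored consensus notation concrete, the sketch below tests whether a sequence matches an IUPAC consensus such as TYA(+1)YYN with the +1 base aligned to the TSS. It is a simplified stand-in: HOMER scores motifs with position weight matrices rather than exact consensus matching, and the helper names `consensus_to_regex` and `matches_at_tss` are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a compiled regex."""
    return re.compile("".join(IUPAC[base] for base in consensus))


def matches_at_tss(seq, tss_index, consensus, plus1_offset):
    """Check whether `consensus` matches `seq` with its +1 base on the TSS.

    tss_index:    0-based index of the +1 nucleotide in `seq`.
    plus1_offset: 0-based index of the +1 base within `consensus`
                  (2 for "TYAYYN", where the A is +1).
    """
    start = tss_index - plus1_offset
    if start < 0:
        return False
    window = seq[start:start + len(consensus)]
    if len(window) != len(consensus):
        return False
    return consensus_to_regex(consensus).fullmatch(window) is not None


sequence = "GGTCACTAGG"   # toy sequence; index 4 (A) is taken as +1
print(matches_at_tss(sequence, 4, "TYAYYN", 2))  # window is TCACTA
print(matches_at_tss(sequence, 4, "TYAGGG", 2))
```

Anchoring the match to a fixed offset, rather than scanning the whole promoter, mirrors the position-restricted nature of core promoter elements.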

Fig. 3.


Arabidopsis promoter features and motifs. (A) Metaplot of nucleotide frequency with respect to the +1 TSS as defined by 5′GRO-seq for annotated transcripts at the core promoter region of Arabidopsis and human HeLa samples (28). (B) Position-restricted de novo motif analysis of Arabidopsis initiating nucleotides using HOMER. Percentage of motifs at TSSs compared with background levels. (C) TGT motif as identified at Drosophila S2 and human (HeLa) TSSs. (D) De novo motif analysis of the proximal promoter region from −150 to +50 with respect to the TSS using HOMER. Identified motifs with possible matches (Left) and percent of TSSs containing the motif (Right) along with background levels. (E) Metaplot of the TF binding sites (Left) and simple repeat elements (Right) in distance to the TSS along the x axis. (F) Simplified schematic of regulatory sequence features at Arabidopsis promoters.

Sequence-specific TFs modulate gene expression and commonly bind the proximal promoter region. De novo motif analysis using HOMER highlighted a predominance of simple sequence repeats in Arabidopsis but also identified TF binding sites, two of which were unknown (Fig. 3 D and E). The most prevalent sequence patterns were TC and CKT repeats present downstream of the TSS and simple polyA/T repeats upstream thereof. AT-rich sequences were reported to inhibit nucleosome formation and aid DNA flexibility, thereby facilitating TF recruitment (33). The TATA-box was found in 31% of plant promoters, similar to the 29% reported previously (4). This percentage is higher than in Drosophila or humans and suggests that the −30 and +1 regions are particularly relevant in Arabidopsis core promoters. The most enriched TF motifs were the enhancer box (E-box) and TEOSINTE-BRANCHED CYCLOIDEA/PROLIFERATING CELL FACTOR (TCP/PCF) elements, with 10.5% and 13.2%, respectively. The E-box is targeted by the basic helix–loop–helix TFs (34), and both the E-box and TCP motif are conserved among plants (35) and critical for development, which is reflected in their prevalence in the promoters of genes regulated during seedling development. Additionally, two unidentified motifs were found that have not been reported previously in Arabidopsis (Fig. 3 D and E). Using the Catalogue of Inferred Sequence Binding Preferences (CIS-BP) database (36), the closest but not exact match for unknown motif 1 was Sterol Uptake Control Protein 2 (UPC2) (NRWACGA), whereas unknown motif 2 matched best to Activator of Stress Genes 1 (ASG1) (WTCCGG), both belonging to the zinc cluster TF family in Saccharomyces (37). The factors binding these motifs in Arabidopsis remain to be identified. Cognate promoter motifs in TGT- and Inr-containing promoters did not differ notably, although the TATA element was slightly depleted and TCP/PCF elements were enriched in TGT-containing promoters (Fig. S5C).

Given our classification of coding, noncoding, and antisense transcripts, we compared the promoters from each class. Although there were some minor differences (Fig. S5D), overall, the classes contained similar sequence motifs and general sequence composition. Transcript stability and transcript properties are thus unlikely to be encoded within the promoter. A simplified model for the basal Arabidopsis promoter structure at this stage of development is proposed (Fig. 3F).

RNAP 3′ Accumulation.

Although promoter-proximal pausing in Arabidopsis seedlings was not readily apparent, we noticed a sharp accumulation of RNAP adjacent to the 3′ polyadenylation site (PAS) (38) (Fig. 4A). Analysis of published RNAP II ChIP-seq (40) also showed a clear increase in 3′ paused polymerase (Fig. 4B), whereas the RNA-seq signal approaches zero as expected. 3′ pausing to a lesser extent was previously described (8, 39), but a mechanism remains elusive. We found no defining chromatin marks, polyadenylation signals, or nucleotide frequency differences between paused and unpaused genes but found an association between 3′ pausing and both gene length and CpG methylation (Fig. 4 C and E). A breakdown of genes by length revealed several distinct characteristics between the GRO-seq signal, RNA-seq signal, and amount of 3′ RNAP accumulation (Fig. 4D). Longer genes show a higher accumulation in the GRO-seq reads compared with RNA-seq. Reanalyzing maize GRO-seq data (12) showed similar strong 3′ accumulation of nascent RNA that increased with gene length (Fig. S6A), suggesting that the higher level of 3′ pausing for longer genes is a characteristic of plant transcription.

Fig. 4.


Arabidopsis shows extensive 3′ RNAP accumulation in proximity to the PAS followed by rapid transcription termination. (A) Metaplot of GRO-seq signal from TAIR10 annotated genes for Arabidopsis and human samples (8). Reads were normalized as reads per base pair per gene. (B) RNAP II accumulation as shown by ChIP-Seq (purple) (40) with RNA-seq (+) (green) and (-) (tan) signal around the PAS. (C) The average gene length as a function of 3′ pausing index for expressed genes (gene body fragments per kilobase of transcript per million mapped reads: >5; at least 10 reads from −250 to 0 relative to the PAS). Index calculated as the ratio of the reads from 0 to +250 relative to the PAS compared with the reads from 0 to −250. Randomized data are shown in tan, and SDs were calculated based on 1,000 randomizations. (D) Metaplots anchored by TSS (0%) and PAS (100%) of GRO-seq and RNA-seq for genes >2.0 kb (Left) and <1 kb (Right) in total length. Reads were normalized as per base pair per gene. (E) CpG methylation (Left) and CHG methylation (Right) (50) were plotted as percentage methylation along the normalized gene body.

Fig. S6.


(A) Metagene plots anchored by the TSS (0%) and PAS (100%) of corn GRO-seq (12) and RNA-seq (unstranded) (NCBI GEO database accession no. GSE76939) for genes >2.0 kb (Left) and <1 kb (Right). Reads were normalized per base pair per gene. (B) Methylation of corn using DNA methylation ChIP for long genes >2 kb (Left) and short genes <1 kb (Right) (24). Reads were normalized per base pair per gene.

In addition to gene length, CpG methylation was associated with 3′ pausing. In Arabidopsis, CpG methylation was excluded from the 3′ pausing sites and promoter regions of transcribed genes, whereas CpG methylation is only excluded from the promoter region in mammalian systems (22) (Fig. 4E). In aggregate, long genes showed an average maximum of CpG body methylation at ∼40% in the middle of the gene, which drops below 10% at their 5′ and 3′ ends. However, short genes rarely showed over 15% body methylation and drop comparatively minimally at the 3′ end, demonstrating a connection between CpG methylation and 3′ pausing. In contrast, CHG methylation exhibited a distinctly different pattern with a drop at the promoter but no decrease at the pause site, suggesting a specific exclusion of CpG methylation from the PAS in Arabidopsis. Although a comparable MethylC-sequencing dataset did not exist in maize, we were able to reanalyze methyl-DNA immunoprecipitation (24) to show methylation is generally excluded from both the 3′ and 5′ ends of genes as seen in Arabidopsis (Fig. S6B). These data demonstrate a connection between 3′ pausing of engaged RNAP and CpG methylation as another distinct characteristic of plant transcription.
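The 3′ pausing index described in the Fig. 4C legend (reads from 0 to +250 relative to the PAS divided by reads from −250 to 0, requiring at least 10 reads upstream) can be sketched as below. The function name, the use of single read positions, and the half-open window boundaries are assumptions made for illustration; the gene-body expression filter mentioned in the legend is not implemented here.

```python
def pausing_index_3prime(read_positions, pas, strand="+",
                         window=250, min_upstream=10):
    """Ratio of reads downstream of the PAS to reads upstream of it.

    read_positions: genomic coordinates of reads on the gene's strand
                    (one coordinate per read; a simplifying assumption).
    pas:            genomic coordinate of the polyadenylation site.
    Returns None when fewer than `min_upstream` reads fall in the
    upstream window, mirroring the read cutoff in the legend.
    """
    upstream = 0
    downstream = 0
    for pos in read_positions:
        # Convert to a coordinate relative to the PAS in the direction
        # of transcription, so minus-strand genes are handled the same way.
        rel = pos - pas if strand == "+" else pas - pos
        if -window <= rel < 0:
            upstream += 1
        elif 0 <= rel < window:
            downstream += 1
    if upstream < min_upstream:
        return None
    return downstream / upstream


pas_site = 10_000
reads = [pas_site - 100] * 10 + [pas_site + 50] * 30
print(pausing_index_3prime(reads, pas_site))
```

An index above 1 indicates more engaged polymerase just past the PAS than just before it, which is the pattern the metaplots show for long genes.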

SI Materials and Methods

Plant Material and Growth Conditions.

Seeds of the A. thaliana accession Columbia (Col-0) were sterilized with chlorine gas, generated by mixing 100 mL of bleach and 5 mL of concentrated HCl, and then grown on plates containing half-strength LS medium (Caisson Laboratories). Plates were placed at 4 °C for 3 d for stratification and then placed in growth chambers with 24 h of light at 22 °C for 6 d before tissue collection.

Nuclei Isolation.

Approximately 20 g of 6-d-old seedlings were homogenized with an OMNI International General Laboratory Homogenizer in ∼100 mL of ice-cold grinding buffer [300 mM sucrose, 20 mM Tris (pH 8.0), 5 mM MgCl2, 5 mM KCl, 0.2% (vol/vol) Triton X-100, 5 mM β-mercaptoethanol, 35% (vol/vol) glycerol] at 4 °C. Samples were filtered twice through a 250-μm nylon mesh and then Miracloth (EMD Millipore) before being passed through a 50-µm cell strainer into 50-mL conical tubes. Tubes were spun for 10 min at 5,000 × g, and supernatant was discarded. Pellets were washed twice by homogenization in cold grinding buffer using a Kimble loose dounce (Fisher Scientific). Nuclei were resuspended in freezing buffer [50 mM Tris (pH 8.0), 5 mM MgCl2, 20% (vol/vol) glycerol, 5 mM β-mercaptoethanol] and snap frozen in liquid N2.

GRO-seq and 5′GRO-seq Library Preparation.

GRO-seq and 5′GRO-seq were performed similarly to the procedures described in refs. 8, 14, and 35, with the following modifications (detailed protocol below):

  • i) RNA isolation: The prevalence of secondary metabolites and cell wall residues can make resuspension of the run-on RNA difficult if it is not sufficiently cleaned up.

  • ii) Terminator Exonuclease: After the nuclear run-on step, we have included treatment with Terminator Exonuclease (TER51020; Epicentre) to degrade 5′ monophosphorylated transcripts from RNAP I and likely IV and V, as well as chloroplast RNA from the chloroplast RNA polymerases. RNA 5′ polyphosphatase (RP8092H) can be added to additionally deplete Pol III transcripts.

  • iii) Decapping using RppH: Tobacco acid phosphatase is commonly used; however, we prefer RppH (M0356) because it is active in T4 RNA ligase buffer and less expensive. In addition, we found RppH to be active at 37 °C but not at 20 °C, allowing us to perform the decapping and immediately proceed to 3′ adapter ligation at 20 °C without changing buffers or cleaning up the RNA (Fig. S2).

  • iv) Nuclear run-on temperature: 30 °C is a strong stress condition for plants. The temperature was reduced to room temperature (RT) (22 °C) for the nuclear run-on.

Nuclear run-on.

Run-on of ∼2 × 2.5 × 10^6 nuclei in 200 µL of freezing buffer was carried out by addition of 100 µL of 3× NRO reaction buffer [15 mM Tris⋅HCl (pH 8.0), 450 mM KCl, 7.5 mM MgCl2, 1.5% (vol/vol) sarkosyl, 1.5 mM DTT, 0.2 U/µL SUPERase-in (Fisher Scientific), 375 µM ATP, 375 µM GTP, 60 nM CTP, 375 µM BrUTP (Sigma Aldrich)]. The reaction was mixed by pipetting up and down with an end-cut 200-µL tip and incubated in a RT (∼22 °C) water bath for 5 min. Run-on was slowly stopped by addition of 15 µL of 10× RQ1 DNaseI buffer, 50 µL of nuclease-free deionized water (dH2O), and 5 µL of RQ1 DNase (Promega) for 15 min, followed by addition of 80 µL of TxnSTOP mix [20 mM EDTA, 200 mM NaCl, 1% (vol/vol) SDS, 0.3 mg/mL glycogen] plus 10 µL of 2.5 mg/mL proteinase K and incubated at 37 °C for 30 min. RNA was extracted using TRIzol LS (Fisher Scientific), as described by the manufacturer, and RNA pellets were combined during a second 75% (vol/vol) EtOH wash.
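As a worked check of the reaction composition, the sketch below computes final concentrations when 200 µL of freezing buffer is combined with 100 µL of 3× NRO reaction buffer, using the recipes given in this protocol. The helper name `mix` and the dictionary keys are hypothetical, and the volume contributed by the nuclei themselves is ignored.

```python
def mix(components):
    """Combine solutions and return final concentrations.

    components: list of (volume_ul, {solute_name: concentration}) pairs.
    Units are carried in the key names; each solute must use the same
    unit wherever it appears.
    """
    total_volume = sum(volume for volume, _ in components)
    final = {}
    for volume, solutes in components:
        for name, conc in solutes.items():
            final[name] = final.get(name, 0.0) + conc * volume / total_volume
    return final


# Recipes as stated in the protocol text.
FREEZING_BUFFER = {
    "Tris_mM": 50, "MgCl2_mM": 5, "glycerol_pct": 20, "bME_mM": 5,
}
NRO_BUFFER_3X = {
    "Tris_mM": 15, "KCl_mM": 450, "MgCl2_mM": 7.5, "sarkosyl_pct": 1.5,
    "DTT_mM": 1.5, "ATP_uM": 375, "GTP_uM": 375, "CTP_nM": 60,
    "BrUTP_uM": 375,
}

final = mix([(200, FREEZING_BUFFER), (100, NRO_BUFFER_3X)])
for name in sorted(final):
    print(f"{name}: {final[name]:.2f}")
```

The limiting CTP works out to 20 nM in the 300-µL reaction, in line with the "20 nM CTP-limiting" run-on conditions noted in the Fig. S2 legend.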

Terminator and DNase treatment.

RNA pellets were resuspended in ∼24 µL of dH2O with 0.05% (vol/vol) Tween20 (dH2O+T); the volume is increased as needed. Full resuspension is critical. Add Terminator 5′–Phosphate-Dependent Exonuclease (TER51020; Epicentre) Buffer A and supplement with 5 mM CaCl2, 0.2 U/µL Terminator enzyme, 0.5 U/µL SUPERase-in, and 0.2 U/µL RQ1 DNase (Promega). Incubate for 1 h at 30 °C. Critical: the required amount of Exonuclease is dependent on nuclei purity. Titration is suggested. Rebuffer RNA using P30 RNase-free spin column (Bio-Rad) to remove free BrUTP.

RNA hydrolysis.

Bring RNA to 27 µL with dH2O+T and add 3 µL of 10× fragmentation reagent (AM8740; Ambion) on ice. Incubate for 12 min at 70 °C in a 1.5-mL tube and then put on ice and add 3.3 µL of provided STOP solution. (Note: If exclusively doing 5′GRO, skip fragmentation and then 3′ repair.)

BrUTP enrichment.

BrdU antibody beads (sc-32323AC; Santa Cruz Biotechnology) may contain RNase. Testing of the lot number is recommended. Wash beads once in GRO binding buffer [0.25× saline-sodium-phosphate-EDTA buffer (SSPE), 0.05% (vol/vol) Tween, 37.5 mM NaCl, 1 mM EDTA] plus 300 mM NaCl and then three times in GRO binding buffer. Resuspend as 25% (vol/vol) slurry with 0.1 U/µL SUPERase-in. Beads are stable in GRO binding buffer at 4 °C for at least 1 mo.

Bring the sample to 500 µL with cold GRO binding buffer and add 40 µL of equilibrated BrdU antibody beads. Slowly rotate samples at 4 °C for 80 min. Spin down beads for 30 s at 1,000 × g and let beads settle. Remove supernatant but do not disturb beads. Transfer beads to a Millipore Ultrafree MC column (UFC30HVNB; Millipore) with 2× 200 µL of cold GRO binding buffer. Spin at 1,000 × g for 1 min and discard flow-through. Wash 2× with 450 µL of GRO binding buffer for 5 min under fast rotation. Move the column to a fresh tube and elute with 200 µL of TRIzol LS under gentle shaking for 5 min. Spin through and repeat elution. Add 120 µL of dH2O+T and extract RNA as described by the manufacturer.

End repair.

Resuspend RNA pellets in 21 µL of dH2O+T and 1 µL of SUPERase-in. Add 6 µL of 5× low pH T4 polynucleotide kinase (PNK) buffer [0.5 M Mes (pH ∼5.6), 50 mM MgCl2, 50 mM mercaptoethanol, 1.5 M NaCl] and 2 µL of PNK. Incubate at 37 °C for 1 h. From here, the protocol diverges for GRO vs. 5′GRO.

For GRO.

Add kinasing master mix containing 10 µL of PNK buffer (NEB), 1 µL of T4 Polynucleotide Kinase, 5 µL of 10 mM ATP, and dH2O+T for a 100-µL final volume and incubate another 1 h at 37 °C.

For 5′ GRO.

Add phosphatase master mix containing 10 µL of NEB CutSmart, 2 µL of CIP (alkaline phosphatase, calf intestinal) (NEB), and dH2O+T to a 100-µL final and incubate another 1 h at 37 °C.

Add 3 µL of 0.4 M EDTA/ 0.1 M EGTA (pH 8.0) to reaction and denature RNA for 3 min at 75 °C and then place on ice.

From here, the protocols join together again. Bring the sample to 500 µL with GRO binding buffer and repeat BrUTP enrichment.
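The end-repair volumes can be checked with simple bookkeeping. The Python sketch below sums the first 3′ repair reaction, confirms that the 5× low pH buffer ends up at 1×, and computes how much dH2O+T each branch needs to reach 100 µL. The names are hypothetical, and the sketch assumes the stated volumes are added directly to the completed 3′ repair reaction.

```python
# Volumes in microliters, taken from the end-repair steps in the text.
END_REPAIR_UL = {
    "RNA in dH2O+T": 21.0,
    "SUPERase-in": 1.0,
    "5x low pH PNK buffer": 6.0,
    "PNK": 2.0,
}
GRO_MIX_UL = {"PNK buffer (NEB)": 10.0, "T4 Polynucleotide Kinase": 1.0, "10 mM ATP": 5.0}
FIVE_PRIME_GRO_MIX_UL = {"NEB CutSmart": 10.0, "CIP": 2.0}
FINAL_UL = 100.0


def water_needed(start_ul, additions_ul, final_ul=FINAL_UL):
    """Volume of dH2O+T required to bring the reaction to final_ul."""
    water = final_ul - start_ul - sum(additions_ul.values())
    if water < 0:
        raise ValueError("components exceed the final volume")
    return water


start_ul = sum(END_REPAIR_UL.values())  # volume of the 3' repair reaction
low_ph_buffer_fold = 5.0 * END_REPAIR_UL["5x low pH PNK buffer"] / start_ul
gro_water_ul = water_needed(start_ul, GRO_MIX_UL)
five_prime_water_ul = water_needed(start_ul, FIVE_PRIME_GRO_MIX_UL)
atp_final_mm = 10.0 * GRO_MIX_UL["10 mM ATP"] / FINAL_UL

print(f"3' repair volume: {start_ul} uL at {low_ph_buffer_fold}x low pH buffer")
print(f"GRO branch water: {gro_water_ul} uL, final ATP: {atp_final_mm} mM")
print(f"5'GRO branch water: {five_prime_water_ul} uL")
```

At 1×, the low pH buffer corresponds to 0.1 M Mes, 10 mM MgCl2, 10 mM mercaptoethanol, and 300 mM NaCl, which matches the final concentrations given for RNA-seq library preparation.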

Decapping.

Dissolve RNA pellet in 3.5 µL of TE’T [10 mM Tris (pH 7.5), 0.1 mM EDTA, 0.05% (vol/vol) Tween 20], heat to 75 °C for 2 min and then place on ice. Add 5.5 µL of decapping master mix [1 µL of 10× T4 RNA Ligase Buffer, 0.5 µL of SUPERase-in, 4 µL of 50% (vol/vol) PEG 8000] and 1 µL (5 U) of RppH (NEB) for up to 100 ng of RNA. Incubate at 37 °C for 1 h.

sRNA library preparation.

Proceed with small RNA library preparation, for example as described for the NEB Next Small RNA Library Prep Set (NEB), with one exception. Because decapping was performed in “10 µL of 3′ Ligation buffer,” add only a total of 10 µL of 3′ Adapter mix (5 µL of 2× buffer, 0.5 µL of 3′ Adapter, 1.5 µL of dH2O+T, 3 µL of Enzyme Mix) and incubate the sample at 20 °C for at least 1 h [RppH is inactive and does not degrade the 3′ adenylated 3′ adapter at 20 °C (Fig. S2)]. Samples were amplified for 11–14 cycles, purified over a 10% (vol/vol) acrylamide 1× Tris/Borate/EDTA (TBE) gel and size-selected for 160–300 bp for 5′GRO-seq and 160–225 bp for GRO-seq.
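The reduced 3′ Adapter mix keeps the ligation buffer at 1× once it is combined with the decapping reaction. The Python sketch below illustrates that check. It assumes that the 10× T4 RNA Ligase Buffer and the 2× ligation buffer share the same 1× composition, which the text implies but does not state; the helper name is hypothetical.

```python
def buffer_strength(parts, total_ul):
    """Fold-concentration of a buffer after mixing.

    parts: iterable of (stock_fold, volume_ul) pairs contributing buffer.
    total_ul: total reaction volume in microliters.
    """
    return sum(fold * vol for fold, vol in parts) / total_ul


# Decapping: 3.5 uL RNA + 5.5 uL master mix + 1 uL RppH.
decap_ul = 3.5 + 5.5 + 1.0
decap_fold = buffer_strength([(10.0, 1.0)], decap_ul)

# PEG 8000 in the decapping reaction: 4 uL of a 50% stock.
peg_percent = 50.0 * 4.0 / decap_ul

# 3' Adapter mix: 5 uL 2x buffer + 0.5 uL adapter + 1.5 uL water + 3 uL enzyme.
adapter_mix_ul = 5.0 + 0.5 + 1.5 + 3.0
ligation_ul = decap_ul + adapter_mix_ul

# Assumption: both buffers are equivalent at 1x, so their contributions add.
ligation_fold = buffer_strength([(10.0, 1.0), (2.0, 5.0)], ligation_ul)

print(f"decapping: {decap_ul} uL, {decap_fold}x buffer, {peg_percent}% PEG 8000")
print(f"ligation: {ligation_ul} uL, {ligation_fold}x buffer")
```

Under this assumption both the decapping reaction and the combined 20-µL ligation reaction sit at 1× buffer, which is why no buffer exchange is needed between the two steps.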

RNA-seq Library Preparation.

Total RNA was extracted from seedlings frozen in liquid nitrogen using RNeasy Plant Mini Kit (Qiagen); 10 µg of total RNA was used for extraction of mRNA with the Poly(A)Purist MAG Kit (Ambion) according to the manufacturer’s instructions. Isolated mRNA was fragmented with Fragmentation Reagents (Ambion) for 14 min. Fragmented RNA was incubated with T4 polynucleotide kinase (NEB) in low pH buffer [0.1 M Mes (pH 5.6), 10 mM MgCl2, 10 mM mercaptoethanol, 300 mM NaCl final] for 3′ repair and then kinased by adding three volumes of PNK buffer, 2 µL PNK, and 1 mM ATP final. The Small RNA Library Prep Set (NEB) was used for library preparation. Samples were amplified for 8–10 cycles, purified over a 10% (vol/vol) acrylamide 1× TBE gel, and size-selected for 160–300 bp.

Analysis for NGS Data.

All data generated by this study and previously published data were reanalyzed in a consistent manner, as described below. Adapter sequences were trimmed from the 3′ ends of all RNA-seq, GRO-seq, and 5′GRO-seq reads, and the reads were then aligned using STAR (version 2.4.0k; default parameters) (51) to the appropriate genome: TAIR10 (Arabidopsis), AGPv3 (maize), or hg19 (human). All ChIP-seq and DNase-seq data were aligned to the appropriate genome using Bowtie 2 (default parameters) (52). Only reads that aligned to the genome at a single unique location were considered for downstream analysis (mapping quality score >10). All analyses or visualizations of NGS data were normalized to a read depth of 10^7 total uniquely aligned reads per experiment. Genome browser tracks depicting normalized read densities for each experiment were generated using HOMER (15) and visualized in IGV (Integrative Genomics Viewer) (53). Metagene plots, histograms, heat maps, GO-enrichment analysis, and gene expression values containing normalized read densities were generated using HOMER, using gene and feature annotations from TAIR10 (Arabidopsis), Ensembl (release 30/AGPv3, maize), and RefSeq (human).
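The read filtering and depth normalization described above can be illustrated with a short Python sketch. This is not HOMER code, which performs the normalization internally; the record layout (a dictionary with a `mapq` key) and all function names are hypothetical and serve only to show the arithmetic.

```python
TARGET_DEPTH = 10_000_000  # 10^7 uniquely aligned reads, as stated in the text


def keep_unique(alignments, min_mapq=10):
    """Keep alignments whose mapping quality is strictly greater than min_mapq."""
    return [aln for aln in alignments if aln["mapq"] > min_mapq]


def scale_factor(n_unique_reads, target=TARGET_DEPTH):
    """Factor that rescales an experiment to the target read depth."""
    if n_unique_reads <= 0:
        raise ValueError("need at least one uniquely aligned read")
    return target / n_unique_reads


def normalize(raw_counts, n_unique_reads):
    """Scale raw per-feature read counts to the target depth."""
    factor = scale_factor(n_unique_reads)
    return {feature: count * factor for feature, count in raw_counts.items()}


# Toy alignment records: only mapq > 10 passes the filter.
alignments = [
    {"read": "r1", "mapq": 255},
    {"read": "r2", "mapq": 3},
    {"read": "r3", "mapq": 10},
    {"read": "r4", "mapq": 60},
]
kept = keep_unique(alignments)

# An experiment with 2.5 x 10^7 unique reads is scaled down by a factor of 0.4.
normalized = normalize({"geneA": 500, "geneB": 20}, n_unique_reads=25_000_000)
print([aln["read"] for aln in kept], normalized)
```

Scaling every experiment to the same depth makes read densities comparable between libraries that were sequenced to different depths.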

De Novo Transcript Identification.

Transcript discovery and annotation were performed using routines in HOMER as described below. Transcription units were identified directly from GRO-seq data using the HOMER program findPeaks (-style groseq with default parameters v4.8) as described by Wang et al. (54), which looks for regions of continuous, strand-specific GRO-seq read coverage to identify transcripts. TSSs were found using findPeaks (-style tss -F 3 -L 3) as described in Lam et al. (16), which identifies strand-specific peaks from 5′GRO-seq data using traditional GRO-seq as a control. Because 5′GRO-seq is only an enrichment for capped protected RNA fragments, only peaks containing threefold more 5′GRO-seq reads compared with traditional GRO-seq were considered as bona fide TSSs. TSSs discovered by 5′GRO-seq were then assigned to GRO-seq transcripts if they were found within 1 kb of the 5′ end of the de novo GRO-seq transcript. In cases where multiple TSSs could be assigned to the same transcript, the TSS with the highest read density was used. Only de novo identified transcripts that were assigned a valid 5′GRO-seq TSS were considered in downstream analysis. Transcripts were annotated into different classes as follows. First, transcripts that strand-specifically overlap any known transcript (TAIR10) were assigned the accession number/annotation of the known transcript. Next, transcripts that overlapped any known transcript on the opposite strand were assigned as “antisense transcripts.” Next, transcripts found upstream of an annotated TSS in the opposite strand of the annotated genes were assigned as “promoter-antisense transcripts.” Transcripts failing to meet any of these criteria were assigned as “novel transcripts.”
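The TSS filtering and assignment rules can be sketched in Python as follows. This is an illustration of the logic described above, not the HOMER implementation: the data classes, field names, and the handling of peaks with zero GRO-seq signal are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    name: str
    chrom: str
    start: int  # leftmost genomic coordinate
    end: int    # rightmost genomic coordinate
    strand: str

    @property
    def five_prime(self):
        """Genomic position of the 5' end, which depends on the strand."""
        return self.start if self.strand == "+" else self.end


@dataclass
class TssPeak:
    chrom: str
    pos: int
    strand: str
    cap_reads: float  # normalized 5'GRO-seq reads in the peak
    gro_reads: float  # normalized GRO-seq reads in the same region


def is_bona_fide(peak, min_fold=3.0):
    """Require at least min_fold more 5'GRO-seq than GRO-seq signal."""
    return peak.cap_reads > 0 and peak.cap_reads >= min_fold * peak.gro_reads


def assign_tss(transcript, peaks, max_dist=1000):
    """Pick the strongest bona fide TSS within max_dist of the 5' end."""
    candidates = [
        p for p in peaks
        if p.chrom == transcript.chrom
        and p.strand == transcript.strand
        and is_bona_fide(p)
        and abs(p.pos - transcript.five_prime) <= max_dist
    ]
    if not candidates:
        return None  # transcript would be excluded from downstream analysis
    return max(candidates, key=lambda p: p.cap_reads)


tx = Transcript("tx1", "Chr1", 1000, 3000, "+")
peaks = [
    TssPeak("Chr1", 1050, "+", 30.0, 5.0),    # passes, weaker signal
    TssPeak("Chr1", 900, "+", 60.0, 10.0),    # passes, strongest signal
    TssPeak("Chr1", 1100, "+", 100.0, 50.0),  # fails the threefold filter
    TssPeak("Chr1", 2500, "+", 200.0, 10.0),  # more than 1 kb from the 5' end
    TssPeak("Chr1", 1000, "-", 500.0, 1.0),   # wrong strand
]
best = assign_tss(tx, peaks)
print(best)
```

In this toy example the peak at position 900 is selected: it passes the threefold filter, lies within 1 kb of the 5′ end on the same strand, and has the highest 5′GRO-seq read count among the remaining candidates.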

Motif Discovery.

De novo motif discovery and calculation of motif positions were performed using HOMER. Two strategies were used to identify proximal-promoter–enriched motifs and Inr elements, respectively. Proximal promoter motifs were identified by applying de novo motif discovery to the sequence from −150 to +50 bp relative to TSS discovered using 5′GRO-seq, using random regions of the Arabidopsis genome as background. To identify Inr motifs, position-restricted motif discovery was performed by specifically analyzing the sequences from −4 to +6 bp relative to the TSS for 11-bp motifs, using randomly selected 11-bp sequences from the surrounding 100 bp of promoter sequences as background.
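The two sequence windows can be extracted in a strand-aware manner as sketched below in Python. The sketch assumes that offsets are counted with the TSS at position 0, so that −4 to +6 spans 11 bp and −150 to +50 spans 201 bp; this convention is an assumption chosen to match the stated 11-bp motif length. The function names and the toy sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def window_sequence(chrom_seq, tss, strand, left, right):
    """Sequence from offset left to offset right around a TSS, inclusive.

    chrom_seq: chromosome sequence as a Python string.
    tss: 0-based index of the TSS in chrom_seq (offset 0 of the window).
    left, right: offsets relative to the TSS, e.g. -4 and 6.
    The returned sequence is always read 5' to 3' on the transcript strand.
    """
    if strand == "+":
        lo, hi = tss + left, tss + right
    else:
        lo, hi = tss - right, tss - left
    if lo < 0 or hi >= len(chrom_seq):
        raise ValueError("window extends beyond the sequence")
    seq = chrom_seq[lo:hi + 1]
    return seq if strand == "+" else revcomp(seq)


chrom = "AAAAACCCCCGGGGGTTTTT"  # toy 20-bp sequence

# Inr-style window: offsets -4 to +6 around the TSS give 11 bp.
inr_plus = window_sequence(chrom, tss=10, strand="+", left=-4, right=6)
inr_minus = window_sequence(chrom, tss=10, strand="-", left=-4, right=6)
print(inr_plus, inr_minus)

# The proximal-promoter window uses the same call with left=-150, right=50.
promoter_length = 50 - (-150) + 1
```

For minus-strand TSSs the window is mirrored before reverse complementing, so upstream sequence always appears first in the output.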

Discussion

This study has put forward a GRO-seq method for mapping engaged RNAP at a genome-wide level in primary plant tissue. The identification of nascent transcripts and definition of TSSs revealed distinct characteristics of Arabidopsis transcription and their connection to other eukaryotic systems. The lack of divergent transcription in Arabidopsis and likely maize contrasts with the notion that eukaryotic promoters are inherently divergent (41). Highly directional initiation of transcription was also observed in Drosophila (38). Notably, both Arabidopsis and Drosophila display strong core promoter signatures, suggesting a prominent role for the core promoter and its motifs in mediating transcriptional directionality. Arabidopsis core promoters were enriched for distinct Inr-like motifs and the TATA-box, which were found in 80% and 30% of promoters, respectively. The strong prevalence of these motifs may be due to developmental timing. On the other hand, despite commonly containing more than one copy of the TATA-box binding protein (TBP) gene, plants lack TBP-related factors (42). In bilateral symmetric animals, these factors were shown to support different transcription systems, enabling regulatory diversity through core promoter motif diversity (42, 43). Arabidopsis, however, encodes two additional eukaryotic RNAPs: RNAP IV and RNAP V, which are integral to the repression of a subset of genes and transposons through RNA-directed DNA methylation (44). These additional RNAPs may reflect a different evolutionary approach to increasing the regulatory diversity of the genome.

GRO-seq identified 9,200 transcripts in 6-d-old Arabidopsis seedlings, of which only 153 were noncoding transcripts generated by RNAP II. This number is considerably less than in humans (8, 17). Plants lack enhancer RNAs (eRNAs) but notably also the NEGATIVE ELONGATION FACTOR (NELF) involved in promoter-proximal pausing (45). eRNAs were reported to mediate release of NELF-dependent pausing (46). Therefore, given the absence of NELF, potential eRNAs may not have provided the same selective advantages in plants. In contrast, however, Zhu et al. (25) predicted over 10,000 plant enhancers based on chromatin signatures in leaves and flowers. Without tissue-matched GRO-seq data for these predicted enhancers or targeted disruption, it is difficult to validate their in vivo role or potential for enhancer transcription. Cell-type specific nuclei obtained using isolation of nuclei tagged in specific cell types (INTACT) (47) or nuclear-localized reporters in combination with fluorescence-activated cell sorting may clarify these results.

Capturing nascent transcripts enabled characterization of basic features. Absence of promoter-proximal pausing, together with a high correlation between nascent and steady-state transcript levels, argues that Arabidopsis transcription is predominantly regulated at the level of initiation. Li et al. (35) reported transcription to be the most regulated step in human gene regulation. In this light, transcription initiation may be the major step of gene regulation in plants.

RNAP pausing, a major regulator of transcription elongation in mammals (48), was observed predominantly downstream of the PAS in Arabidopsis and maize. The underlying mechanism is unknown, but 3′ pausing is likely a common feature of plant transcription. Previous in vitro yeast work has proposed that increased pausing downstream of the polyA signal results in increased surveillance time for the mRNA and therefore a higher chance of degradation (39). This idea may hold true in plants based on the higher GRO-seq signal compared with RNA-seq for longer Arabidopsis genes, which also show higher amounts of 3′ pausing compared with shorter genes. In addition, DNA methylation was shown to slow down transcription elongation (49), and yet the exact role of gene body methylation in plants is still unclear.

In summary, we have described a method for the analysis of nascent transcripts in primary tissue and provide a high-resolution map of Arabidopsis transcripts. GRO-seq opens up avenues to study transcriptional regulation or responses to stimuli at a specific moment in time. We envision that this technical advance will facilitate a better understanding of gene regulation in plants but also eukaryotic transcription in general.

Materials and Methods

A. thaliana (Col-0) 6-d-old seedlings were grown on half Linsmaier and Skoog (LS) medium with 24 h of light at 22 °C. Tissue was mechanically homogenized and nuclei purified by centrifugation. GRO-seq and 5′ GRO-seq are described in detail in SI Materials and Methods. Briefly, 5 × 10^6 nuclei were run on and DNase/proteinase treated, and RNA was extracted using TRIzol. RNA was digested with Terminator 5′–Phosphate-Dependent Exonuclease (Epicentre) before fragmentation. Nascent RNA was enriched twice for 5-bromo-UTP (BrUTP) by immunoprecipitation. After end repair, RNA 5′ pyrophosphohydrolase (RppH) was used for decapping and the library was prepared using the NEB Next Small RNA Library Prep Set. Data were analyzed using HOMERplants, accessible at homer.ucsd.edu/homer/plants/.

Supplementary Material

Supplementary File
Supplementary File
pnas.1603217113.sd02.xlsx (22.5KB, xlsx)

Acknowledgments

We thank J. Kadonaga, C. Glass, and Ira Schildkraut for feedback and suggestions. The work was supported by J.C.’s Howard Hughes Medical Institute funding and NIH Grants R01GM094428 and R01GM52413. J.H. was supported by NIH Grant T32GM007240, the Rose Hills Foundation, and the H. A. and Mary K. Chapman Charitable Trust. S.H.D. is the recipient of the University of California at San Diego Molecular Biology/Cancer Center Fellowship and a CRI-Irvington Fellow.

Footnotes

The authors declare no conflict of interest.

Data deposition: The data reported in this paper have been deposited in the Gene Expression Omnibus (GEO) database, www.ncbi.nlm.nih.gov/geo (accession no. GSE83108).

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1603217113/-/DCSupplemental.

References

  • 1. Smale ST, Kadonaga JT. The RNA polymerase II core promoter. Annu Rev Biochem. 2003;72:449–479. doi: 10.1146/annurev.biochem.72.121801.161520.
  • 2. Dvir A, Conaway JW, Conaway RC. Mechanism of transcription initiation and promoter escape by RNA polymerase II. Curr Opin Genet Dev. 2001;11(2):209–214. doi: 10.1016/s0959-437x(00)00181-7.
  • 3. Danino YM, Even D, Ideses D, Juven-Gershon T. The core promoter: At the heart of gene expression. Biochim Biophys Acta. 2015;1849(8):1116–1131. doi: 10.1016/j.bbagrm.2015.04.003.
  • 4. Molina C, Grotewold E. Genome wide analysis of Arabidopsis core promoters. BMC Genomics. 2005;6:25. doi: 10.1186/1471-2164-6-25.
  • 5. Yamamoto YY, et al. Heterogeneity of Arabidopsis core promoters revealed by high-density TSS analysis. Plant J. 2009;60(2):350–362. doi: 10.1111/j.1365-313X.2009.03958.x.
  • 6. Morton T, et al. Paired-end analysis of transcription start sites in Arabidopsis reveals plant-specific promoter signatures. Plant Cell. 2014;26(7):2746–2760. doi: 10.1105/tpc.114.125617.
  • 7. Mejía-Guerra MK, et al. Core promoter plasticity between maize tissues and genotypes contrasts with predominance of sharp transcription initiation sites. Plant Cell. 2015;27(12):3309–3320. doi: 10.1105/tpc.15.00630.
  • 8. Core LJ, Waterfall JJ, Lis JT. Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science. 2008;322(5909):1845–1848. doi: 10.1126/science.1162228.
  • 9. Kwak H, Fuda NJ, Core LJ, Lis JT. Precise maps of RNA polymerase reveal how promoters direct initiation and pausing. Science. 2013;339(6122):950–953. doi: 10.1126/science.1229386.
  • 10. Churchman LS, Weissman JS. Native elongating transcript sequencing (NET-seq). Curr Protoc Mol Biol. 2012;Chapter 4:Unit 4.14.1–4.14.17. doi: 10.1002/0471142727.mb0414s98.
  • 11. Seila AC, Core LJ, Lis JT, Sharp PA. Divergent transcription: A new feature of active promoters. Cell Cycle. 2009;8(16):2557–2564. doi: 10.4161/cc.8.16.9305.
  • 12. Erhard KF, Jr, Talbot JE, Deans NC, McClish AE, Hollick JB. Nascent transcription affected by RNA polymerase IV in Zea mays. Genetics. 2015;199(4):1107–1125. doi: 10.1534/genetics.115.174714.
  • 13. Hawley DK, Roeder RG. Separation and partial characterization of three functional steps in transcription initiation by human RNA polymerase II. J Biol Chem. 1985;260(13):8163–8172.
  • 14. Rougvie AE, Lis JT. The RNA polymerase II molecule at the 5′ end of the uninduced hsp70 gene of D. melanogaster is transcriptionally engaged. Cell. 1988;54(6):795–804. doi: 10.1016/s0092-8674(88)91087-2.
  • 15. Heinz S, et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell. 2010;38(4):576–589. doi: 10.1016/j.molcel.2010.05.004.
  • 16. Lam MT, et al. Rev-Erbs repress macrophage gene expression by inhibiting enhancer-directed transcription. Nature. 2013;498(7455):511–515. doi: 10.1038/nature12209.
  • 17. Djebali S, et al. Landscape of transcription in human cells. Nature. 2012;489(7414):101–108. doi: 10.1038/nature11233.
  • 18. Sullivan AM, et al. Mapping and dynamics of regulatory DNA and transcription factor networks in A. thaliana. Cell Reports. 2014;8(6):2015–2030. doi: 10.1016/j.celrep.2014.08.019.
  • 19. Pautler M, et al. FASCIATED EAR4 encodes a bZIP transcription factor that regulates shoot meristem size in maize. Plant Cell. 2015;27(1):104–120. doi: 10.1105/tpc.114.132506.
  • 20. Bernstein BE, et al. The NIH Roadmap Epigenomics Mapping Consortium. Nat Biotechnol. 2010;28(10):1045–1048. doi: 10.1038/nbt1010-1045.
  • 21. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489(7414):57–74. doi: 10.1038/nature11247.
  • 22. Lister R, et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature. 2009;462(7271):315–322. doi: 10.1038/nature08514.
  • 23. Luo C, et al. Integrative analysis of chromatin states in Arabidopsis identified potential regulatory mechanisms for natural antisense transcript production. Plant J. 2013;73(1):77–90. doi: 10.1111/tpj.12017.
  • 24. Wang X, et al. Genome-wide and organ-specific landscapes of epigenetic modifications and their relationships to mRNA and small RNA transcriptomes in maize. Plant Cell. 2009;21(4):1053–1069. doi: 10.1105/tpc.109.065714.
  • 25. Zhu B, Zhang W, Zhang T, Liu B, Jiang J. Genome-wide prediction and validation of intergenic enhancers in Arabidopsis using open chromatin signatures. Plant Cell. 2015;27(9):2415–2426. doi: 10.1105/tpc.15.00537.
  • 26. Liang G, He H, Yu D. Identification of nitrogen starvation-responsive microRNAs in Arabidopsis thaliana. PLoS One. 2012;7(11):e48951. doi: 10.1371/journal.pone.0048951.
  • 27. He Y, Vogelstein B, Velculescu VE, Papadopoulos N, Kinzler KW. The antisense transcriptomes of human cells. Science. 2008;322(5909):1855–1857. doi: 10.1126/science.1163853.
  • 28. Duttke SH, et al. Human promoters are intrinsically directional. Mol Cell. 2015;57(4):674–684. doi: 10.1016/j.molcel.2014.12.029.
  • 29. Swiezewski S, Liu F, Magusin A, Dean C. Cold-induced silencing by long antisense transcripts of an Arabidopsis Polycomb target. Nature. 2009;462(7274):799–802. doi: 10.1038/nature08618.
  • 30. Costas C, et al. Genome-wide mapping of Arabidopsis thaliana origins of DNA replication and their associated epigenetic marks. Nat Struct Mol Biol. 2011;18(3):395–400. doi: 10.1038/nsmb.1988.
  • 31. Xu L, et al. Di- and tri- but not monomethylation on histone H3 lysine 36 marks active transcription of genes involved in flowering time regulation and other processes in Arabidopsis thaliana. Mol Cell Biol. 2008;28(4):1348–1360. doi: 10.1128/MCB.01607-07.
  • 32. Juven-Gershon T, Kadonaga JT. Regulation of gene expression via the core promoter and the basal transcriptional machinery. Dev Biol. 2010;339(2):225–229. doi: 10.1016/j.ydbio.2009.08.009.
  • 33. Zuo YC, Li QZ. Identification of TATA and TATA-less promoters in plant genomes by integrating diversity measure, GC-Skew and DNA geometric flexibility. Genomics. 2011;97(2):112–120. doi: 10.1016/j.ygeno.2010.11.002.
  • 34. Carretero-Paulet L, et al. Genome-wide classification and evolutionary analysis of the bHLH family of transcription factors in Arabidopsis, poplar, rice, moss, and algae. Plant Physiol. 2010;153(3):1398–1412. doi: 10.1104/pp.110.153593.
  • 35. Li S. The Arabidopsis thaliana TCP transcription factors: A broadening horizon beyond development. Plant Signal Behav. 2015;10(7):e1044192. doi: 10.1080/15592324.2015.1044192.
  • 36. Weirauch MT, et al. Determination and inference of eukaryotic transcription factor sequence specificity. Cell. 2014;158(6):1431–1443. doi: 10.1016/j.cell.2014.08.009.
  • 37. de Boer CG, Hughes TR. YeTFaSCo: A database of evaluated yeast transcription factor sequence specificities. Nucleic Acids Res. 2012;40(Database issue):D169–D179. doi: 10.1093/nar/gkr993.
  • 38. Nechaev S, et al. Global analysis of short RNAs reveals widespread promoter-proximal stalling and arrest of Pol II in Drosophila. Science. 2010;327(5963):335–338. doi: 10.1126/science.1181421.
  • 39. Anamika K, Gyenis À, Poidevin L, Poch O, Tora L. RNA polymerase II pausing downstream of core histone genes is different from genes producing polyadenylated transcripts. PLoS One. 2012;7(6):e38769. doi: 10.1371/journal.pone.0038769.
  • 40. Chodavarapu RK, et al. Relationship between nucleosome positioning and DNA methylation. Nature. 2010;466(7304):388–392. doi: 10.1038/nature09147.
  • 41. Preker P, et al. RNA exosome depletion reveals transcription upstream of active human promoters. Science. 2008;322(5909):1851–1854. doi: 10.1126/science.1164096.
  • 42. Duttke SH, Doolittle RF, Wang YL, Kadonaga JT. TRF2 and the evolution of the bilateria. Genes Dev. 2014;28(19):2071–2076. doi: 10.1101/gad.250563.114.
  • 43. Duttke SH. Evolution and diversification of the basal transcription machinery. Trends Biochem Sci. 2015;40(3):127–129. doi: 10.1016/j.tibs.2015.01.005.
  • 44. Law JA, Jacobsen SE. Establishing, maintaining and modifying DNA methylation patterns in plants and animals. Nat Rev Genet. 2010;11(3):204–220. doi: 10.1038/nrg2719.
  • 45. Wu CH, et al. NELF and DSIF cause promoter proximal pausing on the hsp70 promoter in Drosophila. Genes Dev. 2003;17(11):1402–1414. doi: 10.1101/gad.1091403.
  • 46. Schaukowitch K, et al. Enhancer RNA facilitates NELF release from immediate early genes. Mol Cell. 2014;56(1):29–42. doi: 10.1016/j.molcel.2014.08.023.
  • 47. Deal RB, Henikoff S. The INTACT method for cell type-specific gene expression and chromatin profiling in Arabidopsis thaliana. Nat Protoc. 2011;6(1):56–68. doi: 10.1038/nprot.2010.175.
  • 48. Jonkers I, Lis JT. Getting up to speed with transcription elongation by RNA polymerase II. Nat Rev Mol Cell Biol. 2015;16(3):167–177. doi: 10.1038/nrm3953.
  • 49. Rountree MR, Selker EU. DNA methylation inhibits elongation but not initiation of transcription in Neurospora crassa. Genes Dev. 1997;11(18):2383–2395. doi: 10.1101/gad.11.18.2383.
  • 50. Lei M, et al. Regulatory link between DNA methylation and active demethylation in Arabidopsis. Proc Natl Acad Sci USA. 2015;112(11):3553–3557. doi: 10.1073/pnas.1502279112.
  • 51. Dobin A, et al. STAR: Ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29(1):15–21. doi: 10.1093/bioinformatics/bts635.
  • 52. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357–359. doi: 10.1038/nmeth.1923.
  • 53. Robinson JT, et al. Integrative genomics viewer. Nat Biotechnol. 2011;29(1):24–26. doi: 10.1038/nbt.1754.
  • 54. Wang D, et al. Reprogramming transcription by distinct classes of enhancers functionally defined by eRNA. Nature. 2011;474(7351):390–394. doi: 10.1038/nature10006.


Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
