Abstract
High-throughput detection of nascent RNA is critical for studies of transcription and much more challenging than that of mRNA. Recently, several massively parallel nascent RNA sequencing methods were established in eukaryotic cells. Here, we systematically compared 3 classes of methods on the same pure or crude nuclei preparations: GRO-seq to sequence nuclear run-on RNAs, pNET-seq to sequence RNA polymerase II-associated RNAs, and CB RNA-seq to sequence chromatin-bound (CB) RNAs in Arabidopsis (Arabidopsis thaliana). To improve the resolution of CB RNAs, 3′CB RNA-seq was established to sequence the 3′ ends of CB RNAs. In addition, we modified pNET-seq to establish the Chromatin Native Elongation Transcript sequencing (ChrNET) method using chromatin as the starting material for RNA immunoprecipitation. We analyzed reproducibility, sensitivity, and accuracy in detecting nascent transcripts, as well as experimental procedures and costs, which revealed the strengths and weaknesses of each method. We found that pNET and GRO methods best detected active RNA polymerase II. CB RNA-seq is a simple and cost-effective alternative for nascent RNA studies because it correlates highly with pNET-seq and GRO-seq. Compared with pNET, ChrNET has higher specificity for nascent RNA capture and lower sequencing cost. 3′CB is sensitive to transcription-coupled splicing. Using these methods, we identified 1,404 unknown transcripts, 4,482 unannotated splicing events, and 60 potential recursive splicing events. This comprehensive comparison of different nascent/chromatin RNA sequencing methods highlights the strengths of each method and serves as a guide for researchers aiming to select a method that best meets their study goals.
The strengths and weaknesses of 7 nascent RNA sequencing methods were experimentally and systematically compared in Arabidopsis.
IN A NUTSHELL.
Background: mRNA serves as an intermediate molecule that transfers genetic information from DNA to functional proteins. The level of mRNA expression in a eukaryotic cell, which is determined by processes such as synthesis, processing, and turnover, greatly influences its fate and function. To investigate how a genome produces mRNA through transcription, various methods have been established to directly examine nascent RNA products: (i) native elongation transcripts (NETs); (ii) global nuclear run-on (GRO); and (iii) chromatin-bound RNA as a proxy for nascent RNA (CB). These methods have been used by different labs under various conditions with different pretreatments and downstream pipelines. Consequently, the strengths and weaknesses of each method remain unclear. This motivated us to conduct a fair comparison using the same plant tissue, pretreatment, and downstream pipeline whenever possible.
Question: How accurately do these methods reflect transcription? Which method is most suitable for a specific scenario, considering the experimental objectives, feasibility, and budgetary constraints?
Findings: Our findings revealed that the NET and GRO methods performed best when detecting active transcription. CB RNA-seq emerged as a simple yet cost-effective alternative for studying nascent RNA. As an updated version of NET, Chromatin Native Elongation Transcript sequencing demonstrated higher specificity in capturing nascent RNA, while also reducing sequencing costs. Additionally, 3′CB proved sensitive to transcription-coupled splicing.
Next steps: Moving forward, we plan to leverage the advantages offered by different methods to explore how plants respond at the transcriptional level under external (environmental) and internal (developmental) changes.
Introduction
The process from DNA to mRNA in the central dogma includes the synthesis of nascent RNA by RNA polymerase using DNA as a template and the cotranscriptional and posttranscriptional processing of mRNA precursors. Mature RNA sequencing (RNA-seq) methods target mRNA, the final product of transcription and processing. However, these methods cannot distinguish transcription from processing, nor can they resolve the dynamics of each step of transcription. Moreover, RNA-seq is not sensitive enough to detect unstable mRNA and noncoding RNA (ncRNA), such as enhancer RNA (eRNA). Recently, sequencing-based nascent RNA studies have provided powerful tools for detecting RNA synthesis and unstable transcripts.
According to the strategy of nascent/CB RNA isolation, sequencing-based methods can be classified into 4 categories: (i) enrichment of RNA polymerase II (Pol II)-associated RNAs by immunoprecipitating Pol II fused with an epitope tag or using an antibody specific for the carboxyl-terminal domain (CTD) of the nuclear DNA-dependent RNA Pol II largest subunit (NRPB1) (e.g. NET-seq and mNET-seq) (Churchman and Weissman 2011; Nojima et al. 2015); (ii) capture of RNAs being synthesized by in vitro nuclear run-on or chromatin run-on with labeled nucleoside triphosphates (e.g. GRO-seq, PRO-seq, ChRO-seq, and fastGRO [Core et al. 2008; Kwak et al. 2013; Chu et al. 2018; Barbieri et al. 2020]); (iii) enrichment of CB RNA from nuclear RNA by stringent washing (e.g. 3′NT and human NET-seq) (Weber et al. 2014; Mayer et al. 2015); and (iv) in vivo metabolic labeling and affinity purification of nascent RNA (e.g. 4sU and TT-seq) (Fuchs et al. 2014; Schwalb et al. 2016).
Most of these methods were established in yeast (Saccharomyces cerevisiae) and animal cells and then applied to plant tissues. For example, GRO-seq or PRO-seq has been used to measure transcription of protein-coding genes or intergenic regions in Arabidopsis (Arabidopsis thaliana), maize (Zea mays), wheat (Triticum aestivum), and cassava (Manihot esculenta) (Erhard et al. 2015; Hetzel et al. 2016; Liu et al. 2018; Zhu et al. 2018; Liu et al. 2021; Lozano et al. 2021; Xie et al. 2022); pNET-seq (immunoprecipitation of NRPB1 with specific CTD posttranslational modifications) (Zhu et al. 2018; Xie et al. 2022) and plaNET-seq (plaNET, immunoprecipitation of the second largest Pol II subunit, NRPB2) (Kindgren et al. 2020) were adopted to characterize Pol II-mediated transcription in Arabidopsis and wheat; CB RNA-seq (Li et al. 2020; Zhu et al. 2020; Zhu et al. 2021) and FLEP-seq (full-length elongating and polyadenylated RNA-seq) (Long et al. 2021; Mo et al. 2021) were used to profile cotranscriptional pre-mRNA processing and termination; and RNA stability was determined by pulse-chase metabolic labeling of RNA with 5-ethynyl uridine (5-EU) in Arabidopsis (Szabo et al. 2020).
Most of the above methods are capable of identifying changes in gene transcriptional activity. Yao et al. (2022) recently reported a systematic comparison of nascent RNA and RNA-seq methods using published data from several mammalian cell lines and concluded that methods for detecting transcription start sites (TSS), such as GRO/PRO-cap, are the most sensitive for detecting eRNA. However, it is unclear how these methods differ in sensitivity, accuracy, affordability, and experimental feasibility in detecting different steps of transcriptional and cotranscriptional processes. Moreover, such comparisons are greatly influenced by variables such as starting materials, preprocessing (nuclei isolation), and downstream sequencing library construction protocols. Here, we provide experiment-based benchmarking data on various nascent/CB RNA capture principles to help researchers make well-informed decisions that balance experimental objectives, feasibility, and budgetary constraints.
Results
Overall experimental design
To compare the advantages and disadvantages of different methods, we selected GRO-seq (sequencing of nuclear run-on RNAs, GRO), pNET-seq (sequencing of NRPB1-associated RNAs, pNET), and CB RNA-seq (sequencing of CB RNAs, CB) as representative methods for different nascent/CB RNA isolation and/or detection principles (Fig. 1A). In addition, we developed a 3′CB RNA-seq protocol to sequence the 3′ ends of CB RNAs (3′CB, also named 3′ end of nascent transcripts, i.e. 3′NT [Weber et al. 2014]), potentially providing better resolution (single nucleotide) than CB (150 to 300 nt; Supplemental Figs. S1A and S2A). Given that the Pol II complex binds tightly to chromatin, we also used chromatin as the starting material for Pol II-associated RNA immunoprecipitation and named this new protocol Chromatin Native Elongation Transcript sequencing (ChrNET). ChrNET shares the chromatin isolation step of the CB method, in which a stringent urea wash removes the majority of non-CB proteins/RNAs (Mayer et al. 2015; Zhu et al. 2020), thus potentially reducing contaminating RNA species such as rRNA and tRNA.
Figure 1.
Experimental setup and protocol comparison of nascent/CB RNA sequencing approaches included in this study. A) Schemes of nuclei preparation. Nuclei were prepared from 10-d-old Arabidopsis seedlings. Pure nuclei were fractionated using a Percoll gradient, while crude nuclei were directly precipitated from tissue lysate. B) Schemes of nascent/CB RNA capture. Pure nuclei were allocated to GRO_pure and pNET_pure, and crude nuclei were allocated to GRO_crude and pNET_crude. Chromatin extracted by further stringent wash of nuclei was allocated to 3′CB, CB, and ChrNET. For GRO, nuclei were supplemented with BrUTP, and nascent RNAs were labeled during run-on and affinity purified using anti-BrU antibody. For pNET/ChrNET, the Pol II-RNA-DNA complex was solubilized by MNase digestion from nuclei (pNET_pure and pNET_crude) or chromatin (ChrNET) and then immunoprecipitated using anti-NRPB1 CTD antibody. Pol II-associated RNA was recovered from the complex. C) Summary of critical steps of each method. For 3′CB, GRO, and pNET, RNA was converted to cDNA following a small RNA cloning protocol. For CB, RNA was converted to cDNA following a strand-specific RNA cloning protocol. pNET, plant native elongating transcript sequencing; ChrNET, Chromatin Native Elongation Transcript sequencing; GRO, global nuclear run-on sequencing; 3′CB, 3′ end of chromatin-bound RNA sequencing; CB, chromatin-bound RNA sequencing; BrUTP, 5-bromouridine 5′-triphosphate sodium salt; Pol II CTD, carboxyl-terminal domain of the largest subunit of Pol II, NRPB1.
An overview of the experiments is presented in Fig. 1. GRO, pNET, 3′CB, and CB all start with nuclei preparation (Fig. 1A), followed by nascent/CB RNA isolation (Fig. 1B) and cDNA library construction (Fig. 1C). To minimize variation in nuclei preparation and assess its effect, nuclei were uniformly isolated using either a pure protocol with tissue lysates subjected to a Percoll gradient fractionation or a crude protocol in which tissue lysates were simply pelleted (Fig. 1A). Thus, we designate the GRO and pNET experiments using pure and crude nuclei as GRO_pure, GRO_crude, pNET_pure, and pNET_crude. The chromatin preparation obtained from the crude nuclei was employed for ChrNET. Meanwhile, rRNA-depleted nuclear RNA (Nuc), polyA-enriched nuclear RNA (Nuc_pA), rRNA-depleted total RNA (total), and polyA-enriched total RNA (total_pA) were isolated and sequenced from nuclei and seedling tissues for reference, respectively.
Modified small RNA library preparation protocols for GRO, pNET, ChrNET, and 3′CB involved sequential ligation of 3′ and 5′ adapters to the nascent/CB RNA, followed by reverse transcription and PCR amplification (Supplemental Fig. S2A). CB, Nuc, total, Nuc_pA, and total_pA libraries were prepared using a widely adopted strand-specific RNA-seq protocol, which involves RNA fragmentation, first- and second-strand cDNA synthesis, double-stranded adapter ligation, first-strand cDNA digestion, and PCR amplification (Supplemental Fig. S2A). Notably, with the small RNA library construction protocol, the potential RNA polymerase position was retained at either the 5′ end (GRO) or the 3′ end (pNET, ChrNET, and 3′CB) of the cDNA, whereas for CB, reads from CB RNA fragments could cover the whole gene (Supplemental Fig. S2A). Three technical replicates were performed for all nascent/CB RNA profiling experiments. Collectively, these measures were intended to minimize technical variation other than that inherent to the methods themselves.
However, in a pilot experiment, over 90% of the 3′CB reads were from short, highly abundant ncRNAs, such as small nucleolar RNAs (snoRNAs) and small nuclear RNAs (snRNAs), which are known to be involved in RNA processing and attached to chromatin. We therefore added a step to remove the 150 most abundant RNAs via RNase H digestion (see Materials and methods; Supplemental Fig. S1 and Data Set 1).
Each library was sequenced to considerable depth: 39,083,030 to 148,029,794 raw reads were obtained for each GRO, pNET, ChrNET, or 3′CB replicate library, and 16,164,752 to 89,923,849 raw reads for each CB, Nuc, Nuc_pA, total, or total_pA library (Supplemental Data Set 2). After raw reads were cleaned and reads from highly abundant ncRNAs (rRNA, tRNA, snoRNA, and snRNA) and plastid RNA were removed (Supplemental Fig. S3A), 6,694,363 to 31,089,496 reads were retained per library. The yield (ratio of retained reads to clean reads; Supplemental Fig. S3B) varied widely: 14.51% ± 0.41% (pNET_pure), 19.09% ± 0.19% (pNET_crude), 46.89% ± 5.84% (ChrNET), 12.62% ± 3.91% (GRO_pure), 24.07% ± 3.31% (GRO_crude), 18.56% ± 1.05% (3′CB), 57.02% ± 2.59% (CB), 28.93% (Nuc), 82.00% (Nuc_pA), 19.10% (total), and 87.07% (total_pA). Therefore, in terms of sequencing cost, CB/ChrNET and GRO_pure are the most and least cost-effective nascent/CB RNA-seq methods, respectively. Notably, rRNA and polyA RNA depletion steps were included in the CB and 3′CB protocols. As expected, a larger fraction of reads mapped to introns at the nascent/CB RNA level than at the mature RNA level (Supplemental Fig. S2B).
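Because yield differs several-fold between methods, so does the sequencing needed to reach a given retained depth. The short Python sketch below converts the mean yields reported above into the number of clean reads required for about 10 M retained reads, the saturation depth estimated later in this section. It assumes that yield is constant with depth and ignores losses before read cleaning and unique mapping; `clean_reads_needed` is a hypothetical helper, not part of the authors' pipeline.

```python
# Mean retained-read yields (retained / clean reads) reported above, as fractions.
YIELDS = {
    "pNET_pure": 0.1451,
    "pNET_crude": 0.1909,
    "ChrNET": 0.4689,
    "GRO_pure": 0.1262,
    "GRO_crude": 0.2407,
    "3'CB": 0.1856,
    "CB": 0.5702,
}


def clean_reads_needed(target_retained: float, yield_fraction: float) -> float:
    """Clean reads needed so that yield_fraction * clean_reads == target_retained.

    Assumes the yield does not change with sequencing depth (hypothetical helper).
    """
    if not 0 < yield_fraction <= 1:
        raise ValueError("yield_fraction must be in (0, 1]")
    return target_retained / yield_fraction


if __name__ == "__main__":
    target = 10_000_000  # about 10 M retained reads
    # Print methods from lowest to highest yield (most to least sequencing needed).
    for method, y in sorted(YIELDS.items(), key=lambda kv: kv[1]):
        print(f"{method:>10}: {clean_reads_needed(target, y) / 1e6:6.1f} M clean reads")
```

Under these assumptions, GRO_pure needs roughly 4.5 times as many clean reads as CB to reach the same retained depth, which is the sense in which CB/ChrNET are the most cost-effective.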
Negative control experiments demonstrated an enrichment for nascent RNA
To estimate the background of nuclear run-on RNA and Pol II-associated RNA capture, we also performed 3 negative control (NC) experiments with crude nuclei: GRO_NC, pNET_NC, and ChrNET_NC. In GRO_NC, the nuclear run-on reaction was carried out with unmodified UTP, whereas for pNET_NC and ChrNET_NC, immunoprecipitation was performed without anti-Pol II antibody (see Materials and methods). The Pol II complex was immunoprecipitated in pNET_crude and ChrNET, but not in pNET_NC and ChrNET_NC (Supplemental Fig. S4A). After data cleaning and processing, only 1.97% and 2.47% of reads were retained for the pNET_NC and ChrNET_NC libraries, respectively (Supplemental Data Set 2). The correlation between NC replicates was lower than that between pNET, ChrNET, and GRO replicates (Supplemental Fig. S4B), and only trace signals were detected on active genes in the NCs (Supplemental Fig. S4C).
Compared with pNET, ChrNET, and GRO, the NCs had a greater percentage of reads originating from rRNA and plastid RNA and a lower intron/exon ratio (Supplemental Figs. S2B and S3B). By comparison with the NCs, we calculated the relative enrichment fold of retained reads (mainly Pol II products) and of rRNA-derived and plastid-derived reads. Pol II products were significantly overrepresented in the pNET_crude and ChrNET libraries, whereas rRNA and plastid RNA were underrepresented (Supplemental Fig. S3, C and D; Supplemental Data Set 6). The relative enrichment folds for pNET_crude and ChrNET were approximately 10 and 20, respectively. In addition, a portion of the retained reads from the NCs is likely derived from nascent RNA.
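If the relative enrichment fold is taken to be the ratio of the retained-read percentage in an antibody library to that in its matching NC (an assumption; the calculation behind Supplemental Data Set 6 may differ, for example by working per replicate), the mean percentages reported in this section reproduce the approximate 10-fold and 20-fold values. A minimal Python sketch with a hypothetical helper name:

```python
def enrichment_fold(sample_pct: float, nc_pct: float) -> float:
    """Relative enrichment of a read category in a library over its negative control.

    Assumed definition: percentage in the sample divided by percentage in the NC.
    Values > 1 mean overrepresentation; values < 1 mean underrepresentation.
    """
    if nc_pct <= 0:
        raise ValueError("control percentage must be positive")
    return sample_pct / nc_pct


# Retained-read percentages reported above (mainly Pol II products).
pnet_fold = enrichment_fold(19.09, 1.97)    # pNET_crude vs pNET_NC
chrnet_fold = enrichment_fold(46.89, 2.47)  # ChrNET vs ChrNET_NC
print(f"pNET_crude: {pnet_fold:.1f}x, ChrNET: {chrnet_fold:.1f}x")
```

Applied to the rRNA- or plastid-derived read fractions instead, the same ratio falls below 1 when those species are underrepresented.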
Based on these analyses, we hypothesized that more than 90% to 95% of retained reads in pNET type libraries were derived from nascent RNA. Previously, we tested the specificity of anti-BrU purification with nonlabeled and BrU-labeled probes and found that only 0.4% of nonlabeled RNA was recovered by anti-BrU agarose (Zhu et al. 2018). We therefore deduced that more than 99% of the retained reads in GRO type libraries originated from BrU-labeled RNA, i.e. nascent RNA. Because CB libraries also contain chromatin-associated RNAs that are not nascent, we believe that a lower percentage of retained reads derive from nascent RNA for CB type methods.
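One way to read the 90% to 95% estimate is as a simple bound. Assume that the NC captures nonspecific background at the same rate as the antibody-containing reaction, and let E be the relative enrichment fold of retained reads. Then at most a fraction 1/E of the retained reads in the antibody library can be background. Because some NC retained reads are themselves likely nascent, the bound is conservative. This is a back-of-the-envelope illustration under that assumption, not the authors' stated derivation:

```latex
f_{\mathrm{nascent}} \;\ge\; 1 - \frac{1}{E}, \qquad
E \approx 10 \;\Rightarrow\; f_{\mathrm{nascent}} \ge 0.90 \;(\text{pNET\_crude}), \qquad
E \approx 20 \;\Rightarrow\; f_{\mathrm{nascent}} \ge 0.95 \;(\text{ChrNET})
```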
Reproducibility evaluation
To evaluate the reproducibility of each method, we calculated Pearson correlation coefficients and performed principal component analysis (PCA) based on the read distribution across 500-bp genome bins. Read counts were normalized by variance stabilizing transformation (VST) using the R package DESeq2 (Love et al. 2014). Both Pearson correlation and PCA showed high consistency among technical triplicates for each method (Fig. 2, A and B). The first 2 PCs explained approximately 54.26% of the variation, reflecting differences in library preparation and between nascent/CB RNA and nuclear/total RNA. Of the 7 methods, the GRO type and pNET type methods showed the highest correlation with each other, suggesting that these approaches, although based on different principles, were the most accurate for pinpointing transcription.
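The binning, correlation, and PCA steps can be sketched in Python with NumPy as follows. This is an illustration under stated assumptions: each read is reduced to a single 0-based position on one chromosome, windows are non-overlapping, and log2(CPM + 1) is used as a simple stand-in for DESeq2's VST, which it does not reproduce. The helper names and the simulated data are hypothetical.

```python
import numpy as np

BIN_SIZE = 500  # bp, as in the reproducibility analysis


def bin_counts(positions, chrom_length, bin_size=BIN_SIZE):
    """Count read positions (0-based, < chrom_length) in non-overlapping genome bins."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    idx = np.asarray(positions, dtype=np.int64) // bin_size
    return np.bincount(idx, minlength=n_bins)


def log_cpm(counts):
    """log2(CPM + 1) per library (rows); a stand-in for, not an equivalent of, VST."""
    counts = np.asarray(counts, dtype=float)
    cpm = counts / counts.sum(axis=1, keepdims=True) * 1e6
    return np.log2(cpm + 1)


def pca(matrix, n_components=2):
    """PCA of libraries (rows) via SVD of the column-centered matrix."""
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / (s**2).sum()
    return u[:, :n_components] * s[:n_components], explained[:n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    chrom_len = 100_000
    hot = rng.integers(0, chrom_len, size=50)  # simulated "active" loci
    libs = []
    for _ in range(3):  # three simulated technical replicates
        reads = np.concatenate([rng.choice(hot, 5_000), rng.integers(0, chrom_len, 1_000)])
        libs.append(bin_counts(reads, chrom_len))
    mat = log_cpm(np.vstack(libs))
    print(np.round(np.corrcoef(mat), 3))  # library-by-library Pearson r
    scores, explained = pca(mat)
    print(scores, explained)
```

With real data, the DESeq2 `vst` function in R would replace `log_cpm`, and the 500-bp windows would be tiled across every chromosome.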
Figure 2.
Reproducibility of different nascent/CB RNA sequencing methods. A) Heatmap of Pearson correlation coefficients for different methods based on read count of 500-bp sliding windows across the genome. Libraries are ordered by hierarchical clustering method. B) PCA based on read count of 500-bp sliding windows across the genome. Different methods are indicated by colored triangles. pNET type and GRO type methods are circled. For both Pearson correlation and PCA analysis, read counts were normalized by VST using R package “DESeq2”. pNET, plant native elongating transcript sequencing; ChrNET, Chromatin Native Elongation Transcript sequencing; GRO, global nuclear run-on sequencing; 3′CB, 3′ end of chromatin-bound RNA sequencing; CB, chromatin-bound RNA sequencing; Nuc, nuclear RNA sequencing; Nuc_pA, nuclear polyA RNA sequencing; total, total RNA sequencing; total_pA, total polyA RNA sequencing.
In general, a transcription cycle consists of several stages: Pol II recruitment, transcription initiation, promoter-proximal pausing, elongation, and termination (Wissink et al. 2019). By comparing nascent/CB RNA abundance across different regions of genes, transcriptional regulation at each stage can be inferred. To assess the nascent/CB RNA distribution at different transcriptional stages for each method, protein-coding genes longer than 500 bp were divided into 4 regions: R1 (−150 to +150 bp from TSS), representing the proximal promoter; R2 (+200 bp from TSS to −150 bp from polyadenylation site, PAS), representing the gene body; R3 (−150 to +50 bp from PAS), representing the polyadenylation region; and R4 (+50 to +1000 bp from PAS), representing the terminator (Supplemental Fig. S5A). Overall, hierarchical clustering of Pearson correlation coefficients for each region showed high repeatability across methods (Supplemental Fig. S5, B to E). For all methods, the gene body (R2) signal was the most reproducible within a method and the most consistent across methods, suggesting that these methods have similar efficacy in assessing transcriptional activity. In contrast, in the proximal promoter (R1), polyadenylation region (R3), and terminator (R4), 3′CB was less reproducible than pNET/ChrNET and GRO.
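Because the region boundaries are defined along the direction of transcription, converting them into genomic intervals requires flipping offsets for minus-strand genes. A minimal Python sketch, assuming 0-based coordinates with the TSS and PAS given as single positions, and returning (low, high) pairs without committing to a particular end-inclusivity convention; `gene_regions` is a hypothetical helper:

```python
from typing import Dict, Tuple


def gene_regions(tss: int, pas: int, strand: str) -> Dict[str, Tuple[int, int]]:
    """Genomic (low, high) intervals for regions R1-R4 of one protein-coding gene.

    Offsets follow the direction of transcription:
    R1 = TSS-150..TSS+150, R2 = TSS+200..PAS-150,
    R3 = PAS-150..PAS+50,  R4 = PAS+50..PAS+1000.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    sign = 1 if strand == "+" else -1
    if sign * (pas - tss) <= 500:
        raise ValueError("only genes longer than 500 bp are divided into regions")

    def at(anchor: int, offset: int) -> int:
        # Move `offset` bp downstream (positive) or upstream (negative) of `anchor`.
        return anchor + sign * offset

    spec = {
        "R1": ((tss, -150), (tss, 150)),  # proximal promoter
        "R2": ((tss, 200), (pas, -150)),  # gene body
        "R3": ((pas, -150), (pas, 50)),   # polyadenylation region
        "R4": ((pas, 50), (pas, 1000)),   # terminator
    }
    regions = {}
    for name, ((a1, o1), (a2, o2)) in spec.items():
        p1, p2 = at(a1, o1), at(a2, o2)
        regions[name] = (min(p1, p2), max(p1, p2))
    return regions
```

For a minus-strand gene, `gene_regions(3000, 1000, "-")` places R1 at (2850, 3150) and R4 at (0, 950), i.e. the terminator lies at lower coordinates than the PAS.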
In principle, each pNET, ChrNET, or GRO read marks the position of a polymerase complex, so the number of polymerase complexes detected in a gene body is a good measure of transcriptional activity. In contrast, most CB reads are derived from fragments of CB RNA: a long CB RNA molecule may be detected multiple times, whereas a short RNA molecule may be detected once or not at all (Supplemental Fig. S2A). Therefore, the correlation of CB reads in promoter, PAS, and terminator regions with other methods was relatively low (Supplemental Fig. S5, B, D, and E).
Detect gene activity
We then compared the sensitivity of different methods in measuring the transcriptional activity of protein-coding genes. We took 2 approaches to define actively transcribed genes: (i) the signal of an active gene should exceed a read density threshold defined by gene desert regions and (ii) the signal of an active gene should be significantly higher than that in the NC (see Materials and methods; Supplemental Fig. S6). The active genes identified by the 2 approaches largely overlapped for pNET_crude, ChrNET, and GRO_crude (Supplemental Fig. S6L). Thus, in the following analysis, we used the first strategy to define active genes, applying method-specific thresholds of transcripts per million (TPM) ranging from >0.5 to >3.5 (Fig. 3A) to minimize false positives arising from low-level transcriptional background noise.
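TPM normalization and the activity cutoff can be sketched as follows. The sketch assumes gene-body read counts and gene-body lengths as input and takes the threshold as a parameter; the method-specific thresholds (0.5 to 3.5) were derived from gene desert read density, which is not modeled here. The function names and example values are hypothetical.

```python
from typing import Dict, Set


def tpm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Transcripts per million from read counts and feature lengths in bp."""
    rpk = {g: counts[g] / (lengths_bp[g] / 1000) for g in counts}  # reads per kilobase
    scale = sum(rpk.values()) / 1e6  # per-million scaling factor
    return {g: (v / scale if scale else 0.0) for g, v in rpk.items()}


def active_genes(tpm_values: Dict[str, float], threshold: float) -> Set[str]:
    """Genes whose TPM exceeds a method-specific threshold."""
    return {g for g, v in tpm_values.items() if v > threshold}


if __name__ == "__main__":
    counts = {"geneA": 900, "geneB": 30, "geneC": 0}
    lengths = {"geneA": 3000, "geneB": 1500, "geneC": 2000}
    values = tpm(counts, lengths)
    print(values, active_genes(values, threshold=0.5))
```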
Figure 3.
Sensitivity and accuracy in detecting gene activity. A) The number of active protein-coding genes detected by different methods (circles on the diagonal), and the number of active protein-coding genes detected by either of the 2 methods together (off-diagonal circles). Thresholds for active transcription are indicated. B) Heatmaps showing the Pearson correlation of gene expression (log2 TPM) among 3 technical replicates for different methods. Genes were divided into 4 groups according to transcription levels. C to I) Boxplots showing the changes in the number of detected active genes (left) and Sd of gene activity (right) with increasing library size. Uniquely mapped retained reads from 3 technical replicates were combined. Gene activity was measured by calculating TPM from the gene body. At each sequencing depth (1 M to 20 M), 100 random samplings were conducted. The number of active genes in each sampling and the variation of TPM (represented by Sd) for each gene among the 100 samplings were calculated. To calculate the Sd, gene activity of each sampling was normalized to the maximum value. The line in the box represents the median value, and the upper whisker and lower whisker represent 75% and 25% of the data, respectively. pNET, plant native elongating transcript sequencing; ChrNET, Chromatin Native Elongation Transcript sequencing; GRO, global nuclear run-on sequencing; 3′CB, 3′ end of chromatin-bound RNA sequencing; CB, chromatin-bound RNA sequencing; Nuc, nuclear RNA sequencing; Nuc_pA, nuclear polyA RNA sequencing; total, total RNA sequencing; total_pA, total polyA RNA sequencing.
In terms of the number of actively transcribed genes detected, the most sensitive methods were pNET and ChrNET, followed by GRO, 3′CB, and CB (Fig. 3A). More active genes were detected at the nascent/CB RNA level than with rRNA-depleted or polyA-enriched nuclear and total RNA, because some nascent/CB RNAs are unstable or aborted transcription products that cannot be detected by Nuc or total RNA-seq. Notably, ChrNET was the most reproducible method for both highly and lowly expressed gene sets, and pNET_crude and pNET_pure, which performed similarly, outperformed the GRO and CB methods (Fig. 3B). Furthermore, the nuclei/chromatin preparation procedure had relatively little effect on the measurement of gene activity.
We then determined the minimal library size (million [M] reads) required for detecting active genes in Arabidopsis. For all methods, the number of active genes detected became saturated when the sequencing depth (uniquely mapped reads after removal of highly abundant ncRNAs) reached about 10 M (Fig. 3, C to I). Moreover, the Sd of transcriptional activity decreased as more uniquely aligned reads were sampled. When 9 M or more reads were sampled, the normalized median Sd was smaller than 0.05 for all methods (Fig. 3, C to I). Thus, for Arabidopsis, about 10 M retained reads should be enough for detecting actively transcribed genes, although more reads might be needed for species with larger genomes.
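The downsampling analysis can be sketched in Python using only the standard library. Assumptions: each uniquely mapped retained read has already been assigned to exactly one gene (`read_genes`), TPM is computed from these counts and gene lengths, and the per-gene Sd is the sample standard deviation of max-normalized TPM across samplings, summarized by its median. Function and parameter names are hypothetical.

```python
import random
from collections import Counter
from statistics import median, stdev


def saturation(read_genes, gene_lengths, depth, n_iter=100, threshold=0.5, seed=0):
    """Downsample reads to `depth` n_iter times (without replacement).

    read_genes: list with one gene ID per read; every ID must be in gene_lengths.
    Returns (number of active genes per sampling, median per-gene Sd of
    TPM normalized to the maximum of each sampling).
    """
    if not 1 <= depth <= len(read_genes):
        raise ValueError("depth must be between 1 and the number of reads")
    if n_iter < 2:
        raise ValueError("at least 2 samplings are needed to compute an Sd")
    if set(read_genes) - set(gene_lengths):
        raise ValueError("every read must be assigned to a gene in gene_lengths")
    rng = random.Random(seed)
    genes = list(gene_lengths)
    n_active, profiles = [], []
    for _ in range(n_iter):
        counts = Counter(rng.sample(read_genes, depth))
        rpk = {g: counts[g] / (gene_lengths[g] / 1000) for g in genes}
        scale = sum(rpk.values()) / 1e6
        tpm = {g: v / scale for g, v in rpk.items()}
        n_active.append(sum(v > threshold for v in tpm.values()))
        top = max(tpm.values())
        profiles.append([tpm[g] / top for g in genes])  # normalize to this sampling's max
    per_gene_sd = [stdev(values) for values in zip(*profiles)]
    return n_active, median(per_gene_sd)


if __name__ == "__main__":
    gen = random.Random(1)
    lengths = {f"g{i}": gen.randint(1_000, 5_000) for i in range(200)}
    weights = [gen.expovariate(1.0) for _ in lengths]
    reads = gen.choices(list(lengths), weights=weights, k=50_000)  # simulated library
    for depth in (1_000, 5_000, 20_000):
        n_active, sd = saturation(reads, lengths, depth, n_iter=20)
        print(depth, median(n_active), round(sd, 4))
```

On real data, the depth grid would span 1 M to 20 M reads with 100 samplings each, as in Fig. 3.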
Detect 5′ and 3′ stalling of Pol II
Pol II is known to stall within a short window after transcription initiation (5′ stalling) in animals and plants (Core et al. 2008; Zhu et al. 2018). Furthermore, Pol II accumulates in the region up to 250 bp downstream of the PAS, a plant-specific feature known as 3′ stalling (Hetzel et al. 2016; Zhu et al. 2018; Mo et al. 2021). Accordingly, 5′ stalling was observed with all methods, whereas 3′ stalling was prominent only with the pNET type and GRO type methods (Fig. 4, A and B). We adopted a stalling index (StI) to measure the accumulation of Pol II at the 5′ or 3′ end (Fig. 4C). Because CB detects long fragments of CB RNA, it is not suitable for stalling detection.
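The exact windows behind Fig. 4C are not restated in the text, so the following Python sketch only illustrates the general form of a stalling index: read density in a stalling window divided by read density in the gene body, with a pseudocount to avoid division by zero. The window choices in the example (R1 for 5′ stalling; the 250 bp downstream of the PAS for 3′ stalling), the pseudocount, and the function names are assumptions, and the significance test that yields the q value is omitted. The cutoff of 3 matches the one applied to 5′StI/3′StI in this study.

```python
def stalling_index(window_reads: int, window_len: int,
                   body_reads: int, body_len: int, pseudocount: float = 1.0) -> float:
    """Read density in a stalling window divided by read density in the gene body.

    Hypothetical form: ((window_reads + pc) / window_len) / ((body_reads + pc) / body_len).
    """
    if window_len <= 0 or body_len <= 0:
        raise ValueError("region lengths must be positive")
    window_density = (window_reads + pseudocount) / window_len
    body_density = (body_reads + pseudocount) / body_len
    return window_density / body_density


def is_stalled(sti: float, cutoff: float = 3.0) -> bool:
    """Apply the StI > 3 cutoff (the q-value filter is not modeled here)."""
    return sti > cutoff


if __name__ == "__main__":
    # 5' stalling: 120 reads in a 300-bp R1 window vs 200 reads in a 3,000-bp gene body.
    sti_5 = stalling_index(120, 300, 200, 3000)
    # 3' stalling: 40 reads in the 250 bp after the PAS vs the same gene body.
    sti_3 = stalling_index(40, 250, 200, 3000)
    print(round(sti_5, 2), is_stalled(sti_5), round(sti_3, 2), is_stalled(sti_3))
```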
Figure 4.
Sensitivity and accuracy in detecting 5′ and 3′ stalling. A) Meta-profiles of the read density around the TSS and PAS of different methods. Lines and shading represent the mean ± SEM for each bin. B) Screenshots showing one example of read distribution on a gene (AT1G09430). C) Definition of 5′ StI and 3′ StI. D) Venn diagram showing the number of genes with 5′ stalling detected by pNET type methods (upper), GRO type methods (middle), and 3 types of methods (3′CB, pNET type, and GRO type) (bottom). The percentage of genes with 5′ stalling for each method was listed in the bracket. E) Venn diagram showing the number of genes with 3′ stalling detected by pNET type methods (upper), GRO type methods (middle), and 3 types of methods (3′CB, pNET type, and GRO type) (bottom). The percentage of genes with 3′ stalling was listed in the bracket. F) Heatmap of Pearson correlation coefficients for the 5′StI (lower triangle) and 3′ StI (upper triangle) between methods. pNET, plant native elongating transcript sequencing; ChrNET, Chromatin Native Elongation Transcript sequencing; GRO, global nuclear run-on sequencing; 3′CB, 3′ end of chromatin-bound RNA-seq; CB, chromatin-bound RNA sequencing; Nuc, nuclear RNA sequencing; Nuc_pA, nuclear polyA RNA sequencing; total, total RNA sequencing; total_pA, total polyA RNA sequencing.
Among the 13,815 active genes longer than 1 kb, a large fraction showed 5′ or 3′ Pol II stalling (5′StI/3′StI > 3, q value <0.05) (Fig. 4, D and E). There were 8,287 (59.99%), 7,639 (55.29%), 9,968 (72.15%), 4,826 (34.93%), and 7,298 (52.83%) genes with 5′ stalling detected by pNET_pure, pNET_crude, ChrNET, GRO_pure, and GRO_crude, respectively. There were 6,489 (46.97%) and 4,139 (29.96%) genes with 5′ stalling detected by all 3 pNET type methods and both GRO type methods, respectively. The 5′ stalling detected by pNET or 3′CB may include arrested and/or terminating Pol II rather than only paused Pol II (Thomas et al. 2020). However, arrested or unstable/terminating Pol II does not run on, so the 5′ stalling detected by GRO is more likely to come from active, paused Pol II (Core et al. 2012).
For 3′ stalling, 4,326 (31.31%) and 2,917 (21.11%) genes were detected by all 3 pNET type methods and both GRO type methods, respectively. With 3′CB, 6,522 (47.21%) and 1,253 (9.07%) genes were detected with 5′ and 3′ Pol II stalling, respectively.
Pearson correlation analysis indicated that 5′ StI was reproducible among pNET_pure, pNET_crude, ChrNET, GRO_pure, and 3′CB, whereas 3′ StI showed a moderately positive correlation between pNET_pure/pNET_crude/ChrNET and GRO_pure (Fig. 4F). The poor correlation between GRO_crude and these 5 methods can be explained by the fact that the nuclear run-on was performed with a mixture of intact nuclei and nuclear fragments, in which Pol II states were not uniform. In addition, the Pol II Ser5P CTD antibody used in pNET preferentially recognizes Pol II stalled at the 5′ end of genes; thus, Ser5P pNET was expected to be more sensitive for 5′ stalling than pNET using antibodies that recognize other phosphorylated forms of the CTD.
Measure cotranscriptional splicing
Introns of eukaryotic mRNA undergo cotranscriptional and posttranscriptional splicing, which involves 2 transesterification steps: the first yields the 5′ exon with a free 3′ hydroxyl group, referred to as a splicing intermediate (SI), and a branched intron lariat joined to the 3′ exon; in the second, the 5′ and 3′ exons are ligated and the intron lariat is released (Fig. 5A). Because the spliceosome interacts with the Ser5P-modified NRPB1 CTD during cotranscriptional splicing (Nojima et al. 2018), SIs and intron lariats associate with Pol II or chromatin to some extent via the spliceosome–Pol II interaction. Hence, we expected to detect cotranscriptional SIs and splicing products by Ser5P pNET/ChrNET and 3′CB.
Figure 5.
Detection of cotranscriptional splicing. A) Scheme of the splicing process. B) Meta-profiles of read density of the nascent/CB RNA-seq methods along exon–intron junctions. C) Snapshots of the gene AT1G14000 showing SI detected by 3′CB, pNET_pure, pNET_crude, and ChrNET and lariat signal detected by 3′CB. D) Scheme showing the pipeline to predict 5′SSs. E) Venn diagram showing the number of 5′SSs predicted by 3′CB and their overlap with detected (by CB and/or Nuc and/or total) and annotated (AtRTD3) 5′SSs. Only the GU-AG splicing rule was considered. F) Boxplots showing the correlation of splicing efficiency and SI index in 3′CB. G) Boxplots showing the correlation of splicing efficiency and Lariat index in 3′CB. The line in the box represents the median value, and the upper whisker and lower whisker represent 75% and 25% of the data, respectively. pNET, plant native elongating transcript sequencing; ChrNET, Chromatin Native Elongation Transcript sequencing; GRO, global nuclear run-on sequencing; 3′CB, 3′ end of chromatin-bound RNA sequencing; CB, chromatin-bound RNA sequencing; 5′SS, 5′ splice site; 3′SS, 3′ splice site.
Consistent with our previous study (Zhu et al. 2018), Ser5P pNET/ChrNET could detect SI (last nucleotide of upstream exon), but GRO could not (Fig. 5, B and C). 3′CB, which sequenced the 3′ end of CB RNA, could detect both SI and lariats (last nucleotide of intron) (Fig. 5, B and C). Although the SI signal detected by 3′CB was higher than that by pNET/ChrNET, the SI signals detected by pNET and 3′CB were positively correlated (Supplemental Fig. S7A). Furthermore, a significantly positive correlation between mean SI signal and transcriptional activity was revealed in both pNET/ChrNET and 3′CB, implying that SI of genes with higher transcriptional activity was more easily detected (Supplemental Fig. S7B).
Because unannotated splicing events have been predicted from spikes detected at 5′ splice sites in plaNET-seq data (Kindgren et al. 2020), we tested whether splicing events can also be predicted by 3′CB and pNET/ChrNET. We therefore set up a pipeline to predict potential 5′SSs from 3′CB or pNET/ChrNET data (Fig. 5D; see Materials and methods). In total, 92,653 5′SSs were predicted from 3′CB, of which 80,541 (86.93%) overlapped with detected and/or annotated 5′SSs (Fig. 5E). In addition, 40,126 5′SSs were predicted from pNET_pure, of which 32,017 (79.79%) overlapped with detected and/or annotated 5′SSs (Supplemental Fig. S7C). However, pNET_crude (45.20%, 3,657 of 8,090) and ChrNET (64.94%, 8,587 of 13,222) predicted fewer splice sites (Supplemental Fig. S7C). Thus, 3′CB outperformed the pNET type methods in predicting 5′SSs.
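The details of the Fig. 5D pipeline are not restated here, so the following Python sketch only illustrates the underlying idea under explicit assumptions: a candidate 5′SS is a position (plus strand, 0-based, the last exon nucleotide) where RNA 3′ ends pile up as a sharp spike above the local background, and where the next 2 genomic bases are GT, consistent with the GU-AG rule considered in Fig. 5E. The thresholds `min_reads`, `fold`, and `flank`, and the function name, are hypothetical.

```python
def predict_5ss(end_counts, genome, min_reads=5, fold=5.0, flank=50):
    """Candidate 5' splice sites from RNA 3'-end counts (plus strand only).

    end_counts: {0-based position: number of reads whose RNA 3' end maps there}
    genome: plus-strand sequence of the region
    Returns the sorted positions (last exon nucleotide) that pass all filters.
    """
    candidates = []
    for pos, n in sorted(end_counts.items()):
        if n < min_reads:
            continue
        # Mean 3'-end signal in the flanking window, excluding the candidate itself.
        neighbors = [end_counts.get(p, 0)
                     for p in range(pos - flank, pos + flank + 1) if p != pos]
        background = sum(neighbors) / len(neighbors)
        if background > 0 and n < fold * background:
            continue  # not a sharp spike
        # GU-AG rule: the intron should begin with GT (GU in RNA) right after the exon.
        if genome[pos + 1:pos + 3].upper() == "GT":
            candidates.append(pos)
    return candidates


if __name__ == "__main__":
    region = "A" * 60 + "AG" + "GTAAGT" + "A" * 60  # exon ends at index 61
    print(predict_5ss({61: 40, 20: 40, 30: 1}, region))
```

Predicted sites would then be intersected with detected and annotated 5′SSs to obtain overlap percentages like those reported above.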
The cotranscriptional splicing efficiency of an intron can be assessed from CB data by calculating the percent intron retention (PIR, the ratio of unspliced exon–intron junction reads to total junction reads; Supplemental Fig. S7D) (Li et al. 2020; Zhu et al. 2021); introns with lower PIR are spliced more efficiently. Because the SIs and lariats detected by 3′CB are products of cotranscriptional splicing, we expected SI and lariat signals to be related to PIR. We therefore calculated the SI index (ratio of reads mapped to the last nucleotide of the upstream exon to reads mapped from −50 to +50 bp of the 5′SS) and the lariat index (ratio of reads mapped to the last nucleotide of the intron to the total reads mapped from −50 to +50 bp of the 3′SS) (Supplemental Fig. S7D).
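The three quantities can be sketched in Python as follows. Assumptions: 3′-end read counts are keyed by 0-based position, the ±50-bp window is centered on the last exon nucleotide (SI) or the last intron nucleotide (lariat) rather than on the splice-site boundary itself, and "total junction reads" for PIR means unspliced plus spliced junction reads. The function names are hypothetical, and the published PIR definition may combine 5′ and 3′ junctions differently.

```python
def terminal_index(end_counts, site, half_window=50):
    """Reads ending exactly at `site` divided by reads ending within site +/- half_window."""
    window = sum(end_counts.get(p, 0) for p in range(site - half_window, site + half_window + 1))
    return end_counts.get(site, 0) / window if window else float("nan")


def si_index(end_counts, exon_last_nt):
    """SI index: 3' ends at the last nucleotide of the upstream exon."""
    return terminal_index(end_counts, exon_last_nt)


def lariat_index(end_counts, intron_last_nt):
    """Lariat index: 3' ends at the last nucleotide of the intron."""
    return terminal_index(end_counts, intron_last_nt)


def pir(unspliced_reads, spliced_reads):
    """Percent intron retention (assumed: unspliced / (unspliced + spliced) x 100)."""
    total = unspliced_reads + spliced_reads
    return 100.0 * unspliced_reads / total if total else float("nan")


if __name__ == "__main__":
    ends = {100: 30, 120: 10, 80: 10, 200: 99}
    print(si_index(ends, 100), lariat_index(ends, 200), pir(25, 75))
```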
Introns were divided into 6 groups according to their splicing efficiency. Overall, both the SI index and the lariat index of groups 1 to 5 were slightly negatively associated with splicing efficiency in 3′CB (Fig. 5, F and G), suggesting that for introns with high cotranscriptional splicing efficiency, the upstream and downstream exons are joined rapidly, i.e. step 2 of splicing might be rate limiting for cotranscriptional splicing. The sixth group includes introns that were rarely spliced cotranscriptionally and thus had low SI and lariat indices (Supplemental Fig. S7H). For the pNET type methods, however, a negative association between SI index and splicing efficiency was detected only in pNET_pure (Supplemental Fig. S7, E to G).
In summary, 3′CB is the most sensitive method for detecting cotranscriptional splicing. For the pNET methods, by contrast, the nuclei isolation strategy has a profound effect on the sensitivity of cotranscriptional splicing detection. Spliceosome–Pol II interactions may dissociate easily during crude nuclei isolation, resulting in fewer SIs being detected. Thus, pNET_crude and ChrNET are not recommended for analyzing cotranscriptional processing.
Complex cotranscriptional splicing events at CB RNA level
We then explored splicing events that occurred only cotranscriptionally, only posttranscriptionally, or at both levels by comparing the splicing events detected by CB, Nuc, total, Nuc_pA, and total_pA. In total, 118,110 splicing events were detected, of which 93,774 (79.40%) were shared by all 5 methods and 1,266 were detected only by CB (Supplemental Fig. S8A; Fig. 6A). By comparing these splicing events with the A. thaliana Reference Transcript Dataset 3 (AtRTD3) annotation (Zhang et al. 2022), we identified 4,482 unannotated splicing events (Supplemental Fig. S8A). CB detected more unannotated splicing events than the other methods (Supplemental Fig. S8B). For both all unannotated and CB-specific unannotated splicing events, the most common sequences near the 5′SS and 3′SS match the “GU-AG” rule (Supplemental Fig. S8C).
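Separating unannotated from annotated events and checking the splice-site dinucleotides amounts to a set comparison followed by a sequence lookup. The sketch below assumes junctions are represented as hypothetical `(chrom, strand, start, end)` keys and that an exact coordinate match to AtRTD3 defines "annotated"; the authors' actual matching criteria may differ. Note that the RNA rule GU-AG corresponds to GT-AG in the genomic DNA sequence.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical junction key: (chrom, strand, start, end), where start/end are the
# 0-based positions of the first and last intron nucleotides on the + strand.
Junction = Tuple[str, str, int, int]

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def intron_sequence(genome: Dict[str, str], j: Junction) -> str:
    """Intron sequence 5′→3′ on the transcribed strand (DNA alphabet)."""
    chrom, strand, start, end = j
    seq = genome[chrom][start:end + 1].upper()
    return seq.translate(_COMPLEMENT)[::-1] if strand == "-" else seq


def split_by_annotation(detected: Iterable[Junction],
                        annotated: Iterable[Junction]) -> Tuple[Set[Junction], Set[Junction]]:
    """Return (annotated, unannotated) subsets of the detected junctions."""
    det, ann = set(detected), set(annotated)
    return det & ann, det - ann


def follows_gu_ag(genome: Dict[str, str], j: Junction) -> bool:
    """GU-AG rule, checked as GT...AG in the genomic (DNA) sequence."""
    s = intron_sequence(genome, j)
    return s.startswith("GT") and s.endswith("AG")


genome = {"chr1": "AAAGTCCCCAGTTT", "chr2": "TTCTGGGGACTT"}
plus_j, minus_j = ("chr1", "+", 3, 10), ("chr2", "-", 2, 9)
known, novel = split_by_annotation([plus_j, minus_j], annotated=[plus_j])
print(sorted(novel))  # [('chr2', '-', 2, 9)]
print(follows_gu_ag(genome, plus_j), follows_gu_ag(genome, minus_j))  # True True
```

Reverse-complementing minus-strand introns before the check is what lets a single GT...AG test cover both strands.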
Figure 6.
Complex cotranscriptional splicing events at CB RNA level. A) UpSet plot showing splicing events detected by CB, Nuc, total, Nuc_pA, and total_pA. B) Scheme showing the process of RS. C) Scheme showing the pipeline for detecting RS in Arabidopsis. D) Two examples of RS: the 11th intron of AT5G16260 (SSJ annotated) and the 8th intron of AT4G35335 (SSJ unannotated). Left: sashimi plots showing read distribution and splicing events detected by CB, Nuc_pA, and total_pA on AT5G16260 and AT4G35335. Right: genome browser shots showing the alignment tracks of CB, Nuc, and total for the potential RS events in the 11th intron of AT5G16260 and the 8th intron of AT4G35335. Reads were sorted by insert size. Light blue lines indicate spliced introns. Red arrows, short splicing junctions (SSJs); black arrow, lariat signal. Read-density tracks of CB, Nuc_pA, and total_pA are plotted in different colors. CB, chromatin-bound RNA sequencing; Nuc, nuclear RNA sequencing; Nuc_pA, nuclear polyA RNA sequencing; total, total RNA sequencing; total_pA, total polyA RNA sequencing; 3′CB, 3′ end of chromatin-bound RNA sequencing.
Recursive splicing (RS) is a splicing mechanism observed in Drosophila (Drosophila melanogaster) and mammalian cells that removes a single intron (usually a very long one) from a pre-mRNA transcript in 2 consecutive splicing steps (Duff et al. 2015; Sibley et al. 2015; Joseph et al. 2018). The RS sites in these introns contain the sequence “AGGU”, a 3′SS immediately followed by a 5′SS, which allows the intron to be spliced sequentially (Fig. 6B). We then interrogated the CB reads for potential RS in Arabidopsis (see Materials and methods; Fig. 6C) and found 60 RS candidates (Supplemental Data Set 3); 40 of the corresponding short splicing junctions were annotated and 20 were unannotated. For example, potential RS was detected in the 11th intron (602 bp) of AT5G16260 (EARLY FLOWERING9, ELF9), whose short splicing junction is annotated in AtRTD3, and in the 8th intron (169 bp) of AT4G35335, whose short splicing junction is unannotated (Fig. 6D). Notably, our data suggest that RS may occur in Arabidopsis even within short introns: the lariat from the first splicing event within the 8th intron of AT4G35335 was detected by 3′CB (Fig. 6D).
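The sequence logic behind RS candidates can be sketched as a scan for internal AGGU motifs (AGGT in DNA) combined with a check for a short splicing junction that runs from the intron's 5′SS to the AG of the motif. This is a simplified illustration only: the `margin` filter and the offset convention for junction ends are hypothetical, and the authors' pipeline (Fig. 6C) works from CB read alignments rather than from this toy input.

```python
import re
from typing import Iterable, List


def rs_motifs(intron: str, margin: int = 20) -> List[int]:
    """0-based offsets of internal AGGT motifs (DNA form of AGGU) in an intron
    given 5′→3′ on the transcribed strand. The offset points at the 'A'; the
    putative RS 3′SS lies after 'AG' and the reconstituted 5′SS starts at 'GT'.
    `margin` is a hypothetical filter excluding motifs near the intron ends.
    """
    seq = intron.upper()
    hits = [m.start() for m in re.finditer(r"(?=AGGT)", seq)]  # lookahead: overlapping hits
    return [h for h in hits if margin <= h <= len(seq) - margin - 4]


def supported_rs_sites(intron: str, ssj_ends: Iterable[int], margin: int = 20) -> List[int]:
    """Keep motifs where a short splicing junction (SSJ) starting at the intron's
    5′SS ends exactly at the G of 'AG' (intron offset h + 1), i.e. evidence for
    the first of the 2 consecutive splicing steps."""
    ends = set(ssj_ends)
    return [h for h in rs_motifs(intron, margin) if h + 1 in ends]


intron = "GT" + "C" * 30 + "AGGT" + "C" * 30 + "AG"  # 68-nt toy intron
print(rs_motifs(intron))                 # [32]
print(supported_rs_sites(intron, {33}))  # [32]
print(supported_rs_sites(intron, {40}))  # []
```

Requiring junction support, rather than the motif alone, matters because AGGU occurs by chance in many introns.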
Detect unknown and unstable transcripts
To compare how sensitively these methods detect unstable and/or noncoding transcripts, we first calculated the transcriptional activity of long ncRNA (lncRNA) genes annotated in Araport11 and in 2 previous studies (Kindgren et al. 2020; Ivanov et al. 2021). We also searched for novel, unannotated lncRNA transcripts (Supplemental Fig. S9A; see Materials and methods). A total of 1,404 novel lncRNA transcripts were detected, of which 516 were designated as antisense transcripts and 888 as intergenic transcripts (Supplemental Fig. S9A and Data Set 4). pNET was the most sensitive method, followed by GRO and 3′CB, which detected 5,638 to 5,885 (307 to 385 novel), 4,508 to 4,928 (364 to 510 novel), and 3,701 (1,088 novel) lncRNAs, respectively. Although CB detected only 1,530 active lncRNAs (49 novel), it was still much more sensitive than Nuc, total, Nuc_pA, and total_pA (Fig. 7A).
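This section does not spell out how novel transcripts were assigned to the antisense and intergenic classes (the criteria are given in Supplemental Fig. S9A and Materials and methods). As a minimal sketch under an assumed rule, a transcript overlapping no annotated gene is intergenic, and one overlapping only genes on the opposite strand is antisense; the interval representation and the `sense_overlap` label are hypothetical.

```python
from typing import List, Tuple

# Hypothetical interval: (chrom, start, end, strand), 0-based half-open.
Interval = Tuple[str, int, int, str]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least 1 nt on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def classify_novel(tx: Interval, genes: List[Interval]) -> str:
    """Assumed rule: 'intergenic' if the unannotated transcript overlaps no annotated
    gene, 'antisense' if every overlapping gene is on the opposite strand, and
    'sense_overlap' otherwise (not counted as a novel lncRNA in this sketch)."""
    hits = [g for g in genes if overlaps(tx, g)]
    if not hits:
        return "intergenic"
    if all(g[3] != tx[3] for g in hits):
        return "antisense"
    return "sense_overlap"


genes = [("chr1", 100, 500, "+")]
for tx in [("chr1", 200, 300, "-"), ("chr1", 600, 700, "+"), ("chr1", 200, 300, "+")]:
    print(tx, classify_novel(tx, genes))
# labels printed in turn: antisense, intergenic, sense_overlap
```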
Figure 7.
Detection of lncRNA transcripts. A) Number of active annotated and novel lncRNA transcripts detected by each method (circles on the diagonal) and by pairs of methods (off-diagonal circles). B) Three examples of novel transcripts: one detected by multiple methods, including nascent, CB, Nuc, and total (left); one detected only at the nascent RNA/CB level (middle); and one detected only by the pNET-type, GRO-type, and 3′CB methods (right). G3652 is a gene annotated in AtRTD3. pNET, plant native elongating transcript sequencing; ChrNET, Chromatin Native Elongation Transcript sequencing; GRO, global nuclear run-on sequencing; 3′CB, 3′ end of chromatin-bound RNA sequencing; CB, chromatin-bound RNA sequencing; Nuc, nuclear RNA sequencing; Nuc_pA, nuclear polyA RNA sequencing; total, total RNA sequencing; total_pA, total polyA RNA sequencing.
Three examples of novel transcripts are shown in Fig. 7B: one detected both at the nascent/CB level and in rRNA-depleted Nuc and total RNA, one detected only at the nascent/CB RNA level, and one detected only by pNET, GRO, and 3′CB. 3′CB exclusively identified hundreds of novel lncRNAs (Fig. 7A; Supplemental Fig. S9B). These transcripts may come from loci with low transcriptional activity, which would make them difficult to detect with other methods. Moreover, they might be highly stable and bind chromatin strongly, enabling their detection by 3′CB. However, further experiments are required to test this hypothesis.
Because primary transcripts of microRNAs (miRNAs) are rapidly processed into mature miRNAs, we also used miRNA precursors as examples of unstable transcripts. More active miRNA genes were detected by pNET, GRO, and 3′CB, followed by CB, Nuc, and total. CB was less sensitive than the other methods (Supplemental Fig. S10, A and B). The transcriptional activities of miRNA genes were highly correlated among the pNET-type, GRO-type, and 3′CB methods (Supplemental Fig. S10, C to E).
GRO can detect active transcription by polymerases other than Pol II (Liu et al. 2018), whereas pNET detects only transcripts from Pol II. Because Pol II is also involved in the transcription of Pol V loci (Zheng et al. 2009), we expected pNET to detect transcripts from some Pol V loci. We therefore examined how well the different methods detect transcripts from known Pol V targets (Liu et al. 2018). Surprisingly, 3′CB detected the most active transcripts from Pol V loci (2,632), followed by GRO (612 to 786) and then pNET/ChrNET (94 to 240) (Supplemental Fig. S11A), whereas the other methods failed to detect transcripts from Pol V loci (Supplemental Fig. S11, B to E). Of note, the Pol V locus transcripts detected by 3′CB may not be nascent RNA being transcribed but rather mature Pol V ncRNAs that are bound to chromatin and mediate DNA methylation.
The advantages and application scenarios of the 7 methods
Considering the convenience of the experimental process, affordability, and reproducibility of the results, we summarized the features of these methods (Tables 1 and 2) and propose a guideline for researchers to select the appropriate method according to their aims and experimental conditions (Fig. 8). The pNET-type and GRO-type methods performed best in terms of reproducibility, sensitivity, and the ability to investigate various stages of transcription. Moreover, in ChrNET, rigorous washing of the nuclei substantially increased the proportion of retained reads and reduced nonspecific reads from rRNA or plastids (Supplemental Fig. S3B and Data Set 2). pNET-type methods can also detect different phosphorylated NRPB1 CTD isoforms and cotranscriptional SIs, and they achieve single-nucleotide resolution. Among them, ChrNET was chosen as the best method for nascent RNA detection because of its simple chromatin preparation, high specificity of nascent RNA capture, and low sequencing cost.
Table 1.
Comparison of 7 methods for application in plants
| Methods | 5′St | Elon. activity | Term. | Cotranscriptional splicing | Advantage | Limitation |
|---|---|---|---|---|---|---|
| pNET_pure | b | b | a | SI (b) | Identifies posttranslational modifications of the Pol II CTD and their functions at different transcription steps. Sensitive in SI detection. ChrNET has a lower background | Contaminated with certain nonnascent RNAs associated with Pol II, such as snRNAs. Cannot distinguish transcriptionally engaged, paused, and arrested Pol II |
| pNET_crude | b | b | a | SI (c) | As for pNET_pure | As for pNET_pure |
| ChrNET | b | b | a | SI (c) | As for pNET_pure | As for pNET_pure |
| GRO_pure | a | a | a | NA | Reflects the transcriptionally engaged RNA polymerase. Bona fide pausing detection | Contaminated with nascent RNA generated by Pol I, III, IV, and V. Sensitive to nuclei preparation |
| GRO_crude | c | b | a | NA | Reflects the transcriptionally engaged RNA polymerase | As for GRO_pure |
| 3′CB | b | c | b | SI (a); Lariat (c) | Relatively little starting material is required. High sensitivity in SI and lariat detection | Contaminated with mature RNA and with nascent RNA generated by Pol I, III, IV, and V. Cannot distinguish transcriptionally engaged, paused, and arrested Pol II |
| CB | NA | b | NA | Splicing efficiency | Relatively little starting material is required. A cost-effective alternative method for studying transcription/cotranscriptional processing | Contaminated with mature RNA. 5′ and 3′ nascent RNA may not be detected. Cannot detect transcriptionally engaged, paused, and arrested Pol II. Low resolution |

Columns 5′St through Cotranscriptional splicing give the detection sensitivity for each step of the transcription cycle: a, best; b, good; c, modest.
5′St, 5′ stalling; Elon., elongation; Term., termination; NA, not applicable.
Table 2.
Experimental characteristics and parameters of 7 methods
| Methods | Nascent/CB RNA capture (procedure) | Library construction (procedure) | Nascent/CB RNA capture cost (USD) | Library construction cost (USD) | Sequencing cost (USD) | Recommended sequencing depth (million reads) | Resol. (nt) | Rep. |
|---|---|---|---|---|---|---|---|---|
| pNET_pure | IP of Pol II complex | Small RNA cloning | 50 | 53 | 170 to 210 | 80 to 100 | 1 | a |
| pNET_crude | IP of Pol II complex | Small RNA cloning | 43 | 53 | 170 to 210 | 80 to 100 | 1 | a |
| ChrNET | IP of Pol II complex | Small RNA cloning | 55 | 53 | 65 to 85 | 30 to 40 | 1 | a |
| GRO_pure | Nuclei run-on | Small RNA cloning | 70 | 55 | 210 to 420 | 100 to 200 | 50 to 100 | b |
| GRO_crude | Nuclei run-on | Small RNA cloning | 62 | 55 | 210 to 250 | 100 to 120 | 50 to 100 | b |
| 3′CB | Isolate CB RNA | Small RNA cloning | 45 | 70 | 210 | 100 | 1 | b |
| CB | Isolate CB RNA | Strand-specific RNA cloning | 45 | 35 | 45 to 65 | 20 to 30 | NA | b |
Sequencing costs are calculated for the Illumina PE150 sequencing strategy and can vary with sequencing strategy and vendor. Optimizing nuclei preparation to minimize plastid contamination, and depleting rRNA, may reduce the cost further.
a, Best; b, Good.
Resol., resolution; Rep., repeatability; NA, not applicable.
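The per-sample totals implied by Table 2 can be summed directly. The sketch below only transcribes the table's figures, assuming they are per library; it reads the single 3′CB sequencing value as a fixed range and excludes labor, equipment, and nuclei preparation. The dictionary and function names are illustrative.

```python
from typing import Dict, Tuple

# Per-library costs (USD) transcribed from Table 2:
# (nascent/CB RNA capture, library construction, (sequencing low, sequencing high)).
COSTS: Dict[str, Tuple[int, int, Tuple[int, int]]] = {
    "pNET_pure": (50, 53, (170, 210)),
    "pNET_crude": (43, 53, (170, 210)),
    "ChrNET": (55, 53, (65, 85)),
    "GRO_pure": (70, 55, (210, 420)),
    "GRO_crude": (62, 55, (210, 250)),
    "3′CB": (45, 70, (210, 210)),
    "CB": (45, 35, (45, 65)),
}


def total_cost(method: str) -> Tuple[int, int]:
    """Low and high total cost per library (reagents plus PE150 sequencing only)."""
    capture, library, (seq_low, seq_high) = COSTS[method]
    return capture + library + seq_low, capture + library + seq_high


# List methods from cheapest to most expensive (by the low estimate)
for method in sorted(COSTS, key=lambda m: total_cost(m)[0]):
    print(f"{method:10s} {total_cost(method)}")
```

With these figures, CB (125 to 145 USD) and ChrNET (173 to 193 USD) are the least expensive per library, in line with the budget-limited recommendation in Fig. 8; sequencing dominates the totals for every other method.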
Figure 8.
A guideline for selecting suitable nascent/CB RNA-seq methods. The selection guide is based on 3 dimensions: (i) Experimental expense, which includes nascent RNA capture, library construction, and sequencing depth. GRO, pNET, and 3′CB are viable options if the project budget allows; if the budget is limited, CB and ChrNET can be chosen. (ii) Operability of the experimental process, which includes the steps involved, the hands-on time required, and the success rate of nascent RNA capture and library construction. The CB procedure is relatively straightforward, whereas GRO, pNET, ChrNET, and 3′CB are more complex. (iii) The advantages of each method, depicted as color-coded blocks. Methods within the dotted box offer single-nucleotide detection resolution. For further details, see Tables 1 and 2. pNET, plant native elongating transcript sequencing; ChrNET, Chromatin Native Elongation Transcript sequencing; GRO, global nuclear run-on sequencing; 3′CB, 3′ end of chromatin-bound RNA sequencing; CB, chromatin-bound RNA sequencing.
GRO is sensitive to nuclear purity and/or integrity when determining Pol II stalling states (Fig. 4D). If tissue is plentiful or readily available, GRO_pure is recommended. For materials that are difficult to obtain or have poor integrity, GRO_crude can be considered; for example, Chu et al. (2018) prepared chromatin from archived samples with degraded RNA for run-on and detected the transcriptional status of cancer tissue.
Finally, if transcriptional details are not essential and affordability is a major concern, CB may be a suitable alternative for assessing the transcriptional activity of coding and noncoding genes. Owing to its high sensitivity in detecting SIs from steps 1 and 2, 3′CB is advantageous for observing cotranscriptional splicing dynamics, especially when combined with corroborating information from CB.
Discussion
In this study, we compared 7 nascent/CB RNA-seq methods. We modified the protocols of some methods, for example by using a uniform nuclei isolation process and by constructing cDNA libraries as consistently as possible. This allowed us to compare differences in the principles of nascent/CB RNA detection rather than differences in plant growth, pretreatment, and library construction.
Despite our best efforts, the comparisons are not entirely fair. For example, in the pNET experiments, we used an antibody that recognizes the Ser5P CTD because the Ser5P isoform of the NRPB1 CTD is thought to be involved in almost all steps of transcription after initiation, including pause release, elongation, and termination (Zhu et al. 2018). If an antibody recognizing the nonphosphorylated or the Ser2P CTD isoform were used in pNET experiments, the signal would be biased toward the 5′ end (proximal promoter) or the 3′ end (terminator) of the gene, respectively (Nojima et al. 2015; Zhu et al. 2018). The specificity of these antibodies has been rigorously tested for mammalian NRPB1 (Nojima et al. 2015). Because the NRPB1 CTD heptad repeats are well conserved, these antibodies are expected to work for the plant NRPB1 CTD. The Ser2P and Ser5P antibodies were previously validated in plants by treating plants with the kinase inhibitor flavopiridol (Zhu et al. 2018).
Nonetheless, the CTD heptad repeats of NRPB1 have been shown not to be uniformly phosphorylated; e.g. adjacent CTD repeats can be phosphorylated differently (Schüller et al. 2016; Suh et al. 2016). Therefore, pNET with an anti-Ser5P antibody captures nascent RNA associated with NRPB1 that is enriched for, but not exclusively carrying, the Ser5P modification. To capture RNAs bound to all NRPB1 isoforms, it is advisable to use pNET with transgenic materials expressing affinity-tagged NRPB1 fusions, or plaNET. Furthermore, the RNAs detected by pNET may represent transcripts associated with an arrested or stalled Pol II complex rather than with transcribing Pol II. GRO, on the other hand, is designed to detect transcriptionally engaged Pol II (Wissink et al. 2019). It is also possible that the RNA detected by CB or 3′CB is simply regulatory RNA bound to chromatin rather than nascent RNA being transcribed. Nonetheless, the correlation between GRO_pure and pNET_pure signals in the R1 to R4 regions was at least 0.87, and the correlation between 3′CB and these 2 types of methods was above 0.79, indicating that the nascent RNA levels detected by these methods are very similar at a resolution of several hundred bp (the bin size in the R1 to R4 regions).
Although the reproducibility of both pNET/ChrNET and GRO in these technical replicate-based experiments is excellent, we recommend processing all samples of an experiment simultaneously (e.g. controls and treatments, samples in a time course, and multiple independent biological replicates). Fortunately, samples for these methods can be collected at different times and stored at −80 °C, so that processing can start at the same time. In our experience, nascent RNA should be extracted from no more than 6 to 8 samples at a time for GRO and from no more than 10 to 12 samples for pNET, 3′CB, and CB. However, more nascent/CB RNA samples (e.g. 20 to 24) can be accumulated for downstream small RNA cDNA library construction.
Although pNET/ChrNET, GRO, 3′CB, and CB represent the methods for detecting nascent/CB RNAs by massively parallel sequencing relatively comprehensively, some methods were not included in this study. For example, PRO-seq, which has been reported in cassava (Lozano et al. 2021), is a single-nucleotide-resolution version of GRO-seq. We hypothesize that PRO-seq performs better than GRO and comparably to pNET. plaNET uses an anti-FLAG antibody to capture affinity-tagged NRPB2 of Pol II in transgenic plants (Kindgren et al. 2020) instead of an antibody against native NRPB1. It is an alternative to pNET and is expected to resemble pNET experiments using an antibody against total Pol II. We compared the published plaNET data with our pNET data (Kindgren et al. 2020; Supplemental Fig. S12). Meta-profiles of plaNET showed that nascent RNA accumulated at both the 5′ and 3′ ends (Supplemental Fig. S12A; Fig. 4A). plaNET and pNET are nearly equally sensitive and accurate in detecting active genes (Supplemental Fig. S12B; Fig. 3C), and plaNET can also detect 5′SSs (Supplemental Fig. S12, C and D). Nascent 5-EU-labeled RNA sequencing (Neu-seq), which uses 5-EU for metabolic labeling of nascent RNA (Szabo et al. 2020), differs in principle and experimental procedure from the 7 methods compared here, so we did not test Neu-seq in parallel. However, when we compared the Neu-seq data (Szabo et al. 2020) with ours, Neu-seq was more similar to CB (Supplemental Fig. S12, A, B, and D; Fig. 4, A and B).
In addition to short-read sequencing, long-read sequencing technologies such as PacBio and Nanopore have also been employed for nascent/CB RNA analysis (Jia et al. 2020; Mo et al. 2021). In FLEP-seq, a 3′ adapter is ligated to the CB RNA, followed by first-strand cDNA synthesis through reverse transcription and template switching. This method enables sequencing of full-length CB RNA, as opposed to the partial CB RNA sequences obtained with 3′CB. We analyzed published Nanopore-based FLEP-seq data (Jia et al. 2020; Long et al. 2021) and found that FLEP-seq was reproducible in detecting gene activity (Supplemental Fig. S13A). Furthermore, FLEP-seq is comparable with pNET/ChrNET and GRO in detection sensitivity and background noise (Supplemental Fig. S13B).
Even though the experiments were conducted independently by different laboratories, the nascent/CB RNA levels measured on the long-read and short-read platforms were highly correlated (Pearson correlation >0.8; Supplemental Fig. S13C). FLEP-seq reads tend to be longer (median length of ∼900 nt in Arabidopsis; Long et al. 2021) because RNAs longer than 200 nt are size-selected to deplete highly abundant short ncRNAs, which simultaneously sacrifices detection of short nascent RNAs from the 5′ end of genes. Consequently, FLEP-seq signals tend to be concentrated toward the 3′ end of genes (Supplemental Fig. S13D), indicating that FLEP-seq may not be suitable for studying 5′ stalling. However, the longer reads of FLEP-seq enable investigation of cotranscriptional splicing, polyadenylation, and polyA tail length, as well as the interplay among these cotranscriptional processes at the single-molecule level (Jia et al. 2020; Mo et al. 2021). This capability is unattainable for nascent/CB RNA methods that rely on short-read sequencing.
Nascent/CB RNA-seq methods were originally developed in fungal or animal cells, but plant tissues differ from fungal and animal cells in many ways that may require modifications to nascent/CB RNA capture. For example, plant cell lines are uncommon, and the cell wall must be removed to extract nuclei from plant tissue. Isolating nuclei of acceptable integrity and purity as easily and quickly as possible is therefore a challenge for nascent RNA detection in plants. Here, we performed the pNET-type and GRO-type methods on both pure and crude nuclei. Because we prepared crude and pure nuclei only once to minimize variation introduced by nuclei preparation, it is difficult to draw definite conclusions about how nuclei quality affects these 2 types of nascent RNA methods.
However, pNET/ChrNET is likely insensitive to nuclear purity and integrity, given the high correlation among pNET_pure, pNET_crude, and ChrNET. Given the low correlation between GRO_crude and GRO_pure in detecting 5′ and 3′ stalling, we consider GRO_crude inappropriate for detecting transcription initiation and termination. Moreover, we recommend optimizing nuclei/chromatin preparation for pNET/ChrNET and GRO in pilot experiments, especially for plant species and/or tissues that have not yet been tested for nascent/CB RNA capture. In addition, in some species such as Arabidopsis, plastids are difficult to separate from nuclei because of their similar size, shape, and density, which results in a high proportion of plastid RNA contamination in nascent/CB RNA.
Therefore, improved nuclei isolation strategies, such as sorting DAPI-stained nuclei, may also reduce sequencing costs and increase the specificity of nascent RNA capture. The sequencing cost of GRO can be further reduced by adding an rRNA depletion step after purification of the nuclear run-on RNAs (Chen et al. 2022; Xie et al. 2022). Nevertheless, once reads from plastid RNA and highly abundant ncRNAs were filtered out, the conclusions we obtained in plant tissues were similar to those of mammalian cell analyses using nascent/CB RNA-seq methods (Andersson et al. 2014; Nojima et al. 2015) (Supplemental Fig. S14).
Materials and methods
Plant materials and growth conditions
A. thaliana Col-0 plants were germinated and grown on 1/2 MS medium for 10 d at 22 °C under a 16-h light (150 μmol·m⁻²·s⁻¹)/8-h dark cycle. Aerial tissue was harvested, flash frozen, and ground to powder in liquid nitrogen.
Nuclei and chromatin isolation
Pure nuclei were isolated from 8 g of tissue using a modification of a previously published method (Zhu et al. 2018). The tissue powder was suspended in 120 mL isolation buffer (2 M hexylene glycol, 20 mM PIPES buffer, pH 7.0, 10 mM MgCl2, 1% [v/v] Triton X-100, and 5 mM β-mercaptoethanol), left on ice for 5 min, and then filtered through 110- and 60-μm nylon mesh. Nuclei were further purified on a 30% (v/v)/80% (v/v) Percoll gradient, followed by washes with gradient buffer (0.5 M hexylene glycol, 5 mM PIPES buffer, pH 7.0, 10 mM MgCl2, 1% Triton X-100, and 5 mM β-mercaptoethanol) and transcription buffer (25 mM Tris-HCl, pH 7.2, 2.5 mM MgCl2, 2.5 mM KCl, 100 mM NH4Cl, 0.5 mM MnCl2, and 2.5 mM DTT). Then, 39%, 39%, 20%, and 2% of the nuclei were used for GRO_pure, pNET_pure, 3′CB/CB, and nuclear RNA, respectively. Approximately 10⁷ pure nuclei were used for each GRO_pure or pNET_pure library, and 5 × 10⁶ nuclei were used for CB RNA isolation for 3′CB and CB.
Crude nuclei were isolated from 12 g of tissue using a modification of a previously published method (Zhu et al. 2018). The tissue powder was suspended in 200 mL lysis buffer (50 mM HEPES, pH 7.5, 150 mM NaCl, 1 mM EDTA, 1% Triton X-100, 10% [v/v] glycerol, 5 mM β-mercaptoethanol, 1 μg/mL pepstatin A, 1 μg/mL aprotinin, and 1 mM PMSF), incubated on ice for 5 min, and then filtered through 110-μm nylon mesh and a 40-μm cell strainer. The crude nuclei were pelleted at 3,200 × g for 20 min at 4 °C and washed with 1 mL HBB buffer (25 mM Tris-HCl, pH 7.6, 0.44 M sucrose, 10 mM MgCl2, 0.1% Triton X-100, and 10 mM β-mercaptoethanol) and 1 mL HBC buffer (20 mM Tris-HCl, pH 7.5, 352 mM sucrose, 8 mM MgCl2, 0.08% Triton X-100, 8 mM β-mercaptoethanol, and 20% glycerol). Approximately 10⁷ crude nuclei were used for each GRO_crude, GRO_NC, pNET_crude, pNET_NC, ChrNET, or ChrNET_NC library.
Chromatin extraction was conducted according to published methods (Zhu et al. 2020; Zhu et al. 2021). The nuclei from above were resuspended in 150 µL resuspension buffer (50% glycerol, 25 mM Tris-HCl, pH 7.5, 0.5 mM EDTA, 100 mM NaCl, 1 mM DTT, 0.4 U/μL RNase inhibitor, 1 μg/mL pepstatin A, 1 μg/mL aprotinin, 1 mM PMSF, and 8 mM β-mercaptoethanol) and immediately washed with 300 µL CB washing buffer (25 mM Tris-HCl, pH 7.5, 300 mM NaCl, 1 M urea, 0.5 mM EDTA, 1 mM DTT, 1% [v/v] Tween-20, 0.4 U/μL RNase inhibitor, 1 μg/mL pepstatin A, 1 μg/mL aprotinin, 1 mM PMSF, and 8 mM β-mercaptoethanol). The pellet was then washed again with 150 µL resuspension buffer and 150 µL CB washing buffer. The resulting chromatin was used for CB RNA extraction or MNase digestion.
For each method, nuclei or chromatin were divided into 3 aliquots as technical replicates. The volumes and compositions of buffers required for nuclei isolation and chromatin preparation are listed in Supplemental Data Set 5.
GRO_pure, GRO_crude, and GRO_NC
Briefly, pure or crude nuclei were washed again with transcription buffer, resuspended in 100 µL transcription buffer, and mixed well with 100 µL transcription buffer supplemented with 2 units/µL RNasin, 500 µM ATP, 500 µM GTP, 500 µM Br-UTP, 2 µM CTP, and 2% (w/v) sarkosyl; the mixture was incubated at 30 °C for 5 min (Zhu et al. 2018). For GRO_NC, run-on was performed with UTP instead of Br-UTP. α-³²P-CTP was not used to trace the run-on. Nuclear RNA was extracted using TRIzol (Invitrogen) and subjected to RNA fragmentation and anti-BrU affinity purification (Chen et al. 2022).
GRO_pure, GRO_crude, GRO_NC, pNET_pure, pNET_crude, pNET_NC, ChrNET, ChrNET_NC, and 3′CB cDNA libraries were prepared using NEXTflex Small RNA-Seq Kit v3 (Bioo Scientific) with some modifications. For GRO_pure, GRO_crude, and GRO_NC, after 3′ adapter ligation, the run-on RNA was decapped using RppH (NEB, M0356S) and 5′ phosphorylated with T4 PNK (NEB, M0201L). The 3′ and 5′ adapter-ligated RNA was then reverse transcribed and amplified, followed by size selection with a native 6% TBE polyacrylamide gel, and the cDNA of 170 to 300 bp was recovered.
pNET_pure, pNET_crude, pNET_NC, ChrNET, and ChrNET_NC
The pure nuclei, crude nuclei, and chromatin were resuspended in MNase buffer (20 mM Tris-HCl, pH 8.0, 5 mM NaCl, and 2.5 mM CaCl2) and digested with 20 U MNase (TaKaRa, 20 U/μL) for 5 min (pNET_pure, pNET_crude, and pNET_NC) or 3 min (ChrNET and ChrNET_NC) at 37 °C, rotating at 1,400 rpm. Digestion was stopped by adding 40 μL of 500 mM EDTA. Pol II–DNA–RNA complexes were released by mild sonication and immunoprecipitated with a Pol II CTD Ser5P antibody (CMA603, MBL Life Science). For pNET_NC and ChrNET_NC, a no-antibody mock control was used. RNA was treated with T4 PNK on beads and recovered with TRIzol reagent (RNA size selection was skipped), followed by 3′ adapter ligation, 5′ adapter ligation, reverse transcription, and amplification using the NEXTflex Small RNA-Seq Kit v3 (Bioo Scientific). cDNA of 150 to 230 bp was recovered from a native 6% TBE polyacrylamide gel.
3′CB and CB
CB RNA was extracted from chromatin pellet using TRIzol reagent and subjected to genomic DNA removal by TURBO DNase (Life Technologies). The RNA was purified with RNA Clean & Concentrator-5 kit (Zymo Research, R1013). About 2 µg CB RNA was subjected to rRNA depletion by a riboPOOL kit (siTOOLs Biotech, PanPlant-10 nmol) and polyA RNA removal by oligo (dT) beads (NEB, S1419). The RNA was recovered by AMPure beads (Beckman, A63880) and then divided into 2 aliquots for 3′CB and CB library construction.
For 3′CB, after 3′ adapter ligation, the CB RNA was fragmented to 30 to 200 nt at 94 °C for 7 min in 1× first-strand synthesis buffer (Takara), followed by RppH and T4 PNK treatment as for GRO. After 5′ adapter ligation, the adapter-ligated RNA was hybridized with a probe mix targeting the 150 most abundant snoRNAs (122 single-stranded DNA oligos; Supplemental Data Set 1) and digested with RNase H (Lucigen, H39500). The 3′ and 5′ adapter-ligated RNA was then reverse transcribed and amplified, followed by size selection on a native 6% TBE polyacrylamide gel, and cDNA of 150 to 230 bp was recovered.
For CB libraries, the RNA was transformed into cDNA libraries using the NEBNext Ultra II Directional RNA Library Prep Kit for Illumina (NEB #E7765).
Nuclear and total RNA-seq
RNA was extracted from about 100 µL of tissue powder and from 2% of the pure nuclei using TRIzol (Invitrogen). The nuclear and total RNAs were treated with TURBO DNase to remove genomic DNA and purified with the RNA Clean & Concentrator-5 kit. For Nuc and total, about 1 µg purified RNA was subjected to rRNA depletion using riboPOOL, whereas for Nuc_pA and total_pA, polyA RNA was enriched with oligo (dT) beads (NEB, S1419).
Nuclear and total RNA (rRNA-depleted and polyA-enriched) were converted into cDNA libraries using the NEBNext Ultra II Directional RNA Library Prep Kit for Illumina (NEB #E7765). All cDNA library concentrations were determined using a Qubit QuantIT (Invitrogen) and Kapa quantitative real-time PCR. Paired-end reads were generated on the Illumina NovaSeq platform.
Immunoblot
We monitored NRPB1 before and after IP in pNET_crude and ChrNET by immunoblotting. Total, nuclear, chromatin, and immunoprecipitated proteins from pNET_crude, pNET_NC, ChrNET, and ChrNET_NC were resolved on a 6%/10% step-gradient SDS-PAGE gel. Proteins below 75 kDa (in the 10% gel) were stained with Coomassie brilliant blue as a loading control, whereas proteins above 75 kDa (in the 6% gel) were blotted and probed with anti-Ser5P CTD antibody (Abcam, ab5131) at a 1:1,000 dilution.
Data processing
For pNET_pure, pNET_crude, pNET_NC, ChrNET, ChrNET_NC, GRO_pure, GRO_crude, GRO_NC, and 3′CB, 70 nt were trimmed from the 3′ end of each read with HOMER (Heinz et al. 2010), PCR duplicates were removed with clumpify.sh from BBMap (Bushnell 2014), and adapters and low-quality sequences were trimmed with Cutadapt (Martin 2011). Before mapping to the reference genome, reads originating from rRNA were filtered out by aligning them against the Arabidopsis 45S (GenBank accession no. X52322) and 5S (GenBank accession no. AF331007) transcripts using STAR (Dobin et al. 2013) with default settings; the remaining reads were aligned to the Arabidopsis TAIR10 genome. Only uniquely mapped reads were retained. For GRO_pure, GRO_crude, and GRO_NC, the 5′ coordinate of read1 was taken as the potential position of engaged Pol II, whereas for the other methods, the 5′ end of read2 represented the last nucleotide incorporated by the polymerase and read1 indicated the directionality. Reads mapped to the plastid genome and to highly abundant ncRNAs (rRNA, tRNA, snRNA, and snoRNA), including their upstream and downstream 100-bp regions, were removed according to the Araport11 annotation (Cheng et al. 2017).
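The rule above reduces each mapped read pair to a single-nucleotide signal. The sketch below assumes alignments have already been parsed into a minimal, hypothetical record (in practice they would come from BAM files, which is not shown). The pNET-type branch follows the text directly; for GRO, the text states only that read1's 5′ coordinate marks engaged Pol II, so treating read1 as antisense to the RNA (and flipping its strand) is an explicit assumption to check against the library chemistry.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Alignment:
    """Minimal, hypothetical alignment record: 0-based half-open [start, end)."""
    chrom: str
    start: int
    end: int
    strand: str  # strand the read aligned to: "+" or "-"


def five_prime(aln: Alignment) -> int:
    """Genomic coordinate of the read's 5′-most aligned base."""
    return aln.start if aln.strand == "+" else aln.end - 1


def flip(strand: str) -> str:
    return "-" if strand == "+" else "+"


def signal_position(read1: Alignment, read2: Alignment, gro: bool = False) -> Tuple[str, int, str]:
    """(chrom, position, RNA strand) of the single-nucleotide signal for one read pair.

    pNET/ChrNET/3′CB: read2's 5′ end marks the last incorporated nucleotide and
    read1's strand gives the transcript orientation (as described in the text).
    GRO: read1's 5′ coordinate marks engaged Pol II; we ASSUME read1 is therefore
    antisense to the RNA, so the RNA strand is the opposite of read1's strand.
    """
    if gro:
        return read1.chrom, five_prime(read1), flip(read1.strand)
    return read2.chrom, five_prime(read2), read1.strand


# pNET-style pair: RNA on "+" whose 3′ end is at position 229
print(signal_position(Alignment("chr1", 100, 150, "+"), Alignment("chr1", 180, 230, "-")))
# ('chr1', 229, '+')
```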
For CB, after removing adapter sequences and low-quality data, reads originating from rRNA were filtered. Then, the sequences were aligned to the genome using STAR, and only uniquely mapped reads were kept. For Nuc, total, Nuc_pA, and total_pA, after removing adapter sequences and low-quality data, reads were aligned to the reference genome by STAR, and only uniquely mapped reads were kept for further analysis.
Correlation and gene transcription activity calculation
For Pearson correlation analysis of each region, read counts per region were calculated with the “coverageBed” function of BEDTools (Quinlan and Hall 2010) for pNET_pure, pNET_crude, pNET_NC, ChrNET, ChrNET_NC, GRO_pure, GRO_crude, GRO_NC, and 3′CB, and with featureCounts (Liao et al. 2014) for CB, Nuc, total, Nuc_pA, and total_pA. For Pearson correlation coefficients and PCA, read counts were normalized with the variance-stabilizing transformation (VST) implemented in the R package “DESeq2” (Love et al. 2014). Gene activity was normalized as TPM.
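The VST step runs in DESeq2 (R), so it is not reproduced here; the TPM normalization of gene activity can be sketched as below. This is a minimal illustration assuming raw counts and feature lengths in bp are already in hand; the function name is hypothetical.

```python
import numpy as np


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float) / 1_000
    rate = counts / lengths_kb          # reads per kilobase of feature
    return rate / rate.sum() * 1e6      # values sum to one million per sample
```

Because the length correction is applied before library scaling, TPM values sum to the same total in every sample, which makes them comparable across methods with very different sequencing depths.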
Define the threshold for gene activity
To reliably detect actively transcribed genes, we used the read density in gene desert regions as a measure of nontranscribed background noise (Core et al. 2008). Intergenic regions at least 3 kb away from annotated transcripts were considered gene deserts, and read densities were calculated in 2 kb windows within them. The 90th percentile of these gene-desert read densities was set as the threshold, and the corresponding TPM or reads per kilobase per million (RPKM)/fragments per kilobase per million (FPKM) value was calculated for each method. To be stringent, the final cutoff was set slightly higher than the calculated value.
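A minimal sketch of the gene-desert threshold, assuming window read counts and the total number of mapped reads are already known and using RPKM as the density unit. The function names are hypothetical, and the "slightly higher" manual adjustment is not specified in the text, so it is not modeled here.

```python
import numpy as np


def rpkm(count, length_bp, total_mapped):
    """Reads per kilobase of region per million mapped reads."""
    return count / (length_bp / 1e3) / (total_mapped / 1e6)


def desert_threshold(window_counts, total_mapped, window_bp=2_000, pct=90):
    """90th percentile of read density over 2 kb gene-desert windows."""
    densities = [rpkm(c, window_bp, total_mapped) for c in window_counts]
    # np.percentile interpolates linearly between ranks by default
    return float(np.percentile(densities, pct))
```

A gene would then be called active when its own RPKM (or TPM) exceeds this background value, on the reasoning that 90% of nontranscribed windows fall below it.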
Define active genes by comparing with NC
To reliably detect active genes, the gene-body read signal (read count relative to the number of clean reads) in pNET_crude, ChrNET, and GRO_crude was compared with that in pNET_NC, ChrNET_NC, and GRO_NC, respectively, using Fisher's exact test. The resulting P values were converted to false discovery rate (FDR)-corrected q values, and a gene was defined as active when q < 0.05.
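The comparison against the negative control can be sketched with SciPy's Fisher's exact test plus a Benjamini-Hochberg correction. Several details are assumptions not stated in the text: the 2×2 table layout (gene-body reads versus remaining clean reads in sample and NC), the one-sided `alternative="greater"` test, and Benjamini-Hochberg as the FDR procedure.

```python
from scipy.stats import fisher_exact


def gene_pvalue(gene_reads, clean_reads, nc_gene_reads, nc_clean_reads):
    """One-sided Fisher test: is the gene body enriched in the sample vs NC?"""
    table = [[gene_reads, clean_reads - gene_reads],
             [nc_gene_reads, nc_clean_reads - nc_gene_reads]]
    _, p = fisher_exact(table, alternative="greater")
    return p


def bh_qvalues(pvals):
    """Benjamini-Hochberg q values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    q = [0.0] * n
    running = 1.0
    for rank in range(n, 0, -1):          # walk from largest p to smallest
        i = order[rank - 1]
        running = min(running, pvals[i] * n / rank)
        q[i] = running
    return q


def call_active(records, alpha=0.05):
    """records: (gene_id, gene_reads, clean_reads, nc_gene_reads, nc_clean_reads)."""
    pvals = [gene_pvalue(*r[1:]) for r in records]
    return [r[0] for r, q in zip(records, bh_qvalues(pvals)) if q < alpha]
```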
5′ and 3′ StI calculation
5′StI was calculated as described previously (Zhu et al. 2018). First, the promoter-proximal peak, defined as the 50 bp window with the highest read density, was identified by sliding the window in 5 bp steps from −150 to +150 bp relative to the TSS. The read density of this peak was then divided by the read density of the gene body. 3′StI was calculated as the read density of the terminator region (+1 to +200 bp from the PAS) divided by the read density of the gene body. Statistical significance was assessed with Fisher's exact test, and P values were converted to FDR-corrected q values. Genes were defined as 5′ or 3′ stalled when q < 0.05 and 5′StI or 3′StI > 3, respectively.
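The 5′StI and 3′StI ratios can be sketched on a per-base coverage list. This assumes the list is oriented 5′ to 3′ along the gene, includes enough flanking sequence around the TSS and PAS, and that gene-body boundaries are supplied by the caller, since the body definition is not given in this section (`body_start` and `body_end` are hypothetical parameters).

```python
def mean_density(cov, start, end):
    """Mean reads per bp over cov[start:end] (0-based, half-open)."""
    return sum(cov[start:end]) / (end - start)


def promoter_peak_density(cov, tss, lo=-150, hi=150, win=50, step=5):
    """Highest density among 50 bp windows sliding by 5 bp within -150..+150."""
    best = 0.0
    for off in range(lo, hi - win + 1, step):   # last window ends at +150
        best = max(best, mean_density(cov, tss + off, tss + off + win))
    return best


def stalling_index_5p(cov, tss, body_start, body_end):
    body = mean_density(cov, body_start, body_end)
    return promoter_peak_density(cov, tss) / body if body > 0 else float("inf")


def stalling_index_3p(cov, pas, body_start, body_end, term_len=200):
    """Terminator (+1..+200 from PAS) density divided by gene-body density."""
    body = mean_density(cov, body_start, body_end)
    term = mean_density(cov, pas + 1, pas + 1 + term_len)
    return term / body if body > 0 else float("inf")
```

Genes with an index above 3 would still need the Fisher's exact test and q < 0.05 before being called stalled.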
Meta-profiles
The average read density across TSS/PAS regions (±1 kb) was illustrated by meta-profiles. Because TSS and PAS annotations are more accurate in AtRTD3 than in previous annotations (Zhang et al. 2022), we used the AtRTD3 annotation of protein-coding genes for meta-profile analysis. Since most genes have multiple isoforms, we first used Salmon to estimate transcript abundance from total RNA without alignment (Patro et al. 2017). Transcripts with TPM > 1 were retained, and for each gene the most highly expressed transcript was selected as the representative from which TSS and PAS positions were extracted. Regions were divided into 10 bp bins, the read density in each bin was calculated, and the average value was plotted.
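Selecting one representative isoform per gene can be sketched as follows. The reader assumes Salmon's tab-separated `quant.sf` output with `Name` and `TPM` columns; the transcript-to-gene mapping and the function names are hypothetical.

```python
import csv


def read_salmon_tpm(path):
    """Map transcript ID -> TPM from a Salmon quant.sf file."""
    with open(path, newline="") as fh:
        return {row["Name"]: float(row["TPM"])
                for row in csv.DictReader(fh, delimiter="\t")}


def representative_transcripts(tx_tpm, gene_of, min_tpm=1.0):
    """For each gene, keep the most highly expressed transcript with TPM > min_tpm."""
    best = {}
    for tx, value in tx_tpm.items():
        if value <= min_tpm:            # drop transcripts with TPM <= 1
            continue
        gene = gene_of[tx]
        if gene not in best or value > tx_tpm[best[gene]]:
            best[gene] = tx
    return best
```

Genes whose isoforms all fail the TPM filter drop out entirely, so they contribute no TSS or PAS to the meta-profile.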
To visualize the SIs, the average read density within 50 bp upstream and downstream of the splice sites was plotted, using only introns without alternative splicing in AtRTD3. For each meta-profile, the 1% most extreme coverage values at each position were trimmed before averaging; error bars indicate the SEM in each bin.
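Binning and trimmed averaging can be sketched with NumPy on a genes × positions matrix. How the 1% trim is split is not stated; this sketch assumes half is removed from each tail of every bin (column), and computes the SEM from the retained values.

```python
import numpy as np


def bin_profile(per_base, bin_bp=10):
    """Average a genes x positions matrix into genes x bins (10 bp bins)."""
    m = np.asarray(per_base, dtype=float)
    n_bins = m.shape[1] // bin_bp
    return m[:, : n_bins * bin_bp].reshape(m.shape[0], n_bins, bin_bp).mean(axis=2)


def trimmed_profile(binned, trim_frac=0.01):
    """Per-bin mean and SEM after trimming trim_frac/2 from each tail."""
    ordered = np.sort(np.asarray(binned, dtype=float), axis=0)  # sort within each bin
    n = ordered.shape[0]
    k = int(n * trim_frac / 2)                 # values dropped from each tail
    kept = ordered[k:n - k] if k else ordered
    mean = kept.mean(axis=0)
    sem = kept.std(axis=0, ddof=1) / np.sqrt(kept.shape[0])
    return mean, sem
```

Trimming per bin rather than per gene keeps a single outlier locus from distorting only the positions where it is extreme, without discarding its coverage elsewhere.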
Cotranscriptional splicing efficiency, SI index, and lariat index calculation
To quantitatively evaluate cotranscriptional splicing efficiency, we calculated the PIR value for constitutive introns as described previously (Zhu et al. 2021; Supplemental Fig. S7D). The SI index was defined as the ratio of the read count at the 5′SS to the average read count in the EI region (−50 to +50 bp from the 5′SS). Similarly, the lariat index was defined as the ratio of the read count at the 3′SS to the average read count in the IE region (−50 to +50 bp from the 3′SS) (Supplemental Fig. S7D).
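Both indices share one formula: the count at a splice site divided by the mean count in the surrounding window. A minimal sketch, assuming a per-base count list and that the −50 to +50 bp window is inclusive at both ends (101 positions); the function names are hypothetical.

```python
def site_index(cov, site, flank=50):
    """Count at `site` divided by the mean count over site-flank..site+flank."""
    if site - flank < 0 or site + flank >= len(cov):
        raise ValueError("flanking window extends past the coverage array")
    window = cov[site - flank: site + flank + 1]
    mean = sum(window) / len(window)
    return cov[site] / mean if mean > 0 else float("nan")


def si_index(cov, five_ss, flank=50):
    """SI index: 5'SS count relative to the EI region."""
    return site_index(cov, five_ss, flank)


def lariat_index(cov, three_ss, flank=50):
    """Lariat index: 3'SS count relative to the IE region."""
    return site_index(cov, three_ss, flank)
```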
5′ splice site prediction
Spikes were detected with the “findpeaks” function of the R package “pracma” (Borcher 2021) using a threshold of 5 (at least 5 reads at the peak summit). Only the canonical “GT-AG” splicing rule was considered, so a spike was retained only when the 2 nt immediately downstream of the peak summit were “GT.” To further remove false positives, we required the ratio of the spike read density to the average read density in the flanking 50 bp upstream and downstream regions to be at least 5 (Fig. 5D).
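The three filters can be sketched in Python. Note that this swaps in SciPy's `find_peaks` (with `height=5`) for pracma's `findpeaks`, and assumes plus-strand coverage and sequence strings of equal length; the function name is hypothetical.

```python
import numpy as np
from scipy.signal import find_peaks


def splice_site_spikes(cov, seq, min_height=5, flank=50, min_ratio=5.0):
    """Return summit indices passing height, 'GT' and flank-ratio filters."""
    cov = np.asarray(cov, dtype=float)
    peaks, _ = find_peaks(cov, height=min_height)   # >= 5 reads at summit
    kept = []
    for p in peaks:
        if p - flank < 0 or p + flank + 1 > len(cov):
            continue                                 # flanks fall off the array
        if seq[p + 1:p + 3] != "GT":                 # intron must start with GT
            continue
        flanks = np.concatenate([cov[p - flank:p], cov[p + 1:p + flank + 1]])
        background = flanks.mean()
        # an empty background counts as an infinitely sharp spike
        if background == 0 or cov[p] / background >= min_ratio:
            kept.append(int(p))
    return kept
```

The summit marks the last exonic nucleotide of a 5′ splicing intermediate, which is why the dinucleotide check looks at `p + 1` and `p + 2`.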
RS prediction
We developed a pipeline to investigate whether recursive splicing (RS) events occur in Arabidopsis. First, splicing events detected by CB, Nuc, total, Nuc_pA, and total_pA were retrieved from BAM files with a custom script according to defined criteria (see text and Fig. 6C). RS candidates were then selected according to the following criteria: (i) in CB, a shorter intron is contained within a longer intron sharing the same 5′SS; (ii) the sequence at the potential RS site is “AGGT,” and the distance from the RS site to the 3′SS is longer than 20 nt; and (iii) the short intron is not detected in total polyA RNA.
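Criteria (i) to (iii) can be sketched as a filter over intron tuples. This is an illustration, not the authors' custom script: introns are assumed to be `(chrom, start, end, strand)` in 0-based half-open coordinates, `genome` is a hypothetical dict of chromosome strings, and minus-strand junctions are read by reverse-complementing.

```python
from collections import defaultdict

_COMP = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(s):
    return s.translate(_COMP)[::-1]


def rs_candidates(cb_introns, polya_introns, genome, min_dist=20):
    """Short CB introns nested in a longer intron with the same 5'SS,
    ending at an AG|GT junction more than min_dist nt before the 3'SS,
    and absent from total polyA RNA."""
    polya = set(polya_introns)
    by_donor = defaultdict(list)
    for chrom, start, end, strand in cb_introns:
        donor = start if strand == "+" else end       # 5'SS coordinate
        by_donor[(chrom, donor, strand)].append((start, end))
    hits = []
    for (chrom, _, strand), spans in by_donor.items():
        for s_start, s_end in spans:                  # candidate short intron
            for l_start, l_end in spans:              # candidate long intron
                if strand == "+":
                    dist = l_end - s_end
                    junction = genome[chrom][s_end - 2:s_end + 2]
                else:
                    dist = s_start - l_start
                    junction = revcomp(genome[chrom][s_start - 2:s_start + 2])
                if dist <= min_dist or junction.upper() != "AGGT":
                    continue
                if (chrom, s_start, s_end, strand) in polya:
                    continue
                hits.append((chrom, strand, (s_start, s_end), (l_start, l_end)))
    return hits
```

Requiring `dist > 20` also guarantees that the short intron ends strictly inside the long one, so containment does not need a separate check.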
Detection of annotated ncRNA transcripts
To evaluate how well different methods detect unstable ncRNA transcripts, we merged the lncRNA genes annotated in previous studies (Kindgren et al. 2020; Ivanov et al. 2021) and in Araport11, and considered all of them annotated lncRNA genes. In addition, the activity of 325 miRNA genes annotated in Araport11 was calculated; to exclude transcription of host coding genes, intronic miRNAs and miRNAs located in 5′UTRs were not analyzed. Finally, the Pol V-transcribed loci used here were previously defined by combining Pol V ChIP-seq (with an antibody against NRPE1, the largest subunit of Pol V) and GRO-seq signals in Col-0, the Pol V mutant (nrpe1), and the Pol IV/V double mutant (nrpd1 nrpe1) (Liu et al. 2018).
Gene activity was normalized as RPKM (pNET type, GRO type, and 3′CB) or FPKM (CB, Nuc, total, Nuc_pA, and total_pA). Active transcripts were defined using the gene-desert read-density threshold.
Detection of unknown lncRNA transcripts
To compare the sensitivity of different nascent/CB RNA-seq methods in detecting novel lncRNA transcripts, we first defined transcripts with “findPeaks” in HOMER (-style groseq -tssFold 5 -bodyFold 2 -minBodySize 500) for pNET type, GRO type, and 3′CB, and with StringTie (Pertea et al. 2015) for CB, Nuc, total, Nuc_pA, and total_pA. The detected transcripts were then compared with transcripts annotated in AtRTD3 and previous studies (Kindgren et al. 2020; Ivanov et al. 2021) using the “intersectBed” function of BEDTools. Transcripts overlapping annotated transcripts extended by 200 bp upstream and 500 bp downstream were filtered out. The protein-coding potential of the remaining novel transcripts was estimated with CNCI (Sun et al. 2013). Novel lncRNAs from the different methods were merged, and their activity was calculated.
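The overlap filter can be sketched without BEDTools. Assumptions: 0-based half-open intervals, strand-aware extension of each annotated transcript (200 bp upstream, 500 bp downstream), and strand-agnostic overlap, as `intersectBed` behaves without `-s`. The naive scan is for illustration; an interval tree or sorted sweep would be needed at genome scale.

```python
def extend(start, end, strand, up=200, down=500):
    """Extend an annotated transcript upstream/downstream relative to its strand."""
    if strand == "+":
        return max(0, start - up), end + down
    return max(0, start - down), end + up


def novel_transcripts(candidates, annotated, up=200, down=500):
    """Keep candidates that overlap no extended annotated transcript."""
    extended = [(chrom, *extend(s, e, st, up, down)) for chrom, s, e, st in annotated]
    novel = []
    for chrom, s, e, st in candidates:
        overlaps = any(chrom == a_chrom and s < a_end and a_start < e
                       for a_chrom, a_start, a_end in extended)
        if not overlaps:
            novel.append((chrom, s, e, st))
    return novel
```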
Accession numbers
The original data from this study are available at NCBI under BioProject accession PRJNA843332.
Supplementary Material
Acknowledgments
The assistance provided by the editors and the 4 anonymous reviewers in improving the manuscript is greatly appreciated.
Contributor Information
Min Liu, Guangdong Provincial Key Laboratory of Plant Adaptation and Molecular Design, Guangzhou Key Laboratory of Crop Gene Editing, Innovative Center of Molecular Genetics and Evolution, School of Life Sciences, Guangzhou University, Guangzhou 510006, China.
Jiafu Zhu, Guangdong Provincial Key Laboratory of Plant Adaptation and Molecular Design, Guangzhou Key Laboratory of Crop Gene Editing, Innovative Center of Molecular Genetics and Evolution, School of Life Sciences, Guangzhou University, Guangzhou 510006, China.
Huijuan Huang, Guangdong Provincial Key Laboratory of Plant Adaptation and Molecular Design, Guangzhou Key Laboratory of Crop Gene Editing, Innovative Center of Molecular Genetics and Evolution, School of Life Sciences, Guangzhou University, Guangzhou 510006, China.
Yan Chen, Guangdong Provincial Key Laboratory of Plant Adaptation and Molecular Design, Guangzhou Key Laboratory of Crop Gene Editing, Innovative Center of Molecular Genetics and Evolution, School of Life Sciences, Guangzhou University, Guangzhou 510006, China.
Zhicheng Dong, Guangdong Provincial Key Laboratory of Plant Adaptation and Molecular Design, Guangzhou Key Laboratory of Crop Gene Editing, Innovative Center of Molecular Genetics and Evolution, School of Life Sciences, Guangzhou University, Guangzhou 510006, China.
Author contributions
Z.D., M.L., and J.Z. designed the research; J.Z., M.L., H.H., and Y.C. performed the research; M.L. analyzed the data; and M.L., Z.D., and J.Z. wrote the paper.
Supplemental data
The following materials are available in the online version of this article.
Supplemental Figure S1. Experimental pipeline of 3′ end of chromatin-bound RNA sequencing (3′CB) and comparison of read distribution on small nucleolar RNA (snoRNA) genes between 3′CB with (w/) and without (w/o) snoRNA depletion.
Supplemental Figure S2. Pipelines of library construction of different methods and the comparison of read density ratio of intron with exon among methods.
Supplemental Figure S3. Proportion of various RNA components in nascent/CB RNA libraries and enrichment efficiency of Pol II products in pNET/ChrNET.
Supplemental Figure S4. Negative controls of nascent RNA methods.
Supplemental Figure S5. Reproducibility of different methods in detecting transcriptional activity at different transcriptional stages.
Supplemental Figure S6. Two ways to define active transcripts.
Supplemental Figure S7. Measure cotranscriptional splicing by combining methods.
Supplemental Figure S8. Detect splicing events and unannotated splicing events.
Supplemental Figure S9. Detect unknown lncRNA genes.
Supplemental Figure S10. miRNA precursors detected by different methods.
Supplemental Figure S11. Detect transcripts from Pol V loci.
Supplemental Figure S12. Analysis of plaNET-seq and Neu-seq.
Supplemental Figure S13. The features of FLEP-seq (full-length elongating and polyadenylated RNA sequencing).
Supplemental Figure S14. Analysis of nascent/CB RNA methods in HeLa cells.
Supplemental Data Set 1. Probes used in snoRNA depletion for 3′CB RNA-seq.
Supplemental Data Set 2. Summary of sequencing data.
Supplemental Data Set 3. Recursive splicing candidates.
Supplemental Data Set 4. Unknown transcripts detected by nascent/CB RNA sequencing and transcriptional activity (RPKM) in different methods.
Supplemental Data Set 5. Volume and composition of buffers required for nuclei isolation and chromatin preparation.
Supplemental Data Set 6. Statistical test of the relative enrichment fold of retained_reads, rRNA, and Pt RNA in pNET_crude and ChrNET.
Funding
This work was supported by grants from the National Natural Science Foundation of China (31871289 and 32090061 to Z.D.), the National Key Research and Development Program (2021YFF1001203), the National Natural Science Foundation of China (31900463 to M.L. and 32101739 to J.Z.), the project of Guangzhou Municipal Science and Technology Bureau (202201020587 to M.L.), and the Guangdong University Innovation Team Project (2019KCXTD010).
Data availability
Scripts of data analysis can be found on GitHub (https://github.com/LIUMIN04/nascent_RNA_seq_analysis).
References
- Andersson R, Refsing Andersen P, Valen E, Core LJ, Bornholdt J, Boyd M, Heick Jensen T, Sandelin A. Nuclear stability and transcriptional directionality separate functionally distinct RNA species. Nat Commun. 2014:5(1):5336. 10.1038/ncomms6336
- Barbieri E, Hill C, Quesnel-Vallières M, Zucco AJ, Barash Y, Gardini A. Rapid and scalable profiling of nascent RNA with fastGRO. Cell Rep. 2020:33(6):108373. 10.1016/j.celrep.2020.108373
- Borcher HW. pracma: Practical Numerical Math Functions. R package version 2.3.3. [accessed 2021 Oct 15]. https://CRAN.R-project.org/package=pracma.
- Bushnell B. BBMap: a fast, accurate, splice-aware aligner. Technical report. Berkeley (CA): Lawrence Berkeley National Lab (LBNL); 2014.
- Chen Y, Zhu J, Xie Y, Li Z, Zhang Y, Liu M, Dong Z. Protocol for affordable and efficient profiling of nascent RNAs in bread wheat using GRO-seq. STAR Protoc. 2022:3(3):101657. 10.1016/j.xpro.2022.101657
- Cheng CY, Krishnakumar V, Chan AP, Thibaud-Nissen F, Schobel S, Town CD. Araport11: a complete reannotation of the Arabidopsis thaliana reference genome. Plant J. 2017:89(4):789–804. 10.1111/tpj.13415
- Chu T, Rice EJ, Booth GT, Salamanca HH, Wang Z, Core LJ, Longo SL, Corona RJ, Chin LS, Lis JT, et al. Chromatin run-on and sequencing maps the transcriptional regulatory landscape of glioblastoma multiforme. Nat Genet. 2018:50(11):1553–1564. 10.1038/s41588-018-0244-3
- Churchman LS, Weissman JS. Nascent transcript sequencing visualizes transcription at nucleotide resolution. Nature. 2011:469(7330):368–373. 10.1038/nature09652
- Core LJ, Waterfall JJ, Gilchrist DA, Fargo DC, Kwak H, Adelman K, Lis JT. Defining the status of RNA polymerase at promoters. Cell Rep. 2012:2(4):1025–1035. 10.1016/j.celrep.2012.08.034
- Core LJ, Waterfall JJ, Lis JT. Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science. 2008:322(5909):1845–1848. 10.1126/science.1162228
- Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013:29(1):15–21. 10.1093/bioinformatics/bts635
- Duff MO, Olson S, Wei X, Garrett SC, Osman A, Bolisetty M, Plocik A, Celniker SE, Graveley BR. Genome-wide identification of zero nucleotide recursive splicing in Drosophila. Nature. 2015:521(7552):376–379. 10.1038/nature14475
- Erhard KF Jr, Talbot JE, Deans NC, McClish AE, Hollick JB. Nascent transcription affected by RNA polymerase IV in Zea mays. Genetics. 2015:199(4):1107–1125. 10.1534/genetics.115.174714
- Fuchs G, Voichek Y, Benjamin S, Gilad S, Amit I, Oren M. 4sUDRB-seq: measuring genomewide transcriptional elongation rates and initiation frequencies within cells. Genome Biol. 2014:15(5):R69. 10.1186/gb-2014-15-5-r69
- Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng JX, Murre C, Singh H, Glass CK. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell. 2010:38(4):576–589. 10.1016/j.molcel.2010.05.004
- Hetzel J, Duttke SH, Benner C, Chory J. Nascent RNA sequencing reveals distinct features in plant transcription. Proc Natl Acad Sci U S A. 2016:113(43):12316–12321. 10.1073/pnas.1603217113
- Ivanov M, Sandelin A, Marquardt S. TranscriptomeReconstructoR: data-driven annotation of complex transcriptomes. BMC Bioinform. 2021:22(1):290. 10.1186/s12859-021-04208-2
- Jia J, Long Y, Zhang H, Li Z, Liu Z, Zhao Y, Lu D, Jin X, Deng X, Xia R, et al. Post-transcriptional splicing of nascent RNA contributes to widespread intron retention in plants. Nat Plants. 2020:6(7):780–788. 10.1038/s41477-020-0688-1
- Joseph B, Kondo S, Lai EC. Short cryptic exons mediate recursive splicing in Drosophila. Nat Struct Mol Biol. 2018:25(5):365–371. 10.1038/s41594-018-0052-6
- Kindgren P, Ivanov M, Marquardt S. Native elongation transcript sequencing reveals temperature dependent dynamics of nascent RNAPII transcription in Arabidopsis. Nucleic Acids Res. 2020:48(5):2332–2347. 10.1093/nar/gkz1189
- Kwak H, Fuda NJ, Core LJ, Lis JT. Precise maps of RNA polymerase reveal how promoters direct initiation and pausing. Science. 2013:339(6122):950–953. 10.1126/science.1229386
- Li S, Wang Y, Zhao Y, Zhao X, Chen X, Gong Z. Global co-transcriptional splicing in Arabidopsis and the correlation with splicing regulation in mature RNAs. Mol Plant. 2020:13(2):266–277. 10.1016/j.molp.2019.11.003
- Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014:30(7):923–930. 10.1093/bioinformatics/btt656
- Liu M, Zhu J, Dong Z. Immediate transcriptional responses of Arabidopsis leaves to heat shock. J Integr Plant Biol. 2021:63(3):468–483. 10.1111/jipb.12990
- Liu W, Duttke SH, Hetzel J, Groth M, Feng S, Gallego-Bartolome J, Zhong Z, Kuo HY, Wang Z, Zhai J, et al. RNA-directed DNA methylation involves co-transcriptional small-RNA-guided slicing of polymerase V transcripts in Arabidopsis. Nat Plants. 2018:4(3):181–188. 10.1038/s41477-017-0100-y
- Long Y, Jia J, Mo W, Jin X, Zhai J. FLEP-seq: simultaneous detection of RNA polymerase II position, splicing status, polyadenylation site and poly(A) tail length at genome-wide scale by single-molecule nascent RNA sequencing. Nat Protoc. 2021:16(9):4355–4381. 10.1038/s41596-021-00581-7
- Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014:15(12):550. 10.1186/s13059-014-0550-8
- Lozano R, Booth GT, Omar BY, Li B, Buckler ES, Lis JT, del Carpio DP, Jannink J-L. RNA polymerase mapping in plants identifies intergenic regulatory elements enriched in causal variants. G3 (Bethesda). 2021:11(11):jkab273. 10.1093/g3journal/jkab273
- Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 2011:17(1):10–12. 10.14806/ej.17.1.200
- Mayer A, di Iulio J, Maleri S, Eser U, Vierstra J, Reynolds A, Sandstrom R, Stamatoyannopoulos JA, Churchman LS. Native elongating transcript sequencing reveals human transcriptional activity at nucleotide resolution. Cell. 2015:161(3):541–554. 10.1016/j.cell.2015.03.010
- Mo W, Liu B, Zhang H, Jin X, Lu D, Yu Y, Liu Y, Jia J, Long Y, Deng X, et al. Landscape of transcription termination in Arabidopsis revealed by single-molecule nascent RNA sequencing. Genome Biol. 2021:22(1):322. 10.1186/s13059-021-02543-4
- Nojima T, Gomes T, Grosso ARF, Kimura H, Dye MJ, Dhir S, Carmo-Fonseca M, Proudfoot NJ. Mammalian NET-seq reveals genome-wide nascent transcription coupled to RNA processing. Cell. 2015:161(3):526–540. 10.1016/j.cell.2015.03.027
- Nojima T, Rebelo K, Gomes T, Grosso AR, Proudfoot NJ, Carmo-Fonseca M. RNA polymerase II phosphorylated on CTD serine 5 interacts with the spliceosome during co-transcriptional splicing. Mol Cell. 2018:72(2):369–379.e4. 10.1016/j.molcel.2018.09.004
- Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods. 2017:14(4):417–419. 10.1038/nmeth.4197
- Pertea M, Pertea GM, Antonescu CM, Chang TC, Mendell JT, Salzberg SL. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol. 2015:33(3):290–295. 10.1038/nbt.3122
- Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010:26(6):841–842. 10.1093/bioinformatics/btq033
- Schüller R, Forné I, Straub T, Schreieck A, Texier Y, Shah N, Decker T-M, Cramer P, Imhof A, Eick D. Heptad-specific phosphorylation of RNA polymerase II CTD. Mol Cell. 2016:61(2):305–314. 10.1016/j.molcel.2015.12.003
- Schwalb B, Michel M, Zacher B, Frühauf K, Demel C, Tresch A, Gagneur J, Cramer P. TT-seq maps the human transient transcriptome. Science. 2016:352(6290):1225–1228. 10.1126/science.aad9841
- Sibley CR, Emmett W, Blazquez L, Faro A, Haberman N, Briese M, Trabzuni D, Ryten M, Weale ME, Hardy J, et al. Recursive splicing in long vertebrate genes. Nature. 2015:521(7552):371–375. 10.1038/nature14466
- Suh H, Ficarro SB, Kang U-B, Chun Y, Marto JA, Buratowski S. Direct analysis of phosphorylation sites on the Rpb1 C-terminal domain of RNA polymerase II. Mol Cell. 2016:61(2):297–304. 10.1016/j.molcel.2015.12.021
- Sun L, Luo H, Bu D, Zhao G, Yu K, Zhang C, Liu Y, Chen R, Zhao Y. Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts. Nucleic Acids Res. 2013:41(17):e166. 10.1093/nar/gkt646
- Szabo EX, Reichert P, Lehniger M-K, Ohmer M, de Francisco Amorim M, Gowik U, Schmitz-Linneweber C, Laubinger S. Metabolic labeling of RNAs uncovers hidden features and dynamics of the Arabidopsis transcriptome. Plant Cell. 2020:32(4):871–887. 10.1105/tpc.19.00214
- Thomas QA, Ard R, Liu J, Li B, Wang J, Pelechano V, Marquardt S. Transcript isoform sequencing reveals widespread promoter-proximal transcriptional termination in Arabidopsis. Nat Commun. 2020:11(1):2589. 10.1038/s41467-020-16390-7
- Weber CM, Ramachandran S, Henikoff S. Nucleosomes are context-specific, H2A.Z-modulated barriers to RNA polymerase. Mol Cell. 2014:53(5):819–830. 10.1016/j.molcel.2014.02.014
- Wissink EM, Vihervaara A, Tippens ND, Lis JT. Nascent RNA analyses: tracking transcription and its regulation. Nat Rev Genet. 2019:20(12):705–723. 10.1038/s41576-019-0159-6
- Xie Y, Chen Y, Li Z, Zhu J, Liu M, Zhang Y, Dong Z. Enhancer transcription detected in the nascent transcriptomic landscape of bread wheat. Genome Biol. 2022:23(1):109. 10.1186/s13059-022-02675-1
- Yao L, Liang J, Ozer A, Leung AK, Lis JT, Yu H. A comparison of experimental assays and analytical methods for genome-wide identification of active enhancers. Nat Biotechnol. 2022:40(7):1056–1065. 10.1038/s41587-022-01211-7
- Zhang R, Kuo R, Coulter M, Calixto CPG, Entizne JC, Guo W, Marquez Y, Milne L, Riegler S, Matsui A, et al. A high-resolution single-molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis. Genome Biol. 2022:23(1):149. 10.1186/s13059-022-02711-0
- Zheng B, Wang Z, Li S, Yu B, Liu J-Y, Chen X. Intergenic transcription by RNA polymerase II coordinates Pol IV and Pol V in siRNA-directed transcriptional gene silencing in Arabidopsis. Genes Dev. 2009:23(24):2850–2860. 10.1101/gad.1868009
- Zhu D, Mao F, Tian Y, Lin X, Gu L, Gu H, Qu L-J, Wu Y, Wu Z. The features and regulation of co-transcriptional splicing in Arabidopsis. Mol Plant. 2020:13(2):278–294. 10.1016/j.molp.2019.11.004
- Zhu J, Liu M, Liu X, Dong Z. RNA polymerase II activity revealed by GRO-seq and pNET-seq in Arabidopsis. Nat Plants. 2018:4(12):1112–1123. 10.1038/s41477-018-0280-0
- Zhu J, Zhao H, Kong F, Liu B, Liu M, Dong Z. Cotranscriptional and posttranscriptional features of the transcriptome in soybean shoot apex and leaf. Front Plant Sci. 2021:12:649634. 10.3389/fpls.2021.649634