SUMMARY
The transition of RNA polymerase II (Pol II) from initiation to productive elongation is a central, regulated step in metazoan gene expression. At many genes, Pol II pauses stably in early elongation, remaining engaged with the 25-60 nucleotide-long nascent RNA for many minutes while awaiting signals for release into the gene body. However, 15-20% of genes display highly unstable promoter Pol II, suggesting that paused polymerase might dissociate from template DNA at these promoters and release a short, non-productive mRNA. Here, we report that paused Pol II can be actively destabilized by the Integrator complex. Specifically, we present evidence that Integrator utilizes its RNA endonuclease activity to cleave nascent RNA and drive termination of paused Pol II. These findings uncover a previously unappreciated mechanism of metazoan gene repression, akin to bacterial transcription attenuation, wherein promoter-proximal Pol II is prevented from entering productive elongation through factor-regulated termination.
ETOC blurb
Here, Elrod et al. demonstrate that the Integrator complex associates with paused RNA polymerase II at promoters and enhancers to terminate RNA synthesis. This attenuation mechanism potently represses expression of both stress- and growth-responsive genes in Drosophila and mammalian cells.
INTRODUCTION
Dysregulated gene activity underlies a majority of developmental defects and many diseases including cancer, immune and neurological disorders. Accordingly, the transcription of protein-coding messenger RNA (mRNA) is tightly controlled, and can be regulated at the steps of initiation, elongation or termination. During initiation, transcription factors (TFs) cooperate with coactivators such as Mediator to recruit the general transcription machinery and Pol II to a gene promoter. The polymerase then initiates RNA synthesis and moves downstream from the transcription start site (TSS) into the promoter-proximal region. However, after generating a short, 25-60 nt-long RNA, Pol II pauses in early elongation (Adelman and Lis, 2012). The DSIF and NELF complexes collaborate to stabilize Pol II in a paused conformation (Core and Adelman, 2019; Henriques et al., 2013; Vos et al., 2018). Release of paused Pol II into productive elongation requires the kinase P-TEFb, which phosphorylates DSIF, NELF and the Pol II C-terminal domain (CTD), removing NELF from the elongation complex and allowing Pol II to resume transcription into the gene body, with enhanced elongation efficiency (Peterlin and Price, 2006).
Release of paused Pol II into productive RNA synthesis is essential for formation of a mature, functional mRNA. If promoter-paused Pol II becomes permanently arrested or dissociates from the DNA through premature termination, the process of gene expression is short-circuited and the gene will not be expressed. Thus, the stability and fate of paused Pol II at a given promoter will have profound effects on gene output. Interestingly, work from a number of laboratories has highlighted that the stability of paused Pol II differs substantially among genes (Buckley et al., 2014; Chen et al., 2015; Erickson et al., 2018; Henriques et al., 2013; Krebs et al., 2017; Shao and Zeitlinger, 2017). In particular, studies of paused Pol II in Drosophila revealed a diversity of behaviors following treatment of cells with Triptolide (Trp), an inhibitor of TFIIH that prevents new transcription initiation (Henriques et al., 2018; Krebs et al., 2017; Shao and Zeitlinger, 2017; Vispé et al., 2009). At ~20% of genes, inhibition of transcription initiation with Trp dramatically reduced promoter Pol II levels within <2.5 minutes (Henriques et al., 2018). Thus, these genes consistently require new transcription initiation in order to maintain appropriate levels of promoter Pol II. As such, it has been proposed that Pol II undergoes multiple iterative cycles of initiation, early elongation and premature termination at these genes, each time releasing a short, non-functional RNA (Erickson et al., 2018; Kamieniarz-Gdula and Proudfoot, 2019; Krebs et al., 2017; Nilson et al., 2017; Steurer et al., 2018). In contrast, a majority of genes were found to harbor a more stable Pol II, with paused polymerase levels persisting after Trp treatment. After inhibiting transcription initiation, the median half-life of paused Pol II was ~10 minutes in both mouse and Drosophila systems (Chen et al., 2015; Henriques et al., 2018; Jonkers et al., 2014; Shao and Zeitlinger, 2017). 
The distinct stabilities of Pol II observed at different promoters suggest that the lifetime of paused polymerase is modulated to tune gene expression levels. However, the factors that mediate this regulation have yet to be elucidated.
Promoter-proximal termination is well-described in bacteria, where it is termed attenuation (Yanofsky, 1981). Attenuation serves to tightly repress gene activity, even under conditions where the polymerase is recruited to a promoter and initiates RNA synthesis at high levels. Mechanistically, bacterial attenuation often involves destabilization of the RNA-DNA hybrid within the polymerase through RNA structures and/or termination factors with RNA helicase activity (Gollnick and Babitzke, 2002; Henkin and Yanofsky, 2002; Yanofsky, 1981). Termination mechanisms are also recognized in Saccharomyces cerevisiae, where the Nrd1-Nab3-Sen1 (NNS) complex directs termination using coordinated RNA binding and helicase activities (Bresson and Tollervey, 2018). Intriguingly, the NNS complex, which predominantly drives termination of non-coding RNAs, has been implicated in premature termination at select mRNA loci (Merran and Corden, 2017; Porrua and Libri, 2015; Sohrabi-Jahromi et al., 2019). However, despite the regulatory potential of promoter-proximal attenuation, a similar phenomenon has not yet been described in metazoan cells. In particular, it remains unclear whether higher eukaryotes possess a termination machinery that promotes dissociation of paused early elongation complexes.
Elongating Pol II is typically extremely stable, with formation of a mature mRNA often involving transcription of many kilobases without Pol II dissociation from DNA. Termination at mRNA 3’-ends involves recognition of specific sequences by cleavage and polyadenylation (CPA) factors, and slowing of Pol II elongation. CPSF73, a component of the CPA complex, utilizes a β-lactamase/β-CASP domain (Mandel et al., 2006) to cleave pre-mRNA, producing both a substrate for polyadenylation and a free 5’ end on the nascent RNA engaged with Pol II. This 5’ end lacks the protective 7-methyl-G cap, allowing it to be targeted by the Xrn2 exonuclease, which ultimately leads to termination (Eaton et al., 2018). Hence, cleavage of the nascent RNA is coupled to termination and dissociation of Pol II from template DNA, as well as degradation of the associated RNA. Although the CPA machinery typically functions at gene 3’ ends, there are examples of premature cleavage and polyadenylation (PCPA) occurring within gene bodies, especially within intronic regions (Kamieniarz-Gdula and Proudfoot, 2019; Venters et al., 2019). However, whether this machinery is involved in RNA cleavage and termination of promoter-proximal Pol II remains unknown.
We set out to determine the causes of differential stability of paused Pol II across mRNA genes and define factors that could render promoter Pol II susceptible to premature termination and the release of short, immature RNAs (Erickson et al., 2018; Henriques et al., 2013; Krebs et al., 2017; Nilson et al., 2017; Shao and Zeitlinger, 2017; Steurer et al., 2018). We discovered that the Integrator complex is enriched at mRNA promoters with unstable Pol II pausing. The 14-subunit, metazoan-specific Integrator complex was initially reported to be exclusively required for cleavage and 3’-end formation of small nuclear RNAs (snRNAs) involved in splicing (Baillat et al., 2005). However, subsequent work has suggested a broader role, including at signal-responsive mammalian genes (Gardini et al., 2014; Lai et al., 2015; Skaar et al., 2015; Stadelmayer et al., 2014). Our work reveals that Integrator targets paused Pol II at selected protein-coding genes and enhancers to promote premature termination. Notably, the Integrator complex, like the CPA machinery, possesses an RNA endonuclease, and we find that this activity is critical for gene repression. Our findings indicate that gene control through regulated transcription attenuation is conserved in metazoan cells.
RESULTS
One potential explanation for the brief lifetime of Pol II near a subset of promoters is that paused polymerase is quickly released into productive elongation. This model would predict that such genes would generally have lower levels of Pol II near their promoters and more Pol II elongating within gene bodies. An alternative possibility is that fast Pol II turnover at these genes results from rapid transcription termination of promoter-paused Pol II. The key prediction of the latter model is that these genes would display lower levels of productively elongating Pol II within gene bodies.
To evaluate these possibilities, we compared nascent RNA profiles determined by PRO-seq, a single-nucleotide resolution method for mapping actively engaged Pol II (Kwak et al., 2013). Genes were stratified into four clusters based on their Pol II decay rate following Trp treatment (Henriques et al., 2018; Krebs et al., 2017) and PRO-seq signals near the promoter or within the gene body were analyzed. We found that genes with short-lived promoter Pol II occupancy (defined as half-life upon Trp-treatment <2.5 min) have significantly lower elongating Pol II levels than other gene classes (Figures 1A and S1A), despite modestly higher average promoter Pol II signals (although median promoter PRO-seq levels are not significantly higher, Figure S1A). These data are consistent with a model wherein Pol II is efficiently recruited to these promoters, but fails to enter productive elongation, possibly due to premature termination (Krebs et al., 2017).
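As a rough illustration of how a promoter Pol II half-life might be derived from a Trp time course, the sketch below assumes simple first-order decay of promoter signal once initiation is blocked. The function names, time points and signal values are hypothetical and do not represent the analysis pipeline used here; only the <2.5 min cutoff for the short-lived class is taken from the text.

```python
import math

def half_life_minutes(times, fractions):
    """Estimate a half-life assuming first-order decay, f(t) = exp(-k * t).

    Fits ln(f) = -k * t through the origin by least squares.
    `fractions` are promoter signals relative to the untreated level
    and must all be greater than zero.
    """
    num = sum(t * math.log(f) for t, f in zip(times, fractions))
    den = sum(t * t for t in times)
    k = -num / den
    # A non-positive rate means no measurable decay over the time course.
    return math.log(2) / k if k > 0 else math.inf

def is_short_lived(half_life, threshold=2.5):
    """Flag genes whose promoter Pol II half-life is below the cutoff (min)."""
    return half_life < threshold

# Hypothetical time points (minutes after Trp) and simulated signal fractions.
times = [2.5, 5.0, 10.0]
unstable_gene = [math.exp(-0.40 * t) for t in times]  # fast decay
stable_gene = [math.exp(-0.07 * t) for t in times]    # slow decay

print(is_short_lived(half_life_minutes(times, unstable_gene)))
print(is_short_lived(half_life_minutes(times, stable_gene)))
```

Fitting through the origin reflects the assumption that the signal at time zero equals the untreated level; real time courses may call for a background term or a more robust fit.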
To define factors that might contribute to this behavior, we computationally assessed a comprehensive repertoire of ChIP-seq data (Baumann and Gilmour, 2017; Henriques et al., 2018; Ho et al., 2014; Kaye et al., 2018; modENCODE Consortium et al., 2010; Weber et al., 2014). Specifically, we sought to identify factors enriched (or de-enriched) at gene promoters where pausing is unstable as compared to other promoters (see STAR Methods). Chromatin accessibility was comparable across Pol II decay classes (as assessed by ATAC-seq, Figures 1B and S1B), consistent with the similar promoter Pol II levels observed. However, reduced levels of tri-methylated H3 Lysine 36 (H3K36me3) were noted within genes harboring unstable promoter Pol II (Figures 1C and S1C). The H3K36me3 mark is deposited during productive elongation, and H3K36me3 levels typically correlate with transcription activity (Venkatesh and Workman, 2015; Wagner and Carpenter, 2012). Thus, the observed low H3K36me3 signal indicates weak transcription elongation at genes with unstable Pol II, consistent with PRO-seq data. Conversely, genes with stable pausing exhibited stronger transcription activity and higher levels of H3K36me3 (Figures 1A and 1B), in agreement with recent work (Tettey et al., 2019).
Genes with unstable Pol II also displayed significant enrichment in H3K4 mono-methylation (H3K4me1) and lower tri-methylation of H3K4 (H3K4me3) as compared to genes with more stable pausing (Figures 1B, S1D and S1E). This finding suggests that H3K4 methylation levels increase near promoters as Pol II stability and residence time increase, in agreement with a recent study in yeast (Soares et al., 2017). Elevated H3K4me1 levels, with deficiencies in H3K36me3, H3K4me3 and productive RNA elongation, are considered to be characteristics of enhancers (ENCODE Project Consortium, 2012; Kim and Shiekhattar, 2015; Perissi et al., 2010). Enhancers are also characterized by unstable Pol II and the production of short RNAs (Henriques et al., 2018), suggesting a connection between the chromatin signatures typical of enhancers and defective or inefficient transcription elongation.
We next analyzed ChIP-seq profiles of non-chromatin proteins. We found Integrator subunit 1 (IntS1) among the most significantly enriched factors at genes with unstable Pol II (Figures 1D, 1E and S1F). Integrator is implicated in the biogenesis of enhancer-derived RNAs (eRNAs) (Lai et al., 2015), further underscoring the similarity between this class of genes and enhancers. To confirm these results, we conducted ChIP-seq using an antibody raised against another Drosophila Integrator subunit, IntS12, and found a similar enrichment at genes with unstable Pol II (Figure S1G).
In summary, genes with unstable promoter Pol II display typical levels of Pol II recruitment and promoter DNA accessibility, but significantly diminished Pol II elongation. These genes display chromatin features reminiscent of enhancers, suggesting that a lack of stable pausing and transcription elongation has consequences for local chromatin modifications (Figures 1E and 1F). These genes also show elevated occupancy by Integrator, a factor known to mediate RNA cleavage and Pol II termination at non-coding RNA loci.
Loss of Integrator leads to upregulation of gene expression
Two Integrator subunits, IntS11 and IntS9, are paralogs of the CPA proteins CPSF73 and CPSF100, respectively. IntS11, like CPSF73, has a β-lactamase/β-CASP domain and harbors endonuclease activity. Moreover, similar to CPSF73/100, IntS11 forms a heterodimer with IntS9 and this association is essential for function (Wu et al., 2017). This similarity suggests that Integrator might mediate transcription termination at protein-coding genes using a mechanism related to that of the CPA machinery. To evaluate this possibility, IntS9 was depleted using RNA interference (RNAi) for 60 hours (Figure S2A), followed by polyA-selected RNA-seq to identify mRNA expression changes. Consistent with the reported stability of snRNAs, their steady-state levels were not perturbed during the relatively short time course of RNAi (Figure S2B), and very few differences in splicing events were observed in IntS9-depleted cells (see STAR Methods). Thus, short-term loss of Integrator has minimal effects on snRNA functionality or splicing patterns. Nonetheless, genes with any evidence of altered splicing in IntS9-depleted cells were removed from all further analyses, enabling us to focus solely on transcriptional targets of Integrator.
Our analysis revealed 723 upregulated and 163 downregulated mRNAs upon IntS9 depletion (Figure 2A), suggesting that Drosophila Integrator is predominantly a transcriptional repressor. The expression changes observed upon IntS9 RNAi were validated using RT-qPCR at selected genes (Figure S2C). Gene Ontology analysis of upregulated transcripts shows significant enrichment in signal-responsive pathways, including metabolic, receptor and oxidoreductase activities, as well as Epidermal Growth Factor (EGF)-like protein domains (Figure S2D). Consistently, work on mammalian Integrator has implicated this complex in EGF-responsive gene activity (Gardini et al., 2014).
To probe the mechanisms by which Integrator regulates gene expression, we directly monitored nascent RNA synthesis using PRO-seq in control or IntS9-depleted cells. PRO-seq is amenable to spike-in normalization, allowing us to ensure that quantitative differences between samples are accurately measured (STAR Methods). PRO-seq in control cells revealed that Pol II is effectively recruited to IntS9-repressed promoters, but the polymerase often fails to transition into productive elongation (Figures 2B and 2C). In fact, genes upregulated upon IntS9 depletion exhibited significantly higher PRO-seq signal at promoters, yet lower PRO-seq signal within gene bodies and lower mRNA expression than unaffected genes (Figure S2F). These data demonstrate that Integrator does not repress transcription initiation but instead prevents the transition of promoter-proximal Pol II into productive RNA synthesis, perhaps by mediating transcription termination. Depletion of IntS9 relieved the block to productive elongation at upregulated genes, allowing a ~3-fold increase of PRO-seq signal within gene bodies (Figures 2C and 2D).
There was strong agreement between genes upregulated in PRO-seq and RNA-seq experiments, confirming that increased mRNA levels result from increased transcription elongation at these genes (Figures 2E and S2G). In contrast, decreases in RNA-seq were not well-reflected in PRO-seq, with fold-changes between the assays correlating poorly (Figure 2E). We conclude that the dominant transcriptional effect of Drosophila Integrator at protein-coding genes is transcription repression.
The Integrator RNA endonuclease is required for transcriptional repression
To determine whether IntS11 endonuclease activity is required for gene repression, we took advantage of a previously described mutant (IntS11 E203Q; Figure 3A) that abrogates endonuclease function yet retains the integrity of the Integrator complex (Baillat et al., 2005). We treated Drosophila cells for 60 hours with either control RNAi or with RNAi targeting the IntS11 UTRs and also re-expressed either wild-type IntS11 or the E203Q mutant in cells depleted of endogenous IntS11 (Figure S3A). RNA from these cells was isolated and subjected to poly(A)-enriched RNA-seq. As with IntS9 depletion, mature snRNA levels were not perturbed by IntS11 knockdown, and the major effect was upregulation of transcription (Figures S3B and S3C). The levels of gene upregulation observed upon depletion of IntS9 or IntS11 were highly concordant (Figures 3B and S3C). However, there was less agreement, and smaller effect sizes were observed, at downregulated genes (Figures 3B and S3C).
The vast majority of gene expression changes observed in IntS11-depleted cells were restored to normal, control levels upon expression of the wild-type IntS11 (Figures 3B, 3C and S3D). In contrast, expression of the E203Q mutant not only failed to rescue the IntS11 depletion but exacerbated the knockdown phenotype, supporting a dominant-negative effect of the catalytically inactive IntS11 protein (Figures 3B, 3C and S3D). The results observed by RNA-seq (e.g. Figure 3D) were confirmed by RT-qPCR (Figure S3E). Together, these data indicate that depletion of either IntS9 or IntS11 leads to upregulation of a similar set of mRNAs and that IntS11 endonuclease activity is essential for the function of Integrator at these loci.
Integrator attenuates mRNA transcription
The critical involvement of the IntS11 endonuclease in gene repression by Integrator supports a model wherein RNA cleavage triggers premature termination. To further evaluate this model, we defined the full repertoire of transcriptional targets of Integrator, by comparing the above described PRO-seq data in gene bodies between control and IntS9-depleted samples (see STAR Methods). We found 1204 transcripts with significantly more elongating Pol II upon depletion of Integrator (Figure 4A), and 210 with reduced gene-body Pol II signal. This reveals that transcription of ~15% of active Drosophila genes is upregulated upon loss of Integrator activity.
Gene ontology analyses of the genes upregulated in PRO-seq agreed well with those from RNA-seq, highlighting metabolic, oxidoreductase and EGF pathways (Figures S4A and S2D). In contrast, enriched pathways for the downregulated genes in PRO-seq overlapped little with those enriched among RNA-seq downregulated genes (Figures S4B and S2E), in agreement with the lack of concordance between nascent and steady-state RNA levels within the downregulated gene sets (Figures 2E and S2G; only 29 genes downregulated in both PRO-seq and RNA-seq). Thus, we focused our attention on the much larger set of upregulated genes. The increase in gene body PRO-seq signal upon IntS9-depletion was substantial at upregulated genes, with a median increase of over 3.3-fold (Figure 4B). As anticipated, the majority of this increase in actively engaged Pol II is evident in PRO-seq signal near TSSs (Figure S4C). Thus, we conclude that Integrator typically acts on promoter-proximal Pol II, and that loss of Integrator allows engaged polymerase to successfully transition into productive elongation.
To distinguish between models wherein Integrator catalyzes promoter-proximal termination vs. those wherein Integrator prevents Pol II pause release, we evaluated the PRO-seq signal at genes upregulated upon depletion of IntS9. If Integrator holds Pol II near promoters, then IntS9 depletion should release this paused Pol II into gene bodies, resulting in less promoter-proximal PRO-seq signal and an increase in signal downstream. In contrast, if Integrator stimulates termination and dissociation of paused Pol II, then IntS9 depletion should increase PRO-seq signals both promoter-proximally and within genes. We observed that IntS9 depletion resulted in increased PRO-seq signal near promoters, as well as in gene bodies (Figure 4C). The increase in PRO-seq signal from IntS9-depleted cells localized precisely at the position of Pol II pausing, in the window from 25-60 nt into the gene (Figure 4D). This finding supports a model in which Integrator targets promoter-paused Pol II and prevents its transition into productive RNA synthesis, likely through premature termination.
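The two predictions contrasted above can be written as a simple decision rule. The following is only an illustrative sketch: the function name, the use of log2 fold changes (IntS9-depleted over control) and the zero threshold are assumptions made for this example and are not taken from the analysis described in STAR Methods.

```python
def consistent_model(promoter_log2fc, gene_body_log2fc, min_change=0.0):
    """Report which model a gene's PRO-seq changes are consistent with.

    Inputs are hypothetical log2 fold changes (IntS9-depleted / control)
    in the promoter-proximal window and in the gene body.
    """
    if gene_body_log2fc <= min_change:
        # Neither model applies without a gain in gene-body signal.
        return "neither"
    if promoter_log2fc > min_change:
        # More Pol II at the pause site and downstream: loss of termination.
        return "termination"
    if promoter_log2fc < -min_change:
        # Paused Pol II drains into the gene body: loss of a pause-holding factor.
        return "pause_release"
    return "ambiguous"

# Hypothetical values for two genes.
print(consistent_model(promoter_log2fc=1.2, gene_body_log2fc=1.7))
print(consistent_model(promoter_log2fc=-0.8, gene_body_log2fc=1.5))
```

In practice, such calls would rest on statistically supported changes across replicates rather than on the sign of a single fold change.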
To determine whether Integrator similarly targets paused Pol II at enhancers, we made use of a comprehensive set of Drosophila enhancer transcription start sites (eTSSs) we recently defined (Henriques et al., 2018). We note that these sites were rigorously defined both functionally, in plasmid-based enhancer reporter assays (Arnold et al., 2013; Zabidi et al., 2015) and spatially, with the TSSs of eRNAs mapped at single-nucleotide resolution (Henriques et al., 2018). This dataset allows for a high-resolution, genome-wide analysis of Integrator activity at functionally confirmed, transcriptionally active enhancer loci. We focused on 1498 intergenic eTSSs, to avoid confounding signals from enhancers within annotated genes, and defined differentially transcribed loci using PRO-seq data as we had for mRNA genes (see STAR Methods). We observed increased transcription at ~15% of enhancers in IntS9-depleted cells (N=228), a similar fraction to mRNAs (Figure S4D), and found only 38 eTSSs with downregulated transcription. Thus, at enhancers, as at protein-coding genes, Integrator plays a generally repressive role in transcription elongation, and targets only selected loci. Importantly, many eRNA loci are not affected by loss of Integrator (Figure S4E), consistent with work implicating CPA and other machineries in eRNA 3’ end formation (Austenaa et al., 2015; Ogami et al., 2017).
The parallel in the behavior of Integrator at protein-coding and non-coding loci is further emphasized by the profile of PRO-seq at upregulated eTSSs (compare Figures 4E and 4C), where loss of Integrator causes an increase of PRO-seq signal precisely in the region of Pol II pausing (compare Figures 4F and 4D). We conclude that the function of Integrator is similar at coding and non-coding RNA loci: a comparable subset of TSSs is affected by Integrator, and Integrator depletion allows higher levels of Pol II release downstream into productive elongation.
Integrator is widely associated with mRNA promoter regions
The mechanism for Integrator-mediated 3’ end formation at snRNA loci involves both selective recruitment of Integrator to snRNA promoters and recognition of a degenerate motif near snRNA 3’ ends that promotes IntS11 cleavage activity (Baillat and Wagner, 2015; Hernandez, 1985; Hernandez and Weiner, 1986). Several factors implicated in recruiting Integrator to snRNA genes are also found at protein coding loci, such as the pause-inducing factors DSIF and NELF (Stadelmayer et al., 2014; Yamamoto et al., 2014), and phosphorylation on the Pol II C-terminal domain (CTD) repeats at Serine 7 residues (Egloff et al., 2007; Kim et al., 2010). Consistent with this, Integrator has been observed to associate with some mRNA promoters in human systems (Gardini et al., 2014; Skaar et al., 2015; Stadelmayer et al., 2014). However, it has not been fully explored how well the localization of Integrator at promoters corresponds to its gene regulatory activities at a genome-wide level.
We investigated the global localization of Integrator using our ChIP-seq datasets. We found that IntS1 and IntS12 subunits showed highly correlated localization across snRNA (r=0.99) and mRNA promoters (r=0.89) (Figures S5A and S5B), with a strong enrichment near mRNA transcription start sites (Figures 5A and S5C). Consistent with the low levels of Integrator detected near transcription end sites (Figure 5A, TES), we failed to detect defects in transcription termination at gene 3’ ends in Integrator-depleted cells (Figure S5D). Notably, Integrator signal at promoters correlated only weakly with levels of paused Pol II as determined by promoter PRO-seq signal (Figure S5B, r=0.39). Whereas these findings are consistent with Pol II, DSIF and NELF representing interaction surfaces for Integrator, they also indicate that association of Integrator with mRNA promoters is not strictly tied to paused Pol II levels. Moreover, genes repressed by Integrator were significantly enriched in both IntS1 and IntS12 ChIP-seq signal as compared to genes unaffected by Integrator depletion (Figures 5B–5D, S5E and S5F). In fact, levels of Integrator observed at IntS9-repressed promoters were even higher than levels at snRNAs (Figures 5D and S5E). We noted, however, that Integrator ChIP-seq signals at genes with unchanged expression upon IntS9 RNAi were well above background levels (Figure 5D), suggesting that Integrator is also recruited to promoters where it remains inactive.
To further investigate the relationship between Integrator binding and activity, we rank ordered all active mRNA promoters by their IntS1 ChIP-seq signal, and calculated cumulative distributions of Integrator-repressed and unchanged genes across this ranking (Figure 5E). This analysis demonstrated that Integrator exhibits the full spectrum of binding levels at unchanged genes. However, IntS9-repressed genes were clearly and significantly biased towards higher IntS1 occupancy (Figure 5E, >50% of IntS9-repressed genes fall within the top 20% of IntS1 levels, whereas only 15% of unchanged genes fall in this group). Thus, like at the snRNAs, Integrator recruitment to an mRNA promoter is not sufficient to dictate function, but high-level Integrator occupancy is typically associated with activity.
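The ranking analysis described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, gene identifiers and signal values are hypothetical, and the code is not the implementation used to generate Figure 5E.

```python
def cumulative_distribution(signal_by_gene, gene_set):
    """Cumulative fraction of `gene_set` encountered while walking down
    promoters ranked from highest to lowest ChIP-seq signal."""
    members = set(gene_set)
    ranked = sorted(signal_by_gene, key=signal_by_gene.get, reverse=True)
    total = sum(g in members for g in ranked)
    seen, curve = 0, []
    for gene in ranked:
        seen += gene in members
        curve.append(seen / total if total else 0.0)
    return curve

def fraction_in_top(signal_by_gene, gene_set, top_fraction=0.2):
    """Fraction of `gene_set` falling within the top-ranked promoters."""
    ranked = sorted(signal_by_gene, key=signal_by_gene.get, reverse=True)
    top = set(ranked[: int(len(ranked) * top_fraction)])
    members = [g for g in gene_set if g in signal_by_gene]
    return sum(g in top for g in members) / len(members) if members else 0.0

# Hypothetical IntS1 signal for ten promoters (g1 highest, g10 lowest).
signal = {f"g{i}": 11 - i for i in range(1, 11)}
repressed = {"g1", "g2", "g7"}
unchanged = set(signal) - repressed

print(fraction_in_top(signal, repressed))
print(fraction_in_top(signal, unchanged))
```

Comparing the two fractions at a given cutoff, or the two cumulative curves along the full ranking, shows whether one gene class is biased toward higher occupancy.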
To determine whether increased recruitment of Integrator was also related to functional outcomes at enhancers, we identified eTSSs that exhibited significant peaks of IntS1/IntS12 signal (Figure S5G). Comparing PRO-seq at these loci in control vs. IntS9-depleted conditions demonstrated that Integrator-bound eTSSs showed increased transcription elongation upon IntS9 RNAi (Figure 5F). In contrast, no significant change in PRO-seq signal was observed at Integrator-unbound eTSSs upon depletion of IntS9 (Figure S5H). We conclude that functional mRNA and eRNA targets of Integrator display greater recruitment of this complex. Although the factors governing this elevated recruitment of Integrator at snRNA or other loci remain to be elucidated, our results underscore a common behavior for Integrator at coding and non-coding loci.
Integrator mediates promoter-proximal termination
Taken together, our results are most consistent with Integrator serving as a promoter-proximal RNA cleavage factor that induces termination at a set of protein-coding genes. To definitively test this possibility, we investigated the short, TSS-associated RNAs that would accompany Pol II termination. In particular, we used Start-seq (Henriques et al., 2018; Nechaev et al., 2010; Williams et al., 2015) to identify RNAs under 100 nt in length that were 3′ oligoadenylated, a modification that can be detected on a minor fraction of RNAs released by Pol II during termination (Figure 6A). Such oligoadenylated termination products are subject to degradation and normally very short-lived, but are stabilized in cells depleted of the RNA Exosome. Accordingly, following depletion of the Exosome subunit Rrp40, we observed significantly more oligoadenylated short RNAs from IntS9-repressed genes relative to unchanged genes (Figures 6B, S6A and S6B). The 3′ ends of these oligoadenylated RNAs are highly and specifically enriched within the region of Pol II pausing (Figure 6B).
Integrator-mediated RNA cleavage should occur on nascent RNA that has exited the polymerase. The structure of paused elongation complexes (Core and Adelman, 2019; Henriques et al., 2013; Vos et al., 2018) indicates that RNA emerges from the exit channel and is available for binding ~20 nt upstream of the 3’ end position of the nascent RNA. Accordingly, the peak of oligoadenylated RNA 3’ end locations at upregulated genes is +35 nt (Figure 6B), and the peak of paused Pol II at these genes is +55 nt. This is precisely the anticipated difference of 20 nt between the site of RNA cleavage and the RNA 3’ end in the Pol II active site (Figure S6C). From these data, we conclude that Integrator-repressed genes undergo markedly higher levels of Pol II termination as compared to non-Integrator target genes, and suggest that promoter-proximally paused Pol II is the predominant target of Integrator-mediated RNA cleavage activity. However, we note that Integrator may also act on Pol II after pause release (see Figure 4D, which shows increases in active Pol II signal extending to +75), with Integrator perhaps also targeting elongating Pol II as it encounters the first nucleosome.
We next compared the stability of promoter-associated Pol II at Integrator-repressed genes after treatment with Triptolide. Based on increased premature termination at these genes, and our identification of Integrator enrichment at genes with unstable Pol II (Figure 1D), we predicted that Integrator-repressed genes would exhibit reduced promoter Pol II stability as compared to Integrator-unaffected genes. In agreement with this, we observed that Pol II was lost quickly at a majority of IntS9-repressed genes, with half-lives <10 minutes (Figures 6C and 6D). In contrast, genes whose expression is unchanged by IntS9-depletion presented a Pol II that is stable after Trp treatment, indicative of long-lived pausing (Figure 6D). Thus, based on many independent lines of evidence we conclude that genes with unstable Pol II recruit Integrator, rendering them susceptible to promoter-proximal termination, and resulting in reduced productive RNA synthesis.
Increased transcription at Integrator target genes and enhancers is accompanied by H3K4me3
Genes upregulated by IntS9-depletion exhibited lower levels of H3K36me3 and H3K4me3 (Figures 6E, 6F and S6D) and higher levels of H3K4me1 (Figure S6E) than unchanged genes. This signature is consistent with defects in productive elongation, and in agreement with these genes harboring unstable paused Pol II (Figure 1B). To probe the relationship between low levels of canonical promoter-associated histone modifications such as H3K4me3 and Integrator-mediated transcription attenuation, we performed H3K4me3 ChIP-qPCR at a number of Integrator-repressed promoters in control and IntS9-depleted cells. For this experiment, we used a highly specific H3K4me3 antibody that does not cross-react with other H3K4 methylation states (Shah et al., 2018). We found that upregulation of transcription upon loss of Integrator was concomitant with increased H3K4me3 ChIP signal at all promoters tested (Figures 6G and S6F). Parallel ChIP experiments performed at Integrator target enhancers revealed that the increased eRNA synthesis observed was also linked with the appearance of H3K4me3 signal near these eTSSs (Figures 6G and S6G). No such elevation in H3K4me3 levels was detected at promoters or enhancers not targeted by Integrator (Figures 6G, S6F and S6G). These results demonstrate that H3K4me3 levels are responsive to local transcription activity, and can increase at both promoters and enhancers when transcription elongation and/or Pol II occupancy are elevated.
Integrator-mediated gene repression is conserved in human cells
Our data in Drosophila indicate a mechanistically conserved role for Integrator in promoter-proximal termination of mRNA and eRNA synthesis. Although our model is in agreement with data from mammalian systems as regards eRNA biogenesis (Lai et al., 2015), it differs considerably from any of the proposed roles of Integrator at mammalian protein-coding genes (Barbieri et al., 2018; Gardini et al., 2014; Lai et al., 2015; Skaar et al., 2015; Stadelmayer et al., 2014). In particular, a majority of models posit that mammalian Integrator is an activator of transcription, and none of the proposed functions involve the IntS11 endonuclease in termination. For example, based on genomic studies of Integrator localization and activity in HeLa cells, it was proposed that Integrator stabilizes paused Pol II and facilitates both processive transcription elongation and RNA processing (Stadelmayer et al., 2014). Alternatively, other work in HeLa cells has implicated Integrator as critical for the rapid, EGF-mediated induction of ~100 ‘immediate early’ genes, including JUNB and FOS (Gardini et al., 2014). However, a detailed analysis of JUNB and several other immediate early genes in Integrator-depleted HeLa cells prior to EGF stimulation indicated that the basal expression of these genes was upregulated by loss of Integrator, suggesting that Integrator inhibits expression of EGF-responsive genes under normal cellular conditions (Skaar et al., 2015). Thus, it remains an open question whether, in the absence of a stimulus, mammalian Integrator plays a repressive role similar to that uncovered for the Drosophila complex.
We analyzed previously published chromatin-associated RNA-seq from control and IntS11-depleted HeLa cells. While chromatin-associated RNA-seq lacks the spatial resolution of PRO-seq, it is a significantly better indicator of ongoing transcription than is steady-state RNA-seq. Using the same strategies employed for analysis of PRO-seq, we found a substantial number of genes upregulated in IntS11-depleted HeLa cells (N=667; Figures 7A and S7A), comparable to the number of genes downregulated under these conditions (N=616). Thus, mammalian Integrator appears capable of repressing as well as activating gene transcription.
To define the position of mammalian Pol II at higher resolution, we analyzed PRO-seq data from HeLa cells (Nilson et al., 2017) comparing the signals at genes upregulated or unchanged upon Integrator depletion (Figure 7B). As expected, genes unaffected by Integrator show a peak of PRO-seq signal near promoters with signal extending into the gene body, characteristic of pausing followed by productive elongation. In contrast, Integrator target genes exhibit clear promoter-proximal signal indicative of paused Pol II, but very little evidence of productive elongation. These results are in strong agreement with our findings in Drosophila and suggest a promoter-proximal elongation defect at Integrator-repressed genes in human cells.
The very tight distribution of Pol II near Integrator target promoters was notable in implying very efficient termination following RNA cleavage. This stands in contrast to the situation at gene 3’ ends, where Pol II often transcribes for >1 kb past the cleavage and polyadenylation site before Xrn2 mediates transcription termination (Eaton et al., 2018). Therefore, we asked whether Xrn2 is the dominant termination machinery used at Integrator-repressed genes. Chromatin RNA-seq from control and Xrn2-depleted cells (Nojima et al., 2015) shows very little difference at Integrator-repressed genes (Figure 7A). Similarly, we observed no role for CPSF73, the RNA endonuclease associated with the CPA machinery, at these promoters (Figure S7B). We conclude that transcription termination at Integrator-repressed genes is not dependent on the activity of Xrn2 or the CPA machinery. We thus propose that Integrator itself, or yet unidentified protein partners, can effectively destabilize promoter-proximal Pol II and induce termination.
The JUNB gene, which is a defined target of Integrator (Gardini et al., 2014), is strongly upregulated in HeLa cells depleted of IntS11 (Figure 7C), consistent with earlier work (Skaar et al., 2015). Moreover, many characterized immediate early genes exhibit elevated transcription under IntS11-depleted conditions and enriched Gene Ontology categories for upregulated transcripts include receptor and EGF pathways (Figure S7C). Further, there is a concordance between upregulated pathways in Drosophila and human cells (compare Figure S7C to S4A), supporting a functional conservation of Integrator activity within specific signaling networks. Critically, basal upregulation of EGF-responsive genes upon Integrator depletion would be predicted to dampen the subsequent responsiveness of these genes to signaling. We therefore suggest that the marked upregulation of EGF-responsive genes observed upon Integrator depletion may be the underlying mechanism behind the lower fold-induction of these genes upon EGF-stimulation that was reported in earlier work (Gardini et al., 2014).
To further probe the parallels between Integrator-mediated gene repression in Drosophila and human cells, we determined whether Integrator-repressed human genes also displayed chromatin features indicative of defective transcription elongation, such as reduced H3K36me3 and H3K4me3. Indeed, both of these histone modifications were significantly lower at human genes upregulated upon Integrator depletion as compared to unchanged genes (Figures 7D, 7E and S7D). In addition, these genes showed enrichment in H3K4me1, a feature of both Drosophila Integrator gene targets and enhancers (Figures S7D and S7E). Thus, the significant commonalities among Drosophila and human genes repressed by Integrator suggest a conserved mechanism across metazoan species (Figure 7F), wherein Integrator targets promoter-proximal elongation complexes at a set of genes to repress gene activity.
DISCUSSION
Collectively, our results demonstrate that the Integrator complex mediates transcription attenuation in metazoan cells. We present evidence that Integrator associates with promoter-proximally paused Pol II, cleaves nascent mRNA transcripts, and directs promoter-proximal termination (Figure 7F). This inhibitory function is broad: 15% of Drosophila genes and enhancers are impacted by Integrator, with receptor, growth and proliferative pathways particularly affected. Furthermore, the mammalian Integrator complex targets genes in similar pathways for transcriptional repression, underlining the conserved nature of this behavior.
These data resolve long-standing questions about the intrinsic stability of promoter-proximal Pol II. We demonstrate that genes that harbor highly unstable promoter Pol II are those where there is an active process of termination, catalyzed by the Integrator complex. Our data support a model wherein the paused polymerase is inherently stable in the absence of termination factors, consistent with a wealth of biochemical characterization of elongation complexes (Kireeva et al., 2000; Wilson et al., 1999). Thus, we propose that rapid turnover of promoter Pol II at specific genes results from a regulated process of Integrator-mediated RNA cleavage and active dissociation of Pol II from the DNA template.
The mechanistic activity we uncover here for Integrator at protein-coding genes and enhancers parallels that described at snRNA genes, where Integrator cleaves the nascent RNA and promotes Pol II termination (Baillat and Wagner, 2015; Cazalla et al., 2011; Hernandez, 1985; Xie et al., 2015). Therefore, our model for Integrator function is parsimonious with its previously defined biochemical activities. Moreover, consistent with IntS9 and IntS11 subunits being paralogs of CPSF100 and CPSF73, respectively, there are many similarities between premature Pol II termination caused by Integrator, and mRNA cleavage and termination by the CPA machinery. We note that RNA cleavage and termination at gene ends mediated by CPA factors and Xrn2 is coupled with polyadenylation to protect the released mRNA. Likewise, Integrator-catalyzed cleavage of snRNAs is coupled to proper 3’ end biogenesis. In contrast, termination driven by Integrator at protein-coding and enhancer loci does not appear to be dependent on Xrn2, and the RNA products are typically degraded rapidly (Ogami et al., 2017). These results indicate that the Integrator endonuclease activity can be deployed for different purposes at different loci, with the outcome governed by the locus-specific recruitment of termination, RNA processing or RNA decay machineries. Moreover, a recent study reports that Integrator limits the activation of the Metallothionein A (MtnA) gene in Drosophila cells during copper stress (Tatomer et al., 2019). We did not detect MtnA as a target of Integrator in unstressed cells, suggesting that Integrator occupancy and/or activity can be altered in response to the cellular environment.
It has been established that cleavage and termination by the CPA machinery is greatly facilitated by pausing of Pol II (Proudfoot, 2016), as is snRNA 3’ end formation by Integrator (Guiro and Murphy, 2017). Current models invoke a kinetic competition between Pol II elongation and termination, wherein slowed transcription elongation provides a greater window of opportunity for termination to occur (Fong et al., 2015; McDowell et al., 1994). Consistent with these models, we find that promoter-proximally paused Pol II is an optimal target for Integrator-mediated cleavage and termination at mRNA and eRNA loci. Our findings thus suggest a novel function for promoter-proximal pausing, wherein slowed elongation provides a regulatory opportunity that enables gene attenuation. Likewise, we suggest that pausing further downstream, as Pol II approaches the first nucleosome, could present an additional target for Integrator-mediated termination.
Integrator-repressed genes, which exhibit very low levels of productive elongation, have chromatin characteristics that are common at enhancers. In particular, these genes display low levels of active histone modifications H3K4me3 and H3K36me3, with an enrichment in H3K4me1. Like at Integrator-repressed genes, transcription at enhancers is known to be non-productive, with a highly unstable Pol II that yields only short, rapidly degraded RNAs (Henriques et al., 2018; Kim and Shiekhattar, 2015). Remarkably, depletion of Integrator and increased productive elongation at promoters and enhancers is coupled with the deposition of H3K4me3. Thus, our data support models wherein the chromatin features surrounding a TSS reflect the level and productivity of transcription at the locus, rather than specifically demarcating the coding vs. non-coding potential of the region (Andersson et al., 2015; Core et al., 2014; Henriques et al., 2018; Soares et al., 2017).
Taken together, the role we describe here for Integrator in determining the fate of promoter Pol II sheds new light on Integrator function in development and disease states. Mutations in Integrator have been associated with a myriad of diseases (Rienzo and Casamassimi, 2016), with each of the 14 Integrator subunits implicated in one or more disorders. Intriguingly, many of these disease states are not characterized by defects in splicing and are often associated with disruption in normal development (Rienzo and Casamassimi, 2016). Thus, the human genetics foretold that Integrator functions extend well beyond snRNA processing. Accordingly, we find that Integrator targets a set of stimulus- and developmentally-responsive genes to potently repress their activity. It will be interesting in future work to tease out the specific roles of the individual Integrator subunits in gene regulation, in the hopes of exploiting this knowledge for therapeutic benefit.
STAR METHODS
LEAD CONTACT AND MATERIALS AVAILABILITY
Further information and requests for resources and reagents should be directed to and will be fulfilled by Karen Adelman: karen_adelman@hms.harvard.edu. The reagents generated in this study will be readily shared via materials transfer agreement.
EXPERIMENTAL MODEL AND SUBJECT DETAILS
Drosophila cell lines
Drosophila DL1 cells were cultured at 25°C in Schneider’s Drosophila medium (Thermo Fisher Scientific 21720024), supplemented with 10% (v/v) fetal bovine serum (HyClone SH30910.03), 1% (v/v) penicillin-streptomycin (Thermo Fisher Scientific 15140122), and 1% (v/v) L-glutamine (Thermo Fisher Scientific 35050061). Drosophila S2 cells from the DGRC were grown in Shields and Sang M3 (Sigma S3652) media supplemented with bactopeptone (BD Biosciences 211677), yeast extract (Sigma Y1000) and 10% FBS (Thermo Fisher Scientific 16000).
METHODS DETAILS
Expression plasmid construction and generation of stable cell lines
To generate the selectable IntS11 expression plasmids, the Flag tag and MCS of the previously described pUB-3xFLAG vector (Chen et al., 2012) were cloned into the pMT-puro expression plasmid (a gift from David Sabatini, Addgene plasmid # 17923). Drosophila cDNA or the cDNA for the eGFP protein was then cloned into the resultant expression plasmid. The PCR primers are provided in Table S3. The IntS11 E203Q mutation (GAG to CAG) was subsequently introduced using site-directed mutagenesis. All plasmids were sequenced to confirm identity.
To generate DL1 cells stably maintaining the Flag-tagged IntS11 WT, IntS11 E203Q mutant, or eGFP control transgenes, 2 × 10⁶ cells were first plated in complete media in 6-well dishes. After 1 hour, 2 μg of pUB Flag-IntS11WT-puro, pUB Flag-IntS11E203Q-puro, or Flag-eGFP-puro were transfected using Fugene HD (Promega E2311). On the following day, 2.5 μg/mL puromycin was added to the media to select and maintain the cell population.
RNAi
Double-stranded RNAs from the DRSC (Drosophila RNAi Screening Center) were generated by in vitro transcription (MEGAscript kit, Thermo Fisher Scientific AMB13345) of PCR templates containing the T7 promoter sequence on both ends. Primer sequences are provided in Table S3. Knockdown experiments in 6-well dishes were then performed by bathing 1.5 × 10⁶ cells with 2 μg of dsRNA, followed by incubation for 60 hours under standard cell culture conditions. For RNAi + rescue experiments (Figure 3), cells were incubated for 60 hours in the presence of dsRNA, and the media was supplemented with a final concentration of 100 μM CuSO4 to induce expression of the RNAi-resistant IntS11 WT or IntS11 E203Q transgenes.
RT-qPCR
Total RNA was isolated using Trizol and cDNA was reverse transcribed using M-MLV Reverse Transcriptase (Thermo Fisher Scientific 28025) according to the manufacturer’s instructions. Random hexamers were used for cDNA synthesis and RT-qPCR was then carried out in triplicate using Bio-Rad iTaq Universal SYBR Green Supermix (Bio-Rad 1725120). All RT-qPCR primers are provided in Table S3.
Analysis of protein expression by Western blotting and immunofluorescence
For Western blotting, cells were gently washed in PBS and then resuspended in RIPA buffer (150 mM NaCl, 1% Triton X-100, 50 mM Tris pH 7.5, 0.1% SDS, 0.5% sodium-deoxycholate, and protease inhibitors [Roche 11836170001]). Lysates were passed 10 times through a 28.5 gauge needle and cleared by centrifugation at 20,000 x g for 20 min at 4°C. Lysates were then resolved on a NuPAGE 4-12% Bis-Tris gel (Thermo Fisher Scientific NP0323) and transferred to a PVDF membrane (Bio-Rad 1620177). Primary antibody incubations (IntS9 [guinea pig], IntS11 [rabbit] (Ezzeddine et al., 2011), or alpha-tubulin [rabbit, abcam ab15246]) were all done at room temperature for 2 hours with a 1:1000 dilution in 5% milk in TBS-0.1% Tween. Conjugated secondary antibodies against rabbit (GE Healthcare NA934) or guinea pig (Sigma AP108P) were incubated at room temperature for 90 minutes with a 1:10000 dilution in TBS-0.1% Tween. Membranes were processed using SuperSignal West Pico Chemiluminescent Substrate (Thermo Fisher Scientific PI34080).
Northern blotting
Total RNA was isolated using Trizol (Thermo Fisher Scientific 15596018) as per the manufacturer’s instructions. Small RNAs were separated by 8% denaturing polyacrylamide gel electrophoresis (National Diagnostics EC-833) and electroblotted/UV crosslinked to Hybond N+ membrane (GE Healthcare RPN303B). ULTRAhyb-oligo hybridization Buffer (Thermo Fisher Scientific AM8663) was used as per the manufacturer’s instructions. All oligonucleotide probe sequences are provided in Table S3. Blots were imaged with the Typhoon 9500 scanner (GE Healthcare) and quantified using ImageQuant (GE Healthcare). Representative blots from ≥3 experiments are shown.
Chromatin Immunoprecipitation (ChIP)-qPCR
A 10-cm dish of 5 × 10⁷ untreated (IntS1 or IntS12 ChIP) or Control versus IntS9-depleted (H3K4me3 ChIP) DL1 cells was harvested into a 15 mL tube and centrifuged at 1,500 x g for 2 min. Cells were then washed with 10 mL PBS and centrifuged at 1,500 x g for 2 min. The cell pellet was resuspended in 10 mL of Fixing Buffer (50 mM Hepes pH 7.5, 100 mM NaCl, 1 mM EDTA pH 8.0, 0.5 mM EGTA pH 8.0 with 1% formaldehyde) and incubated at room temperature for 30 min. 0.5 mL of 2.5 M glycine was then added (final concentration of 0.125 M) and incubated at room temperature with rotation for 5 min, centrifuged at 1,500 x g for 2 min, and washed two times with 10 mL PBS. Cells were lysed using lysis buffer (50 mM HEPES pH 7.9, 140 mM NaCl, 1 mM EDTA, 10% glycerol, 0.5% NP-40, 0.25% Triton X-100) for 10 min on ice and centrifuged at 1,500 x g for 2 min. The pellet was then washed 2x in Wash Buffer (10 mM Tris-HCl pH 8.1, 200 mM NaCl, 1 mM EDTA pH 8.0, 0.5 mM EGTA pH 8.0) and resuspended in 1 mL Shearing Buffer (0.1% SDS, 1 mM EDTA, 10 mM Tris-HCl pH 8.1). The suspension was sonicated at 4°C using a Covaris S220 machine to obtain 500 bp DNA fragments in TC12x12 tubes with AFA fiber (Settings: Time-15 min, Duty Cycle-5%, Intensity-4, Cycles per Burst-200, Power mode Frequency-Sweeping, Degassing mode-Continuous, AFA Intensifier-none, Water level-8). To the sheared chromatin, 115 μL of 10% Triton X-100 and 34 μL of 5 M NaCl were added per mL, so that the final concentration of the sample was 1% Triton X-100 and 150 mM NaCl. Sheared chromatin was pre-cleared with protein A/G beads and 10 μL was reserved as input control. For each IP sample, 100 μL of sheared chromatin was diluted to 1 mL using IP Buffer (0.1% SDS, 1 mM EDTA, 10 mM Tris-HCl pH 8.1, 1% Triton X-100, 150 mM NaCl) and incubated overnight at 4°C with serum (IgG) or IntS1, IntS12 or H3K4me3 antibodies.
The next day, lysates were immunoprecipitated with protein A/G beads for 2 h at 4°C and washed once with low salt buffer (0.1% SDS, 1% Triton X-100, 2 mM EDTA, 20 mM Hepes pH 7.9, 150 mM NaCl), twice with high salt buffer (0.1% SDS, 1% Triton X-100, 2 mM EDTA, 20 mM Hepes pH 7.9, 500 mM NaCl), once with LiCl buffer (100 mM Tris-HCl pH 7.5, 0.5 M LiCl, 1% NP-40, 1% Sodium Deoxycholate), and once with TE. Immunocomplexes were eluted and de-crosslinked at 65°C overnight with Proteinase K and RNase A. DNA was extracted by phenol-chloroform and ethanol precipitated. DNA was resuspended in 100 μL, and 2 μL was used for each qPCR reaction.
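The buffer adjustments in the ChIP protocol above follow from simple dilution arithmetic. The sketch below is only an illustrative check, not part of the published protocol; the helper name `final_concentration` is hypothetical, and it assumes the Shearing Buffer contributes no Triton X-100 or NaCl.

```python
def final_concentration(stock_conc, stock_vol, other_vols):
    """Concentration after mixing stock_vol of a stock with the other volumes.

    Applies C_final = C_stock * V_stock / V_total and assumes the other
    components contain none of the solute (an assumption of this sketch).
    """
    total_vol = stock_vol + sum(other_vols)
    return stock_conc * stock_vol / total_vol


# Volumes in microliters: 1 mL sheared chromatin + 115 uL Triton + 34 uL NaCl.
triton_pct = final_concentration(10.0, 115.0, [1000.0, 34.0])   # % (v/v)
nacl_mM = final_concentration(5000.0, 34.0, [1000.0, 115.0])    # mM

# Quench step, volumes in mL: 0.5 mL of 2.5 M glycine into 10 mL fixing buffer.
glycine_M = final_concentration(2.5, 0.5, [10.0])               # M

print(f"Triton X-100: {triton_pct:.2f}%  "
      f"NaCl: {nacl_mM:.0f} mM  glycine: {glycine_M:.3f} M")
```

With these volumes the calculation gives about 1.0% Triton X-100 and about 148 mM NaCl, matching the stated targets of 1% and 150 mM. The glycine value comes out near 0.119 M once the added volume is counted; the stated 0.125 M corresponds to the nominal figure that ignores the extra 0.5 mL.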
QUANTIFICATION AND STATISTICAL ANALYSIS
RT-qPCR quantification and analysis
For RT-qPCRs statistical significance for comparisons of means was assessed by Student’s t test. Unless otherwise indicated, the comparison was to the control RNAi treated samples. Statistical details and error bars are defined in each figure legend.
Generation of Transcript Annotations
All transcript annotations for D. melanogaster r5.57 were downloaded from flybase.org in GTF format and filtered such that only “exon” entries for the feature types considered for re-annotation remained. Annotations from chrY, chrM, and random chromosomes were also excluded. Unique “gene_id” values were assigned to each transcript, such that those grouped and represented by a single member in TSS-based analyses were identical. Precise TSS locations employed were based on high-resolution Start-seq data as described previously (Henriques et al., 2013; 2018; Nechaev et al., 2010). The start location of each transcript was adjusted to the observed TSS from Start-seq when this resulted in truncation, rather than extension of the model. If the observed TSS fell within an intron, all preceding exons were removed, and the transcript start was set to the beginning of the following downstream exon. Termination end sites for each of the transcripts used were taken from annotation and in Figure S5D only those that were distant at least 1kb from an observed TSS were analyzed. Gene annotations for the human genome (hg19, GRCh37 genome build July 2019) were downloaded from gencodegenes.org in GTF format and filtered such that only “gene” entries for the “protein_coding” feature type remained. Annotations from chrM, and random chromosomes were also excluded.
TSS clustering based on promoter Pol II half-lives upon Trp treatment
TSS clustering was accomplished as described in (Henriques et al., 2018) using k-medoids clustering based on the Clustering Large Applications (CLARA) object in R.
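The clustering itself follows Henriques et al. (2018) and is not reproduced here. As a hedged illustration of the quantity being clustered, the sketch below shows how a promoter Pol II half-life relates to a first-order decay rate estimated from a Trp time course. The log-linear least-squares fit, the function names and the example time points are assumptions made for illustration, not the published fitting procedure.

```python
import math


def decay_rate(times, signals):
    """Estimate a first-order decay rate k from signal(t) = S0 * exp(-k * t).

    Uses an ordinary least-squares line through ln(signal) versus time;
    all signals must be positive. Returns k (positive when signal decays).
    """
    logs = [math.log(s) for s in signals]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    return -sxy / sxx


def half_life(k):
    """Half-life of an exponential decay with rate k: ln(2) / k."""
    return math.log(2) / k


# Hypothetical time course (minutes after Trp) with synthetic signal values.
example_times = [0, 2.5, 5, 10, 20]
example_signal = [100 * math.exp(-0.1 * t) for t in example_times]
example_k = decay_rate(example_times, example_signal)
print(f"k = {example_k:.3f} per min, half-life = {half_life(example_k):.2f} min")
```

Under this model, a gene with a half-life below 10 minutes has a decay rate above ln(2)/10, roughly 0.069 per minute.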
Features associated with genes with short-lived promoter Pol II occupancy
A comprehensive repertoire of ChIP-seq datasets from (Baumann and Gilmour, 2017; Henriques et al., 2018; Kaye et al., 2018; Lim et al., 2013; Weber et al., 2014) and ChIP-chip from the modENCODE database (Ho et al., 2014; modENCODE Consortium et al., 2010) was used representing a total of 111 datasets that include transcription factors, chromatin remodelers and histone modifications.
To find features enriched at protein-coding transcription start sites with short-lived promoter Pol II occupancy, an approach similar to that of the web-based tool ORIO (Lavender et al., 2017) was taken. Analysis of all datasets was anchored on the TSS locations of protein-coding transcripts based on high-resolution Start-seq data (see generation of transcript annotations above). A total of 8389 protein-coding TSSs for which a decay rate could be calculated was used. A rank order was given to the TSS feature list based on the decay rate clustering. Read coverage for each dataset used was determined at each TSS using a window that originates 500 nucleotides upstream of the TSS and extends downstream by twenty 50 nt non-overlapping bins, with a total window size of 1000 nucleotides. Correlative analysis was then performed considering read coverage values. A total read coverage value was found for each genomic feature by adding the coverage from the datasets across all bins in a genomic window. Clustering methods were then applied to total read coverage values considering both the datasets and individual genomic features. To group datasets, the Pearson and Spearman correlation values for each pair of datasets were determined by comparing feature coverage values. Datasets were then grouped by hierarchical clustering.
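The windowing and pairwise correlation described above can be sketched as follows. This is a minimal illustration and not the ORIO implementation: the function names are hypothetical, bins are expressed as offsets relative to the TSS, and the Spearman coefficient is computed as the Pearson correlation of average ranks.

```python
import math


def relative_bins(upstream=500, bin_size=50, n_bins=20):
    """Half-open (start, end) offsets relative to the TSS for each bin."""
    return [(-upstream + i * bin_size, -upstream + (i + 1) * bin_size)
            for i in range(n_bins)]


def total_coverage(per_bin_coverage):
    """Total read coverage for one TSS: the sum over all bins in its window."""
    return sum(per_bin_coverage)


def pearson(x, y):
    """Pearson correlation of two equal-length lists (non-zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))
```

Here `x` and `y` would each hold the total coverage of one dataset across the same ordered list of TSSs; the resulting pairwise correlation matrix is what a hierarchical clustering step would take as input.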
ATAC-seq library generation and mapping
ATAC-seq libraries from 3 independent biological replicates were generated. 50,000 Drosophila S2 cells were incubated in CSK buffer (10 mM PIPES pH 6.8, 100 mM NaCl, 300 mM sucrose, 3 mM MgCl2, 0.1% Triton X-100) on ice for 5 min. An aliquot of 2.5 μl of Tn5 Transposase was added to a total 25 μl reaction mixture and genomic DNA was purified using a Qiagen MinElute PCR purification kit (Qiagen) following the manufacturer’s instructions. After PCR amplification, DNA fragments were purified with AMPure XP (1:3 ratio of sample to beads). Libraries were sequenced using a paired-end 150 bp cycle run on an Illumina NextSeq 500.
Paired-end reads were filtered for adapter sequence and low quality 3′ ends using cutadapt 1.14, discarding those containing reads shorter than 20 nt (-m 20 -q 10), and removing a single nucleotide from the 3′ end of all trimmed reads to allow successful alignment with bowtie 1.2.2 to the dm3 genome assembly. The parameters used in each alignment were: up to 2 mismatches, a maximum fragment length of 1000 nt, and uniquely mappable, and unmappable pairs routed to separate output files (-m1, -v2, -X1000, --un). Non-duplicate reads mapping uniquely to dm3, representative of short fragments (> 20 nt and < 150 nt), were separated, and fragment centers determined in 25 nucleotide windows resolution, genome-wide, and expressed in bedGraph format. Combined bedGraphs for all replicates were generated by summing counts per bin for all replicates.
Sample | Total reads | Uniquely mapped reads (Percentage of total) | Agreement between replicates (Spearman’s rho) |
---|---|---|---|
ATAC-seq | 42,095,224 | 62.94% | >0.97 |
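The fragment selection and binning steps of the ATAC-seq analysis can be sketched as below. This is a simplified illustration, not the scripts used in the study: it assumes 0-based half-open fragment coordinates on a single chromosome, and the function names are hypothetical.

```python
from collections import Counter


def fragment_center_bins(fragments, bin_size=25, min_len=20, max_len=150):
    """Count centers of short fragments in fixed-width genomic bins.

    fragments: iterable of (start, end) pairs, 0-based and half-open.
    Only fragments with min_len < length < max_len are kept.
    Returns a Counter keyed by the bin start coordinate.
    """
    counts = Counter()
    for start, end in fragments:
        length = end - start
        if min_len < length < max_len:
            center = start + length // 2
            counts[(center // bin_size) * bin_size] += 1
    return counts


def combine_replicates(replicate_counts):
    """Sum per-bin counts across replicates."""
    combined = Counter()
    for counts in replicate_counts:
        combined.update(counts)
    return combined
```

Each key of the combined counter corresponds to one 25-nucleotide window, so writing `(bin_start, bin_start + 25, count)` per key would give bedGraph-style rows for one chromosome.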
RNA-seq library generation and mapping
DL1 cells were treated for 60 h with a control (Beta-galactosidase) dsRNA or a dsRNA to deplete either IntS9 or IntS11 (see RNAi details above) followed by total RNA isolation with Trizol (Thermo Fisher Scientific 15596026) following manufacturer’s instructions. RNA quality was confirmed with a BioAnalyzer (Agilent). Using Oligo d(T)25 Magnetic Beads (NEB S1419S), polyA+ RNA from 2.5 μg of total RNA was then enriched and RNA-seq libraries (3 independent biological replicates per condition) prepared using the Click-seq library preparation method using a 1:35 azido-nucleotide ratio (Jaworski and Routh, 2018). Libraries were sequenced using a single-end 75 bp cycle run on an Illumina NextSeq 500.
Sequencing reads were filtered (requiring a mean quality score ≥20), trimmed to 50 nt, and then mapped to the dm3 reference genome using STAR 2.5.2b. Default parameters were used except that multimappers were reported randomly (outMultimapperOrder Random), spurious junctions were filtered (outFilterType BySJout), minimum overhang for non-annotated junctions was set to 8 nucleotides (alignSJoverhangMin 8), and non-canonical alignments were removed (outFilterIntronMotifs RemoveNoncanonicalUnannotated). The total number of RNA-seq reads aligned in the control, IntS9 or IntS11 RNAi samples is described in the table below.
Sample | Total Reads | Mappable Fragments (Percentage of total) | Agreement between replicates (Spearman’s rho) |
---|---|---|---|
Control (βgal) | 65,094,896 | 71.58% | >0.98 |
IntS9-dep. | 54,967,569 | 68.20% | >0.98 |
Control (βgal) + eGFP | 69,413,816 | 87.42% | >0.99 |
IntS11-dep. + eGFP | 49,878,632 | 84.54% | >0.99 |
IntS11-dep. + WT | 51,586,717 | 85.29% | >0.99 |
IntS11-dep. + E203Q | 57,653,391 | 86.64% | >0.99 |
MISO Analysis
Mixture of Isoform analysis (MISO) (Katz et al., 2010) was performed using the latest stable build (ver. 0.5.4) following the directions for an exon-centric analysis in the documentation section of the developer’s site (http://miso.readthedocs.io/en/fastmiso/). Differential expression was compared between the control (Beta-galactosidase) and IntS9-depleted RNA-seq BAM files for retained introns, skipped exons, alternative 5′ splice sites, alternative 3′ splice sites, and mutually exclusive exons using the Drosophila annotations mentioned above. The results were then filtered using the developer suggested default settings to contain only events with: (a) at least 10 inclusion reads, (b) at least 10 exclusion reads, such that (c) the sum of inclusion and exclusion reads is at least 30, and (d) the ΔΨ is at least 0.25 with a (e) Bayes factor of at least 20, and (a)-(e) are true in one of the samples. Using this filter, locations of alternative splicing events were compared to Flybase annotated chromosomal regions using the UCSC genome browser table browser to identify the FBgnIDs of affected genes. The number of changes in splicing events is described in the table below.
Splicing Event Type | Events compared | Events passing filter | Percent Events Passing Filter |
---|---|---|---|
Retained Intron | 24353 | 412 | 1.69% |
Alternative 5’SS | 3231 | 63 | 1.95% |
Alternative 3’SS | 1584 | 46 | 2.90% |
Skipped Exon | 1376 | 27 | 1.96% |
Mutually Exclusive Exon | 73 | 0 | 0% |
All Flybase genes that included any splicing event that passed filter in MISO were removed from the list of active genes, such that a total of 9,499 active genes were investigated for the effects of IntS9 depletion.
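The event filter described above can be written as a simple predicate. The sketch below is an interpretation rather than MISO's own filtering code: the dictionary layout and function name are hypothetical, the read-count criteria are evaluated per sample, and ΔΨ is compared by absolute value, which is an assumption.

```python
def passes_filter(event, min_inc=10, min_exc=10, min_sum=30,
                  min_delta_psi=0.25, min_bayes_factor=20):
    """Return True if a splicing event meets the thresholds given in the text.

    event: dict with
      'sample_counts': list of (inclusion_reads, exclusion_reads) per sample,
      'delta_psi': difference in percent-spliced-in between conditions,
      'bayes_factor': Bayes factor for the comparison.
    Read-count criteria must hold in at least one sample.
    """
    counts_ok = any(
        inc >= min_inc and exc >= min_exc and inc + exc >= min_sum
        for inc, exc in event["sample_counts"]
    )
    return (counts_ok
            and abs(event["delta_psi"]) >= min_delta_psi
            and event["bayes_factor"] >= min_bayes_factor)


def genes_to_exclude(events):
    """Gene identifiers with at least one event passing the filter."""
    return {event["gene_id"] for event in events if passes_filter(event)}
```

Genes returned by `genes_to_exclude` correspond to those removed from the active gene list before assessing the effects of IntS9 depletion.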
Differentially expressed genes in RNA-seq
Read counts were calculated per gene, in a strand-specific manner, based on annotations described in the modified transcript annotations section above, using featureCounts (Liao et al., 2014). Differentially expressed genes were identified using DESeq2 v1.18.1 (Anders and Huber, 2010) under R 3.3.1. For Control versus IntS9-depletion comparisons, RNA-seq size factors were determined based on DESeq2 (Control [βgal]: 1.1861939, 1.4205182, 1.2440253; IntS9-dep.: 1.0780809, 0.9979663, 0.8519904), and at an adjusted p-value threshold of <0.0001 and fold-change > 1.5, 886 genes (out of 9499) were identified as differentially expressed upon IntS9 depletion in DL1 cells. For Control versus IntS11-depletion or rescue sample comparisons, RNA-seq size factors were determined based on DESeq2 (Control [βgal]: 1.3346867, 1.8951248, 0.6622473; IntS11-dep.: 0.8673446, 0.9127478, 0.9793937; IntS11-dep. + WT rescue: 1.1305191, 1.0792675, 0.7458915; IntS11-dep. + E203Q rescue: 1.1589313, 1.1588886, 0.7106579) and fold-changes calculated. For Control versus IntS11-depletion chromatin RNA-seq, size factors were determined based on DESeq2 (Control: 1.1315534, 1.1665893; IntS11-dep.: 0.8940834, 0.8515502) and at an adjusted p-value threshold of <0.0001 and fold-change > 1.5, 1283 genes (out of 17262) were identified as differentially expressed upon IntS11 depletion in HeLa cells. UCSC Genome Browser tracks displaying mean read coverage were generated from the combined replicates per condition, normalized as in the differential expression analysis. For Control versus Xrn2-dep. or Control versus CPSF73-dep. chromatin RNA-seq metagenes, the profiles were plotted in 50 nt windows using a scale of reads per 10⁸ sequences, as described in (Nojima et al., 2015), using an in-house perl script.
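The thresholds above can be applied to a table of DESeq2 results as sketched below. This illustrates only the cutoff logic and size-factor scaling, not DESeq2 itself; the function names are hypothetical, and applying the fold-change cutoff symmetrically to up- and downregulated genes is an assumption.

```python
import math


def classify_gene(log2_fold_change, padj, fc_cutoff=1.5, padj_cutoff=1e-4):
    """Label a gene using adjusted p-value and fold-change thresholds.

    padj may be None when no adjusted p-value is available for a gene;
    such genes are treated as unchanged in this sketch.
    """
    if padj is None or padj >= padj_cutoff:
        return "unchanged"
    if log2_fold_change > math.log2(fc_cutoff):
        return "upregulated"
    if log2_fold_change < -math.log2(fc_cutoff):
        return "downregulated"
    return "unchanged"


def normalize_counts(counts, size_factors):
    """Divide each replicate's raw count by that replicate's size factor."""
    return [count / factor for count, factor in zip(counts, size_factors)]


# Size factors reported in the text for the Control (bgal) replicates.
control_size_factors = [1.1861939, 1.4205182, 1.2440253]
```

A fold-change cutoff of 1.5 corresponds to an absolute log2 fold-change of about 0.585, so a gene with a log2 fold-change of 0.3 would be labeled unchanged regardless of its adjusted p-value.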
Sequencing, mapping, and data analysis of ChIP-seq
For IntS1 and IntS12 ChIP-seq, DL1 cells were crosslinked for 30 min with 1% formaldehyde. Material was then sheared using the Covaris S220 system and immunoprecipitations for 3 independent biological replicates each (IntS1 and IntS12) were carried out with 10 μl anti-IntS1 or anti-IntS12 antibodies per 3 × 10⁷ cells. Additionally, 3 independent biological replicates of input material were carried through. Immunoprecipitated and input material was phenol-chloroform purified and ChIP-seq libraries were prepared using the NEBNext Ultra II DNA library kit (NEB) according to the manufacturer’s instructions with 35 ng of DNA of each sample. IntS1, IntS12 and input ChIP-seq libraries were then sequenced using a paired-end 75 bp cycle run on the Illumina NextSeq system with standard sequencing protocols. Raw sequences were aligned at full length against the dm3 version of the Drosophila genome using Bowtie version 1.2.2 (Langmead et al., 2009), retaining unique alignments with a maximum of 2 mismatches allowed (-m1 -v2). The yield of uniquely mappable reads for each set of biological replicates is listed below.
Sample | Total reads | Uniquely mapped reads (Percentage of total) | Agreement between replicates (Spearman’s rho) |
---|---|---|---|
Input | 76,003,314 | 62.82% | >0.97 |
IntS1 | 92,977,225 | 61.76% | >0.97 |
IntS12 | 107,536,665 | 62.66% | >0.97 |
Datasets were mapped as described above against the dm3 version of the Drosophila genome. The genomic location of mapped reads was compiled using custom scripts and visually examined using the UCSC genome browser in bedGraph format. ChIP-seq hit locations were filtered based on fragment length. The 3 biological replicates of each ChIP-seq dataset were combined and binned in 25 bp windows for visualization in bedGraph files. IntS1 and IntS12 were downsampled by a factor of 1.202985486 and 1.411913925, respectively to match the number of reads in the input dataset. To remove background signal, input signal was subtracted from IntS1 and IntS12 datasets and bedGraphs were generated with 25 bp windows for visualization.
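The depth matching and background subtraction described above can be sketched as follows. This is an illustration under stated assumptions, not the custom scripts used in the study: downsampling is modeled as dividing binned IP signal by a scaling factor, the factor is taken as the ratio of uniquely mapped reads, negative values after subtraction are kept, and the function names are hypothetical.

```python
def downsample_factor(ip_mapped_reads, input_mapped_reads):
    """Ratio of uniquely mapped reads in the IP versus the input dataset."""
    return ip_mapped_reads / input_mapped_reads


def subtract_input(ip_bins, input_bins, factor):
    """Scale binned IP signal down by factor, then subtract input signal.

    ip_bins, input_bins: dicts mapping bin start -> read count.
    Bins missing from one dataset are treated as zero.
    """
    all_bins = set(ip_bins) | set(input_bins)
    return {b: ip_bins.get(b, 0) / factor - input_bins.get(b, 0)
            for b in all_bins}


# Approximate uniquely mapped read counts from the table above
# (total reads multiplied by the rounded percentage mapped).
input_mapped = 76003314 * 0.6282
ints1_factor = downsample_factor(92977225 * 0.6176, input_mapped)
ints12_factor = downsample_factor(107536665 * 0.6266, input_mapped)
print(f"IntS1: {ints1_factor:.3f}  IntS12: {ints12_factor:.3f}")
```

Computed from the rounded percentages in the table, these ratios come out close to the reported factors of 1.202985486 and 1.411913925, which suggests the factors reflect uniquely mapped read depth relative to input.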
IntS1 and IntS12 ChIP-Seq peak calling and annotation
IntS1 and IntS12 ChIP-seq peaks were called with Homer (v4.9) using (-style factor) and input as background (-i). Filtering based on local signal was set to 3 (-L 3) and fold-change signal over input was also set to 3 (-F 3). 490 IntS1 and 553 IntS12 peaks were identified. A peak was assigned to an enhancer TSS (eTSS) if the peak center was within ±500 bp of the eTSS. A total of 691 eTSSs were found to be bound by at least one Integrator subunit.
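The ±500 bp assignment rule can be sketched as follows. This is a minimal illustration, assuming peak centers and eTSS positions are available as per-chromosome lists of coordinates and that the window boundaries are inclusive; the function name `bound_etss` and the data layout are hypothetical and do not reproduce the scripts used here.

```python
from bisect import bisect_left, bisect_right


def bound_etss(peak_centers, etss_positions, window=500):
    """Return eTSS positions lying within +/- `window` bp of any peak center.

    Both inputs are dicts mapping chromosome name -> list of positions (bp).
    """
    bound = {}
    for chrom, etss_list in etss_positions.items():
        centers = sorted(peak_centers.get(chrom, []))
        hits = []
        for etss in etss_list:
            # Count peak centers inside [etss - window, etss + window].
            lo = bisect_left(centers, etss - window)
            hi = bisect_right(centers, etss + window)
            if hi > lo:
                hits.append(etss)
        bound[chrom] = hits
    return bound


if __name__ == "__main__":
    # Toy coordinates, not data from this study.
    peaks = {"chr2L": [1000, 5000]}
    etss = {"chr2L": [600, 1500, 1501, 4400, 9000]}
    print(bound_etss(peaks, etss))
```

Running the function once per subunit and taking the union of the results would give the set of eTSSs bound by at least one Integrator subunit.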
Metagene analysis
Composite metagene distributions were generated by summing sequencing reads at each indicated position with respect to the TSS and dividing by the number of TSSs included within each group. These were plotted across a range of distances. Heatmaps were generated using Partek Genomics Suite version 6.15.0127.
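As a rough illustration of the composite calculation described above, the sketch below sums read counts at each offset from the TSS and divides by the number of TSSs in the group. The input layout (a dictionary of per-position counts keyed by chromosome and strand) and the function name `metagene` are assumptions for this example only.

```python
def metagene(read_positions, tss_list, upstream=100, downstream=100):
    """Average read count at each offset from the TSS across a group of TSSs.

    read_positions: dict mapping (chrom, strand) -> {position: read count}
    tss_list: list of (chrom, strand, tss_position) tuples
    Returns a list of length upstream + downstream + 1, indexed from -upstream.
    """
    profile = [0.0] * (upstream + downstream + 1)
    for chrom, strand, tss in tss_list:
        counts = read_positions.get((chrom, strand), {})
        for offset in range(-upstream, downstream + 1):
            # On the minus strand, downstream means decreasing coordinates.
            pos = tss + offset if strand == "+" else tss - offset
            profile[offset + upstream] += counts.get(pos, 0)
    n = len(tss_list)
    return [total / n for total in profile] if n else profile
```

Handling strand explicitly keeps upstream and downstream signal aligned when plus- and minus-strand genes are averaged together.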
Identification of Start-seq reads with non-templated 3′ end residues
Start-seq from Control or Rrp40-depleted S2 cells was published previously (Henriques et al., 2013) and is available for download from GEO (GSE49078). Data were analyzed as described previously (Henriques et al., 2013). Briefly, Start-RNA reads were trimmed to 26 nt and aligned to the D. melanogaster reference genome index with Bowtie version 1.2.2, maintaining unique alignments and allowing 2 mismatches (-m1 -v2). To account for the different depths of sequencing across the data sets, all data sets were normalized by uniquely mappable reads. To then identify Start-RNAs with non-templated 3′ end residues, reads that initially failed to align with the above Bowtie parameters were specifically trimmed at the 3′ end to remove terminal A nucleotides. Reads trimmed of at least 3 A’s with at least 18 nt remaining after trimming were aligned to the genome (note that reads with >26 nt remaining after trimming were further trimmed at the 5′ end to 26mers) and counted as uniquely-aligned Start-RNAs. The percentage and location of Start-seq reads ending in 3 or more A residues (out of total Start-seq reads mapping to that gene) was calculated for each gene in all the groups.
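The A-trimming rule for reads that failed the initial alignment can be sketched as below. This is an illustration under stated assumptions, not the analysis code: reads are taken to be uppercase strings, and "trimmed at the 5′ end to 26mers" is interpreted as removing bases from the 5′ side so that the 3′ end is preserved. The function names are hypothetical.

```python
def trim_terminal_as(read, min_as=3, min_len=18, max_len=26):
    """Strip 3'-terminal A residues from a read that failed to align.

    Returns (trimmed_sequence, number_of_As_removed), or None if fewer than
    `min_as` A's were removed or fewer than `min_len` nt remain.
    """
    stripped = read.rstrip("A")
    n_removed = len(read) - len(stripped)
    if n_removed < min_as or len(stripped) < min_len:
        return None
    if len(stripped) > max_len:
        # Assumption: "trimmed at the 5' end" is read here as removing bases
        # from the 5' side, so the 3'-most `max_len` nt are kept.
        stripped = stripped[-max_len:]
    return stripped, n_removed


def percent_a_tailed(n_a_tailed_reads, n_total_reads):
    """Percentage of a gene's Start-seq reads ending in >= 3 trimmed A's."""
    if n_total_reads == 0:
        return 0.0
    return 100.0 * n_a_tailed_reads / n_total_reads
```

Trimmed reads returned by `trim_terminal_as` would then be realigned with the same Bowtie parameters, and only unique alignments counted.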
PRO-seq library preparation and data analysis
DL1 cells treated for 60 h with a control (Beta-galactosidase) dsRNA or a dsRNA targeting IntS9 were permeabilized as described below. All steps were carried out at 4°C or on ice unless otherwise specified. Cells were washed once in ice-cold 1x PBS and resuspended in Buffer W (10 mM Tris-HCl pH 8.0, 10% glycerol, 250 mM sucrose, 10 mM KCl, 5 mM MgCl2, 0.5 mM DTT, protease inhibitors cocktail (Roche), and 4 u/mL RNase inhibitor [SUPERaseIN, Ambion]) at a cell density of 2 × 10^7 cells/mL. A 9x volume of Buffer P (10 mM Tris-HCl pH 8.0, 10% glycerol, 250 mM sucrose, 10 mM KCl, 5 mM MgCl2, 0.5 mM DTT, 0.1% Igepal, protease inhibitors cocktail (Roche), 4 u/mL RNase inhibitor [SUPERaseIN, Ambion]) was then immediately added. Cells were gently resuspended and incubated for up to 2 min on ice. Cells were then recovered by centrifugation (800 x g for 4 min) and washed in Buffer F (50 mM Tris-HCl pH 8.0, 40% glycerol, 5 mM MgCl2, 0.5 mM DTT, 4 u/mL RNase inhibitor [SUPERaseIN, Ambion]). Washed permeabilized cells were finally resuspended in Buffer F at a density of 1 × 10^6 cells/30 μL and immediately frozen in liquid nitrogen. Permeabilized cells were stored at −80°C until use.
PRO-seq run-on reactions were carried out as follows: 1 × 10^6 permeabilized cells spiked with 5 × 10^4 permeabilized mouse embryonic stem cells were added to the same volume of 2x Nuclear Run-On reaction mixture (10 mM Tris-HCl pH 8.0, 300 mM KCl, 1% Sarkosyl, 5 mM MgCl2, 1 mM DTT, 200 μM biotin-11-A/C/G/UTP (Perkin-Elmer), 0.8 u/μL SUPERaseIN inhibitor [Ambion]) and incubated for 5 min at 30°C. Nascent RNA was extracted using a Total RNA Purification Kit following the manufacturer’s instructions (Norgen Biotek Corp.). Extracted nascent RNA was fragmented by base hydrolysis in 0.25 N NaOH on ice for 10 min and neutralized by adding 1x volume of 1 M Tris-HCl pH 6.8. Fragmented nascent RNA was bound to 30 μL of Streptavidin M-280 magnetic beads (Thermo Fisher Scientific) in Binding Buffer (300 mM NaCl, 10 mM Tris-HCl pH 7.4, 0.1% Triton X-100). The beads were washed twice in High salt buffer (2 M NaCl, 50 mM Tris-HCl pH 7.4, 0.5% Triton X-100), twice in Binding buffer, and twice in Low salt buffer (5 mM Tris-HCl pH 7.4, 0.1% Triton X-100). Bound RNA was extracted from the beads using Trizol (Invitrogen) followed by ethanol precipitation.
For the first ligation reaction, fragmented nascent RNA was dissolved in H2O and incubated with 10 pmol of reverse 3′ RNA adaptor (5′p-rNrNrNrNrNrNrGrArUrCrGrUrCrGrGrArCrUrGrUrArGrArArCrUrCrUrGrArArC-/3′InvdT/) and T4 RNA ligase I (NEB) under manufacturer’s conditions for 2 h at 20°C. Ligated RNA was enriched with biotin-labeled products by another round of Streptavidin bead binding and washing (two washes each of High, Binding and Low salt buffers and one wash of 1x Thermo Pol Buffer (NEB)). To decap 5′ ends, the RNA products were treated with RNA 5′ Pyrophosphohydrolase (RppH, NEB) at 37°C for 30 min followed by one wash of High, Low and T4 PNK Buffer. To repair 5′ ends, the RNA products were treated with Polynucleotide Kinase (PNK, NEB) at 37°C for 30 min.
5′-repaired RNA was ligated to reverse 5′ RNA adaptor (5′-rCrCrUrUrGrGrCrArCrCrCrGrArGrArArUrUrCrCrA-3′) with T4 RNA ligase I (NEB) under manufacturer’s conditions for 2 h at 20°C. Adaptor-ligated nascent RNA was enriched for biotin-labeled products by another round of Streptavidin bead binding and washing (two washes each of High, Binding and Low salt buffers and one wash of 1x Superscript IV Buffer [Thermo Fisher Scientific]), and reverse transcribed using 25 pmol RT primer (5′-AATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACAGTCCGA-3′) for TRU-seq barcodes (RP1 primer, Illumina). A portion of the RT product was removed and used for trial amplifications to determine the optimal number of PCR cycles. For the final amplification, 12.5 pmol of RPI-index primers (for TRU-seq barcodes, Illumina) was added to the RT product with Phusion polymerase (NEB) under standard PCR conditions. Excess RT primer served as one primer of the pair used for the PCR. The product was amplified for 12–14 cycles and bead size-selected (ProNex Purification System, Promega) before being sequenced on NextSeq 500 machines in a mid-output 150 bp cycle run.
PRO-seq libraries from 3 independent biological replicates (DL1 control (βgal) RNAi or IntS9 RNAi) were generated. Paired-end reads were trimmed to 42 nt and for adapter sequence and low-quality 3′ ends using cutadapt 1.14, discarding reads shorter than 20 nt (-m 20 -q 10), and a single nucleotide was removed from the 3′ end of all trimmed reads to allow successful alignment with Bowtie 1.2.2. Remaining pairs were paired-end aligned to the mm10 genome index to determine spike-normalization ratios based on uniquely mapped reads. Mappable pairs were excluded from further analysis, and unmapped pairs were aligned to the dm3 genome assembly. Identical parameters were utilized in each alignment described above: up to 2 mismatches, maximum fragment length of 1000 nt, uniquely mappable pairs only, and unmappable pairs routed to separate output files (-m1, -v2, -X1000, --un). Pairs mapping uniquely to dm3, representing biotin-labeled RNA 3′ ends, were separated, and strand-specific counts of the 3′ mapping positions were determined at single-nucleotide resolution, genome-wide, and expressed in bedGraph format with “plus” and “minus” strand labels swapped for each 3′ bedGraph, to correct for the “forward/reverse” nature of Illumina paired-end sequencing (see (Mahat et al., 2016)). Counts of pairs mapping uniquely to spike-in RNAs (mouse genome) were determined for each sample. Uniquely mappable reads were determined, and a normalization factor calculated. In this case, the samples displayed highly comparable recovery of spike-in reads, thus only normalization based on the DESeq2 size factors (see below) was used for each bedGraph. Combined bedGraphs were generated by summing counts per nucleotide of all replicates for each condition.
Sample | Total reads | Uniquely mapped reads (Percentage of total) | Agreement between replicates (Spearman’s rho) |
---|---|---|---|
Control (βgal) | 60,860,471 | 48.16% | >0.98 |
IntS9-dep. | 57,112,558 | 53.29% | >0.99 |
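The per-replicate normalization and merging of 3′-end counts described in the preceding paragraph can be sketched as follows. This is a minimal illustration, assuming counts are divided by the size factor (the usual DESeq2 convention for normalized counts) before replicates are summed; the function name and the dictionary layout are hypothetical.

```python
from collections import defaultdict


def combine_replicates(replicates, size_factors):
    """Scale each replicate's per-nucleotide counts and sum them.

    replicates: list of dicts mapping (chrom, position, strand) -> raw count
    size_factors: one factor per replicate; counts are divided by the factor,
        following the DESeq2 convention for normalized counts.
    """
    if len(replicates) != len(size_factors):
        raise ValueError("need exactly one size factor per replicate")
    combined = defaultdict(float)
    for counts, factor in zip(replicates, size_factors):
        for key, value in counts.items():
            combined[key] += value / factor
    return dict(combined)
```

Each key in the returned dictionary corresponds to one line of a strand-specific, single-nucleotide bedGraph.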
Read counts were calculated per gene, in a strand-specific manner, based on annotations described in the modified transcript annotations section above, using featureCounts (Liao et al., 2014). This quantification procedure includes signal only in the gene body (+250 from TSS to annotated gene end). Differentially expressed genes were identified using DESeq2 v1.18.1 (Anders and Huber, 2010) under R 3.3.1. PRO-seq size factors were determined based on DESeq2 (for Control: 1.0029079, 1.2830936, 0.8962051; IntS9-dep.: 0.9151691, 0.9156818, 1.0672821). At an adjusted p-value threshold of <0.0001 and fold-change >1.5, 1,414 mRNA genes were identified as differentially expressed upon IntS9-depletion in DL1 cells. UCSC Genome Browser tracks displaying 3’ end position of each mapped read were generated from the combined replicates per condition, normalized as in the differential expression analysis. The PRO-seq datasets in HeLa cells were mapped as described above with the exception that a hg19 index was used as reference and UCSC Genome Browser tracks displaying 3’ end position of each mapped read were generated from the combined replicates using raw reads.
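The thresholds above can be expressed as a simple classification rule. The sketch below assumes the fold-change cutoff is applied symmetrically (more than 1.5-fold up or down) and that adjusted p-values and log2 fold changes have already been computed by DESeq2; the function names are hypothetical and the gene-body helper only illustrates the "+250 from TSS to annotated gene end" window.

```python
import math


def classify_gene(padj, log2_fold_change, padj_cutoff=1e-4, fold_cutoff=1.5):
    """Label a gene 'up', 'down' or 'unchanged' after depletion.

    Assumption: the fold-change cutoff is applied symmetrically, i.e. genes
    count as changed when |log2 fold change| > log2(fold_cutoff).
    """
    if padj is None or math.isnan(padj) or padj >= padj_cutoff:
        return "unchanged"
    threshold = math.log2(fold_cutoff)
    if log2_fold_change > threshold:
        return "up"
    if log2_fold_change < -threshold:
        return "down"
    return "unchanged"


def gene_body_window(tss, gene_end, strand, offset=250):
    """Interval from TSS + 250 to the annotated gene end, in genomic order."""
    if strand == "+":
        return (tss + offset, gene_end)
    return (gene_end, tss - offset)
```

The 1,414 differentially expressed genes reported here correspond to the 1204 upregulated and 210 downregulated PRO-seq transcripts listed in the Gene Ontology table.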
Genomic statistical tests
For RNA-seq, PRO-seq, and ChIP-seq experiments, statistical significance for pairwise comparisons was assessed by the Mann-Whitney test. Statistical details and error bars are defined in each figure legend. To test for significant overlap between IntS9-upregulated or IntS9-downregulated genes in RNA-seq and PRO-seq, a hypergeometric test was used with a background of 9499 active mRNA genes.
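For readers unfamiliar with the overlap test, a hypergeometric upper-tail p-value can be computed with the standard library alone, as sketched below. The function name is hypothetical, and the overlap of 300 genes in the usage example is a placeholder chosen purely for illustration; it is not a result from this study.

```python
from math import comb


def hypergeom_overlap_pvalue(population, set_a, set_b, overlap):
    """P(X >= overlap) when `set_b` genes are drawn without replacement from
    `population` genes, of which `set_a` belong to the first list."""
    upper = min(set_a, set_b)
    total = comb(population, set_b)
    tail = sum(
        comb(set_a, k) * comb(population - set_a, set_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # 9499 active genes; 723 (RNA-seq) and 1204 (PRO-seq) upregulated genes.
    # The overlap value below is hypothetical.
    print(hypergeom_overlap_pvalue(9499, 723, 1204, 300))
```

Exact integer arithmetic avoids the rounding problems that floating-point binomial coefficients would introduce at this population size.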
Gene Ontology Analysis
Gene Ontology analysis was performed using DAVID (v6.8) online tool with standard parameters (https://david.ncifcrf.gov/home.jsp). The number of affected genes used to identify the top Biological categories and Pathways is described in the table below.
Treatment | Assay | Number Upregulated transcripts | Number Downregulated transcripts |
---|---|---|---|
Control vs. IntS9-dep. | RNA-seq | 723 | 163 |
Control vs. IntS9-dep. | PRO-seq | 1204 | 210 |
Control vs. IntS11-dep. | Chromatin RNA-seq | 667 | 616 |
DATA AND CODE AVAILABILITY
GEO Accession numbers
All datasets generated in this study are available for download from GEO (GSE114467).
Supplementary Material
REAGENT or RESOURCE | SOURCE | IDENTIFIER |
---|---|---|
Antibodies | ||
IntS1 | Ezzeddine et al., 2011 | |
IntS9 | Ezzeddine et al., 2011 | |
IntS11 | Ezzeddine et al., 2011 | |
alpha-tubulin | abcam ab15246 | AB_301787 |
H3K4me3 | EpiCypher 13-0028 | |
Normal Rabbit serum | Invitrogen 01-610-1 | |
Chemicals, Peptides, and Recombinant Proteins | ||
AzNTPs | TriLink Technologies | Cat# K-1005 |
Biotin-11-NTPs | Perkin Elmer | Cat# NEL54(2/3/4/5)001 |
Critical Commercial Assays | ||
NEBNext Ultra II DNA library kit | NEB | Cat# E7103S |
Deposited Data | ||
Raw and analyzed data | This paper | GEO: GSE114467 |
Start-seq from Triptolide-treated S2 cells | Krebs et al., 2017 | GEO: GSE77369 |
H3K4me1, H3K4me3 and H3K36me3 ChIP-seq from S2 cells | Henriques et al., 2018 | GEO: GSE85191 |
Start-seq from Control and Rrp40-depleted S2 cells | Henriques et al., 2013 | GEO: GSE49078 |
Chromatin RNA-seq from IntS11-depleted HeLa cells | Lai et al., 2015 | GEO: GSE68401 |
H3K4me1, H3K4me3 and H3K36me3 ChIP-seq from HeLa cells | Gerstein et al., 2012 | GEO: GSE29611 |
PRO-seq from DMSO-treated HeLa cells | Nilson et al., 2017 | GEO: GSE100742 |
Chromatin RNA-seq from Xrn2 or CPSF73-depleted HeLa cells | Nojima et al., 2015 | GEO: GSE60358 |
Experimental Models: Cell Lines | ||
DL1 cells | Dr. Sara Cherry, UPenn | |
S2-DGRC clone 6 | DGRC | Stock number 6 |
DL1 FLAG-eGFP | This paper | |
DL1 FLAG-IntS11WT | This paper | |
DL1 FLAG-IntS11E203Q | This paper | |
Oligonucleotides | ||
Table S3 | This paper | |
Recombinant DNA | ||
pUB-3xFLAG vector | Chen et al., 2012 | |
pMT-puro | David Sabatini lab | Addgene #17923 |
pMT-FLAG-dIntS11-WT-puro | This paper | |
pMT-FLAG-dIntS11-E203Q-puro | This paper | |
pMT-FLAG-eGFP-puro | This paper | |
Software and Algorithms | ||
bowtie 1.2.2 | Langmead et al., 2009 | |
R v3.3.1 | www.r-project.org | |
Rstudio v1.0.136 | www.rstudio.com | |
featureCounts | Liao et al., 2014 | |
DESeq2 | Love et al., 2014 | |
MISO | Katz et al., 2010 | |
Prism v8.1.2 | GraphPad | |
Partek Genomics Suite v6.15.0127 | www.partek.com |
Highlights:
Integrator inhibits transcription elongation at ~15% of mRNA genes and enhancers
Integrator targets promoter-proximally paused Pol II for termination
The RNA endonuclease of Integrator subunit 11 is critical for gene repression
Integrator-depletion increases productive elongation and Histone H3 K4 methylation
ACKNOWLEDGEMENTS
We thank Todd Albrecht for generating Drosophila Integrator antibodies, Erik Andrulis for the Rrp40 antibody, William K. Russel and the UTMB Proteomics Core, and other members of the Adelman, Wilusz and Wagner labs for helpful discussions. J.E.W. is a Rita Allen Foundation Scholar. This work was supported by National Institutes of Health grants R35-GM119735 (to J.E.W.) and K99-GM131028 (to D.C.T.); Welch Foundation grant H-1889 (to E.J.W.); Startup Funds provided by Harvard Medical School (to K.A.); and R01 GM134539 (to K.A. and E.J.W.).
DECLARATION OF INTERESTS
The authors declare no competing interests.
REFERENCES
- Adelman K, and Lis JT (2012). Promoter-proximal pausing of RNA polymerase II: emerging roles in metazoans. Nat. Rev. Genet 13, 720–731.
- Anders S, and Huber W (2010). Differential expression analysis for sequence count data. Genome Biol. 11, R106.
- Andersson R, Sandelin A, and Danko CG (2015). A unified architecture of transcriptional regulatory elements. Trends in Genetics 31, 426–433.
- Arnold CD, Gerlach D, Stelzer C, Boryń ŁM, Rath M, and Stark A (2013). Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science 339, 1074–1077.
- Austenaa LMI, Barozzi I, Simonatto M, Masella S, Chiara, Della G, Ghisletti S, Curina A, de Wit E, Bouwman BAM, de Pretis S, et al. (2015). Transcription of Mammalian cis-Regulatory Elements Is Restrained by Actively Enforced Early Termination. Molecular Cell 60, 460–474.
- Baillat D, and Wagner EJ (2015). Integrator: surprisingly diverse functions in gene expression. Trends Biochem. Sci 40, 257–264.
- Baillat D, Hakimi M-A, Näär AM, Shilatifard A, Cooch N, and Shiekhattar R (2005). Integrator, a multiprotein mediator of small nuclear RNA processing, associates with the C-terminal repeat of RNA polymerase II. Cell 123, 265–276.
- Barbieri E, Trizzino M, Welsh SA, Owens TA, Calabretta B, Carroll M, Sarma K, and Gardini A (2018). Targeted Enhancer Activation by a Subunit of the Integrator Complex. Molecular Cell 71, 103–116.e107.
- Baumann DG, and Gilmour DS (2017). A sequence-specific core promoter-binding transcription factor recruits TRF2 to coordinately transcribe ribosomal protein genes. Nucleic Acids Res. 45, 10481–10491.
- Bresson S, and Tollervey D (2018). Surveillance-ready transcription: nuclear RNA decay as a default fate. Open Biol 8, 170270.
- Buckley MS, Kwak H, Zipfel WR, and Lis JT (2014). Kinetics of promoter Pol II on Hsp70 reveal stable pausing and key insights into its regulation. Genes Dev. 28, 14–19.
- Cazalla D, Xie M, and Steitz JA (2011). A primate herpesvirus uses the integrator complex to generate viral microRNAs. Molecular Cell 43, 982–992.
- Chen F, Gao X, and Shilatifard A (2015). Stably paused genes revealed through inhibition of transcription initiation by the TFIIH inhibitor triptolide. Genes Dev. 29, 39–47.
- Chen J, Ezzeddine N, Waltenspiel B, Albrecht TR, Warren WD, Marzluff WF, and Wagner EJ (2012). An RNAi screen identifies additional members of the Drosophila Integrator complex and a requirement for cyclin C/Cdk8 in snRNA 3′-end formation. Rna 18, 2148–2156.
- Core L, and Adelman K (2019). Promoter-proximal pausing of RNA polymerase II: a nexus of gene regulation. Genes Dev.
- Core LJ, Martins AL, Danko CG, Waters CT, Siepel A, and Lis JT (2014). Analysis of nascent RNA identifies a unified architecture of initiation regions at mammalian promoters and enhancers. Nat. Genet 46, 1311–1320.
- Eaton JD, Davidson L, Bauer DLV, Natsume T, Kanemaki MT, and West S (2018). Xrn2 accelerates termination by RNA polymerase II, which is underpinned by CPSF73 activity. Genes Dev. 32, 127–139.
- Egloff S, O’Reilly D, Chapman RD, Taylor A, Tanzhaus K, Pitts L, Eick D, and Murphy S (2007). Serine-7 of the RNA polymerase II CTD is specifically required for snRNA gene expression. Science 318, 1777–1779.
- ENCODE Project Consortium (2012). An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74.
- Erickson B, Sheridan RM, Cortazar M, and Bentley DL (2018). Dynamic turnover of paused Pol II complexes at human promoters. Genes Dev. 32, 1215–1225.
- Ezzeddine N, Chen J, Waltenspiel B, Burch B, Albrecht T, Zhuo M, Warren WD, Marzluff WF, and Wagner EJ (2011). A subset of Drosophila integrator proteins is essential for efficient U7 snRNA and spliceosomal snRNA 3′-end formation. Mol. Cell. Biol 31, 328–341.
- Fong N, Brannan K, Erickson B, Kim H, Cortazar MA, Sheridan RM, Nguyen T, Karp S, and Bentley DL (2015). Effects of Transcription Elongation Rate and Xrn2 Exonuclease Activity on RNA Polymerase II Termination Suggest Widespread Kinetic Competition. Molecular Cell 60, 256–267.
- Gardini A, Baillat D, Cesaroni M, Hu D, Marinis JM, Wagner EJ, Lazar MA, Shilatifard A, and Shiekhattar R (2014). Integrator regulates transcriptional initiation and pause release following activation. Molecular Cell 56, 128–139.
- Gerstein MB, Kundaje A, Hariharan M, Landt SG, Yan K-K, Cheng C, Mu X,J, Khurana E, Rozowsky J, Alexander R, et al. (2012). Architecture of the human regulatory network derived from ENCODE data. Nature 489, 91–100.
- Gollnick P, and Babitzke P (2002). Transcription attenuation. Biochim. Biophys. Acta 1577, 240–250.
- Guiro J, and Murphy S (2017). Regulation of expression of human RNA polymerase II-transcribed snRNA genes. Open Biol 7, 170073.
- Henkin TM, and Yanofsky C (2002). Regulation by transcription attenuation in bacteria: how RNA provides instructions for transcription termination/antitermination decisions. Bioessays 24, 700–707.
- Henriques T, Gilchrist DA, Nechaev S, Bern M, Muse GW, Burkholder A, Fargo DC, and Adelman K (2013). Stable Pausing by RNA Polymerase II Provides an Opportunity to Target and Integrate Regulatory Signals. Molecular Cell 52, 517–528.
- Henriques T, Scruggs BS, Inouye MO, Muse GW, Williams LH, Burkholder AB, Lavender CA, Fargo DC, and Adelman K (2018). Widespread transcriptional pausing and elongation control at enhancers. Genes Dev. 1–17.
- Hernandez N (1985). Formation of the 3′ end of U1 snRNA is directed by a conserved sequence located downstream of the coding region. The EMBO Journal 4, 1827–1837.
- Hernandez N, and Weiner AM (1986). Formation of the 3′ end of U1 snRNA requires compatible snRNA promoter elements. Cell 47, 249–258.
- Ho JWK, Jung YL, Liu T, Alver BH, Lee S, Ikegami K, Sohn K-A, Minoda A, Tolstorukov MY, Appert A, et al. (2014). Comparative analysis of metazoan chromatin organization. Nature 512, 449–452.
- Jaworski E, and Routh A (2018). ClickSeq: Replacing Fragmentation and Enzymatic Ligation with Click-Chemistry to Prevent Sequence Chimeras. Methods Mol. Biol 1712, 71–85.
- Jonkers I, Kwak H, and Lis JT (2014). Genome-wide dynamics of Pol II elongation and its interplay with promoter proximal pausing, chromatin, and exons. Elife 3, e02407.
- Kamieniarz-Gdula K, and Proudfoot NJ (2019). Transcriptional Control by Premature Termination: A Forgotten Mechanism. Trends in Genetics.
- Katz Y, Wang ET, Airoldi EM, and Burge CB (2010). Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat. Methods 7, 1009–1015.
- Kaye EG, Booker M, Kurland JV, Conicella AE, Fawzi NL, Bulyk ML, Tolstorukov MY, and Larschan E (2018). Differential Occupancy of Two GA-Binding Proteins Promotes Targeting of the Drosophila Dosage Compensation Complex to the Male X Chromosome. Cell Rep 22, 3227–3239.
- Kim H, Erickson B, Luo W, Seward D, Graber JH, Pollock DD, Megee PC, and Bentley DL (2010). Gene-specific RNA polymerase II phosphorylation and the CTD code. Nat. Struct. Mol. Biol 17, 1279–1286.
- Kim T-K, and Shiekhattar R (2015). Architectural and Functional Commonalities between Enhancers and Promoters. Cell 162, 948–959.
- Kireeva ML, Komissarova N, Waugh DS, and Kashlev M (2000). The 8-nucleotide-long RNA:DNA hybrid is a primary stability determinant of the RNA polymerase II elongation complex. J. Biol. Chem 275, 6530–6536.
- Krebs AR, Imanci D, Hoerner L, Gaidatzis D, Burger L, and Schübeler D (2017). Genome-wide Single-Molecule Footprinting Reveals High RNA Polymerase II Turnover at Paused Promoters. Molecular Cell 67, 411–422.e414.
- Kwak H, Fuda NJ, Core LJ, and Lis JT (2013). Precise maps of RNA polymerase reveal how promoters direct initiation and pausing. Science 339, 950–953.
- Lai F, Gardini A, Zhang A, and Shiekhattar R (2015). Integrator mediates the biogenesis of enhancer RNAs. Nature 525, 399–403.
- Langmead B, Trapnell C, Pop M, and Salzberg SL (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25.
- Lavender CA, Shapiro AJ, Burkholder AB, Bennett BD, Adelman K, and Fargo DC (2017). ORIO (Online Resource for Integrative Omics): a web-based platform for rapid integration of next generation sequencing data. Nucleic Acids Res. 45, 5678–5690.
- Liao Y, Smyth GK, and Shi W (2014). featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930.
- Lim SJ, Boyle PJ, Chinen M, Dale RK, and Lei EP (2013). Genome-wide localization of exosome components to active promoters and chromatin insulators in Drosophila. Nucleic Acids Res. 41, 2963–2980.
- Mahat DB, Kwak H, Booth GT, Jonkers IH, Danko CG, Patel RK, Waters CT, Munson K, Core LJ, and Lis JT (2016). Base-pair-resolution genome-wide mapping of active RNA polymerases using precision nuclear run-on (PRO-seq). Nat Protoc 11, 1455–1476.
- Mandel CR, Kaneko S, Zhang H, Gebauer D, Vethantham V, Manley JL, and Tong L (2006). Polyadenylation factor CPSF-73 is the pre-mRNA 3′-end-processing endonuclease. Nature 444, 953–956.
- McDowell JC, Roberts JW, Jin DJ, and Gross C (1994). Determination of intrinsic transcription termination efficiency by RNA polymerase elongation rate. Science 266, 822–825.
- Merran J, and Corden JL (2017). Yeast RNA-Binding Protein Nab3 Regulates Genes Involved in Nitrogen Metabolism. Mol. Cell. Biol 37, 5320.
- modENCODE Consortium, Roy S, Ernst J, Kharchenko PV, Kheradpour P, Negre N, Eaton ML, Landolin JM, Bristow CA, Ma L, et al. (2010). Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science 330, 1787–1797.
- Nechaev S, Fargo DC, Santos, dos G, Liu L, Gao Y, and Adelman K (2010). Global analysis of short RNAs reveals widespread promoter-proximal stalling and arrest of Pol II in Drosophila. Science 327, 335–338.
- Nilson KA, Lawson CK, Mullen NJ, Ball CB, Spector BM, Meier JL, and Price DH (2017). Oxidative stress rapidly stabilizes promoter-proximal paused Pol II across the human genome. Nucleic Acids Res. 45, 11088–11105.
- Nojima T, Gomes T, Grosso ARF, Kimura H, Dye MJ, Dhir S, Carmo-Fonseca M, and Proudfoot NJ (2015). Mammalian NET-Seq Reveals Genome-wide Nascent Transcription Coupled to RNA Processing. Cell 161, 526–540.
- Ogami K, Richard P, Chen Y, Hoque M, Li W, Moresco JJ, Yates JR, Tian B, and Manley JL (2017). An Mtr4/ZFC3H1 complex facilitates turnover of unstable nuclear RNAs to prevent their cytoplasmic transport and global translational repression. Genes Dev. 31, 1257–1271.
- Perissi V, Jepsen K, Glass CK, and Rosenfeld MG (2010). Deconstructing repression: evolving models of co-repressor action. Nat. Rev. Genet 11, 109–123.
- Peterlin BM, and Price DH (2006). Controlling the elongation phase of transcription with P-TEFb. Molecular Cell 23, 297–305.
- Porrua O, and Libri D (2015). Transcription termination and the control of the transcriptome: why, where and how to stop. Nature Publishing Group 16, 190–202.
- Proudfoot NJ (2016). Transcriptional termination in mammals: Stopping the RNA polymerase II juggernaut. Science 352, aad9926–aad9926.
- Rienzo M, and Casamassimi A (2016). Integrator complex and transcription regulation: Recent findings and pathophysiology. Biochim. Biophys. Acta 1859, 1269–1280.
- Shah RN, Grzybowski AT, Cornett EM, Johnstone AL, Dickson BM, Boone BA, Cheek MA, Cowles MW, Maryanski D, Meiners MJ, et al. (2018). Examining the Roles of H3K4 Methylation States with Systematically Characterized Antibodies. Molecular Cell 72, 162–177.e167.
- Shao W, and Zeitlinger J (2017). Paused RNA polymerase II inhibits new transcriptional initiation. Nat. Genet 49, 1045–1051.
- Skaar JR, Ferris AL, Wu X, Saraf A, Khanna KK, Florens L, Washburn MP, Hughes SH, and Pagano M (2015). The Integrator complex controls the termination of transcription at diverse classes of gene targets. Cell Res. 25, 288–305.
- Soares LM, He PC, Chun Y, Suh H, Kim T, and Buratowski S (2017). Determinants of Histone H3K4 Methylation Patterns. Molecular Cell 68, 773–785.e776.
- Sohrabi-Jahromi S, Hofmann KB, Boltendahl A, Roth C, Gressel S, Baejen C, Soeding J, and Cramer P (2019). Transcriptome maps of general eukaryotic RNA degradation factors. Elife 8, 2148.
- Stadelmayer B, Micas G, Gamot A, Martin P, Malirat N, Koval S, Raffel R, Sobhian B, Severac D, Rialle S, et al. (2014). Integrator complex regulates NELF-mediated RNA polymerase II pause/release and processivity at coding genes. Nat Commun 5, 5531.
- Steurer B, Janssens RC, Geverts B, Geijer ME, Wienholz F, Theil AF, Chang J, Dealy S, Pothof J, van Cappellen WA, et al. (2018). Live-cell analysis of endogenous GFP-RPB1 uncovers rapid turnover of initiating and promoter-paused RNA Polymerase II. Proc. Natl. Acad. Sci. U.S.A. 115, E4368–E4376.
- Tettey TT, Gao X, Shao W, Li H, Story BA, Chitsazan AD, Glaser RL, Goode ZH, Seidel CW, Conaway RC, et al. (2019). A Role for FACT in RNA Polymerase II Promoter-Proximal Pausing. Cell Rep 27, 3770–3779.e3777.
- Venkatesh S, and Workman JL (2015). Histone exchange, chromatin structure and the regulation of transcription. Nature Publishing Group 16, 178–189.
- Venters CC, Oh J-M, Di C, So BR, and Dreyfuss G (2019). U1 snRNP Telescripting: Suppression of Premature Transcription Termination in Introns as a New Layer of Gene Regulation. Cold Spring Harb Perspect Biol 11, a032235.
- Vispé S, DeVries L, Créancier L, Besse J, Bréand S, Hobson DJ, Svejstrup JQ, Annereau J-P, Cussac D, Dumontet C, et al. (2009). Triptolide is an inhibitor of RNA polymerase I and II-dependent transcription leading predominantly to down-regulation of short-lived mRNA. Mol. Cancer Ther 8, 2780–2790.
- Vos SM, Farnung L, Urlaub H, and Cramer P (2018). Structure of paused transcription complex Pol II-DSIF-NELF. Nature 560, 601–606.
- Wagner EJ, and Carpenter PB (2012). Understanding the language of Lys36 methylation at histone H3. Nature Publishing Group 13, 115–126.
- Weber CM, Ramachandran S, and Henikoff S (2014). Nucleosomes are context-specific, H2A.Z-modulated barriers to RNA polymerase. Molecular Cell 53, 819–830.
- Williams LH, Fromm G, Gokey NG, Henriques T, Muse GW, Burkholder A, Fargo DC, Hu G, and Adelman K (2015). Pausing of RNA polymerase II regulates mammalian developmental potential through control of signaling networks. Molecular Cell 58, 311–322.
- Wilson KS, Conant CR, and Hippel, von PH (1999). Determinants of the stability of transcription elongation complexes: interactions of the nascent RNA with the DNA template and the RNA polymerase. J. Mol. Biol 289, 1179–1194.
- Wu Y, Albrecht TR, Baillat D, Wagner EJ, and Tong L (2017). Molecular basis for the interaction between Integrator subunits IntS9 and IntS11 and its functional importance. Proc. Natl. Acad. Sci. U.S.A. 114, 4394–4399.
- Xie M, Zhang W, Shu M-D, Xu A, Lenis DA, DiMaio D, and Steitz JA (2015). The host Integrator complex acts in transcription-independent maturation of herpesvirus microRNA 3′ ends. Genes Dev. 29, 1552–1564.
- Yamamoto J, Hagiwara Y, Chiba K, Isobe T, Narita T, Handa H, and Yamaguchi Y (2014). DSIF and NELF interact with Integrator to specify the correct post-transcriptional fate of snRNA genes. Nat Commun 5, 4263.
- Yanofsky C (1981). Attenuation in the control of expression of bacterial operons. Nature 289, 751–758.
- Zabidi MA, Arnold CD, Schernhuber K, Pagani M, Rath M, Frank O, and Stark A (2015). Enhancer-core-promoter specificity separates developmental and housekeeping gene regulation. Nature 518, 556–559.