ABSTRACT
Transcription elongation is highly processive but is punctuated by RNA polymerase (RNAP) pausing. Long-lived pauses can provide time for diverse regulatory events to occur, which play important roles in modulating gene expression. Transcription elongation factors can dramatically affect RNAP pausing in vitro. The genome-wide role of such factors in pausing in vivo has been examined only for NusG in Bacillus subtilis. NusA is another transcription elongation factor known to stimulate pausing of B. subtilis and Escherichia coli RNAP in vitro. Here, we present the first in vivo study to identify the genome-wide role of NusA in RNAP pausing. Using native elongating transcript sequencing followed by RNase digestion (RNET-seq), we analyzed factor-dependent RNAP pausing in B. subtilis and found that NusA has a relatively minor role in RNAP pausing compared to NusG. We demonstrate that NusA has both stimulating and suppressing effects on pausing in vivo. Based on our thresholding criteria on in vivo data, NusA stimulates pausing at 129 pause peaks in 93 different genes or 5′ untranslated regions (5′ UTRs). Putative pause hairpins were identified for 87 (67%) of the 129 NusA-stimulated pause peaks, suggesting that RNA hairpins are a common component of NusA-stimulated pause signals. However, a consensus sequence was not identified as a NusA-stimulated pause motif. We further demonstrate that NusA stimulates pausing in vitro at some of the pause sites identified in vivo.
IMPORTANCE NusA is an essential transcription elongation factor that was assumed to play a major role in RNAP pausing. NusA stimulates pausing in vitro; however, the essential nature of NusA had prevented an assessment of its role in pausing in vivo. Using a NusA depletion strain and RNET-seq, we identified a similar number of NusA-stimulated and NusA-suppressed pause peaks throughout the genome. NusA-stimulated pausing was confirmed at several sites in vitro. However, NusA did not always stimulate pausing at sites identified in vivo, while in other instances NusA stimulated pausing at sites not observed in vivo. We found that NusA has only a minor role in stimulating RNAP pausing in B. subtilis.
KEYWORDS: NusA, NusG, RNA polymerase pausing, gene regulation, transcription
INTRODUCTION
Transcription by RNA polymerase (RNAP) is regulated at the level of initiation, elongation, and termination. However, our understanding of the regulatory events controlling elongation lags behind our understanding of initiation and termination. RNAP frequently isomerizes into a reversible elemental pause state (1–7). The stabilization of elemental pauses at regulatory sites can occur when the nascent RNA folds into a hairpin structure 11 to 13 nucleotides (nt) upstream of the RNA 3′ end, after interactions with protein factors, and/or due to the presence of downstream RNA and DNA sequences (1, 8–15). These long-lived pauses allow the position of RNAP to be synchronized with cotranscriptional RNA folding and/or regulatory factor binding (2). Thus, long-lived pauses can provide time for diverse regulatory events to occur, which play a significant role in modulating gene expression (11, 14, 15).
Transcription elongation factors dramatically affect RNAP pausing in vitro (10, 16–18), and the genome-wide role of NusG in pausing has been examined in Bacillus subtilis (14). NusA is another elongation factor known to stimulate pausing of B. subtilis and Escherichia coli RNAP in vitro (10, 17, 19, 20). During in vitro transcription of the leader region of the B. subtilis trpEDCFBA operon, NusA stimulates pausing at two positions. Formation of pause hairpins participates in pausing at both positions (10, 11, 18, 20). The first pause provides additional time for the trp RNA-binding attenuation protein (TRAP) to bind and promote termination through a regulated attenuation mechanism (20), while the second pause downstream participates in a trpE translation control mechanism (11). NusA is also an important intrinsic termination factor, and cellular NusA levels are tightly regulated by an autoregulatory attenuation mechanism in which NusA stimulates intrinsic termination at two positions in the operon’s leader region (21).
NusA is a pausing factor in E. coli (17), and its role in regulating gene expression in vitro was discovered during early studies of the trp operon attenuation mechanism (16). NusA stimulates pausing in E. coli through interactions with a pause hairpin in the nascent transcript and with RNAP (19, 22), and NusA proteins from B. subtilis and E. coli can substitute for one another in stimulating pausing of both B. subtilis and E. coli RNAP in vitro (18). To investigate the genome-wide role of NusA in RNAP pausing in vivo, we used native elongating transcript sequencing followed by RNase digestion (RNET-seq) that integrates sequencing of nascent transcripts (NET-seq) with RNase I-mediated footprinting (14). Previous RNET-seq studies on wild-type (WT) and nusG deletion (ΔnusG) B. subtilis strains demonstrated that NusG shifts RNAP to the posttranslocation register and induces strong pausing at 1,600 sites in the genome. These sites were found to be enriched for a consensus TTNTTT pause motif in the nontemplate DNA (ntDNA) strand within the paused transcription bubble (14). In the current study, we performed comparative RNET-seq analysis between the WT, nusA depletion (nusAdep), ΔnusG, and nusAdep ΔnusG strains, as well as in vitro pausing assays on selected NusA-stimulated pause sites identified in vivo. We found that NusA has a relatively minor role as a pausing factor in B. subtilis.
RESULTS
Genome-wide identification of RNAP pausing.
For this study, we used a NusA depletion strain in which NusA was solely generated exogenously from an isopropyl-β-d-thiogalactopyranoside (IPTG)-inducible promoter (21). It was shown previously that growth in the presence of IPTG resulted in approximately wild-type (WT) levels of NusA, whereas growth in the absence of IPTG resulted in depletion of NusA (nusAdep) to ∼2% of WT levels (21, 23), and we obtained comparable depletion of NusA in this study (see Fig. S1 in the supplemental material). By performing our studies with nusAdep and nusAdep ΔnusG B. subtilis strains with or without (±) IPTG, we were able to mimic WT (nusAdep, +IPTG), NusA depletion (nusAdep, −IPTG), nusG deletion (nusAdep ΔnusG, +IPTG), and NusA depletion, nusG deletion (nusAdep ΔnusG, −IPTG) conditions. To simplify the discussion, we will refer to these four conditions as WT, nusAdep, ΔnusG, and nusAdep ΔnusG strains.
We previously developed RNET-seq to identify the positions of paused RNAP at single-nucleotide resolution throughout the genome (14, 24). Coprecipitation of native elongating transcripts and chromosomal DNA, followed by RNase I digestion, allowed footprinting of nascent transcripts protected by RNAP. This procedure eliminated all terminated (released) transcripts, allowing unique mapping of the nascent 3′ ends across the B. subtilis reference genome. For each coordinate genome-wide, we divided the number of 3′ ends identified at that coordinate (raw count value) by the median number of 3′ ends identified across a 100-nt window centered at that coordinate (median count value). This calculation yielded a metric that we referred to as pause score (14). For this study, we defined a pause peak as a location having both a pause score of ≥10 and a raw count value greater than a minimum count generated for each strain based on the total number of reads, among other parameters (14). In this study, we merged the replicate FASTQ files prior to data analysis. This approach identified 9,045 pause peaks in the WT strain, 8,200 pause peaks in the nusAdep strain, and 8,847 pause peaks in the ΔnusG strain (Data Set S1). To ensure that our computational method was valid, we conducted a principal-component analysis (PCA) using gene expression profiles generated from each WT and nusAdep replicate (Fig. S2A). We also conducted an analogous PCA, using the pause scores that were calculated for each WT and nusAdep replicate, at each pause peak identified in the WT and nusAdep strains (Fig. S2B). These analyses showed that both the gene expression profiles and the pause score profiles generated from the WT and nusAdep replicates sorted neatly by strain.
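The pause score and pause peak definitions above can be illustrated with a minimal Python sketch. It is not the custom peak finder used in this study; the function names, the inclusion of the coordinate itself in the median window, the truncation of the window at the ends of the array, and the treatment of a zero median as undefined are all assumptions made for illustration.

```python
from statistics import median

def pause_score(counts, pos, window=100):
    """Raw 3'-end count at pos divided by the median count in a window centered at pos.

    counts is a list of 3'-end counts per coordinate for one strand (hypothetical input).
    """
    half = window // 2
    lo = max(0, pos - half)                 # window is truncated at array ends (assumption)
    hi = min(len(counts), pos + half + 1)
    baseline = median(counts[lo:hi])
    if baseline == 0:
        return None                         # score undefined when the local median is zero
    return counts[pos] / baseline

def is_pause_peak(counts, pos, min_count, min_score=10, window=100):
    """Apply both criteria: pause score >= min_score and raw count > min_count."""
    score = pause_score(counts, pos, window)
    return score is not None and score >= min_score and counts[pos] > min_count
```

For example, a coordinate with 500 mapped 3′ ends in a region whose median count is 2 would receive a pause score of 250 in this sketch.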
Comparison of the global effects of NusA and NusG on pausing.
For each pause peak identified in the WT strain, we calculated the pause score at that coordinate in the WT, nusAdep, and ΔnusG strains and organized these data into box-and-violin plots (Fig. 1). Distribution of the violin plot was skewed downward in the nusAdep strain, but not to the extent observed for the ΔnusG strain. These data indicated that loss of NusA resulted in a modest decrease in pausing genome-wide, while loss of nusG caused a more substantial decrease in pausing (Data Sets S1 and S2). Thus, we found that NusG is the more potent genome-wide pausing factor in B. subtilis.
FIG 1.
Effects of NusA and NusG on RNAP pausing at pause peaks identified in the WT strain. Violin plots overlaid with box plots showing the distribution of pause strength (score) across wild-type (WT), nusAdep, and ΔnusG strains. Boundaries of the box designate the interquartile range (IQR), while upper/lower whiskers extend from the 75th/25th percentile to the largest/smallest value, respectively, no further than 1.5 × IQR in either direction. All pairwise P values are in Data Set S2 in the supplemental material.
For the superset of all pauses identified in the WT and nusAdep strains, the log2-transformed ratio of the mutant pause score to WT pause score was quantified (Data Set S3). This analysis was then repeated for the ΔnusG mutant strain (Data Set S4). For these analyses, we included only pause peaks in regions that were expressed under both conditions (see Materials and Methods). We defined a pause peak to be stimulated by a Nus factor when the pause score decreased by at least 4-fold in the mutant strain compared to the WT strain. Similarly, we considered a pause peak to be suppressed by a Nus factor when the pause score increased by at least 4-fold in the mutant strain compared to the WT strain.
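The classification rule described above can be sketched as follows. The function name and the returned labels are hypothetical; the 4-fold threshold corresponds to a log2 fold change of ≤−2 (stimulated) or ≥2 (suppressed). The sketch assumes that both pause scores are positive.

```python
import math

def classify_pause(wt_score, mutant_score, fold=4.0):
    """Label a pause peak by the log2 ratio of mutant to WT pause score.

    A score that drops at least `fold`-fold in the mutant means the factor
    stimulates the pause; a rise of at least `fold`-fold means it suppresses it.
    """
    log2fc = math.log2(mutant_score / wt_score)
    threshold = math.log2(fold)
    if log2fc <= -threshold:
        return "stimulated", log2fc
    if log2fc >= threshold:
        return "suppressed", log2fc
    return "unaffected", log2fc
```

Under this rule, a peak scoring 40 in the WT strain and 10 in the nusAdep strain would be called NusA stimulated.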
Through the above criteria, we identified 129 NusA-stimulated pause peaks in 93 different genes or 5′ untranslated regions (UTRs) that decreased in score and 83 NusA-suppressed pause peaks in 57 genes or 5′ UTRs (Data Set S3). These results indicate that NusA has both pause-stimulating and pause-suppressing effects throughout the B. subtilis genome, and this was further confirmed by comparing the distributions of pause scores in the WT and nusAdep strains for the identified NusA-stimulated and NusA-suppressed pause peaks (Fig. 2A and B). In addition to differential analysis of pause strength, visual inspection of the RNET-seq data on the Integrative Genomics Viewer (IGV) was performed to filter out false positives caused by read coverage discrepancies between replicates in regions adjacent to the pause peak. The above analyses were restricted to non-rRNA operons. To ensure that we were not missing a potential role of NusA in rRNA transcription, we inspected the RNET-seq coverage profile across several rRNA operons in the WT and nusAdep strains and did not observe appreciable changes across these operons when NusA was depleted (Fig. S2C). Thus, NusA does not appear to play a significant role in pausing or antitermination in rRNA operons.
FIG 2.
Differential analysis of in vivo pause strength between WT and nusAdep strains reveals RNAP pause-stimulating and pause-suppressing effects of NusA. (A) The frequency distribution of pause strength (score) of NusA-stimulated pauses in WT and nusAdep strains. The distribution shifts to lower scores upon NusA depletion. (B) The frequency distribution of pause strength (score) of NusA-suppressed pauses in WT and nusAdep strains. The distribution shifts to higher scores upon NusA depletion.
Consistent with our previous studies, we considered adjacent pause peaks within a 10-bp window as a single pause site to account for the instances with multiple consecutive 3′ ends. Sixty-three percent of the NusA-stimulated pause peaks possess unique 3′ ends (i.e., a single pause peak) while the others comprise pause sites with multiple peaks. Of the NusA-suppressed pause peaks, 85% consist of unique 3′ ends. Two NusA-stimulated and two NusA-suppressed pause sites from our RNET-seq data are shown in Fig. 3A to D. Only 57% (73 out of 129) of the NusA-stimulated pause peaks were found to be stimulated exclusively by NusA, while the other 43% (56 out of 129) were also stimulated by NusG (Data Sets S3 and S4). This result suggests that NusA and NusG can act cooperatively (or interchangeably) at these latter sites (Fig. 4A). For the superset of all Nus-suppressed pause peaks, 35% (82/234) were suppressed only by NusA, and 65% (151/234) were suppressed only by NusG, while only one pause peak was suppressed by both NusA and NusG (i.e., this pause was observed upon the loss of either Nus factor) (Fig. 4B). These findings indicate that NusA and NusG act almost entirely independently during pause suppression.
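Grouping adjacent pause peaks into pause sites can be sketched as below. This is an illustration only: the function name is hypothetical, the sketch handles a single strand, and it assumes that a peak joins the current site when it lies less than 10 bp from the first peak of that site, which is one possible reading of the 10-bp window.

```python
def group_pause_sites(peaks, window=10):
    """Group pause peak coordinates (one strand) into pause sites.

    A peak is added to the current site if it falls within `window` bp of the
    first peak in that site (assumed rule); otherwise it starts a new site.
    """
    sites = []
    for coord in sorted(peaks):
        if sites and coord - sites[-1][0] < window:
            sites[-1].append(coord)
        else:
            sites.append([coord])
    return sites
```

With this rule, the three consecutive peaks in the rpmE 5′ UTR (coordinates 3803291, 3803293, and 3803294; Fig. 3B) would be collapsed into one pause site, whereas the dnaG peak at 2602034 would remain a site with a unique 3′ end.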
FIG 3.
NusA-stimulated and NusA-suppressed pause sites identified by RNET-seq. Examples of NusA-stimulated (A and B) and NusA-suppressed (C and D) pauses as they appear in IGV. Top tracks are for the wild-type (WT) strain, and the bottom tracks are for the nusAdep strain. (A) Pause peak at genomic coordinate 2602034 in the dnaG open reading frame. (B) Consecutive pause peaks at genomic coordinates 3803291, 3803293, and 3803294 in the 5′ UTR of rpmE. (C) Pause peak at genomic coordinate 2288631 in the ypoP open reading frame. (D) Pause peak at genomic coordinate 3670200 in the ggaA open reading frame. Genome-aligned reads are in gray while mapped 3′ ends corresponding to the RNAP active site are in blue (+ strand) and red (− strand). Black arrows indicate the direction of transcription.
FIG 4.
Effects of NusG on NusA-stimulated and NusA-suppressed pauses. (A and B) Venn diagrams depicting the number and overlap of pause peaks that are classified as factor dependent. (A) Pauses that are stimulated (log2FC score of ≤−2) by NusA (blue), NusG (red), or both (magenta). (B) Pauses that are suppressed (log2FC score of ≥2) by NusA (blue), NusG (red), or both (magenta). (C to F) Violin plots overlaid with box plots showing the distribution of pause strength in wild-type (WT), nusAdep, and ΔnusG strains for NusA-stimulated (C), NusA-suppressed (D), NusG-stimulated (E), and NusG-suppressed (F) pauses identified in vivo. Boundaries of the box designate the interquartile range (IQR), while upper/lower whiskers extend from the 75th/25th percentile to the largest/smallest value, respectively, no further than 1.5 × IQR in either direction. All pairwise P values are in Data Set S2.
To further characterize the cooperative effect on pausing between NusA and NusG, we organized the pause scores of all NusA- and NusG-stimulated/suppressed pause peaks in the WT, nusAdep, and ΔnusG strains into box-and-violin plots (Fig. 4C to F). This analysis confirmed that NusG played a major role at NusA-stimulated pause sites, with NusG playing a larger role at several NusA-stimulated pause sites than NusA itself (Fig. 4C; Data Sets S3 and S4). The converse was not found to be true at NusG-stimulated pause sites, where we found that NusA played a relatively minor role (Fig. 4E; Data Sets S3 and S4). Moreover, NusG was found to have a minor stimulatory effect at the NusA-suppressed pauses (Fig. 4D; Data Sets S3 and S4), providing evidence that the main function of B. subtilis NusG is pause stimulation. Much like at NusG-stimulated pauses, NusA displayed little effect at NusG-suppressed pauses (Fig. 4F; Data Sets S3 and S4). From these data we conclude that B. subtilis NusG plays a major role in pause stimulation, whereas NusA plays a minor role in this process.
We found that approximately 75% of the pause peaks identified in the WT strain and the nusAdep strain were located in protein-coding regions (Fig. 5A and B) and that these pause peaks were distributed relatively evenly along open reading frames (ORFs) (Fig. S3). Furthermore, we found that 45% and 31% of the NusA-stimulated and NusA-suppressed pauses were found in 5′ UTRs, respectively (Fig. 5C and D). Of the NusA-stimulated pauses within 5′ UTRs, 2/3 were located upstream of known transcription attenuators. Considering that 5′ UTRs constitute less than 1/10 of the genome, the relative density of NusA-stimulated pause peaks in 5′ UTRs is far higher than in ORFs. Knowing the important role of 5′ UTR pausing in regulating gene expression, these findings suggest that NusA could be an important participant in transcription attenuation or translational control mechanisms.
FIG 5.
Distribution of all pause peaks in 5′ untranslated regions (5′ UTRs) and open reading frames (ORFs). (A and B) All pause peaks identified in WT (A) and nusAdep (B) strains. (C and D) Distribution of NusA-stimulated (C) and NusA-suppressed (D) pause peaks.
To determine whether sequences upstream of the pause peaks exerted an effect on NusA-stimulated pausing, we first conducted hairpin folding predictions using Mfold (25). Folding sequences upstream of all NusA-stimulated pause peaks revealed putative pause hairpins for 67% of all NusA-stimulated pauses (Fig. 6; Data Set S5). The base of each hairpin was positioned 10 to 13 nt upstream of the pause position. This hairpin-to-3′-end distance is similar to the distance previously observed for other hairpin-stabilized pauses (8–10, 14, 15). We conducted a similar pause hairpin analysis for the sequences upstream of each NusA-suppressed pause peak (Data Set S5) and a control set of 50 sequences obtained at random from the B. subtilis genome. Only 35% of NusA-suppressed pauses had an appropriately positioned RNA hairpin upstream of the pause peak, while only 32% of the random sequences could form an appropriately positioned RNA hairpin. Using a Fisher exact test, we compared the percentage of the NusA-stimulated pauses with an RNA hairpin to the percentage of the control set of sequences with an RNA hairpin. This analysis yielded a P value of 4.195e−05, while a similar statistical comparison between the NusA-suppressed pauses and the control set of sequences yielded a P value of 0.8503. We conclude that an appropriately positioned RNA hairpin is a common component of NusA-stimulated pauses but not for NusA-suppressed pauses.
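The hairpin enrichment comparison can be illustrated with a small, dependency-free implementation of a two-sided Fisher exact test. This is a sketch rather than the software used for the reported P values. The count of 87 of 129 NusA-stimulated peaks with a hairpin is taken from the text, whereas the counts of 16 of 50 control sequences and 29 of 83 NusA-suppressed peaks are inferred here from the reported percentages and should be treated as assumptions.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g., pause set vs. control); columns are
    hairpin present / hairpin absent.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x hairpin-containing members in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

# NusA-stimulated peaks vs. random control (control counts are inferred).
p_stimulated = fisher_exact_two_sided(87, 129 - 87, 16, 50 - 16)
# NusA-suppressed peaks vs. random control (both sets of counts are inferred).
p_suppressed = fisher_exact_two_sided(29, 83 - 29, 16, 50 - 16)
```

Because some of these counts are reconstructed from rounded percentages, the values computed here are not expected to match the reported P values exactly.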
FIG 6.
Classification of putative pause hairpins. RNA hairpin predictions were performed by folding up to 100 nt of sequence upstream of each NusA-stimulated pause peak. The color key indicates the distance from the pause peak to the base of the hairpin (hairpin-to-3′-end distance). All the classified hairpins possess a thermodynamic free energy of folding of ≤−3.00 kcal/mol.
Our previous studies identified the consensus sequence logo TTNTTT in the ntDNA strand within the paused transcription bubble to be the NusG-dependent pausing motif (14). To identify a potential NusA-stimulated pause motif, we next performed a sequence motif enrichment analysis for the upstream sequences of the NusA-stimulated pauses via the MEME suite (26). However, we did not identify a consensus sequence motif for this class of transcription pauses, indicating that NusA interaction with the paused RNAP complex does not occur in a sequence-specific manner.
Effects of NusA on RNAP pausing in vitro.
Of the 129 NusA-stimulated pause peaks identified in vivo, 11 were tested in vitro. We selected candidates for in vitro analysis that represented different genetic contexts (5′ UTR and ORF), differential pause strengths, the presence of putative pause hairpins, and/or the influence of NusG. Six of these pauses were exclusively responsive to NusA, whereas 5 were responsive to both NusA and NusG in vivo. NusA-stimulated pausing was observed in vitro at the cysJ, rapF, and rplK peaks identified in vivo (Fig. 7 and Table S1). However, we did not observe pausing in vitro at eight other in vivo pause peaks that we tested (Fig. S4 and Table S1).
FIG 7.
In vitro transcription of NusA-stimulated pauses identified in vivo. Single-round in vitro transcription was performed with or without NusA and/or NusG for templates containing the pause peak for rplK (A), cysJ (B), and rapF (C). An IGV screenshot of the in vivo pause peaks is shown on the left, putative pause-stabilizing hairpins with their thermodynamic free energy in kilocalories/mole are shown in the center, and in vitro transcription gel images are shown on the right. Time points of elongation are indicated above each lane. ch, chase reactions. The positions of NusA-stimulated pause bands (P) observed in vivo and runoff (RO) transcripts are marked. Pause half-lives (T1/2) and the pausing efficiencies (Eff) are indicated. Lengths of paused and runoff transcripts are 164 and 230 nt for rplK, 130 and 223 nt for cysJ, and 67 and 124 nt for rapF. Positions of weak pauses that were unaffected by NusA in vivo are marked with arrowheads for rplK.
At the NusA-stimulated pause peak identified at genomic coordinate 118583 in the 5′ UTR of rplK, NusA increased the pausing efficiency from 25% to 43% in vitro (pause U164) and stabilized this pause by increasing the pause half-life by about 3-fold (Fig. 7A). In addition to the expected pause position, 3 prominent pauses were identified in vitro that were not observed in vivo. RNA folding predictions of the upstream sequences using Mfold (25) revealed a putative pause-stabilizing hairpin 12 nt upstream of the pause position identified in vivo (Fig. 7A). Both NusA and NusG stimulated pausing at the cysJ pause peak at genomic coordinate 3433938 in vivo. Consistent with the in vivo results, NusA and NusG increased the pause efficiency in vitro by 2-fold and 3.5-fold, respectively (Fig. 7B). NusG also caused a small increase in the pause half-life, but the combination of NusA and NusG resulted in a cooperative 4-fold increase in the half-life (Fig. 7B). A putative pause hairpin was also identified 11 nt upstream of the pause position (Fig. 7B). Modest in vitro stimulatory effects of NusA and NusG were also observed for the rapF pause peak that was identified at genomic coordinate 3846369 in vivo. In this case a pause hairpin was identified 11 nt upstream of the pause position (Fig. 7C).
DISCUSSION
The genome-wide role of NusG in pausing in B. subtilis was previously examined using RNET-seq (14). Of the 1,600 NusG-dependent pause sites identified in this organism, several hundred were identified in 5′ UTRs, suggesting that they could be involved in regulating gene expression as has been shown for pause sites in the 5′ UTR of the trp, tlrB, and ribD operons (11, 14, 15). In this work, we used a NusA depletion strategy to assess the genome-wide role of NusA in RNAP pausing. We conducted RNET-seq on WT, NusA-deficient, and NusG-deficient strains of B. subtilis. Our RNET-seq study revealed a limited number of NusA-stimulated and NusA-suppressed pauses transcriptome-wide (Fig. 1 to 5). Comparative analysis of pause peak strength revealed that NusA has relatively modest effects on pausing compared to NusG (Fig. 4C and E).
Our previous RNET-seq studies for NusG-dependent pausing used highly stringent thresholding criteria (pause score of ≥50) for data analysis (14). Using 5-fold-lower stringency than that of the previous study for pause peak identification (pause score of ≥10), we identified 129 NusA-stimulated pauses. Our data analysis of pause peak strength (pause score) for all WT pauses (Fig. 1), as well as for NusA-stimulated and NusA-suppressed pauses, illustrates that NusA can have both pause-stimulating and -suppressing effects in vivo (Fig. 2 and 4). We found that some pausing events are affected by either NusA or NusG and in some cases by both proteins (Fig. 4). Therefore, these two factors could be acting cooperatively or interchangeably in some cases.
NusA-stimulated pausing was confirmed in vitro at the rplK, cysJ, and rapF pause sites, as was the cooperative role of NusA and NusG for the cysJ and rapF pauses (Fig. 7; also see Table S1 in the supplemental material). Moreover, NusA-stimulated pausing was found to be enriched in 5′ UTRs. Of the 129 NusA-stimulated pause peaks, 58 were identified in 5′ UTRs, including some containing known transcription attenuators, suggesting that NusA-stimulated pausing could play a role in regulating downstream gene expression by participation in transcription attenuation, antitermination, or translation control mechanisms. Previous in vitro studies identified two NusA-stimulated pause sites in the trp operon 5′ UTR (10, 12, 20); however, our RNET-seq studies revealed that NusA does not stimulate pausing at either of these sites in vivo. In addition, pausing was not observed in vitro at several NusA-stimulated pause sites identified in vivo (Fig. S4 and Table S1), while in other instances pausing was stimulated by NusA in vitro at sites that were not observed in vivo (Fig. 7A). Although the reasons for these discrepancies are not clear, it is possible that additional factors contribute to NusA-stimulated pausing in vivo. In addition, our in vitro conditions could not fully mimic in vivo conditions such as ionic strength, molecular crowding, and DNA supercoiling. It is also possible that technical or computational limitations of RNET-seq prevent the identification of the comprehensive set of in vivo pause sites.
Cryo-electron microscopy (cryo-EM) structural studies of an E. coli transcription elongation complex confirmed prior biochemical studies showing that NusA interacts with the β flap of RNAP near the RNA exit channel, which positions NusA to interact with the nascent RNA (19, 22). Hence, we analyzed the RNA sequences upstream of the pause peaks in B. subtilis to identify sequence and/or structural components involved in NusA-stimulated pausing. In silico folding analysis of the sequences upstream of NusA-stimulated pauses revealed potential pause hairpins between 10 and 13 nucleotides upstream from the pause position at 67% of the pause peaks. This hairpin-to-3′-end distance is similar to what was observed for known hairpin-stabilized pause signals (8, 9, 12, 15). However, sequence enrichment analysis of upstream sequences did not yield a consensus pause motif, indicating that NusA-stimulated pausing is not sequence specific.
NusA and NusG are two general transcription elongation factors that, depending on the organism, exhibit varied effects on RNAP pausing and termination. Although we expected that NusA would function as a major pausing factor in B. subtilis, this expectation was based primarily on in vitro studies of a limited number of pause sites in E. coli and B. subtilis (8, 10, 16, 17, 22). In contrast to this expectation, our study revealed that NusA has a relatively modest role as a genome-wide pausing factor in B. subtilis. However, since there is still residual NusA under depletion conditions, it is possible that the in vivo effects of NusA are underrepresented in our RNET-seq data.
In contrast to its role in pausing, terminator sequencing (Term-seq) studies revealed that NusA is a major intrinsic termination factor in this organism (21, 23). In the case of NusG, recent studies established that this protein functions as a major pausing factor and as an intrinsic termination factor in B. subtilis (14, 23). NusG-dependent pausing requires interaction of NusG with conserved T residues in the ntDNA strand within the paused transcription bubble (14, 27). These T residues correspond to U residues in the upstream portion of the RNA-DNA hybrid, which is a critical feature of intrinsic terminators. Thus, in retrospect it is not surprising that NusG-dependent pausing was found to be a critical component of NusG-dependent intrinsic termination (23). In contrast to NusG’s role as a pausing factor in B. subtilis, E. coli NusG functions as a processivity factor by inhibiting frequent elemental pauses in vitro (7). However, like NusA, its role in pausing and termination has not been explored in vivo because both proteins are essential for viability. It will be interesting to identify the similarities and differences that these two proteins have in these model Gram-positive and Gram-negative species.
MATERIALS AND METHODS
Bacterial strains and growth.
The two B. subtilis strains used in this study were described previously (14). Strain PLBS730 (rpoC-10His Cmr amyE::Physpank-nusA lacIq Spr ΔnusA Emr) is the nusAdep strain and encodes a C-terminally His10-tagged β′ subunit of RNAP (28), an IPTG-inducible nusA allele integrated into the amyE locus, and a spectinomycin resistance gene in place of the native nusA coding sequence. Strain PLBS731 (rpoC-10His Cmr amyE::Physpank-nusA lacIq Spr ΔnusA Emr nusG::kan) is identical to PLBS730 except that it also contains a kanamycin resistance gene inserted into the middle of nusG (ΔnusG). NusA production was maintained in these two strains by culturing cells in the presence of 0.2 mM IPTG. PLBS730 and PLBS731 grown with 0.2 mM IPTG were considered WT and ΔnusG strains, respectively. These strains grown in the absence of IPTG were considered nusAdep and nusAdep ΔnusG strains, respectively (14, 23).
Strains were grown at 37°C in Gln-minimal medium (1× Spizizen salts, 0.1% glutamine, 0.5% glucose, 10 μM CaCl2, and 10 μM FeSO4 in tap water) in the presence of 0.2 mM IPTG and appropriate antibiotics. One milliliter of a shaking overnight culture was transferred to 25 mL of fresh Gln-minimal medium, and growth was continued overnight at 37°C. The next day the cells were pelleted by centrifugation, and the pellet was then washed in 0.5 mL of prewarmed medium at 37°C, transferred to a microcentrifuge tube, and then pelleted by centrifugation. Washed cells were suspended in 1 mL of prewarmed medium at 37°C, and then 5 OD600 (optical density at 600 nm) units was used to inoculate a 250-mL culture supplemented with 20 μg/mL chloramphenicol to an OD600 value of 0.02. Cells were grown at 37°C ± IPTG and then harvested during mid-exponential-phase growth when the culture reached an OD600 value of about 0.5. Cells were harvested as described previously (14).
Plasmids and oligonucleotides.
All plasmids and oligonucleotides used in this study are described in Tables S2 and S3, respectively.
Preparation of RNET-seq libraries.
Purification of native transcription elongation complexes using Ni-nitrilotriacetic acid (NTA) agarose, recovery of nascent RNA for RNET-seq, adapter ligation, reverse transcription, DNA circularization, and library preparation and purification were performed as described previously (14).
Sequencing and data analysis.
Pooled libraries of 3 replicates for each of the 4 strains and conditions were sequenced at the Pennsylvania State University genomics core facility on a single Illumina HiSeq Rapid run using 50-nt single-end sequencing as described previously (14). The raw sequencing reads from the libraries were processed as described in detail previously (14). The program cutadapt (v1.18) (29) was used to remove the adapter sequence GATCGGAAGAGCACACGTCTGAACTCCAGTCACATTACTCGATCTCGTATG from the 5′ end of the reads. Only the reads between 14 and 30 nt in length were retained. Duplicate reads within each sample were removed using the clumpify routine from the BBMap (v38.34.0) suite of programs (30). Following this correction step for duplication, cutadapt was again used to remove the 6-bp barcode sequence from the 5′ end of each read. Following these preprocessing steps, the reads were aligned to the B. subtilis genome (NC_000964.3, Bacillus subtilis subsp. subtilis strain 168 complete genome) using bowtie (v1.2.2) (31). All the multimapped reads were eliminated (parameters --best --strata -v 1 -m 1). Pause peaks were identified in the resulting uniquely mapped bam files using a custom program built on the SAMtools/htslib framework (pause peak finder M2.2). The selectivity of the peak finder program was regulated by three parameters: a cutoff (c), the minimum number of reads required to define a peak; a ratio (r) of the count at the highest point in the peak to the local baseline; and a window (w), the number of surrounding bases upstream and downstream of the putative peak over which the baseline was calculated as the median (m) of counts. Initial values were r = 50 and w = 100, with cutoff value c = 394 for WT, 431 for nusAdep, 410 for ΔnusG, and 423 for the nusAdep ΔnusG strains, depending on the total number of reads in each sample. We later redefined a pause peak to have r = 10 for this study to lower the stringency of thresholding to identify NusA-stimulated pauses.
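The order of the read preprocessing steps that follow adapter removal can be illustrated with a toy Python sketch operating on in-memory sequences. It does not reproduce cutadapt or clumpify: exact-match deduplication stands in for clumpify here, the function name is hypothetical, and applying the 14- to 30-nt length filter while the 6-nt barcode is still attached is an assumption based on the order in which the steps are described.

```python
def preprocess_reads(reads, min_len=14, max_len=30, barcode_len=6):
    """Toy version of the post-adapter-trimming steps.

    reads: adapter-trimmed sequences that still carry a 5' barcode.
    Steps: length filter, exact-duplicate removal, then barcode removal.
    """
    # 1. Keep only reads within the retained length range (assumed to include the barcode).
    kept = [r for r in reads if min_len <= len(r) <= max_len]
    # 2. Remove exact duplicates while preserving first-seen order (stand-in for clumpify).
    unique = list(dict.fromkeys(kept))
    # 3. Strip the barcode from the 5' end of each remaining read.
    return [r[barcode_len:] for r in unique]
```

Deduplicating before barcode removal means that reads with identical inserts but different barcodes are kept as separate reads in this sketch.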
Identified peaks were further annotated with respect to their distance and strand (sense or antisense) relative to annotated genomic features. Read lengths and their respective proportions for each coordinate were evaluated. Estimates of the RNA expression levels from the RNET-seq data were determined using the pseudoaligner salmon (v0.12.0) (32) in quant mode. Each of the above steps was encapsulated in a pair of workflows and deployed on the DNAnexus platform, thus ensuring reproducible results. These workflows produced highly interactive HTML tables that allowed simplified manual exploration and validation of the data. The MEME suite (26, 33) was used to generate sequence logos (motifs) for the 30 bp upstream of the pause site.
Differential pause strength analysis between strains.
A superset of all pauses that passed the thresholding criteria of the above peak identification workflow (r = 10) was generated. For each pause, a score was calculated by normalizing the count at the pause peak to the median of counts within a 100-bp window (w = 100) centered at the pause peak. When two strains were compared, pauses in regions expressed in only one of the two strains (m = 0 in the other) were discarded to correct for effects of differential expression. The log2 fold change (log2FC) of the score was calculated for each coordinate. For pauses that did not pass the peak identification threshold in one strain, the originally mapped counts were used after normalization. The differential analysis steps were integrated into a single workflow on the DNAnexus platform that generated csv output files, both to ensure reproducibility and to allow further exploration and classification.
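As a rough illustration of the scoring described above, the following Python sketch computes a normalized pause score and the log2 fold change between two strains. The function names are hypothetical, and three choices are assumptions: the peak position is included in the median window, the first strain is used as the numerator, and a coordinate with a zero score in either strain is skipped.

```python
import math
from statistics import median


def pause_score(counts, pos, w=100):
    """Count at pos divided by the median count in a w-bp window centered on pos.

    Returns None when the median is 0 (the m = 0 case, which is discarded).
    Assumption: the peak position itself is part of the window.
    """
    half = w // 2
    window = counts[max(0, pos - half):pos + half + 1]
    m = median(window)
    if m == 0:
        return None
    return counts[pos] / m


def log2_fold_change(counts_a, counts_b, pos, w=100):
    """log2(score in strain A / score in strain B) at one coordinate.

    Returns None if either strain has m = 0 or a zero score at pos
    (assumption: such coordinates are not compared).
    """
    score_a = pause_score(counts_a, pos, w)
    score_b = pause_score(counts_b, pos, w)
    if not score_a or not score_b:
        return None
    return math.log2(score_a / score_b)


# Toy example: same background, fourfold stronger pause in strain A.
strain_a = [4] * 50 + [400] + [4] * 50
strain_b = [4] * 50 + [100] + [4] * 50
toy_log2fc = log2_fold_change(strain_a, strain_b, 50)
```

With strain A as WT and strain B as the depletion strain, a positive value would indicate a pause that is stronger when the factor is present.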
In vitro transcription.
Several NusA-stimulated pause peak candidates identified in vivo representing different genetic contexts (5′ UTR and ORF), differential pause strengths, the presence of putative pause hairpins, and/or the influence of NusG were selected for in vitro analysis. Sequences containing the sites of interest were fused downstream of a strong promoter followed by a C-less cassette. For some candidates in the 5′ UTRs or promoter-proximal coding regions of genes, the natural promoter was used, and C-less regions were generated through point mutations of the natural sequence. DNA templates for in vitro transcription were PCR amplified from geneBlock fragments obtained from Integrated DNA Technologies (IDT) containing a consensus promoter with an extended −10 element and a C-less cassette (25 to 31 nt), followed by the pause site sequence (pause peak coordinate + ∼150 bp upstream and downstream). Alternatively, some DNA templates were derived from plasmids containing the pause site sequence fused to a consensus promoter with an extended −10 element and a C-less cassette of 28 nt.
Single-round in vitro transcription and data analysis were performed as described previously (14) with modifications. In the first step of the reaction, halted elongation complexes containing a C-less 25- to 31-nt transcript were formed for 5 min at 37°C in a 20-μL reaction mixture containing a 100 nM concentration of the DNA template, ATP and GTP (40 μM each), 1 μM UTP, 50 μg/mL acetylated bovine serum albumin (BSA), 75 μg/mL (0.19 μM) B. subtilis RNAP holoenzyme, 0.38 μM SigA (housekeeping sigma factor), and 1 μCi of [α-32P]UTP (no CTP). RNAP and SigA were added from a 20× stock solution containing 1.5 mg/mL RNAP and 0.35 mg/mL SigA in the enzyme dilution buffer (20 mM Tris-HCl, pH 8.0, 40 mM KCl, 1 mM dithiothreitol, and 50% glycerol). The halted complexes were diluted to the volume required for a particular experiment with 1× transcription buffer, 100 μg/mL acetylated BSA, and KCl such that the final KCl concentration was 20 mM. Elongation was resumed at 23°C by the addition of all four nucleoside triphosphates (NTPs) together with 100 μg/mL heparin, 1 μM NusA, and/or 1 μM NusG. The incoming nucleotide following the pause position (+1) was maintained at a limiting final concentration of 25 μM, and the other three NTPs were maintained at 150 μM each. Aliquots of the transcription elongation reaction mixture were removed at various times. Transcription of the last aliquot (chase reaction) was continued for 10 min at 37°C with 0.5 mM (each) NTP. The samples were analyzed by fractionation through 6% polyacrylamide sequencing gels followed by phosphorimaging.
A PhosphorImager and ImageQuant software (Molecular Dynamics) were used for quantification. Three parameters were measured from each lane: the transcripts at the pause site, the total transcripts (RNA at the pause site plus all longer transcripts), and the background intensity. The background was subtracted from the other two parameters. Pausing efficiency was calculated as the fraction of RNA polymerase molecules that paused at a particular site. Pause half-lives were determined by plotting the decrease of relative pause band intensities over time.
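The quantification above can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used in the study. It assumes that the same background value is subtracted from both band measurements and that the decay of the paused fraction is a single exponential fitted by linear regression on log-transformed values. The function names and the use of the extrapolated zero-time fraction as an efficiency estimate are hypothetical.

```python
import math


def pause_fraction(pause, total, background):
    """Background-corrected fraction of transcripts at the pause site in one lane.

    Assumption: the same background value applies to both measurements.
    """
    return (pause - background) / (total - background)


def fit_pause_decay(times, fractions):
    """Fit ln(fraction) = ln(A) - k * t by least squares.

    Returns (A, half_life), where A is the fraction extrapolated to t = 0
    and half_life = ln(2) / k. Requires positive fractions that decrease
    over time. Assumption: single-exponential decay.
    """
    ys = [math.log(f) for f in fractions]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), math.log(2) / -slope


# Toy time course (seconds) with an ideal 10-s half-life.
toy_times = [0, 10, 20, 30]
toy_fractions = [0.8 * 0.5 ** (t / 10) for t in toy_times]
toy_A, toy_half_life = fit_pause_decay(toy_times, toy_fractions)
```

In practice only the time points after the pause band reaches its maximum would be included in such a fit, since earlier lanes still contain complexes that have not yet arrived at the pause site.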
In silico hairpin folding.
The 100 nt upstream of each NusA-stimulated pause peak were folded using the RNA batch fold algorithm. Multiple batches were run, with 5 nt eliminated from the 5′ end at each repetition, to capture any plausible hairpin. Folding was further validated via Mfold (25).
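The stepwise truncation scheme can be sketched as a small Python generator that produces the input sequences for each folding batch. The folding itself is not shown. The function name and the min_len stopping length are hypothetical, since the text does not state where the truncation series ends.

```python
def truncation_series(upstream, step=5, min_len=20):
    """Yield the upstream sequence with 0, step, 2*step, ... nt removed
    from its 5' end, keeping the 3' end (the pause site side) fixed.

    min_len is a hypothetical stopping length, not taken from the text.
    """
    for start in range(0, len(upstream) - min_len + 1, step):
        yield upstream[start:]


# Toy example: a 100-nt placeholder sequence ending in G.
toy_upstream = "A" * 99 + "G"
toy_batches = list(truncation_series(toy_upstream))
```

Each yielded sequence ends at the same 3′ nucleotide, so any hairpin predicted in a shorter sequence keeps its position relative to the pause site.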
Data availability.
RNET-seq data are available in the National Center for Biotechnology Information Sequence Read Archive (BioProject ID number PRJNA603835) and GEO (accession number GSE186285).
ACKNOWLEDGMENTS
Illumina sequencing was performed at the Pennsylvania State University genomics core facility.
This work was supported by NIH grant GM098399 (to P.B.) and the Intramural Research Program of the NIH National Cancer Institute (to M.K.).
Footnotes
[This article was published on 8 March 2022 with a typographical error in Discussion. The error was corrected in the current version, posted on 6 April 2022.]
Supplemental material is available online only.
Contributor Information
Paul Babitzke, Email: pxb28@psu.edu.
Elizabeth Anne Shank, University of Massachusetts Medical School.
REFERENCES
1. Kang J, Mishanina TV, Landick R, Darst SA. 2019. Mechanisms of transcriptional pausing in bacteria. J Mol Biol 431:4007–4029. doi:10.1016/j.jmb.2019.07.017.
2. Landick R. 2006. The regulatory roles and mechanisms of transcriptional pausing. Biochem Soc Trans 34:1062–1066. doi:10.1042/BST0341062.
3. Larson MH, Mooney RA, Peters JM, Windgassen T, Nayak D, Gross CA, Block SM, Greenleaf WJ, Landick R, Weissman JS. 2014. A pause sequence enriched at translation start sites drives transcription dynamics in vivo. Science 344:1042–1047. doi:10.1126/science.1251871.
4. Vvedenskaya IO, Vahedian-Movahed H, Bird JG, Knoblauch JG, Goldman SR, Zhang Y, Ebright RH, Nickels BE. 2014. Interactions between RNA polymerase and the “core recognition element” counteract pausing. Science 344:1285–1289. doi:10.1126/science.1253458.
5. Yakhnin AV, Babitzke P. 2014. NusG/Spt5: are there common functions of this ubiquitous transcription elongation factor? Curr Opin Microbiol 18:68–71. doi:10.1016/j.mib.2014.02.005.
6. Belogurov GA, Artsimovitch I. 2015. Regulation of transcript elongation. Annu Rev Microbiol 69:49–69. doi:10.1146/annurev-micro-091014-104047.
7. Landick R. 2021. Transcriptional pausing as a mediator of bacterial gene regulation. Annu Rev Microbiol 75:291–314. doi:10.1146/annurev-micro-051721-043826.
8. Landick R, Wang D, Chan CL. 1996. Quantitative analysis of transcriptional pausing by Escherichia coli RNA polymerase: his leader pause site as paradigm. Methods Enzymol 274:334–353. doi:10.1016/s0076-6879(96)74029-6.
9. Chan CL, Wang D, Landick R. 1997. Multiple interactions stabilize a single paused transcription intermediate in which hairpin to 3' end spacing distinguishes pause and termination pathways. J Mol Biol 268:54–68. doi:10.1006/jmbi.1997.0935.
10. Yakhnin AV, Babitzke P. 2002. NusA-stimulated RNA polymerase pausing and termination participates in the Bacillus subtilis trp operon attenuation mechanism in vitro. Proc Natl Acad Sci USA 99:11067–11072. doi:10.1073/pnas.162373299.
11. Yakhnin AV, Yakhnin H, Babitzke P. 2006. RNA polymerase pausing participates in the Bacillus subtilis trpE translation control mechanism by providing additional time for TRAP to bind to the nascent trp leader transcript. Mol Cell 24:547–557. doi:10.1016/j.molcel.2006.09.018.
12. Yakhnin AV, Babitzke P. 2010. Mechanism of NusG-stimulated pausing, hairpin-dependent pause site selection and intrinsic termination at overlapping pause and termination sites in the Bacillus subtilis trp leader. Mol Microbiol 76:690–705. doi:10.1111/j.1365-2958.2010.07126.x.
13. Zhang J, Landick R. 2016. A two-way street: regulatory interplay between RNA polymerase and nascent RNA structure. Trends Biochem Sci 41:293–310. doi:10.1016/j.tibs.2015.12.009.
14. Yakhnin AV, FitzGerald PC, McIntosh C, Yakhnin H, Kireeva M, Turek-Herman J, Mandell ZF, Kashlev M, Babitzke P. 2020. NusG controls transcription pausing and RNA polymerase translocation throughout the Bacillus subtilis genome. Proc Natl Acad Sci USA 117:21628–21636. doi:10.1073/pnas.2006873117.
15. Yakhnin H, Yakhnin AV, Mouery B, Mandell ZF, Karbasiafshar C, Kashlev M, Babitzke P. 2019. NusG-dependent RNA polymerase pausing and tylosin-dependent ribosome stalling lead to antibiotic resistance by inducing 23S rRNA methylation. mBio 10:e02665-19. doi:10.1128/mBio.02665-19.
16. Winkler ME, Yanofsky C. 1981. Pausing of RNA polymerase during in vitro transcription of the tryptophan operon leader region. Biochemistry 20:3738–3744. doi:10.1021/bi00516a011.
17. Artsimovitch I, Landick R. 2000. Pausing by bacterial RNA polymerase is mediated by mechanistically distinct classes of signals. Proc Natl Acad Sci USA 97:7090–7095. doi:10.1073/pnas.97.13.7090.
18. Yakhnin AV, Yakhnin H, Babitzke P. 2008. Function of the Bacillus subtilis transcription elongation factor NusG in hairpin-dependent RNA polymerase pausing in the trp leader. Proc Natl Acad Sci USA 105:16131–16136. doi:10.1073/pnas.0808842105.
19. Toulokhonov I, Artsimovitch I, Landick R. 2001. Allosteric control of RNA polymerase by a site that contacts nascent RNA hairpins. Science 292:730–733. doi:10.1126/science.1057738.
20. Mondal S, Yakhnin AV, Babitzke P. 2017. Modular organization of the NusA- and NusG-stimulated RNA polymerase pause signal that participates in the Bacillus subtilis trp operon attenuation mechanism. J Bacteriol 199:e00223-17. doi:10.1128/JB.00223-17.
21. Mondal S, Yakhnin AV, Sebastian A, Albert I, Babitzke P. 2016. NusA-dependent transcription termination prevents misregulation of global gene expression. Nat Microbiol 1:15007. doi:10.1038/nmicrobiol.2015.7.
22. Guo X, Myasnikov AG, Chen J, Crucifix C, Papai G, Takacs M, Schultz P, Weixlbaumer A. 2018. Structural basis for NusA stabilized transcriptional pausing. Mol Cell 69:816–827. doi:10.1016/j.molcel.2018.02.008.
23. Mandell ZF, Oshiro RT, Yakhnin AV, Vishwakarma R, Kashlev M, Kearns DB, Babitzke P. 2021. NusG is an intrinsic transcription termination factor that stimulates motility and coordinates gene expression with NusA. Elife 10:e61880. doi:10.7554/eLife.61880.
24. Imashimizu M, Takahashi H, Oshima T, McIntosh C, Bubunenko M, Court DL, Kashlev M. 2015. Visualizing translocation dynamics and nascent transcript errors in paused RNA polymerases in vivo. Genome Biol 16:98. doi:10.1186/s13059-015-0666-5.
25. Zuker M. 2003. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31:3406–3415. doi:10.1093/nar/gkg595.
26. Bailey TL, Johnson J, Grant CE, Noble WS. 2015. The MEME suite. Nucleic Acids Res 43:W39–W49. doi:10.1093/nar/gkv416.
27. Yakhnin AV, Murakami KS, Babitzke P. 2016. NusG is a sequence-specific RNA polymerase pause factor that binds to the non-template DNA within the paused transcription bubble. J Biol Chem 291:5299–5308. doi:10.1074/jbc.M115.704189.
28. Qi Y, Hulett FM. 1998. PhoP∼P and RNA polymerase σA holoenzyme are sufficient for transcription of Pho regulon promoters in Bacillus subtilis: PhoP∼P activator sites within the coding region stimulate transcription in vitro. Mol Microbiol 28:1187–1197. doi:10.1046/j.1365-2958.1998.00882.x.
29. Martin M. 2011. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17:10–12. doi:10.14806/ej.17.1.200.
30. Marić J. 2015. Long read RNA-seq mapper. Master thesis. University of Zagreb, Zagreb, Croatia.
31. Langmead B, Trapnell C, Pop M, Salzberg SL. 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25. doi:10.1186/gb-2009-10-3-r25.
32. Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. 2017. Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods 14:417–419. doi:10.1038/nmeth.4197.
33. Bailey TL, Elkan C. 1994. Fitting a mixture model by expectation maximization to discover motifs in biopolymers, p 28–36. In Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology. AAAI Press, Menlo Park, CA.
Supplementary Materials
Data Set S1. jb.00534-21-s0001.xlsx, XLSX file, 4.3 MB
Data Set S2. jb.00534-21-s0002.xlsx, XLSX file, 0.01 MB
Data Set S3. jb.00534-21-s0003.xlsx, XLSX file, 1.1 MB
Data Set S4. jb.00534-21-s0004.xlsx, XLSX file, 1.3 MB
Data Set S5. jb.00534-21-s0005.xlsx, XLSX file, 0.02 MB
Supplemental text; Tables S1-S3; Figures S1-S4. jb.00534-21-s0006.pdf, PDF file, 3.2 MB