Proceedings of the National Academy of Sciences of the United States of America. 2020 Aug 17;117(35):21628–21636. doi: 10.1073/pnas.2006873117

NusG controls transcription pausing and RNA polymerase translocation throughout the Bacillus subtilis genome

Alexander V Yakhnin a,b, Peter C FitzGerald c, Carl McIntosh c, Helen Yakhnin a, Maria Kireeva b, Joshua Turek-Herman b, Zachary F Mandell a, Mikhail Kashlev b,1, Paul Babitzke a,1
PMCID: PMC7474616  PMID: 32817529

Significance

Transcription can be transiently halted by pausing of RNA polymerase (RNAP), which provides additional time for gene regulatory events to occur. NusG is a universally conserved transcription elongation factor that was known to stimulate pausing at two positions in the Bacillus subtilis genome, both of which regulated expression of the downstream gene. Using genome-wide sequencing of nascent RNA, we identified thousands of pause sites in B. subtilis including 1,600 NusG-dependent pause sites. NusG induces pausing of RNAP in response to a conserved TTNTTT sequence motif in the nontemplate DNA strand within the paused transcription bubble. NusG-dependent pausing was confirmed at several pause sites in vitro. NusG-dependent pausing in the ribD riboswitch decreases the concentration of flavin mononucleotide required to regulate ribD expression.

Keywords: RNA polymerase pausing, NusG, RNET-seq, translocation register, riboswitch

Abstract

Transcription is punctuated by RNA polymerase (RNAP) pausing. These pauses provide time for diverse regulatory events that can modulate gene expression. Transcription elongation factors dramatically affect RNAP pausing in vitro, but the genome-wide role of such factors on pausing has not been examined. Using native elongating transcript sequencing followed by RNase digestion (RNET-seq), we analyzed RNAP pausing in Bacillus subtilis genome-wide and identified an extensive role of NusG in pausing. This universally conserved transcription elongation factor is known as Spt5 in archaeal and eukaryotic organisms. B. subtilis NusG shifts RNAP to the posttranslocation register and induces pausing at 1,600 sites containing a consensus TTNTTT motif in the nontemplate DNA strand within the paused transcription bubble. The TTNTTT motif is necessary but not sufficient for NusG-dependent pausing. Approximately one-fourth of these pause sites were localized to untranslated regions and could participate in posttranscription initiation control of gene expression as was previously shown for tlrB and the trpEDCFBA operon. Most of the remaining pause sites were identified in protein-coding sequences. NusG-dependent pausing was confirmed for all 10 pause sites that we tested in vitro. Putative pause hairpins were identified for 225 of the 342 strongest NusG-dependent pause sites, and some of these hairpins were shown to function in vitro. NusG-dependent pausing in the ribD riboswitch provides time for cotranscriptional binding of flavin mononucleotide, which decreases the concentration required for termination upstream of the ribD coding sequence. Our phylogenetic analysis implicates NusG-dependent pausing as a widespread mechanism in bacteria.


Transcription is regulated at the level of initiation, elongation, and termination. Highly processive elongation is punctuated by frequent pausing events, some of which are known to regulate gene expression. For example, pausing can allow synchronization of the RNA polymerase (RNAP) position with RNA folding and/or regulatory factor binding (1). NusG is a general transcription elongation factor conserved in all three domains of life; its homolog in archaea and eukaryotes is known as SPT5 (2). Binding of NusG to RNAP modulates the ability of the enzyme to respond to pause signals. In Escherichia coli, NusG exhibits antipausing activity (3). In contrast, NusG-dependent pausing is critical for regulation of the Bacillus subtilis trpEDCFBA and tlrB (yxjB) operons (Fig. 1B) (4, 5). The trp operon is regulated by transcription attenuation and translation repression mechanisms in response to tryptophan by TRAP, the trp RNA-binding attenuation protein. NusG-dependent pausing in the trp operon leader region provides additional time for tryptophan-activated TRAP to bind to the nascent trp transcript such that it promotes formation of an RNA hairpin that sequesters the trpE Shine–Dalgarno sequence (4, 6). tlrB encodes a 23S ribosomal RNA (rRNA) methyltransferase enzyme involved in resistance to the antibiotic tylosin (5). tlrB expression is regulated by translation attenuation, transcription attenuation, and translation repression mechanisms. NusG-dependent pausing in the tlrB leader region is required for tylosin-dependent induction of tlrB expression; pausing provides time for translation of a short leader peptide that is vital for this complex regulatory mechanism (5).

Fig. 1.

NusG-dependent pausing occurs genome-wide and is sequence-specific. (A) Volcano plot showing 3′ ends that decrease at least eight-fold in the ΔnusG strain. trpE, tlrB, and ribD pause sites are indicated. (B) TTNTTT sequence logo of NusG-dependent pause peaks is shared by previously known trpE and tlrB pause signals. (C and D) Pause sites in the trpE and ribD 5′ UTRs in WT and ΔnusG strains as they appear in IGV at two magnifications. Arrows indicate direction of transcription. Genome-aligned reads are in gray while mapped 3′ ends corresponding to the RNAP active site are in blue.

NusG-dependent pausing in the trp operon leader requires NusG interaction with a T-rich sequence in the nontemplate DNA (ntDNA) strand of the paused transcription bubble (7, 8). This T-rich sequence (TTTATTT) is conserved in the tlrB pause site (5). In addition to the critical T-rich sequence, NusG-dependent pausing at both pauses is stimulated by an RNA hairpin that forms 11 to 12 nucleotides upstream of the RNA 3′ end. Although these are the only two NusG-dependent pause sites that have been shown to regulate gene expression, we reasoned that NusG-dependent pausing could be a common occurrence in B. subtilis. In this work, we examined RNAP pausing transcriptome-wide in isogenic B. subtilis wild-type (WT) and nusG knockout (ΔnusG) strains using an improved native elongating transcript sequencing followed by RNase digestion (RNET-seq) technique that combines sequencing of nascent transcripts (NET-seq) with RNase-mediated footprinting, which allowed probing of the translocation state of RNAP at pause sites (9). We found that NusG induces pausing throughout the entire B. subtilis transcriptome at a consensus TTNTTT motif in the ntDNA strand. NusG also shifts RNAP to the posttranslocation register at these pause sites. We further demonstrate that NusG-dependent pausing in the ribD riboswitch decreases the concentration of flavin mononucleotide (FMN) required for transcription termination. Phylogenetic analysis revealed that the B. subtilis type of NusG is widespread among bacteria, thus implicating NusG-dependent pausing as a widespread mechanism as opposed to the E. coli type possessing antipausing activity. We also identified a large number of NusG-independent pauses caused by a T residue in a ntDNA strand next to the 3′ end of the nascent RNA (+1T) and some other sequence elements yet to be identified. Investigation of the robust pausing mechanism at these sites was beyond the scope of the present work.

Results

Sequence-Specific NusG-Dependent Pausing Occurs Genome-Wide.

We developed a robust method to identify paused RNAP throughout the B. subtilis genome. Coprecipitation of nascent RNA with chromosomal DNA, followed by RNase I digestion, eliminated all terminated (released) RNAs that compromised the data of other NET-seq protocols. We defined a pause peak as a single genomic position with a unique 3′ end with the count 50-fold greater than the median count of the surrounding region. Pause peaks in WT and ΔnusG strains are listed in Datasets S1 and S2, respectively. An average of 0.7 pause peaks per gene were identified for the 4,420 annotated genes in B. subtilis. RNAP pauses at three major and a few minor positions at the previously characterized NusG-dependent pause site in the tlrB leader region. Therefore, we considered adjacent pause peaks within an arbitrarily chosen 10-bp window as a single pause site to account for the observed heterogeneity of the 3′ RNA ends at such pause sites. In these instances, the middle peak was designated as the pause site. Thus, pause sites consist of either one or several adjacent 3′ RNA ends. All pause sites in WT and ΔnusG strains are listed in Datasets S3 and S4, respectively. Direct count of 3′ ends that changed in abundance at least eight-fold upon nusG knockout revealed that 690 pauses in 566 different genes or UTRs decreased in number, but only 61 pauses increased, which identified NusG as a global pause-stimulating factor in B. subtilis (Fig. 1A).
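The grouping of adjacent pause peaks into pause sites can be illustrated with a minimal Python sketch. It is not the authors' program: the function names are hypothetical, peak positions are assumed to lie on a single strand, a group is assumed to span less than the 10-bp window measured from its first peak, and the lower-middle peak is taken as the representative when a group contains an even number of peaks.

```python
from typing import List


def group_pause_peaks(positions: List[int], window: int = 10) -> List[List[int]]:
    """Group peak positions so that each group spans less than `window` bp.

    Assumption: the window is measured from the first (most upstream)
    peak of the group; the source does not specify this detail.
    """
    groups: List[List[int]] = []
    for pos in sorted(positions):
        if groups and pos - groups[-1][0] < window:
            groups[-1].append(pos)   # still within the window of the current site
        else:
            groups.append([pos])     # start a new pause site
    return groups


def representative_peak(group: List[int]) -> int:
    """Return the middle peak of a group (lower-middle for even sizes)."""
    return group[(len(group) - 1) // 2]


if __name__ == "__main__":
    sites = group_pause_peaks([100, 102, 105, 300, 420, 421])
    for site in sites:
        print(site, "->", representative_peak(site))
```

Under these assumptions a site with three adjacent peaks, such as the tlrB leader pause described above, is reported at its central peak, whereas an isolated peak is its own pause site.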

After exclusion of pause peaks corresponding to repeated sequences such as rRNA and transfer RNA (tRNA), we classified 1,599 pause peaks (1,363 pause sites) in the WT strain that were more than 50-fold reduced in the ΔnusG strain as NusG-dependent pause peaks (Dataset S5). Similarly, 1,593 pause peaks (1,427 pause sites) in the ΔnusG strain were absent in the WT strain (Dataset S6). We also identified 773 pause peaks (539 pause sites) that were shared by the WT (Dataset S7) and ΔnusG (Dataset S8) strains. Approximately 80% of NusG-dependent pause sites possess unique 3′ ends (a single pause peak, Dataset S9). A summary of information from Datasets S1–S9 is presented in SI Appendix, Table S1. Two NusG-dependent pause sites from our RNET-seq data are shown in Fig. 1 C and D. Approximately 70% of the pause peaks unique to WT or ΔnusG strains are located in protein-coding regions (SI Appendix, Fig. S1 A–C), and these pause peaks are distributed relatively evenly along open reading frames (ORFs) (SI Appendix, Fig. S2). Although lower in overall frequency, the density of all pause peaks (∼25%), and especially those shared by WT and ΔnusG strains (∼50%), is higher in untranslated regions (UTRs), which constitute only about one-tenth of the genome (SI Appendix, Fig. S1 A–C). The fraction of UTR-localized pause peaks increased to ∼75% among the top 100 strongest peaks unique to the ΔnusG strain or those shared by the WT and ΔnusG strains (SI Appendix, Fig. S1 D–F). The large number of pause sites in 5′ UTRs suggests that many of them could participate in attenuation or translation control mechanisms.

NusG-dependent pause peaks possess a distinct sequence logo consisting of an interrupted stretch of T residues (−11 TTNTTT −6) relative to the pause position at −1 (Fig. 1B and SI Appendix, Fig. S3A). This logo is even more pronounced for the top 100 strongest pause peaks based on score (SI Appendix, Fig. S3D) and for the top 100 pause peaks inside ORFs based on score (SI Appendix, Fig. S3B), but is less pronounced for UTR-located pause peaks (SI Appendix, Fig. S3C). We speculate that pausing in ORFs relies heavily on NusG to couple transcription and translation (10). In contrast, a lower frequency of NusG-dependent pauses in UTRs suggests that pausing in these regions occurs by more diverse and gene-specific mechanisms. The conserved TTNTTT sequence corresponds to residues within the ntDNA strand of the paused transcription bubble, which constitutes the known sequence-specific motif for NusG–ntDNA interaction (7, 8). Previously characterized NusG-dependent pause sites in the trpE and tlrB leaders contain the TTNTTT sequence motif (Fig. 1B) (5, 8). The degree of conservation of each T residue in the TTNTTT motif in vivo closely parallels the relative importance of each T residue previously determined for the trp leader pause site in vitro (7). NusG-dependent pause sites were found in the ribD and lysP (yvsH) riboswitches, suggesting that NusG could slow down RNAP to provide time for cotranscriptional binding of metabolites to nascent RNA.
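The position of the motif relative to the pause can be made concrete with a short Python sketch. It assumes that the nontemplate strand is given 5′ to 3′ as a string, that `pause_index` is the 0-based index of the RNA 3′ end (position −1), and that position −k therefore maps to index `pause_index − (k − 1)`. The function name is hypothetical.

```python
import re

# TTNTTT, where N is any base, written for the nontemplate DNA strand.
MOTIF = re.compile(r"TT[ACGT]TTT")


def has_nusg_motif(nt_strand: str, pause_index: int) -> bool:
    """Check for TTNTTT at positions -11 to -6 relative to the pause (-1).

    `pause_index` is the 0-based index of the -1 nucleotide (assumption:
    the sequence is the nontemplate strand, read 5' to 3').
    """
    start = pause_index - 10   # index of position -11
    end = pause_index - 4      # one past the index of position -6
    if start < 0 or end > len(nt_strand):
        return False           # window falls outside the supplied sequence
    return MOTIF.fullmatch(nt_strand[start:end].upper()) is not None


if __name__ == "__main__":
    seq = "GGGG" + "TTATTT" + "CACGA" + "TCC"   # toy sequence, not a real locus
    print(has_nusg_motif(seq, 14))   # motif occupies -11..-6 for this index
    print(has_nusg_motif(seq, 13))   # register shifted by one nucleotide
```

A positive result marks only a candidate site, since the motif by itself does not guarantee pausing in vivo.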

T at position +1 is the only conserved residue in all pauses observed in the WT strain (Fig. 1B) and those shared by the WT and ΔnusG strains (Fig. 2A). The finding that +1T is most pronounced in the sequence logo of pause peaks shared by the WT and ΔnusG strains (Fig. 2A and SI Appendix, Fig. S3J) suggests that +1T participates primarily in NusG-independent pausing. The role of +1T in pausing is consistent with the rate of transcription elongation in vitro at different NTP concentrations; RNAP incorporates UTP at a markedly slower rate than other NTPs (Fig. 2B). Perhaps +1T plays a similar role in B. subtilis as the elemental pause sequence plays in RNAP pausing in E. coli (9, 11, 12). RNAP does not pause at every T residue in B. subtilis, indicating that +1T pauses contain additional elements that the logo analysis could not identify.

Fig. 2.

Low rate of UTP incorporation compared to the other NTPs in vitro is consistent with RNAP pausing upstream of +1T in vivo. (A) +1T is the only conserved residue in the sequence logo of pause peaks shared by WT and ΔnusG strains. (B) Transcription was performed using a TTT-to-CAA mutant ribD template that eliminates pausing (see below). Where indicated, one NTP was held at a limiting concentration (10 µM), while the other three NTPs were at 150 µM. Time points of elongation were 15, 20, 30, 40, 55, 70, 90, 120, and 160 s. Positions of terminated (T) and run-off (R) transcripts are indicated.

Pausing at the trpE and tlrB NusG-dependent sites is stimulated several-fold in vitro by upstream pause hairpin structures that form in nascent RNA (5, 7). Therefore, we tested sequences upstream of the strongest pause peaks (score value above 200) in the −61 to −11 nucleotide (nt) window for the ability to form a putative pause hairpin using Mfold. Only pause sites with unique 3′ ends (a single pause peak) were analyzed to determine the distance between the potential hairpin and the 3′ end of the paused transcript (hairpin to a 3′-end distance). Two-thirds of such sites are predicted to form potential pause-stimulating hairpins with ∆G values between −20 and −3 kcal/mol and a hairpin to a 3′-end distance of 10 to 12 nt (Dataset S9).

Ten newly identified NusG-dependent pause sites, as well as 5 control sites containing the TTNTTT sequence motif that did not induce pausing in vivo, were tested in vitro. NusG-dependent pausing was observed for all 10 pause sites (Fig. 3), but not for the 5 control sites (SI Appendix, Table S2 and Fig. S4). Therefore, the TTNTTT motif alone is not sufficient for pausing. Mfold predicted a potential pause hairpin structure upstream of all 10 pause sites (SI Appendix, Table S3). The role of six of these structures in pausing was tested using DNA antisense oligonucleotides designed to prevent hairpin formation, which would lead to a reduction of the pause half-life (13). Hairpin-stimulated pausing was confirmed for the ribD, ykrK, yqxK, and sacX pause sites (SI Appendix, Fig. S5), but not for the lysP and yrhG pause sites. Note that the structure predicted upstream of the lysP pause site overlaps the lysine-binding aptamer domain and is probably incompatible with regulation by the lysP riboswitch.

Fig. 3.

In vitro transcription of NusG-dependent pause sites identified in vivo. Single-round transcription was performed using the indicated templates ± NusG. Time points of elongation are indicated above each lane. Ch, chase reactions. The positions of NusG-dependent pause bands are marked by red arrows. Lengths of paused and run-off transcripts are 212 and 354 nt for ribD, 266 and 334 nt for lysP, 153 and 191 nt for sspD, 171 and 227 nt for ykrK, 154 and 235 nt for ydzK, 134 and 183 nt for sfrAA, 117 and 160 nt for yqxK, 133 and 187 nt for sacX, 95 and 190 nt for yrhG, and 116 and 302 nt for ywkF. Additional nonchaseable bands for ribD (264 nt), lysP (294 nt), yqxK (114–115 nt), and ywkF (195 nt) are products of transcription termination.

NusG Controls the Translocation Register of RNAP.

An important feature of RNET-seq is RNase I-mediated footprinting of nascent RNA protected by RNAP at the 5′ end. The length of the protected fragments depends on the translocation register of RNAP (9). RNase I digestion resulted in RNA fragments that were two bases longer than those obtained from RNase T1 (SI Appendix, Fig. S6). The increased read length resulted in a higher fraction of reads that were uniquely mapped to the genome compared to the previously published RNase T1 data (9). The majority of reads in our libraries were 16 or 17 nt long. Based on previous RNET-seq results (5), we assigned the 16- and 17-nt reads to posttranslocated and pretranslocated states, respectively (Fig. 4A). Therefore, the relative amount of 16- and 17-nt reads at each genomic position was a proxy of the RNAP translocation register. A transcriptome-wide analysis using this 16- to 17-nt ratio indicated that paused RNAP was predominantly in the pretranslocated state. Among pause peaks shared by the WT and ΔnusG strains (i.e., NusG-independent pause peaks), the ratio of pause peaks that are predominantly in the posttranslocated state to the pause peaks that are predominantly in the pretranslocated state was approximately two-fold lower in the ΔnusG strain (Fig. 4B and SI Appendix, Fig. S7), and the overall fraction of posttranslocated 16-nt reads was also approximately two-fold lower in the ΔnusG strain (Fig. 4C). Therefore, NusG normally promotes forward translocation of RNAP, in agreement with in vitro data for E. coli NusG (3, 14, 15). The ratio of NusG-dependent pause peaks that are predominantly in the posttranslocated state to pause peaks that are predominantly in the pretranslocated state was 0.91 (Fig. 4B). Pause peaks in which the fraction of 16-nt reads is ≥0.5 constitute 30% of all posttranslocated pause peaks that are specific to the WT strain, but only 1% of the peaks that are specific to the ΔnusG strain (calculated from data in Datasets S5 and S6). This fraction of WT peaks with predominantly 16-nt reads possesses the strongest signature of the NusG-dependent peaks (SI Appendix, Fig. S3F), in sharp contrast to reads of all other lengths (SI Appendix, Fig. S3 E, G, H, and I), further demonstrating that NusG promotes forward translocation of RNAP.
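The use of read length as a proxy for the translocation register can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, the input is assumed to be a mapping from read length (nt) to read count at one pause peak, and the fraction of 16-nt reads is assumed to be computed over all reads at that peak.

```python
from typing import Dict


def fraction_16nt(read_lengths: Dict[int, int]) -> float:
    """Fraction of reads at a peak that are 16 nt long (posttranslocated)."""
    total = sum(read_lengths.values())
    if total == 0:
        raise ValueError("no reads at this peak")
    return read_lengths.get(16, 0) / total


def predominant_register(read_lengths: Dict[int, int]) -> str:
    """Compare 16-nt (posttranslocated) and 17-nt (pretranslocated) reads."""
    n16 = read_lengths.get(16, 0)
    n17 = read_lengths.get(17, 0)
    if n16 > n17:
        return "posttranslocated"
    if n17 > n16:
        return "pretranslocated"
    return "undetermined"   # equal counts; how ties were handled is not stated


if __name__ == "__main__":
    peak = {15: 5, 16: 60, 17: 30, 18: 5}   # toy counts, not real data
    print(fraction_16nt(peak), predominant_register(peak))
```

Counting the peaks in each class across a dataset and dividing the two counts would then give a post-to-pre ratio of the kind reported in Fig. 4B.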

Fig. 4.

NusG shifts RNAP to the posttranslocation register. (A) Length of RNase I-protected nascent transcripts depends on the RNAP translocation register. (B) NusG increases the fraction of RNAP in the posttranslocation register. (C) Distribution of read length at pause peaks in WT and ΔnusG strains. (D) NusG deficiency increases the fraction of backtracked complexes (>18 nt) genome-wide in pause peaks shared by WT and ΔnusG strains. Brown color indicates when WT and ΔnusG peaks overlap.

The 18-nt reads may include both pretranslocated and backtracked translocation states, whereas reads longer than 18 nt resulted primarily from backtracked elongation complexes (Fig. 4A). Reads longer than 18 nt constitute 1% of the reads at NusG-dependent pause peaks and 2% of the reads at pause peaks found only in the ΔnusG strain (Fig. 4C). Analysis of >18-nt reads at pause peaks shared by WT and ΔnusG strains showed that deletion of nusG increased backtracking (Fig. 4D). Our in vivo analysis is consistent with B. subtilis RNAP having a lower propensity to backtrack compared to RNAP from E. coli, as determined in vitro by the length of RNase-resistant fragments of nascent RNA (SI Appendix, Fig. S6; compare 20-nt reads in A and B).

RNAP paused consecutively at two or more adjacent positions at ∼20% of NusG-dependent sites. RNET-seq read length at such sites suggests an extreme forward translocation state of RNAP at the first pause position, as it was characterized by a high fraction of reads shorter than 16 nt. This hypertranslocated state progressively changed to more typical posttranslocation (16 nt) and pretranslocation (17 nt) states as RNAP advanced to the downstream pause position(s) (SI Appendix, Table S4). In each case, stronger pausing (higher read count values) at downstream positions likely resulted from higher stability of posttranslocated vs. hypertranslocated pause complexes. Hypertranslocation was also observed at single-peak pause sites in which the distance between the TTNTTT sequence motif and the unique pause 3′ end is 1 nt shorter than usual. This pausing pattern suggests that NusG can accommodate the TTNTTT sequence motif at several adjacent positions relative to the 3′ end of the nascent RNA and/or that the sequence context of pause sites could provide two or more overlapping TTNTTT motifs. The molecular mechanism of pausing through direct interaction of NusG with the TTNTTT sequence in the ntDNA strand (8) is at least partly responsible for the nonequal pause strength at adjacent positions in favor of the posttranslocation state.

NusG-Dependent Pausing Regulates the ribD Riboswitch.

We identified a strong hairpin-stimulated NusG-dependent pause site within the ribD riboswitch that binds the flavin nucleotides FMN and FAD (16, 17). Two pause sites (PA and PB) were observed previously in this B. subtilis riboswitch by transcription of the ribD leader in vitro using E. coli RNAP (18). The NusG-dependent pause site corresponds to PA, whereas we did not observe pausing at PB in vivo by RNET-seq or in vitro using B. subtilis RNAP. The ribD pause site constitutes a separate regulatory module between the FMN-binding aptamer and the intrinsic terminator (Fig. 5A), such that FMN binding did not affect NusG-dependent pausing (SI Appendix, Fig. S8; compare the left lanes in A with the right lanes in B). Importantly, NusG-dependent pausing increased the FMN-dependent termination efficiency about three-fold at a saturating concentration of FMN (SI Appendix, Fig. S8B). The requirement of the TTNTTT motif for NusG-dependent pausing was confirmed by substitution of the three consecutive T residues with CAA; this substitution resulted in the complete loss of NusG-dependent pausing (Fig. 5B). We also found that deletion of the pause hairpin resulted in a five-fold decrease in the pause half-life, whereas substitution of the original hairpin (ΔG = −9.5 kcal/mol) by a stronger hairpin from the trp leader (ΔG = −11.0 kcal/mol) resulted in a modest increase in the pause half-life in vitro (SI Appendix, Fig. S8A).

Fig. 5.

NusG-dependent pausing in the ribD riboswitch reduces the concentration of FMN required for termination. (A) Sequence and structure of the ribD riboswitch showing the FMN-binding aptamer (P1 to P6), pause site, pause hairpin, and alternative terminator and antiterminator (shaded) structures (data from ref. 18). C8U substitution introduced a 27-bp C-less cassette. The TTT-to-CAA substitutions in the TTNTTT motif (red) and disruption of the pause hairpin (Δ169 to 179, blue) are shown. (B) The TTNTTT motif is required for NusG-dependent pausing. Transcription was performed using WT or mutant templates. Positions of run-off (R), terminated (T), and paused (P) transcripts are indicated. Pause half-lives (T1/2) are shown below the gel. (C) Transcription was performed using the WT ribD template in the presence of the FMN concentration indicated above each lane. Termination efficiencies (% Term) are shown below each lane. Half-effective FMN concentration for termination (C1/2) is shown below each set of lanes. (D) Change in termination efficiency using WT and TTT → CAA mutant templates as a function of FMN concentration (±NusG).

NusG-dependent pausing in the FMN riboswitch may provide additional time for cotranscriptional binding of FMN to the nascent RNA. Consistent with this hypothesis, NusG decreased the FMN concentration required for half-maximal transcription termination by nine-fold (Fig. 5 C and D). NusG also increased the termination efficiency at saturating FMN concentrations with the WT template (Fig. 5 C and D), but not with a template containing the TTT-to-CAA mutation in the TTNTTT motif (SI Appendix, Fig. S9A). In the presence of NusG, the effect of the ribD pause hairpin on the half-effective FMN concentration for termination closely paralleled its effect on pausing (SI Appendix, Figs. S8A and S9B). The TTT-to-CAA mutation resulted in a two-fold increase in expression of a ribD-lacZ transcriptional fusion in vivo (SI Appendix, Table S5). We conclude that NusG-dependent pausing provides additional time for cotranscriptional binding of FMN to the nascent RNA, resulting in increased termination preceding the ribD-coding sequence. These findings are similar to the role of NusG-dependent pausing in the trpEDCFBA and tlrB (yxjB) operons where pausing provides time to switch expression of the operon on or off (5, 6).

Discussion

Our RNET-seq study demonstrated that NusG promotes forward translocation of RNAP. This study also led to the identification of 1,600 NusG-dependent pause sites in the B. subtilis genome. We previously showed that two of these pause sites regulate trpE and tlrB in vivo (4–6). Our ribD studies provide a third example in which NusG-dependent pausing in 5′ UTRs contributes to regulating gene expression. Since RNET-seq identified several hundred NusG-dependent pause sites in 5′ UTRs, it is likely that NusG participates in a large number of gene regulatory mechanisms. The role of the >1,000 NusG-dependent pause sites in ORFs will be addressed in future studies.

Almost all of the 1,600 pause sites that were identified by RNET-seq were located sufficiently downstream from transcription start sites to vacate promoters for binding of additional RNAP(s) upstream of the paused RNAP. However, we did not observe evidence of a queue of trailing RNAPs behind a leading paused RNAP even for exceptionally strong pauses in 5′ UTRs. The lack of a queue suggests that a trailing RNAP is capable of restoring elongation of the leading paused RNAP or that the paused RNAP resumes transcription prior to the arrival of the trailing RNAP. The frequency of these two possibilities would be dictated by the strength of the pause and the rate of transcription initiation.

A previous study provided evidence that NusG participates in an rRNA antitermination mechanism in E. coli (19). The absence of NusG in the ΔnusG strain did not change the transcription pattern of rRNA operons (SI Appendix, Fig. S10), indicating that NusG plays a negligible role in antitermination of rRNA transcription in B. subtilis. Another intriguing observation is that a cluster of tRNA genes between genomic positions 3,171,868 and 3,173,799 revealed that a quarter of the pause sites in this region are located at a T residue, corresponding to the invariable +8U that is important for positioning of the D stem relative to the stacked T and acceptor stems in mature tRNA (20). Perhaps pausing at these positions contributes to cotranscriptional processing of tRNA.

Visual inspection using IGV browser revealed that the NusG-independent pause site in antisense RNA transcribed from the xynA-pps intercistronic region contains a −10 promoter-like sequence motif. Therefore, some promoter-proximal NusG-independent pause sites could be SigA-dependent, reminiscent of the promoter-proximal pausing identified in E. coli (21). Another strong pause site shared by the WT and ΔnusG strains near the transcription start of mhqA likely results from a roadblock by the DNA-bound repressor MhqR, since this pause occurs just upstream of the known MhqR-binding site (22). In addition to multiple gene-specific pausing mechanisms, a weak −9G residue appears in the logo of NusG-independent pause peaks (SI Appendix, Fig. S3J). In the elongation complex, a strong base pair formed by −9G in nascent RNA at the upstream edge of the RNA–DNA hybrid impedes RNAP forward translocation (9), which is consistent with the high fraction of long reads at NusG-independent pause peaks.

Amino acid residues N81 and especially T82 in B. subtilis NusG are involved in stimulation of pausing in vitro by recognition of the TTNTTT pause motif. In contrast, E. coli NusG has residues S85 and V86 at these two positions and does not stimulate pausing at the TTNTTT sequence (8). The structure of the transcription antitermination complex confirmed close proximity of V86 in E. coli NusG to the −8 residue in the ntDNA strand that corresponds to the first T following N in the TTNTTT pause motif (23). Note that this is the same T residue in the trp operon pause site that crosslinked to NusG (8). We performed a protein database search for the region surrounding this dipeptide of NusG. The resulting phylogenetic tree of the taxonomic relationship of bacterial NusG-like homologs (SI Appendix, Fig. S11A) and the sequence logo of the NusG region of interest (SI Appendix, Fig. S11B) indicated that the B. subtilis type of NusG is widespread among bacteria, whereas the E. coli version of NusG is restricted primarily to γ-proteobacteria. Superimposition of the retrieved sequences on a 16S rRNA-based phylogenetic tree (24) revealed occasional horizontal transfer of nusG genes between unrelated bacterial species; however, each eubacterial NusG falls into either the B. subtilis type or the E. coli type (SI Appendix, Fig. S11C). Therefore, B. subtilis is a preferred model organism to explore the pause-stimulating activity of NusG in bacterial transcription. Moreover, since yeast Spt5 also interacts with the ntDNA strand within the transcription bubble (25), NusG/Spt5-dependent pausing might be a universally conserved mechanism shared by prokaryotes and eukaryotes.

Materials and Methods

Detailed protocols for construction of RNET-seq libraries, plasmids, and strains as well as the RNase footprinting assay are described in SI Appendix, Supplementary Text.

Outline of the RNET-Seq Protocol.

B. subtilis strains containing His10-tagged RNAP were grown in glucose-minimal medium. Addition of culture to a frozen slurry buffer stopped transcription rapidly by fast cooling and disruption of the cell membrane. Cells were pelleted by centrifugation and stored frozen at −80 °C. The cell suspension was digested with lysozyme and RNase I, the lysate was layered over a sucrose cushion, and the nucleoid along with the associated RNAP and nascent RNA were pelleted by centrifugation. The compact pellet below the sucrose cushion (SI Appendix, Fig. S12) was solubilized by digestion with DNases together with RNase I, and the resulting lysate was cleared by centrifugation. Native transcription elongation complexes were purified by immobilization with Ni-NTA agarose, extensive washing, and elution with imidazole. Nucleic acids were purified from transcription elongation complexes by phenol extraction/isopropanol precipitation and treated with DNase I. RNA was extracted with phenol and then precipitated with isopropanol.

Purified nascent RNA was ligated to a 5′-adenylated barcode DNA linker. The resulting RNA–DNA complexes were recovered by phenol extraction/isopropanol precipitation, reverse-transcribed, and digested with RNase H. The resulting cDNA was fractionated through a urea-polyacrylamide gel, followed by staining with SYBR Gold. Bands of the correct length were excised from the gel, and then the DNA was extracted with water and precipitated with isopropanol. The resulting single-stranded cDNA was circularized with CircLigase and used as a template for subsequent PCR amplification with a pair of adapters that contained Illumina i5 and i7 barcodes. The resulting sequencing libraries were purified by electrophoresis in native polyacrylamide gels. SYBR Gold-stained gels were imaged under ultraviolet light, and slices containing PCR products of interest were excised. DNA was extracted with water and precipitated with isopropanol. Library recovery was estimated by electrophoresis of DNA samples followed by SYBR Gold staining and quantification using reference DNA samples and ImageQuant software. The quality and concentration of DNA libraries were also determined using a Bioanalyzer DNA high-sensitivity chip and qPCR.

Sequencing and Data Analysis.

Pooled libraries were sequenced at the Penn State University Genomics Core Facility on a single Illumina HiSeq Rapid run using 50-nt single-end sequencing with custom sequencing Read 1 primer oLSC006, resulting in a total of 320 million reads. High-throughput sequencing data were deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) under BioProject ID PRJNA603835. The raw sequencing reads were processed as follows: The program cutadapt (v1.18) (26) was used to remove the adapter sequence GATCGGAAGAGCACACGTCTGAACTCCAGTCACATTACTCGATCTCGTATG from the 5′ end of the reads, and only reads between 14 and 30 bases were retained. Duplicate reads within each sample were removed using the clumpify routine from the BBmap (v38.34.0) suite of programs (http://bib.irb.hr/datoteka/773708.Josip_Maric_diplomski.pdf). Following this deduplication step, cutadapt was again used to remove the 6-bp barcode sequence from the 5′ end of each read. Following these preprocessing steps, the reads were aligned to the B. subtilis genome (NC_000964.3 Bacillus subtilis subsp. subtilis str. 168 complete genome) using bowtie (v1.2.2) (27), rejecting all reads that mapped more than once (parameters --best --strata -v 1 -m 1). A custom program built on the samtools/htslib framework was used to identify pause peaks in the uniquely mapped bam files. The selectivity of the program was determined by three values: a cutoff (c), the minimum number of reads required to define a peak; a ratio (r) of the highest point in the peak to the local baseline; and a window (w), the number of bases upstream and downstream of the putative peak over which the baseline was taken as the median value. Typical values were r = 50, w = 100 with cutoff value c = 394 for WT and 410 for ΔnusG strains, depending on the total number of reads in each sample.
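The peak-selection criteria can be illustrated with a minimal Python sketch. This is not the custom samtools/htslib program used in this study; the function name, the per-base count list used as input, and the flooring of a zero baseline at one read are assumptions made only for illustration. Every position is treated as a candidate, so adjacent positions within one peak may both be reported.

```python
from statistics import median


def call_pause_peaks(coverage, c, r, w):
    """Return positions that pass the cutoff (c) and ratio (r) tests.

    coverage: per-base read counts for one strand (hypothetical input).
    c: minimum number of reads required at the candidate position.
    r: minimum ratio of the candidate height to the local baseline.
    w: number of bases examined upstream and downstream of the candidate.
    """
    peaks = []
    for pos, height in enumerate(coverage):
        if height < c:
            continue
        # Baseline is the median of the flanking bases, excluding the candidate.
        upstream = coverage[max(0, pos - w):pos]
        downstream = coverage[pos + 1:pos + 1 + w]
        window = upstream + downstream
        if not window:
            continue
        baseline = median(window)
        # Assumption: a zero baseline is floored at 1 read to avoid dividing by zero.
        if height / max(baseline, 1) >= r:
            peaks.append(pos)
    return peaks
```

With c = 394, r = 50, and w = 100, a position carrying 500 reads on a flat background of 2 reads per base passes both tests (ratio 250), whereas the same position on a background of 20 reads per base does not (ratio 25).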

Identified peaks were further annotated with respect to their distance and strandedness relative to annotated genomic features. Weblogo (v3.6.0) (28) was used to generate sequence logos (motifs) for the 20 bp around the pause site. Estimates of the RNA expression levels from the RNET-seq data were determined using the pseudoaligner salmon (v0.12.0) (29) in quant mode. All of the above steps were encapsulated in a pair of workflows and deployed on the DNAnexus platform, thus ensuring reproducible results. These workflows produced highly interactive HTML tables that allow simplified manual exploration and validation of the data.
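A sequence logo summarizes the per-column base composition of aligned windows. The following Python sketch shows how such windows and column frequencies could be computed before being passed to a logo generator. It does not reproduce the DNAnexus workflows; the function names, the symmetric placement of the window around the pause position, and the restriction to plus-strand coordinates are assumptions.

```python
from collections import Counter


def window_around(genome, pause_pos, flank=10):
    """Extract 2 * flank bases centered on a 0-based pause position.

    Assumption: the window is symmetric and taken from the plus strand.
    """
    if pause_pos - flank < 0 or pause_pos + flank > len(genome):
        raise ValueError("window extends past the end of the sequence")
    return genome[pause_pos - flank:pause_pos + flank]


def column_frequencies(windows):
    """Return one {base: frequency} dictionary per aligned column."""
    if not windows:
        raise ValueError("at least one window is required")
    length = len(windows[0])
    if any(len(seq) != length for seq in windows):
        raise ValueError("all windows must have the same length")
    freqs = []
    for i in range(length):
        counts = Counter(seq[i] for seq in windows)
        freqs.append({base: counts.get(base, 0) / len(windows) for base in "ACGT"})
    return freqs
```

For four toy windows TTATTT, TTCTTT, TTGTTT, and TTTTTT, the first column is entirely T, whereas the third column contains each base at a frequency of 0.25, the pattern expected for the N position of a TTNTTT motif.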

The maximum-likelihood phylogeny (SI Appendix, Fig. S11 A and B) was constructed based on the result of a ClustalW multiple sequence alignment conducted on the recognition motif (VRXXP) of NusG homologs from 246 representative genera on the MEGA7 platform (8, 30, 31). These representative genera were identified from a BLASTp query for the top 10,000 homologs of the NusG recognition region (DDSWXXVRXXPXVXGFXG) where “X” indicates any amino acid with the boldface and underlined Xs corresponding to the critical NT motif in B. subtilis NusG (32). The recognition motif illustrating the sequence preference of these 246 NusG homologs was generated via the MEME suite (33). Tree annotation and display were created with the interactive tree of life web platform (iTOL) (34). Analysis of phylogeny of bacterial NusG proteins for SI Appendix, Fig. S11C, was performed using BLASTp with the filter of low complexity regions. Scoring parameters were PAM30 for Matrix and Existence: 7, Extension: 2 for Gap Costs. The nonredundant protein sequences (nr) database was searched with the BLASTp query sequence FPGYVLVXXVMXDDSWXXVRXXPXVXGFXG. The underlined dipeptide is SV in E. coli NusG and NT in B. subtilis NusG (8).
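The “X” wildcard notation in the query sequences can be illustrated with a short Python sketch that converts such a pattern into a regular expression. This is a simplified exact-match illustration and not a substitute for BLASTp, which scores alignments with a substitution matrix and gap costs; the function name and the test peptides are hypothetical.

```python
import re

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pattern_to_regex(query):
    """Compile a query in which 'X' stands for any single standard amino acid."""
    parts = []
    for ch in query:
        if ch == "X":
            parts.append("[" + AMINO_ACIDS + "]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))
```

For example, `pattern_to_regex("VRXXP").search(sequence)` returns a match object for any sequence containing VR, two arbitrary residues, and P in succession, and `None` otherwise.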

In Vitro Transcription.

Ten newly identified NusG-dependent pause sites with diverse pause strength and genetic context (intergenic and ORF regions) and five control sites containing the TTNTTT sequence motif that did not induce pausing in vivo were selected for in vitro analysis. Sites of interest were fused with a strong promoter followed by a C-less cassette and then were tested for pausing by single-round in vitro transcription. Except for ribD, templates for in vitro transcription were generated by PCR by merging two overlapping fragments. The first fragment, which contained the pause site of interest surrounded by its flanking regions, was amplified using B. subtilis chromosomal DNA as the template and a pair of site-specific primers. The second fragment (derived from the trp leader and common to all pause sites) contained a consensus promoter with an extended −10 region and a 29-nt C-less cassette (6). The ribD DNA fragment containing its promoter, 5′ UTR, and the first 18 codons (−39 to +350 relative to the start of ribD transcription) was PCR-amplified using B. subtilis chromosomal DNA as template and primers RibL For2 and RibL Rev. The primer RibL For2 (SI Appendix, Table S8) introduced seven point substitutions into the natural sequence, resulting in a consensus promoter with an extended −10 region, and a 27-nt C-less cassette. The PCR fragment was cloned into plasmid pTZ19 between the EcoRI and BamHI sites, resulting in pAY197. Pause motif mutation (TTT → CAA, pYH339), pause hairpin deletion (pAY198), or substitution with the trp leader pause hairpin (pAY200) were generated by site-directed mutagenesis using mutagenic oligos and pAY197 as template. Templates for transcription of WT ribD and its mutant derivatives were generated by PCR amplification of the plasmid templates described above using RibTZ Fr and RibL Rev primers (SI Appendix, Table S8).

Single-round in vitro transcription and data analysis were performed as described previously (4) with modifications. In the first step, halted elongation complexes containing a 29-nt transcript were formed for 5 min at 37 °C in a 20-µL reaction containing 50 to 100 nM DNA template, ATP and GTP (40 µM each), 1 µM UTP, 50 µg/mL acetylated bovine serum albumin (BSA), 75 µg/mL (0.19 µM) B. subtilis RNAP holoenzyme, 0.38 µM SigA (housekeeping sigma factor), and 1 µCi of [α-32P]UTP (no CTP). RNAP and SigA were added from a 20× stock solution containing 1.5 mg/mL RNAP and 0.35 mg/mL SigA in the enzyme dilution buffer (20 mM Tris⋅HCl, pH 8.0, 40 mM KCl, 1 mM dithiothreitol, and 50% glycerol). The halted complexes were diluted to a volume required for a particular experiment with 1× transcription buffer, 100 µg/mL acetylated BSA, and KCl such that the final KCl concentration was 17 mM. Elongation was resumed at 23 °C by the addition of all four NTPs together with 100 µg/mL heparin ± 1 µM NusG. The final NTP and KCl concentrations were 150 µM and 10 mM, respectively. Aliquots of the transcription elongation reaction were removed at various times. Transcription of the last aliquot (chase reaction) was continued for 10 min at 37 °C with 0.5 mM each NTP. Termination was tested for 10 min at 37 °C in a reaction containing 20 mM KCl.
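The relationship between the 20× enzyme stock and the working concentrations can be checked with a short calculation. The Python sketch below is only an arithmetic illustration; the function names are hypothetical, and the molecular masses it returns are values implied by the stated concentrations, not measured values.

```python
def dilute(stock_conc, fold):
    """Return the concentration after diluting a stock by the given fold."""
    return stock_conc / fold


def implied_mass_kda(mass_conc_ug_per_ml, molar_conc_um):
    """Return the molecular mass in kDa implied by paired concentrations.

    (µg/mL) / (µM) = (1e-3 g/L) / (1e-6 mol/L) = 1e3 g/mol = 1 kDa.
    """
    return mass_conc_ug_per_ml / molar_conc_um


# Stock concentrations converted from mg/mL to µg/mL.
rnap_final = dilute(1.5 * 1000, 20)   # RNAP working concentration in µg/mL
siga_final = dilute(0.35 * 1000, 20)  # SigA working concentration in µg/mL

print(rnap_final, round(implied_mass_kda(rnap_final, 0.19)))
print(siga_final, round(implied_mass_kda(siga_final, 0.38)))
```

Diluting the 1.5 mg/mL RNAP stock 20-fold gives the stated 75 µg/mL, and 75 µg/mL at 0.19 µM corresponds to an implied mass of roughly 395 kDa. The same calculation for SigA gives 17.5 µg/mL and roughly 46 kDa.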

β-Galactosidase Assay.

B. subtilis cultures were grown at 37 °C in minimal glucose–acid casein hydrolysate supplemented with 1 or 0.1 µM riboflavin until midexponential phase. β-Galactosidase activity was determined as described (35). Experiments were performed at least three times.

Supplementary Material

Supplementary File
pnas.2006873117.sd01.xlsx (461.3KB, xlsx)
Supplementary File
pnas.2006873117.sd02.xlsx (576.3KB, xlsx)
Supplementary File
pnas.2006873117.sd03.xlsx (466.6KB, xlsx)
Supplementary File
pnas.2006873117.sd04.xlsx (460.8KB, xlsx)
Supplementary File
pnas.2006873117.sd05.xlsx (266.5KB, xlsx)
Supplementary File
pnas.2006873117.sd06.xlsx (257.1KB, xlsx)
Supplementary File
pnas.2006873117.sd07.xlsx (151.9KB, xlsx)
Supplementary File
Supplementary File
pnas.2006873117.sd09.xlsx (211.7KB, xlsx)
Supplementary File
pnas.2006873117.sapp.pdf (11.2MB, pdf)

Acknowledgments

Illumina sequencing was performed at the Penn State Genomics Core Facility. We thank Alexander Mironov for providing B. subtilis strain ribB110. This work was supported by NIH Grant GM098399 (to P.B.) and the Intramural Research Program of the NIH National Cancer Institute (to M. Kashlev).

Footnotes

The authors declare no competing interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2006873117/-/DCSupplemental.

Data Availability.

A total of 10.3 Gb of sequencing data have been deposited in the National Center for Biotechnology Information Sequence Read Archive (BioProject ID PRJNA603835). All other study data are included in this article and SI Appendix.

References

  • 1. Landick R., The regulatory roles and mechanism of transcriptional pausing. Biochem. Soc. Trans. 34, 1062–1066 (2006).
  • 2. Yakhnin A. V., Babitzke P., NusG/Spt5: Are there common functions of this ubiquitous transcription elongation factor? Curr. Opin. Microbiol. 18, 68–71 (2014).
  • 3. Sevostyanova A., Belogurov G. A., Mooney R. A., Landick R., Artsimovitch I., The β subunit gate loop is required for RNA polymerase modification by RfaH and NusG. Mol. Cell 43, 253–262 (2011).
  • 4. Yakhnin A. V., Yakhnin H., Babitzke P., Function of the Bacillus subtilis transcription elongation factor NusG in hairpin-dependent RNA polymerase pausing in the trp leader. Proc. Natl. Acad. Sci. U.S.A. 105, 16131–16136 (2008).
  • 5. Yakhnin H. et al., NusG-dependent RNA polymerase pausing and tylosin-dependent ribosome stalling are required for tylosin resistance by inducing 23S rRNA methylation in Bacillus subtilis. MBio 10, e02665-19 (2019).
  • 6. Yakhnin A. V., Yakhnin H., Babitzke P., RNA polymerase pausing regulates translation initiation by providing additional time for TRAP-RNA interaction. Mol. Cell 24, 547–557 (2006).
  • 7. Yakhnin A. V., Babitzke P., Mechanism of NusG-stimulated pausing, hairpin-dependent pause site selection and intrinsic termination at overlapping pause and termination sites in the Bacillus subtilis trp leader. Mol. Microbiol. 76, 690–705 (2010).
  • 8. Yakhnin A. V., Murakami K. S., Babitzke P., NusG is a sequence-specific RNA polymerase pause factor that binds to the non-template DNA within the paused transcription bubble. J. Biol. Chem. 291, 5299–5308 (2016).
  • 9. Imashimizu M. et al., Visualizing translocation dynamics and nascent transcript errors in paused RNA polymerases in vivo. Genome Biol. 16, 98 (2015).
  • 10. Burmann B. M. et al., A NusE:NusG complex links transcription and translation. Science 328, 501–504 (2010).
  • 11. Larson M. H. et al., A pause sequence enriched at translation start sites drives transcription dynamics in vivo. Science 344, 1042–1047 (2014).
  • 12. Vvedenskaya I. O. et al., Interactions between RNA polymerase and the “core recognition element” counteract pausing. Science 344, 1285–1289 (2014).
  • 13. Yakhnin A. V., Babitzke P., NusA-stimulated RNA polymerase pausing and termination participates in the Bacillus subtilis trp operon attenuation mechanism in vitro. Proc. Natl. Acad. Sci. U.S.A. 99, 11067–11072 (2002).
  • 14. Bar-Nahum G. et al., A ratchet mechanism of transcription elongation and its control. Cell 120, 183–193 (2005).
  • 15. Herbert K. M. et al., E. coli NusG inhibits backtracking and accelerates pause-free transcription by promoting forward translocation of RNA polymerase. J. Mol. Biol. 399, 17–30 (2010).
  • 16. Mironov A. S. et al., Sensing small molecules by nascent RNA: A mechanism to control transcription in bacteria. Cell 111, 747–756 (2002).
  • 17. Winkler W. C., Cohen-Chalamish S., Breaker R. R., An mRNA structure that controls gene expression by binding FMN. Proc. Natl. Acad. Sci. U.S.A. 99, 15908–15913 (2002).
  • 18. Wickiser J. K., Winkler W. C., Breaker R. R., Crothers D. M., The speed of RNA transcription and metabolite binding kinetics operate an FMN riboswitch. Mol. Cell 18, 49–60 (2005).
  • 19. Torres M., Balada J. M., Zellars M., Squires C., Squires C. L., In vivo effect of NusB and NusG on rRNA transcription antitermination. J. Bacteriol. 186, 1304–1310 (2004).
  • 20. Metzler D. E., Metzler C. M., “The nucleic acids” in Biochemistry: The Chemical Reactions of Living Cells, Hayhurst J., ed., (Academic Press, ed. 2, 2001), pp. 203–279.
  • 21. Ring B. Z., Yarnell W. S., Roberts J. W., Function of E. coli RNA polymerase sigma factor sigma 70 in promoter-proximal pausing. Cell 86, 485–493 (1996).
  • 22. Töwe S. et al., The MarR-type repressor MhqR (YkvE) regulates multiple dioxygenases/glyoxalases and an azoreductase which confer resistance to 2-methylhydroquinone and catechol in Bacillus subtilis. Mol. Microbiol. 66, 40–54 (2007).
  • 23. Said N. et al., Structural basis for λN-dependent processive transcription antitermination. Nat. Microbiol. 2, 17062 (2017).
  • 24. Wittekindt N. E. et al., Nodeomics: Pathogen detection in vertebrate lymph nodes using meta-transcriptomics. PLoS One 5, e13432 (2010).
  • 25. Crickard J. B., Fu J., Reese J. C., Biochemical analysis of yeast suppressor of Ty 4/5 (Spt4/5) reveals the importance of nucleic acid interactions in the prevention of RNA Polymerase II arrest. J. Biol. Chem. 291, 9853–9870 (2016).
  • 26. Martin M., Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet. J. 17, 10–12 (2011).
  • 27. Langmead B., Trapnell C., Pop M., Salzberg S. L., Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
  • 28. Crooks G. E., Hon G., Chandonia J. M., Brenner S. E., WebLogo: A sequence logo generator. Genome Res. 14, 1188–1190 (2004).
  • 29. Patro R., Duggal G., Love M. I., Irizarry R. A., Kingsford C., Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419 (2017).
  • 30. Thompson J. D., Higgins D. G., Gibson T. J., CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680 (1994).
  • 31. Kumar S., Stecher G., Tamura K., MEGA7: Molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol. Biol. Evol. 33, 1870–1874 (2016).
  • 32. Altschul S. F., Gish W., Miller W., Myers E. W., Lipman D. J., Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
  • 33. Bailey T. L., Elkan C., Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol. Biol. 2, 28–36 (1994).
  • 34. Letunic I., Bork P., Interactive tree of life (iTOL) v3: An online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Res. 44, W242–W245 (2016).
  • 35. Du H., Babitzke P., trp RNA-binding attenuation protein-mediated long distance RNA refolding regulates translation of trpE in Bacillus subtilis. J. Biol. Chem. 273, 20494–20503 (1998).

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences