Author manuscript; available in PMC: 2014 Dec 8.
Published in final edited form as: Chem Rev. 2013 May 2;113(11):8604–8619. doi: 10.1021/cr400064k

NusG-Spt5 proteins - universal tools for transcription modification and communication

Sushil Kumar Tomar 1, Irina Artsimovitch 1
PMCID: PMC4259564  NIHMSID: NIHMS646093  PMID: 23638618

1. INTRODUCTION

DNA-dependent RNA polymerases (RNAPs) carry out synthesis of RNAs in all domains of life. Each round of transcription can be divided into three major steps 1. Initiation encompasses RNAP recruitment to double-stranded promoter DNA. This leads to DNA strand separation to form a transcription bubble in which a region around the start site (denoted as +1) is melted. After reiterative synthesis and release of short abortive RNAs, RNAP escapes from the promoter. During elongation, RNAP adds one nucleotide (nt) to the nascent RNA chain and moves forward along the DNA by one base, repeating this cycle thousands of times while remaining bound to both the DNA template and the growing RNA chain in a stable ternary transcription elongation complex (TEC; Figure 1). At a terminator, the TEC becomes massively destabilized by nucleic acid signals or proteins, which triggers the release of the RNA message. The transcription cycle is completed when RNAP dissociates from the DNA, after which it can rebind at another promoter.

Figure 1.


The main features of the TEC. Bacterial core RNAP (a pentasubunit α2ββ′ω complex) clamps around the nucleic acid chains using pincers composed of the mobile domains of the β and β′ subunits. The active site contains a catalytic Mg2+ ion and is composed of i and i+1 sub-sites. The RNA:DNA hybrid and the downstream DNA are buried in the RNAP, and only a short segment of the nontemplate DNA is exposed on the surface.

Some RNAPs are composed of a single polypeptide chain that can carry out the entire cycle either without accessory factors (as is the case for bacteriophage T7 2 and N4 3 RNAPs) or with the help of a single promoter-specificity factor (e.g., in the case of the yeast mitochondrial RNAP 4). However, all cellular genomes are transcribed by multisubunit enzymes that differ in their size and complexity (bacterial RNAPs commonly have five subunits, whereas eukaryotic and archaeal enzymes are composed of 12–17 polypeptide chains) but that have evolved from a common ancestor and share overall architecture and active-site organization, types of interactions with the nucleic acid chains, and molecular mechanism of catalysis 5,6. They also share requirements for regulatory proteins that define the boundaries of the transcription units, enable processive RNA synthesis, and rescue arrested transcription complexes 7.

Catalytically competent multisubunit core RNAPs can transcribe DNA templates faithfully and efficiently but do not recognize specific nucleic acid sequences with high affinity; instead, they rely on transiently bound sequence-specific factors to initiate (and sometimes terminate) transcription at defined positions. In Bacteria, σ factors, a family of proteins that recognize different consensus DNA elements, direct core enzymes to specific sets of promoters and facilitate DNA strand separation 8,9. Although bacterial σ factors are not homologous to the initiation factors utilized by more complex RNAPs, structural studies reveal some similarities in the details of their molecular interactions 10,11.

To begin processive RNA chain elongation, RNAP must break the specific interactions with the promoter DNA, which would otherwise prevent the transition from initiation to elongation 12. Since these interactions are mediated by the σ factor, the simplest mechanism entails abrupt dissociation of σ from the core triggered by a clash with the growing nascent RNA 13. However, σ release is not an obligatory step during promoter escape: while structural studies are consistent with such a clash, they also reveal that σ/core contacts could be broken sequentially (reviewed in 14), leading to slow and stochastic release of σ from the elongating RNAP 15. Among the many σ/core contacts, which encompass ~10⁴ Å² of surface area, interactions between σ region 2 and the clamp helices domain of the β′ subunit (β′CH) contribute the most to σ/core affinity and are structurally compatible with both the initiation and the elongation complexes. During initiation, interactions of σ region 2 with the −10 promoter element mediate DNA melting 11,16. During elongation, σ can bind to the β′CH and to a −10-like sequence exposed in the nontemplate DNA on the TEC surface, inducing strong RNAP pausing far away from a promoter 17. Similarly, translation initiation signals stall the ribosome within coding sequences 18. Thus, the absence of strong interactions with the template ensures unimpeded progression during chain synthesis. The lack of conservation among initiation factors may suggest that in the last universal common ancestor a linear genome was transcribed end-to-end. In addition to targeted initiation, sequence-specific initiation factors may also suppress spurious initiation from the end of a linear template 19.

The mechanistic requirements for termination are simpler, and RNA release can be triggered by nucleic acid signals such as an RNA hairpin followed by a run of U residues (in Escherichia coli and Bacillus subtilis 20) or by a U-tract alone (in Methanothermobacter thermautotrophicus 21); yet accessory factors that have a helicase or a translocase activity can be required for termination at other sites. In Bacteria, these classes of factors are represented by Rho, which limits expression of deleterious foreign DNA 22 and suppresses antisense transcription 23, and Mfd, a transcription-repair coupling factor which recruits nucleotide-excision repair machinery to RNAPs stalled at sites of DNA lesions 24. Unrelated proteins that play analogous roles have been characterized in other systems 25,26.

Surprisingly, despite the remarkable similarities in the structure, mechanism, and many regulatory challenges faced by all multisubunit RNAPs, the initiation, termination, and most other types of accessory factors utilized by these enzymes have no common evolutionary origin 7. The only exception is a class of NusG-like proteins (called Spt5 in Archaea and yeast and DSIF in humans) whose sequences, structures, binding sites on RNAP, and mechanisms of RNAP modification are universally conserved. These regulators are commonly viewed as transcription elongation factors 27,28, but they also regulate termination 22,23,29 and, as suggested by recent studies in Archaea, may modulate initiation 30. Another key role of these proteins is to tether the elongating RNAP to other macromolecular complexes (e.g., the leading ribosome 31,32 in Bacteria and Archaea vs mRNA capping enzyme 33 in eukaryotes), thereby enabling coordinated regulation of transcription and coupled processes.

This review focuses on regulation of gene expression by bacterial NusG and its paralogs, among which RfaH is the best characterized. Studies of RfaH and NusG identified the binding sites on the TEC 34,35 and the molecular mechanism of processivity 29 that are common to all members of their class 28,36. Analysis of the cellular targets and comparison of the regulatory properties of RfaH and NusG 37,38 revealed opposite effects on gene expression: while NusG, together with Rho, silences foreign DNA 22, RfaH strongly activates a subset of horizontally transferred genes 29,39. Most recently, the demonstration that the C-terminal domain (CTD) of RfaH undergoes a complete fold conversion led to the discovery of a class of transformer proteins 40, in which a dramatic change in protein structure 31 defines a new function. We will briefly describe the TEC structure and the mechanisms of pausing and termination, as these are essential for understanding how NusG works; see recent reviews for in-depth coverage of these topics 7,41–43. We will then focus on the structural and regulatory properties of E. coli RfaH and NusG, drawing comparisons to their homologs in other organisms to illustrate functional diversity within this family.

2. TRANSCRIPTION ELONGATION

2.1 TEC structure

All multisubunit RNAPs must synthesize RNAs with high fidelity and processivity while participating in regulatory interactions with diverse auxiliary factors; these interactions control initiation, elongation, and termination of RNA synthesis and determine the gene expression program of each cell. The highly conserved structures of the TECs reflect similar challenges faced by RNAPs during elongation: the TEC must be highly stable and processive at most template positions, transcribing through various obstacles (such as DNA-bound proteins), yet should halt RNA synthesis at specific regulatory pause signals and readily fall apart at terminators. Recent structural studies of the TEC revealed an atomic-resolution map of RNAP contacts with the nucleic acid chains and offered insights into the molecular basis of TEC dynamics 43. In the TEC, ~15 base pairs of the DNA duplex are bound in the downstream DNA-binding channel (Figure 1). The duplex is melted just ahead of the active site (at +2) to form an ~14-base-pair transcription bubble, in which the nontemplate DNA strand is partially exposed on the surface of the enzyme and the template DNA strand lies within the active-site cleft, with the acceptor template base (+1) available for base pairing with the incoming NTP substrate. The RNAP active site is accessible through a secondary channel, also called the substrate entry pore.

2.2 RNAP clamp dynamics

The RNAP structure resembles a crab claw 44 in which two pincers encircle a DNA-binding cleft with the RNAP active center positioned at its base. The clamp domain of the β′ subunit forms one pincer and the β lobe forms the other. In various crystal structures, the clamp has been observed in open and closed conformational states 11,45–47 arising from an ~20° rotation around the switch region located at the base of the clamp. Conformational dynamics of the β′ clamp have been proposed to play key roles throughout the transcription cycle. The clamp may open to permit loading of the DNA duplex into the RNAP active-site cleft during initiation 9, close to establish a tight grip on the template DNA during elongation 48, and then open again to allow release of DNA during termination 49.

Recent single-molecule FRET analysis 50 demonstrated that the clamp adopts different states in solution: the clamp is predominantly open in free RNAP and unstable closed promoter complexes but is closed in catalytically-competent open promoter complexes and TECs. These results supported the long-held view that clamp closure is required for the high stability of transcription complexes and that accessory factors that maintain the clamp in the closed state would enable synthesis of long RNAs 51. All RNAPs are obligatorily processive and cannot rebind a prematurely released nascent transcript, which can be tens of thousands of nucleotides long. Thus, regulation of the clamp dynamics is essential for productive transcription.

2.3 Discontinuous RNA chain elongation

A complete cycle of nucleotide addition consists of NTP binding, catalysis, pyrophosphate (PPi) release, and translocation. RNAP repeats this cycle many thousands of times to complete synthesis of the nascent RNA chain, while remaining bound to both the DNA template and the growing transcript. The cycle begins with NTP binding to the i+1 sub-site (also referred to as the insertion site or substrate site) of the post-translocated TEC, in which the 3′ end of the RNA is positioned in the i (or product) sub-site (Figure 1). Catalysis occurs by Mg2+-dependent, SN2-type nucleophilic attack of the RNA 3′ hydroxyl on the NTP α-phosphorus atom, displacing PPi. Catalysis involves stabilization of a trigonal-bipyramidal transition state 52 by two Mg2+ ions: a high-affinity Mg2+ I that resides in the active site and Mg2+ II that is delivered bound to the substrate NTP for each round of catalysis. After catalysis, the 3′ end is positioned in the pre-translocated state; forward translocation driven by thermal motions 53 gives rise to the post-translocated state, and the cycle is repeated.
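The ordering of the four steps can be illustrated as a toy state machine. This is a minimal sketch only, assuming a simplified representation in which the template is a string of template-strand bases listed in the order RNAP reads them; the class `ToyTEC`, its methods, and its state names are hypothetical and carry no kinetic information.

```python
# Template-strand base -> complementary RNA nucleotide
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


class ToyTEC:
    """Hypothetical state machine for the nucleotide-addition cycle (no kinetics)."""

    def __init__(self, template):
        self.template = template  # template-strand bases, in reading order (assumption)
        self.rna = []             # nascent RNA, 5'->3'
        self.pos = 0              # template base currently in the i+1 sub-site
        self.state = "post"       # post-translocated: i+1 sub-site is empty
        self.ntp = None

    def bind_ntp(self, ntp):
        # An NTP is accepted only by the post-translocated TEC and only if it
        # base-pairs with the template base in the i+1 sub-site.
        if self.state != "post" or ntp != TEMPLATE_TO_RNA[self.template[self.pos]]:
            return False
        self.ntp = ntp
        self.state = "ntp_bound"
        return True

    def catalyze(self):
        # 3'-OH attacks the NTP alpha-phosphate; the RNA grows by one nucleotide.
        assert self.state == "ntp_bound"
        self.rna.append(self.ntp)
        self.ntp = None
        self.state = "ppi_bound"

    def release_ppi(self):
        # After PPi leaves, the new 3' end still occupies the i+1 sub-site.
        assert self.state == "ppi_bound"
        self.state = "pre"

    def translocate(self):
        # Forward translocation by one base empties the i+1 sub-site again.
        assert self.state == "pre"
        self.pos += 1
        self.state = "post"

    def step(self, ntp):
        """Run one full cycle; return False if the NTP is rejected."""
        if not self.bind_ntp(ntp):
            return False
        self.catalyze()
        self.release_ppi()
        self.translocate()
        return True
```

For example, `ToyTEC("TACG")` accepts the NTPs A, U, G, C in turn and returns to the post-translocated state after each one, while a non-complementary NTP is rejected without changing the complex.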

While E. coli RNAP can extend the nascent RNA at 500 nt/sec in vitro 54, in vivo it moves at only 20–90 nt/sec 55 because RNA synthesis is hindered by various obstacles, including DNA-bound proteins, DNA lesions, and intrinsic signals encoded in the DNA template and the nascent RNA 1. Even at saturating NTP concentrations on naked, intact templates in vitro, RNAP moves in leaps, with its fast movement along the template punctuated by short-lived pauses 56 that trigger TEC isomerization into an unactivated state. This so-called ‘elemental’ pause intermediate 57 can slowly escape to the elongation pathway upon nucleotide addition, isomerize into long-lived paused states upon formation of a nascent RNA secondary structure (a pause RNA hairpin 58) or upon backward sliding (backtracking 59), or give rise to a termination complex (Figure 2). Formation of the elemental pause is accompanied by structural rearrangements near the active site that may include opening of the clamp 58,60. Factors that prevent opening of the clamp would block formation of the elemental pause and favor rapid elongation 51.
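How pauses pull the average rate well below the pause-free rate follows from simple arithmetic: the mean time per nucleotide is the pause-free step time plus the per-position pause probability times the mean pause duration. A minimal sketch, assuming independent pauses averaged over all positions; the pause probabilities and the 1-s pause duration below are hypothetical illustration values, not measurements.

```python
def effective_velocity(pause_free_nt_per_s, pause_prob, mean_pause_s):
    """Mean velocity (nt/s) of a TEC that steps at pause_free_nt_per_s but,
    at each template position, pauses with probability pause_prob for
    mean_pause_s seconds on average (hypothetical, position-averaged model)."""
    mean_time_per_nt = 1.0 / pause_free_nt_per_s + pause_prob * mean_pause_s
    return 1.0 / mean_time_per_nt


# Hypothetical pause frequencies on top of the 500 nt/s pause-free rate
for p in (0.0, 0.01, 0.02):
    v = effective_velocity(500.0, p, 1.0)
    print(f"pause prob {p:.2f}: {v:6.1f} nt/s, 10-kb operon in {10_000 / v:6.0f} s")
```

Under these assumptions, a 1% chance per position of a 1-s pause is enough to bring a 500 nt/s enzyme down to about 83 nt/s, inside the in vivo range quoted above.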

Figure 2.


Transcription cycle and isomerization into paused and termination states. The DNA template loads into the holoenzyme with the clamp open to form a closed promoter complex (not shown); following the strand separation, the clamp closes (in the open complex) and remains closed throughout elongation (in the TEC). At some sites, the clamp may partially open, facilitating rearrangements into the elemental pause state, which further isomerizes into pause and termination complexes.

Pausing plays numerous regulatory roles, is an obligatory step in termination pathways, and likely controls the overall rate of RNA chain elongation 41. A relatively slow rate may be necessary for timely recruitment of regulatory factors, attenuation control, co-transcriptional folding of the nascent RNA, and efficient coupling of transcription and translation in Bacteria and Archaea (see 41 for review). Most pauses are not mediated by specific interactions between the RNAP and the nucleic acids. However, core RNAP can recognize certain promoter elements during initiation 61, and pausing at some sites could be explained by specific RNAP/DNA interactions 62,63. Pausing may also be induced by proteins that recognize specific sequences. Factors that bind to double-stranded DNA in the path of elongating RNAP would hinder its progression, acting as roadblocks. Other factors, such as E. coli RfaH and σ70, may specifically interact with the bases in the nontemplate DNA strand exposed on the TEC surface to induce RNAP pausing 17,64; interestingly, RfaH and σ70 bind to the same site on RNAP, the β′CH domain, despite the absence of any homology 65.

2.4 Termination

To enable differential regulation of adjacent transcription units, RNA chain synthesis has to be stopped at defined sites called terminators. This is particularly important in Bacteria and Archaea which have compact genomes with short intergenic sequences; indeed, RNA release frequently occurs at just one or two nucleotide positions. How does the TEC, which can add tens of thousands of nucleotides without fault and has a half-life of days in vitro 66, become abruptly destabilized?

The decision between the elongation and termination pathways is kinetically controlled 67, and a dramatic destabilization is required to bring the dwell time of the TEC (>10⁵ s) into the characteristic range for nucleotide addition (<10⁻¹ s); at most template positions, the barrier to termination exceeds that to elongation by ~18 kcal/mol 68. In the TEC, the energetically costly maintenance of the ~17-nt bubble is balanced by two favorable contributions: the formation of the RNA:DNA hybrid, the major stability determinant 69, and RNAP interactions with the nascent RNA and with the downstream DNA 42.
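The size of the required destabilization can be estimated with a back-of-envelope conversion between rate ratios and free energies. This sketch assumes Arrhenius/Eyring behavior with an unchanged pre-exponential factor and a temperature of 37 °C; both are simplifying assumptions, not statements from the original analysis, and the helper names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def ddg_for_fold_change(fold_change, temp_k=310.15):
    """Free-energy change (kcal/mol) that alters a rate (or dwell time) by
    fold_change, assuming equal prefactors (Arrhenius/Eyring)."""
    return R_KCAL * temp_k * math.log(fold_change)


def fold_change_for_ddg(ddg_kcal, temp_k=310.15):
    """Inverse relation: rate ratio produced by a barrier difference of ddg_kcal."""
    return math.exp(ddg_kcal / (R_KCAL * temp_k))


# Dwell time must drop from ~1e5 s to ~1e-1 s, i.e., a 1e6-fold change
print(f"{ddg_for_fold_change(1e5 / 1e-1):.1f} kcal/mol")
# An ~18 kcal/mol barrier difference between termination and elongation
print(f"{fold_change_for_ddg(18.0):.1e}-fold rate ratio")
```

Under these assumptions, shortening the dwell time a million-fold corresponds to roughly 8.5 kcal/mol of destabilization, and an 18 kcal/mol barrier difference favors elongation by a factor of about 5 × 10¹² at a typical template position.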

Among prokaryotic systems, the mechanism of termination is best studied in E. coli, where termination signals can be grouped into two classes, intrinsic and factor-dependent (Figure 2; 42). At intrinsic sites, an RNA signal composed of a GC-rich RNA hairpin followed by a run of U residues triggers TEC dissociation in the absence of any accessory protein. Intrinsic terminators located at the ends of operons serve as punctuation signals, whereas those in untranslated leader regions control expression of downstream genes by attenuation 20. These signals can be readily identified by computational approaches and give rise to ~80% of RNA ends in E. coli 42. By contrast, RNA release at Rho-dependent sites depends on the action of Rho, a hexameric ATP-dependent translocase 70. These signals are composed of a 30+ nt pyrimidine-rich RNA element that serves as a Rho-loading site, rut, and a site of RNA release, which can be located hundreds of nucleotides downstream and frequently corresponds to a site at which RNAP pauses in the absence of Rho 55. This type of mechanism generates the 3′ ends of the remaining 20% of messages, which include both mRNAs and stable small RNAs 71. In addition, Rho carries out several essential quality-control tasks. It mediates polarity by interrupting transcription of damaged messages that cannot be translated 72 and suppresses the expression of horizontally acquired genes 22. Rho may help to resolve R-loops 73 and traffic jams created by slow, unmodified RNAPs in heavily transcribed rrn operons 74. Most importantly, Rho blocks antisense transcription 23, which is likely to be its essential function in E. coli.
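The intrinsic-terminator signature described above (a GC-rich hairpin immediately followed by a U-tract) is simple enough to scan for directly. The toy scanner below is a minimal sketch, not any published terminator-prediction algorithm: it assumes perfect Watson-Crick stems (no G·U pairs or bulges), uses the DNA alphabet with T standing in for U, and its stem, loop, GC, and tail thresholds are hypothetical choices.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def pairs_as_stem(seq, i, stem_len, loop_len):
    """True if seq[i:i+stem_len] is the exact reverse complement of the
    stem_len bases that follow a loop of loop_len bases (Watson-Crick only)."""
    left = seq[i:i + stem_len]
    j = i + stem_len + loop_len
    right = seq[j:j + stem_len]
    if len(left) < stem_len or len(right) < stem_len:
        return False
    return all(COMPLEMENT.get(a) == b for a, b in zip(left, reversed(right)))


def find_intrinsic_terminators(seq, stem_lens=range(5, 10), loop_lens=range(3, 9),
                               min_gc=0.6, tail_len=8, min_t=5):
    """Return (start, stem_len, loop_len, tail_start) for every GC-rich hairpin
    immediately followed by a T-rich (U-rich in RNA) tail. Thresholds are
    hypothetical."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq)):
        for stem_len in stem_lens:
            for loop_len in loop_lens:
                if not pairs_as_stem(seq, i, stem_len, loop_len):
                    continue
                stem = seq[i:i + stem_len]
                gc = (stem.count("G") + stem.count("C")) / stem_len
                tail_start = i + 2 * stem_len + loop_len
                tail = seq[tail_start:tail_start + tail_len]
                if gc >= min_gc and len(tail) == tail_len and tail.count("T") >= min_t:
                    hits.append((i, stem_len, loop_len, tail_start))
    return hits


# Hypothetical test sequence: GC-rich 7-bp stem, 4-nt loop, then a T-tract
stem = "GCCCGCC"
rev_comp = "".join(COMPLEMENT[b] for b in reversed(stem))  # "GGCGGGC"
demo = "AAAAAA" + stem + "TTCG" + rev_comp + "TTTTTTTA" + "AAAA"
print(find_intrinsic_terminators(demo))
```

A scan like this reports nested, overlapping hairpins around a single terminator; real predictors additionally score hairpin stability and tail quality, which this sketch deliberately omits.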

Both types of termination signals likely disrupt the same interactions that hold the TEC together. Three models of RNA release have been put forth 42. In the hybrid shearing model 69, the upstream portion of the RNA:DNA hybrid is disrupted by formation of the nascent hairpin or the helicase activity of Rho. In the allosteric model 75,76, changes in the active-site cleft induced by RNAP contacts with the hairpin or Rho weaken interactions with the RNA:DNA hybrid, and perhaps other nucleic acid chains. In the hypertranslocation model 77, RNAP is pushed forward in the absence of nucleotide addition, thereby enlarging the bubble.

2.5 Antitermination

Termination efficiency, and thus the expression of downstream genes, can be modulated by accessory proteins, RNAs, and small molecules 78. RNAP can ignore one, or many, termination signals in response to an antiterminator. In the first case, the action of the termination signal is compromised, e.g., by preventing the formation of a terminator hairpin 79, but the TEC properties are unaltered. In the second case, RNAP is modified into a processive antitermination state by a bound protein or RNA, and can ignore pause signals and many consecutive terminators located a few kilobases from the modification site 80.

Processive broad-specificity antitermination modification is commonly employed by bacteriophages; three mechanisms, utilized by the phage λ N 81 and Q proteins and by the nascent put RNA structure of phage HK022, have been characterized in molecular detail. These regulators allow RNAP to ignore all classes of pause and termination signals, enabling efficient expression of long phage operons during the lytic cycle. However, such runaway transcript elongation would make independent regulation of neighboring operons, insulated by hairpin-dependent terminators, impossible. Thus, one would expect that cellular antiterminators would increase RNAP processivity by reducing pausing and, consequently, Rho-mediated RNA release, but would not stabilize the TEC against dissociation. Indeed, S4 82 and RfaH 64 reduce RNAP pausing but do not have strong effects at intrinsic termination sites in vitro. RfaH dramatically reduces Rho-mediated polarity 29 but does not prevent RNAP dissociation at the end of an operon in vivo 38.

What is the molecular mechanism of antitermination? The branched mechanism of termination (Figure 3) suggests that an antitermination factor could inhibit TEC isomerization into the elemental pause conformation or act at a later step, by inhibiting the binding or action of Rho or the formation of the terminator hairpin. Antiterminators commonly target both steps. Bacteriophage λ N and Q proteins not only reduce pausing but also actively stabilize the TEC against dissociation at intrinsic terminators 83–85. This activity is perhaps mediated by their additional contacts with the nascent RNA or the RNA exit channel 85,86. By contrast, RfaH inhibits Rho-mediated termination by excluding the Rho cofactor NusG from the TEC 29,38 and by increasing translation 31.

Figure 3.


The branched mechanism of RNA chain elongation suggests several points of antiterminator (AT) action.

3. NUSG FAMILY OF REGULATORS

Recent studies of the E. coli NusG and its operon-specific paralog RfaH illustrated a universally conserved mechanism of RNAP processivity and highlighted the regulatory diversity of the proteins that constitute the NusG family, as well as their key roles in coordination of RNA synthesis with other cellular processes. Here, we will briefly review the history, structure and activities of these proteins.

3.1 NusG, a general transcription factor

NusG has been extensively studied in E. coli, whereas only fragmentary data on NusG homologs from other Bacteria are available. E. coli NusG is an abundant (~10,000 copies/cell 87) and essential 88 protein 181 residues in length. NusG was identified genetically 89 and biochemically 90,91 as a factor required for efficient λ N-mediated antitermination. Subsequent studies demonstrated that NusG directly interacts with RNAP, Rho, and ribosomal protein S10 32 (Figure 4), and possibly with the termination factor Nun 92. These contacts enable NusG to mediate its diverse, and sometimes opposite, effects on gene expression: NusG increases Rho-dependent termination 89 and RNA chain elongation rates 93,94, is essential for efficient transcription of rRNA genes 95 and Nun-dependent termination 96, and may couple transcription to translation 32.

Figure 4.


E. coli NusG interacts with RNAP via its NTD (in green) and with NusE (S10), Rho, and Nun via the CTD (in cyan) to control transcription and translation.

3.1.1. λ N antitermination complex

The first antitermination mechanism, mediated by the N protein, was discovered in bacteriophage λ. Roberts proposed that λ N instructs RNAP to ignore Rho-dependent terminators in the early genes 97, enabling transcription of the entire phage genome. Genetic screens for host proteins involved in λ N antitermination identified several N-utilization substances, NusA, NusB, NusE (the ribosomal protein S10), and NusG, that, together with N, assemble into a complex modifying RNAP to a termination-resistant form 98. Assembly of this complex also requires two elements in the nascent RNA, boxA and boxB, jointly called the nut site 99,100. Interactions of N with the boxB RNA hairpin and NusA are sufficient for antitermination in a minimal in vitro system 81, but long-range modification requires NusB/S10 contacts with boxA RNA 101. The highly processive antitermination complex, which is able to act far downstream from the nut site 91, is likely additionally stabilized by other interactions, such as those of S10 with RNAP 102 and NusG 32.

3.1.2. rRNA antitermination

RNAP transcribing the rRNA (rrn) genes moves twice as fast as the enzyme transcribing mRNAs and is resistant to the action of Rho 103. These properties are conferred by the assembly of a ribonucleoprotein complex that shares many features with the N-antitermination complex: rrn operons encode boxA and boxB elements, and all the Nus factors are required for efficient synthesis of rRNA 104. NusG and NusB are necessary for a high rate of rrn transcription 94, implying that the essential protein S10, which interacts with both NusG and NusB and is required for NusB binding to boxA 101, is also an integral part of the rrn antitermination complex. In addition, ribosomal protein S4, and perhaps other cellular factors, are involved 82, likely defining the key feature of the rrn complex: in contrast to the λ N system, RNAP is efficiently stopped by intrinsic terminators 105, a property that is essential to prevent read-through beyond the end of the rrn operon.

3.1.3. Pausing and elongation rate

Antitermination properties of NusG could be explained by its effects on RNA chain elongation. NusG increases the rate of elongation in vitro 93,106 and in vivo 94, and its effects are most pronounced at sites where RNAP is prone to backtracking 51. An apparent increase in the rate of RNA chain elongation could be due to suppression of RNAP isomerization into off-pathway states or to an acceleration of nucleotide addition along the main pathway or in the elemental pause intermediate (Figure 2). In ensemble experiments, NusG reduces the half-life of backtracked paused complexes, supporting a model in which NusG induces forward translocation of RNAP, thereby facilitating binding of the substrate NTP to the i+1 site 107–109. Recent single-molecule analysis 110 lent further support to this model, demonstrating that NusG modestly increases the pause-free elongation velocity and reduces the propensity of RNAP to pause transiently during elongation; stabilization of the pause-resistant, post-translocated state of the TEC by NusG is sufficient to explain this effect. In addition to disfavoring short-lived, frequent pauses, NusG had a more pronounced effect on long pauses, which are likely accompanied by backtracking 110. Some of the NusG effects were position-specific, raising the possibility of sequence-specific contacts with the nontemplate DNA 111; specific interactions may underlie the delay of RNAP by B. subtilis NusG 112.
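The translocation argument can be made concrete with a rapid-equilibrium Brownian-ratchet scheme, a common simplified model adopted here as an assumption: pre- and post-translocated TECs interconvert quickly with equilibrium constant K = [post]/[pre], the NTP binds only the post-translocated state, and NTP-bound complexes react at a single rate. All parameter values below are hypothetical, not measured NusG effects; raising K stands in for stabilization of the post-translocated state.

```python
def elongation_velocity(k_chem, kd_ntp_um, ntp_um, k_transloc):
    """Mean rate (nt/s) under rapid pre-equilibrium of three TEC states:
    pre-translocated (weight 1), post-translocated (weight k_transloc), and
    post-translocated with NTP bound (weight k_transloc * [NTP] / Kd).
    Only the NTP-bound state proceeds, at rate k_chem (1/s)."""
    w_bound = k_transloc * ntp_um / kd_ntp_um
    fraction_bound = w_bound / (1.0 + k_transloc + w_bound)
    return k_chem * fraction_bound


for ntp in (1000.0, 50.0):      # hypothetical NTP concentrations, uM
    for k in (1.0, 3.0):         # hypothetical K without / with a stabilizing factor
        v = elongation_velocity(100.0, 50.0, ntp, k)
        print(f"[NTP]={ntp:6.0f} uM, K={k:.0f}: {v:5.1f} nt/s")
```

With these illustrative numbers, tripling K raises the velocity only from ~91 to ~94 nt/s at 1 mM NTP but from ~33 to ~43 nt/s at 50 µM NTP, so in this model a shift in the translocation equilibrium matters most when NTP binding is limiting.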

3.1.4. Rho- and Nun-dependent termination

Paradoxically, despite its elongation-enhancing properties, NusG is also required for efficient factor-dependent termination by bacteriophage HK022 Nun and E. coli Rho proteins 113. The latter activity may represent the key cellular function of NusG which, together with Rho, limits expression of horizontally acquired genes 22 and antisense RNAs 23. NusG becomes largely dispensable in strains lacking toxic prophage genes 22.

NusG increases Rho-dependent termination at some sites in vivo 113 and in vitro 114 but does not increase Rho affinity or ATPase activity. In the presence of NusG, Rho releases shorter RNAs, acting at more promoter-proximal positions 87,114,115. The mechanism by which NusG increases Rho termination is still unclear. A recent study shows that NusG increases Rho termination only at weak rut elements characterized by a low C/G ratio 23, suggesting that NusG may favor Rho interactions with RNA; indeed, NusG slows the off-rate of Rho from the nascent transcript 116. However, this simple model is inconsistent with an earlier observation that NusG does not change the Rho concentration required for termination 114 and with recent studies which concluded that NusG facilitates TEC dissociation by Rho 117,118.
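Because rut strength is discussed here in terms of C/G ratio, a sliding-window calculation shows how such a ratio can be profiled along a transcript. This is a crude proxy, not a rut predictor: the 30-nt window echoes the 30+ nt rut size mentioned earlier, while the pseudocount and the example sequences are hypothetical choices.

```python
def cg_ratio(window):
    """C/G ratio of a window, with a pseudocount of 1 so G-free windows stay finite."""
    return (window.count("C") + 1) / (window.count("G") + 1)


def scan_cg_ratio(rna, window=30, step=1):
    """Slide a window along an RNA (T accepted as U); return (start, C/G ratio)."""
    rna = rna.upper().replace("T", "U")
    return [(i, cg_ratio(rna[i:i + window]))
            for i in range(0, len(rna) - window + 1, step)]


rut_like = "CCAUCUCCAC" * 3   # hypothetical C-rich, G-poor 30-mer
g_rich = "GCAUGGACGU" * 3     # hypothetical G-rich 30-mer
print(scan_cg_ratio(rut_like), scan_cg_ratio(g_rich))
```

In the model suggested by the cited study, windows with high ratios resemble strong rut elements, whereas low-ratio windows mark the weak sites at which NusG would be expected to matter.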

3.1.5. NusG and translation

A role for NusG in translation has been suggested by the presence of the KOW motif, a signature of several ribosomal proteins, in NusG 119 and by an observation that the translation rate decreased after the depletion of NusG 94, which also leads to deceleration of transcription. Recent studies by Nudler and coauthors demonstrated that the rates of translation and transcription are highly correlated: the slowdown of translation induced by the addition of antibiotics or rare codons in the mRNA leads to a concomitant slowdown of transcription 99. A model derived from this study suggests that a trailing ribosome could push a backtracked RNAP forward to prevent pausing and maintain rapid and processive RNA synthesis 99.

Coincident work by Rösch and collaborators revealed that, in addition to its role in preventing backtracking, NusG may serve as a direct physical link between transcription and translation 32. These authors reported an NMR structure of the NusG-CTD/S10/NusB complex that could explain how NusG bridges the RNAP and the ribosome on the one hand, and participates in the assembly of antitermination complexes on the other. S10 is not stable in isolation and can exist in the cell either as a heterodimer with NusB or as a component of the 30S ribosomal subunit. NusG-CTD and NusB interact with opposite faces of S10 (Figure 5), and S10 contacts with the ribosome and NusB are mutually exclusive. S10 is well suited to serve as a tether: it is deeply anchored in the ribosome via a long loop 120, making it one of the most strongly bound ribosomal proteins. This study established that a link between transcription and translation is feasible and suggested that other NusG paralogs, which share the conserved S10-binding interface, could couple the two processes. Future studies will address the details, timing, and regulatory functions of these contacts.

Figure 5.


The ribosomal protein S10 simultaneously interacts with NusB and NusG CTD. A structural superposition generated from NusE:NusB (PDB: 3D3B) and NusG CTD:S10 (PDB: 2KVQ) complexes.

3.2 RfaH, a regulator of virulence and fertility genes

3.2.1. RfaH discovery

RfaH was discovered among the collection of spontaneous rough mutants of Salmonella typhimurium deficient in lipopolysaccharide (LPS) core biosynthesis. Since rfaH mutants lacked galactosyltransferase activity, the product of the rfaH gene was tentatively categorized as an enzyme that adds a galactose unit to glucose-modified LPS 121. Subsequent work has demonstrated, however, that rfaH mutations caused defects in activities of several LPS glycosyltransferases, suggesting that RfaH is a positive regulator of synthesis or activity of these enzymes, rather than a transferase 122. Mutations in rfaH have been isolated independently in E. coli as sfrB 123 and hlyT 124 alleles, functions required for the expression of the F plasmid tra operon and the hly operon for synthesis and secretion of α-hemolysin. Mutations in sfrB conferred pleiotropic phenotypes, from altered sensitivity to bacteriophages and antibiotics to synthesis of nonfunctional flagella, suggesting defects in the cell envelope 123.

Analysis of tra operon expression revealed that SfrB was required to block premature transcription termination at Rho-dependent sites 125. The same study showed that sfrB mutations reduced synthesis of full-length LPS, several outer membrane proteins, and functional flagella, identifying RfaH as a broad-specificity antitermination factor regulating both plasmid- and chromosomally encoded genes 125. Later studies demonstrated that RfaH activates expression of the CNF-1 toxin 126, group II and III polysaccharide capsules 127,128, O-antigen 129,130, and hemin receptor 131, but represses Ag43, thereby inhibiting biofilm formation 132.

3.2.2. RfaH target genes

All known RfaH targets share several common features. First, they encode proteins involved in the synthesis, transport, or assembly of membrane and extracellular components 39. Second, they share a common motif, called ops, in their upstream segments. Third, they comprise long horizontally transferred operons; many of these genes are located in pathogenicity islands or on plasmids. Consistently, inactivation of rfaH dramatically reduces virulence of E. coli 133 and Salmonella typhimurium 134 while increasing invasiveness into antigen-presenting cells and intracellular killing, making the rfaH mutant a promising live attenuated vaccine 134.

ChIP-chip analysis of RfaH association with E. coli genes 38 revealed that RfaH is recruited to RNAP at the ops site, at the same position as observed in vitro (within the resolution of this analysis), and remains associated with the TEC throughout transcription of the entire operon (Figure 6).

Figure 6.

RfaH-controlled rfa and rfb operons; promoters are indicated by bent arrows, the length (in kb) is shown below. Relative occupancies of RNAP (gray), Rho (light blue) and RfaH (red) along the operon are plotted below. Vertical arrows indicate the positions of ops elements; the number in parentheses corresponds to the distance relative to the nearest ORF.

Similarities in the localization and functions of proteins encoded by RfaH-controlled genes suggest that RfaH expression could be induced by environmental cues, for example, upon interactions with host cells. Very little is currently known about the control of RfaH expression. In Salmonella, rfaH expression increases at the onset of stationary phase and depends on the alternative σ factors RpoS and RpoN 135. Interestingly, expression of the rfaH homolog anfA1 from the cryptic pathogenic plasmid pADAP in Serratia entomophila also depends on RpoS 136.

3.2.3. The ops element and its role in RfaH recruitment

Comparison of several RfaH-controlled operons identified an 8-bp sequence motif, GGCGGTAG 137. Deletion of this motif from its position upstream of the hlyCABD operon strongly increased polarity, mimicking the phenotype of the rfaH deletion 124. Conversely, placing this element downstream from a heterologous promoter increased the expression of genes located far downstream 136. Hence, this element was termed ops (for operon polarity suppressor) and proposed to mediate RfaH recruitment 39,137.
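Because ops is defined as a short consensus sequence, locating candidate sites in a sequence reduces to a simple string scan. The sketch below is illustrative only: the function name `find_ops`, the mismatch tolerance, and the example leader sequence are hypothetical, and the code scans only the strand it is given (assumed to be the nontemplate strand, 5′→3′). Because single substitutions in ops can strongly weaken RfaH function, a permissive mismatch threshold would also flag sites that may be inactive.

```python
from typing import List, Tuple

# Consensus ops motif from the text, written 5'->3' on the nontemplate strand.
OPS_CONSENSUS = "GGCGGTAG"


def find_ops(seq: str, max_mismatches: int = 0) -> List[Tuple[int, str, int]]:
    """Scan one strand (assumed to be the nontemplate strand, 5'->3') for windows
    that differ from OPS_CONSENSUS at no more than max_mismatches positions.

    Returns a list of (0-based start, matched window, mismatch count).
    """
    seq = seq.upper()
    k = len(OPS_CONSENSUS)
    hits: List[Tuple[int, str, int]] = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        # Hamming distance between this window and the consensus
        mismatches = sum(1 for a, b in zip(window, OPS_CONSENSUS) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


# Hypothetical leader sequence, invented for illustration (not from any real operon).
leader = "ATTACGGCGGTAGCTTAAGGCGGTTGCA"
print(find_ops(leader))                    # exact matches only
print(find_ops(leader, max_mismatches=1))  # also report near-matches
```

Scanning a genomic sequence on both strands would additionally require searching the reverse complement; the sketch leaves that out.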

The ops element comprises part of the JUMPstart sequence identified earlier in operons encoding polysaccharide biosynthesis 138; ops is conserved in many Bacteria (from Yersinia enterocolitica and Vibrio cholerae to Halomonas maura 139), and rfaH genes from several species are interchangeable in vivo and in vitro 140.

The presence of ops reduces the concentration of RfaH required to suppress transcription polarity in a partially purified in vitro system 137, suggesting that ops mediates RfaH recruitment to RNAP. Given that ops must be positioned downstream from a promoter to elicit anti-polar effects, it could act either as DNA or as RNA; however, RfaH does not bind to either DNA or RNA and lacks recognizable nucleic acid-binding motifs. These observations, together with the identification of RfaH as a homolog of NusG 39, which participates in the assembly of transcription antitermination complexes, led Koronakis and colleagues to propose that ops interacts with an unknown cellular factor 141 that tethers RfaH to RNAP, which is consequently modified into a processive state 39.

Interestingly, however, RNAP was shown to pause in vivo 142 and in vitro 51 three nucleotides downstream from the ops element, which positions the conserved sequence inside the transcription bubble. If the ops-paused RNAP serves as the target for RfaH recruitment, as observed during the assembly of other antitermination complexes, ops could act either indirectly, by altering RNAP conformation, or directly, by establishing base-specific contacts with RfaH or another factor. In the second case, only the nontemplate DNA exposed on the RNAP surface would be available for interactions (Figure 1). Indeed, RfaH directly crosslinks to the nontemplate DNA and binds to any heteroduplex TEC in which the nontemplate strand (but not the RNA or the template strand) carries the ops sequence 64. This mode of recruitment may be used by other proteins that target the nontemplate strand transiently exposed on the TEC surface, for example, activation-induced cytidine deaminase 143, which is required for somatic hypermutation of immunoglobulin genes.

The ops element may play several roles in the RfaH mechanism. First, ops induces efficient pausing by RNAP, possibly providing sufficient time for RfaH recruitment to the TEC. RfaH must bind to a short fragment of the nontemplate strand, which would be exposed for only a fraction of a second on a rapidly moving RNAP; once RNAP moves past the ops site, it can no longer be modified by RfaH. Second, ops likely makes base-specific contacts with RfaH. Although a structure of RfaH bound to the TEC is not yet available, the dramatic effects of point mutations in ops and of single substitutions in RfaH on RfaH/ops interactions 34, together with crosslinking data 64 and molecular modeling 35, support the existence of such contacts. Third, ops may induce conformational changes in the TEC that are required for RfaH binding or antitermination modification. For example, ops-induced scrunching of the nontemplate DNA could favor both RfaH binding to the TEC and escape of RfaH-bound RNAP from the ops site 43, much as scrunching facilitates disruption of σ/DNA interactions during RNAP escape from the promoter 12. Finally, ops restricts RfaH action to a small set of operons, thereby avoiding interference with the essential NusG 38: RfaH and NusG share the same binding site on the TEC, but only RfaH requires a specific site for recruitment 35.
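A back-of-the-envelope estimate illustrates why the window for RfaH binding to an unpaused TEC is so short. The numbers are illustrative assumptions, not values from the cited studies: take the motif to remain accessible while RNAP translocates roughly one motif length (L ≈ 8 nt, the length of the ops consensus) at an assumed unpaused elongation rate of v ≈ 50 nt/s.

$$
t_{\mathrm{exposed}} \approx \frac{L}{v} = \frac{8\ \mathrm{nt}}{50\ \mathrm{nt\,s^{-1}}} = 0.16\ \mathrm{s}
$$

Under these assumptions, any pause at ops lasting longer than this extends the binding window accordingly, which is the sense in which ops-induced pausing could give RfaH time to be recruited.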

3.3. Structural features of the NusG regulators

3.3.1. Domain organization

Proteins from the NusG family follow a common theme of domain organization. In general, all homologs contain a single NGN (NusG-like N-terminal) domain 144, followed by one or more CTDs containing a KOW motif 119 (Figure 7a). This architecture makes NusG a versatile module that can interact with both nucleic acids and proteins 145. The N-terminal domain (NTD) binds to RNAP and nucleic acids to mediate effects on RNA synthesis, while the KOW domain functions as a tether bridging the transcription apparatus and other cellular machineries, such as Rho or the ribosome 92. E. coli NusG represents a minimal system that contains a single copy of each of these domains 92, whereas other bacterial proteins, for instance Aquifex aeolicus NusG, also carry an additional domain that may perform species-specific regulatory functions 145. By contrast, eukaryotic proteins contain multiple KOW domains (3–6, although the number and assignment of these domains remain debatable 146). Eukaryotic NusG homologs have also acquired additional domains that meet the regulatory requirements of higher organisms; these domains may be subject to modifications (such as phosphorylation) and serve as docking sites for other proteins 27.

Figure 7.

Domain organization and phylogeny of NusG proteins. (a) Domain organization of bacterial, archaeal and eukaryotic NusG proteins. (b) A simple phylogenetic tree, inferred using the Neighbor-Joining method 187 in the MEGA5 program 188, illustrates the evolutionary history of the NusG family.

3.3.2. Structural conservation of NusG proteins

The sequence, core structure and mechanism of RNAP modification are well conserved among all NusG homologs 147, consistent with the strong evolutionary relationships among these proteins 144 (Figure 7b). Simpler monomeric proteins (present in extant Bacteria) gave rise to a more complex heterodimeric system in eukaryotes and Archaea, in which the NusG homolog Spt5 forms a heterodimer with the small zinc-binding protein Spt4 148 or its homolog RpoE 149, respectively; RpoE was earlier considered to be a subunit of archaeal RNAP 150.

Structures of NusG homologs from all three domains of life are available 35,92,145,150–152 (Figure 8). The NGN domain resembles ribonucleoprotein domain motifs 145 and is composed of a four-stranded antiparallel β-sheet covered by two helices on one side and by a third helix on the other side, forming an α-β-α sandwich. The single helix serves as a connecting region between the NGN and KOW domains and interacts with several conserved hydrophobic residues that could contact RNA bases 145. In archaeal NusG, the NGN is shortened by a 20-residue deletion of a long β-hairpin loop between two β-strands. The KOW fold 119 is commonly observed contacting RNA in several RNA-binding proteins. Despite suggestions that both NusG domains may interact with RNA, neither domain has been shown to have RNA-binding activity, and only one NusG homolog, from Thermotoga maritima, binds to RNA (and DNA), nonspecifically 153.

Figure 8.

Structures of (a) E. coli NusG (PDB: 2KO6, 2JVV); (b) A. aeolicus NusG (PDB: 2KO6, 2JVV); (c) E. coli RfaH (PDB: 2OUG) and (d) Pyrococcus furiosus Spt5-Spt4 (PDB: 3P8B). (e) A structural superposition of NGN domains including human DSIF (PDB: 3H7H) reveals high structural conservation among these proteins.

In most NusG-like proteins, the KOW domain adopts a highly bent, mostly β-stranded barrel fold that is connected to the NGN by a flexible linker. By contrast, the structure of the RfaH CTD is unique (Figure 8c): in the free protein, it is folded as an α-helical hairpin that is stabilized by hydrophobic interactions with the NTD 35. This α-helical fold is unstable, and the CTD refolds into the characteristic β-barrel KOW structure upon domain dissociation, triggered either by recruitment to RNAP or by weakening of the interdomain contacts 31.

3.3.3. The autoinhibited state of RfaH

These differences in domain organization underlie important regulatory differences in the recruitment of NusG-like proteins to the TEC. RfaH is a highly specialized regulator whose action is limited to a small set of E. coli operons 38. RfaH-like proteins are present in only some bacterial species 38 and are absent in Archaea and eukaryotes. In free RfaH, the two domains form an interface stabilized by conserved hydrophobic interactions; this interface masks the RNAP-binding site located on the NGN and must be destabilized to allow RfaH binding to the TEC 35. RfaH activation via domain separation is thought to be triggered by binding to the ops-paused RNAP; consistent with this, the isolated NTD does not require ops for binding to, and modification of, the elongating RNAP 38. In contrast, NusG and Spt5 are essential and ubiquitous general elongation factors: they do not display sequence-specific effects on transcription and are associated with RNAP transcribing numerous genes 37,154. This is because the RNAP-binding surface on the NGN domain of NusG is exposed 145, obviating the need for activation and allowing binding to any TEC.

A similar closed (spring-loaded) state in which the NusG NTD and CTD interact to bury a hydrophobic RNAP-binding site on the NTD was proposed based on the crystal structure of A. aeolicus NusG 151. However, this model is inconsistent with an independent crystal structure of A. aeolicus NusG 145, solution structures of A. aeolicus, E. coli, and Thermus thermophilus NusGs 92,152,155, and functional studies of E. coli NusG 92.

3.3.4. Functional regions of RfaH and NusG

Homology between RfaH and NusG 39 and their common antipausing effects in vitro 64,93 implied that both factors may interact with the TEC similarly. Even though NusG has been studied much more extensively, it was the analysis of RfaH that led to the identification of the NTD determinants responsible for interactions with the TEC. Two properties of RfaH facilitated its mechanistic analysis and served as constraints in building a heterologous model of E. coli RfaH bound to the T. thermophilus TEC (Figure 9): (i) RfaH directly interacts with the ops element in the nontemplate DNA strand 64 and (ii) the isolated RfaH NTD recognizes ops and mediates antipausing 35. In the model, the NTD, positioned to contact the exposed segment of the nontemplate strand, was within interacting distance of the tip of the β′ CH domain, the major binding site for σ factors (Figure 9a). This model is supported by several observations: the RfaH NTD efficiently competes with σ70 for binding to the β′ CH during elongation 65; substitutions of several basic residues predicted to interact with the DNA compromise ops binding 34; and substitutions of the hydrophobic residues in RfaH and in the β′ CH inhibit RfaH association with the TEC 35,65.

Figure 9.

RfaH interactions with the TEC. (a) A model of RfaH NTD (green) bound to the TEC. The RNAP is shown as a gray surface; the β′ bridge helix, β′CH and βGL are shown as cartoons. The RNAP active site is marked by the Mg2+ ion (magenta sphere). The template DNA, the nontemplate DNA, and the RNA are colored in black, blue, and red, respectively. (b) A close-up view with the side chains making the key interactions with the nontemplate DNA, β′CH and βGL shown as surfaces and colored in blue, orange and magenta, respectively.

All transcriptional activities of RfaH are mediated by the NTD. Despite its small size (90 residues), the NTD contains three separate regions that mediate DNA recognition, retention on the RNAP throughout transcription, and antipausing modification (Figure 9b).

DNA binding

The RfaH NTD binds to a rather unusual target, the single-stranded nontemplate DNA exposed on the RNAP surface, and lacks recognizable DNA-binding motifs 64. Mutational analysis identified five polar and charged residues (Lys10, Arg16, His20, Thr72 and Arg73) that are required for the RfaH-induced delay at ops, which is presumed to arise from persistent RfaH/DNA interactions. These residues are the least conserved between RfaH and NusG.

β′ CH binding

The antipausing activity of RfaH depends on its stable contacts to RNAP. Molecular modeling and mutational analysis of the nearby β′ residues suggested that a hydrophobic patch on the RfaH NTD interacts with the tip of the β′ CH 35. Substitutions of several RfaH residues (Trp4, Tyr54, and Phe56) decreased its activity at low RfaH concentrations, consistent with the reduced affinity of the altered proteins 34.

β gate loop (β GL) binding

In a simple scenario, RfaH interactions with the β′ CH and the nontemplate DNA at the upstream part of the bubble would be sufficient to promote strand reannealing, favoring translocation and reducing pausing. However, mutational analysis revealed a third motif (HTT) that was required for antipausing but dispensable for RfaH binding to the TEC 34. In the RfaH/TEC model, the HTT motif makes a contact with the β gate loop (β GL, Figure 9a). We showed that the β GL, which belongs to the β pincer that likely moves in concert with the β′ clamp during isomerization into a paused state 58, is essential for antipausing by RfaH and NusG 29.

In vitro analysis of NusG variants with substitutions of residues in the homologous patch 92 and two-hybrid assays 156 support the notion that the N-terminal domains of RfaH and NusG bind to the β′ CH similarly 38. Defects conferred by substitutions of Phe65 and Tyr68 (the latter corresponds to Tyr54 in RfaH) were consistent with a reduced affinity for the TEC 92. Furthermore, the NusG NTD functionally interacts with the β GL 29. The phenotypes of the NTD mutants, competition between RfaH and NusG 38, and structural modeling suggest that RfaH, NusG and Spt5 bind to the same sites on the TEC 36. This indicates that the RfaH and NusG NTDs play identical roles in RNAP modification. However, RfaH makes tighter contacts with RNAP owing to a larger hydrophobic area on the RfaH NTD, which allows RfaH to remain associated with RNAP throughout transcription 38.

By contrast, the CTDs determine the protein-specific properties of RfaH and NusG. Both CTDs make similar contacts to S10 31,32, which are thought to mediate cross-talk with the translation apparatus as well as NusG assembly into the rrn antitermination complex. The remaining functions of the CTD are very different. In NusG, the CTD directly interacts with Rho and perhaps with Nun 92, and may establish additional contacts with other cellular factors. In RfaH, the CTD prevents spurious recruitment to the TEC at non-ops sites 35 by establishing a closed, autoinhibited state, but it is not known to make any other contacts, including contacts with Rho 31.

Mutational studies of E. coli NusG identified residues that are necessary for Rho- and Nun-dependent termination (Figure 10). Analysis of NusG variants defective in λ exclusion by Nun identified a cluster of CTD residues (Phe144, Asn145 and Phe165) that could constitute a Nun-binding site 92,157; substitutions of other residues (Trp9, Trp80, Ile103, and Val178) are more likely to exert indirect effects on Nun function. The pattern of Rho-defective NusG mutants is more complex. The residues at which substitutions conferring defects in Rho binding and function have been isolated (Gly146, Val148, Leu158, Val160, and Ile164) are mostly buried in the CTD 92,117, which would have to undergo conformational changes to allow direct contacts between these residues and Rho. The effects of these variants could therefore be indirect, arising from structural changes in the CTD, and additional studies will be required to dissect the molecular details of NusG/Rho interactions.

Figure 10.

Functional interactions of the NusG CTD. Residues whose substitutions confer defects in Nun function are shown in red (F144, N145 and F165); residues whose substitutions confer defects in Rho function are shown in blue (V148, L158, V160 and I164).

3.4. RfaH and NusG in other Bacteria

Many Bacteria contain several NusG homologs, among which one can be viewed as a general transcription factor that carries out housekeeping functions, while others perform some dedicated tasks; we termed the latter class specialized NusGs, or NusGSP 38. This view was based on E. coli, where the essential NusG regulates expression of many genes whereas the dispensable RfaH is specifically targeted to a limited set of genes that are involved in extracytoplasmic functions, including virulence and fertility.

In the case of E. coli RfaH, the basis for this specificity is well understood: the ops element is necessary and sufficient for defining the RfaH regulon 38. However, even in E. coli, additional uncharacterized NusG paralogs exist, such as ActX, which is encoded on the large conjugative multidrug resistance plasmid pOLA52 158. RfaH itself regulates the expression of the conjugative functions encoded by the F plasmid 123; the homology of ActX to RfaH and the location of actX on an autonomous genetic element suggest that ActX could likewise be involved in transcriptional control, but such a function remains to be demonstrated.

Studies of other NusGSP homologs support a common mode of action: activation of the expression of long operons by antitermination. Induction of S. entomophila AnfA1, which is highly similar to RfaH 159, allowed expression of afp, an 18-ORF anti-feeding prophage gene cluster that encodes a virus-like structure thought to mediate the transport of toxins that cause grass grub larvae to cease feeding 160. Strikingly, a perfect ops element is located 108 bp upstream of the first ORF, indicating that AnfA1 may share the fine details of TEC recruitment with RfaH. Similarly, distant RfaH orthologs from Klebsiella pneumoniae, V. cholerae, and Y. enterocolitica readily recognized ops in vitro and complemented the disruption of the E. coli rfaH gene 140. In Myxococcus xanthus, enzymes required for the production of the antibiotic TA (myxovirescin) are encoded within a 36-kb TA gene cluster. Expression of this operon requires the first gene in the cluster, taA 161. TaA is homologous to RfaH and has been proposed to act like it; TaA is essential for antibiotic production but not for growth and development 161. In Bacteroides fragilis, expression of eight different capsular polysaccharide operons is controlled by antitermination; each operon is controlled separately by a protein encoded by its first gene 162. Although the detailed mechanism of these UpxY proteins remains unknown, the available data suggest that they belong to the NusGSP family of regulators 163.

NusGSP proteins control expression of highly specialized targets, and could thus be expected to differ in fine details of their mechanisms. However, even though “general” NusGs are ubiquitous, their properties may also differ. For example, NusG is not essential in B. subtilis 164 or Staphylococcus aureus 165, and, unlike the E. coli NusG, B. subtilis NusG strongly increases pausing at some regulatory sites in vivo and in vitro 166. T. thermophilus NusG does not bind to Rho 152 and slows down rather than facilitates elongation by its cognate RNAP, even though it binds to the nontemplate DNA strand, competes with σA, and induces forward translocation by RNAP 111, properties shared with E. coli RfaH and NusG. NusGs may also differ in their regulatory interactions: T. maritima NusG is the only member of this family known to bind nonspecifically and cooperatively to nucleic acids 153, and one of three NusG paralogs in Thermoanaerobacter tengcongensis interacts with DnaA 167.

Striking structural similarity among NGN domains from all three domains of life (Figure 8e) implies a similar molecular mechanism of RNAP control. However, NusG-like proteins can be very versatile and include factors controlling just one operon (AnfA1 and TaA), a small set of operons with related functions (RfaH), or many operons (NusG). The specificity of recruitment to the TEC is likely controlled by contacts to the nontemplate DNA, but the details of recruitment remain to be determined for any protein from this family.

4. REGULATION OF RHO-MEDIATED POLARITY

In Bacteria, Rho surveys mRNA quality: if translation is inefficient, Rho releases the nascent RNA prematurely 72, establishing polarity along operons. Polarity is characteristic of all bacterial operons but is particularly pronounced in poorly translated, horizontally transferred operons. Regulation of Rho-mediated termination constitutes the main function of RfaH and NusG, which, despite their similar effects on RNA chain elongation in vitro, play opposite regulatory roles in the cell.

NusG directly binds to Rho 92 and stimulates RNA release by Rho 116 to repress horizontally acquired DNA 22 and antisense transcription 23. Although NusG can also inhibit polarity, acting as part of the rrn antitermination complex 104 or as a bridge between RNAP and the leading ribosome 32, its essential role is to silence foreign DNA 22. However, some horizontally transferred genes that play important cellular roles have to be protected from the joint action of Rho and NusG. In E. coli, RfaH has evolved to play this role. Most RfaH-controlled operons are horizontally transferred 168 or plasmid-encoded 39. Observations that rfaH defects are suppressed by mutations in rho 169 and that RfaH almost completely abolishes Rho-mediated polarity in the rfb operon 29 support a model in which the sole function of RfaH is to counteract Rho-mediated polarity in long, poorly translated operons.

Since RfaH and NusG bind to the same site on the TEC, a mechanism that ensures that both proteins are targeted to their respective genes must exist in the cell (Figure 11). Our studies demonstrated that, in contrast to NusG, which associates with RNAP transcribing many operons 37, RfaH targets only those few operons that contain an ops sequence 38. This specificity is dictated by the need for RfaH activation by ops (see above). RfaH recognizes its target DNA even in the presence of a large excess of NusG, likely because ops sequences are located near promoters, whereas NusG is recruited to the TEC further downstream 37. Once bound, RfaH prevents NusG loading 38.
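The specificity argument above can be made concrete with a deliberately simplified toy model. This is a hypothetical sketch, not a model from the cited work: the function `loaded_factor`, its parameters, and the NusG recruitment distance of 300 nt are illustrative assumptions. It encodes three rules stated in the text: RfaH loads only at an ops site, NusG loads only some distance downstream of the promoter, and whichever factor occupies the shared binding site first excludes the other.

```python
from typing import Iterable, Optional


def loaded_factor(ops_positions: Iterable[int], transcript_length: int,
                  nusg_recruit_pos: int = 300) -> Optional[str]:
    """Toy rule-based model (hypothetical) of which NusG-family factor ends up
    on a TEC transcribing one operon.

    Assumptions (illustrative, not measured values):
    - RfaH can load only while RNAP is at an ops site (the ops-paused TEC).
    - NusG can load at any position >= nusg_recruit_pos nt from the start site.
    - Both factors share one binding site, so the first one to load stays bound
      and excludes the other for the rest of the operon.
    """
    ops = set(ops_positions)
    for pos in range(transcript_length):
        if pos in ops:
            return "RfaH"  # activated at ops; blocks later NusG loading
        if pos >= nusg_recruit_pos:
            return "NusG"  # default loading on a TEC that met no ops upstream
    return None  # transcript too short for either factor in this toy model


print(loaded_factor([40], transcript_length=5000))
print(loaded_factor([1000], transcript_length=5000))
```

In this toy model, a promoter-proximal ops site lets RfaH occupy the TEC first, whereas the same site placed downstream of the assumed NusG recruitment point loses to NusG, illustrating why the near-promoter location of ops could let RfaH win despite an excess of NusG.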

Figure 11.

NusG and RfaH regulons. NusG exists in an open, active state and can bind to RNAP at all sites. Once bound to the β′CH domain, the NusG CTD interacts with Rho. In contrast, RfaH requires domain dissociation to convert a closed, autoinhibited state into an open state capable of interactions with RNAP.

RfaH has only a modest inhibitory effect on Rho-dependent termination in vitro but has a dramatic (hundredfold) effect on Rho-mediated polarity within the rfb operon in vivo 29. RfaH could interfere with Rho action directly, but evidence in support of this model is lacking. Instead, our data suggest that RfaH uses three distinct mechanisms to inhibit Rho-dependent termination.

4.1. Antipausing modification of RNAP

Pausing is a prelude to termination, and Rho is known to preferentially target paused RNAPs 170. One mode of RfaH action is to modify RNAP into a pause-resistant state 109 by restricting the mobility of the clamp, thereby inhibiting pausing and, in turn, termination 29. This model was proposed based on the identification of the β′CH as the RfaH-binding site and then confirmed by identification of the βGL as the second contact site on the TEC 29. We proposed that RfaH binds simultaneously to the β′CH and βGL located on the opposite sides of the main channel, locking the clamp in a closed state and preventing structural rearrangements that occur during transcriptional pausing 58. By bridging the gap between the β′ clamp and the β lobe domains, RfaH completes a proteinaceous ring around the DNA template, thereby increasing RNAP processivity (Figure 11).

Structural studies of archaeal Spt5 proteins bound to their targets on RNAP 147,171 provided strong support for the hypothetical model of RfaH/TEC, suggesting that the clamping mechanism is shared by all NusG-like proteins 7,28. However, we demonstrated that this mode of action makes only a minor (two- to six-fold) contribution to the anti-polar effects of RfaH 29. The remaining two mechanisms (Figure 13) are specific to RfaH.
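The size of these contributions can be compared with simple arithmetic, under an assumption that is ours rather than the source's: that the mechanisms act independently, so that their fold effects on polarity multiply. Taking the roughly hundredfold effect of RfaH on Rho-mediated polarity in the rfb operon in vivo and the two- to six-fold contribution of antipausing:

$$
F_{\mathrm{total}} = F_{\mathrm{antipausing}} \times F_{\mathrm{other}}
\quad\Longrightarrow\quad
F_{\mathrm{other}} = \frac{F_{\mathrm{total}}}{F_{\mathrm{antipausing}}} \approx \frac{100}{6}\ \text{to}\ \frac{100}{2} \approx 17\ \text{to}\ 50
$$

Under this assumption, the two RfaH-specific mechanisms would together account for roughly 17- to 50-fold of the effect; if the mechanisms are not independent, this partition would change.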

Figure 13.

Anti-Rho activities of RfaH. (a) RfaH excludes NusG from the TEC, inhibiting Rho-mediated RNA release but not Rho binding. (b) An RfaH-tethered ribosome blocks Rho binding to the nascent RNA.

4.2. Exclusion of NusG

The second mode of RfaH action is to exclude NusG from the elongating RNAP through competition for binding to the β′ CH (Figure 13a). We showed that RfaH efficiently outcompetes NusG in vitro and in vivo, provided that the ops element is present in the transcribed region 38. Since NusG stimulates Rho-dependent termination, competition with NusG ensures that Rho-mediated RNA release is inhibited. Consistent with the post-recruitment role of NusG in Rho termination, Rho is associated with the rfb operon (Figure 6) even though NusG is nearly completely excluded 38, and Rho-dependent polarity is essentially abolished in the presence of RfaH 29. These observations suggest that NusG exclusion could be an important component of the anti-Rho activity of RfaH, but the relative contribution of this mechanism to RfaH action in vivo remains to be elucidated.

4.3. Recruitment of the ribosome

All RfaH-controlled operons are horizontally transferred, suggesting inefficient translation. This view is supported by the strong polarity observed in the rfb operon in vivo 29 and suggests another mode of RfaH action – to increase translation of its target operons. This hypothesis was put forth to explain differences in RfaH retention on the TEC in vivo and in vitro 38 and was consistent with RfaH interactions with the components of the translational machinery in the cellular context 141 and with transcription-translation coupling 99 mediated by NusG/S10 contacts 32.

During canonical translation initiation, recruitment of the 30S preinitiation complex is mediated by interactions between a Shine-Dalgarno (SD) element upstream of the AUG start codon and a complementary sequence in the 16S rRNA 172. However, all known RfaH targets lack SD elements, raising the question of how the ribosome is recruited to these mRNAs. Our recent data indicate that RfaH can substitute for an SD element during initiation. We showed that removal of the SD element made the expression of a reporter luxCDABE operon dependent on the RfaH CTD 31 and proposed a model in which the RfaH/S10 contact increases the local 30S concentration, thereby recruiting the ribosome to the nascent mRNA synthesized by RfaH-bound RNAP.

This mechanism would facilitate the first round of translation, with the leading ribosome insulating the nascent RNA from Rho-induced release (Figure 13b). It is not known whether RfaH targets are translated many times, but it is the pioneer round of translation that is relevant to RfaH action: once transcription is completed, the RNA is no longer in danger of Rho-mediated release.

5. LINKING TRANSCRIPTION TO CONCURRENT CELLULAR PROCESSES

5.1. The CTD as a universal communication module

Identification of NusG as the only ubiquitous transcription factor appears to imply that its key regulatory role is RNAP modification during elongation 7; indeed, the underlying contacts between the NTD and the RNAP pincers are universally conserved 28,36. However, NusG-like proteins have also been implicated in other, equally important processes that must be coordinated with ongoing RNA synthesis in all domains of life. This coordination is provided by the CTDs, which tether the transcribing RNAP to diverse cellular partners, including antitermination complexes and the ribosome in prokaryotes and the RNA splicing, capping, and degradation machineries in eukaryotes.

Even in Bacteria, the CTD can interact with very different partners, whereas the presence of multiple KOW domains in eukaryotic and archaeal proteins provides additional diversity, which likely reflects the increased complexity of gene regulation 27. Studies in yeast demonstrated direct roles of Spt5 in chromatin remodeling and histone modification. Chromatin remodelers Chd1 173 and Paf1 174 associate with Spt5. During transcript elongation, in addition to controlling RNAP processivity, Spt5 is also thought to help overcome the nucleosome barrier 175. Proteins involved in splicing and capping have been reported to associate with Spt5 33. Drosophila Spt5 interacts with the exosome, suggesting a direct role in RNA degradation 176.

The absence of a nuclear membrane allows co-transcriptional translation in Bacteria and Archaea 177. Coupling of these two processes allows their robust and economical regulation: coupling helps maintain the local concentration of the translated products in the vicinity of transcription sites 178,179 and prevents formation of R-loops 180. A translating ribosome can function as an antiterminator during attenuation 181: a shortage of the cognate charged tRNA causes ribosome stalling at the attenuator site, which blocks formation of the terminator hairpin and allows expression of the downstream genes, for example, those encoding histidine or tryptophan biosynthesis enzymes. Uncoupled translation, e.g., as a result of a premature stop codon, allows Rho-mediated polarity to terminate faulty transcripts 72.

Although the importance of transcription-translation coupling is established in Bacteria, its molecular details are not completely understood. In addition to the mRNA, other factors may directly link RNAP to the ribosome. Many proteins (reviewed in 182) are shared between the two processes, and several ribosomal proteins are involved in feedback control of their own genes 183. NusG proteins likely play the key role in bridging RNAP and the ribosome, with the NusG CTD providing an interaction surface for S10 32. For a small protein like NusG, holding two large nucleoprotein complexes together appears to be a formidable task, particularly since the speeds of RNAP and the ribosome could be modified independently. Given that NusG/S10 interactions are weak in a “binary” NusG-CTD/S10/NusB complex 32, other factors may strengthen the link between RNAP and the ribosome once the first contact is made by NusG.

The role of NusG in coupling is supported by many reports (see section 3.1.5), but it is not yet clear what contribution it makes to the expression of the cellular genome. In fact, it is the quality control function of NusG, mediated by its contacts with Rho, which are incompatible with the CTD/S10 interactions, that appears to underlie its essentiality in E. coli 22,23. In contrast, ribosome recruitment by RfaH could be critical for its dramatic antipolar effects. RfaH increases the expression of its poorly translated target genes by hundreds of fold 29; RfaH dependence could be conferred simply by removing a ribosome-binding site from a heterologous mRNA, implying that RfaH may load the ribosome by an alternative route 31. Strikingly, productive interactions between RfaH and the ribosome require a complete transformation of the CTD (Figure 14).

Figure 14.


RfaH transformation. RfaH recruitment to RNAP at the ops site leads to dissociation of its NTD and CTD. This triggers refolding of the CTD, which is then capable of coupling transcription to translation by binding to S10.

5.2. Transformation as a novel regulatory paradigm

In RfaH, the CTD is folded as an α-helical hairpin, and the conserved residues implicated in NusG/S10 interactions are inaccessible 35. However, noting that the sequence of the RfaH CTD is readily compatible with the NusG-like β-barrel fold 35, we proposed that after domain dissociation, a prerequisite for NTD binding to RNAP, the CTD undergoes an unprecedented refolding into a β barrel to interact with the ribosome during coupled translation elongation 38. Recently, we demonstrated that the RfaH CTD does indeed fold into a β barrel when its interactions with the NTD are severed or destabilized 31. This newly formed barrel is nearly identical to the NusG CTD and binds to S10 in a similar manner 31, apparently enabling ribosome loading on mRNAs that lack canonical translation initiation signals. It remains to be seen whether this hypothetical contact is maintained throughout translation elongation, but RfaH is known to remain stably associated with the transcribing RNAP until its dissociation at a terminator 38.

The RfaH CTD is truly unique in its structural and regulatory plasticity, prompting us to define a new class of transformer proteins 40, in which a structural change coincides with the acquisition of a new function. Like some other proteins, such as lymphotactin 184, RfaH combines features of moonlighting proteins 185, which use the same fold to carry out different functions, and metamorphic proteins 186, which can adopt different folds in identical environments. However, the RfaH CTD stands out from this group by completely changing both its secondary structure elements and its function. This observation is all the more surprising given the ubiquity of the SH3-like fold adopted by the β-barrel CTD.

We argue that transformation is not an isolated phenomenon unique to RfaH but may be utilized by other proteins from the NusG family to broaden their regulatory repertoire. It is possible that not all of the regulatory interactions reported for NusG-like proteins are mediated by the β-barrel fold. It is also possible that other multi-domain proteins that operate at the crossroads of complex regulatory networks employ similar strategies in interactions with their cellular partners. Transformation opens up an entirely new paradigm for regulation and intermolecular communication and puts proteins on a par with ancestral RNA enzymes in terms of structural flexibility.

Figure 12.


The universally conserved clamping action of the RfaH NTD. This model was built based on the binary RNAP/NusG complex 147, with the E. coli RfaH NTD replacing the NusG NTD. The NGN domain, anchored by the β′ CH domain, reaches across the active-site cleft toward the β lobe domain, effectively sealing the double-stranded DNA within the closed cleft.

Acknowledgments

We thank Dmitri Svetlov for comments on the manuscript. This work was supported by the National Institutes of Health grant GM67153.

Biographies


Sushil Kumar Tomar studied pharmacy as an undergraduate at Rajiv Gandhi University of Health Sciences, Karnataka, India, and received his Bachelor of Pharmacy in 2004. Fascinated by basic science, he then moved to the Indian Institute of Technology Kanpur, where he received his PhD in Structural and Molecular Biology in 2010. There, he worked with Dr. Balaji Prakash on the role of a bacterial GTPase in ribosome assembly. Since 2011, he has been working with Dr. Irina Artsimovitch at The RNA Center, The Ohio State University, Columbus, Ohio, USA. His current research focuses on the transcription elongation factor RfaH.


Irina Artsimovitch completed her M. Biochemistry (Hons) at Moscow State University (Russia) in 1989. She received her Ph.D. in Microbiology from the University of Tennessee-Memphis in 1996, working on regulation of transcription in bacteriophage Mu under the supervision of Martha M. Howe. Following postdoctoral work with Robert Landick at the University of Wisconsin-Madison (1996–2001) aimed at elucidating the molecular mechanism of transcriptional pausing, she joined the Department of Microbiology at the Ohio State University as an Assistant Professor in 2001 and was promoted to Associate Professor in 2006 and to Professor in 2011. Her lab studies the molecular mechanisms and regulation of bacterial RNA polymerases, as well as cross-talk between transcription, DNA repair, and protein synthesis in Bacteria.

References
