Significance
For all cellular RNA polymerases, the position of the transcription start site (TSS) relative to core promoter elements is variable. Furthermore, environmental conditions and regulatory factors that affect TSS selection have profound effects on levels of gene expression. Thus, identifying determinants of TSS selection is important for understanding gene expression control. Here we identify a previously undocumented determinant for TSS selection by Escherichia coli RNA polymerase. We show that sequence-specific protein–DNA interactions between RNA polymerase core enzyme and a sequence element in unwound promoter DNA, the core recognition element, modulate TSS selection.
Keywords: RNA polymerase, transcription start site selection, promoter, transcription bubble, transcription initiation
Abstract
During transcription initiation, RNA polymerase (RNAP) holoenzyme unwinds ∼13 bp of promoter DNA, forming an RNAP-promoter open complex (RPo) containing a single-stranded transcription bubble, and selects a template-strand nucleotide to serve as the transcription start site (TSS). In RPo, RNAP core enzyme makes sequence-specific protein–DNA interactions with the downstream part of the nontemplate strand of the transcription bubble (“core recognition element,” CRE). Here, we investigated whether sequence-specific RNAP–CRE interactions affect TSS selection. To do this, we used two next-generation sequencing-based approaches to compare the TSS profile of WT RNAP to that of an RNAP derivative defective in sequence-specific RNAP–CRE interactions. First, using massively systematic transcript end readout, MASTER, we assessed effects of RNAP–CRE interactions on TSS selection in vitro and in vivo for a library of 4⁷ (∼16,000) consensus promoters containing different TSS region sequences, and we observed that the TSS profile of the RNAP derivative defective in RNAP–CRE interactions differed from that of WT RNAP, in a manner that correlated with the presence of consensus CRE sequences in the TSS region. Second, using 5′ merodiploid native-elongating-transcript sequencing, 5′ mNET-seq, we assessed effects of RNAP–CRE interactions at natural promoters in Escherichia coli, and we identified 39 promoters at which RNAP–CRE interactions determine TSS selection. Our findings establish that RNAP–CRE interactions are a functional determinant of TSS selection. We propose that RNAP–CRE interactions modulate the position of the downstream end of the transcription bubble in RPo, and thereby modulate TSS selection, which involves transcription bubble expansion or transcription bubble contraction (scrunching or antiscrunching).
Transcription initiation consists of a number of biochemical steps leading to formation of a phosphodiester bond between a nucleoside triphosphate (NTP) bound in the RNA polymerase (RNAP) active-center initiating NTP binding site (i site) and an NTP bound in the RNAP active-center extending NTP binding site (i+1 site) (1–3). For bacterial RNAP, promoter-specific initiation requires the RNAP core enzyme (subunit composition α2ββ'ω) to associate with a σ factor forming the RNAP holoenzyme (subunit composition α2ββ'ωσ). The σ factor contains determinants for sequence-specific protein–DNA interactions with four core promoter elements: the −35 element, the extended −10 element, the −10 element, and the discriminator element (4).
During transcription initiation, RNAP holoenzyme unwinds promoter DNA to form an RNAP-promoter open complex (RPo) containing an unwound, single-stranded “transcription bubble.” The process of promoter unwinding begins within the promoter −10 element and propagates downstream, enabling single-stranded nucleotides at the downstream end of the transcription bubble template strand to occupy the RNAP active center i and i+1 sites (Fig. 1A) (1–3). In particular, in RPo, the second-most downstream nucleotide of the transcription bubble template strand occupies the active center i site and serves as the transcription start site (TSS), and the downstream-most nucleotide of the transcription bubble template strand occupies the active center i+1 site. We designate the template-strand nucleotide at the TSS position as TSST (Fig. 1, base in pink) and the template-strand nucleotide at the next base pair as TSS+1T (Fig. 1, base in red).
The position of the TSS relative to the position of the promoter −10 element is variable (5–11). TSS selection preferentially occurs at the position 7-bp downstream of the promoter −10 element, but can occur over a range of at least five positions, encompassing the positions 6-, 7-, 8-, 9-, or 10-bp downstream of the promoter −10 element. Thus, there must be flexibility in the structure of RPo that enables the position of the TSS to vary relative to the position of the −10 element. We previously have proposed that variability in TSS selection is mediated by variability in the size of the unwound transcription bubble (Fig. S1A) (11–13). According to this model, RPo generally contains a 13-bp unwound transcription bubble that places the template-strand nucleotide 7-bp downstream of the −10 element in the i site and places the template-strand nucleotide 8-bp downstream of the −10 element in the i+1 site (Fig. 1A and Fig. S1A) (TSS = 7). For TSS selection to occur at positions further downstream, the downstream DNA duplex is unwound, the unwound DNA is pulled into and past the RNAP active center, and the unwound DNA is accommodated as single-stranded DNA bulges within the transcription bubble, yielding a “scrunched” complex (Fig. S1A) (TSS = 8 and TSS = 9). For TSS selection to occur at positions further upstream, the opposite occurs: downstream DNA is rewound, downstream DNA is extruded from the RNAP active center, and the extrusion of DNA from the RNAP active center is accommodated by stretching DNA within the transcription bubble, yielding an “antiscrunched” complex (Fig. S1A) (TSS = 6). According to this model, any protein–DNA or protein–protein interaction that affects the energy landscape for transcription bubble expansion or contraction (scrunching or antiscrunching) in RPo potentially could modulate TSS selection (13, 14).
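The relationship between TSS position and bubble size in this model can be illustrated with a minimal sketch. It assumes, as a simplification not stated explicitly in the text, that each 1-bp shift of the TSS away from position 7 corresponds to exactly 1 bp of bubble expansion or contraction relative to the 13-bp bubble; the function and constant names are hypothetical.

```python
# Sketch of the scrunching/antiscrunching model of TSS selection.
# Assumption (for illustration): a 1-bp shift of the TSS from position 7
# changes the size of the unwound bubble by 1 bp.

STANDARD_TSS = 7          # preferred TSS position (bp downstream of the -10 element)
STANDARD_BUBBLE_BP = 13   # bubble size in RPo when TSS = 7


def bubble_state(tss_position: int) -> tuple[int, str]:
    """Return (assumed bubble size in bp, state of the complex) for a TSS position."""
    shift = tss_position - STANDARD_TSS
    size = STANDARD_BUBBLE_BP + shift
    if shift > 0:
        state = "scrunched"        # extra downstream DNA unwound and pulled in
    elif shift < 0:
        state = "antiscrunched"    # downstream DNA rewound and extruded
    else:
        state = "standard"
    return size, state


if __name__ == "__main__":
    for position in range(6, 11):
        size, state = bubble_state(position)
        print(f"TSS = {position}: {size}-bp bubble ({state})")
```

Under this assumption, TSS = 6 maps to an antiscrunched complex and TSS = 8, 9, or 10 to progressively more scrunched complexes.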
In the structure of RPo, the RNAP core makes direct protein–DNA interactions with the non–template-strand DNA segment at the downstream part of the transcription bubble (15); this DNA segment has been designated the “core recognition element” (CRE; Fig. 1A) (15). RNAP–CRE interactions with the non–template-strand nucleotide at the extreme downstream end of the transcription bubble (i.e., TSS+1NT) are sequence specific, with preference for the base G (GCRE) (Fig. 1, red G) (15).
It has been proposed that sequence-specific RNAP–GCRE interactions facilitate promoter unwinding to form the transcription bubble, stabilize the unwound transcription bubble, and define the downstream end of the transcription bubble (15). According to this proposal, sequence-specific RNAP–GCRE interactions should affect the energy landscape for transcription bubble expansion or contraction (scrunching or antiscrunching) in RPo and therefore potentially could affect TSS selection (Fig. S1B). Here we tested the proposal that sequence-specific RNAP–GCRE interactions affect TSS selection. To do this, we used high-throughput sequencing–based approaches to compare TSS selection by WT RNAP to TSS selection by a mutant RNAP defective in sequence-specific RNAP–GCRE interactions. Our results demonstrate that sequence-specific RNAP–CRE interactions are a determinant of TSS selection.
Results
Sequence-Specific RNAP–CRE Interactions Are a Determinant of TSS Selection in Vitro.
In crystal structures of RNAP–promoter open complexes, residue D446 of the RNAP β subunit makes direct H-bonded interactions with Watson–Crick H-bond–forming atoms of G at GCRE (15). The interactions by βD446 determine specificity at GCRE. Thus, substitution of βD446 by alanine eliminates the ability of RNAP to distinguish A, G, C, and T at the GCRE position (16). Accordingly, an RNAP derivative carrying the βD446A substitution can serve as a reagent to assess the functional significance of sequence-specific RNAP–GCRE interactions (Fig. 1B, Lower Left).
To define the contribution of sequence-specific RNAP–GCRE interactions to TSS selection, we used a high-throughput sequencing–based methodology termed massively systematic transcript end readout (MASTER) (11). MASTER entails the construction of a template library that contains up to 4¹⁰ (∼1,000,000) bar-coded sequences, production of RNA transcripts from the template library in vitro or in vivo, and analysis of transcript ends using high-throughput sequencing (11, 13).
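The library sizes quoted for MASTER follow from four possible bases at each randomized position. The sketch below, with hypothetical function names, computes the size for a given number of randomized positions and enumerates the variants for a seven-position region such as the one described in the Abstract.

```python
from itertools import product

BASES = "ACGT"


def library_size(n_random_positions: int) -> int:
    """Number of distinct sequences for n fully randomized positions."""
    return len(BASES) ** n_random_positions


def enumerate_library(n_random_positions: int) -> list[str]:
    """List every sequence variant of the randomized region."""
    return ["".join(p) for p in product(BASES, repeat=n_random_positions)]


if __name__ == "__main__":
    print(library_size(10))            # 1048576, i.e. ~1,000,000
    print(len(enumerate_library(7)))   # 16384, i.e. ~16,000
```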
To analyze the effect of disrupting sequence-specific RNAP–GCRE interactions on TSS selection, we used a MASTER template library, lacCONS-N7, that contained 4⁷ (∼16,000) sequence variants at positions 4–10 bp downstream of the −10 element of a consensus Escherichia coli σ70-dependent promoter (Fig. 1B, Upper) (11). We performed in vitro transcription experiments with the lacCONS-N7 template library, using, in parallel, WT RNAP (RNAP-βWT) or the RNAP derivative containing the βD446A substitution (RNAP-βD446A). RNA products generated in the transcription reactions were isolated and analyzed using high-throughput sequencing of RNA barcodes and 5′ ends (5′ RNA-seq) to define, for each RNA product, the template that produced the RNA and the TSS position (Fig. 1B, Lower Right). For each sequence variant, we calculated the percentage of reads starting at each position within the randomized TSS region, %TSS_Y = 100 × (no. reads starting at position Y/total no. reads starting at positions 4–10).
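The %TSS calculation defined above can be sketched as follows. The input format (a mapping from TSS position to read count for one sequence variant) and the function name are assumptions made for illustration, and the read counts are invented.

```python
def percent_tss(read_counts: dict[int, int],
                positions: range = range(4, 11)) -> dict[int, float]:
    """%TSS at each position Y = 100 * reads at Y / total reads at positions 4-10."""
    total = sum(read_counts.get(p, 0) for p in positions)
    if total == 0:
        # No reads for this template: report 0 everywhere rather than divide by zero.
        return {p: 0.0 for p in positions}
    return {p: 100.0 * read_counts.get(p, 0) / total for p in positions}


if __name__ == "__main__":
    # Hypothetical read counts for one template (position -> number of reads).
    counts = {6: 10, 7: 60, 8: 25, 9: 5}
    for position, pct in percent_tss(counts).items():
        print(f"position {position}: {pct:.1f}%")
```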
To determine the effect of disrupting RNAP–GCRE interactions on TSS selection, we considered TSS positions where TSS+1NT is included within the randomized region of the MASTER template library (i.e., TSS positions 6, 7, 8, and 9). We first calculated %TSS values for each of these positions on the basis of the identity of TSS+1NT. Thus, for each TSS position, we averaged the %TSS values for the ∼4,000 templates having A at TSS+1NT, the ∼4,000 templates having C at TSS+1NT, the ∼4,000 templates having G at TSS+1NT, and the ∼4,000 templates having T at TSS+1NT. Next, we calculated the difference in these %TSS values for reactions performed with RNAP-βWT vs. reactions performed with RNAP-βD446A. We observed that, for all four tested TSS positions (positions 6, 7, 8, and 9), the βD446A substitution decreased the %TSS when TSS+1NT was G (1.3–7.3% decreases; Fig. 2A, top row of table). In contrast, for three of the four tested TSS positions (positions 6, 7, and 8), the βD446A substitution did not decrease the %TSS when TSS+1NT was A, C, or T, and, for the fourth position (position 9), the βD446A substitution did not decrease the %TSS, or decreased the %TSS by smaller amounts, when TSS+1NT was A, C, or T (Fig. 2A, bottom three rows of table).
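The grouping step described above can be sketched as follows. The sketch assumes that each template is represented by its 7-nt randomized region (positions 4–10) and a mapping from TSS position to %TSS, and it reports the effect as the WT value minus the βD446A value, so that a positive number means the substitution decreased %TSS. The data structures, names, and toy values are hypothetical.

```python
from collections import defaultdict

REGION_START = 4  # first randomized position (bp downstream of the -10 element)


def tss_plus1_base(region_seq: str, tss_pos: int) -> str:
    """Nontemplate-strand base at TSS+1 for a TSS at tss_pos."""
    return region_seq[tss_pos + 1 - REGION_START]


def mean_pct_by_base(pct_by_template: dict[str, dict[int, float]],
                     tss_pos: int) -> dict[str, float]:
    """Average %TSS at tss_pos over templates grouped by the base at TSS+1."""
    sums: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for seq, pct in pct_by_template.items():
        base = tss_plus1_base(seq, tss_pos)
        sums[base] += pct.get(tss_pos, 0.0)
        counts[base] += 1
    return {base: sums[base] / counts[base] for base in sums}


def effect_by_base(wt, mutant, tss_pos: int) -> dict[str, float]:
    """Mean %TSS with WT RNAP minus mean %TSS with the mutant, per TSS+1 base."""
    wt_mean = mean_pct_by_base(wt, tss_pos)
    mut_mean = mean_pct_by_base(mutant, tss_pos)
    return {b: wt_mean[b] - mut_mean[b] for b in wt_mean if b in mut_mean}


if __name__ == "__main__":
    # Toy data: two templates, TSS at position 7, TSS+1 (position 8) is G or A.
    wt = {"ACGTGAC": {7: 70.0}, "ACGTAAC": {7: 50.0}}
    mutant = {"ACGTGAC": {7: 40.0}, "ACGTAAC": {7: 50.0}}
    print(effect_by_base(wt, mutant, tss_pos=7))
```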
We identified 1,230 TSS positions (5.6% of the 21,872 above-threshold TSS positions located 6-, 7-, 8-, or 9-bp downstream of the −10 element) that exhibited large, ≥20%, reductions in %TSS in reactions performed with RNAP-βD446A vs. reactions performed with RNAP-βWT. For these 1,230 TSS positions with large, ≥20%, CRE effects, ∼90% contained G at TSS+1NT (Fig. 2B, top row, Right), whereas, for the total sample of 21,872 TSS positions, there were no detectable sequence preferences at position TSS+1NT (Fig. 2B, top row, Left). Enrichment of G at TSS+1NT for TSS positions with large, ≥20%, CRE effects was observed for TSS positions located 6-, 7-, 8-, or 9-bp downstream of the −10 element (TSS = 6, 7, 8, or 9) (Fig. 2B, bottom four rows). In summary, the overwhelming majority of TSS positions that exhibit large, ≥20%, CRE effects have G at TSS+1NT.
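The selection of TSS positions with large CRE effects, and the tally of bases at TSS+1NT, can be sketched as follows. Two assumptions are made for illustration: the ≥20% criterion is interpreted as a difference of at least 20 percentage points in %TSS (WT minus βD446A), and the read-count threshold that defines "above-threshold" TSS positions, which is not specified here, is omitted. Names and toy values are hypothetical.

```python
REGION_START = 4  # first randomized position (bp downstream of the -10 element)


def large_cre_effects(wt, mutant, threshold=20.0, tss_positions=(6, 7, 8, 9)):
    """Return (sequence, TSS position, TSS+1 base, reduction) for large effects."""
    hits = []
    for seq, wt_pct in wt.items():
        mut_pct = mutant.get(seq)
        if mut_pct is None:
            continue  # template not observed with the mutant RNAP
        for pos in tss_positions:
            if pos not in wt_pct or pos not in mut_pct:
                continue
            reduction = wt_pct[pos] - mut_pct[pos]
            if reduction >= threshold:
                base = seq[pos + 1 - REGION_START]
                hits.append((seq, pos, base, reduction))
    return hits


def fraction_with_base(hits, base="G"):
    """Fraction of selected TSS positions with the given base at TSS+1."""
    if not hits:
        return 0.0
    return sum(1 for hit in hits if hit[2] == base) / len(hits)


if __name__ == "__main__":
    wt = {"ACGTGAC": {7: 70.0, 8: 20.0},
          "ACGTAAC": {7: 50.0, 8: 30.0},
          "TTTAGGA": {6: 45.0}}
    mutant = {"ACGTGAC": {7: 40.0, 8: 45.0},
              "ACGTAAC": {7: 48.0, 8: 32.0},
              "TTTAGGA": {6: 20.0}}
    hits = large_cre_effects(wt, mutant)
    print(hits)
    print(fraction_with_base(hits, "G"))
```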
To validate the MASTER results, we performed further analyses of two TSS region sequences that exhibited large, ≥20%, CRE effects, contained a TSS at the most common position (position 7), and contained G at TSS+1NT (position 8) (Fig. 3). For each of these two TSS region sequences, we prepared templates containing G, A, C, or T at position TSS+1NT, performed in vitro transcription experiments with RNAP-βWT or RNAP-βD446A, and analyzed RNA products by primer extension. For each of the two sets of constructs, the primer-extension results matched the MASTER results. A large, ∼30%, CRE effect was observed when TSS+1NT was G but not when TSS+1NT was A, C, or T (Fig. 3).
The results in Figs. 2 and 3 establish that disrupting sequence-specific RNAP–GCRE interactions affects TSS selection in vitro in a manner that correlates with the presence and position of GCRE in the TSS region. We conclude that sequence-specific RNAP–CRE interactions are a determinant of TSS selection in vitro.
Sequence-Specific RNAP–CRE Interactions Are a Determinant of TSS Selection in Vivo.
Analysis of 4⁷ (∼16,000) consensus promoter derivatives.
To define the contribution of sequence-specific RNAP–GCRE interactions to TSS selection in vivo, we used merodiploid native-elongating transcript sequencing (mNET-seq) (16). mNET-seq involves selective analysis of transcripts associated with an epitope-tagged RNAP in the presence of a mixed population of epitope-tagged RNAP and untagged RNAP (Fig. 4A). In prior work, we used mNET-seq to determine the effect of sequence-specific RNAP–GCRE interactions on pausing during elongation (16). In this work, we used a variant of mNET-seq, 5′ mNET-seq, to determine the effect of sequence-specific RNAP–GCRE interactions on TSS selection (Fig. 4A). To do this, we introduced into cells a plasmid encoding 3×FLAG-tagged βWT or 3×FLAG-tagged βD446A, isolated RNA products associated with RNAP-βWT or RNAP-βD446A by immunoprecipitation, converted RNA 5′ ends to cDNAs, and performed high-throughput sequencing (Fig. 4A).
To enable direct comparison of in vivo and in vitro results, we performed 5′ mNET-seq using the same MASTER template library of 4⁷ (∼16,000) consensus core promoter derivatives that we used for in vitro analysis. The results of MASTER in vivo (Fig. 4 B and C) matched the results of MASTER in vitro (Fig. 2). For all four tested TSS positions (positions 6, 7, 8, and 9), the βD446A substitution decreased the %TSS when TSS+1NT was G (0.6–7.3% decreases) (Fig. 4B, top row of table). In contrast, for three of the four tested TSS positions (positions 6, 7, and 8), the βD446A substitution did not decrease the %TSS when TSS+1NT was A, C, or T, and, for the fourth position (position 9), the βD446A substitution decreased the %TSS by smaller amounts when TSS+1NT was A, C, or T (Fig. 4B, bottom three rows of table). Furthermore, we identified 860 TSS positions (4.3% of the 20,217 above-threshold TSS positions located 6-, 7-, 8-, or 9-bp downstream of the −10 element) with large, ≥20%, CRE effects. For these 860 TSS positions with large, ≥20%, CRE effects, ∼80% contained G at TSS+1NT (Fig. 4C, Right), whereas, for the total sample of 20,217 TSS positions, there were no detectable sequence preferences at position TSS+1NT (Fig. 4C, Left).
The results establish that disrupting sequence-specific RNAP–GCRE interactions affects TSS selection in vivo in a manner that correlates with the presence and position of GCRE in the TSS region. We conclude that sequence-specific RNAP–CRE interactions are a determinant of TSS selection in vivo.
Analysis of E. coli transcriptome.
Having shown by MASTER that sequence-specific RNAP–CRE interactions are a determinant of TSS selection in the context of a consensus core promoter in vivo, we next assessed the contribution of sequence-specific RNAP–CRE interactions to TSS selection in the context of natural promoters in vivo in E. coli. (The primers used in the in vivo MASTER analysis by 5′ mNET-seq shown in Fig. 4 provided information only about transcripts from the synthetic consensus promoter derivatives. This is because the primers used for synthesis of the first cDNA strand annealed only to transcripts produced from the synthetic consensus promoter derivatives. A separate experiment, with primers that enable generation of cDNAs from transcripts produced from natural E. coli promoters, was necessary to provide information about transcripts from natural E. coli promoters. Therefore, to analyze transcripts from natural E. coli promoters, the primers used for synthesis of the first cDNA strand carried nine randomized nucleotides at the 3′ end.)
Using data from experiments performed with RNAP-βWT, we identified 1,500 above-threshold TSS positions associated with natural promoters in E. coli. Of these 1,500 TSS positions, we identified 44 TSS positions that exhibited large, ≥20%, CRE effects (Table S1); 39 of these 44 (∼90%) contained G at TSS+1NT (Fig. 5B, Right, and Table S1), whereas for the total sample of 1,500 above-threshold TSS, there were no detectable sequence preferences at TSS+1NT (Fig. 5B, Left).
Table S1.
Promoter sequence (putative −10 element is underlined; TSS is capitalized; GCRE is in bold) | TSS+1NT | CRE effect (%) | Genome coordinate | Strand | Downstream gene | Distance (in bp) to start of downstream gene |
ggagattgcccatcccgccatcctggtctaagcttggaaaGgatcaa | g | 39 | 1490753 | − | gapC | 40 |
aagcattttcttatacccgttcagacgttattcttatttcAgatcat | g | 36 | 1986797 | + | ftnB | 128 |
tttttacctacgcaggctatttcttcggtacaatcccgatGgttcag | g | 35 | 2434193 | − | accD | 267 |
accctgcattgtgtcctctctttggtactaagctttacttGgagtaa | g | 35 | 3441061 | + | ||
atgaaagattaattagtcaagattatgatatctttttaacGgataat | g | 33 | 2315698 | + | rcsB | 479 |
tatcccgagcggtttcaaaattgtgatctataattaacaaAgtgatg | g | 33 | 4611152 | + | osmY | 244 |
cttaaccggagggtgtaagcaaacccgctacgcttgttacAgagatt | g | 33 | 4330407 | + | proP | 95 |
cgcacaaatcatatgaaaaatgaatgcttatactgaagacCgcgctt | g | 32 | 4439413 | + | ytfK | 174 |
ctatctttattgccagcctggcctttggtagcgtagatccAgaactg | g | 31 | 18553 | + | nhaR | 162 |
GgcgcaaacgtctgggtgctcggtctgttactgttcttccAgcaaat | g | 30 | 3371448 | − | nanE | 412 |
ggatgacgcggcaaatcaggtgctctgctactgttatgaaGgtaacc | g | 28 | 671079 | − | nadD | 507 |
tctgtgtggtgcacgccgcacgggcgtctatattcttgttGgcgtgg | g | 28 | 18122 | + | nhaR | 593 |
aagcgcggcaggctgttgtcgaccaggctaaactggaggaGgaaatg | g | 27 | 2910110 | − | pyrG | 444 |
tgttgaaaaaattttcccccgttttgactaaaatgcgccaGgattga | g | 27 | 850212 | + | ompX | 238 |
aggatctccgttgctttatgagtcatgatttactaaaggcTgcaact | g | 27 | 519164 | + | ybbA | 569 |
acactgacatcactctggcaaggatgttaggatggaccacGgatgat | g | 27 | 3990789 | − | hemC | 23 |
cctcgacgaagctgacatttattgcggtattattgctgatGgcctgc | g | 27 | 702007 | − | nagC | 413 |
AagaatcgcgtcgattgctggtgaatgctaataatgtattGgctcgg | g | 26 | 2522223 | − | ||
tggtacgccgctcatcgcacaactgtttatgatctattacGgcctgc | g | 25 | 1998251 | − | yecC | 437 |
cgccgcctggacaccgctctccgtctggtataatgatgccGgacagg | g | 25 | 633069 | − | ||
tgttgtttaaaaattgttaacaattttgtaaaataccgacGgataga | g | 25 | 127717 | + | lpd | 195 |
tgctggcgcagcaaatgcggctgaagtctataataaagatGgtaata | g | 25 | 2034135 | + | ||
attgctgagacaggctctgttgagggcgtataatccgaaaAgctaat | g | 25 | 4177279 | + | secE | 79 |
gaacctggcaaaagagaccgttgatttctatgatttgaaaGgcgatc | g | 25 | 1796090 | − | ihfA | 538 |
ccgccaccccgtacctctgataatggtctaaaatcattgaAgccact | g | 25 | 2035631 | + | hchA | 204 |
gcaactggtcacgctgatcgacgaagggtacactagcggaAgtcagt | g | 24 | 147583 | + | ||
tgccctttaaaattcggggcgccgaccccatgtggtctcaAgcccaa | g | 24 | 1862617 | + | gapA | 154 |
acatattgactaatttctgtaactgcataatctgatagacCgcgcct | g | 24 | 3354064 | + | gltB | 661 |
ttcattcacaatactggagcaatccagtatgttcattctcTggtata | g | 24 | 4161080 | + | fabR | 44 |
tattcttttaaaaaaaggggtaacaccgtaatctcataccGgtacgc | g | 23 | 2939320 | + | fucR | 48 |
caacgtcaataaaatcaaaatcatcgtctattctctttgtGgtctgc | g | 22 | 3935969 | + | rbsB | 309 |
ccgtgttgcgcaatttgtcaacgaaaacaataatgcgtaaGgtagaa | g | 21 | 2520783 | − | gltX | 111 |
aacgtaatcacggacggtaaaatccgctacgctgtaatacAggctgt | g | 21 | 158624 | + | ||
tttccctcgatcccaacgagcgcattggtaaactgcgtcaGgatcag | g | 21 | 856176 | + | ||
tccttgctttaaaacgttataagcgtttaaattgcgcttcAggtgct | g | 21 | 419421 | + | brnQ | 170 |
aaaccggatacgttccgccacagtggtgtacaatagaacaAgctatt | g | 21 | 1323765 | + | yciO | 333 |
gcggattgacggatcatccgggtcgctataaggtaaggatGgtctta | g | 21 | 3639984 | + | uspA | 127 |
cgctggcggcggcgcgtttaagcaggtattagtagatagcGgtgtcg | g | 20 | 4491570 | − | idnR | 431 |
tcactttggtgatttcaccgtaactgtctatgattaatgaGgcggtg | g | 20 | 3248888 | + | yqjC | 81 |
ccgctcgcatttttccctaagttaaatgagtaatctgatgGtgtgta | t | 34 | 1498605 | + | ydcH | 46 |
actgcagcattctctttacctctgttgcagaatcttgatcCtgagct | t | 24 | 3307652 | − | ||
ttggttgacattcatatgaaaaaaatcataattccatcatGtttgtg | t | 24 | 345252 | + | yahM | 152 |
cccgatcttttttgtcactttttgtataaaatgccagggtGatggtt | a | 20 | 4074647 | + | yihW | 22 |
aacggtaaaggcgaagtcgatgatatcgaccacctcggcaAccgtcg | c | 27 | 4182593 | + |
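The capitalization convention of Table S1 makes it possible to read the TSS and TSS+1NT bases directly from each promoter sequence. The sketch below assumes that the TSS is the last capitalized letter in the sequence (a few rows also begin with a capital letter, whose meaning is not stated) and that the following base is TSS+1NT; the function name is hypothetical.

```python
def tss_and_plus1(promoter: str):
    """Return (TSS base, TSS+1 base) using the last uppercase letter as the TSS.

    Returns None if no uppercase letter is found or if the TSS is the final base.
    """
    upper_positions = [i for i, ch in enumerate(promoter) if ch.isupper()]
    if not upper_positions:
        return None
    tss_index = upper_positions[-1]
    if tss_index + 1 >= len(promoter):
        return None
    return promoter[tss_index], promoter[tss_index + 1]


if __name__ == "__main__":
    rows = [
        "ggagattgcccatcccgccatcctggtctaagcttggaaaGgatcaa",  # gapC row
        "cccgatcttttttgtcactttttgtataaaatgccagggtGatggtt",  # yihW row
    ]
    for seq in rows:
        print(tss_and_plus1(seq))
```

For the two rows shown, the base returned at TSS+1 agrees with the TSS+1NT column of the table (g and a, respectively).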
To validate the 5′ mNET-seq results, we performed primer-extension experiments with two E. coli promoters that contained a TSS that exhibited a large, ≥20%, CRE effect and contained G at TSS+1NT: PsecE and PhemC (Table S1). We generated linear templates carrying PsecE or PhemC, performed in vitro transcription assays using RNAP-βWT or RNAP-βD446A, and analyzed TSS selection by primer extension (Fig. 5C). For each promoter, two prominent start sites were observed in reactions with RNAP-βWT. In the case of PsecE, ∼60% of the transcripts started at an A located 7-bp downstream of the predicted −10 element (A7) and ∼40% of the transcripts started at a G located 8-bp downstream (G8) (Fig. 5C, Left). In the case of PhemC, ∼30% of the transcripts started at an A located 6-bp downstream of the predicted −10 element (A6) and ∼70% of the transcripts started at a G located 8-bp downstream (G8) (Fig. 5C, Right). For each promoter, the percentage of transcripts starting at the position that contained G at TSS+1NT (A7 for PsecE and G8 for PhemC) was reduced by ∼30% when reactions were performed with RNAP-βD446A (Fig. 5C), consistent with results of 5′ mNET-seq (Table S1). We conclude that sequence-specific RNAP–CRE interactions are a determinant of TSS selection in natural promoters in the E. coli genome.
Discussion
Sequence-Specific RNAP–CRE Interactions in TSS Selection.
Here we show that sequence-specific interactions between RNAP and the downstream segment of the nontemplate strand of the transcription bubble (CRE) are a determinant of TSS selection. In particular, using high-throughput sequencing–based approaches, we define a role of sequence-specific recognition of a G at the most downstream position of the CRE (GCRE) during TSS selection in the context of a library of 4⁷ (∼16,000) TSS region sequences of a consensus core promoter in vitro and in vivo (Figs. 2–4) and in the context of natural promoters in E. coli in vivo (Fig. 5 and Table S1).
As discussed above, variability in TSS selection is believed to involve transcription bubble expansion or contraction (scrunching or antiscrunching) in RPo (Fig. S1A) (11–14). We propose that the observed effects of sequence-specific RNAP–CRE interactions on TSS selection occur by influencing transcription bubble expansion or contraction (scrunching or antiscrunching) in RPo (Fig. S1B). Specifically, we propose that sequence-specific RNAP–CRE interactions favor TSS selection at sequences that contain G at TSS+1NT. According to this proposal, the role of sequence-specific RNAP–CRE interactions in defining the downstream edge of the transcription bubble concurrently defines the extent of transcription bubble expansion or contraction (scrunching or antiscrunching) in RPo and therefore modulates TSS selection (Fig. S1B).
The results of this work, together with results of previous work, establish that TSS selection involves at least four promoter sequence determinants: (i) position relative to the −10 element (preference for the position 7-bp downstream of the −10 element) (5–11); (ii) sequence of TSST and TSS-1T (strong preference for pyrimidine at TSST and preference for purine at TSS-1T, which enable initiation with a purine NTP and maximize stacking between DNA bases and the initiating purine NTP) (11, 17–20); (iii) sequence of the discriminator element (preference for TSS selection at upstream positions for discriminator sequences that disfavor scrunching and preference for TSS selection at downstream positions for discriminator sequences that favor scrunching) (13, 14); and (iv) sequence of the CRE (preference for G at TSS+1NT). In addition to these sequence determinants, DNA topology and NTP concentrations also influence TSS selection (6, 8, 9, 11, 21–26). Thus, TSS selection is a multifactorial process, in which the ultimate outcome for a given promoter reflects the contributions of multiple promoter sequence determinants and multiple reaction conditions. Because sequence-specific RNAP–CRE interactions are only one of several determinants of TSS selection, their quantitative significance at different promoters differs. At some promoters, such as PsecE and PhemC, sequence-specific RNAP–CRE interactions have quantitatively large, ≥20%, effects on TSS selection (Fig. 5C and Table S1), whereas at other promoters, the quantitative effects of RNAP–CRE interactions are smaller.
Prospect.
In prior work, we showed that sequence-specific RNAP–CRE interactions affect RPo formation during transcription initiation, RPo stability during transcription initiation, translocational bias during transcription elongation, and sequence-specific pausing during transcription elongation (15, 16). Accordingly, our findings that sequence-specific RNAP–CRE interactions are a determinant of TSS selection add to an emerging view that sequence-specific RNAP–CRE interactions play functionally important roles during all stages of transcription that involve an unwound transcription bubble. A priority for future work will be to assess the roles of sequence-specific RNAP–CRE interactions in other steps of transcription that involve an unwound transcription bubble (e.g., transcriptional slippage, initial transcription, promoter escape, factor-dependent pausing, and termination). Another priority for future work will be to assess possible roles of sequence-specific RNAP–CRE interactions in eukaryotic transcription, noting that RNAP residues involved in sequence-specific RNAP–CRE interactions are conserved in bacteria and eukaryotes.
Materials and Methods
Details for all procedures are in the SI Materials and Methods.
Plasmids and Oligonucleotides.
Plasmids are listed in Table S2. Oligonucleotides are listed in Table S3.
Table S2.
Plasmid name | Description | Source |
pMASTER-lacCONS-N7 | TSS-region library containing 16,295 (∼99.5%) of a possible 16,384 sequences at positions 4- to 10-bp downstream of the −10 element of the lacCONS promoter | (11) |
pRL706 | Used for overexpression and purification of RNAP-βWT | (34) |
pRL706-βD446A | Used for overexpression and purification of RNAP-βD446A | (16) |
pRL706-βWT;3xFLAG | Used for overexpression of 3×FLAG tagged RNAP-βWT | (16) |
pRL706-βD446A;3xFLAG | Used for overexpression of 3×FLAG tagged RNAP-βD446A | (16) |
pBEN493 | Plasmid derived from pACYC184; inserts can be cloned upstream of the tR' terminator on a HindIII, BamHI fragment | (35) |
pHV-S01 | pBEN493 carrying lacCONS promoter derivative with CGCTGAT TSS-region sequence (in italics) between the HindIII site and BamHI site (used for experiments shown in Fig. 3B). Sequence of insert: AAGCTTGTTCAGAGTTCTACAGTCCGACGATCaggcTTGACActttatgcttcggctcgTATAATgtgCGCTGATgtgagcggataacaatGGATCC | Present work |
pHV-S02 | pBEN493 carrying lacCONS promoter derivative with CGCTAAT TSS-region sequence between the HindIII site and BamHI site (used for experiments shown in Fig. 3B). Sequence of insert: AAGCTTGTTCAGAGTTCTACAGTCCGACGATCaggcTTGACActttatgcttcggctcgTATAATgtgCGCTAATgtgagcggataacaatGGATCC | Present work |
pHV-S03 | pBEN493 carrying lacCONS promoter derivative with CGCTCAT TSS-region sequence between the HindIII site and BamHI site (used for experiments shown in Fig. 3B). Sequence of insert: AAGCTTGTTCAGAGTTCTACAGTCCGACGATCaggcTTGACActttatgcttcggctcgTATAATgtgCGCTCATgtgagcggataacaatGGATCC | Present work |
pHV-S04 | pBEN493 carrying lacCONS promoter derivative with CGCTTAT TSS-region sequence between HindIII and BamHI sites (used for experiments shown in Fig. 3B). Sequence of insert: AAGCTTGTTCAGAGTTCTACAGTCCGACGATCaggcTTGACActttatgcttcggctcgTATAATgtgCGCTTATgtgagcggataacaatGGATCC | Present work |
pHV-S05 | pBEN493 carrying lacCONS promoter derivative with AACGGCA TSS-region sequence between the HindIII site and BamHI site (used for experiments shown in Fig. 3A). Sequence of insert: AAGCTTGTTCAGAGTTCTACAGTCCGACGATCaggcTTGACActttatgcttcggctcgTATAATgtgAACGGCAgtgagcggataacaatGGATCC | Present work |
pHV-S06 | pBEN493 carrying lacCONS promoter derivative with AACGACA TSS-region sequence between the HindIII site and BamHI site (used for experiments shown in Fig. 3A). Sequence of insert: AAGCTTGTTCAGAGTTCTACAGTCCGACGATCaggcTTGACActttatgcttcggctcgTATAATgtgAACGACAgtgagcggataacaatGGATCC | Present work |
pHV-S07 | pBEN493 carrying lacCONS promoter derivative with AACGCCA TSS-region sequence between the HindIII site and BamHI site (used for experiments shown in Fig. 3A). Sequence of insert: AAGCTTGTTCAGAGTTCTACAGTCCGACGATCaggcTTGACActttatgcttcggctcgTATAATgtgAACGCCAgtgagcggataacaatGGATCC | Present work |
pHV-S08 | pBEN493 carrying lacCONS promoter derivative with AACGTCA TSS region sequence between the HindIII site and BamHI site (used for experiments shown in Fig. 3A). Sequence of insert: AAGCTTGTTCAGAGTTCTACAGTCCGACGATCaggcTTGACActttatgcttcggctcgTATAATgtgAACGTCAgtgagcggataacaatGGATCC | Present work |
pHV-S17 | pBEN493 carrying PsecE sequence between the HindIII site and BamHI site (used for experiments shown in Fig. 5C). Sequence of insert (TSS positions are capitalized): AAGCTTctttttgcacgctttcgtaccagaacctggctcatcagtgattttctttgtcataatcattgctgagacaggctctgttgagggcgtataatccgaaaAGctaatacgcgtttcGGATCC | Present work |
pHV-S18 | pBEN493 carrying PhemC sequence between the HindIII site and BamHI site (used for experiments shown in Fig. 5C). Sequence of insert (TSS positions are capitalized): AAGCTTtcaggatccactgccagacctcattttacggtttgcgcaggcgtctacgtttcaccacaacactgacatcactctggcaaggatgttaggatggaccAcGgatgataatgacggGGATCC | Present work |
Table S3.
Name | Description | Sequence |
HV100 | HindIII upstream primer | 5′-TATAAAGCTTGTTCAGAGTTCTACAGTCCGACGA-3′ |
HV101 | BamHI downstream primer | 5′-TGCCGGATCCattgttatccgctcacATCAGCGcacATTATAcgagccga-3′ |
HV102 | BamHI downstream primer | 5′-TGCCGGATCCattgttatccgctcacATTAGCGcacATTATAcgagccga-3′ |
HV103 | BamHI downstream primer | 5′-TGCCGGATCCattgttatccgctcacATGAGCGcacATTATAcgagccga-3′ |
HV104 | BamHI downstream primer | 5′-TGCCGGATCCattgttatccgctcacATAAGCGcacATTATAcgagccga-3′ |
HV105 | BamHI downstream primer | 5′-TGCCGGATCCattgttatccgctcacTGCCGTTcacATTATAcgagccga-3′ |
HV106 | BamHI downstream primer | 5′-TGCCGGATCCattgttatccgctcacTGTCGTTcacATTATAcgagccga-3′ |
HV107 | BamHI downstream primer | 5′-TGCCGGATCCattgttatccgctcacTGGCGTTcacATTATAcgagccga-3′ |
HV108 | BamHI downstream primer | 5′-TGCCGGATCCattgttatccgctcacTGACGTTcacATTATAcgagccga-3′ |
HV117 | −100 HindIII upstream primer, secE | 5′-TATAAAGCTTctttttgcacgctttcgtaccag-3′ |
HV118 | +15 BamHI downstream primer, secE | 5′-TGCCGGATCCgaaacgcgtattagcttttcg-3′ |
HV119 | −100 HindIII upstream primer, hemC | 5′-TATAAAGCTTtcaggatccactgccagacctc-3′ |
HV120 | +15 BamHI downstream primer, hemC | 5′-TGCCGGATCCccgtcattatcatccgtggt-3′ |
HV121 | Biotinylated upstream primer for making IVT templates (anneals 35 base pairs upstream of HindIII site in pBEN493) | 5′-/5Biosg/GTTGTAATTCTCATGTTTGACAGC-3′ |
HV122 | Downstream primer for making IVT templates | 5′-GGTCCTCGCCGAAAATGACCCAG-3′ |
HV123 | For primer extension analyses of RNAs produced from promoter-tR' fusions | 5′-CCTCTCTGCCGGATCC-3′ |
BN436 | Sequencing primer | 5′-GATTTCAGTGCAATTTATCTC-3′ |
s1086 | Illumina RA5+6N | 5′-GUUCAGAGUUCUACAGUCCGACGAUCNNNNNN-3′ (all bases are RNA) |
s1082 | Illumina RTP+9N | 5′-GCCTTGGCACCCGAGAATTCCANNNNNNNNN-3′ |
Illumina RP1 | | 5′-AATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACAGTCCGA-3′ |
Illumina RPI1 | | 5′-CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA-3′ |
s1206 | Illumina RA5+15N | 5′-GUUCAGAGUUCUACAGUCCGACGAUCNNNNNNNNNNNNNNN-3′ (all bases are RNA) |
s128 | RT primer used for generation of MASTER libraries | 5′-CCTTGGCACCCGAGAATTCC-3′ |
s1115 | Custom primer used for Illumina sequencing | 5′-CTACACGTTCAGAGTTCTACAGTCCGACGATC-3′ |
Proteins.
RNAP-βWT holoenzyme and RNAP-βD446A holoenzyme were prepared from E. coli strain XE54 (27) transformed with plasmid pRL706 or pRL706-βD446A, respectively, using procedures described in ref. 28.
In Vitro Transcription Assays.
For MASTER experiments shown in Fig. 2, single round in vitro transcription assays were performed essentially as described in ref. 11 using a linear DNA template containing the placCONS-N7 library (Fig. 1B, Upper). RNA products were purified and TSS selection was analyzed by 5′ RNA-seq as described in ref. 11 (see Table S4 for list of samples). In vitro transcription assays shown in Figs. 3 and 5C were performed essentially as described in ref. 29. RNA products generated in these reactions were analyzed by primer extension as described in ref. 29.
Table S4.
Sample serial no. | Description |
RNA libraries | |
VV631 | 5′ mNET-seq, RNAP-βWT, replicate 1 |
VV632 | 5′ mNET-seq, RNAP-βWT, replicate 2 |
VV655 | 5′ mNET-seq, RNAP-βWT, replicate 3 |
VV656 | 5′ mNET-seq, RNAP-βWT, replicate 4 |
VV633 | 5′ mNET-seq, RNAP-βD446A, replicate 1 |
VV634 | 5′ mNET-seq, RNAP-βD446A, replicate 2 |
VV657 | 5′ mNET-seq, RNAP-βD446A, replicate 3 |
VV658 | 5′ mNET-seq, RNAP-βD446A, replicate 4 |
VV854 | MASTER in vitro, RNAP-βWT, replicate 1 |
VV855 | MASTER in vitro, RNAP-βWT, replicate 2 |
VV860 | MASTER in vitro, RNAP-βD446A, replicate 1 |
VV861 | MASTER in vitro, RNAP-βD446A, replicate 2 |
VV871 | MASTER in vivo, RNAP-βWT, replicate 1 |
VV872 | MASTER in vivo, RNAP-βWT, replicate 2 |
VV873 | MASTER in vivo, RNAP-βWT, replicate 3 |
VV874 | MASTER in vivo, RNAP-βD446A, replicate 1 |
VV875 | MASTER in vivo, RNAP-βD446A, replicate 2 |
VV876 | MASTER in vivo, RNAP-βD446A, replicate 3 |
DNA template libraries | |
VV891 | Used for analysis of VV854, VV855, VV860, VV861 |
VV782 | Used for analysis of VV871 |
VV783 | Used for analysis of VV871 |
VV914 | Used for analysis of VV871 |
VV784 | Used for analysis of VV872 |
VV904 | Used for analysis of VV872 |
VV905 | Used for analysis of VV872 |
VV786 | Used for analysis of VV873 |
VV906 | Used for analysis of VV873 |
VV907 | Used for analysis of VV873 |
VV788 | Used for analysis of VV874 |
VV790 | Used for analysis of VV875 |
VV792 | Used for analysis of VV876 |
5′ mNET-seq.
For the in vivo MASTER experiments shown in Fig. 4, E. coli DH10B-T1R cells (Life Technologies) containing plasmids pRL706-βWT;3xFLAG or pRL706-βD446A;3xFLAG were transformed with ∼50 ng pMASTER-lacCONS-N7 library to obtain a 25-mL overnight culture representing cells derived from at least 20 million unique transformants; 0.5 mL of the overnight cell culture was used to inoculate 50 mL LB media containing 100 μg/mL carbenicillin and 25 μg/mL chloramphenicol. When the cell density reached an OD600 ∼0.3, 1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) was added, and cells were grown for an additional 2 h. RNA associated with RNAP was isolated using procedures described in ref. 16.
For the experiments shown in Fig. 5, MG1655 cells containing plasmids pRL706-βWT;3xFLAG or pRL706-βD446A;3xFLAG were shaken at 220 rpm at 37 °C in 100 mL 4× LB (40 g Bacto tryptone, 20 g Bacto yeast extract, and 10 g NaCl per liter) containing 200 µg/mL carbenicillin in 500-mL DeLong flasks (Bellco). When cell density reached an OD600 ∼0.6, 1 mM IPTG was added, and cells were grown for an additional 4 h. RNA associated with RNAP was isolated using procedures described in ref. 16.
RNA products associated with RNAP were analyzed by 5′ RNA-seq using procedures described in ref. 30 (see Table S4 for list of samples).
In Vitro and in Vivo MASTER Data Analysis.
Analysis of 5′ RNA-seq data obtained from MASTER experiments was performed essentially as described in ref. 11. Sequencing of template DNA was used to associate the 7-bp randomized TSS region sequence with a corresponding second 15-bp randomized sequence that serves as its barcode. Reads that contained a perfect match to the DNA template from which they were derived were used for the analysis of TSS selection. The percentage of reads starting at a given TSS position Y (%TSS_Y) was calculated using the following formula: %TSS_Y = 100 × (no. reads starting at position Y/total no. reads starting at positions 4–10). Above-threshold TSS positions were those for which the %TSS value was ≥20%.
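The %TSS calculation can be illustrated with a minimal Python sketch. The function names, the dictionary input, and the example read counts are hypothetical and are not taken from the analysis pipeline of ref. 11; only the formula, the position window (4–10), and the ≥20% threshold come from the text above.

```python
def percent_tss(read_counts, positions=range(4, 11)):
    """Return {position: %TSS} for positions 4-10 of one promoter sequence.

    read_counts maps a TSS position to the number of reads starting there.
    """
    total = sum(read_counts.get(p, 0) for p in positions)
    if total == 0:
        # No reads in the window: report zeros rather than divide by zero.
        return {p: 0.0 for p in positions}
    return {p: 100.0 * read_counts.get(p, 0) / total for p in positions}


def above_threshold(pct, threshold=20.0):
    """Positions whose %TSS is at or above the threshold (20% in the text)."""
    return sorted(p for p, value in pct.items() if value >= threshold)


# Hypothetical read counts for a single TSS-region sequence.
counts = {6: 10, 7: 60, 8: 25, 9: 5}
pct = percent_tss(counts)
print(pct[7])                # 60.0
print(above_threshold(pct))  # [7, 8]
```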
5′ mNET-seq Analysis of Natural Promoters in Vivo in E. coli.
Identification of TSS positions and TSS regions for natural promoters in E. coli was done essentially as described in ref. 31. The first six bases of each read were trimmed (to remove sequences introduced during the cDNA library construction procedure), and the next 30 bases were aligned to the E. coli reference genome (NC_000913.3) using Bowtie (32). Among these reads, we used those that aligned to a unique position in the genome with zero mismatches for the analysis of TSS selection.
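The read preprocessing described above can be sketched in Python. The function names, the representation of an alignment result as a hit count and a mismatch count, and the decision to discard reads too short to yield 30 bases after trimming are assumptions made for illustration; the alignment itself was performed with Bowtie, and its invocation is not given in the text.

```python
def trim_read(seq, n_trim=6, n_keep=30):
    """Remove the first n_trim bases and keep the next n_keep bases.

    Returns None when the read is too short to supply n_keep bases
    (assumption of this sketch: such reads are dropped).
    """
    kept = seq[n_trim:n_trim + n_keep]
    return kept if len(kept) == n_keep else None


def keep_alignment(n_genome_hits, n_mismatches):
    """Keep only reads that align to a unique position with zero mismatches."""
    return n_genome_hits == 1 and n_mismatches == 0


# Hypothetical read: 6 library-derived bases followed by 40 transcript bases.
read = "NNNNNN" + "ACGT" * 10
trimmed = trim_read(read)
print(trimmed, len(trimmed))
```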
Using data derived from the analysis of RNA products associated with RNAP-βWT, we defined a list of primary TSS positions that met the following two criteria: (i) the read count at the coordinate was above a threshold value (≥50 reads) and (ii) the read count at the coordinate represented a local maximum in an 11-bp window centered on the coordinate. For each primary TSS position, we designated the positions spanning 5-bp upstream to 5-bp downstream as a TSS region. Next, for each TSS region, we calculated the percentage of reads starting at each of the 11 positions: %TSS_Y = 100 × (no. reads starting at position Y/total no. reads starting within the TSS region). We identified 1,500 TSS positions within TSS regions with an above-threshold value of %TSS (≥20%). For each of these 1,500 TSS positions, we calculated the difference between the average %TSS observed in experiments performed with RNAP-βWT and that observed in experiments performed with RNAP-βD446A. TSS positions for which this difference was ≥20% are listed in Table S1.
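The two criteria for a primary TSS position and the per-region %TSS calculation can be sketched as follows. This is an illustrative reading of the description above, not the code used in the study: the function names and example counts are hypothetical, strand handling is omitted, and tied maxima within a window are all retained (the text does not say how ties were resolved).

```python
def primary_tss_positions(counts, min_reads=50, half_window=5):
    """Coordinates with >= min_reads reads that are a local maximum
    in the 11-bp window centered on the coordinate.

    counts maps a genome coordinate (single strand) to its read count.
    """
    primaries = []
    for pos, n in counts.items():
        if n < min_reads:
            continue
        window = range(pos - half_window, pos + half_window + 1)
        if n == max(counts.get(q, 0) for q in window):
            primaries.append(pos)
    return sorted(primaries)


def region_percent_tss(counts, primary, half_window=5):
    """%TSS at each of the 11 positions of the TSS region around `primary`."""
    region = range(primary - half_window, primary + half_window + 1)
    total = sum(counts.get(q, 0) for q in region)
    return {q: 100.0 * counts.get(q, 0) / total for q in region}


# Hypothetical read counts at four genome coordinates.
counts = {100: 10, 101: 80, 102: 40, 120: 55}
primaries = primary_tss_positions(counts)
print(primaries)  # [101, 120]
pct = region_percent_tss(counts, 101)
print(round(pct[101], 1))  # 61.5
```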
SI Materials and Methods
Analysis of TSS Selection in Vitro by MASTER.
Preparation of template DNA.
pMASTER-lacCONS-N7 plasmid DNA was diluted to ∼10^9 molecules/μL. One microliter of diluted DNA was amplified by emulsion PCR using a Micellula DNA Emulsion and Purification Kit (Chimerx) in detergent-free Phusion HF reaction buffer containing 5 μg/mL BSA, 0.4 mM dNTPs, 0.5 μM Illumina RP1 primer, 0.5 μM Illumina RPI1 primer, and 0.04 U/μL Phusion HF polymerase (Thermo Scientific). Emulsion PCR reactions were performed with an initial denaturation step of 10 s at 95 °C, amplification for 30 cycles (denaturation for 5 s at 95 °C, annealing for 5 s at 60 °C, and extension for 15 s at 72 °C), and a final extension for 5 min at 72 °C. The emulsion was broken, and DNA was purified according to the manufacturer’s recommendations. DNA was recovered by ethanol precipitation and resuspended in 30 μL nuclease-free water.
Transcription reactions.
In vitro transcription assays were performed by mixing 10 nM template DNA with 50 nM RNAP-βWT holoenzyme or 50 nM RNAP-βD446A holoenzyme in transcription buffer [50 mM Tris⋅HCl (pH 8.0), 10 mM MgCl2, 0.01 mg/mL BSA, 100 mM KCl, 5% (vol/vol) glycerol, 10 mM DTT, and 0.4 U/μL RNase OUT]. RNAP-promoter open complexes were allowed to form by incubation at 37 °C for 10 min. A single round of transcription was initiated by addition of a mixture of NTPs to a final concentration of 1 mM and heparin to a final concentration of 0.1 mg/mL. After 15 min, reactions were stopped by addition of EDTA (pH 8) to a final concentration of 10 mM. Nucleic acids were recovered by ethanol precipitation and resuspended in 30 μL nuclease-free water.
Purification of RNA products.
Nucleic acids recovered from the ethanol precipitation were treated with 2 U TURBO DNase (Life Technologies) at 37 °C for 1 h, mixed with an equal volume of 2× RNA loading dye [95% (vol/vol) deionized formamide, 18 mM EDTA, 0.25% (wt/vol) SDS, xylene cyanol, bromophenol blue, and amaranth], and separated by electrophoresis on 10% (wt/vol) acrylamide, 7 M urea slab gels (equilibrated and run in 1× TBE). The gel was stained with SYBR Gold nucleic acid gel stain (Life Technologies), bands were visualized on a UV transilluminator, and RNA transcripts ∼100 nt in size were excised from the gel. The excised gel slice was crushed and incubated in 300 μL 0.3 M NaCl in 1× TE buffer at 70 °C for 10 min. Eluted RNAs were separated from crushed gel fragments using a Spin-X column (Corning). After the first elution, the crushed gel fragments were collected; the elution procedure was repeated; and nucleic acids were collected, pooled with the first elution, isolated by ethanol precipitation, and resuspended in 25 μL RNase-free water. Purified RNA products were analyzed by 5′ RNA-seq using the procedure described in the next section.
5′ RNA-seq.
Before cDNA library construction, 5′ triphosphate RNA products were converted to 5′ monophosphate RNA products. To do this, ∼100 ng purified RNA was treated with 20 U 5′-RNA polyphosphatase (New England Biolabs). Samples were extracted with acid phenol:chloroform (pH 4.5). RNA products were recovered by ethanol precipitation and resuspended in 10 μL RNase-free water.
Ligation of adaptor to 5′ end of RNA products.
RNA products were combined with PEG 8000 [10% (wt/vol) final concentration], oligo s1206 (1 pmol/μL final concentration), ATP (1 mM final concentration), 40 U RNase OUT, 1× T4 RNA ligase 1 reaction buffer (New England Biolabs), and 10 U of T4 RNA ligase 1 (New England Biolabs) in a total volume of 30 μL. The mixture was incubated at 16 °C for 16 h.
Size selection of adaptor-ligated RNA products.
Adaptor-ligated RNA products were mixed with an equal volume of 2× RNA loading dye and separated by electrophoresis on 10% (wt/vol) acrylamide, 7 M urea slab gels (equilibrated and run in 1× TBE). The gel was stained with SYBR Gold nucleic acid gel stain, bands were visualized with UV transillumination, and species ranging from ∼80 to ∼300 nt were excised from the gel. RNA products were eluted from the gel using the procedure described above, isolated by ethanol precipitation, and resuspended in 10 μL nuclease-free water.
cDNA synthesis.
Ten microliters of gel-eluted RNA products was mixed with 0.3 μL s128 oligonucleotide (100 pmol/μL), incubated at 65 °C for 5 min, and cooled to 4 °C; 9.7 μL of a mixture containing 4 μL 5× First-Strand buffer (Life Technologies), 1 μL 10 mM dNTP mix, 1 μL 100 mM DTT, 1 μL (40 U) RNase OUT, 1 μL (200 U) SuperScript III Reverse Transcriptase (Life Technologies), and 1.7 μL nuclease-free water was added to the RNA/oligonucleotide mixture. The reactions were incubated in a thermal cycler with a heated lid at 25 °C for 5 min, followed by 55 °C for 60 min and 70 °C for 15 min. Reactions were cooled to room temperature, 10 U RNase H (Life Technologies) was added, and the reactions were incubated at 37 °C for 20 min.
Size selection of cDNA products.
An equal volume of 2× RNA loading dye was added, and nucleic acids were separated by electrophoresis on 10% (wt/vol) acrylamide, 7 M urea slab gels (equilibrated and run in 1× TBE). The gel was stained with SYBR Gold nucleic acid gel stain, and ∼80 to ∼150 nt species were excised from the gel. cDNA products were recovered from the gel using the procedure described above and resuspended in 10 μL nuclease-free water.
Amplification of cDNA products.
Five microliters of gel-isolated cDNA products were added to a mixture containing 1× Phusion HF reaction buffer, 0.2 mM dNTPs, 0.25 μM Illumina RP1 primer, 0.25 μM Illumina index primer, and 0.02 U/μL Phusion HF polymerase. PCR was performed with an initial denaturation step of 30 s at 98 °C, amplification for 11 cycles (denaturation for 10 s at 98 °C, annealing for 20 s at 62 °C, and extension for 10 s at 72 °C), and a final extension for 5 min at 72 °C.
Purification of cDNA products.
Amplified cDNA products were separated by gel electrophoresis using a nondenaturing 10% (wt/vol) acrylamide slab gel (equilibrated and run in 1× TBE). The gel was stained with SYBR Gold nucleic acid gel stain, and species at ∼170 bp were excised from the gel. cDNA products were eluted from the gel with 600 μL 0.3 M NaCl in 1× TE buffer at 37 °C for 2 h, precipitated, and resuspended in 13 μL nuclease-free water.
High-throughput sequencing.
Libraries were sequenced on an Illumina HiSeq 2500 platform in rapid mode using custom primer s1115.
Data analysis.
Sequencing of template DNA (sample VV891) (Table S4) was used to associate the 7-bp randomized sequence in the region of interest with a corresponding second 15-bp randomized sequence that serves as its barcode. The identity of the 15-bp barcode in each RNA product was used to determine the identity of bases at positions 4–10 of the lacCONS template from which the RNA product was generated. Sequences derived from the RNA 5′ end of reads that were perfect matches to the sequence of the template were used for analysis of TSS selection. Experiments were performed in duplicate (samples VV854 and VV855 for RNAP-βWT and samples VV860 and VV861 for RNAP-βD446A) (Table S4).
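The barcode-to-TSS-region association can be sketched in Python. The function names and example sequences are hypothetical, and the rule that a barcode paired with more than one TSS-region sequence is discarded is an assumption of this sketch rather than a step stated above; the procedure actually used is described in ref. 11.

```python
def build_barcode_map(template_pairs):
    """Map each 15-bp barcode to its 7-bp TSS-region sequence.

    template_pairs: iterable of (barcode, tss_region) tuples obtained from
    sequencing of template DNA. Barcodes observed with more than one
    TSS-region sequence are discarded (assumption of this sketch).
    """
    first_seen = {}
    ambiguous = set()
    for barcode, region in template_pairs:
        if first_seen.setdefault(barcode, region) != region:
            ambiguous.add(barcode)
    return {b: r for b, r in first_seen.items() if b not in ambiguous}


def tss_region_for_read(read_barcode, barcode_map):
    """Return the sequence at positions 4-10 for an RNA read, or None."""
    return barcode_map.get(read_barcode)


# Hypothetical template-sequencing observations.
pairs = [
    ("A" * 15, "ACGTACG"),
    ("A" * 15, "ACGTACG"),  # same pairing seen twice: consistent
    ("C" * 15, "TTTTTTT"),
    ("C" * 15, "GGGGGGG"),  # conflicting pairing: barcode dropped
]
barcode_map = build_barcode_map(pairs)
print(tss_region_for_read("A" * 15, barcode_map))
```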
Analysis of TSS Selection in Vitro by Primer Extension.
Preparation of template DNA.
Linear DNA templates were generated by PCR using plasmids pHV-S01, pHV-S02, pHV-S03, pHV-S04, pHV-S05, pHV-S06, pHV-S07, pHV-S08, pHV-S17, or pHV-S18 as template and oligonucleotide primer HV121, which contains a 5′ biotin moiety, and oligonucleotide primer HV122.
The biotinylated linear DNA templates generated by PCR were bound to streptavidin-coated paramagnetic beads [Streptavidin MagneSphere Paramagnetic Particles (SA-PMPs); Promega]. To do this, 100 µL SA-PMP slurry per DNA template was washed three times with 100 µL binding buffer [10 mM Tris (pH 8), 150 mM NaCl, and 100 µg/mL BSA]. The SA-PMPs were resuspended in 100 µL binding buffer, 2.5 µL 400 nM DNA template stock was added to each slurry, and the mixture was gently mixed for 30 min at 25 °C. The binding buffer was removed, and the SA-PMPs were washed three times with 1× TB [40 mM Tris (pH 8), 10 mM MgCl2, 50 mM KCl, 10 mM β-mercaptoethanol, 10 µg/mL BSA, and 5% (wt/vol) PEG-8000] and resuspended in 10 µL reaction buffer to obtain 100 nM stock solutions of SA-PMP-conjugated DNA template that were used for the transcription assays.
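The stated 100 nM stock concentration follows from the amounts given above, under the idealized assumption (not stated in the protocol) that all input template binds the beads and none is lost during the washes:

```latex
\[
n_{\text{template}} = 2.5\ \mu\text{L} \times 400\ \text{nM} = 1\ \text{pmol},
\qquad
c_{\text{stock}} = \frac{1\ \text{pmol}}{10\ \mu\text{L}} = 100\ \text{nM}.
\]
```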
Transcription reactions.
In vitro transcription assays were performed by mixing 50 nM RNAP with 10 nM template (attached to beads) for 10 min at 37 °C in 1× TB. Transcription was initiated by adding NTPs to a final concentration of 100 µM. The total reaction volume was 20 μL. Reactions were stopped after 10 min by adding 100 μL stop solution [0.5 mg/mL glycogen and 10 mM EDTA (pH 8.0)]. Magnetic beads were pelleted using a MagneSphere Technology Magnetic Separation Stand (Promega), and the supernatant was transferred to a fresh tube and extracted with acid phenol:chloroform. RNA transcripts were recovered by ethanol precipitation and resuspended in 12 μL water.
Primer-extension reactions.
Oligonucleotide primer HV123 was 32P-5′ end-labeled with T4 polynucleotide kinase in a 50-μL reaction containing 120 pmol of primer, 40 U of enzyme, and 100 μCi of [γ-32P]ATP (Perkin Elmer). The labeling reaction was incubated at 37 °C for 1 h and then at 95 °C for 10 min. Unincorporated nucleotides and salts were removed by passage over an Illustra G-25 microspin column (GE Healthcare). One microliter of labeled primer was mixed with 5 μL of the RNA recovered from the transcription reactions. This mixture was heated at 90 °C for 2 min and immediately transferred to ice. Reverse transcription was performed by adding 4 µL of a mixture containing 10 U AMV reverse transcriptase (New England Biolabs), AMV buffer, dNTPs (10 mM of each dNTP), and 10 U murine RNase inhibitor (New England Biolabs) to the annealed primer-template mixture, incubating at 55 °C for 1 h and then at 95 °C for 5 min, and cooling to 4 °C. Reactions were stopped by addition of 10 μL 98% (vol/vol) formamide containing 10 mM EDTA, 0.02% (wt/vol) bromophenol blue, and 0.02% (wt/vol) xylene cyanol. Samples were electrophoresed on an 8% (wt/vol) acrylamide, 7 M urea slab gel (equilibrated and run using a gradient buffer of 1× TBE in the upper reservoir and 1× TBE, 0.3 M NaOAc in the lower reservoir). Radiolabeled species were detected by storage-phosphor imaging. TSS assignments were made by comparison with a sequencing ladder prepared using the same radiolabeled primer used for the extension reactions and a Sequenase Version 2.0 DNA sequencing kit (USB Corporation). Experiments were performed three independent times (one of the independent replicates for each template is shown in Fig. 3 and Fig. 5C). The values for %TSS (RNAP-βWT) − %TSS (RNAP-βD446A) reported in Fig. 3 were derived by averaging the results of the three experiments.
Analysis of TSS Selection in Vivo from 4^7 (∼16,000) Consensus Promoter Derivatives.
Cell growth.
Escherichia coli DH10B-T1R cells (Life Technologies) containing plasmids pRL706-βWT;3xFLAG or pRL706-βD446A;3xFLAG were transformed with ∼50 ng pMASTER-lacCONS-N7 library to obtain a 25-mL overnight culture representing cells derived from at least 20 million unique transformants; 0.5 mL of the overnight cell culture was used to inoculate 50 mL LB media containing 100 μg/mL carbenicillin and 25 μg/mL chloramphenicol. When the cell density reached an OD600 ∼0.3, 1 mM IPTG was added, and cells were grown for an additional 2 h. Cell suspensions were divided equally among 12 × 2-mL tubes (BioExcell) and centrifuged (1 min, 21,000 × g at room temperature) to collect cells, and supernatants were removed. Cell pellets were then rapidly frozen on dry ice and stored at −80 °C.
pMASTER-lacCONS-N7 plasmid DNA was isolated from these cells using a Plasmid Miniprep kit (Qiagen). Plasmid DNA was used as template in emulsion PCR reactions to generate a product that was sequenced to assign barcodes (see below).
RNA isolation.
Cell pellets derived from 12 mL culture were resuspended in 1 mL lysis buffer (B-Per, Bacterial Protein Extraction Reagent; Thermo Scientific) supplemented with one quarter of a protease inhibitor mixture tablet (complete Mini EDTA-free; Roche), 1 mM EDTA, 80 U Murine RNase Inhibitor (NEB), 100 μg lysozyme (Thermo Scientific), and 150 U DNase I (Thermo Scientific) and incubated for 10 min. The lysate was then clarified by centrifugation (10 min, 21,000 × g), and NaCl was added to a final concentration of 150 mM. The lysate was added to 1 mL anti-FLAG M2 affinity gel (Sigma Aldrich) that had been washed three times with 3 mL 1× TBS and equilibrated in 3 mL wash buffer (B-Per solution containing 150 mM NaCl, 1 mM EDTA, 50 U/mL Murine RNase Inhibitor, and protease inhibitor mixture [complete EDTA-free (Roche); 1 tablet per 50 mL]). The lysate and affinity gel mixture was nutated at 4 °C for 2.5 h in a 1.7-mL centrifuge tube. The mixture was transferred to a 10-mL Econo-Pack disposable chromatography column (Bio-Rad), the flow through was collected, and the affinity gel was washed eight times with 5 mL wash buffer and three times with 250 μL elution buffer (B-Per solution containing 150 mM NaCl, 1 mM EDTA, 50 U/mL Murine RNase Inhibitor, and 2 mg/mL 3× FLAG peptide; GenScript). For the washes with elution buffer, the affinity gel was incubated for 30 min before collection of the fractions. The presence of epitope-tagged βWT or βD446A was analyzed in each fraction by immunoblotting.
To isolate the RNA products associated with RNAP, pooled eluates from above were mixed with three volumes of TRI Reagent solution (Molecular Research Center), incubated at 70 °C for 10 min, and centrifuged (10 min, 21,000 × g) to remove insoluble material. The supernatant was transferred to a fresh tube, ethanol was added to a final concentration of 60.5% (vol/vol), and the mixture was applied to a Direct-zol spin column (Zymo Research). DNase I treatment was performed on-column according to the manufacturer’s recommendations. RNA products were eluted from the column with three sequential portions of 30 μL nuclease-free water that had been heated to 70 °C. Before cDNA library construction, RNA products were treated with 4 U TURBO DNase (Ambion) at 37 °C for 1 h. Following DNase treatment, samples were extracted with acid phenol:chloroform, and RNA products were recovered by ethanol precipitation and resuspended in RNase-free water.
5′ RNA-seq.
Before cDNA library construction, 5′ monophosphate RNA products were first removed by treatment of 0.75–1.3 μg of RNA with 1 U Terminator 5′-Phosphate-Dependent Exonuclease (Epicentre). Samples were extracted with acid phenol:chloroform, and RNA products were recovered by ethanol precipitation and resuspended in RNase-free water. Next, 5′ triphosphate RNA products were converted to 5′ monophosphate RNA products by treating samples with 20 U 5′-RNA polyphosphatase as described in ref. 29. Samples were extracted with acid phenol:chloroform, and RNA products were recovered by ethanol precipitation and resuspended in 10 μL RNase-free water.
5′ RNA-seq analysis was performed as described above.
Data analysis.
In vivo MASTER experiments were performed in triplicate (samples VV871, VV872, and VV873 for RNAP-βWT and samples VV874, VV875, and VV876 for RNAP-βD446A) (Table S4). pMASTER-lacCONS-N7 plasmid DNA isolated from each individual cell culture was used as template in emulsion PCR reactions to generate products that were sequenced to assign barcodes as described in ref. 11. For each RNAP-βWT sample, three emulsion PCR products were generated and sequenced (Table S4). For each RNAP-βD446A sample, one emulsion PCR product was generated and sequenced (Table S4). The identity of the 15-bp barcode in each RNA product was used to determine the identity of bases at positions 4–10 of the lacCONS template from which the RNA product was generated. Sequences derived from the RNA 5′ end of reads that were perfect matches to the sequence of the template were used for analysis of TSS selection.
Analysis of TSS Selection in Natural Promoters in Vivo in E. coli.
Cell growth.
MG1655 cells containing plasmids pRL706-βWT;3xFLAG or pRL706-βD446A;3xFLAG were shaken at 220 rpm at 37 °C in 100 mL 4× LB (40 g Bacto tryptone, 20 g Bacto yeast extract, and 10 g NaCl per liter) containing 200 µg/mL carbenicillin in 500-mL DeLong flasks (Bellco). When cell density reached an OD600 ∼0.6, 1 mM IPTG was added, and cells were grown for an additional 4 h. Cells were harvested and stored as described above.
RNA isolation.
RNA products associated with RNAP were isolated as described above.
5′ RNA-seq.
Before cDNA library construction, enzymatic treatments were performed to first remove 5′ monophosphate RNA products and second convert 5′ triphosphate RNA products to 5′ monophosphate RNA products as described above.
5′ RNA-seq analysis was performed as described above with the following exceptions. In the ligation of adaptor to the 5′ end of RNA products, oligo s1086 was used instead of oligo s1206. In the size selection of adaptor-ligated RNA products, all species larger than the 5′ adaptor were excised from the gel instead of ∼80- to ∼300-nt species. In the cDNA synthesis, primer s1082 was used instead of s128. In the size selection of cDNA products, ∼90- to ∼450-nt cDNA products were isolated instead of ∼80- to ∼150-nt cDNA products. In the purification of cDNA products, ∼160- to ∼350-bp species were isolated instead of ∼170-bp species.
Data analysis.
Identification of TSS positions and TSS regions for natural promoters in E. coli was done essentially as described in ref. 31. The first six bases of each read were trimmed (to remove sequences introduced during the cDNA library construction procedure), and the next 30 bases were aligned to the E. coli reference genome (NC_000913.3) using Bowtie (32). Among these reads, we used those that aligned to a unique position in the genome with zero mismatches for the analysis of TSS selection. Using data derived from the analysis of RNA products associated with RNAP-βWT (samples VV631, VV632, VV655, and VV656; Table S4), we defined a list of primary TSS positions that met the following two criteria: (i) the read count at the coordinate was above a threshold value (≥50 reads) and (ii) the read count at the coordinate represented a local maximum in an 11-bp window centered on the coordinate. For each primary TSS position, we designated the positions spanning 5-bp upstream to 5-bp downstream as a TSS region. Next, for each TSS region, we calculated the percentage of reads starting at each of the 11 positions: %TSS_Y = 100 × (no. reads starting at position Y/total no. reads starting within the TSS region).
To enable a comparison between data derived from analysis of nascent RNA associated with RNAP-βWT with that derived from analysis of nascent RNA associated with RNAP-βD446A, we identified TSS regions for which we obtained ≥50 total reads starting within the TSS region in each of the eight samples used for the analysis (VV631–VV634 and VV655–VV658; Table S4). Next, we averaged the %TSS values observed for RNAP-βWT (samples VV631, VV632, VV655, and VV656; Table S4) for each position within these TSS regions. We identified 1,500 TSS positions with an above-threshold value of %TSS (≥20%). For each of these 1,500 TSS positions, we calculated the difference between the average %TSS observed in experiments performed with RNAP-βWT (average derived from samples VV631, VV632, VV655, and VV656; Table S4) and that observed in experiments performed with RNAP-βD446A (average derived from samples VV633, VV634, VV657, and VV658; Table S4). Table S1 lists TSS positions for which this difference was ≥20%.
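The comparison of averaged %TSS values can be sketched in Python. The function name, the data layout (one dictionary of %TSS values per replicate), and the example values are hypothetical. The sketch reads "difference" literally as the signed quantity (RNAP-βWT average minus RNAP-βD446A average), which is an assumption, and it omits the preceding filter requiring ≥50 reads per TSS region in each of the eight samples.

```python
from statistics import mean


def tss_positions_with_shift(wt_reps, mut_reps, pct_min=20.0, diff_min=20.0):
    """Return {position: WT average %TSS minus mutant average %TSS}.

    wt_reps and mut_reps are lists of {position: %TSS} dictionaries,
    one per replicate. A position is reported when its WT average is
    >= pct_min and the signed difference is >= diff_min.
    """
    hits = {}
    for pos in wt_reps[0]:
        wt_avg = mean(rep[pos] for rep in wt_reps)
        if wt_avg < pct_min:
            continue
        mut_avg = mean(rep[pos] for rep in mut_reps)
        diff = wt_avg - mut_avg
        if diff >= diff_min:
            hits[pos] = diff
    return hits


# Hypothetical %TSS values at two positions, two replicates per RNAP.
wt = [{1: 70.0, 2: 30.0}, {1: 80.0, 2: 20.0}]
mut = [{1: 40.0, 2: 60.0}, {1: 50.0, 2: 50.0}]
result = tss_positions_with_shift(wt, mut)
print(result)  # {1: 30.0}
```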
Acknowledgments
We thank Jared Knoblauch for assistance with data analysis. This work was supported by National Institutes of Health Grants GM041376 (to R.H.E.), GM088343 (to B.E.N.), GM096454 (to B.E.N.), and GM115910 (to B.E.N.).
Footnotes
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Data deposition: The sequence reported in this paper has been deposited in the NIH/NCBI Sequence Read Archive (accession no. SRP071742).
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1603271113/-/DCSupplemental.
References
- 1.Saecker RM, Record MT, Jr, Dehaseth PL. Mechanism of bacterial transcription initiation: RNA polymerase - promoter binding, isomerization to initiation-competent open complexes, and initiation of RNA synthesis. J Mol Biol. 2011;412(5):754–771. doi: 10.1016/j.jmb.2011.01.018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Decker KB, Hinton DM. Transcription regulation at the core: Similarities among bacterial, archaeal, and eukaryotic RNA polymerases. Annu Rev Microbiol. 2013;67:113–139. doi: 10.1146/annurev-micro-092412-155756. [DOI] [PubMed] [Google Scholar]
- 3.Ruff EF, Record MT, Jr, Artsimovitch I. Initial events in bacterial transcription initiation. Biomolecules. 2015;5(2):1035–1062. doi: 10.3390/biom5021035. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Feklístov A, Sharon BD, Darst SA, Gross CA. Bacterial sigma factors: A historical, structural, and genomic perspective. Annu Rev Microbiol. 2014;68:357–376. doi: 10.1146/annurev-micro-092412-155737. [DOI] [PubMed] [Google Scholar]
- 5.Aoyama T, Takanami M. Essential structure of E. coli promoter II. Effect of the sequences around the RNA start point on promoter function. Nucleic Acids Res. 1985;13(11):4085–4096. doi: 10.1093/nar/13.11.4085. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Sørensen KI, Baker KE, Kelln RA, Neuhard J. Nucleotide pool-sensitive selection of the transcriptional start site in vivo at the Salmonella typhimurium pyrC and pyrD promoters. J Bacteriol. 1993;175(13):4137–4144. doi: 10.1128/jb.175.13.4137-4144.1993. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Jeong W, Kang C. Start site selection at lacUV5 promoter affected by the sequence context around the initiation sites. Nucleic Acids Res. 1994;22(22):4667–4672. doi: 10.1093/nar/22.22.4667. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Liu J, Turnbough CL., Jr Effects of transcriptional start site sequence and position on nucleotide-sensitive selection of alternative start sites at the pyrC promoter in Escherichia coli. J Bacteriol. 1994;176(10):2938–2945. doi: 10.1128/jb.176.10.2938-2945.1994. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Walker KA, Osuna R. Factors affecting start site selection at the Escherichia coli fis promoter. J Bacteriol. 2002;184(17):4783–4791. doi: 10.1128/JB.184.17.4783-4791.2002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Lewis DE, Adhya S. Axiom of determining transcription start points by RNA polymerase in Escherichia coli. Mol Microbiol. 2004;54(3):692–701. doi: 10.1111/j.1365-2958.2004.04318.x. [DOI] [PubMed] [Google Scholar]
- 11.Vvedenskaya IO, et al. Massively systematic transcript end readout, “MASTER”: Transcription start site selection, transcriptional slippage, and transcript yields. Mol Cell. 2015;60(6):953–965. doi: 10.1016/j.molcel.2015.10.029. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Robb NC, et al. The transcription bubble of the RNA polymerase-promoter open complex exhibits conformational heterogeneity and millisecond-scale dynamics: Implications for transcription start-site selection. J Mol Biol. 2013;425(5):875–885. doi: 10.1016/j.jmb.2012.12.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Winkelman JT, et al. Multiplexed protein-DNA cross-linking: Scrunching in transcription start site selection. Science. 2016;351(6277):1090–1093. doi: 10.1126/science.aad6881. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Winkelman JT, Chandrangsu P, Ross W, Gourse RL. Open complex scrunching before nucleotide addition accounts for the unusual transcription start site of E. coli ribosomal RNA promoters. Proc Natl Acad Sci USA. 2016;113(13):E1787–E1795. doi: 10.1073/pnas.1522159113. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Zhang Y, et al. Structural basis of transcription initiation. Science. 2012;338(6110):1076–1080. doi: 10.1126/science.1227786. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Vvedenskaya IO, et al. Interactions between RNA polymerase and the “core recognition element” counteract pausing. Science. 2014;344(6189):1285–1289. doi: 10.1126/science.1253458. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Maitra U, Hurwitz H. The role of DNA in RNA synthesis, IX. Nucleoside triphosphate termini in RNA polymerase products. Proc Natl Acad Sci USA. 1965;54(3):815–822. doi: 10.1073/pnas.54.3.815. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Jorgensen SE, Buch LB, Nierlich DP. Nucleoside triphosphate termini from RNA synthesized in vivo by Escherichia coli. Science. 1969;164(3883):1067–1070. doi: 10.1126/science.164.3883.1067. [DOI] [PubMed] [Google Scholar]
- 19.Hawley DK, McClure WR. Compilation and analysis of Escherichia coli promoter DNA sequences. Nucleic Acids Res. 1983;11(8):2237–2255. doi: 10.1093/nar/11.8.2237. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Shultzaberger RK, Chen Z, Lewis KA, Schneider TD. Anatomy of Escherichia coli σ70 promoters. Nucleic Acids Res. 2007;35(3):771–788. doi: 10.1093/nar/gkl956. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Wilson HR, Archer CD, Liu JK, Turnbough CL., Jr Translational control of pyrC expression mediated by nucleotide-sensitive selection of transcriptional start sites in Escherichia coli. J Bacteriol. 1992;174(2):514–524. doi: 10.1128/jb.174.2.514-524.1992. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22. Qi F, Turnbough CL Jr. Regulation of codBA operon expression in Escherichia coli by UTP-dependent reiterative transcription and UTP-sensitive transcriptional start site switching. J Mol Biol. 1995;254(4):552–565. doi: 10.1006/jmbi.1995.0638.
- 23. Tu AH, Turnbough CL Jr. Regulation of upp expression in Escherichia coli by UTP-sensitive selection of transcriptional start sites coupled with UTP-dependent reiterative transcription. J Bacteriol. 1997;179(21):6665–6673. doi: 10.1128/jb.179.21.6665-6673.1997.
- 24. Walker KA, Mallik P, Pratt TS, Osuna R. The Escherichia coli Fis promoter is regulated by changes in the levels of its transcription initiation nucleotide CTP. J Biol Chem. 2004;279(49):50818–50828. doi: 10.1074/jbc.M406285200.
- 25. Turnbough CL Jr. Regulation of bacterial gene expression by the NTP substrates of transcription initiation. Mol Microbiol. 2008;69(1):10–14. doi: 10.1111/j.1365-2958.2008.06272.x.
- 26. Turnbough CL Jr, Switzer RL. Regulation of pyrimidine biosynthetic gene expression in bacteria: Repression without repressors. Microbiol Mol Biol Rev. 2008;72(2):266–300. doi: 10.1128/MMBR.00001-08.
- 27. Tang H, et al. Location, structure, and function of the target of a transcriptional activator protein. Genes Dev. 1994;8(24):3058–3067. doi: 10.1101/gad.8.24.3058.
- 28. Mukhopadhyay J, et al. Fluorescence resonance energy transfer (FRET) in analysis of transcription-complex structure and function. Methods Enzymol. 2003;371:144–159. doi: 10.1016/S0076-6879(03)71010-6.
- 29. Goldman SR, et al. NanoRNAs prime transcription initiation in vivo. Mol Cell. 2011;42(6):817–825. doi: 10.1016/j.molcel.2011.06.005.
- 30. Vvedenskaya IO, Goldman SR, Nickels BE. Preparation of cDNA libraries for high-throughput RNA sequencing analysis of RNA 5′ ends. Methods Mol Biol. 2015;1276:211–228. doi: 10.1007/978-1-4939-2392-2_12.
- 31. Druzhinin SY, et al. A conserved pattern of primer-dependent transcription initiation in Escherichia coli and Vibrio cholerae revealed by 5′ RNA-seq. PLoS Genet. 2015;11(7):e1005348. doi: 10.1371/journal.pgen.1005348.
- 32. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25. doi: 10.1186/gb-2009-10-3-r25.
- 33. Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: A sequence logo generator. Genome Res. 2004;14(6):1188–1190. doi: 10.1101/gr.849004.
- 34. Severinov K, Mooney R, Darst SA, Landick R. Tethering of the large subunits of Escherichia coli RNA polymerase. J Biol Chem. 1997;272(39):24137–24140. doi: 10.1074/jbc.272.39.24137.
- 35. Vvedenskaya IO, et al. Growth phase-dependent control of transcription start site selection and gene expression by nanoRNAs. Genes Dev. 2012;26(13):1498–1507. doi: 10.1101/gad.192732.112.