Author manuscript; available in PMC: 2019 Mar 22.
Published in final edited form as: Cell. 2018 Mar 15;173(1):181–195.e18. doi: 10.1016/j.cell.2018.02.034

Pervasive regulatory functions of mRNA structure revealed by high-resolution SHAPE probing

Anthony M Mustoe 1,3,*, Steven Busan 1,3, Greggory M Rice 1,2,3, Christine E Hajdin 2, Brant K Peterson 2, Vera Ruda 2, Neil Kubica 2, Razvan Nutiu 2, Jeremy L Baryza 2, Kevin M Weeks 1,4,*
PMCID: PMC5866243  NIHMSID: NIHMS944914  PMID: 29551268

SUMMARY

Messenger RNAs (mRNAs) can fold into complex structures that regulate gene expression. Resolving such structures de novo has remained challenging and has limited understanding of the prevalence and functions of mRNA structure. We use SHAPE-MaP experiments in living E. coli cells to derive quantitative, nucleotide-resolution structure models for 194 endogenous transcripts encompassing approximately 400 genes. Individual mRNAs have exceptionally diverse architectures, and most contain well-defined structures. Active translation destabilizes mRNA structure in cells. Nevertheless, mRNA structure remains similar between in-cell and cell-free environments, indicating broad potential for structure-mediated gene regulation. We find that translation efficiency of endogenous genes is regulated by unfolding kinetics of structures overlapping the ribosome binding site. We discover conserved structured elements in 35% of untranslated regions, several of which we validate as novel protein binding motifs. RNA structure regulates every gene studied here in a meaningful way, implying that most functional structures remain to be discovered.

eTOC Blurb

High-resolution probing of hundreds of genes in living E. coli cells reveals that bacterial mRNAs fold into highly diverse and complex structures, and that these structures have widespread regulatory functions.


INTRODUCTION

Nearly all RNA molecules fold into structures that are stabilized by networks of base pairing interactions. These structures mediate numerous functions, ranging from catalysis to ligand-responsive gene regulation (Cech and Steitz, 2014). In mRNAs, it is hypothesized that RNA structure broadly regulates gene translation efficiency (TE) (reviewed in Kozak, 2005), and numerous complex post-transcriptional regulatory structures have been identified in 5′ and 3′ untranslated regions (UTRs) (Cech and Steitz, 2014). However, efforts to understand the prevalence and role of mRNA-structure-based regulatory mechanisms have been hampered by long-standing challenges in RNA structure modeling.

Recent transcriptome-wide structure probing experiments have implied that mRNAs are frequently structured (Del Campo et al., 2015; Ding et al., 2013; Lu et al., 2016; Rouskin et al., 2013; Spitale et al., 2015; Sugimoto et al., 2015; Wan et al., 2014; Zubradt et al., 2017), but studies to date have lacked the resolution, quantitative accuracy, and comprehensive data coverage necessary to characterize structure at the level of individual mRNAs (Smola et al., 2015a; Weeks, 2015). In particular, there is no validated pathway for using dimethyl sulfate (DMS) probing data or ligation-dependent strategies to accurately model complex RNAs such as endogenous cellular mRNAs. Consequently, fundamental questions such as whether individual mRNAs adopt well-defined or dynamic structures, whether and why mRNA structure differs in vivo compared to ex vivo, and the extent to which RNA structures regulate gene expression have remained unresolved.

Reliable structure models are essential for understanding mRNA regulatory mechanisms. A prime example is the question of what role, if any, RNA structure plays in tuning gene translation efficiency (TE), the amount of protein produced from a given mRNA transcript. TE is a precisely tuned quantity, varying over 100-fold between different genes, and is central to how cells maintain protein homeostasis (Li et al., 2014). Numerous studies have shown that RNA structural stability around the ribosome binding site (RBS) is a major determinant of TE for designed genes, primarily using artificial reporter genes engineered to have specific compact structures in the vicinity of the translation start site (Goodman et al., 2013; Kudla et al., 2009; Salis et al., 2009). Indeed, for synthetic genes, quantitative models can predict and allow rational tuning of TE (Salis et al., 2009). However, studies of native, unmanipulated endogenous genes using poorly validated RNA structure models have observed poor correlations between TE and RBS structure (Boël et al., 2016; Guimaraes et al., 2014; Li et al., 2014; Tuller et al., 2010b). Several major studies have since proposed that TE is regulated via different mechanisms in endogenous genes (Boël et al., 2016; Burkhardt et al., 2017), but in the absence of confident RNA structural models it is premature to draw firm conclusions.

The ability to efficiently model accurate mRNA structures also has the potential to transform our understanding of the role of structure in mediating more complex forms of regulation. To date, discovery of new functional non-coding motifs has been largely restricted to bioinformatics and genetics strategies. These strategies work well for identifying large, broadly conserved structures, such as riboswitches and ribozymes (Weinberg et al., 2015), but suffer from unacceptably high false positive rates when trying to identify smaller or less conserved motifs (Eddy, 2014). The prevalence of non-coding regulatory motifs genome-wide has therefore remained controversial, but it is likely that many functional motifs remain to be discovered. By comparison, starting with an accurate RNA structure model inverts the discovery problem, and would potentially facilitate highly sensitive strategies to discovering novel RNA biology.

In this study, we harness recent technological advances to create the first “no-compromises” RNA structure probing dataset on a transcriptome-wide scale. This conceptual advance allows us to dissect the mechanisms shaping in-cell RNA structure with unparalleled resolution, and enables accurate structure modeling for hundreds of mRNA transcripts. These structure models in turn allow us to test key hypotheses regarding the prevalence and function of mRNA structure. Overall, our work establishes RNA structure as a pervasive and fundamental regulator of gene expression, likely directing expression of every gene in E. coli.

RESULTS

High-resolution probing reveals mRNAs adopt highly diverse structures

We used the SHAPE-MaP (Siegfried et al., 2014; Smola et al., 2015b) chemical probing strategy to obtain quantitative, single-nucleotide resolution measurements of RNA structure across the E. coli transcriptome. SHAPE reactivities are proportional to local nucleotide flexibility and thus provide a direct measure of the extent of RNA structure. Using the extensively validated reagent 1-methyl-7-nitroisatoic anhydride (1M7), we probed RNA structure under three conditions: (i) in living E. coli cells during mid-log growth in liquid culture, (ii) in living cells treated with the antibiotic kasugamycin, which inhibits translation initiation, and (iii) in protein- and ribosome-free extracts maintained in native-like buffers, which we refer to as cell-free (Fig. 1A).

Figure 1. E. coli RNA structure overview.

(A) Experimental strategy. (B) Diversity of E. coli mRNA structures reflected by variation in median gene SHAPE reactivity. (C) Nucleotide-resolution SHAPE profiles for selected genes. Genes are labeled in B. (D) Comparison of in-cell and cell-free SHAPE reactivities for coding regions shows RNA structure is destabilized in cells, but clearly correlated overall. See also Figures S1 and S2.

Critically for this study, we focused on studying native mRNAs in E. coli for which it was possible to acquire near-complete and very high quality chemical probing data. This approach is thus distinct from prior transcriptome-scale studies, which used most or all collected data, but due to data sparseness and irregularity at the per-nucleotide level, required most chemical probing information to be averaged over many genes or averaged over large regions of an RNA. We applied an unbiased whole-transcriptome sequencing strategy which yielded high quality structural data for 194 highly expressed transcripts, encoding approximately 400 genes, that met stringent read-depth and completeness thresholds (Fig. 1B) (Siegfried et al., 2014). These datasets are of comparable quality to those collected in focused studies of individual RNAs (Fig. 1C). 1M7 readily penetrates E. coli cells (McGinnis et al., 2015; Tyrrell et al., 2013; Watters et al., 2016) and we resolve precise nucleotide-resolution changes in SHAPE reactivity reflective of protein binding in non-coding RNAs in cells (Fig. S1). Reproducibility was confirmed by comparisons between biological replicates (Fig. S2).

We initially characterized structural variation across different classes of RNA based on their cell-free SHAPE reactivities. Nucleotide-resolution SHAPE data immediately revealed the enormous diversity in RNA structure across E. coli genes (Figs. 1B, C). This structural heterogeneity is obscured in meta-gene analyses (Fig. S2), and clearly no individual RNA has a structure matching that of an averaged meta-gene. Non-coding RNAs (ncRNAs) and pre-tRNAs have low SHAPE reactivities (Fig. 1B, C), consistent with these ncRNAs possessing stable, well-defined secondary and tertiary structures. By comparison, SHAPE reactivities of coding regions vary dramatically. Some genes exhibit very little stable structure, and others are structured to degrees similar to that of ncRNAs (Figs. 1B, C). Within a given gene-product category, there is again a wide diversity in mRNA structure (Fig. 1C). There is no periodicity in the reactivity profiles of coding regions, indicating that, at least in E. coli, mRNA structure is not periodic (Fig. S2). We suggest that periodicities observed in other studies likely reflect sequence biases of non-MaP-based structure-probing methods and, for structures probed in cells, second-order effects of local ribosome-induced unfolding (see Methods). Overall, mRNA structures are diverse and largely orthogonal to gene identity and thus potentially able to exert heterogeneous and transcript-specific roles in regulating gene expression.

Translation transiently disrupts mRNA structure in cells

Comparisons between in-cell and cell-free datasets revealed that the cellular environment has a significant effect on mRNA structure. Specifically, coding regions are less structured (have higher SHAPE reactivities) in cells than under cell-free conditions (Fig. 1D), consistent with observations from prior studies (Burkhardt et al., 2017; Ding et al., 2013; Rouskin et al., 2013; Spitale et al., 2015). We hypothesized that this structural destabilization was due to ribosome-induced mRNA unfolding during translation (Takyar et al., 2005), and therefore examined the relationship between in-cell SHAPE reactivity and gene TE, which is proportional to average ribosome occupancy (Li et al., 2014).

Three lines of evidence support that mRNA structural disruption observed in cells is primarily due to transient unfolding caused by active translation. First, we observe strong transcriptome-wide correlations between gene TE and in-cell SHAPE reactivity, but not with cell-free SHAPE reactivity (Fig. 2A). Second, in polycistronic transcripts, in-cell SHAPE reactivities increase precisely in highly translated genes, whereas genes on the same transcript with low TE have comparable in-cell and cell-free reactivities (Fig. 2B). Third, compared to normal in-cell conditions, SHAPE reactivity decreases when translation is partially inhibited by the antibiotic kasugamycin, and the correlation between TE and SHAPE reactivity is sharply reduced (Fig. 2A, B). By contrast, kasugamycin treatment has no effect on the structure of ncRNAs, such that any structural destabilization is constant across both in-cell conditions, consistent with the action expected of chaperone proteins such as Hfq (Fig. 2C). Thus, while multiple cellular factors can remodel RNA structure in vivo, ribosome-induced unfolding is a primary cause of mRNA destabilization in cells, and this destabilization correlates with the translation level of individual genes.

Figure 2. Translation destabilizes coding RNA structure.

(A) Gene median SHAPE reactivity versus translation efficiency (TE) (Li et al., 2014). (B) SHAPE reactivity profiles for polycistronic mRNAs. Reactivities are shown as medians over 51-nt sliding windows. Translation efficiency is shown beneath each gene. In-cell SHAPE reactivities increase specifically in highly translated genes. Kasugamycin treatment partially abrogates this increase in mRNAs, but (C) has no effect on non-coding RNAs glmZ and gcvB. (D) Fraction of high-confidence (pairing probability >98%) base pairs spanning greater than 50 nucleotides in cell-free and in-cell coding regions as a function of TE. Long-range base pairs are specifically disfavored in highly translated genes. (E) Percentages of base pairs shared in minimum free energy RNA structure models. Boxes indicate the interquartile range (IQR), and whiskers indicate data within 1.5×IQR of the top and bottom quartiles. See also Figure S3.

Despite the destabilization caused by translation, SHAPE reactivities under in-cell, cell-free, and kasugamycin-treated conditions remain strongly correlated, suggesting that RNA structure is, on average, maintained in cells (Fig. 1D, S2). A unique advantage of 1M7 SHAPE-MaP data is that they can be used to guide accurate secondary structure modeling using extensively validated strategies (Siegfried et al., 2014). Structural modeling was performed for all transcripts at each condition with sufficient SHAPE data, yielding both minimum free energy structure models and base-pairing probabilities. Consistent with the enormous diversity among SHAPE reactivity profiles, different transcripts exhibit highly variable degrees of structure (Fig. S3). For some transcripts, 50% of nucleotides form high-probability base pairs, indicating that the mRNA adopts a well-defined global structure. For other transcripts, only ~10% of nucleotides form well-defined base pairs, indicating that the mRNA structure is highly dynamic. In-cell structure models have ~20% fewer base pairs than cell-free and kasugamycin structure models (Fig. S3), consistent with translation-induced structural destabilization. Highly translated coding regions are selectively depleted of high-probability long-range base pairs in cells, implying that ribosome-induced unfolding specifically disfavors long-range pairing (Fig. 2D). Nevertheless, >60% of minimum free energy and >70% of high-probability base pairs are shared between in-cell, cell-free, and kasugamycin structure models (Figs. 2E, S3), and most structural differences are localized to dynamic regions (see Methods). By contrast, structure models predicted without SHAPE data deviate significantly from data-driven models (Figs. 2E, S3).
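The comparison of base pairs shared between structure models can be illustrated with a minimal sketch. It assumes structures are available as dot-bracket strings and that "shared" means the fraction of pairs in one model that also appear in the other; the exact definition used in the study is given in its Methods and may differ. The example strings are invented for illustration.

```python
def parse_dot_bracket(structure):
    """Return the set of (i, j) base pairs (0-based, i < j) in a dot-bracket string."""
    stack, pairs = [], set()
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def percent_shared(structure_a, structure_b):
    """Percentage of base pairs in structure_a that also occur in structure_b.

    Assumption: the denominator is the number of pairs in structure_a.
    """
    pairs_a = parse_dot_bracket(structure_a)
    pairs_b = parse_dot_bracket(structure_b)
    if not pairs_a:
        return 0.0
    return 100.0 * len(pairs_a & pairs_b) / len(pairs_a)


# Hypothetical toy models: the in-cell model has lost the second hairpin.
cell_free = "((((...))))..((...))"
in_cell = "((((...))))........."

shared_from_cell_free = percent_shared(cell_free, in_cell)
shared_from_in_cell = percent_shared(in_cell, cell_free)
```

Because the measure is asymmetric, the choice of reference model matters: here every in-cell pair is present in the cell-free model, but not the reverse.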

In sum, RNA structure is destabilized in the cellular environment by active translation, which particularly disfavors long-range base pairing. Nonetheless, in-cell RNA structure does not appear to undergo radical changes, leaving intact the potential for RNA structure to regulate cellular processes.

mRNA structure globally tunes gene translation efficiency

Our SHAPE-directed structure models provide an unparalleled resource for exploring hypotheses on the cellular functions of mRNA structure. One of the most important potential functions of mRNA structure is as a regulator of gene TE. Seminal studies of simplified, exogenously expressed model genes have shown that RNA structures that occlude the Shine-Dalgarno sequence and beginning of the coding sequence – collectively termed the ribosome binding site (RBS) – impede loading of the gene into the mRNA binding channel of the 30S ribosomal subunit, and therefore reduce TE (de Smit and van Duin, 1990; Goodman et al., 2013; Kudla et al., 2009; Salis et al., 2009). In contrast, studies of authentic native genes have reported that RBS structure is only weakly correlated with TE (Boël et al., 2016; Guimaraes et al., 2014; Li et al., 2014; Tuller et al., 2010b). More recently, it has been suggested that average structure across the entire coding sequence (CDS), rather than RBS structure, is the key determinant of TE for endogenous native genes (Burkhardt et al., 2017). Importantly, however, all of these studies relied on naïve prediction or unvalidated RNA structure modeling strategies.

Understanding TE in endogenous polycistronic transcripts is complicated by the phenomenon of translational coupling, in which translation of a downstream gene is dependent on and coupled to translation of upstream genes (Kozak, 2005). Because the mechanism of translation initiation likely differs in translationally coupled genes, we excluded possible translationally coupled genes from our analysis (Fig. 3A). Genes were required to be either the first gene on the transcript, or have more than a two-fold different TE than the immediate upstream gene. The distinct role of RNA structure in translational coupling is discussed later.
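The exclusion rule described above can be sketched as a simple filter. This is an illustrative reading of the stated criterion (first gene on the transcript, or more than two-fold different TE than the immediate upstream gene), not the study's actual code; the gene names and TE values are hypothetical, and TE values are assumed to be positive.

```python
def uncoupled_genes(transcript, fold_cutoff=2.0):
    """Return names of genes retained as 'translationally uncoupled'.

    transcript: list of (gene_name, te) tuples in 5'-to-3' order, te > 0.
    A gene is kept if it is first on the transcript, or if its TE differs
    from that of the immediate upstream gene by more than fold_cutoff.
    """
    kept = []
    for index, (name, te) in enumerate(transcript):
        if index == 0:
            kept.append(name)
            continue
        upstream_te = transcript[index - 1][1]
        fold_difference = max(te, upstream_te) / min(te, upstream_te)
        if fold_difference > fold_cutoff:
            kept.append(name)
    return kept


# Hypothetical three-gene operon with made-up TE values.
operon = [("geneA", 1.0), ("geneB", 1.5), ("geneC", 0.4)]
retained = uncoupled_genes(operon)
```

In this toy operon, geneB is set aside as potentially coupled to geneA (1.5-fold difference), whereas geneC is retained (3.75-fold difference from geneB).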

Figure 3. RBS structure regulates translation.

(A) Identification of potential translationally coupled genes, which were excluded from TE analysis. (B) Equilibrium unfolding model for mRNA loading into the 30S mRNA channel (top) and correlation between TE and RBS ΔGunfold for translationally uncoupled genes (bottom). N=157. (C) Kinetic unfolding model for mRNA loading into the 30S subunit (top) and correlation between TE and RBS ΔG‡unfold for translationally uncoupled genes (bottom). (D) Correlation between gene TE and ΔG‡unfold computed for different coding sequence windows. The indicated significance cutoff corresponds to p≈0.05 (two-sided Wald test; precise cutoff varies between datasets). (E) Example of two high TE genes with structured CDSs in-cell. Base pairs are shown as arcs, colored by pairing probability. Both genes have unstructured RBSs, and hence are predicted to have high TE by the RBS kinetic unfolding model (C), but not by models considering CDS structure. See also Figure S4.

We used our SHAPE-directed structure models to examine two alternative biophysical mechanisms through which RBS structure may regulate mRNA loading onto the 30S subunit during translation initiation. If loading is an equilibrium process, TE should vary with the equilibrium free energy of unfolding the RBS structure (ΔGunfold) (Fig. 3B) (Salis et al., 2009). Alternatively, ribosome loading could depend on a kinetic competition between RBS unfolding versus dissociation of the mRNA from the 30S subunit (de Smit and van Duin, 2003). In this kinetic scenario, TE should vary with the free energy of the unfolding transition state, ΔG‡unfold (Fig. 3C). Both ΔGunfold and ΔG‡unfold are straightforward to computationally approximate, but will only be accurate if the underlying RNA structure model is also accurate. Analysis of our SHAPE-directed models revealed that TE is weakly correlated with the equilibrium ΔGunfold (r = −0.37), but is strongly anticorrelated with ΔG‡unfold (r = −0.64), indicating that TE is strongly dependent on RBS unfolding kinetics (Fig. 3B, C, S4; see Methods). Significantly, this r = −0.64 correlation between RBS structure and TE is comparable to that observed from prior studies of simplified engineered genes (Goodman et al., 2013; Kudla et al., 2009; Salis et al., 2009), suggesting that native endogenous genes regulate TE via similar mechanisms. (Note that prior studies have not attempted to resolve kinetic versus equilibrium mechanisms; see Discussion.) This strong correlation is not inherent to our gene-set. When we repeated our analysis using structures predicted without SHAPE data we observed only a weak correlation between ΔG‡unfold and TE (r = −0.33; Fig. S4), exactly consistent with prior studies of endogenous genes (Boël et al., 2016; Li et al., 2014). Thus, good structural models, as obtained by SHAPE-directed modeling, are essential for understanding the relationship between RNA structure and gene expression in native mRNAs and, in this case, inform a new understanding of regulation of native genes in E. coli.
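For readers who want to reproduce this type of comparison on their own data, the following minimal sketch computes a correlation coefficient between per-gene unfolding free energies and TE. It assumes that r denotes a Pearson coefficient and that TE is log-transformed before comparison; both are assumptions here, and the numbers are invented toy values, not data from this study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-gene values: unfolding free energy (kcal/mol) and log10 TE.
unfolding_energy = [2.0, 4.0, 6.0, 8.0]
log_te = [1.0, 0.5, 0.2, -0.4]

r_toy = pearson_r(unfolding_energy, log_te)
```

A negative coefficient, as in this toy example, corresponds to the reported trend that genes whose RBS is costlier to unfold tend to have lower TE.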

It has also been proposed that RNA structures in the CDS can affect TE, potentially by modulating the rate of translation elongation (Burkhardt et al., 2017). We therefore examined the relationship between TE and ΔG‡unfold for windows downstream of the RBS (Fig. 3D). ΔG‡unfold is weakly correlated with gene TE over the first 150 nucleotides of the CDS (r ≈ −0.3; Fig. 3D), suggesting that stable structures at the 5′ CDS can reduce TE, and consistent with this region playing an outsized role in determining the rate of translation elongation (Tuller et al., 2010a). However, the correlation is much weaker than that observed between RBS structure and TE. In addition, there is no correlation between TE and ΔG‡unfold past this initial 5′ region (Fig. 3D). Comparable results were observed for the equilibrium ΔGunfold of CDS structure. Therefore, although translation destabilizes CDS structure, highly translated genes can be highly structured, and we identified many highly translated genes with stable, well-defined CDS structures (Fig. 3E). Thus, our analysis collectively indicates that RNA structure primarily affects TE at the stage of translation initiation at the RBS, with TE relatively unaffected by downstream CDS structure.

To directly validate the kinetic RBS unfolding model of endogenous TE, we constructed translational fusions between endogenous genes and a green fluorescent protein reporter (GFP; Fig. 4). To preserve structures observed in our SHAPE-directed models, we included both the endogenous RBS and flanking regions containing self-contained structural elements upstream and downstream of the endogenous start codon. The TE of each fusion was then assessed as the normalized GFP fluorescence measured by flow-cytometry. Critically, GFP expression was strongly anticorrelated with the expected ΔG‡unfold of the RBS (r = −0.55; Fig. 4C), supporting the fundamental importance of RBS structure in regulating TE. Consistent with the importance of the kinetic unfolding mechanism, GFP expression was less correlated with (equilibrium) ΔGunfold (r = −0.48). Thus, even though native endogenous sequences are structurally complex and highly heterogeneous relative to each other, with accurate secondary structure models, it is possible to detect a strong relationship between RBS structure and TE, and this relationship is conserved across both authentic native endogenous genes and heterologous reporter systems.

Figure 4. Reporter-gene validation of RBS kinetic unfolding model.

(A) Example parent endogenous transcript and fusion to GFP. Lengths of the fused non-coding and CDS segments are indicated. In-cell structures are shown as pairing probability arcs, as in Figure 3. The RBS is highlighted in brown, with the computed ΔG‡unfold shown underneath. (B) Example fusions for endogenous genes predicted to have moderate and low ΔG‡unfold. Note that, despite being embedded in a larger hairpin structure, the dapF RBS is located in a relatively unstructured loop with moderate ΔG‡unfold, and hence is predicted to have moderate TE by the kinetic unfolding model. (C) Fusion genes recapitulate predicted trend between expression and RBS ΔG‡unfold. Protein expression was measured as GFP fluorescence normalized to an RFP reference encoded on the same plasmid (nGFP). Genes shown in panels A and B are highlighted in red. Data represent the mean ± SD from three replicates. N=29. P-value computed by two-sided Wald test.

mRNA structure mediates translational coupling

Genes in polycistronic transcripts are often translationally coupled, meaning that translation of a downstream gene is modulated by translation of the preceding gene. Studies of several model transcripts have indicated that RNA structures can mediate translational coupling by acting as conformational switches that mask the RBS until unfolded by upstream ribosomes (Fig. 5A) (Kozak, 2005). Indeed, analysis of the “potentially translationally coupled” genes excluded from our analyses above revealed a much weaker relationship between RBS ΔGunfold and TE (r = −0.37; not shown), supporting that translationally coupled genes are regulated by different mechanisms. We therefore used our structure models to investigate the relevance of a structural switching mechanism, transcriptome-wide.

Figure 5. RNA structure mediates translational coupling.

(A) Model of structure-mediated translational coupling in which upstream translation unfolds otherwise inhibitory RNA structures. (B, C) Representative genes possessing many or few gene-linking base pairs. In-cell structures are shown as pairing probability arcs, as in Figure 3. Translation efficiency is shown beneath each gene. (D) In-cell transcriptome-wide analysis reveals that having many gene-linking base pairs is a significant predictor that adjacent genes will have similar TEs. Gene pairs were classified as having few versus many linking pairs if they were in the bottom and top quintiles of all gene pairs, respectively. P-value computed by two-tailed Mann-Whitney U-test. See also Figure S5.

We were immediately able to identify a potential broad role for RNA structure in mediating translational coupling. When adjacent genes have similar TEs, the RBS of the downstream gene tends to be base-paired to the coding sequence of the upstream gene (Fig. 5B). Such “gene-linking” structures will be unfolded by movement of the ribosome during translation of the upstream gene, conditionally unmasking the downstream RBS (Fig. 5A). By comparison, adjacent genes with different TEs tend to have self-contained structures, with few gene-linking pairs, and hence the structure-based accessibility of the RBS should be relatively unperturbed by upstream translation (Fig. 5C). Performing this analysis transcriptome-wide, we find that adjacent genes with many linking base pairs are significantly more likely to have similar TEs than those with few linking pairs (p = 9×10^−5; Fig. 5D). Thus, structural coupling between adjacent genes is a specific indicator of similar TE, consistent with RNA structure mediating translational coupling. In contrast, we found that short intergenic distance is not a significant predictor of genes having similar TE, even though intergenic distance is typically thought to be a hallmark of translational coupling (p = 0.1; Fig. S5). Indeed, we observe multiple cases where structure appears to mediate translational coupling of genes separated by more than 30 nucleotides (Fig. S5). To further validate that gene-linking structures mediate translational coupling, we identified the top quintile of genes with the most gene-linking base pairs. Strikingly, 24% (8/33) of these most-linked genes, identified from RNA structure data alone, are known to be translationally coupled. RNA structure has been specifically shown to mediate translational coupling of rplT (Lesage et al., 1992), while rpsK and rplD (Fig. 5B), and rpsD, rplF, rpmD, rplW, and thrB have been shown to be translationally coupled, but via unknown mechanisms (Mattheakis and Nomura, 1988; Thomas et al., 1987; Yates and Nomura, 1980). We again note that high quality structural data are essential because, if structural coupling is inferred without using SHAPE-directed structure models, the relationship between structural coupling and TE is lost (p = 0.09). Combined, our data show that RNA structure-based switches composed of gene-linking base pairs frequently and selectively couple translation of adjacent genes in E. coli.
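The notion of gene-linking base pairs can be made concrete with a small sketch. It takes one plausible reading of the description above, counting base pairs that join the downstream gene's RBS to the upstream gene's coding sequence; the precise definition used in the study (window boundaries, probability thresholds) is in its Methods and is not reproduced here. All coordinates below are hypothetical.

```python
def count_gene_linking_pairs(pairs, upstream_cds, downstream_rbs):
    """Count base pairs joining the upstream CDS to the downstream RBS.

    pairs: iterable of (i, j) nucleotide positions that are base-paired.
    upstream_cds, downstream_rbs: (start, end) intervals, inclusive.
    """
    def inside(position, interval):
        return interval[0] <= position <= interval[1]

    count = 0
    for i, j in pairs:
        links_forward = inside(i, upstream_cds) and inside(j, downstream_rbs)
        links_reverse = inside(j, upstream_cds) and inside(i, downstream_rbs)
        if links_forward or links_reverse:
            count += 1
    return count


# Hypothetical structure model for a two-gene transcript.
model_pairs = [(10, 40), (95, 130), (96, 129), (140, 160)]
upstream_cds = (0, 100)       # assumed coordinates of the upstream CDS
downstream_rbs = (120, 135)   # assumed coordinates of the downstream RBS

linking = count_gene_linking_pairs(model_pairs, upstream_cds, downstream_rbs)
```

Gene pairs could then be ranked by this count and split into quintiles, mirroring the few-versus-many classification used in the transcriptome-wide comparison.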

Discovery and validation of novel RNA regulatory motifs

Prior work has shown that experimentally supported RNA structure models can be used to identify novel RNA regulatory elements de novo based on the fact that regulatory elements often have particularly well-determined structures (Mauger et al., 2015; Siegfried et al., 2014). We therefore searched for motifs in untranslated regions (UTRs) and intergenic regions (IGRs) with uncommonly stable (low SHAPE reactivity) and well-defined (low entropy) secondary structures (Fig. 6A). Significantly, this unbiased low-SHAPE/low-entropy search returned 9 out of 13 (69%) of the known functional RNA motifs covered by our SHAPE data. The majority of these known motifs are ribosomal protein autoregulatory elements (RAREs) located upstream of ribosomal protein genes. RAREs function by binding excess ribosomal protein to inhibit translation initiation, creating a feedback-loop that controls the ratio of protein to ribosomal RNA (rRNA). Interestingly, our in-cell SHAPE data reveals that many of these RAREs are only partially formed or adopt alternative structures in the absence of bound protein, implying that RNA dynamics are important to their regulatory function (Fig. S6, Table S1). Critically, the high sensitivity of the low-SHAPE/low-entropy strategy for finding known elements strongly supports that structural data can be used to identify novel functional elements de novo.
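A low-SHAPE/low-entropy search of the kind described above can be sketched as a sliding-window scan. The sketch assumes per-nucleotide SHAPE reactivities and structural entropies are already available as lists; the window size and both cutoffs are placeholder values chosen for illustration, not the thresholds used in this study.

```python
from statistics import median


def low_shape_low_entropy_windows(shape, entropy, window=5,
                                  shape_cutoff=0.3, entropy_cutoff=0.1):
    """Return (start, end) half-open windows where both medians fall below cutoffs.

    shape, entropy: per-nucleotide values of equal length.
    window, shape_cutoff, entropy_cutoff: hypothetical illustrative settings.
    """
    if len(shape) != len(entropy):
        raise ValueError("shape and entropy must have the same length")
    hits = []
    for start in range(len(shape) - window + 1):
        end = start + window
        if (median(shape[start:end]) < shape_cutoff
                and median(entropy[start:end]) < entropy_cutoff):
            hits.append((start, end))
    return hits


# Invented toy profiles: a stable, well-defined element flanked by flexible regions.
toy_shape = [0.9, 0.8, 0.1, 0.05, 0.1, 0.2, 0.1, 0.9, 1.2, 0.7]
toy_entropy = [0.5, 0.4, 0.02, 0.01, 0.03, 0.02, 0.05, 0.6, 0.7, 0.5]

candidate_windows = low_shape_low_entropy_windows(toy_shape, toy_entropy)
```

Overlapping hits would normally be merged into a single candidate region before downstream analysis such as conservation scoring.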

Figure 6. Structure-based discovery of novel RNA regulatory motifs.

(A) Candidate motifs are identified in non-coding regions based on ability to form stable, well-defined structures, as defined by low SHAPE reactivity and low structural entropy. Low-SHAPE/low-entropy region is emphasized with gray shading. (B) Conservation of identified low-SHAPE/entropy structures in enterobacteria, and evidence of function from prior literature (N=58; see Table S1). (C–E) Identification and validation of the L13-binding motif, C5-binding motif, and L9/L28-binding motif. For each motif, the defining low-SHAPE/entropy region is highlighted in dark gray on the transcript model, with expansions to incorporate surrounding sequences in light gray (top). The two secondary structures shown illustrate (i) SHAPE probing data superimposed on the structure of the 5′ UTR construct used for validation and (ii) the consensus structure labeled by percent conservation in enterobacteria. Gels show electrophoretic mobility shift assays for the designated protein-RNA interactions. In (E), the structure of the 23S rRNA binding site for ribosomal proteins L9 and L28 is also shown. In (C), L13 concentrations varied from 22 to 800 nM for the H1–H5 construct, and 288 to 800 nM for the H2–H4 construct. In (D) C5 varied from 10 to 240 nM. In (E) L9 and L28 concentrations were 500 nM. –, no protein. See also Figure S7, Table S1.

Overall, we identified 58 low-SHAPE/low-entropy structures located in 51 (35%) of the 147 searched UTR and IGR regions. 49 of these motifs are uncharacterized and represent compelling novel regulatory motif candidates. We substantiated the potential functions of these motifs by three approaches. First, for non-CDS-overlapping motifs, phylogenetic analysis revealed that 82% are evolutionarily conserved, with many conserved in 100% of enterobacterial species (Fig. 6B, Table S1). Second, literature searches readily revealed that 23 of these uncharacterized structures (47%) are located in genomic regions with either strong or moderate evidence of biochemical function (Fig. 6B, Table S1). Finally, for three candidate RAREs newly identified as low-SHAPE/low-entropy motifs, we validated functional RNA-protein interactions by electrophoretic mobility shift assays. We discuss these new RAREs below, and provide detailed discussions of all 58 motifs in Table S1.

Our search identified a highly conserved multi-hairpin structure in the 5′ UTR of the rplM-rpsI transcript, which encodes ribosomal proteins L13 and S9, respectively (Fig. 6C). We hypothesized that this structure constituted a novel RARE and, indeed, a contemporaneous study found that L13 translationally represses the rplM-rpsI operon in vivo (Aseev et al., 2016). No RNA structure or mechanistic information has been reported for the putative L13-binding motif. The 5′ UTR and CDS form five well-defined hairpins in cell-free conditions; however, in cells, the H1 and H2 hairpins are moderately destabilized and the H4 hairpin, which sequesters the start codon, is completely destabilized (not shown). The L13 protein specifically bound to an RNA containing helices H1–H5 (Fig. 6C; KD = 390 ± 60 nM), but no binding was observed to a truncated construct containing H2–H4 (Fig. 6C). Thus, L13 binds the RNA containing helices H1–H5, and likely inhibits rplM translation by stabilizing H4 and occluding access to the RBS. L13 has also been shown to negatively regulate translation of the downstream rpsI gene (Aseev et al., 2016). Our structure models revealed that rpsI is structurally linked to rplM (Fig. 6C), thus indicating that co-regulation of these two genes is likely achieved via RNA structure mediated translational coupling.

Another well-defined structure occurs in the 5′ UTR of the rpmH-rnpA transcript, which encodes ribosomal protein L34 and protein C5, the protein component of RNase P (Fig. 6D). Helix H2 is highly conserved upstream of the rpmH-rnpA operon in enterobacteria, and the conserved juxtaposition of these two genes suggests that the regulatory circuits governing RNase P and ribosome biosynthesis are co-regulated. C5 binds tightly to the rpmH-rnpA 5′ UTR (KD = 94 ± 9 nM) but not to a mutant lacking the H2 hairpin (Fig. 6D). The increased electrophoretic mobility of the C5-bound UTR is consistent with protein binding inducing a global conformational change in the UTR structure (Ryder et al., 2008). Intriguingly, the H2 hairpin shares similarity to C5-binding hairpins identified by in vitro selection (Lee et al., 2002). Since the L34 coding sequence lies between the 5′ UTR and the coding sequence for C5, binding of C5 likely regulates expression of both L34 and C5, with function at either the transcriptional or translational stage. To our knowledge, this is the first example of a “moonlighting” regulatory function for C5.

Finally, we identified a well-defined motif in the 5′ UTR of the rpmB-rpmG operon, encoding ribosomal proteins L28 and L33. Remarkably, this highly conserved three-helix junction motif shows strong structural similarity to the 23S rRNA binding sites for both L28 and ribosomal protein L9, the latter of which is encoded on a separate operon (Fig. 6E). Prior studies failed to observe autoregulation of the rpmB-rpmG operon by L28 or L33, but the potential involvement of L9 was not explored (Aseev et al., 2016; Maguire and Wild, 1997). The rpmB-rpmG 5′ UTR folds into several conformations at high salt concentrations, as visualized by non-denaturing gel electrophoresis, one of which has exceptionally slow mobility suggestive of a defined tertiary structure (Fig. 6E, S7). Strikingly, L9 specifically binds this low mobility conformation (KD ≈ 300 nM) and L9 and L28 can jointly bind the slow conformation (Fig. 6E, S7). L28 and L33 also bind independently to the UTR without discriminating between the low and high mobility states. L9 and L33 binding are mutually exclusive, with L9 competing off L33 (Fig. S7). Interactions are specific to the native UTR, as deletion of helix H3 eliminates the low mobility conformation and consequently L9 and L28 binding (Fig. 6E). This motif, identified de novo by structure-informed discovery, thus reveals remarkable complexity and constitutes a novel RARE that likely integrates regulation of L9, L28, and L33 across multiple operons.

In sum, phylogenetic analysis, prior functional genetics studies, and our biochemical validation support clear functional roles for many of the novel RNA motifs identified by our study (Fig. 6). With limited exception, the motifs identified here have remained structurally uncharacterized, and 31% of the motifs derive from fully novel loci not even suggested by large-scale bioinformatic predictions (Table S1). Thus, our analysis indicates that, with high-quality probing data, it is possible to discover novel RNA regulatory motifs de novo based on RNA structure information alone.

DISCUSSION

High-throughput structure probing experiments have the potential to transform our understanding of the diverse cellular functions of RNA structure. Many studies to date have emphasized rapid and large-scale data acquisition, with less emphasis placed on the quality or completeness of data, or on the quality of the resulting structure models. Such strategies place fundamental limits on the ability to resolve individual RNA structures, which is essential for understanding biological mechanisms. In the present work, we took an alternative approach by performing extensive structure probing experiments and then curating these data to focus on transcripts for which we could obtain nearly complete, quantitative, and nucleotide-resolution profiles (Fig. 1). For the roughly 400 genes examined here, our structure probing data are comparable in quality to prior highly-focused studies of individual RNAs. The completeness and quality of these SHAPE data make it possible to derive realistic structure models for individual RNAs, for individual motifs within these RNAs, and for per-nucleotide structure changes within individual motifs. Ultimately, we were able to discover and validate multiple new mechanisms by which RNA structure governs gene expression in E. coli (Fig. 7).

Figure 7. Mechanisms identified in this study through which RNA structure regulates gene expression.

The function of identified novel non-coding motifs is supported by direct binding studies, evolutionary conservation, and literature cross-references. The function of RBS structure in regulating gene TE is supported by transcriptome-wide analysis and reporter gene assays. The role of gene-linking structures in mediating translational coupling is supported by transcriptome-wide analysis and literature cross-references.

The most fundamental result of our study is that individual mRNAs have highly idiosyncratic architectures; in essence, each mRNA has its own distinctive structural “personality”. Previous studies have presented evidence that mRNAs are frequently structured in cells, but were unable to resolve this functionally important variability, or distinguish the extent to which RNA structure differs between in-cell and cell-free environments (Del Campo et al., 2015; Ding et al., 2013; Lu et al., 2016; Ramani et al., 2015; Rouskin et al., 2013; Spitale et al., 2015; Wan et al., 2014; Zubradt et al., 2017). Comparisons between cell-free, in-cell, and kasugamycin-treated SHAPE datasets reveal that translation destabilizes RNA structure in highly translated genes and reduces long-range base pairing in these genes (Fig. 2). Importantly, however, RNA structure is largely conserved in cells, leaving intact the potential for sequence-encoded structures to mediate gene regulation.

Significantly, our high-quality structural models allow us to address long-standing controversies regarding how translation is regulated in native endogenous genes. Studies of simplified engineered genes have shown that TE is strongly related to RBS structure (Goodman et al., 2013; Kudla et al., 2009; Salis et al., 2009), but studies of native genes have failed to recapitulate this relationship (Boël et al., 2016; Guimaraes et al., 2014; Li et al., 2014; Tuller et al., 2010b). Thus, it has remained unclear whether endogenous genes are regulated by alternative yet-to-be-discovered mechanisms, or rather that the role of RBS structure has been obscured by inaccuracies when modeling structures of native genes. Our work strongly supports the latter conclusion: TE is regulated by RBS structure in similar ways for both engineered and endogenous genes, but that endogenous genes have highly diverse and much more complex structures. We explicitly validate this commonality by transplanting idiosyncratic endogenous RBS sequences in front of exogenous GFP reporters and recover a strong relationship between RBS structure and gene expression (Fig. 4). This conclusion differs from that of a recent study (Burkhardt et al., 2017), which interpreted strong correlations between TE and the DMS reactivity of endogenous genes as evidence that TE is regulated by coding sequence structure. Our analysis indicates that TE is only weakly correlated with ΔGunfold in coding regions (Fig. 3D). In addition, given that correlations between SHAPE reactivity and TE are best explained by ribosome-mediated unfolding of the CDS (Fig. 2), reduced CDS structure is most likely a consequence rather than a cause of high TE. Overall, the model that endogenous genes rely on RBS structure to tune TE explains the unique evolutionary constraint of RBS-adjacent sequences (Bentele et al., 2013; Tuller et al., 2010a), and unifies understanding of translation regulation for synthetic and endogenous genes. 
Again, these broad insights into the regulation of TE require robust models of the underlying mRNA structure.

Our data also allow us to distinguish whether translation initiation depends on kinetics versus equilibrium unfolding of RBS structure. This distinction is essential for understanding the multistep, highly regulated mechanism of translation initiation, and correspondingly, how translation is dynamically reprogrammed in response to cellular stimuli such as heat shock. The possibility of a kinetic mechanism was first proposed from a theoretical analysis showing, at equilibrium, the lifetime of the unfolded state for a well-structured RBS is much too short to bind a 30S subunit (de Smit and van Duin, 2003). This limitation can be overcome if the 30S subunit first binds non-specifically to an mRNA and transiently “stands by” until the RBS unfolds. The importance of standby sites in translation initiation is now well supported (Espah Borujeni et al., 2014; Studer and Joseph, 2006). However, whether translation initiation depends on RNA unfolding kinetics has been essentially untestable due to the difficulty of modeling long-range RNA structures; not modeling such long-range structures effectively hides differences between equilibrium versus kinetic unfolding mechanisms (see Methods). The kinetic unfolding model explains roughly 40% of the observed TE variation in endogenous genes as compared to only 13% explained by the equilibrium unfolding model. Necessary approximations made in our analysis leave open the possibility of contributions from an equilibrium mechanism (see Methods), but overall our data imply the kinetic mechanism predominates. When combined with accurate mRNA secondary structure models, incorporation of the kinetic mechanism into holistic biophysical models of translation is likely to yield further improvements in the ability to predict and rationally tune gene TE (Espah Borujeni et al., 2017).

Our work also reveals that large-scale RNA structure probing and modeling, when sufficiently accurate, make it possible to discover and understand complex post-transcriptional regulatory mechanisms. We found that searching for well-defined and highly structured RNA elements (low-SHAPE/low-entropy motifs) identifies 70% of previously known regulatory structures. The few known structures missed by our analysis consist of small and dynamic RNA motifs, which present challenges for any detection strategy. This initial finding supports the hypothesis that many functional motifs have been evolutionary selected to have uniquely well-defined structures relative to the genetic background, and that searching for such motifs will be useful for identifying novel regulatory elements. Strikingly, searching for low-SHAPE/low-entropy motifs across all non-coding regions in our dataset revealed well-structured motifs in 35% of UTRs and IGRs. The large majority of these motifs are well-conserved, and many overlap functional sites of protein binding, RNase processing, transcription termination, and small RNA binding, strongly implying involvement of RNA structure in diverse post-transcriptional regulatory processes (Table S1). We specifically validated protein binding activity for three regulatory elements upstream of ribosomal protein genes rplM, rpmB, and rpmH. The discovery of novel RNA motifs is particularly significant given that our analysis was limited to highly expressed housekeeping genes in E. coli, which represent some of the most intensively interrogated and finely parsed genetic loci in biology. 
While outside the scope of our current study, the 46 other novel structures identified by our motif search represent compelling targets for future functional studies (Table S1); for example, complex motifs were found in front of essential genes rpsT (ribosomal protein S20), csrA (carbon storage regulator A; CsrA), rho (Rho terminator factor), rpoB and rpoC (RNA polymerase subunits β and β′), and accA and accB (subunits of acetyl-CoA carboxylase).

More important than any individual conclusion, our data collectively imply that regulation by RNA structure is much more common than previously appreciated. Indeed, either by tuning TE via RBS structure, or using non-coding structure to achieve more complex differential regulation, every single gene examined here is regulated in a meaningful way by RNA structure (Fig. 7). Our dataset covers roughly 8% of the E. coli genome, suggesting that the majority of RNA regulatory structures and functions have yet to be discovered.

STAR METHODS

CONTACT FOR REAGENT AND RESOURCE SHARING

Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Kevin Weeks (weeks@unc.edu).

EXPERIMENTAL MODEL AND SUBJECT DETAILS

SHAPE probing experiments were performed on E. coli K12 MG1655 (gift of Bo Li, UNC, Chapel Hill), grown at 37 °C in LB Broth. Translation reporter assays were done using transformants of E. coli TOP10 (Invitrogen), grown at 37 °C in Terrific Broth. Proteins for in vitro binding assays were expressed in E. coli BL21-AI (Invitrogen), grown in Terrific Broth or ZYM-5052 auto-induction media at 37 °C with shift to 18 °C during induction.

METHOD DETAILS

In-cell SHAPE probing

In full biological replicates, 2 mL of overnight culture were added to 48 mL of LB. Cells were incubated with shaking until the culture reached OD600 ~0.5 (~30 min). To each culture, 5.55 mL of 10 mg/mL kasugamycin or LB was added, and cells were incubated with shaking for 20 minutes. Next, the media was buffered by addition of 3 mL 2 M HEPES pH 8.0 (100 mM final), and the cultures incubated by shaking for two minutes. SHAPE probing was performed in culture tubes by adding 9 mL of cells to 600 μL of 167 mM 1M7 (Steen et al., 2011) in DMSO and shaking for 2 minutes. The samples were transferred to a new tube containing 200 μL of 500 mM 1M7 in DMSO and incubated at 37 °C for 2 minutes. This was repeated once more for a total of three rounds of 1M7 modification. The same procedure was performed for untreated control samples, but adding only DMSO. Cells were pelleted at 4000 g for 20 minutes at 4 °C. Supernatant was discarded, and the cell pellet was resuspended in 200 μL of 0.5× TE buffer (pH 8.0) with lysozyme (1 mg/mL) and incubated on ice for 5 minutes. 1 mL of Trizol reagent (Invitrogen) was added, and the reaction tubes were incubated at room temperature for 5 minutes. To each sample was added 200 μL of cold chloroform. Samples were mixed by shaking for 15 seconds and incubated at room temperature for 2–3 minutes. Tubes were centrifuged at 12,000 g for 15 min at 4 °C. The aqueous upper layer was transferred to a new tube, and a 1.1× volume of isopropanol was added. Reactions were incubated at −20 °C for 30 minutes and then centrifuged at 15,000 g for 30 minutes at 4 °C. The supernatant was discarded and pellets were washed twice with 500 μL 80% ethanol, centrifuging 5 minutes at 15,000 g between washes. Following the washes, the supernatant was discarded, and pellets were air-dried for 5 minutes. Samples were then treated with DNase I (Ambion) and purified on an affinity column (RNeasy Mini Kit; Qiagen).
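The final reagent concentrations implied by the volumes above can be checked with a simple dilution calculation. The following is a minimal sketch, not part of the original protocol; the function name is hypothetical, and it assumes that volumes are additive and that the culture volume is the nominal 50 mL (2 mL inoculum plus 48 mL LB) with no losses.

```python
def final_concentration(stock_conc, stock_vol, start_vol):
    """Concentration after adding stock_vol of a stock to start_vol of liquid.

    Units of the result follow stock_conc; both volumes must share one unit.
    Assumes additive volumes (an approximation).
    """
    return stock_conc * stock_vol / (start_vol + stock_vol)


culture_ml = 2.0 + 48.0  # overnight inoculum + LB

# 5.55 mL of 10 mg/mL kasugamycin added to the culture (about 1 mg/mL final)
kasugamycin_mg_per_ml = final_concentration(10.0, 5.55, culture_ml)

# 3 mL of 2 M (2000 mM) HEPES added afterwards (about 100 mM final)
hepes_mm = final_concentration(2000.0, 3.0, culture_ml + 5.55)

# First 1M7 addition: 9 mL of cells into 600 uL of 167 mM 1M7
first_1m7_mm = final_concentration(167.0, 0.6, 9.0)
```

The HEPES value agrees with the stated 100 mM final concentration to within a few percent.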

SHAPE probing of cell-free RNA

In full biological replicates, 2 mL of overnight culture were added to 48 mL of LB. Cells were grown to mid-log phase (OD600 = 0.6), and RNA was gently extracted under non-denaturing conditions as described (Deigan et al., 2009). Total RNA was exchanged using a gravity-flow Sephadex column (PD-10; GE Healthcare) into folding buffer containing 50 mM HEPES, pH 8.0, 200 mM potassium acetate, and 5 mM MgCl2 and incubated at 37 °C for 20 minutes. RNA was modified using three consecutive additions of 1M7 as follows: Folded RNA (360 μL) was combined with 32 μL of 167 mM 1M7 solution in anhydrous DMSO, rapidly mixed, and incubated at 37 °C for 2 minutes. Subsequently, 8 μL of 500 mM 1M7 in DMSO was added, samples were quickly vortexed and incubated for 2 minutes at 37 °C, and this was repeated once. Following modification, RNA was isolated by affinity chromatography (RNeasy Mini kit; Qiagen), followed by DNase I treatment (Ambion), and a second affinity column (RNeasy).

Reverse transcription

The integrity of each total RNA sample was evaluated using an Agilent Bioanalyzer 2100; RIN numbers were greater than 8.0 for all samples. rRNA was subsequently removed from 15 μg of total RNA (bacterial Ribo-zero kit; Illumina), yielding 50–100 ng of mRNA. All recovered RNA was input into SHAPE-MaP reverse transcription reactions, using SuperScript II (Invitrogen), 6 mM Mn2+, and random nonamer primers (Siegfried et al., 2014; Smola et al., 2015b). Following reverse transcription, Mn2+ was removed using G-25 microspin columns (GE Healthcare). Next, second-strand synthesis was performed (NEB), and dsDNA was isolated using a spin column (PureLink micro spin column; Life Technologies).

Library preparation and sequencing

Libraries were prepared using NexteraXT (Illumina) from 1 ng of each second-strand synthesis product. Final libraries were size-selected (AmpureXP beads; Agencourt) with a 1:1 bead to sample ratio (targeting products greater than 200 bp long), and quantified using an Agilent Bioanalyzer 2100 and QuBit high-sensitivity dsDNA assay. For quality control, sequencing was initially performed on a MiSeq. Subsequently, samples were sequenced on an Illumina HiSeq 2500 using version 4 chemistry and 2 × 125 reads. 20–100 million mapped sequencing reads were obtained per experimental condition (Table S2), with 88% of base calls above Q30.

Translation reporter assays

Gene panel selection

We selected a subset of genes covered by our in-cell SHAPE data that had constant Shine-Dalgarno sequence strength (−7.5 ≤ΔGhyb ≤ −5.5 kcal/mol, ΔGhyb calculated as described in RBS–TE correlations) and reasonably well-defined structures around the RBS. Regions consisting of 34–191 nts upstream (mean 90 nts) and 45–231 nts downstream (mean 120 nts) relative to the start codon were then identified for each gene that could be excised with minimum perturbation of the observed endogenous RBS structure (Table S3). These sequences were synthesized with flanking BamHI and HindIII restriction sites and cloned into the pTrc-TE plasmid containing sfCherry and sfGFP under the control of independent Trc promoters (described below), with BamHI and HindIII sites allowing in-frame insertion in front of sfGFP. Gene synthesis and cloning was performed by Genscript.

Plasmid construction

The pTrc-TE plasmid was constructed as follows. A pTrcHis A (Invitrogen V36020) was linearized by PCR using pTrcHis_rev and pTrcHis_for primers (Table S4). Following PCR, plasmid template was digested with DpnI (NEB) and purified using a PCR cleanup kit. Sequences for sfCherry (Kamiyama et al., 2016), a double terminator stem (iGEM part BBa_B0015) with restriction sites, and sfGFP (Pédelacq et al., 2006) were designed with ~40 nucleotides of overlapping sequence and ordered as geneBlocks (IDT) (Table S4). Linearized backbone and gene blocks were assembled using isothermal assembly (NEB E5520S).

Measurement of GFP expression

Final cloned plasmids containing inserted endogenous leaders were transformed into TOP10 E. coli (Invitrogen C404010). Overnight cultures were mixed 1:1 with 50% glycerol, aliquoted in 40 μL volumes to 96-well deep-well plates, and stored at −80 °C. Translation efficiency experiments were initiated by adding 360 μL Terrific Broth supplemented with 50 μg/mL carbenicillin (TB+carb) to thawed plates and growing overnight. These overnight cultures were diluted 1:700 into TB+carb and outgrown for 2 hours, followed by induction of GFP/RFP-expression by addition of 0.2 mM IPTG for 2 hours. After the induction period, aliquots were removed to measure OD600 via plate reader and the remaining culture was pelleted by centrifugation at 2000g for 10 min at 4 °C, resuspended in ice-cold PBS, and immediately forwarded to fluorescence measurement. A Beckman Coulter CytoFLEX flow-cytometer was used to measure at least 10,000 cells, exciting at 488 nm and monitoring at 510 nm (525/40 filter) for GFP, and exciting at 561 nm and monitoring at 610 nm (610/20 filter) for RFP. Data were analyzed in FlowJo, using forward/side-scatter gates to mask debris and FSC-A/FSC-Width gates to isolate singlet cells. The median RFP and GFP fluorescence was then calculated from the population of RFP positive cells, with normalized GFP (nGFP) computed as the ratio of GFP to RFP. Results represent the average of three biological replicates performed on separate days.
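The nGFP calculation described above can be illustrated with a short sketch. It is not the FlowJo workflow used in this study. It assumes per-cell (GFP, RFP) values have already been gated to singlets, that RFP-positive cells are defined by a single threshold (a hypothetical parameter), and that nGFP is the ratio of the two medians rather than the median of per-cell ratios.

```python
from statistics import mean, median


def normalized_gfp(cells, rfp_threshold):
    """Return (nGFP, fraction of RFP-positive cells) for one replicate.

    cells: list of (gfp, rfp) fluorescence values for gated singlet cells.
    rfp_threshold: hypothetical cutoff defining RFP-positive cells.
    """
    positive = [(g, r) for g, r in cells if r > rfp_threshold]
    if not positive:
        raise ValueError("no RFP-positive cells in this replicate")
    median_gfp = median(g for g, _ in positive)
    median_rfp = median(r for _, r in positive)
    return median_gfp / median_rfp, len(positive) / len(cells)


def mean_ngfp(replicates, rfp_threshold):
    """Average nGFP over biological replicates (three in this study)."""
    return mean(normalized_gfp(cells, rfp_threshold)[0] for cells in replicates)
```

The fraction of RFP-positive cells is returned alongside nGFP because it is also used as a toxicity criterion when excluding transformants.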

Exclusion of atypical transformants

In total, we made GFP-fusion expression transformants for 53 different genes. However, 21 transformants exhibited severe slow-growth or low RFP fluorescence phenotypes, indicative of cellular toxicity caused by the endogenous 5′ UTR or CDS leader of the fusion gene. We excluded these transformants from further analysis due to the unpredictable effects that toxicity can have on translation. In particular, we excluded transformants where the median fraction of RFP positive cells across three biological replicates was < 0.6, or where the median post-induction OD600 was < 0.001. In addition, we excluded three strains that exhibited >5-fold variability in RFP or GFP fluorescence across replicates. This left the 29 transformants shown in Figure 4.
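The exclusion criteria above can be expressed as a simple filter. The sketch below is illustrative only. The function and argument names are hypothetical, and ">5-fold variability" is interpreted here as the largest replicate value exceeding five times the smallest, which is an assumption about how variability was measured.

```python
from statistics import median


def keep_transformant(rfp_positive_fractions, od600_values,
                      rfp_medians, gfp_medians):
    """Return True if a transformant passes the exclusion criteria.

    Each argument is a list with one value per biological replicate.
    """
    # Toxicity criteria: low RFP-positive fraction or poor growth
    if median(rfp_positive_fractions) < 0.6:
        return False
    if median(od600_values) < 0.001:
        return False
    # Reproducibility criterion: more than 5-fold spread across replicates
    for values in (rfp_medians, gfp_medians):
        if max(values) > 5 * min(values):
            return False
    return True
```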

Electrophoretic mobility shift assays

Protein expression and purification

His6-tagged genes for the five E. coli proteins rplI (L9), rplM (L13), rpmB (L28), rpmG (L33), and rnpA (C5) were synthesized and cloned into pET-29a(+) vectors using NdeI and XhoI restriction sites (GenScript). C5 contained an N-terminal MRGSH6GS sequence tag (43), while all other proteins contained C-terminal GSH6 tags. Vectors were transformed into BL21-Arabinose-inducible E. coli cells (Invitrogen). For L9, L13, and L28, overnight cultures were used to inoculate Terrific Broth and grown to OD600 = 0.6 at 37 °C, followed by induction for ~16 hours at 18 °C with L-arabinose at 0.02% (w/v) and IPTG at 0.1 mM final concentrations. For C5 and L33, ZYM-5052 autoinduction media was inoculated and grown to OD600 = 2.5, followed by addition of L-arabinose to 0.02% (w/v) and shift to 18 °C for ~16 hours. Cultures were collected by centrifugation, resuspended in A1 Ni-binding buffer (50 mM NaPO4 pH 7.4, 0.5 M NaCl, 40 mM imidazole), lysed by sonication, and clarified by centrifugation at 10,000g for 30 minutes at 4 °C. Supernatant was mixed and incubated with Nickel-NTA Sepharose-FF beads (GE Healthcare), collected by centrifugation, and washed twice with A1 binding buffer, twice with A2 wash buffer (1× DPBS (Gibco), 860 mM NaCl, 40 mM imidazole), and twice with A3 wash buffer (1× DPBS, 360 mM NaCl, 40 mM imidazole). Washed beads were resuspended in elution buffer (1× DPBS, 110 mM NaCl, 250 mM Imidazole), followed by centrifugation and removal of the supernatant containing the eluted protein. Millipore Amicon Ultra 0.5 mL 3000 Da filters were used to concentrate and buffer exchange proteins; L9, L13, and L28 were exchanged into 20 mM Tris pH 7.5, 150 mM NaCl, 1 mM EDTA; and C5 and L33 were exchanged into 20 mM Tris pH 7.5, 500 mM KCl. Concentrations of C5, L13, and L28 were determined by A280 with extinction coefficients estimated by Expasy, and the concentrations of L9, and L33 were determined via Bradford assay (ThermoFisher, calibrated using BSA standard). 
SDS-PAGE indicated purities of >95% for C5, L9, L13, and L28, and ~80% purity for L33. L13 was stored at 4 °C in the final exchange buffer noted above. C5, L9, L28, and L33 were stored at 4 °C in the above-noted buffer for several weeks before being diluted into glycerol (50% v/v final glycerol concentration) and stored at −20 °C.

RNA transcription

DNA oligos for in vitro RNA synthesis (IDT; single-stranded oligos or double-stranded gBlocks) were PCR amplified using Q5 hot-start DNA polymerase (NEB) (sequences listed in Table S5). 32P-body-labeled RNAs were synthesized using T7 RNA polymerase (Rio et al., 2011) and α32P-ATP, purified by 6% denaturing PAGE, eluted overnight using the crush and soak method, and precipitated with ethanol. RNA concentrations were determined using the Qubit RNA HS assay (Invitrogen).

Binding assays

For binding reactions, RNAs were denatured at 95 °C for 2 minutes, cooled on ice for 2 minutes, and then mixed with protein and binding buffer and incubated at 25 °C for 40 minutes. Final reaction concentrations were 5 nM 32P-RNA, protein (variable concentrations), 12 mM Tris-HCl (pH 7.5), 0.1 mg/μL yeast tRNA, 0.1 mg/mL BSA, 5 mM DTT, 1 unit/μL recombinant RNasin (Promega), and KCl and MgCl2 optimized for each system. Final salt concentrations were as follows (mM KCl, mM MgCl2): rplM RNAs (80, 1); rpmH RNAs (80, 1); and rpmB RNAs (250, 10). Protein dilutions from glycerol stocks were performed to maintain constant final glycerol concentrations of 4% (v/v) for all binding reactions. For L13, which was not stored in glycerol, the binding buffer was supplemented with 2.5% final (v/v) glycerol. Following equilibration, samples were mixed with glycerol loading dye to 10% final glycerol concentration and immediately loaded onto running native polyacrylamide gels (0.5× TBE; 0.4-mm × 28.5-cm × 30-cm). 8% (37.5:1 acrylamide:bisacrylamide) gels were used for rpmB and rplM RNAs and 6% (29:1 acrylamide:bisacrylamide) gels were used for rpmH RNAs. Gels were run for 4 hours in a cold room at 720 V, which maintained the gel temperature <15 °C, with at least 1 hour of prerun.

Gel imaging and quantification

Gels were imaged using a GE Healthcare Typhoon Trio phosphoimager, and bands quantified using ImageQuant. Kd values were obtained from fitting to the equation:

$$f = b + \frac{m - b}{1 + (K_d / P_t)^n}$$

where b and m are the lower and upper asymptotes of the fraction of RNA bound, respectively, Pt is the concentration of protein, and n is the Hill coefficient. Non-linear least squares fits were obtained using the curve_fit function of SciPy in Python. The Hill coefficient n ranged from 1.1 to 2.6. Reported Kd values represent the average and standard deviation of at least two independent datasets.
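A minimal sketch of such a fit is given below. It is not the analysis script used in this study. The starting guesses and parameter bounds are assumptions chosen for illustration, and protein concentrations must be strictly positive because the model divides by Pt.

```python
import numpy as np
from scipy.optimize import curve_fit


def hill(p_t, b, m, k_d, n):
    """Fraction of RNA bound at protein concentration p_t.

    b is the value approached at zero protein, m the value at saturation.
    """
    return b + (m - b) / (1.0 + (k_d / p_t) ** n)


def fit_hill(protein, fraction_bound):
    """Fit b, m, Kd and n by non-linear least squares."""
    p = np.asarray(protein, dtype=float)
    f = np.asarray(fraction_bound, dtype=float)
    # Illustrative starting guesses, clipped into the allowed range
    p0 = [np.clip(f.min(), 0.0, 1.0), np.clip(f.max(), 0.0, 1.0),
          float(np.median(p)), 1.0]
    # Illustrative bounds: fractions in [0, 1], Kd > 0, n between 0.1 and 10
    bounds = ([0.0, 0.0, 1e-6, 0.1], [1.0, 1.0, np.inf, 10.0])
    popt, _ = curve_fit(hill, p, f, p0=p0, bounds=bounds)
    return dict(zip(("b", "m", "k_d", "n"), popt))
```

Fitting each independent dataset separately and then averaging the fitted Kd values matches the reporting described above.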

QUANTIFICATION AND STATISTICAL ANALYSIS

Read trimming and sequence alignment

Forward and reverse reads were quality trimmed by computing 5-nt averages of the Phred score, trimming at the first 5-nt window with an average Phred score below 20. Reads shorter than 25 nts after trimming were excluded. Trimmed reads were aligned using Bowtie2 (Langmead and Salzberg, 2012) to the E. coli strain MG1655 genome (GenBank accession U00096.2). Bowtie2 alignment was performed in paired-end mode with the following arguments: --local -D 20 -R 3 -N 1 -L 15 -i S,1,0.50 --score-min G,20,8 --ma 2 --mp 6,2 --rdg 5,1 --rfg 5,1 --dpad 100 --maxins 700. Reads not mapping to E. coli or with Bowtie-reported mapping quality scores below 30 were excluded from analysis.
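The windowed quality trimming described above can be sketched as follows. This is an illustration, not the code used in this study. It assumes that windows slide one nucleotide at a time and that the read is cut at the first position of the first failing window; the original implementation may differ in both respects.

```python
def trim_read(seq, quals, window=5, min_mean_q=20, min_len=25):
    """Quality-trim one read.

    seq: nucleotide string; quals: list of Phred scores of equal length.
    Returns (trimmed_seq, trimmed_quals), or None if the trimmed read is
    shorter than min_len and should be discarded.
    """
    cut = len(seq)
    for start in range(len(seq) - window + 1):
        # Mean Phred score over the current 5-nt window
        if sum(quals[start:start + window]) / window < min_mean_q:
            cut = start
            break
    if cut < min_len:
        return None
    return seq[:cut], quals[:cut]
```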

SHAPE reactivity calculation

Data processing and quality control

Data were processed using the ShapeMapper software (Siegfried et al., 2014). Apparent mutation rates were calculated at each genomic position by summing the number of mismatches and deletions and dividing by the number of reads overlapping the position. Sequence insertions and ambiguously aligned deletions were excluded. Mutations spanning multiple adjacent nucleotides were treated as single mutations at the 3′-most position (Siegfried et al., 2014). Nucleotides with apparent mutation rates above 0.02 in any untreated sample were excluded from analysis. In some genomic regions, we observed clusters of elevated mutation rates that appeared to correspond to local self-complementarity artifacts, possibly arising from PCR. These artifacts were identified as regions of at least 10 nucleotides in which three or more of the 10 nucleotides showed mutation rates above 0.03 in the absence of 1M7 treatment, or modified mutation rates above 0.1 in any condition, and were also excluded from analysis. Except where noted elsewhere, SHAPE reactivities were only computed for nucleotides possessing sequencing depths above 1000 in both modified and untreated samples; nucleotides not passing this filter were treated as “no data” and excluded from analysis. Genes were required to have SHAPE data across 80% of the coding sequence for a given condition to be included in gene-specific analyses.
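The per-nucleotide filters above can be summarized in a short sketch. This is not ShapeMapper code. The function names are hypothetical, and the artifact rule is interpreted as flagging every nucleotide in any 10-nt window that contains at least three high-background positions, which is an assumption about how clusters were delimited.

```python
def mutation_rate(mismatches, deletions, depth):
    """Apparent mutation rate at one position, or None without coverage."""
    return (mismatches + deletions) / depth if depth > 0 else None


def passes_filters(untreated_rates, modified_depth, untreated_depth,
                   max_background=0.02, min_depth=1000):
    """Depth and background filters for a single nucleotide.

    untreated_rates: mutation rates at this position in each untreated sample.
    """
    if modified_depth <= min_depth or untreated_depth <= min_depth:
        return False
    return all(r is not None and r <= max_background for r in untreated_rates)


def artifact_positions(untreated, modified, window=10, min_hits=3,
                       untreated_cut=0.03, modified_cut=0.1):
    """Indices inside windows that look like self-complementarity artifacts."""
    high = [(u is not None and u > untreated_cut) or
            (m is not None and m > modified_cut)
            for u, m in zip(untreated, modified)]
    flagged = set()
    for start in range(len(high) - window + 1):
        if sum(high[start:start + window]) >= min_hits:
            flagged.update(range(start, start + window))
    return flagged
```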

SHAPE reactivity normalization

SHAPE reactivities were calculated as the difference in mutation rates between 1M7-modified and untreated samples. Reactivities were normalized within each probing condition to the mean of the 92–98th percentile reactivities of nucleotides from the ncRNAs RNase P, tmRNA, and 6S RNA, as these ncRNAs were sequenced to high depths and showed few changes across experimental probing conditions.
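The normalization above can be sketched in a few lines. This is an illustration rather than the original implementation. The percentile boundaries are computed by simple index slicing of the sorted reference values, which is one of several reasonable percentile conventions, and the function names are hypothetical.

```python
def raw_reactivity(modified_rate, untreated_rate):
    """Background-subtracted mutation rate."""
    return modified_rate - untreated_rate


def normalization_factor(reference_reactivities, lo_pct=92, hi_pct=98):
    """Mean of reference reactivities between the lo and hi percentiles.

    reference_reactivities: raw reactivities pooled from the reference
    ncRNAs (RNase P, tmRNA and 6S RNA) within one probing condition.
    """
    vals = sorted(reference_reactivities)
    n = len(vals)
    top = vals[n * lo_pct // 100: n * hi_pct // 100]
    if not top:
        raise ValueError("too few reference nucleotides")
    return sum(top) / len(top)


def normalize(raw_values, factor):
    """Scale raw reactivities, keeping None for positions without data."""
    return [None if r is None else r / factor for r in raw_values]
```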

Calculation of gene median SHAPE

In Figure 1B, medians were computed over all coding sequence nucleotides with defined SHAPE reactivities. In Figure 2A, medians were computed over the region 30 nucleotides 5′ of the start codon to 30 nucleotides 5′ of the stop codon; this captures potential SHAPE reactivity changes associated with translation initiation at the considered gene while excluding changes attributable to translation initiation at neighboring genes.
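The region used for the Figure 2A medians can be made explicit with a small sketch. It assumes 0-based transcript coordinates, a half-open interval, and None for nucleotides without data; these conventions and the function name are illustrative assumptions.

```python
from statistics import median


def gene_median_shape(reactivities, start_codon, stop_codon, offset=30):
    """Median SHAPE from 30 nt 5' of the start codon to 30 nt 5' of the stop.

    reactivities: per-nucleotide values along the transcript (None = no data).
    start_codon, stop_codon: 0-based index of the first nucleotide of each codon.
    """
    lo = max(0, start_codon - offset)
    hi = stop_codon - offset
    values = [r for r in reactivities[lo:hi] if r is not None]
    return median(values) if values else None
```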

Coding region aperiodicity

Previous transcriptome-wide structure-probing experiments in E. coli, yeast, and mammalian cells have been interpreted to indicate that mRNA coding regions exhibit periodic reactivity profiles (Del Campo et al., 2015; Ding et al., 2013; Spitale et al., 2015; Wan et al., 2014). To provide the best comparison to these prior studies, we collectively averaged over all internal coding region 99-nt windows with at least 60% SHAPE data coverage, aligning to preserve a common reading frame. This meta-gene analysis revealed that coding regions have aperiodic SHAPE reactivity profiles in both cell-free and in-cell conditions (Fig. S2). There are several potential explanations for this discrepancy. First, prior studies of E. coli relied on enzymatic reagents with known sequence biases (Del Campo et al., 2015). Given that coding regions have inherently periodic sequences, periodic structure-probing signal may be a consequence of sequence bias. The 1M7 SHAPE reagent by contrast has minimal sequence bias. Second, prior studies relied on detecting truncated RNA fragments via a ligation-based library preparation strategy. Such strategies introduce sequence biases that are avoided by the SHAPE-MaP strategy (Smola et al., 2015a; 2015b; Weeks, 2015). Third, in truncation based detection strategies, any cellular or experimental process that generates truncated or degraded RNA fragments will give artefactual signals. For example, cotranslational decay in yeast yields intermediates consistent with the periodicity observed in structure-probing experiments (Pelechano et al., 2015). Since SHAPE-MaP detects 1M7 modifications as mutations within continuous RNA sequences, such artefacts are avoided. Fourth, previous E. coli probing experiments were performed on in vitro refolded RNAs, compared to the natively extracted cell-free and in-cell conditions used here. Thus, differences in experimental conditions may contribute to these discordant observations.
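The meta-gene averaging described above can be sketched as follows. This is an illustration under stated assumptions, not the original analysis code. Windows are taken as non-overlapping 99-nt tiles starting at the first nucleotide of each coding sequence, so every window begins in the same reading frame because 99 is a multiple of 3. How "internal" windows were selected in the original analysis is not specified here.

```python
def metagene_profile(cds_reactivities, window=99, min_coverage=0.6):
    """Average SHAPE reactivity by position within frame-aligned windows.

    cds_reactivities: list of per-nucleotide reactivity lists, one per coding
    sequence, each starting at the first nucleotide of the start codon
    (None = no data).
    """
    sums = [0.0] * window
    counts = [0] * window
    for cds in cds_reactivities:
        for start in range(0, len(cds) - window + 1, window):
            win = cds[start:start + window]
            n_defined = sum(r is not None for r in win)
            if n_defined < min_coverage * window:
                continue  # skip windows with insufficient data coverage
            for i, r in enumerate(win):
                if r is not None:
                    sums[i] += r
                    counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

A three-nucleotide periodicity, if present, would appear in the returned profile as a repeating pattern with period 3.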

Transcript boundary assignment

General strategy

Our SHAPE data represent averages over all transcript isoforms and thus primarily report on the structure of the most highly expressed isoforms. We used manual analysis of the sequencing depths observed in the in-cell dataset to determine the transcript isoform most consistent with the expression observed at each genomic locus. Hallmark signs of consistent read-depth across a transcript with drop-offs near the transcript boundaries were cross-referenced to E. coli transcript annotations compiled from high-throughput end-mapping experiments (Conway et al., 2014). For transcripts without clear dominant transcription start sites, we assigned the transcription start site to the most distal, significantly expressed transcript. Transcripts with internal terminators were modeled as the read-through product if the read-depth of downstream genes was sufficient for accurate calculation of SHAPE reactivities.

Treatment of dominant internal promoters

Several operons exhibited expression profiles consistent with dominant expression of a “short” transcript from an internal start site and lesser, although significant, expression of a “long” transcript from a start site in front of upstream genes. In such cases, the genes downstream of the internal start site were assigned to the short transcript isoform, and the long isoform was truncated to include only the upstream genes. To prevent structure prediction errors associated with using an artificial 3′ boundary, the long isoform was modeled with a 3′ UTR that extended 600 nts past the stop codon of the last gene, or if closer, to the natural transcript termini.

Treatment of annotation-inconsistent transcripts

Approximately 20% of loci had expression profiles inconsistent with any annotated transcription unit (Conway et al., 2014). We searched regulonDB (Salgado et al., 2013) for alternative start sites and/or terminator annotations that better fit the observed expression and found annotations for the majority of such loci. Visual analysis was used to estimate transcript boundaries for the remaining 8% of loci with unannotated transcription start or termination sites.

Secondary structure modeling and analysis

Modeling methodology

While nucleotides possessing <1000 read-depth were otherwise excluded from SHAPE analyses, for the purposes of structure modeling we included SHAPE reactivities for all nucleotides possessing sequencing depths of >350 in both the modified and untreated samples. This choice was made to minimize regions with no data near transcript boundaries and is justified by prior studies showing that SHAPE reactivities computed from as few as 200 reads provide useful information for guiding secondary structure prediction (Siegfried et al., 2014). Minimum free energy secondary structures and base pairing probabilities were generated for each mRNA transcript using the SuperFold algorithm and SHAPE reactivities as restraints (Smola et al., 2015b). SuperFold uses a windowing approach to fold large RNAs. First, partition function calculations are performed for overlapping windows and are merged, yielding transcript-wide base-pairing probabilities and base-pairing (Shannon) entropies. The minimum free energy structure is then predicted in sliding windows, constrained by highly probable pairs observed in the merged partition function. Partition function and minimum free energy calculations were performed using RNAstructure (v 5.8) (Reuter and Mathews, 2010). SuperFold parameters were as follows: SHAPEslope = 1.8, SHAPEintercept = −0.6, trimInterior = 300, partitionWindowSize=1500, partitionStepSize=100, foldWindowSize = 3000, foldStepSize = 300, maxPairingDist = 500. “No-data” models were generated using the same SuperFold parameters, but setting SHAPE reactivities to −999 (equivalent to NaN).

Cross-condition comparisons of MFE structures

Analysis for a given condition was limited to transcripts possessing >80% SHAPE data coverage (defined using >1000 read-depth threshold); due to varying read-depths in different samples, the number of transcripts passing this threshold varied from 59 to 157 (194 transcripts have at least one coding sequence with 80% data coverage in one sample). Comparisons between minimum free energy (MFE) structures indicated that in-cell, cell-free, and kasugamycin transcript models share on average ~60% of base pairs (Fig. S3A). A larger fraction of in-cell pairs are shared with cell-free structures than vice versa. This asymmetry arises from the higher number of base pairs in cell-free models (Fig. S3B); in cells, translation likely disrupts weak base pairs. Supporting this interpretation, structures in kasugamycin-treated cells have more base pairs than in-cell structures but fewer than cell-free structures (Fig. S3B). Note that the apparent increased similarity in Figure S3A between kasugamycin and cell-free structures, and kasugamycin and in-cell structures is misleading, and arises from the smaller number of transcripts with SHAPE data in the kasugamycin condition. In-cell, kasugamycin, and cell-free structures shared comparable fractions of base pairs when analysis was limited to the same subset of transcripts.
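The shared-pair fractions described above can be computed directly from two lists of base pairs. The following is a minimal sketch, assuming each structure is available as a collection of (i, j) position tuples; the function name `shared_pair_fractions` and the example pairs are hypothetical and are not taken from the authors' code or data.

```python
def shared_pair_fractions(pairs_a, pairs_b):
    """Return (fraction of A's pairs found in B, fraction of B's pairs found in A)."""
    # Normalize each pair so that (i, j) and (j, i) compare equal.
    set_a = {tuple(sorted(p)) for p in pairs_a}
    set_b = {tuple(sorted(p)) for p in pairs_b}
    shared = set_a & set_b
    frac_a = len(shared) / len(set_a) if set_a else 0.0
    frac_b = len(shared) / len(set_b) if set_b else 0.0
    return frac_a, frac_b


# Hypothetical toy structures: the second model contains two extra pairs.
in_cell = [(1, 20), (2, 19), (3, 18)]
cell_free = [(1, 20), (2, 19), (3, 18), (5, 15), (6, 14)]
print(shared_pair_fractions(in_cell, cell_free))
```

In this toy case every pair of the smaller model is found in the larger one, but not the reverse, which illustrates how a model with more base pairs produces the asymmetry noted in the text.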

Structural variation in dynamic regions

RNAs with poorly defined, dynamic structures can form multiple structures with free energies similar to that of the MFE structure, which can cause structure modeling to be artificially sensitive to insignificant differences in SHAPE data. We therefore repeated our analysis considering only well-defined base pairs (pairing probability >0.9). As shown in Figure S3D, 25–30% of nucleotides participate in well-defined base pairs, representing ~50% of the base pairs in each MFE structure. Again consistent with translation destabilizing RNA structure, there are fewer high-probability pairs in in-cell than in cell-free models. Notably, high-probability pairs are much more likely than MFE pairs to be shared between conditions (>70% of in-cell P>0.9 pairs are also observed in cell-free models; Fig. S3C). As a complementary analysis, we also analyzed how MFE structure similarity varies as a function of base-pairing entropy (a measure of how well-defined a structure is). Similarity between models is strongly anticorrelated with base-pairing entropy (Fig. S3E, F). Together, these analyses indicate that differences between in-cell and cell-free structures are primarily localized to poorly defined regions. Some of these differences are caused by ribosome-induced unfolding in cells, which reduces the overall number of base pairs observed in cells. However, in-cell and kasugamycin models differ from each other to a similar degree as they differ from cell-free models, implying that the cellular environment does not induce large-scale changes in RNA structure.

Calculation of ΔGunfold and ΔG‡unfold

General strategy

We tested four different models of the RBS unfolding process that occurs during mRNA accommodation into the 30S subunit (Fig. S4A). Equilibrium versus non-equilibrium unfolding allows versus disallows the mRNA molecule to refold to a new minimum free energy structure after unfolding of the RBS. Local versus complete unfolding allows versus disallows base pairs spanning the unfolded RBS window. For all four models, ΔGunfold was computed as

ΔGunfold = ΔGcons − ΔGref

ΔGref is the free energy of the reference SHAPE-directed transcript structure. ΔGcons is the free energy of the “constrained” transcript structure with the RBS window constrained as single-stranded. For complete unfolding, the constrained structure was also prevented from having base pairs spanning the RBS window. All ΔG calculations were performed using the efn2 command of RNAstructure (excluding SHAPE pseudo-energies) (Reuter and Mathews, 2010).

Calculation of ΔG‡unfold

The non-equilibrium ΔGunfold is assumed to correspond to the unfolding transition state free energy, referred to as ΔG‡unfold throughout the text. For these calculations, the SuperFold minimum free energy transcript structure was used as the reference structure. The constrained structure was obtained by deleting base pairs involving the RBS window from the reference. In the case of complete unfolding, all base pairs spanning the RBS window were also deleted.
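The construction of the constrained structure for the non-equilibrium calculation can be sketched as a simple filter over the reference base-pair list. This is an illustrative sketch only: the function name `constrain_rbs`, the pair representation, and the example coordinates are assumptions, and the free energies of the reference and constrained structures would still need to be evaluated separately (the text uses efn2 for this step).

```python
def constrain_rbs(pairs, win_start, win_end, complete=False):
    """Delete base pairs that involve (and optionally span) the RBS window.

    pairs: iterable of (i, j) with i < j, 1-based positions.
    win_start, win_end: inclusive bounds of the RBS window.
    complete: if True, also delete pairs that bridge across the window.
    """
    kept = []
    for i, j in pairs:
        # A pair "involves" the window if either partner lies inside it.
        involves = win_start <= i <= win_end or win_start <= j <= win_end
        # A pair "spans" the window if its partners lie on opposite sides.
        spans = i < win_start and j > win_end
        if involves or (complete and spans):
            continue
        kept.append((i, j))
    return kept


# Hypothetical reference structure and RBS window [25, 50].
reference = [(5, 60), (10, 22), (30, 45), (70, 90)]
local = constrain_rbs(reference, 25, 50)
complete = constrain_rbs(reference, 25, 50, complete=True)
print(local, complete)
```

Local unfolding removes only the pair with a partner inside the window, whereas complete unfolding additionally removes the pair that bridges it.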

Calculation of ΔGunfold

Equilibrium ΔGunfold calculations required computing new sets of structure models. The reference structure for each gene was obtained by folding up to a 1500-nt subsequence centered on the start codon. For genes with start codons <750 nts from either the 5′ or 3′ transcript boundary, the subsequence extended from the proximal boundary up to 1500 nts, or to the distal boundary. For the local unfolding scenario, the constrained structure was obtained by refolding the same subsequence with the RBS constrained as single stranded. For the complete unfolding scenario, the subsequence was refolded in two segments (5′ and 3′ to the RBS window) to prevent RBS-spanning pairs; ΔGcons was then obtained by summing the ΔG computed for each segment. These folding calculations were performed using the Fold command of RNAstructure with parameters -mfe -md 500 -si -0.6 -sm 1.8 and the same SHAPE restraints as used with SuperFold.
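The choice of folding subsequence described above can be written as a small boundary-aware window function. This is a sketch under stated assumptions: the function name `fold_window` is hypothetical, coordinates are 1-based and inclusive, and the exact placement of the center within an even-length window is a guess rather than something specified in the text.

```python
def fold_window(start_codon, transcript_len, size=1500):
    """Return 1-based inclusive (lo, hi) bounds of the subsequence to fold."""
    if transcript_len <= size:
        # Short transcript: fold from one boundary to the other.
        return 1, transcript_len
    half = size // 2
    lo = start_codon - half
    hi = lo + size - 1
    if lo < 1:
        # Start codon near the 5' boundary: anchor the window at position 1.
        lo, hi = 1, size
    elif hi > transcript_len:
        # Start codon near the 3' boundary: anchor the window at the 3' end.
        lo, hi = transcript_len - size + 1, transcript_len
    return lo, hi


print(fold_window(2000, 5000))  # interior start codon
print(fold_window(100, 5000))   # near the 5' boundary
print(fold_window(4900, 5000))  # near the 3' boundary
```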

RBS–TE correlations

Gene inclusion criteria

Downstream genes in polycistronic transcripts with TE (Li et al., 2014) within 2-fold of the TE of the immediate upstream gene were classified as potentially translationally coupled and excluded (Fig. 3A). Analysis was restricted to genes possessing SHAPE data for >80% of nucleotides in the 200-nt window centered around the start codon. If the gene start codon was less than 100 nts from the transcript boundary this window extended from the boundary to 100 nts upstream of the start codon. Genes with non-canonical start codons (not AUG or GUG) or lacking Shine-Dalgarno sequences were excluded. Shine-Dalgarno sequences were assessed by computing the hybridization free energy ΔGhyb between the 16S rRNA anti-Shine-Dalgarno sequence CACCUCCU and the gene subsequence from −16 to −3 relative to the start codon. Genes with valid Shine-Dalgarno sequences were defined as having ΔGhyb≤0, with the terminal Shine-Dalgarno/anti-Shine-Dalgarno base pair located within the interval [−10, −4] relative to the gene start. ΔGhyb calculations were performed using RNAstructure (Bifold -i).

Investigation of different RBS unfolding models

Correlations were computed between TE and the local and complete equilibrium energy of unfolding (ΔGunfold), and local and complete non-equilibrium energy of unfolding (ΔG‡unfold), for windows of varied size around the RBS. As shown in Figure S4B, local ΔG‡unfold was strongly correlated with TE for RBS windows between 30 and 50 nt in size (r < −0.6). Supporting that this strong correlation reflects a true physical unfolding process, the correlation between ΔG‡unfold and TE markedly decreased once the RBS window was expanded beyond the anticipated physical unfolding window (beyond −25 or +25 of the gene start). By contrast, the correlations between TE and complete ΔG‡unfold, local ΔGunfold, and complete ΔGunfold are significantly weaker for all analyzed windows (r ≤ −0.4). For other analyses, RBS ΔGunfold and RBS ΔG‡unfold were taken to be the local ΔGunfold and ΔG‡unfold, respectively, computed for the window [−25, +25] around the start codon. All linear regressions were computed using the stats.linregress function of SciPy in Python.

Analysis limitations

In general, SHAPE-directed structure models are more accurate in modeling short-range than long-range base pairs. Thus, the reduced correlation between complete ΔG‡unfold and TE compared to local ΔG‡unfold may be a consequence of including lower-accuracy long-range base pairs in the complete unfolding calculation. Additionally, our equilibrium ΔGunfold calculations are compromised by the necessary assumption that our SHAPE data can be used to model the RBS-unfolded structure.

Comparison to prior studies of synthetic genes

Studies of overexpressed synthetic genes have observed comparable (r ≈ −0.6) correlations between TE and complete equilibrium RBS unfolding (complete ΔGunfold) as we observe between TE and ΔG‡unfold for native genes (Espah Borujeni et al., 2014; Goodman et al., 2013; Kudla et al., 2009; Salis et al., 2009). The similar observed correlations suggest a common mechanism despite the important conceptual difference between ΔGunfold and ΔG‡unfold. We note that several features made these prior experiments insensitive to differences between kinetic and equilibrium mechanisms. First, previously studied mRNAs were engineered to have well-defined, modular secondary structures. Consequently, RBS unfolding is unlikely to promote refolding of adjacent mRNA sequences, rendering equilibrium ΔGunfold and non-equilibrium ΔG‡unfold equivalent. Second, the studied mRNAs had only short-range pairing interactions, and thus were insensitive to differences between complete versus local unfolding. Finally, the kinetic mechanism is based on the assumption that individual mRNA species comprise a small fraction of the total cellular mRNA. This assumption may be violated for highly overexpressed mRNAs, which are more likely to be at or near equilibrium with the pool of free 30S subunits.

CDS–TE correlations

The local ΔG‡unfold (or ΔGunfold) was computed for 50-nt windows across the CDS, relative to the start or stop codon of each gene, using the same methodology described above for the RBS. For windows relative to the start codon (5′ CDS), the regression of ln(TE) on ΔG‡unfold (or ΔGunfold) was computed for the same genes as used for RBS regressions, with the additional restriction that genes must be >200 nts long (in-cell N=150; kasugamycin N=102; cell-free N=120). For windows relative to the stop codon (3′ CDS), regressions were computed for genes passing the same start codon, Shine-Dalgarno strength, translational coupling, and >200 nt length filters, while requiring SHAPE data for >80% of nucleotides in the 200-nt window centered around the stop codon (in-cell N=155; kasugamycin N=92; cell-free N=173). Linear regressions and significance were computed using the stats.linregress function of SciPy in Python.
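The correlation underlying a regression of ln(TE) on an unfolding free energy can be illustrated with a short standard-library sketch. This is not the authors' code: the text uses the stats.linregress function of SciPy, whereas the helper `pearson_r` below is a hand-written equivalent of the correlation coefficient only, and the input values are made up for illustration and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative values (not measurements from this work).
dg_unfold = [2.0, 5.0, 8.0, 11.0, 14.0]  # unfolding free energy, kcal/mol
te = [3.0, 1.5, 1.1, 0.4, 0.2]           # translation efficiency, arbitrary units

# TE is log-transformed before correlating, as in the regressions of ln(TE).
r = pearson_r(dg_unfold, [math.log(t) for t in te])
print(round(r, 2))
```

A negative r indicates that genes requiring more energy to unfold the window tend to have lower TE, which is the direction of the relationship reported in the text.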

Translational coupling analysis

The number of gene-linking pairs (LP) between a given gene and its upstream neighbor was computed from the base pairing partition function as:

$$LP = \sum_{i=1}^{t_u} \sum_{j>i}^{t_t} p(i,j) \cdot I_A(j \geq s_g - w)$$

where p(i, j) is the pairing probability between positions i and j, IA is the indicator function, sg is the position of the gene start, and tu and tt are the termini of the upstream gene and the transcript, respectively. The w parameter specifies the size of the included RBS window (for example, pairs linking the Shine-Dalgarno sequence to the upstream gene are included in LP). We used w=25, matching the RBS window size used elsewhere in the text. A similar trend of decreasing TE variation with LP was observed for different w values. Mechanistic considerations distinct from potential structural coupling make translational coupling unlikely between genes separated by very long intergenic regions, or between genes with significantly overlapping coding sequences. Therefore, to prevent such genes from skewing analyses, we limited our analysis to genes where −5 < sg-tu < 100, but comparable results were obtained when the analysis was applied to all genes. Analysis was restricted to genes that had SHAPE data for >80% of nts in the 200-nt window centered around the gene start.
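The linking-pair sum can be sketched as a filter over a sparse table of pairing probabilities. This sketch rests on assumptions that should be checked against the original equation: the indicator is taken to select pairs whose 3′ partner j lies at or downstream of sg − w, the function name `linking_pairs` and the dictionary representation are hypothetical, and the example probabilities are invented.

```python
def linking_pairs(pair_prob, t_u, t_t, s_g, w=25):
    """Sum pairing probabilities for pairs linking the upstream gene to the RBS/gene.

    pair_prob: dict mapping (i, j), i < j (1-based), to pairing probability.
    t_u: 3' terminus of the upstream gene; t_t: 3' terminus of the transcript.
    s_g: start position of the downstream gene; w: RBS window size.
    """
    total = 0.0
    for (i, j), p in pair_prob.items():
        in_upstream = 1 <= i <= t_u          # 5' partner within the outer sum range
        in_transcript = i < j <= t_t         # 3' partner within the inner sum range
        reaches_gene = j >= s_g - w          # assumed form of the indicator function
        if in_upstream and in_transcript and reaches_gene:
            total += p
    return total


# Invented example: upstream gene ends at 100, downstream gene starts at 150.
probs = {(90, 130): 0.5, (95, 180): 0.25, (40, 60): 0.9, (150, 170): 0.8}
lp = linking_pairs(probs, t_u=100, t_t=400, s_g=150, w=25)
print(lp)
```

Only the first two pairs contribute: the third lies entirely within the upstream region, and the fourth lies entirely within the downstream gene.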

Automated motif detection

Algorithm description

We built on a previously described strategy for identifying well-structured motifs in large RNA molecules (Fig. 6A) (Siegfried et al., 2014; Smola et al., 2015b). Local median SHAPE reactivity and entropy were computed over centered, sliding 51-nt windows using the cell-free dataset. At boundaries, local medians were computed using all nucleotides within +/− 26 nts of the considered position (for example, for a window centered on nucleotide 10, the median was computed using nucleotides [1, 36]). At least 26 nts were required to have SHAPE data in order to compute a valid local median. Well-structured regions were identified as regions where the local median SHAPE fell below 0.3 and median entropy fell below 0.04 for more than 25 contiguous nucleotides. These regions were then expanded by up to 50 nts on either side to incorporate nested structures with pairing probability (pp) >0.9. To confirm identified structures also existed in cells, >95% of cell-free pp>0.9 base pairs were required to have pp>0.5 in-cell. If this 95% cutoff was not satisfied, the region was trimmed to the maximal sub-region meeting this requirement. Finally, all nucleotides with pp<0.5 were trimmed from the 5′ and 3′ ends. Final trimmed consensus regions that were shorter than 25 nts or possessed <80% cell-free or in-cell SHAPE data coverage were rejected. Following automated identification, each motif was visually inspected and in some cases manually adjusted to include (or exclude) adjacent structures that were judged to be part of (or distinct from) the algorithmically identified structure.

Our use of fixed-value SHAPE and base-pairing entropy cutoffs differs modestly from our previously described algorithm, where regions were identified from comparisons to the global medians of SHAPE and entropy (Siegfried et al., 2014; Smola et al., 2015b). Fixed-value cutoffs are required for analyzing RNAs that are potentially poorly structured overall (such as the mRNAs analyzed here), or, conversely, those that are highly structured overall (such as structured ncRNAs). The 0.3 SHAPE cutoff corresponds to the maximum median reactivity expected of paired nucleotides, and the 0.04 base-pairing entropy cutoff corresponds to a pp ≈ 0.95.
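The first stage of the search (finding contiguous low-SHAPE, low-entropy regions) can be sketched as follows. This is a simplified illustration, not the released detection code: the function names are hypothetical, missing data are represented as `None`, a half-width of 25 is assumed to give a 51-nt window (the boundary example in the text corresponds to a half-width of 26), and the later expansion, in-cell confirmation, and trimming steps are omitted.

```python
from statistics import median


def local_median(values, center, half_width=25, min_valid=26):
    """Median of non-missing values within +/- half_width of center (0-based)."""
    lo = max(0, center - half_width)
    hi = min(len(values), center + half_width + 1)
    valid = [v for v in values[lo:hi] if v is not None]
    # Require a minimum number of data points for a valid local median.
    return median(valid) if len(valid) >= min_valid else None


def low_shape_low_entropy_regions(shape, entropy, shape_cut=0.3,
                                  entropy_cut=0.04, min_len=26):
    """Return 0-based inclusive (start, end) runs where both medians pass the cutoffs."""
    regions, run_start = [], None
    for k in range(len(shape)):
        ms = local_median(shape, k)
        me = local_median(entropy, k)
        ok = (ms is not None and me is not None
              and ms < shape_cut and me < entropy_cut)
        if ok and run_start is None:
            run_start = k
        elif not ok and run_start is not None:
            # "More than 25 contiguous nucleotides" means at least 26.
            if k - run_start >= min_len:
                regions.append((run_start, k - 1))
            run_start = None
    if run_start is not None and len(shape) - run_start >= min_len:
        regions.append((run_start, len(shape) - 1))
    return regions


# Synthetic profile: 100 low-reactivity positions followed by 100 reactive ones.
shape = [0.1] * 100 + [1.0] * 100
entropy = [0.01] * 200
print(low_shape_low_entropy_regions(shape, entropy))
```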

UTR/IGR data coverage criteria

We limited our search to UTRs and IGRs that were >25 nts long and contained at least one 25-nt stretch with 75% SHAPE coverage in both the in-cell and cell-free datasets. Annotated REP (Keseler et al., 2013) and ERIC (Wilson and Sharp, 2006) motifs were masked out.

Sensitivity of detecting known motifs

We compiled a list of all E. coli RFAM (Nawrocki et al., 2015) motifs and known RAREs (Aseev et al., 2015; Fu et al., 2013; 2014; Matelska et al., 2013). Structures identified from our de novo structure models were considered “true positives” if they recapitulated any portion of the known structure. Thirteen of these known motifs fall within UTR/IGRs passing our length and data coverage filters, and of these thirteen, we positively identified nine, corresponding to a sensitivity of 69%. The four known motifs we failed to identify were the rplK, rpsO, rpsF, and rplY RAREs. We note that our motif search also identified the so-called Pseudomonas sRNA P26 motif listed in RFAM (named intergenic rplL-rpoB motif in Table S1). Despite its entry in RFAM, we determined that this motif is better described as “functionally uncharacterized” due to a lack of validation (see Table S1), and therefore excluded this motif from our sensitivity calculations. If we include this motif in our sensitivity calculations, we detect 10 out of 14 (71%) of known RFAM and RARE motifs.

Comparisons with prior comparative genomics predictions

We compared the UTR/IGR motifs identified here against prior comparative genomics and bioinformatics predictions of functional RNAs (Livny et al., 2008; Ott et al., 2012; Pichon et al., 2012; Rivas et al., 2001; Tran et al., 2009; Uzilov et al., 2006). Several of these algorithms were optimized to predict small RNA genes rather than functional UTR/IGR motifs, but were nonetheless included for completeness. The study by Uzilov et al. includes predictions made using three algorithms: Dynalign (Uzilov et al., 2006), QRNA (Rivas and Eddy, 2001), and RNAz (Washietl et al., 2005); comparisons were performed to all three sets of predictions (requiring P>0.9 for Dynalign and RNAz). Motifs were considered “previously predicted” if they overlapped a predicted functional locus by at least 50 nts and were located on the same strand (if specified).

Motif conservation analysis

Algorithm for identifying homologs

We constructed an automated pipeline to search for motif homologs in other bacterial genomes (Fig. S8). Similar to other comparative genomics pipelines (Slinger et al., 2014; Yao et al., 2007), we use iterative Infernal (v1.1.1) (Nawrocki and Eddy, 2013) searches to train a covariation model (CM) constructed from a single input E. coli structure. The initial CM was built and calibrated from a Stockholm file containing the E. coli sequence and base pairs (cmbuild -F; cmcalibrate). cmsearch was performed against a non-redundant bacterial genome database using a lenient e-value cutoff of 1.0 (cmsearch --incE 1.0 --mid --cpu 8). The genetic context of each identified homolog was cross-referenced to E. coli, filtering out homologs found in different contexts or at unannotated loci. The filtered homologs were then aligned (cmalign --cpu 8 --noprob) and used to construct a new CM. This process was repeated a total of three times, yielding a “trained” CM. The trained CM was then used to perform a final search against the bacterial database using an e-value cutoff of 0.01.

Homolog genetic context filtering

Genetic context filtering was performed using RefSeq annotations (Tatusova et al., 2014). The “transcript” of each homolog was inferred by first identifying adjacent same-strand genes within 400 nts. The “transcript” was then extended from both directions to incorporate additional same-strand genes, allowing a maximum intergenic distance of 400 nts. These genes were then cross-referenced against the genes of the parent E. coli transcript, defining shared context as at least one common gene between the two transcripts. Cross-referencing was performed using both gene names and products: names were cross-referenced using gene and gene_synonym fields; products were cross-referenced using manually specified keywords.
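The inference of a homolog's “transcript” and the shared-context test can be sketched as an interval-chaining procedure. This is a hedged illustration rather than the released pipeline: the function names, the tuple layout of the gene records, and the example coordinates are all hypothetical, gene names are assumed to be unique within a genome, and matching by product keywords is not shown.

```python
def infer_transcript(genes, hit_start, hit_end, strand, max_gap=400):
    """Collect same-strand genes chained to a homolog hit by gaps <= max_gap.

    genes: list of (name, start, end, strand) tuples with start <= end.
    Returns gene names in the inferred "transcript", ordered by start.
    """
    same = sorted((g for g in genes if g[3] == strand), key=lambda g: g[1])
    lo, hi = hit_start, hit_end
    members = set()
    grew = True
    while grew:  # keep extending in both directions until nothing is added
        grew = False
        for name, start, end, _ in same:
            if name in members:
                continue
            # Distance to the current span; <= 0 when the gene overlaps it.
            gap = max(start - hi, lo - end)
            if gap <= max_gap:
                members.add(name)
                lo, hi = min(lo, start), max(hi, end)
                grew = True
    return [g[0] for g in same if g[0] in members]


def shares_context(transcript_genes, parent_genes):
    """Shared context is defined as at least one gene in common."""
    return bool(set(transcript_genes) & set(parent_genes))


# Hypothetical annotation around a homolog hit at 950..1050 on the + strand.
genes = [("a", 100, 400, "+"), ("b", 600, 900, "+"), ("c", 1000, 1500, "-"),
         ("d", 1700, 2000, "+"), ("e", 2300, 2600, "+")]
transcript = infer_transcript(genes, 950, 1050, "+")
print(transcript, shares_context(transcript, ["b", "z"]))
```

Gene "a" is not within 400 nts of the hit itself but is picked up once "b" extends the span, whereas "d" remains more than 400 nts away and "c" is on the opposite strand.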

Bacterial genetic database details

The genomic database was constructed by downloading the RefSeq (Tatusova et al., 2014) bacterial genome assembly summary from ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/assembly_summary.txt (on October 19, 2015). Genomes that were not “latest”, “Complete Genome”, “reference”, or “representative” were discarded. From the remaining genomes, a single genome was chosen for each species and downloaded from the NCBI genome ftp (also on October 19, 2015). Reference genomes were prioritized over representative genomes. For species with multiple reference genomes, or multiple representative but no reference, the last listed genome was used.
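The one-genome-per-species selection rule can be sketched as a single pass over the assembly summary rows. This sketch makes several assumptions: the filter is interpreted as requiring an assembly to be latest, a complete genome, and either reference or representative; the dictionary keys (`species`, `category`, `status`, `level`, `acc`) are hypothetical stand-ins rather than the actual column names of the summary file; and the example rows are invented.

```python
def choose_genomes(rows):
    """Pick one assembly per species: reference beats representative,
    and among equally ranked assemblies the last listed one is kept."""
    rank = {"reference": 2, "representative": 1}
    chosen = {}
    for row in rows:
        # Discard assemblies that fail the (assumed) inclusion criteria.
        if row["status"] != "latest" or row["level"] != "Complete Genome":
            continue
        if row["category"] not in rank:
            continue
        best = chosen.get(row["species"])
        # ">=" lets a later row replace an earlier one of equal rank.
        if best is None or rank[row["category"]] >= rank[best["category"]]:
            chosen[row["species"]] = row
    return chosen


def _row(species, category, acc, level="Complete Genome"):
    return {"species": species, "category": category, "status": "latest",
            "level": level, "acc": acc}


# Invented rows for two hypothetical species.
rows = [
    _row("species_x", "representative", "X1"),
    _row("species_x", "reference", "X2"),
    _row("species_x", "representative", "X3"),
    _row("species_y", "representative", "Y1"),
    _row("species_y", "representative", "Y2"),
    _row("species_y", "representative", "Y3", level="Scaffold"),
]
picked = choose_genomes(rows)
print({sp: r["acc"] for sp, r in picked.items()})
```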

Consensus motif analysis

The homologs returned from our algorithmic search were manually assessed for context specificity and phylogenetic diversity. For the large majority of motifs, the search procedure returned homologs with 100% context-specificity and reasonable structure conservation. Our homolog searches for identified ribosomal protein autoregulatory motifs provide strong positive controls, yielding consensus structures and phylogenetic diversities comparable to prior studies (Table S1) (Fu et al., 2013). However, in some cases, searches using the trained CM returned homologs with poor context or secondary structure conservation. This was attributable either to progressive loss of CM specificity during the refinement stage or, for small motifs, to low information content of the original motif. These cases are noted in Table S1, and were excluded from downstream conservation and consensus structure analysis. R2R (Weinberg and Breaker, 2011) was used to draw consensus structure diagrams and assess secondary structure conservation (--GSC-weighted-consensus 3 0.97 0.9 0.75 4 0.97 0.9 0.75 0.5 0.1).

Conservation calculation

Conservation within enterobacteria was computed as the number of enterobacterial homologs identified divided by 32, the total number of enterobacteria in our database. The endosymbionts Wigglesworthia glossinidia and Buchnera aphidicola were excluded from conservation calculations.

DATA AND SOFTWARE AVAILABILITY

Raw sequencing reads from SHAPE experiments have been deposited in the European Nucleotide Archive (ENA), accession PRJEB23974, and can be accessed at http://www.ebi.ac.uk/ena/data/view/PRJEB23974.

Processed SHAPE data, RNA structure models, Python code used to perform automated low-SHAPE/low-entropy motif detection, and Python code to perform automated homology searches are freely available at the Lead Contact’s webpage, http://www.chem.unc.edu/rna/.

Supplementary Material

1. Figure S1, related to Figure 1. In-cell SHAPE resolves protein binding sites in non-coding RNAs.

(A, B) Raw modification profiles and resultant SHAPE profiles of tmRNA and RNase P probed under in-cell and cell-free conditions. DMSO no-reagent control samples were collected for both cell-free and in-cell conditions, but for simplicity, only a single DMSO profile is shown. Profiles show unambiguously that 1M7 penetrates E. coli cells and modifies in-cell RNA at roughly the same rate as cell-free RNA. Smoothed SHAPE reactivity differences were calculated using the ΔSHAPE framework (Smola et al., 2015a). (C) SHAPE reactivity changes mapped on the E. coli tmRNA secondary structure. (D) SHAPE reactivity changes mapped on the crystal structure of the tRNA-like domain of A. aeolicus tmRNA (PDB 1P6V). In-cell SHAPE reactivity protections (green) correspond closely with the SmpB binding site. (E) SHAPE reactivity changes mapped on the E. coli RNase P RNA secondary structure. (F) SHAPE reactivity changes mapped on the crystal structure of T. maritima RNase P (PDB 3QIQ). In-cell SHAPE reactivity protections (green) correspond closely with C5 protein and tRNA binding sites.

10. Figure S2, related to Figure 1. Reproducibility and meta-gene analysis of SHAPE reactivity.

(A) Per-gene Pearson correlation between SHAPE profiles across biological replicates. Medians are denoted by black bisecting lines, boxes indicate the interquartile range (IQR), and whiskers indicate data within 1.5×IQR of the top and bottom quartiles. (B) Per-gene Pearson correlation between SHAPE profiles across experimental conditions. (C) Meta-gene analysis of cell-free SHAPE reactivity provides little information on the structure of individual mRNAs, but indicates that coding regions do not have periodic structures (top; see also Methods). Note that changes in average SHAPE reactivity are much smaller than the per-nucleotide standard deviation. Note also that the increased SHAPE reactivity observed at the meta-gene start and stop codons mirror AU-sequence biases (bottom). Averaging was performed transcriptome-wide, including all 100-nt windows with at least 60% cell-free SHAPE data coverage irrespective of whether the parent transcript had sufficient full-length SHAPE coverage for other analyses. Hence, this analysis reflects a larger pool of genes, and is comparable in makeup to other transcriptome-wide studies. The number of windows used for each average is denoted.

2. Figure S3, related to Figure 2. Comparison between SHAPE-directed and no-data structure models.

(A) Similarity between MFE structure models for each transcript. Comparisons were performed by computing the fraction of base pairs shared between the first and second structures and vice versa (first and second correspond to order listed on x-axis). These fractions correspond to positive predictive value (ppv) and sensitivity, respectively, which are conventionally used when comparing structure models to known references. (B) Fraction of nucleotides that are base paired in MFE structures for different conditions. (C) Similarity between the set of highly probable (P>0.9) base pairs for each condition. Comparisons were performed as described in panel A. (D) Fraction of nucleotides paired with P>0.9 under different conditions. In panels A-D, medians are denoted by red bisecting lines, boxes indicate the IQR, whiskers indicate data within 1.5×IQR of the top and bottom quartiles, and outliers are indicated by crosses. (E) Correlation between base-pairing entropy and the fraction of MFE pairs shared between in-cell and cell-free models. High entropy indicates structures are poorly defined. (F) Correlation between base-pairing entropy and the fraction of MFE pairs shared between in-cell and kasugamycin models.

3. Figure S4, related to Figure 3. Correlation between TE (Li et al., 2014) and ΔGunfold and ΔG‡unfold.

(A) Scheme illustrating different models of mRNA accommodation into the 30S subunit. For equilibrium calculations, the mRNA molecule is allowed to refold to a new minimum free energy structure after unfolding the RBS, but not in non-equilibrium (kinetic) calculations. Local versus complete unfolding allows versus disallows base pairs across the RBS window. Non-equilibrium unfolding energies are assumed to correspond to ΔG‡unfold, the free energy of the unfolding transition state (see Methods). (B, C) Correlation coefficients computed using different sized windows for local (filled bars) and complete (open bars) RBS unfolding models. Correlations were computed using in-cell structures, excluding potential translationally coupled genes (N=157). In panel B, red shading indicates the model used for all remaining analyses. (D-F) Correlation between TE and local ΔG‡unfold for the three probing conditions. To facilitate direct comparison, we only show genes that possess sufficient data coverage in all three SHAPE probing conditions (N=92). (G) Correlation between TE and local ΔG‡unfold computed from “no-data” structure models. (H) Correlation between TE and ΔGtotal predicted by the RBS calculator (v1.0), a representative thermodynamics-based TE calculator (Salis et al., 2009). Analyses in panels G and H were performed on genes possessing in-cell SHAPE data (N=157) and thus can be directly compared to Figure 3C.

4. Figure S5, related to Figure 5. RNA structure couples translation of adjacent genes.

(A) Relationship between the TE ratio of adjacent genes and the number of base pairs linking the genes. Bottom and top quintiles are shown in yellow and blue, respectively; these quintiles correspond to the “few” and “many” linking-pairs categories in Figure 5. The red dashed line highlights the consistent decrease in TE variability as genes are linked by more base pairs. (B) Relationship between the TE ratio of adjacent genes and the length of the intervening intergenic region. This analysis shows clearly that translational coupling is not a simple function of intergenic distance. Top and bottom quintiles are shown as in (A). Statistical significance between the top and bottom quintiles is indicated above (A) and (B) and was tested using two-tailed Mann-Whitney U-tests. (C, D) Examples of structure-mediated translational coupling over long intergenic regions. The rpmI-rplT IGR is 53-nt long, and the rpsK-rpsD IGR is 34-nt long. Structures are shown as pairing probability arcs (key shown in Fig. 3).

5. Figure S6, related to Figure 6. SHAPE data reveal that many known ribosomal protein autoregulatory element (RARE) structures are unstable in the absence of bound protein.

Motifs are labeled by downstream gene, with the ribosomal protein ligand listed in parentheses. Accepted functional structures and SHAPE data are shown for each motif (Aseev et al., 2015; Fu et al., 2013; 2014; Matelska et al., 2013). Regions where SHAPE data are inconsistent with an accepted structure are highlighted with light blue shading, and corresponding unstable structural elements are shown using grey arcs. Brown boxes indicate coding sequences. Note that the rplE motif is located entirely within the rplE coding region, and thus was not included in our automated motif search.

6. Figure S7, related to Figure 6. The rpmB 5′ UTR binds L9 and L28.

(A) Constructs used. Alterations made in variant constructs are highlighted in black. In the stabilizing H1insA-GC construct, an A is inserted to pair with the bulged U, and the neighboring AU pair is changed to a GC. The WTtrunc construct contains only the three-way junction (truncated nucleotides are drawn in gray). (B) The low-mobility conformation of the rpmB 5′ UTR is salt-dependent, and is stabilized by the H1insA-GC mutation. Quantification indicates that 64% of H1insA-GC RNA is in the low-mobility conformation at 200 mM KCl and 20 mM MgCl2, compared to 48% for WT RNA. 10 nM RNA was folded as described in Methods in 10 mM Tris-HCl (pH 7.5), 0.1 mg/mL yeast tRNA, and varying KCl and MgCl2. Concentrations are in mM. (C) Co-incubation experiment indicates that the WT and WTtrunc constructs do not interact, confirming that the slow conformation is not a dimer. 2.5 nM isolated or mixed RNAs were denatured and folded as described in Methods in L9-binding buffer. (D) Binding of L9 and L28 to different constructs. L28 appears to bind both high- and low-mobility states, as evidenced by the appearance of new bands in both regions of the gel (see also Figure 6E). L9 and L28 concentrations are 250 and 500 nM. The no protein and 500 nM L9+L28 lanes in the ΔH3 panel are identical to the ΔH3 panel in Figure 6E. (E) Concentration-dependent binding of L9 (concentrations vary from 178 to 600 nM). Estimate, KD≈300 nM. (F) L33 binds the rpmB 5′ UTR but is competed by L9 (L33 = 500 nM; L9 varies from 125 to 500 nM). (G) The WTtrunc construct does not bind L9 or L28 (500 nM concentrations). (H) Consensus 5′ UTR across all enterobacterial species indicates that sequences downstream of the three-way junction are highly conserved. Key for the consensus is located in main text Figure 6.

7. Figure S8, related to STAR Methods.

Outline of homolog search strategy. Each E. coli structure was used to build an Infernal (Nawrocki and Eddy, 2013) covariation model. The initial model was refined three times by incorporating additional homologs identified in similar genetic contexts. The trained covariation model was then used to perform a final search, with returned homologs used to construct consensus structures using R2R (Weinberg and Breaker, 2011).


Table S1. UTR and IGR motifs identified from SHAPE-directed motif search, related to Figure 6.

Table S2. Sequencing read counts for SHAPE experiments, related to STAR Methods.

Table S3. Endogenous sequences cloned into GFP-fusion expression constructs, related to STAR Methods.

Table S4. Oligos used for construction of pTrc-TE plasmid, related to STAR Methods.

Table S5. Oligos used for in vitro RNA transcription, related to STAR Methods.

Highlights.

  • E. coli mRNAs adopt highly diverse and complex structures

  • Translation is the main source of mRNA structural destabilization in cells

  • Translation efficiency is strongly correlated with ribosome binding site structure

  • Conserved structured elements found in 35% of untranslated regions

Acknowledgments

This work was supported by grants from the National Institutes of Health (R35 GM122532) and the National Science Foundation (MCB-1121024) to K.M.W., and by NIBR internal funding. A.M.M. is an Arnold O. Beckman Postdoctoral Fellow, and was a Lineberger Postdoctoral Fellow in the Basic Sciences (T32-CA009156). We are indebted to D. Mathews (U. Rochester) for helpful discussions, M. Smola (UNC) for sharing his ΔSHAPE analysis framework, and P. Irving (UNC), S. Gowrisankar (NIBR), A. Ho (NIBR), O. Iartchouk (NIBR), and B. Jones (NIBR) for their assistance.

Footnotes

AUTHOR CONTRIBUTIONS

K.M.W. conceived the study. A.M.M., S.B., G.M.R., J.L.B., and K.M.W. designed the experiments, with assistance from C.E.H., B.K.P., N.K. and R.N. A.M.M., S.B. and K.M.W. designed analyses. A.M.M. and G.M.R. performed the experiments with assistance from V.R. A.M.M. and S.B. performed analyses with assistance from C.E.H. and B.K.P. A.M.M. and K.M.W. wrote the paper, with input from all authors.

DECLARATION OF COMPETING INTERESTS

K.M.W. is an advisor to and holds equity in Ribometrix, to which SHAPE-MaP technologies have been licensed.

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  1. Aseev LV, Bylinkina NS, Boni IV. Regulation of the rplY gene encoding 5S rRNA binding protein L25 in Escherichia coli and related bacteria. RNA. 2015;21:851–861. doi: 10.1261/rna.047381.114.
  2. Aseev LV, Koledinskaya LS, Boni IV. Regulation of Ribosomal Protein Operons rplM-rpsI, rpmB-rpmG, and rplU-rpmA at the Transcriptional and Translational Levels. J Bacteriol. 2016;198:2494–2502. doi: 10.1128/JB.00187-16.
  3. Bentele K, Saffert P, Rauscher R, Ignatova Z, Blüthgen N. Efficient translation initiation dictates codon usage at gene start. Mol Syst Biol. 2013;9:675. doi: 10.1038/msb.2013.32.
  4. Boël G, Letso R, Neely H, Price WN, Wong KH, Su M, Luff JD, Valecha M, Everett JK, Acton TB, et al. Codon influence on protein expression in E. coli correlates with mRNA levels. Nature. 2016;529:358–363. doi: 10.1038/nature16509.
  5. Burkhardt DH, Rouskin S, Zhang Y, Li GW, Weissman JS, Gross CA. Operon mRNAs are organized into ORF-centric structures that predict translation efficiency. eLife. 2017;6:e22037. doi: 10.7554/eLife.22037.
  6. Cech TR, Steitz JA. The noncoding RNA revolution-trashing old rules to forge new ones. Cell. 2014;157:77–94. doi: 10.1016/j.cell.2014.03.008.
  7. Conway T, Creecy JP, Maddox SM, Grissom JE, Conkle TL, Shadid TM, Teramoto J, San Miguel P, Shimada T, Ishihama A, et al. Unprecedented high-resolution view of bacterial operon architecture revealed by RNA sequencing. MBio. 2014;5:e01442–14. doi: 10.1128/mBio.01442-14.
  8. de Smit MH, van Duin J. Secondary structure of the ribosome binding site determines translational efficiency: a quantitative analysis. Proc Natl Acad Sci USA. 1990;87:7668–7672. doi: 10.1073/pnas.87.19.7668.
  9. de Smit MH, van Duin J. Translational standby sites: how ribosomes may deal with the rapid folding kinetics of mRNA. J Mol Biol. 2003;331:737–743. doi: 10.1016/s0022-2836(03)00809-x.
  10. Deigan KE, Li TW, Mathews DH, Weeks KM. Accurate SHAPE-directed RNA structure determination. Proc Natl Acad Sci USA. 2009;106:97–102. doi: 10.1073/pnas.0806929106.
  11. Del Campo C, Bartholomäus A, Fedyunin I, Ignatova Z. Secondary structure across the bacterial transcriptome reveals versatile roles in mRNA regulation and function. PLoS Genet. 2015;11:e1005613. doi: 10.1371/journal.pgen.1005613.
  12. Ding Y, Tang Y, Kwok CK, Zhang Y, Bevilacqua PC, Assmann SM. In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features. Nature. 2013;505:696–700. doi: 10.1038/nature12756.
  13. Eddy SR. Computational Analysis of Conserved RNA Secondary Structure in Transcriptomes and Genomes. Annu Rev Biophys. 2014;43:433–456. doi: 10.1146/annurev-biophys-051013-022950.
  14. Espah Borujeni A, Channarasappa AS, Salis HM. Translation rate is controlled by coupled trade-offs between site accessibility, selective RNA unfolding and sliding at upstream standby sites. Nucleic Acids Res. 2014;42:2646–2659. doi: 10.1093/nar/gkt1139.
  15. Fu Y, Deiorio-Haggar K, Anthony J, Meyer MM. Most RNAs regulating ribosomal protein biosynthesis in Escherichia coli are narrowly distributed to Gammaproteobacteria. Nucleic Acids Res. 2013;41:3491–3503. doi: 10.1093/nar/gkt055.
  16. Fu Y, Deiorio-Haggar K, Soo MW, Meyer MM. Bacterial RNA motif in the 5′ UTR of rpsF interacts with an S6:S18 complex. RNA. 2014;20:168–176. doi: 10.1261/rna.041285.113.
  17. Goodman DB, Church GM, Kosuri S. Causes and effects of N-terminal codon bias in bacterial genes. Science. 2013;342:475–479. doi: 10.1126/science.1241934.
  18. Guimaraes JC, Rocha M, Arkin AP. Transcript level and sequence determinants of protein abundance and noise in Escherichia coli. Nucleic Acids Res. 2014;42:4791–4799. doi: 10.1093/nar/gku126.
  19. Kamiyama D, Sekine S, Barsi-Rhyne B, Hu J, Chen B, Gilbert LA, Ishikawa H, Leonetti MD, Marshall WF, Weissman JS, et al. Versatile protein tagging in cells with split fluorescent protein. Nat Commun. 2016;7:11046. doi: 10.1038/ncomms11046.
  20. Keseler IM, Mackie A, Peralta-Gil M, Santos-Zavaleta A, Gama-Castro S, Bonavides-Martinez C, Fulcher C, Huerta AM, Kothari A, Krummenacker M, et al. EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Res. 2013;41:D605–D612. doi: 10.1093/nar/gks1027.
  21. Kozak M. Regulation of translation via mRNA structure in prokaryotes and eukaryotes. Gene. 2005;361:13–37. doi: 10.1016/j.gene.2005.06.037.
  22. Kudla G, Murray AW, Tollervey D, Plotkin JB. Coding-sequence determinants of gene expression in Escherichia coli. Science. 2009;324:255–258. doi: 10.1126/science.1170160.
  23. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923.
  24. Lee JH, Kim H, Ko J, Lee Y. Interaction of C5 protein with RNA aptamers selected by SELEX. Nucleic Acids Res. 2002;30:5360–5368. doi: 10.1093/nar/gkf694.
  25. Li GW, Burkhardt D, Gross C, Weissman JS. Quantifying absolute protein synthesis rates reveals principles underlying allocation of cellular resources. Cell. 2014;157:624–635. doi: 10.1016/j.cell.2014.02.033.
  26. Livny J, Teonadi H, Livny M, Waldor MK. High-throughput, kingdom-wide prediction and annotation of bacterial non-coding RNAs. PLoS ONE. 2008;3:e3197. doi: 10.1371/journal.pone.0003197.
  27. Lu Z, Zhang QC, Lee B, Flynn RA, Smith MA, Robinson JT, Davidovich C, Gooding AR, Goodrich KJ, Mattick JS, et al. RNA Duplex Map in Living Cells Reveals Higher-Order Transcriptome Structure. Cell. 2016;165:1267–1279. doi: 10.1016/j.cell.2016.04.028.
  28. Maguire BA, Wild DG. Mutations in the rpmBG operon of Escherichia coli that affect ribosome assembly. J Bacteriol. 1997;179:2486–2493. doi: 10.1128/jb.179.8.2486-2493.1997.
  29. Matelska D, Purta E, Panek S, Boniecki MJ, Bujnicki JM, Dunin-Horkawicz S. S6:S18 ribosomal protein complex interacts with a structural motif present in its own mRNA. RNA. 2013;19:1341–1348. doi: 10.1261/rna.038794.113.
  30. Mattheakis LC, Nomura M. Feedback regulation of the spc operon in Escherichia coli: translational coupling and mRNA processing. J Bacteriol. 1988;170:4484–4492. doi: 10.1128/jb.170.10.4484-4492.1988.
  31. Mauger DM, Golden M, Yamane D, Williford S, Lemon SM, Martin DP, Weeks KM. Functionally conserved architecture of hepatitis C virus RNA genomes. Proc Natl Acad Sci USA. 2015;112:3692–3697. doi: 10.1073/pnas.1416266112.
  32. McGinnis JL, Liu Q, Lavender CA, Devaraj A, McClory SP, Fredrick K, Weeks KM. In-cell SHAPE reveals that free 30S ribosome subunits are in the inactive state. Proc Natl Acad Sci USA. 2015;112:2425–2430. doi: 10.1073/pnas.1411514112.
  33. Nawrocki EP, Eddy SR. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 2013;29:2933–2935. doi: 10.1093/bioinformatics/btt509.
  34. Nawrocki EP, Burge SW, Bateman A, Daub J, Eberhardt RY, Eddy SR, Floden EW, Gardner PP, Jones TA, Tate J, et al. Rfam 12.0: updates to the RNA families database. Nucleic Acids Res. 2015;43:D130–D137. doi: 10.1093/nar/gku1063.
  35. Ott A, Idali A, Marchais A, Gautheret D. NAPP: the Nucleic Acid Phylogenetic Profile Database. Nucleic Acids Res. 2012;40:D205–D209. doi: 10.1093/nar/gkr807.
  36. Pelechano V, Wei W, Steinmetz LM. Widespread Co-translational RNA Decay Reveals Ribosome Dynamics. Cell. 2015;161:1400–1412. doi: 10.1016/j.cell.2015.05.008.
  37. Pédelacq JD, Cabantous S, Tran T, Terwilliger TC, Waldo GS. Engineering and characterization of a superfolder green fluorescent protein. Nat Biotechnol. 2006;24:79–88. doi: 10.1038/nbt1172.
  38. Pichon C, du Merle L, Caliot ME, Trieu-Cuot P, Le Bouguénec C. An in silico model for identification of small RNAs in whole bacterial genomes: characterization of antisense RNAs in pathogenic Escherichia coli and Streptococcus agalactiae strains. Nucleic Acids Res. 2012;40:2846–2861. doi: 10.1093/nar/gkr1141.
  39. Ramani V, Qiu R, Shendure J. High-throughput determination of RNA structure by proximity ligation. Nat Biotechnol. 2015;33:980–984. doi: 10.1038/nbt.3289.
  40. Reuter JS, Mathews DH. RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinformatics. 2010;11:129. doi: 10.1186/1471-2105-11-129.
  41. Rio DC, Ares M, Jr, Hannon GJ, Nilsen TW. RNA: a laboratory manual. Cold Spring Harbor: Cold Spring Harbor Laboratory Press; 2011.
  42. Rivas E, Eddy SR. Noncoding RNA gene detection using comparative sequence analysis. BMC Bioinformatics. 2001;2:8. doi: 10.1186/1471-2105-2-8.
  43. Rivas E, Klein RJ, Jones TA, Eddy SR. Computational identification of noncoding RNAs in E. coli by comparative genomics. Curr Biol. 2001;11:1369–1373. doi: 10.1016/s0960-9822(01)00401-8.
  44. Rouskin S, Zubradt M, Washietl S, Kellis M, Weissman JS. Genome-wide probing of RNA structure reveals active unfolding of mRNA structures in vivo. Nature. 2013;505:701–705. doi: 10.1038/nature12894.
  45. Ryder SP, Recht MI, Williamson JR. RNA-Protein Interaction Protocols. Totowa, NJ: Humana Press; 2008. Quantitative Analysis of Protein-RNA Interactions by Gel Mobility Shift; pp. 99–115.
  46. Salgado H, Peralta-Gil M, Gama-Castro S, Santos-Zavaleta A, Muniz-Rascado L, García-Sotelo JS, Weiss V, Solano-Lira H, Martínez-Flores I, Medina-Rivera A, et al. RegulonDB v8.0: omics data sets, evolutionary conservation, regulatory phrases, cross-validated gold standards and more. Nucleic Acids Res. 2013;41:D203–D213. doi: 10.1093/nar/gks1201.
  47. Salis HM, Mirsky EA, Voigt CA. Automated design of synthetic ribosome binding sites to control protein expression. Nat Biotechnol. 2009;27:946–950. doi: 10.1038/nbt.1568.
  48. Siegfried NA, Busan S, Rice GM, Nelson JAE, Weeks KM. RNA motif discovery by SHAPE and mutational profiling (SHAPE-MaP). Nat Methods. 2014;11:959–965. doi: 10.1038/nmeth.3029.
  49. Slinger BL, Deiorio-Haggar K, Anthony JS, Gilligan MM, Meyer MM. Discovery and validation of novel and distinct RNA regulators for ribosomal protein S15 in diverse bacterial phyla. BMC Genomics. 2014;15:657. doi: 10.1186/1471-2164-15-657.
  50. Smola MJ, Calabrese JM, Weeks KM. Detection of RNA–Protein Interactions in Living Cells with SHAPE. Biochemistry. 2015a;54:6867–6875. doi: 10.1021/acs.biochem.5b00977.
  51. Smola MJ, Rice GM, Busan S, Siegfried NA, Weeks KM. Selective 2′-hydroxyl acylation analyzed by primer extension and mutational profiling (SHAPE-MaP) for direct, versatile and accurate RNA structure analysis. Nat Protoc. 2015b;10:1643–1669. doi: 10.1038/nprot.2015.103.
  52. Spitale RC, Flynn RA, Zhang QC, Crisalli P, Lee B, Jung JW, Kuchelmeister HY, Batista PJ, Torre EA, Kool ET, et al. Structural imprints in vivo decode RNA regulatory mechanisms. Nature. 2015;519:486–490. doi: 10.1038/nature14263.
  53. Steen K-A, Siegfried NA, Weeks KM. Synthesis of 1-methyl-7-nitroisatoic anhydride (1M7). Protocol Exchange. 2011. doi: 10.1038/protex.2011.255.
  54. Studer SM, Joseph S. Unfolding of mRNA secondary structure by the bacterial translation initiation complex. Mol Cell. 2006;22:105–115. doi: 10.1016/j.molcel.2006.02.014.
  55. Sugimoto Y, Vigilante A, Darbo E, Zirra A, Militti C, D’Ambrogio A, Luscombe NM, Ule J. hiCLIP reveals the in vivo atlas of mRNA secondary structures recognized by Staufen 1. Nature. 2015;519:491–494. doi: 10.1038/nature14280.
  56. Takyar S, Hickerson RP, Noller HF. mRNA Helicase Activity of the Ribosome. Cell. 2005;120:49–58. doi: 10.1016/j.cell.2004.11.042.
  57. Tatusova T, Ciufo S, Fedorov B, O’Neill K, Tolstoy I. RefSeq microbial genomes database: new representation and annotation strategy. Nucleic Acids Res. 2014;42:D553–D559. doi: 10.1093/nar/gkt1274.
  58. Thomas MS, Bedwell DM, Nomura M. Regulation of α operon gene expression in Escherichia coli. J Mol Biol. 1987;196:333–345. doi: 10.1016/0022-2836(87)90694-2.
  59. Tran TT, Zhou F, Marshburn S, Stead M, Kushner SR, Xu Y. De novo computational prediction of non-coding RNA genes in prokaryotic genomes. Bioinformatics. 2009;25:2897–2905. doi: 10.1093/bioinformatics/btp537.
  60. Tuller T, Carmi A, Vestsigian K, Navon S, Dorfan Y, Zaborske J, Pan T, Dahan O, Furman I, Pilpel Y. An evolutionarily conserved mechanism for controlling the efficiency of protein translation. Cell. 2010a;141:344–354. doi: 10.1016/j.cell.2010.03.031.
  61. Tuller T, Waldman YY, Kupiec M, Ruppin E. Translation efficiency is determined by both codon bias and folding energy. Proc Natl Acad Sci USA. 2010b;107:3645–3650. doi: 10.1073/pnas.0909910107.
  62. Tyrrell J, McGinnis JL, Weeks KM, Pielak GJ. The cellular environment stabilizes adenine riboswitch RNA structure. Biochemistry. 2013;52:8777–8785. doi: 10.1021/bi401207q.
  63. Uzilov AV, Keegan JM, Mathews DH. Detection of non-coding RNAs on the basis of predicted secondary structure formation free energy change. BMC Bioinformatics. 2006;7:173. doi: 10.1186/1471-2105-7-173.
  64. Wan Y, Qu K, Zhang QC, Flynn RA, Manor O, Ouyang Z, Zhang J, Spitale RC, Snyder MP, Segal E, et al. Landscape and variation of RNA secondary structure across the human transcriptome. Nature. 2014;505:706–709. doi: 10.1038/nature12946.
  65. Washietl S, Hofacker IL, Stadler PF. Fast and reliable prediction of noncoding RNAs. Proc Natl Acad Sci USA. 2005;102:2454–2459. doi: 10.1073/pnas.0409169102.
  66. Watters KE, Abbott TR, Lucks JB. Simultaneous characterization of cellular RNA structure and function with in-cell SHAPE-Seq. Nucleic Acids Res. 2016;44:e12. doi: 10.1093/nar/gkv879.
  67. Weeks KM. Toward all RNA structures, concisely. Biopolymers. 2015;103:438–448. doi: 10.1002/bip.22601.
  68. Weinberg Z, Breaker RR. R2R--software to speed the depiction of aesthetic consensus RNA secondary structures. BMC Bioinformatics. 2011;12:3. doi: 10.1186/1471-2105-12-3.
  69. Weinberg Z, Kim PB, Chen TH, Li S, Harris KA, Lünse CE, Breaker RR. New classes of self-cleaving ribozymes revealed by comparative genomics analysis. Nat Chem Biol. 2015;11:606–610. doi: 10.1038/nchembio.1846.
  70. Wilson LA, Sharp PM. Enterobacterial repetitive intergenic consensus (ERIC) sequences in Escherichia coli: Evolution and implications for ERIC-PCR. Mol Biol Evol. 2006;23:1156–1168. doi: 10.1093/molbev/msj125.
  71. Yao Z, Barrick J, Weinberg Z, Neph S, Breaker R, Tompa M, Ruzzo WL. A computational pipeline for high-throughput discovery of cis-regulatory noncoding RNA in prokaryotes. PLoS Comput Biol. 2007;3:e126. doi: 10.1371/journal.pcbi.0030126.
  72. Yates JL, Nomura M. E. coli ribosomal protein L4 is a feedback regulatory protein. Cell. 1980;21:517–522. doi: 10.1016/0092-8674(80)90489-4.
  73. Zubradt M, Gupta P, Persad S, Lambowitz AM, Weissman JS, Rouskin S. DMS-MaPseq for genome-wide or targeted RNA structure probing in vivo. Nat Methods. 2017;14:75–82. doi: 10.1038/nmeth.4057.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

1. Figure S1, related to Figure 1. In-cell SHAPE resolves protein binding sites in non-coding RNAs.

(A, B) Raw modification profiles and resultant SHAPE profiles of tmRNA and RNase P probed under in-cell and cell-free conditions. DMSO no-reagent control samples were collected for both cell-free and in-cell conditions, but for simplicity, only a single DMSO profile is shown. Profiles show unambiguously that 1M7 penetrates E. coli cells and modifies in-cell RNA at roughly the same rate as cell-free RNA. Smoothed SHAPE reactivity differences were calculated using the ΔSHAPE framework (Smola et al., 2015a). (C) SHAPE reactivity changes mapped on the E. coli tmRNA secondary structure. (D) SHAPE reactivity changes mapped on the crystal structure of the tRNA-like domain of A. aeolicus tmRNA (PDB 1P6V). In-cell SHAPE reactivity protections (green) correspond closely with the SmpB binding site. (E) SHAPE reactivity changes mapped on the E. coli RNase P RNA secondary structure. (F) SHAPE reactivity changes mapped on the crystal structure of T. maritima RNase P (PDB 3QIQ). In-cell SHAPE reactivity protections (green) correspond closely with C5 protein and tRNA binding sites.

10. Figure S2, related to Figure 1. Reproducibility and meta-gene analysis of SHAPE reactivity.

(A) Per-gene Pearson correlation between SHAPE profiles across biological replicates. Medians are denoted by black bisecting lines, boxes indicate the interquartile range (IQR), and whiskers indicate data within 1.5×IQR of the top and bottom quartiles. (B) Per-gene Pearson correlation between SHAPE profiles across experimental conditions. (C) Meta-gene analysis of cell-free SHAPE reactivity provides little information on the structure of individual mRNAs, but indicates that coding regions do not have periodic structures (top; see also Methods). Note that changes in average SHAPE reactivity are much smaller than the per-nucleotide standard deviation. Note also that the increased SHAPE reactivity observed at the meta-gene start and stop codons mirrors AU-sequence biases (bottom). Averaging was performed transcriptome-wide, including all 100-nt windows with at least 60% cell-free SHAPE data coverage irrespective of whether the parent transcript had sufficient full-length SHAPE coverage for other analyses. Hence, this analysis reflects a larger pool of genes, and is comparable in makeup to other transcriptome-wide studies. The number of windows used for each average is denoted.
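
The summary statistics named in panels A and B can be illustrated with a minimal Python sketch. It computes a Pearson correlation between two SHAPE profiles and the 1.5×IQR limits used for box-plot whiskers. The function names, the use of None to mark nucleotides without data, and the quartile convention (the default of statistics.quantiles) are assumptions made for illustration; they are not taken from the analysis code used in this work.

```python
import math
import statistics


def pearson(profile_a, profile_b):
    """Pearson r over positions where both profiles have data.

    Assumption: a missing reactivity is encoded as None.
    """
    pairs = [(a, b) for a, b in zip(profile_a, profile_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two shared positions")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_a * var_b)


def whisker_limits(values):
    """Return (lower, upper) limits lying 1.5 x IQR beyond the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Hypothetical replicate profiles for one gene (values are made up).
replicate_1 = [0.10, 0.80, None, 0.35, 1.20]
replicate_2 = [0.15, 0.70, 0.40, 0.30, 1.10]
r = pearson(replicate_1, replicate_2)
```

Applying pearson to each gene and passing the resulting list of coefficients to whisker_limits gives the per-gene distribution summarized by each box.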

2. Figure S3, related to Figure 2. Comparison between SHAPE-directed and no-data structure models.

(A) Similarity between MFE structure models for each transcript. Comparisons were performed by computing the fraction of base pairs shared between the first and second structures and vice versa (first and second correspond to order listed on x-axis). These fractions correspond to positive predictive value (ppv) and sensitivity, respectively, which are conventionally used when comparing structure models to known references. (B) Fraction of nucleotides that are base paired in MFE structures for different conditions. (C) Similarity between the set of highly probable (P>0.9) base pairs for each condition. Comparisons were performed as described in panel A. (D) Fraction of nucleotides paired with P>0.9 under different conditions. In panels A-D, medians are denoted by red bisecting lines, boxes indicate the IQR, whiskers indicate data within 1.5×IQR of the top and bottom quartiles, and outliers are indicated by crosses. (E) Correlation between base-pairing entropy and the fraction of MFE pairs shared between in-cell and cell-free models. High entropy indicates structures are poorly defined. (F) Correlation between base-pairing entropy and the fraction of MFE pairs shared between in-cell and kasugamycin models.
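
The pairwise comparison described in panels A and C can be sketched as follows. This is a minimal Python illustration that assumes each structure is given as a list of (i, j) base-pair tuples and that only exactly matching pairs count as shared. The function name and input format are hypothetical, and the exact-match criterion is an assumption rather than a statement of the criterion used in this work.

```python
def shared_fractions(first, second):
    """Compare two structure models given as lists of (i, j) base pairs.

    Returns (ppv, sensitivity):
      ppv         = fraction of pairs in `first` that also occur in `second`
      sensitivity = fraction of pairs in `second` that also occur in `first`
    """
    # Normalize orientation so that (i, j) and (j, i) are the same pair.
    pairs_first = {tuple(sorted(p)) for p in first}
    pairs_second = {tuple(sorted(p)) for p in second}
    shared = pairs_first & pairs_second
    ppv = len(shared) / len(pairs_first) if pairs_first else float("nan")
    sens = len(shared) / len(pairs_second) if pairs_second else float("nan")
    return ppv, sens


# Hypothetical example: a four-pair model compared with a three-pair model.
model_a = [(1, 20), (2, 19), (3, 18), (5, 15)]
model_b = [(1, 20), (2, 19), (3, 18)]
ppv, sens = shared_fractions(model_a, model_b)
# Three of the four pairs in model_a are shared (ppv = 0.75);
# all three pairs in model_b are shared (sensitivity = 1.0).
```

Swapping the argument order swaps the two fractions, which is why both values are reported for each pair of conditions.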

3. Figure S4, related to Figure 3. Correlation between TE (Li et al., 2014) and ΔGunfold and ΔG‡unfold.

(A) Scheme illustrating different models of mRNA accommodation into the 30S subunit. For equilibrium calculations, the mRNA molecule is allowed to refold to a new minimum free energy structure after unfolding the RBS, but not in non-equilibrium (kinetic) calculations. Local versus complete unfolding allows versus disallows base pairs across the RBS window. Non-equilibrium unfolding energies are assumed to correspond to ΔG‡unfold, the free energy of the unfolding transition state (see Methods). (B, C) Correlation coefficients computed using different sized windows for local (filled bars) and complete (open bars) RBS unfolding models. Correlations were computed using in-cell structures, excluding potential translationally coupled genes (N=157). In panel B, red shading indicates the model used for all remaining analyses. (D-F) Correlation between TE and local ΔGunfold for the three probing conditions. To facilitate direct comparison, we only show genes that possess sufficient data coverage in all three SHAPE probing conditions (N=92). (G) Correlation between TE and local ΔGunfold computed from “no-data” structure models. (H) Correlation between TE and ΔGtotal predicted by the RBS calculator (v1.0), a representative thermodynamics-based TE calculator (Salis et al., 2009). Analyses in panels G and H were performed on genes possessing in-cell SHAPE data (N=157) and thus can be directly compared to Figure 3C.

4. Figure S5, related to Figure 5. RNA structure couples translation of adjacent genes.

(A) Relationship between the TE ratio of adjacent genes as a function of the number of base pairs linking the genes. Bottom and top quintiles are shown in yellow and blue, respectively; these quintiles correspond to the “few” and “many” linking-pairs categories in Figure 5. The red dashed line highlights the consistent decrease in TE variability as genes are linked by more base pairs. (B) Relationship between TE of adjacent genes as a function of the length of the intervening intergenic region. This analysis shows clearly that translational coupling is not a simple function of intergenic distance. Top and bottom quintiles are shown as in (A). Statistical significance between the top and bottom quintiles is indicated above (A) and (B) and was tested using two-tailed Mann-Whitney U-tests. (C, D) Examples of structure-mediated translational coupling over long intergenic regions. The rpmI-rplT IGR is 53-nt long, and the rpsK-rpsD IGR is 34-nt long. Structures are shown as pairing probability arcs (key shown in Fig. 3).


RESOURCES