Significance
Despite widespread recognition that RNA is inherently structured, the interplay between local and global mRNA secondary structure (particularly in the coding region) and overall protein expression has not been thoroughly explored. Our work uses 2 approaches to disentangle the regulatory roles of mRNA primary sequence and secondary structure: global substitution with modified nucleotides and computational sequence design. By fitting detailed kinetic expression data to mathematical models, we show that secondary structure can increase mRNA half-life independent of codon usage. These findings have significant implications for both translational regulation of endogenous mRNAs and the emerging field of mRNA therapeutics.
Keywords: mRNA therapeutics, modified nucleotides, translation, RNA structure, SHAPE
Abstract
Messenger RNAs (mRNAs) encode information in both their primary sequence and their higher order structure. The independent contributions of factors like codon usage and secondary structure to regulating protein expression are difficult to establish as they are often highly correlated in endogenous sequences. Here, we used 2 approaches, global inclusion of modified nucleotides and rational sequence design of exogenously delivered constructs, to understand the role of mRNA secondary structure independent from codon usage. Unexpectedly, highly expressed mRNAs contained a highly structured coding sequence (CDS). Modified nucleotides that stabilize mRNA secondary structure enabled high expression across a wide variety of primary sequences. Using a set of eGFP mRNAs with independently altered codon usage and CDS structure, we find that the structure of the CDS regulates protein expression through changes in functional mRNA half-life (i.e., mRNA being actively translated). This work highlights an underappreciated role of mRNA secondary structure in the regulation of mRNA stability.
Messenger RNAs (mRNAs) direct cytoplasmic protein expression. How much protein is produced per mRNA molecule is a function of how well the translational machinery initiates and elongates on the coding sequence (CDS) and the mRNA’s functional half-life. Both translational efficiency and functional half-life are driven by features encoded in the primary mRNA sequence. Synonymous codon choice directly impacts translation, with highly expressed genes tending to include more “optimal” codons (1, 2). Conversely, “nonoptimal” codons can increase ribosomal pausing and decrease mRNA half-life (3, 4). Other mRNA sequence features that reportedly correlate with protein output are dinucleotide frequency in the CDS (5) and the effect of codon order on locally accessible charged tRNA pools (6). Because these effects are interdependent on mRNA sequence, teasing apart their individual contributions to protein output is difficult and often controversial (7, 8).
In addition to dictating encoded protein identity, the primary sequence of an mRNA also determines its propensity to form secondary and tertiary structure (9). Transcriptome-wide RNA structure characterization is beginning to reveal global relationships between the structure content in different mRNA regions and protein expression (10, 11). Multiple studies have shown that secondary structure in the 5′ untranslated region (5′ UTR) generally reduces translation initiation efficiency and therefore overall protein output (10–13). The extent to which CDS and 3′ untranslated region (3′ UTR) secondary structure impacts protein output, however, is much less understood.
One way to alter RNA secondary structure is to change the primary sequence. In the CDS, however, primary sequence changes necessarily alter codon usage, confounding any effects that might be attributable to changes in mRNA structure alone. An alternate means to affect secondary structure without changing codons is to incorporate modified nucleotides (nt) that maintain the same Watson–Crick base-pairing relationships (e.g., pseudouridine [Ψ] for U) but have small effects on local secondary structure. Such modified nucleotides can either stabilize (14) or destabilize (15) base pairs and hence overall mRNA structure.
Here, we combined computational sequence design with global modified nucleotide substitution as tools to investigate the separate impacts of mRNA primary sequence and structural stability on protein output. We find that differences in the innate thermodynamic base pair stability of 2 modified uridine nucleotides, N1-methyl-pseudouridine and 5-methoxy-uridine, induce global changes in mRNA secondary structure. These structural changes in turn drive changes in protein expression. As expected, our data confirm that reduced secondary structure within a 5′ leader region (the 5′ UTR and first ∼10 codons of the CDS) correlates with high protein expression. Surprisingly, we also find that high protein expression correlates with increased secondary structure in the remainder of the mRNA (the rest of the CDS and the 3′ UTR). We validated this finding by designing an enhanced green fluorescent protein (eGFP) mRNA panel wherein the effects of codon usage and secondary structure could be examined separately. Our data reveal a relationship wherein codon optimality and greater CDS secondary structure synergize to increase mRNA functional half-life.
Results
RNA Sequence and Nucleotide Modifications Combine to Determine Protein Expression.
For this study, we created diverse synonymous CDS sets encoding eGFP (4 variants), human erythropoietin (hEpo, 9 variants) and firefly Luciferase (Luc, 39 variants) transcribed in vitro with ATP, CTP, GTP, and either UTP, pseudouridine triphosphate (ΨTP), N1-methyl-pseudouridine triphosphate (m1ΨTP), or 5-methoxy-uridine triphosphate (mo5UTP) (Fig. 1A). For comparison with a previous study documenting the effects of modified nucleotides on RNA immunogenicity (16), we also made eGFP mRNA wherein both U and C were substituted with Ψ and 5-methyl-cytidine (m5C), respectively. We designed the sequence sets either with a bias toward optimal codons (for hEpo and eGFP mRNAs) or to sample a larger sequence space, including nonoptimal codons (for Luc mRNA). All mRNAs carried cap1, identical 5′ and 3′ UTRs, and a 100-nucleotide poly(A) tail.
First, we analyzed the impact of primary CDS sequence on protein expression of mRNAs containing no modified nucleotides (eGFP/hEpo, Fig. 1; Luc, Fig. 2). Cellular protein expression varied >2.5-fold for eGFP (Fig. 1B, gray) and >4-fold for hEpo (Fig. 1C, gray), despite all sequences containing only frequently used codons. Expression of 39 unmodified Luc variants containing codons with a greater optimality range varied >10-fold (Fig. 2A, gray). Highly expressed mRNAs tended to have increased GC content, consistent with previous reports (17), but not all high GC sequences were high expressers (SI Appendix, Figs. S1 A and B and S2A, gray). Unmodified Luc expression moderately correlated with both GC content and codon adaptation index (CAI) (Pearson correlations r = 0.63 and 0.64, respectively; see SI Appendix, Fig. S2A, gray). Each Luc variant globally used the same single codon for all instances of a given amino acid. This allowed us to look at the impact of individual codons on protein expression. Only 4 of 87 pairwise synonymous codon comparisons exhibited statistically significant differences (P < 0.05; see SI Appendix, Fig. S3, gray). For example, inclusion of PheUUU was associated with a slight increase in expression over PheUUC (Fig. 2B, gray). Surprisingly, even global inclusion of extremely nonoptimal codons such as SerUCG, LeuCUA, AlaGCG, and ProCCG had no statistically significant impact on Luc expression in unmodified RNA (Fig. 2B, gray; see SI Appendix, Fig. S3A, gray). Thus, codon usage, as measured by metrics such as CAI, cannot adequately explain these data.
Next, we examined how protein expression was affected by global substitution with modified nucleotides in the same sequences. For eGFP mRNAs in HeLa cells, modified nucleotides changed the expression of both individual variants and the overall expression mean and range of the entire sequence set. Compared to unmodified mRNA, mean expression was similar for eGFP mRNAs containing Ψ and m1Ψ, but lower for mo5U and Ψ/m5C (3-fold and 1.5-fold lower, respectively, Fig. 1B). Of note, the identities of the best and worst expressing sequences were not consistent across the different modified nucleotides. For example, eGFP sequence G2 expressed highly with Ψ and m1Ψ, moderately with U and Ψ/m5C, but poorly with mo5U (Fig. 1B). Similar trends were observed for hEpo mRNA in HeLa cells, with m1Ψ yielding a 1.5-fold greater mean expression than U, which was in turn 2-fold higher than mo5U (Fig. 1C). As with eGFP and Luc, we observed hEpo variants (e.g., ECO and HAE2) that expressed well with m1Ψ, but not U or mo5U-containing mRNA (Fig. 1C). Although we observed some variation in the expression levels of individual RNAs in hepatocytes versus HeLa cells, the general expression trends were remarkably similar (Fig. 1C and SI Appendix, Fig. S1C).
To extend this analysis, we next examined 39 synonymous Luc sequences containing U, m1Ψ, or mo5U mRNA in HeLa, AML12, and primary hepatocyte cells. Mean expression increased 1.5-fold for m1Ψ mRNA but decreased 5-fold for mo5U compared to unmodified mRNA in HeLa cells (Fig. 2A). This trend held in AML12 cells and primary hepatocytes as well as across delivery methods, including electroporation and transfection of lipid nanoparticles (LNPs), although some individual differences were noted (Fig. 2A and SI Appendix, Figs. S2B and S3B). For several mRNA sequences, inclusion of modified nucleotides substantially impacted protein expression (Fig. 2A and SI Appendix, Fig. S2C). Several sequences (e.g., L24 and L22) universally produced low levels of protein across all modified nucleotides, but many variants (e.g., L18, L7, L2, L8, and L29) favored specific modified nucleotides over others. Taken together, these data indicate that sequence and nucleotide modifications make distinct contributions to the overall level of protein expression.
A simple explanation for the observed modified nucleotide-specific expression differences would be a direct effect on decoding by the ribosome. If so, expression should correlate with overall modified nucleotide content, or alternatively with the use of specific codons containing modified nucleotides. However, there is no clear relationship between % U content and expression (SI Appendix, Fig. S2A) and only a few m1Ψ- and mo5U-containing codons had any statistically significant impact on protein output (6 and 4, respectively, of 87 synonymous pairwise comparisons P < 0.05, Fig. 2B and see SI Appendix, Fig. S3A). A notable exception is an unexpected and unexplained 2-fold increase in protein production with inclusion of the nonoptimal codon SerUCG in m1Ψ mRNA (P < 0.05, Fig. 2B and see SI Appendix, Fig. S3A). Thus, mRNAs containing modified nucleotides (Ψ, m1Ψ, or mo5U) can support high levels of protein expression, but in a very context-specific manner.
To assess the degree to which the above conclusions from cell lines translated to animals, we examined protein expression in mice from formulated hEpo and Luc mRNA variants containing 2 nucleotide modifications shown to have reduced immunogenicity (m1Ψ and mo5U) (16). Unmodified mRNAs were not included because in vivo protein expression can be obscured by strong activation of innate immunity (18). For some hEpo mRNAs, such as m1Ψ HBE3, we noted expression differences between the cell lines and mice (Fig. 1 C and D). These differences were larger than the differences observed between cell lines, and more pronounced for m1Ψ hEpo mRNA than for mo5U hEpo mRNA (SI Appendix, Fig. S1D). However, general expression trends were maintained in vivo (Fig. 1D). All 6 sequence variants containing m1Ψ expressed well (Fig. 1D, orange), but only 2 containing mo5U mRNA expressed at detectable levels (Fig. 1D, purple). Further, the codon optimized variant ECO expressed well with m1Ψ but not at all with mo5U. Even so, the best expression came from sequence variants containing mo5U (HAE4 and HAE3). The mo5U HAE4 variant produced >1.5-fold more protein than the best expressing m1Ψ variant (HAE3, Fig. 1D).
The 10 Luc variants tested in vivo were chosen to represent the widest possible range of protein expression observed in cell culture. As expected from the known biodistribution of mRNA-containing lipid nanoparticles (19), the liver was the main site of protein expression (SI Appendix, Fig. S2E). Luc mRNAs containing m1Ψ were highly expressed in vivo, particularly L18 and L7 (Fig. 2 C, Top). Variability in protein expression with mo5U was more exaggerated in vivo, as 7 of the 10 variants produced little to no protein (Fig. 2 C, Bottom). L18 was an exception, but still produced >10-fold less Luc than the same sequence with m1Ψ. Notably, L7 produced large amounts of protein with m1Ψ but barely detectable levels with mo5U (Fig. 2 C, Top versus Fig. 2 C, Bottom; note the y axis scales). These data suggest that expression differences observed in cell culture persist and can be more pronounced for exogenous RNAs delivered in vivo (SI Appendix, Fig. S2D).
Protein Expression Differences Trend with mRNA Thermodynamic Stability.
Since codon usage alone could not fully explain sequence-dependent expression differences in mRNAs containing modified nucleotides, we examined how modified nucleotides might affect mRNA secondary structure. We determined UV absorbance melting curves for mRNAs across a range of expression levels containing different uridine analogs (U, m1Ψ, and mo5U) as an overall measure of secondary structure. Highly expressing mRNAs underwent substantial melting transitions, detected as sharp peaks in the melting curves, above 35 °C (e.g., variant L18 with all 3 uridine analogs and L15 with m1Ψ only; Fig. 3A). For some variants (e.g., L15), inclusion of m1Ψ but not mo5U induced a shift to higher melting temperatures, suggesting global stabilization of structural features within the mRNA (Fig. 3A). Notably, L15 expression was much higher with m1Ψ than mo5U (Fig. 2A). Similar trends were observed in most, but not all, sequences tested (SI Appendix, Fig. S4A). Although these initial results suggested a link between RNA structural stability and modification-dependent protein expression in vivo, higher resolution structural information was required.
RNA base-pairing thermodynamics are commonly described in nearest-neighbor energy terms (20). Whereas these parameters were previously reported for unmodified RNA and RNA containing Ψ, to our knowledge they have not yet been established for m1Ψ or mo5U. To establish these parameters for Ψ, m1Ψ, and mo5U, we performed optical melting experiments on 35 synthetic short RNA duplexes containing global substitutions of uridine with Ψ, m1Ψ, and mo5U (20). Nearest neighbors containing Ψ (Fig. 3B, diamonds) and m1Ψ (Fig. 3B, squares) form substantially more stable base pairs than uridine (by 0.25 and 0.18 kcal/mol on average, respectively; Fig. 3B, circles; SI Appendix, Table S1). In contrast, nearest neighbors containing mo5U (Fig. 3B, triangles) are destabilized by 0.28 kcal/mol relative to uridine (Fig. 3B and SI Appendix, Table S1). The average difference for mo5U versus Ψ is −0.5 kcal/mol per nearest neighbor, or −1.0 kcal/mol per base pair. The impact of each nucleotide modification on RNA is consistent across the nearest-neighbor base pairs when compared to U. This differs from a previous study (21) that found large context-dependent differences in energies of single A-Ψ pairs, depending on the flanking A-U and G-C pairs (SI Appendix, Fig. S4B), suggesting that introduction of single modified nucleotides can have complex, context-dependent impacts on folding energies. The global differences in pairing energies that we measured, summed over all base pairs that include a modified nucleotide in a full-length mRNA, readily explain the observed differences in the UV melting curves caused by inclusion of different modified nucleotides.
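The summation argument above can be illustrated with a minimal Python sketch. It assumes a perfectly paired helix and applies only the average per-nearest-neighbor shifts quoted in the text (0.25, 0.18, and 0.28 kcal/mol); it does not use the stack-specific parameters of SI Appendix, Table S1. The function names and the sign convention (negative means more stable than U) are illustrative assumptions.

```python
# Average shift in nearest-neighbor free energy relative to uridine (kcal/mol).
# Values are the averages quoted in the text; the sign convention
# (negative = more stable than U) is an assumption of this sketch.
AVG_SHIFT = {"U": 0.0, "psi": -0.25, "m1psi": -0.18, "mo5U": 0.28}


def count_u_containing_steps(top_strand: str) -> int:
    """Count nearest-neighbor steps that include at least one A-U pair.

    Assumes the strand is paired with its perfect Watson-Crick complement,
    so an A or a U on the top strand both imply a uridine in that base pair.
    """
    seq = top_strand.upper()
    has_u = [base in "AU" for base in seq]
    return sum(1 for i in range(len(seq) - 1) if has_u[i] or has_u[i + 1])


def helix_shift(top_strand: str, analog: str) -> float:
    """Approximate total stability change of a helix after global substitution."""
    return count_u_containing_steps(top_strand) * AVG_SHIFT[analog]


if __name__ == "__main__":
    helix = "GCAUGC"  # hypothetical 6-bp helix, top strand only
    for analog in ("psi", "m1psi", "mo5U"):
        print(analog, round(helix_shift(helix, analog), 2), "kcal/mol")
```

Because the shift accumulates over every uridine-containing step, even small per-stack differences can add up to large differences across a full-length mRNA, which is the point made in the paragraph above.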
Position-Dependent Structure Correlates with High Expression.
To investigate how modified nucleotides impact mRNA structure at single nucleotide resolution, we used selective 2’-hydroxyl acylation analyzed by primer-extension - mutational profiling (SHAPE-MaP) to probe RNA structure (22). We first verified that the methodology would produce high-quality data with m1Ψ and mo5U containing mRNAs (SI Appendix, Fig. S5A). In the absence of the SHAPE reagent (1-methyl-6-nitroisatoic anhydride [1M6]), there was no evidence of increased background error rates by next-generation sequencing (NGS) with either m1Ψ or mo5U (SI Appendix, Fig. S5B). The 1M6 treatment increased the mutation rates for RNAs containing either m1Ψ or mo5U to a similar extent as observed for uridine (SI Appendix, Fig. S5C). A comparison of SHAPE-induced mutation rates at U bases revealed a trend with m1Ψ < U < mo5U, which is consistent with the expected pairing frequency from the thermodynamic pairing energies (SI Appendix, Fig. S5D). Next, we measured RNA structure across the experimentally tested variants of hEpo containing U, m1Ψ, or mo5U (Dataset S1). Data for a representative sequence, hEpo HAE3, revealed local structure that differed dramatically by modified nucleotide (SI Appendix, Fig. S5 A and D). Consistent with the thermodynamic melting data in many RNAs, m1Ψ stabilized and mo5U destabilized structure (hEpo HAE3; see SI Appendix, Fig. S5 D and E). SHAPE-directed modeling of secondary structure suggested that modified nucleotides can induce widespread changes to the secondary structure ensemble for the same sequence (SI Appendix, Fig. S6 A and B). Thus, global incorporation of modified nucleotides induces widespread changes in mRNA structural content and conformation.
We next investigated the positional dependence of protein expression on local RNA structure. To do so, we obtained SHAPE data for synonymous variants of hEpo (8 variants each with m1Ψ or mo5U) and Luc (38 variants each with U, m1Ψ, or mo5U) whose protein output varied over >2 orders of magnitude (130 mRNAs total) (23). Regions displaying structural differences were identified using 31-nucleotide sliding window median reactivities, as previously described (24). Consistent with observations above, high protein output mRNA variants had lower median SHAPE reactivities (i.e., increased structure) across the CDS than low protein output variants. This was true for both proteins and all 3 chemistries (Fig. 4 A and B and Dataset S1). Particularly striking examples were ECO and L8 mRNAs, whose high expression in m1Ψ compared to mo5U correlated with widespread m1Ψ-dependent decreases in median SHAPE reactivity throughout the CDS (Fig. 4A and SI Appendix, Fig. S5E). In contrast, the 5′ UTR of high-expressing variants exhibited high SHAPE reactivity, indicating a general lack of structure in this region (Fig. 4A and SI Appendix, Fig. S5E).
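As an illustration of the windowed analysis described above, the following Python sketch computes a centered sliding-window median over a per-nucleotide reactivity profile. The handling of window edges (truncation) and of missing values (skipped) is an assumption of this sketch and may differ from the published procedure (24).

```python
from statistics import median


def sliding_median(reactivities, window=31):
    """Centered sliding-window median of per-nucleotide SHAPE reactivities.

    Windows are truncated at the ends of the profile, and positions with no
    data (None) are skipped; both choices are assumptions of this sketch.
    """
    half = window // 2
    smoothed = []
    for i in range(len(reactivities)):
        lo = max(0, i - half)
        hi = min(len(reactivities), i + half + 1)
        values = [r for r in reactivities[lo:hi] if r is not None]
        smoothed.append(median(values) if values else None)
    return smoothed


if __name__ == "__main__":
    # Hypothetical reactivity profile, for illustration only.
    profile = [0.1, 0.8, None, 0.3, 1.2, 0.05, 0.4]
    print(sliding_median(profile, window=3))
```

A lower windowed median indicates a more structured region, so comparing these smoothed profiles between high- and low-expressing variants highlights where structure differs along the transcript.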
We next analyzed the directionality and strength of the correlation between positionwise SHAPE reactivity and protein expression across all Luc variants (Fig. 4B). This revealed a striking, position-dependent relationship between mRNA structure and expression that was largely consistent between mRNAs with m1Ψ and mo5U. The region encompassing the 47-nt 5′ UTR and the first ∼30 nucleotides of the CDS (Fig. 4C, region A) showed a positive and statistically significant correlation (P < 0.05) between SHAPE reactivity and protein expression for both m1Ψ and mo5U mRNAs. In contrast, the remainder of the CDS and the entire 3′ UTR (Fig. 4C, region B) exhibited a predominantly inverse correlation between SHAPE reactivity and protein expression for U, m1Ψ, and mo5U. For all 3 chemistries, positions with negative correlations far outnumbered those with positive correlations (percent negative, U: 78.6%; m1Ψ: 78.7%; and mo5U: 72.9%) (Fig. 4C). In other words, increased secondary structure in these regions correlated with improved protein expression, consistent with the global structural properties measured by optical melting. Although statistically underpowered, a similar trend of higher protein expression from structured coding sequences was evident in the hEpo data (SI Appendix, Fig. S7). Notably, however, the strength of the structure–function correlation varied across the metasequence. Specific regions of the CDS exhibited statistically significant correlations between SHAPE reactivity and protein expression, the vast majority of which were negative rather than positive (U: 1 positive, 15 negative; m1Ψ: 1 positive, 18 negative; and mo5U: 2 positive, 17 negative) (Fig. 4C).
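The positionwise analysis can be sketched as follows. This example assumes a Pearson correlation computed independently at each aligned nucleotide position across variants; the text does not state which correlation statistic or significance test was used, so the statistic, function names, and toy data here are assumptions for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient; returns NaN if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def positionwise_correlation(shape_by_variant, expression):
    """Correlate reactivity with expression at each aligned position.

    shape_by_variant: one equal-length reactivity list per variant.
    expression: one expression value per variant, in the same order.
    """
    n_positions = len(shape_by_variant[0])
    return [
        pearson([variant[p] for variant in shape_by_variant], expression)
        for p in range(n_positions)
    ]


def fraction_negative(correlations):
    """Fraction of positions with a negative correlation, ignoring NaN."""
    valid = [c for c in correlations if c == c]  # NaN is not equal to itself
    return sum(c < 0 for c in valid) / len(valid)


if __name__ == "__main__":
    # Toy data: 3 variants, 2 positions (hypothetical values).
    shape = [[0.1, 0.9], [0.5, 0.5], [0.9, 0.1]]
    expression = [1.0, 2.0, 3.0]
    corrs = positionwise_correlation(shape, expression)
    print(corrs, fraction_negative(corrs))
```

Synonymous variants have the same length, so positions align directly across variants. A negative coefficient at a position means that lower reactivity (more structure) there accompanies higher expression.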
The observed structure–function relationships were evaluated further using targeted mutations. We examined the role of flexibility in region A (47-nt 5′ UTR and the first 30 nucleotides of the CDS) by creating chimeras combining variants with different structural signatures. Luc variants L7 and L27 (Fig. 2A) both exhibited lower than average protein expression in m1Ψ. Both also exhibited low SHAPE reactivity (high structure) throughout both regions A and B (Dataset S1). However, when we replaced region A with the relatively unstructured corresponding region A from the high expresser L18 (Fig. 4A) to produce fusion mRNAs FL18/7 and FL18/27, both region A SHAPE reactivity and Luc expression increased (SI Appendix, Fig. S8A and Fig. 5B). The FL18/7 and FL18/27 chimeras only differed by 2 and 4 individual bases from their respective parents (note that the 47-nucleotide 5′ UTR is common to all sequences). Consistent with the structure–function correlations within region B (the rest of the CDS and the 3′ UTR), a Luciferase variant (LHS) predicted to have more stable secondary structure in CDS (LHS for high structure) yielded 1.5-fold greater protein expression than L18 in mo5U (SI Appendix, Fig. S8C). The expression of LHS in m1Ψ was slightly lower than L18 (SI Appendix, Fig. S8C), likely due to more stable secondary structure near the start codon (SI Appendix, Fig. S8D). While this is consistent with our observation that stable CDS structure correlates with increased protein expression, a more rigorous approach was needed to disentangle the role of structure from codon optimality.
Codon Usage and mRNA Structure Synergize to Determine Ribosome Loading and mRNA Half-Life.
The redundancy of the genetic code means that it is impossible to completely enumerate the relationships between codon usage, mRNA secondary structure, and protein expression. Instead, we computationally generated sets of 150,000 synonymous CDSs encoding eGFP-degron with 3 different algorithms. For each sequence, we calculated relative synonymous codon usage (RSCU, ref. 25) and the predicted minimum free energy (MFE) structure (26). Randomly choosing synonymous codons with equal probability generates sequences that cluster around 0.75 ± 0.05 RSCU and −325 ± 40 kcal/mol MFE (Fig. 5A, red). Using probabilities weighted by frequency in the human transcriptome (27) generates a similar-shaped distribution, but shifted to both significantly higher RSCU (0.825 ± 0.05, P < 0.05) and greater structure (−340 ± 40 kcal/mol, P < 0.05) (Fig. 5A, blue). Next, we developed an algorithm that varied the individual codon choice probabilities dynamically so that RSCU and MFE were both driven to their accessible extremes (Fig. 5A, gray). The space covered is far greater than for random or frequency-weighted sequences, but has well-defined limits. Notably, because optimal codons tend to be GC rich, the structure of the genetic code inherently disallows sequences with both highly optimal codons and low structure (Fig. 5 A, Top Left corner) or rare codons and high structure (Fig. 5 A, Bottom Right corner).
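The first two sampling schemes (equal-probability and frequency-weighted codon choice) can be sketched in Python as below. The codon table covers only three amino acids and its weights are placeholders, not real human codon usage data. The usage score (each codon's weight divided by the highest weight among its synonyms, averaged over the CDS) is an illustrative stand-in for the RSCU metric of ref. 25. MFE prediction requires RNA folding software and is not shown; GC fraction is included only as a crude, assumed proxy for structure-forming potential.

```python
import random

# Toy synonymous-codon table. The weights are PLACEHOLDERS for illustration,
# not measured human transcriptome frequencies.
CODONS = {
    "K": {"AAA": 0.40, "AAG": 0.60},
    "F": {"UUU": 0.45, "UUC": 0.55},
    "G": {"GGU": 0.15, "GGC": 0.35, "GGA": 0.25, "GGG": 0.25},
}
AA_OF = {codon: aa for aa, table in CODONS.items() for codon in table}


def sample_cds(protein, rng, weighted=False):
    """Pick one synonymous codon per residue, uniformly or by table weight."""
    cds = []
    for aa in protein:
        options = list(CODONS[aa])
        weights = [CODONS[aa][c] for c in options] if weighted else None
        cds.append(rng.choices(options, weights=weights, k=1)[0])
    return cds


def mean_relative_usage(cds):
    """Average of (codon weight / best synonymous weight); 1.0 = all optimal."""
    scores = []
    for codon in cds:
        table = CODONS[AA_OF[codon]]
        scores.append(table[codon] / max(table.values()))
    return sum(scores) / len(scores)


def gc_fraction(cds):
    """GC fraction of the CDS, a crude stand-in for structure potential."""
    seq = "".join(cds)
    return sum(base in "GC" for base in seq) / len(seq)


if __name__ == "__main__":
    rng = random.Random(0)
    protein = "KFGGKF" * 20  # hypothetical peptide using only the toy table
    for weighted in (False, True):
        cds = sample_cds(protein, rng, weighted=weighted)
        print(weighted, round(mean_relative_usage(cds), 3), round(gc_fraction(cds), 3))
```

Repeating the sampling many times and plotting the usage score against a folding-energy estimate would give point clouds analogous to the red and blue distributions described above. Reaching the extremes of the accessible space requires a search that adapts the codon probabilities, which this sketch does not attempt.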
To investigate how the limits of codon optimality and allowable structure affect protein expression, we selected 6 regions collectively spanning the range of accessible space (Fig. 5A, boxes). From each of these 6 selected regions, we synthesized 5 synonymous sequences (30 in total) and followed the production and decay of GFP fluorescence over a 20-h timecourse in HeLa cells. This enabled us to directly compare the effects of changing each factor independently, for example changes in MFE at constant RSCU (Fig. 5A, yellow vs. orange or purple vs. green) or the converse (Fig. 5A, yellow vs. purple or orange vs. green). The calculated folding energies of a subset of the eGFP mRNAs were validated by obtaining both SHAPE data and data-directed folding models (SI Appendix, Fig. S9 A and B). As expected, eGFP-degron mRNAs containing rare codons and very little secondary structure produced minimal protein (Fig. 5B, brown). Low protein expression was also observed for mRNAs with middling scores in both relative synonymous codon usage and structure (Fig. 5B, yellow). Notably, increasing either the codon optimality or secondary structure while holding the other feature constant increased median protein expression (Fig. 5B, orange and purple, respectively, P value < 0.01). The set of mRNAs with the highest codon optimality and most structure, however, showed no additional increase in median protein expression (Fig. 5B, green, P value = 0.55). Local percentages of both U and A gave similar negative correlation across the entire CDS (SI Appendix, Fig. S10). Similar effects were observed in AML12 cells (SI Appendix, Fig. S11A). Combined, these data indicate that codon usage and secondary structure are both important, but distinct regulators of overall protein expression.
Next, we analyzed the kinetics of protein production. Real-time, continuous expression data were fit by a model including rate constants for mRNA translation, mRNA functional half-life, maturation of eGFP protein into its fluorescent form (28), and eGFP protein degradation (Fig. 5C). Functional half-life reflects the productive life of the mRNA in generating protein and is not necessarily the same as physical half-life ending with degradation—it could also reflect intracellular trafficking or sequestration away from the ribosomal machinery. Since all mRNA sequences expressed the same eGFP protein sequence and we measured fluorescent (i.e., mature functional) protein, we could assume constant rates of protein maturation (kMat) and protein degradation (λFluor, Fig. 5C). Fitting this model to the experimental data allowed us to calculate the rate of translation (kTrans) and functional half-life (t1/2 RNA) individually for each mRNA variant (Fig. 5D and SI Appendix, Table S2). Surprisingly, whereas overall protein expression correlated poorly with mRNA translation rate (r = 0.45), it correlated remarkably well with functional mRNA half-life (r = 0.90, Fig. 5E). Although the model was necessarily simplistic, these results were consistent across multiple computational models including models containing a delivery rate (SI Appendix, Fig. S11 B and C). Highly structured mRNAs had a >2-fold increase in functional mRNA half-life relative to those with middling degrees of secondary structure, regardless of whether their codon usage was middling or optimal (Fig. 5F). Thus, secondary structure increases protein output by extending mRNA functional half-life in a previously unrecognized regulatory mechanism independent of codon optimality.
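To make the structure of such a kinetic model concrete, the following Python sketch simulates a simple three-compartment scheme: functional mRNA decays exponentially, immature protein is produced at a rate proportional to functional mRNA, and immature protein matures into fluorescent protein. The exact equations used for the published fits are not given in this section, so the form of the equations, the assumption that immature and mature protein degrade at the same rate, and all parameter values below are illustrative assumptions.

```python
from math import log


def simulate(k_trans, t_half_rna, k_mat, lam_fluor, hours=20.0, dt=0.01):
    """Euler integration of a simple mRNA -> immature -> fluorescent protein model.

    k_trans     translation rate per unit functional mRNA (per hour)
    t_half_rna  functional mRNA half-life (hours)
    k_mat       protein maturation rate (per hour)
    lam_fluor   protein degradation rate (per hour), applied to both the
                immature and mature forms (an assumption of this sketch)
    Returns a list of (time, fluorescent protein) tuples.
    """
    lam_rna = log(2) / t_half_rna
    mrna, immature, mature = 1.0, 0.0, 0.0  # start with one unit of mRNA
    trace = []
    n_steps = int(round(hours / dt))
    for step in range(n_steps + 1):
        trace.append((step * dt, mature))
        d_mrna = -lam_rna * mrna
        d_immature = k_trans * mrna - (k_mat + lam_fluor) * immature
        d_mature = k_mat * immature - lam_fluor * mature
        mrna += d_mrna * dt
        immature += d_immature * dt
        mature += d_mature * dt
    return trace


def area_under_curve(trace, dt=0.01):
    """Total fluorescence over the time course (simple rectangle rule)."""
    return sum(fluor for _, fluor in trace) * dt


if __name__ == "__main__":
    # Hypothetical parameters: same translation rate, different half-lives.
    short = simulate(k_trans=1.0, t_half_rna=2.0, k_mat=1.5, lam_fluor=0.5)
    long_ = simulate(k_trans=1.0, t_half_rna=5.0, k_mat=1.5, lam_fluor=0.5)
    print(area_under_curve(short), area_under_curve(long_))
```

In a fit, the maturation and degradation rates would be held constant across variants while the translation rate and functional half-life are adjusted per mRNA to match the measured fluorescence curve. The sketch shows how a longer functional half-life alone raises total output at a fixed translation rate.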
Discussion
The amount of protein produced from any given mRNA (i.e., the translational output) is influenced by multiple factors specified by the primary nucleotide sequence. These factors include GC content, codon usage, codon pairs, and secondary structure. Disentangling the individual roles played by each of these factors in translational output of endogenous mRNAs, however, has proven difficult because of their high covariance. To separate these confounding relationships, we directly manipulated the secondary structure of exogenously delivered mRNAs using 2 distinct approaches. First, we globally replaced uridine with modified analogs having markedly different base-pairing thermodynamics—this led to global secondary structure changes without altering the mRNA sequence. Second, we used computational design to identify sets of mRNAs whose coding sequences explored the limits of codon usage and secondary structure.
Global incorporation of different modified nucleotides often (but not always) markedly changed mRNA expression. This effect was seen across numerous synonymous coding variants of multiple proteins, in several different cell lines, and in vivo (Figs. 1 and 2). m1Ψ generally gave higher expression than U or mo5U for the same sequence. Biophysical studies revealed that m1Ψ and mo5U have dramatically different and opposite effects compared to U (stabilizing and destabilizing, respectively) on overall mRNA folding, nearest-neighbor base-pairing thermodynamics, and secondary structure pattern as mapped by SHAPE (Figs. 3 and 4). We also found that secondary structure correlates with protein expression in a position-specific manner (Fig. 4). Consistent with previous reports (10–13), highly expressed mRNAs had low structure in the entire 5′ UTR and the first ∼30 CDS nucleotides. Notably, even though the constant, 47-nucleotide 5′ UTR was chosen to support high expression across many coding sequences, we still observed a clear structure–expression relationship in this region. Unexpectedly, however, we found that a highly structured CDS region downstream of the first 30 nucleotides also correlates with increased protein expression. By rationally designing sequences to contain more CDS structure, we could rescue low-expressing mo5U-containing mRNA variants (SI Appendix, Fig. S8A). Protein expression from sequences selected to vary the degree of secondary structure independent of codon usage, and vice versa, revealed that secondary structure and codon usage each have distinct and roughly equivalent impacts on protein output (Fig. 5). For this set of mRNAs, total protein output was driven primarily by changes in functional mRNA half-life.
There are several possible mechanisms for the observed relationship between CDS secondary structure and functional mRNA half-life. It is possible that higher mRNA structure improves interaction with RNA binding proteins (RBPs) that positively impact translation, such as double-stranded RNA binding protein Staufen (29) or reduces accessibility to single strand-specific endonucleases, although endonucleolytic cleavage is thought to be limited to specific cases (30). Another possibility is that ribosome queuing near the start codon may facilitate translation initiation (31). This could explain the structural signature in the beginning of the CDS, but it is worth noting that our data suggest a beneficial effect for structure throughout the CDS (Fig. 4C). Another possibility is that structure leads to ribosome pausing at specific locations in the peptide, crucial for proper protein folding, and therefore activity, of certain multidomain proteins (32, 33).
Yet another possibility is that mRNA structure slows ribosome movement, leading to enhanced functional protein output and extended mRNA half-life. Biophysical experiments have shown that secondary structure can slow ribosome processivity (34). While counterintuitive, the increase in protein output is consistent with previous work in cell-free lysates, showing that mRNAs containing m1Ψ slow ribosome elongation, increase total protein output, and improve initiation through decreased phosphorylation of the initiation factor, eIF2α (35). Our data help explain the first 2 observations, in that m1Ψ stabilizes mRNA secondary structure leading to greater protein output. If eIF2α dephosphorylation were the primary driver of the change in translation, we would have expected to see a strong correlation between translation efficiency and total protein output. However, we found very little correlation between translation efficiency, which should closely correlate with initiation rate, and total protein output (Fig. 5E). Our use of a standardized 5′ UTR lacking strong translational modulators found in many endogenous genes, such as upstream ORFs and specific regulatory structures, likely rules out effects of those elements. Interestingly, a recent study in mammalian neurons links codon-specific ribosome stalling directly to eIF2α phosphorylation through the kinase GCN2 (36), suggesting that translation initiation and elongation may be highly interconnected (37).
Further, ribosomal pauses induced by rare codons have recently been directly linked to ribosomal frameshifting (38) and mRNA degradation (4, 39). An mRNA degradation-based mechanism is consistent with our data showing a tight correlation between mRNA half-life and total protein output (Fig. 5E). Notably, ribosome collisions were demonstrated to activate mRNA degradation through the no-go decay (NGD) pathway and decrease mRNA half-life (40, 41). One suggested mechanism for this regulation is the action of the ubiquitin ligase ZNF598 on the small subunit proteins of collided diribosomes (42). An aspect of this regulation that remains poorly understood is the relationship between ribosomal pausing/collision and secondary structure. Although secondary structure likely slows ribosome progression, the ribosome is an inherently powerful helicase that necessarily unwinds mRNA structure during elongation (43). We propose that local secondary structure elements unwound by each advancing ribosome should quickly reform after that ribosome has moved on and may thus act as buffers to prevent collisions between adjacent ribosomes on the same message. If ribosome collisions play a central role in translational quality control, local secondary structure may be a key regulator of functional mRNA half-life by enforcing spacing between ribosomes and thereby decreasing collisions.
Methods
Animal Research.
All animal studies were approved by and performed in accordance with the Institutional Animal Care and Use Committee of Moderna, Inc.
Sequence Design.
eGFP variants were stochastically generated using only frequently used codons. For hEpo regions and Luc, variants were deterministically encoded with one codon per amino acid.
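The two encoding strategies can be illustrated with a minimal sketch. Everything named here is hypothetical: the small `FREQUENT_CODONS` table covers only a handful of amino acids and its choice of "frequently used" codons is an assumption, not the codon set or frequency threshold actually used for the constructs.

```python
import random

# Hypothetical, deliberately tiny table: a few synonymous codons per amino
# acid, taken from the standard genetic code. Which codons count as
# "frequently used" is an assumption made for illustration only.
FREQUENT_CODONS = {
    "M": ["AUG"],
    "V": ["GUG", "GUC"],
    "S": ["AGC", "UCC"],
    "K": ["AAG", "AAA"],
    "G": ["GGC", "GGA"],
    "E": ["GAG", "GAA"],
    "L": ["CUG", "CUC"],
    "F": ["UUC", "UUU"],
}

# Reverse lookup used only to check that a variant still encodes the protein.
CODON_TO_AA = {c: aa for aa, codons in FREQUENT_CODONS.items() for c in codons}


def stochastic_variant(protein, rng):
    """Pick one allowed codon at random, independently at every position."""
    return "".join(rng.choice(FREQUENT_CODONS[aa]) for aa in protein)


def deterministic_variant(protein, codon_for):
    """Encode every occurrence of an amino acid with the same single codon."""
    return "".join(codon_for[aa] for aa in protein)


def translate(rna):
    """Translate an RNA string codon by codon using the toy table above."""
    return "".join(CODON_TO_AA[rna[i:i + 3]] for i in range(0, len(rna), 3))
```

For example, `stochastic_variant("MVSKGEELF", random.Random(0))` returns one of many synonymous encodings of that short peptide, whereas `deterministic_variant` with a fixed amino acid-to-codon mapping always returns the same sequence; in both cases `translate` recovers the input peptide.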
mRNA Preparation.
mRNAs for hEpo, eGFP, and Luc were synthesized in vitro using either all unmodified nucleotides or global substitution of uridine (U) with the modified uridine analogs pseudouridine (Ψ), N1-methyl-pseudouridine (m1Ψ), or 5-methoxy-uridine (mo5U), or with a combination of Ψ and 5-methyl-cytidine (m5C).
Determination of Nearest-Neighbor Thermodynamic Parameters.
UV-melting experiments were performed on 39 synthetic RNA duplexes containing Ψ, m1Ψ, or mo5U in place of uridine, and the nearest-neighbor free energy contributions for each modified nucleotide were determined using established methods (20).
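For orientation, the nearest-neighbor model for Watson-Crick duplexes (20) approximates the free energy of duplex formation as an initiation term plus a sum over adjacent base-pair stacks, with corrections for terminal AU pairs and for self-complementary duplexes. In general form, with symbol names chosen here for illustration:

```latex
\Delta G^{\circ}_{37}(\text{duplex})
  = \Delta G^{\circ}_{\text{init}}
  + \sum_{i=1}^{n-1} \Delta G^{\circ}_{\text{NN}}(i,\, i+1)
  + m_{\text{AU}}\, \Delta G^{\circ}_{\text{term-AU}}
  + \Delta G^{\circ}_{\text{sym}}
```

Here $n$ is the number of base pairs, $\Delta G^{\circ}_{\text{NN}}(i, i+1)$ is the stacking contribution of the step between pairs $i$ and $i+1$, $m_{\text{AU}}$ is the number of terminal AU pairs, and $\Delta G^{\circ}_{\text{sym}}$ applies only to self-complementary duplexes. Presumably, extending the model to a modified nucleotide means fitting additional $\Delta G^{\circ}_{\text{NN}}$ terms for stacks that contain it; the exact parameterization used for Ψ, m1Ψ, and mo5U is not specified in this section.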
Computational Modeling of eGFP Expression Data.
Time-course data were collected from HeLa and AML12 cells transfected with the designed eGFP-degron mRNAs. These data were used to fit a computational model of active protein production and degradation in which the rate terms for protein maturation and degradation were held constant, while the translation efficiency and the rate of RNA degradation were allowed to vary to find the best fit to the experimental data.
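A minimal sketch of this kind of fit is given below. It is an assumed three-state model, not the authors' implementation: functional mRNA decays exponentially at rate `k_deg`, immature protein is produced at rate `k_tl` per mRNA, and maturation (`k_mat`) and protein degradation (`k_pdeg`) are held fixed. The default rate values, the forward-Euler integrator, and the grid search are all placeholders chosen for illustration.

```python
import math


def simulate(k_tl, k_deg, times, k_mat=1.0, k_pdeg=0.35, m0=1.0, dt=0.01):
    """Mature-protein signal at each time in `times` (ascending, in hours).

    Assumed model, integrated by forward Euler:
      mRNA(t)      = m0 * exp(-k_deg * t)
      d(immature)  = k_tl * mRNA - (k_mat + k_pdeg) * immature
      d(mature)    = k_mat * immature - k_pdeg * mature
    """
    out = []
    t, immature, mature = 0.0, 0.0, 0.0
    for target in times:
        while t < target - 1e-12:
            step = min(dt, target - t)
            mrna = m0 * math.exp(-k_deg * t)
            d_imm = k_tl * mrna - (k_mat + k_pdeg) * immature
            d_mat = k_mat * immature - k_pdeg * mature
            immature += step * d_imm
            mature += step * d_mat
            t += step
        out.append(mature)
    return out


def fit(times, observed, k_deg_grid, **fixed):
    """Least-squares fit of (k_tl, k_deg) with maturation/degradation fixed.

    The signal is linear in k_tl, so for each candidate k_deg the best k_tl
    has a closed form; only k_deg is searched over the supplied grid.
    Returns (k_tl, k_deg, sum_of_squared_errors).
    """
    best = None
    for k_deg in k_deg_grid:
        unit = simulate(1.0, k_deg, times, **fixed)  # curve for k_tl = 1
        denom = sum(u * u for u in unit)
        if denom == 0.0:
            continue
        k_tl = sum(o * u for o, u in zip(observed, unit)) / denom
        sse = sum((o - k_tl * u) ** 2 for o, u in zip(observed, unit))
        if best is None or sse < best[0]:
            best = (sse, k_tl, k_deg)
    return best[1], best[2], best[0]


def half_life(k_deg):
    """Functional mRNA half-life implied by a first-order decay rate."""
    return math.log(2) / k_deg
```

Calling `fit(times, observed, k_deg_grid)` on a fluorescence time course returns the best-fitting translation efficiency and mRNA decay rate, and `half_life` converts the latter to a functional half-life. Under this assumed model, the time-integrated protein signal scales with `k_tl / k_deg`, so a longer functional half-life raises total output even when translation efficiency is unchanged.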
Acknowledgments
We thank DNA Software for expert work in the experimental determination of the nearest-neighbor parameters for modified nucleotides; Katie Hughes, Chris Pepin, and Wei Zheng for statistics advice and support; and Paul Yourik, Mihir Metkar, Alicia Bicknell, Ruchi Jain, and Caroline Köhrer for critical reading of the manuscript. This work was supported by Moderna, Inc.
Footnotes
Competing interest statement: All authors are employees (or ex-employees in the case of S.V.S., J.R., and B.J.C.) of Moderna, Inc.
This article is a PNAS Direct Submission.
Data deposition: The data reported in this paper have been deposited in the Gene Expression Omnibus (GEO) database, https://www.ncbi.nlm.nih.gov/geo (accession no. GSE139176).
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1908052116/-/DCSupplemental.
References
- 1. Gustafsson C., Govindarajan S., Minshull J., Codon bias and heterologous protein expression. Trends Biotechnol. 22, 346–353 (2004).
- 2. Horstick E. J., et al., Increased functional protein expression using nucleotide sequence features enriched in highly expressed genes in zebrafish. Nucleic Acids Res. 43, e48 (2015).
- 3. Weinberg D. E., et al., Improved ribosome-footprint and mRNA measurements provide insights into dynamics and regulation of yeast translation. Cell Rep. 14, 1787–1799 (2016).
- 4. Presnyak V., et al., Codon optimality is a major determinant of mRNA stability. Cell 160, 1111–1124 (2015).
- 5. Tulloch F., Atkinson N. J., Evans D. J., Ryan M. D., Simmonds P., RNA virus attenuation by codon pair deoptimisation is an artefact of increases in CpG/UpA dinucleotide frequencies. eLife 3, e04531 (2014).
- 6. Tuller T., et al., An evolutionarily conserved mechanism for controlling the efficiency of protein translation. Cell 141, 344–354 (2010).
- 7. Simmonds P., Tulloch F., Evans D. J., Ryan M. D., Attenuation of dengue (and other RNA viruses) with codon pair recoding can be explained by increased CpG/UpA dinucleotide frequencies. Proc. Natl. Acad. Sci. U.S.A. 112, E3633–E3634 (2015).
- 8. Futcher B., et al., Reply to Simmonds et al.: Codon pair and dinucleotide bias have not been functionally distinguished. Proc. Natl. Acad. Sci. U.S.A. 112, E3635–E3636 (2015).
- 9. Mortimer S. A., Kidwell M. A., Doudna J. A., Insights into RNA structure and function from genome-wide studies. Nat. Rev. Genet. 15, 469–479 (2014).
- 10. Ding Y., et al., In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features. Nature 505, 696–700 (2014).
- 11. Wan Y., et al., Landscape and variation of RNA secondary structure across the human transcriptome. Nature 505, 706–709 (2014).
- 12. Shah P., Ding Y., Niemczyk M., Kudla G., Plotkin J. B., Rate-limiting steps in yeast protein translation. Cell 153, 1589–1601 (2013).
- 13. Tuller T., Zur H., Multiple roles of the coding sequence 5′ end in gene expression regulation. Nucleic Acids Res. 43, 13–28 (2015).
- 14. Newby M. I., Greenbaum N. L., A conserved pseudouridine modification in eukaryotic U2 snRNA induces a change in branch-site architecture. RNA 7, 833–845 (2001).
- 15. Kierzek E., Kierzek R., The thermodynamic stability of RNA duplexes and hairpins containing N6-alkyladenosines and 2-methylthio-N6-alkyladenosines. Nucleic Acids Res. 31, 4472–4480 (2003).
- 16. Karikó K., et al., Incorporation of pseudouridine into mRNA yields superior nonimmunogenic vector with increased translational capacity and biological stability. Mol. Ther. 16, 1833–1840 (2008).
- 17. Plotkin J. B., Kudla G., Synonymous but not the same: The causes and consequences of codon bias. Nat. Rev. Genet. 12, 32–42 (2011).
- 18. Kormann M. S., et al., Expression of therapeutic proteins after delivery of chemically modified mRNA in mice. Nat. Biotechnol. 29, 154–157 (2011).
- 19. Sabnis S., et al., A novel amino lipid series for mRNA delivery: Improved endosomal escape and sustained pharmacology and safety in non-human primates. Mol. Ther. 26, 1509–1519 (2018).
- 20. Xia T., et al., Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson-Crick base pairs. Biochemistry 37, 14719–14735 (1998).
- 21. Hudson G. A., Bloomingdale R. J., Znosko B. M., Thermodynamic contribution and nearest-neighbor parameters of pseudouridine-adenosine base pairs in oligoribonucleotides. RNA 19, 1474–1482 (2013).
- 22. Siegfried N. A., Busan S., Rice G. M., Nelson J. A., Weeks K. M., RNA motif discovery by SHAPE and mutational profiling (SHAPE-MaP). Nat. Methods 11, 959–965 (2014).
- 23. Mauger D. M., Cabral B. J., Presnyak V., Moore M. J., mRNA structure regulates protein expression through changes in functional half-life. Gene Expression Omnibus (GEO) database. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE139176. Deposited 21 October 2019.
- 24. Watts J. M., et al., Architecture and secondary structure of an entire HIV-1 RNA genome. Nature 460, 711–716 (2009).
- 25. Scherer S., Guide to the Human Genome (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 2010) p. xiv, 1008 p.
- 26. Lu Z. J., Turner D. H., Mathews D. H., A set of nearest neighbor parameters for predicting the enthalpy change of RNA secondary structure formation. Nucleic Acids Res. 34, 4912–4924 (2006).
- 27. Nakamura Y., Gojobori T., Ikemura T., Codon usage tabulated from international DNA sequence databases: Status for the year 2000. Nucleic Acids Res. 28, 292 (2000).
- 28. Crameri A., Whitehorn E. A., Tate E., Stemmer W. P., Improved green fluorescent protein by molecular evolution using DNA shuffling. Nat. Biotechnol. 14, 315–319 (1996).
- 29. Jungfleisch J., et al., A novel translational control mechanism involving RNA structures within coding sequences. Genome Res. 27, 95–106 (2017).
- 30. Schoenberg D. R., Mechanisms of endonuclease-mediated mRNA decay. Wiley Interdiscip. Rev. RNA 2, 582–600 (2011).
- 31. Kearse M. G., et al., Ribosome queuing enables non-AUG translation to be resistant to multiple protein synthesis inhibitors. Genes Dev. 33, 871–885 (2019).
- 32. Kimchi-Sarfaty C., et al., A “silent” polymorphism in the MDR1 gene changes substrate specificity. Science 315, 525–528 (2007).
- 33. Rauscher R., Ignatova Z., Timing during translation matters: Synonymous mutations in human pathologies influence protein folding and function. Biochem. Soc. Trans. 46, 937–944 (2018).
- 34. Wen J. D., et al., Following translation by single ribosomes one codon at a time. Nature 452, 598–603 (2008).
- 35. Svitkin Y. V., et al., N1-methyl-pseudouridine in mRNA enhances translation through eIF2α-dependent and independent mechanisms by increasing ribosome density. Nucleic Acids Res. 45, 6023–6036 (2017).
- 36. Ishimura R., Nagy G., Dotu I., Chuang J. H., Ackerman S. L., Activation of GCN2 kinase by ribosome stalling links translation elongation with translation initiation. eLife 5, e14295 (2016).
- 37. Chu D., et al., Translation elongation can control translation initiation on eukaryotic mRNAs. EMBO J. 33, 21–34 (2014).
- 38. Simms C. L., Yan L. L., Qiu J. K., Zaher H. S., Ribosome collisions result in +1 frameshifting in the absence of no-go decay. Cell Rep. 28, 1679–1689.e4 (2019).
- 39. Radhakrishnan A., et al., The DEAD-box protein Dhh1p couples mRNA decay and translation by monitoring codon optimality. Cell 167, 122–132.e9 (2016).
- 40. Simms C. L., Yan L. L., Zaher H. S., Ribosome collision is critical for quality control during no-go decay. Mol. Cell 68, 361–373.e5 (2017).
- 41. D’Orazio K. N., et al., The endonuclease Cue2 cleaves mRNAs at stalled ribosomes during no go decay. eLife 8, e49117 (2019).
- 42. Juszkiewicz S., et al., ZNF598 is a quality control sensor of collided ribosomes. Mol. Cell 72, 469–481.e7 (2018).
- 43. Mustoe A. M., et al., Pervasive regulatory functions of mRNA structure revealed by high-resolution SHAPE probing. Cell 173, 181–195.e18 (2018).
- 44. Welch M., et al., Design parameters to control synthetic gene expression in Escherichia coli. PLoS One 4, e7002 (2009).