Protein Science: A Publication of the Protein Society. 2025 Jan 22;34(2):e70036. doi: 10.1002/pro.70036

Translation of the downstream ORF from bicistronic mRNAs by human cells: Impact of codon usage and splicing in the upstream ORF

Philippe Paget‐Bailly 1, Alexandre Helpiquet 1, Mathilde Decourcelle 2, Roxane Bories 1, Ignacio G Bravo 1
PMCID: PMC11751868  PMID: 39840808

Abstract

Biochemistry textbooks describe eukaryotic mRNAs as monocistronic. However, increasing evidence reveals the widespread presence and translation of upstream open reading frames (ORFs) preceding the “main” ORF. DNA and RNA viruses infecting eukaryotes often produce polycistronic mRNAs, and viruses have evolved multiple ways of manipulating the host's translation machinery. Here, we introduce an experimental model to study gene expression regulation from virus‐like bicistronic mRNAs in human cells. The model consists of a short upstream ORF and a reporter downstream ORF encoding a fluorescent protein. We have engineered synonymous variants of the upstream ORF to explore a large parameter space, including codon usage preferences, mRNA folding features, and splicing propensity. We show that the human translation machinery can translate the downstream ORF from bicistronic mRNAs, although reporter protein levels are a thousand times lower than those from the upstream ORF. Furthermore, synonymous recoding of the upstream ORF exclusively during elongation significantly influences its own translation efficiency, reveals cryptic splice signals, and modulates the probability of downstream ORF translation. Our results are consistent with a leaky scanning mechanism facilitating downstream ORF translation from bicistronic mRNAs in human cells, offering new insights into the role of upstream ORFs in translation regulation.

Keywords: bicistronic mRNA, codon usage, downstream ORF, elongation, eukaryote, fluorescence, GFP, main ORF, polycistronic mRNA, RNA splicing, transcription, translation, upstream ORF

1. INTRODUCTION

Translation is the most energy‐demanding step in gene expression and thus is highly regulated by in‐cis acting sequences and in‐trans acting factors (Lynch & Marinov, 2015). Canonical eukaryotic translation begins with scanning of an mRNA from the 5′ 7‐meG cap until the recognition of a suitable start codon by the preinitiation complex (Cigan et al., 1988; Kozak, 1989). The presence of mRNA secondary structures and regulatory sequences between the 5′ 7‐meG cap and the ORF start codon greatly affects translation initiation efficiency (Hinnebusch et al., 2016; Kozak, 1986). Textbooks describe genes in eukaryotic genomes as most often arranged under individual promoters and transcribed into monocistronic mRNAs. However, mounting evidence suggests that up to 50% of vertebrate mRNAs present a short regulatory ORF immediately upstream of, or overlapping, the “main” or “canonical” ORF (Calvo et al., 2009; Chew et al., 2016; Lin et al., 2019). This suggests that upstream ORFs are not rare exceptions, but rather integral to gene expression regulation in vertebrates. Such upstream ORFs play regulatory roles in the translation of the downstream ORF, either by in‐cis mechanisms, whereby the presence of the upstream ORF modulates the probability that the ribosome engages in translation of the downstream ORF, or by in‐trans mechanisms mediated by the protein product of the upstream ORF (Chen et al., 2020).

In contrast to idealized monocistronic mRNAs in eukaryotes, many RNA and DNA viruses infecting eukaryotes produce polycistronic mRNAs containing multiple successive ORFs. Viruses have evolved a large repertoire of mechanisms that manipulate the cellular translation machinery to engage in multiple, independent or sequential, translation events from a single mRNA (Walsh & Mohr, 2011). Many viruses display for instance internal ribosomal entry sites preceding the ORFs present on a single mRNA molecule, thus allowing for ribosomal recruitment and translation of downstream ORFs. Many other viruses tamper with the ribosomal machinery allowing for non‐canonical initiation, elongation, or termination (Sorokin et al., 2021). Finally, for other viruses, the mechanisms allowing for viral protein translation from downstream ORFs on polycistronic mRNAs remain poorly understood, as no cis‐acting mRNA elements nor trans‐acting viral proteins have been identified (Stacey et al., 2000; Tang et al., 2006).

Compositional and structural features of viral mRNA (acting in‐cis) as well as the biochemical context in the infected cell (acting in‐trans) result in a probabilistic, differential ORF translation. This stochastic process shapes viral gene expression patterns in a given cell type, determining whether the cell is permissive for viral infection and eventually governing the viral life cycle.

In this study, we ask to what extent the characteristics of an upstream ORF affect the translation of a downstream ORF in human cells. To model translation from viral polycistronic mRNAs, we engineered a heterologous system expressing bicistronic mRNAs in human cells. We have combined transcriptomic, proteomic, and cytometry‐based fluorescence analyses to evaluate the impact of mRNA composition, codon recoding, and splicing on protein expression and translation efficiency from bicistronic mRNAs. Our findings provide qualitative and quantitative insights into the regulatory roles of mRNA features in shaping gene expression, with potential implications for understanding the diversity of cellular outcomes of viral infections.

2. RESULTS

2.1. Transfection with synonymous versions of a bicistronic construct does not affect heterologous mRNA levels, but synonymous recoding can introduce unpredicted splicing events

Thirteen synonymous versions of the shble ORF were chemically synthesized and cloned into the pcDNA3.1‐C‐EGFP expression vector (Figure S1A). Transfection with all constructs results in the expression of a 1602 nt‐long mRNA with the same organization: a 161 nt‐long 5′UTR; a 414 nt‐long shble ORF, including an N‐terminal AU1 tag and a C‐terminal FLAG tag; a 19 nt‐long gap; a 714 nt‐long egfp ORF; and a 288 nt‐long 3′UTR (Figures 1a and S1B). Only the shble coding sequences differed synonymously between constructs. Versions shble#1 to #6 were engineered for a previous study (Picard et al., 2023) to explore the extremes of codon usage preferences (CUPrefs) and nucleotide composition with regard to the human average. The design strategy is detailed in Experimental procedures. Versions shble#7 to #13 were selected among a pool of a thousand “guided random” shble versions to encompass the CUPrefs and mRNA folding energy spectrum found in actual human mRNA sequences. Thus, the 13 shble versions display by design compositional variation in G and C frequency in the third codon position (GC3), CpG dinucleotide frequency, TpA dinucleotide frequency, CUPrefs, as well as in mRNA folding energy (Table S2). We have further evaluated the match between the CUPrefs of each version and the average ones in the human genome using COUSIN (Codon Usage Similarity Index), which allows us to differentiate and quantify random codon usage, under‐ and overmatch to a given reference (Bourret et al., 2019). Using these five parameters in a principal component analysis (PCA) allowed for sharp discrimination of versions shble#1 to #6 and to a lesser extent of versions shble#7 to #13 (Figure 1b), with the first and second axes capturing 77.3% and 16.3% of the total variance, respectively. We have subsequently used the projections of each shble version on the first PCA axis as a composite variable to evaluate the impact of codon recoding against the different levels of experimental data we have generated.
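As an illustration of the compositional variables named above, the following minimal sketch computes GC3 and overlapping dinucleotide frequencies for a coding sequence. It is not the authors' pipeline: the function names, the normalization of dinucleotide counts by the number of overlapping positions, and the toy sequence (which is not a shble version) are all assumptions made for the example.

```python
def gc3(cds: str) -> float:
    """Fraction of codons whose third position is G or C."""
    cds = cds.upper()
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a positive multiple of 3")
    third_positions = cds[2::3]  # every third nucleotide, starting at codon position 3
    return sum(base in "GC" for base in third_positions) / len(third_positions)


def dinucleotide_frequency(seq: str, dinucleotide: str) -> float:
    """Occurrences of a dinucleotide divided by the number of overlapping positions.

    Assumption: other normalizations (e.g. observed/expected ratios) are also
    in common use; the source does not state which one was applied.
    """
    seq = seq.upper()
    positions = len(seq) - 1
    if positions < 1:
        raise ValueError("sequence must contain at least two nucleotides")
    hits = sum(seq[i:i + 2] == dinucleotide.upper() for i in range(positions))
    return hits / positions


# Hypothetical toy coding sequence (five codons), used only for illustration.
toy_cds = "ATGGCGTTAACGTAA"
toy_gc3 = gc3(toy_cds)
toy_cpg = dinucleotide_frequency(toy_cds, "CG")
toy_tpa = dinucleotide_frequency(toy_cds, "TA")
```

Values computed this way for each synonymous version, together with a CUPrefs index and a folding energy, could then be fed to any standard PCA routine.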

FIGURE 1.

(a) Cartoon of the synonymous recoding strategy for our constructs. All synonymous versions of the shble ORF are located in a bicistronic mRNA in tandem with the egfp ORF downstream, and under the transcriptional control of the cytomegalovirus (CMV) promoter. The shble ORF is preceded by an invariant stretch of 18 nucleotides encoding an AU1 epitope. Synonymous shble variants explore a sequence space associated with differences in GC3, CpG and TpA composition, as well as in match to the average human codon usage preferences and mRNA folding energy. (b) First two axes of a principal component analysis (PCA) of the compositional variables for the 13 shble versions used. The percentage of the total variance captured by each axis is given in parentheses. GC3, percentage of G or C at the third codon position; freq_CpG, CpG dinucleotide frequency; freq_TpA, TpA dinucleotide frequency; folding, energy of the most stable structure predicted for the mRNA shble ORF, estimated with the UNAfold online tool (http://unafold.org) (Markham & Zuker, 2008); COUSIN_59, value of the COdon Usage Similarity Index of the shble version with respect to the average human codon usage (Bourret et al., 2019).

We scanned all 13 synonymous versions using the HSP (Desmet et al., 2009) and SPLM algorithms (Solovyev, 2003) to avoid the presence of splice sites. Nevertheless, we identified splicing events in the shble ORF when expressing versions shble#4, #6, #7, #10, and #13 in the U‐2 OS cell line (Figure S2A). Our lab had previously reported splicing activity in versions shble#4 and #6 when expressed in HEK‐293 human cells (Picard et al., 2023). For all spliced versions, we individually identified donor and acceptor sites by RT‐PCR followed by Sanger sequencing (Figure S2B, C). We generated mutated versions of these five constructs with ablated splice sites to eliminate splicing as a confounding factor. We later took advantage of the paired splice‐able vs splice‐unable constructs to study the impact of splicing on translation regulation of our bicistronic mRNAs.

Transient transfection in biological quadru‐ or octuplicate experiments resulted in normalized mean mRNA levels ranging from 0.80 (95%CI: 0.76–0.84) for shble#1 to 1.28 (95%CI: 1.10–1.46) for shble#8, as evaluated by RT‐qPCR, with no significant differences in mRNA levels among the 13 constructs (Figure 2a, pairwise Wilcoxon rank sum test with Benjamini–Hochberg (B–H) correction; α = 0.05). Only transfection with the control construct, monocistronic egfp coding mRNA (mean 1.61, 95%CI: 1.47–1.74), presented a significant difference with the shble#1 condition (Figure 2a, Wilcoxon rank sum test with B–H correction; p = 0.028). In agreement, variation in mRNA composition and structure did not explain variation in mRNA levels (Figure 2b, R² = 0.0008, p = 0.8).
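For readers unfamiliar with the Benjamini–Hochberg (B–H) correction applied to the pairwise tests above, a minimal sketch of the standard step-up adjustment is given below. The function name and the example p-values are hypothetical and do not come from the study; the authors' actual statistical software is not described in this section.

```python
def benjamini_hochberg(p_values):
    """Return B-H adjusted p-values, in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(raw_p)
significant = [p <= 0.05 for p in adjusted_p]
```

A comparison is then declared significant when its adjusted p-value does not exceed the chosen α (0.05 in the text).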

FIGURE 2.

Compositional variation of synonymous shble versions does not affect mRNA levels. (a) Box‐and‐whiskers plot showing relative levels of total heterologous mRNAs measured by RT‐qPCR from four or eight biological replicates. Within a given replicate, relative mRNA level values were normalized by the median value of all samples, allowing comparison across replicates. The positive control “empty” condition transcribed a monocistronic egfp mRNA while the 13 shble conditions (sh1 to sh13) transcribed a bicistronic shble_egfp mRNA. Letters present the results of a pairwise Wilcoxon rank sum test with B–H adjusted p‐values. Median values for samples labeled with the same letter are not statistically different (α = 0.05). (b) Pearson's linear regression (black line) and 95% confidence interval of the fit (gray) between the projection on the first axis of PCA in Figure 1 for each shble version, and the relative mRNA levels.
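The within‐replicate normalization described in panel (a) can be sketched as follows. This is an assumed reading of the caption (each value divided by the median of all samples in the same replicate); the function name and the input values are hypothetical.

```python
from statistics import median


def normalize_by_replicate_median(levels):
    """Divide every value in one replicate by the median of that replicate.

    `levels` maps a condition name to its relative mRNA level.
    """
    center = median(levels.values())
    if center == 0:
        raise ValueError("replicate median is zero; cannot normalize")
    return {condition: value / center for condition, value in levels.items()}


# Hypothetical relative mRNA levels for one replicate (arbitrary units).
replicate = {"empty": 4.0, "sh1": 2.0, "sh8": 3.0}
normalized = normalize_by_replicate_median(replicate)
```

After this step the median sample of every replicate equals 1, which is what makes values comparable across replicates.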

2.2. Changes in nucleotide composition of the upstream shble ORF, without modifying translation initiation, modulate its own translation efficiency

Heterologous expression resulted in very high SHBLE protein levels: label‐free proteomic analyses of the 13 shble versions in biological triplicates revealed SHBLE as the second most abundant protein in the cellular extracts among the 4199 detected, in terms of iBAQ values (on average the 8th most abundant protein in raw intensity values; data available in the PRIDE repository entry PXD047576). We first verified that high heterologous protein expression had no impact on the total amount of protein signal detected in each sample (Figure S3A), and that heterologous protein levels did not vary with the overall protein levels in the sample (Figure S3B, for SHBLE R² = 0.0004, p = 0.88; and S3C for EGFP R² = 0.0001, p = 0.93). We next quantified to what extent variation in SHBLE protein levels could be accounted for by variation in mRNA levels and in mRNA composition. Using triplicate biological experiments comparing RT‐qPCR and proteomic data, our results show that 71% of the variation in SHBLE levels can be explained by variation in the bicistronic shble_egfp mRNA levels (Figure 3a, R² = 0.71, p = 1.9 × 10⁻¹¹) and that 44% of the variation in SHBLE levels can be explained by variation in the overall mRNA composition (Figure 3b; R² = 0.44, p = 3.9 × 10⁻⁶).

FIGURE 3.

(a) SHBLE protein levels as a function of mRNA levels. Pearson's linear regression (black line) and 95% confidence interval of the fit (gray) between relative shble_egfp mRNA levels and relative SHBLE protein levels for the 13 shble synonymous versions. (b) SHBLE translation efficiency as a function of shble sequence characteristics. Pearson's linear regression (black line) and 95% confidence interval of the fit (gray) between the projection on the first axis of PCA in Figure 1 for each shble version, and the protein‐over‐mRNA ratio for data presented in panel (a). In both panels, values from the same biological replicate are represented by triangle, rectangle or circle shapes.

As a proxy for translation efficiency, we normalized SHBLE protein levels over the corresponding shble_egfp mRNA levels. We did not detect significant differences in translation efficiency for SHBLE among the 13 synonymous versions, most likely because only triplicate experiments were available for the pairwise comparisons (Figure S4A; Wilcoxon rank sum test after B–H correction for multiple comparisons; α = 0.05). Nevertheless, we note a visual trend for shble#1 and #2 to display higher translation efficiency values, with a noteworthy three‐fold variation between values for shble#6 (median 0.40) and for shble#1 (median 1.30). We then analyzed the SHBLE translation efficiency as a function of the overall compositional features of the recoded shble versions. Our results show that variation in the characteristics of the upstream shble ORF explained 44% of the variation in the SHBLE translation efficiency (Figure 3b; R² = 0.44, p = 3.9 × 10⁻⁶). The specific analysis by individual variable (Figure S4B to F) showed that SHBLE translation efficiency increased with increased GC3 (R² = 0.46, p = 2.1 × 10⁻⁶), with a higher match to the average CUPrefs of the human genome (R² = 0.40, p = 1.7 × 10⁻⁵), with increased CpG frequency (R² = 0.22, p = 2.5 × 10⁻³), with decreased TpA levels (R² = −0.40, p = 1.7 × 10⁻⁵) and with less stable folding structures of the mRNA shble ORF (R² = −0.26, p = 9.9 × 10⁻⁴). Finally, we performed a sequential linear regression to determine a posteriori which combination of the five parameters used for the PCA in Figure 1 provided the highest explanatory power for variation in SHBLE translation efficiency (Table S1). Sequential linear regression revealed GC3 as the best predictor of SHBLE translation efficiency value (R² = 0.46), with the four remaining parameters providing no additional explanatory power.
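The translation efficiency proxy and the explained‐variance figures quoted above can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the authors' analysis code: the function names and input values are hypothetical. Note that a coefficient of determination is non‐negative by construction, so the negative values reported above presumably carry the sign of the underlying Pearson correlation to indicate the direction of the association; that reading is an assumption.

```python
from math import sqrt


def translation_efficiency(protein_levels, mrna_levels):
    """Protein-over-mRNA ratio, sample by sample."""
    if len(protein_levels) != len(mrna_levels):
        raise ValueError("protein and mRNA vectors must have the same length")
    return [p / m for p, m in zip(protein_levels, mrna_levels)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sqrt(sxx * syy)


def r_squared(x, y):
    """Fraction of variance in y explained by a simple linear fit on x."""
    return pearson_r(x, y) ** 2


# Hypothetical values: relative protein and mRNA levels for three samples.
efficiency = translation_efficiency([2.0, 3.0, 1.0], [1.0, 2.0, 1.0])
```

In a simple linear regression with one predictor, R² equals the squared Pearson coefficient, which is why the two helper functions are linked.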

Since all the recoded shble versions are identical in the 5′ UTR and in the first 24 coding nucleotides, we interpret that the differences observed in translation efficiency are related to translation elongation, rather than to translation initiation. Overall, our results demonstrate that synonymous recoding of the shble ORF is sufficient to impact SHBLE protein levels by affecting translation efficiency during translation elongation.

2.3. Human cells constitutively translate the downstream egfp ORF from bicistronic mRNAs, albeit a thousand times less efficiently than the upstream ORF

We assessed translation of the egfp downstream ORF from bicistronic mRNAs, as a function of mRNA levels and/or mRNA composition. We used RT‐qPCR and proteomic data for three replicates, as described above, and included five further replicates in which we used flow cytometry as an orthogonal technique to assess GFP levels. We integrated fluorescence intensity from a random sample of 30,000 transfected cells (events) from each experiment and used it as a proxy for GFP protein levels in the cell population. There was a very good correspondence between both GFP estimates, as variation in proteomic‐based GFP levels accounted for 83% of the variation in cytometry‐based GFP fluorescence levels (Figure 4a, R² = 0.83, p = 2.1 × 10⁻¹⁵).

FIGURE 4.

(a) GFP production from synonymous versions of a bicistronic shble_egfp mRNA. Pearson's linear regression (black line) and 95% confidence interval of the fit (gray) between proteomic‐based, sample‐normalized GFP intensity levels and the sum of fluorescence intensity of the cellular population for the 13 shble versions. Sums of fluorescence were calculated by integrating the fluorescence signal of 30,000 randomly selected cells in each transfection event. (b) Dot‐plot showing relative levels of heterologous proteins measured by label‐free proteomic from three biological replicates. For each sample, iBAQ values of SHBLE and GFP were normalized by the total iBAQ value of the sample. The positive control “GFP_mono” condition translated GFP from a monocistronic egfp mRNA while the 13 shble conditions translated SHBLE and GFP from a bicistronic shble_egfp mRNA. GFP values do not follow a normal distribution after a Shapiro normality test (p = 0.0014), hence the p‐values present the results of a pairwise Wilcoxon signed rank exact test. (c) Pearson's linear regression (black line) and 95% confidence interval of the fit (gray) between relative shble_egfp mRNA levels and proteomic‐based GFP levels for the 13 shble synonymous versions. (d) Pearson's linear regression (black line) and 95% confidence interval of the fit (gray) between relative shble_egfp mRNA levels and fluorescence‐based GFP levels for the 13 shble synonymous versions. (e) Pearson's linear regression (black line) and 95% confidence interval of the fit (gray) between the projection on the first axis of PCA in Figure 1 for each shble version, and egfp translation efficiency calculated with proteomic data. (f) Pearson's linear regression (black line) and 95% confidence interval of the fit (gray) between the projection on the first axis of PCA in Figure 1 for each shble version, and egfp translation efficiency calculated with fluorescence data. 
For panels (a), (c), (e), and (f), values from a same biological replicate are represented by triangle, rectangle or circle shapes.

SHBLE protein levels were on average 1197 (95%CI: 1131–1364) times higher than GFP levels (Figure 4b, Wilcoxon rank sum test with continuity correction; p < 3.08 × 10⁻¹⁴). We verified that GFP could be efficiently produced from our constructs: GFP protein levels produced from the egfp monocistronic control were only slightly lower than SHBLE protein levels (Figure 4b, Wilcoxon rank sum exact test; p = 0.030) but, most importantly, were significantly higher than GFP protein levels produced from the second ORF of every bicistronic mRNA (Figure 4b, Wilcoxon rank sum exact test; p = 1.74 × 10⁻⁴). We interpret that translation occurs preferentially from the upstream ORF, while translation from the downstream ORF occurs at lower levels, albeit constitutively.

Variation in mRNA levels accounted for a significant fraction of variation in GFP levels, both using proteomic‐based data (Figure 4c, R² = 0.55, p = 7.7 × 10⁻⁸) and fluorescence‐based data (Figure 4d, R² = 0.30, p = 7.4 × 10⁻⁸). It is of note that the explanatory power of variation in mRNA levels on variation in protein levels is substantially higher for the upstream shble ORF (Figure 3a) than for the downstream egfp ORF. Translation efficiency of egfp, defined as GFP protein levels over shble_egfp mRNA levels, was not different between shble synonymous conditions (Figure S5A, B; pairwise Wilcoxon rank sum test after B–H correction for multiple comparisons; α = 0.05). More importantly, egfp translation efficiency displayed no correlation with the overall compositional features of the bicistronic shble_egfp mRNA (Figure 4e, f).

Overall, our results show that synonymous recoding of the upstream shble ORF does not have an impact on translation efficiency of the downstream egfp ORF.

2.4. Translation of the downstream egfp ORF is compatible with leaky scanning, independently of the nucleotide composition upstream of the egfp AUG

We have determined first the impact of synonymous variation in the shble upstream ORF on its own translation efficiency; second the constitutive translation of the egfp downstream ORF at lower levels compared to the upstream ORF; and third the absence of impact of synonymous variation in the shble upstream ORF on the translation efficiency of the egfp downstream ORF. We sought then to identify the (non‐)canonical mechanism(s) at play allowing the translation of a downstream ORF in eukaryotic cells. To do so, we have integrated results from mRNA levels and protein quantification for the upstream and the downstream ORFs.

We first evaluated the correlation between SHBLE and GFP protein levels, aiming at understanding whether ribosomal engagement in translating the upstream shble ORF was monotonically accompanied by an engagement in translating the downstream egfp ORF. Globally, variation in SHBLE levels was only a moderate predictor of variation in GFP levels (Figure S6, R² = 0.24; p = 0.0017). Yet, when we stratified our data according to the match between the CUPrefs of the shble synonymously recoded ORF and those of the human average, a striking pattern appeared (Figure 5a): for shble versions overmatching the human average CUPrefs (i.e., shble#1, #2, #9, #12, and #13), SHBLE levels were on average 1628 times higher than GFP levels (95%CI: 1403–1854), while for shble versions with CUPrefs close to the human average (i.e., shble#5, #7, #10 and #11) SHBLE levels were on average 998 times higher than GFP levels (95%CI: 831–1165), and for shble versions with CUPrefs undermatching the human average (i.e., shble#3, #4, #6 and #8) SHBLE levels were on average 422 times higher than GFP levels (95%CI: 394–503). The comparison of the three categories revealed a significant increase in the SHBLE‐over‐GFP protein ratio following the increase in shble match to the average human CUPrefs (Figure 5a, pairwise Wilcoxon rank sum exact test with B–H correction; α = 0.05). Variation in the shble upstream ORF composition explained only 14% of variation in SHBLE‐over‐GFP levels (Figure 5b, R² = −0.14; p = 0.00032). Finally, variation in GFP translation efficiency was not explained by variation in SHBLE translation efficiency (Figure 5c, R² = 0, p = 0.64), suggesting a lack of influence of upstream ORF translation on downstream ORF translation.

FIGURE 5.

Relative SHBLE and GFP production from bicistronic shble_egfp mRNA. (a) Box‐and‐whiskers plot of the SHBLE/GFP protein ratio for the 13 shble versions, stratified by their match to the average CUPrefs of the human genome. Letters present the results of a pairwise Wilcoxon rank sum test with B–H adjusted p‐values; α = 0.05; median values for samples labeled with the same letter are not statistically different. (b) Pearson's linear regression (black line) and 95% confidence interval of the fit (gray) between the projection on the first axis of PCA in Figure 1 for each shble version, and the SHBLE‐over‐GFP protein levels. (c) Pearson's linear regression (black line) and 95% confidence interval of the fit (gray) between SHBLE translation efficiency and GFP translation efficiency. For panels (b) and (c), values from the same biological replicate are represented by triangle, rectangle or circle shapes.

Overall, our protein quantification results show that synonymous variation in the shble upstream ORF modulates its own translation, but not that of the egfp downstream ORF. Given our experimental design with invariant 5′UTR, invariant ribosome binding site, and invariant AUG translation context, we conclude that GFP is translated through infrequent ribosome leaky scanning events, which are not influenced by the nucleotide composition of the scanned sequence upstream of the egfp AUG.

2.5. Splicing events within the shble upstream ORF modulate both SHBLE and GFP protein levels but fall short of closing the expression gap between the two

Five out of our 13 shble_egfp engineered constructs produced bicistronic transcripts that undergo splicing events with donor and acceptor sites within the shble ORF. In these cases, the SHBLE protein can be produced only from the unspliced bicistronic mRNA, while the GFP protein can in principle be produced from both the unspliced and spliced mRNAs, albeit with potentially modified translation efficiency. Two of these splicing events (shble#4 and #6) had been described previously in Picard et al. (2023) and correspond to the excision of an intron spanning nt 70 to nt 380 of the shble ORF (Figure 6a). This splicing event produces a shble*I ORF still in frame with the original shble and maintaining the same stop codon, thus conserving the 19‐nt gap between the shble UAA stop codon and the egfp AUG start codon. The second splicing event concerns constructs shble#7, #10, and #13 and leads to the excision of an intron spanning nt 186 to nt 380 for shble#7 and #10, or to nt 377 for shble#13. These three splicing events using the splice donor at nt 186 introduce a frameshift so that the UGA stop codon of the spliced, frameshifted shble*II ORF overlaps with the AUG start codon of egfp, within the CGCAUGAGC sequence.
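The stop/start overlap within the CGCAUGAGC sequence can be checked with a short sketch. Only the nine‐nucleotide junction quoted above comes from the text; the function names are hypothetical, and indices are 0‐based positions within that junction rather than coordinates on the full mRNA.

```python
def find_all(seq, motif):
    """0-based start positions of every (possibly overlapping) occurrence of motif."""
    width = len(motif)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == motif]


def shared_positions(seq, start_codon="AUG", stop_codon="UGA"):
    """Positions covered by both the first start codon and the first stop codon."""
    starts = find_all(seq, start_codon)
    stops = find_all(seq, stop_codon)
    if not starts or not stops:
        return []
    start_span = set(range(starts[0], starts[0] + 3))
    stop_span = set(range(stops[0], stops[0] + 3))
    return sorted(start_span & stop_span)


junction = "CGCAUGAGC"  # sequence quoted in the text
aug_positions = find_all(junction, "AUG")
uga_positions = find_all(junction, "UGA")
overlap = shared_positions(junction)
# Offset of the stop codon relative to the egfp reading frame.
frame_offset = (uga_positions[0] - aug_positions[0]) % 3
```

In this junction the UGA begins one nucleotide after the AUG, so the two codons share their U and G and the stop lies in the +1 frame relative to egfp.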

FIGURE 6.

Effect of shble_egfp bicistronic mRNA splicing on SHBLE and GFP protein levels. (a) Schematic representation of unspliced and spliced bicistronic mRNAs generated for shble versions shble#4, #6, #7, #10, and #13 and their respective original (splice‐able) or mutated (splice‐ablated) sequences. Figure should be read as follows, using shble#4 as an example: “sh4sp” refers to the sequence that undergoes splicing, while “sh4” refers to the sequence that has been mutated to ablate splicing. Splice donor (SD) and splice acceptor (SA) site positions relative to the shble AUG are indicated by discontinuous lines. (b) Box‐and‐whiskers plot showing the fraction of unspliced mRNA generated by splice‐able (e.g., “sh4sp”) and splice‐ablated (e.g., “sh4”) constructs, determined by Bioanalyzer. (c) Dot plot showing relative levels of heterologous proteins measured by label free proteomic from three biological replicates. For each sample, iBAQ values of SHBLE and GFP were normalized by the total iBAQ value of the sample. The positive control “GFP_mono” condition translated GFP from a monocistronic egfp mRNA while the 10 shble conditions translated SHBLE and GFP from a spliced or non‐spliced bicistronic shble_egfp mRNA, as specified by the color code. GFP values do not follow a normal distribution after a Shapiro normality test (p = 6.18e‐13), hence the p‐values present the probability that the medians of the groups are not different, after a pairwise Wilcoxon signed rank exact test (α = 0.05). (d) Pearson's linear regression (black line) and 95% confidence interval of the fit (gray) between proteomics‐based SHBLE levels and splicing efficiency, measured as fraction of spliced mRNA over total mRNA. (e) Pearson's linear regression (black line) and 95% confidence interval of the fit (gray) between fluorescence‐based EGFP levels and splicing efficiency, measured as fraction of spliced mRNA over total mRNA. 
(f) Connected dot‐plot showing SHBLE‐over‐GFP protein levels for the 10 splice‐able and splice‐ablated shble versions from three biological replicates. Paired differences between splice‐able and splice‐ablated versions were assessed using the Wilcoxon signed rank sum test (p = 4.27e‐4).

To study the impact of splicing in the shble upstream ORF on the translation of the egfp downstream ORF, we introduced synonymous point mutations in each spliced shble version, aiming at ablating the splicing capacity. All original and mutated donor and acceptor splice site sequences are displayed in Figure 6a. Splicing efficiency was evaluated by means of Bioanalyzer quantification (Figure S7A, B). Splicing efficiency of the original events ranged between 99% for shble#6 and 20% for shble#13, and mutational ablation drastically reduced splicing in all cases (Figure 6b). Assessment of total mRNA levels by RT‐qPCR revealed no significant differences between constructs (Figure S8A) and no correlation between mRNA levels and splicing efficiency (Figure S8B).

Both SHBLE and GFP proteins were detected in all samples from the 10 constructs in biological triplicates, even those undergoing highly efficient splicing. SHBLE protein levels were on average 754 (95%CI: 581–927) times higher than GFP levels (Figure 6c, Wilcoxon rank sum test with continuity correction; p < 3.02 × 10⁻¹¹). GFP protein levels produced from the egfp monocistronic control were not significantly different from SHBLE protein levels (Figure 6c, Wilcoxon rank sum exact test; p = 0.2108) but, most importantly, were significantly higher than GFP protein levels produced from the second ORF of every bicistronic mRNA (Figure 6c, Wilcoxon rank sum exact test; p = 3.67 × 10⁻⁴). We detected SHBLE protein in all three label‐free proteomic replicates for the shble#6sp condition (Figure 6c; data available in PRIDE database entry PXD047576), albeit at the lowest levels across conditions. However, we detected unspliced mRNA coding SHBLE in only one replicate out of eight (Figure 6b), suggesting a better sensitivity of the proteomic measurements than of the transcriptomic ones following RT‐PCR amplification and Bioanalyzer analyses.

Mass spectrometry analyses did not detect any peptide that could correspond to the truncated SHBLE*I or SHBLE*II proteins, potentially translated from the corresponding spliced mRNAs. All our western‐blotting efforts to detect these proteins by targeting the AU1 tag also yielded negative results (data not shown).

Ablation of the splicing signals was accompanied by an increase in SHBLE protein levels (Figure 6d; R² = 0.42; p = 0.0001) and by a decrease in GFP levels (Figure 6e; R² = −0.42; p = 1 × 10⁻⁸; Figure S9A; R² = 0.29; p = 0.0021), overall resulting in a significant increase of the SHBLE‐over‐GFP protein ratio (Figure 6f, paired Wilcoxon signed rank sum test; p = 4.2 × 10⁻⁴; Figure S9B, R² = −0.39, p = 2.2 × 10⁻⁴). Specifically, splicing ablation increased SHBLE‐over‐GFP levels on average by 6.41 (95%CI: 4.14–8.68), 11.61 (95%CI: 8.14–15.08), and 6.87 (95%CI: 5.64–8.10) fold for shble#4, shble#6, and shble#10, respectively. In the case of shble#13, displaying low starting splicing levels, mutation of splicing signals did not result in a change of the SHBLE‐over‐GFP ratio (1.02 fold, 95%CI: 0.78–1.27). Finally, the response of shble#7 (2.40 fold, 95%CI: 1.25–3.55) was milder than those of shble#4, #6, and #10, even though mutation totally ablated the high basal splicing activity of shble#7sp (Figure 6b, median basal splicing efficiency 76%).

Compared to variation in protein levels, results for variation in translation efficiency are intriguing. Splice‐ablated versions displayed higher SHBLE levels but lower SHBLE translation efficiency for a given shble version (Figure S9C, paired Wilcoxon signed rank sum test; p = 6.1 × 10⁻⁵). We exclude saturation of the translation machinery by the heterologous transcripts as an explanation for this decrease in translation efficiency, as variation in shble_egfp mRNA levels explains 67% of the variation in SHBLE protein levels for the splice‐ablated versions (Figure S9D, R² = 0.67, p = 2.9 × 10⁻¹⁴). Regarding GFP, splice‐ablated versions displayed lower GFP levels and overall higher GFP translation efficiency (Figure S9F, paired Wilcoxon signed rank sum test; p = 4.27 × 10⁻⁴). However, the response was highly dependent on the precise shble synonymous version, as shble#7 and shble#13 displayed no difference in GFP translation efficiency between splice‐able and splice‐ablated versions (Figure S9E).

Overall, our results show that splicing in the upstream shble ORF results in a qualitative increase in translation of the downstream egfp ORF. Nevertheless, the quantitative extent of this increase depends on the specific synonymous recoding of each shble version.

3. DISCUSSION

In this work, we present a straightforward experimental model conceived to study gene expression regulation from virus‐like bicistronic mRNAs in human cells. The model consists of synonymous versions of a short shble upstream ORF followed by an egfp downstream reporter ORF, and allows for both mRNA and protein quantification. We demonstrate that translation efficiency during elongation of the upstream ORF is largely determined by its own nucleotide composition. Furthermore, we show that the human cellular translation machinery can translate downstream ORFs in bicistronic mRNAs. Although protein levels from the downstream ORF are a thousand times lower than those from the upstream ORF, our results challenge conventional textbook interpretations of eukaryotic translation as strictly monocistronic. Finally, synonymous modifications within the upstream ORF may uncover cryptic splicing signals, resulting in increased translation of the downstream ORF. Overall, our results are compatible with a leaky scanning mechanism at play for translation of the downstream ORF in our system.

3.1. Synonymous variations in the shble upstream ORF do not affect levels of bicistronic heterologous transcripts

CUPrefs have an impact on mRNA production and stability (Hanson & Coller, 2018). We have here experimentally addressed the impact of codon recoding on heterologous mRNA levels. In our system, neither synonymous sequence recoding nor splicing led to differences in total heterologous mRNA levels. At first glance, our results appear to contradict studies showing CUPrefs as an important determinant of steady‐state mRNA levels in other eukaryotic experimental systems, such as Neurospora (Zhao et al., 2021; Zhou et al., 2016), Saccharomyces (Chen et al., 2017; Presnyak et al., 2015; Victor et al., 2019), or human cells (Newman et al., 2016). However, this discrepancy is reconciled by considering that our experimental design introduced changes exclusively in the upstream ORF coding region, so that all our constructs share an identical 5′UTR, identical first 24 coding nucleotides in the shble ORF, and an identical 3′UTR. The effect of mRNA composition on mRNA levels has been related to the non‐coding regions of the transcript, depending on the promoter (Yang et al., 2021) and on specific 3′UTR cis‐regulatory sequences and poly‐A signals (Cheng et al., 2017), but mRNA levels can also be modulated by variation in nucleotide composition and CUPrefs in the transcript 5′ end (Chen et al., 2017). We interpret the overall invariant steady‐state mRNA levels in our system as consistent with a low impact of variation in nucleotide composition and mRNA folding energy in the coding portion of the bicistronic mRNA.

3.2. Compositional features of the shble upstream ORF modulate its own translation efficiency during elongation

By design, the molecular contexts for shble translation initiation and translation termination are strictly identical between versions. To explore the impact of synonymous recoding on the translatability of bicistronic mRNAs, we have used steady‐state protein‐over‐mRNA levels as a proxy for translation efficiency (Hernandez‐Alias et al., 2023).
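The protein‐over‐mRNA proxy described above can be illustrated with a minimal sketch. The function name, the construct labels, and all numerical values are hypothetical and serve only to show the arithmetic; they are not data from this study.

```python
def translation_efficiency(protein_level, mrna_level):
    """Steady-state protein-over-mRNA ratio, used as a proxy for translation efficiency."""
    if mrna_level <= 0:
        raise ValueError("mRNA level must be positive")
    return protein_level / mrna_level


# Hypothetical normalized levels for two synonymous versions (illustrative only).
constructs = {
    "version_A": {"protein": 8.0, "mrna": 2.0},
    "version_B": {"protein": 3.0, "mrna": 2.0},
}

# Translation efficiency proxy per construct.
te = {name: translation_efficiency(v["protein"], v["mrna"])
      for name, v in constructs.items()}

# Fold difference in the proxy between the two versions at equal mRNA levels.
fold = te["version_A"] / te["version_B"]
print(te, round(fold, 2))
```

Because both versions share the same mRNA level in this toy example, the fold difference in the proxy reflects only the difference in protein levels.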

Our results show that 44% of the variation in shble translation efficiency is explained by variation in shble nucleotide composition (Figure 3b). Maximum differences in SHBLE translation efficiency across constructs were around 3.2 times, between human over‐matched shble#1 and human under‐matched shble#6. Differences of this magnitude appear modest compared to literature reporting differences of up to 100 times in protein levels between synonymous variants of the same ORF in different eukaryotic systems, for example, yeast (Kaishima et al., 2016), mammalian cells (Nagata et al., 1999) or plant cells (Kwon et al., 2016). However, this moderate effect is consistent with the invariable translation initiation context in our study. Indeed, even after exploring over 240,000 synonymous variants in a large endeavor to understand the impact of nucleotide composition on prokaryotic translation, the beautiful and exhaustive work by Cambray and co‐workers could only explain 30% of the total variance in protein production. These authors identified initiation of protein synthesis as the most determinant step for total protein production. Variation in the folding energy of the mRNA structure(s) spanning 30 nucleotides upstream and downstream of the AUG start codon, which corresponds to the average ribosomal mRNA occupancy (Ingolia et al., 2011), accounted alone for around 25% of the total variation in protein production (Cambray et al., 2018). In our experimental design, the mRNA sequence in the vicinity of the shble AUG start codon is identical across constructs, and local changes in mRNA structure do not come into play. We interpret that the impact we observe from variation in CUPrefs on variation in SHBLE production efficiency is linked directly to translation elongation. Our results are consistent with experimental data showing that synonymous codon recoding can affect eukaryotic translation elongation in Neurospora, Drosophila, and yeast (Pop et al., 2014; Yu et al., 2015; Zhao et al., 2017). The reported modulation of translation efficiency at constant translation initiation further supports the claim that translation elongation can influence translation initiation (Chu et al., 2014; Lyu et al., 2021).

The impact of the so‐called “gene optimization” on protein expression is such that a growing number of algorithms have been published (Chin et al., 2014; Fu et al., 2020; Sandhu et al., 2008) describing different engineering approaches for tuning composition variables to maximize mRNA, protein, and protein‐to‐mRNA production for heterologous expression in biotechnology (Broadbent et al., 2016; Gao et al., 2003; Graf & Deml, 2003; Nogales et al., 2014); for a review see (Mauro & Chappell, 2014). However, the literature shows that the ability to manipulate and predict the behavior of the translation machinery does not necessarily imply understanding it. In our experimental system, after having established that variation in the overall biochemical features of the shble synonymous variants explains 44% of the variation in its own translation efficiency, we aimed at disentangling the individual impact of compositional variables.

Dinucleotide frequencies and their impact on different steps of gene expression are particularly interesting, as underrepresentation of CpG and TpA dinucleotides across a broad panel of genomes is a well‐established observation (Beutler et al., 1989; Simmonds et al., 2013; Swartz et al., 1962). Regarding CpG dinucleotides at the DNA level, intragenic CpG methylation is a potent repressor of transcription (Bauer et al., 2010; Kosovac et al., 2011; Radrizzani et al., 2024), while at the RNA level, CpG‐rich transcripts can be targeted for degradation through a mechanism proposed to recognize non‐self mRNAs by endonucleases (Duan & Antezana, 2003; Takata et al., 2017). Regarding TpA dinucleotides, their underrepresentation in coding sequences has been associated to the risk of deleterious nonsense mutations (Beutler et al., 1989), while at the mRNA level, TpA‐rich transcripts can be recognized and degraded by endonucleases (Odon et al., 2019).

In our bicistronic system, variation in GC3 composition in the shble upstream ORF alone explained 46% of the variation in SHBLE production efficiency, in good agreement with our previous study using a shbleP2Aegfp monocistronic expression system (Picard et al., 2023). The large explanatory power of shble GC3 composition on translation efficiency aligns with previous findings on intronless reporter genes (Mordstein et al., 2020). The individual explanatory power from CUPrefs (40%), TpA (40%), or CpG (22%) frequency, and mRNA folding energy (26%) was completely accounted for by the explanatory power provided by variation in GC3 (Table S1). When combined, variation in TpA, CUPrefs, and CpG provided similar explanatory power (49%) to that of variation in GC3 to explain variation in SHBLE translation efficiency, and mRNA folding energy did not provide any additional explanatory power. CpG frequency is highly correlated with total GC content and their respective effects on translation are consequently hard to disentangle (Mordstein et al., 2020), and indeed covariation and anti‐covariation are also observed among the different compositional variables monitored in our constructs (Figure 1b). Our results support the view that mRNA folding downstream of the AUG start codon context influences translation efficiency only marginally, in agreement with the well‐established literature showing that effects of mRNA folding energy on protein levels are exerted mostly on translation initiation (Boël et al., 2016; Cambray et al., 2018; Kudla et al., 2009).

3.3. A bicistronic mRNA organization is a major factor conditioning protein synthesis in human cells

Traditional textbook descriptions of translation emphasize the monocistronic nature of eukaryotic mRNAs, as opposed to the typically polycistronic mRNAs in prokaryotes. It is thus often interpreted that the eukaryotic machinery cannot handle downstream ORFs located in bicistronic mRNAs. However, our results add to the growing body of evidence demonstrating that the eukaryotic machinery can translate downstream ORFs on bicistronic mRNAs. We show a thousand‐fold higher translation efficiency from the upstream ORF relative to the downstream ORF, consistent with earlier findings and contributing new quantitative information.

Experimental work already demonstrated in the 1980s the translation of a downstream ORF from a bicistronic mRNA in human cells (Mertz et al., 1983; Subramani et al., 1981). Further pioneering works on the regulation of translation initiation established that adding an out‐of‐frame AUG codon upstream of an ORF impaired its translation, but that adding a stop codon between the two ORFs and in frame with the first ORF, thus creating what is now called a short upstream ORF, rescues translation from the downstream ORF (Dixon & Hohn, 1984; Hughes et al., 1984; Kozak, 1984; Liu et al., 1984). Subsequent work using biochemical approaches to quantify downstream ORF translation from bicistronic mRNAs determined drops of between 5 and 10 times and between 100 and 300 times, compared to translation of the same ORF from monocistronic mRNA (Kaufman et al., 1987; Peabody et al., 1986; Peabody & Berg, 1986). The currently consolidated view is that in eukaryotes, the presence of an upstream ORF can modulate translation efficiency of downstream ORFs (Hellens et al., 2016; Hinnebusch et al., 2016) and it is commonly understood that both the presence of introns in the 5′UTR and the presence of an upstream ORF result in a decreased translation of the downstream ORF (Lim et al., 2018). Nevertheless, in a systematic exploration of 4096 variants of a 21‐nucleotide‐long upstream ORF with synonymous differences in three codons, Lin and coworkers reported effects ranging from a two‐fold increase to a two‐fold decrease in translation of the downstream ORF (Lin et al., 2019). Using normalized iBAQ data obtained from label‐free proteomics, we estimate SHBLE levels to be on average over a thousand times higher than the accompanying GFP levels (Figure 4b). Our results show that increased SHBLE levels are accompanied by increased GFP levels in all cases (Figure S6). This covariation was not homogeneous among synonymous shble versions, as SHBLE‐over‐GFP ratios varied with the match between CUPrefs of the shble ORF and the human average ones (Figure 6a, b).

3.4. Alternative mechanistic interpretations for downstream ORF translation in human cells

Any mechanistic interpretation of EGFP production in our system will need to explain that: (i) shble translation efficiency varies as a function of its own composition (Figure 3b); (ii) this is not the case for egfp (Figure 4d); and (iii) there is no covariation between the translation efficiency values for the two ORFs (Figure 5d). By design, ribosomal recruitment and engagement do not differ among constructs, as the 5′UTR and the first 24 coding nucleotides of the upstream ORF are identical. Differences in translation efficiency of the upstream ORF will thus arise essentially during the elongation phase. Also by design, the context between the two ORFs is shared across constructs, as the last 24 coding nucleotides of the upstream ORF, its stop codon, the 19‐nucleotide spacer, and the full downstream ORF are identical for all versions. Differences among constructs in GFP synthesis will thus be linked to a differential probability for leaky scanning and/or to a differential probability for translation reinitiation upon termination, arising as a function of the compositional features and splicing of the upstream shble ORF. We have applied this reasoning for discussing our results.

Two main mechanistic scenarios can be invoked to account for GFP synthesis in our experimental system: (i) leaky scanning, that is, the recruited ribosome does not engage in translation of the upstream ORF and translates instead the downstream ORF (Sorokin et al., 2021); and (ii) translation reinitiation, that is, the ribosome remains engaged in translation and the downstream ORF is translated immediately upon termination of upstream ORF translation (Hellen, 2018). Under the leaky scanning hypothesis, ribosomal engagement in the upstream ORF is enhanced for human‐matched coding sequences, so that for a similar mRNA ability at recruiting ribosomes and for similar mRNA levels, higher levels of SHBLE synthesis would be accompanied by lower levels of GFP synthesis, resulting in the observed increased SHBLE‐over‐GFP ratio for human‐matched shble versions. This interpretation would nevertheless imply that ribosomal engagement on the shble ATG is modulated by a downstream mRNA region, not covered by the ribosome. Conversely, under the reinitiation upon termination hypothesis, the nucleotide composition of the shble coding region has a direct impact on downstream ORF translation, so that termination efficiency is higher and/or the probability of ribosomal re‐engagement is lower in human‐matched coding sequences. In its turn, this interpretation would imply that the termination process is modulated by an upstream mRNA region, not covered by the ribosome. We have resorted to the comparisons of the paired splice‐able and splice‐unable versions in our system to identify the differential experimental support for these two alternative explanations.

3.5. Splicing in the shble upstream ORF increases GFP protein levels

Our results suggest that cryptic splicing is conspicuous after synonymous recoding, as we detected it in five out of the 13 synonymous variants. Our results are consistent with the well‐documented plasticity of alternative splicing in the human genome (Rogalska et al., 2024), and with the diversity of unreported splicing across species (Bénitière et al., 2024). The detection of functional splice sites in five out of 13 synonymous variants highlights the influence of synonymous recoding on splicing probability and on downstream protein synthesis.

Our previous study had identified splicing events in shble#4 and shble#6 (Picard et al., 2023), and from the nine newly tested synonymous shble versions, three harbored functional splice sites, namely shble#7, shble#10, and shble#13. In the U‐2 OS human cell model, splicing was highly efficient for shble#4, #6, #7, and #10 while it was inefficient for shble#13 (Figure 6b). Splicing of shble#4 and #6 mRNAs was more efficient in U‐2 OS than in HEK293 cells (84% vs 21% for shble#4 and 99% vs 79% for shble#6 in U‐2 OS compared to HEK293 (Picard et al., 2023)). All splice events occurred within the shble ORF, thus ablating the SHBLE coding potential in the spliced mRNAs. The donor splice sites resembled the second most common among the typical U1 and U2 splice sites (45.1% of the splice sites in human transcripts) (Sibley et al., 2016). In contrast, while the intron 3′ sequences also resembled the typical U1 and U2 sites, the 5′ of the downstream exons did not match any of the common typical or atypical U1 and U2 sites nor the typical U11 and U12 sites (Sibley et al., 2016). We introduced synonymous mutations in all five donor and acceptor sites, aiming at ablating splicing (Figure 6a). We tried to diverge from the typical splice signal, that is, removing the GU and AG dinucleotides at the 5′ and 3′ of the intron, respectively (Eskesen et al., 2004), but were constrained by the need to conserve the synonymous coding shble sequence and reading frame. Thus, mutated shble#7, shble#10, and shble#13 still retained the consensus GU dinucleotide at the intron 5′ end. Overall, splice site mutation resulted in total ablation of splicing in shble#7 and shble#13, and in substantial reduction of splicing in all other versions, allowing us to compare splice‐able and splice‐ablated pairs of constructs (Figure 6b). Remaining branchpoint sequences and polypyrimidine tracts in the intron 3′ end most likely account for the residual splicing activity (Fallot et al., 2009; Gao et al., 2008; Sibley et al., 2016; Wang & Burge, 2008).

In our experimental setup, we did not observe differences in total heterologous mRNA levels, either among constructs in general or between pairs of mRNA constructs differing solely in their ability to undergo splicing (Figure S8A). Our results do not support the well‐established but not fully understood mechanism called intron‐mediated enhancement, referring to increased mRNA levels in intron‐containing mRNAs (Brinster et al., 1988; Callis et al., 1987). In contrast, we observed a systematic increase in GFP translation efficiency in constructs that undergo splicing compared to those that do not, and this increase was significant for shble#4, #6, and #10 (Figure S9E, F). Given that GFP can be synthesized from both the spliced and the unspliced constructs, we interpret that this increase in GFP translation efficiency is related to an increased translatability of the egfp ORF from the shble*_egfp spliced mRNA. Two alternative explanations may account for the observed differences in translation efficiency following splicing at constant mRNA levels: (i) splicing may affect mRNA subcellular location and thus determine translatability (Le Hir et al., 2003); (ii) splicing may modify translation regulation (Fallot et al., 2009).

Regarding the impact of splicing on mRNA availability for translation, the presence of splicing signals in the upstream shble ORF could positively regulate translation of the downstream egfp ORF by depositing regulatory protein complexes on the splice junction, which in turn facilitate nuclear export and increase availability for the translation machinery (Le Hir et al., 2003). Indeed, AU‐rich mRNAs can be retained in the nucleus or in specific cytoplasmic structures such as P‐bodies and may not be available for translation, thus resulting in lower protein levels and decreased translation efficiency when normalizing by total mRNA levels (Courel et al., 2019; Mordstein et al., 2020). Splicing can also enhance translation, as intronless transcripts result in lower protein levels than their counterparts containing an intron in the 5′UTR (Matsumoto et al., 1998; Mordstein et al., 2020). This effect is again mainly mediated by modulation of mRNA subcellular location (Le Hir et al., 2003), as spliced mRNAs are more efficiently exported from the nucleus (Luo & Reed, 1999).

Regarding splicing‐mediated regulation of translation, the increase in GFP levels was not accompanied by the presence of the SHBLE*I (36 amino acids) and SHBLE*II (81 amino acids) truncated proteins, corresponding to the modified shble ORFs upon splicing that could potentially be synthesized from the spliced versions. They remained undetected after numerous western‐blot attempts, under the same conditions (tag and antibodies) that allowed detection of the full‐length SHBLE protein. Furthermore, in our proteomic data, we found no peptide that could have originated from any of these proteins in our label‐free proteomic assays. We should raise a word of caution, because the N‐terminus of these frameshifted proteins is identical to SHBLE, and the number of peptides that could allow for differential identification is very restricted. It could further be argued that these protein forms could be difficult to detect by mass spectrometry or western blot, as they are small, potentially misfolded, thence inherently unstable and rapidly degraded (Wacholder & Carvunis, 2023).

The splice events detected in our system are not homogeneous with regard to the C‐terminus of the SHBLE*I and SHBLE*II proteins. While the splicing for shble#4 and #6 conserves the shble stop codon, the 19 bp intergenic spacer and the egfp start codon, the splicing event in shble#7, #10 and #13 causes a frameshift that places the UGA shble*II stop codon overlapping the AUG egfp start codon. A similar AUGA configuration has evolved in Feline calicivirus to facilitate translation reinitiation, but it is strongly dependent on a specific mRNA secondary structure (Powell et al., 2008). It could thus be claimed that a cryptic signal for reinitiation upon termination, uncovered by the unpredicted splice events, allows for GFP synthesis from the shble#7, #10, and #13 spliced mRNAs. However, the efficiency of this mechanism strongly depends on a very specific mRNA secondary structure (Pöyry et al., 2007), and it would still not explain GFP synthesis from shble#4 and #6, which would need a second, not yet described termination‐reinitiation signal. Further, invoking reinitiation upon termination as a mechanism for downstream ORF translation would need to explain why the increased levels of EGFP are not accompanied by any increased detection of SHBLE*I or SHBLE*II.

Overall, the explanation of leaky scanning as the mechanism to account for the increased synthesis of the egfp downstream ORF in our spliced shble versions is compatible with all the experimental evidence and is more parsimonious than reinitiation upon termination.

3.6. Experimental limitations of our study

Our label‐free proteomic approach allows comparing levels between two different proteins, but different proteins display different responses across the full procedure of digestion, fragmentation, peptide separation, detection, and identification, and thus different detectability (Arike et al., 2012; Zhao et al., 2020). We have estimated that, on average, SHBLE levels are around 1000 times higher than EGFP levels. Notwithstanding, these results do not mean that in our cells there is one GFP molecule for every thousand SHBLE molecules. In previous reports using a shbleP2Aegfp monocistronic expression system, in which SHBLE and GFP were translated in equimolar amounts thanks to the ribosomal skipping step introduced by the P2A signal (Liu et al., 2017), label‐free proteomics still revealed a 1:2 detection ratio for SHBLE:GFP proteins (Picard et al., 2023). This suggests that label‐free proteomic data could have underestimated the actual SHBLE‐over‐GFP levels by a factor of two.
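The reasoning above amounts to a simple calibration, sketched below. The sketch assumes that the detection bias is multiplicative and identical in the monocistronic calibration system and in the bicistronic constructs, which the text does not establish; the function and variable names are hypothetical.

```python
def corrected_ratio(observed_ratio, calibration_ratio):
    """Correct an observed SHBLE-over-GFP ratio for differential detectability.

    calibration_ratio is the SHBLE:GFP detection ratio measured when both
    proteins are produced in equimolar amounts (assumed constant bias).
    """
    if calibration_ratio <= 0:
        raise ValueError("calibration ratio must be positive")
    return observed_ratio / calibration_ratio


observed = 1000.0      # approximate SHBLE-over-GFP ratio from label-free proteomics
calibration = 1 / 2    # 1:2 SHBLE:GFP detection ratio under equimolar synthesis

corrected = corrected_ratio(observed, calibration)
print(corrected)
```

Under these assumptions, an observed ratio of about 1000 would correspond to an actual ratio of about 2000, in line with the factor‐of‐two underestimation mentioned above.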

We have assessed mRNA and protein levels at a single time point, 24 h post‐transfection, so that we report steady‐state protein levels over mRNA levels as a proxy for translation efficiency. A possible caveat of our approach would be saturation of the protein synthesis machinery. However, the monotonic covariation observed between shble_egfp mRNA levels and both SHBLE and EGFP protein levels separately (Figures 3a and 4c, respectively) suggests that our experimental system does not saturate the translation machinery or the degradation machinery. A second limitation of our single time point approach is the possible differential degradation kinetics for SHBLE and for EGFP. In our interpretation of the results, we assume that degradation rates for the shble_egfp mRNA as well as for the SHBLE and the EGFP proteins are similar for all synonymous shble versions, so that relative variation in the protein‐over‐messenger steady‐state levels remains informative as a proxy for translation efficiency. Thus, our hypothesis does not necessarily require that the degradation rates of SHBLE and of EGFP are similar to one another, but simply that SHBLE degradation rates are similar across constructs, independently of the synonymous variation used to encode it.

Our results show that EGFP levels are around a thousand times lower than SHBLE levels. It could be argued that EGFP translation could occur randomly, because of stochastic ribosomal scanning. Indeed, a substantial fraction of the variation in both SHBLE and in EGFP levels is explained by variation in shble_egfp mRNA levels (Figure 4c, d). Furthermore, the 5′ region preceding the egfp start codon is strictly identical for all our versions, and consistently we did not observe variation in EGFP translation efficiency across constructs (Figure 4e, f). These observations are also consistent with the pervasive, basal translation phenomenon discovered by ribosome profiling (Ingolia et al., 2014; Ruiz‐Orera & Albà, 2019). Notwithstanding, EGFP levels strongly increased with splicing efficiency in the shble upstream ORF (Figure 6e), while overall mRNA levels remained constant irrespective of splicing (Figure S8A). We thus conclude that our data are more consistent with ribosome recruitment at the 5′UTR of the shble_egfp mRNA, followed by a probabilistic leaky scanning mechanism resulting in downstream EGFP translation.

Finally, in our study, we have not considered the impact of codon pairs on translation efficiency. Intuitively, the functional translation unit in the ribosome is rather the di‐codon, as it can be conceived that the chemistry of the codon‐tRNA complex in the P site may alter the codon‐anticodon recognition chemistry for tRNAs entering the A site. The impact of dicodon frequency on translation is well documented in global transcriptome‐proteome studies (Alexaki et al., 2019; Doyle et al., 2016), as well as in synonymous recoding, mostly from virus models (Conrad et al., 2018; Eldemery et al., 2023). However, this variable is often overlooked in experimental studies, even in those with much larger scale than ours (Cambray et al., 2018), essentially for two reasons: first, because the large di‐codon combinatory space requires strong effects for pinpointing statistically significant differences; and second because it is intrinsically complicated to disentangle the effect of di‐codons from that of di‐nucleotides over the codon‐codon boundary (Alonso & Diambra, 2023; Daron & Bravo, 2021; Kunec & Osterrieder, 2016).

4. CONCLUSION

We have conceived and tested a simple experimental model system to study translation from a bicistronic mRNA in human cells. Our results support the view that mRNA nucleotide composition and synonymous codon usage can have a direct impact on translation efficiency, but that they further modulate protein synthesis by the introduction/removal of splicing signals and mid‐ and long‐range intramolecular interactions (Callens et al., 2021). We show that modification of CUPrefs exclusively during elongation modifies overall translation efficiency and can result in the appearance of cryptic, unpredicted splice signals. We demonstrate that the ORF located downstream in a bicistronic mRNA can be translated by human cells, albeit at much lower levels than the upstream ORF. We finally propose that our experimental results about the impact of variation in CUPrefs and splicing in the upstream ORF on the translation of the downstream ORF are largely compatible with a mechanism of leaky scanning. We anticipate that our results will help understand the global role of upstream ORFs in the eukaryotic genomes. Finally, we hope that our work will help shed light on the mechanisms by which viruses manipulate the eukaryotic translation machinery, resulting in protein synthesis from complex, polycistronic viral transcripts.

5. EXPERIMENTAL PROCEDURES

Design of the shble synonymous versions and plasmid constructs. Thirteen synonymous versions of the shble gene were designed to explore the impact of open reading frame composition variables on gene expression (all sequences available on GenBank, see data availability paragraph). As previously described in (Picard et al., 2023), versions shble#1–#6 were designed applying the “one amino acid–one codon” approach, that is, all instances of a given amino acid were recoded with the same codon, depending on their frequency in the human genome and the GC‐rich or AT‐rich choice. For versions shble#7–#13, we first designed a thousand “guided random” synonymous shble sequences with CUPrefs based on the average human ones (Puigbo et al., 2007), and we chose seven based on their match to the human CUPrefs as evaluated by the codon adaptation index (Sharp & Li, 1987) and on the mRNA folding energy values of the shble sequence, calculated using the UNAfold online tool (http://unafold.org) (Markham & Zuker, 2008). Identical nucleotide sequences encoding an N‐terminal AU1 tag (amino acid sequence MDTYRI), a C‐terminal FLAG tag (amino acid sequence DYKDDDDK), and a TAA stop codon were added to all shble versions. All 13 versions were chemically synthesized (GenScript) and cloned in the pCDNA3.1(+)‐EGFP‐C (Invitrogen) expression vector at the XhoI restriction site. A schematic depiction of vector organization is displayed in Figure S1. For all sequences we calculated nucleotide composition variables: GC percentage in the third codon position (GC3), CpG dinucleotide frequency, TpA dinucleotide frequency, CUPrefs relative to the human average using the COUSIN online tool (http://cousin.ird.fr) (Bourret et al., 2019) and mRNA folding energy of the 5′UTRau1‐shble portion (Table S2).
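As an illustration of the simplest composition variables listed above, the following sketch computes GC3 and raw dinucleotide frequencies for an in‐frame coding sequence. It assumes an uppercase DNA alphabet and plain frequency definitions (count over number of positions), which may differ from the exact definitions used for Table S2; the function names and the toy sequence are hypothetical. CUPrefs and folding energy were obtained with COUSIN and UNAfold and are not reproduced here.

```python
def gc3(orf):
    """Fraction of complete codons carrying G or C at the third position."""
    usable = len(orf) - len(orf) % 3          # ignore any trailing partial codon
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        raise ValueError("sequence must contain at least one codon")
    return sum(codon[2] in "GC" for codon in codons) / len(codons)


def dinucleotide_frequency(seq, dinuc):
    """Occurrences of a dinucleotide over all overlapping dinucleotide positions."""
    positions = len(seq) - 1
    if positions < 1:
        raise ValueError("sequence must contain at least two nucleotides")
    return sum(seq[i:i + 2] == dinuc for i in range(positions)) / positions


# Toy in-frame sequence (not an shble version): ATG GCG TAC GTT TAA
toy = "ATGGCGTACGTTTAA"
toy_gc3 = gc3(toy)                              # third positions: G, G, C, T, A
toy_cpg = dinucleotide_frequency(toy, "CG")
toy_tpa = dinucleotide_frequency(toy, "TA")
print(toy_gc3, toy_cpg, toy_tpa)
```

Dinucleotides are counted across codon boundaries as well as within codons, since the text refers to dinucleotide frequencies of the whole sequence.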

Splicing discovery, experimental design, and data analyses. Our previous study on shble#1 to #6 showed that transcripts from shble#4 and shble#6 underwent splicing (Picard et al., 2023). In the preliminary results for the present study, we identified by means of RT‐PCR and Sanger sequencing three additional shble versions subject to splicing with varying efficiency, namely shble#7, #10, and #13. We thus introduced synonymous changes in both donor and acceptor splicing sites for all five constructs, aiming at ablating splicing. Modified sequences are described in Figures 6a and S2C. Vector information and complete plasmid sequences are available on GenBank (see data availability paragraph).

Cell culture and transient transfection. The U‐2 OS cell line (ATCC, HTB‐96) was cultured at 37°C with 5% CO2 in McCoy's 5A medium supplemented with L‐glutamine (Biowest), with 10% heat‐inactivated foetal calf serum (FCS, EuroBio) and with 1% penicillin–streptomycin (Fisher scientific). Transient transfection was performed using Turbofect according to the manufacturer's instructions (Thermo Scientific). One million cells were seeded in a T25 flask 24 h prior to transfection. Cells were transfected with a mix of 12 μL Turbofect (Thermo Fisher Scientific) and 3 μg of the corresponding vector in 2% FBS McCoy's 5A for 6 h and then incubated in McCoy's 5A 10% FCS. Cells were harvested 24 h after transfection started. For all replicates we used as transfection controls the pcDNA3.1(+)‐C‐EGFP (designated as “empty”), which expresses a monocistronic egfp mRNA, and a mock transfection exposing the cells to the transfection agent alone, without heterologous DNA (designated as “mock”).

Cell collection for downstream experiments. Briefly, 24 h after transfection cells were washed in Ca2+/Mg2+‐depleted PBS (Biowest) and treated with trypsin 0.25%, 0.53 mM EDTA in Ca2+/Mg2+‐depleted PBS (Biowest) for 5 min. Cells were resuspended in cold McCoy's 5A medium and split into four aliquots: one for cytometry, one for RNA extraction, and two for protein extraction. Both RNA and protein cell pellets were stored at −70°C until further use.

Flow cytometry. Pelleted cells were fixed in cold PBS with 2% paraformaldehyde (Sigma) for 15 min, washed twice in cold PBS with 0.1 M glycine (Sigma), and analyzed within 24 h. Flow cytometry was performed at the MRI imaging facility (Montpellier, France) on a NovoCyte flow cytometer system (ACEA Biosciences) running the NovoExpress software (v1.6.2) with the following measurement settings: 100,000 ungated events acquired at fast flow rate (approx. 2000–3000 events/s) with FSC.H above 100.0 arbitrary units (A.U.). GFP fluorescence was acquired by excitation at 488 nm with PMT gain set to 479. Cell debris and doublets were filtered out, and downstream cell population analyses were performed, with an in‐house R script available on GitHub (see Data Availability Statement).
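The in‐house R script is not reproduced here. As a rough illustration of what debris and doublet filtering can involve, the following Python sketch keeps events above the FSC.H threshold stated in the acquisition settings and discards events whose FSC height‐to‐area ratio deviates from the population median. The dictionary keys (`FSC_H`, `FSC_A`, `GFP`), the `ratio_tolerance` parameter, and the height/area doublet criterion itself are assumptions made for illustration; they are not taken from the authors' script.

```python
from statistics import median


def filter_events(events, fsc_h_min=100.0, ratio_tolerance=0.2):
    """Return events passing a debris threshold and a simple singlet gate.

    events: list of dicts with hypothetical keys 'FSC_H', 'FSC_A', 'GFP'.
    fsc_h_min: debris cut-off on FSC height (100.0 A.U., as in the
        acquisition settings described in the text).
    ratio_tolerance: ASSUMED allowed relative deviation of FSC_H / FSC_A
        from the population median; not a value from the source.
    """
    # Step 1: discard small events (debris) and events with unusable area.
    large = [e for e in events if e["FSC_H"] > fsc_h_min and e["FSC_A"] > 0]
    if not large:
        return []

    # Step 2: doublets tend to have a larger pulse area for a given height,
    # so their height/area ratio falls away from the singlet population.
    ratios = [e["FSC_H"] / e["FSC_A"] for e in large]
    centre = median(ratios)
    return [
        e for e, r in zip(large, ratios)
        if abs(r - centre) <= ratio_tolerance * centre
    ]
```

A median‐centred ratio gate is only one of several common singlet‐gating strategies; the tolerance would need to be tuned on the actual FSC.H versus FSC.A scatter of each run.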

Qualitative and quantitative analyses of heterologous mRNA. Total RNA was extracted using the Monarch total RNA miniprep kit (NEB). A total of 250 ng RNA (measured on a Nanodrop1000 spectrophotometer, ThermoScientific) was reverse‐transcribed using the Maxima first strand cDNA synthesis kit (ThermoScientific) following the manufacturer's instructions, including a dsDNase treatment, in a final volume of 20 μL. For PCR, 0.5 μL of cDNA was used as template for amplification using the Master Mix PCR (2X) kit (ThermoScientific) according to the manufacturer's instructions. The relative abundance of the spliced and unspliced amplicons generated by RT‐PCR was determined using a DNA12000 chip on a Bioanalyzer (Agilent) and analyzed with the 2100expert software (v.B02.11.SI824). Splicing efficiency was calculated by comparing the areas under the curve of the electropherogram peaks corresponding to the different mRNA isoforms. Quantitative PCR was performed using the QuantiNova SYBR Green PCR kit (Qiagen) according to the manufacturer's instructions (Bustin et al., 2009). Primer sequences, detailed RT‐(q)PCR conditions, and amplicon extraction and purification are described in the supplementary material, Table S3.
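The text states that splicing efficiency was obtained by comparing peak areas, without giving the exact formula. One plausible reading, sketched below in Python, is the spliced fraction of the total signal for a two‐isoform case. The function name, the optional length correction (dividing each area by its amplicon length to approximate molar rather than mass ratios), and the example numbers are assumptions for illustration, not the authors' stated calculation.

```python
def splicing_efficiency(spliced_area, unspliced_area,
                        spliced_len=None, unspliced_len=None):
    """Fraction of signal attributable to the spliced isoform.

    spliced_area, unspliced_area: electropherogram peak areas (arbitrary units).
    spliced_len, unspliced_len: optional amplicon lengths in bp. When both are
        given, each area is divided by its length first (ASSUMED correction,
        since dye signal scales roughly with DNA mass).
    """
    if spliced_area < 0 or unspliced_area < 0:
        raise ValueError("peak areas must be non-negative")
    if spliced_len and unspliced_len:
        spliced_area = spliced_area / spliced_len
        unspliced_area = unspliced_area / unspliced_len
    total = spliced_area + unspliced_area
    if total == 0:
        raise ValueError("no signal in either peak")
    return spliced_area / total


# Hypothetical peak areas: 30 A.U. spliced, 70 A.U. unspliced.
print(splicing_efficiency(30.0, 70.0))
```

With more than two isoforms, the same idea extends to dividing each isoform's area by the sum over all isoform peaks.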

Label‐free proteomic analysis. Pelleted cells were lysed for 20 min at 4°C in RIPA buffer (50 mM Tris–HCl pH 7.4, 150 mM NaCl, 1% Triton X‐100, 0.5% Na deoxycholate, 1 mM EDTA) supplemented with 30 μg/mL of anti‐protease mixture (Roche Diagnostics). After centrifugation at 15,000 × g for 10 min at 4°C, protein concentrations were determined in the supernatant using the Bio‐Rad Protein Assay (Bio‐Rad) according to the manufacturer's instructions, with bovine serum albumin (Sigma) as standard. Label‐free proteomics was performed at the Montpellier Proteomics Platform (PPM, BioCampus Montpellier) on all conditions, with three biological replicates. A total of 20 μg of protein was in‐gel digested and the resulting peptides were analyzed using a Q Exactive HF mass spectrometer coupled with an Ultimate 3000 RSLC system (Thermo Fisher Scientific). MS/MS analyses were performed with the MaxQuant software (v1.5.5.1). All MS/MS spectra were searched by the Andromeda search engine against a decoy database consisting of a combination of Homo sapiens entries from the Reference Proteome (UP000005640, release 2019_02, https://www.uniprot.org/), a database of classical contaminants, and the sequences of interest (SHBLE, SHBLE*I, SHBLE*II, and GFP). After excluding the usual contaminants, we obtained a final set of 4199 proteins detected at least once in one of the samples. Intensity‐based absolute quantification (iBAQ), normalized by the total iBAQ amount of each sample, was used to compare heterologous protein levels between samples and replicates. Mass spectrometry proteomics data have been deposited with the ProteomeXchange Consortium via the PRIDE (Perez‐Riverol et al., 2022) partner repository with the dataset identifier PXD047576 and DOI 10.6019/PXD047576.
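The normalization described above, dividing each protein's iBAQ value by the summed iBAQ of its sample, can be sketched in a few lines of Python. The function name and the example intensities are hypothetical and chosen only to make the arithmetic easy to follow; the actual analysis was done with the authors' R scripts.

```python
def normalise_ibaq(sample):
    """Scale iBAQ values so that they sum to 1 within one sample.

    sample: dict mapping protein name -> raw iBAQ intensity for one sample.
    Returns a dict mapping protein name -> fraction of the sample's total iBAQ.
    """
    total = sum(sample.values())
    if total <= 0:
        raise ValueError("total iBAQ must be positive")
    return {protein: value / total for protein, value in sample.items()}


# Hypothetical intensities for one sample (not measured values).
raw = {"GFP": 2.0, "SHBLE": 6.0, "ACTB": 92.0}
normalised = normalise_ibaq(raw)
print(normalised["GFP"], normalised["SHBLE"])
```

Because every sample is rescaled to the same total, the resulting fractions can be compared across samples and replicates even when the absolute amount of signal differs between runs.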

Data treatment. Data analyses and statistical analyses were carried out using R (v4.3.1) and RStudio (v2023.06.1 Build 524). All packages used are listed in the scripts available on GitHub (see Data Availability Statement).

AUTHOR CONTRIBUTIONS

Philippe Paget‐Bailly: Data curation; formal analysis; investigation; methodology; visualization; writing – original draft; writing – review and editing. Alexandre Helpiquet: Investigation; methodology. Mathilde Decourcelle: Data curation; formal analysis; investigation; methodology. Roxane Bories: Investigation; methodology. Ignacio G. Bravo: Conceptualization; data curation; formal analysis; funding acquisition; supervision; writing – original draft; writing – review and editing.

FUNDING INFORMATION

This study is supported by the European Union's Horizon 2020 research and innovation program under the grant agreement CODOVIREVOL (ERC‐CoG‐647916) to IGB. PPB is the recipient of a two‐year post‐doctoral grant from Fondation pour la Recherche Médicale.

CONFLICT OF INTEREST STATEMENT

The authors declare that they have no conflicts of interest with the contents of this article.

Supporting information

Data S1: Supplementary Information.

PRO-34-e70036-s001.pdf (2.9MB, pdf)

ACKNOWLEDGMENTS

We acknowledge the Montpellier Proteomics Platform (PPM, BioCampus Montpellier) for mass spectrometry experiments and the MRI imaging facility, member of the France‐BioImaging national infrastructure supported by the French National Research Agency (ANR‐10‐INBS‐04, «Investments for the future») for flow cytometry experiments.

Paget‐Bailly P, Helpiquet A, Decourcelle M, Bories R, Bravo IG. Translation of the downstream ORF from bicistronic mRNAs by human cells: Impact of codon usage and splicing in the upstream ORF. Protein Science. 2025;34(2):e70036. 10.1002/pro.70036

Review Editor: John Kuriyan

Contributor Information

Philippe Paget‐Bailly, Email: philippe.paget-bailly@ird.fr.

Ignacio G. Bravo, Email: ignacio.bravo@cnrs.fr.

DATA AVAILABILITY STATEMENT

Vector and insert sequences are available on GenBank (pcDNA3.1Shble1_eGFP: OR659018, pcDNA3.1Shble2_eGFP: OR659019, pcDNA3.1Shble3_eGFP: OR659020, pcDNA3.1Shble4_eGFP: OR659021, pcDNA3.1Shble5_eGFP: OR659022, pcDNA3.1Shble6_eGFP: OR659023, pcDNA3.1Shble7_eGFP: OR659024, pcDNA3.1Shble8_eGFP: OR659025, pcDNA3.1Shble9_eGFP: OR659026, pcDNA3.1Shble10_eGFP: OR659027, pcDNA3.1Shble11_eGFP: OR659028, pcDNA3.1Shble12_eGFP: OR659029, pcDNA3.1Shble13_eGFP: OR659030, pcDNA3.1Shble4mut_eGFP: OR659031, pcDNA3.1Shble6mut_eGFP: OR659032, pcDNA3.1Shble7mut_eGFP: OR659033, pcDNA3.1Shble10mut_eGFP: OR659034, and pcDNA3.1Shble13mut_eGFP: OR659035). All R scripts used to analyze the data are available at Github.com/philippe‐paget/PERVASIVE‐TRANSLATION‐OF‐THE‐DOWNSTREAM‐ORF‐FROM‐BICISTRONIC‐MRNAS‐BY‐HUMAN‐CELLS. The input .csv data files are also available there, together with the scripts used for processing raw cytometry data. Raw data from cytometry, RT‐qPCR, Bioanalyzer, and Sanger sequencing can be shared upon reasonable request to the corresponding author. Label‐free proteomic data are available on the ProteomeXchange platform under the identifier PXD047576.

REFERENCES

  1. Alexaki A, Kames J, Holcomb DD, Athey J, Santana‐Quintero LV, Lam PVN, et al. Codon and codon‐pair usage tables (CoCoPUTs): facilitating genetic variation analyses and recombinant gene design. J Mol Biol. 2019;431:2434–2441.
  2. Alonso AM, Diambra L. Dicodon‐based measures for modeling gene expression. Bioinformatics. 2023;39:btad380.
  3. Arike L, Valgepea K, Peil L, Nahku R, Adamberg K, Vilu R. Comparison and applications of label‐free absolute proteome quantification methods on Escherichia coli. J Proteomics. 2012;75:5437–5448.
  4. Bauer AP, Leikam D, Krinner S, Notka F, Ludwig C, Längst G, et al. The impact of intragenic CpG content on gene expression. Nucleic Acids Res. 2010;38:3891–3908.
  5. Bénitière F, Necsulea A, Duret L. Random genetic drift sets an upper limit on mRNA splicing accuracy in metazoans. Elife. 2024;13:RP93629.
  6. Beutler E, Gelbart T, Han JH, Koziol JA, Beutler B. Evolution of the genome and the genetic code: selection at the dinucleotide level by methylation and polyribonucleotide cleavage. Proc Natl Acad Sci. 1989;86:192–196.
  7. Boël G, Letso R, Neely H, Price WN, Wong K‐H, Su M, et al. Codon influence on protein expression in E. coli correlates with mRNA levels. Nature. 2016;529:358–363.
  8. Bourret J, Alizon S, Bravo IG. COUSIN (COdon usage similarity INdex): a normalized measure of codon usage preferences. Genome Biol Evol. 2019;11:3523–3528.
  9. Brinster RL, Allen JM, Behringer RR, Gelinas RE, Palmiter RD. Introns increase transcriptional efficiency in transgenic mice. Proc Natl Acad Sci. 1988;85:836–840.
  10. Broadbent AJ, Santos CP, Anafu A, Wimmer E, Mueller S, Subbarao K. Evaluation of the attenuation, immunogenicity, and efficacy of a live virus vaccine generated by codon‐pair bias de‐optimization of the 2009 pandemic H1N1 influenza virus, in ferrets. Vaccine. 2016;34:563–570.
  11. Bustin SA, Benes V, Garson JA, Hellemans J, Huggett J, Kubista M, et al. The MIQE guidelines: minimum information for publication of quantitative real‐time PCR experiments. Clin Chem. 2009;55:611–622.
  12. Callens M, Pradier L, Finnegan M, Rose C, Bedhomme S. Read between the lines: diversity of nontranslational selection pressures on local codon usage. Genome Biol Evol. 2021;13:evab097.
  13. Callis J, Fromm M, Walbot V. Introns increase gene expression in cultured maize cells. Genes Dev. 1987;1:1183–1200.
  14. Calvo SE, Pagliarini DJ, Mootha VK. Upstream open reading frames cause widespread reduction of protein expression and are polymorphic among humans. Proc Natl Acad Sci. 2009;106:7507–7512.
  15. Cambray G, Guimaraes JC, Arkin AP. Evaluation of 244,000 synthetic sequences reveals design principles to optimize translation in Escherichia coli. Nat Biotechnol. 2018;36:1005–1015.
  16. Chen J, Brunner A‐D, Cogan JZ, Nuñez JK, Fields AP, Adamson B, et al. Pervasive functional translation of noncanonical human open reading frames. Science. 2020;367:1140–1146.
  17. Chen S, Li K, Cao W, Wang J, Zhao T, Huan Q, et al. Codon‐resolution analysis reveals a direct and context‐dependent impact of individual synonymous mutations on mRNA level. Mol Biol Evol. 2017;34:2944–2958.
  18. Cheng J, Maier KC, Avsec Ž, Rus P, Gagneur J. Cis‐regulatory elements explain most of the mRNA stability variation across genes in yeast. RNA. 2017;23:1648–1659.
  19. Chew G‐L, Pauli A, Schier AF. Conservation of uORF repressiveness and sequence features in mouse, human and zebrafish. Nat Commun. 2016;7:11663.
  20. Chin JX, Chung BK‐S, Lee D‐Y. Codon optimization OnLine (COOL): a web‐based multi‐objective optimization platform for synthetic gene design. Bioinformatics. 2014;30:2210–2212.
  21. Chu D, Kazana E, Bellanger N, Singh T, Tuite MF, Von Der Haar T. Translation elongation can control translation initiation on eukaryotic mRNAs. EMBO J. 2014;33:21–34.
  22. Cigan AM, Feng L, Donahue TF. tRNAi met functions in directing the scanning ribosome to the start site of translation. Science. 1988;242:93–97.
  23. Conrad SJ, Silva RF, Hearn CJ, Climans M, Dunn JR. Attenuation of Marek's disease virus by codon pair deoptimization of a core gene. Virology. 2018;516:219–226.
  24. Courel M, Clément Y, Bossevain C, Foretek D, Vidal Cruchez O, Yi Z, et al. GC content shapes mRNA storage and decay in human cells. Elife. 2019;8:e49708.
  25. Daron J, Bravo I. Variability in codon usage in coronaviruses is mainly driven by mutational bias and selective constraints on CpG dinucleotide. Viruses. 2021;13:1800.
  26. Desmet F‐O, Hamroun D, Lalande M, Collod‐Béroud G, Claustres M, Béroud C. Human splicing finder: an online bioinformatics tool to predict splicing signals. Nucleic Acids Res. 2009;37:e67.
  27. Dixon LK, Hohn T. Initiation of translation of the cauliflower mosaic virus genome from a polycistronic mRNA: evidence from deletion mutagenesis. EMBO J. 1984;3(12):2731–2736. 10.1002/j.1460-2075.1984.tb02203.x
  28. Doyle F, Leonardi A, Endres L, Tenenbaum SA, Dedon PC, Begley TJ. Gene‐ and genome‐based analysis of significant codon patterns in yeast, rat and mice genomes with the CUT codon UTilization tool. Methods. 2016;107:98–109.
  29. Duan J, Antezana MA. Mammalian mutation pressure, synonymous codon choice, and mRNA degradation. J Mol Evol. 2003;57:694–701.
  30. Eldemery F, Ou C, Kim T, Spatz S, Dunn J, Silva R, et al. Evaluation of Newcastle disease virus LaSota strain attenuated by codon pair deoptimization of the HN and F genes for in ovo vaccination. Vet Microbiol. 2023;277:109625.
  31. Eskesen ST, Eskesen FN, Ruvinsky A. Natural selection affects frequencies of AG and GT dinucleotides at the 5′ and 3′ ends of exons. Genetics. 2004;167:543–550.
  32. Fallot S, Ben Naya R, Hieblot C, Mondon P, Lacazette E, Bouayadi K, et al. Alternative‐splicing‐based bicistronic vectors for ratio‐controlled protein expression and application to recombinant antibody production. Nucleic Acids Res. 2009;37:e134.
  33. Fu H, Liang Y, Zhong X, Pan Z, Huang L, Zhang H, et al. Codon optimization with deep learning to enhance protein expression. Sci Rep. 2020;10:17617.
  34. Gao F, Li Y, Decker JM, Peyerl FW, Bibollet‐Ruche F, Rodenburg CM, et al. Codon usage optimization of HIV type 1 subtype C gag, pol, env, and nef genes: in vitro expression and immune responses in DNA‐vaccinated mice. AIDS Res Hum Retroviruses. 2003;19:817–823.
  35. Gao K, Masuda A, Matsuura T, Ohno K. Human branch point consensus sequence is yUnAy. Nucleic Acids Res. 2008;36:2257–2267.
  36. Graf M, Deml L, Wagner R. Codon‐optimized genes that enable increased heterologous expression in mammalian cells and elicit efficient immune responses in mice after vaccination of naked DNA. Molecular diagnosis of infectious diseases. Volume 0. New Jersey: Humana Press; 2003. p. 197–210. 10.1385/1-59259-679-7:197
  37. Hanson G, Coller J. Codon optimality, bias and usage in translation and mRNA decay. Nat Rev Mol Cell Biol. 2018;19:20–30.
  38. Hellen CUT. Translation termination and ribosome recycling in eukaryotes. Cold Spring Harb Perspect Biol. 2018;10:a032656.
  39. Hellens RP, Brown CM, Chisnall MAW, Waterhouse PM, Macknight RC. The emerging world of small ORFs. Trends Plant Sci. 2016;21:317–328.
  40. Hernandez‐Alias X, Benisty H, Radusky LG, Serrano L, Schaefer MH. Using protein‐per‐mRNA differences among human tissues in codon optimization. Genome Biol. 2023;24:34.
  41. Hinnebusch AG, Ivanov IP, Sonenberg N. Translational control by 5′‐untranslated regions of eukaryotic mRNAs. Science. 2016;352:1413–1416.
  42. Hughes S, Mellstrom K, Kosik E, Tamanoi F, Brugge J. Mutation of a termination codon affects src initiation. Mol Cell Biol. 1984;4(9):1738–1746. 10.1128/mcb.4.9.1738-1746.1984
  43. Ingolia NT, Brar GA, Stern‐Ginossar N, Harris MS, Talhouarne GJS, Jackson SE, et al. Ribosome profiling reveals pervasive translation outside of annotated protein‐coding genes. Cell Rep. 2014;8:1365–1379.
  44. Ingolia NT, Lareau LF, Weissman JS. Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell. 2011;147:789–802.
  45. Kaishima M, Ishii J, Matsuno T, Fukuda N, Kondo A. Expression of varied GFPs in Saccharomyces cerevisiae: codon optimization yields stronger than expected expression and fluorescence intensity. Sci Rep. 2016;6:35932.
  46. Kaufman RJ, Murtha P, Davies MV. Translational efficiency of polycistronic mRNAs and their utilization to express heterologous genes in mammalian cells. EMBO J. 1987;6:187–193.
  47. Kosovac D, Wild J, Ludwig C, Meissner S, Bauer AP, Wagner R. Minimal doses of a sequence‐optimized transgene mediate high‐level and long‐term EPO expression in vivo: challenging CpG‐free gene design. Gene Ther. 2011;18:189–198.
  48. Kozak M. Selection of initiation sites by eucaryotic ribosomes: effect of inserting AUG triplets upstream from the coding sequence for preproinsulin. Nucleic Acids Res. 1984;12(9):3873–3893. 10.1093/nar/12.9.3873
  49. Kozak M. Influences of mRNA secondary structure on initiation by eukaryotic ribosomes. Proc Natl Acad Sci. 1986;83:2850–2854.
  50. Kozak M. The scanning model for translation: an update. J Cell Biol. 1989;108:229–241.
  51. Kudla G, Murray AW, Tollervey D, Plotkin JB. Coding‐sequence determinants of gene expression in Escherichia coli. Science. 2009;324:255–258.
  52. Kunec D, Osterrieder N. Codon pair bias is a direct consequence of dinucleotide bias. Cell Rep. 2016;14:55–67.
  53. Kwon K‐C, Chan H‐T, León IR, Williams‐Carrier R, Barkan A, Daniell H. Codon optimization to enhance expression yields insights into chloroplast translation. Plant Physiol. 2016;172:62–77.
  54. Le Hir H, Nott A, Moore MJ. How introns influence and enhance eukaryotic gene expression. Trends Biochem Sci. 2003;28:215–220.
  55. Lim CS, Wardell SJT, Kleffmann T, Brown CM. The exon–intron gene structure upstream of the initiation codon predicts translation efficiency. Nucleic Acids Res. 2018;46:4575–4591.
  56. Lin Y, May GE, Kready H, Nazzaro L, Mao M, Spealman P, et al. Impacts of uORF codon identity and position on translation regulation. Nucleic Acids Res. 2019;47:9358–9367.
  57. Liu C‐C, Simonsen CC, Levinson AD. Initiation of translation at internal AUG codons in mammalian cells. Nature. 1984;309(5963):82–85. 10.1038/309082a0
  58. Liu Z, Chen O, Wall JBJ, Zheng M, Zhou Y, Wang L, et al. Systematic comparison of 2A peptides for cloning multi‐genes in a polycistronic vector. Sci Rep. 2017;7:2193.
  59. Luo MJ, Reed R. Splicing is required for rapid and efficient mRNA export in metazoans. Proc Natl Acad Sci USA. 1999;96:14937–14942.
  60. Lynch M, Marinov GK. The bioenergetic costs of a gene. Proc Natl Acad Sci. 2015;112:15690–15695.
  61. Lyu X, Yang Q, Zhao F, Liu Y. Codon usage and protein length‐dependent feedback from translation elongation regulates translation initiation and elongation speed. Nucleic Acids Res. 2021;49:9404–9423.
  62. Markham NR, Zuker M. UNAFold. In: Keith JM, editor. Bioinformatics: structure, function and applications. Methods in Molecular Biology™. Totowa, NJ: Humana Press; 2008. p. 3–31. 10.1007/978-1-60327-429-6_1
  63. Matsumoto K, Wassarman KM, Wolffe AP. Nuclear history of a pre‐mRNA determines the translational activity of cytoplasmic mRNA. EMBO J. 1998;17:2107–2121.
  64. Mauro VP, Chappell SA. A critical analysis of codon optimization in human therapeutics. Trends Mol Med. 2014;20:604–613.
  65. Mertz JE, Murphy A, Barkan A. Mutants deleted in the agnogene of simian virus 40 define a new complementation group. J Virol. 1983;45:36–46.
  66. Mordstein C, Savisaar R, Young RS, Bazile J, Talmane L, Luft J, et al. Codon usage and splicing jointly influence mRNA localization. Cell Syst. 2020;10:351–362.e8.
  67. Nagata T, Uchijima M, Yoshida A, Kawashima M, Koide Y. Codon optimization effect on translational efficiency of DNA vaccine in mammalian cells: analysis of plasmid DNA encoding a CTL epitope derived from microorganisms. Biochem Biophys Res Commun. 1999;261:445–451.
  68. Newman ZR, Young JM, Ingolia NT, Barton GM. Differences in codon bias and GC content contribute to the balanced expression of TLR7 and TLR9. Proc Natl Acad Sci. 2016;113:E1362–E1371. 10.1073/pnas.1518976113
  69. Nogales A, Baker SF, Ortiz‐Riaño E, Dewhurst S, Topham DJ, Martínez‐Sobrido L. Influenza A virus attenuation by codon deoptimization of the NS gene for vaccine development. J Virol. 2014;88:10525–10540.
  70. Odon V, Fros JJ, Goonawardane N, Dietrich I, Ibrahim A, Alshaikhahmed K, et al. The role of ZAP and OAS3/RNAseL pathways in the attenuation of an RNA virus with elevated frequencies of CpG and UpA dinucleotides. Nucleic Acids Res. 2019;47:8061–8083.
  71. Peabody DS, Berg P. Termination‐reinitiation occurs in the translation of mammalian cell mRNAs. Mol Cell Biol. 1986;6:2695–2703.
  72. Peabody DS, Subramani S, Berg P. Effect of upstream reading frames on translation efficiency in simian virus 40 recombinants. Mol Cell Biol. 1986;6:2704–2711.
  73. Perez‐Riverol Y, Bai J, Bandla C, García‐Seisdedos D, Hewapathirana S, Kamatchinathan S, et al. The PRIDE database resources in 2022: a hub for mass spectrometry‐based proteomics evidences. Nucleic Acids Res. 2022;50:D543–D552.
  74. Picard MAL, Leblay F, Cassan C, Willemsen A, Daron J, Bauffe F, et al. Transcriptomic, proteomic, and functional consequences of codon usage bias in human cells during heterologous gene expression. Protein Sci. 2023;32:e4576.
  75. Pop C, Rouskin S, Ingolia NT, Han L, Phizicky EM, Weissman JS, et al. Causal signals between codon bias, mRNA structure, and the efficiency of translation and elongation. Mol Syst Biol. 2014;10:770.
  76. Powell ML, Brown TDK, Brierley I. Translational termination‐re‐initiation in viral systems. Biochem Soc Trans. 2008;36:717–722.
  77. Pöyry TAA, Kaminski A, Connell EJ, Fraser CS, Jackson RJ. The mechanism of an exceptional case of reinitiation after translation of a long ORF reveals why such events do not generally occur in mammalian mRNA translation. Genes Dev. 2007;21:3149–3162.
  78. Presnyak V, Alhusaini N, Chen Y‐H, Martin S, Morris N, Kline N, et al. Codon optimality is a major determinant of mRNA stability. Cell. 2015;160:1111–1124.
  79. Puigbo P, Guzman E, Romeu A, Garcia‐Vallve S. OPTIMIZER: a web server for optimizing the codon usage of DNA sequences. Nucleic Acids Res. 2007;35:W126–W131.
  80. Radrizzani S, Kudla G, Izsvák Z, Hurst LD. Selection on synonymous sites: the unwanted transcript hypothesis. Nat Rev Genet. 2024;25:431–448.
  81. Rogalska ME, Mancini E, Bonnal S, Gohr A, Dunyak BM, Arecco N, et al. Transcriptome‐wide splicing network reveals specialized regulatory functions of the core spliceosome. Science. 2024;386:551–560.
  82. Ruiz‐Orera J, Albà MM. Translation of small open reading frames: roles in regulation and evolutionary innovation. Trends Genet. 2019;35:186–198.
  83. Sandhu KS, Pandey S, Maiti S, Pillai B. GASCO: genetic algorithm simulation for codon optimization. In Silico Biol. 2008;8:187–192.
  84. Sharp PM, Li W‐H. The codon adaptation index‐a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 1987;15:1281–1295.
  85. Sibley CR, Blazquez L, Ule J. Lessons from non‐canonical splicing. Nat Rev Genet. 2016;17:407–421.
  86. Simmonds P, Xia W, Baillie J, McKinnon K. Modelling mutational and selection pressures on dinucleotides in eukaryotic phyla—selection against CpG and UpA in cytoplasmically expressed RNA and in RNA viruses. BMC Genomics. 2013;14:610.
  87. Solovyev V. Statistical approaches in eukaryotic gene prediction. In: Balding DJ, Bishop M, Cannings C, editors. Handbook of statistical genetics. 1st ed. New Jersey: Wiley; 2003. 10.1002/0470022620.bbc06
  88. Sorokin II, Vassilenko KS, Terenin IM, Kalinina NO, Agol VI, Dmitriev SE. Non‐canonical translation initiation mechanisms employed by eukaryotic viral mRNAs. Biochem Mosc. 2021;86:1060–1094.
  89. Stacey SN, Jordan D, Williamson AJ, Brown M, Coote JH, Arrand JR. Leaky scanning is the predominant mechanism for translation of human papillomavirus type 16 E7 oncoprotein from E6/E7 bicistronic mRNA. J Virol. 2000;74:7284–7297.
  90. Subramani S, Mulligan R, Berg P. Expression of the mouse dihydrofolate reductase complementary deoxyribonucleic acid in simian virus 40 vectors. Mol Cell Biol. 1981;1:854–864.
  91. Swartz MN, Trautner TA, Kornberg A. Enzymatic synthesis of deoxyribonucleic acid. J Biol Chem. 1962;237:1961–1967.
  92. Takata MA, Gonçalves‐Carneiro D, Zang TM, Soll SJ, York A, Blanco‐Melo D, et al. CG dinucleotide suppression enables antiviral defence targeting non‐self RNA. Nature. 2017;550:124–127.
  93. Tang S, Tao M, McCoy JP, Zheng Z‐M. The E7 oncoprotein is translated from spliced E6*I transcripts in high‐risk human papillomavirus type 16‐ or type 18‐positive cervical cancer cell lines via translation reinitiation. J Virol. 2006;80:4249–4263.
  94. Victor MP, Acharya D, Begum T, Ghosh TC. The optimization of mRNA expression level by its intrinsic properties—insights from codon usage pattern and structural stability of mRNA. Genomics. 2019;111:1292–1297.
  95. Wacholder A, Carvunis A‐R. Biological factors and statistical limitations prevent detection of most noncanonical proteins by mass spectrometry. PLoS Biol. 2023;21:e3002409.
  96. Walsh D, Mohr I. Viral subversion of the host protein synthesis machinery. Nat Rev Microbiol. 2011;9:860–875.
  97. Wang Z, Burge CB. Splicing regulation: from a parts list of regulatory elements to an integrated splicing code. RNA. 2008;14:802–813.
  98. Yang Q, Lyu X, Zhao F, Liu Y. Effects of codon usage on gene expression are promoter context dependent. Nucleic Acids Res. 2021;49:818–831.
  99. Yu C‐H, Dang Y, Zhou Z, Wu C, Zhao F, Sachs MS, et al. Codon usage influences the local rate of translation elongation to regulate co‐translational protein folding. Mol Cell. 2015;59:744–754.
  100. Zhao F, Yu C, Liu Y. Codon usage regulates protein structure and function by affecting translation elongation speed in Drosophila cells. Nucleic Acids Res. 2017;45:8484–8492.
  101. Zhao F, Zhou Z, Dang Y, Na H, Adam C, Lipzen A, et al. Genome‐wide role of codon usage on transcription and identification of potential regulators. Proc Natl Acad Sci. 2021;118:e2022590118.
  102. Zhao L, Cong X, Zhai L, Hu H, Xu J‐Y, Zhao W, et al. Comparative evaluation of label‐free quantification strategies. J Proteomics. 2020;215:103669.
  103. Zhou Z, Dang Y, Zhou M, Li L, Yu C, Fu J, et al. Codon usage is an important determinant of gene expression levels largely through its effects on transcription. Proc Natl Acad Sci. 2016;113:E6117–E6125. 10.1073/pnas.1606724113

Articles from Protein Science : A Publication of the Protein Society are provided here courtesy of The Protein Society
