Skip to main content
PLOS Computational Biology logoLink to PLOS Computational Biology
. 2017 Jun 9;13(6):e1005592. doi: 10.1371/journal.pcbi.1005592

Ribosome reinitiation can explain length-dependent translation of messenger RNA

David W Rogers 1,2,*,#, Marvin A Böttcher 2,#, Arne Traulsen 2, Duncan Greig 1,3
Editor: Alexandre V Morozov4
PMCID: PMC5482490  PMID: 28598992

Abstract

Models of mRNA translation usually presume that transcripts are linear; upon reaching the end of a transcript each terminating ribosome returns to the cytoplasmic pool before initiating anew on a different transcript. A consequence of linear models is that faster translation of a given mRNA is unlikely to generate more of the encoded protein, particularly at low ribosome availability. Recent evidence indicates that eukaryotic mRNAs are circularized, potentially allowing terminating ribosomes to preferentially reinitiate on the same transcript. Here we model the effect of ribosome reinitiation on translation and show that, at high levels of reinitiation, protein synthesis rates are dominated by the time required to translate a given transcript. Our model provides a simple mechanistic explanation for many previously enigmatic features of eukaryotic translation, including the negative correlation of both ribosome densities and protein abundance on transcript length, the importance of codon usage in determining protein synthesis rates, and the negative correlation between transcript length and both codon adaptation and 5' mRNA folding energies. In contrast to linear models where translation is largely limited by initiation rates, our model reveals that all three stages of translation—initiation, elongation, and termination/reinitiation—determine protein synthesis rates even at low ribosome availability.

Author summary

Recent advances in proteomics show that translation is strongly dependent on transcript length, but current theoretical models fail to capture this relationship. Here, we propose that the high initiation rates and protein yields of short transcripts result from terminating ribosomes reinitiating on the same transcript. The frequency of reinitiation depends on the time required to complete one full transit of a transcript, coupling transcript lengths and elongation rates to protein yield. Any slow step reduces the protein yield of shorter transcripts more than the yield of longer transcripts, generating stronger selective pressure to eliminate slow steps in shorter transcripts and explaining the widespread negative correlations in eukaryotes between transcript length and both 5' mRNA folding energy and codon adaptation. Our reinitiation-based model reconciles conflicting results from previous initiation-limited models with recent advances in biotechnology and identifies the mechanism underlying length-dependent translation, allowing powerful prediction of translational regulation across eukaryotes.

Introduction

The physiological state of a cell is largely determined by the identity and abundance of the proteins encoded by its genome. Understanding how genetic information is first transcribed into messenger RNA and then translated into protein is therefore fundamental to our understanding of biological systems. A wide variety of technologies has allowed detailed investigations of transcription, but—until very recently—a lack of similar tools for empirical research on translation has meant that the study of post-transcriptional regulation has been largely restricted to mathematical models with little opportunity for parameterization or evaluation. Recent advances in both sequencing technology and mass spectrometry have now produced large amounts of data on the translation of eukaryotic mRNA, revealing how transcript features, RNA-binding proteins, and non-coding RNAs influence translation [1,2]. While many of the determinants of translation rates revealed by these empirical studies were predicted by existing models, some remain difficult to explain. Perhaps the most striking correlate of translation rate is the length of the transcript itself. Multiple experimental studies, across a wide range of eukaryotic organisms, have demonstrated a steep negative correlation between the length of a given coding sequence (CDS) and three different measures of translation: translation initiation rates [35], the density of ribosomes on a transcript [515], and the abundance of the encoded protein [1619].

Ribosome and polysome profiling experiments have shown a positive relationship between ribosome density and protein abundance, leading to the conclusion that transcripts with higher ribosome densities have higher translation rates [9,11,20]. A positive relationship between ribosome density and translation rate can occur when translation is limited by low initiation rates. In traditional models of translation, initiation can be limiting when other steps in translation, such as elongation, occur quickly enough to prevent collisions between ribosomes [20]. Consistent with this key role of initiation rates in determining translation rates, Arava et al [6] found that the higher densities of ribosomes on shorter transcripts was most consistent with shorter transcripts having exponentially higher initiation rates than longer transcripts, estimating a halving of the initiation rate with every 400-codon-increase in CDS length. More recent analyses [3,4] have revealed that the relationship between CDS length and initiation rates is better described by a power law: the initiation rate is roughly halved for every doubling of CDS length (i.e. a log-log slope of approximately -1). However, the assumption of initiation-limitation leaves little room for variation in elongation rates to influence translation rates, which is at odds with recent work demonstrating that codon usage can be an important determinant of protein yields [21,22].

If translation is limited by the ability of transcripts to capture ribosomes from the cytoplasmic pool (the de novo initiation rate), mechanisms that allow transcripts to retain terminating ribosomes for subsequent rounds of translation should improve translation rates. The closed-loop model of translation was first proposed as a hypothetical mechanism to improve translation efficiency through intrapolysomal ribosome reinitiation [23,24]. By bringing the sites of termination and initiation into close proximity through circularization of the mRNA, the closed-loop complex allows ribosomes that have finished translating to reinitiate translation on the same mRNA molecule rather than returning to the cytoplasmic pool. The closed-loop model was initially based on the appearance of many polysomes in electron micrographs as circular, rather than linear, structures (detailed high resolution tomographic analyses of circular polysomes are now available [25]). Recent theoretical and experimental studies have shown that secondary structures in single stranded RNAs bring the 5' and 3' ends close together (equivalent to the distance spanned by 9–16 nucleotides) meaning that mRNAs are effectively circularized [26,27]. Interactions between initiation factors bound to the 5' end, and proteins associated with the 3' end including release and recycling factors, and the poly(A) binding proteins, are thought to facilitate translation, possibly by stabilizing the closed-loop structure or by actively promoting reinitiation [28,29].

The importance of reinitiation of ribosomes on circular transcripts in determining protein yield is well established in vitro [23,24,3032]. Measuring translation of the luminescent protein luciferase in a eukaryotic cell-free system, Kopeina et al [31] showed that circular polysomes rarely exchanged ribosomes with the free pool or lost ribosomes to other transcripts, but linear polysomes did so frequently. On circular polysomes, most terminating ribosomes immediately reinitiated on the same mRNA molecule (see also [30]). Alekhina et al [32] found that protein production in a similar cell-free system does not rapidly reach a steady state, as would be expected under a linear model of translation, but rather accelerates over the lifetime of the transcript, consistent with reinitiation on the same transcript. They proposed that the translation rate initially depends on slow de novo initiation of ribosomes from the free pool but soon becomes dominated by the much faster process of reinitiation.

Here, we use a minimal computational model to investigate the consequences of ribosome reinitiation on translation, with particular focus on transcript length and codon usage. We find that reinitiation causes ribosome densities, overall initiation rates, and protein yields to decrease with increasing transcript length. Furthermore, higher levels of reinitiation increase the importance of codon usage in determining translation rates in a length-dependent manner, even at low ribosome densities or low de novo initiation rates. Reinitiation therefore provides a potential mechanistic explanation for multiple previously-enigmatic patterns observed in empirical studies of translation.

Model

We use a totally asymmetric simple exclusion process (TASEP, reviewed in [33]) to investigate the closed loop model of translation. The TASEP (Fig 1) models each transcript as a one-dimensional lattice consisting of a number of sites equal to the number of codons in the CDS: each site represents a single codon. Each site can be either free or occupied by a ribosome. Ribosomes move along the transcript in the 5' to 3' direction and cannot occupy the same codon(s) as any other ribosome. In our model, the transcript is circularized, meaning that terminating ribosomes can not only be released into the cytoplasmic pool (as in a linear TASEP) but can also move to the initiation site of the same transcript (reinitiation).

Fig 1. The closed-loop model of translation.

Fig 1

Ribosomes (shaded in grey), modeled as extended particles that occupy 10 codons (black boxes), join the transcript at the start codon from the cytoplasmic pool at the de novo initiation rate, and hop to the next codon (in the 5' to 3' direction) at the elongation rate. Upon reaching the stop codon, ribosomes either return to the cytoplasmic pool at the release rate or return to the start codon at the reinitiation rate. The reinitiation level is determined by the reinitiation rate divided by the sum of the reinitiation and release rates. If the initiation site is occupied (i.e. any of the first 10 codons is being decoded), new ribosomes fail to join the transcript and reinitiating ribosomes either remain at the termination site or return to the cytoplasmic pool at the release rate. An elongating ribosome fails to step forward if the distance between its center and that of the ribosome in front is ≤ 10 codons (collision). Stochastic simulations were performed using the Gillespie algorithm. The Gillespie algorithm consists of multiple steps: (1) Initialization: the simulation time is set to zero; at this point in our simulations, all transcripts are empty and the only possible reaction is de novo initiation; (2) List all possible reactions: all possible reactions are determined and their rates are used to calculate the total rate of possible reactions; (3) Monte Carlo step: two random numbers are generated, the first determines the waiting time until the next reaction based on the total rate of possible reactions, and the second determines which reaction occurs using each reaction rate as a probabilistic weight; (4) Update: the time is increased by the randomly generated waiting time from step 3 and the chosen reaction is performed; (5) Iteration: repeat from step 2 unless the transcript lifetime has been reached.

Four different types of reactions can take place in the TASEP: (i) de novo initiation: a free ribosome can be placed onto the 5' end of the transcript (the initiation site) at the de novo initiation rate; (ii) elongation: ribosomes at any codon on the transcript (except the termination site) can move forward one codon in the 3' direction at the elongation rate; if a ribosome occupies the termination site, it can either (iii) leave the transcript at the release rate or (iv) it can move to the initiation site at the reinitiation rate.

We model ribosomes as extended particles that occupy ten codons each: the A-sites (where each codon is translated) of adjacent ribosomes must be spaced apart by at least 10 codons. Thus, the elongation reaction is only possible when the A-site of the next ribosome in the 3' direction is > 10 codons downstream. Similarly, neither de novo initiation nor reinitiation is possible if any of the first 10 codons is occupied by an A-site.

Analytical solutions of the TASEP are possible, but currently can only be applied to the steady state. Consequently, most TASEP models, including a recent study of reinitiation [34], investigate translation at the steady state, where the rate at which ribosomes join a transcript equals the rate at which they leave, and the translation rate is constant. However, every real transcript spends some proportion of its lifetime outside of the steady state, where these solutions do not apply; the assumption of a perpetual steady state is therefore an approximation. A new transcript does not instantly acquire ribosomes distributed along its length. Instead, ribosomes join at the 5' end and gradually progress towards the stop codon, where they can be released. In the absence of reinitiation, the steady state can be reached once the first ribosome to join a transcript is released. The duration of this "pioneer round" [35] increases with transcript length, but generally represents a small proportion of eukaryotic transcript lifetimes. In the absence of reinitiation the steady state is therefore a good approximation (although it can be inappropriate for prokaryotes with short-lived transcripts [36,37]). However under reinitiation, ribosomes do not necessarily leave the transcript upon termination, which causes the effective initiation rate (and the translation rate) to increase over time [32]. The time to reach the steady state therefore increases with both transcript length and reinitiation probability, and the time spent outside of the steady state thus represents a greater proportion of transcript lifetimes (S1 Fig). The steady state assumption consequently becomes a much worse approximation of translation at higher levels of reinitiation, overestimating translation rates on long transcripts and underestimating translation on short transcripts (S1 Fig). It is therefore impossible to make a fair comparison of translation at different reinitiation levels using the steady state approximation, particularly for transcripts of different lengths.

Since we do not assume that translation on any given transcript is always at the steady state, we cannot use the steady state analytical solutions of the TASEP. Instead, we perform stochastic simulations using the Gillespie algorithm [38], which capture both the steady state and the non-steady state. In models that assume the steady state, all translation that occurs in simulations prior to the steady state is ignored. For example, in a recent reinitiation-based model of translation in yeast, the first 105 s of simulations was discarded [34]. Given that the average lifetime of yeast transcripts is on the scale of 103−104 s [39], this means that all translation occurring over biologically plausible lifetimes was excluded from the analysis. Here, we make no assumptions about the steady state; we simply account for all translation that occurs during the lifetime of a transcript (both before and after the steady state is achieved). We simulated translation on each transcript independently. Each run generated a time evolution of the ribosome occupancy at each codon on a given transcript. We computed three measures of translation: ribosome density (the average number of ribosomes on a transcript over its lifetime divided by one tenth of CDS length, because each ribosome occupies 10 codons), effective initiation rate (the total number of initiations occurring through either de novo initiation or reinitiation divided by transcript lifetime) and protein yield (the total number of ribosomes reaching the stop codon of a transcript). We averaged the results of 1000 runs to produce results that are not subject to large stochastic fluctuations. We do not consider untranslated regions and our transcripts therefore represent only the CDS. The code for our TASEP is available at: https://github.com/marvinboe/reTASEP

Changing any transcriptome-wide parameter can dramatically alter global ribosome usage. For instance, at a given de novo initiation rate, increasing the reinitiation probability increases the total number of actively translating ribosomes. While this effect may be true, given that reinitiation is expected to allow more efficient use of ribosomes (see Discussion), it makes parameterizing the model difficult because the actual level of reinitiation is unknown. To keep all simulations consistent with empirical values, we have adjusted the de novo initiation rate to maintain the empirically observed average ribosome density. For simplicity, we have kept the number of ribosomes on a 400-codon-long transcript constant (at 6 ribosomes) for all transcriptome-wide reinitiation probabilities and elongation rates. See S1 Text for details on parameter estimates used in each simulation.

Results

High levels of reinitiation generate length-dependent translation

Our model captures the negative correlation between ribosome density and CDS length observed in empirical studies, but only if the probability of reinitiation is high (Fig 2). This result is intuitive; if reinitiation were perfect, all ribosomes that initiate would continue to reinitiate and translate, never leaving a transcript until it degrades. The density of ribosomes on a transcript of a given length and age would therefore be determined exclusively by the de novo initiation rate. If the de novo initiation rate is the same for all transcripts, then all transcripts of a given age should carry the same number of ribosomes and ribosome density will be the inverse of CDS length (with a log-log slope of -1). At a given elongation rate, the time required for a ribosome to complete one cycle (travel from the start codon to the stop codon) is less for short transcripts than for long transcripts. This means that, prior to the steady state, reinitiation occurs more frequently on shorter transcripts resulting in higher protein yields for short transcripts than long transcripts. When all or nearly all terminating ribosomes reinitiate, the effective initiation rate is much higher for shorter transcripts—providing a simple mechanism that could explain the length-dependence of initiation rates predicted by recent studies of translation [35,8]. The higher ribosome densities on shorter transcripts increase the likelihood of collisions between ribosomes, resulting in deviations from the expected power law relationship between measures of translation and CDS length (Fig 2) through two related mechanisms. First, more frequent collisions between elongating ribosomes on shorter transcripts slow down translation, generating less steep slopes for the effective initiation rate and protein yield at low CDS lengths. Second, the initiation site is more likely to be occupied on a short transcript than on a long transcript, resulting in higher levels of initiation interference [21] on shorter transcripts, further flattening length-dependence for all measures of translation on short transcripts.

Fig 2. Reinitiation of post-termination ribosomes causes length-dependent translation.

Fig 2

Ribosome density is the average number of ribosomes occupying a transcript during its lifetime divided by one tenth of the CDS length (since each ribosome occupies 10 codons). The effective initiation rate is the total number of initiation events (de novo initiation and reinitiation) divided by transcript lifetime. Protein yield is the total number of ribosomes reaching the stop codon during the lifetime of a transcript. Slopes (95% confidence intervals) are indicated in the bottom left corner of each panel. De novo initiation rates were adjusted at each reinitiation level so that a 400-codon long transcript carried 6 ribosomes. Top: experimentally observed relationships between CDS length and ribosome density (left), initiation rate (center) or protein abundance (right) in the budding yeast Saccharomyces cerevisiae. Middle: experimentally observed relationship between CDS length and ribosome density (left) and protein abundance (right) in the human HEK293T cell line. Estimates of the initiation rate are not currently available for this cell line, so we have used this space to list the slopes from our simulations. Bottom: predicted relationships between ribosome density (left), the effective initiation rate (center), and protein yield (right) and CDS length at different reinitiation levels (different colours) from our simulations. Simulations were performed using a fixed elongation rate of 10s-1. See S2 Fig for simulations at other parameter values. Data sources: Yeast densities are weighted averages of the signals in polysomal fractions for 6071 transcripts from [6]; initiation rates for 5348 transcripts were calculated by Ciandrini et al [3] based on ribosome density data from [7]; protein abundances of 4694 proteins included in the Peptide Atlas 2013 dataset from PaxDb [40] normalized against the total number of proteins (expressed as parts per million). HEK293T densities were calculated from mean ribosome numbers (across 3 replicates) reported by Hendrickson et al [11]; protein abundances of 2636 proteins with identified CDS lengths included in the Geiger MCP 2012 data set (based on spectral counting) from PaxDb [40].

When reinitiation is not perfect, ribosomes can return to the cytoplasmic pool after termination, and the effect of CDS length on ribosome density, effective initiation rate, and protein yield is diminished. Even small reductions in reinitiation probability greatly weaken length-dependence (Fig 2). This is because short transcripts have more opportunities to lose ribosomes than do long transcripts. While a successful reinitiation event only guarantees that a ribosome remains associated with the transcript until the next termination event, ribosome loss is permanent. In the complete absence of reinitiation, length-dependence is therefore abolished.

Reinitiation, but not de novo initiation, has a larger effect on short transcripts than long transcripts

While changing transcriptome-wide parameters can dramatically affect global ribosome usage (see Model), altering parameters of transcripts encoded by a single gene will have little effect on global ribosome usage. This is because nearly all endogenous genes are expressed at low levels, so changing the translation parameters of the transcripts produced by a single gene will have a negligible effect on global ribosome availability [4,20,41]. By studying transcripts of individual genes, we can therefore investigate the consequences of changing a single parameter while holding all other values constant. We first tested the effects of altering the reinitiation rate of transcripts encoded by a single gene (Fig 3A and 3B). Doubling the reinitiation rate results in an extremely similar increase in all three measures of translation (ribosome density, effective initiation rate, and protein yield; results are therefore only shown for ribosome density), but the effects are greater for short transcripts than long transcripts. These effects are mirrored by a length-dependent decrease in translation when the reinitiation rate is halved (Fig 3B). Furthermore, the length-dependent effects of changing the reinitiation rate of a single transcript species are generally stronger at higher transcriptome-wide reinitiation probabilities, except when reinitiation is so high that ribosomes rarely leave the transcript (e.g. 99.9%).

Fig 3. Transcript-specific change of the reinitiation rate, but not the de novo initiation rate, has larger effects on short transcripts than long transcripts.

Fig 3

We simulated the effects of changing the reinitiation rate (A, B) or the de novo initiation rate (C, D) of a single transcript species by 2-fold or 0.5-fold at different transcriptome-wide reinitiation levels. For each transcriptome-wide reinitiation level (different colours), doubling (or halving) the reinitiation rate shifted the reinitiation level to: 99% to 99.5% (98.0%); 95% to 97.5% (90.5%); 90% to 94.7% (81.8%); 80% to 88.9% (66.7%); 50% to 66.7% (33.3%); 0% to 0% (0%). Doubling or halving the reinitiation rate at very high transcriptome-wide reinitiation levels (e.g. 99.9%) has little effect on translation since ribosomes rarely leave transcripts. Y-axes show the ribosome density of altered transcripts relative to an equivalent transcript at the transcriptome wide reinitiation level. The effects of changing either the reinitiation rate or the de novo initiation rate on the effective initiation rate and protein yield were nearly identical to the effects on ribosome density. Transcriptome-wide de novo initiation rates were adjusted at each reinitiation level so that a 400-codon long CDS at the transcriptome-wide reinitiation level carried 6 ribosomes.

We next tested the effects of altering the de novo initiation rate of a single transcript species (Fig 3C and 3D). In the absence of reinitiation, doubling the de novo initiation rate had an equal effect on ribosome density for transcripts of all lengths. However, at higher levels of reinitiation, doubling the de novo initiation rate resulted in a smaller increase in ribosome density on short transcripts than on long transcripts, caused by increased initiation interference; the higher density of ribosomes on short transcripts under reinitiation increases the probability that the initiation site is blocked, preventing successful de novo initiation. The effects of altering the de novo initiation rate on the effective initiation rate and protein yield are very similar to the effects on ribosome density.

High levels of reinitiation couple effective initiation rates and protein yields to the elongation rate

So far, we have assumed that all transcripts have identical elongation rates, but in reality the elongation rate varies between transcripts encoded by different genes [42]. We therefore investigated the consequences of changing the elongation rate of a single CDS from 10s-1 to either 20s-1 or 5s-1 (Fig 4). Increasing the elongation rate reduces the amount of time between initiation and termination. In the absence of reinitiation, this causes ribosomes to spend less time on the altered transcript resulting in decreased ribosome density, but has little effect on the initiation rate or protein yield since these elongation rates are generally not limiting. Altered elongation rates do affect how long it takes to clear the initiation site and therefore the amount of initiation interference, explaining the relatively small differences in initiation rates and protein yields seen at 0% reinitiation [21].

Fig 4. Transcript-specific change in translation caused by altering the average elongation rate of a single coding sequence.

Fig 4

We simulated the effects of changing the average elongation rate of a single transcript species from 10s-1 to either (A) 20s-1 or (B) 5s-1 at different reinitiation levels (different colours). CDS length refers to the length of the altered coding sequence. Y-axes show the effect of altering the elongation rate on the ribosome density (left), effective initiation rate (center), and protein yield (right) of the altered transcript at the higher or lower elongation rate relative to 10s-1 (dotted line). De novo initiation rates were adjusted at each reinitiation level so that a 400-codon long transcript with a fixed elongation rate of 10s-1 carried 6 ribosomes.

Under perfect reinitiation, terminating ribosomes explicitly reinitiate on the same transcript. Changing the elongation rate of a single gene therefore has no effect on the density of ribosomes on the altered transcript. However, by altering the time between reinitiation events, changing the elongation rate results in an equal change in the effective initiation rate of the altered transcript (Fig 4). The protein yield of any endogenous gene is therefore exquisitely sensitive to changes in elongation rate under perfect reinitiation. Under perfect reinitiation, this effect is seen at all CDS lengths. The importance of the elongation rate decreases dramatically when reinitiation levels are reduced: faster elongation results in more opportunities to lose ribosomes, particularly on short transcripts.

Length-dependent consequences of a single slow step on translation

So far, we have only considered the effects of changing the average elongation rate of a transcript. However, it is difficult to imagine a mechanism that could simultaneously alter the elongation rate of all codons in a single transcript species without affecting the global elongation rate. Instead, transcripts are likely altered by mutations affecting a single codon at a time. Codon usage can affect elongation by determining the stability of secondary structures in the mRNA, but different codons are also decoded at different rates depending on the cellular availability of the appropriate tRNA. Most amino acids are encoded by multiple codons, and some codons (including synonymous codons that code for the same amino acid) are decoded faster than others [42,43]. We therefore investigated the consequences of a single slow step on translation of transcript species of different lengths (Fig 5). Here, we only examined translation at 99.9% reinitiation; similar results would be expected for other models of length-dependent translation including those that omit reinitiation. Introducing a single slow step into any transcript reduces its effective initiation rate and protein yield, but the effects are much larger for short transcripts than for long transcripts (Fig 5). The length-dependence of a single slow step arises from two sources. First, a single site represents a larger proportion of a short transcript than a long transcript and consequently results in a greater decrease in the average elongation rate [44]. Second, short transcripts have higher ribosome densities and are therefore more prone to collisions or "traffic jams" than are long transcripts. Effective initiation rates and protein yields are particularly sensitive to single slow steps near the start codon, with larger effects on shorter transcripts: slow clearance of the initiation site delays reinitiation and blocks de novo initiation resulting in lower ribosome densities on affected transcripts.

Fig 5. The consequences of a single slow step under length-dependent translation.

Fig 5

We investigated the consequences of introducing a slow elongation step at either the start codon (green), the middle codon (grey), or the final codon (immediately before the stop codon, blue) under 99.9% reinitiation. CDS length refers to the length of the altered coding sequence. Slow codons were translated at 1s-1 (filled circles), 0.1s-1 (open circles), or 0.01s-1 (plus signs). Y-axes show the effect of a single slow step on the ribosome density (A), effective initiation rate (B), and protein yield (C) at the lower elongation rate relative to 10s-1 (dotted line).

A yeast-specific model of translation with reinitiation

Given the importance of variation in elongation rates to translation under reinitiation, we used our model to simulate translation in S. cerevisiae using codon-specific decoding rates. We used decoding rates (see S1 Table) estimated by Gilchrist & Wagner [45] which are based on tRNA availability and wobble pairing rules and scaled so that the average decoding rate is 10s-1; they are related to measures of codon occupancy reported in [5] (r = 0.494, n = 61, P < 0.0001). Since efficient reinitiation couples protein production to elongation rates, synonymous codon usage should have detectable consequences for protein yield at high levels of reinitiation. We tested the effects of synonymous codon usage at 99.9% reinitiation by predicting the yields of nine different synthetic GFP constructs [46] that differ only in their synonymous codon usage (Fig 6A). We compared these predictions to observed protein abundances measured in S. cerevisiae expressing each construct, and found a strong positive correlation between predicted yields and observed abundances (r = 0.750, n = 9, P = 0.020); our model predicted approximately half of the observed effect of using different synonymous codons (relative expression of highest vs. lowest construct, model = 2.4-fold, observed = 5.4-fold). Thus, efficient reinitiation correctly predicts a role for synonymous codon usage in determining yield.

Fig 6. Simulating translation in the budding yeast S. cerevisiae.

Fig 6

Our simulations were run using yeast-specific decoding rates [45] and a transcript lifetime of 1553s [39]. Results are shown for 99.9% reinitiation (see S3 Fig & S4 Fig for simulations at other reinitiation levels). (A) Correlation of model-predicted protein yields (proteins per mRNA per lifetime) for 9 differently codon-optimized GFP coding sequences with experimentally measured abundances (arbitrary units) from [46]. (B) Correlation of model-predicted ribosome density (left), effective initiation rate (middle), and protein yield (right) with experimentally measured values. Sources of experimental data are the same as in Fig 2. When X and Y-axis scales are equivalent, the 1:1 line is represented by a dotted line. When X and Y-axis scales are not equivalent, the solid line is the regression line.

Having established that Gilchrist & Wagner’s [45] codon-specific elongation rates are realistic, we used them to simulate the entire budding yeast translatome. The results of our simulations at 99.9% reinitiation are strongly correlated (Fig 6B) with experimental measures of ribosome densities (r = 0.932, n = 5542, using data from [6]) and calculated initiation rates (r = 0.742, n = 5348, using estimates from [3]; r = 0.618, n = 3728, using estimates from [4]). Our yield predictions are less strongly correlated with measured protein abundances (r = 0.478, n = 4686, data from Peptide Atlas 2013). This weaker correlation is unsurprising as our predictions of yield omit many important determinants of protein abundance including transcript abundance and protein stability. Results of simulations at other reinitiation levels are included in S3 Fig (fixed transcript lifetime) and S4 Fig (experimentally measured transcript lifetimes).

Discussion

We have shown that a fixed transcriptome-wide level of ribosome reinitiation can generate both length-dependent translation and a powerful transcript-specific role for codon usage, but only when reinitiation is extremely efficient. The level of reinitiation in live cells is unknown, but multiple studies have established that reinitiation is much more frequent than de novo initiation in cell-free systems. Furthermore, if reinitiation benefits the cell, we would expect it to evolve to become highly efficient. Maintaining a large pool of ribosomes consumes a substantial part of a cell's energy budget and selection will favor mechanisms that allow ribosomes to be used efficiently [47]. If, as reported by Nelson & Winkler [30] and Kopeina et al [31], reinitiation of post-termination ribosomes is faster than de novo initiation from the free ribosome pool, then efficient reinitiation reduces the amount of "dead time" ribosomes spend in the pool waiting to be recruited [34,48]. Each ribosome in a cell is therefore able to complete more rounds of translation in a given time interval under high levels of reinitiation compared to low levels of reinitiation. Reinitiation levels should be very closely associated with fitness: the translation initiation rate is thought to be the principal determinant of the rate of cell division [49,50]. Consequently, if reinitiation does occur in living cells, it is hard to imagine why it would not work very efficiently. Direct measurement of the level of reinitiation in vivo may soon be possible thanks to recent technological advancements enabling selective labeling of ribosomes [51] and the visualization of translation on individual mRNAs [52,53].

A single fixed level of reinitiation is not necessary to explain length-dependent translation; efficient reinitiation is only required on short transcripts (S5 Fig). Studies in living cells have shown that some transcripts are more likely to be associated with translation factors required to form the closed-loop complex than others [54]. If the closed-loop complex is required for efficient reinitiation, then reinitiation levels likely vary between transcripts. More specifically, shorter transcripts likely experience higher levels of reinitiation since they are both more likely to be enriched with closed-loop factors [15,55], form more stable closed-loop complexes [56], and may exhibit shorter end-to-end distances allowing increased levels of reinitiation by passive diffusion [57]. Additionally, cellular depletion of both the closed loop factor eIF4G and the translational regulator Asc1/RACK1 has also been shown to have a greater effect on the translation of short transcripts than on long transcripts [13,15]. Using length-dependent reinitiation levels in our simulations allows the empirical relationship between CDS length and ribosome density, effective initiation rate, and protein yield to be captured at an average reinitiation level orders of magnitude smaller (~90%; S5 Fig) than does a fixed reinitiation level (99.9%; S5 Fig).

Beyond acting on global mechanisms, natural selection also operates to maximize the protein yield of transcripts encoded by individual genes (translational efficiency [44]). Selection for increased translational efficiency can not only increase the abundance of a given protein in a cell, but can also maintain protein levels while minimizing the cost of transcription, which has been shown to be an important determinant of fitness in yeast [58]. The strength of selection depends on the magnitude of the effect of a given mutation on translational efficiency; mutations with larger effects are subject to stronger selection. We have shown that the magnitude of the effect on translational efficiency of altering a given parameter by an equal amount can vary with the length of the altered transcript species. Thus, the strength of selection on mutations that affect a given parameter can be length-dependent [44]. For instance, doubling the reinitiation rate of a single transcript species results in a bigger increase in translational efficiency for shorter transcripts (Fig 3). Mutations affecting the reinitiation rate of short transcripts are therefore more likely to be selected than are those than occur on long transcripts, potentially contributing to higher levels of reinitiation on shorter transcripts as discussed above (S5 Fig). In contrast, doubling the de novo initiation rate does not result in higher translational efficiency on shorter transcripts and, under reinitiation, can actually have smaller effects on shorter transcripts due to increased initiation interference (Fig 3). Selection for increased translational efficiency on individual transcript species is therefore not predicted to result in higher de novo initiation rates on shorter transcripts. Instead, selection under reinitiation will be more effective at reducing initiation interference on shorter transcripts.

At high levels of reinitiation, we have shown that a single slow step in translation causes a greater reduction in the translational efficiency of short transcripts than that of long transcripts (Fig 5). Eliminating slow steps has larger effects on the translation of short transcripts compared to long transcripts and therefore selection to eliminate slow steps will be most effective in genes encoding short transcripts. Length-dependent selection against slow steps under reinitiation therefore offers an explanation for the negative correlation between codon adaptation and CDS length observed across eukaryotes ([4, 44, 5963] but see also [64]). Translational efficiency is particularly sensitive to slow sites near the start codon (Fig 5, see also [21]): slow clearance of the initiation site delays reinitiation (promoting ribosome loss) and blocks de novo initiation resulting in lower ribosome densities on affected transcripts. Multiple mechanisms can determine how slowly ribosomes vacate the initiation site including the presence of one or more slow codons [21] or the presence of stable 5' secondary structures in the transcript [65]. As both features reduce yield to a greater extent on short transcripts compared to long transcripts (Fig 5), selection should be more efficient at eliminating them on shorter transcripts, consistent with the negative correlations between CDS length and both 5' mRNA folding energy and 5' codon adaptation [59]. Thus, length-dependent translation generated by high levels of reinitiation will generate length-dependent selection against slow steps [44], which will in turn reinforce patterns of length-dependent translation.

Reinitiation provides a simple mechanistic explanation for empirically observed patterns of length-dependent translation including negative correlations between CDS length and ribosome density, effective initiation rate, protein yield, transcript codon adaptation, 5' codon adaptation, 5' folding energy, and association with closed-loop factors. Under reinitiation, these patterns are expected to emerge through selection for efficient ribosome usage, maximizing protein yield, and translational efficiency on individual transcript species. This is in sharp contrast to linear models in which, at low ribosome availability, length-dependence arises through direct selection for higher de novo initiation rates on shorter transcripts [3,4]. Our model is consistent with the emerging view that translation is controlled not only by initiation, but also by elongation and termination/reinitiation [21,22,66]. This conceptual shift makes clear that manipulating any these stages can have profound consequences on translation, and presents factors associated with elongation, release, and recycling as new targets for therapeutic intervention (cf. [67]).

Supporting information

S1 Fig. The steady state is a poor approximation of translation at high reinitiation levels for transcripts with finite lifetimes.

(A) Time to approach the steady state ribosome density at different reinitiation levels for transcripts of different lengths. Simulations were run for 3x105 s using the parameter values described for our general model. De novo initiation rates were adjusted at each reinitiation level so that a 400-codon transcript carried an average of 6 ribosomes at the steady state. The steady state ribosome density was calculated as the average density over the final 104 s of 1000 iterations (if this value showed no directional change over time). The first passage time represents the earliest time point that the average density of 1000 iterations equaled or exceeded the steady state density. Long transcripts (4000 codons) failed to reach the steady state during the 3x105 s run time at reinitiation levels above 99.6%. Data points at higher reinitiation levels were therefore excluded. (B) Steady state translation rate (yield s-1) relative to the average translation rate (yield s-1) on equivalent transcripts with a finite 3000 s lifetime. Deviations from a ratio of 1 represent the magnitude of the misestimation of the translation rate caused by assuming a perpetual steady state on transcripts with finite lifetimes. De novo initiation rates were adjusted at each reinitiation level such that the average 400 codon-transcript carried 6 ribosomes either at the steady state or on average over its 3000 s lifetime. Consequently, all 400 codon transcripts have the same average ribosome density allowing fair comparison. At high reinitiation levels, the steady state approximation overestimates ribosome density on long transcripts and underestimates ribosome density on short transcripts.

(TIF)

S2 Fig. Full model predictions of ribosome density, effective initiation rate, and protein yield at different reinitiation levels.

We simulated translation for a wide range of transcript lifetimes and de novo initiation rates. For any given combination of transcript lifetime and de novo initiation rate, we simulated translation for transcripts with different CDS lengths and then calculated the slope of ribosome density, effective initiation rate, and protein yield over CDS length. Slopes are indicated in different colours (see colour bar), reflecting the degree of length-dependence. The white line in each panel shows the de novo initiation rate required at each lifetime such that a 400-codon long transcript carries 6 ribosomes. Our model predicts that high levels of reinitiation can cause length-dependent translation across a wide range of transcript lifetimes and de novo initiation rates. At high levels of reinitiation, length-dependence is strongest for short transcript lifetimes and low de novo initiation rates; since ribosomes are unlikely to leave the transcript, at long lifetimes or high de novo initiation rates, all transcripts eventually become saturated with ribosomes. As a result, the number of ribosomes per transcript becomes proportional to CDS length (since longer transcripts can carry more ribosomes) and ribosome density, the effective initiation rate, and protein yield become similar for all CDS lengths. As the reinitiation level falls, the effective initiation rate becomes dominated by the de novo initiation rate, which is the same for all CDS lengths, and ribosome density, the effective initiation rate, and protein yield all become independent of CDS length. In a linear model of translation (0% reinitiation), negative slopes for density and yield are only seen for extremely short transcript lifetimes simply because the first initiating ribosome fails to reach the stop codon on long transcripts before the maximum lifetime is reached. This effect disappears when transcript lifetimes exceed the time required for the first ribosome to reach the stop codon of the longest transcript (~400 seconds).

(TIF)

S3 Fig. Simulating translation in the budding yeast S. cerevisiae at different reinitiation levels.

Predicted ribosome density, effective initiation rate, and protein yield for all 5888 budding yeast transcripts simulated at different reinitiation levels using codon-specific decoding rates from [45]. The left y-axis scale applies to ribosome density (left) and effective initiation rates (center); the right y-axis scale applies to protein yield (right). Slopes are indicated in the top-right corner. As in Fig 6, all transcripts had a fixed lifetime of 1553 seconds. De novo initiation rates were adjusted at each reinitiation level so that a 400-codon long transcript with a fixed decoding rate of 10s-1 carried an average of 6 ribosomes. Pearson correlation coefficients between simulated results and empirical measures for each reinitiation level are: (i) ribosome density (n = 5542): 100% = 0.935, 99.9% = 0.932, 99% = 0.908, 95% = 0.784, 90% = 0.619, 80% = 0.377, 50% = 0.083; (ii) effective initiation rate (n = 5348): 100% = 0.742, 99.9% = 0.742, 99% = 0.729, 95% = 0.671, 90% = 0.591, 80% = 0.468, 50% = 0.289; (iii) protein yield (n = 4933): 100% = 0.480, 99.9% = 0.478, 99% = 0.475, 95% = 0.458, 90% = 0.449, 80% = 0.442, 50% = 0.438. Empirical data sources as in Fig 2.

(TIF)

S4 Fig. Simulating translation in the budding yeast S. cerevisiae at different reinitiation levels using empirical estimates of transcript lifetime.

The simulations shown in S3 Fig were repeated using empirical estimates of transcript lifetimes. Transcript lifetimes were calculated using relative abundances from [68] (measured using single-molecule sequencing digital gene expression which eliminates the length-bias associated with RNA-Seq), a total of 36,000 transcripts per cell [69], and nascent transcription rates from [39]. For each transcript, absolute abundance was divided by the transcription rate to obtain the average lifetime. Transcripts with average lifetimes below 400s were excluded to prevent bias towards increased length-dependence (see S2 Fig). Pearson correlation coefficients between simulated results and empirical measures for each reinitiation level are: (i) ribosome density (n = 3829): 100% = 0.767, 99.9% = 0.771, 99% = 0.773, 95% = 0.741, 90% = 0.673, 80% = 0.512, 50% = 0.151, 0% = -0.072; (ii) effective initiation rate (n = 3752): 100% = 0.668, 99.9% = 0.668, 99% = 0.668, 95% = 0.654, 90% = 0.627, 80% = 0.567, 50% = 0.394, 0% = 0.182; (iii) protein yield (n = 3451): 100% = 0.615, 99.9% = 0.620, 99% = 0.609, 95% = 0.578, 90% = 0.561, 80% = 0.546, 50% = 0.535, 0% = 0.527. Empirical data sources as in Fig 2.

(TIF)

S5 Fig. Length-dependent translation only requires high levels of reinitiation on short transcripts.

Decreasing reinitiation levels with transcript length (open circles) produces very similar relationships between CDS length and ribosome density, effective initiation rate, and protein yield as does a fixed reinitiation level of 99.9% (purple circles). Slopes of each relationship are shown in the upper right corner of each panel. Length-dependent reinitiation levels capture length dependent translation at much lower reinitiation levels than that required using a fixed reinitiation level. We have arbitrarily chosen to decrease the reinitiation level as a function of CDS length according to the formula 100%*(1- CDS length/4000). The de novo initiation rate was set such that a 400-codon transcript carried an average of 6 ribosomes (99.9% reinitiation = 0.00458s-1, length-dependent reinitiation = 0.02285s-1).

(TIF)

S1 Table. S. cerevisiae codon-specific elongation rates from [45].

(DOCX)

S1 Text. Parameter estimates and justification.

(DOCX)

Data Availability

The code for our TASEP is available at: https://github.com/marvinboe/reTASEP. Data sources for Fig 2: Yeast ribosome densities were taken from from Arava et al 2003 are available at http://genome-www.stanford.edu/yeast_translation/data.shtml. Yeast initiation rates were taken from Ciandrini et al 2013, available in Table S1 at http://dx.doi.org/10.1371/journal.pcbi.1002866. Yeast protein abundances were taken from the Peptide Atlas 2013 dataset from PaxDb available at http://pax-db.org/dataset/4932/242/. HEK293T ribosome densities were taken from Hendrickson et al 2009, available in Dataset S3 at http://dx.doi.org/10.1371/journal.pbio.1000238. HEK293T protein abundances were taken from the Geiger MCP 2012 data set (based on spectral counting) from PaxDb http://pax-db.org/dataset/9606/485/

Funding Statement

AT and DG received funding from the Max Planck Society https://www.mpg.de/en. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 Fig. The steady state is a poor approximation of translation at high reinitiation levels for transcripts with finite lifetimes.

(A) Time to approach the steady state ribosome density at different reinitiation levels for transcripts of different lengths. Simulations were run for 3x105 s using the parameter values described for our general model. De novo initiation rates were adjusted at each reinitiation level so that a 400-codon transcript carried an average of 6 ribosomes at the steady state. The steady state ribosome density was calculated as the average density over the final 104 s of 1000 iterations (if this value showed no directional change over time). The first passage time represents the earliest time point that the average density of 1000 iterations equaled or exceeded the steady state density. Long transcripts (4000 codons) failed to reach the steady state during the 3x105 s run time at reinitiation levels above 99.6%. Data points at higher reinitiation levels were therefore excluded. (B) Steady state translation rate (yield s-1) relative to the average translation rate (yield s-1) on equivalent transcripts with a finite 3000 s lifetime. Deviations from a ratio of 1 represent the magnitude of the misestimation of the translation rate caused by assuming a perpetual steady state on transcripts with finite lifetimes. De novo initiation rates were adjusted at each reinitiation level such that the average 400 codon-transcript carried 6 ribosomes either at the steady state or on average over its 3000 s lifetime. Consequently, all 400 codon transcripts have the same average ribosome density allowing fair comparison. At high reinitiation levels, the steady state approximation overestimates ribosome density on long transcripts and underestimates ribosome density on short transcripts.

(TIF)

S2 Fig. Full model predictions of ribosome density, effective initiation rate, and protein yield at different reinitiation levels.

We simulated translation for a wide range of transcript lifetimes and de novo initiation rates. For any given combination of transcript lifetime and de novo initiation rate, we simulated translation for transcripts with different CDS lengths and then calculated the slope of ribosome density, effective initiation rate, and protein yield over CDS length. Slopes are indicated in different colours (see colour bar), reflecting the degree of length-dependence. The white line in each panel shows the de novo initiation rate required at each lifetime such that a 400-codon long transcript carries 6 ribosomes. Our model predicts that high levels of reinitiation can cause length-dependent translation across a wide range of transcript lifetimes and de novo initiation rates. At high levels of reinitiation, length-dependence is strongest for short transcript lifetimes and low de novo initiation rates; since ribosomes are unlikely to leave the transcript, at long lifetimes or high de novo initiation rates, all transcripts eventually become saturated with ribosomes. As a result, the number of ribosomes per transcript becomes proportional to CDS length (since longer transcripts can carry more ribosomes) and ribosome density, the effective initiation rate, and protein yield become similar for all CDS lengths. As the reinitiation level falls, the effective initiation rate becomes dominated by the de novo initiation rate, which is the same for all CDS lengths, and ribosome density, the effective initiation rate, and protein yield all become independent of CDS length. In a linear model of translation (0% reinitiation), negative slopes for density and yield are only seen for extremely short transcript lifetimes simply because the first initiating ribosome fails to reach the stop codon on long transcripts before the maximum lifetime is reached. This effect disappears when transcript lifetimes exceed the time required for the first ribosome to reach the stop codon of the longest transcript (~400 seconds).

(TIF)

S3 Fig. Simulating translation in the budding yeast S. cerevisiae at different reinitiation levels.

Predicted ribosome density, effective initiation rate, and protein yield for all 5888 budding yeast transcripts simulated at different reinitiation levels using codon-specific decoding rates from [45]. The left y-axis scale applies to ribosome density (left) and effective initiation rates (center); the right y-axis scale applies to protein yield (right). Slopes are indicated in the top-right corner. As in Fig 6, all transcripts had a fixed lifetime of 1553 seconds. De novo initiation rates were adjusted at each reinitiation level so that a 400-codon long transcript with a fixed decoding rate of 10s-1 carried an average of 6 ribosomes. Pearson correlation coefficients between simulated results and empirical measures for each reinitiation level are: (i) ribosome density (n = 5542): 100% = 0.935, 99.9% = 0.932, 99% = 0.908, 95% = 0.784, 90% = 0.619, 80% = 0.377, 50% = 0.083; (ii) effective initiation rate (n = 5348): 100% = 0.742, 99.9% = 0.742, 99% = 0.729, 95% = 0.671, 90% = 0.591, 80% = 0.468, 50% = 0.289; (iii) protein yield (n = 4933): 100% = 0.480, 99.9% = 0.478, 99% = 0.475, 95% = 0.458, 90% = 0.449, 80% = 0.442, 50% = 0.438. Empirical data sources as in Fig 2.

(TIF)

S4 Fig. Simulating translation in the budding yeast S. cerevisiae at different reinitiation levels using empirical estimates of transcript lifetime.

The simulations shown in S3 Fig were repeated using empirical estimates of transcript lifetimes. Transcript lifetimes were calculated using relative abundances from [68] (measured using single-molecule sequencing digital gene expression which eliminates the length-bias associated with RNA-Seq), a total of 36,000 transcripts per cell [69], and nascent transcription rates from [39]. For each transcript, absolute abundance was divided by the transcription rate to obtain the average lifetime. Transcripts with average lifetimes below 400s were excluded to prevent bias towards increased length-dependence (see S2 Fig). Pearson correlation coefficients between simulated results and empirical measures for each reinitiation level are: (i) ribosome density (n = 3829): 100% = 0.767, 99.9% = 0.771, 99% = 0.773, 95% = 0.741, 90% = 0.673, 80% = 0.512, 50% = 0.151, 0% = -0.072; (ii) effective initiation rate (n = 3752): 100% = 0.668, 99.9% = 0.668, 99% = 0.668, 95% = 0.654, 90% = 0.627, 80% = 0.567, 50% = 0.394, 0% = 0.182; (iii) protein yield (n = 3451): 100% = 0.615, 99.9% = 0.620, 99% = 0.609, 95% = 0.578, 90% = 0.561, 80% = 0.546, 50% = 0.535, 0% = 0.527. Empirical data sources as in Fig 2.

(TIF)

S5 Fig. Length-dependent translation only requires high levels of reinitiation on short transcripts.

Decreasing reinitiation levels with transcript length (open circles) produces very similar relationships between CDS length and ribosome density, effective initiation rate, and protein yield as does a fixed reinitiation level of 99.9% (purple circles). Slopes of each relationship are shown in the upper right corner of each panel. Length-dependent reinitiation levels capture length dependent translation at much lower reinitiation levels than that required using a fixed reinitiation level. We have arbitrarily chosen to decrease the reinitiation level as a function of CDS length according to the formula 100%*(1- CDS length/4000). The de novo initiation rate was set such that a 400-codon transcript carried an average of 6 ribosomes (99.9% reinitiation = 0.00458s-1, length-dependent reinitiation = 0.02285s-1).

(TIF)

S1 Table. S. cerevisiae codon-specific elongation rates from [45].

(DOCX)

S1 Text. Parameter estimates and justification.

(DOCX)

Data Availability Statement

The code for our TASEP is available at: https://github.com/marvinboe/reTASEP. Data sources for Fig 2: Yeast ribosome densities were taken from from Arava et al 2003 are available at http://genome-www.stanford.edu/yeast_translation/data.shtml. Yeast initiation rates were taken from Ciandrini et al 2013, available in Table S1 at http://dx.doi.org/10.1371/journal.pcbi.1002866. Yeast protein abundances were taken from the Peptide Atlas 2013 dataset from PaxDb available at http://pax-db.org/dataset/4932/242/. HEK293T ribosome densities were taken from Hendrickson et al 2009, available in Dataset S3 at http://dx.doi.org/10.1371/journal.pbio.1000238. HEK293T protein abundances were taken from the Geiger MCP 2012 data set (based on spectral counting) from PaxDb http://pax-db.org/dataset/9606/485/


Articles from PLoS Computational Biology are provided here courtesy of PLOS

RESOURCES