Heliyon. 2023 Jan 26;9(2):e13101. doi: 10.1016/j.heliyon.2023.e13101

A dynamical stochastic model of yeast translation across the cell cycle

Martin Seeger, Max Flöttmann, Edda Klipp
PMCID: PMC9922973  PMID: 36793957

Abstract

Translation is a central step in gene expression; however, its quantitative and time-resolved regulation is poorly understood. We developed a discrete, stochastic model for protein translation in S. cerevisiae in a whole-transcriptome, single-cell context. A “base case” scenario representing an average cell highlights translation initiation rates as the main co-translational regulatory parameters. Codon usage bias emerges as a secondary regulatory mechanism through ribosome stalling. Demand for anticodons with low abundance is shown to cause above-average ribosome dwelling times. Codon usage bias correlates strongly both with protein synthesis rates and elongation rates. Applying the model to a time-resolved transcriptome estimated by combining data from FISH and RNA-Seq experiments, we show that increased total transcript abundance during the cell cycle decreases translation efficiency at the single-transcript level. Translation efficiency grouped by gene function shows the highest values for ribosomal and glycolytic genes. Ribosomal proteins peak in S phase while glycolytic proteins rank highest in later cell cycle phases.

Keywords: Translation, Cell cycle, Yeast, Stochastic, Model


Highlights

  • We present a whole single-cell simulation of protein translation.

  • The mRNA pool at whole-transcriptome level has 15k-70k transcripts organized in 4k-6.6k transcript types.

  • The whole-cell tRNA pool comprises 3M molecules with 41 anticodon types.

  • Initiation is gene-specific, elongation is sequence-specific.

  • All protein-synthesis related observables can be simulated.

1. Introduction

Protein translation lies at the interface between genetics and structural biology and constitutes one of the key components of a cell's machinery. Many of the key gene regulatory mechanisms, such as DNA methylation, transcriptional initiation and termination, splicing, mRNA transport and localization and mRNA degradation, take place before translation, whereas others, such as protein maturation and degradation or posttranslational modifications, occur after translation. Co-translational regulation has been studied, e.g., in the context of the transcriptome-proteome correlation (Brar and Weissman [7], Calviello and Ohler [8], Lahtvee et al. [32]). For a review of the available experimental data see Csárdi et al. [12].

The degrees of freedom by which translation can play a regulatory role are constrained by the manifest uniformity of the process: all non-mitochondrial proteins use the same translation machinery and the same pools of ribosomes and tRNAs (Fig. 1). Previous research has so far identified the following co-translational regulatory mechanisms: translation initiation (which itself is influenced by factors such as accessibility of the 5' cap, secondary mRNA structure, promoter switching, electrostatic potentials in the ribosomal tunnel, in particular the interaction of hydropathic residues of the nascent polypeptide with the ribosome (Lu and Deutsch [35], Dao Duc and Song [15]), ribosomal interference (Dao Duc and Song [15]), codon composition of the ORF's 5' end (Tuller and Zur [54]) or the presence of small ORFs in the 5' UTR), mRNA codon usage and the nascent polypeptide secondary structure (Dever et al. [16]). Less well understood, however, are quantitative aspects of the regulation, which include

  • the influence of synonymous codon usage on the regulation/modulation of translation

  • the relative balance of resource efficiency and capacity utilization between mRNAs, ribosomes and tRNAs

  • the impact of the variability and time-dependence of mRNA abundance on translation rates

Ciandrini et al. [11] analyzed questions similar to those raised here, but with data based on microarrays rather than on Ribo-Seq as in this study. The ideal model organism to study these effects is yeast because of its relative simplicity and the low importance of alternative splicing (Schreiber et al. [46]). The emergence of techniques such as RNA-Seq (Nagalakshmi et al. [40]), ribosome profiling (Ingolia et al. [29]) and SILAC (Ong et al. [41]) has greatly improved data availability both at the transcriptomic and at the proteomic level. Furthermore, using these data, many kinetic parameters such as initiation probabilities have been estimated for individual genes (Shah et al. [48]).

Figure 1.


Degrees of freedom of the model. Implemented and individually tracked are ribosomes, tRNA molecules of 41 subtypes and a transcriptome consisting of 15k - 70k transcripts, depending on the modelled scenario (base case or time-resolved).

We describe a single-cell model that integrates several of these parameter sets, using available transcriptomic data as inputs and generating protein synthesis rates as outputs. We apply the model both in a “base case” which aims at describing an average exponential-phase yeast cell and, later on, in a time-resolved setting, which emulates the development and growth of a single yeast cell across its cell cycle.

The modelling framework is a real-time discrete stochastic model resembling the Gillespie approach described in Shah et al. [48] but treats mRNA molecules, ribosomes and tRNAs as individual objects participating in a Monte-Carlo simulation. While we use the same biophysical assumptions, we focus on different analyses than Shah et al. [48]. We chose the formalism as it enables the study of all modelled processes at arbitrary temporal and molecular resolution: individual ribosome speeds, mRNA occupancies, tRNA dynamics etc., and also because it is well-suited for future expansion, e.g. to include transcriptional regulation, tRNA recharging, mRNA and tRNA synthesis and decay or protein degradation.
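The kind of Gillespie-style update that such a framework resembles can be sketched for a single transcript. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the rate values and the one-codon ribosome footprint are hypothetical, and tRNA dynamics are collapsed into one uniform elongation rate.

```python
import random

def simulate_transcript(n_codons, k_init, k_elong, t_end, seed=0):
    """Gillespie-style simulation of ribosomes on one mRNA (hypothetical sketch).

    Assumption: each ribosome occupies exactly one codon and all codons
    share the same elongation rate k_elong (per second).
    Returns the number of finished proteins and the final occupancy list.
    """
    rng = random.Random(seed)
    occupied = [False] * n_codons
    t, finished = 0.0, 0
    while True:
        # Enumerate every reaction that is currently possible.
        events = []
        if not occupied[0]:
            events.append(("init", k_init))
        for pos in range(n_codons):
            last = pos == n_codons - 1
            if occupied[pos] and (last or not occupied[pos + 1]):
                events.append((pos, k_elong))
        total = sum(rate for _, rate in events)
        if total == 0:
            break
        # Waiting time to the next reaction is exponentially distributed.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Choose one reaction with probability proportional to its rate.
        r = rng.uniform(0, total)
        for event, rate in events:
            r -= rate
            if r <= 0:
                break
        if event == "init":
            occupied[0] = True           # ribosome binds the start codon
        elif event == n_codons - 1:
            occupied[event] = False      # termination releases a protein
            finished += 1
        else:
            occupied[event] = False      # elongation by one codon
            occupied[event + 1] = True
    return finished, occupied

finished, occupancy = simulate_transcript(50, k_init=0.1, k_elong=8.0, t_end=1000.0)
```

With an initiation rate far below the elongation rate, as in this toy parameterization, the number of finished proteins is governed almost entirely by `k_init`, which mirrors the initiation-limited regime discussed in the text.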

2. Results

2.1. The model's base case set-up confirms experimental findings

To test the overall validity of the model, the base case set-up (fixed transcriptome, ribosome and tRNA count) was used to simulate a range of observables and compare them to experimental values. A first pair of observables is the speed and efficiency of translation, defined, respectively, as the number of finished proteins for every gene and the number of finished proteins per transcript for every gene (see Fig. 2 A, B).
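The two observables can be computed from simulation counts as sketched here. Normalizing by the simulated time to obtain rates is an assumption, and the function name, gene names and numbers are hypothetical.

```python
def translation_speed_and_efficiency(finished_proteins, transcript_counts, duration_s):
    """Per-gene translation speed (proteins/s) and efficiency (proteins/s/transcript)."""
    speed = {gene: n / duration_s for gene, n in finished_proteins.items()}
    efficiency = {
        gene: speed[gene] / transcript_counts[gene]
        for gene in speed
        if transcript_counts.get(gene, 0) > 0  # skip genes without transcripts
    }
    return speed, efficiency

# Toy values for illustration only (not from the paper)
finished = {"geneA": 1000, "geneB": 10}
transcripts = {"geneA": 100, "geneB": 2}
speed, efficiency = translation_speed_and_efficiency(finished, transcripts, duration_s=100.0)
```

In this toy example the speeds differ by a factor of 100 while the efficiencies differ only by a factor of 2, which illustrates how normalizing by transcript abundance narrows the distribution.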

Figure 2.


Quantitative characterization of protein synthesis. A: Protein synthesis rate overall and per transcript. Distribution of translation speeds of individual genes, showing a variation over four orders of magnitude. B: Distribution of translation efficiencies of individual genes, taking into account transcript abundance. Variation is reduced to approx. 1.5 orders of magnitude with residual variation mostly caused by variation in initiation probabilities. C: Variability of ribosome speed on the transcript. The time required to finish a protein after a ribosome has bound to the start position depends mostly, but not entirely on the length. D: Run-time of ribosomes on transcripts after ribosome has bound shows some variation by gene, in particular because of codon usage (see also Fig. 3). E: Amplification of pre-transcriptional regulation of protein synthesis rates. The slope of the LOWESS regression line (blue) is greater than 1, which indicates that genes that enter translation with high initiation rates emerge with disproportionately high synthesis rates whereas genes with low initiation rates are suppressed. R2 values are of log values where log axes are displayed.

The spread of the two distributions differs widely (four orders of magnitude for translation speed vs 1.5 orders for translation efficiency), which can be attributed to the influence of transcript abundance. The variability remaining after normalizing for the transcript abundance can be ascribed to the variability in initiation probabilities.

Protein synthesis rates predominantly depend on the initiation probability and the transcript abundance (linear regression R2=99%). The ribosome speed on the polysome (conditional on a previous successful initiation event) depends on the codon usage of the gene but no longer on the initiation probability (Fig. 2 C).

The model shows that the time to finish a protein depends mostly, but not exclusively on ORF length (Fig. 2 C), with a ribosome speed distribution across genes centered around 6-10 codons per second for each ribosome (Fig. 2 D). This compares with experimental estimates of 7–8 codons per second (Arava et al. [2]).

The share of free ribosomes in the base case steady state is 18.7% (compared to 15% in Shah et al. [48]). Finally, the model is able to address the question of the co-translational contribution to overall gene regulation. Fig. 2 E shows the relation between transcript abundance and modelled protein synthesis rate. In the double-logarithmic plot, the near-linearity with slope > 1 is evidence of co-translational regulation by amplification of mRNA (Csárdi et al. [12], Shah et al. [48]).
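The amplification criterion (a slope greater than 1 in the double-logarithmic plot) can be checked with a simple fit. The sketch uses an ordinary least-squares line on log10 values rather than the LOWESS regression shown in Fig. 2 E; the function name and the toy data are hypothetical.

```python
import math

def loglog_slope(abundance, synthesis_rate):
    """Least-squares slope of log10(synthesis_rate) against log10(abundance)."""
    lx = [math.log10(v) for v in abundance]
    ly = [math.log10(v) for v in synthesis_rate]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx

# Toy data (not from the paper): rate proportional to abundance**1.2
abundance = [1, 10, 100, 1000]
rate = [0.5 * a ** 1.2 for a in abundance]
slope = loglog_slope(abundance, rate)  # a slope above 1 indicates amplification
```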

2.2. Analysis of ribosome stalling at slow codons

Ribosome densities per gene exhibit characteristic peaks (Fig. 3 A) at certain positions. These can be interpreted as preferred positions for ribosome stalling. The connection between these peaks and slow codons is clarified as follows. For every gene one defines a peak as any position on the gene where the ribosome density is more than 5 standard deviations greater than the mean (5σ being conservative to rule out normally distributed outliers). In a snapshot of the simulation, all peaks are then recorded and codons and anticodons at the peaks are counted. The result (Fig. 3 B) shows a deviation from the uniform distribution with most peaks occurring at the gag anticodon (corresponding to codons cuu and cuc) and fewest at the ccg anticodon (corresponding to codon cgg); the ratio of peak counts varies by a factor of 16.5 between these two anticodons.
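The peak definition can be written down directly. The sketch assumes a per-position ribosome density list for one gene and uses the population standard deviation; whether the population or sample estimator was used is not stated in the text, and the function names and toy values are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev

def stalling_peaks(density, n_sigma=5.0):
    """Positions whose density exceeds the gene mean by more than n_sigma standard deviations."""
    threshold = mean(density) + n_sigma * pstdev(density)
    return [pos for pos, d in enumerate(density) if d > threshold]

def count_peak_codons(codons, peaks):
    """Tally which codon sits at each peak position."""
    return Counter(codons[pos] for pos in peaks)

# Toy gene: flat density with one pronounced peak at the last position
density = [1.0] * 99 + [50.0]
codons = ["gcu"] * 99 + ["cuu"]
peaks = stalling_peaks(density)
peak_codons = count_peak_codons(codons, peaks)
```

Summing such tallies over all genes in a snapshot, and mapping codons to their cognate anticodons, gives the kind of distribution shown in Fig. 3 B.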

Figure 3.


Ribosome stalling as a regulatory mechanism. A: Ribosome stalling peak for a typical gene (Ribosomal 60S subunit protein L16A). B: Non-uniform distribution of ribosome stalling peaks across anticodons. C: Monotonically decreasing relationship between peak counts at anticodons and the corresponding tRNA abundance (in red: LOWESS regression). D: Model-free relationship between anticodon-specific tRNA abundance and anticodon demand (total number of required anticodons weighted by mRNA abundance) (in red: LOWESS regression). R2 values are of log values where log axes are displayed.

Systematically, the peak counts show a monotonically decreasing dependency on tRNA abundance (Fig. 3 C), which can be understood as follows: ribosome stalling becomes more likely if the tRNAs required at a particular codon become less abundant, and vice versa. This is despite the fact that evolution has equilibrated tRNA supply (abundance) and tRNA demand (total frequency of required anticodons, also taking into account transcript abundance) (Fig. 3 D). This means that, at whole-transcriptome level, an anticodon occurs more frequently in the cell if there is a higher overall (transcript abundance-weighted) demand for it. There is one outlier, ccg (bottom left data point in Fig. 3 D), for which demand is exceptionally low even compared to its already low tRNA abundance. This explains the lowest peak frequency at this anticodon in Fig. 3 B.
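The transcript abundance-weighted demand for each anticodon can be sketched as a weighted count. The input structures and the function name are hypothetical; the two codon-to-anticodon assignments are the ones mentioned in Sect. 2.2, and the sequences and abundances are toy values.

```python
from collections import Counter

def anticodon_demand(codon_sequences, transcript_counts, codon_to_anticodon):
    """Total number of required anticodons, weighted by mRNA abundance."""
    demand = Counter()
    for gene, codons in codon_sequences.items():
        copies = transcript_counts.get(gene, 0)
        for codon in codons:
            demand[codon_to_anticodon[codon]] += copies
    return demand

# Codon-to-anticodon pairs as quoted in the text; sequences and counts are toy values
codon_to_anticodon = {"cuu": "gag", "cuc": "gag", "cgg": "ccg"}
sequences = {"gene1": ["cuu", "cuc", "cgg"], "gene2": ["cuu"]}
abundance = {"gene1": 10, "gene2": 5}
demand = anticodon_demand(sequences, abundance, codon_to_anticodon)
```

Plotting such demand values against the corresponding tRNA abundances gives a model-free comparison of the kind shown in Fig. 3 D.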

2.3. Codon usage metrics predict elongation and synthesis rates

Gingold and Pilpel [26] identified the Codon Adaptation Index (CAI; Sharp and Li [49]) and the tRNA Adaptation Index (tAI; Reis et al. [44]), among others, as appropriate metrics of codon usage bias, i.e. of deviations from uniform codon usage (among synonymous codons) in the coding regions of genomic sequences. While CAI only considers amino acid composition of the gene and discriminates between the translation efficiencies of individual codons, tAI additionally takes into account tRNA availability in the cell. Both metrics range between 0 and 1, with 0 indicating uniform codon usage and 1 maximum bias.
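As an illustration of how such a metric is evaluated, the sketch below computes CAI as the geometric mean of per-codon relative adaptiveness values, following the standard definition by Sharp and Li. The weight table is hypothetical and not taken from the paper; tAI has the same geometric-mean form but uses weights derived from tRNA availability.

```python
import math

def codon_adaptation_index(codons, weights):
    """Geometric mean of relative adaptiveness values w (0 < w <= 1)."""
    logs = [math.log(weights[c]) for c in codons]
    return math.exp(sum(logs) / len(logs))

# Hypothetical weights for illustration only
weights = {"cuu": 1.0, "cuc": 0.25}
cai = codon_adaptation_index(["cuu", "cuc"], weights)
```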

Irrespective of the considered metric or parameter (tAI, CAI, initiation rate, transcript count) there is a positive correlation that supports the maximization of the production of frequent proteins and at the same time fine-tuning/regulation of the synthesis of infrequent proteins. These observations carry over to modelled protein synthesis rates and elongation rates. The total synthesis rate depends on CAI and tAI in much the same way as the initiation rate, indicating the importance of the initiation rate for the total synthesis rate (Fig. 4 left panel). As before, although both have positive correlation, tAI displays a less noisy, more regular scatter plot than CAI.

Figure 4.


Codon usage metrics (CAI and tAI) against modelled protein synthesis rates and elongation rates. One marker corresponds to one gene. Red: LOWESS regression lines. R2 values are of log values where log axes are displayed. Left panel: positive dependency of protein synthesis rates on codon usage, slightly higher correlation with tAI than with CAI. Right panel: positive dependency of gene-level elongation speed on codon usage, again slightly higher correlation with tAI.

As far as the elongation rate is concerned, there is a strong correlation of both CAI and tAI with elongation speed (Fig. 4 right panel); the relationship is linear only for tAI.

In summary, tAI is a linear predictor of elongation speed and a non-linear predictor of synthesis rate, while initiation rate is a linear predictor of total protein synthesis rate.

2.4. Cell cycle resolved transcriptome and proteome analysis

With the cell cycle time-resolved transcriptome, protein time courses were simulated across one cell cycle. Budding yeast divides asymmetrically with a large mother and a smaller bud (daughter cell). Here, it was assumed that the bud at cytokinesis had 70% of the mother cell's biomass, so that during one full cell cycle ribosome and tRNA molecule counts were interpolated between 100% (beginning of cell cycle) and 170% (end of cell cycle) of the base case values. In addition, the volume growth of the cell simultaneously implies a reduction of the probabilities of initiation and elongation (ribosomes and tRNA molecules have to diffuse through a comparatively larger volume before meeting an initiation or elongation site). Thus, cell growth is responsible for two opposing effects, which play a role in different parameters of the formalism.

The simulation was carried out separately for every 5 minute interval until a quasi-steady state of ribosome occupancy was reached. The protein synthesis rates in this steady state were measured. To incorporate cell growth, different initial conditions were chosen for mRNA, ribosome and tRNA counts at the interval starting points.
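The growth-dependent initial conditions can be sketched as follows. Linear interpolation over a 60 minute cycle is an assumption (the text only states that counts were interpolated between 100% and 170%), the base ribosome count is the value quoted in Table 1, and the function name is hypothetical.

```python
def growth_factor(t_min, cycle_min=60.0, end_factor=1.7):
    """Linear scaling from 1.0 at the start of the cycle to end_factor at its end."""
    frac = min(max(t_min / cycle_min, 0.0), 1.0)
    return 1.0 + (end_factor - 1.0) * frac

BASE_RIBOSOMES = 200000          # base case value quoted in Table 1
interval_starts = range(0, 60, 5)  # one simulation per 5 minute interval

# Initial ribosome count at the start of each interval
ribosomes = [round(BASE_RIBOSOMES * growth_factor(t)) for t in interval_starts]
```

The same factor would apply to the tRNA counts. The opposing volume effect on the initiation and elongation probabilities is not included in this sketch.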

Results are shown in Fig. 6. It is visible that the time course of the mean translational efficiency is opposite to that of the transcript abundance, indicating that a limitation is in effect, namely by the ribosome initiation events. The same effect is present at the level of individual genes, albeit with additional noise.

Figure 6.


Time-resolved transcriptome and translational yield per transcript. Top: modelled time course of total transcript abundance across cell cycle, obtained by scaling data from FISH experiments to relative data from RNA-Seq experiment. Bottom: mean (across transcriptome) translational efficiency per transcript obtained from simulating cell-cycle resolved transcriptome to steady state in five minute intervals.

A further analysis was carried out to map translational efficiencies to functional categories (chosen from Proteomaps (Liebermeister et al. [34])). The visualization was improved by omitting all genes with unknown pathway and all categories containing fewer than 50 genes. The results are displayed in Fig. 5 A–D.

Figure 5.


Time courses by functional category. A: Translation efficiencies with no major shifts between functional categories appearing during the cell cycle. Proteins with unknown function and any categories with counts below 50 are not displayed in chart. B: Relationship between initiation rates and translation efficiencies by functional category. Ribosomal proteins are translated more efficiently than any other category, most of which can be attributed to their higher initiation rates. Blocking of initiation sites at high initiation rates leads to a saturation effect. The upper envelope of the point cloud corresponds to genes that have no ribosome stalling. Proteins with unknown function and any categories with counts below 50 are not displayed in chart. C: Ribosome efficiency ranking by functional group vs. time. D: Estimated absolute number of transcripts for eight histone proteins. The maximum occurs at the transition from late G1 to S phase.

3. Discussion

A model for protein translation in the yeast S. cerevisiae was presented which exhibits good agreement for a broad range of observables both from experiment and from other theoretical studies. The model reproduces the limiting characteristics of ribosome initiation that were already pointed out by other studies (Shah et al. [48]) and adds to this a secondary mechanism of translational limitation, ribosome stalling at slow codons, as found experimentally in the analysis of ribosome profiling data and using theoretical models by Dana and Tuller [14] and Gardin et al. [23]. Our model goes beyond that of Pop et al. [43] as it can produce non-equilibrium and transient dynamics that may be important during the cell cycle. In contrast to a comparable approach proposed by Brackley et al. [6], our model is applied at genome scale, describes elongation microscopically, and uses gene-specific initiation rates but does currently not model tRNA recharging. Where the same parameters are used, our results are consistent with those in Shah et al. [48]; this study adds further analyses in Sect. 2.2 and Sect. 2.3 and a novel, time-dependent parameter set in Sect. 2.4.

Protein synthesis rates are discussed quantitatively in Arava et al. [2] (microarray analysis) and von der Haar [55] (using different experiments). While Arava et al. [2] find a total synthesis rate of approximately 1900 prot./cell/s, von der Haar [55] quotes an estimate of 13000 prot./cell/s (between 6500 and 19500 prot./cell/s). Using the parameterization for the time-averaged model (with 60000 transcripts), our model yields an estimate of close to 6000 prot./cell/s, which places it approximately in the middle between the two and approximately one standard deviation from von der Haar [55]. The time-resolved parameterization used in Sect. 2.4 has a lower average transcript count of 38000 transcripts (average over a one-hour cell cycle) and gives a slightly different result. There, the total number of proteins produced per second is 2400 which, while still placing the model between the two estimates, is closer to the Arava et al. [2] estimate than to the von der Haar [55] estimate.

Several sources in the literature discuss the speed of elongation or speed of the ribosome on the transcript: Karpinets et al. [31] give a range of 2.8–10 AA/s/ribosome citing Waldron and Lacroute [56] and Boehlke and Friesen [5]. Both studies investigate the connection between growth rate (generations per unit time) and elongation rate and find it to be a linear relationship (Fig. 8 in Waldron and Lacroute [56] and Fig. 2 in Boehlke and Friesen [5]). Altogether these numbers, including the ranges quoted in the original studies are in good agreement with the findings presented in Fig. 2.

A consequence thereof is the co-translational amplification of mRNA abundance, defined as the cooperation of various parameters and mechanisms to produce protein synthesis rates that grow more than proportional with the number of transcripts. In essence, Fig. 2 shows this amplification of mRNA during translation. Csárdi et al. [12] have stated that protein levels rise much more rapidly than proportional to mRNA levels, citing stability of mRNA structures in the 5' region which leads to an increased density of ribosomes on high-expression mRNAs. The same would hold true of other regulatory mechanisms such as promoter switching. These are summarized by the translation initiation parameter in our model as the main contributor to this amplification. Also, Weinberg et al. [58] find the same effect in their unbiased data set.

In principle, two mechanisms may be responsible for ribosome pausing (defined as a ribosome staying at one position for an above-average period of time), namely ribosome queuing (one ribosome is stopped sterically by another ribosome 3' downstream) and tRNA-induced ribosome stalling described in the following. McCarthy [37] states that in the initiation-limited regime, ribosome queuing is infrequent, yet Diament et al. [18] find a share of 0.2–0.35 of ribosome pairs among all ribosome footprints in their Ribo-seq data.

The peak sorting of Fig. 3B can be compared with the ribosome residence time (RRT) results quoted in Gardin et al. [23]. A grouping of the RRTs by cognate anticodon and subsequent averaging by anticodon yields a Spearman correlation of 61% to the peak frequencies reported above which is plausible given the simplicity of this averaging/grouping method and the additional processing by Gardin et al. [23].

Ben-Yehezkel et al. [4] found a positive correlation between mean decoding rates (Dana and Tuller [14]) and protein abundance which is consistent with the one found here despite the absence of mRNA folding in our model.

Regarding the variation of translation efficiency, the ratio between the 99th and 1st percentiles of the efficiency distribution is 21 in our model. For comparison, the factor of 100 found by Ingolia et al. [29] appears inconsistent with this. At the same time, however, Weinberg et al. [58] find a 15-fold range for the initiation efficiency (also estimated from ribosome profiling data); the closer agreement may be largely due to the fact that this work uses the same initiation probabilities.
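A percentile ratio of this kind can be computed as sketched below. The choice of interpolation rule (the default of Python's `statistics.quantiles`) is an assumption, as the text does not specify one, and the function name and toy data are hypothetical.

```python
from statistics import quantiles

def percentile_ratio(values, low=1, high=99):
    """Ratio of the high-th to the low-th percentile of a distribution."""
    cuts = quantiles(values, n=100)  # 99 cut points; cuts[k - 1] is the k-th percentile
    return cuts[high - 1] / cuts[low - 1]

# Toy efficiencies (not from the paper): evenly spaced values from 1 to 1000
ratio = percentile_ratio(list(range(1, 1001)))
```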

The particular features of ribosome kinetics discovered at certain codons indicate a dynamical way of cotranslational regulation of protein synthesis which depends on an intricate balance of codon usage, translation initiation as well as ribosome and tRNA availability. It was shown that the codon-level regulatory mechanisms can be viewed as local deviations from an otherwise evolutionarily adapted balance between tRNA supply and tRNA demand. In other words, although supply and demand are well adapted at the whole-transcriptome level, the usage of slow (fast) codon/anticodon pairs will still cause a deceleration (acceleration) of protein synthesis at the level of the individual transcript.

Brackley et al. [6] also found a balance between supply and demand for a particular tRNA in their statistical mechanics inspired TASEP model. In particular they explained that a mismatch between supply and demand can become the rate-limiting step of translation. An important difference between the Brackley et al. [6] model and this model is that the former was only able to simulate systems around two orders of magnitude smaller than a realistic yeast cell while our model is able to simulate the whole transcriptome. The high agreement between the relative codon speeds of the TASEP model and the peak frequencies measured using our model (Fig. 3) is particularly remarkable as the TASEP model uses a parameterization that is independent from ours.

Similarly, Gorgoni et al. [27] employed a TASEP model to simulate the interplay between rare codons, tRNA abundance, initiation rates and translation efficiency in a way that is consistent with our findings.

A whole-cell model of translation with similar assumptions to ours, based on the TASEP model but parameterised for E. coli was developed by Levin and Tuller [33]. It would be interesting to compare results for the same organism.

All gene-level parameters and metrics (tAI, CAI, transcript count, initiation rates) display a positive correlation to one another, which is plausible from an evolutionary perspective. Where evolutionary pressure selects for high protein abundance, it will affect all parameters simultaneously, as opposed to, e.g. only codon usage, unless certain parameters offer additional regulatory mechanisms, such as co-translational folding.

In the same way, an analysis comparing transcript count with codon usage leads to a similar conclusion, namely that genes with low expression have lower CAI and tAI. It is natural to assume that these genes are more strongly regulated which is partially achieved by codon usage, and partially by strength of expression (transcript abundance). McManus et al. [38] found that codon usage bias was positively correlated to mRNA abundance, ribosome occupancy, and translation efficiency which fits well with the above codon usage analysis results.

While the interpretation of common optimization of gene expression fits well with the above general statement, there are no direct mechanistic links between the two because the initiation rate only depends on the RNA sequence upstream from the start codon, whereas tAI depends entirely on the sequence downstream from the start codon. This is understandable from molecular evolution as highly expressed genes undergo selection to simultaneously improve their elongation and their initiation rate.

Overall, two contributions to regulation emerged which complement each other in their properties and cooperate by amplification to produce the required protein synthesis rates: on one hand, the energy-intensive but adjustable regulation by the mRNA abundance, on the other, the inexpensive but also inflexible regulation by translation initiation and codon usage. In this sense, the cell appears to regulate translation by enhancing high abundance mRNAs and repressing low abundance mRNAs. This is also consistent with an observation by Man and Pilpel [36], according to which codon usage and transcript abundance cooperate to yield higher protein synthesis rates.

Our model does not take into account slowdown due to mRNA secondary structure and interaction with amino acids and may thus overemphasize the influence of codon/anticodon usage on translational regulation. In addition, both the time-resolved and the average-case simulations are affected by the fact that RNA-Seq and Ribo-Seq data are often noisy (Gerashchenko and Gladyshev [24], [25], Diament and Tuller [17]).

The procedure to derive the time-resolved results (Sect. 2.4) may imply error sources such as the following:

  • Absolute and relative mRNA abundance data stem from two different experiments. Strains, media and conditions have slight differences, and the experimental preparations were different (synchronized vs. unsynchronized populations etc.) which may lead to differences in gene expression.

  • The mapping of the cell cycles can only be an approximation because the cell cycle lengths differ between the two experiments and the RNA-Seq experiment largely shadows the early G1 phase.

  • The number of observed FISH genes may be too small to allow for extrapolation to the whole transcriptome.

  • The statistical model may not be a good approximation for all genes; in particular the assumption that one time-resolved scaling factor is sufficient for all genes may not be accurate.

Time dependence of mRNA abundance across the cell cycle was analysed theoretically and experimentally by Zopf et al. [61], albeit with a focus on stochasticity/bursts and not at a whole-transcriptome level. One reference, Eser et al. [19], suggests that maxima of mRNA peak times occur at the end of G1 phase. While mRNA peak times are not the same as mRNA abundances, a rough correspondence may be expected, which is consistent with our findings. Biologically, mRNA peaks should precede protein peaks by several minutes as this corresponds to the time to produce noticeable amounts of proteins. Other references (Cao and Grima [9], Beentjes et al. [3]) include stochasticity in their analytical models but do not quote the time-resolved mRNA content of the cell across the cycle. Both studies, as well as Perez-Carrasco et al. [42], focus on the variability and bursts of the number of mRNAs rather than on the actual cell cycle dependence. It would be very interesting to unify both effects to obtain deeper insights into mRNA dynamics. It was shown that there is a strong dependency of the efficiency of translation on the amount of mRNA in the cell. If it is indeed the case that the absolute level of mRNA fluctuates strongly during the cell cycle (which is supported by the combination of FISH and RNA-Seq experiments), then this has significant implications for the efficiency of protein translation. Under these conditions, translational efficiency might vary by a factor of three or more in the course of the cell cycle. We have not yet combined the two types of analyses, cell-cycle dependency and codon/anticodon usage, into one model setup. This analysis is planned as a next step and would allow a comparison with the results by Frenkel-Morgenstern et al. [21]. In a similar manner, more granular input data would be required to model the regulation of elongation across the cell cycle as observed by Sabi and Tuller [45].

Our analysis shows that the inverse relationship between translational efficiency and transcript abundance reduces the variability of cellular protein synthesis. This is plausible as it saves the cell from having to assemble/disassemble the translational apparatus in every cell cycle.

These findings are consistent with the ones obtained by Zarai and Tuller [59] in a TASEP model, although this paper does not attempt to provide an explicit model of the cell cycle.

The right-leaning curve in the scatter plot in Fig. 5 B indicates a degree of saturation, i.e. further increasing the initiation rate would have a lower marginal impact on the translation efficiency. This is because a high initiation rate leads to frequent blocking events, in which a ribosome attempts in vain to bind to an occupied initiation site.

The ranked average translation efficiencies in Fig. 5 remove the noise from the gene-level analysis and allow a biological interpretation. The most striking feature is the time course of the rank of ribosomal proteins: ribosomes appear to be more enhanced in S phase than other protein categories.

This may be the case because S phase precedes G2 phase during which the cell prepares for mitosis and a major part of protein synthesis takes place. Protein synthesis would then require a ramp-up of ribosomes. The reason why the ribosomal proteins are enhanced in S phase and not in G2 phase might be the high complexity of ribosome biosynthesis which may take several minutes, a significant part of a cell cycle phase. Therefore, in particular when the cell cycle is abnormally short as in the RNA-seq data set, synthesis of ribosomal proteins, as a precondition to synthesis of all other proteins needs to occur in S phase, preceding and adjacent to G2 phase.

It appears that ribosome synthesis is largely completed with S phase; instead, the synthesis of building blocks for protein synthesis, but also of chaperones and folding catalysts, comes to the fore in G2 phase.

Energy metabolism, represented here by glycolysis, ranks highest in efficiency in early G1 phase as well as in G2 and M phase. This can be interpreted as the supply of energy equivalents for protein synthesis.

Finally, regarding the total number of synthesized protein molecules per cell, literature values for S. cerevisiae are in the region of 5 ⋅ 10⁷ protein molecules per cell (Futcher et al. [22]) or higher. The value of 8.81 ⋅ 10⁶ synthesized protein molecules per hour produced by our model for the time-resolved case appears to underestimate this, and the value of 21.7 ⋅ 10⁶ synthesized protein molecules per hour for the case that is not time-resolved does so somewhat less. Nevertheless, the two mutually independent estimates in Table 1 and Table 2 partially explain the modelled value and reconcile it to some extent with the Futcher et al. [22] value.

Table 1.

Estimate for number of proteins produced per cell per hour using free ribosomes and initiation rates. As before, it is assumed that ribosome drop-off does not occur. The lower and upper bounds assume the average mRNA transcript count from the time-resolved parameterization and the time-independent parameterization, respectively. (The time-independent parameterization uses a literature value, whereas the time-resolved parameterization uses the model laid out in Sect. 2.4.)

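| Quantity | Value |
|---|---|
| Number of ribosomes | 200000 |
| × fraction of which free | × 16% |
| = Free ribosomes | 32000 |
| × avg. initiation rate (transcript-weighted) | × $2.2 \cdot 10^{-6}\,\mathrm{s}^{-1}$ … $3.5 \cdot 10^{-6}\,\mathrm{s}^{-1}$ |
| = Avg. initiations per transcript per second | $0.071\,\mathrm{s}^{-1}$ … $0.112\,\mathrm{s}^{-1}$ |
| × number of transcripts per cell | × 39315 … 60000 |
| = Number of initiations per cell per second | $2784\,\mathrm{s}^{-1}$ … $6735\,\mathrm{s}^{-1}$ |
| × seconds per hour | × 3600 s/h |
| = Total number of proteins synthesized per hour | $10.0 \cdot 10^{6}\,\mathrm{h}^{-1}$ … $24.2 \cdot 10^{6}\,\mathrm{h}^{-1}$ |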
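The chain of multiplications in Table 1 can be retraced with a few lines of Python. This is only a sketch of the arithmetic; the function and variable names are illustrative and not part of the model code. Because the table rounds intermediate values, the results agree with the tabulated bounds only to within rounding.

```python
# Retrace the initiation-based estimate of Table 1.
TOTAL_RIBOSOMES = 200_000
FREE_FRACTION = 0.16          # share of ribosomes that are free


def proteins_per_hour_from_initiation(avg_init_rate, n_transcripts):
    """Proteins per cell per hour, assuming every initiation yields a protein
    (no ribosome drop-off).

    avg_init_rate -- transcript-weighted initiation rate per free ribosome (1/s)
    n_transcripts -- number of mRNA transcripts per cell
    """
    free_ribosomes = TOTAL_RIBOSOMES * FREE_FRACTION        # 32 000
    init_per_transcript = free_ribosomes * avg_init_rate    # 1/s per transcript
    init_per_cell = init_per_transcript * n_transcripts     # 1/s per cell
    return init_per_cell * 3600                             # 1/h per cell


# Lower bound: time-resolved parameterization; upper bound: time-independent.
lower = proteins_per_hour_from_initiation(2.2e-6, 39_315)
upper = proteins_per_hour_from_initiation(3.5e-6, 60_000)
print(f"{lower:.3g} ... {upper:.3g} proteins per cell per hour")
```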

Table 2.

Estimate for number of proteins produced per cell per hour using bound ribosomes and elongation rates.

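| Quantity | Value |
|---|---|
| Number of ribosomes | 200000 |
| × fraction of which bound | × 84% |
| = Bound ribosomes | 168000 |
| × peptide bonds per ribosome per second | × $8\,\mathrm{s}^{-1}$ |
| = Peptide bonds per cell per second | $1.3 \cdot 10^{6}\,\mathrm{s}^{-1}$ |
| × seconds per hour | × 3600 s/h |
| = Peptide bonds per cell per hour | $4.8 \cdot 10^{9}\,\mathrm{h}^{-1}$ |
| ÷ average peptide bonds per protein (transcript-weighted) | ÷ 390 |
| = Total number of proteins synthesized per hour | $12.4 \cdot 10^{6}\,\mathrm{h}^{-1}$ |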
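The elongation-based estimate of Table 2 can be checked in the same way. The snippet below only reproduces the arithmetic of the table; the constant names are illustrative.

```python
# Retrace the elongation-based estimate of Table 2.
TOTAL_RIBOSOMES = 200_000
BOUND_FRACTION = 0.84                   # share of ribosomes bound to mRNA
PEPTIDE_BONDS_PER_RIBOSOME_PER_S = 8    # elongation speed per bound ribosome
AVG_PEPTIDE_BONDS_PER_PROTEIN = 390     # transcript-weighted average

bound_ribosomes = TOTAL_RIBOSOMES * BOUND_FRACTION                  # 168 000
bonds_per_second = bound_ribosomes * PEPTIDE_BONDS_PER_RIBOSOME_PER_S
bonds_per_hour = bonds_per_second * 3600
proteins_per_hour = bonds_per_hour / AVG_PEPTIDE_BONDS_PER_PROTEIN

print(f"{proteins_per_hour:.3g} proteins per cell per hour")
```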

This discrepancy might be explained as follows:

  • First, it should be noted that the value of $8.81 \cdot 10^{6}$ synthesized protein molecules per hour quoted in Sect. 2.4 is derived from the time-resolved parameter set, so it is to be compared with the lower bound in Table 1, with which it agrees relatively well. The value of $21.7 \cdot 10^{6}$ from the generic model is closer to the value in Futcher et al. [22].

  • The cell cycle length found for the α-factor synchronized cells is abnormally short (60 min as opposed to the typical cell cycle length of 160 min), which gives the cells less time to produce proteins. The cells are arrested in G1 phase and then released. As a result of this cell cycle block, such cells grow to an abnormally large size and can be expected to have “pre-produced” a major fraction of the proteins that are required for division. In consequence, they do not need to produce as many proteins as cells undergoing a typical cell cycle.

In summary, a better understanding of the regulatory role of protein translation would be an important step towards controlling protein output in a targeted manner. For example, where codon usage does have an impact on protein yields, our model could be used to predict how a mutation towards synonymous codons would influence not only the yield of the target gene but also the synthesis rates of all other proteins, making the model a potential tool for bioengineering.

Funding statement

No funds have been received.

CRediT authorship contribution statement

Martin Seeger: Analyzed and interpreted the data, contributed reagents, materials, analysis tools or data, wrote the paper. Max Flöttmann: Analyzed and interpreted the data. Edda Klipp: Analyzed and interpreted the data, wrote the paper.

Declaration of Competing Interest

The authors declare no conflict of interest.

Contributor Information

Martin Seeger, Email: martin.seeger@hu-berlin.de.

Max Flöttmann, Email: max.floettmann@biologie.hu-berlin.de.

Edda Klipp, Email: edda.klipp@rz.hu-berlin.de.

Appendix A. STAR methods

A.1. Initiation submodel

In a discretized cell divided into $N_r$ voxels, a single ribosome diffuses to one neighbouring voxel of side length $\lambda_r$ per time $\tau_r$. Equivalently, a fixed position, e.g. that of a given transcript's translation initiation site, is visited by a fixed ribosome at the rate $(\tau_r N_r)^{-1}$.

With gene-specific initiation probabilities $p_{\mathrm{init},g}$ it follows that the gene-specific initiation rate per ribosome (Eq. (A.1)) is given by

$$r_{\mathrm{init},g} = \frac{p_{\mathrm{init},g}}{\tau_r N_r}. \tag{A.1}$$

In the Monte-Carlo simulation time is discretized into intervals of size $\Delta t$, so in this interval a ribosome arrives at a given initiation site with probability $p_{\mathrm{init},g}\,\Delta t / (\tau_r N_r)$. The number of ribosomes in the cell is decomposed into free (f) and bound (b) ribosomes (Eq. (A.2)), which reads

$$R = R_f + R_b. \tag{A.2}$$

Therefore, the number of ribosomes arriving in the time interval $\Delta t$ at a given mRNA's initiation site is a binomial random variable with parameters $n = R_f$ and $p = r_{\mathrm{init},g}\,\Delta t$. Due to the large number of ribosomes ($\sim 10^{5}$) and the small initiation probabilities ($\sim 10^{-6}$ for a time interval of $\Delta t \approx 0.1\,\mathrm{s}$), this variable can be replaced by a Poisson variable in the usual way. The model accounts for steric hindrance, so once an initiation was successful, another initiation cannot occur until the first ribosome has proceeded downstream by at least its “footprint”, corresponding to 10 codons.
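To illustrate why the binomial variable can be replaced by a Poisson variable in this regime, the following sketch compares the two probability mass functions. The number of free ribosomes is taken from Table 1; the per-ribosome arrival probability of $10^{-6}$ per time step is an illustrative order of magnitude, not a fitted parameter, and the helper names are hypothetical.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


n_free = 32_000      # free ribosomes R_f (16% of 200 000, see Table 1)
p_arrive = 1e-6      # illustrative arrival probability per ribosome per step
lam = n_free * p_arrive

# Pairs of (binomial, Poisson) probabilities for 0, 1 and 2 arrivals.
comparison = [(binomial_pmf(k, n_free, p_arrive), poisson_pmf(k, lam))
              for k in range(3)]
for k, (b, q) in enumerate(comparison):
    print(f"k={k}: binomial={b:.6e}  poisson={q:.6e}")
```

For these parameter values the two distributions differ by well under 0.01% in relative terms for the first few counts, which is why the cheaper Poisson draw is a reasonable substitute.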

A.2. Elongation submodel

The TRSL model identifies different tRNA types that possess the same anticodon. In yeast this yields 41 tRNA types, not counting the initiator tRNA, as the latter's dynamics are summarized in the initiation probability (the ribosome starts translation charged with an initiator tRNA). The number of tRNAs of type i is again decomposed into free (f) and bound (b) subtypes as given in Eq. (A.3), with

$$T_i = T_{f,i} + T_{b,i}. \tag{A.3}$$

To determine the parameters of the tRNA diffusion and binding experiment, one defines the elongation rate as given in Eq. (A.4) with

$$r_{\mathrm{elong}} = \frac{c}{\tau_t N_t}, \tag{A.4}$$

where the competition coefficient $c$ describes the competition between different tRNA species.

To determine the dimensionless diffusion probability (Eq. (A.5)) one multiplies this rate by a codon-dependent wobble factor $w_i$ and the simulated time interval $\Delta t$,

$$p_{\mathrm{elong},i} = \frac{c\, w_i\, \Delta t}{\tau_t N_t}. \tag{A.5}$$
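As a numerical illustration of Eqs. (A.4) and (A.5), the sketch below evaluates the per-tRNA binding probability with the base-case values of Table A.3. It assumes that the number of tRNA voxels $N_t$ is the cell volume divided by the tRNA voxel size; the function name is illustrative.

```python
# Base-case parameters from Table A.3.
CELL_VOLUME = 4.2e-17          # m^3
TRNA_VOXEL_VOLUME = 3.4e-24    # m^3
TRNA_DIFFUSION_TIME = 4.5e-7   # s, tau_t
COMPETITION_COEFF = 7.8e-4     # c

# Assumption: N_t = cell volume / tRNA voxel volume.
N_TRNA_VOXELS = CELL_VOLUME / TRNA_VOXEL_VOLUME

# Eq. (A.4): elongation rate per free tRNA molecule (1/s).
R_ELONG = COMPETITION_COEFF / (TRNA_DIFFUSION_TIME * N_TRNA_VOXELS)


def elongation_probability(wobble, dt):
    """Eq. (A.5): probability that one given free tRNA binds within dt."""
    return wobble * R_ELONG * dt


p_match = elongation_probability(1.0, 0.1)       # Crick-Watson pair
p_wobble = elongation_probability(0.625, 0.1)    # wobble mismatch
print(f"r_elong = {R_ELONG:.3e} 1/s")
print(f"p_elong = {p_match:.3e} (match), {p_wobble:.3e} (wobble)")
```

Multiplying the per-tRNA rate by the tRNA abundances of Table A.3 (about $10^{4}$ to $2 \cdot 10^{5}$ molecules per type) gives binding rates of a few to a few tens of events per second per ribosome, the same order of magnitude as the 8 peptide bonds per second used in Table 2.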

In order to model several successive elongation steps per time interval, the following method is applied:

  • The probability for all $T_{f,i}$ free tRNAs to not bind at the A site (Eq. (A.6)) is generated as
    $$p_{\mathrm{fail},i} = (1 - p_{\mathrm{elong},i})^{T_{f,i}}, \tag{A.6}$$
    corresponding to a geometric distribution.

  • A random experiment is carried out to determine whether at least one cognate tRNA was able to bind in the time interval $\Delta t$ (with the concomitant ribosome translocation and peptide elongation).

  • If a binding event was successful, it occurred on average in the middle of the time interval. The time remaining in this simulation time slot for another tRNA to bind is the rest, i.e. $\Delta t/2$. The experiment is repeated with the halved time interval and an updated value for $p_{\mathrm{fail},i}$, because a new codon is now in focus, which requires updating the wobble factor and the free tRNA abundance.

  • This procedure is repeated until no binding occurs (which will eventually be the case because the average probability is reduced by a factor of 2 in every step).

This procedure permits an elongation of the polypeptide by more than one amino acid per time step (which is obviously not necessary for the initiation).
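The repeated experiment with a halved time interval can be sketched as follows. This is a simplified illustration, not the TRSL implementation: it treats a single ribosome, keeps the free tRNA pool constant within the time step, ignores steric hindrance, and caps the binding probability at 1. All names are hypothetical.

```python
import random


def elongation_steps(codons, free_trna, rate, dt, rng):
    """Count the codons one ribosome translates within a time step dt.

    codons    -- sequence of (trna_type, wobble_factor) for the upcoming codons
    free_trna -- dict mapping trna_type to the number of free tRNAs T_f,i
    rate      -- elongation rate per free tRNA, c / (tau_t * N_t), in 1/s
    rng       -- random.Random instance
    """
    steps = 0
    remaining = dt
    for trna_type, wobble in codons:
        # Binding probability per tRNA in the remaining time (Eq. (A.5)),
        # capped at 1 so that it stays a valid probability.
        p_elong = min(1.0, wobble * rate * remaining)
        # Probability that none of the free cognate tRNAs binds (Eq. (A.6)).
        p_fail = (1.0 - p_elong) ** free_trna[trna_type]
        if rng.random() < p_fail:
            break             # no binding: stop for this time step
        steps += 1            # binding, translocation, peptide elongation
        remaining /= 2.0      # on average half of the interval is left
    return steps


rng = random.Random(0)
upcoming = [("type_a", 1.0), ("type_b", 0.625), ("type_a", 1.0)]
pool = {"type_a": 50_000, "type_b": 20_000}   # illustrative abundances
n_steps = elongation_steps(upcoming, pool, rate=1.4e-4, dt=0.1, rng=rng)
print(n_steps)
```

Because the remaining time is halved after each success, the success probability shrinks from step to step and the loop ends after a small number of elongations.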

A.3. Model description

Protein synthesis was simulated by a discrete stochastic Monte-Carlo model. The model components are (1) the mRNA transcripts (the “transcriptome”), (2) ribosomes (free or bound at a certain codon position of one of the mRNAs), and (3) tRNAs of one of 41 cognate types, distinguished by their anticodons (Fig. 1). mRNA molecules are characterized by their nucleotide sequence and ribosome occupancy (by exact nucleotide position); every mRNA molecule is maintained as a separate object. A polysome is defined as an mRNA molecule in complex with one or more ribosomes. Every ribosome on a polysome can be unoccupied or occupied by a tRNA molecule. Bound ribosomes are displaced by one codon after the tRNA type required by the current codon has bound; after displacement the nascent polypeptide is elongated by one unit and the tRNA molecule is released. tRNA molecules can bind only to unoccupied ribosomes on an mRNA transcript; the competition between different tRNA species is accounted for by an explicit coefficient (Shah et al. [48]) based on the ratio of arrival rates of cognate and non-cognate tRNA species at the focal ribosome's A site (Fluitt et al. [20]). In every elementary reaction the model conserves the total number of ribosomes and tRNA molecules while distinguishing between the bound and free ribosome and tRNA species. ER-bound ribosomes are not modelled differently than cytosolic ribosomes.

The diffusion dynamics of ribosomes and tRNAs that form the kinetic basis of the model are described in Shah et al. [48]. At the core of the simulation are two classes of random experiments: the first evaluates a Poisson random variable in order to determine how many ribosomes diffuse to a given mRNA's initiation site in a given time step (typically 0.1...1.0 s). If successful, a ribosome is considered bound to the first codon of the given transcript, unless another ribosome is already bound to one of the first 10 codons including the start codon (the “footprint” of a ribosome leads to steric hindrance). By the same effect, any two ribosomes cannot come closer than 10 codons to each other on a polysome (“ribosome queueing” Brackley et al. [6]).
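The steric-hindrance rule described above can be expressed as two small checks. The sketch assumes that ribosome positions are stored as zero-based codon indices of the occupied codon, with index 0 being the start codon; this representation and the function names are illustrative choices, not taken from the model code.

```python
FOOTPRINT = 10  # ribosome footprint in codons (Table A.3)


def can_initiate(occupied):
    """True if no ribosome sits on one of the first FOOTPRINT codons."""
    return all(pos >= FOOTPRINT for pos in occupied)


def can_advance(pos, occupied):
    """True if the ribosome at codon `pos` may move to `pos + 1`.

    After the move, every downstream ribosome must still be at least
    FOOTPRINT codons away.
    """
    return all(other - (pos + 1) >= FOOTPRINT
               for other in occupied if other > pos)


print(can_initiate([]))          # empty transcript
print(can_initiate([9, 40]))     # a ribosome still covers the initiation site
print(can_advance(0, [0, 10]))   # the downstream ribosome is too close
print(can_advance(0, [0, 11]))   # enough room to translocate
```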

The second experiment evaluates a binomial variable for each unoccupied bound ribosome to simulate the binding of a codon-specific tRNA molecule to the ribosome as the ribosome translocates by one codon on the mRNA. Termination is modelled as an instantaneous event upon occurrence of a stop codon.

Translation initiation, elongation and termination are simulated for every ribosome and mRNA transcript. Using a small time step (∼0.1 s) so that the reservoirs of ribosomes and tRNA molecules do not change significantly, the three processes can be treated independently.

A.4. Choice of parameters and input data

Translation initiation in the model is determined by the product of the gene-specific initiation probability and the number of free ribosomes. Probabilities of translation initiation can only be inferred indirectly by ribosome profiling (e.g. Shah et al. [48], Siwiak and Zielenkiewicz [50] using data from Ingolia et al. [29], Pop et al. [43] using data from Ingolia et al. [30]) or microarrays (e.g. Ciandrini et al. [11]). This study uses the set of translation initiation probabilities also used in Weinberg et al. [58], which controls for Ribo-Seq protocol-specific biases.

To calculate diffusion probabilities the cell is “voxelized” by dividing its volume by the number of possible discrete ribosome positions (Shah et al. [48]). Initiation rates used herein are obtained from initiation probabilities by dividing by the ribosome's diffusion time and the number of ribosome positions.
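The conversion between initiation probabilities and initiation rates can be illustrated with the base-case values of Table A.3. The sketch assumes that the number of ribosome positions is the cell volume divided by the ribosome voxel size; the function names are illustrative.

```python
# Base-case parameters from Table A.3.
CELL_VOLUME = 4.2e-17               # m^3
RIBOSOME_VOXEL_VOLUME = 2.7e-23     # m^3
RIBOSOME_DIFFUSION_TIME = 5.0e-4    # s, tau_r

# Assumption: N_r = cell volume / ribosome voxel volume.
N_RIBOSOME_VOXELS = CELL_VOLUME / RIBOSOME_VOXEL_VOLUME

# Mean time for one ribosome to visit one fixed voxel, tau_r * N_r (s).
VISIT_TIME = RIBOSOME_DIFFUSION_TIME * N_RIBOSOME_VOXELS


def initiation_rate(p_init):
    """Eq. (A.1): initiation rate per free ribosome (1/s)."""
    return p_init / VISIT_TIME


def initiation_probability(r_init):
    """Inverse of Eq. (A.1): initiation probability for a given rate."""
    return r_init * VISIT_TIME


print(f"N_r = {N_RIBOSOME_VOXELS:.3e}, tau_r * N_r = {VISIT_TIME:.1f} s")
# Probability corresponding to the largest initiation rate in Table A.3.
print(f"p_init = {initiation_probability(1.4e-5):.3e}")
```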

To define a base case of the model describing an average cell in exponential phase, analyses are conducted with a fixed transcriptome, corresponding to the distribution measured in Weinberg et al. [58]. There are 4475 yeast genes for which a complete data set, including initiation probability, is available. Assuming a total mRNA abundance of 60000 (Zenklusen et al. [60]), mRNA transcripts were drawn from the distribution of this gene set. In agreement with Shah et al. [48], 200000 ribosomes were used. As for the tRNA molecules, abundances are given in Chu et al. [10], based on tRNA gene copy numbers. This data set is used together with the yeast-specific codon-anticodon mapping from Hani and Feldmann [28].

For elongation the cell is voxelized using the possible discrete tRNA positions (Shah et al. [48]). The tRNA binding probability per unit time is given by the product of the wobble factor (0.625...1.0) with the tRNA competition coefficient (Shah et al. [48]), divided by the tRNA molecule's diffusion time and the number of distinct positions. The number of attempts in the binomial experiments is then given by the number of unbound cognate tRNA molecules.

Because there might be more than one successful tRNA binding event per ribosome in every time step, in the case of successful tRNA binding the nascent peptide is elongated by one amino acid and the binomial experiment is repeated with a halved time step and the next required tRNA type. This is iterated until no new binding takes place.

To obtain steady-state observables, we simulated cells to equilibrium ribosome occupancy which was reached after a “burn-in” period of about 600 simulation seconds. For measurements of dynamical quantities, we continued simulations until 3600 s.

All model parameters used in the base case are summarized in Table A.3.

Table A.3.

Model parameters used in the base case scenario.

| Parameter | Value | Source |
|---|---|---|
| Cell volume | $4.2 \cdot 10^{-17}\,\mathrm{m}^{3}$ | Siwiak and Zielenkiewicz [50] |
| Ribosome abundance | $2.0 \cdot 10^{5}$ | Warner [57] |
| Ribosome diffusion time | $5.0 \cdot 10^{-4}\,\mathrm{s}$ | Shah et al. [48] |
| Ribosome voxel size | $2.7 \cdot 10^{-23}\,\mathrm{m}^{3}$ | Shah et al. [48] |
| Ribosome “footprint” (width) | 10 codons | Pop et al. [43] |
| Ribosome initiation rates | $9.4 \cdot 10^{-10}\,\mathrm{s}^{-1}$ – $1.4 \cdot 10^{-5}\,\mathrm{s}^{-1}$ | Weinberg et al. [58] |
| tRNA abundances | 11070 – 177112 | Chu et al. [10] |
| tRNA diffusion time | $4.5 \cdot 10^{-7}\,\mathrm{s}$ | Shah et al. [48] |
| tRNA voxel size | $3.4 \cdot 10^{-24}\,\mathrm{m}^{3}$ | Shah et al. [48] |
| tRNA competition coefficient | $7.8 \cdot 10^{-4}$ | Shah et al. [48] |
| Wobble factor | 0.625 for mismatch; 1.000 for Crick-Watson pair | Curran and Yarus [13] |
| Size of transcriptome | $6.0 \cdot 10^{4}$ | Zenklusen et al. [60] |
| Transcript abundances | 0 – 1381 | Shah [47] |

A.5. Construction of a cell-cycle time-resolved transcriptome

Beyond the base case, a time-resolved transcriptome was estimated with the aim of modelling the variation of mRNA abundance during the cell cycle. In the absence of absolute genome-level mRNA quantification data for yeast, a simple model was used to combine absolute mRNA counts from a FISH experiment on a subset of genes with relative mRNA percentage values from an RNA-Seq experiment for all genes. This yields individual gene-level estimates for mRNA abundance time courses during the cell cycle.

FISH experiments were conducted for seven genes (SIC1, CLN2, CLB2, CLB5, PCL1, PCL9, SWE1) which are known to change their expression during the cell cycle (Amoussouvi et al. [1]). Data was available in the form of time courses of counts of fluorescence-labelled mRNA molecules for each of seven cell cycle phases (early G1, late G1, S, G2, P/M, Ana, T/C). A total of 500-1000 cells was counted for each gene, yielding a time-resolved frequency distribution of mRNA counts. The average values of these distributions were used as estimators of the absolute number of mRNA transcripts per gene in each cell cycle phase. Time points were mapped to the cell cycle phases with the help of four genetic and morphological markers (presence and size of bud, shape and distribution of the DAPI-stained nucleus, localization of TagGFP-labelled transcription repressor Whi5, number and localization of spindle pole body visualized by mTurquoise-labelled Spc42) (Trcek et al. [52], [53]).

On the other hand, RNA-Seq data was available for 6650 genes as time series with five-minute time steps at which the relative abundance was quantified (normalized to 100% for every time step) (Teufel et al. [51]).

For each of the seven genes of the FISH experiment a time-resolved scaling factor was then derived by dividing the known absolute mRNA values from FISH by the known relative mRNA percentages from RNA-Seq. Depending on the cell cycle phase, the scaling factor varied across the genes so that an aggregation method was sought to combine the seven observed gene-level factors into one. Median, arithmetic mean and geometric mean yielded similar results, with the possible exception of the early G1 phase, where the arithmetic mean was higher due to high values of SIC1 and PCL9 in that phase. On this basis, the median was used for all times.
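The scaling step can be sketched as follows for a single time point. The gene names and numbers below are invented placeholders chosen for easy arithmetic; they are not values from the FISH or RNA-Seq data sets, and the function names are illustrative.

```python
from statistics import median


def scaling_factor(fish_counts, rnaseq_percent):
    """Median over the calibration genes of absolute count / relative share.

    fish_counts    -- dict gene -> mean absolute mRNA count from FISH
    rnaseq_percent -- dict gene -> relative abundance from RNA-Seq (in %)
    """
    ratios = [fish_counts[g] / rnaseq_percent[g] for g in fish_counts]
    return median(ratios)


def absolute_transcriptome(rnaseq_percent_all, factor):
    """Scale relative abundances of all genes to absolute transcript counts."""
    return {g: pct * factor for g, pct in rnaseq_percent_all.items()}


# Placeholder calibration data (hypothetical values, one time point).
fish = {"geneA": 10.0, "geneB": 4.0, "geneC": 9.0}
rnaseq = {"geneA": 0.02, "geneB": 0.01, "geneC": 0.03, "geneD": 0.05}

factor = scaling_factor(fish, rnaseq)            # median of 500, 400, 300
counts = absolute_transcriptome(rnaseq, factor)
print(factor, counts["geneD"])
```

Using the median rather than the arithmetic mean makes the factor insensitive to a few genes with unusually high ratios, which is the situation described above for early G1 phase.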

This yielded a qualitatively and quantitatively plausible candidate for a through-the-cell-cycle transcriptome (Fig. 6) which can be used to conduct time-resolved simulations. It is remarkable that the range of the time-resolved transcript counts (between 73000 in early G1 phase and 15000 in G2 phase) nearly coincides with the ranges given in the literature (Milo et al. [39, IDs 108248, 106763, 103023, 102988]).

Appendix B. Sensitivity analysis

A sensitivity analysis of the model was carried out in order to assess the model's response to changes in the main parameters. The discrete elasticities (% change in quantity / % change in parameter) are displayed in Table B.4.
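The discrete elasticity used here is the ratio of two relative changes. A minimal sketch is given below; the protein synthesis speeds in the example are made-up numbers for illustration only, not simulation results.

```python
def discrete_elasticity(param_base, param_new, value_base, value_new):
    """Relative change of an observable divided by relative change of a parameter."""
    rel_value = (value_new - value_base) / value_base
    rel_param = (param_new - param_base) / param_base
    return rel_value / rel_param


# Hypothetical example: ribosome count stressed by +25% and -25%.
e_up = discrete_elasticity(200_000, 250_000, 100.0, 109.0)
e_down = discrete_elasticity(200_000, 150_000, 100.0, 92.0)
print(f"{e_up:.2f} {e_down:.2f}")
```

An elasticity of 0.36 means that a 25% increase in the parameter changes the observable by 9%; reporting the results of the upward and the downward stress together gives a range of the kind shown in the table.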

Table B.4.

Sensitivity analysis of model results in the base case scenario w.r.t. the main model parameters. We stressed the ribosome count, initiation rates and tRNA counts by changing the values 25% up and down from the basis values. Displayed are the discrete elasticities (% change in quantity / % change in parameter).

| Parameter | Protein speed (proteins/s) | Proteins per transcript speed (proteins/transcript/s) | Ribosome speed (codons/s) |
|---|---|---|---|
| Ribosome count | 0.30 … 0.39 | 0.30 … 0.40 | 0.11 … 0.12 |
| Initiation rates | −0.09 … −0.04 | −0.13 … −0.09 | −0.01 … −0.02 |
| tRNA counts | 0.00 … 0.04 | 0.00 | 0.01 … 0.00 |
| mRNA counts | 0.00 … 0.04 | 0.27 | −0.16 … −0.11 |

The elasticity is highest for the ribosome count, as expected, and close to zero for tRNA counts, also as expected, in particular because tRNA activation is not part of the model.

Negative sensitivities w.r.t. initiation rates indicate that, in the initiation-limited parameter regime which is used here, an increase of initiation rates only redistributes free ribosomes towards transcripts with lower translation efficiency.

Data availability

Data included in article/supp. material/referenced in article.

References

Articles from Heliyon are provided here courtesy of Elsevier
