Nucleic Acids Research. 2021 Apr 22;49(9):5124–5142. doi: 10.1093/nar/gkab260

Estimation of peptide elongation times from ribosome profiling spectra

Michael Y Pavlov 1,3, Gustaf Ullman 2,3, Zoya Ignatova 3, Måns Ehrenberg 4
PMCID: PMC8136808  PMID: 33885812

Abstract

Ribosome profiling spectra bear rich information on translation control and dynamics. Yet, due to technical biases in library generation, extracting quantitative measures of discrete translation events has remained elusive. Using maximum likelihood statistics and a data set from Escherichia coli, we develop a robust method for neutralizing technical biases (e.g. base-specific RNase preferences in the generation of ribosome-protected mRNA fragments (RPFs)), which allows for correct estimation of translation times at single codon resolution. Furthermore, we validated the method with available datasets from E. coli treated with an antibiotic that inhibits isoleucyl-tRNA synthetase, and two datasets from Saccharomyces cerevisiae treated with two RNases with distinct cleavage signatures. We demonstrate that our approach accounts for RNase cleavage preferences and provides bias-corrected translation time estimates. Our approach provides a solution to the long-standing problem of extracting reliable information about peptide elongation times from highly noisy and technically biased ribosome profiling spectra.

INTRODUCTION

Ribosome profiling (or Ribo-Seq) couples cell-wide profiling of the positions of translating ribosomes on messenger RNA (mRNA) at single codon resolution (1) with deep sequencing (2) and has provided new insights into regulation of protein synthesis across species (reviewed in (3–5)). The approach requires rapid arrest of mRNA translation followed by isolation of intact mRNA-ribosome complexes, nuclease digestion of unprotected mRNA and generation of a deep-sequencing library from the ribosome-protected mRNA fragments (RPFs) (2). Interpretation of the RPFs in terms of elongation times at single codon resolution requires (i) ribosomal arrest to be faster than the single peptide elongation steps, (ii) precise estimation of the distance of the ribosomal A site (that is, the ribosomal site accepting the aminoacyl-tRNA-elongation factor complex) from the 5′- or 3′-ends of RPFs, and (iii) neutralization of sequence-dependent biases in the experimental protocol (i.e. nuclease cleavage, amplification in the library preparation) (3,6). Fulfillment of these criteria enables determination of the translation time for any particular codon in the transcriptome.

Codon resolution of the RPF spectra is generally higher in eukaryotes than in bacteria. In eukaryotes, RNase I is the nuclease of choice and it cleaves precisely at ribosome boundaries (7). RNase I is inhibited by the bacterial ribosome (8), thus micrococcal nuclease (MNase, S7 nuclease) is most widely applied in generating bacterial Ribo-Seq libraries. MNase, however, cleaves with base-dependent specificity, preferably before A and U (9). Systematic analysis reveals that the MNase generated RPFs have more variable lengths at their 5′- than at their 3′-ends (7,10). Consequently, using the more precise MNase cleavage at the 3′-end to infer the A-site codon position improves the resolution of bacterial ribosome profiling sets (6,7), yet the bias in RPF generation due to the nucleotide-dependent specificity of the MNase persists.

An additional source of bias in the Ribo-Seq libraries is the local RPF sequence composition including high propensity for secondary structure formation for some RNA fragments which can interfere with the reverse transcription priming and/or with the adaptor ligation (11,12). Attempts at considering the systematic biases across Ribo-Seq libraries (13) or using smoothing algorithms to reduce data variance in the presence of the inherent heterogeneous noise of the ribosome profiling data sets (14,15) significantly improve the ability to distinguish genuine ribosome pausing from technical artifacts introduced by the library construction. Yet, a simple and robust method for neutralizing technical biases and extracting factors that determine the large sequence context dependent variations in translation speed even at identical ribosomal A-site codons is missing.

In the present work, we develop a model that accounts for the local codon context-dependent variation of peptide elongation times and RPF generation/processing biases. In total, we use 915 context-defining parameters, which are estimated by fitting the model-predicted RPF spectra to the experimental, transcriptome-wide RPF spectra using non-linear regression with maximum likelihood (ML) statistics. We also consider ribosome profiling spectra at single nucleotide resolution with homogenous fragment size to identify and neutralize RPF generation/processing biases near the 5′- and 3′-fragment ends. Our results suggest that an inner local context of five codons, including those at the A, P and E sites, accounts for the ribosomal dwell time on each A-site codon of the transcriptome. This determination of the peptide elongation times provides a basis for a detailed understanding of the dynamics of protein synthesis in living cells.

MATERIALS AND METHODS

Ribo-Seq library generation

Escherichia coli B strain AS19 was grown in LB medium until the culture reached an OD600 of 0.5. Cells were harvested by flash freezing and libraries from biological replicates were prepared for ribosome profiling by direct ligation of the platform-specific sequences or adapters as described (16). Sequenced RPFs were quality trimmed using fastx-toolkit (0.0.13.2; quality threshold: 20), sequencing adapters were cut using cutadapt (1.8.3; minimal overlap: 1 nt), and reads were uniquely mapped to the E. coli genome (strain MG1655, version U00096.3, NCBI) using Bowtie (1.2.2) with parameters -l 16 -n 1 -e 50 -m 1 --strata --best y. The RPF counts for each ORF were normalized per total mapped reads per million (RPM) (17) and calibrated to the A site using the 3′-ends of the RPFs as described earlier (18). The data sets generated in this study are accessible under the accession number GSE145571. Furthermore, we analyzed in the same way the following data sets: GSM3358136 and GSM3358137 for Ribo-Seq libraries of E. coli MG1655 cultured in MOPS complete synthetic media containing all 20 amino acids with no treatment or treated for 10 min with 200 μM mupirocin, respectively, and collected by filtration (6), and GSM2186726 and GSM2186728 for S. cerevisiae libraries in which the RPFs were generated using MNase and RNase A, respectively (19).
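The RPM normalization mentioned above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function name, the example ORF names and all numbers are hypothetical, and the total number of mapped reads is assumed to be supplied by the caller.

```python
def rpm_normalize(orf_counts, total_mapped_reads):
    """Scale raw per-ORF RPF counts to reads per million mapped reads (RPM)."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    scale = 1e6 / total_mapped_reads
    return {orf: n * scale for orf, n in orf_counts.items()}


# Hypothetical raw counts for three ORFs in a library of 2 million mapped reads
raw_counts = {"rpsQ": 1500, "lacZ": 300, "thrA": 200}
rpm = rpm_normalize(raw_counts, total_mapped_reads=2_000_000)
```

Dividing by the library total makes counts comparable between libraries of different sequencing depth; it does not correct for the context-dependent biases that the model below addresses.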

Modeling strategy for Ribo-Seq spectra

Each RPF is assigned to a codon position j of the open reading frame from gene i, ORFi. The detected number of RPFs, Inline graphic, often colloquially referred to as ‘RPF counts’, reflects the number of ribosomes with this particular codon in A site at the moment of flash-freezing of the cells as well as biases in the nuclease digestion of mRNA and in the further amplification/processing to DNA libraries (3,9,11,12,14,20). The expected value Inline graphic of the stochastic integer Inline graphic at any A-site codon position (i,j) we write as:

[Equation 1]

Here, Inline graphic is the same constant for all A-site positions (i,j), Inline graphic is the global frequency of translation initiation of an ORF of type i in the cell population and proportional to the ORFi expression level, Inline graphic is the expected peptide elongation cycle time, Inline graphic is a ‘bias’ factor that depends on the context of codon j in ORFi and reflects the extent of digestion/processing/amplification biases in Ribo-Seq library preparation. We note that the constant Inline graphic reflects the depth of the Ribo-Seq library. Its numeric value depends on the number of translating ribosomes in the cell population used for library preparation and also on the efficiencies of ligation, RPF amplification and sequencing.
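Read together, the four factors defined above combine multiplicatively into the expected count. Written out under assumed notation ($\lambda_{ij}$ follows Table 1; $C$, $\omega_i$, $\tau_{ij}$ and $\beta_{ij}$ are placeholder symbols chosen here for the library-depth constant, the initiation frequency, the elongation cycle time and the bias factor, respectively), Equation (1) would plausibly read:

```latex
\lambda_{ij} = C \,\omega_i \,\tau_{ij} \,\beta_{ij}
```

The product form reflects that the number of ribosomes found on a codon scales with how often ribosomes enter the ORF and with how long each one dwells at that codon, while the bias factor rescales how efficiently those ribosomes are converted into sequenced fragments.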

Each elongation time Inline graphic in Equation (1) is the product of a time calibration factor Inline graphic and a parameter Inline graphic that, like Inline graphic, depends on the context of codon j but is proportional to the peptide elongation cycle time: Inline graphic. Accordingly, we re-write Equation (1) as:

[Equation 2]

Here, parameter Inline graphic is proportional to the global frequency of translation initiation Inline graphic of ORFi and Inline graphic is defined by Inline graphic. The expected value of RPF counts, Inline graphic, contains two factors of great physiological relevance, namely the protein expression level Inline graphic from gene i and the expected peptide elongation cycle time Inline graphic at A-site codon j in transcript i. The major methodological task is, therefore, to elicit reliable estimates of the expected values, Inline graphic, proportional to Inline graphic, and Inline graphic for all codons (i, j) from the experimental sets of the sampled Inline graphic values and known growth rate, Inline graphic, of the bacterial culture. When Inline graphic is much larger than 1, it provides a reliable estimate of Inline graphic but for small Inline graphic values its statistical nature must be accounted for by the probability Inline graphic that the number of RPF counts from A-site codon (i,j) is Inline graphic.

The RPF counts Inline graphic are obtained from ligated RNA fragments with copy numbers amplified by PCR and greatly reduced in the sequencing procedure. The probability distributions for RNA fragments after ligation are of Poisson type (Supplementary Text), the distributions of DNA fragments after amplification are burst-like (21) and the distributions of sequenced DNA fragments Inline graphic are of Neyman type A. At small values Inline graphic, where A is the PCR copy number amplification factor and q is the fraction of the amplified library that has been finally sequenced, the Neyman type A distribution is close to Poisson but with variance equal to the expected value (Inline graphic) multiplied by a constant factor Inline graphic (Supplementary Text). Under our experimental conditions the Inline graphic product is smaller than 1, and for simplicity in what follows, we assume the copy number distribution for any (i,j) fragment to be of Poisson type:

$$P_{ij}(n) = \frac{\lambda_{ij}^{\,n}}{n!}\,e^{-\lambda_{ij}} \qquad (3)$$

We ascribe a log-likelihood function L for the whole transcriptome based on all Inline graphic probabilities:

[Equation 4]
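A transcriptome-wide Poisson log-likelihood of this kind can be sketched in a few lines. The sketch assumes that the likelihood is the sum of the logarithms of independent Poisson probabilities, one per A-site position; the function names and the example numbers are illustrative only.

```python
import math


def poisson_log_pmf(n, lam):
    """ln P(n) for a Poisson variable with expected value lam > 0."""
    return n * math.log(lam) - lam - math.lgamma(n + 1)


def log_likelihood(counts, expected):
    """Sum of Poisson log-probabilities over all A-site positions."""
    return sum(poisson_log_pmf(n, lam) for n, lam in zip(counts, expected))


# Hypothetical observed counts and expected values for three positions
counts = [0, 3, 1]
expected = [0.5, 2.0, 1.0]
total_log_likelihood = log_likelihood(counts, expected)
```

For a single position the log-probability is largest when the expected value equals the observed count, which is why maximizing the summed log-likelihood pulls the model counts towards the data while still tolerating Poisson scatter at low coverage.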

In what follows we develop a model for the Inline graphic values in Equation 1 built on the hypothesis that each Inline graphic in Equation (2) is determined by a local context of the current A-site codon j in ORFi and that this context is composed of pL codons with the A site near its middle position (Figure 1). For clarity, below we make explicit three distinct description levels of parameters in our approach: directly experimental (e.g. Inline graphic), modelled (e.g. Inline graphic) and expected (e.g. Inline graphic) values of key parameters. For ease of identification, we also use Latin letters for the first two categories and Greek letters for the third one (Table 1).

Figure 1. Local and global codon contexts for a ribosome translating an ORF of type i. Global A-site parameter j corresponds to the local A-site parameter p = pA = 8. Global parameter j′ corresponds to the local parameter p through j′ = j − pA + p, where p varies from 1 to pL = 15, so that the P and E sites correspond to p = 7 and p = 6, respectively.
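The index mapping in the caption of Figure 1 can be made concrete with a short sketch. The constants follow the caption (pA = 8, pL = 15); the function names are illustrative.

```python
P_A = 8   # local position of the A site (Figure 1)
P_L = 15  # number of codons in the local context (Figure 1)


def global_position(j, p, p_a=P_A):
    """Global codon position j' for local position p when the A site sits at codon j."""
    return j - p_a + p


def context_positions(j, p_a=P_A, p_l=P_L):
    """Global positions of all local context codons, ordered from p = 1 to p = p_l."""
    return [global_position(j, p, p_a) for p in range(1, p_l + 1)]


# Context of a ribosome whose A site is at codon 100
ctx = context_positions(100)
```

With the A site at codon 100, the P and E sites (p = 7 and p = 6) fall on codons 99 and 98, and the full context spans codons 93 to 107.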

Table 1.

Meaning of key parameters of the present work

Expected parameters
Inline graphic Expected number of RPF counts from codons at position j in open reading frames from gene i (ORFi) in cell population
Inline graphic Constant reflecting the depth of the Ribo-Seq library as determined by the number of translating ribosomes in the cell population, efficiencies of RPF generation, ligation, amplification and sequencing.
Inline graphic Expected number of initiations on ORFi.
Inline graphic Technical factor, determined by context bias efficiencies of RPF generation, ligation and amplification for codon j in ORFi.
Inline graphic Expected elongation cycle time for codon j of ORFi.
Inline graphic Expected elongation cycle time average for the whole cell population.
Inline graphic Expected codon context dependent elongation cycle time for codon j of ORFi normalized to τe.
Inline graphic Factor proportional to global frequency Inline graphic of translation initiation of ORFi.
Inline graphic Expected codon context dependent variation of number of RPFs normalized to factor Inline graphic and partitioned into elongation cycle time and bias factors.
Experimental parameters
Inline graphic Measured number of RPF counts for A-site codon at position j in ORFi.
Inline graphic Sum of RPF counts for the ‘inner’ region of ORFi containing Inline graphic codons.
Inline graphic Mean RPF density in the ‘inner’ ORFi region containing Inline graphic codons.
Inline graphic Sum of Inline graphic over all positions j in all ORFs for which there is a codon of type ‘c’ at position ‘j+p−pA’.
Inline graphic RPF score function describing relative variation of Inline graphic along ORFi.
Model parameters
Inline graphic Maximum likelihood (ML) estimate of λij.
Inline graphic ML estimate of Inline graphic by a number pL of Inline graphic factors, each with 64 codon identity determined values.
Inline graphic Underlying parameters of our model, determined from the ML fit of all Inline graphic to all Inline graphic values.
Inline graphic ML estimate of Inline graphic.
Inline graphic ML estimate of the sum of model RPF counts for the ‘inner’ region of ORFi. From the expressions for Inline graphic and Inline graphic it follows that Inline graphic.
Inline graphic ML estimate of Inline graphic.
Inline graphic RPF score function describing relative variation of Inline graphic along ORFi.
Inline graphic Model estimate of the expected elongation cycle time Inline graphic for codon j of ORFi.
Inline graphic Model estimate of time factor Inline graphic; experimentally determined from the growth rate Inline graphic of cell population.
Inline graphic Model estimate of bias-free relative elongation cycle time for codon j of ORFi; determined by the product Inline graphic factor for inner position of local context of codon (i,j).
Inline graphic Model estimate of bias-free total time for ORFi translation normalized to te.
Inline graphic Model estimate of absolute total time for ORFi translation.
Inline graphic Pausing score function describing relative variation of bias free translation time Inline graphic along ORFi.

Ribo-seq spectral modeling at single codon resolution

To obtain estimates for all Inline graphic values (Eq. 1 or 2), we introduce model RPF counts, Inline graphic composed of a factor Inline graphic for gene i, estimating Inline graphic multiplied by a local context factor Inline graphic, estimating Inline graphic in Equation (2):

[Equation 5]

Each local codon context position (p) among the total number pL of context defining positions contributes with a factor Inline graphic to the value of Inline graphic:

[Equation 6]

Each factor Inline graphic is determined by the identity (c) of each one of the 64 possible codons at each position p (Figure 1). Index Inline graphic identifies the codon at local position p, corresponding to global codon position j + p − pA in the ORFi sequence (Figure 1). We fit the model RPF counts, Inline graphic, to the experimental RPF counts, Inline graphic, in the inner ORF regions, by adjusting the Inline graphic context factors Inline graphic to maximize a Poisson-based likelihood function (Equation 4). If not stated otherwise, we use pL = 15, so that a total of 915 (15 × 61 = 915) factors Inline graphic estimate all Inline graphic-values in all potential contexts, where 61 is the number of sense codons. The E. coli transcriptome may contain up to 1.8 × 10^6 distinct contexts (about 6000 ORFs and 300 codons per ORF), and the ultimate number of contexts for which Inline graphic could be predicted by the model is 61^15 ≈ 10^27. In the next section, we describe how model parameters are derived from experimental data by maximizing a transcriptome-wide log likelihood function.
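The multiplicative structure described above can be sketched as follows. This is an illustration under stated assumptions: the factor table `z` is a hypothetical dictionary keyed by local position p and codon, unspecified factors default to 1, and the function names are not from the original work.

```python
import math

P_A, P_L = 8, 15  # A-site position and context length (Figure 1)


def context_factor(codons, j, z, p_a=P_A, p_l=P_L):
    """Product of z[p][codon] over local positions p = 1..p_l for A-site codon j (1-based)."""
    value = 1.0
    for p in range(1, p_l + 1):
        codon = codons[j - p_a + p - 1]   # 1-based global position -> list index
        value *= z[p].get(codon, 1.0)     # toy table: missing factors count as 1
    return value


def model_count(a_i, codons, j, z):
    """Model RPF count: gene factor times local context factor."""
    return a_i * context_factor(codons, j, z)


# Toy example: only the E-, P- and A-site factors differ from 1
toy_codons = ["AAA"] * 15
toy_z = {p: {} for p in range(1, P_L + 1)}
toy_z[6]["AAA"], toy_z[7]["AAA"], toy_z[8]["AAA"] = 3.0, 0.5, 2.0

n_parameters = P_L * 61          # 915 factors for 61 sense codons
n_possible_contexts = 61 ** P_L  # roughly 10^27
```

The contrast between 915 parameters and roughly 10^27 possible contexts is what makes the multiplicative assumption useful: every context gets a prediction even though only a tiny fraction of contexts is ever observed.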

Ribo-seq spectral modeling with maximum likelihood (ML) estimation of local codon context parameters

To extract model parameters Inline graphic (Equation 5) and Inline graphic (Equation 6) from Ribo-Seq datasets, we assume that each Inline graphic value is sampled from a Poisson distribution with expected value Inline graphic (Equation 3), the latter estimated by the model parameter Inline graphic (Equations 5 and 6). The log-likelihood function L for the RPF spectrum takes the simple form (see also Equation 4):

[Equation 7]

Here, the j-summations for each ORFi are confined to an internal ORF region starting at codon pA and ending at codon Inline graphic with a total number of internal codons, Inline graphic, where Inline graphic is the total number of ORFi codons. In what follows we use the shorthand notation Inline graphic for the j-summations in Equation 7. The maximal value of L (Equation 7) is obtained by setting its partial derivatives with respect to all Inline graphic and Inline graphic parameters equal to zero, which leads to the following equation system for determination of all Inline graphic parameters (see Supplementary Text):

[Equation 8]

where

[Equation 9]

Here, Inline graphic is the ‘Kronecker delta function’ equal to 1 and 0, when Inline graphic and Inline graphic, respectively; Inline graphic (Table 1) and Inline graphic is a function that depends on codon type ‘c’ at local position ‘p’ (Figure 1). Inline graphic is calculated from experimental RPF counts Inline graphic (and sequence data) as:

[Equation 10]

We note that in the special case p = pA (Figure 1), Inline graphic is the total number of RPFs in a dataset generated by ribosomes with A-site codon of type ‘c’. More generally, Inline graphic is the total number of RPFs for which there is a codon ‘c’ at a distance Inline graphic from the A site.
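The position-specific count sums described above can be accumulated directly from sequences and counts. The sketch below assumes that the inner region of an ORF with n codons runs from codon pA to codon n − (pL − pA), so that the full local context always lies inside the ORF; that end point, the function name and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def context_sums(orfs, p_a=8, p_l=15):
    """Sum RPF counts by (local position p, codon c).

    orfs: list of (codons, counts) pairs with equal-length lists.
    Returns F with F[(p, c)] = total counts at inner A-site positions j
    for which codon c sits at global position j + p - p_a.
    """
    F = defaultdict(float)
    for codons, counts in orfs:
        n = len(codons)
        for j in range(p_a, n - (p_l - p_a) + 1):   # 1-based inner A-site positions
            for p in range(1, p_l + 1):
                c = codons[j - p_a + p - 1]
                F[(p, c)] += counts[j - 1]
    return F


# Toy ORF with a three-codon context (A site at local position 2)
toy_orfs = [(["A", "B", "A", "C"], [5, 1, 2, 7])]
F = context_sums(toy_orfs, p_a=2, p_l=3)
```

In the toy example the entries with p = 2 are the A-site totals per codon type, matching the special case p = pA noted above.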

With the help of Equation 6 that relates Inline graphic with Inline graphic, Equations (8) and (9) are solved using a Levenberg-Marquardt type algorithm (21,22) to obtain the table of Inline graphic factors (see Supplementary Text). Using the obtained Inline graphic factors we compute local context parameters Inline graphic (Equation 6), and then model RPF counts Inline graphic from Equations (5) and (9) as:

[Equation 11]
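The fitting step can be illustrated with a small, self-contained sketch. It does not use the Levenberg-Marquardt type algorithm cited above; instead it swaps in a simpler alternating (coordinate-ascent) update that aims at the same zero-gradient conditions of the Poisson likelihood, assuming those conditions state that modelled and observed count sums agree per gene and per (position, codon) pair. All names, the inner-region convention and the toy data are assumptions.

```python
import math


def z_ij(z, codons, j, p_a):
    """Context factor for 0-based A-site index j: product over local positions."""
    return math.prod(z[p][codons[j - (p_a - 1) + p]] for p in range(len(z)))


def gene_factors(orfs, inner, z, p_a):
    """Gene factors that make the modelled inner-region total equal the observed total."""
    return [sum(n[j] for j in idx) / sum(z_ij(z, cod, j, p_a) for j in idx)
            for (cod, n), idx in zip(orfs, inner)]


def log_likelihood(orfs, inner, a, z, p_a):
    """Poisson log-likelihood up to the constant -sum(ln n!)."""
    total = 0.0
    for (cod, n), idx, a_i in zip(orfs, inner, a):
        for j in idx:
            l = a_i * z_ij(z, cod, j, p_a)
            total += n[j] * math.log(l) - l
    return total


def fit(orfs, p_a, p_l, n_iter=50):
    """Alternating updates; assumes every codon seen at a context position has reads."""
    # 0-based inner A-site indices whose full local context lies inside the ORF
    inner = [range(p_a - 1, len(cod) - (p_l - p_a)) for cod, _ in orfs]
    z = [{c: 1.0 for cod, _ in orfs for c in cod} for _ in range(p_l)]
    a = gene_factors(orfs, inner, z, p_a)
    for _ in range(n_iter):
        for p in range(p_l):
            observed = dict.fromkeys(z[p], 0.0)
            modelled = dict.fromkeys(z[p], 0.0)
            for (cod, n), idx, a_i in zip(orfs, inner, a):
                for j in idx:
                    c = cod[j - (p_a - 1) + p]
                    observed[c] += n[j]
                    modelled[c] += a_i * z_ij(z, cod, j, p_a)
            for c in z[p]:
                if modelled[c] > 0:
                    # Exact maximizer for this factor with all others held fixed
                    z[p][c] *= observed[c] / modelled[c]
            a = gene_factors(orfs, inner, z, p_a)
    return a, z, inner


# Toy data: two ORFs over a two-letter "codon" alphabet, three-position context
toy_orfs = [(list("ABABBAAB"), [4, 2, 7, 1, 3, 9, 2, 5]),
            (list("BBAABABA"), [1, 6, 2, 8, 3, 3, 4, 2])]
a_start, z_start, inner = fit(toy_orfs, p_a=2, p_l=3, n_iter=0)
a_fit, z_fit, _ = fit(toy_orfs, p_a=2, p_l=3, n_iter=30)
```

Each update maximizes the likelihood in one block of parameters with the others held fixed, so the likelihood cannot decrease from one step to the next. The factors are determined only up to a rescaling that can be absorbed into the gene factors, so only the modelled counts, not the raw factor values, are directly comparable between runs.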

Instead of comparing experimental (Inline graphic) and modelled (Inline graphic) RPF spectra of the same transcript, it is more convenient to compare experimental (Inline graphic) and modeled (Inline graphic) RPF scores defined here as:

[Equation 12]

and

[Equation 13]

where ni is the total number of internal codons in ORFi.

The average RPF density of a gene:

[Equation 14]

is often used as a statistical reliability measure of its RPF coverage profile.

The lower the Inline graphic value, the less informative the profile. For example, when Inline graphic RPFs per codon, more than half of the Inline graphic values in the gene profile are zeroes and, hence, contain little information about codon translation times. We note that similar to j-summations above, the k-summations in Eqs. 11–14 are from k = pA to Inline graphic (Figure 1). We also note that experimental RPF scores Inline graphic are sometimes referred to as ‘normalized footprint counts’ (23) or ‘relative enrichment values’ (24) and describe how much the RPF counts for codon j deviate from a per-codon average value Inline graphic in the inner region of a gene.
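The density and score definitions can be sketched as below, assuming that a score is the count at a codon divided by the mean count per codon of the inner region (so that scores average to 1). The function names are illustrative, and the zero-fraction helper relies only on the Poisson assumption of Equation 3.

```python
import math


def rpf_density(inner_counts):
    """Mean RPF count per codon over the inner region of one ORF."""
    return sum(inner_counts) / len(inner_counts)


def rpf_scores(inner_counts):
    """Counts divided by the inner-region mean."""
    density = rpf_density(inner_counts)
    if density == 0:
        raise ValueError("no reads in the inner region")
    return [n / density for n in inner_counts]


def expected_zero_fraction(density):
    """P(count = 0) for a Poisson count with the given expected value."""
    return math.exp(-density)


scores = rpf_scores([1, 3])
```

Under a Poisson model with a uniform expected density d, the expected fraction of zero counts is exp(−d), which exceeds one half once d drops below ln 2 ≈ 0.69; this illustrates why profiles of low-density genes carry little per-codon information.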

The Inline graphic factors can always be scaled so that for each position Inline graphic of the local context we have (Supplementary Text):

[Equation 15]

where the Inline graphic weighting factors are calculated as:

[Equation 16]

and Inline graphic is:

[Equation 17]

Since Inline graphic estimates Inline graphic, a parameter proportional to the expression level of gene i (Equation 1) and Inline graphic (Equation 17) varies little with position p (see Supplementary Text), each product Inline graphic in Equation 16 is proportional to the frequency with which the ribosome encounters a codon of a type c in the inner region of ORFi. Hence, each Inline graphic confers a statistical weight proportional to the frequency with which the ribosome encounters a codon of type c in the transcriptome (Supplementary Text).

We also introduce the ‘sensitivity parameter’ Inline graphic as a measure of the sensitivity of Inline graphic to the codon identity c at local context position p. It is defined as the standard deviation, Inline graphic, from the mean Inline graphic (Equation 15) for row p of the table of Inline graphic factors:

[Equation 18]

where the weights Inline graphic are defined in Equation (16).
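The scaling and the sensitivity parameter can be illustrated with a weighted mean and a weighted standard deviation. The sketch assumes that the scaling condition sets the weighted mean of the factors in each row to 1 and that the weights sum to 1 after normalization; both are readings of the text rather than confirmed definitions, and the codon names, weights and factor values are invented.

```python
import math


def rescale_row(z_row, raw_weights):
    """Return (scaled factors, normalized weights) with weighted mean equal to 1."""
    total_weight = sum(raw_weights[c] for c in z_row)
    w = {c: raw_weights[c] / total_weight for c in z_row}
    mean = sum(w[c] * z_row[c] for c in z_row)
    return {c: z_row[c] / mean for c in z_row}, w


def sensitivity(z_row, w):
    """Weighted standard deviation of the factors around their weighted mean."""
    mean = sum(w[c] * z_row[c] for c in z_row)
    return math.sqrt(sum(w[c] * (z_row[c] - mean) ** 2 for c in z_row))


# Hypothetical two-codon row: factor values and codon usage weights
row = {"AAA": 2.0, "GGG": 1.0}
usage = {"AAA": 1.0, "GGG": 3.0}
scaled_row, weights = rescale_row(row, usage)
row_sensitivity = sensitivity(scaled_row, weights)
```

Dividing one row by a constant changes the model counts unless the same constant is absorbed elsewhere, for instance in the gene factors; the sketch only shows the row-level arithmetic. A position whose factors are all close to 1 yields a sensitivity near 0, meaning that codon identity at that position hardly affects the predicted counts.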

Ribo-seq spectral modeling at single nucleotide resolution

In order to estimate the fragment processing bias (Inline graphic; Equation 1), we extend our modeling resolution from codon to nucleotide level. For this, we use the number of RPFs, Inline graphic, of single length FL with ribosomal A site located at nucleotide j in ORFi and estimate its expected value, Inline graphic (compare with Equations 1 and 5) as:

[Equation 19]

where each parameter Inline graphic is modeled as product of local context z-factors (compare with Equation 6):

[Equation 20]

Here, index j in Equations (19) and (20) refers to nucleotide j of ORFi, and Inline graphic specifies nucleotide base b (U, C, A or G) at transcriptome position (i,j); Inline graphic factors form a pNL × 4 table; the local nucleotide position p is counted from p = 1, via the first nucleotide at A-site position p = pNA to the third base of the last codon of the local context sequence of length pNL (Supplementary Figure S1).

Parameters Inline graphic are ML estimated by non-linear model fitting to experimental data assuming Poisson distributed RPF counts Inline graphic. The data treatment is formally equivalent to that leading up to Equations 8 and 9 with parameters Inline graphic and Inline graphic replaced by Inline graphic and Inline graphic, respectively. Thus:

[Equation 21]

where Inline graphic is the Kronecker delta function and Inline graphic is obtained from experimental data Inline graphic through (compare with Equation 10):

[Equation 22]

Assuming x to be the distance from the first A-site nucleotide to the 3′-end of the RPF in nucleotides, it follows that Inline graphic and Inline graphic are the numbers of RPFs of length FL with nucleotide ‘b’ at 5′- and 3′-end, respectively. By applying the same ML procedure as in the codon-resolution case, we solve Equation 21 to estimate the Inline graphic and Inline graphic factors for computing all Inline graphic and Inline graphic parameters. Using formulae analogous to those in Eqs 12 and 13, one can compute the model nucleotide RPF scores Inline graphic to compare them with the experimental scores Inline graphic for RPF profiles generated from RPFs with a length of FL nucleotides.

Construction of unbiased Ribo-Seq spectra for estimation of relative peptide elongation times

To separate the effects of bias and peptide elongation time variations on the RPF counts, we partition the context dependent factors Inline graphic in Equation 5 into two parts:

[Equation 23]

where (compare with Equation 6):

[Equation 24]

and:

[Equation 25]

As shown in Results, the outer context dependent factors Inline graphic, determined by Inline graphic factors for outer local context positions p from 1 to p1–1 and from p2+1 to pL, mainly account for the nuclease digestion/processing biases (B). The inner context dependent factors Inline graphic, determined by Inline graphic factors for inner positions p (from p1 to p2), mainly reflect the variation of the peptide elongation time, hence superscript (T) in Inline graphic. We model the bias-free RPF spectrum as:

[Equation 26]
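The partition into inner (time) and outer (bias) factors can be sketched as a split of the per-position product. The boundaries p1 = 5 and p2 = 9 used in the example are an assumption chosen only to give five inner codons around the E, P and A sites; the function names and factor values are likewise illustrative.

```python
import math


def split_context(factors, p1, p2):
    """Split per-position factors into inner and outer products.

    factors: list of the p_L factors for one A-site codon (index 0 holds p = 1).
    Returns (inner, outer) for local positions p1..p2 and for all remaining positions.
    """
    inner = math.prod(factors[p1 - 1:p2])
    outer = math.prod(factors[:p1 - 1] + factors[p2:])
    return inner, outer


def bias_free_count(a_i, factors, p1, p2):
    """Gene factor times the inner (time-related) part of the context factor only."""
    inner, _ = split_context(factors, p1, p2)
    return a_i * inner


# Hypothetical factors: an outer factor at p = 1 and inner factors at p = 7 and p = 8
toy_factors = [1.0] * 15
toy_factors[0], toy_factors[6], toy_factors[7] = 4.0, 0.5, 2.0
inner_part, outer_part = split_context(toy_factors, p1=5, p2=9)
```

Because the full context factor is a plain product, the inner and outer parts multiply back to it exactly, and dropping the outer part removes the contribution attributed to digestion and processing without refitting the model.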

We also introduce model pausing scores Inline graphic to quantify the relative peptide elongation time as the ribosome moves along an ORFi (compare with Equations 12 and 13):

[Equation 27]

From the 15 Inline graphic factors used to model the experimental Inline graphic values in the dataset we normally use five inner Inline graphic factors to obtain bias-corrected model RPF counts Inline graphic (Eqs 25, 26). This approach is distinct from using a Inline graphic-parameter function Inline graphic, defined only by the inner codons of the local context in the p-interval from p1 to p2:

[Equation 28]

When the ML method is used to estimate the inner Inline graphic parameters in Equation 28 that best account for the whole RPF spectrum, strong technical biases inherent to Inline graphic spectra distort the Inline graphic factors. This makes the model RPF scores Inline graphic, defined as:

[Equation 29]

distinct from and inferior to the more accurate elongation time estimating pause scores Inline graphic (Equation 27).

Absolute peptide elongation cycle times from exponential growth rate

The expected time, Inline graphic, to translate a codon at position j of gene i in the cell (Equation 1) is estimated by the model time, Inline graphic, defined by the product of the local context factor Inline graphic (Equation 25) and a time factor Inline graphic, estimating Inline graphic in Equation 1:

[Equation 30]

It follows that the total expected time Ti to translate ORFi (Table 1) is estimated by:

[Equation 31]

where

[Equation 32]
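The relation between relative and absolute times can be sketched as follows, assuming that the per-codon model time is the time factor multiplied by the inner context factor and that the ORF time is the sum of the per-codon times. The time factor value and the relative factors below are invented for illustration.

```python
def elongation_times(relative_times, time_factor):
    """Absolute per-codon times: time factor times each relative (inner-context) factor."""
    return [time_factor * z for z in relative_times]


def orf_translation_time(relative_times, time_factor):
    """Total ORF time: time factor times the summed relative per-codon times."""
    return time_factor * sum(relative_times)


# Hypothetical relative times for a three-codon inner region and a 50 ms time factor
relative = [1.0, 0.5, 2.0]
per_codon = elongation_times(relative, time_factor=0.05)
total = orf_translation_time(relative, time_factor=0.05)
```

The summed relative times play the role of a dimensionless translation time of the ORF; only the time factor carries physical units, which is why a single calibration from the growth rate suffices to convert all relative times into absolute ones.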

We note that the Inline graphic value estimates relative translation time of protein i (Table 1). Let Pi be the number of proteins of a type i in an exponentially growing cell population at a given time. The rate of copy number increase for proteins of a type ‘i’ is:

[Equation 33]

Here, Inline graphic is the current number of ribosomes in the population, Inline graphic the fraction of ribosomes in elongation phase, estimated as 0.8 by Dennis and Bremer (25) and Inline graphic is the fraction of elongating ribosomes devoted to synthesis of protein i. Fraction ui is proportional to the sum, Inline graphic, of bias-corrected RPF counts Inline graphic for ORFi. Taking Equations 26 and 32 into account one gets for Inline graphic:

[Equation 34]

so that Inline graphic, where Inline graphic. Using this and Equations 31 and 34, one can re-write Equation 33 as:

[Equation 35]

Introducing Inline graphic, the sum total of current protein copies, the exponential growth rate, Inline graphic can be defined as the increase in total protein copy number per time unit (Inline graphic) normalized to Inline graphic:

[Equation 36]

We note that for exponential growth the above definition of growth rate (Equation 36) is equivalent to its standard definition (26) as the rate of relative increase in total protein mass (see Supplementary Text). Taking Equations (34) and (35) into account, Equation (36) for the growth rate becomes:

[Equation 37]

Here, we used that during exponential growth and when the rate of protein degradation is negligible compared to growth rate, the protein copy numbers Inline graphic are proportional to our estimates, Inline graphic, of the frequencies, Inline graphic (Equation 1) of protein i translation initiation, so that the relation Inline graphic is valid (see Supplementary Text). From Equation (37) one obtains:

[Equation 38]

so that the model time, Inline graphic is

[Equation 39]

Note that all parameters in Equations 38 and 39 except Inline graphic and Inline graphic can be obtained from the Ribo-seq experiments themselves. The time factor Inline graphic in Equation 38 can be interpreted as an average per codon elongation time for a particular growth condition of the cell population, conditional on our special scaling of Inline graphic parameters (Equation 15) which forces Inline graphic in Equation (30) to oscillate around 1. Importantly, although both Inline graphic and Inline graphic depend on Inline graphic scaling, their product, the model time Inline graphic, is scaling insensitive and estimates the absolute time Inline graphic of codon (i,j) translation (see Equation 39).

Self-consistency of the RPF spectrum modeling

By self-consistent modeling we mean that a parameter estimation procedure applied to a dataset simulated using parameters extracted from the original data will produce exactly the same parameter values as determined directly from the original data. It can be proven that our procedure of extracting the underlying parameters Inline graphic is indeed self-consistent (see also Supplementary Text). To illustrate this, we first use our ML approach to estimate an original Inline graphic parameter table from experimental RPF data, then use Equations (6) and (11) to simulate an RPF dataset and, finally, retrieve a new Inline graphic parameter table from the simulated RPF dataset. We find that the original and retrieved Inline graphic parameter tables are virtually identical as illustrated in Supplementary Figure S2A for A-, P- and E-site positions of Inline graphic parameter tables. In contrast, other methods like RUST (14) are not self-consistent in this sense. Computing the RUST ratio metafile table to simulate RPF data and then applying RUST again to retrieve the RUST ratio metafile, one finds that the original and retrieved metafile tables differ significantly as illustrated for A-, P- and E-site metafile positions in Supplementary Figure S2B.

RESULTS

Modeling of Ribo-Seq spectra

There is a clear connection between the expected number, Inline graphic, of experimentally detected ribosomes with a particular codon j of ORFi in the A site, and the expected codon translation time Inline graphic (Equation 1). This connection allows one to use ribosome profiling for transcriptome-wide kinetic analysis of mRNA translation, but attainment of reliable kinetics data from ribosome profiling has remained elusive. The codon coverage within ORFs in the ribosome profiling spectra is highly variable (Figure 2). This is not only due to the codon context dependent variation of the codon translation time but also to context-dependent bias in the efficiency of nuclease dependent RPF generation and subsequent DNA library preparation steps including reverse transcription, adaptor ligation and PCR (9,11,12,14,20,24,26,27). Here, we consider three major causes of codon-to-codon variation of the experimental (‘exp’) RPF counts Inline graphic at each transcriptome position (i, j) summarized in Equation 1. These include: (i) codon context-dependent variation in the peptide elongation time, Inline graphic, (ii) bias, Inline graphic, of RPF generation and processing, and (iii) stochastic fluctuations in the experimental Inline graphic values. As seen in Equation (2), each Inline graphic value is the product of a time factor Inline graphic reflecting average codon translation time under a particular growth condition and a unit-less parameter Inline graphic that depends on the context of codon j, so that Inline graphic. Local context dependent variation of Inline graphic that causes the variations in Inline graphic can be traced to identities of A-, P- and E-site tRNAs, interactions between mRNA codons and the ribosome and/or interactions of the nascent peptide chain with the ribosomal exit tunnel in an amino acid-sequence dependent manner (28–30). 
The variations of bias factor Inline graphic are also due to local context dependence of the nuclease digestion and/or amplification/processing steps in RPF library preparation. From these, it follows that the variation of the product Inline graphic that reports on variations of expected counts, Inline graphic (Equation 2), is defined by local sequence context of the current A-site codon j in ORFi (Figure 1).

Figure 2.

Ribosome profiling spectrum for gene rpsQ. RPF counts (Inline graphic-values) are plotted versus codon position j of the rpsQ transcript encoding ribosomal protein S17. The horizontal line represents the average number of RPFs per codon (Inline graphic) for the inner transcript region from Inline graphic to Inline graphic (see Equation 14 for formal Inline graphic definition).

We estimated each Inline graphic value by a model (‘mod’) Inline graphic parameter, which is the product of 15 Inline graphic factors (Equation 6). Each zp,c value is determined by the type of codon (c) at local sequence context position (p) (Figure 1). These Inline graphic values were estimated by fitting our model (Equations 5 and 6) to the experimental Inline graphic values of the whole transcriptome. To illustrate the goodness of the fit, we compare experimental (Equation 12) and model (Equation 13) RPF scores for single genes with high RPF density. The model-predicted, Inline graphic, and experimental, Inline graphic, RPF score spectra show relative codon-to-codon variation of modeled and experimental RPF counts. They can be remarkably similar at the single gene level (Figure 3A, B) with Pearson correlation coefficients, r, in the 0.7–0.8 range, suggesting that the local mRNA sequence context accounts for the major part of the variability of experimental Inline graphic values. Figure 3C shows that high r-values are frequent for genes with high experimental RPF density. The r -values decrease as an increasing number of genes with medium and low RPF density are included in the comparison – an effect due to the high statistical uncertainty of RPF profiles for genes with low experimental RPF density. In comparison with the RUST method (14), our method achieves, on average, significantly higher Pearson correlations between experimental and model RPF spectra (Supplementary Figure S3).
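The product structure of the model parameter can be sketched in a few lines of Python. This is a minimal illustration, not the fitting procedure itself: the function name `context_product`, the dictionary layout of the factor table, the toy factor values, and the choice to skip window positions that fall outside the ORF are all assumptions made for this example.

```python
def context_product(codons, j, z, p_a=8, p_l=15):
    """Multiply the z factors of all codons in the window around A-site codon j.

    codons: list of codon strings for one ORF (0-based index j).
    z:      dict mapping (position p, codon) -> factor; missing entries count as 1.0.
    p_a:    local position of the A site (8 in the 15-codon context).
    """
    product = 1.0
    for p in range(1, p_l + 1):
        k = j + (p - p_a)         # p < p_a lies upstream (5'), p > p_a downstream (3')
        if 0 <= k < len(codons):  # assumption: positions outside the ORF are skipped
            product *= z.get((p, codons[k]), 1.0)
    return product


# Toy ORF and toy factor table (hypothetical values, for illustration only)
orf = ["AUG", "AAA", "GGU", "UUU", "AAG", "CCA", "GUA"]
z_toy = {(8, "AAG"): 1.6, (7, "UUU"): 0.5, (9, "CCA"): 2.0}
value = context_product(orf, 4, z_toy)  # AAG in the A site, UUU in the P site
```

With a complete 15 × 61 factor table in place of `z_toy`, the same loop would give one model value per codon of each ORF.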

Figure 3.

Comparisons of experimental (Inline graphic; red; Equation 12) and model (Inline graphic; blue; Equation 13) RPF score spectra at codon resolution for rpsQ (A) and atpE transcript (B). r, Pearson correlation, r = 0.83 for rpsQ and r = 0.81 for atpE. (C) Frequency density of Pearson correlation coefficient, r, between Inline graphic and Inline graphic for sets of 161 (red, Inline graphic), 337 (light blue, Inline graphic) and 945 (dark blue, Inline graphic) transcripts. Note that the transcripts were first ranked by their dexp-values (Figure 2) and then top-ranked 161, 337 and 945 transcripts were considered.

Ribosomal profiling spectra are ultra-sensitive to codon identity near ribosome edges

Strikingly, variation of the Inline graphic factors with codon identity c is much larger for local codon positions (p) near the lagging (p = 4) and leading (p = 11) ribosome edges than in A site (p = 8) (Figure 1). Indeed, the Inline graphic value varies from 0.4 for the UUU (Phe) codon to 1.6 for the AAG (Lys) codon, while Inline graphic and Inline graphic values span significantly larger ranges from 0.2 for the GGG (Gly) to 2.1 for the AUG (Met) codon for Inline graphic and from 0.2 for the UUU (Phe) to 2.2 for the CCA (Pro) codon for Inline graphic, respectively (Figure 4A). We have quantified the sensitivity of Inline graphic to codon identity c at position p as a weighted standard deviation, Sp, from the mean along the p-row of the Inline graphic-factor table (Equation 18). A plot of Sp versus p confirms much higher sensitivity to codon identity at local codon positions close to ribosome edges (p = 4 and p = 11) than at ribosomal A, P or E site (p = 8, 7 or 6, respectively) (Figure 4B).
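A weighted standard deviation along one row of the factor table could be computed as sketched below. Equation 18 is not reproduced in this section, so the weighting used here (one non-negative weight per codon, for instance its frequency) and the function name `position_sensitivity` are assumptions; the sketch only shows the general form of such a measure.

```python
import math


def position_sensitivity(z_row, weights):
    """Weighted standard deviation of the z factors at one local position.

    z_row:   dict codon -> z factor at this position.
    weights: dict codon -> non-negative weight (assumed here: codon frequency).
    """
    total = sum(weights[c] for c in z_row)
    mean = sum(weights[c] * z_row[c] for c in z_row) / total
    variance = sum(weights[c] * (z_row[c] - mean) ** 2 for c in z_row) / total
    return math.sqrt(variance)


# Toy rows: a 'flat' position and a strongly codon-dependent position
flat = position_sensitivity({"UUU": 1.0, "AAG": 1.0}, {"UUU": 1, "AAG": 1})
edge = position_sensitivity({"UUU": 1.0, "AAG": 3.0}, {"UUU": 1, "AAG": 1})
```

A position at which all codons share the same factor gives zero sensitivity, whereas widely spread factors give a large value.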

Figure 4.

Sensitivity of Inline graphic factor with codon identity at different local context positions p. (A) Variation of Inline graphic values with codon identity c for positions p = 4 (dark blue, lagging ribosome edge), p = 8 (light blue, A site) and p = 11 (red, leading ribosome edge). Codons are ordered as in the genetic code table. (B) Position sensitivity Sp (Equation 18) versus local context position p (see Figure 1 for position numbering).

Nuclease induced bias in Ribo-Seq spectra from E. coli

To dissect the origins of enhanced codon sensitivity of Inline graphic factors at positions near ribosome edges (Figure 4), we also analyzed Ribo-Seq spectra at single nucleotide resolution. Bacterial Ribo-Seq libraries are commonly constructed by first mapping the 3′-ends of RPFs to genomic nucleotide sequences (6,7). RPF coverage profiles at single nucleotide resolution are then obtained by counting the number Inline graphic of RPFs assigned to nucleotide j of gene i. The Inline graphic-values are subsequently converted to standard experimental RPF profiles at single nucleotide resolution Inline graphic by the re-assignment rule Inline graphic. The premise for this procedure is that the nucleotide distance (x) from the 3′-end to the first A-site nucleotide of an RPF is constant (6). Fragment length (FL)-specific profiles, Inline graphic, are generated from RPFs of the same length, FL, so that the standard Inline graphic profiles can also be obtained by summation of Inline graphic over all FLs. In bacteria, both FL-summed and single FL-specific RPF coverage profiles lack the well-defined three-nucleotide periodicity that is observed in yeast or mammalian cells (6,7). We suggest that this periodicity loss is caused by ‘anomalous’ MNase cleavage at one or two nucleotides downstream of the ordinary cleavage site at the leading (3′) edge of the ribosome. Consequently, the RPF profiles appear as if the translating ribosome moves one nucleotide at a time. In both the single-codon resolution (Equation 1) and single-nucleotide resolution cases, there are expected numbers of RPFs, Inline graphic, generated from ribosomes with their A site at nucleotide number j of ORFi. The local 15-codon context (pL = 15) with the A-site codon at position pA = 8 (Figure 1) here corresponds to a local 45-nucleotide sequence (pNL = 45) with the first A-site nucleotide at position pNA = 22 (Supplementary Figure S1).
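The 3′-end based assignment and the summation over fragment lengths can be sketched as follows. The input format (pairs of 3′-end coordinate and fragment length on the coding strand), the function names and the toy reads are assumptions for illustration. The offset x = 10 is used only because the local numbering in this section places the first A-site nucleotide at position 22 and the RPF 3′-ends at position 32; the appropriate value depends on the dataset.

```python
from collections import Counter, defaultdict


def fl_specific_profiles(reads, x):
    """Assign each read to its first A-site nucleotide, separately per length.

    reads: iterable of (three_prime_end, fragment_length) pairs; coordinates
           are assumed to run 5'->3' along the coding strand.
    x:     assumed constant distance from the 3' end to the first A-site nucleotide.
    """
    profiles = defaultdict(Counter)
    for three_prime_end, fragment_length in reads:
        profiles[fragment_length][three_prime_end - x] += 1
    return profiles


def summed_profile(profiles, lengths):
    """Standard profile: sum of the FL-specific profiles over the chosen lengths."""
    total = Counter()
    for fl in lengths:
        total.update(profiles.get(fl, Counter()))  # Counter.update adds counts
    return total


# Toy reads (hypothetical): three share nearly the same 3' end, one lies elsewhere
toy_reads = [(100, 24), (100, 25), (101, 24), (130, 23)]
by_length = fl_specific_profiles(toy_reads, x=10)
standard = summed_profile(by_length, range(22, 28))  # FL = 22..27 nt
```

Because the assignment uses only the 3′-end, reads of different lengths with the same 3′-end contribute to the same nucleotide position of the summed profile.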

We used our maximum likelihood (ML) approach to estimate the local context factors Inline graphic that capture the nucleotide context dependent variation in Inline graphic as modelled by Inline graphic using Equations 19 and 20. Those Inline graphic factors calculated for fragment length-specific experimental coverage profiles, Inline graphic, are shown in Figure 5 for FL = 23, 24 and 25 nt. Inline graphic varied greatly in response to changing nucleotide base identity (b) at positions 10, 9 and 8 for FL = 23 (Figure 5A), FL = 24 (Figure 5B) and FL = 25 nt (Figure 5C), respectively. At these combinations of nucleotide positions and lengths the Inline graphic factors were always relatively small when b = G or b = C, leading to small model (‘mod’) Inline graphic and Inline graphic values (Equations 19 and 20). Local positions p equal to 10, 9 and 8 correspond to the 5′-ends of the FL = 23, 24 or 25 nt fragments, respectively, implying low abundance of RPFs with G/C at their 5′-ends. Indeed, experimental RPFs with an A at their 5′-end are about 60-fold more abundant than experimental RPFs with a G at the 5′-end, in line with the previous report on the strong preference of MNase to cleave before an A or a U (9). Notably, the 5′-peak of the position sensitivity to nucleotide identity (calculated analogously to Sp in Equation (18)) moves exactly one nucleotide to the right as the fragment length increases by one nucleotide from 22 to 27 nt (Figure 5D). Irrespective of fragment length, the 3′-ends of RPFs are always aligned at local position p = 32, so that MNase cleavage occurs between positions 32 and 33 in the local nucleotide context (Figure 5). The Inline graphic parameters with G or C at local position p = 33 were much smaller and those with A or U much larger than 1 (Figure 5), also in line with the observation that MNase cleaves before A/U nts (9). The 3′-end cleavage bias of MNase was strong and yet considerably less pronounced than the 5′-end cleavage bias. 
In the local nucleotide region between positions 13 and 29 well inside the ribosome (Supplementary Figure S1), the Inline graphic factors were very similar for different fragment lengths (Figure 5), suggesting insignificant technical bias in the 13–29 region of the local nucleotide context.

Figure 5.

Context factors, Inline graphic displayed for local nucleotide positions 1 to 45 for: (A) FL = 23 nt, (B) FL = 24 nt, (C) FL = 25 nt; p = 22 corresponds to the first A-site position (Supplementary Figure S1). (D) Position sensitivity profiles for Inline graphic-parameters calculated from RPF genome coverage with RPF fragments of lengths ranging from 22 to 27 nts.

Inline graphic factors estimated from the standard experimental RPF coverage profile, Inline graphic, obtained by summation of length-specific experimental RPF coverage profiles Inline graphic for RPF lengths from 22 to 27 nts, exhibit much reduced 5′-bias but essentially unchanged 3′-bias (Supplementary Figure S4). The great reduction of the 5′-bias is easily understood by considering that the summation of length specific RPF profiles Inline graphic corresponds roughly to an FL-averaging of Inline graphic factors. This also explains why the position sensitivity profile of Inline graphic factors at codon resolution (Figure 4B) has a smaller bias at positions close to the lagging than to the leading edge of the ribosome.

The strong effects of codon identities at position 11 (leading edge of the ribosome) on Inline graphic values (Figure 4A) can now be easily explained by the biases at positions 31, 32 and 33 observed at nucleotide resolution. For example, the combinations of G or C at positions 31 and 32 with A or U at position 33 (corresponding to the three nucleotide positions of codon 11) are expected to result in large Inline graphic-values, while U or A at positions 31 and 32 combined with C or G at position 33 should result in small Inline graphic-values (see Figure 5 or Supplementary Figure S4). Indeed, GGU (Gly) and CCA (Pro) codons have Inline graphic-values much larger than 1, while codons UUC (Phe) and AAG (Lys) have Inline graphic-values much smaller than 1 (Figure 4A), exactly as predicted from 3′ biases (Supplementary Figure S4). The same analysis applied to the 5′ biases (Supplementary Figure S4) explains the strong Inline graphic codon dependence at the ribosomal lagging edge positions 3 and 4 (Figure 4).

For the codon resolution data, we conclude that the outer codon context-dependent Inline graphic factors for positions p = 1 to p1–1 and from p2+1 to pL (Figure 1) account for the technical biases in RPF library generation. In contrast, the inner codon context Inline graphic factors, for positions p from p1 to p2 mainly reflect the context dependent variation of the peptide elongation times. With this as a lead we estimated the Inline graphic factors for the E. coli AS19 dataset and used the inner subset of Inline graphic factors for all positions from p1 = 5 to p2 = 9 to obtain bias-corrected model Inline graphic parameters (Equation 25). A typical example of such bias elimination is shown in Figure 6A for the E. coli atpE transcript. We contend that the bias-corrected model Inline graphic pausing scores (Equation 27) reflect the bias-free peptide elongation times showing that the ribosome translates mRNA in a much smoother fashion than the experimental Inline graphic RPF scores might suggest.
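The separation of the total context parameter into an inner (elongation-related) part and an outer (bias-related) part can be sketched as below. The function name `split_context` and the toy factor values are hypothetical; the sketch assumes only that the total parameter is the product of all 15 factors and that the inner part is the product over positions p1 to p2, so that the outer part is what remains when the inner part is divided out.

```python
import math


def split_context(factors, p1=5, p2=9):
    """Split the product of 15 z factors into inner and outer contributions.

    factors: list of the z values for local positions p = 1..15 of one window
             (index 0 holds position 1).
    Returns (total, inner, outer) with total == inner * outer.
    """
    total = math.prod(factors)
    inner = math.prod(factors[p1 - 1:p2])  # positions p1..p2, inclusive
    outer = total / inner                  # positions 1..p1-1 and p2+1..15
    return total, inner, outer


# Toy window (hypothetical values): strong edge effects, mild inner variation
window = [2.0] * 4 + [1.0, 1.0, 1.0, 0.5, 1.0] + [0.5] * 6
total, inner, outer = split_context(window)
```

In this toy window the total value is dominated by the outer positions, while the inner product, which would be carried forward as the bias-corrected parameter, changes only through the single A-site factor.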

Figure 6.

Pausing score profile and absolute time spectrum for the atpE (ATP synthase subunit C) transcript at single codon resolution. (A) Comparison of an experimental RPF score profile Inline graphic (red, Equation 12) with the total model RPF score profile Inline graphic (blue, Equation 13) and model pausing score profile Inline graphic (light blue, Equation 27). The pausing score profile is much less jagged ( Inline graphic = 0.3) than the total model (Inline graphic = 0.9) and experimental (Inline graphic = 1.1) RPF score profiles. (B) Absolute elongation time spectrum Inline graphic (Equation 39); the horizontal line corresponds to the average per-codon translation time of the atpE transcript.

We have also estimated the absolute peptide elongation time, Inline graphic, as the product Inline graphic where Inline graphic estimates the time factor Inline graphic in Equation 1 (Figure 6B). We note that our modeling approach allows for determination of the model Inline graphic parameters, and, hence, model pausing scores Inline graphic from the ribosome profiling data alone, but for Inline graphic calculation we need to use additional experimental information provided by the growth rate Inline graphic of the bacterial population (Equation 38).

The local codon context dependent distribution of relative peptide elongation times

The elimination of the technical bias described in the previous section enables estimation of authentic peptide elongation times for any A-site codon j in any ORFi by ‘dividing out’ the bias dependent local context parameter Inline graphic (Equation 24) from the total context parameter Inline graphic (Equation 6), which leads to the context parameter Inline graphic (Equation 25) proportional to the A-site codon elongation time Inline graphic (Equation 30). The frequency densities of Inline graphic and bias-free Inline graphic values for the E. coli transcriptome are displayed along with those for their logarithms in Supplementary Figure S5. The frequency densities of Inline graphic and Inline graphic logarithms are near Gaussian with σ-values of 0.61 and 1.2, respectively (Supplementary Figure S5B). From this, we propose that each rate-limiting elongation step involves the passage over a standard free energy barrier determined by the sum of standard free energy contributions determined by the logarithms of Inline graphic factors in the local codon context. According to the transition-state theory, the time it takes to overcome a standard free-energy barrier increases exponentially with the barrier height (31). In translocation, the height of the free energy barrier could be the sum of the free energies of interaction between ribosome and mRNA throughout the whole inner context region. In peptidyl transfer, the barrier height could be the sum of the free energies from the identities of codons upstream of the A-site codon. According to the Central Limit Theorem, the frequency densities of such free energy sums would be near-Gaussian, providing a tentative explanation for the near-Gaussian frequency densities of the logarithm of Inline graphic-values (Supplementary Figure S5B) the exponentiation of which then leads to a log-normal distribution (Supplementary Figure S5A). 
Interestingly, the frequency density of a log-normal distribution is mimicked by the distribution of the sum of two stochastic variables, one normally and one exponentially distributed. Possibly, this feature has led to the previous proposal that there are two time components in peptide elongation, one Gaussian and one exponential (32). Finally, we note that due to the local context dependent bias there are more Inline graphic factors in Inline graphic (Equation 6) than in Inline graphic (Equation 25), leading to a broader near-Gaussian distribution for the logarithm of Inline graphic than of Inline graphic (Supplementary Figure S5B).
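The Central Limit Theorem argument can be illustrated with a small simulation. This is a toy sketch under arbitrary assumptions: five independent log-factors per codon, each drawn uniformly from [-1, 1], stand in for the logarithms of the inner factors. It uses no experimental data and only shows that the sum of a few independent terms is close to symmetric, whereas its exponential is strongly right-skewed, as expected for a log-normal-like distribution.

```python
import math
import random


def skewness(values):
    """Sample skewness (third standardized moment) of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Sum of five independent log-factors per simulated codon context
log_sums = [sum(rng.uniform(-1.0, 1.0) for _ in range(5)) for _ in range(20000)]
products = [math.exp(s) for s in log_sums]  # product of the five factors

skew_log = skewness(log_sums)   # close to zero: near-Gaussian, symmetric
skew_prod = skewness(products)  # clearly positive: long right tail
```

Replacing the uniform draws by any other distribution with finite variance would be expected to give qualitatively the same picture.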

Determinants of fast and slow peptide elongation cycles in E. coli

The model estimate Inline graphic (Equation 30) of the time that the ribosome spends translating codon j of ORFi is proportional to Inline graphic (Equation 25), a parameter which is estimated from product of Inline graphic factors for the inner codons of the local context around the A site in the p-interval from 5 to 9 (Figure 1). Accordingly, the size of each inner Inline graphic factor is a determinant of the peptide elongation time. Under our experimental E. coli AS19 growth conditions the Inline graphic values for Lys codons AAA or AAG pairing to tRNALys in A (p = 8), P (p = 7) or E site (p = 6) were relatively large and contributed to slow peptide elongation (Figure 7). A similar picture holds for Gly codons GGU and GGC, read by tRNAGly3. In contrast, Ile codons AUC and AUU, Phe codons UUU and UUC and Val codons GUC and GUU translated by tRNAIle2, tRNAPhe and tRNAVal2, respectively, exhibited relatively small Inline graphic values in the A, P and E site of the local context and contributed to fast peptide elongation (Figure 7). In most cases, synonymous codons read by the same tRNA isoacceptor have similar Inline graphic values (Figure 7 and Supplementary Figure S6). This, we propose, reflects similar interactions between the ribosome and the shared cognate tRNA. Along the same line, inner Inline graphic factors of Val codons at the same local position p were different when read by tRNAVal2 or tRNAVal1 (Figure 7), probably reflecting different interactions between the ribosome and the bodies of tRNAVal2 and tRNAVal1.

Figure 7.

Variation of z-factors with the local codon position around the A site (pA= 8) for selected tRNAs reading synonymous codons. Large and small Inline graphic-values contribute to slow and fast peptide elongation, respectively. A complete set of z-factors for all tRNAs is presented in Supplementary Figure S6.

In the A site, codons for charged AAs, e.g. Lys, Asp and Glu, and one hydrophobic AA, Val, encoded by the GUA codon promoted slow peptide elongation (Figure 8A). Codons encoding Gly, Pro and Ala promoted fast or slow peptide elongation depending on whether they are in the A or P site of the local context (Figure 8A, B). In the E site of the local context codons encoding Lys, Glu, Gln and Asp as well as the Gly codons GGC and GGU (translated by tRNAGly3) contributed to slow peptide elongation (Figure 8C). Codons encoding aromatic AAs generally promoted fast elongation when in A, P and E sites of the local context, with Phe being the fastest for our dataset.

Figure 8.

Codons ranked according to Inline graphic values for (A) A-site, (B) P-site and (C) E-site position of local context. Large and small Inline graphic-values designate slow and fast peptide elongation, respectively. Codons are ordered in the descending order of Inline graphic values.

Peptide elongation times in conditions of ternary complex depletion

Next, we considered two published Ribo-Seq datasets, one generated from E. coli MG 1655 strain following short incubation with mupirocin and the other representing an untreated control, grown under otherwise identical conditions (6). Mupirocin is an inhibitor of isoleucyl-tRNA synthetase (IleRS) (33), which depletes charged tRNAIle and causes strong A-site pausing at Ile codons (6). Accordingly, our analysis of the dataset with mupirocin treatment showed greatly increased Inline graphic values for all three Ile codons at A site which correlated with slow peptide elongation at Ile codons due to reduced supply of Ile-tRNAIle-containing ternary complexes (Figure 9A). We noted also that for the major Ile codons (AUC and AUU) Inline graphic increased by 13- and 16-fold, respectively, whereas Inline graphic for the minor AUA Ile codon increased 8.5-fold (Figure 9A). Since the concentration of the minor AUA reading tRNAIle2 is an order of magnitude lower than the tRNAIle1 concentration pairing to the major Ile codon (34), we propose that the mupirocin-induced relative increase in A-site binding time is much larger for ternary complex with the major than with the minor isoacceptor (see Discussion for more details). A much higher sensitivity to IleRS inhibition for AUC/AUU than for AUA reading is also predicted by the theory of selective charging of tRNA isoacceptors (35), corroborated for a similar case of other aminoacyl-tRNA synthetase inhibition (36).

Figure 9.

Inline graphic values are affected by E. coli MG 1655 amino acid starvation. (A) Marked stalling at Ile codons following treatment with mupirocin. Variation of Inline graphic factors with the local codon position around the A site (pA= 8) for three Ile codons in untreated E. coli MG1655 (left panel) and treated with mupirocin (right panel). Large Inline graphic values indicate propensity for slow peptide elongation. (B) Comparison of the A-site Inline graphic values for untreated E. coli MG 1655 (red) and E. coli MG 1655 treated with mupirocin (light blue). Codons are ordered as in the genetic code table.

Compared to the untreated E. coli MG1655 cells we also found that along with the slower Ile codon reading, mupirocin addition caused a faster Ser/Gly codon decoding (Figure 9B). The relatively slow decoding of Ser and Gly codons in the control was attributed to quick depletion of Ser- and Gly-tRNAs due to culture filtration before the Ribo-Seq library preparation (6). Accordingly, we also detected higher Inline graphic values at Ser and Gly codons indicative of slow elongation on these codons in the untreated E. coli MG1655 cells (Figure 9B). We speculate that mupirocin treatment results in a drastic slowing down of the global translation in the cell, which also reduces Ser and Gly consumption. Hence, the pools of charged seryl-tRNAs and glycyl-tRNAs are maintained, thus eliminating the pausing on Gly and Ser codons (see Discussion for details).

We have also calculated Inline graphic factors at nucleotide resolution for the untreated E. coli MG 1655 data set (Supplementary Figure S7A) and compared them with Inline graphic factors for our dataset (Figure 5). While the 3′-end bias in Inline graphic factors for the same FL was similar for the two data sets, the 5′-end bias was much less pronounced for the E. coli MG 1655 (compare Figure 5A and S7A). Similarly, Inline graphic factors estimated from the standard RPF nucleotide coverage profile Inline graphic for E. coli MG 1655 also had much less pronounced 5′- bias than the corresponding Inline graphic factors for the E. coli AS19 data set (Supplementary Figures S7B and S4). We attribute these differences to the much longer incubation with MNase of 1 hour for E. coli MG1655 (6) vs. 10 min for our E. coli AS19 during library preparations.

Neutralization of nuclease induced bias in Ribo-seq spectra from Saccharomyces cerevisiae

To further validate our modeling approach, we considered two published datasets from the yeast S. cerevisiae (19). These were prepared with MNase (S7) and RNase A, which have distinct cleavage biases: while MNase cuts preferentially before A and U, RNase A cleaves preferentially after C and U (19). Here, we applied our nucleotide-resolution ML approach to quantify the characteristic biases in the two datasets. For the MNase set we detected much higher Inline graphic factor values for A/U compared to C/G nucleotides at positions p = 7 and p = 35, corresponding to the nucleotide at the 5′-end and the nucleotide after the 3′-end of the RPF, respectively (Figure 10A). This pattern is very similar to that in our MNase dataset from E. coli (Figure 5). In contrast, the Inline graphic factors for the RNase A dataset were relatively small for b = A or G at position p = 6, corresponding to the nucleotide before the 5′-end of the RPF, and near zero at position p = 34, which corresponds to its 3′-end (Figure 10B). This suggests that the technical biases of the two data sets are distinct and that the differences reflect the cleavage preferences of the nucleases used to generate the RPF libraries.

Figure 10.

Nucleotide-resolution Inline graphic-factors calculated from RPF coverage profiles with RPFs with FL = 28 for yeast Ribo-Seq datasets constructed using MNase (A) and RNase A (B).

We then calculated the codon resolution Inline graphic factors for the MNase- and RNase A-treated yeast datasets and used the inner subset of Inline graphic factors for positions from p1 = 5 to p2 = 9 to obtain bias-corrected Inline graphic parameters (Equation 25) and Inline graphic pausing scores (Equation 27). As expected, the correlations between the Inline graphic RPF scores (Equation 12) and between the Inline graphic RPF scores (Equation 13) from the two data sets obtained with MNase and RNase A are both weak (r = 0.3), as exemplified in Figure 11A and B for the YGR027C transcript (coding for the S25 protein of the 40S ribosomal subunit). In contrast, the bias-corrected pausing score Inline graphic profiles of YGR027C from the two data sets are strongly correlated (r = 0.77) with similar features (Figure 11C). The absolute translation time profiles Inline graphic for the YGR027C transcript calculated from the RNase A and MNase data sets, assuming a 2 h duplication time of the yeast culture (Equation 36), are also remarkably similar (Figure 11D). This similarity also reflects the varying codon elongation time as the ribosome moves codon by codon along the YGR027C transcript. We obtained very similar results for other transcripts. Notably, the frequency distribution of the r-values underwent a large shift from low to high correlation following neutralization of nuclease-introduced biases (Figure 11E). We also observed a strong correlation between Inline graphic and Inline graphic values obtained for the MNase (S7) and RNase A datasets for positions near the A site (Supplementary Figure S8A). This correlation for the A-site position was r = 0.8 and increased (r = 0.85) when rare codons (i.e. with frequency < 0.3%) were excluded. This, we suggest, reflects the similarity of the effects of a particular A-site codon on the codon translation time in both data sets. 
For the P and E sites, the correlation between Inline graphic and Inline graphic factors was less pronounced (r = 0.7). As expected, the correlation between Inline graphic and Inline graphic factors for positions near the edges of the yeast ribosome (e.g. positions 3 and 12) is low (r < 0.25), which reflects the distinct sequence preferences of the nucleases in RPF generation (Supplementary Figure S8B).
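Why removing nuclease-specific factors should raise the correlation between two datasets can be illustrated with a toy simulation. Everything in this sketch is an assumption made for illustration: the shared 'inner' contribution, the dataset-specific 'outer' bias terms, the log-normal form and the spread values are arbitrary, and no yeast data are used.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)  # fixed seed for reproducibility
n_codons = 500


def lognormal(sigma):
    return math.exp(rng.gauss(0.0, sigma))


# Shared elongation-related signal, estimated with small independent errors
inner_true = [lognormal(0.6) for _ in range(n_codons)]
inner_a = [t * lognormal(0.1) for t in inner_true]  # dataset A (e.g. MNase)
inner_b = [t * lognormal(0.1) for t in inner_true]  # dataset B (e.g. RNase A)

# Each dataset multiplies the shared signal by its own nuclease-specific bias
full_a = [v * lognormal(1.0) for v in inner_a]
full_b = [v * lognormal(1.0) for v in inner_b]

r_full = pearson(full_a, full_b)     # bias-containing scores
r_inner = pearson(inner_a, inner_b)  # bias-corrected scores
```

In such a setting the bias-containing scores correlate weakly, because the two bias terms are independent, while the inner contributions correlate strongly; this is the qualitative pattern described above for the two yeast datasets.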

Figure 11.

Comparison of experimental RPF scores Inline graphic (A), model RPF scores Inline graphic (B) and pausing scores Inline graphic (C) for YGR027C transcript (encoding ribosomal protein S25, Rps25a) in MNase (red) and RNase A (blue) yeast Ribo-Seq data sets; the Pearson correlation coefficients, r, between the two data sets are r = 0.32 (A), r = 0.35 (B), r = 0.77 (C) and r = 0.77 (D). (D) Absolute elongation time spectrum (Equation 39) of YGR027C derived from the MNase (red) and RNase A (blue) Ribo-Seq data set. The mean elongation time per codon for the YGR027C transcript is 80.8 ms and 86.7 ms when estimated from the RNase A and MNase datasets, respectively. (E) Frequency distribution of Pearson correlation coefficients, r, between scores for the same transcripts in yeast datasets prepared with RNase A (RA) and MNase (S7): for experimental scores (light blue, correlation between Inline graphic and Inline graphic), model scores (red, correlation between Inline graphic and Inline graphic), ‘five-inner model’ scores (dark blue, correlation between Inline graphic and Inline graphic) and bias-free pausing scores (orange, correlation between Inline graphic and Inline graphic). ‘Five-inner model’ refers to modeling each (i,j) A-site context contribution with 5 Inline graphic position parameters (see Equations 28 and 29 for Inline graphic definition).

We wish to emphasize that to obtain the bias-corrected pausing scores Inline graphic (Equation 27), we used the five inner-position Inline graphic values from the fifteen Inline graphic values accounting for the total local A-site context in the modeling of the experimental Inline graphic datasets. A different bias elimination method was developed earlier (20), using neural network modeling to predict the elongation time of the A-site codon from its short sequence context (which does not include the edges of the RPFs). To compare these two principally different ways of bias elimination, we followed this approach (20) and restricted the context to five codons around each A-site codon, thereby excluding the edges of the RPFs. We then modeled the two Ribo-Seq sets from S. cerevisiae (19) processed with either MNase or RNase A using ‘five-inner modeling’, i.e. using only five-inner Inline graphic values to obtain RPF scores Inline graphic (Equations 28 and 29) and found them different from pausing scores Inline graphic. We then calculated Pearson correlations between Inline graphic and Inline graphic for each transcript i in the two datasets. Clearly, our ‘bias-free method’ leads to much higher correlation between the Inline graphic derived Inline graphic scores (Figure 11E, orange ‘bias-free’ r-frequency profile) than the ‘five-inner model’ for the correlation between the Inline graphic-derived Inline graphic scores (Figure 11E, ‘five inner model’, dark blue r-frequency profile). An intuitive explanation for this result is that in the ‘five inner model’ the five Inline graphic inner-position factors used for the description of a Inline graphic dataset absorb the experimental biases. In contrast, using pL = 15 local positions for modeling Inline graphic, the experimental biases are absorbed by the outer ten Inline graphic factors (Equation 24), thus leaving the five-inner Inline graphic factors bias-free. 
Thus, the ‘five-inner modelling’, which essentially emulates an earlier approach (20), reduces the precision of elongation time estimates.

DISCUSSION

For decades, quantitative studies of protein synthesis with purified ribosomes and auxiliary translation components have been performed across species (37,38). In spite of the insights from these biochemical approaches, there are considerable differences between the empirical contexts of cell-free and intracellular mRNA translation. For instance, in the living cell tightly controlled parallel pathways exist for the supply of aminoacyl-tRNAs and for ternary complex formation and, furthermore, the translation of A-site codons takes place in the context of a virtual infinitude of sets of neighboring codons. Thus, experimental approaches orthogonal to in vitro biochemistry will deepen our understanding of how the intracellular kinetic networks of mRNA translation shape the life sustaining phenotypes of living cells. In the present work, we join the ongoing and rapidly growing efforts to establish genome-wide technologies (3,39) for quantitative studies of mRNA translation. We provide a framework for parallel estimation of elongation times of all codons in all local codon contexts of different types of cells. This was made possible by the development of a novel type of model to be fitted to transcriptome-wide ribosome profiling data for parameter estimation. Our model describes the elongation time at each codon of the transcriptome as a product of 15 independent Inline graphic factors, one for each codon position in the local context surrounding the ribosomal A site. The factor for each codon context position can have one of 61 possible values, depending on its codon identity and context position. Using a maximum likelihood criterion, we obtain the values of 15 × 61 = 915 Inline graphic factors for 61 sense codons in 15 local context positions by fitting our model to the experimental RPF spectrum. Despite large ruggedness and stochastic fluctuations, the experimental data are well fitted by the model.
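The size of the parameter table follows directly from the standard genetic code, as the short sketch below shows. The variable names are chosen for this example only; the three stop codons are those of the standard code.

```python
from itertools import product

STOP_CODONS = {"UAA", "UAG", "UGA"}  # stop codons of the standard genetic code

# All 64 triplets, ordered as in the genetic code table, minus the stop codons
all_codons = ["".join(triplet) for triplet in product("UCAG", repeat=3)]
sense_codons = [codon for codon in all_codons if codon not in STOP_CODONS]

N_POSITIONS = 15  # local context positions around the A site
n_factors = N_POSITIONS * len(sense_codons)
```

Here `len(sense_codons)` is 61 and `n_factors` is 915, the number of factors fitted to the experimental RPF spectrum.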

To discriminate between effects of codon context on nuclease cleavage preferences on one hand and peptide elongation time variations on the other, we use models with both single-codon and single-nucleotide resolution. In line with previous findings (9), we find much higher MNase activity at A/U compared to G/C nucleotides near the 5′- or 3′-ends of RPFs, leading to strongly skewed fragment creation/processing and biased RPF spectra. At the same time, the MNase cleavage bias does not propagate into the inner context on either side of the A-site codon, a crucial feature enabling neutralization of technical codon context-dependent bias. In this way, we derived unbiased RPF spectra suitable for estimation of codon elongation times throughout the transcriptome. We observed differences in the Inline graphic values for the A site between two different E. coli strains, MG1655 and AS19 (compare Figures 8A and 9B), implying that our approach can very sensitively detect elongation time differences at single codons between different strains, growth media and conditions.

We have applied our modeling approach to clarify the effects of mupirocin-induced inhibition of IleRS activity in a bacterial system using a previously published data set (6). The inhibition decreases the rate of supply of charged tRNAIle isoacceptors (33) and greatly increases the model parameter values for all three Ile codons (AUA, AUC and AUU) in the A site, suggesting a greatly increased binding time for isoleucyl-tRNAIle-containing ternary complexes. Considering that the total concentration of the major tRNAIle1 isoacceptor is an order of magnitude larger than that of the minor tRNAIle2 isoacceptor (34), and assuming nearly 100% charging levels of both tRNAIle isoacceptors in the absence of the inhibitor, the time for ternary complex binding to the A site is estimated to be an order of magnitude smaller for AUC/AUU than for AUA codons. In the absence of the inhibitor, the total peptide elongation time is about 30% longer for AUA than for AUC/AUU codons (Figure 9B). From these data we suggest that the relative change in the time for ternary complex binding to the A site is much larger for AUC/AUU than for AUA codons, meaning that AUA decoding is much less sensitive to IleRS inhibition than AUC/AUU decoding. This further corroborates the theory of selective charging of tRNA isoacceptors (35), previously validated by SerRS inhibition in E. coli cells (36). In fact, our method might be very useful for detecting ternary complex depletion scenarios in cells. This optimistic notion receives further support from the observation that mupirocin, in addition to slowing down translation at Ile codons, also speeds up translation of Gly and Ser codons in a codon-selective manner. That is, mupirocin addition considerably reduces the reading times of the major (GGC/GGU) but not of the minor (GGG/GGA) Gly codons and reduces the reading times of all Ser codons (Figure 9).
A possible scenario that also explains these codon-specific patterns is that, under the experimental conditions used to obtain the RPF dataset in E. coli MG1655 grown in balanced medium, both Gly and Ser codons are weakly starved for their cognate ternary complexes (6) due to a deficient intracellular supply of Gly and Ser (40). Mupirocin addition slows down overall protein synthesis, thereby removing the supply bottlenecks for Gly and Ser and, with them, the pausing at their codons. We note that the theory of selective charging of tRNAs predicts starvation-sensitive reading of GGC/GGU but not of GGG/GGA codons and starvation sensitivity of all Ser codons (35), which corroborates the proposed scenario of weak Gly and Ser starvation that is removed by addition of an IleRS inhibitor.
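The isoleucine numbers quoted above (a roughly tenfold difference in ternary complex binding time and a roughly 30% difference in total elongation time between AUA and AUC/AUU) can be combined in a back-of-the-envelope calculation. The sketch below rests on two assumptions that are not stated in the text: that binding time scales inversely with isoacceptor concentration, and that all remaining steps of the elongation cycle take the same time for the three Ile codons. It is an illustration of the arithmetic, not a result of the study.

```python
def binding_fractions(binding_ratio, total_time_ratio):
    """Share of the elongation cycle spent on ternary complex binding.

    Let t be the binding time at AUC/AUU and r the time of all other steps
    (assumed equal for all Ile codons). Then

        binding_ratio * t + r = total_time_ratio * (t + r)

    which is solved for t / r below.
    """
    t_over_r = (total_time_ratio - 1.0) / (binding_ratio - total_time_ratio)
    share_major = t_over_r / (t_over_r + 1.0)  # AUC/AUU codons
    share_minor = binding_ratio * t_over_r / (binding_ratio * t_over_r + 1.0)  # AUA
    return share_major, share_minor


# Values taken from the text: tenfold binding-time ratio, 30% longer total time.
share_auc_auu, share_aua = binding_fractions(10.0, 1.3)
```

Under these assumptions, binding would account for only a few percent of the cycle at AUC/AUU codons but for roughly a quarter of it at AUA codons in uninhibited cells.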

We have broadened our approach from bacterial to eukaryotic systems. We compared two published Ribo-Seq data sets from S. cerevisiae (19), derived from identical yeast populations but processed with different nucleases, either MNase or RNase A. Both nucleases exhibit strong but distinct cleavage preferences, leading to greatly different and virtually uncorrelated RPF spectra, both experimental and model-reproduced. However, after bias neutralization, the model spectra for both nucleases become less rugged and are strikingly similar (Figure 11). We therefore propose that our bias-neutralization approach provides a solution to the long-standing problem of extracting reliable quantitative information about individual codon elongation cycle times from greatly rugged, highly noisy and biased RPF spectra.
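One simple way to quantify how similar two spectra are before and after bias removal is a Pearson correlation coefficient. The sketch below uses synthetic toy numbers, not data from the study, and assumes that each nuclease multiplies a shared underlying signal by its own position-specific bias. In this toy case the bias is known exactly, whereas in practice it has to be estimated from the fitted model.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Synthetic toy values (hypothetical): one shared signal, two distinct biases.
true_times = [1.0, 2.0, 1.5, 3.0, 0.5, 2.5]
bias_nuclease_a = [4.0, 0.2, 0.5, 0.3, 6.0, 1.0]
bias_nuclease_b = [0.3, 5.0, 0.4, 2.0, 1.0, 0.2]

raw_a = [t * b for t, b in zip(true_times, bias_nuclease_a)]
raw_b = [t * b for t, b in zip(true_times, bias_nuclease_b)]

# Dividing out the (here exactly known) bias recovers the shared signal.
corrected_a = [r / b for r, b in zip(raw_a, bias_nuclease_a)]
corrected_b = [r / b for r, b in zip(raw_b, bias_nuclease_b)]

r_raw = pearson(raw_a, raw_b)
r_corrected = pearson(corrected_a, corrected_b)
```

With these toy values the raw spectra are poorly correlated while the corrected spectra nearly coincide, which mirrors qualitatively the behavior described for the two yeast data sets.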

Ribosome profiling holds great promise for detailed insights into the dynamics of protein synthesis in single cells and multicellular organisms. Ongoing improvements in data analysis, refinements of experimental techniques and the synergy of different, sometimes orthogonal, approaches will accelerate the development of this promising field.

DATA AVAILABILITY

The sequencing data for E. coli AS19 generated in this study have been deposited in the Gene Expression Omnibus (GEO) under accession number GSE145571. The two published data sets analyzed here (6,19) are available in GEO under accession numbers GSE119104 (GSM3358136 and GSM3358137) for E. coli MG1655 and GSE82220 (GSM2186726 and GSM2186728) for yeast. All scripts and source code for modeling and calculating the parameters used here are deposited at https://github.com/gustafGitHub/RiboTimes.

ACKNOWLEDGEMENTS

We thank Irem Avcilar, Christian del Campo, and Anneli Borg for ribosome profiling libraries and growth curves, and Alexander Bartholomäus and Baban Kolte for helping with the mapping pipeline.

Contributor Information

Michael Y Pavlov, Department of Cell and Molecular Biology, Biomedical Center, University of Uppsala, 75237 Uppsala, Sweden.

Gustaf Ullman, Department of Cell and Molecular Biology, Biomedical Center, University of Uppsala, 75237 Uppsala, Sweden.

Zoya Ignatova, Institute for Biochemistry & Molecular Biology, University of Hamburg, 20146 Hamburg, Germany.

Måns Ehrenberg, Department of Cell and Molecular Biology, Biomedical Center, University of Uppsala, 75237 Uppsala, Sweden.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

Swedish Research Council [VR 2018-0404, VR 2016-0624]; Knut and Alice Wallenberg Foundation (to M.E.); EU Horizon 2020 program (Marie Skłodowska-Curie) [764591 to Z.I.]. Funding for open access charge: EU Horizon 2020 program (Marie Skłodowska-Curie) [764591].

Conflict of interest statement. None declared.

REFERENCES

1. Steitz J.A. Polypeptide chain initiation: nucleotide sequences of the three ribosomal binding sites in bacteriophage R17 RNA. Nature. 1969; 224:957–964.
2. Ingolia N.T., Ghaemmaghami S., Newman J.R., Weissman J.S. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science. 2009; 324:218–223.
3. Ingolia N.T. Ribosome footprint profiling of translation throughout the genome. Cell. 2016; 165:22–33.
4. Schuller A.P., Green R. Roadblocks and resolutions in eukaryotic translation. Nat. Rev. Mol. Cell Biol. 2018; 19:526–541.
5. Stern-Ginossar N., Ingolia N.T. Ribosome profiling as a tool to decipher viral complexity. Annu Rev Virol. 2015; 2:335–349.
6. Mohammad F., Green R., Buskirk A.R. A systematically-revised ribosome profiling method for bacteria reveals pauses at single-codon resolution. eLife. 2019; 8:e42591.
7. Woolstenhulme C.J., Guydosh N.R., Green R., Buskirk A.R. High-precision analysis of translational pausing by ribosome profiling in bacteria lacking EFP. Cell Rep. 2015; 11:13–21.
8. Datta A.K., Burma D.P. Association of ribonuclease I with ribosomes and their subunits. J. Biol. Chem. 1972; 247:6795–6801.
9. Dingwall C., Lomonossoff G.P., Laskey R.A. High sequence specificity of micrococcal nuclease. Nucleic Acids Res. 1981; 9:2659–2673.
10. O’Connor P.B., Li G.W., Weissman J.S., Atkins J.F., Baranov P.V. rRNA:mRNA pairing alters the length and the symmetry of mRNA-protected fragments in ribosome profiling experiments. Bioinformatics. 2013; 29:1488–1491.
11. Weinberg D.E., Shah P., Eichhorn S.W., Hussmann J.A., Plotkin J.B., Bartel D.P. Improved ribosome-footprint and mRNA measurements provide insights into dynamics and regulation of yeast translation. Cell Rep. 2016; 14:1787–1799.
12. Zheng W., Chung L.M., Zhao H. Bias detection and correction in RNA-sequencing data. BMC Bioinformatics. 2011; 12:290.
13. Artieri C.G., Fraser H.B. Accounting for biases in riboprofiling data indicates a major role for proline in stalling translation. Genome Res. 2014; 24:2011–2021.
14. O’Connor P.B., Andreev D.E., Baranov P.V. Comparative survey of the relative impact of mRNA features on local ribosome profiling read density. Nat. Commun. 2016; 7:12915.
15. Sharma A.K., Sormanni P., Ahmed N., Ciryam P., Friedrich U.A., Kramer G., O’Brien E.P. A chemical kinetic basis for measuring translation initiation and elongation rates from ribosome profiling data. PLoS Comput. Biol. 2019; 15:e1007070.
16. Del Campo C., Bartholomaus A., Fedyunin I., Ignatova Z. Secondary structure across the bacterial transcriptome reveals versatile roles in mRNA regulation and function. PLoS Genet. 2015; 11:e1005613.
17. Mortazavi A., Williams B.A., McCue K., Schaeffer L., Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Meth. 2008; 5:621–628.
18. Mohammad F., Woolstenhulme C.J., Green R., Buskirk A.R. Clarifying the translational pausing landscape in bacteria by ribosome profiling. Cell Rep. 2016; 14:686–694.
19. Gerashchenko M.V., Gladyshev V.N. Ribonuclease selection for ribosome profiling. Nucleic Acids Res. 2017; 45:e6.
20. Tunney R., McGlincy N.J., Graham M.E., Naddaf N., Pachter L., Lareau L.F. Accurate design of translational output by a neural network model of ribosome distribution. Nat. Struct. Mol. Biol. 2018; 25:577–582.
21. Levenberg K. A method for the solution of certain non-linear problems in least squares. Quarterly Appl Math. 1944; 2:164–168.
22. Marquardt D. An algorithm for least-squares estimation of nonlinear parameters. SIAM J. Appl. Math. 1963; 11:431–441.
23. Dana A., Tuller T. The effect of tRNA levels on decoding times of mRNA codons. Nucleic Acids Res. 2014; 42:9171–9181.
24. Hussmann J.A., Patchett S., Johnson A., Sawyer S., Press W.H. Understanding biases in ribosome profiling experiments reveals signatures of translation dynamics in yeast. PLoS Genet. 2015; 11:e1005732.
25. Bremer H., Dennis P.P. Modulation of chemical composition and other parameters of the cell at different exponential growth rates. EcoSal Plus. 2008; 3:doi:10.1128/ecosal.5.2.3.
26. Bartholomaus A., Del Campo C., Ignatova Z. Mapping the non-standardized biases of ribosome profiling. Biol. Chem. 2016; 397:23–35.
27. McGlincy N.J., Ingolia N.T. Transcriptome-wide measurement of translation by ribosome profiling. Methods. 2017; 126:112–129.
28. Chiba S., Ito K. Multisite ribosomal stalling: a unique mode of regulatory nascent chain action revealed for MifM. Mol. Cell. 2012; 47:863–872.
29. Lu J., Deutsch C. Electrostatics in the ribosomal tunnel modulate chain elongation rates. J. Mol. Biol. 2008; 384:73–86.
30. Nakatogawa H., Ito K. Secretion monitor, SecM, undergoes self-translation arrest in the cytosol. Mol. Cell. 2001; 7:185–192.
31. Laidler K., King C. Development of transition-state theory. J. Phys. Chem. 1983; 87:2657–2664.
32. Dana A., Tuller T. Properties and determinants of codon decoding time distributions. BMC Genomics. 2014; 15:S13.
33. Hughes J., Mellows G. Inhibition of isoleucyl-transfer ribonucleic acid synthetase in Escherichia coli by pseudomonic acid. Biochem. J. 1978; 176:305–318.
34. Ikemura T. Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system. J. Mol. Biol. 1981; 151:389–409.
35. Elf J., Nilsson D., Tenson T., Ehrenberg M. Selective charging of tRNA isoacceptors explains patterns of codon usage. Science. 2003; 300:1718–1722.
36. Lindsley D., Bonthuis P., Gallant J., Tofoleanu T., Elf J., Ehrenberg M. Ribosome bypassing at serine codons as a test of the model of selective transfer RNA charging. EMBO Rep. 2005; 6:147–150.
37. Dever T.E., Green R. The elongation, termination, and recycling phases of translation in eukaryotes. Cold Spring Harb. Perspect. Biol. 2012; 4:a013706.
38. Maracci C., Rodnina M.V. Review: translational GTPases. Biopolymers. 2016; 105:463–475.
39. Iwasaki S., Ingolia N.T. The growing toolbox for protein synthesis studies. Trends Biochem. Sci. 2017; 42:612–624.
40. Avcilar-Kucukgoze I., Bartholomaus A., Cordero Varela J.A., Kaml R.F., Neubauer P., Budisa N., Ignatova Z. Discharging tRNAs: a tug of war between translation and detoxification in Escherichia coli. Nucleic Acids Res. 2016; 44:8324–8334.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
