Abstract
Ribosome profiling spectra bear rich information on translation control and dynamics. Yet, due to technical biases in library generation, extracting quantitative measures of discrete translation events has remained elusive. Using maximum likelihood statistics and a data set from Escherichia coli, we develop a robust method for neutralizing technical biases (e.g. base-specific RNase preferences in the generation of ribosome-protected mRNA fragments (RPFs)), which allows for correct estimation of translation times at single codon resolution. Furthermore, we validated the method with available datasets from E. coli treated with an antibiotic that inhibits isoleucyl-tRNA synthetase, and with two datasets from Saccharomyces cerevisiae treated with two RNases with distinct cleavage signatures. We demonstrate that our approach accounts for RNase cleavage preferences and provides bias-corrected translation time estimates. Our approach provides a solution to the long-standing problem of extracting reliable information about peptide elongation times from highly noisy and technically biased ribosome profiling spectra.
INTRODUCTION
Ribosome profiling (or Ribo-Seq) couples cell-wide profiling of the positions of translating ribosomes on messenger RNA (mRNA) at single codon resolution (1) with deep sequencing (2) and has provided new insights into the regulation of protein synthesis across species (reviewed in (3–5)). The approach requires rapid arrest of mRNA translation followed by isolation of intact mRNA-ribosome complexes, nuclease digestion of unprotected mRNA and generation of a deep-sequencing library from the ribosome-protected mRNA fragments (RPFs) (2). Interpretation of the RPFs in terms of elongation times at single codon resolution requires (i) ribosomal arrest to be faster than the single peptide elongation steps, (ii) precise estimation of the distance of the ribosomal A site (that is, the ribosomal site accepting the aminoacyl-tRNA-elongation factor complex) from the 5′- or 3′-ends of RPFs, and (iii) neutralization of sequence-dependent biases in the experimental protocol (e.g. nuclease cleavage, amplification in the library preparation) (3,6). Fulfillment of these criteria enables determination of the translation time for any particular codon in the transcriptome.
Codon resolution of the RPF spectra is generally higher in eukaryotes than in bacteria. In eukaryotes, RNase I is the nuclease of choice and it cleaves precisely at ribosome boundaries (7). RNase I is inhibited by the bacterial ribosome (8), thus micrococcal nuclease (MNase, S7 nuclease) is most widely applied in generating bacterial Ribo-Seq libraries. MNase, however, cleaves with base-dependent specificity, preferably before A and U (9). Systematic analysis reveals that the MNase generated RPFs have more variable lengths at their 5′- than at their 3′-ends (7,10). Consequently, using the more precise MNase cleavage at the 3′-end to infer the A-site codon position improves the resolution of bacterial ribosome profiling sets (6,7), yet the bias in RPF generation due to the nucleotide-dependent specificity of the MNase persists.
An additional source of bias in the Ribo-Seq libraries is the local RPF sequence composition including high propensity for secondary structure formation for some RNA fragments which can interfere with the reverse transcription priming and/or with the adaptor ligation (11,12). Attempts at considering the systematic biases across Ribo-Seq libraries (13) or using smoothing algorithms to reduce data variance in the presence of the inherent heterogeneous noise of the ribosome profiling data sets (14,15) significantly improve the ability to distinguish genuine ribosome pausing from technical artifacts introduced by the library construction. Yet, a simple and robust method for neutralizing technical biases and extracting factors that determine the large sequence context dependent variations in translation speed even at identical ribosomal A-site codons is missing.
In the present work, we develop a model that accounts for the local codon context-dependent variation of peptide elongation times and RPF generation/processing biases. In total, we use 915 context-defining parameters, which are estimated by fitting the model-predicted RPF spectra to the experimental, transcriptome-wide RPF spectra using non-linear regression with maximum likelihood (ML) statistics. We also consider ribosome profiling spectra at single nucleotide resolution with homogeneous fragment size to identify and neutralize RPF generation/processing biases near the 5′- and 3′-fragment ends. Our results suggest that an inner local context of five codons, including those at the A, P and E sites, accounts for the ribosomal dwell time on each A-site codon of the transcriptome. This determination of the peptide elongation times provides a basis for a detailed understanding of the dynamics of protein synthesis in living cells.
MATERIALS AND METHODS
Ribo-Seq library generation
Escherichia coli B strain AS19 was grown in LB medium until the culture reached an OD600 of 0.5. Cells were harvested by flash freezing and libraries from biological replicates were prepared for ribosome profiling by direct ligation of the platform-specific sequences or adapters as described (16). Sequenced RPFs were quality trimmed using fastx-toolkit (0.0.13.2; quality threshold: 20), sequencing adapters were cut using cutadapt (1.8.3; minimal overlap: 1 nt), and reads were uniquely mapped to the E. coli genome (strain MG1655, version U00096.3, NCBI) using Bowtie (1.2.2) with parameters -l 16 -n 1 -e 50 -m 1 --strata --best y. The RPF counts for each ORF were normalized to reads per million mapped reads (RPM) (17) and calibrated to the A site using the 3′-ends of the RPFs as described earlier (18). The data sets generated in this study are accessible under the accession number GSE145571. Furthermore, we analyzed in the same way the following data sets: GSM3358136 and GSM3358137 for Ribo-Seq libraries of E. coli MG1655 cultured in MOPS complete synthetic medium containing all 20 amino acids, either untreated or treated for 10 min with 200 μM mupirocin, respectively, and collected by filtration (6); and GSM2186726 and GSM2186728 for S. cerevisiae libraries in which the RPFs were generated using MNase and RNase A, respectively (19).
Modeling strategy for Ribo-Seq spectra
Each RPF is assigned to a codon position j of the open reading frame from gene i, ORFi. The detected number of RPFs, $n_{ij}^{\mathrm{exp}}$, often colloquially referred to as ‘RPF counts’, reflects the number of ribosomes with this particular codon in the A site at the moment of flash-freezing of the cells as well as biases in the nuclease digestion of mRNA and in the further amplification/processing to DNA libraries (3,9,11,12,14,20). We write the expected value $\lambda_{ij}$ of the stochastic integer $n_{ij}^{\mathrm{exp}}$ at any A-site codon position (i,j) as:

$$\lambda_{ij} = \kappa\,\iota_i\,\tau_{ij}\,\beta_{ij} \tag{1}$$

Here, $\kappa$ is the same constant for all A-site positions (i,j), $\iota_i$ is the global frequency of translation initiation of an ORF of type i in the cell population and is proportional to the ORFi expression level, $\tau_{ij}$ is the expected peptide elongation cycle time, and $\beta_{ij}$ is a ‘bias’ factor that depends on the context of codon j in ORFi and reflects the extent of digestion/processing/amplification biases in Ribo-Seq library preparation. We note that the constant $\kappa$ reflects the depth of the Ribo-Seq library. Its numeric value depends on the number of translating ribosomes in the cell population used for library preparation and also on the efficiencies of ligation, RPF amplification and sequencing.

Each elongation time $\tau_{ij}$ in Equation (1) is the product of a time calibration factor $\tau_e$ and a parameter $\theta_{ij}$ that, like $\beta_{ij}$, depends on the context of codon j but is proportional to the peptide elongation cycle time: $\tau_{ij} = \tau_e\,\theta_{ij}$. Accordingly, we re-write Equation (1) as:

$$\lambda_{ij} = \mu_i\,\zeta_{ij} \tag{2}$$

Here, parameter $\mu_i = \kappa\,\tau_e\,\iota_i$ is proportional to the global frequency of translation initiation $\iota_i$ of ORFi and $\zeta_{ij}$ is defined by $\zeta_{ij} = \theta_{ij}\,\beta_{ij}$. The expected value of RPF counts, $\lambda_{ij}$, contains two factors of great physiological relevance, namely the protein expression level from gene i, reflected in $\iota_i$, and the expected peptide elongation cycle time $\tau_{ij}$ at A-site codon j in transcript i. The major methodological task is, therefore, to elicit reliable estimates of the expected values $\mu_i$, proportional to $\iota_i$, and $\tau_{ij}$ for all codons (i, j) from the experimental sets of the sampled $n_{ij}^{\mathrm{exp}}$ values and the known growth rate of the bacterial culture. When $n_{ij}^{\mathrm{exp}}$ is much larger than 1, it provides a reliable estimate of $\lambda_{ij}$, but for small $n_{ij}^{\mathrm{exp}}$ values its statistical nature must be accounted for by the probability $P(n_{ij}^{\mathrm{exp}})$ that the number of RPF counts from A-site codon (i,j) is $n_{ij}^{\mathrm{exp}}$. The RPF counts $n_{ij}^{\mathrm{exp}}$ are obtained from ligated RNA fragments with copy numbers amplified by PCR and greatly reduced in the sequencing procedure. The probability distributions for RNA fragments after ligation are of Poisson type (Supplementary Text), the distributions of DNA fragments after amplification are burst-like (21) and the distributions of sequenced DNA fragments $n_{ij}^{\mathrm{exp}}$ are of Neyman type A. At small values of the product $qA$, where A is the PCR copy number amplification factor and q is the fraction of the amplified library that has finally been sequenced, the Neyman type A distribution is close to Poisson but with a variance equal to the expected value ($\lambda_{ij}$) multiplied by a constant factor (Supplementary Text). Under our experimental conditions the $qA$ product is smaller than 1 and, for simplicity in what follows, we assume the copy number distribution for any (i,j) fragment to be of Poisson type:

$$P\!\left(n_{ij}^{\mathrm{exp}}\right) = \frac{\lambda_{ij}^{\,n_{ij}^{\mathrm{exp}}}}{n_{ij}^{\mathrm{exp}}!}\,e^{-\lambda_{ij}} \tag{3}$$

We ascribe a log-likelihood function L for the whole transcriptome based on all $P(n_{ij}^{\mathrm{exp}})$ probabilities:

$$L = \sum_i \sum_j \ln P\!\left(n_{ij}^{\mathrm{exp}}\right) \tag{4}$$

In what follows we develop a model for the $\lambda_{ij}$ values in Equation (1) built on the hypothesis that each $\zeta_{ij}$ in Equation (2) is determined by a local context of the current A-site codon j in ORFi and that this context is composed of pL codons with the A site near its middle position (Figure 1). For clarity, below we make explicit three distinct description levels of parameters in our approach: directly experimental (e.g. $n_{ij}^{\mathrm{exp}}$), modelled (e.g. $l_{ij}$) and expected (e.g. $\lambda_{ij}$) values of key parameters. For ease of identification, we use Latin letters for the first two categories and Greek letters for the third one (Table 1).
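The Poisson assumption behind Equations (3) and (4) can be illustrated with a short numerical sketch. The counts and expected values below are hypothetical, and the function name `poisson_log_likelihood` is introduced here only for illustration; it is not taken from the analysis code of this work.

```python
import math

def poisson_log_likelihood(counts, expected):
    """Sum of ln P(n | lam) over positions, assuming Poisson-distributed counts."""
    total = 0.0
    for n, lam in zip(counts, expected):
        # ln P = n*ln(lam) - lam - ln(n!), with ln(n!) computed as lgamma(n + 1)
        total += n * math.log(lam) - lam - math.lgamma(n + 1)
    return total

counts = [0, 3, 1, 7]          # hypothetical RPF counts at four A-site codons
close = [0.5, 2.5, 1.0, 6.0]   # hypothetical expected values near the counts
far = [6.0, 0.5, 5.0, 1.0]     # hypothetical expected values far from the counts

ll_close = poisson_log_likelihood(counts, close)
ll_far = poisson_log_likelihood(counts, far)
print(ll_close > ll_far)       # expected values near the data score higher
```

Expected values that track the data give the larger log-likelihood, which is the property exploited when model parameters are adjusted to maximize L.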
Figure 1.

Local and global codon contexts for a ribosome translating an ORF of type i. Global A-site parameter j corresponds to the local A-site parameter p = pA = 8. Global parameter j' corresponds to the local parameter p through j' = j − pA + p, where p varies from 1 to pL = 15, so that the P and E sites correspond to p = 7 and p = 6, respectively.
Table 1.
Meaning of key parameters of the present work
| Expected parameters | |
|---|---|
| $\lambda_{ij}$ | Expected number of RPF counts from codons at position j in open reading frames from gene i (ORFi) in cell population. |
| $\kappa$ | Constant reflecting the depth of the Ribo-Seq library as determined by the number of translating ribosomes in the cell population, efficiencies of RPF generation, ligation, amplification and sequencing. |
| $\iota_i$ | Expected number of initiations on ORFi. |
| $\beta_{ij}$ | Technical factor, determined by context bias efficiencies of RPF generation, ligation and amplification for codon j in ORFi. |
| $\tau_{ij}$ | Expected elongation cycle time for codon j of ORFi. |
| $\tau_e$ | Expected elongation cycle time average for the whole cell population. |
| $\theta_{ij}$ | Expected codon context dependent elongation cycle time for codon j of ORFi normalized to $\tau_e$. |
| $\mu_i$ | Factor proportional to global frequency of translation initiation of ORFi. |
| $\zeta_{ij}$ | Expected codon context dependent variation of number of RPFs normalized to factor $\mu_i$ and partitioned into elongation cycle time ($\theta_{ij}$) and bias ($\beta_{ij}$) factors. |
| Experimental parameters | |
| $n_{ij}^{\mathrm{exp}}$ | Measured number of RPF counts for A-site codon at position j in ORFi. |
| $N_i^{\mathrm{exp}}$ | Sum of RPF counts for the ‘inner’ region of ORFi containing $n_i$ codons. |
| $d_i^{\mathrm{exp}}$ | Mean RPF density in the ‘inner’ ORFi region containing $n_i$ codons. |
| $G_{p,c}$ | Sum of $n_{ij}^{\mathrm{exp}}$ over all positions j in all ORFs for which there is a codon of type ‘c’ at position ‘j+p−pA’. |
| $s_{ij}^{\mathrm{exp}}$ | RPF score function describing relative variation of $n_{ij}^{\mathrm{exp}}$ along ORFi. |
| Model parameters | |
| $l_{ij}$ | Maximum likelihood (ML) estimate of $\lambda_{ij}$. |
| $z_{ij}$ | ML estimate of $\zeta_{ij}$ by a number pL of factors, each with 64 codon identity determined values. |
| $z_{p,c}$ | Underlying parameters of our model, determined from the ML fit of all $l_{ij}$ to all $n_{ij}^{\mathrm{exp}}$ values. |
| $m_i$ | ML estimate of $\mu_i$. |
| $N_i^{\mathrm{mod}}$ | ML estimate of the sum of model RPF counts for the ‘inner’ region of ORFi. From the expressions for $l_{ij}$ and $m_i$ it follows that $N_i^{\mathrm{mod}} = N_i^{\mathrm{exp}}$. |
| $d_i^{\mathrm{mod}}$ | ML estimate of the mean RPF density in the ‘inner’ ORFi region. |
| $s_{ij}^{\mathrm{mod}}$ | RPF score function describing relative variation of $l_{ij}$ along ORFi. |
| $t_{ij}$ | Model estimate of the expected elongation cycle time for codon j of ORFi. |
| $t_e$ | Model estimate of time factor $\tau_e$; experimentally determined from the growth rate of cell population. |
| $z_{ij}^{(T)}$ | Model estimate of bias-free relative elongation cycle time for codon j of ORFi; determined by the product of $z_{p,c}$ factors for inner positions of the local context of codon (i,j). |
| $Z_i^{(T)}$ | Model estimate of bias-free total time for ORFi translation normalized to $t_e$. |
| $t_i$ | Model estimate of absolute total time for ORFi translation. |
| $s_{ij}^{(T)}$ | Pausing score function describing relative variation of bias-free translation time along ORFi. |
Ribo-seq spectral modeling at single codon resolution
To obtain estimates for all $\lambda_{ij}$ values (Equation 1 or 2), we introduce model RPF counts, $l_{ij}$, composed of a factor $m_i$ for gene i, estimating $\mu_i$, multiplied by a local context factor $z_{ij}$, estimating $\zeta_{ij}$ in Equation (2):

$$l_{ij} = m_i\,z_{ij} \tag{5}$$

Each local codon context position (p) among the total number pL of context-defining positions contributes a factor $z_{p,c}$ to the value of $z_{ij}$:

$$z_{ij} = \prod_{p=1}^{p_L} z_{p,\,c_{ijp}} \tag{6}$$

Each factor $z_{p,c}$ is determined by the identity (c) of each one of the 64 possible codons at each position p (Figure 1). Index $c_{ijp}$ identifies the codon at local position p, corresponding to global codon position j + p − pA in the ORFi sequence (Figure 1). We fit the model RPF counts, $l_{ij}$, to the experimental RPF counts, $n_{ij}^{\mathrm{exp}}$, in the inner ORF regions by adjusting the context factors $z_{p,c}$ to maximize a Poisson-based likelihood function (Equation 4). If not stated otherwise, we use pL = 15, so that a total of 915 (15 × 61 = 915) factors $z_{p,c}$ estimate all $\zeta_{ij}$-values in all potential contexts, where 61 is the number of sense codons. The E. coli transcriptome may contain up to $1.8 \times 10^{6}$ distinct contexts (about 6000 ORFs and 300 codons per ORF), and the ultimate number of contexts for which $\zeta_{ij}$ could be predicted by the model is $61^{15} \approx 10^{27}$. In the next section, we describe how model parameters are derived from experimental data by maximizing a transcriptome-wide log-likelihood function.
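As a concrete illustration of the product structure in Equation (6) and of the parameter count quoted above, the following sketch builds a table of context factors and evaluates the context factor for one A-site position. The table values, the toy ORF and the name `context_factor` are hypothetical and serve only to show the indexing j + p − pA.

```python
from itertools import product

BASES = "UCAG"
STOPS = {"UAA", "UAG", "UGA"}
SENSE_CODONS = ["".join(t) for t in product(BASES, repeat=3)
                if "".join(t) not in STOPS]

P_L, P_A = 15, 8  # context length and A-site position, as stated in the text

def context_factor(codons, j, z):
    """Product of z[p][codon] over local positions p = 1..P_L for A-site codon j (1-based)."""
    value = 1.0
    for p in range(1, P_L + 1):
        codon = codons[j + p - P_A - 1]  # global position j + p - pA, shifted to a 0-based list
        value *= z[p][codon]
    return value

# Hypothetical table: all factors equal 1 except one slow A-site codon.
z = {p: {c: 1.0 for c in SENSE_CODONS} for p in range(1, P_L + 1)}
z[P_A]["CCG"] = 2.0

orf = ["AUG"] + ["GCU"] * 9 + ["CCG"] + ["GCU"] * 9  # toy ORF, CCG at codon 11
n_params = P_L * len(SENSE_CODONS)       # 15 x 61 table entries
n_contexts = len(SENSE_CODONS) ** P_L    # number of possible 15-codon contexts
print(n_params, context_factor(orf, 11, z))
```

With CCG in the A site (j = 11) the context factor is 2.0, whereas at j = 10 the same codon sits at local position 9, whose factor is 1 in this toy table.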
Ribo-seq spectral modeling with maximum likelihood (ML) estimation of local codon context parameters
To extract model parameters $m_i$ (Equation 5) and $z_{p,c}$ (Equation 6) from Ribo-Seq datasets, we assume that each $n_{ij}^{\mathrm{exp}}$ value is sampled from a Poisson distribution with expected value $\lambda_{ij}$ (Equation 3), the latter estimated by the model parameter $l_{ij}$ (Equations 5 and 6). The log-likelihood function L for the RPF spectrum takes the simple form (see also Equation 4):

$$L = \sum_i \sum_j \left[\, n_{ij}^{\mathrm{exp}} \ln l_{ij} - l_{ij} - \ln\!\left(n_{ij}^{\mathrm{exp}}!\right) \right] \tag{7}$$

Here, the j-summations for each ORFi are confined to an internal ORF region starting at codon pA and ending at codon $J_i - (p_L - p_A)$, with a total number of internal codons, $n_i = J_i - p_L + 1$, where $J_i$ is the total number of ORFi codons. In what follows we use the shorthand notation $\sum_j$ for the j-summations in Equation 7. The maximal value of L (Equation 7) is obtained by setting its partial derivatives with respect to all $m_i$ and $z_{p,c}$ parameters equal to zero, which leads to the following equation system for determination of all $z_{p,c}$ parameters (see Supplementary Text):

$$\sum_i m_i \sum_j z_{ij}\,\delta_{c_{ijp},\,c} = G_{p,c} \tag{8}$$

where

$$m_i = \frac{N_i^{\mathrm{exp}}}{\sum_j z_{ij}} \tag{9}$$

Here, $\delta_{c_{ijp},\,c}$ is the Kronecker delta function, equal to 1 when $c_{ijp} = c$ and 0 when $c_{ijp} \neq c$; $N_i^{\mathrm{exp}} = \sum_j n_{ij}^{\mathrm{exp}}$ (Table 1) and $G_{p,c}$ is a function that depends on codon type ‘c’ at local position ‘p’ (Figure 1). $G_{p,c}$ is calculated from experimental RPF counts $n_{ij}^{\mathrm{exp}}$ (and sequence data) as:

$$G_{p,c} = \sum_i \sum_j n_{ij}^{\mathrm{exp}}\,\delta_{c_{ijp},\,c} \tag{10}$$

We note that in the special case p = pA (Figure 1), $G_{p_A,c}$ is the total number of RPFs in a dataset generated by ribosomes with an A-site codon of type ‘c’. More generally, $G_{p,c}$ is the total number of RPFs for which there is a codon ‘c’ at a distance $p - p_A$ from the A site.

With the help of Equation 6, which relates $z_{ij}$ to $z_{p,c}$, Equations (8) and (9) are solved using a Levenberg-Marquardt type algorithm (21,22) to obtain the table of $z_{p,c}$ factors (see Supplementary Text). Using the obtained $z_{p,c}$ factors we compute local context parameters $z_{ij}$ (Equation 6), and then model RPF counts $l_{ij}$ from Equations (5) and (9) as:

$$l_{ij} = N_i^{\mathrm{exp}}\,\frac{z_{ij}}{\sum_k z_{ik}} \tag{11}$$
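The stationarity conditions in Equations (8) and (9) can be explored on toy data with the sketch below. It deliberately replaces the Levenberg-Marquardt solver named in the text with a simpler coordinate-ascent scheme: with all other rows held fixed, multiplying each factor of row p by the ratio of observed to predicted counts is the exact likelihood maximizer for that row. All function names, the toy ORF and its counts are hypothetical.

```python
from collections import defaultdict

def inner_positions(seq, p_len, p_a):
    """0-based A-site positions whose full local context lies inside the ORF."""
    return range(p_a - 1, len(seq) - (p_len - p_a))

def context_factor(seq, j, z, p_a):
    """Product of z[p][codon] over the local context of 0-based A-site position j."""
    value = 1.0
    for p, row in enumerate(z):            # p is 0-based here, p_a is 1-based
        value *= row[seq[j + p - (p_a - 1)]]
    return value

def gene_factors(orfs, counts, z, p_a):
    """Gene factors m_i: inner RPF sum divided by the inner sum of context factors."""
    m = []
    for seq, n in zip(orfs, counts):
        inner = inner_positions(seq, len(z), p_a)
        m.append(sum(n[j] for j in inner) /
                 sum(context_factor(seq, j, z, p_a) for j in inner))
    return m

def fit(orfs, counts, p_len, p_a, n_iter=50):
    """Coordinate ascent (not Levenberg-Marquardt) on the Poisson likelihood."""
    z = [defaultdict(lambda: 1.0) for _ in range(p_len)]
    for _ in range(n_iter):
        for p in range(p_len):
            m = gene_factors(orfs, counts, z, p_a)
            observed = defaultdict(float)    # counts grouped by codon at position p
            predicted = defaultdict(float)   # model counts grouped the same way
            for i, seq in enumerate(orfs):
                for j in inner_positions(seq, p_len, p_a):
                    c = seq[j + p - (p_a - 1)]
                    observed[c] += counts[i][j]
                    predicted[c] += m[i] * context_factor(seq, j, z, p_a)
            for c, g in observed.items():
                if predicted[c] > 0:
                    z[p][c] *= g / predicted[c]  # exact maximizer for row p
    return z, gene_factors(orfs, counts, z, p_a)

# Toy data: one ORF, A-site-only context (p_len = 1, p_a = 1).
orfs = [["AAA", "GGG", "AAA", "GGG"]]
counts = [[2, 6, 4, 10]]
z, m = fit(orfs, counts, p_len=1, p_a=1)
model = [m[0] * context_factor(orfs[0], j, z, 1) for j in range(4)]
print(model)
```

In this minimal case the model counts converge to the mean count per codon type (3 for AAA and 8 for GGG), and their sum equals the experimental sum, as required by Equation (9). For realistic context lengths a dedicated non-linear least-squares or Newton-type solver, as used in the text, is likely to converge much faster.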
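Instead of comparing experimental ($n_{ij}^{\mathrm{exp}}$) and modelled ($l_{ij}$) RPF spectra of the same transcript, it is more convenient to compare experimental ($s_{ij}^{\mathrm{exp}}$) and modelled ($s_{ij}^{\mathrm{mod}}$) RPF scores defined here as:

$$s_{ij}^{\mathrm{exp}} = \frac{n_i\,n_{ij}^{\mathrm{exp}}}{\sum_k n_{ik}^{\mathrm{exp}}} \tag{12}$$

and

$$s_{ij}^{\mathrm{mod}} = \frac{n_i\,l_{ij}}{\sum_k l_{ik}} = \frac{n_i\,z_{ij}}{\sum_k z_{ik}} \tag{13}$$

where $n_i$ is the total number of internal codons in ORFi.

The average RPF density of a gene:

$$d_i^{\mathrm{exp}} = \frac{1}{n_i}\sum_k n_{ik}^{\mathrm{exp}} \tag{14}$$

is often used as a statistical reliability measure of its RPF coverage profile. The lower the $d_i^{\mathrm{exp}}$ value, the less informative the profile. For example, when $d_i^{\mathrm{exp}}$ is well below one RPF per codon, more than half of the $n_{ij}^{\mathrm{exp}}$ values in the gene profile are zeroes and, hence, contain little information about codon translation times. We note that, similar to the j-summations above, the k-summations in Equations 11–14 run from k = pA to the last codon of the inner ORFi region (Figure 1). We also note that experimental RPF scores $s_{ij}^{\mathrm{exp}}$ are sometimes referred to as ‘normalized footprint counts’ (23) or ‘relative enrichment values’ (24) and describe how much the RPF counts for codon j deviate from the per-codon average value $d_i^{\mathrm{exp}}$ in the inner region of a gene.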
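The score and density definitions reduce to a few lines of arithmetic. The sketch below uses hypothetical inner-region counts; the function names are illustrative only.

```python
def rpf_scores(values):
    """Each value divided by the inner-region mean: n_codons * v_j / sum_k v_k."""
    total = sum(values)
    n_codons = len(values)
    return [n_codons * v / total for v in values]

def rpf_density(counts):
    """Mean RPF density: inner-region count sum divided by the number of inner codons."""
    return sum(counts) / len(counts)

inner_counts = [4, 0, 8, 4]          # hypothetical counts for four inner codons
scores = rpf_scores(inner_counts)
density = rpf_density(inner_counts)
print(scores, density)
```

By construction the scores average to 1 over the inner region, so a score of 2 marks a codon with twice the per-codon mean coverage of its gene. The same function applies to model counts, which makes experimental and model profiles directly comparable.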
The $z_{p,c}$ factors can always be scaled so that for each position p of the local context we have (Supplementary Text):

$$\sum_c w_{p,c}\,z_{p,c} = 1 \tag{15}$$

where the $w_{p,c}$ weighting factors are calculated as:

$$w_{p,c} = \frac{\sum_i m_i\,f_{i,p,c}}{\sum_{c'} \sum_i m_i\,f_{i,p,c'}} \tag{16}$$

and $f_{i,p,c}$ is:

$$f_{i,p,c} = \sum_j \delta_{c_{ijp},\,c} \tag{17}$$

Since $m_i$ estimates $\mu_i$, a parameter proportional to the expression level of gene i (Equation 1), and $f_{i,p,c}$ (Equation 17) varies little with position p (see Supplementary Text), each product $m_i\,f_{i,p,c}$ in Equation 16 is proportional to the frequency with which the ribosome encounters a codon of type c in the inner region of ORFi. Hence, each $w_{p,c}$ confers a statistical weight proportional to the frequency with which the ribosome encounters a codon of type c in the transcriptome (Supplementary Text).

We also introduce the ‘sensitivity parameter’ $S_p$ as a measure of the sensitivity of $z_{p,c}$ to the codon identity c at local context position p. It is defined as the standard deviation, $S_p$, from the mean of 1 (Equation 15) for row p of the table of $z_{p,c}$ factors:

$$S_p = \sqrt{\sum_c w_{p,c}\left(z_{p,c} - 1\right)^2} \tag{18}$$

where the weights $w_{p,c}$ are defined in Equation (16).
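A minimal sketch of the sensitivity measure follows, assuming that Equation (18) is a weighted standard deviation of one table row around its weighted mean, as the text describes. The two-codon row and the equal weights are hypothetical, and `position_sensitivity` is an illustrative name.

```python
import math

def position_sensitivity(z_row, weights):
    """Weighted mean and weighted standard deviation of one row of context factors."""
    total_w = sum(weights[c] for c in z_row)
    mean = sum(weights[c] * z_row[c] for c in z_row) / total_w
    var = sum(weights[c] * (z_row[c] - mean) ** 2 for c in z_row) / total_w
    return mean, math.sqrt(var)

z_row = {"UUU": 0.5, "AAG": 1.5}      # hypothetical factors at one local position
weights = {"UUU": 0.5, "AAG": 0.5}    # hypothetical codon-usage weights
mean, s_p = position_sensitivity(z_row, weights)
print(mean, s_p)

flat_mean, flat_s = position_sensitivity({"UUU": 1.0, "AAG": 1.0}, weights)
print(flat_s)                          # a position insensitive to codon identity
```

A row whose factors are all equal gives zero sensitivity, whereas a row spanning 0.5 to 1.5 with equal weights gives 0.5, so larger values flag positions at which codon identity strongly changes the RPF counts.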
Ribo-seq spectral modeling at single nucleotide resolution
In order to estimate the fragment processing bias ($\beta_{ij}$; Equation 1), we extend our modeling resolution from codon to nucleotide level. For this, we use the number of RPFs, $n_{ij}^{\mathrm{exp},FL}$, of a single length FL with the ribosomal A site located at nucleotide j in ORFi and estimate its expected value, $\lambda_{ij}^{FL}$ (compare with Equations 1 and 5), as:

$$l_{ij}^{FL} = m_i^{FL}\,z_{ij}^{FL} \tag{19}$$

where each parameter $z_{ij}^{FL}$ is modeled as a product of local context z-factors (compare with Equation 6):

$$z_{ij}^{FL} = \prod_{p=1}^{p_{NL}} z_{p,\,b_{ijp}}^{FL} \tag{20}$$

Here, index j in Equations (19) and (20) refers to nucleotide j of ORFi, and $b_{ijp}$ specifies the nucleotide base b (U, C, A or G) at local position p of the context of transcriptome position (i,j); the $z_{p,b}^{FL}$ factors form a pNL × 4 table; the local nucleotide position p is counted from p = 1, via the first nucleotide of the A-site codon at position p = pNA, to the third base of the last codon of the local context sequence of length pNL (Supplementary Figure S1).

Parameters $z_{p,b}^{FL}$ are ML estimated by non-linear model fitting to experimental data assuming Poisson distributed RPF counts $n_{ij}^{\mathrm{exp},FL}$. The data treatment is formally equivalent to that leading up to Equations 8 and 9 with parameters $z_{p,c}$ and $G_{p,c}$ replaced by $z_{p,b}^{FL}$ and $G_{p,b}^{FL}$, respectively. Thus:

$$\sum_i m_i^{FL} \sum_j z_{ij}^{FL}\,\delta_{b_{ijp},\,b} = G_{p,b}^{FL} \tag{21}$$

where $\delta_{b_{ijp},\,b}$ is the Kronecker delta function and $G_{p,b}^{FL}$ is obtained from experimental data $n_{ij}^{\mathrm{exp},FL}$ through (compare with Equation 10):

$$G_{p,b}^{FL} = \sum_i \sum_j n_{ij}^{\mathrm{exp},FL}\,\delta_{b_{ijp},\,b} \tag{22}$$

Assuming x to be the distance from the first A-site nucleotide to the 3′-end of the RPF in nucleotides, it follows that $G_{p,b}^{FL}$ evaluated at $p = p_{NA} + x - FL + 1$ and at $p = p_{NA} + x$ gives the numbers of RPFs of length FL with nucleotide ‘b’ at the 5′- and 3′-end, respectively. By applying the same ML procedure as in the codon-resolution case, we solve Equation 21 to estimate the $z_{p,b}^{FL}$ and $m_i^{FL}$ factors for computing all $z_{ij}^{FL}$ and $l_{ij}^{FL}$ parameters. Using formulae analogous to those in Equations 12 and 13, one can compute the model nucleotide RPF scores $s_{ij}^{\mathrm{mod},FL}$ to compare them with the experimental scores $s_{ij}^{\mathrm{exp},FL}$ for RPF profiles generated from RPFs with a length of FL nucleotides.
Construction of unbiased Ribo-Seq spectra for estimation of relative peptide elongation times
To separate the effects of bias and peptide elongation time variations on the RPF counts, we partition the context dependent factors $z_{ij}$ in Equation 5 into two parts:

$$z_{ij} = z_{ij}^{(B)}\,z_{ij}^{(T)} \tag{23}$$

where (compare with Equation 6):

$$z_{ij}^{(B)} = \prod_{p=1}^{p_1-1} z_{p,\,c_{ijp}} \prod_{p=p_2+1}^{p_L} z_{p,\,c_{ijp}} \tag{24}$$

and:

$$z_{ij}^{(T)} = \prod_{p=p_1}^{p_2} z_{p,\,c_{ijp}} \tag{25}$$

As shown in Results, the outer context dependent factors $z_{ij}^{(B)}$, determined by $z_{p,c}$ factors for outer local context positions p from 1 to p1−1 and from p2+1 to pL, mainly account for the nuclease digestion/processing biases (B). The inner context dependent factors $z_{ij}^{(T)}$, determined by $z_{p,c}$ factors for inner positions p (from p1 to p2), mainly reflect the variation of the peptide elongation time, hence the superscript (T) in $z_{ij}^{(T)}$. We model the bias-free RPF spectrum as:

$$l_{ij}^{(T)} = m_i\,z_{ij}^{(T)} \tag{26}$$

We also introduce model pausing scores $s_{ij}^{(T)}$ to quantify the relative peptide elongation time as the ribosome moves along an ORFi (compare with Equations 12 and 13):

$$s_{ij}^{(T)} = \frac{n_i\,z_{ij}^{(T)}}{\sum_k z_{ik}^{(T)}} \tag{27}$$

From the 15 $z_{p,c}$ factors used to model the experimental $n_{ij}^{\mathrm{exp}}$ values in the dataset we normally use the five inner $z_{p,c}$ factors to obtain bias-corrected model RPF counts $l_{ij}^{(T)}$ (Equations 25 and 26). This approach is distinct from using a function $\tilde{z}_{ij}$ with only $p_2 - p_1 + 1$ position factors, defined by the inner codons of the local context in the p-interval from p1 to p2:

$$\tilde{z}_{ij} = \prod_{p=p_1}^{p_2} \tilde{z}_{p,\,c_{ijp}} \tag{28}$$

When the ML method is used to estimate the inner $\tilde{z}_{p,c}$ parameters in Equation 28 that best account for the whole RPF spectrum, strong technical biases inherent to $n_{ij}^{\mathrm{exp}}$ spectra distort the $\tilde{z}_{p,c}$ factors. This makes the model RPF scores $\tilde{s}_{ij}$, defined as:

$$\tilde{s}_{ij} = \frac{n_i\,\tilde{z}_{ij}}{\sum_k \tilde{z}_{ik}} \tag{29}$$

distinct from and inferior to the more accurate elongation time estimating pause scores $s_{ij}^{(T)}$ (Equation 27).
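The partition into inner (time) and outer (bias) factors can be sketched on a deliberately small context. Here the local context has only three positions with the A site in the middle, rather than the 15-codon context used in the text; the factor values, the sequence and the name `partial_context_factor` are hypothetical.

```python
def partial_context_factor(seq, j, z, p_a, positions):
    """Product of z[p][codon] over the chosen 1-based local positions, A site at 0-based j."""
    value = 1.0
    for p in positions:
        value *= z[p][seq[j + p - p_a]]
    return value

P_A = 2                       # toy context: local positions 1..3, A site at p = 2
INNER, OUTER = [2], [1, 3]    # inner positions carry time, outer positions carry bias

z = {1: {"AAA": 0.5, "GGG": 2.0},   # outer position: treated as technical bias
     2: {"AAA": 1.0, "GGG": 3.0},   # inner position: treated as elongation time
     3: {"AAA": 1.0, "GGG": 1.0}}
seq = ["AAA", "GGG", "AAA", "GGG", "AAA"]
inner_j = range(1, 4)         # 0-based A-site positions with a complete context

z_time = [partial_context_factor(seq, j, z, P_A, INNER) for j in inner_j]
z_bias = [partial_context_factor(seq, j, z, P_A, OUTER) for j in inner_j]
z_full = [t * b for t, b in zip(z_time, z_bias)]

# Pausing scores: inner-only factors relative to their mean over the inner region.
pausing = [len(z_time) * v / sum(z_time) for v in z_time]
print(z_full, pausing)
```

In this toy case the full factors (1.5, 2.0, 1.5) would suggest that the middle codon is the slowest, whereas the bias-free factors (3, 1, 3) show the opposite. This is the kind of distortion that dropping the outer factors is meant to remove.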
Absolute peptide elongation cycle times from exponential growth rate
The expected time, $\tau_{ij}$, to translate a codon at position j of gene i in the cell (Equation 1) is estimated by the model time, $t_{ij}$, defined by the product of the local context factor $z_{ij}^{(T)}$ (Equation 25) and a time factor $t_e$, estimating $\tau_e$ in Equation 1:

$$t_{ij} = t_e\,z_{ij}^{(T)} \tag{30}$$

It follows that the total expected time Ti to translate ORFi (Table 1) is estimated by:

$$t_i = \sum_j t_{ij} = t_e\,Z_i^{(T)} \tag{31}$$

where

$$Z_i^{(T)} = \sum_j z_{ij}^{(T)} \tag{32}$$
We note that the
value estimates relative translation time of protein i (Table 1). Let Pi be the number of proteins of a type i in an exponentially growing cell population at a given time. The rate of copy number increase for proteins of a type ‘i’ is:
![]() |
(33) |
Here,
is the current number of ribosomes in the population,
the fraction of ribosomes in elongation phase, estimated as 0.8 by Dennis and Bremer (25) and
is the fraction of elongating ribosomes devoted to synthesis of protein i. Fraction ui is proportional to the sum,
, of bias-corrected RPF counts
for ORFi. Taking Equations 26 and 32 into account one gets for
:
![]() |
(34) |
so that
, where
. Using this and Equations 31 and 34, one can re-write Equation 33 as:
![]() |
(35) |
Introducing
, the sum total of current protein copies, the exponential growth rate,
can be defined as the increase in total protein copy number per time unit (
) normalized to
:
![]() |
(36) |
We note that for exponential growth the above definition of growth rate (Equation 36) is equivalent to its standard definition (26) as the rate of relative increase in total protein mass (see Supplementary Text). Taking Equations (34) and (35) into account, Equation (36) for the growth rate becomes:
![]() |
(37) |
Here, we used that during exponential growth and when the rate of protein degradation is negligible compared to growth rate, the protein copy numbers
are proportional to our estimates,
, of the frequencies,
(Equation 1) of protein i translation initiation, so that the relation
is valid (see Supplementary Text). From Equation (37) one obtains:
![]() |
(38) |
so that the model time,
is
![]() |
(39) |
Note that all parameters in Equations 38 and 39 except
and
can be obtained from the Ribo-seq experiments themselves. The time factor
in Equation 38 can be interpreted as an average per codon elongation time for a particular growth condition of the cell population, conditional on our special scaling of
parameters (Equation 15) which forces
in Equation (30) to oscillate around 1. Importantly, although both
and
depend on
scaling, their product, the model time
, is scaling insensitive and estimates the absolute time
of codon (i,j) translation (see Equation 39).
Self-consistency of the RPF spectrum modeling
By self-consistent modeling we mean that a parameter estimation procedure, applied to a dataset simulated using parameters extracted from the original data, will reproduce the parameter values determined directly from the original data. It can be proven that our procedure of extracting the underlying parameters $z_{p,c}$ is indeed self-consistent (see also Supplementary Text). To illustrate this, we first use our ML approach to estimate an original $z_{p,c}$ parameter table from experimental RPF data, then use Equations (6) and (11) to simulate an RPF dataset and, finally, retrieve a new $z_{p,c}$ parameter table from the simulated RPF dataset. We find that the original and retrieved $z_{p,c}$ parameter tables are virtually identical, as illustrated in Supplementary Figure S2A for the A-, P- and E-site positions of the $z_{p,c}$ parameter tables. In contrast, other methods like RUST (14) are not self-consistent in this sense. Computing the RUST ratio metafile table to simulate RPF data and then applying RUST again to retrieve the RUST ratio metafile, one finds that the original and retrieved metafile tables differ significantly, as illustrated for the A-, P- and E-site metafile positions in Supplementary Figure S2B.
RESULTS
Modeling of Ribo-Seq spectra
There is a clear connection between the expected number, $\lambda_{ij}$, of experimentally detected ribosomes with a particular codon j of ORFi in the A site, and the expected codon translation time $\tau_{ij}$ (Equation 1). This connection allows one to use ribosome profiling for transcriptome-wide kinetic analysis of mRNA translation, but attainment of reliable kinetics data from ribosome profiling has remained elusive. The codon coverage within ORFs in the ribosome profiling spectra is highly variable (Figure 2). This is not only due to the codon context dependent variation of the codon translation time but also to context-dependent bias in the efficiency of nuclease dependent RPF generation and subsequent DNA library preparation steps including reverse transcription, adaptor ligation and PCR (9,11,12,14,20,24,26,27). Here, we consider three major causes of codon-to-codon variation of the experimental (‘exp’) RPF counts $n_{ij}^{\mathrm{exp}}$ at each transcriptome position (i, j), summarized in Equation 1. These include: (i) codon context-dependent variation in the peptide elongation time, $\tau_{ij}$, (ii) bias, $\beta_{ij}$, of RPF generation and processing, and (iii) stochastic fluctuations in the experimental $n_{ij}^{\mathrm{exp}}$ values. As seen in Equation (2), each $\tau_{ij}$ value is the product of a time factor $\tau_e$ reflecting the average codon translation time under a particular growth condition and a unit-less parameter $\theta_{ij}$ that depends on the context of codon j, so that $\tau_{ij} = \tau_e\,\theta_{ij}$. Local context dependent variation of $\theta_{ij}$ that causes the variations in $\tau_{ij}$ can be traced to the identities of A-, P- and E-site tRNAs, interactions between mRNA codons and the ribosome and/or interactions of the nascent peptide chain with the ribosomal exit tunnel in an amino acid-sequence dependent manner (28–30). The variations of the bias factor $\beta_{ij}$ are also due to local context dependence of the nuclease digestion and/or amplification/processing steps in RPF library preparation. From these, it follows that the variation of the product $\theta_{ij}\,\beta_{ij}$ that reports on variations of expected counts, $\lambda_{ij}$ (Equation 2), is defined by the local sequence context of the current A-site codon j in ORFi (Figure 1).
Figure 2.
Ribosome profiling spectrum for gene rpsQ. RPF counts ($n_{ij}^{\mathrm{exp}}$-values) are plotted versus codon position j of the rpsQ transcript encoding ribosomal protein S17. The horizontal line represents the average number of RPFs per codon ($d_i^{\mathrm{exp}}$) for the inner transcript region from codon pA to the last inner codon (see Equation 14 for the formal $d_i^{\mathrm{exp}}$ definition).
We estimated each $\zeta_{ij}$ value by a model (‘mod’) $z_{ij}$ parameter, which is the product of 15 $z_{p,c}$ factors (Equation 6). Each $z_{p,c}$ value is determined by the type of codon (c) at local sequence context position (p) (Figure 1). These $z_{p,c}$ values were estimated by fitting our model (Equations 5 and 6) to the experimental $n_{ij}^{\mathrm{exp}}$ values of the whole transcriptome. To illustrate the goodness of the fit, we compare experimental (Equation 12) and model (Equation 13) RPF scores for single genes with high RPF density. The model-predicted, $s_{ij}^{\mathrm{mod}}$, and experimental, $s_{ij}^{\mathrm{exp}}$, RPF score spectra show the relative codon-to-codon variation of modeled and experimental RPF counts. They can be remarkably similar at the single gene level (Figure 3A, B) with Pearson correlation coefficients, r, in the 0.7–0.8 range, suggesting that the local mRNA sequence context accounts for the major part of the variability of experimental $n_{ij}^{\mathrm{exp}}$ values. Figure 3C shows that high r-values are frequent for genes with high experimental RPF density. The r-values decrease as an increasing number of genes with medium and low RPF density are included in the comparison, an effect due to the high statistical uncertainty of RPF profiles for genes with low experimental RPF density. In comparison with the RUST method (14), our method achieves, on average, significantly higher Pearson correlations between experimental and model RPF spectra (Supplementary Figure S3).
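The per-gene comparison of experimental and model score spectra relies on the ordinary Pearson correlation coefficient. A minimal sketch with hypothetical score vectors follows; the function name `pearson_r` is illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long score spectra."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

s_exp = [0.5, 1.5, 0.8, 1.2]   # hypothetical experimental RPF scores of one gene
s_mod = [0.6, 1.4, 0.9, 1.1]   # hypothetical model RPF scores of the same gene
r = pearson_r(s_exp, s_mod)
print(round(r, 3))
```

Because both score spectra are normalized to a mean of 1 within each gene, r reflects agreement in the shape of the profile rather than in overall expression level.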
Figure 3.
Comparisons of experimental (
; red; Equation 12) and model (
; blue; Equation 13) RPF score spectra at codon resolution for rpsQ (A) and atpE transcript (B). r, Pearson correlation, r = 0.83 for rpsQ and r = 0.81 for atpE. (C) Frequency density of Pearson correlation coefficient, r, between
and
for sets of 161 (red,
), 337 (light blue,
) and 945 (dark blue,
) transcripts. Note that the transcripts were first ranked by their dexp-values (Figure 2) and then top-ranked 161, 337 and 945 transcripts were considered.
Ribosomal profiling spectra are ultra-sensitive to codon identity near ribosome edges
Strikingly, variation of the $z_{p,c}$ factors with codon identity c is much larger for local codon positions (p) near the lagging (p = 4) and leading (p = 11) ribosome edges than in the A site (p = 8) (Figure 1). Indeed, the $z_{8,c}$ value varies from 0.4 for the UUU (Phe) codon to 1.6 for the AAG (Lys) codon, while the $z_{4,c}$ and $z_{11,c}$ values span significantly larger ranges, from 0.2 for the GGG (Gly) to 2.1 for the AUG (Met) codon for $z_{4,c}$ and from 0.2 for the UUU (Phe) to 2.2 for the CCA (Pro) codon for $z_{11,c}$ (Figure 4A). We have quantified the sensitivity of $z_{p,c}$ to codon identity c at position p as a weighted standard deviation, Sp, from the mean along the p-row of the $z_{p,c}$-factor table (Equation 18). A plot of Sp versus p confirms much higher sensitivity to codon identity at local codon positions close to the ribosome edges (p = 4 and p = 11) than at the ribosomal A, P or E site (p = 8, 7 or 6, respectively) (Figure 4B).
Figure 4.

Sensitivity of the $z_{p,c}$ factors to codon identity at different local context positions p. (A) Variation of $z_{p,c}$ values with codon identity c for positions p = 4 (dark blue, lagging ribosome edge), p = 8 (light blue, A site) and p = 11 (red, leading ribosome edge). Codons are ordered as in the genetic code table. (B) Position sensitivity Sp (Equation 18) versus local context position p (see Figure 1 for position numbering).
Nuclease induced bias in Ribo-Seq spectra from E. coli
To dissect the origins of the enhanced codon sensitivity of the z factors at positions near ribosome edges (Figure 4), we also analyzed Ribo-Seq spectra at single nucleotide resolution. Bacterial Ribo-Seq libraries are commonly constructed by first mapping the 3′-ends of RPFs to genomic nucleotide sequences (6,7). RPF coverage profiles at single nucleotide resolution are then obtained by counting the number of RPFs assigned to nucleotide j of gene i. These counts are subsequently converted to standard experimental RPF profiles at single nucleotide resolution by a re-assignment rule. The premise for this procedure is that the nucleotide distance (x) from the 3′-end to the first A-site nucleotide of an RPF is constant (6). Fragment length (FL)-specific profiles are generated from RPFs of the same length, FL, so that the standard profiles can also be obtained by summation of the FL-specific profiles over all FLs. In bacteria, both FL-summed and single FL-specific RPF coverage profiles lack the well-defined three-nucleotide periodicity that is observed in yeast or mammalian cells (6,7). We suggest that this periodicity loss is caused by ‘anomalous’ MNase cleavage at one or two nucleotides downstream of the ordinary cleavage site at the leading (3′) edge of the ribosome. Consequently, the RPF profiles appear as if the translating ribosome moves one nucleotide at a time. In both the single-codon resolution (Equation 1) and single-nucleotide resolution cases, the model contains an expected number of RPFs generated from ribosomes with their A site at nucleotide number j of ORFi. The local 15-codon context (pL = 15) with the A-site codon at position pA = 8 (Figure 1) here corresponds to a local 45-nucleotide sequence (pNL = 45) with the first A-site nucleotide at position pNA = 22 (Supplementary Figure S1).
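The bookkeeping described above (map each RPF by its 3′-end, shift by a constant distance x to the A site, keep one profile per fragment length and sum them for the standard profile) can be sketched as follows. This is only an illustration under stated assumptions: reads are given as plus-strand (3′-end coordinate, fragment length) pairs, the offset value used in the example is hypothetical, and `a_site_profiles` is not a function from the authors' code.

```python
from collections import Counter

def a_site_profiles(reads, offset_x):
    """Build fragment-length-specific and summed A-site coverage profiles.

    reads    : iterable of (three_prime_pos, fragment_length) pairs
    offset_x : assumed constant distance from the 3'-end back to the
               first A-site nucleotide
    """
    by_length = {}
    for three_prime_pos, fragment_length in reads:
        a_site_pos = three_prime_pos - offset_x  # re-assign the read to the A site
        by_length.setdefault(fragment_length, Counter())[a_site_pos] += 1
    summed = Counter()
    for profile in by_length.values():
        summed.update(profile)  # standard profile = sum over fragment lengths
    return by_length, summed

# Toy reads (made-up coordinates); the offset of 10 nt is illustrative only.
reads = [(50, 23), (50, 24), (51, 23)]
fl_profiles, summed_profile = a_site_profiles(reads, offset_x=10)
print(dict(summed_profile))
```

Keeping the fragment-length-specific counters separate is what allows length-dependent 5′-end effects to be inspected before they are averaged out in the summed profile.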
We used our maximum likelihood (ML) approach to estimate the local context factors that capture the nucleotide context-dependent variation in the experimental coverage profiles, as modelled using Equations 19 and 20. The factors calculated for fragment length-specific experimental coverage profiles are shown in Figure 5 for FL = 23, 24 and 25 nt. They varied greatly in response to changing nucleotide base identity (b) at positions 10, 9 and 8 for FL = 23 (Figure 5A), FL = 24 (Figure 5B) and FL = 25 nt (Figure 5C), respectively. At these combinations of nucleotide positions and lengths the factors were always relatively small when b = G or b = C, leading to small model (‘mod’) values (Equations 19 and 20). Local positions p equal to 10, 9 and 8 correspond to the 5′-ends of the FL = 23, 24 and 25 nt fragments, respectively, implying low abundance of RPFs with G/C at their 5′-ends. Indeed, experimental RPFs with an A at their 5′-end are about 60-fold more abundant than experimental RPFs with a G at the 5′-end, in line with the previous report of a strong preference of MNase to cleave before an A or a U (9). Notably, the 5′-peak of the position sensitivity to nucleotide identity (calculated analogously to Sp in Equation 18) shifts by exactly one nucleotide position for each one-nucleotide change in fragment length from 22 to 27 nt (Figure 5D). Irrespective of fragment length, the 3′-ends of RPFs are always aligned at local position p = 32, so that MNase cleavage occurs between positions 32 and 33 in the local nucleotide context (Figure 5). The factors with G or C at local position p = 33 were much smaller than 1 and those with A or U much larger than 1 (Figure 5), also in line with the observation that MNase cleaves before A/U nucleotides (9). The 3′-end cleavage bias of MNase was strong and yet considerably less pronounced than the 5′-end cleavage bias. In the local nucleotide region between positions 13 and 29, well inside the ribosome (Supplementary Figure S1), the factors were very similar for different fragment lengths (Figure 5), suggesting insignificant technical bias in the 13–29 region of the local nucleotide context.
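The correspondence between fragment length and the local position of the 5′-end follows from simple arithmetic once the 3′-end is fixed at local position 32, as stated above. The short sketch below makes this explicit; the function name is hypothetical and the only inputs are the positions quoted in the text.

```python
THREE_PRIME_POS = 32  # local position of the last RPF nucleotide, as stated in the text

def five_prime_position(fragment_length, three_prime_pos=THREE_PRIME_POS):
    """Local position of the first (5'-most) nucleotide of an RPF."""
    return three_prime_pos - fragment_length + 1

# 5'-end positions for the fragment lengths considered in the text.
positions = {fl: five_prime_position(fl) for fl in range(22, 28)}
for fl, pos in positions.items():
    print(f"FL = {fl} nt -> 5'-end at local position {pos}")
```

For FL = 23, 24 and 25 nt this gives positions 10, 9 and 8, matching the positions at which the strong base dependence is reported.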
Figure 5.
Context factors displayed for local nucleotide positions 1 to 45 for: (A) FL = 23 nt, (B) FL = 24 nt, (C) FL = 25 nt; p = 22 corresponds to the first A-site position (Supplementary Figure S1). (D) Position sensitivity profiles for the context factors calculated from RPF genome coverage with RPF fragments of lengths ranging from 22 to 27 nt.
The context factors estimated from the standard experimental RPF coverage profile, obtained by summation of the length-specific experimental RPF coverage profiles for RPF lengths from 22 to 27 nt, exhibit a much reduced 5′-bias but an essentially unchanged 3′-bias (Supplementary Figure S4). The great reduction of the 5′-bias is easily understood by considering that the summation of length-specific RPF profiles corresponds roughly to an FL-averaging of the context factors. This also explains why the position sensitivity profile of the z factors at codon resolution (Figure 4B) has a smaller bias at positions close to the lagging than to the leading edge of the ribosome.
The strong effects of codon identities at position 11 (leading edge of the ribosome) on z values (Figure 4A) can now be easily explained by the biases at positions 31, 32 and 33 observed at nucleotide resolution. For example, the combinations of G or C at positions 31 and 32 with A or U at position 33 (corresponding to the three nucleotide positions of codon 11) are expected to result in large z values, while U or A at positions 31 and 32 combined with C or G at position 33 should result in small z values (see Figure 5 or Supplementary Figure S4). Indeed, GGU (Gly) and CCA (Pro) codons have z values much larger than 1, while codons UUC (Phe) and AAG (Lys) have z values much smaller than 1 (Figure 4A), exactly as predicted from the 3′ biases (Supplementary Figure S4). The same analysis applied to the 5′ biases (Supplementary Figure S4) explains the strong codon dependence of z at the ribosomal lagging edge positions 3 and 4 (Figure 4).
For the codon resolution data, we conclude that the outer codon context-dependent z factors, for positions p = 1 to p1–1 and from p2+1 to pL (Figure 1), account for the technical biases in RPF library generation. In contrast, the inner codon context z factors, for positions p from p1 to p2, mainly reflect the context-dependent variation of the peptide elongation times. With this as a lead, we estimated the z factors for the E. coli AS19 dataset and used the inner subset of z factors for all positions from p1 = 5 to p2 = 9 to obtain bias-corrected model parameters (Equation 25). A typical example of such bias elimination is shown in Figure 6A for the E. coli atpE transcript. We contend that the bias-corrected model pausing scores (Equation 27) reflect the bias-free peptide elongation times, showing that the ribosome translates mRNA in a much smoother fashion than the experimental RPF scores might suggest.
Figure 6.
Pausing score profile and absolute time spectrum for the atpE (ATP synthase subunit C) transcript at single codon resolution. (A) Comparison of an experimental RPF score profile (red, Equation 12) with the total model RPF score profile (blue, Equation 13) and the model pausing score profile (light blue, Equation 27). The pausing score profile is much less jagged (jaggedness measure = 0.3) than the total model (0.9) and experimental (1.1) RPF score profiles. (B) Absolute elongation time spectrum (Equation 39); the horizontal line corresponds to the average per-codon translation time of the atpE transcript.
We have also estimated the absolute peptide elongation time as a product involving an estimate of the time factor in Equation 1 (Figure 6B). We note that our modeling approach allows for determination of the model parameters and, hence, the model pausing scores from the ribosome profiling data alone, but to calculate absolute times we need additional experimental information provided by the growth rate of the bacterial population (Equation 38).
The local codon context dependent distribution of relative peptide elongation times
The elimination of the technical bias described in the previous section enables estimation of authentic peptide elongation times for any A-site codon j in any ORFi by ‘dividing out’ the bias-dependent local context parameter (Equation 24) from the total context parameter (Equation 6), which leads to the bias-free context parameter (Equation 25) that is proportional to the A-site codon elongation time (Equation 30). The frequency densities of the total and bias-free context parameter values for the E. coli transcriptome are displayed along with those for their logarithms in Supplementary Figure S5. The frequency densities of the logarithms are near Gaussian, with σ-values of 0.61 for the bias-free and 1.2 for the total context parameters (Supplementary Figure S5B). From this, we propose that each rate-limiting elongation step involves the passage over a standard free energy barrier given by the sum of standard free energy contributions determined by the logarithms of the z factors in the local codon context. According to transition-state theory, the time it takes to overcome a standard free-energy barrier increases exponentially with the barrier height (31). In translocation, the height of the free energy barrier could be the sum of the free energies of interaction between ribosome and mRNA throughout the whole inner context region. In peptidyl transfer, the barrier height could be the sum of the free energies from the identities of codons upstream of the A-site codon. According to the Central Limit Theorem, the frequency densities of such free energy sums would be near-Gaussian, providing a tentative explanation for the near-Gaussian frequency densities of the logarithms of the context parameter values (Supplementary Figure S5B), the exponentiation of which then leads to a log-normal distribution (Supplementary Figure S5A). Interestingly, the frequency density of a log-normal distribution is mimicked by the distribution of the sum of two stochastic variables, one normally and one exponentially distributed. Possibly, this feature has led to the previous proposal that there are two time components in peptide elongation, one Gaussian and one exponential (32). Finally, we note that due to the local context-dependent bias there are more z factors in the total context parameter (Equation 6) than in the bias-free context parameter (Equation 25), leading to a broader near-Gaussian distribution for the logarithm of the former than of the latter (Supplementary Figure S5B).
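The log-normal argument can be written out compactly. In the sketch below, $\Lambda$ denotes a context parameter formed as a product of z factors over local positions $p_1$ to $p_2$, and $z_p$ is the factor contributed by the codon at position $p$; both symbols are shorthand introduced here for illustration and need not match the notation of Equations 6 and 25. Independence of the terms is an assumption.

```latex
\begin{align}
\Lambda &= \prod_{p=p_1}^{p_2} z_p
\quad\Longrightarrow\quad
\ln \Lambda = \sum_{p=p_1}^{p_2} \ln z_p, \\
\operatorname{Var}(\ln \Lambda) &= \sum_{p=p_1}^{p_2} \operatorname{Var}(\ln z_p)
\qquad \text{(independent terms)}, \\
\ln \Lambda \ \text{near-Gaussian}
&\quad\Longrightarrow\quad
\Lambda = e^{\ln \Lambda} \ \text{approximately log-normal}.
\end{align}
```

Because the variances add, a product over more positions gives a broader distribution of the logarithm. If, purely for illustration, every position contributed the same variance, 15 versus 5 factors would give a σ ratio of $\sqrt{3} \approx 1.7$, of the same order as the ratio of the σ-values quoted above ($1.2/0.61 \approx 2.0$).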
Determinants of fast and slow peptide elongation cycles in E. coli
The model estimate (Equation 30) of the time that the ribosome spends translating codon j of ORFi is proportional to the bias-free context parameter (Equation 25), which is estimated from the product of z factors for the inner codons of the local context around the A site in the p-interval from 5 to 9 (Figure 1). Accordingly, the size of each inner z factor is a determinant of the peptide elongation time. Under our experimental E. coli AS19 growth conditions the z values for Lys codons AAA or AAG pairing to tRNALys in the A (p = 8), P (p = 7) or E site (p = 6) were relatively large and contributed to slow peptide elongation (Figure 7). A similar picture holds for Gly codons GGU and GGC, read by tRNAGly3. In contrast, Ile codons AUC and AUU, Phe codons UUU and UUC and Val codons GUC and GUU translated by tRNAIle2, tRNAPhe and tRNAVal2, respectively, exhibited relatively small z values in the A, P and E sites of the local context and contributed to fast peptide elongation (Figure 7). In most cases, synonymous codons read by the same tRNA isoacceptor have similar z values (Figure 7 and Supplementary Figure S6). This, we propose, reflects similar interactions between the ribosome and the shared cognate tRNA. Along the same line, inner z factors of Val codons at the same local position p were different when read by tRNAVal2 or tRNAVal1 (Figure 7), probably reflecting different interactions between the ribosome and the bodies of tRNAVal2 and tRNAVal1.
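The relation between the full 15-position product and the inner-position product (p = 5 to 9) described above can be illustrated with a small sketch. It assumes that z factors are stored in a dictionary keyed by (position, codon); this layout, the function name `context_products` and the toy values are hypothetical and are not taken from the authors' code.

```python
import math

def context_products(codons, z, p1=5, p2=9):
    """Return (total, inner) products of z factors for one A-site context.

    codons : list of 15 codons at local positions 1..15 (A site at position 8)
    z      : dict mapping (position, codon) -> z factor
    p1, p2 : first and last inner position (5 and 9 in the text)
    """
    if len(codons) != 15:
        raise ValueError("expected a 15-codon local context")
    total = math.prod(z[(p, c)] for p, c in enumerate(codons, start=1))
    inner = math.prod(z[(p, codons[p - 1])] for p in range(p1, p2 + 1))
    return total, inner

# Toy context: all factors are 1 except an edge-position factor (p = 1)
# and an A-site factor (p = 8). The numbers are made up.
codons = ["GGG"] + ["CUG"] * 6 + ["AAA"] + ["CUG"] * 7
z = {(p, c): 1.0 for p, c in enumerate(codons, start=1)}
z[(1, "GGG")] = 0.5   # outer position: would be treated as technical bias
z[(8, "AAA")] = 2.0   # A site: would be treated as elongation-time signal
total, inner = context_products(codons, z)
print(total, inner)
```

In this toy case the outer factor masks the A-site effect in the total product, whereas the inner product retains it, which is the intuition behind restricting the bias-corrected parameter to the inner positions.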
Figure 7.
Variation of z factors with the local codon position around the A site (pA = 8) for selected tRNAs reading synonymous codons. Large and small z values contribute to slow and fast peptide elongation, respectively. A complete set of z factors for all tRNAs is presented in Supplementary Figure S6.
In the A site, codons for charged AAs, e.g. Lys, Asp and Glu, and one hydrophobic AA, Val, encoded by the GUA codon promoted slow peptide elongation (Figure 8A). Codons encoding Gly, Pro and Ala promoted fast or slow peptide elongation depending on whether they are in the A or P site of the local context (Figure 8A, B). In the E site of the local context codons encoding Lys, Glu, Gln and Asp as well as the Gly codons GGC and GGU (translated by tRNAGly3) contributed to slow peptide elongation (Figure 8C). Codons encoding aromatic AAs generally promoted fast elongation when in A, P and E sites of the local context, with Phe being the fastest for our dataset.
Figure 8.
Codons ranked according to z values for the (A) A-site, (B) P-site and (C) E-site positions of the local context. Large and small z values designate slow and fast peptide elongation, respectively. Codons are listed in descending order of z values.
Peptide elongation times in conditions of ternary complex depletion
Next, we considered two published Ribo-Seq datasets, one generated from the E. coli MG1655 strain following short incubation with mupirocin and the other representing an untreated control grown under otherwise identical conditions (6). Mupirocin is an inhibitor of isoleucyl-tRNA synthetase (IleRS) (33); it depletes charged tRNAIle and causes strong A-site pausing at Ile codons (6). Accordingly, our analysis of the mupirocin-treated dataset showed greatly increased z values for all three Ile codons in the A site, consistent with slow peptide elongation at Ile codons due to reduced supply of Ile-tRNAIle-containing ternary complexes (Figure 9A). We also noted that for the major Ile codons (AUC and AUU) z increased 13- and 16-fold, respectively, whereas z for the minor AUA Ile codon increased 8.5-fold (Figure 9A). Since the concentration of the minor, AUA-reading tRNAIle2 is an order of magnitude lower than the concentration of tRNAIle1, which pairs to the major Ile codons (34), we propose that the mupirocin-induced relative increase in A-site binding time is much larger for ternary complex with the major than with the minor isoacceptor (see Discussion for more details). A much higher sensitivity to IleRS inhibition for AUC/AUU than for AUA reading is also predicted by the theory of selective charging of tRNA isoacceptors (35), which was corroborated for a similar case of inhibition of another aminoacyl-tRNA synthetase (36).
Figure 9.
z values are affected by amino acid starvation of E. coli MG1655. (A) Marked stalling at Ile codons following treatment with mupirocin. Variation of z factors with the local codon position around the A site (pA = 8) for three Ile codons in untreated E. coli MG1655 (left panel) and E. coli MG1655 treated with mupirocin (right panel). Large z values indicate propensity for slow peptide elongation. (B) Comparison of the A-site z values for untreated E. coli MG1655 (red) and E. coli MG1655 treated with mupirocin (light blue). Codons are ordered as in the genetic code table.
Compared to the untreated E. coli MG1655 cells, we also found that, along with the slower Ile codon reading, mupirocin addition caused faster Ser/Gly codon decoding (Figure 9B). The relatively slow decoding of Ser and Gly codons in the control was attributed to quick depletion of Ser- and Gly-tRNAs due to culture filtration before the Ribo-Seq library preparation (6). Accordingly, we also detected higher z values at Ser and Gly codons, indicative of slow elongation on these codons, in the untreated E. coli MG1655 cells (Figure 9B). We speculate that mupirocin treatment results in a drastic slowing down of global translation in the cell, which also reduces Ser and Gly consumption. Hence, the pools of charged seryl-tRNAs and glycyl-tRNAs are maintained, thus eliminating the pausing on Gly and Ser codons (see Discussion for details).
We have also calculated context factors at nucleotide resolution for the untreated E. coli MG1655 data set (Supplementary Figure S7A) and compared them with the corresponding factors for our dataset (Figure 5). While the 3′-end bias in the factors for the same FL was similar for the two data sets, the 5′-end bias was much less pronounced for E. coli MG1655 (compare Figure 5A and Supplementary Figure S7A). Similarly, the factors estimated from the standard RPF nucleotide coverage profile for E. coli MG1655 also had a much less pronounced 5′-bias than the corresponding factors for the E. coli AS19 data set (Supplementary Figures S7B and S4). We attribute these differences to the much longer incubation with MNase during library preparation: 1 h for E. coli MG1655 (6) versus 10 min for our E. coli AS19.
Neutralization of nuclease induced bias in Ribo-Seq spectra from Saccharomyces cerevisiae
To further validate our modeling approach, we considered two published datasets from the yeast S. cerevisiae (19). These were prepared with MNase (S7) and RNase A, which have distinct cleavage biases: while MNase cuts preferentially before A and U, RNase A cleaves preferentially after C and U (19). Here, we applied our nucleotide-resolution ML approach to quantify the characteristic biases in the two datasets. For the MNase set we detected much higher context factor values for A/U compared to C/G nucleotides at positions p = 7 and p = 35, corresponding to the nucleotide at the 5′-end and the nucleotide after the 3′-end of the RPF, respectively (Figure 10A). This pattern is very similar to that in our MNase dataset from E. coli (Figure 5). In contrast, the factors for the RNase A dataset were relatively small for b = A or G at position p = 6, corresponding to the nucleotide before the 5′-end of the RPF, and near zero at position p = 34, which corresponds to its 3′-end (Figure 10B). This suggests that the technical biases of the two data sets are distinct and that the differences reflect the cleavage preferences of the nucleases used to generate the RPF libraries.
Figure 10.
Nucleotide-resolution context factors calculated from RPF coverage profiles with RPFs of FL = 28 nt for yeast Ribo-Seq datasets constructed using MNase (A) and RNase A (B).
We then calculated the codon resolution z factors for the MNase- and RNase A-treated yeast datasets and used the inner subset of z factors for positions from p1 = 5 to p2 = 9 to obtain bias-corrected parameters (Equation 25) and pausing scores (Equation 27). As expected, the correlations between the two data sets obtained with MNase and RNase A are weak (r = 0.3) both for the experimental RPF scores (Equation 12) and for the model RPF scores (Equation 13), as exemplified in Figure 11A and B for the YGR027C transcript (coding for the S25 protein of the 40S ribosomal subunit). In contrast, the bias-corrected pausing score profiles of YGR027C from the two data sets are strongly correlated (r = 0.77) with similar features (Figure 11C). The absolute translation time profiles for the YGR027C transcript calculated from the RNase A and MNase data sets, assuming a 2 h duplication time of the yeast culture (Equation 36), are also remarkably similar (Figure 11D). This similarity also reflects the varying codon elongation time as the ribosome moves codon by codon along the YGR027C transcript. We obtained very similar results for other transcripts. Notably, the frequency distribution of the r values underwent a large shift from low to high correlation following neutralization of nuclease-introduced biases (Figure 11E). We also observed a strong correlation between the z values obtained for the MNase (S7) and RNase A datasets for positions near the A site (Supplementary Figure S8A). This correlation for the A-site position was r = 0.8 and increased (r = 0.85) when rare codons (i.e. with frequency < 0.3%) were excluded. This, we suggest, reflects the similarity of the effects of a particular A-site codon on the codon translation time in both data sets. For the P and E sites, the correlation between the z factors from the two data sets was less pronounced (r = 0.7). As expected, the correlation between the z factors from the two data sets for positions near the edges of the yeast ribosome (e.g. positions 3 and 12) is low (r < 0.25), which reflects the distinct sequence preferences of the nucleases in RPF generation (Supplementary Figure S8B).
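The per-transcript comparison between the two nuclease data sets rests on the Pearson correlation coefficient between score profiles of equal length. A minimal, dependency-free sketch is given below; the function name and the toy profiles are hypothetical, and the numbers are not taken from the paper.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long profiles."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)

# Toy per-codon score profiles for one transcript in two data sets (made up).
scores_set_1 = [0.8, 1.2, 0.9, 1.5, 0.7, 1.1]
scores_set_2 = [0.9, 1.1, 1.0, 1.6, 0.6, 1.0]
r = pearson_r(scores_set_1, scores_set_2)
print(round(r, 2))
```

Applying such a function to every transcript and collecting the r values yields a frequency distribution of the kind compared before and after bias neutralization.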
Figure 11.
Comparison of experimental RPF scores (A), model RPF scores (B) and pausing scores (C) for the YGR027C transcript (encoding ribosomal protein S25, Rps25a) in MNase (red) and RNase A (blue) yeast Ribo-Seq data sets; the Pearson correlation coefficients, r, between the two data sets are r = 0.32 (A), r = 0.35 (B), r = 0.77 (C) and r = 0.77 (D). (D) Absolute elongation time spectrum (Equation 39) of YGR027C derived from the MNase (red) and RNase A (blue) Ribo-Seq data sets. The mean elongation time per codon for the YGR027C transcript is 80.8 ms and 86.7 ms when estimated from the RNase A and MNase datasets, respectively. (E) Frequency distribution of Pearson correlation coefficients, r, between scores for the same transcripts in yeast datasets prepared with RNase A (RA) and MNase (S7): for experimental scores (light blue), model scores (red), ‘five-inner model’ scores (dark blue) and bias-free pausing scores (orange). ‘Five-inner model’ refers to modeling each (i,j) A-site context contribution with 5 z position parameters (see Equations 28 and 29 for the definition).
We wish to emphasize that to obtain the bias-corrected pausing scores (Equation 27), we used the five inner-position z values from the fifteen z values accounting for the total local A-site context in the modeling of the experimental datasets. A different bias elimination method was developed earlier (20), using neural network modeling to predict the elongation time of the A-site codon from its short sequence context (which does not include the edges of the RPFs). To compare these two principally different ways of bias elimination, we followed that approach (20) and restricted the context to five codons around each A-site codon, hence excluding the edges of the RPFs. We then modeled the two Ribo-Seq sets from S. cerevisiae (19), processed with either MNase or RNase A, using ‘five-inner modeling’, i.e. using only five inner z values to obtain RPF scores (Equations 28 and 29), and found them different from the pausing scores. We then calculated Pearson correlations between the MNase- and RNase A-derived scores for each transcript i in the two datasets. Clearly, our ‘bias-free method’ leads to a much higher correlation between the pausing scores derived from the two datasets (Figure 11E, orange ‘bias-free’ r-frequency profile) than the ‘five-inner model’ does for the corresponding RPF scores (Figure 11E, dark blue ‘five-inner model’ r-frequency profile). An intuitive explanation for this result is that in the ‘five-inner model’ the five inner-position z factors used for the description of a dataset absorb the experimental biases. In contrast, when pL = 15 local positions are used for modeling, the experimental biases are absorbed by the outer ten z factors (Equation 24), thus leaving the five inner z factors bias-free. Thus, the ‘five-inner modeling’, which essentially emulates an earlier approach (20), reduces the precision of elongation time estimates.
DISCUSSION
For decades, quantitative studies of protein synthesis with purified ribosomes and auxiliary translation components have been performed across species (37,38). In spite of the insights from these biochemical approaches, there are considerable differences between the empirical contexts of cell-free and intracellular mRNA translation. For instance, in the living cell tightly controlled parallel pathways exist for the supply of aminoacyl-tRNAs and for ternary complex formation and, furthermore, the translation of A-site codons takes place in the context of a virtual infinitude of sets of neighboring codons. Thus, experimental approaches orthogonal to in vitro biochemistry will deepen our understanding of how the intracellular kinetic networks of mRNA translation shape the life-sustaining phenotypes of living cells. In the present work, we join the ongoing and rapidly growing efforts to establish genome-wide technologies (3,39) for quantitative studies of mRNA translation. We provide a framework for parallel estimation of elongation times of all codons in all local codon contexts of different types of cells. This was made possible by the development of a novel type of model to be fitted to transcriptome-wide ribosome profiling data for parameter estimation. Our model describes the elongation time at each codon of the transcriptome as a product of 15 independent z factors, one for each codon position in the local context surrounding the ribosomal A site. The factor for each codon context position can have one of 61 possible values, depending on the identity of the codon at that position. Using a maximum likelihood criterion, we obtain the values of 15 × 61 = 915 z factors for 61 sense codons in 15 local context positions by fitting our model to the experimental RPF spectrum. Despite large ruggedness and stochastic fluctuations, the experimental data are well fitted by the model.
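The parameter count quoted above can be checked by enumerating the sense codons of the standard genetic code. The sketch below does only that; the variable names are illustrative.

```python
from itertools import product

# Stop codons of the standard genetic code.
STOP_CODONS = {"UAA", "UAG", "UGA"}

# All 64 triplets over the RNA alphabet, minus the three stop codons.
all_codons = ["".join(triplet) for triplet in product("UCAG", repeat=3)]
sense_codons = [codon for codon in all_codons if codon not in STOP_CODONS]

N_POSITIONS = 15  # local context positions around the A site (pL = 15)
n_factors = N_POSITIONS * len(sense_codons)

print(len(sense_codons), n_factors)
```

One factor per (position, sense codon) pair gives the 915 parameters fitted to the experimental RPF spectrum.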
To discriminate between effects of codon context on nuclease cleavage preferences on one hand and peptide elongation time variations on the other, we use models with both single-codon and single-nucleotide resolution. In line with previous findings (9), we find much higher MNase activity at A/U compared to G/C nucleotides near the 5′- or 3′-ends of RPFs, leading to strongly skewed fragment creation/processing and biased RPF spectra. At the same time, the MNase cleavage bias does not propagate into the inner context on either side of the A-site codon, a crucial feature enabling neutralization of technical codon context-dependent bias. In this way, we derived unbiased RPF spectra suitable for estimation of codon elongation times throughout the transcriptome. We observed differences in the z values for the A site between two different E. coli strains, MG1655 and AS19 (compare Figures 8A and 9B), implying that our approach can sensitively detect elongation time differences at single codons between different strains, growth media and conditions.
We have applied our modeling approach to clarify the effects of mupirocin-induced inhibition of the IleRS activity in a bacterial system using a previously published data set (6). The inhibition decreases the rate of supply of charged tRNAIle isoacceptors (33) and greatly enhances the values of the z parameters for all three Ile codons (AUA, AUC or AUU) in the A site, suggesting greatly increased binding time for isoleucyl-tRNAIle-containing ternary complexes. Considering that the total concentration of the major tRNAIle1 isoacceptor is an order of magnitude larger than that of the minor tRNAIle2 isoacceptor (34) and assuming nearly 100% charging levels of both tRNAIle isoacceptors in the absence of the inhibitor, the time for ternary complex binding into the A site is estimated to be an order of magnitude smaller for AUC/AUU than for AUA codons. In the absence of the inhibitor, the total peptide elongation time is about 30% longer for AUA than for AUC/AUU codons (Figure 9B). From these data we suggest that the relative change in the time for ternary complex binding into the A site is much larger for AUC/AUU than for AUA codons, meaning that AUA decoding is much less sensitive to IleRS inhibition than AUC/AUU decoding. This further corroborates the theory of selective charging of tRNA isoacceptors (35), previously validated by SerRS inhibition in E. coli cells (36). In fact, our method might be very useful for detection of ternary complex depletion scenarios in cells. This optimistic notion receives further support from the observation that mupirocin, in addition to slowing down translation at Ile codons, also speeds up translation of Gly and Ser codons in a codon-selective manner. That is, mupirocin addition considerably reduces the reading times of the major (GGC/GGU) but not of the minor (GGG/GGA) Gly codons and reduces the reading times for all Ser codons (Figure 9). A possible scenario to explain these codon-specific patterns is that, under the experimental conditions used to obtain the RPF dataset in E. coli MG1655 grown in balanced medium, both Gly and Ser codons are weakly starved for their cognate ternary complexes (6) due to deficient intracellular supply of Gly and Ser (40). Mupirocin addition slows down overall protein synthesis, thereby removing the supply bottlenecks of Gly and Ser and the pausing at their codons. We note that the theory of selective charging of tRNAs predicts starvation-sensitive reading of GGC/GGU but not of GGG/GGA codons and starvation sensitivity of all Ser codons (35), which corroborates the proposed scenario of weak Gly and Ser starvation that is removed by addition of an IleRS inhibitor.
We have broadened our approach from bacterial systems to include eukaryotic systems. We compared two published Ribo-Seq sets from S. cerevisiae (19), derived from identical yeast populations but processed with different nucleases, either MNase or RNase A. Both nucleases exhibit strong but distinct cleavage preferences, leading to greatly different and virtually uncorrelated experimental and model-reproduced RPF spectra. However, after bias neutralization, the model spectra for both nucleases become less rugged and are strikingly similar (Figure 11). This means, we propose, that our bias-neutralization approach provides a solution to the long-standing problem of extracting reliable quantitative information about individual codon elongation cycle times from greatly rugged, highly noisy and biased RPF spectra.
Ribosome profiling holds a great promise of detailed insights into the dynamics of protein synthesis in single cells and multicellular organisms. The ongoing improvements of data analysis along with refinements of experimental techniques and the synergy of different and sometimes orthogonal approaches will accelerate the development of this promising field.
DATA AVAILABILITY
The sequencing data for E. coli AS19 generated in this study have been deposited in the Gene Expression Omnibus (GEO) under accession number GSE145571. The two published data sets (6,19) analyzed here are available in GEO under accession numbers GSE119104 (GSM3358136 and GSM3358137) for E. coli MG1655 and GSE82220 (GSM2186726 and GSM2186728) for yeast. All scripts and source code for modeling and calculating the parameters used here are deposited at https://github.com/gustafGitHub/RiboTimes.
ACKNOWLEDGEMENTS
We thank Irem Avcilar, Christian del Campo, and Anneli Borg for ribosome profiling libraries and growth curves, and Alexander Bartholomäus and Baban Kolte for helping with the mapping pipeline.
Contributor Information
Michael Y Pavlov, Department of Cell and Molecular Biology, Biomedical Center, University of Uppsala, 75237 Uppsala, Sweden.
Gustaf Ullman, Department of Cell and Molecular Biology, Biomedical Center, University of Uppsala, 75237 Uppsala, Sweden.
Zoya Ignatova, Institute for Biochemistry & Molecular Biology, University of Hamburg, 20146 Hamburg, Germany.
Måns Ehrenberg, Department of Cell and Molecular Biology, Biomedical Center, University of Uppsala, 75237 Uppsala, Sweden.
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
FUNDING
Swedish Research Council [VR 2018-0404, VR 2016-0624]; Knut and Alice Wallenberg Foundation (to M.E.); EU Horizon 2020 program (Marie Skłodowska-Curie) [764591 to Z.I.]. Funding for open access charge: EU Horizon 2020 program (Marie Skłodowska-Curie) [764591].
Conflict of interest statement. None declared.
REFERENCES
- 1. Steitz J.A. Polypeptide chain initiation: nucleotide sequences of the three ribosomal binding sites in bacteriophage R17 RNA. Nature. 1969; 224:957–964.
- 2. Ingolia N.T., Ghaemmaghami S., Newman J.R., Weissman J.S. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science. 2009; 324:218–223.
- 3. Ingolia N.T. Ribosome footprint profiling of translation throughout the genome. Cell. 2016; 165:22–33.
- 4. Schuller A.P., Green R. Roadblocks and resolutions in eukaryotic translation. Nat. Rev. Mol. Cell Biol. 2018; 19:526–541.
- 5. Stern-Ginossar N., Ingolia N.T. Ribosome profiling as a tool to decipher viral complexity. Annu. Rev. Virol. 2015; 2:335–349.
- 6. Mohammad F., Green R., Buskirk A.R. A systematically-revised ribosome profiling method for bacteria reveals pauses at single-codon resolution. eLife. 2019; 8:e42591.
- 7. Woolstenhulme C.J., Guydosh N.R., Green R., Buskirk A.R. High-precision analysis of translational pausing by ribosome profiling in bacteria lacking EFP. Cell Rep. 2015; 11:13–21.
- 8. Datta A.K., Burma D.P. Association of ribonuclease I with ribosomes and their subunits. J. Biol. Chem. 1972; 247:6795–6801.
- 9. Dingwall C., Lomonossoff G.P., Laskey R.A. High sequence specificity of micrococcal nuclease. Nucleic Acids Res. 1981; 9:2659–2673.
- 10. O’Connor P.B., Li G.W., Weissman J.S., Atkins J.F., Baranov P.V. rRNA:mRNA pairing alters the length and the symmetry of mRNA-protected fragments in ribosome profiling experiments. Bioinformatics. 2013; 29:1488–1491.
- 11. Weinberg D.E., Shah P., Eichhorn S.W., Hussmann J.A., Plotkin J.B., Bartel D.P. Improved ribosome-footprint and mRNA measurements provide insights into dynamics and regulation of yeast translation. Cell Rep. 2016; 14:1787–1799.
- 12. Zheng W., Chung L.M., Zhao H. Bias detection and correction in RNA-sequencing data. BMC Bioinformatics. 2011; 12:290.
- 13. Artieri C.G., Fraser H.B. Accounting for biases in riboprofiling data indicates a major role for proline in stalling translation. Genome Res. 2014; 24:2011–2021.
- 14. O’Connor P.B., Andreev D.E., Baranov P.V. Comparative survey of the relative impact of mRNA features on local ribosome profiling read density. Nat. Commun. 2016; 7:12915.
- 15. Sharma A.K., Sormanni P., Ahmed N., Ciryam P., Friedrich U.A., Kramer G., O’Brien E.P. A chemical kinetic basis for measuring translation initiation and elongation rates from ribosome profiling data. PLoS Comput. Biol. 2019; 15:e1007070.
- 16. Del Campo C., Bartholomaus A., Fedyunin I., Ignatova Z. Secondary structure across the bacterial transcriptome reveals versatile roles in mRNA regulation and function. PLoS Genet. 2015; 11:e1005613.
- 17. Mortazavi A., Williams B.A., McCue K., Schaeffer L., Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods. 2008; 5:621–628.
- 18. Mohammad F., Woolstenhulme C.J., Green R., Buskirk A.R. Clarifying the translational pausing landscape in bacteria by ribosome profiling. Cell Rep. 2016; 14:686–694.
- 19. Gerashchenko M.V., Gladyshev V.N. Ribonuclease selection for ribosome profiling. Nucleic Acids Res. 2017; 45:e6.
- 20. Tunney R., McGlincy N.J., Graham M.E., Naddaf N., Pachter L., Lareau L.F. Accurate design of translational output by a neural network model of ribosome distribution. Nat. Struct. Mol. Biol. 2018; 25:577–582.
- 21. Levenberg K. A method for the solution of certain non-linear problems in least squares. Quarterly Appl. Math. 1944; 2:164–168.
- 22. Marquardt D. An algorithm for least-squares estimation of nonlinear parameters. SIAM J. Appl. Math. 1963; 11:431–441.
- 23. Dana A., Tuller T. The effect of tRNA levels on decoding times of mRNA codons. Nucleic Acids Res. 2014; 42:9171–9181.
- 24. Hussmann J.A., Patchett S., Johnson A., Sawyer S., Press W.H. Understanding biases in ribosome profiling experiments reveals signatures of translation dynamics in yeast. PLoS Genet. 2015; 11:e1005732.
- 25. Bremer H., Dennis P.P. Modulation of chemical composition and other parameters of the cell at different exponential growth rates. EcoSal Plus. 2008; 3:doi:10.1128/ecosal.5.2.3.
- 26. Bartholomaus A., Del Campo C., Ignatova Z. Mapping the non-standardized biases of ribosome profiling. Biol. Chem. 2016; 397:23–35.
- 27. McGlincy N.J., Ingolia N.T. Transcriptome-wide measurement of translation by ribosome profiling. Methods. 2017; 126:112–129.
- 28. Chiba S., Ito K. Multisite ribosomal stalling: a unique mode of regulatory nascent chain action revealed for MifM. Mol. Cell. 2012; 47:863–872.
- 29. Lu J., Deutsch C. Electrostatics in the ribosomal tunnel modulate chain elongation rates. J. Mol. Biol. 2008; 384:73–86.
- 30. Nakatogawa H., Ito K. Secretion monitor, SecM, undergoes self-translation arrest in the cytosol. Mol. Cell. 2001; 7:185–192.
- 31. Laidler K., King C. Development of transition-state theory. J. Phys. Chem. 1983; 87:2657–2664.
- 32. Dana A., Tuller T. Properties and determinants of codon decoding time distributions. BMC Genomics. 2014; 15:S13.
- 33. Hughes J., Mellows G. Inhibition of isoleucyl-transfer ribonucleic acid synthetase in Escherichia coli by pseudomonic acid. Biochem. J. 1978; 176:305–318.
- 34. Ikemura T. Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system. J. Mol. Biol. 1981; 151:389–409.
- 35. Elf J., Nilsson D., Tenson T., Ehrenberg M. Selective charging of tRNA isoacceptors explains patterns of codon usage. Science. 2003; 300:1718–1722.
- 36. Lindsley D., Bonthuis P., Gallant J., Tofoleanu T., Elf J., Ehrenberg M. Ribosome bypassing at serine codons as a test of the model of selective transfer RNA charging. EMBO Rep. 2005; 6:147–150.
- 37. Dever T.E., Green R. The elongation, termination, and recycling phases of translation in eukaryotes. Cold Spring Harb. Perspect. Biol. 2012; 4:a013706.
- 38. Maracci C., Rodnina M.V. Review: translational GTPases. Biopolymers. 2016; 105:463–475.
- 39. Iwasaki S., Ingolia N.T. The growing toolbox for protein synthesis studies. Trends Biochem. Sci. 2017; 42:612–624.
- 40. Avcilar-Kucukgoze I., Bartholomaus A., Cordero Varela J.A., Kaml R.F., Neubauer P., Budisa N., Ignatova Z. Discharging tRNAs: a tug of war between translation and detoxification in Escherichia coli. Nucleic Acids Res. 2016; 44:8324–8334.
Data Availability Statement
The sequencing data for E. coli AS19 generated in this study have been deposited in the Gene Expression Omnibus (GEO) under accession number GSE145571. The two published data sets also analyzed here (6,19) are available in GEO under series accession numbers GSE119104 (samples GSM3358136 and GSM3358137) for E. coli MG1655 and GSE82220 (samples GSM2186726 and GSM2186728) for yeast. All scripts and source code for modeling and calculating the parameters used here are available at https://github.com/gustafGitHub/RiboTimes.