On the Decoupling of Evolutionary Changes in mRNA and Protein Levels

Daohan Jiang; Alexander L Cope; Jianzhi Zhang; Matt Pennell

2023 Jul 27;40(8):msad169. doi: 10.1093/molbev/msad169

Editor: Julian Echave

PMCID: PMC10411491 PMID: 37498582

Abstract

Variation in gene expression across lineages is thought to explain much of the observed phenotypic variation and adaptation. The protein is closer to the target of natural selection but gene expression is typically measured as the amount of mRNA. The broad assumption that mRNA levels are good proxies for protein levels has been undermined by a number of studies reporting moderate or weak correlations between the two measures across species. One biological explanation for this discrepancy is that there has been compensatory evolution between the mRNA level and regulation of translation. However, we do not understand the evolutionary conditions necessary for this to occur nor the expected strength of the correlation between mRNA and protein levels. Here, we develop a theoretical model for the coevolution of mRNA and protein levels and investigate the dynamics of the model over time. We find that compensatory evolution is widespread when there is stabilizing selection on the protein level; this observation held true across a variety of regulatory pathways. When the protein level is under directional selection, the mRNA level of a gene and the translation rate of the same gene were negatively correlated across lineages but positively correlated across genes. These findings help explain results from comparative studies of gene expression and potentially enable researchers to disentangle biological and statistical hypotheses for the mismatch between transcriptomic and proteomic data.

Keywords: gene expression, mRNA–protein correlation, evolutionary theory, phylogenetic comparative analysis

Introduction

Understanding the causes and consequences of evolutionary divergence in gene expression is important for explaining divergence in organismal phenotypes and adaptation (King and Wilson 1975). As proteins carry out functions encoded by protein-coding sequences and are generally thought of as the functional unit of the cell, the protein abundance (hereafter, the protein level) is expected to be the target of natural selection. However, previous work on gene expression evolution has predominantly relied on mRNA levels due to the relative simplicity and cost-effectiveness of high-throughput mRNA sequencing methods compared to mass spectrometry-based proteomics. This implicitly assumes that mRNA levels are an adequate proxy for protein levels; however, many studies, including those on mammals, flies, and yeasts, have documented weak to moderate correlations (i.e., Pearson’s correlation coefficient $r$ or Spearman’s correlation coefficient $ρ$ below $0.6$) between mRNA and protein levels across genes (Gygi et al. 1999; Marguerat et al. 2012; Becker et al. 2018; Wang et al. 2019), tissues (Edfors et al. 2016; Franks et al. 2017; Fortelny et al. 2017; Perl et al. 2017; Wang et al. 2019), and species (Laurent et al. 2010; Khan et al. 2013; Ba et al. 2022). Disentangling the biological, technical, and statistical explanations for the observed correlations between mRNA and protein levels remains an open and challenging problem (de Sousa Abreu et al. 2009; Vogel and Marcotte 2012; Liu et al. 2016; Buccitelli and Selbach 2020). In order to understand the biological underpinnings of this relationship (that is, how, when, and why discrepancies between mRNA and protein levels arise on evolutionary timescales), we need mathematical models that describe the coevolution between these two aspects of gene expression and that can generate clear predictions.

In this study, we focus specifically on the correlation between mRNA and protein levels of the same gene across species. In addition to a weak to moderate correlation between mRNA and protein levels, protein levels are generally more conserved than mRNA levels across species (Schrimpf et al. 2009; Laurent et al. 2010; Khan et al. 2013). This phenomenon is hypothesized to be due to compensatory evolution, in which changes to the mRNA level can be offset by changes to translation regulation, and vice versa (Schrimpf et al. 2009; Laurent et al. 2010; Khan et al. 2013; Wang et al. 2020). Consistent with the compensatory evolution interpretation is the observed negative correlation between the evolutionary divergence of mRNA levels and that of translational efficiencies (i.e., per-transcript rate of translation, as measured by ribosome profiling). For instance, between two yeast species, Saccharomyces cerevisiae and S. paradoxus, the difference in the mRNA level and the difference in translational efficiency of the same gene are more frequently in opposite directions than in the same direction (Artieri and Fraser 2014; McManus et al. 2014). Similarly, in mammals, the amount of divergence in the mRNA level and that in translational efficiency across five species from diverse clades (eutherian mammals, marsupials, and monotremes) are negatively correlated (Wang et al. 2020). However, the observed negative correlation between the mRNA level and the translational efficiency may be attributable to a statistical artifact, as translational efficiency estimated using ribosome profiling data is a ratio of the total translation level and the mRNA level, and regressing $Y / X$ (or $Y - X$ after log transformation) against $X$ is well known to induce spurious negative relationships when there is measurement error in one, or both, of the variables (Long 1980; Fraser 2019; Ho and Zhang 2019). Indeed, other studies find little evidence of compensatory evolution between transcription and translation, with changes to translation largely mirroring changes to transcription (Albert et al. 2014; Wang et al. 2018). The conflicting observations regarding the coevolution of transcription, translation, and protein levels raise the questions of what evolutionary conditions are likely to result in compensatory evolution and whether these conditions are likely to be prevalent.

It is challenging to develop statistical tests to disentangle evolutionary from technical explanations. The proposed evolutionary models have been purely verbal, post hoc explanations of observed patterns, and there is not a strong theoretical foundation from which to make more quantitative predictions. More rigorous theory would also aid in the interpretation of phylogenetic models fit to comparative gene expression data. There has been a rapid increase in the availability of both comparative gene expression data and in the sophistication of statistical approaches for analyzing them (e.g., Bedford and Hartl 2009; Brawand et al. 2011; Rohlfs et al. 2014; Rohlfs and Nielsen 2015; Nourmohammad et al. 2017; Chen et al. 2019; Barua and Mikheyev 2020). However, we typically lack strong predictions about how different evolutionary processes will change the observed phylogenetic distribution of expression levels (Price et al. 2022). Furthermore, we anticipate that in the next several years, researchers will collect transcriptomic data paired with proteomic data from multiple lineages (e.g., Ba et al. 2022) and we would like to be able to develop predictions and quantitative models to describe patterns of coevolution from these data.

To address this lack of theoretical justification, we develop a framework rooted in quantitative genetics theory to investigate the coevolution of the mRNA level, the rate of translation, and the protein level of a gene when the protein level is the target of natural selection. Under this framework, we conducted simulations to demonstrate how patterns of evolutionary divergence in these traits and the correlations between them are related to values of evolutionary parameters. Our simulations reveal that stabilizing selection on the protein level is sufficient to cause compensatory evolution, a result that holds across a variety of evolutionary conditions. We also find that evolutionary changes in the mRNA level and those in the rate of translation complement each other when the protein level is under directional selection, resulting in a negative transcription–translation correlation that is similar to that caused by stabilizing selection.

Results

Stabilizing Selection Leads to Compensatory Evolution of Transcription and Translation

To understand the coevolutionary dynamics of mRNA and protein levels when the latter is subject to natural selection, we considered a model where the mRNA level and the per-transcript rate of translation (more concisely, the “translation rate”) are directly affected by mutations and collectively determine the protein level. The protein level is the fitness-related trait, and we first considered the case when the protein level is subject to stabilizing selection. Under this model, we simulated the evolution of a gene’s expression in 500 replicate lineages and examined the resulting distribution of phenotypes among these lineages (see Materials and Methods). This model assumes that mRNA and protein degradation rates remain constant through time, such that all evolutionary changes to mRNA and protein levels are mediated through changes to transcription and/or translation. Here, each replicate lineage can be conceived as a species, and the amount of evolutionary divergence among species was represented by the variance among these lineages. As a negative control, we also simulated mRNA and protein levels when neither trait is subject to natural selection, that is, neutral evolution.
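The simulation design described above can be illustrated with a minimal sketch. It is not the implementation used in this study (see Materials and Methods for that). It assumes that, on the log scale, the protein level is the sum of the mRNA level and the translation rate, that fitness is a Gaussian function of the log protein level, and that each proposed mutation affects only one of the two traits and is fixed with a probability proportional to Kimura's diffusion approximation relative to a neutral mutation (capped at 1). All function names, parameter names, and parameter values (`simulate_lineages`, `relative_fixation_rate`, `mut_sd`, and so on) are hypothetical choices made for illustration.

```python
import numpy as np


def relative_fixation_rate(s, ne):
    """Fixation probability of a new mutation relative to a neutral one.

    Uses Kimura's diffusion approximation for a diploid population,
    (1 - exp(-2s)) / (1 - exp(-4*Ne*s)), divided by the neutral value
    1 / (2*Ne). Capping at 1 is a simplification made for this sketch.
    """
    s = np.where(np.abs(s) < 1e-12, 1e-12, s)          # avoid 0/0 at s = 0
    scaled = np.clip(4.0 * ne * s, -500.0, 500.0)      # avoid overflow in exp
    ratio = 2.0 * ne * np.expm1(-2.0 * s) / np.expm1(-scaled)
    return np.minimum(ratio, 1.0)


def simulate_lineages(n_lineages=500, n_steps=2000, ne=1000.0, sigma_w=1.0,
                      mut_sd=0.05, optimum=0.0, seed=0):
    """Evolve log mRNA level and log translation rate in replicate lineages.

    Assumption: log protein level = log mRNA level + log translation rate,
    and fitness is Gaussian in the log protein level with SD sigma_w.
    """
    rng = np.random.default_rng(seed)
    log_mrna = np.zeros(n_lineages)
    log_transl = np.zeros(n_lineages)
    for _ in range(n_steps):
        effect = rng.normal(0.0, mut_sd, n_lineages)    # mutational effect
        hits_mrna = rng.random(n_lineages) < 0.5        # no pleiotropy
        z_old = log_mrna + log_transl
        z_new = z_old + effect
        # log of the fitness ratio w(z_new) / w(z_old) for Gaussian fitness
        log_w_ratio = -((z_new - optimum) ** 2 - (z_old - optimum) ** 2) / (
            2.0 * sigma_w ** 2)
        s = np.expm1(log_w_ratio)                       # selection coefficient
        fixed = rng.random(n_lineages) < relative_fixation_rate(s, ne)
        log_mrna += np.where(fixed & hits_mrna, effect, 0.0)
        log_transl += np.where(fixed & ~hits_mrna, effect, 0.0)
    return log_mrna, log_transl, log_mrna + log_transl


if __name__ == "__main__":
    x, t, z = simulate_lineages()
    print("stabilizing: r(mRNA, translation) =", np.corrcoef(x, t)[0, 1])
    xn, tn, zn = simulate_lineages(sigma_w=np.inf)
    print("neutral:     r(mRNA, translation) =", np.corrcoef(xn, tn)[0, 1])
```

In this sketch, passing `sigma_w=np.inf` makes the fitness function flat and therefore serves as the neutral control, while a nonzero `optimum` with lineages starting at zero would mimic directional selection.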

When the protein level is under stabilizing selection, the correlation between the mRNA level and the translation rate of the same gene among species (the “transcription–translation correlation”) was strongly negative ($r < -0.95$ under the parameter combination we considered, fig. 1A). In contrast, such a negative transcription–translation correlation was not observed under neutrality (supplementary fig. S1A, Supplementary Material online). The positive correlation between the mRNA level and the protein level (the “mRNA–protein correlation”) was much weaker under stabilizing selection compared to that under neutrality (fig. 1B and supplementary fig. S1B, Supplementary Material online). These patterns are also robust to relative mutational target sizes of transcription and translation, though the mRNA–protein correlation becomes stronger when the proportion of mutations that affect the mRNA level is high (supplementary table S1, Supplementary Material online).

Fig. 1. — Coevolution of the mRNA level, the rate of translation, and the protein level when the protein level is under stabilizing selection. (A) End-point transcription–translation correlation across lineages. (B) End-point mRNA–protein correlation. In (A) and (B), each data point represents a replicate lineage (i.e., species), and the position of the point represents the lineage’s phenotype at the end of the simulation. Lines in (A) and (B) are least-squares regression lines. (C) Variances of the mRNA level, the rate of translation, and the protein level across lineages through time. (D,E) End-point transcription–translation correlation (D) and mRNA–protein correlation (E) under different combinations of $N_{e}$ and shape of the fitness function. y-Axis of (D) and (E) is reciprocal of the fitness function’s standard deviation (SD). A greater reciprocal means a smaller SD, a narrower fitness function, and stronger selection. All phenotypes plotted are in log scale.

As measurement errors are known to impact empirical measures of gene expression and mRNA–protein correlation (Csárdi et al. 2015), we added noise to each end-point (i.e., at the end of the simulations, equivalent to measurements from multiple extant species) mRNA level and protein level (see Materials and Methods) and repeated our analyses to determine if our results were robust. In general, measurement error had little qualitative impact on our results (supplementary table S2, Supplementary Material online). The dependence of measurement error in the translation rate on measurement error in the mRNA level (see Materials and Methods) created a trend towards a negative correlation between the mRNA level and the translation rate under neutrality. In contrast, measurement error did not strengthen, but weakened, the negative transcription–translation correlation in the presence of stabilizing selection (supplementary table S2, Supplementary Material online). This is consistent with attenuation bias towards 0, which is known to occur when two correlated traits are measured with error.
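The two effects of measurement error described above can be reproduced in a small numerical sketch. It assumes, for illustration only, that the translation rate is estimated on the log scale as the measured protein level minus the measured mRNA level, and that errors are independent and normally distributed; the function name `observed_correlation` and all numerical values are hypothetical.

```python
import numpy as np


def observed_correlation(log_mrna, log_transl, error_sd, rng):
    """Correlation between noisy mRNA levels and estimated translation rates.

    Assumption: the translation rate is not measured directly but estimated
    as (measured log protein level) - (measured log mRNA level), so its
    error contains the mRNA measurement error with the opposite sign.
    """
    n = log_mrna.size
    mrna_obs = log_mrna + rng.normal(0.0, error_sd, n)
    protein_obs = (log_mrna + log_transl) + rng.normal(0.0, error_sd, n)
    transl_est = protein_obs - mrna_obs
    return np.corrcoef(mrna_obs, transl_est)[0, 1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200_000

    # Neutral-like case: true traits are independent across lineages.
    x = rng.normal(0.0, 1.0, n)
    t = rng.normal(0.0, 1.0, n)
    print("independent traits, with error:", observed_correlation(x, t, 0.5, rng))

    # Compensation-like case: true traits are strongly negatively correlated.
    t_comp = -x + rng.normal(0.0, 0.1, n)
    print("true correlation:", np.corrcoef(x, t_comp)[0, 1])
    print("with error:      ", observed_correlation(x, t_comp, 0.5, rng))
```

With independent true traits, the shared error term produces a negative correlation where none exists; with strongly negatively correlated true traits, the same error pulls the observed correlation towards 0.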

When we examined how the variance of each trait across replicate lineages changed over time, we observed patterns consistent with the end-point correlations (fig. 1A and B). Variances of both the mRNA level and the translation rate increased over time when the protein level was subject to stabilizing selection, although the variances of each trait were much lower than the neutral expectation (fig. 1C and supplementary fig. S1C, Supplementary Material online). In contrast, the variance of the protein level saturated early during the simulated evolution in the presence of stabilizing selection (fig. 1C), which did not happen under neutrality (supplementary fig. S1C, Supplementary Material online).

To confirm that a negative transcription–translation correlation arises due to stabilizing selection, we varied the strength of stabilizing selection by altering the effective population size $N_{e}$ and SD of the fitness function $σ_{ω}$. As expected, the transcription–translation correlation became more negative when selection was stronger (i.e., greater $N_{e}$ and/or smaller $σ_{ω}$) (fig. 1D). In contrast, the mRNA–protein correlation was close to zero when selection was strong, but more positive when selection was weak (fig. 1E). Together, these observations support the view that compensatory evolution between the mRNA level and the translation rate helps maintain a relatively constant protein level when protein levels are subject to stabilizing selection.
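The link between a constrained protein level and the two correlations can be made explicit with a short worked derivation. This is a sketch under the assumption that, on the log scale, the protein level $z$ is the sum of the log mRNA level $x$ and the log translation rate $t$ (degradation rates being constant); the symbols $x$, $t$, $z$, and $v$ are introduced here for illustration and are not notation from the Materials and Methods.

```latex
\begin{align}
\mathrm{Var}(z) &= \mathrm{Var}(x) + \mathrm{Var}(t) + 2\,\mathrm{Cov}(x,t), \\
r_{xt} &= \frac{\mathrm{Var}(z) - \mathrm{Var}(x) - \mathrm{Var}(t)}
               {2\sqrt{\mathrm{Var}(x)\,\mathrm{Var}(t)}}, \\
\intertext{and, in the symmetric case $\mathrm{Var}(x) = \mathrm{Var}(t) = v$,}
r_{xt} &= \frac{\mathrm{Var}(z)}{2v} - 1, \qquad
r_{xz} = \frac{\mathrm{Cov}(x,z)}{\sqrt{v\,\mathrm{Var}(z)}}
       = \frac{1}{2}\sqrt{\frac{\mathrm{Var}(z)}{v}}.
\end{align}
```

Under these assumptions, independent $x$ and $t$ give $\mathrm{Var}(z) = 2v$, hence $r_{xt} = 0$ and $r_{xz} \approx 0.71$, whereas a tightly constrained protein level ($\mathrm{Var}(z) \ll v$) gives $r_{xt} \to -1$ and $r_{xz} \to 0$, matching the qualitative trends in fig. 1D and E.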

We also conducted simulations along a phylogenetic tree of 50 species (fig. 2A) and estimated evolutionary correlations between traits (i.e., the correlation accounting for phylogenetic history) to confirm whether the above patterns would be seen in phylogenetic comparative analyses. Consistent with observations from the simulated replicate lineages, the evolutionary correlation between the mRNA level and the translation rate is strongly negative when the protein level is under stabilizing selection ($r \approx -0.8$ under the parameter combinations we considered, supplementary table S3, Supplementary Material online). As this correlation is not as negative as that across replicate lineages (regression slope of the translation rate on the mRNA level $b = -0.791$ for simulations along a tree and $b = -0.977$ for simulations on replicate lineages), we repeated the simulation along trees transformed using Pagel’s λ transformation (Pagel 1999) (i.e., extending external branches and shortening internal branches of the original tree) to test if the discrepancy is due to small effective sample size caused by the tree structure (Ané 2008). Indeed, evolutionary correlations estimated from the transformed trees were more similar to the correlations among replicate lineages (supplementary table S4, Supplementary Material online). We also observed a positive association of divergence time with divergence in both the mRNA level and the translation rate, while obvious saturation was seen for the protein level (fig. 2B), consistent with the time-variance relationships seen in simulations along replicate lineages (fig. 1C). Fitting standard phylogenetic models of continuous trait evolution (see Materials and Methods) indicates that divergence of the protein level is better described by an Ornstein–Uhlenbeck (OU) process [a random walk around an optimum, commonly interpreted as a model of stabilizing selection (Hansen 1997)], while divergence of the mRNA level and the translation rate are better described by a Brownian motion (BM) process [a simple random walk, commonly interpreted as genetic drift or randomly varying selection (Felsenstein 1988)] (supplementary fig. S2A, C, and E, Supplementary Material online). However, when measurement error was unaccounted for, model comparisons for the mRNA level and the translation rate favored OU models as well (supplementary fig. S2A, C, and E, Supplementary Material online). This is consistent with previous findings that measurement error biases model fits toward OU models even if the traits evolved under a BM model (Pennell et al. 2015; Cooper et al. 2016).
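The Pagel’s λ transformation mentioned above is commonly expressed as a rescaling of the phylogenetic variance-covariance matrix: off-diagonal entries (shared branch lengths) are multiplied by λ while diagonal entries (root-to-tip distances) are left unchanged. The sketch below shows this on a hypothetical three-species tree; the function name `pagel_lambda_transform` and the example matrix are illustrative and are not taken from the analyses in this study.

```python
import numpy as np


def pagel_lambda_transform(vcv, lam):
    """Rescale a phylogenetic variance-covariance matrix by Pagel's lambda.

    Off-diagonal elements (shared evolutionary history) are multiplied by
    lam; diagonal elements (total root-to-tip path lengths) are kept. With
    lam < 1 this corresponds to shorter internal and longer external branches.
    """
    vcv = np.asarray(vcv, dtype=float)
    transformed = vcv * lam
    np.fill_diagonal(transformed, np.diag(vcv))
    return transformed


if __name__ == "__main__":
    # Hypothetical tree ((A:1,B:1):1,C:2): A and B share one unit of history.
    vcv = np.array([[2.0, 1.0, 0.0],
                    [1.0, 2.0, 0.0],
                    [0.0, 0.0, 2.0]])
    print(pagel_lambda_transform(vcv, 0.5))
    print(pagel_lambda_transform(vcv, 0.0))  # star phylogeny
```

Setting λ to 0 removes all shared history and yields a star phylogeny, which is the situation represented by independent replicate lineages.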

Transcription–Translation Coevolution of Interacting Genes

Due to shared gene regulatory architecture, the expression levels of different genes are not independent of each other, and this interdependence can potentially shape the mutational architecture and influence the way evolution of expression levels is constrained (Hether and Hohenlohe 2014). Therefore, we also examined how interactions between genes influence the coevolution of mRNA and protein levels. We simulated data under a model of two interacting genes where each gene’s realized transcription rate is determined collectively by its genotypic value (i.e., a baseline transcription rate) and the regulatory effect of protein product(s) of other gene(s) (fig. 3A, also see Materials and Methods).

Fig. 3. — (A) Schematic illustration for the model of between-gene interaction considered in this study. (B–G) Transcription–translation correlation (B, D, and F) and mRNA–protein correlation (C, E, and G) of interacting genes. Axes are genes’ regulatory effects on each other. (B,C) A gene directly subject to stabilizing selection (i.e., has an optimal protein level). (D,E) A regulator gene that is not directly subject to selection. Transcription of the target gene in (B) and (C) is regulated by the protein of the regulator in (D) and (E). (F,G) Correlations observed for one gene (gene 1) when both genes are directly subject to stabilizing selection.

We first examined the scenario where one gene is only a regulator (referred to as “the regulator”), while the protein level of the other gene is subject to stabilizing selection (referred to as “the target”). The target gene, which is directly under stabilizing selection, showed negative transcription–translation correlation (i.e., correlation between genotypic values of transcription and translation; fig. 3B) and weak mRNA–protein correlation (fig. 3C) for most combinations of interaction parameter values. However, the negative transcription–translation correlation became weaker in the presence of the regulator, reflecting between-gene compensatory evolution. In other words, in the presence of a regulator, a deleterious substitution affecting the target’s transcription or translation rate can be compensated by a substitution affecting the transcription or translation rate of either the target itself or the regulator. In the presence of strong negative feedback (i.e., regulatory effects of two genes on each other both have large absolute values but opposite signs), the mRNA–protein correlation became more positive. The regulator also exhibited a negative transcription–translation correlation (fig. 3D), indicating that indirect selection due to the regulatory roles of a gene is sufficient to cause its transcription–translation compensatory evolution. The mRNA–protein correlation of the regulator was weakened by the interaction but remained rather strong across the parameter space (fig. 3E). When both genes under consideration are directly subject to stabilizing selection, we observed a negative transcription–translation correlation across the examined parameter space, yet it was weaker in the presence of strong negative feedback (fig. 3F). Consistently, the mRNA–protein correlation is generally weak but becomes stronger when there is strong negative feedback (fig. 3G).

Although the number of genes involved in real regulatory networks is much greater, it is impractical to evenly sample a higher-dimensional space of interaction parameter combinations. Instead, we examined a series of motifs that bear features commonly seen in real regulatory networks (supplementary fig. S4, Supplementary Material online). The correlations were mostly consistent with those of genes in two-gene regulatory motifs: the regulator(s) and the target(s) all showed negative, though not necessarily strong, transcription–translation correlations, while targets subject to direct stabilizing selection showed weaker mRNA–protein correlation (supplementary table S5, Supplementary Material online). Together, these results reveal that transcription–translation compensatory evolution can take place as a result of a gene’s regulatory effect on other gene(s), yet would be less pronounced due to other gene(s)’ regulatory effect(s) on the gene of interest.

Transcription–Translation Coevolution of Functionally Equivalent Genes

Next, we considered a scenario where two genes express functionally equivalent proteins, such that fitness is determined by the total amount of proteins expressed from the two genes. These genes can be perceived as duplicate genes that have not yet diverged enough in their protein sequences to be functionally distinct, in which case selection would act to maintain the total expression level (Qian et al. 2010). In simulations under this scenario, both genes show negative transcription–translation correlations (supplementary table S6, Supplementary Material online), though the correlations are not as strong as those in the single-gene case (fig. 1A and supplementary fig. S2, Supplementary Material online). We also observed negative correlations between expression traits (i.e., mRNA levels, translation rates, and protein levels) of different genes (supplementary table S6, Supplementary Material online). This scenario, like examples of interacting genes (fig. 3B, C, F, and G), demonstrates that between-gene compensatory evolution can complement compensatory evolution of transcription and translation of the same gene and weaken the negative transcription–translation correlation.

Transcription–Translation Coevolution Under Directional Selection

To understand how directional selection might influence the coevolution of mRNA and protein levels differently, we simulated evolution towards the optimal protein level in 500 replicate lineages starting from the same phenotype (see Materials and Methods). The end-point protein level is distributed around the new optimum, yet the relative contribution of the mRNA level and the translation rate to evolutionary change in the protein level varied across lineages (fig. 4A and B). The end-point mRNA level and the end-point translation rate are negatively correlated (fig. 4A), whereas the protein level is essentially uncorrelated with the mRNA level (fig. 4B). While the correlations were similar to those observed when the protein level is under stabilizing selection, variances of both the mRNA level and the translation rate are much greater among lineages that have evolved under directional selection (≈0.04 for both the mRNA level and the translation rate) than variances observed when there is stabilizing selection only (≈0.005, again for both the mRNA level and the translation rate). This difference in the variances indicates that the distribution of phenotypes is not only shaped by stabilizing selection on the protein level after the optimum is reached, but also reflects the different paths that different lineages took to reach the same optimal protein level.

Fig. 4. — Coevolution of the mRNA level, the rate of translation, and the protein level when the protein level is under directional selection. (A) End-point transcription–translation correlation. (B) End-point mRNA–protein correlation. (C) A schematic illustration of the difference between across-species and across-gene correlations, using mRNA–protein correlation as an example. Correlation across species is calculated from mRNA and protein levels of the same gene in different species, whereas correlation across genes is calculated from mRNA and protein levels of different genes in the same species. (D) End-point transcription–translation correlation among multiple genes with different optimal protein levels. (E) End-point mRNA–protein correlation among multiple genes with different optimal protein levels. In (D) and (E), each gene is represented by a cloud of points of a distinct color. Solid lines of different colors are least-squares regression lines of different genes, while the dashed lines are least-squares regression lines based on all data points. Correlation coefficient shown in each panel is based on all data points in the panel.

As different genes within a genome likely have different optimal protein levels, we repeated the above simulations for multiple genes with the same starting mRNA levels and translation rates but different optimal protein levels. Specifically, we asked if patterns of correlation across species (within gene) and those across genes would be different (the difference between the two types of correlation is illustrated in fig. 4C). Patterns of correlations across species and across genes are drastically different: for each gene, there is a strong negative correlation between the mRNA level and the translation rate, but essentially no correlation between the mRNA level and the protein level. In contrast, both correlations are positive when compared among genes (fig. 4D and E; supplementary table S7, Supplementary Material online). These observations reflect the fact that evolutionary changes in a gene’s mRNA level and translation rate were usually concordant (i.e., changing the protein level in the same direction), despite the negative correlation in terms of magnitude. Among-species variances of both the mRNA level and the translation rate are positively correlated with the absolute distance between the optimal protein level and the ancestral protein level ($r > 0.99$ for both the mRNA level and the translation rate), confirming the role of directional selection in shaping the distribution of phenotypes among species.
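The contrast between within-gene and across-gene correlations can be illustrated with a toy generative sketch. It does not simulate selection; it simply assumes that each lineage reaches its gene’s new optimal log protein level by a randomly varying split between a change in the log mRNA level and a change in the log translation rate. The function names, the split distribution, and all numerical values are hypothetical and chosen for illustration.

```python
import numpy as np


def simulate_gene(optimum_shift, n_species, rng, split_sd=0.15, resid_sd=0.02):
    """Toy end-point phenotypes of one gene across species (log scale).

    Assumption: the protein level ends near its optimum, and a random
    fraction of the required shift is achieved through the mRNA level,
    the remainder through the translation rate.
    """
    frac_mrna = rng.normal(0.5, split_sd, n_species)
    log_protein = optimum_shift + rng.normal(0.0, resid_sd, n_species)
    log_mrna = frac_mrna * optimum_shift
    log_transl = log_protein - log_mrna
    return log_mrna, log_transl, log_protein


def within_and_pooled(optimum_shifts, n_species=500, seed=0):
    """Return per-gene and pooled (r_mRNA-translation, r_mRNA-protein)."""
    rng = np.random.default_rng(seed)
    genes = [simulate_gene(o, n_species, rng) for o in optimum_shifts]
    within = [(np.corrcoef(m, t)[0, 1], np.corrcoef(m, p)[0, 1])
              for m, t, p in genes]
    mrna, transl, protein = (np.concatenate(col) for col in zip(*genes))
    pooled = (np.corrcoef(mrna, transl)[0, 1],
              np.corrcoef(mrna, protein)[0, 1])
    return within, pooled


if __name__ == "__main__":
    within, pooled = within_and_pooled([-2.0, -1.0, 1.0, 2.0])
    for r_xt, r_xz in within:
        print(f"within gene: r(mRNA, transl) = {r_xt:.2f}, "
              f"r(mRNA, protein) = {r_xz:.2f}")
    print(f"pooled:      r(mRNA, transl) = {pooled[0]:.2f}, "
          f"r(mRNA, protein) = {pooled[1]:.2f}")
```

In this toy setting, the sign of the transcription–translation correlation flips between the within-gene and pooled analyses because the among-gene differences in the optimum dominate the pooled variance.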

Discussion

In this study, we demonstrate how the mRNA level, the translation rate, and the protein level coevolve when it is the protein level that is subject to selection. Using simulated data generated under a quantitative genetics model of gene expression evolution, we show that stabilizing selection on the protein level can cause a negative transcription–translation correlation (fig. 1A) and weaken the positive mRNA–protein correlation (fig. 1B). As measurement errors are known to impact empirical estimates of gene expression, we examined the impact of random error added to the end-point traits on our results. While measurement error induced a trend towards a negative transcription–translation correlation under neutrality, it does not account for the strong correlation observed in the presence of stabilizing selection on the protein level (supplementary table S2, Supplementary Material online). Notably, as errors in the translation rate are correlated with errors in the mRNA level, a spurious correlation between estimates of these two traits might arise. Our simulations also reveal that stabilizing selection on the protein level can make the protein level more conserved across species than the mRNA level (figs. 1C and 2B), which is a pattern often found in empirical studies (Schrimpf et al. 2009; Laurent et al. 2010; Khan et al. 2013). However, we note that this phenomenon is expected to occur only under some combinations of evolutionary parameters (see the Expected Phenotypic Variances subsection of the Materials and Methods section).

The data points in fig. 1A, which correspond to different lineages, can also be interpreted as representing different genes. In this case, the end-point phenotypes should be relabeled as divergence from the ancestral phenotype. Such negative correlation between evolutionary divergence in transcription and translation of different genes has been observed in empirical studies of both budding yeasts (Artieri and Fraser 2014; McManus et al. 2014) and mammals (Wang et al. 2020). The negative correlations observed in these empirical studies were weaker than those observed in our simulations, likely because different genes have different evolutionary parameters (e.g., protein levels of different genes are not under equally strong selection).

We observed that the evolution of the mRNA level and the translation rate, which are not directly under selection and have no optima, was qualitatively similar under neutral evolution (i.e., no optimum for protein levels) and under stabilizing selection on protein levels. Under neutrality, variance among lineages is expected to increase through time at a rate determined by the mutational variance (Lynch and Hill 1986; Khaitovich et al. 2005). In our simulations where the protein level is under stabilizing selection, variances in the mRNA level and the translation rate among replicate lineages increased through time without saturation (fig. 1C), and phylogenetic analysis of simulations along a phylogenetic tree favored a BM model as well, consistent with neutral expectations. Importantly, the evolution of the mRNA level and the translation rate was nonetheless affected by selection, as they underwent much less evolutionary divergence than expected under neutrality. Such apparently (but not truly) neutral evolution occurs when a trait is not under direct selection but is genetically correlated with trait(s) subject to constraint: some mutations affecting the focal trait are purged by selection due to their deleterious effects on other trait(s), yet the focal trait itself has neither an optimal value nor boundaries, allowing it to diverge indefinitely, albeit slowly (Jiang and Zhang 2020). It should be noted that the mRNA level and the rate of translation are presumably not truly unbounded, as resources within the cell (e.g., RNA polymerase molecules, ribosomes, and ATP) are ultimately limited, though we assumed that the phenotypes are far from such limits in our simulations.

Previous studies have shown that evolutionary divergence in the mRNA level at phylogenetic scales is best described by an OU process (Bedford and Hartl 2009; Chen et al. 2019; Dimayacyac et al. 2023), which appears to contradict our observation that variance in the mRNA level continued to increase through time (figs. 1C and 2B). As mentioned above, one possible explanation for this is that measurement errors create a bias in favor of the OU model (Pennell et al. 2015; Cooper et al. 2016), which is confirmed in this study (supplementary fig. S2, Supplementary Material online). The discrepancy between our simulation results and observations from analyses of real data is likely to be due in part to measurement error. It is also worth noting that the same amount of error would cause more severe bias when the total amount of divergence is low. Therefore, stabilizing selection on the protein level does make it more likely that divergence in the mRNA level appears to fit an OU model by augmenting the influence of measurement errors. Note that there could be other, nonmutually exclusive explanations for these discrepancies. For example, there might be an upper bound to each gene’s mRNA level and translation rate imposed by the availability of cellular resources.
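The point that a fixed amount of error matters more when divergence is low can be illustrated with the standard expected among-lineage variances under BM and OU. The sketch below assumes lineages start from a common ancestral value (at the optimum, for OU) and that measurement error adds a constant, independent variance; the function names and the numerical values are hypothetical.

```python
import numpy as np


def bm_variance(time, sigma2):
    """Expected among-lineage variance under Brownian motion: sigma2 * t."""
    return sigma2 * np.asarray(time, dtype=float)


def ou_variance(time, sigma2, alpha):
    """Expected variance under OU: sigma2 / (2*alpha) * (1 - exp(-2*alpha*t))."""
    time = np.asarray(time, dtype=float)
    return sigma2 / (2.0 * alpha) * (1.0 - np.exp(-2.0 * alpha * time))


def error_share(true_variance, error_variance):
    """Fraction of the observed variance that is due to measurement error."""
    return error_variance / (true_variance + error_variance)


if __name__ == "__main__":
    times = np.array([1.0, 10.0, 100.0])
    error_variance = 0.01                       # hypothetical error variance
    fast = bm_variance(times, sigma2=0.01)      # weakly constrained trait
    slow = bm_variance(times, sigma2=0.001)     # same model, slower divergence
    print("error share, fast divergence:", error_share(fast, error_variance))
    print("error share, slow divergence:", error_share(slow, error_variance))
    print("OU variance (saturating):", ou_variance(times, sigma2=0.01, alpha=0.5))
```

When the rate of divergence is reduced, as happens for the mRNA level when the protein level is under stabilizing selection, the constant error term makes up a larger share of the observed variance at every time point, which flattens the apparent variance-time relationship in the direction of the saturating OU curve.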

Directional selection on the protein level results in patterns of correlations that are similar to those resulting from stabilizing selection. To reach the optimal protein level, the mRNA level and the rate of translation undergo evolutionary changes that are concordant in terms of direction, but complementary in terms of magnitude. As a result, negative transcription–translation correlation and weak mRNA–protein correlation are both expected among a group of species that underwent selection towards the same optimal protein level. After the optimum is reached, stabilizing selection takes over and continues to promote the same kind of correlation. The effects of stabilizing and directional selection can be collectively viewed as the effect of the fitness landscape: when there exists an optimal protein level, selection is expected to result in a negative correlation between the mRNA level and the translation rate, and weak to no correlation between the mRNA level and the protein level.

The negative transcription–translation correlation and weak mRNA–protein correlation resulting from selection on the protein level are within gene and across species. We also extended our model to explore how the same evolutionary processes would shape the correlations across different genes. We show strongly positive transcription–translation and mRNA–protein correlations among genes with different optimal protein levels, demonstrating an instance of Simpson’s paradox (i.e., the correlation between variables seen in certain subsets of data differs from that seen in the complete dataset). Recent studies have found rather strong mRNA–protein correlations across genes (e.g., Franks et al. 2017; Fortelny et al. 2017), and it is suggested that measurement errors played a significant role in weakening the correlations in earlier studies (Csárdi et al. 2015). Our finding reconciles these results and the weaker within-gene, among-species correlations.

In our simulations, we considered a simple mutational architecture with no pleiotropy (i.e., each mutation affects either transcription or translation but not both) as parameterization of pleiotropy in the simulations is challenging. The prevalence of pleiotropic mutations and their effects on transcription and translation are unclear. If the mRNA level and the rate of translation could be measured simultaneously for a sufficiently large number of mutant genotypes, a more complete picture of the two traits’ mutational architecture could be obtained, which would allow better parameterization. Similarly, we did not consider mutations affecting the degradation rates due to the difficulty of parameterization.

Conclusion

By connecting patterns in the coevolution of the mRNA level, the rate of translation, and the protein level to explicit evolutionary processes, we demonstrate that several widely observed phenomena in between-species comparisons (namely, weak mRNA–protein correlation, negative transcription–translation correlation, and the protein level being more evolutionarily conserved than the mRNA level) can all result from stabilizing selection on the protein level. Additionally, positive mRNA–protein correlations across genes arise because different genes have different optimal protein levels. With these connections built, our results can aid the interpretation of observations in future empirical studies and help disentangle the effects of biological and technical factors.

Materials and Methods

Model of Gene Expression

The basic model we study is a system of two equations. The first describes the rate at which the mRNA level of a single gene, R, changes with time. This is given by

\frac{d R}{d t} = α - γ_{R} R,

(1)

where α is the gene’s transcription rate, and $γ_{R}$ is the rate at which the mRNA molecules are degraded. The rate at which the protein level, P, changes with time is described as

\frac{d P}{d t} = β R - γ_{P} P,

(2)

where β is the per-transcript translation rate, and $γ_{P}$ is the rate at which protein products are degraded. The β parameter can be interpreted as the rate of translation initiation, as initiation has been shown to be a major rate-limiting step of translation (Shah et al. 2013). At the equilibrium state (i.e., neither the mRNA level nor the protein level is changing),

\left\{ \begin{matrix} α = γ_{R} R, \\ β R = γ_{P} P . \end{matrix} \right.

(3)

We can solve equation (3) and obtain

\left\{ \begin{matrix} R = \frac{α}{γ_{R}}, \\ P = \frac{α β}{γ_{R} γ_{P}}, \end{matrix} \right.

(4)

which can then be log-transformed

\left\{ \begin{matrix} \ln R = \ln α - \ln γ_{R}, \\ \ln P = \ln α + \ln β - \ln γ_{R} - \ln γ_{P} . \end{matrix} \right.

(5)
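As a quick numerical check of equations (1)–(5), the sketch below integrates the two rate equations with a simple forward-Euler scheme and compares the end point with the closed-form equilibrium. The parameter values, step size, and function names are illustrative assumptions and are not taken from the study.

```python
import math

def equilibrium(alpha, beta, gamma_r, gamma_p):
    """Steady-state mRNA and protein levels from equation (4)."""
    r = alpha / gamma_r
    p = alpha * beta / (gamma_r * gamma_p)
    return r, p

def integrate(alpha, beta, gamma_r, gamma_p, r0=0.0, p0=0.0, dt=0.01, steps=20000):
    """Forward-Euler integration of equations (1) and (2)."""
    r, p = r0, p0
    for _ in range(steps):
        dr = alpha - gamma_r * r      # equation (1)
        dp = beta * r - gamma_p * p   # equation (2)
        r += dt * dr
        p += dt * dp
    return r, p

# Hypothetical parameter values, chosen only for illustration.
alpha, beta, gamma_r, gamma_p = 2.0, 5.0, 0.5, 0.1
r_eq, p_eq = equilibrium(alpha, beta, gamma_r, gamma_p)
r_num, p_num = integrate(alpha, beta, gamma_r, gamma_p)

# Log-scale form of the equilibrium protein level, equation (5).
ln_p_eq = math.log(alpha) + math.log(beta) - math.log(gamma_r) - math.log(gamma_p)
```

With these values the integrated trajectory settles onto the analytical equilibrium, and `ln_p_eq` agrees with `math.log(p_eq)` up to floating-point error.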

In our model, we assume a precise match between the genotypic values (i.e., α and β) and the phenotype (i.e., equilibrium R and P) and do not consider expression noise. Although expression noise could play a role in constraining evolution of transcription and translation rates (Hausser et al. 2019), we assume, in this study, that selection imposed by noise is far weaker than that imposed by the mean protein level and thus negligible.

Expected Phenotypic Variances

Under the assumption that degradation rates in equation (5) are constant, the variance of the mRNA level across samples (i.e., replicate lineages) is equal to variance of the transcription rate [i.e., $Var (\ln R) = Var (\ln α)$ ]. Variance of the protein level, as a function of other variances, is given by

\begin{aligned} Var (\ln P) & = Var (\ln α + \ln β) \\ & = Var (\ln α) + Var (\ln β) + 2 Cov (\ln α, \ln β) \\ & = Var (\ln α) + Var (\ln β) + 2 ρ \sqrt{Var (\ln α) Var (\ln β)}, \end{aligned}

(6)

where $Cov (\ln α, \ln β)$ is the covariance between $\ln α$ and $\ln β$ , and $ρ$ is their correlation coefficient. The protein level is more conserved than the mRNA level if

Var (\ln α) + Var (\ln β) + 2 ρ \sqrt{Var (\ln α) Var (\ln β)} < Var (\ln α) .

(7)

If we rearrange the inequality, we obtain the condition under which we would expect to see more inter-lineage variation in R than in P

ρ < - \frac{\sqrt{Var (\ln β)}}{2 \sqrt{Var (\ln α)}} .

(8)

We note that the inequality in equation (8) cannot be satisfied when $\sqrt{Var (\ln β)} > 2 \sqrt{Var (\ln α)}$ since $ρ$ is only defined between $- 1$ and 1, which means the protein level can be more conserved than the mRNA level only if variance in the rate of translation is not too much higher than variance of the mRNA level. The variances [ $Var (\ln β)$ and $Var (\ln α)$ ] depend on multiple factors, including the strength of selection on the protein level, mutational parameters, and the indirect effect of selection due to interaction with other gene(s) (see below). Therefore, the observation that the protein level is more conserved than the mRNA level reflects the collective effect of multiple factors and may not occur for some parameter values.
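The condition in equations (6)–(8) can be explored numerically with a few lines of Python. The sketch below is illustrative only: the function names and the example variances are assumptions, and the threshold function returns `None` when no correlation in [−1, 1] can satisfy inequality (8).

```python
import math

def var_ln_protein(var_a, var_b, rho):
    """Var(ln P) from equation (6), given Var(ln alpha), Var(ln beta), and rho."""
    return var_a + var_b + 2.0 * rho * math.sqrt(var_a * var_b)

def rho_threshold(var_a, var_b):
    """Right-hand side of inequality (8).

    Returns None when the threshold is at or below -1, in which case no valid
    correlation coefficient can make the protein level more conserved.
    """
    threshold = -math.sqrt(var_b) / (2.0 * math.sqrt(var_a))
    return threshold if threshold > -1.0 else None

# Hypothetical variances, chosen only for illustration.
equal_case = rho_threshold(1.0, 1.0)              # threshold is -0.5
conserved = var_ln_protein(1.0, 1.0, -0.8) < 1.0  # rho below the threshold
not_conserved = var_ln_protein(1.0, 1.0, -0.3) < 1.0
impossible = rho_threshold(1.0, 9.0)              # sqrt(9) > 2 * sqrt(1)
```

In the equal-variance example, a correlation of −0.8 makes the protein level less variable than the mRNA level, whereas −0.3 does not; when the translation-rate variance is nine times the transcription-rate variance, the condition cannot be met.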

Model of Interaction Between Genes

Let us consider two interacting genes, gene 1 and gene 2. Gene 1’s protein can influence gene 2’s transcription, and vice versa. We assume that the expression levels of gene 1 and gene 2 are governed by the dynamics described above, with each gene having its own genotypic values for the transcription and translation rate parameters (i.e., the α value for gene 1 is $α_{1}$ and for gene 2 is $α_{2}$ ). We assume that the rates of mRNA and protein degradation are the same for all genes. In this model, the two genes interact according to a set of interaction parameters C. The parameter $C_{1, 2}$ is the effect of gene 1’s protein product $P_{1}$ on gene 2’s transcription level, and $C_{2, 1}$ is the reverse. We can then set up a system of differential equations as follows:

\left\{ \begin{matrix} \frac{d R_{1}}{d t} = α_{1} P_{2}^{C_{2, 1}} - γ_{R} R_{1}, \\ \frac{d P_{1}}{d t} = β_{1} R_{1} - γ_{P} P_{1}, \\ \frac{d R_{2}}{d t} = α_{2} P_{1}^{C_{1, 2}} - γ_{R} R_{2}, \\ \frac{d P_{2}}{d t} = β_{2} R_{2} - γ_{P} P_{2} . \end{matrix} \right.

(9)

When the interaction parameter is negative, the regulatory effect on the target gene is repression. This is reflected in the asymptotic decrease of the realized transcription rate ( $α_{1} P_{2}^{C_{2, 1}}$ or $α_{2} P_{1}^{C_{1, 2}}$ ) as the concentration of the repressor increases. Conversely, a positive interaction parameter indicates an activation effect, where the realized transcription rate increases with the concentration of the activator.

It is worth noting that, in reality, the activation effect would plateau as the activator’s concentration increases, because an excess of activator molecules would not be able to bind the target. Although the realized transcription rate increases monotonically in our model, our approximation remains reasonable when the interaction parameter is between 0 and 1 and the regulator’s expression level is not extremely high. Given our focus on scenarios where stabilizing selection is in action, we can assume that the concentration of the regulator’s protein would never reach a level where the target’s transcription rate plateaus. We did not consider interaction parameter values above one in this study.

At equilibrium, the following four equations hold:

\left\{ \begin{matrix} α_{1} P_{2}^{C_{2, 1}} = γ_{R} R_{1}, \\ β_{1} R_{1} = γ_{P} P_{1}, \\ α_{2} P_{1}^{C_{1, 2}} = γ_{R} R_{2}, \\ β_{2} R_{2} = γ_{P} P_{2} . \end{matrix} \right.

(10)

After log-transformation and rearrangement, we can express equation (10) in matrix form as follows:

[\begin{matrix} - 1 & 0 & 0 & C_{2, 1} \\ 1 & - 1 & 0 & 0 \\ 0 & C_{1, 2} & - 1 & 0 \\ 0 & 0 & 1 & - 1 \end{matrix}] [\begin{matrix} \ln R_{1} \\ \ln P_{1} \\ \ln R_{2} \\ \ln P_{2} \end{matrix}] = [\begin{matrix} \ln γ_{R} - \ln α_{1} \\ \ln γ_{P} - \ln β_{1} \\ \ln γ_{R} - \ln α_{2} \\ \ln γ_{P} - \ln β_{2} \end{matrix}] .

(11)

Solving the system of linear equations gives the logarithms of $R_{1}$ , $R_{2}$ , $P_{1}$ , and $P_{2}$ .
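For two genes, the linear system in equation (11) can also be solved by hand. Substituting the translation equations into the transcription equations of equation (10) gives $\ln P_{1} = a_{1} + C_{2, 1} \ln P_{2}$ and $\ln P_{2} = a_{2} + C_{1, 2} \ln P_{1}$, where $a_{i} = \ln α_{i} + \ln β_{i} - \ln γ_{R} - \ln γ_{P}$ is the log protein level gene i would have without interaction. The sketch below implements this closed form; it is a derivation added here for illustration rather than the procedure used in the study, and all names and parameter values are assumptions.

```python
def two_gene_equilibrium(ln_a1, ln_b1, ln_a2, ln_b2, c12, c21, ln_gr=0.0, ln_gp=0.0):
    """Closed-form solution of equation (11) for two interacting genes.

    c12 is the effect of gene 1's protein on gene 2's transcription and
    c21 is the reverse. Returns (ln R1, ln P1, ln R2, ln P2).
    """
    denom = 1.0 - c12 * c21
    if abs(denom) < 1e-12:
        raise ValueError("no unique equilibrium when C12 * C21 equals 1")
    a1 = ln_a1 + ln_b1 - ln_gr - ln_gp   # uncoupled log protein level, gene 1
    a2 = ln_a2 + ln_b2 - ln_gr - ln_gp   # uncoupled log protein level, gene 2
    ln_p1 = (a1 + c21 * a2) / denom
    ln_p2 = (a2 + c12 * a1) / denom
    ln_r1 = ln_p1 - ln_b1 + ln_gp        # from beta_1 R_1 = gamma_P P_1
    ln_r2 = ln_p2 - ln_b2 + ln_gp        # from beta_2 R_2 = gamma_P P_2
    return ln_r1, ln_p1, ln_r2, ln_p2

# Hypothetical genotypic values: gene 1 activates gene 2, gene 2 represses gene 1.
ln_r1, ln_p1, ln_r2, ln_p2 = two_gene_equilibrium(
    ln_a1=0.3, ln_b1=-0.1, ln_a2=0.2, ln_b2=0.4, c12=0.4, c21=-0.6)

# Residuals of the two transcription equations in equation (10), on the log scale.
residual_1 = 0.3 + (-0.6) * ln_p2 - ln_r1
residual_2 = 0.2 + 0.4 * ln_p1 - ln_r2
```

Setting both interaction parameters to zero recovers the single-gene result of equation (5).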

This model can be extended to systems of three or more genes. For gene i in a system of n genes, we have the following differential equations:

\left\{ \begin{matrix} \frac{d R_{i}}{d t} = α_{i} \prod_{j = 1}^{n} P_{j}^{C_{j, i}} - γ_{R} R_{i}, \\ \frac{d P_{i}}{d t} = β_{i} R_{i} - γ_{P} P_{i} . \end{matrix} \right.

(12)

Again, after log-transformation and rearrangement, we can express the equilibrium conditions of equation (12) in matrix form as follows:

\begin{aligned} [\begin{matrix} - 1 & 0 & 0 & C_{2, 1} & 0 & C_{3, 1} & \dots & 0 & C_{n, 1} \\ 1 & - 1 & 0 & 0 & 0 & 0 & \dots & 0 & 0 \\ 0 & C_{1, 2} & - 1 & 0 & 0 & C_{3, 2} & \dots & 0 & C_{n, 2} \\ 0 & 0 & 1 & - 1 & 0 & 0 & \dots & 0 & 0 \\ 0 & C_{1, 3} & 0 & C_{2, 3} & - 1 & 0 & \dots & 0 & C_{n, 3} \\ 0 & 0 & 0 & 0 & 1 & - 1 & \dots & 0 & 0 \\ \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots \\ 0 & C_{1, n} & 0 & C_{2, n} & 0 & C_{3, n} & \dots & - 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & \dots & 1 & - 1 \end{matrix}] \\ [\begin{matrix} \ln R_{1} \\ \ln P_{1} \\ \ln R_{2} \\ \ln P_{2} \\ \ln R_{3} \\ \ln P_{3} \\ \dots \\ \ln R_{n} \\ \ln P_{n} \end{matrix}] = [\begin{matrix} \ln γ_{R} - \ln α_{1} \\ \ln γ_{P} - \ln β_{1} \\ \ln γ_{R} - \ln α_{2} \\ \ln γ_{P} - \ln β_{2} \\ \ln γ_{R} - \ln α_{3} \\ \ln γ_{P} - \ln β_{3} \\ \dots \\ \ln γ_{R} - \ln α_{n} \\ \ln γ_{P} - \ln β_{n} \end{matrix}] . \end{aligned}

(13)

Note that the system may not have a solution (i.e., the leftmost matrix may not be invertible), depending on values of the parameters. A biological mechanism underlying such scenarios is positive feedback: if a set of genes activate each other, and their initial expression levels are high enough such that the degradation cannot counteract the increase of their expression, there would not be an equilibrium. Instead, their expression levels would increase indefinitely until a physical barrier is reached (e.g., limited by availability of ribosomes). When we simulated evolution of two interacting genes (see below), we considered all combinations of the following values for $C_{1, 2}$ and $C_{2, 1}$ : $- 0.8$ , $- 0.6$ , $- 0.4$ , $- 0.2$ , 0, $0.2$ , $0.4$ , $0.6$ , and $0.8$ .
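The general system in equation (13) can be assembled and solved programmatically. The following sketch uses only the Python standard library, with a small Gaussian-elimination routine standing in for a linear-algebra package; the function names, the 0-based indexing of `C`, and the example values are assumptions made for illustration. Following the matrix shown in equation (13), self-regulation terms ( $C_{i, i}$ ) are left out.

```python
def build_system(ln_alpha, ln_beta, C, ln_gr=0.0, ln_gp=0.0):
    """Assemble the matrix A and right-hand side b of equation (13).

    C[j][i] is the effect of gene j's protein on gene i's transcription
    (0-based). Unknowns are ordered ln R_1, ln P_1, ..., ln R_n, ln P_n.
    """
    n = len(ln_alpha)
    A = [[0.0] * (2 * n) for _ in range(2 * n)]
    b = [0.0] * (2 * n)
    for i in range(n):
        A[2 * i][2 * i] = -1.0                  # -ln R_i
        for j in range(n):
            if j != i:                          # self-regulation omitted, as in eq. (13)
                A[2 * i][2 * j + 1] = C[j][i]   # +C_{j,i} ln P_j
        b[2 * i] = ln_gr - ln_alpha[i]
        A[2 * i + 1][2 * i] = 1.0               # +ln R_i
        A[2 * i + 1][2 * i + 1] = -1.0          # -ln P_i
        b[2 * i + 1] = ln_gp - ln_beta[i]
    return A, b

def solve(A, b, tol=1e-12):
    """Gaussian elimination with partial pivoting; returns None if A is singular."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < tol:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

# Hypothetical three-gene regulator chain: gene 1 -> gene 2 -> gene 3, strength 0.5.
C_chain = [[0.0, 0.5, 0.0], [0.0, 0.0, 0.5], [0.0, 0.0, 0.0]]
A, b = build_system([0.2, 0.0, 0.0], [0.0, 0.0, 0.0], C_chain)
chain_solution = solve(A, b)

# Mutual activation with C12 = C21 = 1 (outside the range simulated in the study)
# makes the matrix singular, so no equilibrium is returned.
A2, b2 = build_system([0.0, 0.0], [0.0, 0.0], [[0.0, 1.0], [1.0, 0.0]])
feedback_solution = solve(A2, b2)
```

In the chain example, raising gene 1's transcription rate propagates downstream with diminishing effect, because each interaction parameter is below one.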

When we simulated evolution of three interacting genes, we had absolute values of all nonzero interaction parameters equal to $0.5$ . The set of three-gene regulatory motifs examined in this study was chosen because the motifs bear features commonly seen in real regulatory networks (Lee et al. 2002; Shen-Orr et al. 2002). Specifically, motif 1 is a simple regulator chain, motif 2 is a negative feedback loop, motifs 3–5 represent regulatory motifs where the same target is regulated by more than one regulator, while motifs 6–8 represent motifs where one regulator (i.e., transcription factor) regulates more than one target (see supplementary fig. S4, Supplementary Material online for graphical depictions of the motifs).

Simulation of Evolution Along a Single Lineage

For a system of n genes, we considered a total of $2 n$ traits that are directly affected by mutations, including log-transformed genotypic values of their transcription rates ( $\ln α_{1}, \dots, \ln α_{n}$ ) and translation rates ( $\ln β_{1}, \dots, \ln β_{n}$ ), and simulated the evolution of the population mean phenotype through time. Traits that are directly under selection are the equilibrium protein levels ( $\ln P_{1}, \dots, \ln P_{n}$ ). Given the genotypic values, the steady-state protein levels are calculated using equation system (5) when only a single gene is considered, using equation (11) for a system of two genes, or equation (13) for a system of three or more genes. As the degradation terms in equation (5) are constants, we omitted them in the simulations when only a single gene is considered; that is, $\ln R$ is represented by $\ln α$ , and $\ln P$ is represented by $\ln α + \ln β$ . The total number of mutations that would occur in time step t, denoted $m_{t}$ , is drawn from a Poisson distribution. The mean of the distribution, $E [m_{t}]$ , is given by

E [m_{t}] = 2 N_{e} \sum U = 2 N_{e} \left( \sum_{i = 1}^{n} U_{α_{i}} + \sum_{i = 1}^{n} U_{β_{i}} \right),

(14)

where $N_{e}$ is the effective population size, $\sum U$ is the total per-genome rate of mutations affecting the transcription and translation rates, while $U_{α_{i}}$ and $U_{β_{i}}$ are rates of mutations affecting gene i’s transcription and translation rates (i.e., number of mutations per haploid genome per time step), respectively. It is assumed here that a mutation can affect either the transcription or the translation rate of one gene, but not both. The probability that a mutation affects the transcription rate of gene i is $U_{α_{i}} / \sum U$ , and the probability that it affects the translation rate of gene i is $U_{β_{i}} / \sum U$ . The phenotypic effect of a mutation affecting gene i’s transcription rate ( $\ln α_{i}$ ) is drawn from a normal distribution $N (0, σ_{α_{i}})$ . Similarly, if the mutation affects the translation rate of gene i ( $\ln β_{i}$ ), its effect is drawn from another normal distribution, $N (0, σ_{β_{i}})$ . Given the protein level of a gene i, the fitness ω with respect to its protein level is given by a Gaussian function:

ω = \exp (- \frac{(\ln P - \ln O)^{2}}{2 σ_{ω}^{2}}),

(15)

where O is the optimal protein level and $σ_{ω}$ is the SD of the fitness function (also referred to as width of the fitness function). In a system where protein levels of n genes are subject to selection, the overall fitness is calculated as

ω = \exp (- \sum_{i = 1}^{n} (\frac{(\ln P_{i} - \ln O_{i})^{2}}{2 σ_{ω, i}^{2}})) .

(16)

When $n = 1$ , equation (16) gives the same result as (15). If there is any gene that has no equilibrium phenotype, the fitness would be treated as zero, though such situations did not occur in our simulations. The coefficient of selection, s, is then calculated as $s = (ω / ω_{A}) - 1$ , where $ω_{A}$ is the ancestral fitness. The fixation probability $p_{f}$ of the mutation is calculated following the approach of Kimura (1962):

p_{f} = \frac{1 - \exp (- 2 s)}{1 - \exp (- 4 N_{e} s)} .

(17)

With probability $p_{f}$ , the mutation’s phenotypic effect would be added to the population’s mean before the next mutation is considered.
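The chain from protein level to fixation probability in equations (15)–(17) can be written compactly as below. This is a minimal sketch with assumed function names. Two details are additions not stated in the text: the neutral limit $1 / (2 N_{e})$ is returned when s is exactly zero (where equation (17) is undefined), and strongly deleterious mutations are assigned probability zero to avoid floating-point overflow.

```python
import math

def fitness(ln_p, ln_opt=0.0, sigma_w=1.0):
    """Gaussian fitness function of equation (15)."""
    return math.exp(-((ln_p - ln_opt) ** 2) / (2.0 * sigma_w ** 2))

def selection_coefficient(w_mutant, w_ancestral):
    """s = (w / w_A) - 1, as defined in the text."""
    return w_mutant / w_ancestral - 1.0

def fixation_probability(s, n_e):
    """Fixation probability of equation (17)."""
    if s == 0.0:
        return 1.0 / (2.0 * n_e)      # assumed neutral limit of equation (17)
    exponent = -4.0 * n_e * s
    if exponent > 700.0:              # exp() would overflow; probability is ~0
        return 0.0
    # expm1(x) = exp(x) - 1, so this ratio equals (1 - e^{-2s}) / (1 - e^{-4 N_e s}).
    return math.expm1(-2.0 * s) / math.expm1(exponent)

# Hypothetical mutation that moves ln P from the optimum (0) to 0.1.
s_example = selection_coefficient(fitness(0.1), fitness(0.0))
p_example = fixation_probability(s_example, n_e=1000)
```

With $N_{e} = 1000$ and $σ_{ω} = 1$, even this small displacement from the optimum is far less likely to fix than a neutral mutation, which is why protein-level changes are effectively held in check while compensating changes in transcription and translation remain possible.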

The above process was repeated T times for each lineage. We set $T = 10^{5}$ , $N_{e} = 1000$ , $U_{α} = U_{β} = 5 \times 10^{- 4}$ (i.e., $2 N_{e} U = 1$ ), $σ_{α} = σ_{β} = 0.1$ , $\ln O = 0$ , and $σ_{ω} = 1$ for all simulations, unless specified. The choice of $N_{e}$ and $σ_{ω}$ was initially arbitrary, but turned out to be sufficient for the effect of stabilizing selection to manifest: as seen in fig. 1D and E, correlations under this combination are very different from those under weak selection (i.e., lower-left corners of the heat maps) yet similar to those under many other combinations with relatively strong selection. Therefore, we chose to use these values for most of the simulations. When we simulated a single gene that is subject to directional selection, we set $\ln O = 1$ . All simulations started with $\ln α = 0$ and $\ln β = 0$ , unless specified. For each combination of parameter values, we simulated 500 independent lineages.

When we performed simulations using different combinations of $U_{α}$ and $U_{β}$ , we had a constant total mutation rate $U_{α} + U_{β} = 10^{- 3}$ and let the relative contribution of mutations affecting transcription and those affecting translation vary. Values of $U_{α}$ and $U_{β}$ were chosen such that the expected proportion of mutations that affect transcription ( $U_{α} / (U_{α} + U_{β})$ ) is equal to $10\%$ , $20\%$ , $30\%$ , $40\%$ , $50\%$ (the “default” combination), $60\%$ , $70\%$ , $80\%$ , and $90\%$ (supplementary table S1, Supplementary Material online).

For simulations where multiple genes with different optimal protein levels were involved, we randomly sampled a set of 20 optima $\ln O \sim N (0, 2)$ . For each gene, we conducted simulation along 500 replicate lineages. We assumed no linkage between these genes and had them evolve independently (i.e., simulations for different genes were run separately).

For simulations of functionally equivalent genes, fitness is calculated as

ω = \exp (- \frac{(\ln (P_{1} + P_{2}) - \ln O)^{2}}{2 σ_{ω}^{2}}),

(18)

where $P_{1}$ and $P_{2}$ are two genes’ respective protein levels. In these simulations, the two genes are assumed to be unlinked and independently regulated (i.e., each mutation can affect either transcription or translation of only one gene).

It should be noted that our simulations were based on a sequential-fixation model of evolution, which allows more efficient simulations. That is, only one mutation is considered at a time, and the fixation probability of a mutation is calculated only after the fate of the previously considered mutation has been determined. Such a model is a good approximation as long as the mutation rate is low enough that multiple mutations affecting the trait of interest rarely segregate in the population at the same time (Lynch 2020).
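The sequential-fixation procedure for a single gene can be sketched as follows. This is not the authors' R implementation (which is available at the repository given under Data Availability); it is a standard-library Python approximation under stated assumptions: $σ_{α}$ and $σ_{β}$ are treated as standard deviations, Poisson draws use Knuth's multiplication method, the neutral case uses the limit $1 / (2 N_{e})$, and all function names are hypothetical.

```python
import math
import random

def poisson(lam, rng):
    """Knuth's multiplication method for a Poisson draw (fine for small means)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def fixation_probability(s, n_e):
    """Equation (17), with an assumed neutral limit and an overflow guard."""
    if s == 0.0:
        return 1.0 / (2.0 * n_e)
    exponent = -4.0 * n_e * s
    if exponent > 700.0:
        return 0.0
    return math.expm1(-2.0 * s) / math.expm1(exponent)

def simulate_lineage(steps, n_e=1000, u_alpha=5e-4, u_beta=5e-4,
                     sd_alpha=0.1, sd_beta=0.1, ln_opt=0.0, sigma_w=1.0, seed=None):
    """Evolve ln(alpha) and ln(beta) of one gene; returns (ln R, ln P)."""
    rng = random.Random(seed)
    ln_alpha, ln_beta = 0.0, 0.0
    total_u = u_alpha + u_beta

    def fitness(ln_p):
        return math.exp(-((ln_p - ln_opt) ** 2) / (2.0 * sigma_w ** 2))

    for _ in range(steps):
        # Number of new mutations in this time step, equation (14).
        for _ in range(poisson(2.0 * n_e * total_u, rng)):
            d_alpha = d_beta = 0.0
            if rng.random() < u_alpha / total_u:
                d_alpha = rng.gauss(0.0, sd_alpha)   # affects transcription only
            else:
                d_beta = rng.gauss(0.0, sd_beta)     # affects translation only
            w_old = fitness(ln_alpha + ln_beta)
            w_new = fitness(ln_alpha + d_alpha + ln_beta + d_beta)
            s = w_new / w_old - 1.0
            # The mutation's fate is settled before the next one is considered.
            if rng.random() < fixation_probability(s, n_e):
                ln_alpha += d_alpha
                ln_beta += d_beta
    # Degradation terms are omitted, so ln R = ln alpha and ln P = ln alpha + ln beta.
    return ln_alpha, ln_alpha + ln_beta

# Short illustrative run; the study used T = 10^5 steps and 500 lineages.
ln_r_end, ln_p_end = simulate_lineage(steps=1000, seed=42)
```

Repeating the call with different seeds gives replicate lineages, from which the variances and correlations of $\ln R$, $\ln β$, and $\ln P$ can be estimated.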

Simulation Along a Phylogenetic Tree

We generated a Yule tree (no extinction) and used this for all subsequent simulations (fig. 2A). We then rescaled the tree to make its height (i.e., distance between the root and each tip) equal to $10^{5}$ . We simulated evolution along each branch following the same procedure as above. The number of time steps is equal to the branch length (rounded down to the nearest integer). As this is purely for illustrative purposes (the model is the same as the case studied above), we used a single, representative tree. The qualitative patterns did not depend on the shape of the tree.

For each simulation, we estimated the evolutionary variance–covariance matrix between lineages. To assess the influence of tree structure, we used a λ transformation (Pagel 1999) to rescale the relative length of terminal branches with tree height kept the same. We fit two models, BM and OU, to each trait (the mRNA level, the translation rate, and the protein level) for each simulation and compared the relative support of the models using their sample-size corrected AICc weights (following Pennell et al. 2015) and averaged the AICc weights across replicate simulations. The BM model describes a continuous-time random walk process where the amount of change in the population mean $\bar{z}$ of a trait over a time interval t is given by:

Δ \bar{z} = σ d W,

(19)

where σ is the rate parameter of the process and $d W$ is drawn from $N (0, t)$ . Under an OU process $Δ \bar{z}$ is described by:

Δ \bar{z} = - α (\bar{z} - θ) + σ d W,

(20)

where θ is the optimum value and α is the strength of attraction towards the optimum. All phylogenetic analyses were conducted using geiger (Pennell et al. 2014).
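The model comparison step relies on AICc weights, which can be computed from the maximized log-likelihoods of the fitted models. The sketch below uses the standard AICc and Akaike-weight formulas; the log-likelihood values are placeholders rather than results from the study, and the parameter counts (two for BM, three for OU) are an assumption about how the models are parameterized.

```python
import math

def aicc(log_lik, k, n):
    """Sample-size corrected AIC for a model with k parameters and n observations."""
    return -2.0 * log_lik + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)

def aicc_weights(scores):
    """Convert a dict of AICc scores into Akaike weights that sum to one."""
    best = min(scores.values())
    rel = {model: math.exp(-0.5 * (score - best)) for model, score in scores.items()}
    total = sum(rel.values())
    return {model: value / total for model, value in rel.items()}

# Placeholder log-likelihoods for one trait on a tree with 50 tips.
n_tips = 50
scores = {
    "BM": aicc(log_lik=-10.0, k=2, n=n_tips),   # assumed parameters: rate and root state
    "OU": aicc(log_lik=-10.0, k=3, n=n_tips),   # assumed: adds the attraction strength
}
weights = aicc_weights(scores)
```

When the two models fit equally well, as in this placeholder example, the simpler BM model receives the larger weight because of the parameter penalty.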

Adding Measurement Errors to Simulated Data

For each sample (i.e., an independent lineage or a tip of the phylogenetic tree), we added errors to the end-point mRNA and protein levels; the errors $ε$ were normally distributed with mean zero, so that each measured value is centered on the true value [i.e., $ε \sim N (0, σ_{ε})$ ]. We considered $σ_{ε} \in \{0.01, 0.02, 0.03, 0.04, 0.05\}$ in this study for both mRNA and protein levels. The “estimated” translation rate is calculated as the ratio of the “measured” protein and mRNA levels, such that error in the translation rate is correlated with error in the mRNA level.
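The induced correlation between errors can be illustrated with a short simulation. The sketch below assumes that errors are added on the log scale and that the true levels are identical across samples, so that all observed variation is measurement error; function names and the sample size are hypothetical. Under these assumptions the error in the estimated translation rate is $ε_{P} - ε_{R}$, and its expected correlation with the error in the mRNA level is $- 1 / \sqrt{2}$.

```python
import math
import random

def add_measurement_error(ln_r, ln_p, sd_error, rng):
    """Return 'measured' ln R, ln P, and the translation rate estimated from them."""
    ln_r_obs = ln_r + rng.gauss(0.0, sd_error)
    ln_p_obs = ln_p + rng.gauss(0.0, sd_error)
    ln_beta_obs = ln_p_obs - ln_r_obs   # log of the protein-to-mRNA ratio
    return ln_r_obs, ln_p_obs, ln_beta_obs

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# Identical true values for every sample: any correlation is produced by error alone.
rng = random.Random(0)
samples = [add_measurement_error(0.0, 0.0, 0.05, rng) for _ in range(5000)]
r_obs = [sample[0] for sample in samples]
beta_obs = [sample[2] for sample in samples]
error_induced_corr = correlation(r_obs, beta_obs)
```

The negative value of `error_induced_corr` shows how measurement error alone can generate an apparent negative transcription–translation correlation, which is why technical and biological explanations need to be separated.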


Acknowledgments

We thank members of the Pennell, Edge, and Mooney labs for their thoughtful comments on this study. Three anonymous reviewers provided critical feedback that substantially improved the manuscript. M.P. and D.J. were supported by startup funds from the University of Southern California. J.Z. was funded by the grant R35GM139484 from the NIH. A.L.C. was funded by a NIH-IRACDA postdoctoral fellowship from Rutgers University. ChatGPT helped to improve the clarity of the text.

Contributor Information

Daohan Jiang, Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA.

Alexander L Cope, Department of Genetics, Rutgers University, Piscataway, NJ, USA; Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ, USA; Robert Wood Johnson Medical School, Rutgers University, New Brunswick, NJ, USA.

Jianzhi Zhang, Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA.

Matt Pennell, Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA; Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA.

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Data Availability

All R code is available at https://github.com/phylo-lab-usc/Expression_Evolution.

References

Albert FW, Muzzey D, Weissman JS, Kruglyak L. 2014. Genetic influences on translation in yeast. PLoS Genet. 10:e1004692.
Ané C. 2008. Analysis of comparative data with hierarchical autocorrelation. Ann Appl Stat. 2:1078–1102.
Artieri CG, Fraser HB. 2014. Evolution at two levels of gene expression in yeast. Genome Res. 24:411–421.
Ba Q, Hei Y, Dighe A, Li W, Maziarz J, Pak I, Wang S, Wagner GP, Liu Y. 2022. Proteotype coevolution and quantitative diversity across 11 mammalian species. Sci Adv. 8:eabn0756.
Barua A, Mikheyev AS. 2020. Toxin expression in snake venom evolves rapidly with constant shifts in evolutionary rates. Proc R Soc B. 287:20200613.
Becker K, Bluhm A, Casas-Vila N, Dinges N, Dejung M, Sayols S, Kreutz C, Roignant J-Y, Butter F, Legewie S. 2018. Quantifying post-transcriptional regulation in the development of Drosophila melanogaster. Nat Commun. 9:4970.
Bedford T, Hartl DL. 2009. Optimization of gene expression by natural selection. Proc Natl Acad Sci USA. 106:1133–1138.
Brawand D, Soumillon M, Necsulea A, Julien P, Csárdi G, Harrigan P, Weier M, Liechti A, Aximu-Petri A, Kircher M, et al. 2011. The evolution of gene expression levels in mammalian organs. Nature 478:343–348.
Buccitelli C, Selbach M. 2020. mRNAs, proteins and the emerging principles of gene expression control. Nat Rev Genet. 21:630–644.
Chen J, Swofford R, Johnson J, Cummings BB, Rogel N, Lindblad-Toh K, Haerty W, Di Palma F, Regev A. 2019. A quantitative framework for characterizing the evolutionary history of mammalian gene expression. Genome Res. 29:53–63.
Cooper N, Thomas GH, Venditti C, Meade A, Freckleton RP. 2016. A cautionary note on the use of Ornstein Uhlenbeck models in macroevolutionary studies. Biol J Linn Soc. 118:64–77.
Csárdi G, Franks A, Choi DS, Airoldi EM, Drummond DA. 2015. Accounting for experimental noise reveals that mRNA levels, amplified by post-transcriptional processes, largely determine steady-state protein levels in yeast. PLoS Genet. 11:e1005206.
de Sousa Abreu R, Penalva LO, Marcotte EM, Vogel C. 2009. Global signatures of protein and mRNA expression levels. Mol Biosyst. 5:1512–1526.
Dimayacyac JR, Wu S, Pennell M. 2023. Evaluating the performance of widely used phylogenetic models for gene expression evolution. bioRxiv, 2023.02.09.527893.
Edfors F, Danielsson F, Hallström BM, Käll L, Lundberg E, Pontén F, Forsström B, Uhlén M. 2016. Gene-specific correlation of RNA and protein levels in human cells and tissues. Mol Syst Biol. 12:883.
Felsenstein J. 1988. Phylogenies and quantitative characters. Annu Rev Ecol Syst. 19:445–471.
Fortelny N, Overall CM, Pavlidis P, Freue GVC. 2017. Can we predict protein from mRNA levels? Nature 547:E19–E20.
Franks A, Airoldi E, Slavov N. 2017. Post-transcriptional regulation across human tissues. PLoS Comput Biol. 13:e1005535.
Fraser HB. 2019. Improving estimates of compensatory cis–trans regulatory divergence. Trends Genet. 35:3–5.
Gygi SP, Rochon Y, Franza BR, Aebersold R. 1999. Correlation between protein and mRNA abundance in yeast. Mol Cell Biol. 19:1720–1730.
Hansen TF. 1997. Stabilizing selection and the comparative analysis of adaptation. Evolution 51:1341–1351.
Hausser J, Mayo A, Keren L, Alon U. 2019. Central dogma rates and the trade-off between precision and economy in gene expression. Nat Commun. 10:68.
Hether TD, Hohenlohe PA. 2014. Genetic regulatory network motifs constrain adaptation through curvature in the landscape of mutational (co)variance. Evolution 68:950–964.
Ho W-C, Zhang J. 2019. Genetic gene expression changes during environmental adaptations tend to reverse plastic changes even after the correction for statistical nonindependence. Mol Biol Evol. 36:604–612.
Jiang D, Zhang J. 2020. Fly wing evolution explained by a neutral model with mutational pleiotropy. Evolution 74:2158–2167.
Khaitovich P, Paabo S, Weiss G. 2005. Toward a neutral evolutionary model of gene expression. Genetics 170:929–939.
Khan Z, Ford MJ, Cusanovich DA, Mitrano A, Pritchard JK, Gilad Y. 2013. Primate transcript and protein expression levels evolve under compensatory selection pressures. Science 342:1100–1104.
Kimura M. 1962. On the probability of fixation of mutant genes in a population. Genetics 47:713–719.
King M-C, Wilson AC. 1975. Evolution at two levels in humans and chimpanzees: their macromolecules are so alike that regulatory mutations may account for their biological differences. Science 188:107–116.
Laurent JM, Vogel C, Kwon T, Craig SA, Boutz DR, Huse HK, Nozue K, Walia H, Whiteley M, Ronald PC, et al. 2010. Protein abundances are more conserved than mRNA abundances across diverse taxa. Proteomics 10:4209–4212.
Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I, et al. 2002. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298:799–804.
Liu Y, Beyer A, Aebersold R. 2016. On the dependency of cellular protein levels on mRNA abundance. Cell 165:535–550.
Long SB. 1980. The continuing debate over the use of ratio variables: facts and fiction. Sociol Methodol. 11:37–67.
Lynch M. 2020. The evolutionary scaling of cellular traits imposed by the drift barrier. Proc Natl Acad Sci USA. 117:10435–10444.
Lynch M, Hill WG. 1986. Phenotypic evolution by neutral mutation. Evolution 40:915–935.
Marguerat S, Schmidt A, Codlin S, Chen W, Aebersold R, Bähler J. 2012. Quantitative analysis of fission yeast transcriptomes and proteomes in proliferating and quiescent cells. Cell 151:671–683.
McManus CJ, May GE, Spealman P, Shteyman A. 2014. Ribosome profiling reveals post-transcriptional buffering of divergent gene expression in yeast. Genome Res. 24:422–430.
Nourmohammad A, Rambeau J, Held T, Kovacova V, Berg J, Lässig M. 2017. Adaptive evolution of gene expression in Drosophila. Cell Rep. 20:1385–1395.
Pagel M. 1999. Inferring the historical patterns of biological evolution. Nature 401:877–884.
Pennell MW, Eastman JM, Slater GJ, Brown JW, Uyeda JC, FitzJohn RG, Alfaro ME, Harmon LJ. 2014. geiger v2.0: an expanded suite of methods for fitting macroevolutionary models to phylogenetic trees. Bioinformatics 30:2216–2218.
Pennell MW, FitzJohn RG, Cornwell WK, Harmon LJ. 2015. Model adequacy and the macroevolution of angiosperm functional traits. Am Nat. 186:E33–E50.
Perl K, Ushakov K, Pozniak Y, Yizhar-Barnea O, Bhonker Y, Shivatzki S, Geiger T, Avraham KB, Shamir R. 2017. Reduced changes in protein compared to mRNA levels across non-proliferating tissues. BMC Genomics. 18:1–14.
Price PD, Palmer Droguett DH, Taylor JA, Kim DW, Place ES, Rogers TF, Mank JE, Cooney CR, Wright AE. 2022. Detecting signatures of selection on gene expression. Nat Ecol Evol. 6:1035–1045.
Qian W, Liao B-Y, Chang AY-F, Zhang J. 2010. Maintenance of duplicate genes and their functional redundancy by reduced expression. Trends Genet. 26:425–430.
Rohlfs RV, Harrigan P, Nielsen R. 2014. Modeling gene expression evolution with an extended Ornstein–Uhlenbeck process accounting for within-species variation. Mol Biol Evol. 31:201–211.
Rohlfs RV, Nielsen R. 2015. Phylogenetic ANOVA: the expression variance and evolution model for quantitative trait evolution. Syst Biol. 64:695–708.
Schrimpf SP, Weiss M, Reiter L, Ahrens CH, Jovanovic M, Malmström J, Brunner E, Mohanty S, Lercher MJ, Hunziker PE, et al. 2009. Comparative functional analysis of the Caenorhabditis elegans and Drosophila melanogaster proteomes. PLoS Biol. 7:e1000048.
Shah P, Ding Y, Niemczyk M, Kudla G, Plotkin JB. 2013. Rate-limiting steps in yeast protein translation. Cell 153:1589–1601.
Shen-Orr SS, Milo R, Mangan S, Alon U. 2002. Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet. 31:64–68.
Vogel C, Marcotte EM. 2012. Insights into the regulation of protein abundance from proteomic and transcriptomic analyses. Nat Rev Genet. 13:227–232.
Wang D, Eraslan B, Wieland T, Hallström B, Hopf T, Zolg DP, Zecha J, Asplund A, Li L-h, Meng C, et al. 2019. A deep proteome and transcriptome abundance atlas of 29 healthy human tissues. Mol Syst Biol. 15:e8503.
Wang SH, Hsiao CJ, Khan Z, Pritchard JK. 2018. Post-translational buffering leads to convergent protein expression levels between primates. Genome Biol. 19:1–15.
Wang Z-Y, Leushkin E, Liechti A, Ovchinnikova S, Mößinger K, Brüning T, Rummel C, Grützner F, Cardoso-Moreira M, Janich P, et al. 2020. Transcriptome and translatome co-evolution in mammals. Nature 588:642–647.

Data Availability Statement

All R code is available at https://github.com/phylo-lab-usc/Expression_Evolution.
