Author manuscript; available in PMC: 2025 Aug 7.
Published before final editing as: Nat Biotechnol. 2025 Jul 25:10.1038/s41587-025-02718-5. doi: 10.1038/s41587-025-02718-5

Translation efficiency covariation identifies conserved coordination patterns across cell types

Yue Liu 1, Shilpa Rao 1, Ian Hoskins 1, Michael Geng 1, Qiuxia Zhao 1, Jonathan Chacko 1, Vighnesh Ghatpande 1, Kangsheng Qi 1, Logan Persyn 1, Jun Wang 2, Dinghai Zheng 2, Yochen Zhong 1, Dayea Park 1, Elif Sarinay Cenik 1, Vikram Agarwal 2, Hakan Ozadam 1,3, Can Cenik 1
PMCID: PMC12329774  NIHMSID: NIHMS2101085  PMID: 40715459

Abstract

Characterizing shared patterns of RNA expression between genes across conditions has led to the discovery of regulatory networks and biological functions. However, it is unclear if such coordination extends to translation. In this study, we uniformly analyze 3,819 ribosome profiling datasets from 117 human and 94 mouse tissues and cell lines. We introduce the concept of translation efficiency covariation (TEC), identifying coordinated translation patterns across cell types. We nominate candidate mechanisms driving shared patterns of translation regulation. TEC is conserved across human and mouse cells and uncovers gene functions that are not evident from RNA or protein co-expression. Moreover, our observations indicate that proteins that physically interact are highly enriched for positive covariation at both translational and transcriptional levels. Our findings establish TEC as a conserved organizing principle of mammalian transcriptomes. TEC has potential as a predictive marker for gene function and may offer a framework for designing gene expression systems in synthetic biology and biotechnological applications.


Over the past three decades, technological advances have progressively revealed the expression of RNAs with increasing spatial and cellular resolution1–7. These measurements have driven conceptual advances including the widespread use of RNA co-expression analysis, which quantifies the similarity of gene expression patterns across biological conditions8–11. Such analyses have proved informative for inferring gene functions9,12,13, predicting protein–protein interactions14,15 and identifying shared regulatory mechanisms via common transcription factor binding sites16,17.

These findings suggest that RNA co-expression may serve as a proxy for the proteomic organization of cells. Quantification of protein abundance across numerous cell types and conditions has recently allowed this assumption to be explicitly tested. Surprisingly, coordinated patterns of protein abundance are frequently not detected at the RNA level18,19, and physically interacting proteins are much more likely to have coordinated protein abundance than RNA co-expression18–20. These discrepancies suggest that posttranscriptional regulation likely plays a key role in proteome organization.

Translation regulation may bridge this gap, yet whether translational efficiency is coordinated among functionally related genes across biological contexts remains an open question. There are three lines of evidence that suggest its plausibility. First, mammalian mRNAs bind various proteins to form ribonucleoproteins that influence their lifecycle from export to translation21. The set of proteins interacting with an mRNA varies with time and context, substantially altering the duration, efficiency and localization of protein production. These observations led to the proposal of the posttranscriptional RNA regulon model, positing that functionally related mRNAs are regulated together posttranscriptionally22,23. Second, in Escherichia coli and yeast, proteins within multiprotein complexes are synthesized in stoichiometric ratios24,25. However, in humans, evidence of such proportional synthesis is reported for only two complexes: ribosomes25,26 and the oxidative phosphorylation machinery27 in a limited number of cell lines. Third, the formation of many protein complexes is facilitated by the co-translational folding of nascent peptides24,25,28–30.

Despite these precedents, a general framework for detecting translation-level coordination across cell types has been lacking. In the present study, we analyzed thousands of matched ribosome profiling and RNA sequencing (RNA-seq) datasets from more than 140 human and mouse cell lines and tissues. To quantify the similarity in translation efficiency (TE) patterns across cell types, we introduce the concept of translation efficiency covariation (TEC). Analogous to RNA co-expression, TEC is common among functionally related genes. We demonstrate that TEC nominates gene functions, is enriched among physically interacting proteins and is conserved across species. Together, these findings suggest that TEC is a functionally relevant and evolutionarily conserved organizing principle of mammalian gene expression.

Results

Integrated analysis of thousands of ribosome profiling and RNA-seq measurements

We undertook a comprehensive, large-scale meta-analysis of ribosome profiling data to quantify TE across different cell lines and tissues. We collected 2,195 ribosome profiling datasets for humans and 1,624 experiments for mice, along with their metadata (Fig. 1a and Methods). Given that metadata are frequently reported in an unstructured manner and lack a formal verification step, we conducted a manual curation process to rectify inaccuracies and collect missing information, such as experimental conditions and cell types used in experiments. One crucial aspect of our manual curation was pairing between ribosome profiling and corresponding RNA-seq when possible. Overall, 1,282 (58.4%) human and 995 (61.3%) mouse ribosome profiling samples were matched with corresponding RNA-seq data (Supplementary Table 1). The resulting curated metadata facilitated the uniform processing of ribosome profiling and corresponding RNA-seq data using an open-source pipeline31. We call the resulting repository harboring these processed files RiboBase (Supplementary Table 1).

Fig. 1 |. RiboBase: a comprehensive ribosome profiling database with thousands of experiments.

a, Schematic of RiboBase. We manually curated metadata and processed the sequencing reads using a uniform pipeline (RiboFlow31). b, Top five most highly represented cell lines or tissues with respect to the number of experiments were plotted. c, We determined the ribonuclease used to generate ribosome profiling data for 680 experiments using human cancer cell lines. For each experiment, the read length distribution of RPFs mapping to coding regions was visualized as a heatmap. The color represents the z-score adjusted RPF counts (Methods). Each experiment where the percentage of RPFs mapping to CDS was greater than 70% and achieving sufficient coverage of the transcript (≥0.1×) was annotated as QC pass (Methods). d, For the 3,819 ribosome profiling experiments in RiboBase, we applied a function to select the range of RPFs for further analysis (Methods). We calculated the proportion of the selected RPFs that map to the coding regions (y axis). The horizontal line represents the median of the distribution. e, Experiments (x axis) were grouped by the transcript coverage (y axis). f, Among the ribosome profiling experiments in RiboBase, 2,277 of them had corresponding RNA-seq data (matched). The number of samples that passed quality controls was plotted. QC, quality control.

In RiboBase, the cell types with the most experiments were HEK293T (13.1%) and HeLa (8.1%) for human; in mouse, the leading tissues were brain (9.6%), embryonic fibroblasts (8.3%) and liver (7.7%) (Fig. 1b and Supplementary Table 1). The median number of sequencing reads for ribosome profiling samples was approximately 43.2 million for humans and approximately 37.5 million for mice (Extended Data Fig. 1a,b, Supplementary Tables 2 and 3 and Supplementary Information). Most reads contained adapter sequences included during library preparation (with medians of 82.2% and 79.2% of total reads having adapters for human and mouse, respectively). Due to the substantial presence of ribosomal RNA in ribosome profiling datasets, only approximately 15% of total reads aligned to the transcript reference (Extended Data Fig. 1c,d, Supplementary Tables 4 and 5 and Supplementary Information).

The length of ribosome-protected mRNA footprints (RPFs) provides valuable information about data quality, the experimental protocol used and translational activity32. The choice of nuclease impacts the resulting read length distribution of RPFs33 (Extended Data Fig. 1e,f). In agreement, we found that the peak position and range of RPF lengths were closely associated with the type of digestion enzymes used in human cancer samples (Fig. 1c). To account for the variability of RPF length distributions across the compendium of experiments, we developed a module that allowed for setting sample-specific RPF read length cutoffs (Extended Data Fig. 2a and Methods). This dynamic approach proved more effective than using fixed minimum and maximum values for RPF lengths, resulting in a higher retrieval of usable reads (median increase of 10.8% for human and 17.1% for mouse) and an increased proportion of reads within the coding sequence (CDS) region (Extended Data Fig. 2b).

After selecting a set of RPFs, we assessed the quality of ribosome profiling data within RiboBase using two additional criteria. Given that translating ribosomes should be highly enriched in annotated coding regions, we required that at least 70% of RPFs map to the CDS. We found that 160 human and 115 mouse samples failed to meet this criterion (Fig. 1d and Supplementary Tables 6 and 7). Subsequently, we required a minimum number of RPFs that map to the CDS to ensure sufficient coverage of translated genes (Methods). There were 318 human and 431 mouse samples with less than 0.1× transcript coverage (Fig. 1e and Supplementary Tables 6 and 7). Altogether, 1,794 human samples and 1,134 mouse samples were retained for in-depth analysis. Of these, 1,076 human and 845 mouse samples were paired with matching RNA-seq data. Our results indicate that a considerable fraction of publicly available ribosome profiling experiments had suboptimal quality (18.3% of the human samples and 30.1% of the mouse samples) (Fig. 1f). Interestingly, the data quality appeared to be independent of time (Extended Data Fig. 2c). Additionally, we found that samples that passed our quality thresholds were more likely to exhibit three-nucleotide periodicity compared to those that failed quality control (92.59% versus 78.30% for humans and 91.36% versus 86.73% for mice; Extended Data Fig. 3a–d and Methods). These findings underscore the necessity of meticulous quality control for the selection of experiments to enable large-scale data analyses.
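
The two filters described above can be sketched in a few lines of Python. This is an illustrative sketch only: the `RiboSample` record, its field names and the toy values are hypothetical, and the coverage value is assumed to have been computed upstream as defined in Methods.

```python
from dataclasses import dataclass

MIN_CDS_FRACTION = 0.70  # at least 70% of selected RPFs must map to the CDS
MIN_COVERAGE = 0.1       # at least 0.1x transcript coverage


@dataclass
class RiboSample:
    """Hypothetical per-sample summary; field names are illustrative."""
    name: str
    rpf_selected: int   # RPFs kept after read-length selection
    rpf_in_cds: int     # subset of those RPFs mapping to the CDS
    coverage: float     # transcript coverage, assumed computed upstream


def cds_fraction(sample):
    """Fraction of selected RPFs that map to the CDS (0.0 if no reads)."""
    if sample.rpf_selected == 0:
        return 0.0
    return sample.rpf_in_cds / sample.rpf_selected


def passes_qc(sample):
    """A sample is retained only if it meets both thresholds."""
    return (cds_fraction(sample) >= MIN_CDS_FRACTION
            and sample.coverage >= MIN_COVERAGE)


# Toy values, not taken from RiboBase.
samples = [
    RiboSample("toy_pass", 1_000_000, 850_000, 0.42),
    RiboSample("toy_low_cds", 1_000_000, 550_000, 0.42),
    RiboSample("toy_low_coverage", 50_000, 45_000, 0.03),
]
retained = [s.name for s in samples if passes_qc(s)]
print(retained)
```

Applying the two criteria jointly means that a sample can fail on either one, which is why the counts of samples failing each criterion need not add up to the total number excluded.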

TE is cell type specific

Ribosome profiling measures ribosome occupancy, a variable influenced by both RNA expression and translation. Thus, estimating TE necessitates analysis of paired RNA-seq and ribosome profiling data. To assess accurate matching in RiboBase, we first compared the coefficient of determination (R^2) between matched ribosome profiling and RNA-seq data to that from other pairings within the same study. As would be expected from correct matching, we found that matched samples had significantly higher similarity on average (Fig. 2a; Welch's two-sided t-test P < 2.2 × 10^−16 for human and P = 2.1 × 10^−5 for mouse). We then implemented a scoring system to quantitatively evaluate the correctness of our manual matching information (Methods). In total, 99.2% of human samples and 98.5% of mouse samples had a sufficiently high matching score, demonstrating the effectiveness of our manual curation strategy (Extended Data Fig. 3e and Methods).

Fig. 2 |. TE defined using a compositional linear regression model is conserved across cell types and species.

a, The distribution of R^2 (y axis) between 1,922 ribosome profiling data and corresponding RNA-seq in RiboBase from 236 different studies was compared to random matching within the same study and across different studies. In each figure panel containing box plots, the horizontal line corresponds to the median. The box represents the IQR, and the whiskers extend to the largest value within 1.5× the IQR. The significant P value shown in this figure was calculated using the two-sided Wilcoxon test. b, Schematic of TE calculation using the linear regression model with compositional data (CLR transformed; Methods and Extended Data Fig. 4). c, Distribution of correlations of TE across 231 experiments with 1,889 ribosome profiling and corresponding RNA-seq. d, The distribution of Spearman correlations between experiments (y axis) was calculated based on whether they originated from identical or different cell lines or tissues; the sample numbers are the same as in c. e, Correlation between compositional TE (68 samples from 11 studies (HEK293), 86 samples from 10 studies (HeLa), 58 samples from four studies (U2OS), 29 samples from five studies (A549), five samples from two studies (MCF7), seven samples from two studies (K562) and 10 samples from two studies (HepG2)) and protein abundance from seven human cell lines82 was calculated using the compositional regression method. f, We used UMAP to cluster the TE values of all genes across different cell types, considering only those origins with at least five distinct cell types. g, The Spearman correlation of 9,194 orthologous genes between human and mouse across TE, ribosome profiling and RNA-seq levels. The circles represent the value of the Spearman correlation between groups. h, TE values were averaged across cell types and tissues for either human or mouse. Each dot represents a gene, and a 95% prediction interval was plotted to identify outlier genes (highlighted in purple and green).
i, We conducted GO term enrichment analysis for outlier genes from h. We ranked the GO terms (y axis) by the logarithm of the odds (LOD; x axis). j, The correlation of the s.d. of TE (quantified with adjusted m.s.d.; Methods and Extended Data Fig. 7c,d) for orthologous genes across different cell types between human and mouse. Mesc, mouse embryonic stem cells; UMAP, uniform manifold approximation and projection.

Using the set of matched ribosome profiling and RNA-seq experiments, we next quantified TE, which is typically defined as the ratio of ribosome footprints to RNA-seq reads, normalized as counts per million (CPM)34. However, this approach leads to biased estimates with important drawbacks35. To address this limitation, we calculate TE based on a regression model using a compositional data analysis method36–38, avoiding the mathematical shortcomings of using the canonical log-ratio method (Fig. 2b, Extended Data Figs. 3e–g and 4, Supplementary Tables 8–11, Methods and Supplementary Information).
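
To make the contrast concrete, the sketch below computes TE in two ways for a single matched sample: as the conventional log ratio of CPM-normalized counts, and as the residual of a linear regression of centered log-ratio (CLR)-transformed ribosome profiling counts on CLR-transformed RNA-seq counts. This is one plausible reading of a compositional regression and not the exact procedure of the pipeline, which is specified in Methods. The function names, the pseudocount of 0.5 and the toy counts are assumptions made for illustration.

```python
import numpy as np


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one library (vector over genes).

    The pseudocount is an assumption used here to avoid log(0).
    """
    logx = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logx - logx.mean()


def log_ratio_te(ribo, rna, pseudocount=0.5):
    """Conventional TE: log2 ratio of CPM-normalized counts."""
    ribo = np.asarray(ribo, dtype=float) + pseudocount
    rna = np.asarray(rna, dtype=float) + pseudocount
    ribo_cpm = ribo / ribo.sum() * 1e6
    rna_cpm = rna / rna.sum() * 1e6
    return np.log2(ribo_cpm / rna_cpm)


def regression_te(ribo, rna, pseudocount=0.5):
    """Regression-based TE: residuals of CLR(ribo) regressed on CLR(rna)."""
    y = clr(ribo, pseudocount)
    x = clr(rna, pseudocount)
    slope, intercept = np.polyfit(x, y, 1)  # ordinary least squares line
    return y - (slope * x + intercept)


# Toy counts for six genes in one matched sample (not real data).
ribo_counts = [120, 30, 500, 8, 64, 250]
rna_counts = [100, 60, 300, 20, 40, 400]

te_ratio = log_ratio_te(ribo_counts, rna_counts)
te_resid = regression_te(ribo_counts, rna_counts)
print(np.round(te_ratio, 3))
print(np.round(te_resid, 3))
```

By construction, the regression residuals are uncorrelated with the CLR-transformed RNA abundance within the sample, whereas the log ratio implicitly fixes the slope at one and can retain a dependence on RNA level.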

We next assessed whether measurement errors due to differences in experimental procedures dominate variability that would otherwise be attributed to biological variables of interest. Specifically, we compared similarities between experiments that used the same cell type or tissue in different studies (Extended Data Fig. 5a). We found that ribosome profiling or RNA experiments from the same cell type or tissue exhibited higher similarity compared to those from different cell lines or tissues (Fig. 2c). Consistent with this observation, TE values displayed higher Spearman correlation coefficient within the same cell type or tissue (median correlation coefficient of 0.56 and 0.53 in human and mouse, respectively) compared to different cell lines and tissues (median correlation coefficient of 0.49 and 0.45 in human and mouse, respectively) (Fig. 2d).

We expected that an accurate estimate of TE would correlate with protein abundance. We calculated for each transcript the cell-type-specific TE by taking the average of TE values across all experiments conducted with that particular cell line and found that TE derived using the regression approach with winsorized read counts (Extended Data Figs. 5b and 6 and Supplementary Information) was significantly correlated with protein abundance in seven cancer cell lines (mean Spearman correlation coefficient of 0.465; Fig. 2e).

Furthermore, TE measurements from cell lines and tissues with the same biological origin (for example, blood) tended to cluster together, supporting the existence of cell-type-specific in addition to species-specific differences in TE (Fig. 2f). As expected, mean ribosome occupancy and RNA expression across cell types showed a strong correlation (Spearman correlation coefficient: ~0.8), yet mean TE was only weakly associated with RNA expression (Spearman correlation coefficient: ~0.2) (Fig. 2g).

Measurements of TE in two species across a large number of cell types enabled us to investigate the conservation of TE, ribosome occupancy and RNA expression. Transcriptomes, ribosome occupancy and proteomes exhibit a high degree of conservation across diverse organisms39,40. Consistently, we found that average ribosome occupancy, RNA expression and TE across different cell lines and tissues were highly similar between orthologous genes in human and mouse (Fig. 2g and Supplementary Table 12). Specifically, the Spearman correlation coefficient of mean TE across cell types and tissues between human and mouse was 0.9 (Fig. 2h), which is similar to the mean RNA expression correlation between human and mouse (~0.86; Extended Data Fig. 7a). Using a 95% prediction interval to identify outlier genes, we found that outlier genes with higher mean TE in humans compared to mice were enriched in the Gene Ontology (GO) term ‘RNA binding function’ (Fig. 2i). In contrast, genes with elevated mean TE in mice were enriched for having functions related to extracellular matrix and collagen-containing components (Fig. 2i). The enrichment of genes with higher TE in mice, particularly those from the extracellular matrix and collagen-containing components, may be due to the fact that many samples in mouse studies are derived from the early developmental stage41.

Despite the high correlation of mean TE across various cell lines and tissues between human and mouse, TE distinctly exhibits cell type specificity. Although several studies compared the conservation of TE between the same tissues of mammals or model organisms40,42,43, our dataset uniquely enabled us to determine the conservation of variability of TE for transcripts across different cell types. Intriguingly, we observed a moderately high similarity between the variability of TE of orthologous genes in human and mouse (Spearman partial correlation coefficient = 0.63; Fig. 2j, Extended Data Fig. 7b–d and Methods). Our results reveal that certain genes exhibit higher variability of TE across cell types, and this is a conserved property between human and mouse.

TEC is conserved

Uniform quantification of TE enabled us to investigate the similarities in TE patterns across cell types. Given the usefulness of RNA co-expression in identifying shared regulation and biological functions, we aimed to establish an analogous method to detect patterns of TE similarity among genes. To achieve this, we employed the proportionality score (ρ)37,38, a statistical method that quantifies the consistency of how relative TE changes across different contexts (Methods). Recent work suggested that the proportionality score enhances cluster identification in high-dimensional single-cell RNA co-expression data10. Consistent with these findings, our analysis revealed its particular effectiveness in quantifying ribosome occupancy covariation (Extended Data Fig. 8a and Methods). We calculated ρ scores for all pairs of human or mouse genes where a high absolute ρ score indicates considerable TEC between pairs (Fig. 3a).
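
For readers unfamiliar with the proportionality score, the following sketch implements the standard definition ρ(x, y) = 1 − var(x − y) / (var(x) + var(y)) for two genes whose values are on a log-ratio scale, along with an equivalent all-pairs matrix form. It assumes the inputs are already log-ratio-transformed TE values across cell types; the function names and toy values are illustrative, and the exact preprocessing is described in Methods.

```python
import numpy as np


def proportionality_rho(x, y):
    """Proportionality score between two log-ratio-transformed vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return 1.0 - np.var(x - y, ddof=1) / (np.var(x, ddof=1) + np.var(y, ddof=1))


def rho_matrix(te):
    """All-pairs rho for a genes x cell-types matrix.

    Uses var(x - y) = var(x) + var(y) - 2 cov(x, y), so that
    rho = 2 cov(x, y) / (var(x) + var(y)).
    """
    te = np.asarray(te, dtype=float)
    cov = np.cov(te)          # rows are treated as variables (genes)
    var = np.diag(cov)
    return 2.0 * cov / (var[:, None] + var[None, :])


# Toy TE profiles for three genes across five cell types (not real data).
te = np.array([
    [0.2, 1.1, -0.4, 0.8, -1.0],
    [0.5, 1.3, -0.2, 1.0, -0.9],   # tracks gene 0 closely
    [-0.3, -1.0, 0.6, -0.7, 1.2],  # roughly opposite to gene 0
])
rho = rho_matrix(te)
print(np.round(rho, 3))
```

Unlike a correlation coefficient, ρ penalizes differences in the magnitude of change as well as its direction: it equals 1 only when the two profiles differ by a constant offset, and it reaches −1 when one profile is the exact negative of the other about its mean.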

Fig. 3 |. TE covariation is conserved between human and mouse.

a, Example illustrating TEC between genes. The top section presents TE patterns across cell types in human. The bottom left part displays the similarity of the pattern between these genes quantified using proportionality scores. b, We calculated the TEC for gene pairs and compared their differences for the same orthologous gene pairs between human and mouse. In the figure panel, each dot size represents the aggregated log10-transformed counts of gene pairs falling within specified ranges. We also calculated TEC using randomized TE for each gene (shuffled). The red dashed line in the figure captures the 95% gene pair TEC values obtained with shuffled TE (Extended Data Fig. 8b). c, The top 10 RBPs with the highest number of genes showing significant correlations between transcript TE and RBP RNA expression are displayed. An asterisk marks genes in the top 10 in both species. d, For all RBPs (each dot), we plotted the proportion of positive correlations between TE of significant transcripts and the RNA expression of the RBP. Blue line is a linear fit with gray bands marking 95% confidence intervals. The Pearson correlation coefficient (r) is shown. FDR, false discovery rate.

Previous studies indicated that RNA co-expression between genes is conserved in mammals39,44,45. To assess the potential evolutionary significance of the newly introduced TEC concept, we evaluated its conservation across human and mouse transcripts. Indeed, TEC was positively correlated between orthologous gene pairs in humans and mice (Fig. 3b; Pearson correlation coefficient = 0.41), compared to a negligible correlation in TEC derived from shuffled TE values (Extended Data Fig. 8b; Pearson correlation coefficient = 0.00022). Our findings imply that TE covariation patterns are evolutionarily preserved, paralleling the conservation of RNA co-expression.
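
The comparison against shuffled TE values can be illustrated with simulated data. In the sketch below, two TE matrices share the same gene order (standing in for orthologs) and a common latent structure; ρ is computed for every gene pair in each species, and the two sets of pairwise scores are compared with a Pearson correlation. The control permutes each gene's TE values across cell types independently before recomputing ρ. The simulation design, matrix sizes and function names are assumptions for illustration and do not reproduce the analysis in Methods.

```python
import numpy as np


def rho_matrix(te):
    """All-pairs proportionality: rho = 2 cov(x, y) / (var(x) + var(y))."""
    te = np.asarray(te, dtype=float)
    cov = np.cov(te)
    var = np.diag(cov)
    return 2.0 * cov / (var[:, None] + var[None, :])


def upper_triangle(m):
    """Values for each unordered gene pair, excluding the diagonal."""
    i, j = np.triu_indices_from(m, k=1)
    return m[i, j]


def shuffle_within_genes(te, rng):
    """Permute each gene's values across cell types independently."""
    return np.array([rng.permutation(row) for row in te])


def tec_conservation(te_a, te_b):
    """Pearson correlation of pairwise rho between two species."""
    r_a = upper_triangle(rho_matrix(te_a))
    r_b = upper_triangle(rho_matrix(te_b))
    return float(np.corrcoef(r_a, r_b)[0, 1])


# Simulated data: shared gene loadings, species-specific cell types.
rng = np.random.default_rng(0)
n_genes, n_factors = 60, 3
loadings = rng.normal(size=(n_genes, n_factors))
te_human = loadings @ rng.normal(size=(n_factors, 40)) + 0.5 * rng.normal(size=(n_genes, 40))
te_mouse = loadings @ rng.normal(size=(n_factors, 30)) + 0.5 * rng.normal(size=(n_genes, 30))

observed = tec_conservation(te_human, te_mouse)
shuffled = tec_conservation(shuffle_within_genes(te_human, rng),
                            shuffle_within_genes(te_mouse, rng))
print(f"observed r = {observed:.3f}, shuffled r = {shuffled:.3f}")
```

Shuffling within genes preserves each gene's distribution of TE values but breaks the alignment across cell types, so any remaining correlation between species reflects chance rather than shared covariation.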

There are few known examples of transcript sets with shared translational control across cell types, including those with 5′ terminal oligopyrimidine (TOP) motifs46 and transcripts regulated by CSDE1 in human47. We hypothesized that these sets of transcripts would exhibit significantly higher TEC compared to a control set, and this was indeed observed (Extended Data Fig. 8c; Wilcoxon P < 2.2 × 10^−16). These findings highlight that TEC effectively captures previously established patterns of shared translational regulation among transcripts.

RNA co-expression analyses led to the discovery of regulatory motifs and shared transcription factor binding sites48. We hypothesized that TEC among genes may similarly nominate RNA-binding proteins (RBPs) as potential drivers of TEC49. We identified groups of transcripts whose TE is correlated with the RNA expression of RBPs50,51 (1,274 human and 1,762 mouse RBPs; Methods). For example, the RNA expression of ZC3H10 in human has largely positive correlations with TE of transcripts (71% of all significant correlations; Fig. 3c). Conversely, the RNA expression of subunits of ubiquinone oxidoreductase (Ndufa7 and Ndufv3) is negatively correlated with TE for many transcripts in mice (Fig. 3c).

We observed a correspondence between human and mouse in the proportion of transcripts whose TE is significantly correlated with the expression of an RBP (Pearson correlation coefficient = 0.44; Fig. 3d). However, knockout of three candidate RBPs did not lead to significant changes in TE (Supplementary Information), which may be due to several limitations, including cell type specificity, efficiency of knockout and limited replicates (Supplementary Information). Taken together, our analyses nominate RBPs that may coordinate the TEC of evolutionarily conserved networks.

TEC is associated with shared biological functions

Given that co-expression at the RNA level is predictive of shared biological functions11,52,53, we next assessed whether TEC indicates common biological roles among genes. We calculated the area under the receiver operating characteristic curve (AUROC) to measure the ability of TEC to distinguish genes with the same biological functions (Methods). Genes that are annotated with a common GO term exhibited a similar degree of RNA co-expression and TEC, both of which were considerably higher than would be expected by chance (median AUROC across GO terms calculated with TEC: 0.63 for human and 0.65 for mouse; RNA co-expression: 0.66 for human and 0.69 for mouse; Fig. 4a, Supplementary Tables 13 and 14 and Methods). These findings demonstrate that TEC, similar to RNA co-expression, serves as an indicator of shared biological functions among genes.
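
As a rough illustration of how an AUROC can summarize covariation within a GO term, the sketch below scores every gene pair by its absolute ρ and labels a pair as positive when both genes belong to the term. The AUROC is then the probability that a randomly chosen within-term pair scores higher than a randomly chosen other pair. This pair-based formulation, the use of absolute ρ and the toy values are simplifying assumptions; the procedure actually used is defined in Methods.

```python
from itertools import combinations

import numpy as np


def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half. Equivalent to the Mann-Whitney U statistic
    divided by the number of positive-negative pairs.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative")
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


def term_auroc(rho, genes, term_members):
    """Score gene pairs by |rho|; positives are pairs within the term."""
    members = set(term_members)
    scores, labels = [], []
    for i, j in combinations(range(len(genes)), 2):
        scores.append(abs(rho[i][j]))
        labels.append(genes[i] in members and genes[j] in members)
    return auroc(scores, labels)


# Toy example with placeholder gene names (not real data).
genes = ["G1", "G2", "G3", "G4"]
rho = np.array([
    [1.00, 0.90, 0.80, 0.10],
    [0.90, 1.00, 0.70, -0.20],
    [0.80, 0.70, 1.00, 0.05],
    [0.10, -0.20, 0.05, 1.00],
])
toy_auc = term_auroc(rho, genes, term_members=["G1", "G2", "G3"])
print(toy_auc)
```

An AUROC of 0.5 corresponds to no separation between within-term pairs and other pairs, which is the expectation for randomly grouped genes.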

Fig. 4 |. Genes associated with certain biological functions exhibit higher similarity patterns in TE than in RNA expression.

a, We calculated the similarity of expression (quantified by AUROC; y axis) among genes within 2,989 human and 3,340 mouse GO terms. In the box plot, the horizontal line corresponds to the median. The box represents the IQR, and the whiskers extend to the largest value within 1.5 times the IQR. b, Each blue dot represents the AUROC calculated for a given GO term using TEC and RNA co-expression levels. Orange dots represent the same values for random grouping of genes (Methods). c, For GO terms where genes exhibit greater similarity at the TE level than at the RNA expression level (AUROC for TEC > 0.8 and difference of AUROC measured with TEC and RNA co-expression > 0.1), we visualized the distribution of absolute ρ scores for gene pairs (bottom; gene pairs with abs(ρ) > 0.1). d, AUROC plot calculated with genes associated with MAPKKK activity. e, In the circle plot, the connections display absolute ρ above 0.1 at TE level alone (purple), at both RNA and TE levels (blue) or at RNA level alone (gray) for gene pairs involved in MAPKKK activity. f, Motif enrichment (left) for the GO term ‘Molecular function inhibitor activity’ (Extended Data Fig. 9b). RBPs matching the motifs from oRNAment94 or Transite95 are indicated. eCLIP data96 (right) indicate increased binding of RBPs TRA2A and SRSF1 in the CDS of genes for this GO term compared to matched control genes with similar sequence properties (Methods). cor., Spearman correlation; FPR, false-positive rate; TPR, true-positive rate.

Furthermore, we observed that biological functions whose members exhibit a high degree of RNA co-expression were also likely to have TEC. Specifically, the Spearman correlation between the AUROC scores calculated using TEC and RNA co-expression was approximately 0.64 for human GO terms in contrast to approximately −0.02 when random genes were grouped (Fig. 4b). Despite the low correlation between average RNA expression and TE for human genes (Fig. 2g), our results highlight that members of specific biological functions whose RNA expression is coordinated across cell types tend to exhibit consistent TE patterns. This finding suggests coordinated regulation at both transcriptional and translational levels among functionally related genes.

Although many gene functions were predicted accurately with both RNA co-expression and TEC, we noted specific exceptions. Notably, genes in 29 human GO terms demonstrated stronger TEC than RNA co-expression (at least 0.1 higher AUROC; Fig. 4c, Extended Data Fig. 9 and Supplementary Information). An example of such a GO term is ‘MAPKKK activity’ (Fig. 4d,e). Although there is limited evidence of direct translational regulation of the MAPKKK family, the RBP IMP3 may provide a potential mechanism for such regulation54. These results indicate that some genes with specific biological functions exhibit greater similarity at the translational level.

We hypothesized that genes with shared functions and high TEC may be regulated through a common mechanism, analogous to shared transcription factor binding sites that mediate RNA co-expression55,56. Accordingly, we expected these genes to harbor shared sequence elements. We identified enriched heptamers in the transcripts of five human and three mouse GO terms with TEC and at least 12 genes in the GO term (AUROC measured with TEC > 0.7, difference in AUROC between TEC and RNA co-expression > 0.2; Fig. 4f, Extended Data Fig. 9b,f and Methods). For example, we found AG-rich motifs in coding regions of human genes with ‘Molecular function inhibitor activity’ (Fig. 4f). These motifs match the known binding sites of three RBPs (TRA2A, PABPN1 and SRSF1). In line with the enrichment of these motifs, analysis of enhanced cross-linking immunoprecipitation (eCLIP) data revealed increased deposition of these RBPs in the CDSs of genes in this GO term compared to matched control transcripts (Fig. 4f and Methods). Furthermore, we identified several additional enriched heptamers that currently have no RBP annotations, suggesting that these motifs might be targets for RBPs that have not yet been characterized (also see Supplementary Information for sequence features associated with TEC). In contrast, our analyses indicate that microRNA is not a primary driver of TEC (Supplementary Information), in agreement with previous literature57,58.

TEC nominates gene functions

We next investigated whether gene functions may be predicted by using TEC, given the success of RNA co-expression for this task52,53. The functional annotations of human genes are continuously being updated, providing an opportunity to test this hypothesis using recently added information to the knowledge base. Specifically, we used functional annotations from the GO database from 1 January 2021 to determine functional groups that demonstrate strong TEC among its members (AUROC > 0.8) and developed a framework to predict new functional associations with these groups (Methods). By comparing our predictions to annotations from 4 December 2022, we confirmed the predicted association of the LOX gene with the GO term ‘Collagen-containing extracellular matrix’. LOX critically facilitates the formation, development, maturation and remodeling of the extracellular matrix by catalyzing the crosslinking of collagen fibers, thereby enhancing the structural integrity and stability of tissues59,60. Our prediction successfully identified this new addition, as LOX exhibits positive similarity in TE with the vast majority of genes in this term (Fig. 5a).
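
The logic of this temporal hold-out can be sketched as a simple guilt-by-association ranking: genes not yet annotated to a term are ranked by their mean ρ with the term's current members, and the ranking is then checked against annotations added in a later release. This is an illustrative simplification and not the framework described in Methods; the function name, placeholder gene names and ρ values are hypothetical.

```python
import numpy as np


def rank_candidates(rho, genes, term_members):
    """Rank non-member genes by mean rho with the term's members."""
    index = {g: i for i, g in enumerate(genes)}
    member_set = set(term_members)
    member_idx = [index[g] for g in term_members if g in index]
    scored = [
        (g, float(np.mean(rho[index[g], member_idx])))
        for g in genes if g not in member_set
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy rho matrix for five placeholder genes (not real data).
genes = ["G1", "G2", "G3", "G4", "G5"]
rho = np.array([
    [1.00, 0.80, 0.70, 0.60, 0.00],
    [0.80, 1.00, 0.75, 0.50, 0.10],
    [0.70, 0.75, 1.00, 0.65, -0.10],
    [0.60, 0.50, 0.65, 1.00, 0.05],
    [0.00, 0.10, -0.10, 0.05, 1.00],
])

old_members = ["G1", "G2", "G3"]   # annotation in the older release
newly_added = {"G4"}               # hypothetical addition in the newer release

ranking = rank_candidates(rho, genes, old_members)
top_gene = ranking[0][0]
print(ranking, top_gene in newly_added)
```

Because the newer annotations are never seen when the ranking is built, agreement between top-ranked candidates and later additions provides an independent check on the predictions.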

Fig. 5 |. TEC enables the prediction of gene functions.

a, We predicted that LOX belongs to the collagen-containing extracellular matrix using an older version of human GO terms (1 January 2021) and confirmed this prediction with the newer version (4 December 2022) (Methods). The network displays the similarity in TE between LOX (yellow dot) and other genes (gray dots) from the collagen-containing extracellular matrix. Line weight in figure panels indicates the absolute value of ρ from 0.1 to 1. b, The networks display the ρ between LRRC28 and glycolytic genes at the TE level (on the left) and RNA level (on the right) in humans. Green dots represent genes that belong to the glycolysis pathway; purple nodes are transcription factors that regulate glycolysis. c, TE and RNA expression of LRRC28, glycolytic genes and transcription factors whose protein products regulate glycolysis (FOXK1 and FOXK2) across human cell types and tissues. d, We used AlphaFold2-Multimer to calculate the binding probabilities between the proteins LRRC28 or LRRC42 and glycolytic proteins (Methods). We evaluated the models with ipTM+pTM (x axis) and precision of protein–protein interface binding predictions (pDOCKQ; y axis). We set a threshold of ipTM+pTM > 0.7 (ref. 97) and pDOCKQ > 0.23 (refs. 98,99) as previously suggested to identify confident binding. e, Three-dimensional model of binding between LRRC28 and FOXK1. For visualization purposes, we removed residues 1–101 and 370–733 in FOXK1 (pLDDT scores below 50). f, Binding probabilities between LRRC28 and transcription factors belonging to the forkhead family71. The dashed lines represent ipTM+pTM > 0.7 or pDOCKQ > 0.23. g,h, Kinetic ECAR response of SHSY-5Y cell line (n = 6, stable overexpression) (g) and Huh 7.5 cell line (n = 6; transient overexpression) (h) overexpressing LRRC28 or LRRC42 to 10 mM glucose and 100 mM 2-DG.
Unpaired two-sided Student’s t-test, ***P < 0.005 and **P < 0.05 (SHSY-5Y measurement 4 P = 0.01, measurement 5 P = 0.003 and measurement 6 P = 0.002; Huh 7.5 measurement 4 P = 4.2 × 10−5, measurement 5 P = 6 × 10−8 and measurement 6 P = 8.4 × 10−8). g and h show mean ± s.d.; n shows biological independent experiments. 2-DG, 2-deoxy-d-glucose; TF, transcription factor.

Recognizing the capacity of TEC to elucidate biological functions, we used a recent version of GO annotations (4 December 2022) to systematically predict new associations for genes. To underscore the unique insights gained from TEC, we focused on the 33 human and 31 mouse GO terms that either exhibited considerably higher TEC than RNA co-expression (Supplementary Table 15) or provided new functional predictions that were supported only by TEC (the ranking of the newly predicted gene with RNA co-expression fell beyond the top 50%; Supplementary Tables 16 and 17 and Methods). By focusing on these GO terms, we aimed to identify similarity patterns based on TE, revealing functional associations that would not be detected by RNA co-expression. We conducted a literature search to determine if prior research supported these predictions, and we found that 11 were already corroborated by previous publications, although they have not yet been reflected in the relevant GO term annotations (Supplementary Table 15 and Supplementary Information). For example, cryo-electron microscopy experiments demonstrated that human DNMT1 binds to hemimethylated DNA in conjunction with ubiquitinated histone H3 (ref. 61). This binding facilitates the enzymatic activity of DNMT1 in maintaining genomic DNA methylation. Our analysis revealed that DNMT1 was the highest-ranking prediction exhibiting strong TEC with genes associated with nucleosomal DNA binding function. In mouse, we predicted Plekha7 to be associated with the regulation of developmental processes. This prediction was recently validated by the observation of neural progenitor cell delamination upon the disruption of Plekha7 (refs. 62–66).

The high rate of validation of our predictions in the literature suggests that other predictions based on TEC may reflect new and yet-to-be-confirmed functions. In particular, we observed that the human LRRC28 (leucine-rich repeat-containing 28) gene displays strong TEC with glycolytic genes but is not co-expressed at the RNA level (Fig. 5b,c). Specifically, LRRC28 displayed negatively correlated TE with key glycolytic genes, including HK1, HK2, PFKL, PFKM, PFKP, TPI1, PGK1, ENO1, ENO2 and PKM, and two transcription factors, FOXK1 and FOXK2, whose protein products regulate glycolytic genes67. Given that the leucine-rich repeat domains typically facilitate protein–protein interactions68, LRRC28 may interact directly with one or more of the glycolytic proteins. Using AlphaFold2-Multimer69, we calculated the binding confidence scores between LRRC28 and all glycolysis-associated proteins (Methods) and found that LRRC28 has a very high likelihood of binding to FOXK1 (Fig. 5d,e).

FOXK1 is a member of the forkhead family of transcription factors that share a structurally similar DNA-binding domain70,71. Interestingly, LRRC28 likely binds both the non-DNA-binding region and the DNA-binding domain of FOXK1 (distance < 4 Å; Fig. 5e and Extended Data Fig. 10a). This observation led us to examine the specificity of the interaction between LRRC28 and FOXK1. We calculated the binding probabilities of LRRC28 with 35 other forkhead family transcription factors, finding that FOXK1 exhibits the strongest evidence of physical interaction with LRRC28 (Fig. 5f and Supplementary Table 18). This specificity is potentially due to a unique binding site between LRRC28 and FOXK1’s non-DNA-binding region (Fig. 5e). As an additional control, we selected LRRC42, a protein with leucine-rich repeats that does not exhibit TEC with glycolytic genes. As expected, LRRC42 showed a very low likelihood of interaction with any of the glycolytic genes, including FOXK1 (Fig. 5d). These findings suggest that LRRC28 may serve as a regulator of glycolysis by binding to FOXK1, thereby preventing FOXK1 from binding to the promoter regions of glycolytic genes and leading to the downregulation of glycolysis. To experimentally test this prediction, we stably overexpressed LRRC28 in four human cell lines and assessed the glycolytic capacity of the cells by quantifying the extracellular acidification rate (ECAR). Stable overexpression of LRRC28 in Huh 7.5 resulted in slower growth; therefore, we resorted to a transient overexpression strategy for this particular cell line. Consistent with our expectation, ECAR was significantly lower in SH-SY5Y and Huh 7.5 cells overexpressing LRRC28 compared to cells overexpressing a control gene, LRRC42 (Fig. 5g,h). However, HEK293T and MCF7 showed non-significant changes between the two conditions (Extended Data Fig. 10b,c).
Interestingly, the varied response in the presence of LRRC28 likely reflects the metabolic state differences between cell lines and suggests differential dependency on LRRC28. Taken together, TEC reveals shared biological functions and provides insights not attainable with RNA co-expression analysis alone.

Positive TEC is associated with protein–protein interactions

The predicted binding between LRRC28 and FOXK1 suggests the utility of TEC to reveal physical interactions between proteins. Proteins that physically interact tend to be co-expressed at the RNA level14,72,73, and many protein complexes are assembled co-translationally74, leading us to hypothesize that the TE of interacting proteins may be coordinated across cell types. Specifically, we expect that there should be positive covariation between the TE of interacting proteins to ensure their coordinated production24,25. To test this hypothesis, we categorized gene pairs by whether they display positive or negative similarity in RNA expression or TE across cell types. Compared to all possible pairs (62,155,675), or those with the same biological function (6,492,564), physically interacting pairs of proteins (1,030,794 from the STRING database73) were substantially enriched for positive similarity of TE and RNA expression patterns (Fig. 6a; chi-square test P < 2.2 × 10⁻¹⁶ and 1.88-fold enrichment compared to all pairs; Supplementary Table 19). This result aligns with the notion that genes with the same function can be regulated in opposite directions, as indicated by negative ρ values, in contrast to physically interacting proteins75, which are enriched for positive ρ values.

Fig. 6 |. Physically interacting proteins display TEC.

a, Number of pairs of genes among three sets (physical interaction, red; shared function, blue; all genes, gray) categorized based on the direction of similarity (ρ scores). b, The distribution of AUROC calculated with either TEC or RNA co-expression for 3,755 hu.MAP terms (Methods). The distribution was compared to AUROC for each term that is randomly assigned genes with size matched to the original hu.MAP term. P values were calculated using a two-sided Wilcoxon test. c, AUROC plot for hu.MAP term 00862, which includes eight genes within the exocyst complex. d, Connections represent gene pairs with ρ scores above 0.1. Purple lines indicate pairs connected at TE level alone; blue lines depict those at both the RNA co-expression and TE levels. e, Heatmaps display the ρ calculated among genes at the TE (left) and RNA expression (right) levels. FPR, false-positive rate; TPR, true-positive rate.

We then examined whether these patterns generalize to the higher-order organization of protein complexes. We observed that protein complexes (as defined by hu.MAP76) displayed positive TEC and RNA co-expression (Fig. 6b and Methods). Noticeably, whereas proteins within the same complex generally exhibited similar positive patterns in both TEC and RNA co-expression, certain interactions within protein complexes were particularly evident only at the TE level (Fig. 6c–e). For instance, members of the exocyst complex showed a strong positive TEC but not RNA co-expression (Fig. 6c–e). The exocyst complex consists of eight subunits in equal stoichiometry, forming two stable four-subunit modules77,78. Several known exocyst-binding partners are not required for its assembly and stability, indicating that the molecular details are still unclear77. Our finding suggests that translational regulation may play a role in maintaining the proper stoichiometry of the exocyst complex. In summary, physically interacting proteins are likely to have positive TEC in addition to positive RNA co-expression profiles. The positive correlation in RNA abundance and TE among physically interacting proteins may reflect an evolutionary pressure to efficiently use energy resources24,25,79.

Discussion

In this study, we performed a large-scale analysis of ribosome profiling and matched RNA-seq data across diverse human and mouse cell types to quantify TE. A particular challenge was inadequate metadata associated with these experiments, which limits their reuse (Supplementary Information). To address this issue, we conducted a manual curation process and made several methodological advances, including the selection of RPF read lengths, data normalization and estimation of TE. Although the analyzed datasets are predominantly derived from peer-reviewed publications, over 20% failed basic quality control. Our findings point to a pressing need for community standards for data quality and structured metadata.

Consistent with its established use in prior literature, we used the term ‘translation efficiency’ to refer to ribosome occupancy per unit of mRNA (ribosome density). Recent work has questioned its interpretation as the efficiency of protein synthesis at least in the context of base-modified reporter constructs80. However, our work and that of others indicate that TE is significantly positively correlated with protein abundance and synthesis rate for endogenous transcripts81. Thus, for most endogenous mRNAs, TE (ribosome density) is positively associated with protein production.

Here we introduce the concept of TEC to quantify the similarity of TE patterns across cell types. We found that TEC relationships are evolutionarily conserved across orthologous gene pairs in humans and mice, suggesting that translational coordination is a conserved feature of mammalian transcriptome organization. Comparative analyses of TEC and RNA co-expression networks may help distinguish shared and distinct regulatory architectures and reveal network-level conservation and rewiring.

Notably, TEC is predictive of gene function. Although RNA expression and TE levels are typically weakly correlated across cell types, genes with shared functions frequently exhibit coordinated patterns of both. This coordination may enhance cellular energy conservation and responsiveness to environmental cues. Analysis of TEC revealed unique insights into protein function missed by transcriptomics and proteomics. For example, LRRC28 showed strong TEC with glycolytic enzymes despite absent RNA co-expression. These patterns were also undetectable at the protein level as LRRC28 is absent from most proteomic databases, such as PAXdb and ProteomeHD72,82. We experimentally validated the role of LRRC28 in attenuating glycolysis in two human cell lines. Furthermore, we identified a high-confidence predicted interaction between LRRC28 and FOXK1, the key transcription factor controlling glycolytic enzyme expression, suggesting a potential mechanism. Given that LRRC28 is downregulated in multiple cancers83–85, its regulation may represent a previously unrecognized axis of metabolic control with therapeutic relevance67,86,87.

These findings prompted us to systematically analyze TEC among physically interacting proteins, revealing a significant enrichment of positive TEC among such proteins. Coordination of TE between proteins may facilitate their co-translational assembly74 into complexes and contribute to their stoichiometric production, hence reducing the energetic costs of orphan protein production. These results also highlight the potential utility of TEC for synthetic biology applications. Design of synthetic gene circuits often requires precise stoichiometric production of multi-subunit complexes or pathway enzymes. As shown in the accompanying paper88, we developed machine learning approaches to predict TE from mRNA sequences. Such predictive modeling approaches can be combined with TEC to guide balanced translational output in engineered systems. This optimization may be particularly advantageous in applications for improving yield, given that protein biosynthesis is the largest consumer of energy during cellular proliferation25,28,79.

We acknowledge several limitations in our study. First, the limited number of samples for certain cell lines may affect the reliability of corresponding TE estimates. Second, our analyses focused on genes robustly expressed across most cell types, limiting our ability to assess TEC for genes with cell-type-specific or consistently low expression levels. Third, owing to limitations in RNA-seq quality and the short length of ribosome-protected fragments, we restricted our analysis to one representative isoform per gene89. As a result, we may miss regulatory features that vary across isoforms, such as alternatively spliced untranslated regions (UTRs) that influence microRNAs or RBP binding. Therefore, our analyses are unable to adequately capture isoform-specific usage that may affect TEC. Fourth, given the compositional nature of all sequencing experiments, TE reflects the relative allocation of translational resources. Hence, a transcript can exhibit relatively higher TE even without changes in its polysome distribution, as observed in global translational capacity shifts during viral infections and other biological contexts90,91. Moreover, TEC is currently defined at the level of bulk cell populations. As single-cell ribosome profiling technologies mature92,93, TEC could be extended to quantify translational coordination within heterogeneous cellular environments, enabling context-aware models of posttranscriptional regulation in development and disease.

In summary, TEC nominates new gene functions not captured by RNA or protein expression alone. The discovery of translation-level coordination among functionally related genes and physically interacting proteins across diverse cell types supports TEC as a conserved and functionally relevant organizing principle of mammalian gene expression. TEC may offer a framework for rationally designing gene expression systems in synthetic biology, enabling control of protein stoichiometry in engineered pathways.

Methods

Acquisition and curation of ribosome profiling data

We used keyword search (‘ribosome profiling’, ‘riboseq’, ‘ribo-seq’, ‘translation’, ‘ribo’ and ‘ribosome protected footprint’) to determine studies that may employ ribosome profiling in their experimental design, from the Gene Expression Omnibus (GEO) database, with a cutoff date of 1 January 2022. Search results were manually inspected, and studies containing ribosome profiling data were kept. Organism, cell line, publication and Sequence Read Archive (SRA) identifiers were obtained by automatically parsing the GEO pages of the corresponding study and sample. There was no dedicated experiment-type field for ribosome profiling experiments in the GEO. Therefore, we determined the experiment type (ribosome profiling, RNA-seq or other) of each sample by manually inspecting the GEO metadata and the associated publication of the study. Typically, ribosome profiling samples were indicated in the GEO using one of the following terms: ‘ribosome protected footprint’, ‘ribo-seq’ and ‘ribosome profiling’ in various parts of the metadata such as title, extraction protocol and library strategy. If there were RNA-seq samples in the same study, they were matched with ribosome profiling experiments, where available, after inspecting the sample names, metadata and the publication of the study.

Adapters are commonly observed on the 3′ end of sequencing reads in ribosome profiling experiments, a consequence of the inherently short length of RPFs. If the 3′ adapter sequence was listed in the GEO, we extracted it as part of the manual data curation process. If this sequence was unavailable, we attempted to determine it from the corresponding publication of the study. If no explicit sequence was available, we computationally analyzed the sequencing reads and searched for commonly used adapters, which are CTGTAGGCACCATCAAT, AAGATCGGAAGAGCACACGTCT, AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC, TGGAATTCTCGGGTGCCAAGG and AAAAAAAAAA. If any of these adapters was found in at least 50% of the reads, we used the detected sequence as the 3′ adapter. If no match was found, we removed the first 25 nucleotides of the reads-anchored 6-mers and tried to extend them. If any of these extensions reached 10 nucleotides and was still detected in at least 50% of the reads, we took the highest-matching sequence as the 3′ adapter. On the other hand, for sequencing reads from the SRA having a length of less than 35 nucleotides, we assumed that the 3′ adapters had already been removed. Detailed code can be accessed from https://github.com/RiboBase/snakescale/blob/main/scripts/guess_adapters.py.
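
The first stage of the adapter search described above can be sketched as follows. This is a minimal illustration and not the code in guess_adapters.py: the function name, the exact-substring matching and the use of the longest read to decide whether reads were already trimmed are assumptions made here, and the 6-mer extension fallback is not implemented.

```python
from typing import Iterable, Optional, Sequence

# Candidate 3' adapters listed in the text.
CANDIDATE_ADAPTERS = (
    "CTGTAGGCACCATCAAT",
    "AAGATCGGAAGAGCACACGTCT",
    "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
    "TGGAATTCTCGGGTGCCAAGG",
    "AAAAAAAAAA",
)


def guess_adapter(
    reads: Iterable[str],
    candidates: Sequence[str] = CANDIDATE_ADAPTERS,
    min_fraction: float = 0.5,
    min_read_length: int = 35,
) -> Optional[str]:
    """Return the candidate adapter found in the largest fraction of reads.

    Returns None when reads look pre-trimmed (all shorter than
    min_read_length, an assumption about how the length rule is applied)
    or when no candidate reaches min_fraction.
    """
    reads = list(reads)
    if not reads or max(len(r) for r in reads) < min_read_length:
        return None
    best, best_fraction = None, 0.0
    for adapter in candidates:
        # Exact substring match; a simplification of real adapter detection.
        fraction = sum(adapter in read for read in reads) / len(reads)
        if fraction >= min_fraction and fraction > best_fraction:
            best, best_fraction = adapter, fraction
    return best
```

When two candidates are found in the same fraction of reads, this sketch keeps the one listed first; the text does not specify a tie-breaking rule.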

RiboBase was pre-populated after mining the GEO. Then, data curators were assigned specific studies and used the web-based interface to access the database. Each study was curated independently by at least two people. In case of disagreements, an additional experienced scientist inspected the corresponding studies and publications to make the final decision. We supplemented any missing metadata from the GEO by checking the corresponding publications to ensure completeness. The result of this data curation process with information such as cell line, organism and matched RNA-seq can be found in Supplementary Table 1, which forms the metadata backbone of RiboBase.

Ribosome profiling and RNA-seq data processing

For each selected study in the GEO, ribosome profiling and matching RNA-seq reads (where available) were downloaded from the SRA, using SRA-Tools version 2.9.1 (ref. 100), in FASTQ format using their accession numbers. FASTQ files were processed using RiboFlow version 0.0.0 and version 0.0.1 (ref. 31) where parameters were determined using the metadata in RiboBase. The reference files for human and mouse transcriptomes, annotations and non-coding RNA sequences are available at https://github.com/RiboBase/reference_homo-sapiens and https://github.com/RiboBase/reference_mus-musculus, respectively. The detailed documentation and code used to select representative transcripts are also provided: https://github.com/ribosomeprofiling/references_for_riboflow/tree/master/transcriptome/human/v2 and https://github.com/ribosomeprofiling/references_for_riboflow/tree/master/transcriptome/mouse/v1. In brief, the 3′ adapters of the ribosome profiling reads were trimmed using Cutadapt version 1.18 (ref. 101), and reads having lengths between 15 and 40 nucleotides were kept. Then, reads were aligned against non-coding RNAs, and unaligned reads were kept. Next, reads were aligned against transcriptome reference, and alignments having mapping quality score above 20 were kept. Reads having the same length and mapping to the same transcriptome position were collapsed, which we refer to as ‘PCR deduplication’. In the final step, we compiled the alignments into ribo files using RiboPy31. All alignment steps used Bowtie 2 version 2.3.4.3 (ref. 102). For each sample, we also performed the same run without the PCR deduplication step. We developed a pipeline, SnakeScale, available at https://github.com/RiboBase/snakescale, to automate the entire process from downloading the data from the SRA to generating the ribo files. 
SnakeScale went over the selected list of studies and obtained their metadata from RiboBase, downloaded the sequencing data from the SRA, generated a RiboFlow parameters file and ran RiboFlow to generate the ribo files. Examples of non-deduplicated ribo files for the HeLa cell line can be accessed at ref. 103.
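
The mapping-quality filter and the ‘PCR deduplication’ step can be illustrated with a short sketch. It is not RiboFlow code: the `Alignment` record and the function name are hypothetical, and it only mirrors the two rules stated above (keep alignments with mapping quality above 20, then collapse reads sharing length and transcriptome position).

```python
from collections import namedtuple

# Hypothetical minimal alignment record used only for this illustration.
Alignment = namedtuple("Alignment", ["transcript", "position", "length", "mapq"])


def filter_and_collapse(alignments, min_mapq=20):
    """Keep alignments with MAPQ above min_mapq and collapse duplicates.

    Two alignments are treated as duplicates when they share the same
    transcript, 5' position and read length; the first one seen is kept.
    """
    seen = set()
    kept = []
    for aln in alignments:
        if aln.mapq <= min_mapq:
            continue  # mapping quality must be strictly above the cutoff
        key = (aln.transcript, aln.position, aln.length)
        if key in seen:
            continue  # same length and position: collapsed
        seen.add(key)
        kept.append(aln)
    return kept
```

Skipping the collapse step while keeping the quality filter corresponds to the non-deduplicated run mentioned above.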

To visualize the length distribution of the RPFs, we applied the scale function (z-score) in R to normalize the count of RPFs mapped to CDS regions with PCR-deduplicated ribosome profiling data. Subsequently, we plotted the distribution of these normalized RPFs using the heatmap (Fig. 1c and Extended Data Fig. 1).

RPF length selection and quantification of ribosome occupancy

Ribosome profiling experiments employ a range of ribonucleases, including RNase I, RNase A, RNase T1 and MNase (that is, micrococcal nuclease) (S7). These different enzymes lead to variable RPF lengths33,104–106. To ensure that we retain high-quality RPFs for further analyses, we implemented a dynamic extraction module that automatically selected lower and higher boundaries of RPFs for each sample. Initially, we determined the first RPF length, ranging from 21 to 40 nucleotides, that contained the highest number of CDS mapping reads. Then, we examined the two positions adjacent to this selected position. The extension of the position was carried out on either side to include a higher number of CDS-aligned reads. This extension process was repeated until it encompassed at least 85% of the total CDS reads within the 21–40-nucleotides range (Extended Data Fig. 2a). The final two positions identified were designated as the lower and upper boundaries. If these boundaries extended to either 21 or 40 nucleotides without including a sufficient number of reads, then 21 or 40 nucleotides, respectively, were set as the final boundaries. This approach was employed to establish the RPF cutoffs for each sample.
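
The boundary selection can be sketched as a greedy extension around the most abundant read length. The function below is an illustration under stated assumptions, not the module used in the study: the name `select_rpf_bounds` is hypothetical, ties between the two neighboring lengths are broken toward the longer length, and ties for the peak go to the shortest length.

```python
def select_rpf_bounds(cds_counts, min_len=21, max_len=40, target=0.85):
    """Greedy selection of lower and upper RPF length boundaries.

    cds_counts maps read length -> number of CDS-mapping reads.
    Starting from the most abundant length, the window grows toward
    whichever neighboring length holds more reads until it contains at
    least `target` of all CDS reads in [min_len, max_len].
    """
    lengths = range(min_len, max_len + 1)
    total = sum(cds_counts.get(n, 0) for n in lengths)
    if total == 0:
        raise ValueError("no CDS-mapping reads in the requested length range")
    peak = max(lengths, key=lambda n: cds_counts.get(n, 0))
    lower = upper = peak
    covered = cds_counts.get(peak, 0)
    while covered < target * total and (lower > min_len or upper < max_len):
        # A side that already sits on the range limit cannot be extended.
        left = cds_counts.get(lower - 1, 0) if lower > min_len else -1
        right = cds_counts.get(upper + 1, 0) if upper < max_len else -1
        if left > right:
            lower -= 1
            covered += left
        else:
            upper += 1
            covered += right
    return lower, upper
```

For a sample with counts {27: 50, 28: 300, 29: 400, 30: 200, 31: 40, 32: 10}, the window starts at 29, extends to 28 and then to 30, and stops at (28, 30) once 900 of 1,000 reads are included.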

Quality control for ribosome profiling

We performed quality control using RPFs that were deduplicated based on the length and position (PCR deduplication) ribo files (Fig. 1d,e). We set two cutoffs for ribosome profiling quality control. We required that, on average, each nucleotide of the transcript should be covered at least 0.1 times (0.1×). Coverage was calculated as

$$\text{coverage} = \frac{\text{total nucleotides from reads mapped to transcripts}}{\text{total length of transcripts}}$$

Additionally, samples with CDS mapping read percentage of 70% or higher were retained for subsequent analysis.
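
As a worked illustration of the two cutoffs, the sketch below computes coverage from read and transcript lengths and applies both thresholds. The function names are hypothetical, and the denominator of the CDS fraction is left to the caller because the text does not state which read total is used.

```python
def transcript_coverage(mapped_read_lengths, transcript_lengths):
    """Total mapped nucleotides divided by the total transcript length."""
    return sum(mapped_read_lengths) / sum(transcript_lengths)


def passes_qc(coverage, cds_fraction, min_coverage=0.1, min_cds_fraction=0.70):
    """Apply both quality-control cutoffs described in the text."""
    return coverage >= min_coverage and cds_fraction >= min_cds_fraction
```

For example, 100 reads of 30 nucleotides mapped to transcripts totaling 25,000 nucleotides give a coverage of 3,000 / 25,000 = 0.12×, which passes the first cutoff.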

To assess the pattern of three-nucleotide periodicity that is typically associated with ribosome profiling experiments, we first selected the length of RPFs with the highest number of counts from the PCR-deduplicated ribo files. We then assigned all CDS mapping reads to one of three coding frames based on the position of their 5′ end. We aggregated the results for all genes for each sample. To facilitate comparison, we reordered the counts for each position of the three-nucleotide periodicity from highest to lowest and converted these counts into percentages for each sample.

We initially classified samples based on the differences between positions 1 and 2. We identified group 1 by selecting samples where the difference exceeded the 10th percentile of these differences between positions 1 and 2. For the remaining samples, we further classified them based on the differences between positions 2 and 3. Similarly, samples that did not exceed the 10th percentile of these differences between positions 2 and 3 among remaining samples were classified as group 3, whereas the rest of the samples were group 2. We further summarized the samples based on their quality control status.

We classified samples from group 1 as exhibiting three-nucleotide periodicity. The percentage of samples following three-nucleotide periodicity was calculated by dividing the number of group 1 samples by the total number of samples across all three groups.
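
The frame assignment and the percentile-based grouping can be sketched as follows. This is one reading of the procedure rather than the original implementation: the helper names are hypothetical, percentiles use linear interpolation (the default in R's quantile function), and read positions are assumed to be 0-based offsets of the 5′ end from the CDS start.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty sequence")
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def frame_percentages(five_prime_offsets):
    """Percent of reads per frame, reordered from highest to lowest."""
    counts = [0, 0, 0]
    for offset in five_prime_offsets:
        counts[offset % 3] += 1
    total = sum(counts)
    return sorted((100 * c / total for c in counts), reverse=True)


def classify_periodicity(samples, q=10):
    """Assign each sample to group 1, 2 or 3.

    samples maps sample name -> [p1, p2, p3], the frame percentages
    sorted from highest to lowest.
    """
    d12 = {name: p[0] - p[1] for name, p in samples.items()}
    cut12 = percentile(d12.values(), q)
    groups = {name: 1 for name in samples if d12[name] > cut12}
    remaining = [name for name in samples if name not in groups]
    if remaining:
        d23 = {name: samples[name][1] - samples[name][2] for name in remaining}
        cut23 = percentile(d23.values(), q)
        for name in remaining:
            groups[name] = 2 if d23[name] > cut23 else 3
    return groups
```

The fraction of samples showing three-nucleotide periodicity is then the number of group 1 samples divided by the number of classified samples.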

Comparison of PCR and unique molecular identifier-based RPF deduplication

We selected eight ribosome profiling experiments that incorporated unique molecular identifiers (UMIs) into the sequence library preparation to assess the impact of different deduplication methods. Specifically, these samples are GSM4282032, GSM4282033 and GSM4282034 from GSE144140 (ref. 107); GSM3168387, GSM3168389 and GSM3168390 from GSE115162 (ref. 105); and GSM4798525 and GSM4798526 from GSE158374 (ref. 90). We processed the data using RiboFlow, applying three different deduplication methods: non-deduplication, PCR deduplication and UMI deduplication. The yaml files are available at https://github.com/CenikLab/TE_model/tree/main/riboflow_scr. The RPF length cutoffs for samples from GSE144140 and GSE115162 are listed in Supplementary Table 6. Because GSE158374 is not currently included in RiboBase, we manually performed the dynamic module and selected 28–32 as the RPF cutoff for this study.

Winsorization of CDS mapping RPFs

To address the issue of reduced usable reads resulting from PCR deduplication (Supplementary Information), we employed a winsorization method, which was previously proposed for tackling this problem24,108. For each gene’s CDS region, we obtained the distribution of non-deduplicated nucleotide counts and calculated the 99.5th percentile value. This calculation was based on reads whose lengths fell within the RPF range determined by the RPF boundary selection function. RPF counts that exceed the 99.5th percentile were capped to the value corresponding to the 99.5th percentile. This method was designed to mitigate the impact of outlier values that might arise due to disproportionate amplification during the PCR process24.
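
A minimal sketch of the capping step for a single gene is shown below. The helper names are hypothetical, the percentile uses linear interpolation, and whether positions with zero counts enter the distribution is not specified in the text; here every CDS position passed in is used.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty sequence")
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def winsorize_cds(position_counts, q=99.5):
    """Cap per-nucleotide RPF counts of one CDS at the q-th percentile."""
    cap = percentile(position_counts, q)
    return [min(count, cap) for count in position_counts]
```

For a 200-nucleotide CDS with one read at 199 positions and 1,000 reads at a single position, the cap is 5.995, so the one extreme position contributes about 6 reads instead of 1,000 while all other positions are unchanged.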

Gene filtering and normalization for ribosome profiling and RNA-seq

RNA-seq experiments in RiboBase used several different strategies to enrich mRNAs. The two most common approaches were the depletion of ribosomal RNAs and the enrichment of transcripts by poly(A)-tail selection. This difference leads to dramatically different quantification of a subset of genes that lack poly(A) tails (for example, histone genes; Extended Data Fig. 3g). Hence, we removed 166 human and 51 mouse genes identified as lacking poly(A) tails (Supplementary Tables 8 and 9)109,110.

We normalized both PCR-deduplicated RNA-seq data and winsorized non-deduplicated ribosome profiling data with CPM after removing the genes without poly(A) tails. Genes with CPM less than 1 in over 30% of the total samples in both RNA-seq and ribosome profiling for either human or mouse were removed in further analyses. In total, 11,149 human and 11,434 mouse genes were retained using this approach. We summed the counts of all poly(A) genes that were filtered out and grouped them under ‘others’ in the count table.
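
The normalization and filtering steps can be sketched for a single assay as follows. The function names are hypothetical and the example is not the code used in the study; it flags genes with CPM below 1 in more than 30% of samples and pools the counts of removed genes under ‘others’.

```python
def cpm(sample_counts):
    """Counts per million for one sample (dict: gene -> count)."""
    total = sum(sample_counts.values())
    return {gene: 1e6 * count / total for gene, count in sample_counts.items()}


def lowly_expressed(samples, min_cpm=1.0, max_low_fraction=0.30):
    """Genes with CPM below min_cpm in more than max_low_fraction of samples."""
    cpms = [cpm(sample) for sample in samples]
    genes = samples[0].keys()
    return {
        gene
        for gene in genes
        if sum(c[gene] < min_cpm for c in cpms) / len(cpms) > max_low_fraction
    }


def collapse_filtered(sample_counts, removed):
    """Drop removed genes and pool their counts under 'others'."""
    kept = {g: c for g, c in sample_counts.items() if g not in removed}
    kept["others"] = sum(c for g, c in sample_counts.items() if g in removed)
    return kept
```

Under a literal reading of the rule above, a gene would be dropped when it is flagged in both the RNA-seq and the ribosome profiling matrices, that is, when it lies in the intersection of the two sets returned by `lowly_expressed`. Keeping the pooled ‘others’ component preserves each sample's total, which matters for the compositional transformations applied later.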

Validation of curation and pairing accuracy in RiboBase

We assessed the manual matching of ribosome profiling (winsorization) and RNA-seq (PCR deduplication) data in RiboBase by establishing a matching score for the samples that successfully passed quality control (transcript coverage > 0.1× and CDS percentage > 70% with PCR-deduplicated ribosome profiling data). We calculated R² using the centered log-ratio (CLR)-transformed gene counts. This was done for each ribosome profiling sample against all corresponding RNA-seq samples within the same study. Subsequently, for each ribosome profiling sample, we calculated the difference between the R² of its matching pair from RiboBase and the mean R² of the non-matching pairs within the same study. The difference was defined as the matching score.
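
The matching score can be sketched as below. This is an illustration only: the function names are hypothetical, the coefficient of determination is taken to be the squared Pearson correlation between CLR vectors (an assumption, because the text does not define it further), and all counts are assumed to be positive.

```python
import math
from statistics import mean


def clr(counts):
    """Centered log-ratio transform of strictly positive counts."""
    logs = [math.log(c) for c in counts]
    center = mean(logs)
    return [value - center for value in logs]


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def matching_score(ribo_counts, rna_counts_by_sample, matched_id):
    """Matched-pair coefficient minus the mean over non-matching RNA-seq samples.

    Returns None when the study has no non-matching RNA-seq sample.
    """
    ribo = clr(ribo_counts)
    r2 = {sid: r_squared(ribo, clr(rna)) for sid, rna in rna_counts_by_sample.items()}
    others = [value for sid, value in r2.items() if sid != matched_id]
    if not others:
        return None
    return r2[matched_id] - mean(others)
```

A positive score indicates that the curated RNA-seq partner resembles the ribosome profiling sample more closely than the other RNA-seq samples from the same study do.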

To remove poorly matched samples in both human and mouse datasets, we established a cutoff based on the R² from the matched ribosome profiling and RNA-seq data in RiboBase. Any sample with an R² lower than 0.188 in either human or mouse, which is Q1 − 1.5× interquartile range (IQR) of mouse R² distribution, was considered a poor match and consequently excluded from further analysis (Extended Data Fig. 3f). Finally, 1,054 human and 835 mouse ribosome profiling experiments with their matched RNA-seq were used for TE calculation.
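
The lower-fence rule used for this cutoff (first quartile minus 1.5 times the IQR) can be written as a short sketch. The helper names are hypothetical, quartiles are computed with linear interpolation, and the values in the usage note are made up for illustration and do not come from RiboBase.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty sequence")
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def lower_fence(values, k=1.5):
    """Q1 - k * IQR, the lower outlier fence of a distribution."""
    q1 = percentile(values, 25)
    q3 = percentile(values, 75)
    return q1 - k * (q3 - q1)
```

With the toy values [0.1, 0.6, 0.7, 0.8, 0.9], Q1 is 0.6 and Q3 is 0.8, so the fence is 0.6 − 1.5 × 0.2 = 0.3 and only the sample at 0.1 would be excluded.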

TE calculation

CLR-normalized counts from PCR-deduplicated RNA-seq and winsorized non-deduplicated ribosome profiling were used to calculate TE with compositional linear regression36,111,112. Missing values can limit the power of TE calculations, and there is ongoing debate on how to address zeros in compositional data113. Previous studies showed that approximately 15–40% of genes in bulk RNA-seq datasets from various tissues are not expressed114. To address the issue of missing values, we followed the guidance from the ‘propr’ library, replacing zeros with the lowest observed read count (one). This approach ensures that these zeros are represented as a low proportion within the sample. We also applied two other zero-imputation methods—Geometric Bayesian-Multiplicative and Square-Root Bayesian-Multiplicative from the zCompositions package (version 1.5.0.4)—for TE and TEC calculations.
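
The simple zero-replacement strategy followed by the CLR transformation can be illustrated as follows. This is a sketch with hypothetical function names, not the ‘propr’ implementation, and it does not cover the two Bayesian-multiplicative alternatives.

```python
import math


def replace_zeros(counts, pseudo=1):
    """Replace zero counts with the lowest observable read count (one)."""
    return [count if count > 0 else pseudo for count in counts]


def clr(counts):
    """Centered log-ratio transform: log count minus the mean log count."""
    logs = [math.log(count) for count in counts]
    center = sum(logs) / len(logs)
    return [value - center for value in logs]
```

After replacement, a gene with zero reads receives the most negative CLR value in the sample, so it is represented as a very low proportion rather than as an undefined logarithm.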

In our linear regression approach, ribosome profiling data served as the dependent variable, and the corresponding RNA-seq data provided the explanatory variable. The first step involved transforming the gene count, which includes ‘others’, into CLR-normalized compositional vectors. Given the constraints of count data within a simplex, a further transformation from CLR to isometric log-ratio (ILR) was necessary for linear regression36. This transformation is crucial as it allows the compositional data to be decomposed into an array of uncorrelated variables while preserving relative proportions. The ILR transformation projects the original data onto a set of orthonormal basis vectors derived from the Aitchison simplex. Then, the linear regression model applied to these transformed variables can be represented as

Y=b+B×X

where Y is the ILR-transformed ribosome profiling data, X is the ILR-transformed counts from RNA-seq, b is a vector of intercept terms, and B is a matrix of regression coefficients. The model assumes a normal distribution:

$$Y \sim \mathcal{N}^{(D-1)}\left(b + B X,\ \Sigma_{\varepsilon}\right)$$

where Σε represents the residual variances of dimension (D − 1) × (D − 1), D is the number of components (e.g., genes or features) in the original compositional data and N is the number of samples. These residuals were then extracted from each sample and reconverted to CLR coordinates, which are used as the definition of TE for each gene in each sample. Finally, we averaged TE for different cell lines and tissues (Fig. 2b and Extended Data Fig. 4), and we report the TE in Supplementary Tables 10 and 11. The scripts to generate TE are available at https://github.com/CenikLab/TE_model. We also provide uncertainty estimates calculated using the metric standard deviation (m.s.d.) designed for compositional data from the ‘compositions’ (version 2.0.8) library.
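
For reference, the chain of transformations described above can be written out explicitly. The notation below follows standard compositional data analysis and is an assumption rather than the authors' own: x is a composition of D positive parts, g(x) is its geometric mean, V is any D × (D − 1) matrix whose columns form an orthonormal basis orthogonal to the vector of ones, and hats denote fitted quantities.

```latex
\begin{aligned}
\operatorname{clr}(x) &= \left(\ln\frac{x_1}{g(x)},\ \ldots,\ \ln\frac{x_D}{g(x)}\right)^{\top},
  \qquad g(x) = \Big(\prod_{i=1}^{D} x_i\Big)^{1/D} \\
\operatorname{ilr}(x) &= V^{\top}\operatorname{clr}(x),
  \qquad V^{\top}V = I_{D-1},\quad V^{\top}\mathbf{1} = 0 \\
Y &= \operatorname{ilr}(x_{\text{ribo}}), \qquad X = \operatorname{ilr}(x_{\text{RNA}}) \\
\hat{\varepsilon} &= Y - \big(\hat{b} + \hat{B}X\big) \\
\mathrm{TE} &= V\,\hat{\varepsilon}
\end{aligned}
```

Because the columns of V are orthogonal to the vector of ones, the entries of TE for a given sample sum to zero, as expected for CLR coordinates. TE is therefore a relative measure: a positive value means that a gene has more ribosome occupancy than its RNA level predicts, relative to the other genes in that sample.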

Correlation between TE and protein abundance

We assessed the correlation between TE and protein abundance from seven human cell lines (A549, HEK293, HeLa, HepG2, K562, MCF7 and U2OS). The protein measurements were obtained from PAXdb82. In total, 9,924 genes were shared between our TE and the protein abundance data. We calculated the Spearman correlation coefficient for each cell line using the ‘stats’ package in R to evaluate the relationship between TE and protein abundance. For a more detailed discussion of how to interpret similarity between protein abundance and sequencing-based measurements of RNA expression and translation, see Csárdi et al.115.

Cross-species TE conservation in orthologs

Orthologous genes between human and mouse were identified using the ‘orthogene’ package from Bioconductor116 using the parameters standardise_genes = TRUE, method_all_genes = ‘homologene’ and non121_strategy = ‘keep_both_species’. A single human gene could correspond to multiple mouse orthologs or vice versa. To maintain all one-to-many matches in our analysis, each correspondence is represented by multiple rows in our table (if a human gene ‘A’ is orthologous to mouse genes ‘B’ and ‘C’, we generate two separate rows: ‘A-B’ and ‘A-C’). Human genes lacking corresponding mouse orthologs, and mouse genes lacking human orthologs, were excluded. As a result, a total of 9,194 gene pairs were identified as orthologous between human and mouse (Supplementary Table 12).

To capture the variability in TE and mRNA expression between orthologous genes in human and mouse, we measured the s.d. using the m.s.d. function from the ‘compositions’ package in R117. We observed a negative Spearman correlation coefficient between m.s.d. of TE and mean TE, as well as m.s.d. of RNA expression and mean RNA expression, in both species. To address the dependency between m.s.d. and mean values, we conducted a partial correlation analysis. For example, we adjusted the human m.s.d. values using the mean TE from both human and mouse with the ‘pcor.test’ function from the ‘ppcor’ package version 1.1 (ref. 118).

GO term analysis was performed using FuncAssociate 3.0, accessible at http://llama.mshri.on.ca/funcassociate/ (ref. 119). For this analysis, we set either 9,194 mouse or 9,189 human orthologous genes as the background. We generated association files for these genes with the 4 December 2022 version of human or mouse GO terms. In the human or mouse association file, we kept only those GO terms containing at least 10 genes for further analysis.

Evaluating methods for similarity of ribosome occupancy

We applied eight commonly used methods to quantify the similarity of ribosome occupancy across cell types for all pairs of 11,149 human or 11,434 mouse genes in RiboBase.

Method 1: CPM-normalized ribosome footprint counts were used to calculate the Pearson correlation coefficient as implemented in the ‘stats’ R package.

Method 2: Quantile-normalized (customized Python script) ribosome footprint counts were used to calculate the Pearson correlation coefficient.

Method 3: Ranking of ribosome footprint counts was used to calculate the Spearman correlation coefficient as implemented in the ‘stats’ R package.

Method 4: CLR-normalized ribosome footprint counts were used to calculate the proportionality (ρ scores) between genes as implemented in the ‘propr’ package with the ‘lr2rho’ function38.

Method 5: CPM-normalized ribosome footprint counts were used to calculate the similarity between genes with a decision tree-based method as implemented in the ‘treeClust’ package72,120. We applied the ‘treeClust.dist’ function with a dissimilarity specifier set to d.num = 2.

Method 6: Quantile-normalized ribosome footprint counts were used to calculate the similarity between genes with the decision tree-based method.

Method 7: CPM-normalized ribosome footprint counts were used to calculate gene similarity with the generalized least squares method121.

Method 8: Quantile-normalized ribosome footprint counts were used to calculate gene similarity with the generalized least squares method.
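For readers unfamiliar with the quantile normalization used in Methods 2, 6 and 8, a minimal Python sketch is given below. It is not the customized script used in this study; it assumes one list of counts per sample, and it breaks ties by input order, which is a simplification.

```python
def quantile_normalize(columns):
    """Quantile-normalize samples so that all share the same distribution.

    `columns` is a list of samples, each a list of counts in the same gene
    order. The reference distribution is the mean of the sorted samples at
    each rank; every value is replaced by the reference value at its rank.
    Ties are broken by input order (an assumption of this sketch).
    """
    n_genes = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    reference = [
        sum(col[rank] for col in sorted_cols) / len(columns)
        for rank in range(n_genes)
    ]
    normalized = []
    for col in columns:
        order = sorted(range(n_genes), key=lambda i: col[i])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = reference[rank]
        normalized.append(out)
    return normalized
```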

We compared these eight ribosome occupancy similarity matrices to determine the most effective method for constructing gene relationships with respect to biological functions. This assessment employed the guilt-by-association principle to ascertain the functional coherence within a gene matrix, determining if genes associated with a particular biological function (GO terms122, TOP mRNAs123) exhibit similar expression patterns and network interactions124.

The complete ontology was sourced from the GO website, with the files goa_human.gpad.gz and mgi.gpad.gz, generated on 4 December 2022 (ref. 122). The annotation of GO terms was accomplished with the aid of the ‘org.Hs.eg.db’ and ‘org.Mm.eg.db’ R packages125,126. We restricted the selection of GO terms to those associated with the 11,149 human and 11,434 mouse genes that had passed gene filtering. We used GO terms associated with at least 10 but less than 1,000 genes for evaluation, yielding a total of 2,989 human and 3,340 mouse GO terms.

We then employed the neighbor voting algorithm to assess the covariations of ribosome occupancy among genes from the same GO term with AUROC124. Specifically, we first converted the similarity scores to absolute values. Then, we extracted genes associated with a specific function and implemented the leave-one-out cross-validation method. For this analysis, we iteratively masked one gene at a time, treating it as if it did not belong to the function. In each iteration, we calculated the total sum of similarity scores from all genes not belonging to the function to all the remaining genes within the function. We normalized the sum of similarity scores for each gene against the sum of similarity scores for that gene with all genes. After normalization, we converted these normalized similarity scores into rankings. We retained the rankings only for genes that belong to this specified functional property. Finally, we computed the AUROC for all genes within this functional property based on these rankings. A detailed script for genes’ functional similarity pattern analysis can be found at https://github.com/CenikLab/TE_model/blob/main/other_scr/benchmarking.R.
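The leave-one-out neighbor voting procedure can be sketched in Python as follows. This is a simplified illustration under stated assumptions and is not the benchmarking.R script: similarity scores are supplied as a hypothetical dict keyed by unordered gene pairs, missing pairs count as zero, and the reported value is the mean, over held-out genes, of the fraction of non-member genes that the held-out gene outranks (ties count as one half).

```python
def pair_score(similarity, a, b):
    """Absolute similarity for an unordered gene pair (0 if absent)."""
    return abs(similarity.get(frozenset((a, b)), 0.0))


def neighbor_voting_auroc(similarity, genes, members):
    """Leave-one-out neighbor voting AUROC for one functional gene set."""
    members = sorted(set(members) & set(genes))
    non_members = [g for g in genes if g not in members]
    if len(members) < 2 or not non_members:
        raise ValueError("need at least two members and one non-member")

    # Sum of similarity scores of each gene with all other genes.
    degree = {
        g: sum(pair_score(similarity, g, h) for h in genes if h != g)
        for g in genes
    }

    def vote(gene, known):
        # Similarity to the known members, normalized by the gene's total.
        if degree[gene] == 0:
            return 0.0
        return sum(pair_score(similarity, gene, m) for m in known) / degree[gene]

    per_gene = []
    for held_out in members:
        # Mask one member and treat it as if it were outside the function.
        known = [m for m in members if m != held_out]
        held_score = vote(held_out, known)
        wins = 0.0
        for g in non_members:
            other = vote(g, known)
            if held_score > other:
                wins += 1.0
            elif held_score == other:
                wins += 0.5
        per_gene.append(wins / len(non_members))
    return sum(per_gene) / len(per_gene)
```

An AUROC near 0.5 indicates that members of the function are no more similar to each other than to unrelated genes, whereas values near 1 indicate strong functional coherence.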

RNA co-expression and TE covariation

We introduce the concept of TEC, which employs a compositional data analysis approach36,38, to quantify the similarity patterns of TE across various cell and tissue sources, as described in Method 4 above. The proportionality scores were calculated with the following formula from the ‘propr’ package with the ‘lr2rho’ function38:

$$\rho(A_i, A_j) = 1 - \frac{\operatorname{var}(A_i - A_j)}{\operatorname{var}(A_i) + \operatorname{var}(A_j)}$$

where Ai and Aj represent TE values for genes i and j from the TE matrix A.
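As a minimal illustration of the proportionality score, the following Python function computes ρ for two genes from their TE values across cell types. It is a sketch and not the ‘propr’ implementation; the function name is hypothetical, and the inputs are assumed to be already on the CLR scale, as TE is in this study.

```python
from statistics import variance


def rho(a_i, a_j):
    """Proportionality between two genes across samples.

    Computes 1 - var(a_i - a_j) / (var(a_i) + var(a_j)) using the sample
    variance. Both inputs are equal-length sequences of TE values.
    """
    difference = [x - y for x, y in zip(a_i, a_j)]
    return 1 - variance(difference) / (variance(a_i) + variance(a_j))
```

Two genes with identical profiles give ρ = 1 because the variance of their difference is zero, and two genes with exactly opposite profiles give ρ = −1, consistent with the range of the coefficient.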

In this study, the TEC was calculated with 77 human cell lines for 11,149 genes or 68 mouse cell lines for 11,434 genes. The proportionality coefficients (ρ scores) generated from this method range from −1 to 1. Full TEC and RNA co-expression matrices are accessible via Zenodo repository at https://doi.org/10.5281/zenodo.10373032 (ref. 127).

Prediction of gene functions with TEC

We compared the AUROC between an older version of GO terms (1 January 2021) and the newer version of GO terms (4 December 2022) to identify genes that had been newly added to the GO terms in this timeframe. GO terms were downloaded and filtered to include only those terms containing between 10 and 1,000 genes with either human or mouse backgrounds (11,149 human genes or 11,434 mouse genes). We selected 184 human and 238 mouse GO terms from the older version that demonstrated high TEC similarity (AUROC > 0.8) among genes within the same term for predicting gene functions. We first converted the ρ scores for TEC between gene pairs to absolute values. For genes not currently included in the GO terms, we calculated the sum of ρ for each gene relative to all genes within the term, based on either TE or mRNA expression levels. We then normalized these ρ sums for each gene against the total ρ sum of that gene across all 11,149 human genes or 11,434 mouse genes. These normalized values were converted into ranking percentages to reflect the likelihood of these genes being associated with the respective GO term. Finally, we identified the top-ranking genes as potentially new additions and cross-validated them with the newer version of the GO terms to confirm our predictions.

We then analyzed 243 human and 310 mouse GO terms as of 4 December 2022, which demonstrated high similarity patterns between genes in TE level (AUROC > 0.8) to predict gene functions. Absolute TEC ρ scores served as the input for biological function prediction (GO terms). We added a filter step: a newly predicted gene was retained only if its average ρ score with other genes within the same term exceeded the overall average ρ score for all existing genes in that term. This prediction analysis was performed using a custom script that can be found at https://github.com/CenikLab/TE_model/blob/main/other_scr/prediction.R.
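The ranking and filter step can be sketched as follows. This Python code is an illustration under assumptions and is not the prediction.R script: absolute ρ scores are supplied as a hypothetical dict keyed by unordered gene pairs, and the ‘overall average ρ score for all existing genes in that term’ is interpreted as the mean over all pairs of genes already annotated to the term.

```python
from itertools import combinations


def pair_rho(abs_rho, a, b):
    """Absolute rho for an unordered gene pair (0 if absent)."""
    return abs(abs_rho.get(frozenset((a, b)), 0.0))


def predict_new_members(abs_rho, genes, term_genes, top_n=10):
    """Rank genes outside a GO term by their normalized rho to the term.

    A candidate is retained only if its mean rho with the term genes
    exceeds the mean pairwise rho among the existing term genes.
    """
    members = sorted(set(term_genes) & set(genes))
    within = [pair_rho(abs_rho, a, b) for a, b in combinations(members, 2)]
    term_mean = sum(within) / len(within)

    candidates = []
    for gene in genes:
        if gene in members:
            continue
        to_term = sum(pair_rho(abs_rho, gene, m) for m in members)
        to_all = sum(pair_rho(abs_rho, gene, h) for h in genes if h != gene)
        # Sum of rho to the term, normalized by the gene's total rho.
        score = to_term / to_all if to_all else 0.0
        if to_term / len(members) > term_mean:
            candidates.append((score, gene))
    candidates.sort(reverse=True)
    return [gene for _, gene in candidates[:top_n]]
```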

Computational analyses of LRRC28 and glycolysis

We computed the pairwise interaction probabilities between LRRC28 or LRRC42 and glycolytic proteins (HK1, HK2, PFKL, PFKM, PFKP, TPI1, PGK1, ENO1, ENO2 and PKM) with AlphaFold2-Multimer 2.3.0 (refs. 69,128). In addition, we also calculated pairwise interaction probabilities for LRRC28 with 35 proteins from the forkhead transcription factor family71. We extracted the canonical amino acid sequence for each gene from UniProt129 as the input file. We set 0.7 as the cutoff of ipTM+pTM as a high-confidence protein structure and binding probability cutoff97. We then evaluated the interfaces predicted by AlphaFold2-Multimer, using a pDOCKQ score greater than 0.23 as our criterion for reliability98,99.

Benchmarking TEC and RNA co-expression for protein interactions

Using a similar approach to our benchmarking with biological functions, we employed the neighbor voting algorithm to assess physical protein interactions based on ρ scores among genes at either the TE or mRNA expression level. We first kept the non-negative ρ between genes and set negative ρ to zero. We then analyzed similarity patterns between genes from the same protein complex, downloaded from the hu.MAP 2.0 website76. In this process, we excluded genes from hu.MAP terms that were not in the 11,149 human gene list, resulting in 8,024 overlapping genes between our list and hu.MAP terms. Furthermore, we removed hu.MAP terms that included fewer than three genes. This filtering process left us with 3,880 hu.MAP terms, among which 3,755 contained unique genes.
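The two preprocessing steps described above, zeroing negative ρ and filtering complexes, can be sketched in Python as follows. The data structures and function names are hypothetical and serve only to make the filtering order explicit.

```python
def clip_negative(rho_scores):
    """Keep non-negative rho values and set negative values to zero."""
    return {pair: max(score, 0.0) for pair, score in rho_scores.items()}


def filter_complexes(complexes, allowed_genes, min_size=3):
    """Restrict each complex to analyzed genes, then drop small complexes.

    `complexes` is a hypothetical dict mapping a complex identifier to its
    member genes; `allowed_genes` is the set of genes that passed filtering.
    """
    allowed = set(allowed_genes)
    kept = {}
    for name, members in complexes.items():
        retained = set(members) & allowed
        if len(retained) >= min_size:
            kept[name] = retained
    return kept
```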

Because proteins within the same complex may not physically interact, we used physical interaction pairs downloaded from the STRING website instead of gene pairs from hu.MAP terms to summarize the interactions in Fig. 6a.

Identification of RNA motifs

To reduce bias in motif enrichment analysis that may arise by ribosome footprint mapping to paralogous genes, we removed predicted paralogs from each GO term using Paralog Explorer130 (DIOPT score > 1). Then, we enumerated heptamers in each transcript region using the Transite kmer-TSMA method95 with default parameters for each species (human, mouse), transcript region (5′ UTR, CDS, 3′ UTR) and GO term (selected terms with TE AUROC > 0.7, TE-RNA AUROC difference > 0.2, number of genes after paralog removal ≥ 12). We selected the three mouse terms and the top five terms in humans with the highest number of genes and greatest AUROC difference.

After counting heptamers with Transite, we selected motifs that had more than 20 hits among genes in the GO term to address assumptions of uniformity near P values of 1 for some multiple test correction methods. Then, we used the Holm method to correct P values for each species separately and selected motifs with an adjusted P < 0.05. Finally, heptamers were annotated with RBPs included in the Transite95 and oRNAment databases94. For annotation of RBPs in the oRNAment database, we required that the heptamer have a matrix similarity score94 of 0.8 or greater when matching to each RBP position weight matrix. RBP motif hits from other species (Drosophila, artificial constructs) were removed from RBP annotations, and the hits to the heptamer of Drosophila tra2 were annotated as RBP TRA2A for human genes with the GO term GO:0140678.

eCLIP data for PABPN1, SRSF1 and TRA2A were downloaded from ENCODE96 as BED files (K562 and HepG2 cell lines, GRCh38 reference). The BED files for biological replicates were concatenated, and peaks that overlapped by at least one base pair were merged with ‘bedtools merge -s -c 4,6,7 -o collapse’131. The resulting merged peaks were intersected with transcripts in the GO term of interest and an equal number of control transcripts (GENCODE version 34 GTF). The control transcripts were selected by matching on length and GC content for each transcript region (5′ UTR, CDS, 3′ UTR) using MatchIt132 with default parameters. Because the gene CARMIL2 in GO term GO:0010592 does not have a 5′ UTR, required for matching, we assigned it a dummy 5′ UTR with length and GC content equal to the median across all transcripts. The number of eCLIP peaks in the CDS for each RBP was summed for genes in the GO term and control genes.

Identification of RBP–transcript pairs with high correlation between RBP RNA expression and transcript TE

The Pearson correlation coefficient between transcript TE and the RNA expression of RBPs from human and mouse50 was tested using R stats::cor.test after taking the mean of these values by cell types and tissues. P values were corrected with the Benjamini–Hochberg procedure, and correlations were deemed significant at false discovery rate < 0.05.

For cloning the guides required for knockout cell line generation, the top two ranked guides were selected from the Brunello library133 for each RBP (Supplementary Table 20). The guides were cloned in LentiCRISPRv2 (Addgene, 52961) as per protocol134 and confirmed by Sanger sequencing. In brief, for lentiviral production, HEK293T cells were seeded at a density of 1.2 × 10⁶ cells per well in a six-well plate in Opti-MEM media supplemented with 5% FBS and 100 mM sodium pyruvate, 24 hours prior to transfection. Both the cloned gRNA plasmids for each RBP (700 ng of each transfer plasmid) were co-transfected with the packaging plasmids pMD2.G and psPAX2 (Addgene, 12259 and 12260) using Lipofectamine 3000 (Invitrogen), and the virus was collected as per the manufacturer’s protocol. For generation of the knockout clones, HEK293T cells were seeded at a density of 5 × 10⁴ cells per well in a six-well plate in DMEM media supplemented with 10% FBS, 24 hours prior to infection. The next day, the media were replaced with 1.5 ml of 1:2 diluted lentivirus containing polybrene (8 μg ml⁻¹). After 16 hours, the lentivirus was replaced with fresh media, and puromycin (2 μg ml⁻¹) was added to the cells 48 hours after transduction. The selection continued for 5 days followed by a period of recovery for 24 hours before harvesting the cells.

Cell culture

HEK293T and MCF7 cell lines were obtained from the American Type Culture Collection. Human hepatoma Huh 7.5 cells were a gift from Charles Rice (The Rockefeller University) and human SH-SY5Y neuroblastoma cells were obtained from Tanya Paull (The University of Texas at Austin). All cell lines were maintained in DMEM (Gibco) supplemented with 10% FBS (Gibco, Life Technologies) and 1% penicillin–streptomycin (Gibco, Life Technologies) at 37 °C in 5% CO2 atmosphere. Cell lines were tested for mycoplasma contamination every 6 months.

Plasmid cloning

pCMV-SPORT6 vector with LRRC28 was obtained from transOMIC (TCH1003). LRRC42 open reading frame (ORF) was amplified using cDNA from HEK293T cells. The oligos used for PCR amplification of the two genes had a Flag sequence to be incorporated as an N-terminal tag along with BamHI and EcoRI restriction sites in forward and reverse primers, respectively.

PCR amplification of the LRRC28 and LRRC42 ORF using the above-mentioned templates was carried out using Q5 polymerase (New England Biolabs) with the following protocol: 98 °C for 30 seconds, 30 cycles of 98 °C for 10 seconds, 55 °C for 10 seconds and 72 °C for 40 seconds and a final extension at 72 °C for 2 minutes. The PCR-amplified products were gel extracted followed by double digestion of the amplicon and the pLVX-M vector (Addgene, 125839) with EcoRI-HF (New England Biolabs) and BamHI-HF (New England Biolabs) at 37 °C for 1 hour. After digestion, the fragments were purified and cloned using T4 DNA ligase at a ratio of 1:3 at 25 °C for 1 hour followed by heat inactivation at 65 °C for 15 minutes and transformation by the heat shock method. For LRRC42 cloning, ORF was amplified using cDNA from the HEK293T cell line, and two sequences (seq1 and seq 2) from the pCMV-SPORT6 adjacent to EcoRI and NotI site were amplified with LRRC42 overhangs using oligos in Supplementary Table 21. The pCMV-SPORT6 vector with LRRC28 was digested with EcoRI and NotI followed by cloning of LRRC42 and two sequences by Gibson cloning. All clones were verified by Sanger sequencing (ACGT, Inc.) and full plasmid sequencing (Plasmidsaurus).

Lentivirus and stable cell line generation

For lentiviral generation, 1.2 × 10⁶ HEK293T cells were seeded in a six-well plate followed by transfection of the cloned plasmid along with psPAX2 (Addgene, 12260) and pMD2.G (Addgene, 12259) plasmids using Lipofectamine 3000 (Invitrogen) as per the manufacturer’s protocol. The generated lentivirus was used for infection using 8 μg ml⁻¹ polybrene for HEK293T, MCF7 and Huh 7.5 and 2 μg ml⁻¹ polybrene for SH-SY5Y. After 48 hours of infection, puromycin was added at concentrations of 2 μg ml⁻¹ for HEK293T and Huh 7.5 and 1 μg ml⁻¹ for MCF7 and SH-SY5Y. The selection was continued for 5 days after recovery in antibiotic-free media.

Measurement of ECAR

In total, 20,000 cells of HEK293T and MCF7 and 40,000 cells of SH-SY5Y stably expressing LRRC28 or LRRC42 and 20,000 cells of Huh 7.5 were seeded onto poly-l-lysine-coated XF Pro Cell Culture Microplates with DMEM (Gibco) supplemented with 10% FBS (Gibco, Life Technologies). Huh 7.5 was transiently transfected with pCMV-SPORT6 vector with either LRRC28 or LRRC42 using Lipofectamine 3000 as per the manufacturer’s protocol. The ECAR was measured using Seahorse XF Pro (Agilent). The cartridges were hydrated a day prior to the experiment as per the manufacturer’s instructions. On the day of the experiment, prior to performing the assay, the cells were washed with Seahorse XF DMEM Medium (pH 7.4) (Agilent, 103575), supplemented with 2 mM L-glutamine (Agilent, 103579–100) followed by incubation in a non-CO2 incubator for 1 hour. To determine glycolytic capacity of cells expressing LRRC28 or LRRC42, glucose (Agilent, 103577–100) was injected at a final concentration of 10 mM (injection 1) followed by injection of 100 mM 2-deoxy-d-glucose (Sigma-Aldrich, D6134–1G) (injection 2). Three measurement cycles were carried out for each assay point (baseline, injection 1 and injection 2), with each measurement cycle consisting of a mixing time of 3 minutes and a data acquisition period of 3 minutes. The results were analyzed using Wave Pro 10.2.1.

Extended Data

Extended Data Fig. 1 |. Sequencing quality of ribosome profiling data.

a, Distribution of read counts for 2,195 human and 1,624 mouse ribosome profiling data in RiboBase. In all figure panels, the horizontal line corresponds to the median. The box represents the interquartile range and the whiskers extend to 1.5 times of it. b, Distribution plot similar to panel a for 1,282 human and 995 mouse ribosome profiling data with matched RNA-seq. c, Distribution of the proportion of read count aligned to transcripts, read counts with high-quality alignments, and the percentage of reads remaining after PCR deduplication, relative to the total number of reads from panel a. d, Similar plot as panel c for ribosome profiling with matched RNA-seq. e, The read length distribution of RPFs aligned to coding sequences for all human experiments. The color in the heatmap represents the z-score adjusted RPF counts (Methods). Each experiment where the percentage of RPFs mapping to CDS was greater than 70% and achieving sufficient coverage of the transcript (≥ 0.1X) was annotated as QC-pass. f, Similar to panel e for mouse samples.

Extended Data Fig. 2 |. Quality control and RPF length selection.

a, RPFs shorter than 21 nucleotides were removed, then we identified the RPF length with the highest number of reads mapping to CDS to serve as the starting point. Subsequently, we compared one nucleotide longer or shorter than the first and chose the length with the most reads again. This looping process continued until at least 85% of the total CDS mapping RPFs were included. b, We compared the usable reads selected with two different boundary cutoffs (y-axis) and the proportion of these selected reads that map to the coding regions (x-axis) for each ribosome profiling experiment. c, The percentage of ribosome profiling experiments from GEO that pass or fail quality control (the percentage of RPFs mapping to CDS was greater than 70% and achieving at least 0.1X coverage of the transcript as QC pass).
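The dynamic length selection in panel a can be sketched in Python as follows. This is an illustration under assumptions rather than the pipeline code: the input is a hypothetical dict mapping RPF length to the number of CDS-mapping reads, the 85% target is taken relative to reads remaining after removing lengths below 21 nucleotides, and ties between the shorter and longer neighbor are resolved toward the longer length.

```python
def select_rpf_lengths(cds_counts, min_length=21, target_fraction=0.85):
    """Greedily grow a contiguous RPF length window around the modal length.

    Starting from the length with the most CDS-mapping reads, the window is
    extended by one nucleotide at a time toward whichever neighbor has more
    reads, until the window holds at least `target_fraction` of the reads.
    Returns the (shortest, longest) lengths of the selected window.
    """
    counts = {n: c for n, c in cds_counts.items() if n >= min_length}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no CDS-mapping reads at or above min_length")

    shortest, longest = min(counts), max(counts)
    low = high = max(counts, key=counts.get)
    covered = counts[low]

    while covered / total < target_fraction and (low > shortest or high < longest):
        # A side that cannot be extended further is given a sentinel of -1.
        below = counts.get(low - 1, 0) if low > shortest else -1
        above = counts.get(high + 1, 0) if high < longest else -1
        if above >= below:
            high += 1
            covered += counts.get(high, 0)
        else:
            low -= 1
            covered += counts.get(low, 0)
    return low, high
```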

Extended Data Fig. 3 |. Assessment of periodicity and data matching for TE estimation.

a-d, In ribosome profiling experiments from RiboBase, samples were classified according to distinct periodicity patterns (Methods). For all figure panels, we added error bars to represent the standard deviation across samples. Statistical significance was assessed using the Wilcoxon test, and the p-values were subsequently adjusted for all 33 comparisons using the Benjamini-Hochberg method. We considered the Group 1 pattern as indicative of the expected three-nucleotide periodicity patterns. Human samples that pass quality control (a), human samples that fail quality control (b), mouse samples that pass quality control (c), mouse samples that fail quality control (d). e, We calculated the coefficient of determination (R²) between a specific ribosome profiling experiment and its corresponding RNA-seq from RiboBase. Additionally, we determined the average R² for all other pairings for the same ribosome profiling sample with other RNA-seq data from the same study. The matching score represents the difference in R² values between these two (x-axis; Methods). f, A dashed line at 0.188 serves as the threshold to identify samples with poor matching (Methods). In each figure panel containing boxplots, the horizontal line corresponds to the median. The box represents the IQR and the whiskers extend to 1.5 times of it. g, Distribution of standard error of TE values across tissue and cell lines (y-axis) for genes with polyA and without polyA tails.
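The matching score in panel e can be sketched in Python as follows. This is an illustration under assumptions: the coefficient of determination is computed as the squared Pearson correlation of per-gene values, any transformation of the counts is left to the caller, and the function names are hypothetical. How the 0.188 threshold is applied is described in the Methods; the sketch only stores it as a constant.

```python
from statistics import mean

# Threshold from panel f; see Methods for how it is applied.
MATCHING_THRESHOLD = 0.188


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov * cov / (var_x * var_y)


def matching_score(ribo, matched_rna, other_rnas):
    """R-squared with the matched RNA-seq minus the mean over other pairings.

    `ribo` and `matched_rna` are per-gene values for one ribosome profiling
    experiment and its annotated RNA-seq; `other_rnas` holds per-gene values
    for the other RNA-seq experiments from the same study.
    """
    matched = r_squared(ribo, matched_rna)
    others = mean(r_squared(ribo, rna) for rna in other_rnas)
    return matched - others
```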

Extended Data Fig. 4 |. Detailed workflow of data processing for TE and TEC calculations.

a, We selected ribosome profiling data with matched RNA-seq and removed duplicated reads with identical positions and lengths (PCR-deduplication). We set the RPF read length range for individual samples with our dynamic cutoff and filtered out ribosome profiling experiments that failed quality control. After selecting high-quality samples, we reprocessed all these ribosome profiling experiments using the winsorization method with non-deduplicated data. We removed genes without polyA tails and kept genes with sufficient counts per million RPFs. After obtaining RPF counts from the coding regions for both ribosome profiling and RNA-seq, we performed CLR normalization and compositional linear regression, defining the residuals as TE for each gene in each sample. We averaged this sample-level TE based on cell lines and tissues. TEC is further calculated with rho scores38. To build an RNA co-expression matrix, we transformed CDS counts from RNA-seq experiments using CLR, averaged them based on cell lines and tissue, and calculated pairwise proportionalities (rho scores).

Extended Data Fig. 5 |. Spearman correlation between TE and protein abundance.

a, The correlation between protein abundance and clr-transformed RPF counts from ribosome profiling (left), clr-transformed read counts from RNA-seq (middle), or TE calculated with winsorized RPFs counts using the linear regression model (right). Individual dots indicate specific experiments colored according to study (68 samples from 11 studies-HEK293, 86 samples from 10 studies-HeLa, 58 samples from 4 studies-U2OS, 29 samples from 5 studies-A549, 5 samples from 2 studies-MCF7, 7 samples from 2 studies-K562, 10 samples from 2 studies-HepG2). In the boxplot, the horizontal line corresponds to the median. The box represents the IQR and the whiskers extend to 1.5 times of this range. b, TE was calculated with winsorized RPF counts without deduplication or with deduplication based on position and fragment length. The Spearman correlation coefficient between TE calculated with winsorized RPF counts and protein abundance82 (y-axis) was plotted against “delta correlation” (x-axis) defined by subtracting the correlation values obtained with PCR deduplication from those obtained with the method using winsorized RPF counts without deduplication.

Extended Data Fig. 6 |. PCR vs UMI deduplication comparison for GSE144140.

a, Metagene plots centered on the start codon for samples GSM4282032 (RPF range: 28–36 nt), GSM4282033 (RPF range: 28–36 nt) and GSM4282034 (RPF range: 26–35 nt) were plotted using three different deduplication methods: non-deduplication (ND), UMI-deduplication (UMI), and PCR-deduplication (PCR). b, Correlation of gene counts for GSM4282032 between the three deduplication methods. A blue diagonal line represents a 1:1 ratio in all figure panels. c,d, Same analysis as panel b for GSM4282033 (c) and GSM4282034 (d).

Extended Data Fig. 7 |. Conservation of gene expression between human and mouse.

a, The relationship between the mean RNA expressions (clr-transformed counts) of 9,194 orthologous genes across two species is plotted. Dots represent genes in all figure panels. b, The variability of genes’ RNA expression was quantified with metric standard deviation (msd; Methods) across different cell lines and tissues in either human or mouse. To account for the correlation between mean RNA expression and its variability, we adjusted the msd values with their mean values (Methods). c, The scatter plot shows the adjusted msd values (y-axis; Methods) and the average TE across different cell types (x-axis) for human genes. d, Similar analysis as in panel c for mouse genes.

Extended Data Fig. 8 |. Evaluation of TEC calculation methods and TEC patterns.

a, The AUROCs for biological functions were calculated using the similarity scores among genes at ribosome occupancy level determined by eight distinct methods with 1,794 human ribosome profiling data (Methods). In the boxplot, the horizontal line corresponds to the median. The box represents the IQR and the whiskers extend to the largest value within 1.5 times the IQR from the hinge. The dot in this figure represents the AUROC for human 5′ TOP mRNAs. b, TE values from the original data were randomly reassigned for each gene (shuffled), and TEC was then calculated. In the figure panel, we plotted the number of orthologous gene pairs within specified ranges. Each dot represents the aggregated log10-transformed counts of these gene pairs. The dashed line captures 95% of the data. c, Distribution of absolute TEC among 110 TOP motif-containing mRNAs123 and 83 transcripts targeted by CSDE1 (Supplementary Table 22 (ref. 47); Methods) in comparison to all 11,149 human genes as background. Statistical significance between the groups was assessed using a Wilcoxon two-tailed test.

Extended Data Fig. 9 |. TEC and RNA co-expression among genes with shared functions.

a, A comparison between the number of human GO terms that have AUROC of 0.8 or higher with either TEC or RNA co-expression. b, Motif enrichment in human GO terms. RNA binding proteins (RBPs) from oRNAment94 or Transite95 are indicated. P-values were corrected using the Holm method and those kmers with an adjusted p-value < 0.05 are shown. c, Venn diagram for mouse GO terms that achieve an AUROC of 0.8 or higher with proportionality scores (rho) among genes at either TE or RNA expression level. d, The AUROC plot was calculated with genes associated with mannosyltransferase activity in mice. e, The connections represent absolute rho values above 0.1 in either TE pattern alone (green) from d, in both RNA co-expression and TE pattern (blue), or RNA co-expression alone (gray). f, Motif enrichment in mouse GO terms. RNA binding proteins (RBPs) from oRNAment94 or Transite95 are indicated. P-values were corrected using the Holm method and those kmers with an adjusted p-value < 0.05 are shown. g, We summarized GO terms where genes exhibit greater similarity at the TE level than at the RNA expression level (AUROC with TEC > 0.8, and difference in AUROC between TEC and RNA co-expression > 0.1) in mice. We visualized the distribution of absolute rho score for gene pairs within each specific GO term (bottom; gene pairs with abs(rho) > 0.1) at the TE level.

Extended Data Fig. 10 |. 3D structure of the interaction between LRRC28 and FOXK1.

a, AlphaFold2-Multimer predicted binding between LRRC28 and FOXK1. Kinetic ECAR response of b, MCF-7 cell line (n = 6; stable overexpression) and c, HEK293T cell line (n = 6; stable overexpression) overexpressing LRRC28 or LRRC42 to 10 mM glucose and 100 mM 2-DG. Unpaired two-sided Student’s t-test (MCF-7: measurement 4 p = 0.06, 5 p = 0.1, 6 p = 0.3; HEK293T: measurement 4 p = 0.6, 5 p = 0.8, 6 p = 0.4). Panels b and c show mean ± s.d.; n denotes biologically independent experiments.

Supplementary Material

Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41587-025-02718-5.

Acknowledgements

We thank all contributors to metadata curation: H. Chiang, A. Hoffman, T. Tonn, A. Segura, C. Tante, E. Vasquez and L. Xu. We also thank Y. Shin and V. D. Chapman for their help with the experiments. We appreciate M. Miladi for providing critical feedback. The original text in this paper was written by the authors. A large language model was used to suggest edits for clarity and grammar (OpenAI ChatGPT, https://chat.openai.com). The authors acknowledge the Texas Advanced Computing Center at The University of Texas at Austin (http://www.tacc.utexas.edu) for providing high-performance computing and storage resources that contributed to the research results reported within this paper.

Research reported in this publication was supported in part by the National Institute of General Medical Sciences of the National Institutes of Health (NIH) under award numbers R35GM150667 (C.C.) and R35GM138340 (E.S.C.). This work was also supported by NIH grant HD110096 (C.C.) and Welch Foundation grants F-2027-20230405 (C.C.) and F-2133-20230405 (E.S.C.). C.C. was a Cancer Prevention and Research Institute of Texas (CPRIT) Scholar in Cancer Research, supported by CPRIT grant RR180042.

Footnotes

Online content

Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41587-025-02718-5.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Competing interests

D.Z., J.W. and V.A. are employees of Sanofi and may hold shares and/or stock options in the company. H.O. is an employee of Sail Biomedicines. I.H. is an employee of Monoceros Biosystems. The remaining authors declare no competing interests.

Extended data is available for this paper at https://doi.org/10.1038/s41587-025-02718-5.

Data availability

Metadata about RiboBase can be found in Supplementary Table 1. Ribo files for the HeLa cell line are accessible via Zenodo at https://doi.org/10.5281/zenodo.15660080 (ref. 103). Full TEC and RNA co-expression matrices are accessible via Zenodo at https://doi.org/10.5281/zenodo.10373032 (ref. 127). A RiboFlow configuration file and processed ribo files for RBP knockout can be accessed via Zenodo at https://doi.org/10.5281/zenodo.11388478 (ref. 135). Sequencing data and ribo files for the RBP knockout experiments are available under GEO accession code GSE269734.

Code availability

The code and data used in this study are available via Zenodo at https://doi.org/10.5281/zenodo.10373032 (ref. 127) and via GitHub at https://github.com/CenikLab/TE_model. The code and data used to generate figures can be found via Zenodo at https://doi.org/10.5281/zenodo.15337774 (ref. 136) and via GitHub at https://github.com/CenikLab/coTE_paper.

References

  • 1. Tang F et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat. Methods 6, 377–382 (2009).
  • 2. Nagalakshmi U et al. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 320, 1344–1349 (2008).
  • 3. Mortazavi A, Williams BA, McCue K, Schaeffer L & Wold B Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621–628 (2008).
  • 4. Schena M, Shalon D, Davis RW & Brown PO Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270, 467–470 (1995).
  • 5. Chen KH, Boettiger AN, Moffitt JR, Wang S & Zhuang X RNA imaging. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015).
  • 6. Combs PA & Eisen MB Sequencing mRNA from cryo-sliced Drosophila embryos to determine genome-wide spatial patterns of gene expression. PLoS ONE 8, e71820 (2013).
  • 7. Achim K et al. High-throughput spatial mapping of single-cell RNA-seq data to tissue of origin. Nat. Biotechnol 33, 503–509 (2015).
  • 8. Langfelder P & Horvath S WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008).
  • 9. Eisen MB, Spellman PT, Brown PO & Botstein D Cluster analysis and display of genome-wide expression patterns. Proc. Natl Acad. Sci. USA 95, 14863–14868 (1998).
  • 10. Skinnider MA, Squair JW & Foster LJ Evaluating measures of association for single-cell transcriptomics. Nat. Methods 16, 381–386 (2019).
  • 11. Stuart JM, Segal E, Koller D & Kim SK A gene-coexpression network for global discovery of conserved genetic modules. Science 302, 249–255 (2003).
  • 12. Marcotte EM, Pellegrini M, Thompson MJ, Yeates TO & Eisenberg D A combined algorithm for genome-wide prediction of protein function. Nature 402, 83–86 (1999).
  • 13. DeRisi JL, Iyer VR & Brown PO Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278, 680–686 (1997).
  • 14. Jansen R, Greenbaum D & Gerstein M Relating whole-genome expression data with protein–protein interactions. Genome Res. 12, 37–46 (2002).
  • 15. Szklarczyk D et al. The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 51, D638–D646 (2023).
  • 16. Tavazoie S, Hughes JD, Campbell MJ, Cho RJ & Church GM Systematic determination of genetic network architecture. Nat. Genet 22, 281–285 (1999).
  • 17. Roth FP, Hughes JD, Estep PW & Church GM Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nat. Biotechnol 16, 939–945 (1998).
  • 18. Nusinow DP et al. Quantitative proteomics of the Cancer Cell Line Encyclopedia. Cell 180, 387–402 (2020).
  • 19. Gonçalves E et al. Pan-cancer proteomic map of 949 human cell lines. Cancer Cell 40, 835–849 (2022).
  • 20. Ryan CJ, Kennedy S, Bajrami I, Matallanas D & Lord CJ A compendium of co-regulated protein complexes in breast cancer reveals collateral loss events. Cell Syst. 5, 399–409 (2017).
  • 21. Singh G, Pratt G, Yeo GW & Moore MJ The clothes make the mRNA: past and present trends in mRNP fashion. Annu. Rev. Biochem 84, 325–354 (2015).
  • 22. Keene JD & Tenenbaum SA Eukaryotic mRNPs may represent posttranscriptional operons. Mol. Cell 9, 1161–1167 (2002).
  • 23. Keene JD RNA regulons: coordination of post-transcriptional events. Nat. Rev. Genet 8, 533–543 (2007).
  • 24. Li G-W, Burkhardt D, Gross C & Weissman JS Quantifying absolute protein synthesis rates reveals principles underlying allocation of cellular resources. Cell 157, 624–635 (2014).
  • 25. Taggart JC & Li G-W Production of protein-complex components is stoichiometric and lacks general feedback regulation in eukaryotes. Cell Syst. 7, 580–589 (2018).
  • 26. Amirbeigiarab S et al. Invariable stoichiometry of ribosomal proteins in mouse brain tissues with aging. Proc. Natl Acad. Sci. USA 116, 22567–22572 (2019).
  • 27. Soto I et al. Balanced mitochondrial and cytosolic translatomes underlie the biogenesis of human respiratory complexes. Genome Biol. 23, 170 (2022).
  • 28. Natan E et al. Cotranslational protein assembly imposes evolutionary constraints on homomeric proteins. Nat. Struct. Mol. Biol 25, 279–288 (2018).
  • 29. Li G-W, Oh E & Weissman JS The anti-Shine–Dalgarno sequence drives translational pausing and codon choice in bacteria. Nature 484, 538–541 (2012).
  • 30. Bertolini M et al. Interactions between nascent proteins translated by adjacent ribosomes drive homomer assembly. Science 371, 57–64 (2021).
  • 31. Ozadam H, Geng M & Cenik C RiboFlow, RiboR and RiboPy: an ecosystem for analyzing ribosome profiling data at read length resolution. Bioinformatics 36, 2929–2931 (2020).
  • 32. Gerashchenko MV & Gladyshev VN Ribonuclease selection for ribosome profiling. Nucleic Acids Res. 45, e6 (2017).
  • 33. Mohammad F, Green R & Buskirk AR A systematically-revised ribosome profiling method for bacteria reveals pauses at single-codon resolution. eLife 8, e42591 (2019).
  • 34. Ingolia NT, Ghaemmaghami S, Newman JRS & Weissman JS Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324, 218–223 (2009).
  • 35. Larsson O, Sonenberg N & Nadon R Identification of differential translation in genome wide studies. Proc. Natl Acad. Sci. USA 107, 21487–21492 (2010).
  • 36. van den Boogaart KG, Filzmoser P, Hron K, Templ M & Tolosana-Delgado R Classical and robust regression analysis with compositional data. Math. Geosci 53, 823–858 (2021).
  • 37. Quinn TP et al. A field guide for the compositional analysis of any-omics data. Gigascience 8, giz107 (2019).
  • 38. Quinn TP, Richardson MF, Lovell D & Crowley TM propr: an R-package for identifying proportionally abundant features using compositional data analysis. Sci. Rep 7, 16252 (2017).
  • 39. Sudmant PH, Alexis MS & Burge CB Meta-analysis of RNA-seq expression data across species, tissues and studies. Genome Biol. 16, 287 (2015).
  • 40. Wang Z-Y et al. Transcriptome and translatome co-evolution in mammals. Nature 588, 642–647 (2020).
  • 41. Lu P, Takai K, Weaver VM & Werb Z Extracellular matrix degradation and remodeling in development and disease. Cold Spring Harb. Perspect. Biol 3, a005058 (2011).
  • 42. Artieri CG & Fraser HB Evolution at two levels of gene expression in yeast. Genome Res. 24, 411–421 (2014).
  • 43. McManus CJ, May GE, Spealman P & Shteyman A Ribosome profiling reveals post-transcriptional buffering of divergent gene expression in yeast. Genome Res. 24, 422–430 (2014).
  • 44. Breschi A, Gingeras TR & Guigó R Comparative transcriptomics in human and mouse. Nat. Rev. Genet 18, 425–440 (2017).
  • 45. Crow M, Suresh H, Lee J & Gillis J Coexpression reveals conserved gene programs that co-vary with cell type across kingdoms. Nucleic Acids Res. 50, 4302–4314 (2022).
  • 46. Thoreen CC et al. A unifying model for mTORC1-mediated regulation of mRNA translation. Nature 485, 109–113 (2012).
  • 47. Wurth L et al. UNR/CSDE1 drives a post-transcriptional program to promote melanoma invasion and metastasis. Cancer Cell 36, 337 (2019).
  • 48. Pierson E et al. Sharing and specificity of co-expression networks across 35 human tissues. PLoS Comput. Biol 11, e1004220 (2015).
  • 49. Kershaw CJ et al. Translation factor and RNA binding protein mRNA interactomes support broader RNA regulons for posttranscriptional control. J. Biol. Chem 299, 105195 (2023).
  • 50. Hentze MW, Castello A, Schwarzl T & Preiss T A brave new world of RNA-binding proteins. Nat. Rev. Mol. Cell Biol 19, 327–341 (2018).
  • 51. Liu Y The number of genes whose TE significantly correlates with an RBP’s expression. Zenodo 10.5281/zenodo.11359114 (2024).
  • 52. Korbel JO, Jensen LJ, von Mering C & Bork P Analysis of genomic context: prediction of functional associations from conserved bidirectionally transcribed gene pairs. Nat. Biotechnol 22, 911–917 (2004).
  • 53. Szklarczyk R et al. WeGET: predicting new genes for molecular systems by weighted co-expression. Nucleic Acids Res. 44, D567–D573 (2016).
  • 54. Zhang M et al. RNA-binding protein IMP3 is a novel regulator of MEK1/ERK signaling pathway in the progression of colorectal cancer through the stabilization of MEKK1 mRNA. J. Exp. Clin. Cancer Res 40, 200 (2021).
  • 55. Bodén M & Bailey TL Associating transcription factor-binding site motifs with target GO terms and target genes. Nucleic Acids Res. 36, 4108–4117 (2008).
  • 56. Machanick P & Bailey TL MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics 27, 1696–1697 (2011).
  • 57. Eichhorn SW et al. mRNA destabilization is the dominant effect of mammalian microRNAs by the time substantial repression ensues. Mol. Cell 56, 104–115 (2014).
  • 58. Bartel DP Metazoan microRNAs. Cell 173, 20–51 (2018).
  • 59. Mecham R The Extracellular Matrix: An Overview (Springer Science & Business Media, 2011).
  • 60. Kagan HM & Li W Lysyl oxidase: properties, specificity, and biological roles inside and outside of the cell. J. Cell. Biochem 88, 660–672 (2003).
  • 61. Kikuchi A et al. Structural basis for activation of DNMT1. Nat. Commun 13, 7130 (2022).
  • 62. Wu Y-Y et al. The hTERT-p50 homodimer inhibits PLEKHA7 expression to promote gastric cancer invasion and metastasis. Oncogene 42, 1144–1156 (2023).
  • 63. Kurita S, Yamada T, Rikitsu E, Ikeda W & Takai Y Binding between the junctional proteins afadin and PLEKHA7 and implication in the formation of adherens junction in epithelial cells. J. Biol. Chem 288, 29356–29368 (2013).
  • 64. Pulimeno P, Paschoud S & Citi S A role for ZO-1 and PLEKHA7 in recruiting paracingulin to tight and adherens junctions of epithelial cells. J. Biol. Chem 286, 16743–16750 (2011).
  • 65. Jeung H-C et al. PLEKHA7 signaling is necessary for the growth of mutant KRAS driven colorectal cancer. Exp. Cell. Res 409, 112930 (2021).
  • 66. Tavano S et al. Insm1 induces neural progenitor delamination in developing neocortex via downregulation of the adherens junction belt-specific protein Plekha7. Neuron 97, 1299–1314 (2018).
  • 67. Sukonina V et al. FOXK1 and FOXK2 regulate aerobic glycolysis. Nature 566, 279–283 (2019).
  • 68. Kobe B & Kajava AV The leucine-rich repeat as a protein recognition motif. Curr. Opin. Struct. Biol 11, 725–732 (2001).
  • 69. Evans R et al. Protein complex prediction with AlphaFold-Multimer. Preprint at bioRxiv 10.1101/2021.10.04.463034 (2021).
  • 70. Carlsson P & Mahlapuu M Forkhead transcription factors: key players in development and metabolism. Dev. Biol 250, 1–23 (2002).
  • 71. Lambert SA et al. The human transcription factors. Cell 172, 650–665 (2018).
  • 72. Kustatscher G et al. Co-regulation map of the human proteome enables identification of protein functions. Nat. Biotechnol 37, 1361–1371 (2019).
  • 73. Szklarczyk D et al. STRING v10: protein–protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 43, D447–D452 (2015).
  • 74. Shiber A et al. Cotranslational assembly of protein complexes in eukaryotes revealed by ribosome profiling. Nature 561, 268–272 (2018).
  • 75. Ewing RM et al. Large-scale mapping of human protein–protein interactions by mass spectrometry. Mol. Syst. Biol 3, 89 (2007).
  • 76. Drew K, Wallingford JB & Marcotte EM hu.MAP 2.0: integration of over 15,000 proteomic experiments builds a global compendium of human multiprotein assemblies. Mol. Syst. Biol 17, e10016 (2021).
  • 77. Heider MR et al. Subunit connectivity, assembly determinants and architecture of the yeast exocyst complex. Nat. Struct. Mol. Biol 23, 59–66 (2016).
  • 78. Kee Y et al. Subunit structure of the mammalian exocyst complex. Proc. Natl Acad. Sci. USA 94, 14438–14443 (1997).
  • 79. Lalanne J-B et al. Evolutionary convergence of pathway-specific enzyme expression stoichiometry. Cell 173, 749–761 (2018).
  • 80. Bicknell AA et al. Attenuating ribosome load improves protein output from mRNA by limiting translation-dependent mRNA decay. Cell Rep. 43, 114098 (2024).
  • 81. Liu T-Y et al. Time-resolved proteomics extends ribosome profiling-based measurements of protein synthesis dynamics. Cell Syst. 4, 636–644 (2017).
  • 82. Wang M, Herrmann CJ, Simonovic M, Szklarczyk D & von Mering C Version 4.0 of PaxDb: protein abundance data, integrated across model organisms, tissues, and cell-lines. Proteomics 15, 3163–3168 (2015).
  • 83. Piepoli A et al. The expression of leucine-rich repeat gene family members in colorectal cancer. Exp. Biol. Med 237, 1123–1128 (2012).
  • 84. Liu Y et al. Identification of differential expression of genes in hepatocellular carcinoma by suppression subtractive hybridization combined cDNA microarray. Oncol. Rep 18, 943–951 (2007).
  • 85. Chen H et al. miR-218 contributes to drug resistance in multiple myeloma via targeting LRRC28. J. Cell. Biochem 122, 305–314 (2021).
  • 86. Vander Heiden MG, Cantley LC & Thompson CB Understanding the Warburg effect: the metabolic requirements of cell proliferation. Science 324, 1029–1033 (2009).
  • 87. Liu Y et al. Histone H2AX promotes metastatic progression by preserving glycolysis via hexokinase-2. Sci. Rep 12, 3758 (2022).
  • 88. Zheng D et al. Predicting the translation efficiency of messenger RNA in mammalian cells. Nat. Biotechnol 10.1038/s41587-025-02712-x (2025).
  • 89. Rodriguez JM et al. APPRIS: annotation of principal and alternative splice isoforms. Nucleic Acids Res. 41, D110–D117 (2013).
  • 90. Rao S et al. Genes with 5′ terminal oligopyrimidine tracts preferentially escape global suppression of translation by the SARS-CoV-2 Nsp1 protein. RNA 27, 1025–1045 (2021).
  • 91. Mills EW & Green R Ribosomopathies: there’s strength in numbers. Science 358, eaan2755 (2017).
  • 92. Ozadam H et al. Single-cell quantification of ribosome occupancy in early mouse development. Nature 618, 1057–1064 (2023).
  • 93. VanInsberghe M, van den Berg J, Andersson-Rolf A, Clevers H & van Oudenaarden A Single-cell Ribo-seq reveals cell cycle-dependent translational pausing. Nature 597, 561–565 (2021).
  • 94. Benoit Bouvrette LP, Bovaird S, Blanchette M & Lécuyer E oRNAment: a database of putative RNA binding protein target sites in the transcriptomes of model species. Nucleic Acids Res. 48, D166–D173 (2020).
  • 95. Krismer K et al. Transite: a computational motif-based analysis platform that identifies RNA-binding proteins modulating changes in gene expression. Cell Rep. 32, 108064 (2020).
  • 96. Van Nostrand EL et al. A large-scale binding and functional map of human RNA-binding proteins. Nature 583, 711–719 (2020).
  • 97. Hou Y, Xie T, He L, Tao L & Huang J Topological links in predicted protein complex structures reveal limitations of AlphaFold. Commun. Biol 6, 1098 (2023).
  • 98. Burke DF et al. Towards a structurally resolved human protein interaction network. Nat. Struct. Mol. Biol 30, 216–225 (2023).
  • 99. Bryant P, Pozzati G & Elofsson A Improved prediction of protein-protein interactions using AlphaFold2. Nat. Commun 13, 1265 (2022).
  • 100. National Center for Biotechnology Information. SRA Tools. GitHub https://github.com/ncbi/sra-tools (2018).
  • 101. Martin M Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10–12 (2011).
  • 102. Langmead B & Salzberg SL Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
  • 103. Liu Y HeLa ribosome profiling data. Zenodo 10.5281/zenodo.15660080 (2024).
  • 104. Gerashchenko MV & Gladyshev VN Translation inhibitors cause abnormalities in ribosome profiling experiments. Nucleic Acids Res. 42, e134 (2014).
  • 105. Wu CC-C, Zinshteyn B, Wehner KA & Green R High-resolution ribosome profiling defines discrete ribosome elongation states and translational regulation during cellular stress. Mol. Cell 73, 959–970 (2019).
  • 106. Wolin SL & Walter P Ribosome pausing and stacking during translation of a eukaryotic mRNA. EMBO J. 7, 3559–3569 (1988).
  • 107. Sharma J et al. A small molecule that induces translational readthrough of CFTR nonsense mutations by eRF1 depletion. Nat. Commun 12, 4358 (2021).
  • 108. Tukey JW The future of data analysis. Ann. Math. Stat 33, 1–67 (1962).
  • 109. Zhang X-O, Yin Q-F, Chen L-L & Yang L Gene expression profiling of non-polyadenylated RNA-seq across species. Genom. Data 2, 237–241 (2014).
  • 110. Yang L, Duff MO, Graveley BR, Carmichael GG & Chen L-L Genomewide characterization of non-polyadenylated RNAs. Genome Biol. 12, R16 (2011).
  • 111. van den Boogaart KG & Tolosana-Delgado R Analyzing Compositional Data with R (Springer, 2013).
  • 112. Cenik C et al. Integrative analysis of RNA, translation, and protein levels reveals distinct regulatory variation across humans. Genome Res. 25, 1610–1621 (2015).
  • 113. Greenacre M Compositional data analysis. Annu. Rev. Stat. Appl 8, 271–299 (2021).
  • 114. Ramsköld D, Wang ET, Burge CB & Sandberg R An abundance of ubiquitously expressed genes revealed by tissue transcriptome sequence data. PLoS Comput. Biol 5, e1000598 (2009).
  • 115. Csárdi G, Franks A, Choi DS, Airoldi EM & Drummond DA Accounting for experimental noise reveals that mRNA levels, amplified by post-transcriptional processes, largely determine steady-state protein levels in yeast. PLoS Genet. 11, e1005206 (2015).
  • 116. Schilder BM & Skene NG orthogene: An R package for easy mapping of orthologous genes across hundreds of species. R package version 3.21 10.18129/B9.bioc.orthogene (2022).
  • 117. van den Boogaart KG & Tolosana-Delgado R ‘compositions’: a unified R package to analyze compositional data. Comput. Geosci 34, 320–338 (2008).
  • 118. Kim S ppcor: an R package for a fast calculation to semi-partial correlation coefficients. Commun. Stat. Appl. Methods 22, 665–674 (2015).
  • 119. Berriz GF, Beaver JE, Cenik C, Tasan M & Roth FP Next generation software for functional trend analysis. Bioinformatics 25, 3043–3044 (2009).
  • 120. Buttrey S & Whitaker L TreeClust: an R package for tree-based clustering dissimilarities. R J. 7, 227 (2015).
  • 121. Wainberg M et al. A genome-wide atlas of co-essential modules assigns function to uncharacterized genes. Nat. Genet 53, 638–649 (2021).
  • 122. Gene Ontology Consortium et al. The Gene Ontology knowledgebase in 2023. Genetics 224, iyad031 (2023).
  • 123. Philippe L, van den Elzen AMG, Watson MJ & Thoreen CC Global analysis of LARP1 translation targets reveals tunable and dynamic features of 5′ TOP motifs. Proc. Natl Acad. Sci. USA 117, 5319–5328 (2020).
  • 124. Ballouz S, Weber M, Pavlidis P & Gillis J EGAD: ultra-fast functional analysis of gene networks. Bioinformatics 33, 612–614 (2017).
  • 125. Carlson M org.Mm.eg.db: Genome wide annotation for mouse. R package version 3.21 10.18129/B9.bioc.org.Mm.eg.db (2025).
  • 126. Carlson M org.Hs.eg.db: Genome wide annotation for human. R package version 3.21 10.18129/B9.bioc.org.Hs.eg.db (2025).
  • 127. Liu Y Intermediate data for TE calculation. Zenodo 10.5281/zenodo.10373032 (2024).
  • 128. Jumper J et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
  • 129. The UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 51, D523–D531 (2023).
  • 130. Hu Y et al. Paralog Explorer: a resource for mining information about paralogs in common research organisms. Comput. Struct. Biotechnol. J 20, 6570–6577 (2022).
  • 131. Quinlan AR & Hall IM BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
  • 132. Ho D, Imai K, King G & Stuart EA MatchIt: nonparametric preprocessing for parametric causal inference. J. Stat. Softw 10.18637/jss.v042.i08 (2011).
  • 133. Sanson KR et al. Optimized libraries for CRISPR–Cas9 genetic screens with multiple modalities. Nat. Commun 9, 5416 (2018).
  • 134. Sanjana NE, Shalem O & Zhang F Improved vectors and genome-wide libraries for CRISPR screening. Nat. Methods 11, 783–784 (2014).
  • 135. Liu Y KO_validation_RiboBase. Zenodo 10.5281/zenodo.11388478 (2024).
  • 136. Liu Y coTE_paper: code and to generate main figures. Zenodo 10.5281/zenodo.15337774 (2025).
