This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2023 Nov 23:2023.05.11.540322. Originally published 2023 May 12. [Version 2] doi: 10.1101/2023.05.11.540322

A ubiquitous GC content signature underlies multimodal mRNA regulation by DDX3X

Ziad Jowhar 1,9,10,#, Albert Xu 1,9,10,#, Srivats Venkataramanan 1, Francesco Dossena 8, Mariah L Hoye 2, Debra L Silver 2,3,4,5,6, Stephen N Floor 1,7,*, Lorenzo Calviello 8,*
PMCID: PMC10197686  PMID: 37214951

Abstract

The road from transcription to protein synthesis is paved with many obstacles, allowing for several modes of post-transcriptional regulation of gene expression. A fundamental player in mRNA biology is DDX3X, an RNA binding protein that canonically regulates mRNA translation. By monitoring dynamics of mRNA abundance and translation following DDX3X depletion, we observe stabilization of translationally suppressed mRNAs. We use interpretable statistical learning models to uncover GC content in the coding sequence as the major feature underlying RNA stabilization. This result corroborates GC content-related mRNA regulation detectable in other studies, including hundreds of ENCODE datasets and recent work focusing on mRNA dynamics in the cell cycle. We provide further evidence for mRNA stabilization by detailed analysis of RNA-seq profiles in hundreds of samples, including a Ddx3x conditional knockout mouse model exhibiting cell cycle and neurogenesis defects. Our study identifies a ubiquitous feature underlying mRNA regulation and highlights the importance of quantifying multiple steps of the gene expression cascade, where RNA abundance and protein production are often uncoupled.

Introduction

The cytoplasmic fate of RNA molecules is shaped by their subcellular localization, RNA binding partners, and engagement with the ribosomal machinery. These aspects are strongly interconnected1, which poses a great challenge, as it increases the number of variables and experimental approaches needed to answer many questions in mRNA biology. To this end, many protocols couple biochemical isolation, or metabolic labeling, of RNA with high-throughput sequencing technologies, thus providing a sensitive snapshot of the transcriptome at specific stages of the mRNA life cycle. For example, high-throughput sequencing protocols coupled to ribosome isolation as in Ribo-seq2, metabolic labeling as in SLAM-seq3, immunoprecipitation of RNA binding proteins (RBPs) as in CLIP-seq4, and many others have shed light on regulatory mechanisms pertaining to different aspects of post-transcriptional gene regulation.

DDX3X is a multifunctional RNA helicase that is highly expressed in many tissues and able to unwind structured RNA to influence cytoplasmic post-transcriptional gene regulation5. Together with its ability to bind initiating ribosomes, DDX3X has often been described as a translation regulator, specifically promoting translation of RNA with structured 5’UTRs6,7. However, as mentioned above, cytoplasmic processes like translation and mRNA decay are intertwined, and connections between the two processes encompass different molecular mechanisms, such as mRNA surveillance mechanisms like nonsense-mediated decay (NMD)8, ribosome-collision dependent mRNA cleavage9, and others. In order to understand when and how such processes are coupled, it is important to study the dynamics of such mechanisms. For instance, it has been proposed that miRNAs can first trigger translation suppression and then mRNA deadenylation and decapping, leading to RNA degradation10.

Mutations in DDX3X are associated with a variety of human diseases including cancers and developmental delay11. Variant types are disease selective in DDX3X, with cancers ranging from primarily loss-of-function alleles in NK-TCL and other blood cancers to nearly exclusively missense variants in medulloblastoma11. In DDX3X syndrome, missense variants are phenotypically more severe than loss-of-function. Previously, we used functional genomics approaches to identify mechanistic differences between depletion of DDX3X and expression of missense variants7. We found that DDX3X missense variants predominantly affect ribosome occupancy while DDX3X depletion impacts both ribosome occupancy and RNA levels. However, it is unclear whether the changes in RNA levels constituted a cellular response to translation suppression, often described as “buffering”12.

mRNA regulation has been linked to neurogenesis during development, where multiple RNA binding factors, including DDX3X, ensure correct protein synthesis as cells transition between different fates and states13. To that end, it is important to think about the dynamics of gene expression, as complex dynamics of cell proliferation and differentiation ensure correct developmental patterning.

In order to access such complex interplays of a multitude of factors which shape gene expression, large-scale consortia have provided a great resource for investigations into gene regulation. While historically devoted to promoting investigation into transcriptional regulation, recent efforts started to provide precious information into post-transcriptional mechanisms, with hundreds of RBPs profiled in terms of both binding and function, by means of CLIP-seq, and knockdown followed by RNA-seq14. As in biology many molecular processes are interconnected, large-scale datasets and data amenable to re-analysis are at the very heart of many research efforts15.

Here, we identify how inactivation of DDX3X evolves over time to lead to acute and long-term changes to post-transcriptional gene regulation. We here employ different analytical approaches applied to newly generated experimental data and many previously published studies related to mRNA regulation, to show that GC content is associated with mRNA stability changes following DDX3X depletion. Our analyses indicate that this effect is widespread and is associated with cell cycle changes in mRNA regulation, including RNA stability. This further reinforces roles for DDX3X in RNA stability in addition to translation. Together, our work represents a significant advancement in the understanding of a fundamental regulator, which sits at the very heart of the gene expression cascade.

Results

Time-resolved gene expression regulation by DDX3X.

To characterize the dynamics of DDX3X-dependent changes in the gene expression cascade, we employed a previously validated auxin-degron system to efficiently deplete DDX3X protein in the HCT116 colorectal cancer cell line16. In that system, we found near-complete rescue of gene expression changes by DDX3X expression, which makes it a suitable tool to monitor DDX3X-dependent changes. We profiled RNA levels and translation using RNA-seq and Ribo-seq along a time-course of DDX3X depletion, at 4, 8, 16, 24 and 48 hours after auxin or DMSO control treatment (Figure 1A). Efficiency of DDX3X depletion, together with quality control and general statistics of the generated libraries, can be found in Supplementary Figure 1 and Supplementary Table 1. As expected, the number of differentially expressed genes increased along the time-course, with most changes supporting the role of DDX3X as a positive regulator of translation (Figure 1B). Changes in translation were negatively correlated with changes in mRNA levels, which together contributed to many changes in Translation Efficiency (TE), calculated using Ribo-seq changes given RNA-seq changes (Methods). At a closer look, we observed how “TE_down” mRNAs undergo translation suppression at early time points after DDX3X depletion, with their mRNA levels increasing at later time points (Figure 1C). The opposite behavior is observed for “TE_up” mRNAs, which exhibit higher ribosome occupancy first, and lower mRNA levels later. Such behavior was more evident when showing time-point specific changes and binning mRNAs in a 2D grid on the Ribo-seq/RNA-seq coordinate plane (Figure 1D, Methods), which highlighted a common regulatory mode, with early translation regulation followed by changes in mRNA levels.
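
As a rough illustration of the TE calculation described above, the following Python sketch approximates a change in translation efficiency as the Ribo-seq log2 fold change minus the RNA-seq log2 fold change. This is a simplification and not the statistical test used in this study (see Methods); the function names, the pseudocount and the 0.5 cutoff are assumptions made for illustration only.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """Log2 ratio of treated over control counts, with a pseudocount to avoid log(0)."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def delta_te(ribo_treated, ribo_control, rna_treated, rna_control):
    """Approximate TE change: Ribo-seq log2FC minus RNA-seq log2FC."""
    ribo_lfc = log2_fold_change(ribo_treated, ribo_control)
    rna_lfc = log2_fold_change(rna_treated, rna_control)
    return ribo_lfc - rna_lfc


def classify(dte, threshold=0.5):
    """Label an mRNA by its TE change; the threshold is a hypothetical cutoff."""
    if dte <= -threshold:
        return "TE_down"
    if dte >= threshold:
        return "TE_up"
    return "NS"


# Made-up counts: fewer ribosome footprints, slightly more RNA after depletion.
example = delta_te(ribo_treated=50, ribo_control=100, rna_treated=120, rna_control=100)
print(classify(example))
```

A real analysis would also require replicate-aware significance testing, which this sketch deliberately omits.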

Figure 1: Dynamics of mRNA regulation by DDX3X.

In A) a description of the experimental design. Below Ribo-seq and RNA-seq fold changes at different time points. Different regulated classes are shown in different colors. The size of the dots indicates the adjusted P-values for differential translation efficiency test (Methods). TE: translation efficiency, NS: not significant. In B) average delta TE values (differences in TE values) for each class along the time course. The size of the dots indicates the number of significantly changing mRNAs. C) progression along the time course for mRNA regulated 48h post degron induction. RNA-seq and Ribo-seq fold changes are shown at each time point. D) Differences in Ribo-seq or RNA-seq fold changes between each time point and the previous one, shown as a vector plot. Magnitude of changes shown as a color gradient, while transparency of the vectors indicates the number of mRNAs in each coordinate bin (Methods).

This analysis shows the time-resolved dynamics of mRNA regulation by DDX3X, with hundreds of mRNAs changing in their steady-state levels albeit showing the opposite directionality in translation rates.

Translation suppression by DDX3X is coupled with mRNA stabilization.

Changes to transcript levels can result from changes in transcription rates or post-transcriptional regulation. To identify the relative contribution of different processes to RNA levels, we used our time-course dataset to calculate changes in transcription, processing and stability using INSPEcT17. INSPEcT uses the proportion of intronic versus exonic reads to identify nascent vs. mature transcripts, and uses a system of ordinary differential equations (ODEs) to infer rates of RNA synthesis, processing and decay. Compared to non-regulated mRNAs, regulated mRNAs showed modest changes in transcription rates, suggesting transcription changes are not the major contributor to RNA level changes following DDX3X depletion. In contrast, we found more pronounced changes in mRNA stability, as evidenced by “TE_down” transcripts (Figure 2A). As our initial RNA-seq protocol was not designed to capture pre-mRNA molecules, we validated our estimated mRNA stability changes by employing the 4sU metabolic labeling SLAM-seq protocol3 in our degron system after 8 hours of DDX3X depletion, so as to detect changes in mRNA stability at early time points. Briefly, cells were incubated with 4sU to comprehensively label transcribed RNAs, and their abundance was followed after 8h of DDX3X degron activation, using DMSO as control. 4sU treatment induces T>C conversions in the sequenced cDNA molecules, which can be used to monitor mRNA stability changes after a uridine chase, as shown in Figure 2B. As expected, we observed a drastic drop in T>C harboring reads after the chase, which reflects mRNA decay rates (Supplementary Figure 2). As shown in Figure 2B, after a labeling time of 24 hours, the percentage of reads harboring T>C mutations was different for the regulated categories (Methods) after only 8 hours of degron induction, confirming the stabilization of translationally suppressed mRNAs upon DDX3X depletion.
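
To make the ODE-based reasoning above more concrete, the sketch below integrates a generic two-species model of the RNA life cycle (premature RNA produced by synthesis and removed by processing, mature RNA produced by processing and removed by decay). It is an assumed textbook form of such a model, not the INSPEcT implementation, and all rate values are hypothetical.

```python
def simulate_rna(k_syn, k_proc, k_deg, premature, mature, hours, dt=0.01):
    """Integrate a two-species RNA life-cycle model with explicit Euler steps.

    dP/dt = k_syn - k_proc * P        (premature RNA)
    dM/dt = k_proc * P - k_deg * M    (mature RNA)
    """
    steps = int(round(hours / dt))
    for _ in range(steps):
        d_premature = k_syn - k_proc * premature
        d_mature = k_proc * premature - k_deg * mature
        premature += d_premature * dt
        mature += d_mature * dt
    return premature, mature


# Hypothetical rates (per hour), chosen only to illustrate the behavior.
k_syn, k_proc, k_deg = 10.0, 5.0, 0.2
p0, m0 = k_syn / k_proc, k_syn / k_deg  # steady state before perturbation

# Halve the decay rate while keeping synthesis and processing fixed.
p48, m48 = simulate_rna(k_syn, k_proc, k_deg / 2, p0, m0, hours=48)
print(p48, m48)  # premature RNA is unchanged, mature RNA approaches k_syn / (k_deg / 2)
```

In this toy setting, a lower decay rate raises mature RNA levels without any change in synthesis, which is the kind of signature that separates stability changes from transcription changes.
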
While the modest depth and resolution of our SLAM-seq dataset (Supplementary Figure 2) did not allow for more detailed insights into mRNA changes, it represented an important validation of mRNA stability regulation by DDX3X. In addition, we profiled RNA abundance via qPCR, combining our DDX3X degron system with Actinomycin D (ActD) treatment, to measure RNA stability changes. We selected a few target genes: JUND was identified in our data as a stabilized RNA, while EIF2A was identified as degraded. RACK1, LGALS1, and PFN1 were used as normalization controls in RT-PCR with TaqMan probes. JUND RNA was stabilized after 24 hours with DDX3X knockdown and ActD treatment (Supplementary Figure 3A); EIF2A RNA was more degraded after 24 hours with DDX3X knockdown and ActD treatment (Supplementary Figure 3B). These results show an overall good agreement between the qPCR and the sequencing-based assays, despite the difficulty arising from choosing control genes and the modest fold changes observed in the sequencing data.

Figure 2: Stabilization of untranslated mRNAs.

A) Synthesis and decay as inferred by INSPEcT: different regulated classes in different colors along the time course. Log2FC of estimated rates with respect to control are shown on the y axis. B) Schematic of a SLAM-seq experiment (above). Real data shown at the bottom: percentage of T>C-containing reads on the y axis after labeling and chase. DDX3X degron (using DMSO as a control) was triggered together with the chase reaction to monitor differences in decay rates upon DDX3X depletion. Significance values from a one-sided Wilcoxon test.

By profiling ribosome occupancy, steady state transcript levels, and mRNA decay, this analysis shows that DDX3X depletion triggers multiple modes of post-transcriptional regulation, involving translation suppression and a subsequent wave of mRNA stabilization.

GC-rich coding sequences underlie mRNA regulation by DDX3X.

With hundreds of mRNAs post-transcriptionally regulated after DDX3X depletion, we aimed to identify specific features belonging to up- or downregulated targets. We therefore built regression models to quantitatively predict levels of TE changes (Methods, Supplementary Table 2). We used different biophysical properties of genes and mRNAs (e.g. length and GC content) and several gene and transcript features (e.g. introns, 3’UTR, etc.; Methods) as features for a Random Forest regression model. Given the extensive literature on codon-mediated mRNA stability regulation, we added codon frequencies and previously validated codon optimality calculations18. We also added measured GC content at the 1st, 2nd or 3rd codon position, as it was recently shown to potentially play a role in mRNA stability regulation19,20. In addition, to pinpoint features predictive of mRNA stability changes rather than translation changes exclusively, we divided transcripts according to their position in the Ribo-seq/RNA-seq coordinate system, to capture mRNAs where changes between assays agreed or not (Figure 3A, Methods). Interestingly, the categories differed in their DDX3X binding pattern (Supplementary Figure 4): re-analysis of our previously published PAR-CLIP data showed that stabilized targets (x, -xy groups) have a lower T>C conversion signal in their 5’UTRs, and a higher signal in CDS peaks, with the opposite being true for true translation targets (y group). This analysis suggests that stabilized mRNAs might be regulated differently than “canonical” translationally suppressed targets.
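
Two of the features mentioned above, GC content of the coding sequence and GC content at each codon position, can be computed directly from sequence. The following is a minimal sketch, assuming an in-frame coding sequence given as a plain string; the function names and the example sequence are hypothetical and not taken from the Methods.

```python
def gc_fraction(seq):
    """Fraction of G or C bases in a nucleotide sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def gc_by_codon_position(cds):
    """GC fraction at the 1st, 2nd and 3rd codon position of an in-frame CDS.

    Trailing bases that do not form a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return tuple(gc_fraction(cds[pos:usable:3]) for pos in range(3))


# Toy coding sequence: ATG GCC GCG TAA
toy_cds = "ATGGCCGCGTAA"
print(gc_fraction(toy_cds))           # overall GC fraction of the CDS
print(gc_by_codon_position(toy_cds))  # (position 1, position 2, position 3)
```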

Figure 3: GC content in the coding sequence predicts regulation by DDX3X.

A) Classification of different mRNAs according to their change in mRNA levels or translation. In B) model performance (correlation between predicted vs. real values) on unseen test data of the random forest regression model for transcript classes as defined in A). C) Predictive power of most informative features, with their importance values (Methods) plotted on the x axis. Features pertaining to GC content in different sections of transcripts (GCpct*), baseline translation levels (base_TE), codon frequencies (codonfr*), positional read density (posdens*), and length features (intronlen) are displayed. D) Vector plot as in Figure 1D, highlighting GCcds values. Partition of inferred degradation rates (E) or SLAM-seq profiles (F) for mRNAs partitioned by GCcds values. Significance values for SLAM-seq from a one-sided Wilcoxon test.

As shown in Figure 3B, the Random Forest model predicted TE changes with high accuracy, especially in cases where mRNA stability and translation were anti-correlated (-xy group). In addition, this model calculated the predictive power of each input feature (Figure 3C, Methods), which highlighted GC content in the coding sequence (which we will refer to as GCcds) as the most important feature. Feature selection is particularly important when the input features show high levels of multicollinearity (Supplementary Figure 5). To validate the results from the Random Forest regression, we used Lasso regression (Methods), another well-known method for feature selection. Results from the Lasso regression were similar, and also identified GC content in the coding sequence as the most relevant feature in predicting TE changes (Supplementary Figure 6). GC content in the CDS remained the top predictor when using additional features, such as GC content in different sections of the CDS, or amino acid frequencies (Supplementary Figure 7).
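
To illustrate how Lasso regression can act as a feature selector when inputs are correlated, here is a minimal coordinate-descent sketch in plain Python. It is not the pipeline used in this study (see Methods): it assumes centered columns and no intercept, the penalty value is arbitrary, and the two toy features are made up.

```python
def soft_threshold(value, penalty):
    """Shrink a value toward zero by `penalty`, returning 0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso(X, y, alpha, n_iter=200):
    """Minimize (1 / 2n) * ||y - X b||^2 + alpha * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            scale = sum(v * v for v in col) / n
            if scale == 0.0:
                continue
            # Correlation of feature j with the residual that excludes its own contribution.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            new_beta = soft_threshold(rho, alpha) / scale
            delta = new_beta - beta[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                beta[j] = new_beta
    return beta


# Toy data: column 0 drives the response, column 1 is strongly correlated with column 0.
X = [[-2.0, -1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, 1.0], [2.0, 1.0]]
y = [-4.0, -2.0, 0.0, 2.0, 4.0]
print(lasso(X, y, alpha=0.5))  # the correlated but redundant column gets a zero coefficient
```

The L1 penalty shrinks the informative coefficient and sets the redundant one exactly to zero, which is why Lasso is a common cross-check for importance rankings obtained from tree-based models.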

In light of these results, we tested whether GCcds was associated with the DDX3X-dependent transcriptome dynamics reported above. As shown in Figure 3D, mRNAs partitioned on the Ribo-seq/RNA-seq coordinate system based on their GCcds value. Moreover, stability values from both INSPEcT and SLAM-seq partitioned according to GCcds values (Figure 3E, F). A similar, albeit weaker, separation was observed for predicted transcription and processing rates (Supplementary Figure 8).
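
The partitioning described above can be sketched as follows: mRNAs are sorted by GCcds, split into equally sized bins, and a summary of a second quantity (for example a change in decay rate) is reported per bin. The bin count, function name and all values below are hypothetical and serve only to illustrate the idea.

```python
from statistics import median


def median_by_gc_bin(records, n_bins=3):
    """Median value per GCcds bin, ordered from lowest to highest GCcds.

    `records` is a list of (gc_cds, value) pairs and must hold at least n_bins items.
    """
    ordered = sorted(records, key=lambda record: record[0])
    size = len(ordered) / n_bins
    medians = []
    for b in range(n_bins):
        chunk = ordered[int(round(b * size)):int(round((b + 1) * size))]
        medians.append(median(value for _, value in chunk))
    return medians


# Made-up (GCcds, log2FC of decay rate) pairs for six mRNAs.
toy_records = [
    (0.40, 0.5), (0.45, 0.25),
    (0.50, 0.0), (0.55, -0.25),
    (0.60, -0.5), (0.65, -0.75),
]
print(median_by_gc_bin(toy_records))
```

A significance test between bins, such as the Wilcoxon test reported in the figures, would be applied on top of this grouping.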

By using multiple analytical approaches, we here show how GCcds, not just GC content in general, or in other sections of the transcriptome, is a predominant feature of stabilized, yet untranslated, mRNAs following DDX3X depletion.

GC content in the coding sequence is a ubiquitous signal in mRNA regulation.

Given the extensive connections between different aspects of mRNA regulation by thousands of regulators, we tested the breadth of the influence of features such as GCcds in other studies of RNA regulators. We re-analyzed >2000 RNA-seq samples (Methods) from the recent ENCODE RBPome14 study encompassing >200 RBP knockdowns, and performed differential analysis followed by predictive modeling using the same methods and features as described in the previous section, this time aiming at predicting changes in mRNA levels (Figure 4A).

Figure 4: A ubiquitous feature in mRNA regulation.

A) Schema describing the ENCODE analysis strategy. B) Histogram representing overall model performance across datasets. C) Model performance (Spearman correlation between predicted and real values on unseen test data) on the y axis, with importance of 3 example feature variables (indicating their predictive value) on the x axis. Top knockdown experiments, together with DDX3X, are shown with labels. Data shown are from shRNA KD experiments in K562 cells. The linear relationship between GCcds importance and model performance indicates its relevance as the top predictor of RNA changes in dozens of datasets. D) mRNA level changes against GCcds values in a DDX3X knockdown experiment in the ENCODE dataset. E) Schematics of the cell cycle data used. Values for different kinetic parameters were partitioned according to GCcds values of their mRNAs and tested for significant differences. F) Normalized cell proportion (obtained by dividing cell percentages between Auxin treatment and DMSO) in different stages of the cell cycle along the degron time course. An increase in G1 and decrease in S phase can be observed at later time points. Significance values come from a Wilcoxon two-sided test (n=6 in each condition).

We first grouped datasets according to knockdown efficiency, which varied according to knockdown method and cell line (Supplementary Figure 9, Methods). We selected the sample with the highest knockdown efficiency for each RBP and called feature importance using our analytical pipeline. Predictive power of our Random Forest regression strategy varied across different datasets (Figure 4B). Once again, the strongest predictor of mRNA changes was GCcds, whose predictive power dominated over other variables (Figure 4C, Supplementary Figure 10). As expected, changes upon DDX3X knockdown in the ENCODE data also exhibited a clear dependency on GCcds (Figure 4D), albeit to a lower degree compared to our degron dataset. This is likely due to differences in DDX3X depletion strategies and, importantly, to our translation profiling dataset, which allowed us to distinguish between specific classes (i.e. “TE_down”) of regulated mRNAs (Discussion).
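
Model performance in this analysis is summarized as the Spearman correlation between predicted and real values on held-out data (Figure 4C). The following self-contained sketch shows one way to compute it: values are converted to average ranks and the Pearson correlation of the ranks is returned. Function names and the example numbers are illustrative assumptions.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long sequences with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up predicted and observed log2 fold changes for four held-out mRNAs.
predicted = [0.1, 0.4, 0.5, 1.2]
observed = [0.0, 0.3, 0.9, 2.0]
print(spearman(predicted, observed))
```

Because it depends only on ranks, this metric rewards a model for ordering mRNAs correctly even when the predicted magnitudes are off.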

Given the widespread relevance of GCcds as a predictor of post-transcriptionally regulated targets, we reasoned that a major cellular process might mediate the observed mRNA changes. We re-analyzed data from a recent study21 focused on mRNA clearance during cell cycle re-entry, where the authors used a FUCCI (fluorescent, ubiquitination-based cell-cycle indicators) cell system coupling RNA labeling, scRNA-seq and single-molecule imaging techniques to find extensive decay differences among different transcripts, potentially related to poly-A tail mediated regulation. Despite a lower throughput when compared to sequencing-based experiments, kinetic parameters estimated from their data (exemplified in the decay curve in Figure 4E) showed significant differences when partitioned by GCcds values (Figure 4E). mRNAs rich in GCcds showed lower half-life values, and fast decay kinetics at cell cycle re-entry, with the opposite trend exhibited by mRNAs poor in GC content in their coding sequence. Motivated by this finding, we decided to investigate differences in cell cycle dynamics in our degron system, by using 5-ethynyl-2’-deoxyuridine (EdU) incorporation followed by FACS analysis (Methods, Supplementary Figure 11). As shown in Figure 4F and Supplementary Figure 12, DDX3X depletion resulted in cells staying more in G1 and less in S phase when compared to controls, throughout the time course.

By re-analysis of thousands of RNA-seq samples, these results show the prevalence of GCcds in post-transcriptional regulation and RBP functions, with a potential role for cell-cycle dependent mRNA dynamics in shaping such a regulatory phenomenon.

A shift in 5’-3’ RNA-coverage as a hallmark of mRNA stabilization.

In addition to gene-level aggregate measures of abundance, we investigated whether changes in decay could be identified by taking advantage of the high resolution of RNA-seq experiments across gene bodies, which has previously been employed to inform about mRNA decay19. We leveraged our time-resolved degron dataset to investigate changes in 5’-3’ coverage, a known hallmark of RNA degradation often employed to verify overall integrity of cellular mRNAs or to estimate transcript-level decay. We calculated 2 different metrics, using the strategy illustrated in Figure 5.

Figure 5: Coverage analysis of regulated mRNAs reveals changes in 5’-3’ decay.

Coverage analysis strategy in the degron dataset using a practical example (CSRNP2 gene): coverage starting point is first identified using pooled data, then coverage tracks for each experiment are extracted. Coverage starting points (in transcript coordinates) and coverage values (log2FC to DMSO) are calculated for each time point and used as input to a linear model. The beta coefficient (shown in pink) for each model is then extracted for each mRNA and values are compared across different classes (stabilized vs unchanging vs degraded). More details are available in the Methods section. P-values from one-sided Wilcoxon test.

Initially, we pooled all samples to identify the major isoform for each gene (Methods), and the first position at 15% of the maximum coverage. We then calculated such position for each time point. Importantly, coverage values were normalized for each transcript, thus controlling for expression level changes. Also, we did not observe similar changes at the 3’ end of transcripts (Supplementary Figure 13). We then used coverage starting points as input for linear regression. The regression coefficient was extracted and compared across the top 250 stabilized, degraded, and control mRNAs, alongside 1500 control transcripts. As shown in Figure 5, coverage on stabilized mRNAs started at an earlier position in the transcripts, with moderate albeit significant differences between categories, indicating a lower 5’-3’ decay along the DDX3X degron time course. The opposite trend was observed for degraded transcripts. Similarly, we calculated average coverage values in a window of 300nt around the coverage start and applied a similar strategy: 5’ coverage values increased along the time course, confirming the accumulation of translationally suppressed mRNA species otherwise destined for degradation. Results were similar when using different cutoffs for the definition of coverage starting point (Supplementary Figure 14).
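
The two steps of this coverage analysis can be sketched in a few lines: locate the first transcript position that reaches 15% of the maximum coverage, then regress those starting positions on time and keep the slope. This is a simplified, assumed reading of the strategy (no isoform selection, normalization or windowing), and the coverage values and positions below are invented for illustration.

```python
def coverage_start(coverage, fraction=0.15):
    """First 0-based position whose coverage reaches `fraction` of the maximum.

    Returns None when the transcript has no coverage at all.
    """
    peak = max(coverage)
    if peak <= 0:
        return None
    cutoff = fraction * peak
    for position, depth in enumerate(coverage):
        if depth >= cutoff:
            return position


def ols_slope(x, y):
    """Slope (beta coefficient) of a simple least-squares fit of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


# Toy per-base coverage: the 15% cutoff (15.0) is first reached at index 4.
print(coverage_start([0, 1, 2, 10, 40, 100, 90]))

# Made-up starting positions at the five time points of the degron time course.
hours = [4, 8, 16, 24, 48]
starts = [120, 118, 114, 110, 98]
print(ols_slope(hours, starts))  # negative slope: coverage begins closer to the 5' end over time
```

Under this reading, a negative slope corresponds to coverage extending further toward the 5’ end over the time course, the pattern reported for stabilized mRNAs.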

To test whether the suppression of 5’-3’ decay of untranslated transcripts by DDX3X occurs in vivo, we re-analyzed a recent RNA-seq/Ribo-seq dataset in a conditional Ddx3x knockout (cKO) mouse model13 (Figure 6), where cell cycle and neurogenesis defects are evident when Ddx3x is depleted in neuronal progenitors. After applying our analytical pipeline, we observed that the accumulation of untranslated transcripts is even more evident in this in vivo model, as is its relationship with GCcds values (Figure 6A). Analogous to the strategy presented in Figure 5, 5’ coverage values, as well as coverage starting points (Supplementary Figure 15), differed significantly between wild type and Ddx3x cKO animals (Figure 6B) in regulated transcripts, while no difference was detected at the 3’ end (Supplementary Figure 16).

Figure 6. GCcds - mediated mRNA stabilization is detectable in vivo and across the ENCODE RBP database.

A) Changes in Ribo-seq and RNA-seq levels in a conditional Ddx3x mouse model, as in Figure 1A, showing GCcds values. B) Strategy for coverage analysis in the mouse Ddx3x cKO experiment, shown for the Ctxn1 gene. Differences in coverage values are extracted and compared across regulated mRNAs. In C) same strategy as in Figure 5A applied to each differential analysis following RBP knockdown in the ENCODE dataset. Differences in coverage values between stabilized and unchanging mRNAs (shown by p-values, calculated as in panel B, in pink) are plotted against GCcds importance (x axis). D) Example mean coverage on 2 mRNAs (1 stabilized and 1 degraded), partitioning RBP knockdown datasets by their GCcds importance. An increase in coverage can be observed for the stabilized mRNA, while the opposite trend is visible for a degraded transcript.

Leveraging again the power of hundreds of RNA-seq experiments, we examined 5’ coverage profiles in the ENCODE dataset, partitioning experiments by their dependency on GCcds values. Differences between stabilized and control mRNAs are greater as the GCcds signature becomes more predominant (Figure 6C). Aggregating different experiments according to their GCcds dependency for example transcripts (Figure 6D) confirms this phenomenon, where both coverage starting position and coverage values changed across different datasets, indicative of mRNA decay regulation.

Taken together, we provide evidence for in vivo DDX3X-mediated stabilization of untranslated transcripts and its dependence on GCcds values. Supporting the different analyses reported in this study (Figure 7), we also provide a high-resolution RNA-seq coverage analysis strategy to investigate GCcds-related mRNA decay regulation, with support from hundreds of post-transcriptionally perturbed transcriptomes.

Figure 7. A model for multimodal mRNA regulation by DDX3X.

A schematic showing the effects of DDX3X depletion on GC-content related changes in translation and mRNA stability, highlighting potential molecular mechanisms underlying this phenomenon.

Discussion

The multifaceted role of DDX3X, described as involved in different molecular processes, often hinders the ability to understand its functions, especially considering the interconnected nature of molecular processes in the cell. Multiple mRNA features might underlie different modes of regulation, as we previously showed and experimentally validated 5’UTR dependencies underlying DDX3X translation regulation7. This outlines an unmet need for studies linking multiple aspects of the gene expression cascade.

In addition to profiling RNA levels and translation, we further dissected dynamics of cytoplasmic regulation by DDX3X, by employing a time course of efficient DDX3X depletion (Figure 1A). Akin to previous studies observing translation suppression preceding mRNA changes during miRNA-mediated regulation10, we observed an accumulation of translationally suppressed RNAs. This highlights the importance of profiling not only mRNA abundance but also translation levels, which, in the absence of quantitative estimates of regulated protein levels, can greatly help researchers understand the functions of many cryptic regulators often involved in multiple processes, like DDX3X and other RBPs22. Despite relatively fast kinetics of DDX3X degradation from our degron system, more work needs to be performed to pinpoint exactly what changes occur right after DDX3X depletion, and to more precisely quantify the lag between translation suppression and mRNA stabilization.

By employing multiple techniques for feature selection, we identified a major feature underlying mRNA regulation by DDX3X, as well as by many other post-transcriptional regulators. An important area of investigation for the future is to employ more unbiased approaches, akin to recent Natural Language Processing-inspired methods in transcription regulation23, in mRNA biology to accurately estimate the relevant features directly from data rather than specified by potentially biased approaches. In our hands, the relevance of GCcds is clearly picked up by both the Random Forest and the Lasso (Supplementary Figure 4). Importantly, we included similar features, such as overall GC content24, in UTRs, introns etc., alongside codon frequencies20 and previously estimated values of codon optimality.

Our study suggests that data-driven approaches to functional transcriptomics are highly needed, where data from multiple experiments are routinely re-analyzed to test hypotheses and provide new insights into the complex world of mRNA biology. However, while profiling translation allowed us to focus on specific mRNA classes and their features, no large-scale translation profiling study exists yet, with few, precious small atlases recently appearing in the literature25. The current ENCODE RBP series is of great value to many mRNA biology researchers worldwide and it has been an invaluable resource for many recent studies26,27, yet an extension of these approaches which includes other aspects of post-transcriptional regulation, such as translation and stability, is in great need.

In the original ENCODE RBP study14, gene expression estimates were GC-corrected for each sample, as GC content has been often reported as a confounder, especially when comparing across sequencing technologies and labs. Given the presence of GC-related biases in sequencing-based assays, we think that great caution must be taken when observing expression changes driven by GC content features, especially when interpreted as direct effects from single molecular factors. Our degron time course analysis, despite containing dozens of features pertaining to GC content measures, detected GC content specifically in coding sequence as a feature underlying regulation, and this region-specific effect is not consistent with a general confounding role for GCcds. Moreover, our analysis focused on differences upon a perturbation under a single sequencing platform and laboratory settings, which are likely to have similar GC-related confounders, should there be any. Important confirmation of the relevance of GCcds and its relationship to mRNA dynamics also came from: employing SLAM-seq to estimate differences in stability (Figure 2), qPCR validations (Supplementary Figure 3), re-analysis of in vivo Ddx3x cKO RNA-seq/Ribo-seq (Figure 6), re-analysis of hundreds of RBP perturbations in human cell lines (Figure 4), and by analyzing kinetics extracted by transcriptome dynamics in cell-cycle specific states (Figure 4).

Together with well-established statistical methods for differential analysis, which allowed us to robustly identify different classes of regulated mRNAs, we exploited the high resolution offered by RNA-seq to analyze differences in 5’end coverage for thousands of individual transcripts (Figure 5), as an additional metric reflecting active regulation of mRNA decay mechanisms. We posit that popular analysis strategies for -omics techniques, despite their widespread use for more than a decade, often obscure information regarding mRNA processing and other molecular mechanisms, which can be uncovered by dedicated computational methods. Importantly, such dynamics are invisible (or, worse, can significantly distort quantification estimates) when performing gene-level analyses.

The mechanism, or mechanisms, by which GC content in coding regions shapes mRNA dynamics is still to be determined. We speculate that complex RNA structures in the coding sequence can form in the absence of active translation elongation, and that such structures may mediate degradation, helped by RNP complexes in the cytoplasm. However, recent literature has focused on the role of different codons in mediating such effects18. In our hands, codon-mediated effects seem to be negligible when considering the overall GCcds values, but more work needs to be done to identify cases where one or the other, or a mix of the two, can mediate mRNA decay on different transcripts. The involvement of mRNA dynamics during the cell cycle (Figure 4) suggests a model where, during cell cycle-dependent translation suppression, mRNAs are able to fold into structures in the coding sequence that promote decay, and, when such processes are misregulated (e.g., by depleting multifunctional RNA helicases such as DDX3X), this process is less efficient. The extent to which cell cycle changes might depend on direct DDX3X binding and regulation remains to be elucidated. Further work needs to be done to refine the exact function, together with the subcellular localization, of regulated mRNAs. For instance, mRNA retention in the nucleus might be an additional underappreciated mode of gene expression control28, and is in line with our observation about the untranslated status of regulated transcripts. However, we identified GC content in the coding sequence as the hallmark feature for stabilized transcripts, a feature which is defined by translation in the cytoplasm.

While RBP binding data remain an important starting point from which we can build testable hypotheses, simple binding-to-function paradigms might also create bias when trying to explain complex phenotypes arising from RBP misfunction. Moreover, we observed that binding patterns might differ between different regulated classes (Supplementary Figure 4). In our previous study we investigated the changes in translation and RNA abundance using a DDX3X helicase mutant; one of the observations we made pertained to the lack of RNA changes in our data, suggesting a potential function for the helicase activity in orchestrating such changes.

Previous work implicated DDX3X in mediating cell cycle dynamics by a variety of mechanisms29, including direct regulation of cyclin E1 translation30, which however was not among the most regulated mRNAs in our dataset (Supplementary Table 2). More work needs to be done to accurately quantify mRNA dynamics and RBP functions in the cell cycle, where translation regulation mechanisms31,32 ensure controlled rates of protein synthesis. The connection between cell cycle, sequence content and mRNA regulation is reinforced by the in vivo data, adding to the importance of studying post-transcriptional regulation along the neurogenesis axis33,34, where the equilibrium between proliferation, apoptosis and differentiation35 shapes the complexity of the developing brain.

Methods

Ribo-seq and RNA-seq experimental protocol

HCT116 cells with inducible degradation of DDX3X (as previously described16) were plated in 15 cm plates at 20% confluency (~3.5×10⁶ cells/plate). 48 hours post plating, when the cells were at ~70% confluency, the media was changed and fresh media with 500 μM IAA (Indole-3-acetic acid, the most common naturally occurring Auxin hormone) (Research Products International, cat: I54000–5.0) or DMSO was added to cells. Cells were harvested at 0, 4, 8, 16, 24, and 48 hours post IAA addition. Cell number did not appreciably increase over the 48 hours of the experiment. To quantify DDX3X protein, we used an anti-DDX3X antibody described in previous work7, normalized to an anti-GAPDH antibody (Rockland Immunochemicals, cat: 600–401-A33S).

Cells were treated with 100 μg/ml cycloheximide (CHX), washed with PBS containing 100 μg/ml CHX, and immediately spun down and flash frozen. Once all time-points were collected, the cells were thawed and lysed in ice-cold lysis buffer (20 mM TRIS-HCl pH 7.4, 150 mM NaCl, 5 mM MgCl2, 1 mM DTT, 100 μg/ml CHX, 1% (v/v) Triton X-100, 25 U/ml TurboDNase (Ambion)). 240 μl lysate was treated with 6 μl RNase I (Ambion, 100 U/μl) for 45 minutes at RT with gentle agitation, and further digestion was halted by addition of SUPERase:In (Ambion). Illustra Microspin Columns S-400 HR (GE healthcare) were used to enrich for monosomes, and RNA was extracted from the flow-through using the Direct-zol kit (Zymo Research). Gel slices of nucleic acids between 24–32 nts long were excised from a 15% urea-PAGE gel. Eluted RNA was treated with T4 PNK and a preadenylated linker was ligated to the 3’ end using T4 RNA Ligase 2 truncated KQ (NEB, M0373L).

Linker-ligated footprints were reverse transcribed using Superscript III (Invitrogen) and gel-purified RT products were circularized using CircLigase II (Lucigen, CL4115K). rRNA depletion was performed using biotinylated oligos as described36 and libraries were constructed using a different reverse indexing primer for each sample.

For the RNA-seq, RNA was extracted from 25 μl intact lysate (non-digested) using the Direct-zol kit (Zymo Research) and stranded total RNA libraries were prepared using the TruSeq Stranded Total RNA Human/Mouse/Rat kit (Illumina), following manufacturer’s instructions.

Libraries were quantified and checked for quality using a Qubit fluorimeter and Bioanalyzer (Agilent) and sequenced on a HiSeq 4000 sequencing system.

SLAM-seq experimental protocol

SLAM-seq was performed at 60–70% confluency on DDX3X-mAID tagged HCT116 cells. Media was changed and fresh media with 100 μM 4-thiouridine (4sU) was added to cells and changed every 3 hours for 24 hours. 8 hours prior to collection, growth medium was aspirated and replaced. For the uridine chase, cells were washed twice with 1X PBS and incubated with media containing 10 mM uridine and DMSO or 100 μM IAA for 0 or 8 hours to induce degradation of DDX3X. At the respective time points, cells were harvested, followed by total RNA extraction using TRIzol (Ambion) following the manufacturer’s instructions (SLAMseq Kinetics Kit – Catabolic Kinetics Module, Lexogen). Total RNA was alkylated by iodoacetamide for 15 min and RNA was purified by ethanol precipitation. 200 ng alkylated RNA was used as input for generating 3’-end mRNA sequencing libraries using a commercially available kit (QuantSeq 3ʹ mRNA-Seq Library Prep Kit FWD for Illumina, Lexogen).

Ribo-seq data analysis

Reads were stripped of their adapter, collapsed, and UMI sequences were removed. Clean reads were then mapped to rRNA, tRNA, snoRNA and miRNA sequences with bowtie237, using sequences retrieved from the UCSC browser, and aligning reads were discarded. Remaining reads were mapped to the genome and transcriptome using STAR38 v2.7.9a supplied with the GENCODE v32 GTF file. STAR parameters were: --outFilterMismatchNmax 3 --outFilterMultimapNmax 50 --chimScoreSeparation 10 --chimScoreMin 20 --chimSegmentMin 15 --outFilterIntronMotifs RemoveNoncanonicalUnannotated --alignSJoverhangMin 500 --outSAMmultNmax 1 --outMultimapperOrder Random.
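As an illustration, the STAR options listed above could be assembled into a command line as in the following Python sketch. The index directory, input file names, and output prefixes are hypothetical placeholders, and the GENCODE v32 annotation is assumed to have been supplied when the index was generated; only the filtering and alignment options are taken from the text. The override reflects the one parameter reported to differ for SLAM-seq reads.

```python
import shlex

# Options quoted from the Ribo-seq data analysis paragraph.
STAR_OPTIONS = {
    "--outFilterMismatchNmax": "3",
    "--outFilterMultimapNmax": "50",
    "--chimScoreSeparation": "10",
    "--chimScoreMin": "20",
    "--chimSegmentMin": "15",
    "--outFilterIntronMotifs": "RemoveNoncanonicalUnannotated",
    "--alignSJoverhangMin": "500",
    "--outSAMmultNmax": "1",
    "--outMultimapperOrder": "Random",
}


def build_star_command(genome_dir, reads, prefix, overrides=None):
    """Return the STAR invocation as a list of arguments."""
    options = dict(STAR_OPTIONS)
    options.update(overrides or {})
    command = ["STAR", "--genomeDir", genome_dir,
               "--readFilesIn", reads,
               "--outFileNamePrefix", prefix]
    for flag, value in options.items():
        command += [flag, value]
    return command


# Hypothetical paths, for illustration only.
ribo_cmd = build_star_command("star_index/", "riboseq_clean.fastq", "riboseq_")
# SLAM-seq reads: same options, but up to 10 mismatches are tolerated.
slam_cmd = build_star_command("star_index/", "slamseq.fastq", "slamseq_",
                              overrides={"--outFilterMismatchNmax": "10"})
print(shlex.join(ribo_cmd))
```

Keeping the shared options in one dictionary makes the single difference between assays explicit and avoids two drifting copies of the parameter list.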

SLAM-seq data analysis

Reads were mapped to the genome and transcriptome using the same parameters as for RNA-seq, except for --outFilterMismatchNmax 10. Reads containing T>C mutations were extracted from the BAM file using the GenomicAlignments and GenomicFiles Bioconductor39 packages.
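The idea behind selecting T>C reads can be sketched in a few lines of Python. This is a simplified illustration, not the Bioconductor-based procedure used here: it assumes an ungapped, plus-strand alignment in which the read and the reference segment have equal length, whereas real BAM records also require CIGAR and strand handling. The function names and toy sequences are hypothetical.

```python
def count_t_to_c(read_seq, ref_seq):
    """Count positions where the reference has T and the read has C.

    Assumes an ungapped plus-strand alignment, so both strings have
    the same length.
    """
    if len(read_seq) != len(ref_seq):
        raise ValueError("read and reference must have equal length")
    return sum(1 for r, g in zip(read_seq.upper(), ref_seq.upper())
               if g == "T" and r == "C")


def has_t_to_c(read_seq, ref_seq, min_conversions=1):
    """Return True when the read carries at least min_conversions T>C events."""
    return count_t_to_c(read_seq, ref_seq) >= min_conversions


# Toy (read, reference) pairs.
pairs = [("ACGTTAGC", "ACGTTAGC"),   # no conversion
         ("ACGCTAGC", "ACGTTAGC"),   # one T>C conversion
         ("ACGCCAGC", "ACGTTAGC")]   # two T>C conversions
labeled = [read for read, ref in pairs if has_t_to_c(read, ref)]
print(labeled)
```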

RNA-seq data analysis

Reads were mapped to the genome and transcriptome using STAR with the same parameters as for Ribo-seq. Synthesis, processing, and degradation rates were obtained using INSPEcT17 v1.17, using default settings. Genes significantly changing in their dynamics at a p-value cutoff of .05 were used for subsequent analysis.

Differential analysis

Unique counts on different genomic regions were obtained using RiboseQC40. 5’ end coverage values were inspected using Bioconductor39 packages such as GenomicFeatures41 and rtracklayer42. DESeq243 was used to obtain RNA-seq, Ribo-seq, and TE regulation, as described previously7: changes in translation efficiency were calculated using DESeq2 by using assay type (RNA-seq or Ribo-seq) as an additional covariate. Translationally regulated genes were defined using an FDR cutoff of 0.05 from a likelihood ratio test, using a reduced model without the assay type covariate, i.e., assuming no difference between RNA-seq and Ribo-seq counts.

A similar strategy was used to define significant changes in DDX3X-mediated stability from SLAM-seq: count tables with T>C reads were built and analyzed using labeling (4sU/DMSO) and degron status (8h vs. DMSO) as the two variables of interest; regulation in stability was defined using a reduced model without the degron status covariate, i.e., assuming no difference between DMSO and degron activation.
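The quantity targeted by such a model can be illustrated with a simple calculation: a change in translation efficiency corresponds to the Ribo-seq fold change that is not explained by the RNA-seq fold change. The Python sketch below computes this difference directly from normalized counts. It is only meant to convey the idea; DESeq2 estimates the effect within a negative binomial model with dispersion estimation and a likelihood ratio test, none of which is reproduced here, and the pseudocount and example values are assumptions.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of two normalized counts, stabilized by a pseudocount."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def delta_te(ribo_treated, ribo_control, rna_treated, rna_control):
    """Change in translation efficiency on the log2 scale.

    Defined as the Ribo-seq log2 fold change minus the RNA-seq log2 fold change.
    """
    return (log2_fold_change(ribo_treated, ribo_control)
            - log2_fold_change(rna_treated, rna_control))


# Made-up counts: ribosome occupancy halves while RNA abundance is unchanged.
example = delta_te(ribo_treated=99, ribo_control=199,
                   rna_treated=99, rna_control=99)
print(example)
```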

Translationally regulated genes (as defined by Ribo-seq/RNA-seq) and stability regulated genes (as defined by SLAM-seq) were defined using a p-value cutoff of .05.

For Figure 1D and 3D, the coordinate system was divided into 70 bins on each axis. GCcds values (for Figure 3D), or Ribo-seq and RNA-seq fold changes between each time point and the previous one (for Figure 1D) were averaged across genes in the same bin. Only mRNAs with significant changes in translation efficiency at 48h post degron induction were considered.
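A minimal Python sketch of this kind of two-dimensional binning follows. It assumes that the two axes are per-gene fold changes (for instance RNA-seq on one axis and Ribo-seq on the other) and that the averaged quantity is a per-gene value such as GCcds; the axis assignment, the equal-width bins spanning the observed range, and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def bin_index(value, lo, hi, n_bins):
    """Map a value in [lo, hi] to an equal-width bin index in 0..n_bins-1."""
    if hi <= lo:
        raise ValueError("axis range must be non-empty")
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)  # the maximum falls in the last bin


def binned_mean(xs, ys, zs, n_bins=70):
    """Average z over all genes falling in the same (x bin, y bin) cell."""
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, z in zip(xs, ys, zs):
        key = (bin_index(x, x_lo, x_hi, n_bins),
               bin_index(y, y_lo, y_hi, n_bins))
        sums[key] += z
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Toy example with 2 bins per axis instead of 70.
grid = binned_mean(xs=[0.0, 0.0, 1.0], ys=[0.0, 0.0, 1.0],
                   zs=[0.25, 0.75, 0.5], n_bins=2)
print(grid)
```

Only occupied cells are returned, which keeps the result sparse when most of the 70 × 70 grid contains no genes.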

Random Forest and Lasso regression

The Random Forest regression was run using the randomForest44 package with default parameters. Lasso regression was performed on scaled variables using the glmnet45 package. While the entire feature table is available in Supplementary Table 2, a short description of the input features follows:

  • TPM values using RNA-seq (in log scale).
  • Baseline TE levels, defined as ratio of Ribo to RNA reads.
  • Baseline RNA mature levels, defined as length-normalized ratio of RNA-seq reads in introns versus exons.
  • GC content, length (in log scale) and Ribo-seq/RNA-seq density in: 5ʹ UTRs, a window of 25nt around start and stop codons, CDS regions, non-coding internal exons, introns, and 3ʹ UTRs.
  • Codon frequencies.
  • Measures of gene-specific codon optimality, previously calculated from a recent study18.
  • GC-content at first, second, or third codon position.
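Two of the sequence features above, GC content of the CDS and GC content at each codon position, can be computed as in the following Python sketch. The function names and the toy sequence are hypothetical, and the sketch assumes an in-frame CDS whose length is a multiple of three.

```python
def gc_fraction(seq):
    """Fraction of G or C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc_by_codon_position(cds):
    """GC fraction at the first, second, and third codon positions."""
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("CDS length must be a non-zero multiple of 3")
    # cds[offset::3] collects every base at the same position within a codon.
    return tuple(gc_fraction(cds[offset::3]) for offset in range(3))


toy_cds = "ATGGCCTAA"  # codons: ATG GCC TAA
gc_cds = gc_fraction(toy_cds)
gc1, gc2, gc3 = gc_by_codon_position(toy_cds)
print(gc_cds, gc1, gc2, gc3)
```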

Feature importance (measured by mean decrease in accuracy for the random forest model and by the lasso coefficients) and correlation between predicted and measured test data were calculated on a 5-fold cross-validation scheme.
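The cross-validation scheme can be sketched in Python as follows. To keep the example self-contained, a one-feature least-squares line stands in for the random forest and lasso models, so this shows only the fold logic and the correlation between predicted and held-out values; the data are synthetic and the function names are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (stand-in model)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def cross_validated_correlation(xs, ys, k=5):
    """Train on k-1 folds, predict the held-out fold, and score each fold."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train_idx],
                                    [ys[i] for i in train_idx])
        predicted = [slope * xs[i] + intercept for i in test_idx]
        measured = [ys[i] for i in test_idx]
        scores.append(pearson(predicted, measured))
    return scores


# Synthetic data with an exact linear relationship.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
cv_scores = cross_validated_correlation(xs, ys, k=5)
print(cv_scores)
```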

Analysis of cell cycle-dependent mRNA dynamics

Estimated mRNA decay kinetics at cell cycle re-entry were obtained from the supplementary files deposited with the original study21. Genes were partitioned into 3 groups by cutting their GCcds values; only 3 groups were used given the low number of quantified genes (total n=220).
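One way to perform such a partition is shown in the Python sketch below, which assigns genes to three equally sized groups by rank. This is an assumption: the text does not state whether the groups are equal in size or equal in width, and an equal-width split of the GCcds range would be another reasonable reading. Gene names and values are hypothetical.

```python
def split_into_groups(values_by_gene, n_groups=3):
    """Assign each gene to one of n_groups rank-based groups (0 = lowest)."""
    ranked = sorted(values_by_gene, key=values_by_gene.get)
    return {gene: rank * n_groups // len(ranked)
            for rank, gene in enumerate(ranked)}


# Hypothetical GCcds values for six genes.
gc_cds = {"gene_a": 0.41, "gene_b": 0.66, "gene_c": 0.52,
          "gene_d": 0.47, "gene_e": 0.71, "gene_f": 0.58}
groups = split_into_groups(gc_cds)
print(groups)
```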

Cell cycle staging

To measure DNA replication and cell cycle stage, EdU (5-ethynyl-2´-deoxyuridine) was added to cells at 10nM for 1.5 hrs before harvesting. One confluent well of a 6-well plate of HCT116 cells was harvested and processed as per manufacturer’s instructions for the Click-iT Plus EdU Alexa Fluor 647 Flow Cytometry Assay Kit (Thermo Fisher cat: C10634). Per manufacturer’s instructions, FxCycle Violet DNA content stain (Thermo Fisher cat: F10347) was added after the Click-iT reaction at 1:1,000 dilution before quantifying on a BD LSR Dual Fortessa flow cytometer. Alexa Fluor 647 was measured in the 670–30 Red C-A Channel and FxCycle Violet Stain was measured in the 450–50 Violet F-A Channel. Analysis was performed using FACS DIVA and FlowJo V10 (FlowJo, LLC) software.

5’end coverage analysis

Computation on single-nucleotide coverage values was performed using rtracklayer42. For each differential analysis, we extracted the 250 most stabilized and the 250 most degraded genes, ranked by P-value from the RNA-seq differential analysis. 1500 control RNAs were randomly sampled from non-regulated genes, using p-values >.2 and TPM values > 3. Coverage values were 0–1 (min/max) normalized and the first position with a value >.15 was identified as the coverage starting position. In addition, a general coverage starting point was selected by pooling all samples, and a window of 250nt around this position was used to calculate average coverage values around the coverage start. Log2 fold changes with respect to the control condition were then calculated.
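The normalization and start-position rule described above can be sketched in Python as follows. The coverage vector is made up, and the handling of flat profiles (returning no start position) is an assumption, as the text does not describe that case.

```python
def min_max_normalize(coverage):
    """Rescale coverage values to the 0-1 range."""
    lo, hi = min(coverage), max(coverage)
    if hi == lo:
        return [0.0] * len(coverage)  # flat profile: nothing to rescale
    return [(value - lo) / (hi - lo) for value in coverage]


def coverage_start(coverage, threshold=0.15):
    """Index of the first position whose normalized coverage exceeds threshold."""
    for position, value in enumerate(min_max_normalize(coverage)):
        if value > threshold:
            return position
    return None  # no position passes the threshold


# Made-up per-nucleotide coverage along the 5' end of a transcript.
profile = [0, 1, 2, 10, 20, 20]
start = coverage_start(profile)
print(start)
```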

For degron data, starting position and log2fc coverage values were extracted and used as input for linear regression. For coverage values, intercept was omitted, as the first value was 0. Beta coefficients were then extracted and compared between stabilized, degraded, and control mRNAs.
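For a regression without intercept, the least-squares slope has the closed form β = Σxy / Σx². The Python sketch below applies it to a made-up series, assuming that the predictor is time after degron induction (in hours, using the sampled time points) and that the response is the log2 fold change in coverage; both the choice of predictor and the values are assumptions for illustration.

```python
def slope_through_origin(xs, ys):
    """Least-squares slope of y = beta * x, with the intercept fixed at 0."""
    denominator = sum(x * x for x in xs)
    if denominator == 0:
        raise ValueError("all predictor values are zero")
    return sum(x * y for x, y in zip(xs, ys)) / denominator


hours = [0, 4, 8, 16, 24, 48]               # sampled time points
log2fc = [0.0, 0.04, 0.08, 0.16, 0.24, 0.48]  # made-up values, 0 at time 0
beta = slope_through_origin(hours, log2fc)
print(beta)
```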

For mouse Ddx3x cKO and ENCODE data, differences between starting position (knockdown vs wt) and log2FC (knockdown vs wt) in coverage values were used to compare stabilized, degraded and control mRNAs, bypassing the regression step (2 values were calculated, as only wt or knockdown conditions were present).

TaqMan RT-PCR

DDX3X-mAID tagged HCT116 cells were plated in 6-well plates at 30–40% confluency. 24 hours post plating, 500 μM IAA or DMSO was added to cells with or without 200 nM Actinomycin D (ActD) for the respective conditions. Total RNA was extracted from cells at 60–70% confluency using the Direct-zol kit (Zymo Research) at 0 and 24 hours post-ActD and IAA or DMSO treatment. TaqMan probes for JUND, EIF2A, RACK1, LGALS1, and PFN1 were predesigned and purchased from ThermoFisher Scientific. Probes for the Ribo-seq-defined degraded gene (EIF2A) and stabilized gene (JUND) were conjugated with FAM dye, while probes for the control genes RACK1, LGALS1, and PFN1 were conjugated with VIC dye. For the TaqMan real-time quantitative PCR amplification reactions, we employed an Applied Biosystems QuantStudio 6 Real-Time PCR System instrument. Real-time PCR was conducted using TaqMan Fast Virus 1-Step Master Mix from Applied Biosystems in 384-well plates, following the manufacturer’s protocol. Each well contained a probe for either the degraded gene (EIF2A) or the stabilized gene (JUND), along with a probe for one control gene (RACK1, LGALS1, or PFN1). All reactions were conducted in triplicate. Thermal cycling conditions adhered to the manufacturer’s recommended standard protocol. The target input amount was quantified using the cycle threshold (CT) value, which corresponds to the point at which the PCR amplification plot crosses the threshold. Expression of the degraded and stabilized genes was normalized to each control gene.

Gene     Species   Chromosome Location                                Assay ID        Dye
RACK1    HUMAN     Chr.5: 181236928 – 181243906 on Build GRCh38       Hs00272002_m1   VIC-MGB
LGALS1   HUMAN     Chr.22: 37675606 – 37679802 on Build GRCh38        Hs00355202_m1   VIC-MGB
PFN1     HUMAN     Chr.17: 4945650 – 4949088 on Build GRCh38          Hs07291746_gH   VIC-MGB
JUND     HUMAN     Chr.19: 18279694 – 18281656 on Build GRCh38        Hs04187679_s1   FAM-MGB
EIF2A    HUMAN     Chr.3: 150546678 – 150586016 on Build GRCh38       Hs00230684_m1   FAM-MGB

Details of TaqMan® real-time PCR assays obtained from ThermoFisher Scientific.
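Normalization of a target gene to a control gene from CT values is commonly expressed as 2^-(CT_target - CT_control). The Python sketch below uses this common convention, which assumes close to 100% amplification efficiency; the text does not state the exact formula that was applied, so this is an illustration rather than the procedure used here, and all CT values are made up.

```python
def relative_expression(ct_target, ct_control):
    """Target expression relative to a control gene: 2 ** -(delta CT)."""
    return 2.0 ** -(ct_target - ct_control)


def fold_change(ct_target_treated, ct_control_treated,
                ct_target_reference, ct_control_reference):
    """Ratio of control-normalized expression, treated over reference."""
    return (relative_expression(ct_target_treated, ct_control_treated)
            / relative_expression(ct_target_reference, ct_control_reference))


# Made-up CT values: the target crosses the threshold one cycle earlier
# after treatment, while the control gene is unchanged.
ratio = fold_change(ct_target_treated=23.0, ct_control_treated=20.0,
                    ct_target_reference=24.0, ct_control_reference=20.0)
print(ratio)
```

With three control genes, the same calculation would be repeated once per control, giving one normalized value for each target and control pair.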

Supplementary Material

Supplement 1

Supplementary Table 1. Read mapping statistics for the Ribo-seq and RNA-seq DDX3X time course dataset.

media-1.tsv (1.2KB, tsv)
Supplement 2

Supplementary Table 2. Input to the Random Forest model for the DDX3X time course dataset.

media-2.tsv (18.4MB, tsv)
Supplement 3

Supplementary Table 3. Accession codes for the analyzed ENCODE datasets, with information for each differential analysis. Multiple accessions can be technical replicates of a biological replicate.

media-3.tsv (213KB, tsv)
Supplement 4

Supplementary Table 4. Input to the Random Forest model for the cKO Ddx3x mouse dataset.

media-4.tsv (15.8MB, tsv)
Supplement 5
media-5.pdf (2.9MB, pdf)

Acknowledgements

L.C. wants to thank Piero Angela (1928–2022) for inspiring him, and countless other kids in Italy, to pursue a career in academic science, in constant awe of the beautiful mechanisms shaping the natural world. This work was supported by the National Institutes of Health DP2GM132932 and R01NS120667 (to S.N.F.). S.N.F. is a Pew Scholar in the Biomedical Sciences, supported by The Pew Charitable Trusts. ZJ was supported by the UCSF Moritz-Heyman Discovery Fellow Program. AX was supported by NIH F30 Ruth L Kirschstein National Research Service Award HD110250. ZJ and AX were supported by the UCSF Medical Scientist Training Program (T32GM007618). Flow cytometry data was generated at the UCSF Parnassus Flow CoLab (RRID:SCR_018206) and sequencing was performed at the UCSF CAT, supported by UCSF PBBR, RRP IMIA, and NIH 1S10OD028511-01 grants. Mouse image in Figure 6A used with permission from doi.org/10.5281/zenodo.3925903 . F.D. is a PhD student within the European School of Molecular Medicine (SEMM).

Data and code availability

Raw sequencing data for Ribo-seq, RNA-seq and SLAM-seq can be found under GEO accession GSE218433, with token “ujmtquoulnirpgx”. ENCODE accession numbers can be found in Supplementary Table 3. Ddx3x knockout Ribo-seq and RNA-seq were analyzed from accession number GSE203078; processed data can be found in Supplementary Table 4. Code to reproduce all figures, together with processed data, can be found at https://github.com/calviellolab/DDX3X_GC_paper.

References

  • 1. Shoemaker C. J. & Green R. Translation drives mRNA quality control. Nat Struct Mol Biol 19, 594–601 (2012).
  • 2. Ingolia N. T., Brar G. A., Rouskin S., McGeachy A. M. & Weissman J. S. The ribosome profiling strategy for monitoring translation in vivo by deep sequencing of ribosome-protected mRNA fragments. Nat Protoc 7, 1534–1550 (2012).
  • 3. Herzog V. A. et al. Thiol-linked alkylation of RNA to assess expression dynamics. Nature Methods 14, 1198–1204 (2017).
  • 4. Hafner M. et al. CLIP and complementary methods. Nature Reviews Methods Primers 1, 1–23 (2021).
  • 5. Sharma D. & Jankowsky E. The Ded1/DDX3 subfamily of DEAD-box RNA helicases. Critical Reviews in Biochemistry and Molecular Biology (2014) doi: 10.3109/10409238.2014.931339.
  • 6. Oh S. et al. Medulloblastoma-associated DDX3 variant selectively alters the translational response to stress. Oncotarget (2016) doi: 10.18632/oncotarget.8612.
  • 7. Calviello L. et al. DDX3 depletion represses translation of mRNAs with complex 5ʹ UTRs. Nucleic Acids Res 49, 5336–5350 (2021).
  • 8. Chang Y.-F. F., Imam J. S. & Wilkinson M. F. The nonsense-mediated decay RNA surveillance pathway. Annu Rev Biochem 76, 51–74 (2007).
  • 9. D’Orazio K. N. et al. The endonuclease Cue2 cleaves mRNAs at stalled ribosomes during No Go Decay. Elife 8, (2019).
  • 10. Bazzini A. A., Lee M. T. & Giraldez A. J. Ribosome profiling shows that miR-430 reduces translation before causing mRNA decay in zebrafish. Science 336, 233–237 (2012).
  • 11. Lennox A. L. et al. Pathogenic DDX3X Mutations Impair RNA Metabolism and Neurogenesis during Fetal Cortical Development. Neuron 106, 404–420.e8 (2020).
  • 12. Ingolia N. T. Ribosome Footprint Profiling of Translation throughout the Genome. Cell 165, 22–33 (2016).
  • 13. Hoye M. L. et al. Aberrant cortical development is driven by impaired cell cycle and translational control in a DDX3X syndrome model. Elife 11, (2022).
  • 14. Van Nostrand E. L. et al. A large-scale binding and functional map of human RNA-binding proteins. Nature 583, 711–719 (2020).
  • 15. Hon C. C. & Carninci P. Expanded ENCODE delivers invaluable genomic encyclopedia. Nature 583, 685–686 (2020).
  • 16. Venkataramanan S., Gadek M., Calviello L., Wilkins K. & Floor S. N. DDX3X and DDX3Y are redundant in protein synthesis. RNA 27, rna.078926.121 (2021).
  • 17. De Pretis S. et al. INSPEcT: a computational tool to infer mRNA synthesis, processing and degradation dynamics from RNA- and 4sU-seq time course experiments. Bioinformatics 31, 2829–2835 (2015).
  • 18. Medina-Muñoz S. G. et al. Crosstalk between codon optimality and cis-regulatory elements dictates mRNA stability. Genome Biol 22, 1–23 (2021).
  • 19. Courel M. et al. GC content shapes mRNA storage and decay in human cells. Elife 8, (2019).
  • 20. Hia F. et al. Codon bias confers stability to human mRNAs. EMBO Rep 20, e48220 (2019).
  • 21. Krenning L., Sonneveld S. & Tanenbaum M. Time-resolved single-cell sequencing identifies multiple waves of mRNA decay during the mitosis-to-G1 phase transition. Elife 11, (2022).
  • 22. Gerstberger S., Hafner M. & Tuschl T. A census of human RNA-binding proteins. Nat Rev Genet 15, 829–845 (2014).
  • 23. Avsec Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nature Methods 18, 1196–1203 (2021).
  • 24. Thomas A. et al. RBM33 directs the nuclear export of transcripts containing GC-rich elements. Genes Dev 36, 550–565 (2022).
  • 25. Chothani S. P. et al. A high-resolution map of human RNA translation. Mol Cell 82, 2885–2899.e8 (2022).
  • 26. Zhao W. et al. POSTAR3: an updated platform for exploring post-transcriptional regulation coordinated by RNA-binding proteins. Nucleic Acids Res 50, D287–D294 (2022).
  • 27. Van Nostrand E. L. et al. Principles of RNA processing from analysis of enhanced CLIP maps for 150 RNA binding proteins. Genome Biol 21, 1–26 (2020).
  • 28. Bahar Halpern K. et al. Nuclear Retention of mRNA in Mammalian Tissues. Cell Rep 13, 2653–2662 (2015).
  • 29. Kotov A. A., Olenkina O. M., Kibanov M. V. & Olenina L. V. RNA helicase Belle (DDX3) is essential for male germline stem cell maintenance and division in Drosophila. Biochimica et Biophysica Acta (BBA) - Molecular Cell Research 1863, 1093–1105 (2016).
  • 30. Lai M.-C., Chang W.-C., Shieh S.-Y. & Tarn W.-Y. DDX3 Regulates Cell Growth through Translational Control of Cyclin E1. Mol Cell Biol 30, 5444–5453 (2010).
  • 31. Clemm von Hohenberg K. et al. Cyclin B/CDK1 and Cyclin A/CDK2 phosphorylate DENR to promote mitotic protein translation and faithful cell division. Nature Communications 13, 1–14 (2022).
  • 32. Tanenbaum M. E., Stern-Ginossar N., Weissman J. S. & Vale R. D. Regulation of mRNA translation during mitosis. Elife 4, (2015).
  • 33. Hoye M. L. & Silver D. L. Decoding mixed messages in the developing cortex: translational regulation of neural progenitor fate. Curr Opin Neurobiol 66, 93–102 (2021).
  • 34. Harnett D. et al. A critical period of translational control during brain development at codon resolution. Nature Structural & Molecular Biology 29, 1277–1290 (2022).
  • 35. Pilaz L. J. et al. Prolonged Mitosis of Neural Progenitors Alters Cell Fate in the Developing Brain. Neuron 89, 83–99 (2016).
  • 36. Ingolia N. T., Brar G. A., Rouskin S., McGeachy A. M. & Weissman J. S. The ribosome profiling strategy for monitoring translation in vivo by deep sequencing of ribosome-protected mRNA fragments. Nat Protoc 7, 1534–1550 (2012).
  • 37. Langmead B. & Salzberg S. L. Fast gapped-read alignment with Bowtie 2. Nat Methods (2012) doi: 10.1038/nmeth.1923.
  • 38. Dobin A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
  • 39. Huber W. et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat Methods 12, 115–121 (2015).
  • 40. Calviello L., Sydow D., Harnett D. & Ohler U. Ribo-seQC: comprehensive analysis of cytoplasmic and organellar ribosome profiling data. doi: 10.1101/601468.
  • 41. Lawrence M. et al. Software for Computing and Annotating Genomic Ranges. PLoS Comput Biol 9, e1003118 (2013).
  • 42. Lawrence M., Gentleman R. & Carey V. rtracklayer: an R package for interfacing with genome browsers. Bioinformatics 25, 1841–1842 (2009).
  • 43. Love M. I., Huber W. & Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15, 550 (2014).
  • 44. Liaw A. & Wiener M. Classification and Regression by randomForest. R News 2, 18–22 (2002).
  • 45. Friedman J., Hastie T. & Tibshirani R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J Stat Softw 33, 1 (2010).

Articles from bioRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints
