Author manuscript; available in PMC: 2018 Dec 21.
Published in final edited form as: Mol Cell. 2017 Dec 7;68(6):1083–1094.e5. doi: 10.1016/j.molcel.2017.11.014

Massively parallel reporter assay of 3′UTR sequences identifies in vivo rules for mRNA degradation

Michal Rabani 1,*, Lindsey Pieper 1, Guo-Liang Chew 1, Alexander F Schier 1,2,3,4,*
PMCID: PMC5994907  NIHMSID: NIHMS920643  PMID: 29225039

Summary

The stability of mRNAs is regulated by signals within their sequences, but a systematic and predictive understanding of the underlying sequence rules remains elusive. Here, we introduce UTR-Seq, a combination of massively parallel reporter assays and regression models, to survey the dynamics of tens-of-thousands of 3′UTR sequences during early zebrafish embryogenesis. UTR-Seq revealed two temporal degradation programs: a maternally encoded early-onset program and a late-onset program that accelerated degradation after zygotic genome activation. Three signals regulated early-onset rates: stabilizing poly-U and UUAG sequences, and destabilizing GC-rich signals. Three signals explained late-onset degradation: miR-430 seeds, AU-rich sequences and Pumilio recognition sites. Sequence-based regression models translated 3′UTRs into their unique decay patterns, and predicted the in vivo impact of sequence signals on mRNA stability. Their application led to the successful design of artificial 3′UTRs that conferred specific mRNA dynamics. UTR-Seq provides a general strategy to uncover the rules of RNA cis-regulation.

Graphical abstract

The sequences of mRNAs affect their stability. Rabani et al. introduce UTR-Seq to uncover the rules that translate mRNA sequences into decay patterns. They survey the decay of tens-of-thousands of mRNAs, identify sequences that regulate mRNA degradation during early zebrafish embryogenesis, and establish sequence-based models that predict mRNA decay.

Introduction

The post-transcriptional fate of mRNAs is determined by interactions between cis-acting elements and trans-regulators such as RNA binding proteins (RBPs) (Pique et al., 2008; Tadros et al., 2007; Wharton and Struhl, 1991) and miRNAs (Giraldez et al., 2006; Guo et al., 2010). For example, during early development, 3′UTR sequences serve as degradation-inducing binding sites for the fly SMAUG protein (Tadros et al., 2007) and for the zebrafish microRNA miR-430 (Giraldez et al., 2006). But even though sequence elements that regulate mRNA degradation have been identified (Barckmann and Simonelig, 2013; Bazzini et al., 2016; Geisberg et al., 2014; Giraldez et al., 2006; Kedde et al., 2007; Mishima and Tomari, 2016; Tadros et al., 2007; Voeltz and Steitz, 1998; Wharton and Struhl, 1991), a systematic and predictive understanding of the regulatory information within mRNAs remains elusive. Current strategies, including computational (Rabani et al., 2008; Ray et al., 2013), in vivo (Geisberg et al., 2014; Hafner et al., 2010) and in vitro (Tuerk and Gold, 1990) methods, are limited by the difficulty in obtaining genome-scale data, the degeneracy of sequence elements, and multiple or combinatorial interactions. As a result, individual regulatory interactions in mRNAs are not immediately translated into molecular phenotypes, and mRNAs that are targeted by a given regulator commonly display a variety of decay patterns (Giraldez et al., 2006; Rabani et al., 2014; Tadros et al., 2007). It has therefore been difficult to identify the sequence rules that determine the onset and kinetics of mRNA degradation.

Massively parallel reporter assays (MPRAs) have identified functional cis-regulatory elements for mRNA transcription (Arnold et al., 2013; Grossman et al., 2017; Kheradpour et al., 2013), stability (Oikonomou et al., 2014; Shalem et al., 2015; Yartseva et al., 2017; Zhao et al., 2014) and splicing (Rosenberg et al., 2015). MPRAs systematically introduce tens-of-thousands of artificial sequences into a biological system, and read the regulatory output via high-throughput sequencing. However, mRNA degradation studies have been mostly restricted to steady-state tissue culture cells (Oikonomou et al., 2014; Zhao et al., 2014) and relied on transfection of DNA sequences, requiring extensive normalization to distinguish transcriptional from post-transcriptional effects (Shalem et al., 2015). Even when applied within a dynamic system (Yartseva et al., 2017), measurements did not follow changes in mRNA levels at multiple time points, hindering the detection of subtle kinetic regulation. It has therefore been unclear how effectively MPRA approaches can identify the in vivo sequence determinants underlying mRNA dynamics.

One of the ultimate goals of decoding cis-regulatory elements is to develop computational models that explain and predict RNA degradation dynamics. In recent studies, models that were learned on MPRA data predicted alternative splicing frequencies (Rosenberg et al., 2015) and transcriptional enhancers (Grossman et al., 2017), but predictive stability modeling has not been attempted. Two recent predictive approaches of yeast mRNA stability were based on high complexity genomic data, limiting predictions to local effects within motif positions (Eser et al., 2016) or requiring other transcript properties in combination with sequence-derived features (Neymotin et al., 2016). It has therefore been unclear if computational models can explain and predict an mRNA's stability from its sequence.

The massive degradation of maternal mRNAs is a key regulatory event in early embryos (Jukam et al., 2017) and a powerful system to study mRNA dynamics in the absence of de novo transcription. At the onset of development, fertilized embryos are transcriptionally silent and exclusively rely on post-transcriptional programs encoded by maternally provided mRNAs and proteins. In zebrafish, maternal programs drive rapid cleavage divisions, patterning and cell movements (Kane and Kimmel, 1993). Starting at three hours post fertilization (hpf), newly synthesized zygotic mRNAs gradually replace destabilized maternal transcripts and establish distinct zygotically encoded developmental programs (Jukam et al., 2017). Studies of zebrafish maternal mRNA clearance have documented zygotically encoded pathways, including miR-430 mediated regulation (Giraldez et al., 2006), inefficient codon compositions (Bazzini et al., 2016; Mishima and Tomari, 2016) and lower levels of m6A-modified bases (Zhao et al., 2017). However, these elements do not explain the bulk of mRNA degradation. Moreover, little is known about clearance of maternal transcripts prior to zebrafish genome activation, with contradicting evidence that either implicated (Aanes et al., 2011; Voeltz and Steitz, 1998) or ruled out early mRNA decay (Vesterlund et al., 2011). In addition to genome-encoded sequence elements, the post-transcriptionally added poly(A) tail is also an important regulator of mRNA stability. Its shortening can direct mRNAs to degradation, while its elongation can protect them from it (Chen and Shyu, 2011). Tightly regulated changes of maternal poly(A) tails accompany oocyte maturation and early embryogenesis in many organisms (Hyman and Wormington, 1988; Weill et al., 2012), with different mRNAs maintaining different poly(A) lengths (Eichhorn et al., 2016; Lim et al., 2016; Subtelny et al., 2014).

Here we use the zebrafish in vivo system to implement a dynamic MPRA and identify 3′UTR sequences that regulate maternal mRNA clearance before and after genome activation. We integrate these individual signals into regression models, and deduce design rules for 3′UTR-mediated mRNA decay. The combination of MPRAs and regression models, called UTR-Seq, provides a general framework to study mRNA regulation.

Results

An MPRA to test the roles of 3′UTR sequences in mRNA stability

We developed an MPRA to assess the roles of short UTR sequences in mRNA stability (Figure 1, see STAR Methods). We designed 90,000 sequences (110nt long, Table S1) to cover annotated 3′UTRs of 7,208 zebrafish transcripts with expression in early embryos (Pauli et al., 2012). We synthesized and cloned these sequences into the 3′UTR of a GFP reporter, and in vitro transcribed a library of mRNA reporters with tens-of-thousands of different 3′UTR sequences (Figure 1A). Two mRNA reporter libraries with different poly(A) tail lengths were generated: pre-adenylated reporters (A+) were transcribed with a 36nt long poly(A) that is characteristic (Subtelny et al., 2014) of highly adenylated maternal mRNAs (75-th percentile at 2hpf), while non-adenylated reporters (A-) were transcribed without a poly(A) at the end of their 3′UTR to mimic deadenylated maternal messages (Hyman and Wormington, 1988). Despite biases in representation of sequences within our libraries, originating in the synthetic oligonucleotide pool, we successfully recovered 95% of the initial 90,000 sequences with a sequencing depth of 5 million aligned reads (Figure S1).

Figure 1. An MPRA survey of 3′UTR-mediated dynamic mRNA decay.

(a) In vitro synthesized DNA fragments (left) of zebrafish 3′UTR sequences (110nt, red and orange), with two flanking terminal adaptors (20nt, gray) were cloned into the 3′UTR of a GFP reporter (middle), and in vitro transcribed to generate mRNA reporters (right) with different 3′UTR sequences (red and orange). Two reporter pools contained either non-adenylated reporters (A-, top) or pre-adenylated reporters (A+, bottom). (b) mRNA reporters were microinjected into 1-cell staged zebrafish embryos (left), and hourly RNA samples were collected for the first 10 hours of development (middle). Stable mRNAs maintain a similar level in all samples (dark orange), while levels of unstable mRNAs decrease over time (light orange). Temporal samples were sequenced and normalized to internal spike-ins, to generate a decay profile for each reporter (right). (c) Temporal (columns; hpf) mRNA abundance (white: X0, orange: 16-fold above X0, purple: 16-fold below X0; log2-scale) of 34,809 reporters (rows; sorted by their predicted onset-times and half-life) that were measured with at least a minimal average coverage in both A+ (left) and A- (right) samples. See also Figures S1-S3 and Tables S1-S3.

We injected the two mRNA reporter libraries into 1-cell zebrafish embryos, and collected hourly samples through the first 10 hours of development (Figure 1B). Injected embryos developed normally, and expressed the GFP protein throughout the time-course, starting at 1hpf (Figure S2). We measured mRNA levels of reporters by sequencing their 3′UTRs (Tables S2, S3) and normalized by internal mRNA spike-ins (Figure S3). We sequenced two time-course experiments, with either A+ or A- reporters, at high depth (Figure 1C) and obtained temporal abundance profiles of 34,809 reporters (39%) that had at least a minimal average coverage in both experiments. In additional replicated experiments (Figure S3), technical and biological noise did not exceed 10% of variation (r2). These results demonstrate that the MPRA component of UTR-Seq can measure the temporal mRNA levels associated with thousands of different 3′UTR sequences.

A model-based approach to quantify dynamic changes in reporter mRNA stability

We developed a computational strategy to quantify the decay kinetics of individual mRNA reporters in our library from their temporal abundance (see STAR Methods). We tested two alternative degradation models with increasing complexity and more kinetic parameters. The simpler ‘early-onset’ degradation model (Figure 2A) assumed that mRNA abundance follows first-order exponential decay kinetics with a temporally constant rate (β) that does not change throughout developmental time. The more complex ‘late-onset’ degradation model (Figure 2B) described changes in degradation by a step function with three parameters: shifting from an initial rate (β0) into a final rate (β) at onset-time (t0). Both models also inferred the initial level of each reporter (X0) to normalize their representation differences.
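The two degradation models can be written down compactly. The following is a minimal sketch, not the authors' implementation: it assumes natural-exponential kinetics, X(t) = X0·exp(-βt), which is consistent with the fold-changes quoted in the text (β=0.63 h-1 as a 1.9-fold decrease/h, β=0.38 h-1 as a 1.5-fold decrease/h). All function names are illustrative.

```python
import math

def log2_level_early(t, x0_log2, beta):
    """Early-onset model: one constant rate beta (1/h) at all times."""
    return x0_log2 - beta * t / math.log(2)

def log2_level_late(t, x0_log2, beta0, beta, t0):
    """Late-onset model: rate beta0 before onset time t0, rate beta after it."""
    hours_before = min(t, t0)
    hours_after = max(t - t0, 0.0)
    return x0_log2 - (beta0 * hours_before + beta * hours_after) / math.log(2)

def half_life(beta):
    """Half-life (h) equivalent of a first-order rate; infinite for beta <= 0."""
    return math.inf if beta <= 0 else math.log(2) / beta

def fold_decrease_per_hour(beta):
    """Fold decrease in mRNA level over one hour at rate beta."""
    return math.exp(beta)
```

With beta0 equal to beta, the late-onset function reduces to the early-onset one, which is what makes the two models nested.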

Figure 2. Degradation kinetics of UTR-Seq reporters. (a-b).

Top: temporal (x-axis; hpf) mRNA levels (X; top y-axis, black dots, log2-scale) and fitted decay kinetics (dashed light blue lines) of example reporters. Bottom: associated degradation function (b; bottom y-axis; dark-blue). Kinetic parameters are noted on plot. (a) Reporters that retained an early-onset decay had a constant degradation function with a single rate parameter (β, h-1). (b) Reporters that fit a late-onset decay had a step-change degradation function with three parameters: initial rate (β0, h-1), final rate (β, h-1) and onset-time (t0, h). Both models also inferred the initial expression level of each reporter (X0; top). (c) Fraction of A- (top, blue) and A+ (bottom, purple) reporters that fit early-onset (light colors, left) or late-onset (dark colors, right) degradation. Venn diagram shows overlap between A- (blue) and A+ (purple) late-onset classes. Numbers and percentages are noted on bars. (d) Distributions of degradation onset times (t0, x-axis, hpf) of A- (blue) and A+ (purple) late-onset reporters (y-axis, % of reporters). (e) Distributions of degradation rates (β, h-1, x-axis; labeled with half-life equivalents, h) of A- (blue) or A+ (purple) reporters (y-axis, % of reporters). Dashed lines represent distribution medians. See also Figure S4.

For each reporter, we used a canonical likelihood ratio test to select which model described its decay more accurately. We assigned late-onset kinetics only to reporters that confidently (p<0.01) rejected the simpler early-onset model. For these reporters, early changes in mRNA levels (before t0) were markedly different from later ones, and followed a different rate constant. For example, although mRNA levels of reporters in Figure 2B did not change much initially (β0<0.15, up to 1.1-fold decrease/h), they decreased very quickly (β=0.63, 1.9-fold decrease/h) after 4 hpf (t0). In all other cases we retained the early-onset model that explained both early and late changes in mRNA levels with the same rate parameter. For example, reporters in Figure 2A decayed at the same rate throughout the response: either with no significant change to mRNA levels (β=0), or a constant 1.5-fold decrease/h (β=0.38).
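The model selection step can be illustrated with a small, self-contained sketch. It is not the authors' procedure (the details are in STAR Methods); it assumes Gaussian noise on log-scale levels, fits both models by least squares, scans the onset time t0 over interior sampling times, and uses a chi-square approximation with 2 degrees of freedom (the two extra parameters β0 and t0), for which the tail probability is exactly exp(-x/2). The chi-square approximation is only approximate for a change-point parameter. All names are hypothetical.

```python
import math

def solve(a, b):
    """Solve a small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def rss(design, y):
    """Residual sum of squares of the least-squares fit of y on the design rows."""
    k = len(design[0])
    ata = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * v for r, v in zip(design, y)) for i in range(k)]
    coef = solve(ata, aty)
    return sum((v - sum(c * x for c, x in zip(coef, r))) ** 2
               for r, v in zip(design, y))

def late_onset_p_value(times, ln_levels):
    """Likelihood ratio p-value for rejecting the early-onset model."""
    n = len(times)
    eps = 1e-12  # floor to avoid log(0) on noise-free data
    rss_early = rss([[1.0, t] for t in times], ln_levels)
    # Candidate onset times: interior samples, so both segments are constrained.
    candidates = sorted(set(times))[2:-2]
    rss_late = min(
        rss([[1.0, min(t, t0), max(t - t0, 0.0)] for t in times], ln_levels)
        for t0 in candidates
    )
    stat = n * math.log(max(rss_early, eps) / max(rss_late, eps))
    return min(1.0, math.exp(-stat / 2.0))  # chi-square tail, 2 d.o.f.
```

A reporter would be assigned late-onset kinetics in this sketch when the returned p-value is below 0.01, mirroring the threshold stated above.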

Predictions by the selected degradation models (early-onset or late-onset) closely matched the measured mRNA levels (96% of time-courses with ‘goodness of fit’ test p>0.05, 72% with r2>0.7, Figure S4), and explained over 81% (r2) of the overall variation in this data (excluding variation of initial mRNA levels). Models that were independently learned on replicated experiments predicted highly correlated degradation rates and onset times (Figure S4). Notably, predictions of onset times rely on fewer samples, and are therefore less robust than degradation rates that are predicted by regression from multiple samples. Overall, we successfully fitted both A+ and A- time-courses of 31,319 reporters (35%) at a 5% confidence level. These results support the validity of our modeling strategy in capturing the in vivo dynamics of reporter levels.

A variety of degradation kinetics in mRNA reporters with different 3′UTRs

We observed a range of decay kinetics within the reporter libraries, with significant differences in both onset times and rates of degradation. Levels of most mRNAs (68% A+, 92% A-, Figure 2C) followed the early-onset model with no significant changes to rate. In contrast, mRNAs that exhibited late-onset degradation (32% A+, 8% A-, Figure 2C) were initially very stable, but their degradation accelerated during genome activation (3-5hpf, Figure 2D), and their mRNA levels started decreasing rapidly (mean half-life early >10h, late 1.4h, Figure 2E).

Within each program, we measured a distribution of rates with more than 10-fold differences in mRNA half-lives that ranged from 1 to more than 10 hours (Figure 2E), similarly to endogenous maternal mRNAs (Rabani et al., 2014). For example, 1,144 A+ reporters with a miR-430 seed (GCACUU) in their 3′UTR had an average half-life of 1.4h, but 1,036 reporters with a slightly permuted sequence (GCCAUU) had an average half-life of 3.6h. Within this distribution, late-onset rates that initiated after genome activation were faster than early-onset rates (mean A+ half-life: 3.8h early-onset, 1.6h late-onset, Figure 2E), and A- reporters decayed faster than A+ messages (mean half-life: 2.4h A-, 3.3h A+). Together, these results show that the mRNA reporters revealed a repertoire of dynamic in vivo decay profiles that originated from changes in 3′UTR sequences of mRNAs.

Regulation by poly(A) length identifies three classes of 3′UTR sequences

To determine the roles of poly(A) tails, we compared the A+ and A- decay kinetics of each reporter, and identified three classes of 3′UTRs. For the majority of reporters (class I, 67%, Figure 3A) poly(A) tails affected the rates of degradation (β). 3′UTRs in this class (early-onset 3′UTRs) drove a constant early-onset decay of both A+ and A- mRNAs. Degradation of A+ reporters was slower than that of the cognate A- messages (mean half-life: 3.8h A+, 2.7h A-) with the strongest (1.7-fold) effect on the least stable mRNAs (top 10% mean half-life: 2.7h A+, 1.6h A-). For example, the reporter in Figure 3A was stabilized by 1.7-fold when injected with a poly(A). Poly(A) tails, however, had minimal effect on the relative ordering of degradation rates, which remained highly correlated between A- and A+ reporters in this class (Pearson r 0.73). These results indicate that poly(A) tails of class I mRNAs stabilize them against early-onset decay.

Figure 3. Three classes of 3′UTR sequences with distinct regulation by poly(A). (a-c).

Top: temporal (x-axis; hpf) mRNA levels (y-axis; log2-scale; A+ black, A- gray dots) and fitted decay kinetics (dashed lines, A+ purple, A- blue) of example reporters. Half-life equivalent (h) of degradation rates are noted in brackets. Middle: distributions of degradation rates (β, h-1, x-axis; labeled with half-life equivalents, h) of A- (blue) or A+ (purple) reporters (y-axis, % of reporters). Dashed lines represent distribution medians. Bottom: correlation between degradation rates (β, h-1, log2-scale) of A- (x-axis) and A+ (y-axis) reporters. Colors represent density (yellow: high density, blue: low density). Pearson correlation (r) is noted. (a) Class I reporters (early-onset 3′UTRs). (b) Class II reporters (late-onset 3′UTRs, initial poly(A) dependent). (c) Class III reporters (late-onset 3′UTRs, initial poly(A) independent). Number of reporters in each class is noted in brackets.

In the second class of messages (class II, 25%, Figure 3B), poly(A) tails affected the onset time of degradation (t0). A+ reporters with class II 3′UTRs (late-onset 3′UTRs, initial poly(A) dependent) were initially stable and their degradation accelerated upon zygotic genome activation (3-5hpf), while A- reporters decayed at a constant early-onset rate throughout the time course. For example, a poly(A) tail delayed the degradation of the reporter in Figure 3B by 3h. These results show that poly(A) tails of class II mRNAs delay the onset of their degradation until genome activation.

Finally, messages in class III (7%, Figure 3C) were degraded by a late-onset program independently of their initial poly(A) tails. Both A+ and A- mRNAs with class III 3′UTRs (late-onset 3′UTRs, initial poly(A) independent) exhibited similar decay kinetics, and accelerated their degradation upon zygotic genome activation. For example, a poly(A) tail had no evident effect on degradation of the reporter in Figure 3C. These accelerated rates had similar distributions and were correlated between A- and A+ reporters (Pearson r 0.40). These results reveal that class III 3′UTRs drive late-onset degradation independently of initial poly(A) tails.

Taken together, the analyses of 3′ UTRs vis-à-vis poly(A) tails show that initial poly(A) tails stabilized mRNAs but had distinct and specific effects on mRNAs with different 3′UTRs.
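The three classes follow directly from which kinetic model each reporter received in the A+ and A- libraries. A minimal sketch of that assignment rule is shown below; the function name and string labels are illustrative, and the fourth combination (early-onset A+, late-onset A-) is left unassigned because the text does not define a class for it.

```python
def poly_a_class(a_plus_model, a_minus_model):
    """Map the fitted models ('early' or 'late') of one 3'UTR to class I-III.

    Returns None for the combination that the text does not classify.
    """
    if a_plus_model == "early" and a_minus_model == "early":
        return "I"    # early-onset 3'UTRs: poly(A) changes the rate
    if a_plus_model == "late" and a_minus_model == "early":
        return "II"   # late-onset, initial poly(A) dependent
    if a_plus_model == "late" and a_minus_model == "late":
        return "III"  # late-onset, initial poly(A) independent
    return None       # assumption: not covered by the three classes
```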

Regulatory signals within groups of 3′UTR sequences

To identify 3′UTR signals that regulate degradation kinetics, we searched for sequence enrichments within our data (Table S4, see STAR Methods). Relying on evidence that most RBPs bind 3-7nt long sequences (Lunde et al., 2007), we tested such k-mers for their association with specific decay classes (hypergeometric test) and degradation rates (KS-test).
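As an illustration of the class-association test, the sketch below computes a one-sided hypergeometric enrichment p-value for a k-mer within a decay class. It is a simplified stand-in for the analysis described in STAR Methods: the input format (a list of sequence and class pairs) and all function names are assumptions, and the KS-test on degradation rates is not shown.

```python
from math import comb

def hypergeom_enrichment_p(total, in_class, with_kmer, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(total, in_class, with_kmer)."""
    upper = min(in_class, with_kmer)
    favorable = sum(
        comb(in_class, k) * comb(total - in_class, with_kmer - k)
        for k in range(overlap, upper + 1)
    )
    return favorable / comb(total, with_kmer)

def kmer_class_enrichment(reporters, kmer, target_class):
    """Enrichment of reporters containing `kmer` within `target_class`.

    `reporters` is a list of (utr_sequence, class_label) pairs.
    """
    total = len(reporters)
    in_class = sum(1 for _, c in reporters if c == target_class)
    with_kmer = sum(1 for s, _ in reporters if kmer in s)
    overlap = sum(1 for s, c in reporters if kmer in s and c == target_class)
    return hypergeom_enrichment_p(total, in_class, with_kmer, overlap)
```

Exact integer arithmetic keeps the sum accurate for large libraries, although p-values as small as those reported here fall below double precision and would be returned as 0.0.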

Three signals were linked to changes in early-onset degradation rates. Poly-U (Radford et al., 2008) and UUAG sequences (Charlesworth et al., 2006) were associated (Figure 4A) with slower degradation of reporters (A- p<2e-188), while GC-rich signals were associated (Figure 4B) with faster decay (A- p<8e-92). All signals had the strongest effect on A- reporters. For example, in class I (early-onset 3′UTRs), 1,533 A- reporters with a UUUUUUU sequence in their 3′UTR (Figure 4A) had an average half-life of 6.9h compared to 719 reporters with the C-rich CUCCUC sequence (Figure 4B) that had an average half-life of 2.5h. In our analysis, class III reporters (late-onset 3′UTRs, initial poly(A) independent) were enriched in stabilizing (37%, p<7e-27) and depleted of destabilizing (13%, p<8e-37) signals, consistent with an increased reporter stability before genome activation.

Figure 4. Sequence enrichments within groups of 3′UTRs. (a-c).

Top: distribution of degradation rates (β, h-1, x-axis; labeled with half-life equivalents, h) of A+ (top) or A- (bottom) reporters (y-axis, % of reporters) with indicated sequences in their 3′UTRs (colored solid lines) or without them (black dashed lines). Number of reporters is noted in brackets. Middle: fraction of 3′UTRs (x-axis) within each class (I-III, y-axis) with indicated sequences (colored). Overall percentage per class is noted. Bottom: fraction of each class (I-III, gray-scale) within 3′UTRs with any of the indicated sequences (y-axis). (a) Two poly-U and two UUAG sequences (green lines) that associate with a slower early-onset degradation. (b) Four GC-rich sequences (red lines) that associate with a faster early-onset degradation. (c) Three acceleration sequences (blue lines) that associate with a faster late-onset degradation. See also Table S4.

Three other signals were associated (Figure 4C) with destabilizing late-onset reporters after genome activation: a core AU-rich (ARE) sequence (UAUUUAU) (Voeltz and Steitz, 1998), the known miR-430 seed (GCACUU) (Giraldez et al., 2006) and a Pumilio protein (PUM) recognition sequence (UGUAHAUA) (Gamberi et al., 2002). Reporters containing these sequences in their 3′UTR (<1% contain more than one) were frequently degraded via the late-onset program (A+ 87%, p<1e-300) and at faster rates (A+ p<1e-300). For example, we measured (Figure 4C) an average half-life of 1.4h for 1,144 A+ reporters with a miR-430 seed, 1.6h for 1,435 reporters with an ARE and 1.7h for 1,083 reporters with a PUM site, compared to 3.8h for reporters without any of these sequences in their 3′UTR. Although these three sequences represent the core elements with maximal signal, variants also had regulatory effects. For example, 88% of reporters with an ARE (UAUUUAU) had a late-onset A+ decay, and so did 71% of reporters with a variant of this sequence (UAUUUAA). These results show that UTR-Seq can identify the in vivo sequence determinants underlying mRNA dynamics, and revealed two stabilizing and four destabilizing mRNA regulatory elements.

Constant early-onset degradation relies on maternal factors, while late-onset acceleration requires zygotic factors

We hypothesized that the observed kinetics of mRNA reporters arises from the interplay of ‘maternal’ degradation signals with a constant and early effect and ‘zygotic’ signals that accelerate degradation upon genome activation. To test this hypothesis, we introduced the reporter libraries into activated, but unfertilized zebrafish oocytes that contained only maternally deposited factors, and measured their decay in the absence of any zygotic machinery (Figure S2).

We found that reporters with an early-onset degradation in embryos were also degraded at similar rates in activated oocytes (Figure 5A). Consistent with this result, early-onset signals had similar effects in activated oocytes as in fertilized embryos (Figure 5B). In contrast, reporters with late-onset degradation in embryos were stabilized in activated oocytes (Figure 5A), and any effect of late-onset signals on their degradation rates was lost in this context (Figure 5B). Based on these observations, we conclude that an early-onset degradation of reporters does not require the activation of the zygotic genome, and the effect of early-onset signals is independent of any zygotic factors. On the other hand, late-onset acceleration of degradation depends on zygotic factors, and therefore only initiates after the zygotic genome is activated.

Figure 5. mRNA degradation in unfertilized zebrafish oocytes.

(a) Correlation between degradation rates (β, h-1, log2-scale) in embryos (x-axis) and activated oocytes (y-axis) for A- (top) or A+ (bottom) reporters with an early-onset (left) or a late-onset (right) degradation in embryos. Colors represent density (yellow: high density, blue: low density). Pearson correlation (r) and number of reporters are noted. (b) Distribution of degradation rates (β, h-1, x-axis; labeled with half-life equivalents, h) in embryos (left) or activated oocytes (right) for A+ (top) or A- (bottom) reporters (y-axis, % of reporters) with indicated sequences in their 3′UTRs (colored solid lines) or without them (black dashed lines).

Sequence-based regression models predict reporter mRNA clearance rates

One of the ultimate goals of decoding cis-regulatory elements is to develop sequence-based models that explain and predict RNA degradation dynamics. We therefore implemented as the second part of UTR-Seq a linear regression analysis (Grossman et al., 2017; Rosenberg et al., 2015) that captured individual regulatory signals in 3′UTRs and integrated them into a single prediction of mRNA decay rates (see STAR Methods). Regression models represented each mRNA reporter by its k-mer composition, counting the number of times that any 3-7nt long sequence occurred in its 3′UTR. They assigned a weight to each k-mer, and predicted the degradation rate (β) of a given 3′UTR by the sum of weights of all k-mers within it. Thus, any k-mer with a non-zero weight contributes independently and additively (no cooperativity) to the overall rate. We learned two regression models (Figure 6A-B) that decode the sequence-based rules of either early-onset or late-onset degradation. We evaluated their performance using 10-fold cross-validation that repeatedly used a random 90% of the data to learn a model and predicted the rates of the remaining 10% (see STAR Methods).
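The additive k-mer representation described above can be sketched in a few lines. This is an illustration only: the weights in any real use would come from the fitted regression, the intercept term is an assumption, and the way the weights are learned (see STAR Methods) is not shown. The fold-splitting helper is one simple way to obtain ten random 90%/10% partitions.

```python
import random
from collections import Counter

def kmer_counts(utr, k_min=3, k_max=7):
    """Count every 3-7nt long subsequence (with overlaps) in a 3'UTR."""
    counts = Counter()
    for k in range(k_min, k_max + 1):
        for i in range(len(utr) - k + 1):
            counts[utr[i:i + k]] += 1
    return counts

def predict_rate(utr, weights, intercept=0.0):
    """Predicted degradation rate: sum of k-mer weights times their counts.

    Each k-mer contributes independently and additively (no cooperativity).
    `weights` maps k-mer -> weight (1/h); `intercept` is an assumed baseline.
    """
    counts = kmer_counts(utr)
    return intercept + sum(weights.get(kmer, 0.0) * n for kmer, n in counts.items())

def cross_validation_folds(n_items, n_folds=10, seed=0):
    """Random partition of item indices into folds; each fold is held out once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[f::n_folds] for f in range(n_folds)]

# Size of the feature space: all possible RNA k-mers of length 3 to 7.
N_POSSIBLE_KMERS = sum(4 ** k for k in range(3, 8))
```

For example, `predict_rate("AAGCACUUAA", {"GCACUU": 0.1})` uses a made-up weight for the miR-430 seed and returns that weight once, because the seed occurs a single time in the sequence.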

Figure 6. Regression models decode sequence-based rules of mRNA degradation. (a-b).

Left: correlation between degradation rates (β, h-1, log2-scale) of A+ (left) or A- (right) reporters as measured by MPRA (x-axis) or predicted by regression from 3′UTR sequences (y-axis, 10-fold cross validation). Rates were scaled as indicated (y-axis) when initial poly(A) differed from the training set. Colors represent density (yellow: high, blue: low density). Pearson correlation (r) is noted. Right: motif logos of destabilizing (top) and stabilizing (bottom) regression peaks. Percent of peaks assigned is indicated below logos. Number of overall peaks is noted in brackets. (a) Early-onset regression rates of early-onset reporters (class I). (b) Late-onset regression rates of late-onset reporters (classes II and III). (c-e) Left: early-onset (gray) or late-onset positional regression weights (y-axis, weight, h-1) of 3′UTR sequences (x-axis, position). Yellow squares mark peaks (peak sequence is noted). Right: temporal (x-axis; hpf) mRNA levels (y-axis; log2-scale) of A+ (left) or A- (right) reporters with 3′UTR sequences, and their putative gain-of-function and loss-of-function changes. (c) Three destabilizing late-onset peaks (top to bottom: miR-430, ARE, PUM). (d) A stabilizing early-onset poly-U peak. (e) Two neutral sequences with no predicted peaks. See also Figures S5-S7 and Tables S5-S7.

A sequence-based early-onset regression model that was trained on reporters in class I (early-onset 3′UTRs) explained 60% of variation among their early-onset rates (cross-validation r2, A- 67%, A+ 38%, Figure 6A) with an additional 13% of variance due to technical noise (Figure S5). Separate early-onset regression models for either A+ or A- reporters explained an additional 9% of variation (cross-validation r2, 69% overall), leaving ∼18% of variation not explained by either 3′UTR sequence or poly(A) tail. This remaining variation could arise from regulatory interactions that are not modeled in our k-mer based linear regression. Analysis of the position of signals within 3′UTRs (Figure S5) suggests that this effect has minimal, if any, contribution. Order of elements, cooperative interactions or secondary structures could still be involved.

A late-onset regression model that was trained on reporters from all classes explained 43% of variation in degradation rates of late-onset reporters (classes II and III, cross-validation r2, A- 29%, A+ 38%, Figure 6B). Estimated technical noise reached as much as 60% of variance (Figure S5), most likely a result of noisy predictions of late-onset times. Combined predictions of early-onset and late-onset regression explained 70% of variation in measured degradation rates (cross-validation r2, A- 60%, A+ 77%), with an additional 28% due to technical noise (Figure S4). These results establish that the regression component of UTR-Seq successfully predicted degradation rates of mRNA reporters from the k-mer composition of their 3′UTRs.

Position specific regression weights identify regulatory sequences

In order to understand the sequence-based rules behind the regression models and extend the k-mer enrichment analysis presented above, we identified 3′UTR locations associated with high regression scores (peaks, see STAR Methods). Such peaks represent short regulatory elements: positive peaks are destabilizing and negative peaks are stabilizing. For example, positive peaks overlap destabilizing miR-430 seeds (Figure 6C) and negative peaks overlap stabilizing poly-U sequences (Figure 6D). Since peak signals are potential targets for any of the multiple regulators that govern our measurements, we decomposed them into separate signals by their sequence composition (see STAR Methods). Early-onset peaks (Figure 6A) were both destabilizing and stabilizing. Destabilizing peaks (3,520 peaks, 20%, 0.03h-1 avg. height, 7.6nt avg. width) were associated with two late-onset acceleration signals (miR-430 46%, ARE 42%) and with diverse GC-rich sequences (12%). Stabilizing peaks (14,333 peaks, 80%, -0.037h-1 avg. height, 6.9nt avg. width) predominantly contained poly-U (61%) and UUAG (36%) signals. Stabilizing peaks were enriched in class III (late-onset 3′UTRs, initial poly(A) independent, p<6e-49) and depleted from class I (early-onset 3′UTRs, p<2e-16).
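One plausible way to turn k-mer weights into positional scores and peaks is sketched below. The actual peak-calling procedure is described in STAR Methods and may differ; here each position is assumed to receive the summed weight of every weighted k-mer occurrence that covers it, and a peak is assumed to be a contiguous run of positions whose score passes a threshold with a constant sign. The example weights are made up.

```python
def positional_weights(utr, weights, k_min=3, k_max=7):
    """Per-position score: sum of weights of all k-mer occurrences covering it."""
    profile = [0.0] * len(utr)
    for k in range(k_min, k_max + 1):
        for i in range(len(utr) - k + 1):
            w = weights.get(utr[i:i + k], 0.0)
            if w:
                for p in range(i, i + k):
                    profile[p] += w
    return profile

def call_peaks(profile, threshold):
    """Contiguous same-sign runs with |score| >= threshold.

    Returns (start, end, label) with end exclusive. Positive peaks are
    labeled destabilizing and negative peaks stabilizing, as in the text.
    """
    peaks, start, current = [], None, 0
    for p, w in enumerate(list(profile) + [0.0]):  # sentinel closes the last run
        if w == 0 or abs(w) < threshold:
            sign = 0
        else:
            sign = 1 if w > 0 else -1
        if start is not None and sign != current:
            label = "destabilizing" if current > 0 else "stabilizing"
            peaks.append((start, p, label))
            start = None
        if sign != 0 and start is None:
            start, current = p, sign
    return peaks

# Hypothetical weights (1/h) for a miR-430 seed and a poly-U stretch.
example_weights = {"GCACUU": 0.12, "UUUUUU": -0.04}
example_profile = positional_weights("AAGCACUUAAUUUUUUAA", example_weights)
example_peaks = call_peaks(example_profile, threshold=0.03)
```

In this toy sequence the seed yields one positive peak and the poly-U stretch one negative peak, matching the sign convention used for Figure 6C-D.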

Late-onset peaks were overwhelmingly destabilizing (6,743 peaks, 99.8%, 0.12h-1 avg. height, 9.6nt avg. width). These peaks fit into one of three motifs (Figure 6B): miR-430 (31%), ARE (45%) and PUM (24%). Late-onset peaks were enriched in class II and III (p<1e-300, 73% of reporters) and depleted in class I (p<1e-300, 6% of reporters). Class III reporters combined late-onset acceleration with early-onset stabilizing signals (p<2e-28) that matched the sharp transition during genome activation. In contrast, this combination was depleted in class II (p<3e-8) that instead combined late-onset destabilizing peaks with early-onset GC-rich destabilizing peaks (p<2e-8) and correlated with a smoother transition during genome activation.

The regression peaks not only confirmed but also extended the sequence motifs we had identified through enrichment analysis. For example, 47% of class III reporters contained sequence enrichment signals (Figure 4C), whereas 73% of these reporters contained late-onset peaks identified through regression models.

The early-onset and late-onset regression models used different sets of k-mers to predict rates (1,264 and 718 k-mers respectively, out of 21,824 possible k-mers). Their comparison identified 122 overlapping k-mers, but those had different weights in the two models. For example, the miR-430 seed (GCACUU) had a 4.4-fold stronger late-onset weight, and poly-U (UUUUUU) had a 6.8-fold stronger early-onset weight. Moreover, some k-mers had opposite effects on degradation in the two models. For example, the 7-mer UAAAUUA had a positive late-onset weight (0.04h-1) and a negative early-onset weight (-0.02h-1), suggesting that this motif both stabilizes mRNAs early and enhances their clearance after genome activation.

In summary, the regression component of UTR-Seq identified six major sequence determinants and their variants underlying mRNA dynamics that confirmed and refined the regulatory signals initially identified from sequence enrichments.

Regression peaks in 3′UTRs of endogenous maternal mRNAs

UTR-Seq identified 3′UTR signals that regulate mRNA stability in our reporter library. To determine the role of these elements in the stability of endogenous maternal mRNAs, we analyzed regression peaks (Figure S6) in the annotated 3′UTRs of 2,021 maternal-only mRNAs (Rabani et al., 2014). Consistent with the reporter library, the 3′UTRs of maternal mRNAs were enriched for miR-430 (p<2e-13) and ARE (p<7e-10) late-onset peaks. Maternal mRNAs that contained these acceleration signals degraded faster than mRNAs without such peaks (35% and 20% decrease in average half-life respectively), and their poly(A) lengths (Subtelny et al., 2014) were shorter after the onset of zygotic transcription. Similarly consistent with the reporter library, poly-U and UUAG peaks were enriched in maternal mRNAs whose degradation was slower (32% and 24% increase in average half-life respectively) and whose poly(A) tails were longer.

Regression models also predicted the degradation rates of maternal mRNAs (Figure S6, see STAR Methods). After controlling for variable lengths of endogenous 3′UTRs, predictions by late-onset regression explained 7% (Pearson r = 0.27) of variability in maternal degradation rates. As a control, the 5′UTRs of those genes and random permutations did not predict any of the observed variability (0%, absolute Pearson r < 0.03).

These results reveal that, despite confounding factors (see Discussion), 3′UTR signals that we learned from the reporter library also influence the stability of endogenous transcripts.

Using regression models to design mRNAs with specific decay onsets and rates

UTR-Seq revealed regulatory rules of mRNA degradation. To test if these rules accurately predict the impact of sequence variations and can design mRNAs with defined degradation dynamics, we selected six reporter sequences for further analyses. We chose two ‘neutral’ sequences that did not contain any peaks (Figure 6E), and four sequences with a single predicted peak that is associated with one of the main signals in our data: three late-onset acceleration signals (Figure 6C) and an early-onset poly-U signal (Figure 6D). We used the model predictions to introduce putative gain-of-function and loss-of-function changes into these sequences, by replacing neutral positions with functional peaks or by mutating peaks into non-functional sequences (Table S5). We tested each of the resulting modifications for their in vivo effects on mRNA stability (Tables S6, S7). The measured degradation rates of the 58 designed reporters matched the predictions by sequence-based regression models (Pearson r, 0.68 early-onset, 0.51 late-onset, Figure S7). Predictions explained 71% of variation (r2) in degradation rates, reaching a similar result as demonstrated by cross-validation analysis of the original set of reporters.

Even though differences in 3′UTR sequences outside peak positions created some variation between experiments, the resulting mRNA decay profiles confirmed the functionality of peak sequences. In gain-of-function experiments (Figure 6E), late-onset signals destabilized neutral A+ reporters by accelerating their decay after genome activation. Conversely, eliminating late-onset signals (Figure 6C) stabilized both A+ and A- reporters, with a stronger effect on A+ reporters. Introducing poly-U signals stabilized neutral A- reporters, whereas removing poly-U signals (Figure 6D) destabilized both A+ and A- reporters.

Pairwise combinations of regulatory peaks confirmed that interactions between peaks were mostly additive (Figure 6C-D). Increased numbers of late-onset signals raised degradation rates, whereas more poly-U signals reduced degradation rates. Late-onset elements destabilized poly-U containing reporters after genome activation, with the strongest effect by miR-430 seeds. Similarly, poly-U elements stabilized mRNAs with ARE or miR-430 late-onset signals, and sharpened the transition between the two phases of the response. Taken together, these results validate the core signals and regulatory principles of mRNA clearance identified by UTR-Seq and demonstrate the power of rule-based design to create mRNAs with defined degradation kinetics.

Discussion

We developed UTR-Seq, a dynamic MPRA and computational framework for the high-throughput analysis of 3′UTR signals that regulate mRNA stability, and applied it to the degradation of maternal mRNAs in zebrafish. Our study provides four major conclusions. First, the MPRA is a powerful approach to study how 3′UTR sequences regulate mRNA stability. This technology maps dynamic decay profiles and identifies sequence determinants at a much larger in vivo scale and resolution than previous approaches (Aanes et al., 2011; Geisberg et al., 2014; Pique et al., 2008; Yartseva et al., 2017). Second, our study reveals two main temporal degradation programs during early zebrafish embryogenesis: a maternally encoded early-onset decay program, and a late-onset program to accelerate degradation after zygotic genome activation. Third, our analysis identifies regulatory sequences with specific roles: stabilizing poly-U and UUAG signals and destabilizing GC-rich signals act via early-onset pathways; and miR-430 seeds, ARE signals and PUM sites promote late-onset degradation. Finally, our regression models convert regulatory information in 3′UTR sequences into mRNA degradation patterns and allow the design of synthetic mRNAs with defined stabilities.

Identification of cis-elements for maternal mRNA degradation

By surveying a large sequence space (3.8 million nt) of endogenous 3′UTRs, we identified six major regulatory sequences of mRNA degradation. Although our experimental design does not account for secondary structures or long-range effects, the scale of the MPRA suggests that these six elements are the major short 3′UTR sequences that regulate early embryonic mRNA degradation in zebrafish.

Degradation of maternal mRNAs has been shown to utilize both early-onset (maternal) and late-onset (zygotic) pathways (Barckmann and Simonelig, 2013), but early-onset pathways have not been characterized in zebrafish. Our results demonstrate early-onset clearance of injected mRNAs, particularly for deadenylated mRNAs (A-), and identify three signals that affected their stability independently of any zygotic transcription. After the onset of zygotic genome activation, UTR-Seq discovered that in addition to the known miR-430 seeds (Giraldez et al., 2006), ARE and PUM signals also induce late-onset decay and require zygotically encoded factors for their activity. These decay signals have not been identified in previous studies in zebrafish (Aanes et al., 2011; Yartseva et al., 2017).

Our data suggests a model (Figure 7A) in which the combined action of early-onset and late-onset pathways determines mRNA decay in early embryos. After fertilization, early-onset programs stabilize mRNAs whose 3′UTRs are rich in poly-U and UUAG signals and depleted of GC-rich signals. Pre-existing poly(A) tails further stabilize mRNAs (by up to 1.7-fold), but their effect diminishes as 3′UTR-encoded stability increases. Upon activation of the zygotic genome, late-onset signals override early stabilization and accelerate degradation. This interplay between the two pathways generates a range of dynamic behaviors in mRNAs. For example, early-onset signals govern the constant degradation of mRNAs in class I (early-onset 3′UTRs) in the absence of late-onset signals. On the other hand, late-onset signals accelerate degradation of messages in class II (late-onset 3′UTRs, initial poly(A) dependent) and class III (late-onset 3′UTRs, initial poly(A) independent) after genome activation. Because 3′UTRs of class II mRNAs encode a faster early-onset decay, A- mRNAs in this class are degraded early and transition smoothly between the two phases of the response, while A+ mRNAs are stabilized by their poly(A) tails until genome activation. mRNAs in class III combine late-onset acceleration with high early-onset stabilities to generate a sharper transition between the two phases of the response. This model not only explains the dynamics of mRNA reporters, but also allowed us to create mRNAs with predefined degradation kinetics, by adding, removing or combining specific 3′UTR elements. This synthetic biology approach validated the core signals and regulatory principles that emerged from UTR-Seq, providing further support to this model, and demonstrated the power of rule-based design of mRNAs with defined degradation kinetics.

Figure 7. mRNA decay in early embryos.

(a) Left: Early-onset 3′UTR signals (x-axis) stabilize (poly-U, left) or destabilize (GC-rich, right) mRNAs. Initial poly(A) tails (purple bar, top) increase low early-onset stabilities, but their effect diminishes as 3′UTR-encoded stability increases. Late-onset 3′UTR signals (y-axis) accelerate degradation after genome activation (miR-430, ARE, PUM, top). Reporters are divided into class I (early-onset), class II (late-onset, initial poly(A) dependent) and class III (late-onset, initial poly(A) independent). Right: temporal (x-axis, h) mRNA levels (y-axis, log2-scale) of A+ (purple) and A- (blue) reporters with early-onset and late-onset rates as indicated on plot (black dots, 1-4, numbered). (b) RNA-Seq measurements of temporal (x-axis, h) mRNA levels (y-axis, log2-scale, black dots) of six maternal genes and fitted decay kinetics (blue dashed lines). Half-life (h) is indicated on top. Regression peaks in 3′UTR are indicated on left, and provide indirect evidence for their class assignment in the absence of information on their initial poly(A) length.

Similar dynamic behaviors are also evident in endogenous maternal mRNAs (Figure 7B). The elements identified through UTR-Seq also influence endogenous mRNAs: destabilizing signals are enriched in unstable mRNAs, and stabilizing sequences are enriched in stable mRNAs. A more detailed dissection of these and other stability signals in native mRNAs awaits improved measurements and annotations of endogenous mRNA dynamics. Moreover, the high variability in sequence, length and poly(A) tail of endogenous transcripts can obscure 3′UTR-specific effects. For UTR-Seq to have its full impact, future studies will need to extend it beyond 3′UTRs and build a richer repertoire of cis-elements.

With the exception of miR-430, the zebrafish trans-regulators associating with the cis-elements that UTR-Seq identified are unknown, but studies in other systems point to potential trans-regulators. U-rich and UUAG signals function as embryonic cytoplasmic polyadenylation elements in Xenopus (Charlesworth et al., 2006; Radford et al., 2008) that promote tail elongation via ElrA and Msi binding, respectively. U-rich regions also mediate binding of Dnd1 to stabilize maternal mRNAs in zebrafish primordial germ cells (Kedde et al., 2007). It is therefore conceivable that early-onset signals stabilize mRNAs by promoting polyadenylation. C-rich sequences bind Pcbp proteins and destabilize maternal mRNAs in C. elegans before genome activation (Stoeckius et al., 2014). AREs promote the cytoplasmic decay of mRNAs via association with C3H-class proteins, and were linked to early embryonic phenotypes in mouse (Ramos et al., 2004). PUM sites promote Pumilio-mediated decay of Drosophila maternal mRNAs via zygotic pathways (Gamberi et al., 2002). Future studies will reveal whether these candidate factors regulate mRNA degradation in zebrafish through the elements identified by UTR-Seq.

UTR-Seq: a general strategy to dissect sequence-to-activity relationships

Our combined experimental and computational framework is a principled technology to decode sequence-to-activity relationships in dynamic mRNA regulation. The two components of UTR-Seq provide important technological advances. The MPRA surveys a large sequence space of endogenous sequences via a direct and dynamic readout, and the regression models both highlight dominant signals and quantify more subtle regulatory effects. These translate a limited number of signals into a variety of decay patterns that explain over 70% of the observed decay profiles. Future studies will need to extend UTR-Seq to include the contribution of other effects, such as secondary structures, post-transcriptional modifications or non-linear interactions between regulators.

The UTR-Seq framework has wide applications. First, UTR-Seq is applicable across many biological processes and developmental stages to discover subtle differences and similarities in mRNA regulation. For example, it can determine the effects of variants in 3′UTRs that are associated with disease or positive selection. Second, by combining UTR-Seq with other regulatory readouts it can decipher sequence signals that affect different post-transcriptional processes. For example, it can identify elements that regulate poly(A) length or ribosomal occupancy. Third, by creating reporter libraries in 5′UTRs or coding sequences, UTR-Seq can study the role of sequence elements in other parts of mRNAs. Finally, by providing validated in silico tools to predict the impact of genetic variations on mRNA stability, UTR-Seq provides a powerful design tool to create synthetic mRNAs with pre-defined characteristics.

STAR Methods

Contact for Reagent and Resource Sharing

Further information and requests may be directed to and will be fulfilled by the Lead Contact, Michal Rabani (mrabani@fas.harvard.edu).

Experimental Model and Subject Details

Zebrafish

All protocols and procedures involving zebrafish were approved by the Harvard University/Faculty of Arts and Sciences Standing Committee on the Use of Animals in Research and Teaching (IACUC; Protocol #25-08). Wild-type zebrafish embryos (TL/AB strains) were used for all experiments. Embryos were grown and staged according to standard procedures. Fertilized eggs were collected at 28°C, and kept in culture medium (5.03mM NaCl, 0.17mM KCl, 0.33mM CaCl2, 0.33mM MgSO4, 0.1% methylene blue). Embryos were removed from their chorion and injected at the one-cell stage. A total of 20-40 injected embryos were randomly collected per sample after visually ensuring that all embryos were at the same expected developmental stage. Mature oocytes were collected at 28°C by squeezing the belly of adult female zebrafish under anesthesia with tricaine solution (0.15 mg/ml). Oocytes were incubated in culture medium for 5 min to induce maturation, and remained in culture medium. Oocytes were removed from their chorion and injected after maturation. A total of 20-40 oocytes were randomly collected per sample.

Method Details

Design and synthesis of the zebrafish 3′UTR oligonucleotide library

A set of 90,000 oligonucleotides was designed to cover 3′UTR sequences of zebrafish embryonic genes. We combined RefSeq and Ensembl zebrafish annotations (Jul. 2010, Zv9/danRer7) provided by the UCSC genome browser (http://genome.ucsc.edu) with annotations of zebrafish embryonic transcripts that were calculated from RNA-Seq data of early developmental stages (Pauli et al., 2012). The 3′UTR of a gene was defined as the sequence of all exons downstream of the longest annotated CDS. 3′UTR sequences were split into overlapping 110nt fragments. Shorter 3′UTR sequences were concatenated when possible or supplemented by random flanking sequences in order to reach the 110nt size. Zebrafish functional annotations were extracted from the Gene Ontology Consortium version 1.2 (releases/2014-05-13). The final oligonucleotide set (Table S1) was composed of 3 parts: (1) 32,430 oligonucleotides (36%) covered 1,385 3′UTRs of transcripts that were annotated as either signaling, developmental, transcription factors, RNA-binding or cell cycle genes, at high-resolution (80nt overlap between fragments). (2) 56,254 oligonucleotides (62.5%) covered the 5,823 3′UTRs of 4,935 maternal and 888 zygotic genes that were the top 58% expressed genes up to 10hpf (RPKM > 3.2 in (Pauli et al., 2012)), at lower resolution (20nt overlap). (3) 1,316 oligonucleotides (1.5%) covered sequences with known mRNA stability and translation regulatory elements by the Rfam database (release 12.0), at high-resolution (70nt overlap). Final DNA oligonucleotides (150nt) also included two 20nt universal adaptors upstream (GGAGATCTGAGTTCAAGGAT) and downstream (ATCTAGAACTATAGTGAGTC) of each sequence to use for primer extension and cloning. A pooled oligonucleotide library (90,000 sequences) was synthesized by massively parallel oligonucleotide synthesis (Broad Institute) at pmole scale.
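The tiling step can be illustrated with a minimal Python sketch. The function name `tile_utr` and the handling of the final window (anchoring it at the 3′ end when the step size does not divide the sequence evenly) are assumptions made for illustration; the text above does not specify how the last fragment was placed, and padding of short 3′UTRs is not shown.

```python
def tile_utr(utr, frag_len=110, overlap=80):
    """Split a 3'UTR into overlapping fragments of frag_len nucleotides."""
    step = frag_len - overlap  # 30nt step for 80nt overlap, 90nt for 20nt overlap
    if len(utr) <= frag_len:
        return [utr]  # short UTRs would need concatenation or padding (not shown)
    starts = list(range(0, len(utr) - frag_len + 1, step))
    # Assumption: add one last window flush with the 3' end if bases are left over.
    if starts[-1] + frag_len < len(utr):
        starts.append(len(utr) - frag_len)
    return [utr[s:s + frag_len] for s in starts]


utr_200 = "ACGU" * 50                       # hypothetical 200nt 3'UTR
high_res = tile_utr(utr_200, overlap=80)    # windows start at 0, 30, 60, 90
low_res = tile_utr(utr_200, overlap=20)     # windows start at 0, 90
```

With 150nt final oligonucleotides, each 110nt fragment would then be flanked by the two 20nt universal adaptors listed above.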

Reporter library cloning and transcription

A pUCIDT vector (IDT) was used to generate a GFP construct driven by an SP6 promoter, with a short 5′UTR (48nt), a 3′UTR (60nt) that contained the two universal adaptor sequences (40nt) as a cloning site, and a 36nt poly(A) (Figure 1A). The synthetic oligonucleotide library was amplified via the universal adaptor sequences and cloned into the 3′UTR via two-fragment Gibson assembly (NEB) with the cloning site adaptors. Bacterial transformations were performed by electroporation into Endura electrocompetent cells (Lucigen) according to the manufacturer's recommendations, and in large enough volumes and efficiencies to ensure at least 10× coverage of the oligonucleotide library. All colonies grown overnight from transformation were harvested and plasmids extracted with EZNA Plasmid Mini (Omega). The plasmid library was linearized by enzymatic restriction at the end of its 3′UTR, either before (A- library) or after (A+ library) the poly(A) sequence. The mMessage mMachine SP6 kit (Thermo Fisher) was used to in-vitro transcribe a library of mRNA reporters from linearized plasmids. In-vitro transcribed mRNAs were fully processed, capped and without any introns. Resulting mRNA levels were quantified by Qubit fluorometric quantification (Thermo Fisher) and length validated by Tapestation RNA analysis screen tape (Agilent).

Fish microinjection and sample collection

One-cell staged wild-type zebrafish embryos were injected (Figure 1B) with 80pg of mRNA reporters (either A- or A+). A total of five replicates were collected using A+ reporters in three separate experiments (Figure S3A). In the first experiment, a sample was collected every hour from 1hpf to 8hpf, and at 10hpf. In the second experiment, a single sample was collected every hour from 1hpf to 10hpf, and split into two after RNA extraction to produce two technical replicates. In the third experiment, two separate samples were collected every two hours from 2hpf to 10hpf to produce two same-day biological replicates. A total of two replicates were collected using A- reporters in two separate experiments (Figure S3A). In the first experiment, a sample was collected every hour from 1hpf to 8hpf, and at 10hpf. In the second experiment, samples were collected at 1hpf, 2hpf, 3hpf, 4hpf (3 samples), 6hpf, 8hpf and 10hpf. Mature oocytes were injected with 50pg of mRNA reporters (either A- or A+). Oocytes were collected after 1h, 4h and 7h from their incubation.

RNA extraction, library construction and sequencing

Total RNA was isolated using TRIzol (Invitrogen), after adding 120fg of mRNA with 5 known 3′UTR control sequences into each RNA sample during the initial TRIzol lysis step. Control mRNAs had the same structure as A+ reporters and were added at 2-fold decreasing amounts (highest 50fg, lowest 3.125fg) and processed with the sample. We used a two-step RNA-Seq library construction approach that specifically targeted the variable 3′UTR region of reporters. In the first step, total RNA was reverse-transcribed with Maxima RT (Thermo Fisher) and a gene-specific primer that matched the 20nt downstream 3′UTR universal adaptor. The RT primer also added a random 8nt UMI and a constant RT adaptor sequence (CTACACGACGCTCTTCC). In the second step, the resulting cDNA was amplified by 18 cycles of Phusion PCR (NEB) with primers that matched the 20nt upstream 3′UTR universal adaptor and the RT adaptor, and cleaned with 1.0 volumes of AMPure beads (Agencourt). PCR primers also added the appropriate Illumina sample barcodes and sequencing adaptors, thus resulting in the final RNA-Seq library after a single amplification step. Libraries were quantified by Tapestation D1000hs screen tape (Agilent), and sequenced at low-scale (3-5 million reads per library) on the Illumina MiSeq platform with 150nt single-end reads, or at high-scale (5-10 million reads per library) on the Illumina HiSeq 2000 platform with single-end 100nt reads (Figure S3A).

RNA-Seq data processing

MiSeq data (160nt reads) was filtered to retain only sequences that contained both terminal adapter sequences (with up to 10 mismatches) and an insert of 90nt or longer. HiSeq data (100nt reads) was filtered to retain only sequences that contained the 5′ terminal adapter sequence (with up to 10 mismatches) and an insert of 62nt or longer. After removing terminal adapter and UMI sequences, Bowtie2 (Langmead and Salzberg, 2012) was used to align retained reads to a reference set of all 90,000 synthetic oligonucleotide sequences (‘end-to-end’ and ‘very-sensitive’ parameters, edit score of 20 or less, quality score of 10 or more). The number of reads (DNA samples) or UMIs (RNA samples) that were mapped to each oligonucleotide sequence was recorded. UMI counts were adjusted to represent the expected number of mRNA molecules in the sample as previously described (Islam et al., 2014). A generalized binomial linear regression model was fitted to counts of five control mRNAs that were added to samples at known quantities in 2-fold increments (highest 50fg, lowest 3.125fg), and the resulting linear transformation was used to normalize UMI counts. Each sample was processed separately and normalized to a common scale of the expected spike-in quantities (Tables S2, S3). Only reporter sequences with at least a minimal average coverage across analyzed samples (average UMI count above 20 in 9 time-course samples, and a normalized fitted mRNA expression above 2 at 1hpf) were considered for further analysis.
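The spike-in normalization can be illustrated with a simplified Python sketch. It substitutes an ordinary least-squares line for the generalized linear model described above, so it is a stand-in and not the fitted model itself. The UMI counts in `spike_umis` are hypothetical values; only the five spike-in quantities (50fg down to 3.125fg in 2-fold steps) come from the text.

```python
def fit_line(x, y):
    """Ordinary least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


spike_fg = [50.0, 25.0, 12.5, 6.25, 3.125]  # known spike-in amounts (fg)
spike_umis = [400, 200, 100, 50, 25]        # hypothetical UMI counts for one sample

intercept, slope = fit_line(spike_umis, spike_fg)


def normalize(umi_count):
    """Map a reporter's UMI count onto the common spike-in scale (fg)."""
    return intercept + slope * umi_count
```

Because each sample gets its own fitted line, reporters from different time points end up on the same scale of expected spike-in quantities.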

A linear regression model for predicting degradation rate from 3′UTR sequences

We represent a reporter g by its k-mer composition (Ng), counting the number of times that any possible 3-7nt long sequence (k) occurred in its 3′UTR sequence. We also associate the reporter g with a degradation rate (βg), as measured in MPRA experiment. These counts and rates define a set of linear equations of the form:

$$\beta_g = w_0 + \sum_k N_k^g\, w_k$$

Our goal is to find a set of k-mer specific weights (wk) that solve these linear equations for all reporters g in our data (in matrix notation):

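$$\begin{pmatrix} \beta_1 \\ \vdots \\ \beta_g \end{pmatrix} = w_0 + \begin{pmatrix} N_1^1 & \cdots & N_k^1 \\ \vdots & \ddots & \vdots \\ N_1^g & \cdots & N_k^g \end{pmatrix} \begin{pmatrix} w_1 \\ \vdots \\ w_k \end{pmatrix}$$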
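As an illustration of the representation above, the following Python sketch counts all 3-7nt k-mers in a sequence and evaluates the linear sum for a given set of weights. The function names, the example sequence, and the weight values are hypothetical; only the k-mer length range and the form of the equation come from the text.

```python
def kmer_counts(seq, k_min=3, k_max=7):
    """Count every k-mer of length k_min..k_max, including overlapping occurrences."""
    counts = {}
    for k in range(k_min, k_max + 1):
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def predict_rate(seq, w0, weights):
    """Predicted rate: intercept plus the sum of (k-mer count * k-mer weight)."""
    counts = kmer_counts(seq)
    return w0 + sum(w * counts.get(kmer, 0) for kmer, w in weights.items())


# Number of distinct 3-7nt k-mers over a 4-letter alphabet.
n_possible = sum(4 ** k for k in range(3, 8))

# Hypothetical weights (h^-1): positive is destabilizing, negative is stabilizing.
example_weights = {"GCACUU": 0.04, "UUUUUU": -0.01}
example_rate = predict_rate("AGCACUUUUUUUA", 0.1, example_weights)
```

Here `n_possible` evaluates to 21,824, and the example sequence contains one GCACUU and two overlapping UUUUUU occurrences.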

We used a two-step approach to find an optimal solution. In the first step, we selected a high-confidence subset of reporters (up to 20,000 reporters with highest fit to their kinetic parameters by r2 and goodness-of-fit tests) as the training input for a lasso-regularized linear regression analysis (Tibshirani, 2011) that finds locally optimal regression weights (wk). This model is optimal in the sense that it predicts rates as similar as possible to the input rates (maximal likelihood optimization), while minimizing the number of non-zero weights (regularization). In the second step, we used a standard linear regression with only k-mers that had non-zero weights by regularized regression, and recalculated the optimal regression weights (wk) using the complete set of input reporters. In this way, we identified relevant short k-mers and quantified their effect on reporters decay: a positive weight represents a destabilizing element that increases degradation rate, while a negative weight represents a stabilizing element, and stronger elements have higher absolute weight values. Predicted degradation rates are a linear sum of the weights of all k-mers within a sequence, so any k-mer with a non-zero weight affected these predictions. We applied this approach to train two sequence-based regression models. An early-onset regression model was trained on 3′UTR sequences of class I reporters (early-onset 3′UTRs) and their A- degradation rates. A late-onset regression model was trained on 3′UTR sequences of all reporters and on their A+ degradation rates that were estimated from samples collected at 4hpf or later. These included both destabilizing examples of late-onset decay in classes II (late-onset 3′UTRs, initial poly(A) dependent) and III (late-onset 3′UTRs, initial poly(A) independent) and stabilizing examples in class I (early-onset 3′UTRs). 
We fitted a global scaling factor to transform regression rates by a model that was trained on A+ data to predict A- rates and vice-versa. To validate these models, we tested how accurately they predicted degradation rates of sequences that were not used during training. We used a cross-validation test (Figure S5A), by training a linear regression model with only 90% of the data as input (selected at random) and then applying it to predict the degradation rates of the remaining 10% from their 3′UTR sequences.
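The two-step fit can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the solver (cyclic coordinate descent), the objective scaling, the penalty value `lam`, and the toy data are all choices made here and are not taken from the text. Setting `lam=0` reuses the same routine for the unregularized second step.

```python
import math


def fit_linear(X, y, lam, n_sweeps=200):
    """Coordinate descent for (1/2n)*||y - w0 - Xw||^2 + lam*||w||_1.

    With lam=0 this converges to ordinary least squares.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered counts
    resid = [yi - y_mean for yi in y]  # residuals for w = 0
    w = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            z = sum(Xc[i][j] ** 2 for i in range(n)) / n
            if z == 0.0:
                continue  # constant column: weight stays at zero
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_w = math.copysign(max(abs(rho) - lam, 0.0), rho) / z  # soft threshold
            delta = new_w - w[j]
            for i in range(n):
                resid[i] -= Xc[i][j] * delta
            w[j] = new_w
    w0 = y_mean - sum(w[j] * x_mean[j] for j in range(p))
    return w0, w


def two_step_fit(X, y, lam, train_idx):
    """Step 1: lasso on a training subset selects k-mers. Step 2: refit on all data."""
    _, w_lasso = fit_linear([X[i] for i in train_idx], [y[i] for i in train_idx], lam)
    selected = [j for j, wj in enumerate(w_lasso) if wj != 0.0]
    w0, w_sel = fit_linear([[row[j] for j in selected] for row in X], y, 0.0)
    return w0, dict(zip(selected, w_sel))


# Toy data: three k-mer count columns; only columns 0 and 2 affect the rate.
X = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)] * 2
y = [0.5 + 0.1 * a - 0.05 * c for a, b, c in X]
w0, weights = two_step_fit(X, y, lam=0.005, train_idx=range(8))
```

In this toy case the lasso step drops the uninformative column, and the refit recovers the generating weights without the shrinkage that the penalty introduces.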

Identifying regression peaks in 3′UTR sequences

For each regression model, we calculated positional regression weights by summing the regression weights of all k-mers that overlap a position. We defined the threshold peak score of a regression model as the minimal score within the top 1% of all ‘regression weights’ that were assigned to 3′UTRs in the reporter library by this model. We selected all 3′UTR positions with an absolute ‘regression weight’ above the model's threshold as ‘peak’ positions, and extended by 2 bases in both directions (minimal peak width is 5nt).
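A minimal Python sketch of this peak-calling procedure is shown below. The threshold is passed in as a parameter instead of being derived from the top 1% of library-wide weights. Merging of adjacent or overlapping extended positions into one interval is an assumption made here, as are the function names and the example weight.

```python
def positional_weights(seq, weights):
    """Per-position score: sum of weights of all k-mer occurrences covering it."""
    scores = [0.0] * len(seq)
    for kmer, w in weights.items():
        start = seq.find(kmer)
        while start != -1:
            for pos in range(start, start + len(kmer)):
                scores[pos] += w
            start = seq.find(kmer, start + 1)  # allows overlapping occurrences
    return scores


def call_peaks(scores, threshold, flank=2):
    """Return half-open (start, end) intervals around positions with |score| > threshold."""
    peaks = []
    for pos, score in enumerate(scores):
        if abs(score) <= threshold:
            continue
        lo = max(0, pos - flank)
        hi = min(len(scores), pos + flank + 1)
        if peaks and lo <= peaks[-1][1]:
            peaks[-1] = (peaks[-1][0], hi)  # assumption: merge with the previous peak
        else:
            peaks.append((lo, hi))
    return peaks


example_seq = "AAAGCACUUAAA"
example_scores = positional_weights(example_seq, {"GCACUU": 0.3})  # hypothetical weight
example_peaks = call_peaks(example_scores, threshold=0.2)
```

A single position above threshold yields a 5nt peak after the 2-base extension, which matches the minimal peak width stated above.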

Decomposing peaks into sequence signals

We used the following three-step approach to decompose peaks into motifs. The first step assigned peaks into positional weight matrices (PWMs). We scored each 4-6nt long k-mer by the sum of weights of all peaks that contained it. We identified a k-mer with maximal score, and used it to generate an initial multiple sequence alignment of peaks that contained it. We removed those from the set of peaks, and repeated the process with the remaining peaks by selecting a new k-mer, until all peaks had been assigned. The second step optimized the PWMs. We applied the following steps separately to positive and negative peaks. We selected only PWMs that were assigned more than 1% of the overall peak weights, and assigned each peak to the best matching PWM. We used a null model of the overall nucleotide distribution in peaks, which retained peaks that did not fit any PWM. We re-calculated each PWM from the peaks that were assigned to it, and repeated the process until convergence. If at any iteration the null model contained 1% or more of the overall peak weights, we constructed a new PWM from it. Finally, the third step removed redundant PWMs. We scored PWMs by the overall weight of peaks assigned to them. We removed PWMs one after the other starting at the lowest scoring PWM, and repeated the optimization of the second step. If optimization did not re-expand the set of PWMs we concluded the PWM was redundant, and excluded it.
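The first step of this decomposition (greedy grouping of peaks by a shared seed k-mer) might look roughly like the Python sketch below. Several details are assumptions: peaks are represented as hypothetical `(sequence, weight)` pairs, absolute weights are summed, and ties between equally scoring k-mers are broken in favor of the longer k-mer. Alignment, PWM construction, and the later optimization steps are not shown.

```python
def greedy_seed_groups(peaks, k_min=4, k_max=6):
    """Group peaks by repeatedly picking the k-mer with the highest summed peak weight."""
    remaining = list(peaks)
    groups = []
    while remaining:
        scores = {}
        for seq, weight in remaining:
            # Each peak contributes once per distinct k-mer it contains.
            kmers = {seq[i:i + k]
                     for k in range(k_min, k_max + 1)
                     for i in range(len(seq) - k + 1)}
            for kmer in kmers:
                scores[kmer] = scores.get(kmer, 0.0) + abs(weight)
        if not scores:
            break  # leftover peaks are shorter than k_min
        # Assumption: ties go to the longer k-mer, then alphabetical order.
        seed = max(scores, key=lambda km: (scores[km], len(km), km))
        members = [p for p in remaining if seed in p[0]]
        groups.append((seed, members))
        remaining = [p for p in remaining if seed not in p[0]]
    return groups


# Hypothetical peaks: (sequence, height in h^-1).
example_peaks = [("AGCACUUA", 0.3), ("UGCACUUU", 0.2), ("UUUUUUU", 0.1)]
example_groups = greedy_seed_groups(example_peaks)
seeds = [seed for seed, _ in example_groups]
```

Each resulting group would then serve as the initial alignment from which one PWM is built.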

Analysis of endogenous 3′UTR sequences

We selected a subset of 15,415 early-expressed (RPKM up to 10hpf > 3.2 in (Pauli et al., 2012)) zebrafish embryonic transcripts (annotated as described above). We identified regression peaks in endogenous 3′UTR sequences (annotated as defined above) by calculating positional regression weights and using regression models' peak thresholds (as defined above on reporter sequences). To reduce annotation and length biases we chose to analyze only maternal mRNAs with annotated 3′UTRs that are 50nt to 1kb long. We calculated degradation rates of maternal mRNAs with no evidence for zygotic transcription until 10hpf (as defined in (Rabani et al., 2014)), by linear regression from ribosomal RNA-depleted mRNA levels that were measured in embryos at 2, 4 and 6hpf (Lee et al., 2013). We applied our regression models (early-onset and late-onset) to the annotated 3′UTR sequences of 2,021 maternal genes by calculating the k-mer composition of their annotated 3′UTR sequences, and multiplied by regression weights to assign a raw degradation rate per sequence. To control for variable 3′UTR lengths of endogenous mRNAs, we applied each of our regression models to a large set of random sequences of variable lengths, and used these to normalize predictions by fitting a sigmoid to all raw degradation rates, and normalizing predictions accordingly.

Design and synthesis of validation sequences

We selected seven UTR-Seq reporter sequences: two “neutral” sequences that did not contain any peaks (BG1, BG2), three sequences with a late-onset high-weight peak (M430, ARE, PUM), a sequence with an early-onset high-weight peak (POLYU), and a sequence with an early-onset low-weight peak (POLYA). To introduce putative loss-of-function changes, we mutated peaks into sequences that were predicted as non-functional by regression models. To introduce putative gain-of-function changes, we replaced neutral positions with each of the peak sequences or with a different neutral sequence (once or more). These resulted in a set of 58 validation sequences (Table S5). Each of the validation sequences was synthesized as a separate oligo at nmole scale (IDT). Oligos were separately amplified via the universal adaptor sequences, mixed at equal concentrations, and cloned and transcribed as previously described. By using equal concentrations of each sequence, we minimized representation biases in the resulting plasmid and mRNA validation libraries. Wild-type zebrafish embryos were injected with 80pg of validation library (either A- or A+) as previously described, and 25 embryos were collected every hour from 1hpf to 9hpf (Tables S6, S7). Processing of mRNA samples, sequencing, normalization and fitting of early-onset and late-onset degradation models were all done as previously described.

Quantification and Statistical Analysis

Fitting of early-onset and late-onset degradation models

We assume first-order exponential decay kinetics of individual mRNA reporters in our library:

$$\frac{dX}{dt} = -b(t)\,X(t)$$

Where t is time (hours), X(t) is measured mRNA abundance at time t, b(t) is degradation rate at time t and X0 is the initial mRNA level of the reporter. We tested two alternative models for b(t) with increasing complexity and more kinetic parameters. The simpler early-onset degradation model (Figure 2A) used a temporally constant degradation rate (β) throughout developmental time:

b(t)=β

The alternative and more complex late-onset degradation model (Figure 2B) described changes in degradation by a step function: shifting degradation from an initial rate (β0) into a final rate (β) at onset-time (t0):

$$b(t) = \begin{cases} \beta_0 & t \le t_0 \\ \beta & t > t_0 \end{cases}$$

When b(t) is constant, the differential equation above has a closed form solution:

$$\frac{dX}{dt} = -\beta X(t) \;\Rightarrow\; X(t) = X_0 e^{-\beta t} \;\Rightarrow\; \log X(t) = \log X_0 - \beta t$$
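
The same integration can be applied piecewise to the late-onset model. The expression below is a worked sketch that follows from the step-function definition of b(t), under the assumption that X(t) is continuous at t0; it is not stated in this form in the original text.

$$\log X(t) = \begin{cases} \log X_0 - \beta_0 t & t \le t_0 \\ \log X_0 - \beta_0 t_0 - \beta\,(t - t_0) & t > t_0 \end{cases}$$

With β0 = 0, log X(t) stays at log X0 until t0 and then declines linearly with slope −β.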

In the constant model, the optimal β and X0 were inferred by a standard linear regression between log mRNA levels (log X) and times (t). To optimize t0, we used an iterative search, in which we optimized β0 and β for each possible t0 value, and selected the solution (t0, β0, β and X0) that minimized the least-square error between expected and measured mRNA levels. For each reporter, we used a canonical likelihood-ratio test to compare between a hierarchy of four models: early-onset degradation with β = 0 (fitting only X0), early-onset degradation with β > 0 (fitting X0, β), late-onset degradation with β0 = 0 (fitting X0, t0, β) and late-onset degradation with β0 > 0 (fitting X0, t0, β0 and β). We tested the fit of model predictions to our data by assuming an additive Gaussian error (with zero mean and fitted maximal-likelihood variance) and obtained p values by using a chi-square test of the log likelihood ratios for nested hypothesis testing. We assigned a more complex model only to reporters that confidently (p < 0.01) rejected a simpler model with fewer parameters. The most complex model (with β0 > 0) was assigned to only 2.8% (891) of A+ reporters and 3.3% (1,024) of A- reporters, and predicted very slow early rates (mean half-life > 10 h); this model was not used in any further analysis.
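
The fitting procedure can be illustrated with a minimal sketch. It is not the authors' software: it assumes that the regression is done on log-transformed levels, covers only the early-onset model and the late-onset model with β0 = 0, and scans a user-supplied grid of candidate onset times. All function names (`fit_constant`, `fit_late_onset`, `lrt_p_value`) and the example data are hypothetical.

```python
import math

def fit_constant(times, log_x):
    """Early-onset model: least-squares fit of log X(t) = log X0 - beta * t."""
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(log_x) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, log_x))
    slope = sxy / sxx
    log_x0 = mean_y - slope * mean_t
    rss = sum((y - (log_x0 + slope * t)) ** 2 for t, y in zip(times, log_x))
    return {"log_x0": log_x0, "beta": -slope, "rss": rss}

def fit_late_onset(times, log_x, onset_grid):
    """Late-onset model with beta0 = 0: scan candidate t0 values, keep the best."""
    best = None
    for t0 in onset_grid:
        # Time elapsed since onset; zero before t0, where log X stays at log X0.
        elapsed = [max(0.0, t - t0) for t in times]
        if len(set(elapsed)) < 2:
            continue  # no samples after t0, so beta cannot be estimated
        fit = fit_constant(elapsed, log_x)
        if best is None or fit["rss"] < best["rss"]:
            best = dict(fit, t0=t0)
    return best

def lrt_p_value(rss_simple, rss_complex, n_points):
    """Chi-square (1 degree of freedom) p-value for one extra parameter."""
    if rss_simple <= 0.0:
        return 1.0  # the simpler model already fits perfectly
    if rss_complex <= 0.0:
        return 0.0
    # With Gaussian errors and maximum-likelihood variance (rss / n), twice the
    # log likelihood ratio equals n * log(rss_simple / rss_complex).
    stat = max(0.0, n_points * math.log(rss_simple / rss_complex))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical reporter: stable until 4 hpf, then decaying at 0.5 per hour.
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9]
levels = [5.01, 4.98, 5.02, 4.99, 4.52, 3.98, 3.51, 3.02, 2.49]
early = fit_constant(hours, levels)
late = fit_late_onset(hours, levels, onset_grid=range(1, 9))
p = lrt_p_value(early["rss"], late["rss"], len(hours))
print(late["t0"], round(late["beta"], 2), p < 0.01)
```

The sketch reuses the linear regression for the late-onset fit by regressing on the time elapsed since onset, which is one simple way to implement the per-t0 optimization described above. The slope is not constrained to be non-negative here.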

Estimating the percent of variation explained by a regression model

The percent of variation in degradation rates that is explained by the early-onset or late-onset regression models was estimated by the squared Pearson correlation between measurements and cross-validation predictions (r²). The percent of variation explained by the combined predictions of early-onset and late-onset regression was estimated by the Pearson correlation of early-onset cross-validation predictions to early-onset rates (class I) and late-onset cross-validation predictions to late-onset rates (class II and III).
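
As a minimal sketch of this estimate, the squared Pearson correlation between measured rates and cross-validation predictions can be computed as follows. The function names and the example values are hypothetical, and the cross-validation itself is assumed to have been done beforehand.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def variation_explained(measured, cv_predicted):
    """Fraction of variation explained, estimated as the squared correlation."""
    return pearson_r(measured, cv_predicted) ** 2

# Hypothetical degradation rates and their held-out predictions.
measured_rates = [0.10, 0.25, 0.40, 0.55, 0.30]
predicted_rates = [0.12, 0.20, 0.35, 0.50, 0.33]
print(variation_explained(measured_rates, predicted_rates))
```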

k-mer enrichments in 3′UTR sequences

We tested all 3-7nt long k-mer sequences via three statistical tests (Table S4). We associated a k-mer with a class of reporters when reporters within the class were significantly (hypergeometric p-value, 1% FDR) enriched or depleted for the k-mer relative to the whole reporter library. Within each class, we associated a k-mer with a regulatory effect when reporters with this k-mer in their 3′UTR had a significantly different mean (t-test, 1% FDR) or distribution (KS-test, 1% FDR) of degradation rates (faster or slower) than reporters without this sequence. We manually curated all statistically significant enrichments, and grouped highly similar sequences into longer and more generalized consensus sequences.
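
The class-association test can be sketched as a one-sided hypergeometric test followed by a false discovery rate correction. This is an illustration only: it assumes the Benjamini-Hochberg procedure for FDR control (the text does not name the procedure), covers enrichment but not depletion, and omits the t-test and KS-test on degradation rates. All function names and example sequences are hypothetical, and the exact-integer tail sum is chosen for clarity rather than speed on a library of this size.

```python
from math import comb

def count_with_kmer(sequences, kmer):
    """Number of 3'UTR sequences that contain the k-mer at least once."""
    return sum(1 for seq in sequences if kmer in seq)

def hypergeom_enrichment_p(n_library, n_with_kmer, n_class, n_class_with_kmer):
    """P(X >= n_class_with_kmer) when n_class reporters are drawn from the library."""
    upper = min(n_with_kmer, n_class)
    tail = sum(
        comb(n_with_kmer, k) * comb(n_library - n_with_kmer, n_class - k)
        for k in range(n_class_with_kmer, upper + 1)
    )
    return tail / comb(n_library, n_class)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical toy library and one class of reporters drawn from it.
library = ["AAUUUUUC", "GCGCGCAA", "UUUUUGCA", "ACGUACGU", "GGUUUUUA", "CCGGAACC"]
reporter_class = ["AAUUUUUC", "UUUUUGCA", "GGUUUUUA"]
kmers = ["UUUUU", "GCGC", "ACG"]
p_values = [
    hypergeom_enrichment_p(
        len(library),
        count_with_kmer(library, kmer),
        len(reporter_class),
        count_with_kmer(reporter_class, kmer),
    )
    for kmer in kmers
]
for kmer, q in zip(kmers, benjamini_hochberg(p_values)):
    print(kmer, q)
```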

Enrichment analysis

We calculated the enrichment of a query set of genes for a binary annotation by a hypergeometric p-value and a 5% False Discovery Rate (FDR) significance threshold across all tested annotations. We defined the expression source (maternal or zygotic) of zebrafish embryonic transcripts as previously described (Rabani et al., 2014).

Data and Software Availability

Data Resources

Sequencing data was deposited into the Gene Expression Omnibus (GEO) under the accession number GSE106677. We also provide supplemental data files with processed RNA expression data (Tables S2, S3, S6, S7). Original imaging data was deposited to Mendeley Data and is available at http://dx.doi.org/10.17632/nxxcf8yshc.1.

Software

Software for data analysis and quantitative modeling will be made available upon request.

Supplementary Material

Table S1. A list of 90,000 3′UTR sequences designed for UTR-Seq reporters. Related to Figure 1.

Table S2. Normalized mRNA levels of A- UTR-Seq reporters. Related to Figure 1.

Table S3. Normalized mRNA levels of A+ UTR-Seq reporters. Related to Figure 1.

Table S4. Sequence enrichments of k-mers in UTR-Seq data. Related to Figure 4. All enriched k-mer sequences with p<1e-25. P-values below 1e-30 are colored in orange (fast degradation), green (slow degradation), red (class enrichment) or blue (class depletion).

Table S5. A list of 58 3′UTR sequences designed for validation reporters. Related to Figure 6.

Table S6. Normalized mRNA levels of A- validation reporters. Related to Figure 6.

Table S7. Normalized mRNA levels of A+ validation reporters. Related to Figure 6.

Highlights.

  1. UTR-Seq: a new technology to discover 3′UTR sequences that regulate mRNA decay

  2. Six sequence signals regulate early or late mRNA decay in zebrafish embryos

  3. Sequence-based regression models convert 3′UTR sequences into mRNA decay patterns

  4. Design of artificial 3′UTRs that confer specific mRNA dynamics

Acknowledgments

We thank Attila Becskei, Sean Eddy, Nir Friedman, Elena Rivas, Mihaela Zavolan, Jeff Farell, James Gagnon, Nathan Lord, and Yiqun Wang for critical reading of the manuscript. We thank Aviv Regev for helpful discussions and for providing access to sequencing facilities. This research was supported by James S. McDonnell Foundation and Helen Hay Whitney Foundation postdoctoral fellowships (MR), and NIH grants R01 HD085905 and R01 HD076708 (AFS).

Footnotes

Author Contributions: MR and AFS conceived and designed the study. MR conducted the experiments, developed and implemented the computational methods, and analyzed the results. LP and GLC helped with data collection. MR and AFS interpreted the results and wrote the paper.

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  1. Aanes H, Winata CL, Lin CH, Chen JP, Srinivasan KG, Lee SG, Lim AY, Hajan HS, Collas P, Bourque G, et al. Zebrafish mRNA sequencing deciphers novelties in transcriptome dynamics during maternal to zygotic transition. Genome research. 2011;21:1328–1338. doi: 10.1101/gr.116012.110.
  2. Arnold CD, Gerlach D, Stelzer C, Boryn LM, Rath M, Stark A. Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science. 2013;339:1074–1077. doi: 10.1126/science.1232542.
  3. Barckmann B, Simonelig M. Control of maternal mRNA stability in germ cells and early embryos. Biochimica et biophysica acta. 2013;1829:714–724. doi: 10.1016/j.bbagrm.2012.12.011.
  4. Bazzini AA, Del Viso F, Moreno-Mateos MA, Johnstone TG, Vejnar CE, Qin Y, Yao J, Khokha MK, Giraldez AJ. Codon identity regulates mRNA stability and translation efficiency during the maternal-to-zygotic transition. EMBO J. 2016 doi: 10.15252/embj.201694699.
  5. Charlesworth A, Wilczynska A, Thampi P, Cox LL, MacNicol AM. Musashi regulates the temporal order of mRNA translation during Xenopus oocyte maturation. EMBO J. 2006;25:2792–2801. doi: 10.1038/sj.emboj.7601159.
  6. Chen CY, Shyu AB. Mechanisms of deadenylation-dependent decay. Wiley Interdiscip Rev RNA. 2011;2:167–183. doi: 10.1002/wrna.40.
  7. Eichhorn SW, Subtelny AO, Kronja I, Kwasnieski JC, Orr-Weaver TL, Bartel DP. mRNA poly(A)-tail changes specified by deadenylation broadly reshape translation in Drosophila oocytes and early embryos. Elife. 2016;5 doi: 10.7554/eLife.16955.
  8. Eser P, Wachutka L, Maier KC, Demel C, Boroni M, Iyer S, Cramer P, Gagneur J. Determinants of RNA metabolism in the Schizosaccharomyces pombe genome. Molecular systems biology. 2016;12:857. doi: 10.15252/msb.20156526.
  9. Gamberi C, Peterson DS, He L, Gottlieb E. An anterior function for the Drosophila posterior determinant Pumilio. Development. 2002;129:2699–2710. doi: 10.1242/dev.129.11.2699.
  10. Geisberg JV, Moqtaderi Z, Fan X, Ozsolak F, Struhl K. Global analysis of mRNA isoform half-lives reveals stabilizing and destabilizing elements in yeast. Cell. 2014;156:812–824. doi: 10.1016/j.cell.2013.12.026.
  11. Giraldez AJ, Mishima Y, Rihel J, Grocock RJ, Van Dongen S, Inoue K, Enright AJ, Schier AF. Zebrafish MiR-430 promotes deadenylation and clearance of maternal mRNAs. Science. 2006;312:75–79. doi: 10.1126/science.1122689.
  12. Grossman SR, Zhang X, Wang L, Engreitz J, Melnikov A, Rogov P, Tewhey R, Isakova A, Deplancke B, Bernstein BE, et al. Systematic dissection of genomic features determining transcription factor binding and enhancer function. Proceedings of the National Academy of Sciences of the United States of America. 2017;114:E1291–E1300. doi: 10.1073/pnas.1621150114.
  13. Guo H, Ingolia NT, Weissman JS, Bartel DP. Mammalian microRNAs predominantly act to decrease target mRNA levels. Nature. 2010;466:835–840. doi: 10.1038/nature09267.
  14. Hafner M, Landthaler M, Burger L, Khorshid M, Hausser J, Berninger P, Rothballer A, Ascano M, Jr, Jungkamp AC, Munschauer M, et al. Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell. 2010;141:129–141. doi: 10.1016/j.cell.2010.03.009.
  15. Hyman LE, Wormington WM. Translational inactivation of ribosomal protein mRNAs during Xenopus oocyte maturation. Genes Dev. 1988;2:598–605. doi: 10.1101/gad.2.5.598.
  16. Islam S, Zeisel A, Joost S, La Manno G, Zajac P, Kasper M, Lonnerberg P, Linnarsson S. Quantitative single-cell RNA-seq with unique molecular identifiers. Nat Methods. 2014;11:163–166. doi: 10.1038/nmeth.2772.
  17. Jukam D, Shariati SAM, Skotheim JM. Zygotic Genome Activation in Vertebrates. Dev Cell. 2017;42:316–332. doi: 10.1016/j.devcel.2017.07.026.
  18. Kane DA, Kimmel CB. The zebrafish midblastula transition. Development. 1993;119:447–456. doi: 10.1242/dev.119.2.447.
  19. Kedde M, Strasser MJ, Boldajipour B, Oude Vrielink JA, Slanchev K, le Sage C, Nagel R, Voorhoeve PM, van Duijse J, Orom UA, et al. RNA-binding protein Dnd1 inhibits microRNA access to target mRNA. Cell. 2007;131:1273–1286. doi: 10.1016/j.cell.2007.11.034.
  20. Kheradpour P, Ernst J, Melnikov A, Rogov P, Wang L, Zhang X, Alston J, Mikkelsen TS, Kellis M. Systematic dissection of regulatory motifs in 2000 predicted human enhancers using a massively parallel reporter assay. Genome research. 2013;23:800–811. doi: 10.1101/gr.144899.112.
  21. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923.
  22. Lee MT, Bonneau AR, Takacs CM, Bazzini AA, DiVito KR, Fleming ES, Giraldez AJ. Nanog, Pou5f1 and SoxB1 activate zygotic gene expression during the maternal-to-zygotic transition. Nature. 2013;503:360–364. doi: 10.1038/nature12632.
  23. Lim J, Lee M, Son A, Chang H, Kim VN. mTAIL-seq reveals dynamic poly(A) tail regulation in oocyte-to-embryo development. Genes Dev. 2016;30:1671–1682. doi: 10.1101/gad.284802.116.
  24. Lunde BM, Moore C, Varani G. RNA-binding proteins: modular design for efficient function. Nat Rev Mol Cell Biol. 2007;8:479–490. doi: 10.1038/nrm2178.
  25. Mishima Y, Tomari Y. Codon Usage and 3′ UTR Length Determine Maternal mRNA Stability in Zebrafish. Molecular cell. 2016;61:874–885. doi: 10.1016/j.molcel.2016.02.027.
  26. Neymotin B, Ettore V, Gresham D. Multiple Transcript Properties Related to Translation Affect mRNA Degradation Rates in Saccharomyces cerevisiae. G3 (Bethesda) 2016 doi: 10.1534/g3.116.032276.
  27. Oikonomou P, Goodarzi H, Tavazoie S. Systematic identification of regulatory elements in conserved 3′ UTRs of human transcripts. Cell reports. 2014;7:281–292. doi: 10.1016/j.celrep.2014.03.001.
  28. Pauli A, Valen E, Lin MF, Garber M, Vastenhouw NL, Levin JZ, Fan L, Sandelin A, Rinn JL, Regev A, et al. Systematic identification of long noncoding RNAs expressed during zebrafish embryogenesis. Genome research. 2012;22:577–591. doi: 10.1101/gr.133009.111.
  29. Pique M, Lopez JM, Foissac S, Guigo R, Mendez R. A combinatorial code for CPE-mediated translational control. Cell. 2008;132:434–448. doi: 10.1016/j.cell.2007.12.038.
  30. Rabani M, Kertesz M, Segal E. Computational prediction of RNA structural motifs involved in posttranscriptional regulatory processes. Proceedings of the National Academy of Sciences of the United States of America. 2008;105:14885–14890. doi: 10.1073/pnas.0803169105.
  31. Rabani M, Raychowdhury R, Jovanovic M, Rooney M, Stumpo DJ, Pauli A, Hacohen N, Schier AF, Blackshear PJ, Friedman N, et al. High-resolution sequencing and modeling identifies distinct dynamic RNA regulatory strategies. Cell. 2014;159:1698–1710. doi: 10.1016/j.cell.2014.11.015.
  32. Radford HE, Meijer HA, de Moor CH. Translational control by cytoplasmic polyadenylation in Xenopus oocytes. Biochimica et biophysica acta. 2008;1779:217–229. doi: 10.1016/j.bbagrm.2008.02.002.
  33. Ramos SB, Stumpo DJ, Kennington EA, Phillips RS, Bock CB, Ribeiro-Neto F, Blackshear PJ. The CCCH tandem zinc-finger protein Zfp36l2 is crucial for female fertility and early embryonic development. Development. 2004;131:4883–4893. doi: 10.1242/dev.01336.
  34. Ray D, Kazan H, Cook KB, Weirauch MT, Najafabadi HS, Li X, Gueroussov S, Albu M, Zheng H, Yang A, et al. A compendium of RNA-binding motifs for decoding gene regulation. Nature. 2013;499:172–177. doi: 10.1038/nature12311.
  35. Rosenberg AB, Patwardhan RP, Shendure J, Seelig G. Learning the sequence determinants of alternative splicing from millions of random sequences. Cell. 2015;163:698–711. doi: 10.1016/j.cell.2015.09.054.
  36. Shalem O, Sharon E, Lubliner S, Regev I, Lotan-Pompan M, Yakhini Z, Segal E. Systematic dissection of the sequence determinants of gene 3′ end mediated expression control. PLoS Genet. 2015;11:e1005147. doi: 10.1371/journal.pgen.1005147.
  37. Stoeckius M, Grun D, Kirchner M, Ayoub S, Torti F, Piano F, Herzog M, Selbach M, Rajewsky N. Global characterization of the oocyte-to-embryo transition in Caenorhabditis elegans uncovers a novel mRNA clearance mechanism. EMBO J. 2014;33:1751–1766. doi: 10.15252/embj.201488769.
  38. Subtelny AO, Eichhorn SW, Chen GR, Sive H, Bartel DP. Poly(A)-tail profiling reveals an embryonic switch in translational control. Nature. 2014;508:66–71. doi: 10.1038/nature13007.
  39. Tadros W, Goldman AL, Babak T, Menzies F, Vardy L, Orr-Weaver T, Hughes TR, Westwood JT, Smibert CA, Lipshitz HD. SMAUG is a major regulator of maternal mRNA destabilization in Drosophila and its translation is activated by the PAN GU kinase. Dev Cell. 2007;12:143–155. doi: 10.1016/j.devcel.2006.10.005.
  40. Tibshirani R. Regression shrinkage and selection via the lasso: a retrospective. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2011;73:273–282.
  41. Tuerk C, Gold L. Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science. 1990;249:505–510. doi: 10.1126/science.2200121.
  42. Vesterlund L, Jiao H, Unneberg P, Hovatta O, Kere J. The zebrafish transcriptome during early development. BMC Dev Biol. 2011;11:30. doi: 10.1186/1471-213X-11-30.
  43. Voeltz GK, Steitz JA. AUUUA sequences direct mRNA deadenylation uncoupled from decay during Xenopus early development. Mol Cell Biol. 1998;18:7537–7545. doi: 10.1128/mcb.18.12.7537.
  44. Weill L, Belloc E, Bava FA, Mendez R. Translational control by changes in poly(A) tail length: recycling mRNAs. Nat Struct Mol Biol. 2012;19:577–585. doi: 10.1038/nsmb.2311.
  45. Wharton RP, Struhl G. RNA regulatory elements mediate control of Drosophila body pattern by the posterior morphogen nanos. Cell. 1991;67:955–967. doi: 10.1016/0092-8674(91)90368-9.
  46. Yartseva V, Takacs CM, Vejnar CE, Lee MT, Giraldez AJ. RESA identifies mRNA-regulatory sequences at high resolution. Nat Methods. 2017;14:201–207. doi: 10.1038/nmeth.4121.
  47. Zhao BS, Wang X, Beadell AV, Lu Z, Shi H, Kuuspalu A, Ho RK, He C. m6A-dependent maternal mRNA clearance facilitates zebrafish maternal-to-zygotic transition. Nature. 2017;542:475–478. doi: 10.1038/nature21355.
  48. Zhao W, Pollack JL, Blagev DP, Zaitlen N, McManus MT, Erle DJ. Massively parallel functional annotation of 3′ untranslated regions. Nature biotechnology. 2014;32:387–391. doi: 10.1038/nbt.2851.
