Nucleic Acids Research. 2025 Jun 30;53(12):gkaf568. doi: 10.1093/nar/gkaf568

The quantitative impact of 3′UTRs on gene expression

Jessica D West 1, Hannah J Smith 2, Luyen Tien Vu 3, Elizabeth A Fogarty 4, Kenneth A Matreyek 5, Douglas M Fowler 6,7, Andrew Grimson 8
PMCID: PMC12207410  PMID: 40586305

Abstract

Control of gene expression is fundamental to biology, and post-transcriptional regulation is an important component of this process. In mammals, the 3′UTR in particular serves as a major source of regulatory information within the transcript. Here we developed an accurate massively parallel reporter assay (MPRA) system to evaluate the impact of >1400 full-length human 3′UTRs on RNA abundance, stability, translational regulation, and total protein output. We demonstrated that our MPRA is consistent with regulation of the corresponding endogenous transcripts. We used the MPRA datasets to model the relative contributions of RNA abundance and translational efficiency toward total 3′UTR-mediated regulation, revealing an unexpectedly large role for 3′UTR-specified translational control, and providing additional evidence that much of 3′UTR-encoded regulation is mediated by concerted regulation of translation plus decay. We observed relationships between GC content and 3′UTR length and different modes of regulation, and identified sequence motifs corresponding to regulatory RNA-binding proteins associated with mediating 3′UTR-dependent gene expression. We compared regulation from >1400 3′UTRs under control of two dissimilar promoters, which revealed promoter-associated differences in post-transcriptional regulation for certain 3′UTRs. Together, this dataset represents a comprehensive characterization of 3′UTR-mediated quantitative regulation.

Introduction

Regulation of gene expression is a fundamental aspect of cellular function and plays a crucial role in virtually all biological processes. Thus far, much research has been centered on understanding how DNA elements control transcription; however, post-transcriptional mechanisms add a consequential layer of regulation, and are less understood [1, 2]. The entire mRNA can harbor sequence elements that govern post-transcriptional regulation; however, the 3′ untranslated region (3′UTR) serves as a major source of regulatory information within the transcript, a role that is particularly prominent in mammals [3]. Sequences within 3′UTRs are often conserved, and mutations predicted to impact fitness are enriched in 3′UTRs relative to other noncoding portions of the genome, including promoters and enhancers [4–6]. In humans, 3′UTRs are typically longer and contain more conserved sequence elements than in other organisms, consistent with their role in complex regulation [3]. Many human genes undergo alternative polyadenylation and cleavage (APA), whereby different 3′ ends are chosen, resulting in different 3′UTR sequences [7–9]. APA is a highly regulated process, as alternative sites are often used at different rates in different cell types, and has been implicated in many biological processes, as well as in human health and disease [10–17].

Post-transcriptional regulation is typically mediated by trans-acting factors that interact with the 3′UTR [18]. For example, microRNAs (miRNAs) are well-characterized regulators that bind to cis regulatory elements within the 3′UTR to accelerate mRNA decay and repress translation of the cognate transcript, thus decreasing total protein expression of target genes [19]. It is widely accepted that these trans factors are key regulatory molecules, as inhibition of the miRNA biogenesis machinery or of individual miRNAs has severe, often lethal, phenotypes [20]. Regulatory RNA-binding proteins (RBPs) also act as 3′UTR trans factors [21–23]. Indeed, it is possible that the majority of 3′UTR-mediated gene regulation in mammals is specified by RBPs and their cognate binding sites [24]. However, identifying such binding sites is often challenging; in vitro binding assays such as RNA Bind-n-Seq have found that many RBPs intrinsically bind degenerate sequences, and a large fraction of RBPs exhibit complex patterns of RNA recognition [23, 25, 26]. Moreover, RBP binding preferences determined in vitro often do not correspond closely to in vivo binding patterns, as demonstrated by assays such as enhanced UV crosslinking and immunoprecipitation (eCLIP) and mRNA response upon RBP knockdown [23, 26, 27]. These discrepancies likely derive from a variety of factors, including the influence of sequence and structure surrounding potential cis elements, the impact of the suite of regulatory RBPs binding a particular transcript, and, perhaps, promoter-specific co-transcriptional RBP loading [28–30].

Existing high-throughput methods that assess regulatory activity in the 3′UTR suffer from significant limitations. Transcriptomic profiling of endogenous transcripts can identify post-transcriptional regulation, but it is difficult to ascribe such regulation directly to the 3′UTR, as the 5′UTR and coding region also influence post-transcriptional regulation [31–33]. Additionally, long-read sequencing studies have indicated that mRNA processing events are often inter-dependent, complicating comparisons made for different 3′UTR isoforms of the same gene [34, 35]. Massively parallel 3′UTR reporter assays (MPRAs) are a powerful approach used to mitigate this problem; the promoter, 5′UTR, and coding region can be held constant, and thus observed regulation can be directly attributed to the 3′UTR [36, 37]. MPRAs are capable of measuring thousands of 3′UTR sequences at once, and have been used to investigate 3′UTR-mediated regulation in multiple contexts [38–46]. However, MPRA-based studies have thus far been limited to the investigation of short, typically ∼100–150 nucleotide (nt), 3′UTR fragments, a limitation given the well-known context dependency of many regulatory elements. For example, identical AU-rich elements can be activating in one sequence context while repressive in another, and the efficacy of miRNA target sites depends heavily on context [42, 47, 48]. Moreover, 3′UTR elements often act synergistically, so analysis restricted to short fragments is unlikely to capture endogenous regulation [18]. Therefore, to fully understand 3′UTR-mediated regulation, elements must be studied within their complete surrounding sequence context. Finally, post-transcriptional control encompasses regulation at multiple levels, including transcript stability, localization, and translational status [3, 18, 49]. Most previous studies have not differentiated between these levels of regulation, and none have analyzed full-length 3′UTRs in such a manner.

In this work, we develop a system to monitor large libraries of full-length 3′UTRs for their impact on gene regulation. Importantly, our approach systematically determines the impact of full-length 3′UTR sequences on cumulative quantitative regulation (i.e. protein output) and differentiates the contributions of RNA abundance, RNA decay, and translational efficiency to this total regulation. We use these assays to define 3′UTR features influencing RNA abundance, translation, and overall protein output, lending insights into the global scope of 3′UTR-mediated regulation.

Materials and methods

Cell lines

All cell lines were grown at 37°C and 5% CO2. HEK293T and HEK293T-derived landing pad cell lines were used and grown in Dulbecco’s Modified Eagle Medium (Cytiva) supplemented with 10% FBS (Sigma-Aldrich) and 1% penicillin-streptomycin (Gibco). The cell line with the Tet-On promoter was supplemented with 2 μg/mL doxycycline. The cell line with the Tet-Off promoter was passaged in 1 ng/mL doxycycline before the transcriptional pulse and 2 μg/mL after the transcriptional pulse. Cells were washed in PBS (Lonza) and Trypsin-EDTA (Gibco) was used to dissociate cells from plates. Cell lines were routinely verified to be free of mycoplasma contamination using a previously described PCR protocol with primers 5′TGCACCATCTGTCACTCTGTTAACCTC and 5′GGGAGCAAACAGGATTAGATACCCT [50]. Cells were also verified to be free of cell line contamination by verifying presence of HEK293T-specific SNPs within the RNA-seq data [51].

Generating 3′UTR plasmid library

Cloning barcoded GFP vectors

To make the 3′UTR reporter library, a vector pool of over one million barcoded GFP vectors was generated (Supplementary Fig. S1A). To make this pool, a BamHI restriction enzyme site was inserted into an attB-EGFP vector [52] directly downstream of GFP, such that the 12 nt random barcode was inserted immediately after a dual stop codon. The region 15 nt downstream of the stop codon is typically refractory to functional cis regulatory elements [48], presumably because it is blocked by the terminating ribosome. We verified with a pilot MPRA that the barcodes did not influence translation (r = 0.94 for replication between barcodes linked to the same 3′UTR). Two complementary oligonucleotides were ordered (5′TCGGCATGGACGAGCTGTACAAGTAAACGNNNNNNNNNNNNCGTTCAACTTTCTTGTACA, 5′GTATGGCTGATTATGATCAGTCGTTGCAACAAATTGATAAGGATCCTCCACTTTGTACAAGAAAGTTGAACG), one with a degenerate 12 nt sequence (hand-mixed from Integrated DNA Technologies), which were then annealed and extended with Q5 polymerase (New England Biolabs). The double stranded DNA duplex contained Gibson cloning-compatible overlapping sequences and was designed to regenerate the BamHI site, facilitating later insertion of 3′UTRs. This complex pool of inserts was cloned into the BamHI-digested attB-EGFP vector using homemade “Hot Fusion” assembly reaction mix [53], transformed into DH5α Escherichia coli competent cells (prepared in-house with the Zymo Mix & Go! E. coli Transformation Kit) and plated onto 15 cm carbenicillin LB agar plates. After overnight incubation at 37°C, colonies (approximately one million in total) were scraped from each plate with <5 ml LB and combined from all plates. The pooled bacteria were pelleted and plasmid DNA was prepared using the Promega PureYield Plasmid Midiprep Kit. The final complex pool of barcoded vectors was used in each 3′UTR cloning reaction.
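As a rough check on barcode complexity, the sketch below computes how many 12 nt barcodes are possible and how many distinct barcodes would be expected among about one million colonies. It assumes every base is incorporated uniformly and independently at each degenerate position, which a hand-mixed oligonucleotide only approximates; the function name is illustrative and not part of the published workflow.

```python
import math

def expected_distinct_barcodes(n_clones: int, barcode_length: int = 12) -> float:
    """Expected number of distinct barcodes among n_clones random draws.

    Assumption: each of the 4**barcode_length sequences is equally likely
    (an idealization of a degenerate oligonucleotide pool).
    """
    n_possible = 4 ** barcode_length
    # Probability that one particular barcode is never drawn: (1 - 1/N)^n.
    p_absent = math.exp(n_clones * math.log1p(-1.0 / n_possible))
    return n_possible * (1.0 - p_absent)

n_possible = 4 ** 12                                  # number of possible 12 nt barcodes
distinct = expected_distinct_barcodes(1_000_000)      # ~one million colonies
print(n_possible, round(distinct))
```

Under these assumptions, about 97% of one million clones would carry a barcode not shared with any other clone in the pool, so barcode collisions are expected to be uncommon but not absent.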

Cloning 3′UTR library into barcoded GFP vectors

The plasmid collection of human 3′UTRs was purchased as bacterial stocks in 96-well plates from DNASU [54]. Each 3′UTR was cloned with 150 nt of genomic sequence past the 3′ cleavage site, increasing the likelihood it is processed using its endogenous cleavage and polyadenylation (CPA) site. We also included a strong SV40 poly(A) signal immediately downstream of the extended 3′UTR sequence. To clone 3′UTRs into the GFP barcoded vector, each of 16 96-well plates was stamped onto rectangular kanamycin LB agar plates, grown overnight at 37°C, stamped into liquid culture in deepwell 96-well plates, and grown overnight at 37°C in a 96-well shaker (Supplementary Fig. S1A). The 3′UTR plasmids were miniprepped using the Zyppy-96 Plasmid Miniprep Kit (Zymo). Minipreps were diluted 1:10 and 1 μl was used in a 20 μl Q5 (New England Biolabs) PCR reaction with primers targeted to short constant regions surrounding each 3′UTR. The primers also added sequences overlapping the barcoded attB-EGFP vector to facilitate plasmid assembly. The 16 96-well plates of PCR reactions were purified using the ZR-96 DNA Clean & Concentrator-5 kit (Zymo). These 3′UTR amplicons were cloned into the pool of barcoded GFP vectors by adding 2 μl of the 3′UTR purified PCR products to the pool of BamHI-digested vectors with 15 μl of 1.33x homemade assembly mix [53]. The assembly reaction was incubated for 1 h at 50°C and 1 μl was added to 10 μl of competent DH5α (prepared in-house with the Zymo Mix & Go! E. coli Transformation Kit) in 96-well plates. Each reaction was immediately plated onto one prewarmed carbenicillin LB agar plate, and grown overnight at 37°C. The next day, four colonies were chosen from each LB agar plate for colony PCR and picked into 5 μl water; 3 μl was transferred to a rectangular carbenicillin LB agar plate and the remaining 2 μl was used in a 20 μl PCR reaction using homemade Taq polymerase with primers that annealed outside the 3′UTR to the parental barcoded GFP vector. 
Colony PCR reactions were run on agarose gels and clones matching the expected 3′UTR length were selected. If fewer than three clones were obtained, another set of clones was taken for colony PCR to increase the likelihood each 3′UTR was covered by at least three barcodes. The selected clones were re-arrayed and grown in 96-well deepwell plates in liquid culture at 37°C overnight in a 96-well shaker. The next day, bacterial cultures were pooled, split into tubes, pelleted, and DNA was isolated using the Promega PureYield Plasmid Midiprep Kit.

Plasmid sequencing via tagmentation to link barcodes to 3′UTR

For library preparation with Tn5 transposase, the arrayed library of GFP 3′UTR reporter plasmids was combined into three pools such that only one barcode for a given 3′UTR should be present in each pool. To determine the appropriate ratio of DNA to Tn5 transposase, 50 ng of each plasmid pool was combined with various dilutions of in-house purified Tn5 transposase (a gift from Roman Spektor) in a 25 μl reaction (reaction conditions: 10 mM Tris pH 7.5, 5 mM MgCl2, 10% DMF) and incubated at 55°C for 8 min [55]. The tagmentation reactions were stopped with 2.5 μl 1% SDS. The optimization reactions were PCR amplified and run on a 6% PAGE gel. The reaction that yielded the most library in the size range 400–1000 bp was selected for full-scale amplification. Five μl of the tagmentation reaction was PCR amplified in a 50 μl reaction with Phusion polymerase for seven cycles, ethanol precipitated and size-selected (400–1000 bp) on a 6% native PAGE gel. Libraries were quantified with Qubit dsDNA High Sensitivity, size-verified on an Agilent Fragment Analyzer, and sequenced at Novogene on a HiSeqX Ten (Illumina) with the paired-end 2 × 150 bp kit.

Generating landing pad cell lines and integrating 3′UTR reporters

Lentiviral transfer plasmid design and construction

The Tet-On cell line (293T LLP-iCasp9-Blast Clone 12) used in the pilot MPRA was described previously [56]. To generate the PGK landing pad vector (Supplementary Fig. S1B), the human phosphoglycerate kinase 1 (PGK) promoter, EF1α 5′UTR (containing an intron and commonly used in mammalian expression vectors), and BFP (mTagBFP2) were cloned into a lentiviral plasmid backbone [56]. To generate the CAG landing pad vector (Supplementary Fig. S1B), the CAG promoter and BFP (mTagBFP2) were cloned into the lentiviral plasmid backbone. A puromycin marker was included in both vectors to facilitate selection during landing pad generation. For the Tet-Off RNA stability MPRA, a Tet-Off-compatible vector was made by inserting the following into the above lentiviral plasmid backbone: the TRE-tight2 promoter, a human ACTB intron (amplified with primers 5′ CGCCTGGAGAATTCGAGCTGACCGCCGAGACCGCGTCC and 5′ GTGGTTGACCAGACAAACCGGGTGAGCTGCGAGAATAGCCGG from human genomic DNA), mTagBFP2 fused to Bxb1 recombinase and BleoR antibiotic resistance CDS, and a tTA transactivator driven by the CMV promoter. The vectors were assembled with homemade Hot Fusion reaction mix [53]. Plasmids were sequence-verified with Sanger sequencing (Cornell BRC Genomics Facility).

Lentivirus production

To produce lentivirus, 600 000 HEK293T cells were plated in 6 cm plates in DMEM supplemented with 10% FBS. The next day (24 h later) cells were transfected with 1 μg lentiviral transfer vector (containing either the PGK or CAG landing pad), 900 ng packaging vector (psPAX2, Addgene plasmid #12260), and 100 ng VSV-G envelope vector (pMD2.G, Addgene plasmid #12259) diluted to 120 μL in Opti-MEM with 6 μL TransIT-LT1 transfection reagent. The next day (24 h later), the media was changed to DMEM supplemented with 30% FBS and 1% penicillin-streptomycin. The next day (24 h later) lentivirus was harvested by transferring conditioned media to a 15 mL Falcon tube, centrifuging at 400 x g for 5 min, and collecting supernatant. Fresh virus was used to transduce cells to generate landing pad cell lines and the remaining virus stored at −80°C.

Lentiviral transduction

To generate landing pad cell lines, 250 000 HEK293T cells were plated in each well of a 6-well plate. The next day the media was changed to DMEM supplemented with 10% FBS, 1% penicillin-streptomycin, and 8 μg/mL polybrene. Cells were transduced with varying amounts (between 25 and 200 μL) of lentivirus. The next day media was replaced with DMEM supplemented with 10% FBS, 1% penicillin-streptomycin, and 2 μg/mL puromycin. To estimate multiplicity of infection (MOI) with percent BFP+ cells via flow cytometry, some wells did not undergo puromycin selection. A low MOI (<1% as determined by percent BFP+ cells) was verified to ensure single-copy integration. Cells were cultured for another 10 days before single-cell FACS sorting to isolate single cell clones.
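The reasoning that a low percentage of BFP+ cells implies single-copy integration can be made explicit with a short calculation. The sketch below assumes that the number of integrations per cell follows a Poisson distribution, a common modeling assumption that the text does not state; the function name is hypothetical.

```python
import math

def multi_integration_fraction(fraction_bfp_positive: float) -> float:
    """Fraction of transduced (BFP+) cells expected to carry more than one integration.

    Assumption: integrations per cell are Poisson distributed, so the
    fraction of BFP+ cells equals 1 - exp(-moi).
    """
    moi = -math.log(1.0 - fraction_bfp_positive)
    p_any = 1.0 - math.exp(-moi)        # at least one integration
    p_one = moi * math.exp(-moi)        # exactly one integration
    return (p_any - p_one) / p_any

# With 1% BFP+ cells, only a small share of BFP+ cells carries multiple copies.
frac_multi = multi_integration_fraction(0.01)
print(f"{frac_multi:.4f}")
```

Under this model, roughly 0.5% of BFP+ cells would carry more than one integration when 1% of cells are BFP+, which is why the dual-fluorophore check on individual clones remains a useful second line of verification.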

Generation and verification of clonal cell lines

After transduction and puromycin selection, landing pad cell lines were single cell sorted on a FACSAriaII into 96-well plates based on BFP expression. Clones were cultured, expanded, and screened for BFP expression and high recombination efficiency. Clones were also verified to have a single integration with a dual-fluor experiment in which an equimolar ratio of attB-mCherry and attB-GFP plasmids was transfected with Bxb1 recombinase and cells were analyzed for the presence of mCherry+/GFP+ cells [52].

Integration of pooled 3′UTR library into landing pad cell lines

To generate pooled 3′UTR reporter cell lines (Supplementary Fig. S1C), five 10 cm plates of the PGK or CAG landing pad cell line were reverse transfected with 17 μg of Bxb1 recombinase-expressing plasmid with Fugene 6 (Promega) in antibiotic-free media. The next day, media was changed and 17 μg of the pooled GFP 3′UTR reporter plasmid was transfected with Fugene 6. The next day, media was changed. Cells were pooled and passaged for at least 10 days before sorting. At least 5–6 million cells were plated every 3 days to maintain coverage of the library.

Flow cytometry

Isolating recombinants with FACS

For cell sorting, cells were grown to 90% confluency, dissociated from plates, filtered through a 40 μm filter, and resuspended in PBS. Cells were sorted on a FACS Aria Fusion. Gates were chosen subjectively as follows: (1) FSC-A versus SSC-A to select for live cells, (2) FSC-A versus FSC-H to remove doublets, (3) GFP versus BFP, to select recombinants. Only cells lacking BFP and expressing GFP were chosen. To maintain coverage, at least 5 million live cells were sorted. Sorted cells were plated and grown at least one week before subsequent sorting into GFP bins.

Pooled 3′UTR reporter FACS MPRA

The pool of cells expressing the 3′UTR reporter library was sorted into five bins based on GFP fluorescence. Gating was performed as above for live cell selection and doublet exclusion, and then drawn on the GFP histogram as follows: 0–10%, 20–30%, 40–60%, 70–80%, 90–100%. Four-way sorting was used, and therefore cells were sorted in batches, alternating throughout the sort. Batch 1: 0–10%, 20–30%, 70–80%, 90–100%. Batch 2: 40–60%. Between 1.5 and 2 million cells were sorted per bin in order to maintain >350x coverage of the complex library. After the sort, cells were pelleted, resuspended in lysis buffer (Qiagen Gentra Puregene Cell DNA extraction Kit) and stored at room temperature until genomic DNA extraction.
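The coverage figure quoted above follows from dividing the number of sorted cells by the number of barcoded reporters in the library. The sketch below illustrates that arithmetic; the library size used here (1400 3′UTRs with three barcodes each) is an assumption for illustration only, since the exact number of barcoded reporters is not given in this section.

```python
def sort_coverage(cells_sorted: int, n_barcoded_reporters: int) -> float:
    """Average number of sorted cells per barcoded reporter in one bin."""
    return cells_sorted / n_barcoded_reporters

# GFP percentile gates for the five bins, as listed in the text.
GFP_BINS = [(0, 10), (20, 30), (40, 60), (70, 80), (90, 100)]

# Hypothetical library size: 1400 3'UTRs x 3 barcodes each (an assumption).
n_reporters = 1400 * 3

coverage_low = sort_coverage(1_500_000, n_reporters)
coverage_high = sort_coverage(2_000_000, n_reporters)
print(round(coverage_low), round(coverage_high))
```

With this assumed library size, 1.5 to 2 million cells per bin corresponds to several hundred cells per barcoded reporter, in line with the stated >350x coverage.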

Flow cytometry analysis of validation reporters

Analytical flow cytometry of validation reporters was performed using the Thermo Fisher Attune NxT flow cytometer; FlowJo v10 was used to determine median and geometric mean GFP fluorescence intensity of 3′UTR validation reporters after gating for live cells (FSC-A versus SSC-A), doublet exclusion (FSC-A versus FSC-H), and recombinants (BFP-/GFP+), as described above. Gates for recombinants were either drawn using the autogate feature in FlowJo or visually based on the major population on the BFP/GFP axes. Fluorescence intensities were normalized as described in the figure legends.

Polysome profiling

For each replicate, five 70% confluent 10 cm plates of cells were treated with 100 μg/mL cycloheximide in media for 3 min at 37°C, lifted from plates in ice cold PBS supplemented with 100 μg/mL cycloheximide, pelleted at 180 × g for 3 min, and lysed in 500 μL ice-cold polysome lysis buffer (10 mM HEPES pH 7.6, 100 mM KCl, 5 mM MgCl2, 5 mM DTT, 1% Triton X-100, 100 μg/mL cycloheximide) on ice for 10 min. Lysates were clarified by centrifuging at 16 000 × g for 15 min at 4°C and the supernatant was snap frozen in liquid nitrogen. For polysome profiling, 15-45% (w/v) sucrose gradients in polysome buffer (10 mM HEPES pH 7.6, 100 mM KCl, 5 mM MgCl2, 5 mM DTT, 100 μg/mL cycloheximide) were prepared with a Gradient Master (Biocomp). Lysate was thawed quickly and 500 μL was loaded onto the gradient. As input, 50 μL of clarified lysate was taken into 1 mL TRIzol to represent the cytoplasmic fraction. Polysome gradients were centrifuged at 32 000 rpm for 2.5 h at 4°C in a SW40 ultracentrifuge rotor. To fractionate the sample using the Brandel density gradient fractionation system, the bottom of the ultracentrifuge tube was pierced with a needle and 60% sucrose was injected. The sample was flowed through a UV absorbance (254 nm) reader and fractions were manually selected in real time based on peak absorbances, accounting for the slight delay between UV absorbance reading and collection. From each fraction, 100 μL was taken into 1 mL TRIzol for library preparation and the rest snap frozen in liquid nitrogen. RNA was extracted and libraries were made as described below.

RNA stability assays

Transcriptional pulse and promoter shutoff

Promoter shutoff experiments were performed using the Tet-Off system [57]. The pilot library of 173 3′UTRs was integrated into the Tet-Off landing pad cell line (see above for plasmid construction) as described above. Cells were passaged in the presence of 1 ng/mL doxycycline for at least 1 week before starting the decay experiments. For individual 3′UTR reporter experiments, 400 000 cells per well were plated (in duplicate) in 6-well plates 16 h before the transcriptional pulse. For the MPRA decay experiment with the 173 3′UTRs, 400 000 cells were plated (in duplicate) in 6 cm dishes 48 h prior to starting the transcriptional pulse. Cells were washed with PBS to remove residual doxycycline and incubated in media without doxycycline for a 4-h transcriptional pulse. The transcriptional pulse was halted with 2 μg/mL doxycycline (timepoint 0). For the four individual 3′UTR reporters, cells were collected in 1 mL TRIzol at hours 0, 1, 2.5, 4, 8, and 20.5. For the library of 173 3′UTRs, cells were collected in 2 mL TRIzol at hours 0, 1, 2, 3, 4, 8, and 20. To obtain total RNA abundance measurements for the library of 3′UTRs, cells never passaged in doxycycline were collected in TRIzol for RNA libraries and lysis buffer (Qiagen Gentra Puregene Cell DNA extraction Kit) for gDNA libraries. Samples in TRIzol were stored at −80°C in 1 mL aliquots and samples in gDNA lysis buffer were stored at room temperature until library preparation.

Quantitative RT-PCR

RNA extractions were performed with TRIzol followed by column purification (Zymo Quick-RNA Microprep RNA kit). Samples in TRIzol were thawed at room temperature with agitation, and chloroform (200 μL) was added and mixed thoroughly. Samples were centrifuged at 16 000 x g for 15 min at 4°C, and the aqueous layer collected. Ethanol (100%) was added at a 1:1 ratio and mixed. Samples were transferred to Zymo IC columns and RNA extractions finished according to the Zymo Quick-RNA Microprep RNA protocol, including the DNase I treatment.

Sample concentration and purity were assessed by a NanoDrop spectrophotometer (Thermo Scientific). To perform reverse transcription, 500 ng RNA was heated with 8 μM random nonanucleotide (dN9) primer and nuclease-free H2O in 12.5 μL at 80°C for 5 min followed by 60°C for 5 min, then cooled on ice. The rest of the reaction mix was added (4 μL RevertAid 5x Reaction Buffer, 0.5 μL Ribolock, 1 mM dNTPs, 1 μL RevertAid Reverse Transcriptase or H2O for the no RT control) and the reaction incubated at 42°C for 1 h, 70°C for 10 min, then cooled on ice. The cDNA was diluted 1:25 and 5 μL used in a 10 μL qPCR reaction with 0.25 μM each primer and LightCycler 480 SYBR Green I Master (Roche). Reactions were run in technical triplicate, including no-RT and no-template controls. GFP RNA expression was measured with primers 5′GTCAACCACCGCGGTCTC and 5′GCTGAACTTGTGGCCGTTTA, and normalized to housekeeping gene PPIA with primers 5′ATGGTCAACCCCACCGTGT and 5′TCTGCTGTCTTTGGGACCTTGTC. The following program was used for qPCR: 95°C for 5 min; 40 cycles of 95°C for 10 s, 53°C for 15 s, and 72°C for 30 s; 95°C for 5 s, 65°C for 1 min, and 97°C for the melting curve; cooling to 40°C.

RNA half-life calculation for individual reporters

For each biological replicate, the qRT-PCR crossing point (Cp) values were averaged across technical replicates and normalized to the reference gene PPIA. For each 3′UTR reporter, each timepoint was then normalized to timepoint 0. Each normalized timepoint was then averaged across the two biological replicates and plotted to obtain decay curves. Decay curves and half-lives were determined using the one phase decay model in GraphPad Prism.
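The normalization and fitting steps above can be illustrated with a simplified calculation. The sketch below is not the GraphPad Prism procedure: it assumes 100% PCR efficiency (a two-fold change per cycle), decay to a plateau of zero, and a log-linear least-squares fit forced through 1 at timepoint 0, whereas Prism's one phase decay model is a nonlinear fit that also estimates a plateau. The Cp values and function names are hypothetical.

```python
import math

def relative_abundance(cp_target: float, cp_reference: float) -> float:
    """Reference-normalized abundance, assuming a two-fold change per cycle."""
    return 2.0 ** -(cp_target - cp_reference)

def half_life(times_h: list[float], fraction_remaining: list[float]) -> float:
    """Half-life from a log-linear least-squares fit of y = exp(-k * t).

    Simplification: plateau fixed at zero and the curve forced through
    y = 1 at t = 0, since timepoints are normalized to timepoint 0.
    """
    num = sum(t * math.log(y) for t, y in zip(times_h, fraction_remaining))
    den = sum(t * t for t in times_h)
    k = -num / den                      # decay rate constant (per hour)
    return math.log(2) / k

# Hypothetical replicate-averaged Cp values for one reporter.
timepoints = [0, 1, 2.5, 4, 8]
cp_gfp = [20.0, 20.5, 21.25, 22.0, 24.0]
cp_ppia = [18.0, 18.0, 18.0, 18.0, 18.0]

abundance = [relative_abundance(g, p) for g, p in zip(cp_gfp, cp_ppia)]
normalized = [a / abundance[0] for a in abundance]   # normalize to timepoint 0
t_half = half_life(timepoints, normalized)
print(f"half-life: {t_half:.2f} h")
```

In this made-up example, the GFP Cp rises by 0.5 cycles per hour relative to PPIA, so the transcript halves every 2 h.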

Barcode library preparation for RNA stability MPRA

RNA from each 6-cm plate was extracted with the Zymo Quick-RNA Microprep Kit as described above. For library preparation, 8 μg RNA was used in a reverse transcription reaction. Libraries were prepared as described in the section “Barcode sequencing library preparation” except SPARQ beads were used to purify PCR products. In parallel, 500 ng RNA with dN9 primers was used in a reverse transcription reaction for qRT-PCR, to monitor total GFP mRNA levels across the timecourse; these values were used to scale MPRA barcode sequencing counts when calculating half-lives from MPRA data.
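Barcode sequencing reports only the proportion of each reporter within a timepoint, so a pool that is decaying as a whole needs an external measure of total reporter mRNA. One plausible way to apply the qRT-PCR scaling described above is sketched below; the exact formula is an assumption, as are the function name and the example numbers.

```python
def scale_counts(
    counts_by_time: dict[float, dict[str, int]],
    gfp_level_by_time: dict[float, float],
) -> dict[str, dict[float, float]]:
    """Scale each barcode's read fraction by total GFP mRNA at that timepoint.

    gfp_level_by_time holds total GFP mRNA relative to timepoint 0
    (from qRT-PCR); the result is abundance per barcode and timepoint.
    """
    scaled: dict[str, dict[float, float]] = {}
    for t, counts in counts_by_time.items():
        total_reads = sum(counts.values())
        for barcode, n in counts.items():
            fraction = n / total_reads
            scaled.setdefault(barcode, {})[t] = fraction * gfp_level_by_time[t]
    return scaled

# Hypothetical data: two barcodes, total GFP mRNA halves between 0 h and 4 h.
counts = {0.0: {"bcA": 500, "bcB": 500}, 4.0: {"bcA": 250, "bcB": 750}}
gfp_levels = {0.0: 1.0, 4.0: 0.5}
scaled = scale_counts(counts, gfp_levels)
print(scaled)
```

Here the read fraction of bcB rises over time, yet its scaled abundance still falls (0.5 to 0.375), which is the distinction the scaling is meant to preserve.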

3′RACE

RNA was extracted with the Zymo Quick-RNA Microprep Kit as above. For reverse transcription immediately afterwards, 0.5 mM dNTPs and 2.5 μM “lock-dock” primer (5′TCAGCTTGCATGCCTGCAGGTTTTTTTTTTTTTTTTTTTTTTTTTVN) were added to 500 ng RNA and the mix heated at 65°C for 5 min followed by a 1-min incubation on ice. The rest of the reaction mix was then added (5x SSIV Buffer, 5 mM DTT, 1 μL Ribolock RNase Inhibitor, and 1 μL SuperScript IV Reverse Transcriptase). One-tenth of the cDNA was used for PCR I, a 50 μL reaction using 5x Q5 reaction buffer, 0.2 mM dNTPs, 0.5 μM each forward primer (5′GTCAACCACCGCGGTCTC) and reverse primer (5′TCAGCTTGCATGCCTGCAGG), 0.5 μL Q5 High-Fidelity DNA Polymerase with the following reaction conditions: initial denaturation 98°C for 30 s; 12 touchdown cycles 98°C for 10 s, 72°C for 20 s and decreasing by 0.5°C each cycle, 72°C for 2 min; 5 cycles 98°C for 10 s, 66°C for 20 s, 72°C for 2 min; final extension at 72°C for 2 min; samples immediately removed to ice. The forward primer binds to the attR recombination site to selectively amplify integrated reporters. A 1x SPARQ bead cleanup was performed and one-tenth of the product used for PCR II, with the same reaction mix as PCR I except using a nested forward primer (5′ATGGTCCTGCTGGAGTTC) binding within the GFP coding region, with the following conditions: initial denaturation 98°C for 30 s; 20 touchdown cycles 98°C for 10 s, 72°C for 20 s and decreasing by 0.5°C each cycle, 72°C for 75 s; 15 cycles 98°C for 10 s, 62°C for 20 s, 72°C for 75 s; final extension at 72°C for 2 min. Samples were run on a 1% agarose gel in TBE buffer to visualize PCR products and confirm the absence of product in no RT and no-template controls. A 1x SPARQ bead cleanup was performed on the PCR II products, concentrations determined with Qubit, sample purity verified with a NanoDrop spectrophotometer (Thermo Scientific), and submitted for long-read PCR sequencing (Plasmidsaurus).
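The touchdown programs above can be expanded into per-cycle annealing temperatures to confirm that each touchdown phase ends just above the fixed annealing temperature that follows it. The sketch assumes the first touchdown cycle anneals at 72°C and the 0.5°C decrement applies from the second cycle onward; the function name is illustrative.

```python
def touchdown_schedule(
    start_c: float, step_c: float, n_touchdown: int, final_c: float, n_final: int
) -> list[float]:
    """Annealing temperature (°C) for every cycle of a touchdown PCR."""
    # Touchdown phase: temperature drops by step_c after each cycle.
    touchdown = [start_c - step_c * i for i in range(n_touchdown)]
    # Fixed phase: constant annealing temperature.
    return touchdown + [final_c] * n_final

pcr1 = touchdown_schedule(72.0, 0.5, 12, 66.0, 5)    # PCR I: 12 + 5 cycles
pcr2 = touchdown_schedule(72.0, 0.5, 20, 62.0, 15)   # PCR II: 20 + 15 cycles
print(pcr1[11], pcr1[12], len(pcr1))
print(pcr2[19], pcr2[20], len(pcr2))
```

Under this reading, the last touchdown cycle of PCR I anneals at 66.5°C before the five cycles at 66°C, and the last touchdown cycle of PCR II anneals at 62.5°C before the fifteen cycles at 62°C.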

3′UTR MPRA library preparation

Genomic DNA extraction

Genomic DNA (gDNA) was extracted using the Qiagen Gentra Puregene Cell DNA extraction Kit with minor modifications. Briefly, cells were pelleted by centrifuging for 5 s at 16 000 × g. Cell lysis solution (300 μL) was added to the pellet, vortexed for 10 s and stored at room temperature until further processing. RNase A (1.5 μL) was added to each tube, inverted 25 times, and incubated on a Thermomixer (37°C, 1200 rpm) for 1 h. Samples were incubated on ice for 1 min, 100 μL protein precipitation solution was added, vortexed at max speed for 20 s, and incubated on ice for 5 min. Samples were centrifuged at 16 000 × g for 3 min, and the supernatant was moved to a new tube. Glycoblue (1 μL) and 100% ethanol (1 mL) were added, mixed, and precipitated at −20°C for at least 1 h. DNA was pelleted at 16 000 × g at 4°C for 15 min. The pellet was washed with 900 μL cold 70% ethanol and dried at room temperature for 10–20 min. To fully dissolve the pellet, DNA was resuspended in 50 μL nuclease-free water and incubated on the Thermomixer (37°C, 1200 rpm) overnight. Genomic DNA was quantified with Qubit (Broad Range dsDNA Quantitation Kit). All gDNA was included in subsequent PCR reactions to maintain coverage.

RNA extraction and reverse transcription

Cells expressing the pooled 3′UTR reporter library were plated (at least 5 × 10^6 cells) in 10 cm plates to reach 70% confluency. Media was aspirated, TRIzol (Invitrogen) was added directly to the cells and then transferred to 1.7 mL tubes to be stored at −80°C until RNA extraction. In parallel, another set of plates was used to extract genomic DNA for normalizing barcode counts derived from RNA. Cells were dissociated from plates, washed in PBS, pelleted, and processed with the Qiagen Gentra Puregene Cell DNA Extraction Kit as described above.

RNA was extracted from TRIzol according to the manufacturer's protocol, with minor modifications. Briefly, 200 μL of chloroform was added to the sample, aggressively inverted, incubated at room temperature for 10 min, and centrifuged (16 000 × g, 4°C) for 15 min. The aqueous layer was transferred to a new tube and 500 μL isopropanol was added. The sample was mixed and precipitated on ice for 15 min. The sample was centrifuged (16 000 × g, 4°C) for 15 min and the pellet was washed twice with 900 μL 75% ethanol. The pellet was dried at room temperature for 10–20 min and resuspended in nuclease-free water. To generate cDNA, 10 μg of RNA was used in a 40 μL reaction with RevertAid Reverse Transcriptase, and an RT primer that annealed in a small constant region downstream of the barcode and upstream of the variable 3′UTR (5′CTCCACTTTGTACAAGAAAG). The cDNA was purified and concentrated with the Zymo DNA Clean & Concentrator Kit according to the manufacturer's ssDNA protocol. All purified cDNA was used in the subsequent PCR reactions to maintain coverage.
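Normalizing RNA-derived barcode counts to gDNA-derived counts corrects for differences in how many cells carry each reporter. A common way to express this is a log2 ratio of depth-normalized counts, sketched below; the pseudocount, the counts-per-million scaling, and the function name are assumptions and do not describe the authors' pipeline.

```python
import math

def log2_rna_dna_ratio(
    rna_counts: dict[str, int], dna_counts: dict[str, int], pseudocount: float = 1.0
) -> dict[str, float]:
    """log2(RNA cpm / DNA cpm) per barcode, for barcodes present in the gDNA library."""
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    ratios: dict[str, float] = {}
    for barcode, dna in dna_counts.items():
        rna = rna_counts.get(barcode, 0)
        # Depth-normalize each library to counts per million before taking the ratio.
        rna_cpm = (rna + pseudocount) / rna_total * 1e6
        dna_cpm = (dna + pseudocount) / dna_total * 1e6
        ratios[barcode] = math.log2(rna_cpm / dna_cpm)
    return ratios

# Hypothetical counts: both barcodes are equally represented in gDNA.
rna = {"bcA": 199, "bcB": 799}
dna = {"bcA": 499, "bcB": 499}
ratios = log2_rna_dna_ratio(rna, dna)
print(ratios)
```

In this example bcA is under-represented in RNA relative to DNA (negative score) and bcB is over-represented (positive score), even though both reporters are integrated in the same number of cells.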

Barcode sequencing library preparation

To prepare barcode libraries, three PCRs were used to amplify the barcode region (derived from either genomic DNA or from RNA) and to add sequences compatible with Illumina sequencing. We observed that GFP reporter plasmids transfected during Bxb1 recombination can persist in cells for many weeks, and PCR amplification from this episomal DNA can confound results. Therefore, PCRI uses a forward primer that anneals only to the recombined site (attR, but not attP or attB), and therefore only amplifies barcode regions derived from chromosomally-integrated 3′UTR reporters. The reverse primer also contains a 10 nt variable region to improve nucleotide diversity in the first few cycles of Illumina sequencing and to measure extent of PCR duplication. PCR II amplifies only the region surrounding the barcode and adds sequences compatible with commercially available Illumina unique dual index (UDI) primers used in PCR III.

For PCRI, all cDNA (for RNA-derived libraries) or all gDNA (for DNA libraries) was used to amplify for three PCR cycles (ExTaq, forward primer 5′GTCAACCACCGCGGTCTC and reverse primer 5′ACGACGCTCTTCCGATCTNNNNNNNNNNcactttgtacaagaaagttgaacg). Reactions were then purified with SPRIselect beads (Beckman Coulter) using a 0.7x one-sided size selection protocol. All purified PCRI product was used for PCRII (three cycles, ExTaq, with forward primer 5′CGTGTGCTCTTCCGATCTcgaccactaccagcagaaca and reverse primer 5′ACACTCTTTCCCTACACGACGCTCTTCCGATCT), which was then purified with SPRIselect beads using a 0.6–1.2x two-sided size selection protocol. A portion of the purified PCRII reaction was used in a test amplification to determine the number of cycles for PCRIII. The remaining purified PCRII product was used in a final PCRIII reaction to add Illumina indices (ExTaq, with primers from the NEBNext Multiplex Oligos for Illumina Kit). The final libraries were purified and concentrated with SPRIselect beads using a 0.5–1x two-sided size selection protocol, quantified with Qubit (High Sensitivity dsDNA Quantitation Kit) and size-verified on an Agilent Fragment Analyzer. The libraries were pooled in equimolar amounts, purified, and concentrated again with SPRIselect beads (0.5–1x two-sided size selection), and submitted for Illumina sequencing at a depth of 5–10 M reads per library (1000–2000x coverage).

RNA-seq, small RNA-seq and PRO-seq

Cell collection

Three replicates each of the PGK and CAG landing pad cell lines were plated to reach 70% confluency in 10 cm plates. Cells were washed with 5 mL ice-cold PBS, scraped off the plate in 5 mL PBS, pelleted (700 × g, 4°C, 3 min) and resuspended in 500 μL PBS. A portion (100 μL, 20%) of the cells was added to 1 mL TRIzol (Invitrogen) and stored at −80°C for small RNA-seq and RNA-seq. The remaining 400 μL was used for PRO-seq.

PRO-seq library preparation

The library preparation procedure was adapted from a previously-described PRO-seq protocol [58]. For permeabilization, cells were pelleted (700 × g, 4°C, 3 min), resuspended in permeabilization buffer (10 mM Tris-Cl pH 7.5, 10 mM KCl, 250 mM sucrose, 5 mM MgCl2, 1mM EGTA, 0.05% Tween-20, 0.5 mM DTT, 1X EDTA-free Protease Inhibitor, with 2 μL Superase RNase Inhibitor per 10 mL added fresh), and incubated on ice for 5 min. Cells were washed with wash buffer (10 mM Tris-Cl pH 7.5, 10 mM KCl, 150 mM sucrose, 5 mM MgCl2, 0.5 mM CaCl2, 0.008% Tween-20, 0.5 mM DTT, 1X Protease Inhibitor, with 2 μL Superase RNase Inhibitor per 10 mL added fresh) and washed and resuspended in storage buffer (50 mM Tris-Cl pH 8.3, 40% glycerol, 5 mM MgCl2, 0.1 mM EDTA, 0.5 mM DTT, with 2 μL Superase RNase Inhibitor per 10 mL added fresh). Samples were frozen in liquid nitrogen and stored at −80°C.

To perform the run-on reactions, cells were thawed and ∼1 million cells in 50 μL storage buffer were used for each reaction. After adding 50 μL run-on reaction mix (10 mM Tris-Cl pH 8.0, 5 mM MgCl2, 1 mM DTT, 300 mM KCl, 40 μM ATP, 40 μM GTP, 40 μM Biotin-11-CTP, 40 μM Biotin-11-UTP, 0.5 μL SUPERase-In RNase Inhibitor, 1% sarkosyl), samples were incubated at 37°C for 5 min, and the run-on reaction was ended precisely by adding 350 μL Buffer RL from the Norgen Total RNA Purification Kit, which was then used for total RNA extraction. Each sample was thoroughly vortexed after adding 240 μL 100% ethanol, and spun through a spin column. Spin columns were washed twice with wash solution A, and given an additional spin to dry the column before eluting the RNA with 50 μL H2O. Following denaturation at 65°C for 30 s, RNA was hydrolyzed on ice with 25 μL of cold 1 N NaOH. After 10 min, NaOH was neutralized with 125 μL cold 1 M Tris-Cl pH 6.8. RNA was precipitated by adding 20 μL 5 M NaCl, 1 μL GlycoBlue, and 625 μL cold 100% ethanol, thoroughly vortexing, and incubating for 1 h at 4°C with agitation. RNA precipitation was completed by centrifugation at 20 000 × g for 30 min at 4°C; the pellet was washed with cold 70% ethanol, air-dried for 5 min, and then resuspended in 1 μL 100 μM 3′ adapter (rUrNrNrNrNrNGATCGTCGGACTGTAGAACTCTGAAC/3InvdT/) and 6.5 μL H2O.

To ligate 5′ and 3′ adapters, RNA was first denatured at 65°C for 30 s and snap cooled on ice. The 3′ adapter was ligated at 20°C for 4 h in 1x T4 RNA ligase buffer, 1 mM ATP, 0.5 μL SUPERase RNase Inhibitor, 15% PEG 8000, and 2 μL T4 RNA ligase, and then stored at 4°C overnight. For each sample, 10 μL Dynabeads MyOne Streptavidin C1 were prepared by washing once in NaOH wash buffer (0.1 N NaOH, 50 mM NaCl) and twice with binding buffer (10 mM Tris-Cl pH 7.5, 300 mM NaCl, and 0.1% Triton X-100, with 2 μL Superase RNase Inhibitor added fresh per 10 mL) prior to resuspension in 25 μL binding buffer. Samples were incubated for 20 min at 25°C with 55 μL binding buffer and 25 μL washed beads. Samples were then washed with a high salt buffer (50 mM Tris-Cl pH 7.5, 2 M NaCl, 0.5% Triton X-100, with 2 μL Superase RNase Inhibitor added fresh per 10 mL) and a low salt buffer (5 mM Tris-Cl pH 7.5, 0.1% Triton X-100, with 2 μL Superase RNase Inhibitor added fresh per 10 mL). For 5′ hydroxyl repair, samples were incubated at 37°C for 30 min with 1x PNK buffer, 1 mM ATP, 1 μL T4 PNK and 0.5 μL SUPERase-In RNase Inhibitor. For 5′ decapping, samples were incubated at 37°C for 60 min with 0.5x NEB2 ThermoPol buffer, 1 μL RppH, and 0.5 μL SUPERase-In RNase Inhibitor. For 5′ adapter ligation, supernatant was removed and samples resuspended in 1 μL 100 μM 5′ adapter (/5InvddT/CCTTGGCACCCGAGAATTCCANrNrNrNrNrNrC) and 6.5 μL H2O. Following denaturation at 65°C for 30 s and snap cooling on ice, 11.5 μL ligation mix (1x T4 RNA ligase buffer, 1 mM ATP, 0.5 μL SUPERase-In RNase Inhibitor, 15% PEG 8000, 1 μL T4 RNA ligase) was added and samples were incubated at 25°C for 1 h with agitation.

To isolate RNA and generate libraries, after washing with high salt buffer and low salt buffer, beads from each sample were resuspended in 500 μL TRIzol. Samples were incubated on ice for 3 min and 100 μL chloroform was added. Samples were thoroughly vortexed and then centrifuged at 20 000 × g for 5 min at 4°C. The aqueous layer was collected, and another TRIzol-chloroform extraction (300 μL TRIzol, 60 μL chloroform) and additional chloroform extraction (500 μL chloroform) performed. GlycoBlue (1 μL) and 2.5 × 100% ethanol were added to the extracted RNA solutions. After incubation at −20°C for 30 min, RNA precipitation was completed by pelleting the samples at 20 000 × g for 30 min at 4°C. The GlycoBlue pellet was washed with 70% ethanol and air-dried for 5 min. Samples were resuspended in 3.5 μM RP1 and 0.9 μM dNTPs. After denaturation at 65°C for 30 s and snap cooling on ice, RT master mix (1x RT buffer, 5mM DTT, MgCl2, 1 μL SuperScript III, 0.5 μL SUPERase-In RNase Inhibitor) was added and reverse transcription performed (48°C for 3 min, 44°C for 20 min, 52°C for 45 min, and 85°C for 5 min). PCR (13 cycles) was used to generate the final libraries using 1x Q5 reaction buffer, 1x Q5 enhancer, 0.1 μM RP1, 0.25 mM dNTPs, 1x Q5 DNA polymerase, and TruSeq index primers. Libraries were size selected using PAGE and submitted for Illumina sequencing on the NextSeq500 (Mid-Output Kit, 2 × 75 bp).

RNA-seq and small RNA-seq library preparation

RNA in TRIzol (collected as described above) for the PGK and CAG landing pad cell lines was submitted to the Cornell Transcriptional Regulation and Expression Core Facility for RNA-seq (in triplicate) and small RNA-seq (in duplicate). Libraries were generated using the NEBNext Ultra II Directional RNA Kit (with poly(A) enrichment) and NEBNext Small RNA Library Prep Kit. Small RNA-seq libraries were sequenced on a NextSeq500 (1 × 75 bp) and RNA-seq libraries were sequenced on a NovaSeq (2 × 150 bp).

Initial data processing

Linking barcodes to 3′UTR

The Tn5 tagmentation plasmid libraries were used to link barcodes to 3′UTRs. A custom genome consisting of the expected 3′UTR sequences with 250 nt flanking vector sequence was constructed for mapping. Approximately 300 M paired-end (2 × 150 bp) reads were obtained for each of three libraries. Adapters were trimmed from reads with cutadapt -q 20 -j 25 -a CTGTCTCTTATACACATCTCCGAGCCCACGAGAC -A CTGTCTCTTATACACATCTGACGCTGCCGACGA. Bowtie2 (with parameters -X 1000) was used to map reads to the custom genome. Only high-quality reads and concordantly mapping read pairs (i.e. pairs that map to both the barcode region and a 3′UTR) were kept. A custom Python script (python version 2.7) was used to identify and count reads containing the barcode region, and the barcode sequences were added to the read names of both reads in each pair. The barcode-3′UTR linkages were exported and further filtering was performed in R, e.g. to remove lowly-detected barcode sequences likely due to PCR or sequencing errors (Supplementary Fig. S1D). Sixteen barcodes were found to be associated with two different 3′UTRs, and were therefore excluded from further analyses. We confidently identified 4245 barcodes covering 1401 3′UTRs. Of these 3′UTRs, 99% were associated with two or more barcodes and 81% with three or more, providing internal replicates for each assay (Supplementary Fig. S1E).
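The two linkage filters described above (dropping lowly detected barcode-3′UTR pairs, then excluding barcodes linked to more than one 3′UTR) can be sketched in Python as follows. This is only an illustration, not the authors' script: the input layout, the function names, and the `min_reads` threshold are assumptions, and the order in which the two filters are applied is also assumed.

```python
from collections import defaultdict

def filter_linkages(pair_counts, min_reads=10):
    """Return {barcode: utr} for barcodes that pass both filters.

    pair_counts maps (barcode, utr) -> number of supporting read pairs.
    min_reads is a hypothetical cutoff, not a value taken from the paper.
    """
    utrs_per_barcode = defaultdict(set)
    for (barcode, utr), n_reads in pair_counts.items():
        # Drop lowly detected pairs (likely PCR or sequencing errors).
        if n_reads >= min_reads:
            utrs_per_barcode[barcode].add(utr)
    # Keep only barcodes linked to exactly one 3'UTR.
    return {bc: next(iter(utrs))
            for bc, utrs in utrs_per_barcode.items() if len(utrs) == 1}

def barcodes_per_utr(linkages):
    """Count how many barcodes (internal replicates) tag each 3'UTR."""
    counts = defaultdict(int)
    for utr in linkages.values():
        counts[utr] += 1
    return dict(counts)

# Toy example with made-up barcodes and 3'UTR names.
example = {
    ("AAA", "U1"): 50, ("AAA", "U2"): 1,   # U2 link is noise
    ("CCC", "U1"): 40,
    ("GGG", "U2"): 30, ("GGG", "U3"): 25,  # ambiguous barcode
    ("TTT", "U3"): 2,                      # lowly detected
}
linkages = filter_linkages(example)
replicates = barcodes_per_utr(linkages)
```

In this toy input, only the barcodes with a single well-supported 3′UTR survive, and the per-3′UTR tally gives the number of internal replicates.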

Barcode sequencing

For barcode libraries derived from both RNA and DNA, barcodes were demultiplexed, counted, normalized, and exported to a table using a custom Python script (python version 2.7). For both RNA and DNA barcode libraries, barcode counts were normalized to library size, i.e. counts per million (CPM), first by calculating the “per million” scaling factor (total number of reads mapping to barcodes divided by 1 000 000) and then by dividing the read count of each barcode by this scaling factor, summarized in the equation:

$$\mathrm{CPM}_{b} = \frac{\text{reads}_{b}}{\left(\sum_{b'} \text{reads}_{b'}\right) / 10^{6}}$$

R (version 4.2.3) was used to link barcode counts to 3′UTRs (linkages determined by tagmentation sequencing, as described above).
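As a minimal sketch of the CPM normalization described above, the following Python function computes the "per million" scaling factor and divides each barcode count by it. The function name and the dictionary input are assumptions made for illustration; the authors' own script is not shown in the text.

```python
def counts_per_million(barcode_counts):
    """Normalize raw barcode read counts to counts per million (CPM).

    barcode_counts maps barcode -> raw read count for one library.
    """
    total_reads = sum(barcode_counts.values())
    if total_reads == 0:
        raise ValueError("library contains no barcode-mapped reads")
    scaling_factor = total_reads / 1_000_000  # the "per million" factor
    return {bc: n / scaling_factor for bc, n in barcode_counts.items()}

# Toy library with made-up counts.
cpm = counts_per_million({"AAA": 250, "CCC": 750})
```

By construction, the CPM values of one library sum to one million, which makes libraries of different sequencing depth comparable.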

Calculating protein score, translation score, and RNA abundance

For each barcode, the protein score was calculated by taking the weighted average (weights ranging from −2 to 2) of the normalized counts (CPM) of each of the five sorted bins (0–10%, 20–30%, 40–60%, 70–80%, and 90–100%), summarized in the equation:

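$$\text{protein score} = \frac{\sum_{i=1}^{5} w_i \,\mathrm{CPM}_i}{\sum_{i=1}^{5} \mathrm{CPM}_i}, \qquad w_1 = -2,\ \ldots,\ w_5 = 2$$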

The translation score was calculated similarly, except that normalized barcode counts (CPM) from each fraction of the polysome gradient were used, and each polysome fraction was weighted by the number of ribosomes in the corresponding peak:

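$$\text{translation score} = \frac{\sum_{f} n_f \,\mathrm{CPM}_f}{\sum_{f} \mathrm{CPM}_f}, \qquad n_f = \text{number of ribosomes in fraction } f$$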

RNA abundance was calculated as the ratio of barcode counts deriving from RNA and barcode counts deriving from gDNA, i.e. using the equation:

$$\text{RNA abundance} = \frac{\mathrm{CPM}_{\text{RNA}}}{\mathrm{CPM}_{\text{gDNA}}}$$

Scores were calculated for every barcode in the library, and the median score across all barcodes for a given 3′UTR was used for most analyses, as indicated in axis titles and figure legends.
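The three per-barcode quantities and the per-3′UTR median can be sketched in Python as below. Several details are assumptions rather than statements from the text: the five bin weights are taken to be the evenly spaced integers −2 to 2, the weighted sum is divided by the total CPM across bins or fractions, and the ribosome weights used in the example are hypothetical. All function names are illustrative.

```python
from collections import defaultdict
from statistics import median

# Assumed weights for the five sorted GFP bins, lowest to highest.
BIN_WEIGHTS = (-2, -1, 0, 1, 2)

def weighted_score(cpm_values, weights):
    """Average of the weights, weighted by the CPM in each bin or fraction."""
    total = sum(cpm_values)
    if total == 0:
        return None  # barcode not detected in any bin
    return sum(w * c for w, c in zip(weights, cpm_values)) / total

def rna_abundance(rna_cpm, gdna_cpm):
    """Ratio of RNA-derived to gDNA-derived barcode CPM."""
    return rna_cpm / gdna_cpm

def median_per_utr(score_by_barcode, barcode_to_utr):
    """Collapse barcode-level scores to the median for each 3'UTR."""
    grouped = defaultdict(list)
    for barcode, score in score_by_barcode.items():
        grouped[barcode_to_utr[barcode]].append(score)
    return {utr: median(scores) for utr, scores in grouped.items()}

# Protein score for a barcode found only in the brightest bin.
top_bin = weighted_score([0, 0, 0, 0, 100], BIN_WEIGHTS)
# Translation score with hypothetical ribosome numbers (1, 2, 3) per fraction.
translation = weighted_score([10, 10, 20], (1, 2, 3))
```

Dividing by the total CPM keeps the protein score within the range of the weights, so a barcode confined to the dimmest bin scores −2 and one confined to the brightest bin scores 2.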

Calculating half-life in RNA stability MPRA

To estimate RNA half-life, a strategy adapted from Zhao et al. [38] and Chen et al. [59] was used. First, normalized counts for each barcode (CPM) were calculated as above. Each barcode within each timepoint was then scaled by total relative GFP mRNA, measured with qRT-PCR, to generate values with which to fit an exponential decay curve. To fit the decay curve, the nls() function in R with the self-starting nls asymptotic model (SSasymp) was used, summarized by:

$$\mathrm{RNA}(t) = \mathrm{RNA}_{\text{plateau}} + \left(\mathrm{RNA}_{0} - \mathrm{RNA}_{\text{plateau}}\right) e^{-kt}$$

where RNA0 is the starting RNA amount, RNAplateau is the value at which the RNA reaches steady state, k is the decay constant, and t is time. We allowed a plateau constant because there is some level of transcriptional leakage in the Tet-Off system, even upon reaching steady state in 2 μg/mL doxycycline. The half-life (t1/2) in hours is then calculated as:

$$t_{1/2} = \frac{\ln 2}{k}$$

For most plots, the median half-life across all barcodes for a given 3′UTR is used, as indicated in the axis titles and figure legends.
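To illustrate the decay model with a plateau, the sketch below fits the starting amount, plateau, and decay constant to a time course and converts the decay constant to a half-life. It is not the authors' procedure: they used nls() with SSasymp in R, whereas this example substitutes a simple grid search over k, solving the two linear parameters in closed form at each step. The function names, the grid bounds, and the synthetic data are all assumptions.

```python
import math

def fit_decay(times, values, k_min=1e-3, k_max=10.0, steps=2000):
    """Fit values ~ plateau + (rna0 - plateau) * exp(-k * t).

    For each candidate k on a log-spaced grid, plateau and amplitude are
    obtained by ordinary least squares; the k with the smallest residual
    sum of squares wins. Returns (rna0, plateau, k).
    """
    n = len(times)
    y_mean = sum(values) / n
    best = None
    for i in range(steps + 1):
        k = k_min * (k_max / k_min) ** (i / steps)
        x = [math.exp(-k * t) for t in times]
        x_mean = sum(x) / n
        sxx = sum((xi - x_mean) ** 2 for xi in x)
        if sxx == 0:
            continue  # all timepoints identical; model not identifiable
        sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, values))
        amplitude = sxy / sxx                 # equals rna0 - plateau
        plateau = y_mean - amplitude * x_mean
        sse = sum((plateau + amplitude * xi - yi) ** 2
                  for xi, yi in zip(x, values))
        if best is None or sse < best[0]:
            best = (sse, k, plateau + amplitude, plateau)
    if best is None:
        raise ValueError("could not fit decay curve")
    _, k, rna0, plateau = best
    return rna0, plateau, k

def half_life(k):
    """Half-life from the decay constant: ln(2) / k."""
    return math.log(2) / k

# Synthetic, noise-free time course (hours) with k = 0.5 per hour.
times = [0, 1, 2, 4, 8]
values = [0.1 + 0.9 * math.exp(-0.5 * t) for t in times]
rna0, plateau, k = fit_decay(times, values)
t_half = half_life(k)
```

The grid search is deliberately crude; its resolution limits the precision of k, so a dedicated nonlinear least-squares routine would be preferable for real data.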

Small RNA-sequencing read processing and differential expression analysis

For small RNA-seq, reads were mapped to miRBase miRNA annotations [60] (version 22.1) using mapper.pl (parameters -m -e -h -i -j -v -k AGATCGGA -l 18) and quantified with quantifier.pl (parameters -W -j) from the miRDeep2 [61] package (version 2.0.0.7). In R (version 4.2.3), miRNA counts were summed for those in the same family (defined as containing the same extended seed region, nt positions 2–8 of miRNA). DESeq2 (version 1.38.3) [62] was used for miRNA differential expression analysis. The log2 fold-changes were shrunk with the lfcShrink() function in DESeq2 except in MA plots.
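Collapsing miRNA counts into families by extended seed (nt positions 2–8) can be sketched in Python as follows. The function name and input layout are assumptions, and the read counts in the example are made up; the authors performed this step in R.

```python
from collections import defaultdict

def collapse_to_families(mirna_counts, mirna_sequences):
    """Sum read counts over miRNAs sharing the same extended seed.

    The extended seed is taken as nt positions 2-8 of the mature miRNA
    (1-based), i.e. the Python slice [1:8]. Families are keyed by seed.
    """
    family_counts = defaultdict(int)
    for name, count in mirna_counts.items():
        seed = mirna_sequences[name][1:8]
        family_counts[seed] += count
    return dict(family_counts)

# Mature sequences for three human miRNAs; counts are illustrative only.
sequences = {
    "hsa-let-7a-5p": "UGAGGUAGUAGGUUGUAUAGUU",
    "hsa-let-7b-5p": "UGAGGUAGUAGGUUGUGUGGUU",
    "hsa-miR-21-5p": "UAGCUUAUCAGACUGAUGUUGA",
}
counts = {"hsa-let-7a-5p": 1200, "hsa-let-7b-5p": 300, "hsa-miR-21-5p": 5000}
families = collapse_to_families(counts, sequences)
```

Here the two let-7 members share the seed GAGGUAG and are therefore summed into one family, whereas miR-21-5p remains on its own.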

RNA-seq read processing and differential expression analysis

Paired-end reads were first trimmed with cutadapt [63] (version 4.6) with parameters -q 20 --poly-a --minimum-length=10 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT. Reads mapping to the PhiX genome (Illumina spike-in) and to rDNA were removed with bowtie2 (version 2.5.1) [64] (parameters --fast-local --dovetail --un-conc). Reads were mapped to the human genome (hg19) with hisat2 (version 2.2.1) [65] using default parameters. Gene annotations were downloaded from Gencode (v19). Exon annotations were converted to SAF format and the featureCounts [66] program from the subread package (version 2.0.3, parameters -s 2 -p --countReadPairs -F SAF) was used to count mapped reads.

PRO-seq read processing and differential expression analysis

PRO-seq reads were processed in the same way as RNA-seq reads, with the following modifications. The fastp [67] program (version 0.23.2, with parameters --umi --umi_loc=per_read --umi_len=7 --disable_adapter_trimming --disable_quality_filtering --disable_length_filtering --disable_trim_poly_g) was used to add the unique molecular identifiers (UMIs) to the read name for removing PCR duplicates. Cutadapt was then used to remove adapters (parameters -q 20 --poly-a --minimum-length=10 -a TGGAATTCTCGGGTGCCAAGG -A GATCGTCGGACTGTAGAACTCTGAAC). Reads mapping to PhiX and rDNA were removed and the remaining reads mapped to the hg19 genome, as performed for RNA-seq. PCR duplicates were removed using the dedup tool from the UMI-tools [68] program (parameters --umi-separator=":" --paired). Mapped reads were counted using featureCounts as for RNA-seq, except that the SAF file contained transcripts (including introns and exons) with 200 bp removed from the transcription start site (TSS) and transcriptional end site (TES), to avoid counting reads from paused/slowed polymerase.
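Building the trimmed gene-body intervals for PRO-seq counting can be sketched in Python as below. The sketch assumes 1-based inclusive coordinates and the standard five-column SAF layout (GeneID, Chr, Start, End, Strand); the function names, the example transcripts, and the choice to drop transcripts too short to trim are illustrative assumptions.

```python
def trim_transcript(start, end, trim=200):
    """Remove `trim` bp from both ends of a transcript interval.

    Because the same amount is removed from the TSS and the TES, the
    result does not depend on strand. Returns None if nothing remains.
    """
    new_start, new_end = start + trim, end - trim
    if new_start > new_end:
        return None
    return new_start, new_end

def to_saf_rows(transcripts, trim=200):
    """Build SAF rows (header first) from (gene_id, chrom, start, end, strand)."""
    rows = [("GeneID", "Chr", "Start", "End", "Strand")]
    for gene_id, chrom, start, end, strand in transcripts:
        trimmed = trim_transcript(start, end, trim)
        if trimmed is not None:  # skip transcripts shorter than 2 * trim
            rows.append((gene_id, chrom, trimmed[0], trimmed[1], strand))
    return rows

# Hypothetical transcripts; the second is too short to survive trimming.
rows = to_saf_rows([
    ("geneA", "chr1", 1000, 5000, "+"),
    ("geneB", "chr2", 100, 400, "-"),
])
saf_text = "\n".join("\t".join(str(field) for field in row) for row in rows)
```

The resulting tab-delimited text can be written to a file and passed to featureCounts with -F SAF.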

Predicting 3′ ends for 3′UTR reporters

3′UTR reporters were cloned to include ∼150 nt of genomic sequence past the annotated CPA site [54], and therefore, we used predicted endogenous CPA sites. To precisely call this site, the polyA_DB (version 3) database [69] was used. The CPA site with the highest average reads per million across all samples in the database was used. The 3′UTR sequence and length resulting from these CPA predictions were used for all analyses.

Partial r2 analysis

Statistical modeling and partial r2 analysis were conducted in R. Linear regressions and the Bayesian information criterion (BIC) were computed with the lm() and BIC() functions, respectively, in the stats package.

Calculating transcriptome-wide RNA stability with PRO-seq and RNA-seq

To estimate relative RNA stability using PRO-seq and RNA-seq, transcripts per million (TPM) values for each gene for each assay were calculated from the read count tables described above. Transcriptome-wide RNA stability was calculated as follows:

$$\text{RNA stability} = \frac{\mathrm{TPM}_{\text{RNA-seq}}}{\mathrm{TPM}_{\text{PRO-seq}}}$$
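A minimal Python sketch of this calculation follows, assuming that stability is taken as the per-gene ratio of RNA-seq TPM to PRO-seq TPM (steady-state abundance relative to transcriptional output). The function names, the feature lengths, and the handling of genes with zero PRO-seq signal are assumptions for illustration.

```python
def tpm(counts, lengths):
    """Transcripts per million from raw counts and feature lengths (bp)."""
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}
    total_rate = sum(rates.values())
    return {gene: rate / total_rate * 1_000_000 for gene, rate in rates.items()}

def rna_stability(rnaseq_tpm, proseq_tpm):
    """Per-gene RNA-seq / PRO-seq TPM ratio; genes without PRO-seq signal are skipped."""
    return {gene: rnaseq_tpm[gene] / proseq_tpm[gene]
            for gene in rnaseq_tpm
            if proseq_tpm.get(gene, 0) > 0}

# Toy counts and lengths (made up) for two genes.
rna_tpm = tpm({"geneA": 100, "geneB": 300}, {"geneA": 1000, "geneB": 1000})
pro_tpm = tpm({"geneA": 200, "geneB": 200}, {"geneA": 2000, "geneB": 2000})
stability = rna_stability(rna_tpm, pro_tpm)
```

In this toy case the two genes are transcribed equally, so the threefold difference in RNA-seq signal appears as a threefold difference in relative stability.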

To measure differences in RNA stability between the CAG and PGK cell lines, the raw count tables for RNA-seq and PRO-seq were used as input and DESeq2 was run with design = ∼assay*cell_line, where assay refers to either RNA-seq or PRO-seq and cell_line refers to either PGK or CAG. The function lfcShrink() was used to shrink the RNA stability fold-changes.

The bias-adjusted, normalized half-lives (referred to as “composite half-lives” in the text and figures) were taken from Supplementary Table S2 of Agarwal et al. (half-life, PC1) [70].

miRNA targeting analysis

Human miRNA target predictions were obtained from TargetscanHuman [71] (Release 8.0, September 2021). As above, we collapsed miRNAs into families and performed subsequent targeting analysis on a family level. Only conserved (defined by Targetscan) and expressed (CPM > 10) miRNA families were selected for targeting analysis (106 families). Targeting signatures were tested by comparing transcriptome-wide RNA stability estimates (RNA-seq/PRO-seq) for miRNA family targets (context++ score < −0.2) to non-targets (transcripts whose 3′UTRs did not contain an extended seed match). Targetscan context scores represent the predicted log2 fold-repression for a given miRNA [48], and therefore, for analyses involving targets of multiple miRNA families, we summed their context scores. This assumes miRNAs are acting independently, a reasonable assumption given that only closely spaced (∼10–40 nt apart) miRNA target sites are likely to act cooperatively [72]. Therefore, we reasoned that the sum of context scores for different miRNAs acting on the same transcripts is likely a good approximation for targeting strength. As a control group for the combined set of five miRNAs, five conserved, unexpressed (CPM < 10) miRNAs (miR-138–5p, miR-145–5p, miR-224–5p, miR-208–3p, miR-412–5p) were randomly selected and the 3′UTR targets of these five miRNAs were used as a control.
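The target definition and the summation of context scores across miRNA families can be sketched in Python as below. The function names, the per-transcript dictionary layout, and the family names in the example are hypothetical; only the −0.2 cutoff and the additive treatment of scores come from the text.

```python
def is_target(scores_by_family, family, threshold=-0.2):
    """A transcript is a target of `family` if its context++ score is below the cutoff."""
    return scores_by_family.get(family, 0.0) < threshold

def summed_context_score(scores_by_family, families):
    """Sum context++ scores over several families, assuming independent action."""
    return sum(scores_by_family.get(family, 0.0) for family in families)

def predicted_fold_change(summed_score):
    """Context scores are log2 fold-repression, so the fold change is 2 ** score."""
    return 2 ** summed_score

# Hypothetical context++ scores for one transcript.
transcript_scores = {"fam-A": -0.30, "fam-B": -0.25}
combined = summed_context_score(transcript_scores, ["fam-A", "fam-B", "fam-C"])
fold = predicted_fold_change(combined)
```

Because the scores are on a log2 scale, adding them corresponds to multiplying the predicted fold changes of the individual families.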

To calculate potential targeting signatures for the 46 differentially expressed conserved miRNA families, the log2 fold-change in RNA stability (using RNA-seq/PRO-seq) was compared for targets of a given miRNA family (transcripts with a predicted context++ score < −0.2) with nontargets (transcripts whose 3′UTRs did not contain an extended seed match). Only three miRNAs were differentially active (Bonferroni corrected P value < 0.05, two-sided Wilcoxon rank-sum test). Targeting signatures for these three miRNAs were further assessed by comparing miRNA targets of varying strength, as determined by Targetscan context++ score.

Spectrum motif analysis (SPMA)

SPMA was performed using the Transite program [73]. Transcripts were ranked as indicated in figure legends. For RNA-seq/PRO-seq RNA stability rankings, either the raw values or the shrunken log2 fold-changes were used, as indicated in the figure legends. Matrix-based SPMA for transcriptomic data was performed on the Transite server (https://transite.mit.edu) using the Transite motif database (40 bins, maximum model degree of 2, sequence region 3′UTR, maximum of five binding sites). Matrix-based SPMA for 3′UTR reporters was performed with the Transite package in R using the function run_matrix_spma() with max_model_degree = 2 and otherwise default parameters. Sequence logos were generated with the ggseqlogo package in R [74].

Sequence conservation

Per-nucleotide phastCons [4] scores (100-way vertebrate) were downloaded from the UCSC (genome build hg19) genome browser. To consider a nucleotide conserved, we used a phastCons score > 0.95, meaning the nucleotide has a 95% chance of belonging to a conserved regulatory element. To identify non-conserved nucleotides, we used a phastCons score < 0.01, meaning the nucleotide has <1% chance of belonging to a conserved regulatory element.
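The two thresholds can be applied per nucleotide as in the following Python sketch. The function names and the "intermediate" label for scores between the cutoffs are assumptions added for illustration.

```python
def classify_nucleotides(phastcons_scores, conserved_min=0.95, nonconserved_max=0.01):
    """Label each nucleotide by its phastCons score.

    Scores above conserved_min are 'conserved', scores below
    nonconserved_max are 'non-conserved', and the rest 'intermediate'.
    """
    labels = []
    for score in phastcons_scores:
        if score > conserved_min:
            labels.append("conserved")
        elif score < nonconserved_max:
            labels.append("non-conserved")
        else:
            labels.append("intermediate")
    return labels

def fraction_conserved(phastcons_scores, conserved_min=0.95):
    """Fraction of nucleotides in a 3'UTR classified as conserved."""
    if not phastcons_scores:
        return 0.0
    n_conserved = sum(1 for s in phastcons_scores if s > conserved_min)
    return n_conserved / len(phastcons_scores)

# Made-up scores for a five-nucleotide stretch.
labels = classify_nucleotides([0.99, 0.5, 0.001, 0.95, 0.01])
```

Both comparisons are strict, so scores exactly at 0.95 or 0.01 fall into the intermediate class.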

ELAVL1 eCLIP-seq

Publicly available enhanced CLIP-seq data for ELAVL1, performed in K562 cells, was downloaded from ENCODE (experiment accession ENCSR090LNQ; doi:10.17989/ENCSR090LNQ) [27]. 3′UTR reporters were considered an ELAVL1 target if they contained at least one reproducible eCLIP peak (as defined in the narrowPeak bed file, accession ENCFF566LNK).

Visualization

Most plots were generated with the ggplot2 package in R [75]. Heatmaps were generated with the pheatmap package in R. Normalized barcode counts were converted to z-scores for row normalization in heatmaps. RNA half-life plots for individual 3′UTR reporters were generated with GraphPad Prism.
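Row normalization to z-scores, as used for the heatmaps, can be sketched in Python as below. The function name is illustrative; the use of the sample standard deviation and the handling of constant rows are assumptions.

```python
from statistics import mean, stdev

def row_zscores(row):
    """Convert one heatmap row of normalized counts to z-scores.

    Uses the sample standard deviation. A constant row has no spread,
    so it is returned as all zeros (an arbitrary choice for this sketch).
    """
    mu = mean(row)
    sd = stdev(row)
    if sd == 0:
        return [0.0 for _ in row]
    return [(value - mu) / sd for value in row]

# Toy CPM values for one barcode across three bins.
z = row_zscores([1.0, 2.0, 3.0])
```

After this transformation each row has mean 0, so the heatmap shows where along the bins a barcode is enriched rather than how abundant it is overall.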

Correlation between two variables

For all figures, correlation coefficients and associated P values were calculated with the stat_cor() function in the ggpubr package in R. Pearson’s correlation coefficient (denoted by r) is reported for linear relationships and Spearman’s rank correlation coefficient (denoted by ρ) is reported otherwise.

Comparing two distributions

One-sided or two-sided Wilcoxon rank-sum tests were used to compare two distributions, as indicated in figure legends, using the wilcox.test() function in the stats package in R. Exact P values are reported unless otherwise noted.

Results

A massively parallel reporter assay to assess 3′UTR-mediated gene regulation

We developed an MPRA to understand the regulatory impact of full-length 3′UTRs on quantitative gene output. Our approach relies upon a GFP reporter fused to a 3′UTR of interest, which is integrated at a defined locus in the genome and contains a barcode sequence that identifies each 3′UTR within a reporter library (Supplementary Fig. S1C). This approach leverages the Bxb1 recombinase system, which allows efficient integration of barcoded 3′UTR reporters at a hemizygous and transcriptionally active locus, referred to as a landing pad [52, 56, 76]. In this system, we integrate a library of barcoded 3′UTR reporters into a HEK293T cell line containing a landing pad locus. Because each recombinant cell receives a single copy of a 3′UTR reporter, total protein output for each 3′UTR can be measured via cell sorting followed by barcode deconvolution (Fig. 1A). Pooled cells expressing 3′UTR reporters are FACS sorted into bins based on their GFP fluorescence; barcodes identifying each 3′UTR are amplified from the genomic DNA of the sorted cells and sequenced. This strategy allows us to simultaneously measure each reporter's relative proportion per bin and hence protein output. This approach has numerous advantages, including the constant promoter, 5′UTR and coding region, which allow us to attribute regulation solely to the 3′UTR. Moreover, the barcoded Bxb1-based system allows thousands of 3′UTR reporters to be interrogated in parallel, with expression of each reporter at physiological levels and expressed from a consistent locus; this is likely a critical feature of the system, as relative concentrations of trans factors and cognate cis regulatory elements determine the degree of regulation [46, 59, 77–79].

Figure 1.

MPRA measures impact of full-length 3′UTRs on gene expression. (A) Experimental design to measure 3′UTR-mediated protein output. Pooled 3′UTR reporters are FACS sorted (GFP) into five bins. Genomic DNA is extracted and 3′UTR barcodes amplified and sequenced. (B) Heatmap showing barcode enrichment (row-normalized counts per million, CPM) in each of five sorted GFP bins (0–10%, 20–30%, 40–60%, 70–80%, and 90–100%) for two independently sorted replicates. Each row is a barcoded 3′UTR reporter, and rows are sorted in ascending order by protein score of replicate 1. (C) Scatterplot demonstrating within-assay barcode replication. Each point is a 3′UTR; axes show protein score derived independently from each of the two barcodes. Discordant barcodes (standard deviation > 0.35) are colored red, with text identifying putative regulatory elements within one of the two discordant barcodes (AUUUA is a core ARE sequence; AAUAAA is the canonical polyadenylation signal). (D) Flow cytometry histograms depicting GFP fluorescence for 13 individually integrated 3′UTR reporters. (E) X-axis: GFP fluorescence (geometric mean), measured via flow cytometry for individually integrated 3′UTR reporters shown in (D), normalized to the 3′UTR with the lowest fluorescence. Y-axis: protein score in pilot assay for each 3′UTR. Each color represents a different 3′UTR; points of the same color represent independent barcodes corresponding to a single 3′UTR. (F) Protein score comparisons for 3′UTRs driven by the PGK or CAG promoter. Each 3′UTR is represented by the median protein score of internal replicates (barcodes). The colored points represent 3′UTRs selected for validation, with protein score percentiles indicated. (G) Median GFP fluorescence (flow cytometry) for the 26 3′UTR reporters shown in (F), integrated into the CAG landing pad, compared with the high-throughput protein scores. Different colors represent different 3′UTRs. 
(H) Bar chart of the flow cytometry data for the reporters in (F) and (G). Median GFP fluorescence values are normalized to the 3′UTR with the lowest GFP fluorescence.

To optimize and evaluate the reproducibility of our system, we generated a pilot library of 173 full-length human 3′UTRs, each represented by at least two independently generated constructs, linked to distinct 12 nt barcode sequences, serving as internal replicates. The barcode is embedded in a short constant region that facilitates amplification with Illumina-compatible primers (Fig. 1A). To decrease the likelihood that barcodes themselves generate functional regulatory elements, we positioned them within the first 15 nt of the 3′UTR, a region that is often refractory to regulation [48]. We additionally verified that the barcode placement does not impact regulation (Supplementary Fig. S1F). Arrayed 3′UTRs were cloned downstream of GFP using a complex pool of barcoded GFP vectors (Supplementary Fig. S1A). We linked each barcode to each 3′UTR using tagmentation-based plasmid sequencing (Supplementary Fig. S1D). We selected full-length 3′UTRs from a previously-generated collection, each of which was cloned in full and extended 150 nt 3′ of the annotated transcript to increase the likelihood that 3′ ends are processed using endogenous cleavage and polyadenylation (CPA) sites [54]. Additionally, we included a downstream SV40 polyadenylation signal, to ensure efficient processing in the case of inefficient endogenous CPA usage. To empirically determine the 3′ end for a subset (ten) of 3′UTR reporters, we performed Rapid Amplification of cDNA Ends (3′RACE) [80]. As anticipated, we observed variation in CPA site usage, with some reporters using the SV40 site (5/10), some using the canonical predicted CPA site (2/10), and some using an alternate internal CPA site (3/10), although for alternative sites we cannot rule out oligo(dT) priming at internal A-rich sequences (Supplementary Table S1).

We used a previously characterized HEK293T landing pad cell line [52] to integrate our pilot reporter library (Supplementary Fig. S1C). In this system, BFP is driven by a Tet-On promoter at the landing pad and contains an attP recombination site to facilitate cargo integration. Upon transfection of Bxb1 recombinase and an attB-containing 3′UTR reporter plasmid, GFP replaces BFP at the locus; GFP-positive and BFP-negative recombinants are then selected by FACS. After culture and expansion of the recombinant pool of cells, we FACS-sorted cells into five bins across the range of GFP expression, isolated genomic DNA from each bin, amplified the barcode sequences with PCR, and used Illumina sequencing to quantify the barcode identifiers in each bin. We observed highly concordant enrichment profiles for 3′UTRs tagged by different barcodes, indicating the robustness of this system (Fig. 1B).

In order to quantify relative protein output for each 3′UTR assayed, we used barcode read counts (CPM) from each of the five sorted GFP bins to calculate a “protein score” for each barcoded 3′UTR. This score is a weighted average of the read counts across the sorted GFP bins, arbitrarily scaled from −2 to 2 and represents a semi-quantitative readout of cognate protein abundance. We note that a score of 0 does not necessarily mean the 3′UTR is devoid of regulatory elements, but rather represents the protein output for the median 3′UTR. Scores obtained from independently sorted replicates were near-identical (Pearson’s r = 1.0, Supplementary Fig. S2A). Moreover, scores from different barcodes representing the same 3′UTR were also highly correlated (r = 0.97, Fig. 1C). We observed four cases in which different barcodes corresponding to the same 3′UTRs were discordant; we hypothesized that these cases might be due to barcodes that serendipitously generated functional regulatory elements. Indeed, one barcode included a polyadenylation signal sequence and another contained the canonical AU-rich element motif (AAUAAA [81] and AUUUA [82], respectively, Fig. 1C). Notwithstanding these rare (<3%) exceptions, the regulatory output for most 3′UTRs represented by two (or more) barcodes is reliable and concordant.
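Flagging 3′UTRs whose barcodes disagree can be sketched in Python as below, using the standard deviation cutoff of 0.35 given in the Fig. 1C legend. The function name, the input layout, the example scores, and the use of the sample (rather than population) standard deviation are assumptions.

```python
from statistics import stdev

def discordant_utrs(scores_by_utr, sd_cutoff=0.35):
    """Return 3'UTRs whose barcode-level protein scores disagree.

    scores_by_utr maps 3'UTR name -> list of protein scores, one per barcode.
    3'UTRs with fewer than two barcodes cannot be assessed and are skipped.
    """
    flagged = []
    for utr, scores in scores_by_utr.items():
        if len(scores) >= 2 and stdev(scores) > sd_cutoff:
            flagged.append(utr)
    return sorted(flagged)

# Made-up protein scores for three 3'UTRs.
flagged = discordant_utrs({
    "UTR_1": [0.50, 0.60],   # concordant barcodes
    "UTR_2": [-1.00, 0.20],  # discordant barcodes
    "UTR_3": [1.10],         # single barcode, not assessable
})
```

Flagged 3′UTRs are candidates for inspecting the barcode sequences themselves, for example for motifs such as AAUAAA or AUUUA.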

Finally, we sought to evaluate the accuracy of the MPRA protein score by generating validation reporters for a subset of 3′UTRs. From the pilot library, we selected 13 3′UTRs that covered a range of protein scores, integrated them individually using the Bxb1 recombination system, and measured GFP levels with flow cytometry (Fig. 1D). We observed high concordance between MPRA protein scores and individually determined GFP fluorescence measurements (Spearman’s ρ = 0.97, Fig. 1E), indicating the accuracy of our FACS-based protein assay.

Determining the regulatory impact of >1400 3′UTRs

We next expanded our repertoire of reporters to include >1400 sequences, covering full-length 3′UTRs from ∼7% of human coding genes. Based on our pilot MPRA, we adjusted our approach in several ways: first, we tagged each 3′UTR in the library with multiple (≥3 for most 3′UTRs) unique barcodes to serve as internal replicates, in order to use the median measurement for each 3′UTR. This approach allowed us to eliminate measurements compromised by the rare cases where a single barcode gained a regulatory sequence and therefore generated erroneous readouts. Second, because our original reporter lacked an intron and many post-transcriptional regulatory mechanisms are dependent on the exon–junction complex (EJC) deposited by the spliceosome [83], we generated two new landing pad cell lines, each with a different promoter and an intron in their 5′UTR (Supplementary Fig. S1B) [52]. The first promoter is from the human phosphoglycerate kinase gene (PGK), a commonly used promoter that drives moderate expression of transgenes [77], to which we added an intron from the 5′UTR of EF1α. The second landing pad utilizes the CAG promoter, a commonly used, strong promoter consisting of the cytomegalovirus (CMV) enhancer, the promoter of chicken beta-actin, and the splice acceptor of the rabbit beta-globin gene [84]. Third, part of our motivation for changing the promoter was to achieve expression at physiological levels, as higher expression of reporters may saturate the decay machinery [46, 59]. The Tet-On promoter, used in our pilot screen, expresses genes at extremely high levels; in contrast, the PGK and the CAG promoters direct more moderate expression [77]. Finally, because promoter identity can influence post-transcriptional events, we sought to compare the regulatory impact of 3′UTRs on transcripts generated by either the PGK or CAG promoter [28, 30, 85].

To generate the large 3′UTR reporter library, we successfully cloned >4200 barcoded GFP vectors covering 1401 3′UTRs (Supplementary Fig. S1B and S1D). The final library contains 3′UTRs with lengths up to ∼2400 nt (median ∼670 nt) and covers >1 million nt of sequence space (Supplementary Fig. S2B). Approximately 80% of 3′UTRs were represented by at least three barcodes, and 99% of 3′UTRs were represented by at least two, ensuring robust internal replication (Supplementary Fig. S1E). Using our FACS-based MPRA (Fig. 1A), we determined protein output for >4200 barcoded reporters, analyzing the reporter library expressed separately under the PGK and CAG promoters. We observed high concordance in protein scores between barcodes for the same 3′UTR (r = 0.89 and 0.88 for PGK and CAG, respectively). We then selected a subset of 3′UTRs from the library, which spanned the full range of protein scores (Fig. 1F), integrated these individually and used flow cytometry to quantify GFP reporter protein levels. Importantly, flow cytometry measurements of individually integrated reporters and protein score measurements from our MPRA were highly correlated (ρ = 0.98, Fig. 1G). We observed that the middle 80% (i.e. excluding the top and bottom 10%) of our 3′UTRs control GFP expression over a ∼4-fold range (Fig. 1H). However, we observed larger, ∼16-fold differences in regulation between 3′UTRs found at the tails of the protein score range (1st and 99th percentiles, Fig. 1H). Overall, these data demonstrate that our MPRA is robust and accurate and captures the regulatory impact of 3′UTRs over a wide range. Importantly, our data indicate that most 3′UTR-mediated regulation occurs over a modest range of ∼4-fold, but a number of 3′UTRs mediate repression or activation sufficient to control protein levels by an order of magnitude.

Measuring the influence of 3′UTRs on RNA abundance and translational efficiency

After observing large differences in total protein output driven by 3′UTRs, we sought to investigate the relationship between RNA stability and translational efficiency, and how these processes together control protein output. To do this, we first measured RNA abundance, as a proxy for RNA stability, for each 3′UTR in the library (Fig. 2A). We encoded the 12-nt barcode sequence within a short constant region between the GFP coding region and 3′UTR of interest; therefore, the barcode is expressed within the mRNA. To measure RNA abundance in parallel for >4200 barcoded 3′UTR reporters, we extracted RNA from the pool of cells, performed reverse transcription, PCR amplified the constant region surrounding the barcode, and proceeded with Illumina sequencing (Fig. 2A). Because cells expressing each 3′UTR may be present at different levels within the pool of reporters, we isolated genomic DNA in parallel, obtaining barcode sequencing counts from the DNA in order to normalize counts derived from reporter RNA. Thus, the RNA/DNA barcode count ratios are a direct measure of RNA abundance. We observed concordance in RNA abundance between internally replicated 3′UTRs (Supplementary Fig. S2C and D), and correlation between the CAG and PGK cell lines (Supplementary Fig. S2E). As expected, we found that RNA abundance correlates well with total protein output (Fig. 2B and Supplementary Fig. S2F).
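As a rough illustration of the RNA/DNA normalization described above, the sketch below converts barcode counts from each library to counts per million (CPM) and reports a log2 RNA/DNA ratio per barcode. The CPM scaling, the pseudocount, and the log2 transform are assumptions made for this example and may differ from the processing actually used.

```python
import math


def cpm(counts):
    """Scale raw barcode counts to counts per million."""
    total = sum(counts.values())
    return {barcode: 1e6 * n / total for barcode, n in counts.items()}


def rna_abundance(rna_counts, dna_counts, pseudocount=0.5):
    """log2(RNA CPM / DNA CPM) per barcode.

    The pseudocount (an assumed value) avoids division by zero for
    barcodes that are absent from the RNA library.
    """
    rna_cpm = cpm(rna_counts)
    dna_cpm = cpm(dna_counts)
    scores = {}
    for barcode, dna in dna_cpm.items():
        rna = rna_cpm.get(barcode, 0.0)
        scores[barcode] = math.log2((rna + pseudocount) / (dna + pseudocount))
    return scores


# Toy counts: bc1 is over-represented in RNA relative to DNA, bc2 is depleted.
rna = {"bc1": 200, "bc2": 50, "bc3": 750}
dna = {"bc1": 100, "bc2": 100, "bc3": 800}
abundance = rna_abundance(rna, dna)
```

Dividing by the DNA-derived counts corrects for how many cells in the pool carry each reporter, so that the ratio reflects transcript level per integrated copy.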

Figure 2.

3′UTR-mediated differences in protein output can be attributed to variation in RNA abundance and translational efficiency. (A) Schematic for measuring RNA abundance and translational efficiency of >1400 3′UTR reporters. (B) Correlation between RNA abundance and protein score for the CAG promoter dataset. The median value of the barcode replicates was taken for each 3′UTR. A third-order polynomial regression line is shown. Shaded region indicates 95% confidence interval for regression. (C) Schematic for measuring RNA stability following promoter shutoff. Prior to a transcriptional pulse, cells are passaged in doxycycline to repress transcription; doxycycline is removed for a 4-h transcriptional pulse, and, upon reintroduction of doxycycline to shut off the promoter, RNA levels are monitored at multiple timepoints. An exponential decay curve is fit to the measurements and the RNA half-life (t1/2) is calculated from the decay constant k. (D) Relative GFP RNA levels for four 3′UTR reporters, monitored with qRT-PCR, following a 4-h transcriptional pulse. Samples were first normalized to a housekeeping gene and then each 3′UTR reporter was normalized to timepoint zero. The RNA half-lives (t1/2) are shown for each 3′UTR, in hours. A bar plot is shown (inset) depicting median RNA abundance measured in MPRA. (E) Correlation between RNA abundance (as measured by RNA/DNA ratios) and RNA half-life (calculated from MPRA of 173 3′UTRs following promoter shutoff). (F) Correlation between protein score and RNA half-life, measured in MPRA for 173 3′UTRs. (G) Heatmap of polysome enrichment (row normalized CPM) in each fraction for each of three internal replicates (barcodes for the same 3′UTR). Rows are 3′UTR matched and sorted in ascending order by protein score of barcode 1. (H) Correlation between translation score and protein score, otherwise as described in (B).

Steady-state RNA levels are a function of synthesis (transcription) and decay. Because every 3′UTR is controlled by the same promoter, we assumed that their synthesis rates are identical, and therefore, we used steady-state RNA abundance (i.e. RNA/DNA ratios) as a proxy for RNA stability. We tested this assumption directly using promoter shutoff experiments [57]. In this system, cells expressing a 3′UTR reporter (or pool of reporters) under control of a Tet-Off promoter are passaged in doxycycline to repress transcription. Next, cells are subjected to a 4-h transcriptional pulse by removal of doxycycline; the promoter is then shut off by doxycycline addition, and GFP mRNA levels are assayed at intervals ranging from 0 to 20 h (Fig. 2C). We first established this system using four 3′UTRs selected from the pilot 3′UTR library. RNA was collected at each timepoint following promoter shutoff, and RNA levels were measured using qRT-PCR. As expected, we observed a steady decrease in RNA levels for all four reporters across the timecourse (Fig. 2D). To calculate the half-life, we fit an exponential decay model and observed half-lives consistent with MPRA RNA abundance measurements (Fig. 2D inset). We next applied this assay to a library of 173 3′UTRs. We integrated the pooled 3′UTR reporters into the Tet-Off cell line, FACS-sorted recombinants, and performed promoter shutoff experiments, as above. We generated barcode libraries following shutoff at timepoints 0, 1, 2, 3, 4, 8, and 20 h. Early timepoints were taken more frequently as we observed the most decay during this time for the individually integrated reporters (Fig. 2D). In parallel, we performed qRT-PCR for GFP in order to calculate scaling factors for each of the seven timepoints. CPM values for each barcode at each timepoint were multiplied by this scaling factor, an exponential decay curve was fit to each barcoded 3′UTR, and the half-life was calculated.
We observed high concordance in half-lives between biological replicates, and between barcodes for the same 3′UTRs (Supplementary Fig. S2G and S2H). Importantly, we observed high correlation between half-life and RNA abundance, as measured by RNA/DNA ratios (ρ = 0.77, Fig. 2E) and high correlation between half-life and protein output from our Tet-On MPRA (ρ = 0.93, Fig. 2F). In all, these data suggest that RNA abundance (as indicated by the RNA/DNA ratio in MPRA) is predominantly determined by RNA stability.
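The half-life calculation can be sketched as follows. For first-order decay, RNA(t) = RNA(0)·exp(−kt) and t1/2 = ln(2)/k. The example below estimates k by ordinary least squares on log-transformed levels, which is a simplification chosen for illustration; the fitting procedure actually used (for example, nonlinear least squares) may differ, and the input values are synthetic.

```python
import math


def fit_half_life(times_h, levels):
    """Estimate RNA half-life (hours) from a decay timecourse.

    Fits ln(level) = ln(level_0) - k * t by ordinary least squares and
    returns ln(2) / k. Levels must be positive (e.g. scaled CPM values).
    """
    logs = [math.log(v) for v in levels]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    decay_constant = -sxy / sxx  # k, in units of 1/hour
    return math.log(2) / decay_constant


# Synthetic noise-free timecourse with a true half-life of 3 h,
# sampled at the timepoints used in the pooled shutoff experiment.
timepoints = [0, 1, 2, 3, 4, 8, 20]
levels = [0.5 ** (t / 3.0) for t in timepoints]
half_life = fit_half_life(timepoints, levels)
```

A log-linear fit weights late, low-abundance timepoints more heavily than a direct exponential fit would, so noisy measurements at 20 h could influence the estimate more under this simplification.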

We next sought to investigate the 3′UTR-mediated influence on translational efficiency. To measure translation, we performed polysome profiling, an established method for measuring translational regulation [86]. We treated cells expressing the reporter library with cycloheximide to freeze ribosomes on transcripts and ran lysates over a sucrose gradient to separate mRNAs by density, which reflects the number of ribosomes bound (Fig. 2A). We collected fractions for each peak in the polysome gradient and sequenced barcoded reporters from each fraction. Overall polysome profiles were similar between gradients (Supplementary Fig. S3A and B), and barcode enrichments within polysome fractions were consistent for replicate barcodes tagging the same 3′UTR (Fig. 2G). As expected, we observed a shift towards higher polysome association for 3′UTRs with higher protein scores (Fig. 2G). From this data we calculated a weighted “translation score.” Calculated similarly to the protein score, each polysome fraction is assigned a scaling factor based on number of ribosomes within that peak (i.e. monosome fraction is given a weight of one, disome fraction is weighted as two, etc.), and barcode counts for each fraction are used to calculate a weighted average across the polysome gradient, effectively normalizing for total RNA expression and resulting in a translational efficiency metric. We observed high concordance in translation scores between barcodes of the same 3′UTR in both our pilot MPRA (Supplementary Fig. S1F) and large-scale MPRA (Supplementary Fig. S3C and D) indicating the 12 nt barcodes do not influence translation to a significant degree. Translation scores for reporters driven by the PGK and CAG promoters were also concordant (r = 0.92, Supplementary Fig. S3E). As expected, we found translational efficiency to be highly correlated with protein output (ρ = 0.79 and 0.83, Fig. 2H and Supplementary Fig. S3F). 
In all, this data demonstrates that we are effectively able to measure RNA abundance, as a proxy for RNA stability, and translational efficiency, enabling us to investigate the contributions from multiple mechanisms of 3′UTR-mediated quantitative regulation.
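The translation score described above is a count-weighted average of ribosome number across the gradient. A minimal sketch is shown below; the function name and the assumption that counts are already normalized per fraction are ours, and the number of fractions is arbitrary.

```python
def translation_score(fraction_counts, ribosome_weights):
    """Weighted average ribosome number for one barcode.

    fraction_counts: barcode counts per polysome fraction, ordered from
        the monosome fraction upward (assumed to be normalized already).
    ribosome_weights: ribosomes assigned to each fraction (1, 2, 3, ...).
    """
    total = sum(fraction_counts)
    if total == 0:
        raise ValueError("barcode has no counts in any fraction")
    weighted = sum(w * c for w, c in zip(ribosome_weights, fraction_counts))
    return weighted / total


weights = [1, 2, 3, 4]
# A reporter enriched in heavy fractions scores higher than one
# enriched in light fractions, regardless of total RNA level.
heavy = translation_score([10, 20, 30, 40], weights)
light = translation_score([40, 30, 20, 10], weights)
```

Because the weighted sum is divided by the total count for the barcode, doubling every fraction count leaves the score unchanged, which is how the metric normalizes for total RNA expression.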

Differentiating the impact of RNA abundance and translational efficiency on 3′UTR-mediated regulation

Interestingly, we found RNA abundance and translational efficiency to be correlated (ρ = 0.62, Fig. 3A and ρ = 0.63, Supplementary Fig. S3G). Although this relationship has been observed previously, our approach removes the impact of variable 5′UTRs and coding regions, which also contribute to translational regulation. Thus, the extent to which 3′UTRs influence both translation and RNA abundance, and how closely these processes are associated, was previously unknown.

Figure 3.

3′UTR-mediated differences in protein output can be attributed to variation in RNA abundance and translational efficiency. (A) Correlation between RNA abundance and translation score. (B) Stacked bar chart illustrating contributions from translational efficiency (T) and RNA abundance (R) towards protein output (P), as determined by coefficient of partial determination (partial r2) analysis. A model was built and partial r2 values calculated to describe the relationship between RNA abundance, translational efficiency, and protein output. (C) Scatterplot depicting relationship between protein output (x-axis), translational efficiency (y-axis), and RNA abundance (color). 3′UTR reporters with low RNA abundance and high translation rates are labeled. (D and E) 3′UTR reporters were binned into quartiles by median RNA abundance (D) or protein score (E) and RNA stability (estimated with RNA-seq/PRO-seq) for the cognate endogenously expressed transcript was plotted. A one-sided Wilcoxon rank-sum test was used to calculate P values. (F and G) 3′UTR reporters were binned into quartiles by median RNA abundance (F) or protein score (G) and composite half-lives (obtained from Agarwal et al. [70]) for the cognate endogenously expressed transcript was plotted. A one-sided Wilcoxon rank-sum test was used to calculate P values.

The correlation between mRNA abundance and translational efficiency confounds interpretation of their relationship with protein output. To deconvolute the relationship between these regulatory mechanisms, we developed a multiple linear regression model of protein output as a function of RNA abundance and translational efficiency. To select the model that best describes these relationships, we used the Bayesian information criterion (BIC) [87]. The BIC evaluates the fit of the regression model by rewarding a higher maximized likelihood while penalizing more complex models to minimize overfitting. Using this strategy, we selected a third-order polynomial to describe the relationships between protein output and RNA abundance (Fig. 2B), and between protein output and translational efficiency (Fig. 2H). We included an interaction term to incorporate the interaction between RNA abundance and translation into the model, as the BIC indicated this additional term significantly improved the model, although the effect size was small (Fig. 3B). For the CAG dataset, we found that RNA abundance and translational efficiency together explain 76.4% of the variation in protein expression (Fig. 3B). We obtained a similar value for the PGK dataset (78.0%), indicating the model was not overfitting the data. We note that this value is likely underestimated due to variation in measurements and imperfections in the model, although some of the unexplained variation could be due to other mechanisms of gene regulation, such as 3′UTR-mediated nuclear retention [88, 89] or sequestration to cytoplasmic granules [90–92]. Taken together, these data demonstrate that control of transcript abundance and translation are the dominant modes by which 3′UTRs control protein levels of their cognate transcript.
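To illustrate how an information criterion can guide the choice of polynomial order, the sketch below computes a BIC for least-squares polynomial fits of increasing degree on synthetic data. It uses the common Gaussian-residual form, BIC = n·ln(RSS/n) + p·ln(n), with a single predictor; the data, noise level, and range of degrees are assumptions for illustration and do not reproduce the model fitted to the MPRA measurements.

```python
import numpy as np


def bic_for_degree(x, y, degree):
    """BIC of a least-squares polynomial fit, assuming Gaussian residuals."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    n = len(y)
    rss = float(np.sum(residuals ** 2))
    n_params = degree + 1  # polynomial coefficients, including the intercept
    return n * np.log(rss / n) + n_params * np.log(n)


# Synthetic data generated from a cubic relationship plus noise.
rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 200)
y = x ** 3 - x + rng.normal(0.0, 0.1, size=x.size)

bic_by_degree = {d: bic_for_degree(x, y, d) for d in range(1, 6)}
best_degree = min(bic_by_degree, key=bic_by_degree.get)  # lowest BIC wins
```

Lower BIC indicates a better trade-off between fit and complexity: each extra coefficient must reduce the residual sum of squares enough to offset the ln(n) penalty.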

We next sought to describe the relationship between translational efficiency and protein output after removing the influence of RNA abundance, and between RNA abundance and protein output after removing the influence of translation. Using partial correlation analysis, we calculated the coefficients of partial determination for protein output as a function of RNA abundance and translational efficiency. While 10.0% of the variance in protein levels can be uniquely explained by RNA abundance after removing the influence of translation, 21.1% can be uniquely explained by translation after removing the influence of RNA abundance. However, 45% of total variation cannot be uniquely attributed to either RNA abundance or translational efficiency, but is attributed to the combined impacts of control of abundance and translation. Interestingly, the model suggests that translation may have a stronger singular and independent influence on protein levels than does RNA abundance.
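One way to picture this partitioning is through the drop in r2 when a predictor is removed from the full model. The sketch below does this for a purely linear model on synthetic data; it is a simplified stand-in (closer to a squared semi-partial correlation) and is not the third-order model with interaction term described above. All variable names and simulated effect sizes are assumptions. The final line simply reproduces the arithmetic for the reported CAG values.

```python
import numpy as np


def r_squared(predictors, y):
    """r2 of an ordinary least-squares fit of y on the given predictors."""
    design = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return 1.0 - float(residuals @ residuals) / float(np.sum((y - y.mean()) ** 2))


def variance_partition(rna, translation, protein):
    """Split explained variance into unique and shared components."""
    total = r_squared([rna, translation], protein)
    unique_rna = total - r_squared([translation], protein)
    unique_translation = total - r_squared([rna], protein)
    shared = total - unique_rna - unique_translation
    return {"total": total, "unique_rna": unique_rna,
            "unique_translation": unique_translation, "shared": shared}


# Synthetic, correlated predictors (illustrative effect sizes only).
rng = np.random.default_rng(1)
rna = rng.normal(size=200)
translation = 0.6 * rna + 0.8 * rng.normal(size=200)
protein = 0.5 * rna + 0.8 * translation + 0.5 * rng.normal(size=200)
parts = variance_partition(rna, translation, protein)

# Arithmetic for the reported CAG values (percent of variance).
shared_reported = 76.4 - 10.0 - 21.1  # about 45.3
```

When the two predictors are correlated, the shared component can be large even if each unique component is small, which matches the pattern reported here.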

We next examined individual 3′UTRs that show discrepancy between RNA abundance and translational efficiency. Intriguingly, we observed that a subset of 3′UTRs exhibits very low RNA abundance but markedly high translational efficiency, leading to relatively higher protein output (Fig. 3C). These 3′UTRs represent interesting candidates to investigate mechanisms that may lead to enhanced translation.

In all, this data illustrates the extent of interconnection between post-transcriptional processes, specifically the close connection between control of RNA abundance and translation. This link may be driven by 3′UTR-binding proteins that influence both mRNA decay and translational efficiency [93], and/or may be due to decay-coupled translational quality control mechanisms [94]. Surprisingly, it appears a large portion of a 3′UTR’s total impact is due to influence on translation, and we observed examples where 3′UTR reporters with exceptionally low RNA abundance may exhibit enhanced translation.

3′UTR MPRA measurements reflect post-transcriptional regulation of endogenous genes

We assumed that our GFP reporter measurements reflect post-transcriptional regulation present in endogenous transcripts, i.e. when the 3′UTR is expressed in its native context with its cognate 5′UTR and coding region. However, in one example, a 3′UTR reporter assay was shown not to recapitulate regulation of the endogenous gene because of influence from the endogenous sequence context, although presumably such concerns are mitigated by our analysis of full-length 3′UTRs [94–96]. Nevertheless, we sought to test whether 3′UTR-mediated control of RNA abundance measured in our MPRA reflected RNA stability of the native transcript. To this end, we performed transcriptome-wide measurements of steady-state RNA levels (with RNA-seq), and nascent transcript synthesis (with Precision Run-On Sequencing, PRO-seq) [97]. PRO-seq maps the position of active RNA polymerase along the transcript and is therefore a measure of transcriptional activity [98]. Because RNA steady-state levels are a function of RNA synthesis and decay rates, PRO-seq and RNA-seq together can be used to estimate RNA stability for all expressed genes [99–101]. We found ∼900 genes expressed in the PGK and CAG HEK293T cell lines that also had a 3′UTR represented in the GFP reporter library. We compared RNA stability (estimated with RNA-seq/PRO-seq) for these ∼900 endogenous genes to the high-throughput 3′UTR reporter measurement of RNA abundance. We observed a significant albeit moderate relationship between RNA stability for endogenously expressed genes and their corresponding 3′UTR reporter RNA abundance and protein score for both the CAG (Fig. 3D and E, for RNA abundance and protein output, respectively) and PGK (Supplementary Fig. S4A and S4B) datasets.
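The logic of the RNA-seq/PRO-seq stability estimate follows from the steady-state relationship: if synthesis equals decay, then abundance = synthesis rate / decay constant, so the ratio of steady-state RNA to nascent transcription is inversely proportional to the decay constant. A minimal sketch is given below; the gene names, units, and the log2 transform are assumptions for illustration, and the resulting values are relative rather than absolute half-lives.

```python
import math


def relative_stability(rna_tpm, proseq_signal):
    """log2(steady-state RNA / nascent transcription) per gene.

    Genes lacking a positive value in either dataset are skipped,
    because the ratio is undefined for them.
    """
    stability = {}
    for gene, rna in rna_tpm.items():
        synthesis = proseq_signal.get(gene, 0.0)
        if rna > 0 and synthesis > 0:
            stability[gene] = math.log2(rna / synthesis)
    return stability


# Toy values: GENE_A accumulates more RNA per unit of transcription
# than GENE_B, implying a more stable transcript.
rna_tpm = {"GENE_A": 80.0, "GENE_B": 10.0, "GENE_C": 5.0}
proseq_signal = {"GENE_A": 20.0, "GENE_B": 20.0, "GENE_C": 0.0}
stability = relative_stability(rna_tpm, proseq_signal)
```

Genes can then be ranked or binned by this value for comparison with reporter measurements, as in the quartile analyses shown in Fig. 3D and E.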

Because methods of measuring RNA half-lives are biased and agreement between studies is relatively low [70, 100, 102], we also compared our MPRA RNA abundance measurements to bias-adjusted “composite” half-lives, calculated from a meta-analysis of RNA stability studies [70]. In this comparison, we also observed a similar trend linking RNA stability of endogenous genes and RNA abundance measurements of corresponding reporters (Fig. 3F and G, Supplementary Fig. S4C and S4D). These results indicate that our 3′UTR MPRA indeed captures regulation for 3′UTRs expressed from endogenous loci within their full sequence context, notwithstanding the additional inputs on RNA stability of endogenous transcripts deriving from their intrinsic and variable 5′UTR and coding sequences. We note that these data, in combination, could be used to identify transcripts whose post-transcriptional regulation is specified predominantly by cis regulatory elements located within the 3′UTR, rather than other regions such as the 5′UTR and CDS, and vice versa.

Our PRO-seq data also provided an opportunity to assess the frequency of active enhancer elements within our 3′UTR reporter sequences. PRO-seq identifies active enhancers by detecting bidirectional peaks of enhancer RNAs [97] and is one of the most reliable methods of enhancer detection [103]. Notably, only 2.5% of 3′UTR reporter sequences contained enhancers (out of 30 620 we detect) within their cognate endogenous sequences, a percentage comparable to that we detect in all 3′UTRs (2.1%). This data also has technical implications: the 3′UTR sequences we assay are unlikely to act as traditional enhancers that increase reporter expression, as is the case for STARR-seq [104], a conclusion also supported by our RNA stability MPRA (Fig. 2E and F). We conclude that it is relatively rare for active enhancers to be located within 3′UTRs.

Features associated with 3′UTR-mediated regulation differentially influence RNA and translation

We next sought to investigate 3′UTR features influencing different modes of regulation, focusing first on nucleotide content within the 3′UTR. GC content has implications for RNA structure, as sequences with higher GC content are predisposed towards a more condensed secondary structure [105]. 3′UTRs are typically more AU-rich than the 5′UTR and coding region [106], and AU-rich regions in the 3′UTR can serve as cis regulatory elements for AU-binding proteins [107]. We found that GC content negatively correlates with protein output, i.e. high AU-content corresponds to higher protein output (ρ = –0.33, Fig. 4A and ρ = –0.22, Supplementary Fig. S5A) and also observed negative correlations between GC content and translational efficiency (ρ = –0.24, Fig. 4B and ρ = –0.25, Supplementary Fig. S5B). Interestingly, the relationship between GC content and RNA abundance appeared parabolic, whereby 3′UTRs with ∼50% GC content had the lowest RNA abundance, and either high or low GC content were associated with higher levels (Fig. 4C and Supplementary Fig. S5A). This contrasts with the relationship between GC content and translational efficiency, which appeared monotonic (Fig. 4B and Supplementary Fig. S5B). This suggests that AU-rich elements may have opposing roles with respect to influencing RNA abundance and translational efficiency.

Figure 4.

Features associated with 3′UTR-mediated regulation differentially influence RNA abundance and translational efficiency. (A–C) Binned scatterplot comparing GC content with protein score (A), translation score (B), or RNA abundance (C). The Spearman correlation coefficient shown was calculated using all unbinned datapoints. (D–F) Binned scatterplot comparing 3′UTR length with protein score (D), translation score (E), or RNA abundance (F). The Spearman correlation coefficient shown was calculated using all unbinned datapoints. (G) Histogram displaying distribution of 3′UTRs by levels of conservation. Percent conserved sequence (phastCons score > 0.95) per 3′UTR is plotted. (H) Reporters binned by number of total nucleotides irrespective of conservation score. One-sided Wilcoxon rank-sum tests were used to calculate P values. Each bin was compared to the group of 3′UTRs with the fewest nucleotides (in teal). (I) As described in (H), except only nonconserved nucleotides (phastCons score < 0.01) were used to bin 3′UTRs.

Because previous high-throughput studies of 3′UTR elements have been limited to short fragments of similar lengths, our MPRA of full-length 3′UTRs provided a unique opportunity to investigate the influence of 3′UTR length on different modes of regulation. Although some studies have suggested that there is little correlation between 3′UTR length and mRNA expression [14, 108, 109], others have found long 3′UTRs to be repressive through various mechanisms, including the nonsense-mediated decay (NMD) pathway, trans factors, and decreased nuclear export efficiency, although these mechanisms are poorly understood and remain controversial [110–112]. To investigate this question in a system where influence from the promoter, 5′UTR and coding region is removed, we compared 3′UTR length to GFP MPRA measurements of RNA abundance, translational efficiency, and total protein output. We found that 3′UTR length negatively correlated with total protein output (ρ = –0.32, Fig. 4D, and ρ = –0.23, Supplementary Fig. S5D), as well as with translational efficiency (ρ = –0.40, Fig. 4E and ρ = –0.41, Supplementary Fig. S5E) and RNA abundance (ρ = –0.30, Fig. 4F and ρ = –0.24, Supplementary Fig. S5F). While the effect sizes observed are modest, there is a clear relationship between 3′UTR length and repression.

Although our data (Fig. 4D–F) indicate that increasingly long 3′UTRs are more repressive, this relationship could be due to accumulation of repressive regulatory elements, or to element-independent effects, as we have recently observed [113]. To disentangle element-dependent from element-independent effects, we first assumed evolutionarily neutral sequences within the 3′UTR would be largely devoid of cis regulatory elements. For this we used the phastCons score, which represents the probability that a given nucleotide belongs to a conserved element [4]. We found 28.4% of assayed 3′UTR sequence to be conserved (phastCons score > 0.95), highlighting the importance of 3′UTR sequence throughout evolution (Fig. 4G). Many 3′UTRs were highly conserved; for example, 63 of the 3′UTRs in our library had over 90% conserved sequence.

When including all nucleotides regardless of conservation score, we again observed an association between 3′UTR length and repression (Fig. 4H). When limiting our analysis to nonconserved (phastCons score < 0.01) nucleotides, we observed a clear pattern where 3′UTRs with more nonconserved sequence were more repressed (Fig. 4I), and observed similar patterns for RNA abundance and translational efficiency (Supplementary Fig. S5G and S5H). It is important to acknowledge, however, that RNA abundance and translational efficiency are correlated, and therefore, it is difficult to parse out their relative roles in repression elicited by 3′UTR length. Nevertheless, by focusing on sequences least likely to correspond to regulatory elements, longer 3′UTRs likely mediate repression that is at least in part independent of regulatory elements.

Identifying trans factors that control gene expression via 3′UTRs

After investigating general 3′UTR features, we next sought to identify underlying regulatory trans factors. We performed small RNA-seq (sRNA-seq) to identify miRNAs expressed in our landing pad cell lines. To infer active miRNAs impacting the transcriptome, we complemented the sRNA-seq results with miRNA targeting analysis [71, 99, 114]. We limited our analysis to conserved miRNA families, as poorly conserved miRNAs are less likely to be functional [20]. For each of these 106 conserved and detected miRNA families, we used Targetscan context++ scores to predict transcripts subject to miRNA targeting [48, 71]. We compared transcriptome-wide RNA stability measurements (estimated using PRO-seq and RNA-seq) for miRNA targets versus non-targets, in order to identify any miRNAs with signatures of targeting. Of the miRNA families with significant (Bonferroni corrected P value < 0.05, one sided Wilcoxon rank-sum test) targeting signatures, we selected the top five most highly expressed for subsequent analyses (Fig. 5A, Supplementary Fig. S6A–E). MicroRNAs accelerate mRNA decay and, to a lesser extent, repress translation, and therefore we reasoned that the protein score would best reflect miRNA repression [99, 115]. We split 3′UTR reporters into quartiles based on the combined predicted targeting strength for the five chosen miRNA families and compared their protein score values. We observed that the most strongly predicted 3′UTR targets were indeed repressed, but generally there was little difference among 3′UTR reporters with varying levels of predicted miRNA targeting strength (Fig. 5B). Our data suggest that miRNA targeting may play a relatively minor role in quantitative 3′UTR-mediated regulation, at least when assessed in aggregate and using our MPRA data.

Figure 5.

Identifying trans factors that control gene expression via 3′UTRs. (A) Rank order plot of miRNA family expression based on sRNA-seq. Each point represents a miRNA family, whereby counts from miRNAs containing the same seed sequence (nt position 2–8 of miRNA) were summed. Families with a combined CPM > 10 are plotted (n = 464). Conserved (defined by Targetscan) and expressed (CPM > 10) miRNA families were selected for targeting analysis (n = 106). Families with significant targeting signatures (Bonferroni corrected P value < 0.05, one-sided Wilcoxon rank-sum test) are colored red. Families chosen for subsequent analysis are labeled with a representative miRNA (n = 5). The PGK cell line was used for analysis. (B) 3′UTR reporters were binned into quartiles based on predicted strength of miRNA targeting for miRNAs labeled in (A). The sum of all context scores from each of five miRNAs was used. The median protein score per bin was plotted. Quartile 1 (Q1) represents 3′UTR reporters with the strongest predicted repression. The control set consists of miRNA targets for any of five randomly selected unexpressed (CPM < 10) miRNAs. One-sided Wilcoxon rank-sum tests were used to calculate P values for each quartile compared to the control set. (C) Spectrum motif enrichment analysis (SPMA) results, performed by ranking transcriptome-wide RNA stability (RNA-seq/PRO-seq), binning transcripts into 40 bins and performing motif enrichment per bin. Each point (representing a motif) is colored based on the expression (measured in log2 transcripts per million, TPM) of the RBP for the cognate motif. For motifs represented by multiple RBPs, the most highly expressed RBP was chosen. The slope is calculated from the line of best fit for each motif's enrichment across the 40 bins. The y-axis is the P value for the F statistic. (D) SPMA results for PCBP2, PCBP3, and PCBP4, ranked by transcriptome-wide RNA stability (RNA-seq/PRO-seq). (Top) Sequence logo for the given motif.
(Bottom) Heatmap and scatterplot depicting log2 motif enrichment in each of the 40 bins relative to background. (E) SPMA results as described in (C), except transcripts were ranked by 3′UTR reporter protein scores. (F) As described in (D), except transcripts were ranked by 3′UTR reporter protein scores, and ELAVL1/ELAVL3 enrichments are plotted. (G) As described in (F), except enrichment plots are shown for RBM4 (top) and YBX1 (bottom). (H) Cumulative distribution function (CDF) plot comparing 3′UTR reporters containing at least one reproducible ELAVL1 eCLIP peak to reporters without an eCLIP peak. A one-sided Wilcoxon rank-sum test was used to calculate P value.

We next investigated the relationship between the presence of RBP motifs and expression. To this end, we performed Spectrum Motif Enrichment Analysis (SPMA) using Transite [73]. SPMA takes as input a ranked (e.g. by transcript abundance or fold-change) list of transcripts, divides the ranked list into bins, and calculates enrichment, or depletion, of RBP motifs per bin relative to background (Supplementary Fig. S6F). A line of best fit is constructed across the bins using the per-bin enrichment values, and the slope describes the relationship between motif enrichment and the parameter used to rank the bins. An advantage of this approach is that the entire list of ranked genes is used, rather than arbitrarily restricting the data to a foreground set, as is done with more traditional motif enrichment analyses [116].
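The binning-and-slope idea behind SPMA can be sketched in a few lines. The example below is a deliberately simplified stand-in and does not use Transite: a motif is treated as an exact substring rather than a scored position weight matrix, enrichment is the log2 ratio of the hit fraction in a bin to the hit fraction overall, and the pseudocount is an assumed value. Sequences are synthetic.

```python
import math


def ols_slope(xs, ys):
    """Slope of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def spectrum_slope(ranked_sequences, motif, n_bins=40, pseudocount=0.5):
    """Slope of per-bin log2 motif enrichment across a ranked list."""
    n = len(ranked_sequences)
    if n < n_bins:
        raise ValueError("need at least one sequence per bin")
    hits = [motif in seq for seq in ranked_sequences]
    background = (sum(hits) + pseudocount) / (n + 2 * pseudocount)
    enrichments = []
    for b in range(n_bins):
        start, stop = b * n // n_bins, (b + 1) * n // n_bins
        bin_hits = hits[start:stop]
        fraction = (sum(bin_hits) + pseudocount) / (len(bin_hits) + 2 * pseudocount)
        enrichments.append(math.log2(fraction / background))
    return ols_slope(list(range(n_bins)), enrichments)


# Sequences ranked from lowest to highest score; the motif occurs
# only in the top half, so enrichment rises across the bins.
ranked = ["AUAUAUAU"] * 200 + ["AUCCCCAU"] * 200
slope_up = spectrum_slope(ranked, "CCCC")
slope_down = spectrum_slope(ranked[::-1], "CCCC")
```

A positive slope indicates that the motif is enriched toward the high end of the ranking and a negative slope toward the low end, which is the quantity plotted on the x-axis of the SPMA summary panels.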

We performed SPMA first on transcriptome-wide RNA stability (estimated using RNA-seq and PRO-seq) to identify RBP motifs that may be actively influencing RNA stability. Out of 174 motifs covering 123 RBPs in the Transite database, we found 46 RBPs in the PGK cell line whose motif enrichment displayed nonzero slope across the spectrum of 40 bins (F statistic adjusted P value < 0.05, adjusted r2 > 0.4; Fig. 5C and Supplementary Fig. S6G). For example, the motif for PCBP2/PCBP3/PCBP4 was enriched in transcripts with higher RNA stability and depleted in those with lower RNA stability, with a slope of ∼1, associating stronger motif matches with increased stability (Fig. 5D). Indeed, the poly(C) binding protein family of RBPs has been implicated in transcript stabilization and translation activation, consistent with the trends we observe [117].

We then performed SPMA by ranking 3′UTR reporters based on their high-throughput protein scores and calculating motif enrichment across the 40 bins for motifs corresponding to the 123 RBPs in the Transite database. We observed RBP motifs enriched in 3′UTRs with low (slope < 0) or high (slope > 0) protein scores (Fig. 5E and Supplementary Fig. S6H). For example, motifs for ELAVL1 (HuR), ELAVL2 and ELAVL3 were enriched in 3′UTRs with high protein scores (Fig. 5F). The ELAV family of RBPs has been implicated in many aspects of post-transcriptional regulation, including transcript stabilization [118]. In contrast, motifs for RBM4/RBM4B and YBX1 were enriched in lowly expressed 3′UTRs; both of these RBPs have been implicated in post-transcriptional repression (Fig. 5G) [119, 120].

Often, RBPs bind only a subset of potential motifs in vivo, presumably because binding is influenced by many factors [26, 121]. As an orthogonal method to validate RNA binding of 3′UTR reporters, we used publicly available eCLIP-seq data, generated for ELAVL1 in K562 cells, to detect in vivo RBP binding events [23, 27]. We found 105 3′UTR reporters with at least one reproducible input-normalized ELAVL1 eCLIP peak; many 3′UTRs contained multiple peaks. Consistent with the SPMA results, we found putative ELAVL1 target 3′UTR reporters to have higher protein output (Fig. 5H). We note that ELAVL1 eCLIP was performed in a different cell line, and therefore there could be cell line-specific differences in RBP binding [122, 123]. Nevertheless, it is likely that ELAVL1 is mediating significantly increased expression from 3′UTR reporters that contain validated binding sites for this RBP. In all, these data suggest that miRNAs play a relatively minor role in determining quantitative gene regulation, and other factors, including the RBPs we have implicated, may play a more extensive role in mediating the regulatory impact of 3′UTRs.

The impact of promoter identity on post-transcriptional regulation mediated by the 3′UTR

A goal of this study was to compare the regulatory impact of 3′UTRs when their transcripts were expressed from different promoters, as links between transcriptional and post-transcriptional processes have been observed [30, 85, 124, 125]. With this strategy, we could empirically determine the frequency with which transcriptional inputs modulate post-transcriptional regulation. While protein scores for independently sorted replicate assays were near-identical (r = 1.0, Supplementary Fig. S2A), the protein scores for 3′UTRs expressed from either the PGK or CAG promoter were less correlated (r = 0.85, Fig. 1F). We observed consistent differences for internal replicates within the protein and RNA abundance MPRAs (Fig. 6A, Supplementary Fig. S7A and B), indicating that at least some of the discordance is due to genuine regulation rather than technical variation between assays. For example, ZNF808 (Fig. 6A, in green) had a protein score of –0.52 (29th percentile) and +1.6 (99th percentile) when expressed from the PGK or CAG promoter, respectively; this discordance reflects a substantial difference in expression, indicating possible interplay between promoter and 3′UTR.

Figure 6.

Post-transcriptional regulation varies between CAG and PGK promoters. (A) Barcode correlations for 3′UTR protein scores when expressed from the PGK (x-axis) and CAG (y-axis) promoter. Each point is a barcode-3′UTR linkage. Examples of 3′UTRs with consistent barcode changes between PGK and CAG are colored. (B) MA plot for RNA-seq, comparing PGK and CAG cell lines (each in triplicate). X-axis: mean expression per gene across all six samples. Y-axis: log2 fold-change per gene (CAG/PGK). Genes in red are differentially expressed RBPs (adjusted P value < 0.05, log2 fold-change > 0.2, n = 378). Genes in blue are differentially expressed RBPs with a log2 fold-change > 1 (n = 83). Overall, 3362 and 3974 genes were found to be up and downregulated in the CAG cell line, respectively (adjusted P value < 0.05, log2 fold-change > 0.2). (C) Plot comparing CAG/PGK log2 fold-changes in PRO-seq (x-axis) with RNA-seq (y-axis). Transcripts with significantly different changes in RNA abundance are colored (n = 1208, adjusted P value < 0.05, log2 fold-change > 0.2). (D) SPMA results, performed by ranking CAG/PGK differences in transcriptome-wide RNA stability (RNA-seq/PRO-seq), binning transcripts into 40 bins and performing motif enrichment in each bin. Each point (representing a motif) is colored based on the log2 fold-change (CAG/PGK) in expression (TPM) of the RBP for the cognate motif. For motifs associated with multiple RBPs, the RBP with the largest magnitude fold-change was selected. The slope is calculated from the line of best fit for each motif's enrichment across the 40 bins. The y-axis is the P value for the F statistic. (E) As described in (D) except SPMA was performed for 3′UTR reporters ranked by protein score differences (CAG - PGK). (F) Sequence logo for ELAVL2 (top) and motif enrichment across the bins of ranked 3′UTR reporter protein score differences (bottom). (G) 3′UTR reporters were binned into quartiles of increasing predicted miRNA targeting strength for targets of miRNAs up-regulated in PGK (adjusted P value < 0.05, log2 fold-change < 0) and differentially active (Bonferroni-corrected P value < 0.05). Context scores for two miRNAs, miR-199 and miR-221, were summed. The difference in protein score between CAG and PGK is plotted on the x-axis. All 3′UTRs were included in the control group. (H) Flow cytometry data of four 3′UTR reporters integrated into six monoclonal landing pad cell lines (three CAG, three PGK). The geometric mean for each reporter was taken from gated GFP+/BFP- recombinants and normalized to the cell-line matched geometric mean of RBPMS2 (3′UTR concordant between the PGK and CAG cell lines). (I) 3′UTRs were binned by number of cleavage and polyadenylation (CPA) sites, either one CPA (single-UTR) or multiple CPA (multi-UTR) and the difference in protein scores between CAG and PGK cell lines plotted. A two-sided Wilcoxon rank-sum test was used to calculate P value.

There are multiple mechanisms, not mutually exclusive, that could result in the large differences observed: (1) Promoters and enhancers have been reported to influence post-transcriptional regulation through methylation [124], association of RBPs at promoter elements [28], and alterations to cleavage and polyadenylation site choice [30]. It is possible that the PGK and CAG promoters differentially influence co- and post-transcriptional regulation. (2) Although both PGK and CAG landing pad cell lines were generated in parallel from the same cells, they may have diverged during the process of clonalization, as the HEK293T cell line contains substantial cell-to-cell heterogeneity in genomic structure and gene copy number [51]. Thus, differential expression of trans factors, such as miRNAs or RBPs, could be responsible for the observed changes in 3′UTR reporter levels. (3) Because PGK is a weaker promoter than CAG, the relative concentration of GFP transcripts to trans factors will vary between cell lines, potentially altering the degree of regulation [77, 78, 126]. (4) As the promoter defines the transcription start site, the 5′UTRs are different between the PGK and CAG cell lines, which could be differentially influencing regulation at the 3′ end of the transcript [127]. However, most of the discrepancies observed are due to differences in RNA abundance (Supplementary Fig. S7A and S7B); 5′UTRs are primarily implicated in translation regulation, and therefore it is unlikely the 5′UTR is a confounding factor [128].

To evaluate whether differential expression of trans factors is influencing the differences in 3′UTR reporter expression, we performed differential expression analysis using RNA-seq in the PGK and CAG landing pad cell lines. Surprisingly, we observed extensive differences in the transcriptome between the two cell lines, including in the expression of mRNA binding proteins (Fig. 6B). We used transcriptome-wide estimates of RNA stability (using RNA-seq and PRO-seq) to detect transcripts with differences in RNA stability (Fig. 6C). We identified some endogenous genes with differences in RNA stability, but observed generally concordant changes in RNA-seq and PRO-seq, indicating most changes are due to transcriptional rather than post-transcriptional regulation (Fig. 6C).
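The stability comparison in this paragraph can be illustrated with a minimal sketch. It assumes, following the Fig. 6 legend, that RNA stability is approximated per gene as RNA-seq signal divided by PRO-seq signal, so that the CAG/PGK change in stability in log2 space is the RNA-seq log2 fold-change minus the PRO-seq log2 fold-change. The function name, gene labels, and values below are hypothetical and are not taken from the study.

```python
def stability_log2fc(rna_log2fc, pro_log2fc):
    """Per-gene CAG/PGK log2 change in stability.

    Stability is approximated as RNA-seq / PRO-seq, so in log2 space the
    change in stability is the RNA-seq change minus the PRO-seq change.
    Only genes present in both inputs are returned.
    """
    shared = rna_log2fc.keys() & pro_log2fc.keys()
    return {gene: rna_log2fc[gene] - pro_log2fc[gene] for gene in shared}


# Hypothetical log2 fold-changes (CAG/PGK); illustrative values only.
rna = {"geneA": 1.0, "geneB": -0.5, "geneC": 0.75}
pro = {"geneA": 1.0, "geneB": 0.5, "geneD": 0.25}

delta = stability_log2fc(rna, pro)
# geneA: RNA-seq and PRO-seq change together (transcriptional difference).
# geneB: RNA abundance falls while transcription rises (lower stability).
for gene in sorted(delta):
    print(gene, delta[gene])
```

In this toy example a value near zero corresponds to the concordant RNA-seq and PRO-seq changes described above, whereas a nonzero value flags a candidate difference in stability.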

We next asked whether differences in post-transcriptional regulation are associated with specific RBP motifs. We performed SPMA to identify differentially enriched RBP motifs using log2 fold-change values (CAG/PGK) of transcriptome-wide RNA stability. We found evidence of differential RBP motif enrichments for very few RBPs, including KHDRBS2 and KHDRBS3 (Fig. 6D), which are differentially expressed (Fig. 6B). SPMA indicated that RBP motifs are not associated with translational differences (Supplementary Fig. S7C), as expected given that translational efficiency for 3′UTR reporters is similar when driven by the PGK or CAG promoter (Supplementary Fig. S7B). Interestingly, we observed many RBP motifs enriched in differentially expressed 3′UTR reporters, both at the protein and RNA levels (Fig. 6E and Supplementary Fig. S7D). Notably, many of these RBPs are not differentially expressed between the cell lines, nor do they show evidence of differential targeting transcriptome-wide (Fig. 6D). One such example is ELAVL2 (HuB), part of the ELAV/Hu family, which often directs transcript stabilization [129]. Along with motifs for ELAVL1 and ELAVL3, ELAVL2 motifs were enriched in 3′UTRs up-regulated when driven by the PGK promoter (slope < 0, Fig. 6E and F). These data indicate that the promoter identity is influencing 3′UTR-mediated regulation, perhaps by co-transcriptionally recruiting RBPs.
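As a rough illustration of the rank, bin, and fit procedure described for SPMA in the Fig. 6 legend, the sketch below ranks reporters by their protein score difference (CAG - PGK), splits them into consecutive bins, scores each bin for a motif, and fits a line across bins. It is a simplified stand-in rather than the SPMA implementation used in the study: enrichment is reduced to the fraction of sequences containing an exact motif string, four bins are used instead of 40, and the sequences, scores, and motif are hypothetical. Converting the F statistic to a P value would require the F distribution with (1, n - 2) degrees of freedom and is omitted here.

```python
def bin_ranked(items, n_bins):
    """Split an already ranked list into n_bins consecutive, near-equal bins."""
    size, extra = divmod(len(items), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        end = start + size + (1 if i < extra else 0)
        bins.append(items[start:end])
        start = end
    return bins


def motif_fraction(seqs, motif):
    """Fraction of sequences containing the motif (simplified enrichment)."""
    return sum(motif in s for s in seqs) / len(seqs)


def fit_line(ys):
    """Least-squares slope and F statistic for ys against bin index."""
    n = len(ys)
    xs = list(range(n))
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    ss_reg = slope * sxy   # variation explained by the fitted line
    ss_res = syy - ss_reg  # residual variation
    f_stat = float("inf") if ss_res <= 0 else ss_reg / (ss_res / (n - 2))
    return slope, f_stat


# Hypothetical reporters: (3'UTR sequence, protein score difference CAG - PGK).
reporters = [
    ("GGAUUUACC", -1.2), ("CAUUUAGG", -0.9), ("AUUUACGC", -0.4),
    ("GGCCGGCC", -0.1), ("CCAUUUAG", 0.2), ("GCGCGCGC", 0.5),
    ("CCGGCCGG", 0.8), ("GGGCCCGG", 1.1),
]
MOTIF = "AUUUA"  # hypothetical motif, chosen only for illustration

# Rank from most PGK-favored (negative difference) to most CAG-favored.
ranked = [seq for seq, _ in sorted(reporters, key=lambda r: r[1])]
bins = bin_ranked(ranked, n_bins=4)  # the figure legend describes 40 bins
enrichment = [motif_fraction(b, MOTIF) for b in bins]
slope, f_stat = fit_line(enrichment)
print(enrichment, slope, f_stat)
```

With this ranking, a negative slope means the motif is concentrated among reporters that are higher when driven by the PGK promoter, which is the direction reported for the ELAV-family motifs (slope < 0).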

We also investigated whether differentially expressed miRNAs were contributing to regulatory differences. To this end, we performed sRNA-seq and subsequent miRNA targeting analysis in the PGK and CAG cell lines. We found minimal differential expression of miRNAs, and no evidence of differential miRNA activity in the transcriptome (Supplementary Fig. S7E-S7H) nor in the 3′UTR library (Fig. 6G). Therefore, we concluded that differentially expressed miRNAs were not responsible for the observed differences.

The data above suggest that the promoter is driving the differences observed in 3′UTR-mediated gene expression; however, because the landing pads were integrated into the genome randomly, local chromatin context may also be influencing these observed differences. To test directly whether promoter identity, rather than integration site, accounts for the differences, we selected 3′UTRs with protein scores concordant (RBPMS2) or discordant (ZNF808, ZNF682, and VGLL1) between the PGK and CAG cell lines. We integrated each of these 3′UTR reporters into three independently generated PGK cell lines and three CAG cell lines; thus, these monoclonal cell lines have landing pads integrated at different genomic loci. We normalized each of the three discordant 3′UTR reporters to the cell line-matched concordant reporter (RBPMS2). We observed that, in general, the fold-changes were higher when the 3′UTR reporter was expressed from the CAG promoter relative to the PGK promoter, consistent with our hypothesis (Fig. 6H). For example, in the three PGK cell lines the ZNF808 reporter was on average 1.8-fold higher than the RBPMS2 control, but 2.4-fold higher in the three CAG cell lines. In all, these data indicate that the regulatory impact of these 3′UTRs is modulated primarily by promoter identity.
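The normalization used for these validation reporters can be sketched as follows, assuming (per the Fig. 6H legend) that each reporter is summarized by the geometric mean of GFP intensity in gated recombinant cells and then divided by the geometric mean of the cell line-matched control reporter. The function name, cell line labels, and intensity values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import geometric_mean


def normalized_reporter(reporter_gfp, control_gfp):
    """Geometric-mean GFP of a test reporter divided by that of the control."""
    return geometric_mean(reporter_gfp) / geometric_mean(control_gfp)


# Hypothetical per-cell GFP intensities from gated recombinant cells.
cell_lines = {
    "PGK_line": {"test": [100.0, 400.0], "control": [50.0, 200.0]},
    "CAG_line": {"test": [300.0, 1200.0], "control": [100.0, 400.0]},
}

# Each ratio is computed within one cell line, so line-specific differences
# in overall expression level cancel out of the comparison.
ratios = {
    name: normalized_reporter(data["test"], data["control"])
    for name, data in cell_lines.items()
}
for name, ratio in ratios.items():
    print(name, round(ratio, 2))
```

Because the control is measured in the same monoclonal line, a larger ratio in the CAG line than in the PGK line reflects a reporter-specific difference rather than a difference in promoter strength alone.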

We implicated RBPs in the ELAV family (ELAVL1, ELAVL2, and ELAVL3) in the promoter-driven regulation observed (Fig. 6E and F). Interestingly, in Drosophila, ELAV binds promoters and influences alternative cleavage and polyadenylation (APA). If this mechanism were involved, we would expect 3′UTR reporters with multiple cleavage and polyadenylation (CPA) sites to be more likely to show regulatory differences between the CAG and PGK cell lines. However, we observed similar distributions for 3′UTRs containing one or multiple CPA sites (Fig. 6I). This indicates that post-transcriptional regulation, rather than co-transcriptional processing, is more likely involved.

We next used publicly available ELAVL1 eCLIP-seq data to ascertain whether 3′UTR reporters with larger differences in protein output preferentially bind ELAVL1 in vivo. We observed that ELAVL1 targets in our 3′UTR reporter library have a higher protein output when driven by the PGK promoter than when driven by the CAG promoter (Supplementary Fig. S7I). Given ELAVL1’s established role in transcript stabilization [118], these data agree with the SPMA results, which suggest that ELAVL1 motifs are enriched in 3′UTRs up-regulated in the PGK cell line. Perhaps the PGK promoter, taken from an endogenous human gene, aids in ELAVL1 loading, in contrast to the CAG promoter. To conclude, our data indicate that differentially expressed trans factors are not driving the majority of the differential 3′UTR-mediated regulation we observe. Instead, our data suggest that the promoter is playing a role in the differential regulation of 3′UTRs.

Discussion

A framework to study 3′UTR-driven regulation

In this work, we designed, optimized and implemented a series of assays to quantify the impact of 3′UTRs on gene regulation on a large scale using high-throughput, pooled, fluorescent reporter-based screens combined with next-generation sequencing. The collection of 3′UTRs included in these assays was depleted for housekeeping genes and enriched for kinases, transcription factors, and RBPs, genes that likely exhibit more complex modes of regulation [54]. We first showed this system is accurate and reliable; then, for >1400 3′UTRs, we determined their impact on protein output, RNA abundance, and translational efficiency. Our approach incorporates multiple innovations and improvements over other studies: (1) we integrated pooled GFP-3′UTR reporters at a single, transcriptionally stable locus using the highly efficient and accurate Bxb1 system, decreasing technical variability and increasing the reliability of our assay; (2) we used full-length 3′UTRs to best reflect post-transcriptional regulation; and (3) we generated data reflecting the impact of 3′UTRs on both translation and transcript abundance in order to define relationships between these mechanisms.

A critical feature of our approach is the analysis of full-length (up to ∼2400 nt) 3′UTRs rather than short fragments, as are typically used in high-throughput reporter assays [37]. Identical 3′UTR elements have been observed to behave substantially differently depending on surrounding sequence context [42, 47], and thus including the full sequence context likely more closely reflects endogenous regulation. Indeed, we demonstrated that RNA abundance measurements from our 3′UTR reporter assays agree with RNA stability measured for endogenous transcripts (Fig. 3), and with RNA stability measurements obtained using our MPRA system.

Post-transcriptional regulation is often thought to only exert a subtle influence on gene expression (<2-fold) [48]. We found that while most 3′UTRs influence expression over a 4-fold range, the overall dynamic range was substantial (16-fold). This level of regulation is on par with, or more than, that typically observed for enhancers [130, 131].

Disentangling modes of post-transcriptional regulation

We measured the impact of 3′UTRs on RNA abundance, translational efficiency, and total protein output, allowing us to explore the relationships between regulatory mechanisms (Fig. 2). A correlation between RNA abundance and translational efficiency, often attributed to influence from the coding region [132, 133], has long been observed, but this relationship remains poorly understood [134]. Although many 3′UTR-binding trans factors are known to influence both RNA stability and translational efficiency, e.g. miRNAs [20] and Pumilio proteins [135], the extent to which the 3′UTR connects these two processes has not been defined. Interestingly, we found that 3′UTR-governed RNA abundance and translational efficiency were well-correlated (Fig. 3). This may be due to trans factors that influence both processes [93], or to an intrinsic link between mRNA decay and translational repression. For example, removal of the 5′ cap and poly(A) tail occurring during mRNA decay represses translation, and translation-coupled mRNA decay pathways can also link the two processes [94, 136]. Although generally correlated, we also discovered examples of 3′UTRs with discordant levels of RNA and translation, representing candidates for mechanistic dissection, e.g. to identify cis regulatory elements within 3′UTRs that enhance translation.

Interestingly, our data suggest that 3′UTRs may impact protein output equally or even predominantly through translational regulation. Indeed, the reliance on widely used transcriptome-based approaches such as RNA-seq to examine 3′UTR-mediated regulation may bias the field toward mechanisms involving RNA decay. As methods to investigate translational regulation, such as ribosome profiling, become more broadly used, perhaps a more substantial role for translation in 3′UTR-mediated regulation will be revealed.

Regulation via 3′UTR length and trans factors

Analysis of full-length 3′UTRs gave us a unique opportunity to investigate the influence of 3′UTR length on gene expression independent of confounding factors such as the 5′UTR and coding region. We observed an increasingly repressive effect of longer 3′UTRs, which could derive from an accumulation of repressive cis regulatory elements; however, we found a similar pattern when restricting our analysis to poorly conserved sequences likely depleted in regulatory elements, suggesting that 3′UTR length is also repressive in an element-independent manner (Fig. 4). We have observed this phenomenon orthogonally using randomly generated 3′UTRs of varying lengths [113]. These effects could be due to several non-mutually exclusive mechanisms. For example, most 3′UTRs lack introns and therefore exon junction complexes (EJCs); thus, long 3′UTRs may be inefficiently packaged and exported from the nucleus [137, 138]. Transcripts with long 3′UTRs may also be more likely to drive phase separation into repressive RNA granules [139], and have also been shown to trigger NMD [110]. Additionally, long 3′UTRs may influence 5′ to 3′ mRNA looping and/or ribosome recycling [140].

Using our high-throughput reporter data, we identified motifs correlated with trans factor binding (Fig. 5). Specifically, we have implicated the ELAV family of RBPs as activating 3′UTR binding factors, as well as YBX1 and RBM4 as repressive factors. However, because many RBPs bind degenerate and overlapping sequences, it is difficult to attribute such motifs to specific RBPs with certainty, and therefore the RBPs implicated here warrant additional, more targeted investigations. Although miRNAs are the most well-studied post-transcriptional regulators, our data suggest that miRNAs may play only a minor role in 3′UTR-driven regulation relative to RBPs. This is not particularly surprising, as only a small fraction of 3′UTR conservation corresponds to miRNA target sites, suggesting extensive additional regulatory pathways [4, 141]. Nevertheless, it is likely that contributions from different post-transcriptional regulators vary under different cellular contexts, and under dynamic cell states such as during development. Importantly, the framework we developed can be easily adapted to focus on specific 3′UTRs and/or RBPs of interest, and is readily implemented in any cell culture system.

The promoter influences 3′UTR-mediated regulation

We assayed 3′UTR reporters under control of two distinct promoters and found striking differences in their behavior (Fig. 6). The PGK and CAG landing pad cell lines differed in their transcriptomes, likely due to the known cell-to-cell heterogeneity in the HEK293T genome [51]; however, our transcriptomic analyses indicate it is unlikely that differentially expressed trans factors are impacting post-transcriptional regulation in a cell line-dependent manner. Importantly, we confirmed that 3′UTR reporter differences were reproducible in multiple independent monoclonal cell lines, indicating that the promoter is most likely primarily responsible for mediating these changes in regulation. We found many RBPs whose motifs were enriched in the differentially regulated 3′UTRs, suggesting that the promoter may be facilitating RBP loading and subsequent post-transcriptional regulation. While the Tet-On and CAG promoters are synthetic, the PGK promoter more closely resembles that of an endogenous human gene. Indeed, we found many more motifs associated with RBPs implicated in 3′UTR-mediated regulation in the PGK cell line relative to the CAG cell line (Fig. 5 and Supplementary Fig. S6), supporting this hypothesis. In particular, we identified motifs associated with the ELAV family as candidate regulators, proteins that have been previously implicated in connecting transcriptional to co- and post-transcriptional processes [28]. It is estimated that ∼50% of transcription factors can bind RNA [142] and ∼60% of RBPs associate with chromatin in a promoter-specific manner [143], further suggesting that mechanistic links between these processes are pervasive. Notably, promoter choice has been linked to 3′ CPA site choice [85], a mechanism in which ELAV proteins have been implicated [28]. It is possible that the promoters used in our system are differentially influencing 3′ CPA site choice, which in turn leads to differences in post-transcriptional regulation. 
More work is needed to parse out the exact mechanisms responsible for differential regulation of 3′UTRs when under control of different promoters.

Limitations

Our study is not without limitations. The use of GFP reporters is a double-edged sword: influence from the promoter, 5′UTR, and coding region is removed, and therefore all observed regulation can be directly attributed to the 3′UTR. However, as illustrated above and observed in other studies, regulation by the 3′UTR can be influenced by the promoter and other regions of the transcript [30, 95, 127]. The use of orthogonal assays, such as CRISPR-Cas9 gene editing, can further support or refute observations made from reporter assays [144]. Additionally, while a strength of our study is the use of full-length 3′UTRs, we acknowledge that alternative CPA site usage may influence our results, and indeed could be the mechanism underlying the promoter-dependent differences we observe.

Although we showed that our RNA abundance metrics correlate closely with RNA stability, it is important to acknowledge that multiple additional factors impact RNA abundance. Moreover, although we measured almost all aspects of post-transcriptional quantitative regulation (RNA stability, RNA abundance, and translational efficiency), we did not measure mRNA subcellular localization, nuclear export, or sequestration to RNA granules, any of which may be influencing the measurements here. Indeed, this is a complex consideration, as RNA stability (and other parameters) is influenced by localization, and a given transcript may be partitioned across different subcellular localizations.

Additionally, much 3′UTR-mediated regulation is likely context-specific [17, 145] and thus not present in HEK293T cells at steady-state, particularly for 3′UTRs from genes expressed in a tissue-specific manner. While this study was not designed to capture the entire scope of 3′UTR-mediated regulation, our approach can be performed in additional cellular contexts or under dynamic conditions such as differentiation. In all, this study represents a comprehensive characterization of full-length human 3′UTRs.

Acknowledgements

We thank Jennifer Grenier, Ann Tate, and the Cornell Transcriptional Regulation and Expression Core Facility (RRID:SCR_022532) for helpful discussions and assistance with RNA-seq and small RNA-seq library generation; Peter Schweitzer and the Cornell BRC Genomics Core Facility (RRID:SCR_021727) for advice and technical support with Illumina sequencing; the University of Rochester Genomics Research Center Core Facility (RRID:SCR_012359) for Illumina sequencing support; the Cornell Bioinformatics Facility (RRID:SCR_021757) for data storage and server management; the Cornell University BRC Flow Cytometry Facility (RRID:SCR_021740) for FACS support; and the Cornell Statistical Consulting Unit for statistical advice. We thank Jacqueline Copeland, Brian Feng, Crystal Cheng, and Saleha Tahseen for technical support with validation reporters. We also thank Roman Spektor for providing Tn5 transposase and members of the Grimson laboratory for fruitful discussions. Finally, we thank the anonymous reviewers for their valuable and thoughtful input.

Author contributions: J.D.W. and A.G. conceived and planned experiments, interpreted data, and wrote the paper. J.D.W. performed most of the experiments and all of the analyses. H.J.S. performed PRO-seq, RNA stability assays, and 3′RACE. L.T.V. helped generate validation reporters. E.A.F. provided assistance with generating the reporter library. K.A.M. and D.M.F. provided cell lines, plasmids, and technical advice related to the Bxb1 system. A.G. supervised the research and secured funding. All authors provided feedback and approved the manuscript.

Contributor Information

Jessica D West, Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, United States.

Hannah J Smith, Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, United States.

Luyen Tien Vu, Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, United States.

Elizabeth A Fogarty, Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, United States.

Kenneth A Matreyek, Department of Pathology, Case Western Reserve University School of Medicine, Cleveland, OH 44106, United States.

Douglas M Fowler, Department of Genome Sciences, University of Washington, Seattle, WA 98115, United States; Department of Bioengineering, University of Washington, Seattle, WA 98115, United States.

Andrew Grimson, Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, United States.

Supplementary data

Supplementary data is available at NAR online.

Conflict of interest

None declared.

Funding

This work was supported by the National Institutes of Health [R21HG011512, P50HD104454 to A.G.; R35GM152106, RM1HG010461 to D.M.F.] and the Center for Vertebrate Genomics awarded to J.D.W. Funding to pay the Open Access publication charges for this article was provided by Discretionary funds to A.G.

Data availability

Raw sequencing data and processed data tables reported in this article have been deposited to Gene Expression Omnibus (GEO) and are available under the SuperSeries accession number GSE270254.

References

  • 1. Cramer  P  Organization and regulation of gene transcription. Nature. 2019; 573:45–54. 10.1038/s41586-019-1517-4. [DOI] [PubMed] [Google Scholar]
  • 2. Corbett  AH  Post-transcriptional regulation of gene expression and human disease. Curr Opin Cell Biol. 2018; 52:96–104. 10.1016/j.ceb.2018.02.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Mayr  C  Evolution and biological roles of alternative 3’UTRs. Trends Cell Biol. 2016; 26:227–37. 10.1016/j.tcb.2015.10.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Siepel  A, Bejerano  G, Pedersen  JS  et al.  Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005; 15:1034–50. 10.1101/gr.3715005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. Gulko  B, Hubisz  MJ, Gronau  I  et al.  A method for calculating probabilities of fitness consequences for point mutations across the human genome. Nat Genet. 2015; 47:276–83. 10.1038/ng.3196. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Consortium  GTE  The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 2020; 369:1318–30. 10.1126/science.aaz1776. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Wang  ET, Sandberg  R, Luo  S  et al.  Alternative isoform regulation in human tissue transcriptomes. Nature. 2008; 456:470–6. 10.1038/nature07509. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Elkon  R, Ugalde  AP, Agami  R  Alternative cleavage and polyadenylation: extent, regulation and function. Nat Rev Genet. 2013; 14:496–506. 10.1038/nrg3482. [DOI] [PubMed] [Google Scholar]
  • 9. Boreikaitė  V, Passmore  LA  3′-End processing of eukaryotic mRNA: machinery, regulation, and impact on gene expression. Annu Rev Biochem. 2023; 92:199–225. 10.1146/annurev-biochem-052521-012445. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Mayr  C, Bartel  DP  Widespread shortening of 3’UTRs by alternative cleavage and polyadenylation activates oncogenes in cancer cells. Cell. 2009; 138:673–84. 10.1016/j.cell.2009.06.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11. Lianoglou  S, Garg  V, Yang  JL  et al.  Ubiquitously transcribed genes use alternative polyadenylation to achieve tissue-specific expression. Genes Dev. 2013; 27:2380–96. 10.1101/gad.229328.113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12. Agarwal  V, Lopez-Darwin  S, Kelley  DR  et al.  The landscape of alternative polyadenylation in single cells of the developing mouse embryo. Nat Commun. 2021; 12:5101. 10.1038/s41467-021-25388-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13. Gao  Y, Li  L, Amos  CI  et al.  Analysis of alternative polyadenylation from single-cell RNA-seq using scDaPars reveals cell subpopulations invisible to gene expression. Genome Res. 2021; 31:1856–66. 10.1101/gr.271346.120. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Fansler  MM, Mitschka  S, Mayr  C  Quantifying 3′UTR length from scRNA-seq data reveals changes independent of gene expression. Nat Commun. 2024; 15:4050. 10.1038/s41467-024-48254-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. Zheng  D, Wang  R, Ding  Q  et al.  Cellular stress alters 3′UTR landscape through alternative polyadenylation and isoform-specific degradation. Nat Commun. 2018; 9:2268. 10.1038/s41467-018-04730-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. Gruber  AJ, Zavolan  M  Alternative cleavage and polyadenylation in health and disease. Nat Rev Genet. 2019; 20:599–614. 10.1038/s41576-019-0145-z. [DOI] [PubMed] [Google Scholar]
  • 17. Mitschka  S, Mayr  C  Context-specific regulation and function of mRNA alternative polyadenylation. Nat Rev Mol Cell Biol. 2022; 23:779–96. 10.1038/s41580-022-00507-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Mayr  C  What are 3’ UTRs doing?. Cold Spring Harb Perspect Biol. 2019; 11:a034728. 10.1101/cshperspect.a034728. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19. Bartel  DP  MicroRNAs: target recognition and regulatory functions. Cell. 2009; 136:215–33. 10.1016/j.cell.2009.01.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Bartel  DP  Metazoan MicroRNAs. Cell. 2018; 173:20–51. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21. Chakrabarti  AM, Iosub  IA, Lee  FCY  et al.  A computationally-enhanced hiCLIP atlas reveals Staufen1-RNA binding features and links 3’ UTR structure to RNA metabolism. Nucleic Acids Res. 2023; 51:3573–89. 10.1093/nar/gkad221. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22. Gerstberger  S, Hafner  M, Tuschl  T  A census of human RNA-binding proteins. Nat Rev Genet. 2014; 15:829–45. 10.1038/nrg3813. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23. Van Nostrand  EL, Freese  P, Pratt  GA  et al.  A large-scale binding and functional map of human RNA-binding proteins. Nature. 2020; 583:711–9. 10.1038/s41586-020-2077-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24. Findlay  SD, Romo  L, Burge  CB  Quantifying negative selection in human 3′ UTRs uncovers constrained targets of RNA-binding proteins. Nat Commun. 2024; 15:85. 10.1038/s41467-023-44456-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25. Lambert  N, Robertson  A, Jangi  M  et al.  RNA Bind-n-seq: quantitative assessment of the sequence and structural binding specificity of RNA binding proteins. Mol Cell. 2014; 54:887–900. 10.1016/j.molcel.2014.04.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26. Dominguez  D, Freese  P, Alexis  MS  et al.  Sequence, structure, and context preferences of human RNA binding proteins. Mol Cell. 2018; 70:854–867.e9. 10.1016/j.molcel.2018.05.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. Van Nostrand  EL, Pratt  GA, Shishkin  AA  et al.  Robust transcriptome-wide discovery of RNA-binding protein binding sites with enhanced CLIP (eCLIP). Nat Methods. 2016; 13:508–14. 10.1038/nmeth.3810. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Oktaba  K, Zhang  W, Lotz  TS  et al.  ELAV links paused Pol II to alternative polyadenylation in the Drosophila nervous system. Mol Cell. 2015; 57:341–8. 10.1016/j.molcel.2014.11.024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Hilgers  V  Alternative polyadenylation coupled to transcription initiation: insights from ELAV-mediated 3’ UTR extension. RNA Biol. 2015; 12:918–21. 10.1080/15476286.2015.1060393. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Kwon  B, Fansler  MM, Patel  ND  et al.  Enhancers regulate 3’ end processing activity to control expression of alternative 3’UTR isoforms. Nat Commun. 2022; 13:2709. 10.1038/s41467-022-30525-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Presnyak  V, Alhusaini  N, Chen  Y-H  et al.  Codon optimality is a major determinant of mRNA stability. Cell. 2015; 160:1111–24. 10.1016/j.cell.2015.02.029. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Narula  A, Ellis  J, Taliaferro  JM  et al.  Coding regions affect mRNA stability in human cells. RNA. 2019; 25:1751–64. 10.1261/rna.073239.119. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Jia  L, Mao  Y, Ji  Q  et al.  Decoding mRNA translatability and stability from the 5′ UTR. Nat Struct Mol Biol. 2020; 27:814–21. 10.1038/s41594-020-0465-x. [DOI] [PubMed] [Google Scholar]
  • 34. Neugebauer  KM  Nascent RNA and the coordination of splicing with transcription. Cold Spring Harb Perspect Biol. 2019; 11:a032227. 10.1101/cshperspect.a032227. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35. Reimer  KA, Mimoso  CA, Adelman  K  et al.  Co-transcriptional splicing regulates 3’ end cleavage during mammalian erythropoiesis. Mol Cell. 2021; 81:998–1012. 10.1016/j.molcel.2020.12.018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36. Inoue  F, Ahituv  N  Decoding enhancers using massively parallel reporter assays. Genomics. 2015; 106:159–64. 10.1016/j.ygeno.2015.06.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Klein  JC, Agarwal  V, Inoue  F  et al.  A systematic evaluation of the design and context dependencies of massively parallel reporter assays. Nat Methods. 2020; 17:1083–91. 10.1038/s41592-020-0965-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. Zhao  W, Pollack  JL, Blagev  DP  et al.  Massively parallel functional annotation of 3′ untranslated regions. Nat Biotechnol. 2014; 32:387–91. 10.1038/nbt.2851. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39. Oikonomou  P, Goodarzi  H, Tavazoie  S  Systematic identification of regulatory elements in conserved 3’ UTRs of human transcripts. Cell Rep. 2014; 7:281–92. 10.1016/j.celrep.2014.03.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. Rabani  M, Pieper  L, Chew  G-L  et al.  A massively parallel reporter assay of 3′ UTR sequences identifies in vivo rules for mRNA degradation. Mol Cell. 2017; 68:1083–94. 10.1016/j.molcel.2017.11.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41. Vainberg Slutskin  I, Weingarten-Gabbay  S, Nir  R  et al.  Unraveling the determinants of microRNA mediated regulation using a massively parallel reporter assay. Nat Commun. 2018; 9:529. 10.1038/s41467-018-02980-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Cottrell  KA, Chaudhari  HG, Cohen  BA, Djuranovic  S  PTRE-seq reveals mechanism and interactions of RNA binding proteins and miRNAs. Nat Commun. 2018; 9:301. 10.1038/s41467-017-02745-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43. Litterman  AJ, Kageyama  R, Le Tonqueze  O  et al.  A massively parallel 3′ UTR reporter assay reveals relationships between nucleotide content, sequence conservation, and mRNA destabilization. Genome Res. 2019; 29:896–906. 10.1101/gr.242552.118. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44. Griesemer  D, Xue  JR, Reilly  SK  et al.  Genome-wide functional screen of 3′UTR variants uncovers causal variants for human disease and evolution. Cell. 2021; 184:5247–60. 10.1016/j.cell.2021.08.025. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45. Siegel  DA, Le Tonqueze  O, Biton  A  et al.  Massively parallel analysis of human 3’ UTRs reveals that AU-rich element length and registration predict mRNA destabilization. G3. 2022; 12:jkab404. 10.1093/g3journal/jkab404. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46. Fu  T, Amoah  K, Chan  TW  et al.  Massively parallel screen uncovers many rare 3′ UTR variants regulating mRNA abundance of cancer driver genes. Nat Commun. 2024; 15:3335. 10.1038/s41467-024-46795-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47. Kristjánsdóttir  K, Fogarty  EA, Grimson  A  Systematic analysis of the Hmga2 3’ UTR identifies many independent regulatory sequences and a novel interaction between distal sites. RNA. 2015; 21:1346–60. 10.1261/rna.051177.115. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48. Grimson  A, Farh  KK-H, Johnston  WK  et al.  MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell. 2007; 27:91–105. 10.1016/j.molcel.2007.06.017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49. Mayr  C  Regulation by 3’-untranslated regions. Annu Rev Genet. 2017; 51:171–94. 10.1146/annurev-genet-120116-024704. [DOI] [PubMed] [Google Scholar]
  • 50. Young  L, Sung  J, Stacey  G  et al.  Detection of Mycoplasma in cell cultures. Nat Protoc. 2010; 5:929–34. 10.1038/nprot.2010.43. [DOI] [PubMed] [Google Scholar]
  • 51. Lin  Y-C, Boone  M, Meuris  L  et al.  Genome dynamics of the human embryonic kidney 293 lineage in response to cell biology manipulations. Nat Commun. 2014; 5:4767. 10.1038/ncomms5767. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52. Matreyek  KA, Stephany  JJ, Fowler  DM  A platform for functional assessment of large variant libraries in mammalian cells. Nucleic Acids Res. 2017; 45:e102. 10.1093/nar/gkx183. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53. Fu  C, Donovan  WP, Shikapwashya-Hasser  O  et al.  Hot fusion: an efficient method to clone multiple DNA fragments as well as inverted repeats without ligase. PLoS One. 2014; 9:e115318. 10.1371/journal.pone.0115318. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54. Kotagama  K, Babb  CS, Wolter  JM  et al.  A human 3’UTR clone collection to study post-transcriptional gene regulation. BMC Genomics. 2015; 16:1036. 10.1186/s12864-015-2238-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55. Picelli  S, Björklund  AK, Reinius  B  et al.  Tn5 transposase and tagmentation procedures for massively scaled sequencing projects. Genome Res. 2014; 24:2033–40. 10.1101/gr.177881.114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56. Matreyek  KA, Stephany  JJ, Chiasson  MA  et al.  An improved platform for functional assessment of large protein libraries in mammalian cells. Nucleic Acids Res. 2020; 48:e1. 10.1093/nar/gkz910. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57. Baird  TD, Hogg  JR  Using tet-off cells and RNAi knockdown to assay mRNA decay. Methods Mol Biol. 2018; 1720:161–73. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58. Judd  J, Wojenski  LA, Wainman  LM  et al.  A rapid, sensitive, scalable method for Precision run-on sequencing (PRO-seq). bioRxiv. 19 May 2020, preprint: not peer reviewed. 10.1101/2020.05.18.102277. [DOI]
  • 59. Chen  C-YA, Ezzeddine  N, Shyu  A-B  Messenger RNA half-life measurements in mammalian cells. Methods Enzymol. 2008; 448:335–57. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60. Kozomara  A, Birgaoanu  M, Griffiths-Jones  S  miRBase: from microRNA sequences to function. Nucleic Acids Res. 2019; 47:D155–62. 10.1093/nar/gky1141. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61. Friedländer  MR, Mackowiak  SD, Li  N  et al.  miRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades. Nucleic Acids Res. 2012; 40:37–52. 10.1093/nar/gkr688. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62. Love  MI, Huber  W, Anders  S  Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014; 15:550. 10.1186/s13059-014-0550-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63. Martin  M  Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet j. 2011; 17:10. 10.14806/ej.17.1.200. [DOI] [Google Scholar]
  • 64. Langmead  B, Wilks  C, Antonescu  V  et al.  Scaling read aligners to hundreds of threads on general-purpose processors. Bioinformatics. 2019; 35:421–32. 10.1093/bioinformatics/bty648. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65. Kim  D, Paggi  JM, Park  C  et al.  Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 2019; 37:907–15. 10.1038/s41587-019-0201-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66. Liao  Y, Smyth  GK, Shi  W  featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014; 30:923–30. 10.1093/bioinformatics/btt656. [DOI] [PubMed] [Google Scholar]
  • 67. Chen  S  Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp. Imeta. 2023; 2:e107. 10.1002/imt2.107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68. Smith  T, Heger  A, Sudbery  I  UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy. Genome Res. 2017; 27:491–9. 10.1101/gr.209601.116. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69. Wang  R, Nambiar  R, Zheng  D  et al.  PolyA_DB 3 catalogs cleavage and polyadenylation sites identified by deep sequencing in multiple genomes. Nucleic Acids Res. 2018; 46:D315–9. 10.1093/nar/gkx1000. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70. Agarwal  V, Kelley  DR  The genetic and biochemical determinants of mRNA degradation rates in mammals. Genome Biol. 2022; 23:245. 10.1186/s13059-022-02811-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71. Agarwal  V, Bell  GW, Nam  J-W  et al.  Predicting effective microRNA target sites in mammalian mRNAs. eLife. 2015; 4:e05005. 10.7554/eLife.05005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72. Briskin  D, Wang  PY, Bartel  DP  The biochemical basis for the cooperative action of microRNAs. Proc Natl Acad Sci USA. 2020; 117:17764–74. 10.1073/pnas.1920404117. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73. Krismer  K, Bird  MA, Varmeh  S  et al.  Transite: a computational motif-based analysis platform that identifies RNA-binding proteins modulating changes in gene expression. Cell Rep. 2020; 32:108064. 10.1016/j.celrep.2020.108064. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74. Wagih  O  ggseqlogo: a versatile R package for drawing sequence logos. Bioinformatics. 2017; 33:3645–7. 10.1093/bioinformatics/btx469. [DOI] [PubMed] [Google Scholar]
  • 75. Wickham  H  ggplot2: Elegant Graphics for Data Analysis. 2009; New York, NY: Springer New York; 10.1007/978-0-387-98141-3. [DOI] [Google Scholar]
  • 76. Xu  Z, Thomas  L, Davies  B  et al.  Accuracy and efficiency define Bxb1 integrase as the best of fifteen candidate serine recombinases for the integration of DNA into the human genome. BMC Biotechnol. 2013; 13:87. 10.1186/1472-6750-13-87. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77. Qin  JY, Zhang  L, Clift  KL  et al.  Systematic comparison of constitutive promoters and the doxycycline-inducible promoter. PLoS One. 2010; 5:e10611. 10.1371/journal.pone.0010611. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78. Bosson  AD, Zamudio  JR, Sharp  PA  Endogenous miRNA and target concentrations determine susceptibility to potential ceRNA competition. Mol Cell. 2014; 56:347–59. 10.1016/j.molcel.2014.09.018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79. Mullokandov  G, Baccarini  A, Ruzo  A  et al.  High-throughput assessment of microRNA activity and function using microRNA sensor and decoy libraries. Nat Methods. 2012; 9:840–6. 10.1038/nmeth.2078. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80. Scotto-Lavino  E, Du  G, Frohman  MA  3’ end cDNA amplification using classic RACE. Nat Protoc. 2006; 1:2742–5. 10.1038/nprot.2006.481. [DOI] [PubMed] [Google Scholar]
  • 81. Proudfoot  NJ  Ending the message: poly(A) signals then and now. Genes Dev. 2011; 25:1770–82. 10.1101/gad.17268411. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82. Ripin  N, Boudet  J, Duszczyk  MM  et al.  Molecular basis for AU-rich element recognition and dimerization by the HuR C-terminal RRM. Proc Natl Acad Sci USA. 2019; 116:2935–44. 10.1073/pnas.1808696116. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83. Le Hir  H, Saulière  J, Wang  Z  The exon junction complex as a node of post-transcriptional networks. Nat Rev Mol Cell Biol. 2016; 17:41–54. 10.1038/nrm.2015.7. [DOI] [PubMed] [Google Scholar]
  • 84. Alexopoulou  AN, Couchman  JR, Whiteford  JR  The CMV early enhancer/chicken beta actin (CAG) promoter can be used to drive transgene expression during the differentiation of murine embryonic stem cells into vascular progenitors. BMC Cell Biol. 2008; 9:2. 10.1186/1471-2121-9-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85. Alfonso-Gonzalez  C, Legnini  I, Holec  S  et al.  Sites of transcription initiation drive mRNA isoform selection. Cell. 2023; 186:2438–55. 10.1016/j.cell.2023.04.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 86. King  HA, Gerber  AP  Translatome profiling: methods for genome-scale analysis of mRNA translation. Brief Funct Genomics. 2014; 15:22–31. 10.1093/bfgp/elu045. [DOI] [PubMed] [Google Scholar]
  • 87. Schwarz  G  Estimating the dimension of a model. Ann Statist. 1978; 6:461–4. 10.1214/aos/1176344136. [DOI] [Google Scholar]
  • 88. Khan  M, Hou  S, Chen  M  et al.  Mechanisms of RNA export and nuclear retention. WIREs RNA. 2023; 14:e1755. 10.1002/wrna.1755. [DOI] [PubMed] [Google Scholar]
  • 89. Wegener  M, Müller-McNicoll  M  Nuclear retention of mRNAs - quality control, gene regulation and human disease. Semin Cell Dev Biol. 2018; 79:131–42. 10.1016/j.semcdb.2017.11.001. [DOI] [PubMed] [Google Scholar]
  • 90. Decker  CJ, Parker  R  P-bodies and stress granules: possible roles in the control of translation and mRNA degradation. Cold Spring Harb Perspect Biol. 2012; 4:a012286. 10.1101/cshperspect.a012286. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 91. Wadsworth  GM, Srinivasan  S, Lai  LB  et al.  RNA-driven phase transitions in biomolecular condensates. Mol Cell. 2024; 84:3692–705. 10.1016/j.molcel.2024.09.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 92. Ripin  N, Parker  R  Formation, function, and pathology of RNP granules. Cell. 2023; 186:4737–56. 10.1016/j.cell.2023.09.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93. Schneider-Lunitz  V, Ruiz-Orera  J, Hubner  N  et al.  Multifunctional RNA-binding proteins influence mRNA abundance and translational efficiency of distinct sets of target genes. PLoS Comput Biol. 2021; 17:e1009658. 10.1371/journal.pcbi.1009658. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 94. Monaghan  L, Longman  D, Cáceres  JF  Translation-coupled mRNA quality control mechanisms. EMBO J. 2023; 42:e114378. 10.15252/embj.2023114378. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 95. Mitschka  S, Mayr  C  Endogenous p53 expression in human and mouse is not regulated by its 3’UTR. eLife. 2021; 10:e65700. 10.7554/eLife.65700. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 96. Kurosaki  T, Popp  MW, Maquat  LE  Quality and quantity control of gene expression by nonsense-mediated mRNA decay. Nat Rev Mol Cell Biol. 2019; 20:406–20. 10.1038/s41580-019-0126-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 97. Mahat  DB, Kwak  H, Booth  GT  et al.  Base-pair-resolution genome-wide mapping of active RNA polymerases using precision nuclear run-on (PRO-seq). Nat Protoc. 2016; 11:1455–76. 10.1038/nprot.2016.086. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98. Kwak  H, Fuda  NJ, Core  LJ  et al.  Precise maps of RNA polymerase reveal how promoters direct initiation and pausing. Science. 2013; 339:950–3. 10.1126/science.1229386. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 99. Patel  RK, West  JD, Jiang  Y  et al.  Robust partitioning of microRNA targets from downstream regulatory changes. Nucleic Acids Res. 2020; 48:9724–46. 10.1093/nar/gkaa687. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 100. Blumberg  A, Zhao  Y, Huang  Y-F  et al.  Characterizing RNA stability genome-wide through combined analysis of PRO-seq and RNA-seq data. BMC Biol. 2021; 19:30. 10.1186/s12915-021-00949-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 101. Kwak  Y, Daly  CWP, Fogarty  EA  et al.  Dynamic and widespread control of poly(A) tail length during macrophage activation. RNA. 2022; 28:947–71. 10.1261/rna.078918.121. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 102. Baudrimont  A, Voegeli  S, Viloria  EC  et al.  Multiplexed gene control reveals rapid mRNA turnover. Sci Adv. 2017; 3:e1700006. 10.1126/sciadv.1700006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 103. Yao  L, Liang  J, Ozer  A  et al.  A comparison of experimental assays and analytical methods for genome-wide identification of active enhancers. Nat Biotechnol. 2022; 40:1056–65. 10.1038/s41587-022-01211-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 104. Arnold  CD, Gerlach  D, Stelzer  C  et al.  Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science. 2013; 339:1074–7. 10.1126/science.1232542. [DOI] [PubMed] [Google Scholar]
  • 105. Chan  CY, Carmack  CS, Long  DD  et al.  A structural interpretation of the effect of GC-content on efficiency of RNA interference. BMC Bioinf. 2009; 10:S33. 10.1186/1471-2105-10-S1-S33. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 106. Zhang  L, Kasif  S, Cantor  CR  et al.  GC/AT-content spikes as genomic punctuation marks. Proc Natl Acad Sci USA. 2004; 101:16855–60. 10.1073/pnas.0407821101. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 107. Beisang  D, Bohjanen  PR  Perspectives on the ARE as it turns 25 years old. WIREs RNA. 2012; 3:719–31. 10.1002/wrna.1125. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 108. Spies  N, Burge  CB, Bartel  DP  3’ UTR-isoform choice has limited influence on the stability and translational efficiency of most mRNAs in mouse fibroblasts. Genome Res. 2013; 23:2078–90. 10.1101/gr.156919.113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 109. Gruber  AR, Martin  G, Müller  P  et al.  Global 3’ UTR shortening has a limited effect on protein abundance in proliferating T cells. Nat Commun. 2014; 5:5465. 10.1038/ncomms6465. [DOI] [PubMed] [Google Scholar]
  • 110. Hogg  JR, Goff  SP  Upf1 senses 3′UTR length to potentiate mRNA decay. Cell. 2010; 143:379–89. 10.1016/j.cell.2010.10.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 111. Chen  S, Wang  R, Zheng  D  et al.  The mRNA export receptor NXF1 coordinates transcriptional dynamics, alternative polyadenylation, and mRNA export. Mol Cell. 2019; 74:118–31. 10.1016/j.molcel.2019.01.026. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 112. Hoffman  Y, Bublik  DR, Ugalde  AP  et al.  3’UTR shortening potentiates microRNA-based repression of pro-differentiation genes in proliferating human cells. PLoS Genet. 2016; 12:e1005879. 10.1371/journal.pgen.1005879. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 113. Daly  CWP, Kristjánsdóttir  K, West  JD  et al.  A novel regulatory pathway recognizes and degrades transcripts with long 3′ untranslated regions. bioRxiv. 13 March 2024, preprint: not peer reviewed. 10.1101/2024.03.11.584429. [DOI]
  • 114. Shi  CY, Elcavage  LE, Chivukula  RR  et al.  ZSWIM8 destabilizes many murine microRNAs and is required for proper embryonic growth and development. Genome Res. 2023; 33:1482–96. 10.1101/gr.278073.123. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 115. Huntzinger  E, Izaurralde  E  Gene silencing by microRNAs: contributions of translational repression and mRNA decay. Nat Rev Genet. 2011; 12:99–110. 10.1038/nrg2936. [DOI] [PubMed] [Google Scholar]
  • 116. Machlab  D, Burger  L, Soneson  C  et al.  monaLisa: an R/bioconductor package for identifying regulatory motifs. Bioinformatics. 2022; 38:2624–5. 10.1093/bioinformatics/btac102. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 117. Makeyev  AV, Liebhaber  SA  The poly(C)-binding proteins: a multiplicity of functions and a search for mechanisms. RNA. 2002; 8:265–78. 10.1017/S1355838202024627. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 118. Lebedeva  S, Jens  M, Theil  K  et al.  Transcriptome-wide analysis of regulatory interactions of the RNA-binding protein HuR. Mol Cell. 2011; 43:340–52. 10.1016/j.molcel.2011.06.008. [DOI] [PubMed] [Google Scholar]
  • 119. Kretov  DA, Clément  M-J, Lambert  G  et al.  YB-1, an abundant core mRNA-binding protein, has the capacity to form an RNA nucleoprotein filament: a structural analysis. Nucleic Acids Res. 2019; 47:3127–41. 10.1093/nar/gky1303. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 120. Markus  MA, Morris  BJ  RBM4: a multifunctional RNA-binding protein. Int J Biochem Cell Biol. 2009; 41:740–3. [DOI] [PubMed] [Google Scholar]
  • 121. Taliaferro  JM, Lambert  NJ, Sudmant  PH  et al.  RNA sequence context effects measured in vitro predict in vivo protein binding and regulation. Mol Cell. 2016; 64:294–306. 10.1016/j.molcel.2016.08.035. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 122. Ciafrè  SA, Galardi  S  microRNAs and RNA-binding proteins: a complex network of interactions and reciprocal regulations in cancer. RNA Biol. 2013; 10:934–42. 10.4161/rna.24641. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 123. Kim  S, Kim  S, Chang  HR  et al.  The regulatory impact of RNA-binding proteins on microRNA targeting. Nat Commun. 2021; 12:5057. 10.1038/s41467-021-25078-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 124. Slobodin  B, Han  R, Calderone  V  et al.  Transcription impacts the efficiency of mRNA translation via co-transcriptional N6-adenosine methylation. Cell. 2017; 169:326–37. 10.1016/j.cell.2017.03.031. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 125. Haimovich  G, Medina  DA, Causse  SZ  et al.  Gene expression is circular: factors for mRNA degradation also foster mRNA synthesis. Cell. 2013; 153:1000–11. 10.1016/j.cell.2013.05.012. [DOI] [PubMed] [Google Scholar]
  • 126. Denzler  R, McGeary  SE, Title  AC  et al.  Impact of microRNA levels, target-site complementarity, and cooperativity on competing endogenous RNA-regulated gene expression. Mol Cell. 2016; 64:565–79. 10.1016/j.molcel.2016.09.027. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 127. Theil  K, Herzog  M, Rajewsky  N  Post-transcriptional regulation by 3′ UTRs can be masked by regulatory elements in 5′ UTRs. Cell Rep. 2018; 22:3217–26. 10.1016/j.celrep.2018.02.094. [DOI] [PubMed] [Google Scholar]
  • 128. Hinnebusch  AG, Ivanov  IP, Sonenberg  N  Translational control by 5’-untranslated regions of eukaryotic mRNAs. Science. 2016; 352:1413–6. 10.1126/science.aad9868. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 129. Pascale  A, Amadio  M, Scapagnini  G  et al.  Neuronal ELAV proteins enhance mRNA stability by a PKCα-dependent pathway. Proc Natl Acad Sci USA. 2005; 102:12065–70. 10.1073/pnas.0504702102. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 130. Tippens  ND, Liang  J, Leung  AK-Y  et al.  Transcription imparts architecture, function and logic to enhancer units. Nat Genet. 2020; 52:1067–75. 10.1038/s41588-020-0686-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 131. Gasperini  M, Tome  JM, Shendure  J  Towards a comprehensive catalogue of validated and target-linked human enhancers. Nat Rev Genet. 2020; 21:292–310. 10.1038/s41576-019-0209-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 132. Hanson  G, Alhusaini  N, Morris  N  et al.  Translation elongation and mRNA stability are coupled through the ribosomal A-site. RNA. 2018; 24:1377–89. 10.1261/rna.066787.118. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 133. Wu  Q, Medina  SG, Kushawah  G  et al.  Translation affects mRNA stability in a codon-dependent manner in human cells. eLife. 2019; 8:e45396. 10.7554/eLife.45396. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 134. Dave  P, Roth  G, Griesbach  E  et al.  Single-molecule imaging reveals translation-dependent destabilization of mRNAs. Mol Cell. 2023; 83:589–606. 10.1016/j.molcel.2023.01.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 135. Goldstrohm  AC, Hall  TMT, McKenney  KM  Post-transcriptional regulatory functions of mammalian pumilio proteins. Trends Genet. 2018; 34:972–90. 10.1016/j.tig.2018.09.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 136. Tuck  AC, Rankova  A, Arpat  AB  et al.  Mammalian RNA decay pathways are highly specialized and widely linked to translation. Mol Cell. 2020; 77:1222–36. 10.1016/j.molcel.2020.01.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 137. Pacheco-Fiallos  B, Vorländer  MK, Riabov-Bassat  D  et al.  mRNA recognition and packaging by the human transcription-export complex. Nature. 2023; 616:828–35. 10.1038/s41586-023-05904-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 138. Bicknell  AA, Cenik  C, Chua  HN  et al.  Introns in UTRs: why we should stop ignoring them. Bioessays. 2012; 34:1025–34. 10.1002/bies.201200073. [DOI] [PubMed] [Google Scholar]
  • 139. Namkoong  S, Ho  A, Woo  YM  et al.  Systematic characterization of stress-induced RNA granulation. Mol Cell. 2018; 70:175–87. 10.1016/j.molcel.2018.02.025. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 140. Vicens  Q, Kieft  JS, Rissland  OS  Revisiting the closed-loop model and the nature of mRNA 5’-3’ communication. Mol Cell. 2018; 72:805–12. 10.1016/j.molcel.2018.10.047. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 141. Geissler  R, Simkin  A, Floss  D  et al.  A widespread sequence-specific mRNA decay pathway mediated by hnRNPs A1 and A2/B1. Genes Dev. 2016; 30:1070–85. 10.1101/gad.277392.116. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 142. Oksuz  O, Henninger  JE, Warneford-Thomson  R  et al.  Transcription factors interact with RNA to regulate genes. Mol Cell. 2023; 83:2449–63. 10.1016/j.molcel.2023.06.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 143. Xiao  R, Chen  J-Y, Liang  Z  et al.  Pervasive chromatin-RNA binding protein interactions enable RNA-based regulation of transcription. Cell. 2019; 178:107–21. 10.1016/j.cell.2019.06.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 144. Mitschka  S, Fansler  MM, Mayr  C  Generation of 3’UTR knockout cell lines by CRISPR/Cas9-mediated genome editing. Methods Enzymol. 2021; 655:427–57. [DOI] [PubMed] [Google Scholar]
  • 145. Floor  SN, Doudna  JA  Tunable protein synthesis by transcript isoforms in human cells. eLife. 2016; 5:e10921. 10.7554/eLife.10921. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

gkaf568_Supplemental_File

Data Availability Statement

Raw sequencing data and processed data tables reported in this article have been deposited to Gene Expression Omnibus (GEO) and are available under the SuperSeries accession number GSE270254.


Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
