Summary
The generation of distinct messenger RNA isoforms through alternative RNA processing modulates the expression and function of genes, often in a cell-type-specific manner. Here, we assess the regulatory relationships between transcription initiation, alternative splicing, and 3′ end site selection. Applying long-read sequencing to accurately represent even the longest transcripts from end to end, we quantify mRNA isoforms in Drosophila tissues, including the transcriptionally complex nervous system. We find that in Drosophila heads, as well as in human cerebral organoids, 3′ end site choice is globally influenced by the transcription start site (TSS). “Dominant promoters,” characterized by specific epigenetic signatures including p300/CBP binding, impose a transcriptional constraint to define splice and polyadenylation variants. In vivo deletion or overexpression of dominant promoters as well as p300/CBP loss disrupted the 3′ end expression landscape. Our study demonstrates the crucial impact of TSS choice on the regulation of transcript diversity and tissue identity.
Keywords: transcription, mRNA isoform, 5ʹ-3ʹ coupling, transcription start site, alternative polyadenylation, long-read sequencing, Drosophila, human brain organoids, nervous system, p300/CBP
Highlights
• A quantification of Drosophila and human nervous system full-length mRNA isoforms
• 3′ end site selection is coupled to alternative TSS usage
• Dominant promoters drive alternative polyadenylation through p300/CBP
• Conserved 5′-3′ couplings regulate tissue-specific functions
Where an mRNA transcript starts determines where it ends. Epigenetic marks characterize “dominant promoters” that constrain start and polyadenylation sites, driving tissue-specific preferential expression of transcript variants.
Introduction
Variation at each step of pre-messenger RNA (mRNA) synthesis impacts the coding and non-coding content of the mature transcript. Alternative splicing (AS) and alternative polyadenylation (APA) generate mRNA isoforms that differ in their coding sequence (CDS) or the length of their 3′ untranslated region (3′ UTR), thereby contributing to proteome diversity and fine-tuning gene expression. Alternative 3′ UTRs, through distinct sequence and structure elements that dictate interactions of the transcript with microRNAs and RNA-binding proteins (RBPs), regulate the encoded protein’s abundance, localization, and integration into protein complexes.1 APA modulates protein function in a context-specific, gene-specific, or cell-type-specific manner and is critically involved in a variety of cellular processes; indeed, numerous human diseases including cancer and neurological disorders2,3 are associated with APA deregulation. 3′ UTR genetic variants contribute to a substantial number of phenotypic traits and disease heritability,4,5 making APA a possible actionable target for therapeutic intervention.
The tissue- or context-specific regulation of APA is mediated through the activity of effectors such as transcription factors or RBPs. For example, in animals from flies to humans, the neuron-specific ELAV/Hu proteins inhibit splice site and proximal polyadenylation (poly(A)) site usage to mediate the formation of neuronal 3′ UTRs.6 Depending on cellular context, transcription elongation and termination factors interact with the cleavage and polyadenylation (CPA) machinery to enhance or inhibit 3′ end processing.7,8,9,10 The gene-specific regulation of APA is less well understood. Alternative 3′ UTR formation in individual mRNAs was shown to depend on sequence elements located in promoters or enhancers.11,12 Several studies provide evidence of a physical connection between transcription start sites (TSSs) and poly(A) sites (PASs): RBPs pervasively associate with promoter regions, as does the CPA machinery.13,14,15,16 Moreover, DNA methylation and CTCF recruitment influence APA,17 and gene loops affect alternative 3′ end processing in yeast,18 indicating a possible role for chromatin looping in 3′ end site selection. Together, such observations suggest that transcription regulation at promoters may be functionally coupled with APA; however, whether TSSs globally influence the selection of PASs remains unknown.19
The main challenge in determining the regulatory links that govern the choice of transcription initiation, splicing, and termination sites has been correlating different regions of a single transcript with one another: in particular, the 5′ end and the 3′ end of the same mRNA molecule, which typically lie several kilobases (kb) apart. Long-read sequencing (LRS) technologies now allow for full delineation of individual mRNA isoforms: a single read can cover a transcript from its 5′ end to its 3′ end.20,21 LRS has been successfully used to discover novel transcripts from repetitive regions, detect novel splice variants, identify interactions between alternative promoters and splicing of promoter-proximal exons, and identify coupling events in feature pairs including TSSs, exons, and PASs.22,23,24,25,26,27,28 Short-read sequencing and LRS of nascent RNAs have shed light on intertwined co-transcriptional processes29,30 and demonstrated, for example, the influence of splicing dynamics on CPA efficiency,31,32 indicating a widespread interdependency between alternative transcription and RNA processing. However, existing technologies have so far failed to resolve the link between 5′ ends and 3′ ends. Transcript isoform sequencing approaches that concurrently determine the start and end sites of individual RNA molecules, although well suited for determining transcript boundaries and their combinations,33 have not been employed to quantify couplings between 5′ and 3′ ends. Major technical limitations have precluded the systematic analysis of the regulatory relationship between transcription initiation and termination: LRS read distributions typically peak at 1–2 kb in length, resulting in truncations, underrepresentation of long isoforms, and 5′ or 3′ sequencing biases.22,34 Because full-length mRNA isoforms have been incompletely represented, it has not been possible to quantify the contribution of different TSSs of the same gene to the expression of distinct 3′ ends.
Here, we analyze the co-occurrence of mRNA features at the isoform level in the Drosophila nervous system, which is characterized by a particularly diverse transcriptome. We used multiple LRS approaches and developed a framework to accurately assess and quantify mRNA isoform usage, including the definition of true PASs. Our data demonstrate coupling between transcript 5′ ends and 3′ ends. We identify “dominant” promoters that are characterized by a unique epigenetic signature and outcompete the other promoters of the same gene to drive the expression of alternative, usually more distal, 3′ ends. Promoter dominance is widespread in Drosophila brains and human cerebral organoids and constitutes a major mechanism for regulating 3′ end site choice during transcription, generating select 5′ UTR-3′ UTR combinations in mature mRNAs.
Results
A combined isoform assembly reflects the Drosophila transcriptome
To examine regulatory links between transcription initiation, exon usage, and APA in Drosophila, we first developed a comprehensive LRS isoform annotation approach (Figure 1A). To cover as much of the coding transcriptome as possible, we used adult brains (the animal tissue with the greatest mRNA isoform diversity, in which mRNAs reach their most extreme lengths35,36), embryos at two developmental stages (14–16 and 18–20 h after egg laying [AEL]), and adult ovaries (Table S1). Critically, we size-selected mRNAs (enriching for transcripts >3 kb) using Sage Science BluePippin. We performed Oxford Nanopore Technologies (ONT) cDNA sequencing as well as Pacific Biosciences (PacBio) Iso-seq.25 Both LRS approaches use reverse transcription of polyadenylated RNAs and PCR amplification, followed by sequencing through a nanopore (ONT cDNA) or by single-molecule real-time (SMRT) technology (Iso-seq).
Internal priming and RT template switching cause misidentification of 3′ ends in most short-read and LRS approaches.21 To avoid these artifacts, we applied ONT direct RNA sequencing (DRS)37 and full-length poly(A) and mRNA sequencing (FLAM-seq),38 two independent LRS methods that detect the very end of poly(A) tails, and we defined the RNA cleavage site with nucleotide resolution. For a high-precision, high-coverage annotation of Drosophila TSSs, we used the Eukaryotic Promoter Database (EPD), a library of RNA polymerase II (RNA Pol II) promoters whose TSSs were determined experimentally, usually by cap analysis of gene expression (CAGE) or global run-on (GRO-cap).39 We found it crucial to consider only high-quality reads that span entire mRNA isoforms, from 5′ end to 3′ end. We assembled reads from each sequencing method individually using full-length alternative isoform analysis of RNA (FLAIR).40 Each assembly was refined to retain only transcripts with a TSS represented in the EPD and a 3′ end falling within a FLAM-seq or DRS cluster (Figures 1A, S1A, and S1B), which filtered out close to two-thirds of all putative transcripts (Tables S1–S3). The remaining transcripts were assembled into a combined isoform assembly (CIA). We detected transcripts with mean read lengths over 4 kb and obtained high full-length coverage of the long and ultra-long transcripts typical of the nervous system (Figures 1B, S1C, and S1D). Gene expression estimates from CIA transcripts were highly consistent with those from short-read mRNA-seq in each tissue. In contrast, gene expression estimates from nanopore sequencing of non-size-selected transcripts or from DRS deviated substantially from this gold standard (Figure S1E), showing that size selection, rather than biasing toward longer transcripts, allowed for a better representation of tissue transcriptomes.
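The transcript-level filter described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the CIA pipeline itself: the `Transcript` record, the dictionary layouts, and the 50-nt tolerance around EPD TSSs (`tss_window`) are hypothetical choices, and 3′ end clusters are assumed to be precomputed as intervals from DRS and FLAM-seq read ends.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    transcript_id: str
    chrom: str
    strand: str
    tss: int  # genomic coordinate of the transcript 5' end
    pas: int  # genomic coordinate of the transcript 3' end (cleavage site)


def near_any(sorted_positions, pos, window):
    """True if some position in the sorted list lies within +/- window of pos."""
    i = bisect_left(sorted_positions, pos - window)
    return i < len(sorted_positions) and sorted_positions[i] <= pos + window


def filter_transcripts(transcripts, epd_tss, end_clusters, tss_window=50):
    """Keep transcripts with an EPD-supported TSS and a 3' end inside a DRS/FLAM-seq cluster.

    epd_tss:      {(chrom, strand): sorted list of EPD TSS positions}
    end_clusters: {(chrom, strand): list of inclusive (start, end) cluster intervals}
    tss_window:   hypothetical tolerance (nt) between a read-derived TSS and an EPD TSS
    """
    kept = []
    for tx in transcripts:
        key = (tx.chrom, tx.strand)
        tss_ok = near_any(epd_tss.get(key, []), tx.tss, tss_window)
        end_ok = any(lo <= tx.pas <= hi for lo, hi in end_clusters.get(key, []))
        if tss_ok and end_ok:
            kept.append(tx)
    return kept
```

Both conditions must hold, mirroring the requirement that a retained transcript be supported at its 5′ end and at its 3′ end independently.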
To assess the quality of full-length reads, we analyzed CIA 5′ ends and 3′ ends. 5′ end pile-ups of ONT cDNA reads coincided with TSSs annotated in the EPD in 80% of cases; non-overlapping pile-ups fell within distal gene regions, usually 3′ UTRs, and lacked distinctive TSS features such as RNA Pol II ChIP-seq and ATAC-seq peaks (Figures S1F–S1H), indicating that Drosophila 5′ ends are annotated with high accuracy in the EPD. CIA 3′ ends harbor the characteristic nucleotide composition of cleavage sites,41 whereas filtered-out 3′ ends display noisy A-rich distributions reminiscent of sites of internal priming (Figure 1C). Strikingly, 3′ ends unique to the Ensembl reference globally displayed a noisy nucleotide distribution, indicating that many reference 3′ ends are mis-annotated (Figures S1I–S1L). We conclude that our stringent DRS- and FLAM-seq-guided filtering effectively identified false 3′ ends. Thus, we generated a Drosophila mRNA isoform atlas comprising 59,970 high-confidence, full-length transcripts. The CIA atlas, which reports the differential expression and poly(A) tail length of each mRNA isoform in heads, ovaries, and embryos, can be accessed at https://hilgerslab.shinyapps.io/ciaTranscriptome.
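To illustrate how cleavage-site composition separates genuine 3′ ends from internal-priming artifacts, the sketch below builds per-position nucleotide frequencies in an RNA-oriented window around putative cleavage sites and reports the A content of the genomic sequence just downstream, where oligo(dT) could anneal. The window sizes, the 0-based coordinate convention, and the in-memory `genome` dictionary are assumptions made for illustration.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def site_window(genome, chrom, strand, pos, upstream=20, downstream=20):
    """RNA-oriented window around a cleavage site (0-based pos of the last transcribed nt).

    Index upstream - 1 of the returned string is the last transcribed nucleotide.
    Returns None if the window runs off the chromosome.
    """
    seq = genome[chrom]
    if strand == "+":
        lo, hi = pos - upstream + 1, pos + downstream + 1
    else:
        lo, hi = pos - downstream, pos + upstream
    if lo < 0 or hi > len(seq):
        return None
    window = seq[lo:hi].upper()
    return window if strand == "+" else revcomp(window)


def positional_composition(genome, sites, upstream=20, downstream=20):
    """Per-position A/C/G/T frequencies across all sites; sites are (chrom, strand, pos)."""
    windows = [site_window(genome, c, s, p, upstream, downstream) for c, s, p in sites]
    windows = [w for w in windows if w]
    if not windows:
        return []
    profile = []
    for i in range(upstream + downstream):
        counts = Counter(w[i] for w in windows)
        profile.append({nt: counts[nt] / len(windows) for nt in "ACGT"})
    return profile


def downstream_a_fraction(window, upstream=20):
    """A content just downstream of the putative end; high values suggest internal priming."""
    tail = window[upstream:]
    return tail.count("A") / len(tail) if tail else 0.0
```

Reverse-complementing minus-strand windows lets sites from both strands be stacked into a single profile read 5′ to 3′ on the RNA.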
We identified over 30,000 previously undescribed mRNA isoforms. Novel splice variants harbored canonical splicing signals and therefore likely arose from new combinations of known splice sites. In contrast, nearly 9,000 isoforms were characterized by unannotated 3′ end sites (Figures S2A–S2E). Strikingly, isoform novelty drastically increased with transcript length, especially in heads and embryos, two tissues that contain neurons (Figures 1D and S2C), confirming the improved detection of long isoforms of neuronal mRNAs. CIA mRNA isoforms originate from 11,310 genes, 5,020 of which were found to be expressed in all three analyzed tissues. Interestingly, over 80% of these genes are expressed as at least one identical isoform in all three tissues; although most genes expressed in heads were also expressed in other tissues, most CIA isoforms (35,170 out of 59,970) were found exclusively in head samples (Figures 1E, 1F, and S2F). We sequenced neural tissues much more deeply than ovaries and embryos (Table S1), which contributed to, but did not solely account for, the disproportionate representation of brain isoforms (Figure S2G). Our data are consistent with the neural-specific splicing pattern complexity described by modENCODE35 and further illuminate the astonishing isoform diversity of the nervous system.
We next investigated ultra-long mRNAs (>5 kb) of the nervous system more closely. Compared with ovaries and embryos, 3′ UTRs contribute disproportionately to transcript length in head tissue (Figure S2H), consistent with the nervous-system-specific 3′ UTR lengthening seen in multiple animal models.42,43,44,45 Moreover, nervous system transcripts display surprisingly long poly(A) tails, whose length increases with transcript length (Figures S2I and S2J). This trend, which has also been described in human cells and C. elegans,38 suggests a conserved coupling between distal PAS selection and tail length, possibly reflecting distinct turnover kinetics and a potential role for long poly(A) tails in protecting ultra-long transcripts.
Coupling between transcript 5′ ends and 3′ ends
The CIA transcriptome now allows us to quantify the co-occurrence of distinct co-transcriptional events in full-length mRNA isoforms. We focused on the regulatory relationships between transcription initiation and transcription termination. First, we categorized genes based on the number of TSSs and PASs identified in the CIA dataset (Figure 2A). We found that genes with alternative TSS usage (ATSS) undergo APA disproportionately often, and vice versa (Figures 2B and S2K); moreover, 3′ end diversity increases as a function of TSS number, and vice versa (Figures S2L and S2M). This suggests that ATSSs may have evolved to drive the production of distinct 3′ ends. To study couplings between TSSs and PASs, we quantified the differential use of 3′ ends as a function of the 5′ end with which they are associated. We term the co-occurrence of a given 5′ end and 3′ end in the same full-length CIA transcript a “5′-3′ isoform.” Importantly, many of the 5′-3′ isoforms detected with our sensitive LRS approach may result from unproductive transcription and represent “noise” rather than biologically relevant isoforms. To eliminate these, we applied an expression cutoff of >2 transcripts per million (TPM). We found over 16,000 5′-3′ isoforms, almost 7,000 of which were novel (Figure S2N). We subsampled ONT cDNA reads and assessed the number of identified 5′-3′ isoforms for each fraction and for different expression categories. Above the cutoff, we reached near-saturation of 5′-3′ isoform detection, even for genes with multiple TSSs and multiple PASs (ATSS-APA genes) (Figure S2O), strongly suggesting that our analysis faithfully represents the 5′-3′ isoform landscape in Drosophila tissues.
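A minimal sketch of the 5′-3′ isoform bookkeeping, assuming each full-length read has already been assigned a `(gene, tss_id, pas_id)` tuple. Because each full-length read represents one molecule, TPM is computed here simply as reads per million assigned reads (an assumption; no length normalization is applied). The subsampling function counts how many above-cutoff isoforms are seen at least once in random read fractions, one simple way to draw a saturation curve; the fractions and seed are hypothetical.

```python
import random
from collections import Counter


def isoform_tpm(reads):
    """TPM per 5'-3' isoform; each full-length read is one (gene, tss_id, pas_id) tuple."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {iso: n * 1e6 / total for iso, n in counts.items()}


def expressed_isoforms(reads, min_tpm=2.0):
    """5'-3' isoforms passing the expression cutoff (> min_tpm)."""
    return {iso for iso, tpm in isoform_tpm(reads).items() if tpm > min_tpm}


def saturation_curve(reads, fractions=(0.1, 0.25, 0.5, 0.75, 1.0), min_tpm=2.0, seed=0):
    """How many above-cutoff isoforms are seen (>= 1 read) in random read subsamples."""
    rng = random.Random(seed)
    reference = expressed_isoforms(reads, min_tpm)
    curve = []
    for fraction in fractions:
        subsample = rng.sample(reads, int(len(reads) * fraction))
        curve.append((fraction, len(reference & set(subsample))))
    return curve
```

A curve that plateaus well before the full read set indicates that additional sequencing would add few new isoforms above the cutoff.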
TSSs drive the selection of tissue-enriched 3′ end sites
To assess whether APA is driven by the use of distinct TSSs, we first asked whether tissue-specific 3′ end expression is associated with tissue-specific 5′ ends. Ovaries and heads are the two tissues at the extremes of the APA spectrum, with shifts toward proximal and distal PAS selection, respectively.43 We calculated differential 3′ end and 5′ end expression between the two tissues to identify “nervous-system 3′ ends” and “ovary 3′ ends,” and we then assessed differential 5′-3′ isoform expression in genes expressed in both tissues (Figure S2P). We discovered that for over half of all ATSS-APA genes, at least one 5′-3′ isoform is enriched in one tissue compared with the other, representing a significant 5′-3′ link (Figure 2C; Table S4), and that distinct TSSs are specifically associated with 3′ ends that are differentially expressed between the two tissues (Figure 2D). Moreover, almost half of all nervous-system 3′ ends were specifically expressed from a nervous-system TSS, and vice versa (Figure S2Q). In genes with several significant 5′-3′ links, we almost always observe a pattern of bidirectionality in which one 5′-3′ isoform is enriched in heads while another is enriched in ovaries (Figures 2E, 2F, and S2R). Our results show that ovary- and head-specific PAS usage is linked to the alternative use of TSSs and suggest that TSSs influence PAS selection.
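One simple way to test whether a single 5′-3′ isoform is enriched in heads versus ovaries is a 2×2 test comparing its share of the gene's reads in each tissue. The sketch below uses a two-sided Fisher's exact test; this is a stand-in chosen for illustration, and the differential-usage statistics behind Figure 2C may differ. Count dictionaries keyed by `(tss_id, pas_id)` are a hypothetical input format.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):  # hypergeometric probability of the table with top-left cell x
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum over all tables with the same margins that are no more probable than the observed one
    p = sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def isoform_tissue_enrichment(head_counts, ovary_counts, isoform):
    """Compare one 5'-3' isoform's share of its gene's reads between heads and ovaries.

    head_counts / ovary_counts: {(tss_id, pas_id): read count} for a single gene.
    Returns (share in heads - share in ovaries, Fisher p value).
    """
    a = head_counts.get(isoform, 0)
    b = sum(head_counts.values()) - a
    c = ovary_counts.get(isoform, 0)
    d = sum(ovary_counts.values()) - c
    head_share = a / (a + b) if a + b else 0.0
    ovary_share = c / (c + d) if c + d else 0.0
    return head_share - ovary_share, fisher_exact_two_sided(a, b, c, d)
```

A positive share difference with a small p value would flag a head-enriched 5′-3′ isoform; across many genes, the p values would still need multiple-testing correction.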
To functionally test this hypothesis in vivo, we used the CRISPR transcriptional activator (CRISPRa) system, in which a catalytically dead Cas9 (dCas9) fused to the VPR activator domain can be recruited to the upstream TSSs of individual genes by single-guide RNAs (sgRNAs).46 We tested all “bidirectional” genes for which an sgRNA strain was available (53 genes) and in which the upstream TSS was head- or ovary-enriched (23 TSSs, Figure 2G; Table S5). TSS activation failed in ovaries for all tested sgRNAs except one; in heads, we obtained significant gene activation for six nervous-system TSSs. In all cases, activation of the nervous-system TSS caused a specific increase in the expression of the linked nervous-system 3′ end (Figures 2H and S2S). Notably, induction of the Malvolio (Mvl) nervous-system TSS in ovaries caused ectopic expression of the linked nervous-system 3′ end, demonstrating that specific TSS activation is sufficient to drive atypical 5′-3′ isoform expression (Figure 2I). Our data thus show that the site of transcription initiation drives head-specific 3′ end site usage, thereby crucially contributing to the establishment of the distinct 3′ UTR landscape of the nervous system.
Co-expression of multiple 5′-3′ isoforms in neuronal cell types
The coordination between tissue-specific TSSs and APA could be mediated by tissue-specific trans-acting factors; for example, the pan-neuronal RBP ELAV promotes APA of individual genes in a TSS-dependent manner.11 To explore the regulation of co-transcriptional processing independently of the cellular environment, we investigated 5′-3′ links at the gene level in a single tissue, the brain, in which ATSS and APA are particularly abundant. Since APA isoform usage displays cell-to-cell heterogeneity,47 and some 3′ ends can be specific to certain cell populations,48 we assessed whether the 5′-3′ links that we identified in Drosophila heads tend to be expressed in the same cell or whether, on the contrary, distinct isoforms are exclusive to different cell types. Using the Drosophila brain atlas,49 we evaluated every CIA 3′ end at the single-cell level and quantified the co-occurrence of different 3′ ends of the same gene in each of the 177 cell types described in the dataset. We found that the majority of ATSS-APA genes are co-expressed as several APA isoforms in most cell types, and we did not detect a general trend of mutually exclusive 3′ end isoform expression within the brain (Figures S3A–S3C). We conclude that differential usage of TSSs and PASs can occur within the same cell type, independently of tissue-specific or cell-type-specific factors. Hence, we can use the nervous system 5′-3′ isoform dataset to probe PAS preference within the same cell populations.
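The per-cell-type co-occurrence count can be sketched as below, assuming 3′ end-level expression values have already been aggregated per annotated cell type. The nested dictionary layout and the `min_expr` detection threshold are hypothetical.

```python
def coexpression_fraction(expr, gene, min_expr=0.1):
    """Among cell types expressing `gene`, the fraction that express >= 2 of its 3' ends.

    expr: {cell_type: {(gene, pas_id): expression value}}
    min_expr: hypothetical detection threshold in the atlas' expression units
    """
    expressing = coexpressing = 0
    for values in expr.values():
        detected = [pas for (g, pas), v in values.items() if g == gene and v >= min_expr]
        if detected:
            expressing += 1
            coexpressing += len(detected) >= 2
    return coexpressing / expressing if expressing else 0.0
```

Values close to 1 for a gene would argue against mutually exclusive, cell-type-restricted 3′ end usage.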
Global bias of 3′ end site selection depending on the TSS
The identification of full-length gene isoforms of ATSS-APA genes in heads revealed that in many cases (e.g., stai), distinct PASs were preferentially associated with specific TSSs, while for other genes (e.g., Act5C) there was no such bias (Figure 3A). We set out to assess whether the competitive use of PASs is regulated at the site of transcription start. To discern regulatory links transcriptome-wide between transcription start and 3′ end formation, we developed the computational framework long-reads-based alternative termination estimation and recognition (LATER) (Figure 3B). For all ATSS-APA genes, for a given PAS, we calculated the frequency of association of each TSS with the expression of the associated 3′ end (Figures 3A, 3B, and S3D). We defined two modes of 3′ end site selection in ATSS-APA genes: “TSS-unbiased,” in which the association frequencies of distinct TSSs with a given 3′ end did not significantly differ; and “promoter dominance,” in which one TSS was disproportionately associated with the expression of a specific 3′ end. Strikingly, deviations from the expected proportions were the rule rather than the exception, with most (55%) ATSS-APA genes displaying promoter dominance in at least one tissue (Figures 3C, S3E, and S3F; Table S4).
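The core comparison behind the two modes can be sketched as a test of independence on a TSS × PAS read-count table for one gene. This is not the LATER implementation: as an assumed stand-in, the sketch uses a chi-square statistic with a label-permutation p value (which keeps both margins fixed), and it calls the TSS with the largest positive Pearson residual for a PAS its dominant promoter.

```python
import random
from collections import Counter


def contingency(pairs):
    """TSS x PAS read-count table for one gene; pairs are (tss_id, pas_id) per full-length read."""
    tss_ids = sorted({t for t, _ in pairs})
    pas_ids = sorted({p for _, p in pairs})
    counts = Counter(pairs)
    return tss_ids, pas_ids, [[counts[(t, p)] for p in pas_ids] for t in tss_ids]


def expected(table):
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return [[r * c / n for c in cols] for r in rows]


def chi_square(table):
    exp = expected(table)
    return sum((o - e) ** 2 / e
               for orow, erow in zip(table, exp)
               for o, e in zip(orow, erow) if e > 0)


def tss_bias_test(pairs, n_perm=999, seed=0):
    """Permutation p value for the null that PAS choice is independent of the TSS (TSS-unbiased)."""
    observed = chi_square(contingency(pairs)[2])
    tss_labels = [t for t, _ in pairs]
    pas_labels = [p for _, p in pairs]
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pas_labels)  # keeps both margins fixed
        if chi_square(contingency(list(zip(tss_labels, pas_labels)))[2]) >= observed - 1e-9:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


def dominant_tss(pairs):
    """For each PAS, the TSS with the largest positive Pearson residual (obs - exp) / sqrt(exp)."""
    tss_ids, pas_ids, table = contingency(pairs)
    exp = expected(table)
    result = {}
    for j, pas in enumerate(pas_ids):
        residuals = [(table[i][j] - exp[i][j]) / exp[i][j] ** 0.5 for i in range(len(tss_ids))]
        best = max(range(len(tss_ids)), key=residuals.__getitem__)
        if residuals[best] > 0:
            result[pas] = tss_ids[best]
    return result
```

With few reads per gene, the permutation approach avoids relying on the asymptotic chi-square distribution; in a transcriptome-wide scan, the resulting p values would still need multiple-testing correction.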
Highly expressed genes were reported to display predominantly short 3′ UTRs, and stronger promoters were found to favor the selection of proximal PASs in reporter assays,50 consistent with the idea that high transcriptional activity enhances 3′ end processing on a first-come, first-served basis.51 In contrast, a fast RNA Pol II elongation rate correlates with the use of more distal PASs in yeast.52 However, we did not observe any significant difference in the expression levels of isoforms from the dominant promoters we identified (Figure S3G); importantly, full-length 5′-3′ isoform detection and classification as dominant-promoter isoforms were not biased by read length for transcripts up to 10 kb long (Figures S3H and S3I). Therefore, neither transcript length nor TSS strength can explain PAS selection in cases of promoter dominance. By quantitatively assessing individual 5′-3′ isoforms, we demonstrate a global effect of TSS selection on differential 3′ end expression, causally linking transcription initiation to termination.
Dominant promoters override strong poly(A) signals and constrain AS
We asked whether dominant promoters showed a propensity to override well-defined rules of mRNA processing. For APA genes, differential 3′ end expression is thought to depend on PAS “strength”: unless specifically inhibited in trans, PASs containing the hexamer AAUAAA and variants thereof are rarely bypassed to produce a more distal 3′ end.53,54 For APA genes with a single promoter, the presence of the AAUAAA sequence was indeed a predictor of proximal PAS usage in our dataset, and skipping of the proximal PAS usually occurred in the absence of a poly(A) signal. Strikingly, ATSS-APA genes with dominant promoters showed the opposite trend; in fact, proximal PASs containing AAUAAA were preferentially skipped in transcripts arising from a dominant promoter (Figure 3D).
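The poly(A) signal classification used in this kind of comparison can be sketched as a hexamer scan upstream of the cleavage site. Assumptions: the 40-nt search window (the hexamer typically lies roughly 10 to 30 nt upstream of the cleavage site), the single listed variant (AUUAAA; published variant lists are longer), and the record format for counting reads that end at or read through a proximal PAS.

```python
from collections import defaultdict

CANONICAL = "AATAAA"
# Illustrative variant list only: AUUAAA is the most frequent variant; extend as needed.
VARIANTS = ("ATTAAA",)


def classify_pas(upstream_seq, window=40):
    """Classify a PAS by the strongest hexamer in the last `window` nt before the cleavage site.

    upstream_seq: RNA-sense sequence (DNA or RNA alphabet) ending at the cleavage site.
    """
    region = upstream_seq[-window:].upper().replace("U", "T")
    if CANONICAL in region:
        return "canonical"
    if any(v in region for v in VARIANTS):
        return "variant"
    return "none"


def skipping_by_signal(records, window=40):
    """Fraction of reads that read through a proximal PAS, grouped by poly(A) signal class.

    records: iterable of (upstream_seq, reads_ending_at_proximal_pas, reads_using_a_more_distal_pas)
    """
    totals = defaultdict(lambda: [0, 0])
    for seq, used, skipped in records:
        t = totals[classify_pas(seq, window)]
        t[0] += used
        t[1] += skipped
    return {cls: s / (u + s) for cls, (u, s) in totals.items() if u + s}
```

Computing these skipping fractions separately for single-promoter genes and for reads from dominant promoters would reproduce the kind of contrast described above.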
Next, we tested whether splicing plays a role in the observed 5′-3′ couplings, possibly as the regulatory intermediate between dominant promoters and 3′ end site selection. First, we ensured that splice isoform coverage in long reads was sufficient to assess exon-exon junction choice. Except for isoforms identified with a single read, which likely represent very rare or aberrant variants, we reached saturation of splice isoform detection (Figures S3J and S3K). We developed long-reads-based AS estimation and recognition (LASER), based on the same principles as LATER (Figure S3L), to identify disproportionate association frequencies between distinct TSSs and exon-exon junctions (“TSS-exon links”) as well as between exon-exon junctions and PASs (“exon-PAS links”). Compared with TSS-PAS links (promoter dominance), we identified surprisingly little coupling between AS and APA, with significant links in about 10% of AS-APA genes (Figures 3E, S3M, and S3N; Table S4). A significant link between AS and 3′ end site selection was seen in about one-third of genes with a dominant promoter; for example, stai exons 4 and 8 are almost mutually exclusively associated with distinct PASs and their respective dominant promoters (Figures 3A and 3F). This enrichment, together with the lack of a systematic association between AS and APA, led us to hypothesize that exon-PAS couplings are a consequence, not a causal intermediate, of the influence of dominant promoters on co-transcriptional processing. Indeed, we find that in ATSS-APA genes, exon-PAS links almost always (88%) occur when transcription starts from a dominant promoter. Moreover, exon-PAS links are significantly weaker in the absence of a dominant promoter (Figures 3F and 3G). We conclude that in ATSS-APA genes, AS is not a necessary intermediate step for biased 3′ end selection by dominant promoters, although it may influence APA in individual cases.
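The exon-PAS side of such an analysis starts by turning each full-length read into its exon-exon junctions and tabulating junction presence against 3′ end usage. The sketch assumes exon blocks in ascending plus-strand coordinates and produces a 2×2 table per junction-PAS pair that could then be tested with any standard 2×2 test, such as Fisher's exact test; it is not the LASER implementation.

```python
from collections import Counter


def junctions(exons):
    """Exon-exon junctions of one read; exons are (start, end) blocks in ascending plus-strand order."""
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]


def junction_pas_table(reads, junction, pas):
    """2x2 counts for one junction and one PAS.

    reads: iterable of (exon_blocks, pas_id), one entry per full-length read of a gene.
    Rows: read contains / lacks the junction; columns: read ends / does not end at `pas`.
    """
    counts = Counter()
    for exons, read_pas in reads:
        counts[(junction in junctions(exons), read_pas == pas)] += 1
    return [[counts[(True, True)], counts[(True, False)]],
            [counts[(False, True)], counts[(False, False)]]]
```

Because every read is full length, a read lacking a junction genuinely skipped it rather than simply not covering that region.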
Together, our findings indicate that sites of transcription initiation direct APA independently of poly(A) signal strength and also impose a constraint on other RNA processing events such as splicing.
To functionally validate 5′-3′ links and verify that 3′ end choice is mediated by dominant promoters in vivo, we generated the fly mutant orbΔDP, in which the dominant promoter of the gene orb was specifically deleted by CRISPR-Cas9-mediated gene editing. Orb possesses two 3′ ends and three TSSs, with the first TSS dominantly associated with the distal-most 3′ end (Figure 3H). In orbΔDP embryos, expression of the distal but not the proximal 3′ end was massively depleted (Figure 3I). Our data thus show that dominant promoters influence PAS selection and can mediate skipping of canonical poly(A) signals to favor more distal sites of transcription termination.
3′ end site selection through promoter dominance impacts transcriptome identity and gene function
To assess the functional importance of the reported connection between TSSs and PASs, we first sought to determine if it is evolutionarily conserved. We found that 5′ UTRs transcribed from dominant promoters and 3′ UTR sequences generated via dominant-promoter-associated PASs (“dominant-promoter-3ʹ UTRs”) were more conserved than their non-dominant and unlinked counterparts, respectively (Figures 4A and 4B). Following the notion that functional interactions can be detected through evolutionary couplings,55 we performed a mutual information analysis56 to test whether 3′ end site regions and their dominant promoters mutate jointly to maintain genetic interactions. We calculated the co-evolution score for each pair of nucleotide positions within the gene stai. Strikingly, a cluster of high-scoring nucleotide pairs could be identified between 3′ UTR sequences and regulatory regions upstream of the linked dominant promoter, but not the non-dominant promoter. Act5C, a gene with no TSS bias, did not display any 5′-3′ co-evolution clusters (Figure 4C). We performed a more global analysis, selecting 100 ATSS-APA genes (top and bottom 50 by promoter dominance p value), and scored, for each gene, co-evolution clusters in nucleotide pair matrices between 5′ end regions (TSS − 1 kb) and the 3′ end region (3′ UTR). We found that co-evolution scores were significantly higher for dominant promoters, compared with other TSSs; most dominant-promoter genes, but not TSS-unbiased genes, showed strong co-evolution between 5′ end and associated 3′ end sequences (Figures 4D and 4E). Our results show not only that sequences generated directly (5′ UTRs) or indirectly (linked 3′ UTRs) from dominant promoters are conserved but also that evolutionary pressure maintains the link between them.
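The mutual information score between two alignment columns can be sketched as follows. It assumes that the promoter region and the 3′ UTR have been aligned separately across the same set of species, treats gap characters as ordinary symbols, and omits corrections commonly used in co-evolution analyses (for example, sequence weighting or average product correction). The cluster score, a fraction of position pairs above a hypothetical MI threshold, is a simplification of the cluster scoring described above.

```python
from collections import Counter
from math import log2


def column_mi(col_a, col_b):
    """Mutual information (bits) between two alignment columns (one residue per species)."""
    n = len(col_a)
    pa, pb, pab = Counter(col_a), Counter(col_b), Counter(zip(col_a, col_b))
    return sum((c / n) * log2(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())


def cross_region_mi(aln_5p, aln_3p):
    """MI for every (5' column, 3' column) pair.

    aln_5p, aln_3p: aligned sequences of the promoter region and the 3' UTR, one per species,
    listed in the same species order.
    """
    cols_5p = list(zip(*aln_5p))
    cols_3p = list(zip(*aln_3p))
    return [[column_mi(a, b) for b in cols_3p] for a in cols_5p]


def coevolution_score(mi_matrix, threshold=0.5):
    """Fraction of 5'-3' position pairs with MI at or above a hypothetical threshold."""
    values = [v for row in mi_matrix for v in row]
    return sum(v >= threshold for v in values) / len(values)
```

Columns that vary in perfect concert across species score 1 bit for a binary alphabet split, while invariant columns score 0, which is why conserved-but-invariant positions do not inflate the score.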
We next computationally predicted the consequences of disrupting TSS-PAS links and the ensuing 3′ end mis-selection. In Drosophila heads, differential 3′ end site selection by dominant promoters results in a change in the protein CDS, 3′ UTR lengthening, and 3′ UTR shortening in 40%, 42%, and 18% of cases, respectively. A substantial amount of regulatory 3′ UTR sequence is gained through dominant-promoter-mediated 3′ UTR lengthening (Figure S4A). We sought to quantify the influence of dominant promoters by computing the occurrence, in either 3′ UTR isoform, of potential binding sites for neuronal RBPs and microRNAs that are highly conserved and enriched in fly heads, since these are more likely to exert a functionally relevant effect on target mRNAs.57 Interestingly, binding motifs for miR-277, a microRNA involved in synaptogenesis with a possible role in neurodegeneration,58,59 were the most affected by dominant-promoter-mediated 3′ UTR lengthening (Figure S4B). In addition, dominant-promoter 3′ UTRs were enriched in putative binding sites for RBPs well known for specialized neuronal roles, such as pumilio (Pum) and alan shepard (Shep), as well as for miR-2279, a poorly expressed and conserved microRNA that is nonetheless predicted to target neural pathways related to axonal projections (Figures S4C–S4F). This suggests that dominant-promoter-associated 3′ UTR sequences function in the isoform-specific regulation of the encoded protein. Our analyses predict that disruption of conserved TSS-PAS links causes a widespread mis-selection of 3′ end sites, resulting in loss of tissue-specific protein isoforms and of 3′ UTR-mediated regulation by microRNAs and RBPs, strongly suggesting that regulation through dominant promoters is functionally relevant for animal fitness.
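Counting candidate microRNA sites gained through 3′ UTR lengthening can be sketched with canonical 7mer-m8 seed matches (the reverse complement of miRNA nucleotides 2 to 8) in the sequence between the proximal and distal PAS. The sketch assumes tandem 3′ UTRs and considers only one site type; its test uses a hypothetical miRNA sequence, not miR-277 or miR-2279. Dedicated target-prediction tools consider more site types and sequence context.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def seed_site(mirna):
    """7mer-m8 target site: reverse complement of miRNA nucleotides 2-8 (DNA alphabet)."""
    seed = mirna.upper().replace("U", "T")[1:8]
    return seed.translate(COMPLEMENT)[::-1]


def count_sites(seq, site):
    """Count (possibly overlapping) occurrences of site in seq."""
    return sum(seq.startswith(site, i) for i in range(len(seq) - len(site) + 1))


def gained_sites(proximal_utr, distal_utr, mirna):
    """Seed matches located in the extension that a distal PAS adds to a tandem 3' UTR."""
    proximal = proximal_utr.upper().replace("U", "T")
    distal = distal_utr.upper().replace("U", "T")
    if not distal.startswith(proximal):
        raise ValueError("expected tandem UTRs: distal UTR must extend the proximal UTR")
    return count_sites(distal[len(proximal):], seed_site(mirna))
```

Restricting the count to the extension isolates the regulatory sequence that exists only in the dominant-promoter-associated long isoform.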
A combination of epigenetic features defines the chromatin environment of dominant promoters
One possible interpretation of the observed 5′-3′ coupling is that dominant promoters possess a characteristic that subjects the nascent transcript to modified rules of co-transcriptional processing. Splicing and 3′ end cleavage have been shown to be influenced by the presence of particular chromatin elements at the sites of transcription initiation and termination, respectively.60 We set out to determine whether dominant promoters share a common regulatory feature that mediates coupling between TSS and PAS. We analyzed ChIP-seq data generated in Drosophila heads (modENCODE61) to assess the in vivo location of over 40 histone marks, histone variants, and transcription factor binding sites. We found that promoter regions of ATSS-APA genes, while not displaying any notable enrichment in RNA Pol II or common repressive or active chromatin marks, were strongly depleted of the histone variant H2A.Z. Conversely, acetylation of histone H3 at lysine 18 (H3K18ac), a histone mark associated with gene activation and transcriptional priming in developmental transitions,62 was specifically enriched around the TSS of ATSS-APA genes (Figure S5A).
We grouped TSS regions genome-wide according to H2A.Z and H3K18ac ChIP-seq signal, which generated five clusters with distinct H2A.Z and H3K18ac patterns. Clusters 1 and 2 were characterized by H2A.Z depletion concomitant with H3K18ac enrichment. Strikingly, these two clusters included significantly more dominant promoters than the other three clusters (Figures 5A and 5B; Table S4), suggesting that H2A.Z depletion and H3K18ac enrichment are common characteristics of dominant promoters. Next, we assessed transcription factor binding at the TSS and linked 3′ end of dominant-promoter genes in fly heads using the ReMap 2022 database.63 We found coupled enrichment of 20 factors at both transcription initiation and termination sites of these genes (Figures S5B–S5D); most notably, the highly conserved acetyltransferase Nejire (Nej, also known as p300 or CREB-binding protein, CBP) was the factor most frequently found at dominant promoters and at their associated 3′ ends (Figures 5C and 5D). Fly and mammalian CBP promote the proper deposition of H3K18ac,64,65 the histone mark we found enriched around dominant promoters. Together, these data indicate that dominant promoters of ATSS-APA genes are characterized by a specific epigenetic landscape, partially established by the presence of CBP.
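The grouping of TSS regions by chromatin signal can be sketched as k-means clustering of per-TSS summary features followed by a per-cluster count of dominant promoters. The clustering method, the two-value feature vector (for example, mean H2A.Z and mean H3K18ac signal around each TSS), and the deterministic farthest-point initialization are assumptions; the clustering underlying Figure 5A may have used full signal profiles or another algorithm.

```python
def _dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialization.

    points: list of feature tuples, e.g. (mean H2A.Z signal, mean H3K18ac signal) around each TSS.
    """
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(_dist2(p, c) for c in centers)))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: _dist2(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(vals) / len(members) for vals in zip(*members))
    return labels, centers


def dominant_fraction_by_cluster(labels, is_dominant):
    """Share of TSSs classified as dominant promoters within each cluster."""
    totals = {}
    for lab, dom in zip(labels, is_dominant):
        n, m = totals.get(lab, (0, 0))
        totals[lab] = (n + 1, m + int(dom))
    return {lab: m / n for lab, (n, m) in totals.items()}
```

Farthest-point initialization keeps the output deterministic for a given input order; whether the per-cluster fractions differ significantly would still need a separate enrichment test.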
p300/CBP mediates dominant-promoter-driven 3′ end site selection
To test whether CBP is instructive for the selection of alternative PASs, we performed mRNA-seq and assessed 3′ end usage in two independent CBP mutants. We used 14- to 16-h embryos, a stage at which maternally deposited CBP has been depleted but embryos still show normal gross morphology. The absence of zygotic CBP caused a widespread impairment of the embryonic 3′ end landscape: 21% of all expressed APA genes displayed a change in 3′ end site selection, characterized by a significant upregulation or downregulation of RNA expression downstream of the proximal PAS relative to upstream regions (Figure 5E). Strikingly, the affected genes are those that display, in wild-type flies, CBP ChIP signal at both the TSS and the associated PAS (clusters 1 and 2), whereas APA was largely unaffected in genes with CBP signal only at the TSS (cluster 3, Figure 5E; Table S4). PAS shifts were more frequent and more pronounced in dominant-promoter genes than in TSS-unbiased genes (Figures 5F–5H), demonstrating that p300/CBP mediates, at least partially, dominant-promoter-driven 3′ end site selection. In contrast, mutation of any of three other factors that we had found enriched at the TSS and PAS of dominant-promoter genes (Enhancer of zeste [E(z)], Deformed [Dfd], and Posterior sex combs [Psc]) had little to no effect on PAS usage (Figure S5E). We propose that, in addition to CBP, other factors are involved in the promoter-mediated regulation of APA, both globally and on a gene-by-gene basis. Such factors may include chromatin modifiers, AS regulators, and transcription factors.
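The PAS-shift readout described above can be sketched as a ratio of ratios: reads downstream versus upstream of the proximal PAS in the mutant, relative to the same ratio in wild type. Because the same regions are compared in both genotypes, region length cancels out. The pseudocount and the twofold cutoff (`threshold=1.0` on the log2 scale) are hypothetical; the significance testing used in the study is not reproduced here.

```python
from math import log2


def pas_shift(up_wt, down_wt, up_mut, down_mut, pseudocount=1.0):
    """log2 change in the downstream/upstream read ratio around the proximal PAS, mutant vs wild type.

    up_*:   reads in the common region upstream of the proximal PAS
    down_*: reads in the extended region between the proximal and distal PAS
    Positive values indicate a shift toward distal PASs; negative values, toward proximal PASs.
    """
    ratio_wt = (down_wt + pseudocount) / (up_wt + pseudocount)
    ratio_mut = (down_mut + pseudocount) / (up_mut + pseudocount)
    return log2(ratio_mut / ratio_wt)


def classify_shift(value, threshold=1.0):
    """Hypothetical fold-change cutoff; a real analysis would use replicate-aware statistics."""
    if value >= threshold:
        return "distal shift"
    if value <= -threshold:
        return "proximal shift"
    return "unchanged"
```

Normalizing to the upstream region separates a change in PAS choice from a change in overall gene expression.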
TSS influence on isoform choice is a conserved regulatory mechanism
To assess whether TSS-mediated PAS selection is conserved in mammals, we performed our LRS-based analysis in human cerebral organoids, an in vitro model of the human brain.66 Coupling FLAM-seq with ONT cDNA sequencing and size selection, we generated an organoid CIA dataset that includes many novel long mRNA isoforms, and we defined highly accurate 5′-3′ isoforms in ATSS-APA genes (Figures 6A and S6A; Tables S1–S3). Because FLAM-seq identified only 16,840 3′ end sites, we performed 3′ end sequencing (3′-seq) and predicted additional high-confidence 3′ end sites based on the nucleotide composition of FLAM 3′ ends, thereby substantially expanding the 3′ end database (see STAR Methods). As in Drosophila, the presence of ATSSs in human organoids was associated with APA (Figure 6B). We applied LATER to the human dataset and found that over a third of ATSS-APA genes display a TSS bias, in which 3′ end choice is influenced by the promoter (Figures 6C and S6B; Table S4), in many cases mediated by skipping of the proximal canonical poly(A) signal (Figures 6D and 6E). The lack of ChIP-seq data from human neural tissue prevented us from identifying a clear TSS signature of dominant promoters, as we did in Drosophila. However, a transcription factor enrichment analysis using the ReMap 2022 database63 showed that factors associated with APA,12 such as FOXA1 and p300/CBP, were enriched at dominant promoters and/or their linked 3′ ends in human cells as well (Figure S6C). We conclude that dominant promoters impose a conserved transcriptional constraint on isoform choice, often mediating the usage of more distal PASs. The epigenetic signatures at these sites may have evolved to aid in the recruitment of transcription and processing factors, including p300/CBP, that execute this program, which is determined at the time of transcription initiation.
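LATER's own scoring is documented in its repository rather than in this paragraph. To make the notion of a "TSS bias" concrete, the sketch below measures how strongly PAS choice depends on TSS choice within a single gene. It computes a chi-square statistic and Cramér's V on a TSS-by-PAS table of full-length read counts. This generic association measure was chosen for illustration and is not LATER's method. The input format is a hypothetical assumption.

```python
import math
from collections import defaultdict
from typing import Dict, Tuple


def tss_pas_association(counts: Dict[Tuple[str, str], float]) -> Tuple[float, float]:
    """Chi-square statistic and Cramer's V for a TSS x PAS table of read counts.

    counts maps (tss_id, pas_id) to the number of full-length reads that start
    at that TSS and end at that PAS. V ranges from 0 (PAS usage independent of
    the TSS, i.e. TSS-unbiased) to 1 (each TSS tied to its own PAS).
    """
    tss_tot: Dict[str, float] = defaultdict(float)
    pas_tot: Dict[str, float] = defaultdict(float)
    for (tss, pas), c in counts.items():
        tss_tot[tss] += c
        pas_tot[pas] += c
    # Drop rows and columns without reads so that no expected count is zero.
    tss_tot = {t: v for t, v in tss_tot.items() if v > 0}
    pas_tot = {p: v for p, v in pas_tot.items() if v > 0}
    n = sum(tss_tot.values())
    chi2 = 0.0
    for tss, row_total in tss_tot.items():
        for pas, col_total in pas_tot.items():
            expected = row_total * col_total / n
            observed = counts.get((tss, pas), 0.0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(tss_tot), len(pas_tot))
    cramers_v = math.sqrt(chi2 / (n * (k - 1))) if k > 1 else 0.0
    return chi2, cramers_v
```

Long-read counts per gene are often sparse. With small tables, an exact or permutation test is usually preferable to the asymptotic chi-square p-value when deciding whether a gene is TSS-biased.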
Discussion
Over the past decades, a rich body of work has described coupling mechanisms that coordinate transcription with splicing67,68; for example, a role for promoter identity,69 RNA Pol II kinetics,70 and transcription factors71 in defining splice site choice has been demonstrated. In comparison, our knowledge of links between transcription initiation and APA has been very limited.19 In this work, we provide an integrated view of mRNA features and their association in individual transcripts. Our data will serve as a useful resource for studying alternative RNA processing, poly(A) tail lengths, RNA modifications, and the interrelation of these features in a tissue-dependent manner. Our finding that 3′ end site selection depends on TSS choice has broad implications for the study of gene expression and its role in disease. It is well established that the use of distinct 3′ end sites contributes to important gene expression programs, including those involved in developmental transitions, tissue identity, and the cell cycle; APA deregulation is associated with numerous human pathologies, most notably cancer.1,3,72 We hypothesize that the regulation of isoform expression by the use of ATSSs is a central mechanism to ensure tissue function and identity.
Given the pattern of bidirectional 5′-3′ isoform production we found when comparing tissues, it is evident that both cis-elements and tissue-specific trans-factors must act at transcription initiation to drive APA. We describe two modes of APA regulation in cis: TSS-unbiased, in which the site of transcription termination does not depend on the TSS and is likely determined by cis- and trans-regulatory elements at the PAS73; and promoter dominance, in which the use of specific TSSs drives differential splice site and PAS usage. Coupling 5′ ends with 3′ ends may represent a cellular strategy to ensure the co-occurrence of particular 5′ UTR and 3′ UTR elements in the same mRNA molecule. Post-transcriptional gene regulation, including mRNA localization, stabilization, and translation, depends not only on the sequence and structural elements found in 5′ and 3′ UTRs1,74 but also on 5′-3′ communication,75 either through physical proximity mediated by the concomitant binding of RBPs to both RNA ends (closed-loop model) or through indirect interactions.76 Hence, dominant promoters may act to enhance these intramolecular interactions to regulate mRNA expression.
At dominant promoters, H2A.Z depletion, indicative of high transcription rates, frequent chromatin interactions, and lower nucleosome definition,62 synergizes with the enrichment of the active histone mark H3K18Ac, which was shown to help prime genes for activation during developmental transitions77; such increased chromatin accessibility at the TSS and PAS may enhance 5′-3′ coupling and the controlled differential expression of distinct mRNA isoforms. CBP may also link 5′ and 3′ ends independently of its established role in H3K18Ac deposition; concomitant binding of CBP molecules at the TSS and PAS could facilitate an intragenic loop, a mechanism that was proposed to connect transcription initiation with PAS choice.18,78 Additionally, we hypothesize that CBP mediates the recently recognized influence of distal cis-regulatory elements on APA,12 possibly by binding to enhancer RNAs (eRNAs), an interaction that stimulates histone acetylation and transcription of target genes.79 Gene topology may further distinguish the regulation of neuronal ATSS-APA genes. In mouse brains, "melting" chromatin states and distinct chromatin contact patterns were seen in long genes associated with specialized neuronal processes,80 and it is possible that such topological constraints contribute to 5′-3′ coupling. We propose that dominant promoters, by residing in a chromatin environment that dictates specialized regulation through enhanced protein interactions and possibly gene looping,81 promote communication between the transcription and RNA processing machineries. Interestingly, dominant promoters display typical characteristics of promoters of developmental genes, including lower nucleosome occupancy, CBP binding, and H3K18Ac.
“Developmental core promoters” were previously defined as TSSs regulated by “developmental enhancers” that play a defining role in development-, tissue-, or context-specific gene regulation, in contrast with “housekeeping promoters.”82 Our results in the context of prior literature are therefore consistent with a model in which developmental genes employ specific epigenetic regulation evolved to ensure robust and highly regulated interactions not only between enhancers and promoters but also between promoters and PASs to dictate gene expression.
Coupling 5′ ends to 3′ ends of transcripts represents a conserved principle in the regulation of gene expression with broad relevance, as APA affects mRNA coding potential, localization, stability, and translation to achieve context-specific modulation of developmental genes. The universal impact of alternative mRNA processing in the etiology of disease has been highlighted by the substantial association of APA-altering SNPs in 3′ UTRs with human phenotypic traits and diseases,83 which can be further probed using variant expression-aware annotations84 and large LRS datasets of human tissues.27 Linking 5′ ends to disease-relevant mutations in 3′ UTRs will close an important gap in our understanding of genetic disease mechanisms, aid in the identification of disease-associated mutations in the full-length context in which they are deleterious, and may provide a platform to target variant-associated diseases.
Limitations of the study
We centered our analyses on the nervous system as a whole rather than considering the complexity of its many different cell types. As a consequence, for genes with extreme isoform diversity and highly cell-type-specific isoform expression, only relatively abundant isoforms passed our stringent detection cutoff. Therefore, we expect that many functionally relevant mRNA isoforms went undetected. Our study uses BluePippin size selection prior to nanopore LRS. Although gene expression estimates from these data were highly consistent with those obtained through mRNA-seq, longer transcripts may be overrepresented in individual cases. Full-length mRNA coverage from nanopore long reads declined substantially for transcripts exceeding 10 kb in size. Although we excluded isoforms exceeding that limit from quantitative analyses, they are still depicted in the CIA atlas, where they may be underrepresented relative to substantially shorter mRNAs. Finally, the transcription factor binding analysis on human TSSs conducted with the ReMap 2022 database63 used ChIP-seq data from a variety of human cells; the results shown in Figure S6 therefore likely represent binding in cerebral organoids incompletely.
STAR★Methods
Key resources table
REAGENT or RESOURCE | SOURCE | IDENTIFIER |
---|---|---|
Critical commercial assays | ||
NEBNext® Poly(A) mRNA Magnetic Isolation Module | New England BioLabs | E7490 |
PCR-cDNA Sequencing Kit | Oxford Nanopore | SQK-PCS109 |
AMPure XP for PCR Purification | Beckman Coulter | A63881 |
Dynabeads™ mRNA Purification Kit | Invitrogen | 61006 |
USB poly(A) length assay kit | Thermo Fisher | Cat# 764551KT |
RNAClean XP Beads | Beckman Coulter | Cat# A63987 |
SMARTScribe Reverse Transcriptase kit | Clontech | Cat# 639537 |
Advantage 2 DNA polymerase mix | Clontech | Cat# 639201 |
Direct RNA sequencing kit | Oxford Nanopore | SQK-RNA002 |
TruSeq® Stranded mRNA Library Prep | Illumina | Cat# 20020595 |
TruSeq® Stranded Total RNA Library Prep Gold | Illumina | Cat# 20020599 |
QuantSeq 3′-Seq Library Prep Kit REV | Lexogen | Cat# 016.96 |
Deposited data | ||
Raw and analyzed LRS and RNA-seq data | This paper | GEO: GSE203583 |
CIA reference transcriptome data | This paper | GEO: GSE203583 |
Drosophila reference genome (dm6) | The FlyBase Consortium/Berkeley Drosophila Genome Project/Celera Genomics | https://www.ncbi.nlm.nih.gov/assembly/GCF_000001215.4/ |
Human reference genome (GRCh38/hg38) | Genome Reference Consortium | https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.26/ |
FLAM-seq and mRNA-seq Human Brain Organoids | Rybak-Wolf et al.85 | GEO: GSE163952 |
mRNA-seq embryo (14-16 h and 18-22 h) | Carrasco et al.86 | GEO: GSE146986 |
Experimental models: Cell lines | ||
Human iPSC lines iPSC-1 XM001 | Thermo Fisher Scientific | A18944 |
Human iPSC lines iPSC-2 | Thermo Fisher Scientific | A18945 |
Experimental models: Organisms/strains | ||
D. melanogaster: w1118 | Bloomington Drosophila Stock Center | BDSC: 5905; RRID:BDSC_5905 |
D. melanogaster: GFP-marked TM3 balancer: w[1118]; Dr[Mio]/TM3, P{w[+mC]=GAL4-twi.G}2.3, P{UAS-2xEGFP}AH2.3, Sb[1] Ser[1] | Bloomington Drosophila Stock Center | BDSC: 6663; RRID:BDSC_6663 |
D. melanogaster: orbΔDP | This paper | N/A |
D. melanogaster: tub-Gal4;UAS:dCas9-VPR: w[∗]; P{y[+t7.7] w[+mC]=UAS-3xFLAG.dCas9.VPR}attP40; P{w[+mC]=tubP-GAL4}LL7/T(2;3)TSTL14, SM5: TM6B, Tb[1] | Bloomington Drosophila Stock Center | BDSC: 67048; RRID:BDSC_67048 |
D. melanogaster: Mvl-sgRNA y[1] sc[∗] v[1] sev[21]; P{y[+t7.7] v[+t1.8]=TOE.GS01237}attP40 | Bloomington Drosophila Stock Center | BDSC: 78119; RRID:BDSC_78119 |
D. melanogaster: ttv-sgRNA y[1] sc[∗] v[1] sev[21]; P{y[+t7.7] v[+t1.8]=TOE.GS01385}attP40 | Bloomington Drosophila Stock Center | BDSC: 78207; RRID:BDSC_78207 |
D. melanogaster: ttk-sgRNA y[1] sc[∗] v[1] sev[21]; P{y[+t7.7] v[+t1.8]=TOE.GS02363}attP40 | Bloomington Drosophila Stock Center | BDSC: 78287; RRID:BDSC_78287 |
D. melanogaster: Fatp1-sgRNA y[1] sc[∗] v[1] sev[21]; P{y[+t7.7] v[+t1.8]=TOE.GS01376}attP40 | Bloomington Drosophila Stock Center | BDSC: 79440; RRID:BDSC_79440 |
D. melanogaster: wun-sgRNA y[1] sc[∗] v[1] sev[21]; P{y[+t7.7] v[+t1.8]=TOE.GS01590}attP40 | Bloomington Drosophila Stock Center | BDSC: 79461; RRID:BDSC_79461 |
D. melanogaster: chn-sgRNA y[1] sc[∗] v[1] sev[21]; P{y[+t7.7] v[+t1.8]=TOE.GS02080}attP40 | Bloomington Drosophila Stock Center | BDSC: 79871; RRID:BDSC_79871 |
D. melanogaster: non-targeting sgRNA y[1] sc[∗] v[1] sev[21]; P{y[+t7.7] v[+t1.8]=GS00089}attP40 | Bloomington Drosophila Stock Center | BDSC: 67539; RRID:BDSC_67539 |
D. melanogaster: csw-sgRNA y[1] sc[∗] v[1] sev[21]; P{y[+t7.7] v[+t1.8]=TOE.GS01896}attP40 | Bloomington Drosophila Stock Center | BDSC: 78649; RRID:BDSC_78649 |
D. melanogaster: zfh1-sgRNA y[1] sc[∗] v[1] sev[21]; P{y[+t7.7] v[+t1.8]=TOE.GS02033}attP40 | Bloomington Drosophila Stock Center | BDSC: 79798; RRID:BDSC_79798 |
D. melanogaster: sbb-sgRNA y[1] sc[∗] v[1] sev[21]; P{y[+t7.7] v[+t1.8]=TOE.GS02147}attP40 | Bloomington Drosophila Stock Center | BDSC: 79903; RRID:BDSC_79903 |
D. melanogaster: twin-sgRNA y[1] sc[∗] v[1] sev[21]; P{y[+t7.7] v[+t1.8]=TOE.GS02161}attP40 | Bloomington Drosophila Stock Center | BDSC: 79908; RRID:BDSC_79908 |
D. melanogaster: jing-sgRNA y[1] sc[∗] v[1] sev[21]; P{y[+t7.7] v[+t1.8]=TOE.GS02847}attP40 | Bloomington Drosophila Stock Center | BDSC: 80271; RRID:BDSC_80271 |
D. melanogaster: psq-sgRNA y[1] sc[∗] v[1] sev[21]; P{y[+t7.7] v[+t1.8]=TOE.GS05187}attP40 | Bloomington Drosophila Stock Center | BDSC: 82755; RRID:BDSC_82755 |
D. melanogaster: CASK-sgRNA y[1] sc[∗] v[1] sev[21]; P{y[+t7.7] v[+t1.8]=TOE.GS01254}attP40 | Bloomington Drosophila Stock Center | BDSC: 78127; RRID:BDSC_78127 |
D. melanogaster: sky-sgRNA y[1] sc[∗] v[1] sev[21]; P{y[+t7.7] v[+t1.8]=TOE.GS02377}attP40 | Bloomington Drosophila Stock Center | BDSC: 78295; RRID:BDSC_78295 |
D. melanogaster: Pka-R1-sgRNA y[1] sc[∗] v[1] sev[21]; P{y[+t7.7] v[+t1.8]=TOE.GS01286}attP40 | Bloomington Drosophila Stock Center | BDSC: 78595; RRID:BDSC_78595 |
D. melanogaster: Pdp1-sgRNA y[1] sc[∗] v[1] sev[21]; P{y[+t7.7] v[+t1.8]=TOE.GS02089}attP40 | Bloomington Drosophila Stock Center | BDSC: 79516; RRID:BDSC_79516 |
D. melanogaster: SPoCk-sgRNA y[1] sc[∗] v[1] sev[21]; P{y[+t7.7] v[+t1.8]=TOE.GS01261}attP40 | Bloomington Drosophila Stock Center | BDSC: 79673; RRID:BDSC_79673 |
D. melanogaster: Mef2-sgRNA y[1] sc[∗] v[1] sev[21]; P{y[+t7.7] v[+t1.8]=TOE.GS02062}attP40 | Bloomington Drosophila Stock Center | BDSC: 79863; RRID:BDSC_79863 |
D. melanogaster: brat-sgRNA y[1] sc[∗] v[1] sev[21]; P{y[+t7.7] v[+t1.8]=TOE.GS02140}attP40 | Bloomington Drosophila Stock Center | BDSC: 79900; RRID:BDSC_79900 |
D. melanogaster: REPTOR-sgRNA y[1] sc[∗] v[1] sev[21]; P{y[+t7.7] v[+t1.8]=TOE.GS02742}attP40 | Bloomington Drosophila Stock Center | BDSC: 79987; RRID:BDSC_79987 |
D. melanogaster: E2f1-sgRNA y[1] sc[∗] v[1] sev[21]; P{y[+t7.7] v[+t1.8]=SAM.dCas9.GS02441}attP40 | Bloomington Drosophila Stock Center | BDSC: 80516; RRID:BDSC_80516 |
D. melanogaster: Stat92E-sgRNA y[1] sc[∗] v[1] sev[21]; P{y[+t7.7] v[+t1.8]=SAM.dCas9.GS02442}attP40/CyO | Bloomington Drosophila Stock Center | BDSC: 80517; RRID:BDSC_80517 |
D. melanogaster: gfzf-sgRNA y[1] sc[∗] v[1] sev[21]; P{y[+t7.7] v[+t1.8]=SAM.dCas9.GS05528}attP40 | Bloomington Drosophila Stock Center | BDSC: 84063; RRID:BDSC_84063 |
D. melanogaster: nej3 mutant w[∗] nej[3]/FM7c | Bloomington Drosophila Stock Center | BDSC: 3729; RRID:BDSC_3729 |
D. melanogaster: nejEP1179 mutant w[∗] P{w[+mC]=EP}nej[EP1179] | Bloomington Drosophila Stock Center | BDSC: 30733; RRID:BDSC_30733 |
D. melanogaster: E(z)731 mutant w[∗]; E(z)[731] P{1xFRT.G}2A/TM6C, Sb[1] Tb[1] | Bloomington Drosophila Stock Center | BDSC: 24470; RRID:BDSC_24470 |
D. melanogaster: Psch27 mutant Psc[h27]/CyO | Bloomington Drosophila Stock Center | BDSC: 5547; RRID:BDSC_5547 |
D. melanogaster: Psce22 mutant Psc[e22]/CyO | Bloomington Drosophila Stock Center | BDSC: 5546; RRID:BDSC_5546 |
D. melanogaster: Dfd1 mutant Dfd[1] p[p] | Bloomington Drosophila Stock Center | BDSC: 800; RRID:BDSC_800 |
D. melanogaster: SppsG8810 mutant w[1118]; P{w[+mC]=EP}Spps[G8810]/TM6C, Sb[1] | Bloomington Drosophila Stock Center | BDSC: 30186; RRID:BDSC_30186 |
Oligonucleotides | ||
Oligonucleotides used for RT-qPCR | Table S6 | N/A |
CRISPR guide RNAs | STAR Methods | N/A |
Software and algorithms | ||
Iso-seq3 pipeline | PacBio | https://github.com/PacificBiosciences/IsoSeq |
CIA assembly pipeline | This paper | https://doi.org/10.5281/zenodo.7759448 https://github.com/hilgers-lab/CIAtranscriptome_assembly |
Long-reads-based Alternative Termination Estimation and Recognition (LATER) | This paper | https://doi.org/10.5281/zenodo.7759430 https://github.com/hilgers-lab/LATER |
Long-reads-based Alternative Splicing Estimation and Recognition (LASER) | This paper | https://doi.org/10.5281/zenodo.7759428 https://github.com/hilgers-lab/LASER |
R 4.1.1 | N/A | https://www.R-project.org/ |
Minimap2 v2.17-r941 | Li87 | https://github.com/lh3/minimap2 |
NanoPlot 1.29.1 | N/A | https://github.com/wdecoster/NanoPlot |
guppy-5.0.7 model: dna_r9.4.1_450bps_sup.cfg | Oxford Nanopore | https://github.com/nanoporetech/pyguppyclient |
snakePipes v1.2.2 | Bhardwaj et al.88 | https://github.com/maxplanck-ie/snakepipes/blob/develop/docs/index.rst |
DEXSeq_1.28.3 | Anders et al.89 | http://bioconductor.org/packages/release/bioc/html/DEXSeq.html |
DESeq2 | Love et al.90 | N/A |
Seurat V4.1.0 | N/A | https://github.com/satijalab/seurat/ |
STARlong v2.7.8a | Dobin et al.91 | https://github.com/alexdobin/STAR/blob/master/bin/Linux_x86_64/STARlong |
STAR v2.6.1b | Dobin et al.91 | https://github.com/alexdobin/STAR |
FLAMAnalysis | Legnini et al.38 | https://github.com/rajewsky-lab/FLAMAnalysis |
pipeline-polya-ng | Oxford Nanopore | https://github.com/nanoporetech/pipeline-polya-ng |
GenomicRanges_1.32.7 | Lawrence et al.92 | https://bioconductor.org/packages/release/bioc/html/GenomicRanges.html |
GenomicFeatures_1.36.4 | Lawrence et al.92 | https://bioconductor.org/packages/release/bioc/html/GenomicFeatures.html |
ggplot2_3.2.1 | N/A | https://github.com/tidyverse/ggplot2 |
dplyr_1.0.8 | N/A | https://github.com/tidyverse/dplyr |
seqtk 1.2-r94 | N/A | https://github.com/lh3/seqtk |
Tama | N/A | https://github.com/GenomeRIK/tama |
Sierra | Patrick et al.93 | https://github.com/VCCRI/Sierra |
SUPPA v2.3 | Trincado et al.94 | https://github.com/comprna/SUPPA |
BSgenome.Dmelanogaster.UCSC.dm6 | N/A | https://bioconductor.org/packages/release/data/annotation/html/BSgenome.Dmelanogaster.UCSC.dm6.html |
Rsamtools_2.10.0 | N/A | https://bioconductor.org/packages/Rsamtools |
samtools 1.12 | N/A | https://github.com/samtools/htslib.git |
UpSetR 1.4.0 | N/A | http://github.com/hms-dbmi/UpSetR |
flair v1.1 | Tang et al.40 | https://github.com/BrooksLabUCSC/flair |
Biostrings 2.62.0 | N/A | https://bioconductor.org/packages/Biostrings |
cellranger-6.1.2 | Zheng et al.95 | N/A |
snakemake 7.0.4 | N/A | https://github.com/snakemake/snakemake |
bedtools v2.27.0 | N/A | https://github.com/arq5x/bedtools2 |
vegan 2.6-2 | Oksanen et al.96 | https://github.com/vegandevs/vegan |
ReMapEnrich | N/A | https://github.com/remap-cisreg/ReMapEnrich |
SQANTI3 v1.2 | Tardaguila et al.21 | https://github.com/ConesaLab/SQANTI3 |
SQANTI3 v5.1.3 | Tardaguila et al.21 | https://github.com/ConesaLab/SQANTI3 |
IsoAnnotLite 2.7.3 | N/A | https://isoannot.tappas.org/isoannot-lite/ |
cDNA_Cupcake v12.5 | N/A | https://github.com/Magdoll/cDNA_Cupcake |
deeptools 3.5.0 | N/A | https://github.com/deeptools/deepTools |
randomForest | N/A | https://cran.r-project.org/web/packages/randomForest/index.html |
MEME Suite 5.5.0 AME | N/A | https://meme-suite.org/meme/tools/ame |
MEME Suite 5.5.0 FIMO | N/A | https://meme-suite.org/meme/tools/fimo |
exaR/apa_target_caller | Carrasco et al.86 | https://github.com/hilgers-lab/apa_target_caller |
prody 2.2.0 | Zhang et al.97 | http://prody.csb.pitt.edu/ |
GenomicScores | Puigdevall and Castelo98 | https://bioconductor.org/packages/release/bioc/html/GenomicScores.html |
ChIPseeker | Wang et al.99 | https://bioconductor.org/packages/release/bioc/html/ChIPseeker.html |
TargetScan Fly v7.2 | Agarwal et al.100 | https://www.targetscan.org/fly_72/ |
DAVID Knowledgebase v2022q4 | N/A | https://david.ncifcrf.gov/tools.jsp |
Co-evolution analysis | This paper | https://doi.org/10.5281/zenodo.7759440 https://github.com/hilgers-lab/isoform-coevolution |
Random forest classification of 3ʹ ends | This paper | https://doi.org/10.5281/zenodo.7438383 |
Gsignal | N/A | https://github.com/gjmvanboxtel/gsignal |
Other | ||
Drosophila mRNA isoform atlas of CIA Transcriptome | This paper | https://hilgerslab.shinyapps.io/ciaTranscriptome/ |
Isoform-level functional feature annotation of CIA Transcriptome | This paper | GEO: GSE203583 |
CIA transcriptome explorer | This paper | https://doi.org/10.5281/zenodo.7759434 https://github.com/hilgers-lab/ciaTailoR |
Resource availability
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Valérie Hilgers (hilgers@ie-freiburg.mpg.de).
Materials availability
All plasmids and fly strains generated in this study are available from the lead contact without restriction.
Experimental model and subject details
Drosophila melanogaster
Experiments in this study used male and female (in equal numbers, except for experiments using ovaries) Drosophila melanogaster embryos and adult flies. Flies were raised at 25°C. The CIA reference transcriptome was built using w1118 flies (Bloomington stock number 5905). Flies mutant for p300/cbp/nej (nej3 and nejEP1179),101,102 Enhancer of zeste (E(z)731),103 Posterior sex combs (Psch27 and Psce22),104 Deformed (Dfd1)105 and Spps (SppsG8810)106 were obtained from the Bloomington Drosophila Stock Center. We used CRISPR/Cas9-mediated genome editing following the procedure described in Port and Bullock107 to generate the orb dominant promoter deletion orbΔDP. Two guide RNAs (GAGAGAGCTCTACATCAGC, CGCCACGGCGTGCAACGCTG) targeted the orb promoter region, generating a 1.9 kb deletion beginning 40 bp upstream of the annotated TSS. All embryo injections were performed by BestGene Inc. Recessive lethal mutations were maintained as heterozygotes over GFP-marked balancer chromosomes. For CRISPR activation (CRISPRa) to induce the expression of tissue-specific TSSs, TRiP-OE lines from the Transgenic RNAi Project46,108 were used. Flies expressing single guide RNAs (sgRNAs) targeting the upstream TSS of genes of interest (sgRNA, example genotype: y[1] sc[∗] v[1] sev[21]; P{y[+t7.7] v[+t1.8]=TOE.GS02080}attP40) were crossed with flies expressing, under the control of tubulin-Gal4, a catalytically dead Cas9 (dCas9) fused to the tripartite VP64-p65-Rta (VPR) activation domain (Tub>dCas9-VPR, genotype: w; UAS:dCas9-VPR; tub-Gal4/SM5, TM6B). All fly strains are listed in the key resources table.
Human cerebral organoids
iPSC-derived cerebral organoids were generated as described in Giandomenico et al.,109 with some modifications. Briefly, after dissociation into a cell suspension with Accutase, 6,000 cells were seeded per well of a 96-well plate in 100 μl of embryoid body medium (EBM: DMEM/F12, 20% Knockout replacement serum, 1x Glutamax, 1x MEM-NEAA, 2% ESC FBS, 50 μM ROCK inhibitor, 10 μM bFGF). On day four, the medium was replaced with EBM without bFGF and ROCK inhibitor. On day five, the medium was replaced with neural induction medium (NIM: DMEM/F12, 1x N2 supplement, 1x Glutamax, 1x MEM-NEAA, 10 μg/ml heparin solution). On days 7-9, the organoids that had formed were embedded in Matrigel (Corning, 356234) and kept in NIM for one day, then in 1:1 NIM:organoid differentiation medium (ODM: 1:1 DMEM/F12:Neurobasal, 1x N2 supplement, 1x B27 supplement minus vitamin A, insulin, 2-ME solution, Glutamax, MEM-NEAA) for one additional day, followed by four days in ODM. Next, the organoids were transferred to ultra-low-attachment 6-well plates and cultured on an orbital shaker (85 rpm) in organoid maturation medium (OMM: 1:1 DMEM/F12:Neurobasal, N2 supplement, B27 supplement with vitamin A, insulin, 2-ME solution, Glutamax supplement, MEM-NEAA, vitamin C solution, chemically defined lipid concentrate, BDNF, GDNF, cAMP, 1% Matrigel).
Method details
Sample collection for RNA analysis
For head transcriptomes, 3-day-old w1118 flies were collected and flash-frozen in liquid nitrogen, and heads were homogenized in QIAzol Lysis Reagent (QIAGEN 79306) for RNA extraction. For ovary transcriptomes, 3-day-old w1118 virgin females were collected, and 20 ovaries per replicate were dissected and homogenized. For embryo transcriptomes, eggs from w1118 flies were collected for two hours on agar plates and aged for either 14h (14-16h AEL embryos) or 18h (18-20h AEL) at 25°C. 50 embryos per replicate were homogenized. For orbΔDP, nej3, nejEP1179, E(z)731, Psch27, Psce22, Dfd1 and SppsG8810 mutant analysis, eggs from mutant flies maintained as heterozygotes over GFP-marked balancer chromosomes were collected for two hours on agar plates and aged for the appropriate amount of time at 25°C. Embryos were dechorionated following standard procedures and placed on a plate containing halocarbon oil. 20 to 30 mutant embryos were hand-sorted according to morphology and against GFP signal, in at least three replicates. For the CRISPRa experiment, to obtain flies ubiquitously expressing dCas9 and a promoter-targeting sgRNA, tub>dCas9-VPR virgin female flies were crossed with sgRNA males. Crosses were maintained at 25°C, and parents were removed from the vial after two days. Eclosed progeny were aged for five days and selected against Tb and CyO, and the heads and ovaries of five female flies per replicate were processed for RNA extraction. An sgRNA line targeting a non-Drosophila sequence was used as a control. Organoid RNA was prepared from 60-day-old cerebral organoids as described in Rybak-Wolf et al.85 Briefly, organoids were collected in TRIzol (Invitrogen 15596026), and RNA was prepared with the Direct-zol RNA Miniprep kit (Zymo Research R2050) according to the manufacturer’s instructions.
RNA extraction and RT-qPCR
For all experiments, RNA was extracted using QIAzol Lysis Reagent (QIAGEN 79306) according to the manufacturer’s instructions. Before library preparation, RNA integrity was analyzed using a 2100 Bioanalyzer (Agilent Technologies). Only RNAs with RQN values of 10 were used for sequencing experiments. For RT-qPCR, 300 ng of total RNA was used for reverse transcription with the iScript gDNA Clear cDNA Synthesis Kit (Bio-Rad). RT-qPCR was performed in a LightCycler 480 II instrument using FastStart SYBR Green Master (Roche). RT-qPCR primer sequences are listed in Table S6.
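This section does not state how Ct values were converted into relative expression. If the common 2^(−ΔΔCt) (Livak) approach was used, which is an assumption here, relative abundance could be computed as shown below. The sketch assumes roughly 100% amplification efficiency for both primer pairs and averages technical replicates by arithmetic mean. For an APA readout, one could hypothetically treat an amplicon in the extended 3′ UTR as the target and an amplicon in the shared upstream region as the reference.

```python
from statistics import mean
from typing import Sequence


def delta_ct(target_cts: Sequence[float], reference_cts: Sequence[float]) -> float:
    """Mean Ct of the target amplicon minus mean Ct of the reference amplicon."""
    return mean(target_cts) - mean(reference_cts)


def relative_expression(sample_target: Sequence[float], sample_ref: Sequence[float],
                        control_target: Sequence[float], control_ref: Sequence[float]) -> float:
    """2^-ddCt: fold change of target/reference in the sample vs. the control.

    Assumes ~100% PCR efficiency (one doubling per cycle) for both amplicons.
    """
    ddct = delta_ct(sample_target, sample_ref) - delta_ct(control_target, control_ref)
    return 2.0 ** (-ddct)
```

For example, a target that crosses threshold two cycles earlier (relative to the reference) in a mutant than in the control yields a relative expression of 4.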
Short-read Sequencing (RNA-seq and 3ʹ-seq)
Libraries for mRNA-seq were prepared from 3-day-old w1118 fly heads with 100 ng of total RNA using TruSeq Stranded mRNA Library Prep (Illumina 20020595) according to the manufacturer’s instructions. Libraries for total RNA-seq were prepared from dissected fly ovaries with 100 ng of total RNA using TruSeq Stranded total RNA Library Prep (Gold) (Illumina 20020599) according to the manufacturer’s instructions. Paired-end sequencing was performed using the NovaSeq6000 platform (Illumina) and 101-bp reads. mRNA-seq data from 14-16h AEL embryos and 18-20h AEL embryos are from Carrasco et al.86 Sequencing data were processed using the RNA-seq module from snakePipes,88 adding flags for --trim, -m “alignment-free,alignment”. Reads were mapped to the Drosophila melanogaster reference genome (Ensembl assembly release dm6), and the transcriptome reference annotation release-96 using STAR.91 3ʹ-seq libraries were prepared with 10 ng of total RNA using the QuantSeq 3ʹ-seq Library Prep Kit REV (Lexogen) according to the manufacturer’s instructions. Paired-end sequencing was performed using the NovaSeq6000 platform (Illumina) and 101-bp reads.
Nanopore sequencing (ONT cDNA)
Nanopore sequencing was performed on 3-day-old w1118 fly heads, 14-16h AEL embryos, 18-20h AEL embryos, dissected fly ovaries, and human cerebral organoids. For the generation of full-length cDNA libraries, polyadenylated RNA molecules were isolated from total RNA preparations using the NEBNext® Poly(A) mRNA Magnetic Isolation Module (NEB). Purified polyadenylated RNA molecules were used for library preparation with the cDNA-PCR Sequencing protocol (Oxford Nanopore Technologies), with the following modifications. To eliminate short reads from the final data, both the input polyadenylated RNA and the cDNA were cleaned up with AMPure XP beads (Beckman Coulter) at a bead-to-sample ratio of 0.4 before further processing. To retain cDNA fragments > 3 kb, the BluePippin device and appropriate separation DNA gel cassettes (Sage Science) were used. cDNA was amplified using 14 PCR cycles and a 12 min extension time at 65°C. Libraries were sequenced on a MinION 1B or GridION sequencing device from Oxford Nanopore Technologies (R9.4.1). Reads were basecalled using guppy-5.0.7 (model: dna_r9.4.1_450bps_sup.cfg). Reads were aligned to the Drosophila melanogaster reference genome (Ensembl assembly release dm6) or to the Homo sapiens reference genome (GRCh38), with transcriptome reference annotation release-96 and release-91, respectively. For genomic alignments, reads were mapped using minimap2,87 with parameters “minimap2 -ax splice -u f”. Alignment files were sorted and indexed using samtools v1.12. For transcriptome alignments, “minimap2 -ax map-ont -u f” was used.
Nanopore Direct RNA sequencing (DRS)
DRS was performed on 3-day-old w1118 fly heads, 14-16h AEL embryos, and dissected ovaries. Polyadenylated RNA molecules were isolated from total RNA preparations using the Dynabeads™ mRNA Purification Kit (Invitrogen). Multiple poly-A+ pulldowns were pooled to reach 500 ng PolyA+ RNA input for library preparation using the Direct RNA sequencing kit (Oxford Nanopore Technologies). Libraries were sequenced on a MinION 1B or GridION sequencing device from Oxford Nanopore Technologies. Reads were processed using guppy-5.0.7 (model: rna_r9.4.1_70bps_hac.cfg).
Iso-seq
Iso-seq libraries were prepared using 500 ng total RNA from 3-day-old w1118 fly heads, processed with the Iso-seq express 2.0 workflow (PacBio) with 14 cycles of PCR amplification and size selection with the BluePippin system for transcripts larger than 3 kb according to the manufacturer’s protocol. After SMRTbell adapter addition, libraries were sequenced on three SMRTcells on a Sequel I PacBio sequencer. The raw data files were processed with SMRT Link v8 software to generate CCS fastq files. Data analysis was performed using the Iso-seq3 pipeline to generate consensus reads. Reads were mapped using STARlong91 to the Drosophila melanogaster reference genome (Ensembl assembly release dm6), and the transcriptome reference annotation release-96.
FLAM-seq
FLAM-seq libraries were prepared as described in Legnini et al.38 (extended protocol available at 10.21203/rs.2.10045/v1) using 4 μg total RNA from 3-day-old w1118 fly heads. Briefly, poly(A)-selected RNA was tailed using the USB poly(A) length assay kit (Thermo Fisher), cleaned up with RNAClean XP Beads (Beckman Coulter), and reverse transcribed with the SMARTScribe Reverse Transcriptase kit (Clontech). The resulting cDNA was purified with XP DNA beads (Beckman Coulter), amplified by PCR with the Advantage 2 DNA polymerase mix (Clontech), and purified again using AMPure XP DNA Beads (Beckman Coulter). After SMRTbell adapter addition, libraries were sequenced on 3 SMRT cells on a Sequel I PacBio sequencer. Reads were processed using the FLAMAnalysis pipeline38 (https://github.com/rajewsky-lab/FLAMAnalysis) with the Drosophila melanogaster Ensembl genome assembly and transcriptome reference annotation (release dm6).
Comparison across LRS methods
Transcript coverage per read was calculated by dividing the number of aligned nucleotides by the annotated transcript length.20 To compare gene expression estimates across long-read and short-read sequencing methods, a variance stabilizing transformation (VST) was applied to raw gene counts from the different samples using the DESeq290 function vst(). The transformed data were used to compute a PCA with the DESeq2 function plotPCA() and standard parameters. Enrichments at the TSS and PAS were computed by dividing the total number of reads mapping to TSS or PAS regions by the total number of reads assigned to the whole gene.38 Poly(A) signal enrichment was obtained by screening for motifs in a 20-nucleotide window of every PAS. Screening followed a hierarchical order based on known poly(A) signals and their strength, with the following rank: AATAAA, ATTAAA, AATATA, AAGAAA, AATACA, AATAGA, AATGAA, ACTAAA, CATAAA, GATAAA, TATAAA, TTTAAA. Positional probabilities per nucleotide were computed as the number of times a given nucleotide was found at a given position divided by the total number of nucleotides observed at that position.
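The per-read coverage, the hierarchical poly(A)-signal screen, and the per-position nucleotide probabilities described above can be sketched as follows. The motif list and its rank order are taken from the text. The text does not specify where the 20-nucleotide window sits relative to the cleavage site, so the sketch takes the window sequence as input (for example, the 20 nt immediately upstream of each 3′ end on the transcript strand, which is an assumption). Function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# Poly(A) signal variants in the rank order given in the text (strongest first).
PAS_MOTIFS = ["AATAAA", "ATTAAA", "AATATA", "AAGAAA", "AATACA", "AATAGA",
              "AATGAA", "ACTAAA", "CATAAA", "GATAAA", "TATAAA", "TTTAAA"]


def read_coverage(aligned_nt: int, transcript_length: int) -> float:
    """Fraction of the annotated transcript covered by one read's aligned nucleotides."""
    return aligned_nt / transcript_length


def classify_pas(window: str, motifs: Sequence[str] = PAS_MOTIFS) -> Optional[str]:
    """Return the highest-ranked poly(A) signal found in the window, or None."""
    seq = window.upper().replace("U", "T")  # accept RNA or DNA alphabet
    for motif in motifs:
        if motif in seq:
            return motif
    return None


def positional_probabilities(windows: Sequence[str]) -> List[Dict[str, float]]:
    """Per-position nucleotide frequencies across equal-length windows."""
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    probs: List[Dict[str, float]] = []
    for i in range(length):
        counts = {base: 0 for base in "ACGT"}
        for w in windows:
            base = w[i].upper()
            if base in counts:  # ignore N and other ambiguity codes
                counts[base] += 1
        observed = sum(counts.values())
        probs.append({b: (c / observed if observed else 0.0) for b, c in counts.items()})
    return probs
```

Because the search stops at the first hit in rank order, each PAS is assigned at most one motif. Tallying the returned values (with None meaning no detectable signal) across all PASs gives the poly(A) signal profile.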
Comparison of long read 5ʹ end pile-ups
For benchmarking putative novel TSSs identified by LRS, we used ONT cDNA datasets. Reads were trimmed to their 5ʹ-most nucleotide, and peaks were called in 50-nt windows. Only peaks with more than 30 counts per million were kept for comparison. Peaks were tested for overlap with the Eukaryotic Promoter Database (EPD) using a 50-nt window.39 5ʹ pile-ups not overlapping EPD were annotated to features of the reference annotation (Ensembl assembly release dm6) using the ChIPseeker package99 with a window of −150 to +150 nt.
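A minimal sketch of the 5ʹ pile-up benchmarking, assuming reads have already been reduced to their 5ʹ-most coordinate. The text does not name the peak caller, so fixed, non-overlapping 50-nt bins stand in for it here; the function names and the (chrom, strand, pos) tuples are hypothetical.

```python
from collections import Counter


def call_5prime_peaks(read_starts, window=50, min_cpm=30.0):
    """Bin read 5' ends into fixed windows and keep bins above a CPM cutoff.

    read_starts: list of (chrom, strand, pos) where pos is the 5'-most base
    of each read (for minus-strand reads, the rightmost aligned base).
    """
    total = len(read_starts)
    bins = Counter((chrom, strand, pos // window) for chrom, strand, pos in read_starts)
    peaks = []
    for (chrom, strand, b), n in bins.items():
        cpm = n / total * 1e6  # counts per million of all 5' ends
        if cpm > min_cpm:
            peaks.append((chrom, strand, b * window, (b + 1) * window, n))
    return sorted(peaks)


def overlaps_known_tss(peak, known_tss, window=50):
    """True if a known TSS (chrom, strand, pos) lies within `window` nt of the peak."""
    chrom, strand, start, end, _ = peak
    return any(
        c == chrom and s == strand and start - window <= pos < end + window
        for c, s, pos in known_tss
    )
```

Peaks for which `overlaps_known_tss` is false against an EPD-derived TSS list would then be the candidates passed on for feature annotation.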
Generation of the Drosophila Combined Isoform Assembly (CIA) database
Transcriptome assemblies
For each tissue and method, all sequencing replicates were merged into a single FASTQ file before assembly. Minimap287 was used to map Nanopore long reads with the "-ax splice -uf" option to the Drosophila dm6 genome indexed with the "-x 14" option. STARlong91 was used to map Iso-seq and FLAM-seq data using the following parameters38: “--outFilterMultimapScoreRange 20 --outFilterScoreMinOverLread --outFilterMatchNminOverLread 0.66 --outFilterMismatchNmax 1000 --winAnchorMultimapNmax 200 --seedSearchStartLmax 12 --seedPerReadNmax 100000 --seedPerWindowNmax 100 --alignTranscriptsPerReadNmax 100000 --alignTranscriptsPerWindowNmax 10000”. The resulting BAM files were indexed and converted to BED12 files. FLAIR40 was then used to correct and collapse isoforms. During the FLAIR correct step, splice junction information from the respective short-read RNA-seq datasets was used to correct individual transcriptomes. During the FLAIR collapse step, the Eukaryotic Promoter Database (EPD)39 was used to retain only reads with a supported TSS at their 5ʹ end, and “--max_ends 5” was set to allow identification of multiple 5ʹ-3ʹ ends. A minimum of three (Nanopore) or two (Iso-seq/FLAM-seq) full-length reads was required for an isoform to be collapsed into the assembly. The resulting isoforms were annotated with SQANTI3 v1.221 to identify novel isoforms and assign structural categories, using an internal priming window of 50 nt.
Generation of a PAS database
Assemblies were filtered to remove 3ʹ ends that likely originated from internal priming or truncation during library preparation. We performed poly(A) tail calling on FLAM-seq and DRS data, as both methods allow poly(A) tail detection. Only reads containing a poly(A) tail were retained; these were trimmed to the single nucleotide immediately preceding the poly(A) tail. The resulting single-nucleotide 3ʹ ends were clustered in 20-nt windows, and clusters supported by at least two reads were included in the PAS database. The database includes only protein-coding transcripts.
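The PAS database construction can be sketched as below. Assumptions not stated in the text: a cluster is anchored at its most upstream end and spans at most 20 nt, and each cluster is represented by its most frequent end position; all names are hypothetical.

```python
from collections import Counter, defaultdict


def build_pas_database(end_positions, window=20, min_reads=2):
    """Cluster single-nucleotide 3' ends and keep clusters with enough reads.

    end_positions: list of (chrom, strand, pos), one per poly(A)-tailed read.
    Returns (chrom, strand, representative_pos, n_reads) per retained cluster.
    """
    by_strand = defaultdict(list)
    for chrom, strand, pos in end_positions:
        by_strand[(chrom, strand)].append(pos)

    database = []
    for (chrom, strand), positions in sorted(by_strand.items()):
        positions.sort()
        clusters = [[positions[0]]]
        for pos in positions[1:]:
            # a cluster spans at most `window` nt, anchored at its first end
            if pos - clusters[-1][0] < window:
                clusters[-1].append(pos)
            else:
                clusters.append([pos])
        for members in clusters:
            if len(members) >= min_reads:
                mode_pos = Counter(members).most_common(1)[0][0]
                database.append((chrom, strand, mode_pos, len(members)))
    return database
```

Setting `min_reads=1` reproduces the more permissive threshold described for the organoid PAS database.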
3ʹ end filtering and correction
Individual assemblies from each method were corrected using the PAS database. The following filtering criteria were applied: 1) all isoforms whose 3ʹ end overlapped a database 3ʹ end within a 100-nt window were retained, and 2) assembly 3ʹ ends located more distally than the 3ʹ ends in the database were retained only if they fell within the reference annotation and contained an AATAAA signal. For 3ʹ end correction, 3ʹ UTR bins were defined from the PAS database, starting at the end of the open reading frame, delimited by each consecutive PAS, and ending at the most distal PAS. Isoform 3ʹ ends falling within the last bin of the 3ʹ UTR (between the two distal-most PASs) were corrected to the most distal bin, provided the isoform covered more than 10% of the last bin. Assemblies were first merged by tissue using TAMA.110 Isoforms were merged if they differed by less than 150 nt at the 3ʹ end, 50 nt at the 5ʹ end, and 10 nt at exon boundaries. After generating merged transcriptomes per technique per tissue, we combined the per-tissue transcriptomes to create the CIA assembly. All steps and pipelines used to create CIA are available at: https://doi.org/10.5281/zenodo.7759448.
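The bin-based 3ʹ end correction can be illustrated with plus-strand coordinates. This sketch assumes that "covered more than 10% of the last bin" means the isoform's 3ʹ end reaches more than 10% of the way into that bin, and that correction moves the end to the distal-most PAS; both readings, and all function names, are assumptions.

```python
def utr_bins(orf_end, pas_sites):
    """3' UTR bins from the ORF end through each consecutive PAS (plus strand)."""
    edges = [orf_end] + sorted(p for p in pas_sites if p > orf_end)
    return list(zip(edges[:-1], edges[1:]))


def correct_3prime_end(iso_end, orf_end, pas_sites, min_cov=0.10):
    """Snap an isoform 3' end in the distal bin to the distal-most PAS.

    Only applies when the last bin lies between two PASs and the isoform
    covers more than `min_cov` of it; otherwise the end is unchanged.
    """
    bins = utr_bins(orf_end, pas_sites)
    if len(bins) < 2:  # fewer than two PASs: no inter-PAS distal bin
        return iso_end
    start, end = bins[-1]
    if start < iso_end <= end and (iso_end - start) / (end - start) > min_cov:
        return end
    return iso_end
```

With an ORF ending at 100 and PASs at 300, 500, and 900, an isoform ending at 560 covers 15% of the last bin and is moved to 900, whereas one ending at 520 (5%) is left unchanged.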
Functional annotation of CIA transcriptome
To generate an isoform- and feature-level annotation of the CIA transcriptome, we used IsoAnnotLite version 2.7.3 with the “-novel” flag, using precomputed files for Drosophila melanogaster and the CIA reference. Annotated transcriptome data were deposited at NCBI Gene Expression Omnibus (GEO). To explore and retrieve features from the CIA transcriptome, the R package TaiLoR is available at: https://doi.org/10.5281/zenodo.7759434.
Generation of the human cerebral organoid CIA database
Organoid CIA was generated using FLAIR,40 with steps identical to those for Drosophila CIA except for the following modifications. The FANTOM TSS database111 was used for FLAIR collapse. The organoid 3ʹ end database was built from organoid FLAM-seq data,85 and short-read correction used organoid mRNA-seq data;85 both datasets were obtained from biological replicates of the RNA samples from which the ONT cDNA data were generated. A minimum of three full-length reads was required for an isoform to be collapsed into the assembly. The same database-building parameters were used as for Drosophila CIA, except that clusters supported by at least one read were included in the PAS database. The assembled transcriptomes were assessed for novel isoforms and structural categories using SQANTI3 v1.2.21
Poly(A) tail length estimation
For FLAM-seq datasets, poly(A) tail length estimation was performed using https://github.com/rajewsky-lab/FLAMAnalysis.38 For DRS datasets, poly(A) tail length estimations were performed using https://github.com/nanoporetech/pipeline-polya-ng. Lengths were summarized at gene level as the median poly(A) tail length per gene. At isoform level, tails were assigned to transcripts and summarized as median poly(A) tail length per transcript.
Saturation analysis
Saturation analysis was performed by pooling all ONT cDNA datasets from all tissues and randomly sampling fractions from 1% to 100% of the raw reads using seqtk v1.2-r94. The CIA framework was then applied to each fraction. Results were summarized as the fraction recovered relative to the full set.
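The saturation analysis can be sketched as below. The full CIA framework is replaced here by a hypothetical stand-in in which an isoform counts as recovered when at least three reads support it (the Nanopore threshold used for assembly), and seeded random subsampling plays the role of seqtk; the function names are illustrative.

```python
import random
from collections import Counter


def recovered_isoforms(read_isoforms, min_reads=3):
    """Hypothetical stand-in for assembly: isoforms with >= min_reads reads."""
    counts = Counter(read_isoforms)
    return {iso for iso, n in counts.items() if n >= min_reads}


def saturation_curve(read_isoforms, fractions, min_reads=3, seed=0):
    """Fraction of full-data isoforms recovered at each subsampling fraction."""
    rng = random.Random(seed)
    full = recovered_isoforms(read_isoforms, min_reads)
    curve = {}
    for frac in fractions:
        k = round(frac * len(read_isoforms))
        subsample = rng.sample(read_isoforms, k)  # sampling without replacement
        found = recovered_isoforms(subsample, min_reads)
        curve[frac] = len(found & full) / len(full) if full else 0.0
    return curve
```

A curve that flattens well before 100% would indicate that sequencing depth was sufficient to recover most isoforms.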
3ʹ end and 5ʹ end diversity calculation
The diversity of 3ʹ ends per gene type was estimated with the Shannon and Simpson indices using the R package vegan.96 To assess the regulatory relationship between TSS and PAS diversity, we computed the number of 3ʹ ends found in genes with increasing numbers of 5ʹ ends, and vice versa. The resulting count matrices were used as input for Shannon and Simpson index calculation with the function diversity().
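The two diversity indices, written out in Python for reference. This assumes the vegan defaults: Shannon with the natural logarithm and Simpson as 1 − Σp², each computed per row of the count matrix; the function names are illustrative.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i), natural log."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Gini-Simpson index 1 - sum(p_i^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def diversity_by_row(matrix, index=shannon):
    """Apply an index to each row, e.g. 3' end counts per gene."""
    return [index(row) for row in matrix]
```

A gene using four 3ʹ ends equally has a Simpson index of 0.75, while a gene with a single 3ʹ end scores 0 on both indices.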
3ʹ UTR length comparisons
Differential expression of 3ʹ ends in heads and ovaries was computed using DEXSeq.89 The average length of the bins that were significantly differentially expressed was calculated and summarized per gene for each tissue.
Long-reads-based Alternative Termination Estimation and Recognition (LATER)
Quantification of 5ʹ-3ʹ isoforms
We counted 5ʹ-3ʹ isoforms using GenomicFeatures.92 Each ONT cDNA read was assigned to a TSS within a 50-nt window and to a PAS within a 150-nt window. Only reads assigned to both features were retained and considered full-length reads. Counts were then summarized per 5ʹ-3ʹ combination. For dominant promoter calculations, transcripts longer than 10 kb were not assessed due to lack of full-length coverage.
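Read-to-feature assignment for 5ʹ-3ʹ isoform counting can be sketched as follows for a single gene, with coordinates in transcript orientation. Assumptions: each read is given as a (5ʹ position, 3ʹ position) pair, and a read is assigned to the closest TSS or PAS within the respective window. The original counting used GenomicFeatures in R; these Python names are hypothetical.

```python
from collections import Counter


def nearest_feature(pos, features, window):
    """ID of the closest feature within `window` nt of pos, or None."""
    best_id, best_dist = None, None
    for feat_id, feat_pos in features.items():
        dist = abs(pos - feat_pos)
        if dist <= window and (best_dist is None or dist < best_dist):
            best_id, best_dist = feat_id, dist
    return best_id


def count_5p3p_isoforms(reads, tss, pas, tss_window=50, pas_window=150):
    """Count full-length reads per (TSS, PAS) pair for one gene."""
    counts = Counter()
    for five_prime, three_prime in reads:
        t = nearest_feature(five_prime, tss, tss_window)
        p = nearest_feature(three_prime, pas, pas_window)
        if t is not None and p is not None:  # assigned to both ends: full-length
            counts[(t, p)] += 1
    return counts
```

Reads whose 5ʹ end lies more than 50 nt from every TSS, or whose 3ʹ end lies more than 150 nt from every PAS, are treated as not full-length and are dropped.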
Calculation of TSS bias in APA-ATSS genes
A joint frequency matrix containing the read counts of each 5ʹ-3ʹ isoform was summarized and subjected to multinomial testing with a chi-squared test. We used Monte Carlo simulation to obtain reliable p-value estimates and then corrected them using the Benjamini-Hochberg method. Only genes with at least two 5ʹ-3ʹ isoforms, each supported by at least two full-length reads, were considered for the analysis. For Drosophila data, a gene was classified as transcriptionally biased at an adjusted p-value < 0.1. For human brain organoid data, because these data were supported by fewer reads, we used a more stringent cutoff of adjusted p-value < 0.01.
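The TSS-bias test can be sketched as a chi-squared test of independence on the TSS × PAS count matrix with a Monte Carlo p-value, followed by Benjamini-Hochberg correction across genes. This is an illustration under stated assumptions: R's chisq.test(simulate.p.value = TRUE) draws random tables with fixed margins using Patefield's algorithm, whereas this sketch permutes read labels, which also holds both margins fixed. Whether the original analysis used exactly that R call is not stated, and the function names are hypothetical.

```python
import random


def chi_square_stat(table):
    """Pearson chi-squared statistic for a rows x columns count table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    stat = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            expected = r * c / total
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def monte_carlo_chisq_p(table, n_sim=2000, seed=0):
    """Monte Carlo p-value with both margins held fixed."""
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    row_labels, col_labels = [], []
    for i in range(n_rows):
        for j in range(n_cols):
            row_labels += [i] * table[i][j]  # one entry per read
            col_labels += [j] * table[i][j]
    observed = chi_square_stat(table)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(col_labels)  # permuting PAS labels keeps row and column totals
        sim = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(row_labels, col_labels):
            sim[i][j] += 1
        if chi_square_stat(sim) >= observed - 1e-9:
            hits += 1
    return (hits + 1) / (n_sim + 1)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Genes would then be called transcriptionally biased by comparing the adjusted values against the cutoffs given in the text (0.1 for Drosophila, 0.01 for organoids).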
Calculation of TSS bias
Promoter dominance was estimated using two metrics: TSS contribution and PAS contribution. TSS contribution is the number of reads of a given 5ʹ-3ʹ isoform divided by the total number of reads supporting the overall expression of that 3ʹ end. PAS contribution is the number of reads of a given 5ʹ-3ʹ isoform divided by the total number of reads supporting the overall expression of that 5ʹ end. A TSS was termed a “dominant promoter” if 1) the gene was classified as transcriptionally biased, 2) the TSS contribution exceeded 20%, and 3) the PAS contribution exceeded 60%. The R package LATER, with a description of all processing steps, is available at: https://doi.org/10.5281/zenodo.7759430.
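The dominant-promoter rule translates directly into code. In this sketch (hypothetical names), `counts` holds full-length read counts per (TSS, PAS) pair for one gene, and `gene_is_biased` is the outcome of the TSS-bias test.

```python
from collections import defaultdict


def dominant_promoters(counts, gene_is_biased, min_tss_contrib=0.20, min_pas_contrib=0.60):
    """TSSs meeting the dominant-promoter rule for one gene.

    counts: {(tss, pas): full-length read count}.
    """
    if not gene_is_biased:
        return set()
    reads_per_pas, reads_per_tss = defaultdict(int), defaultdict(int)
    for (tss, pas), n in counts.items():
        reads_per_pas[pas] += n
        reads_per_tss[tss] += n
    dominant = set()
    for (tss, pas), n in counts.items():
        tss_contribution = n / reads_per_pas[pas]  # share of this 3' end's reads
        pas_contribution = n / reads_per_tss[tss]  # share of this 5' end's reads
        if tss_contribution > min_tss_contrib and pas_contribution > min_pas_contrib:
            dominant.add(tss)
    return dominant
```

For instance, a TSS that sends 80 of its 90 reads to a distal PAS, and supplies 80 of that PAS's 105 reads, passes both thresholds.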
Quantification of differential 5ʹ-3ʹ isoform expression
5ʹ-3ʹ isoforms were quantified using the LATER counter and summarized as a count table per 5ʹ-3ʹ pair. The table was provided to the DEXSeq framework89 for differential isoform usage analysis, modeling each 5ʹ-3ʹ isoform as an exon feature within a gene group.112 To determine whether changes in 5ʹ-3ʹ isoform expression originated from the TSS, the PAS, or both, differential expression analysis was carried out separately for each TSS and each PAS, and the results were then assigned to each 5ʹ-3ʹ isoform.
Long-reads-based Alternative Splicing Estimation and Recognition (LASER)
LASER quantifies the regulatory links between exons, 5ʹ ends and 3ʹ ends. Given that every read represents a full-length transcript, we assessed all features of each read to quantify the frequency of co-occurrence between features using multinomial testing.
Quantification of TSS-exon or 3ʹ-exon associations
Reads were filtered to retain only full-length reads using the same parameters as in LATER. For every read, junctions were corrected using short-read sequencing data and the reference annotation. A per-read table was then created containing all exon junctions as well as the 5ʹ and 3ʹ ends. From this read-to-feature assignment, we summarized the total number of reads carrying each combination of a 5ʹ end with an exon junction, or of a 3ʹ end with an exon junction.
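The per-read feature table and co-occurrence counting can be sketched as below. Each read is assumed to be a dictionary holding its assigned 5ʹ end, 3ʹ end, and corrected junctions (hypothetical field names); a junction is counted once per read even if listed twice.

```python
from collections import Counter


def count_end_junction_pairs(reads):
    """Count reads per (5' end, junction) and per (3' end, junction) pair."""
    tss_junction, pas_junction = Counter(), Counter()
    for read in reads:
        for junction in set(read["junctions"]):  # count each junction once per read
            tss_junction[(read["tss"], junction)] += 1
            pas_junction[(read["pas"], junction)] += 1
    return tss_junction, pas_junction
```

Because each read spans the whole transcript, every junction on it can be paired directly with that read's own 5ʹ and 3ʹ ends, without inferring linkage across separate fragments.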
Calculation of TSS-exon or 3′-exon biases
We created a database of exon junctions that included only exons independent of 5ʹ regulation (alternative first exons) or 3ʹ regulation (alternative last exons). Only genes containing more than one splice junction combination were retained. A joint frequency matrix containing the total counts per 5ʹ-exon or 3ʹ-exon pair was summarized and subjected to multinomial testing as in LATER. As a measure of bias strength, we summarized the residuals of all tested combinations per gene as their sum of squares. To identify the splicing events underlying these links, we classified alternative splicing events in the CIA transcriptome using SUPPA.94 Using this annotation, exon junctions associated with the splicing events were extracted. Only junctions with an absolute residual change > 0.7 were considered biased. The R package LASER, with a description of all processing steps, is available at: https://doi.org/10.5281/zenodo.7759428.
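As a sketch of the bias-strength measure, the code below computes Pearson residuals, (observed − expected)/√expected, for an ends × junctions count table and sums their squares per gene. Treating "residual change" as the Pearson residual itself, and flagging a junction when any of its cells exceeds 0.7 in absolute value, are assumptions; the function names are hypothetical.

```python
import math


def pearson_residuals(table):
    """(observed - expected) / sqrt(expected) for a rows x columns count table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    residuals = []
    for i, r in enumerate(rows):
        row = []
        for j, c in enumerate(cols):
            expected = r * c / total
            row.append((table[i][j] - expected) / math.sqrt(expected) if expected > 0 else 0.0)
        residuals.append(row)
    return residuals


def bias_strength(residuals):
    """Sum of squared residuals for one gene."""
    return sum(x * x for row in residuals for x in row)


def biased_junctions(residuals, junction_ids, cutoff=0.7):
    """Junctions (columns) with any |residual| above cutoff."""
    return [jid for j, jid in enumerate(junction_ids)
            if any(abs(row[j]) > cutoff for row in residuals)]
```

Note that the sum of squared Pearson residuals equals the chi-squared statistic of the same table, so this bias strength grows with both the magnitude of the bias and the read depth of the gene.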
ChIP-seq data analysis
ChIP-seq data obtained from Drosophila head tissue (modENCODE61) were analyzed. FASTQ files were mapped and processed with snakePipes88 using the DNA-mapping and ChIP-seq workflows, with the flags “--singleEnd” and “--fragmentLength 50”. Bigwig signal tracks were generated by computing the log2 fold change of each ChIP relative to its input. Heatmaps, gene profiles, and clustering were generated using deepTools.113
Analysis of transcription factor enrichment at TSSs
TSS regions were defined from the CIA reference transcriptome using a 50-nt window. Enrichment of factors at TSSs was estimated using the ReMap2022 databases for Drosophila and human and the package ReMapEnrich,63 with ATSS-APA genes without a dominant promoter as background. Enrichment was considered significant at p-value < 0.01.
Analysis of single-cell RNA-seq data using the CIA 3ʹ end database
Raw data from the single-cell Drosophila brain transcriptome atlas49 were mapped using CellRanger.95 To generate the count matrix, the CIA 3ʹ end database was provided as input to the Sierra93 function CountPeaks(). Per-isoform, per-cell counts were annotated with cell type and clustering information using PeakSeuratFromTransfer() and metadata from Davie et al.49 3ʹ end expression was then summarized per cell type using the Seurat::AverageExpression() function and normalized using the Seurat::NormalizeData() function with the LogNormalize method. 3ʹ ends were considered expressed in a cell type if they were represented by at least 0.1 normalized counts.
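The expression call per cell type can be illustrated with the LogNormalize rule, ln(1 + count / total × scale factor). The sketch assumes Seurat's default scale factor of 10,000 and that averaged counts per cell type are already available as dictionaries; it is not a reimplementation of Sierra or Seurat, and the function names are hypothetical.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """LogNormalize-style transform: ln(1 + count / total * scale_factor)."""
    total = sum(counts.values())
    return {k: math.log1p(v / total * scale_factor) for k, v in counts.items()}


def expressed_3p_ends(avg_counts_by_celltype, min_norm=0.1):
    """3' ends with normalized expression >= min_norm, per cell type."""
    expressed = {}
    for cell_type, counts in avg_counts_by_celltype.items():
        norm = log_normalize(counts)
        expressed[cell_type] = {end for end, value in norm.items() if value >= min_norm}
    return expressed
```

Under these assumptions, the 0.1 cutoff corresponds to a 3ʹ end contributing roughly one in 100,000 of a cell type's averaged counts.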
Conservation of 5ʹ UTRs and 3ʹ UTRs
PhastCons scores for Drosophila were retrieved using the GenomicScores98 R package with the reference phastCons tree model for the 27 species.
Co-evolution analysis
We determined gene co-evolution maps at single-nucleotide resolution using pairwise mutual information between positions, derived from the UCSC 27-species alignment tracks with Drosophila melanogaster (dm6) as the reference sequence. For the genes stai and Act5C, we extracted multiple sequence alignments from −1.5 kb to the 3ʹ end of the gene. Because the alignments spanned the entire gene, including introns, they were filtered using the refineMSA function from the ProDy package, keeping sequences with at most 60% gaps (parameter rowocc = 0.4) and removing redundant sequences at a 98% identity level (parameter seqid = 0.98). We used mutual information to estimate the extent to which a nucleotide change at one position is accompanied by a change at another. Mutual information was normalized using the average product correction (APC)56 as implemented in the ProDy Python package.97 To perform a global analysis of co-evolution, we selected the top 50 dominant-promoter genes and the bottom 50 genes (ranked by p-value from the LATER analysis). To reduce computational time, we computed co-evolution using three regions of interest per gene: 1) TSS1 (−1 kb), 2) TSS2 (−1 kb), and 3) the entire 3ʹ UTR sequence. To extract the mutual information between each TSS region and the 3ʹ UTR from the co-evolution matrix, we identified local maxima of normalized mutual information using the function gsignal::findpeaks(x, MinPeakDistance = 2, MinPeakWidth = 2, MinPeakHeight = 0.2) of the R package gsignal. For every gene, we computed the sum of local maxima in the overlapping promoter/3ʹ UTR regions. Genes were classified as “co-evolving” when this sum was above the 50th percentile (median) of its distribution across the dataset. The code for all steps from extraction to processing and output is available at https://doi.org/10.5281/zenodo.7759440.
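Mutual information between alignment columns with the average product correction can be sketched in plain Python. It follows the APC formula MIp(i, j) = MI(i, j) − MI(i, ·)·MI(j, ·) / ⟨MI⟩, where MI(i, ·) is the mean MI of column i with all other columns and ⟨MI⟩ is the overall mean. This sketch does not attempt to reproduce ProDy's handling of gaps or ambiguous characters, and the function names are hypothetical.

```python
import math
from collections import Counter


def column_mi(col_a, col_b):
    """Mutual information (nats) between two alignment columns."""
    n = len(col_a)
    pa, pb, pab = Counter(col_a), Counter(col_b), Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in pab.items():
        p_ab = n_ab / n
        mi += p_ab * math.log(p_ab / ((pa[a] / n) * (pb[b] / n)))
    return mi


def mi_apc(alignment):
    """APC-corrected MI matrix for equal-length aligned sequences (rows = species)."""
    cols = list(zip(*alignment))
    L = len(cols)
    mi = [[0.0] * L for _ in range(L)]
    for i in range(L):
        for j in range(i + 1, L):
            mi[i][j] = mi[j][i] = column_mi(cols[i], cols[j])
    # mean MI of each column with every other column, and the overall mean
    col_mean = [sum(mi[i][j] for j in range(L) if j != i) / (L - 1) for i in range(L)]
    overall = sum(col_mean) / L
    apc = [[0.0] * L for _ in range(L)]
    for i in range(L):
        for j in range(L):
            if i != j and overall > 0:
                apc[i][j] = mi[i][j] - col_mean[i] * col_mean[j] / overall
    return apc
```

The correction subtracts the background MI expected from each column's overall variability, so that columns which simply mutate often do not dominate the co-evolution map.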
Identification of differential poly(A) site usage
We identified differential poly(A) site usage using the APA target caller86 with the parameters “min_distance = 100” and “padj < 0.05”.
Motif enrichment in dominant-promoter-associated 3ʹ UTRs
To predict potentially relevant microRNA binding sites (i.e., sites with a higher likelihood of exerting a functional impact on target mRNAs) in dominant-promoter-associated distal 3ʹ UTRs, we selected from MirGeneDB v.2.1,57 a subset of 65 microRNAs that were 1) highly conserved (node of origin: Diptera) and 2) well expressed in fly heads (at least 1,000 cpm). We collapsed them into 52 unique 7mer (positions 2–8) seed sequences and counted the occurrences of their reverse complement in either proximal or distal 3ʹ UTR isoforms of a set of 173 dominant-promoter genes for which the distal 3ʹ UTR was uniquely associated with a dominant promoter. RBP motif enrichment in dominant-promoter-associated distal 3ʹ UTRs was assessed on the distal 3ʹ UTR segments, extracted using the BSgenome.Dmelanogaster.UCSC.dm6 reference genome package in R. The FASTA files were submitted to the MEME suite server, and the AME program was used to calculate enrichment, with proximal 3ʹ UTR segments as control sequences. Motif scanning was performed using FIMO with a p-value cutoff < 0.0001, using the motif matrices from114 for RBP enrichment. For microRNA enrichment analysis, motif scanning used the miRBase v22 single-species microRNA database for Drosophila melanogaster.115 To further assess the regulatory potential of these 6 microRNAs, we first checked their expression in fly heads in MirGeneDB v.2.1.57 Only two of them, dme-miR-2279-5p and dme-miR-9388-5p, were (poorly) expressed in fly heads; for miR-2279-5p, we used TargetScan Fly v7.2100 to compile a transcriptome-wide list of predicted binding sites. MicroRNA targets were defined as genes with a cumulative weighted context score less than −1. A Gene Ontology analysis was performed on the resulting gene list (excluding mRNAs not expressed in heads) using DAVID (v2022q4), with head-expressed genes as the background. GO terms with a Bonferroni-corrected p-value < 0.05 were considered significant.
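Seed-site counting can be sketched as follows: take positions 2–8 of each mature microRNA, collapse microRNAs with identical seeds, and count exact matches of the DNA reverse complement in a 3ʹ UTR sequence. Counting overlapping matches is an assumption, and the helper names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def seed_site(mature_mirna):
    """DNA reverse complement of the 7mer seed (miRNA positions 2-8)."""
    seed = mature_mirna.upper().replace("U", "T")[1:8]  # 0-based slice = positions 2..8
    return "".join(COMPLEMENT[nt] for nt in reversed(seed))


def unique_seed_sites(mirnas):
    """Collapse miRNAs sharing a seed into one target site each."""
    return sorted({seed_site(m) for m in mirnas})


def count_sites(utr, site):
    """Count (possibly overlapping) exact matches of `site` in a 3' UTR."""
    utr = utr.upper()
    k = len(site)
    return sum(1 for i in range(len(utr) - k + 1) if utr[i:i + k] == site)
```

Comparing `count_sites` totals between proximal and distal 3ʹ UTR segments of the same genes gives the per-seed occurrence counts described above.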
3′-seq analysis
Reads were processed with fastp to remove poly(A) stretches and then mapped to the dm6 genome using STAR v2.6.1b with modified parameters ("--sjdbOverhang 74 --limitBAMsortRAM 60000000000 --alignIntronMax 1"). To eliminate signal that may originate from internal priming, poly(A) sites overlapping a strand-specific blacklist region, consisting of genomic positions with more than 70% As in a 10-bp upstream window, were discarded. Regions with high A density within 250 bp of annotated transcription end sites were not included in the blacklist. The remaining single-base-pair poly(A) sites from all samples with a minimum coverage of 5 reads per sample were grouped, and sites within 15 bp of each other were merged into a single poly(A) cluster.
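The internal-priming filter and site clustering can be sketched for a single chromosome strand. Assumptions: the 10-nt window for each site is pre-extracted in the orientation described above, the read-count cutoff is applied per sample before pooling, and merging within 15 bp chains neighboring sites (single linkage); all names are hypothetical.

```python
def a_rich(window_seq, max_a_fraction=0.7):
    """True if more than max_a_fraction of the window is adenine."""
    seq = window_seq.upper()
    return seq.count("A") / len(seq) > max_a_fraction


def filter_and_cluster(sites, windows, annotated_tes, min_reads=5,
                       tes_exempt=250, merge_dist=15):
    """Drop likely internal-priming sites, then merge sites within merge_dist bp.

    sites: {position: read count} for one chromosome strand and sample.
    windows: {position: 10-nt genomic window for that site}.
    annotated_tes: annotated transcription end site positions.
    """
    kept = []
    for pos, n_reads in sites.items():
        if n_reads < min_reads:
            continue
        near_tes = any(abs(pos - tes) <= tes_exempt for tes in annotated_tes)
        if a_rich(windows[pos]) and not near_tes:
            continue  # blacklisted: likely internal priming
        kept.append(pos)
    clusters = []
    for pos in sorted(kept):
        if clusters and pos - clusters[-1][-1] <= merge_dist:
            clusters[-1].append(pos)  # chain onto the previous cluster
        else:
            clusters.append([pos])
    return clusters
```

The exemption near annotated transcription end sites keeps genuine PASs that happen to sit next to A-rich genomic sequence from being blacklisted.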
Random forest classification of 3ʹ ends
Using 3ʹ-seq clusters, we extracted features from 3ʹ ends identified by FLAM-seq in human organoids. These features included poly(A) signals within 20 nucleotides upstream of the identified 3ʹ end, as well as the nucleotide content and annotated feature (e.g., 3ʹ UTR, 5ʹ UTR) of the 3ʹ end. We used these features to train a random forest model in R using the randomForest package. We created a training set with FLAM-seq 3ʹ end clusters as the TRUE set and non-overlapping 3ʹ ends as the FALSE set. The model was trained with 1,000 trees and 12 variables randomly sampled as candidates at each split (randomForest(ntree=1000, mtry=12)). The clusters classified as TRUE were then used as a poly(A) database to correct human organoid assemblies. Pretrained models are available at: https://doi.org/10.5281/zenodo.7438383.
Quantification and statistical analysis
Statistical parameters and tests are reported in the respective figure legends; software used is described in the STAR Methods section and in the key resources table.
Additional resources
A Drosophila mRNA isoform atlas, depicting all identified CIA transcript isoforms and their differential expression across several tissues and developmental stages, is publicly available at https://hilgerslab.shinyapps.io/ciaTranscriptome.
Acknowledgments
We thank Salah Ayoub for technical help, Stephanie Falk at the MPI-IE Deep Sequencing Core, and Claudia Quedenau at the BIH/MDC Genomics Technology Platform. We are grateful to Alejandro Gomez Auli and Gerhard Mittler at the Proteomics Core and Thomas Manke and the Bioinformatics Core at MPI-IE, especially Leily Rabbani and Devon Ryan for help with long-read processing and base calling. We thank Judit Carrasco, Dominika Grzejda, Anton Hess, Sakshi Gorey, Yidan Sun, Niyazi Umut Erdogdu, Laurent Pieuchot, Dominic Grün, Wolfgang Driever, and Nicola Iovino for helpful discussions and feedback. We thank Marvin Jens, Grygoriy Zolotarov, Michael Rauer, and Andrew Rezansoff for expert advice on data analysis. We thank the TRiP at Harvard Medical School (NIH/NIGMS R01-GM084947) for providing transgenic RNAi fly stocks. Stocks obtained from the Bloomington Drosophila Stock Center (NIH P40OD018537) were used in this study. This work was funded by the Max Planck Society, the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) project 500335138, SFB 1381 (project-ID 403222702), and under Germany’s Excellence Strategy (CIBSS—EXC-2189—project ID 390939984), and the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement no. ERC-2018-STG-803258).
Author contributions
V.H. conceptualized the study. C.A.-G., S.H., I.L., L.A., F.M., A.R.-W., U.B., H.C.O., and V.H. performed experiments. C.A.-G., V.H., S.H., I.L., L.A., and U.B. designed and analyzed experiments. C.A.-G., V.H., I.L., and N.R. designed computational data analysis. C.A.-G. and H.C.O. performed computational data analysis. I.L. performed FLAM data analysis and microRNA analysis. C.A.-G. and D.K. performed co-evolution analysis. D.K. optimized isoform assembly. V.H. and C.A.-G. prepared the figures. V.H. and C.A.-G. wrote the manuscript with input from all authors. V.H. and N.R. supervised the study and acquired funding.
Declaration of interests
The authors declare no competing interests.
Inclusion and diversity
We support inclusive, diverse, and equitable conduct of research. We worked to ensure sex balance in the selection of non-human subjects. One or more of the authors of this paper self-identifies as an underrepresented ethnic minority in their field of research or within their geographical location. One or more of the authors of this paper self-identifies as a gender minority in their field of research. While citing references scientifically relevant for this work, we also actively worked to promote gender balance in our reference list.
Published: May 12, 2023
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.cell.2023.04.012.
Supplemental information
Data and code availability
- All LRS and RNA-seq data have been deposited at NCBI Gene Expression Omnibus (GEO) and are publicly available as of the date of publication. Accession numbers are listed in the key resources table.
- All original code has been deposited at Zenodo and is publicly available. DOIs and GitHub links are listed in the key resources table.
- Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
References
- 1.Mitschka S., Mayr C. Context-specific regulation and function of mRNA alternative polyadenylation. Nat. Rev. Mol. Cell Biol. 2022;23:779–796. doi: 10.1038/s41580-022-00507-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.LaForce G.R., Philippidou P., Schaffer A.E. mRNA isoform balance in neuronal development and disease. Wiley Interdiscip. Rev. RNA. 2022 doi: 10.1002/wrna.1762. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Gruber A.J., Zavolan M. Alternative cleavage and polyadenylation in health and disease. Nat. Rev. Genet. 2019;20:599–614. doi: 10.1038/s41576-019-0145-z. [DOI] [PubMed] [Google Scholar]
- 4.Mariella E., Marotta F., Grassi E., Gilotto S., Provero P. The length of the expressed 3′ UTR is an intermediate molecular phenotype linking genetic variants to complex diseases. Front. Genet. 2019;10:714. doi: 10.3389/fgene.2019.00714. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Li L., Huang K.L., Gao Y., Cui Y., Wang G., Elrod N.D., Li Y., Chen Y.E., Ji P., Peng F., et al. An atlas of alternative polyadenylation quantitative trait loci contributing to complex trait and disease heritability. Nat. Genet. 2021;53:994–1005. doi: 10.1038/s41588-021-00864-5. [DOI] [PubMed] [Google Scholar]
- 6.Hilgers V. Regulation of neuronal RNA signatures by ELAV/Hu proteins. Wiley Interdiscip. Rev. RNA. 2023;14 doi: 10.1002/wrna.1733. [DOI] [PubMed] [Google Scholar]
- 7.Nagaike T., Logan C., Hotta I., Rozenblatt-Rosen O., Meyerson M., Manley J.L. Transcriptional activators enhance polyadenylation of mRNA precursors. Mol. Cell. 2011;41:409–418. doi: 10.1016/j.molcel.2011.01.022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Gromak N., West S., Proudfoot N.J. Pause sites promote transcriptional termination of mammalian RNA polymerase II. Mol. Cell. Biol. 2006;26:3986–3996. doi: 10.1128/MCB.26.10.3986-3996.2006. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Dubbury S.J., Boutz P.L., Sharp P.A. CDK12 regulates DNA repair genes by suppressing intronic polyadenylation. Nature. 2018;564:141–145. doi: 10.1038/s41586-018-0758-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Yang Y., Li W., Hoque M., Hou L., Shen S., Tian B., Dynlacht B.D. PAF complex plays novel subunit-specific roles in alternative cleavage and polyadenylation. PLOS Genet. 2016;12 doi: 10.1371/journal.pgen.1005794. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Oktaba K., Zhang W., Lotz T.S., Jun D.J., Lemke S.B., Ng S.P., Esposito E., Levine M., Hilgers V. ELAV links paused Pol II to alternative polyadenylation in the drosophila nervous system. Mol. Cell. 2015;57:341–348. doi: 10.1016/j.molcel.2014.11.024. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Kwon B., Fansler M.M., Patel N.D., Lee J., Ma W., Mayr C. Enhancers regulate 3′ end processing activity to control expression of alternative 3′UTR isoforms. Nat. Commun. 2022;13 doi: 10.1038/s41467-022-30525-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Xiao R., Chen J.-Y., Liang Z., Luo D., Chen G., Lu Z.J., Chen Y., Zhou B., Li H., Du X., et al. Pervasive chromatin-RNA binding protein interactions enable RNA-based regulation of transcription. Cell. 2019;178:107–121.e118. doi: 10.1016/j.cell.2019.06.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Dantonel J.C., Murthy K.G.K., Manley J.L., Tora L. Transcription factor TFIID recruits factor CPSF for formation of 3′ end of mRNA. Nature. 1997;389:399–402. doi: 10.1038/38763. [DOI] [PubMed] [Google Scholar]
- 15.Glover-Cutter K., Kim S., Espinosa J., Bentley D.L. RNA polymerase II pauses and associates with pre-mRNA processing factors at both ends of genes. Nat. Struct. Mol. Biol. 2008;15:71–78. doi: 10.1038/nsmb1352. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Wang Y., Fairley J.A., Roberts S.G. Phosphorylation of TFIIB links transcription initiation and termination. Curr. Biol. 2010;20:548–553. doi: 10.1016/j.cub.2010.01.052. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Nanavaty V., Abrash E.W., Hong C., Park S., Fink E.E., Li Z., Sweet T.J., Bhasin J.M., Singuri S., Lee B.H., et al. DNA methylation regulates alternative polyadenylation via CTCF and the cohesin complex. Mol. Cell. 2020;78:752–764.e6. doi: 10.1016/j.molcel.2020.03.024. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Lamas-Maceiras M., Singh B.N., Hampsey M., Freire-Picos M.A. Promoter-terminator gene loops affect alternative 3′-end processing in yeast. J. Biol. Chem. 2016;291:8960–8968. doi: 10.1074/jbc.M115.687491. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Soles L.V., Shi Y. Crosstalk between mRNA 3′-end processing and epigenetics. Front. Genet. 2021;12 doi: 10.3389/fgene.2021.637705. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Soneson C., Yao Y., Bratus-Neuenschwander A., Patrignani A., Robinson M.D., Hussain S. A comprehensive examination of nanopore native RNA sequencing for characterization of complex transcriptomes. Nat. Commun. 2019;10 doi: 10.1038/s41467-019-11272-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Tardaguila M., de la Fuente L., Marti C., Pereira C., Pardo-Palacios F.J., Del Risco H., Ferrell M., Mellado M., Macchietto M., Verheggen K., et al. SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification. Genome Res. 2018;28:396–411. doi: 10.1101/gr.222976.117. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Chen Y., Davidson N.M., Wan Y.K., Patel H., Yao F., Low H.M., Hendra C., Watten L., Sim A., Sawyer C., et al. 2021. A systematic benchmark of nanopore long read RNA sequencing for transcript level analysis in human cell lines. [DOI] [Google Scholar]
- 23.Logsdon G.A., Vollger M.R., Eichler E.E. Long-read human genome sequencing and its applications. Nat. Rev. Genet. 2020;21:597–614. doi: 10.1038/s41576-020-0236-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Oikonomopoulos S., Bayega A., Fahiminiya S., Djambazian H., Berube P., Ragoussis J. Methodologies for transcript profiling using long-read technologies. Front. Genet. 2020;11 doi: 10.3389/fgene.2020.00606. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Sharon D., Tilgner H., Grubert F., Snyder M. A single-molecule long-read survey of the human transcriptome. Nat. Biotechnol. 2013;31:1009–1014. doi: 10.1038/nbt.2705. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Wang X., You X., Langer J.D., Hou J., Rupprecht F., Vlatkovic I., Quedenau C., Tushev G., Epstein I., Schaefke B., et al. Full-length transcriptome reconstruction reveals a large diversity of RNA and protein isoforms in rat hippocampus. Nat. Commun. 2019;10 doi: 10.1038/s41467-019-13037-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Glinos D.A., Garborcauskas G., Hoffman P., Ehsan N., Jiang L., Gokden A., Dai X., Aguet F., Brown K.L., Garimella K., et al. Transcriptome variation in human tissues revealed by long-read sequencing. Nature. 2022;608:353–359. doi: 10.1038/s41586-022-05035-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Anvar S.Y., Allard G., Tseng E., Sheynkman G.M., de Klerk E., Vermaat M., Yin R.H., Johansson H.E., Ariyurek Y., den Dunnen J.T., et al. Full-length mRNA sequencing uncovers a widespread coupling between transcription initiation and mRNA processing. Genome Biol. 2018;19 doi: 10.1186/s13059-018-1418-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Sousa-Luís R., Dujardin G., Zukher I., Kimura H., Weldon C., Carmo-Fonseca M., Proudfoot N.J., Nojima T. POINT technology illuminates the processing of polymerase-associated intact nascent transcripts. Mol. Cell. 2021;81:1935–1950.e6. doi: 10.1016/j.molcel.2021.02.034. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Drexler H.L., Choquet K., Merens H.E., Tang P.S., Simpson J.T., Churchman L.S. Revealing nascent RNA processing dynamics with Nano-COP. Nat. Protoc. 2021;16:1343–1375. doi: 10.1038/s41596-020-00469-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Reimer K.A., Mimoso C.A., Adelman K., Neugebauer K.M. Co-transcriptional splicing regulates 3′ end cleavage during mammalian erythropoiesis. Mol. Cell. 2021;81:998–1012.e7. doi: 10.1016/j.molcel.2020.12.018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Prudêncio P., Savisaar R., Rebelo K., Martinho R.G., Carmo-Fonseca M. Transcription and splicing dynamics during early Drosophila development. Rna. 2022;28:139–161. doi: 10.1261/rna.078933.121. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Li B., Marques S., Wang J., Pelechano V. Using TIF-Seq2 to investigate association between 5′ and 3′mRNA ends. Methods Enzymol. 2021;655:85–118. doi: 10.1016/bs.mie.2021.03.017. [DOI] [PubMed] [Google Scholar]
- 34.Amarasinghe S.L., Su S., Dong X., Zappia L., Ritchie M.E., Gouil Q. Opportunities and challenges in long-read sequencing data analysis. Genome Biol. 2020;21 doi: 10.1186/s13059-020-1935-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Brown J.B., Boley N., Eisman R., May G.E., Stoiber M.H., Duff M.O., Booth B.W., Wen J., Park S., Suzuki A.M., et al. Diversity and dynamics of the Drosophila transcriptome. Nature. 2014;512:393–399. doi: 10.1038/nature12962. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Larkin A., Marygold S.J., Antonazzo G., Attrill H., dos Santos G., Garapati P.V., Goodman J.L., Gramates L.S., Millburn G., Strelets V.B., et al. FlyBase: updates to the Drosophila melanogaster knowledge base. Nucleic Acids Res. 2021;49:D899–D907. doi: 10.1093/nar/gkaa1026. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Parker M.T., Knop K., Sherwood A.V., Schurch N.J., Mackinnon K., Gould P.D., Hall A.J.W., Barton G.J., Simpson G.G. Nanopore direct RNA sequencing maps the complexity of Arabidopsis mRNA processing and m⁶A modification. eLife. 2020;9 doi: 10.7554/eLife.49658.
- 38.Legnini I., Alles J., Karaiskos N., Ayoub S., Rajewsky N. FLAM-seq: full-length mRNA sequencing reveals principles of poly(A) tail length control. Nat. Methods. 2019;16:879–886. doi: 10.1038/s41592-019-0503-y.
- 39.Meylan P., Dreos R., Ambrosini G., Groux R., Bucher P. EPD in 2020: enhanced data visualization and extension to ncRNA promoters. Nucleic Acids Res. 2020;48:D65–D69. doi: 10.1093/nar/gkz1014.
- 40.Tang A.D., Soulette C.M., van Baren M.J., Hart K., Hrabeta-Robinson E., Wu C.J., Brooks A.N. Full-length transcript characterization of SF3B1 mutation in chronic lymphocytic leukemia reveals downregulation of retained introns. Nat. Commun. 2020;11 doi: 10.1038/s41467-020-15171-6.
- 41.Retelska D., Iseli C., Bucher P., Jongeneel C.V., Naef F. Similarities and differences of polyadenylation signals in human and fly. BMC Genomics. 2006;7 doi: 10.1186/1471-2164-7-176.
- 42.Hilgers V., Perry M.W., Hendrix D., Stark A., Levine M., Haley B. Neural-specific elongation of 3′ UTRs during Drosophila development. Proc. Natl. Acad. Sci. USA. 2011;108:15864–15869. doi: 10.1073/pnas.1112672108.
- 43.Smibert P., Miura P., Westholm J.O., Shenker S., May G., Duff M.O., Zhang D., Eads B.D., Carlson J., Brown J.B., et al. Global patterns of tissue-specific alternative polyadenylation in Drosophila. Cell Rep. 2012;1:277–289. doi: 10.1016/j.celrep.2012.01.001.
- 44.Ulitsky I., Shkumatava A., Jan C.H., Subtelny A.O., Koppstein D., Bell G.W., Sive H., Bartel D.P. Extensive alternative polyadenylation during zebrafish development. Genome Res. 2012;22:2054–2066. doi: 10.1101/gr.139733.112.
- 45.Zhang H., Lee J.Y., Tian B. Biased alternative polyadenylation in human tissues. Genome Biol. 2005;6 doi: 10.1186/gb-2005-6-12-r100.
- 46.Zirin J., Hu Y., Liu L., Yang-Zhou D., Colbeth R., Yan D., Ewen-Campen B., Tao R., Vogt E., VanNest S., et al. Large-scale transgenic Drosophila resource collections for loss- and gain-of-function studies. Genetics. 2020;214:755–767. doi: 10.1534/genetics.119.302964.
- 47.Zhu S., Lian Q., Ye W., Qin W., Wu Z., Ji G., Wu X. scAPAdb: a comprehensive database of alternative polyadenylation at single-cell resolution. Nucleic Acids Res. 2022;50:D365–D370. doi: 10.1093/nar/gkab795.
- 48.Lee S., Chen Y.C., FCA Consortium, Gillen A.E., Taliaferro J.M., Deplancke B., Li H., Lai E.C. Diverse cell-specific patterns of alternative polyadenylation in Drosophila. Nat. Commun. 2022;13 doi: 10.1038/s41467-022-32305-0.
- 49.Davie K., Janssens J., Koldere D., De Waegeneer M., Pech U., Kreft Ł., Aibar S., Makhzami S., Christiaens V., Bravo González-Blas C., et al. A single-cell transcriptome atlas of the aging Drosophila brain. Cell. 2018;174:982–998.e20. doi: 10.1016/j.cell.2018.05.057.
- 50.Ji Z., Luo W., Li W., Hoque M., Pan Z., Zhao Y., Tian B. Transcriptional activity regulates alternative cleavage and polyadenylation. Mol. Syst. Biol. 2011;7 doi: 10.1038/msb.2011.69.
- 51.Bentley D.L. Coupling mRNA processing with transcription in time and space. Nat. Rev. Genet. 2014;15:163–175. doi: 10.1038/nrg3662.
- 52.Geisberg J.V., Moqtaderi Z., Struhl K. The transcriptional elongation rate regulates alternative polyadenylation in yeast. eLife. 2020;9 doi: 10.7554/eLife.59810.
- 53.Bogard N., Linder J., Rosenberg A.B., Seelig G. A deep neural network for predicting and engineering alternative polyadenylation. Cell. 2019;178:91–106.e23. doi: 10.1016/j.cell.2019.04.046.
- 54.Tian B., Manley J.L. Alternative polyadenylation of mRNA precursors. Nat. Rev. Mol. Cell Biol. 2017;18:18–30. doi: 10.1038/nrm.2016.116.
- 55.Weinreb C., Riesselman A.J., Ingraham J.B., Gross T., Sander C., Marks D.S. 3D RNA and functional interactions from evolutionary couplings. Cell. 2016;165:963–975. doi: 10.1016/j.cell.2016.03.030.
- 56.Dunn S.D., Wahl L.M., Gloor G.B. Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction. Bioinformatics. 2008;24:333–340. doi: 10.1093/bioinformatics/btm604.
- 57.Fromm B., Høye E., Domanska D., Zhong X., Aparicio-Puerta E., Ovchinnikov V., Umu S.U., Chabot P.J., Kang W., Aslanzadeh M., et al. MirGeneDB 2.1: toward a complete sampling of all major animal phyla. Nucleic Acids Res. 2022;50:D204–D210. doi: 10.1093/nar/gkab1101.
- 58.McNeill E.M., Warinner C., Alkins S., Taylor A., Heggeness H., DeLuca T.F., Fulga T.A., Wall D.P., Griffith L.C., Van Vactor D. The conserved microRNA miR-34 regulates synaptogenesis via coordination of distinct mechanisms in presynaptic and postsynaptic cells. Nat. Commun. 2020;11 doi: 10.1038/s41467-020-14761-8.
- 59.Tan H., Poidevin M., Li H., Chen D., Jin P. MicroRNA-277 modulates the neurodegeneration caused by fragile X premutation rCGG repeats. PLoS Genet. 2012;8 doi: 10.1371/journal.pgen.1002681.
- 60.Chen S., Wang R., Zheng D., Zhang H., Chang X., Wang K., Li W., Fan J., Tian B., Cheng H. The mRNA export receptor NXF1 coordinates transcriptional dynamics, alternative polyadenylation, and mRNA export. Mol. Cell. 2019;74:118–131.e7. doi: 10.1016/j.molcel.2019.01.026.
- 61.Kharchenko P.V., Alekseyenko A.A., Schwartz Y.B., Minoda A., Riddle N.C., Ernst J., Sabo P.J., Larschan E., Gorchakov A.A., Gu T., et al. Comprehensive analysis of the chromatin landscape in Drosophila melanogaster. Nature. 2011;471:480–485. doi: 10.1038/nature09725.
- 62.Ibarra-Morales D., Rauer M., Quarato P., Rabbani L., Zenk F., Schulte-Sasse M., Cardamone F., Gomez-Auli A., Cecere G., Iovino N. Histone variant H2A.Z regulates zygotic genome activation. Nat. Commun. 2021;12 doi: 10.1038/s41467-021-27125-7.
- 63.Hammal F., de Langen P., Bergon A., Lopez F., Ballester B. ReMap 2022: a database of Human, Mouse, Drosophila and Arabidopsis regulatory regions from an integrative analysis of DNA-binding sequencing experiments. Nucleic Acids Res. 2022;50:D316–D325. doi: 10.1093/nar/gkab996.
- 64.Feller C., Forné I., Imhof A., Becker P.B. Global and specific responses of the histone acetylome to systematic perturbation. Mol. Cell. 2015;57:559–571. doi: 10.1016/j.molcel.2014.12.008.
- 65.Jin Q., Yu L.R., Wang L., Zhang Z., Kasper L.H., Lee J.E., Wang C., Brindle P.K., Dent S.Y., Ge K. Distinct roles of GCN5/PCAF-mediated H3K9ac and CBP/p300-mediated H3K18/27ac in nuclear receptor transactivation. EMBO J. 2011;30:249–262. doi: 10.1038/emboj.2010.318.
- 66.Kelley K.W., Pașca S.P. Human brain organogenesis: toward a cellular understanding of development and disease. Cell. 2022;185:42–61. doi: 10.1016/j.cell.2021.10.003.
- 67.Naftelberg S., Schor I.E., Ast G., Kornblihtt A.R. Regulation of alternative splicing through coupling with transcription and chromatin structure. Annu. Rev. Biochem. 2015;84:165–198. doi: 10.1146/annurev-biochem-060614-034242.
- 68.Boumpas P., Merabet S., Carnesecchi J. Integrating transcription and splicing into cell fate: transcription factors on the block. WIREs RNA. 2022. doi: 10.1002/wrna.1752.
- 69.Cramer P., Pesce C.G., Baralle F.E., Kornblihtt A.R. Functional association between promoter structure and transcript alternative splicing. Proc. Natl. Acad. Sci. USA. 1997;94:11456–11460. doi: 10.1073/pnas.94.21.11456.
- 70.de la Mata M., Alonso C.R., Kadener S., Fededa J.P., Blaustein M., Pelisch F., Cramer P., Bentley D., Kornblihtt A.R. A slow RNA polymerase II affects alternative splicing in vivo. Mol. Cell. 2003;12:525–532. doi: 10.1016/j.molcel.2003.08.001.
- 71.Rambout X., Dequiedt F., Maquat L.E. Beyond transcription: roles of transcription factors in Pre-mRNA splicing. Chem. Rev. 2018;118:4339–4364. doi: 10.1021/acs.chemrev.7b00470.
- 72.Reyes A., Huber W. Alternative start and termination sites of transcription drive most transcript isoform differences across human tissues. Nucleic Acids Res. 2018;46:582–592. doi: 10.1093/nar/gkx1165.
- 73.Proudfoot N.J. Transcriptional termination in mammals: stopping the RNA polymerase II juggernaut. Science. 2016;352 doi: 10.1126/science.aad9926.
- 74.Byeon G.W., Cenik E.S., Jiang L., Tang H., Das R., Barna M. Functional and structural basis of extreme conservation in vertebrate 5′ untranslated regions. Nat. Genet. 2021;53:729–741. doi: 10.1038/s41588-021-00830-1.
- 75.Theil K., Herzog M., Rajewsky N. Post-transcriptional regulation by 3′ UTRs can be masked by regulatory elements in 5′ UTRs. Cell Rep. 2018;22:3217–3226. doi: 10.1016/j.celrep.2018.02.094.
- 76.Vicens Q., Kieft J.S., Rissland O.S. Revisiting the closed-loop model and the nature of mRNA 5′–3′ communication. Mol. Cell. 2018;72:805–812. doi: 10.1016/j.molcel.2018.10.047.
- 77.Luo M., Bai J., Liu B., Yan P., Zuo F., Sun H., Sun Y., Xu X., Song Z., Yang Y., et al. H3K18ac primes mesendodermal differentiation upon nodal signaling. Stem Cell Rep. 2019;13:642–656. doi: 10.1016/j.stemcr.2019.08.016.
- 78.Hilgers V. Alternative polyadenylation coupled to transcription initiation: insights from ELAV-mediated 3′ UTR extension. RNA Biol. 2015;12:918–921. doi: 10.1080/15476286.2015.1060393.
- 79.Bose D.A., Donahue G., Reinberg D., Shiekhattar R., Bonasio R., Berger S.L. RNA binding to CBP stimulates histone acetylation and transcription. Cell. 2017;168:135–149.e22. doi: 10.1016/j.cell.2016.12.020.
- 80.Winick-Ng W., Kukalev A., Harabula I., Zea-Redondo L., Szabó D., Meijer M., Serebreni L., Zhang Y., Bianco S., Chiariello A.M., et al. Cell-type specialization is encoded by specific chromatin topologies. Nature. 2021;599:684–691. doi: 10.1038/s41586-021-04081-2.
- 81.Leidescher S., Ribisel J., Ullrich S., Feodorova Y., Hildebrand E., Galitsyna A., Bultmann S., Link S., Thanisch K., Mulholland C., et al. Spatial organization of transcribed eukaryotic genes. Nat. Cell Biol. 2022;24:327–339. doi: 10.1038/s41556-022-00847-6.
- 82.Haberle V., Arnold C.D., Pagani M., Rath M., Schernhuber K., Stark A. Transcriptional cofactors display specificity for distinct types of core promoters. Nature. 2019;570:122–126. doi: 10.1038/s41586-019-1210-7.
- 83.Mittleman B.E., Pott S., Warland S., Zeng T., Mu Z., Kaur M., Gilad Y., Li Y. Alternative polyadenylation mediates genetic regulation of gene expression. eLife. 2020;9 doi: 10.7554/eLife.57492.
- 84.Cummings B.B., Karczewski K.J., Kosmicki J.A., Seaby E.G., Watts N.A., Singer-Berk M., Mudge J.M., Karjalainen J., Satterstrom F.K., O'Donnell-Luria A.H., et al. Transcript expression-aware annotation improves rare variant interpretation. Nature. 2020;581:452–458. doi: 10.1038/s41586-020-2329-2.
- 85.Rybak-Wolf A., Wyler E., Legnini I., Loewa A., Glažar P., Kim S.J., Pentimalli T.M., Martinez A.O., Beyersdorf B., Woehler A., et al. 2021. Neurodegeneration in human brain organoids infected with herpes simplex virus type 1.
- 86.Carrasco J., Rauer M., Hummel B., Grzejda D., Alfonso-Gonzalez C., Lee Y., Wang Q., Puchalska M., Mittler G., Hilgers V. ELAV and FNE determine neuronal transcript signatures through EXon-activated rescue. Mol. Cell. 2020;80:156–163.e6. doi: 10.1016/j.molcel.2020.09.011.
- 87.Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094–3100. doi: 10.1093/bioinformatics/bty191.
- 88.Bhardwaj V., Heyne S., Sikora K., Rabbani L., Rauer M., Kilpert F., Richter A.S., Ryan D.P., Manke T. snakePipes: facilitating flexible, scalable and integrative epigenomic analysis. Bioinformatics. 2019;35:4757–4759. doi: 10.1093/bioinformatics/btz436.
- 89.Anders S., Reyes A., Huber W. Detecting differential usage of exons from RNA-seq data. Genome Res. 2012;22:2008–2017. doi: 10.1101/gr.133744.111.
- 90.Love M.I., Huber W., Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15 doi: 10.1186/s13059-014-0550-8.
- 91.Dobin A., Davis C.A., Schlesinger F., Drenkow J., Zaleski C., Jha S., Batut P., Chaisson M., Gingeras T.R. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635.
- 92.Lawrence M., Huber W., Pagès H., Aboyoun P., Carlson M., Gentleman R., Morgan M.T., Carey V.J. Software for computing and annotating genomic ranges. PLoS Comput. Biol. 2013;9 doi: 10.1371/journal.pcbi.1003118.
- 93.Patrick R., Humphreys D.T., Janbandhu V., Oshlack A., Ho J.W.K., Harvey R.P., Lo K.K. Sierra: discovery of differential transcript usage from polyA-captured single-cell RNA-seq data. Genome Biol. 2020;21 doi: 10.1186/s13059-020-02071-7.
- 94.Trincado J.L., Entizne J.C., Hysenaj G., Singh B., Skalic M., Elliott D.J., Eyras E. SUPPA2: fast, accurate, and uncertainty-aware differential splicing analysis across multiple conditions. Genome Biol. 2018;19 doi: 10.1186/s13059-018-1417-1.
- 95.Zheng G.X.Y., Terry J.M., Belgrader P., Ryvkin P., Bent Z.W., Wilson R., Ziraldo S.B., Wheeler T.D., McDermott G.P., Zhu J., et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 2017;8 doi: 10.1038/ncomms14049.
- 96.Oksanen J., Simpson G.L., Blanchet F.G., Kindt R., Legendre P., Minchin P.R., O'Hara R.B., Solymos P., Stevens M.H.H., Szoecs E., et al. 2022. vegan: Community Ecology Package. https://github.com/vegandevs/vegan
- 97.Zhang S., Krieger J.M., Zhang Y., Kaya C., Kaynak B., Mikulska-Ruminska K., Doruker P., Li H., Bahar I. ProDy 2.0: increased scale and scope after 10 years of protein dynamics modelling with Python. Bioinformatics. 2021;37:3657–3659. doi: 10.1093/bioinformatics/btab187.
- 98.Puigdevall P., Castelo R. GenomicScores: seamless access to genomewide position-specific scores from R and Bioconductor. Bioinformatics. 2018;34:3208–3210. doi: 10.1093/bioinformatics/bty311.
- 99.Wang Q., Li M., Wu T., Zhan L., Li L., Chen M., Xie W., Xie Z., Hu E., Xu S., et al. Exploring epigenomic datasets by ChIPseeker. Curr. Protoc. 2022;2 doi: 10.1002/cpz1.585.
- 100.Agarwal V., Subtelny A.O., Thiru P., Ulitsky I., Bartel D.P. Predicting microRNA targeting efficacy in Drosophila. Genome Biol. 2018;19 doi: 10.1186/s13059-018-1504-3.
- 101.Marek K.W., Ng N., Fetter R., Smolik S., Goodman C.S., Davis G.W. A genetic analysis of synaptic development. Neuron. 2000;25:537–547. doi: 10.1016/S0896-6273(00)81058-2.
- 102.Akimaru H., Chen Y., Dai P., Hou D.-X., Nonaka M., Smolik S.M., Armstrong S., Goodman R.H., Ishii S. Drosophila CBP is a co-activator of cubitus interruptus in hedgehog signalling. Nature. 1997;386:735–738. doi: 10.1038/386735a0.
- 103.Müller J., Hart C.M., Francis N.J., Vargas M.L., SenGupta A., Wild B., Miller E.L., O'Connor M.B., Kingston R.E., Simon J.A. Histone methyltransferase activity of a Drosophila Polycomb group repressor complex. Cell. 2002;111:197–208. doi: 10.1016/s0092-8674(02)00976-5.
- 104.Wu C.T., Howe M. A genetic analysis of the Suppressor 2 of zeste complex of Drosophila melanogaster. Genetics. 1995;140:139–181. doi: 10.1093/genetics/140.1.139.
- 105.Chadwick R., Jones B., Jack T., McGinnis W. Ectopic expression from the Deformed gene triggers a dominant defect in Drosophila adult head development. Dev. Biol. 1990;141:130–140. doi: 10.1016/0012-1606(90)90108-u.
- 106.Bellen H.J., Levis R.W., He Y., Carlson J.W., Evans-Holm M., Bae E., Kim J., Metaxakis A., Savakis C., Schulze K.L., et al. The Drosophila gene disruption project: progress using transposons with distinctive site specificities. Genetics. 2011;188:731–743. doi: 10.1534/genetics.111.126995.
- 107.Port F., Bullock S.L. Augmenting CRISPR applications in Drosophila with tRNA-flanked sgRNAs. Nat. Methods. 2016;13:852–854. doi: 10.1038/nmeth.3972.
- 108.Ewen-Campen B., Yang-Zhou D., Fernandes V.R., González D.P., Liu L.P., Tao R., Ren X., Sun J., Hu Y., Zirin J., et al. Optimized strategy for in vivo Cas9-activation in Drosophila. Proc. Natl. Acad. Sci. USA. 2017;114:9409–9414. doi: 10.1073/pnas.1707635114.
- 109.Giandomenico S.L., Sutcliffe M., Lancaster M.A. Generation and long-term culture of advanced cerebral organoids for studying later stages of neural development. Nat. Protoc. 2021;16:579–602. doi: 10.1038/s41596-020-00433-w.
- 110.Kuo R.I., Cheng Y., Zhang R., Brown J.W.S., Smith J., Archibald A.L., Burt D.W. Illuminating the dark side of the human transcriptome with long read transcript sequencing. BMC Genomics. 2020;21 doi: 10.1186/s12864-020-07123-7.
- 111.Abugessaisa I., Ramilowski J.A., Lizio M., Severin J., Hasegawa A., Harshbarger J., Kondo A., Noguchi S., Yip C.W., Ooi J.L.C., et al. FANTOM enters 20th year: expansion of transcriptomic atlases and functional annotation of non-coding RNAs. Nucleic Acids Res. 2020;49:D892–D898. doi: 10.1093/nar/gkaa1054.
- 112.Love M.I., Soneson C., Patro R. Swimming downstream: statistical analysis of differential transcript usage following Salmon quantification. F1000Res. 2018;7 doi: 10.12688/f1000research.15398.3.
- 113.Ramírez F., Ryan D.P., Grüning B., Bhardwaj V., Kilpert F., Richter A.S., Heyne S., Dündar F., Manke T. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 2016;44:W160–W165. doi: 10.1093/nar/gkw257.
- 114.Ray D., Kazan H., Cook K.B., Weirauch M.T., Najafabadi H.S., Li X., Gueroussov S., Albu M., Zheng H., Yang A., et al. A compendium of RNA-binding motifs for decoding gene regulation. Nature. 2013;499:172–177. doi: 10.1038/nature12311.
- 115.Kozomara A., Birgaoanu M., Griffiths-Jones S. miRBase: from microRNA sequences to function. Nucleic Acids Res. 2019;47:D155–D162. doi: 10.1093/nar/gky1141.
Data Availability Statement
- All LRS and RNA-seq data have been deposited at NCBI Gene Expression Omnibus (GEO) and are publicly available as of the date of publication. Accession numbers are listed in the key resources table.
- All original code has been deposited at Zenodo and is publicly available. DOIs and GitHub links are listed in the key resources table.
- Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.