Abstract
Numerous factors regulate alternative splicing of human genes at a co-transcriptional level. However, how alternative splicing depends on the regulation of gene expression is poorly understood. We leveraged data from the Genotype-Tissue Expression (GTEx) project to show a significant association of gene expression and splicing for 6874 (4.9%) of 141,043 exons in 1106 (13.3%) of 8314 genes with substantially variable expression in nine GTEx tissues. About half of these exons demonstrate higher inclusion with higher gene expression, and half demonstrate higher exclusion, with the observed direction of coupling being highly consistent across different tissues and in external datasets. The exons differ with respect to multiple characteristics and are enriched for hundreds of isoform-specific Gene Ontology annotations, suggesting an important regulatory mechanism. Notably, splicing-expression coupling of exons with roles in JUN and MAP kinase signalling could play an important role during cell division.
Subject terms: Computational biology and bioinformatics, Genetics
Introduction
Over 95% of human multi-exon genes undergo alternative splicing (AS) in a developmental, tissue-specific, or signal transduction-dependent manner1. Splicing is a highly regulated process by which an intron is excised from a pre-mRNA transcript and the flanking exons are ligated together in a series of steps, all or part of which occur co-transcriptionally2–4. Transcript elongation follows the initiation of transcription, adding ribonucleotides to the growing RNA chain. Splicing, like other processes involved in mRNA maturation, is influenced by interactions with the RNA polymerase II (RNAP2) transcript elongation complex2. Changes in promoter sequence and occupation can modify the splicing pattern of several genes, evidencing a coupling between transcription and AS5–8. It has been proposed that the promoter effect involves modulation of RNAP2 elongation rates9,10. Two major and potentially complementary models have been proposed to explain how transcription and splicing are coupled: the kinetic coupling model and the spatial coupling model.
Kinetic coupling refers to the notion that the rate of transcription elongation determines the temporal “window of opportunity” for selection or rejection of an upstream sequence. If upstream and downstream elements on the nascent transcript compete, the upstream element has a “head start” because it emerges from RNAP2 before the downstream element does. The advantage conferred by the head start is greater when elongation is slow10,11. Elongation rate has been shown to influence AS by modulating several classes of co-transcriptional events, including alternative splice site recognition, binding of regulatory proteins, and formation of RNA secondary structures12,13. These observations led to the notion that slow elongation expands the “window of opportunity” for recognition of an upstream 3′ splice site before it must compete with a downstream site, thereby promoting inclusion of the upstream cassette exon. In contrast, slow elongation was shown to promote skipping of CFTR exon 9 by increasing recruitment of the negative factor ETR-3 to the UG repeat at the 3′ splice site of the exon14,15.
Spatial coupling refers to the ability of the transcription machinery to recruit various classes of RNA processing factors to the site of transcription. The RNAP2 C-terminal domain (CTD) plays a central role in recruiting factors involved in transcriptional elongation, splicing, and other functions related to mRNA maturation. The RNAP2 CTD is extensively phosphorylated and dephosphorylated at different stages of transcription and acts as a dynamic docking site for factors required for the mRNA processing events that occur together with transcript elongation16. Transcribed exons are tethered to the elongating RNAP2 transcription complex17,18. The serine and arginine-rich splicing factor 3 (SRSF3) was shown to exert a CTD-dependent inhibitory effect on the inclusion of cassette exon 33 of fibronectin19.
Numerous other factors influence AS, including nucleosome occupancy, chromatin remodelers, RNA secondary structure, histone marks, DNA methylation, and the protein factors that interact with them20–24. In principle, these factors could influence AS by modulating elongation through differential nucleosome density, histone modification profiles, or DNA methylation density, or by recruiting splicing factors to the chromatin template as the transcriptional machinery passes11,25. Two studies have demonstrated a pervasive impact of elongation rate on splicing. The first showed that reduction of RNAP2 elongation speed by drugs or RNAP2 mutations tended to increase exon inclusion levels26. Interestingly, many of the corresponding splicing events introduce premature termination codons, which are predicted to lead to nonsense-mediated decay. This has been shown experimentally to be a common mechanism of gene regulation, including the autoregulation of proteins that affect the splicing process27–29. A second study investigated RNAP2 mutants that increased or decreased elongation rates, characterizing exons for which a faster elongation rate results in more inclusion of the exon in transcripts, and exons for which a faster elongation rate results in more skipping of the exon30.
Although gene expression is controlled by numerous transcriptional and posttranscriptional factors, substantial evidence argues that expression of most genes is controlled in part at the level of transcription elongation31–36. In this work, we leverage comprehensive bulk RNA-seq data from the Genotype-Tissue Expression (GTEx) project37,38 to investigate associations between gene expression and AS. We identify thousands of exons whose inclusion or exclusion is correlated with the overall level of gene expression and characterize significantly different properties of these exons and of the transcripts and genes that contain them.
Results
Association between gene expression and alternative splicing
We focused on alternative splicing events that differentiate between a subset of a gene’s transcripts and the rest of its transcripts. We examined rates of exon inclusion/exclusion in comparison to the overall rate of gene expression in nine tissues with 226 to 653 samples each (Fig. 1, Supplementary Table 1).
Type 0, upregulated-high ψ (UHP), and downregulated-high ψ (DHP) exons
We filtered 683,196 annotated human exons for those showing a threshold amount of variability in RNA-seq data from nine GTEx tissue cohorts with between 226 and 653 samples each. This identified 141,043 exons that met both criteria: a mean count of at least 20 reads per sample, and a ratio of at least two between the 95th and the 5th percentiles of expression values.
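The variability filter can be sketched as follows. This is a minimal illustration assuming an exons × samples matrix of counts; the function name `filter_variable_exons` and the handling of a zero 5th percentile are assumptions, not details given in the text.

```python
import numpy as np

def filter_variable_exons(counts, min_mean=20.0, min_ratio=2.0):
    """Boolean mask over exons (rows) of an exons x samples count matrix.

    An exon passes if its mean count is >= min_mean and the ratio of its
    95th to 5th percentile across samples is >= min_ratio.
    """
    counts = np.asarray(counts, dtype=float)
    mean_ok = counts.mean(axis=1) >= min_mean
    p95 = np.percentile(counts, 95, axis=1)
    p05 = np.percentile(counts, 5, axis=1)
    # Assumption: a zero 5th percentile with a positive 95th percentile is
    # treated as infinitely variable instead of raising a division error.
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(p05 > 0, p95 / p05, np.where(p95 > 0, np.inf, 0.0))
    return mean_ok & (ratio >= min_ratio)

toy = np.array([
    [10, 20, 30, 40, 50],   # mean 30, 95th/5th = 48/12 = 4 -> passes
    [30, 30, 30, 30, 30],   # mean 30, ratio 1 -> fails (no variability)
    [1, 2, 3, 4, 5],        # variable, but mean 3 -> fails
])
print(filter_variable_exons(toy))  # [ True False False]
```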
We classified the relationship between overall gene expression and the percent spliced-in (ψ) values of these exons. Exons for which higher values of ψ (higher exon inclusion) are associated with higher gene expression were defined as UHP exons (“upregulation of gene expression associated with high percent spliced-in”), exons for which higher values of ψ are associated with lower gene expression as DHP exons (“downregulation of gene expression associated with high percent spliced-in”), and exons that show alternative splicing without an association between ψ and gene expression as type 0 exons (defined by a Benjamini–Hochberg-corrected p-value of at least 0.5). For each of the nine investigated GTEx tissues, we performed linear regression to predict gene expression from ψ and determined the significance of the coefficient for ψ. Raw p-values were corrected for multiple testing with the Benjamini–Hochberg method, and associations are reported as significant at a corrected p-value threshold of 0.05 (Methods).
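A minimal sketch of the per-exon regression and classification for one tissue, assuming ψ and gene expression are available as exons × samples arrays for exons that already passed the variability filter. It uses `scipy.stats.linregress` for ordinary least squares and a hand-written Benjamini–Hochberg step-up correction; the function names and the "unclassified" label for exons with 0.05 ≤ q < 0.5 are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

def classify_exons(psi, expr, alpha=0.05, null_cutoff=0.5):
    """Regress gene expression on psi per exon and label each exon."""
    slopes, pvals = [], []
    for x, y in zip(np.asarray(psi, float), np.asarray(expr, float)):
        fit = stats.linregress(x, y)          # expression ~ psi
        slopes.append(fit.slope)
        pvals.append(fit.pvalue)
    labels = []
    for slope, q in zip(slopes, benjamini_hochberg(pvals)):
        if q < alpha:
            labels.append("UHP" if slope > 0 else "DHP")
        elif q >= null_cutoff:
            labels.append("type 0")
        else:
            labels.append("unclassified")     # hypothetical in-between label
    return labels
```

A positive slope means higher ψ goes with higher expression (UHP); a negative slope gives DHP.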
Using these heuristic definitions, we identified 3667 UHP and 3207 DHP exons; a total of 6874 unique exons were identified as UHP or DHP in at least one tissue, corresponding to 4.9% of the 141,043 exons that showed at least the threshold level of gene expression variability (Methods). A total of 989 exons were identified as UHP or DHP in multiple tissues (Fig. 1; additional examples are shown in Supplementary Fig. 1). In all, exons were identified as UHP or DHP 8282 times across the nine tissues tested.
In all 989 cases in which exons were identified as UHP or DHP in multiple tissues, the assignment to UHP or DHP was consistent. We further applied the same criteria to find the same UHP/DHP exons in sets of samples originating from the same donor, for donors with at least 20 tissue samples. A total of 63,961 donor-level detections of these UHP/DHP exons (1916 unique exons, ~89%) were made in 528 donors. For 63,255 (~99%) of them, the assignment to UHP or DHP was consistent with the assignment from tissue samples. The small number of inconsistencies possibly reflects misclassification due to the relatively small number of samples per donor (a median of 27 samples per donor vs. 342.5 per tissue).
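The consistency check amounts to grouping calls by exon and flagging exons that received both labels. A sketch, assuming calls are available as hypothetical `(exon_id, context, label)` tuples, where a context is a tissue or, for the donor-level comparison, a donor:

```python
from collections import defaultdict

def consistency_report(calls):
    """calls: iterable of (exon_id, context, label), label 'UHP' or 'DHP'.

    Returns (n_multi, n_consistent, conflicting_ids) for exons that were
    called in more than one context.
    """
    labels = defaultdict(set)
    contexts = defaultdict(set)
    for exon, context, label in calls:
        labels[exon].add(label)
        contexts[exon].add(context)
    multi = [e for e in contexts if len(contexts[e]) > 1]
    conflicting = sorted(e for e in multi if len(labels[e]) > 1)
    return len(multi), len(multi) - len(conflicting), conflicting

# Hypothetical exon IDs for illustration only.
calls = [("exA", "Liver", "UHP"), ("exA", "Spleen", "UHP"),
         ("exB", "Liver", "DHP"), ("exB", "Thyroid", "UHP"),
         ("exC", "Pancreas", "DHP")]
print(consistency_report(calls))  # (2, 1, ['exB'])
```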
We repeated the same analysis in unrelated breast, left ventricle and liver bulk RNA-seq datasets obtained from the SRA (Methods). In all three datasets, most of the overlapping exons were type 0 in both the GTEx and the SRA dataset, and most of the other exons were type 0 in one of the datasets. For the breast and left ventricle datasets, we observed a highly significant overlap of UHP or DHP classifications between the GTEx and SRA datasets. For liver, there were 52,521 exons that were classified as type 0, 17 exons that were classified as UHP and 47 exons that were classified as DHP. 14 exons were classified as DHP in both datasets, one exon was classified as UHP in both datasets, and all other exons were type 0 in at least one of the datasets (Supplementary Table 2). These results suggest that there is a significant consistency of exon types across different donor cohorts and experimental procedures.
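The text does not name the test behind the reported overlap. One common choice, assumed here purely for illustration, is a one-sided Fisher's exact test on a 2 × 2 table of exons tested in both datasets, cross-classified by whether each dataset called them UHP/DHP. The function name and toy sets are hypothetical.

```python
from scipy.stats import fisher_exact

def overlap_test(coupled_a, coupled_b, tested_in_both):
    """One-sided Fisher's exact test for overlap of two coupled-exon sets.

    coupled_a, coupled_b: exon IDs called UHP/DHP in each dataset.
    tested_in_both: exon IDs that passed filtering in both datasets.
    """
    a = coupled_a & tested_in_both
    b = coupled_b & tested_in_both
    both = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    neither = len(tested_in_both) - both - only_a - only_b
    table = [[both, only_a], [only_b, neither]]
    odds_ratio, p_value = fisher_exact(table, alternative="greater")
    return table, odds_ratio, p_value

# Toy sets: 60 coupled exons per dataset, 20 shared, 10,000 tested.
tested = set(range(10_000))
table, odds, p = overlap_test(set(range(0, 60)), set(range(40, 100)), tested)
print(table)  # [[20, 40], [40, 9900]]
```

A direction check (same UHP/DHP label in both datasets) would be a separate step on the shared exons.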
Minimum prevalence of expression/splicing regulation coupling
To estimate how prevalent the coupling between expression and splicing is, we counted the exons that were detected as neither UHP nor DHP, had a 95th/5th expression percentile ratio of at least 2, were assigned a Benjamini–Hochberg-corrected p-value of at least 0.5, and were expressed in at least half the samples of a tissue at a mean level of at least 20 reads. This definition of type 0 exons is intended to identify exons with substantial gene expression variability but no evidence of being UHP or DHP exons. It yielded 67,814 cassette exons identified as type 0. Because observing an effect of expression on splicing requires the presence of regulatory factors, such as RNA-binding proteins, the absence of a correlation does not imply that an exon is type 0 in all tissues. However, since we examined nine different tissues, the counts of UHP/DHP exons and type 0 exons likely differ by roughly an order of magnitude (6874 UHP/DHP vs. 67,814 type 0). In the nine-tissue GTEx dataset, a total of 8314 genes contained at least one exon classified as UHP, DHP, or type 0. Of these, 1106 genes (13.3%) had at least one UHP or DHP exon. Supplementary Table 7 summarizes the number of UHP/DHP exons detected in the GTEx dataset, those detected in multiple tissues, and the overlap of these exons with exons detected in other datasets. The number of UHP/DHP exons detected depends on which genes are expressed in each dataset, on how much those genes vary in expression, on statistical power, and on cellular mechanisms such as epigenetic modifications; nevertheless, the consistency in the direction of coupling suggests a core mechanism that, when active, has a specific effect on each exon.
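The prevalence figures in this section follow from simple arithmetic on the reported counts, reproduced here as a quick check:

```python
uhp, dhp = 3667, 3207
coupled = uhp + dhp                 # exons called UHP or DHP in >= 1 tissue
variable_exons = 141_043            # exons passing the variability filter
type0 = 67_814                      # exons meeting the type 0 definition
genes_tested, genes_coupled = 8314, 1106

print(coupled)                                        # 6874
print(f"{100 * coupled / variable_exons:.1f}%")       # 4.9%
print(f"{100 * genes_coupled / genes_tested:.1f}%")   # 13.3%
print(f"{type0 / coupled:.1f}")                       # 9.9 (about one order of magnitude)
```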
Characteristics of type 0, UHP, and DHP exons and the transcript and genes that contain them
UHP/DHP exons differ from type 0 exons in a number of characteristics, including exon count, intron length, and distribution of biotypes (Fig. 2). Genes containing UHP/DHP exons have on average more exons than genes containing only type 0 exons, but slightly fewer transcripts (13 and 12 for UHP and DHP, respectively, and 14 for type 0). Furthermore, UHP/DHP exons are included in a larger proportion of transcripts than type 0 exons.
We define the “upstream” intron as the last contiguous non-coding region transcribed 5′ to the exon, and the “downstream” intron as the first such region transcribed 3′ to the exon. The median upstream intron lengths were 572 bp for type 0, 857 bp for UHP, and 732 bp for DHP exons; the differences between UHP or DHP and type 0 were statistically significant. The median downstream intron lengths were 576 bp for type 0, 834.5 bp for UHP, and 485 bp for DHP exons, and the differences were statistically significant between all types. DHP exons had a median length of 158 bp, significantly longer than UHP (median 135 bp) and type 0 (median 142 bp) exons. Finally, transcripts containing UHP/DHP exons have a higher fraction of protein-coding transcripts (65% for UHP/DHP and 50.7% for type 0 exons) and smaller fractions of retained introns (12.5% and 13% for UHP and DHP, respectively, and 20.5% for type 0) and long non-coding RNAs (0.47%, 0.36%, and 2.3% for UHP, DHP, and type 0, respectively) (Fig. 2 and Table 1). Additionally, the mean MaxEnt39 acceptor and donor splice site scores were higher for both UHP and DHP exons than for type 0 exons (Supplementary Figs. 3 and 4).
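The upstream/downstream intron definitions can be made concrete for a single transcript. This sketch assumes 1-based inclusive exon coordinates sorted by genomic start; how an exon shared by several transcripts is summarized is not specified here, so the hypothetical function works on one transcript at a time.

```python
def flanking_intron_lengths(exons, index, strand="+"):
    """Upstream/downstream intron lengths for exons[index] in one transcript.

    exons: list of (start, end) intervals, 1-based inclusive, sorted by start.
    Returns (upstream, downstream); None where the exon is first or last
    in transcription order.
    """
    def gap(left, right):
        # number of bases strictly between two exons
        return right[0] - left[1] - 1

    left = gap(exons[index - 1], exons[index]) if index > 0 else None
    right = gap(exons[index], exons[index + 1]) if index < len(exons) - 1 else None
    # On the minus strand, transcription runs toward lower coordinates,
    # so the intron on the genomic right is the upstream one.
    return (left, right) if strand == "+" else (right, left)

exons = [(100, 200), (501, 600), (1001, 1100)]
print(flanking_intron_lengths(exons, 1))       # (300, 400)
print(flanking_intron_lengths(exons, 1, "-"))  # (400, 300)
```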
Table 1.
Feature | type 0 | UHP | DHP | 0 vs. UHP | 0 vs. DHP | UHP vs. DHP |
---|---|---|---|---|---|---|
exons per geneᵃ | 11 | 13 | 14 | | | |
transcripts per geneᵃ | 14 | 13 | 12 | | | |
inclusion in proportion of transcriptsᵃ | 9% | 24.1% | 20% | | | |
upstream intron lengthᵃ | 584 bp | 930 bp | 832 bp | 0.03 | | |
downstream intron lengthᵃ | 587 bp | 932 bp | 613 bp | | | |
exon lengthᵃ | 142 bp | 135 bp | 158 bp | 0.03 | | |
Values for genes that contained both exon types were counted for both types of exons. ᵃ Mann–Whitney test.
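The group comparisons in Table 1 use the Mann–Whitney test. A sketch of one such comparison with `scipy.stats.mannwhitneyu`, assuming two-sided testing (the sidedness is not stated); the inputs below are toy values, not data from this study.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def compare_lengths(group_a, group_b):
    """Medians of two length samples and a two-sided Mann-Whitney U p-value."""
    res = mannwhitneyu(group_a, group_b, alternative="two-sided")
    return float(np.median(group_a)), float(np.median(group_b)), float(res.pvalue)

# Toy length samples (bp), shifted by 50 bp.
a = list(range(100, 200))
b = list(range(150, 250))
med_a, med_b, p = compare_lengths(a, b)
print(med_a, med_b)  # 149.5 199.5
```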
High consistency of UHP vs. DHP classification across multiple tissues and datasets
We hypothesized that if the classification of exons as UHP or DHP is related to one or more core regulatory processes, then the classification should be largely conserved across tissues. Among the detected UHP/DHP exons, 606 are classified as DHP in more than one tissue and always as DHP, 383 are classified as UHP in more than one tissue and always as UHP, and none receive conflicting types in different tissues. The regression lines fitted in different tissues may have different slopes, but the changes in slope are correlated across UHP/DHP exons (Supplementary Fig. 5). In addition, the slope is a linear function of the mean expression level, with a coefficient close to 1, possibly indicating that differences in expression rates affect the impact of UHP/DHP exons on the gene’s transcript profile (Supplementary Fig. 2).
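The relationship between per-exon slopes and mean expression can be estimated in two stages: fit the expression–ψ slope for each exon, then regress those slopes on the gene's mean expression. A sketch using `numpy.polyfit`, assuming exons × samples arrays; the toy input is constructed so that each slope equals the mean, which yields a coefficient of 1.

```python
import numpy as np

def slope_vs_mean(psi, expr):
    """Per-exon OLS slope of expression on psi, then a linear fit of those
    slopes against mean expression. Returns (coefficient, intercept)."""
    psi = np.asarray(psi, float)
    expr = np.asarray(expr, float)
    slopes = np.array([np.polyfit(x, y, 1)[0] for x, y in zip(psi, expr)])
    means = expr.mean(axis=1)
    coef, intercept = np.polyfit(means, slopes, 1)
    return coef, intercept

# Toy data: expr = m * (0.5 + psi) with mean(psi) = 0.5, so slope = mean = m.
psi = np.tile(np.linspace(0, 1, 51), (4, 1))
m = np.array([10.0, 50.0, 120.0, 300.0])[:, None]
coef, intercept = slope_vs_mean(psi, m * (0.5 + psi))
```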
Distribution of RNA polymerase II binding in type 0, UHP, and DHP exons
RNA Pol II accumulates on exons in yeast and human cells and pauses over the 5′ and 3′ splice sites of human exons40. Additionally, Pol II density is lower at skipped exons than at included alternative exons41,42. Based on the suggested mechanism (Fig. 1E), we hypothesized that RNAP2 density might differ between the type 0, UHP, and DHP exons investigated in the current study.
To estimate the difference in transcription speed of UHP and DHP exons compared to type 0 exons, we used two PRO-Seq datasets43,44 (Methods). These datasets sequence nascent RNA in addition to mature RNA and therefore allow reads to be counted in the intronic parts of each gene's nascent transcripts. The introns downstream of UHP/DHP exons are more likely to be sequenced, suggesting that RNA polymerase spends more time transcribing them (Fig. 3; chi-squared test, p < 2.2 × 10⁻³⁰⁸). The longer transcription time may be necessary for the regulatory interactions that promote or suppress splicing of the exon, and thus may be sensitive to changes in expression rate. Supplementary Fig. 10 shows that RNAP binding to the exons themselves is likely to be lower for UHP/DHP exons.
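The contingency table behind the chi-squared test is not described here. As an assumption for illustration, the sketch below counts downstream introns with and without at least one PRO-seq read, for UHP/DHP versus type 0 exons; the function name and the counts in the usage line are hypothetical.

```python
from scipy.stats import chi2_contingency

def covered_intron_test(covered_coupled, total_coupled, covered_type0, total_type0):
    """Chi-squared test of whether downstream introns of UHP/DHP exons carry
    nascent-RNA reads more often than those of type 0 exons."""
    table = [
        [covered_coupled, total_coupled - covered_coupled],
        [covered_type0, total_type0 - covered_type0],
    ]
    chi2, p, dof, expected = chi2_contingency(table)
    return chi2, p

# Hypothetical counts: 80% of 1000 vs. 50% of 10,000 introns covered.
chi2, p = covered_intron_test(800, 1000, 5000, 10000)
```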
Enriched motifs
Binding of transcription factors to promoters may influence splicing by altering the rate of RNAP2 elongation or by recruiting splicing factors to pre-mRNAs45. We reasoned that if this were a common factor in the mechanisms underlying UHP/DHP exons, we would expect enrichment of predicted transcription factor flexible model (TFFM) sites in the promoter regions of genes harboring UHP/DHP exons compared to genes harboring type 0 exons, as well as enrichment of predicted RBP binding sites in the sequences surrounding UHP and DHP exons. We therefore calculated the numbers of predicted binding sites and compared the observed counts to those observed in 1,000,000 permutations in which the labels of UHP, DHP, and type 0 exons had been randomly shuffled (Methods).
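A label-permutation test of this kind can be sketched as follows, assuming one boolean per gene indicating whether a predicted site is present and one class label per gene. The sketch uses far fewer permutations than the 1,000,000 reported, and its add-one correction (which keeps p-values above zero) is a common convention, not a detail given in the text.

```python
import numpy as np

def permutation_pvalue(has_site, labels, group="UHP", reference="type 0",
                       n_perm=10_000, seed=0):
    """Two-sided empirical p-value for the difference in the fraction of
    genes with a predicted site between `group` and `reference`."""
    has_site = np.asarray(has_site, dtype=bool)
    labels = np.asarray(labels)
    keep = np.isin(labels, [group, reference])
    sites, labs = has_site[keep], labels[keep]

    def diff(lab):
        return sites[lab == group].mean() - sites[lab == reference].mean()

    observed = diff(labs)
    rng = np.random.default_rng(seed)
    perm = np.array([diff(rng.permutation(labs)) for _ in range(n_perm)])
    # count shuffles at least as extreme as the observed difference
    extreme = np.sum(np.abs(perm) >= abs(observed))
    return observed, (extreme + 1) / (n_perm + 1)

# Toy data: every "UHP" gene has a site, 10% of "type 0" genes do.
has_site = [True] * 200 + [True] * 20 + [False] * 180
labels = ["UHP"] * 200 + ["type 0"] * 200
obs, p = permutation_pvalue(has_site, labels, n_perm=999)
```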
321 of 610 tested TFFMs showed significant enrichment in genes with UHP or DHP exons but no type 0 exons, compared to genes with at least one type 0 exon but no UHP/DHP exon. However, the maximum difference between the two classes was 3%, suggesting that no individual transcription factor is associated with a majority of the observed effects (Table 2, Supplementary Table 3). We tested enrichment for core promoter elements and CpG islands and found that a significantly higher proportion of DHP genes co-localized with a CpG island and a lower proportion contained a TATA box (Supplementary Table 4). We examined 71 RBP models, 31 of which showed significant differences between UHP or DHP and type 0 exons (Supplementary Table 5).
Table 2.
Motif | Model | Type 0 | UHP | DHP | Type 0 vs. UHP | Type 0 vs. DHP | UHP vs. DHP |
---|---|---|---|---|---|---|---|
PRDM14 | TFFM0987.1 | 41.5% | 44.5% | 44.2% | p < 1.0 × 10⁻⁶ * | p < 1.0 × 10⁻⁶ * | n.s. |
SP1 | TFFM0097.2 | 39.2% | 37.9% | 36.8% | n.s. | p < 1.0 × 10⁻⁶ * | n.s. |
KLF4 | TFFM0056.3 | 36.7% | 38.0% | 35.8% | n.s. | n.s. | 6.80 × 10⁻⁵ * |
KLF4 | TFFM0056.2 | 35.1% | 35.3% | 33.3% | n.s. | 2.20 × 10⁻⁵ * | 0.000394 |
ZNF75D | TFFM0647.1 | 34.4% | 34.2% | 32.3% | n.s. | p < 1.0 × 10⁻⁶ * | 0.000751 |
KLF15 | TFFM0515.1 | 34.1% | 33.8% | 32.0% | n.s. | p < 1.0 × 10⁻⁶ * | 0.001070 |
ZBTB6 | TFFM0624.1 | 32.7% | 30.6% | 30.2% | p < 1.0 × 10⁻⁶ * | p < 1.0 × 10⁻⁶ * | n.s. |
FLI1 | TFFM0031.1 | 31.6% | 29.7% | 31.9% | p < 1.0 × 10⁻⁶ * | n.s. | 5.90 × 10⁻⁵ * |
CTCF | TFFM0014.1 | 29.7% | 31.6% | 30.6% | p < 1.0 × 10⁻⁶ * | n.s. | n.s. |
NEUROD1 | TFFM0143.1 | 30.6% | 28.4% | 29.2% | p < 1.0 × 10⁻⁶ * | n.s. | n.s. |
TFBSs were assessed for overrepresentation in genes harboring UHP or DHP exons compared to genes harboring only type 0 exons. 291 models showed a significant difference in permutation testing, in which exon labels (UHP, DHP, type 0) were randomly permuted and the p-value was calculated empirically as the proportion of permutations in which the permuted difference between UHP (DHP) and type 0 exons was at least as extreme as the observed difference. The top ten are shown in this table, and all results are presented in Supplementary Table 3. Among the significant models, the mean difference was 1.7% (UHP vs. type 0) and 1.1% (DHP vs. type 0). No significant differences were observed between UHP and DHP (not shown). *: significant at a Bonferroni-corrected threshold of 9.11 × 10⁻⁵.
Biological rationale for coupling expression and splicing
The ubiquity of UHP/DHP exons led us to investigate associations with transcript-specific functions. Alternative splicing of many genes can produce isoforms that differ with respect to enzymatic activities and subcellular localizations, as well as protein–protein, protein–DNA, and protein–ligand physical interactions46. Gene Ontology (GO) overrepresentation analysis is a standard approach to assessing the functional profile of differentially expressed genes47, but analogous methods for examining the functional profile of differential isoforms have not been available, possibly because of the paucity of experimentally confirmed functional annotations of isoforms48. We recently developed an expectation-maximization framework for predicting isoform-specific GO annotations49. We used these annotations to assess overrepresentation of GO terms in the set of isoforms containing UHP exons, using as the universe the set of all isoforms in which an exon of any type was detected. A total of 410 GO terms displayed significant overrepresentation (see Table 3 for the top ten and Supplementary File 2 for a complete list).
Table 3.
GO term | Coverage | p-value | Adjusted p-value |
---|---|---|---|
JUN kinase activity (GO:0004705) | 0.62 | 2.04 × 10⁻⁵⁴ | 2.89 × 10⁻⁵¹ |
MAP kinase activity (GO:0004707) | 0.49 | 2.22 × 10⁻⁵⁴ | 3.13 × 10⁻⁵¹ |
MAP kinase kinase activity (GO:0004708) | 0.59 | 7.68 × 10⁻⁵⁰ | 1.08 × 10⁻⁴⁶ |
response to light stimulus (GO:0009416) | 0.60 | 7.90 × 10⁻⁴⁴ | 1.11 × 10⁻⁴⁰ |
actin binding (GO:0003779) | 0.19 | 2.78 × 10⁻³⁵ | 3.91 × 10⁻³² |
Fc-epsilon receptor signaling pathway (GO:0038095) | 0.51 | 2.66 × 10⁻³³ | 3.74 × 10⁻³⁰ |
GPI-anchor transamidase complex (GO:0042765) | 0.65 | 6.37 × 10⁻³¹ | 8.95 × 10⁻²⁸ |
cytoskeleton organization (GO:0007010) | 0.32 | 7.16 × 10⁻³¹ | 1.01 × 10⁻²⁷ |
cytoskeletal protein binding (GO:0008092) | 0.31 | 1.30 × 10⁻³⁰ | 1.82 × 10⁻²⁷ |
cellular senescence (GO:0090398) | 0.44 | 1.02 × 10⁻²⁹ | 1.43 × 10⁻²⁶ |
Enrichment was tested using the hypergeometric test: for each GO term, the number of isoforms drawn equals the number of isoforms annotated to the term, and the p-value is the probability of drawing at least as many UHP isoforms as observed (Methods). Bonferroni multiple testing correction was applied (adjusted p-value column). Coverage refers to the proportion of isoforms annotated to the term that contain a UHP exon.
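The hypergeometric over-representation test with Bonferroni correction can be sketched as below. It assumes GO annotations are given as a mapping from term to isoform IDs; the function name and inputs are hypothetical. In SciPy's parameterization, `hypergeom.sf(k - 1, M, n, N)` is the upper-tail probability P(X ≥ k) for N draws from a population of M containing n successes.

```python
from scipy.stats import hypergeom

def go_enrichment(term_to_isoforms, uhp_isoforms, universe):
    """Hypergeometric over-representation of UHP isoforms per GO term.

    Returns {term: (coverage, p_value, bonferroni_adjusted_p)}.
    """
    universe = set(universe)
    uhp = set(uhp_isoforms) & universe
    M, n_success = len(universe), len(uhp)
    tested = {t: set(iso) & universe for t, iso in term_to_isoforms.items()}
    tested = {t: iso for t, iso in tested.items() if iso}
    results = {}
    for term, annotated in tested.items():
        k = len(annotated & uhp)                # UHP isoforms with this term
        n_draws = len(annotated)                # isoforms annotated to term
        p = hypergeom.sf(k - 1, M, n_success, n_draws)   # P(X >= k)
        results[term] = (k / n_draws, p, min(1.0, p * len(tested)))
    return results

# Hypothetical universe of 1000 isoforms, 100 of which contain a UHP exon.
universe = [f"iso{i}" for i in range(1000)]
res = go_enrichment({"term_A": universe[:20], "term_B": universe[500:520]},
                    universe[:100], universe)
```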
Interestingly, the top three overrepresented terms, JUN kinase activity (Supplementary Fig. 6), MAP kinase activity, and MAP kinase kinase activity, annotate genes and pathways that coordinately regulate gene expression, mitosis, metabolism, motility, survival, apoptosis, differentiation, and protection against DNA damage and deleterious mutations50–52. Our results suggest that cells can upregulate these pathways both by increasing overall expression of the genes in the pathway and by alternative splicing that favors production of isoforms that specifically possess pathway activity.
We investigated the association between the degree of differential expression and the degree of alternative splicing by regressing ψ against the gene expression level for all exons identified as UHP/DHP. We found that the slope of the expression–ψ line is approximately equal to the mean expression level of the gene (Supplementary Fig. 2). Therefore, in dividing cells, which need to double their protein content, increasing expression of genes annotated to JUN kinase activity, MAP kinase activity, and MAP kinase kinase activity will additionally and strongly favor production of UHP-containing isoforms that are specifically annotated to these GO terms. Thus, increased gene expression will both increase the overall amount of these genes' transcripts and shift their transcript distributions toward transcripts with JUN kinase activity. To test this hypothesis, we computed the Pearson correlation between the ψ values of UHP exons and Cyclin D1 gene expression, as the latter is a marker for the level of mitosis53. This correlation is mostly positive, whereas the corresponding correlation between DHP/type 0 exons and Cyclin D1 expression is mostly negative (Supplementary Fig. 7). We further calculated the correlation between UHP exons and Cyclin D1 expression in The Cancer Genome Atlas (TCGA) transcript expression dataset for primary tumor and normal tissue, and found a significant reduction in correlation in tumor samples (Supplementary Fig. 8). Synchronization of alternative splicing with the cell cycle has been previously observed by Domingues et al.54. Their list of genes significantly overlaps with genes that contain UHP or DHP exons (p = 0.01, hypergeometric test). Schor et al. also suggested that the coupling mechanism may be altered in cancer55. Based on our findings, we suggest a potential mechanism by which the degradation of expression–splicing coupling may contribute to cancer development.
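Per-exon Pearson correlations with a marker gene such as Cyclin D1 (CCND1) can be computed in one vectorized step. A sketch, assuming ψ values in an exons × samples array aligned with the marker's expression across the same samples; the function name is hypothetical, and rows with constant ψ would yield NaN.

```python
import numpy as np

def correlations_with_marker(psi, marker_expr):
    """Pearson r between each exon's psi (rows) and a marker's expression."""
    psi = np.asarray(psi, float)
    marker = np.asarray(marker_expr, float)
    psi_c = psi - psi.mean(axis=1, keepdims=True)   # center each exon
    m_c = marker - marker.mean()                    # center the marker
    num = psi_c @ m_c
    den = np.sqrt((psi_c ** 2).sum(axis=1) * (m_c ** 2).sum())
    return num / den

marker = np.arange(10.0)
psi = np.vstack([0.1 * marker, 1 - 0.1 * marker])   # toy exons
r = correlations_with_marker(psi, marker)
fraction_positive = np.mean(r > 0)
```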
At early stages of cancer development, when cells divide quickly but have not yet accumulated many mutations, the coupling is intact and might even be more active than in normal cells, if the normal cells divide slowly. At later stages, as mutations accumulate, the coupling weakens because splicing factors and their binding sites are affected by mutations. Since this does not prevent the cells from dividing further, we speculate that the pathways no longer induced during the cell cycle serve to prevent genomic instability, for example the JUN kinase and MAP kinase pathways. To test this idea, we split the cancer samples in Supplementary Fig. 8 into those with fewer than 20 non-silent mutations (SNVs and indels) and those with 20 or more. As shown in Supplementary Fig. 9, the correlation of UHP ψ with Cyclin D1 expression decreased as the number of mutations increased, which supports the hypothesis that the coupling degrades gradually.
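The stratified analysis can be sketched by splitting samples at the 20-mutation threshold and summarizing per-exon correlations within each stratum. The function name and the use of the median as the summary statistic are assumptions for illustration.

```python
import numpy as np

def stratified_median_correlation(psi, marker_expr, n_mutations, threshold=20):
    """Median per-exon Pearson r between psi and marker expression, computed
    separately for samples with < threshold and >= threshold mutations."""
    psi = np.asarray(psi, float)
    marker = np.asarray(marker_expr, float)
    burden = np.asarray(n_mutations)
    out = {}
    for name, mask in (("low", burden < threshold), ("high", burden >= threshold)):
        r = [np.corrcoef(row[mask], marker[mask])[0, 1] for row in psi]
        out[name] = float(np.median(r))
    return out

# Toy data: psi tracks the marker in low-burden samples, opposes it otherwise.
marker = np.arange(20.0)
burden = np.array([5] * 10 + [40] * 10)
row = np.concatenate([0.05 * marker[:10], 2 - 0.05 * marker[10:]])
out = stratified_median_correlation(np.vstack([row, row]), marker, burden)
```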
Many of the top enriched terms of DHP-containing transcripts are involved in monosaccharide metabolism, e.g., glycolysis, fructose metabolism, glycolysis from storage polysaccharide and glycogen catabolism (Supplementary File 3 contains a complete list). We speculate that this coupling could promote efficient replenishment of energy reserves after cell division or generate energy for future division. It is reasonable to assume that DHP exons are also used to shut down pathways whose activity is undesirable during cell division. We defer a focused analysis of the functional synergy between UHP and DHP exons to a future study.
Discussion
We developed an approach to characterize associations between overall gene expression, defined as the sum of read counts for all transcripts assigned to a gene, and the regulation of alternative splicing, defined as the inclusion or exclusion of an exon belonging to some, but not all, transcripts of the gene. We identified exons whose exclusion or inclusion was correlated with total gene expression. UHP (upregulated-high ψ) exons show a significant association of higher overall gene expression with higher degrees of exon inclusion, and DHP (downregulated-high ψ) exons show a significant association of lower overall gene expression with higher degrees of exon inclusion. The total number of such exons identified in our study, 3667 UHP exons and 3207 DHP exons (6874 exons in 1106 genes), likely represents a lower bound, because the datasets we investigated do not cover a wide enough range of conditions, and hence of expression and splicing variability, to detect all UHP and DHP exons.
A previous study assayed RNAP2 mutants that change average elongation rates genome-wide and showed two classes of cassette exons that displayed higher degrees of inclusion with slower RNAP2 mutants (type I) and lower degrees of inclusion with faster RNAP2 mutants (type II). The type I exons tended to have weaker splice sites, to be surrounded by shorter introns than type II exons, and to harbor distinct sequence motifs30. The exons identified in that study were mapped to the hg19 genome, and splicing was quantified using the MATS tool, which does not reconstruct full transcripts, limiting comparability with our results. Speculatively, however, the association of type I/II as well as UHP/DHP exons with intron length, splice site strength, and sequence motifs could indicate partially shared mechanisms, with differences arising because the previous study investigated global changes in RNAP2 elongation speed.
Our study identified significant differences in splice site strength, intron and exon length, and the proportions of predicted TFBSs in promoter regions of genes harboring UHP/DHP exons compared to genes with type 0 exons. Additionally, we identified significantly higher relative RNAP binding to UHP/DHP exons vs. type 0 exons on the same gene in data from 106 POLR2A ChIP-Seq experiments, and a higher count of nascent RNA reads per base pair in introns downstream of UHP and DHP exons compared to type 0 exons, suggesting a role of RNAP2 in mediating the observed effects. The consistency across tissues in the direction of the correlation between expression and exon inclusion suggests an intrinsic mechanism that is not solely the result of epigenetic modifications. Our interpretation is that local modulation of transcription speed40 could play a role in modulating alternative splicing. In our study, we identified 141,043 exons with a mean count of at least 20 reads per sample and at least a twofold ratio of the 95th to the 5th percentile of expression values. Of these, 4.9% were classified as either UHP or DHP. We expect this figure to be a lower bound, and comprehensive profiling of large-scale datasets representing a wider range of tissues, developmental stages, and disease states may reveal additional instances of coupled splicing and expression regulation. A total of 1106 genes, corresponding to 13.3% of genes with non-trivial expression in the nine investigated GTEx tissues, contained at least one UHP or DHP exon. The proposed mechanism is decentralized and stems from intrinsic properties of transcription.
Separate mechanisms of transcriptional regulation may override the coupling of expression and splicing. For instance, inclusion of SMN2 exon 7, which increases with transcription rate, can be modified through acetylation56. The same mechanism could potentially override the coupling and thus alter the patterns observed in different experiments. This is expected to cause exons that are UHP/DHP in the absence of epigenetic modifications to behave as type 0 exons. Indeed, in the datasets we examined, exons may be identified as coupled only in a subset of the experimental conditions or cell types. In addition to modifications that override the coupling, signals that affect gene expression, for example specific transcription factors or mitogens, may also change the coupling pattern observed in our analysis. A dataset that is more diverse with respect to these factors is more likely to reveal coupling than a dataset in which the rate of transcription is constant across samples.
Finally, UHP/DHP exons are enriched for hundreds of distinct GO terms, suggesting that the coupling between expression and alternative splicing may provide an important gene regulatory mechanism that might be used in a variety of biological contexts.
There are several limitations to the current study, which could be addressed by tailoring the experimental data to the needs of the study. First, perturbation experiments are needed to establish a cause–effect relationship between expression and splicing; our study is correlational, and we did not conduct perturbation experiments. Second, datasets that compare the coupling in the presence of different epigenetic modifications could elucidate the effect of such modifications on coupling. A third limitation is that we did not have access to cell cycle-synchronized cells, which would be essential to demonstrate the relationship between UHP/DHP exons and cell division. The sparsity of information about RBP binding sites and of relevant ChIP-Seq datasets also limits our ability to identify RBPs that may affect the coupling. Finally, while the affinity of RNAPII and its transcription rate are likely to be core elements of the coupling mechanism, our ChIP-Seq and PRO-Seq datasets that examine RNAPII binding are derived from cell lines, whereas our RNA-Seq data are derived from tissues. Future work should ideally match the cell types across assays.
A direct comparison of our findings with a previously published study of genetic interactions in Saccharomyces cerevisiae57 shows that the behavior we observed in human tissues is more complex than that observed in yeast. In yeast cells, increased RNAPII speed always decreases splicing efficiency; in contrast, in human tissues we observed two classes of exons on which increased RNAPII speed has opposite effects. This difference may result from splicing regulatory mechanisms that evolved in humans but are absent in yeast. In yeast, most intron-containing genes have only a single intron, and therefore the effects of coupling on pathway activation would be absent. At the same time, both studies find that slower elongation provides more time for splicing factors to exert their effect. Hence, a coupling similar to the one observed in yeast could have been the starting point for the evolution of a more complex system.
Methods
Data
RNA-seq data: The Genotype-Tissue Expression (GTEx) project offers a genome-wide quantification of the expected number of transcripts in thousands of samples across dozens of human tissues37. Quantification is performed using bulk RNA sequencing and the RSEM tool58. We used the file GTEx_Analysis_2017-06-05_v8_RSEMv1.3.0_transcript_tpm.gct.gz, which provides transcript-level transcripts per million (TPM) values; TPM normalizes for sequencing depth and transcript length, so that expression levels are comparable across samples.
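For readers unfamiliar with TPM, the standard formula can be sketched as follows: each transcript's count is divided by its length to obtain a per-base rate, and the rates of one sample are rescaled to sum to one million. This is an illustration only; the GTEx values were computed by RSEM from its own estimated counts and effective transcript lengths, and the function name `tpm` is hypothetical.

```python
from typing import List, Sequence


def tpm(counts: Sequence[float], lengths: Sequence[float]) -> List[float]:
    """Convert the per-transcript counts of one sample to transcripts per million.

    counts  -- estimated read counts per transcript (one sample)
    lengths -- (effective) transcript lengths in bases, same order as counts
    """
    if len(counts) != len(lengths):
        raise ValueError("counts and lengths must have the same length")
    # Length-normalised rate: reads per base for each transcript.
    rates = [c / l for c, l in zip(counts, lengths)]
    total = sum(rates)
    if total == 0:
        raise ValueError("sample has no reads")
    # Rescale so that the values of one sample sum to one million.
    return [r / total * 1e6 for r in rates]


# Two transcripts with the same per-base rate receive equal TPM.
print(tpm([10, 20], [1000, 2000]))  # [500000.0, 500000.0]
```

Because every sample sums to the same total, TPM values express relative abundances within a sample on a common scale.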
The tissues we tested were Spleen, Thyroid, Brain - Cortex, Adrenal Gland, Breast - Mammary Tissue, Heart - Left Ventricle, Liver, Pituitary, and Pancreas, each with several hundred samples. These tissues were chosen because their large sample sizes provide enough statistical power to satisfy rigorous selection criteria. The UHP/DHP exons detected in the GTEx RNA-Seq dataset were used for all analyses described in the paper. We compared the exons detected in the GTEx dataset to type 0, UHP, and DHP exons in three bulk RNA-seq datasets obtained from the Sequence Read Archive (SRA)59: breast (SRP301453), left ventricle (SRP237337), and liver (SRP326468). To maximize statistical power, we pooled the exons detected in the GTEx tissues into one set for this comparison.
We additionally analyzed an RNA-seq dataset comprising matched tumor and control samples from The Cancer Genome Atlas (TCGA60); TCGA expression and mutation data were obtained from the UCSC Xena Browser (xenabrowser.net). To find a threshold number of non-silent somatic mutations that splits the cancer samples into groups with low and high mutation numbers, we fitted a mixture of two Poisson distributions to the mutation counts of the cancer samples; the resulting threshold of 20 mutations corresponds to the larger of the two fitted means. The Thyroid Carcinoma data met several requirements: at least 50 samples in each condition, allowing a reliable computation of correlation; a diverse range of Cyclin D1 expression values; and a tissue that we also analyzed in GTEx. In addition, the healthy samples showed the same level of correlation with Cyclin D1 as observed in GTEx, and the distribution of mutations in the tumor samples spanned both early stages (small number of mutations) and advanced stages (large number of mutations). The other cancer types in TCGA did not satisfy these requirements.
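A two-component Poisson mixture of this kind can be fitted with a standard expectation-maximization (EM) loop. The sketch below rests on assumptions that are ours, not the study's: the initialisation at the lower and upper quartiles, the convergence tolerance, the function names, and the synthetic toy counts (not TCGA data). Following the text, the larger fitted mean would serve as the split point.

```python
import math
from typing import List, Sequence, Tuple


def _poisson_logpmf(k: int, lam: float) -> float:
    # log P(K = k) for a Poisson distribution with mean lam > 0
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def fit_poisson_mixture(counts: Sequence[int], n_iter: int = 500,
                        tol: float = 1e-8) -> Tuple[List[float], List[float]]:
    """Fit a two-component Poisson mixture by EM.

    Returns (weights, means), both ordered by increasing mean.
    """
    n = len(counts)
    xs = sorted(counts)
    # Hypothetical initialisation: lower and upper quartiles, kept distinct.
    lam_lo = max(float(xs[n // 4]), 0.5)
    lam = [lam_lo, max(float(xs[(3 * n) // 4]), lam_lo + 1.0)]
    w = [0.5, 0.5]
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each count.
        resp = []
        ll = 0.0
        for k in counts:
            logs = [math.log(w[j]) + _poisson_logpmf(k, lam[j]) for j in (0, 1)]
            m = max(logs)
            s = sum(math.exp(v - m) for v in logs)
            ll += m + math.log(s)
            resp.append([math.exp(v - m) / s for v in logs])
        # M-step: update mixing weights and Poisson means.
        for j in (0, 1):
            rj = max(sum(r[j] for r in resp), 1e-12)
            w[j] = max(rj / n, 1e-12)
            lam[j] = max(sum(r[j] * k for r, k in zip(resp, counts)) / rj, 1e-9)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    order = sorted((0, 1), key=lambda j: lam[j])
    return [w[j] for j in order], [lam[j] for j in order]


# Synthetic example: "early" tumours with few mutations, "advanced" with many.
low = [0, 1, 2, 3, 1, 2, 1, 0, 2, 1] * 5
high = [25, 30, 35, 28, 32, 31, 29, 27, 33, 30] * 5
weights, means = fit_poisson_mixture(low + high)
threshold = means[1]  # larger fitted mean used as the split point
```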
Gene models used for the definition of exon bounds and transcript affiliation were derived from the GTF file Homo_sapiens.GRCh38.91.gtf from GENCODE61. The GTF file contained 683,196 unique exons.
ChIP-seq data: For ChIP-Seq peaks, we downloaded BED files from ENCODE using the provided filters to select ChIP-Seq files for POLR2A in human cells62. This resulted in 105 BED files containing peaks. File names are provided in Supplemental Table 6.
Gene expression variability threshold
We reasoned that genes that do not display a minimum level of expression variability would lack the statistical power to reveal associations of expression with alternative splicing. Therefore, we applied the following inclusion criteria. The GRCh38 GENCODE annotations of the human genome comprise 683,196 exons. An exon was retained for further analysis only if it was expressed in at least half of the samples from a given tissue (i.e., had a read count of at least one) and had a mean expression level of 20 counts or more across all samples from the tissue. Additionally, we calculated the ratio of the 95th percentile to the 5th percentile of the expression values and removed exons whose ratio was less than 2.0. Finally, we limited the analysis to genes with at least two transcripts that differ with respect to inclusion or exclusion of an exon or exon segment (i.e., genes with at least one alternatively spliced exon).
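The three exon-level criteria can be written as a single predicate over one exon's counts in one tissue. This is a sketch, not the authors' code: the function names are hypothetical, percentiles use linear interpolation between order statistics (the NumPy default), and an exon whose 5th percentile is zero is treated as having an infinite ratio, an assumption the text does not address.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("empty input")
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def passes_variability_filter(counts: Sequence[float], min_mean: float = 20.0,
                              min_ratio: float = 2.0) -> bool:
    """Apply the expression and variability criteria to one exon in one tissue."""
    n = len(counts)
    # Criterion 1: read count of at least one in at least half of the samples.
    if sum(1 for c in counts if c >= 1) < n / 2:
        return False
    # Criterion 2: mean expression of at least min_mean counts.
    if sum(counts) / n < min_mean:
        return False
    # Criterion 3: ratio of 95th to 5th percentile of at least min_ratio.
    p5, p95 = percentile(counts, 5), percentile(counts, 95)
    ratio = math.inf if p5 == 0 else p95 / p5  # assumption when p5 == 0
    return ratio >= min_ratio


print(passes_variability_filter(list(range(10, 110))))  # True
print(passes_variability_filter([50] * 100))             # False (no variability)
```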
Percent spliced in (ψ)
For each gene that passed the thresholds defined in the previous section, we investigated whether its transcripts differ with respect to inclusion or exclusion of a cassette exon. If so, we treat each affected cassette exon in the gene separately, denote the count of transcripts that contain the exon as $n_{\mathrm{inc}}$ and the count of transcripts that exclude the exon as $n_{\mathrm{exc}}$, and calculate the Percent Spliced In as

$$\psi = \frac{n_{\mathrm{inc}}}{n_{\mathrm{inc}} + n_{\mathrm{exc}}} \tag{1}$$
If multiple sets of exons are perfectly correlated with respect to transcript structure, they are collapsed such that the statistics for the event are calculated only once. For instance, if a gene has two transcripts with exon structure A-B-C-D-E and A-C-E, then we calculate the selection criteria for only one of the alternatively spliced exons B and D and apply them to both.
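The ψ computation and the collapsing of perfectly correlated exons can be sketched as follows, using the A-B-C-D-E / A-C-E example above. Assumptions: transcripts are given as lists of exon identifiers, ψ is computed per sample from transcript counts, and any exon present in some but not all transcripts is treated as a candidate (the sketch does not verify cassette-exon topology). The function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, FrozenSet, List


def variable_exon_groups(transcripts: Dict[str, List[str]]) -> Dict[FrozenSet[str], List[str]]:
    """Group exons that occur in some but not all transcripts by the set of
    transcripts that include them; exons in one group share the same psi."""
    all_tx = set(transcripts)
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for exon in sorted({e for exons in transcripts.values() for e in exons}):
        including = frozenset(t for t, exons in transcripts.items() if exon in exons)
        if 0 < len(including) < len(all_tx):
            groups[including].append(exon)
    return dict(groups)


def psi(including: FrozenSet[str], tx_counts: Dict[str, float]) -> float:
    """Percent spliced in for one sample: n_inc / (n_inc + n_exc)."""
    n_inc = sum(v for t, v in tx_counts.items() if t in including)
    n_exc = sum(v for t, v in tx_counts.items() if t not in including)
    total = n_inc + n_exc
    return n_inc / total if total > 0 else math.nan


# A-B-C-D-E vs A-C-E: exons B and D collapse into a single event.
tx = {"T1": ["A", "B", "C", "D", "E"], "T2": ["A", "C", "E"]}
groups = variable_exon_groups(tx)
print(groups)                                            # {frozenset({'T1'}): ['B', 'D']}
print(psi(frozenset({"T1"}), {"T1": 30.0, "T2": 10.0}))  # 0.75
```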
Correlation between gene expression and alternative splicing
We investigated potential associations between gene expression and alternative splicing of the cassette exons defined above. For cassette exon $e$ of gene $g$, we applied the following linear regression model across the samples of a tissue, where $E_g$ is the total expression of the gene (the sum of the counts of all transcripts assigned to the gene) and $\psi_e$ is the percent spliced in as defined above:

$$E_g = \beta_0 + \beta_1 \psi_e + \varepsilon \tag{2}$$

In words, the model predicts the gene expression level from the exon inclusion fraction.

The p-value for the coefficient $\beta_1$ tests the null hypothesis that $\psi_e$ has no correlation with $E_g$. This p-value is corrected for multiple testing using the Benjamini–Hochberg method63 in each tissue separately.

We conclude that there is a significant relationship between alternative splicing and expression if the corrected p-value is 0.05 or less, the coefficient of determination ($R^2$) is at least 0.5, and the ratio of the 95th percentile to the 5th percentile of the expression values is at least 2.

The results of this analysis are used to define the exon type. For each analyzed cassette exon, if there is a significant correlation and $\beta_1 > 0$, that is, higher inclusion predicts higher expression, the exon is classified as upregulated-high ψ (UHP). If the correlation is significant and $\beta_1 < 0$, that is, higher inclusion predicts lower expression, the exon is classified as downregulated-high ψ (DHP). If the relationship is not significant, the exon is classified as type 0. Exons that are not cassette exons are not classified under this definition.
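A dependency-free sketch of the per-exon test might look as follows. It regresses gene expression on ψ by ordinary least squares, computes the two-sided t-test p-value of the slope through the regularized incomplete beta function, applies Benjamini–Hochberg adjustment across the exons of one tissue, and then applies the thresholds described above. This is an illustration, not the authors' implementation; in practice a library routine such as `scipy.stats.linregress` would normally handle the regression step, and all function names here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    # Continued fraction for the incomplete beta function (modified Lentz method).
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _betai(a: float, b: float, x: float) -> float:
    # Regularized incomplete beta function I_x(a, b).
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_test_slope_pvalue(t2: float, df: int) -> float:
    """Two-sided p-value for a t statistic with squared value t2 and df degrees of freedom."""
    return _betai(df / 2.0, 0.5, df / (df + t2))


def regress_expression_on_psi(psi: Sequence[float], expr: Sequence[float]) -> Tuple[float, float, float]:
    """OLS fit of expr = b0 + b1 * psi; returns (b1, R^2, two-sided p-value of b1)."""
    n = len(psi)
    mx, my = sum(psi) / n, sum(expr) / n
    sxx = sum((x - mx) ** 2 for x in psi)
    syy = sum((y - my) ** 2 for y in expr)
    sxy = sum((x - mx) * (y - my) for x, y in zip(psi, expr))
    b1, r2 = sxy / sxx, sxy * sxy / (sxx * syy)
    if r2 >= 1.0:
        return b1, 1.0, 0.0
    df = n - 2
    return b1, r2, t_test_slope_pvalue(r2 * df / (1.0 - r2), df)


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (step-up, kept monotone)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted, running = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted


def classify_exon(b1: float, r2: float, p_adj: float, ratio_95_5: float,
                  alpha: float = 0.05, min_r2: float = 0.5, min_ratio: float = 2.0) -> str:
    """Label an exon UHP, DHP or type 0 using the thresholds in the text."""
    if p_adj <= alpha and r2 >= min_r2 and ratio_95_5 >= min_ratio:
        return "UHP" if b1 > 0 else "DHP"
    return "type 0"
```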
Analysis of PRO-Seq datasets
We obtained the aligned reads for the dataset of ref. 43 in BAM format from the ENCODE website using the PRO-Seq filter, which retrieves 8 files corresponding to two biological samples. For the dataset of ref. 44, we obtained the FASTQ files from SRA and processed them using the pipeline described in ref. 64, with the “output-genome-bam” option of RSEM. To compute overlaps with intronic regions, we used bedtools intersect with default parameters65. Counts were computed for every gene that contained at least one UHP/DHP exon.
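The overlap computation itself was done with bedtools. Purely to illustrate the underlying interval logic, the following pure-Python stand-in counts, for each gene, the reads that overlap at least one of its introns. It is not bedtools: it ignores strand, assumes BED-style 0-based half-open coordinates, counts each read at most once per gene (bedtools' default output reports one line per overlapping pair, so counting conventions may differ), and scans linearly rather than using an optimized index. The function name and input layout are hypothetical.

```python
import bisect
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), 0-based half-open


def count_intronic_reads(introns: Dict[str, List[Interval]],
                         reads: List[Interval]) -> Dict[str, int]:
    """Count, for each gene, the reads overlapping at least one of its introns."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in reads:
        by_chrom.setdefault(chrom, []).append((start, end))
    for ivs in by_chrom.values():
        ivs.sort()
    starts = {c: [s for s, _ in ivs] for c, ivs in by_chrom.items()}

    counts: Dict[str, int] = {}
    for gene, gene_introns in introns.items():
        hit = set()
        for chrom, i_start, i_end in gene_introns:
            chrom_reads = by_chrom.get(chrom, [])
            # Only reads that start before the intron end can overlap it.
            last = bisect.bisect_left(starts.get(chrom, []), i_end)
            for idx in range(last):
                if chrom_reads[idx][1] > i_start:  # read ends after intron start
                    hit.add((chrom, idx))
        counts[gene] = len(hit)  # each read counted once per gene
    return counts
```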
Enriched motif testing
Here, we characterized predicted sequence motifs for transcription factor binding sites (TFBS), RNA-binding protein (RBP) binding sites, and core promoter elements (CPE).
We characterized TFBS predicted by detailed transcription factor flexible models (TFFM)66 in the promoters of three groups of genes: genes containing at least one type 0 exon but no UHP or DHP exon (type 0 genes), genes containing at least one UHP exon but no type 0 or DHP exon (UHP genes), and genes containing at least one DHP exon but no type 0 or UHP exon (DHP genes). TFFM binding motifs were taken from JASPAR67, RBP matrices were taken from the RNA-binding protein database RBPDB68, and CPEs were characterized as previously described69. The calculations were conducted within the backend infrastructure of the FABIAN-variant application70.
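Because the three gene groups are mutually exclusive, a gene qualifies only if all of its classified cassette exons carry the same label. A minimal sketch, assuming the input holds only the labels of classified cassette exons (non-cassette exons are unlabeled) and using a hypothetical function name:

```python
from typing import Iterable, Optional


def classify_gene(exon_types: Iterable[str]) -> Optional[str]:
    """Assign a gene to the UHP, DHP or type 0 group used for promoter analysis.

    exon_types holds the labels ("UHP", "DHP", "type 0") of the gene's classified
    cassette exons; genes with mixed labels, or none, are excluded (None).
    """
    labels = set(exon_types)
    if labels == {"type 0"}:
        return "type 0 gene"
    if labels == {"UHP"}:
        return "UHP gene"
    if labels == {"DHP"}:
        return "DHP gene"
    return None
```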
We derived empirical p-values by random sampling (without replacement) with one million permutations of our variable of interest. The p-value is the proportion of permuted samples whose test statistic is larger than the observed statistic. In our case, the statistic of interest is the difference between the proportions of hits for a given protein-binding factor in UHP (or DHP) vs. type 0 exons. For instance, suppose that the proportion of UHP promoters with a TATA box is 32.6% and the proportion of type 0 promoters with a TATA box is 17.2%. Then our statistic of interest is ∆ = 32.6 − 17.2 = 15.4. We then repeat the same analysis 1,000,000 times with permutations of the promoters (starting with the same collection of promoters and randomizing the assignments to UHP, DHP, and type 0 while retaining the same group sizes). Call the result of each randomized analysis ∆′. The p-value is then the proportion of permutations in which ∆′ > ∆.
Since we are performing the above procedure for hundreds of covariates (i.e., several tests for each TFBS), we adjusted for multiple testing by Bonferroni correction after excluding tests where either |∆′ − ∆| < 0.5 or |∆′ − ∆|/∆ < 0.05.
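The permutation procedure can be sketched as follows for one factor and the UHP-vs-type 0 comparison: labels are reshuffled among all promoters while the three group sizes stay fixed, and the p-value is the fraction of permutations with ∆′ > ∆. This is a hedged illustration, not the authors' C++ code; the input layout (one boolean per promoter) and the function names are assumptions, the additional exclusion rule described above is not implemented, and pure Python will be slow at 10^6 permutations.

```python
import random
from typing import Sequence, Tuple


def percent_hits(hits: Sequence[bool]) -> float:
    # Percentage of promoters with a predicted site for the factor.
    return 100.0 * sum(hits) / len(hits)


def permutation_test(uhp: Sequence[bool], dhp: Sequence[bool], type0: Sequence[bool],
                     n_perm: int = 1_000_000, seed: int = 0) -> Tuple[float, float]:
    """Return (delta, empirical p) for the UHP-vs-type 0 difference in hit percentage."""
    delta = percent_hits(uhp) - percent_hits(type0)
    pooled = list(uhp) + list(dhp) + list(type0)
    n_u, n_d = len(uhp), len(dhp)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_perm):
        # Reassign the UHP/DHP/type 0 labels at random, keeping group sizes fixed.
        rng.shuffle(pooled)
        delta_perm = percent_hits(pooled[:n_u]) - percent_hits(pooled[n_u + n_d:])
        if delta_perm > delta:
            exceed += 1
    return delta, exceed / n_perm


def bonferroni(p: float, n_tests: int) -> float:
    # Bonferroni-adjusted p-value, capped at 1.
    return min(1.0, p * n_tests)
```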
Functional enrichment analysis
Using isoform-level function assignments from isopret49, we applied the hypergeometric test to determine the probability of observing at least the observed number of UHP-containing isoforms among all isoforms annotated to a given GO term that contained a UHP, DHP, or type 0 exon. Only GO terms with at least five UHP isoforms annotated to them were considered. Bonferroni correction was applied to the resulting p-values.
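The upper-tail hypergeometric probability and the subsequent filtering and Bonferroni step can be sketched as below. We assume that the population is the set of all UHP-, DHP-, and type 0-containing isoforms, the "successes" are the UHP-containing isoforms, and the draws are the isoforms annotated to a term; the function names and the dictionary layout are hypothetical.

```python
from math import comb
from typing import Dict, Tuple


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(max(k, 0), min(successes, draws) + 1))
    return hits / total


def go_enrichment(term_counts: Dict[str, Tuple[int, int]], n_isoforms: int,
                  n_uhp: int, min_uhp: int = 5) -> Dict[str, float]:
    """Bonferroni-corrected p-values for terms with at least min_uhp UHP isoforms.

    term_counts maps a GO term to (UHP isoforms annotated to it,
    all UHP/DHP/type 0 isoforms annotated to it).
    """
    tested = {t: kn for t, kn in term_counts.items() if kn[0] >= min_uhp}
    m = len(tested)
    return {t: min(1.0, hypergeom_sf(k, n_isoforms, n_uhp, n) * m)
            for t, (k, n) in tested.items()}
```

Restricting testing to terms with at least five UHP isoforms before correcting also keeps the Bonferroni factor (the number of tests actually performed) smaller.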
Supplementary information
Acknowledgements
The authors would like to thank Dr. Gloria Fuentes, The Visual Thinker, for designing Fig. 1. This work was supported by National Institutes of Health grants R01CA248317 and R01GM138541 to OA. We acknowledge the use of shared resources supported by the JAX Cancer Center (National Cancer Institute P30CA034196). Funding for open access charge: JAX Institutional funding.
Author contributions
G.K. and P.N.R. conceived the research. G.K. designed the computational experiments and analyzed the data. P.N.R. supervised the project. R.S. and P.N.R. performed the promoter analysis. D.D. contributed to the splice site scoring analysis. M.D., O.A., G.S. and D.S. helped supervise the project.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Data availability
The data used in this study are available at the NCBI Sequence Read Archive (SRA)59. The individual datasets can be downloaded by using the Snakemake71 script that is provided under an MIT License at https://github.com/TheJacksonLaboratory/gene_exp_psi.
Code availability
The Snakemake file additionally runs a collection of scripts that were used to generate the main results presented in the manuscript. Source code for the C++ application used to analyze motifs associated with UHP and DHP exons is also provided at the GitHub repository. Any additional information required to reanalyze the data reported in this paper is available from the corresponding author upon request.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Guy Karlebach, Email: gkarleba@fitchburgstate.edu.
Peter N. Robinson, Email: peter.robinson@bih-charite.de
Supplementary information
The online version contains supplementary material available at 10.1038/s41525-024-00432-w.
References
1. Pan, Q., Shai, O., Lee, L. J., Frey, B. J. & Blencowe, B. J. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat. Genet. 40, 1413–1415 (2008).
2. Wahl, M. C., Will, C. L. & Lührmann, R. The spliceosome: design principles of a dynamic RNP machine. Cell 136, 701–718 (2009).
3. Moehle, E. A., Braberg, H., Krogan, N. J. & Guthrie, C. Adventures in time and space: splicing efficiency and RNA polymerase II elongation rate. RNA Biol. 11, 313–319 (2014).
4. Gehring, N. H. & Roignant, J.-Y. Anything but ordinary - emerging splicing mechanisms in eukaryotic gene regulation. Trends Genet. 37, 355–372 (2021).
5. Cramer, P., Pesce, C. G., Baralle, F. E. & Kornblihtt, A. R. Functional association between promoter structure and transcript alternative splicing. Proc. Natl. Acad. Sci. USA 94, 11456–11460 (1997).
6. Cramer, P. et al. Coupling of transcription with alternative splicing: RNA pol II promoters modulate SF2/ASF and 9G8 effects on an exonic splicing enhancer. Mol. Cell 4, 251–258 (1999).
7. Auboeuf, D., Hönig, A., Berget, S. M. & O’Malley, B. W. Coordinate regulation of transcription and splicing by steroid receptor coregulators. Science 298, 416–419 (2002).
8. Chang, M.-L., Chen, J.-C., Alonso, C. R., Kornblihtt, A. R. & Bissell, D. M. Regulation of fibronectin splicing in sinusoidal endothelial cells from normal or injured liver. Proc. Natl. Acad. Sci. USA 101, 18093–18098 (2004).
9. Kadener, S., Fededa, J. P., Rosbash, M. & Kornblihtt, A. R. Regulation of alternative splicing by a transcriptional enhancer through RNA pol II elongation. Proc. Natl. Acad. Sci. USA 99, 8185–8190 (2002).
10. Aslanzadeh, V. & Beggs, J. D. Revisiting the window of opportunity for cotranscriptional splicing in budding yeast. RNA 26, 1081–1085 (2020).
11. Saldi, T., Cortazar, M. A., Sheridan, R. M. & Bentley, D. L. Coupling of RNA polymerase II transcription elongation with pre-mRNA splicing. J. Mol. Biol. 428, 2623–2635 (2016).
12. Eperon, L. P., Graham, I. R., Griffiths, A. D. & Eperon, I. C. Effects of RNA secondary structure on alternative splicing of pre-mRNA: is folding limited to a region behind the transcribing RNA polymerase? Cell 54, 393–401 (1988).
13. de la Mata, M. et al. A slow RNA polymerase II affects alternative splicing in vivo. Mol. Cell 12, 525–532 (2003).
14. Pagani, F., Stuani, C., Zuccato, E., Kornblihtt, A. R. & Baralle, F. E. Promoter architecture modulates CFTR exon 9 skipping. J. Biol. Chem. 278, 1511–1517 (2003).
15. Dujardin, G. et al. How slow RNA polymerase II elongation favors alternative exon skipping. Mol. Cell 54, 683–690 (2014).
16. Hsin, J.-P. & Manley, J. L. The RNA polymerase II CTD coordinates transcription and RNA processing. Genes Dev. 26, 2119–2137 (2012).
17. Dye, M. J., Gromak, N. & Proudfoot, N. J. Exon tethering in transcription by RNA polymerase II. Mol. Cell 21, 849–859 (2006).
18. Gromak, N., Talotti, G., Proudfoot, N. J. & Pagani, F. Modulating alternative splicing by cotranscriptional cleavage of nascent intronic RNA. RNA 14, 359–366 (2008).
19. de la Mata, M. & Kornblihtt, A. R. RNA polymerase II C-terminal domain mediates regulation of alternative splicing by SRp20. Nat. Struct. Mol. Biol. 13, 973–980 (2006).
20. Iannone, C. et al. Relationship between nucleosome positioning and progesterone-induced alternative splicing in breast cancer cells. RNA 21, 360–374 (2015).
21. Khan, D. H. et al. RNA-dependent dynamic histone acetylation regulates MCL1 alternative splicing. Nucleic Acids Res. 42, 1656–1670 (2014).
22. Yearim, A. et al. HP1 is involved in regulating the global impact of DNA methylation on alternative splicing. Cell Rep. 10, 1122–1134 (2015).
23. Kolasinska-Zwierz, P. et al. Differential chromatin marking of introns and expressed exons by H3K36me3. Nat. Genet. 41, 376–381 (2009).
24. Buratti, E. & Baralle, F. E. Influence of RNA secondary structure on the pre-mRNA splicing process. Mol. Cell. Biol. 24, 10505–10514 (2004).
25. Jimeno-González, S. et al. Defective histone supply causes changes in RNA polymerase II elongation rate and cotranscriptional pre-mRNA splicing. Proc. Natl. Acad. Sci. USA 112, 14840–14845 (2015).
26. Ip, J. Y. et al. Global impact of RNA polymerase II elongation inhibition on alternative splicing regulation. Genome Res. 21, 390–401 (2011).
27. Sureau, A., Gattoni, R., Dooghe, Y., Stévenin, J. & Soret, J. SC35 autoregulates its expression by promoting splicing events that destabilize its mRNAs. EMBO J. 20, 1785–1796 (2001).
28. Lamba, J. K. et al. Nonsense mediated decay downregulates conserved alternatively spliced ABCC4 transcripts bearing nonsense codons. Hum. Mol. Genet. 12, 99–109 (2003).
29. Wollerton, M. C., Gooding, C., Wagner, E. J., Garcia-Blanco, M. A. & Smith, C. W. J. Autoregulation of polypyrimidine tract binding protein by alternative splicing leading to nonsense-mediated decay. Mol. Cell 13, 91–100 (2004).
30. Fong, N. et al. Pre-mRNA splicing is facilitated by an optimal RNA polymerase II elongation rate. Genes Dev. 28, 2663–2676 (2014).
31. Yankulov, K., Blau, J., Purton, T., Roberts, S. & Bentley, D. L. Transcriptional elongation by RNA polymerase II is stimulated by transactivators. Cell 77, 749–759 (1994).
32. Krumm, A., Hickey, L. B. & Groudine, M. Promoter-proximal pausing of RNA polymerase II defines a general rate-limiting step after transcription initiation. Genes Dev. 9, 559–572 (1995).
33. Brown, S. A., Weirich, C. S., Newton, E. M. & Kingston, R. E. Transcriptional activation domains stimulate initiation and elongation at different times and via different residues. EMBO J. 17, 3146–3154 (1998).
34. Reines, D., Conaway, R. C. & Conaway, J. W. Mechanism and regulation of transcriptional elongation by RNA polymerase II. Curr. Opin. Cell Biol. 11, 342–346 (1999).
35. Couvillion, M. et al. Transcription elongation is finely tuned by dozens of regulatory factors. Elife 11, e78944 (2022).
36. Giono, L. E. & Kornblihtt, A. R. Linking transcription, RNA polymerase II elongation and alternative splicing. Biochem. J. 477, 3091–3104 (2020).
37. GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).
38. GTEx Consortium. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).
39. Yeo, G. & Burge, C. B. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J. Comput. Biol. 11, 377–394 (2004).
40. Muniz, L., Nicolas, E. & Trouche, D. RNA polymerase II speed: a key player in controlling and adapting transcriptome composition. EMBO J. 40, e105740 (2021).
41. Mayer, A. et al. Native elongating transcript sequencing reveals human transcriptional activity at nucleotide resolution. Cell 161, 541–554 (2015).
42. Nojima, T. et al. Mammalian NET-seq reveals genome-wide nascent transcription coupled to RNA processing. Cell 161, 526–540 (2015).
43. Wissink, E. M., Martinez, D. M., Ehmsen, K. T., Yamamoto, K. R. & Lis, J. T. Glucocorticoid receptor collaborates with pioneer factors and AP-1 to execute genome-wide regulation. bioRxiv 2021.06.01.444518; doi:10.1101/2021.06.01.444518 (2021).
44. Gupta, A. et al. Deconvolution of multiplexed transcriptional responses to wood smoke particles defines rapid aryl hydrocarbon receptor signaling dynamics. J. Biol. Chem. 297, 101147 (2021).
45. Rambout, X., Dequiedt, F. & Maquat, L. E. Beyond transcription: roles of transcription factors in pre-mRNA splicing. Chem. Rev. 118, 4339–4364 (2018).
46. Kelemen, O. et al. Function of alternative splicing. Gene 514, 1–30 (2013).
47. Bauer, S., Grossmann, S., Vingron, M. & Robinson, P. N. Ontologizer 2.0–a multifunctional tool for GO term enrichment analysis and data exploration. Bioinformatics 24, 1650–1651 (2008).
48. Bhuiyan, S. A. et al. Systematic evaluation of isoform function in literature reports of alternative splicing. BMC Genom. 19, 637 (2018).
49. Karlebach, G. et al. An expectation-maximization framework for comprehensive prediction of isoform-specific functions. Bioinformatics 39, btad132 (2023).
50. Picco, V. & Pagès, G. Linking JNK activity to the DNA damage response. Genes Cancer 4, 360–368 (2013).
51. Maertens, O. et al. MAPK pathway suppression unmasks latent DNA repair defects and confers a chemical synthetic vulnerability in BRAF-, NRAS-, and NF1-mutant melanomas. Cancer Discov. 9, 526–545 (2019).
52. Cargnello, M. & Roux, P. P. Activation and function of the MAPKs and their substrates, the MAPK-activated protein kinases. Microbiol. Mol. Biol. Rev. 75, 50–83 (2011).
53. Massagué, J. G1 cell-cycle control and cancer. Nature 432, 298–306 (2004).
54. Dominguez, D. et al. An extensive program of periodic alternative splicing linked to cell cycle progression. Elife 5, e10288 (2016).
55. Schor, I. E., Gómez Acuña, L. I. & Kornblihtt, A. R. Coupling between transcription and alternative splicing. Cancer Treat. Res. 158, 1–24 (2013).
56. Marasco, L. E. et al. Counteracting chromatin effects of a splicing-correcting antisense oligonucleotide improves its therapeutic efficacy in spinal muscular atrophy. Cell 185, 2057–2070.e15 (2022).
57. Braberg, H. et al. From structure to systems: high-resolution, quantitative genetic analysis of RNA polymerase II. Cell 154, 775–788 (2013).
58. Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinforma. 12, 323 (2011).
59. Leinonen, R., Sugawara, H., Shumway, M. & International Nucleotide Sequence Database Collaboration. The sequence read archive. Nucleic Acids Res. 39, D19–21 (2011).
60. Cancer Genome Atlas Research Network et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).
61. Frankish, A. et al. GENCODE 2021. Nucleic Acids Res. 49, D916–D923 (2021).
62. ENCODE Project Consortium et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020).
63. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Stat. Methodol. 57, 289–300 (1995).
64. Karlebach, G. et al. Betacoronavirus-specific alternate splicing. Genomics 114, 110270 (2022).
65. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
66. Mathelier, A. & Wasserman, W. W. The next generation of transcription factor binding site prediction. PLoS Comput. Biol. 9, e1003214 (2013).
67. Castro-Mondragon, J. A. et al. JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 50, D165–D173 (2022).
68. Cook, K. B., Kazan, H., Zuberi, K., Morris, Q. & Hughes, T. R. RBPDB: a database of RNA-binding specificities. Nucleic Acids Res. 39, D301–D308 (2011).
69. Steinhaus, R., Gonzalez, T., Seelow, D. & Robinson, P. N. Pervasive and CpG-dependent promoter-like characteristics of transcribed enhancers. Nucleic Acids Res. 48, 5306–5317 (2020).
70. Steinhaus, R., Robinson, P. N. & Seelow, D. FABIAN-variant: predicting the effects of DNA variants on transcription factor binding. Nucleic Acids Res. 50, W322–W329 (2022).
71. Mölder, F. et al. Sustainable data analysis with Snakemake. F1000Res. 10, 33 (2021).