Author manuscript; available in PMC: 2022 Nov 22.
Published in final edited form as: Curr Biol. 2021 Sep 22;31(22):4898–4910.e4. doi: 10.1016/j.cub.2021.09.004

Coupling of spliceosome complexity to intron diversity

Jade Sales-Lee 1, Daniela S Perry 1, Bradley A Bowser 2, Jolene K Diedrich 3, Beiduo Rao 1, Irene Beusch 1, John R Yates 3rd 3, Scott W Roy 4, Hiten D Madhani 5
PMCID: PMC8967684  NIHMSID: NIHMS1743503  PMID: 34555349

SUMMARY

We determined that over 40 spliceosomal proteins are conserved between many fungal species and humans but were lost during the evolution of S. cerevisiae, an intron-poor yeast with unusually rigid splicing signals. We analyzed null mutations in a subset of these factors, most of which had not been investigated previously, in the intron-rich yeast Cryptococcus neoformans. We found they govern splicing efficiency of introns with divergent spacing between intron elements. Importantly, most of these factors also suppress usage of weak nearby cryptic/alternative splice sites. Among these, orthologs of GPATCH1 and the helicase DHX35 display correlated functional signatures and copurify with each other as well as components of catalytically active spliceosomes, identifying a conserved G-patch/helicase pair that promotes splicing fidelity. We propose that a significant fraction of spliceosomal proteins in humans and most eukaryotes are involved in limiting splicing errors, potentially through kinetic proofreading mechanisms, thereby enabling greater intron diversity.

eTOC blurb

Sales-Lee et al. show that the ancestral spliceosome was complex and the intron-reduced yeast S. cerevisiae lost dozens of spliceosomal proteins maintained in intron-rich fungi and humans. Functional and proteomic analysis in the intron-rich yeast C. neoformans reveals roles for these proteins in splicing fidelity and uncovers functional modules.

INTRODUCTION

The spliceosome is a complex and dynamic assembly of small nuclear ribonucleoproteins (snRNPs) and proteins that assemble onto the intron substrate and then undergo several large rearrangements to form a catalytically active complex 1. Two sequential transesterification steps mediate intron removal. Pre-mRNA splicing by the spliceosome seems complex for a process that removes a segment of RNA from a precursor. Splicing requires eight ATP-dependent steps and about 90 proteins in S. cerevisiae. Much of our functional understanding derives from the analysis of conditional and null mutants in S. cerevisiae 1. Human spliceosomes appear to contain about 60 additional proteins 2,3. The reason for this added complexity is not understood.

Structures of the core portion of the spliceosome at various stages of its cycle have been elucidated using cryoEM 1. Many structures have been obtained from spliceosomes assembled in vitro in extracts from the budding yeast S. cerevisiae or from HeLa cells. While the structure of the core of the spliceosome is invariant across divergent species, proteins and structures have been identified in human spliceosomes that are not found in S. cerevisiae spliceosomes. While it might be imagined that the higher complexity of human spliceosomes relates to late evolutionary innovations that enabled metazoan complexity, an alternative model is that the common ancestor of S. cerevisiae and humans harbored a complex spliceosome, whose components were lost during the evolution of S. cerevisiae. There is anecdotal support for this hypothesis. For example, orthologs of a number of human splicing factors that do not exist in S. cerevisiae have been described in the fission yeast Schizosaccharomyces pombe 4–6. Prior work indicates that the Saccharomycotina, the subphylum to which S. cerevisiae belongs, has lost introns that were present in an intron-rich ancestor, such that less than ten percent of genes harbor introns in S. cerevisiae 7. As in other lineages, such loss events correlate with the remaining intron signals converging on optimal sequences. Thus, as introns are lost, intron signals become homogeneous and lose diversity. Insofar as certain splicing factors play outsized roles in recognition of introns with divergent splice signals, such homogenization might be expected to be associated with loss of spliceosomal factors and thus overall spliceosomal simplification. Below, we describe phylogenetic, functional, and proteomic investigations of this question.

RESULTS

Maintenance of many dozens of human spliceosomal orthologs in fungal lineages

Cryptococcus neoformans offers a genetically tractable intron-rich haploid organism in which to investigate fundamental aspects of gene expression. Figure 1A highlights the differences in intron sequence and abundance among Saccharomyces cerevisiae, Cryptococcus neoformans and Homo sapiens. The S. cerevisiae genome is estimated to encode 282 introns 8 spread over 5410 annotated genes (0.05 introns/gene), while Cryptococcus neoformans (H99 strain) has 6941 annotated protein coding genes harboring over 40,000 introns 9,10, comparable to humans (8 introns/gene), with 27,219 annotated genes and over 200,000 introns 11. The sequences of C. neoformans 5’ splice sites and branchpoints are more variable than those of S. cerevisiae, suggesting its spliceosomes, like those of humans, may be more flexible in substrate utilization (Figure 1A).

Figure 1: Massive loss of human spliceosomal protein orthologs in specific fungal lineages.


A. Comparison of intron number and properties in humans versus the yeasts S. cerevisiae and C. neoformans.

B. Evolutionary loss events. Phylogeny is based on James et al. (2020). See also Data S1 and Table S2.

C. Numbers of human spliceosomal protein orthologs in S. cerevisiae, S. pombe and C. neoformans.

D. Spliceosomal factor orthologs for which null mutations in C. neoformans were obtained. Confidence scores for the presence of the indicated human spliceosomal protein orthologs in the indicated species are shown.

We asked whether the loss of introns in the Saccharomycotina, the subphylum to which S. cerevisiae belongs, is accompanied by a loss of spliceosomal protein orthologs. We compiled a list of all spliceosome components reproducibly detected through mass spectrometry, interaction studies and/or purified and visualized in the spliceosome in structural biology studies 2,3,12. This list includes 157 human proteins (Data S1). To identify candidates for fungal orthologs we used a combination of criteria including reciprocal BLASTP searches and the presence of predicted protein domains, followed by the application of additional criteria. We generated a confidence score (0–9) for the presence of an ortholog in a given species (see Methods). Using this semi-automated process, we analyzed 24 fungal species with at least two representatives from each major clade (Figure 1B, left panel). We then plotted the number of proteins for which an ortholog to a human spliceosomal protein could be identified at a given confidence level in each species (Figure 1B, right panel). This pipeline did not identify any duplicated paralogs. Strikingly, members of the intron-reduced Saccharomycotina harbored the fewest strong human spliceosomal protein orthologs. Other species exhibited considerably larger numbers of human spliceosomal orthologs, including C. neoformans (Figure 1B, right panel). Because members of the earliest-branching groups analyzed harbor the highest number of human spliceosomal protein orthologs, clades displaying lower numbers of orthologs have most likely undergone gene loss events, with the Saccharomycotina exhibiting the highest degree of loss. This correlates with the reduction in intron number found in species of this group 13. For three fungal species of interest (S. cerevisiae, S. pombe and C. neoformans), we curated the spliceosome literature (including our past studies of purified Cryptococcal spliceosomes; Burke et al., 2018) and an available experimentally curated database (Cvitkovic and Jurica, 2013). Nine proteins in S. cerevisiae and one protein in S. pombe are included in the curation based on the literature despite the fact that they display insufficient sequence identity with the presumptive human ortholog to be detected bioinformatically. This analysis revealed 94 spliceosomal protein orthologs in S. cerevisiae, 126 in S. pombe and 139 in C. neoformans (Figure 1C and Data S1).
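The reciprocal-search criterion mentioned above can be illustrated with a minimal sketch. It covers only the reciprocal best-hit step. The 0–9 confidence score and the domain criteria described in Methods are not reproduced, and the function names, identifiers, and bit scores below are hypothetical toy values rather than data from this study.

```python
# Illustrative sketch only: identifiers and scores are invented, and this is
# not the scoring pipeline used in the paper.

def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, bit_score) tuples, for example
    parsed from tabular BLASTP output.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_hits, reverse_hits):
    """Return (human, fungal) pairs that are each other's best hit."""
    forward = best_hits(forward_hits)   # human query -> fungal subject
    reverse = best_hits(reverse_hits)   # fungal query -> human subject
    return sorted((h, f) for h, f in forward.items() if reverse.get(f) == h)


# Toy example: HUMAN_2's best fungal hit points back to a different human protein.
forward = [("HUMAN_1", "FUNGAL_a", 310.0), ("HUMAN_1", "FUNGAL_b", 95.0),
           ("HUMAN_2", "FUNGAL_c", 540.0)]
reverse = [("FUNGAL_a", "HUMAN_1", 305.0), ("FUNGAL_c", "HUMAN_3", 600.0),
           ("FUNGAL_c", "HUMAN_2", 520.0)]
print(reciprocal_best_hits(forward, reverse))
```

In this toy case only the HUMAN_1/FUNGAL_a pair survives, which is the kind of candidate that would then be passed to the additional criteria.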

Some 45 genes encoding predicted human spliceosomal orthologs are present in C. neoformans but not in S. cerevisiae. To investigate these spliceosomal proteins, we searched a C. neoformans gene deletion collection for viable knockout mutants in these factors and identified strains deleted for 13 of these putative spliceosomal factors. We also identified a strain harboring a deletion of an ortholog of the human spliceosomal protein DHX35. An ortholog of this protein is found in S. cerevisiae (Dhr2), but it has no detectable effect on splicing and instead functions in nucleolar ribosomal RNA processing 14. We identified the Cryptococcal ortholog of DHX35 previously in purified C. neoformans spliceosomes 15 and included it in this study. The names and confidence scores in fungi of these 14 human spliceosomal proteins are displayed in Figure 1D. For readability, we will use the human nomenclature for Cryptococcal spliceosome proteins (see Data S1 for C. neoformans gene locus and name). We also identified a viable gene deletion corresponding to Rrp6, a nuclear exosome subunit involved in RNA degradation and quality control, whose loss we hypothesized might stabilize RNAs produced by aberrant splicing events compared to wild-type cells.

To examine the impact of these 14 gene deletion mutations on the abundance of pre-mRNA and mRNA along with splice site choice, we cultured these strains, extracted RNA, purified polyadenylated transcripts, and performed RNA-seq. Samples were grown in duplicate and paired with wild-type samples grown on the same day to the same optical density. Paired-end 100 nt reads were obtained.

Limited impact of spliceosomal protein null mutations on global transcript abundance

The overall scheme for the analysis of the RNA-seq data is shown in Figure 2, which includes analysis of transcript counts and splicing changes. We first sought to determine whether deletions of putative spliceosomal proteins altered the transcript levels of other spliceosomal proteins. Hence, we subjected RNA-seq reads to mapping and applied DESeq2 to identify changes in transcript levels 16. Shown in Figure 2D is the impact of gene deletions on the levels of spliceosomal protein-encoding transcripts (see Table S1 for full results). Among the deletions analyzed, only one displayed a significant change (>2-fold change, adjusted p-value < 0.01) in the level of a transcript encoding a spliceosomal protein. This strain is deleted for CNAG_02260, which encodes the Cryptococcal ortholog of FAM50A. While FAM50A is a spliceosomal protein in humans, it has also been linked to transcription 17, suggesting a pleiotropic role. Consistent with this, we observed that many genes display transcript level changes in this mutant, while few global transcript changes were observed for the other gene deletion strains, save for the rrp6Δ strain, which increased the levels of ~250 mRNAs, consistent with its predicted role in nuclear RNA turnover (Figure 2D). Thus, mutations in the putative spliceosomal factors analyzed here do not generally appear to have large effects on the expression of other spliceosomal factors, suggesting that effects on splicing in the corresponding C. neoformans mutants likely reflect direct roles. We therefore proceeded to analyze the impact of mutations on splicing.
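As a minimal sketch of the significance cutoff quoted above, the filter below keeps rows whose fold change exceeds 2 and whose adjusted p-value is below 0.01. It assumes the cutoff applies to changes in either direction and that rows carry DESeq2's `log2FoldChange` and `padj` columns. The function name and the four rows are hypothetical, not values from Table S1.

```python
import math


def is_significant(log2_fold_change, padj, min_fold=2.0, max_padj=0.01):
    """True when the fold change exceeds min_fold (either direction)
    and the adjusted p-value is below max_padj."""
    if padj is None or math.isnan(padj):
        # Adjusted p-values can be missing (NA) for some genes; treat as not significant.
        return False
    return abs(log2_fold_change) > math.log2(min_fold) and padj < max_padj


# Toy rows standing in for a differential-expression results table.
rows = [
    {"gene": "gene_1", "log2FoldChange": 1.8, "padj": 0.0004},
    {"gene": "gene_2", "log2FoldChange": -0.6, "padj": 0.0001},
    {"gene": "gene_3", "log2FoldChange": -2.3, "padj": 0.03},
    {"gene": "gene_4", "log2FoldChange": -1.2, "padj": 0.002},
]
changed = [r["gene"] for r in rows if is_significant(r["log2FoldChange"], r["padj"])]
print(changed)
```

Requiring both thresholds means that a large but noisy change (gene_3) and a confident but small change (gene_2) are both excluded.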

Figure 2: RNA-seq analysis of null mutations in 14 human spliceosomal protein orthologs.


A. Schematic of the RNA-seq pipeline.

B. Definitions of splicing events quantified.

C. Definitions of changes to Percent Spliced In (delta PSI or Δψ) values for intron retention, alternative 3’ splice site, and alternative 5’ splice site events.

D. Changes in transcript levels in mutants. Plotted is the total number of splicing factors changed in the RNA-seq data as well as total transcriptome changes. See also Table S1.

Altered splicing choice and efficiency in mutants lacking human spliceosomal protein orthologs

To examine splicing changes (Fig 2AC), we used the Junctional Utilization Method [JUM; 18]. Using stringent read-count and p-value cutoffs (see Methods), we quantified splicing changes in each of the 14 gene deletion mutants described above. Since we did not identify instances of mutually exclusive exons and only a handful of cassette exons, we excluded these two categories, along with the ‘complex splicing’ category, from our downstream analysis.

As diagrammed in Figure 2B and 2C, analysis of intron retention, alternative 5’ splice site usage, and alternative 3’ splice site usage involves multiple possibilities for a mutant phenotype. For intron retention, the amount of retained intron transcripts (i.e., precursor) can be increased or decreased relative to mRNA. For the “change in Percent Spliced In” metric (Δψ), a positive value corresponds to an increase in intron retention in a mutant, while a decrease in intron retention produces a negative Δψ value (Figure 2AC). For changes in the use of a splice site relative to an alternative splice site, we first determined that the site preferred in wild-type cells (>50% usage relative to the alternative site) was always an annotated splice site, while the alternative site was either unannotated or annotated as an alternative site in the current C. neoformans H99 strain genome annotation. For alternative 5’ or 3’ splice site changes, a decrease in usage of the preferred site in the mutant produces a negative Δψ, while an increase in the usage of the wild-type site in a mutant produces a positive Δψ value (Figure 2BC).
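The sign conventions just described can be made concrete with a small sketch. It assumes ψ is computed as a simple read fraction, whereas JUM estimates these quantities with its own statistical model, so the functions and read counts below are illustrative only.

```python
def psi(inclusion_reads, exclusion_reads):
    """Fraction of reads supporting the 'included' form of an event."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("no reads support either isoform")
    return inclusion_reads / total


def delta_psi(mutant, wild_type):
    """Mutant psi minus wild-type psi; each argument is an
    (inclusion_reads, exclusion_reads) pair."""
    return psi(*mutant) - psi(*wild_type)


# Intron retention: inclusion = retained-intron reads, exclusion = spliced reads.
# More retention in the mutant gives a positive value.
print(delta_psi(mutant=(50, 50), wild_type=(25, 75)))

# Alternative splice site: inclusion = canonical-site reads,
# exclusion = alternative-site reads.
# Reduced use of the canonical site in the mutant gives a negative value.
print(delta_psi(mutant=(50, 50), wild_type=(75, 25)))
```

With these toy counts the first call returns 0.25 and the second returns -0.25, matching the positive and negative cases defined in the text.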

For each mutant, we quantified the effects across alternative splicing events in the C. neoformans genome and tabulated these data across events. The results of this analysis are shown in Figure 3A (dataset available in Data S2; see Figure S1 for RT-PCR validation). Plotted are the number of introns impacted in each gene deletion mutant for each of the three types of splicing changes. The numbers plotted above the line indicate the number of introns whose splicing is altered in such a way as to produce a positive Δψ value as defined above, while those plotted below the line represent the number of introns impacted for a given splicing type that produce a negative Δψ value as defined above. We observed the largest numbers of affected introns in the intron-retention category, and the fewest in the alternative 5’ splice site category.

Figure 3: Quantification of altered pre-mRNA splicing in mutants lacking orthologs of human spliceosomal proteins.


A. Number of introns altered in pre-mRNA splicing in mutants. Changes in splicing are plotted as a count of the number of introns with significant Δψ (p<0.05) values. See also Figure S1 and Data S2.

B. Binomial test for directionality of alternative 3’ splice site usage changes. Introns affected by each KO strain were analyzed to test for a bias towards positive or negative Δψ. KO name is reported followed by direction and p-value. -log10(p-value) is displayed and colored as indicated. The labels on the left indicate the mutant, whether the bias reflects a decrease or increase in the canonical splice site in the mutant with the p-value shown in parenthesis.

C. Binomial test for directionality of alternative 5’ splice site usage changes.

D. Binomial test for directionality of intron retention changes.

It appeared that many of the mutants were biased towards a negative Δψ for 3’ and 5’ splice site choice, indicating a decrease in the use of the canonical splice site in the mutant (and therefore an increase in the use of an alternative site). Likewise, for intron retention, several mutants appeared to be biased towards increasing intron retention, consistent with increased splicing defects (increased pre-mRNA vs. mRNA). To test the statistical significance of these apparent skews, we used the binomial distribution to model the null hypothesis. As can be seen in Figure 3B, nine deletion mutants displayed statistically significant bias towards decreased usage of the canonical site (and therefore increased use of an alternative site) for 3’ splice site usage. These correspond to strains lacking orthologs of human FAM32A, RBM5, RBM17, GPATCH1, FAM50A, NOSIP, IK, DHX35 and SAP18 (in humans RBM5 and RBM10 are paralogs; we refer to the Cryptococcal ortholog as RBM5 for simplicity). Curiously, a mutant lacking the ortholog of ZNF830, a human spliceosomal protein of unknown function, displayed a bias towards increased use of the canonical 3’ splice site. For alternative 5’ splice site usage, we observed a similar pattern, with cells lacking orthologs of GPATCH1, NOSIP, and DHX35 displaying a bias towards decreased use of the canonical 5’ splice site and increased use of an alternative 5’ splice site in the mutant (Figure 3C). Again, cells lacking ZNF830 displayed the opposite bias. Finally, five mutants displayed a bias towards an increase in intron retention in the mutant (DHX35, GPATCH1, RBM5, SAP18 and RBM17) suggesting a role in splicing efficiency for a subset of transcripts (Figure 3D). 
Unexpectedly, strains lacking orthologs of NOSIP, IK, FAM50A, and FRA10AC1, a human spliceosomal protein of unknown function, displayed reduced intron retention in the mutant, indicating that their absence results in increased splicing efficiency for a subset of transcripts (Figure 3D).
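The directionality test can be sketched as follows, assuming a two-sided exact binomial test against a null in which positive and negative Δψ values are equally likely (p = 0.5). The text specifies only that a binomial null was used, so the sidedness, the function name, and the toy counts are assumptions.

```python
from math import comb


def binomial_two_sided_p(successes, trials, p=0.5):
    """Exact two-sided binomial p-value: the total probability of all
    outcomes that are no more likely than the observed one."""
    pmf = [comb(trials, k) * p**k * (1 - p) ** (trials - k)
           for k in range(trials + 1)]
    observed = pmf[successes]
    # Small relative tolerance so floating-point ties count as equally likely.
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))


# Toy example: 8 of 10 affected introns show a negative delta-psi.
print(binomial_two_sided_p(8, 10))   # 0.109375
# A perfectly balanced split gives no evidence of bias.
print(binomial_two_sided_p(5, 10))   # 1.0
```

The direction of a bias would be read separately from whether positive or negative events form the majority. With only ten events, even an 8:2 split does not reach p < 0.05.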

Clustering of gene deletions based on splicing changes suggests some factors act together

To identify candidates for spliceosomal factors that might act together, we calculated correlations of splicing effects for each pair of factors. Specifically, we calculated vectors of log10 corrected p-values (produced by JUM’s linear model approach) for each of the ~40,000 C. neoformans introns, with nonsignificant p-values set to 1, and then calculated correlations for these vectors for each pair of mutants. Figure 4 displays these correlation matrices for three types of splicing events using a Pearson correlation as the distance metric. We observed that GPATCH1 and DHX35 were among the mutants that clustered together in all three matrices, suggesting they consistently impact overlapping intron sets. We also noticed that RBM5 and RBM17 tended to cluster together in the alternative 5’ splice site usage data and the intron retention data (Figure 4BC). Human GPATCH1 and DHX35 have both been identified in C complex spliceosomes assembled in vitro 19, while RBM5 and RBM17 have been found in the early A complex that includes U2 snRNP 20. Loss of the nuclear exosome factor Rrp6 produced a signature that tended to cluster adjacent to that of strains lacking the ortholog of CTNNBL1 (Figure 4AC), a core component of active human spliceosomes recently visualized by cryoEM 21, indicating an overlap between RNA species normally degraded by Rrp6 and those that accumulate in cells lacking CTNNBL1. Other mutants also showed some degree of clustering, suggesting functional/biochemical relationships.
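A minimal sketch of this signature comparison is given below. It builds one vector per mutant from corrected p-values, sets nonsignificant entries to 1 (so that their log10 is 0), and computes a Pearson correlation between two mutants. The hierarchical clustering step is omitted, and the function names, the floor on very small p-values, and the three toy mutants are assumptions made for illustration.

```python
from math import log10, sqrt


def signature(pvalues, alpha=0.05):
    """log10 of corrected p-values; entries above alpha are set to 1,
    whose log10 is 0."""
    # The 1e-300 floor (an assumption) avoids log10(0) for p-values reported as 0.
    return [log10(max(p, 1e-300)) if p <= alpha else 0.0 for p in pvalues]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when a mutant has no significant events
    return sxy / sqrt(sxx * syy)


# Toy corrected p-values for four introns in three hypothetical mutants.
mutant_a = signature([1e-4, 0.5, 1e-3, 0.2])
mutant_b = signature([1e-3, 0.9, 1e-2, 0.6])
mutant_c = signature([0.7, 1e-5, 0.3, 0.8])

print(pearson(mutant_a, mutant_b))  # close to 1: the same introns are affected
print(pearson(mutant_a, mutant_c))  # negative: different introns are affected
```

Because unaffected introns contribute zeros, the correlation is driven by which introns reach significance in both mutants and by how strongly.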

Figure 4: Correlation between phenotypic signatures of spliceosomal protein ortholog gene deletion mutants.


P-values (corrected for multiple hypothesis testing) for changes in splicing were treated as vectors and used to generate an autocorrelation matrix for each type of splicing event. P-values greater than 0.05 were set to 1. Data are organized by hierarchical clustering.

A. Mutant autocorrelation matrix based on significant alternative 3’ splice site changes

B. Mutant autocorrelation matrix based on significant alternative 5’ splice site changes

C. Mutant autocorrelation matrix based on significant intron retention changes

GPATCH1 and DHX35 as well as RBM5 and RBM17 associate in spliceosomes

The genetic data above, together with existing data on the association of the human orthologs, suggest that GPATCH1 and DHX35 might act together. This would require them to be present in the same spliceosomal complex(es). To test this hypothesis, we generated a FLAG-tagged allele of GPATCH1 and performed immunoprecipitation (IP) of an untagged strain and of the tagged strain under low and high-salt conditions (four IPs total). To quantify the proteins in the coimmunoprecipitated material we performed tandem mass tag (TMT) mass spectrometry analysis (Figure 5A). We then ranked proteins based on relative peptide counts/protein length for all identified proteins. Remarkably, after GPATCH1 itself, the next most abundant protein in the GPATCH1 IP was DHX35 (Figure 5B). Additional spliceosomal proteins were identified, including those characteristic of active C complex spliceosomes (Figure 5B), suggesting that, as in human cells, GPATCH1 and DHX35 associate with active spliceosomes in C. neoformans. A poorly annotated protein, WDR83, displayed higher normalized abundance than GPATCH1 (Figure 5B); the significance of this finding will require additional experimental work. The dataset can be found in Data S3.
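The ranking step can be sketched as a length normalization followed by a sort. The function name, protein labels, intensities, and lengths below are hypothetical. The actual TMT quantification, including how the untagged control and the two salt conditions are combined, is described in Methods and is not reproduced here.

```python
def rank_by_normalized_abundance(proteins):
    """Rank proteins by summed signal divided by protein length.

    `proteins` maps a name to (summed_signal, length_in_residues).
    Returns names ordered from most to least abundant.
    """
    scored = {name: signal / length
              for name, (signal, length) in proteins.items()}
    return sorted(scored, key=scored.get, reverse=True)


# Toy input: a long protein with a high raw signal can rank below a short one.
toy = {
    "protein_A": (9000.0, 300),    # 30 per residue
    "protein_B": (12000.0, 1200),  # 10 per residue
    "protein_C": (4000.0, 100),    # 40 per residue
}
print(rank_by_normalized_abundance(toy))
```

Dividing by length compensates for the fact that longer proteins yield more peptides, so that a long protein does not rank highly on size alone.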

Figure 5: Purifications and TMT-MS analysis of endogenously tagged human spliceosomal protein orthologs.


A. Schematic of sample preparation for TMT-MS. Shown are relative normalized abundances of the sum of the low- and high-salt peptide intensities of the spliceosomal protein orthologs.

B. IP-MS results for 2x-FLAG GPATCH1. Plotted are TMT-MS data with length-normalized peptide intensity (log10) on the Y-axis and rank on the X-axis. See also Data S3.

C. IP-MS results for 2x-FLAG RBM17. Plotted are TMT-MS data with length-normalized peptide intensity (log10) on the Y-axis and rank on the X-axis. See also Data S4.

D. IP-MS results for 2x-FLAG RBM5. Plotted are TMT-MS data with length-normalized peptide intensity (log10) on the Y-axis and rank on the X-axis. See also Data S5.

We also performed parallel IP experiments with RBM5 and RBM17 (eight additional purifications), as they also harbor a G-patch domain and displayed clustering in their functional signatures. The two proteins copurified with distinct sets of proteins. RBM5 associated with components of the U2 snRNP including DDX46 (S. cerevisiae (Sc.) Prp5), SF3A3 (Sc. Prp9) and SF3A2 (Sc. Prp11) along with SF3B complex proteins (Figure 5C), consistent with its association with A complex spliceosomes 20. RBM17 has been found to associate with U2SURP and CHERP in IP-MS studies from human cells 22. We found that purification of C. neoformans RBM17 identified U2SURP as the most abundant coimmunoprecipitating protein (Figure 5B), indicating evolutionary conservation of this association. We also identified peptides corresponding to RBM5 (Figure 5D), consistent with their clustering in the autocorrelation matrix based on the RNA-seq data described above. However, as RBM17 was not identified in the RBM5 purification, the degree to which and mechanism by which they might act together remain to be determined. Datasets for the RBM5 and RBM17 purifications can be found in Data S4 and S5. These data indicate that the clustering of factors based on their impact on splicing choice and efficiency can be informative.

Identification of intron features that correlate with sensitivity to dependence on specific factors

To investigate why some introns are sensitive to loss of the spliceosomal protein orthologs described above, we tested whether 5’ splice site strength, predicted branchpoint strength (see Methods), or 3’ splice site strength was distinct for introns affected in each of the mutants studied. These studies identified only weak or marginal effects. Next, we investigated intron geometry. We asked whether the intron length distributions of affected versus unaffected introns differed for a given mutant and splicing type. We performed the same analysis for the number of intronic nucleotides between the predicted branchpoint and the 3’ splice site. All mutants that impacted the splicing of introns skewed significantly towards affecting longer introns (Figure 6A; a clustered heatmap of corrected p-values is shown in the left panel and the top three mutants/splicing types are shown in cumulative density plots on the right). The impact was strongest for intron retention changes (Figure 6A). Differences in branchpoint-to-3’ splice site distance [both increases and decreases; denoted by (+) and (−)] were most notable for introns affected for intron retention in RBM17, CCDC12, FAM32A, and FAM50A mutants. FAM32A has been identified as a “metazoan-specific” alternative step 2 factor in human spliceosome cryoEM structures that promotes the splicing in vitro of an adenovirus substrate harboring a relatively short branchpoint-to-3’ splice site distance 23. The analysis described here suggests it also limits the splicing of longer introns as well as those with nonoptimal branchpoint-3’ splice sites in vivo.
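The comparison of affected versus unaffected intron lengths can be sketched with a rank-sum test. The version below is a simplified illustration. It uses the normal approximation, assigns midranks to ties without a tie correction, and omits the multiple-testing correction applied in the paper. The function name and the intron lengths are invented toy values.

```python
from math import erfc, sqrt


def rank_sum_test(affected, unaffected):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney) test, normal approximation.

    Returns (direction, p_value); direction is "+" when affected values
    tend to be larger and "-" otherwise.
    """
    pooled = sorted([(v, 0) for v in affected] + [(v, 1) for v in unaffected])
    n1, n2 = len(affected), len(unaffected)
    rank_sum_affected = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of 1-based ranks i+1 .. j
        rank_sum_affected += midrank * sum(
            1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_affected - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return ("+" if z > 0 else "-"), p_value


# Toy intron lengths (nt): affected introns are longer than unaffected ones.
affected = [120, 150, 180, 210, 260, 300]
unaffected = [45, 50, 52, 55, 58, 60, 62, 70]
print(rank_sum_test(affected, unaffected))
```

The returned sign corresponds to the (+) or (−) annotation used in the heatmaps, and a rank-based test is a natural choice here because intron length distributions are strongly skewed.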

Figure 6: Enrichment of divergent geometry parameters of introns whose splicing is altered in human spliceosomal ortholog mutants.


A. Enrichment of altered lengths in affected introns. Shown in the heatmap is the negative log of the p-value produced by a corrected Wilcoxon Rank Sum Test. Indicated by a (+) or (−) sign is the direction of effect. Shown on the right are CDF plots and statistical test results for three gene deletion mutants/splicing change combinations.

B. Enrichment of altered predicted branchpoint-to-3’ splice site distances. Analysis was performed as in A. Branchpoint-to-3’ splice site distances were predicted by using C. neoformans branchpoint consensus to predict branchpoints computationally.

Mutants result in activation of weak alternative 5’ and/or 3’ splice sites

As many mutants that we examined were found to trigger reduced use of the canonical 5’ or 3’ splice site and a shift toward an alternative 5’ or 3’ splice site, we asked whether the corresponding splice site sequence differed between the canonical and alternative sites. To accomplish this, we examined the frequency of each of the four bases at the first six and last six positions of each intron for the canonical versus alternative 5’ or 3’ splice site. We tested whether the nucleotide biases of the canonical versus alternative site were significantly different at a given position for a given gene deletion using a corrected Chi-squared test. Plotted in Figure 7A are the results for the first six nucleotides of the intron for the cases of alternative 5’ splice site usage. We observed significant differences at many nucleotides depending on the mutant, particularly at positions 4–6 of the 5’ splice site, which normally base-pair with U6 snRNA in the spliceosome (Figure 7A). For introns displaying alternative 3’ splice site usage in the mutants, we observed significant deviation between the canonical and alternative site primarily at position −3, which is typically a pyrimidine. We next generated sequence logo plots of the canonical and alternative sites. Shown in Figure 7CD are those for the introns displaying decreased use of the canonical site for mutants in the three G-patch proteins analyzed above as well as DHX35. The alternative splice site is consistently considerably weaker than the canonical site and often lacks conservation at key intronic positions (e.g., positions 5 and 6 of the 5’ splice site or −3 of the 3’ splice site). We observed similar patterns in mutants of other factors. We conclude that the spliceosomal proteins investigated here display a functional bias towards limiting the use of weak/alternative sites.
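The per-position comparison can be sketched as a 2 x 4 contingency test on base counts at one intron position. The code below computes only the Pearson chi-squared statistic and its degrees of freedom. The p-value and the multiple-testing correction are omitted, and the function names and sequences are invented toy inputs that are far too few for the test to be meaningful.

```python
def base_counts(sequences, position):
    """Count A, C, G and T at a 0-based position across sequences."""
    counts = {base: 0 for base in "ACGT"}
    for seq in sequences:
        counts[seq[position]] += 1
    return counts


def chi_squared(canonical, alternative):
    """Pearson chi-squared statistic and degrees of freedom for a
    2 x 4 table of base counts (bases absent from both rows are dropped)."""
    bases = [b for b in "ACGT" if canonical[b] + alternative[b] > 0]
    n_can, n_alt = sum(canonical.values()), sum(alternative.values())
    total = n_can + n_alt
    stat = 0.0
    for b in bases:
        column = canonical[b] + alternative[b]
        for observed, row in ((canonical[b], n_can), (alternative[b], n_alt)):
            expected = row * column / total
            stat += (observed - expected) ** 2 / expected
    return stat, len(bases) - 1


# Toy first-six-nucleotide windows of canonical and alternative 5' splice sites.
canonical_sites = ["GTAAGT", "GTGAGT", "GTAAGT", "GTAAGT"]
alternative_sites = ["GTACCA", "GTTTGA", "GTGACT", "GTCAAC"]

# Intron position 5 is index 4 with 0-based indexing.
stat, dof = chi_squared(base_counts(canonical_sites, 4),
                        base_counts(alternative_sites, 4))
print(stat, dof)
```

In this toy input the canonical sites are invariant G at position 5 while the alternative sites are mixed, which is the kind of difference the test is designed to detect.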

Figure 7: Activation of weak/cryptic alternative 5’ and 3’ splice sites in human spliceosomal ortholog mutants.


A. 5’ splice site bases showing significant differences in composition between canonical and alternative sites. Chi-squared analysis of the first six nucleotides of introns showing significantly decreased Δψ for alternative 5’ splice sites in the mutant. Plotted is the negative log10 p value. Strains were clustered by similarity.

B. 3’ splice site bases showing significant differences in composition between canonical and alternative sites

DISCUSSION

Our work defines a large group of spliceosomal proteins conserved between fungi and humans that enable the splicing of divergent introns while promoting fidelity. Most had not been investigated functionally in vivo. These factors are not essential for splicing per se as they were lost in large numbers during the evolution of intron-reduced species, yet they have been conserved at least since the evolutionary divergence of fungi and humans several hundred million years ago. In the cases investigated by immunoprecipitation and mass spectrometry, factors display biochemical interactions in C. neoformans that are similar to those of their human orthologs, suggesting conserved functional roles.

Massive evolutionary loss of spliceosomal proteins in the Saccharomycotina

Prior experimental work has shown that S. cerevisiae spliceosomes are not very tolerant of mutations of intronic sequences away from consensus 24, with kinetic proofreading by ATPases limiting the splicing of mutant pre-mRNAs via discard and disassembly of substrates with kinetic defects during the catalytic stages of splicing 25. How the spliceosomes of organisms tolerate diversity in intron splicing signals and geometries is not understood. We reasoned that spliceosomal proteins whose genes were lost during evolution of organisms undergoing intron loss/homogenization might correspond to factors and processes that promote the use of divergent introns. Our analysis suggests orthologs of about a third of human spliceosomal proteins cannot be identified in S. cerevisiae. However, most are maintained in other fungal lineages. We focused our attention on C. neoformans. Our analysis revealed 45 genes in C. neoformans that encode orthologs of human spliceosomal proteins that do not appear in the S. cerevisiae genome. Of these, we identified 13 for which deletion alleles had been generated as part of a gene deletion effort. We also included the helicase DHX35 in this analysis as it is found in C. neoformans spliceosomes but not in those of S. cerevisiae 15. The human orthologs of the encoded proteins studied here associate with spliceosomes at stages ranging from early complexes such as the A complex to late catalytic/postcatalytic complexes 12. Three of the proteins investigated here harbor a G-patch motif, which is found in proteins that activate superfamily 2 helicases including two involved in splicing in yeast 26,27.

GPATCH1 and DHX35 act together on active spliceosomes

Mutations in each of the 14 human spliceosomal protein orthologs examined altered both splicing efficiency and choice. Clustering of the data demonstrated that mutants lacking orthologs of GPATCH1 and DHX35 consistently clustered together for multiple types of splicing changes. Affinity purification of a FLAG-tagged allele of GPATCH1 identified DHX35 as a top hit. Given that G-patch proteins are known activators of helicases, it seems likely that GPATCH1 functions to activate DHX35 in the spliceosome, although further biochemical work will be necessary. What the substrate of a GPATCH1/DHX35 complex might be is unclear. Based on the nature of the changes in splice site choice (see below), it may serve a role reminiscent of those of Prp16 and Prp22 in proofreading 25. We note that, in human cells, GPATCH1 and DHX35 are found in catalytically active spliceosomal complexes 19, and the mass spectrometry data in Cryptococcus presented here and elsewhere 15 indicate that this pattern of association is conserved in fungi.

Accessory factors impact the processing of genes with divergent geometries

The mRNA-to-precursor ratio is a measurement of splicing efficiency 28,29. A subset of mutants analyzed here display a bias towards increased (versus decreased) intron retention, indicating that splicing of specific substrates tends to become less efficient when these factors are mutated, reminiscent of classic pre-mRNA splicing mutants. These include mutants lacking orthologs of GPATCH1 and DHX35 as well as mutants in RBM5, SAP18, and RBM17. Unexpectedly, four mutants tested show the opposite bias (a bias towards improved splicing efficiency when the factor is absent): NOSIP, IK, FAM50A, and FRA10AC1. The effects of accessory factors on splicing efficiency correlate with distinctive features of substrates, notably longer intron size and nonoptimal predicted branchpoint-to-3’ splice site distance. The effects of many factors studied here are biased rather than purely unidirectional. For example, while knockout of the ortholog of human GPATCH1 is strongly biased towards causing reduced use of canonical 5’ and 3’ splice sites in favor of poor alternative sites, in a minority of cases, the opposite effect is observed. This may reflect a combination of direct and indirect effects (such as competition of ‘hungry’ spliceosomes for introns 30,31) or context-dependent roles influenced by complex differences in intron structure and sequence.

Accessory factors promote spliceosome fidelity

Nine factors analyzed here display functional signatures that are biased towards the suppression of the use of nearby weak/cryptic 5’ splice sites, while four factors are biased towards suppression of nearby, weak 3’ splice sites. Orthologs of GPATCH1 and DHX35 are notable in that they display this function for both 5’ and 3’ sites. This phenotype further suggests that these factors may act in a manner akin to the S. cerevisiae fidelity factors. An additional layer of proofreading might be necessary in organisms whose spliceosomes need to accommodate more variable intron consensus sequences, as such flexible spliceosomes are likely to be more error-prone. Other factors, such as the G-patch proteins RBM5 and RBM17, may have similar roles in earlier spliceosomal complexes.

STAR METHODS

RESOURCE AVAILABILITY:

Lead contact

Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Hiten Madhani (hitenmadhani@gmail.com).

Material availability

C. neoformans strains are available without restriction from the lead contact.

Data and code availability

RNA-seq data are publicly available at the NCBI GEO database (GSE168814). This paper does not report original code.

Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

EXPERIMENTAL MODEL AND SUBJECT DETAILS

All experiments were performed in Cryptococcus neoformans in the KN99 strain background. Strains were cultivated in YPAD media (Difco) at 30°C.

METHOD DETAILS

Spliceosomal protein searches

Spliceosomal protein searches were performed on proteome assemblies available from NCBI and UniProt (see Table S2). A curated list of relevant human spliceosomal proteins was used as queries in local BLASTp (version 2.9.0+) searches against independent fungal proteome databases (initial e-value threshold of 10⁻⁶) 32. The results from the BLAST searches were further screened by analyzing domain content (HMMsearch, HMMer 3.1b2, default parameters), size comparisons against human protein sequence length (within 25% variation), and reciprocal best-hit BLAST searches (RBH) to the query proteome 33–35. To avoid bias in protein domain content, domains used for HMM searches were defined as described 36. Briefly, a conserved set of domains for each spliceosomal protein was assembled by using only those domains present in all three of the human, yeast, and Arabidopsis orthologs. Fungal ortholog candidates in this study were scored and awarded a confidence value of 0–9 based on passing the above criteria. Scores were calculated by starting at 9 and penalizing candidates for falling outside of the expected size range (−1 point), missing HMM domain calls (−2 points), and failing to strictly pass RBH (−5 points). A score of 0.5 was given to candidates that failed all criteria but still had BLAST hits after the initial human query, to separate them from those that had no BLAST hits. See Data S1.
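The scoring scheme described above can be sketched as a small Python function. This is an illustrative reconstruction, not the code used in the study: the function and argument names are hypothetical, and the reading of "within 25% variation" as a relative difference from the human protein length is an assumption.

```python
def within_size_range(candidate_len, human_len, tolerance=0.25):
    """Assumed reading of 'within 25% variation': relative to the human length."""
    return abs(candidate_len - human_len) <= tolerance * human_len


def ortholog_confidence(in_size_range, has_hmm_domains, passes_rbh, has_blast_hit=True):
    """Return a confidence value for one fungal candidate.

    Returns None when the initial human query produced no BLAST hit,
    0.5 when a hit exists but every criterion failed, and otherwise
    9 minus the penalties listed in the text.
    """
    if not has_blast_hit:
        return None
    if not (in_size_range or has_hmm_domains or passes_rbh):
        return 0.5
    score = 9
    if not in_size_range:
        score -= 1  # outside the expected size range
    if not has_hmm_domains:
        score -= 2  # missing HMM domain calls
    if not passes_rbh:
        score -= 5  # fails strict reciprocal best hit
    return score
```

Under this sketch a candidate that passes every criterion scores 9, and one that passes only the size check scores 2.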

C. neoformans cultivation

Two-liter liquid cultures of all strains were grown in YPAD medium (Difco) by inoculation at low density (0.002–0.004 OD600 nm) followed by overnight growth with shaking at 30°C. For RNA-seq experiments, cells were harvested at OD600 of ~1. For TMT-MS experiments, an additional 1% glucose was added when the cultures reached OD600 of 1. Cells were harvested at OD600 of 2.

Immunoprecipitation and TMT-MS

Strains harboring a C-terminal (GPATCH1 and RBM17) or an N-terminal (RBM5) CBP-2XFLAG tag were generated by homologous replacement. Immunoprecipitations were performed exactly as described (Burke et al., 2018) with the following modifications: lysis and wash buffers were adjusted to either 150 mM NaCl (low salt) or 300 mM NaCl (high salt). Two untagged samples and two tagged samples (one at each salt concentration) were produced. The four samples were then subjected to TMT-MS exactly as described (Burke et al., 2018). See Data S3 and Data S4.

RNA preparation

Polyadenylated RNA was prepared exactly as described 15.

RNA-seq

RNA-seq libraries were prepared using the NEBNext Ultra Directional RNA Library Prep Kit for Illumina. Samples were sequenced using an Illumina HiSeq 4000 instrument. Paired-end 100 nt reads were obtained. Data are available at the NCBI GEO database: GSE168814.

RNA-seq data analysis

All reads were analyzed using FastQC, and reads with more than 80% of quality scores below 25 were discarded. Reads were aligned to the C. neoformans H99 genome sequence (NCBI ID: GCA_00149245.3) using STAR 37. A minimum of 12M reads per strain per replicate was obtained (Table S3). Differentially spliced introns were called using JUM (version 2.0.2). For differential events with a p-value greater than 0.05, the p-value was set to 1. An additional 5-read minimum was imposed. To further minimize false positives, differential splicing events called by JUM that did not have an isoform with a start and end corresponding to an annotated intron were also removed. Spot-checking of differential events was accomplished by manual browsing of the data. Alternative 3’ and 5’ splicing events containing more than two alternative endpoints were also removed. In a few cases, JUM called introns as significantly alternatively spliced with a Δψ of 0; those introns were also removed. Next, each intron was classified as increased or decreased and proximal or distal based on the observed canonical endpoint and associated Δψ. See Data S2.
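The post-JUM filtering steps listed above can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the record layout (a dict with the keys `p_value`, `reads`, `kind`, `isoforms`, and `dpsi`), the event labels `A5SS`/`A3SS`, and the treatment of "more than two alternative endpoints" as "more than two isoforms" are all assumptions made for the example and do not reflect the actual JUM output format.

```python
def filter_events(events, annotated_introns, alpha=0.05, min_reads=5):
    """Apply the described filters to a list of hypothetical event records.

    events: list of dicts with keys p_value, reads, kind, isoforms, dpsi,
            where isoforms is a list of (chrom, start, end) tuples.
    annotated_introns: set of (chrom, start, end) tuples.
    """
    kept = []
    for event in events:
        event = dict(event)  # work on a copy so the input stays untouched
        # Non-significant events have their p-value set to 1.
        if event["p_value"] > alpha:
            event["p_value"] = 1.0
        # Impose the read minimum.
        if event["reads"] < min_reads:
            continue
        # Require an isoform whose start and end match an annotated intron.
        if not any(iso in annotated_introns for iso in event["isoforms"]):
            continue
        # Drop alternative 5'/3' events with more than two endpoints.
        if event["kind"] in ("A5SS", "A3SS") and len(event["isoforms"]) > 2:
            continue
        # Drop events called significant despite a delta-psi of 0.
        if event["p_value"] <= alpha and event["dpsi"] == 0:
            continue
        kept.append(event)
    return kept
```

Keeping non-significant events with a p-value of 1, rather than dropping them, leaves every strain with a value for each intron, which is convenient for the strain-by-strain comparisons described later.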

QUANTIFICATION and STATISTICAL ANALYSIS

General data analysis, plotting and statistical testing were performed using Python and the SciPy stack as follows:

Binomial tests:

Introns were grouped by splicing event and strain. Within each strain, a binomial test (scipy.stats.binom_test) was conducted to determine whether there were significantly more or fewer introns with increased splicing.
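As an illustration of the test described above, the following standard-library sketch computes an exact two-sided binomial p-value for the number of introns with increased splicing among all affected introns in a strain. The null probability of 0.5 is an assumption (the text does not state it), and the function name is hypothetical. The study itself used scipy.stats.binom_test; newer SciPy releases provide scipy.stats.binomtest in its place.

```python
from math import comb


def binom_two_sided(k, n, p=0.5):
    """Exact two-sided binomial p-value for k successes in n trials.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one under the null probability p (assumed 0.5 here).
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= threshold))


# Hypothetical example: 9 of 10 affected introns show increased splicing.
p_value = binom_two_sided(9, 10)
```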

Comparisons of distributions:

Introns were grouped by strain and condition, and each subset was compared to unaffected introns. The resulting two distributions were compared for each attribute. A Wilcoxon rank-sum test (scipy.stats.ranksums) was conducted to determine whether the two distributions differ significantly in location. A Kolmogorov–Smirnov test (scipy.stats.kstest) was conducted to compare the distributions themselves. All results were multiple-test corrected using the FDR correction (statsmodels.stats.multitest.fdrcorrection).
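The multiple-test correction step can be illustrated with a standard-library sketch of the Benjamini-Hochberg procedure, which is the default method of statsmodels.stats.multitest.fdrcorrection. Whether the default was used in the study is an assumption, and the function name below is hypothetical.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from four attribute comparisons.
adjusted = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

A comparison is then reported as significant when its adjusted value falls below the chosen FDR threshold.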

Chi-squared analysis (canonical vs alternative sites):

Introns were separated by strain and condition, and the first and last six nucleotides of the canonical sequence of affected introns were compared to those of the non-canonical sequence of affected introns. Each position in the endpoints was treated as an independent Fisher exact test (FisherExact.fisherexact) or chi-square test (scipy.stats.chisquare) with (4–1)*(2–1) = 3 degrees of freedom, performed on a contingency table with nucleotides in the rows and affected vs. unaffected introns as the columns. In cases where fewer than 5 counts were observed in a category, a chi-squared test becomes inappropriate, and the Fisher exact test was used.
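The per-position contingency table and the choice between the two tests can be sketched as below. This is an illustrative standard-library version under stated assumptions: the function names are hypothetical, sequences are assumed to contain only A, C, G, and T, and only the chi-square statistic and degrees of freedom are computed (the p-value lookup and the Fisher exact test itself are left to the libraries named in the text).

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def position_table(seqs_a, seqs_b, pos):
    """Build a 4x2 table of nucleotide counts at one endpoint position."""
    counts_a = Counter(s[pos] for s in seqs_a)
    counts_b = Counter(s[pos] for s in seqs_b)
    return [[counts_a[n], counts_b[n]] for n in NUCLEOTIDES]


def choose_test(table, min_count=5):
    """Use Fisher's exact test when any cell has fewer than min_count counts."""
    small = any(cell < min_count for row in table for cell in row)
    return "fisher" if small else "chi2"


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    Intended for tables that passed choose_test, so no expected count is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof
```

For a 4x2 table the degrees of freedom come out as (4-1)*(2-1) = 3, matching the value given in the text.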

P-value correlations:

Treating each strain as a vector of p-values of affected introns, a Pearson correlation matrix was computed (pandas.DataFrame.corr).
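The correlation step can be sketched without pandas as follows. The strain names and p-values in the example are hypothetical, the function names are illustrative, and the sketch assumes every strain vector covers the same introns in the same order and is not constant.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(pvalues_by_strain):
    """Pairwise Pearson correlations between per-strain p-value vectors."""
    strains = sorted(pvalues_by_strain)
    return {
        s: {t: pearson(pvalues_by_strain[s], pvalues_by_strain[t]) for t in strains}
        for s in strains
    }


# Hypothetical p-values for three introns in three strains.
example = {
    "strainA": [0.01, 0.50, 1.00],
    "strainB": [0.02, 0.40, 1.00],
    "strainC": [1.00, 0.30, 0.01],
}
matrix = correlation_matrix(example)
```

Strains whose p-value vectors correlate strongly affect similar sets of introns, which is the basis for grouping factors into functional modules.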

Seqlogos:

Affected introns are grouped by splice type, condition (increased or decreased), and strain. Seqlogos are generated from the first and last six nucleotides (seqlogo.seqlogo).
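The input to a sequence logo is a per-position nucleotide frequency matrix. The sketch below shows how such matrices could be derived from the first and last six nucleotides of a group of introns. It uses only the standard library and does not call the seqlogo package; the function name is hypothetical, and sequences are assumed to be at least six nucleotides long and to contain only A, C, G, and T.

```python
from collections import Counter


def endpoint_frequencies(intron_seqs, n=6, alphabet="ACGT"):
    """Return per-position frequencies for the first and last n nucleotides."""

    def frequencies(windows):
        matrix = []
        for pos in range(n):
            counts = Counter(w[pos] for w in windows)
            total = sum(counts[a] for a in alphabet)
            matrix.append({a: counts[a] / total for a in alphabet})
        return matrix

    five_prime = frequencies([s[:n] for s in intron_seqs])
    three_prime = frequencies([s[-n:] for s in intron_seqs])
    return five_prime, three_prime


# Hypothetical intron sequences for one strain, splice type, and condition.
introns = ["GTAAGTCCCCTTTCAG", "GTGAGTAAAATTGTAG"]
five_prime, three_prime = endpoint_frequencies(introns)
```

Each returned matrix has one row per position and can be passed to a logo-drawing tool after conversion to that tool's expected input format.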

Supplementary Material

Document S1. Figure S1

Data S1: Core human spliceosomal proteins and fungal orthologs. Related to Figure 1 and STAR Methods.

Data S2: Output of Junction Utilization Package (JUM) Analysis of 14 Mutant Strains. Related to Figure 3 and STAR Methods.

Data S3: Full TMT-MS Data for GPATCH1 immunopurifications. Related to Figure 5 and STAR Methods.

Data S4: Full TMT-MS Data for RBM17 immunopurifications. Related to Figure 5 and STAR Methods.

Data S5: Full TMT-MS Data for RBM5 immunopurifications. Related to Figure 5 and STAR Methods.

Table S1: DESeq2 output of RNA-seq data. Related to Figure 2.

Table S2: Proteomes used for Evolutionary Analysis. Related to Figure 1.

Table S3: RNA-seq statistics. Related to Figure 3.

Key Resources Table

REAGENT or RESOURCE | SOURCE | IDENTIFIER
Antibodies
Anti-FLAG M2 affinity gel | Sigma-Aldrich | A2220
Chemicals, Peptides, and Recombinant Proteins
3X FLAG peptide | Sigma-Aldrich | F4799
T4 RNA Ligase 2, truncated K227Q | New England Biolabs | M0351S
TURBO DNA-free Kit | Thermo Fisher Scientific | AM1907
T4 PNK | New England Biolabs | M0201S
Superscript III First-strand Synthesis System | Thermo Fisher Scientific | 18080051
RNaseOUT Recombinant Ribonuclease Inhibitor | Thermo Fisher Scientific | 10777109
Proteinase K | Sigma-Aldrich | P6556
Pierce Protease Inhibitor Tablets, EDTA-free | Thermo Fisher Scientific | 88266
Critical Commercial Assays
PolyATtract® mRNA Isolation System | Promega | Z5300
NEBNext Ultra Directional RNA Library Prep Kit for Illumina | New England Biolabs | E7420S
RNA 6000 Pico Kit | Agilent | 5067–1513
High Sensitivity DNA Kit | Agilent | 5067–4626
Deposited Data
RNA-seq data | | GSE168814
Experimental Models: Organisms/Strains
C. neoformans: GPATCH1-CBP-2XFLAG | This Study | CM2047
C. neoformans: RBM17-CBP-2XFLAG | This Study | CM2046
C. neoformans: RBM5-CBP-2XFLAG | This Study | CM2048
C. neoformans: untagged | Madhani Laboratory | CM025
C. neoformans: cnag_05579Δ::NEO | Madhani Laboratory | CK5778
C. neoformans: cnag_00761Δ::NEO | Madhani Laboratory | CK3285
C. neoformans: cnag_00294Δ::NEO | Madhani Laboratory | CK1844
C. neoformans: cnag_03665Δ::NEO | Madhani Laboratory | CK2692
C. neoformans: cnag_02401Δ::NEO | Madhani Laboratory | CK1446
C. neoformans: cnag_04679Δ::NEO | Madhani Laboratory | CK5311
C. neoformans: cnag_02260Δ::NEO | Madhani Laboratory | CK4072
C. neoformans: cnag_05845Δ::NEO | Madhani Laboratory | CK5920
C. neoformans: cnag_06616Δ::NEO | Madhani Laboratory | CK954
C. neoformans: cnag_05030Δ::NEO | Madhani Laboratory | CK5488
C. neoformans: cnag_01058Δ::NEO | Madhani Laboratory | CK3448
C. neoformans: cnag_02340Δ::NEO | Madhani Laboratory | CK4114
C. neoformans: cnag_05307Δ::NEO | Madhani Laboratory | CK5607
C. neoformans: cnag_02773Δ::NEO | Madhani Laboratory | CM
C. neoformans: cnag_03031Δ::NEO | Madhani Laboratory | CK4458
Software and Algorithms
Samtools | 38 | http://samtools.sourceforge.net/
SciPy | 39 | https://github.com/scipy/scipy
Cutadapt | N/A | https://github.com/marcelm/cutadapt
FastX toolkit | N/A | http://hannonlab.cshl.edu/fastx_toolkit
HTSeq | 40 | https://pypi.python.org/pypi/HTSeq
BEDTools | N/A | https://github.com/arq5x/bedtools2
Integrative Genomics Viewer | 41 | http://software.broadinstitute.org/software/igv/
BAS: Bayesian Variable Selection and Model Averaging using Bayesian Adaptive Sampling (R) | 42 | https://cran.r-project.org/web/packages/BAS/index.html
Pandas (version 1.1.1) | 43 | https://zenodo.org/record/4067057#.X7RG3NNKi34
SP Pipeline | 15 | https://github.com/jeburke/SPTools/
IP2 | N/A | http://www.integratedproteomics.com
RawConverter | 44 | http://fields.scripps.edu/rawconv/
ProLucid | 45 | http://fields.scripps.edu/downloads.php
DTASelect | 46 | http://fields.scripps.edu/yates/wp/
Census 2 | 47 | http://fields.scripps.edu/census/
STAR | 37 | https://github.com/alexdobin/STAR
Matplotlib (version 3.3.3) | 48 | https://matplotlib.org/
Numpy | 49 | https://numpy.org
Seqlogo | 50 | https://github.com/betteridiot/seqlogo
Statsmodels (version 0.13.0) | 51 | https://www.statsmodels.org/dev/index.html
Statannot | 52 | https://github.com/webermarcolivier/statannot
Pybedtools | 53 | https://daler.github.io/pybedtools/
JUM (version 2.0.2) | 18 | https://github.com/qqwang-berkeley/JUM

Highlights.

  • Phylogenetic analysis reveals ancestral spliceosome was complex

  • Human spliceosomal protein orthologs lost in S. cerevisiae found in C. neoformans

  • Functional analysis in C. neoformans demonstrates roles in splicing fidelity

  • Proteomic and genetic analysis reveal functional modules

ACKNOWLEDGEMENTS

This work was supported by R01 GM71801 and R01 AI00272 to H.D.M. B.A.B. and S.W.R. are supported by NSF Award 1751372 to S.W.R. I.B. is supported by a Swiss National Foundation Fellowship (191929). We thank Qingqing Wang for assistance with installation and usage of JUM scripts.

Footnotes

DECLARATION OF INTERESTS

The authors declare no competing interests.


REFERENCES

  • 1. Wilkinson ME, Charenton C, and Nagai K. (2020). RNA Splicing by the Spliceosome. Annu Rev Biochem 89, 359–388. 10.1146/annurev-biochem-091719-064225.
  • 2. Wahl MC, and Luhrmann R. (2015). SnapShot: Spliceosome Dynamics II. Cell 162, 456–456 e451. 10.1016/j.cell.2015.06.061.
  • 3. Wahl MC, and Luhrmann R. (2015). SnapShot: Spliceosome Dynamics I. Cell 161, 1474-e1471. 10.1016/j.cell.2015.05.050.
  • 4. Chen W, Shulha HP, Ashar-Patel A, Yan J, Green KM, Query CC, Rhind N, Weng Z, and Moore MJ (2014). Endogenous U2.U5.U6 snRNA complexes in S. pombe are intron lariat spliceosomes. RNA 20, 308–320. 10.1261/rna.040980.113.
  • 5. Cipakova I, Jurcik M, Rubintova V, Borbova M, Mikolaskova B, Jurcik J, Bellova J, Barath P, Gregan J, and Cipak L. (2019). Identification of proteins associated with splicing factors Ntr1, Ntr2, Brr2 and Gpl1 in the fission yeast Schizosaccharomyces pombe. Cell Cycle 18, 1532–1536. 10.1080/15384101.2019.1632126.
  • 6. McDonald WH, Ohi R, Smelkova N, Frendewey D, and Gould KL (1999). Myb-related fission yeast cdc5p is a component of a 40S snRNP-containing complex and is essential for pre-mRNA splicing. Mol Cell Biol 19, 5352–5362. 10.1128/mcb.19.8.5352.
  • 7. Irimia M, Penny D, and Roy SW (2007). Coevolution of genomic intron number and splice sites. Trends Genet 23, 321–325. 10.1016/j.tig.2007.04.001.
  • 8. Grate L, and Ares M Jr. (2002). Searching yeast intron data at Ares lab Web site. Methods Enzymol 350, 380–392. 10.1016/s0076-6879(02)50975-7.
  • 9. Janbon G, Ormerod KL, Paulet D, Byrnes EJ 3rd, Yadav V, Chatterjee G, Mullapudi N, Hon CC, Billmyre RB, Brunel F, et al. (2014). Analysis of the genome and transcriptome of Cryptococcus neoformans var. grubii reveals complex RNA expression and microevolution leading to virulence attenuation. PLoS Genet 10, e1004261. 10.1371/journal.pgen.1004261.
  • 10. Loftus BJ, Fung E, Roncaglia P, Rowley D, Amedeo P, Bruno D, Vamathevan J, Miranda M, Anderson IJ, Fraser JA, et al. (2005). The genome of the basidiomycetous yeast and human pathogen Cryptococcus neoformans. Science 307, 1321–1324. 10.1126/science.1103773.
  • 11. Piovesan A, Antonaros F, Vitale L, Strippoli P, Pelleri MC, and Caracausi M. (2019). Human protein-coding genes and gene feature statistics in 2019. BMC Res Notes 12, 315. 10.1186/s13104-019-4343-8.
  • 12. Cvitkovic I, and Jurica MS (2013). Spliceosome database: a tool for tracking components of the spliceosome. Nucleic Acids Res 41, D132–141. 10.1093/nar/gks999.
  • 13. Irimia M, and Roy SW (2008). Evolutionary convergence on highly-conserved 3’ intron structures in intron-poor eukaryotes and insights into the ancestral eukaryotic genome. PLoS Genet 4, e1000148. 10.1371/journal.pgen.1000148.
  • 14. Colley A, Beggs JD, Tollervey D, and Lafontaine DL (2000). Dhr1p, a putative DEAH-box RNA helicase, is associated with the box C+D snoRNP U3. Mol Cell Biol 20, 7238–7246. 10.1128/mcb.20.19.7238-7246.2000.
  • 15. Burke JE, Longhurst AD, Merkurjev D, Sales-Lee J, Rao B, Moresco JJ, Yates JR 3rd, Li JJ, and Madhani HD (2018). Spliceosome Profiling Visualizes Operations of a Dynamic RNP at Nucleotide Resolution. Cell 173, 1014–1030 e1017. 10.1016/j.cell.2018.03.020.
  • 16. Love MI, Huber W, and Anders S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15, 550. 10.1186/s13059-014-0550-8.
  • 17. Kim Y, Hur SW, Jeong BC, Oh SH, Hwang YC, Kim SH, and Koh JT (2018). The Fam50a positively regulates ameloblast differentiation via interacting with Runx2. J Cell Physiol 233, 1512–1522. 10.1002/jcp.26038.
  • 18. Wang Q, and Rio DC (2018). JUM is a computational method for comprehensive annotation-free analysis of alternative pre-mRNA splicing patterns. Proc Natl Acad Sci U S A 115, E8181–E8190. 10.1073/pnas.1806018115.
  • 19. Ilagan JO, Chalkley RJ, Burlingame AL, and Jurica MS (2013). Rearrangements within human spliceosomes captured after exon ligation. RNA 19, 400–412. 10.1261/rna.034223.112.
  • 20. Hartmuth K, Urlaub H, Vornlocher HP, Will CL, Gentzel M, Wilm M, and Luhrmann R. (2002). Protein composition of human prespliceosomes isolated by a tobramycin affinity-selection method. Proc Natl Acad Sci U S A 99, 16719–16724. 10.1073/pnas.262483899.
  • 21. Townsend C, Leelaram MN, Agafonov DE, Dybkov O, Will CL, Bertram K, Urlaub H, Kastner B, Stark H, and Luhrmann R. (2020). Mechanism of protein-guided folding of the active site U2/U6 RNA during spliceosome activation. Science 370. 10.1126/science.abc3753.
  • 22. De Maio A, Yalamanchili HK, Adamski CJ, Gennarino VA, Liu Z, Qin J, Jung SY, Richman R, Orr H, and Zoghbi HY (2018). RBM17 Interacts with U2 SURP and CHERP to Regulate Expression and Splicing of RNA-Processing Proteins. Cell Rep 25, 726–736 e727. 10.1016/j.celrep.2018.09.041.
  • 23. Fica SM, Oubridge C, Wilkinson ME, Newman AJ, and Nagai K. (2019). A human postcatalytic spliceosome structure reveals essential roles of metazoan factors for exon ligation. Science 363, 710–714. 10.1126/science.aaw5569.
  • 24. Lesser CF, and Guthrie C. (1993). Mutational analysis of pre-mRNA splicing in Saccharomyces cerevisiae using a sensitive new reporter gene, CUP1. Genetics 133, 851–863.
  • 25. Koodathingal P, and Staley JP (2013). Splicing fidelity: DEAD/H-box ATPases as molecular clocks. RNA Biol 10, 1073–1079. 10.4161/rna.25245.
  • 26. Robert-Paganin J, Rety S, and Leulliot N. (2015). Regulation of DEAH/RHA helicases by G-patch proteins. Biomed Res Int 2015, 931857. 10.1155/2015/931857.
  • 27. Studer MK, Ivanovic L, Weber ME, Marti S, and Jonas S. (2020). Structural basis for DEAH-helicase activation by G-patch proteins. Proc Natl Acad Sci U S A 117, 7159–7170. 10.1073/pnas.1913880117.
  • 28. Rymond BC, Pikielny C, Seraphin B, Legrain P, and Rosbash M. (1990). Measurement and analysis of yeast pre-mRNA sequence contribution to splicing efficiency. Methods Enzymol 181, 122–147. 10.1016/0076-6879(90)81116-c.
  • 29. Pikielny CW, and Rosbash M. (1985). mRNA splicing efficiency in yeast and the contribution of nonconserved sequences. Cell 41, 119–126. 10.1016/0092-8674(85)90066-2.
  • 30. Talkish J, Igel H, Perriman RJ, Shiue L, Katzman S, Munding EM, Shelansky R, Donohue JP, and Ares M Jr. (2019). Rapidly evolving protointrons in Saccharomyces genomes revealed by a hungry spliceosome. PLoS Genet 15, e1008249. 10.1371/journal.pgen.1008249.
  • 31. Munding EM, Shiue L, Katzman S, Donohue JP, and Ares M Jr. (2013). Competition between pre-mRNAs for the splicing machinery drives global regulation of splicing. Mol Cell 51, 338–348. 10.1016/j.molcel.2013.06.012.
  • 32. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, and Lipman DJ (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25, 3389–3402. 10.1093/nar/25.17.3389.
  • 33. Bork P, Dandekar T, Diaz-Lazcoz Y, Eisenhaber F, Huynen M, and Yuan Y. (1998). Predicting function: from genes to genomes and back. J Mol Biol 283, 707–725. 10.1006/jmbi.1998.2144.
  • 34. Johnson LS, Eddy SR, and Portugaly E. (2010). Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinformatics 11, 431. 10.1186/1471-2105-11-431.
  • 35. Tatusov RL, Koonin EV, and Lipman DJ (1997). A genomic perspective on protein families. Science 278, 631–637. 10.1126/science.278.5338.631.
  • 36. Hudson AJ, McWatters DC, Bowser BA, Moore AN, Larue GE, Roy SW, and Russell AG (2019). Patterns of conservation of spliceosomal intron structures and spliceosome divergence in representatives of the diplomonad and parabasalid lineages. BMC Evol Biol 19, 162. 10.1186/s12862-019-1488-y.
  • 37. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, and Gingeras TR (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21. 10.1093/bioinformatics/bts635.
  • 38. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, and Genome Project Data Processing, S. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079. 10.1093/bioinformatics/btp352.
  • 39. Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, Burovski E, Peterson P, Weckesser W, Bright J, et al. (2020). SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods 17, 261–272. 10.1038/s41592-019-0686-2.
  • 40. Anders S, Pyl PT, and Huber W. (2015). HTSeq--a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169. 10.1093/bioinformatics/btu638.
  • 41. Thorvaldsdottir H, Robinson JT, and Mesirov JP (2013). Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform 14, 178–192. 10.1093/bib/bbs017.
  • 42. Clyde MA, and Parmigiani G. (1998). Protein construct storage: Bayesian variable selection and prediction with mixtures. J Biopharm Stat 8, 431–443. 10.1080/10543409808835251.
  • 43. McKinney W. (2010). Data Structures. Proceedings of the 9th Python in Science Conference.
  • 44. He L, Diedrich J, Chu YY, and Yates JR 3rd (2015). Extracting Accurate Precursor Information for Tandem Mass Spectra by RawConverter. Anal Chem 87, 11361–11367. 10.1021/acs.analchem.5b02721.
  • 45. Xu T, Park SK, Venable JD, Wohlschlegel JA, Diedrich JK, Cociorva D, Lu B, Liao L, Hewel J, Han X, et al. (2015). ProLuCID: An improved SEQUEST-like algorithm with enhanced sensitivity and specificity. J Proteomics 129, 16–24. 10.1016/j.jprot.2015.07.001.
  • 46. Tabb DL, McDonald WH, and Yates JR 3rd (2002). DTASelect and Contrast: tools for assembling and comparing protein identifications from shotgun proteomics. J Proteome Res 1, 21–26. 10.1021/pr015504q.
  • 47. Park SK, Aslanian A, McClatchy DB, Han X, Shah H, Singh M, Rauniyar N, Moresco JJ, Pinto AF, Diedrich JK, et al. (2014). Census 2: isobaric labeling data analysis. Bioinformatics 30, 2208–2209. 10.1093/bioinformatics/btu151.
  • 48. Hunter JD (2007). Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering 9, 90–95.
  • 49. Harris CR, Millman KJ, van der Walt SJ, Gommers R, Virtanen P, Cournapeau D, Wieser E, Taylor J, Berg S, Smith NJ, et al. (2020). Array programming with NumPy. Nature 585, 357–362. 10.1038/s41586-020-2649-2.
  • 50. Bembon O. (2017).
  • 51. Seabold S, and Perktold J. (2010). Statsmodels: Econometric and statistical modeling with Python. 9th Python in Science Conference.
  • 52. Weber MO (2021). statannot.
  • 53. Dale RK, Pedersen BS, and Quinlan AR (2011). Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics 27, 3423–3424. 10.1093/bioinformatics/btr539.
