NIHPA Author Manuscript
Science. Author manuscript; available in PMC: 2012 Nov 13.
Published in final edited form as: Science. 2010 Nov 5;330(6005):838–841. doi: 10.1126/science.1194554

Evolution of Yeast Noncoding RNAs Reveals an Alternative Mechanism for Widespread Intron Loss

Quinn M Mitrovich 1,2,*, Brian B Tuch 1,3,*, Francisco M De La Vega 3, Christine Guthrie 2, Alexander D Johnson 1,2
PMCID: PMC3496775  NIHMSID: NIHMS267973  PMID: 21051641

Abstract

The evolutionary forces responsible for intron loss are unresolved. Whereas research has focused on protein-coding genes, here we analyze noncoding small nucleolar RNA (snoRNA) genes in which introns, rather than exons, are typically the functional elements. Within the yeast lineage exemplified by the human pathogen Candida albicans, we find—through deep RNA sequencing and genome-wide annotation of splice junctions—extreme compaction and loss of associated exons, but retention of snoRNAs within introns. In the Saccharomyces yeast lineage, however, we find it is the introns that have been lost through widespread degeneration of splicing signals. This intron loss, perhaps facilitated by innovations in snoRNA processing, is distinct from that observed in protein-coding genes with respect to both mechanism and evolutionary timing.


In eukaryotes, protein-coding genes are frequently interrupted by introns, which must be precisely removed from RNA transcripts by the nuclear spliceosome (1). Over evolutionary time scales, the presence of introns is dynamic, with intron gain and loss rates varying substantially across eukaryotic lineages (2-4). The mechanisms of intron gain and loss bear on questions about both the origins of introns and the markedly different intron-exon patterns observed across eukaryotes (5), including whether spliceosomal introns arose within eukaryotes (“introns late”), within an ancestor of both prokaryotes and eukaryotes (“introns early”), or even before the emergence of protein-coding genes (“introns first”) (6). The last two hypotheses depend on the feasibility of comprehensive intron loss within both the bacterial and archaeal lineages, whose modern representatives lack spliceosomal introns. Within eukaryotes, the hemiascomycetous yeasts show substantial intron loss, with modern species like Saccharomyces cerevisiae and Candida albicans devoid of introns in >90% of their genes (7). A postulated mechanism for this loss is reverse transcription of spliced RNA, followed by homologous DNA recombination that replaces the intron-containing genomic sequence with the intronless copy (8). Previous studies of intron loss have focused on protein-coding genes, however, and are therefore likely to be biased toward mechanisms that lead to precise intron removal. Here, we examine instead the evolution of splicing patterns in yeast noncoding genes.

We performed massively parallel ligation-based sequencing of RNA libraries (RNA-seq) and mapped the resulting ~160 million 50-nucleotide (nt) strand-specific sequence reads to the C. albicans genome (9). Our data confirm 89% of 421 previously annotated spliceosomal introns (7, 10) (table S1), while correcting or rejecting seven of these annotations (table S2). We also find 68 previously unannotated splice junctions, identifying 15 new introns in protein-coding genes (table S3), 30 examples of alternative splicing [table S4 and supporting online material (SOM) text], and 23 new introns in previously unannotated transcripts that lack substantial protein-coding potential (table S5). Analysis of 11 of these spliced, noncoding RNAs revealed that their exons have no apparent function, but that their introns contain C/D box snoRNAs, noncoding RNAs that target modifications to ribosomal RNA (rRNA) (11). In the nonhemiascomycetous fungus Neurospora crassa, snoRNAs are also generally processed from the introns of non-protein-coding precursors (12). This differs, however, from the more closely related hemiascomycete S. cerevisiae, where nearly all snoRNAs arise from unspliced primary transcripts and therefore require a splicing-independent processing pathway (13). This difference between C. albicans and S. cerevisiae suggests that the transition of snoRNAs from intron sequences to unspliced, dedicated transcripts occurred within the Saccharomyces lineage, well after the onset of rapid intron loss from protein-coding genes in the hemiascomycete ancestor.
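The junction bookkeeping described above (confirming annotated introns and finding unannotated junctions) can be sketched as set operations on read-supported junctions. This is a minimal illustration under stated assumptions, not the authors' pipeline: it presumes reads are already aligned and each split read has been reduced to one junction tuple (chromosome, strand, intron start, intron end), and the `min_reads` support threshold and all coordinates are hypothetical. It does not model the correction of existing annotations or the calling of alternative splicing.

```python
from collections import Counter

# One splice junction summarized from a split read:
# (chromosome, strand, intron_start, intron_end). Coordinates below are made up.
Junction = tuple[str, str, int, int]


def classify_junctions(read_junctions: list[Junction],
                       annotated: set[Junction],
                       min_reads: int = 2) -> tuple[set[Junction], set[Junction], set[Junction]]:
    """Return (confirmed, unconfirmed, novel) junction sets.

    A junction counts as observed only if at least `min_reads` reads support it;
    this threshold is an assumption for illustration.
    """
    support = Counter(read_junctions)
    observed = {j for j, n in support.items() if n >= min_reads}
    confirmed = observed & annotated      # annotated and supported by reads
    unconfirmed = annotated - observed    # annotated but lacking read support
    novel = observed - annotated          # supported but not annotated
    return confirmed, unconfirmed, novel


if __name__ == "__main__":
    reads = ([("chr1", "+", 100, 180)] * 3
             + [("chr1", "+", 500, 560)] * 2
             + [("chr2", "-", 40, 90)])
    annotated = {("chr1", "+", 100, 180), ("chr2", "+", 10, 70)}
    confirmed, unconfirmed, novel = classify_junctions(reads, annotated)
    print(f"confirmed {len(confirmed)}/{len(annotated)} annotated; {len(novel)} novel")
    # prints: confirmed 1/2 annotated; 1 novel
```

Raising `min_reads` trades sensitivity for fewer spurious junctions; the single-read junction on chr2 in the example is dropped at the default threshold.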

To trace the evolution of snoRNAs throughout the hemiascomycetes, we first generated 40 high-confidence C/D box snoRNA predictions for C. albicans (e.g., Fig. 1). Among these are the 11 identified in our intron analysis (above), as well as the previously identified snR52 (14). We searched for splicing signals in sequences adjacent to snoRNAs, and predict that 33 of the 40 identified C. albicans snoRNAs are intronic. This confirms the difference between C. albicans and S. cerevisiae, as C/D box snoRNAs in the latter species are rarely intronic [6 out of 47 (13)].
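The Fig. 2 legend describes the intron predictions as combined 5′ splice site and branch site position weight matrix (PWM) scores computed on ±200-nt snoRNA flanks. A minimal sketch of that kind of scoring follows. It assumes log2-odds PWMs against a uniform background, takes the best 5′ splice site in the upstream flank and the best branch site in the downstream flank (an intron hosting a snoRNA must have both), and sums them. The training sites, pseudocount, and combination rule are illustrative assumptions, and the resulting numbers are not on the same scale as the paper's 5.0 threshold.

```python
import math

BASES = "ACGU"
PWM = list[dict[str, float]]


def build_pwm(sites: list[str], pseudocount: float = 0.5) -> PWM:
    """Log2-odds matrix from equal-length aligned RNA sites vs. a uniform background."""
    pwm = []
    for i in range(len(sites[0])):
        column = [s[i] for s in sites]
        total = len(column) + pseudocount * len(BASES)
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / 0.25)
                    for b in BASES})
    return pwm


def best_hit(pwm: PWM, seq: str) -> float:
    """Highest window score of `pwm` along `seq` (-inf if `seq` is shorter than the motif)."""
    seq = seq.upper().replace("T", "U")  # accept DNA or RNA; ambiguity codes are not handled
    width = len(pwm)
    return max((sum(col[seq[i + k]] for k, col in enumerate(pwm))
                for i in range(len(seq) - width + 1)), default=float("-inf"))


def intron_score(upstream: str, downstream: str, five_ss: PWM, branch: PWM) -> float:
    """Assumed combination: best 5' splice site upstream of the snoRNA plus
    best branch site downstream of it."""
    return best_hit(five_ss, upstream) + best_hit(branch, downstream)


if __name__ == "__main__":
    # Toy training sites (hypothetical, not the paper's species-specific sets).
    five_ss = build_pwm(["GUAUGU", "GUAUGU", "GUACGU", "GUAAGU"])
    branch = build_pwm(["UACUAAC", "UACUAAC", "UACUAAU", "CACUAAC"])
    intact = intron_score("CCCCGUAUGUCCCC", "CCCUACUAACCCCAG", five_ss, branch)
    degenerate = intron_score("CCCCCAAUGUCCCC", "CCCUACUCCCCCCAG", five_ss, branch)
    print(intact > degenerate)  # True
```

Because each position rewards a consensus match, point mutations that erode either signal lower the combined score, which is what allows such a score to separate intronic from exonic snoRNA flanks.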

Fig. 1.


Sequence library comparisons reveal noncoding RNAs. RNA sequence data are shown for MRPS35, a representative protein-coding gene that hosts a snoRNA within its intron. Nonadenylated RNAs, such as mature snoRNAs, are enriched in the rRNA-depleted library relative to the poly(A)-selected library. Sequence depth is represented on a log2 axis. (Bottom track) One of 1706 lower-confidence snoRNA predictions generated using the snoscan algorithm (22).
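The figure plots sequence depth from the two libraries on a log2 scale. One simple way to quantify the enrichment it illustrates, offered here as an assumption rather than the authors' method, is a per-position log2 ratio of library-size-normalized depth with a pseudocount. Positive values flag RNAs, such as mature snoRNAs, that are overrepresented in the rRNA-depleted library relative to the poly(A)-selected one. The function name, totals, and pseudocount are hypothetical.

```python
import math


def log2_enrichment(depleted: list[int], polya: list[int],
                    depleted_total: int, polya_total: int,
                    pseudocount: float = 1.0) -> list[float]:
    """Per-position log2((depleted / N_depleted) / (polyA / N_polyA)), with pseudocounts.

    `depleted_total` and `polya_total` are library sizes (e.g., total mapped reads);
    the pseudocount keeps positions with zero depth in one library finite.
    """
    if len(depleted) != len(polya):
        raise ValueError("depth vectors must cover the same positions")
    return [math.log2(((d + pseudocount) / depleted_total)
                      / ((p + pseudocount) / polya_total))
            for d, p in zip(depleted, polya)]


if __name__ == "__main__":
    # Hypothetical depths at three positions in libraries of equal size.
    print(log2_enrichment([7, 0, 3], [0, 7, 3], 10, 10))
```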

We next identified orthologous snoRNAs from other sequenced yeasts (Fig. 2), beginning with computational predictions (or, for S. cerevisiae, existing annotations), identifying candidate orthologs of our C. albicans set based on their predicted rRNA target sites, and finally confirming and refining our predictions by searching for limited primary sequence identity among predicted snoRNAs (15). This final refinement identified snoRNAs whose rRNA target sites had changed between species, demonstrating evolutionary plasticity in yeast rRNA methylation sites (fig. S1). We predict that 105 of the 255 analyzed snoRNAs are located within introns (Fig. 2A). Individual species vary considerably: Those in the Saccharomyces complex have few intronic snoRNAs (three to five), whereas others have substantially more (23 to 33). The most parsimonious explanation for these data is massive loss of snoRNA-associated introns, most of which took place in the common ancestor of the Saccharomyces complex (Fig. 2B).
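Candidate orthologs are matched here by predicted rRNA target sites. For C/D box snoRNAs, the antisense element just upstream of the D (or D′) box base-pairs with rRNA, and the 2′-O-methylated nucleotide is the one paired with the fifth snoRNA nucleotide upstream of that box. The sketch below applies this rule with simplifying assumptions: a fixed 12-nt guide, only the 3′-most CUGA taken as the D box, perfect Watson-Crick pairing, and one rRNA coordinate frame shared by all species (in practice sites would be mapped through an rRNA alignment). The function names and the grouping step are illustrative, not the authors' procedure.

```python
from collections import defaultdict
from typing import Optional

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    return rna.translate(COMPLEMENT)[::-1]


def predict_methylation_site(snorna: str, rrna: str, guide_len: int = 12) -> Optional[int]:
    """0-based rRNA position predicted to be 2'-O-methylated, or None.

    Assumes the guide is the `guide_len` nt immediately 5' of the 3'-most CUGA
    (taken as the D box) and that it pairs perfectly with the rRNA.
    """
    d_box = snorna.rfind("CUGA")
    if d_box < guide_len:          # also covers d_box == -1 (no D box found)
        return None
    guide = snorna[d_box - guide_len:d_box]
    hit = rrna.find(reverse_complement(guide))
    if hit == -1:
        return None
    # guide[-5] (5 nt upstream of the D box) pairs, antiparallel, with index 4
    # of the rRNA target counted from its 5' end.
    return hit + 4


def group_by_site(snornas: dict[str, dict[str, str]],
                  rrna: str) -> dict[int, list[tuple[str, str]]]:
    """Group (species, snoRNA name) pairs that share a predicted site in one rRNA frame."""
    groups: dict[int, list[tuple[str, str]]] = defaultdict(list)
    for species, by_name in snornas.items():
        for name, seq in by_name.items():
            site = predict_methylation_site(seq, rrna)
            if site is not None:
                groups[site].append((species, name))
    return dict(groups)


if __name__ == "__main__":
    guide = "UUCGAUCCAUCG"                       # hypothetical 12-nt antisense element
    sno = "AUGAUGA" + "CC" + guide + "CUGA"      # C box, spacer, guide, D box
    rrna = "A" * 10 + reverse_complement(guide) + "A" * 10
    print(predict_methylation_site(sno, rrna))   # 14
```

A real search would tolerate mismatches and G-U pairs and would also scan the guide upstream of the D′ box; a snoRNA whose predicted site shifts between species, as in fig. S1, would fall into different groups under this exact-match sketch.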

Fig. 2.


snoRNA-associated introns were lost in the Saccharomyces lineage. (A) Intron prediction scores [combined 5′ splice site and branch site species-specific PWM scores (15)] for 40 C/D box snoRNA flanking sequences (± 200 nt) in seven representative hemiascomycetes. Each snoRNA (horizontal row) is labeled according to the nomenclature of S. cerevisiae (where applicable) or N. crassa (CD39) or by the predicted C. albicans rRNA modification site. Intron scores greater than 5.0 (false-positive rate <0.4%) are shaded green. Inferred intron loss events for each snoRNA, based on parsimony, are indicated on the left and correspond to labeled branches (not drawn to scale) in the phylogeny shown above and in (B). snoRNAs that were not identified or that lack sufficient flanking sequence to score are gray and labeled NA or ND, respectively. Locations of N. crassa orthologs (green for intronic, white for exonic) are derived from (12). Intron scores for reverse-complements of flanking sequences (right) are provided as a negative control. (B) Phylogenetic pattern of snoRNA-associated intron loss. The number of loss events assigned to each branch is indicated in yellow. Species: N. crassa, Y. lipolytica, Debaryomyces hansenii, C. albicans, Kluyveromyces lactis, Kluyveromyces waltii, Zygosaccharomyces rouxii, and S. cerevisiae.
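The loss events in this figure are inferred by parsimony. As a hedged illustration, the Fitch small-parsimony algorithm below counts the minimum number of intronic/exonic state changes for one snoRNA on a fixed binary tree. The paper does not specify which parsimony variant was used here, and the topology in the example is a simplified one assumed for illustration, not the phylogeny drawn in the figure. Fitch itself does not encode direction; reading a change as a loss follows from treating the intronic state of the outgroup N. crassa as ancestral.

```python
from typing import Union

# A rooted binary tree: a leaf name (str) or a (left, right) tuple of subtrees.
Tree = Union[str, tuple]


def fitch(tree: Tree, states: dict[str, int]) -> tuple[set[int], int]:
    """Fitch small parsimony: (candidate states at this node, minimum changes below it)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


if __name__ == "__main__":
    # Simplified topology assumed for illustration only.
    tree = ("Ncra", ("Ylip", (("Dhan", "Calb"),
                              ("Klac", ("Kwal", ("Zrou", "Scer"))))))
    # 1 = snoRNA within a predicted intron, 0 = exonic (hypothetical pattern).
    states = {"Ncra": 1, "Ylip": 1, "Dhan": 1, "Calb": 1,
              "Klac": 0, "Kwal": 0, "Zrou": 0, "Scer": 0}
    print(fitch(tree, states))  # ({1}, 1)
```

With the intronic state in N. crassa, Y. lipolytica, D. hansenii, and C. albicans and the exonic state in the four Saccharomyces complex species, the function reports a single change, consistent with one loss on the branch leading to that complex; summing such counts over all snoRNAs gives per-branch totals of the kind shown in (B).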

The intron loss mechanism proposed for protein-coding genes, retrotransposition of spliced mRNAs, cannot explain the pattern we observe, because it would eliminate the snoRNAs along with the introns. Rather, the introns appear to have been lost through degeneration of their splicing signals, effectively converting them into unspliced exon sequence. In N. crassa (12) and hemiascomycetes outside the Saccharomyces complex, snoRNAs in the snR72-78 polycistronic cluster are mostly contained within individual introns (Fig. 3A). In S. cerevisiae, the genes are arranged identically but are cotranscribed as a single unspliced precursor, then processed into individual snoRNAs by the RNase III enzyme Rnt1 (16). The conservation of genomic synteny among these species strongly suggests intron loss through splice site degeneration (“de-intronization”), with Yarrowia lipolytica and the Candida clade representing an intermediate state of partial intron loss, and S. cerevisiae representing complete intron loss. The snR57 cluster similarly supports this idea (below), as do the gene structures of snR47 and snR79 (SOM text).

Fig. 3.


Polycistronic snoRNA precursors exhibit unusual splicing patterns. (A) Splicing of the snR72-78 cluster in various fungal species (phylogenetic relations on left). snoRNAs are shown in green, introns as lines, and exons as yellow boxes, with internal exons labeled by size. Cotranscription of entire clusters has been demonstrated only for S. cerevisiae (16) and N. crassa (12). (B) Nested splicing of the snR57/55/61 snoRNA cluster in C. albicans. The 5′ splice site, branch site, and 3′ splice site sequences are shown for introns 1 (pink), 2 (blue), and 3 (orange). Gray “exons” represent sequences joined with intron 1, by splicing of introns 2 and 3, to create de novo intron 1 splice sites. (Inset) Reverse transcription polymerase chain reaction products of the snoRNA host transcript from wild-type cells and those deficient for nonsense-mediated mRNA decay (upf1-Δ) or nuclear exonucleolytic decay (rrp6-Δ). We infer the order of splicing events from the observable products: Intron 3 is nearly always removed, intron 2 is removed in a subset of transcripts (lower two bands), and intron 1 only when 2 and 3 have also been removed (lower band). Patterns in decay mutants are consistent with degradation of partially spliced host transcripts in the nucleus.

Intron loss through splice-site degeneration would not be expected to occur within protein-coding genes, as this would disrupt the encoded protein. Consistent with this prediction, of the five snoRNAs located in introns within protein-coding regions in C. albicans, four remain associated with introns in nearly all of the Saccharomyces complex species [snR18, snR24, snR54, and CD39 (Fig. 2A)]. The fifth, snR39b, was likely displaced from its associated coding sequence through a genomic duplication event (fig. S2B). The snoRNA CD39 is located within the intron of the ribosomal protein gene MRPS35 in all but one of the species we analyzed (Fig. 2A). The exception is S. cerevisiae, where MRPS35 has lost its intron, presumably through retrotransposition rather than de-intronization. This appears to have eliminated CD39 from the genome as well: The predicted rRNA methylation site for this snoRNA is unmodified in S. cerevisiae (13).

In C. albicans, one unusual case of splicing may reflect the particular processing requirements imposed by intron-hosted snoRNAs. The snoRNAs snR57, snR55, and snR61 are processed from three introns of a single precursor (Fig. 3B). We find that two of the introns (introns 2 and 3) lie entirely within the sequence of an enveloping intron (intron 1). The splicing signals for intron 1 are not present in the primary transcript, but are created upon splicing of introns 2 and 3. Thus, splicing of intron 1 can occur only after introns 2 and 3 have been removed. (See SOM text for similar scenarios in protein-coding genes of other eukaryotes.) Analysis of both the snR57/55/61 and the snR72-78 clusters in other species indicates a significant reduction in the sizes of internal exons within the hemiascomycetes, driven perhaps by the same pressures that streamlined their genomes (17) and leading ultimately to nested splicing within the Candida clade (fig. S3). In animals and fungi, intron-hosted C/D box snoRNAs obey a strict “one-snoRNA-per-intron” rule, a requirement imposed by their exonucleolytic maturation pathways (18). Nested splicing of snoRNA host transcripts fulfills this requirement by allowing sequential removal of individual introns despite the absence of intervening exons.
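The ordering constraint of nested splicing can be made concrete with a toy sequence model. In the sketch below every sequence is hypothetical and only yeast-like consensus motifs are assumed (GUAUGU 5′ splice site, UACUAAC branch site). Intron 2 splits intron 1's 5′ splice site and intron 3 splits its branch site, so intron 1's signals appear at their expected positions only after both inner introns are removed. Which inner intron disrupts which signal is an arbitrary choice here; the real C. albicans signal sequences are those shown in Fig. 3B.

```python
# Toy model of nested splicing; every sequence here is hypothetical.
FIVE_SS = "GUAUGU"   # assumed yeast-like 5' splice site consensus
BRANCH = "UACUAAC"   # assumed yeast-like branch-site consensus

EXON_A, EXON_B = "CCAAUC", "GGCAUU"
INTRON2 = FIVE_SS + "CCCC" + BRANCH + "CCCAG"
INTRON3 = FIVE_SS + "UUUU" + BRANCH + "UUUAG"
SPACER = "CCCCCC"    # intron-1 sequence between its 5' splice site and branch site
INTRON1 = FIVE_SS + SPACER + BRANCH + "CCCAG"   # contiguous only after 2 and 3 are gone

# Intron 2 splits intron 1's 5' splice site (GU|AUGU); intron 3 splits its
# branch site (UACU|AAC).
PRIMARY = (EXON_A + FIVE_SS[:2] + INTRON2 + FIVE_SS[2:] + SPACER
           + BRANCH[:4] + INTRON3 + BRANCH[4:] + "CCCAG" + EXON_B)


def splice_out(seq: str, intron: str) -> str:
    """Remove the first occurrence of `intron` and join its flanks."""
    i = seq.index(intron)  # ValueError if the intron is not contiguous in `seq`
    return seq[:i] + seq[i + len(intron):]


def intron1_signals_present(seq: str) -> bool:
    """True if intron 1's 5' splice site and branch site sit at their expected offsets."""
    start = len(EXON_A)
    return (seq.startswith(FIVE_SS, start)
            and seq.startswith(BRANCH, start + len(FIVE_SS) + len(SPACER)))


if __name__ == "__main__":
    seq = PRIMARY
    print(intron1_signals_present(seq))   # False: both signals are interrupted
    seq = splice_out(seq, INTRON3)
    print(intron1_signals_present(seq))   # False: 5' splice site still split
    seq = splice_out(seq, INTRON2)
    print(intron1_signals_present(seq))   # True: intron 1 is now a canonical intron
    print(splice_out(seq, INTRON1))       # CCAAUCGGCAUU (exon A joined to exon B)
```

The demo removes intron 3 before intron 2, following the order inferred in the Fig. 3B legend; the model itself would allow either order for the two inner introns, but in neither order can intron 1 be excised first.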

Selective pressures are proposed to have driven intron loss from hemiascomycete protein-coding genes (19), and these same pressures may have driven the loss we observe here. The dependence of snoRNAs on splicing for their proper maturation, however, would have imposed a constraint against loss of their associated introns (18). This constraint may have been overcome by the innovation of an alternative snoRNA processing pathway in the Saccharomyces lineage, where exonic C/D box snoRNAs first undergo endonucleolytic cleavage by the RNase III enzyme Rnt1 (20). Nonetheless, some capacity to process exonic snoRNAs must be more ancient, because other hemiascomycetes (Fig. 2) and more distantly related fungi (12) do have some exon-hosted snoRNAs. It is unknown, however, whether processing of such snoRNAs involves Rnt1 (SOM text).

Studies of protein-coding genes have revealed a dramatic increase in the rate of intron loss in the hemiascomycete ancestor (21). By focusing instead on noncoding RNAs, we describe unexpected patterns of both exon and intron loss. A drastic reduction in the sizes of internal exons has ultimately led to their complete loss in Candida species, resulting in a complex form of splicing that maintains the independent removal of now-overlapping introns. The Saccharomyces complex, however, has experienced a second wave of intron loss, perhaps facilitated by innovations in snoRNA processing and mediated by a mechanism (splice site degeneration) distinct from that which has acted on protein-coding genes.

Supplementary Material

S2-S6
Table S1

Acknowledgments

We thank C. Barbacioru, O. Homann, S. Kuersten, S. Ledoux, I. Listerman, T. Lowe, C. Maeder, K. Pande, A. Price, S. Roy, J. Steitz, and members of the Guthrie and Johnson laboratories for helpful discussions and support; and M. Barker and C. Monighetti for providing sequence reads. This work was supported by NIH grants GM021119 (C.G.) and GM37049 (A.D.J.) and by Life Technologies Corp. C.G. is an American Cancer Society Research Professor of Molecular Genetics. Sequence data are available at the Gene Expression Omnibus (accession number GSE21291).

