Molecular Biology and Evolution
Letter. 2010 Apr 1;27(9):1979–1982. doi: 10.1093/molbev/msq087

Constrained Intron Structures in a Microsporidian

Renny CH Lee 1, Erin E Gill 1, Scott W Roy 2, Naomi M Fast 1,*
PMCID: PMC3809491  PMID: 20360213

Abstract

The 2.9-Mbp genome of the microsporidian Encephalitozoon cuniculi is severely reduced and compacted, possessing only 16 known tiny spliceosomal introns. Using motif and expression data, we constructed intron profiles to screen the genome. Twenty additional introns were predicted and verified, doubling the previous estimate. We further predict that accurate 3′ splice site (3′SS) selection is accomplished by a scanning mechanism, with specificity achieved by maintaining a constrained variable length between the branch point motif and the 3′SS. Only introns in ribosomal protein genes exhibit positional bias, and we hypothesize that splicing could regulate expression of these genes. The large set of new introns in non-ribosomal protein genes suggests that current models of intron loss are unlikely to be sufficient to explain the distribution of introns. Together, these results extend our understanding of the role of intron loss in genome evolution and contribute to a novel model for splice site selection.

Keywords: intron, Encephalitozoon cuniculi, genome reduction, microsporidia, intron loss, splice site selection


Across eukaryotes, spliceosomal introns show tremendous diversity, from the long introns of mammals to the tiny 18- to 21-nt introns of green algal–derived nucleomorphs (Gilson and McFadden 1996; Slamovits and Keeling 2009), and from the degenerate, diffuse splicing signals of many intron-rich organisms to the highly constrained sequences of many intron-poor lineages (Irimia and Roy 2008; Schwartz et al. 2008; Irimia et al. 2009). Microsporidia, unicellular parasites related to fungi, possess the smallest eukaryotic genomes known. The 2.9-Mbp genome of Encephalitozoon cuniculi is a testament to widespread reduction at all levels of organization; indeed, only 16 extremely short (23–52 nt) introns were annotated, almost exclusively interrupting the 5′ ends of ribosomal protein-encoding genes (RPGs). The same trend is observed in unrelated lineages that have independently undergone genome reduction and possess low intron densities (Douglas et al. 2001; Bon et al. 2003). Why the few remaining introns persist in these unrelated lineages with reduced genomes is much debated: their retention could be a product of neutral evolution and associated mutational biases or a reflection of functional selective pressure (Mourier and Jeffares 2003; Slamovits and Keeling 2009).

Although splicing signals can be highly variable, introns generally possess sequence motifs at the 5′ and 3′ splice sites (5′SS and 3′SS, respectively) and at the branch point (bpA), the three primary regions that interact with conserved spliceosomal components (i.e., proteins or small nuclear RNA–pre-messenger RNA [snRNA–pre-mRNA] pairings). Exceptions are found among lineages with extremely short introns: the cryptophyte and chlorarachniophyte nucleomorphs, the ciliate Paramecium, and the metazoan Dicyema (Russell et al. 1994; Douglas et al. 2001; Gilson et al. 2006; Aruga et al. 2007). These tiny introns lack characteristic splicing motifs in all cases examined (Gilson et al. 2006; Slamovits and Keeling 2009; Ogino et al. 2010), except for cryptophyte nucleomorph introns, which possess a typical eukaryotic 5′SS sequence (GUAAGU; Douglas et al. 2001). In sharp contrast to these reduced introns that lack recognizable splicing signals, an alignment of E. cuniculi introns revealed clear motifs. Figure 1 indicates the apparent 5′SS (5′-GUAAGUGG), bpA (YUAAYUU), and 3′SS (AG-3′). The first six bases of the 5′SS match the eukaryotic consensus and could pair with the predicted E. cuniculi U6 snRNA (Katinka et al. 2001) and possibly with an as-yet-unidentified U1 snRNA. Although the E. cuniculi bpA resembles the eukaryotic consensus CURAY (Spingola et al. 1999) and hence could pair appropriately with the predicted U2 snRNA, its length and position are unusual. Apart from E. cuniculi, only hemiascomycete yeasts and a red alga possess a 7-nt bpA motif; however, the E. cuniculi motif is unique in extending distally toward the 3′SS, in contrast to these other extended motifs. Considering these motifs, several previously annotated E. cuniculi introns are predicted to be shorter than annotated, and two annotated introns that lack these motifs are predicted not to be bona fide introns (see also Cornman et al. 2009).
These predictions were confirmed by 5′ rapid amplification of cDNA ends from proliferative parasite material (Gill et al. 2010). The intron model could then be refined in two important ways: only small deviations from the consensus 5′SS and bpA motifs are tolerated, and the bpA motif and 3′SS are separated by a very narrow distance (fig. 1). This refined model allowed a direct search for additional introns in E. cuniculi.
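The refined model lends itself to a simple pattern search. The sketch below illustrates the idea and is not the screen actually used: it assumes exact matches to the 5′SS (GUAAGU) and bpA (YUAAYUU) consensus motifs written in DNA letters, takes the bpA–3′SS distance to mean the number of bases between the end of the bpA motif and the A of the AG, and borrows the 23–76 nt range of the verified introns as a length filter. The function name `find_candidate_introns` and the test sequence are hypothetical.

```python
import re
from typing import List, Tuple

# Consensus motifs from the intron alignment, written as DNA (T for U).
# Assumption: exact matches only; the real screen tolerated small deviations.
FIVE_SS = "GTAAGT"                        # first six bases of 5'-GUAAGUGG
BPA = re.compile(r"(?=([CT]TAA[CT]TT))")  # YUAAYUU; lookahead also finds overlapping hits
MAX_SPACER = 3                            # 0-3 bases between bpA motif and AG
MIN_LEN, MAX_LEN = 23, 76                 # length range of verified introns


def find_candidate_introns(seq: str) -> List[Tuple[int, int]]:
    """Return half-open (start, end) coordinates of candidate introns.

    For each 5'SS, bpA motifs lying within MAX_LEN bases are examined 5' to 3';
    the first one followed by an AG within MAX_SPACER bases that gives an
    intron of MIN_LEN..MAX_LEN nt is reported.
    """
    seq = seq.upper()
    hits = []
    for m5 in re.finditer("(?=" + FIVE_SS + ")", seq):
        start = m5.start()
        for mb in BPA.finditer(seq, start + len(FIVE_SS), start + MAX_LEN):
            bpa_end = mb.end(1)
            # The AG must start 0..MAX_SPACER bases after the bpA motif ends.
            acceptor = seq.find("AG", bpa_end, bpa_end + MAX_SPACER + 2)
            if acceptor == -1:
                continue
            end = acceptor + 2  # the intron ends with the G of the AG
            if MIN_LEN <= end - start <= MAX_LEN:
                hits.append((start, end))
                break  # report only the first acceptable candidate per 5'SS
    return hits


# Synthetic example (not a real E. cuniculi sequence).
exon1, exon2 = "ATGGCC", "GCCTTT"
intron = "GTAAGT" + "GCGCGCGCGC" + "CTAACTT" + "A" + "AG"
print(find_candidate_introns(exon1 + intron + exon2))  # [(6, 32)]
```

Moving the AG four bases away from the bpA motif (for example, replacing the single `"A"` spacer with `"CCCA"`) makes the same sequence return no candidate, which is the behavior the spacing constraint is meant to capture.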

FIG. 1.

Encephalitozoon cuniculi introns are aligned and listed in order of increasing length. New introns are shown in bold; an asterisk indicates those shorter than annotated. Intron bases 9–36 are shown as “N.”

We predicted and verified (by sequencing spliced transcripts) 20 additional E. cuniculi introns (fig. 1; supplementary table S1, Supplementary Material online), more than doubling the number of known introns. Of the 20, 11 are in hypothetical genes (many of them unannotated), 5 are in RPGs (including those recently predicted by Peyretaillade et al. 2009), and 4 are in other genes of predicted function (fig. 1). These newly verified introns extend the known length range of E. cuniculi introns from 23–52 nt to 23–76 nt.

The revised intron set allows us to consider how these introns are recognized by the reduced and likely divergent microsporidian spliceosome. No aberrant splicing products were observed, suggesting accurate recognition. How fidelity is maintained is perhaps best illustrated by what the spliceosome does not do: a number of E. cuniculi introns have a second AG dinucleotide shortly downstream of the verified 3′SS, yet splicing at these downstream positions was never observed, even in cases where it would not induce a frameshift (fig. 2a; bold). The alignment of all E. cuniculi introns offers a clue: the bpA motif and the 3′SS are separated by only zero to three bases. We hypothesize that this range represents a threshold (the trinucleotide threshold) beyond which a 3′SS is not selected.
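The two observations above, the 0–3 base spacing and the unused downstream AGs, can be made concrete by listing every AG a few bases past the bpA motif. In the hypothetical helper below, `spacer` counts the bases between the end of the bpA motif and the A of the AG, and `keeps_frame` asks whether splicing at that AG instead of the first one would remove a multiple of three extra bases. The 12-base window and the assumption that the first AG is the one actually used are illustrative choices, not part of the original analysis.

```python
from typing import List, NamedTuple

THRESHOLD = 3  # largest bpA-to-AG spacer among verified introns


class Acceptor(NamedTuple):
    spacer: int             # bases between the bpA motif end and the AG
    within_threshold: bool  # spacer <= THRESHOLD
    keeps_frame: bool       # extra bases removed vs. the first AG is a multiple of 3


def downstream_acceptors(seq: str, bpa_end: int, window: int = 12) -> List[Acceptor]:
    """Classify every AG that starts within `window` bases after the bpA motif.

    Assumes the first AG found is the one actually used as the 3'SS.
    """
    seq = seq.upper()
    spacers = [i - bpa_end for i in range(bpa_end, bpa_end + window)
               if seq.startswith("AG", i)]
    if not spacers:
        return []
    first = spacers[0]
    return [Acceptor(s, s <= THRESHOLD, (s - first) % 3 == 0) for s in spacers]


# Synthetic example: a bpA motif followed by AGs at spacers 1 and 4.
for acc in downstream_acceptors("CTAACTT" + "AAGAAG", bpa_end=7):
    print(acc)
```

Here the second AG lies beyond the threshold yet would keep the reading frame (three extra bases removed), which is the kind of site the text reports is never used.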

FIG. 2.

Intron (lowercase) and exon (uppercase) bases with the trinucleotide threshold (pink) and bpA motif (blue) indicated. (a) Intron sequences with all possible bpA-3′SS distances. (b) Sample of sequences that contain all splicing motifs but are not introns.

Additional support comes from “intron-like elements” that contain the E. cuniculi intron motifs but are not introns and are not spliced (fig. 2b). For example, Prp8 is highly conserved in both size and sequence, and if its intron-like sequence were spliced, the protein would be nonfunctional due to a frameshift. The intron-like sequence of Prp8 (fig. 2b) has four bases separating the bpA motif and the 3′SS, suggesting that exceeding the trinucleotide threshold by even one base could be enough to prevent mis-splicing. This suggestion is bolstered by additional intron-like elements (fig. 2b). Clearly, both the presence and the spacing of motifs are important for accurate splicing. A further mechanistic implication is that the E. cuniculi spliceosome scans within the trinucleotide threshold for the first acceptor. Interestingly, “unknown” intron “3-b” has two AG dinucleotides, one and three bases from the bpA motif (fig. 2a). Splicing is only ever observed at the first acceptor site, suggesting a scanning mechanism that locates the first AG. What limits the threshold to just three nucleotides? We hypothesize that recognition is constrained by the physical interaction between the spliceosome and the intron, such that a bpA–3′SS distance of four or more bases is sterically excluded. To our knowledge, this is the first case in which a variable yet highly constrained distance sets the upper bound for the separation of the bpA motif and the 3′SS.
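The proposed scanning mechanism can be written as a short loop: starting at the end of the bpA motif, test successive positions up to the threshold and stop at the first AG. This is a minimal sketch of the hypothesis only. The name `select_3ss` is hypothetical, the sequences are synthetic stand-ins for intron “3-b” (AGs one and three bases from the bpA motif) and the Prp8 intron-like element (four bases before the AG), and nothing about the steric basis of the threshold is modeled.

```python
from typing import Optional

THRESHOLD = 3  # hypothesised "trinucleotide threshold"


def select_3ss(seq: str, bpa_end: int, threshold: int = THRESHOLD) -> Optional[int]:
    """Return the index of the first downstream exon base, or None.

    Scans 0..threshold bases past the end of the bpA motif and accepts the
    first AG it meets; an AG further away is never considered.
    """
    seq = seq.upper()
    for spacer in range(threshold + 1):
        i = bpa_end + spacer
        if seq.startswith("AG", i):
            return i + 2  # splice after the G
    return None


# Synthetic stand-ins (not the real sequences):
like_3b = "TTAACTT" + "CAGAG" + "GCC"     # AGs at spacers 1 and 3
like_prp8 = "TTAACTT" + "CCCC" + "AGGCC"  # AG at spacer 4
print(select_3ss(like_3b, 7))    # 10: the first AG wins
print(select_3ss(like_prp8, 7))  # None: beyond the threshold
```

Raising `threshold` to 4 would make the Prp8-like element splice, which is exactly the mis-splicing the threshold hypothesis proposes to avoid.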

These data also bear upon our understanding of ancestral intron loss. Intron positions tend to be 5′ biased within genes (most clearly in highly reduced species; Douglas et al. 2001; Mourier and Jeffares 2003), and it is debated whether this is due to differences in selection or in mutational rates. The most influential mutational hypothesis invokes reverse transcriptase (RT)-mediated intron loss (Fink 1987), under which introns closer to the 3′ ends of genes would be lost at a greater rate (Roy and Gilbert 2006). Alternatively, these 5′-biased introns could have a regulatory function. Indeed, there is precedent for regulated splicing of RPG introns, involving both feedback inhibition and stress responses (Vilardell and Warner 1997; Pleiss et al. 2007). This hypothesis is popular because almost all Guillardia theta nucleomorph introns and a significant proportion of the few introns in Saccharomyces cerevisiae are found at the 5′ ends of RPGs (Spingola et al. 1999; Douglas et al. 2001). The original 14 introns in E. cuniculi also show a strong 5′ RPG bias (fig. 3, top). However, including the positional data from the 20 new introns gives a very different picture (fig. 3, bottom): unlike RPG introns, non-RPG introns do not show a strong 5′ bias. The distribution of non-RPG introns is at odds with one of the predictions of the RT-mediated model of intron loss and suggests that whatever is driving the 5′ bias of RPG introns is specific to RPGs and is not RT-mediated loss.
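The positional prediction of the RT-mediated model can be illustrated with a toy simulation. The sketch below is not a quantitative version of the model of Fink (1987) or Roy and Gilbert (2006); it assumes that ancestral intron positions are uniform along the gene (0 = 5′ end, 1 = 3′ end), that each loss event uses a cDNA primed at the 3′ end and truncated at a uniformly random point, and that every intron the cDNA covers is lost. All names and parameter values are hypothetical.

```python
import random
from statistics import mean
from typing import List


def simulate_rt_loss(n_genes: int = 2000, introns_per_gene: int = 5,
                     events: int = 3, seed: int = 1) -> List[float]:
    """Return relative positions (0 = 5' end, 1 = 3' end) of surviving introns."""
    rng = random.Random(seed)
    survivors: List[float] = []
    for _ in range(n_genes):
        introns = [rng.random() for _ in range(introns_per_gene)]
        for _ in range(events):
            # Reverse transcription starts at the 3' (poly-A) end and stops at a
            # random point; introns 3' of that point are removed by recombination.
            rt_stop = rng.random()
            introns = [p for p in introns if p < rt_stop]
        survivors.extend(introns)
    return survivors


kept = simulate_rt_loss()
print(len(kept), round(mean(kept), 3))
```

Under these assumptions an intron at relative position p survives all events with probability (1 − p)^events, so the expected mean position of survivors is 1/(events + 2), or 0.2 for three events: a clear 5′ bias of the kind that E. cuniculi non-RPG introns do not show.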

FIG. 3.

Positional distribution of original (above ORF) and new (below ORF) introns. Arrowheads indicate RPG (black) and non-RPG (pink) introns.

Indeed, the overrepresentation of introns in RPGs is the opposite of what the mutational model predicts, namely greater intron loss in highly expressed genes. The high level of conservation of 5′ RPG introns across eukaryotes, their clear overrepresentation in several independently reduced lineages, and direct evidence for their functional role suggest that these introns are of particular selective importance. Assuming that E. cuniculi RPG introns are also functional, the absence of any 5′ bias once these introns are excluded suggests that general mutational factors may not have been important in determining which introns were retained in E. cuniculi. Furthermore, the intron distribution in E. cuniculi strongly suggests that the 5′-biased RPG introns represent a special case and argues against a general functionality for 5′ introns. Given the massive amount of intron loss in this lineage (compared with ancestral fungi), nearly all ancestral 5′ introns have been lost rather than retained by selection. Taken together with the observation that non-RPG introns are not 5′ biased, this suggests that the bias toward 3′ intron loss observed in intron-rich organisms is unlikely to reflect a generally greater functional importance of 5′ introns. If these arguments are correct, the situation would be highly ironic: the striking finding of a strong 5′ bias in highly reduced species, initially used to argue for RT-mediated intron loss, might instead be due to other factors.
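One simple way to ask whether a set of introns is 5′ biased, as in the comparison of RPG and non-RPG introns here, is an exact one-sided binomial (sign) test on how many introns fall in the 5′ half of their open reading frame. This is an illustrative choice, not the analysis behind fig. 3; the helper names and the toy positions in the example are hypothetical and are not data from this study.

```python
from math import comb
from typing import List, Tuple


def relative_position(intron_start: int, orf_length: int) -> float:
    """Intron position as a fraction of ORF length (0 = start codon)."""
    return intron_start / orf_length


def five_prime_bias(positions: List[float]) -> Tuple[int, int, float]:
    """Return (k, n, p): k of n introns lie in the 5' half, and p is the
    one-sided exact binomial probability of k or more under uniform placement."""
    n = len(positions)
    k = sum(1 for x in positions if x < 0.5)
    p = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return k, n, p


# Hypothetical toy positions, not data from this study.
rpg = [relative_position(s, 450) for s in (12, 30, 18, 25, 9)]
other = [relative_position(s, 1200) for s in (100, 700, 950, 400, 1100)]
print(five_prime_bias(rpg))    # (5, 5, 0.03125)
print(five_prime_bias(other))  # (2, 5, 0.8125)
```

With only a handful of introns per class, such a test has little power, so a non-significant result for a small class should be read cautiously.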

Supplementary Material

Supplementary table S1 is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

We thank Lisa Bowers and Elizabeth Didier for generous gifts of material. This work was supported by a grant from the Natural Sciences and Engineering Research Council of Canada (262988 to N.M.F.).

References

  1. Aruga J, Odaka YS, Kamiya A, Furuya H. Dicyema Pax6 and Zic: tool-kit genes in a highly simplified bilaterian. BMC Evol Biol. 2007;7:201. doi: 10.1186/1471-2148-7-201.
  2. Bon E, Casaregola S, Blandin G, et al. (11 co-authors) Molecular evolution of eukaryotic genomes: hemiascomycetous yeast spliceosomal introns. Nucleic Acids Res. 2003;31:1121–1135. doi: 10.1093/nar/gkg213.
  3. Cornman RS, Chen YP, Schatz MC, et al. (11 co-authors) Genomic analyses of the microsporidian Nosema ceranae, an emergent pathogen of honey bees. PLoS Pathog. 2009;5:e1000466. doi: 10.1371/journal.ppat.1000466.
  4. Douglas S, Zauner S, Fraunholz M, Beaton M, Penny S, Deng LT, Wu X, Reith M, Cavalier-Smith T, Maier UG. The highly reduced genome of an enslaved algal nucleus. Nature. 2001;410:1091–1096. doi: 10.1038/35074092.
  5. Fink GR. Pseudogenes in yeast? Cell. 1987;49:5–6. doi: 10.1016/0092-8674(87)90746-x.
  6. Gill EE, Lee RCH, Corradi N, Grisdale CJ, Limpright VO, Keeling PJ, Fast NM. Splicing and transcription differ between spore and intracellular life stages in the parasitic microsporidia. Mol Biol Evol. 2010. Advance Access published February 18, 2010. doi: 10.1093/molbev/msq050.
  7. Gilson PR, McFadden GI. The miniaturized nuclear genome of eukaryotic endosymbiont contains genes that overlap, genes that are cotranscribed, and the smallest spliceosomal introns. Proc Natl Acad Sci U S A. 1996;93:7737–7742. doi: 10.1073/pnas.93.15.7737.
  8. Gilson PR, Su V, Slamovits CH, Reith ME, Keeling PJ, McFadden GI. Complete nucleotide sequence of the chlorarachniophyte nucleomorph: nature's smallest nucleus. Proc Natl Acad Sci U S A. 2006;103:9566–9571. doi: 10.1073/pnas.0600707103.
  9. Irimia M, Roy SW. Evolutionary convergence on highly-conserved 3′ intron structures in intron-poor eukaryotes and insights into the ancestral eukaryotic genome. PLoS Genet. 2008;4:e1000148. doi: 10.1371/journal.pgen.1000148.
  10. Irimia M, Roy SW, Neafsey DE, Abril JF, Garcia-Fernandez J, Koonin EV. Complex selection on 5′ splice sites in intron-rich organisms. Genome Res. 2009;19:2021–2027. doi: 10.1101/gr.089276.108.
  11. Katinka MD, Duprat S, Cornillot E, et al. (17 co-authors) Genome sequence and gene compaction of the eukaryote parasite Encephalitozoon cuniculi. Nature. 2001;414:450–453. doi: 10.1038/35106579.
  12. Mourier T, Jeffares DC. Eukaryotic intron loss. Science. 2003;300:1393. doi: 10.1126/science.1080559.
  13. Ogino K, Tsuneki K, Furuya H. Unique genome of dicyemid mesozoan: highly shortened spliceosomal introns in conservative exon/intron structure. Gene. 2010;449:70–76. doi: 10.1016/j.gene.2009.09.002.
  14. Peyretaillade E, Goncalves O, Terrat S, Dugat-Bony E, Wincker P, Cornman RS, Evans JD, Delbac F, Peyret P. Identification of transcriptional signals in Encephalitozoon cuniculi widespread among Microsporidia phylum: support for accurate structural genome annotation. BMC Genomics. 2009;10:607. doi: 10.1186/1471-2164-10-607.
  15. Pleiss JA, Whitworth GB, Bergkessel M, Guthrie C. Rapid, transcript-specific changes in splicing in response to environmental stress. Mol Cell. 2007;27:928–937. doi: 10.1016/j.molcel.2007.07.018.
  16. Roy SW, Gilbert W. The evolution of spliceosomal introns: patterns, puzzles and progress. Nat Rev Genet. 2006;7:211–221. doi: 10.1038/nrg1807.
  17. Russell CB, Fraga D, Hinrichsen RD. Extremely short 20-33 nucleotide introns are the standard length in Paramecium tetraurelia. Nucleic Acids Res. 1994;22:1221–1225. doi: 10.1093/nar/22.7.1221.
  18. Schwartz SH, Silva J, Burstein D, Pupko T, Eyras E, Ast G. Large-scale comparative analysis of splicing signals and their corresponding splicing factors in eukaryotes. Genome Res. 2008;18:88–103. doi: 10.1101/gr.6818908.
  19. Slamovits CH, Keeling PJ. Evolution of ultrasmall spliceosomal introns in highly reduced nuclear genomes. Mol Biol Evol. 2009;26:1699–1705. doi: 10.1093/molbev/msp081.
  20. Spingola M, Grate L, Haussler D, Ares M Jr. Genome-wide bioinformatic and molecular analysis of introns in Saccharomyces cerevisiae. RNA. 1999;5:221–234. doi: 10.1017/s1355838299981682.
  21. Vilardell J, Warner JR. Ribosomal protein L32 of Saccharomyces cerevisiae influences both the splicing of its own transcript and the processing of rRNA. Mol Cell Biol. 1997;17:1959–1965. doi: 10.1128/mcb.17.4.1959.
