Abstract
Intron number varies considerably among genomes, but despite their fundamental importance, the mutational mechanisms and evolutionary processes underlying the expansion of intron number remain unknown. Here we show that Drosophila, in contrast to most eukaryotic lineages, is still undergoing a dramatic rate of intron gain. These novel introns carry significantly weaker splice sites that may impede their identification by the spliceosome. Novel introns are more likely to encode a premature termination codon (PTC), indicating that nonsense-mediated decay (NMD) functions as a backup for weak splicing of new introns. Our data suggest that new introns originate when genomic insertions with weak splice sites are hidden from selection by NMD. This mechanism reduces the sequence requirement imposed on novel introns and implies that the capacity of the spliceosome to recognize weak splice sites was a prerequisite for intron gain during eukaryotic evolution.
Author Summary
The surprising observation 30 years ago that genes are interrupted by non-coding introns changed our view of gene architecture. Intron number varies dramatically among species; ranging from nine introns/gene in humans to less than one in some simple eukyarotes. Here we ask where new introns come from and how they are maintained in a population. We find that novel introns do not arise from pre-existing introns, although the mechanisms that generate novel introns remain unclear. We also show that novel introns carry only weak signals for their identification and removal, and therefore depend on nonsense-mediated decay (NMD). NMD maintains RNA quality control by degrading transcripts that have not been spliced properly. We propose that NMD shelters novel introns from natural selection. This increases the likelihood that a novel intron will rise in frequency and be maintained within a population, thus increasing the rate of intron gain.
Introduction
Intron number is highly variable among eukaryotes, ranging from about a dozen in some fungi to more than 100,000 in the human genome. Comparative genomics across broad phylogenetic distances have identified the importance of both intron gain and loss to the establishment of this variation [1]. In particular for a number of lineages, including Drosophila [2], Caenorhabditis [3] and some isolated vertebrate lineages [4], a considerable number of intron gains have been described.
While there is a general agreement that the very first spliceosomal introns arose from the degeneration of self-splicing group II introns [5],[6], their complete absence from genomes that have undergone intron gain strongly suggests alternative mechanism(s) are at work. While several mechanism with varying levels of empirical support have been proposed over the last 30 years, there is still strong uncertainty over whether any existing model can explain the observed and predicted rates of intron gain throughout eukaryote evolution [7]. A satisfactory model must address the mutational mechanism that allows a intron to colonise a novel position and the evolutionary process that facilitates the fixation of this new allele within a population. An accounting of both mechanism and evolution should give insight into why the rate of intron gain is so variable between species.
Irrespective of the mutational mechanism, it is apparent that any new intron will require a number of key motifs including the 5′ and 3′ splice sites, and a set of auxiliary signals including the branch point and splicing enhancer and suppressor motifs [8],[9]. The failure to correctly identify an intron may either lead to stochastic alternative splicing or intron retention, both of which have deleterious consequences. This predicament is overcome if the newly inserted intron arrives fully functional. The only mechanism capable of generating a fully formed novel intron is reverse splicing [10],[11], in which an existing intron propagates into a new position, but this process is both extremely rare and inconsistent with the characteristics observed of novel introns [2]. The alternative is that novel introns develop gradually via the optimisation of previously non-intronic sequence. Examples include the intronisation of coding sequences [3], intron gain between paralogs of multi-copy gene families [4], the splicing of an Alu element [12], after internal gene duplication (including tandem duplication) [13] and after the insertion of new sequence of unknown origin [14].
In this study, we have investigated this alternative model in which novel introns are not required to be fully functional, relying instead on a back up mechanism of transcript quality control for incorrectly spliced introns [15]. In recent years it has become evident that the cell invests heavily in the identification of premature termination codons (PTCs) via the Nonsense Mediated Decay (NMD) pathway [16],[17]. NMD acts during the preliminary round of translation to identify in-frame stop codons and classify them as either genuine or premature. The use of incorrect splice sites or intron retention are a ready source of such premature termination codons (PTCs) and will invoke the NMD dependent destruction of the transcript.
Using comparative genomics of nine Drosophila species, we show that novel introns have weaker splice sites and carry more stop codons than conserved introns. We propose that NMD may play an important role during the establishment of novel introns within a population, and in support of this we identified a significant deficiency of novel introns that would remain invisible to the NMD pathway upon intron retention.
Results/Discussion
Here we have identified 307 novel introns amongst 284 genes across nine Drosophila genomes (Figure S1), presenting the most comprehensive set of novel introns to date. Our approach also detected 803 intron loss events amongst 595 genes, including 49 genes that have undergone both intron gain and loss (Dataset S1). These events show a strong heterogeneity across the Drosophila phylogeny, with several lineages being hot-spots of intron turnover (Figure 1 and Figure S2). We observe the highest rate of intron gain reported thus far, 2.8 intron gains/gene/Bya (109) years in the melanogaster subgroup, being 6× greater than previously reported for Drosophila [2] and 4× greater than the next highest reported rate (occurring in yeast) [1],[18]. Interestingly, this rate is still higher than the range of estimates required to have generated the intron-rich eumetazoan genome (0.99–2.39 gains/gene/Bya years) [1],[19]. In sharp contrast, several other Drosophila lineages have experienced far less intron gain. D. virilis underwent only 0.0022 intron gains/gene/Bya years and since the split between D. melanogaster and D. yakuba 10 million years ago not a single intron gain could be identified, demonstrating that the rate of intron gain may vary over orders of magnitude between closely related species.
The previously proposed mechanisms of intron gain assume that new intronic sequence originates from elsewhere in the genome (reverse splicing [11],[20] or mobile elements [10],[12],[21]), or is derived from the endogenous genomic location (tandem duplication [22]–[25] or intronisation [3],[7]). Despite a rigorous search (Text S1) we could not identify an homologous parental origin for any novel intron elsewhere within the respective genomes, consistent with other studies [2],[26]. A manual inspection of the sequence flanking each novel intron identified a single event reminiscent of tandem duplication. The Bap170 gene in D. pseudoobscura has undergone a gain of 218 bp, of which only 206 bp are spliced out, revealing an imperfect 8 amino acid repeat 5′ and 3′ of this novel intron (Figure S3). While in final stage of preparing this manuscript Li et al., (2009) reported that several novel introns in Daphnia are flanked by short direct repeats [14]. They insightfully suggest this may represent the signature of nonhomologous end joining (NHEJ) after uneven double-stranded breaks (DSBs), a process known to generate insertions flanked by direct repeats [27]. In consideration of this, we note that the duplication observed here may also be explained by a direct repeat flanking sequence of unknown origin. A manual inspection of dotplots identified 6 further examples in which direct repeats of length at least 8 bp overlapping the splice sites of a novel intron (Figures S4, S5, S6, S7, S8, S9) in support of the finding of Li et al., (2009).
Reasoning that changes to the length of the coding sequence directly flanking a novel intron, as observed for Bap170, may give further insight into the mechanism of intron gain, we checked all 307 novel introns for alterations to the coding sequence that would indicate either the loss or gain of adjacent amino acids. Novel introns did not alter the ancestral coding sequence in 87% (267/307) of the cases. The remaining 13% (40/307) modified the adjacent coding sequence by only 1–3 amino acids (in 3 cases there was a gain of 4 or 5 amino acids along next to the new intron). This observation is inconsistent with the intronisation model of intron gain [3],[7] which requires the conversion of exonic sequence into an intron, hence reducing the coding sequence by the size of the new intron. A manual inspection of these 40 coding sequence-changing novel introns identified a small number of cases that can be explained via the conversion of low complexity amino acid sequence into an intron (Figure 2 and Figure S10). The novel intron within gene CG42594 has arisen from a rapidly evolving low complexity region including poly-Q sequence. Species lacking this intron show a highly variable sequence of amino acids at this position, with length differences of up to 18 amino acids. In the ancestor of D. melanogaster and D. ananassae this low complexity amino acid sequence was converted into an intron, stabilising the flanking protein sequence, while freeing the new intronic sequence of length constraint.
This indicates that the expansion of protein sequences can generate novel introns. Indels account for the majority of sequence variation between Drosophila species (3.2% of variable nucleotides vs. 1.8% for SNPs [28]) making them a significant contributor to both coding and non coding length evolution. Previous work focused on the mechanism underlying relatively short insertions (<15 bp), therefore, to access the possible contribution of exonic insertions to intron gain we identified insertions long enough to generate a novel intron (>44 bp in Drosophila). This revealed 180 insertions (Dataset S1), the largest being an insertion of 165 amino acids within the XNP gene of D. pseudoobscura. This demonstrates the plasticity of protein length and establishes large insertions within the protein coding sequence of Drosophila as a viable source of novel intronic sequence.
We reason, that a much larger number of exonic insertions occur over evolutionary time providing the raw genetic variation for the gain of novel introns. The model that novel introns arise from a subset of “random” insertions within coding regions (or indeed UTR sequences) predicts that new introns are unlikely to arise with full strength splice sites. We observe that novel introns do in fact have weaker splice sites, with significantly reduced usage of the “strong” consensus motif at both the 5′ and 3′ splice site (Figure 3A and Figure S11). Furthermore, novel introns use a more diverse set of rare 5′ motifs than expected (Figure S11 and Dataset S1). Of course, weak, rare or atypical 5′ splice sites have lower affinity to the U1 snRNP of the spliceosome [29] which, all else being equal, leads to less efficient splicing [30],[31]. This poses a conundrum; if the mutational mechanism that generates novel introns leaves them vulnerable to suboptimal splicing, why do such novel introns rise to fixation within a population? We propose that the solution lies in the action of NMD.
Retention of 3n+1 and 3n+2 introns is expected to induce NMD due to the introduction of a frame-shift, but introns of length 3n require an in-frame PTC or they will remain invisible to the NMD pathway. Because of this, we reason that the failure to splice a new 3n insertion maybe deleterious, hence we predicted that novel 3n introns are more likely to encode a PTC as a backup mechanism for incomplete splicing. As the expectation for PTC occurrence is proportional to intron length, we fitted a logistic regression, modelling intron length, intron phase and a combined main effect of 3n class (3n vs. 3n+1 and 3n+2) and whether an intron is novel (n = 307) or conserved (n = 8,810) (Text S1). Despite its simplicity, our model was highly significant (P<0.0001) and explained 24% of the variation in the occurrence of stop codons among introns. Interestingly, most of the variation was explained by phase (Wald χ2 = 331.5, P<0.0001) and not intron length (Wald χ2 = 174.2, P<0.0001). Phase 2 introns encode significantly more in-frame PTCs than either phase 0 and 1 due to the sequence requirements of the 5′ splice site. The canonical 5′ splice site GT(A/G)A restricts the first full potential codon of a phase 2 intron to either the TAA Ochre or TGA Opal stop codon. Only a minority of introns with non-canonical splice sites escape this constraint.
Our analysis indicates that selection acts against introns that are invisible to the NMD pathway (if they undergo intron retention) leading to a deficiency of 3n PTC-free introns across the genome, as previously reported [32] (Figure 3B). This verifies in Drosophila that NMD carries a significant load caused by the weak splicing of introns [16]. We also observe this deficit of 3n PTC-free introns within the 307 novel introns. Interestingly, we find that this effect is significantly stronger among novel introns than among conserved introns (Odds ratio of 3.027 for novel vs. 1.646 for conserved), supporting the central role of NMD in the establishment of newly inserted sequence as novel introns.
Here we have shown that while the expansion of amino acid repeats within exons can generate novel introns, nevertheless, the sequence origin for the vast majority remains unknown. This observation is inconsistent with previously suggested mechanisms of intron gain, but supported by the recent study of novel introns within Daphnia [14]. We have demonstrated that novel introns in Drosophila use weaker splice sites and are deficient for 3n PTC-free introns. Therefore, our evidence suggests that the establishment of these new sequences as introns is facilitated by NMD. Therefore, we propose a new model of intron gain (Figure 4), in which mutational mechanisms generate insertions that already carry the minimal requirements for correct, but not necessarily strong splicing. Cytoplasmic NMD is expected to degrade any unspliced transcript, leaving a proportion with the correct coding sequence. Conditional on adequate expression levels, this will shelter the new intron from selection allowing it to segregate within the population as a neutral polymorphism. Importantly, NMD allows new introns to utilise a more degenerate set of splice sites, thereby increasing the likelihood that any new sequence may become captured by a novel intron.
This model makes several predictions: First, novel introns are not required to pass through a protein coding intermediate stage (as would be expected from the intronisation of existing exonic sequence) and therefore, should not show codon usage bias. We observed no correlation between the “codon” usage of novel introns and the expected codon usage for Drosophila genes (Spearman Correlation Coefficient 0.01983, P = 0.8764) (Figure S12 and Text S1). Second, in general, introns with weaker splice sites are expected to suffer higher rates of failed splicing (intron retention or exon skipping), but we observe less intron retention among novel introns (2.6%) compared to conserved introns (5.3%). This is consistent with our expectation that via the action of NMD these transcripts are removed.
The “faux 3′ UTR” model suggests that PTC recognition depends on the distance to the downstream polyA tail [33],[34]. This makes NMD more potent towards the 5′ end of the transcript, leading to a third prediction; the establishment of novel introns should also be more efficient towards the 5′. As expected, we identified a strong and highly significant 5′ bias for novel introns (χ2 = 26.063, P<0.001) (Figure S13 and Text S1) in support of previous work [2]. NMD is more effective towards the 5′ as a PTC located towards the 3′ is more likely to be recognised as a canonical stop codon [7]. Hence, the involvement of NMD in the establishment of novel introns can explain the thus far enigmatic 5′ bias observed within a number of species [2],[7],[35]. The 803 lost introns reported here show no positional bias (χ2 = 1.309, P = 0.2526), consistent with previous reports [2],[18],[36]. In addition to 3′ UTR length the exon junction complex can invoke NMD in mammals. In effect this allows the recognition of PTCs in close proximity to the polyA tail, enhancing the effectiveness of NMD towards the 3′ of a transcript. Testing the influence of this on the distribution of novel introns is difficult due to their scarcity, but we note that mammalian genomes do not show the 5′ bias among all intron seen in Drosophila [35].
A significant question remains why does the rate of intron gain vary so much between closely related species? While differences in the action and potency of NMD are likely to exist between highly divergent taxa, we do not expect much variation on the fine scale of the Drosophila phylogeny. In contrast, the mutational processes that generate repeat expansions, tandem duplications [13], insertions of unknown origin [37] and DSBs are known to vary greatly between both closely and distantly related species. Differences in these underlying mechanisms will generate species specific variation upon which our proposed mechanism of intron fixation may act. This offers a possible explanation for the variation in intron gain rates observed here and over longer periods of eukaryotic evolution.
Methods
Discovery and validation of novel introns
Our approach to studying intron evolution is based on identifying gene orthologs across the Drosophila clade, predicting gene structure with GeneWise and using Dollo Parsimony to infer intron gain and loss events (Figure S1).
We identified orthologous genes using the D. melanogaster (release 4.3) gene set as the basis of a best-bidirectional-blast-hit approach in the 11 other sequenced Drosophila species, namely; D. erecta, D. yakuba, D. ananassae, D. pseudoobscura, D. willistoni, D. virilis, D. mojavensis and D. grimshawi (obtained from http://rana.lbl.gov/drosophila/). We excluded D. sechellia, D. simulans and D. persimilis because of low sequence coverage [38]. We acknowledge that a bidirectional-blast approach carries limitations but given our subsequent validation of intron turnover events feel this method was suitable. High-scoring segment pairs (HSPs) were identified via forward tblastx with default parameters followed by reverse tblastx using sequence cropped on either side of the best hit equivalent to the length of the corresponding gene in D. melanogaster. We considered the HSPs to be orthologous when the reverse blast identified only the parental gene in D. melanogaster.
Exon-intron structure of orthologous genes was generated by submitting to GeneWise [39] (2193 algorithm) the longest amino acid isoform of each D. melanogaster gene together with 100kb of nucleotide sequence flanking the corresponding orthologous hit. We excluded any gene with a frameshift mutation (either real or due to sequencing errors). Intron gain and loss events were predicted using the Malin java application [40]. The dense phylogeny of sequenced Drosophila genomes increases the power of Dollo Parsimony to accurately infer intron gain events, reducing the advantages of maximum likely methods [41]. Along two branches of the phylogeny (leading to D. willistoni and D. grimshawi) Dollo Parsimony remains sensitive to multiple losses being inferred as intron gain, but given the active debate about the best methods to infer intron turnover [42] we feel our approach and extensive downstream validation have proved reliable.
As our approach relies on de novo gene structure prediction via GeneWise it is sensitive to false positive and false negative intron prediction in other species. This problem was avoided in a previous study by considering only introns present in the well annotated D. melanogaster lineage [2]. Our approach takes full advantage of the multiple sequences genomes to find intron gain events outside of D. melanogaster, but required extensive validation to overcome the several limitations of GeneWise (detailed in Text S1 and Dataset S1).
This approach generated a high confidence set of 3,593 fully annotated orthologous genes (containing 8,810 introns) across nine Drosophila species, allowing us to identify intron gain and loss events across 40Mys of Drosophila evolution. Our approach is based on the amino acid sequence in D. melanogaster and is therefore not able to predict UTR introns. After this we still expected our data set to contain false positives (predicted novel introns that are not really introns) and false negatives (real introns that have been missed). Our experimental and informatic methods for their identification and exclusion are detailed in the Text S1. Novel intron sequences and gene, protein and intronic sequences for our orthologous gene set are available for download at http://i122server.vu-wien.ac.at/Drosophila_annotation/.
The strength of novel splice sites
As per previous studies [43],[44], we used the percentage of introns with the consensus 5′ splice site GT(A/G)AGT (position +1 to +6) as a measure of the splice site strength within each class of introns. To confirm that novel introns use this motif significantly less than all introns we resampled (bootstrap with replacement) 307 introns from the population of 50,836 D. melanogaster introns 10,000 times (Figure S11A). The top and bottom 2.5% of samples gave the 95% confidence intervals on the observed percentages for all introns. The observed percentage of novel introns fell outside these confidence intervals establishing significance. Resampling (307 from 307, with replacement) from novel introns (black bars in Figure S11A and S11B) gives an indication of the variance within novel introns, but is not actually required to establish the significance between all and novel. We repeated this approach for the CAG motif at −3 to −1 of the 3′ splice site (Figure S11B). To show that novel introns use a more diverse set of rare/weak motifs at the 5′ we used the same bootstrap data from above and counted the number of different motifs present in each sample (Figure S11C).
Supporting Information
Acknowledgments
We thank J. Heraud, A. McGregor, T. Flatt, and other members of the Institut für Populationsgenetik for critical discussions and comments on the manuscript.
Footnotes
The authors have declared that no competing interests exist.
This work was supported by grants of the Fonds zur Förderung der wissenschaftlichen Forschung (P19832 L403) awarded to CS and Eurasia-Pacific Uninet PhD scholarship awarded to LH. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1.Tarrio R, Ayala FJ, Rodriguez-Trelles F. Alternative splicing: a missing piece in the puzzle of intron gain. Proc Natl Acad Sci U S A. 2008;105:7223–7228. doi: 10.1073/pnas.0802941105. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Coulombe-Huntington J, Majewski J. Intron Loss and Gain in Drosophila. Molecular Biology and Evolution. 2007;24:2842–2850. doi: 10.1093/molbev/msm235. [DOI] [PubMed] [Google Scholar]
- 3.Irimia M, Rukov JL, Penny D, Vinther J, Garcia-Fernandez J, et al. Origin of introns by ‘intronization’ of exonic sequences. Trends Genet. 2008;24:378–381. doi: 10.1016/j.tig.2008.05.007. [DOI] [PubMed] [Google Scholar]
- 4.Ragg H, Kumar A, Koster K, Bentele C, Wang Y, et al. Multiple gains of spliceosomal introns in a superfamily of vertebrate protease inhibitor genes. BMC Evol Biol. 2009;9:208. doi: 10.1186/1471-2148-9-208. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Cavalier-Smith T. Intron phylogeny: a new hypothesis. Trends Genet. 1991;7:145–148. [PubMed] [Google Scholar]
- 6.Toor N, Keating K, Taylor S, Pyle A. Crystal Structure of a Self-Spliced Group II Intron. Science. 2008:77–82. doi: 10.1126/science.1153803. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Catania F, Lynch M. Where do introns come from? PLoS Biol. 2008;6:e283. doi: 10.1371/journal.pbio.0060283. doi: 10.1371/journal.pbio.0060283. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Zhang C, Li WH, Krainer AR, Zhang MQ. RNA landscape of evolution for optimal exon and intron discrimination. Proc Natl Acad Sci U S A. 2008;105:5797–5802. doi: 10.1073/pnas.0801692105. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Wahl MC, Will CL, Luhrmann R. The spliceosome: design principles of a dynamic RNP machine. Cell. 2009;136:701–718. doi: 10.1016/j.cell.2009.02.009. [DOI] [PubMed] [Google Scholar]
- 10.Cavalier-Smith T. Selfish DNA and the origin of introns. Nature. 1985;315:283–284. doi: 10.1038/315283b0. [DOI] [PubMed] [Google Scholar]
- 11.Roy SW, Irimia M. Mystery of intron gain: new data and new models. Trends Genet. 2009;25:67–73. doi: 10.1016/j.tig.2008.11.004. [DOI] [PubMed] [Google Scholar]
- 12.Pavesi G, Zambelli F, Caggese C, Pesole G. Exalign: a new method for comparative analysis of exon-intron gene structures. Nucleic Acids Res. 2008;36:e47. doi: 10.1093/nar/gkn153. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Gao X, Lynch M. Ubiquitous internal gene duplication and intron creation in eukaryotes. Proc Natl Acad Sci U S A. 2009 doi: 10.1073/pnas.0911093106. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Li, Tucker, Sung, Thomas k, Lynch Extensive, Recent Intron Gains in Daphnia Population. Science. 2009;326:1260–1262. doi: 10.1126/science.1179302. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Catania F, Gao X, Scofield DG. Endogenous mechanisms for the origins of spliceosomal introns. J Hered. 2009;100:591–596. doi: 10.1093/jhered/esp062. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Sayani S, Janis M, Lee CY, Toesca I, Chanfreau GF. Widespread impact of nonsense-mediated mRNA decay on the yeast intronome. Mol Cell. 2008;31:360–370. doi: 10.1016/j.molcel.2008.07.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Hansen KD, Lareau LF, Blanchette M, Green RE, Meng Q, et al. Genome-wide identification of alternative splice forms down-regulated by nonsense-mediated mRNA decay in Drosophila. PLoS Genet. 2009;5:e1000525. doi: 10.1371/journal.pgen.1000525. doi: 10.1371/journal.pgen.1000525. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Nielsen C, Friedman B, Birren B, Burge C, Galagan J. Patterns of Intron Gain and Loss in Fungi. Plos Biol. 2004;2:e422. doi: 10.1371/journal.pbio.0020422. doi: 10.1371/journal.pbio.0020422. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Rogozin IB, Wolf YI, Sorokin AV, Mirkin BG, Koonin EV. Remarkable interkingdom conservation of intron positions and massive, lineage-specific intron loss and gain in eukaryotic evolution. Curr Biol. 2003;13:1512–1517. doi: 10.1016/s0960-9822(03)00558-x. [DOI] [PubMed] [Google Scholar]
- 20.Palmer JD, Logsdon JM., Jr The recent origins of introns. Curr Opin Genet Dev. 1991;1:470–477. doi: 10.1016/s0959-437x(05)80194-7. [DOI] [PubMed] [Google Scholar]
- 21.Crick F. Split genes and RNA splicing. Science. 1979;204:264–271. doi: 10.1126/science.373120. [DOI] [PubMed] [Google Scholar]
- 22.Rogers JH. How were introns inserted into nuclear genes? Trends Genet. 1989;5:213–216. doi: 10.1016/0168-9525(89)90084-x. [DOI] [PubMed] [Google Scholar]
- 23.Venkatesh B, Ning Y, Brenner S. Late changes in spliceosomal introns define clades in vertebrate evolution. Proc Natl Acad Sci U S A. 1999;96:10267–10271. doi: 10.1073/pnas.96.18.10267. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Knowles DG, McLysaght A. High rate of recent intron gain and loss in simultaneously duplicated Arabidopsis genes. Mol Biol Evol. 2006;23:1548–1557. doi: 10.1093/molbev/msl017. [DOI] [PubMed] [Google Scholar]
- 25.Sharpton T, Neafsey D, Galagan J, Taylor J. Mechanisms of intron gain and loss in Cryptococcus. Genome Biol. 2008;9:R24. doi: 10.1186/gb-2008-9-1-r24. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Fedorov A, Roy S, Fedorova L, Gilbert W. Mystery of intron gain. Genome Res. 2003;13:2236–2241. doi: 10.1101/gr.1029803. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Hazkani-Covo E, Covo S. Numt-mediated double-strand break repair mitigates deletions during primate genome evolution. PLoS Genet. 2008;4:e1000237. doi: 10.1371/journal.pgen.1000237. doi: 10.1371/journal.pgen.1000237. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Tanay A, Siggia ED. Sequence context affects the rate of short insertions and deletions in flies and primates. Genome Biol. 2008;9:R37. doi: 10.1186/gb-2008-9-2-r37. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Roca X, Sachidanandam R, Krainer AR. Determinants of the inherent strength of human 5′ splice sites. Rna. 2005;11:683–698. doi: 10.1261/rna.2040605. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Freund M, Hicks MJ, Konermann C, Otte M, Hertel KJ, et al. Extended base pair complementarity between U1 snRNA and the 5′ splice site does not inhibit splicing in higher eukaryotes, but rather increases 5′ splice site recognition. Nucleic Acids Res. 2005;33:5112–5119. doi: 10.1093/nar/gki824. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Ding W, Lin L, Ren F, Zou H, Duan Z, et al. Effects of splice sites on the intron retention in histamine H3 receptors from rats and mice. J Genet Genomics. 2009;36:475–482. doi: 10.1016/S1673-8527(08)60137-X. [DOI] [PubMed] [Google Scholar]
- 32.Jaillon O, Bouhouche K, Gout J, Aury J, Noel B, et al. Translational control of intron splicing in eukaryotes. Nature. 2008;451:359–362. doi: 10.1038/nature06495. [DOI] [PubMed] [Google Scholar]
- 33.Brogna S, Wen J. Nonsense-mediated mRNA decay (NMD) mechanisms. Nat Struct Mol Biol. 2009;16:107–113. doi: 10.1038/nsmb.1550. [DOI] [PubMed] [Google Scholar]
- 34.Stalder L, Muhlemann O. The meaning of nonsense. Trends Cell Biol. 2008;18:315–321. doi: 10.1016/j.tcb.2008.04.005. [DOI] [PubMed] [Google Scholar]
- 35.Jeffares D, Mourier T, Penny D. The biology of intron gain and loss. Trends in Genetics. 2006;22:16–22. doi: 10.1016/j.tig.2005.10.006. [DOI] [PubMed] [Google Scholar]
- 36.Lin H, Zhu W, Silva J, Gu X, Buell C. Intron gain and loss in segmentally duplicated genes in rice. Genome Biol. 2006;7:R41. doi: 10.1186/gb-2006-7-5-r41. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Chen JQ, Wu Y, Yang H, Bergelson J, Kreitman M, et al. Variation in the ratio of nucleotide substitution and indel rates across genomes in mammals and bacteria. Mol Biol Evol. 2009;26:1523–1531. doi: 10.1093/molbev/msp063. [DOI] [PubMed] [Google Scholar]
- 38.Hahn M, Han M, Han S. Gene Family Evolution across 12 Drosophila Genomes. PLoS Genet. 2007;3:e197. doi: 10.1371/journal.pgen.0030197. doi: 10.1371/journal.pgen.0030197. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Birney E, Clamp M, Durbin R. GeneWise and Genomewise. Genome Res. 2004;14:988–995. doi: 10.1101/gr.1865504. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Csuros M, Holey J, Rogozin I. In search of lost introns. Bioinformatics. 2007;23:i87–i96. doi: 10.1093/bioinformatics/btm190. [DOI] [PubMed] [Google Scholar]
- 41.Roy SW, Gilbert W. Rates of intron loss and gain: implications for early eukaryotic evolution. Proc Natl Acad Sci U S A. 2005;102:5773–5778. doi: 10.1073/pnas.0500383102. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Stajich JE, Dietrich FS, Roy SW. Comparative genomic analysis of fungal genomes reveals intron-rich ancestors. Genome Biol. 2007;8:R223. doi: 10.1186/gb-2007-8-10-r223. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Stephens RM, Schneider TD. Features of spliceosome evolution and function inferred from an analysis of the information at human splice sites. J Mol Biol. 1992;228:1124–1136. doi: 10.1016/0022-2836(92)90320-j. [DOI] [PubMed] [Google Scholar]
- 44.Irimia M, Roy SW. Evolutionary convergence on highly-conserved 3′ intron structures in intron-poor eukaryotes and insights into the ancestral eukaryotic genome. PLoS Genet. 2008;4:e1000148. doi: 10.1371/journal.pgen.1000148. doi: 10.1371/journal.pgen.1000148. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Hiller M, Platzer M. Widespread and subtle: alternative splicing at short-distance tandem sites. Trends in Genetics. 2008;24:246–255. doi: 10.1016/j.tig.2008.03.003. [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.