Proceedings of the National Academy of Sciences of the United States of America. 1999 Aug 31;96(18):10267–10271. doi: 10.1073/pnas.96.18.10267

Late changes in spliceosomal introns define clades in vertebrate evolution

Byrappa Venkatesh, Yana Ning, Sydney Brenner
PMCID: PMC17877  PMID: 10468597

Abstract

The evolutionary origin of spliceosomal introns has been the subject of much controversy. Introns are proposed to have been both lost and gained during evolution. If intron gains and losses are unique evolutionary events, they can serve as markers for phylogenetic analysis. We have made an extensive survey of the phylogenetic distribution of seven spliceosomal introns that are present in Fugu genes but not in their mammalian homologues, and we show that these introns were acquired by actinopterygian (ray-finned) fishes at various stages of evolution. We have also investigated the intron pattern of the rhodopsin gene in fishes and show that the four introns found in the ancestral chordate rhodopsin gene were simultaneously lost in a common ancestor of ray-finned fishes. These changes in introns are excellent markers for phylogenetic analysis because they reliably define clades. Our intron-based cladogram resolves phylogenetic relationships of some ray-finned fishes that have been difficult to ascertain. For example, it shows that bichirs (Polypterus) are the sister group of all other extant ray-finned fishes.


Two competing theories have been proposed to explain the origin of spliceosomal introns, which are widespread in eukaryotic genomes but absent from prokaryotes. The “introns early” theory states that introns are ancient and have been lost in different lineages (1, 2). The “introns late” theory, on the other hand, maintains that spliceosomal introns were inserted into eukaryotic genes later in evolution (3–6). Although the distribution of intron phases and the correlation between intron positions and protein module boundaries have been proposed as evidence for an ancient origin of introns (2), the restricted phylogenetic distribution of some introns suggests that they arose late in evolution (7–10).

We and others have identified extra spliceosomal introns in the pufferfish (Fugu) and some other teleosts that are absent from mammals (refs. 11–18; B. Peixoto and S.B., unpublished work). Likewise, extra introns have been found in some mammalian genes (16, 19, 20). These discordant introns could result either from loss of ancestral introns or from gain of novel introns in different lineages. Because spliceosomal introns are not self-splicing and are not known to be mobile, the loss or gain of a spliceosomal intron in a lineage is likely to be a unique event, occurring at a specific point in its evolution; hence it might serve as a decisive marker for evolutionary studies. To determine whether the extra vertebrate introns result from loss or from gain, we made an extensive survey of the phylogenetic distribution of seven of these introns: one each from the growth hormone gene (15), the major histocompatibility class II B-chain (MhcII) gene (13), and the mixed lineage leukemia-like (Mll) gene (16), and two each from the dystrophin gene (14) and the RAG1 gene (21). We also analyzed the distribution of introns in the rhodopsin gene, which has been shown to contain introns in mammals (22) and in primitive chordates such as lampreys (23) and skates (24), but not in some teleosts (25).

METHODS

PCR. Genomic DNA was extracted from fresh or frozen tissues by using standard protocols. Gene fragments were amplified by PCR with AmpliTaq DNA polymerase (Perkin–Elmer) or the Expand Long Template PCR system (Boehringer Mannheim). A typical PCR program consisted of an initial denaturing step of 92°C for 2 min; 35 cycles of 92°C for 30 sec, 55°C for 1 min, and 72°C for 2 min; and a final elongation step at 72°C for 6 min. For some combinations of primers and templates, higher (60°C) or lower (50°C) annealing temperatures were used to optimize amplification. The following sense and antisense primers, complementary to the exons flanking each intron, were used.

Growth hormone intron 4a (Gh4a), TGYTTYAARAARGAYATG and AGRTANGTYTCNACCT flanking the Fugu growth hormone gene from codon 175 to 177 (15) (the antisense primer includes two bases complementary to AG, which are the invariant 3′ end bases of introns, and thus no PCR product was obtained when the gene lacked the intron).

Mhc class II B-chain intron 2a (Mhc2a), TGCWGYGYRTAYGRSTTCTACCC and AGGCTKGKRTGCTCCACCWRRCA (extended primers Tu97 and Tu40) (26).

Mixed lineage leukemia intron 25a (Mll25a), GCNCGNTCNAAYATGTTYTTYGG and ATRTTNCCRCARTCRTCRCTRTT flanking the Fugu Mll gene from codon 3124 to 3376 (16).

Dystrophin intron 6a (Dyst6a), ATGGCNGGNYTNCARCARAC and GCNARNCCRTCRTTCCARCT flanking the human dystrophin gene from codon 134 to 166 (27).

Dystrophin intron 10a (Dyst10a), TAYCARACNGCNYTNGARGA and TGNGTRTGRAAYTGNTCYTT flanking the human dystrophin gene from codon 350 to 376 (27).

RAG1 intron a (RAG1a), AARTTYTCNGANTGGAARTT and ACRTCNACYTTRAANACYTT flanking the zebrafish RAG1 gene from codon 27 to 157 (21).

RAG1 intron b (RAG1b), TTYGCNGARAARGARGARGG and TACATYTTRTGRTAYTGRCT flanking the zebrafish RAG1 gene from codon 456 to 511 (21).

Rhodopsin, CCNTAYGAYTAYCCNCARTAYTA and TTNCCRCARCAYAANGTNGT flanking the teleost rhodopsin gene from codon 30 to 319 (25).

The rhodopsin primers amplified the gene when no introns were present but failed to amplify it when introns were present (e.g., in Polypterus), presumably because the intron-containing gene was too large. The presence of introns in the Polypterus (bichir) rhodopsin gene was subsequently confirmed by amplifying and sequencing the second intron together with its flanking exons, using primers complementary to those exons (GTNGTNTTYACNTGGATHATGGC and CCRCANGARCAYTGCATNCCYTC).

The absence of the Gh4a intron from Torpedo (electric ray), Lepisosteus (gar), and Amia (bowfin) was confirmed by cloning and sequencing growth hormone gene fragments spanning exon 3, intron 3, and exon 4, amplified by PCR with gene-specific primers. Because dystrophin is similar in sequence to the related protein utrophin, our dystrophin primers also amplified the corresponding utrophin fragments. Utrophin fragments from all of the fishes were cloned and sequenced, and none contained introns at positions corresponding to Dyst6a and Dyst10a (data not shown).
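The primers above are degenerate: letters other than A, C, G, and T are standard IUPAC ambiguity codes (for example, Y = C or T, R = A or G, H = A, C, or T, N = any base). As an illustrative sketch that is not part of the original methods, the Python below computes how many distinct oligonucleotides a degenerate primer represents and checks whether a given unambiguous sequence is one of them. The names `degeneracy` and `matches`, and the target sequence in the usage line, are hypothetical.

```python
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct unambiguous oligonucleotides encoded by a primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def matches(primer: str, target: str) -> bool:
    """True if an unambiguous target of the same length is one of the
    oligonucleotides encoded by the degenerate primer."""
    if len(primer) != len(target):
        return False
    return all(t in IUPAC[p] for p, t in zip(primer.upper(), target.upper()))


# Primer sequences copied from the Methods; the target below is hypothetical.
PRIMERS = {
    "Gh4a sense": "TGYTTYAARAARGAYATG",
    "Gh4a antisense": "AGRTANGTYTCNACCT",
    "Rhodopsin sense": "CCNTAYGAYTAYCCNCARTAYTA",
}
for name, seq in PRIMERS.items():
    print(f"{name}: {degeneracy(seq)}-fold degenerate")
# Gh4a sense: 32-fold, Gh4a antisense: 64-fold, Rhodopsin sense: 512-fold

print(matches(PRIMERS["Gh4a sense"], "TGCTTCAAGAAGGATATG"))  # True
```

Degeneracy multiplies across ambiguous positions, so each N (fourfold) contributes far more to the pool size than a twofold code such as Y or R.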

Cloning and Sequencing.

The PCR fragments were cloned into a T-vector (modified pBluescript) and sequenced by using an ABI 373A DNA sequencer (Perkin–Elmer Applied Biosystems). All PCR fragments were sequenced completely except those containing introns larger than 1.5 kb; for such large introns, only the end sequences (about 350 bp) were determined.

RESULTS AND DISCUSSION

By using degenerate PCR primers, we amplified genomic fragments spanning the introns from representatives of most of the major groups of ray-finned fishes, and we confirmed the presence or absence of each intron by cloning and sequencing the PCR products. We also amplified the corresponding fragments from shark and torpedo (Chondrichthyes), which serve as an outgroup to ray-finned fishes and tetrapods. Fig. 1 summarizes our results and shows that most intron gain or loss events occur at unique points in the lineage.

Figure 1.


Phylogenetic distribution of spliceosomal introns. The phylogenetic tree is based on Nelson’s classification of fishes (28). A plus sign (+) indicates presence and a minus sign (−) indicates absence of an intron. An asterisk (∗) indicates that no PCR fragment was obtained, presumably because of the large size of the intron. The letter “a” indicates a failure to amplify by PCR because of the highly variable coding sequence in this region of RAG1. Data from previous studies (Gh4a, refs. 11 and 15; Mll25a, ref. 16; Dyst6a and Dyst10a, ref. 14; Mhc2a, refs. 12, 13, and 29; RAG1a and RAG1b, B. Peixoto and S.B., unpublished work and refs. 21 and 30; Rhod, refs. 22, 24, and 25) are enclosed in parentheses. Intron designations: Gh4a (15); Mll25a (16); Dyst6a and Dyst10a (14); Mhc2a (13); RAG1a and RAG1b (21); Rhod, rhodopsin gene (+ indicates presence of all four introns and − indicates absence of all four). The rhodopsin gene of the primitive lamprey (Agnatha), like mammalian rhodopsin, contains four introns (23) (not shown). The RAG1a intron was previously reported to be absent from the rainbow trout (Oncorhynchus mykiss) (30), but our PCR products from the rainbow trout and the brown trout (data not shown) contained this intron.

The MhcII locus is an apparent exception to the general rule that each intron change occurs at a single point in the lineage. Sequences without introns or with small introns can readily be amplified by PCR, but when a large intron is present we often fail to obtain a product. Thus we find MhcII fragments (more than one in each species) with no intron until we reach the paracanthopterygians, for which no product was obtained. We conclude that the paracanthopterygian MhcII gene has an intron that is too large to be amplified. In Mugil (mullet) and some other fishes, including the tetraodontoid boxfish (Ostracion), two PCR products were found, one without and one with an intron.

In the other, later-branching fishes only intron-containing fragments were found. However, four of the perciforms (seabass, Dicentrarchus; grouper, Epinephelus; blenny, Salarias; and goby, Cryptocentrus) yielded only intron-lacking fragments. Because fish commonly have multiple copies of the histocompatibility genes (29, 31, 32), only one of these copies might have acquired an intron, and either copy could have been lost in later lineages. We believe that loss of the intron-containing copy accounts for the intron-lacking genes in these four perciform fishes.

The late changes in spliceosomal introns clearly define clades in the ray-finned fish lineage and resolve some of the taxonomic problems of this large group of living vertebrates. Although a robust classification of fishes has been developed from morphological characters, the phylogenetic relationships of many closely related groups remain ambiguous because of a paucity of synapomorphies and/or a mosaic distribution of morphological characters (28, 33). Molecular data have also not been very useful in resolving these relationships (33). Our intron-based cladogram shows that the bichir (Polypterus) is the sister group of all other extant ray-finned fishes. Because this bony fish has many primitive as well as derived characters, its phylogenetic position has been uncertain. Its intron pattern unequivocally places it at the base of the ray-finned fish lineage: it has intron RAG1b, which is absent from Chondrichthyes (cartilaginous fishes) and tetrapods but present in other ray-finned fishes (Actinopterygii), and it retains the rhodopsin introns, which are present in Chondrichthyes and tetrapods but absent from all other ray-finned fishes.
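The reasoning above (take the outgroup state of each intron as ancestral, then read clades off shared derived states) can be sketched in Python. The matrix is illustrative and uses only states stated in this text: rhodopsin introns present in Chondrichthyes and Polypterus but absent from other ray-finned fishes; RAG1b absent from Chondrichthyes but present in ray-finned fishes; Gh4a absent from Torpedo, Lepisosteus, and Amia but present in Fugu. It is not the full Fig. 1 data set, and the Polypterus Gh4a state, which the text does not give, is marked unknown. Function names are hypothetical, and the compatibility test is the standard pairwise test for rooted binary characters, not necessarily the authors' procedure.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Optional

# Illustrative matrix: True = intron present, False = absent, None = not
# reported in the text. Not the full Fig. 1 data set.
MATRIX: Dict[str, Dict[str, Optional[bool]]] = {
    "Torpedo":     {"Rhod": True,  "RAG1b": False, "Gh4a": False},
    "Polypterus":  {"Rhod": True,  "RAG1b": True,  "Gh4a": None},
    "Lepisosteus": {"Rhod": False, "RAG1b": True,  "Gh4a": False},
    "Amia":        {"Rhod": False, "RAG1b": True,  "Gh4a": False},
    "Fugu":        {"Rhod": False, "RAG1b": True,  "Gh4a": True},
}
OUTGROUP = "Torpedo"  # chondrichthyan outgroup used to polarize each character


def derived_clades(matrix, outgroup) -> Dict[str, FrozenSet[str]]:
    """For each intron, treat the outgroup state as ancestral and return the
    taxa scored with the other (derived) state; unknown states are skipped."""
    clades = {}
    for char, ancestral in matrix[outgroup].items():
        clades[char] = frozenset(
            taxon
            for taxon, states in matrix.items()
            if states[char] is not None and states[char] != ancestral
        )
    return clades


def compatible(a: FrozenSet[str], b: FrozenSet[str]) -> bool:
    """Two outgroup-polarized binary characters can both define clades on a
    single tree only if their derived-state sets are nested or disjoint."""
    return a <= b or b <= a or not (a & b)


clades = derived_clades(MATRIX, OUTGROUP)
for char, taxa in sorted(clades.items(), key=lambda kv: -len(kv[1])):
    print(f"{char} derived in: {sorted(taxa)}")
print("pairwise compatible:",
      all(compatible(clades[x], clades[y]) for x, y in combinations(clades, 2)))
```

Run as is, it reports the RAG1b gain in Polypterus, gar, bowfin, and Fugu; the rhodopsin-intron loss in the same taxa minus Polypterus; and Gh4a in Fugu alone. Because these sets are nested, Polypterus falls inside Actinopterygii but outside the group defined by rhodopsin-intron loss, which is the placement argued for above.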

Our results also clearly establish the relationships among the groups Chondrichthyes, Chondrostei, Neopterygii, and Teleostei (Fig. 2A). The Chondrichthyes form the outgroup to the Actinopterygii (Chondrostei + Neopterygii + Teleostei). Among actinopterygians, Acipenseriformes (sturgeons) branched off after Polypteriformes (bichirs), and Neopterygii (gars and bowfins) and Teleostei are sister groups. These relationships agree with the morphology-based phylogeny of these fishes (34) and disagree with the mitochondrial sequence-based tree that identified Chondrichthyes and Teleostei as sister groups (35). Our cladogram also resolves the interrelationships of protacanthopterygian fishes (Fig. 2B). It shows that Osmeridae branched off after both Esociformes and Salmoniformes (whose relative positions are not resolved by our data), and that Galaxiidae branched off after Osmeridae. More extensive sampling of species within Protacanthopterygii should delimit the species in which introns Gh4a, Mll25a, and Dyst10a are present and thus provide further insight into their evolutionary relationships.
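The branching order stated in this paragraph for Fig. 2A can be written as a nested data structure and serialized to Newick format, a common plain-text tree notation. The sketch is illustrative: the `to_newick` helper is hypothetical, the tree is rooted on Chondrichthyes as the outgroup, branch lengths are omitted, and labels follow this paper's usage, in which Neopterygii denotes gars and bowfins.

```python
from typing import Tuple, Union

# A tree is either a taxon label or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def to_newick(node: Tree) -> str:
    """Serialize a nested-tuple tree into Newick notation (no branch lengths)."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


# Branching order as stated in the text for Fig. 2A, rooted on Chondrichthyes.
FIG_2A: Tree = (
    "Chondrichthyes",
    ("Polypteriformes", ("Acipenseriformes", ("Neopterygii", "Teleostei"))),
)

print(to_newick(FIG_2A) + ";")
# (Chondrichthyes,(Polypteriformes,(Acipenseriformes,(Neopterygii,Teleostei))));
```

Each nested pair corresponds to one sister-group statement in the paragraph, so the string can be read directly by standard tree viewers.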

Figure 2.


Cladogram showing the phylogenetic relationships of fishes, inferred by using the presence or absence of introns as character states. The numbers in the boxes represent introns cloned by us (1, rhodopsin; 2, RAG1b; 3, RAG1a; 4, Gh4a; 5, Mll25a; 6, Dyst10a; 7, Dyst6a; and 8, Mhc2a). B expands the Division Teleostei from A.

Our results provide direct evidence that both gain and loss of spliceosomal introns have occurred in evolution. The loss of some or all of the introns from a gene can be explained by the insertion of a reverse-transcribed, partially or completely spliced RNA into the genome. However, the events involved in the gain of spliceosomal introns are not well understood. The consensus sequence flanking five of the seven “late” introns that we have cloned (Table 1) matches the previously proposed “proto-splice” site (MAG/R) (36), which lends support to the hypothesis that new introns are gained at this site. On the basis of sequence similarity between recent and preexisting introns, it has been suggested that new introns were gained through gene conversion events (7) or by transposition of a preexisting intron to a new location (8). We have observed that some of the extra introns found here contain sequences similar to those of their 5′ flanking exons (Fig. 3). The Gh4a intron from Gadus (Atlantic cod) contains sequences corresponding to a 17-codon stretch of the exon, 9 codons of which are conserved (similar exon sequences are also found in the Gh4a intron from the Arctic cod, Boreogadus; data not shown); Mhc2a from Mugil contains two copies of sequences corresponding to 14 codons; and Mhc2a from Dissostichus contains sequences corresponding to 11 codons, 9 of which are conserved (Fig. 3).

Table 1.

The size range and phase of introns, and the consensus sequences flanking them

Intron    Size range (kb)   Phase   Flanking consensus (exon/intron)
Gh4a      0.07–1.08         0       AAG/GT
Mll25a    0.07–1.20         2       TTT/GA
Dyst6a    0.09–4.00         0       CAG/GT
Dyst10a   0.09–1.22         0       GAG/GT
Mhc2a     0.08–0.68         2       CAG/GT
RAG1a     0.09–1.27         2       CAG/GT
RAG1b     0.06–4.1          1       AAG/GC
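The statement that five of the seven junctions in Table 1 match the proto-splice site MAG/R (M = A or C, R = A or G) can be checked mechanically. In this minimal sketch (the `fits_proto_splice` name is hypothetical), the last three exon bases and the first intron base of each junction are compared with the degenerate pattern.

```python
# IUPAC codes needed for the proto-splice pattern MAG/R (ref. 36).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "R": "AG"}
EXON_PATTERN, INTRON_PATTERN = "MAG", "R"

# Exon/intron junction consensus sequences from Table 1.
TABLE_1 = {
    "Gh4a": "AAG/GT", "Mll25a": "TTT/GA", "Dyst6a": "CAG/GT",
    "Dyst10a": "GAG/GT", "Mhc2a": "CAG/GT", "RAG1a": "CAG/GT",
    "RAG1b": "AAG/GC",
}


def fits_proto_splice(junction: str) -> bool:
    """Compare the last three exon bases and first intron base with MAG/R."""
    exon, intron = junction.split("/")
    observed = exon[-len(EXON_PATTERN):] + intron[:len(INTRON_PATTERN)]
    pattern = EXON_PATTERN + INTRON_PATTERN
    return all(base in IUPAC[code] for code, base in zip(pattern, observed))


fitting = sorted(name for name, junction in TABLE_1.items() if fits_proto_splice(junction))
print(f"{len(fitting)} of {len(TABLE_1)} fit MAG/R: {fitting}")
# 5 of 7 fit MAG/R: ['Dyst6a', 'Gh4a', 'Mhc2a', 'RAG1a', 'RAG1b']
```

The two exceptions differ in degree: Dyst10a (GAG/GT) departs from MAG/R only at the M position, whereas Mll25a (TTT/GA) departs at all three exon positions.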

Figure 3.


Introns that contain sequences corresponding to their 5′ flanking exons. Coding sequences are shown in uppercase letters and intron sequences in lowercase letters. Intron sequences are aligned with their 5′ flanking exons to show the codon sequences (underlined) in the intron. Nucleotide sequences are numbered above the line. Amino acid sequences are shown in single-letter codes.

These exon sequences have undergone tandem duplication, and although the sequence conservation suggests that the duplications are recent, their existence leads us to propose a hypothesis for the origin of late introns. We suggest that tandem duplication of exon sequences could give rise to novel introns if the duplicated sequence contains or acquires the elements required for splicing. Indeed, if the proto-splice site is part of the duplicated sequence, it will automatically provide the correct sequences for the intron boundaries, without requiring recombination or transposition events acting over long distances. A further virtue of this exon-duplication hypothesis for the origin of late introns is that it can also explain the origin of novel exons for alternative splicing, thereby providing a common mechanism for both events.

Genetic events leading to the gain or loss of an intron would initially have occurred in a single individual and would then have had to become fixed. The ancestral population would thus have been polymorphic for these changes while still constituting a single species. Fixation would take place in the descendants of the polymorphic ancestor, regardless of the taxonomic position we now ascribe to that ancestor, and these descendants therefore constitute a monophyletic lineage derived from a single species. Changes in spliceosomal introns are consequently robust cladistic markers for tracing monophyletic lineages and establishing evolutionary relationships. Discordant introns can be easily identified by comparing the sequences of orthologous genes from phylogenetically distant taxa, such as Fugu and human; these introns can then be traced in many different species by PCR to identify the phylogenetic branch points.
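The screening step described above, comparing intron positions between orthologs to find discordant introns, reduces to a set difference once positions are expressed in a shared coordinate frame. This sketch assumes intron positions have already been mapped onto a codon-aligned, gap-free coding-sequence alignment of the two orthologs, and records each intron by the number of coding nucleotides upstream of it; the phase is that number modulo 3 (0 between codons, 1 or 2 within a codon). Function names and the offsets in the example are hypothetical, not data from this study.

```python
from typing import Iterable, List, Set, Tuple


def intron_sites(cds_offsets: Iterable[int]) -> Set[Tuple[int, int]]:
    """Record each intron as (offset, phase), where offset is the number of
    coding nucleotides upstream of the intron and phase = offset mod 3."""
    return {(offset, offset % 3) for offset in cds_offsets}


def discordant(query: Iterable[int], reference: Iterable[int]) -> List[Tuple[int, int]]:
    """Introns present in the query ortholog but absent from the reference.
    Assumes both offset lists are in the same codon-aligned coordinate frame."""
    return sorted(intron_sites(query) - intron_sites(reference))


# Hypothetical offsets for illustration only.
fish_like = [120, 301, 455]
mammal_like = [120, 455]
print(discordant(fish_like, mammal_like))  # [(301, 1)]
print(discordant(mammal_like, fish_like))  # []
```

Each discordant position found this way is a candidate for the PCR survey: primers are placed in the flanking exons and the intron is scored as present or absent across taxa.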

Acknowledgments

We thank C. H. Cheng-DeVries, M. Inoue, P. Linser, G. Martinez, G. Mikawa, J. G. Patil, H. S. Pradeep, S. H. San Ling, N. Takamatsu, S. M. Veeranna, Y. Wakamatsu, S. Winkler, and G. Yearsley for fish DNA; and we are grateful to B. H. Tay and B. Y. Goh for technical help in cloning and sequencing.

ABBREVIATIONS

Dyst6a and Dyst10a

novel introns found in the Fugu dystrophin gene

Gh4a

fifth intron in the growth hormone gene

Mhc2a

intron 2a in the Mhc class II β-chain gene

Mll25a

intron 25a in the Fugu mixed lineage leukemia gene

RAG1a and RAG1b

novel introns in the RAG1 gene

Footnotes

Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. AF134595–AF134630; AF134919–AF134976; AF137083–AF137130; AF137132–AF137262; AF142553–AF142564; and AF148142–AF148144).

References

1. Gilbert W, Marchionni M, McKnight G. Cell. 1986;46:151–154. doi: 10.1016/0092-8674(86)90730-0.
2. Gilbert W, de Souza S J, Long M. Proc Natl Acad Sci USA. 1997;94:7698–7703. doi: 10.1073/pnas.94.15.7698.
3. Palmer J D, Logsdon J M. Curr Opin Genet Dev. 1991;1:470–477. doi: 10.1016/s0959-437x(05)80194-7.
4. Cavalier-Smith T. Trends Genet. 1991;7:145–148.
5. Cho G, Doolittle R F. J Mol Evol. 1997;44:573–584. doi: 10.1007/pl00006180.
6. Logsdon J M. Curr Opin Genet Dev. 1998;8:637–648. doi: 10.1016/s0959-437x(98)80031-2.
7. Hankeln T, Fried H, Ebersberger I, Martin J, Schmidt E R. Gene. 1997;205:151–160. doi: 10.1016/s0378-1119(97)00518-0.
8. Tarrio R, Rodriguez-Trelles F, Ayala F J. Proc Natl Acad Sci USA. 1998;95:1658–1662. doi: 10.1073/pnas.95.4.1658.
9. O’Neill R J W, Brennan F E, Delbridge M L, Crozier R H, Graves J A M. Proc Natl Acad Sci USA. 1998;95:1653–1657. doi: 10.1073/pnas.95.4.1653.
10. Frugoli J A, McPeek M A, Thomas T L, McClung C R. Genetics. 1998;149:355–365. doi: 10.1093/genetics/149.1.355.
11. Agellon L B, Davies S L, Chen T T, Powers D A. Proc Natl Acad Sci USA. 1988;85:5136–5140. doi: 10.1073/pnas.85.14.5136.
12. Figueroa F, Ono H, Tichy H, O’hUigin C, Klein J. Proc R Soc Lond Ser B. 1995;259:325–330. doi: 10.1098/rspb.1995.0048.
13. Lim E H, Brenner S. Immunogenetics. 1995;42:432–433. doi: 10.1007/BF00179410.
14. Elgar G. Ph.D. thesis. Cambridge, U.K.: Cambridge Univ.; 1994.
15. Venkatesh B, Brenner S. Gene. 1997;187:211–215. doi: 10.1016/s0378-1119(96)00750-0.
16. Caldas C, Kim M-H, MacGregor A, Cain D, Aparicio S, Wiedemann L M. Oncogene. 1998;16:3233–3241. doi: 10.1038/sj.onc.1201873.
17. Gottgens B, Gilbert J G R, Barton L M, Aparicio S, Hawker K, Mistry S, Vaudin M, King A, Bentley D, Elgar G, et al. Genomics. 1998;48:52–62. doi: 10.1006/geno.1997.5162.
18. Sandford R, Sgotto B, Aparicio S, Brenner S, Vaudin M, Wilson R K, Chissoe S, Pepin K, Bateman A, Chothia C, et al. Hum Mol Genet. 1997;6:1483–1489. doi: 10.1093/hmg/6.9.1483.
19. Armes N, Gilley J, Fried M. Genome Res. 1997;7:1138–1152. doi: 10.1101/gr.7.12.1138.
20. Tassone R, Villard L, Clancy K, Gardiner K. Gene. 1999;226:211–223. doi: 10.1016/s0378-1119(98)00559-9.
21. Willet C E, Zapata A G, Hopkins N, Steiner L A. Immunogenetics. 1997;45:394–404. doi: 10.1007/s002510050221.
22. Baehr W, Falk J D, Bugra K, Triantafyllos J T, McGinnis J F. FEBS Lett. 1988;238:253–256. doi: 10.1016/0014-5793(88)80490-3.
23. Zhang H, Yokoyama S. Gene. 1997;191:1–6. doi: 10.1016/s0378-1119(96)00864-5.
24. O’Brien J, Ripps H, Al-Ubaidi M R. Gene. 1997;193:141–150. doi: 10.1016/s0378-1119(97)00079-6.
25. Fitzgibbon J, Hope A, Slobodyanyuk S J, Bellingham J, Bowmaker J K, Hunt D M. Gene. 1995;164:273–277. doi: 10.1016/0378-1119(95)00458-i.
26. Betz U A K, Mayer W E, Klein J. Proc Natl Acad Sci USA. 1994;91:11065–11069. doi: 10.1073/pnas.91.23.11065.
27. Nobile C, Marchi J, Nigro V, Roberts R G, Danieli G A. Genomics. 1997;45:421–424. doi: 10.1006/geno.1997.4911.
28. Nelson J S. Fishes of the World. New York: Wiley; 1994.
29. Ono H, Klein D, Vincek V, Figueroa F, O’hUigin C, Tichy H, Klein J. Proc Natl Acad Sci USA. 1992;89:11886–11890. doi: 10.1073/pnas.89.24.11886.
30. Hansen J D, Kaattari S L. Immunogenetics. 1995;42:188–195. doi: 10.1007/BF00191224.
31. Ono H, O’hUigin C, Vincek V, Klein J. Immunogenetics. 1993;38:223–234. doi: 10.1007/BF00211522.
32. McConnell T J, Godwin U B, Cuthbertson B J. Immunol Rev. 1998;166:294–300. doi: 10.1111/j.1600-065x.1998.tb01270.x.
33. Stepien C A, Kocher T D. In: Molecular Systematics of Fishes. Kocher T D, Stepien C A, editors. San Diego: Academic; 1997. pp. 1–11.
34. De Pinna M C C. In: Interrelationships of Fishes. Stiassny M L J, Parenti L R, Johnson G D, editors. San Diego: Academic; 1996. pp. 147–162.
35. Rasmussen A, Arnason U. Proc Natl Acad Sci USA. 1999;96:2177–2182. doi: 10.1073/pnas.96.5.2177.
36. Dibb N J, Newman A J. EMBO J. 1989;8:2015–2021. doi: 10.1002/j.1460-2075.1989.tb03609.x.
