Nucleic Acids Research. 1999 Aug 1;27(15):3219–3228. doi: 10.1093/nar/27.15.3219

Intron-exon structures of eukaryotic model organisms.

M Deutsch, M Long
PMCID: PMC148551  PMID: 10454621

Abstract

To investigate the distribution of intron-exon structures of eukaryotic genes, we have constructed a general exon database comprising all available intron-containing genes and exon databases from 10 eukaryotic model organisms: Homo sapiens, Mus musculus, Gallus gallus, Rattus norvegicus, Arabidopsis thaliana, Zea mays, Schizosaccharomyces pombe, Aspergillus, Caenorhabditis elegans and Drosophila. We purged redundant genes to avoid the possible bias brought about by redundancy in the databases. After discarding those questionable introns that do not contain correct splice sites, the final database contained 17 102 introns, 21 019 exons and 2903 independent or quasi-independent genes. On average, a eukaryotic gene contains 3.7 introns per kb protein coding region. The exon distribution peaks around 30-40 residues and most introns are 40-125 nt long. The variable intron-exon structures of the 10 model organisms reveal two interesting statistical phenomena, which cast light on some previous speculations. (i) Genome size seems to be correlated with total intron length per gene. For example, invertebrate introns are smaller than those of human genes, while yeast introns are shorter than invertebrate introns. However, this correlation is weak, suggesting that other factors besides genome size may also affect intron size. (ii) Introns smaller than 50 nt are significantly less frequent than longer introns, possibly resulting from a minimum intron size requirement for intron splicing.
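
The filtering and density figures above can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not define "correct splice sites", so the GT...AG boundary rule used here is an assumption, as are the function names, the toy sequence, and the simplification that every exon base counts as protein-coding.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end) on the gene sequence


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Return the gaps between consecutive exons as half-open intervals."""
    ordered = sorted(exons)
    return [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
        if next_start > prev_end
    ]


def has_canonical_sites(seq: str, intron: Interval) -> bool:
    """Assumed criterion: the intron starts with GT and ends with AG."""
    start, end = intron
    body = seq[start:end].upper()
    return len(body) >= 4 and body.startswith("GT") and body.endswith("AG")


def introns_per_kb(n_introns: int, coding_length_nt: int) -> float:
    """Intron density: number of introns per 1000 nt of coding sequence."""
    return n_introns / (coding_length_nt / 1000.0)


# Hypothetical toy gene: three 6-nt exons separated by two 12-nt introns.
# The first intron has GT...AG boundaries; the second (CT...AA) does not.
toy_seq = "ATGGCC" + "GTAAGTTTTCAG" + "GGATCC" + "CTAAAATTTCAA" + "GGTTAA"
toy_exons = [(0, 6), (18, 24), (36, 42)]

all_introns = introns_from_exons(toy_exons)
kept_introns = [i for i in all_introns if has_canonical_sites(toy_seq, i)]
coding_length = sum(end - start for start, end in toy_exons)
density = introns_per_kb(len(kept_introns), coding_length)
```

On this reading, a density of 3.7 introns per kb corresponds to, for example, 37 introns across 10 000 nt of coding sequence, which is what `introns_per_kb(37, 10000)` computes.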

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press