Nucleic Acids Research. 1996 Sep 1;24(17):3439–3452. doi: 10.1093/nar/24.17.3439

Splice site prediction in Arabidopsis thaliana pre-mRNA by combining local and global sequence information.

S M Hebsgaard, P G Korning, N Tolstrup, J Engelbrecht, P Rouzé, S Brunak
PMCID: PMC146109  PMID: 8811101

Abstract

Artificial neural networks have been combined with a rule-based system to predict intron splice sites in the dicot plant Arabidopsis thaliana. A two-step prediction scheme, in which a global prediction of the coding potential regulates a cutoff level for a local prediction of splice sites, is refined by rules based on splice site confidence values, prediction scores, coding context and distances between potential splice sites. In this approach, the predictions of splice sites mutually affect each other in a non-local manner. The combined approach drastically reduces the large number of false positive splice sites that normally plague splice site prediction. An analysis of the errors made by the networks in the first step of the method revealed a previously unknown feature: a frequent T-tract prolongation containing cryptic acceptor sites in the 5' end of exons. The method presented here has been compared with three other approaches: GeneFinder, GeneMark and Grail. Overall, the method presented here is an order of magnitude better. We show that the new method is able to find a donor site in the coding sequence for the jellyfish Green Fluorescent Protein, exactly at the position that was experimentally observed in A. thaliana transformants. Predictions for alternatively spliced genes are also presented, together with examples of genes from other dicots, monocots and algae. The method has been made available through electronic mail (NetPlantGene@cbs.dtu.dk), or the WWW at http://www.cbs.dtu.dk/NetPlantGene.html
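The first step of the scheme described in the abstract, a global coding-potential signal regulating the cutoff applied to local splice site scores, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`donor_cutoffs`, `predict_donors`), the window size, and the `base`, `gain` and `floor` parameters are hypothetical choices made only for illustration, and the rule-based refinement step is omitted. The sketch assumes that a donor site is more plausible where the coding potential drops (an exon-to-intron transition), so the cutoff is lowered there.

```python
from typing import List, Sequence


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean, defined as 0.0 for an empty window."""
    return sum(values) / len(values) if values else 0.0


def donor_cutoffs(coding: Sequence[float], window: int = 3,
                  base: float = 0.9, gain: float = 0.5,
                  floor: float = 0.3) -> List[float]:
    """Per-position cutoff for donor candidates (hypothetical parameters).

    The cutoff starts at `base` and is lowered in proportion to the drop in
    mean coding potential from the upstream window to the downstream window.
    It never falls below `floor`.
    """
    cutoffs = []
    for i in range(len(coding)):
        upstream = mean(coding[max(0, i - window):i])
        downstream = mean(coding[i:i + window])
        drop = max(0.0, upstream - downstream)  # only decreases lower the cutoff
        cutoffs.append(max(floor, base - gain * drop))
    return cutoffs


def predict_donors(local_scores: Sequence[float], coding: Sequence[float],
                   **kwargs: float) -> List[int]:
    """Return positions whose local score reaches the position-specific cutoff."""
    cutoffs = donor_cutoffs(coding, **kwargs)
    return [i for i, (score, cutoff) in enumerate(zip(local_scores, cutoffs))
            if score >= cutoff]


# Toy data: coding potential falls from 1 to 0 at position 4.
coding = [1, 1, 1, 1, 0, 0, 0, 0]
# Two candidates with the same local score, at positions 1 and 4.
local_scores = [0.1, 0.7, 0.1, 0.1, 0.7, 0.1, 0.1, 0.1]
print(predict_donors(local_scores, coding))
```

In this toy input, the two candidates have identical local scores, but only the one at the coding-to-noncoding transition passes, because the cutoff is lowered there. A fixed cutoff would have to accept both or reject both, which is the false-positive trade-off that a coding-dependent cutoff is meant to ease.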

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
