Abstract
Pre-mRNA splicing in plants, while generally similar to the processes in vertebrates and yeast, is thought to involve plant-specific cis-acting elements. Both monocot and dicot introns are typically strongly enriched in U nucleotides, and AU- or U-rich segments are thought to be involved in intron recognition, splice site selection, and splicing efficiency. We have applied logitlinear models to find optimal combinations of splice site variables that separate true splice sites from a large excess of potential sites. We show that plant splice site prediction from sequence inspection is greatly improved when compositional contrast between exons and introns is considered in addition to the degree of matching to the splice site consensus (signal quality). The best model involves subclassification of splice sites according to the identity of the base immediately upstream of the GU and AG signals and gives substantial performance gains compared with conventional profile methods.
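The idea of combining signal quality with compositional contrast in a logit-linear model can be illustrated with a minimal sketch for donor (GU) sites. A logit-linear model sets logit(P) = b0 + b1*S + b2*C, where S is signal quality and C is compositional contrast, and the subclassification is modelled by keeping a separate coefficient set for each base immediately upstream of the GU. Everything specific below is an assumption made for illustration and is not taken from the paper: the consensus string `AGGUAAGU`, the 50-nt exon and intron windows, the use of U content as the compositional measure, the helper names (`signal_quality`, `compositional_contrast`, `donor_probability`), and the coefficient values, which are invented rather than fitted.

```python
import math

# Hypothetical plant donor consensus: exon positions -2..-1 followed by intron
# positions +1..+6 (the GU dinucleotide occupies +1 and +2). Illustrative only.
DONOR_CONSENSUS = "AGGUAAGU"
EXON_PART = 2  # number of consensus positions that lie in the exon

# Hypothetical (b0, b_signal, b_contrast) coefficients, one set per identity of
# the base immediately upstream of the GU. A real model would be fitted to
# labelled true and false sites; these numbers are made up.
COEFFS = {
    "G": (-12.0, 10.0, 8.0),
    "A": (-13.0, 10.0, 9.0),
    "C": (-14.0, 11.0, 9.0),
    "U": (-14.0, 11.0, 10.0),
}


def signal_quality(seq, gu_pos):
    """Fraction of consensus positions matched around the GU starting at gu_pos."""
    start = gu_pos - EXON_PART
    window = seq[start:start + len(DONOR_CONSENSUS)]
    matches = sum(a == b for a, b in zip(window, DONOR_CONSENSUS))
    return matches / len(DONOR_CONSENSUS)


def u_fraction(s):
    """Fraction of U nucleotides in s (0.0 for an empty string)."""
    return s.count("U") / len(s) if s else 0.0


def compositional_contrast(seq, gu_pos, width=50):
    """U content of the downstream (intron) window minus the upstream (exon) window."""
    exon = seq[max(0, gu_pos - width):gu_pos]
    intron = seq[gu_pos:gu_pos + width]
    return u_fraction(intron) - u_fraction(exon)


def donor_probability(seq, gu_pos, width=50):
    """Logistic probability that the GU at 0-based gu_pos is a true donor site."""
    seq = seq.upper().replace("T", "U")  # accept DNA or RNA input
    if gu_pos < EXON_PART or seq[gu_pos:gu_pos + 2] != "GU":
        return 0.0  # not a candidate donor site
    upstream = seq[gu_pos - 1]
    if upstream not in COEFFS:
        return 0.0  # ambiguous base (e.g. N): no subclass model available
    b0, b_sig, b_con = COEFFS[upstream]
    eta = (b0
           + b_sig * signal_quality(seq, gu_pos)
           + b_con * compositional_contrast(seq, gu_pos, width))
    return 1.0 / (1.0 + math.exp(-eta))


# Usage: a GC-rich "exon" followed by a U-rich "intron".
exon = "GCGGCCGCAG" * 5  # 50 nt, no U
intron = "GUAAGU" + "UAUUUAUUUU" * 5
seq = exon + intron
print(round(donor_probability(seq, 50), 3))  # true site at the exon|intron boundary
print(round(donor_probability(seq, 54), 3))  # decoy GU inside the intron
```

With these invented coefficients the boundary GU, which matches the consensus fully and sits at a sharp GC-to-U transition, scores far higher than the decoy GU a few bases into the intron, whose consensus match is weaker and whose upstream base selects a different coefficient set. The compositional term is what a pure profile score lacks: two candidates with similar consensus matches can still be separated by how strongly the sequence changes composition across them.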