Abstract
A novel approach to predicting protein-coding regions is suggested. It combines site prediction methods, which predict splicing sites, with global coding region prediction methods, which choose the best variant of the spliced mRNA. One advantage of the suggested algorithm is that the resulting mRNA or protein sequence can be analyzed further immediately. The true mRNA either coincides with the predicted one or ranks high in the list of variants. In the latter case the predicted mRNA usually differs from the true one in only one or two of several exons. The combined approach allows the use of a priori information (e.g. the putative protein length or the number of exons). Additional parameters not considered here could also be used, such as the preferred lengths of exons and introns and, in particular, the preferred position of introns in the reading frame and the preferred codon position of exon termini.
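The combination described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the site predictor (every GT is a donor candidate, every AG an acceptor candidate), the coding measure (a sum of per-codon weights over a complete open reading frame), and all names such as `rank_variants`, `codon_weight`, and `n_exons` are assumptions made for illustration. The `n_exons` argument shows one way a priori information about the number of exons might restrict the search.

```python
from itertools import combinations

STOP_CODONS = {"TAA", "TAG", "TGA"}

def candidate_sites(seq):
    # Toy site prediction (assumption): GT marks a donor, AG marks an acceptor.
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return donors, acceptors

def splice(seq, introns):
    # Remove the half-open intervals [donor, acceptor) and join the exons.
    parts, pos = [], 0
    for donor, acceptor in introns:
        parts.append(seq[pos:donor])
        pos = acceptor
    parts.append(seq[pos:])
    return "".join(parts)

def coding_score(mrna, codon_weight):
    # Toy global coding measure (assumption): None unless mrna is a complete
    # ORF (ATG ... stop, no internal stop), else the sum of codon weights.
    codons = [mrna[i:i + 3] for i in range(0, len(mrna), 3)]
    if len(mrna) % 3 or len(codons) < 2:
        return None
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return None
    if any(c in STOP_CODONS for c in codons[:-1]):
        return None
    return sum(codon_weight.get(c, 0.0) for c in codons[:-1])

def rank_variants(seq, codon_weight, max_introns=2, min_intron=4, n_exons=None):
    # Enumerate combinations of candidate introns, splice each variant,
    # and rank the variants that pass the coding measure, best first.
    donors, acceptors = candidate_sites(seq)
    introns = [(d, a) for d in donors for a in acceptors if a - d >= min_intron]
    ranked = []
    for k in range(max_introns + 1):
        if n_exons is not None and k + 1 != n_exons:
            continue  # a priori constraint on the number of exons
        for combo in combinations(introns, k):
            if any(combo[i][1] > combo[i + 1][0] for i in range(k - 1)):
                continue  # skip overlapping introns
            mrna = splice(seq, combo)
            score = coding_score(mrna, codon_weight)
            if score is not None:
                ranked.append((score, mrna, combo))
    ranked.sort(key=lambda t: t[0], reverse=True)
    return ranked

# Usage on a made-up sequence: exon "ATGGCC", intron "GTAAGTTTAG", exon "GCCTAA".
toy_seq = "ATGGCC" + "GTAAGTTTAG" + "GCCTAA"
for score, mrna, introns in rank_variants(toy_seq, {"GCC": 1.0}):
    print(score, mrna, introns)
```

Exhaustive enumeration is used here only because the toy sequence is short; on real genomic sequences the number of site combinations grows quickly, so a practical version would need stronger site filtering or a more efficient search.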