Nucleic Acids Research. 1990 Aug 25;18(16):4797–4801. doi: 10.1093/nar/18.16.4797

Neural network detects errors in the assignment of mRNA splice sites.

S Brunak, J Engelbrecht, S Knudsen
PMCID: PMC331948  PMID: 2395643

Abstract

The use of databanks in genetic research assumes that the information they contain is reliable. Currently, error detection for the manually or electronically entered data in the nucleotide sequence databanks at EMBL, Heidelberg, and GenBank at Los Alamos is limited. We have used a subset of sequences from these databanks to train neural networks to recognize pre-mRNA splicing signals in human genes. During training on 33 human genes from the EMBL databank, seven genes appeared to disturb the learning process. Subsequent investigation revealed discrepancies from the original published papers for three genes. In four genes, we found wrongly assigned splicing frames of introns. We believe this reflects the fact that splicing frames cannot always be unambiguously assigned on the basis of experimental data. Thus, incorrect assignments arise both from mere typographical misprints and from erroneous interpretation of experiments. Training on 241 human sequences from GenBank revealed nine new errors. We propose that such errors could be detected by computer algorithms designed to check the consistency of data prior to their incorporation in databanks.
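
As an illustration of what a pre-incorporation consistency check might look like, the sketch below flags annotated introns that do not begin with GT and end with AG, the canonical boundary dinucleotides of most nuclear pre-mRNA introns. This is a deliberately simple rule-based check and not the neural network method described in the abstract. The function name, the 0-based half-open coordinate convention, and the toy sequence are all assumptions made for the example. Because some genuine introns use other boundaries (GC...AG, for instance), a flagged entry should be read as a candidate for manual review, not as a confirmed error.

```python
from typing import List, Tuple


def flag_noncanonical_introns(
    sequence: str, introns: List[Tuple[int, int]]
) -> List[Tuple[int, int]]:
    """Return the annotated introns that lack canonical GT...AG boundaries.

    Hypothetical convention: each intron is (start, end), 0-based,
    with `end` exclusive, on the same strand as `sequence`.
    """
    seq = sequence.upper()
    flagged = []
    for start, end in introns:
        intron = seq[start:end]
        # An intron must at least hold both boundary dinucleotides.
        too_short = len(intron) < 4
        canonical = intron.startswith("GT") and intron.endswith("AG")
        if too_short or not canonical:
            flagged.append((start, end))
    return flagged


# Toy data (invented for illustration): exon, intron, exon.
toy = "ATGGCC" + "GTAAGTCCCTTTTCAG" + "GGTTAA"

correct_annotation = [(6, 22)]   # intron boundaries as constructed above
shifted_annotation = [(7, 23)]   # same intron, shifted by one base

print(flag_noncanonical_introns(toy, correct_annotation))
print(flag_noncanonical_introns(toy, shifted_annotation))
```

A one-base shift in the annotated boundaries, the kind of slip a typographical misprint could introduce, is enough for the second annotation to be flagged while the first passes.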



Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
