Nucleic Acids Research. 1999 Jul 1;27(13):2627–2637. doi: 10.1093/nar/27.13.2627

A clean data set of EST-confirmed splice sites from Homo sapiens and standards for clean-up procedures.

T A Thanaraj
PMCID: PMC148470  PMID: 10373578

Abstract

A clean data set of verified splice sites from Homo sapiens is reported, together with the standards used for the clean-up procedure. The sites were validated by: (i) standard cleaning procedures, such as requiring consistency in the annotation of the gene structural elements, completeness of the coding regions and elimination of redundant sequences; (ii) clustering by decision trees coupled with analysis of ClustalW alignments of the translated protein sequence with homologous proteins from SWISS-PROT; (iii) matching against human EST sequences. The sites are categorised as: (i) donor sites, a set of 619 EST-confirmed donor sites, for 138 of which either the site or the region around it is involved in alternative splice events; (ii) acceptor sites, a set of 623 EST-confirmed acceptor sites, for 144 of which either the site or the region around it is involved in alternative splice events; (iii) genuine splice sites, a set of 392 splice sites wherein both the donor and acceptor sites had EST confirmation and were not involved in any alternative splicing; (iv) alternative splice sites, a set of 209 splice sites wherein both the donor and acceptor sites had EST confirmation and the sites or the regions around them were involved in alternative splicing. A set of nucleotide regions that can be used to generate a control set of false splice sites with a high confidence of being non-functional is also reported.
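
The four categories above overlap: a splice site pair counted as genuine or alternative also contributes to the donor and acceptor sets. The following Python sketch illustrates that logic only. The Intron record, its field names and the categorise function are hypothetical and do not come from the paper, and the rule that a pair counts as alternative when either of its sites is flagged is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intron:
    # Hypothetical record: field names are illustrative, not from the paper.
    donor_est_confirmed: bool
    acceptor_est_confirmed: bool
    donor_alternative: bool      # donor site or its region is in an alternative splice event
    acceptor_alternative: bool   # acceptor site or its region is in an alternative splice event


def categorise(intron: Intron) -> set:
    """Return the names of the categories this donor/acceptor pair falls into."""
    categories = set()
    if intron.donor_est_confirmed:
        categories.add("donor")
    if intron.acceptor_est_confirmed:
        categories.add("acceptor")
    if intron.donor_est_confirmed and intron.acceptor_est_confirmed:
        # Assumption: the pair is "alternative" if either site is flagged.
        if intron.donor_alternative or intron.acceptor_alternative:
            categories.add("alternative")
        else:
            categories.add("genuine")
    return categories
```

Under these assumptions, a pair with EST confirmation at only one end contributes to the donor or acceptor set but to neither of the paired sets, which is consistent with the donor (619) and acceptor (623) counts exceeding the combined genuine and alternative count (392 + 209 = 601).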


Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
