Abstract
KEYnet is a database in which gene and protein names are hierarchically structured. Particular care has been devoted to the search for and organisation of synonyms. The structure is based on biological criteria, in order to assist users in data searches and to minimise the risk of information loss. KEYnet entries are linked to the EMBL data library through entry names and accession numbers. KEYnet is available on the WWW at http://www.ba.cnr.it/keynet.html
INTRODUCTION
Gene and protein names are the most common query criteria for biological databases but, so far, the majority of them have been incorrectly annotated in the nucleic acid sequence databases, which causes inconsistencies in data retrieval. For retrieval based on such criteria to be accurate, gene and protein names must be correctly coded. Here we present the KEYnet database (1,2), in which gene and protein names are organised in a hierarchical structure according to the biological function of the associated sequence. Lexical and biological synonyms are linked to one another.
DATABASE DESCRIPTION
Each entry in the KEYnet database corresponds to a gene or protein name. The whole database is hierarchically structured according to the scheme previously reported (1,2), which can be viewed at http://bio-www.ba.cnr.it:8000/Tutorials/KEYnet/network.html . In particular, the KEYnet structure is made up of a set of elements (nodes) linked by father–son relationships. At the highest level is the root, which links all the branches of the tree. The most important branches are the nodes Protein, DNA and RNA. Each leaf of the tree is composed of several elements linked by synonymy. Two side branches are also implemented: the RAT Gene Names Tree and the Mitochondrial Genome Tree (the classification of mitochondrial gene names was developed as a contribution to the MitBASE project (3)). Gene and protein names are extracted from the EMBL data library (4).
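The node structure described above (a root, father–son links, and leaves whose elements are tied together by synonymy) can be modelled in a few lines. The following Python sketch is purely illustrative and is not the KEYnet implementation: the `Node` class, the `build_synonym_index` function, the intermediate "Enzyme" keyword and the synonym set are all hypothetical. It shows how any synonym can be resolved to a single node and how that node's position in the hierarchy can be read back by following father links up to the root.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality avoids recursing through parent/children links
class Node:
    """Hypothetical model of a KEYnet-style node: a preferred name plus its synonyms."""
    name: str
    synonyms: set[str] = field(default_factory=set)
    parent: Node | None = field(default=None, repr=False)
    children: list[Node] = field(default_factory=list)

    def add_child(self, child: Node) -> Node:
        # Father-son link: the child records its parent, the parent lists its children.
        child.parent = self
        self.children.append(child)
        return child

    def lineage(self) -> list[str]:
        # Walk up the father links and return names ordered from the root to this node.
        path = []
        node = self
        while node is not None:
            path.append(node.name)
            node = node.parent
        return path[::-1]


def build_synonym_index(root: Node) -> dict[str, Node]:
    """Map every preferred name and synonym (case-folded) to the node that owns it."""
    index = {}
    stack = [root]
    while stack:
        node = stack.pop()
        for term in {node.name, *node.synonyms}:
            index[term.casefold()] = node
        stack.extend(node.children)
    return index


# Illustrative tree: ROOT with the Protein, DNA and RNA branches named in the text.
# The "Enzyme" node and the synonym set are hypothetical examples, not KEYnet content.
root = Node("ROOT")
protein = root.add_child(Node("Protein"))
root.add_child(Node("DNA"))
root.add_child(Node("RNA"))
enzyme = protein.add_child(Node("Enzyme"))
gs = enzyme.add_child(
    Node("glutamine synthetase", {"GS", "glutamate--ammonia ligase"})
)

index = build_synonym_index(root)
print(index["gs"].lineage())  # ['ROOT', 'Protein', 'Enzyme', 'glutamine synthetase']
```

In this sketch each term maps to exactly one node. A real nomenclature contains ambiguous names that belong to several nodes, so a fuller model would map a term to a set of nodes rather than to one.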
Biological information about the associated sequences is extracted from the primary databases [EMBL data library (4) and GenBank (5)] and from specialised databases such as SWISS-PROT (6), ENZYME (7) or any other suitable database. MEDLINE is also consulted whenever these databases lack the information needed to classify a gene or protein name. The KEYnet database is updated at each EMBL data library release, and the links between KEYnet and the EMBL data library are established at that time.
One of the major problems encountered during classification concerns the gene names branch. Gene naming is widely recognised as a difficult problem, because researchers are free to assign any name to a gene when it is discovered. Several efforts to address this problem are in progress (8,9; see http://www.ebi.ac.uk:7081/docs/nomenclature and http://www.gene.ucl.ac.uk/nomenclature ).
We organised gene names by first establishing a set of main ancestor keywords reflecting their primary biological functions. At present KEYnet contains 66 219 gene and protein names, as detailed in the table at http://bio-www.ba.cnr.it:8000/Tutorials/KEYnet/Table1.html .
KEYnet QUERY SYSTEMS
The KEYnet database can be queried through the RETKEY program, written in FORTRAN and C, which is available on the server of the CNR Research Area in Bari. A slightly different version, KEYnetWWW (http://www.ba.cnr.it/keynet.html ), is more powerful because it can be accessed worldwide and returns more complete information.
The use of KEYnetWWW is illustrated by the following examples. Searching the KEYnet database for glutamine synthetase nucleotide sequences (http://bio-www.ba.cnr.it:8000/Tutorials/KEYnet/example1 ) retrieves 257 entries from release 58 of the EMBL data library. Searching for the same protein starting from the ENZYME database through the SRS retrieval system (10) (http://bio-www.area.ba.cnr.it:8000/Tutorials/KEYnet/example2 ) retrieves only 148 entries from the same EMBL data library release. Both result sets were carefully checked, and the counts refer only to entries for nucleotide sequences coding for glutamine synthetase; the KEYnet search therefore recovers 109 more relevant entries than the ENZYME-based search.
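The difference between the two glutamine synthetase searches above comes from matching a query against its synonyms and against the more specific names classified beneath it. The Python sketch below illustrates that idea on a hypothetical toy dataset. The tables `CHILDREN`, `SYNONYMS` and `ACCESSIONS`, the functions `resolve` and `search`, the "glutamine synthetase I/II" child nodes and the `ACC000n` identifiers are all assumptions made for illustration; the identifiers are placeholders, not real EMBL accession numbers. This is not the RETKEY or KEYnetWWW algorithm, whose internals are not described here.

```python
from __future__ import annotations

# Hypothetical toy data; names, links and identifiers are illustrative only.
# Father -> son keyword links below the queried node.
CHILDREN = {
    "glutamine synthetase": ["glutamine synthetase I", "glutamine synthetase II"],
}
# Lexical or biological synonym (case-folded) -> preferred node name.
SYNONYMS = {
    "gs": "glutamine synthetase",
    "glutamate--ammonia ligase": "glutamine synthetase",
}
# Node -> linked EMBL entries (placeholder identifiers, not real accession numbers).
ACCESSIONS = {
    "glutamine synthetase": {"ACC0001"},
    "glutamine synthetase I": {"ACC0002", "ACC0003"},
    "glutamine synthetase II": {"ACC0003", "ACC0004"},
}


def resolve(term: str) -> str | None:
    """Return the preferred node name for a query term, or None if unknown."""
    key = term.casefold()
    for name in ACCESSIONS.keys() | CHILDREN.keys():
        if name.casefold() == key:
            return name
    return SYNONYMS.get(key)


def search(term: str) -> set[str]:
    """Collect entries linked to the resolved node and to all of its descendants."""
    start = resolve(term)
    if start is None:
        return set()
    hits: set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        hits |= ACCESSIONS.get(node, set())  # set union removes duplicate entries
        stack.extend(CHILDREN.get(node, []))
    return hits


print(sorted(search("GS")))  # ['ACC0001', 'ACC0002', 'ACC0003', 'ACC0004']
print(sorted(search("glutamine synthetase II")))  # ['ACC0003', 'ACC0004']
```

A query for the synonym "GS" reaches the same node as the preferred name and also collects the entries filed under its more specific sons, while an entry linked to two nodes (`ACC0003`) is counted only once.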
Users of KEYnet are kindly invited to cite the present article.
ACKNOWLEDGEMENTS
This work has been partially supported by the EU-Biotechnology Programme (Contracts n. BIO4-CT95-0037 and BIO4-CT97-0), by ‘Programma Biotecnologie legge 95/95 (MURST 5%)’, by MPI (Italy) and by CNR Research Area of Bari (IT).
REFERENCES
1. Tullo,A., Liuni,S. and Attimonelli,M. (1990) Protein Seq. Data Anal., 3, 327–334.
2. Liciulli,F., Catalano,D., D’Elia,D., Lorusso,V. and Attimonelli,M. (1999) Nucleic Acids Res., 27, 365–367.
3. Attimonelli,M., Altamura,N., Benne,R., Boyen,C., Brennicke,A., Carone,A., Cooper,J.M., D’Elia,D., de Montalvo,A., de Pinto,B., De Robertis,M., Golik,P., Grienenberger,J.M., Knoop,V., Lanave,C., Lazowska,J., Lemagnen,A., Malladi,B.S., Memeo,F., Monnerot,M., Pilbout,S., Schapira,A.H.V., Sloof,P., Slonimski,P., Stevens,K. and Saccone,C. (1999) Nucleic Acids Res., 27, 128–133. Updated article in this issue: Nucleic Acids Res. (2000), 28, 148–152.
4. Stoesser,G., Tuli,M.A., Lopez,R. and Sterk,P. (1999) Nucleic Acids Res., 27, 18–24. Updated article in this issue: Nucleic Acids Res. (2000), 28, 19–23.
5. Benson,D.A., Boguski,M.S., Lipman,D.J., Ostell,J., Ouellette,B.F.F., Rapp,B.A. and Wheeler,D.L. (1999) Nucleic Acids Res., 27, 12–17. Updated article in this issue: Nucleic Acids Res. (2000), 28, 15–18.
6. Bairoch,A. and Apweiler,R. (1999) Nucleic Acids Res., 27, 49–54. Updated article in this issue: Nucleic Acids Res. (2000), 28, 45–48.
7. Bairoch,A. (1999) Nucleic Acids Res., 27, 310–311.
8. Lonsdale,D.M. and Leaver,C.J. (1988) Plant Mol. Biol., 6, 14–21.
9. Hallick,R.B. (1989) Plant Mol. Biol., 7, 266–275.
10. Etzold,T., Ulyanov,A. and Argos,P. (1996) Methods Enzymol., 266, 114–128.
