. 2004 Dec 22;6(1):R2. doi: 10.1186/gb-2004-6-1-r2

Table 7. Information extracted from different data sources

| Data source (version) | Information extracted (for each gene or locus) | Number of genes (obtained) | Number of genes (nonredundant) |
| --- | --- | --- | --- |
| Ensembl (Build 31) | Gene name, chromosome or contig, start and end positions, strand (transcription direction), exons, gene product (including function name(s) or description(s), synonyms and EC number(s)), cross-references (IDs) to other databases (SwissProt, HUGO, PDB, GO, RefSeq, OMIM, Entrez, SPTREMBL, EMBL, LocusLink) | 24,847 | |
| LocusLink (03/29/2003) | Gene name, chromosome, gene product (function name or description), function synonyms, EC number(s), gene and protein comments, cross-references (IDs) to other databases (Entrez, UCSC Genome, RefSeq, GO, OMIM, UniGene, PubMed) | 18,880 | 3,936 |
| GenBank NC_001807 (mitochondrion) | Gene name, start and end positions, transcription direction, gene product (function name or description) | 35 | |

Functional information in Ensembl had to be extensively parsed to extract multiple functions, EC numbers, and/or synonyms. The 'nonredundant' column shows the number of genes from LocusLink that had no corresponding gene in the other two data sources (Ensembl and GenBank).
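The footnote names two processing steps (pulling EC numbers out of free-text Ensembl descriptions, and counting LocusLink genes absent from the other sources) without stating how either was done. The Python sketch below illustrates both under explicit assumptions and is not the authors' pipeline. It assumes EC numbers appear as "EC" followed by a four-field number whose trailing fields may be "-". It also assumes two records "correspond" when their gene symbols match case-insensitively, which the source does not specify. `GeneRecord`, `extract_ec_numbers` and `nonredundant` are hypothetical names, and the gene symbols in the example are placeholders.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Assumed EC-number pattern: "EC" (optionally followed by ":"), then four
# dot-separated fields; trailing fields may be "-" for partially specified
# classes, and the last may carry an "n" prefix (preliminary EC numbers).
EC_PATTERN = re.compile(r"\bEC\s*:?\s*(\d+\.(?:\d+|-)\.(?:\d+|-)\.(?:n?\d+|-))")


@dataclass
class GeneRecord:
    """Hypothetical minimal gene record; real sources carry many more fields."""
    symbol: str
    description: str = ""
    ec_numbers: list[str] = field(default_factory=list)


def extract_ec_numbers(description: str) -> list[str]:
    """Return every EC number mentioned in a free-text description, in order."""
    return EC_PATTERN.findall(description)


def nonredundant(locuslink: list[GeneRecord], *other_sources: list[GeneRecord]) -> list[GeneRecord]:
    """Keep LocusLink genes whose symbol (compared case-insensitively, an
    assumed correspondence rule) appears in none of the other sources."""
    seen = {gene.symbol.upper() for source in other_sources for gene in source}
    return [gene for gene in locuslink if gene.symbol.upper() not in seen]


if __name__ == "__main__":
    ensembl = [GeneRecord("GENE1", "example dehydrogenase (EC 1.1.1.1)")]
    for gene in ensembl:
        gene.ec_numbers = extract_ec_numbers(gene.description)
    genbank = [GeneRecord("GENE2")]
    locuslink = [GeneRecord("gene1"), GeneRecord("GENE3")]
    print(ensembl[0].ec_numbers)  # ['1.1.1.1']
    print([g.symbol for g in nonredundant(locuslink, ensembl, genbank)])  # ['GENE3']
```

Matching on symbols keeps the sketch short. Because both Ensembl and LocusLink records list cross-reference IDs, a real implementation would more plausibly match on a shared identifier such as the LocusLink ID.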