2005 Dec 28;34(Database issue):D235–D242. doi: 10.1093/nar/gkj153

Table 1.

Data statistics as of August 2005

Data type | Non-redundant (NR) record count | Sources
--- | --- | ---
Nucleic acid sequences | 42 686 711 | GenBank, RefSeq, BIND, UniGene, BodyMap
Protein sequences | 2 062 061 | SwissProt, TrEMBL, GenPept, PDB, BIND, PIR, RefSeq, SCOP, DDBJ, PRF, PATAA
Protein structures | 32 637 | PDB
Interactions | 155 090 | BIND, Biozon (predicted), DIP^a
Enzyme families | 3 944 | UniProt, PIR, GenPept, SCOP
Pathways | 142 | KEGG
UniGene clusters | 185 543 | NCBI
Domain families | 181 500 | InterPro (includes data from PFAM, PRINTS, PRODOM, PROFILE, PROSITE, SMART, SSF, TIGRFAM), Biozon (predicted)
Sequence alignments | 5 000 000 000+ | Biozon
Structure alignments | 8 250 286 | Biozon
Expression similarities | 68 138 | Biozon
Non-similarity relations | 136 972 705 | All
Descriptor documents | 58 176 040 | All
Words indexed | 1 627 747 755 | All

Numbers and origins of a selection of Biozon objects, relations and indexed text. The database will be gradually extended to cover both new source data types and new derived data. All data from the source ‘Biozon’ is derived data, either in the form of predictions (e.g. predicted interactions) or similarities (e.g. sequence alignments).

^a DIP is not publicly accessible due to copyright restrictions.
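The caption's rule (anything attributed to the source ‘Biozon’ is derived data, either a prediction or a similarity) can be applied mechanically to the table. The sketch below is illustrative only: the dictionary is a hand transcription of Table 1, rows whose source is listed only as "All" are left out because their per-source breakdown is not given, and the mapping of "Biozon (predicted)" to predictions and plain "Biozon" to similarities is an assumption read off the caption, not a Biozon API. The names `TABLE_1_SOURCES`, `derived_kind` and `derived_data_types` are hypothetical.

```python
from __future__ import annotations

# Hand transcription of Table 1: data type -> listed sources.
# Rows whose source is only "All" are omitted (no breakdown given).
TABLE_1_SOURCES = {
    "Nucleic acid sequences": ["GenBank", "RefSeq", "BIND", "UniGene", "BodyMap"],
    "Protein sequences": ["SwissProt", "TrEMBL", "GenPept", "PDB", "BIND", "PIR",
                          "RefSeq", "SCOP", "DDBJ", "PRF", "PATAA"],
    "Protein structures": ["PDB"],
    "Interactions": ["BIND", "Biozon (predicted)", "DIP"],
    "Enzyme families": ["UniProt", "PIR", "GenPept", "SCOP"],
    "Pathways": ["KEGG"],
    "UniGene clusters": ["NCBI"],
    "Domain families": ["InterPro", "Biozon (predicted)"],
    "Sequence alignments": ["Biozon"],
    "Structure alignments": ["Biozon"],
    "Expression similarities": ["Biozon"],
}


def derived_kind(source: str) -> str | None:
    """Hypothetical classification of one source label, based on the caption.

    "Biozon (predicted)" -> "prediction", plain "Biozon" -> "similarity",
    any other (external) source -> None, i.e. not derived data.
    """
    if source == "Biozon (predicted)":
        return "prediction"
    if source == "Biozon":
        return "similarity"
    return None


def derived_data_types(table: dict[str, list[str]]) -> dict[str, str]:
    """Return the data types that include Biozon-derived content, with its kind."""
    result: dict[str, str] = {}
    for dtype, sources in table.items():
        for source in sources:
            kind = derived_kind(source)
            if kind is not None:
                result[dtype] = kind
    return result


if __name__ == "__main__":
    # Prints one line per data type containing derived data, in table order.
    for dtype, kind in derived_data_types(TABLE_1_SOURCES).items():
        print(f"{dtype}: {kind}")
```

Run as a script, this lists Interactions and Domain families as containing predictions, and the three alignment/similarity rows as similarity data, matching the caption's description.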