2005 Dec 28;34(Database issue):D235–D242. doi: 10.1093/nar/gkj153

Table 1.

Data statistics as of August 2005

Data type	NR record count	Sources
Nucleic acid sequences	42 686 711	GenBank, RefSeq, BIND, UniGene, BodyMap
Protein sequences	2 062 061	SwissProt, TrEMBL, GenPept, PDB, BIND, PIR, RefSeq, SCOP, DDBJ, PRF, PATAA
Protein structures	32 637	PDB
Interactions	155 090	BIND, Biozon (predicted), DIP^a
Enzyme families	3944	UniProt, PIR, GenPept, SCOP
Pathways	142	KEGG
UniGene clusters	185 543	NCBI
Domain families	181 500	InterPro (includes data from PFAM, PRINTS, PRODOM, PROFILE, PROSITE, SMART, SSF, TIGRFAM), Biozon (predicted)
Sequence alignments	5 000 000 000+	Biozon
Structure alignments	8 250 286	Biozon
Expression similarities	68 138	Biozon
Non-similarity relations	136 972 705	All
Descriptor documents	58 176 040	All
Words indexed	1 627 747 755	All

Numbers and origins of a selection of Biozon objects, relations and indexed text. The database will be gradually extended to include both new source data types and new derived data. All data from the source ‘Biozon’ are derived data, in the form of either predictions (e.g. predicted interactions) or similarities (e.g. sequence alignments).

^aDIP is not publicly accessible due to copyright restrictions.