Table 1.
Data statistics as of August 2005
Data type | NR record count | Sources |
---|---|---|
Nucleic acid sequences | 42 686 711 | GenBank, RefSeq, BIND, UniGene, BodyMap |
Protein sequences | 2 062 061 | SwissProt, TrEMBL, GenPept, PDB, BIND, PIR, RefSeq, SCOP, DDBJ, PRF, PATAA |
Protein structures | 32 637 | PDB |
Interactions | 155 090 | BIND, Biozon (predicted), DIP (a) |
Enzyme families | 3944 | UniProt, PIR, GenPept, SCOP |
Pathways | 142 | KEGG |
UniGene clusters | 185 543 | NCBI |
Domain families | 181 500 | InterPro (includes data from PFAM, PRINTS, PRODOM, PROFILE, PROSITE, SMART, SSF, TIGRFAM), Biozon (predicted) |
Sequence alignments | 5 000 000 000+ | Biozon |
Structure alignments | 8 250 286 | Biozon |
Expression similarities | 68 138 | Biozon |
Non-similarity relations | 136 972 705 | All |
Descriptor documents | 58 176 040 | All |
Words indexed | 1 627 747 755 | All |
Numbers and origins of a selection of Biozon objects, relations and indexed text. The database will be gradually extended to include both new source data types and new derived data. All data attributed to the source ‘Biozon’ are derived data, in the form of either predictions (e.g. predicted interactions) or similarities (e.g. sequence alignments).
(a) DIP is not publicly accessible due to copyright restrictions.