Table 2. Compounds per protein and compounds per document.
Database or subset | Document count | Protein ID type | Total proteins | Human proteins | Compounds per protein | Compounds per document |
---|---|---|---|---|---|---|
GVKBIO | 87747 | Entrez Gene | 3292 | 1468 | 604 | 22 |
GVKBIO journals | 51810 | Entrez Gene | 2660 | 1146 | 239 | 12 |
GVKBIO patents | 35937 | Entrez Gene | 1765 | 952 | 815 | 40 |
GVKBIO DD | 26825 | Entrez Gene | 733 | 339 | 5 | 0.14 |
GVKBIO CCD | 27286 | Entrez Gene | 1224 | 610 | 7 | 0.32 |
WOMBAT | 10205 | Swiss-Prot | 1979 | 1095 | 91 | 18 |
DrugBank | n/a | Swiss-Prot | 1625 | 1356 | 3 | n/a |
PubChem actives | n/a | RefSeq | 72 | n/a | 104 | n/a |
PubChem PDB | n/a | RefSeq | 818 | n/a | 14 | n/a |
BindingDB | 1142 | Swiss-Prot | 297 | 97 | 112 | 19 |
MDDR | 137754 | n/a | n/a | n/a | n/a | 1.4 |
DNP | 7765 | n/a | n/a | n/a | n/a | 18 |
Column three gives the type of protein identifier used to count proteins across all species (column four) and human proteins only (column five). The ratios in columns six and seven are calculated from the filtered compound totals in Additional file 1, divided by the total number of proteins (column four) and by the document count (column two), respectively. Cells marked n/a indicate that the information was either not applicable or not available. For reference, we include a compounds-per-protein ratio for the PubChem actives subset even though it lacks document-protein links analogous to those in the other sources.
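The ratio calculation described in the note can be sketched as follows. This is a minimal illustration that assumes the filtered compound total for a source is already known. The function name `compound_ratios` and the numbers in the usage lines are hypothetical and are not taken from Additional file 1. Missing inputs are represented by `None`, mirroring the n/a cells.

```python
from typing import Optional, Tuple


def compound_ratios(
    compound_total: int,
    total_proteins: Optional[int],
    document_count: Optional[int],
) -> Tuple[Optional[float], Optional[float]]:
    """Return (compounds per protein, compounds per document).

    Hypothetical helper: None stands in for an n/a cell. A missing or zero
    denominator yields None instead of raising ZeroDivisionError.
    """
    # Ratio against all-species protein count (column four), not human proteins.
    per_protein = compound_total / total_proteins if total_proteins else None
    # Ratio against document count (column two).
    per_document = compound_total / document_count if document_count else None
    return per_protein, per_document


# Usage with made-up figures (not from the table):
print(compound_ratios(1000, 200, 50))      # (5.0, 20.0)
print(compound_ratios(1000, 200, None))    # (5.0, None), like a source with no documents
```

Dividing by the all-species protein count rather than the human count follows the note's statement that the ratios are taken with respect to total proteins.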