Table 2.
Actual search time benchmarks obtained by searching the entire ChemDB database (about 5M compounds) on a 2.4 GHz AMD Opteron processor with 2 GB of memory. Searches are carried out using the Tanimoto similarity measure with a threshold (t = 0.9), a top-ten cutoff (K = 10), or both. Search times for single-molecule queries are expressed in seconds and are averaged over each dataset. The datasets correspond to the six Stahl and Rarey [14] datasets, a random set of 1,000 queries extracted from the set of actual ChemDB queries, and a random set of 100 queries taken from ChemDB. The fraction of the database that needs to be searched is given by 1 − f.
Dataset | Size | Time (t=0.9) | 1-f | Time (K=10) | 1-f | Time (t=0.9,K=10) | 1-f |
---|---|---|---|---|---|---|---|
Cox2 | 128 | 0.79 | 0.17 | 3.53 | 0.76 | 0.78 | 0.17 |
Estrogen | 55 | 0.60 | 0.12 | 2.03 | 0.43 | 0.52 | 0.11 |
Gelatinase A | 43 | 0.77 | 0.16 | 3.31 | 0.71 | 0.77 | 0.16 |
Neuraminidase | 17 | 0.70 | 0.14 | 2.74 | 0.59 | 0.66 | 0.14 |
p38 MAP kinase | 25 | 0.90 | 0.18 | 3.30 | 0.71 | 0.87 | 0.18 |
Thrombin | 67 | 0.91 | 0.19 | 3.27 | 0.70 | 0.88 | 0.19 |
ChemDB Queries | 1,000 | 0.27 | 0.06 | 1.12 | 0.24 | 0.26 | 0.06 |
Random ChemDB | 100 | 0.64 | 0.14 | 1.23 | 0.27 | 0.58 | 0.12 |
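
To make the three search modes and the searched fraction 1 − f concrete, the following is a minimal Python sketch, not the benchmarked implementation. It assumes fingerprints are stored as sets of on-bit indices and prunes candidates with the bit-count bound T(A, B) ≤ min(|A|, |B|) / max(|A|, |B|); the caption does not say which bound the benchmark uses. The function names `tanimoto` and `search` are hypothetical.

```python
import heapq


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def search(query, database, t=0.0, k=None):
    """Threshold and/or top-K Tanimoto search with bit-count pruning.

    Returns (hits, searched_fraction), where hits is a list of
    (similarity, index) pairs sorted by decreasing similarity and
    searched_fraction is the share of the database for which the full
    similarity had to be computed (the 1 - f of the table).
    Assumption: pruning uses the bound min(|A|,|B|) / max(|A|,|B|).
    """
    if k is not None and k < 1:
        raise ValueError("k must be at least 1")
    a = len(query)
    best = []  # min-heap of (similarity, index) when k is set, plain list otherwise
    searched = 0
    for idx, fp in enumerate(database):
        b = len(fp)
        hi = max(a, b)
        bound = min(a, b) / hi if hi else 0.0
        if bound < t:
            continue  # cannot reach the threshold
        full = k is not None and len(best) >= k
        if full and bound <= best[0][0]:
            continue  # cannot beat the current K-th best hit
        searched += 1
        sim = tanimoto(query, fp)
        if sim < t:
            continue
        if k is None:
            best.append((sim, idx))
        elif not full:
            heapq.heappush(best, (sim, idx))
        elif sim > best[0][0]:
            heapq.heapreplace(best, (sim, idx))
    hits = sorted(best, key=lambda h: (-h[0], h[1]))
    fraction = searched / len(database) if database else 0.0
    return hits, fraction
```

Calling `search(query, database, t=0.9)` corresponds to the threshold columns, `search(query, database, k=10)` to the top-ten columns, and `search(query, database, t=0.9, k=10)` to the combined columns. In this sketch the top-K cutoff only tightens as good hits accumulate, which is consistent with the larger 1-f values reported for K = 10 alone; a production system would more likely index the database by bit count rather than scan it linearly.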