Author manuscript; available in PMC: 2008 Aug 29.

Published in final edited form as: J Chem Inf Model. 2007 Feb 28;47(2):302–317. doi: 10.1021/ci600358f

Table 2.

Actual search time benchmarks obtained by searching the entire ChemDB database, which contains about 5M compounds, using a 2.4 GHz AMD Opteron processor with 2 GB of memory. Searches are carried out using the Tanimoto similarity measure with a threshold (t = 0.9), a top-ten cutoff (K = 10), or both. Search times for single-molecule queries are expressed in seconds and are averaged over each dataset. The datasets correspond to the six Stahl and Rarey¹⁴ datasets, a random set of 1,000 queries extracted from the set of actual ChemDB queries, and a random set of 100 queries taken from ChemDB. The fraction of the database that needs to be searched is given by 1 − f.

Dataset	Size	Time, s (t=0.9)	1−f (t=0.9)	Time, s (K=10)	1−f (K=10)	Time, s (t=0.9, K=10)	1−f (t=0.9, K=10)
Cox2	128	0.79	0.17	3.53	0.76	0.78	0.17
Estrogen	55	0.60	0.12	2.03	0.43	0.52	0.11
Gelatinase A	43	0.77	0.16	3.31	0.71	0.77	0.16
Neuraminidase	17	0.70	0.14	2.74	0.59	0.66	0.14
p38 MAP kinase	25	0.90	0.18	3.30	0.71	0.87	0.18
Thrombin	67	0.91	0.19	3.27	0.70	0.88	0.19
ChemDB Queries	1,000	0.27	0.06	1.12	0.24	0.26	0.06
Random ChemDB	100	0.64	0.14	1.23	0.27	0.58	0.12
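
To make the "fraction of the database that needs to be searched" column concrete, the following is a minimal sketch of a Tanimoto threshold search that skips part of the database before computing any similarity. It assumes, as an illustration only, that pruning relies on the standard bit-count bound T(A, B) ≤ min(|A|, |B|) / max(|A|, |B|), where |A| and |B| are the numbers of set bits; the table itself does not state which bound or data structure produced the reported numbers. Fingerprints are represented here as plain Python integers, and the names `build_index` and `threshold_search` are hypothetical, not taken from ChemDB.

```python
from collections import defaultdict


def popcount(fp: int) -> int:
    """Number of set bits in a fingerprint stored as a Python int."""
    return bin(fp).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity |A & B| / |A | B| of two bit-vector fingerprints."""
    union = popcount(a | b)
    if union == 0:
        return 1.0  # convention chosen here: two empty fingerprints are identical
    return popcount(a & b) / union


def build_index(fingerprints):
    """Group fingerprints by bit count so whole groups can be skipped."""
    index = defaultdict(list)
    for fp in fingerprints:
        index[popcount(fp)].append(fp)
    return index


def threshold_search(query: int, index, t: float):
    """Return (hits, fraction_searched) for Tanimoto similarity >= t.

    A bucket with bit count b is skipped when the upper bound
    min(a, b) / max(a, b) is already below t (assumed pruning rule).
    """
    a = popcount(query)
    total = sum(len(bucket) for bucket in index.values())
    hits, scanned = [], 0
    for b, bucket in index.items():
        bound = 1.0 if max(a, b) == 0 else min(a, b) / max(a, b)
        if bound < t:
            continue  # no fingerprint in this bucket can reach the threshold
        scanned += len(bucket)
        for fp in bucket:
            score = tanimoto(query, fp)
            if score >= t:
                hits.append((score, fp))
    hits.sort(reverse=True)
    fraction_searched = scanned / total if total else 0.0
    return hits, fraction_searched


if __name__ == "__main__":
    database = [0b1111, 0b1110, 0b0001, 0b11111111, 0b1011]
    index = build_index(database)
    hits, searched = threshold_search(0b1111, index, t=0.7)
    print(hits, searched)
```

In this sketch, `fraction_searched` plays the role of 1 − f in the table: it is the share of the database whose similarity actually had to be computed. Because the bound is an upper bound on the true similarity, skipping a bucket never discards a valid hit, so the result matches an exhaustive scan.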