Author manuscript; available in PMC: 2008 Aug 29.

Published in final edited form as: J Chem Inf Model. 2007 Feb 28;47(2):302–317. doi: 10.1021/ci600358f

The two upper plots correspond to an experiment in which K is varied and |D| is held constant at 4,099,792; results are averaged over 5,000 separate queries randomly chosen from ChemDB. The two lower plots correspond to an experiment in which |D| is varied and K is held constant at 1; results are averaged over 1,000 separate queries randomly chosen from ChemDB. The lines are the best-fit curves using the functional form y = 1 − C(K/|D|)^(1/b), where b and C are the fit parameters and y corresponds either to u or to the fraction pruned from the database. This equation fits the data very closely, with similar values for b. There is a small but systematic misfit between the empirical points and the theoretical curve y = 1 − C(K/|D|)^(1/b). It can be entirely eliminated by introducing one additional offset parameter and fitting y = 1 − [C₁ + C₂(K/|D|)^(1/b)] to the data.
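As an illustration of how a curve of the form y = 1 − C(K/|D|)^(1/b) might be fitted, the sketch below uses a log-linearized least-squares fit: since 1 − y = C(K/|D|)^(1/b), log(1 − y) is linear in log(K/|D|) with slope 1/b and intercept log C. This is not necessarily the fitting procedure the authors used. The function name `fit_power_law` and the values of `true_C` and `true_b` are hypothetical, and the data points are synthetic, generated from the formula itself rather than taken from the figure.

```python
import math

def fit_power_law(ratios, ys):
    """Fit y = 1 - C * r**(1/b) via linear least squares on
    log(1 - y) = log(C) + (1/b) * log(r). Returns (C, b)."""
    xs = [math.log(r) for r in ratios]
    zs = [math.log(1.0 - y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    slope = sxz / sxx                      # estimate of 1/b
    intercept = mean_z - slope * mean_x    # estimate of log(C)
    return math.exp(intercept), 1.0 / slope

# Synthetic, noise-free points (NOT data from the paper).
D = 4_099_792                  # database size quoted in the caption
true_C, true_b = 0.8, 3.0      # made-up parameters for illustration only
Ks = [1, 10, 100, 1000]
ratios = [K / D for K in Ks]
ys = [1.0 - true_C * r ** (1.0 / true_b) for r in ratios]

C_hat, b_hat = fit_power_law(ratios, ys)
print(C_hat, b_hat)
```

The three-parameter form y = 1 − [C₁ + C₂(K/|D|)^(1/b)] cannot be linearized this way because of the additive offset C₁, so fitting it would generally require a nonlinear least-squares routine.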