Table 2.
Distribution | μ | σ | ρ |
---|---|---|---|
Tanimoto (ChemDB) | 0.17 | 0.052 | 0.82 |
Tanimoto (fT(t)ChemDB) | 0.16 | 0.055 | 0.82 |
Tanimoto (Binomial) | 0.12 | 0.017 | 0.28 |
Tanimoto (Multiple Bernoulli) | 0.18 | 0.018 | 0.25 |
Tanimoto (Hypergeometric/Gaussian) | 0.10 | 0.050 | 0.90 |
The parameters μ and σ were determined from Gaussian fits to the distributions, except for the ratio-of-Gaussians parameters (fT(t)ChemDB), which were determined directly from the analytical formula for fT(t). The correlations ρ between the intersection and the union are also given. The ratio-of-Gaussians approximation gives accurate estimates of the empirical distribution parameters. With the empirical bit probabilities, the multiple Bernoulli model gives a good approximation of the empirical mean, but the distribution width is too small. Conversely, using a Gaussian to model the query and database fingerprints allows the hypergeometric/Gaussian model to reproduce the empirical distribution width, though the distribution mean is smaller than the empirical value.
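The quantities reported in the table can be illustrated with a minimal sketch. It is not the procedure used to produce the values above: it assumes a multiple Bernoulli model with hypothetical bit probabilities (drawn here from an arbitrary Beta distribution), a hypothetical fingerprint length and database size, and it takes the Gaussian fit to be the sample mean and standard deviation of the scores. The function name `tanimoto_stats` is introduced only for this illustration.

```python
import numpy as np

def tanimoto_stats(query, database):
    """Return (mu, sigma, rho) for Tanimoto scores of one query against a database.

    mu, sigma: sample mean and standard deviation of the scores
               (used here as a stand-in for a Gaussian fit).
    rho:       Pearson correlation between intersection and union sizes.
    """
    query = np.asarray(query, dtype=bool)
    database = np.asarray(database, dtype=bool)
    inter = (database & query).sum(axis=1)   # |A ∩ B| for each database row
    union = (database | query).sum(axis=1)   # |A ∪ B| for each database row
    keep = union > 0                         # skip pairs with an empty union
    inter, union = inter[keep], union[keep]
    scores = inter / union                   # Tanimoto similarity
    mu = scores.mean()
    sigma = scores.std(ddof=1)
    rho = np.corrcoef(inter, union)[0, 1]
    return mu, sigma, rho

# Hypothetical multiple Bernoulli model: each bit has its own probability.
rng = np.random.default_rng(0)
n_bits, n_mols = 512, 5000                   # assumed sizes, not from the paper
bit_probs = rng.beta(0.5, 4.0, size=n_bits)  # assumed bit probabilities
database = rng.random((n_mols, n_bits)) < bit_probs
query = rng.random(n_bits) < bit_probs

mu, sigma, rho = tanimoto_stats(query, database)
print(f"mu={mu:.3f} sigma={sigma:.3f} rho={rho:.3f}")
```

Replacing the simulated `database` and `query` with real fingerprints would give the empirical counterparts of μ, σ and ρ under the same assumptions.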