Bioinformatics. 2008 Jul 1;24(13):i357–i365. doi: 10.1093/bioinformatics/btn187

Table 2.

Empirical and model Tanimoto score distribution parameters

Distribution                        μ (mean)   σ (std. dev.)   ρ (correlation of intersection and union)
Tanimoto (ChemDB)                   0.17       0.052           0.82
Tanimoto (f_T(t), ChemDB)           0.16       0.055           0.82
Tanimoto (Binomial)                 0.12       0.017           0.28
Tanimoto (Multiple Bernoulli)       0.18       0.018           0.25
Tanimoto (Hypergeometric/Gaussian)  0.10       0.050           0.90

The parameters were determined from Gaussian fits to the distributions, except for the ratio-of-Gaussians parameters (f_T(t), ChemDB), which were determined directly from the analytical formula for f_T(t). The table also gives ρ, the correlation between the intersection and union sizes. The ratio-of-Gaussians approximation accurately reproduces the empirical distribution parameters. With the empirical bit probabilities, the multiple Bernoulli model approximates the empirical mean well, but its distribution is too narrow. Conversely, modeling the query and database fingerprints with a Gaussian allows the hypergeometric/Gaussian model to reproduce the empirical distribution width, though its mean is smaller than the empirical value.
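The comparison in the caption can be illustrated with a minimal Python sketch, under stated assumptions. It draws random fingerprint pairs under a multiple Bernoulli model with hypothetical bit probabilities (not the ChemDB frequencies), fits Gaussian parameters μ and σ to the Tanimoto scores T = I/U by maximum likelihood (sample mean and standard deviation; the paper's fitting procedure may differ), and computes ρ, the correlation between the intersection I and union U. It then plugs the fitted moments of I and U into a bivariate normal and evaluates the ratio density f_T(t) = ∫ |u| p(tu, u) du numerically. The paper uses an analytical formula for f_T(t) instead; all function names here are illustrative and not taken from the paper.

```python
import math
import random
import statistics


def tanimoto_counts(a, b):
    """Return (intersection, union) bit counts for two equal-length 0/1 lists."""
    inter = sum(x & y for x, y in zip(a, b))
    union = sum(x | y for x, y in zip(a, b))
    return inter, union


def sample_multiple_bernoulli(probs, n_pairs, seed=0):
    """Draw random fingerprint pairs; bit k is set independently with probability probs[k]."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        a = [1 if rng.random() < p else 0 for p in probs]
        b = [1 if rng.random() < p else 0 for p in probs]
        pairs.append(tanimoto_counts(a, b))
    return pairs


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_parameters(pairs):
    """Gaussian (maximum-likelihood) fits: mean and std of T, I and U, plus corr(I, U)."""
    pairs = [(i, u) for i, u in pairs if u > 0]  # T is undefined when both prints are empty
    inter = [i for i, _ in pairs]
    union = [u for _, u in pairs]
    scores = [i / u for i, u in pairs]
    return {
        "mu_t": statistics.fmean(scores), "sd_t": statistics.pstdev(scores),
        "mu_i": statistics.fmean(inter), "sd_i": statistics.pstdev(inter),
        "mu_u": statistics.fmean(union), "sd_u": statistics.pstdev(union),
        "rho": pearson(inter, union),
    }


def ratio_of_gaussians_moments(mu_i, sd_i, mu_u, sd_u, rho, n_t=1000, n_u=200):
    """Mean and std of T = I/U for bivariate normal (I, U), by numerically integrating
    the ratio density f_T(t) = integral of |u| * p(t*u, u) du for t in [0, 1]."""
    c = 1.0 - rho * rho
    norm = 1.0 / (2.0 * math.pi * sd_i * sd_u * math.sqrt(c))

    def bvn_pdf(x, y):
        zx, zy = (x - mu_i) / sd_i, (y - mu_u) / sd_u
        return norm * math.exp(-(zx * zx - 2.0 * rho * zx * zy + zy * zy) / (2.0 * c))

    du = 16.0 * sd_u / n_u  # midpoint grid over mu_u +/- 8 sd_u
    us = [mu_u - 8.0 * sd_u + (k + 0.5) * du for k in range(n_u)]
    dt = 1.0 / n_t
    ts = [(j + 0.5) * dt for j in range(n_t)]
    dens = [du * sum(abs(u) * bvn_pdf(t * u, u) for u in us) for t in ts]
    mass = dt * sum(dens)  # renormalize: the small Gaussian mass outside [0, 1] is dropped
    mean = dt * sum(t * f for t, f in zip(ts, dens)) / mass
    var = dt * sum((t - mean) ** 2 * f for t, f in zip(ts, dens)) / mass
    return mean, math.sqrt(var)


# Hypothetical 256-bit fingerprints with bit probabilities spread over [0.01, 0.30];
# these are illustrative values, not the ChemDB bit frequencies.
probs = [0.01 + 0.29 * ((7 * k) % 256) / 255 for k in range(256)]
emp = fit_parameters(sample_multiple_bernoulli(probs, 2000))
mu_model, sd_model = ratio_of_gaussians_moments(
    emp["mu_i"], emp["sd_i"], emp["mu_u"], emp["sd_u"], emp["rho"])
print(f"simulated:          mu={emp['mu_t']:.3f} sigma={emp['sd_t']:.3f} rho={emp['rho']:.2f}")
print(f"ratio of Gaussians: mu={mu_model:.3f} sigma={sd_model:.3f}")
```

With independent bits, each bit contributes a covariance of p²(1 − p)² between I and U, so ρ comes out small and positive, far below the 0.82 listed for ChemDB, while the ratio-of-Gaussians mean and width should track the simulated ones closely. Swapping in real bit frequencies, or real fingerprint pairs in place of `sample_multiple_bernoulli`, would be the natural next step.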