Author manuscript; available in PMC: 2008 Nov 1.

Published in final edited form as: J Chem Inf Model. 2007 Oct 30;47(6):2098–2109. doi: 10.1021/ci700200n

Table 7.

Speed benchmarks for various compression algorithms, given as the approximate time in seconds to perform 100 queries across a random set of 50,000 molecules from ChemDB (5 million similarity calculations) with N_hash = 2³⁰, using binary fingerprints and the Tanimoto similarity measure. The last two rows correspond to modulo compression. Modulo-Uncorrected corresponds to computing the Tanimoto similarity directly on the compressed fingerprints as an estimate of the Tanimoto similarity on the uncompressed fingerprints. Modulo-Corrected refers to a better estimate of the uncompressed Tanimoto similarity derived in.^? These benchmarks were carried out on a 2.0 GHz Intel dual-core Macintosh laptop computer.

Encoding	Path (s)	Circular (s)
Golomb-Rice[hash]	29.7	8.6
Golomb-Rice[posthash,sorted]	20.9	6.5
MOV[hash]	32.1	9.6
MOV[posthash,sorted]	22.0	6.3
MOL[hash]	27.2	9.0
MOL[posthash,sorted]	20.9	6.4
Modulo-Corrected (N=1024)	4.0	4.0
Modulo-Uncorrected (N=1024)	2.8	2.8
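
To make the Modulo-Uncorrected row concrete, the following is a minimal sketch of modulo folding and of computing the Tanimoto similarity directly on the folded fingerprints. It assumes fingerprints are stored as sets of on-bit indices; the function names `fold_modulo` and `tanimoto`, the toy bit indices, and the convention of returning 0.0 for two empty fingerprints are illustrative assumptions, not taken from the paper. The corrected estimator used in the Modulo-Corrected row is not reproduced here.

```python
def fold_modulo(on_bits, n_fold=1024):
    """Fold a sparse binary fingerprint (a set of on-bit indices) to n_fold bits.

    Each index is reduced modulo n_fold; indices that collide are merged
    into a single on bit, which is why the compression is lossy.
    """
    return {i % n_fold for i in on_bits}


def tanimoto(a, b):
    """Tanimoto similarity of two binary fingerprints given as sets of on bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # assumed convention for two empty fingerprints
    return len(a & b) / union


# Toy fingerprints (hypothetical bit indices in a large hash space).
fp_a = {3, 1027, 5000, 2**20 + 7}
fp_b = {3, 2051, 5000, 9}

# Exact similarity on the uncompressed fingerprints: 2 shared bits out of 6.
exact = tanimoto(fp_a, fp_b)

# Uncorrected estimate: the same measure applied to the folded fingerprints.
folded_a = fold_modulo(fp_a, 1024)
folded_b = fold_modulo(fp_b, 1024)
uncorrected = tanimoto(folded_a, folded_b)

print(f"exact = {exact:.3f}, uncorrected (N=1024) = {uncorrected:.3f}")
```

In this toy case, bits 1027 and 2051 both fold onto bit 3, so the folded fingerprints share 2 of 4 bits and the uncorrected value (0.5) exceeds the exact value (about 0.333). This kind of collision bias is what a corrected estimate is meant to reduce.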