Author manuscript; available in PMC: 2008 Nov 1.
Published in final edited form as: J Chem Inf Model. 2007 Oct 30;47(6):2098–2109. doi: 10.1021/ci700200n

Table 7.

Speed benchmarks for various compression algorithms: approximate time in seconds to perform 100 queries against a random set of 50,000 molecules from ChemDB (5 million similarity calculations) with N_hash = 2^30, using binary fingerprints and the Tanimoto similarity measure. The last two rows correspond to modulo compression. Modulo-Uncorrected computes the Tanimoto similarity directly on the compressed fingerprints as an estimate of the Tanimoto similarity on the uncompressed fingerprints. Modulo-Corrected refers to a better estimate of the uncompressed Tanimoto similarity derived in [?]. These benchmarks were carried out on a 2.0 GHz Intel dual-core Macintosh laptop computer.

Encoding                        Path (s)    Circular (s)
Golomb-Rice[hash]               29.7        8.6
Golomb-Rice[posthash,sorted]    20.9        6.5
MOV[hash]                       32.1        9.6
MOV[posthash,sorted]            22.0        6.3
MOL[hash]                       27.2        9.0
MOL[posthash,sorted]            20.9        6.4
Modulo-Corrected (N=1024)       4.0         4.0
Modulo-Uncorrected (N=1024)     2.8         2.8
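
To make the Modulo-Uncorrected row concrete, the following minimal sketch folds two sparse binary fingerprints to N = 1024 bits and computes the Tanimoto similarity before and after folding. It is an illustration under stated assumptions, not the benchmarked implementation: fingerprints are assumed to be stored as sets of on-bit indices, folding is assumed to map index i to i mod N with colliding bits merged, and the names fold_modulo and tanimoto are hypothetical. The Modulo-Corrected estimate is not reproduced because its formula is not given here.

```python
def fold_modulo(on_bits, n=1024):
    """Fold a sparse fingerprint to n bits (assumed scheme: index mod n, collisions merged)."""
    return {i % n for i in on_bits}


def tanimoto(a, b):
    """Tanimoto similarity |a & b| / |a | b| for fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    # Assumption: two empty fingerprints are assigned similarity 0.0.
    return len(a & b) / union if union else 0.0


# Two made-up fingerprints (illustrative indices, not ChemDB data).
fp_a = {3, 1027, 5000}
fp_b = {3, 2051, 7000}

exact = tanimoto(fp_a, fp_b)                               # on uncompressed bits
uncorrected = tanimoto(fold_modulo(fp_a), fold_modulo(fp_b))  # on folded bits

print(f"exact={exact:.3f} uncorrected={uncorrected:.3f}")
```

In this toy case the indices 1027 and 2051 both collide with index 3 after folding, so the similarity computed on the folded fingerprints (1/3) overestimates the uncompressed value (1/5). Discrepancies of this kind are what the corrected estimate is meant to reduce.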