. 2018 Dec 18;10:66. doi: 10.1186/s13321-018-0321-8

Fig. 8.

ChEMBL (n = 1.7M) k-nearest neighbor search performance of 2048-D MHFP6 indexed using LSH Forest and 2048-D ECFP4 indexed using Annoy. Recovery rates for both implementations depend on the parameters k, kc, and l. a While LSH Forest performs better for k = 5 and k = 10 nearest neighbors, Annoy surpasses LSH Forest for k = 50 and k = 100. b Increasing the number of retrieved nearest neighbors by a factor of kc greatly improves the performance of both approximate nearest neighbor (ANN) methods. While LSH Forest (orange) performs worse than Annoy (green) for kc < 20, it surpasses Annoy for higher values. c Increasing the number of trees l increases the recovery rate for both methods at the expense of main memory. While Annoy performs slightly better for l = 8, ..., 128, the performance of LSH Forest increases at a greater rate, overtaking Annoy at l = 256. d, e Increasing the values of parameters kc and k negatively affects the query times of Annoy. While the average query time for LSH Forest remains below 100 ms for k = 50 and k = 100, Annoy's average query time increases to above 100 and 200 ms, respectively. f As the number of prefix trees in LSH Forest, and thus the recovery rate, increases, the query time decreases. In contrast, an increase in the number of Annoy trees, which benefits the recovery rate, also increases the query time. For subplots a, d; b, e; and c, f, the data has been aggregated over all measured values of kc, l; k, l; and kc, k, respectively.
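
The recovery rate and the role of kc described in the caption can be illustrated with a minimal Python sketch. It does not use LSH Forest or Annoy: the approximate index is replaced by a linear scan over very short MinHash signatures, and the data are random feature sets rather than MHFP6 or ECFP4 fingerprints. It also assumes that kc acts as an oversampling factor, meaning that k * kc candidates are retrieved and then re-ranked by exact Jaccard distance; the caption itself only states that the number of nearest neighbors is increased by a factor of kc. All function names are hypothetical.

```python
import random


def jaccard_distance(a, b):
    """Exact Jaccard distance between two feature sets."""
    union = len(a | b)
    return 1.0 - len(a & b) / union if union else 0.0


def minhash_signature(features, seeds):
    """Tiny MinHash signature: one minimum per seeded hash function."""
    return [min(hash((seed, f)) for f in features) for seed in seeds]


def estimated_distance(sig_a, sig_b):
    """Jaccard distance estimated from the fraction of matching minima."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return 1.0 - matches / len(sig_a)


def exact_knn(data, query, k):
    """Ground truth: linear scan with exact distances (ties broken by index)."""
    order = sorted(range(len(data)),
                   key=lambda i: (jaccard_distance(data[i], query), i))
    return order[:k]


def approximate_knn(data, signatures, query, query_sig, k, kc):
    """Stand-in for an ANN query with oversampling factor kc (assumption)."""
    # Stage 1: cheap, noisy ranking; keep k * kc candidates.
    candidates = sorted(
        range(len(data)),
        key=lambda i: (estimated_distance(signatures[i], query_sig), i),
    )[:k * kc]
    # Stage 2: re-rank the candidates by exact distance and keep the best k.
    candidates.sort(key=lambda i: (jaccard_distance(data[i], query), i))
    return candidates[:k]


def recovery_rate(approx, exact):
    """Fraction of the true k nearest neighbors found by the approximate query."""
    return len(set(approx) & set(exact)) / len(exact)


# Synthetic data: 300 random sets of 20 features each (not real fingerprints).
rng = random.Random(0)
data = [frozenset(rng.sample(range(200), 20)) for _ in range(300)]
seeds = list(range(16))  # 16 hash functions: deliberately coarse
signatures = [minhash_signature(s, seeds) for s in data]

query, query_sig, k = data[0], signatures[0], 10
exact = exact_knn(data, query, k)
rates = {
    kc: recovery_rate(
        approximate_knn(data, signatures, query, query_sig, k, kc), exact)
    for kc in (1, 2, 5, 10, 30)
}
print(rates)
```

In this toy setup the recovery rate cannot decrease as kc grows, because each larger candidate set contains the smaller ones, and it reaches 1.0 once k * kc covers the whole data set. The extra exact distance computations in the second stage illustrate one way a larger kc can cost query time; the timings reported in the figure depend on the actual index implementations.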