Abstract

Over the last five years, virtual screening of ultralarge synthesis on-demand libraries has emerged as a powerful tool for hit identification in drug discovery programs. As these libraries have grown to tens of billions of molecules, we have reached a point where it is no longer cost-effective to screen every molecule virtually. To address these challenges, several groups have developed heuristic search methods to rapidly identify the best molecules on a virtual screen. This article describes the application of Thompson sampling (TS), an active learning approach that streamlines the virtual screening of large combinatorial libraries by performing a probabilistic search in the reagent space, thereby never requiring the full enumeration of the library. TS is a general technique that can be applied to various virtual screening modalities, including 2D and 3D similarity search, docking, and application of machine-learning models. In an illustrative example, we show that TS can identify more than half of the top 100 molecules from a docking-based virtual screen of 335 million molecules by evaluating 1% of the data set.
Introduction
Virtual screening1,2 is widely used for hit identification in academic and pharmaceutical drug discovery. Various computational methods have been used to search through collections of inexpensive commercially available molecules and identify starting points for optimization. When protein structures are available, docking3,4 can be used to identify potential binders. Similarity searches5,6 can be performed using 2D7 or 3D8 structures when one or more reference ligands are available. Active and inactive molecules can also be used to train machine learning models9,10 that prioritize molecules for synthesis or purchase. Until recently, virtual screening was typically applied to collections containing a few million molecules. In these cases, it was possible to evaluate every molecule in the data set exhaustively. More recently, several companies have offered ultralarge synthesis on-demand libraries.11,12 These libraries are constructed from combinatorial reactions of available reagents. By combining dozens to hundreds of reactions, vendors can assemble collections consisting of billions of molecules. A vendor typically provides the structures of available molecules in an electronic form that can be searched using virtual screening software. The corresponding compounds are often available for between 100 and 200 US dollars each. These compounds are typically delivered in 6–8 weeks, with synthesis success rates approaching 80%. Table 1 lists some of the current synthesis on-demand libraries. As we can see, many of these libraries consist of tens of billions of molecules.
Table 1. Examples of Ultralarge Libraries Available for Synthesis on Demand.
| database | number of molecules | reference |
|---|---|---|
| eMolecules eXplore | 2.8 × 10¹² | https://marketing.emolecules.com/explore |
| Enamine REAL | 3.5 × 10¹⁰ | https://enamine.net/compound-collections/real-compounds/real-database |
| WuXi GalaXi | 8 × 10⁹ | https://wuxibiology.com/drug-discovery-services/hit-finding-and-screening-services/virtual-screening/ |
| Otava CHEMriya | 1.2 × 10¹⁰ | https://www.otavachemicals.com/products/chemriya |
In addition to commercial synthesis on-demand efforts, several groups have assembled large virtual libraries that can be synthesized through precedented chemical reactions. Teams at pharmaceutical companies, including GSK, Lilly,13 Merck, and Pfizer, have used internally developed collections of reactions and reagents to assemble virtual libraries consisting of between 10⁸ and 10²⁰ molecules. There are also several public and open source efforts with similar aims. One of the most well-developed packages is the SyntOn14 (formerly SynthI) package by the Varnek group. SyntOn contains tools for transforming reagents into synthons and an additional program that combines these synthons into reaction products. A good description of the current state of the art in ultralarge compound collections can be found in a 2022 paper by Warr15 and a 2023 paper by Cavasotto.16
Ultralarge virtual libraries have changed the scope of virtual screening. When we deal with libraries containing billions of molecules, brute-force methods are impractical. While it is possible to exhaustively dock 1 billion molecules using cloud resources for approximately $12,000, such an approach is not financially practical for larger libraries. For instance, an exhaustive docking of 50 billion molecules would cost approximately $600K, exceeding most organizations’ virtual screening budgets. Another downside to exhaustive brute-force docking is the cost and disk space requirements for 3D conformer generation. Most docking and 3D similarity approaches require a set of pregenerated conformers. While this process only needs to be performed once for a library, the cost can be large. According to a recent publication by Sivula,17 the generation of conformers for 1.56 billion molecules required 457,600 CPU hours, and the resulting compressed database (DB) consumed more than 10 TB of disk space. If one considers increasing the size of these libraries by an order of magnitude, the computational and storage costs become prohibitive.
To overcome these limitations, groups have developed new methodologies for computationally evaluating ultralarge libraries. Several recent papers have described methods that employ machine learning models as surrogates to speed up docking calculations.18−20 While docking calculations typically take 1–10 seconds per molecule to generate and score a pose, machine learning models can process thousands of molecules per second. In an early example of this approach, Berenger21 docked 10,000 molecules and used the docking results to generate a machine learning model that predicted docking score based on a one-dimensional chemical structure representation. This model, built using approximately 20% of the data set, was subsequently used to predict the docking scores of the remaining 80% of the molecules. Across 15 targets from the Lit-PCBA data set, the authors achieved an R² > 0.7 between the predicted and observed docking scores.
The passive machine learning method described above has been extended to active learning (AL) approaches involving multiple docking and model-building rounds. The initial steps of an AL approach are like those in the method described above. One begins by selecting a subset from a large DB and docking the subset. The chemical structures and docking scores are then used to construct a machine learning model that predicts docking scores. At this point, the procedure diverges from the passive method described above. The machine learning model selects another subset, which is then docked. The selection of molecules for the second round of docking can take one of several directions. In some cases, teams will select the molecules with the highest docking scores for the next round of docking. In other cases, additional terms, such as uncertainty in the model prediction, can also be used in the selection process. The results from this second docking round are merged with those from the previous iteration and used to build a new machine learning model. This docking, model building, and selection process can continue through several iterations.
Large virtual libraries are constructed from much smaller sets of reagents. A library of a billion molecules can be synthesized from 1000 each of three different reagents. An alternate approach to ultralarge library docking is to reduce the library to its constituent building blocks, dock the building blocks, and only elaborate the most promising fragments. The V-SYNTHES22 approach begins by reducing a large combinatorial library into sets of scaffolds and single synthons, referred to as a minimal enumeration library (MEL). Each MEL component consists of a fragment constructed from a scaffold attached to a single synthon at one R-group position. The remaining attachment points on the scaffold are filled with methyl or phenyl capping groups. In creating the MEL, the authors reduced the size of the REAL library from 11 billion to 600,000. The molecules from the MEL were then docked into the protein of interest, and the most promising fragments were identified. In a subsequent step, the capped R-groups were iteratively replaced with substituents found in the corresponding fully elaborated molecules from the REAL collection. Finally, the corresponding fully enumerated molecules were docked, and their scores were recorded. A 2021 paper by Sadybekov describes the application of V-SYNTHES to identify inhibitors of kinase ROCK1. By docking only 0.1% of an 11 billion molecule library, the authors achieved a 33% hit rate and identified 14 submicromolar ligands. In a 2022 paper, Beroza23 took a similar approach but limited the analysis to a subset of the Enamine REAL collection of approximately 1 billion molecules from 102 two-component reactions. One potential drawback to the approaches used by V-SYNTHES and Beroza is that the docking of fragments may not recapitulate poses found with fully elaborated molecules.
In subsequent sections, we describe Thompson sampling24 (TS), a probabilistic approach that takes advantage of the combinatorial nature of ultralarge libraries. TS achieves efficiency gains by working in the reagent space rather than the product space. Since TS operates in the reagent space, it is not necessary to enumerate the full library. As libraries approach hundreds of millions or even billions of molecules, TS provides an advantage over other AL approaches that require enumeration of, and inference over, the full library. We begin by describing the TS method and its theoretical foundations. Following this, we provide practical applications of TS for three different virtual screening scenarios.
Methodology
TS is an algorithm for the multiarmed bandit problem25 that efficiently balances the trade-off between exploration and exploitation. An agent takes actions sequentially based on the probability of an action being optimal. The agent observes the outcome (reward) of a selected action, updates its beliefs about the “armed bandit” from which the action was taken, and then repeats the process with updated beliefs. In TS, a distribution representing the current beliefs of the expected rewards of each possible action is maintained. At each step, i, the agent randomly samples a value from the belief distribution of the expected reward of each action, selects an action by taking the maximum (or minimum, depending on the optimization objective) over the random samples, observes the result of the action taken, and performs the Bayesian update to the belief distribution for the action that was chosen. This updated distribution becomes the new belief distribution for this action, and the process is repeated.
When TS is used to search a virtual library, we run a separate TS process for each reaction component. For example, in a simple two-component coupling reaction, TS for the first component and TS for the second component run in parallel: from the perspective of the first reagent, the varying choice of the second reagent is what produces the distribution of scores. An action consists of selecting a reagent for each component and making a molecule (in silico) from those reagents. The molecule is scored with the desired scoring function (e.g., 3D overlays or docking), and the observed score is treated as the reward. Finally, the belief distributions of the expected rewards for each reagent from which the molecule was made are updated with the observed score, and the process is repeated iteratively.
There are a few requirements for running a virtual screen with TS. First, the library must comprise reactions and reagents rather than an unstructured list of molecules. The second requirement is that the scoring function being used to screen molecules must be reasonably fast relative to the size of the library being screened and the compute resources available. For example, running TS with a scoring function that takes a day to run on a library containing a billion molecules may be cost or resource prohibitive. Sampling 1% of the library would require ∼27,000 CPU years (eq 1) of compute. However, running TS using a relatively fast scoring function (i.e., that takes ∼1 to 10 s) may be run on a library of 1 billion molecules in 2800 CPU hours (eq 2); in contrast, the full enumeration would take roughly 32 CPU years (eq 3).
Approximate time to run TS (assuming 1% of the library is sampled) on a library of 1 × 10⁹ molecules with a scoring function that takes 1 day to complete:

10⁹ molecules × 0.01 × 1 CPU day = 10⁷ CPU days ≈ 27,000 CPU years    (1)

Approximate time to run TS (assuming 1% of the library is sampled) on a library of 1 × 10⁹ molecules with a scoring function that takes 1 s to complete:

10⁹ molecules × 0.01 × 1 CPU s = 10⁷ CPU s ≈ 2,800 CPU hours    (2)

Approximate time to run the full enumeration of a 1 billion compound library with a 1 s scoring function:

10⁹ molecules × 1 CPU s = 10⁹ CPU s ≈ 32 CPU years    (3)
When working with a library composed of reactions, TS is conducted on only a single reaction at a time (though if there are multiple reactions in a library, the multiple reactions can be searched in parallel). There is no limit to the number of components allowed in a reaction. However, any reagent in a given component of the reaction (e.g., R1) must be able to be combined with any reagent in any other component of the reaction (Rn) to form a valid molecule.
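This combinatorial constraint means the product space is simply the Cartesian product of the reagent lists. A minimal illustration in Python (the reagent names are hypothetical placeholders):

```python
import itertools

# Hypothetical reagent lists for a two-component reaction.
r1_reagents = ["acid_A", "acid_B", "acid_C"]
r2_reagents = ["amine_X", "amine_Y"]

# The virtual library is the Cartesian product of the reagent lists:
# every R1 reagent must form a valid product with every R2 reagent.
library = list(itertools.product(r1_reagents, r2_reagents))

print(len(library))  # 3 x 2 = 6 products
```

This is why a billion-molecule library can be defined by only a few thousand reagents: the product count is the product of the reagent counts.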
While the TS algorithm can conceptually work with any form of belief distribution, certain forms make updating the belief distributions much simpler. We chose to assume that the scores associated with every reagent are normally distributed with a known standard deviation

X_i ~ N(μ_i, σ²)

where i indexes a reagent, X_i is the random variable for the reagent’s scores, μ_i is the true (but unknown) mean, and σ is the known standard deviation. We discuss below the source of this known standard deviation. Given this assumption, the belief distribution for a reagent is represented by the belief distribution for μ_i at time t, which we also assume is normally distributed

μ_i ~ N(μ_{i,t}, σ²_{i,t})

where μ_{i,t} and σ²_{i,t} are indexed by reagent i at time t.

With these assumptions, updating the belief distribution with an observation x_{i,t} is trivial:

μ_{i,t+1} = (σ²_{i,t} · x_{i,t} + σ² · μ_{i,t}) / (σ²_{i,t} + σ²)    (4)

σ²_{i,t+1} = (σ²_{i,t} · σ²) / (σ²_{i,t} + σ²)    (5)
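The conjugate normal update of eqs 4 and 5 can be written in a few lines of Python. This is a minimal sketch with hypothetical numbers, not our packaged implementation:

```python
def update_belief(mu_t, var_t, x, obs_var):
    """One Bayesian update of a normal belief N(mu_t, var_t) over a reagent's
    mean score, after observing score x with known observation variance obs_var.
    This is the standard normal-normal conjugate update (eqs 4 and 5)."""
    mu_next = (var_t * x + obs_var * mu_t) / (var_t + obs_var)
    var_next = (var_t * obs_var) / (var_t + obs_var)
    return mu_next, var_next

# A score above the current mean pulls the belief up; the variance always shrinks.
mu, var = update_belief(mu_t=0.5, var_t=0.04, x=0.9, obs_var=0.04)
print(mu, var)  # 0.7 0.02
```

Note that the belief variance decreases with every observation regardless of the score, so reagents that are sampled often end up with tightly concentrated beliefs.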
In practice, we find that a normal distribution works well for representing standard scoring functions, such as docking and 3D overlays. Figure 1 shows the distribution of docking, ROCS, and Tanimoto similarity scores for the full enumeration of a single reagent from a reaction in Enamine REAL SPACE, overlaid by a normal distribution probability density function (PDF). While these scores have a limited range, unlike the normal distribution, the differences in the tails of the distributions do not appear to have a practical effect.
Figure 1.
Density plot of scores from enumeration of a single randomly selected reagent in a reaction, overlaid by a normal distribution in black for (A) docking, (B) 3D Overlays using OpenEye’s ROCS TanimotoCombo score, and (C) 2D Similarity Tanimoto score.
We used a simple random warmup procedure to produce the initial belief distributions. This is done by making and scoring n random molecules for each reagent in the reaction, where n is determined by the number of components in the reaction. For two-component reactions, we find n = 3 works well; for reactions with >2 components, we find n = 10 is sufficient. For all scores observed during the warmup procedure, we calculate the empirical mean x̄_W and empirical standard deviation σ̂_W. We set the known standard deviation and the belief distributions for all reagents from these values:

σ = σ̂_W,  μ_{i,0} = x̄_W,  σ²_{i,0} = σ̂²_W

We then apply the belief update above to each reagent using the scores observed for that reagent during warmup.
Setting the priors this way amounts to starting with the belief that all reagents are equivalent, with score distributions equal to the full distribution observed during warmup. The true standard deviation of X_i is likely lower than the empirical standard deviation seen during warmup (since a given reagent’s scores are likely tighter than the spread across all reagents), but we find this choice of known standard deviation works well in practice.
Next, we repeat the following steps:
1. For each reagent, randomly sample a value from the belief distribution N(μ_{i,t}, σ²_{i,t}) of that reagent’s mean.
2. Select a reagent for each reaction component (R1–Rn) by taking the maximum or minimum (depending on the scoring objective) of the sampled values for each component.
3. Make the molecule (in silico) with the selected reagents and perform a conformational analysis (if the scoring function requires one).
4. Score the molecule.
5. Perform the Bayesian update to the belief distribution (eqs 4 and 5).
The top n molecules (usually 100–1000) and their scores are tracked throughout the process.
The above steps are repeated until either of the stopping criteria has been reached:
All molecules in the library have been sampled (i.e., full enumeration, usually only for small reactions)
The maximum number of iterations is reached (typically set to 0.1 to 1% of the library size, including warmup iterations)
Note that determining the optimal stopping criteria is an open area of research.
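The warmup-plus-sampling procedure described above can be sketched in pure Python. This is an illustrative sketch, not our released implementation: the reagent lists and scoring function are stand-ins supplied by the caller, and the priors are set from the warmup statistics as described in the text.

```python
import random
import statistics

def thompson_sampling(components, score, n_warmup=3, n_iterations=2000, seed=7):
    """Minimal Thompson sampling over a combinatorial library.

    components: one list of reagents per reaction component.
    score: maps a tuple of reagents (one per component) to a float; higher is better.
    Returns all evaluated molecules sorted from best to worst score.
    """
    rng = random.Random(seed)
    seen = {}  # molecule tuple -> score; tracks everything evaluated

    def evaluate(pick):
        mol = tuple(components[c][i] for c, i in enumerate(pick))
        if mol not in seen:
            seen[mol] = score(mol)
        return seen[mol]

    # Warmup: score each reagent with n_warmup random partner sets.
    warmup = {}  # (component, reagent index) -> observed scores
    for c, comp in enumerate(components):
        for i in range(len(comp)):
            for _ in range(n_warmup):
                pick = [rng.randrange(len(cc)) for cc in components]
                pick[c] = i
                warmup.setdefault((c, i), []).append(evaluate(pick))

    # Priors: every reagent starts at the pooled warmup mean and variance;
    # the pooled warmup variance also serves as the "known" observation variance.
    pooled = [s for scores in warmup.values() for s in scores]
    obs_var = statistics.pvariance(pooled)
    mu = [[statistics.mean(pooled)] * len(comp) for comp in components]
    var = [[obs_var] * len(comp) for comp in components]

    def update(c, i, x):  # conjugate normal update (eqs 4 and 5)
        m, v = mu[c][i], var[c][i]
        mu[c][i] = (v * x + obs_var * m) / (v + obs_var)
        var[c][i] = (v * obs_var) / (v + obs_var)

    for (c, i), scores in warmup.items():
        for s in scores:
            update(c, i, s)

    # Main loop: draw from each belief, take the argmax reagent per component,
    # score the resulting molecule, and update the chosen reagents' beliefs.
    for _ in range(n_iterations):
        pick = []
        for c, comp in enumerate(components):
            draws = [rng.gauss(mu[c][i], var[c][i] ** 0.5) for i in range(len(comp))]
            pick.append(max(range(len(comp)), key=draws.__getitem__))
        s = evaluate(pick)
        for c, i in enumerate(pick):
            update(c, i, s)

    return sorted(seen.items(), key=lambda kv: kv[1], reverse=True)
```

On a toy two-component library with an additive scoring function, this sketch recovers high-scoring products while evaluating only a fraction of the full Cartesian product.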
Results and Discussion
Evaluating TS with Tanimoto Similarity Searches
As a preliminary evaluation of the method, we used TS to identify the molecules in a large combinatorial library most similar to a specific query molecule. In this evaluation, we compared the performance of TS with an exhaustive Tanimoto similarity search. Similarity calculations were performed using a 2048-bit Morgan fingerprint with a radius of 2 from version 2023.03.1 of the RDKit. We investigated how many of the 100 most similar molecules to the query could be identified using a TS run that evaluated only a small fraction of the total combinatorial library. For this example, we considered a library assembled using a reaction transform based on the Niementowski Quinazoline Synthesis,26 as shown in Figure 2. The reagent pool associated with this reaction comprised 376 aminobenzoic acids, 500 primary amines, and 500 carboxylic acids. In this case, the fully enumerated library contained 94 million products.
Figure 2.

Niementowski quinazoline synthesis.
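The Tanimoto coefficient used as the objective here is a simple set overlap on fingerprint bits. A pure-Python sketch (the study used RDKit Morgan fingerprints; the bit sets below are hypothetical toy inputs):

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints represented as sets of
    on-bit indices: |A ∩ B| / |A ∪ B|. Assumes at least one bit is set."""
    intersection = len(fp_a & fp_b)
    # The union size is expanded via inclusion-exclusion.
    return intersection / (len(fp_a) + len(fp_b) - intersection)

print(tanimoto({1, 2, 3, 4}, {3, 4, 5, 6}))  # 2 shared bits / 6 total = 0.333...
```

In practice, the same quantity is computed directly on RDKit fingerprint objects, but the definition above is all the scoring function needs to supply to TS.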
A random structure from the quinazoline library, as shown in Figure 3, was selected as the query molecule. We intentionally chose to use a query molecule from the library in order to provide a reproducible example that simulated the situation where “interesting” hits could be found through a similarity search. We tried the same exercise with 10 hit molecules selected from a 2023 review by Brown27 and found that the most similar molecules found in an exhaustive search had Tanimoto coefficients between 0.23 and 0.34. These Tanimoto coefficients do not fall into a range we would typically consider for follow-up. Two molecules in the quinazoline data set had a Tanimoto coefficient of 1.0 to the query: the query itself and its enantiomer. An exhaustive calculation of the Tanimoto similarity of the query to all 94 million products required 144 CPU hours.
Figure 3.

Query molecule used to search the three-component library. The SMILES for the query molecules is CCc1cccc2c(=O)n(C3CNC3)c([C@@H](C)N)nc12.
To effectively run TS, we must understand how much of a library should be sampled to identify the best molecules. To investigate this, we carried out separate TS runs for between 2,000 and 100,000 iterations. Each TS run was performed twice, once with 3 warmup cycles and once with 10 warmup cycles. The results of these calculations are shown in Figure 4. The strip plot at the top of the figure shows the Tanimoto coefficients for the top 100 molecules compared with the scores for the exhaustive search, as shown on the right in green. The scores from the TS runs compare favorably with those from the exhaustive search. Even with a small number of iterations, TS finds high-scoring molecules. Furthermore, TS with either 3 or 10 warmup cycles identifies molecules similar to the query. The bar plots at the bottom of Figure 4 show how many of the top 100 molecules were recovered in each TS run. When the number of iterations is 10,000 or less, we see a slight increase in recovery for the TS runs with 10 warmup cycles. As we move to 50,000 and 100,000 iterations, we see no impact from the number of warmup cycles. Moreover, we see no difference in recovery between 50,000 and 100,000 iterations. After 100,000 iterations, which is only 0.1% of the total library, we can recover 90 of the top 100 molecules. A TS run with 100,000 iterations took 1.27 min on a single CPU.
Figure 4.
Impact of the number of iterations on the TS performance. The strip plot in the top figure shows the Tanimoto coefficients of the 100 top-scoring molecules. The “ref” column on the far right shows scores from an exhaustive search of 94 million molecules. The bar plots in the lower panel show the number of top 100 molecules recovered as a function of the number of iterations.
In both TS and the exhaustive search, we generated distributions of scores. It is important to determine whether there is a statistically significant difference in the means of the top 100 scores. To compare the score distributions obtained with different warmup schedules and to compare TS with the exhaustive search, we used Student’s t-test. We applied the t-test to all pairs among the distributions shown in Figure 4 and calculated a p-value to determine whether we could reject the null hypothesis that the means were equivalent. While 0.05 is a widely accepted threshold for significance, we must consider that we are performing multiple comparisons. The probability of falsely rejecting the null hypothesis increases when multiple comparisons are performed, so we must correct the threshold for significance. One means of adjusting for multiple comparisons is the Bonferroni correction, which divides the threshold for significance by N, where N is the number of comparisons. In the case of Figure 4, where 11 TS runs are being compared, the threshold would be 0.05/11, or 0.0045.
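The Bonferroni correction amounts to a one-line adjustment of the significance threshold. A minimal sketch using the comparison count from the text (the p-values in the list are hypothetical):

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance threshold under the Bonferroni correction."""
    return alpha / n_comparisons

# 11 runs compared at an overall alpha of 0.05, as in the text.
threshold = bonferroni_threshold(0.05, 11)
print(round(threshold, 4))  # 0.0045

# Hypothetical p-values: only values below the corrected threshold are significant.
p_values = [0.056, 0.15, 1.0]
significant = [p for p in p_values if p < threshold]
print(significant)  # []
```

With the corrected threshold, none of the hypothetical p-values above would be declared significant, mirroring the conclusion drawn from Figure S1.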
Figure S1 shows a heatmap with the p-values from the t-test for all pairs of distributions shown in the top half of Figure 4. The smallest p-value for any comparison was 0.056 for the comparison between the exhaustive reference (ref) and TS with 2000 iterations and three warmup cycles. The remaining p-values ranged between 0.15 and 1.0. We conclude that there is no significant difference between the mean score of the top 100 molecules in each case.
To assess the reproducibility of our TS procedure, we performed two different series of 10 separate TS runs (20 in total). The only difference between the two series was how the warmup phase was conducted. In the first series, each reagent was sampled with three random partner reagents. In the second series, the number of partner reagents sampled was increased to 10. Following the guidance from the study above, each TS run used 50,000 iterations, searching 0.05% of the total library. In the top portion of Figure 5, we again see a strip plot of the Tanimoto coefficients for the top 100 molecules. The “concat” column shows the scores for concatenating the 10 TS runs and selecting the 100 highest-scoring unique molecules. In all of the TS runs, we observe score distributions similar to those of the top 100 from the exhaustive search, as shown in green on the right of the plot. In the bottom portion of Figure 5, the TS runs consistently recover between 88 and 90 of the top 100 molecules. As described above, we performed a statistical analysis of all pairs of score distributions, as shown in the top portion of Figure 5. The results of this analysis are listed in Figure S2. In this case, we were unable to establish a statistically significant difference between any pair of distributions. The lowest p-value for any of the comparisons was 0.64.
Figure 5.
Evaluation of the reproducibility of the TS procedure. The top panel shows the Tanimoto coefficients of 100 highest-scoring molecules, with the results of an exhaustive evaluation shown in green on the right. The bar plots in the bottom section of the figure show the number of top 100 molecules recovered over 10 replicate TS runs. As shown in Figure 4, we compare 3 and 10 warmup cycles.
To provide a baseline comparison for TS, we compared with a random selection where we sampled 50,000 molecules from the quinazoline library highlighted in Figure 2. As above, we evaluated the number of top 100 molecules identified by TS and compared it with those identified using 50,000 random samples. For TS, we used 10 warmup cycles and stopped the search after 50,000 evaluations. The TS and random selections were repeated 10 times using Tanimoto similarity to the query molecule in Figure 3 as the objective. The results of this comparison are shown in Figure 6, which adopts the format of the figures above. The top strip plot shows the Tanimoto coefficients of the top 100 molecules selected randomly in red and those selected by TS in orange. The green points on the right represent the scores of the top 100 molecules from an exhaustive search. The bar plot at the bottom of Figure 6 shows the number of top 100 molecules found using TS and random search. As described above, the “concat” column represents the top 100 molecules from a concatenation of all 10 TS runs. In 9 of the 10 cases, the random search did not identify any of the top 100 molecules. In replicate 7, the random search identified 1 of the top 100 molecules.
Figure 6.
Comparison of TS with a random search. The top panel shows the Tanimoto coefficients of 100 highest-scoring molecules. The results of random searches are shown in red, TS is shown in orange, and an exhaustive evaluation is shown in green on the right. The bar plots in the bottom section of the figure show the number of top 100 molecules recovered over 10 replicate TS runs. In 9 of 10 replicates, the random search failed to identify any of the top 100 molecules. As a result, the red bars in the bottom panel are not visible.
To test the null hypothesis that the means of the distributions in Figure 6 are equivalent, we performed a statistical analysis similar to that described above. The results of this analysis are shown in Figure S3. In this case, we see significant differences between TS and the random selection, with p-values less than 10⁻¹⁰⁰. The differences among the 11 TS replicates were not significant.
What is TS Missing?
In the example presented above, TS identified 90 of the top 100 hits from the Tanimoto similarity search. Moreover, concatenating the results of multiple searches had a minimal impact. In most cases, TS identifies the same hits across all 10 replicates. To better understand the TS performance, we compiled Table 2, which shows all the building blocks contained in the top 100 hits. Based on the exhaustive search results, the top 100 molecules were constructed from 5 R1, 12 R2, and 24 R3 building blocks. In addition to showing structures, Table 2 indicates the frequency with which the building blocks occur in the top 100. Under each building block structure, we show the number of molecules containing a particular building block and the number of instances found by TS. For instance, the R1 building block on the left was contained in 92 of the top 100 molecules, and 90 of 92 instances were found by TS. As such, we list “90/92” below the structure.
Table 2. Building Blocks Used to Generate the Top Molecules in the Tanimoto Similarity Example.ᵃ
ᵃ The numerator in the fraction below each structure indicates the number of occurrences of that reagent found by TS, while the denominator indicates the number of occurrences in the top 100 molecules.
Examining Table 2, we see that while TS identified only one of the five R1 building blocks, that building block accounted for 92 of the top 100 molecules. This result points to one potential limitation of our current TS implementation. When one building block contributes significantly to the score, the diversity of solutions at that position can be limited. Several approaches could potentially be applied to increase sampling diversity, including limiting the number of times a building block can be sampled or introducing a poling approach similar to that used by Smellie28 for conformational sampling. Improving the sampling efficiency is an active research area for our group, and we hope to report further advances in a future publication. By releasing our open source TS code, we also hope that others will develop improved sampling approaches and publish their methods and code. While TS fails to identify some R1 building blocks, the results for R2 and R3 are more promising. For R2, TS identifies 11 of the 12 top building blocks and fails to identify only the cyclobutyl sulfone on the far right in the second row. For R3, TS identifies all of the carboxylic acid building blocks. It is interesting that the two top-scoring R3 reagents are the enantiomers D-alanine and L-alanine. Since the Tanimoto similarity searches run here did not consider stereochemistry, molecules with different stereochemistry at the chiral methyl group will receive equal Tanimoto scores.
Comparing TS with Model-Assisted Active Learning
Over the past few years, ML model-assisted AL has become popular for large-scale virtual screening. In ML model-assisted AL, a machine learning model is used as a surrogate for a more computationally expensive calculation (CEC) such as docking or free-energy calculations. In an iterative process, an AL algorithm attempts to balance the exploration of the chemical space with the exploitation of specific promising regions. The AL process typically begins by sampling a small subset, typically a few hundred to a few thousand molecules, from a larger DB of molecules. We refer to this subset as the training set. After the selection, CEC is performed on the training set molecules, and a machine learning model is trained to predict the CEC values from the training set structures. This model is then used to predict the CEC values for all of the molecules in DB. These predicted values direct the selection of the next set of molecules (S1) for CEC. Several strategies can be used to select S1. In a “greedy” strategy, the N highest-scoring molecules are selected for the next round of CEC. Other selection strategies can use features such as the uncertainty in the prediction to balance exploration and exploitation. After CEC is calculated for S1, the chemical structures and corresponding values for CEC are added to the training set, and a new model is trained and used to predict a new set of CEC values for DB. This process of selecting a subset, calculating CEC, building a model, predicting on DB, and selecting another subset is continued for a predetermined number of iterations.
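The select-score-retrain loop described above can be sketched with a deliberately simple surrogate. The per-reagent additive model and the scoring function below are illustrative assumptions standing in for a full machine learning surrogate such as a Random Forest; the greedy selection strategy matches the description in the text.

```python
import random

def greedy_active_learning(library, score, n_init=100, n_per_cycle=100,
                           n_cycles=4, seed=3):
    """Greedy active learning over a two-component library.

    library: list of (r1_index, r2_index) tuples; score: molecule -> float (the CEC).
    Surrogate: predicted score = global mean + per-reagent mean deviations,
    a toy stand-in for a trained ML model. Returns all scored molecules, best first.
    """
    rng = random.Random(seed)
    train = {m: score(m) for m in rng.sample(library, n_init)}  # initial CEC round

    for _ in range(n_cycles):
        # "Train" the additive surrogate on everything scored so far.
        global_mean = sum(train.values()) / len(train)
        dev = [{}, {}]     # component -> reagent -> summed deviation from the mean
        counts = [{}, {}]  # component -> reagent -> number of observations
        for (a, b), s in train.items():
            for c, r in ((0, a), (1, b)):
                dev[c][r] = dev[c].get(r, 0.0) + (s - global_mean)
                counts[c][r] = counts[c].get(r, 0) + 1
        for c in (0, 1):
            for r in dev[c]:
                dev[c][r] /= counts[c][r]  # mean deviation per reagent

        # Predict for the whole library; greedily score the top untested molecules.
        def predict(m):
            return global_mean + dev[0].get(m[0], 0.0) + dev[1].get(m[1], 0.0)
        untested = [m for m in library if m not in train]
        untested.sort(key=predict, reverse=True)
        for m in untested[:n_per_cycle]:
            train[m] = score(m)  # next CEC round

    return sorted(train.items(), key=lambda kv: kv[1], reverse=True)
```

Note that, unlike TS, this loop enumerates and scores the surrogate over the entire library on every cycle, which is exactly the cost that becomes burdensome at billion-molecule scale.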
There are two important differences between TS and AL.
1. AL requires the complete enumeration of a library, while TS does not. This enumeration can be computationally expensive for large libraries. It took almost 7 h on a single CPU to enumerate the 94 million molecule library described above. As individual libraries approach a billion molecules, the time required for enumeration can become cumbersome. While enumerated libraries can be stored on disk and reused, multibillion molecule libraries will consume hundreds of gigabytes of disk space.
2. Model predictions must be calculated for every molecule in DB in each AL cycle. When libraries become very large, this can be time-consuming. Assuming that a machine learning model can perform inference on 10,000 molecules per second, predictions for 1 billion molecules would require 27 h for a single AL cycle. In addition, the descriptors for 1 billion molecules will typically not fit in computer memory, so parallelization or paging strategies must be employed.
While it is possible to increase the speed of enumeration and inference in AL by employing parallel processing, these improvements come at the expense of a significantly more complex computational architecture.
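The inference-cost estimate in point 2 is simple arithmetic:

```python
# Each AL cycle must run inference on every molecule in DB.
molecules = 1_000_000_000
rate = 10_000                     # model predictions per second
hours = molecules / rate / 3600   # seconds -> hours
assert 27 < hours < 28            # ~27.8 h of inference per AL cycle
```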
We used a scaled-down version of the quinazoline library in Figure 2 to provide a computationally tractable and reproducible comparison of TS and AL. In this example, we used 100 each of the R1, R2, and R3 reagents for a total library of 1 million molecules. To achieve parity in the number of evaluations performed by TS and AL, we performed a total of 5000 evaluations for each method. For the TS arm of the comparison, we stopped after 5000 evaluations. For AL, we performed five cycles with 1000 molecules per cycle for a total of 5000 evaluations. As in the reproducibility analyses above, we carried out 10 replicates of TS and AL.
TS was performed as described above with 10 warmup cycles. For AL, we initially sampled 1000 molecules and calculated their Tanimoto similarities to the query molecule shown in Figure 3. These Tanimoto similarities and the structures of the 1000 molecules were then used to build an ML model. In the model, the molecules were represented by 2048-bit Morgan fingerprints with a radius of 2. Random Forest regression, as implemented in scikit-learn version 1.2.2, was used to build a model that predicted the Tanimoto similarity. This model was then used to perform inference on the entire 1 million molecule data set. In a greedy selection, the 1000 highest-scoring molecules not selected in the previous round(s) were added to the training set, and the model was retrained. This process was repeated for a total of five rounds. A Jupyter notebook with the AL implementation is provided in the GitHub repository associated with this paper.
Figure 7, which shows the results of this comparison between TS and AL, employs a format similar to that in Figures 4–6. In the strip plot at the top, we see that the score distributions are similar for TS and AL. Using Student's t-test to exhaustively compare the means of all distributions in the top plot in Figure 7, the only significant difference is between AL run 6 and the other runs. A heatmap with a complete set of p-values for the pairwise comparisons is available in Figure S4. The bottom barplot in Figure 7 shows the number of top 100 hits found by TS and AL in each replicate. TS finds between 79 and 84 of the top 100 hits, while AL finds between 61 and 100. This may suggest an advantage for AL methods that build models based on whole-molecule representations. However, given the small size of this library and the fact that this is a single example, it is difficult to generalize. As mentioned above, there are some current limitations to methods that perform searches in the reagent space. Additional research is necessary to determine optimal protocols for TS and the most appropriate ways to compare TS with other methods.
Figure 7.
Comparison between TS and model-assisted active learning (AL). The strip plot in the top figure shows the Tanimoto coefficients of the 100 top-scoring molecules. The “concat” column shows the scores of the 100 best unique molecules from the 10 TS and AL runs to the left. The “ref” column on the far right shows scores from an exhaustive search of one million molecules. The bar plots in the lower panel show the number of the top 100 molecules recovered in each replicate.
Thus far, we have used simple similarity searches to demonstrate the utility of the TS procedure. While this provides a means of rapidly evaluating TS, the real value of the technique becomes apparent when more computationally expensive methods, such as docking or 3D similarity searches, are used as the objective. Our open source TS implementation defines an abstract base class called an “Evaluator”, which specifies the evaluation functions for TS. Python classes inherited from the Evaluator base class can use various criteria, including 2D similarity, docking score, 3D similarity, and machine learning models. An Evaluator class accepts an RDKit molecule as an input and returns a score as a floating point number. In the Python class that defines the TS procedure, the user defines whether a higher or lower score represents an improvement.
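The Evaluator contract can be illustrated with a minimal, self-contained sketch. The actual classes in the GitHub repository operate on RDKit molecules; here, to avoid external dependencies, the hypothetical `FPEvaluator` scores SMILES strings with a toy character-set Tanimoto standing in for real fingerprint similarity.

```python
from abc import ABC, abstractmethod

class Evaluator(ABC):
    """Takes a molecule, returns a score; the TS driver is told
    separately whether higher or lower scores are better."""
    @abstractmethod
    def evaluate(self, mol) -> float:
        ...

class FPEvaluator(Evaluator):
    """Toy 2D similarity: Tanimoto over the character sets of two
    SMILES strings (a stand-in for real fingerprint similarity)."""
    def __init__(self, query_smiles: str):
        self.query = set(query_smiles)

    def evaluate(self, mol: str) -> float:
        bits = set(mol)
        return len(bits & self.query) / len(bits | self.query)
```

Swapping in a docking or 3D-similarity objective then only requires implementing `evaluate`; the TS machinery is unchanged.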
Using TS to Perform 3D Similarity Searches on 234 Million Molecules
To evaluate the ability of TS in the context of 3D similarity searches, we performed a search using ROCS29 from OpenEye Scientific Software. Calculations were performed using version 3.4.11 of the OEShape Toolkit. ROCS overlays molecular conformations and calculates scores based on the overlap of atom-centered Gaussian representations. ROCS outputs two scores, both of which are between 0 and 1. The shape score evaluates the steric overlap of two molecules, and the color score evaluates the overlap of the corresponding pharmacophoric groups. The sum of these two scores, which ranges between 0 and 2, is termed the TanimotoCombo score. In this example, we used a library based on reaction m_22bbh, as shown in Figure 8, from the Enamine REAL collection. The library comprised 13,995 secondary amines and 16,724 carboxylic acids for a total of 234,052,380 reaction products.
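As a quick sanity check on the numbers above: the TanimotoCombo score is just the sum of the two unit-interval components, and the library size is the product of the reagent counts.

```python
def tanimoto_combo(shape: float, color: float) -> float:
    """Sum of the shape and color Tanimoto scores, each in [0, 1]."""
    assert 0.0 <= shape <= 1.0 and 0.0 <= color <= 1.0
    return shape + color

assert tanimoto_combo(1.0, 1.0) == 2.0        # query overlaid on itself
assert 13_995 * 16_724 == 234_052_380          # amines x acids in m_22bbh
```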
Figure 8.

Enamine reaction M22, which was used for the ROCS and docking experiments.
As with the Tanimoto similarity example, one random molecule from the library, as shown in Figure 9, was selected as a query. There were two molecules in the library with a TanimotoCombo score of 2.0 to the query, the molecule itself and its diastereomer.
Figure 9.

Query molecule used for the ROCS searches.
An exhaustive ROCS search of this library required 110,208 CPU hours (12.5 CPU years). We used TS with ROCS as the objective function to search 0.1% of the same combinatorial library. In this case, we utilized 10 warmup cycles. Each TS run took 32 CPU hours. Ten TS runs were performed, and the top 100 unique molecules from the 10 TS runs were concatenated into a group marked “concat” in Figure 10. This figure follows the same format as Figures 4–7 above. In this case, the top panel shows the TanimotoCombo scores for the top 100 molecules from the 10 TS runs, the concatenation, and the exhaustive search. The top panel of Figure 10 shows that each of the TS runs identified many of the top-scoring molecules. This is further quantified in the lower panel, which shows the number of molecules in the top 100 identified in each run. The worst-performing TS run identified 54 of the top 100 molecules. The concatenation of the 10 TS runs identified 69 of the 100 best molecules. While the ROCS results do not show the same additive performance we saw with the Tanimoto similarity searches, we can identify between 54 and 69% of the top 100 molecules by evaluating only 0.1% of the library.
Figure 10.
Performance of TS vs an exhaustive ROCS search of 234 million molecules. The top panel shows the distribution of ROCS scores for the top 100 molecules. The “concat” column shows the scores of the 100 best unique molecules from the 10 TS runs to the left. The “ref” column shows the results from an exhaustive search. The bar plots in the bottom panel show the number of molecules within the top 100 from the exhaustive search discovered by TS.
Figure S5 shows a pairwise comparison of the score distributions from the 10 replicate TS ROCS runs and the exhaustive reference. The lowest p-value is 0.01 for the comparison between the exhaustive reference and TS run 6. Given that we are comparing 11 ROCS runs, the lowest p-value is greater than the Bonferroni-corrected threshold of 0.0045.
Using TS to Dock 335 Million Molecules
As another example of how TS can be practically applied, we examined a docking study of an internal protein target. In this case, we compared the performance of TS with exhaustive docking of the Enamine library m_22bba. This library uses the reaction defined in Figure 8 with a different set of building blocks. The reagents comprised 17,741 secondary amines and 18,873 carboxylic acids, resulting in 334,825,893 reaction products. Docking was performed using version 4.0.01 of the OEDocking30,31 toolkit from OpenEye Scientific Software. Docking scores were calculated using the ChemGauss4 scoring function, which incorporates both empirical and physics-based terms. The protein structure was prepared using the default settings from version 1.4.0.0 of the OpenEye Spruce toolkit. A box with dimensions of 20 × 23 × 19 Å, centered on the binding site, was used to define the active site for docking. The exhaustive search was run using 10,000 cloud-based CPUs on Amazon Web Services, consuming 1.6 million CPU hours (approximately 183 CPU years). To evaluate the performance of TS, we used our implementation to search the same 335 million molecules. Based on our studies, docking with TS requires more sampling than the ROCS or Tanimoto searches described above. As such, we set the number of iterations so that 1% of the library was searched.
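The library size and the 1% TS sampling budget quoted above check out:

```python
amines, acids = 17_741, 18_873
products = amines * acids
assert products == 334_825_893     # m_22bba enumerated products
budget = round(products * 0.01)    # 1% of the library searched by TS
assert budget == 3_348_259         # ~3.3 million docking evaluations
```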
We compared the results of 10 TS runs with the ground truth obtained through exhaustive docking of the entire library and evaluated the ability of TS to identify the 100 top-scoring molecules from the exhaustive search. In the top panel of Figure 11, we plot the docking scores for the top 100 molecules for each of the 10 TS runs. To be consistent with the previous figures, where larger values indicate better scores, we show the ChemGauss4 score multiplied by −1.0. As above, the “concat” column shows the top 100 docking scores for the best 100 unique molecules from the concatenation of the 10 TS runs to the left. In the bottom panel of Figure 11, we see that all 10 TS replicates were able to identify more than half of the top 100 molecules. At worst, the TS runs identified 57 of the top 100 molecules. A concatenation of the 10 runs raised the number of top 100 molecules recovered to 62. Figure S5 shows the p-values for pairwise comparisons of the distributions in the top part of Figure 11. While the differences between the 10 replicates are not significant, there is a significant difference (minimum p = 0.00033) between the TS runs and the exhaustive reference.
Figure 11.
Performance of TS vs exhaustive docking of 335 million molecules. The top panel shows the distribution of docking scores for the top 100 molecules. The “concat” column shows the scores of the 100 best unique molecules from the 10 TS runs to the left. The “ref” column shows the results from an exhaustive search. The bar plots in the bottom panel show the number of molecules within the top 100 from the exhaustive search recovered by TS.
TS under the Hood
Figure 12 depicts the background distribution of scores from the exhaustive searches for docking, 3D overlays, and Tanimoto similarity. The minimum score of the top 100 molecules identified by TS is indicated in each panel by the blue dashed vertical line, along with its Z score. The high Z scores (ranging from 6.7 to 12.5) indicate that TS effectively samples the extreme tail of the distribution.
Figure 12.
Background distributions for each of the libraries used in the exhaustive searches done with (A) docking, (B) ROCS, and (C) Tanimoto similarity. The distributions were created by randomly sampling 10,000 to 50,000 scores from the complete set of scores in the exhaustive search. The background density is shown in orange and overlaid by a normal distribution PDF. The vertical blue dashed line represents the minimum score in the top 100 molecules identified when doing TS on the same library; the Z score of the vertical line is indicated in blue text.
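The Z scores in Figure 12 are the standard standardization of the top-100 cutoff against the background score distribution; a minimal sketch:

```python
import statistics

def z_score(cutoff: float, background: list[float]) -> float:
    """How many standard deviations the minimum top-100 TS score sits
    above the mean of the background (exhaustive) score distribution."""
    mu = statistics.fmean(background)
    sigma = statistics.pstdev(background)
    return (cutoff - mu) / sigma
```

A Z score in the 6.7 to 12.5 range, as reported above, places the cutoff far out in the right tail of the background distribution.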
As we assess the impact of parameter settings on the performance of TS, it is informative to examine the evolution of the belief distributions associated with the individual reagents. In Figure 13, we plot the belief distributions for the reagents associated with the top 100 products for Tanimoto similarity, ROCS, and docking. In all three cases, we evaluated TS with the same 1 million molecule library constructed using the quinazoline reaction shown in Figure 2. In this case, the library was constructed from 100 R1, 100 R2, and 100 R3 reagents. The query molecule for the ROCS and Tanimoto similarity calculations was the structure shown in Figure 3. In this case, the query molecule was not contained in the 1 million molecules being searched. For the docking evaluation, the molecules were docked into the orthosteric site of JNK3 (PDB 2ZDT). The TS parameters were similar to those listed above. A total of 10 warmup cycles were run for all of the TS runs. The number of unique reagents present in the top 100 molecules for each method is shown in Table 3. As expected, the more “literal” Tanimoto similarity calculations were dominated by a small number of R1 and R2 reagents. In the more “abstract” ROCS and docking calculations, more reagents were involved in the top 100 molecules.
Figure 13.

Belief distributions for the reagents comprising the top 100 molecules in a 1 million molecule library. Rows in the figure correspond to the percentage of the library that has been searched.
Table 3. Number of Unique Reagents in the Top 100 Product Molecules from Tanimoto Similarity, a ROCS Search, and Docking of a 1 Million Molecule Library.
| method | R1 | R2 | R3 |
|---|---|---|---|
| Tanimoto | 2 | 5 | 23 |
| ROCS | 34 | 6 | 12 |
| docking | 24 | 9 | 41 |
Each subplot in Figure 13 shows kernel density estimates (KDEs) for the reagents used to construct the top 100 product molecules. The figure comprises three subfigures showing the KDEs for R1, R2, and R3 for Tanimoto similarity, ROCS, and docking. The rows in the figure show progressively increasing amounts of sampling. In the first row, 100 molecules (0.01% of the total) have been sampled. In the subsequent rows, 1000, 5000, and 10,000 molecules were sampled, representing 0.1, 0.5, and 1% of the total. In examining the second row in Figure 13, we see that after 0.1% of the DB has been sampled, frontrunner reagents emerge for Tanimoto similarity and ROCS. However, we also see that the belief distributions for the docking calculations are all very similar. While there are a couple of right-shifted docking distributions for R2, there are very few differences for R1 and R3. As we approach 0.5 and finally 1% of the DB searched, we see the emergence of frontrunner reagents for R1 and R2. For all three methods, we see less differentiation for the R3 distributions. For Tanimoto similarity and ROCS, the R3 group in the query molecule only contains three atoms. For docking, the R3 group explored a range of hydrophobic and hydrogen bonding interactions, and many R3 groups scored similarly. In addition to providing insights into the inner workings of TS, plots like those in Figure 13 enable the user to quickly ascertain which parts of top molecules have consistent high-scoring interactions.
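The evolving belief distributions in Figure 13 can be mimicked with a simple running-Gaussian update per reagent. This is a sketch of the general idea only; the open-source implementation may parameterize its priors and updates differently.

```python
import random
import statistics

class ReagentBelief:
    """Gaussian belief over the scores a reagent produces in products.
    TS samples from each reagent's belief, assembles a product from the
    best-sampled reagents, scores it, and updates the beliefs."""
    def __init__(self, prior_mean: float = 0.0, prior_std: float = 1.0):
        self.observed: list[float] = []
        self.mean, self.std = prior_mean, prior_std

    def sample(self, rng: random.Random) -> float:
        # Draw from the current belief; high draws win the selection.
        return rng.gauss(self.mean, self.std)

    def update(self, score: float) -> None:
        # Shift the belief toward the scores actually observed.
        self.observed.append(score)
        self.mean = statistics.fmean(self.observed)
        if len(self.observed) > 1:
            self.std = statistics.stdev(self.observed)
```

As more products containing a reagent are scored, its belief narrows around the observed mean, producing the frontrunner separation seen in the later rows of Figure 13.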
Recommendations and Future Directions
In our hands, TS has proven to be useful for a wide range of virtual screening tasks. Based on empirical studies, we have found that three warmup cycles typically suffice for two-component libraries, while 10 warmup cycles are preferred for three-component libraries. However, since warmup cycles account for a small portion of the overall runtime, we will typically set this parameter to 10. We tend not to set a predefined number of iterations. Instead, we stop the TS run when the score fails to improve over some predefined number of iterations (typically 1000 to 10,000, depending on the task). Ultimately, applying TS to virtual screening is a new technique, and best practices will have to be discovered empirically. Given the speed of TS, parameter scans can be rapidly performed, and the method can be adapted to specific objectives and data sets. It is our hope that by releasing the code for our TS implementation, others will take the opportunity to explore parameters and report their experiences.
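The stopping rule we describe, halting when the best score stops improving, can be written as a small wrapper. Here `sample_fn` is a placeholder for one TS iteration (sample reagents, enumerate the product, score it), not a function from our implementation.

```python
def run_until_stalled(sample_fn, patience=1000, max_iter=1_000_000):
    """Run TS iterations until the best score has not improved for
    `patience` consecutive iterations (or a hard cap is reached)."""
    best = float("-inf")
    since_improved = iterations = 0
    while since_improved < patience and iterations < max_iter:
        score = sample_fn()
        iterations += 1
        if score > best:
            best, since_improved = score, 0
        else:
            since_improved += 1
    return best, iterations
```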
Of course, TS also has some limitations. First, it can only be applied to combinatorial libraries where library chemistries and building blocks are known. It is not currently possible to apply TS to diverse screening libraries where the library chemistry has not been specified. One could imagine using retrosynthesis to decompose a library and use TS on the building blocks, but this would be nontrivial to implement. Another limitation was pointed out in an earlier section. In some circumstances, TS may focus too much on specific building blocks and miss top-scoring products. As mentioned above, there are many ways to improve sampling and integrate additional chemical intelligence. Even with these limitations, TS is a useful technique. In the examples presented herein, TS can identify more than half of the top-scoring molecules by searching less than one percent of the total data set.
Our application of TS differs in several ways from the standard multiarmed bandit framework. First, the score distributions are nonstationary. From the perspective of a single reagent, the updates to the belief distributions of other reagents will produce a score distribution that varies over time. Furthermore, once a given set of reagents has been picked, we never evaluate that set of reagents again. Once the top partners for a given reagent have been picked, the future scores for this reagent will go down. Second, our goal is to find the tail of the joint distribution (the best-scoring molecules) and not to find the reagents with the best mean score (which is the typical TS regret objective) when paired with other reagents. Lastly, we typically want to find a large set of top-scoring molecules, not the single best set of reagents. The typical evaluation of TS would suggest that optimal performance is finding the single best reagents immediately and only selecting those.
Despite these differences from the theoretical foundations for TS, the empirical performance shown here is strong. However, we do observe that the algorithm here is sometimes unable to find all top-scoring molecules, as can be seen in the plateauing in Figures 4, 5, 10, and 11. We suggest that this may be due to the discrepancies in the theoretical framework where TS has been evaluated and the practical application for reagent-based virtual screening. Extensions to the theoretical analysis or algorithm to better address these discrepancies are avenues for future work.
In this work, we have shown applications of TS in which molecules are enumerated one at a time and scored. The score for the full molecule is then used to update the belief distributions for the corresponding building blocks. TS can also be run in batch mode, where sets of molecules are enumerated and evaluated. Developing optimal schedules for batch-mode TS is an open research project, and we hope to report more on this in the future.
Conclusions
TS provides an efficient means of searching the ultralarge combinatorial libraries that have become prevalent through the increased availability of synthesis on-demand chemistry. This highly flexible method can be applied to various objectives including 2D and 3D similarity searches and protein–ligand docking. In principle, TS can be applied using any function that takes a molecule as an input and returns a score. The only requirement for the method to operate is that a library must be specified as a set of building blocks that can be assembled into final molecules that can be evaluated.
With TS, we can evaluate less than 1% of a data set and find a significant fraction of the best molecules. Table 4 compares the CPU time required for TS and an exhaustive search for the three examples shown in this paper. In our tests, TS achieved speedups ranging from 100-fold for docking to almost 50,000-fold for 2D similarity. Using multiple CPUs, we can search a library containing more than a billion molecules in a few hours. Depending on the evaluation metric, TS identified between 57 and 90 of the top 100 molecules from an exhaustive search. This combination of speed and accuracy makes TS an excellent addition to the virtual screening toolbox.
Table 4. Comparison of the Runtimes for TS and an Exhaustive Search for the Three Examples Presented Herein.
| method | molecules searched | exhaustive CPU hrs | TS CPU hrs |
|---|---|---|---|
| Tanimoto similarity | 94,000,000 | 144 | 0.003 |
| ROCS search | 234,000,000 | 110,208 | 32 |
| docking | 335,000,000 | 1,600,000 | 16,000 |
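The speedups implied by Table 4 can be computed directly; they span roughly two to four orders of magnitude depending on the cost of the objective function.

```python
# (exhaustive CPU hours, TS CPU hours) taken from Table 4
runtimes = {
    "tanimoto": (144, 0.003),
    "rocs": (110_208, 32),
    "docking": (1_600_000, 16_000),
}
speedup = {k: ex / ts for k, (ex, ts) in runtimes.items()}
assert speedup["docking"] == 100
assert round(speedup["rocs"]) == 3_444
assert round(speedup["tanimoto"]) == 48_000
```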
Data Availability Statement
To facilitate the reproducibility and extension of this work, we have made an open source reference implementation of TS available at https://github.com/PatWalters/TS. The quinazoline library used for the Tanimoto validation is available in the same repository. Due to licensing restrictions, we cannot distribute the results of our exhaustive searches using the Enamine REAL library. The REAL library can be obtained from Enamine. While we would have loved to provide an ultralarge library docking comparison of TS and an exhaustive search on an open library, we cannot justify the expense of the exhaustive search. The above GitHub repository contains Python classes for performing TS evaluations with ROCS and docking. These components require software licenses from OpenEye Scientific Software. Extending the existing framework to other docking and molecular similarity approaches should be straightforward.
Supporting Information Available
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.3c01790.
Heatmaps showing the statistical significance of comparisons (PDF)
The authors declare no competing financial interest.
References
- Walters W. P.; Stahl M. T.; Murcko M. A. Virtual Screening—an Overview. Drug Discovery Today 1998, 3, 160–178. 10.1016/S1359-6446(97)01163-X. [DOI] [Google Scholar]
- Shoichet B. K. Virtual Screening of Chemical Libraries. Nature 2004, 432, 862–865. 10.1038/nature03197. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bender B. J.; Gahbauer S.; Luttens A.; Lyu J.; Webb C. M.; Stein R. M.; Fink E. A.; Balius T. E.; Carlsson J.; Irwin J. J.; Shoichet B. K. A Practical Guide to Large-Scale Docking. Nat. Protoc. 2021, 16, 4799–4832. 10.1038/s41596-021-00597-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Irwin J. J.; Shoichet B. K. Docking Screens for Novel Ligands Conferring New Biology. J. Med. Chem. 2016, 59, 4103–4120. 10.1021/acs.jmedchem.5b02008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bender A.; Glen R. C. Molecular Similarity: A Key Technique in Molecular Informatics. Org. Biomol. Chem. 2004, 2, 3204–3218. 10.1039/b409813g. [DOI] [PubMed] [Google Scholar]
- Stumpfe D.; Bajorath J. Similarity Searching. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2011, 1, 260–282. 10.1002/wcms.23. [DOI] [Google Scholar]
- Sheridan R. P.; Kearsley S. K. Why Do We Need so Many Chemical Similarity Search Methods?. Drug Discovery Today 2002, 7, 903–911. 10.1016/S1359-6446(02)02411-X. [DOI] [PubMed] [Google Scholar]
- Nicholls A.; McGaughey G. B.; Sheridan R. P.; Good A. C.; Warren G.; Mathieu M.; Muchmore S. W.; Brown S. P.; Grant J. A.; Haigh J. A.; Nevins N.; Jain A. N.; Kelley B. Molecular Shape and Medicinal Chemistry: A Perspective. J. Med. Chem. 2010, 53, 3862–3886. 10.1021/jm900818s. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Vamathevan J.; Clark D.; Czodrowski P.; Dunham I.; Ferran E.; Lee G.; Li B.; Madabhushi A.; Shah P.; Spitzer M.; Zhao S. Applications of Machine Learning in Drug Discovery and Development. Nat. Rev. Drug Discovery 2019, 18, 463–477. 10.1038/s41573-019-0024-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bender A.; Cortes-Ciriano I. Artificial Intelligence in Drug Discovery: What Is Realistic, What Are Illusions? Part 2: A Discussion of Chemical and Biological Data. Drug Discovery Today 2021, 26, 1040–1052. 10.1016/j.drudis.2020.11.037. [DOI] [PMC free article] [PubMed] [Google Scholar]
- van Hilten N.; Chevillard F.; Kolb P. Virtual Compound Libraries in Computer-Assisted Drug Discovery. J. Chem. Inf. Model. 2019, 59, 644–651. 10.1021/acs.jcim.8b00737. [DOI] [PubMed] [Google Scholar]
- Walters W. P. Virtual Chemical Libraries. J. Med. Chem. 2019, 62, 1116–1124. 10.1021/acs.jmedchem.8b01048. [DOI] [PubMed] [Google Scholar]
- Nicolaou C. A.; Watson I. A.; Hu H.; Wang J. The Proximal Lilly Collection: Mapping, Exploring and Exploiting Feasible Chemical Space. J. Chem. Inf. Model. 2016, 56, 1253–1266. 10.1021/acs.jcim.6b00173. [DOI] [PubMed] [Google Scholar]
- Zabolotna Y.; Volochnyuk D. M.; Ryabukhin S. V.; Gavrylenko K.; Horvath D.; Klimchuk O.; Oksiuta O.; Marcou G.; Varnek A. SynthI: A New Open-Source Tool for Synthon-Based Library Design. J. Chem. Inf. Model. 2021, 62, 2151–2163. 10.1021/acs.jcim.1c00754. [DOI] [PubMed] [Google Scholar]
- Warr W. A.; Nicklaus M. C.; Nicolaou C. A.; Rarey M. Exploration of Ultralarge Compound Collections for Drug Discovery. J. Chem. Inf. Model. 2022, 62, 2021–2034. 10.1021/acs.jcim.2c00224. [DOI] [PubMed] [Google Scholar]
- Cavasotto C. N.; Di Filippo J. I. The Impact of Supervised Learning Methods in Ultralarge High-Throughput Docking. J. Chem. Inf. Model. 2023, 63, 2267–2280. 10.1021/acs.jcim.2c01471. [DOI] [PubMed] [Google Scholar]
- Sivula T.; Yetukuri L.; Kalliokoski T.; Käsnänen H.; Poso A.; Pöhner I. Machine Learning-Boosted Docking Enables the Efficient Structure-Based Virtual Screening of Giga-Scale Enumerated Chemical Libraries. J. Chem. Inf. Model. 2023, 63, 5773–5783. 10.1021/acs.jcim.3c01239. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yang Y.; Yao K.; Repasky M. P.; Leswing K.; Abel R.; Shoichet B.; Jerome S. Efficient Exploration of Chemical Space with Docking and Deep Learning. J. Chem. Theory Comput. 2021, 17, 7106–7119. 10.1021/acs.jctc.1c00810. [DOI] [PubMed] [Google Scholar]
- Graff D. E.; Shakhnovich E. I.; Coley C. W. Accelerating High-Throughput Virtual Screening through Molecular Pool-Based Active Learning. Chem. Sci. 2021, 12, 7866–7881. 10.1039/D0SC06805E. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Thompson J.; Walters W. P.; Feng J. A.; Pabon N. A.; Xu H.; Maser M.; Goldman B. B.; Moustakas D.; Schmidt M.; York F. Optimizing Active Learning for Free Energy Calculations. Artif. Intell. Life Sci. 2022, 2, 100050. 10.1016/j.ailsci.2022.100050. [DOI] [Google Scholar]
- Berenger F.; Kumar A.; Zhang K. Y. J.; Yamanishi Y. Lean-Docking: Exploiting Ligands’ Predicted Docking Scores to Accelerate Molecular Docking. J. Chem. Inf. Model. 2021, 61, 2341–2352. 10.1021/acs.jcim.0c01452. [DOI] [PubMed] [Google Scholar]
- Sadybekov A. A.; Sadybekov A. V.; Liu Y.; Iliopoulos-Tsoutsouvas C.; Huang X.-P.; Pickett J.; Houser B.; Patel N.; Tran N. K.; Tong F.; Zvonok N.; Jain M. K.; Savych O.; Radchenko D. S.; Nikas S. P.; Petasis N. A.; Moroz Y. S.; Roth B. L.; Makriyannis A.; Katritch V. Synthon-Based Ligand Discovery in Virtual Libraries of over 11 Billion Compounds. Nature 2022, 601, 452–459. 10.1038/s41586-021-04220-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Beroza P.; Crawford J. J.; Ganichkin O.; Gendelev L.; Harris S. F.; Klein R.; Miu A.; Steinbacher S.; Klingler F.-M.; Lemmen C. Chemical Space Docking Enables Large-Scale Structure-Based Virtual Screening to Discover ROCK1 Kinase Inhibitors. Nat. Commun. 2022, 13, 6447. 10.1038/s41467-022-33981-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Thompson W. R. On the Likelihood That One Unknown Probability Exceeds Another in View of the Evidence of Two Samples. Biometrika 1933, 25, 285–294. 10.1093/biomet/25.3-4.285. [DOI] [Google Scholar]
- Slivkins A. Introduction to Multi-Armed Bandits. Found. Trends Mach. Learn. 2019, 12, 1–286. 10.1561/2200000068. [DOI] [Google Scholar]
- Hensbergen A. W.; Mills V. R.; Collins I.; Jones A. M. An Expedient Synthesis of Oxazepino and Oxazocino Quinazolines. Tetrahedron Lett. 2015, 56, 6478–6483. 10.1016/j.tetlet.2015.10.008. [DOI] [Google Scholar]
- Brown D. G. An Analysis of Successful Hit-to-Clinical Candidate Pairs. J. Med. Chem. 2023, 66, 7101–7139. 10.1021/acs.jmedchem.3c00521. [DOI] [PubMed] [Google Scholar]
- Smellie A.; Teig S. L.; Towbin P. Poling: Promoting Conformational Variation. J. Comput. Chem. 1995, 16, 171–187. 10.1002/jcc.540160205. [DOI] [Google Scholar]
- Rush T. S.; Grant J. A.; Mosyak L.; Nicholls A. A Shape-Based 3-D Scaffold Hopping Method and Its Application to a Bacterial Protein-Protein Interaction. J. Med. Chem. 2005, 48, 1489–1495. 10.1021/jm040163o. [DOI] [PubMed] [Google Scholar]
- McGann M. R.; Almond H. R.; Nicholls A.; Grant J. A.; Brown F. K. Gaussian Docking Functions. Biopolymers 2003, 68, 76–90. 10.1002/bip.10207. [DOI] [PubMed] [Google Scholar]
- McGann M. FRED Pose Prediction and Virtual Screening Accuracy. J. Chem. Inf. Model. 2011, 51, 578–596. 10.1021/ci100436p. [DOI] [PubMed] [Google Scholar]