Communications Chemistry. 2025 Nov 7;8:335. doi: 10.1038/s42004-025-01758-x

Ultra-large library screening with an evolutionary algorithm in Rosetta (REvoLd)

Paul Eisenhuth 1,2, Fabian Liessmann 1, Rocco Moretti 3, Jens Meiler 1,2,3,4
PMCID: PMC12594993  PMID: 41203805

Abstract

Ultra-large make-on-demand compound libraries now contain billions of readily available compounds. This represents a golden opportunity for in-silico drug discovery. One challenge, however, is the time and computational cost of an exhaustive screen of such large libraries when receptor flexibility is taken into account. We propose an evolutionary algorithm to search combinatorial make-on-demand chemical space efficiently without enumerating all molecules. We exploit the feature of make-on-demand compound libraries, namely that they are constructed from lists of substrates and chemical reactions. Our algorithm RosettaEvolutionaryLigand (REvoLd) explores the vast search space of combinatorial libraries for protein-ligand docking with full ligand and receptor flexibility through RosettaLigand. A benchmark of REvoLd on five drug targets showed improvements in hit rates by factors between 869 and 1622 compared to random selections. REvoLd is available as an application within the Rosetta software suite (https://docs.rosettacommons.org/docs/latest/revold). This work formulates an evolutionary algorithm for optimization and exploration of ultra-large make-on-demand libraries. We demonstrate that our approach results in strong and stable enrichment, offering the most efficient algorithm for drug discovery in ultra-large chemical space to date.

Subject terms: Virtual screening, Molecular modelling, Cheminformatics, Computational chemistry


The vastness of ultra-large make-on-demand compound libraries presents a challenge for efficient in-silico drug discovery. Here, the authors introduce RosettaEvolutionaryLigand (REvoLd), an evolutionary algorithm that significantly enhances hit rates in protein-ligand docking by efficiently navigating combinatorial chemical spaces spanning billions of compounds.

Introduction

Drug discovery is a complex and time-consuming process. A campaign typically starts with target selection, followed by hit identification. Hit identification is usually done through screening experiments. While other factors play a role, chances of success increase with the number of tested compounds. However, acquiring a large number of molecules is expensive and testing them in bulk requires specialized infrastructure1. One widely adopted solution is virtual high-throughput screening (vHTS), where molecules are pre-screened on computers and filtered for predicted activity2. One of the most significant challenges limiting vHTS is the size of the chemical space, which is estimated to contain up to 10^60 possible drug-like molecules3. In addition to the lack of computational capacity to store and screen such a large number of molecules, defining a chemical space that is drug/lead-like and synthetically accessible presents a hindrance4–6. A wide range of technologies has been developed to generate molecules tailored to specific areas of interest. However, many computational approaches are never thoroughly tested as the barrier for synthesis is too high and, thus, compounds are unavailable for in-vitro testing6–11. Make-on-demand combinatorial libraries can overcome this challenge if vHTS algorithms can be tailored to sample this focused but still prodigious chemical space. These libraries combine simple building blocks through robust reactions to form billions of readily and economically available molecules. In the best-case scenario, they allow the confirmation of bioactive hit molecules from in-silico prediction through in-vitro evaluation within a few weeks12–14. However, the increased make-on-demand chemical space is not only an opportunity but also a challenge. A small number of vHTS campaigns have been conducted on molecule libraries exceeding a hundred million compounds, even fewer exceeding billions, and they all required substantial computational resources13,15–17. Additionally, because hit rates are low, most of the computational time in such campaigns is spent on molecules of no further interest for the following steps of the drug discovery campaign.

The majority of these vHTS campaigns utilize rigid docking, as it tremendously decreases the computational demands compared to flexible docking. However, this introduces potential error sources, as rigid docking might not be able to sample favorable protein–ligand structures18. This is in line with previous findings where the introduction of both protein and ligand flexibility increased success rates notably19–22. Throughout this study, we used the RosettaLigand flexible docking protocol23–25. It is well-positioned among other available methods and showed strong ranking capabilities during screens of the Enamine REAL space25–27.

To date, several solutions have been proposed to address this problem. The Deep Docking platform28,29 utilizes a mixture of conventional docking algorithms and neural networks to screen a subset of the target space and quantitative structure–activity relationship (QSAR) models to evaluate the remaining target space. However, this approach, called active learning, still requires the docking of tens to hundreds of millions of molecules and calculating QSAR descriptors for the whole billion-sized molecular library. A similar idea is used, for example, by Luttens et al.30, RosettaVS31, MolPal32 and HASTEN33.

Another promising solution is V-SYNTHES34,35. Instead of docking the final molecules, V-SYNTHES starts with docking of single fragments, picking the most promising ones, and iteratively adding more fragments to the growing scaffolds until final molecules are built. SpaceDock follows the same concept, but is not limited to commercially available combinatorial libraries36. Chemical Space Docking is essentially the same approach as V-SYNTHES and SpaceDock, but is a general instruction instead of being a ready-to-use software37. A similar approach, called Targeted Exploration38, filters the synthons of Enamine’s REAL space39 for similarity to known binders. The most promising synthons are used to enumerate ligand libraries. Search on chemical space near functional molecules requires previous structural knowledge of the molecules, which is not always available. To create such a space, millions of computational docking procedures have to be performed. Other active learning algorithms are SpaceHASTEN40 and Thompson Sampling41.

Recently, an evolutionary algorithm called Galileo was published to optimize molecules in chemical combinatorial space42. The algorithm is not tailored towards a specific optimization goal or chemical space, but accepts any function that assigns a score to a molecule and treats reaction rules as generally as possible. The algorithm was tested for a similarity search and optimization of pharmacophores, although with mixed success. A total of five million fitness calculations, however, makes expensive docking models infeasible in the context of structure-based drug design. Another approach using an evolutionary (or genetic) algorithm is SpaceGA43. It utilizes established mutation and crossover rules and maps the resulting molecules back to the combinatorial chemical space through similarity search with SpaceLight44. Both algorithms showed promising performance. This is in line with recent analysis on the potential of genetic algorithms, showing that their capabilities are on par with modern deep learning methods45,46. Evolutionary algorithms have been used for decades in computer-aided drug discovery (CADD) and were implemented by multiple research groups7,47–50. They were all highly successful in finding and optimizing promising compounds, but shared one common drawback: synthetic accessibility. One recently published evolutionary algorithm even puts most of its research effort into assuring easy synthesizability51.

Building on these findings, we propose RosettaEvolutionaryLigand or short REvoLd: An evolutionary algorithm optimizing entire molecules from the Enamine REAL space39. It reveals promising compounds with just a few thousand docking calculations, continues to discover new scaffolds if run multiple times, and enforces high synthetic accessibility. Furthermore, REvoLd’s enrichment capabilities seem to be independent of the size of the space searched.

Results

Hyperparameter and protocol optimization

REvoLd is very flexible and has endless potential protocols (e.g., combinations of selectors, reproduction steps, and several global parameters). To allow for quick testing and optimization of the evolutionary protocol, we created a subset of the Enamine REAL Space consisting of one million scored molecules. This is described in further detail in the section “Pre-docked benchmark”. An iterative approach was used to test different combinations of selection and reproduction mechanics and run parameters. Initially, we selected parameters to bias towards the fittest individuals, allowing only them to mutate and reproduce. This setting proved to converge very fast towards the hit molecules. Its downside was limited exploration of the target space. We introduced several changes to our protocol to offset that effect. First, we increased the number of crossovers between fit molecules to enforce more variance and recombination between well-suited ligands. Second, we added an additional mutation step, which switches single fragments to low-similarity alternatives. This keeps well-performing parts of promising molecules intact but enforces huge changes on small parts of it. And third, another mutation step, which only changes the reaction of a molecule and searches for similar fragments within the new reaction group. This opens larger parts of the combinatorial space for screening. These changes increased the number and diversity of virtual hits tremendously. As a last change, we introduced a second round of crossover and mutation, excluding the fittest molecules, thus allowing worse-scoring ligands to improve and carry their molecular information forward. These changes reliably improved hit rates. Details on tested parameter combinations as well as their hit rates can be found in Supplementary Note 1.

Regarding the size of the random start population, we found that 200 initially created ligands offer enough variety to start the optimization process. More initial ligands might increase the chance of discovering good binders immediately, but greatly increase run-time costs. Fewer initial molecules, on the other hand, have less chance to capture promising structural elements. Next, we tested how many individuals should be allowed to advance to the next generation and found 50 to perform best. Larger populations carry more noise through the generations, which decreases the effectiveness of all reproduction steps, but smaller populations are too homogeneous and therefore hinder exploration of chemical space. Lastly, we found 30 generations of optimization to strike a good balance between convergence and exploration. Good solutions are usually unveiled after 15 generations, but discovery rates only begin to flatten after 30 generations. The algorithm never fully converges and continues to discover well-scored molecules even after 400 generations, but the hit rates become smaller and smaller. Therefore, we advise multiple independent runs instead. The random starting population seeds different paths, which yield different high-scoring motifs. Since each run unveils new promising molecules, the exact number of required runs depends solely on the desired number of hits.

Additionally, we observed that our algorithm was unable to discover the lowest-scoring molecule of the benchmark test subset of one million molecules. This might be related to the mentioned intended ruggedness of our scoring landscape, which traps runs at local minima. On the other hand, it is not uncommon to observe only close-to-optimal solutions from meta-heuristic optimization algorithms like evolutionary algorithms. Considering REvoLd’s purpose, we postulate that this is not a flaw, since structure-based CADD campaigns almost never aim to discover the single best-scoring compound, but rather many promising compounds which will enrich hit rates in in-vitro experiments.

Benchmark under realistic conditions

Based on the first results and selected parameters, we moved on to more realistic benchmarking conditions, utilizing our largest available Enamine REAL space39 with over 20 billion molecules at that time. Details on the five used drug targets and data collection can be found in the section “Drug target data collection”. Twenty runs of REvoLd were conducted against each target, docking between 49,000 and 76,000 unique molecules in total per target. This includes all docked molecules during the evolutionary optimization, not just the last generation. The difference in sampled molecules per target is due to the stochastic nature of evolutionary optimization, as one run might produce more duplicates than another. Figure 1 shows the development of scores in a selected run for each target. All runs successfully reported molecules with hit-like scores. Due to the size of the defined chemical space searched and the stochastic nature of our protocol, there was only a small overlap between the runs. We found that between 1.5% and 3% of tested compounds have a Tanimoto similarity of 1.0 to another compound tested against the same target. These duplicates were removed for further analysis. The performances in Fig. 1 show that four out of five runs found compounds scoring as well as the best known active within the first 200 randomly sampled molecules. While this seems intuitively unlikely, it highlights a shortcoming of the deployed scoring function. As we discussed before, the distribution of scores for known actives is only slightly more negative than that of the random sample of 100,000 Enamine molecules. This means there is a significant overlap between known active and random scores. We found that for the four cases in question, between 21 and 45 molecules out of 100,000 show better scores than the best-known active. This translates to a 4–8% chance of observing at least one molecule with such good scores in a sample of 200 initial candidates. These numbers are in line with the presented performances, as we are showing only one out of 20 runs for each target. Additionally, only one of these runs actually found a better-than-active molecule in generation 0; the rest were just within the general scoring range.
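The quoted chance can be checked with a short calculation. The sketch below assumes that the 200 starting molecules behave like independent draws and that the fraction of better-than-active molecules in the 100,000-molecule random sample is representative of the whole library; the function name is illustrative and not part of REvoLd.

```python
def prob_at_least_one(n_better: int, sample_size: int, draws: int) -> float:
    """Chance that `draws` independent random picks contain at least one
    molecule from the better-than-active fraction n_better / sample_size."""
    p_single = n_better / sample_size
    # Complement of "every single draw misses the better-than-active set".
    return 1.0 - (1.0 - p_single) ** draws


# 21 and 45 better-than-active molecules per 100,000, start population of 200.
low = prob_at_least_one(21, 100_000, 200)
high = prob_at_least_one(45, 100_000, 200)
print(f"{low:.1%} to {high:.1%}")
```

Under these assumptions the result is roughly 4% at the lower end and between 8% and 9% at the upper end, which is close to the range stated in the text.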

Fig. 1. Score development during single REvoLd runs.

Score distribution statistics from single selected runs are shown for each protein target. Solid lines show the best score found up to a certain generation. The dashed line shows the 10th best score, the dotted the 100th best score and dashed-dotted the median. The runs that reported the best-scoring molecule for each target were selected as examples here. All scores within the gray area are better than the best-scoring known active.

To assess the overall sampling performance, we use enrichment factors. As is often the case for in-silico evaluations, we assume a perfect predictive power for a scoring function at a given threshold and that all compounds scoring better than this threshold will be active in experimental validation. This decouples the performance of a sampling algorithm from its underlying scoring method. The number of hits achieved through a selected sampling method is compared to the number of hits from a random sample. Each molecule in the random sample is selected by first sampling a reaction (weighted by the number of total products in that reaction) and then uniformly sampling one reagent for each available position. This follows the same sampling approach used for the initial starting population and is repeated until 100,000 random molecules are generated. For better comparability, we followed the enrichment factor proposed with V-SYNTHES34 but added a normalization factor for sample sizes. This is implicitly still the same function as in V-SYNTHES, as they used samples of the same size:

$$EF_i(x) = \frac{\mathrm{HitRate}_i(x)}{\mathrm{HitRate}_{rs}(x)} \tag{1}$$

where $EF_i$ is the enrichment factor of method $i$, $x$ defines the score limit, $rs$ is the random sample and

$$\mathrm{HitRate}_i(x) = \frac{\left|\{\, m \mid m \in M_i,\ \mathrm{score}(m) < x \,\}\right|}{\left|M_i\right|} \tag{2}$$

where $M_i$ is the set of all molecules sampled with method $i$. The enrichment rate can be interpreted as how many more hits can be expected from a constant sample size or how many fewer molecules need to be sampled for the same number of hits. We propose to use this calculation for all the following exploration algorithms for better comparability.
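The two definitions translate directly into code. The following is a minimal sketch rather than the authors' analysis script: the function names and the toy score lists are invented for illustration, and returning `None` when the random sample contains no hit is an assumption about how to handle an undefined ratio.

```python
from typing import Optional, Sequence


def hit_rate(scores: Sequence[float], limit: float) -> float:
    """Eq. 2: fraction of sampled molecules scoring below the hit limit."""
    if not scores:
        raise ValueError("hit rate is undefined for an empty sample")
    return sum(1 for s in scores if s < limit) / len(scores)


def enrichment_factor(
    method_scores: Sequence[float],
    random_scores: Sequence[float],
    limit: float,
) -> Optional[float]:
    """Eq. 1: hit rate of a sampling method divided by the random hit rate.

    Returns None when the random sample has no hit at this limit, because
    the ratio is then undefined (assumption made for this sketch).
    """
    baseline = hit_rate(random_scores, limit)
    if baseline == 0.0:
        return None
    return hit_rate(method_scores, limit) / baseline


# Toy scores (lower is better); the two samples differ in size on purpose.
method = [-12.0, -11.0, -9.0, -5.0]
random_sample = [-11.0, -6.0, -5.0, -4.0, -3.0, -2.0, -1.0, 0.0]
ef = enrichment_factor(method, random_sample, limit=-10.0)
```

Because both hit rates are divided by their own sample size, the method sample and the random sample do not need to contain the same number of molecules.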

Figure 2 reports absolute hit numbers of REvoLd and the random sample for different hit limits, as well as the normalized enrichment of REvoLd over the random sample. REvoLd achieves maximum enrichments between 869 and 1622 across the five targets, outperforming all currently available algorithms that enable drug discovery in ultra-large libraries. Additionally, Fig. 2 shows that REvoLd is able to report hundreds of hits for much stricter hit limits than a random sampler. We also want to report the enrichment rates using the score of the best-scoring known active as the hit limit, following widespread practice. In four cases, REvoLd achieves enrichments between 200 and 532. No such enrichment can be reported for the ABL1 kinase, since the random sample did not include a single molecule scoring better than the best-scoring known active, whereas REvoLd unveiled 99 such compounds. Using the best-scoring known active as a hit limit can be important because scoring functions tend to overrate molecules showing certain artificially favored structural features. At the same time, this limit alone falls short in comparing sampling strategies' capabilities to optimize into local minima.

Fig. 2. Total number of hits and enrichments.

We tested REvoLd against five different protein targets, namely Tyrosyl (A), Orexin 1 (B), Muscarinic M1 (C), Y1 Receptor (D) and ABL1 (E). We combined the results of 20 independent REvoLd runs against each target into single hit lists and removed duplicates with a Tanimoto similarity of 1.0. A molecule is to be considered a hit if its score is below a given hit limit. The total number of hits from REvoLd for various hit limits is shown (purple) together with the number of hits from a random sample (orange) on a logarithmic scale. REvoLd tested in total between 49,000 and 76,000 molecules; the random samples always consisted of 100,000 molecules. Therefore, we additionally plot the normalized enrichment of REvoLd over the random sample for various hit limits.

Furthermore, we observed the same convergence behavior as in the section “Hyperparameter and protocol optimization”. No run stopped sampling new molecules until the final generation, but the number of new unique structures per generation started to become relatively low. We could also observe a linear correlation between the number of tested unique molecules and the number of unique Bemis–Murcko scaffolds52. A more thorough discussion of the molecular diversity can be found in Supplementary Note 2.

Qualitative analysis of reproduction mechanics

We conducted a qualitative investigation of the effects of mutation and crossover on a few examples using REvoLd’s reproduction functionalities. We found that the in-silico functionalities effectively mimic the alterations to molecules typically conducted by medicinal chemists. Mutation enabled the introduction of small local changes, such as increased flexibility of certain parts of the molecules or alteration of their geometry through changing the attachment atom of a ring. Crossover, on the other hand, recombined promising motifs into new molecules, effectively transferring and combining knowledge from two separate ligands. This resemblance to medicinal chemistry was never intentionally developed or encoded but emerges naturally from evolutionary optimization paradigms. Figure 3 illustrates both operations for one example, including the change of fitness scores. The observed worsening of scores from C to D and from G to H also highlights the importance of a selective pressure that tolerates temporarily worse scores for new molecules. Both shown mutation steps introduce disadvantageous changes, but the resulting molecules recombine into the best-found solution.

Fig. 3. Example of a molecular family tree.

The best-scoring molecule for ABL1 is shown to exemplify mutation and crossover. The color highlights do not show synthons, but reappearing structural motifs. A and B recombined through crossover into C, which mutated to D. Here, A includes a tetrazole ring, a common motif in many FDA-approved drugs. Both A and B contain a 1,2,4-triazole, which forms the common basis for combining them into C. From C to D, a mutation changes the position of the oxymethylpyridine at the benzyl group from 4 to 3. In the other route, E, containing a tetrazole and a 1,2,4-triazole, exchanges moieties with F, containing the substructure 6-bromo-1,3-benzodioxole. Here, instead of exchanging the exact moiety, a similar synthon in the set is searched, adding a 3-methyl,4-bromo-benzyl group to the offspring G. A mutation derivatizes the [1,2,4]triazolo[1,5-a]pyridine from G to H's oxybenzyl moiety. Finally, both D and H recombine into I, introducing the tetrazole-triazole system from both H and D while including the pyrazole from D and the 3-methyl,4-bromo-benzyl group from H. All reproduction steps happened in different generations. Positive and negative numbers denote unfavorable and favorable score changes, respectively.

Runtime analysis

All runs were conducted on Leipzig University’s high-performance parallel computing cluster equipped with AMD EPYC 7713@2.0 GHz–Turbo 3.7 GHz processors. We tested REvoLd runs with 20, 40, 60, and 100 cores per run. Only the first core, acting as the control and distribution core, loaded the entire Enamine REAL space database using around 23 GB of memory for the current library size. Most of the memory is taken up by representing each fragment with an RDKit molecule object. All other cores were used for docking alone and required less than 4 GB, mainly depending on the size of the protein structure. The highly parallel code of REvoLd and its efficient development in C++ caused very fast computation times. A single run with standard settings needs between 24 and 48 h to finish, depending on the number of used CPUs.

As visualized in Fig. 4, more than 99% of the time was spent on docking molecules, as REvoLd causes close to no overhead, enabling it to use fully flexible protein and ligand models. Most of the time spent on computations outside of docking is caused by loading and preparing the database. The speedups gained through the usage of more CPUs are acceptable. Going from 20 to 40 cores yields a speedup of 1.679 (optimal 2.0), and going from 40 to 60 cores yields 1.479 (optimal 1.5). The poor speedup from 60 to 100 cores (1.152, optimal 1.667) is due to the implementation. The docking of a single molecule can only be executed on a single CPU, and once all molecules of one generation are distributed, any remaining CPU is idle until the next generation is docked. Therefore, the number of CPUs should ideally be an integer divisor of the number of new individuals in each generation. Since the latter is subject to randomness, higher CPU numbers have higher chances of being idle and therefore show reduced speedups.
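The reported speedups and the idle-CPU effect can be illustrated with a small calculation. This sketch assumes that every docking job takes the same time and that jobs are handed out in rounds of one molecule per worker; the figure of 120 new molecules per generation is a made-up example, not a value from the benchmark.

```python
import math


def parallel_efficiency(observed: float, cores_before: int, cores_after: int) -> float:
    """Observed speedup divided by the ideal speedup cores_after / cores_before."""
    return observed / (cores_after / cores_before)


def idle_fraction(n_molecules: int, n_workers: int) -> float:
    """Share of worker slots left unused while one generation is docked,
    assuming equal docking times and one molecule per worker per round."""
    rounds = math.ceil(n_molecules / n_workers)
    return 1.0 - n_molecules / (rounds * n_workers)


# Speedups reported in the text.
for observed, before, after in [(1.679, 20, 40), (1.479, 40, 60), (1.152, 60, 100)]:
    eff = parallel_efficiency(observed, before, after)
    print(f"{before} -> {after} cores: efficiency {eff:.2f}")

# Hypothetical generation of 120 new molecules.
print(idle_fraction(120, 60))   # 60 divides 120: no idle slots
print(idle_fraction(120, 100))  # second round uses only 20 of 100 workers
```

With these assumptions, a worker count that divides the number of new molecules leaves no slot idle, whereas 100 workers for 120 molecules waste 40% of the available slots.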

Fig. 4. Runtime analysis.

The average runtime of REvoLd on different numbers of CPU cores on the high-performance compute cluster. Over 99% of all computational time is spent on docking (gray). More than half of the remaining time is spent loading and preparing the Enamine REAL space data (checkered). The evolutionary optimization (horizontal stripes) and setup of Rosetta (vertical stripes) are comparatively fast and therefore negligible. The setup time is barely visible within the figure and takes ~20 s, independent of the number of cores.

Impact of database sizes

We ran REvoLd on chemical spaces of different sizes during its development process, from one million molecules for the hyperparameter optimization to around 300 million, 1.3 billion, and finally 20.1 billion, following the growth of the Enamine REAL space. The size of the chemical space had a severe impact on our code development and required several restructurings. The biggest consequences of larger chemical spaces are longer library initialization times and higher memory requirements. They grew from a few seconds and several hundred megabytes to six minutes and 23 GB, respectively. However, we found no sign that REvoLd’s capability to unveil low-scoring compounds is affected by the database size. We found it helpful to increase the exploration parts of the protocol slightly, but the number of individuals and generations remained constant. That means the number of required docking runs stays constant, even if the library size increases dramatically. It should be noted, though, that REvoLd was optimized for efficient runtimes whilst neglecting database preparation times and memory requirements. With the continued growth of combinatorial spaces, this would need reconsideration to adapt to larger quantities of reagents.

It should be noted, though, that REvoLd is not suited to conduct exhaustive screens. The randomness inherent in its protocol makes it unfeasible to sample and dock every molecule. As we observed only very limited overlap between runs, we assume that every run optimizes within an independent subspace of the available chemical library and that 20 runs are not enough to cover all subspaces. If more resources are available and larger numbers of hits or more diverse molecules are required, REvoLd runs should be repeated until the desired criteria are reached. Although we expect larger libraries to require tremendously more runs to cover all subspaces, the number of runs to unveil a set number of hits should remain independent of the library size, making REvoLd very scalable to diverse requirements and limitations.

Discussion

With the increased availability of three-dimensional protein structures and easily accessible chemical spaces expanding into billions of molecules, structure-based drug discovery and ultra-large library screening are becoming more and more important12,14,35,53. It is therefore crucial to improve the available protein-ligand-docking technologies to handle compound libraries of these sizes. Besides reducing the required computational time for available tools, it is also beneficial to reduce the number of required dockings. We have shown that REvoLd is a promising tool to help in that regard. Its evolutionary optimization has proven to be reliable and efficient. We were able to create highly enriched datasets for five different protein targets with only a minimal overhead of computational time. We were additionally able to observe optimization behavior comparable to medicinal chemistry practices.

REvoLd is one of several approaches that try to solve the problem of the size of chemical spaces with an efficient sampling algorithm, like V-SYNTHES and Galileo34,42. V-SYNTHES uses a greedy heuristic (meaning it selects fragments based only on their isolated scores, thereby potentially missing combined molecules that exceed the scores of their building blocks), whereas Galileo is another evolutionary algorithm. While neither greedy nor evolutionary is inherently better than the other, the implementations of all algorithms show different performances. Galileo is more of a proof of concept and is not combined with a docking protocol, but accepts any score as fitness and relies on external protocols. Additionally, the authors reported that they were not able to generate good results for all their targets. REvoLd, on the other hand, showed convergence in all test cases. V-SYNTHES is available as a ready-to-use software and reported good results on all benchmarks, just as REvoLd. However, REvoLd shows greater enrichment (between 869 and 1622 for REvoLd and 250 to 460 for V-SYNTHES, respectively) and requires overall fewer docking runs to achieve these results (between 49,000 and 76,000 for REvoLd and 500,000–1,000,000 for V-SYNTHES). It should also be mentioned that V-SYNTHES requires additional docking of intermediate libraries and single fragments, causing a computational overhead of 20–35%. Additionally, we observed that REvoLd did not require more docking runs for larger combinatorial databases, whereas V-SYNTHES reported a linear complexity relation with the number of used fragments. This relationship exists within REvoLd as well, but only for memory requirements and preprocessing. It should be noted, though, that V-SYNTHES reported success in follow-up in-vitro experiments, while REvoLd is so far only benchmarked in-silico.

Alternative methods like Deep Docking and RosettaVS propose a mixture of structure-based docking protocols and QSAR machine learning instead of sampling-based optimization28,29,31. They dock subsets of the whole chemical space, generate molecular fingerprints for all molecules and train a QSAR model on the in-silico docking scores. This gives them an advantage over methods like REvoLd, because they consider information for all molecules and are therefore better suited to discover globally best hit candidates. But this is also a huge disadvantage, because they still require analyzing every molecule, which continues to become more expensive as chemical space grows. Additionally, both methods require several million docking runs and therefore, fall short of REvoLd and V-SYNTHES. A similar number of docking runs is required for Targeted Exploration, which uses prior knowledge about binding fragments to guide the molecule selection38. However, the dependence on prior knowledge is also a big limitation. Together with the required number of dockings, Targeted Exploration becomes less optimal as a universal solution for structure-based drug design in ultra-large chemical space.

REvoLd is well positioned among these methods and provides with its sampling capabilities a promising solution for the problem of ultra-large library screens in general. With that, two main topics require further analysis. The biggest barrier for REvoLd right now is the runtime of the employed docking protocol. Reducing the time spent docking will increase REvoLd’s speed tremendously, as over 99% of the runtime is used for docking. Machine-learning-based docking tools like DiffDock, EquiBind and DeepDock are promising candidates to lift these restrictions54–56. The other important topic is the reliability of our deployed scoring function when transferring REvoLd’s in-silico results to in-vitro testing. Future experiments will need to investigate how prone REvoLd is to overfitting the fitness function. This can happen if a scoring function overestimates certain interactions or substructures in a molecule compared to actual in-vitro findings. Overfitting becomes more important with increasing library size17.

Conclusion

REvoLd is a promising algorithm tackling the problem of searching for potential ligands in combinatorial libraries spanning billions of compounds. Through advanced evolutionary optimization, it enables expensive docking methods for such a task. Our benchmarks have shown promising results, and REvoLd reliably outperformed a random sampler over five different targets. Unlike its competing algorithms, REvoLd is capable of discovering molecules with high predicted affinity with the same number of protein-ligand dockings, even in increasingly larger libraries. To date, it is the only complete pipeline to generate in-silico enriched, target-specific compound lists out of combinatorial chemical libraries without the need to dock several hundreds of thousands of molecules or even more. The availability of compounds through industrial make-on-demand services allows easy experimental validation of suggested hits and completely negates the need for synthetic accessibility scores. Future research will need to focus on improving the currently applied docking protocol. Speeding up the docking calculations will have a significant impact on REvoLd’s runtime performance. Additionally, while we found that lid_root2 can be used to distinguish actives from random samples, further experiments need to investigate the predictive power of RosettaLigand and our normalization approach.

Methods

Evolutionary algorithms mimic Darwinian evolution57. Starting from a random population, individuals are altered, and selective pressure is applied through a fitness function. In each iteration, individuals that are discarded due to their fitness will be replaced by new individuals created through reproduction. In CADD, evolutionary algorithms often utilize docking scores as fitness functions to minimize the free energy between the protein target and the ligand. Reproduction typically consists of mutation and crossover, where mutation alters small parts of the current molecule, like removing or adding a single atom or functional group. Crossover, on the other hand, recombines two or more promising solutions, for example, by cutting two molecules in half and recombining their fragments58. Our proposed algorithm sticks to this paradigm47. REvoLd allows for all parameters to be changed and adapted by the user. Several selection and reproduction methods can be freely combined to form customized evolutionary protocols.

Combinatorial libraries

While our reproduction functions follow the core ideas of evolutionary algorithms, their main novelty is their strict limitation to the chemical space defined by the make-on-demand library. Examples of such libraries are Enamine REAL space, Otava CHEMriya space, and WuXi LabNetwork GalaXi space39,59,60. While they differ in size and include different molecules, they all define chemical reactions and sets of fragments that can be combined through these reactions. This yields a combinatorially larger chemical space of billions of molecules defined through a few hundred reactions and hundreds of thousands or at most a few million fragments12,61. These definitions are exploited by our reproduction functions. Each individual in our algorithm represents a single molecule. It is defined through a reaction and a list of fragments used for that specific reaction (Fig. 5). Although we show only examples with two-component reactions here to simplify presentation, it should be noted that REvoLd can process reactions of all sizes as long as at least two components are involved.

Fig. 5. Example of fragment combinations.

A Fragments are defined with attachment points or reaction handles. These are used as hinges to freely combine multiple fragments from predefined lists. For example, the yellow pentagon can be combined with all three fragments in list 2 to form three different products. The product space grows quadratically as both fragment lists grow linearly. B An example of mutation and crossover. The mutation aims to replace the fragment from list 1. A local similarity search through all those fragments yielded the blue hexagon. Next, a crossover between the new molecule and another one happens. Both parents contribute one fragment each and create a new molecule that resembles both its predecessors.
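The representation described above, where an individual is a reaction plus one fragment per position, can be sketched in a few lines. This is a minimal illustration, not REvoLd's actual data model: the class name `Individual`, the function `library_size`, and the toy identifiers are hypothetical. It also shows why the product space can be counted without enumerating it.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Individual:
    """One candidate molecule: a reaction plus one fragment per reaction position."""
    reaction: str               # hypothetical reaction identifier
    fragments: tuple            # one fragment identifier per position


def library_size(fragment_lists):
    """Count products without enumerating them.

    fragment_lists maps a reaction to its per-position fragment lists.
    Each reaction contributes the product of its list lengths.
    """
    return sum(
        prod(len(position) for position in positions)
        for positions in fragment_lists.values()
    )


# Toy library mirroring panel A: one fragment in list 1, three in list 2.
toy_library = {"rxn_A": [["pentagon"], ["frag_1", "frag_2", "frag_3"]]}
example = Individual("rxn_A", ("pentagon", "frag_2"))
```

With two lists of 500 fragments each, `library_size` returns 250,000 for a single reaction, which is the quadratic growth mentioned in the caption.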

Pre-docked benchmark

We generated a test set of reactions to analyze how well REvoLd can sample in a chemical space with known global minima and to optimize its algorithmic parameters efficiently. Our test set was produced through four two-component reactions and 500 fragments per reagent position. First, we reduced the number of fragments per reaction and position by selecting a random fragment and adding it to an empty list. Iteratively, we calculated the fingerprint-based Tanimoto similarity of every remaining fragment for this specific position to all fragments in the list and added the one with the lowest average similarity to the list. This was repeated until 500 fragments had been added. The entire process was carried out for all positions and reactions available. Next, we selected a random reaction to start our final set and again iteratively compared the average similarity between all fragments from selected and unselected reactions and added the reaction with the lowest similarity to the selection until it contained four reactions. This was done in order to create a rigid similarity landscape, which is usually harder to navigate and optimize within, and to provide a high diversity of possible molecules. All one million molecules (4 reactions × 500 × 500 fragment combinations) were docked with RosettaLigand against the human dopamine D3 receptor (PDB ID: 3PBL62) to have their scores readily available, and to allow us to assess how close the algorithm gets to the global minimal score. We set the best-scoring 1000 molecules (0.1%) from this set as virtual hits. The selection of a rigid similarity landscape as well as optimizing only for a single target is potentially suboptimal for protocol optimization, but we did not observe a deterioration of hit rates when switching to the more realistic benchmark explained later.
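The greedy fragment reduction described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' script: fingerprints are modelled as plain sets of on-bit indices instead of RDKit fingerprints, and the names `tanimoto` and `pick_diverse` are hypothetical.

```python
import random


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def pick_diverse(fingerprints, k, rng):
    """Greedily pick k fragments with low average similarity to those already picked."""
    remaining = sorted(fingerprints)
    # Seed the list with one random fragment.
    chosen = [remaining.pop(rng.randrange(len(remaining)))]
    while remaining and len(chosen) < k:
        def average_similarity(name):
            total = sum(tanimoto(fingerprints[name], fingerprints[c]) for c in chosen)
            return total / len(chosen)
        # Add the remaining fragment that is least similar on average.
        best = min(remaining, key=average_similarity)
        remaining.remove(best)
        chosen.append(best)
    return chosen


toy_fps = {"a": {1, 2, 3}, "b": {1, 2, 4}, "c": {7, 8, 9}}
subset = pick_diverse(toy_fps, 2, random.Random(0))
```

In the toy data, fragment `c` shares no bits with `a` or `b`, so it always ends up in a two-fragment subset regardless of the random seed. The naive loop recomputes all similarities in every round; a production version would cache running sums.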

Drug target data collection

For a benchmark closer to realistic conditions, we selected five established drug targets and sampled within the largest available Enamine REAL space39 with over 20 billion molecules at that time. The reaction SMARTS and reagent SMILES were obtained directly from Enamine under a non-disclosure agreement in January 2022. All five targets are well-researched, with high-throughput screening (HTS) data available and high-quality crystal structures deposited in the Protein Data Bank (PDB), namely the G protein-coupled receptor (GPCR) orexin receptor type 1 OX1 (PDB: 4ZJC63), the GPCR muscarinic acetylcholine receptor M1 (PDB: 5CXV64), the DNA repair enzyme tyrosyl DNA-phosphodiesterase 1 TDP1 (PDB: 6MYZ65), the GPCR neuropeptide Y Y1 receptor (PDB: 5ZBQ66), and last, the tyrosine-protein kinase ABL1 (PDB: 2HZI67). The corresponding HTS data is taken from curated lists of actives from PubChem68, with the exception of the ABL1 kinase. Its screening data was provided through the directory of useful decoys enhanced (DUD-E), which curated ChEMBL entries67,69. We selected the curated PubChem HTS data to ensure high reliability of reported actives, but only four of the eight reported drug targets had available high-quality protein–ligand structures deposited in the PDB. The ABL1 kinase was randomly selected from all DUD-E kinases to increase the diversity of protein classes. Through this, we cover a diverse selection of valuable drug targets and a broad range of small-molecule ligands, making them a good test case for the benchmark. Furthermore, they represent a mixture of soluble and membrane proteins, and GPCRs in particular are known for their flexible nature. Between 188 and 801 molecules are utilized as known actives per target. All structures were prepared following the presteps of the RosettaLigand protocol, with target sites derived from co-crystallized ligands resolved in complex with the protein structures.
All known binders from the used HTS data were docked, again following the RosettaLigand protocol, to use their docking scores to ensure that lid_root2 helps to enrich molecule sets. Additionally, we sampled 100,000 random molecules from the Enamine REAL space and docked them against each target to compare REvoLd sampling efficiency. We found that the known active score distribution is more negative than the random sample, indicating that lid_root2 indeed enriches sets of molecules. Details can be found in Supplementary Note 3.

Algorithm overview

Following typical evolutionary algorithms, REvoLd starts with a random population. An overview of the optimization process is given in Fig. 6. Initial molecules, called individuals, are generated by picking a random reaction and one random suitable synthon for each of the reaction’s positions. The reaction is picked by a weighted random selection. The weight is the number of possible distinct educts of each reaction. Next, each of these random molecules is docked against the target protein following the RosettaLigand protocol, resulting in 150 complexes per molecule23,24. Each of these complexes is used to calculate interface energies between ligand and protein. The lowest calculated interface energy is used as a fitness score. The first population is formed by applying selective pressure through a freely chosen main selector in order to reduce the number of individuals down to the selected maximum size. It should be noted, though, that whilst we often use the terms individual and molecule together, they are treated differently. Each individual is an entity in the population, participates in the evolutionary optimization process and represents one distinct molecule. However, several individuals can represent the same molecule, because the same recombination of parents can occur multiple times. This is treated with extra care and explained in more detail in the section “Score calculation”.
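The generation of one random initial individual can be sketched as below. The weighting is an assumption: the sketch takes the weight of a reaction to be the number of fragment combinations it allows, which is one possible reading of "number of possible distinct educts". The function name `random_individual` and the toy library are hypothetical.

```python
import random
from math import prod


def random_individual(library, rng):
    """Pick a reaction by weighted random choice, then one random fragment per position.

    library maps a reaction identifier to its per-position fragment lists.
    Assumption: a reaction's weight is the number of fragment combinations it allows.
    """
    reactions = sorted(library)
    weights = [prod(len(position) for position in library[r]) for r in reactions]
    reaction = rng.choices(reactions, weights=weights, k=1)[0]
    fragments = tuple(rng.choice(position) for position in library[reaction])
    return reaction, fragments


toy_library = {
    "rxn_1": [["a1", "a2"], ["b1"]],   # weight 2
    "rxn_2": [["c1"], ["d1"]],         # weight 1
}
individual = random_individual(toy_library, random.Random(1))
```

Weighting by reaction size keeps the initial population closer to a uniform sample over molecules than a uniform choice over reactions would.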

Fig. 6. General overview of REvoLd’s optimization cycle.

Starting from a random population, fitness scores are calculated through ligand docking and solutions are discarded through selective pressure. Alterations to molecules are introduced through offspring factories iteratively. The cycle is continued until a set number of generations is reached.

An evolutionary optimization cycle follows to successively select fit individuals, i.e., molecules, for reproduction. Every cycle consists of a sequence of reproduction steps. Each step selects individuals from the previous generation for a given reproduction pattern. The selected individuals remain in the pool for further reproduction steps by default, but can be removed if desired. Next, new molecules are docked to calculate their fitness, and the main selector is applied to reduce the new population down to the selected maximum. The main selector can be freely selected from the available selectors described in the section “Selectors”. A cycle is finished by checking if the maximum number of generations is reached. If not, another cycle follows; otherwise, the algorithm ends and reports all analyzed molecules.
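The cycle described above can be condensed into a short skeleton. This is a sketch, not the Rosetta implementation: the function names are hypothetical, the docking call is replaced by an arbitrary `score` callable, and removing selected parents from the pool is not modelled. It assumes that the next population is formed from the offspring only, so that survivors must be carried over by an identity step.

```python
def evolve(initial, score, steps, main_selector, max_size, generations):
    """Generic generational loop; lower scores are better."""
    cache = {}  # molecule -> score, so identical molecules are scored only once

    def fitness(molecule):
        if molecule not in cache:
            cache[molecule] = score(molecule)
        return cache[molecule]

    population = main_selector(initial, fitness, max_size)
    for _ in range(generations):
        offspring = []
        for step in steps:  # reproduction steps are applied sequentially
            offspring.extend(step(population, fitness))
        population = main_selector(offspring, fitness, max_size)
    return population, cache  # the cache holds every molecule ever scored


def elitist(individuals, fitness, k):
    """Keep the k best-scoring individuals."""
    return sorted(individuals, key=fitness)[:k]


def identity_step(pool, fitness):
    """Copy the parents unchanged into the offspring."""
    return list(pool)


def neighbour_step(pool, fitness):
    """Toy 'mutation': molecules are integers and mutate by one unit."""
    return [m + d for m in pool for d in (1, -1)]


final_population, all_scores = evolve(
    initial=[0, 100],
    score=lambda m: abs(m - 42),  # stand-in for a docking score
    steps=[identity_step, neighbour_step],
    main_selector=elitist,
    max_size=2,
    generations=50,
)
```

In the toy run the population moves one unit per generation towards the optimum at 42 and stays there once it is reached.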

Selectors

In the current version of REvoLd, three different selectors were implemented, all of which can be utilized as the main selector and for selecting individuals for reproduction. The simplest selector, called ElitistSelector, simply selects the fittest individuals. The other two selectors, TournamentSelector and RouletteSelector, are non-deterministic and may allow worse-scoring individuals to be selected for reproduction and to advance generations, in order to explore chemical space further and potentially escape local minima70. The RouletteSelector takes the relative differences of fitness scores into account by assigning selection chances based on them. If an individual has a fitness two times better than another individual, the first will have twice the chance of selection. This deviates from the expected exponential correlation between binding free energy (represented by fitness, i.e., docking scores) and binding affinity. We opted for the linear correlation to make the selector softer and increase the chances of low-scoring molecules for selection. TournamentSelector, on the other hand, solely considers the ranking of individuals. A set number of individuals is selected randomly to participate in a tournament. This number is referred to as tournament size. All individuals are sorted by their fitness and are granted a chance to accept their selection. Therefore, a larger tournament size means a smaller chance to be selected for less fit individuals. In addition, all selectors have the option to either remove selected individuals from the pool or to keep them in the pool to potentially participate in more reproduction steps.
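The two non-deterministic selectors can be sketched as follows. Several details are assumptions made for illustration: scores are taken to be negative with lower being better, the roulette weight is the negated score, each call draws distinct individuals, and a tournament falls back to its best entrant if nobody accepts. The function names are hypothetical and do not mirror the Rosetta classes.

```python
import random


def roulette_select(scores, k, rng):
    """Draw k distinct individuals with chances linear in their (negated) score."""
    pool = dict(scores)
    picked = []
    while pool and len(picked) < k:
        names = list(pool)
        # Assumption: a score of -4 gets twice the weight of a score of -2.
        weights = [max(-pool[name], 0.0) + 1e-9 for name in names]
        winner = rng.choices(names, weights=weights, k=1)[0]
        picked.append(winner)
        del pool[winner]
    return picked


def tournament_pick(scores, size, accept, rng):
    """Run one tournament: rank random entrants, let each accept with a fixed chance."""
    entrants = rng.sample(sorted(scores), min(size, len(scores)))
    entrants.sort(key=scores.get)  # best (lowest) score first
    for name in entrants:
        if rng.random() < accept:
            return name
    return entrants[0]  # assumption: fall back to the best entrant


toy_scores = {"a": -4.0, "b": -2.0, "c": -1.0}
rng = random.Random(0)
parents = roulette_select(toy_scores, 2, rng)
winner = tournament_pick(toy_scores, size=3, accept=1.0, rng=rng)
```

With an acceptance chance of 1.0 the tournament always returns its best entrant; lowering the chance lets lower-ranked entrants through, and a larger tournament makes that less likely for any given weak individual.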

Reproduction

Reproduction is implemented through three different offspring factories. Each factory needs to be linked to one selector, which provides it with a set of selected individuals. Each factory uses these individuals to produce a set number of offspring. This is applied sequentially following a user-specified evolutionary protocol. The IdentityFactory simply copies the input individuals unchanged into the offspring generation. This is done to preserve and keep already found solutions and allow them to proceed through multiple generations.

The MutatorFactory applies point mutation to all its input individuals. A point mutation is either the change of one fragment to another fitting fragment or a change of reaction. This is expected to introduce small local perturbations to guide individuals into local energy minima. Switching fragments is easy since each reaction provides a list of suitable synthons for all positions, and those can be freely replaced by each other. RDKit’s implementation of extended connectivity fingerprints and Tanimoto similarity are used to control the impact of synthon mutation71–73. Fingerprints are calculated for all fragments during database loading at program initialization. When a new fragment needs to be selected, all suitable replacements are collected into one list, and their similarity to the original fragment is calculated. Next, cutoff values are applied to ensure a minimum and maximum similarity. The final new fragment is then selected using a weighted random sample with similarity as weights. This can be used to enforce drastic mutations or to allow only fine-grained changes. The mutation of reactions is more challenging. First, a new reaction is selected. Second, new fragments for each position are selected based on maximal Tanimoto similarity to previously used fragments, to limit the changes induced on the molecule.
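The fragment mutation described above (similarity cutoffs followed by a similarity-weighted draw) can be sketched like this. Fingerprints are modelled as sets of on-bits rather than RDKit objects, and two behaviours are assumptions: the current fragment is excluded from the candidates, and the fragment is kept unchanged if no candidate passes the cutoffs. The name `mutate_fragment` is hypothetical.

```python
import random


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def mutate_fragment(current, candidates, fingerprints, min_sim, max_sim, rng):
    """Replace `current` by a candidate whose similarity lies within the cutoffs."""
    names, weights = [], []
    for name in candidates:
        if name == current:
            continue  # assumption: a mutation must change the fragment
        sim = tanimoto(fingerprints[current], fingerprints[name])
        if min_sim <= sim <= max_sim:
            names.append(name)
            weights.append(sim + 1e-9)  # similarity as weight; epsilon avoids all-zero weights
    if not names:
        return current  # assumption: keep the fragment if nothing qualifies
    return rng.choices(names, weights=weights, k=1)[0]


toy_fps = {"x": {1, 2, 3, 4}, "near": {1, 2, 3, 5}, "far": {9, 10}}
position = ["x", "near", "far"]
rng = random.Random(0)
moderate = mutate_fragment("x", position, toy_fps, 0.6, 1.0, rng)   # fine-grained change
drastic = mutate_fragment("x", position, toy_fps, 0.0, 0.25, rng)   # forced large change
```

A high minimum similarity restricts the draw to close analogues, while a low maximum similarity forces a jump to dissimilar fragments.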

The last possibility to create offspring is the CrossoverFactory. It randomly combines all its input individuals into parental pairs, which are used to create offspring inheriting traits from them. One parent provides the reaction used for the offspring, and each parent provides a random number of fragments, but always at least one. If both parents use the same reaction, their fragments can be freely combined. If they use different reactions, a local similarity search is used to find the most similar fragments in the reaction used by the offspring, as it is done in the MutatorFactory.
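A crossover between two parents can be sketched as below. This is a simplified illustration with hypothetical names: the first parent provides the reaction, a random non-empty subset of positions (never all of them) is taken from the second parent, and a donated fragment that is not valid at its position is replaced by the most similar valid one. Matching positions between different reactions by index is an assumption made to keep the sketch short.

```python
import random


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def crossover(parent_a, parent_b, library, fingerprints, rng):
    """Create one offspring; a parent is a (reaction, fragments) pair."""
    reaction, frags_a = parent_a
    _, frags_b = parent_b
    n = len(frags_a)  # at least two positions per reaction
    # Each parent contributes at least one fragment.
    from_b = set(rng.sample(range(n), rng.randint(1, n - 1)))
    child = []
    for pos in range(n):
        if pos not in from_b:
            child.append(frags_a[pos])
            continue
        donor = frags_b[pos % len(frags_b)]  # assumption: positions matched by index
        allowed = library[reaction][pos]
        if donor in allowed:
            child.append(donor)
        else:
            # Local similarity search within the offspring's reaction.
            child.append(max(allowed, key=lambda f: tanimoto(fingerprints[f], fingerprints[donor])))
    return reaction, tuple(child)


toy_library = {"rxn": [["a1", "a2"], ["b1", "b2"]]}
child = crossover(("rxn", ("a1", "b1")), ("rxn", ("a2", "b2")), toy_library, {}, random.Random(3))
```

For two parents that share a two-component reaction, the offspring takes exactly one fragment from each parent.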

Score calculation

Each individual represents a molecule that is a potential hit candidate. To estimate its fitness, or score, we use the ligand docking protocol RosettaLigand and its recommended preprocessing steps23,24. First, RDKit71 is used to turn the SMARTS and SMILES representations of the individual’s reaction and fragments, respectively, into RDKit reactions and molecules and to run the reaction with the selected fragments. RDKit’s implementation of the ETKDG method is further used to calculate a three-dimensional embedding of the molecule and a list of low-energy conformers74. It should be noted, though, that as of now, REvoLd is using only the very basic RDKit functionalities for 3D structure generation. For example, there is no special consideration for stereoisomers or different protonation states. The conformers are passed to the RosettaLigand docking protocol to calculate the protein–ligand complex following a Monte Carlo optimization. Briefly, the RosettaLigand protocol consists of an initial placement of the ligand in a user-defined position, which is ideally a previously identified or known binding site. Next, a coarse-grained docking step successively applies rotation, translation, and conformer sampling to guide the ligand into close contact with the protein. This is followed by a high-resolution docking step, which applies only small changes to the ligand and optimizes protein sidechains. Lastly, a final minimization step optimizes the current solution into a local minimum. This protocol is repeated 150 times, and the lowest (best) reported protein-ligand interface energy is used as the basis of the efficiency score.

As mentioned before, several individuals can represent the same molecule. This is captured by the algorithm. First, it checks if a molecule has been docked in a previous generation and, if so, reuses that score. If multiple identical but unscored molecules are present within the same population, only one full docking run is conducted, and the same score is used for all of the identical copies. To ensure diversity across a population, we apply a similarity penalty. Starting from the best scoring molecule, for each molecule, a fingerprint is calculated, stored and compared with all stored fingerprints. If the Tanimoto similarity between two fingerprints exceeds 0.95, a penalty of +0.5 is applied to the worse-scoring molecule. This is done to prevent a population takeover by a single well-scoring molecule. At the same time, it does allow very good scoring molecules to be represented by 3–4 individuals (as the first one receives no penalty, the second a penalty of 0.5, the third 1.0, and so on), increasing their chance to pass on genes and thereby pulling the entire population towards a more favorable chemical space.
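The similarity penalty can be sketched as follows, assuming that the penalty accumulates once per already stored near-identical fingerprint, which reproduces the 0, 0.5, 1.0 progression given above. Fingerprints are modelled as sets of on-bits, and the function name `apply_similarity_penalty` is hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def apply_similarity_penalty(scores, fingerprints, threshold=0.95, step=0.5):
    """Penalize individuals that are near-duplicates of better-scoring ones.

    scores maps an individual to its raw score (lower is better).
    """
    penalized = {}
    seen = []  # fingerprints of all better-scoring individuals processed so far
    for name in sorted(scores, key=scores.get):  # best score first
        fp = fingerprints[name]
        hits = sum(1 for other in seen if tanimoto(fp, other) > threshold)
        penalized[name] = scores[name] + step * hits
        seen.append(fp)
    return penalized


raw = {"i1": -3.0, "i2": -3.0, "i3": -3.0, "i4": -1.0}
fps = {"i1": {1, 2, 3}, "i2": {1, 2, 3}, "i3": {1, 2, 3}, "i4": {7, 8}}
adjusted = apply_similarity_penalty(raw, fps)
```

Three copies of the same molecule end up with scores of -3.0, -2.5 and -2.0, so each further copy is less likely to survive selection, while the unrelated individual `i4` is untouched.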

Ligand efficiency

Since many of RosettaLigand’s energy terms depend on the size of the molecule, we observed the well-known bias towards larger ligands in our first REvoLd runs. Molecules were getting larger and larger with every generation, since larger molecules can form more interactions with the target protein and therefore receive better scores. However, this does not accurately reflect experimental findings75. To address this issue, we normalized the interface energy and tested four different methods with n ∈ {1, 2, 3, 4}:

$$\mathrm{fitness}_n(x) = \frac{\mathrm{energy}(x)}{\sqrt[n]{\mathrm{heavyatoms}(x)}} \tag{3}$$

where energy(x) is the interface energy between the protein target and ligand x calculated by RosettaLigand and heavyatoms(x) is the number of non-hydrogen atoms in the ligand x. Increasing n essentially decreases the penalty for large molecules. We found n = 2 to perform best. Detailed results can be found in Supplementary Note 4. This measure is the geometric mean between the predicted binding energy and the ligand efficiency. It strikes a good balance between biasing against too large and too small molecules. We call this score “ligand interface delta square root normalized”, or lid_root2 for short. It will be used throughout the rest of the paper whenever we report fitness or docking scores.
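As a worked example of this normalization, the sketch below divides the interface energy by the n-th root of the heavy-atom count, which is how Eq. 3 is read here given the name "square root normalized" for n = 2. The function name `normalized_fitness` and the example numbers are illustrative only.

```python
import math


def normalized_fitness(energy, heavy_atoms, n=2):
    """Interface energy divided by the n-th root of the heavy-atom count."""
    if heavy_atoms <= 0 or n <= 0:
        raise ValueError("heavy_atoms and n must be positive")
    return energy / heavy_atoms ** (1.0 / n)


# Hypothetical ligand: interface energy of -20 and 25 heavy atoms.
energy, heavy_atoms = -20.0, 25
ligand_efficiency = normalized_fitness(energy, heavy_atoms, n=1)  # energy per heavy atom
lid_root2 = normalized_fitness(energy, heavy_atoms, n=2)          # square-root normalized

# For n = 2 the magnitude equals the geometric mean of |energy| and |ligand efficiency|.
geometric_mean = math.sqrt(abs(energy) * abs(ligand_efficiency))
```

For this ligand the ligand efficiency is -0.8 and lid_root2 is -4.0, whose magnitude matches the geometric mean of 20 and 0.8.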

Final protocol

There are a total of seven reproduction steps in our final protocol, but protocols can be freely adapted. This protocol was developed through optimizing hit rates, which is explained in the section “Hyperparameter and protocol optimization”. Each step is implemented by pairing a selector with an offspring factory. The steps are applied sequentially, and most of them keep the parent pool unchanged; only one removes the selected parent molecules. The first two steps are intended to cause small refinements on promising molecules from the previous generation. The next two steps potentially use the same molecules, but make sure more impactful changes are applied to enhance exploration of chemical space, where step 3 increases exploration within the same reaction and step 4 ensures different reactions are explored as well. Step 5 moves the best molecules unchanged to the next generation and removes them from the pool of reproduction candidates. This is done to preserve the best solutions for future generations and to allow optimization of less optimal molecules through the final two steps.

  1. Moderate mutations: A roulette wheel selector selects 15 individuals and a total of 30 new molecules are produced through mutation, with every parent being used twice. Mutations occur twice as often on fragments as on reactions. Fragment selection is biased towards high similarity with an enforced minimal Tanimoto similarity of 0.6. Selected parents remain unchanged in the current pool.

  2. Excessive crossover: A roulette wheel selector selects 15 individuals and a total of 60 new molecules are produced through crossover. Parents are randomly paired to generate one new molecule. This is repeated until enough offspring are generated. Selected parents remain unchanged in the current pool.

  3. Drastic mutations: Like step 1, but only fragments can be mutated, not reactions. Furthermore, fragment selection is still biased towards higher similarity, but a maximum similarity of 0.25 is used as a hard cutoff.

  4. Reaction mutation: Like step 1, but only reactions are mutated to guarantee exploration of chemical spaces defined by different reactions.

  5. Identity: The 15 best molecules from the current pool are passed unchanged to the next generation. They are removed from the current pool afterwards.

  6. Moderate mutations: Same as step 1, but since the 15 best molecules were removed in step 5, individuals with a worse fitness have a higher chance of being selected.

  7. Excessive crossover: Same as step 2, but again with a higher chance for worse molecules.

These steps strike a good balance between exploration and exploitation of chemical space. This cycle is repeated until 30 generations are done, and all molecules from all generations are reported. REvoLd saves the best-scoring protein–ligand complexes calculated during docking for each individual. Our final protocol uses a tournament selector as the main selector with tournament size 15 and acceptance chance of 0.75. The initial population size is 200, and the maximum number of individuals to pass between generations is 50. It is important to distinguish again between individuals and molecules. The reported numbers are static, meaning that every run results in 7400 individuals. But since these can represent duplicate molecules, we observed on average only 2450–3800 actual molecules being docked per run, as reported in “Benchmark under realistic conditions”. While a rate of 50–70% duplicated molecules seems very high, we did not experience a deterioration of results. Additionally, as explained in the section “Score calculation”, the duplicates do not cause docking overheads and are penalized.
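The bookkeeping behind these numbers can be checked with a few lines. The step table restates the seven steps listed above. One point is an inference from the reported total and not stated in the text: the figure of 7400 individuals is reproduced if the 15 identity copies per generation are not counted as new individuals. All names in the sketch are hypothetical.

```python
INITIAL_POPULATION = 200
GENERATIONS = 30

# (step, selected parents, offspring produced), in protocol order.
PROTOCOL = [
    ("moderate mutation", 15, 30),
    ("crossover", 15, 60),
    ("drastic mutation", 15, 30),
    ("reaction mutation", 15, 30),
    ("identity", 15, 15),
    ("moderate mutation", 15, 30),
    ("crossover", 15, 60),
]

# Assumption: identity copies are carried over and not counted as new individuals.
new_per_generation = sum(
    offspring for step, _, offspring in PROTOCOL if step != "identity"
)
total_individuals = INITIAL_POPULATION + GENERATIONS * new_per_generation


def duplicate_rate(unique_molecules, individuals=total_individuals):
    """Fraction of individuals that did not need their own docking run."""
    return 1 - unique_molecules / individuals


low_duplication = duplicate_rate(3800)   # about 0.49
high_duplication = duplicate_rate(2450)  # about 0.67
```

Under this assumption each generation adds 240 new individuals, giving 200 + 30 × 240 = 7400, and the observed 2450–3800 docked molecules correspond to roughly half to two thirds of all individuals being duplicates.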

Acknowledgements

Computations for this work were done (in part) using resources of the Leipzig University Computing Center and of the Vanderbilt University Advanced Computing Cluster for Research and Education (ACCRE). The authors would like to thank Iaroslava Kos and Enamine LTD for their support and access to their dataset. The authors acknowledge the financial support by the Federal Ministry of Education and Research of Germany and by Sächsische Staatsministerium für Wissenschaft, Kultur und Tourismus in the program Center of Excellence for AI-research Center for Scalable Data Analytics and Artificial Intelligence Dresden/Leipzig, project identification number: ScaDS.AI. P.E.’s position is funded through an award by ScaDS.AI. J.M. is supported by an Alexander-von-Humboldt professorship. This work was further funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) through SPP2363 (460865652) and through SFB1423 (421152132). P.E. and F.L. received a fellowship from the Max Kade Foundation to support their work in the Meiler laboratory at Vanderbilt University. Work in the Vanderbilt Meiler laboratory is supported through the NIH (R01 DA046138, R01 HL122010, and R01 GM080403). Funded by the Open Access Publication Fund of Leipzig University.

Author contributions

Conceptualization: Paul Eisenhuth, Jens Meiler. Data curation: Paul Eisenhuth. Formal analysis: Paul Eisenhuth. Funding acquisition: Jens Meiler. Investigation: Paul Eisenhuth. Methodology: Paul Eisenhuth. Resources: Rocco Moretti and Jens Meiler. Supervision: Rocco Moretti and Jens Meiler. Software: Paul Eisenhuth. Validation: Fabian Liessmann and Paul Eisenhuth. Visualization: Paul Eisenhuth. Writing—original draft: Paul Eisenhuth. Writing—review and editing: Paul Eisenhuth and Fabian Liessmann, Rocco Moretti, Jens Meiler.

Peer review

Peer review information

Communications Chemistry thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Data availability

Protein structures are available through the Protein Data Bank (PDB), and their codes are mentioned in the paper. Known actives are available either through the PubChem repositories associated with ref. 68 or the directory of useful decoys enhanced (DUD-E)69. Access to the Enamine REAL space can be requested through Enamine Ltd. The REvoLd guide contains a short section to help with the NDA (https://docs.rosettacommons.org/docs/latest/revold). The data that support the findings of this study are available from the corresponding author, but contain protected intellectual property of Enamine Ltd., which was used under license for the current study, and so are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of Enamine Ltd.

Code availability

All code is available as part of the Rosetta repository. More information, including compilation and run commands, can be found in the REvoLd guide (https://docs.rosettacommons.org/docs/latest/revold). Rosetta is available under a permissive license for academic research. Commercial utilization requires a separate license.

Competing interests

P.E. and F.L. are founders of AI-Driven Therapeutics GmbH (AI-DT). The company does not own intellectual property, licenses, or rights associated with the presented work. Both P.E. and F.L. were not employed by AI-DT at the time of writing and received no financial or non-financial compensation from AI-DT. All other authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

The online version contains supplementary material available at 10.1038/s42004-025-01758-x.

References

  • 1. Veale, C. G. Into the fray! A beginner’s guide to medicinal chemistry. ChemMedChem 16, 1199–1225 (2021).
  • 2. Sliwoski, G., Kothiwale, S., Meiler, J. & Lowe, E. W. Computational methods in drug discovery. Pharmacol. Rev. 66, 334–395 (2014).
  • 3. Bohacek, R. S., McMartin, C. & Guida, W. C. The art and practice of structure-based drug design: a molecular modeling perspective. Med. Res. Rev. 16, 3–50 (1996).
  • 4. Irwin, J. J. & Shoichet, B. K. Docking screens for novel ligands conferring new biology. J. Med. Chem. 59, 4103–4120 (2016).
  • 5. Ertl, P. Cheminformatics analysis of organic substituents: identification of the most common substituents, calculation of substituent properties, and automatic identification of drug-like bioisosteric groups. J. Chem. Inf. Comput. Sci. 43, 374–380 (2003).
  • 6. Coley, C. W. Defining and exploring chemical spaces. Trends Chem. 3, 133–145 (2021).
  • 7. Leguy, J., Cauchy, T., Glavatskikh, M., Duval, B. & Mota, B. D. Evomol: a flexible and interpretable evolutionary algorithm for unbiased de novo molecular generation. J. Cheminform. 12, 1–19 (2020).
  • 8. Virshup, A. M., Contreras-García, J., Wipf, P., Yang, W. & Beratan, D. N. Stochastic voyages into uncharted chemical space produce a representative library of all possible drug-like compounds. J. Am. Chem. Soc. 135, 7296–7303 (2013).
  • 9. Ertl, P. & Schuffenhauer, A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J. Cheminform. 1, 1–11 (2009).
  • 10. Hiss, J. A. et al. Combinatorial chemistry by ant colony optimization. Future Med. Chem. 6, 267–280 (2014).
  • 11. Powers, A. S. et al. Geometric deep learning for structure-based ligand design. ACS Cent. Sci. 9, 2257–2267 (2023).
  • 12. Grygorenko, O. O. et al. Generating multibillion chemical space of readily accessible screening compounds. iScience 23, 101681 (2020).
  • 13. Lyu, J. et al. Ultra-large library docking for discovering new chemotypes. Nature 566, 224–229 (2019).
  • 14. Warr, W. A., Nicklaus, M. C., Nicolaou, C. A. & Rarey, M. Exploration of ultralarge compound collections for drug discovery. J. Chem. Inf. Model. 62, 2021–2034 (2022).
  • 15. Acharya, A. et al. Supercomputer-based ensemble docking drug discovery pipeline with application to covid-19. J. Chem. Inf. Model. 60, 5832–5852 (2020).
  • 16. Gorgulla, C. et al. An open-source drug discovery platform enables ultra-large virtual screens. Nature 580, 663–668 (2020).
  • 17. Lyu, J., Irwin, J. J. & Shoichet, B. K. Modeling the expansion of virtual screening libraries. Nat. Chem. Biol. 19, 712–718 (2023).
  • 18. Bender, B. J. et al. A practical guide to large-scale docking. Nat. Protoc. 16, 4799–4832 (2021).
  • 19. Alogheli, H., Olanders, G., Schaal, W., Brandt, P. & Karlen, A. Docking of macrocycles: comparing rigid and flexible docking in glide. J. Chem. Inf. Model. 57, 190–202 (2017).
  • 20. Lexa, K. W. & Carlson, H. A. Protein flexibility in docking and surface mapping. Q. Rev. Biophys. 45, 301–343 (2012).
  • 21. Erickson, J. A., Jalaie, M., Robertson, D. H., Lewis, R. A. & Vieth, M. Lessons in molecular recognition: the effects of ligand and protein flexibility on molecular docking accuracy. J. Med. Chem. 47, 45–55 (2004).
  • 22. Miller, E. B. et al. Reliable and accurate solution to the induced fit docking problem for protein–ligand binding. J. Chem. Theory Comput. 17, 2630–2639 (2021).
  • 23. DeLuca, S., Khar, K. & Meiler, J. Fully flexible docking of medium sized ligand libraries with rosettaligand. PLoS ONE 10, 1–19 (2015).
  • 24. Meiler, J. & Baker, D. Rosettaligand: protein-small molecule docking with full side-chain flexibility. Proteins: Struct., Funct. Genet. 65, 538–548 (2006).
  • 25. Smith, S. T. & Meiler, J. Assessing multiple score functions in Rosetta for drug discovery. PLoS ONE 15, 1–14 (2020).
  • 26. Herasymenko, O. et al. Cache challenge# 2: targeting the RNA site of the SARS-CoV-2 helicase Nsp13. J. Chem. Inf. Model. 65, 6884–6898 (2025).
  • 27. Li, F. et al. Cache challenge# 1: targeting the WDR domain of LRRK2, a Parkinson’s disease associated protein. J. Chem. Inf. Model. 64, 8521–8536 (2024).
  • 28. Gentile, F. et al. Deep docking: a deep learning platform for augmentation of structure based drug discovery. ACS Cent. Sci. 6, 939–949 (2020).
  • 29. Gentile, F. et al. Artificial intelligence-enabled virtual screening of ultra-large chemical libraries with deep docking. Nat. Protoc. 17, 672–697 (2022).
  • 30. Luttens, A., de Vaca, I. C., Sparring, L., Norinder, U. & Carlsson, J. Rapid traversal of ultralarge chemical space using machine learning guided docking screens. (2023), unpublished.
  • 31. Zhou, G. et al. An artificial intelligence accelerated virtual screening platform for drug discovery. Nat. Commun. 15, 7761 (2024).
  • 32. Graff, D. E., Shakhnovich, E. I. & Coley, C. W. Accelerating high-throughput virtual screening through molecular pool-based active learning. Chem. Sci. 12, 7866–7881 (2021).
  • 33. Sivula, T. et al. Machine learning-boosted docking enables the efficient structure-based virtual screening of giga-scale enumerated chemical libraries. J. Chem. Inf. Model. 63, 5773–5783 (2023).
  • 34. Sadybekov, A. A. et al. Synthon-based ligand discovery in virtual libraries of over 11 billion compounds. Nature 601, 452–459 (2022).
  • 35. Sala, D., Batebi, H., Ledwitch, K., Hildebrand, P. W. & Meiler, J. Targeting in silico GPCR conformations with ultra-large library screening for hit discovery. Trends Pharmacol. Sci. 44, 150–161 (2023).
  • 36. Sindt, F., Seyller, A., Eguida, M. & Rognan, D. Protein structure-based organic chemistry-driven ligand design from ultralarge chemical spaces. ACS Cent. Sci. 10, 615–627 (2024).
  • 37. Beroza, P. et al. Chemical space docking enables large-scale structure-based virtual screening to discover ROCK1 kinase inhibitors. Nat. Commun. 13, 1–10 (2022).
  • 38. Müller, J. et al. Magnet for the needle in haystack: “crystal structure first” fragment hits unlock active chemical matter using targeted exploration of vast chemical spaces. J. Med. Chem. 65, 15663–15678 (2022).
  • 39. Enamine Ltd. Enamine REAL Space. https://enamine.net/compound-collections/real-compounds/real-space-navigator (accessed on April 10, 2024).
  • 40. Kalliokoski, T., Turku, A. & Kasnanen, H. Spacehasten: a structure-based virtual screening tool for nonenumerated virtual chemical libraries. J. Chem. Inf. Model. 65, 125–132 (2024).
  • 41. Klarich, K., Goldman, B., Kramer, T., Riley, P. & Walters, W. P. Thompson sampling—an efficient method for searching ultralarge synthesis on demand databases. J. Chem. Inf. Model. 64, 1158–1171 (2024).
  • 42.Meyenburg, C., Dolfus, U., Briem, H. & Rarey, M. Galileo: three-dimensional searching in large combinatorial fragment spaces on the example of pharmacophores. J. Comput.-Aided Mol. Design37, 1–16 (2022). [DOI] [PMC free article] [PubMed]
  • 43.Moesgaard, L. & Kongsted, J. Introducing spacega: a search tool to accelerate large virtual screenings of combinatorial libraries. J. Chem. Inf. Model.64, 8123–8130 (2024). [DOI] [PubMed] [Google Scholar]
  • 44.Bellmann, L., Penner, P. & Rarey, M. Topological similarity search in large combinatorial fragment spaces. J. Chem. Inf. Model.61, 238–251 (2020). [DOI] [PubMed] [Google Scholar]
  • 45.Gao, W., Fu, T., Sun, J. & Coley, C. Sample efficiency matters: a benchmark for practical molecular optimization. Adv. Neural Inf. Process. Syst.35, 21342–21357 (2022). [Google Scholar]
  • 46.Tripp, A. & Hernández-Lobato, J. M. Genetic algorithms are strong baselines for molecule generation. arXiv preprint arXiv:2310.09267 (2023).
  • 47.Illgen, K., Enderle, T., Broger, C. & Weber, L. Simulated molecular evolution in a full combinatorial library. Chem. Biol.7, 433–441 (2000). [DOI] [PubMed] [Google Scholar]
  • 48.Chu, Y. & He, X. Molegear: a Java-based platform for evolutionary de novo molecular design. Molecules24, 1444 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Steinmann, C. & Jensen, J. H. Using a genetic algorithm to find molecules with good docking scores. PeerJ Phys. Chem.3, e18 (2021). [Google Scholar]
  • 50.Spiegel, J. O. & Durrant, J. D. Autogrow4: an open-source genetic algorithm for de novo drug design and lead optimization. J. Cheminform.12, 1–16 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Kerstjens, A. & Winter, H. D. Leadd: Lamarckian evolutionary algorithm for de novo drug design. J. Cheminform.14, 1–20 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Bemis, G. W. & Murcko, M. A. The properties of known drugs. 1. molecular frameworks. J. Med. Chem.39, 2887–2893 (1996). [DOI] [PubMed] [Google Scholar]
  • 53.Jumper, J. et al. Highly accurate protein structure prediction with alphafold. Nature596, 583–589 (2021). [DOI] [PMC free article] [PubMed]
  • 54.Corso, G., Stärk, H., Jing, B., Barzilay, R. & Jaakkola, T. Diffdock: diffusion steps, twists, and turns for molecular docking. Proceedings of the International Conference on Learning Representations ICLR, Vol. 11 (2023).
  • 55.Stärk, H., Ganea, O.-E., Pattanaik, L., Barzilay, R. & Jaakkola, T. Equibind: Geometric deep learning for drug binding structure prediction. Proceedings of the International Conference on Machine Learning, Vol. 162, 20503–20521 (2022).
  • 56.Méndez-Lucio, O., Ahmad, M., del Rio-Chanona, E. A. & Wegner, J. K. A geometric deep learning approach to predict binding conformations of bioactive molecules. Nat. Mach. Intell.3, 1033–1039 (2021). [Google Scholar]
  • 57.Devi, R. V., Sathya, S. S. & Coumar, M. S. Evolutionary algorithms for de novo drug design—a survey. Appl. Soft Comput.27, 543–552 (2015). [Google Scholar]
  • 58.Le, T. C. & Winkler, D. A. A bright future for evolutionary methods in drug design. ChemMedChem10, 1296–1300 (2015). [DOI] [PubMed] [Google Scholar]
  • 59.Bilokin, Y. 12 Billion Novel Molecules: Chemriya—OTAVA’s On-demand Chemical Space (OTAVA Ltd., 2024).
  • 60.WuXi Biologics, GalaXi Space. https://wuxibiology.com/drug-discovery-services/hit-finding-and-screening-services/virtual-screening/. (2024). accessed on April 10.
  • 61.Bellmann, L., Penner, P., Gastreich, M. & Rarey, M. Comparison of combinatorial fragment spaces and its application to ultralarge make-on-demand compound catalogs. J. Chem. Inf. Model.62, 553–566 (2022). [DOI] [PubMed]
  • 62.Chien, E. Y. et al. Structure of the human dopamine d3 receptor in complex with a d2/d3 selective antagonist. Science330, 1091–1095 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Yin, J. et al. Structure and ligand-binding mechanism of the human ox1 and ox2 orexin receptors. Nat. Struct. Mol. Biol.23, 293–299 (2016). [DOI] [PubMed] [Google Scholar]
  • 64.Thal, D. M. et al. Crystal structures of the m1 and m4 muscarinic acetylcholine receptors. Nature531, 335–340 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Lountos, G. T. et al. Identification of a ligand binding hot spot and structural motifs replicating aspects of tyrosyl-dna phosphodiesterase i (tdp1) phosphoryl recognition by crystallographic fragment cocktail screening. Nucleic Acids Res.47, 10134–10150 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Yang, Z. et al. Structural basis of ligand binding modes at the neuropeptide y y1 receptor. Nature556, 520–524 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Cowan-Jacob, S. W. et al. Structural biology contributions to the discovery of drugs to treat chronic myelogenous leukaemia. Acta Crystallogr. Sect. D: Biol. Crystallogr.63, 80–93 (2006). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Butkiewicz, M. et al. Benchmarking ligand-based virtual high-throughput screening with the pubchem database. Molecules18, 735–756 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Mysinger, M. M., Carchia, M., Irwin, J. J. & Shoichet, B. K. Directory of useful decoys, enhanced (dud-e): Better ligands and decoys for better benchmarking. J. Med. Chem.55, 6582–6594 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Goldberg, D. E. & Deb, K. A comparative analysis of selection schemes used in genetic algorithms. In Foundations of Genetic Algorithms, Vol. 1 (ed Rawlins, G. J.) 69–93 (Elsevier, 1991).
  • 71.Landrum, G. et al. rdkit/rdkit: 2024_03_1 (q1 2024) release. https://zenodo.org/records/10893044Zenodo (2024).
  • 72.Bajusz, D., Rácz, A. & Héberger, K. Why is tanimoto index an appropriate choice for fingerprint-based similarity calculations? J. Cheminform.7, 1–13 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model.50, 742–754 (2010). [DOI] [PubMed] [Google Scholar]
  • 74.Riniker, S. & Landrum, G. A. Better informed distance geometry: using what we know to improve conformation generation. J. Chem. Inf. Model.55, 2562–2574 (2015). [DOI] [PubMed] [Google Scholar]
  • 75.Hopkins, A. L., Keserü, G. M., Leeson, P. D., Rees, D. C. & Reynolds, C. H. The role of ligand efficiency metrics in drug discovery. Nat. Rev. Drug Discov.13, 105–121 (2014). [DOI] [PubMed] [Google Scholar]

Associated Data

Data Availability Statement

Protein structures are available through the Protein Data Bank (PDB), and their codes are given in the paper. Known actives are available either through the PubChem repositories associated with ref. 68 or through the Directory of Useful Decoys, Enhanced (DUD-E; ref. 69). Access to the Enamine REAL space can be requested from Enamine Ltd. The REvoLd guide contains a short section to help with the NDA (https://docs.rosettacommons.org/docs/latest/revold). The data that support the findings of this study contain protected intellectual property of Enamine Ltd., which was used under license for the current study, and so are not publicly available. They are, however, available from the corresponding author upon reasonable request and with permission of Enamine Ltd.

All code is available as part of the Rosetta repository. More information, including compilation and run commands, can be found in the REvoLd guide (https://docs.rosettacommons.org/docs/latest/revold). Rosetta is available under a permissive license for academic research. Commercial utilization requires a separate license.


Articles from Communications Chemistry are provided here courtesy of Nature Publishing Group
