ACS Medicinal Chemistry Letters. 2023 Mar 16;14(4):466–472. doi: 10.1021/acsmedchemlett.3c00021

Relevance of the Trillion-Sized Chemical Space “eXplore” as a Source for Drug Discovery

Alexander Neumann, Lester Marrison §, Raphael Klein ‡,*
PMCID: PMC10108389  PMID: 37077402

Abstract

Within the past two decades, virtual combinatorial compound collections, so-called chemical spaces, have become an important molecule source for pharmaceutical research all over the world. The emergence of compound vendor chemical spaces with rapidly growing numbers of molecules raises questions about their suitability for application and the quality of their content. Here, we examine the composition of the recently published and, so far, biggest chemical space, “eXplore”, which comprises approximately 2.8 trillion virtual product molecules. The utility of eXplore to retrieve interesting chemistry around approved drugs and common Bemis Murcko scaffolds has been assessed with several methods (FTrees, SpaceLight, SpaceMACS). Further, the overlap between several vendor chemical spaces has been determined, and a physicochemical property distribution analysis has been performed. Despite the straightforward chemical reactions underlying its setup, eXplore is demonstrated to provide relevant and, most importantly, easily accessible molecules for drug discovery campaigns.

Keywords: combinatorial library, chemical space, virtual screening, molecular similarity, ligand-based drug discovery, make-on-demand


The pharmaceutical industry and research facilities all over the world seek synthetically accessible novel molecules with fast and reliable delivery rates to boost their drug discovery projects. Especially the early stages can require broad chemical diversity to increase the likelihood of finding active candidates, while the possibility to investigate the proximal chemical space around a known active compound can accelerate the lead optimization process (SAR-by-Space).1

The recent growth of combinatorial compound collections has enabled scientists to mine chemical spaces containing billions of molecules.2–5 Chemical suppliers like Enamine, WuXi LabNetwork, and OTAVA created spaces that totaled over 50 billion make-on-demand molecules in 2022. Similarly, pharmaceutical companies generated their proprietary chemical spaces with even higher numbers of virtual product molecules using their own proprietary building blocks and in-house chemistry. The GSK XXL space by GlaxoSmithKline represents the largest compound collection so far, featuring around 10^26 entries.6

Similarity screening algorithms have been improved to keep pace with ever-growing enumerated libraries. Approaches like Arthor7 and SmallWorld8 are able to handle billion-sized databases but need an enormous amount of memory (TB range).9

To address the handling and processing of vast chemical spaces like the trillion-sized eXplore, a triad of algorithms has been developed in the past years to cover the most common search methodologies which are routinely performed on enumerated libraries. This triad consists of (i) FTrees,10,11 a fuzzy pharmacophore similarity search, (ii) SpaceLight,12 a molecular fingerprint search method, and (iii) SpaceMACS,13 that screens for maximum common substructures (MCSs).

Further, the algorithmic toolbox was extended in order to analyze the quality of each chemical space: SpaceCompare14 can quantitatively determine the overlap of chemical spaces, and SpaceProp15 calculates distributions of physicochemical parameters of the individual spaces. Every additional chemical space, and every update that adds novel compounds, again raises the question of the quality of the contained virtual product molecules.

Recently, a new chemical space, “eXplore”, was released comprising 2.8 trillion virtual product molecules. This space is based on the knowledge of robust chemical reactions,16 reactions often used in medicinal chemistry,17 and novel reaction technologies18 (a total of 47 chemical reactions). A curated set of building blocks with a maximum delivery time of 10 days (denoted as tier I and II building blocks by eMolecules) was provided by eMolecules as input for the generation of the space to satisfy the demand of the pharmaceutical industry to quickly access desired compounds. The selection criteria for reactions and corresponding building blocks are aimed to ensure a high degree of synthetic accessibility.

In this work, the aforementioned algorithmic methods were applied to assess the properties and content of the eXplore space molecules and their relevance for drug discovery purposes. Additionally, due to their novelty, results retrieved by the methods SpaceLight and SpaceMACS were compared with each other by similarity measures to collect insights on their applicability for drug discovery.

A list of drugs approved by the U.S. Food & Drug Administration (FDA) was selected as a test set for the assessment of the eXplore space, as it contains the most relevant compounds regarding desirable physicochemical properties, toxicological safety profiles, and biologically active scaffolds. Therefore, every commercially available molecule collection should address those criteria to satisfy the needs of state-of-the-art drug discovery. The list of drugs was prefiltered for compounds with molecular weight lower than 800 g/mol for two reasons: first, to take “beyond rule of five” (bRo5) molecules into account, as they represent an emerging field in drug discovery,19,20 and second, to expand the hunting grounds for the discovery of tool compounds that may not be orally bioavailable but can be used in target validation studies and act as lead compounds.

Macrocycles, defined here as compounds bearing a ring containing 9 or more heavy atoms, were excluded. Those rings are poorly covered by chemical spaces and cannot be handled by the FTrees algorithm. All available search algorithms were used to identify close neighbors of the resulting 2793 unique FDA-approved drugs within eXplore.
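The two prefiltering criteria described above (molecular weight below 800 g/mol, no macrocyclic ring) can be sketched as a simple filter. This is a minimal illustration and not the authors' pipeline: the `DrugRecord` type, its field names, and the deduplication by name are assumptions, and the ring size and molecular weight would in practice come from a cheminformatics toolkit.

```python
from dataclasses import dataclass

MAX_MOLECULAR_WEIGHT = 800.0  # g/mol, upper limit stated in the text
MACROCYCLE_RING_SIZE = 9      # ring size treated as macrocyclic in the text


@dataclass(frozen=True)
class DrugRecord:
    """Hypothetical record; the field names are illustrative only."""
    name: str
    molecular_weight: float
    largest_ring_size: int  # 0 for acyclic molecules


def passes_prefilter(drug: DrugRecord) -> bool:
    """Keep drugs below the weight limit that contain no macrocyclic ring."""
    return (
        drug.molecular_weight < MAX_MOLECULAR_WEIGHT
        and drug.largest_ring_size < MACROCYCLE_RING_SIZE
    )


def prefilter(drugs):
    """Return the unique drugs (by name) that pass both criteria, in input order."""
    seen = set()
    kept = []
    for drug in drugs:
        if drug.name not in seen and passes_prefilter(drug):
            seen.add(drug.name)
            kept.append(drug)
    return kept


# Toy input with made-up property values.
toy_drugs = [
    DrugRecord("toy_a", 300.0, 6),
    DrugRecord("toy_b", 850.0, 6),   # rejected: too heavy
    DrugRecord("toy_c", 400.0, 14),  # rejected: macrocycle
    DrugRecord("toy_a", 300.0, 6),   # rejected: duplicate
]
print([d.name for d in prefilter(toy_drugs)])
```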

The algorithm FTrees (Feature Trees) calculates a similarity score based on pharmacophore properties, the so-called “features”. Its fuzziness allows that very close analogs appear with a similarity score of 1; e.g., ortho- and meta-substituted analogs are considered equivalent. FTrees was able to retrieve 314 molecules (12%) with a similarity score of 1 from the space. For 1499 approved drugs (55%), molecules with similarities in the range 0.95–0.99 were identified. The FTrees results are summarized in Figure 1A.

Figure 1.

(A) FTrees, (B) SpaceLight (fCSFP3), and (C) SpaceMACS similarity search results for close analogs of 2793 FDA-approved drugs within the eXplore space. The FTrees search was performed with different intervals, given the nature of the fuzziness of the algorithm that allows soft matching of molecular features, and is therefore represented with a grain pattern.

The SpaceLight search algorithm is based on Tanimoto similarity. Different fingerprints are available: ECFP with variants 2/4/6 and CSFP with variants fCSFP, tCSFP, and iCSFP, each with subvariants 1–6. CSFP fingerprints apply more and smaller features and are therefore optimal for screening of fragment spaces.21 In this work, we have chosen the fCSFP3 fingerprint to search for analogs of FDA-approved drugs. Out of 2793 FDA-approved drugs, 269 (10%) were identified with a Tanimoto similarity of 1. Some non-identical molecules were identified with a similarity of 1 as well. This is due to the fact that this fingerprint variant cannot differentiate between single methylene groups in longer aliphatic chains. The similarity range of 0.90–0.99 was represented by 233 (8%) approved drug analogs. A similarity score of 0.80–0.89 was retrieved for 642 (23%) of the approved drugs. For the majority, only compounds with similarity scores below 0.80 (1647, 59%) were found. The results are summarized in Figure 1B.
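As a reminder of how a Tanimoto similarity is obtained from two fingerprints, the following sketch computes it on plain feature sets. It does not reproduce the fCSFP3 feature generation of SpaceLight; the integer feature identifiers are placeholders, and treating two empty fingerprints as identical is a convention chosen here.

```python
def tanimoto(features_a, features_b):
    """Tanimoto coefficient of two fingerprints given as collections of features."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 1.0  # convention chosen for this sketch
    return len(a & b) / len(a | b)


# Placeholder feature identifiers (not real fCSFP3 features).
query = {1, 2, 3}
analog = {2, 3, 4}
print(tanimoto(query, analog))  # 2 shared features out of 4 distinct ones

# Two different molecules that map to the same feature set score 1,
# which is how non-identical hits with a similarity of 1 can arise.
print(tanimoto({1, 2, 3}, {3, 2, 1}))
```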

The algorithm SpaceMACS was used with its similarity mode to screen the space for related compounds.13 This algorithm identifies the MCS and calculates a similarity score based on the number of matching atoms in relation to the number of heavy atoms of the whole molecule. Therefore, a similarity score of 1 represents the exact same molecule. The results are summarized in Figure 1C. SpaceMACS was able to identify 212 approved drugs (8%) within eXplore (e.g., lidocaine, ambroxol, diclofenac, celecoxib). Compounds with a similarity of 0.90–0.99 were retrieved for 503 approved drugs (18%), and another set of compounds with a similarity of 0.80–0.89 was retrieved for 724 approved drugs (26%). Search results for 48% of the test data set had low similarity values, <0.80 (see Figure 1C).
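The percentages quoted for the SpaceLight and SpaceMACS similarity bins follow directly from the reported counts and the 2793 query drugs. The short sketch below recomputes them; the bin labels are chosen here for illustration, and the count for the lowest SpaceMACS bin is derived as a remainder because only its percentage is given in the text.

```python
TOTAL_QUERIES = 2793  # FDA-approved drugs used as queries

# Counts per similarity bin as reported in the text.
spacelight_bins = {"1.00": 269, "0.90-0.99": 233, "0.80-0.89": 642, "<0.80": 1647}
spacemacs_bins = {"1.00": 212, "0.90-0.99": 503, "0.80-0.89": 724}

# Derived value (not stated as a count in the text): remaining SpaceMACS queries.
spacemacs_bins["<0.80"] = TOTAL_QUERIES - sum(spacemacs_bins.values())


def percentages(bins, total=TOTAL_QUERIES):
    """Return the share of queries per bin, rounded to whole percent."""
    return {label: round(100 * count / total) for label, count in bins.items()}


print(percentages(spacelight_bins))
print(percentages(spacemacs_bins))
```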

In 1255 cases (45%), both SpaceLight and SpaceMACS retrieved low similarity results (<0.8). This can be ascribed to factors such as the underlying complex synthesis of the query molecule not covered by one- to two-step reactions used in the generation of the space, the (semi)biological background of the drug, and the absence of particular building block scaffolds required to construct molecular entities (e.g., heterocycles, annulated ring systems, particular substitution patterns, etc.).

Each screening method retrieved chemically diverging analogs for the approved drug set. The drug celecoxib was chosen as an example query. All three algorithms were able to identify the exact molecule with a similarity score of 1. The closest analog to celecoxib found by FTrees displays three differences: (i) the sulfonamide is switched from the para to meta position, (ii) the pyrazole is converted to an imidazole ring, and (iii) the methyl substitution of the phenyl ring is replaced by a fluorine atom (see Figure 2). The first two observations can be traced back to FTrees’ indifference toward substitution patterns and the position of heteroatoms in a ring. The similarity score is therefore solely influenced by the methyl/fluorine replacement. The closest analog found by SpaceMACS has an MCS similarity of 0.96. It differs by the additional substitution of a methyl group. The closest analog in SpaceLight using an fCSFP3 fingerprint has a Tanimoto similarity of 0.978. This is due to the shift of the sulfonamide group from the para to meta position.
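The MCS similarity of 0.96 for the methyl-substituted analog can be rationalized with a simple atom-count ratio. The sketch below assumes that the score is the number of matched heavy atoms divided by the heavy-atom count of the larger molecule; this normalization is an assumption made here for illustration and may differ from the exact SpaceMACS definition. Celecoxib (C17H14F3N3O2S) has 26 heavy atoms, so an analog with one additional methyl group has 27.

```python
def mcs_similarity(matched_atoms: int, heavy_atoms_query: int, heavy_atoms_hit: int) -> float:
    """Ratio of matched heavy atoms to the heavy atoms of the larger molecule.

    The choice of the larger molecule as denominator is an assumption of this sketch.
    """
    if matched_atoms > min(heavy_atoms_query, heavy_atoms_hit):
        raise ValueError("The MCS cannot be larger than the smaller molecule.")
    return matched_atoms / max(heavy_atoms_query, heavy_atoms_hit)


CELECOXIB_HEAVY_ATOMS = 17 + 3 + 3 + 2 + 1  # C17 F3 N3 O2 S

# Analog with one extra methyl group: the whole query is the common substructure.
score = mcs_similarity(
    matched_atoms=CELECOXIB_HEAVY_ATOMS,
    heavy_atoms_query=CELECOXIB_HEAVY_ATOMS,
    heavy_atoms_hit=CELECOXIB_HEAVY_ATOMS + 1,
)
print(round(score, 2))
```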

Figure 2.

Closest analogs to celecoxib retrieved by FTrees, SpaceMACS, and SpaceLight. For each molecule, the corresponding similarity scores of the respective other algorithms were calculated.

The closest analogs obtained by FTrees, SpaceLight, and SpaceMACS and celecoxib itself are stored within eXplore as reaction products of a copper(I)-catalyzed N-arylation. The respective reagents have a boronic acid and a pyrrole nitrogen as functional groups. All required building blocks are within Tier1 of the eMolecules catalog and have prices between $20 and $147 USD per 100 mg, resulting in $100–200 USD in acquisition costs for the required building blocks per compound.

Subsequently, search results of five molecules with no identical match in eXplore were representatively analyzed. Related compounds were retrieved for carvedilol, flunarizine, cabozantinib, sonidegib (also known as erismodegib), and avanafil. The highest ranked compounds by similarity are summarized in Figure 3. In the case of carvedilol, FTrees and SpaceLight were able to retrieve closely related analogs with a methylene group-elongated linker, whereas SpaceMACS provided an amide analog. The latter represents a neutrally charged molecule under physiological pH of 7.4 compared to the positively charged query compound with a secondary amine. For flunarizine, analogs with different halogen decoration patterns (FTrees, SpaceLight) and a stripped-off analog (SpaceMACS) were discovered. The three analogs of cabozantinib were an alkynyl-linked, meta fluoro-substituted derivative (FTrees), a nitrogen-linked derivative (SpaceLight), and a compound bearing a pyridine as middle ring system (SpaceMACS). Greater chemical diversity compared to the query compound was observed for the search results of sonidegib. Here, FTrees provided a tertiary amine connecting the trifluoromethoxyphenyl, SpaceLight an amide-coupled biphenyl, and SpaceMACS an analog without the methyl group in position 2. The search results of avanafil displayed variety in the benzyl group: a tertiary amine (FTrees), an amine-linked pyridinylmethyl group (SpaceLight), and a tertiary amide (SpaceMACS) were observed.

Figure 3.

Examples of eXplore’s highest ranked search results for drugs without an identical match. The similarity score of each method is provided below the structure.

In summary, all three chemical space search algorithms provided close (similarity score close to 1) and more distant analogs. Access to chemically diverse analogs is of great interest for fast follow-up in early stages of drug discovery. The extended analysis indicates the potential of eXplore as a source for compounds to investigate structure–activity relationships of a lead structure. Our results show that each algorithm is able to extract relevant molecules with high similarity scores. Further, eXplore is demonstrated to provide analogs even for relatively complex structures such as the smoothened antagonist sonidegib and the PDE5 inhibitor avanafil. It should be mentioned that the similarity scores of the different screening methods do not correlate, as they use different comparison parameters to address individual needs in a drug discovery project.

The assumption that vast chemical spaces share mostly common areas of the chemical space has been discussed in the literature. A recent study addressed the question of whether chemical spaces cover the same Bemis Murcko scaffolds as on-the-shelf compound libraries.22 In the study, an MCS similarity13 of 0.85 was set as a threshold to discriminate similar from dissimilar scaffolds.

We repeated the same experiment with the latest versions of REAL Space, GalaXi, and CHEMriya. Out of the 2,452,944 on-the-shelf scaffolds, 1,981,373 (81%), 646,505 (26%), and 395,958 (16%) were covered with similarities of 0.85–1, respectively. For eXplore we found a coverage of 1,404,608 (57%) scaffolds (see Figure 4).
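The coverage figures can be reproduced from the scaffold counts with a few lines of arithmetic. The sketch below also shows how a scaffold would be classified as covered under the 0.85 threshold; the helper names are illustrative and the per-scaffold similarity values would come from the actual MCS search.

```python
ON_THE_SHELF_SCAFFOLDS = 2_452_944
COVERAGE_THRESHOLD = 0.85  # MCS similarity threshold used in the text

# Scaffolds covered with a similarity of 0.85-1, as reported in the text.
covered_scaffolds = {
    "REAL Space": 1_981_373,
    "GalaXi": 646_505,
    "CHEMriya": 395_958,
    "eXplore": 1_404_608,
}


def is_covered(best_similarity: float, threshold: float = COVERAGE_THRESHOLD) -> bool:
    """A scaffold counts as covered if its best hit reaches the threshold."""
    return best_similarity >= threshold


def coverage_percent(covered: int, total: int = ON_THE_SHELF_SCAFFOLDS) -> int:
    """Share of on-the-shelf scaffolds covered, rounded to whole percent."""
    return round(100 * covered / total)


for space, count in covered_scaffolds.items():
    print(f"{space}: {coverage_percent(count)}%")
```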

Figure 4.

Pie chart of the pairwise MCS similarity between 2.4 million Bemis Murcko scaffolds and the most similar compounds from eXplore.

Out of 395,450 scaffolds (16%) with a similarity score of 1 in eXplore, 253,134 do overlap with REAL Space, 49,699 with GalaXi, and 18,602 with CHEMriya. The numbers were calculated with the recent versions of the spaces (as of September 2022), which could differ from the results in the previous work. Finally, 134,656 scaffolds identical with on-the-shelf library scaffolds are yet exclusive to the eXplore space.

A comparison of the scaffold overlap between eXplore and the other three spaces came to the same result as the preceding work.22 The biggest overlap of eXplore scaffolds is formed with REAL Space results, with 253,134 shared scaffolds in total. Among the four investigated chemical spaces, only 2,577 scaffolds were found in all of them (see Figure 5).
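The overlap and exclusivity figures are set operations on the scaffolds found with a similarity of 1 in each space. The sketch below illustrates them on toy scaffold identifiers (hypothetical, not taken from the data set) and then checks that the reported counts are mutually consistent: the number of eXplore scaffolds shared with at least one other space must lie between the largest single overlap and the sum of all three overlaps.

```python
def exclusive_to(reference, *others):
    """Return the scaffolds of `reference` that occur in none of `others`."""
    return set(reference) - set().union(*others)


def shared_by_all(*spaces):
    """Return the scaffolds present in every given space."""
    sets = [set(space) for space in spaces]
    return set.intersection(*sets) if sets else set()


# Toy scaffold identifiers (hypothetical, for illustration only).
explore = {"s1", "s2", "s3", "s4", "s5"}
real_space = {"s1", "s2", "s6"}
galaxi = {"s2", "s3"}
chemriya = {"s2", "s7"}

print(sorted(exclusive_to(explore, real_space, galaxi, chemriya)))
print(sorted(shared_by_all(explore, real_space, galaxi, chemriya)))

# Consistency check on the counts reported in the text.
identical_in_explore = 395_450
exclusive_reported = 134_656
overlaps = {"REAL Space": 253_134, "GalaXi": 49_699, "CHEMriya": 18_602}
implied_union = identical_in_explore - exclusive_reported
assert max(overlaps.values()) <= implied_union <= sum(overlaps.values())
```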

Figure 5.

Overlap of Bemis Murcko scaffolds found with a MCS similarity score of 1.

The drug-likeness of all identical and similar scaffolds of eXplore was measured by compliance with the Ro5 (98.6%) and with some limits on complexity (max. seven rings, max. eight rotatable bonds, max. two chiral centers). The most critical Ro5 criterion is the logP, which rejects 1.1% of the compounds, while the limit on the number of chiral centers is not fulfilled by 2.6%. Overall, 95.6% of the investigated molecules retrieved from eXplore turned out to be relevant for small-molecule drug design campaigns.

The quality of a chemical library is often characterized by the compliance to certain rules (e.g., Ro5, rule of 3, Veber rules). These rules shall ensure that the libraries contain molecules with drug-like (i.e., orally active drug in humans) or fragment-like (i.e., small enough to be developed into a drug-like compound if necessary) properties. However, the involved parameters should not be regarded as absolute criteria. For example, the commonly applied Ro5 allows the violation of one parameter, and certain targets that require bRo5 compounds are known.20
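The point that the Ro5 tolerates one violation can be made concrete with a small check. The sketch below uses the four thresholds listed in Table 1 (molecular weight ≤ 500, cLogP ≤ 5, at most 10 acceptors, at most 5 donors); the `Properties` type and its field names are assumptions made for illustration, and the property values would normally be computed by a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Properties:
    """Hypothetical container for precomputed physicochemical properties."""
    molecular_weight: float  # g/mol
    clogp: float
    h_bond_acceptors: int
    h_bond_donors: int


def ro5_violations(p: Properties) -> list:
    """Return the names of the Ro5 criteria that the molecule violates."""
    checks = {
        "molecular_weight": p.molecular_weight <= 500,
        "clogp": p.clogp <= 5,
        "h_bond_acceptors": p.h_bond_acceptors <= 10,
        "h_bond_donors": p.h_bond_donors <= 5,
    }
    return [name for name, passed in checks.items() if not passed]


def is_ro5_compliant(p: Properties, allowed_violations: int = 1) -> bool:
    """Compliant if the number of violations does not exceed the allowance."""
    return len(ro5_violations(p)) <= allowed_violations


# Made-up property values for illustration.
slightly_heavy = Properties(molecular_weight=520.0, clogp=4.2, h_bond_acceptors=8, h_bond_donors=2)
heavy_and_greasy = Properties(molecular_weight=650.0, clogp=6.5, h_bond_acceptors=8, h_bond_donors=2)

print(ro5_violations(slightly_heavy), is_ro5_compliant(slightly_heavy))
print(ro5_violations(heavy_and_greasy), is_ro5_compliant(heavy_and_greasy))
```

Setting `allowed_violations=0` turns the same function into the strict per-criterion view used for the individual rows of a compliance table.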

The property distribution of chemical spaces (REAL Space, GalaXi, CHEMriya, KnowledgeSpace) has been previously published.15 Ro5 criteria like molecular weight, calculated logP, number of hydrogen bond acceptors, number of hydrogen bond donors, as well as the number of heavy atoms, were determined. Accordingly, an analysis of the eXplore space was performed (see Table 1, Figure 6). The distribution profile was found to be similar to those of GalaXi, CHEMriya, and KnowledgeSpace, whereas molecules within REAL Space mostly fulfilled the Ro5 criteria. Taking absolute numbers into account, eXplore still displays the largest number of Ro5-compliant compounds among the investigated chemical spaces, e.g., twice as many compounds as in the REAL Space. As for compounds not complying to the molecular weight criterion, some additional aspects need to be considered to assess the relevance of the content: 298 (11%) of the FDA-approved drugs (see paragraph above) have molecular weights above 500 g/mol, with several representatives going beyond 800 g/mol (e.g., elbasvir, ledipasvir, venetoclax). Further, it should not be neglected that compounds exceeding the limit of 500 g/mol also display potential as tool compounds for purposes in which physicochemical properties play a secondary role, such as in vitro target validation studies, fluorescent probes, and co-crystallization partners for structure determination.

Table 1. Rule of 5 (Ro5) Compliance of eXplore and Other Make-on-Demand Chemical Spaces.

Constraint (a)   eXplore          REAL Space     CHEMriya       GalaXi        FreedomSpace    KnowledgeSpace (b)
                 (2.8 trillion)   (31 billion)   (12 billion)   (8 billion)   (177 million)
cLogP ≤ 5        49.0%            89.2%          26.0%          71.4%         78.1%           77.2%
M.W. ≤ 500       20.7%            77.6%          4.11%          42.9%         90.0%           <0.01%
Acc. ≤ 10        67.5%            96.1%          35.4%          83.3%         97.6%           2.14%
Don. ≤ 5         98.3%            99.8%          94.5%          99.6%         99.5%           47.0%
H.A. ≤ 36        26.9%            98.4%          5.56%          53.8%         92.7%           0.01%

(a) M.W.: molecular weight; Acc.: number of hydrogen bond acceptors; Don.: number of hydrogen bond donors; H.A.: number of heavy atoms.

(b) Data taken from ref (15).
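Because the spaces differ in size by orders of magnitude, the percentages in Table 1 translate into very different absolute numbers. The sketch below converts the molecular weight row into approximate molecule counts. It covers a single criterion only and says nothing about joint compliance with all Ro5 criteria, which cannot be derived from the per-criterion percentages; the space sizes are the rounded values from the table header.

```python
# Approximate space sizes from the header of Table 1.
SPACE_SIZES = {
    "eXplore": 2.8e12,
    "REAL Space": 31e9,
    "CHEMriya": 12e9,
    "GalaXi": 8e9,
    "FreedomSpace": 177e6,
}

# Percentage of molecules with M.W. <= 500 g/mol, taken from Table 1.
MW_COMPLIANT_PERCENT = {
    "eXplore": 20.7,
    "REAL Space": 77.6,
    "CHEMriya": 4.11,
    "GalaXi": 42.9,
    "FreedomSpace": 90.0,
}


def absolute_counts(sizes, percent):
    """Approximate number of molecules fulfilling one criterion per space."""
    return {space: sizes[space] * percent[space] / 100 for space in sizes}


mw_counts = absolute_counts(SPACE_SIZES, MW_COMPLIANT_PERCENT)
for space, count in sorted(mw_counts.items(), key=lambda item: item[1], reverse=True):
    print(f"{space}: {count:.3g} molecules with M.W. <= 500")
```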

Figure 6.

Physicochemical property distributions of eXplore.

As a matter of principle, both factors used in the setup of eXplore, the building blocks and the chemical reactions, shape the distribution of physicochemical properties. First, although the included reactions involving three or more building blocks apply molecular weight filters to prevent excessively large products, they still commonly form a significant number of compounds with molecular weight above 500 g/mol due to the combinatorial explosion of possibilities, resulting in the observed distribution. Second, while building blocks containing heavy halogens (Br, I) are essential for certain reactions (e.g., Suzuki coupling) to access a broad range of molecular scaffolds, they form heavy molecular products in reactions where the halogen bond is not cleaved. The same applies to building blocks containing a protective group.
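The combinatorial explosion mentioned above follows from the fact that the number of products of a reaction is the product of the numbers of compatible building blocks at each position. The sketch below uses hypothetical building block counts, not the actual eXplore numbers, to show how a third component multiplies the number of products.

```python
from math import prod


def product_count(building_blocks_per_position):
    """Number of virtual products of one reaction: product over all positions."""
    return prod(building_blocks_per_position)


# Hypothetical counts of compatible building blocks per reaction position.
two_component = product_count([1000, 1000])
three_component = product_count([1000, 1000, 1000])

print(two_component)
print(three_component)
print(three_component // two_component)  # factor gained by the third component
```

Since each additional component also adds its mass to the product, reactions with three or more building blocks contribute a large share of heavy molecules unless the filters are strict.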

Interestingly, among the physicochemical properties, “number of hydrogen bond donors” displayed the most favorable distribution: more than 90% of the molecules fulfilled this Ro5 criterion, which is in line with the findings for REAL Space, GalaXi, and CHEMriya.

Still, future iterations of the space should not aim to increase the number of entries just for the sake of growing but have to address the quality and relevance of the content for drug discovery purposes by adjusting the building block sizes involved in the creation of products as well as the addition of novel reactions.

As discussed in the literature, the screening for similar compounds in a molecule collection represents a nontrivial challenge due to the fact that the perception of what can be considered as a relative to a query compound can vary between different projects and targets.23–25 The findings presented in this work demonstrate the individual capacities of the discussed algorithms to screen for similar compounds in chemical spaces.

Application of the algorithms in the present analysis of the eXplore space verifies it as a valuable source for research- and development-relevant chemistry: for one-third of the investigated drugs, proximal molecules and analogs could be retrieved. Those results represent highly relevant chemistry due to their similarity to the respective FDA-approved query compound.

Although only approximately 20% of the eXplore compounds are within the Lipinski molecular weight range of 500 Da, compensation occurs by exceeding the already impressive commercial compound catalogs by several orders of magnitude in regard to drug-like molecules.26 The previously neglected realm of larger molecules which received tentative attention in the past decade (e.g., the rise in interest toward PROTACs) represents an intriguing field for future research which can profit from eXplore’s bRo5 content. The high Bemis Murcko scaffold coverage of 57% implies high resemblance to prominent medicinal chemistry craftsmanship and the potential to address underrepresented structures. Given the fact that only a selection of 47 robust chemical reactions and building blocks from the eMolecules catalog were taken into consideration in the setup of the chemical space, it appears consequential that the entirety of the drug-like chemical matter cannot be covered. Especially comparatively complex and case-dependent procedures like reactions leading to (oligo-)nucleotide analogs or syntheses of certain decorated heterocycles are rarely covered by one- to two-step reactions, leading to their underrepresentation in the vast molecule collections.

The overlap between make-on-demand chemical spaces remains negligibly small. Therefore, every addition profoundly contributes to the exploration of synthetically feasible molecules and expands the medicinal chemistry toolbox. Further iterations of eXplore by fine-tuning of the chemical reactions and detailed assessment of building blocks used in the generation of the space will consequently contribute to an improved physicochemical property distribution of the comprised compounds and increase its already evident relevance as a source for accessible molecules for drug discovery.

Acknowledgments

The authors acknowledge Didier Rognan for sharing the Bemis Murcko scaffold queries.

Glossary

Abbreviations

Ro5: rule of 5

bRo5: beyond rule of 5

M.W.: molecular weight

H.A.: heavy atom

Acc.: hydrogen bond acceptor

Don.: hydrogen bond donor

Supporting Information Available

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acsmedchemlett.3c00021.

  • S1_Queries_FDA_approved_drugs.zip, FDA-approved drugs used as queries in sd file format; S2_Analogs_FDA_approved_drugs.zip, analogs of FDA-approved drugs obtained by FTrees, SpaceLight, and SpaceMACS in csv format; S3_BemisMurckoScaffold_Analysis.zip, Bemis Murcko scaffolds found in chemical spaces in csv format; and S4_Physicochemical_Parameter_Analysis.zip, physicochemical parameter distributions in csv format (ZIP)

Author Contributions

A.N. and R.K. contributed equally. The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.

The authors declare the following competing financial interest(s): A.N. and R.K. are employees of BioSolveIT, a company offering paid software to screen the eXplore space. L.M. is an employee of eMolecules that offers building blocks and screening compounds described in this work.

Supplementary Material

ml3c00021_si_001.zip (158.7MB, zip)

References

  1. Klingler F. M.; Gastreich M.; Grygorenko O. O.; Savych O.; Borysko P.; Griniukova A.; Gubina K. E.; Lemmen C.; Moroz Y. S. SAR by Space: Enriching Hit Sets from the Chemical Space. Molecules 2019, 24 (17), 3096–3106. 10.3390/molecules24173096.
  2. Hoffmann T.; Gastreich M. The next Level in Chemical Space Navigation: Going Far beyond Enumerable Compound Libraries. Drug Discovery Today 2019, 24 (5), 1148–1156. 10.1016/j.drudis.2019.02.013.
  3. Sadybekov A. A.; Sadybekov A. V.; Liu Y.; Iliopoulos-Tsoutsouvas C.; Huang X.-P.; Pickett J.; Houser B.; Patel N.; Tran N. K.; Tong F.; Zvonok N.; Jain M. K.; Savych O.; Radchenko D. S.; Nikas S. P.; Petasis N. A.; Moroz Y. S.; Roth B. L.; Makriyannis A.; Katritch V. Synthon-Based Ligand Discovery in Virtual Libraries of over 11 Billion Compounds. Nature 2022, 601, 452–459. 10.1038/s41586-021-04220-9.
  4. Muller J.; Klein R.; Tarkhanova O.; Gryniukova A.; Borysko P.; Merkl S.; Ruf M.; Neumann A.; Gastreich M.; Moroz Y. S.; Klebe G.; Glinca S. Magnet for the Needle in Haystacks: “Crystal Structure First” Fragment Hits Unlock Active Chemical Matter Using Targeted Exploration of Vast Chemical Spaces. J. Med. Chem. 2022, 65 (23), 15663–15678. 10.1021/acs.jmedchem.2c00813.
  5. Beroza P.; Crawford J. J.; Ganichkin O.; Gendelev L.; Harris S. F.; Klein R.; Miu A.; Steinbacher S.; Klingler F.-M.; Lemmen C. Chemical Space Docking Finds Novel ROCK1 Kinase Inhibitors by Large-Scale Structure-Based Virtual Screening. Nat. Commun. 2022, 6447. 10.1038/s41467-022-33981-8.
  6. Warr W. Report on an NIH Workshop on Ultralarge Chemistry Databases. ChemRxiv 2021, 10.26434/chemrxiv.14554803.v1.
  7. NextMove Software, I. Arthor 3.0, NextMove Software, Inc.: Cambridge, England, 2020.
  8. NextMove Software, I. SmallWorld 5.0, NextMove Software, Inc.: Cambridge, England, 2020.
  9. Irwin J. J.; Tang K. G.; Young J.; Dandarchuluun C.; Wong B. R.; Khurelbaatar M.; Moroz Y. S.; Mayfield J.; Sayle R. A. ZINC20—A Free Ultralarge-Scale Chemical Database for Ligand Discovery. J. Chem. Inf. Model. 2020, 60 (12), 6065–6073. 10.1021/acs.jcim.0c00675.
  10. Rarey M.; Dixon J. S. Feature Trees: A New Molecular Similarity Measure Based on Tree Matching. J. Comput. Aided. Mol. Des. 1998, 12, 471–490. 10.1023/A:1008068904628.
  11. Rarey M.; Stahl M. Similarity Searching in Large Combinatorial Chemistry Spaces. J. Comput. Aided. Mol. Des. 2001, 15 (6), 497–520. 10.1023/A:1011144622059.
  12. Bellmann L.; Penner P.; Rarey M. Topological Similarity Search in Large Combinatorial Fragment Spaces. J. Chem. Inf. Model. 2021, 61 (1), 238–251. 10.1021/acs.jcim.0c00850.
  13. Schmidt R.; Klein R.; Rarey M. Maximum Common Substructure Searching in Combinatorial Make-on-Demand Compound Spaces. J. Chem. Inf. Model. 2022, 62 (9), 2133–2150. 10.1021/acs.jcim.1c00640.
  14. Bellmann L.; Penner P.; Gastreich M.; Rarey M. Comparison of Combinatorial Fragment Spaces and Its Application to Ultralarge Make-on-Demand Compound Catalogs. J. Chem. Inf. Model. 2022, 62 (3), 553–566. 10.1021/acs.jcim.1c01378.
  15. Bellmann L.; Klein R.; Rarey M. Calculating and Optimizing Physicochemical Property Distributions of Large Combinatorial Fragment Spaces. J. Chem. Inf. Model. 2022, 62 (11), 2800–2810. 10.1021/acs.jcim.2c00334.
  16. Hartenfeller M.; Eberle M.; Meier P.; Nieto-Oberhuber C.; Altmann K. H.; Schneider G.; Jacoby E.; Renner S. A Collection of Robust Organic Synthesis Reactions for in Silico Molecule Design. J. Chem. Inf. Model. 2011, 51 (12), 3093–3098. 10.1021/ci200379p.
  17. Brown D. G.; Boström J. Analysis of Past and Present Synthetic Methodologies on Medicinal Chemistry: Where Have All the New Reactions Gone? J. Med. Chem. 2016, 59 (10), 4443–4458. 10.1021/acs.jmedchem.5b01409.
  18. Boström J.; Brown D. G.; Young R. J.; Keserü G. M. Expanding the Medicinal Chemistry Synthetic Toolbox. Nat. Rev. Drug Discovery 2018, 17 (10), 709–727. 10.1038/nrd.2018.116.
  19. Lipinski C. A.; Lombardo F.; Dominy B. W.; Feeney P. J. Experimental and Computational Approaches to Estimate Solubility and Permeability in Drug Discovery and Development Settings. Adv. Drug Delivery Rev. 1997, 23 (1–13), 3–25. 10.1016/S0169-409X(96)00423-1.
  20. Egbert M.; Whitty A.; Keserü G. M.; Vajda S. Why Some Targets Benefit from beyond Rule of Five Drugs. J. Med. Chem. 2019, 62 (22), 10005–10025. 10.1021/acs.jmedchem.8b01732.
  21. Bellmann L.; Penner P.; Rarey M. Connected Subgraph Fingerprints: Representing Molecules Using Exhaustive Subgraph Enumeration. J. Chem. Inf. Model. 2019, 59 (11), 4625–4635. 10.1021/acs.jcim.9b00571.
  22. Perebyinis M.; Rognan D. Overlap of On-demand Ultra-large Combinatorial Spaces with On-the-shelf Drug-like Libraries. Mol. Inform. 2023, 42 (1), 2200163. 10.1002/minf.202200163.
  23. Laufkötter O.; Miyao T.; Bajorath J. Large-Scale Comparison of Alternative Similarity Search Strategies with Varying Chemical Information Contents. ACS Omega 2019, 4 (12), 15304–15311. 10.1021/acsomega.9b02470.
  24. Bender A. How Similar Are Those Molecules after All? Use Two Descriptors and You Will Have Three Different Answers. Expert Opin. Drug Discovery 2010, 5 (12), 1141–1151. 10.1517/17460441.2010.517832.
  25. Hutter M. C. Differential Multimolecule Fingerprint for Similarity Search—Making Use of Active and Inactive Compound Sets in Virtual Screening. J. Chem. Inf. Model. 2022, 62 (11), 2726–2736. 10.1021/acs.jcim.2c00242.
  26. Volochnyuk D. M.; Ryabukhin S. V.; Moroz Y. S.; Savych O.; Chuprina A.; Horvath D.; Zabolotna Y.; Varnek A.; Judd D. B. Evolution of Commercially Available Compounds for HTS. Drug Discovery Today 2019, 24 (2), 390–402. 10.1016/j.drudis.2018.10.016.
