Comparison of Large Chemical Spaces

Uta Lessel; Christian Lemmen

ACS Med. Chem. Lett. 2019 Sep 11;10(10):1504–1510. doi: 10.1021/acsmedchemlett.9b00331

PMCID: PMC6792285 PMID: 31620241

Abstract

Chemical libraries are commonplace in computer-aided drug discovery, and assessing their overlap/complementarity is a routine task. For this purpose, different techniques are applied, ranging from exact matching to comparing physicochemical properties. However, these techniques are applicable only if the compound sets are not too big. Particularly for chemical spaces, containing billions of compounds, alternative ways of assessment are required. Random subsets could be enumerated and compared one-to-one, but given the vast sizes of the chemical spaces assessed here, such samples can at best provide a rough estimate of any overlap. Here we describe a novel way to compare chemical spaces utilizing a panel of query compounds. We applied this technique to three different types of spaces and obtained insight into their structural overlap, their coverage of the chemical universe, and their density. As chemical feasibility of virtual compounds is particularly important, we included related in silico predictions in our assessment.

Keywords: Chemical spaces, structural overlap, coverage of the chemical universe, complementarity

Virtual screening in large chemistry spaces first became popular with the rise of high-throughput screening and combinatorial chemistry but then lost its appeal.¹ It is currently attracting renewed interest through technologies like DNA-encoded libraries² as well as the recognition that traditional compound libraries are becoming a rather limited resource.

Traditionally, compound libraries (including the virtual species) consist of individual molecules. However, combinatorial reaction product numbers quickly grow so large that it is hardly practical to enumerate all product molecules. One option to overcome this dilemma is to store the building blocks or reagents and their connection rules instead of the enumerated molecules. This has been realized, e.g., by Tripos with the ChemSpace technology³,⁴ and AllChem,⁵ which can be searched with a technology called Topomers.⁶ Nicolaou et al. published a technique for exploring the feasible chemical space at Eli Lilly based on available reagents and annotated reactions.⁷ Alternatively, large, so-called Feature Tree Fragment Spaces can be built with the software CoLibri⁸ and can simply and effectively be searched with the FTrees software.⁹⁻¹¹ Several large pharmaceutical companies have built their own in-house spaces based on this technology,¹²,¹³ and publicly available reaction data have also been used to create a public domain space.¹⁴ Compounds that can be synthesized on demand have been captured as a searchable fragment space,¹⁵ too. An overview of huge virtual compound libraries has recently been published by Hoffmann and Gastreich.¹⁶ With the availability of vast chemical spaces from different sources and given that it is close to impossible to fully enumerate their contents, the question arises of how to compare them.

For traditional compound libraries, the overlap can easily be determined based on the chemical structures. For example, Daylight fingerprints¹⁷ or ECFPs¹⁸ are routinely utilized to determine the Tanimoto similarity of all compound pairs or at least a reasonably sized fraction thereof. However, these approaches are not applicable to compound collections that cannot practically be enumerated and that far exceed 10⁹ virtual molecules. Similarly interesting is the comparison of compound sets based on their distribution in physicochemical property space. In this context, Oprea and Gottfries published a system called ChemGPS¹⁹ in which they characterize a chemical space via principal component analysis of physicochemical properties based on drug-like molecules and some outliers with extreme properties to cover the complete, relevant space. Compound sets can be projected into this property space and compared with each other according to the “GPS coordinates” of the subsets. This way, it is possible to detect missing combinations of physicochemical properties. Similar approaches include, for instance, the “receptor-relevant subspace” concept published by Pearlman and Smith²⁰ and the training of self-organizing maps.²¹ All these techniques are applicable only to libraries of traditional size with explicit structures. For excessively large chemistry spaces, as discussed here, only comparatively small samples can be compared with each other, leading at best to a rough estimate of the overlap or complementarity. Therefore, instead of taking random subsets, we suggest comparing huge spaces based on nearest neighbors for a panel of 100 specific query molecules. On the basis of the Feature Tree similarity, hit sets containing the most similar molecules from each chemical space to each of these 100 probes are determined. We analyzed the overlap of these hits, the coverage of the chemical space, the density of the spaces, as well as the chemical feasibility.

The panel of query molecules is a random selection of 100 marketed drugs. The size of the panel was chosen as a compromise between the number needed to derive relevant statistics and the time to calculate the nearest neighbors and their overlap. The bias toward marketed drugs is intended to focus on the pharmaceutically most relevant space.

The search technology, which is widely used throughout this publication, is called Feature Trees (implemented in the FTrees software tool).⁹,¹¹ It is conceptually different from classical similarities like those based on Daylight fingerprints¹⁷ or ECFPs;¹⁸ we therefore describe this method in slightly more detail. For a full technical description, please refer to the original publication.⁹

The descriptor (called Feature Tree) is a topological pharmacophore descriptor. As illustrated in Figure 1, the Feature Tree of a molecule reduces its structure to a representation of pharmacophoric units like rings and functional groups. These units are called nodes of a Feature Tree; nodes are connected in the same manner as the represented substructures in the molecule. So the topology is represented, but detailed atom positions are not. Consequently, a Feature Tree is a rather fuzzy representation, which enables it to identify scaffold hops as close neighbors; this is one of the particular strengths of this descriptor.²²

Figure 1. Illustration of the Feature Trees and their corresponding parts for a query and a virtual hit.

In order to determine the similarity of the Feature Trees (Sim^FT) of two molecules, the comparison algorithm determines a mapping of nodes from one tree onto the other (see Figure 1 for an illustration). Such a mapping is constrained by the given topology, which must be preserved, meaning that two connected nodes on one tree can be mapped (with some leeway) only onto connected nodes of the other tree. For each pair of mapped nodes, a local similarity is calculated, which is the Tanimoto similarity of the corresponding property profiles. The overall FTrees similarity is the average of all local similarities reduced by a penalty for nonmatching nodes. The algorithm rapidly explores all valid mappings and determines the one with the overall highest similarity score.
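As a rough illustration of the scoring idea described above, the following sketch computes a local Tanimoto similarity for each mapped node pair and averages them, subtracting a penalty for unmatched nodes. The numeric property profiles, the min/max form of the Tanimoto coefficient, the fixed per-node penalty weight, and the function names are all assumptions made for illustration. The search over topology-preserving mappings performed by FTrees is not shown; the mapping is simply taken as given.

```python
from typing import List, Sequence, Tuple

Profile = Sequence[float]


def tanimoto(p: Profile, q: Profile) -> float:
    """Generalized Tanimoto for non-negative profiles: sum of minima over sum of maxima."""
    numerator = sum(min(a, b) for a, b in zip(p, q))
    denominator = sum(max(a, b) for a, b in zip(p, q))
    return numerator / denominator if denominator > 0 else 1.0


def feature_tree_score(
    mapped_pairs: List[Tuple[Profile, Profile]],
    n_unmatched: int,
    penalty: float = 0.1,  # hypothetical weight, not taken from the publication
) -> float:
    """Average local similarity of mapped node pairs, reduced by a penalty per unmatched node."""
    if not mapped_pairs:
        return 0.0
    local = [tanimoto(p, q) for p, q in mapped_pairs]
    score = sum(local) / len(local) - penalty * n_unmatched
    return max(score, 0.0)


# Toy example: two mapped node pairs and one node left without a partner.
pairs = [([1, 0, 2], [1, 0, 2]), ([1, 1, 0], [1, 0, 0])]
example_score = feature_tree_score(pairs, n_unmatched=1)
```

Because the overall score is assembled from per-node contributions, it can in principle be accumulated fragment by fragment, which is the property exploited for fragment space searches.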

As the overall score is derived from local similarities, this method is especially amenable to similarity searches in fragment spaces that are defined by reagents and connection rules. In an extension module called FTrees-FS (Feature Trees Fragment Spaces),¹¹ molecule fragments are represented as Feature Tree fragments. The linker atoms become special Feature Tree nodes whose sole purpose is to connect only compatible Feature Tree fragments. Compatible fragments relate to valid product molecules encoded in such a space. FTrees-FS implements an optimization strategy that dynamically explores only those regions of the search space that are likely to lead to a match with highly similar molecules.¹⁰ Thus, it retrieves compounds that are similar to a given query molecule without explicit enumeration. It has been shown that the search method is able to retrieve essentially any molecule in the space with certainty beyond 99% if it is present.¹³

In our study, we assess three different FTrees fragment spaces: BICLAIM developed by Boehringer-Ingelheim,¹³ REAL Space from Enamine,¹⁵ and KnowledgeSpace from BioSolveIT.¹⁴

KnowledgeSpace uses well-known reactions from the literature²³ and commercially available building blocks as reagents. This leads to a giant pool (10¹⁴) of generally reasonable-looking compounds; however, without further tailoring of the reagents, the pool contains compounds covering the entire range from low to high chemical feasibility. KnowledgeSpace is not per se constrained by any feasibility concern, and therefore, it is a large and valuable source for new ideas and de novo designs. BICLAIM, which today comprises more than 10²⁰ virtual products, is more focused on scaffolds as the central object. This space is defined in essence in a reverse order, starting from the products. These are split into cores and side chains. The so-called REAL (readily accessible) concept of Enamine is based on many years of carefully monitoring synthesis success rates for both reactions and reagents. On the basis of these data, only reliable reactions as well as validated in-stock building blocks are included. This way, Enamine can promise synthesis within 3–4 weeks and a success rate of at least 80%, which has been confirmed by several of their clients in a survey.¹⁶ With close to 4 billion (4 × 10⁹) virtual products in its 2018 version, this space is a highly valuable and reliable resource for drug discovery projects.

As a result of the vast size of the spaces, a complete enumeration of the encoded compounds is impossible; thus, only smaller subsets can be used for a direct comparison. Instead of picking and comparing random samples from the different fragment spaces, we compare search results of meaningful queries. This way, a certain focus on relevant parts of these spaces is achieved and any overlap—if present—should be revealed.

On the basis of 100 query compounds used as reference points in the chemical universe, we retrieved from all three spaces compounds “in the vicinity” of these queries. We applied the following filter criteria to approved small molecule drugs to ensure a clear focus on small and drug-like molecules, which are generally of most interest to drug researchers:

number of violations of Lipinski’s rules < 2
molecular weight < 600 Da
clogP < 6
total polar surface area < 150 Å²
number of rotatable bonds < 12
number of H-bond donors and acceptors > 0

From the remaining drugs, 100 were randomly selected. In later experiments, we also took query molecules randomly selected from the different spaces, applying the same filter criteria.
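The filter criteria and the random selection can be sketched as below. This is a minimal illustration that assumes the descriptors have already been computed by some cheminformatics toolkit and are supplied as plain dictionaries; the key names, the seed, and the reading of the last criterion as a combined donor-plus-acceptor count are assumptions, not details given in the text.

```python
import random
from typing import Dict, List

Descriptors = Dict[str, float]


def passes_filters(d: Descriptors) -> bool:
    """Apply the six drug-likeness criteria to one precomputed descriptor record."""
    return (
        d["lipinski_violations"] < 2
        and d["mol_weight"] < 600.0       # Da
        and d["clogp"] < 6.0
        and d["tpsa"] < 150.0             # Å²
        and d["rotatable_bonds"] < 12
        and d["hbd"] + d["hba"] > 0       # assumption: combined donor + acceptor count
    )


def select_queries(drugs: List[Descriptors], n: int = 100, seed: int = 0) -> List[Descriptors]:
    """Randomly pick up to n drugs from those that pass all filters."""
    eligible = [d for d in drugs if passes_filters(d)]
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    return rng.sample(eligible, min(n, len(eligible)))
```

The same two functions could be reused for the later experiments in which queries are drawn from the fragment spaces themselves, since the text states that the same criteria were applied there.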

For each of the query molecules, the 10 000 most similar molecules—on the basis of the above-described Feature Tree similarity method—were retrieved from each of the three spaces. For the structural comparison of compounds within the hit sets, we used MDL public keys,²⁴ another widely known similarity measure. On the basis of this more traditional metric, the similarity within hit sets was analyzed.

The chemical feasibility of the hits was assessed with two different methods: Ertl and Schuffenhauer’s SAscore,²⁵ which is calculated based on the frequency of the occurrence of fragments in PubChem and a complexity score, and rsynth, which is calculated with MOE²⁶ and is determined via retrosynthetic disconnections and the occurrence of the resulting fragments in a reagent database.

In theory, the combined hit sets comprise 1 000 000 compounds from each chemistry space. But as some of the hits were detected for more than one query, the number of unique hits from each space is slightly smaller, namely, 996 415 compounds from the corporate BICLAIM space, 968 467 compounds from the public KnowledgeSpace, and 971 864 from the commercial REAL Space. Most interestingly, the overlap of these unique hit sets is remarkably low. Only three compounds were detected in all three hit sets, and only very few hits were found in two of them (see Figure 2).
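The bookkeeping behind these numbers amounts to set operations on structure identifiers. The sketch below assumes each hit is represented by a canonical identifier string (for example a canonical SMILES or InChIKey produced elsewhere), so that identical structures compare equal; the function names and the toy identifiers are hypothetical.

```python
from typing import Dict, Iterable, Set


def unique_hits(hits_per_query: Dict[str, Iterable[str]]) -> Set[str]:
    """Merge the per-query hit lists of one space into a set of unique structures."""
    unique: Set[str] = set()
    for hits in hits_per_query.values():
        unique.update(hits)
    return unique


def overlap_regions(a: Set[str], b: Set[str], c: Set[str]) -> Dict[str, int]:
    """Sizes of the overlapping Venn regions for the unique hit sets of three spaces."""
    all_three = a & b & c
    return {
        "all_three": len(all_three),
        "a_b_only": len((a & b) - all_three),
        "a_c_only": len((a & c) - all_three),
        "b_c_only": len((b & c) - all_three),
        "in_at_least_two": len((a & b) | (a & c) | (b & c)),
    }


# Toy data: hits found for two queries in space A, plus unique hit sets for B and C.
space_a = unique_hits({"q1": ["A", "B"], "q2": ["B", "C"]})
regions = overlap_regions(space_a, {"B", "C", "D"}, {"C", "E"})
```

Duplicates across queries disappear in `unique_hits`, which is why the unique counts fall slightly below the nominal 1 000 000 per space.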

Figure 2. Overlap of the hit sets with 10 000 compounds for each of the 100 queries. BICLAIM results are shown as the blue circle, KnowledgeSpace results are in yellow, and REAL Space results are in red. The size of the overlapping parts is not proportional to the numbers.

The degree of overlap is not equally distributed across the different queries. For 49 of the 100 queries, the hits do not show any overlap. For the other 51 queries, on average 32 out of 10 000 hits were retrieved from two different spaces. The three hits that were detected in the results of all three fragment spaces all belong to the same query, indicating that the overlap is focused on the chemical space around this query compound, which is omeprazole (DB00338 (APRD00446) in DrugBank²⁷). Figure 3 shows omeprazole and the hits that were detected in all three spaces.

Figure 3. Omeprazole and the FTrees hits that were detected across all three chemistry spaces.

When hit sets of molecules similar to particular queries are compared, one may expect that any overlap of the vast chemical spaces analyzed—if present—will be detected. However, the exact match of chemical structures is surprisingly low with only 1660 (below 0.1%) compounds detected in at least two spaces. One explanation may be that compared to the size of the entire chemical space, which has been estimated to comprise up to 10¹⁸⁰ molecules,²⁸ each of the three spaces assessed here is not more than a “drop in the ocean”.
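The "below 0.1%" figure can be checked with simple arithmetic. The snippet below assumes the percentage is taken relative to the sum of the unique hit counts reported for the three spaces; the text does not state the denominator explicitly, so this is one plausible reading.

```python
# Unique hit counts per space, as reported in the text.
unique_counts = {
    "BICLAIM": 996_415,
    "KnowledgeSpace": 968_467,
    "REAL Space": 971_864,
}
shared_in_two_or_more = 1660

total_unique = sum(unique_counts.values())
shared_fraction = shared_in_two_or_more / total_unique
shared_percent = 100.0 * shared_fraction  # roughly 0.06%, i.e. below 0.1%
```

Under this reading the share is roughly 0.06%, and it stays below 0.1% even if the denominator is taken as the unique hit count of a single space, since each of those is about one million.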

In order to confirm that any overlap between spaces can indeed be detected by the protocol applied here, the KnowledgeSpace reactions have been split, and two equally sized subspaces have been generated. The hit sets retrieved from either subspace exhibit an overlap of about 50% with the hit set retrieved from the entire KnowledgeSpace (detailed results are provided in the supporting material).

The physicochemical properties of the hits usually resemble those of the corresponding queries. A comparison of the physicochemical property profiles for the hits is provided in the supporting material.

For an assessment of the “coverage” of the three spaces, we calculated the mean FTrees similarity of the hits to each query and plotted the results as mean similarity distributions (Figure 4).
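A minimal sketch of this summary step is shown below. It assumes the similarity values of the hits are already available as one list per query and space; the function names and the toy values are illustrative only.

```python
from statistics import mean, median
from typing import Dict, List


def mean_similarity_per_query(sims_by_query: Dict[str, List[float]]) -> Dict[str, float]:
    """Mean similarity of the retrieved hits for each query in one space."""
    return {query: mean(sims) for query, sims in sims_by_query.items() if sims}


def summarize_space(sims_by_query: Dict[str, List[float]]) -> Dict[str, float]:
    """Summary statistics of the per-query means, as one might plot in a distribution or box plot."""
    means = list(mean_similarity_per_query(sims_by_query).values())
    return {
        "median_of_means": median(means),
        "min_mean": min(means),
        "max_mean": max(means),
    }


# Toy data for two queries in one space.
toy_space = {"q1": [0.9, 0.8], "q2": [1.0, 0.9, 0.8]}
toy_summary = summarize_space(toy_space)
```

Comparing these summaries across spaces gives a compact view of how closely each space can approach a given panel of queries.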

Figure 4. Distributions of the mean Sim^FT values for the 10 000 hits per query (blue = BICLAIM (BI), yellow = KnowledgeSpace (KS), red = REAL Space (RS)).

The mean Sim^FT values for the hits in the BICLAIM space are clearly higher than those determined for the other spaces; KnowledgeSpace provides slightly lower similarities than REAL Space. Nevertheless, aiming at Sim^FT values larger than 0.9, valuable hits can be retrieved from all three spaces. To confirm and generalize this finding, we broadened this analysis by choosing additional random queries, namely, 100 random compounds from each of the fragment spaces. Furthermore, to see how chemical space searches compare to classical database searches, we performed FTrees searches for the same queries against 12.7 million in-stock compounds from the ZINC15 collection.²⁹ The results are summarized as box plots (Figure 5).

Figure 5. Box plots summarizing mean Sim^FT values for the searches with four different query sets in four different spaces (blue = BICLAIM (BI), yellow = KnowledgeSpace (KS), red = REAL Space (RS), green = ZINC15 (Z)).

Across all query sets, hits detected in BICLAIM show higher average similarities than those detected in the other spaces; the medians of the mean Sim^FT values are quite close for the hits from REAL Space and KnowledgeSpace; finally, the hit sets retrieved from ZINC15 consistently have a lower average similarity. As expected, searches with queries from one chemistry space tend to provide slightly higher average similarities for the hit sets from this space.

Although structural identity occurs rarely, the coverage of relevant pharmacophore profiles of the chemical universe is extremely high for all spaces. For a broad panel of drug molecules, similar compounds are detected in all three spaces. This means that each of the spaces is of value in itself. The generally higher average similarities of the BICLAIM hits are caused by the high density of the space, which is described in more detail in the next paragraph.

The structural similarity of the compounds within hit sets represents a measure for the “structural density” of the three chemical spaces, an interesting property of a chemical space. To assess this property, we used Tanimoto coefficients based on MDL public keys²⁴ as a commonly accepted structural similarity measure. The results of the hit sets for the 100 drug queries are summarized in Figure 6 and show that BICLAIM is by far the most densely occupied space followed by REAL Space, KnowledgeSpace, and the ZINC15 collection.

Figure 6. Average Tanimoto similarity within the hit sets for each query and space. For each member of a hit set, its nearest neighbor within that hit set based on MDL public keys was determined. Each dot in the figure shows the average of all the nearest neighbor similarities for one hit set, i.e., one for each query and space. For each space, the dots follow the same color code as before and are plotted in ascending order.
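The density measure described in this caption can be sketched as follows. The example assumes that each compound's key-based fingerprint has already been generated elsewhere and is given as a set of on-bit indices; fingerprint generation itself is not shown, and the function names are hypothetical.

```python
from typing import FrozenSet, List

Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Binary Tanimoto coefficient: shared on-bits divided by on-bits set in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def mean_nearest_neighbor_similarity(fingerprints: List[Fingerprint]) -> float:
    """Average, over all hit-set members, of the similarity to the nearest other member."""
    if len(fingerprints) < 2:
        raise ValueError("need at least two compounds to define a nearest neighbor")
    total = 0.0
    for i, fp in enumerate(fingerprints):
        # Nearest neighbor = most similar *other* member of the same hit set.
        total += max(tanimoto(fp, other) for j, other in enumerate(fingerprints) if j != i)
    return total / len(fingerprints)


# Toy hit set with three compounds.
toy_fps = [frozenset({1, 2, 3}), frozenset({1, 2, 3, 4}), frozenset({7, 8})]
toy_density = mean_nearest_neighbor_similarity(toy_fps)
```

This brute-force version compares every pair, which means on the order of 10⁸ comparisons for a hit set of 10 000 compounds; a production implementation would likely use packed bit vectors and a vectorized comparison.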

The highest density of BICLAIM is likely due to its scaffold-based way of construction. The number of unique scaffolds is significantly higher than the number of reactions covered by the other spaces, which leads to a highly dense population of the covered space. At the other end of the spectrum, the density of the ZINC15 collection is naturally much lower, as the size of this compound collection is just a small fraction of the size of the other spaces.

Synthetic accessibility of virtual hits is of key importance for the practical use of chemistry spaces. So, we analyzed the in silico predicted chemical feasibilities of the hits retrieved from the different spaces. The results for the retrosynthetic-analysis-based method rsynth in MOE²⁶ are shown in Figure 7.

Figure 7. Box plot comparing the rsynth scores (higher is better) for the respective hit sets of the 100 queries (blue = BICLAIM (BI), red = REAL Space (RS), yellow = KnowledgeSpace (KS), and green = ZINC15).

Generally, higher rsynth scores indicate higher synthetic feasibility. Given that the ZINC15 collection consists of existing molecules, its scores may serve as a reference. Although the trend indicates a declining chemical feasibility from REAL Space over BICLAIM to KnowledgeSpace, the majority of compounds falls into a comparatively narrow range of predicted chemical feasibility with a difference of only 0.05 between the highest and the lowest median. These values are only slightly worse than the predicted chemical feasibilities of the hits from the ZINC15 collection.

The complexity-based SAscores from Ertl and Schuffenhauer²⁵ can be seen in Figure 8 (lower values are better). Again, the existing compounds from the ZINC15 collection may serve as a reference. Generally, all scores indicate high predicted chemical feasibility. This prediction method follows the same trend for the different spaces; however, again, the differences between the collections are small.

Figure 8. Box plot comparing the SAscores (lower is better) for the nearest neighbors to the 100 queries (blue = BICLAIM (BI), red = REAL Space (RS), yellow = KnowledgeSpace (KS), and green = ZINC15).

According to two independent prediction methods for chemical feasibility, the hits from the different spaces show the same trend, with REAL Space compounds predicted to be most likely chemically feasible, followed by BICLAIM compounds and KnowledgeSpace compounds, which are predicted to be the most difficult to make. This ranking can be explained by the conception of the different spaces: REAL Space has been optimized for synthetic accessibility so that the vendor can guarantee the delivery of about 80% on average for any compound order, whereas BICLAIM contains cores with different levels of chemical feasibility. KnowledgeSpace is based on well-known published reactions and commercially available reagents and has not been specifically tuned for chemical feasibility.

Generally, the chemical feasibility scores are close to those for existing compounds from the ZINC15 collection. This implies a huge practical relevance of chemical spaces compared to previous de novo design methods.

Last but not least, we looked at the complementarity of the chemical spaces. On the basis of our experience in virtual screening, it is desirable to detect hits with Sim^FT > 0.9. Thus, the number of hits with Sim^FT > 0.9 for a query in a certain chemistry space might be taken as a performance measurement. Figure 9 illustrates that the number of hits with Sim^FT > 0.9 for each of the random queries in BICLAIM, REAL Space, and KnowledgeSpace varies significantly from query to query and from space to space.

Figure 9. Counts of hits with Sim^FT > 0.9 retrieved for one query (one dot) in one particular space. The queries are sorted in ascending order based on the BICLAIM results. Ties are sorted in ascending order of the REAL Space results. (BICLAIM (BI) = blue, REAL Space (RS) = red, and KnowledgeSpace (KS) = yellow).
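The per-query counts and the ordering described in this caption can be sketched as follows, assuming the similarity values of the hits are available as one list per query and space. The function names and toy values are illustrative.

```python
from typing import Dict, List


def count_high_similarity_hits(
    sims_by_query: Dict[str, List[float]], threshold: float = 0.9
) -> Dict[str, int]:
    """Number of hits per query whose similarity strictly exceeds the threshold."""
    return {
        query: sum(1 for s in sims if s > threshold)
        for query, sims in sims_by_query.items()
    }


def sort_queries(primary: Dict[str, int], secondary: Dict[str, int]) -> List[str]:
    """Order queries by ascending primary count, breaking ties by the secondary count."""
    return sorted(primary, key=lambda q: (primary[q], secondary.get(q, 0)))


# Toy data: similarities of hits for three queries in one space.
toy_sims = {"q1": [0.95, 0.91, 0.50], "q2": [0.89, 0.90], "q3": [0.92]}
toy_counts = count_high_similarity_hits(toy_sims)
toy_order = sort_queries(toy_counts, {"q1": 4, "q2": 7, "q3": 1})
```

Note that the comparison is strict, so a hit with a similarity of exactly 0.9 is not counted.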

This observation can be used as an indicator of “holes” in the different spaces, for example, missing reactions. For an assessment of potential holes in a space A, those queries are especially interesting for which many hits with high similarity are found in a space B, whereas the number and similarity of the hits from space A are significantly lower. In such a case, the reactions leading to the hits from space B might be missing in space A. An example of detecting a missing reaction is given in the supporting material.
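One way to turn this idea into a screening step is sketched below. It takes, for each space, the per-query counts of hits above the similarity threshold and flags queries that are well covered in space B but poorly covered in space A. The two cutoffs (`min_hits_b`, `max_ratio`) are hypothetical values chosen for illustration; the text does not specify numeric criteria.

```python
from typing import Dict, List, Tuple


def candidate_holes(
    counts_a: Dict[str, int],
    counts_b: Dict[str, int],
    min_hits_b: int = 100,    # hypothetical: "many" high-similarity hits in space B
    max_ratio: float = 0.1,   # hypothetical: space A has at most 10% as many
) -> List[Tuple[str, int, int]]:
    """Return (query, count in A, count in B) for queries suggesting a hole in space A."""
    flagged = []
    for query, n_b in counts_b.items():
        n_a = counts_a.get(query, 0)
        if n_b >= min_hits_b and n_a <= max_ratio * n_b:
            flagged.append((query, n_a, n_b))
    # Largest gap first, so the most striking candidates are inspected first.
    return sorted(flagged, key=lambda item: item[2] - item[1], reverse=True)


# Toy counts of hits above the similarity threshold for three queries.
toy_a = {"q1": 5, "q2": 900, "q3": 0}
toy_b = {"q1": 800, "q2": 950, "q3": 150}
toy_holes = candidate_holes(toy_a, toy_b)
```

Flagged queries are only candidates: the hits from space B would still need to be inspected to see which reactions produce them and whether those reactions are indeed absent from space A.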

In summary, a comparison of chemistry spaces that are too large to be enumerated can be achieved by analyzing the results of searches with a panel of distinct query compounds. The advantage of this procedure is that the comparison is directed to parts of the spaces, where an overlap can be found—if it exists. Furthermore, the expected performance in practical applications can be assessed.

The number of identical structures in hit sets from different spaces indicates the extent of overlap, which is found to be extremely low in this study. The similarity distribution of the most similar hits illustrates the coverage of the drug-like portion of the chemical universe, which is generally quite high for the analyzed spaces. The average similarity within the hit sets characterizes the density of a space, which is highest for BICLAIM. The chemical feasibility is reasonably high compared to the values obtained for existing molecules and follows the same general trend according to two complementary assessments. REAL Space compounds are on average more likely to be feasible, whereas KnowledgeSpace compounds are deemed less easy to synthesize. The slight differences between the spaces can be explained by their design and setup. Altogether, this study shows that virtual chemical spaces cover a lot of valuable content for the pharmaceutical researcher and are no longer a mere theoretical concept. Instead, they may have a very high impact in practice, and it is worthwhile to explore different chemical spaces for the detection of alternative hits and potential leads.

Acknowledgments

We are grateful to Marcus Gastreich for stimulating discussions and valuable input to this publication.

Glossary

Abbreviations

FTree: Feature Tree
Sim^FT: Feature Tree similarity

Supporting Information Available

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acsmedchemlett.9b00331.

Additional data to show broader applicability of the method; comparison of physicochemical properties of queries and hits; example for the detection of a missing reaction (PDF)
SD file with the randomly chosen queries (SDF)

Author Contributions

The manuscript was written through contributions of all authors.

The authors declare no competing financial interest.

Supplementary Material

ml9b00331_si_002.pdf (253.3 KB, PDF)

ml9b00331_si_003.sdf (154.5 KB, SDF)

References

1. Kodadek T. The Rise, Fall and Reinvention of Combinatorial Chemistry. Chem. Commun. 2011, 47 (35), 9757–9763. 10.1039/c1cc12102b.
2. Kontijevskis A. Mapping of Drug-like Chemical Universe with Reduced Complexity Molecular Frameworks. J. Chem. Inf. Model. 2017, 57 (4), 680–699. 10.1021/acs.jcim.7b00006.
3. Cramer R. D.; Patterson D. E.; Clark R. D.; Soltanshahi F.; Lawless M. S. Virtual Compound Libraries: A New Approach to Decision Making in Molecular Discovery Research. J. Chem. Inf. Comput. Sci. 1998, 38 (6), 1010–1023. 10.1021/ci9800209.
4. Andrews K. M.; Cramer R. D. Toward General Methods of Targeted Library Design: Topomer Shape Similarity Searching with Diverse Structures as Queries. J. Med. Chem. 2000, 43 (9), 1723–1740. 10.1021/jm000003m.
5. Cramer R. D.; Soltanshahi F.; Jilek R.; Campbell B. AllChem: Generating and Searching 10²⁰ Synthetically Accessible Structures. J. Comput.-Aided Mol. Des. 2007, 21 (6), 341–350. 10.1007/s10822-006-9093-8.
6. Jilek R. J.; Cramer R. D. Topomers: A Validated Protocol for Their Self-Consistent Generation. J. Chem. Inf. Comput. Sci. 2004, 44 (4), 1221–1227. 10.1021/ci049961d.
7. Nicolaou C. A.; Watson I. A.; Hu H.; Wang J. The Proximal Lilly Collection: Mapping, Exploring and Exploiting Feasible Chemical Space. J. Chem. Inf. Model. 2016, 56, 1253–1266. 10.1021/acs.jcim.6b00173.
8. BioSolveIT GmbH. CoLibri, Version 4.2. https://www.biosolveit.de/CoLibri.
9. Rarey M.; Dixon J. S. Feature Trees: A New Molecular Similarity Measure Based on Tree Matching. J. Comput.-Aided Mol. Des. 1998, 12, 471–490. 10.1023/A:1008068904628.
10. Rarey M.; Stahl M. Similarity Searching in Large Combinatorial Chemistry Spaces. J. Comput.-Aided Mol. Des. 2001, 15 (6), 497–520. 10.1023/A:1011144622059.
11. BioSolveIT GmbH. FTrees and FTrees-FS, Version 5.0. https://www.biosolveit.de/FTrees-FS.
12. Boehm M.; Wu T. Y.; Claussen H.; Lemmen C. Similarity Searching and Scaffold Hopping in Synthetically Accessible Combinatorial Chemistry Spaces. J. Med. Chem. 2008, 51, 2468–2480. 10.1021/jm0707727.
13. Lessel U.; Wellenzohn B.; Lilienthal M.; Claussen H. Searching Fragment Spaces with Feature Trees. J. Chem. Inf. Model. 2009, 49, 270–279. 10.1021/ci800272a.
14. Detering C.; Claussen H.; Gastreich M.; Lemmen C. KnowledgeSpace - A Publicly Available Virtual Chemistry Space. J. Cheminform. 2010, 2 (Suppl 1), O9. 10.1186/1758-2946-2-S1-O9.
15. Enamine. Enamine REAL Space and REAL Database. https://enamine.net/index.php?option=com_content&task=view&id=254.
16. Hoffmann T.; Gastreich M. The next level in chemical space navigation: going far beyond enumerable compound libraries. Drug Discovery Today 2019, 24 (5), 1148–1156. 10.1016/j.drudis.2019.02.013.
17. Daylight Theory Manual: Daylight Version 4.9; Daylight Chemical Information Systems, 2011. http://www.daylight.com/dayhtml/doc/theory/index.pdf.
18. Rogers D.; Hahn M. Extended-Connectivity Fingerprints. J. Chem. Inf. Model. 2010, 50 (5), 742–754. 10.1021/ci100050t.
19. Oprea T. I.; Gottfries J. Chemography: The Art of Navigating in Chemical Space. J. Comb. Chem. 2001, 3 (2), 157–166. 10.1021/cc0000388.
20. Pearlman R. S.; Smith K. M. Metric Validation and the Receptor-Relevant Subspace Concept. J. Chem. Inf. Comput. Sci. 1999, 39 (1), 28–35. 10.1021/ci980137x.
21. Reutlinger M.; Schneider G. Nonlinear Dimensionality Reduction and Mapping of Compound Libraries for Drug Discovery. J. Mol. Graphics Modell. 2012, 34, 108–117. 10.1016/j.jmgm.2011.12.006.
22. Briem H.; Lessel U. F. In Vitro and in Silico Affinity Fingerprints: Finding Similarities beyond Structural Classes. Perspect. Drug Discovery Des. 2000, 20 (1), 231–244. 10.1023/A:1008793325522.
23. Hartenfeller M.; Eberle M.; Meier P.; Nieto-Oberhuber C.; Altmann K. H.; Schneider G.; Jacoby E.; Renner S. A Collection of Robust Organic Synthesis Reactions for in Silico Molecule Design. J. Chem. Inf. Model. 2011, 51 (12), 3093–3098. 10.1021/ci200379p.
24. Durant J. L.; Leland B. A.; Henry D. R.; Nourse J. G. Reoptimization of MDL Keys for Use in Drug Discovery. J. Chem. Inf. Comput. Sci. 2002, 42 (6), 1273–1280. 10.1021/ci010132r.
25. Ertl P.; Schuffenhauer A. Estimation of Synthetic Accessibility Score of Drug-like Molecules Based on Molecular Complexity and Fragment Contributions. J. Cheminform. 2009, 1 (1), 8. 10.1186/1758-2946-1-8.
26. Chemical Computing Group ULC. Molecular Operating Environment (MOE), Version 2018.01; Chemical Computing Group ULC: Montreal, QC, Canada, 2018.
27. Wishart D. S.; Feunang Y. D.; Guo A. C.; Lo E. J.; Marcu A.; Grant J. R.; Sajed T.; Johnson D.; Li C.; Sayeeda Z.; et al. DrugBank 5.0: A Major Update to the DrugBank Database for 2018. Nucleic Acids Res. 2018, 46 (D1), D1074–D1082. 10.1093/nar/gkx1037.
28. Gorse A.-D. Diversity in Medicinal Chemistry Space. Curr. Top. Med. Chem. 2006, 6 (1), 3–18. 10.2174/156802606775193310.
29. Sterling T.; Irwin J. J. ZINC 15 – Ligand Discovery for Everyone. J. Chem. Inf. Model. 2015, 55, 2324–2337. 10.1021/acs.jcim.5b00559.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

ml9b00331_si_002.pdf^{(253.3KB, pdf)}

ml9b00331_si_003.sdf^{(154.5KB, sdf)}
