Molecular Informatics. 2024 Oct 10;44(1):e202400186. doi: 10.1002/minf.202400186

Navigating a 1E+60 Chemical Space of Peptide/Peptoid Oligomers

Markus Orsi 1, Jean‐Louis Reymond 1
PMCID: PMC11733718  PMID: 39390672

Abstract

Herein we report a virtual library of 1E+60 members, a common estimate for the total size of the drug‐like chemical space. The library is obtained from 100 commercially available peptide and peptoid building blocks assembled into linear or cyclic oligomers of up to 30 units, forming molecules within the size range of peptide drugs and potentially accessible by solid‐phase synthesis. We demonstrate ligand‐based virtual screening (LBVS) using the peptide design genetic algorithm (PDGA), which evolves a population of 50 members to resemble a given target molecule using molecular fingerprint similarity as fitness function. Target molecules are reached in less than 10,000 generations. Like in many journeys, the value of the chemical space journey using PDGA lies not in reaching the target but in the journey itself, here by encountering non‐obvious analogs. We also show that PDGA can be used to generate median molecules and analogs of non‐peptide target molecules.

Keywords: chemical space, cheminformatics, genetic algorithm, therapeutic peptides


Herein we report a virtual library of 1E+60 members, a common estimate for the size of the drug‐like chemical space. The library consists of linear or cyclic oligomers forming molecules within the size range of peptide drugs. We demonstrate ligand‐based virtual screening using a genetic algorithm.


1. Introduction

Since the advent of combinatorial chemistry in the early 1990s, which was triggered by the invention of the split‐and‐mix method yielding one‐bead‐one‐compound libraries of millions of peptide and peptide‐like oligomers in a few tens of synthetic operations [1, 2, 3], drug discovery has been fascinated and partly driven by large numbers [4, 5, 6]. Approaches ranged from the “needle in a haystack” method of high‐throughput screening typical for genetically encoded display libraries [7, 8] and DNA‐encoded libraries [9, 10], to the concept of chemical space guiding the design of focused libraries of small drug‐like molecules [11, 12, 13], fragments [14, 15] and peptides [16, 17, 18]. Many projects are currently exploiting “make‐on‐demand” virtual libraries of a few billion members obtained by using various coupling chemistries to combine two to four building blocks, each being taken from a pool of thousands of building blocks, to form linear, branched or cyclic oligomers [19, 20, 21, 22]. Despite being rather constrained, this oligomer chemical space has proven amenable to virtual screening and sufficiently diverse to solve most drug discovery problems [23, 24, 25], probably because biomolecules are themselves oligomers and their binding sites are usually suitable for partly flexible, pearl‐string‐like molecules [26, 27, 28].

Following up on our interest in exhaustive enumeration of chemical space [29, 30, 31], here we aimed to extend the oligomer chemical space to reach up to a virtual library size of 1E+60, a common estimate for the total size of the drug‐like chemical space [6, 32]. We also aimed to demonstrate virtual screening at that library size focusing on ligand‐based virtual screening (LBVS) [33, 34]. LBVS consists in identifying analogs of a reference bioactive compound by scoring the virtual library using molecular similarity measures such as molecular fingerprints [35, 36, 37, 38], or shape‐based comparisons [39, 40, 41, 42, 43]. As discussed below, we achieved our goals for the case of mixed peptide‐peptoids potentially accessible by solid‐phase peptide synthesis (SPPS) [44], moving up to 30 mers with 100 different building blocks to reach the required library size. To demonstrate LBVS, we modified our recently reported peptide design genetic algorithm (PDGA) [45], which evolves analogs of any target molecule by performing mutation/selection cycles on sequences encoding a topologically diverse oligomer space using molecular fingerprint similarity as fitness function, an approach which is related to small molecule design genetic algorithms [46, 47]. PDGA can be used to design new analogs of known peptides as recently demonstrated experimentally for antimicrobial peptide dendrimers [48]. Specifically, we computed the fitness function using the macromolecule extended atom pair fingerprint (MXFP) [49, 50] and the chiral MinHashed atom pair fingerprint (MAP4 C) [51, 52], both designed for large molecules.

2. Methods

2.1. Building Blocks

Our set of 100 building blocks includes the 20 proteinogenic amino acids, their D‐enantiomers, 12 further amino acids, 46 peptoids (N‐substituted glycines) [53] as well as GABA and β‐alanine, all available commercially or easily accessible in protected form for Fmoc‐SPPS or for the submonomer synthesis method for peptoids (Figure S1) [54, 55]. To further augment diversity, we allowed 11 different acyl groups to cap the N‐termini, and allowed a single cyclization either via a cystine bridge or by amide bond formation between the C‐terminus and the N‐terminus or a primary amine side chain (at lysine and related diamino acids). All building blocks are encoded in SMILES notation, ensuring that their concatenation always leads to a valid molecule. Additionally, sequences are represented in linear format to facilitate mutation and cross‐over operations within the genetic algorithm. In this format, “BBXXX” denotes a building block containing an amine and carboxylic acid, “bXXX” a diamino acid for sequence branching, “c” a C‐to‐N cyclization, “s” a cysteine for disulfide bridges, and “TXXX” an N‐terminal cap. Figure S2 illustrates the enhanced sequence format. Both the enhanced sequence format and the corresponding SMILES are stored in the results files.
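The concatenation principle can be illustrated with a minimal Python sketch. The three fragment strings and the names `FRAGMENTS` and `linear_smiles` are assumptions made for illustration and are not the encodings stored with the PDGA building blocks. Each fragment is written from N‐terminus to C‐terminus and ends with an open carbonyl carbon, so that appending the next fragment forms the amide bond and a final "O" closes the C‐terminal acid.

```python
# Hypothetical residue fragments, written N -> C and ending in an open carbonyl.
FRAGMENTS = {
    "Gly": "NCC(=O)",          # glycine
    "Ala": "N[C@@H](C)C(=O)",  # L-alanine
    "Nme": "N(C)CC(=O)",       # N-methylglycine, a simple peptoid unit
}


def linear_smiles(sequence):
    """Join residue fragments in order and close the C-terminus as an acid."""
    return "".join(FRAGMENTS[name] for name in sequence) + "O"


print(linear_smiles(["Gly", "Ala", "Nme"]))
```

Because every fragment starts with a nitrogen and ends with a carbonyl, any ordering of the fragments yields a syntactically consistent linear oligomer; caps, branching and cyclization would need additional rules that are not sketched here.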

2.2. Genetic Algorithm

We modified our previously reported PDGA [45] to compute the fitness function in one of two ways. In the first, fitness was the Jaccard distance (dJ) to the target molecule computed using the molecular fingerprint MAP4 C [52], and all molecules generated at each generation were saved as trajectory molecules. In the second, fitness was the City Block Distance (dCBD) to the target molecule computed using the most recent version of MXFP [50], and only molecules with dCBD ≤300 were saved as trajectory molecules, a threshold which only retains molecules with a significant degree of similarity to the target. Each PDGA run was started either from 50 random linear sequences generated using the 100 available building blocks, or from 50 repetitions of a selected starting sequence (for traversal runs), and was stopped either when the target was found or after 10,000 generations. For all runs, a mutation rate of 0.5, a population size of 50 and free topology exploration were employed during the genetic optimization process. In each iteration, the 15 sequences nearest to the query are chosen as parents and mutated to create 35 new sequences, which are then added to the population. Mutation types include point mutations, deletions, insertions and cross‐over. A second set of topology‐changing mutations was added to the pool of possible mutations in the PDGA. These include forming and breaking of C‐to‐N cyclizations, forming and breaking of branching points using diamino acids, as well as forming and breaking of disulfide bridges by insertion of two cysteines.
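The selection and mutation cycle described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions and not the PDGA code: the block names in `BLOCKS`, the token‐pair `fingerprint` (a stand‐in for a real molecular fingerprint), and the reading of the mutation rate as the probability of mutating rather than crossing over are all hypothetical, and the topology‐changing mutations are omitted.

```python
import random

# Placeholder names standing in for the 100 building blocks (hypothetical).
BLOCKS = [f"BB{i:03d}" for i in range(100)]
MAX_LEN = 30


def fingerprint(seq):
    """Toy fingerprint: set of (block, block, separation) pairs along the sequence."""
    return {(seq[i], seq[j], j - i)
            for i in range(len(seq)) for j in range(i, len(seq))}


def jaccard_distance(fp_a, fp_b):
    """1 - |intersection| / |union| of two fingerprint sets."""
    union = fp_a | fp_b
    return 1.0 - len(fp_a & fp_b) / len(union) if union else 0.0


def mutate(seq, rng):
    """Insertion or deletion when length allows, otherwise a point mutation."""
    seq = list(seq)
    op = rng.choice(["point", "insert", "delete"])
    if op == "insert" and len(seq) < MAX_LEN:
        seq.insert(rng.randrange(len(seq) + 1), rng.choice(BLOCKS))
    elif op == "delete" and len(seq) > 1:
        del seq[rng.randrange(len(seq))]
    else:
        seq[rng.randrange(len(seq))] = rng.choice(BLOCKS)
    return seq


def crossover(a, b, rng):
    """Single-point cross-over: head of a joined to the tail of b."""
    cut = rng.randrange(1, len(a) + 1)
    return list(a[:cut]) + list(b[cut:])


def evolve(target, generations=200, pop_size=50, n_parents=15,
           mutation_rate=0.5, seed=0):
    rng = random.Random(seed)
    target_fp = fingerprint(target)

    def score(seq):
        return jaccard_distance(fingerprint(seq), target_fp)

    # Start from random linear sequences of random length.
    population = [[rng.choice(BLOCKS) for _ in range(rng.randint(2, MAX_LEN))]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=score)
        parents = population[:n_parents]          # 15 nearest to the target
        history.append(score(parents[0]))
        if history[-1] == 0.0:                    # target reached
            break
        children = []
        while len(children) < pop_size - n_parents:   # 35 new sequences
            a, b = rng.sample(parents, 2)
            if rng.random() < mutation_rate:
                children.append(mutate(a, rng))
            else:
                children.append(crossover(a, b, rng))
        population = parents + children
    return population[0], history


target = ["BB007", "BB042", "BB013", "BB099", "BB001"]
best, history = evolve(target)
print(len(history), history[0], history[-1])
```

Because the 15 parents are carried over unchanged, the best score recorded in `history` can never get worse from one generation to the next; it can, however, stall when no offspring improves on the parents.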

3. Results and Discussion

3.1. A 1E+60 Combinatorial Library from 100 Building Blocks up to 30 Mers

Due to its size, a chemical space of 1E+60 cannot be explicitly enumerated, leaving a formal combinatorial enumeration as the only viable option. Assembling N building blocks to form an oligomer of length M results in N^M possibilities, hence 1E+60 is readily reached in a 60 mer peptide using only 10 different amino acids, in line with the well‐known combinatorial explosion of possibilities in peptide and protein sequences. However, reducing length M in the direction of small molecules requires an exponentially increasing number of building blocks N, for instance including all 20 proteinogenic amino acids would still require a 46 mer to reach 1E+60, and reducing oligomer length to a tetramer assembly typical of small molecules would require 1E+15 building blocks, well beyond the known small molecule chemical space (Table 1, 2nd column).
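The arithmetic behind these numbers can be checked with a short Python sketch. The function names `blocks_needed` and `library_size` are hypothetical; the number of building blocks is taken as 10^(60/M) rounded to the nearest integer, which is an assumption about how the tabulated values were rounded.

```python
TARGET_EXPONENT = 60  # library size of 1E+60


def blocks_needed(length):
    """Building blocks N such that N**length is approximately 1E+60."""
    return round(10 ** (TARGET_EXPONENT / length))


def library_size(n_blocks, length):
    """Number of linear sequences of exactly `length` units."""
    return n_blocks ** length


for m in (60, 46, 30, 29, 15, 8, 4):
    print(m, blocks_needed(m), f"{library_size(100, m):.0E}")
```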

Table 1.

Influence of oligomer length M and number of building blocks N on virtual library size.[a]

| Oligomer length (M) | Number of building blocks (N) required to reach N^M = 1E+60 | Library size at length M with N = 100 |
|---|---|---|
| 60 | 10 | 1E+120 |
| 46 | 20 | 1E+92 |
| 30 | 100 | 1E+60 |
| 29 | 117 | 1E+58 |
| 15 | 10,000 | 1E+30 |
| 8 | 31,622,777 | 1E+16 |
| 4 | 1E+15 | 100,000,000 |

[a] For linear amide‐bond connected oligomers. Since we consider single strands and the amide bond is directional (CO−N is not equivalent to N−CO), there are no symmetrical sequences. In the present report library size is further increased by cyclization and diverse N‐terminal caps (see text and methods).

Here we settled for 100 building blocks, reaching 1E+60 with a 30 mer, which lies within the size range of peptide drugs such as the HIV membrane fusion inhibitor enfuvirtide (36 residues) [56] or the diabetes/obesity drug semaglutide (31 residues) [57]. To reach N=100, we considered the 20 proteinogenic amino acids in L‐ and D‐enantiomeric forms, together with simple non‐proteinogenic amino acids as well as peptoids (N‐alkylated glycine) [53], which can be easily assembled by SPPS with the sub‐monomer approach [58]. All 100 building blocks selected were commercially available or easily accessible in a protected form suitable for peptide and/or peptoid submonomer SPPS (Figure S1).

With these 100 building blocks at hand, a virtual combinatorial enumeration of 1E+60 sequences was possible. To increase diversity, we allowed for eleven different N‐terminal carboxylic acids, in particular fatty acids as found in peptide antibiotics such as polymyxin [59] and which favor cellular uptake in natural products [60] and extend peptide circulation times via albumin binding [61]. We also added several options for cyclization to increase diversity (see methods for details). While these additional variations enlarged library size, it should be noted that library size depended primarily on oligomer length. For instance, reducing length by one unit to 29 mers reduced library size 100‐fold, implying that 99% of the library consisted of 30 mers. Nevertheless, with 100 building blocks the virtual library still contained 100 million members for tetramers, well within the size range of the public archive PubChem (Table 1, 3rd column) [62].
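The 99% figure follows from summing the library over all lengths, as the following Python sketch shows. It counts linear sequences only and ignores N‐terminal caps and cyclization, which is a simplifying assumption; the function name `share_of_longest` is hypothetical.

```python
from fractions import Fraction


def share_of_longest(n_blocks=100, max_length=30):
    """Fraction of all sequences of length 1..max_length that have full length."""
    total = sum(n_blocks ** m for m in range(1, max_length + 1))
    return Fraction(n_blocks ** max_length, total)


print(float(share_of_longest()))
```

With N building blocks the share of the longest oligomers approaches (N − 1)/N, so each additional unit of length dominates everything shorter.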

3.2. Ligand‐Based Virtual Screening by Genetic Algorithm Guided Navigation

Virtual screening consists in computationally evaluating a dataset to select a restricted number of molecules for closer inspection. Here we used LBVS aiming to select analogs of a target compound by using a genetic algorithm approach with PDGA (Figure 1) [45]. Genetic algorithms evolve a population for fitness by rounds of mutations and selection. In the context of our 1E+60 chemical space, this approach corresponds to a targeted navigation guided by the fitness function, which circumvents the need for evaluating every library member. We set out to test whether our PDGA would find its way through our 1E+60 virtual library, drawing from the selected set of 100 peptide/peptoid building blocks rather than only 20 amino acids to generate mutants.

Figure 1.


Design of PDGA. PDGA uses a list of input building blocks to generate a set of random linear sequences. The sequences are encoded using either the MAP4 C or MXFP fingerprints. The fingerprints are used to determine the fitness of the sequences by calculating the distance towards a specified query molecule. Sequences with distances below a set threshold are stored in an analogs database. The 15 fittest sequences undergo rounds of mutations and crossovers in which building blocks and topology are changed to add 35 new sequences to the population. This process iterates until either the query is found or the PDGA reaches 10,000 generations.

We challenged PDGA to identify analogs of six known bioactive linear and cyclic peptides of various lengths in our 1E+60 library. The test cases were polymyxin B2 (1, 10 residues, antimicrobial) [59], gramicidin S (2, 10 residues, antimicrobial) [63, 64], the mixed peptide/peptoid hybrid EB9 (3, 11 residues, antibacterial) [44], oncocin (4, 19 residues, antimicrobial) [65], cathelicidin BF (5, 30 residues, immunomodulatory peptide) [66], and circulin D (6, 30 residues, anti‐HIV) [67] (Figure 2). In each case, we performed three PDGA runs of at most 10,000 generations starting from 50 random sequences using the chiral fingerprint MAP4 C, which encodes pairs of circular substructures with high precision including chirality [51, 52].

Figure 2.


Structures of the selected queries for the PDGA runs using the MAP4 C similarity as fitness function. The linear sequences (4, 5 and 6) are written with standard one‐letter code for amino acids, with free N‐terminus marked as “H−” and C‐terminus in acid form “−OH” or amide form “−NH2”.

PDGA identified the target molecule in less than 10,000 generations in at least one of the three runs for each of these six peptides, including the two 30 mer peptides 5 and 6, which required exploration of the full 1E+60 chemical space (Table 2). Since each generation only amounted to 35 new molecules, which were evaluated against the 15 best scoring molecules of the previous generation used as parents, the cumulative number of molecules generated in each trajectory only ranged from a few thousand to several tens of thousands (Table 2), which is remarkably low considering the size of the explored chemical space. Note that the number of molecules per trajectory was approximately 30% lower when excluding stereoisomers. The presence of stereoisomers in the trajectory resulted from the presence of D‐ and L‐residues in the building block set and the ability of MAP4 C to rank each stereoisomer differently. Among the generated structures, PDGA delivered thousands of virtual screening hits characterized by a high similarity (Jaccard distance dJ <0.5) to the target peptide.

Table 2.

Results of three parallel PDGA runs for queries 1–6.

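| Query | Length | Structure[a] | # Generations to query[b], Run 1 | # Generations to query[b], Run 2 | # Generations to query[b], Run 3 | # Unique structures (% with dJ <0.5)[c], Run 1 | # Unique structures (% with dJ <0.5)[c], Run 2 | # Unique structures (% with dJ <0.5)[c], Run 3 | # Unique structures not counting diastereomers (% with dJ <0.5)[c], Run 1 | # Unique structures not counting diastereomers (% with dJ <0.5)[c], Run 2 | # Unique structures not counting diastereomers (% with dJ <0.5)[c], Run 3 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Polymyxin B2 (1) | 10 | cyclic peptide | 894 | 1,371 | >10 k | 6,934 (67) | 6,362 (74) | 7,877 (24) | 5,123 (57) | 4,792 (67) | 4,851 (13) |
| Gramicidin S (2) | 10 | cyclic peptide | 512 | 736 | >10 k | 4,119 (69) | 5,438 (80) | 4,142 (13) | 3,384 (63) | 4,505 (76) | 2,958 (9) |
| EB9 (3) | 11 | peptoid | 2,485 | 2,295 | >10 k | 20,998 (36) | 20,377 (44) | 7,160 (32) | 16,705 (32) | 16,333 (41) | 5,720 (28) |
| Oncocin (4) | 19 | linear peptide | 5,350 | 5,629 | >10 k | 46,591 (80) | 39,835 (77) | 55,462 (67) | 22,023 (65) | 27,829 (70) | 32,698 (52) |
| Cathelicidin BF (5) | 30 | linear peptide | 9,355 | 8,521 | >10 k | 88,738 (86) | 86,265 (87) | 31,301 (86) | 57,367 (81) | 63,374 (83) | 20,831 (80) |
| Circulin D (6) | 30 | Cyclotide[d] | 8,133 | >10 k | >10 k | 73,535 (73) | 37,526 (74) | 33,738 (61) | 43,550 (58) | 23,368 (61) | 26,092 (53) |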

[a] See supporting information Figure S2 for structural formulae.

[b] Number of generations used by PDGA to reach the query molecule. >10 k indicates that the target was not found within 10 k generations.

[c] dJ refers to the Jaccard distance calculated using MAP4 C fingerprints.

[d] PDGA was run on the linear sequence lacking the cystine bridges.

The evolution of the best score (dJ to target) per generation as a function of generation number illustrated how PDGA reached each target (Figures 3 and S3, upper row). After an initial round of approximately 10 generations, the best score started to decrease, indicating that the algorithm had found a way towards the target. After approximately 1,000 generations, the score had either decreased to zero and the target had been found, or the algorithm was stuck at an intermediate score. In terms of the cumulative number of new molecules generated, the increase per generation was approximately steady until the target had been found (Figures 3 and S3, lower row). When the target was not found, however, the algorithm was unable to generate any new structures, indicating that the same 15 top scoring molecules kept being selected as parents in each round and that none of their mutants led to any improvement in the score, implying that a local minimum had been reached. Because the computational expense of correcting this limitation by introducing a duplicate molecule check at every iteration was found to be far too large and the target was usually found by repeating the run several times, the algorithm was not modified.

Figure 3.


Analysis of three parallel PDGA runs starting from 50 random sequences towards selected queries. Top plots show the overall best score throughout the trajectory; the bottom plots show the cumulative number of unique new molecules generated throughout the trajectory for a) polymyxin B2, b) EB9, and c) cathelicidin BF. The best score refers to the MAP4 C dJ of the closest structure generated up to that generation relative to the target.

To gain closer insight into the analogs (MAP4 C dJ <0.5) generated by PDGA, we focused on the case of polymyxin B2 (Figure 4). We compared the three PDGA runs with an additional self‐run, starting PDGA from polymyxin B2 and letting the algorithm complete 10,000 generations independent of target identification. This self‐run quickly exhausted itself and produced 1,906 unique analogs, significantly fewer than the approximately seven thousand analogs obtained for each PDGA run. Interestingly, each of the three runs produced a different set of analogs (Figure 4a). While it is not surprising that all 7,877 molecules in the failed run were unique to this run since it failed to converge on the target, the two successful runs only shared three common molecules and fewer than 100 with the self‐run, although all molecules in these runs were highly similar to polymyxin B2, with an average Jaccard distance below 0.35 (Figure 4b). We also analyzed the average number of mutations from the target using Levenshtein distance as a proxy. Analogs of the successful runs were on average three mutations away from the target, while the self‐run only produced point mutants and molecules from the failed run remained approximately 9 mutations away from polymyxin B2 (Figure 4c).
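For readers unfamiliar with the Levenshtein distance, a minimal Python sketch is given below. It counts the smallest number of single‐unit substitutions, insertions and deletions needed to turn one sequence into another, and works on strings or on lists of building block names. The residue lists in the example are made up for illustration and are not taken from the runs.

```python
def levenshtein(a, b):
    """Minimum number of substitutions, insertions and deletions from a to b."""
    prev = list(range(len(b) + 1))          # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]                          # deleting the first i units of a
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,             # deletion
                            curr[j - 1] + 1,         # insertion
                            prev[j - 1] + (x != y))) # substitution or match
        prev = curr
    return prev[-1]


# Hypothetical residue lists: one substitution plus one insertion.
print(levenshtein(["Dab", "Thr", "Dab"], ["Dab", "Ser", "Dab", "Leu"]))  # 2
```

The measure treats every building block as a single symbol, so it ignores how chemically similar two exchanged units are; this is why a molecule can show a high Levenshtein distance and still a low fingerprint distance.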

Figure 4.


Analysis of polymyxin B2 runs starting from 50 random linear sequences (Run 1–3) or from polymyxin B2 without stopping condition (Self). a) Heatmap indicating the number of generated compounds with MAP4 C dJ <0.5 to polymyxin B2 for each trajectory, along with the number of overlapping compounds. b) Bar plot showing the mean and standard deviation of the dJ calculated using MAP4 C fingerprints for generated compounds with dJ <0.5 to polymyxin B2. c) Bar plot showing the mean and standard deviation of the Levenshtein distance (dL ; proxy for number of mutations) to polymyxin B2 for generated compounds with dJ <0.5 to polymyxin B2. d) Structure of a selected polymyxin B2 analog featuring a high dL and low dJ (7) and the closest analog generated in the failed run (8). e) TMAP displaying the generated compounds in a 2D space. Interactive TMAP: https://tm.gdb.tools/map4/10E60/polymyxin_randself_tmap.html.

A closer analysis of the successful runs revealed that many analogs combined multiple mutations with a high similarity to the target, as exemplified with analog 7 (Figure 4d). Such analogs are particularly interesting since they would be difficult to identify without PDGA, in contrast to the single point mutants from the self‐run, which do not require an algorithm for design. When displayed on a tree‐map (TMAP) [68] computed using MAP4 C similarities, molecules from the two successful runs and the self‐run were intermixed, indicating that they occupied a similar chemical space. Note however that two clusters of molecules from Run 1 (blue) or Run 2 (yellow) were visible, which contained early generation molecules with high Jaccard distance. Molecules from Run 3, which did not reach the target, also remained at high Jaccard distance and occupied a separate area of the map, reflecting their very different structural type, which featured a large, unbranched macrocycle exemplified by analog 8 (Figure 4d).

3.3. Traversing Chemical Space to Find Median Molecules

We next tested whether PDGA might be used to generate traversal trajectories in chemical space, starting from molecule A to reach a target molecule B, potentially travelling through a region of chemical space containing median molecules, a goal realized by small molecule generation algorithms [69, 70], but not demonstrated for the case of peptides or peptide‐like oligomers. PDGA was indeed able to generate such traversal trajectories between pairs of linear or cyclic peptides as illustrated with the pair of cyclic peptide natural products polymyxin B2 (1) and gramicidin S (2), the peptide/peptoid pair EB9 (3) and oncocin (4) and the pair of linear 30‐mers cathelicidin BF (5) and circulin D (6). Although reaching their targets, these trajectories rapidly diverged from the starting molecules and generated mostly close analogs to the target, without spending significant time at intermediate similarities (blue and red points in Figures 5a and S4).

Figure 5.


Visualization of traversal trajectories and median molecules between polymyxin B2 and gramicidin S. a) Jaccard distance of molecules selected from the different trajectories towards polymyxin B2 and gramicidin S. The trajectory from polymyxin B2 to gramicidin S is displayed in blue, the reverse trajectory is displayed in red, and the combined structure trajectory is displayed in yellow. b) MAP4 C TMAP of selected molecules colored by their trajectory of origin. The trajectories populate separate chemical subspaces. c) Structures of the two queries polymyxin B2 and gramicidin S and two selected molecules from the median trajectory (yellow). Interactive TMAP: https://tm.gdb.tools/map4/10E60/polymyxin_gramicidin_tmap.html.

To obtain median molecules between A and B, we ran PDGA with a modified fitness function minimizing the sum of three terms, namely the Jaccard distances to A and B and their absolute difference. This fitness function guided the algorithm to produce molecules with the smallest possible but equal distance to A and B. Indeed, the population of molecules generated using this modified fitness function was close to the diagonal of the 2D Jaccard distance plot (yellow points in Figures 5a and S4). A TMAP analysis of the set of molecules generated for the polymyxin B2 (1) to gramicidin S (2) trajectories showed that each trajectory generated structurally distinct classes of molecules corresponding to different areas of the chemical space around these molecules, with interesting hybrid molecules such as 9 and 10 combining features from both compounds (Figure 5b/c).
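The three‐term fitness can be written down directly, as in the Python sketch below. The function name `median_fitness` and the numerical values are hypothetical; the inputs stand for the Jaccard distances of a candidate to A and to B, however they are computed.

```python
import math


def median_fitness(d_a, d_b):
    """Sum of the distances to A and B plus their absolute difference."""
    return d_a + d_b + abs(d_a - d_b)


# Two hypothetical candidates with the same total distance to A and B.
balanced = median_fitness(0.5, 0.5)    # equidistant from A and B
lopsided = median_fitness(0.25, 0.75)  # much closer to A than to B
print(balanced, lopsided)              # 1.0 1.5

# The expression equals twice the larger of the two distances.
assert math.isclose(lopsided, 2 * max(0.25, 0.75))
```

Since a + b + |a − b| = 2·max(a, b), minimizing this fitness amounts to minimizing the larger of the two distances, which is why candidates are pushed towards the diagonal of the distance plot rather than towards either end point.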

3.4. Traveling Towards Non‐Peptide Molecules

We next used PDGA to identify analogs of targets not obtainable from the 100 selected building blocks, described here as “non‐peptide”, by minimizing the distance to target and stopping after 10,000 iterations. We tested this approach for diverse macrocycles containing building blocks and linkages not available in our library (11–17, Figure S5). For these non‐peptide targets, driving PDGA with the shape and pharmacophore fingerprint MXFP delivered somewhat more convincing results than with MAP4 C.

Specifically, the molecules generated using the MXFP fitness function matched the overall shape of the target molecules better than those generated using the MAP4 C fitness function (Figures 6 and S6). For instance, in the case of cyclosporin (11), which contains several N‐methylated amide bonds contributing to its membrane permeability [71, 72], and of valinomycin (13), where half of the linkages are ester instead of amide bonds, the macrocycles generated with MAP4 C preserved more standard amide bonds, while MXFP guided PDGA to use the peptoid units available in our set of 100 building blocks in order to mask the amide H‐bond donor group. Furthermore, MAP4 C sometimes selected acyclic analogs as best fits due to its emphasis on substructures, while MXFP always selected macrocycles matching the overall shape and polarity of the target molecule.
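The MXFP fitness relies on the city block distance between two fingerprint vectors, which can be sketched as follows in Python. The count vectors are made up and far shorter than a real MXFP fingerprint; the function names are hypothetical, and the threshold of 300 is the trajectory cut‐off quoted in the Methods.

```python
MXFP_THRESHOLD = 300  # trajectory cut-off quoted in the Methods (dCBD <= 300)


def city_block_distance(u, v):
    """Sum of absolute coordinate differences between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("fingerprints must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))


def keep_in_trajectory(candidate_fp, target_fp, threshold=MXFP_THRESHOLD):
    """True if the candidate is close enough to the target to be stored."""
    return city_block_distance(candidate_fp, target_fp) <= threshold


target = [12, 0, 7, 3]     # made-up count vectors, not real MXFP values
candidate = [10, 1, 7, 8]
print(city_block_distance(candidate, target))  # 2 + 1 + 0 + 5 = 8
```

Unlike the Jaccard distance on hashed substructures, this distance is not bounded by 1 and grows with the size of the counts, so a fixed cut‐off such as 300 only makes sense for fingerprints computed on the same scale.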

Figure 6.


Non‐peptide macrocycles, the overall best score throughout the trajectories and the corresponding best scoring MXFP analog from three combined runs for a) cyclosporin and b) valinomycin. The MXFP dCBD is reported for each analog. See also Figure S6 for further details.

4. Conclusion

In the conversations around chemical space, 1E+60 has established itself as a symbolic and fascinating boundary. Here we explicitly created a virtual library of 1E+60 molecules by combining 100 peptide and peptoid building blocks to form up to 30 mer linear or cyclic oligomers, all potentially accessible by standard solid‐phase synthesis. We demonstrated LBVS of this 1E+60 chemical space using a simple genetic algorithm, which succeeded in identifying virtual hits, defined either as analogs of specific molecules or as median molecules, by surveying only a few thousand sequences. It should be noted that, like in many journeys, the value of the chemical space journey using PDGA lies not in reaching the target but in the journey itself, here by encountering interesting molecules which would be otherwise difficult to design. Whether these molecules might translate into useful bioactives requires experimental evaluation of specific series. Additional studies along these lines are ongoing in our team.

Supporting Information

Additional supporting information can be found online in the Supporting Information section at the end of this article.

Author Contributions

MO designed and realized the project and wrote the paper. JLR designed and supervised the project and wrote the paper. Both authors read and approved the final manuscript.

Conflict of Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


Acknowledgments

This work was supported by the Swiss National Science Foundation (200020_178998) and the European Research Council (885076). [Corrections added on 15/10/2024, after first online publication: Abstract was amended.]

Orsi M., Reymond J.-L., Molecular Informatics 2025, 44, e202400186. 10.1002/minf.202400186

Data Availability Statement

The code used for the analysis and plots in this study is available at https://github.com/reymond-group/10E60. The raw results files can be retrieved at https://zenodo.org/records/11396287.

References

  • 1. Lam K. S., Salmon S. E., Hersh E. M., et al., “A New Type of Synthetic Peptide Library for Identifying Ligand-Binding Activity”, Nature 354, no. 6348 (1991): 82–84, 10.1038/354082a0. [DOI] [PubMed] [Google Scholar]
  • 2. Houghten R. A., Pinilla C., Blondelle S. E., et al., “Generation and Use of Synthetic Peptide Combinatorial Libraries for Basic Research and Drug Discovery”, Nature 354, no. 6348 (1991): 84–86, 10.1038/354084a0. [DOI] [PubMed] [Google Scholar]
  • 3. Lam K. S., Lebl M., Krchňák V., “The ‘One-Bead-One-Compound’ Combinatorial Library Method”, Chemical Reviews 97, no. 2 (1997): 411–448, 10.1021/cr9600114. [DOI] [PubMed] [Google Scholar]
  • 4. Bohacek R. S., McMartin C., Guida W. C., “The Art and Practice of Structure-Based Drug Design: A Molecular Modeling Perspective”, Medicinal Research Reviews 16, no. 1 (1996): 3–50, https://doi.org/10.1002/%28SICI%291098-1128%28199601%2916:1<3::AID-MED1>3.0.CO;2-6. [DOI] [PubMed] [Google Scholar]
  • 5. Bleicher K. H., Bohm H. J., Muller K., et al., “Hit and Lead Generation: Beyond High-Throughput Screening”, Nature Reviews Drug Discovery 2, no. 5 (2003): 369–378, 10.1038/nrd1086. [DOI] [PubMed] [Google Scholar]
  • 6. Kirkpatrick P., Ellis C., “Chemical Space”, Nature 432, no. 7019 (2004): 823–823, 10.1038/432823a. [DOI] [Google Scholar]
  • 7. Glökler J., Schütze T., Konthur Z., “Automation in the High-Throughput Selection of Random Combinatorial Libraries-Different Approaches for Select Applications”, Molecules 15, no. 4 (2010): 2478–2490, 10.3390/molecules15042478. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Goto Y., Suga H., “The RaPID Platform for the Discovery of Pseudo-Natural Macrocyclic Peptides”, Accounts of Chemical Research 54, no. 18 (2021): 3604–3617, 10.1021/acs.accounts.1c00391. [DOI] [PubMed] [Google Scholar]
  • 9. Gironda-Martínez A., Donckele E. J., Samain F., et al., “DNA-Encoded Chemical Libraries: A Comprehensive Review with Succesful Stories and Future Challenges”, ACS Pharmacology & Translational Science 4, no. 4 (2021): 1265–1279, 10.1021/acsptsci.1c00118. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Dockerill M., Winssinger N., “DNA-Encoded Libraries: Towards Harnessing Their Full Power with Darwinian Evolution”, Angewandte Chemie International Edition 135, no. 9 (2023): e202215542, 10.1002/ange.202215542. [DOI] [PubMed] [Google Scholar]
  • 11. Oprea T. I., Gottfries J., “Chemography: The Art of Navigating in Chemical Space”, Journal of Combinatorial Chemistry 3, no. 2 (2001): 157–166, 10.1021/cc0000388. [DOI] [PubMed] [Google Scholar]
  • 12. Lipinski C., Hopkins A., “Navigating Chemical Space for Biology and Medicine”, Nature 432, no. 7019 (2004): 855–861, 10.1038/nature03193. [DOI] [PubMed] [Google Scholar]
  • 13. Renner S., van Otterlo W. A. L., Dominguez Seoane M., et al., “Bioactivity-Guided Mapping and Navigation of Chemical Space”, Nature Chemical Biology 5, no. 8 (2009): 585–592, 10.1038/nchembio.188. [DOI] [PubMed] [Google Scholar]
  • 14. Bon M., Bilsland A., Bower J., et al., “Fragment-Based Drug Discovery-the Importance of High-Quality Molecule Libraries”, Molecular Oncology 16, no. 21 (2022): 3761–3777, 10.1002/1878--0261.13277. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. Buehler Y., Reymond J.-L., “Expanding Bioactive Fragment Space with the Generated Database GDB-13 s”, Journal of Chemical Information and Modeling 63, no. 20 (2023): 6239–6248, 10.1021/acs.jcim.3c01096. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. Di Bonaventura I., Jin X., Visini R., et al., “Chemical Space Guided Discovery of Antimicrobial Bridged Bicyclic Peptides against Pseudomonas Aeruginosa and Its Biofilms”, Chemical Science 8, no. 10 (2017): 6784–6798, 10.1039/c7sc01314k. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17. Di Bonaventura I., Baeriswyl S., Capecchi A., et al., “An Antimicrobial Bicyclic Peptide from Chemical Space Against Multidrug Resistant Gram-Negative Bacteria”, Chemical Communications 54 (2018): 5130–5133, 10.1039/c8cc02412j. [DOI] [PubMed] [Google Scholar]
  • 18.M. L. Merz, S. Habeshian, B. Li, et al., “De Novo Development of Small Cyclic Peptides That Are Orally Bioavailable,” Nature Chemical Biology (2023): 1–10, 10.1038/s41589-023-01496-y. [DOI] [PMC free article] [PubMed]
  • 19. Chevillard F., Kolb P., “SCUBIDOO: A Large yet Screenable and Easily Searchable Database of Computationally Created Chemical Compounds Optimized toward High Likelihood of Synthetic Tractability”, Journal of Chemical Information and Modeling 55, no. 9 (2015): 1824–1835, 10.1021/acs.jcim.5b00203. [DOI] [PubMed] [Google Scholar]
  • 20. Hoffmann T., Gastreich M., “The next Level in Chemical Space Navigation: Going Far beyond Enumerable Compound Libraries”, Drug Discovery Today 24, no. 5 (2019): 1148–1156, 10.1016/j.drudis.2019.02.013. [DOI] [PubMed] [Google Scholar]
  • 21. Grygorenko O. O., Radchenko D. S., Dziuba I., et al., “Generating Multibillion Chemical Space of Readily Accessible Screening Compounds”, iScience 23, no. 11 (2020): 101681, 10.1016/j.isci.2020.101681. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22. Warr W. A., Nicklaus M. C., Nicolaou C. A., et al., “Exploration of Ultralarge Compound Collections for Drug Discovery”, Journal of Chemical Information and Modeling 62, no. 9 (2022): 2021–2034, 10.1021/acs.jcim.2c00224. [DOI] [PubMed] [Google Scholar]
  • 23. Irwin J. J., Gaskins G., Sterling T., et al., “Predicted Biological Activity of Purchasable Chemical Space”, Journal of Chemical Information and Modeling 58, no. 1 (2018): 148–164, 10.1021/acs.jcim.7b00316. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24. Lyu J., Wang S., Balius T. E., et al., “Ultra-Large Library Docking for Discovering New Chemotypes”, Nature 566, no. 7743 (2019): 224–229, 10.1038/s41586-019-0917-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25. Korn M., Ehrt C., Ruggiu F., et al., “Navigating Large Chemical Spaces in Early-Phase Drug Discovery”, Current Opinion in Structural Biology 80 (2023): 102578, 10.1016/j.sbi.2023.102578. [DOI] [PubMed] [Google Scholar]
  • 26. Liang J., Woodward C., Edelsbrunner H., “Anatomy of Protein Pockets and Cavities: Measurement of Binding Site Geometry and Implications for Ligand Design”, Protein Science: A Publication of the Protein Society 7, no. 9 (1998): 1884–1897, 10.1002/pro.5560070905. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. Kandel J., Tayara H., Chong K. T., “PUResNet: Prediction of Protein-Ligand Binding Sites Using Deep Residual Neural Network”, Journal of Cheminformatics 13, no. 1 (2021): 65, 10.1186/s13321-021-00547-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Comajuncosa-Creus A., Jorba G., Barril X., et al., “Comprehensive Detection and Characterization of Human Druggable Pockets through Novel Binding Site Descriptors”, bioRxiv (2024): 2024.03.14.584971, 10.1101/2024.03.14.584971. [DOI] [PMC free article] [PubMed]
  • 29. Reymond J.-L., Ruddigkeit L., Blum L., et al., “The Enumeration of Chemical Space”, Wiley Interdisciplinary Reviews: Computational Molecular Science 2, no. 5 (2012): 717–733, 10.1002/wcms.1104. [DOI] [Google Scholar]
  • 30. Awale M., Visini R., Probst D., et al., “Chemical Space: Big Data Challenge for Molecular Diversity”, Chimia 71, no. 10 (2017): 661–666, 10.2533/chimia.2017.661. [DOI] [PubMed] [Google Scholar]
  • 31. Buehler Y., Reymond J.-L., “Molecular Framework Analysis of the Generated Database GDB-13s”, Journal of Chemical Information and Modeling 63, no. 2 (2023): 484–492, 10.1021/acs.jcim.2c01107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Ertl P., “Cheminformatics Analysis of Organic Substituents: Identification of the Most Common Substituents, Calculation of Substituent Properties, and Automatic Identification of Drug-like Bioisosteric Groups”, Journal of Chemical Information and Computer Sciences 43, no. 2 (2003): 374–380, 10.1021/ci0255782. [DOI] [PubMed] [Google Scholar]
  • 33. Ripphausen P., Nisius B., Bajorath J., “State-of-the-Art in Ligand-Based Virtual Screening”, Drug Discovery Today 16, no. 9 (2011): 372–376, 10.1016/j.drudis.2011.02.011. [DOI] [PubMed] [Google Scholar]
  • 34. Giordano D., Biancaniello C., Argenio M. A., et al., “Drug Design by Pharmacophore and Virtual Screening Approach”, Pharmaceuticals 15, no. 5 (2022): 646, 10.3390/ph15050646. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35. Willett P., “Similarity-Based Virtual Screening Using 2D Fingerprints”, Drug Discovery Today 11, no. 23–24 (2006): 1046–1053, 10.1016/j.drudis.2006.10.005. [DOI] [PubMed] [Google Scholar]
  • 36. Scior T., Bender A., Tresadern G., et al., “Recognizing Pitfalls in Virtual Screening: A Critical Review”, Journal of Chemical Information and Modeling 52, no. 4 (2012): 867–881, 10.1021/ci200528d. [DOI] [PubMed] [Google Scholar]
  • 37. Riniker S., Landrum G. A., “Open-Source Platform to Benchmark Fingerprints for Ligand-Based Virtual Screening”, Journal of Cheminformatics 5, no. 1 (2013): 26, 10.1186/1758-2946-5-26. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. Schmidt R., Klein R., Rarey M., “Maximum Common Substructure Searching in Combinatorial Make-on-Demand Compound Spaces”, Journal of Chemical Information and Modeling 62, no. 9 (2022): 2133–2150, 10.1021/acs.jcim.1c00640. [DOI] [PubMed] [Google Scholar]
  • 39. Schneider G., Neidhart W., Giller T., et al., “‘Scaffold-Hopping’ by Topological Pharmacophore Search: A Contribution to Virtual Screening”, Angewandte Chemie (International Edition in English) 38, no. 19 (1999): 2894–2896, 10.1002/(SICI)1521-3773(19991004)38:19<2894::AID-ANIE2894>3.0.CO;2-F. [PubMed] [Google Scholar]
  • 40. Sauer W. H., Schwarz M. K., “Molecular Shape Diversity of Combinatorial Libraries: A Prerequisite for Broad Bioactivity”, Journal of Chemical Information and Computer Sciences 43, no. 3 (2003): 987–1003, 10.1021/ci025599w. [DOI] [PubMed] [Google Scholar]
  • 41. Nicholls A., McGaughey G. B., Sheridan R. P., et al., “Molecular Shape and Medicinal Chemistry: A Perspective”, Journal of Medicinal Chemistry 53, no. 10 (2010): 3862–3886, 10.1021/jm900818s. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Awale M., Reymond J. L., “Atom Pair 2D-Fingerprints Perceive 3D-Molecular Shape and Pharmacophores for Very Fast Virtual Screening of ZINC and GDB-17”, Journal of Chemical Information and Modeling 54 (2014): 1892–1897, 10.1021/ci500232g. [DOI] [PubMed] [Google Scholar]
  • 43. Awale M., Jin X., Reymond J. L., “Stereoselective Virtual Screening of the ZINC Database Using Atom Pair 3D-Fingerprints”, Journal of Cheminformatics 7 (2015): 3, 10.1186/s13321-014-0051-5. [DOI] [PMC free article] [PubMed]
  • 44. Bonvin E., Personne H., Paschoud T., et al., “Antimicrobial Peptide-Peptoid Hybrids with and without Membrane Disruption”, ACS Infectious Diseases 9, no. 12 (2023): 2593–2606, 10.1021/acsinfecdis.3c00421. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45. Capecchi A., Zhang A., Reymond J.-L., “Populating Chemical Space with Peptides Using a Genetic Algorithm”, Journal of Chemical Information and Modeling 60, no. 1 (2020): 121–132, 10.1021/acs.jcim.9b01014. [DOI] [PubMed] [Google Scholar]
  • 46. Schneider G., Fechner U., “Computer-Based de Novo Design of Drug-like Molecules”, Nature Reviews Drug Discovery 4, no. 8 (2005): 649–663, 10.1038/nrd1799. [DOI] [PubMed] [Google Scholar]
  • 47. Lamanna G., Delre P., Marcou G., et al., “GENERA: A Combined Genetic/Deep-Learning Algorithm for Multiobjective Target-Oriented De Novo Design”, Journal of Chemical Information and Modeling 63, no. 16 (2023): 5107–5119, 10.1021/acs.jcim.3c00963. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48. Cai X., Capecchi A., Olcay B., et al., “Exploring the Sequence Space of Antimicrobial Peptide Dendrimers”, Israel Journal of Chemistry 63, no. 10–11 (2023): e202300096, 10.1002/ijch.202300096. [DOI] [Google Scholar]
  • 49. Capecchi A., Awale M., Probst D., et al., “PubChem and ChEMBL beyond Lipinski”, Molecular Informatics 38 (2019): 1900016, 10.1002/minf.201900016. [DOI] [PubMed] [Google Scholar]
  • 50. Orsi M., Probst D., Schwaller P., et al., “Alchemical Analysis of FDA Approved Drugs”, Digital Discovery 2, no. 5 (2023): 1289–1296, 10.1039/D3DD00039G. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51. Capecchi A., Probst D., Reymond J.-L., “One Molecular Fingerprint to Rule Them All: Drugs, Biomolecules, and the Metabolome”, Journal of Cheminformatics 12, no. 1 (2020): 43, 10.1186/s13321-020-00445-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52. Orsi M., Reymond J.-L., “One Chiral Fingerprint to Find Them All”, ChemRxiv (2023): 10.26434/chemrxiv-2023-33j02. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53. Zuckermann R. N., “Peptoid Origins”, Peptide Science 96, no. 5 (2011): 545–555, 10.1002/bip.21573. [DOI] [PubMed] [Google Scholar]
  • 54. Amblard M., Fehrentz J.-A., Martinez J., et al., “Methods and Protocols of Modern Solid Phase Peptide Synthesis”, Molecular Biotechnology 33, no. 3 (2006): 239–254, 10.1385/MB:33:3:239. [DOI] [PubMed] [Google Scholar]
  • 55. Clapperton A. M., Babi J., Tran H., “A Field Guide to Optimizing Peptoid Synthesis”, ACS Polymers Au 2, no. 6 (2022): 417–429, 10.1021/acspolymersau.2c00036. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56. Matthews T., Salgo M., Greenberg M., et al., “Enfuvirtide: The First Therapy to Inhibit the Entry of HIV-1 into Host CD4 Lymphocytes”, Nature Reviews Drug Discovery 3, no. 3 (2004): 215–225, 10.1038/nrd1331. [DOI] [PubMed] [Google Scholar]
  • 57. Knudsen L. B., Lau J., “The Discovery and Development of Liraglutide and Semaglutide”, Frontiers in Endocrinology 10 (2019): 10.3389/fendo.2019.00155. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58. Zuckermann R. N., Kerr J. M., Kent S. B. H., et al., “Efficient Method for the Preparation of Peptoids [Oligo(N-Substituted Glycines)] by Submonomer Solid-Phase Synthesis”, Journal of the American Chemical Society 114, no. 26 (1992): 10646–10647, 10.1021/ja00052a076. [DOI] [Google Scholar]
  • 59. Poirel L., Jayol A., Nordmann P., “Polymyxins: Antibacterial Activity, Susceptibility Testing, and Resistance Mechanisms Encoded by Plasmids or Chromosomes”, Clinical Microbiology Reviews 30, no. 2 (2017): 557–596, 10.1128/CMR.00064-16. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60. Morstein J., Capecchi A., Hinnah K., et al., “Medium-Chain Lipid Conjugation Facilitates Cell-Permeability and Bioactivity”, Journal of the American Chemical Society 144, no. 40 (2022): 18532–18544, 10.1021/jacs.2c07833. [DOI] [PubMed] [Google Scholar]
  • 61. Kurtzhals P., Havelund S., Jonassen I., et al., “Effect of Fatty Acids and Selected Drugs on the Albumin Binding of a Long-Acting, Acylated Insulin Analogue”, Journal of Pharmaceutical Sciences 86, no. 12 (1997): 1365–1368, 10.1021/js9701768. [DOI] [PubMed] [Google Scholar]
  • 62. Kim S., Chen J., Cheng T., et al., “PubChem 2023 Update”, Nucleic Acids Research 51, no. D1 (2023): D1373–D1380, 10.1093/nar/gkac956. [DOI] [PMC free article] [PubMed]
  • 63. Gause G. F., Brazhnikova M. G., “Gramicidin S and Its Use in the Treatment of Infected Wounds”, Nature 154, no. 3918 (1944): 703–703, 10.1038/154703a0. [DOI] [Google Scholar]
  • 64. Kondejewski L. H., Farmer S. W., Wishart D. S., et al., “Gramicidin S Is Active against Both Gram-Positive and Gram-Negative Bacteria”, International Journal of Peptide and Protein Research 47, no. 6 (1996): 460–466, 10.1111/j.1399-3011.1996.tb01096.x. [DOI] [PubMed] [Google Scholar]
  • 65. Knappe D., Piantavigna S., Hansen A., et al., “Oncocin (VDKPPYLPRPRPPRRIYNR-NH2): A Novel Antibacterial Peptide Optimized against Gram-Negative Human Pathogens”, Journal of Medicinal Chemistry 53, no. 14 (2010): 5240–5247, 10.1021/jm100378b. [DOI] [PubMed] [Google Scholar]
  • 66. Zhang H., Xia X., Han F., et al., “Cathelicidin-BF, a Novel Antimicrobial Peptide from Bungarus Fasciatus, Attenuates Disease in a Dextran Sulfate Sodium Model of Colitis”, Molecular Pharmaceutics 12, no. 5 (2015): 1648–1661, 10.1021/acs.molpharmaceut.5b00069. [DOI] [PubMed] [Google Scholar]
  • 67. Bokesch H. R., Pannell L. K., Cochran P. K., et al., “A Novel Anti-HIV Macrocyclic Peptide from Palicourea Condensata”, Journal of Natural Products 64, no. 2 (2001): 249–250, 10.1021/np000372l. [DOI] [PubMed]
  • 68. Probst D., Reymond J.-L., “Visualization of Very Large High-Dimensional Data Sets as Minimum Spanning Trees”, Journal of Cheminformatics 12, no. 1 (2020): 12, 10.1186/s13321-020-0416-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69. Brown N., McKay B., Gasteiger J., “The de Novo Design of Median Molecules within a Property Range of Interest”, Journal of Computer-Aided Molecular Design 18, no. 12 (2004): 761–771, 10.1007/s10822-004-6986-2. [DOI] [PubMed] [Google Scholar]
  • 70. van Deursen R., Reymond J.-L., “Chemical Space Travel”, ChemMedChem: Chemistry Enabling Drug Discovery 2, no. 5 (2007): 636–640, 10.1002/cmdc.200700021. [DOI] [PubMed] [Google Scholar]
  • 71. Chatterjee J., Rechenmacher F., Kessler H., “N-Methylation of Peptides and Proteins: An Important Element for Modulating Biological Functions”, Angewandte Chemie (International Edition in English) 52, no. 1 (2013): 254–269, 10.1002/anie.201205674. [DOI] [PubMed] [Google Scholar]
  • 72. Corbett K. M., Ford L., Warren D. B., et al., “Cyclosporin Structure and Permeability: From A to Z and Beyond”, Journal of Medicinal Chemistry 64, no. 18 (2021): 13131–13151, 10.1021/acs.jmedchem.1c00580. [DOI] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

As a service to our authors and readers, this journal provides supporting information supplied by the authors. Such materials are peer reviewed and may be re‐organized for online delivery, but are not copy‐edited or typeset. Technical support issues arising from supporting information (other than missing files) should be addressed to the authors.

Supporting Information

Data Availability Statement

The code used for the analysis and plots in this study is available at https://github.com/reymond-group/10E60. The raw result files can be retrieved at https://zenodo.org/records/11396287.


Articles from Molecular Informatics are provided here courtesy of Wiley
