RSC Medicinal Chemistry. 2021 Dec 24;13(3):300–310. doi: 10.1039/d1md00363a

Comprehensive analysis of commercial fragment libraries

Julia Revillo Imbernon 1, Célien Jacquemard 1, Guillaume Bret 1, Gilles Marcou 2, Esther Kellenberger 1
PMCID: PMC8942207  PMID: 35434627

Abstract

Screening of fragment libraries is a valuable approach to the drug discovery process. The quality of the library is one of the keys to success, and more particularly the design or choice of a library has to meet the specificities of the research program. In this study, we made an inventory of the commercial fragment libraries and we established a methodology which allows any library to be positioned in relation to the complete offer currently on the market, by addressing the following questions: does this chemical library look like another chemical library? What is the coverage of the current chemical space by this chemical library? What are the characteristic structural features of the fragments of this chemical library? We based our analysis on 2D and 3D chemical descriptors, framework class generation and the generative topographic map. We identified 59 270 scaffolds, which can be searched in a dedicated web site (https://gtmfrag.drugdesign.unistra.fr) and developed a model which accounts for fragment diversity while being easy to interpret (download at 10.5281/zenodo.5534434).


Explore the chemical space of libraries marketed for fragment-based drug discovery.

Introduction

For the last 25 years, fragment-based drug discovery (FBDD) has widely increased in popularity as an alternative or complement to high throughput screening (HTS).1,2 The approach has facilitated the success of many medicinal chemistry programs, with five fragment drugs approved by the FDA and many molecules having entered clinical trials.3

FBDD is an interdisciplinary field organized around library design, fragment screening, computational methods and optimization.4 These topics are strongly interconnected. For example, the biophysical techniques suitable for the study of low affinity binders, such as X-ray crystallography or isothermal titration calorimetry, have a low throughput, and therefore are limited to testing small-sized libraries. The screening method can also impose a bias on the chemical structure. For example, nuclear magnetic resonance (NMR), a pioneering method in FBDD,5 can exploit fluorine magnetic properties for the detection of fluorinated ligand binding.6 More generally, physico-chemical properties are decisive for the selection of fragments depending on the screening objective. An illustrative example is provided by X-ray crystallography, which allows the mapping of hot spots in proteins using molecules with ultra-low molecular weight.7

Fragment definition

A widely accepted definition is that fragments are compounds following the rule of three (RO3),8 analogous to Lipinski's rule of five (RO5):9 MW < 300 Da, A log P ≤ 3, number of hydrogen bond acceptors ≤3 and number of hydrogen bond donors ≤3. Additional criteria have also been proposed such as the number of rotatable bonds being smaller than 3 and a polar surface area smaller than 60 Å². The MW limit has also fluctuated, tending towards smaller molecules, which generally present a lower hydrophobicity.10,11 More complex rules have been developed to quantify the desirability of each compound, such as the quantitative estimate of drug-likeness (QED), which ranges from 0 to 1.12 However, QED is designed for full-sized drug-like molecules: as a consequence, QED values are biased when applied to fragment-like compounds. In this work, we expect this bias to be the same across all investigated situations.

Recently, Konteatis has proposed eight guidelines for the selection of fragments, stressing that priority should be set on the enthalpy-driven binding interactions, followed by the molecular shape diversity of the target-complementary fragments.13 Indeed, Keserü et al. have theoretically and experimentally confirmed the importance of enthalpic contribution to prioritize fragments forming key interactions.14 The success of fragment-to-lead optimization depends on the target-anchoring capacity of fragments, which in particular determines if the binding mode is conserved during fragment growth. Analysis of fragment binding modes in the Protein Data Bank suggested that fragments that are able to form polar interactions and have a molecular weight larger than 150 Da should be prioritized.15

Fragment libraries

Suppliers of chemical libraries offer a large panel of screening collections of fragments. The content of fragment libraries varies depending on their purpose. A fragment library can be tailored to a precise screening method, such as NMR.6 Fragment screening can not only deliver hits, but is also suitable for protein binding site detection, druggability assessment or study of versatile binding modes.16 Library design in general focuses on a specific need, such as the ability of the fragment to form a covalent bond, which imposes a particular functional group, or the ability of the library to offer a wide diversity of chemical structures.

The size of chemical libraries is variable, with a number of fragments ranging from a few hundred to over a hundred thousand. From the early days of FBDD, Hann et al. noticed the relationship between molecular complexity and the likelihood of success in hit discovery,17 suggesting that screening a thousand fragments covers the chemical space in a more effective way than a million HTS compounds. More recently, Shi and von Itzstein engaged in a theoretical exercise aiming to determine how many fragments are sufficient to represent the 227 787 commercial fragments of the ZINC database.18 Their comparison of different sized subsets, using a measure of diversity taking into account both the number of unique structural fingerprints and their relative abundance in the subset, revealed that a subset of approximately 2000 fragments can have the same level of diversity as the full set of fragments.

Three dimensionality

For the past ten years, three-dimensionality has been a major concern in medicinal chemistry to produce molecules that improve both the originality of drug candidates and their selectivity. In the seminal article by Lovering et al., the analysis of a huge number of compounds sampled at different stages of the drug development process suggested that complex molecules are more likely to become drug candidates than simpler molecules. This parameter was evaluated by the fraction of sp3 hybridized carbons and the number of chiral centers.19 The interest in non-planar scaffolds has given rise to numerous developments in synthetic methods.20–24 In FBDD programs, planar fragments are often avoided, since they tend to bind multiple targets, and therefore are potentially more difficult to develop into a selective drug-like compound.16 Nonetheless, the successful ligand design from flat fragments has fueled the debate, demonstrating that the flatness of the fragments does not necessarily transfer to the lead.25

In this study, we propose a comprehensive analysis of commercial fragment libraries available in 2021, with emphasis set on diversity and three-dimensionality. This analysis is designed so that it can be updated and made available on a yearly basis, to follow the evolutions of the content and availability of commercial fragment libraries.

Materials and methods

Collection of fragment libraries and classification

A total of 86 different fragment libraries from fourteen vendors were collected between February 22nd and 24th, 2021. These libraries were categorized by composition, as annotated by the supplier, into ten main categories (Table 1): general (GEN), diverse (DIV), natural-product like (NP), 3D-shaped (3D), sp3 hybridized (SP3), metal chelating (CHE), protein–protein interactions (PPI), fluorine-rich (FLU), halogen-rich (HAL), and covalent binders (COV). The chemical libraries that did not fall into any of these categories were called miscellaneous (MIS).

Table 1. Summary of library classification. Values are the number of libraries in each size category.

Type  Description                  Small-sized  Middle-sized  Large-sized
GEN   General                      2            3             9
DIV   Diverse                      7            1             0
NP    Natural product-like         2            2             0
3D    3D-shaped                    3            1             0
SP3   sp3 hybridized               1            2             2
CHE   Metal chelators              3            0             0
PPI   Protein–protein interacting  1            3             0
FLU   Fluorine enriched            7            3             1
HAL   Halogen enriched             3            4             0
COV   Reactive covalent linking    4            5             0
MIS   Miscellaneous                8            7             2

Standardization of fragments and calculation of molecular properties

Pipeline Pilot (v.19.1.0.1964; BIOVIA) was used to generate the canonical SMILES of each molecule from the structure files provided by the suppliers, then to remove counter ions of salts. Filter (v.2.5.1.4; Openeye Scientific Software, Santa Fe) was used to filter out simple ions and invalid chemical structures (for example containing a pentavalent carbon atom) and to ionize the molecules at pH = 7.4. The pyridone, pyridine and sulfonamide descriptions were corrected, and fragments were aromatized, using a basic aromatic scheme, with the Standardizer (v16.10.17.0; ChemAxon Ltd.) software.

Pipeline Pilot was used to calculate 2D descriptors: number of hydrogen bond acceptors (HBA) and donors (HBD), as defined by Lipinski's Rule; A log P, computed with the Ghose/Crippen group-contribution estimate for Log P;26 polar surface area (PSA); number of rotatable bonds (Nrot); quantitative estimate of drug-likeness (QED).12 3D descriptors were calculated by using Openeye libraries (v.2019.10.2) from a low energy conformer generated for each molecule by the Corina (v.3.40; Molecular Networks GmbH, Nürnberg, Germany) software with its default settings: solvent-accessible surface area (SASA), polar surface area (3D-PSA), plane of best fit (PBF).27

Generation of fragment classes

The fragments from all libraries except those built with a chemical bias (i.e. COV, PPI, HAL, FLU and the MIS constructed around a specific functional group) were merged into a single file. Duplicates, as identified from identical canonical SMILES, were removed using Pipeline Pilot (BIOVIA, Dassault Systèmes, Pipeline Pilot, release 2019, San Diego: Dassault Systèmes).
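The merging and duplicate-removal step can be illustrated outside Pipeline Pilot. The following is a minimal sketch, assuming every structure has already been converted to a canonical SMILES by the same toolkit (string comparison is only meaningful under that assumption). The function name, library names and SMILES strings are hypothetical and are not taken from the dataset.

```python
def merge_libraries(libraries):
    """Merge libraries and keep one entry per canonical SMILES.

    `libraries` maps a library name to a list of canonical SMILES strings.
    Returns a dict mapping each unique SMILES to the sorted names of the
    libraries that contain it.
    """
    sources = {}
    for name, smiles_list in libraries.items():
        for smiles in smiles_list:
            sources.setdefault(smiles, set()).add(name)
    return {smiles: sorted(names) for smiles, names in sources.items()}


# Made-up input, for illustration only.
libraries = {
    "GEN_1": ["Oc1ccccc1", "c1ccncc1"],
    "DIV_1": ["c1ccncc1", "OC(=O)c1ccccc1"],
}
unique = merge_libraries(libraries)
print(len(unique), "unique structures")
```

Keeping the list of source libraries for each structure, rather than discarding it, makes it possible to trace a unique fragment back to every supplier that sells it.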

Fragments were imported into ADMET Predictor 2019 (ADMET Predictor™, Simulations Plus, Inc., Lancaster, California, USA), neutralized, standardized, and classified using the option “Frameworks”. The method is based on ring systems and connecting chains, similar to the so-called “Murcko assemblies” described by Bemis and Murcko.28 It yields only one scaffold per molecule. Each fragment belongs to only one class, corresponding to its major ring system, and each class is defined by a specific scaffold.

Generative topographic maps (GTM)

The training set was composed of 6017 fragments randomly picked in the small databases. Molecular descriptors were generated using ISIDA/Fragmentor (v.2019, Faculté de Chimie, Université de Strasbourg, France),29 with atom centered fragments based on sequences of atoms and bonds, a length ranging from 2 to 4 and tagging the carbon atom if part of a ring and/or of an aromatic system, IIAB(2–4)_cycle. Those descriptors present in less than 5% of the dataset have been discarded.
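The 5% occurrence filter applied to the descriptors can be sketched as follows. This is an illustration only, assuming each molecule is described by a dictionary of descriptor counts; the function name and the toy descriptor labels are hypothetical and do not reproduce the ISIDA/Fragmentor output format.

```python
from collections import Counter


def keep_frequent_descriptors(molecules, min_fraction=0.05):
    """Return the descriptors present in at least `min_fraction` of molecules.

    `molecules` is a list of dicts mapping a descriptor label to its count
    in one molecule. A descriptor is "present" when its count is positive.
    """
    presence = Counter()
    for counts in molecules:
        presence.update(label for label, value in counts.items() if value > 0)
    n = len(molecules)
    return sorted(label for label, hits in presence.items() if hits / n >= min_fraction)


# Made-up data: 40 molecules, one descriptor seen everywhere, one seen once.
molecules = [{"C-C": 2} for _ in range(39)] + [{"C-C": 1, "C-Se": 1}]
kept = keep_frequent_descriptors(molecules)
print(kept)
```

In this toy set the second descriptor occurs in 1 of 40 molecules (2.5%) and is therefore discarded.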

The manifold generation and projections were carried out with the GTMTool (v.2018, Faculté de Chimie, Université de Strasbourg, France). A pre-processing procedure, to center the dataset, was used and the total number of traits was set to 301. The best model was chosen based on the likelihood. The GTMClass tool was used to generate property landscapes specific to supplier categories (Table 1). Landscape depiction was performed using the GTMLandscape software.

Dataset availability

Datasets are available on Zenodo (doi: 10.5281/zenodo.5534434). The deposit contains the chemical structures, annotations, computed molecular descriptors, scaffolds, and models.

A web interface was designed to search the scaffolds. It is freely available at: https://gtmfrag.drugdesign.unistra.fr. Three ChemAxon tools (LTS-helium, 21.4.0; https://www.chemaxon.com) were used: Molconvert to standardize structures and convert between file types, JChem Base to store structures in a PostgreSQL instance and to query the database, and Marvin.js to draw structures.

Results and discussion

Overview of commercially available fragments

Our survey identified 86 fragment libraries, provided by fourteen chemical suppliers, for a total of 754 646 molecules with 512 284 unique chemical 2D-structures.

These libraries have been designed with various specifications, for purposes which are generally well described, thereby allowing us to classify most of the libraries by type (Table 1). Firstly, twenty-seven libraries show chemical bias: enrichment in fluorine for screening by 19F-NMR (FLU type, eleven libraries), enrichment in other halogen elements for harnessing halogen bonding (HAL type, seven libraries),30 and the presence of a reactive group for covalent linking to the target (COV type, nine libraries). For eleven other chemical libraries, the construction rules consider the whole chemical structure of the fragment to focus on particular chemotypes: natural product or natural product-like fragments (NP type, four libraries), compounds targeting protein–protein interactions (PPI type, four libraries), or potential metal-chelating compounds (CHE type, three libraries). There are also eight libraries which aim at maximizing the coverage of the chemical space by optimizing molecular diversity within the set of fragments (DIV type). Avoiding an overrepresentation of flat fragments was the determining factor in the design of nine libraries, by favoring fragments with a non-flat 3D-shape (3D type, four libraries) or fragments with a high content of sp3-hybridized atoms (SP3 type, five libraries). The other libraries either each have their own special properties and thus were qualified as miscellaneous (MIS type, seventeen libraries, e.g. a library containing fragments which are well soluble in both dimethyl sulfoxide and phosphate-buffered saline), or do not present specific characteristics and thus were qualified as general (GEN type, fourteen libraries).

Libraries were also classified according to their number of fragments into the three following categories: small (≤2000 fragments, 41 libraries), intermediate (2000–10 000 fragments, 31 libraries) and large (>10 000 fragments, 14 libraries). Only small-sized libraries are suitable for testing by biophysical approaches, such as NMR. Intermediate ones are accessible for medium throughput screening techniques. The large libraries offer a wide choice for the custom design of small chemical libraries. The small-sized libraries taken as a whole contain 33 023 different fragments. This subset covers 6.4% of all commercial fragments. By comparison, the intermediate libraries taken as a whole contain 94 471 different fragments, covering 18.4% of all commercial fragments.
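The size thresholds and the coverage percentages quoted above can be checked with a short calculation. This is a minimal sketch; the function name is hypothetical, and the counts are the ones given in the text.

```python
def size_category(n_fragments):
    """Assign a library to a size category using the thresholds in the text."""
    if n_fragments <= 2000:
        return "small"
    if n_fragments <= 10000:
        return "intermediate"
    return "large"


TOTAL_UNIQUE = 512284          # unique 2D structures in all libraries
SMALL_UNIQUE = 33023           # unique fragments in small-sized libraries
INTERMEDIATE_UNIQUE = 94471    # unique fragments in intermediate libraries

small_coverage = 100 * SMALL_UNIQUE / TOTAL_UNIQUE
intermediate_coverage = 100 * INTERMEDIATE_UNIQUE / TOTAL_UNIQUE
print(f"{small_coverage:.1f}% {intermediate_coverage:.1f}%")
```

The two ratios round to 6.4% and 18.4%, in agreement with the values reported in the paragraph.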

Commercially available fragments conform with RO3

In 2003, Congreve et al. defined the “Rule of 3” (RO3) as follows: MW ≤ 300 Da, A log P ≤ 3, HBA ≤ 3, and HBD ≤ 3.8 As shown in Fig. 1A, fewer than 23% of the fragments strictly fulfill the four conditions and 74% break one rule, mainly that about HBA (60% of the cases). Most of the fragments have a MW in the 200–300 Da range (Fig. 1B). Only 13.43% of the fragments have a MW lower than 200 Da. These small fragments, which generally have a smaller number of atoms and a lower molecular complexity, have an increased chance of binding to a protein.17 Moreover, as low affinity binders with low specificity, these small fragments constitute chemical probes useful for the detection of binding sites and hotspots as well as druggability studies.16 Several suppliers have developed libraries of low MW fragments, sometimes called the Mini-Frag libraries. Notably, the proportion of small fragments has surprisingly decreased over the last two years (24.29% in 2019).
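Counting RO3 violations per fragment, as summarized in Fig. 1A, is straightforward once the four descriptors are available. The following is a minimal sketch using the thresholds stated above; the `Fragment` record and the example values are hypothetical and are not taken from the dataset.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical record holding precomputed descriptors."""
    mw: float     # molecular weight in Da
    alogp: float  # A log P
    hba: int      # number of hydrogen bond acceptors
    hbd: int      # number of hydrogen bond donors


def ro3_violations(f: Fragment) -> int:
    """Count how many of the four RO3 conditions are broken."""
    broken = [f.mw > 300, f.alogp > 3, f.hba > 3, f.hbd > 3]
    return sum(broken)


# Made-up example values, for illustration only.
fragments = [
    Fragment(mw=180.2, alogp=1.1, hba=2, hbd=1),  # compliant
    Fragment(mw=265.3, alogp=2.4, hba=5, hbd=1),  # breaks the HBA rule only
    Fragment(mw=310.4, alogp=3.6, hba=4, hbd=0),  # breaks three rules
]
histogram = Counter(ro3_violations(f) for f in fragments)
print(sorted(histogram.items()))
```

Dividing each count in `histogram` by the number of fragments gives the percentages of compliant fragments and of fragments breaking one to four rules.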

Fig. 1. Physico-chemical properties of 512 284 commercial fragments. (A) Percentage of fragments which comply with the RO3 and which violate one to four of the rules. (B) Heatmap showing the distribution of MW and A log P for the entire fragment set. (C) Heatmap showing the distribution of MW and A log P for the subset of fragments in the small-sized libraries. (D) Logarithmic scatter plot representing the distribution of mean QED depending on the number of molecules and the type of library. The black dashed lines indicate the limits between small-sized, intermediate-sized and large-sized libraries.


Together with MW, lipophilicity is an important parameter to prioritize compounds in a medicinal chemistry program.31,32 Considering log P, again most of the fragments fall in a narrow range (0–2), with a high proportion of weakly hydrophobic molecules (46% with A log P < 1, Fig. 1B). Considering only the small-sized libraries (Fig. 1C), MW and log P distributions are more spread out, with a shift of the maximum of the MW distribution to a lower value, as compared to the full set of fragments.

The drug-likeness was further evaluated using QED. In their publication presenting the QED, Bickerton et al. evaluated that QEDs of attractive starting compounds for hit optimization are on average 0.67 (standard deviation = 0.16).12 They observed that “too simple” or “too complex” unattractive compounds overall show lower QED. The QED mean values computed here for each fragment library are represented in Fig. 1D. All libraries but Mini-Frag show a mean QED exceeding 0.6, i.e. close to or exceeding the threshold proposed for drug-like compounds, although MW, which is taken into account in the QED, is lower in fragments. QED mean values are not related to the library type.

Diversity of fragments

In order to assess the chemical diversity of the entire fragment collection, we used two different and complementary approaches: the classification of the fragments by common chemical scaffolds and the mapping of the chemical space covered by the fragments using GTM. These analyses have taken into account the libraries of the following types: GEN, DIV, NP, CHE, SP3, 3D and MIS (list in Table S1). PPI libraries were removed because their MW is significantly higher than the MW given by the fragment definition used for this study. FLU and HAL libraries were discarded because they would have introduced a bias in the representation of the chemical space. COV libraries were removed because the recurrent substructures present on reactive covalent linking group fragments would also have caused a bias in the analysis of the chemical space. The diversity analyses were hence performed on an ensemble of 433 433 unique fragments from fifty libraries provided by fourteen suppliers (Table S2). The classification of all fragments by the common substructure resulted in the inventory of 59 270 scaffolds. Not surprisingly, the number of scaffolds represented in a chemical library depends on the total number of fragments in the library (Fig. 2A and B). The ratio between the number of scaffolds and the number of molecules however varies, ranging from 0.13 to 0.77. Interestingly, small-sized DIV libraries cover this range (Fig. 2B), suggesting that some commercial collections of DIV fragments are not obtained by maximizing the number of different scaffolds. More generally, the type of library does not allow the number of scaffolds it contains to be presumed.

Fig. 2. Diversity of fragment scaffolds. (A) Scatter plot showing the number of scaffolds in a library as a function of the number of fragments in the library. The dot size indicates the proportion of singletons, i.e. scaffolds present in a single copy in the full set of fragments. The blue dotted line depicts x = y; the red and green dotted lines respectively depict the maximum and minimum ratio between the number of scaffolds and the number of molecules. (B) Zoom on the portion of the scatter plot (A) corresponding to small-sized libraries. (C) Bar plot showing the number of molecules that contain a scaffold as a function of the resulting rank of the scaffold, for the hundred most frequent scaffolds in the full set of fragments. The structures of the ten most common scaffolds are shown in the insert. (D) Bar plot showing the number of molecules that contain a scaffold as a function of the resulting rank of the scaffold, for the hundred most frequent scaffolds in small-sized libraries. The structures of the ten most common scaffolds are shown in the insert. (E) Selection of complex scaffolds in small-sized libraries. (F) Selection of heteroatom-rich scaffolds in small-sized libraries. In (E and F), the scaffold identifier is shown above the structure and the number of molecules containing the scaffold is indicated below the structure.


If we consider the 59 270 scaffolds as a whole, their representation among the complete set of 433 433 fragments is highly uneven. About 0.1% of the scaffolds are present in more than a thousand fragments, hence covering 25% of the entire set (Fig. 2C). These scaffolds are mainly simple and aromatic, such as benzene and nitrogen heterocycles. The fifty most frequent scaffolds are shown in Table S3 (left panel). At the other end of the distribution of scaffolds by frequency, there are 36 555 scaffolds defined by a single fragment and here called singletons.

Interestingly, the proportion of singletons may be related to the library specificity (Fig. 2B and Table S2). For example, small and diverse-type libraries, which are designed to maximally cover the chemical space, have a low proportion of singleton-scaffolds. By contrast, sp3-type libraries, which are focused on a more specific chemical space, present a higher proportion of singletons.
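The scaffold statistics discussed in this section (number of scaffolds, scaffold-to-molecule ratio, proportion of singletons) can be derived from a simple fragment-to-scaffold assignment. The sketch below is illustrative only: it assumes that each fragment has already been assigned exactly one scaffold, it counts singletons within the set it is given (whereas Fig. 2A defines them relative to the full set of fragments), and the identifiers are hypothetical.

```python
from collections import Counter


def scaffold_statistics(scaffold_of):
    """Summarize scaffold diversity.

    `scaffold_of` maps a fragment identifier to its scaffold identifier.
    """
    frequency = Counter(scaffold_of.values())
    n_molecules = len(scaffold_of)
    n_scaffolds = len(frequency)
    n_singletons = sum(1 for count in frequency.values() if count == 1)
    return {
        "scaffolds": n_scaffolds,
        "ratio": n_scaffolds / n_molecules,
        "singleton_fraction": n_singletons / n_scaffolds,
        "most_common": frequency.most_common(1)[0],
    }


# Made-up assignment, for illustration only.
scaffold_of = {
    "frag1": "benzene",
    "frag2": "benzene",
    "frag3": "benzene",
    "frag4": "pyridine",
    "frag5": "indole",
}
stats = scaffold_statistics(scaffold_of)
print(stats)
```

A ratio close to 1 means that almost every fragment brings a new scaffold, while a low ratio indicates many analogues built around a few scaffolds.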

Focusing on small-sized libraries, about 6.4 percent of the 59 270 scaffolds are represented among the set of 20 708 unique fragments. The distribution of scaffolds by frequency within this smaller set resembles that observed within the complete set of 433 433 fragments (compare Fig. 2C and D). The most frequent scaffold remains benzene, the frequency of which is even increased (present in 12% of molecules, against 5% in the complete set). The structures and frequencies of the fifty most frequent scaffolds are shown in Table S3 (right panel). The diversity has been obtained through the singletons, which represent 62% of the scaffolds.

Less frequent scaffolds generally have a higher complexity. Their number of non-hydrogen atoms, number of rings, and PBF are on average higher than those for the most common scaffolds (Table 2). This difference in complexity between popular and rarer scaffolds is maintained among the scaffolds present in small-sized libraries: less frequent scaffolds again have on average a higher number of non-hydrogen atoms, a higher number of rings and a higher PBF. The comparison also reveals that the average complexity of small-sized library scaffolds is reduced (Table 2). The majority of scaffolds present in small-sized libraries contain two (63.9%) or three rings (14.1%). In many scaffolds with two or three rings, the rings are connected by a short acyclic linker. Only a hundred complex scaffolds contain more than three rings, which are either fused or bridged forming a compact, 3D or linear geometry (see examples in Fig. 2E), or which are separated by a central aliphatic linker of length 1 to 4. A unique scaffold shows a central ring acting as a platform to which three other rings are attached via exocyclic single bonds, exploring three directions of space (C 43146 in Fig. 2E). The same scaffold is also remarkable because it contains six nitrogen atoms and one oxygen atom. The number of nitrogen, oxygen and sulfur atoms also gives an insight into the variety of small-sized library scaffolds. Nitrogen is by far the most common heteroatom (1–6 N in 93.4% of scaffolds), ahead of oxygen (1–3 O in 30.7% of scaffolds) and sulfur (1–2 S in 21.9% of scaffolds). Fig. 2F shows examples of scaffolds containing more than one nitrogen, oxygen or sulfur atom, or containing all three heteroatoms.

Table 2. Comparison of scaffold properties and frequency. Ring columns give the proportion (%) of scaffolds with the indicated number of rings.

Scaffold set           Frequency     Number of scaffolds  Non-hydrogen atoms (median)  1 ring  2 rings  3 rings  >3 rings  PBF (mean ± SD)
All libraries          Singletons    36 555               17                           0.2     32.6     55.4     11.9      0.62 ± 0.17
All libraries          Intermediate  22 669               15                           0.7     60       36.4     2.9       0.56 ± 0.17
All libraries          Popular (a)   46                   9                            23.9    76.1     0        0         0.46 ± 0.17
Small-sized libraries  Singletons    2356                 13                           1.4     72.1     22.2     4.3       0.49 ± 0.20
Small-sized libraries  Intermediate  1426                 11                           5.3     85.1     9        0.6       0.43 ± 0.20
Small-sized libraries  Popular (a)   46                   9                            34.8    63       0        2.2       0.30 ± 0.17

(a) Scaffolds appearing at least 1000 times in all libraries or 66 times in small-sized libraries. These thresholds have been set as the top 1/5 of the highest frequency value (most common scaffolds have been previously discarded: benzene for all libraries and benzene and pyridine for small-sized libraries).

Fragment space mapping

To further study the chemical diversity covered by commercial fragments, we developed a generative topographic map (GTM) model able to represent the chemical space in a landscape.29 This method has the advantage of allowing a simple visual analysis. At a glance, the overall comparison of two maps gives the degree of resemblance between two sets of fragments. Local inspection of a map reveals information at a finer scale, such as chemical substructures which are well represented in a library. The GTM model was used to evaluate each library against all the fragments and to compare the libraries with each other.

Fig. 3 shows how the constructed GTM model has distributed the fragments on the chemical landscape. Detailed visual analysis of the GTM map validated the consistency between the spatial proximity of the projection points of the fragments and the similarity of the chemical structure of these fragments. Thus, it is possible to associate particular chemical characteristics with the different areas of the map, as illustrated in Fig. 3A with an area rich in unsaturated molecules (R1), an area populated by derivatives of benzimidazole (R2) and an area where all fragments contain fluorine (R3). Within an area, the fragments are distributed in space depending on complexity (number of rings, presence of heteroatoms, number of substituents) and the type of functional group. For example, as shown in Fig. 3B, in the lower part of R3, a path going from right to left will successively meet fragments containing a trifluoromethyl group and a phenyl group, then those containing a trifluoromethyl group and a nitrogenous heterocycle, and finally those containing only a single fluorine atom and a nitrogenous heterocycle.

Fig. 3. Spatial distribution of the fragments in the GTM. (A) Visualization of the composition of the GTM model. Each point in the map corresponds to a single compound. The regions discussed in the text are delimited by dashed lines. The green, blue and orange dots spot selected examples in these map regions. (B) Structure of the example molecules. Note that the color code is consistent between panels A and B. R8a, R8b, and R8c, which are grouped in the text for the sake of simplicity, refer to three distinct chemical regions. (C) GTM landscape of the full dataset – 433 433 fragments. The likelihood represents the probability distribution of data in the chemical space. The annotations represent specific regions to help the reader locate them during discussion.


In the GTM landscapes, likelihood is represented through a color gradient and depicts the probability distribution of the data. Fig. 3C shows that the density of the entire dataset covers the entire map, yet the distribution is clearly heterogeneous. Purple regions correspond to the most populated ones. The dense south-west area (R4) collects a broad set of pyridine derivatives connected to nitrogen-containing substituents. As the populated south-east region (R5) is rich in fragments containing a piperazine connected to a pyridine, both regions could be related. The center-south region (R6), enclosing the benzoic acid derivatives, appears to be the least occupied one.

Coverage of the fragment landscape by libraries

Fig. 4A–D show the GTM landscape of the different fragment subsets as defined by the well-characterized library types DIV, NP, CHE, SP3 and 3D. Fragments in SP3 and 3D libraries, which both focus on three-dimensionality of molecules, were merged into a single set. The projections of the subsets on the GTM show different trends, most of which are expected from the purpose of the libraries that compose them. The DIV subset is representative of the whole chemical space as it contains almost all the chemical clusters of the GTM model applied to the full fragment dataset (Fig. 4A). Nevertheless, the area containing the unsaturated cyclic and linear fragments (R1 in Fig. 3A and R1 in Fig. 4A) is less populated in its left part, where the more complex molecules are located. Also, immediately south of this region (R7), an elongated area grouping molecules rich in piperidine amides is almost empty. The SP3-3D landscape also represents the whole data distribution well (Fig. 4B), yet as expected, the density is concentrated in the upper area, where the unsaturated compounds are located (R1). The NP landscape (Fig. 4C) also has a high density among the unsaturated compounds (R1 in Fig. 3A and R1 in Fig. 4C), including the area where the most complex ones are found, yet misses other regions of the whole fragment landscape. In particular, the regions with a benzene linked to rings containing multiple nitrogen atoms are almost empty (R8a, R8b and R8c). For example, the piperazine area is not populated. The CHE subset covers the whole fragment map better than the NP subset (Fig. 4D). In addition, its projection on the GTM fills the center-south area (R6), which is sparsely populated in Fig. 3C. Hence, metal chelator libraries contain specific fragments that are not common in the general dataset.

Fig. 4. GTM landscape of small-sized fragment subsets. (A) DIV libraries – 7454 fragments. (B) SP3 and 3D libraries – 5236 fragments. (C) NP libraries – 431 fragments. (D) CHE libraries – 3758 fragments. (E) DIV library 3 – 1226 fragments. (F) DIV library 6 – 1920 fragments. (G) DIV library 1 – 320 fragments. (H) DIV library 7 – 2000 fragments. The likelihood represents the probability distribution of data in the chemical space. The annotations represent specific regions to help the reader locate them during discussion.


Although the DIV libraries taken together represent the whole fragment space well, they are not all equivalent (Fig. 4E–H). Comparison of DIV libraries 3 and 6 gives a good illustration (Fig. 4E and F respectively). The two sets, respectively made of 1226 and 1920 fragments, occupy the space in a very different way, although both contain fragments representing every populated cluster present in the full dataset (Fig. 3C). DIV library 3 has a homogeneous coverage of the chemical space, while DIV library 6 is more representative of the density. The comparison between DIV library 1 and DIV library 7 (Fig. 4G – 320 fragments and Fig. 4H – 2000 fragments, respectively) illustrates that small-sized libraries can correctly cover the whole chemical space. Most clusters are covered by both libraries. However, DIV library 1 has a stronger presence in the north part than DIV library 7, despite being 6 times smaller.

Fragment three-dimensionality

The 3D character of fragments was evaluated using the plane of best fit (PBF) method, which fits a plane on the 3D structure of the molecule and reports the average distance of the atomic coordinates from that plane, the PBF score.27 If the molecule lies on the plane, i.e. is flat, PBF equals 0. Otherwise, PBF increases as a function of the scaffold geometry and the position and size of substituents. The distribution of PBF among the complete dataset of fragments is bimodal with a major peak at 0.62 ± 0.20 and a minor peak at 0.01 ± 0.03, allowing the distinction of flat fragments (7.8% of fragments with PBF < 0.1 Å) and fragments with non-planar geometry (92.2% of fragments with PBF ≥ 0.1 Å). A large part of flat fragments has no rotatable bonds (Fig. 5A). The proportion of this type of fragment is greater in the subset of small-sized libraries than in the whole dataset (21.9% against 8.6%), suggesting that currently, planarity is not a strong filter in the design of many fragment libraries (Fig. 5A and B). Accordingly, the percentage of flat fragments in small-sized libraries has risen significantly since 2019 (16%).
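The PBF definition given above (mean distance of the atoms from their least-squares plane) can be written down in a few lines. The following is a dependency-free sketch and not the implementation used in this study: the plane normal is obtained here by a shifted power iteration on the 3×3 scatter matrix, a technique chosen only to avoid external libraries, and the function name and test coordinates are hypothetical.

```python
import math


def plane_of_best_fit(coords, iterations=500):
    """Mean distance of points from their least-squares plane.

    `coords` is a list of (x, y, z) tuples; the result has the same unit.
    """
    n = len(coords)
    centroid = [sum(c[k] for c in coords) / n for k in range(3)]
    centered = [[c[k] - centroid[k] for k in range(3)] for c in coords]
    # 3x3 scatter matrix of the centered coordinates.
    scatter = [[sum(p[i] * p[j] for p in centered) for j in range(3)]
               for i in range(3)]
    trace = scatter[0][0] + scatter[1][1] + scatter[2][2]
    # The plane normal is the eigenvector with the smallest eigenvalue.
    # Power iteration on (trace * I - scatter) converges to it, because the
    # shift reverses the order of the eigenvalues.
    shifted = [[(trace if i == j else 0.0) - scatter[i][j] for j in range(3)]
               for i in range(3)]
    best_score, best_normal = None, None
    for axis in range(3):  # three starts, so at least one overlaps the normal
        v = [1.0 if k == axis else 0.0 for k in range(3)]
        for _ in range(iterations):
            w = [sum(shifted[i][k] * v[k] for k in range(3)) for i in range(3)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # all points coincide
                break
            v = [x / norm for x in w]
        # Sum of squared distances to the plane defined by this normal.
        score = sum(sum(p[k] * v[k] for k in range(3)) ** 2 for p in centered)
        if best_score is None or score < best_score:
            best_score, best_normal = score, v
    return sum(abs(sum(p[k] * best_normal[k] for k in range(3)))
               for p in centered) / n


# Made-up coordinates: a flat square and a puckered four-atom arrangement.
flat = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]
puckered = [(1, 0, 0.5), (-1, 0, 0.5), (0, 1, -0.5), (0, -1, -0.5)]
print(plane_of_best_fit(flat) < 0.1, plane_of_best_fit(puckered) < 0.1)
```

With the 0.1 Å threshold used in the text, the first arrangement is classified as flat and the second as non-planar. For real work, a singular value decomposition from a numerical library would be the more usual way to obtain the plane normal.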

Fig. 5. (A) Heat map showing the distribution of the number of rotatable bonds and PBF for the entire fragment set. (B) Heat map of Nrot vs. PBF for the subset of fragments in the small-sized libraries. (C) GTM landscape of fragments with PBF < 0.1. (D) GTM landscape of fragments with PBF ≥ 0.1. (E) GTM landscape of 3D libraries. (F) GTM landscape of SP3 libraries. The likelihood represents the probability distribution of data in the chemical space. The annotations represent specific regions to help the reader locate them during discussion.

Considering each library individually, the annotations by type and the three-dimensionality of fragments are consistent (Table S2). As expected, the 3D and SP3 libraries show the highest average PBF (≥0.6 Å). The NP libraries also show a high average PBF, in agreement with the known complexity of many natural products.33 On the other hand, the MiniFrag libraries, which are poorly diverse overall (MIS libraries 1 and 2 in Table S2), are mostly composed of flat fragments (mean PBF ≤ 0.17 Å). The mean PBF values of the DIV and CHE libraries are smaller than the global average. Indeed, the percentage of flat fragments in these two library types is 40.8% and 31.7%, respectively.

To further study the three-dimensionality of fragments, we examined the GTM landscape of flat fragments (Fig. 5C) and of non-planar fragments (Fig. 5D). Interestingly, flat fragments are evenly distributed over nearly the entire fragment space. The same observation is made for non-planar fragments, suggesting that flat fragments can have a non-planar counterpart. Nevertheless, flat fragments show a smaller density in the south-east part (R9), corresponding to the trifluoromethyl derivatives, and in the main center of the map, where the simplest non-aromatic compounds are located (R10). Flat fragments also appear to be absent from regions rich in amide groups, such as the center-north area (R11).

The 3D and SP3 libraries, both of which are designed with three-dimensionality in mind, nevertheless occupy different spaces. The 3D libraries tend to reproduce the picture of the entire dataset (Fig. 5E), while the fragments of the SP3 libraries mainly populate the east region of the landscape (R1), notably the north, where the saturated molecules are clustered (Fig. 5F). Although this distribution may be biased by the molecular descriptors, it is consistent with the recent study led by Downes, which states that the three-dimensionality of a molecule cannot be well described by sp3 descriptors.21

Conclusions

We have proposed here a methodology for analyzing fragment libraries based on the classification of scaffolds and the modeling of landscapes by GTM. Combined, these two approaches have allowed the comparison of chemical libraries with one another as well as the positioning of each of them with respect to the set of fragments currently available commercially. We have validated that a small-sized chemical library (<2000 fragments) can adequately represent the chemical diversity of small-sized molecules (MW ≤ 300 Da). We also observed unexpected characteristics of specialized libraries, such as the low proportion of non-flat fragments in diverse libraries. Lastly, our results show that libraries are not interchangeable, even if they were built for the same purpose (e.g., three-dimensionality).

Abbreviations

3D: 3D-Shaped
3D-PSA: Polar surface area
CHE: Metal chelating
COV: Covalent
DIV: Diverse
F-NMR: Fluorine nuclear magnetic resonance
FBDD: Fragment-Based Drug Discovery
FDA: Food and Drug Administration
HBA: Hydrogen bond acceptor
HBD: Hydrogen bond donor
HTS: High throughput screening
FLU: Fluorine-rich
GEN: General
GTM: Generative topographic map
HAL: Halogen-rich
log P: Partition logarithm
MIS: Miscellaneous
MW: Molecular weight
NMR: Nuclear magnetic resonance
NP: Natural-product like
Nrot: Number of rotatable bonds
PBF: Plane of best fit
PIM: Principal moments of inertia
PPI: Protein–protein interactions
PSA: Polar surface area
RO3: Rule of three
RO5: Rule of five
SASA: Solvent-accessible surface area
SD: Standard deviation
SP3: sp3 hybridization
QED: Quantitative estimate drug-likeness

Funding sources

Programme d'Investissements d'Avenir de l'Agence Nationale de Recherche (grant IDEX W19RHUS3, Université de Strasbourg).

Author contributions

Conceptualization: G. M. and E. K. Methodology: J. R. I., C. J., G. B., G. M. and E. K. Implementation of the protocol: J. R. I. Data preparation: J. R. I. Formal analysis: J. R. I., G. M. and E. K. Writing—original draft preparation: J. R. I. and E. K. Writing—review and editing: J. R. I., C. J., G. M. and E. K.; project coordination: E. K.

Conflicts of interest

There are no conflicts to declare.

Supplementary Material

MD-013-D1MD00363A-s001
MD-013-D1MD00363A-s001.pdf (1,001.2KB, pdf)

Acknowledgments

The authors thank Fanny Bonachera and Dragos Horvath for technical assistance and helpful discussion, and Xuechen Tang for preliminary studies. The authors thank the Institut du Medicament (IMS) for financial support.

Electronic supplementary information (ESI) available. See DOI: 10.1039/d1md00363a

References

  1. Carr R. A. E. Congreve M. Murray C. W. Rees D. C. Fragment-Based Lead Discovery: Leads by Design. Drug Discovery Today. 2005;10(14):987–992. doi: 10.1016/S1359-6446(05)03511-7.
  2. Murray C. W. Rees D. C. The Rise of Fragment-Based Drug Discovery. Nat. Chem. 2009;1(3):187–192. doi: 10.1038/nchem.217.
  3. Jahnke W. Erlanson D. A. de Esch I. J. P. Johnson C. N. Mortenson P. N. Ochi Y. Urushima T. Fragment-to-Lead Medicinal Chemistry Publications in 2019. J. Med. Chem. 2020;63(24):15494–15507. doi: 10.1021/acs.jmedchem.0c01608.
  4. Romasanta A. K. S. van der Sijde P. Hellsten I. Hubbard R. E. Keseru G. M. van Muijlwijk-Koezen J. de Esch I. J. P. When Fragments Link: A Bibliometric Perspective on the Development of Fragment-Based Drug Discovery. Drug Discovery Today. 2018;23(9):1596–1609. doi: 10.1016/j.drudis.2018.05.004.
  5. Shuker S. B. Hajduk P. J. Meadows R. P. Fesik S. W. Discovering High-Affinity Ligands for Proteins: SAR by NMR. Science. 1996;274(5292):1531–1534. doi: 10.1126/science.274.5292.1531.
  6. Singh M. Tam B. Akabayov B. NMR-Fragment Based Virtual Screening: A Brief Overview. Molecules. 2018;23(2):233. doi: 10.3390/molecules23020233.
  7. O'Reilly M. Cleasby A. Davies T. G. Hall R. J. Ludlow R. F. Murray C. W. Tisi D. Jhoti H. Crystallographic Screening Using Ultra-Low-Molecular-Weight Ligands to Guide Drug Design. Drug Discovery Today. 2019;24(5):1081–1086. doi: 10.1016/j.drudis.2019.03.009.
  8. Congreve M. Carr R. A. E. Murray C. W. Jhoti H. A ‘Rule of Three’ for Fragment-Based Lead Discovery? Drug Discovery Today. 2003;8(19):876–877. doi: 10.1016/S1359-6446(03)02831-9.
  9. Lipinski C. A. Lombardo F. Dominy B. W. Feeney P. J. Experimental and Computational Approaches to Estimate Solubility and Permeability in Drug Discovery and Development Settings. Adv. Drug Delivery Rev. 2001:24. doi: 10.1016/s0169-409x(00)00129-0.
  10. Jhoti H. Williams G. Rees D. C. Murray C. W. The “rule of Three” for Fragment-Based Drug Discovery: Where Are We Now? Nat. Rev. Drug Discovery. 2013;12(8):644–644. doi: 10.1038/nrd3926-c1.
  11. Erlanson D. A. Fesik S. W. Hubbard R. E. Jahnke W. Jhoti H. Twenty Years on: The Impact of Fragments on Drug Discovery. Nat. Rev. Drug Discovery. 2016;15(9):605–619. doi: 10.1038/nrd.2016.109.
  12. Bickerton G. R. Paolini G. V. Besnard J. Muresan S. Hopkins A. L. Quantifying the Chemical Beauty of Drugs. Nat. Chem. 2012;4(2):90–98. doi: 10.1038/nchem.1243.
  13. Konteatis Z. What Makes a Good Fragment in Fragment-Based Drug Discovery? Expert Opin. Drug Discovery. 2021;16(7):723–726. doi: 10.1080/17460441.2021.1905629.
  14. Ferenczy G. G. Keserű G. M. On the Enthalpic Preference of Fragment Binding. MedChemComm. 2016;7(2):332–337. doi: 10.1039/C5MD00542F.
  15. Jacquemard C. Kellenberger E. A Bright Future for Fragment-Based Drug Discovery: What Does It Hold? Expert Opin. Drug Discovery. 2019;14(5):413–416. doi: 10.1080/17460441.2019.1583643.
  16. Keserű G. M. Erlanson D. A. Ferenczy G. G. Hann M. M. Murray C. W. Pickett S. D. Design Principles for Fragment Libraries: Maximizing the Value of Learnings from Pharma Fragment-Based Drug Discovery (FBDD) Programs for Use in Academia. J. Med. Chem. 2016;59(18):8189–8206. doi: 10.1021/acs.jmedchem.6b00197.
  17. Hann M. M. Leach A. R. Harper G. Molecular Complexity and Its Impact on the Probability of Finding Leads for Drug Discovery. J. Chem. Inf. Comput. Sci. 2001;41(3):856–864. doi: 10.1021/ci000403i.
  18. Shi Y. von Itzstein M. How Size Matters: Diversity for Fragment Library Design. Molecules. 2019;24(15):2838. doi: 10.3390/molecules24152838.
  19. Lovering F. Bikker J. Humblet C. Escape from Flatland: Increasing Saturation as an Approach to Improving Clinical Success. J. Med. Chem. 2009;52(21):6752–6756. doi: 10.1021/jm901241e.
  20. Péron F. Riché S. Lesur B. Hibert M. Breton P. Fourquez J.-M. Girard N. Bonnet D. Versatile Synthetic Approach for Selective Diversification of Bicyclic Aza-Diketopiperazines. ACS Omega. 2018;3(11):15182–15192. doi: 10.1021/acsomega.8b01752.
  21. Downes T. D. Jones S. P. Klein H. F. Wheldon M. C. Atobe M. Bond P. S. Firth J. D. Chan N. S. Waddelove L. Hubbard R. E. Blakemore D. C. De Fusco C. Roughley S. D. Vidler L. R. Whatton M. A. Woolford A. J.-A. Wrigley G. L. O'Brien P. Design and Synthesis of 56 Shape-Diverse 3D Fragments. Chem. – Eur. J. 2020;26(41):8969–8975. doi: 10.1002/chem.202001123.
  22. Tang S.-Q. Leloire M. Schneider S. Mohr J. Bricard J. Gizzi P. Garnier D. Schmitt M. Bihel F. Diastereoselective Synthesis of Nonplanar 3-Amino-1,2,4-Oxadiazine Scaffold: Structure Revision of Alchornedine. J. Org. Chem. 2020;85(23):15347–15359. doi: 10.1021/acs.joc.0c01764.
  23. Li Petri G. Raimondi M. V. Spanò V. Holl R. Barraja P. Montalbano A. Pyrrolidine in Drug Discovery: A Versatile Scaffold for Novel Biologically Active Compounds. Top. Curr. Chem. 2021;379(5):34. doi: 10.1007/s41061-021-00347-5.
  24. Hiesinger K. Dar'in D. Proschak E. Krasavin M. Spirocyclic Scaffolds in Medicinal Chemistry. J. Med. Chem. 2021;64(1):150–183. doi: 10.1021/acs.jmedchem.0c01473.
  25. Hall R. J. Mortenson P. N. Murray C. W. Efficient Exploration of Chemical Space by Fragment-Based Screening. Prog. Biophys. Mol. Biol. 2014;116(2–3):82–91. doi: 10.1016/j.pbiomolbio.2014.09.007.
  26. Ghose A. K. Viswanadhan V. N. Wendoloski J. J. Prediction of Hydrophobic (Lipophilic) Properties of Small Organic Molecules Using Fragmental Methods: An Analysis of ALOGP and CLOGP Methods. J. Phys. Chem. A. 1998;102(21):3762–3772. doi: 10.1021/jp980230o.
  27. Firth N. C. Brown N. Blagg J. Plane of Best Fit: A Novel Method to Characterize the Three-Dimensionality of Molecules. J. Chem. Inf. Model. 2012;52(10):2516–2525. doi: 10.1021/ci300293f.
  28. Bemis G. W. Murcko M. A. The Properties of Known Drugs. 1. Molecular Frameworks. J. Med. Chem. 1996;39(15):2887–2893. doi: 10.1021/jm9602928.
  29. Kireeva N. Baskin I. I. Gaspar H. A. Horvath D. Marcou G. Varnek A. Generative Topographic Mapping (GTM): Universal Tool for Data Visualization, Structure-Activity Modeling and Dataset Comparison. Mol. Inf. 2012;31(3–4):301–312. doi: 10.1002/minf.201100163.
  30. Zimmermann M. O. Lange A. Wilcken R. Cieslik M. B. Exner T. E. Joerger A. C. Koch P. Boeckler F. M. Halogen-Enriched Fragment Libraries as Chemical Probes for Harnessing Halogen Bonding in Fragment-Based Lead Discovery. Future Med. Chem. 2014;6(6):617–639. doi: 10.4155/fmc.14.20.
  31. Hann M. M. Keserü G. M. Finding the Sweet Spot: The Role of Nature and Nurture in Medicinal Chemistry. Nat. Rev. Drug Discovery. 2012;11(5):355–365. doi: 10.1038/nrd3701.
  32. Hann M. M. Molecular Obesity, Potency and Other Addictions in Drug Discovery. MedChemComm. 2011;2(5):349–355. doi: 10.1039/C1MD00017A.
  33. Over B. Wetzel S. Grütter C. Nakai Y. Renner S. Rauh D. Waldmann H. Natural-Product-Derived Fragments for Fragment-Based Ligand Discovery. Nat. Chem. 2013;5(1):21–28. doi: 10.1038/nchem.1506.
