Comprehensive analysis of commercial fragment libraries

Julia Revillo Imbernon; Célien Jacquemard; Guillaume Bret; Gilles Marcou; Esther Kellenberger

doi:10.1039/d1md00363a

2021 Dec 24;13(3):300–310. doi: 10.1039/d1md00363a

PMCID: PMC8942207 PMID: 35434627

Abstract

Screening of fragment libraries is a valuable approach to the drug discovery process. The quality of the library is one of the keys to success, and more particularly the design or choice of a library has to meet the specificities of the research program. In this study, we made an inventory of the commercial fragment libraries and we established a methodology which allows any library to be positioned in relation to the complete offer currently on the market, by addressing the following questions: does this chemical library look like another chemical library? What is the coverage of the current chemical space by this chemical library? What are the characteristic structural features of the fragments of this chemical library? We based our analysis on 2D and 3D chemical descriptors, framework class generation and the generative topographic map. We identified 59 270 scaffolds, which can be searched in a dedicated web site (https://gtmfrag.drugdesign.unistra.fr) and developed a model which accounts for fragment diversity while being easy to interpret (download at 10.5281/zenodo.5534434).

Explore the chemical space of libraries marketed for fragment-based drug discovery.

Introduction

Over the last 25 years, fragment-based drug discovery (FBDD) has grown steadily in popularity as an alternative or complement to high throughput screening (HTS).^1,2 The approach has contributed to the success of many medicinal chemistry programs, with five fragment-derived drugs approved by the FDA and many molecules having entered clinical trials.³

FBDD is an interdisciplinary field organized around library design, fragment screening, computational methods and optimization.⁴ These topics are strongly interconnected. For example, the biophysical techniques suitable for the study of low affinity binders, such as X-ray crystallography or isothermal titration calorimetry, have a low throughput, and therefore are limited to testing small-sized libraries. The screening method can also impose a bias on the chemical structure. For example, nuclear magnetic resonance (NMR), a pioneering method in FBDD,⁵ can exploit the magnetic properties of fluorine to detect the binding of fluorinated ligands.⁶ More generally, the physico-chemical properties are decisive for the selection of fragments, depending on the screening objective. An illustrative example is provided by X-ray crystallography, which allows the mapping of hot spots in proteins using molecules with ultra-low molecular weight.⁷

Fragment definition

A widely accepted definition is that fragments are compounds following the rule of three (RO3),⁸ analogous to Lipinski's rule of five (RO5):⁹ MW < 300 Da, A log P ≤ 3, number of hydrogen bond acceptors ≤ 3 and number of hydrogen bond donors ≤ 3. Additional criteria have also been proposed, such as a number of rotatable bonds smaller than 3 and a polar surface area smaller than 60 Å². The MW limit has also fluctuated, tending towards smaller molecules, which generally present a lower hydrophobicity.^10,11 More complex rules have been developed to quantify the desirability of each compound, such as the quantitative estimate of drug-likeness (QED), which ranges from 0 to 1.¹² However, QED was designed for full-sized drug-like molecules; as a consequence, QED values are biased when applied to fragment-like compounds. In this work, we assume that this bias is the same for all the libraries investigated.

Recently, Konteatis has proposed eight guidelines for the selection of fragments, stressing that priority should be set on the enthalpy-driven binding interactions, followed by the molecular shape diversity of the target-complementary fragments.¹³ Indeed, Keserü et al. have theoretically and experimentally confirmed the importance of enthalpic contribution to prioritize fragments forming key interactions.¹⁴ The success of fragment-to-lead optimization depends on the target-anchoring capacity of fragments, which in particular determines if the binding mode is conserved during fragment growth. Analysis of fragment binding modes in the Protein Data Bank suggested that fragments that are able to form polar interactions and have a molecular weight larger than 150 Da should be prioritized.¹⁵

Fragment libraries

Suppliers of chemical libraries offer a large panel of screening collections of fragments. The content of fragment libraries varies depending on their purpose. A fragment library can be tailored to a precise screening method, such as, for example, NMR.⁶ Fragment screening can not only deliver hits, but is also suitable for protein binding site detection, druggability assessment or study of versatile binding modes.¹⁶ Library design in general focuses on a specific need, such as, for example, the ability of the fragment to form a covalent bond, which imposes a particular functional group, or the ability of the library to offer a wide diversity of chemical structures.

The size of chemical libraries is variable, with a number of fragments ranging from a few hundred to over a hundred thousand. In the early days of FBDD, Hann et al. noticed the relationship between molecular complexity and the likelihood of success in hit discovery,¹⁷ suggesting that screening a thousand fragments covers the chemical space more effectively than screening a million HTS compounds. More recently, Shi and von Itzstein engaged in a theoretical exercise aiming to determine how many fragments are sufficient to represent the 227 787 commercial fragments of the ZINC database.¹⁸ Their comparison of different sized subsets, using a measure of diversity that takes into account both the number of unique structural fingerprints and their relative abundance in the subset, revealed that a subset of approximately 2000 fragments can have the same level of diversity as the full set of fragments.

Three dimensionality

For the past ten years, three-dimensionality has been a major concern in medicinal chemistry to produce molecules that improve both the originality of drug candidates and their selectivity. In the seminal article by Lovering et al., the analysis of a huge number of compounds sampled at different stages of the drug development process suggested that complex molecules are more likely to become drug candidates than simpler molecules. This parameter was evaluated by the fraction of sp³ hybridized carbons and the number of chiral centers.¹⁹ The interest in non-planar scaffolds has given rise to numerous developments in synthetic methods.^20–24 In FBDD programs, planar fragments are often avoided, since they tend to bind multiple targets, and therefore are potentially more difficult to develop into a selective drug-like compound.¹⁶ Nonetheless, the successful ligand design from flat fragments has fueled the debate, demonstrating that the flatness of the fragments does not necessarily transfer to the lead.²⁵

In this study, we propose a comprehensive analysis of commercial fragment libraries available in 2021, with emphasis set on diversity and three-dimensionality. This analysis is designed so that it can be updated and made available on a yearly basis, to follow the evolutions of the content and availability of commercial fragment libraries.

Materials and methods

Collection of fragment libraries and classification

A total of 86 different fragment libraries from fourteen vendors were collected between February 22 and 24, 2021. These libraries were categorized by composition, as annotated by the supplier, into ten main categories (Table 1): general (GEN), diverse (DIV), natural product-like (NP), 3D-shaped (3D), sp³ hybridized (SP3), metal chelating (CHE), protein–protein interactions (PPI), fluorine-rich (FLU), halogen-rich (HAL), and covalent binders (COV). The chemical libraries that did not fall into any of these categories were called miscellaneous (MIS).

Table 1. Summary of library classification (number of libraries per type and size class).

Type	Description	Small-sized	Middle-sized	Large-sized
GEN	General	2	3	9
DIV	Diverse	7	1	0
NP	Natural product-like	2	2	0
3D	3D-shaped	3	1	0
SP3	sp³ hybridized	1	2	2
CHE	Metal chelators	3	0	0
PPI	Protein–protein interacting	1	3	0
FLU	Fluorine enriched	7	3	1
HAL	Halogen enriched	3	4	0
COV	Reactive covalent linking	4	5	0
MIS	Miscellaneous	8	7	2

Standardization of fragments and calculation of molecular properties

Pipeline Pilot (v.19.1.0.1964; BIOVIA) was used to generate the canonical SMILES of each molecule from the structure files provided by the suppliers, and then to remove the counter ions of salts. Filter (v.2.5.1.4; OpenEye Scientific Software, Santa Fe) was used to filter out simple ions and invalid chemical structures (for example, those containing a pentavalent carbon atom) and to ionize the molecules at pH 7.4. The representations of pyridone, pyridine and sulfonamide groups were corrected, and fragments were aromatized using a basic aromatization scheme, with the Standardizer software (v16.10.17.0; ChemAxon Ltd.).

Pipeline Tools was used to calculate the 2D descriptors: number of hydrogen bond acceptors (HBA) and donors (HBD), as defined by Lipinski's rule; A log P, computed with the Ghose/Crippen group-contribution estimate of log P;²⁶ polar surface area (PSA); number of rotatable bonds (Nrot); and quantitative estimate of drug-likeness (QED).¹² The 3D descriptors, namely solvent-accessible surface area (SASA), polar surface area (3D-PSA) and plane of best fit (PBF),²⁷ were calculated with the OpenEye libraries (v.2019.10.2) from a low-energy conformer generated for each molecule by the Corina software (v.3.40; Molecular Networks GmbH, Nürnberg, Germany) with its default settings.

Generation of fragment classes

The fragments from all libraries except those built with a chemical bias (i.e. COV, PPI, HAL, FLU and the MIS constructed around a specific functional group) were merged into a single file. Duplicates, as identified from identical canonical SMILES, were removed using Pipeline Pilot (BIOVIA, Dassault Systèmes, Pipeline Pilot, release 2019, San Diego: Dassault Systèmes).
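The merging and duplicate-removal step can be illustrated with a minimal sketch. It assumes that every structure has already been converted to a canonical SMILES string by the same toolkit, so that identical strings denote identical 2D structures; the function name and the toy libraries are hypothetical and are not part of the published protocol.

```python
from typing import Dict, Iterable, List


def merge_unique(libraries: Dict[str, Iterable[str]]) -> List[str]:
    """Merge several libraries and keep the first occurrence of each canonical SMILES.

    Assumption: the strings are already canonical, so string equality is
    equivalent to structure equality.
    """
    seen = set()
    merged = []
    for name in sorted(libraries):  # sorted for a deterministic output order
        for smiles in libraries[name]:
            key = smiles.strip()
            if key and key not in seen:
                seen.add(key)
                merged.append(key)
    return merged


# Toy libraries invented for illustration (not taken from the study).
toy_libraries = {
    "lib_A": ["c1ccccc1", "c1ccncc1", "CCO"],
    "lib_B": ["c1ccncc1", "C1CCNCC1"],
}
unique_fragments = merge_unique(toy_libraries)
print(len(unique_fragments), "unique fragments")
```

In this toy case pyridine appears in both libraries and is counted once.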

Fragments were imported into ADMET Predictor 2019 (ADMET Predictor™, Simulations Plus, Inc., Lancaster, California, USA), neutralized, standardized, and classified using the option “Frameworks”. The method is based on ring systems and connecting chains, similar to the so-called “Murcko assemblies” described by Bemis and Murcko.²⁸ It yields only one scaffold per molecule. Each fragment belongs to only one class, corresponding to its major ring system, and each class is defined by a specific scaffold.

Generative topographic maps (GTM)

The training set was composed of 6017 fragments randomly picked in the small databases. Molecular descriptors were generated using ISIDA/Fragmentor (v.2019, Faculté de Chimie, Université de Strasbourg, France),²⁹ with atom centered fragments based on sequences of atoms and bonds, a length ranging from 2 to 4 and tagging the carbon atom if part of a ring and/or of an aromatic system, IIAB(2–4)_cycle. Those descriptors present in less than 5% of the dataset have been discarded.
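The frequency filter applied to the descriptors can be sketched as follows. This is only an illustration under stated assumptions: each molecule is represented by a dictionary mapping a descriptor name to its count, the function name is hypothetical, and a descriptor is kept when it occurs in at least 5% of the molecules. It does not reproduce the ISIDA/Fragmentor output format.

```python
from collections import Counter
from typing import Dict, List, Set


def frequent_descriptors(
    counts_per_molecule: List[Dict[str, int]], min_fraction: float = 0.05
) -> Set[str]:
    """Return the descriptors present in at least `min_fraction` of the molecules."""
    n_molecules = len(counts_per_molecule)
    if n_molecules == 0:
        return set()
    presence = Counter()
    for counts in counts_per_molecule:
        # A descriptor is "present" in a molecule if its count is non-zero.
        presence.update(name for name, count in counts.items() if count > 0)
    return {name for name, k in presence.items() if k / n_molecules >= min_fraction}


# Toy dataset of 40 molecules with invented descriptor names.
toy = [{"C-C": 2} for _ in range(40)]
for i in range(3):
    toy[i]["C-N"] = 1   # present in 3/40 = 7.5% of molecules: kept
toy[0]["C-S"] = 1       # present in 1/40 = 2.5% of molecules: discarded

kept = frequent_descriptors(toy)
print(sorted(kept))
```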

The manifold generation and projections were carried out with the GTMTool (v.2018, Faculté de Chimie, Université de Strasbourg, France). A pre-processing procedure, to center the dataset, was used and the total number of traits was set to 301. The best model was chosen based on the likelihood. The GTMClass tool was used to generate property landscapes specific to supplier categories (Table 1). Landscape depiction was performed using the GTMLandscape software.

Dataset availability

Datasets are available on Zenodo (doi: 10.5281/zenodo.5534434). The deposit contains the chemical structures, annotations, computed molecular descriptors, scaffolds, and models.

A web interface was designed to search the scaffolds. It is freely available at: https://gtmfrag.drugdesign.unistra.fr. Molconvert was used to standardize structures and to convert between file types, JChem Base was used to store the structures in a PostgreSQL instance and to query the database, and Marvin.js was used for drawing structures (LTS-helium 21.4.0, ChemAxon, https://www.chemaxon.com).

Results and discussion

Overview of commercially available fragments

Our survey identified 86 fragment libraries, provided by fourteen chemical suppliers, for a total of 754 646 molecules with 512 284 unique chemical 2D-structures.

These libraries have been designed with various specifications, for purposes which are generally well described, thereby allowing us to classify most of the libraries by type (Table 1). Firstly, twenty-seven libraries show a chemical bias: enrichment in fluorine for screening by ¹⁹F-NMR (FLU type, eleven libraries), enrichment in other halogen elements for harnessing halogen bonding (HAL type, seven libraries),³⁰ and the presence of a reactive group for covalent linking to the target (COV type, nine libraries). For eleven other chemical libraries, the construction rules consider the whole chemical structure of the fragment to focus on particular chemotypes: natural product or natural product-like fragments (NP type, four libraries), compounds targeting protein–protein interactions (PPI type, four libraries), or potential metal-chelating compounds (CHE type, three libraries). There are also eight libraries which aim at maximizing the coverage of the chemical space by optimizing molecular diversity within the set of fragments (DIV type). Avoiding an overrepresentation of flat fragments was the determining factor in the design of nine libraries, which favor fragments with a non-flat 3D shape (3D type, four libraries) or fragments with a high content of sp³-hybridized atoms (SP3 type, five libraries). The remaining libraries either each have their own special properties and were thus qualified as miscellaneous (MIS type, seventeen libraries, e.g. a library containing fragments which are well soluble in both dimethyl sulfoxide and phosphate-buffered saline), or do not present specific characteristics and were thus qualified as general (GEN type, fourteen libraries).

Libraries were also classified according to their number of fragments into the three following categories: small (≤2000 fragments, 41 libraries), intermediate (2000–10 000 fragments, 31 libraries) and large (>10 000 fragments, 14 libraries). Only small-sized libraries are suitable for testing by biophysical approaches, such as NMR. Intermediate ones are accessible for medium throughput screening techniques. The large libraries offer a wide choice for the custom design of small chemical libraries. The small-sized libraries taken as a whole contain 33 023 different fragments. This subset covers 6.4% of all commercial fragments. By comparison, the intermediate libraries taken as a whole contain 94 471 different fragments, covering 18.4% of all commercial fragments.
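The size classes and coverage figures quoted above can be reproduced with a short sketch. The thresholds and fragment counts are those given in the text; the function names are hypothetical, and a library of exactly 2000 fragments is treated as small, following the "≤2000" definition.

```python
def size_class(n_fragments: int) -> str:
    """Assign a library to a size class using the thresholds stated in the text."""
    if n_fragments <= 2000:
        return "small"
    if n_fragments <= 10000:
        return "intermediate"
    return "large"


def coverage_percent(n_subset_unique: int, n_total_unique: int) -> float:
    """Percentage of all unique commercial fragments found in a subset."""
    return 100.0 * n_subset_unique / n_total_unique


TOTAL_UNIQUE = 512284  # unique 2D structures reported in the survey

print(size_class(320), size_class(5000), size_class(100000))
print(round(coverage_percent(33023, TOTAL_UNIQUE), 1))  # small-sized libraries
print(round(coverage_percent(94471, TOTAL_UNIQUE), 1))  # intermediate libraries
```

The two printed percentages correspond to the 6.4% and 18.4% reported for the small-sized and intermediate libraries.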

Commercially available fragments conform with RO3

In 2003, Congreve et al. defined the “Rule of 3” (RO3) as follows: MW ≤ 300 Da, A log P ≤ 3, HBA ≤ 3, and HBD ≤ 3.⁸ As shown in Fig. 1A, fewer than 23% of the fragments strictly fulfill the four conditions and 74% break one rule, mainly the HBA rule (60% of the cases). Most of the fragments have a MW in the 200–300 Da range (Fig. 1B). Only 13.43% of the fragments have a MW lower than 200 Da. These small fragments, which generally have a smaller number of atoms and a lower molecular complexity, have an increased chance of binding to a protein.¹⁷ Moreover, as low affinity binders with low specificity, these small fragments constitute chemical probes useful for the detection of binding sites and hotspots as well as for druggability studies.¹⁶ Several suppliers have developed libraries of low MW fragments, sometimes called Mini-Frag libraries. Surprisingly, the proportion of small fragments has decreased over the last two years (24.29% in 2019).
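Counting RO3 violations per fragment, as summarized in Fig. 1A, can be sketched as below. The sketch assumes that the four descriptors have already been computed by an external tool; the `FragmentDescriptors` container, the function name and the toy values are hypothetical, and the thresholds follow the definition quoted in this section.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class FragmentDescriptors:
    """Precomputed 2D descriptors for one fragment (hypothetical container)."""
    mw: float     # molecular weight, Da
    alogp: float  # A log P
    hba: int      # number of hydrogen bond acceptors
    hbd: int      # number of hydrogen bond donors


def ro3_violations(d: FragmentDescriptors) -> List[str]:
    """Return the names of the RO3 conditions broken by the fragment."""
    broken = []
    if d.mw > 300:
        broken.append("MW")
    if d.alogp > 3:
        broken.append("AlogP")
    if d.hba > 3:
        broken.append("HBA")
    if d.hbd > 3:
        broken.append("HBD")
    return broken


# Toy descriptor values, invented for illustration only.
toy_fragments = [
    FragmentDescriptors(mw=150.2, alogp=1.1, hba=2, hbd=1),  # compliant
    FragmentDescriptors(mw=245.3, alogp=0.8, hba=5, hbd=1),  # breaks HBA
    FragmentDescriptors(mw=310.4, alogp=3.4, hba=3, hbd=0),  # breaks MW and AlogP
]

# Histogram of the number of broken rules per fragment.
violation_histogram = Counter(len(ro3_violations(f)) for f in toy_fragments)
print(dict(violation_histogram))
```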

Together with MW, lipophilicity is an important parameter to prioritize compounds in a medicinal chemistry program.^31,32 Considering log P, again most of the fragments fall in a narrow range (0–2), with a high proportion of weakly hydrophobic molecules (46% with A log P < 1, Fig. 1B). Considering only the small-sized libraries (Fig. 1C), MW and log P distributions are more spread out, with a shift of the maximum of the MW distribution to a lower value, as compared to the full set of fragments.

The drug-likeness was further evaluated using QED. In their publication presenting the QED, Bickerton et al. found that the QED of attractive starting compounds for hit optimization is on average 0.67 (standard deviation = 0.16).¹² They observed that unattractive compounds, whether “too simple” or “too complex”, overall show a lower QED. The mean QED values computed here for each fragment library are represented in Fig. 1D. All libraries but Mini-Frag show a mean QED exceeding 0.6, i.e. close to or exceeding the value reported for attractive drug-like compounds, although MW, which is taken into account in the QED, is lower in fragments. Mean QED values are not related to the library type.

Diversity of fragments

In order to assess the chemical diversity of the entire fragment collection, we used two different and complementary approaches: the classification of the fragments by common chemical scaffolds and the mapping of the chemical space covered by the fragments using GTM. These analyses took into account the libraries of the following types: GEN, DIV, NP, CHE, SP3, 3D and MIS (list in Table S1†). PPI libraries were removed because their MW is significantly higher than the MW given by the fragment definition used for this study. FLU and HAL libraries were discarded because they would have introduced a bias in the representation of the chemical space. COV libraries were removed because the recurrent substructures of the reactive covalent linking groups would also have caused a bias in the analysis of the chemical space. The diversity analyses were hence performed on an ensemble of 433 433 unique fragments from fifty libraries provided by fourteen suppliers (Table S2†). The classification of all fragments by common substructure resulted in an inventory of 59 270 scaffolds. Not surprisingly, the number of scaffolds represented in a chemical library depends on the total number of fragments in the library (Fig. 2A and B). The ratio between the number of scaffolds and the number of molecules nevertheless varies, ranging from 0.13 to 0.77. Interestingly, small-sized DIV libraries cover this whole range (Fig. 2B), suggesting that some commercial collections of DIV fragments are not obtained by maximizing the number of different scaffolds. More generally, the number of scaffolds in a library cannot be inferred from its type.

If we consider the 59 270 scaffolds as a whole, their representation among the complete set of 433 433 fragments is highly uneven. About 0.1% of the scaffolds are each present in more than a thousand fragments, and together cover 25% of the entire set (Fig. 2C). These scaffolds are mainly simple and aromatic, such as benzene and nitrogen heterocycles. The fifty most frequent scaffolds are shown in Table S3† (left panel). At the other end of the distribution of scaffolds by frequency, there are 36 555 scaffolds defined by a single fragment, here called singletons.
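The scaffold statistics discussed here (scaffold-to-molecule ratio, singletons, share of the most frequent scaffold) can be computed from a simple list giving one scaffold label per fragment. The following is a minimal sketch with hypothetical names and toy labels; it assumes that the scaffold assignment itself has been done beforehand by a dedicated tool.

```python
from collections import Counter
from typing import Dict, Sequence


def scaffold_statistics(scaffold_per_fragment: Sequence[str]) -> Dict[str, float]:
    """Summarize how fragments are distributed over scaffolds."""
    if not scaffold_per_fragment:
        raise ValueError("at least one fragment is required")
    counts = Counter(scaffold_per_fragment)
    n_fragments = len(scaffold_per_fragment)
    n_scaffolds = len(counts)
    # A singleton is a scaffold represented by exactly one fragment.
    n_singletons = sum(1 for c in counts.values() if c == 1)
    top_count = counts.most_common(1)[0][1]
    return {
        "n_fragments": n_fragments,
        "n_scaffolds": n_scaffolds,
        "scaffold_to_molecule_ratio": n_scaffolds / n_fragments,
        "n_singletons": n_singletons,
        "singleton_fraction": n_singletons / n_scaffolds,
        "top_scaffold_share": top_count / n_fragments,
    }


# Toy assignment of ten fragments to four scaffolds (invented for illustration).
toy_scaffolds = ["benzene"] * 5 + ["pyridine"] * 3 + ["indole", "quinuclidine"]
stats = scaffold_statistics(toy_scaffolds)
print(stats)
```

Applied to the numbers reported in this section, the singleton fraction of the complete set is 36 555 / 59 270, i.e. about 61.7% of the scaffolds.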

Interestingly, the proportion of singletons may be related to the library specificity (Fig. 2B and Table S2†). For example, small and diverse-type libraries, which are designed to maximally cover the chemical space, have a low proportion of singleton-scaffolds. By contrast, sp³-type libraries, which are focused on a more specific chemical space, present a higher proportion of singletons.

Focusing on small-sized libraries, about 6.4 percent of the 59 270 scaffolds are represented among the set of 20 708 unique fragments. The distribution of scaffolds by frequency within this smaller set resembles that observed within the complete set of 433 433 fragments (compare Fig. 2C and D). The most frequent scaffold remains benzene, the frequency of which is even increased (present in 12% of molecules, against 5% in the complete set). The structures and frequencies of the fifty most frequent scaffolds are shown in Table S3† (right panel). The diversity has been obtained through the singletons, which represent 62% of the scaffolds.

Less frequent scaffolds generally have a higher complexity. Their number of non-hydrogen atoms, number of rings, and PBF are on average higher than those of the most common scaffolds (Table 2). This difference in complexity between popular and rarer scaffolds is maintained among the scaffolds present in small-sized libraries: less frequent scaffolds again have on average a higher number of non-hydrogen atoms, a higher number of rings and a higher PBF. The comparison also reveals that the average complexity of small-sized library scaffolds is reduced (Table 2). The majority of scaffolds present in small-sized libraries contain two (63.9%) or three rings (14.1%). In many scaffolds with two or three rings, the rings are connected by a short acyclic linker. Only a hundred complex scaffolds contain more than three rings, which are either fused or bridged, forming a compact, 3D or linear geometry (see examples in Fig. 2E), or separated by a central aliphatic linker of length 1 to 4. A unique scaffold shows a central ring acting as a platform to which three other rings are attached via exocyclic single bonds, exploring three directions of space (C 43146 in Fig. 2E). The same scaffold is also remarkable because it contains six nitrogen atoms and one oxygen atom. The number of nitrogen, oxygen and sulfur atoms also gives an insight into the variety of small-sized library scaffolds. Nitrogen is by far the most common heteroatom (1–6 N in 93.4% of scaffolds), ahead of oxygen (1–3 O in 30.7% of scaffolds) and sulfur (1–2 S in 21.9% of scaffolds). Fig. 2F shows examples of scaffolds containing more than one nitrogen, oxygen or sulfur atom, or containing all three heteroatoms.

Table 2. Comparison of scaffold properties and frequency.

Scaffold set	Frequency	Number of scaffolds	Number of non-hydrogen atoms (median)	Scaffolds with 1 ring (%)	Scaffolds with 2 rings (%)	Scaffolds with 3 rings (%)	Scaffolds with >3 rings (%)	PBF (mean ± SD)
Scaffolds present in all libraries	Singletons	36 555	17	0.2	32.6	55.4	11.9	0.62 ± 0.17
	Intermediate	22 669	15	0.7	60	36.4	2.9	0.56 ± 0.17
	Popular^a	46	9	23.9	76.1	0	0	0.46 ± 0.17
Scaffolds present in small-sized libraries	Singletons	2356	13	1.4	72.1	22.2	4.3	0.49 ± 0.20
	Intermediate	1426	11	5.3	85.1	9	0.6	0.43 ± 0.20
	Popular^a	46	9	34.8	63	0	2.2	0.30 ± 0.17

^a Scaffolds appearing at least 1000 times in all libraries or 66 times in small-sized libraries. These thresholds have been set as the top 1/5 of the highest frequency value (most common scaffolds have been previously discarded: benzene for all libraries and benzene and pyridine for small-sized libraries).

Fragment space mapping

To further study the chemical diversity covered by commercial fragments, we developed a generative topographic map (GTM) model able to represent the chemical space in a landscape.²⁹ This method has the advantage of allowing a simple visual analysis. At a glance, the overall comparison of two maps gives the degree of resemblance between two sets of fragments. Local inspection of a map reveals information at a finer scale, such as chemical substructures which are well represented in a library. The GTM model was used to evaluate each library against all the fragments and to compare the libraries with each other.

Fig. 3 shows how the constructed GTM model has distributed the fragments on the chemical landscape. Detailed visual analysis of the GTM map validated the consistency between the spatial proximity of the projection points of the fragments and the similarity of the chemical structures of these fragments. Thus, it is possible to associate particular chemical characteristics with the different areas of the map, as illustrated in Fig. 3A with an area rich in unsaturated molecules (R1), an area populated by derivatives of benzimidazole (R2) and an area where all fragments contain fluorine (R3). Within an area, the fragments are distributed in space depending on complexity (number of rings, presence of heteroatoms, number of substituents) and the type of functional group. For example, as shown in Fig. 3B, in the lower part of R3, a path going from right to left will successively meet fragments containing a trifluoromethyl group and a phenyl group, then those containing a trifluoromethyl group and a nitrogen heterocycle, and finally those containing only a single fluorine atom and a nitrogen heterocycle.

In the GTM landscapes, likelihood is represented through a color gradient and depicts the probability distribution of the data. Fig. 3C shows that the entire dataset covers the entire map, yet the distribution is clearly heterogeneous. Purple regions correspond to the most populated ones. The dense south-west area (R4) collects a broad set of pyridine derivatives connected to nitrogen-containing substituents. As the populated south-east region (R5) is rich in fragments containing a piperazine connected to a pyridine, both regions could be related. The center-south region (R6), enclosing the benzoic acid derivatives, appears to be the least occupied one.

Coverage of the fragment landscape by libraries

Fig. 4A–D show the GTM landscapes of the different fragment subsets as defined by the well-characterized library types DIV, NP, CHE, SP3 and 3D. Fragments in SP3 and 3D libraries, which both focus on the three-dimensionality of molecules, were merged into a single set. The projections of the subsets on the GTM show different trends, most of which are expected from the purpose of the libraries that compose them. The DIV subset is representative of the whole chemical space, as it contains almost all the chemical clusters of the GTM model applied to the full fragment dataset (Fig. 4A). Nevertheless, the area containing the unsaturated cyclic and linear fragments (R1 in Fig. 3A and R1 in Fig. 4A) is less populated in its left part, where the more complex molecules are located. Also, immediately south of this region (R7), an elongated area grouping molecules rich in piperidine amides is almost empty. The SP3-3D landscape also represents the whole data distribution well (Fig. 4B), yet, as expected, the density is concentrated in the upper area, where the unsaturated compounds are located (R1). The NP landscape (Fig. 4C) also has a high density among the unsaturated compounds (R1 in Fig. 3A and R1 in Fig. 4C), including the area where the most complex ones are found, yet misses other regions of the whole fragment landscape. In particular, the regions with a benzene linked to rings containing several nitrogen atoms are almost empty (R8a, R8b and R8c). For example, the piperazine area is not populated. The CHE subset covers the whole fragment map better than the NP subset (Fig. 4D). In addition, its projection on the GTM fills the center-south area (R6), which is sparsely populated in the landscape of the full dataset (R6 in Fig. 3C). Hence, metal chelator libraries contain specific fragments that are not common in the general dataset.

Although the DIV libraries taken together represent the whole fragment space well, they are not all equivalent (Fig. 4E–H). The comparison of DIV libraries 3 and 6 gives a good illustration (Fig. 4E and F, respectively). The two sets, made of 1226 and 1920 fragments respectively, occupy the space in very different ways, although both contain fragments representing every populated cluster present in the full dataset (Fig. 3C). DIV library 3 has a homogeneous coverage of the chemical space, while DIV library 6 is more representative of the density. The comparison between DIV library 1 and DIV library 7 (Fig. 4G – 320 fragments and Fig. 4H – 2000 fragments, respectively) illustrates that small-sized libraries can correctly cover the whole chemical space. Most clusters are covered by both libraries. However, DIV library 1 is more strongly represented in the northern part of the map than DIV library 7, despite being 6 times smaller.

Fragment three-dimensionality

The 3D character of fragments was evaluated using the plane of best fit (PBF) method, which fits a plane to the 3D structure of the molecule and returns the average distance of the atoms from this plane, the PBF score.²⁷ If the molecule lies in the plane, i.e. is flat, PBF equals 0. Otherwise, PBF increases as a function of the scaffold geometry and the position and size of substituents. The distribution of PBF among the complete dataset of fragments is bimodal with a major peak at 0.62 ± 0.20 and a minor peak at 0.01 ± 0.03, allowing the distinction of flat fragments (7.8% of fragments with PBF < 0.1 Å) and fragments with non-planar geometry (92.2% of fragments with PBF ≥ 0.1 Å). A large part of the flat fragments have no rotatable bonds (Fig. 5A). The proportion of this type of fragment is greater in the subset of small-sized libraries than in the whole dataset (21.9% against 8.6%), suggesting that currently, planarity is not a strong filter in the design of many fragment libraries (Fig. 5A and B). Accordingly, the percentage of flat fragments in small-sized libraries has risen significantly since 2019 (16%).
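The PBF score described above can be illustrated with a small, dependency-free sketch. It assumes a least-squares plane through the centroid of the supplied atom coordinates, whose normal is the direction of smallest variance, found here by power iteration on a shifted covariance matrix (a simplification chosen to avoid external libraries; it is not the implementation used in the study). The function names and the toy coordinates are hypothetical; the 0.1 Å flatness threshold is the one used in this section.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def pbf_score(coords: Sequence[Point], n_iter: int = 500) -> float:
    """Mean distance (same unit as the input) of points to their best-fit plane."""
    n = len(coords)
    if n < 3:
        raise ValueError("at least three atoms are required")
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    centered = [tuple(p[k] - centroid[k] for k in range(3)) for p in coords]
    # 3x3 covariance matrix of the centered coordinates.
    cov = [[sum(p[i] * p[j] for p in centered) / n for j in range(3)] for i in range(3)]
    trace = cov[0][0] + cov[1][1] + cov[2][2]
    if trace == 0.0:
        return 0.0  # all points coincide
    # The dominant eigenvector of (trace * I - cov) is the eigenvector of cov
    # with the smallest eigenvalue, i.e. the normal of the least-squares plane.
    shifted = [[(trace if i == j else 0.0) - cov[i][j] for j in range(3)] for i in range(3)]
    normal: List[float] = [0.3, 0.5, 0.8]  # arbitrary starting vector
    for _ in range(n_iter):
        w = [sum(shifted[i][j] * normal[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(c * c for c in w))
        normal = [c / norm for c in w]
    return sum(abs(sum(p[k] * normal[k] for k in range(3))) for p in centered) / n


def is_flat(coords: Sequence[Point], threshold: float = 0.1) -> bool:
    """Flat/non-planar split at PBF < 0.1 Å, as in the text."""
    return pbf_score(coords) < threshold


# Toy geometries (invented): a planar hexagon and a puckered four-membered ring.
hexagon = [(1.4 * math.cos(math.radians(60 * i)), 1.4 * math.sin(math.radians(60 * i)), 0.0)
           for i in range(6)]
puckered = [(1.0, 1.0, 0.5), (1.0, -1.0, -0.5), (-1.0, -1.0, 0.5), (-1.0, 1.0, -0.5)]

print(is_flat(hexagon), is_flat(puckered))
```

For the puckered ring, every atom lies 0.5 Å from the mean plane, so the score is 0.5 Å and the geometry is classified as non-planar.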

Considering each library individually, the annotations by type and the three-dimensionality of fragments are consistent (Table S2†). As expected, the 3D and SP3 libraries show the highest average PBF (≥0.6 Å). The NP libraries also show a high average PBF, in agreement with the known complexity of many natural products.³³ On the other hand, the Mini-Frag libraries, which are overall poorly diverse (MIS libraries 1 and 2 in Table S2†), are mostly composed of flat fragments (mean PBF ≤ 0.17 Å). The mean PBF values of DIV and CHE libraries are smaller than the global average. Indeed, the percentage of flat fragments in these two library types is 40.8% and 31.7%, respectively.

To further study the three-dimensionality of fragments, we observed the GTM landscape of flat fragments (Fig. 5C) and of non-planar fragments (Fig. 5D). Interestingly, flat fragments are evenly distributed in nearly the entire fragment space. The same observation is made for non-planar fragments, suggesting that flat fragments can have a non-planar counterpart. Nevertheless, flat fragments show a smaller density in the south-east part (R9), corresponding to the trifluoromethyl derivatives, and in the main center of the map, where the simplest non-aromatic compounds are located (R10). The non-planar molecules also appear to be absent from regions rich in amide groups, such as the center-north area (R11).

The 3D and SP3 libraries, which are both designed to account for three-dimensionality, nevertheless occupy different spaces. The 3D libraries tend to reproduce the picture of the entire dataset (Fig. 5E), while the fragments of the SP3 libraries mainly populate the east region of the landscape (R1), notably the north where the unsaturated molecules are clustered (Fig. 5F). Although this distribution may be biased by the molecular descriptors, it is consistent with the recent study led by Downes, which states that the three-dimensionality of a molecule cannot be well described by sp³ descriptors.²⁷

Conclusions

We have proposed here a methodology for analyzing fragment libraries based on the classification of scaffolds and the modeling of landscapes by GTM. Combined, these two approaches have allowed the comparison of chemical libraries with one another as well as the positioning of each of them with respect to the set of fragments currently available commercially. We have validated that a small-sized chemical library (≤2000 fragments) can adequately represent the chemical diversity of small-sized molecules (MW ≤ 300 Da). We also observed unexpected characteristics of specialized libraries, such as the low proportion of non-flat fragments in diverse libraries. Lastly, our results show that libraries are not interchangeable, even if they were built for the same purpose (e.g., three-dimensionality).

Abbreviations

3D: 3D-Shaped
3D-PSA: Three-dimensional polar surface area
CHE: Metal chelating
COV: Covalent
DIV: Diverse
F-NMR: Fluorine nuclear magnetic resonance
FBDD: Fragment-Based Drug Discovery
FDA: Food and Drug Administration
HBA: Hydrogen bond acceptor
HBD: Hydrogen bond donor
HTS: High throughput screening
FLU: Fluorine-rich
GEN: General
GTM: Generative topographic map
HAL: Halogen-rich
log P: Logarithm of the octanol-water partition coefficient
MIS: Miscellaneous
MW: Molecular weight
NMR: Nuclear magnetic resonance
NP: Natural-product like
Nrot: Number of rotatable bonds
PBF: Plane of best fit
PIM: Principal moments of inertia
PPI: Protein–protein interactions
PSA: Polar surface area
RO3: Rule of three
RO5: Rule of five
SASA: Solvent-accessible surface area
SD: Standard deviation
SP3: sp³ hybridization
QED: Quantitative estimate of drug-likeness

Funding sources

Programme d'Investissements d'Avenir de l'Agence Nationale de Recherche (grant IDEX W19RHUS3, Université de Strasbourg).

Author contributions

Conceptualization: G. M. and E. K. Methodology: J. R. I., C. J., G. B., G. M. and E. K. Implementation of the protocol: J. R. I. Data preparation: J. R. I. Formal analysis: J. R. I., G. M. and E. K. Writing—original draft preparation: J. R. I. and E. K. Writing—review and editing: J. R. I., C. J., G. M. and E. K.; project coordination: E. K.

Conflicts of interest

There are no conflicts to declare.

Supplementary Material

MD-013-D1MD00363A-s001.pdf (1,001.2 KB, PDF)

Acknowledgments

The authors thank Fanny Bonachera and Dragos Horvath for technical assistance and helpful discussions, and Xuechen Tang for preliminary studies. The authors thank the Institut du Medicament (IMS) for financial support.

† Electronic supplementary information (ESI) available. See DOI: 10.1039/d1md00363a

References

Carr R. A. E., Congreve M., Murray C. W., Rees D. C. Fragment-Based Lead Discovery: Leads by Design. Drug Discovery Today. 2005;10(14):987–992. doi: 10.1016/S1359-6446(05)03511-7.
Murray C. W., Rees D. C. The Rise of Fragment-Based Drug Discovery. Nat. Chem. 2009;1(3):187–192. doi: 10.1038/nchem.217.
Jahnke W., Erlanson D. A., de Esch I. J. P., Johnson C. N., Mortenson P. N., Ochi Y., Urushima T. Fragment-to-Lead Medicinal Chemistry Publications in 2019. J. Med. Chem. 2020;63(24):15494–15507. doi: 10.1021/acs.jmedchem.0c01608.
Romasanta A. K. S., van der Sijde P., Hellsten I., Hubbard R. E., Keserű G. M., van Muijlwijk-Koezen J., de Esch I. J. P. When Fragments Link: A Bibliometric Perspective on the Development of Fragment-Based Drug Discovery. Drug Discovery Today. 2018;23(9):1596–1609. doi: 10.1016/j.drudis.2018.05.004.
Shuker S. B., Hajduk P. J., Meadows R. P., Fesik S. W. Discovering High-Affinity Ligands for Proteins: SAR by NMR. Science. 1996;274(5292):1531–1534. doi: 10.1126/science.274.5292.1531.
Singh M., Tam B., Akabayov B. NMR-Fragment Based Virtual Screening: A Brief Overview. Molecules. 2018;23(2):233. doi: 10.3390/molecules23020233.
O'Reilly M., Cleasby A., Davies T. G., Hall R. J., Ludlow R. F., Murray C. W., Tisi D., Jhoti H. Crystallographic Screening Using Ultra-Low-Molecular-Weight Ligands to Guide Drug Design. Drug Discovery Today. 2019;24(5):1081–1086. doi: 10.1016/j.drudis.2019.03.009.
Congreve M., Carr R. A. E., Murray C. W., Jhoti H. A ‘Rule of Three’ for Fragment-Based Lead Discovery? Drug Discovery Today. 2003;8(19):876–877. doi: 10.1016/S1359-6446(03)02831-9.
Lipinski C. A., Lombardo F., Dominy B. W., Feeney P. J. Experimental and Computational Approaches to Estimate Solubility and Permeability in Drug Discovery and Development Settings. Adv. Drug Delivery Rev. 2001;46(1–3):3–26. doi: 10.1016/s0169-409x(00)00129-0.
Jhoti H., Williams G., Rees D. C., Murray C. W. The “rule of Three” for Fragment-Based Drug Discovery: Where Are We Now? Nat. Rev. Drug Discovery. 2013;12(8):644–644. doi: 10.1038/nrd3926-c1.
Erlanson D. A., Fesik S. W., Hubbard R. E., Jahnke W., Jhoti H. Twenty Years on: The Impact of Fragments on Drug Discovery. Nat. Rev. Drug Discovery. 2016;15(9):605–619. doi: 10.1038/nrd.2016.109.
Bickerton G. R., Paolini G. V., Besnard J., Muresan S., Hopkins A. L. Quantifying the Chemical Beauty of Drugs. Nat. Chem. 2012;4(2):90–98. doi: 10.1038/nchem.1243.
Konteatis Z. What Makes a Good Fragment in Fragment-Based Drug Discovery? Expert Opin. Drug Discovery. 2021;16(7):723–726. doi: 10.1080/17460441.2021.1905629.
Ferenczy G. G., Keserű G. M. On the Enthalpic Preference of Fragment Binding. MedChemComm. 2016;7(2):332–337. doi: 10.1039/C5MD00542F.
Jacquemard C., Kellenberger E. A Bright Future for Fragment-Based Drug Discovery: What Does It Hold? Expert Opin. Drug Discovery. 2019;14(5):413–416. doi: 10.1080/17460441.2019.1583643.
Keserű G. M., Erlanson D. A., Ferenczy G. G., Hann M. M., Murray C. W., Pickett S. D. Design Principles for Fragment Libraries: Maximizing the Value of Learnings from Pharma Fragment-Based Drug Discovery (FBDD) Programs for Use in Academia. J. Med. Chem. 2016;59(18):8189–8206. doi: 10.1021/acs.jmedchem.6b00197.
Hann M. M., Leach A. R., Harper G. Molecular Complexity and Its Impact on the Probability of Finding Leads for Drug Discovery. J. Chem. Inf. Comput. Sci. 2001;41(3):856–864. doi: 10.1021/ci000403i.
Shi Y., von Itzstein M. How Size Matters: Diversity for Fragment Library Design. Molecules. 2019;24(15):2838. doi: 10.3390/molecules24152838.
Lovering F., Bikker J., Humblet C. Escape from Flatland: Increasing Saturation as an Approach to Improving Clinical Success. J. Med. Chem. 2009;52(21):6752–6756. doi: 10.1021/jm901241e.
Péron F., Riché S., Lesur B., Hibert M., Breton P., Fourquez J.-M., Girard N., Bonnet D. Versatile Synthetic Approach for Selective Diversification of Bicyclic Aza-Diketopiperazines. ACS Omega. 2018;3(11):15182–15192. doi: 10.1021/acsomega.8b01752.
Downes T. D., Jones S. P., Klein H. F., Wheldon M. C., Atobe M., Bond P. S., Firth J. D., Chan N. S., Waddelove L., Hubbard R. E., Blakemore D. C., De Fusco C., Roughley S. D., Vidler L. R., Whatton M. A., Woolford A. J.-A., Wrigley G. L., O'Brien P. Design and Synthesis of 56 Shape-Diverse 3D Fragments. Chem. – Eur. J. 2020;26(41):8969–8975. doi: 10.1002/chem.202001123.
Tang S.-Q., Leloire M., Schneider S., Mohr J., Bricard J., Gizzi P., Garnier D., Schmitt M., Bihel F. Diastereoselective Synthesis of Nonplanar 3-Amino-1,2,4-Oxadiazine Scaffold: Structure Revision of Alchornedine. J. Org. Chem. 2020;85(23):15347–15359. doi: 10.1021/acs.joc.0c01764.
Li Petri G., Raimondi M. V., Spanò V., Holl R., Barraja P., Montalbano A. Pyrrolidine in Drug Discovery: A Versatile Scaffold for Novel Biologically Active Compounds. Top. Curr. Chem. 2021;379(5):34. doi: 10.1007/s41061-021-00347-5.
Hiesinger K., Dar'in D., Proschak E., Krasavin M. Spirocyclic Scaffolds in Medicinal Chemistry. J. Med. Chem. 2021;64(1):150–183. doi: 10.1021/acs.jmedchem.0c01473.
Hall R. J., Mortenson P. N., Murray C. W. Efficient Exploration of Chemical Space by Fragment-Based Screening. Prog. Biophys. Mol. Biol. 2014;116(2–3):82–91. doi: 10.1016/j.pbiomolbio.2014.09.007.
Ghose A. K., Viswanadhan V. N., Wendoloski J. J. Prediction of Hydrophobic (Lipophilic) Properties of Small Organic Molecules Using Fragmental Methods: An Analysis of ALOGP and CLOGP Methods. J. Phys. Chem. A. 1998;102(21):3762–3772. doi: 10.1021/jp980230o.
Firth N. C., Brown N., Blagg J. Plane of Best Fit: A Novel Method to Characterize the Three-Dimensionality of Molecules. J. Chem. Inf. Model. 2012;52(10):2516–2525. doi: 10.1021/ci300293f.
Bemis G. W., Murcko M. A. The Properties of Known Drugs. 1. Molecular Frameworks. J. Med. Chem. 1996;39(15):2887–2893. doi: 10.1021/jm9602928.
Kireeva N., Baskin I. I., Gaspar H. A., Horvath D., Marcou G., Varnek A. Generative Topographic Mapping (GTM): Universal Tool for Data Visualization, Structure-Activity Modeling and Dataset Comparison. Mol. Inf. 2012;31(3–4):301–312. doi: 10.1002/minf.201100163.
Zimmermann M. O., Lange A., Wilcken R., Cieslik M. B., Exner T. E., Joerger A. C., Koch P., Boeckler F. M. Halogen-Enriched Fragment Libraries as Chemical Probes for Harnessing Halogen Bonding in Fragment-Based Lead Discovery. Future Med. Chem. 2014;6(6):617–639. doi: 10.4155/fmc.14.20.
Hann M. M., Keserű G. M. Finding the Sweet Spot: The Role of Nature and Nurture in Medicinal Chemistry. Nat. Rev. Drug Discovery. 2012;11(5):355–365. doi: 10.1038/nrd3701.
Hann M. M. Molecular Obesity, Potency and Other Addictions in Drug Discovery. MedChemComm. 2011;2(5):349–355. doi: 10.1039/C1MD00017A.
Over B., Wetzel S., Grütter C., Nakai Y., Renner S., Rauh D., Waldmann H. Natural-Product-Derived Fragments for Fragment-Based Ligand Discovery. Nat. Chem. 2013;5(1):21–28. doi: 10.1038/nchem.1506.
