Abstract

Nonbonding molecular interactions, such as hydrogen bonding, hydrophobic contacts, ionic interactions, etc., are at the heart of many biological processes, and their appropriate treatment is essential for the successful application of numerous computational drug design methods. This paper introduces GRADE, a novel interaction fingerprint (IFP) descriptor that quantifies these interactions using floating point values derived from GRAIL scores, encoding both the presence and quality of interactions. GRADE is available in two versions: a basic 35-element variant and an extended 177-element variant. Three case studies demonstrate GRADE’s utility: (1) dimensionality reduction for visualizing the chemical space of protein–ligand complexes using Uniform Manifold Approximation and Projection (UMAP), showing competitive performance with complex descriptors; (2) binding affinity prediction, where GRADE achieved reasonable accuracy with minimal machine learning optimization; and (3) three-dimensional-quantitative structure–activity relationship (3D-QSAR) modeling for a specific protein target, where GRADE enhanced the performance of Morgan Fingerprints.
1. Introduction
Nonbonding inter- and intramolecular interactions such as hydrogen bonding, hydrophobic contacts, ionic interactions, π-stacking, and cation–π interactions1 are the main driving force behind many biological processes, from ligand–receptor binding to the folding of biopolymers.2,3 Nonbonding interactions occur in three-dimensional (3D) space, and their presence and strength strongly depend on interaction-type specific distance and angle requirements that need to be fulfilled by potential interaction partners. An accurate analysis and adequate computer representation of all relevant characteristics of these interactions is thus key for any computational drug design method like docking,4,5 pharmacophore modeling,6 and molecular dynamics (MD) simulations, as well as for the task of predicting derived physicochemical properties such as ligand–receptor binding energies and affinities.7,8 For the latter, the utilization of modern machine learning (ML) techniques has become increasingly popular in recent years.7,9−11 This can be attributed to the increased amount of available high-quality 3D structures12 of ligand–protein complexes including associated binding affinity data,13,14 which has enabled a new era in the development of ML-based binding affinity prediction and docking scoring functions.7,15 Benchmarking studies have shown15,16 that for docking applications, these scoring functions are able to outperform established classical scoring functions in several aspects of relevance and impressively demonstrate the ability of novel ML methods to achieve advancements even in such a long-standing research area. ML methods are based on mathematical algorithms and usually require that input data be supplied in a form that can be fed directly into the underlying mathematical computations.17 A one- or multidimensional array of integer or floating point values, a so-called descriptor, is commonly used, where each descriptor element provides a piece of significant information for the ML task at hand. For the successful application of ML methods in the context of structure-based drug design, the engineering of an appropriate set of descriptor features that is able to describe the complexity of chemical structures and their nonbonding interactions with sufficient but not too much detail18,19 represents a key step that needs to be taken with due care.7,8,17 Up to now, a wide panel of featurization strategies and resulting computer representations for the downstream processing of information about interactions between molecular structures has been devised. They differ in several aspects such as the primary data type of the representation, variability of the representation length, types of interactions captured, granularity of the stored information, nature of the input data, and method used to generate the representation.
A well-known class of representations with an already long history are interaction fingerprints (IFPs).20 Pioneering work was done by Deng et al., who introduced the SIFt fingerprint in 2004.21 SIFt represents each interacting amino acid by seven bits that encode predefined interaction types, including side chain, backbone, hydrophobic, polar, and H-bonding interactions. The approach was later extended by Mordalski et al., who added two bits to also cover aromatic and ionic interactions.22 Another variant developed by Marcou and Rognan23 encodes hydrophobic, aromatic face-to-face and edge-to-face, H-bond donor/acceptor, and cationic/anionic interactions, likewise with seven bits. However, the generation method allows for customization of the interaction definitions and thus greatly enhances the versatility of the approach by enabling the inclusion of less common interaction types like weak H-bonds, cation-π, and metal complexation.24 The Rognan group also presented a novel and universal method to convert ligand–protein structures into a simple 210-element integer vector that reflects the present molecular interaction patterns.25 Each detected interaction (hydrophobic, aromatic, H-bond, ionic, and metal complexation) is represented by a pseudo-atom centered on either the interacting ligand atom, the interacting protein atom, or the geometric center of the interacting atoms. Counting all possible triplets of interaction pseudo-atoms within six distance ranges, followed by a pruning step that reduces the full integer vector to the most frequent triplets, enables the definition of a simple, coordinate frame-invariant 210-element interaction pattern descriptor (TIFP).
The IFPs described above represent just a few examples of the wide variety of IFP generation algorithms and representations that have been developed in the last two decades. Other notable IFP implementations include PyPLIF,26 APIF,27 SILIRID,28 SPLIF,29 ProLIF,30 and PLECFP.31 The general utility and versatility of the IFP approach is demonstrated by numerous published applications, ranging from the classification of ligand–protein binding modes,32,33 the postprocessing of virtual screening and docking results,25,34 and the study of ligand unbinding using data from MD simulations35 to the prediction of ligand–receptor binding affinities.31,36
In this article, we introduce a novel IFP called GRADE (short for GRAIL-based DEscriptor). GRADE is a fixed-length descriptor that comes in a basic (35 elements) and an extended version (X-GRADE; 177 elements), which differ solely in the granularity of the encoded information (see Section 2). Contrary to most IFPs that characterize observed interactions by short bit strings or integer counts, GRADE represents information about nonbonding interactions between a ligand and receptor structure as a defined set of floating point values, which not only encode the presence or absence of different types of interactions but also quantify their “quality” with regard to the fulfillment of specific distance and angle constraints. The interaction scores provided by GRADE are mostly based on GRAIL feature interaction scores,37 which are calculated from the pharmacophoric representation of both the ligand and receptor structures. For a given ligand–receptor complex, GRADE can thus be calculated quickly and is therefore compatible with the analysis of MD trajectories. Furthermore, GRADE is not restricted to a specific kind of complex and does not depend on any residue naming or atom typing conventions.
Three successful applications of GRADE and X-GRADE are showcased in Section 4.
1. Dimensionality reduction was performed using Uniform Manifold Approximation and Projection (UMAP). GRADE/X-GRADE descriptors were used on the PDBbind data set to compare the spread of the core set against another publicly available molecular descriptor. This was done to evaluate the effectiveness in differentiating structurally and functionally dissimilar protein–ligand complexes.
2. Out-of-the-box machine learning models (e.g., random forest, gradient boosting) were trained using GRADE/X-GRADE descriptors on the PDBbind data set to estimate binding affinities. The predictive models were also tested on independent data sets to evaluate their generalization capabilities. GRADE’s performance was benchmarked against established scoring functions commonly used for docking pose scoring and refinement.
3. Three-dimensional quantitative structure–activity relationship (3D-QSAR) models were built for specific targets using both a proprietary insecticide data set provided by BASF SE and a publicly available data set for Cathepsin S. The performance of GRADE-based QSAR models was compared with models trained using traditional molecular fingerprints.
2. Descriptor Implementation
2.1. General Descriptor Anatomy
GRADE comes in two flavors, with different numbers of descriptor elements. The descriptor with the higher number of features, allowing for a more fine-grained characterization of observable nonbonding ligand–receptor interactions, is called Extended GRADE or short X-GRADE. The feature vectors of both descriptor versions consist of two parts: (i) a pose/receptor-independent list of features capturing ligand-specific characteristics relevant for binding and (ii) a pose-dependent part listing scores and energies describing the observed interactions between the ligand and the binding site environment residues. Details regarding the extent of the respective descriptor vector contributions can be found in Table 1.
Table 1. Principal Feature Vector Composition of GRADE and X-GRADE.
| feature count | GRADE | X-GRADE |
|---|---|---|
| ligand features | 13 | 31 |
| interaction features | 22 | 146 |
| total | 35 | 177 |
2.2. Pose-Independent Ligand Features
The pose-independent ligand-specific part of GRADE and X-GRADE feature vectors is composed of predicted physicochemical properties (TPSA,38 XlogP,39 total hydrophobicity), molecular graph properties (number of heavy atoms, rotatable bond count), and counts of pharmacophoric features (H-bond acceptors (HBA), H-bond donors (HBD), positively ionized groups (PI), negatively ionized groups (NI), aromatic rings (AR), hydrophobic atoms (H), halogen-bond donors (XBD), and halogen-bond acceptors (XBA)). Together, these descriptor vector elements provide information about the general binding capabilities of the ligand in terms of offered chemical features and also characterize the ligand with respect to entropic effects that impact the binding process. The pose-independent parts of GRADE and X-GRADE differ solely in the types of considered pharmacophoric features. GRADE provides only the counts of the standard chemical features implemented in the Chemical Data Processing Toolkit (CDPKit)40 (see list above), which the GRADE implementation is based upon. X-GRADE uses an extended feature set where HBA and HBD features are further subclassed by the Sybyl atom type41 of the parent atoms to allow for a more fine-grained characterization of H-bonding interactions. The list of calculated ligand features in the order in which they appear in the two descriptor variants is shown in Table 2.
Table 2. List of Pose-Independent Ligand Features Calculated for GRADE and X-GRADE.
| descriptor vector element index | GRADE feature | X-GRADE feature |
|---|---|---|
| 0 | PI feature count | PI feature count |
| 1 | NI feature count | NI feature count |
| 2 | AR feature count | AR feature count |
| 3 | H feature count | H feature count |
| 4 | HBDd feature count | HBDb,d feature count |
| 5 | HBA feature count | HBAb feature count |
| 6 | XBD feature count | XBD feature count |
| 7 | XBA feature count | XBA feature count |
| 8 | heavy atom count | N.3c,d HBD feature count |
| 9 | rotatable bond count | N.2c,d HBD feature count |
| 10 | total hydrophobic feature weight | N.arc HBD feature count |
| 11 | XlogP | N.amc,d HBD feature count |
| 12 | TPSA | N.pl3c,d HBD feature count |
| 13 | N.4c,d HBD feature count | |
| 14 | O.3c HBD feature count | |
| 15 | S.3c HBD feature count | |
| 16 | N.3c HBA feature count | |
| 17 | N.2c HBA feature count | |
| 18 | N.1c HBA feature count | |
| 19 | N.arc HBA feature count | |
| 20 | N.pl3c HBA feature count | |
| 21 | O.3c HBA feature count | |
| 22 | O.2c HBA feature count | |
| 23 | O.co2c HBA feature count | |
| 24 | S.3c HBA feature count | |
| 25 | S.2c HBA feature count | |
| 26 | heavy atom count | |
| 27 | rotatable bond count | |
| 28 | total hydrophobic feature weight | |
| 29 | XlogP | |
| 30 | TPSA | |
a: HBA = H-bond acceptors, HBD = H-bond donors, PI = positively ionized groups, NI = negatively ionized groups, AR = aromatic rings, H = hydrophobic atoms, XBD = halogen-bond donors, XBA = halogen-bond acceptors.
b: For HBA/HBD features with parent atoms not having one of the Sybyl types listed under footnote c.
c: N.1 = sp nitrogen, N.2 = sp2 nitrogen, N.3 = sp3 nitrogen, N.4 = positively charged sp3 nitrogen, N.ar = aromatic nitrogen, N.am = amide nitrogen, N.pl3 = trigonal planar nitrogen, O.2 = sp2 oxygen, O.3 = sp3 oxygen, O.co2 = oxygen in carboxylate and phosphate groups, S.2 = sp2 sulfur, and S.3 = sp3 sulfur.
d: Counted per bonded hydrogen, e.g., a –NH2 group contributes two HBDs.
CDPKit’s default hydrophobic feature generation functionality is based on Greene’s feature placement algorithm,42 which, if specific criteria are fulfilled, represents multiple neighboring lipophilic atoms by a single hydrophobic feature. For GRADE, a more fine-grained strategy has been implemented that places a feature on every heavy atom that is considered as being lipophilic. Doing so allows one to represent the shape of lipophilic ligand moieties in much more detail compared to the sparse set of features resulting from Greene’s method. For hydrophobic feature placement, all atoms that contribute to the logP of the ligand by an amount greater than or equal to 0.15 (this threshold gave the most reasonable results in a series of tests) are considered as lipophilic enough to be represented by a dedicated feature. The logP contribution of an atom is obtained with no extra cost in the course of predicting the ligand’s lipophilicity using an implementation of the XlogP V2 method by Wang et al.39 The logP increment of a hydrophobic feature’s parent atom then also represents the “weight” of the emitted feature. These weights serve as parameters for the subsequent calculation of hydrophobic feature pair interaction scores (see section 2.6) and the total hydrophobic feature weight GRADE feature (see Table 2).
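As an illustration of this weighting scheme, the sketch below selects hydrophobic feature atoms from per-atom logP increments and derives the "total hydrophobic feature weight" descriptor element from them. The container `atom_logp_contributions` and the helper names are hypothetical placeholders rather than the actual CDPKit API; only the 0.15 threshold and the summation are taken from the text above.

```python
# Minimal sketch of the hydrophobic feature weighting described above.
# `atom_logp_contributions` is a hypothetical list of per-atom XlogP increments
# (one value per heavy atom); it does not refer to an actual CDPKit function.

LOGP_THRESHOLD = 0.15  # atoms contributing >= 0.15 to XlogP are treated as lipophilic

def hydrophobic_feature_weights(atom_logp_contributions):
    """Return (atom index, feature weight) pairs for all emitted hydrophobic features."""
    return [(i, c) for i, c in enumerate(atom_logp_contributions) if c >= LOGP_THRESHOLD]

def total_hydrophobic_feature_weight(atom_logp_contributions):
    """Descriptor element 'total hydrophobic feature weight' (see Table 2)."""
    return sum(w for _, w in hydrophobic_feature_weights(atom_logp_contributions))
```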
2.3. Pose-Dependent Features
The pose-dependent part of GRADE and X-GRADE provides features that quantify different types of nonbonding interactions that can be observed between a ligand in a particular pose and complementary interaction partners in the binding site environment. The pose-dependent part itself can be further subdivided into three sections: (i) a set of features quantifying the degree of coverage of HBA/HBD atoms on the receptor surface by ligand atoms not able to interact (accounts for the energetically disadvantageous disruption of potential H-bonds with water molecules upon ligand binding), (ii) a list of GRAIL-based feature interaction scores, and (iii) a set of physics-based features providing information about the electrostatic potential between ligand and environment and the amount of van der Waals (VdW) attraction/repulsion. The complete set of pose-dependent features provided by the GRADE variant is listed in Table 3. Analogous to the pose-independent part, the calculation of GRAIL-based interaction scores and environment HBA/HBD coverages for the X-GRADE variant uses an extended set of HBA/HBD feature types, and both GRADE variants only differ at that point. The much larger set of 146 pose-dependent X-GRADE features resulting from this further subdivision of HBA and HBD types is listed in Table S1.
Table 3. List of Pose-Dependent Interaction Features Calculated for the GRADE Variant.
| descriptor vector element index | featurea | section |
|---|---|---|
| 13 | HBA coverage sum | environment HBA/HBD coverage |
| 14 | HBA coverage max. sum | |
| 15 | HBD coverage sum | |
| 16 | HBD coverage max. sum | |
| 17 | PI ↔ AR interaction score sum | GRAIL feature interaction scoresb |
| 18 | PI ↔ AR score max. sum | |
| 19 | AR ↔ PI score sum | |
| 20 | AR ↔ PI score max. sum | |
| 21 | H ↔ H score sum | |
| 22 | H ↔ H score max. sum | |
| 23 | AR ↔ AR score sum | |
| 24 | AR ↔ AR score max. sum | |
| 25 | HBD ↔ HBA score sum | |
| 26 | HBD ↔ HBA score max. sum | |
| 27 | HBA ↔ HBD score sum | |
| 28 | HBA ↔ HBD score max. sum | |
| 29 | XBD ↔ XBA score sum | |
| 30 | XBD ↔ XBA score max. sum | |
| 31 | electrostatic potential | energies/forces |
| 32 | sum of pairwise electrostatic forces | |
| 33 | VdW attraction | |
| 34 | VdW repulsion |
a: HBA = H-bond acceptors, HBD = H-bond donors, PI = positively ionized groups, NI = negatively ionized groups, AR = aromatic rings, H = hydrophobic atoms, XBD = halogen-bond donors, XBA = halogen-bond acceptors.
b: The first listed pharmacophoric feature type refers to ligand features of that type; the second refers to the corresponding features of the binding site environment.
2.4. Calculation of the Electrostatic Potential and Sum of Pairwise Forces
GRADE feature vector elements corresponding to the electrostatic potential and force between the ligand and receptor atoms are calculated by eq 1. The magnitudes of the obtained values are not correct in a physical sense, since they have not been scaled by a corresponding factor that accounts for the permittivity of the medium surrounding the atoms. This is not a problem since, e.g., an ML model built on GRADE will assign each descriptor element a model-specific weight anyway.
$$U_E = \sum_{i=1}^{N} \sum_{j=1}^{M} \frac{q_i\, q_j}{r_{ij}^{\,x}} \tag{1}$$
where UE is the intermolecular electrostatic potential between the atoms i of the ligand and the atoms j of the binding site if the distance exponent x is 1, and the sum of pairwise forces if x equals 2. rij is the spatial distance between atom i and atom j, qi the partial charge of atom i, qj the partial charge of atom j, N denotes the number of ligand atoms, and M the number of binding site atoms. Partial charges are calculated using the corresponding functionality provided by CDPKit’s Merck Molecular Force Field 9443,44 (MMFF94) implementation.
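The following sketch evaluates eq 1 for both descriptor elements, assuming ligand and binding-site coordinates and MMFF94 partial charges are already available as NumPy arrays; as noted above, no permittivity scaling is applied.

```python
import numpy as np

def electrostatic_term(lig_xyz, lig_q, env_xyz, env_q, x=1):
    """Unscaled electrostatic descriptor term of eq 1.

    x = 1 -> intermolecular electrostatic potential U_E
    x = 2 -> sum of pairwise electrostatic forces
    lig_xyz, env_xyz: (N, 3) and (M, 3) coordinate arrays;
    lig_q, env_q: partial charges (e.g., MMFF94) of the ligand and binding-site atoms.
    """
    # pairwise distances r_ij between all ligand and binding-site atoms
    r = np.linalg.norm(lig_xyz[:, None, :] - env_xyz[None, :, :], axis=-1)
    return float(np.sum(np.outer(lig_q, env_q) / r**x))
```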
2.5. Calculation of Attractive and Repulsive Van der Waals Energies
To model VdW interactions of ligand–receptor atom pairs, GRADE uses a Morse potential as a functional form. In contrast to a Lennard-Jones (LJ) 6–12 potential that is commonly used for this purpose, Morse potentials are less sensitive to deviations from “ideal” distances and give reasonable attractive/repulsive energies irrespective of the particular method (e.g., X-ray crystallography, NMR experiments, or MD simulations) that was used for the determination of the investigated 3D structure. Ideal distances (eq 5) and well depths (eq 4) for the evaluated atom pairs are derived from atom-type specific parameters employed by the universal force field (UFF),45 which are available for all naturally occurring chemical elements. The attractive and repulsive contribution to the VdW interaction between ligand and receptor are calculated as follows (eqs 2 and 3)
$$E_{\mathrm{VdW,attr}} = -\sum_{i=1}^{N} \sum_{j=1}^{M} 2\, D_{IJ}\, e^{-\alpha (r_{ij} - x_{IJ})} \tag{2}$$

$$E_{\mathrm{VdW,rep}} = \sum_{i=1}^{N} \sum_{j=1}^{M} D_{IJ}\, e^{-2 \alpha (r_{ij} - x_{IJ})} \tag{3}$$

$$D_{IJ} = \sqrt{D_I\, D_J} \tag{4}$$

$$x_{IJ} = \sqrt{x_I\, x_J} \tag{5}$$
where EVdW,attr is the attractive and EVdW,rep the repulsive contribution to the van der Waals potential between ligand atoms i and binding site atoms j, N is the number of ligand atoms, M the number of binding site atoms, rij the spatial distance between atom i and atom j, α a factor controlling the Morse potential well width, DIJ the well depth for the atom pair consisting of atom i with UFF atom type I and atom j with UFF atom type J, and xIJ the equilibrium distance of the respective atom pair. For the well width factor α, the value 1.1 was chosen, which delivered the best results in a series of tests. DI, DJ, xI, and xJ are values that have been tabulated for the respective UFF atom types and can be found in ref (45). Since UFF does not provide any special parameters for “polar” hydrogens, xI and xJ values are scaled by a factor of 0.5 if atom i and/or j is hydrogen bonded to nitrogen, oxygen, or sulfur.
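A minimal sketch of one plausible reading of eqs 2–5 is given below: the Morse potential is split into its attractive and repulsive exponential terms, and the pair parameters are obtained from the tabulated UFF atom-type values via geometric-mean combination rules. The function signature and the per-atom handling of polar hydrogens are illustrative assumptions, not the actual GRADE implementation.

```python
import numpy as np

ALPHA = 1.1  # Morse well-width factor (value reported above)

def morse_vdw_pair(r_ij, D_I, D_J, x_I, x_J, polar_h_i=False, polar_h_j=False):
    """Attractive/repulsive VdW contributions of one ligand-receptor atom pair (sketch of eqs 2-5).

    D_I, D_J: UFF well depths; x_I, x_J: UFF VdW distances of the two atom types.
    polar_h_*: True for a hydrogen bonded to N, O, or S (its x value is then halved).
    """
    if polar_h_i:
        x_I *= 0.5
    if polar_h_j:
        x_J *= 0.5
    D_IJ = np.sqrt(D_I * D_J)  # eq 4: geometric-mean well depth
    x_IJ = np.sqrt(x_I * x_J)  # eq 5: geometric-mean equilibrium distance
    e_attr = -2.0 * D_IJ * np.exp(-ALPHA * (r_ij - x_IJ))   # attractive Morse term
    e_rep = D_IJ * np.exp(-2.0 * ALPHA * (r_ij - x_IJ))     # repulsive Morse term
    return e_attr, e_rep

# E_VdW,attr and E_VdW,rep of eqs 2 and 3 are the sums of these terms over all N x M atom pairs.
```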
2.6. Calculation of GRAIL Interaction Score Features
Putative nonbonding interactions between pairs of complementary chemical features of the ligand and the binding site residues are quantified by means of GRAIL interaction scores.37 Here, the score FISij (eq 6) of the interaction between a ligand feature i and the receptor feature j is calculated as the product of distance and angle-dependent score contributions DSij and ASij, respectively, weighted by a factor Cij
$$FIS_{ij} = C_{ij} \cdot DS_{ij} \cdot AS_{ij} \tag{6}$$
The factor Cij is 1.0 for all types of considered interactions except for those between hydrophobic features, where Cij represents the product of the hydrophobic feature weights (see Section 2.2). More detailed information regarding the calculation of distance and angle-dependent score contributions can be found in SI section: “Calculation of GRAIL Pharmacophoric Feature Interaction Scores”. For each supported nonbonding interaction type, two GRADE feature values are calculated that quantify the total of the interactions of that type between the ligand in the given pose and any proximal receptor residues
$$IS_{AB,\mathrm{sum}} = \sum_{i=1}^{N} \sum_{j=1}^{M} FIS_{AB,ij} \tag{7}$$
where ISAB,sum is the sum of GRAIL scores FISAB,ij for the nonbonding interaction of type A–B that were calculated for all pairs of pharmacophoric ligand features i of type A and the binding site environment features j of type B, N is the number of ligand features of type A, and M is the number of binding site environment features of type B
$$IS_{AB,\mathrm{max.sum}} = \sum_{i=1}^{N} \max_{j = 1,\ldots,M} \left( FIS_{AB,ij} \right) \tag{8}$$
where ISAB,max.sum is the sum of the maximum GRAIL scores max(FISAB,ij) for the nonbonding interaction of type A–B that were calculated for each pharmacophoric ligand feature i of type A and the binding site environment features j of type B, N is the number of ligand features of type A, and M is the number of binding site environment features of type B.
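In code, both aggregation schemes reduce to simple operations on the N × M matrix of pairwise GRAIL scores; a sketch (assuming the FIS values have already been computed) is shown below. The binding site HBA/HBD coverage features of eqs 9 and 10 in Section 2.7 follow the same aggregation pattern, with the rows running over non-complementary ligand heavy atoms instead of pharmacophoric features.

```python
import numpy as np

def grail_interaction_features(fis_matrix):
    """GRADE features of eqs 7 and 8 from pairwise GRAIL scores.

    fis_matrix[i, j] = FIS score between ligand feature i (type A) and
    binding-site environment feature j (type B); shape (N, M).
    """
    fis = np.asarray(fis_matrix, dtype=float)
    if fis.size == 0:                              # feature type absent on either side
        return 0.0, 0.0
    is_sum = float(fis.sum())                      # eq 7: sum over all feature pairs
    is_max_sum = float(fis.max(axis=1).sum())      # eq 8: best environment partner per ligand feature
    return is_sum, is_max_sum
```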
2.7. Calculation of Binding Site HBA/HBD Atom Coverage
GRADE features dedicated to binding site HBA/HBD atom coverage quantify the extent to which these atoms are sterically blocked by ligand heavy atoms (e.g., carbon atoms) that are not able to form H-bonds. The higher the value of these GRADE features, the lower the number of potential energetically favorable H-bonds that could otherwise be formed between complementary ligand–receptor atom pairs. For the calculation of two coverage features per environment H-bonding feature type X (for the “simple” GRADE variant X is either HBA or HBD), eqs 9 and 10 are used
$$Cov_{X,\mathrm{sum}} = \sum_{i=1}^{N} \sum_{j=1}^{M} FIS_{AX,ij} \tag{9}$$
where CovX,sum is the sum of GRAIL scores FISAX,ij for the “pseudo” H-bonding interaction of type A–X that were calculated for all pairs of ligand heavy atoms i with no assigned feature type or a type A that is not complementary to X (e.g., any type that is not HBD if X = HBA) and the binding site environment H-bonding features j of type X, N is the number of ligand heavy atoms with no feature type or a type A not complementary to X and M is the number of binding site environment H-bonding features of type X
$$Cov_{X,\mathrm{max.sum}} = \sum_{i=1}^{N} \max_{j = 1,\ldots,M} \left( FIS_{AX,ij} \right) \tag{10}$$
where CovAX,max.sum is the sum of the maximum GRAIL scores max(FISAX,ij) for the “pseudo” H-bonding interaction of type A–X that were calculated for each ligand heavy atom i with no assigned feature type or a type A that is not complementary to X and the binding site environment H-bonding features j of type X, N is the number of ligand heavy atoms with no feature type or a type A not complementary to X and M is the number of binding site environment H-bonding features of type X.
3. Methods
3.1. Data Collection and Preparation
PDBbind v.2020 was used for Case Studies 1 and 2. The refined set of PDBbind was used to train machine learning models, the core set was used as a validation set, and the general set was used as a test set.46−49 The refined set provides either a Ki or a Kd value per complex. Additionally, Gibbs free energy (ΔG) values were calculated by the following formula
$$\Delta G = R\, T\, \ln K_{d} \tag{11}$$
where R represents the universal gas constant, T represents the temperature, and Kd is the dissociation constant. To calculate ΔG, a temperature of 298 K and the provided Kd values were used. Furthermore, for calculating ΔG, the simplifying assumption Ki = Kd was made in the case of complexes for which only Ki values are provided. Since this assumption is only true if the inhibition is competitive, three types of training sets were created from the refined set of PDBbind. The “all” set is the refined set of PDBbind after removing any PDB code that is part of the core set (5050 protein–ligand complexes). The “ki” set is a subset of the “all” set and contains only the PDB codes for which Ki values are present (2367 complexes). Lastly, the “kd” set contains only those “all” set entries for which Kd values are provided (2683 complexes).
The core set was left as is and used as a validation set. This means that there was no differentiation between Ki and Kd values. The same is true for the general set with the added complication that some protein–ligand complexes only provided IC50 values. These were also not differentiated from the Ki and Kd values. For training purposes, Ki and Kd values were converted to pKi and pKd, respectively.
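A sketch of the affinity conversions used here is given below; the unit of ΔG depends on the chosen value of the gas constant (kJ/mol in this example), and Ki values are treated as Kd values under the simplifying assumption stated above.

```python
import numpy as np

R = 8.314462618e-3  # universal gas constant in kJ/(mol K); use 1.987e-3 for kcal/(mol K)
T = 298.0           # temperature in K, as used above

def delta_g_from_kd(kd_molar):
    """Gibbs free energy of binding via eq 11 from a dissociation constant in mol/L.
    For complexes providing only a Ki value, the simplifying assumption Ki = Kd is applied."""
    return R * T * np.log(kd_molar)

def pk_from_k(k_molar):
    """Convert Ki/Kd values in mol/L to the pKi/pKd regression targets."""
    return -np.log10(k_molar)
```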
The PL-REX data set50 was employed as an additional high-quality external test set for Case Study 2.
For 3D-QSAR modeling, two data sets were prepared: (1) a proprietary set of 734 insecticidal compounds, kindly provided by BASF SE, and (2) the Cathepsin S data set published by Janssen as part of the Drug Design Data Resource Grand Challenge 451 in 2020.
The total BASF data set consists of 734 compounds with corresponding pKi values. All compounds were docked into a proprietary X-ray structure of the insecticide target using Schrödinger Glide with default settings.52−56 The 100 complexes that were measured most recently were then removed from the total data set and saved as a separate test set. The remaining 634 complexes were used for training.
The Janssen data set was downloaded from https://drugdesigndata.org/about/datasets/2028 (August 20, 2024). Compounds of both the free energy (FE) set and the score set were prepared using Schrödinger Maestro LigPrep for protonation at pH 7.4 and initial 3D conformation generation using default parameters (Maestro version 13.7.125).57 Protein complex structures for Cathepsin S were downloaded from the previous Grand Challenge 358 from https://drugdesigndata.org/about/datasets/2020. The 24 protein structures were prepared using Schrödinger Maestro, and the binding pocket geometries were analyzed. As no major differences were observed in the conformations of the binding pockets, the first structure (ID ZRQQ) was used for docking with Schrödinger Glide with default parameters. The prepared and docked Janssen Cathepsin S data set eventually contained 465 unique compounds with IC50 data; an additional free energy set of 32 compounds was used as an external test set. An overlap of 28 compounds was removed from the initial training set, leaving 437 compounds for actual training.
3.2. Machine Learning
All machine learning was done in a Python environment, using scikit-learn59 (version 1.2.2) and XGBoost60 (v. 1.7.6). The dimensionality reduction process in Case Study 1 was carried out with umap-learn61 (v. 0.5.5) using the default settings n_neighbors = 15, min_dist = 0.1, n_components = 2, and metric = euclidean.
Using the training sets as defined in Section 3.1, the following models (employing scikit-learn) were trained once using GRADE and once using X-GRADE.
LinearRegression
RidgeCV
LassoCV
ElasticNetCV
SVR
DecisionTreeRegressor
RandomForestRegressor
XGBRegressor
Before training these models, both descriptors were treated with a Standard Scaler to ensure that no single feature is assigned a higher importance because of the absolute scale.
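A minimal training sketch following this setup is shown below; all estimators are used with their default hyperparameters, and the GRADE/X-GRADE descriptor matrices are assumed to be precomputed.

```python
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, RidgeCV, LassoCV, ElasticNetCV
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor

def train_models(X_train, y_train):
    """Fit the out-of-the-box regressors listed above on standard-scaled descriptor vectors."""
    scaler = StandardScaler().fit(X_train)
    X_scaled = scaler.transform(X_train)
    models = {
        "LinearRegression": LinearRegression(),
        "Ridge": RidgeCV(),
        "Lasso": LassoCV(),
        "ElasticNet": ElasticNetCV(),
        "SVR": SVR(),
        "DecisionTree": DecisionTreeRegressor(),
        "RandomForest": RandomForestRegressor(),
        "XGBoost": XGBRegressor(),
    }
    for model in models.values():
        model.fit(X_scaled, y_train)
    return scaler, models
```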
Morgan Fingerprints (MFP, radius 2, bit vector length 2048) and RDKitPhysChem (using the default molecular descriptor function) were generated by means of RDKit62 (v. 2023.09.6). GRADEOPD only included the pose-dependent features of GRADE as described in Section 2.3. Protein–Ligand Extended Connectivity (PLEC) Fingerprints31 were created with default parameters as implemented in the Open Drug Discovery Toolkit (ODDT).63
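For reference, the Morgan fingerprint baseline can be generated with a few lines of RDKit; the SMILES input and the conversion to a NumPy array are illustrative choices, while the radius and bit-vector length match the settings above.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem

def morgan_fp(smiles, radius=2, n_bits=2048):
    """Morgan fingerprint bit vector as used for the MFP baseline."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    return np.array(fp)
```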
4. Results and Discussion
To investigate the utility of GRADE and X-GRADE in different application scenarios, three case studies were conducted. Case Study 1 focuses on the visualization of protein–ligand complexes in different subsets of the PDBbind v.2020 database46−49 in chemical space using the UMAP64 method. Case Study 2 aimed to estimate protein–ligand binding affinities using the introduced descriptors as input for the training of various machine learning models. In Case Study 3, the applicability of GRADE-based machine learning models for 3D-QSAR on a single target was assessed two times using a public and a proprietary data set.
4.1. Case Study 1: Dimensionality Reduction and Visualization
Dimensionality reduction was carried out using the AlignedUMAP algorithm, which relies on three assumptions: (1) the data is uniformly distributed on a Riemannian manifold, (2) the Riemannian metric can be approximated as locally constant, and (3) the manifold is locally connected.65
To generate UMAPs (in the following, used as the acronym for the obtained dimensionality reduction result and its visualization), a reducer needs to be trained on part of the data to learn the manifold. We used the refined set of PDBbind v.2020 for this purpose. To ensure that features with a higher variance do not dominate the representation, a standard scaler was applied to the data. Additionally, PCA was used to reduce the number of features to 35 in all cases to circumvent the “curse of dimensionality”.66 The trained reducer then projected the descriptors into two dimensions for convenient visualization of the represented points in chemical space.
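A sketch of this reduction pipeline is given below, using the plain umap.UMAP class with the settings from Section 3.2 (the AlignedUMAP variant mentioned above follows the same pattern); the descriptor matrices are assumed to be precomputed.

```python
import umap
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def fit_reducer(X_refined, n_pca=35):
    """Scale -> PCA(35) -> 2D UMAP, trained on the PDBbind refined-set descriptors."""
    scaler = StandardScaler().fit(X_refined)
    pca = PCA(n_components=n_pca).fit(scaler.transform(X_refined))
    reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2, metric="euclidean")
    reducer.fit(pca.transform(scaler.transform(X_refined)))
    return scaler, pca, reducer

def embed(scaler, pca, reducer, X):
    """Project further descriptor sets (e.g., core or general set) into the learned 2D space."""
    return reducer.transform(pca.transform(scaler.transform(X)))
```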
4.1.1. Divided by PDBbind Subsets
The first question we addressed was: How well does the descriptor reflect the differences in the chemical structures of different protein–ligand complexes? To investigate this question, we split the data into the general set, refined set, and core set. As outlined in a previous paper,49 the core set was compiled with high chemical diversity in mind. A GRADE-based representation of the core set is thus expected to also show a substantial spread in the descriptor space plane.
Figure 1 shows the UMAP representation obtained for the GRADE descriptor in the leftmost column using the data set split described above. The core set (yellow) is nicely distributed across the chemical space. The middle column shows the UMAP representation obtained for the X-GRADE descriptor; the distribution of the core set with this descriptor appears even closer to the ideal spread, with some singular points scattered around. These singular points might indicate that the resulting shape is partly defaulting to a Gaussian distribution due to the increased complexity of the descriptor.66 For the comparison with an established descriptor, the same workflow as before was used to calculate UMAPs using ΔvinaXGB67 (rightmost column). Again, looking at the core set, the distribution across the chemical space based on ΔvinaXGB seems comparable to those of GRADE and X-GRADE. Taking a closer look at the differences between the core set distributions of GRADE and X-GRADE, X-GRADE appears to yield a more uniform distribution. This might suggest that X-GRADE is the better version overall, but given the higher complexity of this descriptor, such behavior is to be expected and should not be overinterpreted.
Figure 1.
UMAP representations of the core set (yellow, second row), refined set (blue, third row), general set (red, fourth row) of PDBbind, and all together (top row) using GRADE, X-GRADE, and ΔvinaXGB, respectively, as described in Tables 2, 3, and S1.
4.1.2. Divided by the Enzyme Commission (EC) Number
Due to the fact that GRADE and X-GRADE are based on protein–ligand interactions, we decided to investigate additional possible subsets of the PDBbind database. For this purpose, we divided the general set into subsets 0–7 where the numeric value refers to the top-level EC number. Zero does not exist in this classification scheme and represents all complexes that had no EC number or multiple ones. Table S3 shows the number of protein–ligand complexes of each EC Class.
Figure S2 shows the UMAP representations of different EC classes calculated for GRADE, X-GRADE, and ΔvinaXGB, respectively. Even though some clustering can be observed in all representations, it is quite clear that none of these descriptors provide sufficient information to differentiate between EC numbers.
All previously discussed dimensionality reductions were also carried out using t-Distributed Stochastic Neighbor Embedding68 (t-SNE). The visualized results displayed even less differentiation between the different descriptor types, and the use of this technique did not lead to any qualitative improvements.
4.2. Case Study 2: Binding Affinity Estimation
In order to evaluate GRADE and X-GRADE regarding their applicability for protein–ligand binding affinity estimation tasks, different ML models were trained on the PDBbind database by using GRADE and X-GRADE vectors calculated for a subset of the provided protein–ligand complexes.
4.2.1. PDBbind Data Set
The models were trained as described in Sections 3.1 and 3.2. The results of the validation and test split by descriptor type are shown in Table 4.
Table 4. Best Performing Model of Each Model Type for the Validation and Test Set, Respectively.
| model type | trained on | MAE | MSE | SD | Pearson R | 90% conf. int. | Spearman R | descriptor type | set | size |
|---|---|---|---|---|---|---|---|---|---|---|
| RandomForest | pKall | 1.25 | 2.42 | 1.14 | 0.73 | [0.679–0.771] | 0.72 | GRADE | validation | 285 |
| SVR | pKall | 1.27 | 2.55 | 1.21 | 0.69 | [0.636–0.738] | 0.67 | |||
| XGBoost | pKall | 1.29 | 2.66 | 1.35 | 0.66 | [0.606–0.715] | 0.67 | |||
| Lasso | pKi | 1.35 | 2.79 | 1.18 | 0.65 | [0.588–0.701] | 0.65 | |||
| ElasticNet | pKi | 1.36 | 2.79 | 1.18 | 0.65 | [0.587–0.701] | 0.64 | |||
| Ridge | pKi | 1.36 | 2.81 | 1.22 | 0.64 | [0.580–0.695] | 0.64 | |||
| LinearRegression | pKi | 1.37 | 2.81 | 1.23 | 0.64 | [0.578–0.694] | 0.64 | |||
| DecisionTree | pKi | 1.58 | 3.96 | 1.78 | 0.51 | [0.433–0.578] | 0.48 | |||
| RandomForest | pKall | 1.23 | 2.37 | 1.13 | 0.74 | [0.691–0.780] | 0.73 | X-GRADE | validation | 285 |
| Lasso | pKi | 1.32 | 2.62 | 1.13 | 0.69 | [0.637–0.739] | 0.68 | |||
| SVR | pKi | 1.34 | 2.78 | 1.09 | 0.67 | [0.610–0.719] | 0.64 | |||
| XGBoost | pKall | 1.29 | 2.65 | 1.36 | 0.66 | [0.604–0.714] | 0.66 | |||
| ElasticNet | pKall | 1.39 | 2.93 | 0.95 | 0.65 | [0.590–0.703] | 0.66 | |||
| Ridge | pKall | 1.38 | 2.99 | 1.05 | 0.62 | [0.561–0.681] | 0.64 | |||
| DecisionTree | pKall | 1.57 | 4.03 | 1.86 | 0.51 | [0.438–0.582] | 0.49 | |||
| LinearRegression | pKi | >a | >a | >a | 0.01 | [−0.018–0.176] | 0.66 | |||
| RandomForest | pKd | 1.16 | 2.17 | 1.17 | 0.59 | [0.583–0.601] | 0.57 | GRADE | test | 14119 |
| SVR | pKall | 1.18 | 2.26 | 1.18 | 0.57 | [0.557–0.576] | 0.55 | |||
| XGBoost | pKall | 1.28 | 2.66 | 1.41 | 0.51 | [0.502–0.522] | 0.50 | |||
| DecisionTree | pKall | 1.58 | 4.07 | 1.85 | 0.39 | [0.383–0.406] | 0.39 | |||
| Lasso | pKall | 1.42 | 4.59 | 1.84 | 0.32 | [0.305–0.330] | 0.46 | |||
| ElasticNet | pKd | 1.42 | 4.68 | 1.87 | 0.31 | [0.302–0.327] | 0.46 | |||
| Ridge | pKall | 1.43 | 4.74 | 1.89 | 0.31 | [0.300–0.325] | 0.46 | |||
| LinearRegression | pKall | 1.43 | 4.79 | 1.90 | 0.31 | [0.298–0.323] | 0.46 | |||
| RandomForest | pKall | 1.15 | 2.15 | 1.17 | 0.59 | [0.584–0.602] | 0.58 | X-GRADE | test | 14119 |
| SVR | pKall | 1.19 | 2.29 | 1.18 | 0.56 | [0.550–0.569] | 0.55 | |||
| XGBoost | pKall | 1.25 | 2.60 | 1.40 | 0.52 | [0.513–0.533] | 0.52 | |||
| DecisionTree | pKd | 1.60 | 4.15 | 1.86 | 0.39 | [0.374–0.398] | 0.38 | |||
| Ridge | pKall | >a | >a | >a | 0.01 | [−0.006–0.022] | 0.47 | |||
| ElasticNet | pKall | >a | >a | >a | 0.01 | [−0.006–0.022] | 0.50 | |||
| Lasso | pKall | >a | >a | >a | 0.01 | [−0.006–0.022] | 0.48 | |||
| LinearRegression | pKd | >a | >a | >a | 0.00 | [−0.016–0.012] | 0.48 |
a: “>” represents an unreasonably large numeric value; such values are at least a few orders of magnitude larger than the other values of the same type.
The performance of these models ranges from a Pearson R of 0.74 to 0.00, depending on the type of model, the test/validation set, the type of employed descriptor, and the target affinity value for which the model was trained.
When it comes to models trained using the GRADE variant and predictions performed for the validation set, the Random Forest (RF)-based model delivered the best accuracy, followed by the support vector regression model (SVR), the gradient-boosted trees regression model (XGBoost), the linear models, and finally the decision tree regression model. The “trained on” column shows that, depending on the model, training on pKi or pKd values sometimes gives better results than training on pKall values, and vice versa, without any obvious systematic reason. X-GRADE shows similar trends with one important distinction: Lasso regression performed better in this case. This is somewhat surprising since Lasso regression is inherently linear and can therefore not model very complex relationships between the descriptor and binding affinity, especially since other, nonlinear models, namely, SVR and XGBoost, still perform worse. Pure linear regression models show the least promising performance for GRADE and X-GRADE; this is especially true for X-GRADE.
Looking at the test set in more detail, some deviations from the validation set are apparent. First, the accuracy, measured by the Pearson correlation coefficient, is significantly lower than that for the validation set. Although the larger set size of the general set might suggest more robust test results, they still might be skewed since the complexes in the general set of PDBbind are not uniformly distributed within the chemical space, as shown in Section 4.1. Additionally, the average quality of the data provided by the general set is significantly worse.46−49 Still, the trends stay similar to those of the validation set, with one exception: SVR seems to perform better than all other models, except the RF models. Figure 2 shows the best four models (two for GRADE and two for X-GRADE) for the validation and test sets according to Table 4, respectively. The full list of graphical representations of the models can be seen in Figures S3 and S4.
Figure 2.
Predictions of the two best performing GRADE and X-GRADE Models on the PDBbind core set and general set.
To get a deeper understanding of the accuracy of different models, we again split the data by the EC number. The most accurate models are shown in Table 5.
Table 5. Best Performing Model for Each EC Main Class Number on the Validation and Test Set, Respectively.
| model type | EC # | trained on | MAE | MSE | SD | Pearson R | 90% conf. int | Spearman R | descriptor | set | size |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SVR | 3 | pKall | 0.64 | 0.74 | 1.40 | 0.88 | [0.593–0.996] | 0.70 | GRADE | val. | 9 |
| LinearRegression | 2 | pKall | 1.35 | 2.84 | 1.08 | 0.86 | [0.508–0.966] | 0.60 | 8 | ||
| RandomForest | 5 | pKi | 1.20 | 2.37 | 1.25 | 0.64 | [0.567–0.702] | 0.63 | GRADE | test | 211 |
| RandomForest | 2 | pKall | 1.05 | 1.79 | 1.09 | 0.61 | [0.592–0.626] | 0.59 | 3806 | ||
| RandomForest | 4 | pKd | 1.27 | 2.85 | 1.23 | 0.60 | [0.541–0.653] | 0.63 | 357 | ||
| RandomForest | 1 | pKi | 1.04 | 1.76 | 1.12 | 0.60 | [0.538–0.650] | 0.59 | 357 | ||
| RandomForest | 3 | pKd | 1.23 | 2.46 | 1.25 | 0.60 | [0.576–0.614] | 0.56 | 3211 | ||
| RandomForest | 6 | pKall | 1.17 | 2.26 | 0.98 | 0.57 | [0.503–0.626] | 0.56 | 332 | ||
| DecisionTree | 3 | pKi | 0.72 | 0.56 | 2.06 | 0.94 | [0.803–0.985] | 0.92 | X-GRADE | val. | 9 |
| ElasticNet | 2 | pKd | 1.66 | 3.42 | 0.67 | 0.83 | [0.416–0.957] | 0.57 | 8 | ||
| RandomForest | 5 | pKi | 1.17 | 2.11 | 1.19 | 0.68 | [0.615–0.737] | 0.65 | X-GRADE | test | 211 |
| RandomForest | 1 | pKd | 1.04 | 1.74 | 1.11 | 0.60 | [0.543–0.655] | 0.60 | 357 | ||
| RandomForest | 4 | pKd | 1.26 | 2.83 | 1.23 | 0.60 | [0.543–0.654] | 0.64 | 357 | ||
| RandomForest | 3 | pKd | 1.22 | 2.46 | 1.25 | 0.60 | [0.580–0.617] | 0.57 | 3211 | ||
| RandomForest | 2 | pKd | 1.06 | 1.83 | 1.11 | 0.60 | [0.581–0.615] | 0.57 | 3806 | ||
| RandomForest | 6 | pKi | 1.15 | 2.29 | 1.00 | 0.55 | [0.486–0.612] | 0.55 | 332 |
It is noteworthy that the validation set for classes 2 and 3 contains fewer than 10 complexes, which is not enough to perform a reliable statistical evaluation. Other classes were not represented in the validation set at all. Looking at the test set, the sample size is no longer an issue, but the accuracy is back in the range of the better models in Table 4. This is also reflected by the 90% confidence interval of Pearson R.
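For reproducibility, a sketch of the reported metrics is given below; the confidence interval is computed via the Fisher z-transformation, which is an assumption on our side since the paper does not state how its intervals were obtained.

```python
import numpy as np
from scipy import stats

def pearson_r_with_ci(y_true, y_pred, conf=0.90):
    """Pearson R with an approximate confidence interval (Fisher z-transformation)."""
    r, _ = stats.pearsonr(y_true, y_pred)
    n = len(y_true)
    z = np.arctanh(r)                            # Fisher z-transform of r
    se = 1.0 / np.sqrt(n - 3)                    # standard error of z
    z_crit = stats.norm.ppf(0.5 + conf / 2.0)    # ~1.645 for a 90% interval
    lower, upper = np.tanh(z - z_crit * se), np.tanh(z + z_crit * se)
    return r, (lower, upper)
```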
Figure 3 shows the graphical representations of the best models in Table 5. All of the remaining representations are displayed in Figure S5.
Figure 3.
Predictions of the two best performing GRADE and X-GRADE models on the PDBbind general set are split by EC numbers.
4.2.2. PL-REX Data Set
All tests discussed above were performed using subsets of the PDBbind data. To further reinforce these results, we also tested with protein–ligand complexes provided by the PL-REX50 data set. These are of the highest quality and thus allow for a robust evaluation of the predictive capability of binding affinity estimation methods. The PL-REX data set features 10 different biologically relevant systems, namely, CA2, HIV-PR, CK2, AR, Cath-D, BACE1-D3R, JAK1, Trypsin, CDK2, and MMP12. Table 6 shows the best performing model for each system using GRADE and X-GRADE, respectively. Figures S6 and S7 show graphical representations of these models. Here, only ΔG values are of relevance since the data set only provides those.
Table 6. Best Performing Model for Each System in the PL-REX Data Set.
| model type | system | trained on | MAE | MSE | SD | Pearson R | 90% conf. int | Spearman R | descriptor | size |
|---|---|---|---|---|---|---|---|---|---|---|
| LinearRegression | Trypsin | ΔGKd | 1.12 | 2.21 | 0.74 | 0.92 | [0.805–0.968] | 0.80 | GRADE | 15 |
| ElasticNet | Cath-D | ΔGKd | 1.40 | 2.56 | 0.21 | 0.86 | [0.575–0.956] | 0.72 | 10 | |
| LinearRegression | JAK1 | ΔGKd | 3.92 | 16.30 | 0.30 | 0.82 | [0.536–0.935] | 0.88 | 12 | |
| ElasticNet | BACE1-D3R | ΔGKd | 0.75 | 1.04 | 0.54 | 0.73 | [0.432–0.880] | 0.71 | 16 | |
| Lasso | AR | ΔGKd | 0.86 | 0.97 | 0.30 | 0.72 | [0.395–0.887] | 0.57 | 14 | |
| RandomForest | CA2 | ΔGKall | 0.77 | 0.78 | 0.76 | 0.60 | [0.074–0.866] | 0.65 | 10 | |
| SVR | CK2 | ΔGpKi | 0.48 | 0.38 | 0.52 | 0.57 | [0.194–0.804] | 0.49 | 16 | |
| SVR | CDK2 | ΔGKall | 1.46 | 3.62 | 0.84 | 0.50 | [0.230–0.694] | 0.43 | 31 | |
| SVR | HIV-PR | ΔGKd | 1.70 | 4.51 | 0.55 | 0.42 | [0.075–0.680] | 0.45 | 22 | |
| XGBoost | MMP12 | ΔGall | 1.14 | 2.14 | 0.94 | 0.36 | [−0.046–0.666] | 0.18 | 18 | |
| Ridge | Cath-D | ΔGKd | 1.35 | 2.15 | 0.28 | 0.92 | [0.760–0.978] | 0.9 | X-GRADE | 10 |
| ElasticNet | Trypsin | ΔGKi | 1.18 | 2.05 | 0.73 | 0.84 | [0.638–0.936] | 0.73 | 15 | |
| SVR | JAK1 | ΔGKd | 3.77 | 14.81 | 0.69 | 0.82 | [0.545–0.936] | 0.66 | 12 | |
| DecisionTree | AR | ΔGKd | 0.94 | 1.65 | 1.94 | 0.80 | [0.544–0.922] | 0.68 | 14 | |
| Lasso | BACE1-D3R | ΔGKd | 0.88 | 1.04 | 0.50 | 0.76 | [0.490–0.895] | 0.84 | 16 | |
| SVR | CK2 | ΔGKall | 0.72 | 0.67 | 0.63 | 0.69 | [0.378–0.864] | 0.69 | 16 | |
| SVR | MMP12 | ΔGKd | 1.98 | 4.47 | 0.57 | 0.67 | [0.368–0.844] | 0.51 | 18 | |
| SVR | HIV-PR | ΔGKd | 1.99 | 5.99 | 0.44 | 0.57 | [0.264–0.772] | 0.42 | 22 | |
| RandomForest | CA2 | ΔGKi | 2.09 | 5.23 | 0.33 | 0.57 | [0.025–0.853] | 0.56 | 10 | |
| XGBoost | CDK2 | ΔGKi | 1.19 | 2.37 | 1.00 | 0.57 | [0.322–0.742] | 0.54 | 31 |
The resulting performance ranges from a Pearson R of 0.92 to 0.36, depending on the system. Surprisingly, some of these models perform significantly better than on the validation or test set extracted from PDBbind, and many systems prefer a linear model. The sample sizes are rather small, ranging from 10 to 31 complexes per system, but the 90% confidence interval of Pearson R indicates quite good results.
Figures 4 and 5 display the predictions of the best two models for GRADE and X-GRADE, respectively, as shown in Table 6.
Figure 4.
Predictions of the two best performing GRADE models on PL-REX data.
Figure 5.
Predictions of the two best performing X-GRADE models on PL-REX data.
To further investigate the performance on the PL-REX data set, UMAPs were created as described in Section 4.1 to see whether the systems are in the applicability domain defined by the training set.
Figures 6 and 7 show the descriptors calculated for the PDBbind refined set in gray and the descriptors for the different systems in PL-REX in color for GRADE and X-GRADE, respectively. The GRADE descriptor projection shows that most points of every system lie close together, and all systems fall well within the distribution of the training data. The X-GRADE descriptor projection looks less ordered. Points of a single system are often spread over the whole projection, providing additional evidence that some of the points might default to a Gaussian distribution.
Figure 6.
UMAP projection of the PL-REX data set in comparison to the PDBbind refined set using GRADE descriptors.
Figure 7.
UMAP projection of the PL-REX data set in comparison to the PDBbind refined set using X-GRADE descriptors.
4.2.3. Comparison to Existing Methods
The Comparative Assessment of Scoring Functions (CASF)16 protocol was used to benchmark our models against other binding affinity estimation techniques. Table 7 was taken from the CASF publication and modified to include the best performing models for GRADE and X-GRADE, respectively. Additionally, we benchmarked the model prediction speeds with precalculated descriptors. The workstation used for this benchmark employs an Intel Core i7-8700K CPU @ 3.70 GHz processor, 32GB RAM, and a GeForce GTX 1050 Ti graphics card.
Table 7. Comparative Assessment of Scoring Functions (CASF)16 Modified to Include GRADE and X-GRADE Binding Affinity Estimation and Their Corresponding Runtime.
| rank or train. set | descriptor | scoring function (sf) | core set (validation set) R | 90% confidence interval | runtime [s] | general set (test set) R | 90% confidence interval | runtime [s] |
|---|---|---|---|---|---|---|---|---|
| 1 | ΔVinaRF20 | 0.816 | [0.772–0.848] | 0.26367 | n.p.a.a | n.p.a. | n.p.a. | |
| pKall | X-GRADE | RF-based sf | 0.74 | [0.693–0.781] | 0.04971 | 0.60 | [0.586–0.604] | 0.21900 |
| pKall | GRADE | RF-based sf | 0.74 | [0.691–0.780] | 0.05361 | 0.59 | [0.586–0.604] | 0.18842 |
| pKall | PLEC | RF-based sf | 0.73 | [0.685–0.776] | 0.06532 | 0.57 | [0.561–0.580] | 1.30035 |
| 2 | X-Score | 0.631 | [0.571–0.682] | n.p.a. | n.p.a. | n.p.a. | n.p.a. | |
| 3 | X-ScoreHS | 0.629 | [0.568–0.679] | n.p.a. | n.p.a. | n.p.a. | n.p.a. | |
| 4 | ΔSAS | 0.625 | [0.568–0.675] | n.p.a. | n.p.a. | n.p.a. | n.p.a. | |
| 5 | X-ScoreHP | 0.621 | [0.560–0.675] | n.p.a. | n.p.a. | n.p.a. | n.p.a. | |
| 6 | ASP@Gold | 0.617 | [0.549–0.674] | n.p.a. | n.p.a. | n.p.a. | n.p.a. | |
| 6 | ChemPLP@Gold | 0.614 | [0.543–0.671] | n.p.a. | n.p.a. | n.p.a. | n.p.a. | |
a: n.p.a. = not publicly available.
Considering the Pearson R values, both the GRADE and X-GRADE models performed well in comparison to existing methods. Only ΔVinaRF20 outperforms our models in terms of accuracy. This already shows the reasonable predictive capability of these models. Additionally, the GRADE and X-GRADE models predict roughly five times faster than ΔVinaRF20. For the validation set (285 complexes), this speedup is already significant, but for the test set (14119 complexes), it becomes even more apparent: in less than the time that ΔVinaRF20 needs for predicting the validation set, our models predict the whole test set. To provide additional context, we also trained a PLEC descriptor model, which performed equally well in comparison to the GRADE and X-GRADE models, except in terms of runtime. However, a major advantage of GRADE and X-GRADE lies in their interpretability, which further underscores their utility.
We additionally decided to take a closer look at the descriptor calculation speed itself. The results for these benchmarks can be seen in Table 8.
Table 8. Time to Calculate Different Descriptors for the Core Set, Refined Set, and General Set of PDBbind, Respectively.
| descriptor | time [s] | # of complexes | success rate [%] | set |
|---|---|---|---|---|
| GRADE | 1268.13 | 285 | 100 | core set |
| GRADE + prot. | 2076.49 | 285 | 100 | |
| X-GRADE | 1198.65 | 285 | 100 | |
| X-GRADE + prot. | 2075.51 | 285 | 100 | |
| PLEC | 2322.67 | 277 | 97 | |
| ΔvinaXGB | 1156.60 | 285 | 100 | |
| GRADE | 14358.31 | 5316 | 100 | refined set |
| GRADE + prot. | 32222.36 | 5316 | 100 | |
| X-GRADE | 14004.57 | 5316 | 100 | |
| X-GRADE + prot. | 32907.01 | 5316 | 100 | |
| PLEC | 33540.84 | 5119 | 96 | |
| ΔvinaXGB | 20376.31 | 5207 | 98 | |
| GRADE | 34837.78 | 14127 | 100 | general set |
| GRADE + prot. | 81523.83 | 14127 | 100 | |
| X-GRADE | 33584.94 | 14127 | 100 | |
| X-GRADE + prot. | 81317.64 | 14127 | 100 | |
| PLEC | 110877.03 | 13563 | 96 | |
| ΔvinaXGB | 59909.96 | 13412 | 95 |
It is important to note that these results represent the total runtime of the entire descriptor calculation script. The times should therefore be interpreted cautiously, as they include the loading of the protein–ligand complexes in addition to the pure computation of the descriptor. GRADE and X-GRADE were each calculated both with and without protonation estimation, which had a substantial impact on the calculation time. However, when examining the results without protonation estimation, it is evident that GRADE and X-GRADE are at least comparably fast to the other descriptors when computing the core set and significantly faster for the refined and general sets. Furthermore, GRADE and X-GRADE were the only descriptors that could be successfully calculated for all of the provided protein–ligand complexes.
4.3. Case Study 3: 3D-Quantitative Structure–Activity Relationship (3D-QSAR)
To assess the performance of the GRADE descriptors in a target-specific predictive modeling scenario, 3D-QSAR models were trained as a third case study. Models were trained (1) on proprietary industry data provided by BASF SE for a validated insecticide target, (2) on a Cathepsin S data set published by Janssen in the Drug Design Data Resource Grand Challenge 4,51 and (3) on part of the PL-REX data set. Using these data sets, XGBoost and RF models as described in Section 3.2 were trained with the following descriptors (a cross-validation sketch for a combined descriptor is given after the list):
MFP
GRADEOPD
MFP + GRADEOPD
GRADE
MFP + GRADE
RDKitPhysChem
MFP + RDKitPhysChem
PLEC
MFP + PLEC
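The combined descriptors are simple concatenations of the individual feature vectors; the sketch below illustrates the 5-fold cross-validation of an "MFP + GRADE" RF model under that assumption, with default hyperparameters as in Section 3.2.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_validate

def evaluate_combined_descriptor(mfp_matrix, grade_matrix, y):
    """5-fold cross-validation of an RF regressor on a concatenated 'MFP + GRADE' descriptor."""
    X = np.hstack([mfp_matrix, grade_matrix])   # simple feature concatenation
    scores = cross_validate(RandomForestRegressor(), X, y, cv=5,
                            scoring=("r2", "neg_mean_absolute_error"))
    return {"R2": scores["test_r2"].mean(),
            "MAE": -scores["test_neg_mean_absolute_error"].mean()}
```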
4.3.1. Insecticide Data Set
Table 9 shows the coefficient of determination (R2) and the mean absolute error (MAE) of the 5-fold cross-validation described in the section above. Looking at the XGBoost models, MFP + PLEC performs best with an R2 of 0.732. Notably, all descriptors, including the MFP, seem to perform better in this 3D-QSAR scenario than a generalizable scoring function, which is most likely due to the narrower, target-specific nature of the task. The RF models show similar, but slightly worse, performance overall.
Table 9. 5-Fold Cross-Validation Performance of Different Descriptors Using Gradient Boosted Trees and a RF Regressor on the BASF Data Seta.
| descriptor type (XGBoost) | R2 | MAE |
|---|---|---|
| MFP | 0.716 ± 0.046 | 0.398 ± 0.018 |
| GRADEOPD | 0.448 ± 0.060 | 0.590 ± 0.025 |
| MFP + GRADEOPD | 0.704 ± 0.062 | 0.421 ± 0.045 |
| GRADE | 0.519 ± 0.061 | 0.551 ± 0.031 |
| MFP + GRADE | 0.716 ± 0.047 | 0.416 ± 0.023 |
| RDKitPhysChem | 0.659 ± 0.038 | 0.472 ± 0.027 |
| MFP + RDKitPhysChem | 0.716 ± 0.044 | 0.423 ± 0.028 |
| PLEC | 0.722 ± 0.018 | 0.402 ± 0.017 |
| MFP + PLEC | 0.732 ± 0.039 | 0.404 ± 0.035 |
| descriptor type (RF regressor) | R2 | MAE |
|---|---|---|
| MFP | 0.665 ± 0.020 | 0.457 ± 0.020 |
| GRADEOPD | 0.483 ± 0.014 | 0.579 ± 0.016 |
| MFP + GRADEOPD | 0.668 ± 0.028 | 0.464 ± 0.026 |
| GRADE | 0.509 ± 0.022 | 0.565 ± 0.020 |
| MFP + GRADE | 0.661 ± 0.029 | 0.467 ± 0.027 |
| RDKitPhysChem | 0.587 ± 0.028 | 0.510 ± 0.027 |
| MFP + RDKitPhysChem | 0.643 ± 0.019 | 0.477 ± 0.025 |
| PLEC | 0.687 ± 0.020 | 0.448 ± 0.016 |
| MFP + PLEC | 0.689 ± 0.021 | 0.448 ± 0.016 |
a: OPD = only pose-dependent. The best performing descriptors are marked in bold.
The results in Table 10 show the benefit of the added GRADE features. Here, the MFP + GRADE descriptor shows the overall best performance in combination with an RF regressor at an R2 of 0.545. Looking at the MFP performance, it is apparent that the XGBoost model was likely overfitted, and even adding GRADEOPD only marginally improves R2. Generally speaking, XGBoost performs worse on the test set than the RF regressor. Interestingly, even simple RDKit PhysChem descriptor models give practically useful results but are outperformed by the combinations of MFP + interaction fingerprints (PLEC or GRADE, respectively).
Table 10. Test Set Performance of Different Descriptors Using Gradient Boosted Trees and a RF Regressor on the BASF Data Seta.
| descriptor type | R2 (XGBoost) | R2 (RF regressor) |
|---|---|---|
| MFP | 0.079 | 0.424 |
| GRADEOPD | 0.235 | 0.381 |
| MFP + GRADEOPD | 0.160 | 0.516 |
| GRADE | 0.336 | 0.406 |
| MFP + GRADE | 0.373 | 0.545 |
| RDKitPhysChem | 0.400 | 0.431 |
| MFP + RDKitPhysChem | 0.300 | 0.488 |
| PLEC | 0.371 | 0.516 |
| MFP + PLEC | 0.323 | 0.522 |
a: OPD = only pose-dependent. The best performing descriptors are marked in bold.
Figure 8 shows the true values plotted against the predicted binding affinity values for the test set by using the RF regressor. It is apparent that MFP + GRADE (panel e) shows lower variance from the diagonal than the others. The results for the XGBoost model can be found in Figure S8. The complete graphical representation of Table 9 can be found in Figures S9 and S10. Interestingly, even the GRADEOPD descriptor, which only contains interaction features, already gives practically useful results, which can be further augmented by adding compound fingerprint data.
Figure 8.
True vs predicted values of the RF model on the insecticide test set. OPD is short for Only Pose-Dependent.
4.3.2. Cathepsin S
Next, the same training and validation procedure was applied to the public Janssen Cathepsin S data set.69 Using XGBoost, the best performance in the 5-fold cross-validation was achieved with MFP + GRADE with an R2 of 0.686. However, many other descriptor combinations show a similar level of accuracy, such as the RDKit PhysChem descriptors and PLEC, each with and without MFP, as well as plain MFP. Using RF, the best model employs the combination of MFP + RDKit PhysChem descriptors, although, again, many descriptor combinations perform at a similar level (Table 11).
Table 11. 5-Fold Cross-Validation Performance of Different Descriptors Using Gradient Boosted Trees and a RF Regressor on the Janssen Cathepsin S Data Seta.
| descriptor type (XGBoost) | R2 | MAE |
|---|---|---|
| MFP | 0.650 ± 0.120 | 0.211 ± 0.026 |
| GRADEOPD | 0.246 ± 0.084 | 0.415 ± 0.012 |
| MFP + GRADEOPD | 0.620 ± 0.071 | 0.274 ± 0.024 |
| GRADE | 0.534 ± 0.056 | 0.314 ± 0.007 |
| MFP + GRADE | 0.686 ± 0.066 | 0.247 ± 0.023 |
| RDKitPhysChem | 0.653 ± 0.096 | 0.217 ± 0.016 |
| MFP + RDKitPhysChem | 0.670 ± 0.113 | 0.204 ± 0.026 |
| PLEC | 0.664 ± 0.112 | 0.252 ± 0.023 |
| MFP + PLEC | 0.683 ± 0.089 | 0.250 ± 0.023 |
| descriptor type (RF regressor) | R2 | MAE |
|---|---|---|
| MFP | 0.664 ± 0.078 | 0.240 ± 0.018 |
| GRADEOPD | 0.351 ± 0.088 | 0.388 ± 0.022 |
| MFP + GRADEOPD | 0.612 ± 0.072 | 0.283 ± 0.016 |
| GRADE | 0.513 ± 0.055 | 0.327 ± 0.014 |
| MFP + GRADE | 0.640 ± 0.063 | 0.271 ± 0.014 |
| RDKitPhysChem | 0.665 ± 0.066 | 0.243 ± 0.016 |
| MFP + RDKitPhysChem | 0.680 ± 0.066 | 0.237 ± 0.014 |
| PLEC | 0.619 ± 0.058 | 0.280 ± 0.008 |
| MFP + PLEC | 0.634 ± 0.062 | 0.274 ± 0.008 |
a: OPD = only pose-dependent. The best performing descriptors are marked in bold.
Adding GRADE features to the fingerprint again shows a drastic increase in predictive power on the external test set, as previously observed on the BASF data set. Whereas the models based on MFPs show R2 values of 0.306 and 0.275 for XGBoost and RF, respectively, the most consistent boost in predictive power can be observed for MFP + GRADE with R2 values of 0.429 and 0.420, respectively (Table 12). Notably, even the RF model based on GRADE alone, without the addition of Morgan fingerprints, achieves a predictive power with an R2 of 0.431 on the external Cathepsin S test set. Surprisingly, PLEC showed very poor performance on the external test set.
Table 12. Test Set Performance of Different Descriptors Using Gradient Boosted Trees and an RF Regressor on the Janssen Cathepsin S Data Set.
| descriptor type | R2 (XGBoost) | R2 (RF regressor) |
|---|---|---|
| MFP | 0.306 | 0.275 |
| GRADEOPD | <0 | 0.265 |
| MFP + GRADEOPD | **0.442** | 0.345 |
| GRADE | 0.120 | **0.431** |
| MFP + GRADE | 0.429 | 0.420 |
| RDKitPhysChem | 0.366 | 0.415 |
| MFP + RDKitPhysChem | 0.433 | 0.413 |
| PLEC | <0 | <0 |
| MFP + PLEC | <0 | <0 |
OPD = Only Pose-Dependent. The best value in each column is highlighted in bold.
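The external test-set evaluation behind Table 12 can be sketched analogously: a model is fitted on the full training data and R2 is computed on the held-out external compounds. The feature matrices below are again random placeholders, not the Janssen Cathepsin S data.

```python
# Hedged sketch of an external test-set evaluation; placeholder data only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X_train, y_train = rng.random((300, 2083)), rng.random(300) * 10  # placeholder MFP + GRADE training data
X_test, y_test = rng.random((80, 2083)), rng.random(80) * 10      # placeholder external test compounds

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)
print("external test R2:", round(r2_score(y_test, model.predict(X_test)), 3))
```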
Figure 9 shows the true values plotted against the predicted binding affinity values for the test set using the RF regressor. The corresponding results for XGBoost are shown in Figure S11, and those for the training set in Figures S12 and S13.
Figure 9.
True vs predicted values of the RF model on the Cathepsin S test set. OPD is short for Only Pose-Dependent.
This trend of increased performance on the external test sets of both test cases indicates that the 3D information encoded in the GRADE features extends the generalizability and transferability of a prediction model beyond the immediate chemical space of the training data. It may therefore help to prevent overfitting on the training data, which is suggested by the discrepancy in performance of the MFP models between the 5-fold cross-validation and the external test.
4.3.3. PL-REX CDK2
Lastly, the CDK2 data taken from the previously used PL-REX data set were used for a 3D-QSAR approach to assess the predictive power of a dedicated target-specific model relative to the generic scoring function described in Case Study 2. Using the same combinations of MFPs and GRADE features, a 5-fold cross-validation on the set of 31 compounds was performed with both XGBoost and RF. With average Pearson r values ranging from 0.32 to 0.43, no convincing model could be generated, regardless of the training method or descriptor selection. These values are lower than the performance of the generic scoring function on this data set (Pearson r of 0.57, see Table 6). This underlines that, although the CDK2 data set is, with 31 compounds, the largest in the PL-REX set, training a useful 3D-QSAR model on a data set of this size is very likely not feasible, and a generic scoring function trained on PDBbind can outperform target-specific models in such a limited-data scenario. This further highlights the importance of high-quality generic models, as the availability of project-specific data is often still very limited, particularly in early drug discovery projects.
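For completeness, a per-fold Pearson r evaluation of a small target-specific data set can be sketched as follows; the 31 random samples merely stand in for the actual CDK2 compounds, so the printed correlation is not meaningful.

```python
# Hedged sketch of per-fold Pearson r for a small 3D-QSAR-style data set.
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.random((31, 2083))   # placeholder MFP + GRADE features
y = rng.random(31) * 10      # placeholder affinities

r_values = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = RandomForestRegressor(random_state=0).fit(X[train_idx], y[train_idx])
    r_values.append(pearsonr(y[test_idx], model.predict(X[test_idx]))[0])
print("mean Pearson r:", round(float(np.mean(r_values)), 2))
```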
5. Conclusions and Outlook
We introduced a novel protein–ligand interaction descriptor that is largely based on pharmacophoric feature interaction scores that were initially developed for the generation of GRAIL maps. Three case studies were conducted to get a reasonable estimate of the usability of GRADE/X-GRADE in various application scenarios.
The first case study focused on dimensionality reduction and visualization using the UMAP method. The ability of the descriptor to differentiate between different molecular structures in the chemical space seems to be comparable to that of other more complex descriptors.
The goal of the second case study was to develop and evaluate a general ML-based binding affinity estimation method for protein–ligand complexes using GRADE/X-GRADE. Even with out-of-the-box ML methods, reasonable accuracy was achieved, especially on an independent data set. These models were additionally compared to existing scoring functions, outperforming all but one in terms of accuracy while being significantly faster than the best-performing competitor. This shows the potential of GRADE for the development of well-performing docking pose postscoring and refinement scoring functions. However, for such use cases, one should consider performing hyperparameter optimization and feature selection to ensure that only the most important features enter model building.
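One possible starting point for the suggested hyperparameter optimization and feature selection is sketched below, using a randomized search over an RF pipeline with model-based feature selection; all parameter ranges and variable names are illustrative assumptions rather than settings used in this work.

```python
# Hedged sketch of combined feature selection and hyperparameter search.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.random((200, 2083))   # placeholder MFP + GRADE features
y = rng.random(200) * 10      # placeholder affinities

pipe = Pipeline([
    ("select", SelectFromModel(RandomForestRegressor(n_estimators=100, random_state=0))),
    ("model", RandomForestRegressor(random_state=0)),
])
search = RandomizedSearchCV(
    pipe,
    param_distributions={
        "select__threshold": ["mean", "median"],
        "model__n_estimators": [200, 500, 1000],
        "model__max_features": ["sqrt", 0.3, 1.0],
    },
    n_iter=10, cv=5, scoring="r2", random_state=0,
)
search.fit(X, y)
print("best CV R2:", round(search.best_score_, 3), search.best_params_)
```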
The third case study took a similar approach to the second one but with a different goal. Instead of building general binding affinity estimation models, 3D-QSAR models were trained for specific targets and chemical compound series. This was conducted on proprietary insecticide data provided by BASF SE as well as on a public data set for Cathepsin S. The results clearly indicate that the additional information provided by GRADE greatly increases the predictive power over plain MFPs, specifically on external test sets. Lastly, the smaller PL-REX CDK2 data set illustrated the limitations of the 3D-QSAR approach: a direct comparison of the PDBbind-trained generic scoring function (Case Study 2) with a dedicated CDK2 3D-QSAR model showed that a substantial amount of data is a prerequisite for an accurate 3D-QSAR model, as the generic affinity scoring function outperformed the 3D-QSAR model in this case.
Regarding further development of GRADE, it is planned to utilize precalculated GRAIL maps for descriptor generation, which will allow for a more efficient calculation of the GRADE/X-GRADE protein–ligand interaction features. Moreover, since all performed calculations are rather simple, the use of GPUs is feasible, which will help to reduce descriptor calculation times even further. Given the simplicity of the descriptor feature calculation in combination with low processing times and the demonstrated accuracy of derived binding affinity estimation models, GRADE/X-GRADE can be considered as a valuable contribution to the arsenal of existing IFP methods and thus beneficial for any downstream IFP applications.
Acknowledgments
The financial support by the Austrian Federal Ministry for Digital and Economic Affairs, the National Foundation for Research, Technology and Development, and the Christian Doppler Research Association is gratefully acknowledged, as are the financial support and scientific expertise of BASF and Boehringer Ingelheim.
Data Availability Statement
Python scripts for the generation of GRADE/X-GRADE feature vectors for PDBbind database entries and arbitrary ligand/protein pairs, Python code for the evaluation and testing of GRADE/X-GRADE, as well as all generated intermediate and final data files can be downloaded from the GitHub repository: https://github.com/molinfo-vienna/GRADE.
Supporting Information Available
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.4c01902.
List of pose-dependent interaction features calculated for the X-GRADE variant (Table S1), schematic General Bell Function (Figure S1), distance and angle scoring functions, associated value ranges, and interaction geometries for all nonbonding interaction types supported by the current implementation of the GRAIL method (Table S2), number of protein–ligand complexes in each EC class for the PDBbind Data set (Table S3), UMAP representations of each enzyme class in PDBbind (Figure S2), predicted vs true binding affinities of all best performing GRADE and X-GRADE models on the PDBbind core set (Figure S3), predicted vs true affinities of all best performing GRADE and X-GRADE models on the PDBbind general set (Figure S4), predicted vs true affinities of all best performing GRADE and X-GRADE models on the PDBbind general set split by EC number (Figure S5), predicted vs true affinities of all best performing GRADE models on the PL-REX data set (Figure S6), predicted vs real affinities of all best performing X-GRADE models on the PL-REX data set (Figure S7), true vs predicted values of the XGBoost model on the test set of the insecticide data set (Figure S8), true vs predicted value of the XGBoost model on the insecticide data set (Figure S9), true vs predicted value of the RF model on the insecticide data set (Figure S10), true vs predicted value of the XGBoost model on the test set of the Cathepsin S data set (Figure S11), true vs predicted value of the XGBoost model on the Cathepsin S data set (Figure S12), and true vs predicted value of the RF model on the Cathepsin S data set (Figure S13) (PDF)
Author Contributions
∥ C.F. and T.S. contributed equally to this work. C.F. wrote the manuscript, performed research, and implemented the GRADE/X-GRADE evaluation code. T.S. designed and implemented GRADE/X-GRADE, is the main developer of CDPKit, performed research, cosupervised C.F., and contributed to the writing of the manuscript. B.M. performed research, cosupervised, contributed to writing, and performed proofreading of the manuscript. T.L. supervised C.F., is the PI of the group, and performed proofreading of the manuscript. K.-J.S. provided valuable scientific input and performed proofreading of the manuscript. C.F. and T.S. contributed equally to the writing of the manuscript. The submitted manuscript was approved by all authors.
The authors declare no competing financial interest.
References
- Salentin S.; Haupt V. J.; Daminelli S.; Schroeder M. Polypharmacology rescored: Protein-ligand interaction profiles for remote binding site similarity assessment. Progress in biophysics and molecular biology 2014, 116 (2–3), 174–186. 10.1016/j.pbiomolbio.2014.05.006. [DOI] [PubMed] [Google Scholar]
- Lehn J. M. Perspectives in supramolecular chemistry–from molecular recognition towards molecular information processing and self-organization. Angew. Chem., Int. Ed. Engl. 1990, 29 (11), 1304–1319. 10.1002/anie.199013041. [DOI] [Google Scholar]
- Persch E.; Dumele O.; Diederich F. Molecular recognition in chemical and biological systems. Angew. Chem., Int. Ed. 2015, 54 (11), 3290–3327. 10.1002/anie.201408487. [DOI] [PubMed] [Google Scholar]
- Coumar M. S. Molecular docking for computer-aided drug design: Fundamentals, techniques, resources and applications; Academic Press, 2021. [Google Scholar]
- Kaczor A. A. Protein-protein docking: Methods and protocols. In Methods in Molecular Biology; Springer, 2024 10.1007/978-1-0716-3985-6. [DOI] [Google Scholar]
- Langer T.; Hoffmann R. D. Pharmacophores and pharmacophore searches. In Methods and Principles in Medicinal Chemistry; Wiley, 2006 10.1002/3527609164. [DOI] [Google Scholar]
- Harren T.; Gutermuth T.; Grebner C.; Hessler G.; Rarey M. Modern machine-learning for binding affinity estimation of protein–ligand complexes: Progress, opportunities, and challenges. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2024, 14 (3), e1716 10.1002/wcms.1716. [DOI] [Google Scholar]
- Xiong G.; Shen C.; Yang Z.; Jiang D.; Liu S.; Lu A.; Chen X.; Hou T.; Cao D. Featurization strategies for protein–ligand interactions and their applications in scoring function development. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2022, 12 (2), e1567 10.1002/wcms.1567. [DOI] [Google Scholar]
- Jiménez J.; Skalic M.; Martinez-Rosell G.; De Fabritiis G. Kdeep: protein-ligand absolute binding affinity prediction via 3d-convolutional neural networks. J. Chem. Inf. Model. 2018, 58 (2), 287–296. 10.1021/acs.jcim.7b00650. [DOI] [PubMed] [Google Scholar]
- Feinberg E. N.; Sur D.; Wu Z.; Husic B. E.; Mai H.; Li Y.; Sun S.; Yang J.; Ramsundar B.; Pande V. S. Potentialnet for molecular property prediction. ACS Cent. Sci. 2018, 4 (11), 1520–1530. 10.1021/acscentsci.8b00507. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nguyen D. D.; Cang Z.; Wu K.; Wang M.; Cao Y.; Wei G. W. Mathematical deep learning for pose and binding affinity prediction and ranking in d3r grand challenges. J. Comput.-Aided Mol. Des. 2019, 33, 71–82. 10.1007/s10822-018-0146-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Massa W.; Gould R. Crystal structure determination, 2010. URL https://books.google.at/books?id/TvB7THSBO7IC.
- Li J.; Guan X.; Zhang O.; Sun K.; Wang Y.; Bagni D.; Head-Gordon T. Leak proof pdbbind: A reorganized dataset of protein-ligand complexes for more generalizable binding affinity prediction. arXiv preprint arXiv:2308.09639, 2023.
- Wang R.; Fang X.; Lu Y.; Wang S. The pdbbind database: Collection of binding affinities for protein- ligand complexes with known three-dimensional structures. J. Med. Chem. 2004, 47 (12), 2977–2980. 10.1021/jm030580l. [DOI] [PubMed] [Google Scholar]
- Ain Q. U.; Aleksandrova A.; Roessler F. D.; Ballester P. J. Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2015, 5 (6), 405–424. 10.1002/wcms.1225. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Su M.; Yang Q.; Du Y.; Feng G.; Liu Z.; Li Y.; Wang R. Comparative assessment of scoring functions: the casf-2016 update. J. Chem. Inf. Model. 2019, 59 (2), 895–913. 10.1021/acs.jcim.8b00545. [DOI] [PubMed] [Google Scholar]
- Stuart S.; Watchorn J.; Gu F. X. Sizing up feature descriptors for macromolecular machine learning with polymeric biomaterials. npj Comput. Mater. 2023, 9 (1), 102 10.1038/s41524-023-01040-5. [DOI] [Google Scholar]
- Hughes G. On the mean accuracy of statistical pattern recognizers. IEEE Trans. Inf. Theory 1968, 14 (1), 55–63. 10.1109/TIT.1968.1054102. [DOI] [Google Scholar]
- Trunk G. V. A problem of dimensionality: A simple example. IEEE Trans. Pattern Anal. Mach. Intell. 1979, PAMI-1, 306–307. 10.1109/TPAMI.1979.4766926. [DOI] [PubMed] [Google Scholar]
- Li Z.; Huang R.; Xia M.; Patterson T. A.; Hong H. Fingerprinting interactions between proteins and ligands for facilitating machine learning in drug discovery. Biomolecules 2024, 14 (1), 72. 10.3390/biom14010072. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Deng Z.; Chuaqui C.; Singh J. Structural interaction fingerprint (sift): a novel method for analyzing three-dimensional protein- ligand binding interactions. J. Med. Chem. 2004, 47 (2), 337–344. 10.1021/jm030331x. [DOI] [PubMed] [Google Scholar]
- Mordalski S.; Kosciolek T.; Kristiansen K.; Sylte I.; Bojarski A. J. Protein binding site analysis by means of structural interaction fingerprint patterns. Bioorg. Med. Chem. Lett. 2011, 21 (22), 6816–6819. 10.1016/j.bmcl.2011.09.027. [DOI] [PubMed] [Google Scholar]
- Marcou G.; Rognan D. Optimizing fragment and scaffold docking by use of molecular interaction fingerprints. J. Chem. Inf. Model. 2007, 47 (1), 195–207. 10.1021/ci600342e. [DOI] [PubMed] [Google Scholar]
- Da Silva F.; Desaphy J.; Rognan D. Ichem: a versatile toolkit for detecting, comparing, and predicting protein-ligand interactions. ChemMedChem 2018, 13 (6), 507–510. 10.1002/cmdc.201700505. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Desaphy J.; Raimbaud E.; Ducrot P.; Rognan D. Encoding protein-ligand interaction patterns in fingerprints and graphs. J. Chem. Inf. Model. 2013, 53 (3), 623–637. 10.1021/ci300566n. [DOI] [PubMed] [Google Scholar]
- Radifar M.; Yuniarti N.; Istyastono E. P. Pyplif: Python-based protein-ligand interaction fingerprinting. Bioinformation 2013, 9 (6), 325. 10.6026/97320630009325. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pérez-Nueno V. I.; Rabal O.; Borrell J. I.; Teixidó J. Apif: a new interaction fingerprint based on atom pairs and its application to virtual screening. J. Chem. Inf. Model. 2009, 49 (5), 1245–1260. 10.1021/ci900043r. [DOI] [PubMed] [Google Scholar]
- Chupakhin V.; Marcou G.; Gaspar H.; Varnek A. Simple ligand-receptor interaction descriptor (silirid) for alignment-free binding site comparison. Comput. Struct. Biotechnol. J. 2014, 10 (16), 33–37. 10.1016/j.csbj.2014.05.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Da C.; Kireev D. Structural protein-ligand interaction fingerprints (splif) for structure-based virtual screening: method and benchmark study. J. Chem. Inf. Model. 2014, 54 (9), 2555–2561. 10.1021/ci500319f. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bouysset C.; Fiorucci S. Prolif: a library to encode molecular interactions as fingerprints. J. Cheminf. 2021, 13 (1), 72 10.1186/s13321-021-00548-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wójcikowski M.; Kukiełka M.; Stepniewska-Dziubinska M. M.; Siedlecki P. Development of a protein-ligand extended connectivity (plec) fingerprint and its application for binding affinity predictions. Bioinformatics 2019, 35 (8), 1334–1341. 10.1093/bioinformatics/bty757. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rodríguez-Pérez R.; Miljković F.; Bajorath J. Assessing the information content of structural and protein-ligand interaction representations for the classification of kinase inhibitor binding modes via machine learning and active learning. J. Cheminf. 2020, 12, 36 10.1186/s13321-020-00434-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mpamhanga C. P.; Chen B.; McLay I. M.; Willett P. Knowledge-based interaction fingerprint scoring: a simple method for improving the effectiveness of fast scoring functions. J. Chem. Inf. Model. 2006, 46 (2), 686–698. 10.1021/ci050420d. [DOI] [PubMed] [Google Scholar]
- de Graaf C.; Kooistra A. J.; Vischer H. F.; Katritch V.; Kuijer M.; Shiroishi M.; Iwata S.; Shimamura T.; Stevens R. C.; de Esch I. J. P.; Leurs R. Crystal structure-based virtual screening for fragment-like ligands of the human histamine h1 receptor. J. Med. Chem. 2011, 54 (23), 8195–8206. 10.1021/jm2011589. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kokh D. B.; Doser B.; Richter S.; Ormersbach F.; Cheng X.; Wade R. C. A workflow for exploring ligand dissociation from a macromolecule: Efficient random acceleration molecular dynamics simulation and interaction fingerprint analysis of ligand trajectories. J. Chem. Phys. 2020, 153, 125102 10.1063/5.0019088. [DOI] [PubMed] [Google Scholar]
- Wang D. D.; Chan M. T.; Yan H. Structure-based protein-ligand interaction fingerprints for binding affinity prediction. Comput. Struct. Biotechnol. J. 2021, 19, 6291–6300. 10.1016/j.csbj.2021.11.018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schuetz D. A.; Seidel T.; Garon A.; Martini R.; Körbel M.; Ecker G. F.; Langer T. Grail: Grids of pharmacophore interaction fields. J. Chem. Theory Comput. 2018, 14 (9), 4958–4970. 10.1021/acs.jctc.8b00495. [DOI] [PubMed] [Google Scholar]
- Ertl P.; Rohde B.; Selzer P. Fast calculation of molecular polar surface area as a sum of fragment-based contributions and its application to the prediction of drug transport properties. J. Med. Chem. 2000, 43 (20), 3714–3717. 10.1021/jm000942e. [DOI] [PubMed] [Google Scholar]
- Wang R.; Gao Y.; Lai L. Calculating partition coefficient by atom-additive method. Perspect. Drug Discovery Des. 2000, 19, 47–66. 10.1023/A:1008763405023. [DOI] [Google Scholar]
- Seidel T.; Wieder O. Cdpkit, https://github.com/molinfo-vienna/cdpkit. URL https://cdpkit.org/.
- Clark M.; Cramer R. D. III; Van Opdenbosch N. Validation of the general purpose tripos 5.2 force field. J. Comput. Chem. 1989, 10 (8), 982–1012. 10.1002/jcc.540100804. [DOI] [Google Scholar]
- Greene J.; Kahn S.; Savoj H.; Sprague P.; Teig S. Chemical function queries for 3d database search. J. Chem. Inf. Comput. Sci. 1994, 34 (6), 1297–1308. 10.1021/ci00022a012. [DOI] [Google Scholar]
- Halgren T. A. Merck molecular force field. i. basis, form, scope, parameterization, and performance of mmff94. J. Comput. Chem. 1996, 17 (5–6), 490–519. 10.1002/(SICI)1096-987X(199604)17:5/6<490::AID-JCC1>3.0.CO;2-P. [DOI] [Google Scholar]
- Halgren T. A. Merck molecular force field. ii. mmff94 van der waals and electrostatic parameters for intermolecular interactions. J. Comput. Chem. 1996, 17 (5–6), 520–552. 10.1002/(SICI)1096-987X(199604)17:5/6<520::AID-JCC2>3.0.CO;2-W. [DOI] [Google Scholar]
- Rappé A. K.; Casewit C. J.; Colwell K.; Goddard W. A. III; Skiff W. M. Uff, a full periodic table force field for molecular mechanics and molecular dynamics simulations. J. Am. Chem. Soc. 1992, 114 (25), 10024–10035. 10.1021/ja00051a040. [DOI] [Google Scholar]
- Cheng T.; Li X.; Li Y.; Liu Z.; Wang R. Comparative assessment of scoring functions on a diverse test set. J. Chem. Inf. Model. 2009, 49 (4), 1079–1093. 10.1021/ci9000053. [DOI] [PubMed] [Google Scholar]
- Li Y.; Liu Z.; Li J.; Han L.; Liu J.; Zhao Z.; Wang R. Comparative assessment of scoring functions on an updated benchmark: 1. compilation of the test set. J. Chem. Inf. Model. 2014, 54 (6), 1700–1716. 10.1021/ci500080q. [DOI] [PubMed] [Google Scholar]
- Liu Z.; Li Y.; Han L.; Li J.; Liu J.; Zhao Z.; Nie W.; Liu Y.; Wang R. Pdb-wide collection of binding data: current status of the pdbbind database. Bioinformatics 2015, 31 (3), 405–412. 10.1093/bioinformatics/btu626. [DOI] [PubMed] [Google Scholar]
- Liu Z.; Su M.; Han L.; Liu J.; Yang Q.; Li Y.; Wang R. Forging the basis for developing protein-ligand interaction scoring functions. Acc. Chem. Res. 2017, 50 (2), 302–309. 10.1021/acs.accounts.6b00491. [DOI] [PubMed] [Google Scholar]
- Pecina A.; Fanfrlík J.; Lepšík M.; Řezáč J. Sqm2.20: Semiempirical quantum-mechanical scoring function yields dft-quality protein-ligand binding affinity predictions in minutes. Nat. Commun. 2024, 15 (1), 1127 10.1038/s41467-024-45431-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gilson M. K.; Amaro R. E. Drug design data resource, grand challenge 4, second of two issues. J. Comput.-Aided Mol. Des. 2020, 34 (2), 97. 10.1007/s10822-020-00296-z. [DOI] [PubMed] [Google Scholar]
- Yang Y.; Yao K.; Repasky M. P.; Leswing K.; Abel R.; Shoichet B. K.; Jerome S. V. Efficient exploration of chemical space with docking and deep learning. J. Chem. Theory Comput. 2021, 17 (11), 7106–7119. 10.1021/acs.jctc.1c00810. [DOI] [PubMed] [Google Scholar]
- Friesner R. A.; Murphy R. B.; Repasky M. P.; Frye L. L.; Greenwood J. R.; Halgren T. A.; Sanschagrin P. C.; Mainz D. T. Extra precision glide: Docking and scoring incorporating a model of hydrophobic enclosure for protein- ligand complexes. J. Med. Chem. 2006, 49 (21), 6177–6196. 10.1021/jm051256o. [DOI] [PubMed] [Google Scholar]
- Halgren T. A.; Murphy R. B.; Friesner R. A.; Beard H. S.; Frye L. L.; Pollard W. T.; Banks J. L. Glide: a new approach for rapid, accurate docking and scoring. 2. enrichment factors in database screening. J. Med. Chem. 2004, 47 (7), 1750–1759. 10.1021/jm030644s. [DOI] [PubMed] [Google Scholar]
- Friesner R. A.; Banks J. L.; Murphy R. B.; Halgren T. A.; Klicic J. J.; Mainz D. T.; Repasky M. P.; Knoll E. H.; Shelley M.; Perry J. K.; et al. Glide: a new approach for rapid, accurate docking and scoring. 1. method and assessment of docking accuracy. J. Med. Chem. 2004, 47 (7), 1739–1749. 10.1021/jm0306430. [DOI] [PubMed] [Google Scholar]
- Schrödinger release 2024–3: Glide; Schrödinger, LLC: New York, NY, 2024.
- Schrödinger release 2023–3, maestro; Schrödinger, LLC: New York, NY, 2024.
- Gaieb Z.; Parks C. D.; Chiu M.; Yang H.; Shao C.; Walters W. P.; Lambert M. H.; Nevins N.; Bembenek S. D.; Ameriks M. K.; et al. D3r grand challenge 3: blind prediction of protein-ligand poses and affinity rankings. J. Comput.-Aided Mol. Des. 2019, 33, 1–18. 10.1007/s10822-018-0180-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pedregosa F.; Varoquaux G.; Gramfort A.; Michel V.; Thirion B.; Grisel O.; Blondel M.; Prettenhofer P.; Weiss R.; Dubourg V.; Vanderplas J.; Passos A.; Cournapeau D.; Brucher M.; Perrot M.; Duchesnay E. Scikit-learn: Machine learning in Python. J. Mach. Learning Res. 2011, 12, 2825–2830. [Google Scholar]
- Chen T.; Guestrin C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016; pp 785–794.
- McInnes L.; Healy J.; Melville J. Umap: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426, 2018. [Google Scholar]
- Landrum G.; Tosco P.; Kelley B.; Ric; Cosgrove D.; sriniker; Vianello R.; gedeck; NadineSchneider; Jones G.; Kawashima E.; Nealschneider D.; Dalke A.; Cole B.; Swain M.; Turk S.; Savelev A.; Vaucher A.; Wójcikowski M.; Take I.; Scalfani V. F.; Probst D.; Ujihara K.; Walker R.; guillaume godin; Pahl A.; Lehtivarjo J.; Berenger F.; strets123; jasondbiggs. rdkit/rdkit: 2023_09_6 (Q3 2023) release, 2024. 10.5281/zenodo.10793672. [DOI]
- Wójcikowski M.; Zielenkiewicz P.; Siedlecki P. Open drug discovery toolkit (oddt): a new open-source player in the drug discovery field. J. Cheminf. 2015, 7, 26 10.1186/s13321-015-0078-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sainburg T.; McInnes L.; Gentner T. Q. Parametric umap embeddings for representation and semisupervised learning. Neural Comput. 2021, 33 (11), 2881–2907. 10.1162/neco_a_01434. [DOI] [PMC free article] [PubMed] [Google Scholar]
- McInnes L. Umap: Uniform manifold approximation and projection for dimension reduction - documentation. URI: https://umap-learn.readthedocs.io/en/latest/.
- Vershynin R. High-dimensional probability: An introduction with applications in data science; Cambridge University Press, 2018. [Google Scholar]
- Lu J.; Hou X.; Wang C.; Zhang Y. Incorporating explicit water molecules and ligand conformation stability in machine-learning scoring functions. J. Chem. Inf. Model. 2019, 59 (11), 4540–4549. 10.1021/acs.jcim.9b00645. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Van der Maaten L.; Hinton G. Visualizing data using t-sne. J. Mach. Learning Res. 2008, 9 (11), 2579–2605. [Google Scholar]
- Parks C. D.; Gaieb Z.; Chiu M.; Yang H.; Shao C.; Walters W. P.; Jansen J. M.; McGaughey G.; Lewis R. A.; Bembenek S. D.; et al. D3r grand challenge 4: blind prediction of protein-ligand poses, affinity rankings, and relative binding free energies. J. Comput.-Aided Mol. Des. 2020, 34, 99–119. 10.1007/s10822-020-00289-y. [DOI] [PMC free article] [PubMed] [Google Scholar]