Author manuscript; available in PMC: 2022 Sep 30.
Published in final edited form as: Front Bioinform. 2022 Jun 17;2:885983. doi: 10.3389/fbinf.2022.885983

Scoring Functions for Protein-Ligand Binding Affinity Prediction using Structure-Based Deep Learning: A Review

Rocco Meli 1, Garrett M Morris 2, Philip C Biggin 1,*
PMCID: PMC7613667  EMSID: EMS154635  PMID: 36187180

Abstract

The rapid and accurate in silico prediction of protein-ligand binding free energies or binding affinities has the potential to transform drug discovery. In recent years, there has been a rapid growth of interest in deep learning methods for the prediction of protein-ligand binding affinities based on the structural information of protein-ligand complexes. These structure-based scoring functions often obtain better results than classical scoring functions when applied within their applicability domain. Here we review structure-based scoring functions for binding affinity prediction based on deep learning, focussing on different types of architectures, featurization strategies, data sets, methods for training and evaluation, and the role of explainable artificial intelligence in building useful models for real drug-discovery applications.

Keywords: binding affinity, protein-ligand binding, structure-based drug discovery, machine learning, deep learning, scoring functions, docking, in silico drug discovery

1. Introduction

The discovery and development of new small-molecule drugs is a very challenging and expensive process (Schneider and Schneider, 2016; Drews, 2000; Dickson and Gagnon, 2004). Only a handful of new drugs are approved each year (Brown and Wobst, 2021), which is minuscule compared to the vastness of chemical space (Reymond et al., 2010) and the billions of dollars poured into drug discovery campaigns (DiMasi et al., 2016). The discovery pipeline for small-molecule drugs usually starts with the selection of a protein target, against which a hit compound is identified by high-throughput screening (HTS) (Mayr and Bojanic, 2009; Macarron et al., 2011). The hit compound is subsequently optimized to obtain a lead compound with good potency and favorable pharmacodynamic and pharmacokinetic properties.

Thanks to significant methodological and hardware advances, computer-aided drug discovery (CADD) has played an important role in the development of new small-molecule drugs over the last decades (Sliwoski et al., 2013). CADD speeds up the early stages of the drug discovery process (hit identification and hit-to-lead optimization) and lowers the costs of these phases by reducing the time and experimental resources needed. CADD methods fall into two broad classes: (explicit) structure-based methods and ligand-based (or implicit structure-based) methods. For the latter, similarities to known active molecules play an important role because the protein target is unknown, or because information about the target is unavailable or not included in the model. For structure-based methods, the target structure is known and this additional information is exploited in the modelling and optimization of drug-target interactions (DTIs).

One of the main goals in the computational elucidation of DTIs is the calculation of relative or absolute binding free energies to distinguish potent binders from weak binders (or non-binders) against a target of interest. A fast and accurate prediction of protein-ligand binding affinities would circumvent the need for many time-consuming and complex experiments. Rigorous computational methods based on all-atom molecular dynamics simulations in explicit solvent, such as free energy perturbation and thermodynamic integration (Adcock and McCammon, 2006), can compute accurate relative and absolute binding free energies (Bash et al., 1987; Boresch et al., 2003; Mobley et al., 2007; Aldeghi et al., 2016, 2018a; Cournia et al., 2017), predict ligand selectivity (Aldeghi et al., 2017) and mutation effects (Hauser et al., 2018; Aldeghi et al., 2018b), and guide fragment elaborations (Alibay et al., 2022). Unfortunately, such rigorous methods are computationally expensive and often require considerable domain expertise (Mey et al., 2020; Hahn et al., 2021). This remains true even for simpler methods such as linear interaction energy (LIE) (Åqvist et al., 1994; Jones-Hertzog and Jorgensen, 1997). Methods treating the solvent implicitly, such as the Poisson-Boltzmann and generalized Born models (Genheden and Ryde, 2015), can offer a significant speed increase, but sometimes at the expense of accuracy.

The great successes of deep learning (DL) in the fields of computer vision (Voulodimos et al., 2018), natural language processing (NLP) (Young et al., 2018), and other fields of computer science in recent years kick-started the research and application of deep learning in many scientific disciplines including physics, chemistry, biology, and medicine (Baldi, 2021). In the field of drug discovery, machine learning (ML) has been in use for a long time, and the potential of deep learning in virtual screening was identified early on (Unterthiner et al., 2014). The application of modern deep learning architectures to all stages of the drug discovery pipeline is a very active area of research today (Brown, 2020; Jing et al., 2018; Jiménez-Luna et al., 2021a; Muratov et al., 2020; Gaudelet et al., 2021). The main applications in small-molecule drug design are the prediction of DTIs, the identification of binding sites (Jiménez et al., 2017; Pu et al., 2019; Aggarwal et al., 2021), the generation of novel molecular entities (Schneider and Clark, 2019; Meyers et al., 2021), and the prediction of absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties (Huang et al., 2021a).

Bioactivity prediction can be performed as a classification task—where binders/actives are distinguished from non-binders/inactives—or as a regression task. Machine learning and deep learning scoring functions (SFs) for the prediction of binding affinities (regression) are useful in lead optimization, in contrast with SFs that try to identify binders amongst a large pool of non-binders (classification) and are used in virtual screening to identify a hit. Another task where SFs are commonly used is pose prediction, where near-native poses are distinguished from incorrect poses (classification). Pose prediction and binding affinity prediction are complementary tasks in molecular docking, where a pose is generated and subsequently scored according to the predicted binding affinity.

In this review, we will focus on SFs for binding affinity prediction (inhibition constant Ki or dissociation constant Kd) or binding free energy prediction, but we will inevitably mention related SFs used in pose prediction and virtual screening—which often share the same algorithms and ideas. Recent reviews of structure-based SFs and deep learning for virtual screening are given by Li et al. (2021b), Kimber et al. (2021), and Rifaioglu et al. (2019). Additionally, to narrow the scope of the review, we focus on structure-based deep-learning methods and we refer the reader interested in ligand-based methods to Tropsha (2010), Muratov et al. (2020), Baskin (2020), and Palazzesi and Pozzan (2022). More general and broad reviews about the application of machine learning and deep learning in drug discovery are provided by Chen et al. (2018a), Vamathevan et al. (2019), and Schneider et al. (2019).

2. Classical Scoring Functions

Historically, SFs for binding affinity prediction and virtual screening have been classified into three categories: force-field-based, empirical, and knowledge-based (Muegge and Rarey, 2001; Böhm and Stahl, 2002). However, recently Liu and Wang (2015) argued that this historical classification overlooks more recent developments in the field and thus proposed an updated classification scheme with four classes of scoring functions: force-field-based or physics-based, empirical or regression-based, knowledge-based or potential of mean force-based, and descriptor-based or machine learning-based.

This classification is useful to distinguish different methodologies and ideas appearing in the development of SFs. However, some SFs cannot be precisely assigned to only one category and the boundary between the four different classes remains rather fuzzy.

In this section we will briefly discuss the first three classes of SFs, often termed “classical” SFs. A good overview of the different SFs can be found in the paper of Liu and Wang (2015), which proposed the current classification of SFs, and a more recent overview of different SFs used in protein-ligand docking is provided by Li et al. (2019a). While classical scoring functions are still actively developed and refined today, the research focus has certainly shifted to ML/DL-based scoring functions.

2.1. Physics-based (force-field based) scoring functions

Physics-based (or force-field-based) SFs use energy terms of a molecular mechanics force-field—whose parameters are determined to reproduce experimental observables or ab initio quantum mechanical calculations (Monticelli and Tieleman, 2012)—to evaluate protein-ligand interactions. The non-covalent interaction energy between protein and ligand atoms is usually expressed as the sum of van der Waals and electrostatic interaction terms. In their simplest form, such pairwise interactions are represented by a Lennard-Jones potential and Coulomb interaction between point charges.

Different physics-based scoring functions use different potentials to describe van der Waals and electrostatic interactions, depending on the design of the underlying force field. For example, the dielectric constant can be distance-dependent to take into account electrostatic screening due to the solvent and the lower dielectric constant in protein-ligand binding sites (Hingerty et al., 1985; Gilson and Honig, 1986; Huang et al., 2010).
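
The pairwise terms described above can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation of any particular docking program: the single `epsilon` and `sigma` values, the linear dielectric model eps(r) = `dielectric_slope` × r, and the function name are all assumptions made for the example, whereas a real force field assigns Lennard-Jones parameters per atom-type pair.

```python
import numpy as np

# Approximate Coulomb constant in kcal*Angstrom/(mol*e^2), as commonly used
# in molecular mechanics codes (the exact value differs slightly between them).
COULOMB_CONSTANT = 332.06


def pairwise_interaction_energy(
    lig_xyz, prot_xyz, lig_q, prot_q, epsilon=0.1, sigma=3.5, dielectric_slope=4.0
):
    """Sum of 12-6 Lennard-Jones and Coulomb terms over ligand-protein atom pairs.

    epsilon (kcal/mol) and sigma (Angstrom) are single illustrative values.
    The dielectric is assumed to be eps(r) = dielectric_slope * r, one simple
    distance-dependent choice.
    """
    lig_xyz = np.asarray(lig_xyz, dtype=float)
    prot_xyz = np.asarray(prot_xyz, dtype=float)
    # Distance matrix of shape (n_ligand_atoms, n_protein_atoms)
    r = np.linalg.norm(lig_xyz[:, None, :] - prot_xyz[None, :, :], axis=-1)
    sr6 = (sigma / r) ** 6
    e_vdw = 4.0 * epsilon * (sr6**2 - sr6)
    # Coulomb term with eps(r) = dielectric_slope * r, hence the r * r factor
    e_el = COULOMB_CONSTANT * np.outer(lig_q, prot_q) / (dielectric_slope * r * r)
    return float(e_vdw.sum() + e_el.sum())
```

With the charges set to zero and a pair placed at the Lennard-Jones minimum (r = 2^(1/6) σ), the function returns −ε, which is a quick sanity check of the van der Waals term.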

Often, additional shorter-range (and sometimes directional) terms are added to account for hydrogen bonding as well as solvation energy, and therefore a physics-based scoring function can take the following form:

$$\Delta G_{\text{binding}} = \Delta E_{\text{VdW}} + \Delta E_{\text{el}} + \Delta E_{\text{Hbond}} + \Delta G_{\text{sol}} \tag{1}$$

The solvation energy term can take into account both polar and non-polar contributions. The former accounts for the loss of polar interactions between charged groups and water, while the latter accounts for the desolvation of hydrophobic groups upon binding.

Finally, empirical terms accounting for the loss of torsional degrees of freedom upon complexation can also be included. Oftentimes, simple approximations based on the number of rotatable bonds are used (Böhm, 1994; Chang et al., 2007; Huey et al., 2007; Huang and Zou, 2010), although more advanced treatments have been suggested (Guedes et al., 2021b). The same corrections are applied to empirical and knowledge-based scoring functions, discussed below.

Force-field-based scoring functions are attractive because of their physical origin and because they can leverage advances in force-field developments, including the latest advances in ML force-fields (Unke et al., 2021). However, describing solvent effects in ligand binding remains an outstanding challenge (Ross et al., 2012; Darby et al., 2019; Limongelli et al., 2012).

Notable examples of physics-based (force-field-based) scoring functions are DOCK (DesJarlais et al., 1988; Meng et al., 1992; Shoichet et al., 1992; Allen et al., 2015; Ewing et al., 2001; Moustakas et al., 2006), AutoDock (Goodsell and Olson, 1990) and AutoDock 2 (Morris et al., 1996), GoldScore (Jones et al., 1995, 1997), and GalaxyDock (Shin and Seok, 2012; Shin et al., 2013). AutoDock 3 and AutoDock 4, by contrast, use hybrid scoring functions (Morris et al., 1998; Huey et al., 2007; Morris et al., 2009).

2.2. Empirical (regression-based) scoring functions

Empirical or regression-based scoring functions use regression analysis to determine the coefficients of different pre-defined terms from experimental data. Machine learning (or descriptor-based) scoring functions do the same; however, in empirical or regression-based scoring functions the functional form is predetermined and often quite simple (such as a linear combination of different contributions) (Ain et al., 2015). As mentioned previously, the line between the four different classes of scoring functions suggested by Li et al. (2019a) is sometimes blurry.

Empirical scoring functions assuming a linear functional form take the following form (Guedes et al., 2018):

$$\Delta G_{\text{binding}} = w_0 + w_1 \Delta G_{\text{VdW}} + w_2 \Delta G_{\text{H-bond}} + w_3 \Delta G_{\text{entropy}} \tag{2}$$

The functional form of empirical scoring functions is similar to physics-based scoring functions. However, in empirical scoring functions the parameters w are determined by regression analysis—usually multivariate linear regression or partial least squares (Li et al., 2019a)—to reproduce experimentally determined values.
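
As a minimal sketch of how the weights w of a linear empirical scoring function could be obtained, the example below fits them by ordinary least squares with NumPy. The function names and the synthetic, noise-free data are assumptions made purely for illustration; a real scoring function would use energy terms computed from crystal structures and experimentally measured affinities.

```python
import numpy as np


def fit_empirical_weights(terms, affinities):
    """Fit w0..wk in dG = w0 + sum_k w_k * term_k by ordinary least squares.

    terms:      (n_complexes, n_terms) array of precomputed energy terms
    affinities: (n_complexes,) array of experimental binding free energies
    """
    terms = np.asarray(terms, dtype=float)
    affinities = np.asarray(affinities, dtype=float)
    # Prepend a column of ones so that the first coefficient is the intercept w0
    design = np.column_stack([np.ones(len(terms)), terms])
    weights, *_ = np.linalg.lstsq(design, affinities, rcond=None)
    return weights


def predict_affinity(weights, terms):
    """Evaluate the fitted linear scoring function on new term values."""
    terms = np.asarray(terms, dtype=float)
    return weights[0] + terms @ weights[1:]


# Synthetic, noise-free data generated from known weights (illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([-1.0, 0.5, -2.0, 0.3])
y = true_w[0] + X @ true_w[1:]
w = fit_empirical_weights(X, y)
```

Because the synthetic data contain no noise, the fitted `w` recovers `true_w` up to numerical precision; with experimental data the fit would only be approximate.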

Often, the different terms in empirical scoring functions are simple reward or penalty scores. For example, the ChemScore (Eldridge et al., 1997; Verdonk et al., 2003) scoring function has the following functional form:

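$$\text{ChemScore} = w_0 + w_1 S_{\text{hbond}} + w_2 S_{\text{metal}} + w_3 S_{\text{lipo}} + w_4 H_{\text{rot}} + E_{\text{int}} + E_{\text{clash}} + E_{\text{cov}} \tag{3}$$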

where $S_{\text{hbond}}$ is the score assigned to hydrogen bonds, $S_{\text{metal}}$ scores acceptor-metal interactions, $S_{\text{lipo}}$ scores lipophilic interactions, $H_{\text{rot}}$ describes the loss in conformational entropy upon complexation, $E_{\text{int}}$ is the ligand’s internal energy, $E_{\text{cov}}$ is the covalent energy term, and $E_{\text{clash}}$ represents the energetic penalty of clashes between protein and ligand atoms.

One of the first empirical scoring functions was introduced by Böhm (1994), and notable examples include ChemScore (Eldridge et al., 1997; Verdonk et al., 2003), X-Score (Wang et al., 2002), Glide (Friesner et al., 2004, 2006), DockThor (de Magalhães et al., 2014), and SFCscore (Sotriffer et al., 2008). More recent scoring functions are Vinardo (Quiroga and Villarreal, 2016), Lin_F9 (Yang and Zhang, 2021), DockTScore (Guedes et al., 2021a) (combined with ML), and AA-Score (Pan et al., 2022).

A fairly recent review of empirical scoring functions for structure-based virtual screening is provided by Guedes et al. (2018).

2.3. Knowledge-based (potential-based) scoring functions

Knowledge-based or potential-based scoring functions are based on pairwise statistical potentials of the form:

$$S = \sum_{i}^{\text{lig}} \sum_{j}^{\text{prot}} \omega_{ij}(r) \tag{4}$$

where the distance-dependent pairwise potential $\omega_{ij}(r)$ is given by:

$$\omega_{ij}(r) = k_B T \ln\left(\frac{\rho_{ij}(r)}{\rho_{ij}^{0}}\right) \tag{5}$$

$\rho_{ij}(r)$ is the number density of pairs of type i-j at distance r, while $\rho_{ij}^{0}$ is the same quantity for a reference state where there is no interaction between types i and j (Muegge and Martin, 1999). Therefore, if $\rho_{ij}(r)$ is larger than the reference state $\rho_{ij}^{0}$ it contributes favorably to the scoring function, while if $\rho_{ij}(r)$ is smaller than the reference state $\rho_{ij}^{0}$ it contributes unfavorably to the scoring function. The pairwise potentials $\omega_{ij}(r)$ are obtained from the analysis of interactions in a large data set of protein-ligand complexes and usually only pairs of protein and ligand atoms within a certain cutoff are considered ($r < r_{\text{cutoff}}$).
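
A minimal sketch of this construction for a single atom-type pair i-j is given below. It assumes that observed and reference pair counts are available as histograms over the same distance bins, so that bin volumes cancel in the ratio; the function names, the pseudocount, and the binning are illustrative assumptions. The sign follows the expression as written above, so larger values are more favorable; potentials of mean force are often written with a leading minus sign instead, in which case lower values are more favorable.

```python
import numpy as np

KB = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def pair_potential(observed_counts, reference_counts, temperature=298.15, pseudocount=1e-6):
    """omega_ij(r) per distance bin from observed and reference pair counts.

    Both histograms are normalized so that only their ratio matters.
    The pseudocount avoids log(0) for empty bins.
    """
    obs = np.asarray(observed_counts, dtype=float) + pseudocount
    ref = np.asarray(reference_counts, dtype=float) + pseudocount
    rho = obs / obs.sum()
    rho_ref = ref / ref.sum()
    return KB * temperature * np.log(rho / rho_ref)


def knowledge_based_score(distances, potential, bin_edges, r_cutoff):
    """Sum omega(r) over all atom-pair distances below r_cutoff."""
    distances = np.asarray(distances, dtype=float)
    distances = distances[distances < r_cutoff]
    # Map each distance to the index of its histogram bin
    bins = np.digitize(distances, bin_edges) - 1
    valid = (bins >= 0) & (bins < len(potential))
    return float(potential[bins[valid]].sum())
```

The full score of a complex would sum such contributions over all ligand-protein atom-type pairs, each with its own tabulated potential.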

One of the advantages of knowledge-based scoring functions is that entropic and solvation contributions are taken into account implicitly (Muegge and Martin, 1999). However, some knowledge-based scoring functions include solvation and entropy effects explicitly (Huang and Zou, 2010).

Notable examples of knowledge-based (potential-based) scoring functions are SMoG (DeWitte and Shakhnovich, 1996; DeWitte et al., 1997) (later extended to a hybrid knowledge-based and empirical scoring function (Debroise et al., 2017b)), the PMF scoring function developed by Muegge and co-workers (Muegge and Martin, 1999; Muegge, 2000, 2001), DrugScore (Gohlke et al., 2000; Velec et al., 2005; Neudert and Klebe, 2011; Dittrich et al., 2018), ITScore (Huang and Zou, 2006a,b, 2010), KECSA (Zheng and Merz, 2013), and M-score (Yang et al., 2005). More recent knowledge-based scoring functions are SMoG2016 (Debroise et al., 2017a), Convex-PL (Kadukova and Grudinin, 2017), DLIGAND2 (Chen et al., 2019b), and KORP-PL (Kadukova et al., 2021).

3. Data Sets

To train ML and DL SFs, high-quality and reasonably large data sets are essential. The success of supervised machine learning and deep learning algorithms strongly depends on the quality and the size of the data set used for training. Thanks to the advances in high-throughput X-ray crystallography and cryo-electron microscopy (cryoEM), the number of available high-resolution structures in the Protein Data Bank (PDB) is constantly increasing (Goodsell et al., 2019b).

In this section, we briefly discuss some of the most common data sets encountered in the training and evaluation of machine learning and deep learning structure-based SFs for binding affinity prediction. The main data sets providing both co-crystal structures and experimental binding affinities are listed in Table 1.

Table 1. Main data sets providing protein-ligand complexes (crystal structures) and corresponding binding affinities. N is the number of protein-ligand complexes (co-crystal structures) with associated binding affinities.

| Data Set | N | Superset | Website / DOI |
|---|---|---|---|
| PDBbind 2020 | 19 443 | | pdbbind.org.cn |
| CASF-2016 | 285 | PDBbind 2016 | pdbbind.org.cn |
| CASF-2013 | 195 | PDBbind 2013 | pdbbind.org.cn |
| CASF-2007 | 195 | PDBbind 2007 | pdbbind.org.cn |
| Binding MOAD 2020 | 15 223 | | bindingmoad.org |
| CSAR-NCS HiQ | 343 | Binding MOAD + PDBbind | csardock.org |
| CSAR-NCS HiQ Update | 123 | Binding MOAD + PDBbind | csardock.org |
| Astex Diverse Set | 74 | | 10.1021/jm061277y |
| BindingDB | 11 442 | | bindingdb.org |
| D3R GC 4 | 20 | | drugdesigndata.org |
| D3R GC 3 | 24 | | drugdesigndata.org |
| D3R GC 2 | 36 | | drugdesigndata.org |
| D3R GC 2015 | 24 | | drugdesigndata.org |

3.1. PDBbind

The PDBbind dataset (Wang et al., 2004) is a curated subset of the PDB and it is arguably one of the most common data sets used to train ML and DL SFs for protein-ligand binding affinity prediction. The dataset also contains protein-protein and ligand-nucleic acid complexes.

The origin of the database can be traced back to 2004, when Wang et al. (2004) collected protein-ligand complexes from the PDB (release 103, January 2003) and screened the primary references of the identified complexes to extract binding affinity data (Kd, Ki, IC50).

To train ML and DL SFs, high-quality data is essential, although it has been demonstrated that including lower-quality data can improve performance (Li et al., 2015; Francoeur et al., 2020). The PDBbind database is therefore split into a “refined” set and a “general” set (Wang et al., 2004, 2005). The “refined” set is a selection of protein-ligand crystal structures with a resolution of 2.5 Å or better, in which a single ligand is non-covalently bound without significant steric clashes (Wang et al., 2005). Only systems with associated equilibrium constants Ki and Kd are included in the refined set, because IC50 values depend on the design of the binding assay, and complexes are filtered to only contain common organic elements.
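
Equilibrium constants are usually converted to a logarithmic scale before being used as regression targets. The sketch below shows the standard relations pK = −log10(K) and ΔG° = RT ln(Kd / c°) with a standard concentration c° = 1 M; the function names are illustrative, and treating Ki in the same way as Kd is an assumption that holds only approximately.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def pk_from_constant(k_molar):
    """Negative decimal logarithm of a binding constant given in mol/L."""
    return -math.log10(k_molar)


def binding_free_energy(kd_molar, temperature=298.15):
    """Standard binding free energy in kcal/mol, relative to c0 = 1 mol/L."""
    return R_KCAL * temperature * math.log(kd_molar)
```

For example, a dissociation constant of 1 nM corresponds to pKd = 9 and to a standard binding free energy of about −12.3 kcal/mol at 298.15 K.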

The same approach was used to build the PDBbind refined set version 2007 (Cheng et al., 2009), but it was improved to produce the PDBbind refined set 2013 and subsequent versions (Li et al., 2014b; Liu et al., 2014, 2017). In addition to the previous criteria used to compile the PDBbind refined set 2007, the complexes added to the PDBbind refined set 2013 satisfy the following additional criteria (Li et al., 2014b): no missing backbone or side chain fragments within 8 Å of the ligand, no extreme values of binding affinity (1 pM < K < 10 mM, where K = {Ki, Kd}), no multiple binding sites with significantly different binding affinities (> 10-fold difference), no non-standard amino acids within 5 Å of the ligand, and no shallow binders (< 15% of buried ligand surface). The rules for selecting protein-ligand complexes into the PDBbind refined set 2013, together with their rationale and the differences from the rules used for the PDBbind refined set 2007, are very clearly summarized by Li et al. (2014b).
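
Several of the criteria listed above translate directly into a filter over per-complex annotations. The sketch below is only an illustration under stated assumptions: the `Complex` record and its field names are hypothetical, and the checks for steric clashes, element types, and multiple binding sites are omitted because they require structural analysis that is beyond a short example.

```python
from dataclasses import dataclass


@dataclass
class Complex:
    """Hypothetical per-complex record; field names are illustrative."""
    resolution: float                     # Angstrom
    affinity_type: str                    # "Ki", "Kd" or "IC50"
    affinity_molar: float                 # binding constant in mol/L
    covalent: bool                        # ligand covalently bound?
    n_ligands: int                        # number of ligands in the binding site
    buried_fraction: float                # fraction of ligand surface buried (0-1)
    missing_residues_within_8A: bool      # missing fragments near the ligand
    nonstandard_residues_within_5A: bool  # non-standard amino acids near the ligand


def passes_refined_filters(c: Complex) -> bool:
    """Apply a subset of the refined-set rules described in the text."""
    return (
        c.resolution <= 2.5
        and c.affinity_type in {"Ki", "Kd"}
        and 1e-12 < c.affinity_molar < 1e-2   # 1 pM < K < 10 mM
        and not c.covalent
        and c.n_ligands == 1
        and c.buried_fraction >= 0.15         # exclude shallow binders
        and not c.missing_residues_within_8A
        and not c.nonstandard_residues_within_5A
    )
```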

The PDBbind dataset can be downloaded from pdbbind.org.cn. The current release (PDBbind 2020) collects binding affinities and structural data for 23 496 biomolecular complexes, 19 443 of which are protein-ligand complexes.

3.1.1. CASF

The CASF benchmarks are a series of comparative assessments of scoring functions originally introduced by Cheng et al. (2009). They evaluate different scoring functions for their performance on scoring, ranking, docking, and screening on a diverse and high-quality set of protein-ligand complexes. Originally employed to compare mostly classical SFs, CASF has become the de facto standard for an initial evaluation of ML and DL SFs (especially for protein-ligand binding affinity prediction).
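
Scoring power is commonly summarized by Pearson's correlation coefficient (often together with an error measure such as the RMSE) between predicted and experimental affinities, while ranking power is typically assessed with a rank correlation such as Spearman's. The sketch below computes these metrics with NumPy; the function names are illustrative and the Spearman implementation assumes that there are no tied values.

```python
import numpy as np


def pearson_r(predicted, experimental):
    """Pearson correlation coefficient between two sequences of values."""
    return float(np.corrcoef(predicted, experimental)[0, 1])


def rmse(predicted, experimental):
    """Root-mean-square error between predictions and experiment."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(experimental, dtype=float)
    return float(np.sqrt(np.mean(diff**2)))


def spearman_rho(predicted, experimental):
    """Spearman rank correlation, assuming no ties in either input."""
    def ranks(values):
        # argsort of argsort yields the rank of each element (0-based)
        return np.argsort(np.argsort(values))

    return pearson_r(ranks(predicted), ranks(experimental))
```

When used to assess ranking power, such a rank correlation would be computed among the complexes of each protein cluster rather than across the whole test set.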

To test different scoring functions on a diverse and high-quality data set of protein-ligand complexes, a data set is extracted from the PDBbind refined set (where high-quality complexes have already been identified). The PDBbind refined set is clustered according to sequence similarity using BLAST (Altschul et al., 1990), with a similarity threshold of 90% (Cheng et al., 2009). This means that proteins with a sequence similarity higher than 90% are collected in the same cluster since they are likely to represent the same protein or the same protein family.

Once proteins from the PDBbind refined set are clustered by sequence similarity, clusters containing at least four complexes are retained (Cheng et al., 2009). This results in a total of 65 clusters, from each of which three complexes are sampled: the complex with the lowest binding affinity, the complex with the highest binding affinity, and the complex with the binding affinity closest to the mean of the highest and lowest binding affinities (Cheng et al., 2009). This clustered sub-sampling of the PDBbind refined set (called the PDBbind core set) results in a total of 65 × 3 = 195 protein-ligand complexes used for the first comparative assessment of scoring functions (CASF-2007).
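
The selection of three representatives per cluster can be sketched as follows. The input format, the function name, and the use of a logarithmic affinity scale (such as pKd) for computing the midpoint are assumptions made for illustration; the clustering by sequence similarity is taken as given.

```python
from collections import defaultdict


def select_core_set(complexes, min_cluster_size=4):
    """Pick weakest, middle, and strongest binder from each large enough cluster.

    complexes: iterable of (pdb_id, cluster_id, affinity) tuples, with the
    affinity on a log scale such as pKd (assumption).
    Returns {cluster_id: [weakest_id, middle_id, strongest_id]}.
    """
    clusters = defaultdict(list)
    for pdb_id, cluster_id, affinity in complexes:
        clusters[cluster_id].append((affinity, pdb_id))

    core = {}
    for cluster_id, members in clusters.items():
        if len(members) < min_cluster_size:
            continue
        members.sort()
        weakest, strongest = members[0], members[-1]
        midpoint = 0.5 * (weakest[0] + strongest[0])
        # Choose the middle representative among the remaining members only
        middle = min(members[1:-1], key=lambda m: abs(m[0] - midpoint))
        core[cluster_id] = [weakest[1], middle[1], strongest[1]]
    return core
```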

For the CASF-2013 comparative assessment of scoring functions (Li et al., 2014a), the construction of the PDBbind core set was improved by using the same sequence similarity program used by the PDB, and only clusters with at least five (rather than four) members were retained (Li et al., 2014b). Additionally, the best binding affinity has to differ at least 10-fold from the median binding affinity, and the median binding affinity has to differ at least 10-fold from the poorest binding affinity (Li et al., 2014b). The electron density maps of the remaining complexes were visually assessed; if a complex failed at this step, the next best candidate was selected from the same cluster (Li et al., 2014b). The final PDBbind core set 2013 still consists of 195 protein-ligand complexes from 65 protein clusters (Li et al., 2014b).

The core set for CASF-2016 (Su et al., 2018) brought additional refinements and more data. As usual, the systems within the high-quality benchmark set are selected from the 4057 protein-ligand complexes in the PDBbind refined set (version 2016). The clustering of complexes based on protein sequence similarity remains the same. However, for CASF-2016, five representatives of each cluster were selected instead of the three selected for CASF-2007 and CASF-2013 (Su et al., 2018). The representative complexes were selected according to their binding affinity: the complex with the lowest binding affinity, the complex with the highest binding affinity, and three complexes distributed as evenly as possible between the lowest and highest binding affinity (Su et al., 2018). The lowest and highest binding affinities differ at least 100-fold and the difference between consecutive binding affinities is at least 1-fold. All ligands were inspected to ensure that there are no identical ligands or stereoisomers (Su et al., 2018). The final PDBbind core set (CASF-2016 benchmark set) consists of 57 × 5 = 285 protein-ligand complexes and it is arguably one of the test sets encountered most frequently in the development of ML and DL SFs.
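
One possible way to pick five representatives that are spread as evenly as possible is sketched below. This greedy scheme is an assumption made for illustration and is not necessarily the procedure of Su et al. (2018): it targets the quartile points between the weakest and strongest binder on a log scale, enforces only the 100-fold span (two log units), and leaves out the check on consecutive affinities and the ligand inspection.

```python
def select_five(members, min_span_log=2.0):
    """Pick five representatives of one cluster, or None if it does not qualify.

    members: list of (pdb_id, pK) tuples, with pK on a log10 scale (assumption).
    """
    if len(members) < 5:
        return None
    ordered = sorted(members, key=lambda m: m[1])
    lowest, highest = ordered[0], ordered[-1]
    # A 100-fold difference in binding constant equals 2 units on a log10 scale
    if highest[1] - lowest[1] < min_span_log:
        return None
    remaining = ordered[1:-1]
    chosen = [lowest, highest]
    for fraction in (0.25, 0.5, 0.75):
        target = lowest[1] + fraction * (highest[1] - lowest[1])
        best = min(remaining, key=lambda m: abs(m[1] - target))
        remaining.remove(best)
        chosen.append(best)
    chosen.sort(key=lambda m: m[1])
    return [pdb_id for pdb_id, _ in chosen]
```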

Unlike the PDBbind data set, the CASF benchmark is not updated annually and therefore the latest release to date remains CASF-2016. The CASF benchmark packages can be downloaded from pdbbind.org.cn/casf.php.

It is very common for ML and DL SFs to be trained on the PDBbind refined or general set and subsequently tested on the CASF benchmark set. Recently, non-redundant subsets of the PDBbind refined set were introduced by Boyles et al. (2019) and Su et al. (2020) to evaluate how well ML and DL SFs generalize when training examples that are similar to complexes in the CASF benchmark set are removed at increasingly strict similarity thresholds.
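
The idea of a non-redundant training set can be illustrated with a simple similarity cutoff. The sketch assumes a precomputed matrix of pairwise similarities (for example protein sequence identity or ligand fingerprint similarity, scaled to the range 0 to 1) between training and test complexes; the function name and the choice of similarity measure are assumptions, not the exact protocols of the cited works.

```python
import numpy as np


def filter_training_set(similarity, threshold):
    """Indices of training examples dissimilar to every test example.

    similarity: (n_train, n_test) matrix with values in [0, 1]
    threshold:  training examples whose maximum similarity to any test
                example reaches this value are removed
    """
    similarity = np.asarray(similarity, dtype=float)
    max_sim = similarity.max(axis=1)
    return np.flatnonzero(max_sim < threshold)
```

Lowering the threshold removes more training examples and therefore gives a harder test of generalization.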

3.2. Binding MOAD

The Binding MOAD (Mother Of All Databases) (Hu et al., 2005; Benson et al., 2007; Ahmed et al., 2014; Smith et al., 2019) is a subset of the PDB that collects high-quality and biologically relevant crystal structures of protein-ligand complexes together with experimentally determined binding affinities. Ligands available in the Binding MOAD include small peptides (ten amino acids or fewer), small oligonucleotides (four nucleotides or fewer), small and drug-like organic molecules, and enzymatic cofactors. Crystal structures have a resolution better than 2.5 Å. As for the PDBbind data set, experimental binding affinities are collected from the primary reference of the deposited PDB structure and consist only of Ki, Kd or IC50 values.

The Binding MOAD was first introduced in 2005, containing 5331 protein-ligand complexes from 1780 unique protein families and 2630 unique ligands (Hu et al., 2005). 1375 protein-ligand complexes were associated with binding affinity data spanning 13 orders of magnitude (Hu et al., 2005). The 1780 unique protein families were used to create a non-redundant subset for which 475 complexes have binding affinity data (Hu et al., 2005).

The Binding MOAD is extracted from the PDB as follows (Hu et al., 2005). The full PDB database is screened for high-resolution structures (better than 2.5 Å), excluding theoretical models and NMR structures. Structures containing nucleic acids longer than four nucleotides or peptides longer than ten amino acids are also discarded. Subsequently, complexes with covalently bound ligands as well as invalid ligand structures are filtered out. This reduced database of protein-ligand complexes is hand-curated: the primary citation associated with each structure is screened for binding affinity data, while some “suspect ligands” are flagged for visual inspection, resulting in the final database of 5331 protein-ligand complexes.
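
The automated part of this procedure is a sequence of filters, and it can be useful to record how many entries survive each stage. The sketch below is a generic funnel under stated assumptions: the dictionary keys, the method labels, and the stage names are hypothetical, and the manual curation steps are not represented.

```python
def run_filter_funnel(entries, stages):
    """Apply named predicates in order; return survivors and per-stage counts."""
    counts = []
    for name, keep in stages:
        entries = [e for e in entries if keep(e)]
        counts.append((name, len(entries)))
    return entries, counts


# Hypothetical method labels for structures to exclude
EXCLUDED_METHODS = {"SOLUTION NMR", "THEORETICAL MODEL"}

# Hypothetical entry fields; thresholds follow the description in the text
MOAD_STAGES = [
    ("not NMR or theoretical model", lambda e: e["method"] not in EXCLUDED_METHODS),
    ("resolution better than 2.5 A", lambda e: e["resolution"] < 2.5),
    (
        "small nucleic acids and peptides only",
        lambda e: e["max_nucleic_acid_length"] <= 4 and e["max_peptide_length"] <= 10,
    ),
    ("non-covalent ligand", lambda e: not e["covalent"]),
]
```

The method check is placed first so that entries without a resolution value (such as NMR structures) never reach the resolution comparison.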

The Binding MOAD has been expanded annually over the years by adding new protein-ligand complexes deposited on the PDB (together with binding affinity data), resulting in 23 269 total entries and 8156 entries with associated binding affinities in 2015 (Ahmed et al., 2014). In 2019, the Binding MOAD contained 32 747 structures comprising 9117 unique protein families and 16 044 unique ligands.

The Binding MOAD and the PDBbind databases are curated in a similar fashion, to the point that the two data sets could be compared to find and fix disagreements in overlapping systems (Hu et al., 2005). However, the Binding MOAD includes complexes with only binding cofactors, complexes with both a ligand and a cofactor present, and also includes high-quality complexes without binding affinity data (Hu et al., 2005).

Given that the development of the PDBbind was mostly driven by the development of scoring functions (Cheng et al., 2009; Li et al., 2014b,a; Su et al., 2018) while the development of the Binding MOAD was primarily driven by research on protein binding site prediction (Clark et al., 2020) and protein flexibility (Clark et al., 2019), it is more common to encounter the former in the development of ML and DL SFs. However, Binding MOAD can certainly be used for assessing the performance of scoring functions in binding affinity prediction (Xavier et al., 2016) and has been used to build the CSAR data set discussed below.

3.2.1. CSAR

The CSAR data set is associated with the Community Structure-Activity Resource (CSAR), which has the goal of collecting high-quality data from both academia and industry to improve docking scoring functions and to organize community-wide assessments of current methods (Dunbar et al., 2011).

The first CSAR data set consisted of protein-ligand complexes from the PDB for which experimental binding affinities (Ki or Kd values) were available in the Binding MOAD database, augmented with data from the PDBbind; Dunbar et al. (2011) describe the CSAR data set as “the best of the PDB [...] augmented with binding data from Binding MOAD and PDBbind”. The data set consists of 343 protein-ligand complexes which span binding affinities over several orders of magnitude.

The CSAR data set is subdivided into two subsets: Set 1, and Set 2. Initially, 2916 protein-ligand complexes were identified in the Binding MOAD database (version 2006) and filtered down to 1241 entries according to the quality of the crystal structures. Further processing consisted of the removal of ligands for which hybridization states and bond orders could not be automatically inferred, and for which the experimental binding affinity was expressed in terms of IC50 values. This resulted in a total of 309 complexes with associated Ka, Kd and Ki values. Later on, an additional 1228 complexes from Binding MOAD (versions 2007 and 2008) were processed to obtain an additional 230 complexes with associated binding affinity data. After moving some complexes between the two groups to balance physicochemical properties, the final data set representing the initial release consisted of Set 1 (242 entries) and Set 2 (297 entries). Following community feedback, a more stringent quality assessment of the crystal structures was applied, thus reducing the size of the two sets, and errors concerning binding affinities were corrected. Following the CSAR benchmark exercise (Smith et al., 2011), the two sets were further processed resulting in the CSAR-NCS HiQ data set (September 2010), subdivided into Set 1 (176 entries) and Set 2 (167 entries). The CSAR-NCS HiQ data set consists of 52 protein targets with 2 or more structures and 191 targets with a single structure.

The CSAR-NCS HiQ dataset was subsequently updated with an additional 123 structures (set 3) applying the same criteria of the CSAR-NCS HiQ data set to structures in Binding MOAD added between 1/1/2009 and 12/31/2011.

The CSAR-NCS HiQ data set (Dunbar et al., 2011), its update, and other data sets associated with the CSAR benchmark exercises (Dunbar et al., 2013; Damm-Ganamet et al., 2013; Smith et al., 2015; Carlson et al., 2016) can be downloaded from csardock.org or bindingmoad.org.

3.3. Astex Diverse Set

The Astex Diverse Set is another common data set encountered in the validation of protein-ligand scoring functions (Hartshorn et al., 2007), alongside the CASF and CSAR benchmarks. This data set contains 85 protein-ligand complexes, most of which are associated with experimentally measured binding potency.

The diverse set was obtained as follows. First, proteins from the PDB database were clustered based on sequence similarity, leading to 9188 clusters of distinct proteins. Then, ligands bound to the clustered proteins were screened to select high-quality structures of pharmaceutical or agrochemical interest and were filtered according to drug-likeness criteria. The selected protein-ligand complexes were further assessed in terms of ligand clashes with the binding site residues, possible problems related to spurious interactions, and quality of the ligand electron density. This automated filtering procedure resulted in 427 clusters with high-quality protein-ligand complexes.

The final Astex Diverse Set was manually curated from the 427 clusters resulting in 85 complexes. Potency data for 74 of the 85 complexes was finally extracted from the literature.

3.4. Other data sets

The data sets described above are curated collections of binding affinities and structures and are therefore useful for the development and assessment of structure-based SFs for protein-ligand binding affinity prediction, using both classical and ML/DL scoring functions. However, there are several other data sets that might be useful to build and assess scoring functions, and some are briefly described below.

ChEMBL (Gaulton et al., 2011; Bento et al., 2013; Mendez et al., 2018) is a manually curated database of bioactive molecules, where data about drug-like molecules are collected together with results from bioactivity assays and genomic information. ChEMBL version 29 (10.6019/CHEMBL.database.29) contains data about 2 105 464 compounds and 14 554 targets. While ChEMBL is an extremely valuable resource and provides a large amount of binding affinity data, it does not contain structural data and it is, therefore, more commonly encountered in the development and assessment of ligand-based models (such as in Riniker and Landrum (2013a)).

The bioactivity data in ChEMBL is also exchanged with PubChem Bioassay (Wang et al., 2009, 2011) and BindingDB (Chen et al., 2001; Liu et al., 2007). The PubChem Bioassay database is a public repository containing bioactivity data for small molecules collecting more than 130 million assay results together with their protocols, while the BindingDB is a public database of experimental binding affinities between proteins (8,644 as of November 8, 2021) and drug-like molecules (1,023,385 as of November 8, 2021) which is accessible via a web interface. The BindingDB also contains 5988 protein-ligand crystal structures with associated binding affinity measurements.

Data sets released as part of the Drug Design Data Resources (D3R) Grand Challenges also constitute important datasets on which several ML and DL scoring functions have been designed or tested. D3R Grand Challenges promote the development and benchmarking of computational methods for binding pose and binding affinity prediction, by organizing blinded community challenges using high-quality data sets of pharmaceutical interest. The first D3R Grand Challenge was based on two targets (Gathiaka et al., 2016) using data from industrial drug discovery programs. Subsequent challenges (Gaieb et al., 2017, 2019; Parks et al., 2020) introduced novel targets and associated data for the blind prediction of binding poses, affinity rankings, and relative binding free energies. All the data sets are now easily accessible on the D3R website (drugdesigndata.org) as additional test sets for the development and evaluation of ML and DL scoring functions. Interestingly, in the D3R Grand Challenge 3 an increased number of ML methods was observed but overall they did not seem to perform any better than standard methods (Gaieb et al., 2019).

The databases that do not contain target structures are often employed to build ligand-based models or are used to put together new data sets with three-dimensional structures by generating different conformers for the ligand and collecting target structures from the PDB (Bernstein et al., 1977; Berman et al., 2000) to subsequently build structure-based models. For example, Boyles et al. (2021) recently released a new dataset—called the Updated DUD-E Diverse Subset—which combines data from the Directory of Useful Decoys Enhanced (DUD-E) data set (Mysinger et al., 2012) and ChEMBL.

Some of the data sets for binding affinity prediction discussed above are collected into benchmark suites such as MoleculeNet (Wu et al., 2018) and Therapeutic Data Commons (Huang et al., 2021b), which provide much-needed collections for the evaluation of different machine learning and deep learning methods for molecular property prediction as well as drug discovery and development.

4. Machine Learning And Deep Learning Scoring Functions

Machine learning (or descriptor-based) scoring functions have been developed and used for decades (Brown, 2020). The simplest “scoring functions”, more commonly known as QSAR (quantitative structure-activity relationship) models, were based on a small set of handcrafted descriptors and simple models (such as multiple linear regression; Morris et al., 1998; Böhm, 1992), and were typically ligand-based. Later, other machine learning (ML) algorithms, such as support vector machines (SVMs) (Boser et al., 1992; Cortes and Vapnik, 1995), random forests (RFs) (Ho, 1995, 1998; Breiman, 2001), and gradient boosting (Friedman, 2002; Mason et al., 1999), were applied in an attempt to learn non-linear relationships between descriptors and the binding affinity.

For an in-depth and rigorous introduction to the deep learning (DL) architectures described below, we refer the interested reader to Goodfellow et al. (2016); classical ML methods are described thoroughly in Bishop (2006). A more hands-on introduction to both ML and DL is given by Géron (2019).

4.1. Descriptors

Descriptors for ML and DL SFs can encode information about the ligand, about the protein, or about intermolecular interactions in the protein-ligand complex. Ligand descriptors are commonly used in cheminformatics applications, quantitative structure-activity relationship (QSAR) modelling, and ligand-based virtual screening (Lo et al., 2018). Ligand-based descriptors can be combined with descriptors for the protein commonly employed in ML-based protein engineering (Xu et al., 2020) to obtain a SF that combines separate information about the ligand and the protein. However, here we focus on structure-based descriptors that encode the protein-ligand complex as a whole and form the basis for structure-based SFs. Methods that combine separate ligand and protein descriptors (van Westen et al., 2011), known as “pair methods” or “proteochemometric models”, have been reviewed by Qiu et al. (2016) and more recently by Kimber et al. (2021).

One common distinction between ML and DL models is that the latter are usually based on a simpler representation and learn descriptors directly from the data; this distinction is however somewhat arbitrary and most DL models still require some pre-processing to convert atom types and coordinates in the correct format for the architecture being used. Here we briefly review structure-based descriptors commonly employed with ML algorithms as well as the input representation used in DL architectures.

One common type of descriptor employed with ML models is the interaction fingerprint (IFP). Structural interaction FPs (SIFts) encode the 3D structure of a protein-ligand complex into a one-dimensional binary vector (Deng et al., 2003). Seven different interaction types involving the ligand and binding site residues are identified and a FP for the whole protein-ligand complex is obtained by concatenating the binding bit string of each binding site residue. Simple ligand-receptor interaction descriptors (SILIRID) are instead obtained from binary IFPs by summing the bits corresponding to the same amino acids (Chupakhin et al., 2014), thus resulting in a FP with 168 elements (20 amino acids and one cofactor, and 8 interaction types per amino acid). Da and Kireev (2014) developed structural protein-ligand interaction fingerprints (SPLIF), where protein-ligand atom pairs within 4.5 Å are identified and expanded into circular fragments described by extended connectivity fingerprints (ECFPs) (Rogers and Hahn, 2010). In this way, protein-ligand interactions are encoded implicitly, without the need for explicit ad-hoc interaction classes, and the FP can therefore encode all local interactions (Da and Kireev, 2014). Similarly, Wójcikowski et al. (2018) developed the protein-ligand extended connectivity (PLEC) FP, which combines the ECFP environments of the protein and the ligand atoms in contact to describe protein-ligand interactions. The atomic features employed to construct PLEC FPs (atomic number, isotope, number of neighboring heavy atoms, number of hydrogens, formal charge, ring membership, and aromaticity) are the same as those used to construct ECFPs, but only pairs of interacting atoms within 4.5 Å are considered. The FPs computed for ligand and protein atoms are hashed together to a final bit position (Wójcikowski et al., 2018). The PLEC FP implementation is available as part of the Open Drug Discovery Toolkit (Wójcikowski et al., 2015) and has been successfully used in combination with different ML models for binding affinity prediction (Wójcikowski et al., 2018). There are several other IFPs, such as APIF (Pérez-Nueno et al., 2009), PADIF (Jasper et al., 2018), and PyLIF (Radifar et al., 2013).
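As an illustration of how a residue-summed fingerprint of this kind can be assembled, the following minimal sketch sums per-residue interaction bits over residues of the same class to give a 21 × 8 = 168 element vector. The residue labels, the generic `COF` cofactor slot, the function name, and the toy bit values are assumptions made for illustration; the sketch does not detect interactions and is not the published SILIRID implementation.

```python
import numpy as np

# 20 standard amino acids plus one generic cofactor slot (21 residue classes).
RESIDUE_CLASSES = [
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
    "COF",  # hypothetical label used here for any cofactor
]
N_INTERACTION_TYPES = 8


def residue_sum_fingerprint(residue_names, ifp_bits):
    """Sum per-residue interaction bits over residues of the same class.

    residue_names: one class label per binding-site residue.
    ifp_bits: (n_residues, 8) binary array, one row of interaction bits per residue.
    Returns a flat integer vector of length 21 * 8 = 168.
    """
    ifp_bits = np.asarray(ifp_bits, dtype=int)
    counts = np.zeros((len(RESIDUE_CLASSES), N_INTERACTION_TYPES), dtype=int)
    for name, bits in zip(residue_names, ifp_bits):
        counts[RESIDUE_CLASSES.index(name)] += bits
    return counts.ravel()


# Toy binding site with three residues (bit values are made up).
names = ["ASP", "PHE", "ASP"]
bits = [
    [1, 0, 0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 1, 0],
]
fp = residue_sum_fingerprint(names, bits)
```

Unlike a concatenated per-residue bit string, the length of this vector does not depend on the number of binding-site residues, which makes fingerprints of different complexes directly comparable.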

Ballester et al. (2014) evaluated the impact of the choice of chemical descriptors on ML scoring functions. They showed that more complex descriptors do not necessarily lead to more accurate scoring functions, and they identified and discussed the factors that might contribute to this observation: modelling assumptions, co-dependence of representation and regression method, and data set features.

In structure-based methods, the goal is to exploit the 3D information of protein-ligand complexes. One natural representation of the 3D structure of protein-ligand complexes is the electron density, which is used in X-ray crystallography to model the structure of both the protein and the bound ligand. To encode information about the nuclei available from resolved structures, a representation that clearly encodes the spatial relationship between the protein and the ligand is a three-dimensional (3D) grid, which discretizes volumetric data. The voxel occupancy is often defined as the sum (or maximum) of decaying density functions centered at the different atoms, while atoms of different types are represented in different grids, which can be thought of as a generalization of the RGB channels used in 2D images to represent the different colors. Different representations have been proposed, but they are mostly based on atom-centered density functions g(r; i, t) centered at atom i of type t, whose contributions are aggregated together:

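$$G(\mathbf{r}; t, \mathbf{R}) = \bigoplus_{i}^{N} g\left(\lVert \mathbf{r} - \mathbf{R}_i \rVert; t_i\right) \delta_{t_i, t}. \tag{6}$$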

Here $\mathbf{r}$ represents the coordinates of the voxel, $\mathbf{R}_i$ the coordinates of atom $i$, and $t_i$ its type; $\delta_{t_i, t}$ is the Kronecker delta, so that only atoms of type $t$ contribute to $G(\mathbf{r}; t, \mathbf{R})$, and $\bigoplus$ is an aggregation function such as the sum or the maximum.

In Jiménez et al. (2018),

$$g(r; t) = 1 - \exp\left[-\left(\frac{R_t}{r}\right)^{12}\right] \tag{7}$$

and the different channels represent hydrophobic, hydrogen-bond donor/acceptor, aromatic, ionizable, metallic, and excluded volume properties. $R_t$ represents an atom type-dependent characteristic length, often set to the van der Waals radius. The properties are duplicated to represent protein and ligand atoms in different channels, and the density for different atoms in the same channel is aggregated by taking the maximum. In Ragoza et al. (2017a) and subsequent publications (Sunseri et al., 2018; Francoeur et al., 2020) the following functional form is used:

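$$g(r; t) = \begin{cases} e^{-2r^2/R_t^2} & 0 \le r < R_t, \\ \dfrac{4}{e^2 R_t^2} r^2 - \dfrac{12}{e^2 R_t} r + \dfrac{9}{e^2} & R_t \le r < 1.5 R_t, \\ 0 & r \ge 1.5 R_t, \end{cases} \tag{8}$$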

and the different channels represent the different atom types from the SMINA docking software (Koes et al., 2013), resulting in 16 channels for the receptor and 18 channels for the ligand (Ragoza et al., 2017a). Contributions from different atoms on the same channel are summed together.
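The two density functions and the channel-wise aggregation of Eq. (6) can be sketched in a few lines of NumPy. This is only a minimal illustration under stated assumptions: the box size, resolution, atom-type indices, and radii below are made up, the function names are hypothetical, and the brute-force loop is not how optimized libraries compute grids.

```python
import numpy as np


def density_exponential(r, radius):
    """Density of the form 1 - exp(-(R_t / r)^12), as in Eq. (7)."""
    r = np.maximum(np.asarray(r, dtype=float), 1e-9)  # avoid division by zero
    return 1.0 - np.exp(-((radius / r) ** 12))


def density_piecewise(r, radius):
    """Gaussian core with a quadratic tail reaching zero at 1.5 R_t, as in Eq. (8)."""
    r = np.asarray(r, dtype=float)
    e2 = np.exp(2.0)
    gauss = np.exp(-2.0 * r**2 / radius**2)
    quad = 4.0 * r**2 / (e2 * radius**2) - 12.0 * r / (e2 * radius) + 9.0 / e2
    return np.where(r < radius, gauss, np.where(r < 1.5 * radius, quad, 0.0))


def voxelize(coords, types, radii, n_types, center, box=8.0, resolution=1.0,
             density=density_piecewise, aggregate="sum"):
    """Grid with one channel per atom type, aggregated by sum or maximum (Eq. (6))."""
    n = int(round(box / resolution)) + 1
    axis = np.linspace(-box / 2, box / 2, n)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    points = np.stack([gx, gy, gz], axis=-1) + np.asarray(center)  # (n, n, n, 3)
    grid = np.zeros((n_types, n, n, n))
    for xyz, t, radius in zip(coords, types, radii):
        dist = np.linalg.norm(points - xyz, axis=-1)
        contrib = density(dist, radius)
        if aggregate == "sum":
            grid[t] += contrib
        else:
            grid[t] = np.maximum(grid[t], contrib)
    return grid


coords = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
types = [0, 1]      # hypothetical atom-type indices
radii = [1.7, 1.5]  # illustrative radii in Å
grid = voxelize(coords, types, radii, n_types=2, center=np.zeros(3))
```

The piecewise form is continuous at both $r = R_t$ (where both branches equal $e^{-2}$) and $r = 1.5 R_t$ (where the quadratic vanishes), so each atom contributes only to voxels within a finite distance.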

The advantage of using 3D grid representations is that they encode clear spatial relationships between the different atom types and that the computation can be performed very efficiently (Sunseri and Koes, 2020), thus allowing on-the-fly data augmentation during training. However, grid representations also have several limitations. While the computation of G(r; t) can be performed very efficiently, its dependence on the coordinate frame requires extensive data augmentation (Ragoza et al., 2017a) at increased computational cost, and the sparsity of some channels (such as the ones representing halogens or metals) implies wasteful computations. Additionally, the memory footprint of grid-based representations increases with the number of atom types. Despite these limitations, the close connection to the field of computer vision has led to the successful development of several SFs based on this representation, as discussed below.

For graph-based models such as graph neural networks (GNNs), descriptors are associated with atoms (the nodes of the graph) and bonds (the edges of the graph). A node descriptor is a vector containing information about the atom. An edge descriptor is a vector describing the chemical bond, for example its bond order. There are several descriptors employed in the literature, and they depend on the task at hand. For protein-ligand binding affinities, simple quantities related to an atom or a bond are commonly employed since higher-level features are learned by intermediate GNN layers (Feinberg et al., 2018). Such simple features for the nodes can include one-hot-encoded elements or atom types, formal charges, hybridization states, aromaticity, and other atomic properties (Jiang et al., 2021). Edge features can include both 2D and 3D information such as bond order, conjugation, bond length, and other bond properties (Jiang et al., 2021).
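A minimal sketch of such a featurization is given below, assuming a small element vocabulary and a hand-written bond list; in practice the atoms, bonds, and properties would come from a cheminformatics toolkit. The function name and the choice of features are illustrative assumptions, not those of any specific published model.

```python
import numpy as np

ELEMENTS = ["C", "N", "O", "S"]  # illustrative element vocabulary


def one_hot(value, vocabulary):
    """Return a one-hot list marking the position of value in vocabulary."""
    return [1.0 if value == v else 0.0 for v in vocabulary]


def featurize_graph(elements, formal_charges, aromatic, coords, bonds):
    """Build node and edge feature arrays for a small molecular graph.

    bonds: list of (i, j, bond_order) tuples.
    Node features: one-hot element, formal charge, aromaticity flag.
    Edge features: bond order (2D information) and bond length (3D information).
    """
    coords = np.asarray(coords, dtype=float)
    nodes = np.array([
        one_hot(el, ELEMENTS) + [float(q), float(ar)]
        for el, q, ar in zip(elements, formal_charges, aromatic)
    ])
    edge_index, edge_features = [], []
    for i, j, order in bonds:
        length = float(np.linalg.norm(coords[i] - coords[j]))
        # Store both directions so that information can flow i -> j and j -> i.
        for a, b in ((i, j), (j, i)):
            edge_index.append((a, b))
            edge_features.append([float(order), length])
    return nodes, np.array(edge_index), np.array(edge_features)


# Toy fragment O=C-N with made-up coordinates (in Å).
nodes, edge_index, edge_features = featurize_graph(
    elements=["O", "C", "N"],
    formal_charges=[0, 0, 0],
    aromatic=[False, False, False],
    coords=[[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [1.9, 1.1, 0.0]],
    bonds=[(0, 1, 2), (1, 2, 1)],
)
```

The three arrays (node features, edge index, edge features) are the typical inputs expected by message-passing layers, which then build higher-level features from these simple quantities.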

Descriptors commonly used in ML/DL for quantum chemistry have been successfully applied to the classification of actives and decoys against different protein families (Bartók et al., 2017). Recently, the smooth overlap of atomic positions (SOAP) descriptor (Bartók et al., 2013), which allows comparing molecules across structural and chemical space (De et al., 2016), has been used together with Gaussian process models to predict pIC50 values (McCorkindale et al., 2020). At the same time, atomic environment vectors developed for the ANI family of neural network potentials (Smith et al., 2017; Gao et al., 2020) and based on Behler-Parrinello symmetry functions (Behler and Parrinello, 2007) have been used as descriptors of protein-ligand complexes for binding affinity predictions (Meli et al., 2021). Behler-Parrinello symmetry functions have also been employed as node features in GNNs for binding affinity prediction (Karlov et al., 2020) and inspired the atomic convolution architecture of Gomes et al. (2017). Both descriptors are strongly related (Musil et al., 2021) and describe the local structural and chemical environment of atoms in a way that is translationally and rotationally invariant.
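The invariance property can be checked numerically with a simplified radial symmetry function of the Behler-Parrinello type, $G = \sum_j \exp[-\eta (R_{ij} - R_s)^2] f_c(R_{ij})$ with a cosine cutoff $f_c$. The sketch below is a simplification made for illustration: it ignores element types and angular terms, and the hyperparameters ($\eta$, $R_s$, cutoff radius) and coordinates are arbitrary assumptions.

```python
import numpy as np


def cosine_cutoff(r, r_cut):
    """Smooth cutoff: 0.5 * (cos(pi * r / r_cut) + 1) inside r_cut, zero outside."""
    return np.where(r < r_cut, 0.5 * (np.cos(np.pi * r / r_cut) + 1.0), 0.0)


def radial_symmetry_functions(coords, center_index, etas, shifts, r_cut=5.2):
    """Radial symmetry functions of one atom, one value per (eta, shift) pair."""
    coords = np.asarray(coords, dtype=float)
    others = np.delete(coords, center_index, axis=0)
    r = np.linalg.norm(others - coords[center_index], axis=1)
    fc = cosine_cutoff(r, r_cut)
    return np.array([
        np.sum(np.exp(-eta * (r - shift) ** 2) * fc)
        for eta, shift in zip(etas, shifts)
    ])


coords = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.5, 0.0], [0.0, 0.0, 4.0]])
etas, shifts = [4.0, 4.0, 4.0], [1.0, 2.0, 4.0]  # hypothetical hyperparameters
g_ref = radial_symmetry_functions(coords, 0, etas, shifts)

# Rotate about the z axis and translate: the descriptor should not change.
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
moved = coords @ rot.T + np.array([3.0, -2.0, 1.0])
g_moved = radial_symmetry_functions(moved, 0, etas, shifts)
```

Because the descriptor depends only on interatomic distances, `g_ref` and `g_moved` agree to numerical precision, in contrast to the grid representations discussed above, which depend on the coordinate frame.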

Learned molecular representations also play an important role as descriptors (Chuang et al., 2020; Menke and Koch, 2021). A defining characteristic of deep learning architectures is that useful and efficient internal representations are learned directly from the input data. Therefore, the fixed and ad-hoc descriptors or fingerprints described above can be substituted with learned representations. Yang et al. (2019) performed an extensive analysis of learned molecular representations for property prediction, showing that they achieve similar or better performance than fixed descriptors. While many learned representations for computational chemistry include only 2D information, learned representations for three-dimensional structures have been developed (Kuzminykh et al., 2018), but their application in structure-based drug discovery is still under-explored. The appeal of DL architectures is that they can leverage the simple inputs described above (such as 3D atomic densities or coordinates and atom types) to automatically learn complex internal representations that can be used for molecular property prediction.

Some authors extracted descriptors from molecular dynamics (MD) trajectories, instead of using the crystal structure or docked poses, although the use of trajectory data remains rare (Wang and Riniker, 2020). Yakovenko and Jones (2017) used atomic densities but trained their model on both docked poses and MD trajectory frames to obtain learned representations later used to predict LIE. Berishvili et al. (2019) developed 1D descriptors based on GROMACS (Berendsen et al., 1995; Abraham et al., 2015), AutoDock Vina (Trott and Olson, 2009), and SMINA (Koes et al., 2013) terms to describe frames from MD trajectories. The descriptors for each frame were stacked together into a matrix of size ndescriptor × nframes, representing the whole MD trajectory.

A more in-depth overview of featurization strategies for protein-ligand interactions that are commonly employed in the development of ML and DL SFs is given by Xiong et al. (2021), while an overview of common molecular representations used in AI-driven drug discovery is provided by David et al. (2020).

4.2. Overview of Classical Machine Learning Scoring Functions

Classical ML algorithms such as SVMs and RFs have long been used in quantitative structure-activity relationship (QSAR) modelling and in the development of structure-based scoring functions (Ain et al., 2015; Muratov et al., 2020).

One of the earliest ML SFs for binding affinity prediction was developed by Deng et al. (2004). The model combines protein-ligand atom pair occurrence and distance-dependent atom pair features with a kernel partial least squares method (K-PLS) (Rännar et al., 1994, 1995) to predict pKd, demonstrating that structure-based descriptors combined with ML regression can be effective for protein-ligand binding affinity prediction on different complexes. Das et al. (2010) introduced property-encoded shape distribution signatures (descriptors encoding molecular shapes and property distributions on protein and ligand surfaces), which were used in combination with SVM to build a regression model. SVM-based regression was also used by Li et al. (2011) to develop two SFs, one based on knowledge-based potentials (SVR-KB) and another based on physicochemical properties of the protein-ligand complex (SVR-EP). Both SFs show very good performance on the CASF benchmark when compared to classical SFs.

RF models have been quite successful in the development of structure-based ML SFs. Ballester and Mitchell (2010) introduced a novel SF based on RF, called RF-Score. Protein-ligand complexes are described by a 36-dimensional feature vector storing the occurrence count of different protein-ligand atom pairs within a cutoff of 12 Å. The feature vector is used as input of a RF regression model predicting the binding affinity. Thanks to the use of the PDBbind benchmark (Cheng et al., 2009), RF-Score could be easily compared to 16 other SFs, showing that RF-Score is a very competitive scoring function. SFCScoreRF improved the performance of RF-Score by using a different and larger feature vector including ligand-based features (such as the number of rotatable bonds) and interaction-specific descriptors (Zilian and Sotriffer, 2013).
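A feature vector of this kind can be sketched as a simple contact count. The element vocabularies below (4 protein elements × 9 ligand elements) are an assumption chosen to give 36 pairs, since the text above only states the dimensionality; the function name and toy coordinates are also illustrative. The resulting vector would then be passed to a random forest regressor.

```python
import numpy as np

# Assumed element vocabularies giving 4 x 9 = 36 protein-ligand element pairs.
PROTEIN_ELEMENTS = ["C", "N", "O", "S"]
LIGAND_ELEMENTS = ["C", "N", "O", "F", "P", "S", "Cl", "Br", "I"]


def pair_count_features(prot_elements, prot_coords, lig_elements, lig_coords,
                        cutoff=12.0):
    """Count protein-ligand atom pairs within the cutoff for each element pair."""
    prot_coords = np.asarray(prot_coords, dtype=float)
    lig_coords = np.asarray(lig_coords, dtype=float)
    # (n_ligand, n_protein) matrix of interatomic distances.
    dist = np.linalg.norm(lig_coords[:, None, :] - prot_coords[None, :, :], axis=-1)
    features = np.zeros((len(LIGAND_ELEMENTS), len(PROTEIN_ELEMENTS)), dtype=int)
    for a, lig_el in enumerate(lig_elements):
        if lig_el not in LIGAND_ELEMENTS:
            continue  # elements outside the vocabulary are ignored
        for b, prot_el in enumerate(prot_elements):
            if prot_el in PROTEIN_ELEMENTS and dist[a, b] < cutoff:
                row = LIGAND_ELEMENTS.index(lig_el)
                col = PROTEIN_ELEMENTS.index(prot_el)
                features[row, col] += 1
    return features.ravel()


# Toy complex: the protein oxygen at x = 20 Å lies beyond the 12 Å cutoff.
features = pair_count_features(
    prot_elements=["C", "N", "O"],
    prot_coords=[[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [20.0, 0.0, 0.0]],
    lig_elements=["C", "O"],
    lig_coords=[[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]],
)
```

Such counts carry no explicit distance or angle information within the cutoff, which illustrates how simple the input of a competitive ML scoring function can be.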

Gradient boosting (Friedman, 2002; Mason et al., 1999)—often combined with decision trees—is another popular ML technique used in the development of SFs, also thanks to the availability of high-quality open-source implementations such as XGBoost (Chen and Guestrin, 2016) and LightGBM (Ke et al., 2017). Notable scoring functions based on gradient boosting are XGB-Score (Li et al., 2019b), AGL-Score (Nguyen and Wei, 2019), and OPRC-GBT (Wee and Xia, 2021). Shen et al. (2021) recently developed several XGBoost-based classifiers to assess the impact of cross-docked poses on the performance on pose-prediction, highlighting the importance of cross-docked poses for training of ML SFs with a broad applicability domain and increased robustness for pose-prediction.

Instead of learning the experimental protein-ligand binding affinity directly, Wang and Zhang (2016) used a RF model that learns a correction to the AutoDock Vina scoring function (Trott and Olson, 2009), which represents a reasonable baseline, especially for docking and virtual screening. The resulting scoring function, called ΔVinaRF, retains the very good performance of other ML SFs on scoring and ranking while also working well for docking and virtual screening. ΔVinaRF showed great performance in the CASF-2016 benchmark (with a Pearson’s r of 0.82 in the scoring task), but this superior performance can be partially attributed to the overlap between the training set and the CASF-2016 test set (Su et al., 2018).

ML-based scoring functions are still under active development both in terms of methodology and training data. For example, Boyles et al. (2019) showed that including ligand features obtained with RDKit in structure-based ML scoring functions consistently improves the performance in protein-ligand binding affinity prediction. Combining features from RF-Score (v3) with RDKit molecular descriptors improves Pearson’s correlation for the CASF-2016 scoring benchmark from 0.79 to 0.84 (Boyles et al., 2019). Another example of a recently developed scoring function using classical machine learning regression models for binding affinity prediction is RASPD+ (Holderbach et al., 2020).

Several other classical machine learning algorithms, such as kernel ridge regression and Gaussian processes (Williams and Rasmussen, 1996; Rasmussen, 2003), have been used in the development of structure-based scoring functions, but they are not the focus of this review. The interested reader can consult Ain et al. (2015) and Li et al. (2020a) for a more in-depth review of machine learning scoring functions.

4.3. Feed-forward neural networks

Feed-forward neural networks (also known as multilayer perceptrons (MLPs), fully-connected neural networks, artificial neural networks (ANNs), or simply neural networks (NNs)) consist of a series of linear layers combined with point-wise non-linearities called activation functions (Bishop, 2006). Originally, feed-forward neural networks were inspired by the way neurons in the brain work (McCulloch and Pitts, 1943; Widrow and Hoff, 1960; Rosenblatt, 1962).

The basic unit of a neural network is a “neuron” (perceptron, or node), and the neurons in a neural network are grouped into layers that are stacked. Neuron $j$ in layer $k$ takes an input vector $\mathbf{x} \in \mathbb{R}^N$ and returns an output

$$z_j^{(k)} = g\left(\sum_{i}^{N} w_{ji}^{(k)} x_i + b_j^{(k)}\right), \tag{9}$$

where $w_{ji}^{(k)}$ (weights) and $b_j^{(k)}$ (biases) for neuron $j$ in layer $k$ are learnable parameters to be determined during training and where $g(\cdot)$ is a non-linear function, called the activation function. Neural networks are very expressive and can be regarded as universal approximators (Hornik et al., 1989), provided that the number of hidden neurons is large enough and that the activation function belongs to a suitable class (Bishop, 2006).
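In vectorized form, all neurons of a layer can be evaluated at once as a matrix-vector product followed by the activation function. The following minimal sketch shows the forward pass of a small network with random, untrained weights; the layer sizes, the ReLU activation, and the function names are arbitrary choices made for illustration.

```python
import numpy as np


def relu(a):
    """Rectified linear unit, applied element-wise."""
    return np.maximum(a, 0.0)


def dense_layer(x, weights, biases, activation=relu):
    """Compute z_j = g(sum_i w_ji * x_i + b_j) for all neurons j of a layer."""
    return activation(weights @ x + biases)


def mlp_forward(x, layers):
    """Apply a stack of (weights, biases, activation) layers in sequence."""
    for weights, biases, activation in layers:
        x = dense_layer(x, weights, biases, activation)
    return x


rng = np.random.default_rng(0)
n_in, n_hidden = 4, 8
layers = [
    (rng.normal(size=(n_hidden, n_in)), np.zeros(n_hidden), relu),
    # Linear output neuron, as typically used for regression tasks.
    (rng.normal(size=(1, n_hidden)), np.zeros(1), lambda a: a),
]
x = rng.normal(size=n_in)
prediction = mlp_forward(x, layers)
```

For a binding affinity model, `x` would be a descriptor vector of the protein-ligand complex and `prediction` the estimated affinity; training consists of adjusting the weights and biases to minimize a loss over the training set.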

Initially, neural networks were composed of only a small number of neurons with a single (hidden) layer between the input layer and the output layer. Thanks to the development of algorithms able to train neural networks with multiple layers in a simple and efficient way (Rumelhart et al., 1986), neural networks became deeper and deeper (now called deep neural networks, DNNs) by stacking together multiple hidden layers.

The use of simple and deep NNs for the determination of quantitative structure-activity relationships (QSAR) is not new (Salt et al., 1992; Ma et al., 2015; Dahl et al., 2014). One of the first uses of NNs in binding affinity prediction was published by Artemenko (2008), where a subset of physicochemical descriptors and quasi-fragmental descriptors (describing pairwise statistics of interatomic distances) were selected using multiple linear regression and used as input of a feed-forward NN. NNs have also been successfully used for the classification of actives and decoys. Durrant and McCammon (2010) introduced a NN-based SF, NNScore, to distinguish between well and poorly docked ligands as well as actives from decoys. NNScore was later extended to regression of binding affinities in NNScore 2.0 (Durrant and McCammon, 2011b), thus providing a direct estimation of pKd. NNScore 2.0 uses terms from the Vina scoring function (Trott and Olson, 2009), which encode steric, hydrophobic, and hydrogen-bonding interactions, as well as BINANA features (Durrant and McCammon, 2011a) as input and returns an estimate of pKd as output.

Ashtawy and Mahapatra (2015) used a collection of NNs whose predictions are combined with the bagging (bootstrap aggregation) (Breiman, 1996) or boosting (Freund and Schapire, 1997; Friedman, 2002) ensemble methods. The input features were obtained as a combination of classical scoring function terms, together with features from RF-Score. Their BgN-Score and BsN-Score SFs perform significantly better on the PDBbind core set 2007 than classical SFs and surpass SFs based on RFs.

Wójcikowski et al. (2018) showed that an MLP combined with their PLEC fingerprint can achieve very good performance on the CASF-2016 benchmark. However, they also showed that the PLEC FPs perform equally well with a simpler linear model instead of a neural network, confirming that well-crafted descriptors can be extremely powerful.

More recently, Zhu et al. (2020) developed a model for pKd prediction where pairwise contributions are computed with a fully connected NN. Trained on PDBbind 2018, the model achieves a Pearson’s correlation coefficient of 0.75 and an RMSE of 1.44 on the CASF-2016 benchmark, but the authors point out that there is a significant overlap between the test and training sets, which might be boosting the performance of their model. Meli et al. (2021) used a collection of MLPs combined with a local representation of the atomic environment to predict protein-ligand binding affinities, reaching good performance on the CASF-2016 benchmark.
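Since Pearson’s correlation coefficient and the RMSE are quoted throughout this section, a minimal sketch of how both are computed may be useful. The affinity values below are made up; they are chosen to show that a constant offset in the predictions leaves the correlation unchanged while increasing the RMSE, which is why the two metrics are usually reported together.

```python
import numpy as np


def pearson_r(y_true, y_pred):
    """Pearson's correlation coefficient between experimental and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    dt = y_true - y_true.mean()
    dp = y_pred - y_pred.mean()
    return float(np.sum(dt * dp) / np.sqrt(np.sum(dt**2) * np.sum(dp**2)))


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the affinities."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Made-up experimental values and predictions shifted by exactly one unit.
y_exp = np.array([4.0, 5.0, 6.0, 7.0])
y_shifted = y_exp + 1.0
r_shifted = pearson_r(y_exp, y_shifted)
rmse_shifted = rmse(y_exp, y_shifted)
```

Here the correlation is perfect although every prediction is off by one unit, so a high Pearson’s r alone does not guarantee accurate absolute affinities.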

4.4. Convolutional neural networks

Convolutional neural networks (CNNs) (Fukushima, 1980; Le Cun et al., 1989; Lecun et al., 1998; Krizhevsky et al., 2017) are a class of neural networks that try to overcome some of the limitations of feed-forward neural networks by using convolution operations instead of matrix multiplication in some of their layers (Goodfellow et al., 2016). Feed-forward neural networks use a one-dimensional vector as input, which prevents the encoding of spatial relationships, and require many parameters. CNNs are based on three main concepts (Bishop, 2006): local receptive fields (inspired by the structure of the visual cortex (Hubel, 1959; Hubel and Wiesel, 1959)), weight sharing, and subsampling.

Local receptive fields are implemented in convolutional layers, where neurons in a layer do not receive the output of all neurons in the previous layer (as in fully-connected NNs) but only the output of the ones in their local receptive field (Géron, 2019). For two-dimensional grid-based inputs (such as images), the output of the neuron at location $(i, j)$ of feature map $k$ of the convolutional layer $l$ is given by (Géron, 2019)

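$$z_{i,j,k}^{(l)} = b_k^{(l)} + \sum_{u=1}^{f_h^{(l)}} \sum_{v=1}^{f_w^{(l)}} \sum_{k^{(l-1)}=1}^{f_n^{(l-1)}} x_{i^{(l-1)}, j^{(l-1)}, k^{(l-1)}} \, w_{u,v,k^{(l-1)},k}^{(l)}, \tag{10}$$

with

$$\begin{cases} i^{(l-1)} = (i - 1)\, s_h^{(l)} + u, \\ j^{(l-1)} = (j - 1)\, s_w^{(l)} + v. \end{cases} \tag{11}$$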

$f_h$ and $f_w$ are the height and the width of the receptive field (i.e. the size of the 2D convolutional kernel) while $s_h$ and $s_w$ represent the strides (i.e. the size of the displacement of the receptive field). $f_n^{(l-1)}$ denotes the number of feature maps in the previous layer $(l - 1)$. $b_k^{(l)}$ is a bias term associated with feature map $k$, while $w_{u,v,k^{(l-1)},k}^{(l)}$ denotes the weight associated with the connection between the input located at $(u, v)$ in feature map $k^{(l-1)}$ (relative to the neuron’s receptive field) and the neuron in feature map $k$ of layer $l$. Both $b_k^{(l)}$ and $w_{u,v,k^{(l-1)},k}^{(l)}$ are learnable parameters to be determined during training. For clear depictions of the main building blocks of 2D CNNs we refer the reader to Dumoulin and Visin (2016).
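The sums over the receptive field and the input feature maps translate directly into a naive loop-based implementation. The sketch below uses zero-based indices (so that the input row for output row `i` and kernel row `u` is `i * s_h + u`), applies no padding, and is written for clarity rather than speed; the function name and the toy input are assumptions made for illustration.

```python
import numpy as np


def conv2d_forward(x, weights, biases, stride=(1, 1)):
    """Naive 2D convolutional layer without padding.

    x:       (height, width, n_in) input feature maps.
    weights: (f_h, f_w, n_in, n_out) kernel, shared across all positions.
    biases:  (n_out,) one bias per output feature map.
    """
    f_h, f_w, n_in, n_out = weights.shape
    s_h, s_w = stride
    out_h = (x.shape[0] - f_h) // s_h + 1
    out_w = (x.shape[1] - f_w) // s_w + 1
    z = np.zeros((out_h, out_w, n_out))
    for i in range(out_h):
        for j in range(out_w):
            # Receptive field of the output neuron at location (i, j).
            patch = x[i * s_h:i * s_h + f_h, j * s_w:j * s_w + f_w, :]
            for k in range(n_out):
                z[i, j, k] = biases[k] + np.sum(patch * weights[:, :, :, k])
    return z


# 4 x 4 single-channel input, 2 x 2 kernel of ones, stride 2: each output
# value is the sum of one non-overlapping 2 x 2 block of the input.
x = np.arange(16, dtype=float).reshape(4, 4, 1)
weights = np.ones((2, 2, 1, 1))
z = conv2d_forward(x, weights, biases=np.zeros(1), stride=(2, 2))
```

The same kernel `weights` is reused for every output location, which is the weight sharing discussed next; the extension to the 3D grids used for protein-ligand complexes adds a third spatial loop and kernel dimension.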

Parameter sharing in a convolutional network comes from the fact that each weight $w_{u,v,k^{(l-1)},k}^{(l)}$ of the kernel is used at every position of the input, avoiding the need to learn a parameter for each input element, as is the case in MLPs. Parameter sharing does not reduce the computational complexity of the forward pass, but it significantly reduces the number of parameters in the network (when the size of the convolutional kernel is much smaller than the size of the input) and therefore the associated memory footprint (Goodfellow et al., 2016).
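A back-of-the-envelope count makes the saving concrete. The sizes below (a 24 × 24 × 24 grid with 16 channels, a 3 × 3 × 3 kernel, and 32 outputs) are hypothetical values chosen for illustration and do not correspond to any specific published architecture.

```python
def dense_parameter_count(n_inputs, n_outputs):
    """Weights plus biases of a fully-connected layer."""
    return n_inputs * n_outputs + n_outputs


def conv3d_parameter_count(kernel_size, n_in_channels, n_out_channels):
    """Weights plus biases of a 3D convolutional layer with a cubic kernel."""
    return kernel_size**3 * n_in_channels * n_out_channels + n_out_channels


# Hypothetical input: 24 x 24 x 24 voxels with 16 atom-type channels.
n_voxel_values = 24**3 * 16

# Fully-connected layer mapping the flattened grid to only 32 hidden neurons.
n_dense = dense_parameter_count(n_voxel_values, 32)

# Convolutional layer producing 32 feature maps with a 3 x 3 x 3 kernel.
n_conv = conv3d_parameter_count(3, 16, 32)
```

With these sizes the fully-connected layer has 7,077,920 parameters while the convolutional layer has 13,856, even though the latter produces 32 full feature maps rather than 32 scalar outputs.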

Pooling layers—such as maximum pooling (Zhou et al., 1988), and average pooling—are often inserted after (activated) convolutional layers to make the representation approximately invariant to small translations (Goodfellow et al., 2016). Additionally, they reduce the size of the input of the next layer thus increasing the computational efficiency of the CNN, and are essential for dealing with inputs of varying size (Goodfellow et al., 2016).
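The approximate translation invariance provided by pooling can be illustrated with a minimal non-overlapping maximum pooling sketch. The function name, the window size, and the toy input are illustrative assumptions.

```python
import numpy as np


def max_pool2d(x, size=2):
    """Non-overlapping maximum pooling over the spatial axes of a (H, W, C) array."""
    h, w, c = x.shape
    h_out, w_out = h // size, w // size
    trimmed = x[:h_out * size, :w_out * size, :]
    blocks = trimmed.reshape(h_out, size, w_out, size, c)
    return blocks.max(axis=(1, 3))


# A single active pixel, and the same pixel shifted by one column to the left.
x = np.zeros((4, 4, 1))
x[1, 1, 0] = 1.0
shifted = np.roll(x, shift=-1, axis=1)

pooled = max_pool2d(x)
pooled_shifted = max_pool2d(shifted)
```

Both inputs give the same pooled output because the shifted pixel stays within the same 2 × 2 pooling window; a shift that crosses a window boundary would change the output, which is why the invariance is only approximate.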

Convolutional neural networks have been very successfully applied to different tasks in computer vision such as image classification (Krizhevsky et al., 2017) in the ImageNet challenge (Deng et al., 2009; Russakovsky et al., 2015).

Wallach et al. (2015) introduced a structure-based deep convolutional network for bioactivity prediction (classification into two activity classes) of small drug-like molecules against a target of interest. In their architecture, denoted AtomNet, the protein-ligand binding site was converted into a 3D grid (20 Å per side at 1 Å resolution) containing values related to structural features such as the number of atom types or protein-ligand descriptors such as SPLIF (Da and Kireev, 2014), SIFt (Deng et al., 2003), or APIF (Pérez-Nueno et al., 2009). They showed improved ROC-AUC performance compared to their baseline, provided by the SMINA docking software (Koes et al., 2013). Ragoza et al. (2017a) introduced a similar approach to distinguish good (low RMSD) from bad (high RMSD) docking poses using CNNs based on an atomic density representation of the binding site (see Eq. (8)). This approach was later extended to include binding affinity predictions in a multitask learning framework (Sunseri et al., 2018), in which both the binding affinity and the pose quality are predicted at the same time, and it was shown to provide a good correlation between experimental and predicted binding affinities for the CASF-2016 benchmark (Francoeur et al., 2020). The various pre-trained CNN scoring functions are integrated and readily available in the GNINA docking software (McNutt et al., 2021). Jiménez et al. (2018) took a similar approach (with a slightly different density representation, first introduced in DeepSite (Jiménez et al., 2017)) for binding affinity prediction with their Kdeep architecture, and they achieved a very good correlation and low RMSE on the CASF-2016 benchmark. Interestingly, they analyzed the accuracy separately for the 58 different target classes of the CASF-2016 benchmark, revealing that accuracy is very sensitive to the specific protein used. Indeed, protein family-specific CNN models have been developed for virtual screening using a transfer-learning approach (Imrie et al., 2018).

Many other architectures for binding affinity predictions based on CNNs have been developed in recent years. Notable examples are Pafnucy (Stepniewska-Dziubinska et al., 2018), DeepAtom (Li et al., 2019c), OnionNet (Zheng et al., 2019; Wang et al., 2021b) and OnionNet-2 (Wang et al., 2021d), and RoseNet (Hassan-Harrirou et al., 2020).

Pafnucy discretizes the binding site on a three-dimensional grid of 20 Å per side at 1 Å resolution and employs a set of 19 features including one-hot encoding of atom types (including selenium, halogens, and metals), hybridization state, number of bonds with heavy atoms, number of bonds with heteroatoms, and a flag distinguishing protein and ligand atoms. DeepAtom uses a grid of 1 Å resolution to voxelize the binding site, with the same density representation as Jiménez et al. (2018) and using Arpeggio atom types (Jubb et al., 2017), but the architecture is inspired by ShuffleNet V2 (Ma et al., 2018). OnionNet (Zheng et al., 2019) also uses a deep convolutional neural network, but the input features are based on intermolecular element-pair-specific contacts between ligand and protein atoms, which are grouped in different distance shells. Each shell is described by 64 features representing the intermolecular interactions between the protein and the ligand that fall within the shell boundaries, for the eight atom types considered, and a total of 60 shells (of thickness 0.5 Å) is employed (Zheng et al., 2019). This idea was later extended in OnionNet-2 (Wang et al., 2021d), which uses protein residue types instead of protein atom types (increasing the number of features from 64 to 8 × 21 = 168). RoseNet (Hassan-Harrirou et al., 2020) uses an ensemble of CNNs, based on the ResNet architecture (He et al., 2016), that combines molecular mechanics energies from the Rosetta force field (Alford et al., 2017) voxelized onto a 3D grid (25 Å each side, at 1 Å resolution) with molecular descriptors (using an approach similar to Kdeep with descriptors from AutoDock Vina (Trott and Olson, 2009)) to predict absolute binding affinities.
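The shell-based contact features can be sketched as a three-dimensional histogram over distance shell, ligand atom type, and protein atom type. The sketch below makes several simplifying assumptions that are not taken from the text above: the eight type labels are placeholders, the shells are uniform and start at zero distance, and the function name is hypothetical.

```python
import numpy as np

# Placeholder labels for eight atom types (an assumption for illustration).
ELEMENT_TYPES = ["C", "N", "O", "H", "P", "S", "HAX", "DU"]


def shell_contact_features(prot_types, prot_coords, lig_types, lig_coords,
                           n_shells=60, thickness=0.5):
    """Count protein-ligand type pairs in each distance shell.

    Returns an (n_shells, 64) array: one row of 8 x 8 pair counts per shell.
    """
    prot_coords = np.asarray(prot_coords, dtype=float)
    lig_coords = np.asarray(lig_coords, dtype=float)
    dist = np.linalg.norm(lig_coords[:, None, :] - prot_coords[None, :, :], axis=-1)
    shell = np.floor(dist / thickness).astype(int)  # shell index of each pair
    n_types = len(ELEMENT_TYPES)
    features = np.zeros((n_shells, n_types, n_types), dtype=int)
    lig_idx = [ELEMENT_TYPES.index(t) for t in lig_types]
    prot_idx = [ELEMENT_TYPES.index(t) for t in prot_types]
    for a in range(len(lig_idx)):
        for b in range(len(prot_idx)):
            if shell[a, b] < n_shells:  # pairs beyond the last shell are ignored
                features[shell[a, b], lig_idx[a], prot_idx[b]] += 1
    return features.reshape(n_shells, n_types * n_types)


# Toy complex: one ligand nitrogen, a nearby protein carbon and a distant oxygen.
features = shell_contact_features(
    prot_types=["C", "O"],
    prot_coords=[[0.0, 0.0, 0.0], [0.0, 0.0, 40.0]],
    lig_types=["N"],
    lig_coords=[[1.2, 0.0, 0.0]],
)
```

The resulting 60 × 64 array can be treated as a two-dimensional, image-like input for a CNN; replacing protein atom types with 21 residue classes would give 8 × 21 = 168 columns instead.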

CNNs can also be employed with lower-dimensional descriptors. For example, TopologyNet (Cang and Wei, 2017) encodes the three-dimensional protein-ligand complex structure into one-dimensional element-specific fingerprints based on topological invariants. Such element-specific topological fingerprints, stacked together over multiple channels—like a one-dimensional image representation—are then used as input of a CNN, and achieve good performance on the CASF-2016 benchmark. The work was later extended to explore additional algebraic topology approaches (Cang et al., 2018).

CNNs have also been successfully applied to the related task of pose prediction. The CNN of Ragoza et al. (2017a) was initially developed for pose prediction and was extended to binding affinity prediction at a later stage (Francoeur et al., 2020). Other notable examples are DeepBSP (Bao et al., 2021), which uses a 3D voxel representation of protein-ligand complexes to predict the RMSD between a docked ligand and its native pose (an idea previously explored by Aggarwal and Koes (2020)), and MedusaNet (Jiang et al., 2020a), which uses CNNs to predict whether a pose generated by docking is a good pose, so that the docking process can be stopped early once k good poses are found, thus reducing computational costs.

The application of CNNs in the prediction of protein-ligand binding affinities has been quite successful, as demonstrated by the methods discussed above. However, while CNNs are translationally invariant, they are not rotationally invariant and therefore require extensive data augmentation, where the protein-ligand complex is randomly rotated before computing its associated grid representation. Data augmentation with CNNs has proven to be essential to prevent overfitting in pose prediction (Ragoza et al., 2017a), and the average over multiple random rotations can be used during inference, thus reducing the variance of the predictions (Jiménez et al., 2018). Many concepts from geometric deep learning (Bronstein et al., 2021; Atz et al., 2021), such as CNNs that are equivariant to rigid body motions (Weiler et al., 2018), will increasingly find their way into the field of protein-ligand binding affinity prediction, as well as virtual screening, to overcome some of the limitations of standard CNNs by encoding relevant symmetries directly into the model.
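Averaging over random rotations at inference time can be sketched as follows. This is a toy illustration under stated assumptions: `model` stands for any callable that maps atomic coordinates to a score (in practice, voxelization followed by a CNN), and the helper names are hypothetical.

```python
import math
import random


def random_rotation(rng):
    """Return a 3x3 rotation matrix built from a uniformly sampled unit quaternion."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / norm for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def rotate(coords, rot, center):
    """Rotate every point in `coords` about `center`."""
    rotated = []
    for p in coords:
        v = [p[k] - center[k] for k in range(3)]
        rotated.append(
            tuple(center[i] + sum(rot[i][k] * v[k] for k in range(3)) for i in range(3))
        )
    return rotated


def predict_with_rotations(model, coords, center, n_rotations=8, seed=0):
    """Average the predictions of `model` over randomly rotated copies of the input."""
    rng = random.Random(seed)
    preds = [
        model(rotate(coords, random_rotation(rng), center)) for _ in range(n_rotations)
    ]
    return sum(preds) / len(preds), preds


coords = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
center = (0.0, 0.0, 0.0)


def orientation_dependent_model(c):
    # Toy stand-in for a model that is not rotationally invariant.
    return c[0][0]


mean_pred, all_preds = predict_with_rotations(orientation_dependent_model, coords, center)
```

The spread of `all_preds` shows how strongly a prediction depends on the arbitrary orientation of the input; for a rotationally invariant model every entry would coincide.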

4.5. Graph neural networks

Graph neural networks (GNNs) are a collection of DL architectures to work with data that can be represented as a graph (Bronstein et al., 2021). The vast majority of GNNs fall under three categories (Bronstein et al., 2021): convolutional (Kipf and Welling, 2016; Defferrard et al., 2016), attentional (Monti et al., 2017; Veličković et al., 2017; Zhang et al., 2018), and message-passing (Gilmer et al., 2017; Battaglia et al., 2018). A graph $\mathcal{G}(V, E)$ is composed of a set of vertices $v_i \in V$ and a set of edges $e_{ij} \in E$ connecting the vertices. Features $x$ are associated to vertices (and, optionally, edges) and such features are subsequently updated as follows:

$$h_u = \phi\left(x_u, \bigoplus_{v \in \mathcal{N}_u} \psi(x_u, x_v)\right) \tag{12}$$

where $\phi$ and $\psi$ are learnable functions (often learnable affine transformations with activation functions (Bronstein et al., 2021)) and where $\bigoplus$ represents a permutation-invariant function allowing the aggregation of features (such as sum, mean, and maximum (Bronstein et al., 2021)) over the neighborhood $\mathcal{N}_u$ of node $u$. $\psi$ is a message-passing function (which can be generalized to include edge features as well), while $\phi$ is a vertex update function. It is possible to learn edge features as well by introducing a hidden representation $h_{uv}$ for the edges (Kearnes et al., 2016; Gilmer et al., 2017).
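A single update of the form of Eq. 12 can be written in a few lines. The sketch below uses sum aggregation and fixed toy functions in place of the learnable message and update functions; in a real GNN both would be small neural networks, and the names `message_passing_step`, `psi`, and `phi` are illustrative.

```python
def message_passing_step(x, neighbors, psi, phi):
    """One update h_u = phi(x_u, sum over v in N_u of psi(x_u, x_v)) for every node u.

    x: dict mapping node -> feature vector (list of floats)
    neighbors: dict mapping node -> list of neighboring nodes
    psi is assumed to return a message with the same length as x_u.
    """
    h = {}
    for u, x_u in x.items():
        messages = [psi(x_u, x[v]) for v in neighbors[u]]
        # Sum aggregation is permutation invariant over the neighborhood.
        aggregated = [sum(m[k] for m in messages) for k in range(len(x_u))]
        h[u] = phi(x_u, aggregated)
    return h


def psi(x_u, x_v):
    # Toy message function: pass the neighbor features unchanged.
    return list(x_v)


def phi(x_u, aggregated):
    # Toy update function: add the aggregated messages to the node features.
    return [a + b for a, b in zip(x_u, aggregated)]


# Three-atom chain 0 - 1 - 2 with two features per atom.
x = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
h = message_passing_step(x, neighbors, psi, phi)
```

Reordering the neighbor lists leaves `h` unchanged, which is the permutation invariance required of the aggregation function.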

Since molecules can be naturally represented as graphs—with nodes in the graphs representing different atoms and edges in the graph representing the chemical bonds between such atoms—GNNs are well suited to be applied in the field of chemistry (Atz et al., 2021). Message-passing GNNs, which are the most general flavor, have been successfully applied in quantum chemistry applications (Schätt et al., 2018; Schafer et al., 2020; Qiao et al., 2020; Christensen et al., 2021). GNNs have also been applied to several molecular property predictions (Gaudelet et al., 2021), including bioactivity and protein-ligand binding affinity.

Gomes et al. (2017), inspired by the work of Behler and Parrinello (2007), developed an atom type convolution that uses a neighbor-listed distance matrix to automatically extract features about local chemical environments and combine this information with radial pooling to downsample the output of the atom type convolution. Essentially, the atom type convolution performs a graph convolution on the nearest neighbors graph in three-dimensional space. The resulting features are then passed to a collection of fully connected layers (all with the same weights and biases) to predict atomic contributions to the energy, which are summed together to obtain the total Gibbs free energy. To predict the binding free energy, three weight-sharing networks are used (one each for $G_{\mathrm{complex}}$, $G_{\mathrm{protein}}$ and $G_{\mathrm{ligand}}$) and the results are then combined as

$$\Delta G_{\mathrm{complex}} = G_{\mathrm{complex}} - G_{\mathrm{protein}} - G_{\mathrm{ligand}} \tag{13}$$

so that the whole architecture directly incorporates the thermodynamic cycle.
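The way Eq. 13 is built into the architecture can be sketched with a shared per-atom function applied three times. This is a schematic under stated assumptions, not the model of Gomes et al. (2017): `toy_atomic_net` is a hypothetical stand-in for the weight-sharing fully connected layers, and the per-atom feature vectors are invented.

```python
def total_free_energy(atom_features, atomic_net):
    """Sum the per-atom contributions predicted by a shared atomic network."""
    return sum(atomic_net(f) for f in atom_features)


def binding_free_energy(complex_feats, protein_feats, ligand_feats, atomic_net):
    """Delta G = G_complex - G_protein - G_ligand, with the same network used three times."""
    g_complex = total_free_energy(complex_feats, atomic_net)
    g_protein = total_free_energy(protein_feats, atomic_net)
    g_ligand = total_free_energy(ligand_feats, atomic_net)
    return g_complex - g_protein - g_ligand


def toy_atomic_net(features):
    # Hypothetical stand-in for the learned per-atom energy model.
    return -0.5 * sum(features)


# Invented per-atom environment features: atoms in the complex see extra neighbors,
# so their features differ from those of the isolated protein and ligand.
protein_feats = [[1.0], [1.0]]
ligand_feats = [[2.0]]
complex_feats = [[1.0], [2.0], [3.0]]
delta_g = binding_free_energy(complex_feats, protein_feats, ligand_feats, toy_atomic_net)
```

If the atomic environments did not change upon binding, the three sums would cancel exactly and the predicted binding free energy would be zero, so the signal comes entirely from the change in local environments.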

In PotentialNet (Feinberg et al., 2018) the node updates are of the form

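$$h_u^{(k)} = \mathrm{GRU}\left(h_u^{(k-1)}, \sum_{e}^{N_{\mathrm{et}}} \sum_{v \in \mathcal{N}_e(\nu_i)} \mathrm{NN}_e\left(h_v^{(k-1)}\right)\right) \tag{14}$$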

where GRU is a gated recurrent unit (Hochreiter and Schmidhuber, 1997; Cho et al., 2014; Chung et al., 2014), $\mathrm{NN}_e$ is a trainable NN for edge type $e$, $\mathcal{N}_e(\nu_i)$ denotes the neighbors of edge type $e$ for atom $i$, and the outer sum runs over the $N_{\mathrm{et}}$ edge types. The updates are organized in stages: in the first stage information is propagated only between nodes linked by a covalent bond, in the second stage information is propagated across both covalent bonds and non-covalent interactions, and finally everything is aggregated by a ligand-based graph gather. The first stage essentially produces learned (bond-based) atom types, while the second stage includes both bond and spatial information between the atoms (Feinberg et al., 2018). In stage three, all learned features for the ligand atoms are summed together and the resulting vector is used as input of a fully-connected neural network to produce the final prediction.

The graphDelta architecture uses a graph-based representation for the ligand and incorporates information about the target in the node features (Karlov et al., 2020). The node features represent radial and angular Behler-Parrinello atom centered symmetry functions (ACSFs) (Behler and Parrinello, 2007), combined with a message-passing neural network. With enough training epochs, they achieve a Pearson’s correlation coefficient of 0.87 and an RMSE of 1.05 on the CASF-2016 benchmark for binding affinity prediction.

Li et al. (2021c) developed a structure-aware interactive GNN which combines polar coordinate-inspired graph attention layers and pairwise interactive pooling. The graph attention layers leverage distances between nodes and angles between edges to iteratively update node and edge embeddings while preserving distance and angle information among atoms. The pairwise atomic type-aware pooling layer is then used to gather interactive edges to capture long-range interactions. Their model, called SIGN, achieves good results on the CASF-2016 benchmark for binding affinity prediction as well as the CSAR-NRC HiQ set.

Son and Kim (2021) developed GraphBAR, where a graph is constructed from all ligand atoms and protein atoms within 4 Å of the ligand (limited to a maximum of 200 nodes, with zero-padding of the adjacency matrix for smaller graphs). Node features consist of one-hot encoded atom types, atom hybridization states, number of neighboring atoms (heavy atoms and heteroatoms), as well as partial charges, stored in a 200 × 13 feature matrix. Multiple binary adjacency matrices are used to encode different interaction shells with fixed distance intervals. A graph convolution block is applied to each adjacency matrix together with the feature matrix pre-processed by a fully-connected layer. The outputs of the graph convolutional blocks are concatenated and a fully connected layer produces the final prediction. The model shows similar performance to Pafnucy (Stepniewska-Dziubinska et al., 2018), but the training time appears to be considerably shorter (Son and Kim, 2021).

Jiang et al. (2021) developed InteractionGraphNet, where two independent graph convolution modules are stacked to sequentially learn intramolecular and intermolecular interactions using three molecular graphs (one for the ligand, one for the protein, and one for the protein-ligand complex). The protein-ligand bipartite graph is built using protein and ligand atoms within 8 Å of each other. At first, a series of message passing iterations is employed to update the node features in the protein and ligand graphs. Then, these learned node features are used as initial node features for the protein-ligand graph on which edge features representing non-covalent interactions are updated. The learned edge features on the protein-ligand graph, representing the non-covalent interactions between the protein and the ligand, are finally pooled together and used for downstream prediction tasks: binding affinity prediction, virtual screening and pose prediction. For binding affinity prediction, InteractionGraphNet shows good results on the CASF-2016 benchmark, although several systems were removed from the test set.

Moesser et al. (2022) recently developed a simple but effective way to include protein-ligand interactions into ligand-based graphs. Their protein-ligand interaction graph (PLIG) representation featurizes an atom node in the molecular graph by including both atom properties and atom-atom contacts with protein atoms. Combined with the GAT architecture (Veličković et al., 2017), their model reaches a very good performance on the CASF-2016 benchmark.

Moon et al. (2022) used GNNs in a very interesting way. Instead of using standard and general architectures, Moon et al. (2022) included parametrized physics-based equations in the model architecture, to incorporate the appropriate inductive bias with the goal of improving model generalization by forcing the model to learn the underlying chemical interactions. A GNN is used to update node features across covalent bonds and intermolecular interactions, which are then used—together with pairwise distances—as input of physics-based parametrized equations describing intermolecular interactions as well as entropy loss. The parameters of the physics-informed equations are learned during training and contribute to model generalization.

GNNs have also been successfully applied to structure-based virtual screening (classification of actives and decoys) as well as pose prediction (classification of binding poses), as demonstrated by Lim et al. (2019), Morrone et al. (2020), and Stafford et al. (2022). The use of GNNs (and, more generally, geometric deep learning) in drug discovery and drug development is a very active area of research, and a recent overview of several different applications beyond the narrow scope of this review is given by Gaudelet et al. (2021).

4.6. Other Methods

Above we briefly described widely used families of deep learning architectures (MLPs, CNNs, and GNNs) and their application to the development of structure-based scoring functions. One important omission is recurrent neural networks (RNNs) (Rumelhart et al., 1986; Hochreiter and Schmidhuber, 1997; Graves, 2012), which are suited to learn from sequential data (such as language or time series). RNNs are also applied to protein-ligand binding affinity prediction (Karimi et al., 2019) but they usually employ unrelated representations for the protein (often the sequence of amino acids) and the ligand (SMILES strings or related representations). As mentioned above, proteochemometric or pair models (Lenselink et al., 2017; Öztürk et al., 2018; Feng et al., 2018; Shin et al., 2019; Nguyen et al., 2020; Jiang et al., 2020b; Yang et al., 2022) are outside the scope of this review and the reader can find more information in Kimber et al. (2021).

Similarly to proteochemometric models, which combine different—often learned—representations for the protein and the ligand, protein-ligand binding affinity predictions can also benefit from the use of complementary representations of the complex. Jones et al. (2021) combine learned representations of the protein-ligand complex obtained with CNNs and GNNs using mid-level or late deep fusion (Roitberg et al., 2019).

Seo et al. (2021) recently developed BAPA, an architecture based on 1D CNNs combined with an attention layer. The protein-ligand complex is encoded into a 1D descriptor of contacts between the protein and ligand atoms and processed using a 1D CNN to obtain learned features, which are then concatenated with terms from the AutoDock Vina scoring function. The learned features are then encoded into a latent representation using an MLP. The encoded vector is then passed to an attention layer. As described by Chen et al. (2018b), an attention layer computes a weighted sum of input values, where the weights are determined based on the relevance of the different input components. In BAPA, the goal of the attention layer is to extract the components of the input important for binding affinity prediction in a context vector. The encoded and context vectors are then concatenated and used by an MLP to obtain the final prediction. Wang et al. (2021c) also used self-attention in their PointTransformer architecture. The use of the attention mechanism (Luong et al., 2015; Bahdanau et al., 2014) in binding affinity prediction is also found in proteochemometric models (Zhao et al., 2019; Karimi et al., 2019).
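The idea of an attention layer as a relevance-weighted sum can be made concrete with a generic sketch. The scaled dot-product form used here is a common choice and is an assumption of this example; it is not necessarily the attention variant implemented in BAPA.

```python
import math


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Returns the context vector (weighted sum of `values`) and the attention weights.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Softmax over the scores, shifted by the maximum for numerical stability.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [
        sum(w * v[k] for w, v in zip(weights, values)) for k in range(len(values[0]))
    ]
    return context, weights


# The first key is aligned with the query, so the first value dominates the context.
context, weights = attention(
    query=[1.0, 0.0],
    keys=[[5.0, 0.0], [0.0, 5.0]],
    values=[[1.0], [0.0]],
)
```

The weights are non-negative and sum to one, so the context vector is a convex combination of the input values.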

A totally different approach from the data-driven ones reviewed above is to use physics-based methods for the computation of binding free energies accelerated or improved using ML and DL. Thanks to the recent developments in ML force fields (Unke et al., 2021), accurate alchemical free energy calculations based on such force fields are starting to appear (Rufa et al., 2020; Wieder et al., 2021). ML-based corrections to conventional free energy calculations will also play an important role in reaching good prediction accuracy of protein-ligand binding free energies (Dong et al., 2021). While such methods are outside the scope of this review, we believe the exploration and development of ML and DL methods in the field of free energy calculations will provide very interesting outcomes in the coming years, by getting the methodology closer to chemical accuracy while significantly reducing computational costs.

5. Training And Evaluation

5.1. Back-propagation, regularization and transfer learning

Deep learning architectures for supervised learning are usually trained with gradient-based optimisation of a loss (or cost, or error) function that represents some measure of the prediction error (such as the mean squared difference between predicted and expected values). The weights and biases (trainable or learnable parameters) of the model are initialized from a random distribution or in a data-driven fashion (Narkhede et al., 2021), and they are iteratively adjusted by gradient-based optimisation techniques (such as stochastic gradient descent (Bottou et al., 1998)) to minimize a loss function.

Rumelhart et al. (1986) developed an algorithm called backpropagation, which allows computing the gradient of the loss function with respect to the parameters of the model (weights and biases) in an automated and efficient way. The algorithm consists of a forward pass computing the output of each component of the neural network, and the final output is used to evaluate the loss function. Then, the error is propagated backward using the chain rule of calculus to compute the gradients of the loss function with respect to each parameter of the network. The backpropagation algorithm is explained in detail in Goodfellow et al. (2016).
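The forward and backward passes can be written out by hand for a tiny network. The sketch below assumes a single hidden unit, y_hat = w2 · tanh(w1 · x + b1) + b2, and a squared-error loss on one example; the numerical values are arbitrary.

```python
import math


def forward(params, x):
    """Forward pass; also returns the hidden activation needed by the backward pass."""
    w1, b1, w2, b2 = params
    a = math.tanh(w1 * x + b1)
    return w2 * a + b2, a


def loss_and_grads(params, x, y):
    """Squared error and its gradient with respect to (w1, b1, w2, b2)."""
    w1, b1, w2, b2 = params
    y_hat, a = forward(params, x)
    loss = (y_hat - y) ** 2
    # Backward pass: apply the chain rule from the loss back to each parameter.
    d_yhat = 2.0 * (y_hat - y)
    d_w2, d_b2 = d_yhat * a, d_yhat
    d_z = d_yhat * w2 * (1.0 - a * a)  # derivative of tanh(z) is 1 - tanh(z)^2
    d_w1, d_b1 = d_z * x, d_z
    return loss, [d_w1, d_b1, d_w2, d_b2]


def sgd_step(params, grads, lr=0.1):
    """One gradient-descent update of all parameters."""
    return [p - lr * g for p, g in zip(params, grads)]


params = [0.5, -0.1, 0.8, 0.2]
loss, grads = loss_and_grads(params, x=1.5, y=1.0)
updated_loss, _ = loss_and_grads(sgd_step(params, grads), x=1.5, y=1.0)
```

Comparing `grads` with finite differences of the loss is a standard sanity check for a hand-written backward pass.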

Modern deep learning frameworks such as PyTorch (Li et al., 2020b; Paszke et al., 2019) and TensorFlow (Yu et al., 2018) usually require one to define only the forward pass, and gradients of the loss function can be easily and automatically computed with respect to any parameter. The availability of open-source, well-designed, and easy-to-use deep learning frameworks certainly contributed to the increased application of DL in different areas of research, including drug discovery.

Given the large number of parameters, DL architectures are often subject to the pitfalls of overfitting. To prevent overfitting, several techniques are commonly employed such as early stopping (Caruana et al., 2001), and the use of dropout layers (Srivastava et al., 2014).
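Early stopping amounts to monitoring a validation loss and keeping the parameters from the best epoch. A minimal sketch, assuming the validation losses are already available as a list and using a hypothetical `patience` parameter:

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch whose parameters would be kept under early stopping.

    Training stops once the validation loss has not improved for `patience` epochs.
    """
    best_loss, best_epoch = float("inf"), -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` consecutive epochs
    return best_epoch


val_losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.6]
kept_epoch = early_stopping_epoch(val_losses, patience=3)
```

With a patience of 3 the loop stops before reaching the last epoch, even though the loss there is lower; a larger patience trades extra training time for robustness to such plateaus.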

Oftentimes, especially in the field of drug discovery, there is interest in models that are not completely generalizable but work well in specific cases such as specific protein families. Once a model has been trained on a general data set, it is possible to fine-tune the learned parameters to improve performance for specific tasks. Transfer learning (Bozinovski and Fulgosi, 1976) methods can be subdivided into four classes (Pan and Yang, 2010): instance-based, feature-based, parameter-based, and relation-based. Deep transfer learning, the combination of transfer learning and deep learning architectures (Tan et al., 2018), is commonly exploited in drug discovery applications where learned representations are employed in different tasks (feature-based transfer learning) or where pre-trained models are fine-tuned for specific tasks (parameter-based transfer learning). The latter technique has been used successfully to develop protein family-specific models for virtual screening (Imrie et al., 2018), for example. An overview of transfer learning in drug discovery is given by Cai et al. (2020).

Multitask learning, which is closely related to transfer learning, consists in learning multiple endpoints at the same time using a shared representation (Ramsundar et al., 2017). Multitask learning can be used for the development of ML and DL SFs for both pose prediction (docking) and binding affinity prediction (scoring) (Ashtawy and Mahapatra, 2017; Francoeur et al., 2020).

5.2. Evaluation

The models for protein-ligand binding affinity prediction discussed above consist of regression models, which given a protein-ligand complex as input return a real-valued estimate of the binding affinity (usually pKd, pKi, or pIC50).

In the CASF-2016 benchmark, arguably one of the most used benchmarks for the development of scoring functions, the scoring power of a scoring function is measured in terms of correlation between experimental and predicted values. This correlation is measured quantitatively using Pearson’s correlation coefficient r, defined as

$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}, \tag{15}$$

where $(x_i, y_i)$ are the predicted and experimental values of the binding affinity, while $\bar{x}$ and $\bar{y}$ are the corresponding averages on the whole data set. A Pearson’s r of 1.0 indicates perfect correlation, while a Pearson’s r of 0.0 indicates no correlation. The Pearson’s correlation coefficient is often accompanied by the root mean squared error (RMSE)

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i}^{N} (x_i - y_i)^2}, \tag{16}$$

or the mean absolute error (MAE)

$$\mathrm{MAE} = \frac{1}{N}\sum_{i}^{N} |x_i - y_i|, \tag{17}$$

where N is the total number of samples in the test set.
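The three scoring-power metrics are straightforward to compute. The sketch below is a plain-Python illustration with invented affinity values; it is not the CASF-2016 analysis script.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between predictions x and experiments y."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(x, y):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def mae(x, y):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Invented example: predictions that are systematically one pK unit too low.
predicted = [5.0, 6.0, 7.0, 8.0]
experimental = [6.0, 7.0, 8.0, 9.0]
r = pearson_r(predicted, experimental)
error_rmse = rmse(predicted, experimental)
error_mae = mae(predicted, experimental)
```

In this example the correlation is perfect while both error measures equal one pK unit, which illustrates why the correlation coefficient is usually reported together with the RMSE or the MAE.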

The predicted value of the protein-ligand binding affinity can also be used to rank compounds, usually against the same target. Common metrics to evaluate the ranking power of a scoring function are rank correlation coefficients such as Spearman’s ρ (Spearman, 2010) and Kendall’s τ (Kendall, 1938). The Spearman’s rank correlation coefficient is defined as (Spearman, 2010)

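$$\rho = \frac{\sum_i (r_{x_i} - \overline{r_x})(r_{y_i} - \overline{r_y})}{\sqrt{\sum_i (r_{x_i} - \overline{r_x})^2}\,\sqrt{\sum_i (r_{y_i} - \overline{r_y})^2}}, \tag{18}$$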

which is similar to Pearson’s r but uses the predicted and experimental ranks $(r_{x_i}, r_{y_i})$, and the corresponding sample averages, instead of using directly the predicted and experimental values $(x_i, y_i)$. The other difference is that the Pearson’s correlation coefficient is usually computed on the whole data set, while the Spearman’s rank correlation coefficient (like other rank correlation coefficients) is often disaggregated by target. This is the case for the CASF benchmarks, for example (Su et al., 2018). Another way to quantify the ranking power of a scoring function is the predictive index (PI) introduced by Pearlman and Charifson (2001) and defined as

$$\mathrm{PI} = \frac{\sum_i \sum_{j>i} W_{ij} C_{ij}}{\sum_i \sum_{j>i} W_{ij}} \tag{19}$$

where $W_{ij} = |y_i - y_j|$ is the absolute difference between the experimental binding data of ligands $i$ and $j$ and where $C_{ij}$ is defined as (Pearlman and Charifson, 2001)

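$$C_{ij} = \begin{cases} -1 & \text{if } \dfrac{y_j - y_i}{x_j - x_i} < 0 \\ 1 & \text{if } \dfrac{y_j - y_i}{x_j - x_i} > 0 \\ 0 & \text{if } x_j - x_i = 0. \end{cases} \tag{20}$$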

The weights $W_{ij}$ reflect the fact that ranking incorrectly compounds with similar experimental binding affinities is less detrimental than ranking incorrectly compounds with vastly different binding affinities. As for Spearman’s and Kendall’s rank correlation coefficients, the PI is bounded on the interval [−1, 1] (with 0 indicating random predictions).
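Both ranking metrics can be sketched in plain Python. The example assumes that tied values share their average rank when computing Spearman's ρ, and it takes $C_{ij} = 0$ whenever the two predictions are equal; the affinity values are invented.

```python
import math


def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


def predictive_index(x, y):
    """Predictive index for predictions x and experimental values y."""
    num = den = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            w = abs(y[i] - y[j])
            dx, dy = x[j] - x[i], y[j] - y[i]
            if dx == 0 or dy == 0:
                c = 0  # a pair with dy == 0 also has zero weight
            else:
                c = 1 if dy / dx > 0 else -1
            num += w * c
            den += w
    return num / den


predicted = [5.1, 6.3, 5.9, 8.0]
experimental = [5.0, 6.0, 7.0, 9.0]
rho = spearman_rho(predicted, experimental)
pi = predictive_index(predicted, experimental)
```

Only the second and third compounds are ranked in the wrong order, and because their experimental affinities differ by a single unit, the PI penalizes the swap less than an unweighted count of misordered pairs would.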

Confidence intervals for the correlation coefficients described above can be computed using bootstrapping (Efron, 1992). For the CASF-2016 benchmark this is easily done with the provided analysis scripts (Su et al., 2018). The very important topics of calculation of confidence intervals and comparison of different models are discussed at length in Nicholls (2014) and Nicholls (2016) and while we are concerned with regression models in this review, we point the reader interested in the comparison of classification models to Patrick Walters (2021).
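A generic percentile bootstrap for a correlation coefficient can be sketched as follows. This is an illustration of the resampling idea, not the procedure implemented in the CASF-2016 scripts; the function name `bootstrap_ci`, its defaults, and the data are assumptions.

```python
import math
import random


def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def bootstrap_ci(x, y, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for statistic(x, y)."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        # Resample (prediction, experiment) pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        xs, ys = [x[i] for i in idx], [y[i] for i in idx]
        try:
            stats.append(statistic(xs, ys))
        except ZeroDivisionError:
            continue  # degenerate resample, e.g. the same pair drawn n times
    stats.sort()
    lower = stats[int((alpha / 2) * (len(stats) - 1))]
    upper = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lower, upper


# Invented predicted and experimental affinities.
predicted = [5.2, 6.1, 7.3, 4.8, 8.0, 6.6, 5.9, 7.7]
experimental = [5.0, 6.5, 7.0, 5.1, 8.4, 6.0, 6.2, 7.1]
ci_low, ci_high = bootstrap_ci(predicted, experimental, pearson_r, n_boot=500)
```

Resampling pairs, rather than predictions and experimental values independently, preserves the association that the correlation coefficient is meant to measure.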

Given that the gradient-based training described above depends on the initialization of the parameters of the model, multiple models are often trained starting from different weights and using different seeds for the random number generator (used for random weight initialization, random shuffling of examples, …), and the final prediction consists of a combination of the results of the different models (often an average). This ensemble approach has been shown multiple times to improve predictions of machine learning and deep learning models (Hansen and Salamon, 1990; Ericksen et al., 2017; Francoeur et al., 2020; Kwon et al., 2020; Ashtawy and Mahapatra, 2015; Meli et al., 2021). More generally, a consensus score amongst multiple models (also with different architectures) can be used as well (Druchok et al., 2021), and the average between different models (different architectures and/or different training data sets) has been shown to improve pose predictions with CNN scoring functions (McNutt et al., 2021). While the average across different models is often used to estimate the performance of the ensemble, the standard deviation across predictions gives information about their stability and can be used as a diagnostic tool. Low standard deviations are expected within the domain of applicability of the models, while large standard deviations are often a symptom of poor generalizability.
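The ensemble average and its spread can be obtained in a few lines. In this sketch the three "models" are hypothetical stand-ins for networks trained from different random seeds.

```python
import statistics


def ensemble_predict(models, x):
    """Return the mean and sample standard deviation of the ensemble predictions."""
    preds = [model(x) for model in models]
    return statistics.mean(preds), statistics.stdev(preds)


# Hypothetical stand-ins for three models trained with different seeds.
models = [lambda x: x + 0.1, lambda x: x - 0.1, lambda x: x]
mean_affinity, spread = ensemble_predict(models, 6.0)
```

The standard deviation is reported alongside the mean so that predictions with an unusually large spread can be flagged for inspection.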

Consensus scoring is not a new idea applicable only to machine learning and deep learning models; several flavors of consensus scoring have been successfully applied in combining different classical docking scoring functions for a long time (Charifson et al., 1999; Wang and Wang, 2001; Clark et al., 2002). It is now commonly applied in ML and DL scoring functions to improve prediction performance.

Uncertainty quantification is an important field of machine learning and deep learning research, and its applications in drug discovery are a very active area of research. Some uncertainty quantification methods such as Monte Carlo dropout (Gal and Ghahramani, 2016) remain underexplored. Recently, evidential deep learning (Amini et al., 2020) has been applied to uncertainty quantification in DL-based QSAR (Soleimany et al., 2021). Soleimany et al. (2021) show that evidential deep learning yields predictions whose uncertainty correlates with error and that uncertainty can be employed to perform sample-efficient training. Given the flexibility and scalability of the approach, which can be easily incorporated into existing architectures, this approach might contribute to the development of SFs in the near future.

5.3. Cross-validation and data splitting

Very important aspects to consider when training and evaluating a new model are the size of the training set, the overlap between training and test sets, and the data set bias. These aspects need to be carefully evaluated, to properly assess the performance and generalizability of a new model.

The size of the training set affects the performance of ML and DL models and several authors noticed that including more examples in the training set, even of a lower quality such as lower-resolution structures, improves model performance (Li et al., 2015; Francoeur et al., 2020). Learning curves, which show the prediction error as a function of the number of training examples, are commonly employed to evaluate and compare ML and DL methods in molecular properties prediction but they remain somewhat uncommon in the evaluation of structure-based models for binding affinity prediction, probably because of the much smaller size of the data sets available for training and evaluation.

The similarity between training and test sets also has a very high impact on the performance of structure-based models (Kramer and Gedeck, 2010; Li and Yang, 2017) and a careful model evaluation needs to take this similarity into account to avoid artificially inflated performance. Li and Yang (2017) studied the impact on ML SFs of protein structural and sequence similarity between the training and test sets. In their study, they removed training proteins that are highly similar to the ones in the test set, as evaluated by structural and sequence alignment. They concluded that ML SFs do not outperform classical scoring functions after removal of proteins from the training set with a high degree of similarity with the test set and therefore they attributed the higher performance of ML SFs compared to classical SFs to the existence of similarities between proteins in the training and test sets. Li et al. (2018), however, performed a similar study and concluded that the good scoring power of RF-Score is not exclusively due to a high number of similar proteins, although when sufficiently similar targets are present in both the training and test sets ML scoring functions perform consistently better than classical scoring functions (Shen et al., 2020b). Additionally, ML scoring functions are able to exploit new data points as they become available, while classical scoring functions seem unable to exploit the large volumes of structural and interaction data available nowadays; incorporating a larger proportion of similar complexes into the training set does not seem to make classical SFs more accurate, according to Li et al. (2019b).

Boyles et al. (2019) and Su et al. (2020) both developed sub-sets of the PDBbind data set to carefully evaluate the effect of protein and ligand similarities on the performance of models trained on PDBbind and tested on the CASF data set. Boyles et al. (2019) evaluated ligand similarity using the Tanimoto similarity between Morgan fingerprints of each pair of ligands, while protein similarity was evaluated by sequence identity. Su et al. (2020) also used protein sequences to determine protein similarity, but used 3D shape similarity (Vainio et al., 2009) to evaluate the similarity between ligands. Additionally, Su et al. (2020) evaluated binding site similarity (the binding site might be preserved, in contrast to the overall protein sequence) using structural descriptors including residue types and interatomic distances (Yeturu and Chandra, 2008). Both groups confirmed the strong dependence of the performance of ML scoring functions on the similarity between the training and test sets, which poses a challenge in the comparison of ML and DL SFs with classical SFs. While these considerations are very important in the development of new methods and must be taken into account when comparing different models, in practical applications the similarity between the training set and the system under investigation can be exploited to obtain superior predictions compared to classical SFs. For example, Li et al. (2021a) argue that the performance of ML scoring functions is underestimated due to the artificial removal of similarities between the training and test sets and put forward a new benchmark which tries to mimic prospective binding affinity predictions. However, it is important to keep in mind that ML and DL SFs might be less effective when dealing with novel targets or small molecules (Su et al., 2020), and the applicability domain needs to be clearly defined.

Very recently, Ji et al. (2022) developed a free and open-source Python package for curating data sets to benchmark out-of-distribution (OOD) algorithms in the context of protein-ligand binding affinity predictions. The authors report a significant performance gap between in-distribution and OOD experiments, highlighting the need for new and domain-specific techniques allowing better OOD generalization.

Another way to elucidate the performance of ML and DL SFs in light of similarities and dissimilarities between the training and test sets is to use clustered cross-validation. K-fold cross-validation is an established technique for the evaluation of ML and DL models (Arlot and Celisse, 2010) that consists of randomly splitting the training set into K different sets and using, in turn, K − 1 sets for training and the remaining set for validation/testing. Francoeur et al. (2020) evaluated the performance of their CNN scoring function using cross-validation with clusters based on protein sequence and ligand fingerprint similarities (for the models trained using PDBbind) and also concluded that evaluations based on the PDBbind core set are overly optimistic and therefore a rather poor measure of the model’s ability to generalize to novel targets and small molecules.
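Clustered cross-validation differs from random K-fold splitting only in that whole clusters, rather than individual complexes, are assigned to folds. A minimal sketch, assuming cluster labels have already been computed (for example by sequence or fingerprint clustering) and using a simple greedy balancing heuristic that is not taken from any of the cited works:

```python
from collections import defaultdict


def clustered_kfold(cluster_ids, k):
    """Yield (train_idx, test_idx) pairs such that no cluster is split across them.

    cluster_ids[i] is the cluster label of example i.
    """
    members = defaultdict(list)
    for i, c in enumerate(cluster_ids):
        members[c].append(i)
    # Greedy balancing: place the largest clusters first, each into the smallest fold.
    folds = [[] for _ in range(k)]
    for c in sorted(members, key=lambda c: -len(members[c])):
        min(folds, key=len).extend(members[c])
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# Hypothetical cluster labels for eight complexes.
cluster_ids = ["kinase", "kinase", "kinase", "protease", "protease", "gpcr", "gpcr", "nr"]
splits = list(clustered_kfold(cluster_ids, k=3))
```

Because every cluster ends up entirely in one fold, each test fold contains only complexes whose cluster was never seen during training.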

Finally, care should be taken in the presence of data set bias. One of the simplest forms of bias in current data sets is that published binding affinities tend to come from publications where potent binders were identified. Therefore, the distribution of binding affinities available for training might be skewed towards potent binders and the trained model might be unable to predict binding affinities for weak binders. Bias can also be introduced in the construction of the training and test sets. For example, for the classification of actives and decoys on the DUD-E data set (Mysinger et al., 2012) it has been shown that analogue bias together with easily distinguishable decoys (decoy bias) result in CNN SFs exploiting only ligand information even when structure-based information is provided (Chen et al., 2019a). Yang et al. (2020) also caution about the use of DUD-E to train ML and DL models to predict protein-ligand interactions but point out that the data set can still serve as an independent test set. Sieg et al. (2019) analyzed the problem of data set bias in depth and proposed guidelines to recognize biases and develop robust models. Yang et al. (2020) suggest evaluating the performance of ligand-only and protein-only models to better understand what ML and DL methods are learning from protein-ligand complexes.

The problem of unnoticed biases in the data set that are exploited during learning by complex DL models is related to the infamous “black box” nature of some models.

6. Explainable AI

As mentioned in the previous section, the “black box” nature of some models poses serious challenges in the identification of biases in the data sets and often prevents a deeper understanding of the model predictions and especially of its failures. In recent years, a lot of research effort has been devoted to model interpretability and explainable artificial intelligence (XAI) (Lipton, 2018; Gunning et al., 2019; Murdoch et al., 2019).

To unpack the predictions of CNN-based scoring functions, several authors focused on feature attribution methods. For example, Stepniewska-Dziubinska et al. (2018) estimated feature importance of the different input channels by looking at the weight distributions of the convolutional filters of the first layer. Hochuli et al. (2018) also looked at the weights of the convolutional filters of the first layer, which can give some insight on how the model uses the different input atom types. Hochuli et al. (2018) used additional established methods for feature attribution—such as gradient computation, a modified version of layer-wise relevance propagation (Bach et al., 2015), and masking (Štrumbelj et al., 2009; Szegedy et al., 2013)—combined with visualization of the protein-ligand complex, showing that each method provides some insight into their CNN scoring function.

Gradient-based feature attribution methods, which determine (local) feature importance, consist of computing the gradient of the prediction with respect to the input. In DL models, such gradients are readily available thanks to the automatic differentiation machinery of modern deep learning frameworks. Interestingly, the gradients of the SF with respect to atomic coordinates can be used to perform ligand pose optimization in the context of docking (Ragoza et al., 2017b). Masking, a perturbation-based feature attribution approach, consists of removing part of the input in order to measure the change in output. Masking can be performed on single atoms, fragments, or whole protein residues. While masking approaches are close to chemical intuition and directly estimate the feature importance of different atoms or functional groups, they are computationally expensive since they require several evaluations per input.
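The two attribution strategies can be contrasted on a toy example. The sketch below is illustrative only: `toy_score` and its weights are a hypothetical stand-in for a trained SF, and the gradient is approximated by finite differences because no deep learning framework is assumed here (a real model would obtain it by automatic differentiation).

```python
import math

# Hypothetical stand-in for a trained scoring function: a smooth function of
# four input features (e.g. per-atom or per-channel descriptors).
WEIGHTS = [0.8, -0.3, 1.5, 0.1]


def toy_score(features):
    return math.tanh(sum(w * x for w, x in zip(WEIGHTS, features)))


def gradient_attribution(score_fn, features, eps=1e-6):
    """Approximate d(score)/d(feature_i) with central finite differences."""
    gradients = []
    for i in range(len(features)):
        plus = list(features)
        minus = list(features)
        plus[i] += eps
        minus[i] -= eps
        gradients.append((score_fn(plus) - score_fn(minus)) / (2.0 * eps))
    return gradients


def masking_attribution(score_fn, features, mask_value=0.0):
    """Score change when each feature is masked in turn (one extra evaluation each)."""
    reference = score_fn(features)
    changes = []
    for i in range(len(features)):
        masked = list(features)
        masked[i] = mask_value
        changes.append(reference - score_fn(masked))
    return changes


features = [1.0, 0.5, 0.2, 1.0]
gradients = gradient_attribution(toy_score, features)
changes = masking_attribution(toy_score, features)
for i, (g, c) in enumerate(zip(gradients, changes)):
    print(f"feature {i}: gradient = {g:+.3f}, masking change = {c:+.3f}")
```

Masking needs one additional model evaluation per masked feature, which is the cost referred to above; with automatic differentiation, the full gradient is typically obtained from a single forward and backward pass.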

Hochuli et al. (2018) show that feature attribution methods are able to identify important atoms in the ligand and that this information can potentially be employed to optimize protein-ligand interactions during lead optimization. However, it is not always clear why particular atoms are highlighted as important (Hochuli et al., 2018). More recently, Varela-Rial et al. (2022) applied the integrated gradient feature attribution technique (Sundararajan et al., 2017) to their Kdeep model, confirming that the model can generally learn meaningful interactions, but that in some cases important interactions were ignored or protein residues far from the ligand were highlighted. The fact that residues far from the ligand are highlighted as important suggests that in some cases the model is exploiting protein similarity instead of important physical interactions between the protein and the ligand.
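For readers unfamiliar with integrated gradients, the idea is to average the gradient along a straight path from a baseline input to the actual input and to scale it by the input difference. The following sketch applies it to a hypothetical toy model (`toy_score`, with made-up weights and an all-zero baseline); it is not the Kdeep setup of Varela-Rial et al. (2022), and the analytic gradient stands in for what a DL framework would compute automatically.

```python
import math

WEIGHTS = [0.8, -0.3, 1.5, 0.1]  # hypothetical toy model parameters


def toy_score(x):
    return math.tanh(sum(w * xi for w, xi in zip(WEIGHTS, x)))


def toy_gradient(x):
    """Analytic gradient of toy_score; a DL framework would supply this via autodiff."""
    s = toy_score(x)
    return [(1.0 - s * s) * w for w in WEIGHTS]


def integrated_gradients(x, baseline, n_steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients."""
    totals = [0.0] * len(x)
    for step in range(n_steps):
        alpha = (step + 0.5) / n_steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(toy_gradient(point)):
            totals[i] += g
    return [(xi - b) * t / n_steps for xi, b, t in zip(x, baseline, totals)]


x = [1.0, 0.5, 0.2, 1.0]
baseline = [0.0, 0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline)
print(attributions)
# Completeness: the attributions sum (approximately) to score(x) - score(baseline).
print(sum(attributions), toy_score(x) - toy_score(baseline))
```

The completeness property printed at the end is a useful sanity check in practice: if the attributions do not add up to the difference in prediction, the number of integration steps is probably too small.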

The feature attribution methods briefly described above in the context of CNN SFs can be applied to other models as well. For example, gradient-based attribution has been applied in combination with GNNs to identify pharmacophoric features involved in ligand binding (McCloskey et al., 2019), while Cho et al. (2020) applied layer-wise relevance propagation to explain the predictions of their InteractionNet model.

For GNNs, there are several XAI methods specifically tailored to such architectures (Jiménez-Luna et al., 2020), and this is currently a vibrant area of research (Baldassarre and Azizpour, 2019; Agarwal et al., 2021; Yuan et al., 2020). XAI methods for graphs can be classified into two categories (Jiménez-Luna et al., 2020): sub-graph identification and attention-based (Veličković et al., 2017) approaches. Sub-graph identification is useful to identify a compact sub-graph structure as well as a small subset of node features that contribute strongly to the model prediction (Ying et al., 2019). While GNN-based XAI has seen several applications in the prediction of molecular properties and reactivity (Preuer et al., 2019; Ryu et al., 2018; Coley et al., 2019; Jiménez-Luna et al., 2021b), its consistent application to GNN-based structure-based scoring functions is still under-explored.

Uncertainty quantification, briefly discussed above in the context of model evaluation, is also an important XAI technique with the goal of quantifying the reliability of a prediction. Ensemble approaches are currently employed in most applications but probabilistic approaches such as evidential deep learning (Amini et al., 2020; Soleimany et al., 2021) will play a major role in the future.
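As a minimal sketch of the ensemble approach, the spread of the predictions of several independently trained models can be reported alongside their mean. The three "models" below are hypothetical toy functions chosen so that they agree for small inputs and diverge for larger ones; in practice they would be networks trained with different initializations or data subsets.

```python
from statistics import mean, stdev


def ensemble_predict(models, features):
    """Return the ensemble mean and the spread (sample standard deviation)."""
    predictions = [model(features) for model in models]
    return mean(predictions), stdev(predictions)


# Hypothetical ensemble: three toy "models" that agree near x = 0 and diverge away from it.
models = [lambda x, slope=slope: 6.0 + slope * x for slope in (0.9, 1.0, 1.1)]

for x in (1.0, 5.0):
    prediction, uncertainty = ensemble_predict(models, x)
    print(f"x = {x}: predicted pK = {prediction:.2f} +/- {uncertainty:.2f}")
```

A large spread flags inputs on which the ensemble members disagree, which is often (though not always) a sign that the input lies outside the applicability domain.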

Model interpretability is also important for classical ML methods (such as RFs and SVMs) and QSAR models (Marchese Robinson et al., 2017; Sheridan, 2019; Rodríguez-Pérez and Bajorath, 2019; Riniker and Landrum, 2013b), which are not the focus of this review. Several XAI methods are model-agnostic and therefore work with several ML and DL methods. However, it is worth mentioning that the heavily pre-processed features often used in combination with classical ML methods, such as the interaction fingerprints discussed above, might render the models less interpretable than complex DL methods (Lipton, 2018).

XAI approaches have the potential to transform the application of DL in real drug discovery applications. Being able to explain why a particular prediction is relevant and interesting would facilitate the adoption of computational models in experimental pipelines. However, several limitations of XAI remain. For example, XAI approaches are still under active development and research, and often the methods need to be carefully tailored to the problem at hand. Additionally, as pointed out by Jiménez-Luna et al. (2020), there is no method that combines all desirable features of XAI (transparency, justification, informativeness, and uncertainty estimation), and therefore current applications often rely on consensus approaches between methods possessing different desirable features.

A recent, extensive, and very accessible review of XAI applications in drug discovery is given by Jiménez-Luna et al. (2020), who also outline recent advances in the field of XAI that are yet to be applied to chemistry or drug discovery. However, the field is moving at a fast pace and some of the methods for which Jiménez-Luna et al. (2020) reported no application in drug discovery, such as instance-based methods, are now starting to be applied successfully (Wellawatte et al., 2022).

7. Discussion and Conclusion

In this review, we focused on structure-based scoring functions for binding affinity prediction based on deep learning, many of which have been developed in recent years. The large number of recently developed SFs (see Table 2 for a non-exhaustive list) is a testament to how fast the field is moving. Li et al. (2020a) recently reviewed ML and DL scoring functions for structure-based lead optimization developed between 2015 and 2019, but several new DL SFs have been developed and published in the last two years. Another example is the review of Shen et al. (2019), where only one GNN-based scoring function, PotentialNet (Feinberg et al., 2018), was identified; most GNN scoring functions in Table 2 are from 2020 and later.

Table 2.

Non-exhaustive list of deep learning architectures for protein-ligand binding affinity prediction and their performance on the CASF-2016 scoring benchmark (if available). MLPs are included regardless of the number of hidden layers. Some methods are described in multiple publications and the ones referenced in this table are the ones where the model has been evaluated on the PDBbind Core set 2016/CASF-2016 set (or the original publication, if this evaluation is not available). The best result (the highest Pearson’s r) is reported. Different publications might use slightly different custom variations of the CASF-2016 benchmark and the overlap between training and test sets might be taken into account in different ways. We refer the reader to the original publications for details, but we also report the number, N, of systems in the test set to outline possible differences. RMSEs are expressed in pK units.

| Model | Reference | Architecture | Pearson’s r | RMSE | N |
| --- | --- | --- | --- | --- | --- |
| | Artemenko (2008) | MLP | | | |
| NNScore 2.0 | Durrant and McCammon (2011b) | MLP | | | |
| BgN- & BsN-Score | Ashtawy and Mahapatra (2015) | MLP | | | |
| DLscore | Hassan et al. (2018) | MLP | | | |
| PLEC-NN | Wójcikowski et al. (2018) | MLP | 0.82 | | 290 |
| pair | Zhu et al. (2020) | MLP | 0.75 | 1.44 | 285 |
| AEScore | Meli et al. (2021) | MLP | 0.83 | 1.22 | 285 |
| TopologyNet | Cang and Wei (2017) | CNN | 0.81 | 1.34 | 290 |
| Kdeep | Jiménez et al. (2018) | CNN | 0.82 | 1.27 | 290 |
| Pafnucy | Stepniewska-Dziubinska et al. (2018) | CNN | 0.78 | 1.42 | 290 |
| 1D2D-CNN | Cang et al. (2018) | CNN | 0.85 | 1.21 | 290 |
| DeepAtom | Li et al. (2019c) | CNN | 0.81 | 1.32 | 290 |
| OnionNet | Zheng et al. (2019) | CNN | 0.82 | 1.28 | 290 |
| Gnina | Francoeur et al. (2020) | CNN | 0.80 | 1.37 | 280 |
| RosENet | Hassan-Harrirou et al. (2020) | CNN | 0.82 | 1.24 | |
| AK-Score | Kwon et al. (2020) | CNN | 0.81 | | 285 |
| LigityScore1D | Azzopardi and Ebejer (2021) | CNN | 0.74 | 1.46 | 285 |
| OnionNet-2 | Wang et al. (2021d) | CNN | 0.86 | 1.16 | 285 |
| SE-OnionNet | Wang et al. (2021a) | CNN | 0.83 | | 285 |
| ACNN | Gomes et al. (2017) | GNN | | | |
| PotentialNet | Feinberg et al. (2018) | GNN | | | |
| graphDelta | Karlov et al. (2020) | GNN | 0.87 | 1.05 | 285 |
| SIGN | Li et al. (2021c) | GNN | 0.80 | 1.32 | 290 |
| InteractionGraphNet | Jiang et al. (2021) | GNN | 0.84 | 1.22 | 262 |
| GraphBAR | Son and Kim (2021) | GNN | 0.78 | 1.41 | 290 |
| PLIG/GATNet | Moesser et al. (2022) | GNN | 0.84 | 1.22 | 272 |
| PIGNet | Moon et al. (2022) | GNN | 0.76 | | 283 |
| | Berishvili et al. (2019) | CNN/RNN | | | |
| FAST | Jones et al. (2021) | CNN + GNN | 0.81 | 1.31 | 290 |
| BAPA | Seo et al. (2021) | CNN + ATT | 0.82 | 1.30 | 285 |
| PointTransformer | Wang et al. (2021c) | CNN + ATT | 0.85 | 1.19 | 285 |

Table 2 reports the scoring performance of several deep learning SFs, mostly based on MLPs, CNNs, and GNNs, on the CASF-2016 benchmark (whenever available in the primary reference). Tables 3 and 4 report the scoring performance (Pearson’s correlation coefficient and RMSE) on the CSAR-NRC HiQ sets and the Astex Diverse Set, respectively, for the same methods outlined in Table 2. The significantly lower number of methods tested on the CSAR-NRC HiQ sets and the Astex Diverse Set shows that the CASF benchmark is the de facto standard for the assessment of novel ML and DL scoring functions. Going forward, it would be interesting to see the other benchmarks gain more traction in order to obtain more information about scoring function performance.
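For reference, the two metrics reported in these tables can be computed as in the following sketch. The pK values are invented for illustration and do not come from any benchmark; the example is constructed so that a constant offset in the predictions leaves Pearson’s r at 1 while the RMSE is 1 pK unit, which shows why the two metrics are usually reported together.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def rmse(x, y):
    """Root-mean-square error between predictions and reference values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Made-up experimental pK values and predictions that are off by a constant.
experimental = [4.0, 5.5, 6.2, 7.8, 9.1]
predicted = [value + 1.0 for value in experimental]

print(f"Pearson's r = {pearson_r(experimental, predicted):.2f}")
print(f"RMSE = {rmse(experimental, predicted):.2f} pK units")
```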

Table 3. Performance of the models summarized in Table 2 on the CSAR-NRC HiQ scoring benchmark. We only report evaluation results from the original reference. RMSEs are expressed in pK units.

| Model | Reference | Set 1 r | Set 1 RMSE | Set 2 r | Set 2 RMSE |
| --- | --- | --- | --- | --- | --- |
| Kdeep | Jiménez et al. (2018) | 0.72 | 2.08 | 0.65 | 1.91 |
| RosENet | Hassan-Harrirou et al. (2020) | 0.83 | 1.78 | 0.80 | 1.44 |
| OnionNet-2 | Wang et al. (2021d) | 0.89 | 1.50 | 0.87 | 1.21 |
| graphDelta | Karlov et al. (2020) | 0.74 | 1.59 | 0.71 | 1.52 |
| GraphBAR | Son and Kim (2021) | 0.75 | 1.59 | 0.65 | 1.56 |
| PIGNet | Moon et al. (2022) | 0.77 | | 0.80 | |
| BAPA | Seo et al. (2021) | 0.83 | 1.06 | 0.75 | 0.98 |

Table 4. Performance of the models summarized in Table 2 on the Astex Diverse Set scoring benchmark. We only report evaluation results from the original reference. RMSEs are expressed in pK units.

| Model | Reference | Pearson’s r | RMSE |
| --- | --- | --- | --- |
| Pafnucy | Stepniewska-Dziubinska et al. (2018) | 0.57 | 1.37 |
| DeepAtom | Li et al. (2019c) | 0.77 | 1.03 |
| RosENet | Hassan-Harrirou et al. (2020) | 0.48 | 1.65 |

Despite the standardized benchmarks, some methods required the removal of some systems (because they led to parametrization problems or fell outside the applicability domain); nevertheless, it is clear that most methods achieve similar performance on the CASF-2016 benchmark. Additionally, the comparison between different methods on the same benchmark remains challenging due to possible differences in the training set and the possible overlap between training and test sets. Finally, most methods are only tested on the CASF benchmark, despite other benchmark sets being widely available. These observations call for an in-depth comparison of the different methods trained and tested on exactly the same data sets, and using all available high-quality test sets.

The performance on CASF-2016 of the DL methods reviewed here is much higher than the performance of classical SFs on the same benchmark (Su et al., 2018). However, deep learning scoring functions do not always perform better or significantly better than scoring functions based on classical ML algorithms (Li et al., 2020a). For example, it was shown that deep NNs and shallow regularized NNs perform similarly in QSAR applications when using the same set of descriptors (Winkler and Le, 2016), and RF-based methods can achieve state-of-the-art performance when combined with suitable descriptors (Boyles et al., 2019). This is in stark contrast with other fields such as computer vision and natural language processing, where DL has quickly overtaken classical ML algorithms. Additionally, while most ML and DL SFs for binding affinity prediction are trained and tested on crystal structures, their performance deteriorates when they are trained and tested on docked poses (Boyles et al., 2021). It is worth noting, however, that augmenting structure-based features obtained from docked structures with ligand-based features can recover the performance of structure-based models trained on crystal structures.

Another problem identified with ML and DL structure-based SFs for binding affinity prediction is that while they perform significantly better than classical SFs for scoring (better correlation of the score with experimental binding affinities), they often perform poorly in virtual screening tasks (Gabel et al., 2014). Gabel et al. (2014) suggest that the development of novel ML and DL scoring functions for binding affinity prediction should be accompanied by an analysis of ligand pose sensitivity and enrichment capabilities in structure-based virtual screening. A more recent study by Shen et al. (2020a) confirms that ML scoring functions trained on PDBbind do not work well for virtual screening, especially on novel targets or targets with unconventional binding pockets. Multitask learning for binding affinity prediction and pose prediction, trained using docked poses instead of crystallographic structures, is effective at increasing pose sensitivity in the context of CNN scoring functions (Francoeur et al., 2020). In the context of virtual screening, data augmentation techniques can also increase pose sensitivity by forcing the model to rely less on ligand information (Scantlebury et al., 2020).
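Enrichment in virtual screening is commonly summarized by the enrichment factor, i.e. the fraction of actives among the top-ranked compounds divided by the fraction of actives in the whole library. The sketch below uses this standard definition with invented scores and activity labels; it is not taken from any of the studies cited above, and it assumes that a higher score means a better predicted binder.

```python
def enrichment_factor(scores, is_active, fraction):
    """Enrichment factor at the given fraction of the ranked library.

    Assumes that higher scores indicate stronger predicted binding.
    """
    n_total = len(scores)
    n_top = max(1, round(fraction * n_total))
    ranked = sorted(range(n_total), key=lambda i: scores[i], reverse=True)
    actives_top = sum(is_active[i] for i in ranked[:n_top])
    actives_total = sum(is_active)
    return (actives_top / n_top) / (actives_total / n_total)


# Made-up library of ten compounds, two of which are active (label 1).
scores = [9.1, 8.7, 8.5, 7.9, 7.2, 6.8, 6.5, 6.1, 5.4, 5.0]
is_active = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]

print(enrichment_factor(scores, is_active, fraction=0.2))
print(enrichment_factor(scores, is_active, fraction=0.5))
```

An enrichment factor of 1 corresponds to random selection, so a scoring function with a good Pearson’s r on a scoring benchmark but an enrichment factor close to 1 illustrates the mismatch between scoring and screening performance discussed above.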

It is well known that the maximum achievable performance of ML and DL models for binding affinity prediction is limited by experimental errors and uncertainties (Kramer et al., 2012). This explains the similar performance of the best performing models on CASF-2016, which are likely close to the theoretical limit. Ventures like the Critical Assessment of Computational Hit-finding Experiments (CACHE) (Müller et al., 2022) will play an important role in validating computational methods in the future and in generating a larger corpus of very high-quality data.

Going forward, it is important to evaluate ML and DL scoring functions as part of the docking pipeline. Most SFs discussed here are applied as a post-processing step of docking—or they are only applied to crystal structures—and only a few SFs seem to have been incorporated into readily available docking software. One such example is GNINA, where the CNN scoring function can be employed within the docking pipeline to re-score or locally optimize the ligand poses after fast Monte Carlo search (McNutt et al., 2021).

In this review, we have focussed mainly on methods for the prediction of protein-ligand binding affinity, and on scoring functions evaluated on scoring tasks. However, ranking different compounds against the same target of interest is extremely useful in drug discovery applications. This is the case for lead optimization, where a lead compound against the target of interest has been identified and the goal is to increase potency while improving pharmacokinetic and pharmacodynamic properties. With binding affinity predictions, computing such rankings is trivial. However, it remains unclear whether ML and DL methods developed for scoring work equally well for ranking, especially in real drug discovery applications. For example, some methods trained to predict binding affinities performed poorly on the different task of predicting the differences in binding affinity upon protein mutation (Aldeghi et al., 2018b). DL methods specifically designed for ranking (that is, for computing relative binding affinities) have been developed (Jiménez-Luna et al., 2019) and are an active area of research (McNutt and Koes, 2022).
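Ranking quality for a single target is often quantified with Spearman’s rank correlation between predicted and experimental affinities. The following sketch uses invented pK values for four hypothetical compounds against one target and the textbook formula for data without ties; it only illustrates that ranking and scoring metrics can disagree.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rank correlation for tie-free data."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Made-up pK values for four compounds against the same target.
experimental = [6.1, 7.4, 5.2, 8.0]
compressed = [6.5, 6.9, 6.0, 7.2]  # wrong magnitudes, correct order
swapped = [7.0, 6.0, 5.0, 8.0]     # two compounds ranked in the wrong order

print(spearman_rho(experimental, compressed))
print(spearman_rho(experimental, swapped))
```

The first set of predictions ranks the compounds perfectly despite compressing the affinity range, whereas the second set swaps two compounds; this is why a model can have a sizeable error on a scoring metric and still be useful for ranking, or the other way round.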

Given that the performance of DL SFs varies widely with the target under consideration (Jiménez et al., 2018; Hassan-Harrirou et al., 2020; Meli et al., 2021), there is a lot of room for improvement in the development of target-specific scoring functions (Ross et al., 2013; Nogueira and Koch, 2019). ML and DL algorithms are very good at exploiting similarities between inputs to perform predictions, as demonstrated by the performance drop when similarities between the training and test sets are removed (Boyles et al., 2019; Su et al., 2020), and therefore family-specific scoring functions will play an increasing role in the early stages of drug discovery, when a particular target has been identified. However, it is still unclear whether family-specific structure-based SFs consistently outperform ligand-based methods (Shen et al., 2020a).

Finally, given the ultimate goal of lowering the high attrition rate at later stages of drug discovery, the use of ADME/Tox predictions will also play an increasingly important role (Bhhatarai et al., 2019) alongside SFs to identify potent compounds against the target of interest and prioritize compounds for further experimental validation.

While the application of deep learning has not yet provided a step-change improvement in the performance of binding affinity prediction compared to classical ML methods, further research into novel architectures, combined with the ever-increasing size and quality of data sets of protein-ligand complexes, might turn the tide in the future. Physics-based ML and DL will probably take over from purely data-driven models in the long term, combining the best of both worlds. It is however important to remain realistic about the capabilities of DL SFs, and it will be interesting to see how they actually perform in real-world drug discovery applications. Schneider et al. (2019) suggest a “curious but cautious approach” to the application of DL in the drug discovery process. XAI methods will certainly play a central role in the application of DL scoring functions to real drug discovery programs because knowing the reason behind a given prediction and understanding the failure modes of the developed models will help to guide the next steps in the drug discovery process.

Figure 1.

Schematic representation of a structure-based deep learning architecture—with several hidden layers—for binding affinity prediction. The protein-ligand complex (PDB ID: 5S9H, rendered with Illustrate (Goodsell et al., 2019a)) is encoded into a suitable representation that is used as input to the deep learning architecture. Using a series of stacked hidden layers, a prediction of the binding affinity is finally obtained. The exact nature of the input layer as well as the hidden layers depends on the type of architecture under consideration.

Acknowledgments

Funding

This work was supported by funding from the Biotechnology and Biological Sciences Research Council (BBSRC) [BB/MO11224/1], the National Productivity Investment Fund (NPIF) [BB/S50760X/1], and Evotec (UK) via the Interdisciplinary Biosciences DTP at the University of Oxford.

Footnotes

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Author Contributions

RM, GMM, and PCB conceptualized the review. RM wrote the initial draft. RM, GMM, and PCB edited, reviewed, and expanded the initial draft. All authors read and approved the final manuscript.

Data Availability Statement

References

  1. Abraham MJ, Murtola T, Schulz R, Páll S, Smith JC, Hess B, et al. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX. 2015;1-2:19–25. doi: 10.1016/j.softx.2015.06.001.
  2. Adcock SA, McCammon JA. Molecular dynamics: Survey of methods for simulating the activity of proteins. ChemInform. 2006;37:1589–1615. doi: 10.1002/chin.200630297.
  3. Agarwal C, Zitnik M, Lakkaraju H. Towards a rigorous theoretical analysis and evaluation of GNN explanations. arXiv preprint. 2021:arXiv:2106.09078
  4. Aggarwal R, Gupta A, Chelur V, Jawahar CV, Priyakumar UD. DeepPocket: Ligand binding site detection and segmentation using 3D convolutional neural networks. J Chem Inf Model. 2021 doi: 10.1021/acs.jcim.1c00799.
  5. Aggarwal R, Koes DR. Learning rmsd to improve protein-ligand scoring and pose selection. ChemRxiv. 2020 doi: 10.26434/chemrxiv.11910870.v2.
  6. Ahmed A, Smith RD, Clark JJ, Dunbar JB, Carlson HA. Recent improvements to binding MOAD: A resource for protein–ligand binding affinities and structures. Nucleic Acids Res. 2014;43:D465–D469. doi: 10.1093/nar/gku1088.
  7. Ain QU, Aleksandrova A, Roessler FD, Ballester PJ. Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening. WIREs Comput Mol Sci. 2015;5:405–424. doi: 10.1002/wcms.1225.
  8. Aldeghi M, Bluck JP, Biggin PC. Methods in Molecular Biology. Springer; New York: 2018a. Absolute alchemical free energy calculations for ligand binding: A beginner’s guide; pp. 199–232.
  9. Aldeghi M, Gapsys V, de Groot BL. Accurate estimation of ligand binding affinity changes upon protein mutation. ACS Cent Sci. 2018b;4:1708–1718. doi: 10.1021/acscentsci.8b00717.
  10. Aldeghi M, Heifetz A, Bodkin MJ, Knapp S, Biggin PC. Accurate calculation of the absolute free energy of binding for drug molecules. Chem Sci. 2016;7:207–218. doi: 10.1039/c5sc02678d.
  11. Aldeghi M, Heifetz A, Bodkin MJ, Knapp S, Biggin PC. Predictions of ligand selectivity from absolute binding free energy calculations. JACS. 2017;139:946–957. doi: 10.1021/jacs.6b11467.
  12. Alford RF, Leaver-Fay A, Jeliazkov JR, O’Meara MJ, DiMaio FP, Park H, et al. The Rosetta all-atom energy function for macromolecular modeling and design. J Chem Theory Comput. 2017;13:3031–3048. doi: 10.1021/acs.jctc.7b00125.
  13. Alibay I, Mangakar A, Seeliger D, Biggin P. Evaluating the use of absolute binding free energy in the fragment optimization process. ChemRxiv. 2022 doi: 10.26434/chemrxiv-2022-cw2kq.
  14. Allen WJ, Balius TE, Mukherjee S, Brozell SR, Moustakas DT, Lang PT, et al. DOCK 6: Impact of new features and current docking performance. J Comput Chem. 2015;36:1132–1156. doi: 10.1002/jcc.23905.
  15. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–410. doi: 10.1016/s0022-2836(05)80360-2.
  16. Amini A, Schwarting W, Soleimany A, Rus D. Deep evidential regression. Advances in Neural Information Processing Systems. 2020;33:14927–14937.
  17. Aqvist J, Medina C, Samuelsson J-E. A new method for predicting binding affinity in computer-aided drug design. Protein Engineering, Design and Selection. 1994;7:385–391. doi: 10.1093/protein/7.3.385.
  18. Arlot S, Celisse A. A survey of cross-validation procedures for model selection. Stat Surv. 2010;4:40–79. doi: 10.1214/09-ss054.
  19. Artemenko N. Distance dependent scoring function for describing protein-ligand intermolecular interactions. J Chem Inf Model. 2008;48:569–574. doi: 10.1021/ci700224e.
  20. Ashtawy HM, Mahapatra NR. BgN-score and BsN-score: Bagging and boosting based ensemble neural networks scoring functions for accurate binding affinity prediction of protein-ligand complexes. BMC Bioinf. 2015;16:1–12. doi: 10.1186/1471-2105-16-s4-s8.
  21. Ashtawy HM, Mahapatra NR. Task-specific scoring functions for predicting ligand binding poses and affinity and for screening enrichment. J Chem Inf Model. 2017;58:119–133. doi: 10.1021/acs.jcim.7b00309.
  22. Atz K, Grisoni F, Schneider G. Geometric deep learning on molecular representations. Nature Machine Intelligence. 2021;3:1023–1032. doi: 10.1038/s42256-021-00418-8.
  23. Azzopardi J, Ebejer J. LigityScore: Convolutional neural network for binding-affinity predictions; Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies; 2021; pp. 38–49.
  24. Bach S, Binder A, Montavon G, Klauschen F, Müller K-R, Samek W. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS One. 2015;10:e0130140. doi: 10.1371/journal.pone.0130140.
  25. Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv preprint. 2014:arXiv:1409.0473
  26. Baldassarre F, Azizpour H. Explainability techniques for graph convolutional networks. arXiv preprint. 2019:arXiv:1905.13686
  27. Baldi P. Deep Learning in Science. Cambridge University Press; 2021.
  28. Ballester PJ, Mitchell JBO. A machine learning approach to predicting protein–ligand binding affinity with applications to molecular docking. Bioinformatics. 2010;26:1169–1175. doi: 10.1093/bioinformatics/btq112.
  29. Ballester PJ, Schreyer A, Blundell TL. Does a more precise chemical description of Protein–Ligand complexes lead to more accurate prediction of binding affinity? J Chem Inf Model. 2014;54:944–955. doi: 10.1021/ci500091r.
  30. Bao J, He X, Zhang JZH. DeepBSP—a machine learning method for accurate prediction of Protein–Ligand docking structures. J Chem Inf Model. 2021;61:2231–2240. doi: 10.1021/acs.jcim.1c00334.
  31. Bartók AP, De S, Poelking C, Bernstein N, Kermode JR, Csányi G, et al. Machine learning unifies the modeling of materials and molecules. Sci Adv. 2017;3:e1701816. doi: 10.1126/sciadv.1701816.
  32. Bartók AP, Kondor R, Csányi G. On representing chemical environments. Phys Rev B. 2013;87:184115. doi: 10.1103/physrevb.87.184115.
  33. Bash PA, Singh UC, Brown FK, Langridge R, Kollman PA. Calculation of the relative change in binding free energy of a protein-inhibitor complex. Science. 1987;235:574–576. doi: 10.1126/science.3810157.
  34. Baskin II. The power of deep learning to ligand-based novel drug discovery. Expert Opin Drug Discovery. 2020;15:755–764. doi: 10.1080/17460441.2020.1745183.
  35. Battaglia PW, Hamrick JB, Bapst V, Sanchez-Gonzalez A, Zambaldi V, Malinowski M, et al. Relational inductive biases, deep learning, and graph networks. arXiv preprint. 2018:arXiv:1806.01261
  36. Behler J, Parrinello M. Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys Rev Lett. 2007;98:146401. doi: 10.1103/physrevlett.98.146401.
  37. Benson ML, Smith RD, Khazanov NA, Dimcheff B, Beaver J, Dresslar P, et al. Binding MOAD, a high-quality protein ligand database. Nucleic Acids Res. 2007;36:D674–D678. doi: 10.1093/nar/gkm911.
  38. Bento AP, Gaulton A, Hersey A, Bellis LJ, Chambers J, Davies M, et al. The ChEMBL bioactivity database: An update. Nucleic Acids Res. 2013;42:D1083–D1090. doi: 10.1093/nar/gkt1031.
  39. Berendsen H, van der Spoel D, van Drunen R. GROMACS: A message-passing parallel molecular dynamics implementation. Comput Phys Commun. 1995;91:43–56. doi: 10.1016/0010-4655(95)00042-e.
  40. Berishvili VP, Perkin VO, Voronkov AE, Radchenko EV, Syed R, Venkata Ramana Reddy C, et al. Time-domain analysis of molecular dynamics trajectories using deep neural networks: Application to activity ranking of tankyrase inhibitors. J Chem Inf Model. 2019;59:3519–3532. doi: 10.1021/acs.jcim.9b00135.
  41. Berman HM, Bhat TN, Bourne PE, Feng Z, Gilliland G, Weissig H, et al. The protein databank. Nat Struct Biol. 2000;7:957–959. doi: 10.1038/80734.
  42. Bernstein FC, Koetzle TF, Williams GJ, Meyer EF, Brice MD, Rodgers JR, et al. The protein data bank: A computer-based archival file for macromolecular structures. J Mol Biol. 1977;112:535–542. doi: 10.1016/s0022-2836(77)80200-3.
  43. Bhhatarai B, Walters WP, Hop CECA, Lanza G, Ekins S. Opportunities and challenges using artificial intelligence in ADME/Tox. Nat Mater. 2019;18:418–422. doi: 10.1038/s41563-019-0332-5.
  44. Bishop CM. Pattern Recognition and Machine Learning. Springer; New York: 2006.
  45. Böhm H-J. Ludi: rule-based automatic design of new substituents for enzyme inhibitor leads. Journal of Computer-Aided Molecular Design. 1992;6:593–606. doi: 10.1007/BF00126217.
  46. Böhm H-J. The development of a simple empirical scoring function to estimate the binding constant for a protein-ligand complex of known three-dimensional structure. J Comput Aided Mol Des. 1994;8:243–256. doi: 10.1007/bf00126743.
  47. Böhm H-J, Stahl M. Reviews in Computational Chemistry. Vol. 18. John Wiley & Sons, Inc; 2002. The use of scoring functions in drug discovery applications; pp. 41–87. chap. 2.
  48. Boresch S, Tettinger F, Leitgeb M, Karplus M. Absolute binding free energies: A quantitative approach for their calculation. The Journal of Physical Chemistry B. 2003;107:9535–9551. doi: 10.1021/jp0217839.
  49. Boser BE, Guyon IM, Vapnik VN. A training algorithm for optimal margin classifiers; Proceedings of the fifth annual workshop on Computational learning theory - COLT ’92; ACM Press; 1992. pp. 144–152.
  50. Bottou L, et al. Online learning and stochastic approximations. On-line learning in neural networks. 1998;17:142.
  51. Boyles F, Deane CM, Morris GM. Learning from the ligand: Using ligand-based features to improve binding affinity prediction. Bioinformatics. 2019;36:758–764. doi: 10.1093/bioinformatics/btz665.
  52. Boyles F, Deane CM, Morris GM. Learning from docked ligands: Ligand-based features rescue structure-based scoring functions when trained on docked poses. J Chem Inf Model. 2021 doi: 10.1021/acs.jcim.1c00096.
  53. Bozinovski S, Fulgosi A. The influence of pattern similarity and transfer learning upon training of a base perceptron b2. Proceedings of Symposium Informatica. 1976:3–121.
  54. Breiman L. Bagging predictors. Mach Learn. 1996;24:123–140. doi: 10.1007/bf00058655.
  55. Breiman L. Random forests. Mach Learn. 2001;45:5–32.
  56. Bronstein MM, Bruna J, Cohen T, Veličković P. Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. arXiv preprint. 2021:arXiv:2104.13478
  57. Brown DG, Wobst HJ. A decade of FDA-approved drugs (2010–2019): Trends and future directions. J Med Chem. 2021;64:2312–2338. doi: 10.1021/acs.jmedchem.0c01516.
  58. Brown N, editor. Drug Discovery. Royal Society of Chemistry; 2020. Artificial Intelligence in Drug Discovery.
  59. Cai C, Wang S, Xu Y, Zhang W, Tang K, Ouyang Q, et al. Transfer learning for drug discovery. J Med Chem. 2020;63:8683–8694. doi: 10.1021/acs.jmedchem.9b02147.
  60. Cang Z, Mu L, Wei G-W. Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening. PLOS Comput Biol. 2018;14:e1005929. doi: 10.1371/journal.pcbi.1005929.
  61. Cang Z, Wei G-W. TopologyNet: Topology based deep convolutional and multi-task neural networks for biomolecular property predictions. PLOS Comput Biol. 2017;13:e1005690. doi: 10.1371/journal.pcbi.1005690.
  62. Carlson HA, Smith RD, Damm-Ganamet KL, Stuckey JA, Ahmed A, Convery MA, et al. CSAR 2014: A benchmark exercise using unpublished data from pharma. J Chem Inf Model. 2016;56:1063–1077. doi: 10.1021/acs.jcim.5b00523.
  63. Caruana R, Lawrence S, Giles L. Overfitting in neural nets: Backpropagation, conjugate gradient, and early stopping. Adv Neural Inf Process Syst. 2001:402–408.
  64. Chang C-eA, Chen W, Gilson MK. Ligand configurational entropy and protein binding. Proc Natl Acad Sci. 2007;104:1534–1539. doi: 10.1073/pnas.0610494104.
  65. Charifson PS, Corkery JJ, Murcko MA, Walters WP. Consensus scoring: A method for obtaining improved hit rates from docking databases of three-dimensional structures into proteins. J Med Chem. 1999;42:5100–5109. doi: 10.1021/jm990352k.
  66. Chen H, Engkvist O, Wang Y, Olivecrona M, Blaschke T. The rise of deep learning in drug discovery. Drug discovery today. 2018a;23:1241–1250. doi: 10.1016/j.drudis.2018.01.039. [DOI] [PubMed] [Google Scholar]
  67. Chen L, Cruz A, Ramsey S, Dickson CJ, Duca JS, Hornak V, et al. Hidden bias in the DUD-E dataset leads to misleading performance of deep learning in structure-based virtual screening. PLoS One. 2019a;14:e0220113. doi: 10.1371/journal.pone.0220113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  68. Chen MX, Firat O, Bapna A, Johnson M, Macherey W, Foster G, et al. The best of both worlds: Combining recent advances in neural machine translation; Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); 2018b; pp. 5998–6008. [DOI] [Google Scholar]
  69. Chen P, Ke Y, Lu Y, Du Y, Li J, Yan H, et al. DLIGAND2: an improved knowledge-based energy function for protein–ligand interactions using the distance-scaled, finite, ideal-gas reference state. Journal of cheminformatics. 2019b;11:1–11. doi: 10.1186/s13321-019-0373-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  70. Chen T, Guestrin C. XGBoost; Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016; pp. 785–794. [DOI] [Google Scholar]
  71. Chen X, Liu M, Gilson M. BindingDB: A web-accessible molecular recognition database. Combinatorial Chemistry & High Throughput Screening. 2001;4:719–725. doi: 10.2174/1386207013330670. [DOI] [PubMed] [Google Scholar]
  72. Cheng T, Li X, Li Y, Liu Z, Wang R. Comparative assessment of scoring functions on a diverse test set. J Chem Inf Model. 2009;49:1079–1093. doi: 10.1021/ci9000053. [DOI] [PubMed] [Google Scholar]
  73. Cho H, Lee EK, Choi IS. Layer-wise relevance propagation of InteractionNet explains protein–ligand interactions at the atom level. Scientific reports. 2020;10:1–11. doi: 10.1038/s41598-020-78169-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  74. Cho K, Van Merriënboer B, Bahdanau D, Bengio Y. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint. 2014:arXiv:1409.1259 [Google Scholar]
  75. Christensen AS, Sirumalla SK, Qiao Z, O’Connor MB, Smith DGA, Ding F, et al. OrbNet Denali: A machine learning potential for biological and organic chemistry with semi-empirical cost and DFT accuracy. J Chem Phys. 2021;155:204103. doi: 10.1063/5.0061990. [DOI] [PubMed] [Google Scholar]
  76. Chuang KV, Gunsalus LM, Keiser MJ. Learning molecular representations for medicinal chemistry. J Med Chem. 2020;63:8705–8722. doi: 10.1021/acs.jmedchem.0c00385. [DOI] [PubMed] [Google Scholar]
  77. Chung J, Gulcehre C, Cho K, Bengio Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint. 2014:arXiv:1412.3555 [Google Scholar]
  78. Chupakhin V, Marcou G, Gaspar H, Varnek A. Simple Ligand–Receptor interaction descriptor (SILIRID) for alignment-free binding site comparison. Comput Struct Biotechnol J. 2014;10:33–37. doi: 10.1016/j.csbj.2014.05.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  79. Clark JJ, Benson ML, Smith RD, Carlson HA. Inherent versus induced protein flexibility: Comparisons within and between apo and holo structures. PLOS Comput Biol. 2019;15:e1006705. doi: 10.1371/journal.pcbi.1006705. [DOI] [PMC free article] [PubMed] [Google Scholar]
  80. Clark JJ, Orban ZJ, Carlson HA. Predicting binding sites from unbound versus bound protein structures. Sci Rep. 2020;10:1–18. doi: 10.1038/s41598-020-72906-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  81. Clark RD, Strizhev A, Leonard JM, Blake JF, Matthew JB. Consensus scoring for ligand/protein interactions. J Mol Graph Model. 2002;20:281–295. doi: 10.1016/s1093-3263(01)00125-5. [DOI] [PubMed] [Google Scholar]
  82. Coley CW, Jin W, Rogers L, Jamison TF, Jaakkola TS, Green WH, et al. A graph-convolutional neural network model for the prediction of chemical reactivity. Chem Sci. 2019;10:370–377. doi: 10.1039/c8sc04228d. [DOI] [PMC free article] [PubMed] [Google Scholar]
  83. Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20:273–297. doi: 10.1007/bf00994018. [DOI] [Google Scholar]
  84. Cournia Z, Allen B, Sherman W. Relative binding free energy calculations in drug discovery: Recent advances and practical considerations. J Chem Inf Model. 2017;57:2911–2937. doi: 10.1021/acs.jcim.7b00564. [DOI] [PubMed] [Google Scholar]
  85. Da C, Kireev D. Structural Protein–Ligand interaction fingerprints (SPLIF) for structure-based virtual screening: Method and benchmark study. J Chem Inf Model. 2014;54:2555–2561. doi: 10.1021/ci500319f. [DOI] [PMC free article] [PubMed] [Google Scholar]
  86. Dahl GE, Jaitly N, Salakhutdinov R. Multi-task neural networks for QSAR predictions. arXiv preprint. 2014:arXiv:1406.1231 [Google Scholar]
  87. Damm-Ganamet KL, Smith RD, Dunbar JB, Stuckey JA, Carlson HA. CSAR benchmark exercise 2011–2012: Evaluation of results from docking and relative ranking of blinded congeneric series. J Chem Inf Model. 2013;53:1853–1870. doi: 10.1021/ci400025f. [DOI] [PMC free article] [PubMed] [Google Scholar]
  88. Darby JF, Hopkins AP, Shimizu S, Roberts SM, Brannigan JA, Turkenburg JP, et al. Water networks can determine the affinity of ligand binding to proteins. JACS. 2019;141:15818–15826. doi: 10.1021/jacs.9b06275. [DOI] [PubMed] [Google Scholar]
  89. Das S, Krein MP, Breneman CM. Binding affinity prediction with property-encoded shape distribution signatures. J Chem Inf Model. 2010;50:298–308. doi: 10.1021/ci9004139. [DOI] [PMC free article] [PubMed] [Google Scholar]
  90. David L, Thakkar A, Mercado R, Engkvist O. Molecular representations in AI-driven drug discovery: A review and practical guide. J Cheminf. 2020;12:1–22. doi: 10.1186/s13321-020-00460-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  91. De S, Bartók AP, Csányi G, Ceriotti M. Comparing molecules and solids across structural and alchemical space. Phys Chem Chem Phys. 2016;18:13754–13769. doi: 10.1039/c6cp00415f. [DOI] [PubMed] [Google Scholar]
  92. de Magalhães CS, Almeida DM, Barbosa HJC, Dardenne LE. A dynamic niching genetic algorithm strategy for docking highly flexible ligands. Inform Sciences. 2014;289:206–224. doi: 10.1016/j.ins.2014.08.002. [DOI] [Google Scholar]
  93. Debroise T, Shakhnovich EI, Chéron N. A hybrid knowledge-based and empirical scoring function for protein–ligand interaction: SMoG2016. Journal of Chemical Information and Modeling. 2017a;57:584–593. doi: 10.1021/acs.jcim.6b00610. [DOI] [PubMed] [Google Scholar]
  94. Debroise T, Shakhnovich EI, Chéron N. A hybrid knowledge-based and empirical scoring function for Protein–Ligand interaction: SMoG2016. J Chem Inf Model. 2017b;57:584–593. doi: 10.1021/acs.jcim.6b00610. [DOI] [PubMed] [Google Scholar]
  95. Defferrard M, Bresson X, Vandergheynst P. Convolutional neural networks on graphs with fast localized spectral filtering. Adv Neural Inf Process Syst. 2016;29:3844–3852. [Google Scholar]
  96. Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L. ImageNet: A large-scale hierarchical image database; 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2009; pp. 248–255. [DOI] [Google Scholar]
  97. Deng W, Breneman C, Embrechts MJ. Predicting protein-ligand binding affinities using novel geometrical descriptors and machine-learning methods. J Chem Inf Comput Sci. 2004;44:699–703. doi: 10.1021/ci034246+. [DOI] [PubMed] [Google Scholar]
  98. Deng Z, Chuaqui C, Singh J. Structural interaction fingerprint (SIFt): A novel method for analyzing three-dimensional Protein-Ligand binding interactions. J Med Chem. 2003;47:337–344. doi: 10.1021/jm030331x. [DOI] [PubMed] [Google Scholar]
  99. DesJarlais RL, Sheridan RP, Seibel GL, Dixon JS, Kuntz ID, Venkataraghavan R. Using shape complementarity as an initial screen in designing ligands for a receptor binding site of known three-dimensional structure. J Med Chem. 1988;31:722–729. doi: 10.1021/jm00399a006. [DOI] [PubMed] [Google Scholar]
  100. DeWitte RS, Ishchenko AV, Shakhnovich EI. SMoG: De novo design method based on simple, fast, and accurate free energy estimates. 2. case studies in molecular design. JACS. 1997;119:4608–4617. doi: 10.1021/ja963689+. [DOI] [Google Scholar]
  101. DeWitte RS, Shakhnovich EI. SMoG: De novo design method based on simple, fast, and accurate free energy estimates. 1. methodology and supporting evidence. JACS. 1996;118:11733–11744. doi: 10.1021/ja960751u. [DOI] [Google Scholar]
  102. Dickson M, Gagnon JP. Key factors in the rising cost of new drug discovery and development. Nat Rev Drug Discov. 2004;3:417–429. doi: 10.1038/nrd1382. [DOI] [PubMed] [Google Scholar]
  103. DiMasi JA, Grabowski HG, Hansen RW. Innovation in the pharmaceutical industry: New estimates of R&D costs. J Health Econ. 2016;47:20–33. doi: 10.1016/j.jhealeco.2016.01.012. [DOI] [PubMed] [Google Scholar]
  104. Dittrich J, Schmidt D, Pfleger C, Gohlke H. Converging a knowledge-based scoring function: DrugScore2018. J Chem Inf Model. 2018;59:509–521. doi: 10.1021/acs.jcim.8b00582. [DOI] [PubMed] [Google Scholar]
  105. Dong L, Qu X, Zhao Y, Wang B. Prediction of binding free energy of Protein–Ligand complexes with a hybrid molecular Mechanics/Generalized born surface area and machine learning method. ACS Omega. 2021;6:32938–32947. doi: 10.1021/acsomega.1c04996. [DOI] [PMC free article] [PubMed] [Google Scholar]
  106. Drews J. Drug discovery: A historical perspective. Science. 2000;287:1960–1964. doi: 10.1126/science.287.5460.1960. [DOI] [PubMed] [Google Scholar]
  107. Druchok M, Yarish D, Garkot S, Nikolaienko T, Gurbych O. Ensembling machine learning models to boost molecular affinity prediction. Comput Biol Chem. 2021;93:107529. doi: 10.1016/j.compbiolchem.2021.107529. [DOI] [PubMed] [Google Scholar]
  108. Dumoulin V, Visin F. A guide to convolution arithmetic for deep learning. arXiv preprint. 2016:arXiv:1603.07285 [Google Scholar]
  109. Dunbar JB, Smith RD, Damm-Ganamet KL, Ahmed A, Esposito EX, Delproposto J, et al. CSAR data set release 2012: Ligands, affinities, complexes, and docking decoys. J Chem Inf Model. 2013;53:1842–1852. doi: 10.1021/ci4000486. [DOI] [PMC free article] [PubMed] [Google Scholar]
  110. Dunbar JB, Smith RD, Yang C-Y, Ung PM-U, Lexa KW, Khazanov NA, et al. CSAR benchmark exercise of 2010: Selection of the Protein–Ligand complexes. J Chem Inf Model. 2011;51:2036–2046. doi: 10.1021/ci200082t. [DOI] [PMC free article] [PubMed] [Google Scholar]
  111. Durrant JD, McCammon JA. NNScore: A neural-network-based scoring function for the characterization of Protein−Ligand complexes. J Chem Inf Model. 2010;50:1865–1871. doi: 10.1021/ci100244v. [DOI] [PMC free article] [PubMed] [Google Scholar]
  112. Durrant JD, McCammon JA. BINANA: A novel algorithm for ligand-binding characterization. J Mol Graph Model. 2011a;29:888–893. doi: 10.1016/j.jmgm.2011.01.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  113. Durrant JD, McCammon JA. NNScore 2.0: A neural-network Receptor–Ligand scoring function. J Chem Inf Model. 2011b;51:2897–2903. doi: 10.1021/ci2003889. [DOI] [PMC free article] [PubMed] [Google Scholar]
  114. Efron B. Springer Series in Statistics. Springer; New York: 1992. Bootstrap methods: Another look at the jackknife; pp. 569–593. [DOI] [Google Scholar]
  115. Eldridge MD, Murray CW, Auton TR, Paolini GV, Mee RP. Empirical scoring functions: I. the development of a fast empirical scoring function to estimate the binding affinity of ligands in receptor complexes. J Comput Aided Mol Des. 1997;11:425–445. doi: 10.1023/a:1007996124545. [DOI] [PubMed] [Google Scholar]
  116. Ericksen SS, Wu H, Zhang H, Michael LA, Newton MA, Hoffmann FM, et al. Machine learning consensus scoring improves performance across targets in structure-based virtual screening. J Chem Inf Model. 2017;57:1579–1590. doi: 10.1021/acs.jcim.7b00153. [DOI] [PMC free article] [PubMed] [Google Scholar]
  117. Ewing TJ, Makino S, Skillman AG, Kuntz ID. DOCK 4.0: Search strategies for automated molecular docking of flexible molecule databases. J Comput Aided Mol Des. 2001;15:411–428. doi: 10.1023/a:1011115820450. [DOI] [PubMed] [Google Scholar]
  118. Feinberg EN, Sur D, Wu Z, Husic BE, Mai H, Li Y, et al. PotentialNet for molecular property prediction. ACS Cent Sci. 2018;4:1520–1530. doi: 10.1021/acscentsci.8b00507. [DOI] [PMC free article] [PubMed] [Google Scholar]
  119. Feng Q, Dueva E, Cherkasov A, Ester M. PADME: A deep learning-based framework for drug-target interaction prediction. arXiv preprint. 2018:arXiv:1807.09741 [Google Scholar]
  120. Francoeur PG, Masuda T, Sunseri J, Jia A, Iovanisci RB, Snyder I, et al. Three-dimensional convolutional neural networks and a cross-docked data set for structure-based drug design. J Chem Inf Model. 2020;60:4200–4215. doi: 10.1021/acs.jcim.0c00411. [DOI] [PMC free article] [PubMed] [Google Scholar]
  121. Freund Y, Schapire RE. A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci. 1997;55:119–139. doi: 10.1006/jcss.1997.1504. [DOI] [Google Scholar]
  122. Friedman JH. Stochastic gradient boosting. Computational Statistics & Data Analysis. 2002;38:367–378. doi: 10.1016/s0167-9473(01)00065-2. [DOI] [Google Scholar]
  123. Friesner RA, Banks JL, Murphy RB, Halgren TA, Klicic JJ, Mainz DT, et al. Glide: A new approach for rapid, accurate docking and scoring. 1. method and assessment of docking accuracy. J Med Chem. 2004;47:1739–1749. doi: 10.1021/jm0306430. [DOI] [PubMed] [Google Scholar]
  124. Friesner RA, Murphy RB, Repasky MP, Frye LL, Greenwood JR, Halgren TA, et al. Extra precision glide: Docking and scoring incorporating a model of hydrophobic enclosure for Protein−Ligand complexes. J Med Chem. 2006;49:6177–6196. doi: 10.1021/jm051256o. [DOI] [PubMed] [Google Scholar]
  125. Fukushima K. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol Cybern. 1980;36:193–202. doi: 10.1007/bf00344251. [DOI] [PubMed] [Google Scholar]
  126. Gabel J, Desaphy J, Rognan D. Beware of machine learning-based scoring Functions—On the danger of developing black boxes. J Chem Inf Model. 2014;54:2807–2815. doi: 10.1021/ci500406k. [DOI] [PubMed] [Google Scholar]
  127. Gaieb Z, Liu S, Gathiaka S, Chiu M, Yang H, Shao C, et al. D3R grand challenge 2: Blind prediction of protein–ligand poses, affinity rankings, and relative binding free energies. J Comput Aided Mol Des. 2017;32:1–20. doi: 10.1007/s10822-017-0088-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  128. Gaieb Z, Parks CD, Chiu M, Yang H, Shao C, Walters WP, et al. D3R grand challenge 3: Blind prediction of protein–ligand poses and affinity rankings. J Comput Aided Mol Des. 2019;33:1–18. doi: 10.1007/s10822-018-0180-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  129. Gal Y, Ghahramani Z. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning; International Conference on Machine Learning; 2016; pp. 1050–1059. [Google Scholar]
  130. Gao X, Ramezanghorbani F, Isayev O, Smith JS, Roitberg AE. TorchANI: A free and open source PyTorch-based deep learning implementation of the ANI neural network potentials. J Chem Inf Model. 2020;60:3408–3415. doi: 10.1021/acs.jcim.0c00451. [DOI] [PubMed] [Google Scholar]
  131. Gathiaka S, Liu S, Chiu M, Yang H, Stuckey JA, Kang YN, et al. D3R grand challenge 2015: Evaluation of protein–ligand pose and affinity predictions. J Comput Aided Mol Des. 2016;30:651–668. doi: 10.1007/s10822-016-9946-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  132. Gaudelet T, Day B, Jamasb AR, Soman J, Regep C, Liu G, et al. Utilizing graph machine learning within drug discovery and development. Brief Bioinform. 2021;22 doi: 10.1093/bib/bbab159. [DOI] [PMC free article] [PubMed] [Google Scholar]
  133. Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, et al. ChEMBL: A large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2011;40:D1100–D1107. doi: 10.1093/nar/gkr777. [DOI] [PMC free article] [PubMed] [Google Scholar]
  134. Genheden S, Ryde U. The MM/PBSA and MM/GBSA methods to estimate ligand-binding affinities. Expert Opin Drug Discovery. 2015;10:449–461. doi: 10.1517/17460441.2015.1032936. [DOI] [PMC free article] [PubMed] [Google Scholar]
  135. Geron A. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O’Reilly Media; 2019. [Google Scholar]
  136. Gilmer J, Schoenholz SS, Riley PF, Vinyals O, Dahl GE. Neural message passing for quantum chemistry; International conference on machine learning; 2017; pp. 1263–1272. [Google Scholar]
  137. Gilson MK, Honig BH. The dielectric constant of a folded protein. Biopolymers. 1986;25:2097–2119. doi: 10.1002/bip.360251106. [DOI] [PubMed] [Google Scholar]
  138. Gohlke H, Hendlich M, Klebe G. Knowledge-based scoring function to predict protein-ligand interactions. J Mol Biol. 2000;295:337–356. doi: 10.1006/jmbi.1999.3371. [DOI] [PubMed] [Google Scholar]
  139. Gomes J, Ramsundar B, Feinberg EN, Pande VS. Atomic convolutional networks for predicting protein-ligand binding affinity. arXiv preprint. 2017:arXiv:1703.10603 [Google Scholar]
  140. Goodfellow I, Bengio Y, Courville A. Deep learning. MIT press; 2016. [Google Scholar]
  141. Goodsell DS, Autin L, Olson AJ. Illustrate: Software for biomolecular illustration. Structure. 2019a;27:1716–1720.e1. doi: 10.1016/j.str.2019.08.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  142. Goodsell DS, Olson AJ. Automated docking of substrates to proteins by simulated annealing. Proteins: Struct, Funct, Genet. 1990;8:195–202. doi: 10.1002/prot.340080302. [DOI] [PubMed] [Google Scholar]
  143. Goodsell DS, Zardecki C, Di Costanzo L, Duarte JM, Hudson BP, Persikova I, et al. RCSB protein data bank: Enabling biomedical research and drug discovery. Protein Sci. 2019b;29:52–65. doi: 10.1002/pro.3730. [DOI] [PMC free article] [PubMed] [Google Scholar]
  144. Graves A. Supervised sequence labelling. Springer; Berlin Heidelberg: 2012. pp. 5–13. chap. Supervised Sequence Labelling. [DOI] [Google Scholar]
  145. Guedes IA, Barreto A, Marinho D, Krempser E, Kuenemann MA, Sperandio O, et al. New machine learning and physics-based scoring functions for drug discovery. Scientific reports. 2021a;11:1–19. doi: 10.1038/s41598-021-82410-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  146. Guedes IA, Barreto AMS, Marinho D, Krempser E, Kuenemann MA, Sperandio O, et al. New machine learning and physics-based scoring functions for drug discovery. Sci Rep. 2021b;11:1–19. doi: 10.1038/s41598-021-82410-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  147. Guedes IA, Pereira FSS, Dardenne LE. Empirical scoring functions for structure-based virtual screening: Applications, critical aspects, and challenges. Front Pharmacol. 2018;9:1089. doi: 10.3389/fphar.2018.01089. [DOI] [PMC free article] [PubMed] [Google Scholar]
  148. Gunning D, Stefik M, Choi J, Miller T, Stumpf S, Yang G-Z. XAI—Explainable artificial intelligence. Science Robotics. 2019;4 doi: 10.1126/scirobotics.aay7120. [DOI] [PubMed] [Google Scholar]
  149. Hahn DF, Bayly CI, Macdonald HEB, Chodera JD, Mey AS, Mobley DL, et al. Best practices for constructing, preparing, and evaluating protein-ligand binding affinity benchmarks. arXiv preprint. 2021 doi: 10.33011/livecoms.4.1.1497. arXiv:2105.06222. [DOI] [PMC free article] [PubMed] [Google Scholar]
  150. Hansen L, Salamon P. Neural network ensembles. IEEE Trans Pattern Anal Mach Intell. 1990;12:993–1001. doi: 10.1109/34.58871. [DOI] [Google Scholar]
  151. Hartshorn MJ, Verdonk ML, Chessari G, Brewerton SC, Mooij WTM, Mortenson PN, et al. Diverse, high-quality test set for the validation of Protein−Ligand docking performance. J Med Chem. 2007;50:726–741. doi: 10.1021/jm061277y. [DOI] [PubMed] [Google Scholar]
  152. Hassan M, Mogollon DC, Fuentes O, Sirimulla S. DLSCORE: A deep learning model for predicting protein-ligand binding affinities. ChemRxiv. 2018 doi: 10.26434/chemrxiv.6159143.v1. [DOI] [Google Scholar]
  153. Hassan-Harrirou H, Zhang C, Lemmin T. RosENet: Improving binding affinity prediction by leveraging molecular mechanics energies with an ensemble of 3D convolutional neural networks. J Chem Inf Model. 2020;60:2791–2802. doi: 10.1021/acs.jcim.0c00075. [DOI] [PubMed] [Google Scholar]
  154. Hauser K, Negron C, Albanese SK, Ray S, Steinbrecher T, Abel R, et al. Predicting resistance of clinical Abl mutations to targeted kinase inhibitors using alchemical free-energy calculations. Communications Biology. 2018;1:1–14. doi: 10.1038/s42003-018-0075-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  155. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition; 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016; pp. 770–778. [DOI] [Google Scholar]
  156. Hingerty BE, Ritchie RH, Ferrell TL, Turner JE. Dielectric effects in biopolymers: The theory of ionic saturation revisited. Biopolymers. 1985;24:427–439. doi: 10.1002/bip.360240302. [DOI] [Google Scholar]
  157. Ho TK. Random decision forests; Proceedings of 3rd International Conference on Document Analysis and Recognition; 1995; IEEE (IEEE Comput. Soc. Press); pp. 278–282. [DOI] [Google Scholar]
  158. Ho TK. The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach Intell. 1998;20:832–844. doi: 10.1109/34.709601. [DOI] [Google Scholar]
  159. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9:1735–1780. doi: 10.1162/neco.1997.9.8.1735. [DOI] [PubMed] [Google Scholar]
  160. Hochuli J, Helbling A, Skaist T, Ragoza M, Koes DR. Visualizing convolutional neural network protein-ligand scoring. J Mol Graph Model. 2018;84:96–108. doi: 10.1016/j.jmgm.2018.06.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  161. Holderbach S, Adam L, Jayaram B, Wade RC, Mukherjee G. RASPD+: Fast protein-ligand binding free energy prediction using simplified physicochemical features. Frontiers in Molecular Biosciences. 2020;7 doi: 10.3389/fmolb.2020.601065. [DOI] [PMC free article] [PubMed] [Google Scholar]
  162. Hornik K, Stinchcombe M, White H. Multilayer feedforward networks are universal approximators. Neural Networks. 1989;2:359–366. doi: 10.1016/0893-6080(89)90020-8. [DOI] [Google Scholar]
  163. Hu L, Benson ML, Smith RD, Lerner MG, Carlson HA. Binding MOAD (Mother of all databases) Proteins: Struct, Funct, Bioinf. 2005;60:333–340. doi: 10.1002/prot.20512. [DOI] [PubMed] [Google Scholar]
  164. Huang DZ, Baber JC, Bahmanyar SS. The challenges of generalizability in artificial intelligence for ADME/Tox endpoint and activity prediction. Expert Opin Drug Discovery. 2021a;16:1045–1056. doi: 10.1080/17460441.2021.1901685. [DOI] [PubMed] [Google Scholar]
  165. Huang K, Fu T, Gao W, Zhao Y, Roohani Y, Leskovec J, et al. Therapeutics data commons: Machine learning datasets and tasks for therapeutics. arXiv preprint. 2021b:arXiv:2102.09548 [Google Scholar]
  166. Huang S-Y, Grinter SZ, Zou X. Scoring functions and their evaluation methods for protein-ligand docking: Recent advances and future directions. Phys Chem Chem Phys. 2010;12:12899. [DOI] [PMC free article] [PubMed] [Google Scholar]
  167. Huang S-Y, Zou X. An iterative knowledge-based scoring function to predict protein–ligand interactions: I. derivation of interaction potentials. Journal of computational chemistry. 2006a;27:1866–1875. doi: 10.1002/jcc.20504. [DOI] [PubMed] [Google Scholar]
  168. Huang S-Y, Zou X. An iterative knowledge-based scoring function to predict protein–ligand interactions: II. validation of the scoring function. Journal of computational chemistry. 2006b;27:1876–1882. doi: 10.1002/jcc.20505. [DOI] [PubMed] [Google Scholar]
  169. Huang S-Y, Zou X. Inclusion of solvation and entropy in the knowledge-based scoring function for Protein−Ligand interactions. J Chem Inf Model. 2010;50:262–273. doi: 10.1021/ci9002987. [DOI] [PMC free article] [PubMed] [Google Scholar]
  170. Hubel DH. Single unit activity in striate cortex of unrestrained cats. J Physiol. 1959;147:226–238. doi: 10.1113/jphysiol.1959.sp006238. [DOI] [PMC free article] [PubMed] [Google Scholar]
  171. Hubel DH, Wiesel TN. Receptive fields of single neurones in the cat’s striate cortex. J Physiol. 1959;148:574–591. doi: 10.1113/jphysiol.1959.sp006308. [DOI] [PMC free article] [PubMed] [Google Scholar]
  172. Huey R, Morris GM, Olson AJ, Goodsell DS. A semiempirical free energy force field with charge-based desolvation. J Comput Chem. 2007;28:1145–1152. doi: 10.1002/jcc.20634. [DOI] [PubMed] [Google Scholar]
  173. Imrie F, Bradley AR, van der Schaar M, Deane CM. Protein family-specific models using deep neural networks and transfer learning improve virtual screening and highlight the need for more data. J Chem Inf Model. 2018;58:2319–2330. doi: 10.1021/acs.jcim.8b00350. [DOI] [PubMed] [Google Scholar]
  174. Jasper JB, Humbeck L, Brinkjost T, Koch O. A novel interaction fingerprint derived from per atom score contributions: Exhaustive evaluation of interaction fingerprint performance in docking based virtual screening. J Cheminf. 2018;10:1–13. doi: 10.1186/s13321-018-0264-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  175. Ji Y, Zhang L, Wu J, Wu B, Huang L-K, Xu T, et al. DrugOOD: Out-of-distribution (OOD) dataset curator and benchmark for AI-aided drug Discovery–A focus on affinity prediction problems with noise annotations. arXiv preprint. 2022:arXiv:2201.09637 [Google Scholar]
  176. Jiang D, Hsieh C-Y, Wu Z, Kang Y, Wang J, Wang E, et al. InteractionGraphNet: A novel and efficient deep graph representation learning framework for accurate Protein–Ligand interaction predictions. J Med Chem. 2021;64:18209–18232. doi: 10.1021/acs.jmedchem.1c01830. [DOI] [PubMed] [Google Scholar]
  177. Jiang H, Fan M, Wang J, Sarma A, Mohanty S, Dokholyan NV, et al. Guiding conventional Protein–Ligand docking software with convolutional neural networks. J Chem Inf Model. 2020a;60:4594–4602. doi: 10.1021/acs.jcim.0c00542. [DOI] [PMC free article] [PubMed] [Google Scholar]
  178. Jiang M, Li Z, Zhang S, Wang S, Wang X, Yuan Q, et al. Drug–target affinity prediction using graph neural network and contact maps. RSC Adv. 2020b;10:20701–20712. doi: 10.1039/d0ra02297g. [DOI] [PMC free article] [PubMed] [Google Scholar]
  179. Jiménez J, Doerr S, Martínez-Rosell G, Rose AS, De Fabritiis G. DeepSite: Protein-binding site predictor using 3D-convolutional neural networks. Bioinformatics. 2017;33:3036–3042. doi: 10.1093/bioinformatics/btx350. [DOI] [PubMed] [Google Scholar]
  180. Jiménez J, Škalič M, Martínez-Rosell G, De Fabritiis G. KDEEP: Protein–ligand absolute binding affinity prediction via 3D-convolutional neural networks. J Chem Inf Model. 2018;58:287–296. doi: 10.1021/acs.jcim.7b00650. [DOI] [PubMed] [Google Scholar]
  181. Jiménez-Luna J, Grisoni F, Schneider G. Drug discovery with explainable artificial intelligence. Nature Machine Intelligence. 2020;2:573–584. doi: 10.1038/s42256-020-00236-4. [DOI] [Google Scholar]
  182. Jiménez-Luna J, Grisoni F, Weskamp N, Schneider G. Artificial intelligence in drug discovery: Recent advances and future perspectives. Expert Opin Drug Discovery. 2021a;16:949–959. doi: 10.1080/17460441.2021.1909567. [DOI] [PubMed] [Google Scholar]
  183. Jiménez-Luna J, Pérez-Benito L, Martínez-Rosell G, Sciabola S, Torella R, Tresadern G, et al. DeltaDelta neural networks for lead optimization of small molecule potency. Chem Sci. 2019;10:10911–10918. doi: 10.1039/c9sc04606b. [DOI] [PMC free article] [PubMed] [Google Scholar]
  184. Jiménez-Luna J, Skalic M, Weskamp N, Schneider G. Coloring molecules with explainable artificial intelligence for preclinical relevance assessment. J Chem Inf Model. 2021b;61:1083–1094. doi: 10.1021/acs.jcim.0c01344. [DOI] [PubMed] [Google Scholar]
  185. Jing Y, Bian Y, Hu Z, Wang L, Xie X-QS. Deep learning for drug design: An artificial intelligence paradigm for drug discovery in the big data era. AAPS J. 2018;20:1–10. doi: 10.1208/s12248-018-0210-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  186. Jones D, Kim H, Zhang X, Zemla A, Stevenson G, Bennett WFD, et al. Improved Protein-Ligand binding affinity prediction with structure-based deep fusion inference. J Chem Inf Model. 2021;61:1583–1592. doi: 10.1021/acs.jcim.0c01306. [DOI] [PubMed] [Google Scholar]
  187. Jones G, Willett P, Glen RC. Molecular recognition of receptor sites using a genetic algorithm with a description of desolvation. J Mol Biol. 1995;245:43–53. doi: 10.1016/s0022-2836(95)80037-9. [DOI] [PubMed] [Google Scholar]
  188. Jones G, Willett P, Glen RC, Leach AR, Taylor R. Development and validation of a genetic algorithm for flexible docking. J Mol Biol. 1997;267:727–748. doi: 10.1006/jmbi.1996.0897. [DOI] [PubMed] [Google Scholar]
  189. Jones-Hertzog DK, Jorgensen WL. Binding affinities for sulfonamide inhibitors with human thrombin using Monte Carlo simulations with a linear response method. J Med Chem. 1997;40:1539–1549. doi: 10.1021/jm960684e. [DOI] [PubMed] [Google Scholar]
  190. Jubb HC, Higueruelo AP, Ochoa-Montaño B, Pitt WR, Ascher DB, Blundell TL. Arpeggio: A web server for calculating and visualising interatomic interactions in protein structures. J Mol Biol. 2017;429:365–371. doi: 10.1016/j.jmb.2016.12.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  191. Kadukova M, Grudinin S. Convex-PL: a novel knowledge-based potential for protein-ligand interactions deduced from structural databases using convex optimization. Journal of computer-aided molecular design. 2017;31:943–958. doi: 10.1007/s10822-017-0068-8. [DOI] [PubMed] [Google Scholar]
  192. Kadukova M, Machado KdS, Chacón P, Grudinin S. KORP-PL: a coarse-grained knowledge-based scoring function for protein–ligand interactions. Bioinformatics. 2021;37:943–950. doi: 10.1093/bioinformatics/btaa748. [DOI] [PubMed] [Google Scholar]
  193. Karimi M, Wu D, Wang Z, Shen Y. DeepAffinity: Interpretable deep learning of compound–protein affinity through unified recurrent and convolutional neural networks. Bioinformatics. 2019;35:3329–3338. doi: 10.1093/bioinformatics/btz111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  194. Karlov DS, Sosnin S, Fedorov MV, Popov P. graphDelta: MPNN scoring function for the affinity prediction of Protein–Ligand complexes. ACS Omega. 2020;5:5150–5159. doi: 10.1021/acsomega.9b04162. [DOI] [PMC free article] [PubMed] [Google Scholar]
  195. Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, et al. LightGBM: A highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst. 2017;30 [Google Scholar]
  196. Kearnes S, McCloskey K, Berndl M, Pande V, Riley P. Molecular graph convolutions: Moving beyond fingerprints. J Comput Aided Mol Des. 2016;30:595–608. doi: 10.1007/s10822-016-9938-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  197. Kendall MG. A new measure of rank correlation. Biometrika. 1938;30:81–93. doi: 10.1093/biomet/30.1-2.81. [DOI] [Google Scholar]
  198. Kimber TB, Chen Y, Volkamer A. Deep learning in virtual screening: Recent applications and developments. Int J Mol Sci. 2021;22:4435. doi: 10.3390/ijms22094435. [DOI] [PMC free article] [PubMed] [Google Scholar]
  199. Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks. arXiv preprint. 2016:arXiv:1609.02907 [Google Scholar]
  200. Koes DR, Baumgartner MP, Camacho CJ. Lessons learned in empirical scoring with smina from the CSAR 2011 benchmarking exercise. J Chem Inf Model. 2013;53:1893–1904. doi: 10.1021/ci300604z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  201. Kramer C, Gedeck P. Leave-cluster-out cross-validation is appropriate for scoring functions derived from diverse protein data sets. J Chem Inf Model. 2010;50:1961–1969. doi: 10.1021/ci100264e. [DOI] [PubMed] [Google Scholar]
  202. Kramer C, Kalliokoski T, Gedeck P, Vulpetti A. The experimental uncertainty of heterogeneous public Ki data. J Med Chem. 2012;55:5165–5173. doi: 10.1021/jm300131x. [DOI] [PubMed] [Google Scholar]
  203. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Commun ACM. 2017;60:84–90. doi: 10.1145/3065386. [DOI] [Google Scholar]
  204. Kuzminykh D, Polykovskiy D, Kadurin A, Zhebrak A, Baskov I, Nikolenko S, et al. 3D molecular representations based on the wave transform for convolutional neural networks. Mol Pharmaceut. 2018;15:4378–4385. doi: 10.1021/acs.molpharmaceut.7b01134. [DOI] [PubMed] [Google Scholar]
  205. Kwon Y, Shin W-H, Ko J, Lee J. AK-score: Accurate protein-ligand binding affinity prediction using an ensemble of 3D-convolutional neural networks. Int J Mol Sci. 2020;21:8424. doi: 10.3390/ijms21228424. [DOI] [PMC free article] [PubMed] [Google Scholar]
  206. Le Cun Y, Jackel L, Boser B, Denker J, Graf H, Guyon I, et al. Handwritten digit recognition: Applications of neural network chips and automatic learning. IEEE Commun Mag. 1989;27:41–46. doi: 10.1109/35.41400. [DOI] [Google Scholar]
  207. LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86:2278–2324. doi: 10.1109/5.726791. [DOI] [Google Scholar]
  208. Lenselink EB, ten Dijke N, Bongers B, Papadatos G, van Vlijmen HWT, Kowalczyk W, et al. Beyond the hype: Deep neural networks outperform established methods using a ChEMBL bioactivity benchmark set. J Cheminf. 2017;9:1–14. doi: 10.1186/s13321-017-0232-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  209. Li H, Leung K-S, Wong M-H, Ballester P. Low-quality structural and interaction data improves binding affinity prediction via random forest. Molecules. 2015;20:10947–10962. doi: 10.3390/molecules200610947. [DOI] [PMC free article] [PubMed] [Google Scholar]
  210. Li H, Lu G, Sze K-H, Su X, Chan W-Y, Leung K-S. Machine-learning scoring functions trained on complexes dissimilar to the test set already outperform classical counterparts on a blind benchmark. Brief Bioinform. 2021a;22:bbab225. doi: 10.1093/bib/bbab225. [DOI] [PMC free article] [PubMed] [Google Scholar]
  211. Li H, Peng J, Leung Y, Leung K-S, Wong M-H, Lu G, et al. The impact of protein structure and sequence similarity on the accuracy of machine-learning scoring functions for binding affinity prediction. Biomolecules. 2018;8:12. doi: 10.3390/biom8010012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  212. Li H, Peng J, Sidorov P, Leung Y, Leung K-S, Wong M-H, et al. Classical scoring functions for docking are unable to exploit large volumes of structural and interaction data. Bioinformatics. 2019a;35:3989–3995. doi: 10.1093/bioinformatics/btz183. [DOI] [PubMed] [Google Scholar]
  213. Li H, Peng J, Sidorov P, Leung Y, Leung K-S, Wong M-H, et al. Classical scoring functions for docking are unable to exploit large volumes of structural and interaction data. Bioinformatics. 2019b;35:3989–3995. doi: 10.1093/bioinformatics/btz183. [DOI] [PubMed] [Google Scholar]
  214. Li H, Sze K-H, Lu G, Ballester PJ. Machine-learning scoring functions for structure-based drug lead optimization. WIREs Comput Mol Sci. 2020a;10:e1465 [Google Scholar]
  215. Li H, Sze K-H, Lu G, Ballester PJ. Machine-learning scoring functions for structure-based virtual screening. WIREs Comput Mol Sci. 2021b;11:e1478 [Google Scholar]
  216. Li L, Wang B, Meroueh SO. Support vector regression scoring of Receptor–Ligand complexes for rank-ordering and virtual screening of chemical libraries. J Chem Inf Model. 2011;51:2132–2138. doi: 10.1021/ci200078f. [DOI] [PMC free article] [PubMed] [Google Scholar]
  217. Li S, Zhao Y, Varma R, Salpekar O, Noordhuis P, Li T, et al. PyTorch distributed: Experiences on accelerating data parallel training. Proceedings of the VLDB Endowment. 2020b;13:3005–3018. doi: 10.14778/3415478.3415530. [DOI] [Google Scholar]
  218. Li S, Zhou J, Xu T, Huang L, Wang F, Xiong H, et al. Structure-aware interactive graph neural networks for the prediction of protein-ligand binding affinity; Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining; 2021c; pp. 975–985. [DOI] [Google Scholar]
  219. Li Y, Han L, Liu Z, Wang R. Comparative assessment of scoring functions on an updated benchmark: 2. evaluation methods and general results. J Chem Inf Model. 2014a;54:1717–1736. doi: 10.1021/ci500081m. [DOI] [PubMed] [Google Scholar]
  220. Li Y, Liu Z, Li J, Han L, Liu J, Zhao Z, et al. Comparative assessment of scoring functions on an updated benchmark: 1. compilation of the test set. J Chem Inf Model. 2014b;54:1700–1716. doi: 10.1021/ci500080q. [DOI] [PubMed] [Google Scholar]
  221. Li Y, Rezaei MA, Li C, Li X. DeepAtom: A framework for protein-ligand binding affinity prediction; 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) IEEE; 2019c; pp. 303–310. [DOI] [Google Scholar]
  222. Li Y, Yang J. Structural and sequence similarity makes a significant impact on machine-learning-based scoring functions for Protein–Ligand interactions. J Chem Inf Model. 2017;57:1007–1012. doi: 10.1021/acs.jcim.7b00049. [DOI] [PubMed] [Google Scholar]
  223. Lim J, Ryu S, Park K, Choe YJ, Ham J, Kim WY. Predicting Drug–Target interaction using a novel graph neural network with 3D structure-embedded graph representation. J Chem Inf Model. 2019;59:3981–3988. doi: 10.1021/acs.jcim.9b00387. [DOI] [PubMed] [Google Scholar]
  224. Limongelli V, Marinelli L, Cosconati S, La Motta C, Sartini S, Mugnaini L, et al. Sampling protein motion and solvent effect during ligand binding. Proc Natl Acad Sci. 2012;109:1467–1472. doi: 10.1073/pnas.1112181108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  225. Lipton ZC. The mythos of model interpretability. Queue. 2018;16:31–57. doi: 10.1145/3236386.3241340. [DOI] [Google Scholar]
  226. Liu J, Wang R. Classification of current scoring functions. J Chem Inf Model. 2015;55:475–482. doi: 10.1021/ci500731a. [DOI] [PubMed] [Google Scholar]
  227. Liu T, Lin Y, Wen X, Jorissen RN, Gilson MK. BindingDB: A web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Res. 2007;35:D198–D201. doi: 10.1093/nar/gkl999. [DOI] [PMC free article] [PubMed] [Google Scholar]
  228. Liu Z, Li Y, Han L, Li J, Liu J, Zhao Z, et al. PDB-wide collection of binding data: Current status of the PDBbind database. Bioinformatics. 2014;31:405–412. doi: 10.1093/bioinformatics/btu626. [DOI] [PubMed] [Google Scholar]
  229. Liu Z, Su M, Han L, Liu J, Yang Q, Li Y, et al. Forging the basis for developing Protein-Ligand interaction scoring functions. Accounts Chem Res. 2017;50:302–309. doi: 10.1021/acs.accounts.6b00491. [DOI] [PubMed] [Google Scholar]
  230. Lo Y-C, Rensi SE, Torng W, Altman RB. Machine learning in chemoinformatics and drug discovery. Drug Discov Today. 2018;23:1538–1546. doi: 10.1016/j.drudis.2018.05.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  231. Luong M-T, Pham H, Manning CD. Effective approaches to attention-based neural machine translation. arXiv preprint. 2015:arXiv:1508.04025 [Google Scholar]
  232. Ma J, Sheridan RP, Liaw A, Dahl GE, Svetnik V. Deep neural nets as a method for quantitative Structure–Activity relationships. J Chem Inf Model. 2015;55:263–274. doi: 10.1021/ci500747n. [DOI] [PubMed] [Google Scholar]
  233. Ma N, Zhang X, Zheng H-T, Sun J. ShuffleNet V2: Practical guidelines for efficient CNN architecture design; Proceedings of the European conference on computer vision (ECCV); 2018; pp. 116–131. [Google Scholar]
  234. Macarron R, Banks MN, Bojanic D, Burns DJ, Cirovic DA, Garyantes T, et al. Impact of high-throughput screening in biomedical research. Nat Rev Drug Discov. 2011;10:188–195. doi: 10.1038/nrd3368. [DOI] [PubMed] [Google Scholar]
  235. Marchese Robinson RL, Palczewska A, Palczewski J, Kidley N. Comparison of the predictive performance and interpretability of random forest and linear models on benchmark data sets. J Chem Inf Model. 2017;57:1773–1792. doi: 10.1021/acs.jcim.6b00753. [DOI] [PubMed] [Google Scholar]
  236. Mason L, Baxter J, Bartlett P, Frean M. Boosting algorithms as gradient descent in function space. Proc NIPS; 1999. pp. 512–518. [Google Scholar]
  237. Mayr LM, Bojanic D. Novel trends in high-throughput screening. Curr Opin Pharmacol. 2009;9:580–588. doi: 10.1016/j.coph.2009.08.004. [DOI] [PubMed] [Google Scholar]
  238. McCloskey K, Taly A, Monti F, Brenner MP, Colwell LJ. Using attribution to decode binding mechanism in neural network models for chemistry. Proc Natl Acad Sci. 2019;116:11624–11629. doi: 10.1073/pnas.1820657116. [DOI] [PMC free article] [PubMed] [Google Scholar]
  239. McCorkindale W, Poelking C, Lee AA. Investigating 3D atomic environments for enhanced QSAR. arXiv preprint. 2020:arXiv:2010.12857 [Google Scholar]
  240. McCulloch WS, Pitts W. A logical calculus of the ideas immanent in nervous activity. Bull Math Biophys. 1943;5:115–133. doi: 10.1007/bf02478259. [DOI] [PubMed] [Google Scholar]
  241. McNutt AT, Francoeur P, Aggarwal R, Masuda T, Meli R, Ragoza M, et al. GNINA 1.0: Molecular docking with deep learning. J Cheminf. 2021;13:1–20. doi: 10.1186/s13321-021-00522-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  242. McNutt AT, Koes DR. Improving ΔΔG predictions with a multitask convolutional Siamese network. J Chem Inf Model. 2022 doi: 10.1021/acs.jcim.1c01497. [DOI] [PMC free article] [PubMed] [Google Scholar]
  243. Meli R, Anighoro A, Bodkin MJ, Morris GM, Biggin PC. Learning protein-ligand binding affinity with atomic environment vectors. J Cheminf. 2021;13:1–19. doi: 10.1186/s13321-021-00536-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  244. Mendez D, Gaulton A, Bento AP, Chambers J, De Veij M, Félix E, et al. ChEMBL: Towards direct deposition of bioassay data. Nucleic Acids Res. 2018;47:D930–D940. doi: 10.1093/nar/gky1075. [DOI] [PMC free article] [PubMed] [Google Scholar]
  245. Meng EC, Shoichet BK, Kuntz ID. Automated docking with grid-based energy evaluation. J Comput Chem. 1992;13:505–524. doi: 10.1002/jcc.540130412. [DOI] [Google Scholar]
  246. Menke J, Koch O. Using domain-specific fingerprints generated through neural networks to enhance ligand-based virtual screening. J Chem Inf Model. 2021;61:664–675. doi: 10.1021/acs.jcim.0c01208. [DOI] [PubMed] [Google Scholar]
  247. Mey AS, Allen B, Macdonald HEB, Chodera JD, Kuhn M, Michel J, et al. Best practices for alchemical free energy calculations. arXiv preprint. 2020:arXiv:2008.03067. doi: 10.33011/livecoms.2.1.18378. [DOI] [PMC free article] [PubMed] [Google Scholar]
  248. Meyers J, Fabian B, Brown N. De novo molecular design and generative models. Drug Discov Today. 2021;26:2707–2715. doi: 10.1016/j.drudis.2021.05.019. [DOI] [PubMed] [Google Scholar]
  249. Mobley DL, Graves AP, Chodera JD, McReynolds AC, Shoichet BK, Dill KA. Predicting absolute ligand binding free energies to a simple model site. J Mol Biol. 2007;371:1118–1134. doi: 10.1016/j.jmb.2007.06.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  250. Moesser MA, Klein D, Boyles F, Deane CM, Baxter A, Morris GM. Protein-ligand interaction graphs: Learning from ligand-shaped 3D interaction graphs to improve binding affinity prediction. bioRxiv. 2022 [Google Scholar]
  251. Monti F, Boscaini D, Masci J, Rodola E, Svoboda J, Bronstein MM. Geometric deep learning on graphs and manifolds using mixture model CNNs; 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2017; pp. 5115–5124. [DOI] [Google Scholar]
  252. Monticelli L, Tieleman DP. Methods in Molecular Biology. Humana Press; Totowa, NJ: 2012. Force fields for classical molecular dynamics; pp. 197–213. [DOI] [PubMed] [Google Scholar]
  253. Moon S, Zhung W, Yang S, Lim J, Kim WY. PIGNet: A physics-informed deep learning model toward generalized drug–target interaction predictions. Chem Sci. 2022;13:3661–3673. doi: 10.1039/d1sc06946b. [DOI] [PMC free article] [PubMed] [Google Scholar]
  254. Morris GM, Goodsell DS, Halliday RS, Huey R, Hart WE, Belew RK, et al. Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. J Comput Chem. 1998;19:1639–1662. [Google Scholar]
  255. Morris GM, Goodsell DS, Huey R, Olson AJ. Distributed automated docking of flexible ligands to proteins: Parallel applications of AutoDock 2.4. J Comput Aided Mol Des. 1996;10:293–304. doi: 10.1007/bf00124499. [DOI] [PubMed] [Google Scholar]
  256. Morris GM, Huey R, Lindstrom W, Sanner MF, Belew RK, Goodsell DS, et al. AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility. J Comput Chem. 2009;30:2785–2791. doi: 10.1002/jcc.21256. [DOI] [PMC free article] [PubMed] [Google Scholar]
  257. Morrone JA, Weber JK, Huynh T, Luo H, Cornell WD. Combining docking pose rank and structure with deep learning improves Protein–Ligand binding mode prediction over a baseline docking approach. J Chem Inf Model. 2020;60:4170–4179. doi: 10.1021/acs.jcim.9b00927. [DOI] [PubMed] [Google Scholar]
  258. Moustakas DT, Lang PT, Pegg S, Pettersen E, Kuntz ID, Brooijmans N, et al. Development and validation of a modular, extensible docking program: DOCK 5. J Comput Aided Mol Des. 2006;20:601–619. doi: 10.1007/s10822-006-9060-4. [DOI] [PubMed] [Google Scholar]
  259. Muegge I. A knowledge-based scoring function for protein-ligand interactions: Probing the reference state. Perspect Drug Discov. 2000;20:99–114. doi: 10.1023/a:1008729005958. [DOI] [Google Scholar]
  260. Muegge I. Effect of ligand volume correction on PMF scoring. J Comput Chem. 2001;22:418–425. doi: 10.1002/1096-987x(200103)22:4<418::aid-jcc1012>3.0.co;2-3. [DOI] [Google Scholar]
  261. Muegge I, Martin YC. A general and fast scoring function for Protein−Ligand interactions: A simplified potential approach. J Med Chem. 1999;42:791–804. doi: 10.1021/jm980536j. [DOI] [PubMed] [Google Scholar]
  262. Muegge I, Rarey M. Reviews in Computational Chemistry. John Wiley & Sons, Inc; 2001. Small molecule docking and scoring; pp. 1–60. chap. 1. [DOI] [Google Scholar]
  263. Müller S, Ackloo S, Al Chawaf A, Al-Lazikani B, Antolin A, Baell JB, et al. Target 2035 – update on the quest for a probe for every protein. RSC Med Chem. 2022;13:13–21. doi: 10.1039/d1md00228g. [DOI] [PMC free article] [PubMed] [Google Scholar]
  264. Muratov EN, Bajorath J, Sheridan RP, Tetko IV, Filimonov D, Poroikov V, et al. QSAR without borders. Chem Soc Rev. 2020;49:3525–3564. doi: 10.1039/d0cs00098a. [DOI] [PMC free article] [PubMed] [Google Scholar]
  265. Murdoch WJ, Singh C, Kumbier K, Abbasi-Asl R, Yu B. Definitions, methods, and applications in interpretable machine learning. Proc Natl Acad Sci. 2019;116:22071–22080. doi: 10.1073/pnas.1900654116. [DOI] [PMC free article] [PubMed] [Google Scholar]
  266. Musil F, Grisafi A, Bartók AP, Ortner C, Csányi G, Ceriotti M. Physics-inspired structural representations for molecules and materials. Chem Rev. 2021;121:9759–9815. doi: 10.1021/acs.chemrev.1c00021. [DOI] [PubMed] [Google Scholar]
  267. Mysinger MM, Carchia M, Irwin JJ, Shoichet BK. Directory of useful decoys, enhanced (DUD-E): Better ligands and decoys for better benchmarking. J Med Chem. 2012;55:6582–6594. doi: 10.1021/jm300687e. [DOI] [PMC free article] [PubMed] [Google Scholar]
  268. Narkhede MV, Bartakke PP, Sutaone MS. A review on weight initialization strategies for neural networks. Artif Intell Rev. 2021;55:291–322. doi: 10.1007/s10462-021-10033-z. [DOI] [Google Scholar]
  269. Neudert G, Klebe G. DSX: A knowledge-based scoring function for the assessment of Protein-Ligand complexes. J Chem Inf Model. 2011;51:2731–2745. doi: 10.1021/ci200274q. [DOI] [PubMed] [Google Scholar]
  270. Nguyen DD, Wei G-W. AGL-score: Algebraic graph learning score for Protein–Ligand binding scoring, ranking, docking, and screening. J Chem Inf Model. 2019;59:3291–3304. doi: 10.1021/acs.jcim.9b00334. [DOI] [PMC free article] [PubMed] [Google Scholar]
  271. Nguyen T, Le H, Quinn TP, Nguyen T, Le TD, Venkatesh S. GraphDTA: Predicting drug-target binding affinity with graph neural networks. Bioinformatics. 2020;37:1140–1147. doi: 10.1093/bioinformatics/btaa921. [DOI] [PubMed] [Google Scholar]
  272. Nicholls A. Confidence limits, error bars and method comparison in molecular modeling. part 1: The calculation of confidence intervals. J Comput Aided Mol Des. 2014;28:887–918. doi: 10.1007/s10822-014-9753-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  273. Nicholls A. Confidence limits, error bars and method comparison in molecular modeling. part 2: Comparing methods. J Comput Aided Mol Des. 2016;30:103–126. doi: 10.1007/s10822-016-9904-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  274. Nogueira MS, Koch O. The development of target-specific machine learning models as scoring functions for docking-based target prediction. J Chem Inf Model. 2019;59:1238–1252. doi: 10.1021/acs.jcim.8b00773. [DOI] [PubMed] [Google Scholar]
  275. Öztürk H, Özgür A, Ozkirimli E. DeepDTA: Deep drug–target binding affinity prediction. Bioinformatics. 2018;34:i821–i829. doi: 10.1093/bioinformatics/bty593. [DOI] [PMC free article] [PubMed] [Google Scholar]
  276. Palazzesi F, Pozzan A. Deep learning applied to ligand-based de novo drug design. Artificial Intelligence in Drug Design. 2022:273–299. doi: 10.1007/978-1-0716-1787-8_12. [DOI] [PubMed] [Google Scholar]
  277. Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng. 2010;22:1345–1359. doi: 10.1109/tkde.2009.191. [DOI] [Google Scholar]
  278. Pan X, Wang H, Zhang Y, Wang X, Li C, Ji C, et al. AA-Score: A new scoring function based on amino acid-specific interaction for molecular docking. J Chem Inf Model. 2022 doi: 10.1021/acs.jcim.1c01537. [DOI] [PubMed] [Google Scholar]
  279. Parks CD, Gaieb Z, Chiu M, Yang H, Shao C, Walters WP, et al. D3R grand challenge 4: Blind prediction of protein–ligand poses, affinity rankings, and relative binding free energies. J Comput Aided Mol Des. 2020;34:99–119. doi: 10.1007/s10822-020-00289-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  280. Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, et al. PyTorch: An imperative style, high-performance deep learning library. Adv Neural Inf Process Syst. 2019;32:8026–8037. [Google Scholar]
  281. Patrick Walters W. Comparing classification models—a practical tutorial. J Comput Aided Mol Des. 2021:1–9. doi: 10.1007/s10822-021-00417-2. [DOI] [PubMed] [Google Scholar]
  282. Pearlman DA, Charifson PS. Are free energy calculations useful in practice? a comparison with rapid scoring functions for the p38 MAP kinase protein system. J Med Chem. 2001;44:3417–3423. doi: 10.1021/jm0100279. [DOI] [PubMed] [Google Scholar]
  283. Pérez-Nueno VI, Rabal O, Borrell JI, Teixidó J. APIF: A new interaction fingerprint based on atom pairs and its application to virtual screening. J Chem Inf Model. 2009;49:1245–1260. doi: 10.1021/ci900043r. [DOI] [PubMed] [Google Scholar]
  284. Preuer K, Klambauer G, Rippmann F, Hochreiter S, Unterthiner T. Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. Springer International Publishing; 2019. Interpretable deep learning in drug discovery; pp. 331–345. [DOI] [Google Scholar]
  285. Pu L, Govindaraj RG, Lemoine JM, Wu H-C, Brylinski M. DeepDrug3D: Classification of ligand-binding pockets in proteins with a convolutional neural network. PLOS Comput Biol. 2019;15:e1006718. doi: 10.1371/journal.pcbi.1006718. [DOI] [PMC free article] [PubMed] [Google Scholar]
  286. Qiao Z, Welborn M, Anandkumar A, Manby FR, Miller TF. OrbNet: Deep learning for quantum chemistry using symmetry-adapted atomic-orbital features. J Chem Phys. 2020;153:124111. doi: 10.1063/5.0021955. [DOI] [PubMed] [Google Scholar]
  287. Qiu T, Qiu J, Feng J, Wu D, Yang Y, Tang K, et al. The recent progress in proteochemometric modelling: Focusing on target descriptors, cross-term descriptors and application scope. Brief Bioinform. 2016;18:125–136. doi: 10.1093/bib/bbw004. [DOI] [PubMed] [Google Scholar]
  288. Quiroga R, Villarreal MA. Vinardo: A scoring function based on AutoDock Vina improves scoring, docking, and virtual screening. PLoS ONE. 2016;11:e0155183. doi: 10.1371/journal.pone.0155183. [DOI] [PMC free article] [PubMed] [Google Scholar]
  289. Radifar M, Yuniarti N, Istyastono EP. PyPLIF: Python-based protein-ligand interaction fingerprinting. Bioinformation. 2013;9:325–328. doi: 10.6026/97320630009325. [DOI] [PMC free article] [PubMed] [Google Scholar]
  290. Ragoza M, Hochuli J, Idrobo E, Sunseri J, Koes DR. Protein–Ligand scoring with convolutional neural networks. J Chem Inf Model. 2017a;57:942–957. doi: 10.1021/acs.jcim.6b00740. [DOI] [PMC free article] [PubMed] [Google Scholar]
  291. Ragoza M, Turner L, Koes DR. Ligand pose optimization with atomic grid-based convolutional neural networks. arXiv preprint. 2017b:arXiv:1710.07400 [Google Scholar]
  292. Ramsundar B, Liu B, Wu Z, Verras A, Tudor M, Sheridan RP, et al. Is multitask deep learning practical for pharma? J Chem Inf Model. 2017;57:2068–2076. doi: 10.1021/acs.jcim.7b00146. [DOI] [PubMed] [Google Scholar]
  293. Rännar S, Geladi P, Lindgren F, Wold S. A PLS kernel algorithm for data sets with many variables and few objects. part II: Cross-validation, missing data and examples. J Chemometr. 1995;9:459–470. doi: 10.1002/cem.1180090604. [DOI] [Google Scholar]
  294. Rännar S, Lindgren F, Geladi P, Wold S. A PLS kernel algorithm for data sets with many variables and fewer objects. part 1: Theory and algorithm. J Chemometr. 1994;8:111–125. doi: 10.1002/cem.1180080204. [DOI] [Google Scholar]
  295. Rasmussen CE. Summer school on machine learning. Springer; 2003. Gaussian processes in machine learning; pp. 63–71. [Google Scholar]
  296. Reymond J-L, van Deursen R, Blum LC, Ruddigkeit L. Chemical space as a source for new drugs. MedChemComm. 2010;1:30. doi: 10.1039/c0md00020e. [DOI] [Google Scholar]
  297. Rifaioglu AS, Atas H, Martin MJ, Cetin-Atalay R, Atalay V, Doğan T. Recent applications of deep learning and machine intelligence on in silico drug discovery: Methods, tools and databases. Brief Bioinform. 2019;20:1878–1912. doi: 10.1093/bib/bby061. [DOI] [PMC free article] [PubMed] [Google Scholar]
  298. Riniker S, Landrum GA. Open-source platform to benchmark fingerprints for ligand-based virtual screening. J Cheminf. 2013a;5:1–17. doi: 10.1186/1758-2946-5-26. [DOI] [PMC free article] [PubMed] [Google Scholar]
  299. Riniker S, Landrum GA. Similarity maps - a visualization strategy for molecular fingerprints and machine-learning methods. J Cheminf. 2013b;5:1–7. doi: 10.1186/1758-2946-5-43. [DOI] [PMC free article] [PubMed] [Google Scholar]
  300. Rodríguez-Pérez R, Bajorath J. Interpretation of compound activity predictions from complex machine learning models using local approximations and Shapley values. J Med Chem. 2019;63:8761–8777. doi: 10.1021/acs.jmedchem.9b01101. [DOI] [PubMed] [Google Scholar]
  301. Rogers D, Hahn M. Extended-connectivity fingerprints. J Chem Inf Model. 2010;50:742–754. doi: 10.1021/ci100050t. [DOI] [PubMed] [Google Scholar]
  302. Roitberg A, Pollert T, Haurilet M, Martin M, Stiefelhagen R. Analysis of deep fusion strategies for multi-modal gesture recognition; 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW); 2019; pp. 0–0. [DOI] [Google Scholar]
  303. Rosenblatt F. Perceptrons and the theory of brain mechanisms. Spartan books; 1962. [Google Scholar]
  304. Ross GA, Morris GM, Biggin PC. Rapid and accurate prediction and scoring of water molecules in protein binding sites. PLoS ONE. 2012;7:e32036. doi: 10.1371/journal.pone.0032036. [DOI] [PMC free article] [PubMed] [Google Scholar]
  305. Ross GA, Morris GM, Biggin PC. One size does not fit all: The limits of structure-based models in drug discovery. J Chem Theory Comput. 2013;9:4266–4274. doi: 10.1021/ct4004228. [DOI] [PMC free article] [PubMed] [Google Scholar]
  306. Rufa DA, Macdonald HEB, Fass J, Wieder M, Grinaway PB, Roitberg AE, et al. Towards chemical accuracy for alchemical free energy calculations with hybrid physics-based machine learning/molecular mechanics potentials. BioRxiv. 2020 [Google Scholar]
  307. Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating errors. Nature. 1986;323:533–536. doi: 10.1038/323533a0. [DOI] [Google Scholar]
  308. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, et al. ImageNet large scale visual recognition challenge. Int J Comput Vision. 2015;115:211–252. doi: 10.1007/s11263-015-0816-y. [DOI] [Google Scholar]
  309. Ryu S, Lim J, Hong SH, Kim WY. Deeply learning molecular structure-property relationships using attention-and gate-augmented graph convolutional network. arXiv preprint. 2018:arXiv:1805.10988 [Google Scholar]
  310. Salt DW, Yildiz N, Livingstone DJ, Tinsley CJ. The use of artificial neural networks in QSAR. Pestic Sci. 1992;36:161–170. doi: 10.1002/ps.2780360212. [DOI] [Google Scholar]
  311. Scantlebury J, Brown N, Von Delft F, Deane CM. Data set augmentation allows deep learning-based virtual screening to better generalize to unseen target classes and highlight important binding interactions. J Chem Inf Model. 2020;60:3722–3730. doi: 10.1021/acs.jcim.0c00263. [DOI] [PMC free article] [PubMed] [Google Scholar]
  312. Schafer M, Oeing CU, Rohm M, Baysal-Temel E, Lehmann LH, Bauer R, et al. ‘Corrigendum to “Ataxin-10 is part of a cachexokine cocktail triggering cardiac metabolic dysfunction in cancer cachexia” [Molecular metabolism 5 (2) (2015) 67-78]’. Mol Metab. 2020;35:100970. doi: 10.1016/j.molmet.2020.02.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  313. Schneider G, Clark DE. Automated de novo drug design: Are we nearly there yet? Angew Chem Int Ed. 2019;58:10792–10803. doi: 10.1002/anie.201814681. [DOI] [PubMed] [Google Scholar]
  314. Schneider P, Schneider G. De novo design at the edge of chaos. J Med Chem. 2016;59:4077–4086. doi: 10.1021/acs.jmedchem.5b01849. [DOI] [PubMed] [Google Scholar]
  315. Schneider P, Walters WP, Plowright AT, Sieroka N, Listgarten J, Goodnow RA, et al. Rethinking drug design in the artificial intelligence era. Nat Rev Drug Discov. 2019;19:353–364. doi: 10.1038/s41573-019-0050-3. [DOI] [PubMed] [Google Scholar]
  316. Schütt KT, Sauceda HE, Kindermans P-J, Tkatchenko A, Müller K-R. SchNet – a deep learning architecture for molecules and materials. J Chem Phys. 2018;148:241722. doi: 10.1063/1.5019779. [DOI] [PubMed] [Google Scholar]
  317. Seo S, Choi J, Park S, Ahn J. Binding affinity prediction for protein–ligand complex using deep attention mechanism based on intermolecular interactions. BMC Bioinf. 2021;22 doi: 10.1186/s12859-021-04466-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  318. Shen C, Ding J, Wang Z, Cao D, Ding X, Hou T. From machine learning to deep learning: Advances in scoring functions for protein–ligand docking. WIREs Comput Mol Sci. 2019;10:e1429. doi: 10.1002/wcms.1429. [DOI] [Google Scholar]
  319. Shen C, Hu X, Gao J, Zhang X, Zhong H, Wang Z, et al. The impact of cross-docked poses on performance of machine learning classifier for protein–ligand binding pose prediction. J Cheminf. 2021;13:1–18. doi: 10.1186/s13321-021-00560-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  320. Shen C, Hu Y, Wang Z, Zhang X, Pang J, Wang G, et al. Beware of the generic machine learning-based scoring functions in structure-based virtual screening. Brief Bioinform. 2020a;22:bbaa070. doi: 10.1093/bib/bbaa070. [DOI] [PubMed] [Google Scholar]
  321. Shen C, Hu Y, Wang Z, Zhang X, Zhong H, Wang G, et al. Can machine learning consistently improve the scoring power of classical scoring functions? insights into the role of machine learning in scoring functions. Brief Bioinform. 2020b;22:497–514. doi: 10.1093/bib/bbz173. [DOI] [PubMed] [Google Scholar]
  322. Sheridan RP. Interpretation of QSAR models by coloring atoms according to changes in predicted activity: How robust is it? J Chem Inf Model. 2019;59:1324–1337. doi: 10.1021/acs.jcim.8b00825. [DOI] [PubMed] [Google Scholar]
  323. Shin B, Park S, Kang K, Ho JC. Self-attention based molecule representation for predicting drug-target interaction; Machine Learning for Healthcare Conference; 2019; pp. 230–248. [Google Scholar]
  324. Shin W-H, Kim J-K, Kim D-S, Seok C. GalaxyDock2: Protein–ligand docking using beta-complex and global optimization. J Comput Chem. 2013;34:2647–2656. doi: 10.1002/jcc.23438. [DOI] [PubMed] [Google Scholar]
  325. Shin W-H, Seok C. GalaxyDock: Protein–ligand docking with flexible protein side-chains. J Chem Inf Model. 2012;52:3225–3232. doi: 10.1021/ci300342z. [DOI] [PubMed] [Google Scholar]
  326. Shoichet BK, Kuntz ID, Bodian DL. Molecular docking using shape descriptors. J Comput Chem. 1992;13:380–397. doi: 10.1002/jcc.540130311. [DOI] [Google Scholar]
  327. Sieg J, Flachsenberg F, Rarey M. In need of bias control: Evaluating chemical data for machine learning in structure-based virtual screening. J Chem Inf Model. 2019;59:947–961. doi: 10.1021/acs.jcim.8b00712. [DOI] [PubMed] [Google Scholar]
  328. Sliwoski G, Kothiwale S, Meiler J, Lowe EW. Computational methods in drug discovery. Pharmacol Rev. 2013;66:334–395. doi: 10.1124/pr.112.007336. [DOI] [PMC free article] [PubMed] [Google Scholar]
  329. Smith JS, Isayev O, Roitberg AE. ANI-1: An extensible neural network potential with DFT accuracy at force field computational cost. Chem Sci. 2017;8:3192–3203. doi: 10.1039/c6sc05720a. [DOI] [PMC free article] [PubMed] [Google Scholar]
  330. Smith RD, Clark JJ, Ahmed A, Orban ZJ, Dunbar JB, Carlson HA. Updates to binding MOAD (Mother of all databases): Polypharmacology tools and their utility in drug repurposing. J Mol Biol. 2019;431:2423–2433. doi: 10.1016/j.jmb.2019.05.024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  331. Smith RD, Damm-Ganamet KL, Dunbar JB, Ahmed A, Chinnaswamy K, Delproposto JE, et al. CSAR benchmark exercise 2013: Evaluation of results from a combined computational protein design, docking, and Scoring/Ranking challenge. J Chem Inf Model. 2015;56:1022–1031. doi: 10.1021/acs.jcim.5b00387. [DOI] [PMC free article] [PubMed] [Google Scholar]
  332. Smith RD, Dunbar JB, Ung PM-U, Esposito EX, Yang C-Y, Wang S, et al. CSAR benchmark exercise of 2010: Combined evaluation across all submitted scoring functions. J Chem Inf Model. 2011;51:2115–2131. doi: 10.1021/ci200269q. [DOI] [PMC free article] [PubMed] [Google Scholar]
  333. Soleimany AP, Amini A, Goldman S, Rus D, Bhatia SN, Coley CW. Evidential deep learning for guided molecular property prediction and discovery. ACS central science. 2021;7:1356–1367. doi: 10.1021/acscentsci.1c00546. [DOI] [PMC free article] [PubMed] [Google Scholar]
  334. Son J, Kim D. Development of a graph convolutional neural network model for efficient prediction of protein-ligand binding affinities. PLoS One. 2021;16:e0249404. doi: 10.1371/journal.pone.0249404. [DOI] [PMC free article] [PubMed] [Google Scholar]
  335. Sotriffer CA, Sanschagrin P, Matter H, Klebe G. SFCscore: Scoring functions for affinity prediction of protein-ligand complexes. Proteins: Struct, Funct, Bioinf. 2008;73:395–419. doi: 10.1002/prot.22058. [DOI] [PubMed] [Google Scholar]
  336. Spearman C. The proof and measurement of association between two things. Int J Epidemiol. 2010;39:1137–1150. doi: 10.1093/ije/dyq191. [DOI] [PubMed] [Google Scholar]
  337. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: A simple way to prevent neural networks from overfitting. The journal of machine learning research. 2014;15:1929–1958. [Google Scholar]
  338. Stafford KA, Anderson BM, Sorenson J, van den Bedem H. AtomNet PoseRanker: Enriching ligand pose quality for dynamic proteins in virtual high-throughput screens. J Chem Inf Model. 2022;62:1178–1189. doi: 10.1021/acs.jcim.1c01250. [DOI] [PMC free article] [PubMed] [Google Scholar]
  339. Stepniewska-Dziubinska MM, Zielenkiewicz P, Siedlecki P. Development and evaluation of a deep learning model for protein–ligand binding affinity prediction. Bioinformatics. 2018;34:3666–3674. doi: 10.1093/bioinformatics/bty374. [DOI] [PMC free article] [PubMed] [Google Scholar]
  340. Štrumbelj E, Kononenko I, Robnik Šikonja M. Explaining instance classifications with interactions of subsets of feature values. Data & Knowledge Engineering. 2009;68:886–904. doi: 10.1016/j.datak.2009.01.004. [DOI] [Google Scholar]
  341. Su M, Feng G, Liu Z, Li Y, Wang R. Tapping on the black box: How is the scoring power of a machine-learning scoring function dependent on the training set? J Chem Inf Model. 2020;60:1122–1136. doi: 10.1021/acs.jcim.9b00714. [DOI] [PubMed] [Google Scholar]
  342. Su M, Yang Q, Du Y, Feng G, Liu Z, Li Y, et al. Comparative assessment of scoring functions: The CASF-2016 update. J Chem Inf Model. 2018;59:895–913. doi: 10.1021/acs.jcim.8b00545. [DOI] [PubMed] [Google Scholar]
  343. Sundararajan M, Taly A, Yan Q. Axiomatic attribution for deep networks; International Conference on Machine Learning; 2017. pp. 3319–3328. [Google Scholar]
  344. Sunseri J, King JE, Francoeur PG, Koes DR. Convolutional neural network scoring and minimization in the D3R 2017 community challenge. J Comput Aided Mol Des. 2018;33:19–34. doi: 10.1007/s10822-018-0133-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  345. Sunseri J, Koes DR. libmolgrid: Graphics processing unit accelerated molecular gridding for deep learning applications. J Chem Inf Model. 2020;60:1079–1084. doi: 10.1021/acs.jcim.9b01145. [DOI] [PMC free article] [PubMed] [Google Scholar]
  346. Szegedy C, Toshev A, Erhan D. Deep neural networks for object detection. 2013 [Google Scholar]
  347. Tan C, Sun F, Kong T, Zhang W, Yang C, Liu C. A survey on deep transfer learning; International conference on artificial neural networks; 2018; Springer; pp. 270–279. [Google Scholar]
  348. Tropsha A. Best practices for QSAR model development, validation, and exploitation. Mol Inform. 2010;29:476–488. doi: 10.1002/minf.201000061. [DOI] [PubMed] [Google Scholar]
  349. Trott O, Olson AJ. AutoDock vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J Comput Chem. 2009;31 doi: 10.1002/jcc.21334. NA–NA. [DOI] [PMC free article] [PubMed] [Google Scholar]
  350. Unke OT, Chmiela S, Sauceda HE, Gastegger M, Poltavsky I, Schütt KT, et al. Machine learning force fields. Chem Rev. 2021;121:10142–10186. doi: 10.1021/acs.chemrev.0c01111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  351. Unterthiner T, Mayr A, Klambauer G, Steijaert M, Wegner JK, Ceulemans H, et al. Deep learning as an opportunity in virtual screening. Proceedings of the deep learning workshop at NIPS; 2014. pp. 1–9. [Google Scholar]
  352. Vainio MJ, Puranen JS, Johnson MS. ShaEP: Molecular overlay based on shape and electrostatic potential. J Chem Inf Model. 2009;49:492–502. doi: 10.1021/ci800315d. [DOI] [PubMed] [Google Scholar]
  353. Vamathevan J, Clark D, Czodrowski P, Dunham I, Ferran E, Lee G, et al. Applications of machine learning in drug discovery and development. Nature reviews Drug discovery. 2019;18:463–477. doi: 10.1038/s41573-019-0024-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  354. van Westen GJP, Wegner JK, Geluykens P, Kwanten L, Vereycken I, Peeters A, et al. Which compound to select in lead optimization? prospectively validated proteochemometric models guide preclinical development. PLoS ONE. 2011;6:e27518. doi: 10.1371/journal.pone.0027518. [DOI] [PMC free article] [PubMed] [Google Scholar]
  355. Varela-Rial A, Maryanow I, Majewski M, Doerr S, Schapin N, Jiménez-Luna J, et al. PlayMolecule glimpse: Understanding Protein–Ligand property predictions with interpretable neural networks. J Chem Inf Model. 2022;62:225–231. doi: 10.1021/acs.jcim.1c00691. [DOI] [PMC free article] [PubMed] [Google Scholar]
  356. Velec HFG, Gohlke H, Klebe G. DrugScoreCSD knowledge-based scoring function derived from small molecule crystal data with superior recognition rate of near-native ligand poses and better affinity prediction. J Med Chem. 2005;48:6296–6303. doi: 10.1021/jm050436v. [DOI] [PubMed] [Google Scholar]
  357. Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y. Graph attention networks. arXiv preprint. 2017:arXiv:1710.10903 [Google Scholar]
  358. Verdonk ML, Cole JC, Hartshorn MJ, Murray CW, Taylor RD. Improved protein-ligand docking using GOLD. Proteins: Struct, Funct, Bioinf. 2003;52:609–623. doi: 10.1002/prot.10465. [DOI] [PubMed] [Google Scholar]
  359. Voulodimos A, Doulamis N, Doulamis A, Protopapadakis E. Deep learning for computer vision: A brief review. Computational Intelligence and Neuroscience. 2018;2018:1–13. doi: 10.1155/2018/7068349. [DOI] [PMC free article] [PubMed] [Google Scholar]
  360. Wallach I, Dzamba M, Heifets A. AtomNet: A deep convolutional neural network for bioactivity prediction in structure-based drug discovery. arXiv preprint. 2015:arXiv:1510.02855 [Google Scholar]
  361. Wang C, Zhang Y. Improving scoring-docking-screening powers of protein-ligand scoring functions using random forest. J Comput Chem. 2016;38:169–177. doi: 10.1002/jcc.24667. [DOI] [PMC free article] [PubMed] [Google Scholar]
  362. Wang R, Fang X, Lu Y, Wang S. The PDBbind database: Collection of binding affinities for Protein−Ligand complexes with known three-dimensional structures. J Med Chem. 2004;47:2977–2980. doi: 10.1021/jm030580l. [DOI] [PubMed] [Google Scholar]
  363. Wang R, Fang X, Lu Y, Yang C-Y, Wang S. The PDBbind database: Methodologies and updates. J Med Chem. 2005;48:4111–4119. doi: 10.1021/jm048957q. [DOI] [PubMed] [Google Scholar]
  364. Wang R, Lai L, Wang S. Further development and validation of empirical scoring functions for structure-based binding affinity prediction. J Comput Aided Mol Des. 2002;16:11–26. doi: 10.1023/a:1016357811882. [DOI] [PubMed] [Google Scholar]
  365. Wang R, Wang S. How does consensus scoring work for virtual library screening? an idealized computer experiment. J Chem Inf Comp Sci. 2001;41:1422–1426. doi: 10.1021/ci010025x. [DOI] [PubMed] [Google Scholar]
  366. Wang S, Liu D, Ding M, Du Z, Zhong Y, Song T, et al. Se-onionnet: a convolution neural network for protein–ligand binding affinity prediction. Frontiers in Genetics. 2021a:1805. doi: 10.3389/fgene.2020.607824. [DOI] [PMC free article] [PubMed] [Google Scholar]
  367. Wang S, Liu D, Ding M, Du Z, Zhong Y, Song T, et al. SE-OnionNet: A convolution neural network for Protein–Ligand binding affinity prediction. Front Genet. 2021b;11:1805. doi: 10.3389/fgene.2020.607824. [DOI] [PMC free article] [PubMed] [Google Scholar]
  368. Wang S, Riniker S. Machine learning in the area of molecular dynamics simulations. Artificial Intelligence in Drug Discovery. 2020;75:184. [Google Scholar]
  369. Wang Y, Wu S, Duan Y, Huang Y. A point cloud-based deep learning strategy for protein–ligand binding affinity prediction. Brief Bioinform. 2021c;23 doi: 10.1093/bib/bbab474. [DOI] [PubMed] [Google Scholar]
  370. Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, Bryant SH. PubChem: A public information system for analyzing bioactivities of small molecules. Nucleic Acids Res. 2009;37:W623–W633. doi: 10.1093/nar/gkp456. [DOI] [PMC free article] [PubMed] [Google Scholar]
  371. Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, Zhou Z, et al. PubChem’s BioAssay database. Nucleic Acids Res. 2011;40:D400–D412. doi: 10.1093/nar/gkr1132. [DOI] [PMC free article] [PubMed] [Google Scholar]
  372. Wang Z, Zheng L, Liu Y, Qu Y, Li Y-Q, Zhao M, et al. OnionNet-2: A convolutional neural network model for predicting protein-ligand binding affinity based on residue-atom contacting shells. Front Chem. 2021d;9:913. doi: 10.3389/fchem.2021.753002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  373. Wee J, Xia K. Ollivier persistent Ricci curvature-based machine learning for the Protein-Ligand binding affinity prediction. J Chem Inf Model. 2021;61:1617–1626. doi: 10.1021/acs.jcim.0c01415. [DOI] [PubMed] [Google Scholar]
  374. Weiler M, Geiger M, Welling M, Boomsma W, Cohen T. 3d steerable cnns: Learning rotationally equivariant features in volumetric data. arXiv preprint. 2018:arXiv:1807.02547 [Google Scholar]
  375. Wellawatte GP, Seshadri A, White AD. Model agnostic generation of counterfactual explanations for molecules. Chemical Science. 2022 doi: 10.1039/d1sc05259d. [DOI] [PMC free article] [PubMed] [Google Scholar]
  376. Widrow B, Hoff ME. Adaptive Switching Circuits Tech rep. Defense Technical Information Center; 1960. [DOI] [Google Scholar]
  377. Wieder M, Fass J, Chodera JD. Teaching free energy calculations to learn from experimental data. bioRxiv. 2021 [Google Scholar]
  378. Williams CK, Rasmussen CE. Gaussian processes for regression. 1996 [Google Scholar]
  379. Winkler DA, Le TC. Performance of deep and shallow neural networks, the universal approximation theorem, activity cliffs, and QSAR. Mol Inform. 2016;36:1600118. doi: 10.1002/minf.201600118. [DOI] [PubMed] [Google Scholar]
  380. Wójcikowski M, Kukiełka M, Stepniewska-Dziubinska MM, Siedlecki P. Development of a protein–ligand extended connectivity (PLEC) fingerprint and its application for binding affinity predictions. Bioinformatics. 2018;35:1334–1341. doi: 10.1093/bioinformatics/bty757. [DOI] [PMC free article] [PubMed] [Google Scholar]
  381. Wójcikowski M, Zielenkiewicz P, Siedlecki P. Open drug discovery toolkit (ODDT): A new open-source player in the drug discovery field. J Cheminf. 2015;7:1–6. doi: 10.1186/s13321-015-0078-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  382. Wu Z, Ramsundar B, Feinberg EN, Gomes J, Geniesse C, Pappu AS, et al. MoleculeNet: A benchmark for molecular machine learning. Chem Sci. 2018;9:513–530. doi: 10.1039/c7sc02664a. [DOI] [PMC free article] [PubMed] [Google Scholar]
  383. Xavier MM, Heck GS, de Avila MB, Levin NMB, Pintro VO, Carvalho NL, et al. SAnDReS a computational tool for statistical analysis of docking results and development of scoring functions. Combinatorial Chemistry & High Throughput Screening. 2016;19:801–812. doi: 10.2174/1386207319666160927111347. [DOI] [PubMed] [Google Scholar]
  384. Xiong G, Shen C, Yang Z, Jiang D, Liu S, Lu A, et al. Featurization strategies for protein-ligand interactions and their applications in scoring function development. WIREs Comput Mol Sci. 2021;12:e1567. doi: 10.1002/wcms.1567. [DOI] [Google Scholar]
  385. Xu Y, Verma D, Sheridan RP, Liaw A, Ma J, Marshall NM, et al. Deep dive into machine learning models for protein engineering. J Chem Inf Model. 2020;60:2773–2790. doi: 10.1021/acs.jcim.0c00073. [DOI] [PubMed] [Google Scholar]
  386. Yakovenko O, Jones SJM. Modern drug design: The implication of using artificial neuronal networks and multiple molecular dynamic simulations. J Comput Aided Mol Des. 2017;32:299–311. doi: 10.1007/s10822-017-0085-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  387. Yang C, Zhang Y. Lin_f9: A linear empirical scoring function for protein–ligand docking. Journal of Chemical Information and Modeling. 2021;61:4630–4644. doi: 10.1021/acs.jcim.1c00737. [DOI] [PMC free article] [PubMed] [Google Scholar]
  388. Yang C-Y, Wang R, Wang S. M-score: A knowledge-based potential scoring function accounting for protein atom mobility. J Med Chem. 2005;49:5903–5911. doi: 10.1021/jm050043w. [DOI] [PubMed] [Google Scholar]
  389. Yang J, Shen C, Huang N. Predicting or pretending: Artificial intelligence for protein-ligand interactions lack of sufficiently large and unbiased datasets. Front Pharmacol. 2020;11:69. doi: 10.3389/fphar.2020.00069. [DOI] [PMC free article] [PubMed] [Google Scholar]
  390. Yang K, Swanson K, Jin W, Coley C, Eiden P, Gao H, et al. Analyzing learned molecular representations for property prediction. J Chem Inf Model. 2019;59:3370–3388. doi: 10.1021/acs.jcim.9b00237. [DOI] [PMC free article] [PubMed] [Google Scholar]
  391. Yang Z, Zhong W, Zhao L, Yu-Chian Chen C. MGraphDTA: Deep multiscale graph neural network for explainable drug–target binding affinity prediction. Chem Sci. 2022;13:816–833. doi: 10.1039/d1sc05180f. [DOI] [PMC free article] [PubMed] [Google Scholar]
  392. Yeturu K, Chandra N. PocketMatch: A new algorithm to compare binding sites in protein structures. BMC Bioinf. 2008;9:1–17. doi: 10.1186/1471-2105-9-543. [DOI] [PMC free article] [PubMed] [Google Scholar]
  393. Ying R, Bourgeois D, You J, Zitnik M, Leskovec J. Gnnexplainer: Generating explanations for graph neural networks. Adv Neural Inf Process Syst. 2019;32:9240. [PMC free article] [PubMed] [Google Scholar]
  394. Young T, Hazarika D, Poria S, Cambria E. Recent trends in deep learning based natural language processing [Review article] IEEE Comput Intell Mag. 2018;13:55–75. doi: 10.1109/mci.2018.2840738. [DOI] [Google Scholar]
  395. Yu Y, Abadi M, Barham P, Brevdo E, Burrows M, Davis A, et al. Dynamic control flow in large-scale machine learning; Proceedings of the Thirteenth EuroSys Conference; 2018; pp. 265–283. [DOI] [Google Scholar]
  396. Yuan H, Yu H, Gui S, Ji S. Explainability in graph neural networks: A taxonomic survey. arXiv preprint. 2020:arXiv:2012.15445. doi: 10.1109/TPAMI.2022.3204236. [DOI] [PubMed] [Google Scholar]
  397. Zhang J, Shi X, Xie J, Ma H, King I, Yeung D-Y. Gaan: Gated attention networks for learning on large and spatiotemporal graphs. arXiv preprint. 2018:arXiv:1803.07294 [Google Scholar]
  398. Zhao Q, Xiao F, Yang M, Li Y, Wang J. AttentionDTA: Prediction of drug–target binding affinity using attention model; 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) IEEE; 2019; pp. 64–69. [DOI] [Google Scholar]
  399. Zheng L, Fan J, Mu Y. OnionNet: A multiple-layer intermolecular-contact-based convolutional neural network for Protein–Ligand binding affinity prediction. ACS Omega. 2019;4:15956–15965. doi: 10.1021/acsomega.9b01997. [DOI] [PMC free article] [PubMed] [Google Scholar]
  400. Zheng Z, Merz KM. Development of the knowledge-based and empirical combined scoring algorithm (KECSA) to score Protein-Ligand interactions. J Chem Inf Model. 2013;53:1073–1083. doi: 10.1021/ci300619x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  401. Zhou Y-T, Chellappa R, Vaid A, Jenkins B. Image restoration using a neural network. IEEE Trans Acoust Speech Signal Process. 1988;36:1141–1151. doi: 10.1109/29.1641. [DOI] [Google Scholar]
  402. Zhu F, Zhang X, Allen JE, Jones D, Lightstone FC. Binding affinity prediction by pairwise function based on neural network. J Chem Inf Model. 2020;60:2766–2772. doi: 10.1021/acs.jcim.0c00026. [DOI] [PubMed] [Google Scholar]
  403. Zilian D, Sotriffer CA. Sfcscore rf: A random forest-based scoring function for improved affinity prediction of protein–ligand complexes. J Chem Inf Model. 2013;53:1923–1933. doi: 10.1021/ci400120b. [DOI] [PubMed] [Google Scholar]
