Abstract
Molecular docking stands as a pivotal element in the realm of computer-aided drug design (CADD), consistently contributing to advancements in pharmaceutical research. In essence, it employs computer algorithms to identify the “best” match between two molecules, akin to solving intricate three-dimensional jigsaw puzzles. At a more stringent level, the molecular docking challenge entails predicting the accurate bound association state based on the atomic coordinates of two molecules. This process assumes particular significance in unraveling the mechanistic intricacies of physicochemical interactions at the atomic scale. Notably, the application of docking, especially in the context of protein-small molecule interactions, holds wide-ranging implications for structure-based drug design, given the prevalent use of small compounds as drug candidates. This study provides an overview of docking methodologies, delves into recent key developments, elucidates the physicochemical underpinnings of molecular recognition in protein-ligand interactions, and concludes by addressing the applications of docking in virtual screening, alongside current challenges within existing docking methods.
Keywords: Molecular docking, Virtual screening, Free energy of binding, Structure-based drug design, Protein-ligand interactions
1. Introduction
Proteins are highly diverse macromolecules built from amino acids. In a cell, proteins interact with ions, small molecules (e.g., drugs), and other macromolecules (e.g., other proteins and nucleic acids). During such interactions, a change in shape (referred to as conformational change) almost always occurs at the same time. It is known that the structure of a protein dictates its function in living organisms [1]. Any dysfunction could lead to pathological changes. Therefore, studying protein interactions and solving protein structures are crucial steps for understanding the mechanisms of life, for investigating biological systems, and for rational drug design. Direct methods for determining the three-dimensional structures of protein-ligand complexes rely on experiments. Traditional techniques include X-ray crystallography, neutron diffraction, and electron diffraction (which differ in the type of beam used to interact with the specimen) [2, 3]. These methods rely on the analysis of diffraction patterns, which requires a crystalline form of the specimen. Proteins are very difficult to crystallize because of their numerous unstructured loops. Therefore, the main limitation of X-ray crystallography is the quality of the crystals. In this respect, cryo-electron microscopy (cryo-EM) has the advantage that a crystalline sample is not necessary. However, cryo-EM usually has lower resolution than crystallography, although considerable improvements have been made very recently [4, 5, 6]. Nuclear magnetic resonance (NMR) spectroscopy has also contributed many details on the structure and dynamics of biological materials [7]. However, NMR has difficulty solving the structures of large proteins. In summary, all experimental techniques used to obtain protein structures have limitations. In addition, they are time-consuming and costly.
Ultimately, these shortcomings have spurred the development of computational methods. Molecular docking is one example that integrates computational and experimental strategies [8, 9, 10, 11]. Over the past decades, molecular docking has become a powerful approach in computer-aided drug design (CADD). Generally, molecular docking is performed between two molecules (see Figure 1). The target (receptor) is usually a macromolecule such as a protein, DNA or RNA. The other molecule can be a protein, peptide or small molecule. Molecular docking uses computational algorithms to predict the bound association of the two molecules. This work focuses on the docking of small molecules against a protein target, referred to as ligand-protein docking, because of its broad application to structure-based drug design (SBDD) (drug compounds are usually small molecules). Given a protein of known three-dimensional structure, the docking method can predict how ligands interact with their protein targets from a physical or chemical perspective. Protein-ligand complex structures can thus be predicted routinely by docking algorithms. In other words, protein-ligand docking approaches provide an automated way to model the recognition of a drug by its protein target by capturing the underlying physical principles.
Figure 1:

Illustration of ligand-protein docking. Docking algorithms generate a variety of complex configurations. The right panel shows the surface representation of a protein binding pocket, with the ligands in different orientations.
Nowadays, protein-ligand docking is widely applied to structure-activity and mutagenesis studies, virtual screening (VS), and lead optimization [12]. With the rapid growth of the protein structures in the Protein Data Bank (as illustrated in Figure 2, data sourced from the Protein Data Bank [13, 14]), protein-ligand docking methods have become a valuable tool for mechanistic biological research and pharmaceutical drug discovery.
Figure 2:

Protein Data Bank growth statistics [13]. Number of structures deposited per year vs the number of total available structures.
2. Physical Basis of Molecular Docking
Protein-ligand interactions are central to an in-depth understanding of protein functions in biology because proteins accomplish molecular recognition through binding with various molecules [15, 16]. Drugs often act as inhibitors when interacting with proteins, because many diseases, including cancer, are attributed to abnormally active protein-ligand interactions. In such cases, inhibitors can be used to prevent the abnormal interactions for specific therapy. For instance, in the treatment of chronic myelogenous leukemia (CML), STI571 (Gleevec™, imatinib mesylate) is used to inhibit activation of the Bcr-Abl target protein, which also exemplifies the successful application of rational drug design [17, 18]. Therefore, insights into protein-ligand interactions are vital for the development of drugs.
Molecular docking algorithms predict protein-ligand interactions in a complex that is formed non-covalently. In biological systems, hydrogen bonds, ionic bonds, van der Waals interactions and hydrophobic interactions are the four main types of non-covalent interactions [19]. With energies ranging from 1 to 5 kcal/mol [20], non-covalent bonds are weak in comparison with covalent bonds. Therefore, the term “non-covalent interactions” is used most of the time, rather than non-covalent bonds. However, the cumulative effect of these non-covalent interactions can be significant: multiple non-covalent interactions often act together at the binding interface of a complex to produce highly stable and specific associations [21].
2.1. Major types of non-covalent interactions
2.1.1. Hydrogen bonds
Hydrogen bonds are polar electrostatic interactions that can be described in the form of D—H … A. Here, D and A stand for an electron donor and acceptor atom, respectively. H is the hydrogen atom attached to a donor atom. Thus, the donor atom must be electronegative [22]. Hydrogen bonds have a strength of about 5 kcal/mol, which is weaker than the covalent bond between an oxygen atom and a hydrogen atom (approximately 110 kcal/mol) [20]. However, biomolecules exist in solvent surroundings, in which the extensive hydrogen bonds between adjacent molecules are broken and reformed constantly, resulting in the change of system enthalpy and entropy. Both the protein and the ligand interact with the solvent before binding, so the enthalpy and entropy influence their complex formation [8, 15]. Also, hydrogen bonds contribute a stabilizing force to macromolecules (e.g., proteins and nucleic acids), which is of great importance to the structure.
2.1.2. Ionic interaction
Ionic interaction is the electrostatic attraction between oppositely charged ion pairs. Thus, ionic bonds are highly specific electrostatic interactions. In aqueous solution, dissolved ions are surrounded by a shell of water molecules [21, 23, 24].
2.1.3. Van der Waals interactions
When atoms in different molecules are close enough, the transient dipoles in their electron clouds give rise to an intermolecular force, forming nonspecific van der Waals interactions. The strength of a van der Waals interaction is roughly 1 kcal/mol, which is even weaker than a typical hydrogen bond.
2.1.4. Hydrophobic interactions
Hydrophobic interactions result from the tendency of nonpolar molecules to be excluded from the solvent and to aggregate in a suitable environment. Hydrophobic interactions are often considered to be driven by the entropy gain [24]. However, the molecular mechanisms of the hydrophobic effect remain controversial. In the scaled-particle theory, the hydrophobic effect is multifaceted and depends on the size of the solute [24, 25].
2.1.5. Other types of interactions
Besides the aforementioned major types of non-covalent interactions, other types of non-covalent interactions also exist in protein-ligand complexes, such as CH-π in protein-sugar complexes due to the presence of aromatic rings [26].
2.2. Enthalpy-Entropy Compensation
The contributions of various non-covalent interactions are intimately linked to two fundamental thermodynamic quantities: entropy and enthalpy. The enthalpic and entropic changes occurring before and after the complex formation determine how tightly the protein and ligand bind together, which is quantified by the Gibbs binding free energy, as shown in Equation 1.
$$\Delta G = \Delta H - T\Delta S \tag{1}$$
Here, $\Delta G$ represents the change of the free energy, and $T$ is the absolute temperature in Kelvin. $\Delta H$ represents the types and numbers of chemical bonds and noncovalent interactions that are broken and formed, and $\Delta S$ is the change in the randomness of the system. The Gibbs binding free energy can be calculated using theoretical methods, or can be quantified experimentally by the reaction rate [27]:
$$\Delta G = -RT \ln K_{eq} = -RT \ln \frac{k_{on}}{k_{off}} \tag{2}$$
Equation 2 shows that complex stability can be measured through the equilibrium binding constant $K_{eq}$, which is determined by the two kinetic rate constants, $k_{on}$ (for the binding reaction) and $k_{off}$ (for the dissociation reaction). $R$ and $T$ are the gas constant (8.314 J/mol·K) and the absolute temperature, respectively.
The net driving force for binding is balanced between the two factors, entropy (the tendency to achieve the highest degree of randomness) and enthalpy (the tendency to achieve the most stable bonding state). In fact, the protein-ligand binding process is driven by the decrease in the total Gibbs free energy of the system [15].
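As a rough numerical illustration of the two free-energy relations in this section, the following minimal sketch evaluates the enthalpy-entropy form and the rate-constant form. The function names, the example rate constants, and the treatment of the equilibrium constant as dimensionless (relative to a 1 M standard state) are assumptions made for illustration only.

```python
import math

R_J_PER_MOL_K = 8.314   # gas constant in J/(mol*K), as quoted in the text
J_PER_KCAL = 4184.0     # unit conversion: 1 kcal = 4184 J


def delta_g_from_thermo(delta_h, delta_s, temperature):
    """Equation 1: dG = dH - T*dS (dH in J/mol, dS in J/(mol*K), T in K)."""
    return delta_h - temperature * delta_s


def delta_g_from_rates(k_on, k_off, temperature=298.15):
    """Equation 2: dG = -R*T*ln(K_eq), with K_eq = k_on / k_off.

    Assumption: K_eq is used as a dimensionless number (1 M standard state).
    """
    k_eq = k_on / k_off
    return -R_J_PER_MOL_K * temperature * math.log(k_eq)


# Hypothetical ligand: k_on = 1e6 1/(M*s) and k_off = 1e-3 1/s, so K_eq = 1e9.
dg = delta_g_from_rates(1e6, 1e-3)
print(f"dG = {dg / J_PER_KCAL:.2f} kcal/mol")
```

A more negative result corresponds to tighter binding; each tenfold increase in the equilibrium constant lowers the free energy by the same fixed amount at a given temperature.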
2.3. Molecular recognition models
Noncovalent interaction-driven molecular complexes show geometrical complementarity. Three conceptual models of ligand-protein binding have been proposed to explain the mechanism for molecular recognition (Figure 3).
Figure 3:

Illustration of the three conceptual protein-ligand interaction models accompanied with one extended interaction model: (a) The lock-and-key model; both protein and ligand are rigid. (b) The induced-fit model; the conformational change of the protein occurs. (c) The conformational selection model; the ligand binds to the most suitable conformation among an ensemble of protein conformations. (d) The extended conformational selection model; the ligand binds to one protein conformer first and then a subsequent conformational change occurs to the protein. To focus on protein flexibility, this figure does not show ligand flexibility.
Fischer’s lock-and-key model theorizes that the binding interfaces of the two molecules should be complementary [28]. The protein and the ligand are rigid in this model, which means that their conformations are identical before and after binding. Du et al. suggested that the lock-and-key model is an entropy-dominated binding process [15].
In Koshland’s induced-fit model, the protein undergoes a conformational change during binding in order to accommodate the ligand with the best amino acid configuration [29]. It should be noted that in most cases only minor conformational changes arise in proteins after ligand binding. Compared with the lock-and-key model, the induced-fit hypothesis can be likened to a “hand in glove” model, adding flexibility to Fischer’s idea [30].
In the conformational selection model, ligands bind selectively to the most suitable conformational state among an ensemble of substates [31, 32]. In the original conformational selection model, no further conformational rearrangement occurs. However, in the extended conformational selection mechanism, ligands first bind to a protein in an initially favorable conformational state, and this binding subsequently induces a protein conformational change. In other words, the “induced-fit” and “conformational selection” recognition mechanisms may co-exist in ligand binding [33, 34].
3. Key Components in Ligand-Protein Docking
The basic procedure of docking can be simplified to sampling and ranking, the two critical ingredients of current docking programs. Sampling refers to searching for putative ligand conformations in the binding site of a receptor. Proteins are held rigid in most docking algorithms. Flexible (ligand) docking requires the generation of 3D conformers for the ligand. A good sampling algorithm should be able to generate a diverse ensemble of ligand structures that includes the bioactive, bound ligand structure. The second component, ranking, refers to assessing the fit between a ligand conformation and the target by scoring functions. Both components are discussed in detail below.
3.1. Conformational sampling
Biomolecules are inherently dynamic, existing either dissolved within the cytosol or interacting with other cellular components. In addition to the six degrees of freedom—three translational and three rotational—dihedral (torsional) angles significantly enhance their flexibility. This extensive range of motion enables biomolecules to adopt diverse conformations, complicating the computational prediction of their behaviors and properties. For instance, acetylcholine, a neurotransmitter essential for signal transmission across synapses in the nervous system, has a torsional angle between the acetyl group and the choline part that affects its structure (see Figure 4), which in turn influences how it fits into the binding site of acetylcholinesterase.
Figure 4:

Stick representation of acetylcholine with the elements in different colors: C(tan), O(red), N(blue). One of the torsion angles is annotated.
Similarly, proteins are polymers of amino acids linked together through peptide bonds. Therefore, even small variations in the conformations of individual residues can lead to extensive flexibility in a protein. Figure 5 shows the dihedral angles of an amino acid in a protein. The third backbone dihedral angle, ω (see Figure 5 for its definition), is typically close to 180° or 0° in proteins. The flexible side-chain dihedral angles of each residue follow varied distributions [35]. Furthermore, the task is more complicated because proteins in solution exist as an ensemble of conformations and fluctuate over time, which makes them even harder to simulate on a computer. While more extensive sampling is needed to account for the induced-fit and conformational selection models introduced in Section 2.3, the Zou group has developed a docking package, MDock [36, 37, 38], that considers many different conformations in an ensemble without a time-consuming computational burden. This is discussed further in Section 3.3 and Section 4.
Figure 5:

Illustration of dihedral angles in one amino acid residue of a protein: two backbone dihedral angles (φ and ψ) and one side chain dihedral angle (χ). The elements are in different colors: C(grey), O(red), and N(blue). Hydrogen atoms are not shown. The backbone of this amino acid is colored green.
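A dihedral (torsional) angle of the kind annotated in Figures 4 and 5 can be computed from the coordinates of four consecutive atoms. The sketch below is one common vector-projection formulation; the function name and the test coordinates are illustrative assumptions, and the sign of the result depends on the convention chosen.

```python
import math


def dihedral_degrees(p0, p1, p2, p3):
    """Return the signed dihedral angle (degrees) defined by four 3D points."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0, b1, b2 = sub(p0, p1), sub(p2, p1), sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    axis = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Components of the two outer bonds perpendicular to the central bond.
    v = sub(b0, tuple(dot(b0, axis) * x for x in axis))
    w = sub(b2, tuple(dot(b2, axis) * x for x in axis))
    return math.degrees(math.atan2(dot(cross(axis, v), w), dot(v, w)))


# Hypothetical coordinates: the first and last atoms lie on the same side
# of the central bond (cis), so the angle is zero.
print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
```

Applied to every rotatable bond of a ligand or every residue of a protein, such angles form the internal coordinates that sampling algorithms vary.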
3.2. Sampling algorithms
Computational conformational sampling has been an active area for many years because of the evolving needs of small-molecule modeling and design in pharmaceutical work [39, 40]. Sampling algorithms explore a molecule’s conformational space from two perspectives. The first, systematic search, exhaustively enumerates all possible torsions, which is necessary because of our limited knowledge of pharmaceutically relevant conformational space. However, this approach can cause exponential growth of the search space. To cope with the many degrees of freedom, systematic search methods store physicochemical properties on a grid to prevent combinatorial explosion [41]. A typical procedure first divides the ligand into fragments and then docks them into the active site sequentially. One application of automatically calculated grids is the incremental construction (IC) method (anchor and grow), which is implemented in DOCK [42]. DOCK is the first and most widely used docking program; its newest version is DOCK 6 [43, 44]. Other well-known flexible-ligand search programs, such as FlexX and Glide, also use incremental construction methods [45, 46].
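The exponential growth of an exhaustive torsional search can be made concrete with a small enumeration. This is a minimal sketch under the assumption of a regular angular grid whose step divides 360°; the function names are hypothetical and it does not reproduce the grid or incremental-construction machinery of any particular program.

```python
from itertools import product


def torsion_grid(n_torsions, step_deg):
    """Iterate over every combination of torsion angles on a regular grid."""
    angles = range(0, 360, step_deg)
    return product(angles, repeat=n_torsions)


def grid_size(n_torsions, step_deg):
    """Number of conformers an exhaustive search would have to visit."""
    return (360 // step_deg) ** n_torsions


for n in (2, 4, 8):
    print(n, "torsions at 30 deg ->", grid_size(n, 30), "conformers")
```

With a 30° step, each additional rotatable bond multiplies the number of conformers by 12, which is why fragment-based strategies that grow the ligand piece by piece are attractive.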
The other perspective introduces a random element to change the system’s degrees of freedom, and is referred to as stochastic search. The most common stochastic methods include genetic algorithms (GAs) and Monte Carlo (MC) search [47, 48]. Simulation methods have also been developed and widely used in the sampling stage of molecular docking. Molecular dynamics (MD) is one example of a simulation method; because the system tends to move toward states of lower energy than the initial state, it often becomes trapped in local minima and cannot cross high energy barriers. An alternative strategy is to identify a preliminary conformation first and then apply further MD simulations for local optimization. Other remedies, such as increasing the simulation temperature and smoothing the potential energy surface, have also been introduced in previous studies [8, 40].
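To illustrate how a stochastic search can escape local minima, the following minimal sketch runs a Metropolis Monte Carlo search over a single torsion angle. The toy energy function, step size, temperature factor, and seed are all assumptions chosen for illustration; real docking programs sample many coupled degrees of freedom with a full scoring function.

```python
import math
import random


def torsion_energy(theta_deg):
    """Toy torsional energy (arbitrary units) with its global minimum at 180 deg."""
    t = math.radians(theta_deg)
    return math.cos(3 * t) + 0.5 * math.cos(t)


def metropolis_search(start_deg, n_steps=5000, max_step_deg=30.0, kT=0.6, seed=0):
    """Metropolis Monte Carlo over one torsion; returns (best_angle, best_energy)."""
    rng = random.Random(seed)
    theta, energy = start_deg, torsion_energy(start_deg)
    best_theta, best_energy = theta, energy
    for _ in range(n_steps):
        trial = (theta + rng.uniform(-max_step_deg, max_step_deg)) % 360.0
        trial_energy = torsion_energy(trial)
        delta = trial_energy - energy
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            theta, energy = trial, trial_energy
            if energy < best_energy:
                best_theta, best_energy = theta, energy
    return best_theta, best_energy


# Start near a local minimum (about 60 deg) and let the search find the global one.
best_theta, best_energy = metropolis_search(start_deg=60.0)
print(f"best torsion = {best_theta:.1f} deg, energy = {best_energy:.3f}")
```

The occasional acceptance of uphill moves is what allows the search to cross the barrier separating the starting basin from the global minimum; a purely downhill search started at the same point would remain trapped.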
3.3. Scoring functions
Another key aspect of molecular docking is to rank the ligand conformations that have been generated. A good scoring function should be capable of distinguishing between the true binding modes and the decoy modes. Due to the large degrees of freedom in the protein-ligand system, balancing the speed and accuracy is necessary for the scoring step. The scoring functions are mainly divided into four categories: force field-based, empirical, knowledge-based, and machine learning-based. Different scoring functions are listed in Table 1.
Table 1:
Example of scoring functions
| Scoring function | Category | Prominent features | Ref |
|---|---|---|---|
| GOLDScore | force field-based | Default scoring function for GOLD | [48] |
| MedusaScore | force field-based | Including a hydrogen-bonding model and implicit solvent model | [49] |
| DOCK6 | force field-based and footprint similarity | Updated internal energy function | [44] |
| Glide | force field-based and empirical | Modifying and expanding the ChemScore function | [46] |
| AutoDock^a | empirical | Allowing side chains to be flexible | [50] |
| AutoDockFR^a | empirical | Considering partial receptor flexibility | [51] |
| AutoDock VinaXB | halogen bond scoring function | Adding halogen bonding parameters | [52] |
| ICM | empirical | Allowing flexible side chains which are sampled simultaneously with the ligand | [53] |
| LigScore1/LigScore2 | empirical | Consisting of three terms that describe the van der Waals interaction | [54] |
| Vina | empirical | Summation of the energetic contributions from both inter- and intramolecular interactions | [55] |
| Vinardo | empirical | Modifying the interaction terms and the atomic radii of the Vina scoring function | [56] |
| Vina-Carb | empirical | Adding a CHI-energy term to the scoring function of AutoDock Vina to improve protein-carbohydrate docking | [57] |
| SLICK | empirical | A scoring function for protein-carbohydrate docking by implementing a CH-π stacking interaction energy term in BALLDock | [58] |
| DrugScore | knowledge-based | Extracting both short-range pair and solvent accessible surface potentials | [59] |
| PMF | knowledge-based | Potentials of mean force for fast calculation of atom pair interaction potentials | [60] |
| ITScore | knowledge-based | An iterative method to derive atomic pair potential energies | [37] |
| ROTA | knowledge-based | Pair potentials in GalaxyDock^a | [61, 62] |
| AffiScore | knowledge-based | A scoring function in SLIDE for the secondary scoring step | [63] |
| X-CSCORE | consensus | Combine three empirical scoring functions | [64] |
| RF-Score-VS | machine-learning-based | Terms from RF-Score-v3 | [65] |
| ΔvinaRF20 | machine-learning-based | 10 terms from Vina and 10 terms related to buried solvent-accessible surface area (bSASA) | [66] |
| ΔvinaXGB | machine-learning-based | 58 terms from Vina, 30 terms related to bSASA, 3 features related to water effect, 2 features related to ligand stability, 1 feature related to ions | [67] |
| Kdeep | machine-learning-based | Voxelized 24 Å representation of the binding site considering 8 pharmacophoric-like properties | [68] |
| Pafnucy | machine-learning-based | Atomic coordinates and 19 features associated with atom type, hybridization, bonds, pharmacophoric-like properties or partial charges | [69] |
| DeepDock | machine-learning-based | Ligand-based information and structure-based information | [70] |
| XGB-Score | machine-learning-based | Terms from RF-Score and Vina | [71] |
^a Allows certain protein flexibility in docking.
3.3.1. Force field-based scoring functions
Classical force field-based scoring functions calculate Coulombic (for electrostatics) and Lennard-Jones (for van der Waals) terms for non-covalent interactions. More sophisticated extensions of force field-based scoring functions also include terms for hydrogen bonds, solvation and entropic contributions.
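The two classical terms can be sketched as a pairwise sum over ligand and protein atoms. The following is a minimal illustration, not the energy function of any specific program: the atom tuple layout, the Lorentz-Berthelot mixing rules, the constant dielectric, and all parameter values are assumptions.

```python
import math

COULOMB_CONSTANT = 332.06  # kcal*Angstrom/(mol*e^2); converts q1*q2/r to kcal/mol


def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones energy; its minimum of -epsilon lies at r = 2**(1/6) * sigma."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, q1, q2, dielectric=1.0):
    """Coulomb energy between two point charges separated by r Angstrom."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)


def force_field_score(ligand_atoms, protein_atoms):
    """Sum Lennard-Jones and Coulomb terms over all ligand-protein atom pairs.

    Each atom is a tuple (x, y, z, charge, epsilon, sigma). Per-atom parameters
    are combined with Lorentz-Berthelot mixing rules (an assumption of this sketch).
    """
    total = 0.0
    for lx, ly, lz, lq, leps, lsig in ligand_atoms:
        for px, py, pz, pq, peps, psig in protein_atoms:
            r = math.dist((lx, ly, lz), (px, py, pz))
            eps = math.sqrt(leps * peps)
            sig = 0.5 * (lsig + psig)
            total += lennard_jones(r, eps, sig) + coulomb(r, lq, pq)
    return total


# Hypothetical two-atom system: opposite unit charges 4 Angstrom apart.
ligand = [(0.0, 0.0, 0.0, 1.0, 0.1, 3.4)]
protein = [(4.0, 0.0, 0.0, -1.0, 0.1, 3.4)]
print(f"score = {force_field_score(ligand, protein):.2f} kcal/mol")
```

In practice such pairwise sums are usually precomputed on a grid around the binding site so that each pose can be scored by interpolation rather than by looping over all protein atoms.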
3.3.2. Empirical scoring functions
Empirical scoring functions are often related to force field-based scoring functions because they consider similar molecular contributions such as hydrogen bonds, ionic interactions, hydrophobic effects and entropic change. Summation of all these components results in the binding free energy, which is ultimately calculated by empirical scoring functions. The empirical coefficients are trained by fitting the scoring function to the experimentally measured binding affinities. Therefore, these coefficients strongly rely on the experimental data set for fitting, which is the main drawback of empirical scoring functions.
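The fitting step described above can be sketched as an ordinary least-squares problem: each complex contributes a row of interaction terms, and the coefficients are chosen to reproduce the measured affinities. The term names, the training values, and the weights below are hypothetical and serve only to show the mechanics.

```python
def empirical_score(terms, weights):
    """Weighted sum of interaction terms, the general form of an empirical function."""
    return sum(w * t for w, t in zip(weights, terms))


def fit_weights(features, affinities):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    n = len(features[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(row[i] * row[j] for row in features) for j in range(n)]
         + [sum(row[i] * y for row, y in zip(features, affinities))]
         for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


# Hypothetical training set: (hydrogen bonds, hydrophobic contact area, rotatable bonds).
training_terms = [(2, 100, 3), (4, 150, 5), (1, 300, 2), (3, 50, 8), (5, 200, 1)]
true_weights = [-1.0, -0.02, 0.3]  # used only to synthesise "measured" affinities
measured = [empirical_score(t, true_weights) for t in training_terms]

fitted = fit_weights(training_terms, measured)
print([round(w, 4) for w in fitted])
```

Because the fitted coefficients depend entirely on which complexes are in the training set, the same procedure run on a different data set can yield noticeably different weights, which is the drawback noted above.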
3.3.3. Knowledge-based scoring functions
Knowledge-based scoring functions derive atomic pair potentials by statistical principles to extract preferred interaction geometries [72]. High efficiency is one of the major characteristics of knowledge-based scoring functions, in addition to the implicit consideration of the solvent effect, resulting in more common application of knowledge-based scoring functions than other scoring functions [36, 59]. Example knowledge-based scoring functions include DrugScore [59], PMFScore [60], and ITScore [37, 38, 73, 74, 75].
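One common statistical route to such pair potentials is the inverse Boltzmann relation, which converts the ratio of observed to reference distance frequencies into an energy. The sketch below shows only this plain form, with made-up histogram counts; it is not the iterative procedure used by ITScore, and the function name and bin layout are assumptions.

```python
import math

KT_KCAL_PER_MOL = 0.593  # approximate value of k_B*T near room temperature


def pair_potential(observed_counts, reference_counts, kT=KT_KCAL_PER_MOL):
    """Inverse-Boltzmann potential per distance bin: u(r) = -kT * ln(g_obs / g_ref).

    Counts are normalised to frequencies first. Bins with a zero count in either
    histogram get None instead of an infinite or undefined energy.
    """
    obs_total = sum(observed_counts)
    ref_total = sum(reference_counts)
    potential = []
    for obs, ref in zip(observed_counts, reference_counts):
        if obs == 0 or ref == 0:
            potential.append(None)
            continue
        ratio = (obs / obs_total) / (ref / ref_total)
        potential.append(-kT * math.log(ratio))
    return potential


# Hypothetical histogram for one atom-pair type over four distance bins.
observed = [10, 40, 30, 20]
reference = [25, 25, 25, 25]
print([round(u, 3) for u in pair_potential(observed, reference)])
```

Distances observed more often than expected receive a negative (favorable) energy, and rarely observed distances receive a positive one; scoring a pose then reduces to table lookups, which accounts for the efficiency of this class of functions.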
3.3.4. Machine learning-based scoring functions
In recent years, methods based on machine learning (ML), including deep learning (DL), have been applied in scoring functions and demonstrated tremendous success [76, 77, 78, 79]. For example, RFScore [80] and NNScore [81] are two pioneering ML-based scoring functions. These methods have been demonstrated to perform better and to be more effective than classical methods [82, 83]. However, despite their significant advancements, machine learning scoring functions (MLSFs) have faced skepticism due to issues with generalization [84, 85, 86, 87, 88, 89]. For example, Gabel et al. [84] found that their MLSFs trained with random forest (RF) and support vector machine (SVM) algorithms could reproduce the claimed excellent scoring power (the capability for binding affinity prediction) of RFScore [80], but that their docking power (the capability for discrimination of near-native poses from decoy poses) and screening power (the capability for discrimination of active ligands from decoy compounds) were rather poor. Nonetheless, efforts persist in enhancing MLSFs to broaden their applicability [66, 67, 70, 90, 91, 92, 93]. Zhang’s group successfully developed ΔvinaRF20 [66] and ΔvinaXGB [67], which employ ML algorithms to fit correction terms to AutoDock Vina scores rather than fitting the final binding scores directly, as is conventional. These methods ranked among the top three in all four tasks (scoring, docking, ranking, and screening) in the Comparative Assessment of Scoring Functions (CASF) benchmark. They further proposed a novel scoring function [90] that employs a similar parametrization strategy based on another classical scoring function, Lin_F9. This new scoring function achieves excellent scoring, ranking, and screening powers on the CASF benchmark. OnionNet-SFCT, proposed by Zheng et al. [91], also incorporates a correction term to AutoDock Vina, but this term is calculated from a binding pose prediction model rather than a binding affinity prediction model, implying that the training of the model still relies on decoy poses. The linear combination of Vina scores and root-mean-square deviation (RMSD) values produced by the ML model enabled the method to yield impressive results in docking and screening tasks. More recently, Méndez-Lucio et al. [70] introduced DeepDock, a geometric DL-based method that uses a mixture density network (MDN) to learn the probability density distribution of the distance between each ligand atom and each point on the surface of the binding site. This distance likelihood potential performed competitively in docking and screening tasks, hinting at its potential for capturing protein-ligand interactions effectively. Shen et al. [94] reported a DL approach named RTMScore, based on a residue-atom distance likelihood potential and a graph transformer, for the prediction of protein-ligand interactions. Their method achieved remarkable performance on the CASF-2016 benchmark in terms of docking and screening powers. These advancements underscore the continuous evolution of ML and DL techniques in refining scoring functions and their applications in drug discovery and protein-ligand interaction studies.
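The idea of learning a correction to a classical score, rather than learning the binding score itself, can be sketched in a few lines. In this minimal illustration a one-feature linear model stands in for the random forest or gradient-boosting models used in the cited work, and the scores, feature values, and function names are all hypothetical.

```python
def fit_linear_correction(features, residuals):
    """Fit residual ~ slope * feature + intercept by simple least squares."""
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(residuals) / n
    sxx = sum((x - mean_x) ** 2 for x in features)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, residuals))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def delta_corrected_score(base_score, feature, slope, intercept):
    """Final score = classical score + learned correction term."""
    return base_score + slope * feature + intercept


# Hypothetical data: classical scores, one descriptor per complex, and
# synthetic "experimental" values built as base + 0.5 * feature - 1.0.
base_scores = [-7.0, -8.5, -6.2, -9.1]
descriptor = [1.0, 3.0, 2.0, 4.0]
experimental = [b + 0.5 * f - 1.0 for b, f in zip(base_scores, descriptor)]

# The model is trained on the residual (experimental minus classical score).
residuals = [e - b for e, b in zip(experimental, base_scores)]
slope, intercept = fit_linear_correction(descriptor, residuals)
print(round(slope, 3), round(intercept, 3))
```

Training on the residual keeps the physics-inspired baseline in the final score, so the learned part only has to capture what the classical function misses.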
The significant growth that machine learning, including deep learning, has experienced in the field of drug discovery over recent years is propelled by advancements in data-driven techniques [95, 96]. Molecular docking has demonstrated success in various applications such as protein-ligand docking and drug discovery, and physics-based models have been employed to comprehend the interactions between macromolecules and small molecules [96, 97, 98]. However, these models are not well suited to handling the rapidly expanding experimental datasets. In contrast, machine learning screenings frequently surpass molecular docking predictions [96]. They adeptly forecast drug-related properties and predict drug-target interactions, thereby identifying potential repurposable drugs [96]. The ChEMBL database [99] comprises inhibitor datasets for various targets, featuring experimentally established binding labels that serve as the foundation for machine learning predictions [96]. DrugBank, an openly accessible database, houses a diverse collection of compounds, encompassing approved drugs and investigational or off-market drugs, along with their interactions with targets [100, 101]. In a recent investigation, machine learning was employed to repurpose DrugBank compounds for addressing opioid use disorder [96]. These models were applied to screen 8865 compounds within the DrugBank database against four opioid receptors for their potency [96].
The popularity of deep learning has surged, primarily driven by its capacity for universal approximation and the increasing availability of powerful, high-performing computational tools [102]. More recently, drug discovery has benefitted immensely from the incorporation of artificial intelligence (AI) and deep learning techniques [103, 104, 105], culminating in the phenomenal success of AlphaFold2 (AF2), a DL algorithm developed by Google DeepMind [106]. Additionally, DeepMind has collaborated with the European Bioinformatics Institute (EBI) to establish AlphaFoldDB [107], providing open access to a repository of over 200 million protein structures predicted by AlphaFold, with broad coverage of UniProt [108]. AlphaFold gained widespread attention in 2020 for its exceptional performance in the 14th Critical Assessment of Structure Prediction (CASP14) competition, demonstrating remarkable accuracy in predicting protein structures. CASP comprises various categories, including template-guided modeling and template-free modeling [75, 109], and plays a pivotal role in assessing the effectiveness of computational methods in predicting three-dimensional protein structures. In Round 15 (CASP15), a new category, protein-ligand complex modeling, was introduced to CASP, which had traditionally centered on protein modeling [75]. The Zou group participated in predicting the protein-ligand targets in CASP15. Using a novel template-guided approach that combines a physicochemical molecular docking method with a bioinformatics-based ligand similarity method, they achieved commendable performance in the CASP15 protein-ligand complex structure prediction [75].
Recently, Stark et al. [110] developed EQUIBIND, a deep neural model which relies on SE(3)-equivariant graph neural networks to predict bound protein-ligand conformations in a single shot, treating docking as a regression problem. EQUIBIND tried to tackle the blind docking task by directly predicting pocket keypoints on both ligand and protein and aligning them. Lu et al. [111] proposed Trigonometry-Aware Neural Networks, TANKBind, which incorporates trigonometry constraints as a rigorous inductive bias into the model and explicitly attends to all possible binding sites for each protein by segmenting the whole protein into functional blocks. In terms of blind docking performance, TANKBind achieved state-of-the-art performance, outperforming EQUIBIND. Stark et al. [112] presented DIFFDOCK, a diffusion generative model over the non-Euclidean manifold of ligand poses. DIFFDOCK significantly outperforms all previous methods on PDBBind [113], and provides confidence estimates with high selective accuracy. More recently, Lu et al. [114] presented DynamicBind, a deep learning method that employs equivariant geometric diffusion networks to construct a smooth energy landscape, thereby promoting efficient transitions between different equilibrium states. DynamicBind unifies two conventionally separated steps: protein conformation generation and ligand pose prediction. This method outperforms previous methods [110, 111, 112] in predicting ligand poses for both the PDBBind dataset and major drug targets (MDT) dataset.
3.3.5. Recent development in scoring functions
It is worth noting that no universal scoring function fits all biological systems, and seeking one is not appropriate at present, because existing scoring functions still have limitations and most of them are sensitive to the training set or have a limited scope of application. Therefore, when docking against a specific target family, training a scoring function on that particular protein family can usually improve binding mode and binding affinity prediction; such functions are referred to as target-specific scoring functions [115]. The consensus scoring approach is another way to overcome the deficiencies of individual scoring functions to some extent. Consensus scoring functions can mitigate the bias of a single scoring function and improve the hit rate [116, 117]. The consensus approach should not include multiple scoring functions that yield highly correlated results; otherwise the bias of these scoring functions would be amplified rather than diminished [116]. Other ongoing methodology development includes adding terms to existing scoring functions to improve the scoring performance [56, 118]. Numerous comparison studies have been reported to evaluate the performance of different scoring functions [36, 119, 120, 121, 122].
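One simple form of consensus scoring is rank averaging: each scoring function ranks the compounds independently, and the ranks are then averaged. The sketch below assumes that a lower (more negative) score is better for every function, which does not hold for all programs; the function names and score values are hypothetical.

```python
def ranks(scores):
    """Rank compounds by score (1 = best, taken here as the most negative score)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def consensus_rank(score_table):
    """Average each compound's rank over several scoring functions."""
    per_function = [ranks(scores) for scores in score_table.values()]
    n = len(per_function)
    return [sum(col) / n for col in zip(*per_function)]


# Hypothetical scores for four compounds from three scoring functions
# that use different scales.
score_table = {
    "sf_a": [-9.0, -7.5, -8.2, -6.0],
    "sf_b": [-40.0, -55.0, -50.0, -30.0],
    "sf_c": [-1.1, -0.9, -1.3, -0.5],
}
consensus = consensus_rank(score_table)
best = min(range(len(consensus)), key=lambda i: consensus[i])
print(consensus, "-> best compound index:", best)
```

Working with ranks rather than raw scores sidesteps the different scales of the individual functions. If two of the functions were highly correlated, their shared bias would count twice in the average, which is the amplification effect cautioned against above.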
4. Docking Programs
In the past two decades, more than 60 docking programs have become available [123], such as SLIDE [63], X-CSCORE [64] and ConsDock [124]. Most of the existing docking algorithms treat ligands as flexible molecules and proteins as rigid molecules, because a protein usually consists of thousands of atoms. In some programs such as GalaxyDock [61], protein side-chains can be optimized.
Moreover, there are indirect ways to account for the protein induced-fit effect in ligand binding. A common approach is to use an ensemble of protein conformations. These structures can be obtained either computationally, for example from molecular dynamics (MD) simulations, or experimentally, for example from NMR or X-ray diffraction. The MDock software incorporates a method referred to as “ensemble docking” to implicitly account for protein flexibility. In this method, the protein conformational number, which identifies the ensemble member, is introduced as an additional variable during docking optimization. This ensemble docking algorithm is computationally efficient; its speed is comparable to that of single-protein docking [37, 38, 125]. Mizutani et al. used an enlarged binding pocket in their docking program ADAM to allow for a degree of protein flexibility [126]. In this approach, the van der Waals energy curve for each atom pair is shifted to widen the protein cavity uniformly, followed by structure optimization to remove atomic clashes and to improve the interaction energy. Similarly, Bottegoni et al. adopted the SCARE (SCan Alanines and REfine) protocol, which enlarges the binding pocket by mutating residues within it to alanine [127].
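The idea of treating the protein conformation as one more search variable can be sketched as follows. This is a toy illustration and not the MDock algorithm: it enumerates every (conformation, pose) pair exhaustively, whereas a practical implementation would optimize over the joint space. The scoring function, the dictionary layout of a conformer, and all names are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

Pose = Tuple[float, float, float]  # toy ligand position (x, y, z)
Conformer = Dict[str, object]      # hypothetical: {"center": Pose, "strain": float}


def toy_score(conformer: Conformer, pose: Pose) -> float:
    """Toy energy: squared distance to the pocket center plus a strain penalty."""
    cx, cy, cz = conformer["center"]
    x, y, z = pose
    return (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 + conformer["strain"]


def ensemble_dock(
    conformers: Sequence[Conformer],
    poses: Sequence[Pose],
    score: Callable[[Conformer, Pose], float] = toy_score,
) -> Tuple[float, int, Pose]:
    """Return (best score, conformer index, pose) over the joint search space."""
    best = None
    for m, conformer in enumerate(conformers):  # extra variable: ensemble member
        for pose in poses:                      # usual ligand pose variables
            s = score(conformer, pose)
            if best is None or s < best[0]:
                best = (s, m, pose)
    return best


if __name__ == "__main__":
    ensemble = [
        {"center": (0.0, 0.0, 0.0), "strain": 0.0},
        {"center": (1.0, 1.0, 0.0), "strain": 0.2},
    ]
    candidate_poses = [(0.5, 0.5, 0.0), (1.0, 1.0, 0.0)]
    print(ensemble_dock(ensemble, candidate_poses))
```

The returned conformer index shows which ensemble member the ligand selected, which is the sense in which protein flexibility is handled implicitly.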
To enhance the success of docking procedures, certain docking programs focus on specific protein or ligand categories. Rather than creating a “one-size-fits-all” scoring function, prediction accuracy can be improved by training a system-specific scoring function or by incorporating additional factors into the energy calculations. For instance, the ITScore in MDock has been effectively retrained using 2897 protein-ligand complexes sourced from the Protein Data Bank, proving valuable in screening for inhibitors and assessing their selectivities across multiple target proteins [38, 128]. Additionally, Vina-carb [57] integrated the Carbohydrate Intrinsic (CHI) energy terms into the AutoDock Vina scoring function (ADV), improving the docking accuracy to 74%, compared with 55% when relying solely on the ADV score.
5. Structure-based virtual screening and inverse virtual screening
The applications of molecular docking in therapeutic intervention can be bidirectional, depending on the purpose of docking. Conventional docking searches for active compounds against a given target from a database such as DrugBank [100, 101] or from another compound library, a process referred to as virtual screening (VS) [116, 129]. Conversely, for a specific ligand of interest, inverse docking identifies potential targets from thousands of protein candidates, a process called inverse virtual screening (IVS) [117, 130, 131, 132]. Both prediction tasks are challenging and computationally demanding.
Virtual screening methods are classified into structure-based virtual screening and ligand-based virtual screening [133]. Structure-based virtual screening applies docking methods on a large scale. In contrast, ligand-based virtual screening uses the structural and chemical information of ligands (e.g., by searching for common features or similarity) rather than a docking strategy. The two methods are not mutually exclusive and can be used individually or in combination. For instance, in many structure-based virtual screening applications a ligand-based prescreening step is used to reduce the size of the compound database [116]. Tan et al. integrated the two methods by parallel selection and postulated that combining structure-based and ligand-based screening in this complementary manner could be more effective than rank fusion [134].
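The difference between parallel selection and rank fusion can be made concrete with a small sketch. The definitions used here are assumptions for illustration and may differ in detail from those of Tan et al.: parallel selection is taken to mean picking compounds alternately from the top of each ranked list, and rank fusion is taken to mean re-ranking by the sum of ranks. All names are hypothetical.

```python
def parallel_selection(ranked_lists, n_select):
    """Pick compounds alternately from the top of each ranked list."""
    selected, seen = [], set()
    depth = 0
    max_depth = max(len(ranking) for ranking in ranked_lists)
    while len(selected) < n_select and depth < max_depth:
        for ranking in ranked_lists:
            if depth < len(ranking) and ranking[depth] not in seen:
                seen.add(ranking[depth])
                selected.append(ranking[depth])
                if len(selected) == n_select:
                    break
        depth += 1
    return selected


def rank_fusion(ranked_lists, n_select):
    """Re-rank by the sum of ranks; a compound absent from a list gets len + 1."""
    compounds = sorted({c for ranking in ranked_lists for c in ranking})

    def total_rank(compound):
        return sum(
            ranking.index(compound) + 1 if compound in ranking else len(ranking) + 1
            for ranking in ranked_lists
        )

    # Ties are broken alphabetically so the result is deterministic.
    return sorted(compounds, key=lambda c: (total_rank(c), c))[:n_select]


if __name__ == "__main__":
    structure_based = ["A", "B", "C", "D", "E"]  # hypothetical docking ranking
    ligand_based = ["E", "C", "B", "D", "A"]     # hypothetical similarity ranking
    print(parallel_selection([structure_based, ligand_based], 2))
    print(rank_fusion([structure_based, ligand_based], 2))
```

With these toy rankings, parallel selection keeps the top compound of each method (A and E), whereas rank fusion favors compounds ranked moderately well by both (B and C), which illustrates why the two strategies can select different hits.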
IVS, or reverse docking, has been widely applied to drug development in recent years [130, 135]. Screening against a drug target database can help rapidly identify potential target candidates for a specific molecule. The molecule can be a non-drug ligand or a known drug. The major safety issues of drug candidates are possible side effects (referred to as off-target effects) and toxicity. IVS can help investigate the interactions of a drug candidate with multiple targets in early-stage drug development. The purpose of IVS studies on known drugs is to search for alternative uses of FDA-approved drugs, which reduces costs and risks in comparison with developing new chemical compounds. This approach is referred to as drug repositioning (also known as drug repurposing or reprofiling) [136, 137, 138]. Drug development has been trending from “one gene, one drug, one disease” to polypharmacology, in which drugs are designed to interact with multiple targets. The main challenge of IVS is its high computational cost. Recently, the Zou group integrated their fast docking application, MDock, into a web server platform in order to undertake IVS tasks. The server contains 3268 protein-ligand complexes (with 3349 distinct binding sites) associated with 537 protein targets. A typical IVS run for a given ligand takes 1 to 4 hours. For example, the IVS study on the progesterone receptor takes approximately one hour on a workstation with 24 Intel Xeon cores [Intel(R) Xeon(R) CPU E5–2650 v3 @ 2.30GHz]. In addition, the screening results from our web server provide searchable information on the diseases associated with each target [139].
6. Challenges
Molecular docking methodology is rapidly evolving and improving with the exponential increase in computing power. Recently, machine learning algorithms (including neural networks and deep learning) [140, 141] and hybrid approaches combining high-content screening and virtual screening [142] have been proposed for docking predictions and drug discovery. A critical question in docking is how reliable the docking predictions are [143]. Consequently, benchmarking exercises for blind docking prediction have been held, such as the Community Structure-Activity Resource (CSAR) [144], the Drug Design Data Resource (D3R) [145, 146, 147] and the Continuous Evaluation of Ligand Pose Prediction (CELPP) [148]. The Zou group participated in the weekly CELPP competition using a fully automated systematic strategy [149]. The large scale of these protein-ligand complex structure predictions makes the results statistically significant. According to our analysis, the success rate of current sampling (80.4%) is much higher than that of scoring (46.3%) [149]. The inaccuracy of current scoring functions likely arises from various sources, such as solvent effects, entropic effects, and protonation or tautomer states. The development of new scoring schemes to address these issues remains largely open. The biggest hurdle in molecular docking stems from protein flexibility, because the computational cost of sampling all possible protein conformations is immense. In conclusion, despite decades of docking algorithm development, prediction accuracy remains a major issue.
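The distinction between sampling and scoring success can be illustrated with a short calculation. The definitions below are a common convention and are assumptions here, not necessarily those used in [149]: sampling succeeds for a target if any generated pose lies within an RMSD cutoff of the native pose, and scoring succeeds if the top-ranked pose does. The 2.0 Å cutoff and the function name are likewise hypothetical.

```python
def success_rates(rmsds_per_target, cutoff=2.0):
    """Return (sampling %, scoring %) over a set of docking targets.

    rmsds_per_target holds, for each target, the RMSDs (in angstroms) of its
    docked poses relative to the native pose, ordered by docking score with
    the best-scored pose first.
    """
    n_targets = len(rmsds_per_target)
    # Sampling success: at least one near-native pose was generated.
    sampled = sum(1 for rmsds in rmsds_per_target if min(rmsds) <= cutoff)
    # Scoring success: the near-native pose was also ranked first.
    scored = sum(1 for rmsds in rmsds_per_target if rmsds[0] <= cutoff)
    return 100.0 * sampled / n_targets, 100.0 * scored / n_targets


if __name__ == "__main__":
    toy_results = [
        [1.2, 3.4, 5.0],  # near-native pose ranked first
        [4.1, 1.8, 6.3],  # near-native pose sampled but not ranked first
        [3.9, 5.2, 4.4],  # no near-native pose sampled
        [0.9, 1.1, 7.0],  # near-native pose ranked first
    ]
    print(success_rates(toy_results))
```

Under these definitions the scoring rate can never exceed the sampling rate, so the gap between the two figures measures how often a correct pose is generated but not ranked first.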
Acknowledgements
This work received financial support from NIH R35GM136409 (PI: XZ) and R01HL126774 (PIs: Ira Cohen, Jianmin Cui, and XZ). We are also grateful for the support to XZ from OpenEye Cadence Molecular Sciences (Santa Fe, NM, http://www.eyesopen.com) [150, 151].
Contributor Information
Zhiwei Ma, Dalton Cardiovascular Research Center, University of Missouri-Columbia USA.
Abeeb Ajibade, Dalton Cardiovascular Research Center, University of Missouri-Columbia; Department of Physics and Astronomy, University of Missouri-Columbia USA.
Xiaoqin Zou, Dalton Cardiovascular Research Center, University of Missouri-Columbia; Department of Physics and Astronomy, University of Missouri-Columbia; Department of Biochemistry, University of Missouri-Columbia; Institute for Data Science and Informatics, University of Missouri-Columbia USA.
References
- [1].Starr C, Taggart R, Evers C, Starr L, Biology: The Unity and Diversity of Life; Cengage Learning. (2015). [Google Scholar]
- [2].Rossmann MG, Morais MC, Leiman PG, Zhang W, Combining x-ray crystallography and electron microscopy. Structure 13 (2005), 355–362. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [3].Gallagher-Jones M, Rodriguez JA, Miao J, Frontier methods in coherent x-ray diffraction for high-resolution structure determination. Quarterly Reviews of Biophysics 49 (2016). [Google Scholar]
- [4].Callaway E, The revolution will not be crystallized: a new method sweeps through structural biology. Nature News 525 (2015), 172. [DOI] [PubMed] [Google Scholar]
- [5].Glaeser RM, How good can cryo-em become? Nature Methods 13 (2015), 28. [DOI] [PubMed] [Google Scholar]
- [6].Vinothkumar KR, Henderson R, Single particle electron cryomicroscopy: trends, issues and future perspective. Quarterly Reviews of Biophysics 49 (2016). [DOI] [PubMed] [Google Scholar]
- [7].Price WS, Spin dynamics: basics of nuclear magnetic resonance. Concepts in Magnetic Resonance Part A 34 (2009), 60–61. [Google Scholar]
- [8].Brooijmans N, Kuntz ID, Molecular recognition and docking algorithms. Annual Review of Biophysics and Biomolecular Structure 32 (2003), 335–373. [DOI] [PubMed] [Google Scholar]
- [9].Meng X-Y, Zhang H-X, Mezei M, Cui M, Molecular docking: a powerful approach for structure-based drug discovery. Current Computer-aided Drug Design 7 (2011), 146–157. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [10].Grinter SZ, Zou X, Challenges, applications, and recent advances of protein-ligand docking in structure-based drug design. Molecules 19 (2014), 10150–10176. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [11].Ferreira LG, dos Santos RN, Oliva G, Andricopulo AD, Molecular docking and structure-based drug design strategies. Molecules 20 (2015), 13384–13421. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [12].Morris GM, Lim-Wilby M, Molecular docking. Molecular Modeling of Proteins; Springer, (2008); 365–382. [DOI] [PubMed] [Google Scholar]
- [13].Bernstein FC, Koetzle TF, Williams GJ, Meyer EF, Brice MD, Rodgers JR, Kennard O, Shimanouchi T, Tasumi M, The protein data bank: a computer-based archival file for macromolecular structures. Journal of Molecular Biology 112 (1977), 535–542. [DOI] [PubMed] [Google Scholar]
- [14].Overington JP, Al-Lazikani B, Hopkins AL, How many drug targets are there? Nature Reviews Drug Discovery 5 (2006), 993. [DOI] [PubMed] [Google Scholar]
- [15].Du X, Li Y, Xia Y-L, Ai S-M, Liang J, Sang P, Ji X-L, Liu S-Q, Insights into protein–ligand interactions: mechanisms, models, and methods. International Journal of Molecular Sciences 17 (2016), 144. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [16].Chandel TI, Zaman M, Khan MV, Ali M, Rabbani G, Ishtikhar M, Khan RH, A mechanistic insight into protein-ligand interaction, folding, misfolding, aggregation and inhibition of protein aggregates: An overview. International Journal of Biological Macromolecules 106 (2018), 1115–1129. [DOI] [PubMed] [Google Scholar]
- [17].Druker BJ, Sti571 (gleevec™) as a paradigm for cancer therapy. Trends in Molecular Medicine 8 (2002), S14–S18. [DOI] [PubMed] [Google Scholar]
- [18].Ma Z, Huang S-Y, Cheng F, Zou X, Rapid identification of inhibitors and prediction of ligand selectivity for multiple proteins: application to protein kinases. The Journal of Physical Chemistry B 125(9) (2021), 2288–2298. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [19].Agnieszka K, Thermodynamics of ligand-protein interactions: Implications for molecular design. Thermodynamics - Interaction Studies - Solids, Liquids and Gases; InTech, (2011). [Google Scholar]
- [20].Lodish HF, Berk A, Zipursky SL, Matsudaira P, Baltimore D, Darnell J, Molecular Cell Biology; 4th Edition, W. H. Freeman; (2000). [Google Scholar]
- [21].Nelson DL, Cox MM, Lehninger Principles of Biochemistry, 4th Edition; W. H. Freeman; (2004). [Google Scholar]
- [22].Baker EN, Hydrogen bonding in biological macromolecules. International Tables for Crystallography; International Union of Crystallography, (2006), 546–552. [Google Scholar]
- [23].Ninham BW, Yaminsky V, Ion binding and ion specificity: the hofmeister effect and onsager and lifshitz theories. Langmuir 13 (1997), 2097–2108. [Google Scholar]
- [24].Chandler D, Interfaces and the driving force of hydrophobic assembly. Nature 437 (2005), 640. [DOI] [PubMed] [Google Scholar]
- [25].Snyder PW, Mecinović J, Moustakas DT, Thomas SW, Harder M, Mack ET, Lockett MR, Héroux A, Sherman W, Whitesides GM, Mechanism of the hydrophobic effect in the biomolecular recognition of arylsulfonamides by carbonic anhydrase. Proceedings of the National Academy of Sciences (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [26].Muraki M, Ishimura M, Harata K, Interactions of wheat-germ agglutinin with GlcNAc1,6gal sequence. Biochimica et Biophysica Acta (BBA) - General Subjects 1569 (2002), 10–20. [DOI] [PubMed] [Google Scholar]
- [27].Olsson TS, Williams MA, Pitt WR, Ladbury JE, The thermodynamics of protein–ligand interaction and solvation: Insights for ligand design. Journal of Molecular Biology 384 (2008), 1002–1017. [DOI] [PubMed] [Google Scholar]
- [28].Fischer E, Influence of configuration on the action of enzymes. Berichte der Deutschen Chemischen Gesellschaft 27 (1894), 2985–2993. [Google Scholar]
- [29].Koshland D, Application of a theory of enzyme specificity to protein synthesis. Proceedings of the National Academy of Sciences 44 (1958), 98–104. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [30].Koshland D Jr, The key-lock theory and the induced fit theory. Angewandte Chemie-English Edition 33 (1994), 2475. [Google Scholar]
- [31].Ma B, Kumar S, Tsai C-J, Nussinov R, Folding funnels and binding mechanisms. Protein Engineering 12 (1999), 713–720. [DOI] [PubMed] [Google Scholar]
- [32].Tsai C-J, Kumar S, Ma B, Nussinov R, Folding funnels, binding funnels, and protein function. Protein Science 8 (1999), 1181–1190. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [33].Boehr DD, Nussinov R, Wright PE, The role of dynamic conformational ensembles in biomolecular recognition. Nature Chemical Biology 5 (2009), 789. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [34].Csermely P, Palotai R, Nussinov R, Induced fit, conformational selection and independent dynamic segments: an extended view of binding events. Trends in Biochemical Sciences 35 (2010), 539–546. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [35].Zhou AQ, O’Hern CS, Regan L, Predicting the side-chain dihedral angle distributions of nonpolar, aromatic, and polar amino acids using hard sphere models. Proteins: Structure, Function, and Bioinformatics 82 (2014), 2574–2584. [DOI] [PubMed] [Google Scholar]
- [36].Huang S-Y, Zou X, An iterative knowledge-based scoring function to predict protein–ligand interactions: II. Validation of the scoring function. Journal of Computational Chemistry 27 (2006), 1876–1882. [DOI] [PubMed] [Google Scholar]
- [37].Huang S-Y, Zou X, An iterative knowledge-based scoring function to predict protein–ligand interactions: I. derivation of interaction potentials. Journal of Computational Chemistry 27 (2006), 1866–1875. [DOI] [PubMed] [Google Scholar]
- [38].Yan C, Zou X, Mdock: An ensemble docking suite for molecular docking, scoring and in silico screening. In Computer-Aided Drug Discovery. Methods in Pharmacology and Toxicology.; Springer, (2015), 153–166. [Google Scholar]
- [39].Leach AR, A Survey of Methods for Searching the Conformational Space of Small and Medium-Sized Molecules. Reviews in Computational Chemistry; John Wiley & Sons, Inc., (2007), 1–55. [Google Scholar]
- [40].Agrafiotis DK, Gibbs AC, Zhu F, Izrailev S, Martin E, Conformational sampling of bioactive molecules: a comparative study. Journal of Chemical Information and Modeling 47 (2007), 1067–1086. [DOI] [PubMed] [Google Scholar]
- [41].Goodford PJ, A computational procedure for determining energetically favorable binding sites on biologically important macromolecules. Journal of Medicinal Chemistry 28 (1985), 849–857. [DOI] [PubMed] [Google Scholar]
- [42].Meng EC, Shoichet BK, Kuntz ID, Automated docking with grid-based energy evaluation. Journal of Computational Chemistry 13 (1992), 505–524. [Google Scholar]
- [43].Kuntz ID, Blaney JM, Oatley SJ, Langridge R, Ferrin TE, A geometric approach to macromolecule-ligand interactions. Journal of Molecular Biology 161 (1982), 269–288. [DOI] [PubMed] [Google Scholar]
- [44].Allen WJ, Balius TE, Mukherjee S, Brozell SR, Moustakas DT, Lang PT, Case DA, Kuntz ID, Rizzo RC, DOCK 6: Impact of new features and current docking performance. Journal of Computational Chemistry 36 (2015), 1132–1156. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [45].Rarey M, Kramer B, Lengauer T, Klebe G, A fast flexible docking method using an incremental construction algorithm. Journal of Molecular Biology 261 (1996), 470–489. [DOI] [PubMed] [Google Scholar]
- [46].Friesner RA, Banks JL, Murphy RB, Halgren TA, Klicic JJ, Mainz DT, Repasky MP, Knoll EH, Shelley M, Perry JK, Shaw DE, Francis P, Shenkin PS, Glide: a new approach for rapid, accurate docking and scoring. 1. method and assessment of docking accuracy. Journal of Medicinal Chemistry 47 (2004), 1739–1749. [DOI] [PubMed] [Google Scholar]
- [47].Bean JC, Genetic algorithms and random keys for sequencing and optimization. ORSA Journal on Computing 6 (1994), 154–160. [Google Scholar]
- [48].Jones G, Willett P, Glen RC, Leach AR, Taylor R, Development and validation of a genetic algorithm for flexible docking. Journal of Molecular Biology 267 (1997), 727–748. [DOI] [PubMed] [Google Scholar]
- [49].Yin S, Biedermannova L, Vondrasek J, Dokholyan NV, MedusaScore: An accurate force field-based scoring function for virtual drug screening. Journal of Chemical Information and Modeling 48 (2008), 1656–1662. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [50].Morris GM, Goodsell DS, Halliday RS, Huey R, Hart WE, Belew RK, Olson A, Automated docking using a lamarckian genetic algorithm and an empirical binding free energy function. Journal of Computational Chemistry 19 (1998), 1639–1662. [Google Scholar]
- [51].Ravindranath PA, Forli S, Goodsell DS, Olson AJ, Sanner MF, AutoDockFR: Advances in protein-ligand docking with explicitly specified binding site flexibility. PLOS Computational Biology 11 (2015), e1004586. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [52].Koebel MR, Schmadeke G, Posner RG, Sirimulla S, AutoDock VinaXB: implementation of XBSF, new empirical halogen bond scoring function, into AutoDock vina. Journal of Cheminformatics 8 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [53].Abagyan R, Totrov M, Kuznetsov D, ICM-a new method for protein modeling and design: Applications to docking and structure prediction from the distorted native conformation. Journal of Computational Chemistry (1994), 488–506. [Google Scholar]
- [54].Krammer A, Kirchhoff PD, Jiang X, Venkatachalam C, Waldman M, LigScore: a novel scoring function for predicting binding affinities. Journal of Molecular Graphics and Modelling 23 (2005), 395–407. [DOI] [PubMed] [Google Scholar]
- [55].Trott O, Olson AJ, Autodock vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. Journal of Computational Chemistry 31 (2010), 455–461. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [56].Quiroga R, Villarreal MA, Vinardo: A scoring function based on autodock vina improves scoring, docking, and virtual screening. PLOS ONE 11 (2016), e0155183. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [57].Nivedha AK, Thieker DF, Makeneni S, Hu H, Woods RJ, Vina-Carb: Improving glycosidic angles during carbohydrate docking. Journal of Chemical Theory and Computation 12 (2016), 892–901. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [58].Kerzmann A, Fuhrmann J, Kohlbacher O, Neumann D, BALLDock/SLICK: A new method for protein-carbohydrate docking. Journal of Chemical Information and Modeling 48 (2008), 1616–1625. [DOI] [PubMed] [Google Scholar]
- [59].Gohlke H, Hendlich M, Klebe G, Knowledge-based scoring function to predict protein-ligand interactions. Journal of Molecular Biology 295 (2000), 337–356. [DOI] [PubMed] [Google Scholar]
- [60].Muegge I, Martin YC, A general and fast scoring function for protein-ligand interactions: a simplified potential approach. Journal of Medicinal Chemistry 42 (1999), 791–804. [DOI] [PubMed] [Google Scholar]
- [61].Shin W-H, Seok C, GalaxyDock: Protein–ligand docking with flexible protein side-chains. Journal of Chemical Information and Modeling 52 (2012), 3225–3232. [DOI] [PubMed] [Google Scholar]
- [62].Hartmann C, Antes I, Lengauer T, Docking and scoring with alternative side-chain conformations. Proteins: Structure, Function, and Bioinformatics 74 (2009), 712–726. [DOI] [PubMed] [Google Scholar]
- [63].Zavodszky MI, Rohatgi A, Voorst JRV, Yan H, Kuhn LA, Scoring ligand similarity in structure-based virtual screening. Journal of Molecular Recognition 22 (2009), 280–292. [DOI] [PubMed] [Google Scholar]
- [64].Wang R, Lai L, Wang S, Further development and validation of empirical scoring functions for structure-based binding affinity prediction. Journal of Computer-aided Molecular Design 16 (2002), 11–26. [DOI] [PubMed] [Google Scholar]
- [65].Wójcikowski M, Ballester PJ, Siedlecki P, Performance of machine-learning scoring functions in structure-based virtual screening. Scientific Reports 7 (2017), 46710. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [66].Wang C, Zhang Y, Improving scoring-docking-screening powers of protein–ligand scoring functions using random forest. Journal of Computational Chemistry 38 (2017), 169–177. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [67].Lu J, Hou X, Wang C, Zhang Y, Incorporating explicit water molecules and ligand conformation stability in machine-learning scoring functions. Journal of Chemical Information and Modeling 59 (2019), 4540–4549. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [68].Jiménez J, Skalic M, Martinez-Rosell G, De Fabritiis G, KDEEP: protein–ligand absolute binding affinity prediction via 3d-convolutional neural networks. Journal of Chemical Information and Modeling 58 (2018), 287–296. [DOI] [PubMed] [Google Scholar]
- [69].Stepniewska-Dziubinska MM, Zielenkiewicz P, Siedlecki P, Development and evaluation of a deep learning model for protein–ligand binding affinity prediction. Bioinformatics (2018), 3666–3674. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [70].Méndez-Lucio O, Ahmed M, del Rio-Chanona EA, Wegner JK, A geometric deep learning approach to predict binding conformations of bioactive molecules. Nature Machine Intelligence 3 (2021), 1033–1039. [Google Scholar]
- [71].Li H, Peng J, Sidorov P, Leung Y, Leung KS, Wong MH, Lu G, Ballester PJ, Classical scoring functions for docking are unable to exploit large volumes of structural and interaction data. Bioinformatics 35 (2019), 3989–3995. [DOI] [PubMed] [Google Scholar]
- [72].Gohlke H, Klebe G, Statistical potentials and scoring functions applied to protein–ligand binding. Current opinion in Structural Biology 11 (2001), 231–235. [DOI] [PubMed] [Google Scholar]
- [73].Yan C, Grinter SZ, Merideth BR, Ma Z, Zou X, Iterative Knowledge-Based Scoring Functions Derived from Rigid and Flexible Decoy Structures: Evaluation with the 2013 and 2014 CSAR Benchmarks. Journal of Chemical Information and Modeling, 56 (2016), 1013–1021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [74].Xu X, Yan C, Zou X, Improving binding mode and binding affinity predictions of docking by ligand-based search of protein conformations: evaluation in D3R grand challenge 2015. Journal of Computer-aided Molecular Design, 31 (2017), 689–699. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [75].Xu X, Duan R, Zou X, Template-guided method for protein–ligand complex structure prediction: Application to casp15 protein–ligand studies. Proteins: Structure, Function, and Bioinformatics, 91 (2023), 1829–1831. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [76].Lo Y-C, Rensi SE, Torng W, Altman RB, Machine learning in chemoinformatics and drug discovery. Drug Discovery Today 23 (2018), 1538–1546. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [77].Vamathevan J, Clark D, Czodrowski P, Dunham I, Ferran E, Lee G, Li B, Madabhushi A, Shah P, Spitzer M, Zhao S, Applications of machine learning in drug discovery and development. Nature Reviews Drug Discovery 18 (2019), 463–477. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [78].Shen C, Ding J, Wang Z, Cao D, Ding X, Hou T, From machine learning to deep learning: Advances in scoring functions for protein–ligand docking. Computational Molecular Science 10 (2020), e1429. [Google Scholar]
- [79].Lavecchia A, Machine-learning approaches in drug discovery: methods and applications. Drug Discovery Today 3 (2015), 318–331. [DOI] [PubMed] [Google Scholar]
- [80].Ballester PJ, Mitchell JBO, A machine learning approach to predicting protein–ligand binding affinity with applications to molecular docking. Bioinformatics 26 (2010), 1169–1175. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [81].Durrant JD, McCammon JA, NNScore: a neural-network-based scoring function for the characterization of protein-ligand complexes. Journal of Chemical Information and Modeling 50 (2010), 1865–1871. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [82].Ain QU, Aleksandrova A, Florian DR, Ballester PJ, Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening. Computational Molecular Science 5 (2015), 405–424. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [83].Heck GS, Pintro VO, Pereira RR, de Ávila MB, Levin NMB, de Azevedo WF, Supervised machine learning methods applied to predict ligand-binding affinity. Current Medicinal Chemistry 24 (2017), 2459–2470. [DOI] [PubMed] [Google Scholar]
- [84].Gabel J, Desaphy J, Rognan D, Beware of Machine Learning-Based Scoring Functions-On the Danger of Developing Black Boxes. Journal of Chemical Information and Modeling 54 (2014), 2807–2815. [DOI] [PubMed] [Google Scholar]
- [85].Shen C, Hu Y, Wang Z, Zhang X, Pang J, Wang G, Zhong H, Xu L, Cao D, Hou T, Beware of the generic machine learning-based scoring functions in structure-based virtual screening. Briefings in Bioinformatics 22 (2021), bbaa070. [DOI] [PubMed] [Google Scholar]
- [86].Adeshina YO, Deeds EJ, Karanicolas J, Machine learning classification can reduce false positives in structure-based virtual screening. Proceedings of the National Academy of Sciences 117 (2020), 18477–18488. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [87].Shen C, Hu Y, Wang Z, Zhang X, Zhong H, Wang G, Yao X, Xu L, Cao D, Hou T, Can machine learning consistently improve the scoring power of classical scoring functions? Insights into the role of machine learning in scoring functions. Briefings in Bioinformatics 22 (2021), 497–514. [DOI] [PubMed] [Google Scholar]
- [88].Su M, Feng G, Liu Z, Wang R, Tapping on the black box: how is the scoring power of a machine-learning scoring function dependent on the training set?. Journal of Chemical Information and Modeling 60 (2020), 1122–1136. [DOI] [PubMed] [Google Scholar]
- [89].Yang J, Shen C, Huang N, Predicting or pretending: artificial intelligence for protein-ligand interactions lack of sufficiently large and unbiased datasets. Frontiers in Pharmacology 11 (2020), 508760. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [90].Yang C, Zhang Y, Delta machine learning to improve scoring-ranking-screening performances of protein–ligand scoring functions. Journal of Chemical Information and Modeling 62 (2022), 2696–2712. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [91].Zheng L, Meng J, Jiang K, Lan H, Wang Z, Lin M, Li W, Guo H, Mu Y, Improving protein–ligand docking and screening accuracies by incorporating a scoring function correction term. Briefings in Bioinformatics 23 (2022), bbac051. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [92].Ashtawy HM, Mahapatra NR, Task-specific scoring functions for predicting ligand binding poses and affinity and for screening enrichment. Journal of Chemical Information and Modeling 58 (2018), 119–133. [DOI] [PubMed] [Google Scholar]
- [93].Moon S, Zhung W, Yang S, Lim J, Kim WY, PIGNet: a physics-informed deep learning model toward generalized drug–target interaction predictions. Chemical Science 13 (2022), 3661–3673. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [94].Shen C, Zhang X, Deng Y, Gao J, Wang D, Xu L, Pan P, Hou T, Kang Y, Boosting protein–ligand binding pose prediction and virtual screening based on residue–atom distance likelihood potential and graph transformer. Journal of Medicinal Chemistry 65 (2022), 10691–10706. [DOI] [PubMed] [Google Scholar]
- [95].Wei G-W, Zhu F, Merz KM, Machine-learning repurposing of drugbank compounds for opioid use disorder. Journal of Chemical Information and Modeling 62 (2022), 3941–3941. [DOI] [PubMed] [Google Scholar]
- [96].Feng H, Jiang J, Wei G-W, Machine learning in bio-cheminformatics. Computers in Biology and Medicine, 160 (2023), 106921. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [97].MacKerell AD Jr., Bashford D, Bellott M, et al., All-atom empirical potential for molecular modeling and dynamics study of proteins. The Journal of Physical Chemistry B 102 (1998), 3586–3616. [DOI] [PubMed] [Google Scholar]
- [98].Karplus M and McCammon JA, Molecular dynamics simulations of biomolecules. Nature Structural & Molecular Biology, 9 (2002), 646–652. [DOI] [PubMed] [Google Scholar]
- [99].Gaulton A, Hersey A, Nowotka M, et al. The chembl database in 2017. Nucleic Acids Research, 45(D1) (2017), D945–D954. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [100].Wishart DS, DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Research 34 (2006), D668–D672 [DOI] [PMC free article] [PubMed] [Google Scholar]
- [101].Knox C, Wilson M, Klinger C, et al., DrugBank 6.0: the DrugBank Knowledgebase for 2024. Nucleic Acids Research 52 (2023), D1265–D1275. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [102].Cheng L, Qiu Y, Schmidt BJ, and Wei GW, Review of applications and challenges of quantitative systems pharmacology modeling and machine learning for heart failure. Journal of Pharmacokinetics and Pharmacodynamics, 49 (2022), 39–50. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [103].Lensink MF, Brysbaert G, Raouraoua N, Bates PA, Giulini M, Honorato RV, van Noort C, Teixeira JMC, Bonvin AMJJ, et al., Impact of alphafold on structure prediction of protein complexes: The casp15-capri experiment. Proteins: Structure, Function, and Bioinformatics, 91 (2023), 1658–1683. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [104].Min S, Lee B, Yoon S, Deep learning in bioinformatics. Briefings in Bioinformatics, 18(5) (2017), 851–869. [DOI] [PubMed] [Google Scholar]
- [105].Wang S, Sun S, Li Z, Zhang R, Xu J, Accurate de novo prediction of protein contact map by ultra-deep learning model. PLoS Computational Biology, 13(1) (2017), e1005324. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [106].Jumper J, Evans R, Pritzel A, et al. Highly accurate protein structure prediction with alphafold. Nature, 596 (2021), 583–589. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [107].Varadi M, Anyango S, Deshpande M, et al., Alphafold protein structure database: Massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Research, 50 (2022), D439–D444. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [108].Consortium UniProt, Uniprot: The universal protein knowledgebase in 2021. Nucleic Acids Research, 49 (2021), D480–D489. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [109].Xu X and Zou X, Dissimilar ligands bind in a simi- lar fashion: A guide to ligand binding-mode prediction with appli- cation to celpp studies. International Journal of Moleculari Science, 22 (2021), 12320. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [110].Stärk H, Ganea O, Pattanaik L, Barzilay R, Jaakkola T, Geometric deep learning for drug binding structure prediction. International Conference on Machine Learning 39 (2022), 20503–20521.
- [111].Lu W, Wu Q, Zhang J, Rao J, Li C, Zheng S, TankBind: Trigonometry-aware neural networks for drug-protein binding structure prediction. Advances in Neural Information Processing Systems 35 (2022), 7236–7249.
- [112].Corso G, Stärk H, Jing B, Barzilay R, Jaakkola T, DiffDock: Diffusion steps, twists, and turns for molecular docking. arXiv preprint arXiv:2210.01776 (2022).
- [113].Liu Z, Li Y, Han L, et al., PDB-wide collection of binding data: current status of the PDBbind database. Bioinformatics 31 (2015), 405–412.
- [114].Lu W, Zhang J, Huang W, Zhang Z, Jia X, Wang Z, Shi L, Li C, Wolynes PG, Zheng S, DynamicBind: predicting ligand-specific protein-ligand complex structure with a deep equivariant generative model. Nature Communications 15 (2024), 1071.
- [115].Guedes IA, de Magalhães CS, Dardenne LE, Receptor–ligand molecular docking. Biophysical Reviews 6 (2013), 75–87.
- [116].Kitchen DB, Decornez H, Furr JR, Bajorath J, Docking and scoring in virtual screening for drug discovery: methods and applications. Nature Reviews Drug Discovery 3 (2004), 935.
- [117].Huang H, Zhang G, Zhou Y, Lin C, Chen S, Lin Y, Mai S, Huang Z, Reverse screening methods to search for the protein targets of chemopreventive compounds. Frontiers in Chemistry 6 (2018).
- [118].Li H, Leung K-S, Ballester PJ, Wong M-H, istar: A web platform for large-scale protein-ligand docking. PLoS ONE 9 (2014), e85678.
- [119].Ferrara P, Gohlke H, Price DJ, Klebe G, Brooks CL, Assessing scoring functions for protein-ligand interactions. Journal of Medicinal Chemistry 47 (2004), 3032–3047.
- [120].Li Y, Han L, Liu Z, Wang R, Comparative assessment of scoring functions on an updated benchmark: 2. evaluation methods and general results. Journal of Chemical Information and Modeling 54 (2014), 1717–1736.
- [121].Xu W, Lucke AJ, Fairlie DP, Comparing sixteen scoring functions for predicting biological activities of ligands for protein targets. Journal of Molecular Graphics and Modelling 57 (2015), 76–88.
- [122].Huang S-Y, Comprehensive assessment of flexible-ligand docking algorithms: current effectiveness and challenges. Briefings in Bioinformatics (2017).
- [123].Pagadala NS, Syed K, Tuszynski J, Software for molecular docking: a review. Biophysical Reviews 9 (2017), 91–102.
- [124].Paul N, Rognan D, ConsDock: A new program for the consensus analysis of protein-ligand interactions. Proteins: Structure, Function, and Genetics 47 (2002), 521–533.
- [125].Huang S-Y, Zou X, Ensemble docking of multiple protein structures: Considering protein structural variations in molecular docking. Proteins: Structure, Function, and Bioinformatics 66 (2006), 399–421.
- [126].Mizutani MY, Takamatsu Y, Ichinose T, Nakamura K, Itai A, Effective handling of induced-fit motion in flexible docking. Proteins: Structure, Function, and Bioinformatics 63 (2006), 878–891.
- [127].Bottegoni G, Kufareva I, Totrov M, Abagyan R, A new method for ligand docking to flexible receptors by dual alanine scanning and refinement (SCARE). Journal of Computer-Aided Molecular Design 22 (2008), 311–325.
- [128].Ma Z, Zou X, MDock: A suite for molecular inverse docking and target prediction. Protein-Ligand Interactions and Drug Design (2021), 313–322.
- [129].Malik V, Dhanjal JK, Kumari A, Radhakrishnan N, Singh K, Sundar D, Function and structure-based screening of compounds, peptides and proteins to identify drug candidates. Methods 131 (2017), 10–21.
- [130].Xu X, Huang M, Zou X, Docking-based inverse virtual screening: methods, applications, and challenges. Biophysics Reports 4 (2018), 1–16.
- [131].Zheng M, Liu X, Xu Y, Li H, Luo C, Jiang H, Computational methods for drug design and discovery: focus on China. Trends in Pharmacological Sciences 34 (2013), 549–559.
- [132].Lauro G, Masullo M, Piacente S, Riccio R, Bifulco G, Inverse virtual screening allows the discovery of the biological activity of natural compounds. Bioorganic & Medicinal Chemistry 20 (2012), 3596–3602.
- [133].Lavecchia A, Di Giovanni C, Virtual screening strategies in drug discovery: a critical review. Current Medicinal Chemistry 20 (2013), 2839–2860.
- [134].Tan L, Geppert H, Sisay M, Gütschow M, Bajorath J, Integrating structure and ligand-based virtual screening: Comparison of individual, parallel, and fused molecular docking and similarity search calculations on multiple targets. ChemMedChem 3 (2008), 1566–1571.
- [135].Bullock C, Cornia N, Jacob R, Remm A, Peavey T, Weekes K, Mallory C, Oxford JT, McDougal OM, Andersen TL, DockoMatic 2.0: High throughput inverse virtual screening and homology modeling. Journal of Chemical Information and Modeling 53 (2013), 2161–2170.
- [136].Keiser MJ, et al., Predicting new molecular targets for known drugs. Nature 462 (2009), 175–181.
- [137].Haupt VJ, Schroeder M, Old friends in new guise: repositioning of known drugs with structural bioinformatics. Briefings in Bioinformatics 12 (2011), 312–326.
- [138].Nagaraj AB, Wang QQ, Joseph P, Zheng C, Chen Y, Kovalenko O, Singh S, Armstrong A, Resnick K, Zanotti K, Waggoner S, Xu R, DiFeo A, Using a novel computational drug-repositioning approach (DrugPredict) to rapidly identify potent drug candidates for cancer treatment. Oncogene 37 (2017), 403–414.
- [139].Ma Z, Xu X, Zou X, MDockServer: An efficient docking platform for inverse virtual screening. Biophysical Journal 114 (2018), 56a.
- [140].Chen H, Engkvist O, Wang Y, Olivecrona M, Blaschke T, The rise of deep learning in drug discovery. Drug Discovery Today 23 (2018), 1241–1250.
- [141].Russo DP, Zorn KM, Clark AM, Zhu H, Ekins S, Comparing multiple machine learning algorithms and metrics for estrogen receptor binding prediction. Molecular Pharmaceutics 15 (2018), 4361–4370.
- [142].Samardzhieva I, Khan A, Necessity of bio-imaging hybrid approaches accelerating drug discovery process (mini-review). International Journal of Computer Applications 182 (2018), 1–10.
- [143].Chen Y-C, Beware of docking! Trends in Pharmacological Sciences 36 (2015), 78–95.
- [144].Carlson HA, Lessons Learned over Four Benchmark Exercises from the Community Structure Activity Resource. Journal of Chemical Information and Modeling 56 (2016), 951–954.
- [145].Gathiaka S, Liu S, Chiu M, Yang H, Stuckey JA, Kang YN, Delproposto J, Kubish G, Dunbar JB, Carlson HA, Burley SK, Walters WP, Amaro RE, Feher VA, Gilson MK, D3R Grand Challenge 2015: Evaluation of protein–ligand pose and affinity predictions. Journal of Computer-Aided Molecular Design 30 (2016), 651–668.
- [146].Gaieb Z, Liu S, Gathiaka S, Chiu M, Yang H, Shao C, Feher VA, Walters P, Kuhn B, Rudolph MG, Burley SK, Gilson MK, Amaro RE, D3R Grand Challenge 2: blind prediction of protein–ligand poses, affinity rankings, and relative binding free energies. Journal of Computer-Aided Molecular Design 32 (2017), 1–20.
- [147].Gaieb Z, Parks CD, Chiu M, Yang H, Shao C, Walters P, Lambert MH, Nevins N, Bembenek SD, Ameriks MK, Mirzadegan T, Burley SK, Amaro RE, Gilson MK, D3R Grand Challenge 3: blind prediction of protein–ligand poses and affinity rankings. Journal of Computer-Aided Molecular Design 33 (2019), 1–18.
- [148].Wagner JR, Churas CP, Liu S, Swift RV, Chiu M, Shao C, Feher VA, Burley SK, Gilson MK, Amaro RE, Continuous evaluation of ligand protein predictions: A weekly community challenge for drug docking. Structure 27 (2019), 1326–1335.e4.
- [149].Xu X, Ma Z, Duan R, Zou X, Predicting protein–ligand binding modes for CELPP and GC3: workflows and insight. Journal of Computer-Aided Molecular Design 33 (2019), 367–374.
- [150].Hawkins PC, Skillman AG, Warren GL, Ellingson BA, Stahl MT, Conformer generation with OMEGA: algorithm and validation using high quality structures from the Protein Databank and Cambridge Structural Database. Journal of Chemical Information and Modeling, 50 (2010), 572–584.
- [151].Hawkins PC, Nicholls A, Conformer generation with OMEGA: learning from the data set and the analysis of failures. Journal of Chemical Information and Modeling, 52 (2012), 2919–2936.
