QRB Discovery. 2022 Aug 19;3:e13. doi: 10.1017/qrd.2022.11

Challenges and frontiers of computational modelling of biomolecular recognition

Jinan Wang 1, Apurba Bhattarai 1, Hung N Do 1, Yinglong Miao 1,*
PMCID: PMC10299731  NIHMSID: NIHMS1870156  PMID: 37377636


Key words: biomolecular recognition, enhanced sampling, kinetics, machine learning, molecular dynamics, thermodynamics

Abstract

Biomolecular recognition including binding of small molecules, peptides and proteins to their target receptors plays a key role in cellular function and has been targeted for therapeutic drug design. However, the high flexibility of biomolecules and slow binding and dissociation processes have presented challenges for computational modelling. Here, we review the challenges and computational approaches developed to characterise biomolecular binding, including molecular docking, molecular dynamics simulations (especially enhanced sampling) and machine learning. Further improvements are still needed in order to accurately and efficiently characterise binding structures, mechanisms, thermodynamics and kinetics of biomolecules in the future.

Introduction

Biomolecular recognition plays key roles in many fundamental biological processes, including immune response and cellular signal transduction (Nooren and Thornton, 2003). Moreover, these processes are implicated in the development of numerous human diseases and serve as important drug targets (Ferreira et al., 2016; Scott et al., 2016). Experimental techniques (Miura, 2018) including X-ray crystallography, nuclear magnetic resonance (NMR) and cryo-electron microscopy (cryo-EM) have been applied to determine the bound structures of protein–small molecule, protein–peptide and protein–protein complexes. The number of experimental complex structures has increased significantly in recent years (Sussman et al., 1998). However, obtaining high-resolution experimental structures remains time consuming and resource demanding. Moreover, experimental structures often capture only static pictures of protein complexes. Intermediate conformational states that could be relevant for drug design are usually difficult to probe using current experimental techniques.

Computational methods have been developed to model biomolecular recognition and predict the binding free energies and/or kinetic rates, including the widely used molecular docking (Morris et al., 2009; Wang and Zhu, 2016; Porter et al., 2017; Ciemny et al., 2018; Vakser, 2020), Brownian dynamics (Ermak and McCammon, 1978; Gabdoulline and Wade, 2001; Spaar et al., 2006; Wieczorek and Zielenkiewicz, 2008; Votapka and Amaro, 2015) and molecular dynamics (MD) simulations (Karplus and McCammon, 2002; Basdevant et al., 2013; Pan et al., 2019; He et al., 2021; Lamprakis et al., 2021). Molecular docking has been widely used for predicting the holo structures of protein–ligand (Wang and Zhu, 2016), protein–peptide (Ciemny et al., 2018) and protein–protein complexes (Vakser, 2020). Although significant improvements have been achieved in developments of the molecular docking algorithms, the accuracy of docking could be still limited, due to high system flexibility especially in docking of the peptides and proteins. Recently, deep learning techniques have been introduced into molecular docking to increase accuracy. One successful example is the AlphaFold-multimer (Evans et al., 2022), which has significantly increased the accuracy of predicting protein–protein complex structures. However, one is still not able to predict biomolecular binding kinetics from molecular docking.

MD is a powerful technique for simulations of biomolecular structural dynamics (Karplus and McCammon, 2002). Remarkable advances in computing hardware (e.g., the Anton supercomputer and GPUs) and software developments have significantly increased the accessible time scale of conventional MD (cMD) from hundreds of nanoseconds to hundreds of microseconds (Harvey et al., 2009; Shaw et al., 2010; Johnston and Filizola, 2011; Lane et al., 2013; Hollingsworth and Dror, 2018; Shaw et al., 2021). Notably, the latest Anton3 (Shaw et al., 2021) has achieved the speed of hundreds of microseconds per day for ATPase and Satellite Tobacco Mosaic Virus (STMV) with total number of atoms ranging from 328 K to 1,067 K, which will significantly facilitate simulations of biomolecular recognition process. The cMD simulations have been widely applied to investigate biomolecular dynamics, including conformational change (Jensen et al., 2012), protein folding (Lindorff-Larsen et al., 2011) and substrate binding (Shan et al., 2011; Dror et al., 2013; Robustelli et al., 2020).

For small-molecule ligand binding, Shan et al. (2011) observed spontaneous binding of the Dasatinib drug to its target Src kinase during tens of microseconds cMD simulations. However, no dissociation event was observed in the cMD simulations. Pan et al. (2017) performed tens of microseconds cMD simulations to successfully characterise repetitive binding and dissociation of six small-molecule fragments to the protein FKBP. Based on the large number of binding and dissociation events in the simulations, they were able to accurately calculate the binding free energies and kinetic rates. Remarkably, the binding free energies calculated from the cMD simulations agreed very well with those predicted from free energy perturbations (FEP) calculations. It is worth noting that the tested fragments were weak binders with affinities ranging from 200 μM to 20 mM. It is still challenging to simulate both binding and dissociation of typical small-molecule ligands of proteins (usually with higher binding affinities and slower dissociation rates) using cMD, although the ligand residence time (or dissociation rate) has recently been recognised to correlate better with drug efficacy (Schuetz et al., 2017). For protein–protein interactions, tens of microseconds cMD simulations were able to capture barnase binding to barstar (Pan et al., 2019). Accurate barnase binding rate (kon) was predicted based on multiple binding events captured in a total of ~440 μs Anton cMD simulations (Pan et al., 2019). However, it remains challenging to simulate dissociation of the barnase–barstar model system using cMD (Pan et al., 2019).

Weighted ensemble (Saglam and Chong, 2019) and Markov state model (MSM) (Plattner et al., 2017) have been developed to improve the prediction of biomolecular binding thermodynamics and kinetics based on a large number of short cMD trajectories. The kinetic binding rate (kon) of the p53 peptide to the MDM2 protein was accurately predicted with weighted ensemble of a total amount of ~120 μs cMD simulations in implicit solvent (Zwier et al., 2016). Another weighted ensemble of a total of ~18 μs cMD simulations was able to accurately predict the barnase–barstar binding rate constant (kon) (Saglam and Chong, 2019). However, it is still challenging to model the slow protein/peptide dissociation processes with weighted ensemble simulations (Zwier et al., 2016; Saglam and Chong, 2019). MSM (Plattner and Noe, 2015; Paul et al., 2017; Plattner et al., 2017) was able to simultaneously predict the binding and dissociation kinetics through longer aggregated cMD simulations. MSM built with 150 μs MD simulation data was used to accurately predict benzamidine–trypsin binding kinetics (Plattner and Noe, 2015). Based on a total of two millisecond cMD simulations of barnase binding to barstar, MSM was generated to predict intermediate structures, binding energies and kinetic rates that were consistent with experimental data (Plattner et al., 2017). However, these calculations required very expensive computational resources.
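As a minimal sketch of the MSM idea described above, the following example estimates a transition matrix from a discretised trajectory and derives its stationary populations. The two-state labelling (0 = unbound, 1 = bound), the lag time of one frame and the function names are illustrative assumptions; practical MSM studies use many states, validated lag times and dedicated software.

```python
def estimate_msm(dtraj, n_states, lag=1):
    """Row-normalised transition matrix from a discrete state trajectory."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    # Count transitions i -> j separated by `lag` frames.
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i][j] += 1.0
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0.0:
            raise ValueError("a state was never left; cannot normalise its row")
        matrix.append([c / total for c in row])
    return matrix


def stationary_distribution(matrix, n_iter=10000, tol=1e-12):
    """Stationary populations by repeatedly applying pi <- pi T."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        new = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


# Hypothetical toy trajectory: 0 = unbound, 1 = bound.
dtraj = [0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1]
T = estimate_msm(dtraj, n_states=2)
pi = stationary_distribution(T)
print(T, pi)
```

In such a model, the off-diagonal transition probabilities relate to binding and dissociation rates, and the stationary populations relate to the binding free energy, which is why a single MSM can describe both kinetics and thermodynamics.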

Coarse-grained MD models have been developed to reduce the demand of computational resources and extend simulation time scales (Souza et al., 2020, 2021). Souza et al. (2020) performed millisecond cMD simulations to capture the binding of diverse protein–ligand systems. Accurate binding free energies were predicted through the cMD simulations without a priori information (Souza et al., 2020). Millisecond MD simulations with a useful coarse-grained model (PACE) were performed to characterise the binding mechanism of the intrinsically disordered Aβ peptides (Aβ17–42) to form Aβ fibril (Han and Schulten, 2014). In addition, coarse-grained models could be incorporated into multiscale computational approaches to improve the efficiency and accuracy of ligand binding thermodynamics and kinetics calculations (Elber, 2020; Jagger et al., 2020; Huang, 2021). For example, simulation enabled estimation of kinetic rates (SEEKR; Votapka and Amaro, 2015; Jagger et al., 2020) is a multiscale simulation approach combining MD, Brownian dynamics and milestoning for calculating receptor−ligand binding and dissociation rates. SEEKR has been shown to estimate binding kinetic rates with up to a factor of 10 less simulation time (Jagger et al., 2020).

Enhanced sampling methods have been developed to efficiently simulate biomolecular recognition. They could be generally divided into two categories depending on the usage of collective variables (CVs). The CV-based methods include the widely used steered MD (Kingsley et al., 2016), umbrella sampling (Gumbart et al., 2013; Kingsley et al., 2016; Joshi and Lin, 2019b), metadynamics (Antoszewski et al., 2020; Banerjee and Bagchi, 2020), adaptive biasing force (ABF; Darve and Pohorille, 2001; Darve et al., 2008) and so on. These methods often use predefined CVs to effectively guide simulations. Thus, a priori knowledge of the system is required in CV-based enhanced sampling. Alternatively, when it is difficult to predefine CVs, CV-free enhanced sampling methods could be useful (Kamenik et al., 2022). These methods include replica exchange MD (Sugita and Okamoto, 1999; Sugita et al., 2019; Siebenmorgen and Zacharias, 2020), random acceleration molecular dynamics (RAMD; Nunes-Alves et al., 2021), tempered binding (Pan et al., 2019), integrated tempering sampling (ITS; Yang et al., 2015; Shao and Zhu, 2019), scaled MD (Deb and Frank, 2019), accelerated MD (aMD; Hamelberg et al., 2004), Gaussian accelerated MD (GaMD; Miao et al., 2015b; Wang et al., 2021) and so on. The above-mentioned methodological advances have enabled simulations of millisecond or even longer time scale processes. Here, we will briefly review recent efforts in modelling biomolecular recognition, especially characterisation of binding thermodynamics and kinetics.

Collective variable-based enhanced sampling

During CV-based enhanced sampling simulations, a potential or force bias is applied along certain CVs to facilitate energy barrier crossing events among different conformational states. Typical CVs include distances, angles, dihedrals, paths, eigenvectors generated from principal component analysis, root-mean-square deviation (RMSD) relative to a reference conformation (Bouvier and Grubmuller, 2007) and so on. Because the bias potential applied to the system is usually only around several kcal/mol, one is able to accurately recover the original free energy profiles.

Umbrella sampling has been applied to predict the ligand/peptide/protein binding and/or dissociation pathways and map the associated free energy landscapes (Gumbart et al., 2013; Joshi and Lin, 2019a; Sieker et al., 2008; You et al., 2019). Metadynamics has been applied to investigate ligand/peptide/protein binding in terms of the binding kinetic rates (Casasnovas et al., 2017; Sun et al., 2017) and free energies (Saleh et al., 2017; Banerjee et al., 2018; Raniolo and Limongelli, 2020; Wang et al., 2022a). Metadynamics simulations (Limongelli et al., 2013; Tiwary and Parrinello, 2013) have also been applied to investigate the thermodynamics and kinetics of benzamidine inhibitor binding to trypsin. Multiple metadynamics trajectories with a total of 5 μs simulations were obtained to predict the ligand unbinding pathways and dissociation rate constant (koff). The predicted koff (9.1 ± 2.5 s⁻¹) was smaller than the experimental value (600 ± 300 s⁻¹). Separate funnel metadynamics simulations predicted an accurate ligand binding free energy (−8.5 ± 0.7 kcal/mol) for the same system (Limongelli et al., 2013). Infrequent metadynamics simulations with three carefully chosen CVs have successfully predicted the peptide binding and dissociation rates for the p53–MDM2 system (Zou et al., 2020). Although these methods have shown remarkable improvements in capturing rare events that happen over exceedingly long timescales, users often face the challenge of defining CVs, which requires expert knowledge of the studied systems (Abrams and Bussi, 2014; Zuckerman, 2011). Additionally, the predefined CVs could constrain the sampling space, leading to slow convergence of the simulations, and the simulations could suffer from “hidden energy barriers” if important CVs were missed during the simulation setup (Bešker and Gervasio, 2012). To accelerate the convergence of simulations, replica exchange or parallel tempering methods have been incorporated into metadynamics. For example, bias-exchange metadynamics simulations with eight CVs have been performed to predict an accurate binding free energy of the p53 peptide to the MDM2 protein. Parallel tempering metadynamics simulations with well-tempered ensemble (PTMetaD-WTE) successfully captured the binding and dissociation processes of the insulin dimer (Antoszewski et al., 2020). In summary, by carefully defining reaction coordinates, the CV-based enhanced sampling methods could efficiently and accurately predict binding free energies and kinetic rates.
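To illustrate how infrequent metadynamics relates biased simulation time to physical time, the sketch below computes the acceleration factor as the time average of exp(βV_bias) over the frames preceding a transition, following the relation introduced by Tiwary and Parrinello (2013). The function names, the temperature default and the sample bias values are assumptions for illustration; the approach also presumes that no bias is deposited in the transition-state region.

```python
import math

KB = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def acceleration_factor(bias_values, temperature=300.0):
    """Time average of exp(beta * V_bias) over equally spaced frames."""
    beta = 1.0 / (KB * temperature)
    return sum(math.exp(beta * v) for v in bias_values) / len(bias_values)


def rescaled_time(simulation_time, bias_values, temperature=300.0):
    """Estimated unbiased time corresponding to a biased first-passage time."""
    return simulation_time * acceleration_factor(bias_values, temperature)


# Hypothetical bias energies (kcal/mol) felt by the system before it escapes.
bias = [0.0, 1.5, 3.0, 4.5]
print(rescaled_time(10e-9, bias))  # biased escape time of 10 ns, rescaled
```

A dissociation rate is then typically obtained from the distribution of rescaled escape times collected over many independent runs.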

Enhanced sampling without predefined collective variables

In CV-free enhanced sampling methods, bias is often applied on generalised properties of the system (such as the potential energy and atomic forces) in the simulations. Repetitive benzamidine binding and unbinding in trypsin were captured using the selective ITS method (Yang and Qin Gao, 2009; Yang et al., 2015; Shao and Zhu, 2019). Pan et al. (2019) developed the tempered binding method, which significantly accelerates the slow protein dissociation process by dynamically adjusting electrostatic and van der Waals interactions between different groups of protein atoms by a factor λ. The tempered binding simulations have successfully captured repetitive binding and dissociation events for five diverse protein–protein systems (Pan et al., 2019). In the scaled MD simulations (Sinko et al., 2013), a scale factor ranging from 0 to 1 is introduced to smooth the potential energy surface. Schuetz et al. (2018) performed scaled MD simulations to accurately predict the residence time and drug dissociation pathways of different inhibitors of heat shock protein 90 (Hsp90). In a recent study, Bianciotto et al. (2021) used scaled MD simulations to predict the residence time and ligand unbinding pathways for a set of 27 ligands of the Hsp90 protein, with results highly consistent with experimental data. Deb and Frank (2019) developed a selective scaled MD simulation method, where specific energy terms are scaled to promote dissociation of bound ligands from the protein. In particular, ligand–water interactions are scaled to help the ligands dissociate from their bound state. Selective scaled MD predicted accurate residence times and associated free energy changes of three inhibitor drugs bound to cyclin-dependent kinase protein complexes. Hence, selective scaled MD proves to be an important enhanced sampling method for modelling biomolecular dissociation processes.
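The effect of the scale factor in scaled MD can be sketched with a simple population-based reweighting. If the potential is multiplied by λ, configurations are sampled with probability proportional to exp(−βλV), so unscaled populations are proportional to the scaled populations raised to the power 1/λ. The function name and the histogram values below are hypothetical, and applying this relation to a histogram along a projected coordinate is an approximation.

```python
def reweight_scaled_md(scaled_populations, scale_factor):
    """Recover unscaled populations via p ~ (p*)**(1/lambda), then renormalise."""
    if not 0.0 < scale_factor <= 1.0:
        raise ValueError("scale factor must lie in (0, 1]")
    raw = [p ** (1.0 / scale_factor) for p in scaled_populations]
    norm = sum(raw)
    return [r / norm for r in raw]


# Hypothetical two-state populations (bound, unbound) sampled at lambda = 0.5.
print(reweight_scaled_md([0.8, 0.2], 0.5))
```

The example shows why smaller scale factors flatten the landscape: a state that holds 80% of the scaled population corresponds to a considerably more dominant state on the original energy surface.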

In RAMD, an additional random force is applied on the ligand to promote especially the dissociation. In one recent study, Nunes-Alves et al. (2021) performed RAMD simulations to predict ligand dissociation rates of T4 lysozyme. The predicted kinetic rates agreed well with experimental values for various systems with different ligands, temperatures and protein mutations. Moreover, a ligand with complex dissociation pathways was often associated with longer residence time. In another study, the same group (Kokh and Wade, 2021) performed RAMD simulations to explore ligand dissociation pathways and kinetics of two GPCRs, i.e., the β2 adrenergic receptor (β2AR) and M2 muscarinic acetylcholine receptor (M2R). The ligand dissociation pathways observed in the RAMD simulations were similar to those in long cMD and metadynamics simulations. Additionally, RAMD revealed an allosteric modulation mechanism of the LY2119620 PAM in the M2R. Dissociation of the iperoxo agonist was blocked from one of the possible pathways and hence had increased residence time, being consistent with the experimental data.

The aMD enhanced sampling technique works by adding a non-negative boost potential to smooth the system potential energy surface (Voter, 1997; Hamelberg et al., 2004). The boost potential (ΔV) decreases the energy barriers and thus helps the system transition between different conformational states (Hamelberg et al., 2004, 2007). In one study, Kappel et al. (2015) performed aMD simulations to study ligand binding to the M3 muscarinic receptor (M3R). Three ligands of the receptor were used in the aMD simulations: the full agonist Ach, the partial agonist arecoline (Arc) and the antagonist tiotropium (TTP). Starting from the bulk solvent, aMD captured the binding of Ach to the M3R orthosteric site in significantly less time than the cMD simulations. Arc was also observed binding to the orthosteric site, whereas the TTP molecule bound to the extracellular vestibule of the receptor. Moreover, all ligands could bind to the extracellular vestibule of the receptor, suggesting the vestibule as a metastable binding site for orthosteric ligands. However, aMD suffers from large energetic noise during reweighting, as the boost potential is typically on the order of tens to hundreds of kcal/mol (Shen and Hamelberg, 2008).
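The aMD boost potential can be written compactly. The sketch below uses the functional form of Hamelberg et al. (2004), ΔV = (E − V)² / (α + E − V) when the potential V lies below the threshold energy E, and zero otherwise. The parameter names and the numerical values are illustrative assumptions.

```python
def amd_boost(potential, threshold, alpha):
    """aMD boost: (E - V)^2 / (alpha + E - V) for V < E, else 0."""
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    if potential >= threshold:
        return 0.0  # no boost above the threshold energy
    gap = threshold - potential
    return gap * gap / (alpha + gap)


# Hypothetical energies in kcal/mol.
print(amd_boost(-100.0, -90.0, 10.0))  # 5.0
print(amd_boost(-80.0, -90.0, 10.0))   # 0.0
```

Because the boost grows with the gap E − V, deep minima are raised most, which flattens the landscape but also produces the wide boost distributions that make exponential reweighting noisy.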

GaMD was developed to apply a harmonic boost potential that enhances sampling with significantly reduced energetic noise. The boost potential normally exhibits a near-Gaussian distribution, which enables proper reweighting of the free energy profiles through cumulant expansion to the second order (Miao et al., 2015b; Wang et al., 2021). GaMD has been successfully applied to simulate important biomolecular processes, including ligand/protein/RNA binding (Miao et al., 2015a, 2018b; Miao and McCammon, 2016; Pang et al., 2017; Wang and Chan, 2017; Chuang et al., 2018; Liao and Wang, 2019; Wang et al., 2022b), protein folding (Miao et al., 2015a; Pang et al., 2017) and protein conformational changes (Miao and McCammon, 2016; Salawu, 2018; Zhang et al., 2018). However, it has remained challenging to simulate repetitive substrate binding and dissociation through normal GaMD (Miao and McCammon, 2018; Wang et al., 2021).
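A minimal sketch of the two ingredients named above follows: the harmonic boost, ΔV = ½k(E − V)² for V < E, and reweighting by cumulant expansion to the second order, in which ln⟨exp(βΔV)⟩ is approximated by β⟨ΔV⟩ + β²σ²/2 within each bin of a reaction coordinate. The function names, the dictionary-based binning and the toy boost values are assumptions made for illustration, not the interface of any GaMD implementation.

```python
import math
from statistics import fmean, pvariance

KB = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def gamd_boost(potential, threshold, k):
    """Harmonic boost: 0.5 * k * (E - V)^2 when V < E, otherwise zero."""
    if potential >= threshold:
        return 0.0
    return 0.5 * k * (threshold - potential) ** 2


def log_weight_cumulant2(boosts, temperature=300.0):
    """Second-order cumulant estimate of ln<exp(beta * dV)> for one bin."""
    beta = 1.0 / (KB * temperature)
    c1 = fmean(boosts)                # first cumulant: mean boost
    c2 = pvariance(boosts, mu=c1)     # second cumulant: variance of the boost
    return beta * c1 + 0.5 * beta ** 2 * c2


def reweighted_pmf(boosts_by_bin, temperature=300.0):
    """PMF (kcal/mol) per bin from boosted sampling, shifted so the minimum is 0."""
    kt = KB * temperature
    n_total = sum(len(b) for b in boosts_by_bin.values())
    pmf = {}
    for bin_id, boosts in boosts_by_bin.items():
        if not boosts:
            continue  # unsampled bins carry no information
        p_star = len(boosts) / n_total  # biased population of this bin
        pmf[bin_id] = -kt * (math.log(p_star) + log_weight_cumulant2(boosts, temperature))
    f_min = min(pmf.values())
    return {b: f - f_min for b, f in pmf.items()}


# Hypothetical boosts (kcal/mol) of frames assigned to two bins.
print(reweighted_pmf({"bound": [1.0, 1.0], "unbound": [0.0, 0.0]}))
```

In this toy case both bins are equally populated in the boosted ensemble, but the bound bin received 1 kcal/mol of boost on average, so after reweighting it lies 1 kcal/mol below the unbound bin.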

Recently, “selective GaMD” algorithms have been developed to allow for more efficient enhanced sampling of biomolecular binding and dissociation processes, including Ligand GaMD (LiGaMD; Miao et al., 2020), Peptide GaMD (Pep-GaMD; Wang and Miao, 2020) and protein–protein interaction GaMD (PPI-GaMD; Wang and Miao, 2022). For simulations of biomolecular binding, the system contains substrate L (e.g. small-molecule ligand, peptide or ligand protein), protein P and the biological environment E. Therefore, the potential energy of the system could be decomposed into the following terms: $V = V_{P,b} + V_{L,b} + V_{E,b} + V_{PP,nb} + V_{LL,nb} + V_{EE,nb} + V_{PL,nb} + V_{PE,nb} + V_{LE,nb}$, where $V_{P,b}$, $V_{L,b}$ and $V_{E,b}$ are the bonded potential energies in protein P, substrate L and environment E, respectively. $V_{PP,nb}$, $V_{LL,nb}$ and $V_{EE,nb}$ are the self non-bonded potential energies in protein P, substrate L and environment E, respectively. $V_{PL,nb}$, $V_{PE,nb}$ and $V_{LE,nb}$ are the non-bonded interaction energies between P-L, P-E and L-E, respectively. In order to facilitate ligand/peptide/protein binding (Fig. 1), a boost potential is selectively added on the essential energy terms in LiGaMD, Pep-GaMD and PPI-GaMD, respectively. Presumably, ligand binding mainly involves the non-bonded interaction energies of the ligand. LiGaMD thus selectively boosts the energy terms $V_{LL,nb} + V_{PL,nb} + V_{LE,nb}$. In comparison, peptide binding involves both the bonded and non-bonded interaction energies of the peptide, since peptides often undergo large conformational changes during binding to the target proteins. Thus, the essential energy term in Pep-GaMD is $V_{L,b} + V_{LL,nb} + V_{PL,nb} + V_{LE,nb}$. Because protein–protein binding and unbinding processes mainly involve the non-bonded interaction energies between protein partners, one can apply a selective boost to the essential energy term $V_{PL,nb}$ in PPI-GaMD. In addition to selectively boosting the essential energy term, another boost potential could be applied on the remaining energy of the system to facilitate substrate rebinding in a dual-boost scheme. These new algorithms have been implemented in the GPU version of AMBER22 (Case et al. 2022).

Figure 1.


Schematic illustration of biomolecular recognition: (a) Small-molecule ligand binding, (b) peptide binding and (c) protein–protein interactions (PPIs).

Repetitive binding and dissociation of small-molecule ligands were captured in the LiGaMD simulations of host–guest and protein–ligand binding model systems (Miao et al., 2020), which enabled us to calculate ligand binding thermodynamics and kinetics. Repetitive guest binding and dissociation in the β-cyclodextrin host were observed in hundreds-of-nanoseconds LiGaMD simulations. The binding free energies of guest molecules predicted from LiGaMD simulations agreed excellently with experimental data (errors < 1.0 kcal/mol). In comparison with previous microsecond-timescale cMD simulations, accelerations of ligand kinetic rate constants in LiGaMD simulations were properly estimated using Kramers’ rate theory. Furthermore, microsecond LiGaMD simulations captured repetitive benzamidine binding and dissociation in trypsin. The trypsin–benzamidine binding free energy was calculated from the 3D PMF profile to be −6.13 ± 0.35 kcal/mol, highly consistent with the experimental value of −6.2 kcal/mol (Guillain and Thusius, 1970). Similarly, the ligand binding and dissociation time periods were recorded to calculate the reweighted kon and koff values as (1.15 ± 0.79) × 10⁷ M⁻¹·s⁻¹ and 3.53 ± 1.41 s⁻¹, respectively. These data were comparable to the values determined from experiments (Guillain and Thusius, 1970).
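As a rough illustration of how recorded binding and dissociation time periods translate into rate constants, the sketch below uses the commonly applied definitions kon = 1/(⟨τ_unbound⟩·[L]) and koff = 1/⟨τ_bound⟩. It is an assumption that these simple definitions are the relevant ones here, and the example deliberately omits the Kramers-based reweighting step needed to convert accelerated simulation times into real times. All names and numbers are hypothetical.

```python
from statistics import fmean


def rates_from_dwell_times(unbound_times_s, bound_times_s, ligand_conc_molar):
    """Return (kon in 1/(M*s), koff in 1/s) from mean dwell times in seconds."""
    if ligand_conc_molar <= 0.0:
        raise ValueError("ligand concentration must be positive")
    tau_unbound = fmean(unbound_times_s)  # mean waiting time before binding
    tau_bound = fmean(bound_times_s)      # mean residence time in the bound state
    kon = 1.0 / (tau_unbound * ligand_conc_molar)
    koff = 1.0 / tau_bound
    return kon, koff


# Hypothetical (already reweighted) dwell times and a 5 mM ligand concentration.
kon, koff = rates_from_dwell_times([1e-7, 3e-7], [2e-6, 2e-6], 0.005)
print(kon, koff)
```

The quality of such estimates depends directly on the number of observed events, since each mean dwell time is computed from only as many samples as there are binding or dissociation events.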

Pep-GaMD (Wang and Miao, 2020) has been demonstrated on binding of three model peptides to the SH3 domains (Ball et al., 2005; Ahmad and Helms, 2009), including “PAMPAR” (PDB: 1SSH), “PPPALPPKK” (PDB: 1CKA) and “PPPVPPRR” (PDB: 1CKB). Repetitive dissociation and binding of the three peptides were successfully captured in each of the 1 μs Pep-GaMD simulations. The peptide binding free energies calculated from Pep-GaMD simulations were in excellent agreement with those from experiments. For the 1CKA system, the calculated peptide binding free energy was −7.72 ± 0.54 kcal/mol, highly consistent with the experimental value of −7.84 kcal/mol (Wu et al., 1995). For the 1CKB system, the predicted binding free energy was −6.84 ± 0.14 kcal/mol, close to the experimental value of −7.24 kcal/mol (Wu et al., 1995). In addition, Pep-GaMD predicted the kon and koff of 1CKA as (4.06 ± 2.26) × 10¹⁰ M⁻¹·s⁻¹ and (1.45 ± 1.07) × 10³ s⁻¹, respectively. They were comparable to the experimental data (Xue et al., 2014) of kon = 1.5 × 10⁹ M⁻¹·s⁻¹ and koff = 8.9 × 10³ s⁻¹.

More recently, Pep-GaMD simulations were combined with complementary biochemical experiments to elucidate the mechanism of tripeptide trimming of amyloid β-peptide (Aβ peptide) by γ-secretase (Bhattarai et al., 2022). The active model of γ-secretase for ε cleavage was extracted from a previous study (Bhattarai et al., 2020) and used as the starting structure for Pep-GaMD simulations. Pep-GaMD simulations of 600 ns were able to capture the ζ cleavage activation starting from the ε cleavage activated model, a process suggested to occur on a timescale of minutes (Kamp et al., 2015). During activation, coordinated hydrogen bonds were formed between the carbonyl oxygen of Aβ49 Val46 and the enzyme catalytic Asp257. The two catalytic aspartates in the active site of the enzyme, Asp257 and Asp385, both formed hydrogen bonds with the water molecule aligned between them. This activated enzyme conformation was well oriented for the ζ cleavage of the amide bond between Val46 and Ile47 of Aβ49. Three low-energy states, “Final”, “Intermediate” and “Initial”, were identified from the Pep-GaMD simulations (Fig. 2a). The Final state denoted the activated enzyme conformation for ζ cleavage, in which the Asp257–Asp385 distance was ~7–8 Å and the Asp257–Aβ49 Val46 distance was ~3 Å (hydrogen bond). The Initial and Intermediate low-energy states denoted the starting and transitional conformations during the activation process. Furthermore, Pep-GaMD simulations were performed for three additional FAD mutant Aβ49-bound enzyme systems. Similar to the wildtype system, Pep-GaMD simulations of the I45F, A42T and V46F mutant Aβ49-bound enzyme systems were able to capture the ζ cleavage activation starting from the ε cleavage activated model. Free energy profiles of the FAD mutant systems were similar to that of the wildtype system (Fig. 2b–d). In the I45F mutant system, two low-energy states were identified, “Initial” and “Final” (Fig. 2b). The A42T mutant was the most dynamic enzyme system, with four distinct low-energy states identified in a free energy profile covering a larger area: “Initial”, “Final”, “Inhibited-1” and “Inactive” (Fig. 2c). The catalytic aspartates in the “Inhibited-1” conformational state were too close for activation, and hence the enzyme was inhibited. In contrast, the aspartates were too far apart for catalytic activity in the “Inactive” low-energy state of the enzyme. In the V46F mutant γ-secretase system, two low-energy states were identified in the free energy profile, “Final” and “Inhibited-2” (Fig. 2d). The structures were compared between the “Initial” and “Final” low-energy conformational states of the enzyme as identified from the free energy profiles (Fig. 2e–g). As the enzyme moved from the Initial to the Final conformational state, the Aβ49 substrate tilted by ~50° (Fig. 2f). Helix unwinding was observed in the C-terminus of Aβ49, where residues Val44 and Ile45 changed their conformation from helix to loop (Fig. 2f). Similarly, in the active site of the enzyme, the protonated Asp257 in the Final state moved towards the substrate scissile amide bond by 3 Å in comparison to the Initial state (Fig. 2g). In contrast, the deprotonated Asp385 adopted a similar conformation in the Final and Initial states (Fig. 2g). The simulation findings were highly consistent with biochemical experimental data. Taken together, complementary biochemical experiments and Pep-GaMD simulations have enabled elucidation of the mechanism of tripeptide trimming of Aβ49 by γ-secretase.

Figure 2.


Mechanism of tripeptide trimming of amyloid β-peptide 49 by γ-secretase. 2D free energy profiles of the Asp257–Asp385 distance and the Asp257–Aβ49 Val46 distance calculated from Pep-GaMD simulations of (a) wildtype Aβ49 bound γ-secretase, (b) I45F mutant Aβ49 bound γ-secretase, (c) A42T mutant Aβ49 bound γ-secretase and (d) V46F mutant Aβ49 bound γ-secretase systems. (e) Structures of catalytic subunit PS1 bound to APP and Aβ49 substrates representing the “Initial” and “Final” conformational states, respectively. (f, g) Conformational changes in (f) Aβ49 and (g) the active site of the enzyme during the transition from the Initial to the Final activated state for ζ cleavage. Adapted with permission from Bhattarai A, Devkota S, Do HN, Wang J, Bhattarai S, Wolfe MS and Miao Y. Journal of the American Chemical Society. 10.1021/jacs.1c10533. Copyright 2022 American Chemical Society.

PPI-GaMD (Wang and Miao, 2022) has been demonstrated on a model system of the ribonuclease barnase binding to barstar. Six independent 2 μs PPI-GaMD simulations have successfully captured repetitive barstar dissociation and rebinding events (Fig. 3a). Five binding and six dissociation events were observed in both Sim1 and Sim3. In Sim2, three binding and four dissociation events were captured. For the remaining simulations (Sim4–Sim6), three binding and three dissociation events were observed (Fig. 3a). The barstar binding free energy predicted from PPI-GaMD was −17.79 kcal/mol with a standard deviation of 1.11 kcal/mol, highly consistent with the experimental value of −18.90 kcal/mol (Schreiber and Fersht, 1993). In addition, the PPI-GaMD simulations allowed us to calculate the protein binding kinetics. The average reweighted kon and koff were predicted as (21.7 ± 13.8) × 10⁸ M⁻¹·s⁻¹ and (7.32 ± 4.95) × 10⁻⁶ s⁻¹, highly consistent with the corresponding experimental values of 6.0 × 10⁸ M⁻¹·s⁻¹ and 8.0 × 10⁻⁶ s⁻¹, respectively. Furthermore, PPI-GaMD simulations have provided mechanistic insights into barstar binding to barnase, which involves long-range electrostatic interactions and multiple binding pathways (Fig. 3c–f), consistent with previous experimental and computational findings for this model system. It is worth noting that at least three independent replicas of selective GaMD simulations with longer simulation lengths (e.g., microseconds) are required to obtain sufficient statistics for ligand binding, peptide binding and protein–protein interactions. In order to calculate accurate binding free energies and kinetic rates, each simulation should be long enough to capture ≥3 binding and dissociation events, as suggested by the LiGaMD (Miao et al., 2020), Pep-GaMD (Wang and Miao, 2020) and PPI-GaMD (Wang and Miao, 2022) studies.
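The link between kinetic rates and binding free energy can be checked with a short calculation, using the standard relations Kd = koff/kon and ΔG = RT ln(Kd) with a 1 M standard state. The temperature of 298.15 K and the function name are assumptions for illustration; the source does not state the temperature used.

```python
import math

R = 0.0019872041  # gas constant in kcal/(mol*K)


def binding_free_energy(kon, koff, temperature=298.15):
    """Binding free energy (kcal/mol) from kon (1/(M*s)) and koff (1/s)."""
    kd = koff / kon  # dissociation constant in M, relative to a 1 M standard state
    return R * temperature * math.log(kd)


# Experimental barnase-barstar rates quoted above.
print(round(binding_free_energy(6.0e8, 8.0e-6), 1))  # -18.9
```

With the experimental rates quoted above, the result of roughly −18.9 kcal/mol is in line with the quoted experimental binding free energy, which illustrates that the kinetic and thermodynamic measurements describe the same equilibrium.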

Figure 3.


PPI-GaMD simulations of barnase binding to and dissociation from barstar. (a) Time courses of the protein–protein interface distance calculated from six independent 2 μs PPI-GaMD simulations. (b) Original (reweighted) and modified (no reweighting) PMF profiles of the protein interface distance averaged over six PPI-GaMD simulations. Error bars are standard deviations of the free energy values calculated from six PPI-GaMD simulations. (c) 2D PMF profiles regarding the interface RMSD and the distance between the CZ atom of barnase Arg59 and the CG atom of barstar Asp39. (d) 2D PMF profiles regarding the interface RMSD and the distance between the centers of mass (COMs) of barnase residues Ala37-Ser38 and barstar residues Gly43-Trp44. (e, f) Low-energy conformations of the intermediates (e) “I1” and (f) “I2” as identified from the 2D PMF profiles. Strong electrostatic interactions are shown as red dashed lines with their corresponding distance values labelled. Adapted with permission from Wang J, Miao Y. Journal of Chemical Theory and Computation. 10.1021/acs.jctc.1c00974. Copyright 2022 American Chemical Society.

Machine learning

Machine learning (ML) has been applied to improve computational docking, especially in the scoring functions (Khamis et al., 2015). A scoring function in molecular docking refers to a mathematical predictive model that outputs a representative score of the binding free energy of a bound conformation. Scoring of a docked complex is the final step of the three essential components in molecular docking, with the first two being chemical molecule representation and pose generation (Khamis et al., 2015). A reliable scoring function should have a good scoring power (the ability to produce scores for different binding poses), ranking power (the ability to correctly rank a given set of ligands with known binding poses when bound to a common protein) and docking power (the ability to identify the best binding pose of a given ligand from a set of computationally generated poses when bound to a specific protein; Ashtaway and Mahapatra, 2012). Kinnings et al. (2011) used a support vector machine (SVM) to derive a unique set of weights for each individual protein family – the wi’s in the following equation:

graphic file with name S2633289222000114_eqn1.jpg (1)
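
The equation image is not reproduced in this text. As a reading aid, empirical scoring functions of this kind are commonly written as a weighted sum of individual energy contributions. The form below is that generic expression, assumed here rather than copied from the original figure: $\Delta G_i$ (a symbol introduced for illustration) denotes the i-th contribution to the score, and $w_i$ are the family-specific weights fitted by the SVM.

```latex
\Delta G_{\mathrm{bind}} = \sum_{i} w_i \, \Delta G_i
```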

This was shown to improve the binding affinity prediction of the electronic high throughput screening (eHiTS) molecular docking software (Zsoldos et al., 2007) compared with empirical knowledge-based scoring functions (Khamis et al., 2015). Similarly, a force field scoring function can be trained to derive a unique set of parameters for each individual protein family – the Aij’s and Bij’s in the following equation:

graphic file with name S2633289222000114_eqn2.jpg (2)
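
The equation image is not reproduced in this text. A force field scoring function with pairwise parameters $A_{ij}$ and $B_{ij}$ is conventionally a 12-6 Lennard-Jones term plus an electrostatic term. The expression below is that conventional form, given as an assumption rather than as the original figure: $r_{ij}$ is the distance between protein atom $i$ and ligand atom $j$, and the charges $q_i$, $q_j$ and distance-dependent dielectric $\varepsilon(r_{ij})$ are symbols introduced for illustration.

```latex
E = \sum_{i} \sum_{j} \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} + \frac{q_i q_j}{\varepsilon(r_{ij})\, r_{ij}} \right)
```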

ML can also be used to predict the binding affinity from a number of features of the protein–ligand complex, including geometric features, physical force field energy terms, pharmacophore features, etc. Specifically, ML learns the relationship between these features and the corresponding known binding affinities in order to predict the binding affinities of new complexes (Khamis et al., 2015). Ballester and Mitchell (2010) applied non-parametric ML techniques to generate the functional form of scoring functions from molecular databases. The authors used random forest (RF; Breiman, 2001) to learn the relationship between the atomic-level description of the complex and the experimental binding affinity. Here, the Kd and Ki measurements were merged into a single binding constant K to represent the experimental binding affinity. The atomic-level description was purely geometric: the occurrence counts of protein–ligand atom-type pairs formed from nine common elements (C, N, O, F, P, S, Cl, Br, I). Even though they completely neglected the energy terms induced by protein–ligand interactions, Ballester and Mitchell (2010) achieved a Pearson correlation coefficient of 0.774 on the PDBbind v2007 core set (195 complexes).
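
To make the geometric description concrete, the following is a minimal sketch of element-pair occurrence counting for a protein–ligand complex. It is an illustration only, not the authors' implementation: the function name `pair_count_features`, the atom representation as `(element, (x, y, z))` tuples and the 12 Å default cutoff are assumptions made for this example. The resulting counts are the kind of feature vector that could be passed to a random forest regressor.

```python
from itertools import product
from math import dist

# The nine elements listed in the text.
ELEMENTS = ("C", "N", "O", "F", "P", "S", "Cl", "Br", "I")


def pair_count_features(protein_atoms, ligand_atoms, cutoff=12.0):
    """Count protein-ligand element pairs closer than `cutoff` (angstrom).

    Each atom is a tuple (element, (x, y, z)). The cutoff value is an
    assumed default for this sketch. Returns a dict keyed by
    (protein_element, ligand_element).
    """
    counts = {pair: 0 for pair in product(ELEMENTS, ELEMENTS)}
    for p_elem, p_xyz in protein_atoms:
        for l_elem, l_xyz in ligand_atoms:
            # Atoms of elements outside ELEMENTS are skipped.
            if (p_elem, l_elem) in counts and dist(p_xyz, l_xyz) < cutoff:
                counts[(p_elem, l_elem)] += 1
    return counts


# Toy coordinates (hypothetical, not taken from any real structure).
protein = [("C", (0.0, 0.0, 0.0)), ("N", (1.5, 0.0, 0.0)), ("O", (30.0, 0.0, 0.0))]
ligand = [("C", (3.0, 0.0, 0.0)), ("Cl", (4.0, 1.0, 0.0))]
features = pair_count_features(protein, ligand)
```

Because the features are plain counts, no energy terms or functional form are imposed; the learning algorithm is left to infer how each pair type contributes to the affinity.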

Very recently, deep learning (DL) methods, including RoseTTAFold (Baek et al., 2021) and AlphaFold (Jumper et al., 2021), were developed to achieve structure prediction accuracies far beyond those of classical force-field-based methods (Baek and Baker, 2022). These methods have millions of parameters, many more than the hundreds of parameters in classical approaches, and can thus better sample the large conformational space of proteins. Furthermore, they make no assumptions about the functional form of the interactions between atoms. In fact, the two DL-based methods learn millions of parameters directly to generate correct 3D structures from input amino acid sequences (Baek and Baker, 2022; Baek et al., 2021; Jumper et al., 2021). AlphaFold and RoseTTAFold are trained to predict structures from alignments of homologous amino acid sequences. In particular, the two DL-based approaches learn to extract rich structural information through a three-track network in which information at the 1D sequence level, 2D distance map level and 3D coordinate level is successively transformed and integrated (Baek et al., 2021; Jumper et al., 2021). They were also shown to predict protein structures very accurately from single amino acid sequences (Baek and Baker, 2022; Baek et al., 2021; Jumper et al., 2021).

MD simulations can generate very large data sets in terms of the number of conformational frames and simulated atoms. For example, weighted ensemble simulations of the closed-to-open transition of the SARS-CoV-2 spike protein generated over 100 terabytes of data (Casalino et al., 2021). This makes it challenging to identify proper CVs that differentiate conformational states in the raw simulation data and to identify the biologically relevant transitions between such states (e.g., open/closed states of the spike). In this regard, ML and deep learning have been applied to identify appropriate CVs for analysing MD simulation trajectories (Noé, 2020; Wang et al., 2020; Glielmo et al., 2021; Sun et al., 2022). These linear, non-linear and hybrid ML approaches cluster the simulation data along a small number of latent dimensions to identify conformational transitions between states (Bernetti et al., 2020; Ramanathan et al., 2012). Another benefit of MD-coupled ML approaches is that the information learned from ML can be used to iteratively guide the MD sampling (Wang et al., 2019). Based on the predictive information bottleneck, Wang et al. (2019) developed an approach to identify system reaction coordinates and calculate the free energy and kinetic rates in biomolecules. The algorithm was demonstrated on conformational transitions in the alanine dipeptide model system and ligand dissociation from the L99A T4 lysozyme. Thermodynamic and kinetic quantities calculated from short enhanced MD simulations for slow biomolecular processes were in good agreement with experiments and long unbiased MD simulations.
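
As a simple illustration of the linear case, the sketch below projects a synthetic "trajectory" onto its leading principal components and separates two states along the first latent dimension. Principal component analysis is used here only as a stand-in for the broader family of methods cited above; the function name `pca_project` and the toy data are assumptions made for this example.

```python
import numpy as np


def pca_project(frames, n_components=2):
    """Project frames (n_frames x n_features) onto leading principal components.

    Returns the projected coordinates and the fraction of variance
    explained by each retained component.
    """
    centered = frames - frames.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal axes.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return centered @ vt[:n_components].T, explained[:n_components]


rng = np.random.default_rng(0)
# Toy data: 200 frames, 10 features, two states that differ along feature 0.
state_a = rng.normal(0.0, 0.1, size=(100, 10))
state_b = rng.normal(0.0, 0.1, size=(100, 10))
state_b[:, 0] += 5.0
frames = np.vstack([state_a, state_b])

projection, explained = pca_project(frames)
# The sign of the first component separates the two states
# (which state is positive is arbitrary).
labels = projection[:, 0] > 0
```

In a real analysis the features would be quantities such as interatomic distances or dihedral angles, and non-linear methods may be needed when the states are not separable along a linear combination of features.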

Recently, we have integrated GaMD and deep learning into the GaMD, Deep Learning and free energy prOfiling Workflow (GLOW) to predict important reaction coordinates and map free energy profiles of biomolecules (Do et al., 2022). First, GaMD simulations are performed on the target biomolecules (Fig. 4a). The residue contact map is then calculated for each GaMD simulation frame and transformed into an image (Fig. 4b). A two-dimensional (2D) convolutional neural network (CNN), a type of neural network specialised for image classification, is employed to classify the residue contact maps of target biomolecules, from which important residue contacts are identified by classic gradient-based pixel attribution (Fig. 4c). Finally, the free energy profiles of these reaction coordinates are calculated through reweighting of GaMD simulations to characterise the biomolecular systems of interest (Fig. 4d; Do et al., 2022). GLOW was successfully demonstrated on characterisation of activation and allosteric modulation of a GPCR, using the adenosine A1 receptor (A1AR) as a model system. Characterisation of the A1AR activation was achieved by classification of the A1AR bound by “Antagonist”, “Agonist” and “Agonist-Gi”. GLOW achieved an overall accuracy of 99.34% and a loss of 1.85% on the validation data set after 15 epochs. Meanwhile, characterisation of A1AR allosteric modulation was achieved by classification of the A1AR bound by “Agonist-Gi” and “Agonist-Gi-PAM”. GLOW achieved an overall accuracy of 99.27% and a loss of 1.78% on the validation data set after 15 epochs. GLOW identified characteristic residue contacts that were highly consistent with previous studies at the residue level for both A1AR activation and allosteric modulation. In particular, the ligand-binding extracellular domains (ECL1–ECL3) and intracellular G-protein binding domains (TM3, TM5, TM6 and TM7) were found to be loosely coupled in the GPCR activation.
Furthermore, GLOW showed that ECL2 played a critical role in the allosteric modulation of A1AR, consistent with previous mutagenesis, structural and molecular modelling studies (Avlani et al., 2007; Peeters et al., 2012; Nguyen et al., 2016; Miao et al., 2018a; Draper-Joyce et al., 2021). GLOW revealed that binding of a PAM (MIPS521) to the agonist-Gi-A1AR complex biased the receptor conformational ensemble, especially in the ECL1 and ECL2 regions. PAM binding stabilised agonist binding within the orthosteric pocket of A1AR, which confined the extracellular mouth of the receptor. Furthermore, PAM binding disrupted the N148ECL2-V152ECL2 α-helical hydrogen bond and distorted this portion of the ECL2 helix (Do et al., 2022).
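
The contact-map-to-image step of such a workflow can be sketched as follows. This is a simplified illustration under stated assumptions, not the GLOW code: each residue is represented by a single coordinate, the 4.5 Å cutoff is an assumed value, and the function names `contact_map` and `to_image` are hypothetical.

```python
import numpy as np


def contact_map(coords, cutoff=4.5):
    """Binary residue contact map from per-residue coordinates (n_res x 3).

    Two residues are in contact when their representative points lie
    within `cutoff` angstrom (an assumed value for this sketch).
    """
    diff = coords[:, None, :] - coords[None, :, :]
    distances = np.sqrt((diff**2).sum(axis=-1))
    contacts = (distances <= cutoff).astype(np.uint8)
    np.fill_diagonal(contacts, 0)  # ignore self-contacts
    return contacts


def to_image(contacts):
    """Scale a binary contact map to an 8-bit greyscale array (0 or 255)."""
    return contacts * np.uint8(255)


# Toy coordinates for four residues (hypothetical values).
coords = np.array(
    [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0], [3.8, 3.0, 0.0]]
)
contacts = contact_map(coords)
image = to_image(contacts)
```

One such image per simulation frame, labelled by the system it came from, would form the training set for an image classifier; pixels with large attribution then point back to specific residue pairs.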

Figure 4.

Overview of the Gaussian accelerated molecular dynamics (GaMD), deep learning (DL) and Free Energy PrOfiling Workflow (GLOW). (a) Starting from structures of interest, GaMD simulations are applied for enhanced sampling of the system dynamics. (b) DL models are then built with residue contact maps from the GaMD trajectories transformed into image representations. (c) The DL analysis allows us to identify important residue contacts and system reaction coordinates (RCs). (d) Free energy profiles of the RCs are finally calculated through reweighting of GaMD simulations to characterise the system dynamics. Adapted with permission from Do HN, Wang J, Bhattarai A and Miao Y. Journal of Chemical Theory and Computation. 10.1021/acs.jctc.1c01055. Copyright 2022 American Chemical Society.
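
The reweighting in step (d) can be illustrated with a short sketch. It assumes that each frame carries a reaction coordinate value and the boost potential applied at that frame, and it approximates the ensemble-averaged Boltzmann factor of the boost in each bin by a cumulant expansion to second order. The function name `reweighted_pmf`, the binning scheme and the toy data are assumptions made for this example, not the authors' code.

```python
import numpy as np

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def reweighted_pmf(cv, boost, bins=20, temperature=300.0):
    """1D PMF along `cv` from biased frames.

    `boost` holds the boost potential (kcal/mol) of each frame. The
    reweighting factor in each bin is approximated by a second-order
    cumulant expansion: ln<exp(beta*dV)> ~ beta*C1 + beta^2*C2/2.
    """
    kt = KB * temperature
    beta = 1.0 / kt
    edges = np.linspace(cv.min(), cv.max(), bins + 1)
    # Bin index per frame; clip so the maximum value falls in the last bin.
    idx = np.clip(np.digitize(cv, edges) - 1, 0, bins - 1)
    pmf = np.full(bins, np.nan)
    for j in range(bins):
        dv = boost[idx == j]
        if dv.size < 2:
            continue  # too few frames for a variance estimate
        c1 = dv.mean()  # first cumulant
        c2 = dv.var()   # second cumulant
        log_weight = beta * c1 + 0.5 * beta**2 * c2
        # Biased population (frame count) corrected by the reweighting factor.
        pmf[j] = -kt * (np.log(dv.size) + log_weight)
    pmf -= np.nanmin(pmf)  # set the global minimum to zero
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, pmf


# Toy data: two bins populated 100:50, with no boost applied.
cv = np.array([0.0] * 100 + [1.0] * 50)
boost = np.zeros_like(cv)
centers, pmf = reweighted_pmf(cv, boost, bins=2)
```

With zero boost the result reduces to the usual Boltzmann inversion of the histogram; with a non-zero boost, bins that were sampled mainly because of a large boost are raised in free energy accordingly.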

In addition, DL has been widely applied to force field optimisation (Poltavsky and Tkatchenko, 2021; Unke et al., 2021; Chatterjee et al., 2022), binding free energy calculations (Jiang et al., 2021; Jones et al., 2021; Chen et al., 2022) and binding pathway identification (Motta et al., 2022).

Conclusions and outlook

With remarkable advances in both computer hardware and software, computational approaches, including molecular docking, MD simulations and ML, have improved significantly in characterising biomolecular recognition. ML has been incorporated into both molecular docking and MD simulations to improve docking accuracy, simulation efficiency and trajectory analysis, e.g., AlphaFold-Multimer and GLOW. MD simulations have enabled characterisation of biomolecular binding thermodynamics and kinetics, attracting increasing attention in recent years. Long time scale cMD simulations have successfully captured biomolecular binding processes, although slow dissociation of biomolecules is still often difficult to simulate using cMD.

Enhanced sampling methods have greatly reduced the computational cost of calculating biomolecular binding thermodynamics and kinetics. Higher sampling efficiency can generally be obtained using the CV-based methods than using the CV-free methods. However, CV-based enhanced sampling methods require predefined CVs, which are often challenging to choose for simulations of complex biological systems. Nevertheless, ML techniques have proven useful for identifying proper CVs or reaction coordinates. Alternatively, CV-free methods are usually easy to use and do not require a priori knowledge of the studied systems. Additionally, the CV-based and CV-free methods can be combined to be more powerful. The CV-free methods can enhance the sampling to potentially overcome hidden energy barriers in degrees of freedom orthogonal to the CVs predefined in the CV-based methods, which could enable faster convergence of the MD simulations. Newly developed algorithms in this direction include integration of replica exchange umbrella sampling with GaMD (GaREUS; Oshima et al., 2019), replica exchange with solute tempering combined with umbrella sampling (gREST/REUS; Kamiya and Sugita, 2018; Re et al., 2019), replica exchange with solute tempering combined with well-tempered Metadynamics (ST-MetaD; Mlýnský et al., 2022) and temperature accelerated molecular dynamics (TAMD) with integrated tempering sampling (ITS/TAMD; Xie et al., 2017).

Recent years have seen an increasing number of techniques that introduce a “selective” boost in CV-free enhanced sampling methods, including selective ITS, selective scaled MD, selective aMD and the selective GaMD methods LiGaMD, Pep-GaMD and PPI-GaMD. In these methods, only essential energy terms are selectively boosted to further increase the sampling efficiency. As noted above, compatible enhanced sampling methods can also be combined; for example, GaMD has been combined with umbrella sampling to achieve significantly improved efficiency (Oshima et al., 2019; Wang et al., 2021). Besides enhanced sampling, the accuracy of force fields and water models plays a critical role in predicting biomolecular binding affinities and kinetics. For example, the TIP4P2015 water model was shown to be more accurate than the TIP3P water model in calculating the kinetics of barnase–barstar binding in cMD simulations (Pan et al., 2019). Nevertheless, biomolecular recognition in systems of increasing size (such as viruses and cells) and accurate calculations of binding thermodynamics and kinetics of large biomolecular complexes present grand challenges for computational modelling and enhanced sampling simulations. Further innovations in both computing hardware and method development may help us to address these challenges in the future.

Acknowledgements

This work used supercomputing resources with allocation awards TG-MCB180049 and BIO210039 through the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation Grant no. ACI-1548562 and project M2874 through the National Energy Research Scientific Computing Center (NERSC), which is a U.S. Department of Energy Office of Science User Facility operated under Contract No. DE-AC02-05CH11231. It also used computational resources provided by the Research Computing Cluster at the University of Kansas. This work was supported in part by the National Institutes of Health (R01GM132572), National Science Foundation (2121063) and the startup funding in the College of Liberal Arts and Sciences at the University of Kansas.

Open Peer Review

To view the open peer review materials for this article, please visit http://doi.org/10.1017/qrd.2022.11.

References

  1. Abrams C and Bussi G (2014) Enhanced sampling in molecular dynamics using metadynamics, replica-exchange, and temperature-acceleration. Entropy 16(1), 163–199. [Google Scholar]
  2. Ahmad M and Helms V (2009) How do proteins associate? A lesson from SH3 domain. Chemistry Central Journal 3(S1), O22. [Google Scholar]
  3. Antoszewski A, Feng C-J, Vani BP, Thiede EH, Hong L, Weare J, Tokmakoff A and Dinner AR (2020) Insulin dissociates by diverse mechanisms of coupled unfolding and unbinding. The Journal of Physical Chemistry B 124(27), 5571–5587. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Ashtaway HM and Mahapatra NR (2012) A comparative assessment of ranking accuracies of conventional and machine-learning-based scoring functions for protein–ligand binding affinity prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 9(5), 1301–1313. [DOI] [PubMed] [Google Scholar]
  5. Avlani VA, Gregory KJ, Morton CJ, Parker MW, Sexton PM and Christopoulos A (2007) Critical role for the second extracellular loop in the binding of both orthosteric and allosteric G protein-coupled receptor ligands. Journal of Biological Chemistry 282(35), 25677–25686. [DOI] [PubMed] [Google Scholar]
  6. Baek M and Baker D (2022) Deep learning and protein structure modeling. Nature Methods 19(1), 13–14. [DOI] [PubMed] [Google Scholar]
  7. Baek M, Dimaio F, Anishchenko I, Dauparas J, Ovchinnikov S, Lee GR, Wang J, Cong Q, Kinch LN and Schaeffer RD (2021) Accurate prediction of protein structures and interactions using a three-track neural network. Science 373(6557), 871–876. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Ball LJ, Kuhne R, Schneider-Mergener J and Oschkinat H (2005) Recognition of proline-rich motifs by protein-protein-interaction domains. Angewandte Chemie (International Ed. in English) 44(19), 2852–2869. [DOI] [PubMed] [Google Scholar]
  9. Ballester PJ and Mitchell JB (2010) A machine learning approach to predicting protein–ligand binding affinity with applications to molecular docking. Bioinformatics 26(9), 1169–1175. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Banerjee P and Bagchi B (2020) Dynamical control by water at a molecular level in protein dimer association and dissociation. Proceedings of the National Academy of Sciences of the United States of America 117(5), 2302–2308. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Banerjee P, Mondal S and Bagchi B (2018) Insulin dimer dissociation in aqueous solution: A computational study of free energy landscape and evolving microscopic structure along the reaction pathway. The Journal of Chemical Physics 149(11), 114902. [DOI] [PubMed] [Google Scholar]
  12. Basdevant N, Borgis D and Ha-Duong T (2013) Modeling protein–protein recognition in solution using the coarse-grained force field SCORPION. Journal of Chemical Theory and Computation 9(1), 803–813. [DOI] [PubMed] [Google Scholar]
  13. Bernetti M, Bertazzo M and Masetti M (2020) Data-driven molecular dynamics: A multifaceted challenge. Pharmaceuticals 13(9), 253. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Bešker N and Gervasio FL (2012) Using metadynamics and path collective variables to study ligand binding and induced conformational transitions. In Baron R (ed.), Computational Drug Discovery and Design. Springer/Humana, Totowa, NJ, pp. 501–513. [DOI] [PubMed] [Google Scholar]
  15. Bhattarai A, Devkota S, Bhattarai S, Wolfe MS and Miao Y (2020) Mechanisms of γ-secretase activation and substrate processing. ACS Central Science 6(6), 969–983. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Bhattarai A, Devkota S, Do HN, Wang J, Bhattarai S, Wolfe MS and Miao Y (2022) Mechanism of tripeptide trimming of amyloid β-peptide 49 by γ-secretase. Journal of the American Chemical Society 144, 6215–6226. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Bianciotto M, Gkeka P, Kokh DB, Wade RC and Minoux H (2021) Contact map fingerprints of protein–ligand unbinding trajectories reveal mechanisms determining residence times computed from scaled molecular dynamics. Journal of Chemical Theory and Computation 17(10), 6522–6535. [DOI] [PubMed] [Google Scholar]
  18. Bouvier B and Grubmuller H (2007) Molecular dynamics study of slow base flipping in DNA using conformational flooding. Biophysical Journal 93(3), 770–786. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Breiman L (2001) Random forests. Machine Learning 45(1), 5–32. [Google Scholar]
  20. Casalino L, Dommer AC, Gaieb Z, Barros EP, Sztain T, Ahn S-H, Trifan A, Brace A, Bogetti AT, Clyde A, Ma H, Lee H, Turilli M, Khalid S, Chong LT, Simmerling C, Hardy DJ, Maia JD, Phillips JC, Kurth T, Stern AC, Huang L, McCalpin JD, Tatineni M, Gibbs T, Stone JE, Jha S, Ramanathan A and Amaro RE (2021) AI-driven multiscale simulations illuminate mechanisms of SARS-CoV-2 spike dynamics. The International Journal of High Performance Computing Applications 35(5), 432–451. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Casasnovas R, Limongelli V, Tiwary P, Carloni P and Parrinello M (2017) Unbinding kinetics of a p38 MAP kinase type II inhibitor from Metadynamics simulations. Journal of the American Chemical Society 139(13), 4780–4788. [DOI] [PubMed] [Google Scholar]
  22. Chatterjee P, Sengul MY, Kumar A and Mackerell AD (2022) Harnessing deep learning for optimization of Lennard–Jones parameters for the polarizable classical Drude oscillator force field. Journal of Chemical Theory and Computation 18(4), 2388–2407. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Chen H, Liu H, Feng H, Fu H, Cai W, Shao X and Chipot C (2022) MLCV: Bridging machine-learning-based dimensionality reduction and free-energy calculation. Journal of Chemical Information and Modeling 62, 1–8. [DOI] [PubMed] [Google Scholar]
  24. Chuang CH, Chiou SJ, Cheng TL and Wang YT (2018) A molecular dynamics simulation study decodes the Zika virus NS5 methyltransferase bound to SAH and RNA analogue. Scientific Reports 8(1), 6336. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Ciemny M, Kurcinski M, Kamel K, Kolinski A, Alam N, Schueler-Furman O and Kmiecik S (2018) Protein-peptide docking: Opportunities and challenges. Drug Discovery Today 23(8), 1530–1537. [DOI] [PubMed] [Google Scholar]
  26. Case DA, Cheatham TE III, Darden TA, Duke RE, Giese TJ, Gohlke H, Goetz AW, Greene D, Homeyer N, Izadi S, Kovalenko A, Lee TS, Legrand S, Li P, Lin C, Liu J, Luchko T, Luo R, Mermelstein D, Merz KM, Monard G, Nguyen H, Omelyan I, Onufriev A, Pan F, Qi R, Roe DR, Roitberg A, Sagui C, Simmerling CL, Botello-Smith WM, Swails J, Walker RC, Wang J, Wang J, Wolf RM, Wu X, Xiao L, York DM and Kollman PA (2022) Amber 2022, University of California, San Francisco.
  27. Darve E and Pohorille A (2001) Calculating free energies using average force. The Journal of Chemical Physics 115(20), 9169–9183. [Google Scholar]
  28. Darve E, Rodríguez-Gómez D and Pohorille A (2008) Adaptive biasing force method for scalar and vector free energy calculations. The Journal of Chemical Physics 128(14), 144120. [DOI] [PubMed] [Google Scholar]
  29. Deb I and Frank AT (2019) Accelerating rare dissociative processes in biomolecules using selectively scaled MD simulations. Journal of Chemical Theory and Computation 15(11), 5817–5828. [DOI] [PubMed] [Google Scholar]
  30. Do HN, Wang J, Bhattarai A and Miao Y (2022) GLOW: A workflow integrating Gaussian-accelerated molecular dynamics and deep learning for free energy profiling. Journal of Chemical Theory and Computation 18(3), 1423–1436. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Draper-Joyce CJ, Bhola R, Wang J, Bhattarai A, Nguyen AT, O’Sullivan K, Chia LY, Venugopal H, Valant C and Thal DM (2021) Positive allosteric mechanisms of adenosine A1 receptor-mediated analgesia. Nature 597(7877), 571–576. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Dror RO, Green HF, Valant C, Borhani DW, Valcourt JR, Pan AC, Arlow DH, Canals M, Lane JR, Rahmani R, Baell JB, Sexton PM, Christopoulos A and Shaw DE (2013) Structural basis for modulation of a G-protein-coupled receptor by allosteric drugs. Nature 503, 295. [DOI] [PubMed] [Google Scholar]
  33. Elber R (2020) Milestoning: An efficient approach for atomically detailed simulations of kinetics in biophysics. Annual Review of Biophysics 49(1), 69–85. [DOI] [PubMed] [Google Scholar]
  34. Ermak DL and McCammon JA (1978) Brownian dynamics with hydrodynamic interactions. The Journal of Chemical Physics 69(4), 1352–1360. [Google Scholar]
  35. Evans R, O’Neill M, Pritzel A, Antropova N, Senior A, Green T, Žídek A, Bates R, Blackwell S, Yim J, Ronneberger O, Bodenstein S, Zielinski M, Bridgland A, Potapenko A, Cowie A, Tunyasuvunakool K, Jain R, Clancy E, Kohli P, Jumper J and Hassabis D (2022) Protein complex prediction with AlphaFold-Multimer. bioRxiv, 2021.10.04.463034.
  36. Ferreira LG, Oliva G and Andricopulo AD (2016) Protein-protein interaction inhibitors: Advances in anticancer drug design. Expert Opinion on Drug Discovery 11(10), 957–968. [DOI] [PubMed] [Google Scholar]
  37. Gabdoulline RR and Wade RC (2001) Protein-protein association: Investigation of factors influencing association rates by Brownian dynamics simulations. Journal of Molecular Biology 306(5), 1139–1155. [DOI] [PubMed] [Google Scholar]
  38. Glielmo A, Husic BE, Rodriguez A, Clementi C, Noé F and Laio A (2021) Unsupervised learning methods for molecular simulation data. Chemical Reviews 121, 9722–9758. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Guillain F and Thusius D (1970) Use of proflavine as an indicator in temperature-jump studies of the binding of a competitive inhibitor to trypsin. Journal of the American Chemical Society 92(18), 5534–5536. [DOI] [PubMed] [Google Scholar]
  40. Gumbart JC, Roux B and Chipot C (2013) Efficient determination of protein-protein standard binding free energies from first principles. Journal of Chemical Theory and Computation 9(8), 3789–3798. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Hamelberg D, De Oliveira CAF and McCammon JA (2007) Sampling of slow diffusive conformational transitions with accelerated molecular dynamics. The Journal of Chemical Physics 127(15), 10B614. [DOI] [PubMed] [Google Scholar]
  42. Hamelberg D, Mongan J and McCammon JA (2004) Accelerated molecular dynamics: A promising and efficient simulation method for biomolecules. The Journal of Chemical Physics 120(24), 11919–11929. [DOI] [PubMed] [Google Scholar]
  43. Han W and Schulten K (2014) Fibril elongation by Aβ17–42: Kinetic network analysis of hybrid-resolution molecular dynamics simulations. Journal of the American Chemical Society 136(35), 12450–12460. [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Harvey MJ, Giupponi G and Fabritiis GD (2009) ACEMD: Accelerating biomolecular dynamics in the microsecond time scale. Journal of Chemical Theory and Computation 5(6), 1632–1639. [DOI] [PubMed] [Google Scholar]
  45. He Z, Paul F and Roux B (2021) A critical perspective on Markov state model treatments of protein–protein association using coarse-grained simulations. The Journal of Chemical Physics 154(8), 084101. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Hollingsworth SA and Dror RO (2018) Molecular dynamics simulation for all. Neuron 99(6), 1129–1143. [DOI] [PMC free article] [PubMed] [Google Scholar]
  47. Huang Y-MM (2021) Multiscale computational study of ligand binding pathways: Case of p38 MAP kinase and its inhibitors. Biophysical Journal 120(18), 3881–3892. [DOI] [PMC free article] [PubMed] [Google Scholar]
  48. Jagger BR, Ojha AA and Amaro RE (2020) Predicting ligand binding kinetics using a Markovian milestoning with Voronoi tessellations multiscale approach. Journal of Chemical Theory and Computation 16(8), 5348–5357. [DOI] [PubMed] [Google Scholar]
  49. Jensen MØ, Jogini V, Borhani DW, Leffler AE, Dror RO and Shaw DE (2012) Mechanism of voltage gating in potassium channels. Science 336(6078), 229–233. [DOI] [PubMed] [Google Scholar]
  50. Jiang D, Hsieh C-Y, Wu Z, Kang Y, Wang J, Wang E, Liao B, Shen C, Xu L, Wu J, Cao D and Hou T (2021) InteractionGraphNet: A novel and efficient deep graph representation learning framework for accurate protein–ligand interaction predictions. Journal of Medicinal Chemistry 64, 18209–18232. [DOI] [PubMed] [Google Scholar]
  51. Johnston JM and Filizola M (2011) Showcasing modern molecular dynamics simulations of membrane proteins through G protein-coupled receptors. Current Opinion in Structural Biology 21(4), 552–558. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Jones D, Kim H, Zhang X, Zemla A, Stevenson G, Bennett WFD, Kirshner D, Wong SE, Lightstone FC and Allen JE (2021) Improved protein–ligand binding affinity prediction with structure-based deep fusion inference. Journal of Chemical Information and Modeling 61(4), 1583–1592. [DOI] [PubMed] [Google Scholar]
  53. Joshi DC and Lin JH (2019a) Delineating protein-protein curvilinear dissociation pathways and energetics with naive multiple-Walker umbrella sampling simulations. Journal of Computational Chemistry 40(17), 1652–1663. [DOI] [PubMed] [Google Scholar]
  54. Joshi DC and Lin JH (2019b) Delineating protein–protein curvilinear dissociation pathways and energetics with Naïve multiple-Walker umbrella sampling simulations. Journal of Computational Chemistry 40(17), 1652–1663. [DOI] [PubMed] [Google Scholar]
  55. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A and Potapenko A (2021) Highly accurate protein structure prediction with AlphaFold. Nature 596(7873), 583–589. [DOI] [PMC free article] [PubMed] [Google Scholar]
  56. Kamenik AS, Linker SM and Riniker S (2022) Enhanced sampling without borders: On global biasing functions and how to reweight them. Physical Chemistry Chemical Physics 24, 1225–1236. [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Kamiya M and Sugita Y (2018) Flexible selection of the solute region in replica exchange with solute tempering: Application to protein-folding simulations. The Journal of Chemical Physics 149(7), 072304. [DOI] [PubMed] [Google Scholar]
  58. Kamp F, Winkler E, Trambauer J, Ebke A, Fluhrer R and Steiner H (2015) Intramembrane proteolysis of β-amyloid precursor protein by γ-secretase is an unusually slow process. Biophysical Journal 108(5), 1229–1237. [DOI] [PMC free article] [PubMed] [Google Scholar]
  59. Kappel K, Miao YL and McCammon JA (2015) Accelerated molecular dynamics simulations of ligand binding to a muscarinic G-protein-coupled receptor. Quarterly Reviews of Biophysics 48(4), 479–487. [DOI] [PMC free article] [PubMed] [Google Scholar]
  60. Karplus M and McCammon JA (2002) Molecular dynamics simulations of biomolecules. Nature Structural Biology 9(9), 646–652. [DOI] [PubMed] [Google Scholar]
  61. Khamis MA, Gomaa W and Ahmed WF (2015) Machine learning in computational docking. Artificial Intelligence in Medicine 63(3), 135–152. [DOI] [PubMed] [Google Scholar]
  62. Kingsley LJ, Esquivel-Rodríguez J, Yang Y, Kihara D and Lill MA (2016) Ranking protein–protein docking results using steered molecular dynamics and potential of mean force calculations. Journal of Computational Chemistry 37(20), 1861–1865. [DOI] [PMC free article] [PubMed] [Google Scholar]
  63. Kinnings SL, Liu N, Tonge PJ, Jackson RM, Xie L and Bourne PE (2011) A machine learning-based method to improve docking scoring functions and its application to drug repurposing. Journal of Chemical Information and Modeling 51(2), 408–419. [DOI] [PMC free article] [PubMed] [Google Scholar]
  64. Kokh DB and Wade RC (2021) G protein-coupled receptor–ligand dissociation rates and mechanisms from τRAMD simulations. Journal of Chemical Theory and Computation 17(10), 6610–6623. [DOI] [PubMed] [Google Scholar]
  65. Lamprakis C, Andreadelis I, Manchester J, Velez-Vega C, Duca JS and Cournia Z (2021) Evaluating the efficiency of the Martini force field to study protein dimerization in aqueous and membrane environments. Journal of Chemical Theory and Computation 17(5), 3088–3102. [DOI] [PubMed] [Google Scholar]
  66. Lane TJ, Shukla D, Beauchamp KA and Pande VS (2013) To milliseconds and beyond: Challenges in the simulation of protein folding. Current Opinion in Structural Biology 23(1), 58–65. [DOI] [PMC free article] [PubMed] [Google Scholar]
  67. Liao JM and Wang YT (2019) In silico studies of conformational dynamics of mu opioid receptor performed using Gaussian accelerated molecular dynamics. Journal of Biomolecular Structure & Dynamics 37(1), 166–177. [DOI] [PubMed] [Google Scholar]
  68. Limongelli V, Bonomi M and Parrinello M (2013) Funnel metadynamics as accurate binding free-energy method. Proceedings of the National Academy of Sciences of the United States of America 110(16), 6358–6363. [DOI] [PMC free article] [PubMed] [Google Scholar]
  69. Lindorff-Larsen K, Piana S, Dror RO and Shaw DE (2011) How fast-folding proteins fold. Science 334(6055), 517–520. [DOI] [PubMed] [Google Scholar]
  70. Miao Y, Bhattarai A, Nguyen ATN, Christopoulos A and May LT (2018a) Structural basis for binding of allosteric drug leads in the adenosine A1 receptor. Scientific Reports 8(1), 16836. [DOI] [PMC free article] [PubMed] [Google Scholar]
  71. Miao Y, Bhattarai A and Wang J (2020) Ligand Gaussian accelerated molecular dynamics (LiGaMD): Characterization of ligand binding thermodynamics and kinetics. Journal of Chemical Theory and Computation 16(9), 5526–5547. [DOI] [PMC free article] [PubMed] [Google Scholar]
  72. Miao Y, Caliman AD and McCammon JA (2015a) Allosteric effects of sodium ion binding on activation of the M3 muscarinic G-protein-coupled receptor. Biophysical Journal 108(7), 1796–1806. [DOI] [PMC free article] [PubMed] [Google Scholar]
  73. Miao Y, Feher VA and McCammon JA (2015b) Gaussian accelerated molecular dynamics: Unconstrained enhanced sampling and free energy calculation. Journal of Chemical Theory and Computation 11(8), 3584–3595. [DOI] [PMC free article] [PubMed] [Google Scholar]
  74. Miao Y, Huang Y-MM, Walker RC, McCammon JA and Chang C-EA (2018b) Ligand binding pathways and conformational transitions of the HIV protease. Biochemistry 57(9), 1533–1541. [DOI] [PMC free article] [PubMed] [Google Scholar]
  75. Miao Y and McCammon JA (2016) Graded activation and free energy landscapes of a muscarinic G-protein-coupled receptor. Proceedings of the National Academy of Sciences of the United States of America 113(43), 12162–12167. [DOI] [PMC free article] [PubMed] [Google Scholar]
  76. Miao Y and McCammon JA (2018) Mechanism of the G-protein mimetic nanobody binding to a muscarinic G-protein-coupled receptor. Proceedings of the National Academy of Sciences of the United States of America 115(12), 3036–3041. [DOI] [PMC free article] [PubMed] [Google Scholar]
  77. Miura K (2018) An overview of current methods to confirm protein-protein interactions. Protein and Peptide Letters 25(8), 728–733. [DOI] [PMC free article] [PubMed] [Google Scholar]
  78. Mlýnský V, Janeček M, Kührová P, Fröhlking T, Otyepka M, Bussi G, Banáš P and Šponer J (2022) Toward convergence in folding simulations of RNA tetraloops: Comparison of enhanced sampling techniques and effects of force field modifications. Journal of Chemical Theory and Computation 18(4), 2642–2656. [DOI] [PubMed] [Google Scholar]
  79. Morris GM, Huey R, Lindstrom W, Sanner MF, Belew RK, Goodsell DS and Olson AJ (2009) AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility. Journal of Computational Chemistry 30(16), 2785–2791. [DOI] [PMC free article] [PubMed] [Google Scholar]
  80. Motta S, Callea L, Bonati L and Pandini A (2022) PathDetect-SOM: A neural network approach for the identification of pathways in ligand binding simulations. Journal of Chemical Theory and Computation 18, 1957–1968. [DOI] [PMC free article] [PubMed] [Google Scholar]
  81. Nguyen AT, Vecchio EA, Thomas T, Nguyen TD, Aurelio L, Scammells PJ, White PJ, Sexton PM, Gregory KJ, May LT and Christopoulos A (2016) Role of the second extracellular loop of the adenosine A1 receptor on allosteric modulator binding, signaling, and cooperativity. Molecular Pharmacology 90(6), 715–725. [DOI] [PubMed] [Google Scholar]
  82. Noé F (2020) Machine learning for molecular dynamics on long timescales. In Machine Learning Meets Quantum Physics. Springer, pp. 331–372. [Google Scholar]
  83. Nooren IM and Thornton JM (2003) Diversity of protein-protein interactions. The EMBO Journal 22(14), 3486–3492. [DOI] [PMC free article] [PubMed] [Google Scholar]
  84. Nunes-Alves A, Kokh DB and Wade RC (2021) Ligand unbinding mechanisms and kinetics for T4 lysozyme mutants from τRAMD simulations. Current Research in Structural Biology 3, 106–111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  85. Oshima H, Re S and Sugita Y (2019) Replica-exchange umbrella sampling combined with Gaussian accelerated molecular dynamics for free-energy calculation of biomolecules. Journal of Chemical Theory and Computation 15(10), 5199–5208. [DOI] [PubMed] [Google Scholar]
  86. Pan AC, Jacobson D, Yatsenko K, Sritharan D, Weinreich TM and Shaw DE (2019) Atomic-level characterization of protein-protein association. Proceedings of the National Academy of Sciences of the United States of America 116(10), 4244–4249. [DOI] [PMC free article] [PubMed] [Google Scholar]
  87. Pan AC, Xu H, Palpant T and Shaw DE (2017) Quantitative characterization of the binding and unbinding of millimolar drug fragments with molecular dynamics simulations. Journal of Chemical Theory and Computation 13(7), 3372–3377. [DOI] [PubMed] [Google Scholar]
  88. Pang YT, Miao Y, Wang Y and McCammon JA (2017) Gaussian accelerated molecular dynamics in NAMD. Journal of Chemical Theory and Computation 13(1), 9–19. [DOI] [PMC free article] [PubMed] [Google Scholar]
  89. Paul F, Wehmeyer C, Abualrous ET, Wu H, Crabtree MD, Schöneberg J, Clarke J, Freund C, Weikl TR and Noé F (2017) Protein-peptide association kinetics beyond the seconds timescale from atomistic simulations. Nature Communications 8(1), 1095. [DOI] [PMC free article] [PubMed] [Google Scholar]
  90. Peeters MC, Wisse LE, Dinaj A, Vroling B, Vriend G and IJzerman AP (2012) The role of the second and third extracellular loops of the adenosine A1 receptor in activation and allosteric modulation. Biochemical Pharmacology 84(1), 76–87. [DOI] [PubMed] [Google Scholar]
  91. Plattner N, Doerr S, De Fabritiis G and Noé F (2017) Complete protein–protein association kinetics in atomic detail revealed by molecular dynamics simulations and Markov modelling. Nature Chemistry 9(10), 1005–1011. [DOI] [PubMed] [Google Scholar]
  92. Plattner N and Noé F (2015) Protein conformational plasticity and complex ligand-binding kinetics explored by atomistic simulations and Markov models. Nature Communications 6, 7653. [DOI] [PMC free article] [PubMed] [Google Scholar]
  93. Poltavsky I and Tkatchenko A (2021) Machine learning force fields: Recent advances and remaining challenges. The Journal of Physical Chemistry Letters 12, 6551–6564. [DOI] [PubMed] [Google Scholar]
  94. Porter KA, Xia B, Beglov D, Bohnuud T, Alam N, Schueler-Furman O and Kozakov D (2017) ClusPro PeptiDock: Efficient global docking of peptide recognition motifs using FFT. Bioinformatics 33(20), 3299–3301. [DOI] [PMC free article] [PubMed] [Google Scholar]
  95. Ramanathan A, Savol A, Burger V, Quinn S, Agarwal PK and Chennubhotla C (2012) Statistical inference for big data problems in molecular biophysics. In Neural Information Processing Systems: Workshop on Big Learning. Citeseer. [Google Scholar]
  96. Raniolo S and Limongelli V (2020) Ligand binding free-energy calculations with funnel metadynamics. Nature Protocols 15, 2837–2866. [DOI] [PubMed] [Google Scholar]
  97. Re S, Oshima H, Kasahara K, Kamiya M and Sugita Y (2019) Encounter complexes and hidden poses of kinase-inhibitor binding on the free-energy landscape. Proceedings of the National Academy of Sciences of the United States of America 116(37), 18404–18409. [DOI] [PMC free article] [PubMed] [Google Scholar]
  98. Robustelli P, Piana S and Shaw DE (2020) Mechanism of coupled folding-upon-binding of an intrinsically disordered protein. Journal of the American Chemical Society 142(25), 11092–11101. [DOI] [PubMed] [Google Scholar]
  99. Saglam AS and Chong LT (2019) Protein–protein binding pathways and calculations of rate constants using fully-continuous, explicit-solvent simulations. Chemical Science 10(8), 2360–2372. [DOI] [PMC free article] [PubMed] [Google Scholar]
  100. Salawu EO (2018) The impairment of TorsinA’s binding to and interactions with its activator: An atomistic molecular dynamics study of primary dystonia. Frontiers in Molecular Biosciences 5, 64. [DOI] [PMC free article] [PubMed] [Google Scholar]
  101. Saleh N, Ibrahim P, Saladino G, Gervasio FL and Clark T (2017) An efficient metadynamics-based protocol to model the binding affinity and the transition state ensemble of G-protein-coupled receptor ligands. Journal of Chemical Information and Modeling 57(5), 1210–1217. [DOI] [PubMed] [Google Scholar]
  102. Schreiber G and Fersht AR (1993) Interaction of barnase with its polypeptide inhibitor barstar studied by protein engineering. Biochemistry 32(19), 5145–5150. [DOI] [PubMed] [Google Scholar]
  103. Schuetz DA, Bernetti M, Bertazzo M, Musil D, Eggenweiler HM, Recanatini M, Masetti M, Ecker GF and Cavalli A (2018) Predicting residence time and drug unbinding pathway through scaled molecular dynamics. Journal of Chemical Information and Modeling 59, 535–549. [DOI] [PubMed] [Google Scholar]
  104. Schuetz DA, De Witte WEA, Wong YC, Knasmueller B, Richter L, Kokh DB, Sadiq SK, Bosma R, Nederpelt I, Heitman LH, Segala E, Amaral M, Guo D, Andres D, Georgi V, Stoddart LA, Hill S, Cooke RM, De Graaf C, Leurs R, Frech M, Wade RC, De Lange ECM, IJzerman AP, Müller-Fahrnow A and Ecker GF (2017) Kinetics for drug discovery: An industry-driven effort to target drug residence time. Drug Discovery Today 22(6), 896–911. [DOI] [PubMed] [Google Scholar]
  105. Scott DE, Bayly AR, Abell C and Skidmore J (2016) Small molecules, big targets: Drug discovery faces the protein–protein interaction challenge. Nature Reviews Drug Discovery 15(8), 533–550. [DOI] [PubMed] [Google Scholar]
  106. Shan Y, Kim ET, Eastwood MP, Dror RO, Seeliger MA and Shaw DE (2011) How does a drug molecule find its target binding site? Journal of the American Chemical Society 133(24), 9181–9183. [DOI] [PMC free article] [PubMed] [Google Scholar]
  107. Shao Q and Zhu W (2019) Exploring the ligand binding/unbinding pathway by selectively enhanced sampling of ligand in a protein–ligand complex. The Journal of Physical Chemistry B 123(38), 7974–7983. [DOI] [PubMed] [Google Scholar]
  108. Shaw DE, Adams PJ, Azaria A, Bank JA, Batson B, Bell A, Bergdorf M, Bhatt J, Butts JA, Correia T, Dirks RM, Dror RO, Eastwood MP, Edwards B, Even A, Feldmann P, Fenn M, Fenton CH, Forte A, Gagliardo J, Gill G, Gorlatova M, Greskamp B, Grossman JP, Gullingsrud J, Harper A, Hasenplaugh W, Heily M, Heshmat BC, Hunt J, Ierardi DJ, Iserovich L, Jackson BL, Johnson NP, Kirk MM, Klepeis JL, Kuskin JS, Mackenzie KM, Mader RJ, McGowen R, McLaughlin A, Moraes MA, Nasr MH, Nociolo LJ, O’Donnell L, Parker A, Peticolas JL, Pocina G, Predescu C, Quan T, Salmon JK, Schwink C, Shim KS, Siddique N, Spengler J, Szalay T, Tabladillo R, Tartler R, Taube AG, Theobald M, Towles B, Vick W, Wang SC, Wazlowski M, Weingarten MJ, Williams JM and Yuh KA (2021) Anton 3: Twenty microseconds of molecular dynamics simulation before lunch. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. Association for Computing Machinery, St. Louis, Missouri, Article 1. [Google Scholar]
  109. Shaw DE, Maragakis P, Lindorff-Larsen K, Piana S, Dror RO, Eastwood MP, Bank JA, Jumper JM, Salmon JK, Shan Y and Wriggers W (2010) Atomic-level characterization of the structural dynamics of proteins. Science 330(6002), 341–346. [DOI] [PubMed] [Google Scholar]
  110. Shen T and Hamelberg D (2008) A statistical analysis of the precision of reweighting-based simulations. The Journal of Chemical Physics 129(3), 034103. [DOI] [PubMed] [Google Scholar]
  111. Siebenmorgen T and Zacharias M (2020) Efficient refinement and free energy scoring of predicted protein–protein complexes using replica exchange with repulsive scaling. Journal of Chemical Information and Modeling 60(11), 5552–5562. [DOI] [PubMed] [Google Scholar]
  112. Sieker F, Straatsma TP, Springer S and Zacharias M (2008) Differential tapasin dependence of MHC class I molecules correlates with conformational changes upon peptide dissociation: A molecular dynamics simulation study. Molecular Immunology 45(14), 3714–3722. [DOI] [PubMed] [Google Scholar]
  113. Sinko W, Miao Y, De Oliveira CSAF and McCammon JA (2013) Population based reweighting of scaled molecular dynamics. The Journal of Physical Chemistry B 117(42), 12759–12768. [DOI] [PMC free article] [PubMed] [Google Scholar]
  114. Souza PCT, Alessandri R, Barnoud J, Thallmair S, Faustino I, Grünewald F, Patmanidis I, Abdizadeh H, Bruininks BMH, Wassenaar TA, Kroon PC, Melcr J, Nieto V, Corradi V, Khan HM, Domański J, Javanainen M, Martinez-Seara H, Reuter N, Best RB, Vattulainen I, Monticelli L, Periole X, Tieleman DP, De Vries AH and Marrink SJ (2021) Martini 3: A general purpose force field for coarse-grained molecular dynamics. Nature Methods 18(4), 382–388. [DOI] [PMC free article] [PubMed] [Google Scholar]
  115. Souza PCT, Thallmair S, Conflitti P, Ramirez-Palacios C, Alessandri R, Raniolo S, Limongelli V and Marrink SJ (2020) Protein–ligand binding with the coarse-grained martini model. Nature Communications 11(1), 3714. [DOI] [PMC free article] [PubMed] [Google Scholar]
  116. Spaar A, Dammer C, Gabdoulline RR, Wade RC and Helms V (2006) Diffusional encounter of barnase and barstar. Biophysical Journal 90(6), 1913–1924. [DOI] [PMC free article] [PubMed] [Google Scholar]
  117. Sugita Y, Kamiya M, Oshima H and Re S (2019) Replica-exchange methods for biomolecular simulations. In Bonomi M and Camilloni C (eds), Biomolecular Simulations. New York, NY: Humana (Springer), pp. 155–177. [DOI] [PubMed] [Google Scholar]
  118. Sugita Y and Okamoto Y (1999) Replica-exchange molecular dynamics method for protein folding. Chemical Physics Letters 314(1–2), 141–151. [Google Scholar]
  119. Sun H, Li Y, Shen M, Li D, Kang Y and Hou T (2017) Characterizing drug–target residence time with metadynamics: How to achieve dissociation rate efficiently without losing accuracy against time-consuming approaches. Journal of Chemical Information and Modeling 57(8), 1895–1906. [DOI] [PubMed] [Google Scholar]
  120. Sun L, Vandermause J, Batzner S, Xie Y, Clark D, Chen W and Kozinsky B (2022) Multitask machine learning of collective variables for enhanced sampling of rare events. Journal of Chemical Theory and Computation 18(4), 2341–2353. [DOI] [PubMed] [Google Scholar]
  121. Sussman JL, Lin D, Jiang J, Manning NO, Prilusky J, Ritter O and Abola EE (1998) Protein data bank (PDB): Database of three-dimensional structural information of biological macromolecules. Acta Crystallographica Section D: Biological Crystallography 54(6), 1078–1084. [DOI] [PubMed] [Google Scholar]
  122. Tiwary P and Parrinello M (2013) From metadynamics to dynamics. Physical Review Letters 111(23), 230602. [DOI] [PubMed] [Google Scholar]
  123. Unke OT, Chmiela S, Sauceda HE, Gastegger M, Poltavsky I, Schütt KT, Tkatchenko A and Müller K-R (2021) Machine learning force fields. Chemical Reviews 121, 10142–10186. [DOI] [PMC free article] [PubMed] [Google Scholar]
  124. Vakser IA (2020) Challenges in protein docking. Current Opinion in Structural Biology 64, 160–165. [DOI] [PMC free article] [PubMed] [Google Scholar]
  125. Votapka LW and Amaro RE (2015) Multiscale estimation of binding kinetics using Brownian dynamics, molecular dynamics and Milestoning. PLoS Computational Biology 11(10), e1004381. [DOI] [PMC free article] [PubMed] [Google Scholar]
  126. Voter AF (1997) Hyperdynamics: Accelerated molecular dynamics of infrequent events. Physical Review Letters 78(20), 3908. [Google Scholar]
  127. Wang G and Zhu W (2016) Molecular docking for drug discovery and development: A widely used approach but far from perfect. Future Medicinal Chemistry 8(14), 1707–1710. [DOI] [PubMed] [Google Scholar]
  128. Wang J, Arantes PR, Bhattarai A, Hsu RV, Pawnikar S, Huang YM, Palermo G and Miao Y (2021) Gaussian accelerated molecular dynamics (GaMD): Principles and applications. Wiley Interdisciplinary Reviews: Computational Molecular Science 11(5), e1521. [DOI] [PMC free article] [PubMed] [Google Scholar]
  129. Wang J, Ishchenko A, Zhang W, Razavi A and Langley D (2022a) A highly accurate metadynamics-based dissociation free energy method to calculate protein–protein and protein–ligand binding potencies. Scientific Reports 12(1), 2024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  130. Wang J, Lan L, Wu X, Xu L and Miao Y (2022b) Mechanism of RNA recognition by a Musashi RNA-binding protein. Current Research in Structural Biology 4, 10–20. [DOI] [PMC free article] [PubMed] [Google Scholar]
  131. Wang J and Miao Y (2020) Peptide Gaussian accelerated molecular dynamics (pep-GaMD): Enhanced sampling and free energy and kinetics calculations of peptide binding. The Journal of Chemical Physics 153(15), 154109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  132. Wang J and Miao Y (2022) Protein–protein interaction-Gaussian accelerated molecular dynamics (PPI-GaMD): Characterization of protein binding thermodynamics and kinetics. Journal of Chemical Theory and Computation 18, 1275–1285. [DOI] [PMC free article] [PubMed] [Google Scholar]
  133. Wang Y-T and Chan Y-H (2017) Understanding the molecular basis of agonist/antagonist mechanism of human mu opioid receptor through Gaussian accelerated molecular dynamics method. Scientific Reports 7(1), 7828. [DOI] [PMC free article] [PubMed] [Google Scholar]
  134. Wang Y, Ribeiro JML and Tiwary P (2019) Past–future information bottleneck for sampling molecular reaction coordinate simultaneously with thermodynamics and kinetics. Nature Communications 10(1), 3573. [DOI] [PMC free article] [PubMed] [Google Scholar]
  135. Wang Y, Ribeiro JML and Tiwary P (2020) Machine learning approaches for analyzing and enhancing molecular dynamics simulations. Current Opinion in Structural Biology 61, 139–145. [DOI] [PubMed] [Google Scholar]
  136. Wieczorek G and Zielenkiewicz P (2008) Influence of macromolecular crowding on protein-protein association rates—A Brownian dynamics study. Biophysical Journal 95(11), 5030–5036. [DOI] [PMC free article] [PubMed] [Google Scholar]
  137. Wu X, Knudsen B, Feller SM, Zheng J, Sali A, Cowburn D, Hanafusa H and Kuriyan J (1995) Structural basis for the specific interaction of lysine-containing proline-rich peptides with the N-terminal SH3 domain of c-Crk. Structure 3(2), 215–226. [DOI] [PubMed] [Google Scholar]
  138. Xie L, Shen L, Chen Z-N and Yang M (2017) Efficient free energy calculations by combining two complementary tempering sampling methods. The Journal of Chemical Physics 146(2), 024103. [DOI] [PubMed] [Google Scholar]
  139. Xue Y, Yuwen T, Zhu F and Skrynnikov NR (2014) Role of electrostatic interactions in binding of peptides and intrinsically disordered proteins to their folded targets. 1. NMR and MD characterization of the complex between the c-Crk N-SH3 domain and the peptide Sos. Biochemistry 53(41), 6473–6495. [DOI] [PubMed] [Google Scholar]
  140. Yang L, Liu C-W, Shao Q, Zhang J and Gao YQ (2015) From thermodynamics to kinetics: Enhanced sampling of rare events. Accounts of Chemical Research 48(4), 947–955. [DOI] [PubMed] [Google Scholar]
  141. Yang L and Gao YQ (2009) A selective integrated tempering method. The Journal of Chemical Physics 131(21), 12B606. [DOI] [PubMed] [Google Scholar]
  142. You W, Tang Z and Chang C-EA (2019) Potential mean force from umbrella sampling simulations: What can we learn and what is missed? Journal of Chemical Theory and Computation 15(4), 2433–2443. [DOI] [PMC free article] [PubMed] [Google Scholar]
  143. Zhang J, Wang N, Miao Y, Hauser F, McCammon JA, Rappel W-J and Schroeder JI (2018) Identification of SLAC1 anion channel residues required for CO2/bicarbonate sensing and regulation of stomatal movements. Proceedings of the National Academy of Sciences of the United States of America 115, 11129–11137. [DOI] [PMC free article] [PubMed] [Google Scholar]
  144. Zou R, Zhou Y, Wang Y, Kuang G, Ågren H, Wu J and Tu Y (2020) Free energy profile and kinetics of coupled folding and binding of the intrinsically disordered protein p53 with MDM2. Journal of Chemical Information and Modeling 60(3), 1551–1558. [DOI] [PubMed] [Google Scholar]
  145. Zsoldos Z, Reid D, Simon A, Sadjad SB and Johnson AP (2007) eHiTS: A new fast, exhaustive flexible ligand docking system. Journal of Molecular Graphics and Modelling 26(1), 198–212. [DOI] [PubMed] [Google Scholar]
  146. Zuckerman DM (2011) Equilibrium sampling in biomolecular simulations. Annual Review of Biophysics 40(1), 41–62. [DOI] [PMC free article] [PubMed] [Google Scholar]
  147. Zwier MC, Pratt AJ, Adelman JL, Kaus JW, Zuckerman DM and Chong LT (2016) Efficient atomistic simulation of pathways and calculation of rate constants for a protein–peptide binding process: Application to the MDM2 protein and an intrinsically disordered p53 peptide. The Journal of Physical Chemistry Letters 7(17), 3440–3445. [DOI] [PMC free article] [PubMed] [Google Scholar]
QRB Discov. doi: 10.1017/qrd.2022.11.pr1

Review: Challenges and Frontiers of Computational Modeling of Biomolecular Recognition — R0/PR1

Reviewed by: Yuji Sugita

Comments to Author: In this manuscript, the authors introduce recent enhanced-sampling methods for accelerating association and dissociation events of protein-ligand, protein-peptide, and protein-protein complexes. The authors classify the methods into three types: collective-variable (CV) based methods, CV-free methods, and methods combined with machine learning (ML) techniques. In CV-based methods, bias potentials are applied to the system along predefined CVs. Umbrella sampling or metadynamics is applied to binding problems to investigate binding affinities, pathways, and kinetics. In CV-free methods, bias potentials do not depend on the CVs. The authors mainly introduce Gaussian accelerated MD (GaMD), which they developed themselves. In particular, selective GaMD methods are efficient for binding and unbinding simulations, because they can apply boost potentials to selected regions of interest in the system. Given sufficient statistics for binding and unbinding events, the free-energy changes as well as the kinetics (k_on and k_off) can be estimated with high accuracy. In the ML-based methods, ML or deep learning (DL) improves the scoring functions for docking simulations and enables structure prediction, as in AlphaFold and RoseTTAFold. The authors also combine DL with GaMD. DL extracts the important interactions between residues and the CVs from GaMD trajectories, which makes it possible to obtain accurate free-energy profiles. This manuscript is well written and concisely summarizes recent work on enhanced sampling methods. I recommend publication of this manuscript after minor revisions, considering the points below.

The authors discuss CV-based and CV-free methods separately, but their combination should be important for more efficient sampling. In fact, in the last paragraph of Sec. 5, the authors mention that compatible enhanced sampling methods could be combined to be more powerful. Even if hidden energy barriers exist in degrees of freedom orthogonal to the predefined CVs in a CV-based method, a CV-free method can enhance the sampling in those orthogonal spaces. Several combinations of CV-based and CV-free methods have already been proposed, for example, GaREUS (https://doi.org/10.1021/acs.jctc.9b00761), gREST/REUS (https://doi.org/10.1063/1.5016222; https://doi.org/10.1073/pnas.1904707116), ST-MetaD (https://doi.org/10.1021/acs.jctc.1c01222), and ITS/TAMD (https://doi.org/10.1063/1.4973607). The authors should discuss the combinations of enhanced sampling methods in more detail.

GaMD boosts the motion and flexibility of biomolecules and enhances the sampling of conformational space, which reduces the required simulation time. However, even when GaMD is used, many independent GaMD simulations or long GaMD simulations are required to obtain sufficient statistics for protein-peptide binding or binding between large biomolecules. We suggest that the authors discuss the convergence issues of GaMD in more detail.

GaMD successfully reproduces the binding affinities and kinetics with very high accuracy. However, even if the binding and unbinding events are sufficiently sampled, the affinities and kinetics would strongly depend on the force-field parameters of the proteins and ligands and on the water model. The authors should explain the relationship between the force-field parameters and enhanced conformational sampling methods.

Minor comments

1. Page 11, fifth paragraph of Section 3: V_{PP,nb}(r_P) + V_{LL,nb}(r_L) + V_{EE,nb}(r_E) is duplicated in V(r). Please remove the duplication.
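
The duplication noted in this comment is easier to see with the potential written out in full. The following is only a sketch, not the manuscript's own equation (which is not reproduced here). It assumes the three-component decomposition used in the LiGaMD paper (Miao et al., 2020), with protein P, ligand L and environment E, where the subscripts b and nb label bonded and non-bonded terms. Under that assumption, each of the nine terms appears exactly once:

```latex
\begin{aligned}
V(r) ={}& V_{P,b}(r_P) + V_{L,b}(r_L) + V_{E,b}(r_E) \\
      & + V_{PP,nb}(r_P) + V_{LL,nb}(r_L) + V_{EE,nb}(r_E) \\
      & + V_{PL,nb}(r_{PL}) + V_{PE,nb}(r_{PE}) + V_{LE,nb}(r_{LE})
\end{aligned}
```

The second line holds the three intra-component non-bonded terms named by the reviewer; listing them a second time would count those interactions twice.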

QRB Discov. doi: 10.1017/qrd.2022.11.pr2

Review: Challenges and Frontiers of Computational Modeling of Biomolecular Recognition — R0/PR2

Reviewed by: Wen Ma

Comments to Author: The manuscript reviews computational approaches to study biomolecular binding and dissociation processes. The authors review the challenges and latest developments in applying molecular dynamics (MD) simulations to study protein-ligand and protein-protein interactions. Among these methods, they describe in detail how Gaussian accelerated MD (GaMD) can be used to enhance sampling. A "selective GaMD" algorithm is introduced to accelerate a given biological process more efficiently by perturbing specific terms in the potential energy function. In general, this review will be of great interest to the readership of QRB Discovery, and I would like to recommend it for publication after the authors consider the following minor points:

1. On Page 8, it is mentioned that metadynamics simulations were used to predict ligand unbinding pathways and the related k_off. "The predicted k_off (9.1 ± 2.5 s^-1) was comparable with the experimental data (600 ± 300 s^-1)." Actually, these two numbers are not really comparable, as they differ by almost two orders of magnitude.

2. The authors discussed coarse-grained (CG) MD models, which can greatly extend the simulation timescales compared to conventional MD. They should also address CG models that can efficiently sample peptide binding to a receptor. A useful united-atom CG model (PACE) was successfully used to study intrinsically disordered peptide binding to a receptor (Han, W., & Schulten, K. (2014). JACS, 136(35), 12450-12460). This work performed millisecond CG simulations to characterize an Aβ peptide binding to an amyloid fibril tip.

3. The original work on milestoning should be cited in the discussion of SEEKR on Page 6 (e.g. a review by Elber, R. (2020). Annu. Rev. Biophys., 49(1), 69-85).

4. In Figure 4B, it is a bit unclear to me what the multiple structures represent. Are these structures just static PDB snapshots from GaMD, or should they be the saliency maps built from residue contacts?

5. One of the benefits of MD-coupled machine learning approaches is that the information (features) learned by the neural network can be used to iteratively enhance the MD sampling. This point can be discussed in Section 4 (e.g. see Wang, Y., Ribeiro, J.M.L. & Tiwary, P. (2019). Nat. Commun., 10, 3573).
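
The size of the k_off discrepancy raised in point 1 can be checked with a few lines of arithmetic. The sketch below uses only the two central values quoted there (9.1 and 600 s^-1). The function names are hypothetical, and the conversion to a barrier difference is an illustrative assumption (a transition-state-like picture with identical prefactors at 300 K), not something stated in the manuscript or the review.

```python
import math

# Central values quoted in point 1 of the review, in s^-1.
K_OFF_PREDICTED = 9.1
K_OFF_EXPERIMENT = 600.0

# Illustrative assumptions (not from the source): gas constant in
# kcal/(mol*K) and a temperature of 300 K.
R_KCAL_PER_MOL_K = 1.987204e-3
TEMPERATURE_K = 300.0


def fold_difference(k_a: float, k_b: float) -> float:
    """Ratio of the larger rate constant to the smaller one."""
    return max(k_a, k_b) / min(k_a, k_b)


def orders_of_magnitude(k_a: float, k_b: float) -> float:
    """Base-10 logarithm of the fold difference."""
    return math.log10(fold_difference(k_a, k_b))


def barrier_shift_kcal(k_a: float, k_b: float,
                       temperature_k: float = TEMPERATURE_K) -> float:
    """Barrier difference RT*ln(ratio), assuming equal prefactors."""
    ratio = fold_difference(k_a, k_b)
    return R_KCAL_PER_MOL_K * temperature_k * math.log(ratio)


if __name__ == "__main__":
    fold = fold_difference(K_OFF_PREDICTED, K_OFF_EXPERIMENT)
    orders = orders_of_magnitude(K_OFF_PREDICTED, K_OFF_EXPERIMENT)
    shift = barrier_shift_kcal(K_OFF_PREDICTED, K_OFF_EXPERIMENT)
    print(f"fold difference: {fold:.1f}")
    print(f"orders of magnitude: {orders:.2f}")
    print(f"implied barrier shift: {shift:.2f} kcal/mol")
```

With the quoted values the ratio is about 66-fold, a little under two orders of magnitude, which under the stated assumptions corresponds to roughly 2.5 kcal/mol in barrier height. The quoted uncertainties (± 2.5 and ± 300 s^-1) are not propagated here.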

QRB Discov. doi: 10.1017/qrd.2022.11.pr3

Recommendation: Challenges and Frontiers of Computational Modeling of Biomolecular Recognition — R0/PR3

Editor: Giulia Palermo

QRB Discov. doi: 10.1017/qrd.2022.11.pr4

Recommendation: Challenges and Frontiers of Computational Modeling of Biomolecular Recognition — R1/PR4

Editor: Giulia Palermo

No accompanying comment.


Articles from QRB Discovery are provided here courtesy of Cambridge University Press
