2024 Jul 9;20(14):5807–5819. doi: 10.1021/acs.jctc.4c00452

Incorporating Prior Knowledge in the Seeds of Adaptive Sampling Molecular Dynamics Simulations of Ligand Transport in Enzymes with Buried Active Sites

Dheeraj Kumar Sarkar †, Bartlomiej Surpeta †, Jan Brezovsky †,‡,*
PMCID: PMC11270739  PMID: 38978395

Abstract


Because most proteins have buried active sites, protein tunnels or channels play a crucial role in the transport of small molecules into buried cavities for enzymatic catalysis. Tunnels can critically modulate the biological process of protein–ligand recognition. Various molecular dynamics methods have been developed for exploring and exploiting the protein–ligand conformational space to extract high-resolution details of the binding processes, a recent example being energetically unbiased high-throughput adaptive sampling simulations. The current study systematically assessed the role of integrating prior knowledge when generating the initial protein–ligand configurations, called seeds, for these simulations. Using a nontrivial system of a haloalkane dehalogenase mutant with multiple transport tunnels leading to a deeply buried active site, we employed these simulations to derive kinetic models describing the association and dissociation of the substrate molecule. The most knowledge-based seed generation enabled high-throughput simulations that more consistently captured the entire transport process, explored the complex network of transport tunnels, and predicted equilibrium dissociation constants, koff/kon, of the same order of magnitude as experimental measurements. Overall, infusing more knowledge into the initial seeds of adaptive sampling simulations can render analyses of transport mechanisms in enzymes more consistent, even for very complex biomolecular systems, thereby promoting drug development efforts and the rational design of enzymes with buried active sites.

1. Introduction

Molecular recognition is critical for all biological processes. Therefore, it has long been a goal to capture the intrinsically dynamic and transient nature of protein–ligand (un)binding processes by high-resolution sampling and to resolve the kinetics of ligand binding for structure-based drug design.1,2 Additionally, a ligand can use multiple routes of entry to reach the environment of an active site.3–5 These routes, often referred to as tunnels, are considered as important as the catalytic properties of enzymes.6 Because the active site is buried in the majority of enzymes,7,8 the molecular properties of the tunnels, specifically their gating residues, can largely control the entry and exit of ligands.9 In this context, ligand binding via these transport pathways is a critical component of biocatalysis, and understanding it helps identify the key residues underlying transport processes for mutagenesis and rational drug design.6,10 Hence, protein tunnels are natural targets when aiming to improve catalysis or to alter properties such as specificity and activity toward small molecules. Very often, tunnel-lining residues or other gating residues can act as hot spots in addition to active site residues.9,10

Transport processes, such as the migration of ligands from an active site to the bulk solvent, often require overcoming a high energy barrier, which makes them rare events.11 Because molecular dynamics (MD) simulations can observe biologically relevant processes at atomistic resolution, they are extensively used to study the mechanisms, dynamics, and functions of biomolecular complexes.12,13 Numerous computational approaches have been developed in recent years to sample such rare events of ligand transport involving the association and dissociation of ligands and receptors.14 These approaches benefit from improvements in computational hardware, notably GPUs, as well as from the implementation of various path sampling and rare-event sampling methods.15,16 Specifically, enhanced sampling methods such as milestoning,17 weighted ensemble,18 Gaussian accelerated MD,19,20 metadynamics,21,22 adaptive sampling MD (ASMD) based on Markov state models (MSMs),5,23,24 and random acceleration MD5,25,26 have gained popularity in studying such rare events. Although most of these methods apply an additional potential or force to bias the simulations along a designed collective variable, ASMD methods utilizing MSMs can avoid such perturbations.27–29 An MSM is an approach for extracting kinetic information from unbiased MD simulations, capable of deriving insights on longer time scales from shorter trajectories. MSM modeling entails collecting data from the set of trajectories and decomposing the molecular ensemble into microstates, which are subsequently used to enumerate the transitions between them. This enables the construction of a transition probability matrix and the definition of metastable states, which can then be used to create a final model delivering the conditional probabilities of being in a particular state and transitioning to another state after a specific time interval (known as the lag time). In simple terms, an MSM can be described as a network in which the metastable states are the nodes and the edges represent the transition rates between those states.29–31
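To make these MSM steps concrete, the following Python/NumPy sketch counts transitions between discretized states at a fixed lag time, row-normalizes them into a transition probability matrix, and converts its eigenvalues into implied time scales, t_i = -τ/ln|λ_i|. It is a minimal illustration under our own assumptions, not the PyEMMA estimator used in this work: detailed balance is enforced here simply by symmetrizing the count matrix, whereas MSM packages typically use maximum-likelihood reversible estimation, and all function names are hypothetical.

```python
import numpy as np

def count_matrix(dtraj, n_states, lag):
    """Count transitions i -> j observed `lag` frames apart in one discrete trajectory."""
    C = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        C[i, j] += 1
    return C

def transition_matrix(C):
    """Row-normalize counts into T_ij = P(state j at t + lag | state i at t).

    Symmetrizing the counts (C + C^T) is a crude way to enforce detailed balance.
    """
    C_sym = C + C.T
    return C_sym / C_sym.sum(axis=1, keepdims=True)

def implied_timescales(T, lag_time):
    """Implied time scales -lag_time / ln|lambda_i| for all non-stationary eigenvalues."""
    eigvals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return -lag_time / np.log(eigvals[1:])  # skip the stationary eigenvalue (1.0)
```

With trajectories saved every 100 ps, as in the production runs of this study, the 20 ns MSM lag time would correspond to `lag=200` frames.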

Extensive ASMD simulations have been used successfully to study ligand binding processes.5,24,32–35 ASMD is an energetically unbiased protocol comprising iterative rounds of intelligently respawned equilibrium simulations of protein–ligand configurations. The procedure starts by generating a set of initial configurations, called seeds. From these seeds, a set of independent simulations is run, and a preliminary MSM is built from them. Next, a scoring function is used to identify the least explored, most informative metastable states in this MSM, which are used to respawn the subsequent batch of simulations.28,29 The whole ASMD procedure thus consists of several iterative rounds (called epochs) of MD simulations, aggregation of data from all previous simulations, MSM construction, metastable state scoring, and generation of new starting configurations. ASMD typically accelerates sampling by about an order of magnitude over standard MD simulations.33,36,37 However, ASMD is still ineffective when it encounters a barrier too high to be crossed in an unbiased regime; the ensuing rare events must then be sampled with enhanced sampling methods along a predefined collective variable, and such data can be incorporated through the concept of multiensemble MSMs.38 Finally, most ASMD protocols suffer from the exploration–exploitation trade-off,39 meaning that already visited states are less likely to be respawned in the next epochs, potentially limiting the sampling of some metastable states required for an accurate description of the process.35 Given the rising success of ASMD simulations in ligand transport studies, it is of interest how the design of individual components of an ASMD workflow affects its efficacy in sampling relevant regions of the protein–ligand configurational space.24,33,35,40,41 Betz and Dror investigated the role of the scoring function that selects configurations for successive epochs in partially overcoming the exploration–exploitation trade-off, using the well-known test system of trypsin with the inhibitor benzamidine and a more complex yet realistic system of the membrane-bound β2 adrenergic receptor with the inhibitor dihydroalprenolol.35 They compared three scoring functions: simple counts, in which states were resampled with a probability inversely proportional to their occurrence in the simulations; population scores, which prefer states with smaller populations in the MSM; and hub scores, which prefer states with lower connectivity in the MSM. For the membrane-bound system, the count score failed to steer ASMD toward inhibitor migration through the protein because it focused sampling entirely on the membrane region. In contrast, the other two scores successfully sampled the relevant configurations. Hence, the more information-rich scoring functions markedly benefited the study of ligand transport in that more complex setting.
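The simple counts score described above can be sketched in a few lines: each visited state receives a weight inversely proportional to how often it has been observed, and the starting states of the next epoch are drawn from the normalized weights. This is an illustrative Python sketch under our own assumptions (states already discretized, sampling with replacement, hypothetical function names); it does not reproduce the HTMD implementation or the population and hub scores of Betz and Dror.

```python
import numpy as np

def count_scores(state_counts):
    """Counts score: each visited state is weighted by 1/count, favoring rarely seen states."""
    counts = np.asarray(state_counts, dtype=float)
    scores = np.zeros_like(counts)
    visited = counts > 0
    scores[visited] = 1.0 / counts[visited]
    return scores / scores.sum()  # normalize to a probability distribution

def pick_respawn_states(scores, n_sims, seed=None):
    """Draw the starting states of the next epoch (with replacement) from the scores."""
    rng = np.random.default_rng(seed)
    return rng.choice(len(scores), size=n_sims, replace=True, p=scores)
```

Drawing 30 states per call would mirror the 30 simulations per epoch used in this study. Because unvisited states receive zero weight, new regions are reached only through the dynamics started from rarely visited states.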

Here, we investigated the role of employing relevant information when preparing the initial seeding structures for ASMD. We designed four schemes (Figure 1A), ranging from random positioning of the ligand around the protein to more knowledge-based poses of the ligand bound in the active site or along tunnels precomputed from an apo simulation. We tested the capabilities of ASMD initiated from these seeding schemes in exploring and exploiting the transport tunnels of the haloalkane dehalogenase mutant LinB86 (Figure 1B,C), in which an additional functional tunnel was introduced de novo.10 By performing intensive ASMD simulations of LinB86 with one of its substrates, 1,2-dibromoethane (DBE; Figure 1D), for each scheme, we compared the degree to which the initial seeding affected the ability of ASMD to (i) capture the entire process of substrate association and dissociation, (ii) consistently identify the metastable states adopted by the substrate, (iii) predict kinetic parameters of the process, and (iv) describe the complexity of transport via multiple tunnels.

Figure 1.


Overview of the evaluated seeding schemes and model system used. (A) Schematic representation of the studied schemes, seeding the substrate molecule from random to knowledge-based positions. (B) Representative structure of the tunnel network from a 100 ns MD simulation of LinB86 (see Table S1 for other tunnel properties). The known tunnels are shown as sets of colored spheres: p1a (blue), p1b (cyan), four branches of p2 (green), and p3 (red). The protein structure is shown as a gray cartoon. (C) Average lengths of the ensembles of known tunnels observed in the MD simulation. The data represent mean ± standard deviation over the simulation. (D) Structure of 1,2-dibromoethane (DBE), a substrate of haloalkane dehalogenase LinB86.

2. Materials and Methods

2.1. Seed Generation for ASMD Simulations

The input model was based on the crystal structure of the haloalkane dehalogenase mutant LinB86 (PDB code: 5LKA).10 The protein structure was protonated using the H++ web server42,43 at pH 8.5. The protein was modeled using the AMBER ff14SB42 force field, and the substrate DBE with the General Amber Force Field (GAFF).44,45 The partial atomic charges of DBE were derived using a multiconformational, multiorientational restrained electrostatic potential (RESP) fit.46 Each DBE conformation was geometry-optimized at the MP2/6-31G(d) level of theory, and its multiorientational molecular electrostatic potential was calculated at the HF/6-31G(d) level using Gaussian 09.47 Finally, two-stage charge fitting was conducted for all conformations and orientations using the resp and antechamber modules of AMBERTools18.48

The substrate molecule was placed according to the four designed schemes to systematically investigate the role of prior knowledge in ASMD seeding (Figure 1A). For each scheme, DBE was placed at 30 different positions to seed 30 independent simulations (Figure S1), as follows. (i) In the Bulk scheme, DBE was positioned on an equally spaced grid in the solvent surrounding the protein, generated with the “drawgridbox [selection], nx = 5, ny = 5, nz = 5, padding = 5, lw = 1, r = 0, g = 0” function of PyMOL.49 (ii) In the Cavity scheme, DBE was docked into the enzyme’s active site using AutoDock Vina.50 The 30 docked poses were derived using a grid box centered at the center of mass (COM) of the catalytic residues (N38, D108, W109, and H272) with a dimension of 22.5 Å and an exhaustiveness of 1000. (iii) In the Cavity&Bulk scheme, 15 DBE positions were taken from the Cavity scheme based on their docking energies, and the first 15 DBE positions were taken from the Bulk scheme. (iv) In the Tunnels scheme, putative transport tunnels in LinB86 were detected from a 100 ns trajectory of ligand-free LinB86, and the most open tunnels were then explored for binding of DBE along them. Subsequently, composite tunnels, formed from parts of tunnels with conformations ensuring minimal energy costs for DBE migration, were generated (see Text S1 and Figures S2–S9 for details of this protocol).
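A rough Python sketch of the Bulk and Cavity&Bulk seed construction follows. It assumes protein coordinates given as a NumPy array, an evenly spaced grid over the padded bounding box (5 points per axis and 5 Å padding, loosely mirroring the drawgridbox arguments above), a hypothetical 4 Å minimum distance from any protein atom for a grid point to count as solvent, and that selecting docked poses "based on their energies" means keeping the lowest-energy (most favorable) Vina poses. It is not the PyMOL or AutoDock Vina workflow itself, and the function names are hypothetical.

```python
import numpy as np

def bulk_grid_positions(protein_xyz, n_per_axis=5, padding=5.0, min_dist=4.0):
    """Evenly spaced grid points around the protein that lie in the solvent.

    protein_xyz: (n_atoms, 3) coordinates in Å. min_dist (hypothetical) is the
    minimum distance from any protein atom for a grid point to count as bulk.
    """
    lo = protein_xyz.min(axis=0) - padding
    hi = protein_xyz.max(axis=0) + padding
    axes = [np.linspace(lo[k], hi[k], n_per_axis) for k in range(3)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    # Distance from each grid point to its nearest protein atom.
    nearest = np.linalg.norm(grid[:, None, :] - protein_xyz[None, :, :], axis=-1).min(axis=1)
    return grid[nearest >= min_dist]

def cavity_and_bulk_seeds(docked_poses, docked_energies, bulk_positions, n_each=15):
    """Combine the n_each lowest-energy docked poses with the first n_each bulk positions."""
    order = np.argsort(docked_energies)  # most favorable (lowest) docking energy first
    cavity = [docked_poses[i] for i in order[:n_each]]
    bulk = list(bulk_positions[:n_each])
    return cavity + bulk
```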

The generated protein–ligand complexes were then solvated based on the 3D reference interaction site model theory51 using the Placevent52 algorithm. Each system was then processed with the tleap module of AMBERTools18, placing the presolvated protein in an octahedral box of OPC water molecules53 with a 10 Å buffer between the solute and the box edge and neutralizing it with counterions (Na+ and Cl–) at an ionic strength of 0.1 M. Finally, the hydrogen mass repartitioning (HMR) method was applied to the topologies to enable a 4 fs time step.54 Note that HMR can alter the time scales of the studied processes, as shown recently.55,56 Hence, in studies aimed at measuring absolute rates, unlike the primarily comparative work here, it might be advisable to avoid HMR.
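Hydrogen mass repartitioning can be illustrated with a small Python function: each hydrogen mass is raised to a target value, and the added mass is subtracted from the bonded heavy atom, so the total mass is unchanged while the fastest bond vibrations slow down. The 3.024 Da target (a tripled hydrogen mass) is a commonly used value and an assumption here, as are the simple list-based topology and the function name; in practice, tools such as ParmEd's HMassRepartition action operate on the full AMBER topology and usually leave water untouched.

```python
def repartition_hydrogen_mass(masses, bonds, is_hydrogen, h_mass=3.024):
    """Return a new mass list after hydrogen mass repartitioning (HMR).

    masses: per-atom masses in Da; bonds: (i, j) atom-index pairs;
    is_hydrogen: per-atom booleans. h_mass = 3.024 Da is an assumed target value.
    """
    new_masses = list(masses)
    for i, j in bonds:
        for h, heavy in ((i, j), (j, i)):
            if is_hydrogen[h] and not is_hydrogen[heavy]:
                # Move the extra mass from the heavy atom onto its hydrogen.
                new_masses[heavy] -= h_mass - masses[h]
                new_masses[h] = h_mass
    return new_masses

# 1,2-dibromoethane: Br0-C1-C2-Br3 with hydrogens 4, 5 on C1 and 6, 7 on C2
masses = [79.904, 12.011, 12.011, 79.904, 1.008, 1.008, 1.008, 1.008]
bonds = [(0, 1), (1, 2), (2, 3), (1, 4), (1, 5), (2, 6), (2, 7)]
is_h = [False] * 4 + [True] * 4
hmr_masses = repartition_hydrogen_mass(masses, bonds, is_h)
```

For DBE, each carbon loses 2 × 2.016 Da and ends at 7.979 Da, while the molecular mass is conserved.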

2.2. Equilibration MD Simulations of Seeds

The systems were then energy-minimized and equilibrated using the PMEMD and PMEMD.CUDA modules57 of AMBER18,48 respectively. All complexes were energy-minimized in five consecutive stages, each composed of 100 steps of steepest descent followed by 400 steps of conjugate gradient, with gradually decreasing restraints on the protein (500 kcal·mol–1·Å–2 on heavy atoms in the first stage, followed by restraints of 500, 125, 25, and 0.001 kcal·mol–1·Å–2 applied only to the backbone atoms). Minimization was followed by 20 ps of heating from 0 to 200 K in the NVT ensemble using the Langevin thermostat59 with a collision frequency of 2 ps–1 and a coupling constant of 1 ps while keeping the protein heavy atoms restrained with a force constant of 5 kcal·mol–1·Å–2. Next, the temperature was raised to the target value of 310 K during the first 100 ps of an NVT simulation and kept constant for the following 900 ps, with the same parameters as above. This was followed by an NPT simulation at 1 atm, enforced by the weak-coupling barostat with a coupling constant of 1 ps, with positional restraints of 5 kcal·mol–1·Å–2 on the backbone atoms for 1 ns, and then by 1 ns without any positional restraints. All MD simulation stages used a 4 fs time step enabled by the SHAKE58 and HMR algorithms, periodic boundary conditions, and the particle mesh Ewald method,60 with a nonbonded cutoff of 8 Å. Coordinates were saved every 20 ps. The MD trajectories were analyzed using the cpptraj module of AMBERTools23.61,62 The last snapshots from the unrestrained simulation were used as the initial input structures for ASMD.
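The staged minimization can be expressed as a short Python generator of AMBER input files. The &cntrl keywords used here (imin, maxcyc, ncyc, ntb, cut, ntr, restraint_wt, restraintmask) are standard sander/pmemd options; with the default ntmin = 1, ncyc = 100 and maxcyc = 500 give 100 steepest-descent steps followed by 400 conjugate-gradient steps. The atom masks and the residue range in the usage example are hypothetical placeholders, and the reference coordinates for the restraints would be supplied separately (e.g., via -ref); this is a sketch, not the authors' input files.

```python
def minimization_inputs(heavy_mask, backbone_mask):
    """Build five AMBER minimization inputs with gradually decreasing positional restraints.

    Stage 1 restrains protein heavy atoms (500 kcal/mol/Å^2); stages 2-5 restrain
    only backbone atoms with 500, 125, 25, and 0.001 kcal/mol/Å^2.
    """
    stages = [(500.0, heavy_mask), (500.0, backbone_mask), (125.0, backbone_mask),
              (25.0, backbone_mask), (0.001, backbone_mask)]
    inputs = []
    for k, (weight, mask) in enumerate(stages, start=1):
        inputs.append(
            f"Minimization stage {k}\n"
            " &cntrl\n"
            # ntmin defaults to 1: ncyc steepest-descent steps, then conjugate gradient up to maxcyc
            "  imin=1, maxcyc=500, ncyc=100,\n"
            "  ntb=1, cut=8.0,\n"
            f"  ntr=1, restraint_wt={weight}, restraintmask='{mask}',\n"
            " /\n"
        )
    return inputs

# Hypothetical residue range; replace with the actual protein residues of the system.
stages = minimization_inputs(":1-300&!@H=", ":1-300@CA,C,N,O")
```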

2.3. High Throughput ASMD to Study Substrate Un(binding) Processes

ASMD was set up with 30 epochs, each consisting of 30 independent production simulations. To build an MSM after each epoch, we used the distances between the Cα atoms of the protein and the four heavy atoms of DBE and reduced this high-dimensional space to three dimensions using time-lagged independent component analysis (TICA)63 with a lag time of 2 ns. ASMD simulations were performed using the HTMD (v1.13.10)27 and AMBER1848 software packages. The equilibration phase in HTMD consisted of 250 ps NVT and 250 ps NPT simulations, during which the systems were heated from 0 to 310 K with a Langevin thermostat and harmonic positional restraints on the backbone atoms with a force constant of 5 kcal·mol–1·Å–2. Finally, a 50 ns unrestrained production MD simulation was performed in the NVT ensemble using a weak-coupling thermostat and a saving frequency of 100 ps, since only NVT production simulations were fully supported by the HTMD package. These ASMD runs were performed in three replicates for each investigated seeding scheme.
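For readers unfamiliar with TICA, the following NumPy sketch builds the Cα–DBE distance features described above and projects them onto the slowest linear combinations, found by whitening the instantaneous covariance and diagonalizing the symmetrized time-lagged covariance. It is a simplified illustration rather than the HTMD/PyEMMA implementation (for example, it uses a single trajectory and a global mean), and the function names are hypothetical.

```python
import numpy as np

def ca_ligand_distances(ca_xyz, lig_xyz):
    """Distances between every C-alpha atom and every ligand heavy atom, per frame.

    ca_xyz: (n_frames, n_ca, 3); lig_xyz: (n_frames, n_lig, 3) -> (n_frames, n_ca * n_lig)
    """
    diff = ca_xyz[:, :, None, :] - lig_xyz[:, None, :, :]
    return np.linalg.norm(diff, axis=-1).reshape(len(ca_xyz), -1)

def tica(X, lag, n_components=3, eps=1e-10):
    """Project X (n_frames, n_features) onto its slowest linear components."""
    X = X - X.mean(axis=0)
    A, B = X[:-lag], X[lag:]
    n = len(A)
    C0 = (A.T @ A + B.T @ B) / (2 * n)   # instantaneous covariance
    Ct = (A.T @ B + B.T @ A) / (2 * n)   # symmetrized time-lagged covariance
    # Whiten with C0, discarding near-singular directions.
    s, U = np.linalg.eigh(C0)
    keep = s > eps
    W = U[:, keep] / np.sqrt(s[keep])
    # Eigenvectors of the whitened lagged covariance give the TICA directions.
    lam, V = np.linalg.eigh(W.T @ Ct @ W)
    order = np.argsort(lam)[::-1][:n_components]
    return X @ (W @ V[:, order]), lam[order]
```

With the 100 ps saving interval of the production runs, the 2 ns TICA lag would correspond to `lag=20` frames; for LinB86, the feature vector holds four distances per Cα atom.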

2.4. Final MSM Construction and Validation

All MSMs were built using HTMD,27 which internally uses PyEMMA,64 following standard PyEMMA protocols. The high-dimensional distance features from adaptive sampling were reduced to three dimensions using TICA63 with a lag time of 2 ns. Next, the TICA coordinates were clustered into 1000 microstates using the MiniBatchKMeans65 method. The microstates were lumped into metastable states using the PCCA++ method,66 with the number of metastable states chosen by spectral analysis64 and verified against plots of linear implied time scales (Figures S10–S12). A lag time of 20 ns was used for MSM construction. Finally, the Chapman–Kolmogorov67 test was performed to validate the Markovianity of the generated MSMs (Figures S13–S21).
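The idea behind the Chapman–Kolmogorov test can be sketched as follows: if the process is Markovian at lag τ, propagating the model k times, T(τ)^k, should reproduce the transition matrix estimated directly at lag kτ. The sketch below compares the two on microstates with a simple symmetrized estimator; PyEMMA's implementation instead compares metastable-set probabilities with confidence intervals, so this is only an illustration with hypothetical function names.

```python
import numpy as np

def estimate_T(dtraj, n_states, lag):
    """Row-normalized, symmetrized transition count matrix at the given lag (in frames)."""
    C = np.zeros((n_states, n_states))
    np.add.at(C, (dtraj[:-lag], dtraj[lag:]), 1.0)
    C = C + C.T
    return C / C.sum(axis=1, keepdims=True)

def ck_test(dtraj, n_states, lag, k_max):
    """Max absolute deviation between T(lag)^k and T(k * lag) for k = 1..k_max."""
    T = estimate_T(dtraj, n_states, lag)
    errors = []
    for k in range(1, k_max + 1):
        predicted = np.linalg.matrix_power(T, k)          # model propagated k steps
        estimated = estimate_T(dtraj, n_states, k * lag)  # direct estimate at lag k * tau
        errors.append(np.abs(predicted - estimated).max())
    return np.array(errors)
```

Small deviations for all k support Markovianity at the chosen lag; systematically growing deviations suggest that the lag time is too short or the state decomposition too coarse.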

2.5. MSM Analysis and Comparison

To quantify the ability of ASMD to sample the whole (un)binding process of DBE, the distances between the COM of the DBE molecule and the COM of three catalytic residues (N38, D108, and W109; Figure S22A) were measured for the ∼900 trajectories of each replicate using the cpptraj module of AMBERTools23.61,62 Using this distance, we defined the location of DBE in the active site (0–5 Å), a tunnel (5–19 Å), or the bulk (>19 Å). The 19 Å cutoff for tunnels was derived from the average lengths of the investigated tunnels measured by CAVER (Figure 1C and Table S1). Next, the transition path theory approach implemented in PyEMMA was used to compute the mean first-passage times (MFPTs) of each association and dissociation process from the MSM transition probability matrices. Here, the metastable states with the most prevalent bound conformation of DBE were used as sink states, whereas the metastable states featuring DBE mainly in the bulk solvent were used as source states for the transition flux analysis, from which the transition probabilities and kinetic rates were derived. Furthermore, the most frequently occurring bottleneck residues were shortlisted from the CAVER results (Figure S22B–E): p1a (D147, F151, and V173), p1b (D147, W177, and L248), p2 (L211 and L248), and p3 (L143, F151, and I213), and the distance from the COM of each bottleneck to the COM of DBE was calculated to assess the localization of DBE with respect to these tunnels. The fluctuations in the COM positions of the bottleneck and cavity residues were evaluated to confirm their stability and hence their suitability as landmarks (Table S2). While the cavity COM was the most stable, the RMSFs of all other COMs were also lower than 1.4 Å, indicating no major distortions of these landmarks during the simulations.
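A minimal NumPy sketch of how MFPTs and an equilibrium dissociation constant can be obtained from a transition matrix follows. The MFPT into a target (sink) set solves m_i = τ + Σ_j T_ij m_j for states outside the set, with m_i = 0 inside it. Converting MFPTs into rates with k_off = 1/MFPT_off and k_on = 1/(MFPT_on·[L]), where [L] is the concentration of a single ligand in the simulation box, is a common convention that we assume here; the box volume, the single-state source, and the function names are illustrative assumptions, and PyEMMA's TPT-based calculation may differ in detail.

```python
import numpy as np

AVOGADRO = 6.02214076e23  # 1/mol

def mfpt(T, target, lag_time):
    """Mean first-passage time from every state into the `target` set of states.

    Solves (I - T_oo) m_o = lag_time for states o outside the target; m = 0 inside it.
    """
    n = T.shape[0]
    target = np.asarray(sorted(set(target)))
    others = np.setdiff1d(np.arange(n), target)
    A = np.eye(len(others)) - T[np.ix_(others, others)]
    m = np.zeros(n)
    m[others] = np.linalg.solve(A, np.full(len(others), lag_time))
    return m

def kd_from_mfpt(mfpt_on, mfpt_off, box_volume_A3):
    """K_d = k_off / k_on in mol/L, assuming one ligand molecule in the box."""
    conc = 1.0 / (AVOGADRO * box_volume_A3 * 1e-27)  # 1 Å^3 = 1e-27 L
    k_on = 1.0 / (mfpt_on * conc)
    k_off = 1.0 / mfpt_off
    return k_off / k_on
```

Because the time units cancel in k_off/k_on, the MFPTs only need to be expressed in a consistent unit (e.g., ns).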

Ensembles of 1000 representative structures of the metastable states generated from the individual MSMs were clustered to establish the correspondence of these metastable states across the explored schemes. For this purpose, the mean and the 25th, 50th, and 75th percentiles were calculated for each set of characteristic distances between DBE and the bottleneck residues as well as the catalytic machinery described above (Figure S22). Together, these values formed a vector of 20 variables describing each metastable state. Principal component analysis implemented in the Python scikit-learn library65 was used to reduce the dimensionality of these vectors. The first three principal components of each metastable state were clustered with HDBSCAN68 using a min_cluster_size of 2, with the remaining parameters kept at their default values.
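The 20-variable fingerprint follows from five distances (to the catalytic machinery and to the four bottlenecks p1a, p1b, p2, and p3) times four statistics (mean and 25th, 50th, and 75th percentiles). The NumPy sketch below builds such a fingerprint per metastable state and projects the fingerprints onto three principal components via SVD, as an illustrative stand-in for scikit-learn's PCA; the subsequent HDBSCAN clustering (min_cluster_size = 2) is not shown, and the function names are hypothetical.

```python
import numpy as np

def state_fingerprint(distances):
    """Fingerprint of one metastable state.

    distances: (n_structures, 5) DBE distances to the catalytic-machinery COM and
    to the four bottleneck COMs. Returns 5 x 4 = 20 values: mean, 25th, 50th, and
    75th percentiles of each distance.
    """
    mean = distances.mean(axis=0)
    q25, q50, q75 = np.percentile(distances, [25, 50, 75], axis=0)
    return np.concatenate([mean, q25, q50, q75])

def pca_project(F, n_components=3):
    """Project fingerprints (n_states, 20) onto their first principal components."""
    Fc = F - F.mean(axis=0)
    _, _, Vt = np.linalg.svd(Fc, full_matrices=False)
    return Fc @ Vt[:n_components].T
```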

2.6. Analysis of Substrate Utilization of Tunnels

The time evolution of the distances between DBE and the bottleneck residues as well as the catalytic machinery described above (Figure S22) over the entire set of trajectories was used to estimate the approximate position of the ligand within the tunnel network. By tracking changes in this relative position, movement through a particular tunnel was assigned where possible. The approximate utilization of the tunnels was thus estimated across the investigated schemes by analyzing the transitions between subsequent positions. The procedure was composed of three stages, as follows.

i) Position Assignment

First, the bottleneck closest to the DBE molecule in a given frame was identified. This information was used to define the approximate length of the closest tunnel (i.e., the distance between the COM of the catalytic machinery and the COM of that bottleneck). These two distances were contrasted with the distance between the ligand and the catalytic machinery, which together identified the approximate ligand position. Importantly, at this point, two additional parameters were introduced to classify the ligand position: bt_cutoff_along = 2.0 Å, which defines the region around the bottleneck along the tunnel (i.e., whether the ligand is in the bulk, the bottleneck region, or the tunnel), and bt_cutoff_across = 5.0 Å, which defines whether a ligand within the bottleneck region is not too far from the bottleneck laterally. Given these three distances and cutoffs, the following scenarios and corresponding ligand states were considered:

  • Bulk (out_): Ligand is farther from the active site than the sum of the tunnel length and bt_cutoff_along.

  • Bottleneck (bt_): Ligand is within the bottleneck region, i.e., within bt_cutoff_along of the tunnel length on either side, and within bt_cutoff_across of the bottleneck.

  • Unknown bottleneck (bt_unknown): Ligand is within the bottleneck region, i.e., within bt_cutoff_along of the tunnel length on either side, but farther than bt_cutoff_across from the bottleneck.

  • Inside (in_): Ligand distance to the catalytic machinery is shorter than the tunnel length reduced by bt_cutoff_along.
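The position-assignment rules above can be condensed into a small Python function. It assumes that, in each frame, one has the ligand distance to the catalytic machinery and, for every tunnel, the tunnel length and the ligand–bottleneck distance, and that bt_cutoff_across is tested against the ligand–bottleneck COM distance; the state-label format (e.g., "in_p1a") and the function name are our own illustrative choices.

```python
def classify_position(d_ligand, tunnels, bt_cutoff_along=2.0, bt_cutoff_across=5.0):
    """Assign a coarse ligand state for one frame.

    d_ligand: distance (Å) between the ligand COM and the catalytic-machinery COM.
    tunnels: {name: (tunnel_length, d_bottleneck)}, where tunnel_length is the
        catalytic-machinery-to-bottleneck COM distance and d_bottleneck is the
        ligand-to-bottleneck COM distance (Å).
    """
    # Use the bottleneck closest to the ligand in this frame.
    name, (length, d_bt) = min(tunnels.items(), key=lambda item: item[1][1])
    if d_ligand > length + bt_cutoff_along:
        return f"out_{name}"   # bulk, beyond the bottleneck region
    if d_ligand < length - bt_cutoff_along:
        return f"in_{name}"    # inside the protein, below the bottleneck region
    # Within bt_cutoff_along of the tunnel length: the bottleneck region.
    if d_bt <= bt_cutoff_across:
        return f"bt_{name}"
    return "bt_unknown"        # bottleneck region, but laterally far from this bottleneck
```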

ii) Transition Detection and Classification

Given the states defined for each frame, transitions from the bulk (out_) to the interior (in_) and vice versa were identified. Transitions via bottleneck regions (in_bt_out or out_bt_in) were also considered. When the tunnels assigned to the two sides of an in_out/out_in transition did not match, we applied an additional parameter, dist_tolerance = 1.0 Å. This parameter defines the tolerance distance within which the classification of one side of the transition can be swapped, thereby favoring the tunnel observed in the bottleneck region in scenarios where such an intermediate state was observed. The transitions were tracked as follows:

  • If the transition occurred directly from the bulk to the inside or from the inside to the bulk, the in_out/out_in transition was assigned, applying dist_tolerance in cases where the two sides mismatched.

  • If the ligand moved from the interior to the bottleneck region or from the bulk to the bottleneck region, the transition was not assigned; only the information regarding the temporary state was updated.

  • If the temporary state was a bottleneck and the closest tunnel changed, the transition was not assigned; only the bottleneck temporary state was updated.

  • If the temporary state was a bottleneck and the ligand moved to the same general state but related to a different tunnel, the transition was not assigned; only the general state was updated.

  • If the temporary state was a bottleneck and the ligand moved to the other general state (from in_ to out_ or from out_ to in_), the transition was assigned. In addition, information about the bottleneck used was collected, applying dist_tolerance in cases where the two sides mismatched, thus favoring the tunnel of the assigned bottleneck state.
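A simplified Python sketch of this transition-tracking logic follows. It consumes per-frame labels such as "in_p1a", "bt_p3", or "out_p2", remembers the last general state and any bottleneck visited since then, and reports each in↔out crossing together with the bottleneck passed (None for direct transitions). The dist_tolerance correction for mismatched sides is deliberately omitted, clearing the temporary bottleneck when the ligand returns to the same general state is our reading of the rules, and the label format and function name are hypothetical.

```python
def detect_transitions(states):
    """Return (start, bottleneck, end) tuples for each in<->out crossing.

    states: per-frame labels such as "in_p1a", "bt_p3", "bt_unknown", or "out_p2".
    bottleneck is None for direct transitions without an intermediate bottleneck frame.
    """
    transitions = []
    side = None        # last general state (in_... or out_...)
    bottleneck = None  # temporary bottleneck state seen since `side` was last set
    for state in states:
        if state.startswith("bt_"):
            bottleneck = state  # update the temporary state only; nothing assigned yet
            continue
        general = state.split("_", 1)[0]
        if side is None or general == side.split("_", 1)[0]:
            side, bottleneck = state, None  # same general state, possibly another tunnel
            continue
        transitions.append((side, bottleneck, state))  # crossed between in_ and out_
        side, bottleneck = state, None
    return transitions
```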

iii) Characterization of Tunnel Utilization

Finally, all types of unique transitions were counted across all simulations of each scheme and averaged over the three replicates performed for each scheme. Importantly, we applied the following classification to assign transitions to particular categories:

  • Tunnel (p1a, p1b, p2, and p3) – all transitions that passed through the bottleneck of a particular tunnel or the direct transitions in_out or out_in related to the same tunnel on both sides.

  • Mixed – all direct in_out or out_in transitions where the two sides differed despite the applied distance tolerance.

  • Unknown – all transitions that crossed through the unknown bottleneck.
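The categorization and replicate averaging can likewise be sketched in Python, assuming each transition is stored as a (start, bottleneck, end) tuple of hypothetical state labels such as ("in_p1a", "bt_p1a", "out_p1a"), with None marking a direct transition. The sketch counts every transition and does not reproduce any deduplication implied by "unique transitions"; the function names are illustrative.

```python
from collections import Counter

def tunnel_of(state):
    """Tunnel name from a label such as 'in_p1a' or 'bt_p3'."""
    return state.split("_", 1)[1]

def categorize(transition):
    """Assign a transition to a tunnel, 'Mixed', or 'Unknown'."""
    start, bottleneck, end = transition
    if bottleneck is not None:
        return "Unknown" if bottleneck == "bt_unknown" else tunnel_of(bottleneck)
    if tunnel_of(start) == tunnel_of(end):
        return tunnel_of(start)  # direct transition, same tunnel on both sides
    return "Mixed"               # direct transition, sides disagree

def average_utilization(replicates):
    """replicates: one list of transitions per ASMD replicate -> mean count per category."""
    totals = Counter()
    for transitions in replicates:
        totals.update(categorize(t) for t in transitions)
    return {category: n / len(replicates) for category, n in totals.items()}
```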

3. Results and Discussion

Overall, ∼900 MD trajectories (450,000 frames) with an aggregate simulation time of 45 μs were produced by each ASMD run of the LinB86–DBE complexes generated for the studied schemes (Table S3). For each scheme, ASMD was performed in three replicates (∼135 μs in total) to evaluate its ability to consistently describe the entire transport process, focusing on the convergence among ASMD replicates, the degree of quantitative agreement with experimental data, and the ability to account for transport via all known tunnels. These ASMD simulations featured stable protein conformations, with the unstructured N-terminal region being the most mobile part (Figures S23 and S24).

3.1. Capturing DBE Association and Dissociation Processes in LinB86

To assess the applicability of the studied schemes, we first investigated how effectively each scheme sampled the end points of the processes, i.e., the bound state of DBE in the active site cavity of LinB86 and the unbound state in the bulk solvent. These states can be defined by the distance of the DBE COM from the COM of three catalytic residues located at the bottom of the cavity (Figure S22A): bound states lie within 5 Å, whereas the unbound state corresponds primarily to distances above 19 Å, which exceed the length of the longest tunnels in LinB86 (Figure 1C). DBE adopted the unbound state for a substantial fraction of the cumulative ASMD trajectories in all schemes and replicates (Figure 2). Curiously, some of the ASMDs, particularly but not exclusively those originating from the Bulk scheme, exhibited a region of high DBE occurrence probability centered at approximately 25 Å from the catalytic residues (Figure 2A). Structurally, this region corresponded to DBE bound at the surface cleft at the C-terminus of LinB86 (Figure S25). Even in the Cavity scheme, initiated with DBE bound deep in the active site, the substrate reached the bulk solvent, with at least 12% of configurations in the unbound state (Figure 2B). Over 10,000 configurations in the unbound state were generated after at most seven epochs of ASMD simulations (Figure 3). As expected, the unbound states were most prevalent (>31%) in the simulations seeded with DBE placed in the bulk solvent around the enzyme (Bulk scheme).
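The region assignment underlying these fractions can be approximated with a short NumPy sketch that bins DBE–catalytic-residue COM distances into the active site (≤5 Å), tunnel (5–19 Å), and bulk (>19 Å) regions and reports their fractions. Assigning distances of exactly 5 Å and 19 Å to the active site and tunnel, respectively, is our assumption, as is the function name.

```python
import numpy as np

def region_fractions(distances, active_site_max=5.0, tunnel_max=19.0):
    """Fractions of frames with DBE in the active site, a tunnel, or the bulk solvent."""
    d = np.asarray(distances, dtype=float)
    active = d <= active_site_max
    bulk = d > tunnel_max
    tunnel = ~active & ~bulk  # everything between the two cutoffs
    return {name: float(mask.mean())
            for name, mask in (("active site", active), ("tunnel", tunnel), ("bulk", bulk))}
```

Multiplying each fraction by the number of frames (about 15,000 per epoch here) gives the configuration counts tracked per epoch.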

Figure 2.


Substrate (un)binding to the active site of LinB86 captured by ASMD simulations with four seeding schemes. (A) The distance distribution of DBE to catalytic residues for three replicated ASMDs for each seeding scheme. The regions corresponding to DBE in the active site (0–5 Å), shortest (p1b, 5–14 Å) and longest (p2, 14–19 Å) tunnel lengths (Figure 1C), and bulk solvent (>19 Å) are highlighted as gold, pink, shaded pink, and white, respectively. The distances are those between the COM of DBE and the COM of catalytic residues (N38, D108, and W109), measured in 45-μs ASMD simulations. (B) The fraction of DBE seen in individual regions.

Figure 3.


An epoch-wise sampling of the active site, tunnel, and bulk regions of LinB86 by DBE. Each epoch most often comprises 30 separate 50 ns-long production simulations, aggregating about 15,000 frames.

Concerning the ability of ASMD to reach the bound pose of DBE in the buried active site of LinB86, all schemes except Bulk consistently sampled the bound states in all three replicates. In the Bulk scheme, DBE found a path to the active site in replicate1, yielding 4% of the sampled configurations in the bound state (Figures 2 and 3), with a substantial ensemble of more than 1,000 bound configurations sampled by the fifth epoch (Figure 3). However, no bound state was observed in the other two replicated ASMDs of the Bulk scheme (Figures 2 and 3). This was expected, as unbiased simulations of ligand association are generally rather time-consuming, even for less complex systems.69–71

Among the remaining schemes, Cavity ASMDs exploited the bound states most frequently, as expected from the initial seeding with docked poses of DBE (Figure 2). This setup led to the accumulation of over 5,000 bound configurations during the first epoch in all three replicates (Figure 3). This behavior was partially retained in the Cavity&Bulk scheme, where more than 2,000 bound configurations were systematically observed in the first epoch. Here, the additional seeds of DBE placed in the bulk solvent resulted in considerable sampling of more than 10,000 unbound configurations within the first three epochs, about twice as fast as in the pure Cavity scheme (Figure 3). Finally, ASMDs from the Tunnels scheme explored these three regions most consistently, with DBE localized primarily in regions corresponding to transport tunnels (Figures 2 and 3). Given the sufficient coverage of bound and unbound states, we proceeded to construct MSMs from the assembled trajectories and to calculate kinetic parameters describing the (un)binding processes. Due to the lack of bound states in most Bulk replicates, these ASMDs were not used to construct MSMs.

3.2. Identifying Metastable States of DBE Interacting with LinB86 and Predicting Kinetic Parameters from MSMs

To further compare the seeding schemes in terms of the diversity and consistency of the identified metastable states, we generated MSMs from the individual ASMD replicates. These MSMs consisted of three to six metastable states for the Cavity (Figures S26–S28) and Cavity&Bulk (Figures S29–S31) schemes, whereas six to eight metastable states were identified in MSMs from the Tunnels scheme (Figures S32–S34). To establish the mutual correspondence among these states across all generated MSMs, we generated 1,000 representative structures of each metastable state and measured the distances of DBE to the catalytic residues and to the bottlenecks of the known transport tunnels in LinB86 (Figure S22). These distances represent fingerprints of the metastable states (Figures S35–S37), clearly identifying not only unbound and bound states but also their alignment with individual transport tunnels.

Finally, these unified fingerprints enabled us to cluster the metastable states (Figure S38), forming unified nonredundant ligand states (ULSs) across all MSMs (Figure 4A). The only state present in all replicates of every seeding scheme (Figure 4B) was ULS1, which corresponded to DBE in the bulk solvent. ULS2–ULS5 all represented metastable states of DBE bound inside the active site cavity. DBE was bound closest to the catalytic residues in ULS2, which was found only in the MSMs of the Cavity scheme. In ULS3, the substrate was located closer to the cavity center, whereas in ULS4 and ULS5, it was located near the exit of the cavity toward the p1 or p3 tunnels. ULS6–ULS9 featured DBE bound on the LinB86 surface at the entrances to the p3, p2a, p2c, and p2d tunnels, with the p3 tunnel entrance (ULS6) being the most prevalent across the MSMs (Figure 4B). Curiously, in replicate2 of the Cavity&Bulk scheme, we observed several metastable states forming ULS10, in which DBE explored a cryptic pocket located back-to-back with the canonical active site cavity of LinB86, with its entrance on the opposite side of the enzyme relative to the p1 tunnel entrance.

Figure 4.


Inference about the (un)binding process of DBE to LinB86 from the MSM analysis. (A) Structurally unified ligand states (ULSs) identified among all metastable states (Figures S25–S33) resolved by MSM analysis of three replicated ASMD simulations initiated from the studied seeding schemes. Protein structures are shown as a gray cartoon, whereas the region occupied by the DBE molecule in 20% (1% for the bulk solvent state) of the 1,000 structures representing a given ULS is shown as a red surface. (B) Presence of ULSs among the metastable states in each ASMD replicate, with their average probabilities. The unbound and bound metastable states used as source and sink states in the mean first-passage time analyses are highlighted (Table S4). (C) Average equilibrium dissociation constants derived from MSMs as the ratio of dissociation and association rates (Table S5). The experimental kd was obtained from ref 75. The data represent mean ± s.e.m. from the three replicates.

Some of the identified ULSs were also observed in a recent study of transient binding sites on wild-type LinB conducted with seven halogenated compounds, including DBE.72 Of the nine sites reported there, three could be matched to ULSs: (i) site 5 corresponded to ULS6, the entrance to the p3 tunnel; (ii) site 9 overlapped with ULS9, the entrance to the p2c tunnel; and (iii) site 4 aligned with ULS8, the entrance to the p2d tunnel. This agreement suggests that these interaction sites are conserved between wild-type LinB and the LinB86 mutant despite the substitutions introduced into the p1 and p3 tunnels of the mutant. Regarding the identification of ULSs across replicated MSMs, the Tunnels scheme exhibited the best consistency: four ULSs were found in all three replicates, and the remaining two ULSs were found in two replicates. In contrast, only the unbound ULS1 was consistently found in the Cavity and Cavity&Bulk schemes. In fact, these two schemes frequently produced singleton ULSs, i.e., ULSs present in only one replicate.
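
Counting how consistently each ULS recurs across replicates, and which ULSs are singletons, reduces to simple set bookkeeping. A sketch, assuming each replicate MSM has already been mapped to a set of ULS labels; the example labels below are hypothetical and are not the data of Figure 4B.

```python
from collections import Counter


def uls_consistency(replicates):
    """Count in how many replicates each ULS occurs.

    `replicates` is a list of sets of ULS labels, one set per replicate MSM,
    so each ULS is counted at most once per replicate.
    Returns (counts, ULSs found in every replicate, singleton ULSs).
    """
    counts = Counter(uls for rep in replicates for uls in rep)
    n = len(replicates)
    always = sorted(u for u, c in counts.items() if c == n)
    singletons = sorted(u for u, c in counts.items() if c == 1)
    return counts, always, singletons


# Hypothetical example: three replicates of one seeding scheme.
example = [{"ULS1", "ULS3", "ULS6"}, {"ULS1", "ULS6", "ULS7"}, {"ULS1", "ULS6"}]
counts, always, singletons = uls_consistency(example)
# always == ["ULS1", "ULS6"]; singletons == ["ULS3", "ULS7"]
```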

Finally, taking as the bound state in each MSM the metastable state with the largest population of DBE within 5 Å (COM distance) of the catalytic residues (Table S4; Figure 4B), we calculated the equilibrium dissociation constant (kd) from the DBE association (kon) and dissociation (koff) rates predicted from the MSMs by mean first-passage time analyses. All three schemes exhibited comparable average association and dissociation rates for DBE (p-values >0.12), with the Cavity&Bulk scheme consistently having the lowest relative error. In contrast, the Tunnels scheme exhibited the largest relative errors for these rates; however, these errors partially canceled when computing kd (Table S5). Interestingly, the kd values computed from all schemes agreed with the experimentally determined value within an order of magnitude (Figure 4C). However, only the kd value obtained from the Cavity&Bulk scheme was not statistically different from the experimental value (Table S7). Regarding the convergence of the computed kd, the Cavity scheme exhibited the largest relative error, exceeding 30% (Table S5). In general, such agreement between predicted and experimental kinetic data is still uncommon, even for much less complex biomolecular systems.73,74
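
The route from an MSM to kd can be sketched under stated assumptions. Mean first-passage times (MFPTs) into a target set follow from the transition matrix T at lag time τ by solving (I − T_AA) m = τ·1 over the non-target states A; the rates are then taken as kon = 1/(MFPT_on·[L]) and koff = 1/MFPT_off, where [L] is the ligand concentration in the simulation box, so that kd = koff/kon. This is a common convention for MSM-derived binding kinetics, but the exact definitions used in this study (e.g., how [L] was obtained, or whether source and sink are microstates or metastable sets) are assumptions here, and the sketch uses plain Python rather than an MSM library.

```python
def solve(a, b):
    """Solve the linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def mfpt(T, source, targets, lag):
    """MFPT (in units of `lag`) from state `source` into the set `targets`,
    for a row-stochastic transition matrix T estimated at lag time `lag`."""
    if source in targets:
        return 0.0
    others = [i for i in range(len(T)) if i not in targets]
    # Outside the targets: m_i = lag + sum_j T_ij m_j, i.e. (I - T_AA) m = lag.
    a = [[(1.0 if i == j else 0.0) - T[i][j] for j in others] for i in others]
    times = solve(a, [lag] * len(others))
    return times[others.index(source)]


def binding_constants(mfpt_on, mfpt_off, ligand_conc):
    """Return (kon, koff, kd) with kon = 1/(MFPT_on * [L]) and koff = 1/MFPT_off.

    With times in s and [L] in mol/L: kon in M^-1 s^-1, koff in s^-1, kd in M.
    """
    kon = 1.0 / (mfpt_on * ligand_conc)
    koff = 1.0 / mfpt_off
    return kon, koff, koff / kon
```

For a three-state chain with T = [[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]], mfpt(T, 0, {2}, 1.0) gives 20 lag times (up to rounding), which can be checked by hand from m0 = 1 + 0.8 m0 + 0.2 m1 and m1 = 1 + 0.1 m0 + 0.8 m1. Note that kd = MFPT_on·[L]/MFPT_off, so relative errors in the two MFPTs can partially cancel, as observed for the Tunnels scheme.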

3.3. Exploration of Different Transport Paths by the Substrate and Its Ability to Transit across Their Bottlenecks

Next, we investigated how the substrate DBE utilized the individual transport pathways of LinB86. Initially, we attempted to match the substrate migration traces to the tunnel ensembles using the TransportTools library.76 However, we observed very few complete migration events of DBE between the bulk solvent and the active site of LinB86 (Table S8), with replicate2 of the Cavity scheme capturing 33 transport events of DBE via known tunnels. Because such sparse data did not allow sufficient inference, we considered a simplified transition of DBE through the tunnel bottleneck only, which corresponds to the least favorable region along the migration path and, hence, controls the transport rates.9,69,75 Using the distances of DBE to the COM of the bottleneck residues of each tunnel and to the bottom of the active site cavity (Figure S22), we traced the location of DBE in all simulations, focusing on frames in which DBE approached any of the bottlenecks and on whether it passed through them.
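
The bottleneck-transit bookkeeping can be sketched as a small state machine over per-frame distances. In this hypothetical version, a frame is assigned to the bottleneck when DBE is within `gate_cut` of the bottleneck COM, to the cavity side when it is within `in_cut` of the cavity bottom, and to the outer side beyond `out_cut`; a transit is counted when DBE switches sides after visiting the bottleneck. All cutoff values are placeholders and are not taken from this study.

```python
def count_bottleneck_transits(d_bottleneck, d_cavity,
                              gate_cut=4.0, in_cut=8.0, out_cut=16.0):
    """Count DBE passages through one tunnel bottleneck.

    d_bottleneck, d_cavity: per-frame distances (Å) of DBE to the bottleneck
    COM and to the cavity bottom. Returns (n_exits, n_entries).
    """
    side = None        # last committed side: "in" or "out"
    via_gate = False   # whether the bottleneck was visited since then
    exits = entries = 0
    for db, dc in zip(d_bottleneck, d_cavity):
        if db < gate_cut:
            via_gate = True
            continue
        if dc < in_cut:
            new_side = "in"
        elif dc > out_cut:
            new_side = "out"
        else:
            continue  # ambiguous frame: neither side nor bottleneck
        if side is not None and new_side != side and via_gate:
            if new_side == "out":
                exits += 1
            else:
                entries += 1
        side, via_gate = new_side, False
    return exits, entries
```

Requiring a bottleneck visit between side changes is what attributes a passage to a particular tunnel; a ligand that leaves the cavity without approaching any monitored bottleneck would end up in an "unknown" category, analogous to the unknown transitions discussed below.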

Investigating the transport via particular tunnels in all schemes revealed the following. The Tunnels scheme yielded the highest total number of transitions, followed by the Cavity and Cavity&Bulk schemes, with the Bulk scheme yielding the lowest (Table S9). The proportion of transitions through each tunnel relative to the total number of transitions was consistent across all schemes. The most frequently used tunnel was p2, followed by p1b, p1a, and p3, whereas mixed and unknown transport routes represented the smallest fraction of the data (Figure 5A). We visually inspected the unknown transitions to check whether they corresponded to the use of a new tunnel. These transitions were primarily located in the vicinity of known tunnels but beyond the distance cutoffs used. However, among the unknown transitions in replicate2 of the Cavity&Bulk scheme, we also observed DBE entering the back-side cryptic pocket, i.e., ULS10 (Figure 4A).

Figure 5.


Tunnel utilization by DBE in the investigated seeding schemes. (A) Relative utilization of particular tunnels for each scheme. (B) Average utilization per tunnel. (C) Average tunnel utilization per scheme. The data in B and C represent mean ± stdev from the three replicates.

Beyond this consistent trend in tunnel usage, the Cavity&Bulk and Tunnels schemes displayed a higher proportion of p2 tunnel transitions than the remaining schemes (Figure 5A), suggesting that the more complex seeding schemes enabled relatively more efficient exploration of the longest and most complex branches of the p2 tunnel. In contrast, the simpler schemes (Bulk and Cavity) tended to promote sampling of the more accessible primary conduit (i.e., the p1 tunnel), in agreement with its preferential utilization observed in the TransportTools analyses of replicate2 of the Cavity scheme (Table S8). Apart from this difference, the higher numbers of transitions in particular schemes stemmed mainly from proportionally increased sampling of each tunnel (Figure 5B). Importantly, given the standard deviations across the three replicates of each scheme, the Tunnels scheme clearly had the highest consistency of all tested schemes for all tunnels. This was particularly evident when the transitions were considered for each run separately (Table S9). Whereas the total number of transitions for the Bulk scheme differed markedly among replicates (1820, 49, and 1006 transitions for replicate1, replicate2, and replicate3, respectively), the Tunnels scheme showed the lowest deviation among its replicates (2809, 1680, and 2968 transitions, respectively). The Cavity and Cavity&Bulk schemes fell between these two extremes and had similar consistency to each other (Figure 5C).
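
Using the per-replicate totals quoted above, one way to compare consistency between schemes that differ in their average number of transitions is the coefficient of variation (sample standard deviation divided by the mean). This sketch with the standard-library `statistics` module is an illustrative choice of metric, not the analysis behind Figure 5C.

```python
from statistics import mean, stdev

# Total bottleneck transitions per replicate (replicate1-3), as reported above.
totals = {
    "Bulk": [1820, 49, 1006],
    "Tunnels": [2809, 1680, 2968],
}


def spread(values):
    """Return mean, sample standard deviation, and coefficient of variation."""
    m = mean(values)
    s = stdev(values)
    return m, s, s / m


for scheme, values in totals.items():
    m, s, cv = spread(values)
    print(f"{scheme}: mean {m:.0f}, sd {s:.0f}, CV {cv:.2f}")
```

With these numbers, the coefficient of variation is roughly 0.9 for the Bulk scheme and roughly 0.3 for the Tunnels scheme, so the Tunnels replicates vary much less relative to their mean.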

3.4. Lessons Learned

By contrasting the performance of individual seeding schemes from different perspectives, we identified several benefits and limitations. First, by infusing more knowledge into the initial seeds, we can drive ASMDs to sample the relevant protein–ligand space, enabling appreciable agreement with experiment (Figure 4C). For the Cavity scheme, however, this was possible because LinB86 does not bind the DBE substrate too tightly, which allows sufficient exploration of the unbinding process. We would therefore not expect this scheme to perform well for more tightly bound systems such as trypsin–benzamidine.33,35 The more knowledge-based schemes, especially Tunnels, exhibited notably higher reproducibility and faster convergence in the proportion of explored regions (Figures 2, 3, and 4C and Table S5). This consistency indicates that even ASMD setups with fewer epochs might provide useful insights into transport processes without requiring intensive computational effort. For illustration, a 45-μs ASMD run consisting of 30 epochs with 900 simulations, representing a typical use case for investigating similar systems, required approximately 9 days of computation on 18 RTX 2080 GPUs.
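
The quoted computational cost can be unpacked with simple arithmetic. The sketch assumes that the 900 simulations are the total over all 30 epochs and treats the per-trajectory length as an average; the resulting throughput is an effective figure that also absorbs any overhead between epochs (e.g., MSM rebuilding).

```python
# Figures quoted in the text.
total_ns = 45_000          # 45 μs aggregate simulation time
epochs, simulations = 30, 900
days, gpus = 9, 18

# Assumption: 900 is the total number of simulations across all epochs.
sims_per_epoch = simulations / epochs          # 30 simulations per epoch
ns_per_sim = total_ns / simulations            # 50 ns per trajectory on average
ns_per_gpu_day = total_ns / (days * gpus)      # ~278 ns per GPU per day
```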

For studying the utilization of the multiple tunnels in LinB86, the Tunnels scheme clearly promoted in-depth exploration of the tunnel regions (Figures 2, 3, and 5), providing the most complete overview of tunnel usage by DBE. Therefore, the Tunnels scheme would be particularly beneficial when investigating proteins with complex tunnel networks. However, we also noted a clear drawback: when information-rich and hard-to-explore tunnel networks are seeded initially, the exploration preference of ASMD results in rather limited sampling of the active site and bulk regions. This suggests that a more hybrid seeding of the tunnels, with some seeds also placed in the active site cavity and in bulk regions, could yield more balanced sampling.
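
The hybrid seeding suggested above could, for example, be set up by splitting a fixed number of seeds among regions in chosen proportions. The region names and weights below are hypothetical, and largest-remainder rounding is just one way to make the integer counts sum to the requested total.

```python
from math import floor


def allocate_seeds(n_seeds, weights):
    """Split n_seeds across regions proportionally to `weights`.

    Uses largest-remainder rounding so that the counts sum to n_seeds.
    """
    total = sum(weights.values())
    quotas = {region: n_seeds * w / total for region, w in weights.items()}
    counts = {region: floor(q) for region, q in quotas.items()}
    leftover = n_seeds - sum(counts.values())
    # Hand out the remaining seeds to regions with the largest fractional parts
    # (ties keep the input order, since Python's sort is stable).
    ranked = sorted(quotas, key=lambda name: quotas[name] - counts[name], reverse=True)
    for region in ranked[:leftover]:
        counts[region] += 1
    return counts


# Hypothetical split favoring tunnels while keeping some cavity and bulk seeds:
# allocate_seeds(30, {"tunnels": 14, "cavity": 3, "bulk": 3})
# -> {"tunnels": 21, "cavity": 5, "bulk": 4}
```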

Despite its beneficial properties, ASMD with Tunnels seeding has limitations originating from its dependence on tunnel detection in the preseeding simulation(s) (Text S1). Placing ligands in the originally identified tunnels prioritizes their sampling, and only after the active site or unbound regions are visited will ASMD progress to explore these regions, including other potentially overlooked tunnels. Hence, the discovery of other tunnels might be delayed by the thorough exploration of the seeded ones. This limitation could, to some extent, be overcome by analyzing the tunnels with a small probe, as small as 0.7 Å, which has been shown to enable the identification of even potential tunnels that could form after gain-of-function mutations or be induced by ligand binding.10,77 In this study, we used a probe radius of 0.9 Å, which identified the known tunnels of LinB86 sufficiently well. However, even probes smaller than 0.7 Å could readily be applied without incurring extreme computational costs by using the Divide-and-Conquer approach.78
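
The effect of the probe radius can be illustrated with a simplified criterion: a tunnel counts as detectable in a snapshot only if its narrowest constriction is at least as wide as the probe. The sketch assumes a per-snapshot bottleneck radius is already available for a putative pathway; it is not how CAVER or any other tool computes tunnels, and the example radii are hypothetical.

```python
def detection_frequency(bottleneck_radii, probe_radius):
    """Fraction of snapshots in which a pathway would be detectable,
    assuming detection requires bottleneck radius >= probe radius (Å)."""
    if not bottleneck_radii:
        return 0.0
    open_frames = sum(1 for r in bottleneck_radii if r >= probe_radius)
    return open_frames / len(bottleneck_radii)


# Hypothetical bottleneck radii (Å) of one pathway over six snapshots:
# [0.6, 0.75, 0.8, 0.95, 1.1, 1.3]
# probe 0.9 Å -> 0.5 of snapshots; probe 0.7 Å -> 5/6 of snapshots
```

Under this criterion, lowering the probe radius can only keep or raise the detection frequency, which is why smaller probes reveal narrow or transient pathways that a larger probe would miss.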

Seed generation based on rather short standard MD simulation(s), as applied here, can be expected to capture relevant tunnels gated primarily by side chains but will be less appropriate for proteins requiring pronounced conformational changes to open tunnels and facilitate ligand migration. Although molecular gates formed only by side chains are the most prevalent,9,10 for systems exhibiting larger conformational transformations, the tunnels could be identified using enhanced sampling methods such as Gaussian accelerated MD simulations, which have been shown to be applicable for this purpose.79 Similarly, moderately sized ligands would most probably fit into some of the tunnel instances in the ensemble generated by the apo simulations performed here. However, larger ligands would require either seeding with the Cavity&Bulk scheme or their explicit inclusion in the simulations, using computationally demanding biasing methods capable of inducing ligand-driven changes in tunnel geometry, akin to the IterTunnel approach,80 which, however, could produce unrealistic tunnels when applied incorrectly.81

Finally, because ASMD does not use any form of energy biasing, regardless of the seeding scheme, there will certainly be protein–ligand systems whose energy barriers are too high for ligand transitions to be observed with the underlying standard MD simulation protocols, despite significant computational effort. In those instances, the Tunnels seeding scheme could be employed to obtain input structures for enhanced sampling methods, such as umbrella sampling,82 which could be used to sample these high-energy regions; the results could then be integrated via multiensemble MSMs to attain a comprehensive view of ligand migration in the given protein system.38

4. Conclusions

This study aimed to test the effect of different seeding schemes on the sampling of metastable states of LinB86–DBE complexes in MSM-driven ASMD simulations. The simulations provided meaningful insights into the kinetic rates and mechanisms of the transport of the substrate DBE between the deeply buried active site of LinB86 and the solvent via multiple transport tunnels. Four seeding schemes were designed to position DBE using increasing amounts of prior knowledge to tackle the sampling of regions with higher energy barriers. The ensuing ASMD simulations yielded kinetic models with different levels of detail depending on the seeding scheme employed. All simulations explored the entire transport process, visiting both unbound and bound states, except for the Bulk scheme, which could not reach the bound state in two of its three 45-μs ASMD replicates. The Tunnels scheme was the most consistent in sampling different metastable states of the substrate in the transport-relevant regions. Application of the more information-rich Tunnels and Cavity&Bulk schemes led to enhanced exploration of the auxiliary p2 and p3 tunnels, whereas the primary p1 tunnel was preferred in ASMDs initiated from the other two schemes. The Tunnels and Cavity&Bulk schemes also provided the most converged kd values computed from the rates of DBE association and dissociation, which were sufficiently close to the experimental measurements (within an order of magnitude) despite the complexity of the kinetic models.
We expect that a methodology analogous to the one used for seeding in the Tunnels scheme (Text S1) could be beneficial for identifying relevant protein–ligand states along likely migration paths and subsequently for defining suitable collective variables for enhanced sampling methods targeting ligand migration, such as metadynamics21,22 and umbrella sampling.82 Overall, the infusion of more knowledge into the initial seeds of ASMD simulations could render computational analyses of transport mechanisms in enzymes more consistent, even for very complex biomolecular systems. This has clear potential to translate into faster rational protein design and drug development.

Acknowledgments

This work was supported by the National Science Centre, Poland (grant no. 2017/26/E/NZ1/00548). D.K.S. and B.S. received scholarships from POWER projects POWR.03.02.00-00-I006/17 and POWR.03.02.00-00-I022/16, respectively. The simulations were performed at the Poznan supercomputing center.

Data Availability Statement

The underlying data for this study are available in the published article, the Supporting Information, and the Zenodo repository at 10.5281/zenodo.10849386. The data include Python scripts, binary files, plain text, and PDB-formatted and AMBER-formatted structural data, all compatible with various freely available software packages. No tools with restricted access are required.

Supporting Information Available

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jctc.4c00452.

  • Properties of tunnels identified in the LinB86 simulation. Localization of the DBE molecule around the LinB86 enzyme used as initial seeds in the four investigated schemes. Protocol describing the MD simulation used for tunnel calculations during seeding with the Tunnels scheme. DBE migration energy barrier profiles for ensembles of the investigated tunnels. Scheme showing the creation of the composite tunnel used for DBE seeding. Implied time scales of the generated MSMs, corresponding spectral separation analysis, and Chapman–Kolmogorov tests for all replicates of the Cavity, Cavity&Bulk, and Tunnels schemes. Visualization of the locations of the COM of residues used to characterize DBE locations within the LinB86 structure and their RMSFs for all scheme replicates. Statistics on the successfully completed ASMDs for all scheme replicates. Protein RMSD and RMSF calculated from ASMD simulations for all scheme replicates. Visualization of the region preferentially occupied by the DBE molecule at a distance of approximately 25 Å from the catalytic residues. Visualization of metastable states and characteristic distances of their representative structures for all replicates of the Cavity, Cavity&Bulk, and Tunnels schemes. Clustering of metastable states from all MSM analyses into ULSs. Association and dissociation rates derived from MSMs for the Cavity, Cavity&Bulk, and Tunnels schemes. Configurations in metastable states with DBE within 5 Å of the catalytic residues. Kinetic rates of DBE with LinB86 derived from MSMs. Sensitivity of calculated koff/kon values to the bound state selection. Statistical comparison between experimental and calculated kd values. Number of substrate migration events obtained from TransportTools analyses for all replicates of the Cavity, Cavity&Bulk, and Tunnels schemes. Statistics of bottleneck crossings by the DBE molecule for all scheme replicates (PDF)

Author Contributions

Conceptualization: J.B.; Data curation: D.K.S.; Formal analysis: D.K.S., B.S., J.B.; Funding acquisition: J.B.; Investigation: D.K.S. (Simulations and MSMs), B.S. (Transitions and ULS clustering); Methodology: D.K.S. (Simulations and MSMs), B.S. (Transitions and ULS clustering); Project administration: J.B.; Resources: J.B.; Software: J.B.; Supervision: J.B.; Validation: J.B.; Writing–original draft: D.K.S. (Simulations and MSMs), B.S. (Transitions and ULS clustering); Writing–review and editing: J.B.

The authors declare no competing financial interest.

Supplementary Material

ct4c00452_si_001.pdf (6.7MB, pdf)

References

  1. Ansari N.; Rizzi V.; Parrinello M. Water Regulates the Residence Time of Benzamidine in Trypsin. Nat. Commun. 2022, 13, 5438. 10.1038/s41467-022-33104-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Gallicchio E.; Levy R. M. Recent Theoretical and Computational Advances for Modeling Protein-Ligand Binding Affinities. Adv. Protein Chem. Struct. Biol. 2011, 85, 27–80. 10.1016/B978-0-12-386485-7.00002-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Dror R. O.; Pan A. C.; Arlow D. H.; Borhani D. W.; Maragakis P.; Shan Y.; Xu H.; Shaw D. E. Pathway and Mechanism of Drug Binding to G-Protein-Coupled Receptors. Proc. Natl. Acad. Sci. U. S. A. 2011, 108, 13118–13123. 10.1073/pnas.1104614108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Silva D. A.; Bowman G. R.; Sosa-Peinado A.; Huang X. A Role for Both Conformational Selection and Induced Fit in Ligand Binding by the Lao Protein. PLoS Comput. Biol. 2011, 7, e1002054. 10.1371/journal.pcbi.1002054. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Buch I.; Giorgino T.; De Fabritiis G. Complete Reconstruction of an Enzyme-Inhibitor Binding Process by Molecular Dynamics Simulations. Proc. Natl. Acad. Sci. U. S. A. 2011, 108, 10184–10189. 10.1073/pnas.1103547108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Kokkonen P.; Bednar D.; Pinto G.; Prokop Z.; Damborsky J. Engineering Enzyme Access Tunnels. Biotechnol. Adv. 2019, 37, 107386. 10.1016/j.biotechadv.2019.04.008. [DOI] [PubMed] [Google Scholar]
  7. Pravda L.; Berka K.; Vařeková R. S.; Sehnal D.; Banáš P.; Laskowski R. A.; Koča J.; Otyepka M. Anatomy of Enzyme Channels. BMC Bioinf. 2014, 15, 379. 10.1186/s12859-014-0379-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Marques S. M.; Daniel L.; Buryska T.; Prokop Z.; Brezovsky J.; Damborsky J. Enzyme Tunnels and Gates As Relevant Targets in Drug Design. Med. Res. Rev. 2017, 37, 1095–1139. 10.1002/med.21430. [DOI] [PubMed] [Google Scholar]
  9. Gora A.; Brezovsky J.; Damborsky J. Gates of Enzymes. Chem. Rev. 2013, 113, 5871–5923. 10.1021/cr300384w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Brezovsky J.; Babkova P.; Degtjarik O.; Fortova A.; Gora A.; Iermak I.; Rezacova P.; Dvorak P.; Smatanova I. K.; Prokop Z.; Chaloupkova R.; Damborsky J. Engineering a de Novo Transport Tunnel. ACS Catal. 2016, 6, 7597–7610. 10.1021/acscatal.6b02081. [DOI] [Google Scholar]
  11. Decherchi S.; Cavalli A. Thermodynamics and Kinetics of Drug-Target Binding by Molecular Simulation. Chem. Rev. 2020, 120, 12788–12833. 10.1021/acs.chemrev.0c00534. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Zwier M. C.; Chong L. T. Reaching Biological Timescales with All-Atom Molecular Dynamics Simulations. Curr. Opin. Pharmacol. 2010, 10, 745–752. 10.1016/j.coph.2010.09.008. [DOI] [PubMed] [Google Scholar]
  13. Lee C. T.; Amaro R. E. Exascale Computing: A New Dawn for Computational Biology. Comput. Sci. Eng. 2018, 20, 18–25. 10.1109/MCSE.2018.05329812. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Bruce N. J.; Ganotra G. K.; Kokh D. B.; Sadiq S. K.; Wade R. C. New Approaches for Computing Ligand–Receptor Binding Kinetics. Curr. Opin. Struct. Biol. 2018, 49, 1–10. 10.1016/j.sbi.2017.10.001. [DOI] [PubMed] [Google Scholar]
  15. Chong L. T.; Saglam A. S.; Zuckerman D. M. Path-Sampling Strategies for Simulating Rare Events in Biomolecular Systems. Curr. Opin. Struct. Biol. 2017, 43, 88–94. 10.1016/j.sbi.2016.11.019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Ahmad K.; Rizzi A.; Capelli R.; Mandelli D.; Lyu W.; Carloni P. Enhanced-Sampling Simulations for the Estimation of Ligand Binding Kinetics: Current Status and Perspective. Front. Mol. Biosci. 2022, 9, 899805. 10.3389/fmolb.2022.899805. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Votapka L. W.; Stokely A. M.; Ojha A. A.; Amaro R. E. SEEKR2: Versatile Multiscale Milestoning Utilizing the OpenMM Molecular Dynamics Engine. J. Chem. Inf. Model. 2022, 62, 3253–3262. 10.1021/acs.jcim.2c00501. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Ahn S. H.; Jagger B. R.; Amaro R. E. Ranking of Ligand Binding Kinetics Using a Weighted Ensemble Approach and Comparison with a Multiscale Milestoning Approach. J. Chem. Inf. Model. 2020, 60, 5340–5352. 10.1021/acs.jcim.9b00968. [DOI] [PubMed] [Google Scholar]
  19. Miao Y.; Feher V. A.; McCammon J. A. Gaussian Accelerated Molecular Dynamics: Unconstrained Enhanced Sampling and Free Energy Calculation. J. Chem. Theory Comput. 2015, 11, 3584–3595. 10.1021/acs.jctc.5b00436. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Wang J.; Arantes P. R.; Bhattarai A.; Hsu R. V.; Pawnikar S.; Huang Y. M.; Palermo G.; Miao Y. Gaussian Accelerated Molecular Dynamics: Principles and Applications. WIREs Comput. Mol. Sci. 2021, 11, e1521. 10.1002/wcms.1521. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Bussi G.; Laio A. Using Metadynamics to Explore Complex Free-Energy Landscapes. Nat. Rev. Phys. 2020, 2, 200–212. 10.1038/s42254-020-0153-0. [DOI] [Google Scholar]
  22. Laio A.; Parrinello M. Escaping Free-Energy Minima. Proc. Natl. Acad. Sci. U. S. A. 2002, 99, 12562–12566. 10.1073/pnas.202427399. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Ribeiro J. M. L.; Provasi D.; Filizola M. A Combination of Machine Learning and Infrequent Metadynamics to Efficiently Predict Kinetic Rates, Transition States, and Molecular Determinants of Drug Dissociation from G Protein-Coupled Receptors. J. Chem. Phys. 2020, 153, 124105. 10.1063/5.0019100. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Plattner N.; Noé F. Protein Conformational Plasticity and Complex Ligand-Binding Kinetics Explored by Atomistic Simulations and Markov Models. Nat. Commun. 2015, 6, 7653. 10.1038/ncomms8653. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Kokh D. B.; Amaral M.; Bomke J.; Grädler U.; Musil D.; Buchstaller H. P.; Dreyer M. K.; Frech M.; Lowinski M.; Vallee F.; Bianciotto M.; Rak A.; Wade R. C. Estimation of Drug-Target Residence Times by τ-Random Acceleration Molecular Dynamics Simulations. J. Chem. Theory Comput. 2018, 14, 3859–3869. 10.1021/acs.jctc.8b00230. [DOI] [PubMed] [Google Scholar]
  26. Nunes-Alves A.; Kokh D. B.; Wade R. C. Ligand Unbinding Mechanisms and Kinetics for T4 Lysozyme Mutants from τ RAMD Simulations. Curr. Res. Struct. Biol. 2021, 3, 106–111. 10.1016/j.crstbi.2021.04.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Doerr S.; Harvey M. J.; Noé F.; De Fabritiis G. HTMD: High-Throughput Molecular Dynamics for Molecular Discovery. J. Chem. Theory Comput. 2016, 12, 1845–1852. 10.1021/acs.jctc.6b00049. [DOI] [PubMed] [Google Scholar]
  28. Pande V. S.; Beauchamp K.; Bowman G. R. Everything You Wanted to Know about Markov State Models but Were Afraid to Ask. Methods. 2010, 52, 99–105. 10.1016/j.ymeth.2010.06.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Husic B. E.; Pande V. S. Markov State Models: From an Art to a Science. J. Am. Chem. Soc. 2018, 140, 2386–2396. 10.1021/jacs.7b12191. [DOI] [PubMed] [Google Scholar]
  30. Arbon R. E.; Zhu Y.; Mey A. S. J. S. Markov State Models: To Optimize or Not to Optimize. J. Chem. Theory Comput. 2024, 20, 977–988. 10.1021/acs.jctc.3c01134. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Bowman G. R.; Ensign D. L.; Pande V. S. Enhanced Modeling via Network Theory: Adaptive Sampling of Markov State Models. J. Chem. Theory Comput. 2010, 6, 787–794. 10.1021/ct900620b. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Lawrenz M.; Shukla D.; Pande V. S. Cloud Computing Approaches for Prediction of Ligand Binding Poses and Pathways. Sci. Rep. 2015, 5, 7918. 10.1038/srep07918. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Doerr S.; De Fabritiis G. On-the-Fly Learning and Sampling of Ligand Binding by High-Throughput Molecular Simulations. J. Chem. Theory Comput. 2014, 10, 2064–2069. 10.1021/ct400919u. [DOI] [PubMed] [Google Scholar]
  34. Gu S.; Silva D. A.; Meng L.; Yue A.; Huang X. Quantitatively Characterizing the Ligand Binding Mechanisms of Choline Binding Protein Using Markov State Model Analysis. PLoS Comput. Biol. 2014, 10, e1003767. 10.1371/journal.pcbi.1003767. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Betz R. M.; Dror R. O. How Effectively Can Adaptive Sampling Methods Capture Spontaneous Ligand Binding?. J. Chem. Theory Comput. 2019, 15, 2053–2063. 10.1021/acs.jctc.8b00913. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Hruska E.; Abella J. R.; Nüske F.; Kavraki L. E.; Clementi C. Quantitative Comparison of Adaptive Sampling Methods for Protein Dynamics. J. Chem. Phys. 2018, 149, 244119. 10.1063/1.5053582. [DOI] [PubMed] [Google Scholar]
  37. Hruska E.; Balasubramanian V.; Lee H.; Jha S.; Clementi C. Extensible and Scalable Adaptive Sampling on Supercomputers. J. Chem. Theory Comput. 2020, 16, 7915–7925. 10.1021/acs.jctc.0c00991. [DOI] [PubMed] [Google Scholar]
  38. Wu H.; Paul F.; Wehmeyer C.; Noé F. Multiensemble Markov Models of Molecular Thermodynamics and Kinetics. Proc. Natl. Acad. Sci. U. S. A. 2016, 113, E3221–E3230. 10.1073/pnas.1525092113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Berger-Tal O.; Nathan J.; Meron E.; Saltz D. The Exploration-Exploitation Dilemma: A Multidisciplinary Framework. PLoS One 2014, 9, e95693. 10.1371/journal.pone.0095693. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Takahashi R.; Gil V. A.; Guallar V. Monte Carlo Free Ligand Diffusion with Markov State Model Analysis and Absolute Binding Free Energy Calculations. J. Chem. Theory Comput. 2014, 10, 282–288. 10.1021/ct400678g. [DOI] [PubMed] [Google Scholar]
  41. Dandekar B. R.; Ahalawat N.; Sinha S.; Mondal J. Markov State Models Reconcile Conformational Plasticity of GTPase with Its Substrate Binding Event. JACS Au 2023, 3, 1728–1741. 10.1021/jacsau.3c00151. [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Gordon J. C.; Myers J. B.; Folta T.; Shoja V.; Heath L. S.; Onufriev A. H++: A Server for Estimating pKas and Adding Missing Hydrogens to Macromolecules. Nucleic Acids Res. 2005, 33, W368–W371. 10.1093/nar/gki464. [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Anandakrishnan R.; Aguilar B.; Onufriev A. V. H++ 3.0: Automating pK Prediction and the Preparation of Biomolecular Structures for Atomistic Molecular Modeling and Simulations. Nucleic Acids Res. 2012, 40, W537–W541. 10.1093/nar/gks375. [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Wang J.; Wang W.; Kollman P. A.; Case D. A. Automatic Atom Type and Bond Type Perception in Molecular Mechanical Calculations. J. Mol. Graphics Modell. 2006, 25, 247–260. 10.1016/j.jmgm.2005.12.005. [DOI] [PubMed] [Google Scholar]
  45. He X.; Man V. H.; Yang W.; Lee T. S.; Wang J. A Fast and High-Quality Charge Model for the next Generation General AMBER Force Field. J. Chem. Phys. 2020, 153, 114502. 10.1063/5.0019056. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Cieplak P.; Cornell W. D.; Bayly C.; Kollman P. A. Application of the Multimolecule and Multiconformational RESP Methodology to Biopolymers: Charge Derivation for DNA, RNA, and Proteins. J. Comput. Chem. 1995, 16, 1357–1377. 10.1002/jcc.540161106. [DOI] [Google Scholar]
  47. Frisch M. J.; Trucks G. W.; Schlegel H. B.; Scuseria G. E.; Robb M. A.; Cheeseman J. R.; Scalmani G.; Barone V.; Mennucci B.; Petersson G. A.; Nakatsuji H.; Caricato M.; Li X.; Hratchian H. P.; Izmaylov A. F.; Bloino J.; Zheng G.; Sonnenberg J. L.; Hada M.; Ehara M.; Toyota K.; Fukuda R.; Hasegawa J.; Ishida M.; Nakajima T.; Honda Y.; Kitao O.; Nakai H.; Vreven T.; Montgomery J. A. Jr.; Peralta J. E.; Ogliaro F.; Bearpark M.; Heyd J. J.; Brothers E.; Kudin K. N.; Staroverov V. N.; Kobayashi R.; Normand J.; Raghavachari K.; Rendell A.; Burant J. C.; Iyengar S. S.; Tomasi J.; Cossi M.; Rega N.; Dannenberg J. J.; Dapprich S.; Daniels A. D.; Farkas Ö.; Foresman J. B.; Ortiz J. V.; Cioslowski J.; Fox D. J. Gaussian 09, Revision E.01; Gaussian, Inc.: Wallingford, CT, 2009.
  48. Case D. A.; Walker R. C.; Cheatham T. E.; Simmerling C.; Roitberg A.; Merz K. M.; Luo R.; Darden T. Amber 18; University of California: San Francisco, 2018. [Google Scholar]
  49. The PyMOL Molecular Graphics System, Version 2.0 Schrödinger, LLC.
  50. Trott O.; Olson A. J. AutoDock Vina: Improving the Speed and Accuracy of Docking with a New Scoring Function, Efficient Optimization, and Multithreading. J. Comput. Chem. 2010, 31, 455–461. 10.1002/jcc.21334.
  51. Kovalenko A.; Luchko T.; Gusarov S.; Roe D. R.; Simmerling C.; Case D. A.; Tuszynski J. Three-Dimensional Molecular Theory of Solvation Coupled with Molecular Dynamics in Amber. J. Chem. Theory Comput. 2010, 6, 607–624. 10.1021/ct900460m.
  52. Sindhikara D. J.; Yoshida N.; Hirata F. Placevent: An Algorithm for Prediction of Explicit Solvent Atom Distribution-Application to HIV-1 Protease and F-ATP Synthase. J. Comput. Chem. 2012, 33, 1536–1543. 10.1002/jcc.22984.
  53. Li Z.; Song L. F.; Li P.; Merz K. M. Systematic Parametrization of Divalent Metal Ions for the OPC3, OPC, TIP3P-FB, and TIP4P-FB Water Models. J. Chem. Theory Comput. 2020, 16, 4429–4442. 10.1021/acs.jctc.0c00194.
  54. Hopkins C. W.; Le Grand S.; Walker R. C.; Roitberg A. E. Long-Time-Step Molecular Dynamics through Hydrogen Mass Repartitioning. J. Chem. Theory Comput. 2015, 11, 1864–1874. 10.1021/ct5010406.
  55. Sahil M.; Sarkar S.; Mondal J. Long-Time-Step Molecular Dynamics Can Retard Simulation of Protein-Ligand Recognition Process. Biophys. J. 2023, 122, 802–816. 10.1016/j.bpj.2023.01.036.
  56. Sahil M.; Singh T.; Ghosh S.; Mondal J. 3site Multisubstrate-Bound State of Cytochrome P450cam. J. Am. Chem. Soc. 2023, 145, 23488–23502. 10.1021/jacs.3c06144.
  57. Salomon-Ferrer R.; Götz A. W.; Poole D.; Le Grand S.; Walker R. C. Routine Microsecond Molecular Dynamics Simulations with AMBER on GPUs. 2. Explicit Solvent Particle Mesh Ewald. J. Chem. Theory Comput. 2013, 9, 3878–3888. 10.1021/ct400314y.
  58. Ryckaert J. P.; Ciccotti G.; Berendsen H. J. C. Numerical Integration of the Cartesian Equations of Motion of a System with Constraints: Molecular Dynamics of n-Alkanes. J. Comput. Phys. 1977, 23, 327–341. 10.1016/0021-9991(77)90098-5.
  59. Zwanzig R. Nonlinear Generalized Langevin Equations. J. Stat. Phys. 1973, 9, 215–220. 10.1007/BF01008729.
  60. Darden T.; York D.; Pedersen L. Particle Mesh Ewald: An N·log(N) Method for Ewald Sums in Large Systems. J. Chem. Phys. 1993, 98, 10089–10092. 10.1063/1.464397.
  61. Roe D. R.; Cheatham T. E. PTRAJ and CPPTRAJ: Software for Processing and Analysis of Molecular Dynamics Trajectory Data. J. Chem. Theory Comput. 2013, 9, 3084–3095. 10.1021/ct400341p.
  62. Case D. A.; Cheatham T. E., III; Simmerling C.; Roitberg A.; Merz K. M.; Walker R. C.; Luo R.; Li P.; Darden T.; Sagui C.; Pan F.; Wang J.; Roe D. R.; Swails J.; Götz A. W.; Smith J.; Cerutti D.; Lee T.; York D.; Giese T.; Luchko T.; Forouzesh N.; Man V.; Cruzeiro V. W. D.; Monard G.; Miao Y.; Wang J.; Lin C.; Cisneros G. A.; Rahnamoun A.; Shajan A.; Manathunga M.; Berryman J. T.; Skrynnikov N. R.; Mikhailovskii O.; Xue Y.; Izmailov S. A.; Kasavajhala K.; Belfon K.; Shen J.; Harris J.; Onufriev A.; Izadi S.; Wu X.; Gohlke H.; Schott-Verdugo S.; Qi R.; Wei H.; Wu Y.; Zhao S.; Zhu Q.; King E.; Giamba G.; Liu J.; Nguyen H.; Brozell S. R.; Kovalenko A.; Gilson M.; Ben-Shalom I.; Kurtzman T.; Pantano S.; Machado M.; Aktulga H. M.; Kaymak M. C.; O’Hearn K. A.; Kollman P. A. Amber 2023 Reference Manual; University of California: San Francisco, 2023.
  63. Pérez-Hernández G.; Paul F.; Giorgino T.; De Fabritiis G.; Noé F. Identification of Slow Molecular Order Parameters for Markov Model Construction. J. Chem. Phys. 2013, 139, 015102. 10.1063/1.4811489.
  64. Scherer M. K.; Trendelkamp-Schroer B.; Paul F.; Pérez-Hernández G.; Hoffmann M.; Plattner N.; Wehmeyer C.; Prinz J. H.; Noé F. PyEMMA 2: A Software Package for Estimation, Validation, and Analysis of Markov Models. J. Chem. Theory Comput. 2015, 11, 5525–5542. 10.1021/acs.jctc.5b00743.
  65. Pedregosa F.; Varoquaux G.; Gramfort A.; Michel V.; Thirion B.; Grisel O.; Blondel M.; Prettenhofer P.; Weiss R.; Dubourg V.; Vanderplas J.; Passos A.; Cournapeau D.; Brucher M.; Perrot M.; Duchesnay É. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
  66. Röblitz S.; Weber M. Fuzzy Spectral Clustering by PCCA+: Application to Markov State Models and Data Classification. Adv. Data Anal. Classif. 2013, 7, 147–179. 10.1007/s11634-013-0134-6.
  67. Prinz J. H.; Wu H.; Sarich M.; Keller B.; Senne M.; Held M.; Chodera J. D.; Schütte C.; Noé F. Markov Models of Molecular Kinetics: Generation and Validation. J. Chem. Phys. 2011, 134, 174105. 10.1063/1.3565032.
  68. McInnes L.; Healy J.; Astels S. hdbscan: Hierarchical Density Based Clustering. J. Open Source Softw. 2017, 2, 205. 10.21105/joss.00205.
  69. Bujotzek A.; Weber M. Efficient Simulation of Ligand-Receptor Binding Processes Using the Conformation Dynamics Approach. J. Bioinf. Comput. Biol. 2009, 7, 811–831. 10.1142/S0219720009004369.
  70. Wolf S. Predicting Protein–Ligand Binding and Unbinding Kinetics with Biased MD Simulations and Coarse-Graining of Dynamics: Current State and Challenges. J. Chem. Inf. Model. 2023, 63, 2902–2910. 10.1021/acs.jcim.3c00151.
  71. Lecina D.; Gilabert J. F.; Guallar V. Adaptive Simulations, towards Interactive Protein-Ligand Modeling. Sci. Rep. 2017, 7, 8466. 10.1038/s41598-017-08445-5.
  72. Raczyńska A.; Kapica P.; Papaj K.; Stańczak A.; Shyntum D.; Spychalska P.; Byczek-Wyrostek A.; Góra A. Transient Binding Sites at the Surface of Haloalkane Dehalogenase LinB as Locations for Fine-Tuning Enzymatic Activity. PLoS One 2023, 18, e0280776. 10.1371/journal.pone.0280776.
  73. Wang J.; Do H. N.; Koirala K.; Miao Y. Predicting Biomolecular Binding Kinetics: A Review. J. Chem. Theory Comput. 2023, 19, 2135–2148. 10.1021/acs.jctc.2c01085.
  74. Sohraby F.; Nunes-Alves A. Advances in Computational Methods for Ligand Binding Kinetics. Trends Biochem. Sci. 2023, 48, 437–449. 10.1016/j.tibs.2022.11.003.
  75. Kokkonen P.; Sykora J.; Prokop Z.; Ghose A.; Bednar D.; Amaro M.; Beerens K.; Bidmanova S.; Slanska M.; Brezovsky J.; Damborsky J.; Hof M. Molecular Gating of an Engineered Enzyme Captured in Real Time. J. Am. Chem. Soc. 2018, 140, 17999–18008. 10.1021/jacs.8b09848.
  76. Brezovsky J.; Thirunavukarasu A. S.; Surpeta B.; Sequeiros-Borja C. E.; Mandal N.; Sarkar D. K.; Dongmo Foumthuim C. J.; Agrawal N. TransportTools: A Library for High-Throughput Analyses of Internal Voids in Biomolecules and Ligand Transport through Them. Bioinformatics 2022, 38, 1752–1753. 10.1093/bioinformatics/btab872.
  77. Sequeiros-Borja C.; Surpeta B.; Thirunavukarasu A. S.; Foumthuim C. J. D.; Marchlewski I.; Brezovsky J. Water Will Find Its Way: Transport through Narrow Tunnels in Hydrolases. J. Chem. Inf. Model. 2024. 10.1021/acs.jcim.4c00094.
  78. Sequeiros-Borja C.; Surpeta B.; Marchlewski I.; Brezovsky J. Divide-And-Conquer Approach to Study Protein Tunnels in Long Molecular Dynamics Simulations. MethodsX 2023, 10, 101968. 10.1016/j.mex.2022.101968.
  79. Mandal N.; Surpeta B.; Brezovsky J. Reinforcing Tunnel Network Exploration in Proteins Using Gaussian Accelerated Molecular Dynamics. bioRxiv 2024. 10.1101/2024.04.30.591887.
  80. Kingsley L. J.; Lill M. A. Including Ligand-Induced Protein Flexibility into Protein Tunnel Prediction. J. Comput. Chem. 2014, 35, 1748–1756. 10.1002/jcc.23680.
  81. Kingsley L. J.; Lill M. A. Substrate Tunnels in Enzymes: Structure-Function Relationships and Computational Methodology. Proteins: Struct., Funct., Bioinf. 2015, 83, 599–611. 10.1002/prot.24772.
  82. Matthews C.; Weare J.; Kravtsov A.; Jennings E. Umbrella Sampling: A Powerful Method to Sample Tails of Distributions. Mon. Not. R. Astron. Soc. 2018, 480, 4069–4079. 10.1093/mnras/sty2140.

Associated Data

Supplementary Materials

ct4c00452_si_001.pdf (6.7 MB, PDF)

Data Availability Statement

The underlying data for this study are available in the published article, the Supporting Information, and the Zenodo repository at 10.5281/zenodo.10849386. The data include Python scripts, binary files, plain-text files, and PDB- and AMBER-formatted structural data, all of which are compatible with various freely available software packages. No tools with restricted access are required.


Articles from Journal of Chemical Theory and Computation are provided here courtesy of American Chemical Society