Abstract
Molecular mechanics combined with Poisson–Boltzmann or generalized Born and solvent-accessible area solvation energies (MM/PBSA and MM/GBSA) are popular methods to estimate the free energy for the binding of small molecules to biomacromolecules. However, the estimation of the entropy has been problematic and time-consuming. Traditionally, normal-mode analysis has been used to estimate the entropy, but more recently, alternative approaches have been suggested. In particular, it has been suggested that exponential averaging of the electrostatic and Lennard–Jones interaction energies may provide much faster and more accurate entropies, the interaction entropy (IE) approach. In this study, we show that this exponential averaging is extremely poorly conditioned. Using stochastic simulations, assuming that the interaction energies follow a Gaussian distribution, we show that if the standard deviation of the interaction energies (σIE) is larger than 15 kJ/mol, it becomes practically impossible to converge the interaction entropies (more than 10 million energies are needed, and the number increases exponentially). A cumulant approximation to the second order of the exponential average shows a better convergence, but for σIE > 25 kJ/mol, it gives entropies that are unrealistically large. Moreover, in practical applications, both methods show a steady increase in the entropy with the number of energies considered.
Introduction
Estimating accurate free energies for the binding of small molecules to biomacromolecules is one of the most important goals of computational chemists because it would have a strong impact on drug development.1−3 Therefore, a large number of methods have been developed with this aim. Some methods involve docking using simplified scoring functions, which give fast but less accurate results.4−6 More accurate results are obtained with free-energy perturbation methods, which employ molecular dynamics (MD) simulations with standard force fields for the free and bound ligands, as well as for a number of alchemical intermediate states.7−9 Consequently, they are computationally very demanding, but they can give an accuracy of ∼4 kJ/mol for well-behaved cases.10−12
Intermediate between these two levels of theory, there are some methods that are also based on MD simulations but only of the physical end-states (the complex and possibly also the free protein and ligand).13−15 In particular, molecular mechanics combined with Poisson–Boltzmann or generalized Born and solvent-accessible surface area solvation energies (MM/PBSA and MM/GBSA) has been widely used.16−18 In these methods, the complex of the macromolecule and the ligand is simulated by MD, and a number of snapshots are collected. For each of these, the water molecules are stripped off, and the binding free energy is approximated by
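$$\Delta G_{\mathrm{bind}} = \langle \Delta E_{\mathrm{el}} \rangle + \langle \Delta E_{\mathrm{vdW}} \rangle + \langle \Delta G_{\mathrm{sol}} \rangle + \langle \Delta G_{\mathrm{SASA}} \rangle - T \langle \Delta S_{\mathrm{NM}} \rangle \tag{1}$$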
where Eel is the electrostatic energy and EvdW is the van der Waals energy, calculated with a standard MM force field, whereas Gsol is the solvation free energy calculated either by solving the Poisson–Boltzmann equation or by the Generalized Born approach, GSASA is the non-polar solvation free energy, estimated from the solvent-accessible surface area (SASA), T is the absolute temperature, and SNM is the translational, rotational, and vibrational entropy, estimated from a normal-mode (NM) analysis of vibrational frequencies calculated at the MM level of theory. Each energy term is estimated from the difference between the complex (RL), the free receptor (R), and the ligand (L)
$$\Delta X = X(\mathrm{RL}) - X(\mathrm{R}) - X(\mathrm{L}) \tag{2}$$
normally obtained by simply stripping off the receptor or the ligand from snapshots taken from the MD simulations of the complex (otherwise the precision will be worse and there will be an additional energy term from the bond, angle, and dihedral interactions).13−15 Moreover, each energy term is an average over all snapshots from the MD simulation, indicated by the angular brackets in eq 1.
The time consumption of these approaches is often dominated by the ΔSNM term (i.e., the frequency calculation), and ΔSNM is therefore often calculated for only a fraction of the snapshots.13−15,19,20 However, then this term will limit the precision of the final results. As this term often does not improve the accuracy of the method (at least not the relative energies), it is often omitted. Several alternative approaches to estimate the binding entropy have been suggested.19−22
In 2016, Zhang and co-workers suggested a new method to estimate the binding entropy, called the interaction entropy (IE) approach.23 It estimates the entropy from
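$$-T\Delta S_{\mathrm{IE}} = RT \ln \left\langle \exp\left( \frac{\Delta E_{\mathrm{IE}} - \langle \Delta E_{\mathrm{IE}} \rangle}{RT} \right) \right\rangle \tag{3}$$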
where R is the gas constant and ΔEIE = ΔEel + ΔEvdW. Consequently, the entropies can be calculated directly from energies already available from the normal MM/GBSA calculations and therefore do not add any extra computational cost (for simplicity, we will in the following say only MM/GBSA even when everything applies equally well to MM/PBSA). The IE method has been used in several later studies, also for protein–protein binding and alanine screening.24−27
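The exponential average behind the IE entropy can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script: the function name `interaction_entropy`, the default temperature of 300 K, and the use of a log-sum-exp rearrangement (to avoid overflow when single energies lie far above the mean) are assumptions made here.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol K)


def interaction_entropy(energies, temperature=300.0):
    """Return -T*dS_IE in kJ/mol from interaction energies given in kJ/mol.

    Hypothetical helper: RT * ln < exp((E - <E>) / RT) > over all snapshots.
    """
    rt = R_KJ * temperature
    mean = sum(energies) / len(energies)
    # Fluctuations around the mean, in units of RT.
    scaled = [(e - mean) / rt for e in energies]
    # Log-sum-exp: factor out the largest exponent before exponentiating.
    top = max(scaled)
    total = sum(math.exp(s - top) for s in scaled)
    return rt * (top + math.log(total / len(scaled)))
```

Because the logarithm of an average of exponentials is never smaller than the average exponent (Jensen's inequality), the value returned is never negative (up to rounding), and it is zero when all energies are identical.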
In 2018, Minh and co-workers put the IE approach into a more general theoretical framework and tested a number of cheap methods to calculate the entropy.22 In particular, they expressed the binding free energy as an exponential average of ΔEIE and approximated it by a cumulant expansion
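$$\Delta G = RT \ln \left\langle \exp\left( \frac{\Delta E_{\mathrm{IE}}}{RT} \right) \right\rangle = \langle \Delta E_{\mathrm{IE}} \rangle + \frac{\sigma_{\mathrm{IE}}^{2}}{2RT} + \cdots \tag{4}$$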
where σIE is the standard deviation of ΔEIE over all snapshots. Consequently, the binding entropy can be approximated with the second-order cumulant approximation term (C2)
$$-T\Delta S_{\mathrm{C2}} = \frac{\sigma_{\mathrm{IE}}^{2}}{2RT} \tag{5}$$
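As a small worked example of the second-order cumulant estimate, the sketch below evaluates σIE²/(2RT) either from a given standard deviation or from a list of energies. The function names, the 300 K default, and the choice of the sample (N − 1) variance are assumptions of this illustration.

```python
import statistics

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol K)


def c2_entropy_from_sigma(sigma, temperature=300.0):
    """Return -T*dS_C2 in kJ/mol for a standard deviation sigma in kJ/mol."""
    return sigma ** 2 / (2.0 * R_KJ * temperature)


def c2_entropy(energies, temperature=300.0):
    """Return -T*dS_C2 in kJ/mol from interaction energies in kJ/mol."""
    # Sample variance (N - 1 denominator); an assumption of this sketch.
    return statistics.variance(energies) / (2.0 * R_KJ * temperature)
```

With these definitions, `c2_entropy_from_sigma(25.0)` evaluates to about 125 kJ/mol at 300 K, which shows how quickly the estimate grows with σIE.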
In a recent study, we tried to use these approaches to estimate MM/GBSA binding free energies for the binding of three similar ligands to galectin-3 (ref 28) but obtained poor and confusing results. Here, we explain those results and compare entropies obtained with the IE and C2 methods. In particular, we address the important question: how many snapshots are needed to obtain a converged estimate of the entropy with the IE and C2 methods? We answer the question by performing stochastic simulations, assuming that ΔEIE follows a Gaussian distribution, as has been done before for related questions.29,30
Methods
MD Simulations
We have studied the binding of five different ligands to three proteins: galectin-3 with three ligands, differing only in the position of a single fluorine group (o-, m-, and p-fluoro-phenyltriazolyl-galactosylthioglucoside, called O, M, and P in the following), ferritin with phenol, and the T4 lysozyme Leu99Ala mutant with benzene. All three systems have been studied before by us, and we used the same setup as in our previous studies (which therefore shows some slight variations).28,31,32 The simulations were based on the crystal structures of the complexes: 6RZF, 6RZG, and 6RZH (ref 28), 3F39 (ref 33), and 181L (ref 34). All crystal-water molecules were kept in the simulations. Each complex was solvated in an octahedral (galectin-3 and lysozyme) or rectangular (ferritin) box of water molecules extending at least 10 Å from the protein using the tleap module. The protonation state of all residues is specified in our previous studies.28,31,32 No counter ions were used in the simulations.35
The MD simulations were run with the Amber software suite.36 The protein was described by the Amber ff14SB force field,37 water molecules with the TIP3P (ferritin and lysozyme) or TIP4P-Ewald model (galectin-3),38 whereas the ligands were treated with the general Amber force field (GAFF).39 Charges for the ligands were obtained with the restrained electrostatic potential method,40 and they were specified before.28,31,32
For each complex, 1000 steps of minimization were used, followed by 20 ps constant-volume equilibration and 20 ps constant-pressure equilibration, all performed with heavy non-water atoms restrained toward the starting structure with a force constant of 4184 kJ/mol/Å². The system was then equilibrated freely for 1 ns. Two sets of production simulations were then performed for each of the five complexes with constant pressure and without any restraints: In the first, we ran 100 ns simulations and sampled snapshots every 10 ps. In the second, we ran 10 ns simulations and sampled snapshots every 10 fs. The former is similar to what we normally do in ligand-binding studies,28,31,32,41 whereas the latter is more similar to what was done in the original IE publication.23 In both cases, we ran 10 independent simulations for each system, using different starting velocities and water solvation boxes.42 Consequently, the total simulation time for each complex was 1 μs and 100 ns, respectively, and we collected in total N = 100,000 and 10,000,000 snapshots from the two sets of simulations.
All bonds involving hydrogen atoms were constrained to the equilibrium value using the SHAKE algorithm,43 allowing for a time step of 2 fs. The temperature was kept constant at 301 K (galectin-3 and lysozyme) or 298 K (ferritin) using Langevin dynamics,44 with a collision frequency of 2 ps⁻¹. The pressure was kept constant at 1 atm using a weak-coupling isotropic algorithm45 with a relaxation time of 1 ps. Long-range electrostatics were handled by particle-mesh Ewald summation46 with a fourth-order B-spline interpolation and a tolerance of 10⁻⁵. The cutoff radius for Lennard-Jones interactions was 8 Å (galectin-3 and lysozyme) or 10 Å (ferritin).
MM/GBSA Calculations
MM/GBSA calculations16−18 were performed using the MMPBSA.py utility of Amber.47 The calculations employed the latest generalized Born method GB-Neck2 (igb = 8) with modified Bondi radii (mbondi3)48 and a dielectric constant of 80 outside the solute and 1 inside the solute. The non-polar solvation free energy was calculated from the solvent-accessible surface area, using ΔGSASA = α SASA + b, with α = 0.0227 kJ/mol/Å² and b = 3.85 kJ/mol.49
Entropies were calculated by the IE23 and C2 approaches22 using the simulated data and a local script. We also divided the complete data (all snapshots) into smaller batches, allowing for estimates of the precision of the estimated entropies (as the standard deviation over the calculated entropy for each batch divided by the square root of the number of batches of equal size).
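The batch procedure described above can be sketched as follows. This is an illustrative Python outline rather than the local script used in this work: `block_estimate` and its arguments are hypothetical names, and snapshots left over after forming equal-sized batches are simply dropped.

```python
import math
import statistics


def block_estimate(energies, n_blocks, estimator):
    """Average an entropy estimator over equal-sized batches of snapshots.

    Returns (mean, standard_error), where the standard error is the standard
    deviation over the batches divided by the square root of their number.
    `estimator` is any function mapping a list of energies to one number.
    """
    size = len(energies) // n_blocks  # leftover snapshots are ignored
    values = [
        estimator(energies[i * size:(i + 1) * size]) for i in range(n_blocks)
    ]
    mean = statistics.mean(values)
    error = statistics.stdev(values) / math.sqrt(n_blocks)
    return mean, error
```

Any per-batch estimator can be passed in, for example a function implementing eq 3 or eq 5; calling it with `statistics.mean` merely checks the bookkeeping.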
In addition, we discuss results of previous MM/GBSA calculations on avidin with seven biotin-like ligands,41 blood clotting factor Xa with nine inhibitors,50 galectin-3 with two additional ligands,51 and ferritin with eight additional small ligands.31 Using the old data, we have calculated IE and C2 entropies with the same script. Most of the old studies also reported entropies estimated with the NM approach.
All entropies in this article are discussed in energy terms, that is, as −TΔS in kJ/mol at 300 K.
Gaussian Simulations
The stochastic simulations used the same approach as in our previous study of the convergence of exponential averaging to estimate reaction free energies in combined quantum and molecular mechanical calculations.29 In fact, the same small simulation program could be used, except that entropies were considered instead of free energies (so that the average of ΔEIE was not included). The program generates a certain number of Gaussian-distributed energies (by the Box–Muller transform52) and calculates the exponential average in eq 3. This is repeated 1000 times, and it is noted how many times this average is within a certain limit (we used 4 kJ/mol for all calculations in this study) from the analytical result (eq 5, which is the analytical result for a Gaussian distribution). The program automatically finds the minimum number of snapshots (N, estimated within 0.1%) needed to fulfil these criteria. The Fortran code can be obtained from the authors upon request.
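The core of such a stochastic test can be sketched in Python. This is not the authors' Fortran program: the function names, the fixed random seed, the 300 K default, and the use of `random.gauss` are assumptions of this illustration, and the automatic search for the minimum N is left out.

```python
import math
import random

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol K)


def ie_entropy(energies, rt):
    """Exponential average of eq 3, rearranged to avoid overflow."""
    mean = sum(energies) / len(energies)
    top = max(energies) - mean
    total = sum(math.exp((e - mean - top) / rt) for e in energies)
    return top + rt * math.log(total / len(energies))


def fraction_converged(sigma, n_energies, repeats=1000, tolerance=4.0,
                       temperature=300.0, seed=0):
    """Fraction of repeats whose IE entropy lies within `tolerance` (kJ/mol)
    of the analytical Gaussian result sigma**2 / (2 RT)."""
    rng = random.Random(seed)
    rt = R_KJ * temperature
    target = sigma ** 2 / (2.0 * rt)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(0.0, sigma) for _ in range(n_energies)]
        if abs(ie_entropy(sample, rt) - target) <= tolerance:
            hits += 1
    return hits / repeats
```

A search over `n_energies` for the smallest value at which the returned fraction reaches 0.95 would then give the kind of snapshot count discussed in this study; in pure Python this is slow for large N, so the sketch is meant for small test cases only.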
Results and Discussion
Galectin-3
In a recent investigation of the binding thermodynamics of three ligands to galectin-3,28 a reviewer suggested that we estimate the energy and entropy of binding using the MM/GBSA and IE approaches. This was an attractive suggestion because we already had 10 × 100 ns simulations of the protein–ligand complexes with structures sampled every 10 ps, so it was only a matter of postprocessing these snapshots. Unfortunately, the results were not especially encouraging and were therefore presented only in the Supporting Information.28
In particular, whereas the MM/GBSA energies were reasonably well-converged, the interaction entropies showed a very alarming dependence on how they were calculated: If all data were used for the exponential average in eq 3 (N = 100,000, the number of snapshots and individual ΔEIE energy estimates), we obtained very large entropies for all three ligands, 162–208 kJ/mol, giving positive binding free energies of 33–56 kJ/mol (the experimental estimates are −30 to −33 kJ/mol28). The results look reasonably converged with a variation of 2–5 kJ/mol for the running average over the last 10% of the snapshots, as can be seen in Figure 1.
These entropies do not have any estimates of the uncertainty, and it is more natural to calculate interaction entropies for each of the 10 independent simulations separately (N = 10,000 in each simulation), presenting the average entropy and the standard error over the 10 sets of simulations. This gave somewhat lower estimates, 160 ± 9, 161 ± 9, and 153 ± 10 kJ/mol, for O, M, and P, respectively, but still positive binding free energies (9–25 kJ/mol).
However, we have previously found that more stable entropies are obtained (by dihedral histogramming) if they are obtained from simulations of 5 ns (N = 500; the entropies are averaged over 200 batches) because it reduces the dependence on rare events.53 The 5 ns time window was selected to be similar to the rotational correlation time of the protein (∼7 ns).28 Quite surprisingly, this gave much smaller entropies, 84–89 kJ/mol, with a standard error of 2 kJ/mol. We therefore repeated the calculations with N varying from 100 to 100,000. From Figure 2a, it can be seen that the estimated IE entropy increases steadily with N. At the same time, the estimated standard error increases from 0.5 to 0.6 kJ/mol at N = 100 to 9–10 kJ/mol at N = 10,000, reflecting that the entropies are averaged over fewer independent simulations (from 1000 to 10; the range of the estimated entropies is actually larger at N = 100 than at N = 10,000, 140–210 kJ/mol, compared to 83–97 kJ/mol).
Naturally, these results are very alarming, showing that we can essentially get any estimate of the entropy between 60 and 200 kJ/mol. To gain some further understanding, we also calculated entropies with the second-order cumulant approximation, C2, as suggested by Minh and co-workers.22 The results are shown in Figures 1b and 2b, and it can be seen that it gives much higher entropies, 175–990 kJ/mol, but with the same increasing trend with respect to N.
It could be hoped that relative entropies are more stable. Therefore, we show in Figure 3 the relative entropy of the three ligands as a function of N. It can be seen that the C2 entropies show a consistent trend (P < O < M) for all values of N, although the range (the difference between the largest and the smallest value) increases from 36 to 280 kJ/mol. However, IE entropies do not give any consistent results, although P has the smallest entropy for N ≤ 5000. The range of the entropies increases from 2 to 45 kJ/mol.
We also repeated the entropy calculations from 10 × 10 ns simulations with a sampling frequency of 10 fs (i.e., more similar to what was used in the original IE study23). However, they gave similar results, as can be seen in Figures S1 and S2. Moreover, we tried to remove various amounts of the initial part of the simulation as equilibration (in Figure 2, the equilibration time was 1 ns). However, this did not lead to any qualitative change in the results (Figure S3 shows the results when 51 ns of equilibration was used).
Next, we studied the root-mean-squared deviation (rmsd) of the ligand from the starting (crystallographic) conformation. It can be seen from Figure S4 that it in general fluctuates around 1.2 Å. However, occasionally, it increases to higher values (up to 5.5 Å). For a few simulations, it also stabilizes around 3 Å. These fluctuations do not represent full unbinding of the ligand but a change in the conformation of parts of the ligand. Still, it is likely that these conformational changes affect the estimated entropies. Minh and co-workers suggested that only conformations with a low rmsd should be used for the entropy calculations,22 and we therefore tested excluding all snapshots with a ligand rmsd > 2.2 Å. The results in Figure S5 show that this did not improve the convergence of the entropies.
Finally, we have also tried to extrapolate the block-averaged IE entropies with a power series in 1/N, following the suggestion by Zuckerman and co-workers.54,55 However, as can be seen in Table S1, the extrapolations give a large uncertainty and a strong dependence on the exponent of the power series.
Curse of Exponential Averaging
At first, we assumed that we made some error in the calculations, owing to the large discrepancy between the IE and C2 results, but after some further considerations, we convinced ourselves that the calculations are correct and the large discrepancy actually explains the problem. In fact, we had already given the explanation before but in another context.30 Several other groups have also pointed out the poor convergence of exponential averaging.54−58
If ΔEIE follows a Gaussian distribution, the last sum in eq 4 truncates at the second term, and the IE and C2 estimates should coincide. Owing to the central limit theorem, it is reasonable to assume that most ΔEIE data should be approximately Gaussian (ref (22) shows distributions for 87 protein–ligand complexes, supporting this suggestion). Therefore, a powerful technique to judge the performance, convergence, accuracy, and precision of methods like IE is to assume that ΔEIE follows a Gaussian distribution and perform numerical simulations with random Gaussian-distributed data.29 With such a simulation, it is simple to show that as long as the standard deviation of the ΔEIE energies, σIE, is small, ΔSIE and ΔSC2 indeed coincide.
Moreover, we can answer the question: how many independent ΔEIE energies (N) are needed to obtain a reliable estimate of the entropy (which for a Gaussian distribution is the C2 estimate)? We only need to specify what we mean by “reliable.” Following our previous study,30 we define reliable as giving an entropy within 4 kJ/mol of the analytical result with 95% confidence (but other values could easily be used). 4 kJ/mol corresponds to a factor of 5 in the binding constant, which seems to be a proper limit for a reliable estimate.
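The correspondence between 4 kJ/mol and a factor of 5 in the binding constant follows from the standard relation between a free-energy difference and a ratio of equilibrium constants. As a short check, assuming T = 300 K so that RT ≈ 2.49 kJ/mol:

$$\frac{K_{1}}{K_{2}} = \exp\left( \frac{\Delta\Delta G}{RT} \right) = \exp\left( \frac{4\ \mathrm{kJ/mol}}{2.49\ \mathrm{kJ/mol}} \right) \approx 5$$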
The results of the simulations are shown in Figure 4. It can be seen that for σIE < 10 kJ/mol, the IE entropy converges smoothly and only a rather small number of snapshots is needed (e.g., N = 1100 for σIE = 9 kJ/mol). However, when σIE > 10 kJ/mol, the convergence rapidly deteriorates and N increases exponentially. In practical applications, the upper limit is around N = 10,000,000, at which point the size of the coordinate files becomes several TB, even after stripping of the water molecules. This limit is reached around σIE = 15 kJ/mol. If σIE is larger than that, it will be practically impossible to converge the IE entropy, and the IE estimates will gravely underestimate the true entropy and will increase as the sampling is increased, as observed in Figure 2 (for which σIE = 60–70 kJ/mol).
The reason for this is that the exponential average is extremely badly conditioned.30 The exponential average depends critically on energies that give the largest value of the exponential in eq 3. If the standard deviation of the distribution, σ, is small, these values are rather likely and are therefore frequently found in a simulation. However, as σ increases, the most important values become extremely rare and may actually never be observed in an MD simulation of a normal length. Consequently, the exponential average will typically underestimate the true result (obtained by an infinite number of snapshots, or by assuming that the distribution is indeed Gaussian, in which case the exponential average should converge to the second-order cumulant result). This is illustrated in Figure 5.
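This argument can be made explicit for the Gaussian case. Writing x for the deviation of ΔEIE from its mean, with standard deviation σ, and completing the square in the exponent gives

$$\left\langle e^{x/RT} \right\rangle = \int \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{x^{2}}{2\sigma^{2}} + \frac{x}{RT} \right) \mathrm{d}x = \exp\left( \frac{\sigma^{2}}{2(RT)^{2}} \right) \int \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{\left( x - \sigma^{2}/RT \right)^{2}}{2\sigma^{2}} \right) \mathrm{d}x$$

The remaining integral equals 1, so RT times the logarithm of the average is σ²/(2RT), the second-order cumulant result. The integrand is itself a Gaussian centered at x = σ²/RT, that is, σ/RT standard deviations above the mean. Assuming RT ≈ 2.49 kJ/mol (300 K), the dominant energies lie about 2 standard deviations out for σ = 5 kJ/mol but about 6 standard deviations out for σ = 15 kJ/mol, where the one-sided Gaussian tail probability is only of the order of 10⁻⁹.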
Fortunately, the C2 estimate has a better convergence. For example, at σIE = 15 kJ/mol, fewer than 1000 energies are needed. At σIE = 25 kJ/mol, still only 7200 snapshots are needed, but on the other hand, the estimated entropy starts to be very large, −TΔSC2 = 125 kJ/mol, indicating that other approximations start to break down. The results in Table 1 (to be discussed below) show that NM entropies typically are 52–126 kJ/mol, and Duan et al. report NM entropies of 0–122 kJ/mol.23
Table 1. Values of σIE, −TΔSIE, −TΔSC2, and −TΔSNM (All in kJ/mol) from Our Previous MM/GBSA Investigations28,31,41,42,50,51 and in the Present Studyᵃ
| protein | ligand | length | f | σIE | –TΔSIE | –TΔSC2 | –TΔSNM |
|---|---|---|---|---|---|---|---|
| lysozyme (ref 42) | Bz | 20 × 0.2 | 5 | 6 | 6 | 7 | 52 |
|  |  | 40 × 0.2 |  | 6 | 18 | 6 | 53 |
| lysozymeᵇ |  | 10 × 10 | 0.01 | 6 | 15 | 6 |  |
| lysozymeᵇ |  | 10 × 100 | 10 | 6 | 14 | 7 |  |
| ferritin (ref 31) | L1 | 40 × 1.2 | 10 | 9 | 15 | 18 | 67 |
|  | L2 |  |  | 11 | 25 | 26 | 73 |
|  | L3 |  |  | 10 | 18 | 20 | 70 |
|  | L4 |  |  | 10 | 19 | 20 | 68 |
|  | L5 |  |  | 12 | 21 | 28 | 63 |
|  | L6 |  |  | 10 | 16 | 20 | 60 |
|  | L7 |  |  | 10 | 14 | 19 | 59 |
|  | L8 |  |  | 10 | 13 | 19 | 56 |
|  | L9 |  |  | 13 | 19 | 34 | 82 |
| ferritinᵇ | L1 | 10 × 7.5 | 0.01 | 13 | 23 | 33 |  |
| ferritinᵇ |  | 10 × 100 | 10 | 13 | 24 | 34 |  |
| galectin-3 (ref 28)ᵇ | O | 10 × 10 | 0.01 | 54 | 185 | 583 |  |
|  | M |  |  | 46 | 199 | 428 |  |
|  | P |  |  | 40 | 206 | 328 |  |
|  | O | 10 × 100 | 10 | 70 | 162 | 980 |  |
|  | M |  |  | 70 | 174 | 990 |  |
|  | P |  |  | 60 | 208 | 714 |  |
| galectin-3 (ref 51) | Lac | 10 × 200 | 100 | 37 | 161 | 274 | 126 |
|  | L02 |  |  | 30 | 152 | 177 | 122 |
|  | Lac | 40 × 0.2 | 5 | 45 | 190 | 398 | 99 |
|  | L02 |  |  | 49 | 185 | 476 | 100 |
| fXa (ref 50) | CBB | 40 × 0.2 | 5 | 66 | 154 | 880 | 114 |
|  | C9 |  |  | 61 | 159 | 737 | 116 |
|  | C39 |  |  | 40 | 119 | 321 | 116 |
|  | C47 |  |  | 66 | 167 | 872 | 118 |
|  | C49 |  |  | 69 | 208 | 949 | 119 |
|  | C50 |  |  | 71 | 254 | 1009 | 119 |
|  | C53 |  |  | 66 | 218 | 876 | 119 |
|  | C57 |  |  | 41 | 113 | 333 | 115 |
|  | C63 |  |  | 41 | 125 | 330 | 113 |
| avidin (ref 41) | Btn1 | 4 × 25 × 0.2 | 5 | 48 | 166 | 469 | 102 |
|  | Btn2 | 4 × 30 × 0.2 |  | 47 | 162 | 452 | 103 |
|  | Btn3 | 4 × 20 × 0.2 |  | 50 | 213 | 495 | 100 |
|  | Btn4 | 4 × 50 × 0.2 |  | 25 | 82 | 126 | 99 |
|  | Btn5 | 4 × 40 × 0.2 |  | 25 | 48 | 123 | 78 |
|  | Btn6 | 4 × 20 × 0.2 |  | 20 | 43 | 80 | 75 |
|  | Btn7 | 4 × 20 × 0.2 |  | 21 | 53 | 91 | 66 |
|  | Btn1 | 25 × 0.2 |  | 13 | 30 | 35 | 102 |
|  |  |  |  | 13 | 38 | 36 | 101 |
|  |  |  |  | 14 | 46 | 40 | 101 |
|  |  |  |  | 13 | 42 | 32 | 102 |
|  | Btn2 | 30 × 0.2 |  | 14 | 40 | 41 | 103 |
|  |  |  |  | 14 | 32 | 40 | 104 |
|  |  |  |  | 15 | 40 | 44 | 102 |
|  |  |  |  | 14 | 42 | 41 | 103 |
|  | Btn3 | 20 × 0.2 |  | 14 | 38 | 38 | 98 |
|  |  |  |  | 14 | 38 | 40 | 100 |
|  |  |  |  | 15 | 39 | 43 | 101 |
|  |  |  |  | 15 | 56 | 48 | 101 |
|  | Btn4 | 50 × 0.2 |  | 13 | 29 | 33 | 100 |
|  |  |  |  | 13 | 33 | 33 | 97 |
|  |  |  |  | 13 | 38 | 36 | 99 |
|  |  |  |  | 12 | 30 | 31 | 98 |
|  | Btn5 | 40 × 0.2 |  | 10 | 32 | 21 | 77 |
|  |  |  |  | 13 | 33 | 33 | 78 |
|  |  |  |  | 13 | 38 | 36 | 78 |
|  |  |  |  | 12 | 30 | 31 | 79 |
|  | Btn6 | 20 × 0.2 |  | 10 | 18 | 19 | 74 |
|  |  |  |  | 9 | 23 | 16 | 75 |
|  |  |  |  | 10 | 18 | 18 | 75 |
|  |  |  |  | 10 | 18 | 18 | 75 |
|  | Btn7 | 20 × 0.2 |  | 8 | 19 | 14 | 70 |
|  |  |  |  | 8 | 15 | 14 | 67 |
|  |  |  |  | 8 | 17 | 13 | 64 |
|  |  |  |  | 9 | 22 | 16 | 68 |
ᵃ The table shows the protein, the ligand (named after the original publications), the length of the simulation (number of independent simulations times the length of the production simulation in ns; for avidin, an initial “4×” signifies that the four binding sites in the homotetrameric protein are considered at the same time), and the sampling frequency (f in ps). Empty cells repeat the value of the nearest filled cell above.
ᵇ Present investigation.
It might seem strange to simulate the C2 entropy with a Gaussian distribution, for which the C2 estimate is exact. However, since the C2 entropy is estimated from the square of the standard deviation of a finite sample of ΔEIE energies (eq 5), the random variation of the sampled values becomes quite large when σIE is large, and therefore many snapshots are needed before the precision is good enough to reproduce the correct value of the C2 entropy within 4 kJ/mol with 95% confidence.
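The size of this effect can be estimated without a simulation. For Gaussian data, the relative standard error of a sample variance from N independent values is sqrt(2/(N − 1)), so a rough snapshot count follows from requiring that the corresponding confidence half-width stays below the tolerance. The sketch below is a back-of-the-envelope estimate under a large-N normal approximation, not the simulation procedure used in this study; the function name and defaults (4 kJ/mol, 95%, 300 K) are assumptions chosen to mirror the text.

```python
import math
from statistics import NormalDist

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol K)


def c2_snapshots_needed(sigma, tolerance=4.0, confidence=0.95,
                        temperature=300.0):
    """Approximate number of independent energies needed for the C2 entropy
    to fall within `tolerance` (kJ/mol) of sigma**2 / (2 RT)."""
    entropy = sigma ** 2 / (2.0 * R_KJ * temperature)
    # Two-sided normal quantile, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    # Require z * entropy * sqrt(2 / (N - 1)) <= tolerance and solve for N.
    return math.ceil(1.0 + 2.0 * (z * entropy / tolerance) ** 2)
```

Under these assumptions the estimate is just under 1000 for σIE = 15 kJ/mol and roughly 7500 for σIE = 25 kJ/mol, the same order of magnitude as the values obtained from the stochastic simulations.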
The conclusion from this exercise is that if σIE < 10 kJ/mol, both IE and C2 can be used to estimate the entropy; they should give identical results (within 4 kJ/mol), and ∼1000 snapshots can be used, as is typical for MM/GBSA. If 15 ≤ σIE < 25 kJ/mol, IE becomes impossible to converge; IE and C2 entropies will start to diverge, and C2 is strongly to be preferred. For C2, a few thousand snapshots are still enough. Finally, if σIE > 25 kJ/mol, C2 can still be converged (up to σIE ≈ 150 kJ/mol), but the estimated entropy is most likely grossly overestimated.
Comparison with Previous Results
To evaluate whether the results of our Gaussian simulations are representative, we examine the results of some of our previous MM/GBSA studies. The data are presented in Table 1 and are typically based on N = 1600–5000 snapshots. It can be seen that the magnitude of σIE depends mainly on the protein. Proteins with a buried binding site give a low σIE, for example, lysozyme (6 kJ/mol), ferritin (9–13 kJ/mol), and avidin (8–15 kJ/mol), whereas proteins for which the ligand binds on the surface give larger σIE, for example, factor Xa (40–71 kJ/mol) and galectin-3 (30–70 kJ/mol). However, there are also clear trends among the ligands. For example, for avidin, small and neutral ligands, like Btn6 and Btn7, give lower values of σIE (8–10 kJ/mol) than the larger and negatively charged ligands Btn1–Btn3 (13–15 kJ/mol). Moreover, the results also depend on the details of the simulation. For example, if all four binding sites of the tetrameric avidin protein are considered at the same time, σIE becomes much larger than if the sites in each subunit are considered individually.
It is hard to decide the convergence of the calculated entropies without doing additional simulations. However, a first indication can be obtained by comparing the IE and C2 entropies. The difference between these two estimates is shown as a function of σIE in Figure 6. It can be seen that for σIE < 16 kJ/mol, the IE and C2 entropies agree reasonably (within 15 kJ/mol) and with no consistent trend (inset in Figure 6). However, for larger σIE values, the difference between the two methods increases (essentially linearly for σIE > 35 kJ/mol), and the IE entropy is always smaller than the C2 entropy. This confirms our results in the previous section that it is completely impossible to obtain any reliable estimate of the IE entropy if σIE > 15 kJ/mol and that the C2 entropies are more accurate (but too large in magnitude).
Our previous MM/GBSA studies also involved estimation of the entropies with the NM method. There is a fair correlation between the entropies calculated with the three methods: R = 0.62 and 0.72 between the NM and C2 or IE entropies, respectively. In fact, the correlation is very good (R = 0.86–0.89 for all three combinations) for lysozyme, ferritin, and avidin, and the NM entropies are always larger in magnitude by 35–72 kJ/mol (54 kJ/mol on average for both IE and C2). On the other hand, there is no correlation between NM and the other two methods (R < 0.12) for galectin-3 and factor Xa (but the IE and C2 entropies have a fair correlation of R = 0.68). In these cases, C2 entropies are the largest, IE entropies are intermediate, and NM entropies are the smallest (54 and 470 kJ/mol smaller than IE and C2 on average), but the variation is large.
Next, we consider the results in the study by Menzer et al., which implicitly reports σIE in their Figure S2.22 Among their 87 examined protein–ligand complexes, σIE varies between 6 and 42 kJ/mol (with an average of 17 kJ/mol). As in our study, the lowest values are found for lysozyme (6–8 kJ/mol). This shows that our estimates of σIE (6–71 kJ/mol) are comparable with σIE estimates in other studies and are not unusually large.
Likewise, we have examined the results in the original IE article.23 They examined 15 protein–ligand complexes with two simulation protocols. They do not report σIE, but we can approximate it from eq 5, using SIE instead of SC2 (this will underestimate σIE when it is larger than ∼15 kJ/mol). This gives values for σIE of 8–24 kJ/mol, 43% of which are larger than 15 kJ/mol. In the first simulation protocol, the protein structure was restrained by a force constant of 42 kJ/mol/Å², and the simulation was run for 2 ns. In the second protocol, no restraints were used, and the simulation length was 6 ns. In both cases, ΔEIE energies were sampled every 10 fs from the last 1 ns of the simulation (i.e., N = 100,000). The restrained simulations always gave a lower entropy by 1–11 kJ/mol, showing that the restraints are not innocent. On the other hand, the restraints reduced the number of simulations with σIE > 15 kJ/mol from 67 to 20%. Minh and co-workers performed a systematic study of the effect of restraints on the simulations for 54 protein–ligand complexes for five proteins.59 They showed that the calculated entropies vary by up to 140 kJ/mol when the restraint weight was varied. The entropies are 26 and 38 kJ/mol on average for IE and C2, with a difference of up to 119 kJ/mol (the average and maximum values of σIE are 14 and 34 kJ/mol). Naturally, −TΔSIE is even larger for protein–protein interactions, for example, 171–287 kJ/mol, for 13 complexes studied by Zhang and co-workers.24
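The back-calculation of σIE from a reported entropy is simply eq 5 solved for σIE. A minimal sketch, assuming a temperature of 300 K (the temperature of the original simulations is not restated here) and a hypothetical function name:

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol K)


def sigma_from_entropy(minus_t_ds, temperature=300.0):
    """Invert -T*dS = sigma**2 / (2 RT): return sigma in kJ/mol."""
    if minus_t_ds < 0.0:
        raise ValueError("-T*dS must be non-negative for this estimate")
    return math.sqrt(2.0 * R_KJ * temperature * minus_t_ds)
```

For example, an entropy term of 13 kJ/mol maps to σIE ≈ 8 kJ/mol under these assumptions. When the input is an IE entropy that is not converged, the result is a lower bound rather than an estimate, as noted in the text.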
Simulations of Lysozyme and Ferritin
To get some additional perspective on the performance of the IE and C2 methods, we ran new simulations for two protein–ligand systems, benzene bound to the Leu99Ala T4 lysozyme mutant and phenol bound to a ferritin dimer. The systems were selected to illustrate cases where the two entropy methods are expected to work well (lysozyme with σIE = 6 kJ/mol) and where the performance could start to be problematic (ferritin with σIE ≈ 13 kJ/mol). In both cases, we ran two sets of simulations (the same as for galectin-3), 10 × 10 ns simulations with a sampling frequency of 10 fs (N = 10,000,000) and 10 × 100 ns with a sampling frequency of 10 ps (N = 100,000). The pooled simulations were then divided into blocks of decreasing N, as for galectin-3.
The results for lysozyme are shown in Figure 7. For both sets of simulations, it can be seen that the C2 entropies are stable, with a variation of only 0.3–0.4 kJ/mol for the different batch sizes (reflecting a variation of the estimated σIE of 0.1–0.2 kJ/mol). On the other hand, the IE entropy increases by 5–6 kJ/mol as N is increased. Moreover, the estimated entropy is 0.7–0.9 kJ/mol lower when estimated from the simulations with the high sampling frequency (comparing results obtained with the same N). This probably reflects that the 10 fs sampling is too dense: the correlation of the ΔEIE energies between two snapshots 10 fs apart is 0.74, whereas it is 0.04–0.06 between the snapshots 10 ps apart (0.31 for 0.1 ps and 0.07 for 1 ps). The difference goes down to 0.3 kJ/mol for a sampling frequency of 0.1 ps and to essentially zero for a sampling frequency of 1 ps. The estimated statistical inefficiency is ∼3. In a previous study, we estimated the correlation time between different MM/GBSA energies to be 1–10 ps.41 Sampling correlated energies will not affect the estimated entropies, only slow the convergence (in terms of N but not in terms of the total simulation time). For example, if we use each energy 10 times in the Gaussian simulation, 10 times more energies are needed to reach the same level of convergence for each value of σIE, both for IE and C2 entropies.
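The correlation between consecutive snapshots and the statistical inefficiency can be estimated directly from the energy time series. The sketch below uses one common estimator (summing the autocorrelation function until it first becomes non-positive); it is an illustration under that assumption and not necessarily the procedure used to obtain the numbers above.

```python
def autocorrelation(values, lag):
    """Normalized autocorrelation of a time series at the given lag."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    if variance == 0.0:
        raise ValueError("constant series has no defined autocorrelation")
    covariance = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    ) / (n - lag)
    return covariance / variance


def statistical_inefficiency(values, max_lag=None):
    """Estimate g = 1 + 2 * sum of autocorrelations, truncated at the first
    non-positive value; about N / g of the N samples are independent."""
    if max_lag is None:
        max_lag = len(values) // 2
    g = 1.0
    for lag in range(1, max_lag + 1):
        rho = autocorrelation(values, lag)
        if rho <= 0.0:
            break
        g += 2.0 * rho
    return g
```

Here `autocorrelation(energies, 1)` corresponds to the correlation between neighboring snapshots, and a statistical inefficiency close to 1 indicates that the chosen sampling interval already yields nearly independent energies.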
It should be noted that this rather strong dependence of the IE entropy on N is unexpected. If we instead base the calculations on Gaussian distributed random energies with the same mean and σIE as the simulated data, the calculated IE entropies vary by only 0.1 and 0.6 kJ/mol for the short and long simulations, respectively (and the C2 entropies by less than 0.01 kJ/mol). Thus, the variation seems to come from the fact that the ΔEIE energies do not exactly follow a Gaussian distribution. This is confirmed by the distributions, as shown in Figure 8. It would then be tempting to prefer the IE results, but the steady increase in the IE entropy with N, without any sign of convergence, makes it problematic to use in practice.
We have examined the movement of the benzene ligand in the MD simulations. Figure S6 shows that the ligand rmsd fluctuates between 0.3 and 4.2 Å, reflecting that the symmetric ligand rotates freely in the binding site (in two of the 100 ns simulations, the ligand rmsd occasionally stabilizes around 3 Å). However, it never leaves the binding site. Therefore, it seems meaningless to restrict the averaging to certain structures; excluding some initial parts of the simulations as equilibration also has a minimal effect on the calculated entropies (Figure S7). Extrapolating the IE entropy54 according to
[eq 6]
gives reasonably consistent results for c = 0.2–0.3, –TΔS = 15–16 kJ/mol (Table S2).
The corresponding results for ferritin are shown in Figure 9. It can be seen that the IE entropy shows the same increasing trend with N as for lysozyme, but the variation is larger in absolute terms, 11–13 kJ/mol. The increase is essentially linear on the logarithmic scale. This variation is slightly larger than expected from Gaussian distributed data, 9–10 kJ/mol.
However, for ferritin, the C2 entropy also shows an increasing trend with N. In fact, the variation is larger than that for the IE entropy, 18–25 kJ/mol and with a more irregular trend. This reflects that σIE shows a similar increasing trend, 9–13 kJ/mol for the long simulation and 6–13 kJ/mol for the shorter simulation. This indicates that the data are distinctly non-Gaussian, as also seen in Figure 10 (with Gaussian data, there should be essentially no variation of the C2 entropy, <0.03 kJ/mol).
As for lysozyme, the IE entropies estimated from the short simulations are lower than those from the long simulations when compared at the same N (by 5–7 kJ/mol). Increasing the sampling frequency to 0.1 and 1 ps decreases the difference to 3 and 0.4 kJ/mol, respectively. This time, the statistical inefficiency is ∼40. Again, this indicates that a sampling frequency of 10 fs is too dense.
The rmsd of the ligand compared to the starting crystal structure is quite large and shows irregular fluctuations between 0.5 and 5 Å. Again, this reflects rotations of the small ligand in the binding site and no unbinding of the ligand. Such rotations are supported by the crystal structure, which shows two conformations of the ligand.34 Again, this makes it meaningless to restrict the averaging to certain structures. Removing the initial part of the simulation has a somewhat larger effect for ferritin than for lysozyme (Figure S9), up to 5 and 7 kJ/mol for IE and C2 entropies, respectively, but the effect decreases with N. Extrapolation of the entropies with eq 6 also works worse than for lysozyme, giving 26–29 kJ/mol for IE and 39–43 ± 4–7 kJ/mol for the C2 entropies, based on the 100 ns simulations and c = 0.2–0.3 (Table S3).
Conclusions
In this study, we have made a critical evaluation of the interaction-entropy method23 and the related approach involving a cumulant expansion, truncated at the second order,22 as cheap estimates of the entropies for MM/GBSA calculations. By employing simulations with Gaussian-distributed random numbers, we illustrate the extremely poor conditioning of the exponential average, which is involved in the IE method. If the standard deviation of the ΔEIE energies, σIE, is larger than 15 kJ/mol, it becomes practically impossible to obtain converged results. Even worse, it is hard to recognize the problem because the extreme energies that determine the true value of the exponential average become very unlikely (cf. Figure 5). However, a good indication of the poor convergence (besides the large value of σIE) is the steadily increasing value of the estimated entropy as the number of energies included in the exponential average increases, as shown in Figure 2. Several previous studies have pointed out the poor convergence of exponential averages for free-energy estimates and suggested various methods to decide whether the estimate is accurate.54−58,60 Theoretically, the C2 method shows much better convergence, up to σIE = 150 kJ/mol. However, for σIE > 25 kJ/mol, C2 entropies are too large to be realistic for a binding ligand.
Clearly, this is a practical problem because approximately half of the protein–ligand systems studied in the original IE and C2 publications22,23 give σIE > 15 kJ/mol. Moreover, 13% of the systems studied by Minh and co-workers and two of our studied systems give σIE > 25 kJ/mol. This was also the case for 13 protein–protein interactions.24 In that case, it was also observed that C2 entropies were much larger than IE entropies. However, the authors suggested that “the Gaussian distribution obtained from relatively short MD runs...does not accurately represent the true energy distribution.” Therefore, they preferred the IE entropies, in contrast to our Gaussian simulations, which show that if IE and C2 entropies differ for large σIE, the C2 results should be trusted more than those from IE. On the other hand, for alanine scanning calculations, the effect of the entropy seems to be relatively small.25−27 In one study, it was shown that predictions by IE and C2 typically agreed with a mean absolute deviation of 1 kJ/mol and maximum deviations of 9 kJ/mol.25
Zhang and co-workers suggested that the problem can be solved by using an extremely frequent sampling (every 10 fs). Our results indicate that this is too dense, giving strongly correlated energies. This does not affect the calculated entropies, but it is inefficient; for lysozyme and ferritin, the statistical inefficiency is 3–40, indicating that a sampling frequency of 0.1 ps would be more appropriate.
Moreover, both Zhang and Minh and co-workers suggested that the protein should be kept restrained during the simulations.23,59 This reduces σIE by 1–16 kJ/mol (7 kJ/mol on average) but of course also reduces the calculated entropy, which is not necessarily innocent, especially as the reduction varies between different proteins (so that the relative entropy also changes). The effect of the restraints on the dynamics and on the other energy terms in eq 1 is also significant.59
Moreover, Zhang and co-workers suggested that a cutoff should be employed for the IE energies, so that those >3σIE are ignored in the average of eq 3.24 From a statistical point of view, this is very questionable. As shown in Figure 5, when σIE is large, the correct ΔG or TΔS is completely dominated by the lowest values of ΔEIE. If these are ignored or truncated, the calculated entropy will simply be incorrect (although the convergence to the wrong value will improve). Again, this can be illustrated by a Gaussian simulation, as shown in Figure 11. When σIE < 6 kJ/mol, ΔEIE values outside 3σIE have little influence on the calculated entropy. However, for larger values, the truncated IE estimate rapidly diverges from the analytic results, giving a linear, rather than exponential increase with σIE. The deviation is 5, 17, 38, and 69 kJ/mol at σIE = 10, 15, 20, and 25 kJ/mol, respectively.
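For Gaussian-distributed energies, the effect of such a cutoff can also be worked out in closed form in the limit of infinite sampling, which gives a quick check on the numbers quoted above. The sketch below assumes the standard IE expression, RT ln⟨exp((ΔE − ⟨ΔE⟩)/RT)⟩, a temperature of 300 K (not stated in this section), and that energies outside ±kσIE are simply dropped from the average. Completing the square in the truncated Gaussian integral gives the untruncated result σIE²/(2RT) plus RT ln{[Φ(k − s) − Φ(−k − s)]/[Φ(k) − Φ(−k)]}, with s = σIE/RT and Φ the standard normal cumulative distribution; because the truncation is symmetric, the sign convention of the exponent does not matter.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol K)
T = 300.0           # assumed temperature in K; not given in this section
RT = R * T


def normal_cdf(x):
    """Standard normal cumulative distribution, accurate in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def analytic_entropy(sigma):
    """-T*dS for Gaussian energies without truncation: sigma^2 / (2RT)."""
    return sigma ** 2 / (2.0 * RT)


def truncated_entropy(sigma, k=3.0):
    """Infinite-sample limit of the IE estimate when energies beyond k*sigma are dropped."""
    s = sigma / RT
    kept = normal_cdf(k - s) - normal_cdf(-k - s)  # weight of exp(E/RT) left inside the cutoff
    norm = normal_cdf(k) - normal_cdf(-k)          # fraction of energies left inside the cutoff
    return analytic_entropy(sigma) + RT * math.log(kept / norm)


if __name__ == "__main__":
    for sigma in (6.0, 10.0, 15.0, 20.0, 25.0):
        full = analytic_entropy(sigma)
        cut = truncated_entropy(sigma)
        print(f"sigma={sigma:5.1f}  analytic={full:7.2f}  "
              f"truncated={cut:7.2f}  deviation={full - cut:6.2f}")
```

Under these assumptions, the deviations for σIE = 10–25 kJ/mol come out close to the values quoted above, which suggests that the error of the truncated estimate is a property of the cutoff itself and not of limited sampling.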
Unfortunately, our applications on lysozyme and ferritin show that the convergence is worse than expected from the Gaussian model. For lysozyme (with σIE = 6 kJ/mol), C2 gives converged entropies that do not depend on the sampling frequency. However, the IE entropies are larger and increase steadily with N, although the variation is rather small (up to 6 kJ/mol). For ferritin (with σIE = 13 kJ/mol), both IE and C2 give entropies that increase steadily with N, with variations of up to 13 and 25 kJ/mol, respectively, for N between 100 and 10,000,000. This may be caused by the fact that the ΔEIE energies do not follow a Gaussian distribution. However, it can also reflect that as the simulations are extended, more and more conformational states become available (higher activation barriers can be passed), as has been discussed before.51
There is a clear relation between σIE and the properties of the protein–ligand complexes: σIE is lower in systems where the ligand binds in a buried binding site than when it binds on the surface. This is natural and intuitive: in a solvent-exposed binding site, the ligand most likely retains much of its flexibility and group-rotation degrees of freedom, whereas inside the protein, they may be strongly restricted. However, σIE also seems to increase for charged ligands, simply because the interaction energies increase in magnitude, which is less obvious.
In conclusion, this study gives a rather pessimistic view of the applicability of IE or C2 entropies for MM/GBSA, except for alanine scanning. Our results show that σIE should always be reported when using these methods. Moreover, it is advisable to calculate both IE and C2 entropies for the available data and to study how they depend on N by block averaging (which also provides an estimate of the precision of the calculated entropies). In addition, a sampling frequency of 10 fs seems to be 3–40 times too dense. Clearly, the IE method should be avoided if σIE > 15 kJ/mol because it is practically impossible to converge the exponential average. Moreover, C2 seems to give unrealistically large entropies when σIE > 25 kJ/mol. However, in practice, even when σIE < 15 kJ/mol, both methods often seem to have problems giving converged results. Still, it seems that relative entropies between similar ligands binding to the same protein are more stable, as shown in Figure 3, although the results are far from quantitative. Thus, estimating entropies from MD simulations remains a challenging task.
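The recommended block-averaging check can be set up along the following lines. This is a minimal sketch, not the procedure used in this study: it assumes the standard IE and C2 expressions at 300 K, consecutive non-overlapping blocks, and a standard error taken over the block values; all function names are illustrative.

```python
import math
import random

RT = 8.314462618e-3 * 300.0  # kJ/mol at an assumed temperature of 300 K


def c2_entropy(energies):
    """-T*dS from the second-order cumulant: variance / (2RT)."""
    n = len(energies)
    mean = sum(energies) / n
    return sum((e - mean) ** 2 for e in energies) / (n - 1) / (2.0 * RT)


def ie_entropy(energies):
    """-T*dS from the exponential average, with the largest exponent factored out."""
    n = len(energies)
    mean = sum(energies) / n
    x = [(e - mean) / RT for e in energies]
    top = max(x)
    return RT * (top + math.log(sum(math.exp(v - top) for v in x) / n))


def block_average(energies, block_size, estimator):
    """Mean, standard error, and count of an estimator over non-overlapping blocks."""
    n_blocks = len(energies) // block_size
    if n_blocks < 1:
        raise ValueError("block_size exceeds the number of energies")
    values = [estimator(energies[i * block_size:(i + 1) * block_size])
              for i in range(n_blocks)]
    mean = sum(values) / n_blocks
    if n_blocks == 1:
        return mean, float("nan"), 1  # no error estimate from a single block
    var = sum((v - mean) ** 2 for v in values) / (n_blocks - 1)
    return mean, math.sqrt(var / n_blocks), n_blocks


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical Gaussian energies (kJ/mol) standing in for simulation data.
    data = [rng.gauss(0.0, 6.0) for _ in range(100000)]
    for size in (100, 1000, 10000, 100000):
        for name, est in (("IE", ie_entropy), ("C2", c2_entropy)):
            mean, sem, m = block_average(data, size, est)
            print(f"N={size:6d} {name}: {mean:6.2f} +/- {sem:5.2f} ({m} blocks)")
```

An estimate that keeps drifting upward as the block size grows, by more than the standard error between blocks, is the warning sign described above.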
Acknowledgments
This investigation has been supported by grants from the Swedish Research Council (project 2018-05003). The computations were performed on computer resources provided by the Swedish National Infrastructure for Computing (SNIC) at Lunarc at Lund University and HPC2N at Umeå University, partially funded by the Swedish Research Council (grant 2018-05973).
Supporting Information Available
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jctc.1c00374.
Absolute and relative entropies for the binding of three ligands to galectin-3 from 10 × 10 ns simulations with a sampling frequency of 10 fs; entropies estimated for the same system after 51 ns of equilibration were removed or after exclusion of snapshots with ligand rmsd > 2.2 Å, as well as ligand rmsd during the simulations and extrapolated values of the IE entropy; ligand rmsd, extrapolated IE entropies, as well as IE and C2 entropies estimated after 6 or 51 ns of equilibration were removed for the binding of benzene to lysozyme; ligand rmsd, extrapolated IE and C2 entropies, as well as IE and C2 entropies estimated after 4.25 or 51 ns of equilibration were removed for the binding of the ligand to ferritin. (PDF)
The authors declare no competing financial interest.
References
- Gohlke H.; Klebe G. Approaches to the Description and Prediction of the Binding Affinity of Small-Molecule Ligands to Macromolecular Receptors. Angew. Chem., Int. Ed. 2002, 41, 2644–2676.
- Gilson M. K.; Zhou H.-X. Calculation of Protein-Ligand Binding Affinities. Annu. Rev. Biophys. Biomol. Struct. 2007, 36, 21–42. 10.1146/annurev.biophys.36.040306.132550.
- Wang X.; Song K.; Li L.; Chen L. Structure-Based Drug Design Strategies and Challenges. Curr. Top. Med. Chem. 2018, 18, 998–1006. 10.2174/1568026618666180813152921.
- Leach A. R.; Shoichet B. K.; Peishoff C. E. Prediction of Protein–Ligand Interactions. Docking and Scoring: Successes and Gaps. J. Med. Chem. 2006, 49, 5851–5855. 10.1021/jm060999m.
- Kontoyianni M.; Madhav P.; Suchanek E.; Seibel W. Theoretical and Practical Considerations in Virtual Screening: A Beaten Field?. Curr. Med. Chem. 2008, 15, 107–116. 10.2174/092986708783330566.
- Pinzi L.; Rastelli G. Molecular Docking: Shifting Paradigms in Drug Discovery. Int. J. Mol. Sci. 2019, 20, 4331. 10.3390/ijms20184331.
- Hansen N.; Van Gunsteren W. F. Practical Aspects of Free-Energy Calculations: A Review. J. Chem. Theory Comput. 2014, 10, 2632–2647. 10.1021/ct500161f.
- Wang L.; Chambers J.; Abel R. Protein-Ligand Binding Free Energy Calculations with FEP+. In Biomolecular Simulations: Methods and Protocols; Bonomi M., Camilloni C., Eds.; Springer New York: New York, NY, 2019; pp 201–232.
- Mey A. S. J. S.; Allen B. K.; Bruce Macdonald H. E.; Chodera J. D.; Hahn D. F.; Kuhn M.; Michel J.; Mobley D. L.; Naden L. N.; Prasad S.; et al. Best Practices for Alchemical Free Energy Calculations. Living J. Comput. Mol. Sci. 2020, 2, 18378. 10.33011/livecoms.2.1.18378.
- Christ C. D.; Fox T. Accuracy Assessment and Automation of Free Energy Calculations for Drug Design. J. Chem. Inf. Model. 2014, 54, 108–120. 10.1021/ci4004199.
- Mikulskis P.; Genheden S.; Ryde U. A Large-Scale Test of Free-Energy Simulation Estimates of Protein-Ligand Binding Affinities. J. Chem. Inf. Model. 2014, 54, 2794–2806. 10.1021/ci5004027.
- Harder E.; Damm W.; Maple J.; Wu C.; Reboul M.; Xiang J. Y.; Wang L.; Lupyan D.; Dahlgren M. K.; Knight J. L.; et al. OPLS3: A Force Field Providing Broad Coverage of Drug-like Small Molecules and Proteins. J. Chem. Theory Comput. 2016, 12, 281–296. 10.1021/acs.jctc.5b00864.
- Åqvist J.; Luzhkov V. B.; Brandsdal B. O. Ligand Binding Affinities from MD Simulations. Acc. Chem. Res. 2002, 35, 358–365. 10.1021/ar010014p.
- Michel J. Current and Emerging Opportunities for Molecular Simulations in Structure-Based Drug Design. Phys. Chem. Chem. Phys. 2014, 16, 4465–4477. 10.1039/c3cp54164a.
- Salo-Ahen O. M. H.; Alanko I.; Bhadane R.; Bonvin A. M. J. J.; Honorato R. V.; Hossain S.; Juffer A. H.; Kabedev A.; Lahtela-Kakkonen M.; Larsen A. S.; et al. Molecular Dynamics Simulations in Drug Discovery and Pharmaceutical Development. Processes 2021, 9, 71.
- Kollman P. A.; Massova I.; Reyes C.; Kuhn B.; Huo S.; Chong L.; Lee M.; Lee T.; Duan Y.; Wang W.; et al. Calculating Structures and Free Energies of Complex Molecules: Combining Molecular Mechanics and Continuum Models. Acc. Chem. Res. 2000, 33, 889–897. 10.1021/ar000033j.
- Genheden S.; Ryde U. The MM/PBSA and MM/GBSA Methods to Estimate Ligand-Binding Affinities. Expert Opin. Drug Discov. 2015, 10, 449–461. 10.1517/17460441.2015.1032936.
- Wang E.; Sun H.; Wang J.; Wang Z.; Liu H.; Zhang J. Z. H.; Hou T. End-Point Binding Free Energy Calculation with MM/PBSA and MM/GBSA: Strategies and Applications in Drug Design. Chem. Rev. 2019, 119, 9478–9508. 10.1021/acs.chemrev.9b00055.
- Kongsted J.; Ryde U. An Improved Method to Predict the Entropy Term with the MM/PBSA Approach. J. Comput. Aided Mol. Des. 2008, 23, 63–71. 10.1007/s10822-008-9238-z.
- Genheden S.; Kuhn O.; Mikulskis P.; Hoffmann D.; Ryde U. The Normal-Mode Entropy in the MM/GBSA Method: Effect of System Truncation, Buffer Region, and Dielectric Constant. J. Chem. Inf. Model. 2012, 52, 2079–2088. 10.1021/ci3001919.
- Gohlke H.; Case D. A. Converging free energy estimates: MM-PB(GB)SA studies on the protein-protein complex Ras-Raf. J. Comput. Chem. 2004, 25, 238–250. 10.1002/jcc.10379.
- Menzer W. M.; Li C.; Sun W.; Xie B.; Minh D. D. L. Simple Entropy Terms for End-Point Binding Free Energy Calculations. J. Chem. Theory Comput. 2018, 14, 6035–6049. 10.1021/acs.jctc.8b00418.
- Duan L.; Liu X.; Zhang J. Z. H. Interaction Entropy: A New Paradigm for Highly Efficient and Reliable Computation of Protein-Ligand Binding Free Energy. J. Am. Chem. Soc. 2016, 138, 5722–5728. 10.1021/jacs.6b02682.
- Sun Z.; Yan Y. N.; Yang M.; Zhang J. Z. H. Interaction Entropy for Protein-Protein Binding. J. Chem. Phys. 2017, 146, 124124. 10.1063/1.4978893.
- Yan Y.; Yang M.; Ji C. G.; Zhang J. Z. H. Interaction Entropy for Computational Alanine Scanning. J. Chem. Inf. Model. 2017, 57, 1112–1122. 10.1021/acs.jcim.6b00734.
- He L.; Bao J.; Yang Y.; Dong S.; Zhang L.; Qi Y.; Zhang J. Z. H. Study of SHMT2 Inhibitors and Their Binding Mechanism by Computational Alanine Scanning. J. Chem. Inf. Model. 2019, 59, 3871–3878. 10.1021/acs.jcim.9b00370.
- Liu X.; Peng L.; Zhou Y.; Zhang Y.; Zhang J. Z. H. Computational Alanine Scanning with Interaction Entropy for Protein-Ligand Binding Free Energies. J. Chem. Theory Comput. 2018, 14, 1772–1780. 10.1021/acs.jctc.7b01295.
- Wallerstein J.; Ekberg V.; Misini Ignjatović M.; Kumar R.; Caldararu O.; Peterson K.; Leffler H.; Logan D. T.; Nilsson U. J.; Ryde U.; et al. Entropy–Entropy Compensation Between the Conformational and Solvent Degrees of Freedom Fine-Tunes Affinity in Ligand Binding to Galectin-3C. JACS Au 2021, 1, 484–500. 10.1021/jacsau.0c00094.
- Boresch S.; Woodcock H. L. Convergence of Single-Step Free Energy Perturbation. Mol. Phys. 2017, 115, 1200–1213. 10.1080/00268976.2016.1269960.
- Ryde U. How Many Conformations Need To Be Sampled To Obtain Converged QM/MM Energies? The Curse of Exponential Averaging. J. Chem. Theory Comput. 2017, 13, 5745–5752. 10.1021/acs.jctc.7b00826.
- Mikulskis P.; Genheden S.; Wichmann K.; Ryde U. A Semiempirical Approach to Ligand-Binding Affinities: Dependence on the Hamiltonian and Corrections. J. Comput. Chem. 2012, 33, 1179–1189. 10.1002/jcc.22949.
- Genheden S.; Kongsted J.; Söderhjelm P.; Ryde U. Nonpolar Solvation Free Energies of Protein–Ligand Complexes. J. Chem. Theory Comput. 2010, 6, 3558–3568. 10.1021/ct100272s.
- Vedula L. S.; Brannigan G.; Economou N. J.; Xi J.; Hall M. A.; Liu R.; Rossi M. J.; Dailey W. P.; Grasty K. C.; Klein M. L.; et al. A Unitary Anesthetic Binding Site at High Resolution. J. Biol. Chem. 2009, 284, 24176. 10.1074/jbc.m109.017814.
- Morton A.; Matthews B. W. Specificity of Ligand Binding in a Buried Nonpolar Cavity of T4 Lysozyme: Linkage of Dynamics and Structural Plasticity. Biochemistry 1995, 34, 8576–8588. 10.1021/bi00027a007.
- Mikulskis P.; Cioloboc D.; Andrejić M.; Khare S.; Brorsson J.; Genheden S.; Mata R. A.; Söderhjelm P.; Ryde U. Free-Energy Perturbation and Quantum Mechanical Study of SAMPL4 Octa-Acid Host-Guest Binding Energies. J. Comput. Aided Mol. Des. 2014, 28, 375–400. 10.1007/s10822-014-9739-x.
- Case D. A.; Ben-Shalom I. Y.; Brozell S. R.; Cerutti D. S.; Cheatham T. E.; Cruzeiro V. W. D.; Darden T. A.; Duke R. E.; Ghoreishi D.; et al. Amber 18; University of California: San Francisco, 2018.
- Maier J. A.; Martinez C.; Kasavajhala K.; Wickstrom L.; Hauser K. E.; Simmerling C. Ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from Ff99SB. J. Chem. Theory Comput. 2015, 11, 3696–3713. 10.1021/acs.jctc.5b00255.
- Horn H. W.; Swope W. C.; Pitera J. W.; Madura J. D.; Dick T. J.; Hura G. L.; Head-Gordon T. Development of an Improved Four-Site Water Model for Biomolecular Simulations: TIP4P-Ew. J. Chem. Phys. 2004, 120, 9665–9678. 10.1063/1.1683075.
- Wang J.; Wolf R. M.; Caldwell J. W.; Kollman P. A.; Case D. A. Development and Testing of a General Amber Force Field. J. Comput. Chem. 2004, 25, 1157–1174. 10.1002/jcc.20035.
- Bayly C. I.; Cieplak P.; Cornell W.; Kollman P. A. A Well-Behaved Electrostatic Potential Based Method Using Charge Restraints for Deriving Atomic Charges: The RESP Model. J. Phys. Chem. 1993, 97, 10269–10280. 10.1021/j100142a004.
- Genheden S.; Ryde U. How to Obtain Statistically Converged MM/GBSA Results. J. Comput. Chem. 2010, 31, 837–846. 10.1002/jcc.21366.
- Genheden S.; Ryde U. A Comparison of Different Initialization Protocols to Obtain Statistically Independent Molecular Dynamics Simulations. J. Comput. Chem. 2011, 32, 187–195. 10.1002/jcc.21546.
- Ryckaert J.-P.; Ciccotti G.; Berendsen H. J. C. Numerical Integration of the Cartesian Equations of Motion of a System with Constraints: Molecular Dynamics of n-Alkanes. J. Comput. Phys. 1977, 23, 327–341. 10.1016/0021-9991(77)90098-5.
- Wu X.; Brooks B. R. Self-Guided Langevin Dynamics Simulation Method. Chem. Phys. Lett. 2003, 381, 512–518. 10.1016/j.cplett.2003.10.013.
- Berendsen H. J. C.; Postma J. P. M.; Van Gunsteren W. F.; DiNola A.; Haak J. R. Molecular Dynamics with Coupling to an External Bath. J. Chem. Phys. 1984, 81, 3684–3690. 10.1063/1.448118.
- Darden T.; York D.; Pedersen L. Particle mesh Ewald: An N·log(N) method for Ewald sums in large systems. J. Chem. Phys. 1993, 98, 10089. 10.1063/1.464397.
- Miller B. R.; McGee T. D.; Swails J. M.; Homeyer N.; Gohlke H.; Roitberg A. E. MMPBSA.py: An Efficient Program for End-State Free Energy Calculations. J. Chem. Theory Comput. 2012, 8, 3314–3321. 10.1021/ct300418h.
- Nguyen H.; Roe D. R.; Simmerling C. Improved Generalized Born Solvent Model Parameters for Protein Simulations. J. Chem. Theory Comput. 2013, 9, 2020–2034. 10.1021/ct3010485.
- Kuhn B.; Kollman P. A. Binding of a Diverse Set of Ligands to Avidin and Streptavidin: An Accurate Quantitative Prediction of Their Relative Affinities by a Combination of Molecular Mechanics and Continuum Solvent Models. J. Med. Chem. 2000, 43, 3786–3791. 10.1021/jm000241h.
- Genheden S.; Söderhjelm P.; Ryde U. Transferability of Conformational Dependent Charges from Protein Simulations. Int. J. Quantum Chem. 2012, 112, 1768–1785. 10.1002/qua.22967.
- Genheden S.; Ryde U. Will Molecular Dynamics Simulations of Proteins Ever Reach Equilibrium?. Phys. Chem. Chem. Phys. 2012, 14, 8662–8677. 10.1039/c2cp23961b.
- Box G. E. P.; Muller M. E. A Note on the Generation of Random Normal Deviates. Ann. Math. Stat. 1958, 29, 610–611. 10.1214/aoms/1177706645.
- Genheden S.; Akke M.; Ryde U. Conformational Entropies and Order Parameters: Convergence, Reproducibility, and Transferability. J. Chem. Theory Comput. 2014, 10, 432–438. 10.1021/ct400747s.
- Zuckerman D. M.; Woolf T. B. Overcoming Finite-Sampling Errors in Fast-Switching Free-Energy Estimates: Extrapolative Analysis of a Molecular System. Chem. Phys. Lett. 2002, 351, 445–453. 10.1016/s0009-2614(01)01397-5.
- Zuckerman D. M.; Woolf T. B. Theory of a Systematic Computational Error in Free Energy Differences. Phys. Rev. Lett. 2002, 89, 180602. 10.1103/physrevlett.89.180602.
- Lu N.; Kofke D. A. Accuracy of Free-Energy Perturbation Calculations in Molecular Simulation. I. Modeling. J. Chem. Phys. 2001, 114, 7303–7311. 10.1063/1.1359181.
- Gore J.; Ritort F.; Bustamante C. Bias and Error in Estimates of Equilibrium Free-Energy Differences from Nonequilibrium Measurements. Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 12564–12569. 10.1073/pnas.1635159100.
- Wu D.; Kofke D. A. Model for Small-Sample Bias of Free-Energy Calculations Applied to Gaussian-Distributed Nonequilibrium Work Measurements. J. Chem. Phys. 2004, 121, 8742–8747. 10.1063/1.1806413.
- Menzer W. M.; Xie B.; Minh D. D. L. On Restraints in End-Point Protein-Ligand Binding Free Energy Calculations. J. Comput. Chem. 2020, 41, 573–586. 10.1002/jcc.26119.
- Wu D.; Kofke D. A. Phase-Space Overlap Measures. I. Fail-Safe Bias Detection in Free Energies Calculated by Molecular Simulation. J. Chem. Phys. 2005, 123, 054103. 10.1063/1.1992483.