Free Energy Differences from Molecular Simulations: Exact Confidence Intervals from Transition Counts

Pavel Kříž; Jan Beránek; Vojtěch Spiwok

doi:10.1021/acs.jctc.2c01237

. 2023 Mar 16;19(7):2102–2108. doi: 10.1021/acs.jctc.2c01237

Free Energy Differences from Molecular Simulations: Exact Confidence Intervals from Transition Counts

Pavel Kříž ^†, Jan Beránek ^‡, Vojtěch Spiwok ^‡,^*

PMCID: PMC10100533 PMID: 36926862

Abstract

graphic file with name ct2c01237_0006.jpg

Here, we demonstrate a method to estimate the uncertainty (confidence intervals and standard errors) of free energy differences calculated by molecular simulations. The widths of confidence intervals and standard errors can be calculated solely from temperature and the number of transitions between states. Uncertainty (95% confidence interval) lower than ±1 kcal/mol can be achieved by a simulation with four forward and four reverse transitions. For a two-state Markovian system, the confidence interval is exact, regardless the number of transitions.

1. Introduction

One of the main purposes of molecular simulations is to predict equilibrium constants and associated free energy differences for processes, such as conformational changes, formation of noncovalent complexes, chemical reactions, or phase transitions. The equilibrium constant of a transition from state A to B can be determined experimentally as a fraction of the concentrations of B and A in equilibrium. In molecular simulations, it is common to simulate only one copy of the system (e.g., a single solvated protein or a protein–ligand pair); thus, the equilibrium constant can be predicted as a fraction of time spent in states B and A. For practical application of such predictions, it is necessary to assess their accuracy.

Accuracy of molecular simulations is determined by systematic and random errors. Systematic errors can be caused, for example, by over- or underestimation of some noncovalent interactions or oversimplification of the structure-energy relationship. In this work, we deal with random errors (uncertainties) caused by lack of sampling. They can be, in principle, eliminated by running infinitely long simulations.

Current statistical methods used in this field are based on the treatment of autocorrelation.¹⁻³ Many characteristics of a molecular system, including the densities of states, can be calculated by averaging along the trajectory. Accuracy of the mean value can be characterized by a standard error of the mean. It is commonly calculated as Inline graphic , where σ is the standard deviation and n is the number of samples. However, n can be set to the number of samples only if these samples are independent. This is not the case of states sampled along a simulation trajectory because they are strongly autocorrelated. To apply to a simulation trajectory, it is necessary to estimate the number of independent samples n, which is significantly lower than the number of all samples, either by block averaging^1,2 or autocorrelation analysis.³

As an alternative, we present a simple method “JumpCount” based solely on the number of observed transitions between states of the system. The prerequisite of the method is that the states of the system can be clearly distinguished, there is an energy barrier between them, and the process is Markovian, i.e., the probability of transition from one state to another is constant and independent of the history of the system, and it can be studied by the first order kinetics. Finally, the simulation is sampled with a frequency allowing to capture all transitions.

The method is demonstrated on model data (random numbers with an appropriate statistical distribution) and two types of molecular systems–glycerol and fast-folding miniproteins.

2. Theory

First, we will derive an expression of the uncertainties of an equilibrium constant and the associated free energy difference for a simple reversible transition from state A to state B. A simulation can be performed in a way that undergoes the same number of transitions from A to B (n_A) and from B to A (n_B). This is illustrated in Figure 1 for the number of transitions n_A = n_B = 3.

A schematic representation of a trajectory with three A to B and three B to A transitions (data taken from glycerol simulation).

For a Markovian process, the first time passage times for the transition from A to B (and vice versa) are exponentially distributed with the probability P(t_A) = k₁ exp(−k₁t_A), where k₁ is a rate constant for the transition from A to B. Analogously, P(t_B) = k_–1 exp(−k_–1t_B). It is possible to estimate k₁ and k_–1 as n_A/∑_it_A,i and n_B/∑_it_B,i, respectively.

The sum of independent random variables with exponential distribution follows the gamma distribution, namely, Gamma (shape = n_A, rate = k₁) for Inline graphic . The equilibrium constant K̂ can be estimated as (K̂ will be used as a symbol for an estimate of the true equilibrium constant K). The corresponding free energy difference can be estimated as , where k is Boltzmann constant and T is the temperature in Kelvins. The fraction of the estimate and the true value of K Inline graphic can be written as a rescaled fraction of independent gamma-distributed random variables and follows the F-distribution (also known as Fisher-Snedecor distribution, known from F-test and ANOVA)⁴ with degrees of freedom d₁ = 2n_B and d₂ = 2n_A. The confidence interval (CI) of K can therefore be calculated using the quantile function of the F-distribution. The 95% CI of K can be calculated as

where qF is the quantile of F-distribution (inverse of the cumulative distribution function) with subscripts d₁ and d₂.

For free energy, the 95% CI can be calculated as

As a result, the CI of the free energy difference depends solely on temperature and the number of transitions. For 300 K, the 95% CI for the free energy difference is Inline graphic (2.18 kcal/mol) for n_A = n_B = 1, (1.35 kcal/mol) for n_A = n_B = 2, etc. (see Table 1). The confidence interval , which is often used as a threshold of accuracy in molecular simulations, can be reached in a simulation with n_A = n_B = 4.

Table 1. 95% Confidence Intervals at 300 K (Mean ± Error) for Processes with Numbers of Transitions from A to B (n_A) and from B to A (n_B).

Open in a new tab

The concept described above can be easily generalized to n_A = n_B + 1 (we denote the starting state as A). In this case, the confidence interval can be calculated by setting the degrees of freedom d₁ = 2n_B and d₂ = 2n_A in the F-distribution. The confidence interval is then asymmetric (see Table 1).

The variance of ΔG₀ can be calculated as

Since K is exact (its variance is 0) and K̂/K follows F-distribution, the variance of Inline graphic and standard error can be calculated as

where ψ⁽¹⁾ is a polygamma function of order 1 (trigamma). The values of standard error are presented in Table 2. However, we believe that confidence intervals are more informative because the free energy estimates discussed here are not normally distributed and most researchers are familiar with standard errors in the context of normally distributed samples.

Table 2. Standard Errors at 300 K (Mean ± Error) for Processes with Numbers of Transitions from A to B (n_A) and from B to A (n_B).

Open in a new tab

For a system with multiple states (e.g., A, B, and C), it is possible to calculate confidence intervals and standard errors as described above, however, it is necessary to count only the accomplished transitions between states. For example, when calculating ΔG_A→C it is necessary to count the process with transitions A → B → A → B → C as a single accomplished transition from A to C. The resulting numbers of accomplished transitions n_A and n_C can be used in eqs 1 and 2 to obtain CI of K_A→C and ΔG_A→C (see Supporting Information for a demonstration of the fact that the sum of t_Ai and the sum of t_Ci follow the Gamma(shape = n_A, ...) and Gamma(shape = n_C, ...) distributions, respectively).

As an alternative, for a system with multiple states, it is possible to calculate the free energy differences for states with direct transitions (e.g., A to B and B to C, for a system with A ⇌ B ⇌ C transitions) separately and combine errors as Inline graphic . This approach is rigorous for variables with normal distribution. It is possible to apply it to systems with a high number of transitions, because the distribution of K̂/K becomes a close-to-normal distribution according to the central limit theorem.

3. Computational Details

Numerical simulations with random numbers were performed in R 3.4.4.⁵ Molecular dynamics simulation of gylcerol in water was conducted using GROMACS 2018.6. Relevant preparation steps were conducted using GROMACS 2022.3.⁶ The preparation of the system for MD simulation was as follows: Starting structure of sn-glycerol was obtained by conversion of SMILES file to PDB format using Open Babel.⁷ Topology was built manually according to Glycam06.⁸ Partial atomic charges were calculated at HF/6-31G*//HF/6-31G* level of the theory using the RESP method.⁹ Simulation box was cubic with a size of 2.94013 nm. The molecule was solvated by TIP3P water. No ions were added, as the net charge of the system was zero. Potential energy of the system was then minimized using the steepest descent method, until the maximum force acting on any atom was lower than 1000 kJ/mol/nm. This step was followed by isothermal-isochoric equilibration and isothermal–isobaric equilibration, each at 300 K for 100 ps.

Then, 1 μs long MD simulation was conducted. At the beginning of the simulation, the velocities of atoms were generated randomly from Maxwell distribution for 300 K. In the relevant steps of equilibration and MD run, the following parameters were used: Leapfrog integrator (md), radius for short-range electrostatic and van der Waals was set to 1 nm (spanning the whole molecule of glycerol). Particle Mesh Ewald method¹⁰ was used for computing long-range electrostatic interactions. Temperature coupling was conducted using Parrinello–Bussi thermostat¹¹ (300 K) and pressure coupling was conducted with Parrinello–Rahman barostat¹² (1 bar). After obtaining the trajectory of glycerol from the MD run, the values of torsion angles in the molecule were computed at each 1 ps using Plumed.¹³

Trajectories of fast-folding miniproteins were taken from literature.¹⁴

4. Results

The easiest way to test the concept is to generate the first time passage times as exponentially distributed random numbers. This was performed in the R software (see Supporting Information). We tested scenarios with n_A = n_B set to 1 to 20 and with K set to 1 to 1,000. The values of k_–1 and k₁ were set to 1 and K, respectively (in arbitrary units). For each pair of n and K we generated 10,000 sets of first time passage times and calculated 95% CI of K and 95% CI of the free energy difference (by eqs 1 and 2, respectively). We expect the rate of type I errors (i.e., the fraction of trials for which the predefined K lies outside the calculated CI) to be 5%. Indeed, the type I error rate ranged from 4.50 to 5.52% with a median of 4.99% (Figure 2). Similar results were observed for n_A = n_B + 1 (see Supporting Information).

Rates of type 1 errors (in %) for different n_A = n_B and K in simulations using generated random numbers with exponential distribution.

Numerical support for eq 5 for calculation of standard errors is demonstrated in Figure 3. For each n_A and n_B we generated n × 10,000 sets of first time passage times and calculated standard error by eq 5 and numerically as a standard deviation of 10,000 calculated values of ΔG₀. There is a perfect agreement.

Standard errors calculated by eq 5 (lines) and numerically (points) for n_A = n_B (A) and n_B = n_AB + 1 (B).

Our approach was demonstrated on two types of molecular systems. An example of a molecular system with multiple states is a glycerol molecule in water.¹⁵ Each of the two O–C–C–O torsion angles can adopt three minima. This gives nine combinations; however, the three pairs are equivalent as a result of the symmetry of the molecule; thus, six conformers can be experimentally resolved. Equilibria of these conformers have been studied by molecular simulations as well as experimentally.^15,16

Here, we performed a 1 μs simulation of glycerol in water and calculated the equilibria for six conformers. Consistent with simulation and experimental studies^15,16 we observed conformer populations αγ > αβ > αα > βγ > γγ > ββ ∼ γγ. Confidence intervals for all conformers relative to αγ were calculated at times 5, 10, 20, 50, 100, 200, 500, and 1,000 ns. In total, 37 confidence intervals were calculated (eight for each conformer except αγ, confidence intervals for γγ were not available at 5, 10, and 20 ns due to no sampling). These confidence intervals were compared with the value of ΔG calculated from the whole trajectory. Since we do not know the exact value of ΔG, we used this value as a “ground truth”.

One of these 37 confidence intervals was not spanning the ground truth ΔG. This was the case of ΔG of βγ at 50 ns (Figure 4). The distance of ΔG from the confidence interval was very low. Figures for other conformers are available in Supporting Information. Since we compare the confidence intervals with the estimate of ΔG calculated for whole trajectory, not the exact value of ΔG, we expect the rate of type 1 errors to be lower than 5%. This is in agreement with the observed 1/37 (2.7%).

Confidence intervals of free energy of βγ conformer of glycerol, relative to αγ. The value of free energy calculated for the whole simulation is depicted as a blue line. A confidence interval that does not span this value is depicted in red.

The main prerequisite of the above-outlined approach is Markovianity of the processes studied. This is usually fulfilled for transitions associated with a single energy barrier, such as simple conformational changes or chemical reactions. More complex transitions, such as protein folding, can be non-Markovian.¹⁷ The method of molecular dynamics simulation is Markovian because each state of the molecular system depends solely on the previous state, not on the history. However, coarse graining of the system representation into a few substates (e.g., folded and unfolded, ligand-bound, and unbound) and ignoring the complex kinetics within these states may cause the Markovianity condition to be not fulfilled.

Keeping in mind the limitation of the Markovianity prerequisite, we applied our approach to the folding and unfolding trajectories of fast folding mini-proteins.¹⁴ Trajectories were kindly provided by D.E. Shaw research. Root-mean-square deviation (RMSD) profiles from the native structure were calculated, and folded and unfolded states were assigned by visual inspection of RMSD profiles and trajectories. Folding free energies were estimated as Inline graphic for the whole trajectories (multiple trajectories of the same system were combined). These values were used as a ground truth.

They were compared with Inline graphic calculated at 20 points equally distributed along each trajectory. The ground truth was outside the 95% confidence intervals for six of 432 (1.39%) values. These confidence intervals are shown in Figure 5. Corresponding plots for other systems can be obtained in Supporting Information.

Confidence intervals of folding free energies for (A) NTL9 (simulation 2), (B) protein G (simulation 0), and (C) α3D (simulation 1). Top part of each subplot shows calculated folding free energy with confidence intervals depicted as error bars. The value of folding free energy calculated for the whole simulation is depicted as a blue line. Confidence intervals that do not span this value are depicted in red. Bottom parts show RMSD profiles. Folded states are highlighted by gray background.

Since we do not know the exact value of ΔG_folding and we used the value calculated for the whole simulation as the ground truth, we expect a lower number of type 1 errors (values outside 95% CI) than 5%, which is in good agreement with the observed 1.39%. Furthermore, all values outside the confidence intervals were located very close to them.

5. Discussion

Our results show that relatively accurate predictions of ΔG can be obtained from simulations with relatively low number of transitions between the states. This finding may reduce the costs and increase the efficiency of future applications of molecular simulations in ligand design and other fields.

Most important, in our opinion, is the fact that the method is very easy to use. It requires counting of transitions between states of the system and looking up confidence intervals from a table or by a simple program. It must be kept in mind that automated identification of transitions between states can be challenging. While for the glycerol system presented above it was possible to decide the state based on torsion angles, for fast-folding miniproteins, it was necessary to count transitions “manually” based on visual inspection of the trajectories, because the value of RMSD from the native structure cannot strictly distinguish the folded and unfolded states.

It is possible to generalize the concept to binding, such as simulation of protein–ligand interactions. The dissociation constant of a complex PL (the equilibrium constant of PL ⇌ P + L) can be expressed as K_d = c_L∑t_P/∑t_PL, where c_L is the concentration of the ligand in the simulation box in the unbound state. The value of ΔG₀ is calculated as kT log K_d. Therefore, the accuracy of ∑t_P/∑t_PL (and thus for the resulting K_d and binding ΔG_bind) can be calculated as described above.

The concept described above was presented for one long simulation with multiple transitions between the states of the system. Another design of molecular simulations can be to run a series of n_A independent simulations starting from the state A until they transit to state B and a series of n_B simulations starting from state B until they transit to the state A. The value of K can be estimated as

The confidence intervals and standard errors can be estimated by the equations presented above.

However, we must warn readers who are not patient enough to run all simulations until all transitions are observed. The method described above can be used only when all simulations result in transitions from A to B or from B to A.

Let us consider the situation when 10 simulations starting from A are performed, but only one of them (i = 1) results in a transition to B. Other 9 simulations end at time t_max without a transition. Some users might be tempted to estimate the rate constant k₁ as 1/t_A1 and to discard the results of 9 unsuccessful simulations. However, k₁ is likely to be significantly overestimated because 9 of the 10 simulations did not reach state B. The correct estimator of k₁ with unfinished simulations (right-censored t) is

where n_A is the number of simulations in which the transition from A to B was observed (indexes i) and m_A is the number of simulations in which the time t_max is reached without a transition.

Therefore, K can be estimated as

Derivation of equations for this design of simulations is presented in Supporting Information. The fact that some long simulations starting from state A fail to reach state B, while others reach it quickly, can also be a signature of non-Markovianity.

The results show that our approach can be applied to fast-folding mini-proteins. Either the degree of non-Markovianity in these systems is not high enough to significantly affect the performance of our approach or this approach is robust enough for the non-Markovianity typical for biomolecular systems.

In principle, non-Markovianity can be tested by inferential statistics methods, such as by Kolmogorov–Smirnov test (or some other goodness-of-fit test) to assess the deviation from the exponential distributions. However, we would like to stress that the power of the Kolmogorov–Smirnov test is rather low, i.e., it can reject Markovianity provided that there are enough samples (enough transitions). With few transitions, the test neither rejects nor approves Markovianity.

The problem of non-Makovianity can be solved by dissection of sampled states into a minimal set of substates for which mutual transitions are Markovian. This approach is used when building Markov state models.¹⁸ Alternatively, it would be possible to estimate the true distribution of t_A and t_B for non-Markovian systems and derive corresponding distributions for K and ΔG.

We argue that the approach can be generalized to biased simulations with a static bias potential, such as single-replica umbrella sampling.¹⁹ Adapting the approach to methods with a time-dependent bias potential, such as metadynamics,²⁰ will be the subject of future research.

In Supporting Information, it is possible to find commands to calculate confidence intervals and standard errors in various programming languages. Data necessary to reproduce all calculations are available online via Zenodo (dx.doi.org/10.5281/zenodo.7610654). Online calculator of CI and standard errors is available at https://jumpcount.cz.

Acknowledgments

The authors would like to thank D.E. Shaw Research for providing us trajectories. The work was supported by the Czech Science Foundation (22-29667S and 19-16857S). Computational resources were supplied by the project “e-Infrastruktura CZ” (e-INFRA CZ LM2018140) and ELIXIR-CZ project (LM2023055) supported by the Ministry of Education, Youth and Sports of the Czech Republic.

Supporting Information Available

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jctc.2c01237.

Testing of the method on random numbers with exponential distribution, detailed results of simulation of aqueous glycerol, full results of application of the method on fast-folding miniproteins, and instructions to calculate errors in various programming languages (PDF)

The authors declare no competing financial interest.

Supplementary Material

ct2c01237_si_001.pdf^{(6.4MB, pdf)}

References

Flyvbjerg H.; Petersen H. G. Error estimates on averages of correlated data. J. Chem. Phys. 1989, 91, 461–466. 10.1063/1.457480. [DOI] [Google Scholar]
Bussi G.; Tribello G. A. In Biomolecular Simulations: Methods and Protocols; Bonomi M., Camilloni C., Eds.; Springer: New York, NY, 2019; pp 529–578. [Google Scholar]
Klimovich P. V.; Shirts M. R.; Mobley D. L. Guidelines for the analysis of free energy calculations. J. Comput. Aided Mol. Des. 2015, 29, 397–411. 10.1007/s10822-015-9840-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
Fisher R. A. On the interpretation of χ² from contingency tables and the calculation of P. J. R. Stat. Soc. 1922, 85, 87–94. 10.2307/2340521. [DOI] [Google Scholar]
R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2021. [Google Scholar]
Hess B.; Kutzner C.; van der Spoel D.; Lindahl E. GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable molecular simulation. J. Chem. Theory Comput. 2008, 4, 435–447. 10.1021/ct700301q. [DOI] [PubMed] [Google Scholar]
O’Boyle N. M.; Banck M.; James C. A.; Morley C.; Vandermeersch T.; Hutchison G. R. Open Babel: An open chemical toolbox. J. Cheminform. 2011, 3, 33. 10.1186/1758-2946-3-33. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kirschner K. N.; Yongye A. B.; Tschampel S. M.; González-Outeiriño J.; Daniels C. R.; Foley B. L.; Woods R. J. GLYCAM06: A generalizable biomolecular force field. Carbohydrates. J. Comput. Chem. 2008, 29, 622–655. 10.1002/jcc.20820. [DOI] [PMC free article] [PubMed] [Google Scholar]
Bayly C. I.; Cieplak P.; Cornell W.; Kollman P. A. A well-behaved electrostatic potential based method using charge restraints for deriving atomic charges: the RESP model. J. Phys. Chem. 1993, 97, 10269–10280. 10.1021/j100142a004. [DOI] [Google Scholar]
Darden T.; York D.; Pedersen L. Particle mesh Ewald: An N · log(N) method for Ewald sums in large systems. J. Chem. Phys. 1993, 98, 10089–10092. 10.1063/1.464397. [DOI] [Google Scholar]
Bussi G.; Donadio D.; Parrinello M. Canonical sampling through velocity rescaling. J. Chem. Phys. 2007, 126, 014101. 10.1063/1.2408420. [DOI] [PubMed] [Google Scholar]
Parrinello M.; Rahman A. Polymorphic transitions in single crystals: A new molecular dynamics method. J. Appl. Phys. 1981, 52, 7182–7190. 10.1063/1.328693. [DOI] [Google Scholar]
Tribello G. A.; Bonomi M.; Branduardi D.; Camilloni C.; Bussi G. PLUMED 2: New feathers for an old bird. Comput. Phys. Com. 2014, 185, 604–613. 10.1016/j.cpc.2013.09.018. [DOI] [Google Scholar]
Lindorff-Larsen K.; Piana S.; Dror R. O.; Shaw D. E. How Fast-Folding Proteins Fold. Science 2011, 334, 517–520. 10.1126/science.1208351. [DOI] [PubMed] [Google Scholar]
Yongye A. B.; Foley B. L.; Woods R. J. On Achieving Experimental Accuracy from Molecular Dynamics Simulations of Flexible Molecules: Aqueous Glycerol. J. Phys. Chem. A 2008, 112, 2634–2639. 10.1021/jp710544s. [DOI] [PMC free article] [PubMed] [Google Scholar]
van Koningsveld H. A conformational study on glycerol in a D₂O solution by means of 220 Mc PMR data. Recueil des Travaux Chimiques des Pays-Bas 1970, 89, 801–812. 10.1002/recl.19700890806. [DOI] [Google Scholar]
Ayaz C.; Tepper L.; Brünig F. N.; Kappler J.; Daldrop J. O.; Netz R. R. Non-Markovian modeling of protein folding. Proc. Natl. Acad. Sci. U.S.A. 2021, 118, e2023856118 10.1073/pnas.2023856118. [DOI] [PMC free article] [PubMed] [Google Scholar]
Bowman G. R.; Pande V. S.; Noé F.. An Introduction to Markov State Models and Their Application to Long Timescale Molecular Simulation; Springer: Dordrecht, 2014. [Google Scholar]
Torrie G.; Valleau J. Nonphysical sampling distributions in Monte Carlo free-energy estimation: Umbrella sampling. J. Comput. Phys. 1977, 23, 187–199. 10.1016/0021-9991(77)90121-8. [DOI] [Google Scholar]
Laio A.; Parrinello M. Escaping free-energy minima. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 12562–12566. 10.1073/pnas.202427399. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

ct2c01237_si_001.pdf^{(6.4MB, pdf)}

[ref1] Flyvbjerg H.; Petersen H. G. Error estimates on averages of correlated data. J. Chem. Phys. 1989, 91, 461–466. 10.1063/1.457480. [DOI] [Google Scholar]

[ref2] Bussi G.; Tribello G. A. In Biomolecular Simulations: Methods and Protocols; Bonomi M., Camilloni C., Eds.; Springer: New York, NY, 2019; pp 529–578. [Google Scholar]

[ref3] Klimovich P. V.; Shirts M. R.; Mobley D. L. Guidelines for the analysis of free energy calculations. J. Comput. Aided Mol. Des. 2015, 29, 397–411. 10.1007/s10822-015-9840-9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref4] Fisher R. A. On the interpretation of χ² from contingency tables and the calculation of P. J. R. Stat. Soc. 1922, 85, 87–94. 10.2307/2340521. [DOI] [Google Scholar]

[ref5] R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2021. [Google Scholar]

[ref6] Hess B.; Kutzner C.; van der Spoel D.; Lindahl E. GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable molecular simulation. J. Chem. Theory Comput. 2008, 4, 435–447. 10.1021/ct700301q. [DOI] [PubMed] [Google Scholar]

[ref7] O’Boyle N. M.; Banck M.; James C. A.; Morley C.; Vandermeersch T.; Hutchison G. R. Open Babel: An open chemical toolbox. J. Cheminform. 2011, 3, 33. 10.1186/1758-2946-3-33. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref8] Kirschner K. N.; Yongye A. B.; Tschampel S. M.; González-Outeiriño J.; Daniels C. R.; Foley B. L.; Woods R. J. GLYCAM06: A generalizable biomolecular force field. Carbohydrates. J. Comput. Chem. 2008, 29, 622–655. 10.1002/jcc.20820. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref9] Bayly C. I.; Cieplak P.; Cornell W.; Kollman P. A. A well-behaved electrostatic potential based method using charge restraints for deriving atomic charges: the RESP model. J. Phys. Chem. 1993, 97, 10269–10280. 10.1021/j100142a004. [DOI] [Google Scholar]

[ref10] Darden T.; York D.; Pedersen L. Particle mesh Ewald: An N · log(N) method for Ewald sums in large systems. J. Chem. Phys. 1993, 98, 10089–10092. 10.1063/1.464397. [DOI] [Google Scholar]

[ref11] Bussi G.; Donadio D.; Parrinello M. Canonical sampling through velocity rescaling. J. Chem. Phys. 2007, 126, 014101. 10.1063/1.2408420. [DOI] [PubMed] [Google Scholar]

[ref12] Parrinello M.; Rahman A. Polymorphic transitions in single crystals: A new molecular dynamics method. J. Appl. Phys. 1981, 52, 7182–7190. 10.1063/1.328693. [DOI] [Google Scholar]

[ref13] Tribello G. A.; Bonomi M.; Branduardi D.; Camilloni C.; Bussi G. PLUMED 2: New feathers for an old bird. Comput. Phys. Com. 2014, 185, 604–613. 10.1016/j.cpc.2013.09.018. [DOI] [Google Scholar]

[ref14] Lindorff-Larsen K.; Piana S.; Dror R. O.; Shaw D. E. How Fast-Folding Proteins Fold. Science 2011, 334, 517–520. 10.1126/science.1208351. [DOI] [PubMed] [Google Scholar]

[ref15] Yongye A. B.; Foley B. L.; Woods R. J. On Achieving Experimental Accuracy from Molecular Dynamics Simulations of Flexible Molecules: Aqueous Glycerol. J. Phys. Chem. A 2008, 112, 2634–2639. 10.1021/jp710544s. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref16] van Koningsveld H. A conformational study on glycerol in a D₂O solution by means of 220 Mc PMR data. Recueil des Travaux Chimiques des Pays-Bas 1970, 89, 801–812. 10.1002/recl.19700890806. [DOI] [Google Scholar]

[ref17] Ayaz C.; Tepper L.; Brünig F. N.; Kappler J.; Daldrop J. O.; Netz R. R. Non-Markovian modeling of protein folding. Proc. Natl. Acad. Sci. U.S.A. 2021, 118, e2023856118 10.1073/pnas.2023856118. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref18] Bowman G. R.; Pande V. S.; Noé F.. An Introduction to Markov State Models and Their Application to Long Timescale Molecular Simulation; Springer: Dordrecht, 2014. [Google Scholar]

[ref19] Torrie G.; Valleau J. Nonphysical sampling distributions in Monte Carlo free-energy estimation: Umbrella sampling. J. Comput. Phys. 1977, 23, 187–199. 10.1016/0021-9991(77)90121-8. [DOI] [Google Scholar]

[ref20] Laio A.; Parrinello M. Escaping free-energy minima. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 12562–12566. 10.1073/pnas.202427399. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

Free Energy Differences from Molecular Simulations: Exact Confidence Intervals from Transition Counts

Pavel Kříž

Jan Beránek

Vojtěch Spiwok

Abstract

1. Introduction

2. Theory

Figure 1.

Table 1. 95% Confidence Intervals at 300 K (Mean ± Error) for Processes with Numbers of Transitions from A to B (n_A) and from B to A (n_B).

Table 2. Standard Errors at 300 K (Mean ± Error) for Processes with Numbers of Transitions from A to B (n_A) and from B to A (n_B).

3. Computational Details

4. Results

Figure 2.

Figure 3.

Figure 4.

Figure 5.

5. Discussion

Acknowledgments

Supporting Information Available

Supplementary Material

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Cite

Add to Collections

PERMALINK

Free Energy Differences from Molecular Simulations: Exact Confidence Intervals from Transition Counts

Pavel Kříž

Jan Beránek

Vojtěch Spiwok

Abstract

1. Introduction

2. Theory

Figure 1.

Table 1. 95% Confidence Intervals at 300 K (Mean ± Error) for Processes with Numbers of Transitions from A to B (nA) and from B to A (nB).

Table 2. Standard Errors at 300 K (Mean ± Error) for Processes with Numbers of Transitions from A to B (nA) and from B to A (nB).

3. Computational Details

4. Results

Figure 2.

Figure 3.

Figure 4.

Figure 5.

5. Discussion

Acknowledgments

Supporting Information Available

Supplementary Material

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases

Table 1. 95% Confidence Intervals at 300 K (Mean ± Error) for Processes with Numbers of Transitions from A to B (n_A) and from B to A (n_B).

Table 2. Standard Errors at 300 K (Mean ± Error) for Processes with Numbers of Transitions from A to B (n_A) and from B to A (n_B).