Abstract
Free energy differences are essential to a quantitative characterization and understanding of chemical and biological processes. Their direct estimation with an accurate quantum mechanical potential is of great interest and yet impractical due to the high computational cost and incompatibility with typical alchemical free energy protocols. One promising solution is multi-level free energy simulation, in which an estimate obtained at an inexpensive low level of theory is combined with a correction toward a higher level of theory. The poor configurational overlap generally expected between the two levels of theory, however, presents a major challenge. We overcome this challenge using a deep neural network model and enhanced sampling simulations. An adversarial autoencoder is used to identify a low-dimensional (latent) space that compactly represents the degrees of freedom that encode the distinct distributions at the two levels of theory. Enhanced sampling in this latent space is then used to drive the sampling of configurations that predominantly contribute to the free energy correction. Results for both gas-phase and condensed-phase systems demonstrate that this data-driven approach offers high accuracy and efficiency, with great potential for scaling to complex systems.
Graphical Abstract

1. Introduction
The reliable estimation of free energy differences between two thermodynamic states of a (bio)molecular system has been an active research area for decades.1–4 The free energy difference is not only closely tied to important quantities such as binding constants and solubility, but is also able to provide compelling insights into processes for which direct experimental information is difficult or even impossible to obtain.5,6 As a result, free energy simulations (FES) have been applied to diverse areas including drug discovery7 and protein design/engineering.8 Although conventional methods based on molecular mechanical (MM) force fields (FF) and molecular dynamics (MD) simulations have been widely employed,7,9,10 FES with quantum mechanical (QM) methods have increasingly gained interest, as QM methods are more accurate than MM models for the description of intramolecular conformations and intermolecular interactions,11–15 and they generally require little to no empirical parameterization. Moreover, by treating reactive groups with a QM method and the surrounding environment with an MM FF, the combined QM/MM method has become a routine tool in the study of covalent inhibition and enzyme catalysis.16,17 However, the high computational cost of QM methods significantly limits the practical application of QM-based FES; another technical challenge is that QM methods are not readily compatible with typical alchemical free energy protocols.18 Consequently, it is imperative to further develop free energy methods that can benefit from both the accuracy of QM methods and the efficiency of MM models.
One promising framework along this line is the multi-level free energy simulation (Figure 1a),19–25 which is also referred to as the “indirect cycle free energy simulation”,24 “bookending method”,26 or “end state correction scheme”27 in the literature. Using a thermodynamic cycle, to obtain the free energy difference between two states A and B with an accurate but expensive method (the high level of theory, H), one can perform extensive configurational sampling with an efficient but less accurate method (the low level of theory, L), and then apply the low-to-high free energy correction, ΔF_{L→H}, to the end states (eq. 1):
ΔF_{AB}^{H} = ΔF_{AB}^{L} + ΔF_{L→H}^{B} − ΔF_{L→H}^{A}    (1)
Figure 1: The intermediate model potentially improves the convergence of the correction term in multi-level free energy simulation.

a A thermodynamic cycle is employed to compute the free energy difference between states A and B at the high level (H) by extensively sampling at the low level (L) and improving the accuracy to the high level through correction terms (vertical legs). An intermediate model (M) is introduced to improve the convergence of the correction terms. Note that for simplicity, only the intermediate model for state B is shown explicitly (in the yellow rectangle). b The distributions of the potential energy difference, ΔU, sampled at the low and high levels often barely overlap for realistic molecular systems, which limits the reliability of methods that rely only on samples from the low level. The distribution of the intermediate model bridges the distributions of the two levels of theory and therefore can significantly increase the computational efficiency and accuracy.
To completely avoid sampling at the high level, the single-sided free energy estimator, free energy perturbation (FEP),28 is the most rudimentary method for computing ΔF_{L→H}. A well-known limitation of FEP is that it is reliable only when a sufficient number of configurations sampled at the low level are representative of those highly populated at the high level.29,30 A particularly relevant and quantitative measure of the similarity between the two configurational ensembles concerns the distribution of the potential energy difference between the two levels, ΔU, the central quantity that enters FEP.31,32 Reliable estimates can be obtained when the distribution of ΔU in the forward perturbation (low-to-high) overlaps significantly with that in the backward perturbation (high-to-low). In practice, the two levels of theory often feature substantial differences in certain degrees of freedom (DOFs), which lead to poor distribution overlap33 and introduce large uncertainties into FEP results. Accordingly, numerous methodologies have been proposed to improve the evaluation of the correction term, ΔF_{L→H}.
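The single-sided FEP (Zwanzig) estimator referenced above is ΔF_{L→H} = −kT ln⟨exp(−ΔU/kT)⟩_L, averaged over configurations sampled at the low level. A minimal numpy sketch (function and variable names are illustrative, not from the original work):

```python
# Hedged sketch of the Zwanzig/FEP estimator; a log-sum-exp form is used for
# numerical stability when dU has large magnitude.
import numpy as np

def fep_correction(dU, kT):
    """Estimate dF(L->H) = -kT * ln < exp(-dU/kT) >_L from low-level samples.

    dU: array of U_H - U_L evaluated on configurations sampled at level L.
    """
    dU = np.asarray(dU, float)
    x = -dU / kT
    m = x.max()                                   # shift for stability
    return -kT * (m + np.log(np.mean(np.exp(x - m))))
```

When the ΔU distribution is narrow the estimate converges quickly; when the forward and backward ΔU distributions barely overlap, as discussed above, the exponential average is dominated by rare low-ΔU samples and the estimator becomes unreliable.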
In general, two levels of theory may differ in both stiff (e.g., bonds and angles) and soft (e.g., dihedral angles and collective motions) degrees of freedom. For small differences in the stiff DOFs, corrections to the free energy can be estimated semi-analytically with either harmonic or anharmonic models.34 For larger differences, the nonequilibrium-work (NEW) method, in which simulations are conducted by rapidly switching between the low and high level potentials, can be effective.24 Since the switching simulations require the evaluation of the high level potential at each MD step, the NEW trajectories need to be short, with lengths of a few picoseconds. Although such a switching timescale is sufficient to relax the stiff DOFs, differences in the soft DOFs are difficult to capture.35 To address more complex discrepancies between different levels of theory, the low level potential can be modified to better mimic the high level potential,36 using, for example, force matching.21,26 With sufficient improvement of the low level potential function, FEP can then be used to better estimate the free energy correction toward the high level; the improved low-level method can also be used to compute forces at inner time steps in a multiple time step framework,37 while high level forces are computed only at outer time steps. Alternatively, one may improve the low level of theory using machine learning based techniques; i.e., one parameterizes the difference between the two levels of theory with neural networks based on samples from low-level MD simulations and uses such Δ-learning to iteratively improve the low level method toward the high level.38–42 In general, a robust improvement of the low-level model requires a substantial number of snapshots, which limits the size of systems amenable to the Δ-learning approach. Finally, approaches based on the normalizing flow model43 have been developed to improve the convergence of multi-level FES.44,45
Normalizing flow uses a chain of invertible probability transformations to map a low-level distribution to a high-level distribution. The transformations are exact, so in the ideal scenario the mapped distribution can fully overlap with the high level distribution, and hence FEP calculations can be efficient and accurate. Unfortunately, the normalizing flow method has to model the entire system, which means the modeled distribution can be extremely high-dimensional for condensed-phase systems, making model training almost impossible. As a result, normalizing flow-based approaches have only been demonstrated to be effective in the gas phase or with an implicit solvent model.
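The exactness of the flow transformations follows from the change-of-variables identity; a sketch in generic notation (the bijection f and its Jacobian are not named in the original):

```latex
% Change-of-variables identity underlying normalizing flows: x = f(z), z ~ p_L.
% The mapped density is exact, not approximate.
p_{\text{mapped}}(x) = p_L\bigl(f^{-1}(x)\bigr)\,
  \left|\det \frac{\partial f^{-1}(x)}{\partial x}\right|
\quad\Longrightarrow\quad
\log p_{\text{mapped}}(x) = \log p_L\bigl(f^{-1}(x)\bigr)
  + \log\left|\det \frac{\partial f^{-1}(x)}{\partial x}\right|
```

The Jacobian determinant must be tractable for every transformation in the chain, which is one reason the full-system, high-dimensional modeling mentioned above becomes prohibitive.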
In a different framework, without explicitly modifying the low-level potential, one may introduce an intermediate model (denoted as M in Figure 1a) that features good configurational overlap with the high level,25 and the total free energy correction can be evaluated in a staged fashion, i.e.,
ΔF_{L→H} = ΔF_{L→M} + ΔF_{M→H}    (2)
In some cases, the intermediate model differs from the low level only in terms of charge distributions.25,46 Alternatively, when ensembles at the two levels of theory show substantial structural differences, the intermediate model can be derived by performing biased simulations at the low level toward configurations favored by the high level of theory. Such a staged approach is particularly attractive because the biased simulations required to establish the intermediate model can follow collective coordinates that compactly describe the “problematic” DOFs, rather than specifying the latter individually as required in methods that explicitly modify the low-level potential function. On the other hand, the major challenge is then the automated identification of such collective coordinates that dictate the configurational overlap between different levels of theory.
In recent years, the incorporation of the autoencoder47 (AE), a neural network framework, and its variants into physics-based methods has resulted in great success in collective variable (CV) or reaction coordinate (RC) discovery,48–52 where CVs/RCs are low-dimensional representations that capture the most essential information regarding processes occurring in a high dimensional space. AEs map the input data into a low-dimensional latent space, which is the neural network-predicted RC, with little to no prior knowledge about the underlying process. Depending on the choice of input features and model architectures, the latent space usually captures the slow52,53 or large-magnitude48,49 motion of the system. The success of these AE-based methods suggests that a similar approach can be valuable for identifying DOFs essential to bridging the configurational distributions at different levels of theory in multi-level FES. We surmise that, even with limited sampling at the high level, if the neural network is trained on data from both levels, the optimal RCs are able to capture the nature of the problematic DOFs. Furthermore, the RCs are learned via a chain of continuous and differentiable functions of the input features and therefore can be conveniently integrated into enhanced sampling simulations52–55 to generate the intermediate model (Figure 1b) for computing the free energy correction term in the staged-transformation framework.25
Specifically, we develop a neural network based method to learn the optimal RCs so that enhanced sampling simulations in this compact space at the low level can be performed to preferentially sample the configurations that contribute significantly to the free energy correction toward the high level. The neural network model is based on the adversarial autoencoder (AAE),56 a modern variant of AE, with key modifications that maximize the likelihood of success for our approach. While MD simulations at both levels are conducted to generate the training data, enhanced sampling simulations are conducted at the low level, and the high level simulations are short and only required at the end states. The significant increase in accuracy and efficiency compared to direct FEP or straightforward Bennett acceptance ratio (BAR)57 calculations is demonstrated with both gas phase and solvated systems, showing great promise of the method for more complex condensed phase applications. Although the use of neural networks in this study is directly inspired by the previous machine learning CV/RC discovery and enhanced sampling methods, our approach differs from those in several ways. First, rather than capturing the slow or large-magnitude motion(s), the latent space in our approach represents the most important structural differences that dictate the configurational overlap between two levels of theory, making the approach particularly suited for the application of multi-level FES. Second, as further elaborated in the following sections, the use of AAE allows for explicit control of the latent space distribution, which makes the subsequent enhanced sampling simulations more straightforward and interpretable compared with previous multi-level methods.
2. Methods and Computational Setup
2.1. Typical Architecture of Adversarial Autoencoder
To provide the general background for our development, the typical architecture of AAE is shown in Figure S1a. The upper part is the same as AE (Figure S1b), in which an encoder maps a high-dimensional data point, x, into a low-dimensional representation (the latent space), z. A decoder then reconstructs the input data from z. The training process minimizes the reconstruction error, so after training, the learned latent space, z, captures the most essential information about x while filtering out most of the noise. It is therefore efficient to sample the latent space and generate new data that resemble the original data. The key improvement of AAE over AE is that the distribution of the latent space is matched to a prior distribution, whose functional form and parameters are determined before training. The match of the latent space to the prior distribution is realized by adversarial training, an idea borrowed from generative adversarial networks (GAN).58 In GAN, two models, a generator and a discriminator, are trained adversarially: the generator produces fake data that mimic real data, while the discriminator attempts to distinguish fake from real data (Figure S1c). In AAE, the encoder part of the autoencoder acts as the generator. Fake data are the latent space variables generated by the encoder, while real data are sampled from the prior distribution. The outcome of the adversarial training in AAE is that the distribution of the latent space variables, q(z), matches the prior distribution, p(z) (Figure S1a). Compared with AE, a better latent space can be learned in AAE.56
2.2. An AAE based Workflow for Multi-level Free Energy Simulation
We now present the proposed workflow for an efficient evaluation of the free energy correction between different levels of theory in the multi-level FES framework. Short MD simulations at the low and high levels are conducted, and pairwise distances between the heavy atoms are computed from the MD trajectories. These features are commonly used in ML-assisted free energy calculations as they are translationally and rotationally invariant. The pairwise distances from the two levels are combined as the training dataset to ensure that the model can learn the difference between the two distributions. We take advantage of the flexibility of AAE to maximize the likelihood of success of our workflow by adopting two strategies. First, during the adversarial training stage of the discriminator, each data entry is assigned a label, making the model a supervised one. The label informs the discriminator whether the data belong to the low level or the high level. As suggested in the original AAE paper,56 labeled data from different classes can be mapped onto different regions of the prior distribution. Second, considering that the potential energy difference between the low and high levels of theory, ΔU, is critical to the evaluation of ΔF_{L→H}, the output of the decoder is not the reconstructed input, but the scaled ΔU (see below) associated with the corresponding input data; this helps ensure that the latent space is optimized for capturing the information most critical to the evaluation of ΔF_{L→H}.
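The two modifications above can be made concrete in a minimal PyTorch sketch. The layer sizes follow Sec. 2.5 (two hidden layers of 64 and 32 nodes, LeakyReLU activations, tanh on the decoder output); the feature dimension, batch size, and all names are illustrative assumptions, not code from the original work:

```python
# Hedged sketch of the modified AAE: a 2D latent space, a decoder that predicts
# the scaled potential energy difference (dU) instead of reconstructing the
# input, and a discriminator that also receives the low/high-level label.
import torch
import torch.nn as nn

def mlp(n_in, n_out, out_act=None):
    # Two hidden layers (64, 32) with LeakyReLU, per Sec. 2.5 of the text.
    layers = [nn.Linear(n_in, 64), nn.LeakyReLU(0.01),
              nn.Linear(64, 32), nn.LeakyReLU(0.01),
              nn.Linear(32, n_out)]
    if out_act is not None:
        layers.append(out_act)
    return nn.Sequential(*layers)

n_features = 45  # e.g. pairwise distances among 10 heavy atoms (assumption)
encoder = mlp(n_features, 2)             # generator: maps features to 2D latent z
decoder = mlp(2, 1, out_act=nn.Tanh())   # predicts scaled dU in [-1, 1]
# Supervised discriminator: input is (z, one-hot low/high label)
discriminator = mlp(2 + 2, 1, out_act=nn.Sigmoid())

x = torch.randn(8, n_features)                    # toy batch of scaled features
label = torch.eye(2)[torch.randint(0, 2, (8,))]   # low/high-level labels
z = encoder(x)
du_pred = decoder(z)                              # target: true scaled dU (MSE loss)
d_fake = discriminator(torch.cat([z, label], dim=1))
```

In training, the reconstruction loss (MSE between du_pred and the true scaled ΔU) and the adversarial losses would be minimized alternately, with real samples for the discriminator drawn from the labeled two-component prior described in Sec. 2.4.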
The above effort to maximize the model performance results in an optimal latent space that can be used as the RCs in the subsequent umbrella sampling (US) simulations.59 Thanks to the learned relationship between the configurational space and the latent space, we can bias the simulations at the low level so that the sampled ensembles are increasingly representative of the high level. Moreover, due to the use of labels during training, the latent space encodes information about the distributions at the different levels of theory; we thus expect that the ΔU distributions during US sampling will also progressively shift toward the distribution of the high level. As a result, an intermediate model (M) can be readily established whose ΔU distribution overlaps with that of the high level. Consequently, ΔF_{M→H} can be robustly estimated with rapid convergence, leading to an efficient computation of ΔF_{L→H} following eq. 2.
2.3. Computational Setup of the Test Systems
Two benchmark systems are tested in this study: alanine dipeptide (ALA) in the gas phase and bis-2-chloro diethyl ether (2CLE) in a water box. MD simulations are conducted at the low and high levels, respectively, in the CHARMM package.60 For ALA in the gas phase, the high level is the density functional tight binding method61 up to third order (DFTB3)62 with the 3OB parameter set63,64 (DFTB3/3OB). The low level is the CHARMM36m65 force field. Non-bonded interactions are computed using a standard group-based cutoff scheme.66 The Langevin integrator with a 2-fs timestep is used and a friction coefficient of 2 ps−1 is applied to all atoms. The temperature is kept at 303.15 K.
In the case of 2CLE, the solute is solvated in a cubic water box with a 10.0 Å edge distance under periodic boundary conditions (PBC). The low level is an MM model, in which the parameters for the solute were generated in a previous study22 from the CHARMM general force field (CGenFF)67 and refined with the general automated atomic model parameterization (GAAMP) method.68 The high level is the QM/MM potential, in which the solute is treated by DFTB3/3OB while the water molecules are treated by the CHARMM-modified TIP3P water model.69 The electrostatic interactions are calculated by the Particle-mesh Ewald (PME) summation.70 Van der Waals (vdW) interactions are truncated at a cutoff distance of 12 Å with a switching function applied from 10 Å. NPT (constant particle number, pressure, and temperature) simulations at 303.15 K and 1 bar are carried out with an integration time step of 2 fs. Snapshots are saved every 2 ps.
For all simulations of both systems, a weak harmonic restraint (force constant: 10 kcal mol−1 Å−2) is applied to the center-of-mass of ALA or 2CLE to keep them around the origin. The SHAKE algorithm is used to constrain bonds involving hydrogen atoms to the respective optimal bond distances at the low and high levels of theory. By constraining these bonds, we do not explicitly consider their vibrational contributions to the free energy correction, which can be estimated based on harmonic approximations.34 This choice is made based on the consideration that for many processes of interest, vibrations of these stiff degrees of freedom do not make the dominant contribution to the free energy differences.71,72
2.4. Dataset Preparation
Each snapshot from the MD simulations is represented by a 1D list of pairwise distances between the non-hydrogen (heavy) atoms of ALA or 2CLE. Distances from both levels are combined as the input data for training the neural network model. Two training datasets containing 250 or 1250 conformations from each level of theory are created, corresponding to simulation lengths of 0.5 and 2.5 ns, respectively. Test datasets are generated from separate MD simulations. To mimic the realistic scenario where high level sampling is limited, only 0.5-ns simulations are performed at both levels to generate a small dataset for the evaluation of model performance, which corresponds to 250 conformations at each level. A larger test dataset, which contains 40,000 snapshots collected from 80 ns of trajectory at each level of theory, is also generated for better examining the latent space mapping but is not used in any model training/evaluation steps. Regardless of the level of theory used for the configurational sampling, the potential energy difference is defined as ΔU = U_H − U_L for each snapshot, where U_H and U_L are the potential energies evaluated at the high and low levels, respectively. To avoid large numerical values for ΔU, unscaled ΔU values are offset by 16,400 (ALA) or 12,200 kcal/mol (2CLE) before any computation or visualization. MinMaxScaler in scikit-learn73 is applied to scale the pairwise distances and ΔU values to the range of [−1, 1]. The prior distribution is taken to be a mixture of two two-dimensional Gaussians. By design, the component centered in the first quadrant represents the low-level distribution, while the other component, centered in the third quadrant, represents the high-level distribution. A two-dimensional prior is used here because it strikes a balance between providing sufficient flexibility for the latent space optimization and being amenable to efficient sampling; sampling in a three-dimensional latent space, for example, would be substantially more costly.
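The featurization and prior described above can be sketched in a few lines of numpy. The helper names, the exact Gaussian centers, and the component widths are assumptions (the text specifies only a two-component 2D mixture with one center in the first quadrant and one in the third):

```python
# Hedged sketch of dataset preparation: heavy-atom pairwise distances, a
# MinMaxScaler-equivalent rescaling to [-1, 1], and sampling from a
# two-component 2D Gaussian prior.
import numpy as np

def pairwise_distances(coords):
    """Flatten the upper triangle of the heavy-atom distance matrix."""
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(-1))
    iu = np.triu_indices(len(coords), k=1)
    return d[iu]

def minmax_scale(x, lo=-1.0, hi=1.0):
    """Column-wise equivalent of sklearn's MinMaxScaler, feature_range=(-1, 1)."""
    xmin, xmax = x.min(0), x.max(0)
    return lo + (x - xmin) * (hi - lo) / (xmax - xmin)

def sample_prior(n, rng):
    """2D Gaussian mixture: low level in quadrant I, high level in quadrant III
    (centers and widths are illustrative assumptions)."""
    centers = np.array([[1.0, 1.0], [-1.0, -1.0]])
    comp = rng.integers(0, 2, n)
    return centers[comp] + 0.3 * rng.standard_normal((n, 2)), comp

rng = np.random.default_rng(0)
coords = rng.standard_normal((10, 3))   # 10 toy "heavy atoms"
feats = pairwise_distances(coords)      # 10 * 9 / 2 = 45 distances
```

For N heavy atoms this yields N(N−1)/2 features per snapshot, which is why the representation stays compact for the small solutes studied here.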
2.5. Neural Network Training
All neural networks are implemented with PyTorch.74 The encoder, decoder, and discriminator each consists of two fully-connected layers. The first layer has 64 nodes and the second layer has 32 nodes. The Leaky Rectified Linear Unit activation function (LeakyReLU)75 is applied in each hidden layer. The functional form of LeakyReLU is as follows:
LeakyReLU(x) = x for x ≥ 0;  LeakyReLU(x) = αx for x < 0    (3)
where α is a small positive slope in our neural network models. A hyperbolic tangent function (tanh) is applied to normalize the output of the decoder to the range of [−1, 1]. The Mean Square Error loss function (MSELoss) is used for both the decoder and the discriminator. The Adam optimizer76 is used with default parameters. The learning rate is 0.00025 for the encoder and the decoder, while it is set to 0.0001 for the discriminator. Initial weights are generated from a normal distribution. Five-fold cross-validation77 is used to evaluate the performance of the models.
The Kullback–Leibler (KL) divergence is a non-negative quantity that measures the dissimilarity between two probability distributions; a smaller KL divergence indicates a larger overlap. The KL divergence between the distributions of the reconstructed scaled ΔU and the original scaled ΔU is computed. As the KL divergence calculation is part of the model evaluation, the small test dataset described in the above subsection is used.
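A histogram-based KL estimate between two scalar samples (e.g., reconstructed vs. original scaled ΔU) can be sketched as follows; the bin count and pseudocount regularization are assumptions, not details from the original work:

```python
# Hedged sketch of KL(P || Q) estimated from 1D samples via shared-range
# histograms; a small pseudocount avoids log(0) in empty bins.
import numpy as np

def kl_divergence(p_samples, q_samples, bins=50, eps=1e-10):
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    p, _ = np.histogram(p_samples, bins=bins, range=(lo, hi))
    q, _ = np.histogram(q_samples, bins=bins, range=(lo, hi))
    p = (p + eps) / (p + eps).sum()          # normalize with pseudocount
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))  # KL(P || Q) >= 0

rng = np.random.default_rng(1)
a = rng.standard_normal(10_000)
```

Note that KL(P‖Q) is asymmetric; the pseudocount makes the estimate finite but sensitive to binning, so values are best compared across models evaluated with the same settings, as done in the text.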
2.6. Umbrella Sampling Simulations
Umbrella sampling (US) simulations are conducted in OpenMM 7.678 interfaced with the open-source, community-developed PLUMED library,79 version 2.7.80 The reaction coordinates are the latent space learned from the neural network training. While this work was in progress, the PLUMED library did not directly recognize files in the PyTorch format. Therefore, all weights and biases in the neural network layers are extracted after training, and the transformations from the input pairwise distances to the latent space are reconstructed in a PLUMED input file using the linear and non-linear functionalities available in PLUMED. These operations are numerically stable and efficient as they are purely arithmetic computations and do not involve any gradient calculations. By the time our work was finished, the PYTORCH module had been added to a later version of the PLUMED library (version 2.9), which allows users to directly load models defined in PyTorch.
The sampling of the i-th umbrella window is restrained to a narrow region around its equilibrium position in the latent space, z_i, through a harmonic restraining potential, U_i(z) = (k/2)|z − z_i|², where k is the force constant. The centers of the US windows are taken to be along the straight line that connects the two centers of the prior distribution (indicated as black dots in Figure 5a and summarized numerically in Table S1). During the US simulations, at each integration step, the latent space variables are computed on-the-fly by the PLUMED library using the method described above. All other setup and simulation parameters are the same as in the unbiased low level simulations. A 10-ns trajectory (5000 snapshots) is collected for each window. With this sufficiently large dataset, we then compute the free energy difference as a function of the simulation length (see below) to evaluate the convergence of our AAE-based approach.
Figure 5: Free energy calculation results for ALA in the gas phase.

a The gray contour underneath is the prior distribution. The black dots represent the centers of each umbrella window. The rainbow contours represent the sampled regions on the latent space for each window. b The orange and blue dotted curves represent the ΔU distributions of the ensembles sampled at the high and low levels, respectively. The two distributions have only negligible overlap, while the distribution of the intermediate state has decent overlap with both levels, especially with the high level. MM = CHARMM36m, DFTB = DFTB3/3OB. c The free energy difference between the low and high levels, ΔF_{L→H}, as a function of the simulation length. The shaded areas represent the standard error of the mean (SEM) computed from five replicas. Note that the model is trained on the first 2.5 ns of trajectories from both levels, as indicated by the vertical dotted line in c, but extended simulations are conducted to compute the reference FEP and BAR values.
2.7. Free Energy Calculations
Following eq. 2, the correction term is computed as ΔF_{L→H} = ΔF_{L→M} + ΔF_{M→H}, where M denotes the intermediate model. In the current work, we focus on a single end state and thus drop the end-state subscript in the free energy differences hereafter.
The WHAM package of the Grossfield lab81 is used to calculate the free energy difference between umbrella windows. The original low level simulation is treated as one umbrella window without the restraining potential (denoted as the 0th window). Nine umbrella windows are simulated in both test cases (the 1st to 9th windows), and the last (9th) window serves as the intermediate model M, so ΔF_{L→M} is the free energy difference between the 0th and the 9th windows. The free energy difference between the intermediate model and the high level, ΔF_{M→H}, as well as the direct BAR results, are calculated with the BAR estimators implemented in the pymbar package,82 while the direct FEP results are calculated with in-house codes. The reported free energy differences are the average values of 5 independent replicas, and the statistical errors are estimated as the standard error of the mean (SEM) across the 5 replicas. The SEM of ΔF_{L→M} and the SEM of ΔF_{M→H} are propagated to obtain the SEM of ΔF_{L→H} using the formula SEM(ΔF_{L→H}) = [SEM(ΔF_{L→M})² + SEM(ΔF_{M→H})²]^{1/2}. The computed free energy differences (Figures 5c, 6b,d and 7d), their components (Figures S6 and S11), and the statistical errors (Figures S7 and S12) are shown as functions of the simulation length (i.e., by including an increasing number of snapshots in the free energy calculations), which is taken to be the same for the low level, high level, and umbrella sampling simulations.
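The staged combination and its error propagation reduce to a few lines; the numerical values below are placeholders for illustration, not results from the paper:

```python
# Sketch of combining the staged free energy terms (eq. 2) and propagating
# their standard errors in quadrature.
import math

def combine(dF_LM, sem_LM, dF_MH, sem_MH):
    """dF(L->H) = dF(L->M) + dF(M->H); SEMs add in quadrature."""
    dF = dF_LM + dF_MH
    sem = math.sqrt(sem_LM ** 2 + sem_MH ** 2)
    return dF, sem

dF, sem = combine(3.0, 0.4, -1.0, 0.3)  # placeholder kcal/mol values
```

Quadrature addition assumes the two terms are statistically independent, which holds here because ΔF_{L→M} and ΔF_{M→H} are estimated from separate sets of simulations.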
Figure 6: The KL divergence between the reconstructed and the original distributions is a good indicator of whether the sampling is sufficient for training.

a When a much smaller training dataset (250 conformations from each level) is used, the model with the optimal architecture results in a poor overlap between the reconstructed and the original ΔU distributions for the test dataset. The KL divergence between the two distributions is 4.20 ± 0.54. b The free energy differences also deviate significantly from the average BAR value, especially in the first few nanoseconds. c A smaller KL divergence of 2.46 ± 0.32, and therefore a good overlap between the reconstructed and original distributions, is obtained with another model with a slightly different architecture but trained on 1250 conformations from each level. d The free energy estimates obtained with this model show much quicker and better convergence than those of the model in b and are comparable with the optimal model discussed in the text. Note that the models are trained on the first 0.5 ns or 2.5 ns of trajectories from both levels, as indicated by the vertical dotted lines in b and d, respectively, but extended simulations are conducted to compute the reference values. The shaded areas represent the SEM from 5 replicas.
Figure 7: The AAE-assisted free energy calculation results on the solvated bis-2-chloro diethyl ether (2CLE) system.

a The solute, 2CLE, is solvated in a water box. b The prior distribution and the centers of the umbrella windows follow definitions similar to those of the ALA example in Figure 5a. c The ΔU distribution of the intermediate model (the solid green curve), i.e., the last umbrella window, has significant overlap with the distributions of both the low (the dotted blue curve) and high (the dotted orange curve) levels. d The free energy difference as a function of the simulation length shows a pattern similar to the ALA case. The vertical dotted line represents the simulation length used for model training. The shaded areas represent the SEM from 5 replicas.
3. Results and Discussion
3.1. Training of the Optimal Model
The performance of our approach is first evaluated with alanine dipeptide (ALA) in the gas phase (Figure 3a), a well-studied benchmark system for testing free energy calculation methods.83 Although it is a small molecule, some DOFs such as the backbone and sidechain torsional angles are likely to feature different distributions at different levels of theory. Here, the low level is the MM force field65 (MM), while the high level is a semi-empirical QM method, the third-order density functional tight binding (DFTB3/3OB) method.61–64 We chose this semi-empirical QM method as the high level instead of an ab initio QM approach because DFTB3/3OB is more efficient with reasonable accuracy, making it possible to obtain ‘exact’ reference results by running brute-force sampling, which can be used to evaluate the reliability and convergence of our multi-level FES approach. As stated earlier, the pairwise distances between the heavy atoms from MD trajectories at the MM and DFTB3/3OB levels are combined as the training dataset. The ΔU distributions at the two levels have large widths of 3.46 and 4.28 kcal/mol, respectively, and barely overlap, which highlights the difficulty of obtaining accurate free energy results with conventional methods29,30 even for a gas phase system (Figure S2).
Figure 3: Training of the optimal model using the gas-phase alanine dipeptide (ALA) system as an example.

a The three-dimensional structure of ALA. b The blue solid curve and the green dotted curve represent the generator losses for the training and validation datasets, respectively. The orange solid curve and the red dotted curve represent the discriminator losses for the training and validation datasets, respectively. The shaded areas represent one standard deviation of the losses computed from the five-fold cross-validation. c After training, the test dataset is used to evaluate the performance of the model. The reconstructed ΔU distribution overlaps well with the original distribution. The decoder predicts the scaled ΔU from both levels and therefore, to make a direct comparison, the distribution of the original scaled ΔU in c also includes data from both levels. The KL divergence between the two distributions is 2.79 ± 1.12. The corresponding comparison for the training set is shown in Figure S4, in which the distributions include data from both levels as well. d The mapping of another, larger test dataset onto the learned latent space. The colorbar represents the scaled ΔU value associated with each latent space variable. The arrow points from the regions representing the dominant populations of the low level to the regions representing the dominant populations of the high level (i.e., from 1 to −1).
The training and validation loss curves demonstrate the advantage of our AAE architecture (Figure 3b). By implementing the two strategies mentioned above, the discriminator loss quickly converges to the equilibrium value with very small variation. The same trend is observed for the generator loss, which decays sharply to a small and stable value. Moreover, the validation loss agrees well with the training loss, indicating neither over-fitting nor under-fitting. For comparison, two additional architectures are also implemented. One is an unsupervised and symmetric AAE, i.e., the label information is not provided to the discriminator and the decoder reconstructs the pairwise distances. The other is a supervised and symmetric AAE, i.e., the label information is provided to the discriminator but the decoder still reconstructs the pairwise distances (i.e., not ΔU). For both, the discriminator loss generally has much larger variation and does not converge to the expected value; the generator loss is also significantly larger than that of the best model (Figures 4a and b). In the supervised and symmetric AAE, the validation loss of the discriminator also deviates markedly from the training loss (Figure 4b). These observations clearly show that, without the two strategies we adopted, the optimal model would not be learned.
Figure 4: Training outcomes of the ALA test system for neural network models trained with non-optimal architectures.

a Loss curves of the unsupervised and symmetric AAE model, i.e., without label information and with the decoder reconstructing the input data. b Loss curves of the supervised and symmetric AAE. c Mapping of a separate test dataset onto the learned latent space of the unsupervised and symmetric AAE. d Mapping of a separate test dataset onto the learned latent space of the supervised and symmetric AAE. All models are trained with the dataset containing 1250 conformations from each level, and the other hyper-parameters are the same as in the optimal architecture.
After training, the distribution of the reconstructed potential energy difference for the training set overlaps well with the original distribution (Figure S4). To evaluate the model performance on data not used for training, a separate test dataset is used to obtain the distribution of the reconstructed potential energy difference after the model training (see Figure 3c). We deliberately used a small test set (250 conformations) to mimic the realistic scenario in which one aims to minimize the computational cost by limiting high-level sampling to the generation of the training dataset. Since the reconstructed values are generated by decoding the latent-space variables, the significant overlap between the reconstructed and the original distributions indicates that the learned latent space is indeed able to capture the essential DOFs that encode the difference between the distributions and therefore can facilitate the free energy calculations.
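The overlap between the reconstructed and original energy-difference distributions is quantified by a KL divergence. A minimal histogram-based sketch is shown below; the bin count and the epsilon regularization of empty bins are illustrative choices of ours, not the paper's exact procedure.

```python
import numpy as np

def kl_divergence(samples_p, samples_q, n_bins=50, eps=1e-10):
    """Histogram-based estimate of D_KL(P || Q) between two 1-D samples.

    Both samples are binned on a common grid; empty bins are
    regularized with a small epsilon so the logarithm stays finite.
    """
    lo = min(samples_p.min(), samples_q.min())
    hi = max(samples_p.max(), samples_q.max())
    bins = np.linspace(lo, hi, n_bins + 1)
    p, _ = np.histogram(samples_p, bins=bins, density=True)
    q, _ = np.histogram(samples_q, bins=bins, density=True)
    width = bins[1] - bins[0]
    p = p * width + eps   # convert bin densities to regularized probabilities
    q = q * width + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))
```

By construction the divergence is zero for identical samples, small for two draws from the same distribution, and grows as the two distributions separate, which is what makes it usable as an overlap metric.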
Mapping the test dataset onto the learned latent space strongly suggests whether the encoder has learned the representation of the input data distribution. Here, to better visualize the effect of the mapping, we used a much larger test dataset. In models with non-optimal architectures (i.e., the unsupervised and symmetric AAE or the supervised and symmetric AAE), the mapping of the test dataset only partially overlaps with the prior distribution (Figures 4c and 4d), leaving a large portion of the prior distribution uncovered. The uncovered regions reflect the lack of a well-defined relationship between the prior distribution and the learned latent space in these models, preventing a clear interpretation of the mapping of the test dataset. By contrast, with the optimal architecture, the mapping of the test dataset overlaps well with the prior distribution, indicating that the learned latent space closely resembles the desired prior distribution (Figure 3d). The optimal mapping provides further evidence that our model has successfully learned the relationship between the configurational space and the latent space, including the information regarding the distributions. First, the input data from the low and high levels are mapped onto different regions of the latent space by the supervised discriminator (Figure S5). Second, the fact that the learned latent space captures the essential information about the distributions is best demonstrated by coloring each latent-space point with its associated energy-difference value. Data points with similar values cluster in nearby regions of the latent space. This naturally forms a pathway from the regions representing the dominant populations of the low level to the regions representing the dominant populations of the high level.
Thus, if we sample along this pathway in the latent space, we are able to collect configurations that are increasingly representative of the dominant populations of the high-level method.
3.2. Construction of the Intermediate State Through Umbrella Sampling
With the optimal latent space, umbrella windows are constructed to follow the pathway that represents the continuous transition of the latent space mapping. Each umbrella window overlaps with its neighbors to a notable extent, which ensures that their distributions also have decent overlap (Figure 5a). As expected, the distributions progressively shift toward that of the high level. The last umbrella window features the largest overlap with the high level distribution and is chosen as the intermediate model (Figure 5b), for which the remaining free energy difference to the high level can be efficiently and accurately calculated through the two-sided BAR estimator.57 The free energy difference between the original (unbiased) low level simulation and the intermediate model is readily calculated through the weighted histogram analysis method (WHAM).84 The sum of these two terms gives the total free energy difference between the low level and the high level, which is the correction term of interest in multi-level FES.
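Conceptually, the umbrella windows amount to harmonic restraints placed at evenly spaced centers along the low-to-high pathway in the latent space. The sketch below assumes the two mixture components sit near (+1, 0) and (−1, 0), consistent with the "from 1 to −1" labeling of Figure 3d; the component means, force constant, and function names are illustrative assumptions, not the paper's exact parameters.

```python
import numpy as np

# Assumed means of the two prior components: low-level populations near
# +1 and high-level populations near -1 along the first latent dimension.
MU_LOW = np.array([1.0, 0.0])
MU_HIGH = np.array([-1.0, 0.0])

def window_centers(n_windows):
    """Evenly spaced umbrella-window centers along the low-to-high path."""
    t = np.linspace(0.0, 1.0, n_windows)[:, None]
    return (1.0 - t) * MU_LOW + t * MU_HIGH

def harmonic_bias(z, center, k=10.0):
    """Umbrella restraint w(z) = (k/2) |z - center|^2 in the latent space.

    z is the latent code of the current configuration, computed on the
    fly by the trained encoder during the biased simulation.
    """
    z = np.asarray(z, dtype=float)
    return 0.5 * k * np.sum((z - center) ** 2)
```

The last window, centered closest to the high-level component, plays the role of the intermediate model whose distribution overlaps best with the high level.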
To demonstrate the advantage of our approach, we compare the free energy results from our AAE-assisted staged transformation approach with direct calculations of the low-to-high free energy difference using FEP28 and BAR.57 Although FEP is an efficient estimator in that it requires sampling only at the low level, the reliability of FEP calculations depends critically on the distribution overlap between the levels. BAR in principle has a looser dependence on the configurational overlap, although it requires explicit sampling at the high level. With the 2.5-ns MD trajectories used to train the AAE model, the FEP results (the dotted blue curve in Figure 5c) deviate significantly from the reference value, and this trend persists even when more data are included (data points after the vertical dotted line). Moreover, large fluctuations are observed in the FEP results (the blue shaded area in Figure 5c). Using the same 2.5-ns MD trajectories, the average values of BAR are stable (the dashed black line in Figure 5c). Nevertheless, the large statistical uncertainty for the first few nanoseconds (the grey shaded area in Figure 5c) indicates that the BAR results are reliable only if the sampling is long enough to develop sufficient overlap in the distributions (for example, longer than 10 ns; see Figure 5c). This behavior makes the straightforward BAR calculation less practical in multi-level FES due to the high computational cost of extensive sampling at the high level. Still, the better convergence of BAR relative to FEP highlights that some amount of sampling at the high level is necessary to ensure the reliability of the free energy estimate.
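For reference, both estimators admit compact implementations. The sketch below is a minimal numpy version of the Zwanzig/FEP formula and of the two-sided BAR self-consistency condition solved by bisection; the default kT of 0.593 kcal/mol (room temperature) and the function names are our own illustrative choices.

```python
import numpy as np

def fep(du, kT=0.593):
    """Zwanzig/FEP estimate dA = -kT ln < exp(-dU/kT) >_low, where
    du = U_high - U_low is evaluated on low-level configurations.
    A log-sum-exp shift keeps the exponential numerically stable."""
    beta = 1.0 / kT
    shift = np.min(du)
    return shift - kT * np.log(np.mean(np.exp(-beta * (du - shift))))

def bar(du_low, du_high, kT=0.593, tol=1e-9):
    """Bennett acceptance ratio with equal sample sizes: solve
        sum_i f(beta (dU_i - dA)) = sum_j f(-beta (dU_j - dA))
    for dA by bisection, where f(x) = 1/(1 + exp(x)) is the Fermi
    function and dU = U_high - U_low on configurations drawn from the
    low (i) and high (j) ensembles."""
    beta = 1.0 / kT
    f = lambda x: 1.0 / (1.0 + np.exp(np.clip(x, -500.0, 500.0)))

    def g(c):  # monotonically increasing in c; its root is the estimate
        return (np.sum(f(beta * (du_low - c)))
                - np.sum(f(-beta * (du_high - c))))

    lo = min(du_low.min(), du_high.min()) - 50.0 * kT
    hi = max(du_low.max(), du_high.max()) + 50.0 * kT
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The FEP estimate uses only the low-level ensemble, which is exactly why it fails when the two energy-difference distributions barely overlap; BAR uses both ensembles and degrades more gracefully.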
In sharp contrast to the free energy differences computed with FEP and BAR, our AAE-based approach captures the free energy difference with high accuracy and efficiency (the orange curve in Figure 5c; for components and statistical errors, see Figures S6 and S7). The first term, the free energy difference between the low level and the intermediate model, is efficiently calculated from the original low level trajectory and the intermediate model trajectory through the US simulations and WHAM as discussed above. The significant configurational overlap between the intermediate model and the high level greatly reduces the uncertainty in the calculation of the second term, the free energy difference between the intermediate model and the high level. Altogether, the total correction computed from our approach quickly converges to the reference value when only the 2.5-ns high level data are included in the free energy calculations. Within this timescale, the results from our approach are more reliable than those of the other methods, as demonstrated by the much smaller statistical uncertainty. To further support this observation, we also calculated free energy differences using the same model trained on the 2.5-ns trajectories but with more data for calculating free energies (see the orange curve in Figure 5c after the vertical dotted line). These additional calculations show comparable average results and a similar level of statistical uncertainty to the results for the first 2.5 ns, again supporting that a modest amount of sampling at the high level is sufficient to obtain converged results with our approach.
3.3. Proper Size of Training Dataset
In our method, an issue of major importance is the determination of the length of the training simulations, especially the amount of sampling at the high level of theory. The expectation is that, although the size of the training dataset at the high level is not large enough for computing accurate free energy differences directly, the learned latent space is still able to guide the construction of the intermediate model so that a reliable estimate of the free energy difference can eventually be obtained. In this regard, we surmise that the KL divergence between the reconstructed and the original scaled energy-difference distributions is a proper metric for the overlap between the two distributions and therefore an indicator of whether the amount of training data is sufficient.
The model trained on the 2.5-ns dataset (1250 conformations from each level) shows a small KL divergence (Figure 5c), while models trained on a 0.5-ns dataset (250 conformations from each level) show large KL divergences (Figure 6a). Correspondingly, free energy differences from models trained on the smaller dataset show a large deviation from the reference value, suggesting that the training dataset has not captured the major distinctions between the low and high levels, which leads to a systematic error in the AAE-assisted latent space discovery and the subsequent free energy estimates (Figure 6b); nevertheless, we note that even with the smaller dataset, the results are substantially better than FEP. Furthermore, a model with a different set of hyper-parameters but a KL divergence similar to that of the “optimal” model discussed above (Figure 6c) is able to maintain a similar level of accuracy in the free energy calculations (Figure 6d). These observations support that the KL divergence between the reconstructed and the original distributions is a proper indicator for determining the required length of the training simulations.
3.4. Free Energy Difference in a Solvated System
Having demonstrated success for a gas phase system, we now apply the computational framework to a condensed phase system, which includes a small solute, bis-2-chloro diethyl ether (2CLE), in a water box (Figure 7a). Here, the low level is still the MM FF, and the high level potential is a combined QM/MM method in which the solute is treated by DFTB3/3OB and the solvent is treated by the MM FF. In addition to intra-solute DOFs, solute–solvent interactions also vary with the level of theory. Thus the low and high level energy-difference distributions are broad (with widths of 3.22 and 4.70 kcal/mol, respectively) and likely to have poor overlap, making it even more difficult for conventional methods to converge.35 For the optimization of the latent space, to avoid modeling the large number of water molecules, we aim to include minimal information about the solvent in the training dataset. In this particular example, only pairwise distances between the heavy atoms of the solute are used for training. In this way, we expect the solvent effect to be implicitly reflected in the distributions of the solute-only features.
For the AAE-based training, the two modifications of the architecture discussed earlier also prove important in the solvated case. As shown in Figure S8, the loss curves in the solvated case exhibit behavior similar to that observed in the gas phase example. With the optimal architecture, the training loss and the validation loss agree well with each other (Figure S8a). The discriminator loss quickly converges to the equilibrium value with small variation, and the generator loss also decays rapidly to a small and stable value. As in the gas phase example, models with non-optimal architectures generally show large variations in the discriminator loss and/or poor overlap between the training and validation losses (Figures S8b and c). The KL divergence between the reconstructed and the original distributions is again used to determine the required length of the training dataset (Figure S9). We found that models trained on the dataset containing 2.5-ns trajectories from each level show favorable overlap between the two distributions and a small KL divergence (Figure S9a), which indicates that this dataset is sufficient for training a reliable model, as also demonstrated by the loss curves in Figure S8a.
With the optimal latent space, US simulations are conducted to establish the intermediate model (Figure 7b). As in the gas phase example, the distribution of the last umbrella window has a large overlap with the high level distribution and is therefore chosen as the intermediate model (Figure 7c). Figure 7d shows the free energy difference as a function of the simulation length using FEP, BAR, and our approach. The FEP results never converge to the reference value and exhibit large statistical uncertainties. The BAR results slowly converge to the reference value but show large statistical uncertainties for all simulation lengths, suggesting that the results would remain unreliable even with much longer simulations at both levels. By contrast, the free energy computed with our approach (for components and statistical errors, see Figures S11 and S12) quickly converges to the reference value within the first 2.5 ns. The statistical uncertainty of our approach is much smaller than that of the other methods, illustrating that the results are far more reliable even with a modest amount of sampling. These encouraging observations demonstrate that our approach is able to efficiently and accurately estimate the free energy difference between the low and high levels in the solvated case. We recall that only solute-derived features are used for model training. The fact that an accurate free energy difference can still be efficiently calculated suggests that our computational framework has the potential to be scaled up to systems with a large number of solvent molecules.
4. Concluding Remarks
In this work, we have developed a computational framework to compute the free energy difference between an inexpensive, low level of theory and a more costly, high level of theory, which is a key quantity of interest in multi-level FES. The major challenge we aim to tackle is to automatically identify the collective coordinates that dictate the distinct energy difference distributions at the two levels of theory. The solution we propose is a neural network model based on the adversarial autoencoder (AAE), which matches the distribution of the latent space to a prior distribution, thus allowing a better-controlled mapping from the input data to the latent space than the traditional AE. We implemented two modifications to the basic AAE architecture to make our approach particularly effective in the context of multi-level FES. Since the neural network model is aware of both the configurational distributions and the energy difference distributions at the two levels of theory, the latent space effectively captures the DOFs that dictate the poor distribution overlap between the two levels, thus overcoming a major remaining bottleneck of previous studies. Due to the multi-layer nature of the neural network used in our work, it remains difficult to explicitly quantify the contributions of individual features. Nevertheless, examination of the weight matrix (Figures S3 and S10) suggests that no single feature dominates and that the latent space is spanned by multiple conformational degrees of freedom, highlighting the power and scalability of the AAE framework.
This optimal latent space is then used in US simulations to efficiently construct an intermediate model that has significantly improved distribution overlap with the high level. In turn, the staged transformation strategy proposed in our previous study25 is adopted to calculate the total free energy difference between the low and high levels (eq. 2). This data-driven method has been applied to both gas phase and condensed phase systems. Compared to straightforward BAR calculations, which require extensive sampling at both the low and high levels, more than an order-of-magnitude increase in computational efficiency is observed in both cases with at least comparable accuracy. This success supports our expectation that a modest amount of sampling at the high level is both necessary and sufficient to guide the discovery of latent space variables that substantially improve the distribution overlap with the high level of theory by judiciously biasing low-level simulations in such a latent space.
The unique aspect of our approach compared to existing multi-level FES methods21,23,24,26,41 is that we use a neural network to automatically learn a compact latent space that represents the multiple degrees of freedom that encode the distinct distributions at the two levels. This feature avoids the need to identify and/or correct individual problematic DOFs as required by methods that explicitly modify the low-level potential function,21,26 making our approach more scalable to large systems in which many DOFs are expected to contribute to the poor overlap between the low and high levels of theory. For example, the Δ-learning approach39–42 needs to learn the energy (and force) differences between two levels of theory, which are complex functions of high dimensionality; an accurate and transferable training is thus expected to require a large number of data points for condensed phase systems, although many configurations may not make a significant contribution to the free energy difference. By contrast, by focusing only on the DOFs that most strongly impact the distribution overlap between the two levels of theory, our approach explicitly drives the sampling of configurations that contribute predominantly to the free energy difference, which explains the high efficiency and accuracy observed in the example applications.
The performance of our neural network-assisted approach, which first learns an optimal latent space and then constructs an intermediate distribution to facilitate the convergence of free energy correction, can be further enhanced with more sophisticated molecular features, such as those representing the nearby solvent distributions.39,40,85 In fact, we note that the two levels of theory can differ significantly in not only structural distributions but also electronic properties, such as charge distributions,25,46 which lead to different interactions with the environment. For such cases, including the environmental features in the training of the latent space is expected to be particularly important. Overall, our study provides a uniquely effective avenue to compute the low-to-high free energy correction by integrating state-of-the-art machine learning models with enhanced sampling simulations.
Supplementary Material
Figure 2: The proposed workflow for multi-level free energy simulations.

A modified AAE model is trained with input simulation data from both the low and high levels. The input data are the pairwise distances between selected atoms (for example, the heavy atoms of the solute molecule). The output of the decoder is the reconstructed potential energy difference associated with the input data. The prior distribution is a two-dimensional Gaussian mixture with two components. As indicated in the figure, by design, the low level and high level data are mapped to their respective components. The learned encoder and latent space are then used for the on-the-fly calculation of latent space variables during enhanced sampling simulations for computing the free energy correction between the low and high levels of theory (eq. 2).
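The two-component Gaussian-mixture prior described above can be sketched with a few lines of numpy. The component means at (+1, 0) and (−1, 0) and the width are illustrative assumptions chosen to match the "from 1 to −1" labeling of the latent space, not the paper's exact hyper-parameters.

```python
import numpy as np

def sample_prior(labels, sigma=0.2, rng=None):
    """Draw latent-space samples from a two-component 2-D Gaussian mixture.

    Each label selects a component: low-level data (label 0) map to a
    Gaussian centered at (+1, 0), high-level data (label 1) to one
    centered at (-1, 0).  The discriminator of the supervised AAE
    pushes the encoder output of each level toward its own component.
    """
    rng = np.random.default_rng() if rng is None else rng
    means = np.array([[1.0, 0.0], [-1.0, 0.0]])
    labels = np.asarray(labels, dtype=int)
    return means[labels] + sigma * rng.standard_normal((labels.size, 2))
```

Sampling with all-zero labels clusters points around (+1, 0) and with all-one labels around (−1, 0), reproducing the two separated populations that anchor the low-to-high pathway.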
Acknowledgement
This work was supported by NIH grant R35-GM141930 and grant ML-21-016 from the Dreyfus foundation. Computational resources from the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by NSF Grant ACI-1548562, are greatly appreciated; part of the computational work was performed on the Shared Computing Cluster, which is administered by Boston University’s Research Computing Services (www.bu.edu/tech/support/research/).
Footnotes
Supporting Information Available
Illustration of the typical architecture of several relevant neural network models, and additional computational results for the multi-level free energy simulations of the test systems.
References
- (1).Rao SN; Singh UC; Bash PA; Kollman PA Free energy perturbation calculations on binding and catalysis after mutating Asn 155 in subtilisin. Nature 1987, 328, 551–554. [DOI] [PubMed] [Google Scholar]
- (2).Simonson T; Archontis G; Karplus M Free energy simulations come of age: Protein-ligand recognition. Acc. Chem. Res 2002, 35, 430–437. [DOI] [PubMed] [Google Scholar]
- (3).Hansen N; van Gunsteren WF Practical Aspects of Free-Energy Calculations: A Review. J. Chem. Theory Comput 2014, 10, 2632–2647. [DOI] [PubMed] [Google Scholar]
- (4).Baron R; McCammon JA Molecular Recognition and Ligand Association. Annu. Rev. Phys. Chem 2013, 64, 151–175. [DOI] [PubMed] [Google Scholar]
- (5).Gao J; Kuczera K; Tidor B; Karplus M Hidden Thermodynamics of Mutant Proteins - A Molecular Dynamics Analysis. Science 1989, 244, 1069–1072. [DOI] [PubMed] [Google Scholar]
- (6).Lin YL; Meng YL; Jiang W; Roux B Explaining why Gleevec is a specific and potent inhibitor of Abl kinase. Proc. Natl. Acad. Sci. U.S.A 2013, 110, 1664–1669. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (7).Wang L; Wu Y; Deng Y; Kim B; Pierce L; Krilov G; Lupyan D; Robinson S; Dahlgren MK; Greenwood J; Romero DL; Masse C; Knight JL; Steinbrecher T; Beuming T; Damm W; Harder E; Sherman W; Brewer M; Wester R; Murcko M; Frye L; Farid R; Lin T; Mobley DL; Jorgensen WL; Berne BJ; Friesner RA; Abel R Accurate and Reliable Prediction of Relative Ligand Binding Potency in Prospective Drug Discovery by Way of a Modern Free-Energy Calculation Protocol and Force Field. J. Am. Chem. Soc 2015, 137, 2695–2703. [DOI] [PubMed] [Google Scholar]
- (8).Zhu F; Bourguet FA; Bennett WFD; Lau EY; Arrildt KT; Segelke BW; Zemla AT; Desautels TA; Faissol DM Large-scale application of free energy perturbation calculations for antibody design. Sci. Rep 2022, 12, 12489. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (9).Zhang H; Kim S; Im W Practical Guidance for Consensus Scoring and Force Field Selection in Protein–Ligand Binding Free Energy Simulations. J. Chem. Inf. Model 2022, 62, 6084–6093. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (10).Cournia Z; Allen B; Sherman W Relative Binding Free Energy Calculations in Drug Discovery: Recent Advances and Practical Considerations. J. Chem. Inf. Model 2017, 57, 2911–2937. [DOI] [PubMed] [Google Scholar]
- (11).Friesner RA; Guallar V Ab initio quantum chemical and mixed quantum mechanics/molecular mechanics (QM/MM) methods for studying enzymatic catalysis. Annu. Rev. Phys. Chem 2005, 56, 389–427. [DOI] [PubMed] [Google Scholar]
- (12).Hu H; Yang W Free Energies of Chemical Reactions in Solution and in Enzymes with Ab Initio Quantum Mechanics/Molecular Mechanics Methods. Annu. Rev. Phys. Chem 2008, 59, 573–601. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (13).Gao J; Ma S; Major DT; Nam K; Pu J; Truhlar DG Mechanisms and Free Energies of Enzymatic Reactions. Chem. Rev 2006, 106, 3188–3209. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (14).Lu X; Fang D; Ito S; Okamoto Y; Ovchinnikov V; Cui Q QM/MM free energy simulations: recent progress and challenges. Mol. Simul 2016, 42, 1056–1078. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (15).Ryde U; Söderhjelm P Ligand-Binding Affinity Estimates Supported by Quantum-Mechanical Methods. Chem. Rev 2016, 116, 5520–5566. [DOI] [PubMed] [Google Scholar]
- (16).Dos Santos AM; Oliveira ARS; da Costa CHS; Kenny PW; Montanari CA; Varela J. d. J. G. J.; Lameira J Assessment of Reversibility for Covalent Cysteine Protease Inhibitors Using Quantum Mechanics/Molecular Mechanics Free Energy Surfaces. J. Chem. Inf. Model 2022, 62, 4083–4094. [DOI] [PubMed] [Google Scholar]
- (17).van der Kamp MW; Mulholland AJ Combined Quantum Mechanics/Molecular Mechanics (QM/MM) Methods in Computational Enzymology. Biochemistry 2013, 52, 2708–2728. [DOI] [PubMed] [Google Scholar]
- (18).Yang W; Bitetti-Putzer R; Karplus M Chaperoned alchemical free energy simulations: A general method for QM, MM, and QM/MM potentials. J. Chem. Phys 2004, 120, 9450–9453. [DOI] [PubMed] [Google Scholar]
- (19).Gao J Absolute free energy of solvation from Monte Carlo simulations using combined quantum and molecular mechanical potentials. J. Phys. Chem 1992, 96, 537–540. [Google Scholar]
- (20).Dybeck EC; König G; Brooks BR; Shirts MR Comparison of Methods To Reweight from Classical Molecular Simulations to QM/MM Potentials. J. Chem. Theory Comput 2016, 12, 1466–1480. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (21).Hudson PS; Boresch S; Rogers DM; Woodcock HL Accelerating QM/MM Free Energy Computations via Intramolecular Force Matching. J. Chem. Theory Comput 2018, 14, 6327–6335. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (22).Hudson PS; Woodcock HL; Boresch S Use of Interaction Energies in QM/MM Free Energy Simulations. J. Chem. Theory Comput 2019, 15, 4632–4645. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (23).König G; Hudson PS; Boresch S; Woodcock HL Multiscale Free Energy Simulations: An Efficient Method for Connecting Classical MD Simulations to QM or QM/MM Free Energies Using Non-Boltzmann Bennett Reweighting Schemes. J. Chem. Theory Comput 2014, 10, 1406–1419. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (24).Hudson PS; Woodcock HL; Boresch S Use of Nonequilibrium Work Methods to Compute Free Energy Differences Between Molecular Mechanical and Quantum Mechanical Representations of Molecular Systems. J. Phys. Chem. Lett 2015, 6, 4850–4856. [DOI] [PubMed] [Google Scholar]
- (25).Ito S; Cui Q Multi-level free energy simulation with a staged transformation approach. J. Chem. Phys 2020, 153, 044115. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (26).Giese TJ; York DM Development of a Robust Indirect Approach for MM →QM Free Energy Calculations That Combines Force-Matched Reference Potential and Bennett’s Acceptance Ratio Methods. J. Chem. Theory Comput 2019, 15, 5543–5562. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (27).Collins MA; Ho J Accelerating the Calculation of Solute–Solvent Interaction Energies through Systematic Molecular Fragmentation. J. Phys. Chem. A 2019, 123, 8476–8484. [DOI] [PubMed] [Google Scholar]
- (28).Zwanzig RW High-Temperature Equation of State by a Perturbation Method. I. Nonpolar Gases. J. Chem. Phys 1954, 22, 1420–1426. [Google Scholar]
- (29).Pohorille A; Jarzynski C; Chipot C Good Practices in Free-Energy Calculations. J. Phys. Chem. B 2010, 114, 10235–10253. [DOI] [PubMed] [Google Scholar]
- (30).Kofke DA Free Energy Methods in Molecular Simulation. Fluid Phase Equil 2005, 228–229, 41–48. [Google Scholar]
- (31).Wu D; Kofke DA Phase-space overlap measures. I. Fail-safe bias detection in free energies calculated by molecular simulation. J. Chem. Phys 2005, 123, 054103. [DOI] [PubMed] [Google Scholar]
- (32).Wu D; Kofke DA Phase-space overlap measures. II. Design and implementation of staging methods for free-energy calculations. J. Chem. Phys 2005, 123, 084109. [DOI] [PubMed] [Google Scholar]
- (33).Cave-Ayland C; Skylaris C-K; Essex JW Direct Validation of the Single Step Classical to Quantum Free Energy Perturbation. J. Phys. Chem. B 2015, 119, 1017–1025. [DOI] [PubMed] [Google Scholar]
- (34).Konig G; Brooks BR Correcting for the free energy costs of bond or angle constraints in molecular dynamics simulations. Biochim. Biophys. Acta - Gen. Subj 2015, 1850, 932–943. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (35).Kearns FL; Hudson PS; Woodcock HL; Boresch S Computing converged free energy differences between levels of theory via nonequilibrium work methods: Challenges and opportunities. J. Comput. Chem 2017, 38, 1376–1388. [DOI] [PubMed] [Google Scholar]
- (36).Heimdal J; Ryde U Convergence of QM/MM free-energy perturbations based on molecular-mechanics or semiempirical simulations. Phys. Chem. Chem. Phys 2012, 14, 12592–12604. [DOI] [PubMed] [Google Scholar]
- (37).Pan X; Van R; Epifanovsky E; Liu J; Pu J; Nam K; Shao Y Accelerating Ab Initio Quantum Mechanical and Molecular Mechanical (QM/MM) Molecular Dynamics Simulations with Multiple Time Step Integration and a Recalibrated Semiempirical QM/MM Hamiltonian. J. Phys. Chem. B 2022, 126, 4226–4235. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (38).Shen L; Wu J; Yang W Multiscale Quantum Mechanics/Molecular Mechanics Simulations with Neural Networks. J. Chem. Theory Comput 2016, 12, 4934–4946. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (39).Shen L; Yang W Molecular Dynamics Simulations with Quantum Mechanics/Molecular Mechanics and Adaptive Neural Networks. J. Chem. Theory Comput 2018, 14, 1442–1455. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (40).Pan X; Yang J; Van R; Epifanovsky E; Ho J; Huang J; Pu J; Mei Y; Nam K; Shao Y Machine-Learning-Assisted Free Energy Simulation of Solution-Phase and Enzyme Reactions. J. Chem. Theory Comput 2021, 17, 5745–5758. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (41).Zeng J; Giese TJ; Ekesan Ş; York DM Development of Range-Corrected Deep Learning Potentials for Fast, Accurate Quantum Mechanical/Molecular Mechanical Simulations of Chemical Reactions in Solution. J. Chem. Theory Comput 2021, 17, 6993–7009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (42).Böselt L; Thürlemann M; Riniker S Machine Learning in QM/MM Molecular Dynamics Simulations of Condensed-Phase Systems. J. Chem. Theory Comput 2021, 17, 2641–2658. [DOI] [PubMed] [Google Scholar]
- (43).Papamakarios G; Nalisnick E; Rezende DJ; Mohamed S; Lakshminarayanan B Normalizing Flows for Probabilistic Modeling and Inference. J. Mach. Learn. Res 2021, 22, 2617–2680. [Google Scholar]
- (44).Rizzi A; Carloni P; Parrinello M Targeted Free Energy Perturbation Revisited: Accurate Free Energies from Mapped Reference Potentials. J. Phys. Chem. Lett 2021, 12, 9449–9454. [DOI] [PubMed] [Google Scholar]
- (45).Rizzi A; Carloni P; Parrinello M Multimap targeted free energy estimation 2023, arXiv: 2302.07683. arXiv.org ePrint archive, 10.48550/arXiv.2302.07683 (accessed Jun 1, 2023). [DOI] [PMC free article] [PubMed]
- (46).Schöller A; Woodcock HL; Boresch S Exploring Routes to Enhance the Calculation of Free Energy Differences via Non-Equilibrium Work SQM/MM Switching Simulations Using Hybrid Charge Intermediates between MM and SQM Levels of Theory or Non-Linear Switching Schemes. Molecules 2023, 28, 4006.
- (47).Hinton GE; Salakhutdinov RR Reducing the Dimensionality of Data with Neural Networks. Science 2006, 313, 504–507.
- (48).Chen W; Ferguson AL Molecular enhanced sampling with autoencoders: On-the-fly collective variable discovery and accelerated free energy landscape exploration. J. Comput. Chem 2018, 39, 2079–2102.
- (49).Chen W; Tan AR; Ferguson AL Collective variable discovery and enhanced sampling using autoencoders: Innovations in network architecture and error function design. J. Chem. Phys 2018, 149, 072312.
- (50).Belkacemi Z; Gkeka P; Lelièvre T; Stoltz G Chasing Collective Variables Using Autoencoders and Biased Trajectories. J. Chem. Theory Comput 2022, 18, 59–78.
- (51).Ribeiro JML; Bravo P; Wang Y; Tiwary P Reweighted autoencoded variational Bayes for enhanced sampling (RAVE). J. Chem. Phys 2018, 149, 072301.
- (52).Wang Y; Ribeiro JML; Tiwary P Past–future information bottleneck for sampling molecular reaction coordinate simultaneously with thermodynamics and kinetics. Nat. Commun 2019, 10, 3573.
- (53).Bonati L; Piccini G; Parrinello M Deep learning the slow modes for rare events sampling. Proc. Natl. Acad. Sci. U.S.A 2021, 118, e2113533118.
- (54).Wang Y; Lamim Ribeiro JM; Tiwary P Machine learning approaches for analyzing and enhancing molecular dynamics simulations. Curr. Opin. Struct. Biol 2020, 61, 139–145.
- (55).Wang Y; Herron L; Tiwary P From data to noise to data for mixing physics across temperatures with generative artificial intelligence. Proc. Natl. Acad. Sci. U.S.A 2022, 119, e2203656119.
- (56).Makhzani A; Shlens J; Jaitly N; Goodfellow I; Frey B Adversarial Autoencoders 2016, arXiv:1511.05644. arXiv.org ePrint archive, https://arxiv.org/abs/1511.05644 (accessed Jun 1, 2023).
- (57).Bennett CH Efficient estimation of free energy differences from Monte Carlo data. J. Comput. Phys 1976, 22, 245–268.
- (58).Goodfellow I; Pouget-Abadie J; Mirza M; Xu B; Warde-Farley D; Ozair S; Courville A; Bengio Y Generative Adversarial Networks. Commun. ACM 2020, 63, 139–144.
- (59).Torrie G; Valleau J Nonphysical sampling distributions in Monte Carlo free-energy estimation: Umbrella sampling. J. Comput. Phys 1977, 23, 187–199.
- (60).Brooks BR; Brooks III CL; Mackerell AD Jr.; Nilsson L; Petrella RJ; Roux B; Won Y; Archontis G; Bartels C; Boresch S; Caflisch A; Caves L; Cui Q; Dinner AR; Feig M; Fischer S; Gao J; Hodoscek M; Im W; Kuczera K; Lazaridis T; Ma J; Ovchinnikov V; Paci E; Pastor RW; Post CB; Pu JZ; Schaefer M; Tidor B; Venable RM; Woodcock HL; Wu X; Yang W; York DM; Karplus M CHARMM: The biomolecular simulation program. J. Comput. Chem 2009, 30, 1545–1614.
- (61).Elstner M; Porezag D; Jungnickel G; Elsner J; Haugk M; Frauenheim T; Suhai S; Seifert G Self-consistent-charge density-functional tight-binding method for simulations of complex materials properties. Phys. Rev. B 1998, 58, 7260–7268.
- (62).Gaus M; Cui Q; Elstner M DFTB3: Extension of the Self-Consistent-Charge Density-Functional Tight-Binding Method (SCC-DFTB). J. Chem. Theory Comput 2011, 7, 931–948.
- (63).Gaus M; Goez A; Elstner M Parametrization and Benchmark of DFTB3 for Organic Molecules. J. Chem. Theory Comput 2013, 9, 338–354.
- (64).Lu X; Gaus M; Elstner M; Cui Q Parametrization of DFTB3/3OB for Magnesium and Zinc for Chemical and Biological Applications. J. Phys. Chem. B 2015, 119, 1062–1082.
- (65).Huang J; Rauscher S; Nawrocki G; Ran T; Feig M; de Groot BL; Grubmüller H; MacKerell AD CHARMM36m: an improved force field for folded and intrinsically disordered proteins. Nat. Methods 2017, 14, 71–73.
- (66).Steinbach PJ; Brooks BR New spherical-cutoff methods for long-range forces in macromolecular simulation. J. Comput. Chem 1994, 15, 667–683.
- (67).Vanommeslaeghe K; Hatcher E; Acharya C; Kundu S; Zhong S; Shim J; Darian E; Guvench O; Lopes P; Vorobyov I; Mackerell AD Jr. CHARMM general force field: A force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields. J. Comput. Chem 2010, 31, 671–690.
- (68).Huang L; Roux B Automated Force Field Parameterization for Nonpolarizable and Polarizable Atomic Models Based on Ab Initio Target Data. J. Chem. Theory Comput 2013, 9, 3543–3556.
- (69).Neria E; Fischer S; Karplus M Simulation of activation free energies in molecular systems. J. Chem. Phys 1996, 105, 1902–1921.
- (70).Darden T; York D; Pedersen L Particle mesh Ewald: An N·log(N) method for Ewald sums in large systems. J. Chem. Phys 1993, 98, 10089–10092.
- (71).Boresch S; Karplus M The role of bonded terms in free energy simulations: 1. Theoretical analysis. J. Phys. Chem. A 1999, 103, 103–118.
- (72).Boresch S; Karplus M The role of bonded terms in free energy simulations. 2. Calculation of their influence on free energy differences of solvation. J. Phys. Chem. A 1999, 103, 119–136.
- (73).Pedregosa F; Varoquaux G; Gramfort A; Michel V; Thirion B; Grisel O; Blondel M; Prettenhofer P; Weiss R; Dubourg V; Vanderplas J; Passos A; Cournapeau D; Brucher M; Perrot M; Duchesnay É Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res 2011, 12, 2825–2830.
- (74).Paszke A; Gross S; Massa F; Lerer A; Bradbury J; Chanan G; Killeen T; Lin Z; Gimelshein N; Antiga L; Desmaison A; Köpf A; Yang E; DeVito Z; Raison M; Tejani A; Chilamkurthy S; Steiner B; Fang L; Bai J; Chintala S PyTorch: An Imperative Style, High-Performance Deep Learning Library. Adv. Neural Inf. Process. Syst 2019, 32.
- (75).Maas AL; Hannun AY; Ng AY Rectifier nonlinearities improve neural network acoustic models. Proc. ICML 2013; p 3.
- (76).Kingma DP; Ba J Adam: A Method for Stochastic Optimization 2017, arXiv:1412.6980. arXiv.org ePrint archive, 10.48550/arXiv.1412.6980 (accessed Jun 1, 2023).
- (77).Hastie T; Tibshirani R; Friedman J The Elements of Statistical Learning, 2nd ed.; Springer Series in Statistics; Springer: New York, NY, 2009.
- (78).Eastman P; Swails J; Chodera JD; McGibbon RT; Zhao Y; Beauchamp KA; Wang L-P; Simmonett AC; Harrigan MP; Stern CD; Wiewiora RP; Brooks BR; Pande VS OpenMM 7: Rapid development of high performance algorithms for molecular dynamics. PLoS Comput. Biol 2017, 13, 1–17.
- (79).The PLUMED consortium Promoting transparency and reproducibility in enhanced molecular simulations. Nat. Methods 2019, 16, 670–673.
- (80).Tribello GA; Bonomi M; Branduardi D; Camilloni C; Bussi G PLUMED 2: New feathers for an old bird. Comput. Phys. Commun 2014, 185, 604–613.
- (81).Grossfield A WHAM: an implementation of the weighted histogram analysis method (version 2.0.10). http://membrane.urmc.rochester.edu/content/wham/.
- (82).Shirts MR; Chodera JD Statistically optimal analysis of samples from multiple equilibrium states. J. Chem. Phys 2008, 129, 124105.
- (83).Tobias DJ; Brooks CL III Conformational equilibrium in the alanine dipeptide in the gas phase and aqueous solution: a comparison of theoretical results. J. Phys. Chem 1992, 96, 3864–3870.
- (84).Kumar S; Rosenberg JM; Bouzida D; Swendsen RH; Kollman PA The weighted histogram analysis method for free-energy calculations on biomolecules. I. The method. J. Comput. Chem 1992, 13, 1011–1021.
- (85).Jung H; Covino R; Arjun A; Leitold C; Dellago C; Bolhuis PG; Hummer G Machine-guided path sampling to discover mechanisms of molecular self-organization. Nat. Comput. Sci 2023, 3, 334–345.