Abstract

Understanding and manipulating the conformational behavior of a molecule in different solvent environments is of great interest in the fields of drug discovery and organic synthesis. Molecular dynamics (MD) simulations with solvent molecules explicitly present are the gold standard to compute such conformational ensembles (within the accuracy of the underlying force field), complementing experimental findings and supporting their interpretation. However, conventional methods often face challenges related to computational cost (explicit solvent) or accuracy (implicit solvent). Here, we showcase how our graph neural network (GNN)-based implicit solvent (GNNIS) approach can be used to rapidly compute small molecule conformational ensembles in 39 common organic solvents reproducing explicit-solvent simulations with high accuracy. We validate this approach using nuclear magnetic resonance (NMR) measurements, thus identifying the conformers contributing most to the experimental observable. The method allows the time required to accurately predict conformational ensembles to be reduced from days to minutes while achieving results within one kBT of the experimental values.
Introduction
The majority of marketed pharmaceuticals are low-molecular-weight organic compounds, commonly referred to as small molecules.1 In solution, flexible small molecules are dynamic entities, giving rise to a Boltzmann-weighted ensemble of different conformations, which depends on the nature of the surrounding solvent. Weak interactions can vary dramatically between different solvents, resulting in critical differences between the corresponding conformational ensembles.2,3 Therefore, understanding even slight variations in the populations of conformers is crucial for a variety of applications. These include the synthesis of regio- and stereospecific molecules,4,5 optimization and understanding of passive membrane permeability,6,7 as well as fine-tuning physical properties.8 In addition, differing conformational ensembles also give rise to distinct experimental observables, and thus, they strongly influence the interpretation of experimental techniques. For instance, nuclear magnetic resonance (NMR),9−12 infrared (IR) spectroscopy,13,14 and vibrational circular dichroism (VCD) spectroscopy,15−19 among others, all exhibit conformational dependencies.
While molecular dynamics (MD) simulations with explicit solvent molecules (and if needed enhanced sampling techniques) constitute a powerful tool to estimate the conformational ensemble of small molecules in solution,20−23 they are inherently constrained by their relatively high computational demand. The molecule of interest (the solute) is solvated in a large box of solvent molecules, and the whole system is propagated through time. This approach is inherently inefficient since the probabilities of the conformational ensemble are identified from their repeated occurrence in the simulation rather than their intrinsic free energy, which is not readily accessible from explicit-solvent simulations. Nevertheless, such simulations are currently the gold standard because implicit solvation methods (e.g., Poisson–Boltzmann (PB),24 fast analytical continuum treatments of solvation (FACTS),25 or generalized Born (GB) models26) do not achieve the desired accuracy.
Machine learning has been proposed to bridge this gap.27−32 In previous studies,33,34 we have shown for the case of water that a graph neural network (GNN) trained on reference forces from explicit-solvent simulations can represent the solvent in a probabilistic way and, hence, accurately simulate a system using many fewer degrees of freedom. This reduces the computational effort significantly while providing comparable results to explicit-solvent simulations. The approach is inspired by the concept of Δ-learning35 and incorporates the functional form of the traditional GB-SA model as a physics-based regularization36 to reproduce the reference forces. This ensures the correct physical behavior of the model.
Here, we extend the GNN-based implicit solvent (GNNIS) model to 39 common organic solvents and demonstrate how the approach can be employed to rapidly identify Boltzmann-weighted conformational ensembles of small molecules in these solvents. To encode the different solvents in the GNN, whose architecture is described in greater detail in our previous work,33,34 we use an embedding scheme that allows the model to learn for itself how to represent each solvent. Our approach allows for sets of randomly sampled conformations (e.g., from a conformer generator) to be minimized into Boltzmann-weighted conformational ensembles that are on par with explicit-solvent simulations. The approach is tested using ∼200 experimental observables derived from new NMR measurements as well as literature data. These tests show high accuracy using minimal computational effort and predict solvent-specific differences within one kBT of experimental results. This high speed (e.g., a small molecule’s conformational ensemble can be obtained in minutes rather than days) will allow the study of large data sets as well as iterative design procedures.
Results and Discussion
Technical Validation
The accuracy of the GNN, trained on ∼370,000 diverse small molecules with a molecular weight <500 Da, was assessed on an external test set consisting of 1000 compounds with a molecular weight ranging from 500 to 700 Da. The RMSE values achieved for each solvent are shown in Figure 1A. The overall correlation between the GNN and the external test set is excellent (Figure 1B). The deviation between predicted and reference forces varies significantly between different solvents, ranging from 4.6 kJ mol–1 nm–1 for hexane to 46.9 kJ mol–1 nm–1 for HMPA. These differences between the solvents are most likely an effect of differing force magnitudes (see Supporting Information Figure S5 for the distribution of the magnitude of reference forces). Apolar solvents generally feature smaller forces, while polar ones feature larger forces. In addition to these polarity effects, other likely contributing factors are the conformational flexibility, self-diffusion, and rotational correlation time of the solvent (i.e., the time it takes for a solvent to equilibrate around a given conformation), which could explain why HMPA, glycerin, and octanol show the largest errors. While for most solvents, the chosen sampling time to capture the Boltzmann-weighted ensemble of solvent configurations around a given conformer was long enough, these larger solvents may suffer from increased sampling errors in the test-set calculations. However, as the GNN was trained on multiple conformers, it is possible that the lower test-set accuracy does not manifest in large deviations in prospective applications (e.g., free-energy profiles or conformational ensembles). For this reason, the sampling time for the larger solvents was not increased when generating the training set.
Figure 1.

Comparison of the GNN predicted versus reference forces. (A) Root-mean-square error (RMSE) for the 39 solvents. The black error bar indicates the standard deviation among the three GNN models trained with different random seeds. The color scale corresponds to the dielectric permittivity of the solvent. (B) Correlation between the predicted and reference forces for one replicate of the GNN.
Three models with different random seeds for the initialization were trained and evaluated. In general, the performance of the three models was virtually identical (i.e., standard deviation over the RMSE values is at most 0.05 kJ mol–1 nm–1). Hence, one model was selected for all further analysis. The model with the highest average RMSE for all solvents was chosen to be conservative.
Solvent Embedding
A key aspect of our GNN architecture (Figure 2A) is the simultaneous training of the entire solvent set. Briefly, a single model is trained for the entire set of solvents, and differences among solvents are captured through the embedding of solvent-specific feature vectors. This concurrent training should optimize information gain and efficiency by allowing the GNN to exploit transfer learning among solvents, i.e., to link similar solvent behavior to related features. While this means that the model needs to be retrained if a new solvent is added, the cost of training a new model is negligible compared to the cost of generating a training set for a new solvent. To probe the GNN’s ability to develop an artificial “chemical intuition”, the feature vectors of the 39 solvents were taken from the solvent embedding layer of the GNN, and a principal component analysis (PCA) was performed on them. In Figure 2B, the three largest principal components, which together account for 47% of the variance, are shown. The resulting 3D arrangement of solvents groups chemically similar solvents (e.g., having the same functional group) together, in line with a chemist’s intuition and going beyond a solvent classification based solely on dielectric permittivity. For example, THF and DMSO have very different dielectric permittivities but still appear close in the PCA projection as they are both polar aprotic solvents. On the other hand, solvents with similar dielectric permittivity but different functional groups are far apart in our projection (e.g., chloroform and acetic acid). This demonstrates that the GNNIS model does indeed identify solvent effects that are not accounted for in classical continuum-based implicit solvent models and supports our strategy to train multiple solvents in a joint manner.
Figure 2.

(A) Schematic representation of the GNN architecture used for the implicit solvent. Message-passing steps are shown in coral, while node-wise operations are shown in red. (B) Principal component analysis (PCA) of the solvent embedding. The first two PCs are shown in the full plot. The inset further resolves the closely related solvents in the lower left corner by showing the projection with respect to the second and third PCs. The color scale corresponds to the dielectric permittivity of the solvent.
Prospective Molecular Dynamics Simulations
Next, we probed the ability of the GNNIS model to reproduce explicit-solvent MD simulations on an additional compound set I (Figure 3A) whose members were not part of the training set. Crucially, all compounds in set I can form intramolecular as well as solvent-mediated hydrogen bonds and should thus be susceptible to solvent effects that are not simply correlated with the dielectric permittivity of the solvent. The free-energy profiles of the intramolecular hydrogen bonds (Supporting Information Figures S6–S9) and the free-energy differences ΔG between the conformations with the open and closed hydrogen bond were extracted from both the GNNIS and explicit-solvent MD simulations. Because the free-energy difference is obtained from populations, the persistence time, simulation length, and write-out frequency affect the range of detectable free-energy differences. To ensure that the compared results do not lie outside of this range, the variability of the results was analyzed by splitting the explicit-solvent reference simulation into three equal parts and analyzing ten independent replicates for the GNNIS simulations. The comparison shown in Figure 3B demonstrates the excellent agreement between the approaches.
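The link between state populations and ΔG, and the finite range of ΔG values that a trajectory of a given length can resolve, can be illustrated with a minimal sketch. It assumes a simple two-state counting of trajectory frames at 300 K; the function names and the frame counts are hypothetical and are not taken from the authors' analysis code.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # molar gas constant, kJ mol^-1 K^-1


def delta_g_from_counts(n_open, n_closed, temperature=300.0):
    """Return G_open - G_closed in kJ/mol from the frame counts of each state."""
    if n_open <= 0 or n_closed <= 0:
        raise ValueError("both states must be observed at least once")
    return -R_KJ_PER_MOL_K * temperature * math.log(n_open / n_closed)


def max_detectable_delta_g(n_frames, temperature=300.0):
    """Largest |dG| resolvable when the minor state appears in a single frame."""
    if n_frames < 2:
        raise ValueError("need at least two frames")
    return R_KJ_PER_MOL_K * temperature * math.log(n_frames - 1)


# Hypothetical counts: 100 open frames and 900 closed frames.
print(round(delta_g_from_counts(100, 900), 2))
# Hypothetical trajectory with 50,000 stored frames.
print(round(max_detectable_delta_g(50_000), 2))
```

The second function shows why fewer stored frames narrow the accessible ΔG range: the bound grows only logarithmically with the number of frames.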
Figure 3.

Comparison of prospective MD simulations using the GNNIS model (10 × 50 ns) with the explicit-solvent reference (3 × 500 ns). (A) Compound set I with closed (left) and open (right) conformations. (B) Comparison of the free-energy difference ΔG between the lowest lying minima of the closed and open intramolecular hydrogen bond for the explicit-solvent simulations (x-axis) and the GNNIS simulations (y-axis). Each dot represents one molecule–solvent combination. Error bars indicate the standard deviation across the simulation replicates. Error bars below 0.5 kJ mol–1 are omitted for clarity. (C) Free-energy profiles of the formation of the intramolecular hydrogen bond of compound I1 simulated using explicit solvent (dashed lines) and the GNNIS model (solid lines). Low dielectric solvents are shown on the left: ethyl acetate, 1,4-dioxane, chloroform, and cyclohexane; high dielectric solvents on the right: DMSO, methanol, acetonitrile, and nitrobenzene. The color scale indicates the dielectric permittivity of the solvent. The free-energy difference ΔG between the global minimum and the lowest-lying local minimum for ethyl acetate is indicated by the red double arrow.
The complete free-energy profiles of compound I1 in a selection of four low dielectric solvents (i.e., dielectric permittivity ϵ < 10) and four high dielectric solvents (i.e., ϵ > 30) are depicted in Figure 3C. As expected, the profiles in the individual solvents diverge significantly despite their similar dielectric permittivities, a behavior that the GNNIS model reproduces. In contrast, simulations with the state-of-the-art implicit solvent model GB-Neck237 show significant deviations from the explicit-solvent reference. The median absolute error with GB-Neck2 is 2.5 kJ mol–1 compared to 0.6 kJ mol–1 for GNNIS (Supporting Information Figures S10–S14). Note that the GB-Neck2 model was optimized to best reproduce simulations in water and not in other solvents. To ensure that the observed limitations are not an artifact of this optimization, a second GB-based model, the GB-OBC model,38 was also compared, leading to qualitatively equivalent results (see Supporting Information Section S4). For this reason, and because the GNNIS model is based on the GB-Neck2 model, the further comparisons are only performed with the GB-Neck2 model.
The high accuracy of the GNNIS model is especially significant as it is achieved at a substantially reduced cost compared to explicit-solvent simulations when multiple systems are simulated in parallel (as in this study). While an explicit-solvent simulation can make full use of currently available hardware (i.e., the GPU), a single GNNIS simulation does not make full use of it and allows the evaluation of multiple systems in parallel, achieving a speed-up of 10-fold (1750 versus 16900 ns d–1) on the same hardware (8 CPU cores + 1 NVIDIA RTX 4090 GPU). A detailed description of the parallelization approach and the relationship between simulation speed and system size are given in the Supporting Information Sections S5 and S6.
Rapid Assessment of Conformational Ensembles
A key advantage of implicit-solvent methods is that they allow for the direct estimation of a conformer’s solvation free energy.39 In contrast, with explicit-solvent simulations, these computations require costly free-energy calculations. Implicit-solvent methods can hence be used to identify minima of a conformational ensemble and to calculate the corresponding free energies. This enables rapid access to Boltzmann-weighted conformational ensembles.
The accuracy of conformational ensembles generated with conventional implicit solvent models has been limited by their deficiencies in the description of explicit solvent effects (e.g., relative stability of intramolecular hydrogen bonds, see Supporting Information Figures S10–S14). On the other hand, the GNNIS model does, as shown above, accurately reproduce such explicit solvent effects. We further explored this approach for rapid identification of Boltzmann-weighted conformational ensembles of small molecules. The procedure used is depicted in Figure 4A (see Methods Section for more details). Briefly, a diverse set of conformers is generated randomly using distance geometry40−42 and minimized using a standard force field (OpenFF 2.0.043) in combination with the GNNIS model. The resulting ensemble is sorted based on the potential energy of each conformer and pruned based on the root-mean-square distance (RMSD). In the last step, the free energies of the final conformers are estimated based on the predicted energies and an entropy estimation using Grimme’s quasi-RRHO approach,44 whereby the entropy of all modes is damped and complemented with a free-rotor entropy contribution.
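The sorting, pruning, and weighting steps of this procedure can be sketched as follows. This is a simplified illustration rather than the authors' implementation: the `distance` callable stands in for an aligned RMSD, the toy "conformers" are single dihedral values, the threshold is a hypothetical choice, and the quasi-RRHO entropy term is omitted, so the free energies passed to `boltzmann_weights` are assumed to be given.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # molar gas constant, kJ mol^-1 K^-1


def prune_sorted(conformers, energies, distance, threshold):
    """Greedy pruning: visit conformers from low to high energy and keep one
    only if it is farther than `threshold` from every conformer kept so far.
    Returns the indices of the retained conformers in order of energy."""
    order = sorted(range(len(conformers)), key=lambda i: energies[i])
    kept = []
    for i in order:
        if all(distance(conformers[i], conformers[j]) > threshold for j in kept):
            kept.append(i)
    return kept


def boltzmann_weights(free_energies, temperature=300.0):
    """Normalized Boltzmann populations for free energies given in kJ/mol."""
    rt = R_KJ_PER_MOL_K * temperature
    g_min = min(free_energies)  # shift by the minimum for numerical stability
    factors = [math.exp(-(g - g_min) / rt) for g in free_energies]
    total = sum(factors)
    return [f / total for f in factors]


# Toy example: conformers described by one dihedral angle (degrees).
dihedrals = [60.0, 62.0, 180.0, 300.0]
energies = [1.0, 0.5, 0.0, 3.0]  # hypothetical values in kJ/mol
kept = prune_sorted(dihedrals, energies, lambda a, b: abs(a - b), threshold=5.0)
weights = boltzmann_weights([energies[i] for i in kept])
print(kept, [round(w, 3) for w in weights])
```

Visiting conformers in order of increasing energy means that, of two near-duplicates, the lower-energy one is always the representative that survives pruning.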
Figure 4.

(A) Example of the minimization workflow for 512 conformers of compound C1 in water. Values of the two dihedral angles around the amide bond of C1 are represented as black dots for the various conformers. The reference free-energy profile is depicted by the blue heatmap. From left to right, the steps in the workflow include: (i) generating random conformers (here with KDG42), (ii) minimizing these conformers using OpenFF43 for the intramolecular interactions combined with the GNNIS model, (iii) sorting conformers by their potential energy, (iv) RMSD pruning of the ensemble, and (v) estimating free energies. (B) Comparison of the estimated free energy of the final set of conformers with the free energy computed from the explicit-solvent reference simulation of C1 in water. Dashed lines indicate a difference of 1 kBT from the identity line. (C) Comparison of the conformer ensembles minimized in vacuum, using GB-Neck2, or using the GNNIS model for compound set C (left) and P (right). Models are compared based on the balanced accuracy score of finding conformers below 1 kBT. The sampling error of the explicit-solvent simulation is given for reference. Statistically significant differences (p < 0.05) between a model and the model to its left are indicated by a star. The median balanced accuracy is indicated by the horizontal line.
Simulated Ensembles
For two compound sets C and P (see Figure 8 in the Methods section), which feature smaller (i.e., molecular weight <500 Da) and larger molecules (i.e., molecular weight >500 Da), respectively, the GNNIS-predicted ensembles were compared to ensembles extracted from explicit-solvent simulations. As shown in Figure 4B for one of the compounds in water, the GNNIS-predicted ensembles are generally in good agreement with the explicit-solvent reference. The results for all other solvent–compound combinations are provided in Supporting Information Figures S15 and S19 for sets C and P, respectively, for the GNNIS model and in Supporting Information Figures S16–S18 and S20–S22 for the other solvent models.
Figure 8.

(A) Compound set C. (B) Compound set P.
To quantitatively assess the quality of the conformational ensemble, we calculated balanced accuracies for ensembles minimized using OpenFF 2.0.0 with (i) no solvent (in vacuum), (ii) the GB-Neck2 implicit solvent model, and (iii) the GNNIS model, and for ensembles simulated with the same force field in explicit solvent. The balanced accuracy indicates whether low-lying minima (i.e., minima with free energies below one kBT) are correctly identified in an ensemble. An explanatory illustration is provided in Supporting Information Figure S23. The correct prediction of low-energy conformers is especially important as they contribute significantly to experimental observables, e.g., NMR or IR spectra. As shown in Figure 4C, the GNNIS minimization detects low-lying free-energy minima with high accuracy, significantly outperforming GB-Neck2. In the case of set P, the minimization procedure with the GNNIS model even approaches the balanced accuracy obtained with explicit-solvent simulations evaluated using standard sampling times (i.e., 50 and 100 ns REST2 simulations for set C and P, respectively). As the compounds of set P are larger and more flexible compared to those in set C, longer simulation times in explicit solvent would be required to further increase the achievable balanced accuracy.
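The balanced accuracy used here is the mean of sensitivity and specificity for the binary label "free energy below one kBT". The sketch below makes two assumptions that are not spelled out in this section: free energies are measured relative to the lowest minimum of each ensemble, and predicted and reference minima have already been matched one-to-one. The function names and the example values are hypothetical.

```python
def low_lying(free_energies, cutoff):
    """Label each minimum as low-lying if it is within `cutoff` of the lowest one."""
    g_min = min(free_energies)
    return [(g - g_min) < cutoff for g in free_energies]


def balanced_accuracy(predicted, reference):
    """Mean of sensitivity and specificity for two equally long boolean lists."""
    if len(predicted) != len(reference):
        raise ValueError("label lists must have the same length")
    tp = sum(p and r for p, r in zip(predicted, reference))
    tn = sum((not p) and (not r) for p, r in zip(predicted, reference))
    fp = sum(p and (not r) for p, r in zip(predicted, reference))
    fn = sum((not p) and r for p, r in zip(predicted, reference))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must contain both classes")
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


KBT_300K = 2.494  # kJ/mol at 300 K

# Hypothetical free energies (kJ/mol) of four matched minima.
reference = low_lying([0.0, 1.0, 5.0, 8.0], KBT_300K)
predicted = low_lying([0.0, 4.0, 6.0, 1.5], KBT_300K)
print(balanced_accuracy(predicted, reference))
```

Unlike plain accuracy, this score is not inflated when most minima lie above the cutoff, which is the typical situation for flexible molecules.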
These results are very promising as the computational cost of the proposed workflow is much lower than that of explicit solvent simulations. The entire pipeline depicted in Figure 4A took minutes rather than hours for the reference calculations. A detailed analysis of the computational cost of the approach and the reference simulations is provided in the Supporting Information Section S7.
Comparison with Experimental Observables
To further explore the applicability of the conformational ensembles minimized with the GNNIS model for practical applications, we followed the workflow described above for compounds I1 and I4 (Figure 3A) in solvents for which proton NMR measurements with high spectral resolution are available. Experimental measurements for I1 were conducted in parallel to the computational work, while data for I4 was taken from the literature.45 At room temperature, both I1 and I4 feature two dominant conformers that interconvert (Figure 5A). The relative population of the conformers can be directly deduced from the J-coupling constant of the α-protons of the methoxy-group (denoted Hα in the following). The NMR signal of Hα is a multiplet resulting from two distinct vicinal proton–proton J-couplings (JA–B, JA–C). Both JA–B and JA–C are rotational averages over all conformers. Here, we extracted the total scalar coupling Jtot = JA–B + JA–C as the frequency difference between the two outermost peaks of the multiplet. In chloroform, the predominant population of the gauche conformer minimizes Jtot (Figure 5B), while a higher value is observed in DMSO from an enhanced population of the trans-conformer (Figure 5C).
Figure 5.

Assessment of the quality of the ensembles minimized with the classical GB-Neck2 or GNNIS model. (A) Schematic representation of the interconversion of the two key conformations (top) and Newman projections of the gauche and trans conformation (bottom) of compound I1. (B) 1H NMR spectrum of the characteristic Hα proton peak used for the determination of Jtot in chloroform. (C) 1H NMR spectrum of the characteristic Hα proton peak used for the determination of Jtot in DMSO. (D) Comparison of the predicted population of the trans conformer (x-axis) and the experimental Jtot coupling constant (y-axis) of I1 (left) and I4 (right, experimental data taken from ref (45)) for the GB-Neck2 model. (E) Same comparison for the GNNIS model.
The correlation between the populations of the trans-conformer for the predicted ensemble and the experimental observable for compounds I1 and I4 is shown in Figure 5D,E for GB-Neck2 and GNNIS, respectively. For both compounds, the expected linear relationship between the predicted trans-conformer population and Jtot is observed for the ensembles minimized with OpenFF 2.0.0 and GNNIS. The results indicate that the proposed workflow can rapidly provide solution ensembles and represents a significant improvement over the GB-Neck2 model. One interesting aspect of these results is that the relative populations of the open and closed conformations of compound I1 do not align with the perception that polar solvents should favor more open conformations, as can be seen in the GB-Neck2 results. As the GB-Neck2 model relies solely on a continuum description using a fixed dielectric constant, the predicted populations are dependent on this property. Water, DMSO, and methanol, which feature the highest dielectric permittivities, are predicted to have the greatest fraction of the trans-conformer (conversely, benzene, with the lowest dielectric permittivity, is predicted to have the smallest fraction of the trans-conformer). While this is true for DMSO, water features one of the lowest populations of the trans-conformer for both compounds I1 and I4. Further investigation revealed that the reason for this is the ability of water to form a hydrogen-bonding network with the compound (see Supporting Information Figure S24), thus stabilizing the closed conformation. The fact that the GNNIS model can capture the effects of these complex interactions is especially promising.
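The expected linear relationship follows if Jtot is treated as a population-weighted average over two rapidly interconverting conformers. The sketch below assumes such a two-state model; the limiting couplings `j_trans` and `j_gauche` are hypothetical placeholder values in Hz, not measured or fitted quantities from this work.

```python
def jtot_two_state(p_trans, j_trans, j_gauche):
    """Population-averaged total coupling for a fast two-state exchange."""
    if not 0.0 <= p_trans <= 1.0:
        raise ValueError("p_trans must lie between 0 and 1")
    return p_trans * j_trans + (1.0 - p_trans) * j_gauche


def p_trans_from_jtot(jtot, j_trans, j_gauche):
    """Invert the two-state average to estimate the trans population."""
    if j_trans == j_gauche:
        raise ValueError("limiting couplings must differ")
    return (jtot - j_gauche) / (j_trans - j_gauche)


# Placeholder limiting values (Hz), chosen only so that trans > gauche.
J_TRANS, J_GAUCHE = 14.0, 8.0
for p in (0.0, 0.25, 0.5):
    print(p, jtot_two_state(p, J_TRANS, J_GAUCHE))
```

In this picture the slope of Jtot against the trans population is the difference of the two limiting couplings, which is why only the linearity of the trend, and not its absolute offset, is needed for the comparison.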
The two main outliers are acetonitrile and THF for compound I1 (left panel of Figure 5E) for which the prediction diverges from the linear trend. However, the explicit-solvent simulations of compound I1 in these solvents (see Section “Prospective Molecular Dynamics Simulations”) indicate that this deviation may not be due to GNNIS itself but rather to the explicit-solvent model that was used to generate the training data. The predicted fraction of the trans-conformer based on the explicit-solvent simulation (2.9% and 6.0% for acetonitrile and THF, respectively) is very similar to that predicted by the minimization approach with OpenFF 2.0.0 and GNNIS (2.3% and 4.9% for acetonitrile and THF, respectively). Note that while a divergence from experiment could also arise from issues of the underlying solute force field, these differences should be systematic for all solvents and result in a shift of all populations rather than manifest in a single outlier for a specific solvent.
In order to further investigate the behavior of the proposed minimization workflow, a set of 22 molecular balances (Figure 6A) designed to quantify hydrogen-bond strength in different solvents was studied. For these compounds, Meredith et al.46 measured the distribution of two distinct ketone rotations (Figure 6A) by means of 19F{1H} NMR measurements in nine different solvents. These rotations give strong evidence of the strength of the intramolecular hydrogen bond and the composition of the conformational ensemble of the molecule. The computational prediction of such properties would be highly desirable in fields such as synthetic organic chemistry, and makes them an ideal test case for the GNNIS model.46
Figure 6.

(A) Illustration of the 22 studied molecular balances. (B) Comparison of predicted ΔΔG values using the GNNIS model and the experimental data for the 22 molecular balances in nine different solvents. The color scale indicates the dielectric permittivity of the solvent. The experimentally determined standard error is denoted by the black error bars. Markers filled with gray represent measurements where the experiment could not determine the free-energy difference between the two rotations exactly. In these cases, the value indicated for these points provides an upper limit to the undefined values.
We performed our minimization approach for all balances in the different solvents. The main question we wanted to address in this context is how effectively the minimized ensembles can capture small differences in the populations of the different rotational states in varying solvent environments. To compare these, we first used the minimized ensemble to predict the free-energy difference between the two ketone rotation states. Next, the difference between these predictions and their mean (ΔΔGpre) is compared to the difference between the experimentally determined free-energy differences and their mean (ΔΔGexp). The results are shown in Figure 6B. Overall, the GNNIS model reproduces the differences with respect to the conformational ensembles in the different solvents very accurately, with most of the ΔΔG values within one kBT. This is especially interesting as this rotation, based on experimental findings, takes place on the μs time scale, making the analysis with explicit-solvent simulations computationally expensive. Comparing the results to GB-Neck2 simulations highlights the improvement over state-of-the-art conventional implicit solvent methods (see Supporting Information Figures S25 and S26). Ensembles minimized with GB-Neck2 predict almost the same values for the different polar solvents and the strength of the solvent effect is significantly underestimated (i.e., the median slope is 0.43). The high accuracy and low computational effort suggest that the GNNIS model is well-suited to study these kinds of systems.
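The mean-centering used to obtain ΔΔG can be written out in a few lines. The sketch assumes that the mean is taken over the solvents of a single molecular balance, which is one reading of the description above; the function names and the numerical values are hypothetical.

```python
def mean_centered(values):
    """Subtract the mean of the list from every entry (dG -> ddG)."""
    if not values:
        raise ValueError("need at least one value")
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def fraction_within(predicted, experimental, tolerance):
    """Fraction of pairs whose absolute deviation is at most `tolerance`."""
    pairs = list(zip(predicted, experimental))
    hits = sum(abs(p - e) <= tolerance for p, e in pairs)
    return hits / len(pairs)


KBT_300K = 2.494  # kJ/mol at 300 K

# Hypothetical dG values (kJ/mol) of one balance in three solvents.
ddg_pre = mean_centered([2.0, 4.0, 9.0])
ddg_exp = mean_centered([1.0, 2.0, 3.0])
print(ddg_pre, ddg_exp, fraction_within(ddg_pre, ddg_exp, KBT_300K))
```

Centering both series removes any solvent-independent offset, so the comparison isolates the solvent-specific shifts in the conformational preference.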
Conclusions
This study extends the GNNIS approach to 39 organic solvents and presents a workflow for rapidly identifying Boltzmann-weighted conformational ensembles of small molecules in solution in combination with standard force fields. The GNNIS model was trained on a diverse set of small molecules simulated in the 39 different solvents, validated based on prospective MD simulations, and tested on conformational preferences determined from NMR experiments. Both comparisons show excellent agreement between the method and the reference data. Further, the dramatic speed-up compared to explicit-solvent simulations (i.e., ensembles can be generated within minutes rather than hours) makes the proposed approach a powerful tool for understanding the behavior of molecules in solution. This advance could facilitate the rational design of molecules by enabling the study of larger data sets and faster turnarounds, as well as the accurate and rapid interpretation of experimental observables. The software is provided open-source, and the training data is made freely available.
Methods
Solvents
39 solvents relevant for organic synthesis were selected (ordered by descending dielectric permittivity): water, dimethyl sulfoxide (DMSO), glycerin, sulfolane, dimethylformamide (DMF), nitromethane, acetonitrile, N,N′-dimethylpropyleneurea (DMPU), nitrobenzene, methanol, N-methylpyrrolidone (NMP), propionitrile, hexamethylphosphoramide (HMPA), 2-nitropropane, benzonitrile, ethanol, acetone, isopropyl alcohol (IPA), pyridine, octanol, trifluorotoluene, dichloromethane (DCM), tetrahydrofuran (THF), dimethyl ether (DME), ethyl acetate, acetic acid, butyl formate, chloroform, methyl tert-butyl ether (MTBE), diethyl ether, o-xylol, toluene, benzene, carbon tetrachloride, 1,4-dioxane, hexafluoroacetone, hexafluorobenzene, cyclohexane, and hexane. The density and dielectric permittivity values of the solvents were taken from the literature47−52 and are given in Supporting Information Table S1.
Generation of the Training and Test Sets
The training and external test sets for water were taken from ref (34). The same procedure was followed for the calculation of the additional solvents but with fewer conformers per molecule (i.e., nine conformers were used for water): three conformers for chloroform, DMSO, and methanol, and one conformer for the remaining solvents. As in ref (34), forces could not be extracted for all compound–solvent combinations as some simulations were unstable and led to infinite energies. These combinations were not added to the data set. The final data set is freely available in the ETH Research Collection (10.3929/ethz-b-000710355).
Graph Neural Network (GNN)
The GNN architecture is shown in Figure 2A. It is based on our previous work in ref (34) but extends the approach for the prediction of energies and forces in multiple solvents. The model represents an invariant GNN with three passes. The additional solvent embedding was implemented using a lookup table that carries a vector of length 64 with learnable weights for each solvent. These weights were optimized during training together with all other trainable parameters. A detailed description of the architecture is given in the Supporting Information Section S1 “Graph Neural Network”.
For the training process, 95% of the full data set was used while the remaining 5% was reserved for validation. The training was carried out over 50 epochs with a batch size of 256. An exponentially decaying learning rate was applied, ranging from 5 × 10–4 to 5 × 10–6. The mean squared error was selected as the loss function, and the Adam optimizer53 was employed for weight optimization, with gradients being clipped to a norm of 1. Additionally, a dropout value of 0.1 was employed during training.
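The learning-rate range quoted above fixes the decay factor once the stepping interval is chosen. The sketch below assumes one decay step per epoch, so that the rate falls from 5 × 10–4 in the first epoch to 5 × 10–6 in the last of the 50 epochs; whether the original training stepped per epoch or per batch is not stated here, and the function name is hypothetical.

```python
def exponential_schedule(lr_start, lr_end, n_epochs):
    """Learning rates decaying geometrically from lr_start to lr_end."""
    if n_epochs < 2:
        raise ValueError("need at least two epochs")
    ratio = lr_end / lr_start
    # Epoch 0 gets lr_start; the final epoch gets lr_end.
    return [lr_start * ratio ** (epoch / (n_epochs - 1)) for epoch in range(n_epochs)]


rates = exponential_schedule(5e-4, 5e-6, 50)
per_epoch_factor = rates[1] / rates[0]
print(rates[0], rates[-1], round(per_epoch_factor, 4))
```

Under this per-epoch assumption the same constant factor multiplies the rate after every epoch, which is what distinguishes an exponential schedule from a linear or stepwise one.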
Molecular Dynamics (MD) Simulations
The MD simulations were carried out using the methodology described in our previous work.34 The ETKDG42 conformer generator as implemented in the RDKit54 was used to generate the starting structures. Molecules were parametrized using the OpenFF 2.0.0 force field43 and simulated using the OpenMM (version 8.0.0) simulation program.55 The solvation of compounds was performed using the PACKMOL program56 with a padding of 1 nm. The L-BFGS algorithm was used to minimize all compounds with the tolerance set to 10 kJ mol–1 nm–1. During simulation, all bonds involving hydrogens were constrained using the SETTLE57 algorithm for water and the CCMA58 algorithm for all other bonds. Langevin dynamics employing the LFMiddle discretization scheme59 was used with a Monte Carlo barostat to propagate the system. The reference temperature and pressure were set to 300 K and 1 bar, respectively. Nonbonded interactions were corrected using the particle mesh Ewald scheme60 with a nonbonded cutoff of 1 nm. The time step of all simulations was set to 2 fs. The center-of-mass motion was removed using OpenMM’s CMMotionRemover, and the write-out frequency was set to 1000 and 100 for simulations using explicit and implicit solvent models, respectively.
MD simulations were performed for the molecules in set I (Figure 7): compounds I1, I2, and I3 that feature an intramolecular hydrogen bond forming pseudo 5-membered, 6-membered, or 7-membered rings, respectively, and compound I4 that contains two hydrogen-bond acceptors but no donors. The two main conformers for each compound are depicted in Figure 3A.
Figure 7. Compound set I.
The explicit-solvent reference simulations were carried out for 500 ns in three repeats using the same procedure as described above. The simulations with the GNNIS model were performed for 50 ns in ten repeats. To perform these simulations, the OpenMM (version 8.0.0) simulation program55 was configured to carry out a vacuum simulation, and the GNN was integrated using the OpenMM-Torch package (version 1.4, https://github.com/openmm/openmm-torch). The OpenMM-Custom Forces classes were used to reimplement the nonbonded interactions to allow for multiple systems to be simulated in parallel. The reference GB-Neck2 simulations were carried out using default OpenMM settings for 1000 ns. For both implicit-solvent simulation methods, the same settings as for the explicit-solvent simulations were employed with the exception that no barostat and no nonbonded cutoff were used.
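As a short worked example, the number of stored frames per repeat follows from the simulation length, the 2 fs time step, and the write-out frequency. The sketch assumes that the write-out frequencies of 1000 and 100 are given in integration steps, which the text does not state explicitly; the function name is hypothetical.

```python
def n_frames(sim_time_ns: float, timestep_fs: float, write_every_steps: int) -> int:
    """Number of frames written during one simulation repeat.

    Assumption: the write-out frequency is counted in integration steps.
    """
    n_steps = round(sim_time_ns * 1e6 / timestep_fs)  # 1 ns = 1e6 fs
    return n_steps // write_every_steps


if __name__ == "__main__":
    explicit = n_frames(500, 2, 1000)  # explicit-solvent reference, per repeat
    gnnis = n_frames(50, 2, 100)       # GNNIS simulation, per repeat
    print(explicit, gnnis)
```

With this reading, a 500 ns explicit-solvent repeat and a 50 ns GNNIS repeat both yield 250,000 frames.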
For compounds I1, I2, and I3, free-energy profiles were computed with direct counting based on the O–H distance histograms. Free energies were further corrected with a Jacobian correction factor of 4πr².61 For compound I4, the central torsional dihedral angle was analyzed to construct the histograms and subsequently the free-energy profile. To estimate free-energy differences, the distance or dihedral at the global and second lowest-lying minimum (see Supporting Information Figures S6–S9 for the exact location) of the explicit-solvent simulation was identified, and the ΔG between the two minima at these values was calculated for all approaches. Cases where the three repeats of the explicit-solvent reference simulation showed standard deviations >1 kBT were removed from the comparison. This was only the case for compound I3 in HMPA.
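The direct-counting procedure for the distance-based profiles can be sketched as follows. The code is an illustration under stated assumptions, not the analysis script used here: bin edges, bin count, and function names are hypothetical, free energies are returned in units of kBT, and the ΔG is read at the bins nearest to two given distances.

```python
import math


def free_energy_profile(distances, r_min, r_max, n_bins):
    """Bin centers and free energies (in kBT) from a distance histogram.

    Counts are divided by the Jacobian factor 4*pi*r^2 before taking the
    logarithm; empty bins are assigned +inf. The profile is shifted so that
    its lowest value is zero.
    """
    width = (r_max - r_min) / n_bins
    counts = [0] * n_bins
    for r in distances:
        if r_min <= r < r_max:
            counts[min(int((r - r_min) / width), n_bins - 1)] += 1
    if not any(counts):
        raise ValueError("no distances fall inside [r_min, r_max)")
    centers = [r_min + (i + 0.5) * width for i in range(n_bins)]
    free = [
        -math.log(c / (4.0 * math.pi * r * r)) if c > 0 else math.inf
        for c, r in zip(counts, centers)
    ]
    lowest = min(free)
    return centers, [f - lowest for f in free]


def delta_g(centers, free, r_a, r_b):
    """G(r_b) - G(r_a), evaluated at the bins nearest to r_a and r_b."""
    def nearest(r):
        return min(range(len(centers)), key=lambda i: abs(centers[i] - r))
    return free[nearest(r_b)] - free[nearest(r_a)]
```

Because of the Jacobian factor, two bins with equal counts do not have equal free energies: the bin at the larger distance samples a larger spherical shell and is therefore assigned the higher free energy.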
Generation and Minimization of Conformational Ensembles
Minimizations were carried out using the same setup as for the MD simulations. To optimize structures, the L-BFGS algorithm62 was used. RDKit54 was employed to generate the initial KDG42 conformers, and MDTraj (version 1.9.7)63 was used to prune the minimized conformations based on their heavy-atom RMSD. The selection of a suitable RMSD threshold is not straightforward, as the RMSD metric is sensitive to the size of a molecule.64,65 We have, therefore, chosen different thresholds for different molecules, as specified below. Free energies were calculated using normal-mode analysis according to the quasi-RRHO algorithm proposed by Grimme.44 Conformers that led to imaginary frequencies were minimized further for up to 100 repeats. Note that some of the minimizations did not converge or led to imaginary frequencies in the free-energy calculation. In both cases, the energy was set to infinity, and the conformers were ignored in the follow-up comparisons. The minimizations in vacuum and using GB-Neck2 were carried out in the same way, starting from the same set of random conformers.
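One common way to prune a conformer set by RMSD is a greedy pass over the energy-sorted conformers, sketched below. The pruning order and the acceptance rule are assumptions, since the text only states that pruning was based on the heavy-atom RMSD; the pairwise RMSD matrix is taken as precomputed input, and the function name is hypothetical.

```python
def prune_by_rmsd(energies, rmsd, threshold):
    """Greedy pruning of conformers.

    Conformers are visited from lowest to highest energy (assumed order), and a
    conformer is kept only if its RMSD to every conformer kept so far exceeds
    `threshold`.

    energies : list of floats, one per conformer
    rmsd     : square matrix (list of lists) of pairwise heavy-atom RMSDs,
               in the same unit as `threshold`
    Returns the indices of the retained conformers.
    """
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    kept = []
    for i in order:
        if all(rmsd[i][j] > threshold for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    energies = [1.0, 0.0, 2.0]
    rmsd = [
        [0.00, 0.05, 0.20],
        [0.05, 0.00, 0.20],
        [0.20, 0.20, 0.00],
    ]
    print(prune_by_rmsd(energies, rmsd, threshold=0.075))
```

In the example, conformer 0 lies within 0.075 nm of the lower-energy conformer 1 and is discarded, while conformer 2 is retained.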
Compound Sets C and P
Two different sets of compounds were chosen for the comparison with explicit-solvent reference simulations (Figure 8): (i) compound set C with five small molecules with a molecular weight <500 Da that can be well sampled and whose conformers can be characterized based on two key dihedral angles, and (ii) compound set P with ten neutral compounds extracted from the Platinum data set66 with molecular weights between 500 and 700 Da that feature 5–10 rotatable bonds.
Due to the different sizes of the compounds, the procedure to generate the conformational ensembles varies slightly between the two sets. For compound set C, 5,120 initial conformers were generated and an RMSD threshold of 0.075 nm was used for pruning. For compound set P, 51,200 initial conformers were generated and an RMSD threshold of 0.15 nm was applied.
During the total ∼2,000,000 minimizations for set P, we realized that the approach yielded unreasonably low free energies for two conformers of compound P9 (e.g., the free energy is more than 100 kJ mol⁻¹ lower than the median of the ensemble). In order to exclude conformations with such large deviations in a systematic way, all conformations with free energies more than four standard deviations away from the ensemble median were excluded. Further, for compound P10, which is an atropisomer, only conformers of the correct isomer were considered for the comparison to the explicit-solvent reference simulation.
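The exclusion criterion can be written compactly. The sketch below assumes that the standard deviation is the population standard deviation of the full ensemble (including the outliers) and that the filter is applied once rather than iteratively; the function name is hypothetical.

```python
import statistics


def drop_outliers(free_energies, n_sigma=4.0):
    """Keep conformers whose free energy lies within n_sigma standard
    deviations of the ensemble median.

    Assumption: the standard deviation is computed once over the full ensemble.
    """
    median = statistics.median(free_energies)
    sigma = statistics.pstdev(free_energies)
    return [g for g in free_energies if abs(g - median) <= n_sigma * sigma]


if __name__ == "__main__":
    ensemble = [1.0] * 50 + [-1.0] * 50 + [-100.0]  # kJ/mol, toy values
    filtered = drop_outliers(ensemble)
    print(len(ensemble), len(filtered))
```

Such a criterion only removes a conformer when the ensemble is large enough that a single extreme value does not dominate the standard deviation, which is the case for the ensemble sizes used here.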
The reference ensembles were extracted from MD simulations performed using the enhanced sampling method REST2,67 following the setup described by Waibl et al.68 GROMACS69−71 2023 with the PLUMED72−74 2.9.1 plugin was used. Simulations were performed for 100 and 200 ns for set C and P, respectively. Eight replicas and a quadratic scheme to distribute the scaling factors between 1 and 0.125 were applied.
The comparison to explicit-solvent reference simulations was conducted by taking the final conformers from the minimization workflow and considering them as separate conformational states. The reference simulation was then used to assign a population to each state by assigning each frame of the reference simulation to the closest conformer below an RMSD threshold of 0.1 nm. These populations were converted to free energies and compared to the free energy predicted by the GNNIS approach. The same procedure was performed with the conformer ensembles minimized in a vacuum and using GB-Neck2.
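The assignment of reference frames to conformational states and the conversion of populations to free energies can be sketched as follows. The per-frame RMSD values are taken as precomputed input, free energies are reported in kBT relative to the most populated state (an assumed convention), and the function name is hypothetical.

```python
import math


def state_free_energies(frame_rmsds, threshold=0.1):
    """Relative free energies (in kBT) of conformational states.

    frame_rmsds[f][s] is the RMSD (nm) between frame f of the reference
    simulation and conformer s. Each frame is assigned to its closest
    conformer if that RMSD is below `threshold`; other frames are discarded.
    States without any assigned frame receive +inf.
    """
    n_states = len(frame_rmsds[0])
    counts = [0] * n_states
    for row in frame_rmsds:
        closest = min(range(n_states), key=lambda s: row[s])
        if row[closest] < threshold:
            counts[closest] += 1
    top = max(counts)
    if top == 0:
        raise ValueError("no frame lies within the threshold of any conformer")
    return [-math.log(c / top) if c > 0 else math.inf for c in counts]


if __name__ == "__main__":
    rmsds = [
        [0.02, 0.30, 0.40],
        [0.03, 0.25, 0.40],
        [0.05, 0.20, 0.40],
        [0.30, 0.04, 0.40],
        [0.30, 0.35, 0.40],  # farther than 0.1 nm from every state: discarded
    ]
    print(state_free_energies(rmsds))
```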
For quantification, a confusion matrix (see Supporting Information Figure S19) was defined by separating the correlation plots into four bins using a cutoff at 1 kBT:
True positive: G_GNNIS < 1 kBT and G_explicit < 1 kBT
True negative: G_GNNIS > 1 kBT and G_explicit > 1 kBT
False positive: G_GNNIS < 1 kBT and G_explicit > 1 kBT
False negative: G_GNNIS > 1 kBT and G_explicit < 1 kBT.
From this, balanced accuracy was calculated using the SciPy library.75 Note that for some minimizations in vacuum and using GB-Neck2, no true negatives were found. In these cases, the true negative rate was set to zero for the calculation of the balanced accuracy.
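The classification and the balanced accuracy can be reproduced with a few lines of plain Python. This is a sketch rather than the library call used here: values exactly at the cutoff are counted as above it (an assumption, since the definitions use strict inequalities), and the function name is hypothetical.

```python
def balanced_accuracy(g_pred, g_ref, cutoff=1.0):
    """Balanced accuracy from free energies in kBT.

    A state is 'positive' if its free energy is below `cutoff`. If the true
    negative rate is undefined or no true negatives occur, it is zero.
    """
    tp = tn = fp = fn = 0
    for gp, gr in zip(g_pred, g_ref):
        pred_low, ref_low = gp < cutoff, gr < cutoff
        if pred_low and ref_low:
            tp += 1
        elif not pred_low and not ref_low:
            tn += 1
        elif pred_low:
            fp += 1
        else:
            fn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return 0.5 * (tpr + tnr)


if __name__ == "__main__":
    predicted = [0.2, 0.5, 2.0, 3.0]
    reference = [0.1, 2.0, 0.3, 4.0]
    print(balanced_accuracy(predicted, reference))
```

A method that places every conformer below the cutoff, as can happen for minimizations in vacuum, therefore cannot exceed a balanced accuracy of 0.5.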
NMR Measurements
Experimental 1H NMR measurements for compound I1 were conducted in deuterated solvents at room temperature. The measured NMR spectra, as well as the experimental details, are given in Supporting Information Section S3.
The NMR data for compound I4 recorded at 40 °C were taken from ref (45). The experimental observable Jtot was computed by measuring the distance between the two outer peaks of the multiplet (see Figure 5B) using the MestReNova software (version 14.3.3). The spectra and MestReNova files are available in the ETH Research Collection (DOI: 10.3929/ethz-b-000710355).
The experimental data for the 22 molecular balances were taken from ref (46).
Conformational Ensembles of Compounds I1, I4, and Molecular Balances
For the minimization, 256 initial KDG conformers were generated and an RMSD threshold of 0.05 nm was used. The dihedral angle around the central bond shown in Figure 5A was analyzed. Dihedral angles between −2.1 and 2.1 rad were categorized as the gauche-conformer, while all other dihedral angles were classified as the trans-conformer. Next, the total populations of these two states were evaluated by weighting each state according to its predicted free energy.
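The classification into the two states and the Boltzmann weighting can be sketched as follows. The sketch assumes that the free energies are given in kJ mol⁻¹ and that the temperature is supplied by the user; function and variable names are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # molar gas constant in kJ mol^-1 K^-1


def classify(dihedral_rad):
    """Gauche if the dihedral lies between -2.1 and 2.1 rad, otherwise trans."""
    return "gauche" if -2.1 < dihedral_rad < 2.1 else "trans"


def state_populations(dihedrals, free_energies, temperature):
    """Total gauche and trans populations from Boltzmann-weighted conformers.

    dihedrals     : central dihedral angle of each conformer (rad)
    free_energies : predicted free energy of each conformer (kJ/mol, assumed unit)
    temperature   : temperature in K
    """
    kt = R_KJ * temperature
    g_min = min(free_energies)
    weights = [math.exp(-(g - g_min) / kt) for g in free_energies]
    total = sum(weights)
    populations = {"gauche": 0.0, "trans": 0.0}
    for phi, w in zip(dihedrals, weights):
        populations[classify(phi)] += w / total
    return populations


if __name__ == "__main__":
    print(state_populations([1.1, -1.2, 3.1], [0.0, 0.5, 1.5], temperature=313.15))
```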
For the 22 molecular balances, 5,120 KDG conformers were generated and minimized. The conformers were divided into two sets based on the rotation state of the ketone group (see Figure 6A) and RMSD pruned using a threshold of 0.075 nm. The populations of the two groups were assessed based on the free energies of the conformers, and the difference in free energy between the two rotamers was calculated. To compare with experimental data, the mean of the predicted values for each conformer was subtracted from the predictions, while the mean of the experimental values was subtracted from the experimental data.
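A minimal sketch of the rotamer free-energy difference and the mean-centering step is given below. It assumes that the free energy of each rotamer group is obtained from the Boltzmann sum over its conformers and that the mean is taken over the full series of values being compared; both choices and all names are illustrative.

```python
import math


def group_free_energy(free_energies, kt):
    """Free energy of a group of conformers: -kT ln sum_i exp(-G_i / kT)."""
    g_min = min(free_energies)
    boltzmann_sum = sum(math.exp(-(g - g_min) / kt) for g in free_energies)
    return g_min - kt * math.log(boltzmann_sum)


def rotamer_delta_g(group_a, group_b, kt):
    """Free-energy difference G(A) - G(B) between two rotamer groups."""
    return group_free_energy(group_a, kt) - group_free_energy(group_b, kt)


def mean_center(values):
    """Subtract the mean of a series from each of its values."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


if __name__ == "__main__":
    kt = 8.314462618e-3 * 300.0  # kJ/mol at an assumed temperature of 300 K
    print(rotamer_delta_g([0.0, 1.0], [0.5, 2.0], kt))
    print(mean_center([1.0, 2.0, 3.0]))
```

Mean-centering both series removes any constant offset between prediction and experiment, so that only the relative trends are compared.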
Acknowledgments
The authors thank ETH Zurich for financial support (Research Grant no. ETH-50 21-1), Philippe H. Hünenberger for interesting discussions, Felix Pultar for guidance with respect to organic chemistry, and Niels Maeder for reviewing the provided data and code. In addition, the authors would like to thank Rainer Frankenstein and Marc-Olivier Ebert of the NMR service at Laboratory of Organic Chemistry at ETH for measuring the experimental NMR spectra.
Data Availability Statement
The open-source code is available on GitHub (https://github.com/rinikerlab/GNNImplicitSolvent). All data required for the training and testing of the GNN is made freely available in the ETH Research Collection (DOI: 10.3929/ethz-b-000710355). All other data points (e.g., trajectories, minimized conformers, etc.) are available upon reasonable request.
Supporting Information Available
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/jacs.4c17622.
Description of the GNN architecture, physical properties of solvents, details of NMR measurements, GB model comparison, simulation timings, technical description of simulation setup, timings of the minimization approach, and additional figures (PDF)
The authors declare no competing financial interest.
References
- Makurvet F. D. Biologics vs. Small Molecules: Drug Costs and Patient Access. Med. Drug Discovery 2021, 9, 100075. 10.1016/j.medidd.2020.100075.
- Cook J. L.; Hunter C. A.; Low C. M. R.; Perez-Velasco A.; Vinter J. G. Solvent Effects on Hydrogen Bonding. Angew. Chem., Int. Ed. 2007, 46, 3706–3709. 10.1002/anie.200604966.
- Cumberworth A.; Bui J. M.; Gsponer J. Free Energies of Solvation in the Context of Protein Folding: Implications for Implicit and Explicit Solvent Models. J. Comput. Chem. 2016, 37, 629–640. 10.1002/jcc.24235.
- Carreira E. M.; Lisbet K. Classics in Stereoselective Synthesis; Wiley-VCH, 2008; Vol. 1.
- Varghese J. J.; Mushrif S. H. Origins of Complex Solvent Effects on Chemical Reactivity and Computational Tools to Investigate Them: A Review. React. Chem. Eng. 2019, 4, 165–206. 10.1039/C8RE00226F.
- Ashwood V. A.; Field M. J.; Horwell D. C.; Julien-Larose C.; Lewthwaite R. A.; McCleary S.; Pritchard M. C.; Raphy J.; Singh L. Utilization of an Intramolecular Hydrogen Bond To Increase the CNS Penetration of an NK1 Receptor Antagonist. J. Med. Chem. 2001, 44, 2276–2285. 10.1021/jm010825z.
- Rezai T.; Bock J. E.; Zhou M. V.; Kalyanaraman C.; Lokey R. S.; Jacobson M. P. Conformational Flexibility, Internal Hydrogen Bonding, and Passive Membrane Permeability: Successful in Silico Prediction of the Relative Permeabilities of Cyclic Peptides. J. Am. Chem. Soc. 2006, 128, 14073–14080. 10.1021/ja063076p.
- Chovatia P.; Sanzone A.; Hofman G.-J.; Dooley R.; Pezzati B.; Trist I. M. L.; Ouvry G. Chapter 1 – Harnessing Conformational Drivers in Drug Design; Bentley J.; Bingham M., Eds.; Elsevier, 2024; Vol. 63, pp 1–60.
- Laszlo P. Chapter 6 – Solvent Effects and Nuclear Magnetic Resonance. Prog. Nucl. Magn. Reson. Spectrosc. 1967, 3, 231–402. 10.1016/0079-6565(67)80016-5.
- Searles D. J.; Huber H. Calculation of NMR and EPR Parameters; John Wiley & Sons, Ltd, 2004; pp 175–189.
- Ciofini I. Calculation of NMR and EPR Parameters; John Wiley & Sons, Ltd, 2004; pp 191–208.
- Dračínský M.; Bouř P. Computational Analysis of Solvent Effects in NMR Spectroscopy. J. Chem. Theory Comput. 2010, 6, 288–299. 10.1021/ct900498b.
- Allerhand A.; Schleyer P. V. R. Solvent Effects in Infrared Spectroscopic Studies of Hydrogen Bonding. J. Am. Chem. Soc. 1963, 85, 371–380. 10.1021/ja00887a001.
- Gastegger M.; Schütt K. T.; Müller K.-R. Machine Learning of Solvent Effects on Molecular Spectra and Reactions. Chem. Sci. 2021, 12, 11473–11483. 10.1039/D1SC02742E.
- Polavarapu P. L.; He J. Chiral Analysis Using Mid-IR Vibrational CD Spectroscopy. Anal. Chem. 2004, 76, 61A–67A. 10.1021/ac0415096.
- Sherer E. C.; Lee C. H.; Shpungin J.; Cuff J. F.; Da C.; Ball R.; Bach R.; Crespo A.; Gong X.; Welch C. J. Systematic Approach to Conformational Sampling for Assigning Absolute Configuration Using Vibrational Circular Dichroism. J. Med. Chem. 2014, 57, 477–494. 10.1021/jm401600u.
- Merten C. Vibrational Optical Activity as Probe for Intermolecular Interactions. Phys. Chem. Chem. Phys. 2017, 19, 18803–18812. 10.1039/C7CP02544K.
- Weirich L.; Blanke K.; Merten C. More Complex, Less Complicated? Explicit Solvation of Hydroxyl Groups for the Analysis of VCD Spectra. Phys. Chem. Chem. Phys. 2020, 22, 12515–12523. 10.1039/D0CP01656J.
- Eikås K. D. R.; Beerepoot M. T. P.; Ruud K. A Computational Protocol for Vibrational Circular Dichroism Spectra of Cyclic Oligopeptides. J. Phys. Chem. A 2022, 126, 5458–5471. 10.1021/acs.jpca.2c02953.
- Michel J.; Tirado-Rives J.; Jorgensen W. L. Energetics of Displacing Water Molecules from Protein Binding Sites: Consequences for Ligand Optimization. J. Am. Chem. Soc. 2009, 131, 15403–15411. 10.1021/ja906058w.
- De Vivo M.; Masetti M.; Bottegoni G.; Cavalli A. Role of Molecular Dynamics and Related Methods in Drug Discovery. J. Med. Chem. 2016, 59, 4035–4061. 10.1021/acs.jmedchem.5b01684.
- Cournia Z.; Allen B.; Sherman W. Relative Binding Free Energy Calculations in Drug Discovery: Recent Advances and Practical Considerations. J. Chem. Inf. Model. 2017, 57, 2911–2937. 10.1021/acs.jcim.7b00564.
- Riniker S. Fixed-Charge Atomistic Force Fields for Molecular Dynamics Simulations in the Condensed Phase: An Overview. J. Chem. Inf. Model. 2018, 58, 565–578. 10.1021/acs.jcim.8b00042.
- Baker N. A. Improving Implicit Solvent Simulations: A Poisson-Centric View. Curr. Opin. Struct. Biol. 2005, 15, 137–143. 10.1016/j.sbi.2005.02.001.
- Haberthür U.; Caflisch A. FACTS: Fast Analytical Continuum Treatment of Solvation. J. Comput. Chem. 2008, 29, 701–715. 10.1002/jcc.20832.
- Still W. C.; Tempczyk A.; Hawley R. C.; Hendrickson T. Semianalytical Treatment of Solvation for Molecular Mechanics and Dynamics. J. Am. Chem. Soc. 1990, 112, 6127–6129. 10.1021/ja00172a038.
- Mahmoud S. S. M.; Esposito G.; Serra G.; Fogolari F. Generalized Born Radii Computation using Linear Models and Neural Networks. Bioinformatics 2020, 36, 1757–1764. 10.1093/bioinformatics/btz818.
- Horvath D.; Marcou G.; Varnek A. “Big Data” Fast Chemoinformatics Model to Predict Generalized Born Radius and Solvent Accessibility as a Function of Geometry. J. Chem. Inf. Model. 2020, 60, 2951–2965. 10.1021/acs.jcim.9b01172.
- Chen Y.; Krämer A.; Charron N. E.; Husic B. E.; Clementi C.; Noé F. Machine Learning Implicit Solvation for Molecular Dynamics. J. Chem. Phys. 2021, 155, 084101. 10.1063/5.0059915.
- Airas J.; Ding X.; Zhang B. Transferable Implicit Solvation via Contrastive Learning of Graph Neural Networks. ACS Cent. Sci. 2023, 9, 2286–2297. 10.1021/acscentsci.3c01160.
- Yao S.; Van R.; Pan X.; Park J. H.; Mao Y.; Pu J.; Mei Y.; Shao Y. Machine Learning Based Implicit Solvent Model for Aqueous-Solution Alanine Dipeptide Molecular Dynamics Simulations. RSC Adv. 2023, 13, 4565–4577. 10.1039/D2RA08180F.
- Coste A.; Slejko E.; Zavadlav J.; Praprotnik M. Developing an Implicit Solvation Machine Learning Model for Molecular Simulations of Ionic Media. J. Chem. Theory Comput. 2024, 20, 411–420. 10.1021/acs.jctc.3c00984.
- Katzberger P.; Riniker S. Implicit Solvent Approach Based on Generalized Born and Transferable Graph Neural Networks for Molecular Dynamics Simulations. J. Chem. Phys. 2023, 158, 204101. 10.1063/5.0147027.
- Katzberger P.; Riniker S. A General Graph Neural Network Based Implicit Solvation Model for Organic Molecules in Water. Chem. Sci. 2024, 15, 10794–10802. 10.1039/D4SC02432J.
- Ramakrishnan R.; Dral P. O.; Rupp M.; von Lilienfeld O. A. Big Data Meets Quantum Chemistry Approximations: The Δ-Machine Learning Approach. J. Chem. Theory Comput. 2015, 11, 2087–2096. 10.1021/acs.jctc.5b00099.
- Thürlemann M.; Böselt L.; Riniker S. Regularized by Physics: Graph Neural Network Parametrized Potentials for the Description of Intermolecular Interactions. J. Chem. Theory Comput. 2023, 19, 562–579. 10.1021/acs.jctc.2c00661.
- Nguyen H.; Roe D. R.; Simmerling C. Improved Generalized Born Solvent Model Parameters for Protein Simulations. J. Chem. Theory Comput. 2013, 9, 2020–2034. 10.1021/ct3010485.
- Onufriev A.; Bashford D.; Case D. A. Exploring Protein Native States and Large-Scale Conformational Changes with a Modified Generalized Born Model. Proteins:Struct., Funct., Bioinf. 2004, 55, 383–394. 10.1002/prot.20033.
- Roux B.; Simonson T. Implicit Solvent Models. Biophys. Chem. 1999, 78, 1–20. 10.1016/S0301-4622(98)00226-9.
- Blaney J. M.; Dixon J. S. Distance Geometry in Molecular Modeling. In Reviews in Computational Chemistry; Wiley, 1994; Vol. 5, pp 299–335.
- Havel T. F. Distance Geometry: Theory, Algorithms, and Chemical Applications. In Encyclopedia of Computational Chemistry; Wiley, 1998.
- Riniker S.; Landrum G. A. Better Informed Distance Geometry: Using What We Know To Improve Conformation Generation. J. Chem. Inf. Model. 2015, 55, 2562–2574. 10.1021/acs.jcim.5b00654.
- Boothroyd S.; Behara P. K.; Madin O. C.; Hahn D. F.; Jang H.; Gapsys V.; Wagner J. R.; Horton J. T.; Dotson D. L.; Thompson M. W.; Maat J.; Gokey T.; Wang L.-P.; Cole D. J.; Gilson M. K.; Chodera J. D.; Bayly C. I.; Shirts M. R.; Mobley D. L. Development and Benchmarking of Open Force Field 2.0.0: The Sage Small Molecule Force Field. J. Chem. Theory Comput. 2023, 19, 3251–3275. 10.1021/acs.jctc.3c00039.
- Grimme S. Supramolecular Binding Thermodynamics by Dispersion-Corrected Density Functional Theory. Chem. - Eur. J. 2012, 18, 9955–9964. 10.1002/chem.201200497.
- Tasaki K.; Abe A. NMR Studies and Conformational Energy Calculations of 1,2-Dimethoxyethane and Poly(oxyethylene). Polym. J. 1985, 17, 641–655. 10.1295/polymj.17.641.
- Meredith N. Y.; Borsley S.; Smolyar I. V.; Nichol G. S.; Baker C. M.; Ling K. B.; Cockroft S. L. Dissecting Solvent Effects on Hydrogen Bonding. Angew. Chem., Int. Ed. 2022, 61, e202206604. 10.1002/anie.202206604.
- CRC Handbook of Chemistry and Physics; CRC Press: Cleveland, OH, 2024.
- Fujinaga T.; Izutsu K.; Sakura S. Hexamethylphosphoramide: Purification and Tests for Purity. Pure Appl. Chem. 1975, 44, 115–124. 10.1351/pac197544010115.
- Rosenfarb J.; Huffman H. L. J.; Caruso J. A. Dielectric constants, viscosities, and related physical properties of several substituted liquid ureas at various temperatures. J. Chem. Eng. Data 1976, 21, 150–153. 10.1021/je60069a034.
- Wohlfahrt C. 2 Pure Liquids: Data: Datasheet from Landolt-Börnstein - Group IV Physical Chemistry “Static Dielectric Constants of Pure Liquids and Binary Liquid Mixtures”; Springer-Verlag: Berlin Heidelberg, 1991; Vol. 6. https://materials.springer.com/lb/docs/sm_lbs_978-3-540-47619-1_2.
- Kirk-Othmer Encyclopedia of Chemical Technology, 4th ed.; John Wiley and Sons, 1995; Vol. 1.
- de la Luz A. P.; Iuga C.; Vivier-Bunge A. An Effective Force Field to Reproduce the Solubility of MTBE in Water. Fuel 2020, 264, 116761. 10.1016/j.fuel.2019.116761.
- Kingma D. P.; Ba J. Adam: A Method for Stochastic Optimization. arXiv:1412.6980. arXiv.org e-Print archive. 2017. https://doi.org/10.48550/arXiv.1412.6980.
- Landrum G.; Tosco P.; Kelley B.; Ric; Cosgrove D.; Sriniker; Gedeck; Vianello R.; Schneider N.; Kawashima E.; Dan N.; Jones G.; Dalke A.; Cole B.; Swain M.; Turk S.; Savelyev A.; Vaucher A.; Wójcikowski M.; Take I.; Probst D.; Ujihara K.; Scalfani V. F.; Godin G.; Lehtivarjo J.; Pahl A.; Walker R.; Berenger F. RDKit (Q1 ) Release2023; Zenodo, 2023.
- Eastman P.; Swails J.; Chodera J. D.; McGibbon R. T.; Zhao Y.; Beauchamp K. A.; Wang L.-P.; Simmonett A. C.; Harrigan M. P.; Stern C. D.; Wiewiora R. P.; Brooks B. R.; Pande V. S. OpenMM 7: Rapid Development of High Performance Algorithms for Molecular Dynamics. PLoS Comput. Biol. 2017, 13, e1005659. 10.1371/journal.pcbi.1005659.
- Martínez L.; Andrade R.; Birgin E. G.; Martínez J. M. PACKMOL: A Package for Building Initial Configurations for Molecular Dynamics Simulations. J. Comput. Chem. 2009, 30, 2157–2164. 10.1002/jcc.21224.
- Miyamoto S.; Kollman P. A. An Analytical Version of the SHAKE and RATTLE Algorithm for Rigid Water Models. J. Comput. Chem. 1992, 13, 952–962. 10.1002/jcc.540130805.
- Eastman P.; Pande S. Constant Constraint Matrix Approximation: A Robust, Parallelizable Constraint Method for Molecular Simulations. J. Chem. Theory Comput. 2010, 6, 434–437. 10.1021/ct900463w.
- Zhang Z.; Liu X.; Yan K.; Tuckerman M. E.; Liu J. Unified Efficient Thermostat Scheme for the Canonical Ensemble with Holonomic or Isokinetic Constraints via Molecular Dynamics. J. Phys. Chem. A 2019, 123, 6056–6079. 10.1021/acs.jpca.9b02771.
- Darden T.; York D.; Pedersen L. Particle Mesh Ewald: An N·log(N) Method for Ewald Sums in Large Systems. J. Chem. Phys. 1993, 98, 10089–10092. 10.1063/1.464397.
- Herschbach D. R.; Johnston H. S.; Rapp D. Molecular Partition Functions in Terms of Local Properties. J. Chem. Phys. 1959, 31, 1652–1661. 10.1063/1.1730670.
- Liu D. C.; Nocedal J. On the Limited Memory BFGS Method for Large Scale Optimization. Math. Program. 1989, 45, 503–528. 10.1007/BF01589116.
- McGibbon R. T.; Beauchamp K. A.; Harrigan M. P.; Klein C.; Swails J. M.; Hernández C. X.; Schwantes C. R.; Wang L.-P.; Lane T. J.; Pande V. S. MDTraj: A Modern Open Library for the Analysis of Molecular Dynamics Trajectories. Biophys. J. 2015, 109, 1528–1532. 10.1016/j.bpj.2015.08.015.
- Schulz-Gasch T.; Schärfer C.; Guba W.; Rarey M. TFD: Torsion Fingerprints As a New Measure To Compare Small Molecule Conformations. J. Chem. Inf. Model. 2012, 52, 1499–1512. 10.1021/ci2002318.
- Braun J.; Katzberger P.; Landrum G. A.; Riniker S. Understanding and Quantifying Molecular Flexibility: Torsion Angular Bin Strings. J. Chem. Inf. Model. 2024, 64, 7917–7924. 10.1021/acs.jcim.4c01513.
- Friedrich N.-O.; Meyder A.; de Bruyn Kops C.; Sommer K.; Flachsenberg F.; Rarey M.; Kirchmair J. High-Quality Dataset of Protein-Bound Ligand Conformations and Its Application to Benchmarking Conformer Ensemble Generators. J. Chem. Inf. Model. 2017, 57, 529–539. 10.1021/acs.jcim.6b00613.
- Wang L.; Friesner R. A.; Berne B. J. Replica Exchange with Solute Scaling: A More Efficient Version of Replica Exchange with Solute Tempering (REST2). J. Phys. Chem. B 2011, 115, 9431–9438. 10.1021/jp204407d.
- Waibl F.; Casagrande F.; Dey F.; Riniker S. Validating Small-Molecule Force Fields for Macrocyclic Compounds Using NMR Data in Different Solvents. J. Chem. Inf. Model. 2024, 64, 7938–7948. 10.1021/acs.jcim.4c01120.
- Hess B.; Kutzner C.; van der Spoel D.; Lindahl E. GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation. J. Chem. Theory Comput. 2008, 4, 435–447. 10.1021/ct700301q.
- Pronk S.; Páll S.; Schulz R.; Larsson P.; Bjelkmar P.; Apostolov R.; Shirts M. R.; Smith J. C.; Kasson P. M.; van der Spoel D.; Hess B.; Lindahl E. GROMACS 4.5: A High-throughput and Highly Parallel Open Source Molecular Simulation Toolkit. Bioinformatics 2013, 29, 845–854. 10.1093/bioinformatics/btt055.
- Abraham M. J.; Murtola T.; Schulz R.; Páll S.; Smith J. C.; Hess B.; Lindahl E. GROMACS: High Performance Molecular Simulations Through Multi-level Parallelism from Laptops to Supercomputers. SoftwareX 2015, 1–2, 19–25. 10.1016/j.softx.2015.06.001.
- Bonomi M.; Branduardi D.; Bussi G.; Camilloni C.; Provasi D.; Raiteri P.; Donadio D.; Marinelli F.; Pietrucci F.; Broglia R. A.; Parrinello M. PLUMED: A Portable Plugin for Free-energy Calculations with Molecular Dynamics. Comput. Phys. Commun. 2009, 180, 1961–1972. 10.1016/j.cpc.2009.05.011.
- Tribello G. A.; Bonomi M.; Branduardi D.; Camilloni C.; Bussi G. PLUMED 2: New Feathers for an Old Bird. Comput. Phys. Commun. 2014, 185, 604–613. 10.1016/j.cpc.2013.09.018.
- Promoting Transparency and Reproducibility in Enhanced Molecular Simulations. Nat. Methods 2019, 16, 670–673. 10.1038/s41592-019-0506-8.
- Virtanen P.; Gommers R.; Oliphant T. E.; Haberland M.; Reddy T.; Cournapeau D.; Burovski E.; Peterson P.; Weckesser W.; Bright J.; van der Walt S. J.; Brett M.; Wilson J.; Millman K. J.; Mayorov N.; Nelson A. R. J.; Jones E.; Kern R.; Larson E.; Carey C. J.; Polat İ.; Feng Y.; Moore E. W.; VanderPlas J.; Laxalde D.; Perktold J.; Cimrman R.; Henriksen I.; Quintero E. A.; Harris C. R.; Archibald A. M.; Ribeiro A. H.; Pedregosa F.; van Mulbregt P. SciPy 1.0 Contributors SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nat. Methods 2020, 17, 261–272. 10.1038/s41592-019-0686-2.

