2021 Jul 19;12(34):11364–11381. doi: 10.1039/d1sc01185e

Fig. 8. Optimizing the QML parameters on a set of experimentally obtained tautomer free energies in solution, $\Delta_t G_{\mathrm{solv}}^{\mathrm{exp}}$, enables ANI-1ccx to include crucial solvation effects. Improved estimates of tautomeric free energy differences can then be obtained by importance weighting from vacuum simulations using the optimized QML parameters. (A) The top panel shows the training (green) and validation (purple) set performance as $\Delta\Delta_t G_{\mathrm{solv}}$. Validation set performance is plotted with a bootstrapped 95% confidence interval. The performance of the optimized parameter set on the original ANI-1ccx dataset is shown in blue. The best-performing parameter set (evaluated on the validation set and indicated by the red dotted line) was selected for evaluation on the test set. The bottom panel shows the MAE of the energy difference between each of the 400 parameter sets and the original parameter set on all snapshots used for the free energy calculations (≈1.2 million snapshots), split into training/validation and test sets, as well as on the original ANI-1ccx dataset. (B) Distribution of $\Delta_t G_{\mathrm{solv}}^{\mathrm{exp}} - \Delta_t G_{\mathrm{solv}}^{\mathrm{calc}}$ for a hold-out test set (71 tautomer pairs) with the native ANI-1ccx parameter set (θ) and the optimized parameter set (θ*). The optimized parameter set improved the prediction of tautomeric free energy differences from an initial 6.7 kcal mol⁻¹ to 2.8 kcal mol⁻¹ (MAE improved from 5.3 to 2.0 kcal mol⁻¹). The difference in Kullback–Leibler divergence (KL) indicates that the tautomeric free energy differences obtained with the optimized parameter set reproduce the distribution of the experimental tautomer ratios much better than those obtained with the original parameter set.
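
The importance-weighting step mentioned in the caption can be illustrated with a minimal sketch. It assumes a simple one-sided exponential reweighting of snapshots sampled with the original parameters θ, which is not necessarily the estimator used in the paper. The function name, the argument names, and the 300 K temperature are hypothetical choices made for illustration only.

```python
import numpy as np

# Assumption: k_B * T in kcal/mol at 300 K (temperature chosen for illustration).
KT_KCAL_MOL = 0.0019872041 * 300.0


def reweighted_free_energy_shift(u_old, u_new, kT=KT_KCAL_MOL):
    """Estimate G(theta*) - G(theta) from snapshots sampled with theta.

    u_old: energies of the snapshots under the original parameters (kcal/mol)
    u_new: energies of the same snapshots under the optimized parameters
    Uses the identity dG = -kT * ln < exp(-(u_new - u_old) / kT) >_theta.
    """
    du = (np.asarray(u_new, dtype=float) - np.asarray(u_old, dtype=float)) / kT
    # Log-mean-exp with the maximum subtracted for numerical stability.
    m = np.max(-du)
    log_mean = m + np.log(np.mean(np.exp(-du - m)))
    return -kT * log_mean
```

Under these assumptions, a reweighted tautomeric free energy difference would be the original estimate plus the shift for one tautomer's endpoint minus the shift for the other's. The estimate is only reliable when the snapshots sampled with θ overlap well with the configurations favored by θ*.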
