Abstract
A new method called QM-VM2 is presented that efficiently combines statistical mechanics with quantum mechanical (QM) energy potentials in order to calculate noncovalent binding free energies of host–guest systems. QM-VM2 efficiently couples the use of semi-empirical QM (SEQM) energies and geometry optimizations with an underlying molecular mechanics (MM) based conformational search, to find low SEQM energy minima, and allows for processing of these minima at higher levels of ab initio QM theory. A progressive geometry optimization scheme is introduced as a means to increase conformational sampling efficiency. The newly implemented QM-VM2 is used to compute the binding free energies of the host molecule cucurbit[7]uril and a set of 15 guest molecules. The results are presented along with comparisons to experimentally determined binding affinities. For the full set of 15 host–guest complexes, which have a range of formal charges from +1 to +3, SEQM-VM2 based binding free energies show poor correlation with experiment, whereas for the ten +1 complexes only, a significant correlation (R² = 0.8) is achieved. SEQM-VM2 generation of conformers followed by single-point ab initio QM calculations at the dispersion corrected restricted Hartree–Fock-D3(BJ) and TPSS-D3(BJ) levels of theory, as post-processing corrections, yields a reasonable correlation with experiment for the full set of host–guest complexes (R² = 0.6 and R² = 0.7, respectively) and an excellent correlation for the +1 formal charge set (R² = 1.0 and R² = 0.9, respectively), as long as a sufficiently large basis set (triple-zeta quality) is employed. The importance of the inclusion of configurational entropy, even at the MM level, for the achievement of good correlation with experiment was demonstrated by comparing the calculated ΔE values with experiment and finding a considerably poorer correlation with experiment than for the calculated free energy ΔE − TΔS.
For the complete set of host–guest systems with the range of formal charges, it was observed that the deviation of the predicted binding free energy from experiment correlates somewhat with the net charge of the systems. This observation leads to a simple empirical interpolation scheme to improve the linear regression of the full set.
I. INTRODUCTION
Ever since the Nobel Prize was awarded to Cram, Lehn, and Pedersen1–4 in 1987 for their seminal work on host–guest supramolecular chemistry, host–guest chemical systems have been widely studied in basic research laboratories.5,6 They have now also been widely adopted as a means to utilize molecular recognition mechanisms in various applied chemistry fields, including, but not limited to, drug development,7,8 materials sciences,9,10 analytical separation sciences,11,12 chemical pollutant cleanup technology,13,14 and the agrochemical industry.15 For example, many pharmaceutical drug candidates exhibit poor solubility and, therefore, poor bioavailability (the ability to reach the site of action unaltered), but their bioavailability can be improved by the formation of inclusion complexes with host molecules such as cyclodextrins16,17 and cucurbiturils.18
Both the host–guest binding affinity strength and the structural nature of host–guest complexation (i.e., the guest molecule inside, partly inside, or outside the host cavity) control the effect on the physicochemical properties of the guest molecule; therefore, key data required for the optimization of host–guest complexes for specific chemical applications are the binding free energy and the most energetically favored host–guest structures. There have been concerted efforts to develop routinely usable accurate physics-based computational methods to predict these host–guest properties. If such a capability were available, many rounds of expensive chemical synthesis and experimental measurement, usually required by chemical research and development programs, could be avoided. Furthermore, there has been recent interest in accurate computational prediction of host–guest properties in the context of providing simplified models to aid development and refinement of computational protein–ligand binding affinity methods.19–23
Two main challenges arise when computing host–guest binding free energies and structures: First, the guest molecule, depending on its size and flexibility, may adopt many energetically favorable conformations and orientations within the host cavity, and even more so if the host also exhibits some flexibility. This introduces a requirement for significant conformational searching, with a goal of finding low energy structures of the system. Such studies are computationally demanding. Notably, multiple distinct thermally accessible molecular structures result in an increase in conformational entropy and are, therefore, important to account for. In addition, if, upon binding, the geometries of predominant conformations of the guest molecule change, this can also change the configurational entropy compared to that of the free guest, affecting the total binding affinity.24 Second, the host–guest noncovalent interaction potential is relatively weak, but, at the same time, highly complicated to model accurately, requiring treatment of energy contributions such as electrostatics (Coulomb), polarization, exchange repulsion, dispersion, charge transfer, and solvation.
Current computational approaches fall into two camps with respect to the interaction potential: classical molecular mechanics (MM) based and quantum mechanics (QM) based approaches. The fast turnaround of MM-based methods allows for significant conformational sampling, and computations can be performed on thousands of atoms, facilitating inclusion of explicit solvent molecules. However, while MM-based methods, e.g., free energy perturbation (FEP),25,26 MM Poisson–Boltzmann Surface Area (MMPBSA),27 and attach-pull-release (APR), a combined docking and molecular dynamics (MD) based approach,28 have, for some systems, proved capable of providing relative binding free energies that correlate well with experiment, consistency as well as accurate absolute binding free energies remain a challenge.29–31 Furthermore, the empirical force field potentials32,33 that these methods rely on are unlikely to have optimal parameters for an arbitrary system of interest,34 due to the presence of empirically fitted parameters that are typically tied to specific molecule types, nor do they provide adequate descriptions of complex chemical interactions involving, for example, π-stacking, polarization, and charge transfer, limiting their applicability. In contrast, QM based potentials can naturally account for these complex interactions, and, in addition, the level of QM theory (method and basis set) can, in principle, be systematically improved to provide the required accuracy. A significant difficulty, though, with applying ab initio QM (AIQM) potentials to the calculation of host–guest binding affinities is that their increased computational expense can preclude adequate conformational sampling. 
This has resulted in a tendency to (a) rely solely on computationally cheaper semi-empirical QM (SEQM) methods, e.g., dispersion corrected PM6,35–37 and (b) rely on a conformational search step that uses only MM methods, sometimes leading to a situation in which QM corrections (energy and/or geometry) are unable to recover from the poor quality of the provided MM structures. This has led to somewhat mixed results regarding the accuracy of QM predicted host–guest binding affinities compared to experiment.38,39 In the situation where guest molecules are relatively small and rigid and the necessary seed conformations can be intuited and generated “by hand,” the application of density functional theory (DFT) with a good quality basis set has resulted in predicted absolute binding affinities that correlate highly with experiment, providing a good proof of principle.40 Such a manual approach to the generation of host–guest conformers, however, is not generally and routinely feasible, especially when the guest and/or host molecules have significant flexibility.
This paper presents a new approach to the calculation of host–guest binding free energies, called QM-VM2, which tightly integrates SEQM and AIQM potentials with the statistical mechanics based second-generation mining minima method M2.41,42 This was achieved by interfacing the VeraChem LLC implementation of the second-generation mining minima method, VM2, with the QM software package GAMESS.43–45 The MM-only version of VM2 has already been applied to the calculation of protein–ligand and host–guest free energies.46–50 The new QM-VM2 approach is designed to address the problems with current approaches to the calculation of host–guest binding affinities, which were outlined above. QM-VM2 efficiently couples the use of SEQM energies and geometry optimizations with an underlying MM-based conformational search, guiding the search toward conformers that have low SEQM energies, instead of low MM energy conformers. This new scheme also allows for processing of the conformers produced at higher levels of QM theory, and it needs no manual initial placement of guests in the host, providing an automated placement mechanism to seed conformational searching. The first demonstration of QM-VM2 presented here is to compute the binding free energies of the host cucurbit[7]uril and a set of 15 guest molecules20 and to compare the results with experimentally determined binding affinities.
This paper is divided into the following sections: Sec. II is a theory section, where the QM-VM2 method is described in detail, Sec. III gives the computational details of the first application of QM-VM2, Sec. IV presents the results and discussion, and Sec. V provides the conclusions and future work.
II. THEORY
A. MM-VM2
The VeraChem mining minima (VM2) algorithm is an implementation of the second-generation Mining Minima (M2) method originally developed by Gilson et al.24,42,51–53 VM2 is an end-point approach, whereby the binding free energy of a host–guest complex is computed as the difference between the standard chemical potentials of the bound complex (HG) and the free host (H) and guest (G) at constant volume: $\Delta F^{\circ} = \mu^{\circ}_{\mathrm{HG}} - \mu^{\circ}_{\mathrm{H}} - \mu^{\circ}_{\mathrm{G}}$. For host–guest systems, the volume change on binding is small, so the Helmholtz binding free energy obtained is a good approximation to the Gibbs binding free energy, $\Delta F^{\circ} \approx \Delta G^{\circ}$.24,54
The classical statistical mechanics based standard chemical potential of a molecule in solution can be expressed as
$$\mu^{\circ} = -RT \ln\left(\frac{8\pi^{2}}{C^{\circ}}\, Z\right), \tag{1}$$
where
$$Z = \int e^{-E(r)/RT}\, dr \tag{2}$$
and
$$E(r) = U(r) + W(r). \tag{3}$$
Here, $Z$ is the configuration integral over all molecular conformations, $C^{\circ}$ is the standard concentration, which, combined with the factor of $8\pi^{2}$, accounts for the positional and orientational mobility of the free molecule at standard concentration, $E(r)$ is the energy comprising the potential energy $U(r)$ plus the solvation energy $W(r)$ as a function of internal coordinates, $R$ is the ideal gas constant, and $T$ is the absolute temperature.24 The VM2 method approximates $Z$, an integral over all space, as the sum over local configuration integrals $z_i$ for a manageable set of $M$ local energy wells, which correspond to the low energy minima of the system,
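$$Z \approx \sum_{i=1}^{M} z_i, \tag{4}$$
$$z_i = \int_{i} e^{-E(r)/RT}\, dr, \tag{5}$$
where $E(r)$ is again the energy as a function of internal coordinates, but the integration is restricted to the local energy well $i$. The standard chemical potential may now be conveniently expressed in terms of a sum over local standard chemical potentials $\mu^{\circ}_i$,
$$\mu^{\circ} \approx -RT \ln \sum_{i=1}^{M} e^{-\mu^{\circ}_i/RT}, \tag{6}$$
where
$$\mu^{\circ}_i = -RT \ln\left(\frac{8\pi^{2}}{C^{\circ}}\, z_i\right). \tag{7}$$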
The two key computational requirements of the VM2 algorithm then become the determination of the low energy minima of a system and the calculation of the local standard chemical potentials of these minima. The latter is calculated using an enhanced harmonic approximation method called HAMS, i.e., harmonic approximation with mode scanning.55,56
| (8) |
| (9) |
Here, $E_{0,i}$, the energy at the bottom of the potential energy well $i$, is the leading term, and the remaining term is calculated via the matrix of the energy second derivatives (Hessian) plus numerical integration of low energy modes to correct for anharmonicity. The determination of the low energy minima of a system is achieved by the use of an aggressive torsional mode-distort-minimize algorithm, a heavily modified version of the Tork search method,57 which generates torsional modes via diagonalization of the MM-based dihedral angle Hessian matrix and is designed to repeatedly drive molecular conformations over high energy barriers and subsequently geometry optimize them to produce lower and lower energy minima.
In the MM-only based implementation of the VM2 algorithm (MM-VM2), the energy E(r) is calculated only with classical methods. The potential energy U(r) is obtained from empirical force fields such as the CHARMM General Force Field (CGenFF)33 or a general Amber force field (GAFF)32 or CHARMm,58 which contain bond-stretch, bond-angle, torsion, van der Waals, and Coulombic terms. During geometry optimizations and Hessian matrix involved steps, such as torsional mode generation, the solvation energy and energy derivative terms are calculated using the Generalized Born (GB) continuum model,59,60 and the more accurate Poisson–Boltzmann Surface Area (PBSA) method61 is applied as a final solvation energy correction W(r) to the minima found.
Given the foregoing discussion, the basic MM-VM2 algorithm proceeds by searching for low energy conformations/minima of the system, and any repeat conformations found during the conformational search are discarded by a symmetry aware structural RMSD method.62 The local configuration integral [Eq. (8)] is calculated for the remaining minima, and then, the standard chemical potential is calculated according to Eqs. (1) and (4). In practice, this process is repeated iteratively until no new minima are found and the chemical potential converges within a given tolerance.
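The combination step in this procedure can be illustrated numerically. The following is a minimal sketch, assuming that the total standard chemical potential follows from the local ones as μ° = −RT ln Σ_i exp(−μ°_i/RT), which is what Eqs. (1), (4), and (7) imply. The function names and the example values are hypothetical and are not taken from the VM2 code.

```python
import math

R_KCAL = 1.987204e-3  # ideal gas constant in kcal/(mol K)


def total_chemical_potential(local_mu, temperature=300.0):
    """Combine local standard chemical potentials (kcal/mol) of all wells.

    Assumes mu = -RT ln(sum_i exp(-mu_i / RT)).
    """
    rt = R_KCAL * temperature
    mu_min = min(local_mu)
    # Shift by the lowest well so that the exponentials cannot overflow.
    boltzmann_sum = sum(math.exp(-(mu - mu_min) / rt) for mu in local_mu)
    return mu_min - rt * math.log(boltzmann_sum)


def binding_free_energy(mu_complex, mu_host, mu_guest):
    """End-point binding free energy: complex minus free host and free guest."""
    return mu_complex - mu_host - mu_guest


# Hypothetical local chemical potentials (kcal/mol) for three sets of wells.
mu_hg = total_chemical_potential([-120.0, -119.2, -118.5])
mu_h = total_chemical_potential([-70.0, -69.8])
mu_g = total_chemical_potential([-35.0])
delta_f = binding_free_energy(mu_hg, mu_h, mu_g)
```

Because every additional well adds a positive term to the Boltzmann sum, the total chemical potential can only decrease as new minima are found, which is why the iterative search converges from above.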
Upon convergence of a VM2 calculation, a probability $p_i$ can be assigned to each of the local wells (or conformations), assuming a Boltzmann distribution,
$$p_i = \frac{e^{-\mu^{\circ}_i/RT}}{\sum_{j=1}^{M} e^{-\mu^{\circ}_j/RT}}. \tag{10}$$
In addition, again applying a harmonic oscillator (H.O.) approximation to the wells, an average energy of each well can be obtained using the equipartition theorem,
$$\langle E \rangle_i \approx E_{0,i} + \frac{N_{\mathrm{int}}}{2}\, RT, \tag{11}$$
where $N_{\mathrm{int}}$ is the number of internal degrees of freedom. Equations (10) and (11) lead to an expression for the energy averaged over all wells,
$$\langle E \rangle = \sum_{i=1}^{M} p_i\, \langle E \rangle_i. \tag{12}$$
This, in turn, allows a useful decomposition of the total chemical potential, providing expressions for the total configurational entropy as well as the entropy of each local well,63
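$$-TS^{\circ}_{\mathrm{config}} = \mu^{\circ} - \langle E \rangle, \tag{13}$$
$$-TS^{\circ}_{i} = \mu^{\circ}_i - \langle E \rangle_i. \tag{14}$$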
The quantities in Eqs. (13) and (14) will be discussed further in Sec. IV D.
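The well probabilities and the energy/entropy decomposition can be sketched in a few lines. This is an illustration only, assuming Boltzmann weights built from the local chemical potentials and the decomposition −TS°_config = μ° − ⟨E⟩; the per-well average energies are taken as input so that no particular form of the harmonic estimate is assumed. The function name `well_statistics` and the example numbers are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # ideal gas constant in kcal/(mol K)


def well_statistics(local_mu, well_avg_energy, temperature=300.0):
    """Return (probabilities, total mu, average energy, -T*S_config).

    local_mu        : local standard chemical potentials, kcal/mol
    well_avg_energy : average energy of each well, kcal/mol
    """
    rt = R_KCAL * temperature
    mu_min = min(local_mu)
    # Boltzmann weights relative to the lowest well (numerically stable).
    weights = [math.exp(-(mu - mu_min) / rt) for mu in local_mu]
    norm = sum(weights)
    probs = [w / norm for w in weights]
    mu_total = mu_min - rt * math.log(norm)
    # Energy averaged over all wells, weighted by the well probabilities.
    avg_energy = sum(p * e for p, e in zip(probs, well_avg_energy))
    # Assumed decomposition: mu = <E> - T*S_config.
    minus_ts = mu_total - avg_energy
    return probs, mu_total, avg_energy, minus_ts


# Two hypothetical, equally populated wells.
probs, mu_total, avg_energy, minus_ts = well_statistics([-5.0, -5.0], [-3.0, -3.0])
```

For two identical wells the entropy term picks up an extra −RT ln 2 relative to a single well, which is the contribution from having two thermally accessible conformations.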
B. QM-VM2
The VM2 algorithm has now been interfaced with quantum mechanical (QM) methods, producing a new mining minima free energy method QM-VM2. In the QM-VM2 scheme [Fig. 1(a)], the semi-empirical QM (SEQM) based free energy is iterated until self-consistency, that is, until the change of the SEQM free energy falls within a predefined threshold. Once the SEQM free energy is converged, single point AIQM calculations may be performed to obtain a more accurate total free energy. As will be shown in Sec. IV, while the converged SEQM-VM2 calculation can alone yield reasonably good correlation with experiments, this post-processing step with ab initio QM methods can produce absolute binding free energies in excellent agreement with experiment, provided adequate basis sets are used.
FIG. 1.
QM-VM2 method. (a) Mining minima method. (b) Generation of SEQM corrected conformers [first step in (a), purple border]. Blue indicates QM calculations that are performed using GAMESS. Pink indicates calculations carried out with the VM2 software.
The QM-VM2 conformational search [Fig. 1(b)], like that for MM-VM2, occurs through a torsional mode-distort-minimize process, with the torsional modes calculated via the MM-based dihedral angle Hessian matrix, and the distortion steps along these modes and initial minimization again using the MM-based potential. However, once an MM-based conformer is produced, it is then passed through an interface to the necessary GAMESS quantum chemistry package routines for SEQM energy or geometry optimization (note that the interface is actually general and not limited to SEQM, but rather can access any type of QM method implemented in GAMESS), and it is these SEQM energies and structures, when passed back from GAMESS through the interface to the VM2 drivers, that are used in energy cutoff decisions and to seed, if they are low in energy, the next round of conformational searches.
Once the search is complete, as in the case of MM-VM2, duplicate conformers are identified and discarded.62 The local chemical potentials are then calculated for the remaining set, and from these, the total chemical potential is determined. The implementation of QM-VM2 allows the use of the MM potential to calculate the Hessian based and mode scanning terms in the HAMS based local configuration integrals, see Eq. (9),56,57 but with the QM based energy of the minimum (e.g., the SEQM energy) used as the leading term, i.e., used to adjust the bottom of the well,
| (15) |
Similarly, the QM based energies (e.g., the SEQM energies) can be used in the average energy expressions, Eqs. (11) and (12), leading to a means of expressing the total MM-based configurational entropy and local well entropy for SEQM and QM adjusted energy wells.
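The bottom-of-the-well adjustment can be pictured as a simple energy swap. The sketch below assumes that the local chemical potential separates additively into the well-bottom energy plus an MM-based HAMS contribution, so that replacing the MM minimum energy with the QM one leaves the HAMS contribution untouched. The function name and all values are hypothetical, and all energies are taken to be in the same units (kcal/mol).

```python
def qm_adjusted_local_mu(mu_mm, e0_mm, e0_qm):
    """Replace the MM well-bottom energy with a QM one.

    mu_mm : MM-based local chemical potential of the well
    e0_mm : MM energy at the bottom of the well
    e0_qm : QM (e.g., SEQM) energy at the bottom of the well
    """
    # Assumption: everything beyond the well-bottom energy is the HAMS term.
    hams_term = mu_mm - e0_mm
    return e0_qm + hams_term


# Hypothetical values for a single well.
mu_qm = qm_adjusted_local_mu(mu_mm=-20.0, e0_mm=-25.0, e0_qm=-30.0)
```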
The QM-VM2 implementation also allows for the use of an AIQM or SEQM Hessian based rigid rotor harmonic oscillator (RRHO) approximation, with the conformers produced by the search passed to the GAMESS Hessian and thermodynamic analysis routines to provide the terms required by the non-classical statistical mechanics RRHO based expression [i.e., zero point energy (ZPE), rotational, translational, and vibrational enthalpy (H) and entropy (S) terms] with an additional term to adjust from 1 atm pressure (P) to 1 M standard concentration,38,64,65
| (16) |
| (17) |
The use of AIQM or SEQM Hessian based RRHO will usually be too computationally demanding to include inside the VM2 iterations and is more likely to be used for the post-processing of a limited set of conformers produced by a converged VM2 calculation, in an approach similar to the AIQM single-point energy correction post-processing described above. Note that while the classical formulation of VM2 results in Helmholtz free energies as described above, the QM thermodynamic analysis from GAMESS includes enthalpy terms and so provides the Gibbs free energy.
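The size of the standard-state adjustment from 1 atm to 1 M can be checked with a short calculation. This sketch assumes ideal-gas behavior, so that the per-species term is RT ln(RT·C°/P); the function names are hypothetical, and the expression is the commonly used form rather than a transcription of the one used in the RRHO post-processing.

```python
import math

R_KCAL = 1.987204e-3   # ideal gas constant in kcal/(mol K)
R_L_ATM = 0.082057366  # ideal gas constant in L atm/(mol K)


def standard_state_correction(temperature=300.0, pressure_atm=1.0, conc_molar=1.0):
    """Per-species free energy shift (kcal/mol) from P = 1 atm to C = 1 M."""
    # Molar volume of an ideal gas at the given temperature and pressure, L/mol.
    molar_volume = R_L_ATM * temperature / pressure_atm
    return R_KCAL * temperature * math.log(molar_volume * conc_molar)


def binding_correction(temperature=300.0):
    """Net shift for 1:1 association H + G -> HG (one species lost on binding)."""
    return -standard_state_correction(temperature)


per_species = standard_state_correction(300.0)
```

At 300 K the per-species term is about 1.9 kcal/mol, so a 1:1 binding free energy referenced to 1 M is about 1.9 kcal/mol more favorable than the same quantity referenced to 1 atm.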
C. Progressive scheme (PGSS)
The most computationally expensive step of the conformational search procedure in QM-VM2 calculations is the SEQM geometry optimization of the MM-generated conformers. The MM-based mode-distort-minimize procedure, with subsequent PBSA solvation energy correction, takes on the order of several seconds or less per conformer. In contrast, the SEQM geometry optimization of a host–guest conformer can, for the systems presented in this study (see Fig. 3), take between 3 min and 20 min on a single compute processor (CPU) core, with a total of 800 minimizations attempted per VM2 iteration by default – see Sec. III. This is due, in part, to their significant size: the number of atoms for the set of host–guest complexes in this study ranges between 146 and 157 atoms, with corresponding basis function counts for SEQM of 416 and 442, respectively. It is also because the relatively flat potential energy surface encountered for such noncovalently bound systems results in the frequent need for more than one hundred steps to converge the geometry. Note that, in the conformational search scheme shown in Fig. 1(b), single-point SEQM energies of MM-generated conformers are used as a metric to decide whether to discard a conformer or further process it through SEQM geometry optimization, with the goal of reducing the number of geometry optimizations carried out. However, this is far from an ideal solution because some MM-generated conformers may have a high initial SEQM single-point energy, but then, after a SEQM geometry optimization, may become competitively low in energy. Therefore, the discarding of these conformers could slow down the overall convergence of the QM-VM2 calculation and even prevent the lowest energy conformers of the system from being found.
FIG. 3.
Structure of host molecule cucurbit[7]uril (CB7) and the guest molecules that comprise the host–guest systems in this study. Dark gray: carbon, blue: nitrogen, red: oxygen, and white: hydrogen.
In order to address these problems, a partial optimization scheme, called progressive scheme (PGSS), has been developed to generate SEQM corrected conformers (Fig. 2). The key idea of the progressive scheme is straightforward: large gradient vector components indicate steep descent along the trajectory down to the local minima, more likely leading to a low-energy potential well. Therefore, apart from the energy, the energy gradient is also used as a means to determine if a geometry optimization calculation should be stopped and the conformer should be discarded or continued until completion. In this scheme, four new control parameters have been introduced for the partial SEQM optimization of MM-generated conformers (Table I).
FIG. 2.
Progressive optimization scheme (PGSS). This scheme can replace the standard scheme used in Fig. 1(b) to generate SEQM corrected conformers. Blue indicates QM calculations performed using GAMESS. Pink indicates calculations performed with the VM2 software.
TABLE I.
A summary of the new control variables for PGSS.
| PGSS control parameter | Default | Function |
|---|---|---|
| npgstep | Five steps | The number of PGSS optimization steps |
| ecutpg | 5 kcal/mol | Energy cutoff when comparing with the lowest-found local minimum |
| npgopt | Three iterations | The maximum number of PGSS iterations |
| gradcut | $(0.5)^{n} \times 10^{-3}$ hartree/bohr, where $n$ is the progressive iteration number | Threshold value with which the largest gradient vector component is compared |
The first parameter, npgstep, the number of PGSS partial optimization steps allowed, is typically set to 5. If the geometry has converged within npgstep optimization steps, the energy difference (Ediff) between the current conformer and the lowest energy conformer found so far is examined. The current conformer is kept when Ediff is within a predefined threshold ecutpg (the default is 5.0 kcal/mol); otherwise, it is discarded. If the geometry has not converged but Ediff is less than the ecutpg value, suggesting that a potential low-lying local minimum is nearby, further optimization is carried out until completion. If Ediff is greater than the predefined threshold, the energy gradient is analyzed. The PGSS process is repeated if the largest gradient component is larger than a threshold value (gradcut); otherwise, the process is stopped, and the partially optimized structure is discarded. The default value for the number of PGSS iterations, npgopt, is 3. In essence, this scheme introduces another mechanism for selecting low-energy minima contributing to the configuration integral with an early checking and intervention capability that should avoid unnecessary time-consuming SEQM geometry optimizations.
Among the four control parameters for the progressive scheme summarized in Table I, npgstep, ecutpg, npgopt, and gradcut, the first three are single-valued parameters that can be altered in the VM2 input by users. The gradient threshold, gradcut, on the other hand, is not a user-specified input parameter; rather, it changes as a function of the PGSS iteration number $n$, $(0.5)^{n} \times 10^{-3}$ hartree/bohr. This is because the magnitude of the gradient vector gradually decreases when falling toward the bottom of the potential well, assuming a harmonic shaped potential well. By the third progressive iteration, the gradcut value becomes $0.25 \times 10^{-3}$ hartree/bohr, which is not too far from the typical gradient cutoff value (0.0001 hartree/bohr) in electronic structure codes. Hence, the default value for npgopt is set to 3, and if users set npgopt larger than 4, from the fourth progressive iteration onwards, the formula for gradcut will be disregarded and the value 0.0001 hartree/bohr will be used.
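The gradient threshold schedule can be written as a small function. This sketch takes n literally as the exponent in the Table I formula and switches to the fixed value of 0.0001 hartree/bohr for n of 4 or more; how the implementation numbers its first progressive iteration is not assumed here, and the parameter names are hypothetical.

```python
def gradcut(n, base=1.0e-3, floor=1.0e-4, last_formula_iteration=3):
    """Gradient threshold (hartree/bohr) for progressive iteration number n."""
    if n > last_formula_iteration:
        # Beyond the last formula-driven iteration, use the fixed cutoff.
        return floor
    return (0.5 ** n) * base


# Thresholds for exponents 1 through 5.
schedule = [gradcut(n) for n in range(1, 6)]
```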
D. Coarse-grained parallelism for QM-VM2
Like the MM-VM2 implementation, QM-VM2 uses coarse-grained parallelization of the conformational search, based on the Message Passing Interface (MPI) library, to speed up the turnaround of calculations. A conformational search occurs every VM2 iteration, see Figs. 1 and 2, and each conformational search initiates hundreds of mode-distort-minimize calculations, i.e., by default, 400 single torsional mode-distort-minimize calculations followed by 400 random combinations of mode-pairs-distort-minimize calculations. In the current work, these mode-distort-minimize calculations were distributed across all MPI processes available to each QM-VM2 calculation (for most of the calculations presented here, this was 24 MPI processes), with each MPI process itself carrying out multiple “serial” executions of the procedure outlined in Fig. 2. The parallel algorithm alternates between two schemes. In the first, all MPI processes are seeded with the same initial conformer, and, through global communication, all MPI processes are periodically reseeded with the lowest energy conformation found so far. In the second, each MPI process is seeded with a different conformation, taken from the full set of conformations produced so far in the VM2 calculation, and carries out independent rounds of the procedure shown in Fig. 2. This approach is designed to introduce structural diversity, which, on the basis of extensive experience with host–guest and protein–ligand conformational searches at the MM-VM2 level, helps the search avoid becoming trapped and stalling in local energy wells before the lowest energy conformers of a system are found.
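To give a sense of the per-process workload, the distribution of the default 800 mode-distort-minimize calculations over 24 processes can be sketched as follows. A static round-robin assignment is assumed purely for illustration; the actual scheduling used by VM2 is not specified here, and no MPI calls are made in this sketch.

```python
def distribute_tasks(n_single=400, n_pair=400, n_procs=24):
    """Assign distort-minimize tasks to processes in round-robin order.

    Returns a list with one task list per process; each task is a
    (kind, index) tuple with kind "single" or "pair".
    """
    tasks = [("single", i) for i in range(n_single)]
    tasks += [("pair", i) for i in range(n_pair)]
    # Process `rank` takes every n_procs-th task, starting at its own rank.
    return [tasks[rank::n_procs] for rank in range(n_procs)]


per_process = distribute_tasks()
loads = [len(chunk) for chunk in per_process]
```

With these defaults each process carries out 33 or 34 serial executions per VM2 iteration, which, at several minutes per SEQM geometry optimization, explains why avoiding unproductive optimizations matters.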
As a single unified QM-VM2 executable is built by linking the GAMESS and VM2 compiled object files, the VM2 drivers can access and utilize the generalized distributed data interface (GDDI)66 in GAMESS, which is built on top of MPI. This allows not only the use of a team of processes, where each individual process carries out a “serial” execution of mode-distort-minimize calculations, as described above, but also the use of multiple teams, each with multiple processes, providing for a combined coarse grained-fine grained multi-level parallel approach; i.e., parallelized SEQM or AIQM energy or energy-gradient calculations are carried out by the already distributed (across teams) mode-distort-minimize calculations. While not used in the current work, the latter multi-level parallelism has been applied in other projects and allows for faster turnaround if the computational resources are available; it also allows for application to larger molecular systems.
III. COMPUTATIONAL DETAILS
The Statistical Assessment of the Modeling of Proteins and Ligands (SAMPLs) challenges were founded to provide prospective validation for computational tools in rational drug design.19–23 Participants in these blinded challenges are tasked with predicting molecular properties, such as binding affinities, for given chemical systems. The participants then submit their predictions to the organizers, who then compare the results of each submission to as yet unpublished high quality experimental data, generated especially for each challenge. There have been seven challenges to date, and SAMPL3 through SAMPL7 have all included a host–guest binding affinity component,19–23 with the rationale that these smaller simplified systems provide for faster computational turnaround than the larger protein–ligand systems, but still provide a means of testing and validating many components of the computational models employed. The datasets for completed SAMPLn challenges are also useful for retrospective studies, given that they comprise the curated sets of molecular system coordinate files supplied to the challenge participants, along with the corresponding high-quality experimental data, such as binding affinities. In this study, the host molecule cucurbit[7]uril (CB7) and associated guest molecules from the SAMPL4 challenge20 were used for an initial application of the QM-VM2 method and the proposed progressive scheme (Fig. 3), and the QM-VM2 calculated binding free energies of these host–guest systems are compared against the published SAMPL4 experimental binding affinities.
Initial starting structures of the host–guest complexes were generated automatically by translation of the center of geometry (COG) of the guest molecules to the COG of the host, followed by the removal of any resulting steric clashes via an initial MM-based geometry optimization that damps very large energy-gradient values [the COG is calculated the same way as the center of mass (COM), but with all the masses set to one].
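The automated placement step can be sketched as follows. This is a minimal illustration of the translation only, with coordinates held as plain (x, y, z) tuples; the subsequent damped MM geometry optimization that removes steric clashes is not shown, and the function names are hypothetical.

```python
def center_of_geometry(coords):
    """Unweighted mean position of a list of (x, y, z) coordinates."""
    n = len(coords)
    return tuple(sum(atom[k] for atom in coords) / n for k in range(3))


def place_guest_in_host(host_coords, guest_coords):
    """Translate the guest so that its COG coincides with the host COG."""
    host_cog = center_of_geometry(host_coords)
    guest_cog = center_of_geometry(guest_coords)
    shift = tuple(h - g for h, g in zip(host_cog, guest_cog))
    return [tuple(x + s for x, s in zip(atom, shift)) for atom in guest_coords]


# Hypothetical two-atom host and guest, coordinates in angstrom.
host = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
guest = [(10.0, 10.0, 10.0), (12.0, 10.0, 10.0)]
placed = place_guest_in_host(host, guest)
```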
For the MM-based parts of the QM-VM2 calculation, the parameters (bond, angle, torsion, van der Waals, etc.) for the potential energy were assigned according to the CHARMM force field,58 using the Discovery Studio Visualizer (Biovia), and atomic partial charges were assigned using the VCharge software (VeraChem LLC).67 For the MM-based mode-distort-minimize procedure, the MM solvation energy was included using the generalized Born (GB) continuum model,59,61 and the Poisson–Boltzmann Surface Area (PBSA) method62 was applied to provide a more accurate MM-based single-point solvation energy correction. The torsional mode-distort-minimize based conformational search carried out during each VM2 iteration included 400 single-mode distortion based searches and 400 searches in which random combinations of pairs of modes were used to generate distortions. At least four VM2 iterations were carried out in all mining minima calculations, and typically, most of the free energy lowering occurs within these iterations. All final VM2 free energies were converged to an energy difference <0.3 kcal/mol compared to the previous VM2 iteration.
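The stopping rule described above can be expressed compactly. The sketch below assumes that the free energy after each VM2 iteration is stored in a list and that both conditions (a minimum of four iterations and a change below 0.3 kcal/mol relative to the previous iteration) must hold; the function name and the example values are hypothetical.

```python
def vm2_converged(free_energies, tol=0.3, min_iterations=4):
    """Check the VM2 stopping rule on a history of free energies (kcal/mol)."""
    if len(free_energies) < min_iterations:
        return False
    # Compare the latest free energy with that of the previous iteration.
    return abs(free_energies[-1] - free_energies[-2]) < tol


# Hypothetical free energy history over four VM2 iterations.
history = [-10.0, -12.0, -13.0, -13.2]
done = vm2_converged(history)
```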
Both the SEQM and post-processing AIQM calculations were carried out with the electronic structure package, GAMESS.42–44 The SEQM calculations employed the third order density functional tight binding method, DFTB3,68 in combination with the D3 Grimme dispersion correction69 modified with the Becke–Johnson (BJ) damping [DFTB3-D3(BJ)].70–73 The set of interatomic interaction parameters used (3OB)74,75 was specifically designed for DFTB3, with improvements mainly in non-covalent bonding. SEQM geometry optimizations were considered converged when the largest component of the gradient was less than 0.0001 hartree/bohr, and, in addition, the root-mean-square gradient was less than a third of this maximum component tolerance.
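The two-part geometry convergence test stated above can be sketched as a short check. The gradient is taken as a flat list of Cartesian components in hartree/bohr; the function name is hypothetical, and the sketch reproduces only the stated criteria, not the GAMESS optimizer itself.

```python
import math


def geometry_converged(gradient, max_tol=1.0e-4):
    """Apply the largest-component and RMS gradient criteria.

    gradient : flat list of gradient components, hartree/bohr
    max_tol  : tolerance on the largest component; the RMS tolerance
               is one third of this value
    """
    largest = max(abs(g) for g in gradient)
    rms = math.sqrt(sum(g * g for g in gradient) / len(gradient))
    return largest < max_tol and rms < max_tol / 3.0


# One sizeable component among nine passes both tests.
ok = geometry_converged([5.0e-5] + [0.0] * 8)
```

A gradient can satisfy the largest-component test and still fail the RMS test when many components are close to the tolerance, which is the situation the second criterion guards against.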
Two different implicit solvation models were considered to account for solvation effects in the SEQM and AIQM based calculations: the Solvation Model Density (SMD) method76 and the conductor-like polarizable continuum model (C-PCM).77–80 The cavitation and dispersion terms were excluded from the latter model in the current work. It was found that the SMD model, both with the default van der Waals radii values81 and with a set of adjusted radii,82 showed improved binding free energies relative to C-PCM for a few complexes, but over the full set of complexes was worse; therefore, the C-PCM results are presented here. For C-PCM, the default density of tesserae on the cavity surface (ntsall = 60) was used, as increasing this value only marginally improved the binding free energies and the overall correlation, at the cost of significantly longer computational times. The simplified united atomic (SUAHF) radii83 were used for the generation of the C-PCM cavity holding the solute, as van der Waals radii led to increased over-binding of the host and guests and substantially lowered the correlation with experiment (see the supplementary material for more details).
For the AIQM post-processing step [see Fig. 1(a)], the 30 conformers with the lowest DFTB3-D3(BJ)/C-PCM energy were processed with single-point energy calculations using TPSS-D3(BJ)84 and HF-D3(BJ) with the def2-TZVP basis set85–88 and C-PCM implicit solvation. For a subset of seven of the host–guest systems, second-order perturbation theory with density fitting (RI-MP2),89,90 again with C-PCM and the def2-TZVP basis set, was applied to the 30 lowest DFTB3-D3(BJ)/C-PCM energy conformers. RI-MP2 was applied only to a reduced host–guest set due to the computational expense resulting from the use of the def2-TZVP basis set (∼3700 basis functions) for these calculations. The reduced set comprised the systems with the experimentally most weakly and most strongly bound guests (guests 3 and 13, respectively), a middle range binder (guest 7), and the four highly charged systems (guests 1, 4, 5, and 10), i.e., those with formal charge greater than +1.
Local configuration integrals were calculated using MM for the Hessian based and mode scanning (HAMS) terms,55,56 but with the SEQM and AIQM energies used as the leading term, i.e., used to adjust the bottom of the well [e.g., Eq. (15)], and the absolute temperature was set as 300 K.
Default values are used for the control parameters of the progressive scheme (Fig. 2); that is, the number of progressive iterations and the number of SEQM optimization steps were set to 3 and 5, respectively, and an energy cutoff of 5 kcal/mol was used.
In the remaining sections of the paper, the use of the C-PCM solvation model is implied. References to the SEQM-VM2 step and the post-processing step will be separated by //, and values from the SEQM method are enclosed in square brackets. For example, the notation TPSS-D3(BJ)/Def2-TZVP//[DFTB3-D3(BJ)]-VM2 indicates that the SEQM-VM2 calculation was performed at the DFTB3-D3(BJ) level of theory and the conformers were post-processed at the TPSS-D3(BJ)/Def2-TZVP level of theory.
IV. RESULTS AND DISCUSSION
This section is divided into the following subsections: (a) the SEQM-VM2 binding free energies and the SEQM-VM2 plus AIQM post-processing binding free energies are presented for the host CB7 and 15 guest molecules (Fig. 3) and compared to experimental binding affinities; (b) the performance of the PGSS method, developed to avoid unproductive geometry optimizations during the VM2 conformational search, is evaluated; (c) the effect of inclusion of MM-based entropy terms [Eq. (15)] on the accuracy of the calculated binding free energies compared to experiment is examined; and (d) an interpolation scheme is proposed to examine and correct for apparent systematic error in the current implementation of QM-VM2 when applied to highly charged systems.
While the ultimate goal of the QM-VM2 methodology, and its ongoing development, is to consistently predict, with good accuracy, the absolute binding free energies of receptor–ligand complexes, the ability to predict even just the ranking of a set of ligands with respect to how strongly they bind a particular receptor is also highly sought after by medicinal chemists and other applied scientists. Therefore, examined here are not only errors of the predicted absolute binding free energies but also errors of the predicted relative binding free energies and linear correlation metrics between the predicted values and experimental values.
Specific error metrics presented are mean signed error (MSE), mean absolute error (MAE), and root mean-squared error (RMSE). Two definitions of RMSE are presented: RMSEo and RMSEr. The former evaluates the accuracy of the absolute binding free energies and is given as
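$$\mathrm{RMSE}_o = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[\left(\Delta G_{\mathrm{calc},i} - \Delta G_{\mathrm{exp},i}\right) - \frac{1}{n}\sum_{j=1}^{n}\left(\Delta G_{\mathrm{calc},j} - \Delta G_{\mathrm{exp},j}\right)\right]^{2}}, \tag{18}$$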
where the second sum is the MSE. The latter, RMSEr, assesses the relative binding free energies by calculating the differences among all pairs of host–guest systems subject to this study,
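$$\mathrm{RMSE}_r = \sqrt{\frac{2}{n(n-1)}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\left[\left(\Delta G_{\mathrm{calc},j} - \Delta G_{\mathrm{calc},i}\right) - \left(\Delta G_{\mathrm{exp},j} - \Delta G_{\mathrm{exp},i}\right)\right]^{2}}. \tag{19}$$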
In Eqs. (18) and (19), n is the number of measurements, ΔGcalc and ΔGexp are the calculated and experimental binding affinities, respectively, and just for the purposes of these definitions, ΔGcalc ≡ ΔFcalc. These RMSE definitions are those used for the assessment of Sampl4 challenge results,20 allowing direct comparison between QM-VM2 and the participating methods.
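As an illustration of these metrics, the sketch below evaluates them for the [DFTB3-D3(BJ)]-VM2 values listed in Table II. It assumes that RMSEo is the root-mean-square error after subtracting the MSE and that RMSEr is the root-mean-square error over all pairwise differences, as in the SAMPL4 assessment; the function and variable names are illustrative only.

```python
import math

# Experimental and [DFTB3-D3(BJ)]-VM2 binding free energies (kcal/mol), in Table II order.
EXP = [-9.9, -9.6, -6.6, -8.4, -8.5, -7.9, -10.1, -11.8,
       -12.6, -7.9, -11.1, -11.1, -13.3, -14.1, -11.6]
CALC = [-23.43, -14.24, -11.87, -18.71, -21.74, -12.58, -14.22, -15.95,
        -17.66, -21.98, -15.97, -16.18, -19.46, -19.08, -19.92]


def error_metrics(calc, exp):
    """Return (MSE, MAE, RMSEo, RMSEr) in the units of the inputs."""
    n = len(calc)
    err = [c - e for c, e in zip(calc, exp)]
    mse = sum(err) / n
    mae = sum(abs(d) for d in err) / n
    # RMSEo: spread of the errors about their mean (offset removed).
    rmse_o = math.sqrt(sum((d - mse) ** 2 for d in err) / n)
    # RMSEr: error in relative binding free energies over all n(n-1)/2 pairs.
    pair_sq = sum((err[j] - err[i]) ** 2 for i in range(n) for j in range(i + 1, n))
    rmse_r = math.sqrt(2.0 * pair_sq / (n * (n - 1)))
    return mse, mae, rmse_o, rmse_r


mse, mae, rmse_o, rmse_r = error_metrics(CALC, EXP)
print(f"MSE={mse:.1f} MAE={mae:.1f} RMSEo={rmse_o:.1f} RMSEr={rmse_r:.1f}")
```

With these definitions, the values agree with the full-set [DFTB3-D3(BJ)]-VM2 row of Table IV to the reported precision.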
Two methods are used to examine linear correlation: first, linear regression analysis via the regression slope and the coefficient of determination, R2 (the square of the Pearson correlation coefficient), where a slope of 1.0 and an R2 value of 1.0 indicate a perfect correlation; and second, the Kendall rank correlation coefficient, τ,91 which measures the strength of the association between two sets of ranked data, in this case, the experimental and calculated binding free energies. Kendall τ ranges between −1 and 1, with 1 indicating identical rankings, 0 indicating no rank correlation, and −1 indicating fully reversed rankings. In addition to the metrics mentioned above, the y-intercept values of the linear regression lines are also recorded, as they are used in the interpolation scheme discussed in Sec. IV D.
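The correlation metrics can be illustrated with the charge =+1 subset of Table II. The sketch below is an assumption-laden example rather than the authors' analysis script: it regresses the calculated values (y) on the experimental values (x) and uses the tie-corrected Kendall τ-b; the text does not state which τ variant was used, and all names are illustrative.

```python
import math

# Charge =+1 guests from Table II: experimental and [DFTB3-D3(BJ)]-VM2 values (kcal/mol).
EXP_Q1 = [-9.6, -6.6, -7.9, -10.1, -11.8, -12.6, -11.1, -11.1, -13.3, -14.1, -11.6]
CALC_Q1 = [-14.24, -11.87, -12.58, -14.22, -15.95, -17.66,
           -15.97, -16.18, -19.46, -19.08, -19.92]


def linear_fit(x, y):
    """Least-squares fit y = slope * x + intercept; returns (slope, intercept, R2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy * sxy / (sxx * syy)


def kendall_tau_b(x, y):
    """Kendall tau-b: (concordant - discordant) pairs with a correction for ties."""
    n = len(x)
    conc = disc = tie_x = tie_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                tie_x += 1
            if dy == 0:
                tie_y += 1
            if dx != 0 and dy != 0:
                if dx * dy > 0:
                    conc += 1
                else:
                    disc += 1
    n_pairs = n * (n - 1) // 2
    return (conc - disc) / math.sqrt((n_pairs - tie_x) * (n_pairs - tie_y))


slope, intercept, r2 = linear_fit(EXP_Q1, CALC_Q1)
tau = kendall_tau_b(EXP_Q1, CALC_Q1)
print(f"slope={slope:.1f} intercept={intercept:.1f} R2={r2:.1f} tau={tau:.1f}")
```

Under these assumptions, the results agree with the charge =+1 [DFTB3-D3(BJ)]-VM2 row of Table IV to the reported precision.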
A. SEQM-VM2 and SEQM-VM2 with ab initio QM post-processing
Table II presents the [DFTB3-D3(BJ)]-VM2 calculated binding free energies of host CB7 and the guest molecules shown in Fig. 3, along with the corresponding experimental binding free energies. A MSE of −7.2 kcal/mol for the full set of host–guest systems, Table IV, indicates substantial over-binding in the DFTB3-D3(BJ)-VM2 predicted values compared to the experimental values, with very large over-binding (MSE = −12.8 kcal/mol) for the four highly charged guests (i.e., charge >+1) and smaller over-binding (MSE = −5.2 kcal/mol) for the remaining guests with charge +1. Given the significant mismatch between MSEs for the two groups, the lack of linear correlation between the computed and experimental values (R2 = 0.05) for the full set is not surprising – see Fig. 4. Separate regression plots for the charge =+1 and charge >+1 sets (see Fig. 5), however, show very good correlation for the charge =+1 set, R2 = 0.8, and significantly worse correlation, R2 = 0.3, for the charge >+1 set, albeit with a small sample size. The high level of correlation with experiment for the charge =+1 set is encouraging, given that only a SEQM level of theory was employed. Furthermore, the results suggest that highly charged (>+1) systems are a particular challenge to DFTB3-D3(BJ)/PCM, with Fig. 5 indicating the possibility of systematic error that could be adjusted for. This is further examined in Sec. IV D.
TABLE II.
Binding free energies of the Sampl4 CB7 set from DFTB3-D3(BJ)-VM2 calculations. The numbering scheme is given in Fig. 3.
| Complex | Positive charge | Exp. binding free energy (kcal/mol) | SEQM-VM2 binding free energy (kcal/mol) |
|---|---|---|---|
| 1 | 2 | −9.9 | −23.43 |
| 2 | 1 | −9.6 | −14.24 |
| 3 | 1 | −6.6 | −11.87 |
| 4 | 2 | −8.4 | −18.71 |
| 5 | 2 | −8.5 | −21.74 |
| 6 | 1 | −7.9 | −12.58 |
| 7 | 1 | −10.1 | −14.22 |
| 8 | 1 | −11.8 | −15.95 |
| 9 | 1 | −12.6 | −17.66 |
| 10 | 3 | −7.9 | −21.98 |
| 11a | 1 | −11.1 | −15.97 |
| 11b | 1 | −11.1 | −16.18 |
| 12 | 1 | −13.3 | −19.46 |
| 13 | 1 | −14.1 | −19.08 |
| 14 | 1 | −11.6 | −19.92 |
TABLE IV.
Error metrics of QM-VM2 binding free energy predictions compared to experiment for the SAMPL4 CB7 dataset.
| Host–guest systems | Level of theory | MSE | MAE | RMSEo | RMSEr | τ | Slope | Intercept | R2 |
|---|---|---|---|---|---|---|---|---|---|
| Current QM-VM2 work | | | | | | | | | |
| Full set | [DFTB3-D3(BJ)]-VM2 | −7.2 | 7.2 | 3.6 | 5.2 | 0.2 | 0.3 | −13.9 | 0.0 |
| | RHF-D3(BJ)//VM2a | −2.3 | 2.3 | 1.7 | 2.5 | 0.7 | 1.0 | −2.2 | 0.6 |
| | TPSS-D3(BJ)//VM2a | 2.2 | 2.4 | 1.4 | 2.1 | 0.6 | 0.9 | 1.6 | 0.7 |
| SAMPL4 participant results for comparison | | | | | | | | | |
| | OSTb,c | | | 1.9 | 2.8 | | 1.4 | | 0.8 |
| | RRHOb,d | | | 2.5 | 3.7 | | 1.8 | | 0.8 |
| | Enthalpyb,e | | | 2.7 | 4.0 | | 1.6 | | 0.7 |
| | M2b,f | | | 3.4 | 4.5 | | 2.0 | | 0.7 |
| | EESb,g | | | 3.4 | 5.0 | | 1.9 | | 0.7 |
| | SIE+HBb,h | | | 1.8 | 2.6 | | 0.2 | | 0.6 |
| | BARb,i | | | 2.2 | 3.3 | | 1.3 | | 0.6 |
| | FEPb,j | | | 3.9 | 5.7 | | 1.8 | | 0.6 |
| | QM/M2b,k | | | 3.0 | 4.5 | | 0.7 | | 0.2 |
| Current QM-VM2 work | | | | | | | | | |
| Charge =+1 | [DFTB3-D3(BJ)]-VM2 | −5.2 | 5.2 | 1.1 | 1.7 | 0.7 | 1.1 | −4.0 | 0.8 |
| | RHF-D3(BJ)//VM2a | −1.4 | 1.5 | 1.0 | 1.5 | 1.0 | 1.4 | 2.9 | 1.0 |
| | TPSS-D3(BJ)//VM2a | 2.8 | 2.8 | 1.0 | 1.4 | 0.8 | 1.2 | 5.1 | 0.9 |
| Charge >+1 | [DFTB3-D3(BJ)]-VM2 | −12.8 | 12.8 | 1.5 | 2.4 | 0.3 | 1.4 | −10.8 | 0.3 |
| | RHF-D3(BJ)//VM2a | −4.6 | 4.6 | 0.6 | 1.0 | 1.0 | 1.5 | −0.1 | 0.8 |
| | TPSS-D3(BJ)//VM2a | 0.5 | 1.0 | 1.0 | 1.7 | 1.0 | 1.4 | 4.3 | 0.5 |
(a) Results in this work from AIQM single-point energy post-processing of the first 30 [DFTB3-D3(BJ)]-VM2 conformers. The Def2-TZVP basis set is employed for both RHF-D3(BJ) and TPSS-D3(BJ).
(b) Results presented in the SAMPL4 overview article.20 See references therein for additional methodological details to those shown directly below.
(c) Orthogonal space tempering (OST) carried out with the GAFF/AM1-BCC energy model and a modified TIP3P water potential.93
(d) Conformational sampling was performed manually. Final QM free energy model: PW6B95-D3/def2-QZVP(-g,-f)/COSMO-RS//TPSS-D3-cosmo/def2-TZVP/HF-3c(freq.) and includes rigid rotor harmonic oscillator (RRHO) approximation derived terms.40
(e) Direct calculation of enthalpy change from long MD simulations of end states, carried out with the GAFF/AM1-BCC energy model and TIP3P/TIP3P-Ew water potential.
(f) Molecular mechanics mining minima (M2) calculations39 carried out with the CHARMm force field58 and Vcharge charges.67
(g) Expanded ensemble simulations (EESs) using MD with the GAFF/AM1-BCC energy model and TIP3P water potential.94
(h) Solvated interaction energy including hydrogen bonding terms (SIE+HB) using Wilma docking and the GAFF/AM1-BCC energy model with the biotechnology research institute boundary element method for solving Poisson equation (BRI BEM) continuum solvation.95,96
(i) Bennett acceptance ratio (BAR)97 using MD with the atomic multipole optimized energetics for biomolecular simulation (AMOEBA)98 force field.
(j) Free energy perturbation (FEP)25 using metadynamics with the GAFF/AM1-BCC energy model and TIP3P water potential.
FIG. 4.
Comparison of experimental and calculated binding affinities (kcal/mol) for the SAMPL4 CB7 set, with calculations obtained from [DFTB3-D3(BJ)]-VM2. Also presented is the linear regression line.
FIG. 5.
Comparison of experimental and calculated binding affinities (kcal/mol) for the SAMPL4 CB7 dataset, with calculations obtained from [DFTB3-D3(BJ)]-VM2, shown separately for the set of charge =+1 (blue) and the set of charge >+1 (orange).
Since DFTB is a SEQM method, an obvious next step toward more accurate binding free energy predictions is to apply a more sophisticated AIQM treatment. This was carried out according to the post-processing scheme indicated in Fig. 1(a); i.e., AIQM single-point energies are computed at the [DFTB3-D3(BJ)]-VM2 generated conformer geometries and replace the corresponding SEQM energy term in Eq. (15). The SEQM level of theory employed, DFTB3-D3(BJ)/PCM, will not necessarily agree with the AIQM level of theory as to which conformers are the lowest in energy; therefore, a significant number of [DFTB3-D3(BJ)]-VM2 conformers must be included in the AIQM based Boltzmann-averaged binding free energies. After including the top 10, 20, and 30 conformers for post-processing of the complete set, it was observed that the binding free energies converge with the inclusion of 30 conformers. Therefore, the 30 lowest energy [DFTB3-D3(BJ)]-VM2 conformers were post-processed for each host–guest system. Given this large total number of post-processing AIQM calculations, the computationally efficient HF and DFT methods were chosen, augmented with Grimme dispersion corrections (−D). Second-order perturbation theory with density fitting (RI-MP2) was also considered. Both a double-zeta + polarization basis set, 6-31G(d,p), and a triple-zeta + polarization basis set, Def2-TZVP, were explored.
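The convergence test with respect to the number of post-processed conformers can be sketched as follows. This is an illustrative sketch only: it assumes the standard mining-minima form, in which the free energy of a species is −RT ln Σ exp(−μ_i/RT) over the per-conformer free energies μ_i, and it assumes conformers are ranked by their SEQM energy before truncation. The function names and the conformer data are hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol K)


def boltzmann_free_energy(mu, temperature=300.0):
    """Free energy (kcal/mol) from per-conformer free energies mu (kcal/mol)."""
    rt = R_KCAL * temperature
    lowest = min(mu)
    # Shift by the lowest value so the exponentials stay well conditioned.
    return lowest - rt * math.log(sum(math.exp(-(m - lowest) / rt) for m in mu))


def truncation_series(conformers, sizes=(10, 20, 30)):
    """conformers: list of (seqm_energy, aiqm_mu) pairs; returns {k: free energy}.

    Conformers are ranked by SEQM energy, and only the k lowest are summed.
    """
    ranked = sorted(conformers, key=lambda pair: pair[0])
    return {k: boltzmann_free_energy([mu for _, mu in ranked[:k]]) for k in sizes}


# Hypothetical data: the SEQM ranking and the AIQM ranking disagree for the first two.
demo = [(0.0, -10.0), (1.0, -10.5), (2.0, -8.0)]
series = truncation_series(demo, sizes=(1, 2, 3))
print(series)
```

In this made-up example, the conformer ranked second by the SEQM energy is the lowest at the AIQM level, which is the situation that motivates post-processing more than just the lowest SEQM conformer.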
The binding affinities obtained from post-processing calculations using the double-zeta basis set (see supplementary material for details) clearly demonstrate that the double-zeta + polarization basis set overestimates the binding affinities, regardless of the choice of theory. For example, the predicted binding affinity of CB7-guest2 is −19.5 kcal/mol and −18.3 kcal/mol calculated with HF-D3(BJ)/6-31G(d,p) and TPSS-D3(BJ)/6-31G(d,p), respectively, while the experimental result is only −9.6 kcal/mol. Adding diffuse functions on the heavy elements can significantly improve the error metrics, as illustrated by the post-processing of SEQM-VM2 at HF-D3(BJ)/6-31+G(d,p) (Table S5). However, the predicted binding affinities at HF-D3(BJ)/6-31+G(d,p) are still overestimated by ∼5 kcal/mol to 12 kcal/mol. On the other hand, Table III clearly shows that the triple-zeta + polarization basis set, Def2-TZVP, can produce absolute binding affinities in very good agreement with experiment for both HF-D3(BJ) and TPSS-D3(BJ). It is concluded that the double-zeta + polarization basis set is inadequate for predicting binding free energies of the SAMPL4 CB7 host–guest complexes, and a triple-zeta + polarization basis set or better is necessary.
TABLE III.
Binding affinities of the Sampl4 CB7 set from TPSS-D3(BJ)/def2-tzvp//[DFTB3-D3(BJ)]-VM2 and HF-D3(BJ)/def2-tzvp//[DFTB3-D3(BJ)]-VM2 calculations.
| Complex | Positive charge | Experimental (kcal/mol) | HF-D3(BJ) (kcal/mol) | TPSS-D3(BJ) (kcal/mol) | RI-MP2 (kcal/mol) |
|---|---|---|---|---|---|
| 1 | 2 | −9.9 | −14.90 | −9.64 | −17.89 |
| 2 | 1 | −9.6 | −9.79 | −6.10 | |
| 3 | 1 | −6.6 | −6.56 | −3.51 | −8.89 |
| 4 | 2 | −8.4 | −12.60 | −7.01 | −13.04 |
| 5 | 2 | −8.5 | −13.83 | −9.63 | −14.63 |
| 6 | 1 | −7.9 | −7.47 | −3.28 | |
| 7 | 1 | −10.1 | −12.04 | −7.91 | −13.57 |
| 8 | 1 | −11.8 | −12.78 | −8.11 | |
| 9 | 1 | −12.6 | −14.79 | −9.91 | |
| 10 | 3 | −7.9 | −11.66 | −6.49 | −8.84 |
| 11a | 1 | −11.1 | −12.76 | −9.44 | |
| 11b | 1 | −11.1 | −12.75 | −9.22 | |
| 12 | 1 | −13.3 | −15.55 | −10.07 | |
| 13 | 1 | −14.1 | −16.40 | −12.93 | −20.45 |
| 14 | 1 | −11.6 | −14.42 | −8.03 | |
Table III, Fig. 6, and Table IV present the predicted binding free energies obtained by HF-D3(BJ) and TPSS-D3(BJ) post-processing with the triple-zeta + polarization basis set Def2-TZVP, along with linear correlation and error metrics with respect to experiment, for the complete set of host–guest complexes shown in Fig. 3. Table III also presents RI-MP2/Def2-TZVP binding free energies for a subset of these complexes (see Sec. III for details). It may be seen from Table IV that HF-D3(BJ), with an MSE of −2.3 kcal/mol for the full set of complexes, tends to overestimate the binding free energies compared to experiment [though not nearly as much as DFTB3-D3(BJ)], whereas TPSS-D3(BJ), with a full set MSE of 2.2 kcal/mol, underestimates the binding free energies. Examining the charge =+1 and charge >+1 sets separately shows that HF-D3(BJ) yields very good results for the charge =+1 set, with a small over-binding (MSE = −1.4 kcal/mol), but noticeably larger over-binding errors for the charge >+1 set (MSE = −4.6 kcal/mol). TPSS-D3(BJ), on the other hand, shows the opposite trend, with clear under-binding (MSE = 2.8 kcal/mol) for the charge =+1 set and very small errors of both signs (MSE = 0.5 kcal/mol and MAE = 1.0 kcal/mol) for the charge >+1 set.
FIG. 6.
Comparison of binding free energy (kcal/mol) of the Sampl4 CB7 set obtained from experiments vs calculated at (a) HF-D3(BJ)/def2-TZVP//[DFTB3-D3(BJ)]-VM2 and (b) TPSS-D3(BJ)/def2-TZVP//[DFTB3-D3(BJ)]-VM2 for the training set with charge =+1 (blue) and charge >+1 (orange).
Both HF-D3(BJ) and TPSS-D3(BJ) yield much improved linear correlations with experiment for the full set of host–guest complexes [R2 = 0.6 for HF-D3(BJ) and R2 = 0.7 for TPSS-D3(BJ)], compared to a correlation of essentially zero for the SEQM-VM2 calculations for the full set (see Table IV and Figs. 4 and 6). This significant improvement in the ability to describe the whole set of host–guest systems, with their full range of charge states, +1 to +3, is likely indicative of the importance of large basis sets as well as improvement in the underlying QM method. If the linear correlations are again examined separately for the charge =+1 and charge >+1 sets (see Fig. 6 and Table IV), the correlation improves further for the charge =+1 set, where R2 for HF-D3(BJ) and TPSS-D3(BJ) is 1.0 and 0.9, respectively; for the charge >+1 set, R2 for HF-D3(BJ) and TPSS-D3(BJ) is 0.8 and 0.5, respectively {also note the improvement of these R2 values over the equivalent [DFTB3-D3(BJ)]-VM2 values, Fig. 5}. These results, together with the mean error data discussed above, suggest that some systematic error remains in the relative description of the charge =+1 set and charge >+1 set. While in the long term, efforts will be made to find and address the underlying issues with the physics of the model (e.g., a possible source of this error is the use of a continuum solvation model), in the short term, simple scaling methods that can correct for these errors have been explored, as discussed in Sec. IV D.
The results for the RI-MP2/Def2-TZVP calculations for a subset of seven of the host–guest systems (see Table III) show that RI-MP2 more closely matches the behavior of HF-D3(BJ) than TPSS-D3(BJ). In fact, except for complex 10, RI-MP2 exhibits larger over-binding than HF-D3(BJ). This is not too surprising, given that MP2 is known for its tendency to over-bind noncovalent complexes due to its incomplete treatment of electron correlation.92 Furthermore, even though the RI approximation greatly reduces the computational cost of MP2, at little cost in accuracy, it is still considerably more costly and memory-intensive than HF-D and DFT-D. Considering all of the reported error metrics in Table IV, the HF-D and DFT-D AIQM methods, with an adequately large basis set, perform similarly well over the full set of host–guest systems. In terms of computational expense, the SCF convergence of HF-D3(BJ)/Def2-TZVP can be up to a factor of two faster than that of TPSS-D3(BJ)/Def2-TZVP. An additional advantage of HF-D over DFT-D is that HF-D avoids possible double counting of dispersion via the −D correction, whereas the accuracy of DFT is functional dependent. Consequently, the HF-D3(BJ) method with a triple-zeta + polarization quality basis set is recommended for post-processing of [DFTB3-D3(BJ)]-VM2 calculations, when considering accuracy, reliability, and computational cost.
For selected SAMPL4 challenge participant methods, the error metrics RMSEo, RMSEr, slope, and R2 for calculated binding affinities with respect to experimental values are also presented in Table IV to allow comparison with the current QM-VM2 work. A number of SAMPL4 participants submitted multiple entries with the same underlying methodology; in these cases, only the best performing entry is included. Furthermore, SAMPL4 methods achieving an R2 < 0.2 are excluded. An overall comparison between the error metrics for the QM-VM2 approaches restricted Hartree–Fock (RHF)-D3(BJ)//VM2 and TPSS-D3(BJ)//VM2, applied to the full set of host–guest systems, and the SAMPL4 methods shows that the QM-VM2 results are competitive with respect to R2 and that they outperform all SAMPL4 methods with respect to RMSEo, RMSEr, and slope. A more detailed comparison now follows; for the sake of brevity, only TPSS-D3(BJ)//VM2 values are discussed, as the RHF-D3(BJ)//VM2 trends are quite similar.
Table IV shows that the best performing SAMPL4 method, OST, achieves an impressive R2 = 0.8, which is slightly better than the R2 = 0.7 achieved by TPSS-D3(BJ)//VM2, but its RMSEo, RMSEr, and slope values (1.9 kcal/mol, 2.8 kcal/mol, and 1.4, respectively) do not compare favorably with the corresponding TPSS-D3(BJ)//VM2 values of 1.4 kcal/mol, 2.1 kcal/mol, and 0.9, respectively. The SAMPL4 method labeled RRHO in Table IV also achieves R2 = 0.8, but its RMSEo, RMSEr, and slope values of 2.5 kcal/mol, 3.7 kcal/mol, and 1.8, respectively, compare even less favorably with the TPSS-D3(BJ)//VM2 values than the OST values do. In addition, RRHO, a QM based approach, relies on manual placement and conformational search, in contrast to the automatic initial guest molecule placement and conformational search employed by QM-VM2. Notably, while a manual search is feasible for relatively small and rigid guests or ligand molecules, it will quickly become unmanageable and ineffective with even a modest increase in guest size and flexibility. The SAMPL4 methods achieving R2 values of 0.7 and 0.6, with the exception of SIE+HB (see Table IV), show significantly worse RMSEo, RMSEr, and slope values than TPSS-D3(BJ)//VM2; for example, the method labeled Enthalpy achieves R2 = 0.7, RMSEo = 2.7 kcal/mol, RMSEr = 4.0 kcal/mol, and slope = 1.6. The SIE+HB method (R2 = 0.6) has somewhat more competitive RMSEo and RMSEr values of 1.8 kcal/mol and 2.6 kcal/mol, but a poor slope value of 0.2.
B. Progressive scheme (PGSS)
The performance of the newly proposed PGSS method was assessed by comparison of QM-VM2 calculations using the conventional conformational search scheme and QM-VM2 calculations using the PGSS enhanced conformational search. In terms of accuracy, the differences in the binding affinities between the conventional and PGSS QM-VM2 schemes are negligibly small, less than 0.1 kcal/mol, as shown in the supplementary material. In terms of efficiency, several measures are examined in Tables V and S6 in the supplementary material. The CB7-guest10 and CB7-guest14 complexes (see Fig. 3) were chosen as representatives of the charge >+1 and charge =+1 host–guest sets, respectively, to demonstrate the performance of PGSS. All of the SEQM-VM2 calculations were carried out on an Intel® Xeon® CPU E5-2695 v2 (2.40 GHz) with 24 cores in one node. Several noteworthy observations are made. First, the total CPU time and the number of VM2 iterations may be larger for PGSS QM-VM2 runs. This is because the number of conformers explored during the conformational search step can be considerably larger for PGSS QM-VM2. On the other hand, the time spent per generated conformer is still shorter for the PGSS scheme. In addition, the number of conformers that contribute to the Boltzmann distribution is slightly larger for the PGSS scheme. In other words, within the same amount of time, the PGSS scheme can sample a larger conformational space. Thus, it is concluded that even with the current default values of PGSS control parameters, the PGSS scheme provides a powerful boost for the conformational search in QM-VM2 and can be further enhanced by optimizing the PGSS control parameters.
TABLE V.
Comparison of timing and efficiency metrics between conventional QM-VM2 and PGSS QM-VM2 for complexes CB7-guest10 and CB7-guest14.
| Metric | CB7-guest10, conventional | CB7-guest10, PGSS | CB7-guest14, conventional | CB7-guest14, PGSS |
|---|---|---|---|---|
| No. of VM2 iterations | 4 | 4 | 4 | 6 |
| Total CPU time (s) | 105 201.3 | 107 685.5 | 94 502.1 | 126 523.2 |
| Time/iteration (h) | 7.3 | 7.5 | 6.6 | 5.9 |
| No. of Boltzmann average samples | 111 | 132 | 80 | 82 |
| No. of conformers | 1565 | 1655 | 1814 | 2889 |
| Time/conformer (s) | 67.2 | 65.1 | 52.1 | 43.8 |
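The derived rows of Table V follow directly from the total CPU time, the number of VM2 iterations, and the number of conformers. The short sketch below recomputes them from the tabulated raw values; the dictionary layout and function names are illustrative.

```python
# (VM2 iterations, total CPU time in s, number of conformers), taken from Table V.
RUNS = {
    ("CB7-guest10", "conventional"): (4, 105201.3, 1565),
    ("CB7-guest10", "PGSS"): (4, 107685.5, 1655),
    ("CB7-guest14", "conventional"): (4, 94502.1, 1814),
    ("CB7-guest14", "PGSS"): (6, 126523.2, 2889),
}


def hours_per_iteration(iterations, cpu_seconds):
    """Average time per VM2 iteration in hours."""
    return cpu_seconds / iterations / 3600.0


def seconds_per_conformer(cpu_seconds, n_conformers):
    """Average time per generated conformer in seconds."""
    return cpu_seconds / n_conformers


derived = {
    key: (hours_per_iteration(it, cpu), seconds_per_conformer(cpu, conf))
    for key, (it, cpu, conf) in RUNS.items()
}
for key, (per_iter, per_conf) in derived.items():
    print(key, f"{per_iter:.1f} h/iteration", f"{per_conf:.1f} s/conformer")
```

For CB7-guest14, the PGSS run uses more total CPU time but spends less time per conformer, which is the trade-off discussed in the text.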
C. Entropy effect
As mentioned in Sec. III, in the current QM-VM2 study, local configuration integrals were calculated using an MM-based enhanced harmonic approximation, but with the SEQM and AIQM energies used as the leading term, to adjust the bottom of the well – see Eq. (15). Furthermore, Sec. II B describes how the SEQM and AIQM based average energy ⟨E⟩ and the total configurational entropy −TS0 can be backed out of the total chemical potential by the same energy adjustment to Eq. (12), followed by application of Eq. (13), which allows one to examine the importance of inclusion of configurational entropy, even at the MM level (note that the solvent entropy contribution is implicitly included in the continuum solvent model).
The importance of the configurational entropy to the correlation with experimental values is clearly demonstrated in Fig. 7. Using only the energy term (blue dots labeled ΔE) results in a weak linear correlation of the computed and experimental binding affinities [R2 = 0.2 for HF-D3(BJ) and R2 = 0.3 for TPSS-D3(BJ) with the Def2-TZVP basis set]. Including the MM-based entropy term (orange dots labeled ΔE − TΔS) significantly improves the linear correlation of the computed and experimental binding affinities; i.e., R2 increases from 0.2 to 0.6 for HF-D3(BJ) and from 0.3 to 0.7 for TPSS-D3(BJ).
FIG. 7.
Comparison of the binding free energy (kcal/mol) obtained from experiments vs calculated with (a) HF-D3(BJ)/def2-tzvp//[DFTB3-D3(BJ)]-VM2 and (b) TPSS-D3(BJ)/def2-tzvp//[DFTB3-D3(BJ)]-VM2 with the inclusion of various components of free energy.
To assess the validity of using the MM-based entropy term, it is noted that previous MM-based mining-minima studies of host–guest and protein–ligand systems observed an approximately linear relationship between energy and entropy contributions, that is, the large negative energy contribution is canceled partly by a proportional entropy penalty.42,46 Such an approximately linear relationship can also be observed between the QM energy and MM entropy (see the supplementary material).
D. Interpolation
While [DFTB3-D3(BJ)]-VM2 yields good correlation with experiment for the set of charge =+1 host–guest systems in this study (see Table IV and Fig. 5, slope = 1.1, close to the ideal value of 1.0, R2 = 0.8), as described in Sec. IV A, the linear regression R2 values drop considerably when highly charged systems are included, suggesting a systematic error in the relative treatment of charge =+1 and charge >1 systems (see Figs. 4 and 5). Therefore, an interpolation method is proposed to adjust computed binding free energies of highly charged systems to account for systematic error and, thereby, improve the accuracy of predictions without additional computational cost. The scheme requires a low charge (e.g., +1) training set, for which QM-VM2 provides good correlation with known experimental binding free energies.
The impetus for investigating whether errors exhibited for the highly charged systems are sufficiently systematic to allow an interpolation scheme to be useful was the observation that the deviation from experiment in the calculated binding free energies for the highly charged systems in the current study is approximately a multiple of the average deviation for charge =+1 systems. The idea of the proposed empirical adjustment scheme, then, is straightforward: adjust the linear regression line for the charge =+1 systems to be as close to the ideal as possible, and then, for the highly charged species, apply this same correction scaled by the charge. This leads to the following equation for the interpolation of the binding affinities from DFTB3-D3(BJ)-VM2 calculations:
$$Y_{\mathrm{scaled}} = \frac{Y_{\mathrm{calc}} - q\,b}{a}, \tag{20}$$
where Yscaled and Ycalc are the scaled and the computed binding affinities, respectively, a and b are the slope and the y-intercept of the regression line of the charge =+1 set, and q is the charge of the molecule. After introducing the empirical adjustment to the full set, one can observe significant improvement of the predicted binding affinities and their correlation with experiment. The R2 value for the full set obtained for the adjusted SEQM-VM2 results increased dramatically, from 0.05 to 0.558, as illustrated in Fig. 8 (see also Table VI).
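The adjustment can be illustrated with the Table II data. The sketch below assumes the scaled value takes the form Y_scaled = (Y_calc − q·b)/a, with a and b obtained by regressing the calculated charge =+1 values on experiment; this form is inferred from the verbal description of the scheme, and the function names are illustrative.

```python
# (charge q, experimental, [DFTB3-D3(BJ)]-VM2) binding free energies (kcal/mol), Table II.
TABLE_II = [
    (2, -9.9, -23.43), (1, -9.6, -14.24), (1, -6.6, -11.87), (2, -8.4, -18.71),
    (2, -8.5, -21.74), (1, -7.9, -12.58), (1, -10.1, -14.22), (1, -11.8, -15.95),
    (1, -12.6, -17.66), (3, -7.9, -21.98), (1, -11.1, -15.97), (1, -11.1, -16.18),
    (1, -13.3, -19.46), (1, -14.1, -19.08), (1, -11.6, -19.92),
]


def linear_fit(x, y):
    """Least-squares fit y = slope * x + intercept; returns (slope, intercept, R2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy * sxy / (sxx * syy)


def interpolate(table):
    """Apply the assumed form (Y_calc - q * b) / a, with a, b from the charge =+1 set."""
    q1 = [(e, c) for q, e, c in table if q == 1]
    a, b, _ = linear_fit([e for e, _ in q1], [c for _, c in q1])
    return [(c - q * b) / a for q, _, c in table]


exp = [e for _, e, _ in TABLE_II]
raw = [c for _, _, c in TABLE_II]
scaled = interpolate(TABLE_II)

n = len(exp)
mse = sum(s - e for s, e in zip(scaled, exp)) / n
mae = sum(abs(s - e) for s, e in zip(scaled, exp)) / n
r2_raw = linear_fit(exp, raw)[2]
r2_scaled = linear_fit(exp, scaled)[2]
print(f"MSE={mse:.1f} MAE={mae:.1f} R2 before={r2_raw:.2f} R2 after={r2_scaled:.2f}")
```

Under this assumed form, the full-set MSE, MAE, and R2 come out close to the interpolated [DFTB3-D3(BJ)]-VM2 values in Table VI.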
FIG. 8.
Interpolated correlation for the Sampl4 CB7 full set. The binding free energies computed at various levels of theory are compared with the experiment. Blue dots, labeled QM-VM2, represent the interpolated predicted values from [DFTB3-D3(BJ)]-VM2. Those in orange and purple are the values obtained from the interpolated post-processing values from HF-D3(BJ)/Def2-TZVP and TPSS-D3(BJ)/Def2-TZVP, respectively.
For the post-processed binding free energies, the correlation with experiment for the charge >+1 set is remarkably good (see Fig. 6 and Table IV). However, the number of data points for the highly charged systems in the SAMPL4 CB7 set is so limited that the good correlation could be coincidental. Applying the same empirical adjustment introduced above [Eq. (20)] does not improve the correlation. Instead of multiplying by the charges of the systems, dividing by the charges yields much better correlation,
$$Y_{\mathrm{scaled}} = \frac{Y_{\mathrm{calc}} - b/q}{a}, \tag{21}$$
where Yscaled and Ycalc are the scaled and the computed binding affinity at the post-processing step, respectively, and a and b are the slope and the y-intercept of the regression of charge =+1 system at the same level of theory, respectively. This change of the interpolation formulation suggests that the nature of the systematic error may have changed for the AIQM binding free energies relative to those for the SEQM binding energies. After application of the interpolation scheme, the linear correlation with experiment for both HF-D3(BJ) and TPSS-D3(BJ), for the full set of host–guest systems, improved significantly – see Table VI and Fig. 8. For HF-D3(BJ), R2 increased from 0.6 to 0.8, and for TPSS-D3(BJ), R2 increased from 0.7 to 0.9. Furthermore, Table VI shows that all of the error metrics improved substantially. Since this interpolation scheme is an empirical approach, it is not surprising that a significant change in the level of theory may change the nature of any systematic error and, therefore, the form of the best scaling, e.g., from multiplication by charge to dividing by charge.
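As a worked illustration, assume the post-processing adjustment takes the form $Y_{\mathrm{scaled}} = (Y_{\mathrm{calc}} - b/q)/a$, i.e., the charge =+1 intercept is divided by the charge; this form is inferred from the description above. Using the rounded TPSS-D3(BJ) charge =+1 regression coefficients from Table IV ($a = 1.2$, $b = 5.1$ kcal/mol) and guest 4 from Table III ($q = +2$, $Y_{\mathrm{calc}} = -7.01$ kcal/mol):

$$Y_{\mathrm{scaled}} = \frac{-7.01 - 5.1/2}{1.2} = \frac{-9.56}{1.2} \approx -8.0\ \text{kcal/mol}$$

The adjusted value lies closer to the experimental value of −8.4 kcal/mol than the unadjusted −7.01 kcal/mol does. Because rounded coefficients are used, the number is approximate.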
TABLE VI.
Error metrics of interpolated QM-VM2 binding free energy predictions compared to experiment for the SAMPL4 CB7 dataset. The pre-interpolation values are presented in parentheses.
| | Level of theory | MSE | MAE | RMSEo | RMSEr | τ | Slope | Intercept | R2 |
|---|---|---|---|---|---|---|---|---|---|
| Interpolation of full set | [DFTB3-D3(BJ)]-VM2 | −0.7 | 1.2 | 1.6 | 2.3 | 0.6 | 0.8 | −2.7 | 0.6 |
| | | (−7.2) | (7.2) | (3.6) | (5.2) | (0.2) | (0.3) | (−13.9) | (0.0) |
| | RHF-D3(BJ)a | −0.4 | 0.7 | 0.8 | 1.2 | 0.8 | 0.8 | −2.2 | 0.8 |
| | | (−2.3) | (2.3) | (1.7) | (2.5) | (0.7) | (1.0) | (−2.2) | (0.6) |
| | TPSS-D3(BJ)a | 0.0 | 0.7 | 0.8 | 1.2 | 0.9 | 1.1 | 0.1 | 0.9 |
| | | (2.2) | (2.4) | (1.4) | (2.1) | (0.6) | (0.9) | (1.6) | (0.7) |
(a) The Def2-TZVP basis set is employed for both HF-D3(BJ) and TPSS-D3(BJ).
The proposed interpolation approach demonstrates a simple way to improve the prediction of binding affinities of a dataset when it contains both +1 and highly charged guests. The underlying assumption made here is that the set of charge =+1 guests yields reasonably good linear correlation. While the approach works well within the small dataset presented here, extensive testing of this procedure with large and varied datasets is required to establish general applicability. If found to be broadly applicable, the interpolation approach suggested here could be a practical way of achieving good predictions, even for highly charged systems that are particularly challenging for binding free energy calculations.
V. CONCLUSIONS AND FUTURE WORK
QM-VM2 is an approach that efficiently combines statistical mechanics with quantum mechanical energy potentials in order to calculate noncovalent binding free energies of receptor–ligand systems. The method efficiently couples the use of SEQM energies and geometry optimizations with an underlying MM-based conformational search and allows for processing of the conformers produced at higher levels of QM theory. A progressive scheme for conformer geometry optimizations is introduced in this work as a means to boost conformational sampling efficiency by recognizing that a steep descent on the potential energy surface implies a deep potential well and the gradient at that point can be used as a screening metric for accepting/rejecting conformers.
This first application of QM-VM2 computed the binding free energies of the host molecule cucurbit[7]uril and a set of 15 guest molecules. The results are presented here along with comparisons to experimentally determined binding affinities. SEQM-VM2 based binding free energies do not show good correlation with experiment for the full set of host–guest complexes, which includes highly charged systems (+2 and +3), whereas for just the +1 systems, a significant correlation (R2 = 0.8) is achieved. SEQM-VM2 generation of conformers followed by single-point AIQM calculations, as post-processing corrections, yields good binding affinities and good correlation with experiment, even for the full set of systems, as long as a sufficiently large basis set (at least triple-zeta quality) is employed.
The importance of the inclusion of configurational entropy, even at the MM level, to the achievement of good correlation with experiment was demonstrated by comparing ΔE values with experiment and finding considerably poorer correlation with experiment than for ΔE − TΔS. For the complete set of host–guest systems with various charges, it was observed that the deviation of the predicted binding free energy from experiment correlates with the net charge of the systems to some extent. Thus, a simple empirical interpolation scheme was proposed to improve the linear regression of the full set.
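The split of a binding free energy into an energy term and an entropy term can be made concrete with a short sketch. It assumes, in the spirit of mining-minima type treatments, that each conformer i contributes a local free energy and an energy, that the total free energy of a species follows from the Boltzmann sum over conformers, and that the entropy term is the difference between the free energy and the Boltzmann-averaged energy. The function names, the temperature of 298.15 K, and the use of a single energy per conformer are illustrative assumptions. Standard-state and other constant terms are omitted.

```python
import math

# RT in kcal/mol at an assumed temperature of 298.15 K.
RT = 0.0019872041 * 298.15


def ensemble_terms(local_free_energies, energies, rt=RT):
    """Return (G, <E>, -TS) for one species from per-conformer values."""
    g_min = min(local_free_energies)
    # Boltzmann factors relative to the lowest conformer (numerically stable).
    w = [math.exp(-(g - g_min) / rt) for g in local_free_energies]
    z = sum(w)
    g_total = g_min - rt * math.log(z)          # G = -RT ln sum_i exp(-G_i/RT)
    e_avg = sum(wi * ei for wi, ei in zip(w, energies)) / z
    return g_total, e_avg, g_total - e_avg      # -TS = G - <E>


def binding_terms(complex_, host, guest):
    """Each argument is a (local_free_energies, energies) pair.

    Returns (dG, dE, -T dS), each as complex minus host minus guest.
    """
    terms = [ensemble_terms(*species) for species in (complex_, host, guest)]
    return tuple(c - h - g for c, h, g in zip(*terms))
```

With such a decomposition, comparing the energy change alone against experiment and then the full free energy change against experiment shows how much of the correlation depends on the entropy term.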
While this work demonstrates that SEQM-VM2 with AIQM post-processing is a viable approach for predicting absolute binding free energies efficiently (with the proposed PGSS scheme) and accurately (with interpolation), several aspects of the method can be improved further. For the energy model, a continuum solvation model, C-PCM, was employed because of its relatively low cost, and cavitation and dispersion solvation terms were not included. Future work could include these additional terms as a possible route to a more accurate solvation treatment. In addition, continuum solvation treatments may be inadequate for important localized interactions, especially for highly charged solutes. Explicit solvent molecules represented by a polarizable model potential based on quantum mechanics, e.g., the effective fragment potential,100–106 could describe detailed interactions such as hydrogen bonding without much additional computational cost. The configurational entropy contribution in the current study is obtained at the MM level. It would be interesting to see how the binding affinities and the correlation would be affected if an SEQM- or AIQM-based entropy term were used.
In the progressive scheme, ecutpg, the energy cutoff applied when comparing a conformer with the lowest-found local minimum, is a single-valued, user-specified parameter. An optimal value that maximizes the benefits of the progressive scheme can be difficult to set, even with extensive knowledge of or experience with the system. Expressing ecutpg as a function of system-dependent parameters may improve the performance of the progressive scheme.
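The role of an energy cutoff such as ecutpg in a staged optimization can be illustrated with a toy example. The sketch below is not the QM-VM2 implementation: it uses plain steepest descent on a one-dimensional double-well potential, screens on the current energy only, and does not reproduce the gradient-based screening metric described in this work. The names `progressive_screen` and `toy_potential`, the step size, and the stage settings are all hypothetical.

```python
def toy_potential(x):
    """1-D double well (arbitrary units): returns (energy, gradient)."""
    energy = (x * x - 1.0) ** 2 + 0.5 * x
    gradient = 4.0 * x * (x * x - 1.0) + 0.5
    return energy, gradient


def progressive_screen(conformers, energy_and_gradient, stages, ecutpg, step=0.05):
    """Optimize in stages and discard high-energy candidates after each stage.

    stages: list of (max_steps, grad_tol), ordered from cheap to thorough.
    ecutpg: candidates more than this far above the lowest energy found
            in the current stage are dropped before the next stage.
    """
    survivors = list(conformers)
    for max_steps, grad_tol in stages:
        optimized = []
        for x in survivors:
            # Partial steepest-descent optimization for this stage.
            for _ in range(max_steps):
                _, g = energy_and_gradient(x)
                if abs(g) < grad_tol:
                    break
                x = x - step * g
            optimized.append((energy_and_gradient(x)[0], x))
        # Screen against the lowest energy found so far.
        e_min = min(e for e, _ in optimized)
        survivors = [x for e, x in optimized if e - e_min <= ecutpg]
    return survivors


# Four starting points; a cheap 3-step stage followed by a thorough stage.
starts = [-1.5, -0.5, 0.5, 1.5]
kept = progressive_screen(starts, toy_potential,
                          stages=[(3, 1e-6), (200, 1e-6)], ecutpg=0.5)
```

In this toy case the two starting points that descend toward the shallower well are rejected after the cheap first stage, so the thorough second stage is spent only on candidates in the deeper well. A cutoff that is too small would risk rejecting candidates that are simply far from their minimum after the first stage, which is the sensitivity noted above. Duplicate minima are not merged in this sketch.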
SUPPLEMENTARY MATERIAL
See the supplementary material for preliminary calculations using various SEQM and ab initio QM methods for two conformers of each complex reported in Sec. I. In Sec. II, the binding free energies using two implicit solvation models are compared. Validation for using MM-based entropy contribution for the calculation of binding free energies is provided in Sec. III. Basis set effects are investigated in Sec. IV. A brief discussion of the rank 1 conformers (with the largest Boltzmann weight) is given in Sec. V. The performance of the progressive scheme is presented in Sec. VI, and the Cartesian coordinates for the rank 1 conformer of the SAMPL4 CB7 complexes are given in Sec. VII.
DEDICATION
This paper is dedicated to Professor Rosalind Franklin and Professor Patricia Thiel. Professor Franklin’s research in x-ray crystallography paved the way for the determination of the structure of DNA and directly led to the discovery of the double helix. Professor Thiel, who passed away too soon in September 2020, was a leader in the field of quasicrystals, an excellent teacher and colleague, and a dear friend.
ACKNOWLEDGMENTS
This work was supported by the National Institute of General Medical Sciences of the NIH (Grant No. R44GM109679 to S.P.W.). The contents are solely the responsibility of the authors and do not necessarily represent the official views of the National Institute of General Medical Sciences. This work was also supported by the University of Colorado Denver UROP and EUReCA programs. The authors thank Lawrence Stewart for automating the process of submission of 209 RI-MP2 GAMESS calculations to the AWS cloud and Michael Potter for helpful discussions regarding this work. S.P.W. declares an equity interest in VeraChem LLC.
Note: This paper is part of the JCP Special Collection in Honor of Women in Chemical Physics and Physical Chemistry.
DATA AVAILABILITY
The data that support the findings of this study are available within the article and its supplementary material.
REFERENCES
- 1. Cram D. J., Science 240, 760 (1988). doi:10.1126/science.3283937
- 2. Pedersen C. J., Angew. Chem., Int. Ed. Engl. 27, 1021 (1988). doi:10.1002/anie.198810211
- 3. Lehn J.-M., J. Inclusion Phenom. 6, 351 (1988). doi:10.1007/bf00658981
- 4. Lehn J.-M., Angew. Chem., Int. Ed. Engl. 29, 1304 (1990). doi:10.1002/anie.199013041
- 5. Rekharsky M. V. and Inoue Y., Chem. Rev. 98, 1875 (1998). doi:10.1021/cr970015o
- 6. Masson E., Ling X., Joseph R., Kyeremeh-Mensah L., and Lu X., RSC Adv. 2, 1213 (2012). doi:10.1039/c1ra00768h
- 7. Ma X. and Zhao Y., Chem. Rev. 115, 7794 (2015). doi:10.1021/cr500392w
- 8. Cai H., Huang Y.-L., and Li D., Coord. Chem. Rev. 378, 207 (2019). doi:10.1016/j.ccr.2017.12.003
- 9. Schmidt B. V. K. J. and Barner-Kowollik C., Angew. Chem., Int. Ed. 56, 8350 (2017). doi:10.1002/anie.201612150
- 10. Loh X. J., Mater. Horiz. 1, 185 (2014). doi:10.1039/c3mh00057e
- 11. Scriba G. K. E., Chromatographia 75, 815 (2012). doi:10.1007/s10337-012-2261-1
- 12. Zhang X., Zhang Y., and Armstrong D. W., in Comprehensive Chirality, edited by Carreira E. M. and Yamamoto H. (Elsevier, Amsterdam, 2012), pp. 177–199.
- 13. Erdemir S., Bahadir M., and Yilmaz M., J. Hazard. Mater. 168, 1170 (2009). doi:10.1016/j.jhazmat.2009.02.150
- 14. Aksoy T., Erdemir S., Yildiz H. B., and Yilmaz M., Water, Air, Soil Pollut. 223, 4129 (2012). doi:10.1007/s11270-012-1179-4
- 15. Smith V. J., Rougier N. M., de Rossi R. H., Caira M. R., Buján E. I., Fernández M. A., and Bourne S. A., Carbohydr. Res. 344, 2388 (2009). doi:10.1016/j.carres.2009.08.036
- 16. Uekama K., Hirayama F., and Irie T., Chem. Rev. 98, 2045 (1998). doi:10.1021/cr970025p
- 17. Gidwani B. and Vyas A., BioMed Res. Int. 2015, 198268. doi:10.1155/2015/198268
- 18. Barrow S. J., Kasera S., Rowland M. J., del Barrio J., and Scherman O. A., Chem. Rev. 115, 12320 (2015). doi:10.1021/acs.chemrev.5b00341
- 19. Muddana H. S., Daniel Varnado C., Bielawski C. W., Urbach A. R., Isaacs L., Geballe M. T., and Gilson M. K., J. Comput.-Aided Mol. Des. 26, 475 (2012). doi:10.1007/s10822-012-9554-1
- 20. Muddana H. S., Fenley A. T., Mobley D. L., and Gilson M. K., J. Comput.-Aided Mol. Des. 28, 305 (2014). doi:10.1007/s10822-014-9735-1
- 21. Yin J., Henriksen N. M., Slochower D. R., Shirts M. R., Chiu M. W., Mobley D. L., and Gilson M. K., J. Comput.-Aided Mol. Des. 31, 1 (2017). doi:10.1007/s10822-016-9974-4
- 22. Rizzi A., Murkli S., McNeill J. N., Yao W., Sullivan M., Gilson M. K., Chiu M. W., Isaacs L., Gibb B. C., Mobley D. L., and Chodera J. D., J. Comput.-Aided Mol. Des. 32, 937 (2018). doi:10.1007/s10822-018-0170-6
- 23. Amezcua M., El Khoury L., and Mobley D. L., “SAMPL7 host–guest challenge overview: Assessing the reliability of polarizable and non-polarizable methods for binding free energy calculations,” J. Comput.-Aided Mol. Des. 35, 1 (2021). doi:10.1007/s10822-020-00363-5
- 24. Gilson M. K., Given J. A., Bush B. L., and McCammon J. A., Biophys. J. 72, 1047 (1997). doi:10.1016/s0006-3495(97)78756-3
- 25. Kollman P., Chem. Rev. 93, 2395 (1993). doi:10.1021/cr00023a004
- 26. Mikulskis P., Cioloboc D., Andrejic M., Khare S., Brorsson J., Genheden S., Mata R. A., Soderhjelm P., and Ryde U., J. Comput.-Aided Mol. Des. 28, 375 (2014). doi:10.1007/s10822-014-9739-x
- 27. Homeyer N. and Gohlke H., Mol. Inf. 31, 114 (2012). doi:10.1002/minf.201100135
- 28. Yin J., Henriksen N. M., Slochower D. R., and Gilson M. K., J. Comput.-Aided Mol. Des. 31, 133 (2017). doi:10.1007/s10822-016-9970-8
- 29. Hsiao Y.-W. and Söderhjelm P., J. Comput.-Aided Mol. Des. 28, 443 (2014). doi:10.1007/s10822-014-9724-4
- 30. Bhakat S. and Söderhjelm P., J. Comput.-Aided Mol. Des. 31, 119 (2017). doi:10.1007/s10822-016-9948-6
- 31. Coleman R. G., Sterling T., and Weiss D. R., J. Comput.-Aided Mol. Des. 28, 201 (2014). doi:10.1007/s10822-014-9722-6
- 32. Wang J., Wolf R. M., Caldwell J. W., Kollman P. A., and Case D. A., J. Comput. Chem. 25, 1157 (2004). doi:10.1002/jcc.20035
- 33. Vanommeslaeghe K., Hatcher E., Acharya C., Kundu S., Zhong S., Shim J., Darian E., Guvench O., Lopes P., Vorobyov I., and MacKerell A. D., J. Comput. Chem. 31, 671 (2010). doi:10.1002/jcc.21367
- 34. Henriksen N. M. and Gilson M. K., J. Chem. Theory Comput. 13, 4253 (2017). doi:10.1021/acs.jctc.7b00359
- 35. Řezáč J., Fanfrlík J., Salahub D., and Hobza P., J. Chem. Theory Comput. 5, 1749 (2009). doi:10.1021/ct9000922
- 36. Stewart J. J. P., J. Mol. Model. 13, 1173 (2007). doi:10.1007/s00894-007-0233-4
- 37. Řezáč J. and Hobza P., Chem. Phys. Lett. 506, 286 (2011). doi:10.1016/j.cplett.2011.03.009
- 38. Muddana H. S. and Gilson M. K., J. Chem. Theory Comput. 8, 2023 (2012). doi:10.1021/ct3002738
- 39. Muddana H. S., Yin J., Sapra N. V., Fenley A. T., and Gilson M. K., J. Comput.-Aided Mol. Des. 28, 463 (2014). doi:10.1007/s10822-014-9726-2
- 40. Sure R., Antony J., and Grimme S., J. Phys. Chem. B 118, 3431 (2014). doi:10.1021/jp411616b
- 41. Head M. S., Given J. A., and Gilson M. K., J. Phys. Chem. A 101, 1609 (1997). doi:10.1021/jp963817g
- 42. Chen W., Chang C.-E., and Gilson M. K., Biophys. J. 87, 3035 (2004). doi:10.1529/biophysj.104.049494
- 43. Schmidt M. W., Baldridge K. K., Boatz J. A., Elbert S. T., Gordon M. S., Jensen J. H., Koseki S., Matsunaga N., Nguyen K. A., Su S., Windus T. L., Dupuis M., and Montgomery J. A., J. Comput. Chem. 14, 1347 (1993). doi:10.1002/jcc.540141112
- 44. Gordon M. S. and Schmidt M. W., in Theory and Applications of Computational Chemistry, edited by Dykstra C. E., Frenking G., Kim K. S., and Scuseria G. E. (Elsevier, Amsterdam, 2005), pp. 1167–1189.
- 45. Barca G. M. J., Bertoni C., Carrington L., Datta D., De Silva N., Deustua J. E., Fedorov D. G., Gour J. R., Gunina A. O., Guidez E., Harville T., Irle S., Ivanic J., Kowalski K., Leang S. S., Li H., Li W., Lutz J. J., Magoulas I., Mato J., Mironov V., Nakata H., Pham B. Q., Piecuch P., Poole D., Pruitt S. R., Rendell A. P., Roskop L. B., Ruedenberg K., Sattasathuchana T., Schmidt M. W., Shen J., Slipchenko L., Sosonkina M., Sundriyal V., Tiwari A., Galvez Vallejo J. L., Westheimer B., Włoch M., Xu P., Zahariev F., and Gordon M. S., J. Chem. Phys. 152, 154102 (2020). doi:10.1063/5.0005188
- 46. Chen W., Gilson M. K., Webb S. P., and Potter M. J., J. Chem. Theory Comput. 6, 3540 (2010). doi:10.1021/ct100245n
- 47. Webb S. P., Potter M. J., and Stewart L. E., “Benchmarking the VM2 binding free energy software package: host-guest systems” (unpublished).
- 48. Chen W., Ren X., and Chang C. A., ChemMedChem 14, 107 (2019). doi:10.1002/cmdc.201800801
- 49. You W., Huang Y. M., Kizhake S., Natarajan A., and Chang C. A., PLoS Comput. Biol. 12, e1005057 (2016). doi:10.1371/journal.pcbi.1005057
- 50. Huang Y.-M. M., Chen W., Potter M. J., and Chang C.-E. A., Biophys. J. 103, 342 (2012). doi:10.1016/j.bpj.2012.05.046
- 51. David L., Luo R., and Gilson M. K., J. Comput.-Aided Mol. Des. 15, 157 (2001). doi:10.1023/a:1008128723048
- 52. Kairys V. and Gilson M. K., J. Comput. Chem. 23, 1656 (2002). doi:10.1002/jcc.10168
- 53. Chang C.-E. and Gilson M. K., J. Am. Chem. Soc. 126, 13156 (2004). doi:10.1021/ja047115d
- 54. Zhou H.-X. and Gilson M. K., Chem. Rev. 109, 4092 (2009). doi:10.1021/cr800551w
- 55. Chang C.-E., Potter M. J., and Gilson M. K., J. Phys. Chem. B 107, 1048 (2003). doi:10.1021/jp027149c
- 56. Potter M. J. and Gilson M. K., J. Phys. Chem. A 106, 563 (2002). doi:10.1021/jp0135407
- 57. Chang C.-E. and Gilson M. K., J. Comput. Chem. 24, 1987 (2003). doi:10.1002/jcc.10325
- 58. Momany F. A. and Rone R., J. Comput. Chem. 13, 888 (1992). doi:10.1002/jcc.540130714
- 59. Qiu D., Shenkin P. S., Hollinger F. P., and Still W. C., J. Phys. Chem. A 101, 3005 (1997). doi:10.1021/jp961992r
- 60. Still W. C., Tempczyk A., Hawley R. C., and Hendrickson T., J. Am. Chem. Soc. 112, 6127 (1990). doi:10.1021/ja00172a038
- 61. Luo R., David L., and Gilson M. K., J. Comput. Chem. 23, 1244 (2002). doi:10.1002/jcc.10120
- 62. Chen W., Huang J., and Gilson M. K., J. Chem. Inf. Comput. Sci. 44, 1301 (2004). doi:10.1021/ci049966a
- 63. Chang C.-E. A., Chen W., and Gilson M. K., Proc. Natl. Acad. Sci. U. S. A. 104, 1534 (2007). doi:10.1073/pnas.0610494104
- 64. Hill T. L., An Introduction to Statistical Thermodynamics (Dover Publications, 1986).
- 65. Jensen J. H., Phys. Chem. Chem. Phys. 17, 12441 (2015). doi:10.1039/c5cp00628g
- 66. Fedorov D. G., Olson R. M., Kitaura K., Gordon M. S., and Koseki S., J. Comput. Chem. 25, 872 (2004). doi:10.1002/jcc.20018
- 67. Gilson M. K., Gilson H. S. R., and Potter M. J., J. Chem. Inf. Comput. Sci. 43, 1982 (2003). doi:10.1021/ci034148o
- 68. Gaus M., Cui Q., and Elstner M., J. Chem. Theory Comput. 7, 931 (2011). doi:10.1021/ct100684s
- 69. Grimme S., Antony J., Ehrlich S., and Krieg H., J. Chem. Phys. 132, 154104 (2010). doi:10.1063/1.3382344
- 70. Becke A. D. and Johnson E. R., J. Chem. Phys. 123, 154101 (2005). doi:10.1063/1.2065267
- 71. Johnson E. R. and Becke A. D., J. Chem. Phys. 123, 024101 (2005). doi:10.1063/1.1949201
- 72. Johnson E. R. and Becke A. D., J. Chem. Phys. 124, 174104 (2006). doi:10.1063/1.2190220
- 73. Grimme S., Ehrlich S., and Goerigk L., J. Comput. Chem. 32, 1456 (2011). doi:10.1002/jcc.21759
- 74. Gaus M., Goez A., and Elstner M., J. Chem. Theory Comput. 9, 338 (2013). doi:10.1021/ct300849w
- 75. Kubillus M., Kubař T., Gaus M., Řezáč J., and Elstner M., J. Chem. Theory Comput. 11, 332 (2015). doi:10.1021/ct5009137
- 76. Marenich A. V., Cramer C. J., and Truhlar D. G., J. Phys. Chem. B 113, 6378 (2009). doi:10.1021/jp810292n
- 77. Li H. and Jensen J. H., J. Comput. Chem. 25, 1449 (2004). doi:10.1002/jcc.20072
- 78. Tomasi J., Mennucci B., and Cammi R., Chem. Rev. 105, 2999 (2005). doi:10.1021/cr9904009
- 79. Barone V. and Cossi M., J. Phys. Chem. A 102, 1995 (1998). doi:10.1021/jp9716997
- 80. Su P., Liu H., and Wu W., J. Chem. Phys. 137, 034111 (2012). doi:10.1063/1.4736533
- 81. Bondi A., J. Phys. Chem. 68, 441 (1964). doi:10.1021/j100785a001
- 82. Kromann J. C., Steinmann C., and Jensen J. H., J. Chem. Phys. 149, 104102 (2018). doi:10.1063/1.5047273
- 83. Barone V., Cossi M., and Tomasi J., J. Chem. Phys. 107, 3210 (1997). doi:10.1063/1.474671
- 84. Tao J., Perdew J. P., Staroverov V. N., and Scuseria G. E., Phys. Rev. Lett. 91, 146401 (2003). doi:10.1103/physrevlett.91.146401
- 85. Weigend F. and Ahlrichs R., Phys. Chem. Chem. Phys. 7, 3297 (2005). doi:10.1039/b508541a
- 86. Feller D., J. Comput. Chem. 17, 1571 (1996).
- 87. Schuchardt K. L., Didier B. T., Elsethagen T., Sun L., Gurumoorthi V., Chase J., Li J., and Windus T. L., J. Chem. Inf. Model. 47, 1045 (2007). doi:10.1021/ci600510j
- 88. Pritchard B. P., Altarawy D., Didier B., Gibson T. D., and Windus T. L., J. Chem. Inf. Model. 59, 4814 (2019). doi:10.1021/acs.jcim.9b00725
- 89. Katouda M. and Nagase S., Int. J. Quantum Chem. 109, 2121 (2009). doi:10.1002/qua.22068
- 90. Pham B. Q. and Gordon M. S., J. Chem. Theory Comput. 15, 5252 (2019). doi:10.1021/acs.jctc.9b00409
- 91. Kendall M. G., Biometrika 30, 81 (1938). doi:10.2307/2332226
- 92. McGibbon R. T., Taube A. G., Donchev A. G., Siva K., Hernández F., Hargus C., Law K.-H., Klepeis J. L., and Shaw D. E., J. Chem. Phys. 147, 161725 (2017). doi:10.1063/1.4986081
- 93. Zheng L. and Yang W., J. Chem. Theory Comput. 8, 810 (2012). doi:10.1021/ct200726v
- 94. Monroe J. I. and Shirts M. R., J. Comput.-Aided Mol. Des. 28, 401 (2014). doi:10.1007/s10822-014-9716-4
- 95. Purisima E. O., J. Comput. Chem. 19, 1494 (1998).
- 96. Hogues H., Sulea T., and Purisima E. O., J. Comput.-Aided Mol. Des. 28, 417 (2014). doi:10.1007/s10822-014-9715-5
- 97. Bennett C. H., J. Comput. Phys. 22, 245 (1976). doi:10.1016/0021-9991(76)90078-4
- 98. Ponder J. W., Wu C., Ren P., Pande V. S., Chodera J. D., Schnieders M. J., Haque I., Mobley D. L., Lambrecht D. S., DiStasio R. A., Head-Gordon M., Clark G. N. I., Johnson M. E., and Head-Gordon T., J. Phys. Chem. B 114, 2549 (2010). doi:10.1021/jp910674d
- 99. Klamt A. and Schüürmann G., J. Chem. Soc., Perkin Trans. 2 799 (1993). doi:10.1039/p29930000799
- 100. Day P. N., Jensen J. H., Gordon M. S., Webb S. P., Stevens W. J., Krauss M., Garmer D., Basch H., and Cohen D., J. Chem. Phys. 105, 1968 (1996). doi:10.1063/1.472045
- 101. Webb S. P. and Gordon M. S., J. Phys. Chem. A 103, 1265 (1999). doi:10.1021/jp983781n
- 102. Merrill G. N., Webb S. P., and Bivin D. B., J. Phys. Chem. A 107, 386 (2003). doi:10.1021/jp0220128
- 103. Merrill G. N. and Webb S. P., J. Phys. Chem. A 107, 7852 (2003). doi:10.1021/jp030073f
- 104. Merrill G. N. and Webb S. P., J. Phys. Chem. A 108, 833 (2004). doi:10.1021/jp030970j
- 105. Shanker S. and Bandyopadhyay P., J. Phys. Chem. A 115, 11866 (2011). doi:10.1021/jp2073864
- 106. Sattasathuchana T., Xu P., and Gordon M. S., J. Phys. Chem. A 123, 8460 (2019). doi:10.1021/acs.jpca.9b05801