Abstract
Solvation forces dominate protein structure and dynamics. Integral equation theories allow a rapid and accurate evaluation of the effect of solvent around a complex solute, without the sampling issues associated with simulations of explicit solvent molecules. Advances in integral equation theories make it possible to calculate the angle dependent average solvent structure around an irregular molecular solute. We consider two methodological problems here: the treatment of long-ranged forces without the use of artificial periodicity or truncations and the effect of closures. We derive a method for calculating the long-ranged Coulomb interaction contributions to three-dimensional distribution functions involving only a rewriting of the system of integral equations and introducing no new formal approximations. We compare the exact forms with those implied by the supercell method. The supercell method is shown to be a good approximation for neutral solutes whereas the new method does not exhibit the known problems of the supercell method for charged solutes. Our method appears more numerically stable with respect to thermodynamic starting state. We also compare closures including the Kovalenko–Hirata closure, the hypernetted-chain (HNC) and an approximate three-dimensional bridge function combined with the HNC closure. Comparisons to molecular dynamics results are made for water as well as for the protein solute bovine pancreatic trypsin inhibitor. The proposed equations have less severe approximations and often provide results which compare favorably to molecular dynamics simulation where other methods fail.
INTRODUCTION
The advantages of being able to easily calculate three-dimensional (3D) solvent distribution functions, rather than their simple radial counterparts, are readily apparent for any solute species that is not spherically symmetric. For molecules with a very complex structure, such as a protein, knowledge of the solvent structure at specific surface and interior sites is essential for understanding many complicated and localized interactions contributing to nearly all biological processes. It is with this motivation that many researchers have contributed to the development of integral equation (IE) theoretical methods to provide such 3D information.
Beglov and Roux1 solved an Ornstein–Zernike (OZ) and hypernetted-chain (HNC) equation system in three dimensions for Lennard-Jones particles around (and inside) an asymmetric shape and also for molecules in a polar liquid.2 Ikeguchi and Doi3 proposed the natural extension of the original reference interaction site model (RISM) formalism to a 3D grid. Cortis et al.4 developed a 3D counterpart to the RISM equation and generated results without many of the anomalies found in one-dimensional (1D) RISM∕HNC results. Since that time these methods have also been applied to polar molecules in water,5 to ions in solution6, 7 and, in conjunction with density functional theory, to water-metal interfaces.8 Kovalenko and Hirata have worked on several biochemical systems to examine the structure of solvent around proteins,9 as well as to locate water molecules in internal protein cavities.10
While 3D IE results are becoming an approach of great utility, they suffer from two drawbacks. The first is that the Coulomb interactions between a solute and solvent have not been treated in a formally exact way, as they have been for simple radial systems. It is possible to have reasonable short-ranged structure when other properties of the solution, such as dielectric behavior, are seriously flawed.11 In general, if Coulomb interactions are not included exactly in IE methods for all charged sites, whether by a renormalization or some other method, results for ionic solutions are often unavailable, and if numerical solutions can be found they are often poor.11, 12 Cortis et al.4 have used a chain sum renormalization method to solve for self-consistent, exact Coulomb contributions, but their method was developed for systems with solute and solvent of the same species. Much work has been done using the artificial periodicity suggested by the numerical implementation of a fast Fourier transform (FFT) on a 3D grid to achieve numerically stable solutions with physically reasonable short-ranged structure.2, 5, 6, 8, 9, 10
Such methods impose a type of periodicity and are often referred to as a supercell method. Since true lattice-periodic or Ewald methods13 are quite mature in the statistical mechanical literature, particularly in molecular dynamics (MD) simulations,14 the approximation is presumably good and of the same level of accuracy as has been found for MD results. For charged solutes, however, the supercell method does not provide the expected asymptotic behavior in the resulting correlations, and these problems require correction.6, 7 These problems do not appear to arise for neutral solutes, but without an exact method with which to compare, the degree of accuracy of this method cannot be known.
In this work we consider the treatment of long-ranged forces without the use of artificial periodicity or truncations, in combination with the effect of closures on 3D IEs. We present a rewriting of the 3D OZ and HNC equations such that the infinite range of Coulomb interactions are fully included in the results, without any level of periodicity. It is a 3D extension of the approach that has been used for 1D radial equation systems for some time.15, 16
The second question relates to the closure used. The IEs for many model systems at some thermodynamic states may appear to have no solution. For the HNC equation this seems to be due to the well-known short-ranged overcorrelation yielded for many model systems.11, 17, 18 In particular a problem arises when there are small surface areas near a solute molecule where the local solute-solvent pairwise added potential has a deep well, causing the corresponding peaks in particle distributions to be unphysically high or numerically divergent. In an exact theory, these deep wells would be mitigated by additional correlations between the sites on the solute and sites on several solvent molecules. Many such correlation contributions are missing from the HNC theory. The correlations which are missing are well known and often represented by bridge diagrams,17 with the lowest order missing diagrams representing a type of correlations between four solvent or solute sites. These missing correlations have been calculated for simple fluids19, 20, 21, 22 and have shown improvement over the HNC closure for radial systems.
Attempts to correct for the missing correlations in 3D systems have generally been to alter the HNC equation in some way. One idea suggested by Kovalenko and Hirata has been to simply not exponentiate the atomic potential of mean force wherever it is negative.8 This would prevent over correlation when only a few small areas in the potential surface are causing the problem. Another idea is to angle average the exponent of some part of the full solvent molecular potential.4 The third is to create an approximate bridge function by combining exponentiated solute-solvent site-site potentials using an intramolecular correlation matrix.23 These last two approaches are attempts to correct the correlations between solute sites and solvent sites on the same molecule. Unfortunately, it has been found that adding many bridge diagrams of this type (by direct calculation) to small molecular fluids failed to show nearly as much improvement in the case of 1D IE systems24 as there had been for simple fluids. Additionally, calculating all the interatomic four-body bridge diagrams for a large solute would lose the advantage of relatively short computing times for IE methods. One can, instead of considering just atomic site based bridge functions, formally write down the four-body intermolecular bridge diagrams, with one body being the complete solute molecule and the others being solvent sites. These are full 3D functions but would be computationally prohibitively expensive to calculate. However, if a simple approximation is applied, based on the relationship between exact bridge diagrams and the Percus-Yevick (PY) approximation,19, 25 these diagrams can be reduced to a form where they can be found reasonably easily, yet are still 3D bridge diagrams uniquely valued at each point around an asymmetric solute. When included with the HNC closure, we will refer to this as the HNC bridge (HNCB) closure. 
We will compare electrostatics methods and various closures with simulations for the same potential surface and state.
In the following section the formal expression for the handling of long-ranged Coulomb interactions is presented and compared analytically to the supercell method, which has been used frequently in the literature.6, 7, 26 We also give expressions for the PY approximation to the four-body solvent-solute intermolecular 3D bridge diagrams. In Sec. 3 some example results are given. We check for numerical convergence using water as both solute and solvent and compare correlations from MD simulations with various closures and methods for handling the electrostatics for this model. We then compare MD simulation results with results using various closures for a model protein system, bovine pancreatic trypsin inhibitor (BPTI). The final section contains conclusions about the combined approach suggested here.
THEORY
The IE system in three dimensions
The model system for which we wish to calculate 3D distribution functions and system thermodynamic quantities is a multisite solute, like a protein, dissolved at infinite dilution in a multisite solvent, like water. In order to calculate the solvent distribution around the solute we will need to know the bulk solvent structure. For this we use the solution of the dielectrically consistent RISM∕HNC (DRISM∕HNC)11, 27 equations, solved in the usual manner for the aqueous solvent alone, giving as its result pair correlation functions hij(rij), where i and j are any of the nv sites on the solvent and rij is the intersite separation distance. Throughout this paper we will denote all solvent intersite functions with small letters.
The solute-solvent structure is represented in terms of the functions Cui(x,y,z) and Hui(x,y,z), the direct and total correlation functions, respectively. These two functions are single valued over the 3D coordinate system. The subscript u represents the solute molecule and the subscript i represents one atomic site on a solvent molecule. While a solute molecule can have a large number of atomic sites, it is treated as a single species so in general there will be one function pair for each atomic site i present at finite concentration in the solvent, unless solvent molecule symmetry provides some degeneracy. Capital letters will be used to represent the solute-solvent functions.
As is the case for IE theories of site-site distances only, it is convenient to write the closure equation in terms of the difference
$$T_{ui}(x,y,z)=H_{ui}(x,y,z)-C_{ui}(x,y,z). \tag{1}$$
Since the model for both solute and solvent is of the usual site-site type, the full solute-solvent intermolecular potentials are still a sum of pair potential functions depending on interatomic distances rαi
$$U_{ui}(x,y,z)=\sum_{\alpha=1}^{n_u}u_{\alpha i}(r_{\alpha i}), \tag{2}$$
where the subscript α can represent any of the nu atoms on the solute molecule, and the subscript i represents a site on a solvent molecule. The site-site pair potential functions used here are of the Lennard-Jones plus Coulombic term form
$$\beta u_{\alpha i}(r_{\alpha i})=4\beta\epsilon_{\alpha i}\left[\left(\frac{\sigma_{\alpha i}}{r_{\alpha i}}\right)^{12}-\left(\frac{\sigma_{\alpha i}}{r_{\alpha i}}\right)^{6}\right]+\frac{\beta q_{\alpha}q_{i}}{r_{\alpha i}}, \tag{3}$$
with qi being an atomic (perhaps partial) charge, σαi and ϵαi the usual Lennard-Jones parameters and β the energy factor 1∕kBT, with kB being the Boltzmann constant and T the absolute temperature.
The HNC equation is solved over the entire 3D (x,y,z) space and can be written as
$$C_{ui}(x,y,z)=\exp\left[-\beta\sum_{\alpha=1}^{n_u}u_{\alpha i}(r_{\alpha i})+T_{ui}(x,y,z)\right]-T_{ui}(x,y,z)-1, \tag{4}$$
and we note that the sum over potential terms represents the interaction potential of an entire solute molecule with a single site i on the solvent.
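The pointwise character of the potential sum and of the HNC closure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy two-site solute and its parameters, the starting guess of a zero difference function, and the unit convention (Coulomb constant of roughly 332.06 kcal Å mol⁻¹ e⁻²) are all hypothetical and are not taken from this work.

```python
import numpy as np

COULOMB_KCAL = 332.06371   # kcal*Angstrom/(mol*e^2); assumed unit convention
KB_KCAL = 0.0019872041     # Boltzmann constant in kcal/(mol*K)

def beta_potential(grid_xyz, solute_pos, solute_q, sigma, eps, q_i, temperature):
    """beta*U for one solvent site: LJ + Coulomb terms summed over solute sites."""
    beta = 1.0 / (KB_KCAL * temperature)
    beta_u = np.zeros(grid_xyz.shape[:-1])
    for s, q_a, sig, ep in zip(solute_pos, solute_q, sigma, eps):
        r = np.linalg.norm(grid_xyz - s, axis=-1)
        r = np.maximum(r, 1e-6)            # avoid division by zero on a site
        sr6 = (sig / r) ** 6
        beta_u += beta * (4.0 * ep * (sr6 * sr6 - sr6) + COULOMB_KCAL * q_a * q_i / r)
    return beta_u

def hnc_direct_correlation(beta_u, t):
    """HNC closure for C given T = H - C: C = exp(-beta*U + T) - T - 1."""
    expo = np.clip(-beta_u + t, -700.0, 700.0)   # guard against overflow
    return np.exp(expo) - t - 1.0

# Toy two-site solute (hypothetical parameters) on a 17^3 grid.
ax = np.linspace(-4.0, 4.0, 17)
grid = np.stack(np.meshgrid(ax, ax, ax, indexing="ij"), axis=-1)
pos = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
bu = beta_potential(grid, pos, [-0.8, 0.8], [3.15, 1.0], [0.15, 0.05], -0.8, 300.0)
c0 = hnc_direct_correlation(bu, np.zeros_like(bu))   # first guess with T = 0
```

In an actual calculation the difference function would be updated from the OZ relation at each cycle; here it is simply set to zero to show the closure step in isolation.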
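The form of the OZ equation used is analogous to that used in site-site theories and can be written in matrix form as
$$\tilde{\mathbf{H}}_{u}=\tilde{\mathbf{C}}_{u}\left(\tilde{\mathbf{w}}+\boldsymbol{\rho}\tilde{\mathbf{h}}\right), \tag{5}$$
where $\tilde{H}_{ui}$ is the ith element of the 1×nv-dimensional matrix $\tilde{\mathbf{H}}_{u}$, and similarly for $\tilde{\mathbf{C}}_{u}$. The matrix $\tilde{\mathbf{w}}$ has elements $\tilde{w}_{ij}(k)=\sin(kd_{ij})/(kd_{ij})$, where k is the distance from the origin in Fourier space and dij is the intramolecular distance between site i and site j on a solvent molecule. The value of each wii(k), on the diagonal of the matrix, is 1. The matrix $\tilde{\mathbf{h}}$ has elements $\tilde{h}_{ij}(k)$ and the matrix ρ is a diagonal matrix of elements ρv, the solvent number density. The matrices ρ, $\tilde{\mathbf{w}}$, and $\tilde{\mathbf{h}}$ are all of the order nv×nv. The tilde notation denotes FT, where
$$\tilde{H}_{ui}(\mathbf{k})=\int H_{ui}(\mathbf{r})\,e^{-i\mathbf{k}\cdot\mathbf{r}}\,d\mathbf{r}. \tag{6}$$
The functions $\tilde{h}_{ij}(k)$ are also 3D FTs of their r-space counterparts hij(r) but because they depend only on the intermolecular distance between molecular sites on the solvent, their transforms are equivalent to zeroth order Hankel transforms
$$\tilde{h}_{ij}(k)=4\pi\int_{0}^{\infty}r_{ij}^{2}\,h_{ij}(r_{ij})\,\frac{\sin(kr_{ij})}{kr_{ij}}\,dr_{ij}, \tag{7}$$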
where rij is the distance between atom i on one solvent molecule and atom j on another solvent molecule.
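A radial transform of this kind can be checked numerically with a simple quadrature. The sketch below is a minimal illustration, assuming a midpoint radial grid, a rectangle-rule sum, and a k grid chosen for convenience; the function name `radial_ft` is hypothetical. A Gaussian is used as the test function because its 3D transform, π^{3/2} exp(−k²/4), is known in closed form.

```python
import numpy as np

def radial_ft(r, f):
    """3D FT of an isotropic f(r): 4*pi * int r^2 f(r) sin(kr)/(kr) dr."""
    dr = r[1] - r[0]
    k = np.pi * np.arange(1, len(r) + 1) / (len(r) * dr)   # assumed k grid
    kr = np.outer(k, r)
    # np.sinc(x) is sin(pi x)/(pi x), so sinc(kr/pi) equals sin(kr)/(kr)
    integrand = 4.0 * np.pi * r**2 * f * np.sinc(kr / np.pi)
    return k, integrand.sum(axis=1) * dr                   # rectangle rule

r = (np.arange(512) + 0.5) * 0.05          # midpoint grid, 0.05 spacing
k, f_k = radial_ft(r, np.exp(-r**2))
exact = np.pi**1.5 * np.exp(-k**2 / 4.0)   # analytic transform of exp(-r^2)
max_err = np.max(np.abs(f_k - exact))
```

A production code would normally use a fast sine transform instead of the explicit matrix of sin(kr) values, which is used here only for clarity.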
Reduction of the OZ equation
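Because of the large data structures needed in obtaining a solution to the equations, it is advantageous to write Eq. 5 in terms of only the unique sites on the solvent. If we first rewrite the OZ equation in terms of $\tilde{\mathbf{T}}_{u}=\tilde{\mathbf{H}}_{u}-\tilde{\mathbf{C}}_{u}$ and $\tilde{\mathbf{C}}_{u}$ we obtain
$$\tilde{\mathbf{T}}_{u}=\tilde{\mathbf{C}}_{u}\left(\tilde{\mathbf{w}}+\boldsymbol{\rho}\tilde{\mathbf{h}}-\mathbf{I}\right), \tag{8}$$
where I is the identity matrix. Now expanding the matrix multiplication and using the symmetry of the two hydrogen atoms in the water model we can write
$$\tilde{T}_{uO}=\tilde{C}_{uO}\,\rho_{v}\tilde{h}_{OO}+2\tilde{C}_{uH}\left(\tilde{w}_{OH}+\rho_{v}\tilde{h}_{OH}\right) \tag{9}$$
and
$$\tilde{T}_{uH}=\tilde{C}_{uO}\left(\tilde{w}_{OH}+\rho_{v}\tilde{h}_{OH}\right)+\tilde{C}_{uH}\left(\tilde{w}_{HH}+2\rho_{v}\tilde{h}_{HH}\right), \tag{10}$$
where the subscripts O and H represent the oxygen and hydrogen solvent sites, respectively, and ρv is the solvent density.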
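The reduction to unique solvent sites can be verified numerically at a single k value. The sketch below assumes the OZ relation has the standard 3D-RISM matrix form, with the row vector of direct correlation functions multiplying (w + ρh − I); the function names and all numerical values (toy correlation values, a TIP3P-like geometry) are hypothetical and serve only to show that the three-site and two-site expressions agree.

```python
import numpy as np

def full_oz_t(c_u, w, h, rho):
    """T = C (w + rho*h - I) for a row vector C over the sites (O, H1, H2)."""
    return c_u @ (w + rho * h - np.eye(3))

def reduced_oz_t(c_o, c_h, w, h, rho):
    """The same relation written only for the unique sites O and H."""
    w_oh, w_hh = w[0, 1], w[1, 2]
    h_oo, h_oh, h_hh = h[0, 0], h[0, 1], h[1, 1]
    t_o = c_o * rho * h_oo + 2.0 * c_h * (w_oh + rho * h_oh)
    t_h = c_o * (w_oh + rho * h_oh) + c_h * (w_hh + 2.0 * rho * h_hh)
    return t_o, t_h

k, rho = 1.3, 0.0334                       # toy wave number and density
d_oh, d_hh = 0.9572, 1.5139                # TIP3P-like distances (Angstrom)
w_oh = np.sin(k * d_oh) / (k * d_oh)
w_hh = np.sin(k * d_hh) / (k * d_hh)
w = np.array([[1.0, w_oh, w_oh], [w_oh, 1.0, w_hh], [w_oh, w_hh, 1.0]])
# Toy solvent-solvent values; both hydrogens share the same functions.
h = np.array([[-0.5, 0.2, 0.2], [0.2, 0.1, 0.1], [0.2, 0.1, 0.1]])
c_u = np.array([0.7, -0.3, -0.3])

t_full = full_oz_t(c_u, w, h, rho)
t_o, t_h = reduced_oz_t(c_u[0], c_u[1], w, h, rho)
```

The saving is modest for water (two 3D arrays instead of three) but grows with the number of symmetry-equivalent solvent sites.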
Long-ranged function resummation
Separation of long-ranged and short-ranged contributions
Because of the long-ranged nature of Eq. 3, we resum the equations in order to avoid numerical problems. The form of Eq. 2 suggests that we define a function
$$\Phi_{ui}(x,y,z)=\sum_{\alpha=1}^{n_u}\phi_{\alpha i}(r_{\alpha i}), \tag{11}$$
where each (as yet unspecified) function ϕαi in the sum has the asymptotic property
(12) |
and is finite valued at small rui, so that we can write
(13) |
and
(14) |
where the superscript s indicates a short-ranged function. As is the case with purely radially dependent IE systems, we also define
(15) |
which implies
(16) |
to provide short-ranged counterparts of the solute-solvent functions to be used in our equation system. For now we will assume that the Φui functions can be Fourier transformed, and will denote the transforms by $\tilde{\Phi}_{ui}$. The details of this process will be shown below. Equations 15, 16 can be applied in Eq. 9 to give
(17) |
which is easily rearranged into the form
(18) |
where
(19) |
and
(20) |
A similar derivation using Eq. 10 gives
(21) |
where
(22) |
and
(23) |
The closure [Eq. 4] can also be written in terms of short- and long-ranged contributions. If we again assume that the inverse FTs of $\tilde{T}^{s}_{ui}$ and $\tilde{\Theta}_{ui}$ exist, they can be calculated separately and combined after the transform process to give
(24) |
which can be combined with Eqs. 14, 15, 16 to rewrite Eq. 4 as
(25) |
Calculation of the long-ranged contribution
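The function Φui(x,y,z) is composed of component functions ϕαi(rαi)=ϕαi(|ri−sα|) where each ϕαi(|ri−sα|) is isotropic about a fixed point sα in (x,y,z)-space. We can FT each of these components separately by invoking a coordinate translation which is equivalent to choosing sα as a new origin. Using this translation we find
$$\int\phi_{\alpha i}(|\mathbf{r}_{i}-\mathbf{s}_{\alpha}|)\,e^{-i\mathbf{k}\cdot\mathbf{r}_{i}}\,d\mathbf{r}_{i}=e^{-i\mathbf{k}\cdot\mathbf{s}_{\alpha}}\,\tilde{\phi}_{\alpha i}(k). \tag{26}$$
Since FTs are linear we can apply this result in Eq. 11 and write
$$\tilde{\Phi}_{ui}(\mathbf{k})=\sum_{\alpha=1}^{n_u}e^{-i\mathbf{k}\cdot\mathbf{s}_{\alpha}}\,\tilde{\phi}_{\alpha i}(k). \tag{27}$$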
We now apply Eq. 27 in Eqs. 20, 23, giving
(28) |
and
(29) |
which define the component functions $\tilde{\theta}_{\alpha O}(k)$ and $\tilde{\theta}_{\alpha H}(k)$ of $\tilde{\Theta}_{uO}$ and $\tilde{\Theta}_{uH}$, respectively. The form of Eqs. 28, 29 is exactly analogous to that of Eq. 27 which was the result of the FT of Eq. 11. Therefore, we do not need to calculate the exponential prefactors for each term in Eqs. 28, 29, nor do we need to calculate 3D inverse transforms. Since each of these prefactors is equivalent to a change of origin to a solute site, we can simply write the inverse FT of $\tilde{\Theta}_{ui}$ as
$$\Theta_{ui}(x,y,z)=\sum_{\alpha=1}^{n_u}\theta_{\alpha i}(|\mathbf{r}_{i}-\mathbf{s}_{\alpha}|), \tag{30}$$
provided that the Fourier function pairs $\tilde{\theta}_{\alpha i}(k)$ and θαi(r) exist for each solute site α and each solvent site i, and it is understood that the θαi(rαi) are functions of |ri−sα|, each one isotropic around the site position sα.
The long-ranged contributions are, therefore, calculated as follows. A function ϕαi(rαi) is chosen for each site α on the solute and i on the solvent for which there is a long-ranged interaction, such that it has the property given in Eq. 12, and for which an analytically known FT exists. Each ϕαi(rαi) is Fourier transformed as if isotropic about the origin and from the resulting functions the set of functions $\tilde{\theta}_{\alpha i}(k)$ is calculated using Eqs. 28, 29. Each of the functions $\tilde{\theta}_{\alpha i}(k)$ is inverse Fourier transformed numerically and the results are then combined to form the 3D data set Θui(x,y,z). This is done for each solvent position in the (x,y,z) grid by calculating the distance to the position of each solute site α, evaluating the corresponding θαi(rαi) function there and adding them, as prescribed by Eq. 30.
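The final assembly step, in which tabulated radial functions are evaluated at the distance from each solute site and summed on the 3D grid, can be sketched as follows. This is a minimal illustration assuming linear interpolation; the function name `build_theta_3d`, the two-site toy solute, and the Gaussian stand-ins for the radial functions are hypothetical and chosen only so that the result can be compared with an analytic sum.

```python
import numpy as np

def build_theta_3d(grid_xyz, site_pos, r_radial, theta_radial):
    """Sum isotropic site functions theta_a(|r - s_a|) onto a 3D grid.

    theta_radial[a] is the radial table for solute site a on r_radial;
    values are linearly interpolated at each grid-point distance.
    """
    total = np.zeros(grid_xyz.shape[:-1])
    for s, theta in zip(site_pos, theta_radial):
        dist = np.linalg.norm(grid_xyz - s, axis=-1)
        total += np.interp(dist.ravel(), r_radial, theta).reshape(dist.shape)
    return total

# Fine radial grid (0.01 spacing, 16384 points) and a toy two-site solute.
r_radial = np.arange(16384) * 0.01
sites = np.array([[0.0, 0.0, 0.5], [0.0, 0.0, -0.5]])
weights = [1.0, -1.0]
tables = [q * np.exp(-r_radial**2 / 4.0) for q in weights]

ax = np.linspace(-4.0, 4.0, 9)
grid = np.stack(np.meshgrid(ax, ax, ax, indexing="ij"), axis=-1)
theta_3d = build_theta_3d(grid, sites, r_radial, tables)

# Analytic reference for the same toy functions.
reference = sum(q * np.exp(-np.sum((grid - s) ** 2, axis=-1) / 4.0)
                for q, s in zip(weights, sites))
max_dev = np.max(np.abs(theta_3d - reference))
```

Because the radial tables are fixed, this step is carried out once before the iterative solution begins.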
A useful but not unique choice for the functions ϕαi(rαi) is15
(31) |
for which the FT is known to be
(32) |
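A transform pair of the error-function type that is commonly used for this purpose can be checked numerically. The sketch below assumes the generic pair erf(ar)/r ↔ 4π exp(−k²/4a²)/k², without the charge and temperature prefactors, and does not claim to reproduce the exact form or sign convention of Eqs. 31 and 32. Because erf(ar)/r is long-ranged, the check is done on its short-ranged complement erfc(ar)/r, whose transform is 4π[1 − exp(−k²/4a²)]/k²; the function name and quadrature settings are hypothetical.

```python
import math
import numpy as np

def erfc_over_r_ft(k, a, r_max=12.0, n=24000):
    """Numerical 3D FT of erfc(a r)/r: (4*pi/k) * int_0^inf erfc(a r) sin(k r) dr."""
    dr = r_max / n
    r = (np.arange(n) + 0.5) * dr                       # midpoint grid
    erfc_vals = np.array([math.erfc(a * x) for x in r])
    return 4.0 * np.pi / k * np.sum(erfc_vals * np.sin(k * r)) * dr

a = 1.0
rel_errors = []
for k in (0.5, 1.0, 3.0):
    numeric = erfc_over_r_ft(k, a)
    analytic = 4.0 * np.pi * (1.0 - np.exp(-k**2 / (4.0 * a**2))) / k**2
    rel_errors.append(abs(numeric - analytic) / analytic)
```

Since the transform of 1/r is 4π/k², agreement for the complement implies the stated pair for erf(ar)/r itself.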
By precalculating and Θui(x,y,z), the OZ and HNC equations can be solved as rewritten, in terms of constant and short-ranged functions only, using Eqs. 19, 22, 24, 25. We note that in deriving this representation of the original equation system no additional approximations were made.
In practice, since all the component functions ϕαi and θαi are radially symmetric and fixed throughout the calculation, they can be precalculated quickly and easily. It is convenient to use the same r and k-space grids as for the solvent-solvent functions and then interpolate the θαi(rαi) functions to the functions of x, y, and z.
Long-ranged behavior of the supercell method
An approximate method for handling the long-range Coulombic tails that has been used frequently is the supercell method,6, 7, 26 which replaces the electrostatic part of the interaction potential between solute and solvent species with an Ewald-like term
(33) |
We wish to examine the differences between the supercell method and the exact method given in Sec. 2C above, and in particular any approximations that Eq. 33 may imply, so we derive Eq. 33 beginning with the exact expression for the electrostatic potential.
(34) |
where we have separated the electrostatic interaction potential into a long-ranged term (the inverted FT) and a short-ranged remainder. Now let rαi=r−sα
(35) |
Since we have made no approximations, the last form of Eq. 35 is equivalent to the bare Coulomb interaction with which we began, and comparison with Eq. 33 shows that there are only two differences between the exact electrostatic potential and the supercell form. The supercell form replaces the continuous inverse FT over all space with its discrete counterpart, and the value of the integrand at k=0 is set to 0.
When replacing a continuous FT of a function by a discrete one, how well the function is sampled is of importance. Figure 1 shows the result of the supercell equation for a central water molecule interacting with an oxygen site on a solvent molecule. For each box size shown, the supercell potential function goes to 0 at the edge of the data region. The sampling theorem applied to this inverse FT tells us that if the function we are sampling with a Δk grid is bandwidth limited to distances smaller than rc=NΔr=π∕Δk then the function will be 0 for r>rc, and the function will be completely determined by this sampling grid. This is the same relationship between Δr and Δk enforced by FFT routines. However, the converse of the sampling theorem is not always true; just because the transform of a discretely sampled function goes to 0 at the edge of the data region does not necessarily mean that the original function has been adequately sampled. It could be that the result has a serious aliasing problem, but the symmetry of the transform is such that the periodic aliased data images will sum to 0 exactly at the edge of the box.
If the supercell equation was adequately sampling the Coulomb interaction, then decreasing Δk and increasing the number of points so that the range in k-space was constant would leave the r-space result of the inverse transform unchanged. As shown in Fig. 1, the result of the supercell equation changes significantly as the grid spacing is changed. This is not surprising because the long-ranged part of the potential for this model is a dipole-charge interaction which, in the direction parallel to the dipole shown in Fig. 1, will have a 1∕r2 dependence, but is positive in one direction and negative in the other. The aliasing phenomenon enforces a box-length periodicity, so negative and positive aliases cancel at the data boundary. Because of the 1∕r2 dependence, there is no value of Δk small enough to completely sample the function. Therefore there will always be an infinitely long-ranged contribution outside the range of the data structure in r-space which has been effectively truncated. In spite of this problem, when the supercell method is used for a charge neutral solute, a small enough Δk can provide good numerical results. But, in the case of charged solutes, there is a 1∕k2 pole at k=0 corresponding to the long-ranged contributions outside the data structure which have Coulombic 1∕r dependence and ignoring them causes serious problems as will be discussed below.
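The box-size dependence described above can be illustrated with a toy calculation for a single unit point charge. The sketch below is not Eq. 33 itself: it assumes a Gaussian-damped reciprocal-space term 4π exp(−k²/4a²)/k², sets the k=0 term to 0, and evaluates the discrete inverse FT with an FFT for two box lengths at the same grid spacing. The function name and all parameter values are hypothetical. The result is compared with erf(ar)/r, the continuous inverse transform of the same function.

```python
import math
import numpy as np

def supercell_potential(n, box, a=1.0):
    """Discrete inverse FT of 4*pi*exp(-k^2/4a^2)/k^2 with the k=0 term set to 0."""
    k1 = 2.0 * np.pi * np.fft.fftfreq(n, d=box / n)
    kx, ky, kz = np.meshgrid(k1, k1, k1, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0                       # placeholder to avoid 0/0
    phi_k = 4.0 * np.pi * np.exp(-k2 / (4.0 * a**2)) / k2
    phi_k[0, 0, 0] = 0.0                    # drop the k=0 term
    # ifftn carries a 1/n^3 factor; the discrete sum needs 1/V instead
    return np.fft.ifftn(phi_k).real * n**3 / box**3

exact = math.erf(2.0) / 2.0                 # erf(a r)/r at r = 2 with a = 1
# Same grid spacing (0.5) in both boxes, so index 4 is r = 2 in each.
dev_small = supercell_potential(32, 16.0)[4, 0, 0] - exact
dev_large = supercell_potential(64, 32.0)[4, 0, 0] - exact
```

In this toy case the deviation from the continuous result is negative and shrinks only slowly, roughly in proportion to the inverse box length, which is the kind of behavior expected when a 1∕k2 pole at k=0 is discarded.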
Since there is a partial similarity between Eq. 33 and that used in the well-known Ewald sum13 method for handling Coulomb interactions in MD simulations, some discussion of the differences is appropriate. First, the quantity appropriate to IE theories is the interaction potential, the energy between each separate pair of molecules or sites in the ensemble. This is distinct from the Ewald sum which represents the normalized total electrostatic potential energy between all pairs of molecules or sites. As with Eq. 33, the Ewald sum also eliminates the long-ranged nature of the interactions, but does so by the physics of a lattice of a finite concentration of charges which gives the cancellation of a large number of infinite long-ranged Coulomb tails by conditional summation. The Ewald sum is over not just all the pairs of charges in the central calculation box, but also over their images in all periodic replicas. The sum over the periodic replicas is an essential feature of the Ewald sum expression and is not present in Eq. 33 (nor should it be). Since the equation used in the supercell method is a pair potential, the elimination of long-ranged tails in the method is not due to the physics of electrostatic screening as it is in the Ewald sum.
Ewald sum errors due to the periodicity of the replicas have been well researched,28, 29 but these errors are due to the difference between the periodic positions of image charges on one hand, and the value of the interaction energy that would be found for an equally large sized nonperiodic system on the other. Periodicity errors in Ewald sums are distinct from under sampling aliasing errors due to the periodic nature of discrete FTs. Ewald sums can also have aliasing errors due to insufficient numerical sampling, but this is usually due to an inadequate number of replicas or because the central calculation box is not large enough with respect to the distance dependence of screening effects. Unlike the supercell method, however, careful application of the Ewald method can eliminate the aliasing errors because the resulting function is screened and therefore not long-ranged. Because of these considerations we choose to examine the behavior of Eq. 33 simply as a mathematical approximation.
Despite being a method which changes the long-ranged part of the electrostatic potential, as we will show in the results section below, the supercell method is a surprisingly good approximation for the case of a simple neutral solute, although has known problems for ionic solutes which require correction.6, 7 To show why this is true, we cast the approximation in terms of the quantities of Sec. 2B. As Fig. 1 demonstrates, Eq. 33 represents a pruned form of the electrostatic potential and is short-ranged, so it can therefore be considered to be a candidate for the quantity from Eq. 14. When the supercell method is used, the corresponding long-ranged piece Φui is effectively set to 0 which in turn sets and the functions . While we do not write down an exact analytical expression for the missing long-ranged piece Φui, we know it will only have a significant value at small k. To find out how good an approximation this is we examine the low k behavior of Eq. 20.
For the case of a neutral solute, as k→0, , the relevant Kirkwood G, and which gives
(36) |
Note that in this region ΦuO and ΦuH must have the same functional form, differing only in being proportional to the charges of the species involved, so we will have ΦuO=−2ΦuH. Unless the values of the Φui are significantly large and the values of the are sufficiently different near to k=0, the values of the will be small in this low k region. A similar analysis of Eq. 23 shows will also be small. Therefore, a good choice in dividing the potential into long and short-ranged pieces provides a situation where almost all the correlation information is retained in the short-ranged piece.
For the case of a charged solute, however, the missing long-ranged piece will by definition have a pole at k=0, with the low k behavior of the relevant functions as follows:
(37) |
which, when used in Eq. 20, gives
(38) |
where we have used qO=−2qH. Similarly, when applied in Eq. 23 we have
(39) |
(40) |
where ϵ is the static dielectric constant and y=4πβμ2ρv∕6, with μ the dipole moment of the solvent, it is easily shown that
(41) |
This gives the dielectric behavior that should be expected in the solvent-solute correlations for ionic solutes when the exact expressions for the Φui are used. However, when the Φui are set to 0 as in the supercell method, we should expect to see unusual and incorrect dielectric behavior. This is exactly what has been reported in the literature.6, 7
An approximate 3D bridge function
Bridge diagrams often make a useful correction to HNC theories.20, 21, 22, 31 As an approximate 3D bridge function we take the sum of all the four-body bridge diagrams that represent entirely intermolecular interactions. We choose to write these diagrams in terms of total correlation functions because they are available throughout the calculation, provide bridge functions which are short-ranged and represent a multiple sum of many higher order Mayer f-bond diagrams.21 A general expression for the bridge function in terms of exact diagrams is
(42) |
where i, j, and k represent solvent sites and the rij are 3D vector variables. The PY approximation is equivalent to replacing the “difficult” bond in every bridge diagram in the exact closure with −1 for all bridge diagrams.25 For this work we will simply use this replacement in the small set of four-point intermolecular diagrams. Using this approximation Eq. 42 becomes
(43) |
Since the hij(rij) only depend on |rij|, each term in the final expression in Eq. 43 is easily calculated by the usual method of calculating the product in Fourier space, with the result being a 3D approximate bridge function. Note that since the hij(rij) and Hui(rui) are short-ranged, Buk(ruk) is also short-ranged. Since the total correlation functions change as the solution progresses, the Buk(ruk) need to be updated regularly throughout the numerical procedure, as discussed below. When this approximation is employed Eq. 4 can be written as
(44) |
which we refer to as the HNCB closure.
Thermodynamic quantities
We will examine excess internal energies and Kirkwood G’s for each solvent site, calculated using17
(45) |
where i represents each solvent site and
(46) |
RESULTS
Method of solution
The solvent-solvent total correlation functions hij(rij) are calculated first using the DRISM∕HNC IE theory11 for the TIP3P (Ref. 32) model of water on a radial grid with Δr=0.01 Å and 16 384 points. Neither this resolution nor this range is needed to provide an accurate set of functions, but using them has two advantages. First, we will be interpolating a radial result for the long-range contribution onto an (x,y,z) coordinate system, so this fine grid will provide more significant figures in the interpolated data. Second, such a long range for the functions associated with the Coulomb interactions provides a solution over a range effectively much larger than that of the main 3D data set itself. It remains necessary only to have the main data sets large enough to accurately describe the additional short-ranged structure of the solvent. Once the hij(rij) have been obtained they are used to calculate the θαi(rαi), which are then interpolated onto the 3D grid to form the Θui(x,y,z) as described in Sec. 2C. In practice, splines or higher order interpolation schemes have hardly noticeable advantages over linear interpolation at this resolution; the extra information carried by the fine grid provides the required accuracy. The remaining short-ranged contribution to the potential is also precalculated.
Once the fixed data sets have been stored, a modified direct inversion of the iterative subspace (MDIIS) algorithm33 is used to find a solution to the main equations, as given in Sec. 2B. If the bridge functions described in Sec. 2E are being used, it is necessary to update them throughout the main iterative procedure. While it is not necessary to update them at each MDIIS cycle, the approach to solution appears to be more efficient if that is done.
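The idea behind subspace-based acceleration of such an iterative cycle can be sketched with a plain Pulay-type DIIS loop. This is a generic illustration and not the MDIIS algorithm of Ref. 33: the function name `diis_solve`, the history depth, and the toy linear map standing in for one OZ-plus-closure cycle are all assumptions made for the example.

```python
import numpy as np

def diis_solve(g, x0, depth=5, tol=1e-10, max_iter=200):
    """Find x = g(x) with plain Pulay DIIS (not the modified MDIIS variant)."""
    outs, ress = [], []
    x = np.asarray(x0, dtype=float)
    for it in range(max_iter):
        gx = g(x)
        res = gx - x
        if np.linalg.norm(res) < tol:
            return x, it
        outs.append(gx)
        ress.append(res)
        outs, ress = outs[-depth:], ress[-depth:]
        m = len(ress)
        # Minimise |sum_i c_i r_i|^2 subject to sum_i c_i = 1 (Lagrange system)
        overlap = np.array([[np.dot(ri, rj) for rj in ress] for ri in ress])
        b = np.zeros((m + 1, m + 1))
        b[:m, :m] = overlap / overlap.max()   # rescale for conditioning
        b[:m, m] = 1.0
        b[m, :m] = 1.0
        rhs = np.zeros(m + 1)
        rhs[m] = 1.0
        coeff = np.linalg.lstsq(b, rhs, rcond=None)[0][:m]
        x = sum(c * o for c, o in zip(coeff, outs))
    raise RuntimeError("DIIS did not converge")

# Toy contraction mapping g(x) = A x + b (hypothetical stand-in).
A = np.array([[0.2, 0.1, 0.0], [0.1, 0.3, 0.1], [0.0, 0.1, 0.4]])
b_vec = np.array([1.0, 2.0, 3.0])
x_diis, n_iter = diis_solve(lambda x: A @ x + b_vec, np.zeros(3))
x_ref = np.linalg.solve(np.eye(3) - A, b_vec)
```

In a 3D calculation the vectors would be the flattened short-ranged correlation functions, and quantities such as the bridge functions would be refreshed inside the function that performs one cycle.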
Results for a simple model
Before turning to a more interesting model we will examine the effects of applying the main result of this paper: solving the 3D-RISM∕HNCB equations using exact Coulombic potentials. To do this we will examine a simple model: a single water molecule as a solute in a bath of water of the same model. We wish to discover how calculating only the short-ranged part of the correlation functions in the 3D data structures affects the range and grid spacing necessary to achieve converged values of thermodynamic and structural quantities. We also will compare the results to those calculated by MD simulation and those using the supercell method, the Kovalenko–Hirata (KH) closure and the approximate 3D bridge diagram we have proposed.
The water MD simulation was performed using the same TIP3P water model as for the IE systems, with one water fixed with its charge center at the origin of the calculation box, and its dipole oriented along the z-axis. The system was composed of 216 water molecules in an 18.622 Å cubic box. The simulation was performed in the microcanonical (NVE) ensemble at 300 K. The simulation was performed using the extended system program (ESP).34 The bonds were kept rigid using the Rattle35 implementation of the Shake method36 and the electrostatic interactions were evaluated using the Ewald sum technique.13 The equations of motion were integrated using the Velocity Verlet algorithm37 with a 2 fs time step and the coordinates were saved every 25 steps. For the calculation of the Hui(x,y,z) the simulation box was unwrapped, and atomic positions were tabulated in a plane of 0.4 Å cubes centered on the plane (x=0, y=0) containing the atoms of the central fixed water molecule.
Grid spacing and data range
We now consider the effect grid spacing has on the correlation functions. Figure 2 shows a line of data, extracted from the full 3D result for HOO(x,y,z), with x and y held constant at a value of 0. This line represents the line of symmetry through the water molecule that passes through the oxygen atom and bisects the two hydrogen atoms. The figure shows results calculated using a fixed range in all three dimensions, 19.2 Å, while varying the grid spacing. The coarsest grid (0.8 Å, 24³ points) represented by the circles is rather inaccurate, but the next coarsest, (0.4 Å, 48³ points) represented by diamonds, is surprisingly good despite the relatively small number of points in each dimension. While the dashed line connecting the calculated points does not approximate the finer grid results very well, the actual data points lie very close to the finer grid results everywhere. The three finer grids are only distinguishable on the scale of the graph at the tips of the peaks and in the very steepest regions.
Table 1 gives the results of excess internal free energy per particle, β⟨Uex⟩∕N, for ten different numbers of points and four different grid spacings. What is remarkable here is the insensitivity of this thermodynamic quantity to grid spacing, given a fixed data range. For instance, the value for Δr=0.4 Å and a box size of 96³ points differs by only about 0.4% (−19.52 versus −19.44) from the converged result using a grid spacing of Δr=0.1 Å and a box size of 384³ points. It should be noted that these two results have the same range in each of the three spatial dimensions, 38.4 Å. In fact, all the results throughout the table that represent the same spatial range are remarkably similar, except when the grid spacing is so large (Δr=0.8 Å) that it cannot adequately sample even the larger details of the result for the model.
Table 1.
Box dimension | Δr = 0.10 Å | Δr = 0.20 Å | Δr = 0.40 Å | Δr = 0.80 Å
---|---|---|---|---
384³ | −19.44 | ⋯ | ⋯ | ⋯
256³ | −19.44 | ⋯ | ⋯ | ⋯
192³ | −19.46 | −19.48 | ⋯ | ⋯
128³ | −19.51 | −19.57 | ⋯ | ⋯
96³ | −19.48 | −19.64 | −19.52 | ⋯
64³ | ⋯ | −19.77 | −19.72 | ⋯
48³ | ⋯ | −19.94 | −19.90 | −19.50
32³ | ⋯ | −17.43 | −20.21 | −19.96
24³ | ⋯ | ⋯ | −20.83 | −20.42
16³ | ⋯ | ⋯ | −20.79 | −21.20
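Because the spatial range of each calculation is the number of points per dimension times the grid spacing, the entries of Table 1 can be regrouped by range rather than by row. The short script below is an illustrative aid, not part of the original analysis; the dictionary simply transcribes the table, and the names `table1` and `group_by_range` are introduced here for the example.

```python
# Excess internal energy values transcribed from Table 1,
# keyed by (points per dimension, grid spacing in angstroms).
table1 = {
    (384, 0.1): -19.44, (256, 0.1): -19.44, (192, 0.1): -19.46,
    (128, 0.1): -19.51, (96, 0.1): -19.48,
    (192, 0.2): -19.48, (128, 0.2): -19.57, (96, 0.2): -19.64,
    (64, 0.2): -19.77, (48, 0.2): -19.94, (32, 0.2): -17.43,
    (96, 0.4): -19.52, (64, 0.4): -19.72, (48, 0.4): -19.90,
    (32, 0.4): -20.21, (24, 0.4): -20.83, (16, 0.4): -20.79,
    (48, 0.8): -19.50, (32, 0.8): -19.96, (24, 0.8): -20.42,
    (16, 0.8): -21.20,
}

def group_by_range(table):
    """Group entries by spatial range L = N * dr (angstroms)."""
    groups = {}
    for (n, dr), value in table.items():
        # round to one decimal so that equal ranges share one key
        groups.setdefault(round(n * dr, 1), []).append((dr, value))
    return {L: sorted(v) for L, v in sorted(groups.items())}

by_range = group_by_range(table1)
for L, entries in by_range.items():
    print(f"L = {L:5.1f} A:", ", ".join(f"dr={dr}: {u}" for dr, u in entries))
```

Reading the output one range at a time places side by side the values that the text compares, for example the four entries that share the 38.4 Å range.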
Table 2 gives the Kirkwood G’s for the same range of box sizes and grid spacings as the internal energy. The same general trend of the data sampling range being far more important than grid spacing or number of data points is evident for this quantity as well. The insensitivity to grid spacing is not as strong, however, as for internal energy. This is probably because the integrand for internal energy is quite a long-ranged function, due to the displacement difference between the positive and negative Coulombic potential contributions [Eq. 45]. This functional form will extend several molecule diameters into space, but will be a smooth function without detail smaller than 0.4 Å. By comparison, the integrand for the Kirkwood G’s, which are simply a summation over the molecule distributions [Eq. 46], is short-ranged, and the detail of the shape of the excluded volume and local contact peaks will need to be sampled fairly accurately. It should also be reiterated that the results in Tables 1 and 2 all use the same precalculated θαi(rαi) functions, which are calculated on a very fine grid and extend approximately 50 Å into the surrounding solvent, giving a greater effective resolution at both short and long range.
Table 2.
Box dimension | Δr = 0.10 Å | Δr = 0.20 Å | Δr = 0.40 Å | Δr = 0.80 Å
---|---|---|---|---
384³ | −37.87 | ⋯ | ⋯ | ⋯
256³ | −37.93 | ⋯ | ⋯ | ⋯
192³ | −38.57 | −37.88 | ⋯ | ⋯
128³ | −40.58 | −37.95 | ⋯ | ⋯
96³ | −32.46 | −38.52 | −37.87 | ⋯
64³ | ⋯ | −40.34 | −37.95 | ⋯
48³ | ⋯ | −33.29 | −38.38 | −37.86
32³ | ⋯ | −47.01 | −39.80 | −37.98
24³ | ⋯ | ⋯ | −35.01 | −38.09
16³ | ⋯ | ⋯ | −39.16 | −38.61
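The argument that a smooth, long-ranged integrand is far more sensitive to data range than to grid spacing can be illustrated with a toy quadrature. The sketch below integrates a Gaussian of width 3 Å over cubic boxes with a midpoint rule; the Gaussian is a stand-in chosen for the example and is not the integrand of Eq. 45 or Eq. 46, and the function name `box_integral` is likewise an assumption.

```python
import math

def box_integral(n, dr, sigma=3.0):
    """Midpoint-rule integral of exp(-r^2 / (2 sigma^2)) over a cube of
    n^3 cells of width dr centered on the origin.

    The Gaussian is separable, so the 3D sum is the cube of a 1D sum.
    """
    one_d = sum(
        math.exp(-((i + 0.5 - n / 2) * dr) ** 2 / (2 * sigma**2)) * dr
        for i in range(n)
    )
    return one_d**3

exact = (2 * math.pi * 3.0**2) ** 1.5     # integral over all space

same_range_fine = box_integral(48, 0.4)    # 19.2 A range
same_range_coarse = box_integral(24, 0.8)  # 19.2 A range, coarser grid
short_range = box_integral(24, 0.4)        # 9.6 A range, fine spacing

for label, value in [("48^3, 0.4", same_range_fine),
                     ("24^3, 0.8", same_range_coarse),
                     ("24^3, 0.4", short_range)]:
    print(f"{label}: {value / exact:.4f} of the full integral")
```

For this smooth function, doubling the spacing at fixed range changes the result very little, whereas halving the range at fixed spacing loses a large fraction of the integral. An integrand with sharp excluded-volume and contact features, as described for the Kirkwood G’s, would not be expected to be this forgiving of coarse spacing.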
Comparison to the supercell method
The supercell method, examined in Sec. 2D, modifies the long-ranged Coulomb tails in the expressions for the solvent-solute interaction potential. Despite this, as discussed above, the method should be expected to provide reasonable solutions for neutral solutes. For our model of TIP3P water as solute in a solvent of the same model, the method provides excess internal energies which are surprisingly close to the values from the exact method, as a comparison of Tables 1 and 3 shows. When the data range is sufficiently large the values from the supercell method are converged, although they converge to a slightly different value, probably because the functions in question, while close to 0 in the region near k=0 for the exact case, are not identically 0 there as they are for the supercell method. The supercell results are a little more sensitive to box size, as would be expected. Perhaps the most interesting difference between the methods is the difficulty in achieving supercell results. As shown in Table 3, there are several box size and data range combinations for which there appears to be no solution for the supercell method. Even where solutions can be achieved they are far more sensitive to functional starting point and adjustment of input parameters. Solutions typically require four to ten partial solution steps, in which an input parameter is adjusted gradually and the functions are converged and then used as a starting point for the next parameter value. By comparison, the method given in this work converges readily and quickly in a single MDIIS convergence cycle, and is fairly insensitive to starting point.
Table 3.
Box dimension | Δr = 0.10 Å | Δr = 0.20 Å | Δr = 0.40 Å | Δr = 0.80 Å
---|---|---|---|---
384³ | −19.16 | ⋯ | ⋯ | ⋯
256³ | −19.16 | ⋯ | ⋯ | ⋯
192³ | −19.14 | −19.19 | ⋯ | ⋯
128³ | −19.02 | −19.24 | ⋯ | ⋯
96³ | −18.73 | −19.27 | −19.19 | ⋯
64³ | ⋯ | −19.22 | −19.33 | ⋯
48³ | ⋯ | −19.01 | −19.44 | NS
32³ | ⋯ | NS | −20.56 | −22.79
24³ | ⋯ | ⋯ | −19.54 | −22.08
16³ | ⋯ | ⋯ | NS | −21.25
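The stepwise procedure needed for the supercell solutions, in which a parameter is ramped gradually and each converged result seeds the next step, has the control flow sketched below. The example is deliberately a toy: it solves the scalar equation x = exp(−λx) by damped Picard iteration, which is neither the MDIIS solver nor the 3D IE system, and all names (`damped_fixed_point`, `continuation`, `lam`) are hypothetical.

```python
import math

def damped_fixed_point(lam, x0, mix=0.5, tol=1e-12, max_iter=10_000):
    """Solve x = exp(-lam * x) by damped (mixed) Picard iteration."""
    x = x0
    for _ in range(max_iter):
        x_new = (1.0 - mix) * x + mix * math.exp(-lam * x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError(f"no convergence for lam={lam}")

def continuation(lam_target, n_steps, x0=1.0):
    """Ramp the parameter from lam_target/n_steps up to lam_target,
    reusing each converged solution as the next starting point."""
    x = x0
    for step in range(1, n_steps + 1):
        lam = lam_target * step / n_steps
        x = damped_fixed_point(lam, x)
    return x

x = continuation(lam_target=5.0, n_steps=5)
print(x, math.exp(-5.0 * x))   # the two numbers agree at the fixed point
```

The toy equation is easy enough to solve in one pass, so it only illustrates the bookkeeping. In a stiff problem each intermediate solve is what keeps the next starting point within the convergence basin.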
We do not compare supercell and current method results for charged solutes here because the problems with the supercell results have already been discussed in the literature.6, 7 Charged solutes provide no special problems for the method presented here, and solutions are possible for systems with large charges, such as the protein BPTI discussed below with its overall charge of +6e. When a solution is obtained for a single atom ion, the results are identical to those obtained from 1D IE methods.
Comparison of closures
Figure 3 provides a comparison of the three closures considered here to the simulation result for a TIP3P water solute surrounded by water as solvent. The figure shows HuO(x,y,z), the distribution of water oxygen sites around the central fixed molecule. All IE results here are for a 0.1 Å, 256³ point grid, and are plotted on the same scale for comparison.
Figure 3a shows the simulation result, with two large contact peaks at the position of closest approach to the solute water hydrogen atoms. There is a smaller solvent separated peak near the solute oxygen atom. Figure 3b shows that the HNC closure provides the closest approximation, with peaks that are somewhat lower than those from the simulation. Adding the approximate bridge function to form the HNCB closure lowers them slightly more [Fig. 3c], while the highest peaks from the KH closure are only 15% of the simulation values.
For some models of both ionic liquids11 and multisite molecules18 the HNC closure gives values which slightly overestimate the correlation for contact peaks. This is also true for the large biological molecule model we will discuss below. The KH closure and the approximate bridge function we propose here were both developed to deal with situations where the overcorrelation provided by the HNC closure is so severe that there appears to be no solution to the equations. To our knowledge the KH closure has not been used when the HNC solution was available, but here we have a clear counterexample of the usual trend, showing that an accurate bridge function will need to be able to enhance correlations for some models as well as depress them for others. It can easily be demonstrated that there exist models for which our proposed approximate bridge diagram would correct the HNC peaks. Unfortunately this water model is not one of them. However, the KH closure can only decrease the size of correlation peaks for all models, and will do so for almost all peaks, sometimes quite drastically. For this model the HNCB closure, being the less severe modification of the HNC closure, provides the better result of the two.
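The statement that the KH closure can only lower peaks relative to HNC follows directly from the standard closure forms, which can be written in a few lines. In the sketch below χ stands for the closure exponent −βu + h − c at a single grid point; the symbol names, the sign convention for the bridge term, and the evaluation at a fixed χ rather than at self-consistency are all assumptions of the example.

```python
import math

def hnc(chi, bridge=0.0):
    """HNC closure: h = exp(chi + b) - 1, with chi = -beta*u + h - c.
    A nonzero `bridge` gives an HNC-plus-bridge form."""
    return math.exp(chi + bridge) - 1.0

def kh(chi):
    """Kovalenko-Hirata closure: HNC where chi <= 0,
    linearized (h = chi) where chi > 0."""
    return math.exp(chi) - 1.0 if chi <= 0.0 else chi

for chi in (-2.0, 0.0, 0.5, math.log(5.0)):
    print(f"chi={chi:+.3f}  g_HNC={1 + hnc(chi):.3f}  g_KH={1 + kh(chi):.3f}")
```

Since χ < exp(χ) − 1 for every χ > 0, the KH value never exceeds the HNC value at the same exponent, and the gap widens rapidly with peak height: an exponent that gives g = 5 under HNC gives g ≈ 2.6 under KH. In a converged calculation the exponents themselves differ between closures, so this is a pointwise comparison only.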
Comparison of methods for a protein
We wish to show that the general procedure for handling the long-ranged Coulombic tails is easily extended to larger molecules. As an example we take BPTI, which is a small, but fairly rigid, globular protein with 58 amino acid residues. Since our IE results examine rigid molecules, we choose for comparison the time averaged structure from an MD trajectory conducted in the NVT ensemble. For that MD simulation of BPTI, the original coordinates of the macromolecule, including the four internal water molecules, were taken from an x-ray structure38 (PDB ID: 5PTI). A new trajectory has been calculated holding the BPTI model in this fixed conformation immersed in water, using the same TIP3P water model as in the previous section. The fixed BPTI molecule was positioned with its center of charge at the origin and its dipole aligned with the z-axis, with all atom positions at the same coordinate values as for the IE calculations. The system was solvated with 4151 water molecules, after overlaying the x-ray structure and repositioning the four internal water molecules into the average MD protein structure, for a total of 4155 waters, and six chloride ions were added to neutralize the system. The final box was 48.0 Å×50.0 Å×54.0 Å. The all-atom CHARMM (version 27) molecular mechanics force field39 was used to define the interactions. For this calculation the macromolecule was held fixed at the average structure from the previous NVT simulation; however, all the other atoms, including the internal water molecules, were allowed to move. The system was minimized and equilibrated in the NVE ensemble at 300 K. All other simulation details were the same as for the water simulation described above. The BPTI Hui(x,y,z) functions were obtained by tabulating positions in a plane of 0.4 Å cubes centered about the x=3.6 Å plane in the molecular coordinates described above. The bulk density was obtained from a 20.0 Å cubic box cut from the upper corner of the simulation box; this is in a region away from the protein.
IE results for the same model protein structure at infinite dilution in TIP3P DRISM∕HNC water were calculated. Calculations with data structures of up to 384³ points were performed, but as with the water results, grid spacing and range were converged with a system using 128³ points, spaced evenly in all dimensions, with a grid spacing of 0.4 Å. For this model no HNC solution was possible. For this solute model the HNC closure gives a large overcorrelation which has catastrophic numerical effects in regions near the surface of the protein where several strong local interactions reinforce each other. This problem is well known and is one of the main motivations for the application of the KH closure,6, 8 as well as our proposal of an approximate 3D bridge diagram. A numerical solution can be found using both of these approaches.
As a sample structural comparison between these two IE results and that from MD simulation, we plot the results one grid width wide centered on a plane cut through the data. Using the alignment of the BPTI model described above, a plane 3.6 Å away from the molecular dipole and center of charge was chosen because of the presence of water distribution peaks on the interior of the protein. However, the qualitative differences between the three results are similar regardless of which particular plane is chosen.
The results are given in Figs. 4a, 4b, 4c and, as with the water results above, the result of the HNCB calculation is qualitatively much closer to that of the MD simulation. Once again the linearization of the exponential relationship between the distribution and the potential of mean force in the KH closure leads to peaks in the contact region that are between two and ten times smaller than those of the simulation. Since the KH closure primarily affects the larger contact peaks, and the 3D bridge diagram only gives a significant contribution in the contact regions, these two results appear more similar in the second and surrounding solvation shells.
BPTI is an example of the possibility of biological molecules having internal regions which energetically accommodate solvent molecules favorably. With simulations, due to initial trajectory conditions, such sites may never be populated during the typical time scales on which simulations are carried out. Or, if such internal cavities are populated at the beginning of a simulation, they are likely to remain populated throughout the simulation. For the simulation used here, whether or not to populate a site was determined by the presence of an internal water molecule in the experimental crystal structure of the molecule, so for this simulation there are four interior water molecules that were positioned in cavities as an initial condition, and as expected, they had no opportunity to migrate out into the surrounding solvent for the duration of the simulation. If in a real system these cavities are populated part of the time it is unlikely that we would get an accurate picture from simulations unless the time scales become prohibitively long.
By contrast, since IE methods sample the entire ensemble of system configurations in all regions of space, they will show a finite probability for the population of internal cavities whenever the equation system used gives favorable energetics in the cavity. There are however two problems with the results from IE methods. First, if there is no energetic path to the population of a model cavity, such as a completely enclosed hard shell, they will incorrectly show internal population anyway. Second, approximate theories like the ones used here will tend to overpopulate small internal cavities. The reason for the overpopulation is similar to the problem of overcorrelation with the HNC closure. Missing diagrams in the theory represent incorrectly weighted correlations, and these are typically most noticed in contact regions, so should be expected to be important in regions completely surrounded by solute sites at close range. Diagrams are missing in all orders of density for RISM-like theories, including zeroth order terms contributing directly to the correlation between solvent and solute. Some of these diagrams could provide large repulsive contributions to the potentials of mean force in these regions. In particular, only a complete set of correct diagrams will provide the correct topology; if some are missing, especially in site-site theories, correlations with partial molecules could be implied. If IE results are significantly dependent on choice of closure for these small internal regions, it would be an indication that we should not rely on the prediction of how much solvent should be in the cavity, or even if there should be any there at all.
The cross-sectional plane chosen for Fig. 4 shows two internal regions for which both the KH closure and HNCB closure show the presence of solvent molecules, but the simulation only shows the presence of solvent in one of these regions. This second region had no water molecules present in the experimental x-ray structure. In the internal region where all three results show the presence of solvent, the HNCB peak is closer in both height and width to that of the simulation than is the KH result, but both IE results show the presence of solvent in a region of significantly larger spatial extent than does the MD result. The overcorrelation effect of HNC-like closures appears to be a significant factor in this entirely enclosed region. While the KH closure does not allow the correlations to grow large, it does not remove smaller correlations in regions (or parts of the region) where an exact closure may forbid a nonzero solvent density. The HNCB approximation decreases the width of the peak somewhat, but not enough to make it as narrow as that of the MD result. This is important because it is not clear whether the other internal peak, which the MD simulation does not find, should be there or not. Its spatial extent is significantly smaller for both IE results than is the other internal peak, and the HNCB peak is narrower (although taller) than the KH result. Such a large difference in the correlations between closures is an indication that if the exact bridge functions were available these internal peaks may no longer appear. For most model systems, we can rely on simulations to provide a result that is close to exact for the model, but this is a situation where we do not know if the internal peaks should be there or not. Some of the problem may be ascribed to an issue of ensemble choice. The simulations were performed in the NVE ensemble, where interior cavities can only be populated or depopulated if the protein unfolds enough to allow a water molecule to move in or out, which is an extremely unlikely event. Testing the occupancy rate of an interior cavity would best be done in a semigrand ensemble.40 Getting reliable predictions from IE’s in this type of situation will require more accurate correlation functions.41, 42 We will consider both these approaches in the future.
CONCLUSIONS
We have presented a method for resumming the exact Coulomb interactions for 3D IE systems, with application to protein solvation. The solutions are easily and efficiently calculated, for both small and large molecules. While the supercell method has been interpreted in terms of the considerations of Ewald sums,6, 7, 26 we have analyzed it simply as a mathematical approximation, showing why it should be expected to work well for neutral solutes and poorly for charged solutes, as has been reported in the literature.6, 7 Comparison of thermodynamic quantities from exact and supercell methods shows small differences for a neutral molecule, but the supercell method results were computationally more difficult or in a few cases impossible to obtain.
We have also proposed a simple 3D bridge function, based on the lowest density order missing intermolecular diagrams and the PY approximation. This bridge function is combined with the HNC closure to form the HNCB closure, and results from HNC, HNCB, and KH closures have been evaluated by comparison to MD simulation results. For the simple model of a water molecule fixed in a water solvent the HNC closure gave better results. For the BPTI protein model no HNC solutions were possible, but the HNCB result was much closer to the MD result than the one calculated from the KH closure which greatly underestimated the correlations at contact.
It has been proposed that simulations are not a good method for investigating the occupancy of interior protein cavities and 3D IEs show promise in this regard.10 We have shown that both the size and height of occupancy peaks in BPTI cavities are strongly closure dependent, indicating that they are strongly dependent on the large number of low order missing diagrams which are a problem inherent in all 3D-RISM based IE theories, and that predicting the population of internal cavities should be approached with caution. The resolution of this problem will need advances in theories which use more accurate correlation functions41, 42 as well as comparison to simulations using semigrand ensembles which are more suited to the problem.40
Full angular dependent site-site methods have only been tested on dipolar diatomic molecular fluids.24 The extension of such dielectrically reliable methods to more complex systems like aqueous solutions would allow an interesting extension of the methodological family explored here. A 3D grid where every point was a fully angularly dependent function of the site densities could be used to accurately represent the electrostatic properties involved in the fine balance between hydrophobic and hydrophilic forces in molecular recognition of proteins.
ACKNOWLEDGMENTS
The authors thank Dr. Kippi Dyer and Professor Benoit Roux for helpful discussions and suggestions. This work was supported in part by the National Institutes of Health (GM 037657), the Robert A. Welch Foundation (E-1028), and a training fellowship to J.J.H. from the Keck Center for Computational and Structural Biology of the Gulf Coast Consortia (NLM Grant No. 5T15LM07093). This research was performed in part using the Molecular Science Computing Facility in the William R. Wiley Environmental Molecular Sciences Laboratory, located at the Pacific Northwest National Laboratory and in part by the National Science Foundation through TeraGrid resources provided by Pittsburgh Supercomputing Center.
References
- Beglov D. and Roux B., J. Chem. Phys. 103, 360 (1995). doi:10.1063/1.469602
- Beglov D. and Roux B., J. Chem. Phys. 104, 8678 (1996). doi:10.1063/1.471557
- Ikeguchi M. and Doi J., J. Chem. Phys. 103, 5011 (1995). doi:10.1063/1.470587
- Cortis C. M., Rossky P. J., and Friesner R. A., J. Chem. Phys. 107, 6400 (1997). doi:10.1063/1.474300
- Beglov D. and Roux B., J. Phys. Chem. B 101, 7821 (1997). doi:10.1021/jp971083h
- Kovalenko A. and Hirata F., J. Chem. Phys. 112, 10391 (2000). doi:10.1063/1.481676
- Kovalenko A. and Truong T. N., J. Chem. Phys. 113, 7458 (2000). doi:10.1063/1.1313388
- Kovalenko A. and Hirata F., J. Chem. Phys. 110, 10095 (1999). doi:10.1063/1.478883
- Imai T., Kovalenko A., and Hirata F., Chem. Phys. Lett. 395, 1 (2004). doi:10.1016/j.cplett.2004.06.140
- Imai T., Hiraoka R., Kovalenko A., and Hirata F., J. Am. Chem. Soc. 127, 15334 (2005). doi:10.1021/ja054434b
- Perkyns J. S. and Pettitt B. M., J. Chem. Phys. 97, 7656 (1992). doi:10.1063/1.463485
- Perkyns J. S. and Pettitt B. M., Biophys. Chem. 51, 129 (1994). doi:10.1016/0301-4622(94)00056-5
- de Leeuw S. W., Perram J. W., and Klein M. L., Proc. R. Soc. London, Ser. A 373, 27 (1980). doi:10.1098/rspa.1980.0135
- Allen M. P. and Tildesley D. J., Computer Simulation of Liquids (Oxford University Press, Oxford, 1987).
- Ng K.-C., J. Chem. Phys. 61, 2680 (1974). doi:10.1063/1.1682399
- Lado F., Mol. Phys. 31, 1117 (1976). doi:10.1080/00268977600100851
- Hansen J. P. and McDonald I. R., Theory of Simple Liquids, 2nd ed. (Academic, London, 1986).
- Perkyns J. S. and Pettitt B. M., J. Am. Chem. Soc. 118, 1164 (1996). doi:10.1021/ja952392t
- Attard P. and Patey G. N., J. Chem. Phys. 92, 4790 (1990).
- Perkyns J. and Pettitt B. M., Theor. Chem. Acc. 96, 61 (1997). doi:10.1007/s002140050205
- Perkyns J. S., Dyer K. M., and Pettitt B. M., J. Chem. Phys. 116, 9404 (2002). doi:10.1063/1.1473660
- Dyer K. M., Perkyns J. S., and Pettitt B. M., J. Chem. Phys. 116, 9413 (2002). doi:10.1063/1.1473661
- Kovalenko A. and Hirata F., J. Chem. Phys. 113, 9830 (2000). doi:10.1063/1.1321039
- Dyer K. M., Perkyns J. S., and Pettitt B. M., J. Chem. Phys. 123, 204512 (2005). doi:10.1063/1.2116987
- Gurikov Y. V. and Khim Z. F., Russ. J. Phys. Chem. 56, 4970 (1982).
- Kovalenko A., in Molecular Theory of Solvation, edited by Hirata F. (Kluwer Academic, New York, 2003), Chap. 4, p. 169.
- Perkyns J. S. and Pettitt B. M., Chem. Phys. Lett. 190, 626 (1992). doi:10.1016/0009-2614(92)85201-K
- Hünenberger P. H. and McCammon J. A., J. Chem. Phys. 110, 1856 (1999). doi:10.1063/1.477873
- Smith P. E. and Pettitt B. M., J. Am. Chem. Soc. 113, 6029 (1991). doi:10.1021/ja00016a015
- Høye J. S. and Stell G., J. Chem. Phys. 65, 18 (1976). doi:10.1063/1.432793
- Perkyns J. and Pettitt B. M., Theor. Chem. Acc. 99, 207 (1998). doi:10.1007/s002140050325
- Jorgensen W. L., Chandrasekhar J., and Madura J. D., J. Chem. Phys. 79, 926 (1983). doi:10.1063/1.445869
- Kovalenko A., Ten-no S., and Hirata F., J. Comput. Chem. 20, 928 (1999).
- ESP: Extended Systems Program, Copyright of the University of Houston.
- Andersen H. C., J. Comput. Phys. 52, 24 (1983). doi:10.1016/0021-9991(83)90014-1
- Ryckaert J.-P., Ciccotti G., and Berendsen H. J. C., J. Comput. Phys. 23, 327 (1977). doi:10.1016/0021-9991(77)90098-5
- Swope W. C., Andersen H. C., Berens P. H., and Wilson K. R., J. Chem. Phys. 76, 637 (1982). doi:10.1063/1.442716
- Wlodawer A., Walter J., Huber R., and Sjolin L., J. Mol. Biol. 180, 301 (1984). doi:10.1016/S0022-2836(84)80006-6
- MacKerell A. D. Jr., Bashford D., Bellott M., Dunbrack R. L. Jr., Evanseck J. D., Field M. J., Fischer S., Gao J., Guo H., Ha S., Joseph-McCarthy D., Kuchnir L., Kuczera K., Lau F. T. K., Mattos C., Michnick S., Ngo T., Nguyen D. T., Prodhom B., Reiher W. E. III, Roux B., Schlenkrich M., Smith J. C., Stote R., Straub J., Watanabe M., Wiórkiewicz-Kuczera J., Yin D., and Karplus M., J. Phys. Chem. B 102, 3586 (1998). doi:10.1021/jp973084f
- Lynch G. C. and Pettitt B. M., J. Chem. Phys. 107, 8594 (1997). doi:10.1063/1.475012
- Dyer K. M., Perkyns J. S., Stell G., and Pettitt B. M., J. Chem. Phys. 129, 104512 (2008). doi:10.1063/1.2976580
- Dyer K. M., Perkyns J. S., Stell G., and Pettitt B. M., Mol. Phys. 107, 423 (2009). doi:10.1080/00268970902845313