Abstract
The proximal distribution function (pDF) quantifies the probability of finding a solvent molecule in the vicinity of solutes. The approach constitutes a hierarchically organized theory for constructing approximate solvation structures around solutes. Given the assumption of universality of atom cluster-specific solvation, reconstruction of the solvent distribution around arbitrary molecules provides a computationally convenient route to solvation thermodynamics. Previously, such solvent reconstructions usually considered the contribution of the nearest-neighbor distribution only. We extend the pDF reconstruction algorithm to terms including next-nearest-neighbor contribution. As a test, small molecules (alanine and butane) are examined. The analysis is then extended to include the protein myoglobin in the P6 crystal unit cell. Molecular dynamics simulations are performed, and solvent density distributions around the solute molecules are compared with the results from different pDF reconstruction models. It is shown that the next-nearest-neighbor modification significantly improves the reconstruction of the solvent number density distribution in concave regions and between solute molecules. The probability densities are then used to calculate the solute–solvent non-bonded interaction energies including van der Waals and electrostatic, which are found to be in good agreement with the simulated values.
I. INTRODUCTION
Knowledge of protein hydration in an aqueous solution is important for understanding the role of water in protein structures and functions.1,2 The impact of water on protein structures is well-appreciated.3,4 Experimental methods, such as x-ray and neutron diffraction and nuclear magnetic resonance (NMR) spectroscopy, provide some observations about structural and functional roles of water molecules surrounding proteins.5 Insight about hydration has been obtained from theoretical calculations and computational methods,6 providing an interpretation of experimental results.7
The probability distributions of an aqueous solvent around a protein can be characterized in a variety of ways. A fundamental approach to this problem can be based on the protein–solvent atomic radial pair correlation functions.8,9 However, for polyatomic solutes like proteins, due to the complex structure and low symmetry, the details of the solvent structure are not easily interpretable by traditional radial distribution functions.10,11 An alternative is to consider the full three-dimensional distribution of the solvent to interpret the solvent features. Three-dimensional distributions of the solvent around a protein have been considered experimentally5,12 and theoretically for some time.7,13,14 Solvent distributions, from continuum to atomically detailed, can be used in a variety of methods including mixed implicit–explicit representations for electronic structure calculations of solutes in a solvent environment.15,16
Given the solvent–solute 3D probability distribution, solution thermodynamic properties are an important target for calculation.17,18 One approach is grid inhomogeneous solvation theory (GIST), which provides approximate thermodynamics from the molecular distribution functions derived from explicit solvent molecular dynamics (MD) simulations19 and can be used to calculate local thermodynamic features of the system.20 Although GIST provides a detailed approximation of the solvent number density, it requires sampling configurations within a given ensemble, which is computationally expensive. Here, we consider a family of methods that rely on a reconstructed solvent distribution to avoid a new simulation for each system of interest.
Formally, one can consider the hierarchy of many-body distribution functions between the solute and solvent from cluster theory.9 A variety of such constructs exist, but the two most useful have been physical cluster theory and mathematical cluster theory. Here, we use the former where we consider a partition function for the 3D solvent distribution around a solute written in terms of correlations involving nearest-neighbors, next-nearest-neighbors, etc. Once the components are projected from a full distribution, it has been noted that within a chemical series of compounds, such as alkanes, polypeptides, polynucleotides, etc., strong similarities exist between correlations involving similar chemistries.21 Given approximate universality, one can then use a library of physical cluster correlation components between solute atoms and solvent to reconstruct an approximate distribution of the solvent around chemically related molecules. This approximate universality allows us to predict the hydration of complex biological macromolecules from the proximal distribution functions (pDF, also referred to as pRDF elsewhere) of a set of small model compound solute molecules with similar chemistry and atom types,22–26 where besides the accuracy of the results, the fast performance of pDF is significant.27 While the first order (nearest-neighbor) approximation pDF model predicts general aspects of the solvent structure, neglecting the effects from the higher-order terms in the proximity hierarchy (next-nearest solute atoms, next–next nearest, etc.) can lead to differences in probability densities of reconstructions with MD simulations of the same potential, often within interfacial regions.28 A general approach is based on the closely related quasi-component distribution functions.29 To the lowest order, these methods select a set of solute atoms based on their proximity and calculate the corresponding pDFs from simulation24,25,30 or experiments.31
Most previous studies have employed only the nearest-neighbor contribution, the dominant leading term of the proximity cluster decomposition of the distribution function.32–37 In the work we present here, the method is extended to include the next-nearest-neighbor contributions (the second-order proximity terms) as a higher-order approximation to the probability distribution. The data for the next-nearest-neighbor pDFs are obtained from all-atom molecular dynamics (MD) simulations in a manner similar to Ref. 24 and utilized in solvent number density reconstructions. In principle, pDFs can be utilized for any solvent;38 however, aqueous solutions are of interest here.
This paper is organized as follows. We present the theoretical framework to calculate pDFs and the reconstruction of solvent density distributions based on the second-order proximity approximation, followed by the simulation details for both the components needed and the control calculations for comparison. Next, our test results are presented, in which two small molecules, butane and alanine (Ala1), are examined. The reconstructed solvent density and the electrostatic and van der Waals (vdW) components of the thermodynamic interaction energies from both nearest and nearest + next-nearest-neighbor pDF models are presented along with the results from MD simulations for comparison. Then, our analyses are extended to include the myoglobin P6 unit cell. We conclude our study with a summary of our findings and discussion for future work.
II. METHOD
A. Distribution hierarchy and reconstructions
The proximal distribution functions are a conditional case of the quasi-component distributions,32,39,40 which quantify the probability of finding a solvent molecule in the vicinity of a solute. The basic idea is that in any configuration of the system, the solvent molecules are classified based on the proximity of their distances to any solute atom, A, with the proximity index denoting order in the proximity hierarchy k.32,41 In an N solvent particle configuration of the system, where the total number of solute atoms is M, for a solvent molecule i, the complete set of all the possible distances from the solvent to every solute atom, [r(k)], can be arranged in the order of k and presented as
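$$[r^{(k)}] = \left[\, r_{Ai}^{(1)},\; r_{Ai}^{(2)},\; \ldots,\; r_{Ai}^{(M)} \,\right], \qquad r_{Ai}^{(1)} \le r_{Ai}^{(2)} \le \cdots \le r_{Ai}^{(M)}. \tag{1}$$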
In the ordered set, the primary proximity index, rAi(1), selects the minimum distance among all the members of [r(k)], the set of all the distances between solvent molecule i and all the M individual solute atoms. In a similar fashion, rAi(2) selects the second minimum distance. Taking into account the proximity indices for all the solvent molecules, one can characterize the solvation structure of the system in terms of the physical cluster ordered molecular distribution functions.
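The proximity ordering described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `proximity_order`, the atom labels, and the coordinates are hypothetical.

```python
import math

def proximity_order(solvent_pos, solute_atoms):
    """Return (atom_label, distance) pairs sorted by increasing distance.

    Entry 0 is the nearest solute atom (k = 1), entry 1 the
    next-nearest (k = 2), and so on up to k = M.
    """
    pairs = [(label, math.dist(solvent_pos, pos))
             for label, pos in solute_atoms.items()]
    return sorted(pairs, key=lambda pair: pair[1])

# Hypothetical triatomic solute (arbitrary coordinates)
solute = {"A": (0.0, 0.0, 0.0), "B": (1.5, 0.0, 0.0), "C": (3.0, 0.0, 0.0)}
ordered = proximity_order((0.5, 2.0, 0.0), solute)
nearest, next_nearest = ordered[0], ordered[1]
```

For this solvent position, atom A is the nearest and atom B the next-nearest, which is the situation drawn for the water molecule in Fig. 1.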
The proximity decomposition of the full intermolecular pair distribution function, , can be referenced to either the solvent atoms or the solute atoms.42 We can make an ordered set as in Eq. (1) in the frame of the solute molecule or in the frame of a solvent molecule. For any solute atom A, there exists a total (conventional) distribution function of the solvent (water) molecules, W, around the solute denoted by gAW(r), where the vector notation is dropped. A proximal radial distribution function can be defined where solvent molecules are classified based on their proximity to the solute atom A and the kth order proximal distribution function is denoted by gAW(r; k). In this way, gAW(r; 1) is built based on the solvent molecules, W, nearest to the solute atom A and gAW(r; 2) is constructed based on the next-nearest solvent molecules, and so on. This can also be phrased in terms of conditionally averaged angular distribution functions.42 We will leave more exploration of this decomposition for another study.
Figure 1 shows how for a three atom solute molecule, the solvent space is divided into multiple regions of two proximity orders [nearest (n) and next-nearest (nn)]. This classification of the solvent space and molecules reflects the complexities of the geometry of the system and introduces issues for normalization. However, as shown in Fig. 1, the volume element of the radial shell associated with the primary region of each individual solute atom can be confined in Voronoi polyhedra, and the kth order proximal distribution function is given by
$$g_{AW}(r; k) = \frac{N_A(r; k)}{\rho_0\, \delta[V_A(r; k)]}, \tag{2}$$
where NA(r; k) is the total number of solvent molecules of the kth order of proximity to the solute atom A, δ[VA(r; k)] is the volume of the shell in the kth order proximity Voronoi region at a distance r associated with atom A, and ρ0 is the bulk solvent number density.
FIG. 1.
A Voronoi diagram for a triatomic solute. The space around a solute molecule composed of three atoms A, B, and C is divided into six regions nearest (n) and next-nearest (nn) to different solute atoms with the relevant indices showing nearest and next-nearest atoms, respectively. As an example, a three-body correlation between a water molecule (W) at a grid point is represented nearest to A and next-nearest to B.
The complete set of conditional pair distribution functions then sum to the radial site–site distribution function,
$$g_{AW}(r) = \sum_{k=1}^{M} g_{AW}(r; k). \tag{3}$$
Thus, the density of the solvent around a solute can be decomposed into the nearest [gAW(r; 1)], next-nearest [gAW(r; 2)], and higher-order conditional pair distribution functions, where k runs from one to the total number of atoms (sites) of the solute, M, and normalization assures that gAW(r) goes to unity at large distances. Summations where k < M can be utilized to approximate the solvent structure with proper normalization. Illustrations of this exist in the previous literature.42 In principle, the various conditional atom–atom correlation functions of the system can be deduced from diffraction experiments or theoretical analyses as well as simulation.
In previous studies, the total three-dimensional distribution of the solvent around a solute based on the nearest-neighbor approximation has been utilized and it was shown that the effective solute–solvent interactions are largely local, and, consequently, the correlations can be often well approximated by the near neighbor contribution.32–37 However, neglecting the effects from the next-nearest solute atoms can lead to deficits in densities where three-body correlations are important, such as in molecular crevices/cavities and in intersolute regions.28 In the following analysis, we will investigate the contribution of the next-nearest-neighbor approximation.
There are two computational aspects to the problem. The first is to compute the set of gAW(r; k) often from a set of model compounds. Simple examples have been given before.33,42 Then, assuming transferability based on atom types and chemistry, the approximate density of the solvent can be reconstructed around any molecule with similar atoms and chemistry.
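For a specific solute atom type A, we compute gAW(r; k) from simulations. Given the total number of configurations of the system, T, we can rewrite Eq. (2) as follows:
$$g_{AW}(r; k) = \frac{1}{T} \sum_{t=1}^{T} \frac{N_A(r, t; k)}{\rho_0\, \delta[V_A(r, t; k)]}, \tag{4}$$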
where at each time instant t, NA(r, t; k) is the total number of solvent molecules with the kth order proximal distance to solute atom A.
To compute gAW(r; k) described in Eq. (4), the volume element has to be solved for each solvent molecule i at any instant t. A computationally cost effective solution is to reference the solvent positions on a three-dimensional grid around the solute. In this framework, as shown in Fig. 1, each individual grid point along with its corresponding time averaged solvent density is assigned to the first and second closest solute atoms for gAW(r; 1) and gAW(r; 2) calculations, respectively, in a similar manner as performed in Ref. 33.
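The grid-based assignment can be sketched as follows. This is an illustration under stated assumptions, not the published code: the function name `grid_pdfs`, the dictionary layout, and the toy coordinates are hypothetical, and the pDF in each radial bin is taken as the mean density of the grid points falling in that bin relative to bulk, so the number of grid points stands in for the Voronoi shell volume.

```python
import math
from collections import defaultdict

def grid_pdfs(grid_density, solute_atoms, rho0, dr):
    """Bin time-averaged grid densities by nearest and next-nearest atom.

    grid_density: {(x, y, z): time-averaged solvent number density}
    solute_atoms: list of (atom_type, (x, y, z))
    Returns (pdf_n, pdf_nn):
      pdf_n[(A, bin)]      nearest-neighbor pDF for atom type A
      pdf_nn[(A, B, bin)]  next-nearest pDF for B given A as the nearest
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for point, rho in grid_density.items():
        ranked = sorted((math.dist(point, pos), atype)
                        for atype, pos in solute_atoms)
        (r1, a), (r2, b) = ranked[0], ranked[1]
        for key in ((a, int(r1 // dr)), (a, b, int(r2 // dr))):
            sums[key] += rho
            counts[key] += 1
    # Mean density of the grid points in each bin, relative to bulk
    g = {key: sums[key] / (counts[key] * rho0) for key in sums}
    pdf_n = {key: val for key, val in g.items() if len(key) == 2}
    pdf_nn = {key: val for key, val in g.items() if len(key) == 3}
    return pdf_n, pdf_nn

# Toy input: two united atoms and two grid points (made-up densities)
atoms = [("CT3", (0.0, 0.0, 0.0)), ("CT2", (1.5, 0.0, 0.0))]
grid = {(1.0, 0.0, 0.0): 0.0668, (-1.0, 0.0, 0.0): 0.0334}
pdf_n, pdf_nn = grid_pdfs(grid, atoms, rho0=0.0334, dr=0.5)
```

Keying the next-nearest table by the ordered pair (nearest, next-nearest) mirrors the A · B notation used for the pDF sets; a production version would also pool equivalent atom types and smooth sparsely populated bins.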
Given a set of pDFs and the assumption of approximate universality,21 we can reconstruct the solvation density around any solute with a similar set of atom types or groups. For this purpose, utilizing the precomputed gAW(r; k), we use a three-dimensional grid around the solute at a resolution required for the computation of the solvent-influenced properties. One then may assign solvent density to the grid point (u, v, w), located at the center of the voxel, as follows:
$$g(u, v, w) \approx g_{AW}(r; 1) + g_{BW}(r'; 2), \tag{5}$$
where r and r′ are the distances from the grid point (u, v, w) to the nearest atom A and the next-nearest atom B of the solute molecule, respectively. When fewer than the complete set of conditional functions is used, i.e., for k < M, we normalize their sum with a factor γ such that it approaches 1 at long distances. Consequently, the solvent density at grid point (u, v, w) is
$$\rho(u, v, w) = \gamma\, \rho_0 \left[ g_{AW}(r; 1) + g_{BW}(r'; 2) \right]. \tag{6}$$
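One simple way to obtain the normalization factor γ is from the long-range plateaus of the tabulated pDFs. The sketch below assumes that each table has reached its asymptote over its last few bins; the function name `tail_gamma`, the `n_tail` parameter, and the toy tables are hypothetical.

```python
def tail_gamma(pdf_tables, n_tail=5):
    """Return gamma such that gamma * (sum of pDF plateaus) = 1.

    pdf_tables: list of tabulated pDFs (lists of floats), one per
    proximity order retained in the truncated sum.
    """
    tail_sum = 0.0
    for table in pdf_tables:
        tail = table[-n_tail:]
        tail_sum += sum(tail) / len(tail)  # plateau estimate for this order
    return 1.0 / tail_sum

# Toy tables: nearest plateau 0.60, next-nearest plateau 0.25
g_n = [0.0, 0.9, 0.7] + [0.6] * 5
g_nn = [0.0, 0.1, 0.2] + [0.25] * 5
gamma = tail_gamma([g_n, g_nn])
```

With these made-up plateaus, the two retained orders account for 85% of the total at long range, so γ rescales the truncated sum by 1/0.85.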
To reconstruct the solvent number density at each grid point assigned to the solute atoms A, B, …, we need to check if the grid point could still be within the density exclusion distance of other solute atoms. For instance, the nearest-neighbor atom could be overshadowed by a larger nearby atom such that it excludes the solvent at that grid point. This happens frequently when the grid point is near some atoms of disparate size. To prevent these situations, we modify the procedure in Eq. (6) by applying an exclusion rule43 during the reconstruction process,
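$$\rho_j = \gamma\, \rho_0\, \Theta_j \sum_{k=1}^{2} g_{AW}(r_{A;k}; k), \tag{7}$$
where j refers to the grid point (u, v, w) and rA; k represents the distance between the solute atom A with the kth order proximity and the grid point (u, v, w). The exclusion factor Θj is defined as
$$\Theta_j = \begin{cases} 0, & \text{if grid point } j \text{ lies within the solvent exclusion distance of any solute atom}, \\ 1, & \text{otherwise}. \end{cases} \tag{8}$$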
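A reconstruction step with an exclusion check might look like the sketch below. It is an illustration only: the exclusion test is assumed here to be a per-atom-type cutoff radius (`excl_radius`), which may differ from the rule of Ref. 43, and the function names, table layout, and numbers are hypothetical.

```python
import math

def lookup(table, r, dr):
    """Tabulated pDF value at distance r; plateau value beyond the table."""
    i = int(r // dr)
    return table[i] if i < len(table) else table[-1]

def reconstruct_density(point, atoms, pdf_n, pdf_nn, excl_radius,
                        rho0, gamma, dr):
    """Nearest + next-nearest density estimate at one grid point."""
    ranked = sorted((math.dist(point, pos), atype) for atype, pos in atoms)
    # Assumed exclusion rule: no solvent inside any atom's exclusion radius,
    # even if that atom is neither the nearest nor the next-nearest.
    if any(d < excl_radius[atype] for d, atype in ranked):
        return 0.0
    (r1, a), (r2, b) = ranked[0], ranked[1]
    g = lookup(pdf_n[a], r1, dr) + lookup(pdf_nn[(a, b)], r2, dr)
    return rho0 * gamma * g

# Toy library with 1.0-wide bins (made-up values)
atoms = [("A", (0.0, 0.0, 0.0)), ("B", (2.0, 0.0, 0.0))]
pdf_n = {"A": [0.0, 0.0, 0.6, 0.5], "B": [0.0, 0.0, 0.6, 0.5]}
pdf_nn = {("A", "B"): [0.0, 0.0, 0.0, 0.4], ("B", "A"): [0.0, 0.0, 0.0, 0.4]}
excl = {"A": 2.0, "B": 2.0}

rho_out = reconstruct_density((-2.5, 0.0, 0.0), atoms, pdf_n, pdf_nn,
                              excl, rho0=0.0334, gamma=1.0, dr=1.0)
rho_in = reconstruct_density((1.0, 0.0, 0.0), atoms, pdf_n, pdf_nn,
                             excl, rho0=0.0334, gamma=1.0, dr=1.0)
```

The second grid point sits between the two atoms and inside both exclusion radii, so it receives zero density regardless of the tabulated pDF values.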
Once the solvent density distribution is determined, many of the equilibrium properties and thermodynamics can be calculated.21,43 For instance, the solute–solvent interaction energy can be written as
$$\langle U \rangle = \sum_{i=1}^{S} \sum_{j=1}^{M} \rho(x_i, y_i, z_i)\, U_{jw}(r_{ji})\, \Delta v, \tag{9}$$
where S is the total number of grid points, ρ(xi, yi, zi) is the solvent density at grid point i, Ujw(rji) is the interaction energy between solute atom j and a solvent (water) molecule w, rji is the distance between solute atom j and grid point i, and Δv is the volume of a single grid voxel.
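The grid sum for the average solute–solvent interaction energy can be sketched as below. This is a hedged illustration: the function name `interaction_energy`, the data layout, and the toy pair potential are hypothetical and stand in for the force-field terms used in the actual calculation.

```python
import math

def interaction_energy(grid_density, solute_atoms, pair_energy, voxel_volume):
    """Sum rho * U_jw(r_ji) * dv over all grid points i and solute atoms j.

    grid_density: {(x, y, z): solvent number density at that grid point}
    solute_atoms: list of dicts with at least a "pos" entry
    pair_energy:  callable (atom, r) -> energy of one solvent molecule at r
    """
    total = 0.0
    for point, rho in grid_density.items():
        for atom in solute_atoms:
            r = math.dist(point, atom["pos"])
            total += rho * pair_energy(atom, r) * voxel_volume
    return total

# Toy example: one solute atom and a simple -1/r pair energy
atoms = [{"pos": (0.0, 0.0, 0.0)}]
grid = {(2.0, 0.0, 0.0): 0.03, (0.0, 4.0, 0.0): 0.03}
energy = interaction_energy(grid, atoms, lambda atom, r: -1.0 / r, 0.125)
```

Because the density is multiplied by the voxel volume, each grid point contributes the expected number of solvent molecules in that voxel times the pair energy at its center.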
Similar to previous analysis,43 we address two parameters used in solvent density reconstructions and conduct pDF calculations. The radial resolution (Δr) for collecting pDF’s and the grid spacing (Δs) on the three-dimensional target grid around the solute, wherein the present work, are set to 0.01 and , respectively. We note that the resolution of is somewhat fine for most aqueous distribution functions, but required for accurate energy and free energy calculations to avoid interpolation errors.
In this study, we wish to improve the approximation of the reconstructed solvent number density in the space around a solute molecule from previous work. By adding the contribution of the next-nearest-neighbor approximation to the previous nearest-neighbor model, we obtain contributions important for a situation where three-body correlations are important, such as regions of local solute concavity. To further illustrate our results, non-bonded interaction energies are investigated. The vdW contribution to the average interaction energy in solution is calculated and compared with the nearest-neighbor pDF model and similar results from MD simulations. In the underlying molecular mechanics, we use the Lennard-Jones 6–12 potential,
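$$U_{\mathrm{LJ}}(r) = 4\epsilon \left[ \left( \frac{\sigma}{r} \right)^{12} - \left( \frac{\sigma}{r} \right)^{6} \right], \tag{10}$$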
where ϵ is the well depth and σ is the contact distance. vdW terms decay rapidly with particle separation. As a result, a cutoff is utilized with the assumption of negligible impact from the periodic copy of the system.
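A truncated Lennard-Jones 6–12 term can be written as follows. The sketch assumes the common form in which σ is the distance where the potential crosses zero; CHARMM parameter files tabulate the equivalent Rmin form instead. The function name, parameter values, and cutoff are illustrative, not the settings used in this work.

```python
def lj_energy(r, epsilon, sigma, cutoff):
    """Lennard-Jones 6-12 energy with a plain cutoff (no switching)."""
    if r >= cutoff:
        return 0.0
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

# Illustrative parameters (arbitrary units)
eps, sig, rc = 0.15, 3.2, 12.0
r_min = 2.0 ** (1.0 / 6.0) * sig          # location of the minimum
u_contact = lj_energy(sig, eps, sig, rc)  # zero at the contact distance
u_min = lj_energy(r_min, eps, sig, rc)    # equals -epsilon at the minimum
u_far = lj_energy(15.0, eps, sig, rc)     # beyond the cutoff
```

The steep repulsive wall is why small errors in the reconstructed density close to the solute can translate into large errors in the vdW energy.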
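The electrostatic contribution to the average interaction energy between the solute and the solvent is calculated using the particle mesh Ewald (PME) method,44–46 which accounts for the long-range contributions to the potential energy under periodic boundary conditions. PME recasts the Coulomb terms as follows:
$$U_{\mathrm{elec}} = \frac{1}{2} \sum_{\mathbf{n}}{}^{\prime} \sum_{i} \sum_{j} \frac{q_i q_j}{\left| \mathbf{r}_{ij} + \mathbf{n}L \right|} = U_{\mathrm{direct}} + U_{\mathrm{reciprocal}} + U_{\mathrm{self}}, \tag{11}$$
where n represents the index of the periodic copy of the system, qi and qj are point charges separated by rij, L is the box length, and the prime indicates that the i = j term is omitted for n = 0. Udirect is the direct or real-space sum of the potential, which is a sum of point charges “screened” by Gaussian charge distributions centered at the position of each charge and given as follows:
$$U_{\mathrm{direct}} = \frac{1}{2} \sum_{\mathbf{n}}{}^{\prime} \sum_{i} \sum_{j} q_i q_j\, \frac{\mathrm{erfc}\!\left( \alpha \left| \mathbf{r}_{ij} + \mathbf{n}L \right| \right)}{\left| \mathbf{r}_{ij} + \mathbf{n}L \right|}, \tag{12}$$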
where erfc(x) is the complementary error function and α is the Ewald parameter that controls the Gaussian. Ureciprocal is the reciprocal or Fourier part of the potential, which is a periodic distribution charge field decomposed into a Fourier series for faster convergence,
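$$U_{\mathrm{reciprocal}} = \frac{2\pi}{V} \sum_{\mathbf{m} \neq 0} \frac{\exp\!\left( -m^{2} / 4\alpha^{2} \right)}{m^{2}} \left| S(\mathbf{m}) \right|^{2}, \qquad S(\mathbf{m}) = \sum_{j} q_j\, e^{i \mathbf{m} \cdot \mathbf{r}_j}, \tag{13}$$
where V is the volume of the periodic cell and m = (2π/L)l with l = (lx, ly, lz) is the reciprocal lattice vector in Fourier space. Finally, Uself is a correction term that removes the self-interactions arising from the reciprocal term and is defined as
$$U_{\mathrm{self}} = -\frac{\alpha}{\sqrt{\pi}} \sum_{i} q_i^{2}. \tag{14}$$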
To calculate the electrostatic energy, for a given value of α, the results have to converge for the selected reciprocal lattice vectors. The electrostatic interaction energies are derived and compared with the nearest-neighbor pDF model and similar results from MD simulations.
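The convergence requirement can be illustrated with a plain (non-mesh) Ewald sum on a toy system. The sketch below is not PME and not the production setup: it uses Gaussian units, a cubic box, and a rock-salt arrangement of unit charges, and the function name and truncation parameters `n_max` and `l_max` are hypothetical. For this lattice the converged total should be close to four times the negative Madelung constant (about 1.7476), independent of α.

```python
import math
from itertools import product

def ewald_energy(charges, positions, box, alpha, n_max=2, l_max=5):
    """Plain Ewald sum for a cubic box: (direct, reciprocal, self) terms."""
    n_atoms = len(charges)
    direct = 0.0
    for i in range(n_atoms):
        for j in range(n_atoms):
            for n in product(range(-n_max, n_max + 1), repeat=3):
                if i == j and n == (0, 0, 0):
                    continue  # skip a charge interacting with itself
                d = math.sqrt(sum(
                    (positions[i][c] - positions[j][c] + n[c] * box) ** 2
                    for c in range(3)))
                direct += 0.5 * charges[i] * charges[j] * math.erfc(alpha * d) / d
    reciprocal = 0.0
    for l in product(range(-l_max, l_max + 1), repeat=3):
        if l == (0, 0, 0):
            continue
        m = [2.0 * math.pi * lc / box for lc in l]
        m2 = sum(mc * mc for mc in m)
        phases = [sum(mc * pc for mc, pc in zip(m, p)) for p in positions]
        s_re = sum(q * math.cos(ph) for q, ph in zip(charges, phases))
        s_im = sum(q * math.sin(ph) for q, ph in zip(charges, phases))
        reciprocal += ((2.0 * math.pi / box ** 3)
                       * math.exp(-m2 / (4.0 * alpha ** 2)) / m2
                       * (s_re ** 2 + s_im ** 2))
    self_term = -alpha / math.sqrt(math.pi) * sum(q * q for q in charges)
    return direct, reciprocal, self_term

# Rock-salt cell: 8 unit charges on a cube of side 2 (nearest distance 1)
positions = list(product((0.0, 1.0), repeat=3))
charges = [1.0 if sum(p) % 2 == 0 else -1.0 for p in positions]
total_a = sum(ewald_energy(charges, positions, box=2.0, alpha=1.5))
total_b = sum(ewald_energy(charges, positions, box=2.0, alpha=1.2))
```

Changing α shifts weight between the direct and reciprocal terms, but the total stays the same once both sums are converged, which is the check described in the text.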
Now, we consider which solute atom sets make the most important contributions. For a solute molecule composed of M atoms, the total number of possible next-nearest pDFs is M(M − 1); however, considering atom type similarities and/or negligible probabilities, not all make a significant contribution. Chemistry, such as in the case of the amino acid residues, limits some potential combinations. Additionally, to decrease the number of possibilities for the next-nearest-neighbor pDF dataset and make the analysis more feasible, only the non-hydrogen (heavy) atoms are considered. In the pDF calculations for each non-hydrogen atom, a so-called united atom representation is assumed, in which the heavy atom together with all its bonded hydrogens is considered for the proximity criteria. Moreover, where the contribution of a next-nearest pDF is negligible, it is excluded from the dataset and calculations. Finally, for similar atom types, averaged pDF data are used.
For each snapshot of the MD simulations of the model compounds, at each grid point, water molecules are assigned for the histograms to the nearest and next-nearest united atom in the solute. The pDFs for hydrogen atoms are calculated in a similar manner as presented in Ref. 33, and utilized for the exclusion rule [Eq. (7)] in our analysis.
As a control, we used butane and alanine as their own model compounds. In butane, the non-hydrogen atom types include methylene carbon (CT2) and methyl carbon (CT3), with the set for the nearest pDFs being CT2 and CT3. There are four possibilities for the next-nearest-neighbor pDFs, including CT2 · CT2, CT2 · CT3, CT3 · CT2, and CT3 · CT3, where, in the A · B notation, B is the next-nearest atom given A as the nearest atom. The hydrogen atoms include methylene hydrogen (HA2) and methyl hydrogen (HA3). The sets of pDFs for both nearest and next-nearest are presented in Fig. 5. The sum of all members, as in Eq. (3), shows the expected normalization for a pair distribution function. Each member of the hierarchy displays an asymptote consistent with its percentage contribution to the total. Partial sums of Eq. (3) must subsequently be normalized to produce the expected behavior approaching unity at long range.
FIG. 2.
The average solvent density maps. (a) and (d) illustrate the simulated 2-D water density maps for butane and Ala1, respectively. (b) and (e) represent the average solvent density maps from the nearest pDF model. (c) and (f) represent the average solvent density maps from the nearest + next-nearest pDF model for molecules butane and Ala1, respectively. For each individual molecule, all the panels represent the same cross-section (x–y plane) along the z axis.
Ala1 consists of five different non-hydrogen atom types, with the nearest pDF set including CT1, CT3, C, O, and NH1, where CT1 is the backbone carbon, C and O represent the carbon and oxygen atoms of the carbonyl group, and NH1 is the peptide nitrogen. In general, there are 20 different possibilities for the next-nearest pDF sets with all the contributing ones presented in Fig. 6. The pDF set for hydrogen atoms includes HA3, HN, and HB1, where HN is the polar hydrogen bonded to the nitrogen and HB1 is the backbone hydrogen.
FIG. 6.
Calculated hierarchy contributions to the pDFs for different atom types in Ala1. (a)–(e) represent the nearest (n) and next-nearest (nn) pDFs for CT1, CT3, C, NH1, and O, respectively (the X · Y notation represents the next-nearest pDF for Y given X as the nearest). The negligible nn pDFs are not presented. (f) represents the total pDF for hydrogen atom types HA3, H, and HB1 in Ala1. OT is the water oxygen atom type.
B. Simulations
In this study, the solute molecules butane and Ala1 were examined. Molecular dynamics simulations were performed with NAMD 2.1447 and the CHARMM36 force field parameters.48 Each solute molecule was solvated with TIP3P water49 with at least from the solute molecule to the side of the box. For the Ala1 molecule, the termini were capped with acetyl (ACE) and N-methyl amide (NME) groups, with the initial structure taken from our previous studies.43 Three-dimensional periodic boundary conditions were applied. A rigid water geometry was enforced using SHAKE.50 To treat the electrostatic interactions, particle mesh Ewald51 using a grid of was applied. The temperature was fixed at 300 K via a Langevin thermostat. A time step of 1.0 fs was used to integrate the equations of motion. The simulations were carried out in the NPT ensemble at 1 atm pressure, with the first 1 ns excluded as further equilibration. Throughout the simulations, the solute molecules were fixed.
For myoglobin, the analyses were performed on the unit cell crystal structure of sperm whale myoglobin with carbon monoxide, obtained from the Protein Data Bank using PDB entry 2MGK.52 The resolution of the structure is and the space group is P6. The crystallographic waters and ions were stripped out, and the missing hydrogen atoms were reconstructed using the PSFGEN plugin in VMD (Visual Molecular Dynamics).53 The P6 space group unit cell of dimension Å3 was constructed using the UnitCell program in AmberTools (http://ambermd.org), which applies the symmetry operations of the P6 group to reconstruct the unit cell crystal structure. The unit cell was then solvated with TIP3P water49 and neutralized by adding six chloride ions. To relax the system, three steps of energy minimization and equilibration were carried out: first, for the water molecules and ions, while the protein atom positions were fixed; second, for the protein, while the new positions of the water molecules were fixed; and, finally, with no restraints applied on the system. Equilibration of the system was carried out in an NVT ensemble as the system was gradually heated to 300 K with no restraints applied. Finally, NPT production simulations were performed for 80 ns at 1 atm pressure.
To analyze the data from MD simulations, the three-dimensional space was divided into a grid with 0.5 Å spacing, and the number of water molecules (based on the position of the oxygen atom) at each grid point was counted throughout the trajectories and averaged. For normalization, the number density of bulk water was utilized.
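The grid counting described above can be sketched as follows. This is a minimal illustration, not the analysis code used here: the function name `average_grid_density`, the frame format, and the toy coordinates are hypothetical, and the bulk number density of about 0.0334 Å⁻³ is an assumed approximate value for water.

```python
import math
from collections import Counter

def average_grid_density(frames, spacing, rho0):
    """Time-averaged water oxygen density per voxel, relative to bulk.

    frames:  list of frames, each a list of oxygen (x, y, z) positions
    spacing: grid spacing (same length unit as the coordinates)
    rho0:    bulk number density used for normalization
    """
    counts = Counter()
    for oxygens in frames:
        for x, y, z in oxygens:
            index = (math.floor(x / spacing),
                     math.floor(y / spacing),
                     math.floor(z / spacing))
            counts[index] += 1
    voxel_volume = spacing ** 3
    norm = len(frames) * voxel_volume * rho0
    return {index: n / norm for index, n in counts.items()}

# Toy trajectory: two frames, three oxygen positions in total
frames = [[(0.1, 0.1, 0.1), (0.6, 0.1, 0.1)], [(0.2, 0.3, 0.4)]]
g_grid = average_grid_density(frames, spacing=0.5, rho0=0.0334)
```

Voxels that are never visited are simply absent from the returned dictionary and correspond to zero density.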
III. RESULTS
To understand the consequences of the next-nearest approximation in pDF models, the average solvent density maps from both nearest and nearest + next-nearest pDF models are considered. For comparison, the average solvent number density from MD simulations is calculated. For this purpose, a grid spacing of was selected. The results for both test molecules (butane and Ala1) are presented in Fig. 2, where 2D (x–y planes) maps of the average solvent density are presented at the same z.
Results of the solvent density from MD simulations for butane and Ala1 are illustrated in Fig. 2 in panels (a) and (d) for comparison. Where there are correlations due to the geometry and potential interactions of multiple solute atoms interacting with the solvent, some higher density features are present. These features are not well represented in the near neighbor approximation [panels (b) and (e)]. They are clearly improved in the nearest + next-nearest pDF model, as shown in panels (c) and (f). This shows that in the regions where solvent molecules are adjacent to more than one solute atom, higher-order correlations may play a more significant role in determining the local distribution of the solvent. This effect is obvious in all the examples in this study. In addition, the shape of the solute cavity in the nn pDF model better resembles the results from the simulation compared to the n pDF model.
Being able to reconstruct the solvent number density at each grid point in the space around the solute utilizing both nearest-neighbor and nearest + next-nearest-neighbor pDF models should lead to better averages over the distribution. As a test, the non-bonded vdW and electrostatic interaction energies between the solute and the solvent were examined for butane and Ala1. The accuracy of the estimated average energies depends on the reconstructed solvent probability densities. A combination of short-range and long-range non-bonded interactions is a reasonable check of the accuracy of the results. For short-range vdW interactions, due to the nature of the Lennard-Jones potential, even small deviations in densities in the vicinity of solute atoms may cause dramatic discrepancies with respect to the energy contributions. For long-range electrostatic interactions, the solvent average number density at longer distances from the solute also plays an important role. To validate our results, the vdW and electrostatic interaction energy between the solute and the solvent was calculated from the simulation trajectories, and presented in Fig. 3 in panels (a) and (b), respectively. A fairly good agreement is observed with the higher-order approximation generally being more important in the small test system containing heteroatoms.
FIG. 3.
An illustration of the solute–solvent vdW (a) and electrostatic (b) interaction energies. The results from both nearest (n) and nearest + next-nearest (nn) pDF models along with MD simulations are presented for butane and Ala1.
Finally, we consider larger biomolecular systems with more surface features including crevices, cavities, and interstitial regions. Such geometric features are clearly expected to have strong contributions to the solvent density from atoms that are not necessarily the nearest to a solvent molecule, but strongly correlate or even help trap the solvent. Here, we investigate the application of the modified pDF model to a protein crystal, where we also expect nontrivial solvation correlations due to proximal protein monomers. We examine the myoglobin P6 unit cell, in which six myoglobin proteins are fully packed, with a large amount of protein–protein interface, which makes it a more stringent case study. The pDF dataset for atoms is selected based on the chemical similarity approximations. The solvent number density maps from both nearest-neighbor and nearest + next-nearest-neighbor pDF models along with results from simulations are presented in Fig. 4. Comparing the solvent number density maps from pDF models with MD simulation results, we find that the higher-order approximation nearest + next-nearest pDF model better represents the features of the solvent number density in the cavities and near protein interfaces. These features are generally underestimated in nearest-neighbor approximations.
FIG. 4.
The average solvent density maps for solvated myoglobin P6 unit cell. (a) represents the simulated 2-D water density map, (b) represents the average solvent density maps from the nearest pDF model, and (c) represents the average solvent density maps from the nearest + next-nearest pDF model. All the panels represent the same cross-section (x–y plane) along the z axis.
IV. CONCLUSION
In this study, we extended the pDF solvent density reconstruction method to terms including next-nearest-neighbor (along with the nearest-neighbor) contributions. It was shown that this modification improves the reconstruction of the solvation number density distribution in the near vicinity of a variety of solute molecules from small molecules to protein crystals. This validation was performed by comparing the results with the number density distribution from simulations. The vdW and electrostatic interaction energies between selected solute molecules and solvent were calculated for the small molecule test set and compared to that from simulations. The higher-order approximation notably improved the agreement for the small molecules including heteroatoms.
The nn pDF model was applied to the P6 unit cell of myoglobin protein, consisting of six fully packed proteins per unit cell with a large amount of protein interface inside the unit cell. By comparing with the number density distribution maps from MD simulations, we found that considering higher-order approximations improves the results in the vicinity of protein surfaces. We found substantial improvements in the density also at protein–protein interfaces.
Solvent density can be localized by strong interactions, such as Coulombic forces. However, localization due to geometric confinement, whether intermolecular or intramolecular, provides a qualitatively different mechanism that usually involves more neighboring solute atoms. By considering only the correlations with the nearest atom, the proximal distribution approximation misses contributions from three-body and higher-order correlations. The protein–protein interfaces in Fig. 4 exhibit many strongly localized solvent density peaks in the simulation data, as also observed in experiment.52 The improved approximation presented here may be useful in protein refinements.
ACKNOWLEDGMENTS
The authors are grateful to NSF (Grant No. CHE-1709310), NIH (Grant No. GM037657), and the Robert A. Welch Foundation (H-0013) for partial support of this work. M.G. acknowledges CPRIT (Grant No. RP170593) for a summer fellowship. The authors appreciate conversations with Dr. Su-ching Ou, Dr. Ka-Yiu Wong, and Dr. Justin Drake. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation Grant No. ACI-1548562. The authors thank the scientific computing staff at the Sealy Center for Structural Biology and Molecular Biophysics for computing support.
APPENDIX: PROXIMAL DISTRIBUTION FUNCTIONS
The contributions of the lowest members of the proximal hierarchy are shown in Figs. 5 and 6. The sum of all members, as in Eq. (3), shows the expected normalization for a pair distribution function. Each member of the hierarchy displays an asymptote consistent with its percentage contribution to the total. Partial sums of Eq. (3) must subsequently be normalized to produce the expected behavior approaching unity at long range.
FIG. 5.
Calculated pDF contributions for different atom types in butane. (a) and (b) show the nearest (n) and next-nearest (nn) pDFs for CT2 and CT3, respectively (the X · Y notation denotes the next-nearest pDF for Y given X as the nearest). In (b), the pDF for CT3 · CT3 is negligible and is not shown. (c) shows the total pDF for the hydrogen atom types HA2 and HA3 in butane. OT is the water oxygen atom type.
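The normalization of partial sums described in this appendix can be sketched numerically. The example below is only an illustration under stated assumptions: the function name `normalize_partial_sum`, the estimate of the asymptote as a mean over the last 20% of the radial grid, and the synthetic member curves (with plateau weights 0.60, 0.25, and 0.15) are hypothetical and are not taken from the calculations reported here.

```python
import numpy as np


def normalize_partial_sum(members, tail_fraction=0.2):
    """Sum selected hierarchy members and rescale so the tail approaches 1.

    members       : sequence of 1D arrays sampled on a common radial grid
    tail_fraction : fraction of the grid (from the long-range end) used to
                    estimate the asymptote; an assumed, adjustable choice
    """
    partial = np.sum(np.asarray(members, dtype=float), axis=0)
    n_tail = max(1, int(round(tail_fraction * partial.size)))
    asymptote = partial[-n_tail:].mean()
    if asymptote <= 0.0:
        raise ValueError("long-range asymptote must be positive")
    return partial / asymptote, asymptote


# Synthetic members: each plateaus at its fractional share of the total.
r = np.linspace(1.0, 15.0, 281)
shape = 1.0 - np.exp(-r)      # toy curve rising to 1 at long range
g_n = 0.60 * shape            # nearest-neighbor member
g_nn = 0.25 * shape           # next-nearest-neighbor member
g_rest = 0.15 * shape         # all remaining members combined

full, a_full = normalize_partial_sum([g_n, g_nn, g_rest])
trunc, a_trunc = normalize_partial_sum([g_n, g_nn])
print("asymptote of full sum:   ", a_full)
print("asymptote of n + nn sum: ", a_trunc)
print("long-range value after normalization:", trunc[-1])
```

In this toy case the full sum already tends to unity, while the truncated n + nn sum plateaus near its combined weight and is brought back to unity by the rescaling.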
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.
REFERENCES
- 1. Nagendra H. G., Sukumar N., and Vijayan M., “Role of water in plasticity, stability, and action of proteins: The crystal structures of lysozyme at very low levels of hydration,” Proteins 32, 229–240 (1998).
- 2. Mattos C., “Protein-water interactions in a dynamic world,” Trends Biochem. Sci. 27, 203–208 (2002). 10.1016/s0968-0004(02)02067-4
- 3. Chou D. H. and Morr C. V., “Protein-water interactions and functional properties,” J. Am. Oil Chem. Soc. 56, A53–A62 (1979). 10.1007/bf02671785
- 4. Bellissent-Funel M., Hassanali A., Havenith M., Henchman R., Pohl P., Sterpone F., van der Spoel D., Xu Y., and Garcia A., “Water determines the structure and dynamics of proteins,” Chem. Rev. 116, 7673–7697 (2016). 10.1021/acs.chemrev.5b00664
- 5. Svergun D. I., Richard S., Koch M. H. J., Sayers Z., Kuprin S., and Zaccai G., “Protein hydration in solution: Experimental observation by x-ray and neutron scattering,” Proc. Natl. Acad. Sci. U. S. A. 95, 2267–2272 (1998). 10.1073/pnas.95.5.2267
- 6. Brooks C. L. III, Karplus M., and Pettitt B. M., “Proteins: A theoretical perspective of dynamics, structure, and thermodynamics,” in Advances in Chemical Physics (John Wiley and Sons, New York, 1988), Vol. 71, p. 259.
- 7. Lounnas V., Lüdemann S. K., and Wade R. C., “Towards molecular dynamics simulation of large proteins with a hydration shell at constant pressure,” Biophys. Chem. 78, 157–182 (1999). 10.1016/s0301-4622(98)00237-3
- 8. Hill T. L., “Theory of solutions. I,” J. Am. Chem. Soc. 79, 4885–4890 (1957). 10.1021/ja01575a016
- 9. Theory of Simple Liquids, 4th ed., edited by Hansen J.-P. and McDonald I. R. (Academic Press, Oxford, 2013).
- 10. Mehrotra P. K. and Beveridge D. L., “Structural analysis of molecular solutions based on quasi-component distribution functions. Application to [H2CO]aq at 25 °C,” J. Am. Chem. Soc. 102, 4287–4294 (1980). 10.1021/ja00533a001
- 11. Attard P., “Spherically inhomogeneous fluids. I. Percus–Yevick hard spheres: Osmotic coefficients and triplet correlations,” J. Chem. Phys. 91, 3072–3082 (1989). 10.1063/1.456930
- 12. Bellissent-Funel M.-C., “Hydration in protein dynamics and function,” J. Mol. Liq. 84, 39–52 (2000), part of Special Issue: Dynamics of Complex Molecular Liquids-Computer Simulations and Experiments. 10.1016/s0167-7322(99)00109-9
- 13. Perkyns J. S., Lynch G. C., Howard J. J., and Pettitt B. M., “Protein solvation from theory and simulation: Exact treatment of Coulomb interactions in three-dimensional theories,” J. Chem. Phys. 132, 064106 (2010). 10.1063/1.3299277
- 14. Howard J. J., Perkyns J. S., Choudhury N., and Pettitt B. M., “Integral equation study of the hydrophobic interaction between graphene plates,” J. Chem. Theory Comput. 4, 1928–1939 (2008). 10.1021/ct8002817
- 15. Raucci U., Perrella F., Donati G., Zoppi M., Petrone A., and Rega N., “Ab-initio molecular dynamics and hybrid explicit-implicit solvation model for aqueous and nonaqueous solvents: GFP chromophore in water and methanol solution as case study,” J. Comput. Chem. 41, 2228–2239 (2020). 10.1002/jcc.26384
- 16. Petrone A., Cerezo J., Ferrer F. J. A., Donati G., Improta R., Rega N., and Santoro F., “Absorption and emission spectral shapes of a prototype dye in water by combining classical/dynamical and quantum/static approaches,” J. Phys. Chem. A 119, 5426–5438 (2015). 10.1021/jp510838m
- 17. Mezei M., Swaminathan S., and Beveridge D. L., “Ab initio calculation of the free energy of liquid water,” J. Am. Chem. Soc. 100, 3255–3256 (1978). 10.1021/ja00478a070
- 18. Chen F. and Smith P. E., “Theory and computer simulation of solute effects on the surface tension of liquids,” J. Phys. Chem. B 112, 8975–8984 (2008). 10.1021/jp711062a
- 19. Nguyen C. N., Kurtzman Young T., and Gilson M. K., “Grid inhomogeneous solvation theory: Hydration structure and thermodynamics of the miniature receptor cucurbit[7]uril,” J. Chem. Phys. 137, 044101 (2012). 10.1063/1.4733951
- 20. Nguyen C., Yamazaki T., Kovalenko A., Case D. A., Gilson M. K., Kurtzman T., and Luchko T., “A molecular reconstruction approach to site-based 3D-RISM and comparison to GIST hydration thermodynamic maps in an enzyme active site,” PLoS One 14, e0219473 (2019). 10.1371/journal.pone.0219473
- 21. Lin B. and Pettitt B. M., “Note: On the universality of proximal radial distribution functions of proteins,” J. Chem. Phys. 134, 106101 (2011). 10.1063/1.3565035
- 22. Nguyen B. L. and Pettitt B. M., “Effects of acids, bases, and heteroatoms on proximal radial distribution functions for proteins,” J. Chem. Theory Comput. 11, 1399–1409 (2015). 10.1021/ct501116v
- 23. Lynch G. C., Perkyns J. S., Nguyen B. L., and Pettitt B. M., “Solvation and cavity occupation in biomolecules,” Biochim. Biophys. Acta 1850, 923–931 (2015). 10.1016/j.bbagen.2014.09.020
- 24. Makarov V. A., Feig M., Andrews B. K., and Pettitt B. M., “Diffusion of solvent around biomolecular solutes: A molecular dynamics simulation study,” Biophys. J. 75, 150–158 (1998). 10.1016/s0006-3495(98)77502-2
- 25. Makarov V., Pettitt B. M., and Feig M., “Solvation and hydration of proteins and nucleic acids: A theoretical view of simulation and experiment,” Acc. Chem. Res. 35, 376–384 (2002). 10.1021/ar0100273
- 26. Virtanen J. J., Makowski L., Sosnick T. R., and Freed K. F., “Modeling the hydration layer around proteins: Applications to small- and wide-angle x-ray scattering,” Biophys. J. 101, 2061 (2011). 10.1016/j.bpj.2011.09.021
- 27. Lin B., Wong K.-Y., Hu C., Kokubo H., and Pettitt B. M., “Fast calculations of electrostatic solvation free energy from reconstructed solvent density using proximal radial distribution functions,” J. Phys. Chem. Lett. 2, 1626–1632 (2011). 10.1021/jz200609v
- 28. Ou S.-C. and Pettitt B. M., “Free energy calculations based on coupling proximal distribution functions and thermodynamic cycles,” J. Chem. Theory Comput. 15, 2649–2658 (2019). 10.1021/acs.jctc.8b01157
- 29. Ben-Naim A., Molecular Theory of Water and Aqueous Solutions (World Scientific Publishing Company, 2011).
- 30. Subramanian P. S., Ravishanker G., and Beveridge D. L., “Theoretical considerations on the “spine of hydration” in the minor groove of d(CGCGAATTCGCG).d(GCGCTTAAGCGC): Monte Carlo computer simulation,” Proc. Natl. Acad. Sci. U. S. A. 85, 1836–1840 (1988). 10.1073/pnas.85.6.1836
- 31. Burling F. T., Weis W. I., Flaherty K. M., and Brünger A. T., “Direct observation of protein solvation and discrete disorder with experimental crystallographic phases,” Science 271, 72–77 (1996). 10.1126/science.271.5245.72
- 32. Mezei M. and Beveridge D. L., “Structural chemistry of biomolecular hydration via computer simulation: The proximity criterion,” in Biomembranes, Part O, Methods in Enzymology (Academic Press, 1986), Vol. 127, pp. 21–47.
- 33. Makarov V. A., Andrews B. K., and Pettitt B. M., “Reconstructing the protein–water interface,” Biopolymers 45, 469–478 (1998).
- 34. Phillips G. N. and Pettitt B. M., “Structure and dynamics of the water around myoglobin,” Protein Sci. 4, 149–158 (1995). 10.1002/pro.5560040202
- 35. Feig M. and Pettitt B. M., “Modeling high-resolution hydration patterns in correlation with DNA sequence and conformation,” J. Mol. Biol. 286(4), 1075–1095 (1999). 10.1006/jmbi.1998.2486
- 36. Rudnicki W. R. and Pettitt B. M., “Modeling the DNA-solvent interface,” Biopolymers 41, 107–119 (1997).
- 37. Ou S.-C., Drake J. A., and Pettitt B. M., “Nonpolar solvation free energy from proximal distribution functions,” J. Phys. Chem. B 121, 3555–3564 (2017). 10.1021/acs.jpcb.6b09528
- 38. Feig M. and Pettitt B. M., “Sodium and chlorine ions as part of the DNA solvation shell,” Biophys. J. 77, 1769–1781 (1999). 10.1016/s0006-3495(99)77023-2
- 39. Lounnas V., Pettitt B. M., and Phillips G. N., “A global model of the protein-solvent interface,” Biophys. J. 66, 601–614 (1994). 10.1016/s0006-3495(94)80835-5
- 40. Ben-Naim A., “Mixture-model approach to the theory of classical fluids. II. Application to liquid water,” J. Chem. Phys. 57, 3605–3612 (1972). 10.1063/1.1678816
- 41. Mezei M., Mehrotra P. K., and Beveridge D. L., “Monte Carlo computer simulation of the aqueous hydration of the glycine zwitterion at 25 °C,” J. Biomol. Struct. Dyn. 2, 1–27 (1984). 10.1080/07391102.1984.10507543
- 42. Dyer K. M. and Pettitt B. M., “Proximal distributions from angular correlations: A measure of the onset of coarse-graining,” J. Chem. Phys. 139, 214111 (2013). 10.1063/1.4832895
- 43. Ou S.-C. and Pettitt B. M., “Solute–solvent energetics based on proximal distribution functions,” J. Phys. Chem. B 120, 8230–8237 (2016). 10.1021/acs.jpcb.6b01898
- 44. Ewald P. P., “Die Berechnung optischer und elektrostatischer Gitterpotentiale,” Ann. Phys. 369, 253–287 (1921). 10.1002/andp.19213690304
- 45. de Leeuw S. W., Perram J. W., Smith E. R., and Rowlinson J. S., “Simulation of electrostatic systems in periodic boundary conditions. I. Lattice sums and dielectric constants,” Proc. R. Soc. London, Ser. A 373, 27–56 (1980). 10.1098/rspa.1980.0135
- 46. de Leeuw S. W., Perram J. W., Smith E. R., and Rowlinson J. S., “Simulation of electrostatic systems in periodic boundary conditions. III. Further theory and applications,” Proc. R. Soc. London, Ser. A 388, 177–193 (1983). 10.1098/rspa.1983.0077
- 47. Phillips J. C., Braun R., Wang W., Gumbart J., Tajkhorshid E., Villa E., Chipot C., Skeel R. D., Kalé L., and Schulten K., “Scalable molecular dynamics with NAMD,” J. Comput. Chem. 26, 1781–1802 (2005). 10.1002/jcc.20289
- 48. Vanommeslaeghe K., Hatcher E., Acharya C., Kundu S., Zhong S., Shim J., Darian E., Guvench O., Lopes P., Vorobyov I., and MacKerell A. D., Jr., “CHARMM general force field: A force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields,” J. Comput. Chem. 31, 671–690 (2010). 10.1002/jcc.21367
- 49. Jorgensen W. L., Chandrasekhar J., Madura J. D., Impey R. W., and Klein M. L., “Comparison of simple potential functions for simulating liquid water,” J. Chem. Phys. 79, 926–935 (1983). 10.1063/1.445869
- 50. Ryckaert J.-P., Ciccotti G., and Berendsen H. J. C., “Numerical integration of the cartesian equations of motion of a system with constraints: Molecular dynamics of n-alkanes,” J. Comput. Phys. 23, 327–341 (1977). 10.1016/0021-9991(77)90098-5
- 51. Darden T., York D., and Pedersen L., “Particle mesh Ewald: An N ⋅ log(N) method for Ewald sums in large systems,” J. Chem. Phys. 98, 10089–10092 (1993). 10.1063/1.464397
- 52. Quillin M. L., Arduini R. M., Olson J. S., and Phillips G. N., “High-resolution crystal structures of distal histidine mutants of sperm whale myoglobin,” J. Mol. Biol. 234, 140–155 (1993). 10.1006/jmbi.1993.1569
- 53. Humphrey W., Dalke A., and Schulten K., “VMD: Visual molecular dynamics,” J. Mol. Graphics 14, 33–38 (1996). 10.1016/0263-7855(96)00018-5