Abstract
A statistical mechanical framework elucidates the significance of structural correlations between coarse-grained (CG) sites in the multiscale coarse-graining (MS-CG) method [S. Izvekov and G.A. Voth. J. Phys. Chem. B 109 2469 (2005), J. Chem. Phys. 123 134105 (2005)]. If no approximations are made, the MS-CG method yields a many-body multi-dimensional potential of mean force describing the interactions between CG sites. However, numerical applications of the MS-CG method typically employ a set of pair potentials to describe non-bonded interactions. The analogy between coarse-graining and the inverse problem of liquid state theory clarifies the general significance of three-particle correlations for the development of such CG pair potentials. It is demonstrated that the MS-CG methodology incorporates critical three-body correlation effects and that, for isotropic homogeneous systems evolving under a central pair potential, the MS-CG equations are a discretized representation of the well-known Yvon-Born-Green equation. Numerical calculations validate the theory and illustrate the role of these structural correlations in the MS-CG method.
1. Introduction
Although molecular dynamics (MD) simulations provide a powerful tool for investigating complex biomolecular systems,1 their substantial computational cost limits conventional atomistic MD simulations to time-scales shorter than microseconds and length-scales significantly shorter than micrometers.2,3 Such atomistic MD simulations are often inadequate to model biological processes such as protein folding4 or signal transduction,5 which may occur on significantly larger time- and length-scales. Consequently, there has been growing interest in developing computationally inexpensive “coarse-grained” (CG) models,6–18 which can be simulated over significantly longer time- and length-scales.
In order to reproduce the thermodynamic and structural properties of the original atomistic system, the low-resolution CG structures must be sampled according to the probability distribution function for the fully atomistic representation of the same structures. A formal prescription for designing CG models thus involves the integration over atomistic degrees of freedom to define a reduced description of the system.2,7 The interactions between CG sites must therefore include not only energetic, but also entropic contributions that result from averaging over eliminated degrees of freedom. In principle, the resulting CG model may require a many-body interaction potential that depends upon the thermodynamic state point.13 In practice, though, CG force fields typically model non-bonded interactions with central pair potentials that depend only upon the distance between CG sites.2 This effective pair potential represents an approximate decomposition of the many-body interaction obtained from a formal integration over uninteresting degrees of freedom. The procedure for determining this pair potential must appropriately incorporate the effects of many-body correlations in order to reproduce the structure of the original system.
The theory of coarse-grained modeling is similar to the ‘inverse problem’ of liquid state theory.19–21 Both theories attempt to determine an interaction potential reproducing an observed structure. However, in coarse-grained modeling the target structure is a low-resolution representation of the original structure and the deduced interaction is between the CG sites defining the reduced structure.2 The theory of the Yvon-Born-Green (YBG) equation provides a direct solution to this inverse problem,20 under the assumption that such an interaction potential exists. The YBG equation provides an exact relation between a given two-body interaction potential and the n- and (n+1)- particle distribution functions obtained from equilibrium simulations employing the potential. Therefore, a CG pair potential may be determined by inverting the YBG equation for the observed two- and three-particle CG distribution functions. This relationship suggests a role for higher order correlations in deducing a pair potential that will reproduce the observed CG structure.
The multiscale coarse-graining (MS-CG) procedure16,17 determines CG force fields from atomistic MD simulations by employing a statistical implementation of the Force-Matching (FM) method.22,23 This MS-CG method has been successful in developing CG models for many complex systems such as ionic liquids,24 mixed lipid bilayers,25 small peptides,26 nano-particles,27 and even mixed resolution models of trans-membrane proteins in lipid bilayers.28 In the following analysis it is demonstrated that, if no approximations are made in the functional form of the CG force field, the MS-CG method determines a many-body multidimensional potential of mean force describing the CG representation of the system. Consequently, simulations employing this many-body MS-CG potential would generate CG structures according to the underlying atomistic distribution function. However, prior numerical applications of the MS-CG method have approximated this many-body interaction potential with a set of bonded and non-bonded pair potentials between CG sites.16,17,24–28 The present work demonstrates that the MS-CG equations for this CG pair potential reflect both two- and three-particle correlations between CG sites within a system. Moreover, for homogeneous, isotropic systems the MS-CG equations are equivalent to generalized YBG equations20 for CG sites interacting according to a central pair force field. For such systems the MS-CG methodology explicitly considers the two- and three-particle correlations between CG sites within an atomistic MD simulation, assumes that these distribution functions were generated by a pair-wise decomposable central force field, and then inverts the resulting YBG equation to determine this force field. The YBG theory thus provides a fundamental link between the system structure and the CG force field.
In Section 2.1 the relationship between the MS-CG interaction potential and a multidimensional potential of mean force is derived. The “normal” MS-CG equations are next derived in Section 2.2 by approximating the many-body interaction potential with a central pair-wise decomposable force field. The derivation emphasizes the relationship between the MS-CG equations and the correlations observed between CG sites in atomistic MD simulations. The YBG equation for a CG system is then presented in Section 2.3 and it is shown that for a homogeneous, isotropic system this equation reduces to the MS-CG equations. Numerical illustrations of this analysis are presented in Section 3. In Section 4 the significance of liquid state theory for developing coarse-grained models is considered, especially with regard to the MS-CG and reverse Monte Carlo methods.2,21,29 These results are reviewed in the Conclusion section, Section 5. Proofs of certain necessary results and generalizations of the theory for a multi-component system are provided in the Appendices. A more detailed discussion of the theory is attached as a Supporting Information section.
2. Theory
2.1. Many-body CG potential of mean force
The multi-scale coarse-graining (MS-CG) method of Izvekov and Voth extends the Force-Matching (FM) method22 to determine coarse-grained (CG) force fields from atomistic molecular dynamics simulations.17 The CG force field is obtained by minimizing a residual describing the difference between the instantaneous forces defined by the CG force field and the original atomistic force field. This difference is statistically averaged over all CG sites and all configurations sampled from the atomistic MD simulations. As shown below, these configurations must be sampled according to the distribution function for the atomistic Hamiltonian in order to perform the correct averaging. The FM residual for a system described by NCG identical CG sites may be expressed:
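$$\chi^2 = \frac{1}{3N_{CG}L}\sum_{I=1}^{L}\sum_{i=1}^{N_{CG}}\left|\vec F_i^{(I)} - \vec F_i^{CG}\!\left(\vec r_1^{\,(I)},\ldots,\vec r_{N_{CG}}^{\,(I)}\right)\right|^2 \tag{1}$$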
In eq (1) and in the following, Latin subscripts ( i ) indicate particular CG sites, the superscript ( I ) labels configurations sampled during the atomistic MD simulation, and L is the number of sampled configurations. Thus, the MS-CG residual compares the total force on a given CG site defined by the atomistic force field for the sampled atomistic configuration, $\vec F_i^{(I)}$, with the force on the same site defined by the CG force field for the CG representation of the same configuration, $\vec F_i^{CG}(\vec r_1^{\,(I)},\ldots,\vec r_{N_{CG}}^{\,(I)})$.
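As a minimal sketch of eq (1), assuming the reference and CG forces are already available as nested Python lists indexed first by configuration and then by CG site, the residual is a plain average of squared force mismatches. The function name `fm_residual` and the sample numbers are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def fm_residual(ref_forces: Sequence[Sequence[Vec3]],
                cg_forces: Sequence[Sequence[Vec3]]) -> float:
    """Force-matching residual of eq (1).

    ref_forces[I][i] -- net atomistic force on CG site i in configuration I
    cg_forces[I][i]  -- force predicted by the CG force field for the same site
    """
    n_conf = len(ref_forces)   # L, the number of sampled configurations
    n_cg = len(ref_forces[0])  # N_CG, the number of CG sites
    total = 0.0
    for ref_conf, cg_conf in zip(ref_forces, cg_forces):
        for f_ref, f_cg in zip(ref_conf, cg_conf):
            # squared Euclidean norm of the force mismatch on one site
            total += sum((a - b) ** 2 for a, b in zip(f_ref, f_cg))
    return total / (3 * n_cg * n_conf)


# Two configurations of two CG sites (made-up numbers for illustration).
ref = [[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
       [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]]
cg = [[(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
      [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]]
chi2 = fm_residual(ref, cg)
```

Here two of the four site forces are off by a unit vector, so the residual is 2/(3·2·2) = 1/6.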
In the limit of adequate sampling, the FM residual may be considered an average over the region of configuration space, D, sampled by an atomistic trajectory and weighted according to the atomistic distribution function, $p(\mathbf r^{N_{AA}})$, defined for the trajectory. The residual may then be expressed as the configurational integral:
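$$\chi^2 = \frac{1}{3N_{CG}}\int_D d\mathbf r^{N_{AA}}\;p(\mathbf r^{N_{AA}})\sum_{i=1}^{N_{CG}}\left|\vec F_i(\mathbf r^{N_{AA}}) - \vec F_i^{CG}(\mathbf r^{N_{CG}})\right|^2 \tag{2}$$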
Now, assume that there exists a canonical transformation that partitions the atomistic phase space into a set of coordinates describing the CG sites and a set of residual degrees of freedom, $\mathbf r^{N_{AA}} \rightarrow (\mathbf r^{N_{CG}}, \mathbf r^{N_R})$. Such a transformation certainly exists, for example, when the CG sites are defined as the centers of mass for groups of atoms. Upon employing this transformation in eq (2), the residual may be considered a functional of the CG force field:
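$$\chi^2[\vec F^{CG}] = \frac{1}{3N_{CG}}\int d\mathbf r^{N_{CG}}\int d\mathbf r^{N_R}\;p(\mathbf r^{N_{CG}},\mathbf r^{N_R})\sum_{i=1}^{N_{CG}}\left|\vec F_i(\mathbf r^{N_{CG}},\mathbf r^{N_R}) - \vec F_i^{CG}(\mathbf r^{N_{CG}})\right|^2 \tag{3}$$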
The CG force field is defined by minimizing the functional,20,30 i.e., by requiring $\delta\chi^2/\delta\vec F_i^{CG}(\mathbf r^{N_{CG}}) = 0$ for every CG site i, yielding the result:
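$$\vec F_i^{CG}(\mathbf r^{N_{CG}}) = \frac{\int d\mathbf r^{N_R}\;p(\mathbf r^{N_{CG}},\mathbf r^{N_R})\,\vec F_i(\mathbf r^{N_{CG}},\mathbf r^{N_R})}{\int d\mathbf r^{N_R}\;p(\mathbf r^{N_{CG}},\mathbf r^{N_R})} \tag{4}$$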
Here the canonical ensemble is explicitly considered: $p(\mathbf r^{N_{AA}}) = \exp[-\beta V(\mathbf r^{N_{AA}})]/Z(N,V_0,T)$, where V is the atomistic potential, $\beta = 1/k_BT$, and Z(N,V0,T) is the canonical (configurational) partition function. The force on CG site i according to the atomistic potential is defined as $\vec F_i(\mathbf r^{N_{CG}},\mathbf r^{N_R}) = -\partial V(\mathbf r^{N_{CG}},\mathbf r^{N_R})/\partial\vec r_i$, where the partial derivative is performed with respect to the CG coordinate r⃑i while holding all remaining atomistic and CG coordinates fixed. In general the force depends upon all NAA = NCG + NR degrees of freedom. The CG force field defined by eq (4) depends upon all NCG degrees of freedom and may be considered an average of the atomistic force acting on a CG site, with the average performed over all sampled atomistic configurations consistent with the fixed CG configuration and weighted according to the atomistic distribution function. The normalization of eq (4) defines a multi-dimensional potential of mean force (pmf) describing the CG sites that may be expressed:
$$W(\mathbf r^{N_{CG}}) = -k_BT\ln\int d\mathbf r^{N_R}\;p(\mathbf r^{N_{CG}},\mathbf r^{N_R}) \tag{5}$$
The CG force field defined in eq (4) is the appropriate gradient of this CG potential:
$$\vec F_i^{CG}(\mathbf r^{N_{CG}}) = -\frac{\partial W(\mathbf r^{N_{CG}})}{\partial\vec r_i} \tag{6}$$
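The statement that the force-matched CG force is the Boltzmann-weighted conditional mean of the atomistic force, and that it equals minus the gradient of the pmf, can be checked numerically on a toy model. The sketch below assumes a hypothetical two-coordinate potential $V(x,y) = Ax^2 + Cxy + By^2$, with x standing in for a CG coordinate and y for the eliminated one, and uses trapezoidal quadrature over y. For this Gaussian case the exact mean force is $-2(A - C^2/4B)\,x$. The constants and function names are illustrative only.

```python
import math

KT = 1.0                 # thermal energy k_B T in reduced units (assumption)
A, B, C = 1.0, 2.0, 1.5  # toy potential coefficients (hypothetical)
Y_GRID = [-10.0 + 0.01 * k for k in range(2001)]  # quadrature grid for y


def potential(x, y):
    # Toy "atomistic" potential: x plays the CG coordinate, y the eliminated one.
    return A * x * x + C * x * y + B * y * y


def force_x(x, y):
    # Atomistic force on the CG coordinate, -dV/dx at fixed y.
    return -(2.0 * A * x + C * y)


def _integrate(values):
    # Trapezoidal rule on the uniform Y_GRID.
    h = Y_GRID[1] - Y_GRID[0]
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


def pmf(x):
    # Eq (5): W(x) = -kT ln \int dy exp(-V/kT), up to an additive constant.
    return -KT * math.log(_integrate([math.exp(-potential(x, y) / KT) for y in Y_GRID]))


def mean_force(x):
    # Eq (4): Boltzmann-weighted average of the atomistic force at fixed x.
    w = [math.exp(-potential(x, y) / KT) for y in Y_GRID]
    num = _integrate([wi * force_x(x, y) for wi, y in zip(w, Y_GRID)])
    return num / _integrate(w)


x0, h = 0.7, 1e-4
f_avg = mean_force(x0)                          # conditional average, eq (4)
f_pmf = -(pmf(x0 + h) - pmf(x0 - h)) / (2 * h)  # -dW/dx, eq (6)
f_exact = -2.0 * (A - C * C / (4.0 * B)) * x0   # analytic Gaussian result
```

The two numerical estimates agree with each other and with the analytic value to quadrature accuracy, which is the content of eqs (4) to (6) in this one-dimensional setting.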
2.2. Force-Matching to a Central Pair Potential
Previous numerical applications of the MS-CG method16,17,24–27 have approximated the many-body CG potential of mean force (pmf) derived above in eq (5) with a central pair potential. These applications have determined the optimal CG force field by minimizing the residual in eq (1) under the assumptions that the MS-CG force field is pair-wise additive, such that
$$\vec F_i^{CG}(\mathbf r^{N_{CG}}) = \sum_{j\neq i}\vec F_{i,j}(\vec r_i,\vec r_j) \tag{7}$$
and, furthermore, that this force field is central,
$$\vec F_{i,j}(\vec r_i,\vec r_j) = f(r_{i,j})\,\vec u_{i,j} \tag{8}$$
In these definitions r⃑i represents the Cartesian coordinates of CG site i; $\vec r_{i,j} = \vec r_i - \vec r_j$ represents the vector from the j to the i CG site; $r_{i,j} = |\vec r_{i,j}|$ represents the magnitude of this vector; $\vec u_{i,j} = \vec r_{i,j}/r_{i,j}$ represents the associated unit vector; and f(r) represents the function defining the magnitude of the interaction between CG sites, which depends only upon the inter-site distance. The force field, f(r), minimizing the residual under these constraints is uniquely determined and may be conveniently described by a discrete delta function basis for which δD(r) = 1 when −Δr/2 ≤ r < Δr/2 and is 0 otherwise. In this basis the force field may be represented as:
$$f(r) = \sum_{d} f_d\,\delta_D(r - r_d) \tag{9}$$
This definition corresponds to tabulating the force field at a discrete set of points, rd, about which (rd − Δr/2 ≤ r < rd + Δr/2) the force field is assumed to be constant. Previous applications of the MS-CG method16,17,24–27 have typically employed a spline basis for representing the FM force field. However, the basis used in eq (9) is particularly amenable for the following analysis and, in the limit that Δr→0, this representation transparently recovers a continuous representation of the force field.
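A minimal sketch of the discrete delta function basis of eq (9), assuming bin centers $r_d = r_{min} + d\,\Delta r$. The names `delta_d`, `tabulated_force` and `bin_index` are illustrative. The explicit sum over the basis mirrors eq (9) literally; the constant-time index lookup is the equivalent form one would use in practice.

```python
import math


def delta_d(r: float, dr: float) -> int:
    """Discrete delta function of eq (9): 1 if -dr/2 <= r < dr/2, else 0."""
    return 1 if -0.5 * dr <= r < 0.5 * dr else 0


def tabulated_force(r, f_table, r_min, dr):
    """Literal eq (9): f(r) = sum_d f_d delta_D(r - r_d), r_d = r_min + d*dr."""
    return sum(f_d * delta_d(r - (r_min + d * dr), dr)
               for d, f_d in enumerate(f_table))


def bin_index(r, r_min, dr):
    """Index d of the bin satisfying r_d - dr/2 <= r < r_d + dr/2."""
    return math.floor((r - r_min) / dr + 0.5)


TABLE = [3.0, 2.0, 1.0]  # hypothetical force values at r_d = 0.3, 0.4, 0.5
```

For example, `tabulated_force(0.41, TABLE, 0.3, 0.1)` picks out the value stored for the bin centered at 0.4, and any r beyond the table returns zero.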
By minimizing the residual (1) with respect to the elements of the force table, and employing the definitions in eq (7)–eq (9), a system of linear algebraic equations is obtained:
$$\sum_{d'} G_{dd'}\,f_{d'} = b_d \tag{10}$$
in which
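$$b_d = \frac{1}{N_{CG}}\left\langle\sum_{i}\vec F_i\cdot\sum_{j\neq i}\delta_D(r_{i,j}-r_d)\,\vec u_{i,j}\right\rangle \tag{11}$$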
and
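$$G_{dd'} = \frac{1}{N_{CG}}\left\langle\sum_{i}\sum_{j\neq i}\sum_{k\neq i}\delta_D(r_{i,j}-r_d)\,\delta_D(r_{i,k}-r_{d'})\,\vec u_{i,j}\cdot\vec u_{i,k}\right\rangle \tag{12}$$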
The angular brackets denote an average over configurations sampled by the atomistic MD simulation. In the following analysis it will be assumed that this average over configurations is equivalent to the appropriate ensemble average:
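$$\langle A\rangle = \frac{1}{L}\sum_{I=1}^{L}A\!\left(\mathbf r^{N_{AA},(I)}\right) = \int d\mathbf r^{N_{AA}}\;p(\mathbf r^{N_{AA}})\,A(\mathbf r^{N_{AA}}) \tag{13}$$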
The linear system in eq (10) is referred to as the “normal” MS-CG equations because it constitutes the normal equations of the linear least-squares problem defined by the residual in eq (1); accordingly, Gdd′ is a symmetric matrix. Previous numerical implementations of the MS-CG procedure16,17,24–28 have employed an additional block-averaging approximation to solve an equivalent system of over-determined equations for the force field, fd. It is the purpose of this work to further elucidate the physical significance of the “normal” MS-CG equations and, in particular, to relate them to well-known theories for the liquid state.20
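The normal equations can be assembled directly from sampled configurations and forces by accumulating eqs (11) and (12) and then solving eq (10). The following is a small, self-contained sketch, not the block-averaged solver used in the applications cited above. It assumes a cubic periodic box with the minimum-image convention, a hypothetical force table `F_TRUE`, and synthetic "reference" forces generated from that table. Because those forces then lie exactly in the span of the basis, solving the normal equations should recover `F_TRUE` to rounding error.

```python
import math
import random


def min_image(dx: float, box: float) -> float:
    # Minimum-image convention for a cubic periodic box of side `box`.
    return dx - box * round(dx / box)


def pair_terms(conf, box, r_min, dr, n_bins):
    """Yield (i, d, u_ij) for each ordered pair i != j with r_ij in table bin d."""
    n = len(conf)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            v = [min_image(a - b, box) for a, b in zip(conf[i], conf[j])]
            r = math.sqrt(sum(c * c for c in v))
            d = math.floor((r - r_min) / dr + 0.5)  # discrete delta basis, eq (9)
            if 0 <= d < n_bins:
                yield i, d, [c / r for c in v]


def normal_equations(confs, forces, box, r_min, dr, n_bins):
    """Configuration averages b_d (eq 11) and G_dd' (eq 12)."""
    b = [0.0] * n_bins
    G = [[0.0] * n_bins for _ in range(n_bins)]
    for conf, frc in zip(confs, forces):
        n = len(conf)
        nbrs = [[] for _ in range(n)]  # nbrs[i] = [(d, u_ij), ...]
        for i, d, u in pair_terms(conf, box, r_min, dr, n_bins):
            nbrs[i].append((d, u))
        for i in range(n):
            for d, u in nbrs[i]:
                b[d] += sum(f * c for f, c in zip(frc[i], u)) / n
                for d2, u2 in nbrs[i]:  # j == k is included, as in eq (12)
                    G[d][d2] += sum(p * q for p, q in zip(u, u2)) / n
    s = 1.0 / len(confs)
    return [x * s for x in b], [[x * s for x in row] for row in G]


def solve(G, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(G, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


# Synthetic data: reference forces built from a known (hypothetical) force table.
random.seed(0)
BOX, R_MIN, DR = 3.0, 0.5, 0.25
F_TRUE = [2.0, -1.0, 0.5, -0.25]
confs, forces = [], []
for _ in range(5):
    conf = [[random.uniform(0.0, BOX) for _ in range(3)] for _ in range(30)]
    frc = [[0.0, 0.0, 0.0] for _ in conf]
    for i, d, u in pair_terms(conf, BOX, R_MIN, DR, len(F_TRUE)):
        for a in range(3):
            frc[i][a] += F_TRUE[d] * u[a]
    confs.append(conf)
    forces.append(frc)

b, G = normal_equations(confs, forces, BOX, R_MIN, DR, len(F_TRUE))
f_fit = solve(G, b)
```

With real atomistic forces the right-hand side is no longer in the span of the basis, and the same code returns the least-squares force table instead of an exact match.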
In eq (10) all information regarding the atomistic forces has been packaged into the term bd. According to eq (11), this information enters as the average projection of the instantaneous total force on each CG site, i, onto the sum of unit vectors, $\vec u_{i,j}$, from all other CG sites, j, that are a distance rd from i in the given configuration, I. Thus, the quantity bd describes the average correlation between the instantaneous net force on each CG site and the local spatial distribution of CG sites a distance rd away. If this distribution is always instantaneously spherically symmetric, then bd = 0. Similarly, Gdd′ describes the average instantaneous fluctuations in the local density of CG sites at distances rd and rd′, respectively, from each CG site.
Since the j = k term has not been excluded from the triple sum in eq (12), the quantity Gdd′ includes both two- and three-particle correlations, which may be explicitly separated and analyzed:
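$$G_{dd'} = G^{(2)}_d\,\delta_{dd'} + G^{(3)}_{dd'} \tag{14}$$
$$G^{(2)}_d = \frac{1}{N_{CG}}\left\langle\sum_{i}\sum_{j\neq i}\delta_D(r_{i,j}-r_d)\right\rangle \tag{15}$$
$$G^{(3)}_{dd'} = \frac{1}{N_{CG}}\left\langle\sum_{i}\sum_{j\neq i}\sum_{k\neq i,j}\delta_D(r_{i,j}-r_d)\,\delta_D(r_{i,k}-r_{d'})\,\vec u_{i,j}\cdot\vec u_{i,k}\right\rangle \tag{16}$$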
The quantity $G^{(2)}_d$ counts the average number of distinct CG sites at a distance rd from a given CG site and is closely related to the radial distribution function for the CG sites. The quantity $G^{(3)}_{dd'}$ is a direct measure of three-particle correlations between CG sites in atomistic MD simulations. The factor $\vec u_{i,j}\cdot\vec u_{i,k} = \cos\theta_{i;j,k}$ is the cosine of the angle defined between the three CG sites, with the site i at the vertex of the angle, as illustrated in Figure 1. Thus, $G^{(3)}_{dd'}$ describes the constrained average value of this quantity for two distinct CG sites, j and k, that are distances rd and rd′, respectively, from each CG site, i. Because the three CG sites are distinct, excluded volume effects prevent sites j and k from overlapping. Consequently, for rd ≈ rd′, there exists a cone of small angles, $\theta_{i;j,k} \le \theta^*_{i;j,k}$, that are never sampled during the MD simulation. Configurations with large angles, for which $\vec u_{i,j}\cdot\vec u_{i,k} < 0$, are not disfavored and, as a result, the constrained average is negative for distances rd ≈ rd′ at which small-angle configurations are disfavored. This situation is illustrated in Figure 1b. The form of $G^{(3)}_{dd'}$ may at first seem somewhat artificial: the dot product factor arises as a consequence of the assumption in eq (8) that the MS-CG force field is directed along the inter-site vector. Moreover, it will be shown that this same factor arises quite naturally in liquid state theories for a central pair potential.
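The split in eq (14) is exact because $\delta_D(r_{i,j}-r_d)\,\delta_D(r_{i,j}-r_{d'}) = \delta_{dd'}\,\delta_D(r_{i,j}-r_d)$ and $\vec u_{i,j}\cdot\vec u_{i,j} = 1$, so the j = k terms of eq (12) collapse onto the diagonal. The sketch below, assuming a single configuration of randomly placed sites with open boundaries (all names are hypothetical), accumulates G, G(2) and G(3) separately so that the decomposition can be checked numerically.

```python
import math
import random


def correlation_matrices(points, r_min, dr, n_bins):
    """Single-configuration G (eq 12), G2 (eq 15) and G3 (eq 16); open boundaries."""
    n = len(points)
    G = [[0.0] * n_bins for _ in range(n_bins)]
    G3 = [[0.0] * n_bins for _ in range(n_bins)]
    G2 = [0.0] * n_bins
    for i in range(n):
        nbrs = []  # (j, d, u_ij) for neighbours j of site i that fall in the table
        for j in range(n):
            if j == i:
                continue
            v = [a - b for a, b in zip(points[i], points[j])]
            r = math.sqrt(sum(c * c for c in v))
            d = math.floor((r - r_min) / dr + 0.5)
            if 0 <= d < n_bins:
                nbrs.append((j, d, [c / r for c in v]))
        for j, d, u in nbrs:
            G2[d] += 1.0 / n
            for k, d2, u2 in nbrs:
                cos_jik = sum(p * q for p, q in zip(u, u2))  # u_ij . u_ik
                G[d][d2] += cos_jik / n           # triple sum including k == j
                if k != j:
                    G3[d][d2] += cos_jik / n      # distinct third site only
    return G, G2, G3


random.seed(1)
pts = [[random.uniform(0.0, 2.0) for _ in range(3)] for _ in range(40)]
G, G2, G3 = correlation_matrices(pts, r_min=0.3, dr=0.2, n_bins=5)
```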
Employing the definitions in eq (14)–eq (16), eq (10) may be re-expressed as:
$$b_d - G^{(2)}_d\,f_d = \sum_{d'} G^{(3)}_{dd'}\,f_{d'} \tag{17}$$
As mentioned above, all information regarding atomistic forces is contained within bd. The left hand side of eq (17) depends only upon two-particle information, while the right hand expression reflects three-particle correlations through the term $G^{(3)}_{dd'}$, which couples the equations for different force table elements. The first term in the left hand expression (i.e., bd) describes the average correlation of the total instantaneous force on each CG particle with the spatial distribution of CG particles a distance rd away. The second term describes the average net force on each CG particle from CG particles a distance rd away in terms of the MS-CG force field and the average two-particle distribution according to $G^{(2)}_d f_d$. The difference between this average projection of the total force and the average force arising directly from a CG particle at the fixed distance arises from interactions with a third particle. The right hand side of eq (17) then describes how forces from a third particle are correlated with the two-particle structure described in $G^{(2)}_d$. The quantity $G^{(3)}_{dd'} f_{d'}$ describes the average MS-CG force, projected onto $\vec u_{i,j}$, from a distinct third particle, k, a distance rd′ away from the i CG site, given that the j CG site is a distance rd away. The dot product in the definition of $G^{(3)}_{dd'}$ arises because it has been assumed that the interaction between each pair is along the vector connecting them.
Although the numerical MS-CG procedure explicitly employs information regarding the atomistic forces, it is proven in Appendix A that the normal MS-CG equations presented in eq (10) may be recast in a form that is independent of atomistic force information. Under the assumption that the forces on the CG sites measured in MD simulations may be expressed as the gradient of a many-body CG potential energy function generating the observed CG structure, it follows that:
$$b_d = 4\pi r_d^2\,\rho\,\Delta r\;k_BT\,g'(r_d) \tag{18}$$
where
$$g(r_d) = \frac{G^{(2)}_d}{4\pi r_d^2\,\rho\,\Delta r} \tag{19}$$
In eq (18) and eq (19) it has been assumed that the average over MD configurations is equivalent to the ensemble average; ρ = NCG/V0 is the number density of CG sites, and V0 is the total system volume. Equation (19) defines the radial distribution function for CG sites in an atomistic MD simulation.20 As proven in Appendix A, eq (18) follows immediately from the definition in eq (19).
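Equation (19) converts the pair counts $G^{(2)}_d$ into a radial distribution function. As a sanity check, assuming an ideal gas of uniformly random points in a periodic cube (a hypothetical test system, not one studied here), the estimate should fluctuate around g(r) ≈ 1, apart from an O(1/NCG) finite-size offset and the small error of the $4\pi r_d^2\Delta r$ shell approximation. The function name `rdf_from_counts` is illustrative.

```python
import math
import random


def rdf_from_counts(confs, box, r_min, dr, n_bins):
    """g(r_d) = G2_d / (4 pi r_d^2 rho dr), eq (19), with G2_d from eq (15).

    Each unordered pair is visited once and counted twice, which equals the
    ordered double sum over i and j != i in eq (15).
    """
    n = len(confs[0])
    rho = n / box ** 3
    g2 = [0.0] * n_bins
    for conf in confs:
        for i in range(n):
            for j in range(i + 1, n):
                v = [(a - b) - box * round((a - b) / box)
                     for a, b in zip(conf[i], conf[j])]  # minimum image
                r = math.sqrt(sum(c * c for c in v))
                d = math.floor((r - r_min) / dr + 0.5)
                if 0 <= d < n_bins:
                    g2[d] += 2.0 / (n * len(confs))
    return [g2[d] / (4.0 * math.pi * (r_min + d * dr) ** 2 * rho * dr)
            for d in range(n_bins)]


random.seed(2)
BOX = 4.0
ideal_gas = [[[random.uniform(0.0, BOX) for _ in range(3)] for _ in range(100)]
             for _ in range(20)]
g = rdf_from_counts(ideal_gas, BOX, r_min=0.5, dr=0.25, n_bins=6)
```

The largest bin edge (1.875) stays below half the box length, so the minimum-image distances are unambiguous.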
Multiplying both sides of eq (17) by the factor $1/(4\pi r_d^2\rho\,\Delta r)$ and employing the relations in eq (18) and eq (19), the MS-CG equations for the interaction, fd, may be expressed as:
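$$k_BT\,g'(r_d) - g(r_d)\,f_d = \sum_{d'}\frac{G^{(3)}_{dd'}}{4\pi r_d^2\,\rho\,\Delta r}\,f_{d'} \tag{20}$$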
Equation (20) contains the same information as eq (17) but replaces the explicit dependence upon atomistic forces with an equivalent quantity expressed in terms of the radial distribution function. This equation emphasizes the discrete nature of the MS-CG equations and suggests a continuous representation. The discrete delta function representation defined in eq (9) is particularly convenient for this purpose as, in the limit that Δr→0, eq (20) becomes a one-dimensional linear integral equation
$$k_BT\,g'(r) - g(r)\,f(r) = \rho\int_0^\infty dr'\,K_{FM}(r,r')\,f(r') \tag{21}$$
in terms of an FM kernel, $K_{FM}(r,r')$, which describes the effects of three-particle correlations according to
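$$K_{FM}(r,r') = \lim_{\Delta r\to 0}\frac{G^{(3)}_{dd'}}{4\pi r_d^2\,\rho^2\,(\Delta r)^2},\qquad r_d\to r,\; r_{d'}\to r' \tag{22}$$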
Equation (21) re-emphasizes the significance of three-particle correlations in the MS-CG procedure. The left hand side of this equation depends only upon two-particle information and, in the case that the right hand side vanishes, the MS-CG interaction becomes $f(r) = k_BT\,g'(r)/g(r) = -w'(r)$, where w(r) = −kBT ln g(r), i.e., the conventional 2-body pmf. Thus the MS-CG pair potential differs significantly from the 2-body pmf because it explicitly considers the effects of three-particle correlations between CG sites in determining a CG force field.
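In the discrete basis, eq (21) is a dense linear system, $\sum_{d'}\left[g(r_d)\,\delta_{dd'} + \rho\,\Delta r\,K_{FM}(r_d,r_{d'})\right] f_{d'} = k_BT\,g'(r_d)$. The sketch below assumes that tabulated values of g, g′ and the kernel are already available (the inputs shown are made up, and the function names are hypothetical) and solves this system by Gaussian elimination. Setting the kernel to zero reproduces the two-body mean force $k_BT\,g'/g = -w'(r)$.

```python
def solve_linear(A, y):
    """Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(y)
    M = [row[:] + [rhs] for row, rhs in zip(A, y)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def force_from_structure(g, dg, K, rho, dr, kT):
    """Discretised eq (21): sum_d' [g_d delta_dd' + rho*dr*K_dd'] f_d' = kT g'_d."""
    n = len(g)
    A = [[(g[d] if d == e else 0.0) + rho * dr * K[d][e] for e in range(n)]
         for d in range(n)]
    return solve_linear(A, [kT * x for x in dg])


# With the three-body kernel switched off, f reduces to kT g'/g = -w'(r).
g = [0.5, 1.2, 1.0]
dg = [2.0, -0.6, 0.1]
zero = [[0.0] * 3 for _ in range(3)]
f_pmf = force_from_structure(g, dg, zero, rho=1.0, dr=0.5, kT=1.0)
```

A nonzero kernel couples the table elements, so the solution is no longer a bin-by-bin ratio; this coupling is exactly how three-particle correlations move the MS-CG force away from the mean force.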
The normal MS-CG equations for a multi-component system are derived in Appendix C. The resulting system of equations for the MS-CG interaction between CG sites of types α and β may be expressed
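$$k_BT\,g'_{\alpha\beta}(r) - g_{\alpha\beta}(r)\,f_{\alpha\beta}(r) = \frac{1}{2}\sum_{\gamma}\rho_\gamma\int_0^\infty dr'\left[K^{FM}_{\alpha\beta;\gamma}(r,r')\,f_{\alpha\gamma}(r') + K^{FM}_{\beta\alpha;\gamma}(r,r')\,f_{\beta\gamma}(r')\right] \tag{23}$$
where ργ is the number density of CG sites of type γ and $K^{FM}_{\alpha\beta;\gamma}(r,r')$ is the multi-component analogue of eq (22) for a central site of type α, a site of type β at distance r, and a third site of type γ at distance r′.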
This equation generalizes eq (21) to consider multiple types of CG particles (labeled with Greek indices), the relevant interactions between them, and also the effect of three-particle correlations centered about the second particle, which is a distinct case for α ≠ β. A more detailed presentation of the derivation is provided in the Supporting Information.
To summarize the preceding analysis, the MS-CG method has been applied to derive a linear integral equation for the MS-CG force field. By assuming that the CG force field is both pair-wise additive and central, according to eq (7) and eq (8), and employing the discrete delta function basis of eq (9), the linear least squares problem defined by the residual in eq (1) has been transformed into a system of linear algebraic equations (10) describing the relationship between the CG force field and the distribution of CG sites. By separating two- and three- particle contributions according to eq (14) and employing the identity in eq (18) to eliminate atomistic force information, the normal MS-CG equations have been recast in a form that depends only upon structural information according to eq (20). Finally, using the discrete basis to pass into the continuum limit, this system of equations has been expressed as the integral equation in eq (21), which generalizes to eq (23) for systems with multiple types of CG sites.
2.3. Yvon-Born-Green Equation
As demonstrated in subsection 2.2, the normal MS-CG equations are a discrete representation of a one-dimensional linear integral equation. For a homogeneous isotropic system, this integral equation is equivalent to the Yvon-Born-Green equation20 describing CG distribution functions resulting from a central pair force field. This result is briefly derived for a one-component system in the present subsection. The general result for multi-component systems is derived in Appendix D. A more detailed presentation is provided as Supporting Information.
Consider a system described by NCG identical CG sites. The Hamiltonian for this system evolving under an NCG-particle potential function, VNCG, is defined:
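$$H = \sum_{i=1}^{N_{CG}}\frac{\vec p_i^{\,2}}{2m} + V_{N_{CG}}(\vec r_1,\ldots,\vec r_{N_{CG}}) \tag{24}$$
in which m denotes the mass of each CG site and $\vec p_i$ its momentum.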
Assuming that the total force on each CG site, $\vec F_i = -\partial V_{N_{CG}}/\partial\vec r_i$, arises from a sum of pair interactions,
$$\vec F_i = \sum_{j\neq i}\vec F_{i,j}(\vec r_i,\vec r_j) \tag{25}$$
then for a homogeneous system with ρ = NCG/V0 it follows that the distribution of three arbitrary, but distinct, CG particles, i, j, and k, may be described according to
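$$k_BT\,\nabla_i\,g^{(2)}(\vec r_i,\vec r_j) = \vec F_{i,j}(\vec r_i,\vec r_j)\,g^{(2)}(\vec r_i,\vec r_j) + \rho\int d\vec r_k\;\vec F_{i,k}(\vec r_i,\vec r_k)\,g^{(3)}(\vec r_i,\vec r_j,\vec r_k) \tag{26}$$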
Equation (26) is the well-known Yvon-Born-Green (YBG) equation20 relating the equilibrium two- and three- CG particle distribution functions to the pair-wise decomposable force field, F⃑i,j. If the system is also isotropic, the two-particle correlation function is equal to its translational and orientational average:
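$$g^{(2)}(\vec r_i,\vec r_j) = \hat P_{RT}\,g^{(2)}(\vec r_i,\vec r_j) \equiv \frac{1}{4\pi V_0}\int d\vec r_i\int d\Omega_{j,i}\;g^{(2)}(\vec r_i,\vec r_i+\vec r_{j,i}) = g(r_{i,j}) \tag{27}$$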
Equation (27) defines an operator, P̂RT, which averages over both translation and rotation of the system, where Ωj,i defines the orientation of the vector r⃑j,i from the i to the j CG site. For an isotropic homogeneous system the gradient term in eq (26) may then be simplified as $\nabla_i\,g^{(2)}(\vec r_i,\vec r_j) = g'(r_{i,j})\,\vec u_{i,j}$. Under the additional assumption that the pair interaction between CG sites is central, such that
$$\vec F_{i,j}(\vec r_i,\vec r_j) = f(r_{i,j})\,\vec u_{i,j} \tag{28}$$
eq (26) may be re-expressed to read
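$$k_BT\,g'(r_{i,j})\,\vec u_{i,j} = f(r_{i,j})\,g(r_{i,j})\,\vec u_{i,j} + \rho\int d\vec r_k\;f(r_{i,k})\,\vec u_{i,k}\,g^{(3)}(\vec r_i,\vec r_j,\vec r_k) \tag{29}$$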
Projecting this equation onto the vector u⃑i,j and shifting the integration variable, one obtains the following result:
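$$k_BT\,g'(r_{i,j}) - f(r_{i,j})\,g(r_{i,j}) = \rho\int d\vec r_{i,k}\;f(r_{i,k})\,(\vec u_{i,j}\cdot\vec u_{i,k})\,g^{(3)}(\vec r_i,\vec r_j,\vec r_i-\vec r_{i,k}) \tag{30}$$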
Thus the dot product factor, $\vec u_{i,j}\cdot\vec u_{i,k}$, arises naturally in an integral equation theory, just as it did in the MS-CG equations. For a system described by a central pair potential without an external field, the average effect of a third particle on two-particle correlations must lie along the two-particle vector.
Because the left hand side of eq (30) depends only on the distance, ri,j, upon application of the operator P̂RT, eq (30) may be expressed as a one-dimensional integral equation:
$$k_BT\,g'(r) - f(r)\,g(r) = \rho\int_0^\infty dr'\,K_{YBG}(r,r')\,f(r') \tag{31}$$
Equation (31) defines a YBG kernel, $K_{YBG}(r,r')$, describing the effects of a third particle on two-particle correlations,
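$$K_{YBG}(r,r') = \hat P_{RT}\left[r'^2\int d\Omega_{k,i}\;(\vec u_{i,j}\cdot\vec u_{i,k})\,g^{(3)}(\vec r_i,\vec r_j,\vec r_k)\right]_{r_{i,j}=r,\;r_{i,k}=r'} \tag{32}$$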
Moreover, it is shown in Appendix B that
$$K_{YBG}(r,r') = K_{FM}(r,r') \tag{33}$$
Therefore, for a homogeneous isotropic system evolving under a central pair potential, the YBG eq (26) may be reduced exactly to the FM equation (21):
$$k_BT\,g'(r) - g(r)\,f(r) = \rho\int_0^\infty dr'\,K_{FM}(r,r')\,f(r') \tag{34}$$
Appendix D generalizes the YBG theory for a homogeneous isotropic multi-component CG system. If the system is governed by a set of central pair potentials, the generalized YBG equation reduces to a form that is equivalent to the multi-component MS-CG equations (23):
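$$k_BT\,g'_{\alpha\beta}(r) - g_{\alpha\beta}(r)\,f_{\alpha\beta}(r) = \frac{1}{2}\sum_{\gamma}\rho_\gamma\int_0^\infty dr'\left[K^{YBG}_{\alpha\beta;\gamma}(r,r')\,f_{\alpha\gamma}(r') + K^{YBG}_{\beta\alpha;\gamma}(r,r')\,f_{\beta\gamma}(r')\right] \tag{35}$$
where $K^{YBG}_{\alpha\beta;\gamma}(r,r')$ is the multi-component analogue of the kernel in eq (32).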
A more detailed derivation is provided in the Supporting Information.
The analysis of section 2 elucidates the general significance of three-particle correlations for deducing CG pair potentials in general and the MS-CG pair force field in particular. Furthermore, the equivalence of eq (23) and eq (35) indicates that the MS-CG method incorporates higher-order correlations in a mechanism that is consistent with the well-known statistical mechanics of the liquid state.20 However, while the analysis of subsection 2.1 and subsection 2.2 and, in particular, the “normal” MS-CG equations presented in eq (17), are generally valid and may be applied to determine a CG force field for any system, the preceding analysis of subsection 2.3, which relates the MS-CG equations to the YBG theory, is strictly valid only for isotropic, homogeneous systems. Moreover, there exists no general proof that the two- and three- CG particle distribution functions measured from atomistic MD simulations may be related to a simple pair potential through a YBG-type equation.19,31 Rather, as stressed in subsection 2.1, the residual defined by eq (1) is minimized by a many-body CG interaction potential. The MS-CG pair potential investigated in subsection 2.2 is only an approximate decomposition of this interaction potential, albeit one that has physical significance and which incorporates critical three-body correlations as demonstrated in subsection 2.2 and subsection 2.3.
3. Results
The analysis of the previous section demonstrates that the MS-CG equations reflect two- and three-particle correlations between CG sites observed in atomistic MD simulations. The following figures illustrate the effect of these structural correlations on the normal MS-CG equations for coarse-graining a system of 216 Lennard-Jones (LJ) spheres and a system of 216 simple point charge (SPC) water molecules.32 The CG mapping for the LJ system is an identity operation and the resulting MS-CG equations reflect the atomistic structure. The SPC water system is coarse-grained onto two different one-site models: the COM (COG) model maps each SPC molecule onto a single site located at its center-of-mass (center of geometry). To facilitate comparison between these systems, the LJ sphere system was modeled using the LJ parameters for the oxygen atom in the SPC model. In the following figures, the solid lines correspond to the LJ system, the dashed lines correspond to the SPC system analyzed in terms of the COG, and the dotted lines correspond to the SPC system analyzed in terms of the COM. All MD simulations were performed in the constant NVT ensemble using the GROMACS 3.3.0 software package,33 with V0 = (1.8602 nm)³ and T = 298 K maintained with the Nosé-Hoover thermostat.34,35 All quantities computed from MD simulation were tabulated on a grid with Δr = 0.001 nm according to the discrete delta function basis defined in eq (9). Because the grid is so fine, the systems were simulated for 30 ns in order to adequately converge all relevant quantities and accurately evaluate necessary numerical derivatives.
Analyzing the MS-CG procedure for the LJ fluid is instructive because the “coarse-graining” for this system is an identity mapping. Clearly there exists a central pair potential that will exactly reproduce the system structure (i.e., the LJ pair potential), and it is illuminating to investigate the mechanism by which the MS-CG procedure recovers this pair potential from the relevant distribution functions. In contrast, there does not necessarily exist a pair potential for a one-site CG model of water that will reproduce both the two- and three-particle distribution functions measured in atomistic MD simulations.19,31 The MS-CG procedure determines a central pair potential that would reproduce this structure under the assumption that the CG distribution functions were related by the YBG equation to the assumed central pair potential.
Equation (20) expresses the MS-CG equations in terms of the radial distribution function (rdf), g(r). The rdf’s measured for CG sites in the atomistic simulations are presented in Figure 2. The maxima and minima of the LJ rdf are roughly evenly spaced, corresponding to simple packing. In contrast, the first peak of both SPC rdf’s is much narrower and the first minimum occurs at shorter range, reflecting the short-range packing effects in water due to the presence of hydrogen atoms. The SPC-COM rdf has a larger and more pronounced hard-core region because the CG site roughly corresponds to the oxygen atom of each molecule and the hydrogen atoms that have been integrated out in the CG representation generate a larger excluded volume between the CG sites. In contrast, the SPC-COG model has a less well-defined hard-core region and presents a smaller excluded volume effect. The LJ rdf indicates much longer and more pronounced correlations than either representation of SPC water.
As indicated in section 2.2, if three-particle correlations were not considered in the MS-CG procedure, the resulting CG pair potential would be the conventional two-body pmf. Consequently, the difference between the two-body pmf, w(r), and the MS-CG pair potential may be attributed entirely to the effect of three-particle correlations. Figure 3 directly compares the MS-CG force field, f(r), (black curve) with the mean force, fmf(r) = −w′(r), (red curve) for each system. In Figure 3, panel (a) corresponds to the LJ system, while panels (b) and (c) correspond to the SPC-COG and SPC-COM systems, respectively. Figure 4 compares the MS-CG forces (panel a) and the mean forces (panel b) for the different systems. In Figure 4 solid curves describe the LJ system, dashed curves describe the SPC-COG system, and dotted curves describe the SPC-COM system. The mean force is computed from fmf(r) = kBTg′(r)/g(r), where g′(r) is evaluated using atomistic force information according to eq (18) and the rdf has been smoothed with a running-average over adjacent table elements. The exact LJ force field is also presented as a solid light green line in Figure 3a and it can be seen that the difference between the exact LJ force field and the MS-CG force field for the LJ system is essentially within the thickness of the lines. The mean force for both SPC models vanishes rapidly with increasing r and is relatively featureless after the first attractive well, while the mean force for the LJ model demonstrates longer-range interactions with significant repulsive and attractive regions. The degrees of freedom that have been integrated out in the coarse-grained SPC model screen the mean force between molecules. The MS-CG force field obtained from either representation of the SPC model contains two attractive wells. The first attractive well is deeper and narrower than the attractive well in the LJ force field.
The rapid decay of the mean force after the attractive well in both CG SPC models generates a repulsive barrier in the MS-CG force field at the same distance as the first minimum in the SPC rdf, which corresponds to the presence of hydrogen bonding between each SPC oxygen atom and the second nearest non-bonded hydrogen.36 The barrier between the attractive wells is larger and occurs at shorter range for the SPC-COM model than for the SPC-COG model. The oscillations in the LJ mean force correspond to a smooth monotonic decay of the MS-CG force field.
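Because eqs (18) and (19) share the factor $4\pi r_d^2\rho\,\Delta r$, the mean force $f_{mf}(r_d) = k_BT\,g'(r_d)/g(r_d)$ can be read off as $b_d/G^{(2)}_d$ without numerically differentiating the rdf. The minimal sketch below assumes a centred three-point running average (`half_width=1`) as the smoothing of the pair counts; the precise smoothing window used for the figures is not specified in the text, and the function names are hypothetical.

```python
import math


def running_average(values, half_width=1):
    """Centred running average; the window shrinks at the ends of the table."""
    n = len(values)
    out = []
    for d in range(n):
        lo, hi = max(0, d - half_width), min(n, d + half_width + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def mean_force_table(b, g2, half_width=1):
    """f_mf(r_d) = kT g'(r_d)/g(r_d), evaluated as b_d / G2_d via eqs (18)-(19).

    The common factor 4 pi r_d^2 rho dr cancels, so kT never appears explicitly.
    G2_d is smoothed first; bins without any counts give NaN.
    """
    g2_smooth = running_average(g2, half_width)
    return [bd / gd if gd > 0.0 else math.nan for bd, gd in zip(b, g2_smooth)]
```

For instance, `running_average([1.0, 2.0, 3.0, 4.0])` gives `[1.5, 2.0, 3.0, 3.5]`; only the pair-count table is smoothed, while the force projections b_d enter unchanged.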
Three-particle correlations enter the MS-CG equations through the quantity $G^{(3)}_{dd'}$ defined in eq (16), which in the continuum limit may be represented by the symmetric function $G^{(3)}(r,r') = \lim_{\Delta r\to 0} G^{(3)}_{dd'}/(\Delta r)^2$. This quantity is plotted in Figure 5 as a function of r for r′ = 0.3 and 0.6 nm. This function reflects the influence of both direct and indirect excluded volume effects on the fluid structure and, ultimately, on the MS-CG interaction potential between CG sites. The three-particle correlations couple the equations for the force field elements, fd, according to eq (17). Figure 5 has been presented to highlight the fine structure of the three-particle correlations between CG sites within the atomistic system. The range of the coupling, though, is somewhat exaggerated by Figure 5, since the other terms in eq (17) scale as r² due to the statistics of particles on a shell.
For molecular scale coarse-graining (i.e., coarse-graining on length- and time-scales for which molecular excluded volume is relevant), the presence of a fixed third CG particle impacts the distribution of a given pair of CG sites. As illustrated in Figure 1, the geometry of contact between the second and third CG particles defines an excluded volume cone. This cone corresponds to a range of small angles, θi;j,k ≤ θ*i;j,k, that are not sampled during the MD simulation and, as a result, the spherical average of the cosine of this angle (cos θi;j,k = u⃑i,j · u⃑i,k) does not vanish. This depletion effect is clearly evident in the calculations of Figure 5, where it can be seen that G(3)(r,r′) has a negative peak centered at r ≈ r′. Figure 5 also demonstrates a corresponding density enhancement resulting from the solvation shell of the third CG particle, centered roughly at r ≈ r′ ± 0.3 nm. The negative peak of the three-particle correlation function is most pronounced near the first maximum of the rdf. For larger r′, the peak is more diffuse. The SPC water and LJ fluid three-particle correlations are quite similar for r, r′ > 0.3 nm and this similarity increases with increasing r, r′. This result is perhaps quite surprising and suggests that, for simple fluids, a reasonable approximation to the three-particle correlation functions important to the MS-CG method might be obtained by appropriately rescaling the LJ three-particle correlation function. This would dramatically reduce the necessary simulation time involved in determining the MS-CG force field, since the three-particle correlation function is clearly the most difficult quantity in the MS-CG equations to converge.
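The size of the excluded volume cone can be estimated from simple geometry. As a hypothetical simplification, treat the CG sites as hard spheres of diameter σ. Sites j and k at distances r and r′ from site i then cannot approach closer than σ, and the law of cosines gives the limiting angle through $\cos\theta^* = (r^2 + r'^2 - \sigma^2)/(2rr')$. The function name is illustrative, and real CG sites have soft cores, so this is only a qualitative guide to where the negative peak of $G^{(3)}(r,r')$ comes from.

```python
import math


def excluded_cone_angle(r: float, r_prime: float, sigma: float) -> float:
    """Limiting angle theta* (radians) below which |r_j - r_k| < sigma.

    Sites j and k lie at distances r and r_prime from the central site i;
    sites are modelled as hard spheres of diameter sigma (an assumption).
    """
    cos_t = (r * r + r_prime * r_prime - sigma * sigma) / (2.0 * r * r_prime)
    if cos_t >= 1.0:
        return 0.0  # |r - r_prime| >= sigma: no angle is excluded
    return math.acos(max(-1.0, cos_t))  # r + r_prime < sigma: every angle excluded


# At contact distance (r = r' = sigma) the excluded cone extends to 60 degrees.
theta_contact = math.degrees(excluded_cone_angle(1.0, 1.0, 1.0))
```

The cone shrinks as r = r′ grows at fixed σ and disappears once |r − r′| ≥ σ, consistent with the depletion being concentrated near r ≈ r′.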
4. Discussion
The analysis of section 2.1 proves that, if no approximations are made in the form of the CG force field, then the MS-CG method determines a multi-dimensional potential of mean force between CG sites according to eq (5). Simulations of CG models employing this many-body interaction potential will sample CG representations of atomistic configurations according to the atomistic distribution function. Such a simulation would reproduce any structural or thermodynamic property of the atomistic system that is observable in the CG simulation. However, calculating and simulating such a potential is generally impractical. Instead, CG force fields often employ pair potentials to model non-bonded interactions between CG sites. It is therefore important to understand how CG pair potentials incorporate many-body effects to approximate this multi-dimensional potential of mean force.
The YBG equation states an exact relationship between the equilibrium two- and three-particle distribution functions resulting from simulations of a given pair potential function.20 If this pair potential depends only upon the inter-particle distance, then, for an isotropic homogeneous system, the YBG equation may be reduced to a one-dimensional integral equation that is equivalent to the MS-CG equations. Thus the MS-CG equations are equivalent in form to a statement of the exact relationship between the two- and three-particle distribution functions that arise from simulations of a homogeneous, isotropic system employing a central pair potential. (However, it should be noted that the converse of the YBG relationship does not necessarily follow. Although a given pair potential determines a set of resulting correlation functions, a given set of correlation functions does not necessarily determine a pair potential. Thus this relationship does not directly address the general validity of approximating the many-body CG force field with a pairwise additive form.)
The MS-CG procedure may therefore be considered a novel numerical mechanism for solving the ‘inverse’ problem of liquid state theory,19 i.e., determining an interaction pair potential that generates a given set of distribution functions. If a given set of two- and three-particle distribution functions is generated from a pair interaction, then this interaction may be determined by inverting the generalized YBG equation (26) for F⃑(x⃑i,x⃑j). If, moreover, this interaction is central and depends only upon the distance between the two particles, then this pair-wise central force field may be determined by solving the MS-CG equation (23) for f(ri,j). Rather than numerically differentiating the rdf, which may require extensive sampling, the MS-CG method exploits atomic force information directly.
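For orientation, the textbook single-component YBG equation for a homogeneous fluid with pair potential v(r) (in the generic notation of ref 20, not the notation of eq (26) or eq (35) of this paper) is reproduced below, together with its projection onto the unit vector u₁₂ = (r⃑1 − r⃑2)/r12 for an isotropic fluid. The projection uses ∇1 v(r13) = v′(r13) u₁₃, so the three-particle term is weighted by u₁₂ · u₁₃, which is exactly the cosine cos θ1;2,3 whose average is discussed above.

```latex
-k_B T\,\nabla_1 g^{(2)}(\mathbf{r}_1,\mathbf{r}_2)
  = g^{(2)}(\mathbf{r}_1,\mathbf{r}_2)\,\nabla_1 v(r_{12})
  + \rho \int \nabla_1 v(r_{13})\,g^{(3)}(\mathbf{r}_1,\mathbf{r}_2,\mathbf{r}_3)\,d\mathbf{r}_3

% isotropic fluid: g^{(2)} = g(r_{12}); dot both sides with u_{12}
-k_B T\,\frac{dg(r)}{dr}
  = g(r)\,v'(r)
  + \rho \int v'(r_{13})\,(\mathbf{u}_{12}\cdot\mathbf{u}_{13})\,
      g^{(3)}(\mathbf{r}_1,\mathbf{r}_2,\mathbf{r}_3)\,d\mathbf{r}_3,
  \qquad r = r_{12}
```

Given g(r) and g(3) from a reference simulation, the second line is linear in the unknown pair force −v′(r), which is the sense in which the inverse problem can be posed as a linear equation.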
The Ornstein-Zernike (OZ) and YBG equations are two of the best-known integral equations of liquid state theory.20 The ‘direct’ solution of either integral equation for the two-particle distribution function requires an (approximate) closure relation that determines a second unknown function in the integral equation. The closure for the OZ equation involves the direct correlation function, c(r), while the closure relation for the YBG equation involves the three-particle distribution function. The OZ equation has been particularly useful for investigating the structure of simple liquids because the direct correlation function is short-ranged and, consequently, more amenable to theoretical and numerical analysis than the YBG equation, which depends upon the three-particle correlation function, g(3). The OZ equation has also been employed in developing force fields for CG models.14 However, the YBG equation may be more useful than the OZ equation for solving the inverse problem of developing CG force fields. Although c(r) may be determined by directly inverting the OZ equation, approximate closure relations are still necessary to relate the direct correlation function to a CG pair potential. In contrast, the YBG equation for the CG pair potential may be closed by directly computing the relevant distributions of CG sites within an atomistic MD simulation. Inverting the resulting equation provides a force field that will reproduce the correct CG structure, under the assumption that such a force field exists.
It is a fundamental assumption that the CG distribution functions measured in atomistic MD simulations may be generated from a pair-wise central force field. In general, there may not exist a CG pair potential that reproduces both the two- and three-particle distribution functions characterizing a given system. Consequently, the MS-CG equations presented in eq (21) are not necessarily an exact identity describing the CG structure, because the assumed pair potential giving rise to the measured CG two- and three-particle distribution functions may not exist. In fact, recent work has demonstrated that many-body37–41 and non-central42,43 interactions may be critical for the coarse-grained modeling of proteins.
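The direct inversion of the OZ equation mentioned above is algebraic in Fourier space: for a single-component homogeneous fluid with total correlation function h(r) = g(r) − 1, ĥ = ĉ + ρĉĥ, so ĉ(k) = ĥ(k)/[1 + ρĥ(k)]. The sketch below assumes a radially symmetric h(r) tabulated on a uniform grid and uses trapezoidal quadrature for the three-dimensional radial Fourier transform; the grid, the density, and the Gaussian test function used to check it are illustrative choices. As the text stresses, this yields c(r) but not a pair potential, which still requires a closure.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal rule along the last axis (written out to avoid NumPy-version differences)."""
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1)


def radial_fourier_transform(f_r, r, k):
    """3D Fourier transform of a radially symmetric f(r), evaluated at k > 0:
    f_hat(k) = (4 pi / k) * integral_0^inf r sin(k r) f(r) dr."""
    integrand = r[None, :] * np.sin(k[:, None] * r[None, :]) * f_r[None, :]
    return 4.0 * np.pi / k * trapezoid(integrand, r)


def direct_correlation_hat(h_r, r, k, rho):
    """Invert the single-component OZ equation in k-space:
    h_hat = c_hat + rho * c_hat * h_hat  =>  c_hat = h_hat / (1 + rho * h_hat)."""
    h_hat = radial_fourier_transform(h_r, r, k)
    return h_hat / (1.0 + rho * h_hat)
```

Returning to real space uses the same kind of quadrature with the inverse prefactor, c(r) = (1/(2π²r)) ∫ k sin(kr) ĉ(k) dk.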
The underlying variational principle provides tremendous flexibility in the MS-CG methodology. By minimizing the residual in eq (1), the MS-CG method systematically determines an optimal approximation to the multi-dimensional many-body CG PMF. If no assumptions are made in the form of the CG interaction, the MS-CG procedure determines this many-body PMF, which would reproduce both two- and three-particle structural correlations in CG simulations. Previous numerical implementations of the MS-CG procedure16,17,24–28 have determined and employed an optimal central pair decomposition of this CG PMF. However, in principle, the MS-CG procedure may be readily generalized to incorporate both non-central and many-body effects within the CG force field by relaxing the assumptions explicit in eqs (7)–(8). Many-body interactions may be incorporated into the MS-CG force field either by generalizing the approach of eq (9) and tabulating additional many-body interaction terms on a multidimensional grid, or by parameterizing an assumed functional form and minimizing the MS-CG residual. Similarly, non-central interactions may also be incorporated into the CG force field, for instance, by expanding the pair interaction in eq (8) as a series of spherical harmonics.43 However, if the interaction does not depend linearly upon the force field parameters, the minimization procedure for optimizing the MS-CG force field requires a nonlinear regression algorithm.22
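When the force field is linear in its parameters, minimizing the residual reduces to linear least squares. The minimal single-component sketch below assumes non-periodic point sites and a central pair force f(r) expanded in piecewise-constant functions on bins between two hypothetical cutoffs (standing in for the discrete delta-function basis of eq (9)); reference forces are generated from a known force table so that the fit can be checked. Each row of the design matrix G is one Cartesian force component on one site in one configuration, and the normal equations GᵀG f = GᵀF are solved for the tabulated force. All function names are illustrative, and positive f is taken to be repulsive.

```python
import numpy as np


def pair_design_matrix(x, edges):
    """Design matrix G for a central pair force on piecewise-constant bins.

    x: (n_conf, n_sites, 3) positions (no periodic boundaries assumed).
    edges: increasing bin edges, shape (n_bins + 1,).
    Row (I, i, a), column d holds sum_j [r_ij in bin d] * u_ij[a], with
    u_ij = (x_i - x_j) / r_ij, so the model force is F = G @ f.
    """
    n_conf, n_sites, _ = x.shape
    n_bins = len(edges) - 1
    G = np.zeros((n_conf, n_sites, 3, n_bins))
    for I in range(n_conf):
        diff = x[I][:, None, :] - x[I][None, :, :]   # x_i - x_j
        dist = np.linalg.norm(diff, axis=-1)
        np.fill_diagonal(dist, np.inf)               # drop i == j
        bins = np.digitize(dist, edges) - 1          # outside the basis -> -1 or n_bins
        u = diff / dist[..., None]                   # unit vectors (zero on the diagonal)
        for d in range(n_bins):
            G[I, :, :, d] = np.sum(u * (bins == d)[..., None], axis=1)
    return G.reshape(n_conf * n_sites * 3, n_bins)


def reference_forces(x, edges, f_table):
    """Forces from a known piecewise-constant pair force (used only to test the fit)."""
    forces = np.zeros_like(x)
    n_conf, n_sites, _ = x.shape
    for I in range(n_conf):
        for i in range(n_sites):
            for j in range(n_sites):
                if i == j:
                    continue
                rij = x[I, i] - x[I, j]
                r = np.linalg.norm(rij)
                d = np.searchsorted(edges, r, side="right") - 1
                if 0 <= d < len(f_table):
                    forces[I, i] += f_table[d] * rij / r
    return forces.reshape(-1)


def force_match(G, F):
    """Solve the normal equations (G^T G) f = G^T F for the tabulated force f."""
    return np.linalg.solve(G.T @ G, G.T @ F)
```

For poorly conditioned problems, `np.linalg.lstsq(G, F, rcond=None)` is a numerically safer alternative to forming GᵀG explicitly; a nonlinear parameterization would instead require an iterative regression, as noted above.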
The reverse Monte Carlo (RMC) method has been employed to determine a CG potential that reproduces a given CG rdf when used in CG simulations.2,29 Chayes and Chayes19 have proven that there does indeed exist a unique44 pair potential that will reproduce an observed rdf.14 However, this potential is not guaranteed to reproduce higher order distribution functions.31 Although by design the RMC method should reproduce CG rdf’s to the desired numerical precision,2,29 simulations employing this pair potential may not necessarily reproduce the correct three-particle correlations for the system. In particular, using a similar RMC procedure to generate water configurations from known radial distribution functions, Jedlovszky et al.45 have demonstrated that three-particle correlations may be inaccurately reproduced even though the observed rdf’s are quantitatively accurate. In contrast, Iuchi et al.46 have demonstrated that the MS-CG procedure accurately reproduces both the two- and three-particle correlations of a polarizable 4-site water model using a non-polarizable 4-site MS-CG force field.
It is of some interest to briefly compare the MS-CG and RMC methods for developing CG potentials in light of the YBG equation for homogeneous isotropic systems. The MS-CG method implicitly measures the two- and three-particle correlation functions describing CG sites within an atomistic MD simulation and then directly inverts the YBG equation to determine a central CG pair potential that would generate these distribution functions, should such a potential exist. In general such a potential may not exist and, consequently, simulations employing the MS-CG pair potential will not necessarily reproduce either the two- or three-particle correlations exactly. Rather, CG simulations employing the MS-CG force field will satisfy a different YBG equation relating the fixed MS-CG force field and the resulting CG distribution functions. Although the MS-CG force field does not necessarily reproduce either the two- or three-particle distribution functions exactly, the method clearly incorporates three-particle correlations. Moreover, because the method does not necessarily reproduce the pair correlation functions, comparison of the relevant rdf’s between atomistic and CG MD simulations is a useful measure of the validity of the CG model.
In contrast, the RMC method directly considers only the two-particle correlation function and attempts to solve the YBG equation for the pair interaction that reproduces the target rdf, while allowing the three-particle correlation function to vary as necessary.2,29 The repeated MC or MD simulations used in iteratively updating the pair potential may be considered a nonlinear regression algorithm that tries to solve the YBG equation for a pair potential reproducing a fixed rdf. If three-particle correlations were not significant in the YBG equation, then the pair potential would simply be the two-body pmf, which is often used as the initial condition in the search for the optimal pair potential. With successive iterations, the simulated rdf converges to the target rdf measured from atomistic MD simulations.2 The RMC method implicitly incorporates information regarding three-particle correlations by updating the force field to improve agreement between the measured and target rdf, although the three-particle correlations may change in successive simulations. The YBG equation that is implicitly solved through the RMC method incorporates the target rdf and is guaranteed to reproduce this rdf. However, the three-particle correlations in the final YBG equation may differ from those in the original atomistic representation of the system.
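The initial condition mentioned above, the two-body pmf w(r) = −kBT ln g(r), is straightforward to tabulate. For the refinement step, the sketch below substitutes a different and commonly used scheme, an iterative-Boltzmann-inversion style update V_{n+1}(r) = V_n(r) + kBT ln[g_n(r)/g_target(r)]; it is shown only to illustrate the generic structure of iterative pair-potential refinement and is not the RMC procedure of refs 2 and 29. The CG simulation that produces g_n(r) at each iteration is outside the sketch, and the `g_floor` guard against ln 0 inside the excluded-volume core is a hypothetical choice.

```python
import numpy as np


def two_body_pmf(g, kT, g_floor=1e-12):
    """Two-body potential of mean force w(r) = -kT ln g(r) on a tabulated grid.

    g_floor (hypothetical) keeps the logarithm finite where g(r) = 0."""
    return -kT * np.log(np.maximum(g, g_floor))


def boltzmann_inversion_update(v, g_sim, g_target, kT, g_floor=1e-12):
    """One IBI-style correction of a tabulated pair potential.

    Where the simulated rdf is too high the potential is raised, and vice versa;
    the update vanishes once g_sim equals g_target."""
    ratio = np.maximum(g_sim, g_floor) / np.maximum(g_target, g_floor)
    return v + kT * np.log(ratio)
```

Note that this fixed point constrains only the rdf: nothing in the update refers to three-particle correlations, which is the point made in the paragraph above.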
In closing, two additional points require brief discussion. The preceding analysis directly addresses non-bonded interactions between CG sites. Bonded interactions may be treated as special cases of non-bonded interactions or by introducing additional terms into the MS-CG potential. In the latter case, the contributions from these interactions should be subtracted from the total forces on CG sites, and the preceding analysis still holds for the non-bonded interactions. Additionally, it has been mentioned repeatedly that the MS-CG equations are equivalent to the YBG equation for a homogeneous, isotropic system evolving under a central pair potential. In principle, a system described by a central pair potential may be both homogeneous and isotropic. However, these symmetries are not necessarily realized in MD simulations of complex interfacial or biological systems; the presence of a lipid bilayer or a protein in an MD simulation box may break them. The analysis of Section 2.3, though, requires these additional assumptions of isotropy and homogeneity. Consequently, the FM equations are rigorously equivalent to the YBG equations in eq (35) only for relatively simple systems. However, the analysis of Section 2.2 demonstrates that the FM force field incorporates critical information regarding three-particle correlations for any system. It has been demonstrated empirically that even for highly complex systems the MS-CG method generates a useful and quantitatively accurate model.16,17,24–28,46
5. Conclusions
The present work develops a statistical mechanical framework for understanding the foundations and success of the MS-CG method. It has been shown that the MS-CG method may be used to derive a multi-dimensional pmf for the interactions between CG sites. The “normal” MS-CG equations have been derived for approximating this many-body interaction with central pair potentials. The derivation demonstrates that the MS-CG equations for these potentials reflect not only two-body, but also three-body correlations observed between CG sites during atomistic MD simulations. The generalized YBG equation has been presented for CG systems and it has been proven that this generalized YBG equation is equivalent to the normal MS-CG equations for a homogeneous, isotropic system evolving under a central pair potential. The present analysis provides a connection between the MS-CG method and equilibrium statistical mechanics and also illuminates the general significance of three-particle correlations for deriving CG effective pair potentials.
Supplementary Material
Acknowledgements
W.G. Noid acknowledges funding from the NIH through a Ruth L. Kirschstein NRSA postdoctoral fellowship grant number 5F32GM076839 - 02. This research was also supported in part by the National Science Foundation (CHE-0628257). An allocation of computer time from the Center for High Performance Computing at the University of Utah is gratefully acknowledged. The computational resources for this project have been provided by the National Institutes of Health (Grant # NCRR 1 S10 RR17214-01) on the Arches Metacluster, administered by the University of Utah Center for High Performance Computing. W.G. Noid acknowledges many stimulating conversations with Dr. V. Krishna.
Appendices
Appendices A and B involve the properties of Dirac delta functions in spherical polar coordinates. In particular, the derivations of eqs (18) and (33) employ the following relation47
(36) |
which is valid for all r0 ≠ 0. In eq (36), ASP(r,Ω0) is the representation of the function A(r⃑) in spherical polar coordinates evaluated in the direction Ω0 defined by the vector r⃑0.
Appendix A. Derivation of Equation (18)
Assume that the CG configurations employed in the FM procedure were sampled according to a distribution function determined by a many-particle CG potential function describing the interactions between CG sites. Selecting two arbitrary CG sites, i and j, and transforming into “sum,” R⃑ij = (r⃑i + r⃑j)/2, and “difference,” r⃑ij = r⃑i − r⃑j, variables, it follows that:
(37) |
The partial derivative is evaluated with respect to the inter-CG site distance (difference variable) rij, while holding fixed the relative (difference) orientation Ωij, the average position R⃑ij of the two CG sites, and the positions of the remaining NCG − 2 sites.
Consider next the quantity Gij(r) = 〈δ(r − rij)〉. Employing the identity in eq (36), it can be proven that
(38) |
Let i and j correspond to CG site types α and β, respectively, and define the radial distribution function
(39) |
then
(40) |
In the case that α = β, the two terms in the summations are equivalent and the result simplifies:
(41) |
A more detailed derivation is provided in the supporting information section.
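In practice, the average 〈δ(r − rij)〉 and the radial distribution function of eq (39) are estimated from a histogram of pair distances. The sketch below is a conventional single-species estimator for a cubic periodic box with the minimum-image convention; because eq (39) itself is not reproduced here, its normalization (ideal-gas pair counts per spherical shell, each unordered pair counted once) is the standard textbook choice and should be treated as an assumption.

```python
import numpy as np


def radial_distribution(x, box, edges):
    """Histogram estimator of g(r) for one CG site type in a cubic periodic box.

    x: (n_conf, n_sites, 3) positions; box: box edge length;
    edges: bin edges, which should not exceed box / 2 (minimum-image limit).
    """
    n_conf, n, _ = x.shape
    counts = np.zeros(len(edges) - 1)
    iu = np.triu_indices(n, k=1)                    # unordered pairs i < j
    for conf in x:
        d = conf[:, None, :] - conf[None, :, :]
        d -= box * np.round(d / box)                # minimum-image convention
        r = np.linalg.norm(d, axis=-1)[iu]
        counts += np.histogram(r, bins=edges)[0]
    shell = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    ideal = n_conf * 0.5 * n * (n - 1) * shell / box**3  # ideal-gas pair counts
    centers = 0.5 * (edges[1:] + edges[:-1])
    return centers, counts / ideal
```

For uniformly random (ideal-gas) positions the estimate fluctuates around g(r) = 1, which is a convenient sanity check before applying it to CG-mapped trajectories.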
Appendix B. Derivation of Equation (33)
Consider the distribution of three distinct CG particles at positions r⃑1, r⃑2, and r⃑3 corresponding to particles iα, jβ, and kγ in Figure 1. The FM kernel is defined:
(42) |
The YBG kernel is defined:
(43) |
where Ωn1 defines the orientation of the unit vector u⃑n and the three-particle correlation function is defined:
(44) |
The definitions in eq (42) and eq (44) assume that the average over configurations is equivalent to the appropriate ensemble average. After integrating over r⃑1 in eq (43), the YBG kernel may be expressed
(45) |
The integrals over the orientational degrees of freedom may then be evaluated according to the identity in eq (36). Upon performing these integrals within the ensemble average, one obtains the desired result:
(46) |
Appendix C. Normal FM equations for a multi-component system
The FM residual for a system with multiple types of CG sites may be expressed:
(47) |
In eq (47) and in the following, Greek indices (α) represent types of CG sites, Latin subscripts (i) represent particular sites of a given CG type, and the superscript (I) labels configurations sampled during the atomistic MD simulation. As before, it is assumed that the FM force field is pair-wise additive, such that
(48) |
and, furthermore, that this force field is central:
(49) |
Employing the discrete delta function basis of eq (9), a system of linear equations (i.e., the “normal” FM equations for a multi-component system) is obtained by minimizing the residual in eq (47) with respect to each interaction,
(50) |
in which
(51) |
and
(52) |
Equation (50) generalizes eq (10) and employs the symmetry of the pair force field under interchange of the site types α and β. The quantity defined in eq (52) may be decomposed into two- and three-particle contributions according to:
(53) |
where
(54) |
and
(55) |
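The two- and three-particle split of eqs (53)–(55) can be checked numerically for a single CG site type. Assuming piecewise-constant bins (standing in for the basis of eq (9)), the normal-matrix element for bins d and d′ is Σ_I Σ_i Σ_j Σ_k [rij ∈ d][rik ∈ d′] u⃑ij · u⃑ik. Terms with j = k have u⃑ij · u⃑ij = 1 and contribute only to the diagonal (two-particle part), while terms with j ≠ k are weighted by cos θi;j,k (three-particle part) and couple different bins. The sketch computes both parts directly and confirms that they sum to GᵀG; it is illustrative only and omits the normalization factors of the multi-component equations.

```python
import numpy as np


def bin_unit_vectors(conf, edges):
    """Bin index (-1 if outside the basis) and u_ij = (x_i - x_j)/r_ij for all ordered pairs."""
    diff = conf[:, None, :] - conf[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)                    # i == j never lies in a bin
    b = np.digitize(dist, edges) - 1
    b[(b < 0) | (b >= len(edges) - 1)] = -1
    return b, diff / dist[..., None]


def normal_matrix_parts(x, edges):
    """Return (two_body, three_body, GtG) for a single-species system x of shape (n_conf, n, 3)."""
    n_bins = len(edges) - 1
    two = np.zeros((n_bins, n_bins))
    three = np.zeros((n_bins, n_bins))
    gtg = np.zeros((n_bins, n_bins))
    for conf in x:
        b, u = bin_unit_vectors(conf, edges)
        n = len(conf)
        Gi = np.zeros((n, n_bins, 3))                 # G_i[d] = sum_j [b_ij == d] u_ij
        for d in range(n_bins):
            Gi[:, d, :] = np.sum(u * (b == d)[..., None], axis=1)
        gtg += np.einsum("ida,iea->de", Gi, Gi)
        for i in range(n):
            for j in range(n):
                if b[i, j] < 0:
                    continue
                two[b[i, j], b[i, j]] += 1.0          # j == k term: u_ij . u_ij = 1
                for k in range(n):
                    if k == j or b[i, k] < 0:
                        continue
                    three[b[i, j], b[i, k]] += u[i, j] @ u[i, k]  # cos(theta_i;j,k)
    return two, three, gtg
```

Only the three-particle part has off-diagonal elements, which is why these correlations couple the equations for different force field elements.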
Employing these definitions, eq (50) may then be re-expressed:
(56) |
Upon multiplication by the factor 1/(4πραρβV0·rd²) and application of the definition in eq (39) and the identity in eq (41) from Appendix A, eq (56) may be expressed:
(57) |
Passing into the continuum limit, eq (57) may be represented by a linear one-dimensional integral equation:
(58) |
where
(59) |
and the three-particle kernel is evaluated according to Appendix B.
Appendix D. Generalized YBG equation for multi-component CG system
The Hamiltonian for a system with Nα sites of type α and NCG = Σα Nα total CG sites evolving under an NCG-particle potential function, VNCG, is defined:
(60) |
Assuming that the total force on each CG site arises from a sum of pair interactions,
(61) |
then for a homogeneous system with ρα(r⃑iα) = ρα = const, the following relations describe the distribution of three arbitrary but distinct CG particles, iα, jβ, and kγ :
(62) |
Equations (62) generalize the Yvon-Born-Green (YBG) equation20 and relate the equilibrium two- and three-particle CG distribution functions to the pair-wise decomposable force field, F⃑iα·jβ. As before, for an isotropic homogeneous system, the two-particle distribution function is equal to its rotational and translational average, and the gradient terms in eqs (62) may then be simplified. Under the additional assumption that the pair interaction between CG sites is central, the first equality in eq (62) may be re-expressed:
(63) |
Projecting this equation onto the vector u⃑iα,jβ and shifting the integration variable, one obtains the result:
(64) |
where,
(65) |
and the three-particle kernel is evaluated according to Appendix B. Similar manipulation of the second relation in eq (62) yields
(66) |
The left-hand sides, and thus the right-hand sides, of eq (64) and eq (66) are identical. The two equations may then be averaged to yield the final result, the YBG equation for a homogeneous, isotropic multi-component CG system evolving under a central pair potential:
(67) |
Therefore, the generalized YBG equation for a homogeneous and isotropic multi-component CG system evolving under a central pair potential is equivalent to the normal multi-component FM eqs (58).
References
- 1. Karplus M, McCammon JA. Nat. Struct. Biol. 2002;9:646. doi: 10.1038/nsb0902-646.
- 2. Muller-Plathe F. Chemphyschem. 2002;3:754. doi: 10.1002/1439-7641(20020916)3:9<754::aid-cphc754>3.0.co;2-u.
- 3. Duan Y, Kollman PA. Science. 1998;282:740. doi: 10.1126/science.282.5389.740.
- 4. Daggett V. Chem. Rev. 2006;106:1898. doi: 10.1021/cr0404242.
- 5. Wulfing C, Sjaastad MD, Davis MM. Proc. Natl. Acad. Sci. U. S. A. 1998;95:6302. doi: 10.1073/pnas.95.11.6302.
- 6. Akkermans RLC, Briels WJ. J. Chem. Phys. 2000;113:6409.
- 7. Akkermans RLC, Briels WJ. J. Chem. Phys. 2001;114:1020.
- 8. Forrest BM, Suter UW. J. Chem. Phys. 1995;102:7256.
- 9. Shelley JC, Shelley MY, Reeder RC, Bandyopadhyay S, Klein ML. J. Phys. Chem. B. 2001;105:4464.
- 10. Lopez CF, Moore PB, Shelley JC, Shelley MY, Klein ML. Comput. Phys. Commun. 2002;147:1.
- 11. Nielsen SO, Lopez CF, Ivanov I, Moore PB, Shelley JC, Klein ML. Biophys. J. 2004;87:2107. doi: 10.1529/biophysj.104.040311.
- 12. Nielsen SO, Lopez CF, Srinivas G, Klein ML. J. Phys.: Condens. Matter. 2004;16:R481.
- 13. Bolhuis PG, Louis AA, Hansen JP. Phys. Rev. E. 2001;6402
- 14. Louis AA, Bolhuis PG, Hansen JP, Meijer EJ. Phys. Rev. Lett. 2000;85:2522. doi: 10.1103/PhysRevLett.85.2522.
- 15. Marrink SJ, de Vries AH, Mark AE. J. Phys. Chem. B. 2004;108:750.
- 16. Izvekov S, Voth GA. J. Chem. Phys. 2005;123:134105. doi: 10.1063/1.2038787.
- 17. Izvekov S, Voth GA. J. Phys. Chem. B. 2005;109:2469. doi: 10.1021/jp044629q.
- 18. See, for example, the recent work in J. Chem. Theory Comput. 2006;2(3) and references cited therein.
- 19. Chayes JT, Chayes L. J. Stat. Phys. 1984;36:471.
- 20. Hansen JP, McDonald IR. Theory of Simple Liquids. 2 ed. San Diego, CA: Academic Press; 1990.
- 21. Reatto L, Levesque D, Weis JJ. Phys. Rev. A. 1986;33:3451. doi: 10.1103/physreva.33.3451.
- 22. Ercolessi F, Adams JB. Europhys. Lett. 1994;26:583.
- 23. Izvekov S, Parrinello M, Burnham CJ, Voth GA. J. Chem. Phys. 2004;120:10896. doi: 10.1063/1.1739396.
- 24. Wang YT, Izvekov S, Yan TY, Voth GA. J. Phys. Chem. B. 2006;110:3564. doi: 10.1021/jp0548220.
- 25. Izvekov S, Voth GA. J. Chem. Theory and Comput. 2006;2:637. doi: 10.1021/ct050300c.
- 26. Zhou J, Thorpe IF, Izvekov S, Voth GA. Biophys. J. 2007 doi: 10.1529/biophysj.106.094425. in press.
- 27. Izvekov S, Violi A, Voth GA. J. Phys. Chem. B. 2005;109:17019. doi: 10.1021/jp0530496.
- 28. Shi Q, Izvekov S, Voth GA. J. Phys. Chem. B. 2006;110:15045. doi: 10.1021/jp062700h.
- 29. Lyubartsev A, Laaksonen A. Phys. Rev. E. 1995;52:3730. doi: 10.1103/physreve.52.3730.
- 30. Schulman LS. Techniques and applications of path integration. John Wiley and Sons; 1981.
- 31. Evans R. Mol. Sim. 1990;4:409.
- 32. Berendsen HJ, Postma JPM, van Gunsteren WF, Hermans J. In: Intermolecular Forces. Pullman B, editor. Reidel, Dordrecht; 1981. p. 331.
- 33. Van Der Spoel D, Lindahl E, Hess B, Groenhof G, Mark AE, Berendsen HJ. J. Comput. Chem. 2005;26:1701. doi: 10.1002/jcc.20291.
- 34. Hoover WG. Phys. Rev. A. 1985;31:1695. doi: 10.1103/physreva.31.1695.
- 35. Nose S. Mol. Phys. 1984;52:255.
- 36. Chandler D. Introduction to modern statistical mechanics. Oxford University Press; 1987.
- 37. Kolinski A, Skolnick J. J. Chem. Phys. 1992;97:9412.
- 38. Kolinski A, Skolnick J. Proteins: Structure, Function, and Genetics. 1994;18:338.
- 39. Liwo A, Czaplewski C, Pillardy J, Scheraga HA. J. Chem. Phys. 2001;115:2323.
- 40. Liwo A, Kazmierkiewicz R, Czaplewski C, Groth M, Oldziej S, Wawak RJ, Rackovsky S, Pincus MR, Scheraga HA. Journal of Computational Chemistry. 1998;19:259.
- 41. Vendruscolo M, Domany E. J. Chem. Phys. 1998;109:11101.
- 42. Buchete NV, Straub JE. J. Chem. Phys. 2003;118:7658.
- 43. Buchete NV, Straub JE, Thirumalai D. Journal of Molecular Graphics and Modeling. 2004;22:441. doi: 10.1016/j.jmgm.2003.12.010.
- 44. Henderson RL. Phys. Lett. 1974;49A:197.
- 45. Jedlovszky P, Bako I, Palinkas G, Radnai T, Soper AK. J. Chem. Phys. 1996;105:245.
- 46. Iuchi S, Izvekov S, Voth GA. J. Chem. Phys. 2007 doi: 10.1063/1.2710252. in press.
- 47. Arfken GB, Weber HJ. Mathematical methods for physicists. San Diego: Academic Press; 1995.