The Journal of Chemical Physics. 2020 Sep 18;153(11):114116. doi: 10.1063/5.0019560

Efficient formulation of polarizable Gaussian multipole electrostatics for biomolecular simulations

Haixin Wei 1, Ruxi Qi 1, Junmei Wang 2, Piotr Cieplak 3, Yong Duan 4, Ray Luo 1,a)
PMCID: PMC7502018  PMID: 32962395

Abstract

Molecular dynamics simulations of biomolecules have been widely adopted in biomedical studies. As classical point-charge models continue to be used in routine biomolecular applications, there has been growing demand for polarizable force fields that can handle more complicated biomolecular processes. Here, we focus on a recently proposed polarizable Gaussian Multipole (pGM) model for biomolecular simulations. A key benefit of pGM is its screening of all short-range electrostatic interactions in a physically consistent manner, which is critical for stable charge fitting and is needed to reproduce molecular anisotropy. Another advantage of pGM is that each atom's multipoles are represented by a single Gaussian function or its derivatives, allowing for more efficient electrostatics than other Gaussian-based models. In this study, we present an efficient formulation of the pGM model defined with respect to a local frame formed by a set of covalent basis vectors. The covalent basis vectors are chosen to lie along each atom's covalent bonding directions. The new local frame better accommodates the fact that permanent dipoles are primarily aligned along covalent bonds due to the differences in electronegativity of bonded atoms. It also allows molecular flexibility during molecular simulations and facilitates an efficient formulation of analytical electrostatic forces without explicit torque computation. Subsequent numerical tests show that analytical atomic forces agree closely with numerical finite-difference forces for the tested system. Finally, the new pGM electrostatics algorithm is interfaced with the particle mesh Ewald (PME) implementation in Amber for molecular simulations under periodic boundary conditions. To validate the overall pGM/PME electrostatics, we conducted an NVE simulation of a small water box of 512 water molecules.
Our results show that, to achieve energy conservation in the polarizable model, it is important to ensure sufficient accuracy in both PME and the induction iteration. It is hoped that the reformulated pGM model will facilitate the development of future force fields based on the pGM electrostatics for applications in biomolecular systems and processes where polarization plays crucial roles.

I. INTRODUCTION

Atomistic simulations of biomolecules have been applied to a wide range of biological systems.1 While additive nonpolarizable models will continue to play important roles,2–4 nonadditive polarizable models are expected to extend our ability to study more complex biomolecular systems and processes. Nonpolarizable models typically use fixed atom-centered partial charges to model electrostatics and include the polarization response to the environment (mostly water) only in an averaged, mean-field manner. Consequently, nonpolarizable models that provide excellent descriptions of the homogeneous bulk phase are poor models for gas-phase clusters or nonpolar solvents. The importance of modeling nonadditive effects is well known.5 For example, the gas-phase water dimer interaction energy is overestimated by more than 30% in the TIP5P model.6 Similarly, for large biomolecular systems, there are concerns that such models cannot correctly account for situations where the same nonpolarizable moiety is exposed to different electrostatic environments or solvents, either within a single large structure or during a simulation. In addition, most nonpolarizable models have an inherent inconsistency related to their static inclusion of average bulk polarization within the potential: internal energies and other properties are derived against a gas-phase reference state that is already "pre-polarized" for the liquid phase. These limitations cause problems in modeling many important phenomena, such as pH-dependent processes, ion-dependent interactions, order–disorder transitions, and enzymatic reactions.

In response to the above concerns, much effort has been invested in the inclusion of explicit polarization within molecular mechanics (MM) potentials.7–9 Several methods are available to explicitly model polarization in molecular simulations, such as the Drude oscillator,10,11 fluctuating charges,12 and induced dipoles.6,13,14 The use of polarizable point dipoles is a classical approach with a long history in molecular simulation.15 The original induced dipole model of Applequist places the induced point dipoles on atom centers.16 However, this model suffers from the so-called "polarization catastrophe," in which the interaction between two mutually polarizing induced dipoles diverges at a finite separation. Thole proposed a solution by applying a damping function to the induced dipole–induced dipole interactions.17 However, a drawback of this model is that it does not prescribe how the induced dipoles and permanent charges interact. A great deal of effort has been devoted to developing modern polarizable models, including the fluctuating charge models18,19 in the context of Optimized Potentials for Liquid Simulations-All Atoms (OPLS-AA), the fluctuating charge model and the Drude oscillator model20–23 in the context of Chemistry at Harvard Molecular Mechanics (CHARMM), and detailed multipole expansions and more complicated MM potentials in the context of Amoeba.24 In Amber, polarization was implemented with induced dipoles.25 In Amber ff12pol, the induced dipoles are calculated using Thole models to avoid the "polarization catastrophe."26–29

Another limitation of widely adopted nonpolarizable models is their use of partial atomic charges in the electrostatic models, which often lack sufficient mathematical flexibility to describe the electrostatic potential (ESP) around molecules. Williams showed that optimal least-squares fitting of atom-centered partial charges resulted in relative root-mean-square errors of 3%–10% over a set of grid points in a shell outside the surface of a series of small polar molecules.30 These errors were reduced by 2–3 orders of magnitude via the use of higher atomic multipoles.6 In Amoeba force fields, multipoles are placed on each atom, allowing better capture of electrostatic potential distribution around molecules.31,32 The Gaussian electrostatic model (GEM) is a force field based on density fitting, which can extend to arbitrary angular momenta (multipoles).33–35 Of course, there are many other proposals to model electronic polarization in the literature.12,36–38

Recently, Elking et al. proposed a polarizable multipole model with Gaussian charge densities.39 A key benefit of the polarizable Gaussian Multipole (pGM) model is its screening of all short-range electrostatic interactions in a physically consistent manner. As discussed in Ref. 40, this screening is critical for stable charge fitting in polarizable force fields when 1–2 and 1–3 polarization interactions are included, which is needed to reproduce molecular anisotropy. Of course, this strategy also requires redesigning the valence terms to retain close-to-harmonic behavior. An advantage of pGM is that each atom's multipoles are represented by a single Gaussian function and its derivatives with different amplitudes. Therefore, pGM is a minimalist Gaussian polarizable model. In comparison, the GEM model33–35 treats nuclear charges explicitly and uses Hermite Gaussian auxiliary basis sets to reproduce atomic electron density, so it has the potential to represent short-range interactions more faithfully than the pGM model. However, because the computational cost of the nonbonded electrostatic calculation scales with the square of the number of functions on each atom, the multiple functions used to represent each atom in GEM can notably increase the simulation cost. The increased number of parameters associated with these functions may also pose additional challenges in parameterization. Another major difference is that GEM,33–35 like several other efforts, such as X-Pol41 and Amoeba,31,32 uses electronic densities to model molecular polarization and other effects. In comparison, our pGM model follows the Amber tradition and uses the ab initio electrostatic potential to fit the parameters of atomic partial charges and dipoles.

Most macromolecular simulations with long-range electrostatic interactions are performed under periodic boundary conditions. A rigorous treatment of electrostatic interactions under periodic boundary conditions requires careful handling of the associated lattice sums. Thus, widely used lattice-sum methods, such as particle mesh Ewald (PME), need to be extended to handle multipolar lattice sums. Fortunately, efficient PME implementations for dipoles and higher multipoles are already available in widely used software packages, such as Amber.42,43 This greatly simplifies the integration of pGM with PME for molecular simulations.

In Secs. II and III, we first describe the pGM electrostatics scheme in detail, focusing on how the atomic Gaussian multipoles are defined and on the associated analytical algorithms for force computation. This is followed by algorithmic details of interfacing pGM with PME. We then present the validation of the analytical force formulation and discuss the accuracy of pGM in PME simulations. Finally, we conclude the manuscript with a brief discussion of the next steps in our development.

II. THEORY

A. Gaussian density representation of charge distribution

The Gaussian multipole model represents the charge distribution on each atom as a Gaussian-shaped multipole expansion, in which an nth-order Gaussian multipole with radius 1/β located at position R is39

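$$\rho_n(\mathbf{r};\mathbf{R}) = \Theta^{(n)}\cdot\nabla_{\mathbf{R}}^{(n)}\left(\frac{\beta}{\sqrt{\pi}}\right)^{3}\exp\left(-\beta^{2}|\mathbf{r}-\mathbf{R}|^{2}\right). \tag{1}$$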

Here, $\Theta^{(n)}$ is the nth-rank moment tensor and $\nabla^{(n)}$ is the nth-rank gradient operator (Subsection 1 of the Appendix).

In our current pGM model, only monopoles and dipoles are retained, so only the first two terms are needed at each atom as shown below,

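$$\rho^{(0)}(\mathbf{r};\mathbf{R}) = q\left(\frac{\beta}{\sqrt{\pi}}\right)^{3}\exp\left(-\beta^{2}|\mathbf{r}-\mathbf{R}|^{2}\right),\qquad \rho^{(1)}(\mathbf{r};\mathbf{R}) = \boldsymbol{\mu}\cdot\nabla_{\mathbf{R}}\left(\frac{\beta}{\sqrt{\pi}}\right)^{3}\exp\left(-\beta^{2}|\mathbf{r}-\mathbf{R}|^{2}\right), \tag{2}$$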

where the zeroth-order term represents a monopole and the first-order term represents a dipole.

Once the charge densities are defined, as in Eq. (2), the pairwise Coulombic interaction energy expressions needed for the current pGM model are as follows:

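  • (1)

    Monopole–monopole:

$$q_1q_2\frac{\operatorname{erf}(\beta_{12}R_{12})}{R_{12}}. \tag{3}$$
  • (2)

    Monopole–dipole:

$$q_1\,\boldsymbol{\mu}_2\cdot\nabla_2\frac{\operatorname{erf}(\beta_{12}R_{12})}{R_{12}}. \tag{4}$$
  • (3)

    Dipole–dipole:

$$\boldsymbol{\mu}_1\cdot\nabla_1\,\boldsymbol{\mu}_2\cdot\nabla_2\frac{\operatorname{erf}(\beta_{12}R_{12})}{R_{12}}, \tag{5}$$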

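where erf(·) is the error function, and

$$\beta_{12} = \frac{\beta_1\beta_2}{\sqrt{\beta_1^{2}+\beta_2^{2}}}\quad\text{and}\quad R_{12} = |\mathbf{R}_1-\mathbf{R}_2|. \tag{6}$$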

Finally, it is often convenient to introduce the dipole–dipole interaction tensor $\mathbf{T}_{12} = \nabla_1\nabla_2\operatorname{erf}(\beta_{12}R_{12})/R_{12}$, so that Eq. (5) simplifies to $\boldsymbol{\mu}_1\cdot\mathbf{T}_{12}\cdot\boldsymbol{\mu}_2$. An important convention is used throughout this manuscript: a gradient operator paired with a dipole acts only on the function of atomic coordinates that follows it, not on the dipole itself. For example, in $\boldsymbol{\mu}_1\cdot\nabla_1$, the operator $\nabla_1$ acts only on the coordinate-dependent function to its right. All other gradient operators, i.e., those not paired with a dipole, are used in the usual sense.
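To make the screening in Eqs. (3) and (6) concrete, the short Python sketch below evaluates the monopole–monopole energy of two Gaussian charges and compares it with the bare point-charge value. The charges and the exponents (taken as the inverse of radii close to the O and H values listed later in Table I) are illustrative assumptions, and the function names are hypothetical.

```python
import math

def pair_beta(beta1, beta2):
    """Combined Gaussian exponent of Eq. (6)."""
    return beta1 * beta2 / math.sqrt(beta1**2 + beta2**2)

def monopole_energy(q1, q2, beta1, beta2, r):
    """Screened monopole-monopole energy of Eq. (3), in e^2/Angstrom."""
    b = pair_beta(beta1, beta2)
    if r < 1e-12:
        # finite r -> 0 limit of erf(b r)/r, i.e., 2 b / sqrt(pi)
        return q1 * q2 * 2.0 * b / math.sqrt(math.pi)
    return q1 * q2 * math.erf(b * r) / r

# Hypothetical O and H exponents, beta = 1/radius (cf. Eq. (1)), in 1/Angstrom
beta_o, beta_h = 1.0 / 0.8066, 1.0 / 0.7148
for r in (0.0, 0.5, 1.0, 2.0, 4.0):
    screened = monopole_energy(-1.0, 0.5, beta_o, beta_h, r)
    bare = -0.5 / r if r > 0 else float("-inf")
    print(f"r = {r:3.1f} A  screened = {screened:+.6f}  point-charge = {bare:+.6f}")
```

At short range the screened energy stays finite, approaching $2\beta_{12}q_1q_2/\sqrt{\pi}$, whereas beyond a few ångströms it becomes indistinguishable from the point-charge result.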

It can be shown that an effective potential and a corresponding effective field at atomic center R1, due to the charge distribution at atomic center R2, can be defined as

$$\phi^{\text{effective}} = (q_2+\boldsymbol{\mu}_2\cdot\nabla_2)\frac{\operatorname{erf}(\beta_{12}R_{12})}{R_{12}},\qquad \mathbf{E}^{\text{effective}} = -(q_2+\boldsymbol{\mu}_2\cdot\nabla_2)\nabla_1\frac{\operatorname{erf}(\beta_{12}R_{12})}{R_{12}}, \tag{7}$$

so that the pairwise Coulomb energies in Eqs. (3)–(5) are reproduced when an effective point charge q1 and an effective point dipole μ1 are placed at atomic center R1. The effective potential and field simplify the derivation of the pairwise Coulombic forces, as shown below. Note that they differ from the real Coulombic potential and field generated by the Gaussian charges and dipoles. For example, the real potential at any location R1 due to atom 2 at R2 is

$$\phi^{\text{real}} = (q_2+\boldsymbol{\mu}_2\cdot\nabla_2)\frac{\operatorname{erf}(\beta_2R_{12})}{R_{12}}. \tag{8}$$

B. Gaussian multipoles in pGM

In our current pGM model, interactions are modeled with both permanent and induced atomic multipoles at atomic centers, both of which are truncated at the dipole level. The framework can be easily extended to higher-order multipoles, if needed in future developments.

1. Permanent multipoles

Permanent multipoles are the first part of the pGM model and are defined with respect to a local frame aligned with the atom's covalent bonds. This choice is motivated by the fact that permanent atomic moments result from covalent bonding interactions, and also by the fact that covalent bonds are the stiffest degrees of freedom of a molecule. Thus, our design follows the logic that the induced moments are responsible for changes in molecular moments due to changes in the soft degrees of freedom during molecular simulations. Of course, the partition between permanent and induced moments is somewhat artificial in a moment-fitting procedure. Therefore, we refer to permanent multipoles in our pGM model as covalent multipoles in the following discussion.

The zeroth-order covalent multipoles, i.e., covalent monopoles, are simply the atomic partial charges, as in other polarizable or nonpolarizable force fields. The first-order multipoles, i.e., covalent dipoles, are expressed as linear combinations of certain basis vectors. We define the basis vectors to lie along the bonding directions or, more precisely, the covalent interaction directions. Thus, fully defining all covalent dipoles on an atom may require more covalent interaction directions than the atom has bonds. For example, the covalent dipole moments of the hydrogen atoms in water are not exactly aligned with the H–O bonds, so a virtual H–H bond may be needed to define them more accurately. On the other hand, sp3 carbon atoms may have up to four covalent dipoles owing to their four bonds, though symmetry often reduces the number of unique covalent dipoles. The new local frame originates from the physical consideration that permanent dipole moments are primarily aligned along covalent bonds due to the differences in electronegativity of bonded atoms.

An illustration of basis vectors is shown in Fig. 1 for O and H atoms of water, and we refer to these as the covalent basis vectors (CBVs). The local frame formed by CBVs on an atom is termed its CBV frame. Another representative case is the alpha carbon atom in proteins, which has four bonds, so there can be four basis vectors in its CBV frame to define the covalent dipoles.

FIG. 1.


Definition of covalent basis vectors for atoms in the water molecule. (a) For covalent dipoles centered at A (oxygen), the two basis vectors are $\mathbf{e}_{BA}$ and $\mathbf{e}_{CA}$, the unit vectors along its two O–H bonds; by symmetry, the two covalent dipole components are equal. (b) For covalent dipoles centered at C (hydrogen), the two basis vectors are $\mathbf{e}_{AC}$, the unit vector along the H–O bond, and $\mathbf{e}_{BC}$, the unit vector along the H–H virtual bond. The covalent dipoles centered at the other hydrogen atom, B, are defined similarly.

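The CBV frame is also chosen to simplify force calculations, because the basis vectors depend directly on atomic positions. For example, the gradient of a covalent dipole vector, which is used extensively in force calculations, is easily obtained in the CBV frame as

$$\nabla\mathbf{u} = \sum_i u_i\nabla\frac{\mathbf{R}_i}{|\mathbf{R}_i|} = \sum_i u_i\left(\frac{\mathbf{I}}{|\mathbf{R}_i|}-\frac{\mathbf{R}_i\mathbf{R}_i}{|\mathbf{R}_i|^{3}}\right), \tag{9}$$

where $\mathbf{u} = \sum_i u_i\mathbf{R}_i/|\mathbf{R}_i|$ is the permanent dipole of an atom, $u_i$ is its component along the ith covalent basis vector, $\mathbf{R}_i$ is the vector pointing from the atom to its ith bonded (including virtually bonded) atom, $\mathbf{I}$ is the identity tensor, and the summation is over all covalent interactions of the atom.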
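Eq. (9) can be checked numerically. The Python sketch below assumes a two-bond oxygen with hypothetical bond vectors and CBV components, builds $\mathbf{u} = \sum_i u_i\mathbf{R}_i/|\mathbf{R}_i|$, and compares the analytical gradient with respect to one bond vector against central finite differences. The helper names are illustrative. Because $\mathbf{R}_i$ points from the atom to its bonded partner, the gradient with respect to the central atom's own position carries the opposite sign.

```python
import numpy as np

def unit_vector_jacobian(r):
    """Gradient of r/|r| with respect to r: I/|r| - r r^T/|r|^3 (the tensor in Eq. (9))."""
    n = np.linalg.norm(r)
    return np.eye(3) / n - np.outer(r, r) / n**3

def covalent_dipole(bond_vectors, coeffs):
    """u = sum_i u_i R_i/|R_i|: covalent dipole built from CBV components."""
    return sum(c * v / np.linalg.norm(v) for c, v in zip(coeffs, bond_vectors))

# Hypothetical bond vectors from an oxygen to its two hydrogens (Angstrom)
bonds = [np.array([0.757, 0.586, 0.0]), np.array([-0.757, 0.586, 0.0])]
coeffs = [-0.37, -0.37]  # hypothetical CBV components (e Angstrom)

# Analytical gradient of u with respect to the first bond vector (i = 0 term of Eq. (9))
analytic = coeffs[0] * unit_vector_jacobian(bonds[0])

# Central finite differences: numeric[b, a] = d u_b / d R_0,a (the tensor is symmetric)
h = 1e-6
numeric = np.zeros((3, 3))
for a in range(3):
    plus = [v.copy() for v in bonds]
    minus = [v.copy() for v in bonds]
    plus[0][a] += h
    minus[0][a] -= h
    numeric[:, a] = (covalent_dipole(plus, coeffs) - covalent_dipole(minus, coeffs)) / (2 * h)
```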

Although quadrupoles are not used in the current pGM model, it is instructive to outline how they would be defined in the CBV frame. Given the covalent basis vectors defined above, covalent basis tensors are constructed as dyadic tensors, each formed as the dyadic product of two covalent basis vectors. For example, for the oxygen atom with two covalent basis vectors ($\mathbf{e}_{BA}$, $\mathbf{e}_{CA}$) in Fig. 1(a), up to four dyadic tensors ($\mathbf{e}_{BA}\mathbf{e}_{BA}$, $\mathbf{e}_{CA}\mathbf{e}_{CA}$, $\mathbf{e}_{BA}\mathbf{e}_{CA}$, and $\mathbf{e}_{CA}\mathbf{e}_{BA}$) are available to define its quadrupole.

2. Induced multipoles

Induced multipoles are the second part of the pGM model. Only first-order terms, i.e., the induced dipoles, are used. The pGM polarization scheme can naturally avoid the well-known polarization catastrophe in point polarizable models without employing any artificial screening factors17 because distributed dipole densities instead of point dipoles are induced at atomic centers.44

In the current pGM model, the linear polarization relation is retained as follows:

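$$\mathbf{p}_i = \alpha_i\mathbf{E}_i^{\text{effective}} = \alpha_i\Big(\mathbf{E}_i^{\text{covalent,effective}}-\sum_{j\neq i}\mathbf{T}_{ij}\cdot\mathbf{p}_j\Big),\quad \mathbf{E}_i^{\text{covalent,effective}} = -\sum_{j\neq i}(q_j+\boldsymbol{\mu}_j\cdot\nabla_j)\nabla_i\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}},\quad \mathbf{T}_{ij} = \nabla_i\nabla_j\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}}, \tag{10}$$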

where $\mathbf{p}_i$ is the induced dipole and $\alpha_i$ is the polarizability of atom i; $\mathbf{E}_i^{\text{effective}}$ is the total effective electric field at atom i, which contains two parts: (1) the effective field of the covalent multipoles, $\mathbf{E}_i^{\text{covalent,effective}}$ [Eq. (7)], and (2) that of the induced dipoles, $-\sum_{j\neq i}\mathbf{T}_{ij}\cdot\mathbf{p}_j$. Note that we use the effective electric field instead of the real electric field to define the induced dipoles in the pGM model. One reason is to ensure the symmetry of $\mathbf{T}_{ij}$, which greatly reduces the complexity of the force calculation later.44 To simplify the following discussion, we drop the "effective" superscript, because effective electric fields are used in all subsequent discussions of energy and force calculations in the pGM model.
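A minimal Python sketch of Eq. (10) follows. It assumes that the covalent field comes from Gaussian charges only (covalent dipoles set to zero) and uses hypothetical toy parameters. The field and $\mathbf{T}_{ij}$ are written through radial coefficients B1 and B2 of $\operatorname{erf}(\beta r)/r$, an equivalent rewrite of the gradients in Eq. (10); the helper names are illustrative. The induced dipoles are obtained both by simple fixed-point (Jacobi-type) iteration and by a direct dense solve, and the symmetry of the assembled T matrix is checked.

```python
import numpy as np
from math import erf, exp, pi, sqrt

def pair_beta(bi, bj):
    """Combined Gaussian exponent, Eq. (6)."""
    return bi * bj / sqrt(bi * bi + bj * bj)

def b1_b2(r, beta):
    """Radial coefficients of f(r) = erf(beta r)/r such that, with x = R_i - R_j,
    grad_i f = -B1 x and grad_i grad_i f = B2 x x^T - B1 I."""
    e = erf(beta * r)
    g = 2.0 * beta / sqrt(pi) * exp(-(beta * r) ** 2)
    b1 = (e / r - g) / r**2
    b2 = (3.0 * b1 - 2.0 * beta**2 * g) / r**2
    return b1, b2

def build_system(pos, q, beta):
    """Charge-only effective field E_i^covalent and the 3N x 3N matrix of T_ij blocks."""
    n = len(q)
    E = np.zeros((n, 3))
    T = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # T_ii = 0: no self-interaction
            x = pos[i] - pos[j]
            b1, b2 = b1_b2(np.linalg.norm(x), pair_beta(beta[i], beta[j]))
            E[i] += q[j] * b1 * x  # -q_j grad_i f
            T[3*i:3*i+3, 3*j:3*j+3] = b1 * np.eye(3) - b2 * np.outer(x, x)  # grad_i grad_j f
    return E, T

def induce_iterative(E, T, alpha, tol=1e-10, max_iter=1000):
    """Fixed-point iteration p <- alpha (E - T p)."""
    a = np.repeat(alpha, 3)
    e = E.ravel()
    p = a * e
    for _ in range(max_iter):
        p_new = a * (e - T @ p)
        if np.max(np.abs(p_new - p)) < tol:
            return p_new.reshape(-1, 3)
        p = p_new
    raise RuntimeError("induced dipoles did not converge")

def induce_direct(E, T, alpha):
    """Solve (alpha^-1 + T) p = E directly."""
    A = np.diag(np.repeat(1.0 / alpha, 3)) + T
    return np.linalg.solve(A, E.ravel()).reshape(-1, 3)

# Hypothetical toy system (Angstrom, e, Angstrom^3, 1/Angstrom)
pos = np.array([[0.0, 0.0, 0.0], [2.5, 0.0, 0.0], [0.0, 2.5, 0.0], [0.0, 0.0, 2.5]])
q = np.array([-0.8, 0.4, 0.2, 0.2])
alpha = np.array([1.0, 0.5, 0.5, 0.5])
beta = np.array([1.25, 1.4, 1.4, 1.4])

E, T = build_system(pos, q, beta)
p_iter = induce_iterative(E, T, alpha)
p_direct = induce_direct(E, T, alpha)
```

The plain fixed-point iteration converges here because the coupling $\alpha\mathbf{T}$ is weak; for larger or more strongly coupled systems, a Krylov solver such as conjugate gradient is a common choice.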

Another issue worth pointing out about induced dipoles is their self-energies. The linear polarization itself implies a self-energy term of the form

$$U = \frac{1}{2}\frac{p^{2}}{\alpha}. \tag{11}$$

The derivation can be found in many publications.45 However, the pGM model, because of its use of Gaussian distributions of multipoles, poses an extra difficulty. For example, a Gaussian charge distribution itself has a self-energy, or assembly energy, of the form

$$U = \frac{q^{2}\beta}{\sqrt{2\pi}}+\frac{\beta^{3}(\boldsymbol{\mu}+\mathbf{p})^{2}}{3\sqrt{2\pi}}. \tag{12}$$

Clearly, this self-energy differs from Eq. (11) and does not lead to linear polarization behavior. In fact, it is difficult to assess the physical meaning of this nonlinear assembly energy, just as it is hard to discuss the physical meaning of the infinite assembly (self-)energy of a point charge. Thus, in the current model development, we do not consider self-energies beyond Eq. (11).
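Eq. (12) can be reproduced numerically from the pairwise kernels of Eqs. (3)–(6): the self-energy of a Gaussian is one half of its interaction with an identical copy of itself (combined exponent $\beta/\sqrt{2}$) at zero separation. The Python sketch below assumes hypothetical values of q, $|\boldsymbol{\mu}+\mathbf{p}|$, and β and estimates the zero-separation limit and curvature of the kernel by finite differences; the helper names are illustrative.

```python
import math

def kernel(r, b):
    """erf(b r)/r: interaction kernel between two Gaussians with combined exponent b."""
    return math.erf(b * r) / r

def gaussian_self_energy(q, m, beta, h=1e-3):
    """Numerical estimate of Eq. (12) for one Gaussian with charge q and dipole magnitude m."""
    b = beta / math.sqrt(2.0)                  # Eq. (6) with beta_1 = beta_2 = beta
    u_charge = 0.5 * q * q * kernel(1e-8, b)   # (1/2) q^2 times the r -> 0 limit
    # Near r = 0, kernel(x) ~ k0 + c x^2; estimate the curvature coefficient c
    c = (kernel(2 * h, b) - kernel(h, b)) / (3 * h * h)
    # Dipole self-energy: (1/2) m^2 times (-d^2 kernel/dx^2 at 0) = -m^2 c
    u_dipole = -m * m * c
    return u_charge + u_dipole

def gaussian_self_energy_eq12(q, m, beta):
    """Closed form of Eq. (12), with m = |mu + p|."""
    return q * q * beta / math.sqrt(2 * math.pi) + beta**3 * m * m / (3 * math.sqrt(2 * math.pi))

num = gaussian_self_energy(0.9, 0.2, 1.0 / 0.8)   # hypothetical q, |mu + p|, beta
ref = gaussian_self_energy_eq12(0.9, 0.2, 1.0 / 0.8)
print(num, ref)
```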

C. Total electrostatic energy and forces in pGM model

From the introduction of the pGM model in Secs. II A and II B, it is clear that the electrostatic potential energy of the system can be divided into two parts:

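  • (1)

    Covalent dipole–covalent dipole interaction energy:

$$U^{\text{covalent-covalent}} = \frac{1}{2}\sum_{i}^{N}\sum_{j\neq i}^{N}(q_i+\boldsymbol{\mu}_i\cdot\nabla_i)(q_j+\boldsymbol{\mu}_j\cdot\nabla_j)\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}}. \tag{13}$$
  • (2)

    Induced energy:

$$\begin{aligned}
U^{\text{induced}} &= U^{\text{induced-covalent}}+U^{\text{induced-induced}}+U^{\text{self}}\\
&= -\sum_{i}^{N}\mathbf{p}_i\cdot\mathbf{E}_i^{\text{covalent}}+\frac{1}{2}\sum_{i}^{N}\sum_{j\neq i}^{N}\mathbf{p}_i\cdot\mathbf{T}_{ij}\cdot\mathbf{p}_j+\frac{1}{2}\sum_{i}^{N}\mathbf{p}_i\cdot\Big(\mathbf{E}_i^{\text{covalent}}-\sum_{j\neq i}^{N}\mathbf{T}_{ij}\cdot\mathbf{p}_j\Big)\\
&= -\frac{1}{2}\sum_{i}^{N}\mathbf{p}_i\cdot\mathbf{E}_i^{\text{covalent}},
\end{aligned} \tag{14}$$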

where N denotes the number of atoms in the system. Thus, atomic electrostatic forces can be derived as negative gradients of the above two potential energy terms.
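The collapse of the three induced terms in Eq. (14) into $-\frac{1}{2}\sum_i\mathbf{p}_i\cdot\mathbf{E}_i^{\text{covalent}}$ relies only on the induced dipoles satisfying Eq. (10) exactly. The Python sketch below checks this identity with random stand-ins (hypothetical polarizabilities, covalent fields, and a symmetric T matrix without self blocks) rather than real pGM quantities.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4                                    # hypothetical number of polarizable atoms
alpha = rng.uniform(0.4, 1.5, n)         # hypothetical polarizabilities
E_cov = rng.normal(size=3 * n)           # hypothetical covalent fields, flattened
M = rng.normal(scale=0.02, size=(3 * n, 3 * n))
T = M + M.T                              # symmetric stand-in for the T_ij blocks
for i in range(n):
    T[3*i:3*i+3, 3*i:3*i+3] = 0.0        # j != i in Eq. (10): no self blocks

a = np.repeat(alpha, 3)
p = np.linalg.solve(np.diag(1.0 / a) + T, E_cov)   # p = alpha (E_cov - T p)

u_ind_cov = -p @ E_cov                   # induced-covalent term
u_ind_ind = 0.5 * p @ T @ p              # induced-induced term
u_self = 0.5 * np.sum(p * p / a)         # Eq. (11) summed over atoms
u_total = u_ind_cov + u_ind_ind + u_self
u_compact = -0.5 * p @ E_cov             # last line of Eq. (14)
print(u_total, u_compact)
```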

When computing gradients of the covalent–covalent interaction energy, it is important to identify which quantities change under a virtual displacement of atom i. There are two types of such variables: (1) pairwise distances between atom i and all other atoms and (2) the covalent dipoles on atom i and on atoms covalently interacting with atom i. Given this classification, we can group the terms in Eq. (13) into four parts and discuss their gradients with respect to $\mathbf{R}_i$ separately.

The detailed derivations are presented in Subsection 2 of the Appendix, and the final force expression for the covalent–covalent interaction energy is

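$$\mathbf{F}_i^{\text{covalent-covalent}} = -\sum_{j}^{N}(\nabla_i\mathbf{u}_j)\cdot\nabla_j\sum_{k\neq j}^{N}(q_k+\mathbf{u}_k\cdot\nabla_k)\frac{\operatorname{erf}(\beta_{jk}R_{jk})}{R_{jk}}-(q_i+\mathbf{u}_i\cdot\nabla_i)\sum_{j\neq i}^{N}(q_j+\mathbf{u}_j\cdot\nabla_j)\nabla_i\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}}, \tag{15}$$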

where N denotes the number of atoms in the system. Briefly, the first term arises from the derivatives of the covalent dipoles, and the second term from the derivatives with respect to the pairwise distances.

The derivation of forces for the induced energy in Eq. (14) is less straightforward. We first need to express the induced dipole $\mathbf{p}_i$ in terms of the fields from covalent multipoles only, rather than through its self-consistent definition in Eq. (10). This is because $\mathbf{p}_i$ appears on both sides of Eq. (10), i.e., the induced dipoles mutually influence each other, which makes direct differentiation difficult. Instead, we proceed by expressing $\mathbf{p}_i$, as shown in Subsection 2 of the Appendix, as

$$p_i^{s} = (\mathbf{A}^{-1})_{ij}^{st}E_j^{\text{covalent},t}, \tag{16}$$

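where $\mathbf{A}^{-1}$ is the inverse of the 3N × 3N matrix $\mathbf{A}$ with elements $A_{ij}^{st} = \delta_{ij}\delta_{st}/\alpha_i+T_{ij}^{st}$ (following Eq. (10), with $\mathbf{T}_{ii}\equiv 0$); no explicit expression for $\mathbf{A}^{-1}$ is needed. Here, s, t and i, j are coordinate component indices and atom indices, respectively. Next, Eq. (14) can be rewritten as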

$$U = -\frac{1}{2}(\mathbf{A}^{-1})_{jk}^{st}E_j^{\text{covalent},s}E_k^{\text{covalent},t}, \tag{17}$$

where Einstein's index notation is employed for j, k, s, and t, so that a repeated index implies summation over all possible values of the index, i.e., Eq. (17) is a quadruple summation. Even without an explicit expression for $\mathbf{A}^{-1}$, we can still obtain its gradient with respect to $\mathbf{R}_k$, i.e., the virtual displacement of atom k,

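$$\frac{\partial(\mathbf{A}^{-1})_{ij}^{st}}{\partial x_k^{w}} = -(\mathbf{A}^{-1})_{ii'}^{ss'}\frac{\partial T_{i'j'}^{s't'}}{\partial x_k^{w}}(\mathbf{A}^{-1})_{j'j}^{t't}, \tag{18}$$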

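where w labels the Cartesian components (x1, x2, x3), all primed indices follow Einstein's index notation, and T is the dipole–dipole interaction tensor. Equation (18) follows from differentiating $\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}$, since only the T part of A depends on atomic positions.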
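The identity behind Eq. (18), $\partial\mathbf{A}^{-1}/\partial x = -\mathbf{A}^{-1}(\partial\mathbf{T}/\partial x)\mathbf{A}^{-1}$, can be verified against finite differences. The Python sketch below uses random symmetric stand-ins for $\boldsymbol{\alpha}^{-1}$ and T and assumes, for illustration only, that T depends linearly on a single coordinate x; the function name `A_of` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 6  # 3N for a hypothetical two-atom system

alpha_inv = np.diag(rng.uniform(1.0, 2.0, m))   # hypothetical 1/alpha on the diagonal
T0 = rng.normal(scale=0.1, size=(m, m))
T0 = T0 + T0.T                                  # symmetric T at x = 0
B = rng.normal(scale=0.1, size=(m, m))
dT = B + B.T                                    # symmetric dT/dx (assumed constant)

def A_of(x):
    """Hypothetical A(x) = alpha^-1 + T(x), with T linear in one coordinate x."""
    return alpha_inv + T0 + x * dT

x0, h = 0.3, 1e-6
A_inv = np.linalg.inv(A_of(x0))
analytic = -A_inv @ dT @ A_inv                  # Eq. (18)
numeric = (np.linalg.inv(A_of(x0 + h)) - np.linalg.inv(A_of(x0 - h))) / (2 * h)
```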

Given the above preparations, the induced part of the force can be obtained as

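$$\mathbf{F}_i^{\text{induced}} = -(\mathbf{p}_i\cdot\nabla_i)\sum_{j\neq i}^{N}(\mathbf{p}_j\cdot\nabla_j)\nabla_i\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}}+\sum_{j}^{N}\nabla_i(\mathbf{E}_j^{\text{covalent}})\cdot\mathbf{p}_j, \tag{19}$$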

where N denotes the number of atoms in the system. Details are presented in Subsection 2 of the Appendix. Briefly, the first term is obtained from the derivative of the matrix $\mathbf{A}^{-1}$, and the second term from the derivative of the covalent field $\mathbf{E}_j^{\text{covalent}}$ as follows:

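$$\nabla_i(\mathbf{E}_j^{\text{covalent}}) =
\begin{cases}
-\displaystyle\sum_{k\in n,\,k\neq j}\nabla_i(\mathbf{u}_k)\cdot\nabla_k\nabla_j\frac{\operatorname{erf}(\beta_{jk}R_{jk})}{R_{jk}}-(q_i+\mathbf{u}_i\cdot\nabla_i)\nabla_i\nabla_j\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}}, & j\neq i,\\[2ex]
-\displaystyle\sum_{k\in n,\,k\neq i}\nabla_i(\mathbf{u}_k)\cdot\nabla_k\nabla_i\frac{\operatorname{erf}(\beta_{ik}R_{ik})}{R_{ik}}-\sum_{k\neq i}^{N}(q_k+\mathbf{u}_k\cdot\nabla_k)\nabla_i\nabla_i\frac{\operatorname{erf}(\beta_{ik}R_{ik})}{R_{ik}}, & j=i,
\end{cases} \tag{20}$$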

where n denotes all atoms that are covalently (including virtually) bonded with atom i and atom i itself.

In practice, force calculations must be combined with an Ewald summation or particle mesh Ewald (PME) technique to handle long-range interactions under periodic boundary conditions. This is discussed in detail in Sec. II E.

D. Ewald summation and PME in pGM

The Ewald summation was introduced to compute the electrostatic energy of an infinite lattice under periodic boundary conditions.46 The basic idea is to place a masking Gaussian charge distribution on the real charge of each atom. A direct-space pairwise summation then computes the electric field due to the real charges with the masking Gaussian charges removed. This step can be executed with a reasonably short cutoff distance because the masked interactions decay very quickly. Next, the field generated by the masking Gaussians is computed efficiently by a reciprocal-space summation, which restores the original electric field due to the real charges. Finally, a correction step removes interactions that are not part of the original electrostatic model. As in point charge/dipole models, a masking Gaussian distribution is also applied to each moment of each atom in the pGM model,

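$$\rho_i^{\text{mask}}(\mathbf{r};\mathbf{R}_i) = q_i\left(\frac{\beta_0^{2}}{\pi}\right)^{3/2}\exp\left(-\beta_0^{2}|\mathbf{r}-\mathbf{R}_i|^{2}\right)+(\boldsymbol{\mu}_i+\mathbf{p}_i)\cdot\nabla_{\mathbf{R}_i}\left(\frac{\beta_0^{2}}{\pi}\right)^{3/2}\exp\left(-\beta_0^{2}|\mathbf{r}-\mathbf{R}_i|^{2}\right), \tag{21}$$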

where β0 is an adjustable parameter, usually in the range of about 1/5–1/2 Å⁻¹, shared by all atoms.

1. Direct summation

Given that the mask distribution is also a Gaussian function, it is straightforward to compute the electrostatic potential, field, and the gradient of the field of a masked pGM charge distribution as follows:

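$$\phi_i = \sum_{j\neq i}^{N}\left[q_j+(\boldsymbol{\mu}_j+\mathbf{p}_j)\cdot\nabla_j\right]\frac{\operatorname{erf}(\beta_{ij}R_{ij})-\operatorname{erf}(\beta_0R_{ij})}{R_{ij}}, \tag{22}$$
$$\mathbf{E}_i = -\sum_{j\neq i}^{N}\left[q_j+(\boldsymbol{\mu}_j+\mathbf{p}_j)\cdot\nabla_j\right]\nabla_i\frac{\operatorname{erf}(\beta_{ij}R_{ij})-\operatorname{erf}(\beta_0R_{ij})}{R_{ij}}, \tag{23}$$
$$\nabla\mathbf{E}_i = -\sum_{j\neq i}^{N}\left[q_j+(\boldsymbol{\mu}_j+\mathbf{p}_j)\cdot\nabla_j\right]\nabla_i\nabla_i\frac{\operatorname{erf}(\beta_{ij}R_{ij})-\operatorname{erf}(\beta_0R_{ij})}{R_{ij}}. \tag{24}$$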

No higher-order field derivatives are needed for the current pGM model. Here, N includes all atoms, including those in the periodic images, but their contributions decay to zero very quickly because of the masking.

Note also that the real field of the masking Gaussian multipoles is used, i.e., β0 is used in the above expressions instead of $\beta_{i0} = \beta_i\beta_0/\sqrt{\beta_i^{2}+\beta_0^{2}}$. This mixed use is not a problem because the masking multipoles do not physically exist; they are purely a mathematical device in the Ewald summation, and their effect is exactly canceled in the later steps.
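The rapid decay that makes the direct sum practical can be seen by tabulating the charge–charge kernel of Eq. (22). The Python sketch below assumes an H–H pair with the H Gaussian radius of Table I (taken as 1/β per Eq. (1), so that Eq. (6) gives $\beta_{HH} = \beta_H/\sqrt{2}$) and a hypothetical Ewald parameter β0 = 0.35 Å⁻¹; the function names are illustrative. It also checks the finite short-range limit $2(\beta_{ij}-\beta_0)/\sqrt{\pi}$ that results from the Gaussian screening.

```python
import math

def direct_kernel(r, beta_ij, beta0):
    """[erf(beta_ij r) - erf(beta0 r)]/r, the charge-charge kernel of Eq. (22)."""
    return (math.erf(beta_ij * r) - math.erf(beta0 * r)) / r

def short_range_limit(beta_ij, beta0):
    """r -> 0 limit of the kernel: 2 (beta_ij - beta0)/sqrt(pi)."""
    return 2.0 * (beta_ij - beta0) / math.sqrt(math.pi)

beta_h = 1.0 / 0.71475970            # H Gaussian radius from Table I, beta = 1/radius
beta_hh = beta_h / math.sqrt(2.0)    # Eq. (6) for an H-H pair
beta0 = 0.35                         # hypothetical Ewald parameter (1/Angstrom)

for r in (1.0, 3.0, 6.0, 9.0, 12.0):
    print(f"r = {r:5.1f} A   masked = {direct_kernel(r, beta_hh, beta0):.3e}   bare = {1.0 / r:.3e}")
```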

2. Reciprocal summation

The reciprocal summation in the pGM model follows the same procedure as in a traditional point polarizable model.43 Thus, the electrostatic potential, field, and field gradient can be written as

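$$\phi_i = \frac{1}{\pi V}\sum_{\mathbf{m}\neq 0}\frac{\exp(-\pi^{2}m^{2}/\beta_0^{2})}{m^{2}}\exp(-2\pi i\,\mathbf{m}\cdot\mathbf{R}_i)S(\mathbf{m}), \tag{25}$$
$$\mathbf{E}_i = \frac{2i}{V}\sum_{\mathbf{m}\neq 0}\mathbf{m}\frac{\exp(-\pi^{2}m^{2}/\beta_0^{2})}{m^{2}}\exp(-2\pi i\,\mathbf{m}\cdot\mathbf{R}_i)S(\mathbf{m}), \tag{26}$$
$$\nabla\mathbf{E}_i = \frac{4\pi}{V}\sum_{\mathbf{m}\neq 0}\mathbf{m}\mathbf{m}\frac{\exp(-\pi^{2}m^{2}/\beta_0^{2})}{m^{2}}\exp(-2\pi i\,\mathbf{m}\cdot\mathbf{R}_i)S(\mathbf{m}), \tag{27}$$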

where each i that is not a subscript denotes the imaginary unit, V is the volume of the unit cell, $\mathbf{m}$ is a reciprocal-space vector with magnitude m, and $S(\mathbf{m})$ is the structure factor,

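$$S(\mathbf{m}) = \sum_{j=1}^{N}\tilde{L}_j(\mathbf{m})\exp(2\pi i\,\mathbf{m}\cdot\mathbf{R}_j),\qquad \tilde{L}_j(\mathbf{m}) = q_j+2\pi i(\boldsymbol{\mu}_j+\mathbf{p}_j)\cdot\mathbf{m}. \tag{28}$$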

Here, N is the number of atoms in the primary simulation box only.

3. Correction

The PME correction term is used to handle various specific situations in a force field. For example, most force fields mask bonded (1–2 and 1–3) atom pairs, so that there are no electrostatic interactions between these atoms. Thus, the interactions among these masked pairs must be removed. However, the current pGM model has no masked pairs, so no such correction is needed.

The other correction, and the only one required in the pGM model, is the self-interaction correction. The self-potential, self-field, and self-field gradient are42,43

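$$\phi_i = \frac{2q_i\beta_0}{\sqrt{\pi}}, \tag{29}$$
$$\mathbf{E}_i = -\frac{4(\boldsymbol{\mu}_i+\mathbf{p}_i)\beta_0^{3}}{3\sqrt{\pi}}, \tag{30}$$
$$\nabla\mathbf{E}_i = \frac{4q_i\beta_0^{3}}{3\sqrt{\pi}}\mathbf{I}. \tag{31}$$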

These terms must be subtracted to obtain the correct potential, field, and field gradient, respectively.
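The three pieces of the Ewald scheme above (direct sum, reciprocal sum, and self correction) can be checked together on pGM monopoles: the total energy must not depend on the masking parameter β0. The Python sketch below assumes a neutral set of four hypothetical Gaussian charges in a 10 Å cubic box, omits the m = 0 term (tin-foil boundary conditions), and uses a plain Ewald reciprocal sum over explicit m vectors rather than PME; dipoles and induction are left out for brevity, and all names are hypothetical.

```python
import itertools
import math
import numpy as np

def pair_beta(bi, bj):
    """Combined Gaussian exponent, Eq. (6)."""
    return bi * bj / math.sqrt(bi * bi + bj * bj)

def ewald_energy(pos, q, beta, box, beta0, rc=18.0, nimg=2, kmax=8):
    """Energy of pGM Gaussian monopoles in a cubic box: direct + reciprocal - self."""
    n = len(q)
    # Direct sum with the Eq. (22) kernel over the primary cell and periodic images
    e_dir = 0.0
    for shift in itertools.product(range(-nimg, nimg + 1), repeat=3):
        t = box * np.array(shift, dtype=float)
        for i in range(n):
            for j in range(n):
                if i == j and shift == (0, 0, 0):
                    continue  # no self term in the primary cell
                r = float(np.linalg.norm(pos[i] - pos[j] + t))
                if r < rc:
                    b = pair_beta(beta[i], beta[j])
                    e_dir += 0.5 * q[i] * q[j] * (math.erf(b * r) - math.erf(beta0 * r)) / r
    # Reciprocal sum, Eqs. (25) and (28) with charges only: (1/2 pi V) sum exp(..)/m^2 |S|^2
    k = np.arange(-kmax, kmax + 1)
    mvec = np.array(list(itertools.product(k, k, k)), dtype=float) / box
    mvec = mvec[np.any(mvec != 0.0, axis=1)]
    m2 = np.sum(mvec**2, axis=1)
    s = np.exp(2j * np.pi * (mvec @ pos.T)) @ q
    e_rec = np.sum(np.exp(-(np.pi**2) * m2 / beta0**2) / m2 * np.abs(s) ** 2) / (2 * np.pi * box**3)
    # Self correction: energy counterpart of Eq. (29), (1/2) q_i phi_i
    e_self = -beta0 / math.sqrt(math.pi) * np.sum(q**2)
    return e_dir + e_rec + e_self

# Hypothetical neutral system (Angstrom, e, 1/Angstrom)
box = 10.0
pos = np.array([[1.0, 1.0, 1.0], [1.9, 1.5, 1.3], [6.0, 5.0, 4.0], [7.0, 7.8, 2.5]])
q = np.array([-0.8, 0.8, 0.5, -0.5])
beta = np.array([1.24, 1.40, 1.24, 1.40])

e_a = ewald_energy(pos, q, beta, box, beta0=0.30)
e_b = ewald_energy(pos, q, beta, box, beta0=0.40)
print(e_a, e_b)
```

Close agreement between the two β0 values indicates that the direct kernel of Eq. (22), the reciprocal formula of Eqs. (25) and (28), and the self term of Eq. (29) are mutually consistent. In a PME code, the explicit sum over m would instead be evaluated by B-spline interpolation onto a grid and fast Fourier transforms.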

In summary, the similarity between the Ewald summation in the pGM model and that in the polarizable point charge/dipole model shows that the PME molecular dynamics (MD) engine for the point charge/dipole model can be transplanted to pGM applications with little revision. The details of PME for polarizable point dipole models are discussed thoroughly in the literature and are therefore omitted here.42,43

E. Computing forces with the Ewald summation and PME

As pointed out at the end of Sec. II C, the analytical force expressions [Eqs. (15) and (19)] cannot be used directly in typical MD simulations, since all summations run over infinitely many atoms under periodic boundary conditions. They must be combined with an Ewald summation or a PME technique to enable solvated-phase simulations. To bypass the infinite summations, the force expressions are reformulated in terms of fields and their derivatives, which are exactly the quantities an Ewald or PME procedure returns. The key to expressing Eqs. (15) and (19) in terms of fields and field gradients is to consider the following quantities together:

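$$\begin{aligned}
\mathbf{E}_i^{\text{covalent}} &= -\sum_{j\neq i}^{N}(q_j+\boldsymbol{\mu}_j\cdot\nabla_j)\nabla_i\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}}, &
\nabla\mathbf{E}_i^{\text{covalent}} &= -\sum_{j\neq i}^{N}(q_j+\boldsymbol{\mu}_j\cdot\nabla_j)\nabla_i\nabla_i\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}},\\
\mathbf{E}_i^{\text{induced}} &= -\sum_{j\neq i}^{N}(\mathbf{p}_j\cdot\nabla_j)\nabla_i\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}}, &
\nabla\mathbf{E}_i^{\text{induced}} &= -\sum_{j\neq i}^{N}(\mathbf{p}_j\cdot\nabla_j)\nabla_i\nabla_i\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}},
\end{aligned} \tag{32}$$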

where N denotes all atoms in the system, including those of the periodic boxes. Thus, these are all infinite summations.

A key step is the computation of $\mathbf{F}_i^{\text{induced}}$, where the term $\sum_j^N\nabla_i(\mathbf{E}_j^{\text{covalent}})\cdot\mathbf{p}_j$ must also be reformulated. Because Eq. (20) gives $\nabla_i(\mathbf{E}_j^{\text{covalent}})$ separately for $j = i$ and $j\neq i$, we can rewrite this term as follows:

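$$\begin{aligned}
\sum_{j}^{N}\nabla_i(\mathbf{E}_j^{\text{covalent}})\cdot\mathbf{p}_j ={}& -\sum_{j\neq i}^{N}\sum_{k\in n,\,k\neq j}(\nabla_i\mathbf{u}_k)\cdot\nabla_k(\mathbf{p}_j\cdot\nabla_j)\frac{\operatorname{erf}(\beta_{jk}R_{jk})}{R_{jk}}-\sum_{j\neq i}^{N}(q_i+\boldsymbol{\mu}_i\cdot\nabla_i)\nabla_i(\mathbf{p}_j\cdot\nabla_j)\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}}\\
&-\sum_{k\in n,\,k\neq i}(\nabla_i\mathbf{u}_k)\cdot\nabla_k(\mathbf{p}_i\cdot\nabla_i)\frac{\operatorname{erf}(\beta_{ik}R_{ik})}{R_{ik}}-\sum_{k\neq i}^{N}(q_k+\boldsymbol{\mu}_k\cdot\nabla_k)\nabla_i(\mathbf{p}_i\cdot\nabla_i)\frac{\operatorname{erf}(\beta_{ik}R_{ik})}{R_{ik}},
\end{aligned} \tag{33}$$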

where both the $j = i$ and $j\neq i$ cases of Eq. (20) are needed because of the outermost summation over j. Combining the first and third terms of Eq. (33) gives

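$$\begin{aligned}
&-\sum_{j}^{N}\sum_{k\in n,\,k\neq j}(\nabla_i\mathbf{u}_k)\cdot\nabla_k(\mathbf{p}_j\cdot\nabla_j)\frac{\operatorname{erf}(\beta_{jk}R_{jk})}{R_{jk}}-\sum_{j\neq i}^{N}(q_i+\boldsymbol{\mu}_i\cdot\nabla_i)\nabla_i(\mathbf{p}_j\cdot\nabla_j)\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}}\\
&-\sum_{k\neq i}^{N}(q_k+\boldsymbol{\mu}_k\cdot\nabla_k)\nabla_i(\mathbf{p}_i\cdot\nabla_i)\frac{\operatorname{erf}(\beta_{ik}R_{ik})}{R_{ik}}.
\end{aligned} \tag{34}$$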

Exchanging the summation order for the first term leads to

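$$\begin{aligned}
&-\sum_{k\in n}\sum_{j\neq k}^{N}(\nabla_i\mathbf{u}_k)\cdot\nabla_k(\mathbf{p}_j\cdot\nabla_j)\frac{\operatorname{erf}(\beta_{jk}R_{jk})}{R_{jk}}-\sum_{j\neq i}^{N}(q_i+\boldsymbol{\mu}_i\cdot\nabla_i)\nabla_i(\mathbf{p}_j\cdot\nabla_j)\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}}\\
&-\sum_{k\neq i}^{N}(q_k+\boldsymbol{\mu}_k\cdot\nabla_k)\nabla_i(\mathbf{p}_i\cdot\nabla_i)\frac{\operatorname{erf}(\beta_{ik}R_{ik})}{R_{ik}}.
\end{aligned} \tag{35}$$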

Substitution of the expressions of electric fields and derivatives in Eq. (32) gives

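$$\sum_{j}^{N}\nabla_i(\mathbf{E}_j^{\text{covalent}})\cdot\mathbf{p}_j = \sum_{k\in n}\nabla_i\mathbf{u}_k\cdot\mathbf{E}_k^{\text{induced}}+q_i\mathbf{E}_i^{\text{induced}}+\mathbf{u}_i\cdot\nabla\mathbf{E}_i^{\text{induced}}+\mathbf{p}_i\cdot\nabla\mathbf{E}_i^{\text{covalent}}. \tag{36}$$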

Here, n denotes the atoms that covalently interact with atom i, including atom i itself.

Given the above preparations, Eqs. (15) and (19) can finally be expressed as follows after the substitution of Eqs. (32) and (36):

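$$\mathbf{F}_i^{\text{covalent-covalent}} = \sum_{j\in n}\nabla_i\mathbf{u}_j\cdot\mathbf{E}_j^{\text{covalent}}+q_i\mathbf{E}_i^{\text{covalent}}+\mathbf{u}_i\cdot\nabla\mathbf{E}_i^{\text{covalent}}, \tag{37}$$
$$\mathbf{F}_i^{\text{induced}} = \sum_{j\in n}\nabla_i\mathbf{u}_j\cdot\mathbf{E}_j^{\text{induced}}+q_i\mathbf{E}_i^{\text{induced}}+\mathbf{p}_i\cdot\nabla\mathbf{E}_i^{\text{induced}}+\mathbf{u}_i\cdot\nabla\mathbf{E}_i^{\text{induced}}+\mathbf{p}_i\cdot\nabla\mathbf{E}_i^{\text{covalent}}. \tag{38}$$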

Adding these two terms together, the final force expression is obtained as

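$$\mathbf{F}_i = \sum_{j\in n}\nabla_i\mathbf{u}_j\cdot\mathbf{E}_j+q_i\mathbf{E}_i+(\mathbf{u}_i+\mathbf{p}_i)\cdot\nabla\mathbf{E}_i. \tag{39}$$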

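In Eq. (39), $\mathbf{E}_i = \mathbf{E}_i^{\text{covalent}}+\mathbf{E}_i^{\text{induced}}$ and $\nabla\mathbf{E}_i$ are the total field and field gradient at atom i. The equation shows that a key step in this algorithm is to accumulate the atomic electric potential and its first and second derivatives from various components, including both the reciprocal and direct summations.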
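As a consistency check of Eqs. (13), (14), and (39), the Python sketch below compares analytical forces with central finite differences of the total energy, in the spirit of the test in Sec. III A. To stay short, it makes several assumptions: free space (no Ewald/PME); all covalent dipoles set to zero, so the $\nabla_i\mathbf{u}_j$ term drops out and Eq. (39) reduces to $\mathbf{F}_i = q_i\mathbf{E}_i+\mathbf{p}_i\cdot\nabla\mathbf{E}_i$; hypothetical four-site parameters rather than the Table I water model; and an exact dense linear solve for the induced dipoles. Derivatives of $\operatorname{erf}(\beta r)/r$ use the recursion $B_n = [(2n-1)B_{n-1}-(2\beta^{2})^{n}e^{-\beta^{2}r^{2}}/(\beta\sqrt{\pi})]/r^{2}$, the erf analogue of the coefficients commonly used for point multipoles in Ewald codes. The helper names are hypothetical.

```python
import numpy as np
from math import erf, exp, pi, sqrt

def pair_beta(bi, bj):
    """Combined Gaussian exponent, Eq. (6)."""
    return bi * bj / sqrt(bi * bi + bj * bj)

def b_coeffs(r, beta):
    """B0..B3 of f(r) = erf(beta r)/r; with x = R_i - R_j: grad f = -B1 x,
    grad grad f = B2 x x - B1 I, third derivatives from B2 and B3."""
    g = exp(-(beta * r) ** 2) / (beta * sqrt(pi))
    b = [erf(beta * r) / r]
    for n in range(1, 4):
        b.append(((2 * n - 1) * b[-1] - (2 * beta * beta) ** n * g) / (r * r))
    return b

def solve_induction(pos, q, alpha, beta):
    """Charge-only covalent field and induced dipoles from Eq. (10), solved directly."""
    n = len(q)
    E = np.zeros((n, 3))
    A = np.diag(np.repeat(1.0 / alpha, 3))
    for i in range(n):
        for j in range(n):
            if i != j:
                x = pos[i] - pos[j]
                _, b1, b2, _ = b_coeffs(np.linalg.norm(x), pair_beta(beta[i], beta[j]))
                E[i] += q[j] * b1 * x
                A[3*i:3*i+3, 3*j:3*j+3] = b1 * np.eye(3) - b2 * np.outer(x, x)
    return E, np.linalg.solve(A, E.ravel()).reshape(n, 3)

def energy(pos, q, alpha, beta):
    """Eq. (13) with charges only, plus Eq. (14): U = U_qq - (1/2) sum p.E_cov."""
    E, p = solve_induction(pos, q, alpha, beta)
    u = 0.0
    for i in range(len(q)):
        for j in range(i + 1, len(q)):
            r = np.linalg.norm(pos[i] - pos[j])
            u += q[i] * q[j] * b_coeffs(r, pair_beta(beta[i], beta[j]))[0]
    return u - 0.5 * np.sum(p * E)

def forces(pos, q, alpha, beta):
    """Eq. (39) with all covalent dipoles set to zero: F_i = q_i E_i + p_i . grad E_i."""
    n = len(q)
    _, p = solve_induction(pos, q, alpha, beta)
    F = np.zeros((n, 3))
    I3 = np.eye(3)
    for i in range(n):
        Ei, Gi = np.zeros(3), np.zeros((3, 3))  # Gi[a, b] = dE_i^b / dR_i^a
        for j in range(n):
            if i == j:
                continue
            x = pos[i] - pos[j]
            _, b1, b2, b3 = b_coeffs(np.linalg.norm(x), pair_beta(beta[i], beta[j]))
            xx, xp = np.outer(x, x), x @ p[j]
            Ei += q[j] * b1 * x + b2 * xp * x - b1 * p[j]   # covalent + induced field
            Gi += -q[j] * (b2 * xx - b1 * I3)                 # gradient of the charge field
            Gi += -b3 * xp * xx + b2 * (xp * I3 + np.outer(p[j], x) + np.outer(x, p[j]))
        F[i] = q[i] * Ei + p[i] @ Gi
    return F

def numerical_forces(pos, q, alpha, beta, h=1e-5):
    """Central finite differences of the total energy."""
    F = np.zeros_like(pos)
    for i in range(len(q)):
        for a in range(3):
            plus, minus = pos.copy(), pos.copy()
            plus[i, a] += h
            minus[i, a] -= h
            F[i, a] = -(energy(plus, q, alpha, beta) - energy(minus, q, alpha, beta)) / (2 * h)
    return F

# Hypothetical four-site toy system (Angstrom, e, Angstrom^3, 1/Angstrom)
pos = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0], [2.9, 0.3, 0.4]])
q = np.array([-0.8, 0.4, 0.4, 0.2])
alpha = np.array([1.4, 0.4, 0.4, 1.0])
beta = np.array([1.24, 1.40, 1.40, 1.10])

F_ana = forces(pos, q, alpha, beta)
F_num = numerical_forces(pos, q, alpha, beta)
print(np.max(np.abs(F_ana - F_num)))
```

Because the induced dipoles are solved exactly here, the two sets of forces should agree to within the finite-difference truncation error, and the net force should vanish to rounding error.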

III. RESULTS AND DISCUSSION

A. Validation of analytical electrostatic force expression

To validate the pGM force expressions, Eqs. (15) and (19), we constructed a small toy system of two water molecules in free space. The detailed water pGM parameters are listed in Table I. These parameters were derived with an iterative Restrained ESP (RESP) procedure for the pGM model, using quantum mechanical ESP data from a B3LYP/aug-cc-pVTZ calculation of the water dimer.

TABLE I.

Two water molecules in free space. The permanent dipole moments are expressed in the CBV frame according to Fig. 1. The moments in the CBV frame can be obtained via a least-squares fitting procedure from either the ESP data or the permanent dipole moments in the lab frame.

| Tested atoms | Water-1 O | Water-1 H1 | Water-1 H2 | Water-2 O | Water-2 H1 | Water-2 H2 |
|---|---|---|---|---|---|---|
| Charge (e) | −1.797 045 4 | 0.898 522 70 | 0.898 522 70 | −1.797 045 4 | 0.898 522 70 | 0.898 522 70 |
| Covalent dipole moment, first CBV component (e Å) | −0.371 441 54 | 0.151 706 58 | 0.151 706 58 | −0.371 441 54 | 0.151 706 58 | 0.151 706 58 |
| Covalent dipole moment, second CBV component (e Å) | −0.371 441 54 | −0.022 490 434 | −0.022 490 434 | −0.371 441 54 | −0.022 490 434 | −0.022 490 434 |
| Polarizability (Å³) | 1.448 980 10 | 0.427 350 00 | 0.427 350 00 | 1.448 980 10 | 0.427 350 00 | 0.427 350 00 |
| Gaussian radius (Å) | 0.806 624 90 | 0.714 759 70 | 0.714 759 70 | 0.806 624 90 | 0.714 759 70 | 0.714 759 70 |
| x coordinate (Å) | −1.387 669 | −1.734 110 | −1.756 164 | 1.514 536 | 1.919 419 | 0.555 921 |
| y coordinate (Å) | −0.006 775 | 0.790 147 | −0.740 122 | 0.007 522 | −0.047 517 | −0.008 485 |
| z coordinate (Å) | 0.110 728 | −0.310 036 | −0.397 708 | −0.121 554 | 0.751 251 | 0.043 101 |

Two methods were used to calculate the atomic forces. The first computes the forces analytically from the force expressions. The second computes them numerically by finite differences, since each force is the negative gradient of the potential energy; here, the potential energy was computed with Eqs. (13) and (14). The finite-difference coordinate displacement was set to 1 × 10⁻⁶ Å, and the induced-dipole convergence tolerance was set to 1 × 10⁻⁹. The two sets of atomic forces are listed in Table II. The differences between them appear only in the ninth decimal place.

TABLE II.

Atomic forces (e²/Å²) computed via the analytical expressions and the finite-difference procedure, and their differences.

| Tested atom | Analytical force (x, y, z) | Finite-difference force (x, y, z) | Deviation (x, y, z) |
|---|---|---|---|
| Water-1 O | (0.096 951 679, −0.008 409 817, 0.124 608 301) | (0.096 951 679, −0.008 409 817, 0.124 608 301) | (−0.3 × 10⁻⁹, −0.3 × 10⁻⁹, 0.5 × 10⁻⁹) |
| Water-1 H1 | (−0.049 846 750, −0.237 351 651, −0.076 271 576) | (−0.049 846 749, −0.237 351 652, −0.076 271 576) | (0.6 × 10⁻⁹, −1.1 × 10⁻⁹, 0.2 × 10⁻⁹) |
| Water-1 H2 | (−0.042 834 464, 0.245 826 708, −0.048 683 253) | (−0.042 834 464, 0.245 826 707, −0.048 683 253) | (0.02 × 10⁻⁹, −1.0 × 10⁻⁹, −0.4 × 10⁻⁹) |
| Water-2 O | (0.055 862 413, 0.009 291 242, −0.136 952 953) | (0.055 862 412, 0.009 291 241, −0.136 952 954) | (−1.4 × 10⁻⁹, −0.9 × 10⁻⁹, −0.9 × 10⁻⁹) |
| Water-2 H1 | (−0.248 587 088, 0.001 371 87, −0.041 030 761) | (−0.248 587 086, 0.001 371 87, −0.041 030 761) | (2.0 × 10⁻⁹, 1.1 × 10⁻⁹, −0.1 × 10⁻¹⁰) |
| Water-2 H2 | (0.188 454 211, −0.010 728 352, 0.178 330 242) | (0.188 454 212, −0.010 728 352, 0.178 330 242) | (0.8 × 10⁻⁹, 0.2 × 10⁻¹⁰, −0.3 × 10⁻⁹) |

There is also an indirect way to confirm the correctness of the force expressions: the net force on the isolated system must be zero in every direction. Summing all atomic forces gives a net force (in e²/Å²) of 1 × 10⁻⁹, 2 × 10⁻¹⁷, and 3 × 10⁻¹⁷ in the x, y, and z directions, respectively. The largest residual is consistent with the induction tolerance used in the test, 1 × 10⁻⁹.
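The net-force check can be reproduced from the rounded analytical forces in Table II. Because the tabulated values carry only nine decimals, this minimal sketch can confirm the balance only at the 10⁻⁹ level; the 10⁻¹⁷ residuals quoted above come from the full-precision forces.

```python
# Analytical forces from Table II (e^2/Å^2; x, y, z), as tabulated to nine decimals.
ANALYTICAL_FORCES = {
    "Water-1 O":  (0.096951679, -0.008409817, 0.124608301),
    "Water-1 H1": (-0.049846750, -0.237351651, -0.076271576),
    "Water-1 H2": (-0.042834464, 0.245826708, -0.048683253),
    "Water-2 O":  (0.055862413, 0.009291242, -0.136952953),
    "Water-2 H1": (-0.248587088, 0.00137187, -0.041030761),
    "Water-2 H2": (0.188454211, -0.010728352, 0.178330242),
}

# Net force = sum of atomic forces, component by component (should vanish).
net = [sum(f[w] for f in ANALYTICAL_FORCES.values()) for w in range(3)]
print(net)
```

With the tabulated values, the x component comes out at about 1 × 10⁻⁹ and the y and z components vanish up to floating-point rounding.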

B. Accuracy of pGM electrostatic energy and forces in PME

To enable aqueous-phase simulations, an Ewald summation or PME technique is essential for any electrostatic model. Although various publications have discussed the accuracy of PME,42,43,47,48 the pGM model has a higher accuracy requirement than classical point-charge force fields because of the presence of dipoles. In general, higher moments require higher PME accuracy, for two reasons. First, the interaction energy between two dipoles decays faster with distance (1/r³) than that between two charges (1/r). Second, the second derivatives of the potential are needed to compute forces on dipoles [Eq. (39)], whereas only the first derivatives, i.e., the electric field, are needed to compute forces on charges. Because of these differences, the accuracy requirements of the PME methods used in our model must be examined carefully.
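The two considerations can be made concrete with the standard point-multipole expressions, given here only as a sketch of the scaling argument rather than the screened pGM forms. For two sites separated by R, and for a site carrying a charge q and a dipole μ in a potential φ,

$$U_{qq} = \frac{q_i q_j}{R} \propto R^{-1}, \qquad U_{\mu\mu} = (\boldsymbol{\mu}_i\cdot\nabla_i)(\boldsymbol{\mu}_j\cdot\nabla_j)\frac{1}{R} \propto R^{-3},$$

$$F_p = -q\,\frac{\partial \phi}{\partial x_p} - \sum_{t}\mu_t\,\frac{\partial^2 \phi}{\partial x_t\,\partial x_p},$$

so the charge contribution needs only the gradient of φ (the field), whereas the dipole contribution needs its full matrix of second derivatives.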

To test the accuracy, we examine only the most difficult pairwise interactions, so that the errors reported below are the maximum errors in the tested water system. For the reciprocal part, we focus on the electrostatic field between the bonded O and H atoms, whose interactions are the strongest and thus the most difficult for PME. For the direct summation part, we focus on the electrostatic field between two H atoms: because H atoms have the smallest Gaussian radii, their interactions converge most slowly in the direct summation. To guarantee a given accuracy level for forces in the pGM model, both PME components must therefore be considered.

In the following analysis, we set the PME grid spacing to 1 Å, as in most biomolecular simulations, and varied the other parameters to see how the accuracy of the reciprocal and direct summation components changes. Because our model contains both charges and dipoles, we analyzed their fields separately to assess how each setup affects their accuracy. The test results are shown in Tables III–V.

TABLE III.

Errors of reciprocal potentials and derivatives generated by the dipole of the water-1 H1 atom at the water-1 O atom for different PME setups. The analytical values [Eqs. (25)–(28)] were calculated using MATLAB. The interpolation order refers to the order of the B-spline interpolation used in PME (see Ref. 48 for details).

Interpolation order | 5 | 6 | 7 | 8 | 9
β0 = 0.3 Å⁻¹, potential | 8.2 × 10⁻⁵ | 2.8 × 10⁻⁵ | 1.5 × 10⁻⁷ | 3.4 × 10⁻⁶ | 2.7 × 10⁻⁶
β0 = 0.3 Å⁻¹, first derivative | 3.0 × 10⁻⁴ | 6.4 × 10⁻⁵ | 9.2 × 10⁻⁶ | 7.6 × 10⁻⁶ | 3.9 × 10⁻⁶
β0 = 0.3 Å⁻¹, second derivative | 1.3 × 10⁻³ | 3.9 × 10⁻⁴ | 4.6 × 10⁻⁵ | 1.6 × 10⁻⁵ | 5.7 × 10⁻⁶
β0 = 0.4 Å⁻¹, potential | 1.2 × 10⁻⁴ | 4.0 × 10⁻⁵ | 4.3 × 10⁻⁶ | 3.5 × 10⁻⁸ | 5.4 × 10⁻⁶
β0 = 0.4 Å⁻¹, first derivative | 1.3 × 10⁻³ | 4.6 × 10⁻⁴ | 1.5 × 10⁻⁴ | 6.5 × 10⁻⁵ | 2.5 × 10⁻⁵
β0 = 0.4 Å⁻¹, second derivative | 2.6 × 10⁻³ | 1.6 × 10⁻³ | 4.0 × 10⁻⁴ | 1.9 × 10⁻⁴ | 8.1 × 10⁻⁵
β0 = 0.5 Å⁻¹, potential | 2.6 × 10⁻⁴ | 1.1 × 10⁻⁴ | 1.7 × 10⁻⁴ | 6.1 × 10⁻⁵ | 9.1 × 10⁻⁵
β0 = 0.5 Å⁻¹, first derivative | 5.1 × 10⁻³ | 2.5 × 10⁻³ | 1.2 × 10⁻³ | 7.2 × 10⁻⁴ | 4.0 × 10⁻⁴
β0 = 0.5 Å⁻¹, second derivative | 8.8 × 10⁻³ | 5.5 × 10⁻³ | 2.4 × 10⁻³ | 1.3 × 10⁻³ | 8.8 × 10⁻⁴

TABLE IV.

Errors of reciprocal potentials and derivatives generated by the charge of the water-1 H1 atom at the water-1 O atom for different PME setups. The analytical values [Eqs. (25)–(28)] were calculated using MATLAB. The interpolation order refers to the order of the B-spline interpolation used in PME (see Ref. 48 for details).

Interpolation order | 5 | 6 | 7 | 8 | 9
β0 = 0.3 Å⁻¹, potential | 1.4 × 10⁻⁶ | 3.7 × 10⁻⁸ | 6.5 × 10⁻⁸ | 1.4 × 10⁻⁸ | 5.3 × 10⁻⁹
β0 = 0.3 Å⁻¹, first derivative | 1.8 × 10⁻⁴ | 3.8 × 10⁻⁵ | 5.5 × 10⁻⁶ | 1.1 × 10⁻⁶ | 3.4 × 10⁻⁷
β0 = 0.3 Å⁻¹, second derivative | 7.7 × 10⁻⁴ | 2.1 × 10⁻⁴ | 3.0 × 10⁻⁵ | 7.8 × 10⁻⁶ | 1.8 × 10⁻⁶
β0 = 0.4 Å⁻¹, potential | 1.4 × 10⁻⁵ | 3.7 × 10⁻⁶ | 1.6 × 10⁻⁶ | 6.8 × 10⁻⁷ | 3.2 × 10⁻⁷
β0 = 0.4 Å⁻¹, first derivative | 7.4 × 10⁻⁴ | 1.1 × 10⁻⁴ | 6.1 × 10⁻⁵ | 2.1 × 10⁻⁵ | 1.0 × 10⁻⁵
β0 = 0.4 Å⁻¹, second derivative | 2.5 × 10⁻³ | 7.7 × 10⁻⁴ | 2.1 × 10⁻⁴ | 7.5 × 10⁻⁵ | 3.0 × 10⁻⁵
β0 = 0.5 Å⁻¹, potential | 8.1 × 10⁻⁵ | 3.9 × 10⁻⁵ | 1.9 × 10⁻⁵ | 1.2 × 10⁻⁵ | 7.0 × 10⁻⁶
β0 = 0.5 Å⁻¹, first derivative | 2.5 × 10⁻³ | 7.8 × 10⁻⁴ | 4.5 × 10⁻⁴ | 2.6 × 10⁻⁴ | 1.5 × 10⁻⁴
β0 = 0.5 Å⁻¹, second derivative | 6.7 × 10⁻³ | 2.6 × 10⁻³ | 1.1 × 10⁻³ | 6.3 × 10⁻⁴ | 3.4 × 10⁻⁴

TABLE V.

Errors of direct summation potentials for Gaussian potentials between two H atoms for different PME setups. These values are the difference between two error functions, erf(βRc) − erf(β0Rc), where Rc is the direct summation cutoff distance and β = 1/(0.7147597√2) Å⁻¹ for H atom pairs.

Cutoff distance (Å) | 7 | 8 | 9 | 10 | 11
β0 = 0.3 Å⁻¹ | 3.0 × 10⁻³ | 6.9 × 10⁻⁴ | 1.3 × 10⁻⁴ | 2.2 × 10⁻⁵ | 3.1 × 10⁻⁶
β0 = 0.4 Å⁻¹ | 7.5 × 10⁻⁵ | 6.0 × 10⁻⁶ | 3.6 × 10⁻⁷ | 1.5 × 10⁻⁸ | 4.9 × 10⁻¹⁰
β0 = 0.5 Å⁻¹ | 7.4 × 10⁻⁷ | 1.5 × 10⁻⁸ | 2.0 × 10⁻¹⁰ | 1.5 × 10⁻¹² | 7.3 × 10⁻¹⁵
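The Table V entries can be reproduced with the standard-library complementary error function. This minimal sketch assumes the pair screening β = 1/√(a_i² + a_j²) with the H Gaussian radius a = 0.7147597 Å from Table I, i.e., β = 1/(0.7147597√2) Å⁻¹. At these cutoffs erf(βRc) equals 1 to double precision, so the entries are effectively erfc(β0Rc). Writing the difference as erfc(β0Rc) − erfc(βRc) avoids the cancellation that 1 − erf(·) suffers for the smallest entries.

```python
import math

# Assumed H-H pair screening: beta = 1/sqrt(a_i^2 + a_j^2) with a = 0.7147597 Å (Table I).
BETA_HH = 1.0 / (0.7147597 * math.sqrt(2.0))
CUTOFFS = (7, 8, 9, 10, 11)          # direct-space cutoffs Rc in Å

def direct_sum_error(beta0, rc, beta=BETA_HH):
    """erf(beta*rc) - erf(beta0*rc), evaluated as erfc(beta0*rc) - erfc(beta*rc)
    so that tiny differences are not lost to cancellation near erf = 1."""
    return math.erfc(beta0 * rc) - math.erfc(beta * rc)

for beta0 in (0.3, 0.4, 0.5):        # Ewald coefficients in 1/Å
    row = [direct_sum_error(beta0, rc) for rc in CUTOFFS]
    print(f"beta0 = {beta0}: " + "  ".join(f"{e:.1e}" for e in row))
```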

It is clear from the above analyses that the pGM model demands a higher accuracy level than classical point-charge models, as expected for any electrostatic model with dipoles or higher moments. For the reciprocal part, the field generated by dipoles is more difficult to handle in PME than that of charges: comparing Tables III and IV, the errors of dipole fields are about twice as large as those of charge fields. Furthermore, Tables III and IV show that the second derivatives are the most difficult quantities in PME. Thus, to ensure the accuracy of the reciprocal summation, the second derivatives must reach the specified accuracy level. For example, with the common accuracy threshold of 5 × 10⁻⁵, we have to set β0 = 0.3 Å⁻¹ and an interpolation order of 7 or higher (Table III). A smaller grid spacing would also increase the accuracy, but at a higher cost for the reciprocal summation. We will explore the best tradeoff between accuracy and efficiency for realistic biomolecular systems in a later publication.
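Because the second derivative is the binding constraint, the choice of reciprocal settings reduces to a lookup in Table III. A minimal sketch using the tabulated second-derivative (dipole-field) errors; it applies only to the 1 Å grid spacing used for that table.

```python
# Second-derivative reciprocal errors from Table III (dipole field, 1 Å grid spacing),
# indexed by Ewald coefficient beta0 (1/Å) and then by B-spline interpolation order.
SECOND_DERIV_ERR = {
    0.3: {5: 1.3e-3, 6: 3.9e-4, 7: 4.6e-5, 8: 1.6e-5, 9: 5.7e-6},
    0.4: {5: 2.6e-3, 6: 1.6e-3, 7: 4.0e-4, 8: 1.9e-4, 9: 8.1e-5},
    0.5: {5: 8.8e-3, 6: 5.5e-3, 7: 2.4e-3, 8: 1.3e-3, 9: 8.8e-4},
}

def acceptable_settings(threshold):
    """Return (beta0, order) pairs whose tabulated second-derivative error is below threshold."""
    return [(beta0, order)
            for beta0, by_order in sorted(SECOND_DERIV_ERR.items())
            for order, err in sorted(by_order.items())
            if err < threshold]

print(acceptable_settings(5e-5))    # prints [(0.3, 7), (0.3, 8), (0.3, 9)]
```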

The situation is similar for the direct summation. To reach the same threshold of 5 × 10⁻⁵ with β0 = 0.3 Å⁻¹, as chosen for the reciprocal part, the direct-space cutoff must be extended to 10 Å (Table V). A larger β0 (e.g., 0.35 Å⁻¹) would allow the commonly used cutoff distance of 9 Å, but would then require a higher interpolation order to reach a similar level of accuracy in the reciprocal part.
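The cutoff choice can be phrased as a search for the smallest Rc that meets the threshold. A minimal sketch, assuming (as in the Table V analysis) that the direct-space error is dominated by erfc(β0Rc) and restricting the search to whole-ångström cutoffs; it covers only the direct-space part, so the reciprocal-space requirement must be checked separately.

```python
import math

def min_direct_cutoff(beta0, threshold=5e-5, step=1.0, rc_max=30.0):
    """Smallest cutoff Rc (Å, on a grid of `step`) whose direct-space error is below threshold.

    Uses erfc(beta0 * Rc) as the error estimate, i.e. assumes erf(beta * Rc) ~ 1 for the
    H-H pair screening, which holds at these distances."""
    rc = step
    while rc <= rc_max:
        if math.erfc(beta0 * rc) < threshold:
            return rc
        rc += step
    raise ValueError("no cutoff up to rc_max meets the threshold")

for beta0 in (0.3, 0.35, 0.4):
    print(beta0, min_direct_cutoff(beta0))
```

For a 5 × 10⁻⁵ threshold this gives 10 Å for β0 = 0.3 Å⁻¹, 9 Å for β0 = 0.35 Å⁻¹, and 8 Å for β0 = 0.4 Å⁻¹, consistent with the discussion above.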

C. NVE simulations of the water box

Given the accuracy considerations in Sec. III B, we performed a pure water simulation to test energy conservation in an NVE run with the PME treatment. The electrostatic parameters were derived from those in Table I and transferred onto the TIP3P water model. We used 512 water molecules in a truncated octahedron box of 27.5 Å. The particle mesh grid has dimensions 30³, so the grid spacing is slightly less than 1 Å. The PME Ewald coefficient was β0 = 0.35 Å⁻¹, the real-space cutoff was 9 Å, and the interpolation order was 8, so that the overall PME error was less than 5 × 10⁻⁵.

We first tested induction tolerances ranging from 10⁻³ to 10⁻⁶. Tolerances of 10⁻³ and 10⁻⁴ are clearly insufficient for the induction iteration, leading to a steadily decreasing total energy throughout the MD simulations. This is consistent with previous findings in the development of polarizable point dipole models.42 Specifically, with a tolerance of 10⁻³, the energy drifts so fast that it is already outside the plotting range at the 100th step, the first data point; the energy traces for the other tolerances are shown in Fig. 2. Tightening the tolerance to 10⁻⁵ improves energy conservation considerably, although the total energy still drifts down slowly. Finally, with a tolerance of 10⁻⁶, the total energy is essentially conserved, apart from normal fluctuations.
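To make the role of the induction tolerance concrete, here is a minimal sketch of a Jacobi-style self-consistent iteration for the induced dipoles, p_i = α_i(E_i⁰ − Σ_j T_ij·p_j), with T_ij built from the Boys functions B1 and B2 of the Appendix. The iteration stops when the largest change in any dipole component falls below the tolerance. The pair screening β_ij = 1/√(a_i² + a_j²), the three-site geometry, and the permanent fields E⁰ are illustrative assumptions, and neither the iteration scheme nor the convergence measure is necessarily the one used in Amber.

```python
import math

def boys_b1_b2(x):
    """B1(x) and B2(x) from the Appendix (closed forms; adequate for moderate x > 0)."""
    g = 2.0 / math.sqrt(math.pi) * math.exp(-x * x)
    e = math.erf(x)
    return e / x**3 - g / x**2, 3.0 * e / x**5 - g * (3.0 + 2.0 * x * x) / x**4

def t_tensor(ri, rj, beta):
    """3x3 T_ij = grad_i grad_j erf(beta R)/R = delta*beta^3*B1 - R R*beta^5*B2."""
    d = [a - b for a, b in zip(ri, rj)]
    b1, b2 = boys_b1_b2(beta * math.hypot(*d))
    return [[(beta**3 * b1 if s == t else 0.0) - d[s] * d[t] * beta**5 * b2
             for t in range(3)] for s in range(3)]

def induce(pos, alpha, radii, e0, tol=1e-6, max_iter=1000):
    """Jacobi iteration p_i <- alpha_i (E0_i - sum_j T_ij p_j); returns (dipoles, iterations).

    Assumed pair screening (hypothetical here): beta_ij = 1/sqrt(a_i^2 + a_j^2)."""
    n = len(pos)
    T = {(i, j): t_tensor(pos[i], pos[j], 1.0 / math.hypot(radii[i], radii[j]))
         for i in range(n) for j in range(n) if i != j}
    p = [[alpha[i] * c for c in e0[i]] for i in range(n)]     # initial guess: alpha * E0
    for iteration in range(1, max_iter + 1):
        new_p = []
        for i in range(n):
            field = list(e0[i])
            for j in range(n):
                if j != i:
                    for s in range(3):
                        field[s] -= sum(T[i, j][s][t] * p[j][t] for t in range(3))
            new_p.append([alpha[i] * f for f in field])
        change = max(abs(a - b) for new, old in zip(new_p, p) for a, b in zip(new, old))
        p = new_p
        if change < tol:                                       # induction tolerance test
            return p, iteration
    raise RuntimeError("induced dipoles did not converge")

def induction_energy(p, e0):
    """U_ind = -1/2 sum_i p_i . E0_i (valid for converged dipoles)."""
    return -0.5 * sum(sum(a * b for a, b in zip(pi, ei)) for pi, ei in zip(p, e0))

# Hypothetical three-site system: positions (Å), polarizabilities (Å^3), radii (Å), fields.
pos = [[0.0, 0.0, 0.0], [2.5, 0.0, 0.0], [0.0, 2.8, 0.3]]
alpha = [1.45, 0.43, 0.43]
radii = [0.81, 0.71, 0.71]
e0 = [[0.02, 0.0, 0.01], [0.0, -0.03, 0.0], [0.01, 0.01, -0.02]]
for tol in (1e-3, 1e-6, 1e-9):
    dipoles, n_iter = induce(pos, alpha, radii, e0, tol)
    print(tol, n_iter, induction_energy(dipoles, e0))
```

Looser tolerances stop after fewer iterations but leave a residual in the dipoles, and hence in the forces; in an MD run such residuals are not random from step to step, which is one plausible reading of why loose tolerances show up as a systematic energy drift.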

FIG. 2.

Total energy vs simulation time for the 512-water box simulation. All simulations were performed with a PME accuracy of 5 × 10⁻⁵ but with different induction iteration tolerances (1 × 10⁻⁴ to 1 × 10⁻⁶).

Next, we studied the influence of the PME setup on energy conservation. For comparison with the high-accuracy PME run above, we performed an NVE run with the same induction tolerance of 10⁻⁶ but a lower-accuracy PME setting: the real-space cutoff was 8 Å and the interpolation order was 6, with all other settings unchanged, giving a lower PME accuracy of ∼5 × 10⁻⁴. The total energy of this run is higher because fewer van der Waals pairs fall within the shorter cutoff. As shown in Fig. 3, the total energy also drifts noticeably, in this case upward over time. In summary, our experiments show that sufficiently high accuracy in both the PME calculation and the induction iteration is necessary for a polarizable dipole model with permanent dipoles to conserve energy. Furthermore, we expect that even higher accuracy will be necessary if higher moments, such as quadrupoles, are used in future pGM developments, because higher derivatives are then needed from the PME calculation.

FIG. 3.

Total energy vs simulation time for the 512-water box simulation. Both simulations were performed with an induction iteration tolerance of 1 × 10⁻⁶ but with different PME accuracies: low (5 × 10⁻⁴) and high (5 × 10⁻⁵).

IV. CONCLUSION

In this work, we proposed an efficient formulation of the polarizable Gaussian Multipole (pGM) model for biomolecular simulations. First, a local frame based on covalent basis vectors (CBV)/tensors was used to set up the permanent (covalent) multipoles on all atoms. The CBV frame naturally accommodates intrinsic molecular flexibility during simulations and allows the electrostatic forces to be expressed efficiently in closed form. Based on the CBV local frame, we then derived the analytical force expressions for the pGM model. Finally, we outlined how to interface pGM electrostatics seamlessly with the PME implementation for molecular simulations under periodic boundary conditions.

To validate the analytical force expressions for the pGM model defined on the CBV frame, we compared the analytical atomic forces with finite-difference forces for a water dimer. The analytical and numerical forces agree very well, with differences comparable to the finite-difference uncertainty. In addition, the net analytical and numerical forces on the water dimer are very close to zero, with a residual consistent with the induction iteration tolerance.

Next, we analyzed the PME setups necessary for accurate pGM energy and force calculations. The pGM model requires higher accuracy than classical point-charge models because of the presence of dipoles: the electrostatic field generated by dipoles is more difficult to interpolate in PME than that of charges, with errors about twice as large. In addition, accurate pGM forces require the second derivatives of the potential, which are the most difficult quantities to compute accurately in PME.

To validate the overall electrostatic framework of the reformulated pGM model, we conducted an NVE simulation of a small water box of 512 water molecules. Our results show that sufficient accuracy in both PME and the induced dipoles is essential for energy conservation. With a PME accuracy of 5 × 10⁻⁵ and an induced-dipole tolerance of 1 × 10⁻⁶, the tested NVE water simulation with the pGM model conserved energy reasonably well. Future development will be needed to improve the efficiency of both the PME setup and the induction iteration so as to realize the full potential of the pGM model.

DATA AVAILABILITY

The algorithms developed in this study and the validation data are deposited in the Amber repository and will be made publicly available in the next Amber/AmberTools release at http://ambermd.org/.

ACKNOWLEDGMENTS

This work was supported by NIH (Grant Nos. GM079383, GM093040, and GM130367).

APPENDIX: BOYS SERIES AND DERIVATION OF ANALYTICAL PGM FORCES

1. Tensor format of the Boys series

In this study, Boys functions up to rank 3 were used; they are listed below for reference. Higher-rank tensors and Boys functions can be found in the literature.44,49

Boys functions up to rank 3 are

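$$B_0(x) = \frac{\operatorname{erf}(x)}{x},$$
$$B_1(x) = \frac{\operatorname{erf}(x)}{x^3} - \frac{2}{\sqrt{\pi}}\,e^{-x^2}\,\frac{1}{x^2},$$
$$B_2(x) = \frac{3\operatorname{erf}(x)}{x^5} - \frac{2}{\sqrt{\pi}}\,e^{-x^2}\,\frac{1}{x^4}\left(3 + 2x^2\right),$$
$$B_3(x) = \frac{15\operatorname{erf}(x)}{x^7} - \frac{2}{\sqrt{\pi}}\,e^{-x^2}\,\frac{1}{x^6}\left(15 + 10x^2 + 4x^4\right).$$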

The associated tensors are

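$$\nabla\frac{\operatorname{erf}(\beta R)}{R} = -\mathbf{R}\,\beta^3 B_1(\beta R),$$
$$\frac{\partial^2}{\partial x_p\,\partial x_q}\frac{\operatorname{erf}(\beta R)}{R} = R_p R_q\,\beta^5 B_2(\beta R) - \delta_{pq}\,\beta^3 B_1(\beta R),$$
$$\frac{\partial^3}{\partial x_p\,\partial x_q\,\partial x_r}\frac{\operatorname{erf}(\beta R)}{R} = \left(\delta_{pq}R_r + \delta_{pr}R_q + \delta_{rq}R_p\right)\beta^5 B_2(\beta R) - R_p R_q R_r\,\beta^7 B_3(\beta R).$$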
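The closed forms above can be checked numerically. The minimal sketch below implements B0 to B3, builds the first two tensors from them, and compares each against central finite differences of the tensor one rank lower; it also checks the identity dB_n/dx = −x B_{n+1}(x), which follows from the closed forms and links successive tensors. The separation vector, β, and step size are arbitrary illustrative values. Note that these closed forms lose precision through cancellation as x → 0, where a series expansion would be needed.

```python
import math

TWO_OVER_SQRT_PI = 2.0 / math.sqrt(math.pi)

def boys(x):
    """(B0, B1, B2, B3) at x > 0 from the closed forms (inaccurate as x -> 0)."""
    e = math.erf(x)
    g = TWO_OVER_SQRT_PI * math.exp(-x * x)
    b0 = e / x
    b1 = e / x**3 - g / x**2
    b2 = 3.0 * e / x**5 - g * (3.0 + 2.0 * x**2) / x**4
    b3 = 15.0 * e / x**7 - g * (15.0 + 10.0 * x**2 + 4.0 * x**4) / x**6
    return b0, b1, b2, b3

def potential(R, beta):
    """erf(beta R)/R = beta * B0(beta R) for a separation vector R."""
    return beta * boys(beta * math.hypot(*R))[0]

def gradient(R, beta):
    """First tensor: -R beta^3 B1(beta R)."""
    b1 = boys(beta * math.hypot(*R))[1]
    return [-c * beta**3 * b1 for c in R]

def hessian(R, beta):
    """Second tensor: R_p R_q beta^5 B2(beta R) - delta_pq beta^3 B1(beta R)."""
    _, b1, b2, _ = boys(beta * math.hypot(*R))
    return [[R[p] * R[q] * beta**5 * b2 - (beta**3 * b1 if p == q else 0.0)
             for q in range(3)] for p in range(3)]

# Central-difference checks at an arbitrary separation (illustrative values).
R, beta, h = [0.8, -0.5, 1.1], 0.9, 1e-5
worst = 0.0
for p in range(3):
    Rp, Rm = list(R), list(R)
    Rp[p] += h
    Rm[p] -= h
    fd_grad = (potential(Rp, beta) - potential(Rm, beta)) / (2 * h)
    worst = max(worst, abs(fd_grad - gradient(R, beta)[p]))
    for q in range(3):
        fd_hess = (gradient(Rp, beta)[q] - gradient(Rm, beta)[q]) / (2 * h)
        worst = max(worst, abs(fd_hess - hessian(R, beta)[p][q]))
x = 1.3
fd_b2 = (boys(x + h)[2] - boys(x - h)[2]) / (2 * h)
worst = max(worst, abs(fd_b2 + x * boys(x)[3]))      # dB2/dx = -x B3
print(f"largest deviation: {worst:.1e}")
```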

2. Force derivation

We proceed in two steps: covalent–covalent interactions and induced interactions, as shown in Sec. II C. First, we consider interaction energies due to covalent multipoles interacting with covalent multipoles.

The system can be split into two groups of atoms: bonded atoms and nonbonded atoms. The bonded group contains the atom under consideration and the atoms bonded to it; the nonbonded group contains the rest. The bonded group is further split into two subgroups: the atom currently under consideration, termed the bonded-moving atom below, and the other atoms of the bonded group, termed bonded-non-moving atoms. This yields three groups of atoms in total.
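The three-way partition is straightforward to compute from a bond list. A minimal sketch, assuming a plain list of covalently bonded index pairs (a hypothetical input format) and taking "bonded" to mean directly bonded, as stated above:

```python
def classify_atoms(i, n_atoms, bonds):
    """Partition atom indices into the three groups used in the force derivation.

    bonds: list of (a, b) covalent-bond index pairs (hypothetical input format).
    Returns (bonded_moving, bonded_non_moving, nonbonded) as sets."""
    bonded_moving = {i}
    bonded_non_moving = {b for a, b in bonds if a == i} | {a for a, b in bonds if b == i}
    nonbonded = set(range(n_atoms)) - bonded_moving - bonded_non_moving
    return bonded_moving, bonded_non_moving, nonbonded

# Water dimer in the atom order of Table II: 0 O, 1 H1, 2 H2 (water-1); 3 O, 4 H1, 5 H2 (water-2).
BONDS = [(0, 1), (0, 2), (3, 4), (3, 5)]
moving, non_moving, rest = classify_atoms(1, 6, BONDS)
print(sorted(moving), sorted(non_moving), sorted(rest))   # prints [1] [0] [2, 3, 4, 5]
```

For water-1 H1, only the O atom is bonded-non-moving; water-1 H2 falls in the nonbonded group because it is not directly bonded to H1.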

Thus, we can rewrite the covalent–covalent interaction energy as the following four parts.

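(1) Nonbonded atoms interacting with bonded-non-moving atoms:

$$U = \sum_{k}^{\text{bonded non-moving}}\ \sum_{j}^{\text{nonbonded}}(q_k + \boldsymbol{\mu}_k\cdot\nabla_k)(q_j + \boldsymbol{\mu}_j\cdot\nabla_j)\frac{\operatorname{erf}(\beta_{kj}R_{kj})}{R_{kj}}.$$

(2) Nonbonded atoms interacting with the bonded-moving atom i:

$$U = \sum_{j}^{\text{nonbonded}}(q_i + \boldsymbol{\mu}_i\cdot\nabla_i)(q_j + \boldsymbol{\mu}_j\cdot\nabla_j)\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}}.$$

(3) Bonded-non-moving atoms interacting with bonded-non-moving atoms:

$$U = \frac{1}{2}\sum_{k}^{\text{bonded non-moving}}\ \sum_{j\neq k}^{\text{bonded non-moving}}(q_k + \boldsymbol{\mu}_k\cdot\nabla_k)(q_j + \boldsymbol{\mu}_j\cdot\nabla_j)\frac{\operatorname{erf}(\beta_{kj}R_{kj})}{R_{kj}}.$$

(4) Bonded-non-moving atoms interacting with the bonded-moving atom i:

$$U = \sum_{j}^{\text{bonded non-moving}}(q_i + \boldsymbol{\mu}_i\cdot\nabla_i)(q_j + \boldsymbol{\mu}_j\cdot\nabla_j)\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}}.$$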

In principle, there is a fifth part of the interaction energy, nonbonded atoms interacting with nonbonded atoms. However, this part does not change when atom i is displaced, so it does not contribute to the force, and its expression is omitted here. Atom i's self-interaction is also excluded, as discussed in the main text.

Next, the force on the bonded-moving atom i is derived as the negative gradient of the above energy terms. When computing the gradient, note that nothing varies on the nonbonded atoms, only the dipole directions vary on the bonded-non-moving atoms, and both the dipole direction and the position of the bonded-moving atom vary. The above four energy parts thus lead to the following four force components, respectively:

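$$(\mathbf{F}_i)_1 = -\sum_{k}^{\text{bonded non-moving}}\ \sum_{j}^{\text{nonbonded}}\nabla_i(\boldsymbol{\mu}_k)\cdot\nabla_k(q_j + \boldsymbol{\mu}_j\cdot\nabla_j)\frac{\operatorname{erf}(\beta_{kj}R_{kj})}{R_{kj}},$$
$$(\mathbf{F}_i)_2 = -\sum_{j}^{\text{nonbonded}}\nabla_i(\boldsymbol{\mu}_i)\cdot\nabla_i(q_j + \boldsymbol{\mu}_j\cdot\nabla_j)\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}} - \sum_{j}^{\text{nonbonded}}(q_i + \boldsymbol{\mu}_i\cdot\nabla_i)(q_j + \boldsymbol{\mu}_j\cdot\nabla_j)\nabla_i\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}},$$
$$(\mathbf{F}_i)_3 = -\sum_{k}^{\text{bonded non-moving}}\ \sum_{j\neq k}^{\text{bonded non-moving}}\nabla_i(\boldsymbol{\mu}_k)\cdot\nabla_k(q_j + \boldsymbol{\mu}_j\cdot\nabla_j)\frac{\operatorname{erf}(\beta_{kj}R_{kj})}{R_{kj}},$$
$$(\mathbf{F}_i)_4 = -\sum_{j}^{\text{bonded non-moving}}\nabla_i(\boldsymbol{\mu}_i)\cdot\nabla_i(q_j + \boldsymbol{\mu}_j\cdot\nabla_j)\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}} - \sum_{j}^{\text{bonded non-moving}}(q_i + \boldsymbol{\mu}_i\cdot\nabla_i)\,\nabla_i(\boldsymbol{\mu}_j)\cdot\nabla_j\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}} - \sum_{j}^{\text{bonded non-moving}}(q_i + \boldsymbol{\mu}_i\cdot\nabla_i)(q_j + \boldsymbol{\mu}_j\cdot\nabla_j)\nabla_i\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}}.$$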

Summing up all four components, the final force expression is

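$$\mathbf{F}_i = -\sum_{k}^{n}\sum_{j\neq k}^{N}\nabla_i(\boldsymbol{\mu}_k)\cdot\nabla_k(q_j + \boldsymbol{\mu}_j\cdot\nabla_j)\frac{\operatorname{erf}(\beta_{kj}R_{kj})}{R_{kj}} - \sum_{j\neq i}^{N}(q_i + \boldsymbol{\mu}_i\cdot\nabla_i)(q_j + \boldsymbol{\mu}_j\cdot\nabla_j)\nabla_i\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}}.$$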

Here, n and N follow the notation of Sec. II C and denote the number of atoms in the bonded group and in the whole system, respectively.

Second, we consider the energies due to the induced dipoles. As stated before, the induced energy contains three parts: induced dipoles interacting with covalent multipoles, induced dipoles interacting with induced dipoles, and the induced-dipole self-energy $\tfrac{1}{2}\mathbf{p}\cdot\mathbf{E}$. From Sec. II C, the total induced energy is

$$-\frac{1}{2}\sum_{i}^{N}\mathbf{p}_i\cdot\mathbf{E}_i^0,$$

where $\mathbf{E}_i^0$ is the electric field at atom i generated by the covalent multipoles only.

The induced dipoles are determined by the total electric field,

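$$\mathbf{p}_i = \alpha_i\mathbf{E}_i = \alpha_i\Big(\mathbf{E}_i^0 - \sum_{j\neq i}^{N}\mathbf{T}_{ij}\cdot\mathbf{p}_j\Big),$$
$$\mathbf{T}_{ij} = \nabla_i\nabla_j\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}}.$$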

Writing the above expressions in component form with the Einstein summation convention, we obtain

$$p_i^s = \alpha_i\left(E_i^{0,s} - T_{ij}^{st}p_j^t\right),$$

where s and t are component indices. Rearrangement leads to

$$\left(\frac{1}{\alpha_j}\delta_{ij}\delta_{st} + T_{ij}^{st}\right)p_j^t = A_{ij}^{st}p_j^t = E_i^{0,s}.$$

If we assume that $A_{ij}^{st}$ is invertible and denote its inverse by $(A^{-1})_{ij}^{st}$, we have

$$p_i^s = (A^{-1})_{ij}^{st}E_j^{0,t},$$

where $A^{-1}$ is a 3N × 3N matrix that is symmetric in both the atom indices and the component indices,

$$(A^{-1})_{ij}^{st} = (A^{-1})_{ij}^{ts} = (A^{-1})_{ji}^{st}.$$

The gradient of $A^{-1}$ is

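$$\frac{\partial (A^{-1})_{ij}^{st}}{\partial x_k^w} = -(A^{-1})_{ii'}^{ss'}\frac{\partial A_{i'j'}^{s't'}}{\partial x_k^w}(A^{-1})_{j'j}^{t't} = -(A^{-1})_{ii'}^{ss'}\frac{\partial T_{i'j'}^{s't'}}{\partial x_k^w}(A^{-1})_{j'j}^{t't},$$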
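The identity above follows from differentiating A⁻¹A = I, which gives ∂A⁻¹ = −A⁻¹(∂A)A⁻¹; only T contributes to ∂A because the 1/α diagonal is constant. A quick numerical check on an illustrative 2 × 2 symmetric stand-in for the 3N × 3N matrix A, with arbitrary values rather than a real system:

```python
def inv2(m):
    """Inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

# Illustrative symmetric A(x) = diag(1/alpha) + T(x): only the off-diagonal
# coupling depends on the coordinate x, so dA/dx = dT/dx as in the text.
def a_of(x):
    c = 0.3 / (1.0 + x * x)
    return [[1.0 / 1.45, c], [c, 1.0 / 0.43]]

def da_dx(x):
    dc = -0.6 * x / (1.0 + x * x) ** 2
    return [[0.0, dc], [dc, 0.0]]

x, h = 0.7, 1e-6
a_inv = inv2(a_of(x))
analytic = [[-v for v in row] for row in matmul2(matmul2(a_inv, da_dx(x)), a_inv)]
plus, minus = inv2(a_of(x + h)), inv2(a_of(x - h))
numeric = [[(plus[i][j] - minus[i][j]) / (2 * h) for j in range(2)] for i in range(2)]
print(max(abs(analytic[i][j] - numeric[i][j]) for i in range(2) for j in range(2)))
```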

where w indexes the Cartesian coordinates (x1, x2, x3). Because $T_{i'j'}$ depends only on the positions of atoms i′ and j′, its derivative with respect to $x_k^w$ is nonzero only if i′ = k or j′ = k. We have

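$$\frac{\partial T_{i'j'}^{st}}{\partial x_k^w} = \delta_{i'k}\frac{\partial T_{kj'}^{st}}{\partial x_k^w} + \delta_{j'k}\frac{\partial T_{i'k}^{st}}{\partial x_k^w}.$$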

Based on the above relations, the force expressed as the negative gradient of the induced energy is

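$$\begin{aligned}
F_i^w &= -\frac{\partial}{\partial x_i^w}\left(-\frac{1}{2}\mathbf{p}_j\cdot\mathbf{E}_j^0\right) = \frac{\partial}{\partial x_i^w}\left(\frac{1}{2}(A^{-1})_{jk}^{st}E_j^{0,s}E_k^{0,t}\right)\\
&= \frac{1}{2}\frac{\partial (A^{-1})_{jk}^{st}}{\partial x_i^w}E_j^{0,s}E_k^{0,t} + (A^{-1})_{jk}^{st}E_j^{0,s}\frac{\partial E_k^{0,t}}{\partial x_i^w}\\
&= -\frac{1}{2}(A^{-1})_{jj'}^{ss'}\frac{\partial T_{j'k'}^{s't'}}{\partial x_i^w}(A^{-1})_{k'k}^{t't}E_j^{0,s}E_k^{0,t} + p_k^t\frac{\partial E_k^{0,t}}{\partial x_i^w}\\
&= -\frac{1}{2}p_j^s\frac{\partial T_{jk}^{st}}{\partial x_i^w}p_k^t + p_k^t\frac{\partial E_k^{0,t}}{\partial x_i^w}\\
&= -p_i^s\frac{\partial T_{ik}^{st}}{\partial x_i^w}p_k^t + p_k^t\frac{\partial E_k^{0,t}}{\partial x_i^w}.
\end{aligned}$$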

Rewriting this result in vector/tensor form, we have

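$$\mathbf{F}_i = -\sum_{j\neq i}^{N}(\mathbf{p}_i\cdot\nabla_i)(\mathbf{p}_j\cdot\nabla_j)\nabla_i\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}} + \sum_{j}^{N}\nabla_i(\mathbf{E}_j^0)\cdot\mathbf{p}_j.$$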

The next step is to evaluate $\nabla_i(\mathbf{E}_j^0)$. Following a strategy similar to that used for the covalent–covalent interactions, we split the system into two groups: non-moving atoms and the moving atom (i.e., atom i).

When computing the derivative of the field at a non-moving atom j, note that the nonbonded non-moving atoms are not influenced by a virtual displacement of atom i, so only the bonded non-moving atoms are considered below:

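$$\begin{aligned}
\nabla_i\mathbf{E}_j^0 &= \nabla_i\Big(-\sum_{k\neq j}^{\text{bonded non-moving}}\nabla_j(q_k + \boldsymbol{\mu}_k\cdot\nabla_k)\frac{\operatorname{erf}(\beta_{kj}R_{kj})}{R_{kj}} - \nabla_j(q_i + \boldsymbol{\mu}_i\cdot\nabla_i)\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}}\Big)\\
&= -\sum_{k\neq j}^{\text{bonded non-moving}}\nabla_i(\boldsymbol{\mu}_k)\cdot\nabla_k\nabla_j\frac{\operatorname{erf}(\beta_{kj}R_{kj})}{R_{kj}} - \nabla_i(\boldsymbol{\mu}_i)\cdot\nabla_i\nabla_j\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}} - (q_i + \boldsymbol{\mu}_i\cdot\nabla_i)\nabla_i\nabla_j\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}}\\
&= -\sum_{k\neq j}^{n}\nabla_i(\boldsymbol{\mu}_k)\cdot\nabla_k\nabla_j\frac{\operatorname{erf}(\beta_{kj}R_{kj})}{R_{kj}} - (q_i + \boldsymbol{\mu}_i\cdot\nabla_i)\nabla_i\nabla_j\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}}.
\end{aligned}$$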

Here, n represents the number of atoms in the bonded group, including atom i.

Next, we compute the derivative of the field at the moving atom, i.e., atom i, as follows:

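$$\begin{aligned}
\nabla_i\mathbf{E}_i^0 &= \nabla_i\Big(-\sum_{j\neq i}^{n}\nabla_i(q_j + \boldsymbol{\mu}_j\cdot\nabla_j)\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}} - \sum_{j}^{\text{nonbonded}}\nabla_i(q_j + \boldsymbol{\mu}_j\cdot\nabla_j)\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}}\Big)\\
&= -\sum_{j\neq i}^{n}\nabla_i(\boldsymbol{\mu}_j)\cdot\nabla_j\nabla_i\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}} - \sum_{j\neq i}^{n}(q_j + \boldsymbol{\mu}_j\cdot\nabla_j)\nabla_i\nabla_i\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}} - \sum_{j}^{\text{nonbonded}}(q_j + \boldsymbol{\mu}_j\cdot\nabla_j)\nabla_i\nabla_i\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}}\\
&= -\sum_{j\neq i}^{n}\nabla_i(\boldsymbol{\mu}_j)\cdot\nabla_j\nabla_i\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}} - \sum_{j\neq i}^{N}(q_j + \boldsymbol{\mu}_j\cdot\nabla_j)\nabla_i\nabla_i\frac{\operatorname{erf}(\beta_{ij}R_{ij})}{R_{ij}}.
\end{aligned}$$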

Here, N represents the number of all atoms in the system.

Note: This paper is part of the JCP Special Topic on Classical Molecular Dynamics (MD) Simulations: Codes, Algorithms, Force fields, and Applications.

REFERENCES



Articles from The Journal of Chemical Physics are provided here courtesy of American Institute of Physics
