The Journal of Chemical Physics. 2020 Aug 10;153(6):064103. doi: 10.1063/5.0016376

Accurate description of molecular dipole surface with charge flux implemented for molecular mechanics

Xudong Yang 1, Chengwen Liu 1, Brandon D Walker 1, Pengyu Ren 1,a)
PMCID: PMC7433759  PMID: 35287459

Abstract

The molecular dipole moment is strongly coupled to molecular geometry among different phases, conformational states, intermolecular interaction energy, and vibrational spectroscopy. Our previous inclusion of geometry-dependent charge flux (CF) into the atomic multipole-based polarizable AMOEBA+ force field has shown significant improvement of water properties from gaseous to condensed phases [C. Liu et al., J. Phys. Chem. Lett. 11(2), 419–426 (2020)]. In this work, the parameterization of the CF model for a broad range of organic and biomolecular fragments is presented. Atom types are automatically assigned by matching predefined SMARTS patterns. Compared with the current AMOEBA+ model without the CF component, the AMOEBA+ (CF) model improves the description of molecular dipole moments for the molecules we studied over both equilibrium and distorted geometries. For the equilibrium-geometry structures, AMOEBA+ (CF) reduces the mean square error (MSE) from 6.806 × 10⁻¹ D² (without CF) to 4.249 × 10⁻⁴ D². For non-equilibrium structures, the MSE is reduced from 5.766 × 10⁻¹ D² (without CF) to 2.237 × 10⁻³ D². Finally, the transferability of the CF model and parameters was validated on two sets of molecules: one includes molecules in the training set but with different geometries, and the other involves new molecules outside of the training set. A similar improvement in dipole surfaces was obtained on the validation sets. The CF algorithms and parameters derived in this work are general and can be implemented into any existing molecular mechanical force field.

I. INTRODUCTION

Point charge distributions are commonly used to represent the electrostatic potential in classical force fields (FFs). It is well known that charge distributions are affected both by the chemical environment, through polarization, and by local geometry changes.1 The response of the electrostatic potential to environmental changes is explicitly treated in popular “polarizable” FFs using methods such as the Drude oscillator,2 atomic induced dipoles,3 and fluctuating charges.4–6 The charge redistribution due to local geometry changes is ignored by most classical FFs even though it can be significant.1,7,8 One widely recognized example is that the bond angle of water cannot be described consistently in different phases by flexible models because of incorrect dipole derivatives.8–10 This requires an explicit treatment of the geometry dependent charge flux (GDCF) when the water geometry changes.

The benefit of including GDCF in classical models has been seen in the spectroscopically determined force field (SDFF),11–13 TTM-family models,14,15 and others.16,17 Recently, the GDCF model was implemented into the AMOEBA+ water potential.18 Our implementation followed the model proposed by Dinur and Hagler.7 The parameters of the original AMOEBA+ water model19 were refined after adding the GDCF. The results show that CF not only improves the bond angle geometry of water over different phases (gas, liquid, and ice) but also improves the cluster interaction energies, liquid thermodynamic properties, and liquid infrared spectra.18 The ForceBalance automation tool was employed in the parameter optimization of the water model.20 The CF parameters were slightly tuned, targeting both the gas phase quantum mechanical (QM) data and experimental observations of the liquid phase.

Atom types are commonly used in FF parameterization to reduce the number of parameters to be determined.21,22 Atom types usually depend on the atomic number and the chemical bonding environment they are involved in. Although electronic structure properties can be used for comparing similarities and thus determining atom types,23,24 most FFs still rely on chemical intuition for atom typing.22 SMILES strings are convenient descriptors of atoms and their chemical environment in molecules, and they can be used to assign atom types in FFs with SMARTS pattern searching.25 In this work, we utilized SMILES strings to define common atom types that cover a wide range of organic molecules to directly assign CF parameters for the molecules in our datasets.

A set of organic molecules and biomolecular fragments with emphasis on chemical diversity was constructed first. The molecular dipole moments were calculated from ab initio methods for minimum energy and distorted structures (with bonds and angles both increased and decreased). SMILES-based atom types were defined and used to reduce the number of CF parameters, thus avoiding overfitting in later parameterization. By using chemical environment-based atom types, the expectation is that the derived CF parameters can be transferred to other molecules outside of the training set to describe their molecular dipole surfaces. Finally, the CF parameters were validated on new geometries and molecules.

II. METHODOLOGIES

A. Theoretical background

In this section, the implementation of CF into the AMOEBA+ potential is briefly described. In the AMOEBA+ potential, short-range charge penetration26 and charge transfer effects are incorporated, together with the improved description of polarization27 and van der Waals non-bonded interactions.19 The CF has been incorporated into the AMOEBA+ potential, and a water model was parameterized.18 The general CF implementation is described in the following paragraphs.

Inherited from the AMOEBA model, AMOEBA+ uses atomic multipole moments up to quadrupole to represent the complex electrostatic potential. In the AMOEBA+ model, the dipole moment of a molecule is composed of two parts: (1) permanent moments, including contributions from permanent dipoles and monopoles and (2) induced-dipole due to electronic response to the external field. However, the dipole moment surface from the AMOEBA+ potential is still deficient because the local geometrical dependence of multipoles is neglected. The total molecular dipole moment is partitioned into contributions from monopoles and dipoles (both permanent and induced) in the following equation:

$$ \boldsymbol{\mu}_{\mathrm{tot}} = \sum_i q_i \mathbf{r}_i + \sum_i \boldsymbol{\mu}_i, \tag{1} $$

where r_i is the position vector of atom i relative to the coordinate origin and q_i is the permanent atomic charge on atom i. μ_i is the sum of the permanent and induced dipole moments on atom i. An accurate description of the molecular dipole surface requires accurate dipole derivatives with respect to atomic coordinates. Taking the derivative of Eq. (1), we obtain

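$$ \frac{\partial \boldsymbol{\mu}_{\mathrm{tot}}}{\partial x_i} = \sum_i q_i \frac{\partial \mathbf{r}_i}{\partial x_i} + \sum_i \mathbf{r}_i \frac{\partial q_i}{\partial x_i} + \sum_i \frac{\partial \boldsymbol{\mu}_i}{\partial x_i}. \tag{2} $$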

The three terms on the right-hand side represent the gradients of geometry, charge, and dipole, respectively. The third term can be further divided into contributions from permanent and induced dipole moments. The second term captures the dependence of atomic charge on geometry; it vanishes when the atomic charges are fixed, yet it can be a significant contribution, as we will demonstrate by comparing AMOEBA+ dipole surfaces with and without GDCF in Sec. III.
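As a minimal numerical sketch of the charge term in Eq. (2), consider a hypothetical diatomic A–B on the x axis whose charges follow a linear bond flux, dq = j_b (b − b_0). The charge q0, the intensity j_b, and the reference length b0 below are illustrative values, not fitted parameters, and atomic dipoles are left out for simplicity. With fixed charges, the dipole derivative with respect to the bond length reduces to the geometry term; switching on the flux adds a charge term equal to −j_b b_0 at b = b_0.

```python
# Sketch only: a hypothetical diatomic A-B on the x axis with illustrative parameters.

def dipole_x(b, q0=0.4, j_b=0.0, b0=1.0):
    """x component of the dipole (e*Angstrom) with atom A at x=0 and atom B at x=b."""
    dq = j_b * (b - b0)             # linear bond charge flux
    q_a, q_b = q0 + dq, -(q0 + dq)  # charge moves between A and B; the total is conserved
    return q_a * 0.0 + q_b * b      # Eq. (1) without atomic dipoles


def central_difference(f, x, h=1e-4):
    """Two-sided finite-difference derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


slope_fixed = central_difference(lambda b: dipole_x(b, j_b=0.0), 1.0)
slope_flux = central_difference(lambda b: dipole_x(b, j_b=0.2), 1.0)

print(f"d(mu)/db with fixed charges: {slope_fixed:.4f} e")
print(f"d(mu)/db with charge flux:   {slope_flux:.4f} e")
```

The difference between the two slopes is the contribution that a fixed-charge model cannot represent.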

By following the approach proposed by Dinur and Hagler,7 the GDCF contribution is expressed as a function of (1) the direct bond length, (2) the angle, and (3) the proximal bond length. The approximation of locality for CF was adopted so that only 1–2 and 1–3 effects (atoms at most two bonds apart) are considered. Equations (3) and (4) describe the change in the charges of two directly bonded atoms A and B (i.e., the amount of charge flux) as a result of bond length variation. Here, b denotes the bond length, the subscript “0” marks the reference equilibrium value, and j_b is the parameter that reflects the intensity of CF related to the AB bond,

$$ dq_A = j_b (b - b_0), \tag{3} $$
$$ dq_B = -dq_A. \tag{4} $$

For an angle made of atoms A–B–C, the CF from an angle and related proximal bonds is expressed in Eqs. (5)–(7). “a” denotes the angle value, and “b” represents the length of proximal bonds (AB and BC) in the angle under discussion,

$$ dq_A = j_{a,A}(a - a_0) + j'_{b,BC}(b_{BC} - b_{0,BC}), \tag{5} $$
$$ dq_C = j_{a,C}(a - a_0) + j'_{b,AB}(b_{AB} - b_{0,AB}), \tag{6} $$
$$ dq_B = -dq_A - dq_C. \tag{7} $$
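Equations (3)–(7) translate directly into a few lines of code. The sketch below uses a water-like A–B–C angle with made-up intensity parameters and reference geometry (all names and values are illustrative, not the fitted AMOEBA+ parameters) and shows that the flux only redistributes charge, so the three increments sum to zero.

```python
# Sketch only: parameter names and values are hypothetical.

def bond_flux(j_b, b, b0):
    """Eqs. (3)-(4): charge increments (dq_A, dq_B) for a directly bonded pair."""
    dq_a = j_b * (b - b0)
    return dq_a, -dq_a


def angle_flux(a, b_ab, b_bc, p):
    """Eqs. (5)-(7): increments (dq_A, dq_B, dq_C) for an angle A-B-C.

    p holds angle intensities j_aA, j_aC (e/degree), proximal-bond
    intensities jp_BC, jp_AB (e/Angstrom), and the reference values.
    """
    dq_a = p["j_aA"] * (a - p["a0"]) + p["jp_BC"] * (b_bc - p["b0_BC"])
    dq_c = p["j_aC"] * (a - p["a0"]) + p["jp_AB"] * (b_ab - p["b0_AB"])
    dq_b = -(dq_a + dq_c)  # the central atom balances both side atoms
    return dq_a, dq_b, dq_c


params = {"j_aA": 0.002, "j_aC": 0.002, "jp_BC": -0.03, "jp_AB": -0.03,
          "a0": 104.5, "b0_AB": 0.96, "b0_BC": 0.96}

at_equilibrium = angle_flux(104.5, 0.96, 0.96, params)
distorted = angle_flux(108.5, 0.96, 1.06, params)  # open the angle, stretch B-C
stretched_bond = bond_flux(0.1, 1.06, 0.96)

print("equilibrium:", at_equilibrium)
print("distorted:  ", distorted, "sum =", sum(distorted))
```

Stretching only the B–C bond changes the charge on atom A but not on atom C through the proximal-bond term, which is the 1–3 coupling described above.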

An example of ammonia is given in Fig. 1 to illustrate this scheme. The rules for determining the direction of CF (the sign of dq), and whether a CF contribution is included at all (e.g., it can vanish by symmetry), as proposed by Dinur and Hagler,7 are described below. The CF will be from atom A to B (dq is added to atom A, and –dq is added to atom B) if:

  • (1) Atom B has a greater atomic number. For example, the CF direction in the C–H bond of methane is from H to C [Fig. 2(a)].

  • (2) Atom B has more connections, if (1) is not applicable (atoms A and B are the same element). For instance, for the H3C–C bond in methylbenzene, the C on the aromatic ring has only three connections and the C of the methyl group has four, so the direction of CF is from the aromatic ring C to the methyl C [Fig. 2(b)].

  • (3) Atom B has more H atoms bonded to it, if neither (1) nor (2) is applicable (atoms A and B have the same atomic number and number of connections). For example, for the C–C bond in ethanol (CH3CH2OH), CF is from –CH2 to –CH3 [Fig. 2(c)].
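The three rules above can be read as a cascade of comparisons. The following sketch encodes them with a hypothetical `Atom` record (the class, field names, and labels are ours, for illustration only). Returning `None` when all three rules tie is our reading of the symmetric case, in which no bond CF direction can be assigned.

```python
from typing import NamedTuple, Optional, Tuple


class Atom(NamedTuple):
    label: str
    atomic_number: int
    connections: int   # number of bonded neighbors
    hydrogens: int     # number of bonded H atoms


def flux_direction(a: Atom, b: Atom) -> Optional[Tuple[str, str]]:
    """Return (donor, acceptor) labels, meaning CF flows from donor to acceptor.

    Rules are tried in order; None means all three tie (assumed: no bond CF).
    """
    rules = (lambda x: x.atomic_number,
             lambda x: x.connections,
             lambda x: x.hydrogens)
    for key in rules:
        if key(a) != key(b):
            donor, acceptor = (a, b) if key(b) > key(a) else (b, a)
            return donor.label, acceptor.label
    return None


methane = flux_direction(Atom("H", 1, 1, 0), Atom("C", 6, 4, 4))
toluene = flux_direction(Atom("C_ring", 6, 3, 0), Atom("C_methyl", 6, 4, 3))
ethanol = flux_direction(Atom("CH3", 6, 4, 3), Atom("CH2", 6, 4, 2))
ethane = flux_direction(Atom("CH3_a", 6, 4, 3), Atom("CH3_b", 6, 4, 3))

print(methane, toluene, ethanol, ethane)
```

The first three calls reproduce the methane, methylbenzene, and ethanol examples of Fig. 2; the ethane call illustrates the tie.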

FIG. 1.


An illustration of the CF parameters to be determined for ammonia, where j_b, j_a, and j'_b are the charge flux intensity parameters for the bond, angle, and proximal bond contributions, respectively.

FIG. 2.


Three examples to show the direction of CF alongside the chemical bonds: (a) methane, (b) methylbenzene, and (c) ethanol. The green arrows indicate the CF direction.

For the angle-proximal bond CF, the CF direction is from the side atoms to the central atom (dq is added to the side atom, and –dq is added to the central atom, which is shown in Fig. 3).

FIG. 3.


An example to show the direction of angle–bond CF. The atoms here can represent any elements. The black arrows denote the direction of CF, and the blue arrow shows how the angle changes.

The modified multipole for atom i is then expressed as

$$ M_i = [\,q_i + dq_i,\ \mu_x,\ \mu_y,\ \mu_z,\ \Theta_{xx},\ \ldots\,]. \tag{8} $$

Due to the inclusion of the CF, the forces on the involved atoms are modified in the electrostatic and polarization terms. The final form is provided in the following equation:

$$ F_{i,\alpha}^{\mathrm{CF}} = F_{i,\alpha} + F'_{i,\alpha}, \quad \alpha = x, y, z. \tag{9} $$

The first term on the right-hand side stands for the usual AMOEBA+ electrostatics and polarization forces with CF-updated multipoles. The second term arises from CF, which explicitly depends on the internal bonds and angles. More details of the force derivation due to CF have been provided in the previous publication.18

B. Computational details

1. Selection of training set

In Table I, we highlight all the functional groups that are included in the training-set molecules. In total, 162 molecules were chosen. Their chemical names and numbers of conformers are listed in Table S1. The set of 162 molecules has been subdivided into several classes according to the functional groups: (1) alkanes; (2) alkanes with substituent groups, including sp3 N, O, S, or halogen; (3) molecules involving sp2 atoms; (4) benzene and its derivatives; and (5) heterocyclic molecules.

TABLE I.

Functional groups covered in the current study.

Group                               Chemical structure
Alkane                              [structure image] (chain & cyclic)
Alkene                              [structure image]
Alkyne                              [structure image]
Conjugated molecules                [structure image]
Amine                               [structure image] (all R = H, C)
Ammonium                            [structure image] (all R = H, C)
Alcohol                             [structure image] (R = C)
Ether                               [structure image] (all R = C)
Di-oxyl                             [structure image] (all R = C)
Thiol                               [structure image] (R = C)
Sulfide                             [structure image] (all R = C)
Disulfide                           [structure image] (all R = C)
Halide                              R–X (R = C; X = F, Cl, Br)
Aldehyde                            [structure image] (R = C, H)
Ketone                              [structure image] (R = C)
Carboxylic acid/carboxylate         [structure image] (R = C, H)
Carboxylate ester                   [structure image] (R = C, H; R′ = C)
Carbamic acid/carbamate             [structure image]
Amide                               [structure image] (R, R′, R″ = C, H)
Carbonate/bicarbonate               [structure image] (R = C, H; R′ = C)
Anhydride                           [structure image] (R1, R2 = C)
Acyl halide                         [structure image] (R = C; X = F, Cl, Br)
Oxalic acid/oxalate                 [structure image]
α-Keto acetate/β-keto acetate       [structure image] (R = C)
Malonic acid/malonate               [structure image]
Acrylic acid/acrylate               [structure image]
Imide                               [structure image] (R, R′ = C)
Imine (aldimine/ketimine)           [structure image] (R, R′, R″ remain 1 C)
Azide                               [structure image]
Azo-compound                        [structure image]
Nitro-compound                      [structure image] (R = C)
Guanidine                           [structure image]
Amino acid                          [structure image]
Oxime                               [structure image] (R = C, H; R′ = C)
Nitrite                             [structure image] (R = C)
Nitrile                             [structure image] (R = C)
Sulfoxide                           [structure image] (R, R′ = C)
Sulfone                             [structure image] (R, R′ = C)
Sulfonic acid/sulfonate             [structure image] (R = C)
Sulfonate ester                     [structure image] (R, R′ = C)
Thiocyanate                         [structure image] (R = C)
Phosphine                           [structure image]
Phosphoric acid/phosphate           [structure image]
Pyridine/pyrazine                   [structure image]
Benzene derivatives                 [structure image]
Conjugated lactam                   [structure image]
Indole                              [structure image]
Imidazole/imidazolium               [structure image]
NA base                             [structure image]

2. Constructing ab initio molecular dipole surface

Molecular dipole surfaces of the training set molecules were first generated. The optimized equilibrium molecular geometry was modified on the chosen bonds and angles to obtain adequate conformations.

It is worthwhile to survey the bond and angle distributions in real MD simulations before generating distorted monomer structures. From AMOEBA protein, nucleic acid, and small-molecule MD trajectories, we found that the standard deviations of bond lengths and angles are about 0.05 Å and 5°, respectively (i.e., fluctuations of roughly ±0.05 Å and ±5° about the mean). We used the following scheme to generate the distorted conformers: every bond and angle was perturbed to both larger and smaller values. Specifically, starting from the equilibrium bond length, four conformers were created by changing the chosen bond by ±0.1 Å and ±0.2 Å. In addition, eight conformers were created by altering the chosen angle by ±1°, ±2°, ±4°, and ±6°. Conformers whose potential energy exceeds that of the equilibrium structure by more than 10 kcal/mol were discarded, given their rare occurrence in real simulations. In the following text, conformers with a bond change of ±0.1 Å are denoted as the “bond-small” group, and those with a bond change of ±0.2 Å as the “bond-large” group. Conformers with an angle change of ±1° or ±2° form the “angle-small” group, and those with an angle change of ±4° or ±6° form the “angle-large” group.
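The perturbation scheme can be summarized in a short sketch. The function and constant names below are hypothetical, and the relative energies passed to `keep` would come from a QM or MM calculation that is not shown here.

```python
# Sketch only: bookkeeping for the distortion scheme; no geometry is actually built.

BOND_DELTAS = (-0.2, -0.1, 0.1, 0.2)                          # Angstrom
ANGLE_DELTAS = (-6.0, -4.0, -2.0, -1.0, 1.0, 2.0, 4.0, 6.0)   # degrees
ENERGY_CUTOFF = 10.0                                          # kcal/mol above equilibrium


def label(kind, delta):
    """Group name used in the text for a given perturbation."""
    if kind == "bond":
        return "bond-small" if abs(delta) < 0.15 else "bond-large"
    return "angle-small" if abs(delta) <= 2.0 else "angle-large"


def plan(equilibrium_value, kind):
    """List of (target value, group) pairs for one chosen bond or angle."""
    deltas = BOND_DELTAS if kind == "bond" else ANGLE_DELTAS
    return [(equilibrium_value + d, label(kind, d)) for d in deltas]


def keep(relative_energy):
    """Discard conformers more than the cutoff above the equilibrium energy."""
    return relative_energy <= ENERGY_CUTOFF


bond_plan = plan(1.09, "bond")      # e.g. a C-H bond
angle_plan = plan(109.5, "angle")   # e.g. an H-C-H angle
for target, group in bond_plan + angle_plan:
    print(f"{target:8.2f}  {group}")
```

Each chosen bond therefore yields four candidate conformers and each chosen angle eight, before the energy filter is applied.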

Additionally, to explore the capability of the CF model to describe the dipole gradient [as shown in Eq. (2)], additional conformers were generated by moving each heavy atom in each molecule along each coordinate axis (x, y, and z) by 0.01 Å. Likewise, these conformers must lie within 10 kcal/mol of the equilibrium energy, and the number of atoms was limited to 18 because dipole gradients in large molecules are sensitive to minor atom displacements. The numerical gradients of the dipole moment with respect to the coordinates were calculated directly using a two-sided finite difference approach. In total, 5836 conformers for dipole moment fitting and 2211 conformers for gradient fitting were created from the 162 molecules in the training set.
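A two-sided finite difference of the dipole with respect to one Cartesian coordinate can be sketched as follows. The dipole function here is a plain fixed point-charge model with made-up charges and coordinates, used only as a stand-in for the QM or AMOEBA+ dipole; the 0.01 Å step matches the displacement quoted above.

```python
# Sketch only: a fixed point-charge dipole stands in for the real QM/MM dipole.

def point_charge_dipole(coords, charges):
    """Dipole vector (e*Angstrom) of point charges relative to the origin."""
    return [sum(q * r[k] for q, r in zip(charges, coords)) for k in range(3)]


def dipole_gradient(dipole_fn, coords, atom, axis, h=0.01):
    """Two-sided finite difference of the dipole with respect to one coordinate."""
    plus = [list(r) for r in coords]
    minus = [list(r) for r in coords]
    plus[atom][axis] += h
    minus[atom][axis] -= h
    mu_plus, mu_minus = dipole_fn(plus), dipole_fn(minus)
    return [(p - m) / (2.0 * h) for p, m in zip(mu_plus, mu_minus)]


coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]  # water-like, illustrative
charges = [-0.8, 0.4, 0.4]                                        # illustrative

grad = dipole_gradient(lambda c: point_charge_dipole(c, charges), coords, atom=0, axis=0)
print(grad)
```

For fixed charges the gradient is simply the charge of the displaced atom along the displaced axis, which is why a model without CF cannot match QM dipole derivatives that depend on geometry.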

For the equilibrium and perturbed structures, molecular dipole moments were calculated at the PBE0/aug-cc-pVDZ level of theory, which is accurate for dipole moment estimation at an affordable computational cost.28,29 This functional and basis set combination has been further validated using available experimental dipole moment data. The difference between QM and the experiment is 0.10 D on average (Fig. S1). The Gaussian 09 package30 was used for neutral molecules, and the Psi4 package31 was used for the charged ones. The reason for using Psi4 is that users can easily set the origin of the dipole moment to the center of mass (COM) of the molecule, which matters for charged molecules. This allows for comparison with AMOEBA+ (CF) dipoles, which are also computed in the COM frame for charged molecules. It is worth mentioning that for neutral molecules, the molecular dipole moment does not depend on the choice of the reference frame. To eliminate possible inconsistencies introduced by different software packages, we compared the dipole moments calculated with the Psi4 and Gaussian packages (both G09 and G16). Our results indicate that the difference is on the order of 10⁻⁴ D for the representative molecules in our database (Table S9).
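The origin dependence mentioned above follows from Eq. (1): shifting the origin by R changes the dipole by −QR, where Q is the net charge. The sketch below checks this with a hypothetical diatomic and illustrative point charges (all values are made up for the demonstration).

```python
# Sketch only: illustrative geometry, masses, and charges.

def dipole(coords, charges, origin=(0.0, 0.0, 0.0)):
    """Point-charge dipole (e*Angstrom) evaluated relative to a chosen origin."""
    return tuple(sum(q * (r[k] - origin[k]) for q, r in zip(charges, coords))
                 for k in range(3))


def center_of_mass(coords, masses):
    total = sum(masses)
    return tuple(sum(m * r[k] for m, r in zip(masses, coords)) / total
                 for k in range(3))


coords = [(0.0, 0.0, 0.0), (0.97, 0.0, 0.0)]  # O-H like diatomic
masses = [15.999, 1.008]
com = center_of_mass(coords, masses)

neutral = [-0.4, 0.4]   # net charge 0
anion = [-1.3, 0.3]     # net charge -1

neutral_shift = dipole(coords, neutral, com)[0] - dipole(coords, neutral)[0]
anion_shift = dipole(coords, anion, com)[0] - dipole(coords, anion)[0]

print(f"neutral: shift = {neutral_shift:.6f}")
print(f"anion:   shift = {anion_shift:.6f}, -Q*R = {-sum(anion) * com[0]:.6f}")
```

Fixing the origin at the COM in both the QM and MM calculations removes this ambiguity for charged molecules.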

3. Multipole parameter derivation

The automatic parameterization tool, Poltype,32 was used to derive the multipole parameters for all the molecules involved in this work. The molecular equilibrium structure was fully optimized using the Gaussian 09 package at the MP2/6-31+G(d) level of theory. We followed the previous procedure for deriving AMOEBA+ multipole parameters,33 in which the distributed multipole analysis34 was first performed at the MP2/6-311G(d,p) level of theory to obtain the initial atomic multipole moments (including monopole, dipole, and quadrupole). Then, the atomic dipoles and quadrupoles were further optimized against the electrostatic potential computed at a higher level (MP2/aug-cc-pVTZ), with the monopoles held fixed.

4. CF parametrization

A Python-based least-squares minimization algorithm35 was used in the CF parameterization. The “analyze” program of our CF-implemented Tinker package was used to obtain the molecular dipole moment. The objective function was chosen to be the total mean square error [Eq. (10)] of the QM and MM dipole moments (first rhs term) and their gradients (second rhs term). In Sec. III, unless otherwise specified, MSE denotes the mean square error of each component (X, Y, and Z) of the dipole moments computed for all molecules and conformers,

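$$ \mathrm{MSE(total)} = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=x,y,z}\left[\left(\mu_{i,j}^{\mathrm{MM}} - \mu_{i,j}^{\mathrm{QM}}\right)^{2} + \left(\left.\frac{\partial\mu}{\partial r_j}\right|_{i}^{\mathrm{MM}} - \left.\frac{\partial\mu}{\partial r_j}\right|_{i}^{\mathrm{QM}}\right)^{2}\right]. \tag{10} $$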
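A direct transcription of the objective in Eq. (10) might look like the sketch below. The data layout (one three-component dipole and one three-component gradient entry per conformer) and the function name are assumptions made for illustration; in the actual fit the dipole and gradient terms are evaluated on different conformer sets, and the MM values come from Tinker's "analyze" program rather than from hard-coded lists.

```python
# Sketch only: assumed data layout, toy numbers.

def mse_total(mu_mm, mu_qm, grad_mm, grad_qm):
    """Sum of squared dipole and dipole-gradient errors over x, y, z, averaged over n entries."""
    n = len(mu_mm)
    total = 0.0
    for i in range(n):
        for j in range(3):
            total += (mu_mm[i][j] - mu_qm[i][j]) ** 2
            total += (grad_mm[i][j] - grad_qm[i][j]) ** 2
    return total / n


mu_mm = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
mu_qm = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
grad_mm = [[0.0, 0.0, 0.5], [0.0, 0.0, 0.0]]
grad_qm = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

objective = mse_total(mu_mm, mu_qm, grad_mm, grad_qm)
print(objective)
```

A least-squares optimizer would then vary the CF intensity parameters to minimize this quantity.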

5. Validation sets

CF parameters were validated on two sets of molecules. The first set involves the same molecules as those in the training set but with different geometries (1934 conformers); these were generated by randomly choosing atoms and changing their distances and angles, which need not lie along actual bonds (the magnitudes of the changes are the same as for the training set). To validate the transferability of the CF parameters to molecules outside of the training set, a second set of 50 additional molecules (shown in Table S2) was constructed. Following the scheme above, 2610 conformers were generated by changing the specified bond by ±0.1 Å and ±0.2 Å and the specified angle by ±1°, ±2°, ±4°, and ±6° from their equilibrium values. The same procedure as above was used to obtain their dipole moments.

III. RESULTS AND DISCUSSION

In this section, the atom types covered in our database are described first. These atom types were applied in the parameterization of the CF model. The parameter derivations and validations are reported and discussed afterward.

A. Determination of general atom types and CF atom types

General atom types were first derived for the molecules in the training set, which covers most chemical groups typically encountered in biomolecular fragments and small drug-like organic molecules composed of C, H, O, N, S, P, F, Cl, and Br. The general atom types can be applied to all components of the AMOEBA+ potential, such as charge penetration, charge transfer, and van der Waals interactions. Accordingly, the general atom types are determined by element, hybridization, and immediately bonded atoms (i.e., the functional group the atom belongs to). We found that the CF types can be “coarser” than the general types because the CF is less sensitive to the chemical environment. Thus, the general atom types were further grouped into CF types, as described in the following.

The average bond lengths and angle values, defined using the general atom types, together with their standard deviations (SD), are good indicators of whether the bonded atoms can share the same CF parameters (i.e., be assigned the same CF type). For example, in general typing, we defined four types of sp3 C bonded to H, excluding methane, based on the number of H atoms on C (Table II, 19–23). However, the bonds formed by all these sp3 C and H atoms fall in the same bond length range of 1.088 Å–1.106 Å. Thus, they share the same CF parameters and typing. As can be seen in Tables S4 and S5, sp3 C atoms, except the specially defined ones, can be grouped into the generic C3* type based on the bond/angle analysis.

TABLE II.

Statistical comparison of QM molecular dipole moments and molecular mechanics (MM) values calculated by the AMOEBA+ model with and without CF implementation.(a)

Statistics(b)          w./w.o. CF   Training set(c)    Validation set I   Validation set II
MSE components (D²)    w.o.         5.795 × 10⁻¹       7.608 × 10⁻¹       5.876 × 10⁻¹
  Eq                   w.o.         6.806 × 10⁻¹       -                  4.380 × 10⁻¹
  Non-eq               w.o.         5.766 × 10⁻¹       -                  5.906 × 10⁻¹
MSE components (D²)    w.           2.187 × 10⁻³       4.968 × 10⁻²       1.742 × 10⁻¹
  Eq                   w.           4.249 × 10⁻⁴       -                  1.608 × 10⁻¹
  Non-eq               w.           2.237 × 10⁻³       -                  1.745 × 10⁻¹
MSE total (D²)         w.o.         6.917 × 10⁻¹       6.581 × 10⁻¹       8.980 × 10⁻¹
                       w.           3.154 × 10⁻³       6.401 × 10⁻²       2.904 × 10⁻¹
MAE total (D)          w.o.         4.878 × 10⁻¹       4.351 × 10⁻¹       6.437 × 10⁻¹
                       w.           3.230 × 10⁻²       9.860 × 10⁻²       3.221 × 10⁻¹
MSD total (D)          w.o.         3.672 × 10⁻¹       2.542 × 10⁻¹       3.220 × 10⁻¹
                       w.           1.650 × 10⁻³       −1.099 × 10⁻²      −2.091 × 10⁻²
R²                     w.o.         0.887              0.827              0.669
                       w.           0.999              0.981              0.871

(a) Statistics are reported for all the conformers in each dataset, unless otherwise noted.

(b) MSE: mean square error; MAE: mean absolute error; MSD: mean signed deviation; R²: correlation coefficient.

(c) Eq: molecules in the optimized equilibrium geometry; Non-eq: molecules in the distorted geometry.
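For readers who want to reproduce this kind of summary on their own data, the four statistics can be computed as in the sketch below. Two conventions are assumptions on our part and are not stated in the table: the signed deviation is taken as MM minus QM, and R² is computed as the squared Pearson correlation coefficient. The input values are toy numbers.

```python
# Sketch only: assumed sign convention (MM - QM) and R^2 definition (squared Pearson r).

def summary(mm, qm):
    n = len(mm)
    diff = [m - q for m, q in zip(mm, qm)]
    mean_mm, mean_qm = sum(mm) / n, sum(qm) / n
    cov = sum((m - mean_mm) * (q - mean_qm) for m, q in zip(mm, qm))
    var_mm = sum((m - mean_mm) ** 2 for m in mm)
    var_qm = sum((q - mean_qm) ** 2 for q in qm)
    return {
        "MSE": sum(d * d for d in diff) / n,
        "MAE": sum(abs(d) for d in diff) / n,
        "MSD": sum(diff) / n,
        "R2": cov * cov / (var_mm * var_qm),
    }


stats = summary([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
perfect = summary([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(stats)
```

The second call shows why R² alone is not sufficient: a systematic scaling error leaves the correlation perfect while the MSE, MAE, and MSD are all nonzero.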

Specifically, the criteria for grouping the CF types using the bond length and angle value distributions are as follows. For bond lengths, the SD must be less than 0.01 Å, and the maximum deviation of any particular bond length from the mean must be less than 0.02 Å for each bond type appearing in the training set. For angles, the tolerance for the SD is 3°, and the maximum deviation of any particular angle from the mean is at most 4°. All of the molecules in our test set were examined using the above criteria for bond and angle distributions to define transferable CF types (Tables S4 and S5). For angles, however, which are considerably more flexible than bonds, the transferability might not be as good. Sometimes, even the same angle type within a molecule shows large differences. For example, the H–C–S angle in ethanethiol takes two values, 104.78° and 109.41°, which cannot be distinguished by the SMARTS definition. One possible explanation for this difference is valence electron repulsion between the S–H bond and the other two C–H bonds, which leads to larger H–C–S angles. This phenomenon is widespread in organic molecules, so the only way to cope with it is to keep our typing at a generally appropriate level of detail as far as possible. Nevertheless, the CF still helps capture polarization effects related to local geometry changes. It should be noted that because CF parameters and types are shared across different molecules, a small amount of CF may exist in equilibrium structures owing to the bond/angle distribution.
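The grouping test can be written as a small predicate. In the sketch below, the function name is hypothetical, the population standard deviation is used (the text does not say whether the population or sample form was applied), and a single "less than or equal" comparison is used for both tolerances for simplicity. The bond lengths and angles are values quoted in this section.

```python
# Sketch only: assumed population SD and inclusive tolerances.
from statistics import mean, pstdev


def can_share_type(values, sd_tol, max_dev_tol):
    """True if the spread of values is tight enough to share one CF type."""
    center = mean(values)
    largest = max(abs(v - center) for v in values)
    return pstdev(values) <= sd_tol and largest <= max_dev_tol


BOND_TOL = (0.01, 0.02)   # Angstrom: SD tolerance, maximum deviation
ANGLE_TOL = (3.0, 4.0)    # degrees: SD tolerance, maximum deviation

ch_bonds = [1.088, 1.106]                  # limits of the sp3 C-H range quoted above
cf_bonds = [1.391, 1.366, 1.350, 1.344]    # C-F in CH3F, CH2F2, CHF3, CF4
hcs_angles = [104.78, 109.41]              # H-C-S in ethanethiol

print(can_share_type(ch_bonds, *BOND_TOL))
print(can_share_type(cf_bonds, *BOND_TOL))
print(can_share_type(hcs_angles, *ANGLE_TOL))
```

Under these assumptions the sp3 C–H bonds pass, the C–F bonds across the fluoromethane series fail (consistent with the separate halogenated carbon types introduced later), and the two ethanethiol H–C–S angles remain within the angle tolerances and so keep a single type.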

The results for general and CF types with their corresponding SMARTS strings are collected in Table S3. In total, there are 178 general atom types, including 14 H, 91 C, 29 N, 22 O, 12 S, 7 P, and halogens. Among these types, 24 C and 5 N types are exclusive to heterocyclic molecules. Based on the CF types, we obtained 341 types of bonds and 651 types of angles (Tables S4 and S5); these types can be applied directly not only to the calculation of CF but also to the bond and angle energy components in subsequent AMOEBA+ force field development. From these, 341 bond CF intensity parameters, 1204 angle CF intensity parameters, and 1204 proximal-bond CF intensity parameters were generated. Considering the bond/angle distributions in realistic molecular simulations, direct bond, angle, and proximal bond parameters smaller than 10⁻³ can be removed, which reduces the number of direct bond parameters by ∼65%, angle parameters by ∼46%, and proximal bond parameters by ∼28% (Table III).

TABLE III.

Distribution of the CF parameters (the number of parameters and percentage in the specified range are presented).

Value                  |j| < 10⁻³     10⁻³ ≤ |j| < 10⁻²   10⁻² ≤ |j| < 10⁻¹   10⁻¹ ≤ |j| < 0.5
Direct bond (e/Å)      222 (65.10%)   8 (2.35%)           30 (8.80%)          81 (23.75%)
Angle (e/°)            554 (46.00%)   559 (46.47%)        89 (7.37%)          2 (1.54%)
Proximal bond (e/Å)    333 (27.65%)   205 (17.05%)        548 (45.55%)        117 (9.75%)

Next, a detailed analysis of the intensity parameters j_b, j_a, and j'_b and of the CF type classification in different categories of molecules is given.

1. CF types and parameters for molecules with only sp3 atoms

The intensity parameters j_b, j_a, and j'_b are constants that determine the magnitude of the charge fluxes across the corresponding bonds and angles with respect to geometry. These parameters reflect the sensitivity of CF to the geometry and chemical surroundings. Our goal is to obtain a set of general, transferable parameters shared by the set of common molecules. We have determined the “minimal” set of parameters that can reasonably reproduce molecular dipole surfaces by varying the number of CF-related atom types.

First, for hydrides with only one heavy atom (CH4, NH3, H2S, PH3, and H2O), exclusive intensity parameters are used, which ensures greater transferability. For chain alkanes, if we use four separate j constants for sp3 C and H atoms in the example mentioned earlier, the MSE in molecular dipole moments (169 conformers) is 2.084 × 10⁻³ D². If a single j constant is shared among all sp3 C and H atoms in this family of molecules, the final MSE in molecular dipole moments increases only marginally to 2.293 × 10⁻³ D². This suggests that the number of H atoms has a negligible impact on the charge flux to sp3 C. Thus, it is reasonable to set the CF parameters, including the intensity constants, to be the same for all sp3 C and H atoms in alkanes.

Several kinds of molecules with five-membered or six-membered carbon rings were included in the training set to parameterize the non-aromatic cyclic carbons. We investigated the feasibility of using the same CF parameters for the carbons in non-aromatic rings. If we use separate CF parameters for chain vs cyclic sp3 C, the MSE in molecular dipole moments (227 conformers) is 9.772 × 10⁻⁴ D². However, the MSE increases to 2.126 × 10⁻³ D² (roughly a twofold increase) if we force them to share the same parameters. The results indicate that the internal bond–angle interactions under ring strain have a noticeable impact on CF effects, thus requiring different intensity parameters for chain vs non-aromatic cyclic sp3 C. This analysis is consistent with previous studies showing that different FF parameters should be used for linear and cyclic molecules.36–38

Besides H atoms, the influence of other elements (N, O, S, and halogen) on sp3 C has been analyzed. The molecule set selected to address this question is shown in category two of Table S1. For C bonded to sp3 N, O, and S, the bond lengths and angle values involving C show small standard deviations from the average (Tables S4 and S5). To test whether the intensity parameters for sp3 C not bonded to sp3 N, O, or S can be combined with those for sp3 C bonded to sp3 N, O, or S, we used a set of molecules containing MeNH2, MeOH, MeSH, 1-butylamine, EtOH, 2-propanol, and EtSH. The calculated molecular dipole moments are of similar quality whether shared (MSE = 1.048 × 10⁻² D²) or different (MSE = 6.220 × 10⁻³ D², a ∼40% decrease) CF types are used for C. Therefore, as a trade-off between the number of types and this modest improvement, the CF types of sp3 C bonded to sp3 N, O, or S are set to be the same as those of sp3 C bonded only to H or sp3 C.

However, for the halogenated alkanes, different numbers of halogen atoms lead to large deviations in equilibrium bond lengths. For example, from CH3F to CH2F2, CHF3, and CF4, the C–F bond length changes from 1.391 to 1.366, 1.350, and 1.344 Å. For CClx, the C–Cl bond length changes from 1.780 (CH3Cl) to 1.769 (CH2Cl2), 1.772 (CHCl3), and 1.782 Å (CCl4). For CBrx, the C–Br bond length changes from 1.951 (CH3Br) to 1.940 (CH2Br2) and 1.944 Å (CHBr3). There are significant deviations in angles for these molecules as well. Meanwhile, for these halides, unique sigma hole39 effects were observed, which originate from the large electron distortion along the direction of the bond. This effect is mainly attributed to polarization, and the number of halogen atoms bonded to C affects the sigma hole. Thus, different CF types were assigned to these atoms to capture this difference (CF1, CF2, CF3, CF4, CCl1, CCl2, CCl3, CCl4, CBr1, CBr2, and CBr3). For example, in ClCH2CH2CH3, the C–Cl bond is marked as CCl1–Cl, and the H–C–Cl angle is marked as H–CCl1–Cl. However, a carbon chain angle (C–C–C) that contains a halogenated C but no bonded halogen or H is assigned the same intensity parameters and equilibrium angle as C3*–C3*–C3* in propane because of the locality of this effect. In our test using the set of halogenated alkanes [CHnX4−n (X = halogen, n = 0–3), EtX], the MSE in molecular dipole moments is 2.096 × 10⁻³ D² when sp3 C atoms bonded to different numbers of halogens are not separated from non-halogenated sp3 C. If they are separated, the MSE decreases to 1.093 × 10⁻³ D², a clear improvement.

Besides, all the angle CF types involving 2 or 3 sp3 C atoms, unless defined specifically, are assigned the generic C3* type because their parameter values are close to each other. For example, the CA (C in the backbone of amino acid) has its unique bond and angle types for the backbone. For the angle CA–C–S in the cysteine side chain, we will treat both sp3 C as generic type C3* so that this angle will adopt the angle parameters from C3*–C3*–S, which stems from ethanethiol or other molecules containing this structure.

2. CF types and parameters for molecules involving sp2 and sp C

This section discusses CF types for molecules with sp2 C and sp C. sp3 C atoms bonded to sp2 C atoms can mostly use the same type as generic sp3 C. For molecules with only sp2 C and sp3 C, most sp3 C–sp2 C bond lengths were found to lie within 1.50 Å–1.53 Å, and the sp3 C–sp3 C–sp2 C angle lies within 111.87°–113.70° (Tables S4 and S5). For comparison, for sp3 C not bonded to sp2 C, the average C3*–C3* bond length is 1.522 Å and the average C3*–C3*–C3* angle is 112.10°, both inside the above ranges. The exceptions are normally caused by highly polarized functional groups nearby (such as the methylene sp3 C inside a β-keto acid or the sp3 C on the backbone of an amino acid), which require distinct CF types and parameters.

Based on different functional groups and chemical environments, sp2 C was divided into the following categories: C in ketone, aldehyde, carboxylic acid, carboxylate, alkene, and imine. Special groups related to biochemistry need extra consideration to better describe their behavior in biological systems, such as the peptide bond that is ubiquitous in proteins and the carbamic acid prevalent in biosynthetic processes. Because of structural differences and the impact of sp2 N in these bio-fragments, dedicated CF types were generated for sp2 C in the peptide bond (–CO–NH–) and in carbamic acid (NH2–COO–). Next, the CF types for sp2 C atoms directly bonded to other sp2 C atoms were considered in the following categories: (1) conjugated C=C (long carbon chains with alternating C=C and C–C), (2) the connection between two C=O groups (for example, pyruvate, oxalate, and glyoxal), and (3) the connection C(=C)–C(=O). These sp2 C atoms use separate types of their own. In addition, atoms in charged molecules (for example, the C in oxalic acid vs that in oxalate) need additional consideration. In most cases, there is no CF contribution from the sp2 C–sp2 C bond (i.e., j_b = 0), according to the CF direction rules defined in Sec. II.

Furthermore, CF types related to (1) alkyne and (2) nitrile molecules are included in our dataset, and their parameters were derived, despite the relatively rare occurrence of sp C compared to sp² and sp³ C.

3. CF types and parameters for benzene derivatives and heterocyclic molecules

The carbon in benzene and its derivatives was defined as aromatic carbon (car). New CF types, different from those in an aliphatic carbon chain, were assigned to the atoms in substituent groups that are directly bonded to benzene rings. Heterocyclic molecules (see Sec. II), which are commonly found in drug-like molecules, are also included in the training and validation sets. Owing to the unique electronic structures of heterocyclic molecules, it is necessary to assign exclusive types to capture CF in them.

B. Molecular dipole surface with vs without CF

In this work, CF produces a small correction to molecular dipole moments by modifying the permanent atomic charges. Therefore, a more accurate description of the dipole surface is obtained compared to that of the current AMOEBA+ model without CF implementation.

Our goal is to describe both the total molecular dipole moments and their X, Y, and Z components accurately. We examined both the total dipole moment and its components for different conformers of each molecule. The dipole components must follow the trends of the QM reference. The molecular dipole moments of eight representative molecules with simple structures and small size are shown in Fig. 4, where the values from our model with and without CF implementation are compared with QM references. The specified bond lengths were distorted by −0.2 Å, −0.1 Å, 0 Å, 0.1 Å, and 0.2 Å from their equilibrium geometries. For each molecule, the dipole moment component along the coordinate axis that contributes most to the overall magnitude and is directly affected by the deformation is reported for comparison. It is seen that without CF, the current model can capture the dipole moments of the equilibrium molecular geometry but fails to describe the dipole moments of distorted geometries. Specifically, the AMOEBA+ model without CF generally shows a weak dependence of molecular dipoles on local geometry changes. In particular, molecular dipole moments computed by AMOEBA+ without CF display trends opposite to those of QM for highly symmetric molecules, such as CH4, NH3, and CH3Cl. By contrast, when CF is included, AMOEBA+ reproduces the QM dipole moments well for all eight molecules. When larger molecules are examined, an improvement in all three dipole moment components is also observed. It should be mentioned that angles are often coupled together, so it is difficult to separate their contributions as can be done for bonds. Nevertheless, the molecular dipole surface is also well captured along the angle degrees of freedom, as can be seen in the following analysis.

FIG. 4.

Molecular dipole moment components from QM and MM for representative molecules. The equilibrium bond was perturbed along the bond direction by ±0.1 Å and ±0.2 Å. Molecules in the plots are as follows: (a) formaldehyde, (b) formic acid, (c) dimethyl sulfoxide, (d) hydrogenphosphate, (e) methane, (f) ammonia, (g) phenol, and (h) chloromethane.
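The bond scan behind Fig. 4 can be illustrated with a short sketch. It assumes that only one of the two bonded atoms is translated along the bond vector while every other atom stays fixed; how attached substituents are handled in the actual workflow follows the rules of Sec. II B and is not reproduced here. The function names and the two-atom coordinates are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two 3D points (Å)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def stretch_bond(coords, i, j, delta):
    """Return a copy of coords with atom j moved along the i->j bond by delta (Å)."""
    length = distance(coords[i], coords[j])
    unit = [(b - a) / length for a, b in zip(coords[i], coords[j])]
    new = [list(p) for p in coords]          # copy, leave the input untouched
    new[j] = [x + delta * u for x, u in zip(coords[j], unit)]
    return new

# Hypothetical two-atom fragment with a 1.20 Å bond along x
coords = [[0.0, 0.0, 0.0], [1.20, 0.0, 0.0]]
offsets = [-0.2, -0.1, 0.0, 0.1, 0.2]        # Å, the distortions quoted in the text
scan = [stretch_bond(coords, 0, 1, d) for d in offsets]
lengths = [distance(g[0], g[1]) for g in scan]
```

Each geometry in `scan` would then be passed to the QM and MM dipole calculations; the sketch only generates the distorted structures.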

From Fig. 5, it is seen that the dipole gradients for the molecules in the training set are improved compared with AMOEBA+ without CF. The MSE for dipole gradients is reduced from 1.837 to 1.304 (D/Å)². Meanwhile, R² is improved from 0.596 to 0.704. Some conformers in the training set show a relatively large deviation. A reasonable explanation is that our training set incorporates a wide range of organic molecules, which cover various functional groups and distinct sizes. Large molecules have different chemical environments for interior and surface atoms. Moreover, the displacement of a central atom causes complicated structural deformation of the interior. Thus, the dipole gradient calculation is highly sensitive in those molecules. In addition, connecting functional groups, especially benzene derivatives or functional groups that are large and highly polarized (e.g., sulfonate or sulfoxide), leads to inevitable mutual influence. Because the conformers in validation set II stem from molecules with a large number of atoms and complicated structures, the sensitivity of the dipole gradient cannot be handled easily.

FIG. 5.

Correlation plots of QM and MM numerical dipole gradients for molecules in the training set. The MM results are from AMOEBA+ without (left) and with CF (right) contributions. In total, 2211 conformers from 162 molecules are included, with each generated by following the rule described in Sec. II B. The MSEs without and with CF are 1.837 (D/Å)² and 1.304 (D/Å)², respectively. The correlation coefficients R² without and with CF are 0.596 and 0.705, respectively.
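A numerical dipole gradient of the kind compared in Fig. 5 can be sketched with a central finite difference. The step size, the choice of displaced atom, and the pure point-charge dipole used here are assumptions for illustration only; the AMOEBA+ dipole also contains higher multipole and induction terms, and the displacement rule actually used is the one described in Sec. II B.

```python
def dipole(coords, charges):
    """Dipole (e·Å) of point charges: mu_k = sum_i q_i * r_ik."""
    return [sum(q * r[k] for q, r in zip(charges, coords)) for k in range(3)]

def numerical_dipole_gradient(dipole_fn, coords, atom, axis, h=1e-3):
    """Central-difference derivative of the dipole vector with respect to
    one Cartesian coordinate of one atom (step h in Å)."""
    plus = [list(r) for r in coords]
    minus = [list(r) for r in coords]
    plus[atom][axis] += h
    minus[atom][axis] -= h
    mu_p, mu_m = dipole_fn(plus), dipole_fn(minus)
    return [(p - m) / (2.0 * h) for p, m in zip(mu_p, mu_m)]

# Hypothetical fixed-charge diatomic (charges in e, coordinates in Å)
charges = [-0.4, 0.4]
coords = [[0.0, 0.0, 0.0], [1.1, 0.0, 0.0]]
grad = numerical_dipole_gradient(lambda c: dipole(c, charges), coords, atom=1, axis=0)
# Multiply by about 4.803 to convert e to D/Å.
```

For this fixed-charge toy model the gradient along x simply equals the charge of the displaced atom, so any geometry dependence beyond that must come from terms such as CF.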

C. Statistical analysis of the CF model and CF parameters

The bond length and angle parameters b0, b0′, and θ0 were chosen to be the average equilibrium bond lengths and angles when the CF atom types were determined. The CF parameters j were fitted by targeting the QM molecular dipole components of all 5836 conformers. The CF parameters are summarized in Tables S6 and S7.
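Because the charge increments are linear in the CF parameters, fitting j to reference dipoles is a linear least-squares problem once the geometries and permanent charges are fixed. The sketch below shows this for a hypothetical neutral diatomic with a single bond parameter, where atom A sits at the origin with charge −q, atom B sits at distance b with charge +q, and the bond flux adds j(b − b0) to B and removes it from A. The reference dipoles are synthetic, generated from a known j, and the closed-form solution stands in for whatever optimizer was used in the actual multi-parameter fit.

```python
def fit_bond_cf(bond_lengths, mu_ref, q, b0):
    """Least-squares j for the model mu(b) = (q + j * (b - b0)) * b."""
    # Part of the dipole not explained by the fixed charge:
    #   mu_ref - q * b = j * (b - b0) * b
    xs = [(b - b0) * b for b in bond_lengths]
    ys = [m - q * b for m, b in zip(mu_ref, bond_lengths)]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Hypothetical values: equilibrium length (Å), charge (e), "true" flux (e/Å)
b0, q, j_true = 1.10, 0.40, -0.07
bonds = [b0 + d for d in (-0.2, -0.1, 0.0, 0.1, 0.2)]
mu_ref = [(q + j_true * (b - b0)) * b for b in bonds]   # synthetic reference data
j_fit = fit_bond_cf(bonds, mu_ref, q, b0)
```

With noise-free synthetic data the fit recovers `j_true`; with real QM dipoles the residual of this fit is what the MSE values in this section measure.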

With the fitted CF parameters, the mean-squared error (MSE) of dipole components from the training set is 2.187 × 10⁻³ D², which is a notable improvement compared with that without CF (MSE: 5.795 × 10⁻¹ D²). A similar improvement was observed when the total molecular dipole moments were examined (a decrease in MSE from 6.917 × 10⁻¹ to 3.154 × 10⁻³ D²) (Fig. 6 and Table II). It is seen from Fig. 6 that AMOEBA+ without CF underestimates the QM total dipole moments for most of the molecules. The same conclusion can be drawn for validation sets I and II (Sec. III D, Figs. 7 and 8). This can also be observed from the comparison of the mean absolute error (MAE) and mean signed deviation (MSD) in Table II. Meanwhile, the dipole moments for molecules at both equilibrium and non-equilibrium geometries are improved, as shown by the MSE of dipole components for the equilibrium group (from 6.806 × 10⁻¹ to 4.249 × 10⁻⁴ D²) and the non-equilibrium group (from 5.766 × 10⁻¹ to 2.237 × 10⁻³ D²). Moreover, the errors in molecular dipole moments from the CF model are more “random,” as indicated by the small MSD.
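The statistics quoted here can be computed as in the following sketch. It assumes the usual definitions, with errors taken as MM minus QM so that a negative MSD indicates underestimation, and it takes R² as the squared Pearson correlation coefficient; the source does not spell out either convention. The four dipole values are made up for illustration.

```python
def dipole_stats(mm, qm):
    """MSE, MAE, MSD, and squared Pearson correlation of MM vs QM values."""
    n = len(qm)
    err = [m - r for m, r in zip(mm, qm)]          # signed error, MM - QM
    mse = sum(e * e for e in err) / n
    mae = sum(abs(e) for e in err) / n
    msd = sum(err) / n
    mean_mm, mean_qm = sum(mm) / n, sum(qm) / n
    cov = sum((m - mean_mm) * (r - mean_qm) for m, r in zip(mm, qm))
    var_mm = sum((m - mean_mm) ** 2 for m in mm)
    var_qm = sum((r - mean_qm) ** 2 for r in qm)
    return {"MSE": mse, "MAE": mae, "MSD": msd, "R2": cov * cov / (var_mm * var_qm)}

# Made-up total dipoles (D): MM is uniformly 0.1 D below QM
qm = [1.0, 2.0, 3.0, 4.0]
mm = [0.9, 1.9, 2.9, 3.9]
stats = dipole_stats(mm, qm)
```

In this example the MSD equals minus the MAE, the signature of a systematic underestimation, while R² stays at 1 because a constant offset does not affect the correlation. An MSD much smaller in magnitude than the MAE corresponds to the more "random" errors described above.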

FIG. 6.

Correlation plots of QM and MM total molecular dipole moments for molecules in the training set. In total, 5836 conformers from 162 molecules are included. Bond and angle are perturbed, as described in Sec. II B. The MSEs without and with CF are 5.795 × 10⁻¹ D² and 2.187 × 10⁻³ D², respectively. The correlation coefficients R² without and with CF are 0.887 and 0.999, respectively.

FIG. 7.

Correlation plots of QM and MM total molecular dipole moments for molecules in validation set I. In total, 1934 conformers from 162 molecules are included. Bond and angle are perturbed, as described in Sec. II B. The MSEs without and with CF are 7.608 × 10⁻¹ D² and 4.968 × 10⁻² D², respectively. The correlation coefficients R² without and with CF are 0.827 and 0.981, respectively.

FIG. 8.

Correlation plots of QM and MM total molecular dipole moments for molecules in validation set II. In total, 2610 conformers from 50 molecules are included. Bond and angle are perturbed, as described in Sec. II B. The MSEs without and with CF are 5.876 × 10⁻¹ D² and 1.742 × 10⁻² D², respectively. The correlation coefficients R² without and with CF are 0.669 and 0.871, respectively.

The distribution of fitted CF parameters is shown in Table III. The majority of the parameters fall within −0.1 to 0.1 (e/Å for the bond and proximal bond CF parameters jb and jb′, and e/degree for the angle CF parameter ja). This indicates that our CF model introduces relatively small modifications to the permanent charges, which nevertheless noticeably improve the description of molecular dipole moment magnitudes and, in particular, their trends. Furthermore, the CF parameters of the bond contribution are distributed more broadly than those of the angle contribution. In Fig. 6, the blue filled circles mark the conformations with large bond changes from the equilibrium geometry; they show larger deviations from the reference values than those with large angle changes (orange filled circles), which suggests that bond changes have a higher impact on the overall dipole moment than angle deformations. Additionally, the localization of CF effects hints at a smaller contribution from angle changes. Previous studies also suggested that dihedral angles and distant atoms or structures have negligible CF effects.7,40

Here, we examine the bonds formed by sp³ C and other sp³ atoms to further verify the physical meaning of the fitted parameters. The partial charge on C is more negative than that on H in a C–H bond. The C3*–H bond CF parameter is negative (−0.0717 e/Å), which means that a negative dq is added to H when the C–H bond length increases. Thus, the charge difference between C and H decreases after the CF dq is added, which gives the correct trend of the dipole moment change. The same analysis can be applied to sp³ C–sp³ O, sp³ C–sp³ N, sp³ C–sp³ S, and sp³ C–halogen bonds. Furthermore, similar cases emerge in other types of bonds, such as those involving carbon in benzene and aromatic derivatives. All the final parameters are listed in Tables S6 and S7.
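The C–H argument above can be made concrete with a small worked example. It uses the quoted C3*–H parameter and assumes the bond flux has the linear form dq = jb (b − b0), with the opposite increment placed on the bonded carbon so that total charge is conserved. The equilibrium length of 1.09 Å is a hypothetical value chosen for illustration, not a fitted b0 from this work.

```python
J_CH = -0.0717   # e/Å, C3*–H bond CF parameter quoted in the text

def bond_charge_flux(j_b, b, b0):
    """Return (dq on H, dq on C) in e for a C–H bond of length b (Å).
    Assumes dq = j_b * (b - b0) on H and the opposite increment on C."""
    dq_h = j_b * (b - b0)
    return dq_h, -dq_h

b0 = 1.09                                   # Å, hypothetical equilibrium length
dq_h, dq_c = bond_charge_flux(J_CH, b0 + 0.1, b0)   # bond stretched by 0.1 Å
```

Stretching the bond by 0.1 Å moves about 0.007 e of negative charge onto H and the same amount of positive charge onto C, which narrows the C–H charge difference as described in the text.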

D. Accuracy and transferability of the CF model on the validation sets

Here, we validate the accuracy and transferability of the derived CF parameters on two sets of molecules/conformers. The first set includes the same molecules as the training set but with different conformations (validation set I). The second set includes new molecules outside of the training set (validation set II). Validation set I is composed of 1934 conformers. Validation set II includes 2610 conformers from 50 new molecules (Table S2).

As shown in Fig. 7 and Table II, the CF model leads to great improvement in the molecular dipole moments for validation set I: the MSE decreased from 7.608 × 10⁻¹ to 4.968 × 10⁻² D², and R² increased from 0.827 to 0.981. This demonstrates the accuracy and transferability of our CF parameters for the molecules in the training set. As for validation set II, the significant improvement in the correlation between QM and AMOEBA+ (with CF) dipole moments can be seen in Fig. 8 and Table II. The MSE decreases from 5.876 × 10⁻¹ down to 1.742 × 10⁻¹ D², while R² is substantially enhanced from 0.669 to 0.871. A similar MSE decrease also appears in both equilibrium and non-equilibrium groups. Furthermore, Fig. 8 again shows that the larger differences between QM and AMOEBA+ (CF) dipole moments are mostly caused by large bond length deformations.

Most conformers in the validation sets show significant improvement, while some conformations of a few molecules show noticeable disagreement with the reference ab initio molecular dipole moments. The large errors all arise from the bond-large groups (±0.2 Å modification of the specified bond), including the conformers generated by changing the C–S bond in C–S–C of diisopropylsulfide, the C–O bond in C–O–C of diisopropylether, and the C=O bond of isobutyric anhydride. A possible reason is the size of the molecules chosen for the training and validation sets. The molecules in the training set are relatively small, with the smallest ones containing only four atoms, whereas the smallest molecule in validation set II has 11 atoms. The smaller molecule size in the training set means that a change in a bond or angle leads to a relatively large change in the molecular dipole moment. Therefore, the intensity parameters from this training set may overpredict the sensitivity of molecular dipole moments in larger molecules. In addition, there are essentially no torsion angles among the training set molecules, while the large molecules in validation set II do possess torsional degrees of freedom, even though we believe that these have a small effect on charge fluxes.7,40

It should be noted that Sedghamiz et al.16 suggested that torsional rotation is a major contributor to charge fluxes that modify molecular dipole moments, especially for biomolecules, while bonds and angles act as indirect factors. The biomolecules chosen by Sedghamiz et al. are amino acids, which are collectively larger than the small organic molecules used by Hagler. Torsions cause relatively strong conformational changes in biological systems, and Sedghamiz et al. may have used conformers largely generated by rotating torsions without paying much attention to bonds and angles. Additionally, the concept of atomic charge is artificially defined and experimentally unobservable, unlike molecular dipoles. There are various methods to define the atomic charge, so the calculated CF effects may vary.

For organic molecules, the current CF types and parameters are capable of producing the correct molecular dipole moment and show transferability in different molecules with similar types of bonds and angles. It is possible to further improve the performance of current CF parameters by incorporating molecules with relatively larger size into training, which will avoid overfitting and broaden the applicable molecule set. This was not done in the current study as we would like to verify the transferability of the CF model and parameters from small to large molecules. In the future application to biomolecules, we will further expand our training set to include large bio-fragments.

E. Final CF parameters by including all the molecules in the training set and validation set in the fitting

To ensure that our final CF parameters are appropriate for an even wider range of molecules, which is essential for subsequent applications to realistic molecular systems, the fitting procedure was repeated on all the conformers from the training set and the two validation sets (10 384 in total). In this process, the objective function was the MSE of dipole components only (not including the dipole gradient) from all conformers. The statistical analysis of molecular dipole moments is reported in Table IV, together with a correlation plot in Fig. 9. The decomposition into the training and validation sets is shown in Fig. S2. Overall, it is found that further improvement has been achieved over the CF parameters that were derived from the training set only. This suggests that increasing the dataset reduces the bias of the parameters, thereby facilitating transferability.

TABLE IV.

Statistical comparison of QM molecular dipole moments and molecular mechanics (MM) values calculated by the AMOEBA+ model with CF implementation. All the molecules both in the training and validation sets are included in parameterization.

Statistics              Training set      Validation set I   Validation set II
MSE components (D²)     8.401 × 10⁻³      2.401 × 10⁻²       3.800 × 10⁻²
  Eq                    9.084 × 10⁻³      –                  2.436 × 10⁻²
  Non-eq                8.382 × 10⁻³      –                  3.827 × 10⁻²
MSE total (D²)          1.438 × 10⁻²      3.240 × 10⁻²       5.962 × 10⁻²
MAE total (D)           6.503 × 10⁻²      1.104 × 10⁻¹       1.654 × 10⁻¹
MSD total (D)           −1.137 × 10⁻²     −8.713 × 10⁻³      −1.412 × 10⁻²
R²                      0.997             0.990              0.979

FIG. 9.

Correlation plots of QM and MM total molecular dipole moments for all the molecules from the training set, validation set I, and validation set II based on the final CF parameters from fitting all the molecules (10 384 conformations).

IV. CONCLUSION

In summary, we determined the parameters for our previously implemented CF model for common organic molecules and biomolecular fragments. A set of molecules with high chemical diversity was first optimized. Their bond and angle geometries were perturbed, and the molecular dipole moment surface was calculated with a high-level QM method. To reduce the dimension of the parameter space, CF atom types were then generated based on SMARTS pattern matching. The equilibrium bond/angle parameters b0, b0′, and θ0 were determined together with the atom typing. The remaining CF parameters (jb, jb′, and ja) were determined by fitting to the QM dipole surface.

With the CF parameters derived above, we demonstrate that the AMOEBA+ (CF) model is capable of capturing the dipole moments of not only equilibrium but also the non-equilibrium structures, which, by contrast, cannot be achieved without incorporating an explicit CF model. The fitted CF parameters produced remarkable improvements for molecular dipole moments both in training and validation sets. Our results demonstrate the reliability and transferability of the current CF model and parameters in terms of obtaining the correct dipole surface in molecular simulations.

It was also observed that the major influence on the molecular dipole originated from bond deformation in the current CF model. Finally, it should be mentioned that the main goal of this work is to explore a general procedure for CF parameterization in the framework of the AMOEBA+ model. This procedure can be utilized by other molecular mechanical potential energy functions, including those of fixed-charge force fields. A broader range of molecules, such as those in drug-like molecular databases and large biological fragments, will be investigated in the future. The CF algorithms, including energy and force evaluation, have been implemented in the canonical Tinker41 and Tinker-OpenMM42,43 packages on the CUDA platform. As tested in water simulations, the additional computational cost due to CF is less than 1%.18 These CF implementations are available in the publicly accessible GitHub repositories as AMOEBA+ CF branches.44,45

SUPPLEMENTARY MATERIAL

See the supplementary material for the details of the training and validation sets, atom types defined using SMARTS strings, derived CF parameters, comparison of experimental and PBE0 molecular dipole moment, and comparison of molecular dipole moments calculated with Gaussian and Psi4.

AUTHORS’ CONTRIBUTIONS

X.Y. and C.L. contributed equally to this work.

DATA AVAILABILITY

The data that support the findings of this study are available within the article and the supplementary material. The AMOEBA+ (CF) code is available in the publicly accessible GitHub repositories TinkerTools/Tinker and TinkerTools/Tinker-OpenMM as AMOEBA+ CF branches.

ACKNOWLEDGMENTS

The authors are grateful for support by the National Institutes of Health (Grant Nos. R01GM106137 and R01GM114237), the Cancer Prevention and Research Institute of Texas (Grant No. RP160657), and the National Science Foundation (Grant No. CHE-1856173).

Note: This paper is part of the JCP Special Topic on Classical Molecular Dynamics (MD) Simulations: Codes, Algorithms, Force Fields, and Applications.

REFERENCES

  • 1. Hagler A. T., “Force field development phase II: Relaxation of physics-based criteria… or inclusion of more rigorous physics into the representation of molecular energetics,” J. Comput.-Aided Mol. Des. 33(2), 205 (2019). doi: 10.1007/s10822-018-0134-x
  • 2. Lemkul J. A., Huang J., Roux B., and MacKerell A. D., Jr., “An empirical polarizable force field based on the classical Drude oscillator model: Development history and recent applications,” Chem. Rev. 116(9), 4983 (2016). doi: 10.1021/acs.chemrev.5b00505
  • 3. Jing Z., Liu C., Cheng S. Y., Qi R., Walker B. D., Piquemal J.-P., and Ren P., “Polarizable force fields for biomolecular simulations: Recent advances and applications,” Annu. Rev. Biophys. 48, 371 (2019). doi: 10.1146/annurev-biophys-070317-033349
  • 4. Patel S. and Brooks C. L., “Fluctuating charge force fields: Recent developments and applications from small molecules to macromolecular biological systems,” Mol. Simul. 32(3-4), 231 (2006). doi: 10.1080/08927020600726708
  • 5. Patel S. and Brooks C. L. III, “CHARMM fluctuating charge force field for proteins: I parameterization and application to bulk organic liquid simulations,” J. Comput. Chem. 25(1), 1 (2004). doi: 10.1002/jcc.10355
  • 6. Patel S., MacKerell A. D., Jr., and Brooks C. L. III, “CHARMM fluctuating charge force field for proteins: II protein/solvent properties from molecular dynamics simulations using a nonadditive electrostatic model,” J. Comput. Chem. 25(12), 1504 (2004). doi: 10.1002/jcc.20077
  • 7. Dinur U. and Hagler A. T., “Geometry-dependent atomic charges: Methodology and application to alkanes, aldehydes, ketones, and amides,” J. Comput. Chem. 16(2), 154 (1995). doi: 10.1002/jcc.540160204
  • 8. Dinur U., “‘Flexible’ water molecules in external electrostatic potentials,” J. Phys. Chem. 94(15), 5669 (1990). doi: 10.1021/j100378a013
  • 9. Fanourgakis G. S. and Xantheas S. S., “The bend angle of water in ice Ih and liquid water: The significance of implementing the nonlinear monomer dipole moment surface in classical interaction potentials,” J. Chem. Phys. 124(17), 174504 (2006). doi: 10.1063/1.2193151
  • 10. Pan C., Liu C., Peng J., Ren P., and Huang X., “Three-site and five-site fixed-charge water models compatible with AMOEBA force field,” J. Comput. Chem. 41(10), 1034 (2020). doi: 10.1002/jcc.26151
  • 11. Palmo K., Mannfors B., Mirkin N. G., and Krimm S., “Inclusion of charge and polarizability fluxes provides needed physical accuracy in molecular mechanics force fields,” Chem. Phys. Lett. 429(4), 628 (2006). doi: 10.1016/j.cplett.2006.08.087
  • 12. Palmo K., Mannfors B., and Krimm S., “Balanced charge treatment of intramolecular electrostatic interactions in molecular mechanics energy functions,” Chem. Phys. Lett. 369(3), 367 (2003). doi: 10.1016/s0009-2614(02)02032-8
  • 13. Mannfors B., Palmo K., and Krimm S., “Spectroscopically determined force field for water dimer: Physically enhanced treatment of hydrogen bonding in molecular mechanics energy functions,” J. Phys. Chem. A 112(49), 12667 (2008). doi: 10.1021/jp806948w
  • 14. Burnham C. J. and Xantheas S. S., “Development of transferable interaction models for water. IV. A flexible, all-atom polarizable potential (TTM2-F) based on geometry dependent charges derived from an ab initio monomer dipole moment surface,” J. Chem. Phys. 116(12), 5115 (2002). doi: 10.1063/1.1447904
  • 15. Fanourgakis G. S. and Xantheas S. S., “Development of transferable interaction potentials for water. V. Extension of the flexible, polarizable, Thole-type model potential (TTM3-F, v. 3.0) to describe the vibrational spectra of water clusters and liquid water,” J. Chem. Phys. 128(7), 074506 (2008). doi: 10.1063/1.2837299
  • 16. Sedghamiz E., Nagy B., and Jensen F., “Probing the importance of charge flux in force field modeling,” J. Chem. Theory Comput. 13(8), 3715 (2017). doi: 10.1021/acs.jctc.7b00296
  • 17. Sedghamiz E. and Ghalami F., “Evaluating the effects of geometry and charge flux in force field modeling,” J. Phys. Chem. A 122(19), 4647 (2018). doi: 10.1021/acs.jpca.7b12198
  • 18. Liu C., Piquemal J.-P., and Ren P., “Implementation of geometry-dependent charge flux into the polarizable AMOEBA+ potential,” J. Phys. Chem. Lett. 11(2), 419 (2020). doi: 10.1021/acs.jpclett.9b03489
  • 19. Liu C., Piquemal J.-P., and Ren P., “AMOEBA+ classical potential for modeling molecular interactions,” J. Chem. Theory Comput. 15(7), 4122 (2019). doi: 10.1021/acs.jctc.9b00261
  • 20. Wang L.-P., Martinez T. J., and Pande V. S., “Building force fields: An automatic, systematic, and reproducible approach,” J. Phys. Chem. Lett. 5(11), 1885 (2014). doi: 10.1021/jz500737m
  • 21. Jensen F., Introduction to Computational Chemistry (John Wiley & Sons, 2017).
  • 22. Kantonen S. M., Muddana H. S., Schauperl M., Henriksen N. M., Wang L.-P., and Gilson M. K., “Data-driven mapping of gas-phase quantum calculations to general force field Lennard-Jones parameters,” J. Chem. Theory Comput. 16, 1115 (2020). doi: 10.1021/acs.jctc.9b00713
  • 23. Popelier P. L. A. and Aicken F. M., “Atomic properties of amino acids: Computed atom types as a guide for future force-field design,” ChemPhysChem 4(8), 824 (2003). doi: 10.1002/cphc.200300737
  • 24. Matta C. F., “How dependent are molecular and atomic properties on the electronic structure method? Comparison of Hartree-Fock, DFT, and MP2 on a biologically relevant set of molecules,” J. Comput. Chem. 31(6), 1297 (2010). doi: 10.1002/jcc.21417
  • 25. Schmidt R., Ehmki E. S. R., Ohm F., Ehrlich H.-C., Mashychev A., and Rarey M., “Comparing molecular patterns using the example of SMARTS: Theory and algorithms,” J. Chem. Inf. Model. 59(6), 2560 (2019). doi: 10.1021/acs.jcim.9b00250
  • 26. Rackers J. A., Wang Q., Liu C., Piquemal J.-P., Ren P., and Ponder J. W., “An optimized charge penetration model for use with the AMOEBA force field,” Phys. Chem. Chem. Phys. 19(1), 276 (2017). doi: 10.1039/c6cp06017j
  • 27. Liu C., Qi R., Wang Q., Piquemal J.-P., and Ren P., “Capturing many-body interactions with classical dipole induction models,” J. Chem. Theory Comput. 13(6), 2751 (2017). doi: 10.1021/acs.jctc.7b00225
  • 28. Hickey A. L. and Rowley C. N., “Benchmarking quantum chemical methods for the calculation of molecular dipole moments and polarizabilities,” J. Phys. Chem. A 118(20), 3678 (2014). doi: 10.1021/jp502475e
  • 29. Hait D. and Head-Gordon M., “How accurate is density functional theory at predicting dipole moments? An assessment using a new database of 200 benchmark values,” J. Chem. Theory Comput. 14(4), 1969 (2018). doi: 10.1021/acs.jctc.7b01252
  • 30. Frisch M. J., Trucks G. W., Schlegel H. B., Scuseria G. E., Robb M. A., Cheeseman J. R., Scalmani G., Barone V., Mennucci B., Petersson G. A., Nakatsuji H., Caricato M., Li X., Hratchian H. P., Izmaylov A. F., Bloino J., Zheng G., Sonnenberg J. L., Hada M., Ehara M., Toyota K., Fukuda R., Hasegawa J., Ishida M., Nakajima T., Honda Y., Kitao O., Nakai H., Vreven T., Montgomery J. A., Jr., Peralta J. E., Ogliaro F., Bearpark M., Heyd J. J., Brothers E., Kudin K. N., Staroverov V. N., Kobayashi R., Normand J., Raghavachari K., Rendell A., Burant J. C., Iyengar S. S., Tomasi J., Cossi M., Rega N., Millam J. M., Klene M., Knox J. E., Cross J. B., Bakken V., Adamo C., Jaramillo J., Gomperts R., Stratmann R. E., Yazyev O., Austin A. J., Cammi R., Pomelli C., Ochterski J. W., Martin R. L., Morokuma K., Zakrzewski V. G., Voth G. A., Salvador P., Dannenberg J. J., Dapprich S., Daniels A. D., Farkas Ö., Foresman J. B., Ortiz J. V., Cioslowski J., and Fox D. J., Gaussian 09, Gaussian, Inc., Wallingford, CT, 2009.
  • 31. Parrish R. M., Burns L. A., Smith D. G. A., Simmonett A. C., DePrince A. E., Hohenstein E. G., Bozkaya U., Sokolov A. Y., Di Remigio R., Richard R. M., Gonthier J. F., James A. M., McAlexander H. R., Kumar A., Saitow M., Wang X., Pritchard B. P., Verma P., Schaefer H. F., Patkowski K., King R. A., Valeev E. F., Evangelista F. A., Turney J. M., Crawford T. D., and Sherrill C. D., “Psi4 1.1: An open-source electronic structure program emphasizing automation, advanced libraries, and interoperability,” J. Chem. Theory Comput. 13(7), 3185 (2017). doi: 10.1021/acs.jctc.7b00174
  • 32. Wu J. C., Chattree G., and Ren P., “Automation of AMOEBA polarizable force field parameterization for small molecules,” Theor. Chem. Acc. 131(3), 1138 (2012). doi: 10.1007/s00214-012-1138-6
  • 33. Wang Q., Rackers J. A., He C., Qi R., Narth C., Lagardere L., Gresh N., Ponder J. W., Piquemal J.-P., and Ren P., “General model for treating short-range electrostatic penetration in a molecular mechanics force field,” J. Chem. Theory Comput. 11(6), 2609 (2015). doi: 10.1021/acs.jctc.5b00267
  • 34. Stone A. J., “Distributed multipole analysis: Stability for large basis sets,” J. Chem. Theory Comput. 1(6), 1128 (2005). doi: 10.1021/ct050190+
  • 35. Moré J. J., “The Levenberg-Marquardt algorithm: Implementation and theory,” in Numerical Analysis, edited by Watson G. A. (Springer Berlin Heidelberg, Berlin, Heidelberg, 1978), p. 105.
  • 36. Errington J. R. and Panagiotopoulos A. Z., “New intermolecular potential models for benzene and cyclohexane,” J. Chem. Phys. 111(21), 9731 (1999). doi: 10.1063/1.480308
  • 37. Lee J.-S., Wick C. D., Stubbs J. M., and Siepmann J. I., “Simulating the vapour–liquid equilibria of large cyclic alkanes,” Mol. Phys. 103(1), 99 (2005). doi: 10.1080/00268970412331303341
  • 38. Jorge M., “Predicting hydrophobic solvation by molecular simulation: 2. New united-atom model for alkanes, alkenes, and alkynes,” J. Comput. Chem. 38(6), 359 (2017). doi: 10.1002/jcc.24689
  • 39. Politzer P., Murray J. S., and Clark T., “Halogen bonding and other σ-hole interactions: A perspective,” Phys. Chem. Chem. Phys. 15(27), 11178 (2013). doi: 10.1039/c3cp00054k
  • 40. Galimberti D., Milani A., and Castiglioni C., “Charge mobility in molecules: Charge fluxes from second derivatives of the molecular dipole,” J. Chem. Phys. 138(16), 164115 (2013). doi: 10.1063/1.4802009
  • 41. Rackers J. A., Wang Z., Lu C., Laury M. L., Lagardère L., Schnieders M. J., Piquemal J.-P., Ren P., and Ponder J. W., “Tinker 8: Software tools for molecular design,” J. Chem. Theory Comput. 14(10), 5273 (2018). doi: 10.1021/acs.jctc.8b00529
  • 42. Harger M., Li D., Wang Z., Dalby K., Lagardère L., Piquemal J.-P., Ponder J., and Ren P., “Tinker-OpenMM: Absolute and relative alchemical free energies using AMOEBA on GPUs,” J. Comput. Chem. 38(23), 2047 (2017). doi: 10.1002/jcc.24853
  • 43. Eastman P., Swails J., Chodera J. D., McGibbon R. T., Zhao Y., Beauchamp K. A., Wang L.-P., Simmonett A. C., Harrigan M. P., Stern C. D., Wiewiora R. P., Brooks B. R., and Pande V. S., “OpenMM 7: Rapid development of high performance algorithms for molecular dynamics,” PLoS Comput. Biol. 13(7), e1005659 (2017). doi: 10.1371/journal.pcbi.1005659
  • 44. AMOEBA+ CF model in Tinker; https://github.com/TinkerTools/Tinker/tree/AMOEBA+CF, accessed on 07 September 2020.
  • 45. AMOEBA+ CF model in Tinker-OpenMM; https://github.com/TinkerTools/Tinker-OpenMM/tree/AMOEBA+CF, accessed on 07 September 2020.

Articles from The Journal of Chemical Physics are provided here courtesy of American Institute of Physics
