Abstract
The performance of three density functional theory (DFT) exchange–correlation functionals, namely, Perdew–Burke–Ernzerhof (PBE), BP86, and B3LYP, in predicting conformational distributions of a hydrated glycine peptide is tested with two different basis sets in the framework of adaptive force matching (AFM). The conformational distributions yielded the free energy profiles of the DFT functional and basis set combinations. Unlike traditional validations of potential energy and structural parameters, our approach allows the free energy of DFT to be validated. When compared to experimental distributions, the def2-TZVP basis set provides better agreement than a slightly trimmed aug-cc-pVDZ basis set. B3LYP is shown to be better than BP86 and PBE. The glycine model fitted against B3LYP-D3(BJ) with the def2-TZVP basis set is the most accurate and named the AFM2021 model for glycine. The AFM2021 glycine model provides better agreement with experimental J-coupling constants than C36m and ff14SB, although the margin is very small when compared to C36m. Our previously published alanine model is also refitted with the slightly simplified AFM2021 energy expression. This work shows good promise of AFM for developing force fields for a range of proteinogenic peptides using only DFT as reference.
I. INTRODUCTION
Computer simulations of proteins are a powerful tool to investigate fundamental biological processes and can provide valuable insight into computer aided drug design.1–3 While ab initio simulations based on electronic structure methods, such as density functional theory (DFT),4 are gaining popularity on various topics in materials sciences, such simulations are rarely used for proteins or even polypeptides. The high computational cost of ab initio molecular dynamics (MD) generally limits its application to small systems with no more than a few hundred atoms and to timescales of tens to hundreds of picoseconds.5,6 A simulation box for a hydrated peptide can easily have thousands of atoms. In addition, nanoseconds are generally considered short when peptides and proteins are investigated.7 Thus, direct ab initio MD simulations on such biological systems are very costly.
Recently, it has been shown that by force matching8–11 Perdew–Burke–Ernzerhof (PBE)12 exchange–correlation coupled with Grimme’s D3 dispersion correction,13,14 a force field created by Adaptive Force Matching (AFM)15–18 provides excellent agreement with experimental nuclear magnetic resonance (NMR) scalar coupling constants for hydrated alanine peptides. The force field was referred to as AFM2020.19 The performance of AFM2020 is respectable since it outperformed even some models that were fit directly to such NMR data during parameterization. Since AFM2020 was based entirely on PBE-D3 reference forces, the performance of the AFM2020 model indirectly reflects the quality of the reference DFT method.
The previous work on the PBE based alanine force field focused on establishing the feasibility of creating a fully electronic structure based force field for small peptides. No evaluation was performed on the optimal choice of basis set and exchange–correlation functional. The alanine model was based on a slim version of the aug-cc-pVDZ basis set20,21 with augmented functions on hydrogens removed. Aug-cc-pVDZ is a good basis set, especially in a condensed phase environment, where basis functions on adjacent atoms can be borrowed to provide a better description of the electron density, leading to improved basis set completeness. However, it has been argued that the aug-cc-pVDZ basis set provides a poor balance for different atomic species.22 This is a concern not just for the aug-cc-pVDZ set but also for any basis set developed based on a rigid n-tuple principle and polarized according to a strict correlation consistent scheme. This leads to an imbalance of the basis set completeness along each row of the periodic table, as the number of electrons changes, but not the number of basis and polarization functions. This is particularly critical for hydrogen, which is ubiquitous in biological systems. With only one electron, hydrogen is over-polarized in basis sets developed with such a strict scheme. If diffuse functions on hydrogens were to be kept, it might lead to even worse imbalance and large basis set superposition errors. For the accurate description of the potential surface, basis sets that are internally balanced and designed to minimize errors in ground state properties across the periodic table may have advantages over basis sets created following a strict n-tuple scheme for valence and polarization sets. One such balanced basis set is the def2 family of basis sets by Weigend et al.22,23 It has been argued that for DFT, the def2-TZVP basis set, which is not much more costly than aug-cc-pVDZ, would reach the complete basis set limit.
In this work, we will compare the performance of the aug-cc-pVDZ and def2-TZVP basis sets in reproducing experimental conformational distributions as measured by NMR scalar coupling constants for hydrated polyglycine. The experimental NMR scalar coupling constants are correlated with the conformational distributions through the Karplus relationship.24–27 Since the performance of the def2-TZVP basis set was originally tested with the BP8628–30 exchange–correlation functional,22,23 we decided to compare the BP86 functional along with the PBE functional. We also tested the performance of a hybrid functional, B3LYP.31,32 A hybrid functional is generally anticipated to be more accurate than pure exchange correlation functionals, such as PBE and BP86.
It is worth pointing out that the validation of the functional and basis set is performed using long molecular dynamics (MD) trajectories on the DFT potential energy surface mapped by our AFM method. Traditional validations of DFT tend to rely on single point energy calculations and geometry optimizations. At finite temperature, a large ensemble of conformations away from geometry minimum contributes to ensemble properties, such as free energy. To obtain good agreement with experimental NMR, the DFT treatment must produce highly accurate free energy surfaces, which are proportional to the logarithm of relative populations of the conformations. A good free energy profile requires the DFT energies to be accurate for all accessible conformations. Free energy validation is especially challenging for a peptide since different conformations have very small free energy differences. Long trajectories are required to achieve adequate convergence. In addition, the peptide free energy profile is strongly influenced by hydration water as different conformations interact with water differently. To properly account for hydration, our simulation box contains 2709 atoms.
This paper is arranged in four sections. In Sec. II, we describe our slightly simplified protocol for AFM. Section III describes the procedure for the validation of the quality of the models. Section IV summarizes the results and provides additional discussions.
II. AFM DEVELOPMENT OF DFT BASED MODELS FOR HYDRATED GLYCINE
A unique feature of AFM is to eliminate the use of combination rules and fit custom interaction potentials specifically optimized for each unique atom pair. The strength of such an approach is to eliminate the need for the compromise between different types of pairwise interactions as a result of combination rules. However, such an approach would make it necessary to create one force field for each amino acid. In this work, we test some simplifications to our AFM protocol for alanine19 in hopes of reducing the computational burden when force fields for other peptides are developed.
For the alanine force field,19 a seven residue zwitterionic peptide was used. In order to reduce the number of atoms to be treated quantum mechanically, we attempt to use five residues with the Ace-Gly5-NMe peptide. Such a peptide is commonly referred to as a blocked peptide.33–35 Ace-Gly5-NMe will form the same number of hydrogen bonds in the α-helix conformation as zwitterionic Gly7. At the same time, such a blocked peptide has fewer atoms and is thus less costly for DFT calculations.
Below, we provide a brief summary of the protocol for fitting glycine, highlighting only the differences between the glycine and alanine protocols. The alanine dataset was also refit using the revised protocol to ensure that the simplified protocol leads to similar performances also for alanine. Since the energy expressions have been changed slightly, we refer to the new peptide models as AFM2021 and provide the alanine and glycine parameters in the supplementary material and release the GROMACS36 parameter files to the general public.37
Figure 1 provides a summary of the atom types used for the glycine model. All repeating residues share the same atom types, and the two terminal groups have their own parameters. Similar to alanine,21 dispersion will be fitted before other parameters using a fragment-based approach with the short-range damped (SRD) energy expression as follows:
| (1) |
where r0 is 0.6 times the sum of the atomic van der Waals (vdW) radius.38 The fragments used for fitting dispersion are provided in the supplementary material. The reference gradients were computed using Grimme’s D3 approach with Becke–Johnson (BJ) damping.13,14 The hyperparameters in the D3(BJ) correction are dependent on the exchange–correlation functional; thus, the fitted C6 parameters for the three different exchange–correlation functionals are slightly different.
FIG. 1.
Atom types used for the blocked polyglycine peptide while developing the force field. The GLY unit repeats for longer peptides.
After dispersion is fitted, standard AFM iterations are performed to determine the remaining parameters. MD sampling for AFM was performed with the Ace-Gly5-NMe peptide in a box of 2649 water molecules. The initial guess force field for glycine is AMBER ff99SB.39
Since glycine is not chiral, it has two equivalent basins for each type of secondary structure. A decision was made to always sample both symmetry equivalent basins. In the MD step, the following conformation groups were sampled:
(1) An unbiased group was sampled where no constraint was applied during the MD sampling.
(2) The α-helix group has conformations from both φ = −60°, ψ = −45° and φ = 60°, ψ = 45°, covering both right- and left-handed helices. For sampling this group, we also constrained the α-helix hydrogen bonds. We decided to use three different bond length constraints of 1.5, 1.7, and 1.9 Å with a force constant of 1000 kJ/(mol Å²). The N–H–O angle was also constrained to 180° with a force constant of 500 kJ/(mol rad²).
(3) The β-strand conformation group was sampled with φ = −135°, ψ = 135° and φ = 135°, ψ = −135°.
(4) The polyproline II conformation group was sampled with φ = −75°, ψ = 150° and φ = 75°, ψ = −150°.
(5) A survey group was added to sample torsional angles not visited sufficiently by the other four groups of conformations. For glycine, this group includes φ = −150°, ψ = −60°; φ = 150°, ψ = 60°; φ = −150°, ψ = 60°; and φ = 150°, ψ = −60°.
(6) A terminal contact group was added. As we were using the experimental J-coupling constants for Gly3 to check our model, MD simulations for validation were performed with a shorter Ace-Gly3-NMe peptide. For the shorter peptide, the terminal groups have a much higher likelihood of approaching each other than for the longer peptide used as the training set. In order to ensure that conformations are present in the training set to properly capture interactions between the terminal residues, a terminal contact group was created by constraining H5–O2 or H4–O2 (Fig. 1) distances to 1.5, 1.7, and 1.9 Å with a force constant of 1000 kJ/(mol Å²). The terminal contact group is a standing group. A standing group is not resampled in each generation but is included in the fit for each generation and in the global fit.
Table I summarizes the composition of the training set. For groups 1 through 4, MD simulations were performed at both 310 and 360 K. The terminal contact group was only simulated at 310 K. Unless otherwise noted in Table I, 15 conformations were used at each temperature in each group. The six groups provided a total of 360 conformations for each generation. As a standing group, the terminal contact group was only sampled once; all other groups were resampled in each generation of AFM. As with alanine,21 we fit up to three generations of conformations. Only one standing group was used even if the data from multiple generations were fitted. This training set thus provided 960 conformations for the FM step of each generation from the third one. For the global fit, we used five generations with a total of 1560 conformations.
TABLE I.
Dihedral angle constraints and number of conformations (N) in each training group.
| Group | φ (deg) | ψ (deg) | T (K) | N |
|---|---|---|---|---|
| Unrestrained | N/Aᵃ | N/A | 310, 360 | 60 |
| α-helixᵇ | −60 | −45 | 310, 360 | 30 |
| | 60 | 45 | 310, 360 | 30 |
| β-sheet | −135 | 135 | 310, 360 | 30 |
| | 135 | −135 | 310, 360 | 30 |
| PP-II | −75 | 150 | 310, 360 | 30 |
| | 75 | −150 | 310, 360 | 30 |
| Survey | −150 | −60 | 310 | 15 |
| | −150 | 60 | 310 | 15 |
| | 150 | −60 | 310 | 15 |
| | 150 | 60 | 310 | 15 |
| Terminal contactᵇ | N/A | N/A | 310 | 60 |

ᵃ Not applicable.
ᵇ Distance or angle constraints are also applied; see text for more details.
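The conformation counts quoted above follow from Table I by simple bookkeeping. The sketch below reproduces that arithmetic; the dictionary keys and the function name `training_set_size` are illustrative choices, not names used by the AFM code.

```python
# Conformations per group per generation, as listed in Table I.
RESAMPLED_GROUPS = {
    "unrestrained": 60,
    "alpha_helix": 60,  # two mirror-image basins x 30
    "beta": 60,         # two mirror-image basins x 30
    "pp2": 60,          # two mirror-image basins x 30
    "survey": 60,       # four (phi, psi) targets x 15, 310 K only
}
# Standing groups are sampled once and reused in every fit.
STANDING_GROUPS = {"terminal_contact": 60}


def training_set_size(n_generations: int) -> int:
    """Number of conformations in a fit that pools `n_generations` generations."""
    resampled = sum(RESAMPLED_GROUPS.values()) * n_generations
    standing = sum(STANDING_GROUPS.values())
    return resampled + standing


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, training_set_size(n))
```

One generation gives 360 conformations, three pooled generations give 960, and the five-generation global fit gives 1560, matching the numbers in the text.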
In the QM/MM step, the QM/MM region is identified by the following method:
(1) Ace–Gly5–NMe is included in the QM region.
(2) If a water molecule is within 4.5 Å of a carbon atom or within 3.8 Å of any other peptide atoms, it will be included in the QM region.
(3) Five of the QM water molecules identified in step 2 are randomly selected. Any water molecules within 2.6 Å of the selected water molecules will be included in the QM region.
(4) Water molecules within 6 Å from the boundary of the QM region will be included in the MM region. Water molecules further away will be discarded.
(5) The peptide and any QM water without MM particles within 2.6 Å will have a solvation factor of one; otherwise, their solvation factor will be zero.15
With this procedure, the number of water molecules in the QM region ranges from 47 to 84. However, on average, only 11 water molecules are fitted as the other water molecules are too close to the QM/MM boundary.
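Steps 2 and 3 of the selection above can be sketched with NumPy as follows. This is only an illustration under stated assumptions: each water is represented by a single reference position (taken here to be its oxygen), periodic images are ignored, and the function name `select_qm_waters` and its arguments are hypothetical. The MM shell and solvation factor assignment (steps 4 and 5) are not covered.

```python
import numpy as np


def select_qm_waters(pep_xyz, pep_elem, wat_xyz, n_seeds=5, rng=None,
                     r_carbon=4.5, r_other=3.8, r_shell=2.6):
    """Return sorted indices of the waters placed in the QM region.

    pep_xyz : (P, 3) peptide atom coordinates in angstrom
    pep_elem: length-P sequence of element symbols
    wat_xyz : (W, 3) one reference position per water (assumed: the oxygen)
    """
    rng = np.random.default_rng() if rng is None else rng
    pep_xyz = np.asarray(pep_xyz, dtype=float)
    wat_xyz = np.asarray(wat_xyz, dtype=float)
    is_carbon = np.asarray(pep_elem) == "C"

    # Step 2: water-to-peptide distances with an element-dependent cutoff.
    d = np.linalg.norm(wat_xyz[:, None, :] - pep_xyz[None, :, :], axis=-1)
    cutoff = np.where(is_carbon, r_carbon, r_other)  # shape (P,)
    qm = np.flatnonzero((d < cutoff).any(axis=1))

    # Step 3: pick up to n_seeds QM waters at random and add their neighbors.
    if qm.size:
        seeds = rng.choice(qm, size=min(n_seeds, qm.size), replace=False)
        d_ww = np.linalg.norm(
            wat_xyz[:, None, :] - wat_xyz[seeds][None, :, :], axis=-1
        )
        qm = np.union1d(qm, np.flatnonzero((d_ww < r_shell).any(axis=1)))
    return qm
```

Because the seeds are drawn at random, repeated calls on the same frame can return different QM regions, which is consistent with the spread in QM water counts reported above.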
After the QM/MM region was identified, all particles in the MM region were represented as point charges and forces on the QM particles were computed using the Parallel Quantum Solutions (PQS) program.40 The Fourier transform Coulomb (FTC) technique41 for two electron integrals was used for the pure DFT functionals. The self-consistent field calculations were terminated with a Brillouin convergence of 10⁻⁶. The one-electron and two-electron integral thresholds were chosen to be 10⁻¹¹ and 10⁻¹⁰, respectively.
In the force matching (FM) step, the reference forces were fit using the same three step procedure as we did previously for alanine.19 When fitting the intramolecular terms for alanine, we fitted the equilibrium angle δ for the single bond torsional potential (SP). This allowed the minimum of the torsional surface to deviate from multiples of 60°. For glycine, which is achiral, symmetry would require a δ value of zero. We decided to refit the alanine model also with a δ value of zero as it simplifies the model. With a nonzero δ value, the sign of δ will depend on chirality.
When fitting alanine, we relied on an exponential term for short range repulsion. In this work, we switched to a combination of the generalized exponential,
| (2) |
and simple exponential terms. The generalized exponential term was used only for atom pairs that can form intra-peptide hydrogen bonds and for intimate pairs where the interatomic distance is less than 2 Å.
As a result of the minor changes to the energy expressions, the name of our peptide models has been changed to AFM2021. As mentioned previously, the alanine model was also refit using the slightly modified energy expressions. When refitting the alanine model, we used the previous training set and reference forces already released to the general public without new sampling or QM/MM calculations. The AFM2021 alanine model provides a χ2 value (vide infra) very close to AFM2020, indicating that the different energy expressions provide similar accuracy. We report only the parameters for the AFM2021 alanine model in the supplementary material without further discussion.
III. PROCEDURE FOR VALIDATING THE GLYCINE MODEL
Four models for hydrated polyglycine were developed in this work. One model is based on PBE with the mixed aug-cc-pVDZ/cc-pVDZ basis set (aug-cc-pVDZ′), where the aug-cc-pVDZ basis set is used for all heavy atoms and cc-pVDZ for hydrogens. The aug-cc-pVDZ′ basis set was the basis set used in our previous alanine study. Another model is developed using the PBE functional but with the more balanced def2-TZVP basis set. Two models were developed with BP86 and B3LYP functionals also with the def2-TZVP basis set.
Experimental J-coupling constants were reported for cationic Gly3 peptides.26 As mentioned previously, our force field was fitted using reference forces for the Ace-Gly5-NMe peptide. The use of the blocked peptide during the fit was intentional as it allowed us to use a smaller QM region. While reducing the QM region size might not be important for glycine, it will be important for larger peptides. Our study thus provides insight into the influence of the terminal groups in an AFM based force field development.
A complication with using the blocked peptide is that experimental J-couplings were obtained with cationic peptides. The different terminal groups might affect the conformational distribution. Without a model for counterions, simulating cationic peptides by our force field model is currently not feasible. We decided to simulate also a zwitterionic peptide to get some insight into the influence of the terminal groups. Since we only have AFM glycine parameters for blocked terminal groups, the zwitterionic models were constructed by borrowing parameters, such as partial charges and short-range non-bonded parameters, from the AFM2020 zwitterionic alanine model. The parameters for the zwitterionic glycine model are reported in the supplementary material, and the GROMACS input files are provided on the website.37
Similar to our previous work, the scalar couplings computed with the Karplus equations42,43 were used to validate our models. We note that glycine is achiral with two symmetry equivalent basins connected by negations of both φ and ψ angles. Two conformations connected by a mirror symmetry must have the same J-coupling constants. On the other hand, typical peptide Karplus parameters were fitted in asymmetric environments,44,45 and not all Karplus equations provide identical J-coupling constants for the two symmetrically equivalent conformations of an achiral peptide.
The Karplus equation relates the backbone torsional angle, η, to the spin–spin coupling constant as follows:
$$J(\eta) = A\cos^{2}(\eta + \theta) + B\cos(\eta + \theta) + C \tag{3}$$
In this equation, η can be either the φ or ψ angle. The offset angle θ approximates the torsional angle between the magnetically coupled nuclei and the backbone angle η. We note that Eq. (3) is symmetric with respect to the negation of the torsional angle, η + θ, between the two magnetically coupled nuclei. For chiral amino acids, there is only one proton on Cα, which is the L-amino-acid proton. This leads to a θ value of −60° for 3J(HN,Hα) and 120° for 3J(Hα,C′), where C′ is the carbonyl carbon. For glycine, there are two enantiotopic Hαs that are chemically equivalent. The other Hα will have a θ value of 60° for 3J(HN,Hα) and −120° for 3J(Hα,C′). Although the two Hαs are not magnetically equivalent, only one value is reported in the experimental J-coupling data from the study by Graf et al.26 We thus decided that for 3J(HN,Hα) and 3J(Hα,C′), the mean of both Hαs should be computed to ensure that the two symmetry equivalent conformations will give the same J-coupling constant in a homogeneous environment.
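The effect of averaging over the two Hα protons can be checked numerically. The sketch below assumes the usual three-term Karplus form, J = A cos²(η + θ) + B cos(η + θ) + C, with the A, B, and C coefficients listed in Table II; the function names and dictionary keys are illustrative only.

```python
import math

# (A, B, C) in Hz from Table II; offsets theta in degrees for the two H-alphas.
KARPLUS = {
    "3J(HN,Ha)": ((7.09, -1.42, 1.55), (-60.0, 60.0)),
    "3J(Ha,C')": ((3.72, -2.18, 1.28), (120.0, -120.0)),
}


def karplus(eta_deg, coeffs, theta_deg):
    """J = A cos^2(eta + theta) + B cos(eta + theta) + C, angles in degrees."""
    a, b, c = coeffs
    x = math.cos(math.radians(eta_deg + theta_deg))
    return a * x * x + b * x + c


def glycine_j(name, eta_deg):
    """Mean coupling over the two enantiotopic H-alphas of glycine."""
    coeffs, thetas = KARPLUS[name]
    return sum(karplus(eta_deg, coeffs, t) for t in thetas) / len(thetas)


if __name__ == "__main__":
    coeffs, _ = KARPLUS["3J(HN,Ha)"]
    # A single proton gives different values for mirror-image conformations...
    print(karplus(-75.0, coeffs, -60.0), karplus(75.0, coeffs, -60.0))
    # ...whereas the two-proton mean is identical for phi and -phi.
    print(glycine_j("3J(HN,Ha)", -75.0), glycine_j("3J(HN,Ha)", 75.0))
```

Since the expression is even in η + θ, replacing η with −η simply swaps the contributions of the θ and −θ protons, so the mean is unchanged.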
In addition, 3J(HN,Cα) was frequently fit to a second order Fourier expression with terms that are symmetric, antisymmetric, and asymmetric to the plane of reflection.46,47 This is appropriate for chiral amino acids and might also be appropriate for glycine in a chiral environment. However, for glycine in a homogeneous environment, it would lead to symmetry equivalent conformations to have different 3J(HN,Cα) values, which is unphysical. We decided to use the following formula:
| (4) |
where antisymmetric and asymmetric terms were dropped from the original expression of Hennig et al.47 The Hennig expression was a fit to experiments. For glycine, only the symmetric component would be relevant. Dropping terms with different symmetries will not affect the fit to the symmetric component of Hennig’s data.
The deviations between simulation and experimental NMR data were quantified with34
$$\chi^{2} = \frac{1}{N}\sum_{i=1}^{N}\frac{\left(\langle J_i\rangle_{\mathrm{sim}} - J_{i,\mathrm{expt}}\right)^{2}}{\sigma_i^{2}} \tag{5}$$
where N is the number of coupling constants compared, ⟨Ji⟩sim is the trajectory average of the ith coupling constant, Ji,expt is its experimental value, and σi is the root mean square deviation (RMSD) between the experimental coupling constants and the experimental fit to the Karplus equation.44,47–49 A total of six coupling constants, 3J(HN,Hα), 3J(HN,C′), 3J(Hα,C′), 1J(N,Cα), 2J(N,Cα), and 3J(HN,Cα), were computed and compared to experiments. We note that for 3J(HN,Hα), 3J(HN,C′), and 3J(Hα,C′), an early work of Best et al.34 used the Karplus equation parameters fitted to torsional angles determined using NMR by Hu and Bax.44 However, the σ value used by Best was from an x-ray based fit instead of the same NMR based fit. Since Hu and Bax clearly showed the high accuracy and consistency of the NMR based fit, we decided to use the NMR based σ value also provided by Hu and Bax. We note that although using the NMR based parameters and σ value is more consistent, it actually led to a slightly poorer performance for our models.
For 1J(N,Cα) and 2J(N,Cα), the σ values between the experimental J-coupling49 and the fit to the Karplus equation were calculated as a consistency check. Our results are in good agreement with those reported by Best et al.34 For 3J(HN,Cα), the J-couplings were calculated using the symmetrized equation [Eq. (4)] and the original experimental data from the study by Hennig et al.47 It is not surprising that the σ value based on the symmetrized equation [Eq. (4)] is slightly higher. The σ values and the associated Karplus parameters used for computing χ2 are summarized in Table II.
TABLE II.
Coefficients of Karplus equations and σ used to evaluate χ2. Note that two Hαs exist for glycine. Thus, 3J(HN,Hα) and 3J(Hα,C′) have two θ angles, one for each Hα.
| Coupling | A (Hz) | B (Hz) | C (Hz) | θ (deg) | σ (Hz) |
|---|---|---|---|---|---|
| 3J(HN,Hα) | 7.09 | −1.42 | 1.55 | ±60 | 0.39 |
| 3J(HN,C′) | 4.29 | −1.01 | 0.00 | 180 | 0.32 |
| 3J(Hα,C′) | 3.72 | −2.18 | 1.28 | ±120 | 0.24 |
| 1J(N,Cα) | 1.70 | −0.98 | 9.51 | 0 | 0.59 |
| 2J(N,Cα) | −0.66 | −1.52 | 7.85 | 0 | 0.53 |
| 3J(HN,Cα) | Equation (4) | | | | 0.13 |
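For readers who want to reproduce the scoring, a minimal sketch of the χ2 evaluation is given below. It assumes the normalized form commonly used with this metric, namely the mean over all couplings of the squared deviation divided by σ². The function name and the numerical inputs in the usage example are hypothetical and are not data from this work.

```python
def chi_squared(j_sim, j_exp, sigma):
    """Mean squared deviation between simulated and measured couplings,
    each scaled by the uncertainty sigma of its Karplus fit."""
    if not (len(j_sim) == len(j_exp) == len(sigma)) or not j_sim:
        raise ValueError("inputs must be non-empty and of equal length")
    terms = [((s - e) / sg) ** 2 for s, e, sg in zip(j_sim, j_exp, sigma)]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Hypothetical couplings (Hz); only the sigma values come from Table II.
    simulated = [6.00, 1.00]
    measured = [6.39, 1.32]
    sigmas = [0.39, 0.32]
    print(chi_squared(simulated, measured, sigmas))
```

In this made-up case each coupling deviates by exactly one σ, so the result is 1 to within rounding; a χ2 near or below one therefore means the model is, on average, within the uncertainty of the Karplus parameterization.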
For each force field, the simulation was performed with a cubic box containing 892 BLYPSP-4F water molecules20 at 300 K and 1 bar. The Nosé–Hoover thermostat50,51 and Parrinello–Rahman barostat52 were used to control the temperature and pressure. The statistics for the J-coupling constants were collected from 1 µs trajectories for each model. We note that relative free energy differences between different basins in the secondary structure are fairly small. Long trajectories are required to achieve proper convergence. Our simulation box has 2709 atoms to provide a good description of the hydration environment for Ace-Gly3-NMe. Obtaining converged conformational distribution using such a box size with DFT is prohibitively expensive without our AFM algorithm.
Two traditional protein force fields, CHARMM53 C36m54,55 and Amber56 ff14SB,57,58 were also simulated under the same conditions but with TIP3P water59 as comparisons. The χ2 values for the C36m and ff14SB models were also computed with the parameters in Table II. To better understand the effects of terminal groups, χ2 was computed for each residue in addition to that for the whole peptide. Only J coupling constants related to the φ angle of the C terminus and ψ angle of the N terminus are included. We note that 3J(HN,Cα) depends on the φ and ψ angles of two adjacent residues. It is included, however, in the computation of the χ2 value for the central residue.
IV. RESULTS AND DISCUSSION
Table III reports the χ2 values for Ace-Gly3-NMe. The AFM based models are named with the exchange–correlation functional and basis set used for QM/MM calculations. A smaller χ2 value would indicate a better agreement with experiments. The performance of the four AFM models is fairly similar. The basis set plays a larger role than the choice of exchange–correlation functional in improving agreement with experiments. With PBE-D3(BJ) reference forces computed with the aug-cc-pVDZ′ basis set, the χ2 value is 2.33. The use of the def2-TZVP basis set leads to a significant reduction by 0.31, which is larger than the difference between different functionals. We note that def2-TZVP, although of triple-zeta quality, has significantly fewer basis functions than cc-pVTZ and is thus more computationally efficient.
TABLE III.
Calculated χ2 values for Ace-Gly3-NMe in water. The Grimme D3 dispersion with Becke–Johnson damping is used with all DFT functionals. The augmented functions on hydrogens are omitted in the aug-cc-pVDZ′ basis set. The B3LYP/def2-TZVP model is also referred to as the AFM2021 model for glycine.
| Model | χ2 |
|---|---|
| PBE/aug-cc-pVDZ′ | 2.33 |
| PBE/def2-TZVP | 2.02 |
| BP86/def2-TZVP | 1.95 |
| B3LYP/def2-TZVP | 1.86 |
| C36m | 1.93 |
| ff14SB | 3.44 |
Comparing different functionals, the hybrid functional, B3LYP, performed the best followed by BP86 and then PBE. The difference between B3LYP and PBE is only 0.16, which is about half of the improvement brought by the more balanced basis set. While B3LYP is indeed the most accurate, hybrid functionals are significantly more costly to use. When the computational resource is the limiting factor, it would be more beneficial to pick a well-balanced basis set than resort to a hybrid functional. With the B3LYP-D3(BJ)/def2-TZVP based AFM model giving the lowest χ2 value, we will release this model as the AFM2021 model for glycine.37
Both the CHARMM C36m and AMBER ff14SB models are highly robust protein force fields in widespread use. Although the AFM2021 model gives a smaller χ2 value than C36m, the difference between C36m and AFM2021 is small. AMBER ff14SB gives a larger χ2 value. However, this does not necessarily suggest that ff14SB will be less accurate in a protein environment. In a protein environment, the balance between peptide–peptide and peptide–water interactions would be different from that in the fully hydrated environment simulated. The solution phase J-coupling is just one small aspect of model accuracy. It is, however, very encouraging that the AFM2021 model is competitive with the best traditional protein force fields for modeling hydrated glycine without being fitted to any experiments.
To understand the effect of the terminal groups introduced by the acetyl and methylamine substitutions, the χ2 values for zwitterionic Gly3 and blocked Ace-Gly3-NMe for AFM2021 are reported in Table IV for each residue and for the whole peptide. It is interesting that the χ2 value for both the cationic N terminus and the central residue decreased for the zwitterionic Gly3 peptide when compared to the blocked peptide, whereas the anionic C terminus showed a larger χ2 value. This is consistent with the experimental J-coupling being measured with the cationic Gly3. The zwitterionic N-terminus is more similar to the cationic counterpart, and the blocked C terminus being neutral could be argued to be more similar to the neutral C terminus of a cationic peptide. We note that the zwitterionic model was constructed with borrowed parameters and would be less accurate than a zwitterionic model fitted directly. Part of the increase in the C terminus χ2 value might be a result of using such borrowed parameters.
TABLE IV.
Calculated χ2 values for blocked and zwitterionic Gly3 with AFM2021.
| | Blocked | Zwitterionic |
|---|---|---|
| N-terminus | 13.49 | 11.59 |
| Central | 0.75 | 0.58 |
| C-terminus | 0.62 | 2.97 |
| Peptide | 1.86 | 2.45 |
Table V reports the population in each conformational basin for Ace-Gly3-NMe as predicted by the AFM2021 model, CHARMM36m model, and AMBER ff14SB model. The φ and ψ angles are those for the central residue. The definition for each conformation is only based on torsional angles listed in Table V. The corresponding free energy surface profile is shown in Fig. 2. The two basins for each secondary structure are equivalent. The population difference between the two basins provides an estimate of the statistical uncertainty of the simulations. For all the models, the population difference for the 1 μs simulation is less than 2%.
TABLE V.
Populations (in percent) in each conformational basin classified according to the backbone angles of the central residue for Ace-Gly3-NMe with different force fields.
| Region | Torsional range | AFM2021 | C36m | ff14SB |
|---|---|---|---|---|
| α | −160° < φ < −20°, −120° < ψ < 50° | 3.9 | 6.0 | 14.3 |
| | 20° < φ < 160°, −50° < ψ < 120° | 4.3 | 6.3 | 15.1 |
| β | −180° < φ < −90°, 50° < ψ < 240° | 15.1 | 5.6 | 12.3 |
| | 90° < φ < 180°, −240° < ψ < −50° | 14.9 | 5.8 | 12.5 |
| PP-II | −90° < φ < −20°, 50° < ψ < 240° | 31.0 | 36.9 | 22.0 |
| | 20° < φ < 90°, −240° < ψ < −50° | 30.5 | 39.2 | 22.9 |
| Other | | 0.3 | 0.2 | 0.9 |
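The basin assignment in Table V can be expressed as a small classifier. The sketch below is one possible implementation, not the analysis script used for this work. It assumes angles in degrees and handles the ψ ranges that extend beyond ±180° by wrapping ψ into the window used by each half of the Ramachandran map; `wrap`, `classify`, and `populations` are illustrative names.

```python
from collections import Counter


def wrap(angle, low):
    """Map an angle in degrees into the interval [low, low + 360)."""
    return (angle - low) % 360.0 + low


def classify(phi, psi):
    """Assign a (phi, psi) pair in degrees to a Table V basin."""
    phi = wrap(phi, -180.0)
    if phi < 0:
        # Basins with negative phi are defined for psi between -120 and 240.
        psi = wrap(psi, -120.0)
        if -160 < phi < -20 and -120 < psi < 50:
            return "alpha"
        if -180 < phi < -90 and 50 < psi < 240:
            return "beta"
        if -90 < phi < -20 and 50 < psi < 240:
            return "PP-II"
    else:
        # Mirror-image basins are defined for psi between -240 and 120.
        psi = wrap(psi, -240.0)
        if 20 < phi < 160 and -50 < psi < 120:
            return "alpha"
        if 90 < phi < 180 and -240 < psi < -50:
            return "beta"
        if 20 < phi < 90 and -240 < psi < -50:
            return "PP-II"
    return "other"


def populations(angles):
    """Percentage of frames in each basin for an iterable of (phi, psi)."""
    counts = Counter(classify(phi, psi) for phi, psi in angles)
    total = sum(counts.values())
    return {basin: 100.0 * n / total for basin, n in counts.items()}
```

With this convention, a frame at (−75°, −170°) is counted as PP-II because ψ = −170° is equivalent to 190°, which lies inside the 50° to 240° window.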
FIG. 2.
The free energy profile as a function of the central residue φ and ψ angles for Ace-Gly3-NMe with (a) AFM2021, (b) CHARMM36m, and (c) Amber ff14SB models.
Compared to C36m and ff14SB, the AFM2021 glycine model showed the least helix and the most β-strand. The dominant conformation according to all models is poly-proline II (PP-II), consistent with the results from previous studies.60–63 While the overall trend is similar, the relative population of various conformations predicted by the three models differs slightly. The free energy profile for the three models is fairly similar, except that ff14SB shows a deeper basin for the α-helix and C36m shows a deeper basin for PP-II when compared to AFM2021. The same trend can also be clearly seen from the relative populations shown in Table V. The PP-II conformations are the most stable for both polyglycine and polyalanine.21
It is commonly understood that the helical region is stabilized by intra-peptide hydrogen bonds. For Ace-Gly3-NMe, it is possible to form one α-helix 1–5 hydrogen bond between the two terminal substitutions. A maximum of two 1–4 hydrogen bonds that stabilize the 3₁₀ helix can be formed. Table VI reports the number of hydrogen bonds for each model, listing separately the 1–5 hydrogen bonds and 1–4 hydrogen bonds. Even for the most helical ff14SB, the number of 1–5 hydrogen bonds is negligible, which indicates that the possible hydrogen bonds between terminal groups will not affect the likelihood of helical conformations for a blocked peptide when compared to peptides without the acetyl and methylamine groups. The ff14SB model forms the most 1–4 hydrogen bonds, which may explain the increased helical conformation. This difference between force fields reflects a different balance of intra-peptide and peptide–water hydrogen bonds.64–66 Without the use of combination rules, AFM based models potentially could achieve a better balance between intra-peptide and peptide–water interactions.
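Because the free energy difference between two basins depends only on the logarithm of their population ratio, the populations in Table V translate directly into relative free energies. The sketch below illustrates this for the AFM2021 column, summing the two mirror-image basins of each structure and using the 300 K simulation temperature; the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def free_energy_difference(pop_a, pop_b, temperature=300.0):
    """Free energy of basin A relative to basin B, in kcal/mol."""
    if pop_a <= 0 or pop_b <= 0:
        raise ValueError("populations must be positive")
    return -R_KCAL * temperature * math.log(pop_a / pop_b)


# AFM2021 column of Table V, both mirror-image basins summed (percent).
pp2 = 31.0 + 30.5
alpha = 3.9 + 4.3
dg = free_energy_difference(pp2, alpha)
print(f"G(PP-II) - G(alpha) = {dg:.2f} kcal/mol")
```

The result is roughly −1.2 kcal/mol, which illustrates how small the free energy differences between basins are and why long trajectories are needed to resolve them.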
TABLE VI.
The number of 1–4 and 1–5 intrapeptide hydrogen bonds formed for the hydrated Ace-Gly3-NMe at any given time. A hydrogen bond is formed when the O–H distance is less than 2.5 Å.
| Model | 1–4 | 1–5 |
|---|---|---|
| AFM2021 | ∼0 | 0.0001 |
| C36m | 0.0089 | 0.0024 |
| ff14SB | 0.1137 | 0.0085 |
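The distance criterion in the caption of Table VI can be illustrated with a short sketch. This is not the analysis script used for the table; it only assumes that the acceptor oxygen and donor hydrogen coordinates of each candidate 1–4 or 1–5 pair have already been extracted from a trajectory in which the peptide is whole, so that no periodic images need to be considered. The function name, the array layout, and the toy coordinates are hypothetical.

```python
import numpy as np

def mean_hbond_count(o_xyz, h_xyz, cutoff=2.5):
    """Average number of O...H contacts shorter than `cutoff` (angstrom) per frame.

    o_xyz, h_xyz: arrays of shape (n_frames, n_pairs, 3) with the acceptor
    oxygen and donor hydrogen positions of each candidate pair (assumed layout).
    """
    o_xyz = np.asarray(o_xyz, dtype=float)
    h_xyz = np.asarray(h_xyz, dtype=float)
    dist = np.linalg.norm(o_xyz - h_xyz, axis=-1)  # shape (n_frames, n_pairs)
    bonded = dist < cutoff                         # distance-only criterion
    return bonded.sum(axis=1).mean()               # average over frames

# Toy trajectory (made-up numbers): 4 frames, 2 candidate pairs.
o = np.zeros((4, 2, 3))
h = np.zeros((4, 2, 3))
h[:, 0, 0] = [2.0, 3.0, 2.4, 2.6]  # pair 0 is bonded in frames 0 and 2
h[:, 1, 0] = [3.5, 3.5, 3.5, 2.1]  # pair 1 is bonded in frame 3
average = mean_hbond_count(o, h)
print(average)  # 0.75
```

Averaging the per-frame count over the trajectory gives a number of the same kind as the entries in Table VI, where values far below one indicate that the hydrogen bond is rarely formed.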
Figure 3 shows the torsional energy diagram for the four DFT based models along with CHARMM C36m and AMBER ff14SB. Unlike Fig. 2, which shows the free energy as a function of the Ramachandran angles, Fig. 3 only shows the contribution of the intramolecular torsional terms. For all DFT functionals, the intramolecular torsional surfaces are similar, showing minima at a ψ angle of 180°, which are close to the PP-II and β-strand basins. The C36m and ff14SB torsional surfaces are very different from each other and from the DFT based models, indicating a different balance between non-bonded and bonded interactions. For both C36m and ff14SB, the non-bonded interactions are restricted by combination rules, whereas AFM allows each specific pair to be fit independently. The C36m torsional surface is plotted with the CMAP contribution. It shows a larger variation of 12 kcal/mol with more complex features that are likely to complement the non-bonded intramolecular interactions in describing the torsional surface. For example, there is a sharp peak at the (φ, ψ) angle of (−180°, 0°) that is most likely designed to capture some steric hindrance missing from the non-bonded terms. It is also interesting to note that although the DFT models and the two traditional protein models have vastly different torsional energy surfaces, their free energy profiles are quite similar.
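The restriction imposed by combination rules can be made concrete with a minimal sketch. It assumes Lennard-Jones-type pair parameters and the Lorentz-Berthelot rule (arithmetic mean of the size parameter, geometric mean of the well depth); it is not the AFM2021 energy expression, and every atom type, parameter value, and function name below is hypothetical.

```python
import math

# Hypothetical per-atom-type parameters: (sigma in angstrom, epsilon in kcal/mol).
ATOM_PARAMS = {"O": (3.0, 0.12), "H": (1.0, 0.03)}

# Hypothetical pair-specific table, as a pair-by-pair fit might store it.
PAIR_PARAMS = {("H", "O"): (1.8, 0.10)}

def combined(a, b):
    """Lorentz-Berthelot: arithmetic mean of sigma, geometric mean of epsilon."""
    (sigma_a, eps_a), (sigma_b, eps_b) = ATOM_PARAMS[a], ATOM_PARAMS[b]
    return 0.5 * (sigma_a + sigma_b), math.sqrt(eps_a * eps_b)

def pair_specific(a, b):
    """Use an independently fitted pair if present, else the combination rule."""
    key = tuple(sorted((a, b)))
    return PAIR_PARAMS.get(key, combined(a, b))

rule_based = combined("O", "H")     # fully determined by the two atom types
fitted = pair_specific("O", "H")    # free to differ from the rule
print(rule_based, fitted)
```

With a combination rule, changing the O–H interaction requires changing the per-atom parameters, which alters every other pair involving O or H. A pair-specific table removes that coupling at the cost of more parameters to fit.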
FIG. 3.
φ and ψ torsional energy surfaces for different models: (a) B3LYP/def2-TZVP, (b) BP86/def2-TZVP, (c) PBE/def2-TZVP, (d) PBE/aug-cc-pVDZ′, (e) CHARMM36m, (f) Amber ff14SB, (g) the difference between models based on B3LYP-D3(BJ) and PBE-D3(BJ) with the def2-TZVP basis set, and (h) the difference between models based on def2-TZVP and aug-cc-pVDZ′ for the PBE functional. The energy unit is kcal/mol.
Figure 3(g) shows the difference between B3LYP-D3(BJ) and PBE-D3(BJ) with the def2-TZVP basis set, and Fig. 3(h) shows the difference between def2-TZVP and aug-cc-pVDZ′ using PBE. From the different scales for Figs. 3(g) and 3(h), it is clear that the difference between the two basis sets is smaller than the difference between the two functionals. Since the change of basis set nevertheless produced the larger improvement in agreement with experiments, we believe that the improvement came mostly from better peptide–water interactions. The stability of different secondary structures is a delicate balance between intra-peptide and peptide–water interactions since different conformations have different solvent accessible surface areas and will interact differently with hydration water.
Parameters for all four models are provided in the supplementary material, and the GROMACS files for all four models are released,37 including input files for both the blocked and zwitterionic peptides. For the most accurate simulation of hydrated glycine, the AFM2021 model fitted with B3LYP/def2-TZVP should be used. All the data used for fitting our glycine models are released.37 These data allow the glycine models to be refit with additional training set data using the CReate Your Own Force-Field (CRYOFF) code67 from our group. Such a refit may be necessary in the future if deficiencies are identified for certain conformations related to glycine.
V. SUMMARY AND CONCLUSION
The previously introduced AFM based protocol for peptide force field development is slightly modified with simplified energy expressions. DFT based models for hydrated glycine are developed using this revised protocol. Our procedure allows the free energy profiles for the different DFT methods to be studied. The free energy profile translates to the population distribution through an exponential function. A good free energy profile requires adequate accuracy in all accessible regions of the phase space and provides more information than benchmarks based on the potential energy minimum. The accuracy of three different functionals, PBE, BP86, and B3LYP, in producing such a free energy profile is studied with two different basis sets. One basis set is a slightly reduced version of aug-cc-pVDZ, and the other is def2-TZVP.
Our free energy based benchmark indicates that the def2-TZVP basis set produces better agreement with experiments than the slightly modified version of aug-cc-pVDZ that was used previously in the development of an alanine model. The different functionals show rather similar performance, with B3LYP slightly better than BP86, followed closely by PBE. The most accurate AFM glycine model, which is based on B3LYP-D3(BJ) and the def2-TZVP basis set, is referred to as the AFM2021 model. Based on the calculated χ² value, the AFM2021 glycine model is in better agreement with experiments than either C36m or ff14SB for a hydrated peptide, although the margin with respect to C36m is very thin.
For Ace-Gly3-NMe, AFM2021 predicts about 62% PP-II, 30% β-strand, and 8% helical secondary structures. The free energy profile of AFM2021 is similar to that from C36m and ff14SB, with ff14SB predicting a slightly more stable helical region and C36m predicting more PP-II.
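The exponential relation between free energy and population noted above can be inverted to estimate relative free energies from the quoted populations. The sketch below assumes Boltzmann statistics, F_i − F_ref = k_B T ln(p_ref / p_i), and a temperature of 298.15 K; the temperature is an assumption made for illustration and is not taken from the simulation settings. The function and variable names are hypothetical.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def relative_free_energy(populations, temperature=298.15):
    """Free energy of each state relative to the most populated one (kcal/mol)."""
    kt = K_B * temperature               # assumed temperature, see lead-in
    p_ref = max(populations.values())
    return {name: kt * math.log(p_ref / p) for name, p in populations.items()}

# Populations quoted in the text for Ace-Gly3-NMe with AFM2021.
afm2021 = {"PP-II": 0.62, "beta-strand": 0.30, "helical": 0.08}
dF = relative_free_energy(afm2021)
for name, value in dF.items():
    print(f"{name:12s} {value:5.2f} kcal/mol")
```

Under these assumptions, the β-strand and helical regions lie roughly 0.4 and 1.2 kcal/mol above PP-II, which illustrates how small the free energy differences are that a force field must resolve to reproduce the populations.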
The previously published alanine model was refitted using the slightly simplified energy expressions for AFM2021. The updated alanine model and the new glycine model are released along with the entire training set data for glycine.37 Building on the previous success of AFM2020, the development of AFM2021 parameters for glycine and alanine gives us more confidence in using AFM to develop force fields for proteinogenic peptides. Using blocked peptides allows smaller QM regions in AFM while still sampling α-helix hydrogen bonds. While the AFM force fields give good accuracy for short hydrated peptides, their performance in a protein environment requires additional testing.
SUPPLEMENTARY MATERIAL
See the supplementary material for fragments used to determine the dispersion parameters, the summary of parameters of alanine and four glycine AFM models, and the parameters for the zwitterionic glycine model.
ACKNOWLEDGMENTS
This work was supported by the National Institutes of Health under Grant Nos. 1R01GM120578 and 2P20GM103429. The computational resource was provided by the Arkansas High Performance Computing Center with partial support from the Arkansas Bioscience Institute and an equipment supplement under Grant No. 1R01GM120578.
DATA AVAILABILITY
The force field models are available in the supplementary material. GROMACS input files and reference data for fitting are available at https://wanglab.uark.edu/Models.
REFERENCES
- 1. van Gunsteren W. F. and Dolenc J., Biochem. Soc. Trans. 36, 11 (2008). doi:10.1042/bst0360011
- 2. Nerenberg P. S. and Head-Gordon T., Curr. Opin. Struct. Biol. 49, 129 (2018). doi:10.1016/j.sbi.2018.02.002
- 3. Wu C., Ren P., and Ponder J. W., CCB Report 2010-01, Washington University School of Medicine, 2010.
- 4. Becke A. D., J. Chem. Phys. 140, 18A301 (2014). doi:10.1063/1.4869598
- 5. Iftimie R., Minary P., and Tuckerman M. E., Proc. Natl. Acad. Sci. U. S. A. 102, 6654 (2005). doi:10.1073/pnas.0500193102
- 6. Carloni P., Rothlisberger U., and Parrinello M., Acc. Chem. Res. 35, 455 (2002). doi:10.1021/ar010018u
- 7. Brust R., Lukacs A., Haigney A., Addison K., Gil A., Towrie M., Clark I. P., Greetham G. M., Tonge P. J., and Meech S. R., J. Am. Chem. Soc. 135, 16168 (2013). doi:10.1021/ja407265p
- 8. Ercolessi F. and Adams J. B., Europhys. Lett. 26, 583 (1994). doi:10.1209/0295-5075/26/8/005
- 9. Izvekov S., Parrinello M., Burnham C. J., and Voth G. A., J. Chem. Phys. 120, 10896 (2004). doi:10.1063/1.1739396
- 10. Wang L.-P., Martinez T. J., and Pande V. S., J. Phys. Chem. Lett. 5, 1885 (2014). doi:10.1021/jz500737m
- 11. Hudson P. S., Boresch S., Rogers D. M., and Woodcock H. L., J. Chem. Theory Comput. 14, 6327 (2018). doi:10.1021/acs.jctc.8b00517
- 12. Perdew J. P., Burke K., and Ernzerhof M., Phys. Rev. Lett. 77, 3865 (1996). doi:10.1103/physrevlett.77.3865
- 13. Grimme S., Antony J., Ehrlich S., and Krieg H., J. Chem. Phys. 132, 154104 (2010). doi:10.1063/1.3382344
- 14. Grimme S., Ehrlich S., and Goerigk L., J. Comput. Chem. 32, 1456 (2011). doi:10.1002/jcc.21759
- 15. Akin-Ojo O., Song Y., and Wang F., J. Chem. Phys. 129, 064108 (2008). doi:10.1063/1.2965882
- 16. Akin-Ojo O. and Wang F., J. Comput. Chem. 32, 453 (2011). doi:10.1002/jcc.21634
- 17. Li J. and Wang F., J. Chem. Phys. 143, 194505 (2015). doi:10.1063/1.4935599
- 18. Wang F., Akin-Ojo O., Pinnick E., and Song Y., Mol. Simul. 37, 591 (2011). doi:10.1080/08927022.2011.565759
- 19. Yuan Y., Ma Z., and Wang F., J. Phys. Chem. B 125, 1568 (2021). doi:10.1021/acs.jpcb.0c11618
- 20. Dunning T. H., J. Chem. Phys. 90, 1007 (1989). doi:10.1063/1.456153
- 21. Kendall R. A., Dunning T. H., and Harrison R. J., J. Chem. Phys. 96, 6796 (1992). doi:10.1063/1.462569
- 22. Weigend F., Furche F., and Ahlrichs R., J. Chem. Phys. 119, 12753 (2003). doi:10.1063/1.1627293
- 23. Weigend F. and Ahlrichs R., Phys. Chem. Chem. Phys. 7, 3297 (2005). doi:10.1039/b508541a
- 24. Shi Z., Chen K., Liu Z., Ng A., Bracken W. C., and Kallenbach N. R., Proc. Natl. Acad. Sci. U. S. A. 102, 17964 (2005). doi:10.1073/pnas.0507124102
- 25. Shi Z., Olson C. A., Rose G. D., Baldwin R. L., and Kallenbach N. R., Proc. Natl. Acad. Sci. U. S. A. 99, 9190 (2002). doi:10.1073/pnas.112193999
- 26. Graf J., Nguyen P. H., Stock G., and Schwalbe H., J. Am. Chem. Soc. 129, 1179 (2007). doi:10.1021/ja0660406
- 27. Salvador P., in Annual Reports on NMR Spectroscopy, edited by Webb G. A. (Academic Press, 2014), p. 185.
- 28. Perdew J. P., Phys. Rev. B 33, 8822 (1986). doi:10.1103/physrevb.33.8822
- 29. Perdew J. P., Phys. Rev. B 34, 7406 (1986). doi:10.1103/physrevb.34.7406
- 30. Becke A. D., Phys. Rev. A 38, 3098 (1988). doi:10.1103/physreva.38.3098
- 31. Becke A. D., J. Chem. Phys. 98, 5648 (1993). doi:10.1063/1.464913
- 32. Lee C., Yang W., and Parr R. G., Phys. Rev. B 37, 785 (1988). doi:10.1103/physrevb.37.785
- 33. Avbelj F., Grdadolnik S. G., Grdadolnik J., and Baldwin R. L., Proc. Natl. Acad. Sci. U. S. A. 103, 1272 (2006). doi:10.1073/pnas.0510420103
- 34. Best R. B., Buchete N.-V., and Hummer G., Biophys. J. 95, L07 (2008). doi:10.1529/biophysj.108.132696
- 35. Tobias D. J. and Brooks C. L., Biochemistry 30, 6059 (1991). doi:10.1021/bi00238a033
- 36. Abraham M. J., Murtola T., Schulz R., Páll S., Smith J. C., Hess B., and Lindahl E., SoftwareX 1-2, 19 (2015). doi:10.1016/j.softx.2015.06.001
- 37. See https://wanglab.hosted.uark.edu/Models for AFM2021 Parameters; retrieved May 17, 2021.
- 38. Anatole von Lilienfeld O. and Tkatchenko A., J. Chem. Phys. 132, 234109 (2010). doi:10.1063/1.3432765
- 39. Hornak V., Abel R., Okur A., Strockbine B., Roitberg A., and Simmerling C., Proteins: Struct., Funct., Bioinf. 65, 712 (2006). doi:10.1002/prot.21123
- 40. Baker J., Wolinski K., Malagoli M., Kinghorn D., Wolinski P., Magyarfalvi G., Saebo S., Janowski T., and Pulay P., J. Comput. Chem. 30, 317 (2009). doi:10.1002/jcc.21052
- 41. Füsti-Molnár L. and Pulay P., J. Chem. Phys. 117, 7827 (2002). doi:10.1063/1.1510121
- 42. Karplus M., J. Am. Chem. Soc. 85, 2870 (1963). doi:10.1021/ja00901a059
- 43. Karplus M., J. Chem. Phys. 30, 11 (1959). doi:10.1063/1.1729860
- 44. Hu J.-S. and Bax A., J. Am. Chem. Soc. 119, 6360 (1997). doi:10.1021/ja970067v
- 45. Schmidt J. M., Blümel M., Löhr F., and Rüterjans H., J. Biomol. NMR 14, 1 (1999). doi:10.1023/a:1008345303942
- 46. Edison A. S., Markley J. L., and Weinhold F., J. Biomol. NMR 4, 519 (1994). doi:10.1007/bf00156618
- 47. Hennig M., Bermel W., Schwalbe H., and Griesinger C., J. Am. Chem. Soc. 122, 6268 (2000). doi:10.1021/ja9928834
- 48. Wirmer J. and Schwalbe H., J. Biomol. NMR 23, 47 (2002). doi:10.1023/a:1015384805098
- 49. Ding K. and Gronenborn A. M., J. Am. Chem. Soc. 126, 6232 (2004). doi:10.1021/ja049049l
- 50. Hoover W. G., Phys. Rev. A 31, 1695 (1985). doi:10.1103/physreva.31.1695
- 51. Nosé S., Mol. Phys. 52, 255 (1984). doi:10.1080/00268978400101201
- 52. Parrinello M. and Rahman A., J. Appl. Phys. 52, 7182 (1981). doi:10.1063/1.328693
- 53. Brooks B. R., Brooks C. L. III, Mackerell A. D., Jr., Nilsson L., Petrella R. J., Roux B., Won Y., Archontis G., Bartels C., Boresch S., Caflisch A., Caves L., Cui Q., Dinner A. R., Feig M., Fischer S., Gao J., Hodoscek M., Im W., Kuczera K., Lazaridis T., Ma J., Ovchinnikov V., Paci E., Pastor R. W., Post C. B., Pu J. Z., Schaefer M., Tidor B., Venable R. M., Woodcock H. L., Wu X., Yang W., York D. M., and Karplus M., J. Comput. Chem. 30, 1545 (2009). doi:10.1002/jcc.21287
- 54. Best R. B., Zhu X., Shim J., Lopes P. E. M., Mittal J., Feig M., and MacKerell A. D., J. Chem. Theory Comput. 8, 3257 (2012). doi:10.1021/ct300400x
- 55. Huang J., Rauscher S., Nawrocki G., Ran T., Feig M., de Groot B. L., Grubmüller H., and MacKerell A. D., Nat. Methods 14, 71 (2017). doi:10.1038/nmeth.4067
- 56. Salomon-Ferrer R., Case D. A., and Walker R. C., Wiley Interdiscip. Rev.: Comput. Mol. Sci. 3, 198 (2013). doi:10.1002/wcms.1121
- 57. Maier J. A., Martinez C., Kasavajhala K., Wickstrom L., Hauser K. E., and Simmerling C., J. Chem. Theory Comput. 11, 3696 (2015). doi:10.1021/acs.jctc.5b00255
- 58. Cornell W. D., Cieplak P., Bayly C. I., Gould I. R., Merz K. M., Ferguson D. M., Spellmeyer D. C., Fox T., Caldwell J. W., and Kollman P. A., J. Am. Chem. Soc. 117, 5179 (1995). doi:10.1021/ja00124a002
- 59. Jorgensen W. L., Chandrasekhar J., Madura J. D., Impey R. W., and Klein M. L., J. Chem. Phys. 79, 926 (1983). doi:10.1063/1.445869
- 60. Andrews B., Zhang S., Schweitzer-Stenner R., and Urbanc B., Biomolecules 10, 1121 (2020). doi:10.3390/biom10081121
- 61. Bykov S. and Asher S., J. Phys. Chem. B 114, 6636 (2010). doi:10.1021/jp100082n
- 62. Takekiyo T., Imai T., Kato M., and Taniguchi Y., Biochim. Biophys. Acta 1764, 355 (2006). doi:10.1016/j.bbapap.2005.11.013
- 63. Schweitzer-Stenner R., Eker F., Huang Q., and Griebenow K., J. Am. Chem. Soc. 123, 9628 (2001). doi:10.1021/ja016202s
- 64. König G. and Boresch S., J. Phys. Chem. B 113, 8967 (2009). doi:10.1021/jp902638y
- 65. Nerenberg P. S. and Head-Gordon T., J. Chem. Theory Comput. 7, 1220 (2011). doi:10.1021/ct2000183
- 66. Best R. B., Zheng W., and Mittal J., J. Chem. Theory Comput. 10, 5113 (2014). doi:10.1021/ct500569b
- 67. Wang F., Akin-Ojo O., Li J., Ma Z., Yuan Y., and Rogers T. R., Create Your Own Force Field (CRYOFF), https://wanglab.hosted.uark.edu/cryoff/wanglab_CRYOFF.html; retrieved May 17, 2021.