Multiscale Simulations of Protein Landscapes: Using Coarse Grained Models as Reference Potentials to Full Explicit Models

Benjamin M Messer; Maite Roca; Zhen T Chu; Spyridon Vicatos; Alexandra Vardi Kilshtain; Arieh Warshel

doi:10.1002/prot.22640

Author manuscript; available in PMC: 2011 Apr 1.

Published in final edited form as: Proteins. 2010 Apr;78(5):1212–1227. doi: 10.1002/prot.22640


PMCID: PMC2822134 NIHMSID: NIHMS159971 PMID: 20052756

Abstract

Evaluating the free energy landscape of proteins and the corresponding functional aspects presents a major challenge for computer simulation approaches. This challenge is due to the complexity of the landscape and the enormous computer time needed for converging simulations. The use of simplified coarse grained (CG) folding models offers an effective way of sampling the landscape, but such a treatment may not give the correct description of the effect of the actual protein residues. A general way around this problem, put forward in our early work (Fan et al, Theor Chem Acc (1999) 103:77-80), uses the CG model as a reference potential for free energy calculations of different properties of the explicit model. This method is refined and extended here, focusing on improving the electrostatic treatment and on demonstrating key applications. These applications include: evaluation of changes of folding energy upon mutations, calculations of transition state binding free energies (which are crucial for rational enzyme design), evaluation of the catalytic landscape and simulation of the time dependent responses to pH changes. Furthermore, the general potential of our approach in overcoming major challenges in studies of structure function correlation in proteins is discussed.

Keywords: Coarse Grained model, free energy calculations, dielectric constants, proton transfer

1. Introduction

The elucidation of the landscape of proteins is an important element in attempts to explore the energetics and kinetics of the folding process. Furthermore, the ability to explore and sample large regions in the accessible conformational space can help in improving the description of functional properties and in exploring possible relationships between landscape and function. Unfortunately, detailed sampling of protein landscapes requires enormous computational resources that are not generally available. Thus it is important to develop multiscale approaches that will allow one to effectively generate the free energy surface for folding and related processes. Simplified protein models have been in use since our early work in 1975¹ and are being used extensively in folding studies²⁻⁸. Furthermore, recent efforts have extended our early 1975 idea to membranes (see the MARTINI force field, ref. 9). However, such coarse grained (CG) models are, by their nature, approximate, and it is important to be able to move from the coarse grained description to more realistic explicit models. A general way to resolve this problem was introduced by us some time ago¹⁰, where we proposed the use of a simplified model as a reference potential for explicit calculations of folding free energies. This idea, which can now be classified under the umbrella name of multiscale modeling, has become even more important in recent years in view of the interest in the relationship between landscape and function⁷,⁸,¹¹⁻¹³.

The idea of using a reference potential in multiscale modeling has been exploited by us in a wide range of problems, including accelerating QM/MM calculations¹⁴⁻¹⁶ and path integral calculations of nuclear quantum mechanical effects¹⁷,¹⁸. The idea of using a CG model as a reference potential in folding studies¹⁰ and related processes has also been explored recently by other workers (e.g. refs⁸,¹⁹,²⁰). Here we describe the details of the current version of our multiscale landscape modeling and provide instructive illustrations of its use in calculating the change in protein stability upon mutation. We also outline other promising applications of this powerful model.

2. Methods

2.1. The Simplified Protein Model

The simplified protein model used in this work is an extension of the model originally developed by Levitt and Warshel¹ and modified in ref. 10. In the current model we have made additional refinements of the original model, focusing on the introduction of specialized solvation terms designed to replicate the effects of the missing solvent molecules, and on other factors that are not included explicitly.

Our simplified protein model, which is depicted schematically in Fig. 1, is created by replacing the side chain of each residue by an effective “atom” named (X) and an additional dummy atom named (D). The atom X is usually placed at the geometrical center of the heavy atoms of the corresponding side chain (with a residue dependent charge and van der Waals equilibrium radius), where the distance between the newly placed X atom and the C_α atom, $r_{C_{α} - X}$, depends upon the type of the side chain of the residue it replaces. The dummy atoms are placed along the corresponding C_α - C_β vectors and serve as tools for rotational transformations in the process of moving between the simplified and explicit models. That is, the C_α - D bond is used to define the C_α - C_β bond of the explicit model. The actual rotational transformations are done by simple rotation around the specific bond. The dummy atoms do not have any charge or van der Waals interaction with the rest of the system. The backbone atoms of each residue are treated explicitly, and the interactions between main chain atoms are identical to those used in the explicit model but are then modified to reflect the missing solvent terms. In the case of ionized residues, we shift the $r_{C_{α} - X}$ of atom X to the charged center of the given residue.
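The construction described above can be illustrated with a minimal sketch. It assumes that the side chain heavy atoms (taken here to include C_β) are available as Cartesian coordinates and that the dummy atom D simply inherits the explicit C_β position; the function name and the example coordinates are hypothetical. In the actual model the C_α - X distance is a residue dependent parameter (Table I), whereas here it is just measured from the computed centroid.

```python
import numpy as np

def build_simplified_side_chain(c_alpha, c_beta, heavy_atoms):
    """Return (X position, D position, C_alpha-X distance) for one residue.

    X is placed at the geometric center of the side chain heavy atoms.
    D is placed at the explicit C_beta position (assumption of this sketch),
    so that the C_alpha-D bond defines the C_alpha-C_beta bond.
    """
    c_alpha = np.asarray(c_alpha, dtype=float)
    c_beta = np.asarray(c_beta, dtype=float)
    heavy = np.asarray(heavy_atoms, dtype=float)

    x_pos = heavy.mean(axis=0)                  # unweighted geometric center
    d_pos = c_beta.copy()                       # dummy atom keeps the C_beta position
    r_ca_x = np.linalg.norm(x_pos - c_alpha)    # measured C_alpha-X distance
    return x_pos, d_pos, r_ca_x

# Hypothetical coordinates (in Angstrom) for a two-atom side chain.
x_pos, d_pos, r_ca_x = build_simplified_side_chain(
    c_alpha=[0.0, 0.0, 0.0],
    c_beta=[1.0, 0.0, 0.0],
    heavy_atoms=[[1.0, 0.0, 0.0], [3.0, 0.0, 0.0]],
)
```

For ionized residues the text shifts X toward the charged center instead of the plain centroid; that shift is not included in this sketch.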

Fig. 1. A schematic illustration of the conversion of an explicit side chain to its simplified equivalent. The position of the explicit C_β is preserved, using a dummy atom, named D, and the geometrical center of the side chain heavy atoms is represented in the simplified model by the effective atom, named X.

The potential energy surface of the simplified (sp) model is written as

\begin{aligned} U_{sp} &= U_{mm}^{0} + U_{ss}^{0} + U_{ms}^{0} + U_{solv}^{0} \\ &= U_{mm} + U_{ss}^{ef} + U_{ss}^{QQ} + U_{s}^{self} + U_{ms}^{ef} + U_{ms}^{Qq} + Δ U_{mm}^{HB} + Δ U_{mm}^{phi - psi} + Δ U_{mm}^{qq} \end{aligned}

(1)

where m and s designate, respectively, “main” and “side”, U⁰ is the gas phase potential of the indicated term, and $U_{solv}^{0}$ is the missing effect of the solvent and the protein. The more detailed expression on the right hand side involves the effective steric potential (U^ef) and the screened interaction between different charges (Q and q designate full charges of ionized groups and atomic residual charges, respectively); the ΔU terms represent the corrections to the indicated potential terms due to the solvent, and $U_{s}^{self}$ is the self energy of the ionized groups. The nature of the different terms is described below.

2.1.1. The side-side interaction potential

The interaction between the simplified side chains is given by,

U_{ss} = U_{ss}^{ef} + U_{ss}^{QQ}

(2)

where $U_{ss}^{ef}$ is described by an “8-6” potential of the form,

U_{ss}^{ef} = ∑_{i < j} \frac{ε_{ij}^{0}}{C_{ij}^{scale}} [3 {(r_{ij}^{0} / r_{ij})}^{8} - 4 {(r_{ij}^{0} / r_{ij})}^{6}]

(3)

where $ε_{ij}^{0} = \sqrt{ε_{i}^{0} ε_{j}^{0}}$ and $r_{ij}^{0} = \sqrt{r_{i}^{0} r_{j}^{0}}$ . The parameters $ε_{ij}^{0}$ and $r_{ij}^{0}$ define, respectively, the well depth and equilibrium distance. The unitless parameter $C_{ij}^{scale}$ has a value of 10 when $U_{ss}^{ef}$ is calculated for polar - non polar and ionized - non polar residue pairs only. For side chain interactions between polar - polar and non polar - non polar residues, $C_{ij}^{scale}$ has a value of 1. These parameters were refined by minimizing the root-mean-square deviations between the calculated and observed values of both the atomic positions and the protein size (i.e., the radii of gyration) for a series of proteins. The corresponding refined parameters are given in Table I, together with the corresponding $r_{C_{a} - X}^{0}$ parameters. The term $U_{ss}^{QQ}$ represents the charge-charge interaction in the gas phase. This term is given by Eq. (4).

Table I.

Parameters for the U^ef term^a

Residue Name	R_{i}^{0} (Å)	ε_i (kcal mol^-1)	Polarity	r_{C_{a} - X}^{0} (Å)

-	4.26	0.05	nonpolar	1.52
-	4.79	1.17	polar	2.46
^bD	4.45	1.17	polar	2.49
^bE	4.68	1.17	polar	3.40
-	4.55	0.13	nonpolar	3.30
-	4.75	1.17	polar	3.20
-	4.68	0.18	nonpolar	2.60
^bK	4.74	1.17	polar	3.92
-	4.72	0.18	nonpolar	2.56
-	4.76	0.17	nonpolar	3.24
-	4.59	1.17	polar	2.52
-	4.72	0.31	nonpolar	2.46
-	4.79	1.17	polar	3.10
^bR	4.47	1.17	polar	4.11
-	4.54	1.17	polar	2.46
-	4.62	1.17	polar	2.46
-	4.70	0.14	nonpolar	2.06
-	4.48	0.35	nonpolar	3.30
-	4.51	1.17	polar	3.40

^a $R_{i}^{0}$ and ε_i are used for all interactions between simplified model residues. $r_{C_{a} - X}^{0}$ is the equilibrium distance between the C_α and the X atom of the given simplified residue.

^b For the ionizable residues, the equilibrium distance $r_{C_{a} - X}^{0}$ is shifted towards the geometrical center of the ionizable functional group.

U_{ss}^{QQ} = ∑_{i \neq j} \frac{332 Q_{i} Q_{j}}{ε_{eff} r_{ij}}

(4)

where Q_i designates the charge of the i^th ionized residue and ε_eff is the effective dielectric for charge-charge interactions. Here we follow the idea established in many of our works (e.g. refs²¹,²²) and use a large effective dielectric (ε_eff = 40), even for protein interiors. This type of dielectric has recently been found to provide very powerful insight in studies of protein stability (see refs²¹,²³) and is thus expected to be very useful in modeling the electrostatic contribution to the stability of the simplified model.
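As an illustration of Eqs. (3) and (4), the following minimal sketch evaluates the two side-side terms for a single pair of simplified residues. The function names are hypothetical, and the choice C_scale = 1 for the Asp-Lys example is an assumption of this sketch (both residues are listed as polar in Table I); distances are in Å and energies in kcal/mol.

```python
import math

def u_ss_ef(r, r0_i, r0_j, eps_i, eps_j, c_scale=1.0):
    """'8-6' effective steric term of Eq. (3) for one residue pair."""
    r0 = math.sqrt(r0_i * r0_j)      # combined equilibrium distance
    eps = math.sqrt(eps_i * eps_j)   # combined well depth
    x = r0 / r
    return (eps / c_scale) * (3.0 * x**8 - 4.0 * x**6)

def u_ss_qq(r, q_i, q_j, eps_eff=40.0):
    """Screened charge-charge term of Eq. (4) for one residue pair."""
    return 332.0 * q_i * q_j / (eps_eff * r)

# Asp (D) and Lys (K) parameters taken from Table I.
r0_dk = math.sqrt(4.45 * 4.74)
steric_at_minimum = u_ss_ef(r0_dk, 4.45, 4.74, 1.17, 1.17)   # equals -eps_ij / C_scale
salt_bridge_10A = u_ss_qq(10.0, -1.0, +1.0)                  # -0.83 kcal/mol
```

At r = r_ij^0 the bracket in Eq. (3) equals 3 - 4 = -1, so the steric term reduces to minus the (scaled) well depth, which is a convenient sanity check for any implementation.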

2.1.2. Effective electrostatic terms that involve the main chain

The effective electrostatic interactions between the side chains and the main chain and between the main chain groups and themselves are given by

U_{ms}^{Qq} + U_{mm}^{qq} = ∑_{i} ∑_{k} 332 (Q_{i} q_{k} / ε_{eff}^{'} r_{ik}) + ∑_{k} ∑_{k' \neq k} 332 (q_{k} q_{k'} / ε_{eff}^{″} r_{kk'})

(5)

Here Q_i and q_k designate the charges of ionized residues and residual atomic charges on the main chain, respectively (there are no residual charges on the simplified unionized side chains). The different dielectrics represent the compensation of the gas phase electrostatic interactions by the solvent and the protein. The dielectric for the Q and q terms is taken (following our previous studies²²) as $ε_{eff}^{'}$ = 10 for the charge (Q)-residual charge (q) interaction and $ε_{eff}^{″}$ = 4 for the residual charge (q)-residual charge (q) interaction. The potential, U, is given in kcal/mole, where Q and q are given in atomic charges and r in Å. The factor 332 in Eq. (5) is used for these units (see the early work of Warshel and Russell²⁴ for further details). Note that $U_{mm}^{qq}$ is the sum of the gas phase qq part of U_mm plus the screening by the surrounding solvent and the protein. This treatment also includes the residual charge interactions that are involved in hydrogen bonding.
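A minimal sketch of Eq. (5) is given below, using a dielectric of 10 for the Q-q term and 4 for the q-q term as stated above. The function names and the example charges and coordinates are hypothetical. The double sum over k and k' ≠ k is implemented literally as written, so each main chain pair is visited twice, and no bonded-pair exclusions are applied; a production force field would presumably handle both points differently.

```python
import math

def coulomb(q_a, q_b, r, eps):
    """Screened Coulomb interaction in kcal/mol (charges in e, r in Angstrom)."""
    return 332.0 * q_a * q_b / (eps * r)

def main_chain_electrostatics(ionized, residual, eps_Qq=10.0, eps_qq=4.0):
    """Return (U_ms^Qq, U_mm^qq) of Eq. (5).

    ionized, residual: lists of (charge, (x, y, z)) tuples.
    """
    u_ms = sum(coulomb(Q, q, math.dist(R_i, r_k), eps_Qq)
               for Q, R_i in ionized
               for q, r_k in residual)
    u_mm = sum(coulomb(q_k, q_l, math.dist(r_k, r_l), eps_qq)
               for a, (q_k, r_k) in enumerate(residual)
               for b, (q_l, r_l) in enumerate(residual)
               if a != b)            # literal k' != k sum: each pair counted twice
    return u_ms, u_mm

# Hypothetical example: one ionized side chain and two main chain partial charges.
ionized = [(1.0, (0.0, 0.0, 0.0))]
residual = [(0.5, (0.0, 0.0, 2.0)), (-0.5, (0.0, 0.0, 4.0))]
u_ms, u_mm = main_chain_electrostatics(ionized, residual)
```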

2.1.3. The self energy term

The self energy term $U_{s}^{self}$ represents the interaction of the side chain with its environment and accounts for the crucial free energy of transferring the side chain to the protein when all other groups are not ionized (note that the effect of interaction with other ionized groups is accounted for separately by $U_{ss}^{QQ}$ ). The overall $U_{s}^{self}$ term is given by,

U_{s}^{self} = ∑_{i} [U_{np} (N_{np}^{i}) + U_{polar} (N_{polar}^{i})]

(6)

where i runs over all ionized residues and $N_{np}^{i}$ and $N_{polar}^{i}$ are the number of nonpolar and polar residues in the neighborhood of the i^th residue. The functions U_np and U_polar are given by

U_{np} = \begin{cases} 4 exp [- 0.2 {(N_{np} - 6)}^{2}] & N_{np} \leq 6 \\ 4 & N_{np} > 6 \end{cases}

(7)

and

U_{polar} = \begin{cases} - 2 exp [- 0.2 {(N_{polar} - 4)}^{2}] & N_{polar} \leq 4 \\ - 2 & N_{polar} > 4 \end{cases}

(8)

The number of nonpolar residues neighboring the i^th ionized residue is expressed by the analytical function

N_{np}^{i} = ∑_{j (np)} G (r_{ij})

(9)

With

G (r_{ij}) = \begin{cases} 1 & r_{ij} \leq r_{np} \\ exp [- 6 {(r_{ij} - r_{np})}^{2}] & r_{ij} > r_{np} \end{cases}

(10)

where r_np is the nonpolar radius defining the cutoff range of nonpolar neighboring residues (typically r_np = 7 Å), and r_ij is the separation between the ionized residue i and nonpolar residue j. The same expression is used for neighboring polar residues (N_polar), where r_p = 7 Å. This treatment is aimed at capturing the fact that an ionized group has to pay a very large amount of energy for moving from water to a nonpolar environment²⁴,²⁵ and is usually surrounded by polar residues or water molecules²²,²⁴.

It should be noted that the calculations of N_polar and N_np are done while treating the ionizable residues as non polar residues. This is done since the effect of charging these groups is taken into account separately by the $U_{ss}^{QQ}$ term.
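Eqs. (6)-(10) can be sketched for a single ionized residue as follows. The function names are hypothetical, and the inputs are assumed to be precomputed distances (in Å) from the ionized residue to its nonpolar and polar neighbors. The default cutoffs follow the 7 Å values quoted above and are exposed as parameters, since Table IV lists fitted maximum values of 7 Å (nonpolar) and 5 Å (polar).

```python
import math

def g_count(r, r_cut):
    """Smooth neighbor counting function G(r_ij) of Eq. (10)."""
    return 1.0 if r <= r_cut else math.exp(-6.0 * (r - r_cut) ** 2)

def neighbor_number(distances, r_cut):
    """Effective number of neighbors, Eq. (9)."""
    return sum(g_count(r, r_cut) for r in distances)

def u_np(n_np):
    """Nonpolar contribution to the self energy, Eq. (7)."""
    return 4.0 * math.exp(-0.2 * (n_np - 6.0) ** 2) if n_np <= 6.0 else 4.0

def u_polar(n_polar):
    """Polar contribution to the self energy, Eq. (8)."""
    return -2.0 * math.exp(-0.2 * (n_polar - 4.0) ** 2) if n_polar <= 4.0 else -2.0

def u_self_residue(np_distances, polar_distances, r_np=7.0, r_p=7.0):
    """Self energy of one ionized residue: the summand of Eq. (6)."""
    n_np = neighbor_number(np_distances, r_np)
    n_polar = neighbor_number(polar_distances, r_p)
    return u_np(n_np) + u_polar(n_polar)

# Hypothetical environment: six close nonpolar neighbors, no polar neighbors.
buried = u_self_residue([5.0] * 6, [])
# Hypothetical environment: no nonpolar neighbors, four close polar neighbors.
solvated_like = u_self_residue([], [5.0] * 4)
```

With these functional forms the penalty saturates at +4 once six or more nonpolar neighbors surround the charge, and the stabilization saturates at -2 once four or more polar neighbors are present, so the two example environments span nearly the full range of the term.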

The parameters in Eqs. (6-10) were determined (see section 3.1.3) by comparing the results obtained with these equations to the results of the actual solvation energy evaluated by the semimacroscopic version of the Protein Dipole Langevin Dipole (PDLD/S) in its linear response approximation (LRA) version (PDLD/S-LRA).

In principle it would be useful to include in $U_{s}^{self}$ the hydrophobic contribution of non polar residues and the effect of the solvent on the self energy of polar residues. However, these contributions are included implicitly in $U_{ss}^{ef}$ and thus we only consider explicitly the $U_{s}^{self}$ of ionized residues. This is done by using different $ε_{ij}^{0}$ in Eq. (3) for hydrophobic and polar residues.

2.1.4. The main chain torsional potential

The secondary structure of proteins depends strongly on the solvation of the main chains. Thus we added the correction potential $Δ U_{mm}^{phi - psi}$ that is used to modify the gas phase potential. This solvation potential is given by,

Δ U_{mm}^{phi - psi} = ∑_{i = 1}^{4} A_{i} g (φ - φ_{0}^{i}, ω_{φ, 0}^{i}) g (ψ - ψ_{0}^{i}, ω_{ψ, 0}^{i})

(11)

where

g (x, ω) = exp (- 0.693 (1 - cos (x)) / sin (ω / 2))

(12)

The values of $φ_{0}^{i}$ and $ψ_{0}^{i}$ are chosen to represent the minima of the α-helix and β-sheet regions of the Ramachandran plot, while A_i and $ω_{0}^{i}$ have been selected to tune the simple model α-helix and β-sheet regions to match those of the explicit model (see section 3.1.1). The specific values of these parameters are listed in Table II.
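The functional form of Eqs. (11) and (12) can be sketched as below. Angles are handled in degrees and converted internally. The well parameters used in the example (depth, center and widths) are illustrative placeholders chosen for this sketch, not the Table II values, and the function names are hypothetical.

```python
import math

def g(x_deg, width_deg):
    """Periodic well shape of Eq. (12); both arguments in degrees."""
    x = math.radians(x_deg)
    w = math.radians(width_deg)
    return math.exp(-0.693 * (1.0 - math.cos(x)) / math.sin(w / 2.0))

def delta_u_phi_psi(phi, psi, wells):
    """Torsional solvation correction of Eq. (11).

    wells: iterable of (A, phi0, width_phi, psi0, width_psi) tuples,
    with A in kcal/mol and all angles in degrees.
    """
    return sum(A * g(phi - phi0, w_phi) * g(psi - psi0, w_psi)
               for A, phi0, w_phi, psi0, w_psi in wells)

# Placeholder well near the alpha-helical region (hypothetical parameters).
wells = [(-1.0, -90.0, 60.0, -30.0, 60.0)]
at_center = delta_u_phi_psi(-90.0, -30.0, wells)    # equals A at the well center
off_center = delta_u_phi_psi(-30.0, -30.0, wells)   # phi displaced by 60 degrees
```

Because 0.693 ≈ ln 2, g falls to one half when 1 - cos(x) equals sin(ω/2), which gives a direct reading of how ω controls the width of each well.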

Table II.

Parameters for the $Δ U_{mm}^{phi - psi}$ term^a

Region

A_i (kcal mol^-1)

φ_{0}^{i}

ω_{φ, 0}^{i}

ψ_{0}^{i}

ω_{ψ, 0}^{i}

α - helix

-1

-95

-5

β - sheet

-10

-150

175

γ′-turn

-3

-80

L-α - helix

-5


Angular values are given in degrees.

2.1.5. The effective hydrogen bonding potential

Solvation effects play a crucial role in determining hydrogen bonding. A part of this effect is taken into account by the $ε_{eff}^{″}$ of Eq. (5). However, this uniform screening does not yield the correct physics of hydrogen bonding, as established by our simulation studies, in which the RMS deviation between the calculated and observed structures drifts upward over time. Furthermore, even the introduction of $Δ U_{mm}^{phi - psi}$ has not resolved this problem.

In order to obtain better hydrogen bonding representation we introduced a potential $Δ U_{mm}^{HB}$ , where a small Gaussian barrier was added to the potential well used to describe hydrogen bonds. At room temperature, this barrier is easily crossed during hydrogen bond formation while preventing the already formed bonds from breaking. This treatment gives $Δ U_{mm}^{HB}$ the functional form

Δ U_{mm}^{HB} = \begin{cases} - 9 & r \leq 2.0 \\ - 9 exp (- 15 {(r - 2.3)}^{2}) & r > 2.0 \end{cases}

(13)

The inclusion of both $Δ U_{mm}^{phi - psi}$ and $Δ U_{mm}^{HB}$ greatly increases the long term stability of the simple model, with a maximum RMS deviation of less than 3.0 Å over the course of a 100 ps trajectory.
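Eq. (13) is simple enough to be written down directly. The sketch below follows the printed piecewise form literally (note that, as printed, the two branches take different values at r = 2.0); the function name is hypothetical, and the units are assumed to be Å for r and kcal/mol for the energy, consistent with the rest of the model.

```python
import math

def delta_u_hb(r):
    """Hydrogen bond correction of Eq. (13), implemented as printed."""
    if r <= 2.0:
        return -9.0
    return -9.0 * math.exp(-15.0 * (r - 2.3) ** 2)

# Profile of the correction over a range of hydrogen bond distances.
profile = [(r / 10.0, delta_u_hb(r / 10.0)) for r in range(15, 41, 5)]
```

The Gaussian branch is centered at 2.3 and decays quickly, so the correction is essentially zero beyond roughly 3.5.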

2.1.6. The effective side chain-main chain potential

The effective non electrostatic interaction between the side chain and the main chain, $U_{ms}^{ef}$ , is represented by the same type of potential as the potential in Eq. (3) (see Table I).

2.2. Simple to Explicit Transformation

The greatest difficulty in multiscale simulations lies in accurately and efficiently recreating the original all atom system from the simplified model system. Our strategy is based on the approach introduced by Fan et al.¹⁰ In this approach we consider the partition functions Q_sp and Q_ep of the simplified and explicit models, respectively, as well as the ratio between them, Q_sp / Q_ep. This gives the free energy of transfer between the surfaces by

exp (- Δ G_{ep \to sp} β) = Q_{sp} / Q_{ep}

(14)

where β = 1/k_BT, k_B being the Boltzmann constant and T being the absolute temperature. In order to evaluate Eq. (14), we start by expressing the simplified and explicit potentials as

U_{sp} = U_{m} (\tilde{R}) + U_{s}^{sp} (\tilde{R})

(15)

and

U_{ep} = U_{m} (\tilde{R}) + U_{s}^{ep} (\tilde{r}, \tilde{R})

(16)

where R̃ are the coordinates of the simplified model and r̃ are the coordinates of the explicit side chain atoms, relative to the centers of the corresponding side chains in the simplified model. Now our task is to evaluate the free energy of moving from the simplified to the explicit model, ΔG_sp→ep, or −ΔG_ep→sp. This can be expressed as the ratio between the corresponding partition functions

\begin{aligned} exp (- Δ G_{ep \to sp} β) &= Q_{sp} / Q_{ep} \\ &= \frac{\int d \tilde{R} d \tilde{r} exp {- U_{sp} (\tilde{R}) β}}{\int d \tilde{R} d \tilde{r} exp {- U_{ep} (\tilde{r}, \tilde{R}) β}} \\ &= \frac{\int d \tilde{R} d \tilde{r} exp {- (U_{sp} (\tilde{R}) - U_{ep} (\tilde{r}, \tilde{R})) β} exp {- U_{ep} (\tilde{r}, \tilde{R}) β}}{\int d \tilde{R} d \tilde{r} exp {- U_{ep} (\tilde{r}, \tilde{R}) β}} \end{aligned}

(17)

Thus we have

exp (- Δ G_{ep \to sp} β) = {〈 exp {- (U_{sp} (\tilde{R}) - U_{ep} (\tilde{r}, \tilde{R})) β} 〉}_{V_{ep}}

(18)

where the average is done formally over the combined R̃+r̃ coordinate set. Eq. (18) can be evaluated by using a free energy perturbation approach with a mapping potential:

U_{m} = U_{ep} (\tilde{r}, \tilde{R}) (1 - λ_{m}) + U_{sp} (\tilde{R}) λ_{m}

(19)

so that we have,

exp (- δ Δ G_{m \to m + 1} β) = {〈 exp {- (U_{m + 1} - U_{m}) β} 〉}_{V_{m}}

(20)

with

Δ G_{ep \to sp} = ∑_{m = 1}^{n + 1} δ Δ G_{m \to m + 1}

(21)
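The windowed free energy perturbation scheme of Eqs. (19)-(21) can be illustrated on a toy problem for which the answer is known analytically. In the sketch below, U_ep and U_sp are replaced by one-dimensional harmonic wells with hypothetical force constants, and the configurations on each mapping potential are drawn exactly from the corresponding Gaussian distribution instead of being generated by MD or MC; all names and numerical values are assumptions of this sketch.

```python
import numpy as np

def fep_harmonic(k_ep=1.0, k_sp=4.0, n_frames=11, n_samples=20000, kT=0.6, seed=0):
    """Estimate Delta G (ep -> sp) for two 1D harmonic wells by windowed FEP."""
    rng = np.random.default_rng(seed)
    beta = 1.0 / kT
    lambdas = np.linspace(0.0, 1.0, n_frames)     # mapping parameter of Eq. (19)

    def u_ep(x):
        return 0.5 * k_ep * x**2

    def u_sp(x):
        return 0.5 * k_sp * x**2

    total = 0.0
    for lam, lam_next in zip(lambdas[:-1], lambdas[1:]):
        # The mapping potential is harmonic with this effective force constant,
        # so its Boltzmann distribution can be sampled exactly.
        k_m = (1.0 - lam) * k_ep + lam * k_sp
        x = rng.normal(0.0, np.sqrt(kT / k_m), n_samples)
        du = (lam_next - lam) * (u_sp(x) - u_ep(x))               # U_{m+1} - U_m
        total += -kT * np.log(np.mean(np.exp(-beta * du)))        # Eq. (20)
    return total                                                  # Eq. (21)

estimate = fep_harmonic()
exact = 0.5 * 0.6 * np.log(4.0 / 1.0)   # -kT ln(Q_sp / Q_ep) for harmonic wells
```

With 11 frames the mapping parameter runs from 0 to 1 in steps of 0.1, and the estimate should agree with the analytic value of about 0.42 (in units of the chosen kT scale) to within the statistical noise of the sampling.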

In many applications we are not interested in the total partition function and the total free energy, but in the potential of mean force (PMF) of the simplified and explicit models. In such cases we define the PMF by considering a partial partition function

Q = \int q (X) d X

(22)

and

q (X) = exp {- Δ g (X) β},

(23)

where X is the parameter that defines the PMF, and Δg_sp(X) and Δg_ep(X) are defined in the thermodynamic cycle of Fig. 2. The key to the transition between the simplified and explicit PMFs is the ratio

Fig. 2. The thermodynamic cycle used to calculate the change in free energy Δg_ep for a generic process. Having calculated the free energy change of the simple model, Δg_sp, umbrella sampling can be used to calculate the free energy change ΔΔg_sp→ep for the initial and final states to obtain Δg_ep.

exp {- Δ g_{ep \to sp} (X) β} = q_{sp} (X) / q_{ep} (X)

(24)

Here we follow the same strategy as in Eq. (18) and obtain,

exp {- Δ g_{ep \to sp} (X) β} = {〈 δ (X - X') exp {- (U_{sp} (\tilde{R}, X') - U_{ep} (\tilde{r}, \tilde{R}, X')) β} 〉}_{V_{ep}}

(25)

If we use the mapping potential of Eq. (19) we will have,

Δ g_{ep \to sp} (X) = ∑_{m = 1}^{n + 1} δ Δ g_{m \to m + 1} (X)

(26)

with,

exp {- δ Δ g_{m \to m + 1} (X) β} = {〈 δ (X - X') exp {- (U_{m + 1} - U_{m}) β} 〉}_{V_{m}}

(27)

In evaluating the free energy terms of Eqs. (18) or (27), we typically start from the simplified model in a given configuration and construct the explicit model by replacing the dummy atom with the explicit C_β. Then, using a rotamer library²⁶ (with a random search followed by a brief steepest descent torsional minimization), we select the side chain configurations that minimize the potential energy of the system, while also minimizing the displacement between the center of the explicit side chain and the position of the simplified atom. After this treatment, which considers all the rotational degrees of freedom within the side chains, we apply a steepest descent minimization over the Cartesian degrees of freedom within each side chain to obtain the relaxed side chain structure. Once the explicit structure has been minimized, we can use Molecular Dynamics (MD) or Monte Carlo (MC) for the evaluation of the free energy of moving from the simplified to the explicit model.

3. Results and Discussion

3.1. Validation and Parameterization

A crucial element of the introduction of any analytic potential surface is the parameterization and validation process. This section considers aspects of the refinements that were not considered in the previous sections.

3.1.1. Refining $Δ U_{mm}^{phi - psi}$

The optimization of the parameters that determine the solvation contribution from $Δ U_{mm}^{phi - psi}$ and $Δ U_{mm}^{HB}$ involved the use of the alanine dipeptide as a model system. The explicit free energy surface for this system was generated both for a gas phase model and for a solution model (using the surface constrained all atom solvent (SCAAS) model²⁷,²⁸). In order to recreate the hydrated landscape by the simplified model, it was necessary to add to $U_{mm}^{phi - psi}$ four potential wells at the minima of the α-helix (φ ∼ -90°, ψ ∼ -30°), β-sheet (φ ∼ -160°, ψ ∼ 160°), γ′-turn (φ ∼ -90°, ψ ∼ 60°), and L-α-helix (φ ∼ 60°, ψ ∼ 40°) regions of the Ramachandran (RC) diagram. The relevant parameters are given in Table II.

The addition of the $Δ U_{mm}^{phi - psi}$ term (and in some respects the $Δ U_{mm}^{HB}$ term) to the simplified model created a surface with minima at the typical regions obtained in other works (e.g. Lovell et al.²⁹). Our RC diagram is presented in Fig. 3 for both the explicit and simplified models. The simplified surface is clearly not perfect, but it allows us to capture the main features of the explicit model. The notations for the different regions are taken from ref. 29.

Fig. 3. The Ramachandran diagram for alanine dipeptide, obtained by the explicit model (A) and the simplified model (B).

3.1.2. Examination of ΔU^HB

The term $Δ U_{mm}^{HB}$ is strongly coupled to $Δ U_{mm}^{phi - psi}$ and it is important to examine whether, in addition to reproducing a reasonable phi-psi map, we can retain a reasonable description of stable helices and other structures that are determined by hydrogen bonding. With $Δ U_{mm}^{HB}$ we find that the small barrier to hydrogen bond formation (∼1 kcal mol^-1) is easily crossed at room temperature, but that there is a significant barrier to the breaking of hydrogen bonds once they have been formed. By including $Δ U_{mm}^{HB}$ in Eq. (1), we find that even full scale protein systems like ubiquitin or chorismate mutase²¹ are structurally stable (RMS displacement < 3 Å) for more than 100 ps of simulation.

3.1.3. Refining $U_{s}^{self}$

The introduction of $U_{s}^{self}$ is perhaps the most important new element in our approach. This potential clearly improves the physical basis of our model, but the reliability of the model still depends crucially on the parameterization of the self energy potential. The refinement was done by selecting different ionizable groups in several protein sites, evaluating their self energy using the PDLD/S-LRA approach²⁸ and adjusting the parameters in $U_{s}^{self}$ to obtain the best agreement between the PDLD/S-LRA results and those of the simplified model.

In the first step of this refinement procedure we started by considering $U_{s}^{self}$ in a fully hydrophobic, non polar protein environment. For this purpose we created the relevant environment in silico. That is, we mutated residues 21-56 (out of the total of 62 residues) of the protein SSO7d (PDB ID 1SSO) to all-hydrophobic residues, except Ile29, which we mutated into Glu. This created an all-hydrophobic environment surrounding the ionizable residue Glu29 (see Fig. 4), where Glu29 is buried by residues 22 to 33 and 43 to 56. This model, which is referred to as “the fully buried Glu29”, was then modified to create two additional environments for Glu29. In the second, which is referred to as the “semi-buried” environment, we deleted residues 49 to 56, making Glu29 more exposed to solvent. Finally, for the third, which is referred to here as the “exposed” environment, we mutated Val22, Met24 and Phe47 into glycine, making Glu29 fully exposed to solvent.

Fig. 4. The system used as a benchmark for the behavior of the self energy term in a non polar environment. The backbone conformation of protein 1SSO residues 21-56 was taken, and the fully hydrophobic sequence after the mutations is reported in the figure. The figure describes the hydrophobic environment generated around Glu29, buried by residues 22-33 and 43-56.

Next we calculated the solvation free energy of the ionized Glu29 in all three hydrophobic environments described above. We used the PDLD/S-LRA approach as well as Eqs. (6-10) of the simplified model, and adjusted the parameters in these equations to get the best agreement between the two sets of values. The simplified model and PDLD/S-LRA solvation energies for the artificial protein environments are given in the first three entries of Table III.

Table III.

Comparison of the self energy calculated by the CG and the PDLD/S-LRA models

Test Systems	The CG model (kcal/mole)	The PDLD/S-LRA (kcal/mole)

Glu29 buried	3.9	5.2
Glu29 semi buried	3.2	3.8
Glu29 exposed	-0.1	1.1
Apoflavodoxin Glu 106	3.7	4.7
Dimer DHFR Arg 28	2.9	2.3
Dimer DHFR Asp 117	0.3	-0.6
Interleukin Glu 111	3.0	4.5
Interleukin Glu 113	3.6	3.3


After refining the simplified model to obtain the best description of the effect of nonpolar residues on the self energy, we turned to the refinement of the parameters that reflect the polar contributions. This was done by the same procedure described above, but now for a number of real protein sites (where we have a significant number of polar residues). The protein sites were taken from three diverse proteins: Apoflavodoxin (PDB ID 1FTG) residue site Glu106, Tm DHFR (PDB ID 1CZ3) residue sites Arg28 and Asp117, and Interleukin (PDB ID 1IOB) residue sites Glu111 and Glu113. Following the same steps as with the artificial hydrophobic environments, we calculated $U_{s}^{self}$ by using both the PDLD/S-LRA and the simplified model approach and then refined the parameters in Eqs. (6-10). The self energies obtained by both approaches are given in Table III. The final set of refined parameters for $U_{s}^{self}$ is given in Table IV. As seen from the tables, the agreement between the simplified model $U_{s}^{self}$ and the PDLD/S-LRA $U_{s}^{self}$ is quite encouraging.
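The level of agreement in Table III can be summarized with two simple deviation measures. The sketch below uses the eight pairs of values listed in Table III; the function name and the choice of mean absolute and root-mean-square deviations as summary statistics are ours and are not part of the original analysis.

```python
import math

# Self energies from Table III (kcal/mol), in the order listed in the table.
cg_model = [3.9, 3.2, -0.1, 3.7, 2.9, 0.3, 3.0, 3.6]
pdld_s_lra = [5.2, 3.8, 1.1, 4.7, 2.3, -0.6, 4.5, 3.3]

def deviation_stats(a, b):
    """Return (mean absolute deviation, root-mean-square deviation) of a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    mad = sum(abs(d) for d in diffs) / len(diffs)
    rmsd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mad, rmsd

mad, rmsd = deviation_stats(cg_model, pdld_s_lra)
# mad is about 0.9 kcal/mol and rmsd is about 1.0 kcal/mol
```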

Table IV.

Refined distance r_ij parameters for CG model $U_{s}^{self}$

Distance parameters	Best fitted maximum values^a

r_{ij polar}	5 Å
r_{ij non polar}	7 Å


^a The best fitted values for the distances used for the calculation of the polar and non polar G(r_ij). The polar and non polar distances r_ij should not exceed the values shown in the right column.

3.2. Evaluating the free energy of transition between the simplified and explicit surfaces

In considering the transition from simplified CG models to explicit models, we start by noting that Eq. (14) is not conceptually tied to a given type of reaction or process. Rather, it represents a general means of converting between two model representations of the same system. The thermodynamic cycle of this generic transformation is given in Fig. 2. Previously¹⁰, we have shown that this process can be exploited in calculating the PMF for unfolding processes, using the radius of gyration as a means of denaturing the system from the folded to the unfolded state. We have also shown that it is possible to use simplified model structures to obtain explicit structures of both folded and unfolded proteins¹². In this work we explore other applications of our general strategy. This is done in the following subsection.

3.2.1. Evaluating changes in Folding Energy

One possible application of our model is the evaluation of the effect of mutation on protein stability. Although this can be done by evaluating the folding PMF for both systems, it is much simpler to do so by using the thermodynamic cycle of Fig. 5. More specifically, ultimately we are interested in evaluating:

Fig. 5. The thermodynamic cycles used to calculate the change in free energy of unfolding upon mutation (see text for details).

Δ Δ G_{N \to M}^{uf \to f} = Δ G_{M_{ep}}^{fold} - Δ G_{N_{ep}}^{fold}

(29)

where $Δ G_{M_{ep}}^{fold}$ and $Δ G_{N_{ep}}^{fold}$ are the free energies of the mutant and native forms of the protein, respectively, evaluated by the explicit model. $Δ G_{M_{ep}}^{fold}$ can be evaluated using the thermodynamic cycle of Fig. 5, yielding

Δ G_{M_{ep}}^{fold} = Δ G_{M_{sp}}^{fold} + (Δ Δ G_{M_{sp \to ep}}^{f} - Δ Δ G_{M_{sp \to ep}}^{uf})

(30)

where $Δ G_{M_{sp}}^{fold}$ is the folding free energy of the mutant in the simplified model, while $Δ Δ G_{M_{sp \to ep}}^{f}$ and $Δ Δ G_{M_{sp \to ep}}^{uf}$ are the changes in free energy of the transformation from the simplified model to the explicit model for the folded and unfolded mutant, respectively. The folding energy of the native system is obtained through the same process as for the mutant. However, rather than calculating the folding energy directly, we can use the thermodynamic cycle inside the grey box in Fig. 5 and obtain

Δ G_{M_{sp}}^{fold} - Δ G_{N_{sp}}^{fold} = Δ G_{N_{sp} \to M_{sp}}^{f} - Δ G_{N_{sp} \to M_{sp}}^{uf}

(31)

It is therefore possible to calculate the change in folding free energy in the simple model by comparing the free energy of mutating the protein in the folded and unfolded systems. Ultimately, combining Eqs. (29 – 31), and the thermodynamic cycle of Fig. 5 gives,

\begin{aligned} Δ Δ G_{N \to M}^{uf \to f} = {} & (Δ G_{N_{sp} \to M_{sp}}^{f} - Δ G_{N_{sp} \to M_{sp}}^{uf}) \\ & + (Δ Δ G_{M_{sp \to ep}}^{f} - Δ Δ G_{M_{sp \to ep}}^{uf}) \\ & - (Δ Δ G_{N_{sp \to ep}}^{f} - Δ Δ G_{N_{sp \to ep}}^{uf}) \end{aligned}

(32)

While Eq. (32) appears more complicated than evaluating Eq. (29) directly, it has definite advantages. First, calculating $Δ Δ G_{N \to M}^{uf \to f}$ via Eq. (32) requires that we calculate the free energy change of the mutation process only in the simple model (this is done in the folded and unfolded states). With the exception of mutations to or from glycine, mutating residues within the simplified model is extremely straightforward: all that is required is a change of the parameters associated with the X atom of the residue being mutated. Furthermore, it is not necessary to force the folded system to unfold. Additionally, the unfolded state can be modeled using only the residues neighboring the residue to be mutated, greatly simplifying the calculation of $Δ G_{N_{sp} \to M_{sp}}^{uf}$ and reducing the computational cost involved. These computational savings can be reinvested by increasing the number and length of frames used to calculate $Δ G_{N_{sp} \to M_{sp}}^{f}$ and $Δ G_{N_{sp} \to M_{sp}}^{uf}$ via the free energy perturbation method, which greatly improves the accuracy of the calculation.
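The bookkeeping of Eq. (32) amounts to combining six separately computed free energies. The sketch below makes the sign conventions explicit; the function and argument names are hypothetical, and the numbers in the example are arbitrary placeholders chosen only to exercise the arithmetic, not computed or measured values.

```python
def ddg_fold_upon_mutation(dg_mut_sp_f, dg_mut_sp_uf,
                           ddg_M_sp_to_ep_f, ddg_M_sp_to_ep_uf,
                           ddg_N_sp_to_ep_f, ddg_N_sp_to_ep_uf):
    """Change in folding free energy upon the N -> M mutation, Eq. (32).

    dg_mut_sp_f / dg_mut_sp_uf : N_sp -> M_sp mutation free energies in the
                                 folded / unfolded state (simplified model).
    ddg_M_*, ddg_N_*           : simplified -> explicit transformation free
                                 energies for mutant (M) and native (N).
    """
    mutation_term = dg_mut_sp_f - dg_mut_sp_uf            # Eq. (31)
    mutant_correction = ddg_M_sp_to_ep_f - ddg_M_sp_to_ep_uf
    native_correction = ddg_N_sp_to_ep_f - ddg_N_sp_to_ep_uf
    return mutation_term + mutant_correction - native_correction

# Arbitrary placeholder inputs (kcal/mol), for illustration only.
example = ddg_fold_upon_mutation(2.0, 1.5, -10.0, -9.0, -11.0, -9.5)
```

Writing the three bracketed differences as separate intermediate quantities mirrors the thermodynamic cycle and makes it easier to see which contribution dominates the final result.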

As a test case for the performance of Eq. (32), we have chosen to examine the pseudo wild-type ubiquitin and the Asp21Asn mutant discussed in our previous work²¹ (see also ref30). The three-dimensional representation of this system is shown in Fig. 6. We have chosen this system because it has been well studied by other means²¹, and at 76 residues is large enough and has a compact enough secondary structure to represent an actual system of interest. Our FEP calculations were performed by both MD and Monte Carlo simulations. The calculations started with 10,000 MD relaxation steps at 300 K, using 1 fs for each step. The main chain and residues far from the mutated residues were fixed during these relaxation runs. The resulting explicit structure was then used to generate the simplified structure. Four such simplified structures were generated. Starting from the given simplified structure we simulated the motions of the explicit model by either MD or MC simulations at 300 K with the mapping potential of Eq. (19) and evaluated the free energy of moving from the simplified to the explicit model by an FEP approach. The key aspect of our procedure is demonstrated in Fig. 7, where we describe the fluctuations of the gap between the explicit and simplified potential for trajectories on the given mapping potential of the native protein (in this case the sixth window with λ = 0.5). The same procedure was repeated for 11 frames and the overall free energy of moving from the simplified to the explicit potential was calculated ( $Δ G_{N_{sp \to ep}}^{f}$ ). The same procedure was used to evaluate $Δ G_{M_{sp \to ep}}^{f}$ for the Asp21Asn mutant. The calculations for the unfolded protein ( $Δ G_{N_{sp \to ep}}^{uf}$ and $Δ G_{M_{sp \to ep}}^{uf}$ ) are shown in Table V.
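The way the sampled energy gap enters the FEP estimate can be illustrated with a minimal Python sketch. It assumes a linear mapping potential of the form U_m = (1 − λ_m) U_sp + λ_m U_ep with equally spaced λ values (an assumption made for the illustration), and uses the standard forward exponential-averaging formula. The function name and input layout are hypothetical.

```python
import math

K_B = 1.987e-3  # Boltzmann constant in kcal/(mol K)


def fep_sp_to_ep(gap_samples_per_window, temperature=300.0):
    """Forward FEP estimate of the sp -> ep free energy in kcal/mol.

    gap_samples_per_window[m] holds samples of (U_ep - U_sp) collected on
    the m-th mapping potential, ordered from lambda = 0 to lambda = 1.
    The last window is the end point and is not used by the forward formula.
    """
    kt = K_B * temperature
    n_windows = len(gap_samples_per_window)
    d_lambda = 1.0 / (n_windows - 1)
    total = 0.0
    for samples in gap_samples_per_window[:-1]:
        # For a linear mapping, U_{m+1} - U_m = d_lambda * (U_ep - U_sp).
        boltzmann = [math.exp(-d_lambda * gap / kt) for gap in samples]
        total += -kt * math.log(sum(boltzmann) / len(boltzmann))
    return total


# Sanity check with 11 windows and a constant gap of 5 kcal/mol:
# the estimate must then equal the gap itself.
windows = [[5.0] * 4 for _ in range(11)]
print(fep_sp_to_ep(windows))
```

With fluctuating gaps, such as those shown in Fig. 7, the exponential average weights low-gap configurations more heavily, which is why the convergence improves as the simplified and explicit potentials become closer.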

A three dimensional representation of pseudo wild type ubiquitin. Asp21, which is involved in the mutational study, is represented explicitly.

The fluctuations of the energy difference between the explicit and simplified potentials, obtained during (A) FEP/MD and (B) Monte Carlo simulations.

Table V.

The energetics of the Asp21Asn mutation in ubiquitin.^a

| Method | $Δ Δ G_{N_{sp \to ep}}^{f}$ | $Δ Δ G_{M_{sp \to ep}}^{f}$ | $Δ G_{N_{sp}}^{fold}$ | $Δ G_{M_{sp}}^{fold}$ | $Δ G_{N_{sp \to ep}}^{uf}$ | $Δ G_{M_{sp \to ep}}^{uf}$ | $Δ Δ G_{N \to M}^{uf \to f}$ (calc) | $Δ Δ G_{N \to M}^{uf \to f}$ (obs) |
|---|---|---|---|---|---|---|---|---|
| FEP/MD | 67.40 | 64.36 | -179.89 | -172.94 | 13.77 | 8.57 | 0.73 | 0.85 |
| MC | 67.40 | 64.36 | -179.89 | -172.94 | 8.02 | 6.58 | 2.47 | 0.85 |

^a Energies are in kcal/mol. The table provides the different contributions for the cycle of Fig. 4 obtained from the average of the results generated from four simplified structures. The signs of the different terms follow Eq. (32), so that the sum will give the desired $Δ Δ G_{N \to M}^{uf \to f}$ .

3.2.2 Evaluating transition state binding free energy

Another important use of our approach is in the field of enzyme design where it can be used to evaluate the binding free energy of rate determining transition states. This can be done by focusing on the electrostatic free energy contribution ΔG_bind′, while using the cycle of Fig. 8, which leads to the following expression:

The cycle used to evaluate mutational effects on transition states binding free energies.

\begin{array}{l} Δ Δ G_{bind', ep}^{(N \to M)} (TS) = - Δ G_{sp \to ep}^{N} (TS) + Δ G_{sp \to ep}^{M} (TS) + Δ G_{sp}^{N \to M} (TS) \\ \quad + Δ G_{sp \to ep}^{N} (TS') - Δ G_{sp \to ep}^{M} (TS') - Δ G_{sp}^{N \to M} (TS') \end{array}

(33)

where $Δ G_{sp \to ep}^{N} (TS)$ and $Δ G_{sp \to ep}^{M} (TS)$ are the changes in the free energy of transforming the simplified model to the explicit model for the fully charged TS, in the native and mutant, respectively. $Δ G_{sp}^{N \to M} (TS)$ is the difference in free energy between the native and mutant simplified structures for the fully charged TS. $Δ G_{sp \to ep}^{N} (TS')$ and $Δ G_{sp \to ep}^{M} (TS')$ are the changes in the free energy of transforming the simplified model to the explicit model for the zero charged TS, in the native and mutant, respectively. $Δ G_{sp}^{N \to M} (TS')$ is the difference in free energy between the native and mutant simplified structures for the zero charged TS.

The calculations were performed on monomeric chorismate mutase (mMjCM), whose active site is depicted in Fig. 9. The coordinates of the mMjCM (NMR) structure were taken from the PDB (accession code 2GTV). The native protein was subjected to 100 ps of MD simulation in order to obtain a relaxed structure. The resulting native explicit structure was then used to generate the simplified structure. Both the simplified and explicit models included an explicit empirical valence bond (EVB) description of the reacting system (see ref31 for details). Three such simplified structures were generated and used to simulate the fluctuations of the explicit model by MC moves of the torsional coordinates. The simulations were done at 300 K with the mapping potential of Eq. (19), and the free energy of moving from the simplified to the explicit model was evaluated with full EVB transition state charges and with zero EVB transition state charges. We used a simulation length of 1 ps for each of the 11 mapping steps, and the overall free energy of moving from the simplified to the explicit potential was calculated for the charged ( $Δ G_{sp \to ep}^{N} (TS)$ ) and zero charged ( $Δ G_{sp \to ep}^{N} (TS')$ ) transition state. We chose as test cases the mutations of Gln88 and Arg51, since both (particularly Gln88) were found to present a significant challenge in our recent exploration of the potential of the EVB in enzyme design³¹. Here we mutated Gln88 to Asn88 and repeated the calculation described above. We also mutated Arg51 to Gln51 and repeated the calculations. The results are shown in Table VI. Finally, we calculated $Δ Δ G_{bind}^{ep (N \to M)} (TS)$ by taking the corresponding change in average energy (see results in Table VI).

A schematic description of the mMjCM active sites, depicting key residues that are involved in the binding of the transition-state analogue (bold).

Table VI.

The energetics of the Gln88Asn and Arg51Gln mutations in monomeric chorismate mutase (mMjCM).^a

| Free energy values | Gln88Asn | Arg51Gln |
|---|---|---|
| $Δ G_{sp \to ep}^{N} (TS)$ | 64.50 | 57.46 |
| $Δ G_{sp \to ep}^{M} (TS)$ | 77.30 | 23.50 |
| $Δ G_{sp}^{N \to M} (TS)$ | 16.69 | 3.57 |
| $Δ G_{sp \to ep}^{N} (TS')$ | 27.14 | 105.93 |
| $Δ G_{sp \to ep}^{M} (TS')$ | 37.89 | 71.21 |
| $Δ G_{sp}^{N \to M} (TS')$ | 12.80 | 0.5 |
| $Δ Δ G_{bind}^{ep (N \to M)}$ (calc) | 5.94 | 3.83 |
| $Δ Δ G_{bind}^{ep (N \to M)}$ (obs) | 4.76 | 3.27 |

Energies are in kcal/mol.

TS and TS′ designate, respectively, the transition state with its full charges and without any residual charges.

Experimental values are taken from Woycechowsky et al⁴². The observed value of $Δ Δ G_{bind}^{ep (N \to M)}$ is given by $Δ Δ G_{bind}^{ep (N \to M)} = - RT \ln [\frac{(k_{cat} / K_{M})_{M}}{(k_{cat} / K_{M})_{N}}]$
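As a check on the sign conventions of Eq. (33), the six contributions listed in Table VI can be combined in a short Python sketch. The function and variable names are illustrative only; the numbers are the entries of Table VI.

```python
def ts_binding_ddg(n_ts, m_ts, sp_ts, n_ts0, m_ts0, sp_ts0):
    """Eq. (33): change in TS binding free energy upon mutation (kcal/mol).

    n_ts, m_ts   : sp -> ep free energies for the charged TS (native, mutant)
    sp_ts        : N -> M free energy in the simplified model, charged TS
    n_ts0, m_ts0 : sp -> ep free energies for the zero charged TS'
    sp_ts0       : N -> M free energy in the simplified model, TS'
    """
    charged = -n_ts + m_ts + sp_ts
    uncharged = n_ts0 - m_ts0 - sp_ts0
    return charged + uncharged


# Entries of Table VI, in the order of the function arguments.
table_vi = {
    "Gln88Asn": (64.50, 77.30, 16.69, 27.14, 37.89, 12.80),
    "Arg51Gln": (57.46, 23.50, 3.57, 105.93, 71.21, 0.5),
}

for mutation, terms in table_vi.items():
    print(mutation, round(ts_binding_ddg(*terms), 2))
# Gln88Asn 5.94
# Arg51Gln 3.83
```

The two printed values coincide with the (calc) row of Table VI.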

3.3 Other applications

The use of the CG model as a reference potential can facilitate studies of other long-standing problems and here we list several new key applications.

3.3.1 Studies of the coupling between conformational and chemical motions

The proposal that slow conformational motions on the millisecond time scale play a major role in enzyme catalysis has been featured in many recent high profile works (see discussion in ref32). Although there is no direct experimental support, and there are very clear conceptual and logical reasons why this idea is invalid (see ref32 and references in that work), it is crucial to explore the dynamical proposal by simulation studies that can explore the relevant millisecond time range. Furthermore, there are other fundamental functional problems whose analysis can be greatly helped by the ability to explore slow protein motions.

Recently we introduced a renormalization method where we move from the full model to the simplified model and then to an even simpler 2-D model in representing the landscape and dynamics in the space defined by the conformational and chemical coordinates in enzymes. In this treatment we force the CG model to have the same dynamical properties as the full model by a strategy similar to the one used in our modeling of ion channels³³. The main element of our approach is the refinement of the friction in the CG model so that the response of this model to the application of a strong constraint will be similar to that of the full model. The use of this model, and in particular its transfer to the 2-D model, allows us to explore the time dependence of processes that occur on a long time scale (e.g. ref34 and ref32). The study of ref provided the first direct proof that the dynamical proposal is invalid and further progress in the use of the renormalization model is clearly expected.

3.3.2 Studies of constant pH dynamics

There is significant current interest in MD simulations that take into account changes in ionization states during the simulated process (ref³⁵ and ref36,37). However, the current models do not consider the time dependence of the proton transport (PTR) process and thus cannot reproduce the proper time dependence of the response of a protein to changes in pH. To advance in this challenging field we combined our approach of time dependent Monte Carlo (MC) of PTR processes (ref³⁸) and the simplified protein model described here, in studies of pH dependent MD.

Our model represents the energetics of the system by a simplified version of the EVB using the modified Marcus equation (see ref39) for the energetics of any possible PTR step, where the free energies for the different protonation states are obtained from the electrostatic energy of the CG model. The MC moves are based on the electrostatic energies of the CG model and then scaled by the characteristic PT time to correspond to the rate constant predicted by transition state theory. More specifically (ref⁴⁰ and ref41), the free energy of each protonation state is expressed by

Δ G^{(m)} = ∑_{k} \{ - 2.3 RT q_{k}^{(m)} [p K_{int, k}^{p} - pH] + \frac{1}{2} ∑_{l \neq k} W_{kl} q_{k}^{(m)} q_{l}^{(m)} \}

(34)

where m designates the vector of the charge states of the given configuration (i.e. $m = (q_{1}^{(m)}, q_{2}^{(m)}, …, q_{n}^{(m)})$ ). Here $q_{k}^{(m)}$ is the actual charge of the k^th group at the m^th configuration. The $W_{kl} q_{k}^{(m)} q_{l}^{(m)}$ term represents the charge-charge interactions.
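A minimal Python sketch of Eq. (34) is given below for illustration. The function name, the data layout (a list of charges, a list of intrinsic pK_a values and a symmetric matrix W) and the two-group example are assumptions made for the sketch, not the implementation used in this work.

```python
R_GAS = 1.987e-3  # gas constant in kcal/(mol K)


def protonation_state_free_energy(charges, pk_int, w, ph, temperature=300.0):
    """Free energy of one protonation state, following Eq. (34), in kcal/mol.

    charges[k] : actual charge of group k in this state (e.g. -1, 0 or +1)
    pk_int[k]  : intrinsic pKa of group k
    w[k][l]    : charge-charge interaction between unit charges on k and l
    """
    rt = R_GAS * temperature
    n = len(charges)
    total = 0.0
    for k in range(n):
        intrinsic = -2.3 * rt * charges[k] * (pk_int[k] - ph)
        # The factor 1/2 corrects for counting each pair (k, l) twice.
        pair = 0.5 * sum(w[k][l] * charges[k] * charges[l]
                         for l in range(n) if l != k)
        total += intrinsic + pair
    return total


# Hypothetical two-group system: an acid (pKa 4) and a base (pKa 10)
# whose unit charges interact with W = 1 kcal/mol.
w = [[0.0, 1.0], [1.0, 0.0]]
print(protonation_state_free_energy([-1, 1], [4.0, 10.0], w, ph=7.0))
print(protonation_state_free_energy([0, 0], [4.0, 10.0], w, ph=7.0))
```

In this example the doubly ionized state is stabilized both by the intrinsic terms (pH lies between the two pK_a values) and by the attraction between the opposite charges, while the neutral state defines the zero of the free energy.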

The barrier for the PT moves is given by³⁹

Δ g_{i \to j}^{\neq} = {(Δ G_{i \to j}^{0} + λ)}^{2} / 4 λ - {\bar{H}}_{ij} (x^{\neq}) + {\bar{H}}_{ij}^{2} (x_{0}^{(i)}) / (Δ G_{i \to j}^{0} + λ)

(35)

where $Δ G_{i \to j}^{0}$ is the free energy of the reaction, and H_ij is the EVB off-diagonal term that mixes the two relevant states whose average values at the transition state x^≠ and at the reactant state $x_{0}^{(i)}$ , are designated by the corresponding H̄. The first term is the expression of the regular Marcus equation (see ref39), which corresponds to the intersection of Δg₁ and Δg₂ at x^≠. The second and third terms represent, respectively, the effect of H₁₂ at x^≠ and $x_{0}^{(i)}$ .
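For illustration, Eq. (35) can be written as a small Python function. The argument names are hypothetical, all quantities are assumed to be in kcal/mol, and the numbers in the example are placeholders rather than fitted EVB parameters.

```python
def pt_activation_barrier(dg0, lam, h_ts, h_reactant):
    """Modified Marcus barrier of Eq. (35), in kcal/mol.

    dg0        : reaction free energy of the PT step
    lam        : reorganization energy (lambda)
    h_ts       : average off-diagonal EVB term at the transition state
    h_reactant : average off-diagonal EVB term at the reactant state
    """
    marcus = (dg0 + lam) ** 2 / (4.0 * lam)   # regular Marcus term
    ts_mixing = h_ts                          # lowers the barrier at x^{neq}
    reactant_mixing = h_reactant ** 2 / (dg0 + lam)
    return marcus - ts_mixing + reactant_mixing


# Without mixing and for a thermoneutral step the barrier is lambda / 4.
print(pt_activation_barrier(dg0=0.0, lam=20.0, h_ts=0.0, h_reactant=0.0))
```

The expression is undefined when the reaction free energy equals minus the reorganization energy, so a production code would need to treat that limit separately.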

Using Eq. (35) with simplified modifications³⁸, we consider random jumps of the proton to any possible site, but accept only jumps to sites i+1 and i-1 (or, in the more general case, to sites that are less than 4.0 Å from site i; usually we use a 3.5 Å cutoff, but here we use a simplified model for the protein side chains). Furthermore, these jumps are accepted only if they satisfy the standard Metropolis criterion with regard to the free energy change ( $Δ G_{n + 1} - Δ G_{n}$ ) or the closely related activation barrier $Δ g_{n \to n + 1}^{‡}$ , which is defined by the individual PT steps in this transition. The MC procedure is converted to a time-dependent simulation by exploiting the isomorphism between the probability obtained from the MC procedure and the probability factor of the transition state theory (TST). Since the probability of the MC jump satisfies the Boltzmann probability we can write:

τ_{i \to j} = S_{PT} \cdot N_{i \to j}

(36)

where $τ_{i \to j}$ is the real time required for a move from site i to site j and $N_{i \to j}$ is the number of MC steps that were required for this move. The factor S for PT between internal groups is given by³⁸:

S = 0.165 \cdot exp {\frac{δ}{k_{B} T}}

(37)

where 0.165 is the average time, in picoseconds, it takes for a productive trajectory to reach the TS at room temperature (see supplementary information of ref38). The factor δ represents the solute and solvent reorganization barrier for a transfer between states of similar energy, i.e. $δ = Δ g_{n \to n + 1}^{‡} - Δ G_{n \to n + 1}$ . The MC approach guarantees that the rate of the PT steps will follow the relevant Boltzmann probability in TST (see ref38).
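The conversion of MC steps to real time in Eqs. (36) and (37) can be sketched as follows. This is a schematic Python illustration: the function names are hypothetical, the energy passed to the Metropolis test stands for either the free energy change or the activation barrier mentioned above, and the site-selection rules are omitted.

```python
import math
import random

K_B = 1.987e-3    # Boltzmann constant in kcal/(mol K)
TAU_0_PS = 0.165  # prefactor of Eq. (37), in ps


def metropolis_accept(delta_energy, temperature, rng):
    """Standard Metropolis test for a proposed PT jump."""
    if delta_energy <= 0.0:
        return True
    return rng.random() < math.exp(-delta_energy / (K_B * temperature))


def steps_until_accepted(delta_energy, temperature, rng, max_steps=1_000_000):
    """Count the MC attempts needed before one jump is accepted."""
    for n_steps in range(1, max_steps + 1):
        if metropolis_accept(delta_energy, temperature, rng):
            return n_steps
    raise RuntimeError("no accepted move within max_steps")


def pt_scale_factor(delta, temperature=300.0):
    """S of Eq. (37), in ps; delta is the reorganization barrier."""
    return TAU_0_PS * math.exp(delta / (K_B * temperature))


def pt_time_ps(n_steps, delta, temperature=300.0):
    """tau of Eq. (36): real time assigned to n_steps MC steps, in ps."""
    return pt_scale_factor(delta, temperature) * n_steps


rng = random.Random(0)
n = steps_until_accepted(delta_energy=2.0, temperature=300.0, rng=rng)
print(n, pt_time_ps(n, delta=1.0))
```

The number of attempts is stochastic, so meaningful times are obtained only after averaging over many MC runs.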

The PT to and from the bulk is determined by the same considerations but now using the free energy of Eq. (34) for the given pH. The MC move for $Δ G_{N \to N + 1}$ is scaled by the time of H₃O⁺ diffusion by the same considerations as in ref41 but now for pH = -1 (which corresponds to the pK_a of H₃O⁺). This allows us to express the time of transferring a proton from the bulk to a water molecule on the surface of the protein and then to a given ionizable group, as:

τ_{bulk} = 0.2 ps

(38)

Since some ionized groups are not in direct contact with the bulk solvent, we scale the MC move for PT to and from the bulk by a factor, F(R), that reflects the distance, $R_{i, bulk}$ , between the given residue and the closest bulk water molecule:

\begin{array}{lll} F (R_{i, bulk}) = R_{i, bulk} - a & R_{i, bulk} \geq 4 & \text{site } i \text{ is protonated} \\ F (R_{i, bulk}) = a - R_{i, bulk} & R_{i, bulk} \geq 4 & \text{site } i \text{ is not protonated} \\ F (R_{i, bulk}) = 0 & R_{i, bulk} < 4 & \end{array}

(39)

where a is a parameter of positive value. In this preliminary work, a is set to a = 10 Å. The distances R are determined by using a 4 Å grid and excluding grid points that are within 4 Å of the protein (this large exclusion radius is due to the fact that we use a simplified protein model). The function F(R) was chosen to represent the chance of a bulk molecule to approach the protein residues. The grid is updated once every 1000 MD steps. Since the exponential factor of F(R) is included in the criteria of the MC move that determines the acceptance of the PT moves we can write:

S_{bulk} = 0.2 ps = 0.16 \cdot (\frac{0.2}{0.16}) ps

(40)

Thus we can write for the PT from and to the bulk

τ_{bulk} = S_{bulk} \cdot N_{i \to j}

(41)

The scaling by 0.16 ps allows us to use the same time scaling for the MC moves that describe PT between the protein groups and for those that describe PT between the protein groups and the bulk.
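The distance factor of Eq. (39) and the bulk time scaling of Eqs. (40) and (41) can be illustrated with the following Python sketch. The function names are hypothetical and distances are assumed to be in Å. How F(R) enters the acceptance criterion of the MC move is not reproduced here.

```python
S_BULK_PS = 0.2  # time per MC step for PT to or from the bulk, Eq. (40)


def bulk_distance_factor(r_bulk, protonated, a=10.0, r_min=4.0):
    """Piecewise factor F(R) of Eq. (39).

    r_bulk     : distance from site i to the closest bulk water (in A)
    protonated : True if site i currently carries the proton
    a          : positive parameter (10 A in this preliminary work)
    r_min      : distance below which the site is treated as exposed
    """
    if r_bulk < r_min:
        return 0.0
    return (r_bulk - a) if protonated else (a - r_bulk)


def bulk_pt_time_ps(n_steps):
    """tau_bulk of Eq. (41): real time for n_steps MC steps, in ps."""
    return S_BULK_PS * n_steps


print(bulk_distance_factor(12.0, protonated=True))   # 2.0
print(bulk_distance_factor(12.0, protonated=False))  # -2.0
print(bulk_distance_factor(3.0, protonated=True))    # 0.0
print(bulk_pt_time_ps(10))                           # 2.0
```

The factor changes sign with the protonation state of the site and vanishes for sites closer than 4 Å to the bulk.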

The next issue is the coupling between the PT steps and the protein fluctuations. One option is to run Langevin dynamics (LD) of the CG model. This option requires significant computational time, since the typical LD steps are about 10 fs, whereas the MC steps correspond to 200 fs. Another option is to also simulate the protein motion by an MC treatment, and then estimate the characteristic time for the MC moves. Finally, considering the fact that the protein's structural changes are of interest mainly when they appear after the changes in its ionization state, it is possible to simply turn on the protein relaxation only after reaching new stable ionization states, then allow the protein to relax (with fixed ionization state) and finally turn on again the MC for the PT processes. While such an approach may not be fully consistent, it will converge to the same final state as the more consistent and much more expensive fully coupled LD-MC approach. All the above options will be considered in the future. Regardless of the method used for simulating the protein fluctuations, we expect the overall method to be very instrumental in modeling pH induced conformational changes.

The preliminary implementation of the model demonstrated that the calculated ionization states of the equilibrated system appeared to be quite similar to those obtained with the PDLD/S-LRA treatment. We also performed preliminary simulations of the time dependent PTR in protein SSO7d (PDB ID: 1SSO) following a pH change from 0 to 7. The residue sites used for these simulations are: Lys4, Lys6, Lys8, Glu10, Glu11, Lys12, Asp15, Lys18, Lys20, Lys21, Lys24, Lys27, Asp34 and Glu35. The corresponding results are given in Fig. 10. The approach is promising but more validation is needed.

Simulation of the proton transport process in protein 1SSO, upon change of the bulk pH from 0 to 7. The graph describes the relaxation of the overall free energy, while the inset figures describe the migration of the “active” protons. Residue sites are depicted in this figure as squares, which are sometimes occupied by a proton (black circle) and sometimes empty. Protons can move between residue sites, or between residue sites and the surrounding bulk.

4. Conclusions

This work presented the current version of our multiscale approach that uses a simplified folding model as a reference potential for all-atom simulations. The simplified model used here includes some innovations in terms of the electrostatic treatment and in particular in terms of the representation of the self-energy. However, the main point in our treatment is that the simplified potential does not have to be perfect since it is only used as a reference for the explicit potential. Of course, the closer the simplified and the explicit potentials are, the faster the convergence of the calculations of the free energy for the transfer between these potentials will be.

The power of our treatment was illustrated in calculations of mutational effects, in calculations of transition state binding energies and in calculations of the time dependent response to pH changes. Another application that was demonstrated in our previous work involves the evaluation of the potential of mean force (PMF). We also outlined other applications of our approach, including its implementation in studies of long time scale dynamical effects and the proposal that dynamical effects contribute to catalysis³². More systematic options of using the reference potential for exploring the dynamics of the full model have not been fully formalized but developments in this crucial direction are clearly expected.

The exact time saving of the CG model is an issue that requires careful analysis, which is left to subsequent studies. That is, some of the key questions, like obtaining a PMF for protein unfolding, are hard to explore with explicit models, since proper sampling in the corresponding calculations is extremely challenging. Similarly, the effect of proper sampling on calculations of binding energies is not yet fully clear. Thus we are not trying to give time estimates here, and hope to address the issue in subsequent works. However, we can give a rather trivial example, noting that a 100 ps structural relaxation of the protein 1SSO took 7 hours on a 64-bit dual Intel Xeon node with 20 GB of memory, while the equivalent relaxation of the simplified model on the same node took less than 2 minutes.

Overall it is clear that multiscale modeling of proteins has advanced significantly from its early days in 1975 and that more rigorous treatments should focus on minimizing the average of the difference between the simplified and explicit potentials, <(U_sp-U_exp)>. In fact, minimizing this functional with respect to the parameters of the simplified model is probably the most promising way of refining the CG model.

Acknowledgments

This work was supported by NIH grants GM 24492 and GM40283. M.R. thanks the Generalitat Valenciana from Spain for the postdoctoral fellowship. We gratefully acknowledge the University of Southern California's High Performance Computing and Communications Center for computer time.

References

1. Levitt M, Warshel A. Computer-Simulation of Protein Folding. Nature. 1975;253(5494):694–698. doi: 10.1038/253694a0.
2. Bryngelson JD, Wolynes PG. Spin glasses and the statistical mechanics of protein folding. Proc Natl Acad Sci U S A. 1987;84(21):7524–7528. doi: 10.1073/pnas.84.21.7524.
3. Dill KA. Dominant Forces in Protein Folding. Biochemistry. 1990;29(31):7133–7155. doi: 10.1021/bi00483a001.
4. Hinds DA, Levitt M. A lattice model for protein structure prediction at low resolution. Proc Natl Acad Sci U S A. 1992;89(7):2536–2540. doi: 10.1073/pnas.89.7.2536.
5. Olszewski KA, Kolinski A, Skolnick J. Folding simulations and computer redesign of protein A three-helix bundle motifs. Proteins. 1996;25(3):286–299. doi: 10.1002/(SICI)1097-0134(199607)25:3<286::AID-PROT2>3.0.CO;2-E.
6. Shakhnovich E, Abkevich V, Ptitsyn O. Conserved residues and the mechanism of protein folding. Nature. 1996;379(6560):96–98. doi: 10.1038/379096a0.
7. Cheung MS, Chavez LL, Onuchic JN. The energy landscape for protein folding and possible connections to function. Polymer. 2004;45(2):547–555.
8. Heath AP, Kavraki LE, Clementi C. From coarse-grain to all-atom: toward multiscale analysis of protein landscapes. Proteins. 2007;68(3):646–661. doi: 10.1002/prot.21371.
9. Marrink SJ, Risselada HJ, Yefimov S, Tieleman DP, de Vries AH. The MARTINI force field: coarse grained model for biomolecular simulations. J Phys Chem B. 2007;111(27):7812–7824. doi: 10.1021/jp071097f.
10. Fan ZZ, Hwang JK, Warshel A. Using simplified protein representation as a reference potential for all-atom calculations of folding free energy. Theor Chem Acc. 1999;103(1):77–80.
11. Benkovic SJ, Hammes GG, Hammes-Schiffer S. Free-energy landscape of enzyme catalysis. Biochemistry. 2008;47(11):3317–3321. doi: 10.1021/bi800049z.
12. Roca M, Messer BM, Hilvert D, Warshel A. On the relationship between folding and chemical landscapes in enzyme catalysis. Proc Natl Acad Sci U S A. 2008;105:13877–13882. doi: 10.1073/pnas.0803405105.
13. Xiang Y, Goodman MF, Beard WA, Wilson SH, Warshel A. Exploring the role of large conformational changes in the fidelity of DNA polymerase beta. Proteins. 2008;70(1):231–247. doi: 10.1002/prot.21668.
14. Luzhkov V, Warshel A. Microscopic Models for Quantum Mechanical Calculations of Chemical Processes in Solutions: LD/AMPAC and SCAAS/AMPAC Calculations of Solvation Energies. J Comp Chem. 1992;13:199–213.
15. Rosta E, Klahn M, Warshel A. Towards Accurate Ab Initio QM/MM Calculations of Free-Energy Profiles of Enzymatic Reactions. J Phys Chem B. 2006;110(6):2934–2941. doi: 10.1021/jp057109j.
16. Wesolowski TA, Warshel A. Frozen Density Functional Approach for Ab Initio Calculations of Solvated Molecules. J Phys Chem. 1993;97:8050–8053.
17. Hwang JK, Warshel A. A quantized classical path approach for calculations of quantum mechanical rate constants. J Phys Chem. 1993;97(39):10053–10058.
18. Hwang JK, Warshel A. How important are quantum mechanical nuclear motions in enzyme catalysis? J Am Chem Soc. 1996;118:11745–11751.
19. Brandt A. Principles of systematic upscaling. In: Fish J, editor. Bridging the scales in Science and Engineering. Oxford University Press; 2008.
20. Matysiak S, Clementi C. Minimalist protein model as a diagnostic tool for misfolding and aggregation. J Mol Biol. 2006;363(1):297–308. doi: 10.1016/j.jmb.2006.07.088.
21. Roca M, Messer B, Warshel A. Electrostatic contributions to protein stability and folding energy. FEBS Lett. 2007;581(10):2065–2071. doi: 10.1016/j.febslet.2007.04.025.
22. Warshel A, Sharma PK, Kato M, Parson WW. Modeling electrostatic effects in proteins. Biochim Biophys Acta. 2006;1764(11):1647–1676. doi: 10.1016/j.bbapap.2006.08.007.
23. Vicatos S, Roca M, Warshel A. Effective Approach for Calculations of Absolute Stability of Proteins Using Focused dielectric Constants. Proteins. 2009 doi: 10.1002/prot.22481. In Print.
24. Warshel A, Russell ST. Calculations of electrostatic interactions in biological systems and in solutions. Q Rev Biophys. 1984;17(3):283–422. doi: 10.1017/s0033583500005333.
25. Warshel A, Russell ST, Churg AK. Macroscopic models for studies of electrostatic interactions in proteins: limitations and applicability. Proc Natl Acad Sci U S A. 1984;81(15):4785–4789. doi: 10.1073/pnas.81.15.4785.
26. Lovell SC, Word JM, Richardson JS, Richardson DC. The penultimate rotamer library. Proteins. 2000;40(3):389–408.
27. King G, Warshel A. A surface constrained all-atom solvent model for effective simulations of polar solutions. J Chem Phys. 1989;91(6):3647–3661.
28. Lee FS, Chu ZT, Warshel A. Microscopic and semimicroscopic calculations of electrostatic energies in proteins by the POLARIS and ENZYMIX programs. J Comp Chem. 1993;14:161–185.
29. Lovell SC, Davis IW, Arendall WB 3rd, de Bakker PI, Word JM, Prisant MG, Richardson JS, Richardson DC. Structure validation by Calpha geometry: phi,psi and Cbeta deviation. Proteins. 2003;50(3):437–450. doi: 10.1002/prot.10286.
30. Went HM, Jackson SE. Ubiquitin folds through a highly polarized transition state. Protein Eng Des Sel. 2005;18(5):229–237. doi: 10.1093/protein/gzi025.
31. Roca M, Vardi-Kilshtain A, Warshel A. Toward Accurate Screening in Computer-Aided Enzyme Design. Biochemistry. 2009;48(14):3046–3056. doi: 10.1021/bi802191b.
32. Pisliakov AV, Cao J, Kamerlin SCL, Warshel A. Enzyme millisecond conformational dynamics do not catalyze the chemical step. Proc Natl Acad Sci U S A. 2009 doi: 10.1073/pnas.0909150106. Early edition.
33. Burykin A, Kato M, Warshel A. Exploring the origin of the ion selectivity of the KcsA potassium channel. Proteins. 2003;52(3):412–426. doi: 10.1002/prot.10455.
34. Liu H, Shi Y, Chen XS, Warshel A. Simulating the electrostatic guidance of the vectorial translocations in hexameric helicases and translocases. Proc Natl Acad Sci U S A. 2009;106(18):7449–7454. doi: 10.1073/pnas.0900532106.
35. Khandogin J, Brooks CL 3rd. Constant pH molecular dynamics with proton tautomerism. Biophys J. 2005;89(1):141–157. doi: 10.1529/biophysj.105.061341.
36. Baptista AM, Martel PJ, Petersen SB. Simulation of protein conformational freedom as a function of pH: constant-pH molecular dynamics using implicit titration. Proteins. 1997;27(4):523–544.
37. Baptista AM, Martel PJ, Soares CM. Simulation of electron-proton coupling with a Monte Carlo method: application to cytochrome c3 using continuum electrostatics. Biophys J. 1999;76(6):2978–2998. doi: 10.1016/S0006-3495(99)77452-7.
38. Olsson MH, Warshel A. Monte Carlo simulations of proton pumps: on the working principles of the biological valve that controls proton pumping in cytochrome c oxidase. Proc Natl Acad Sci U S A. 2006;103(17):6500–6505. doi: 10.1073/pnas.0510860103.
39. Braun-Sand S, Burykin A, Chu ZT, Warshel A. Realistic simulations of proton transport along the gramicidin channel: demonstrating the importance of solvation effects. J Phys Chem B. 2005;109(1):583–592. doi: 10.1021/jp0465783.
40. Warshel A. Conversion of light energy to electrostatic energy in the proton pump of halobacterium halobium. Photochem Photobiol. 1979;30:285–290. doi: 10.1111/j.1751-1097.1979.tb07148.x.
41. Sham YY, Muegge I, Warshel A. Simulating proton translocations in proteins: probing proton transfer pathways in the Rhodobacter sphaeroides reaction center. Proteins. 1999;36(4):484–500.
42. Woycechowsky KJ, Choutko A, Vamvaca K, Hilvert D. Relative Tolerance of an Enzymatic Molten Globule and Its Thermostable Counterpart to Point Mutation. Biochemistry. 2008;47(51):13489–13496. doi: 10.1021/bi801108a.

[R1] 1.Levitt M, Warshel A. Computer-Simulation of Protein Folding. Nature. 1975;253(5494):694–698. doi: 10.1038/253694a0. [DOI] [PubMed] [Google Scholar]

[R2] 2.Bryngelson JD, Wolynes PG. Spin glasses and the statistical mechanics of protein folding. Proc Natl Acad Sci U S A. 1987;84(21):7524–7528. doi: 10.1073/pnas.84.21.7524. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R3] 3.Dill KA. Dominant Forces in Protein Folding. Biochemistry. 1990;29(31):7133–7155. doi: 10.1021/bi00483a001. [DOI] [PubMed] [Google Scholar]

[R4] 4.Hinds DA, Levitt M. A lattice model for protein structure prediction at low resolution. Proc Natl Acad Sci U S A. 1992;89(7):2536–2540. doi: 10.1073/pnas.89.7.2536. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R5] 5.Olszewski KA, Kolinski A, Skolnick J. Folding simulations and computer redesign of protein A three-helix bundle motifs. Proteins. 1996;25(3):286–299. doi: 10.1002/(SICI)1097-0134(199607)25:3<286::AID-PROT2>3.0.CO;2-E. [DOI] [PubMed] [Google Scholar]

[R6] 6.Shakhnovich E, Abkevich V, Ptitsyn O. Conserved residues and the mechanism of protein folding. Nature. 1996;379(6560):96–98. doi: 10.1038/379096a0. [DOI] [PubMed] [Google Scholar]

[R7] 7.Cheung MS, Chavez LL, Onuchic JN. The energy landscape for protein folding and possible connections to function. Polymer. 2004;45(2):547–555. [Google Scholar]

[R8] 8.Heath AP, Kavraki LE, Clementi C. From coarse-grain to all-atom: toward multiscale analysis of protein landscapes. Proteins. 2007;68(3):646–661. doi: 10.1002/prot.21371. [DOI] [PubMed] [Google Scholar]

[R9] 9.Marrink SJ, Risselada HJ, Yefimov S, Tieleman DP, de Vries AH. The MARTINI force field: coarse grained model for biomolecular simulations. J Phys Chem B. 2007;111(27):7812–7824. doi: 10.1021/jp071097f. [DOI] [PubMed] [Google Scholar]

[R10] 10.Fan ZZ, Hwang JK, Warshel A. Using simplified protein representation as a reference potential for all-atom calculations of folding free energy. Theor Chem Acc. 1999;103(1):77–80. [Google Scholar]

[R11] 11.Benkovic SJ, Hammes GG, Hammes-Schiffer S. Free-energy landscape of enzyme catalysis. Biochemistry. 2008;47(11):3317–3321. doi: 10.1021/bi800049z. [DOI] [PubMed] [Google Scholar]

[R12] 12.Roca M, Messer BM, Hilvert D, Warshel A. On the relationship between folding and chemical landscapes in enzyme catalysis. Proc Natl Acad Sci U S A. 2008;105:13877–13882. doi: 10.1073/pnas.0803405105. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R13] 13.Xiang Y, Goodman MF, Beard WA, Wilson SH, Warshel A. Exploring the role of large conformational changes in the fidelity of DNA polymerase beta. Proteins. 2008;70(1):231–247. doi: 10.1002/prot.21668. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R14] 14.Luzhkov V, Warshel A. Microscopic Models for Quantum Mechanical Calculations of Chemical Processes in Solutions: LD/AMPAC and SCAAS/AMPAC Calculations of Solvation Energies. J Comp Chem. 1992;13:199–213. [Google Scholar]

[R15] 15.Rosta E, Klahn M, Warshel A. Towards Accurate Ab Initio QM/MM Calculations of Free-Energy Profiles of Enzymatic Reactions. J Phys Chem B. 2006;110(6):2934–2941. doi: 10.1021/jp057109j. [DOI] [PubMed] [Google Scholar]

[R16] 16. Wesolowski TA, Warshel A. Frozen Density Functional Approach for Ab Initio Calculations of Solvated Molecules. J Phys Chem. 1993;97:8050–8053.

[R17] 17. Hwang JK, Warshel A. A quantized classical path approach for calculations of quantum mechanical rate constants. J Phys Chem. 1993;97(39):10053–10058.

[R18] 18. Hwang JK, Warshel A. How important are quantum mechanical nuclear motions in enzyme catalysis? J Am Chem Soc. 1996;118:11745–11751.

[R19] 19. Brandt A. Principles of systematic upscaling. In: Fish J, editor. Bridging the scales in Science and Engineering. Oxford University Press; 2008.

[R20] 20. Matysiak S, Clementi C. Minimalist protein model as a diagnostic tool for misfolding and aggregation. J Mol Biol. 2006;363(1):297–308. doi: 10.1016/j.jmb.2006.07.088.

[R21] 21. Roca M, Messer B, Warshel A. Electrostatic contributions to protein stability and folding energy. FEBS Lett. 2007;581(10):2065–2071. doi: 10.1016/j.febslet.2007.04.025.

[R22] 22. Warshel A, Sharma PK, Kato M, Parson WW. Modeling electrostatic effects in proteins. Biochim Biophys Acta. 2006;1764(11):1647–1676. doi: 10.1016/j.bbapap.2006.08.007.

[R23] 23. Vicatos S, Roca M, Warshel A. Effective Approach for Calculations of Absolute Stability of Proteins Using Focused dielectric Constants. Proteins. 2009. doi: 10.1002/prot.22481. In press.

[R24] 24. Warshel A, Russell ST. Calculations of electrostatic interactions in biological systems and in solutions. Q Rev Biophys. 1984;17(3):283–422. doi: 10.1017/s0033583500005333.

[R25] 25. Warshel A, Russell ST, Churg AK. Macroscopic models for studies of electrostatic interactions in proteins: limitations and applicability. Proc Natl Acad Sci U S A. 1984;81(15):4785–4789. doi: 10.1073/pnas.81.15.4785.

[R26] 26. Lovell SC, Word JM, Richardson JS, Richardson DC. The penultimate rotamer library. Proteins. 2000;40(3):389–408.

[R27] 27. King G, Warshel A. A surface constrained all-atom solvent model for effective simulations of polar solutions. J Chem Phys. 1989;91(6):3647–3661.

[R28] 28. Lee FS, Chu ZT, Warshel A. Microscopic and semimicroscopic calculations of electrostatic energies in proteins by the POLARIS and ENZYMIX programs. J Comp Chem. 1993;14:161–185.

[R29] 29. Lovell SC, Davis IW, Arendall WB 3rd, de Bakker PI, Word JM, Prisant MG, Richardson JS, Richardson DC. Structure validation by Calpha geometry: phi,psi and Cbeta deviation. Proteins. 2003;50(3):437–450. doi: 10.1002/prot.10286.

[R30] 30. Went HM, Jackson SE. Ubiquitin folds through a highly polarized transition state. Protein Eng Des Sel. 2005;18(5):229–237. doi: 10.1093/protein/gzi025.

[R31] 31. Roca M, Vardi-Kilshtain A, Warshel A. Toward Accurate Screening in Computer-Aided Enzyme Design. Biochemistry. 2009;48(14):3046–3056. doi: 10.1021/bi802191b.

[R32] 32. Pisliakov AV, Cao J, Kamerlin SCL, Warshel A. Enzyme millisecond conformational dynamics do not catalyze the chemical step. Proc Natl Acad Sci U S A. 2009. doi: 10.1073/pnas.0909150106. Early edition.

[R33] 33. Burykin A, Kato M, Warshel A. Exploring the origin of the ion selectivity of the KcsA potassium channel. Proteins. 2003;52(3):412–426. doi: 10.1002/prot.10455.

[R34] 34. Liu H, Shi Y, Chen XS, Warshel A. Simulating the electrostatic guidance of the vectorial translocations in hexameric helicases and translocases. Proc Natl Acad Sci U S A. 2009;106(18):7449–7454. doi: 10.1073/pnas.0900532106.

[R35] 35. Khandogin J, Brooks CL 3rd. Constant pH molecular dynamics with proton tautomerism. Biophys J. 2005;89(1):141–157. doi: 10.1529/biophysj.105.061341.

[R36] 36. Baptista AM, Martel PJ, Petersen SB. Simulation of protein conformational freedom as a function of pH: constant-pH molecular dynamics using implicit titration. Proteins. 1997;27(4):523–544.

[R37] 37. Baptista AM, Martel PJ, Soares CM. Simulation of electron-proton coupling with a Monte Carlo method: application to cytochrome c3 using continuum electrostatics. Biophys J. 1999;76(6):2978–2998. doi: 10.1016/S0006-3495(99)77452-7.

[R38] 38. Olsson MH, Warshel A. Monte Carlo simulations of proton pumps: on the working principles of the biological valve that controls proton pumping in cytochrome c oxidase. Proc Natl Acad Sci U S A. 2006;103(17):6500–6505. doi: 10.1073/pnas.0510860103.

[R39] 39. Braun-Sand S, Burykin A, Chu ZT, Warshel A. Realistic simulations of proton transport along the gramicidin channel: demonstrating the importance of solvation effects. J Phys Chem B. 2005;109(1):583–592. doi: 10.1021/jp0465783.

[R40] 40. Warshel A. Conversion of light energy to electrostatic energy in the proton pump of Halobacterium halobium. Photochem Photobiol. 1979;30:285–290. doi: 10.1111/j.1751-1097.1979.tb07148.x.

[R41] 41. Sham YY, Muegge I, Warshel A. Simulating proton translocations in proteins: probing proton transfer pathways in the Rhodobacter sphaeroides reaction center. Proteins. 1999;36(4):484–500.

[R42] 42. Woycechowsky KJ, Choutko A, Vamvaca K, Hilvert D. Relative Tolerance of an Enzymatic Molten Globule and Its Thermostable Counterpart to Point Mutation. Biochemistry. 2008;47(51):13489–13496. doi: 10.1021/bi801108a.
