Skip to main content
NIHPA Author Manuscripts logoLink to NIHPA Author Manuscripts
. Author manuscript; available in PMC: 2007 Aug 14.
Published in final edited form as: J Chem Theory Comput. 2006;2(3):696–704. doi: 10.1021/ct600060d

The Extent of Cooperativity of Protein Motions Observed with Elastic Network Models Is Similar for Atomic and Coarser-Grained Models

Taner Z Sen 1,2, Yaping Feng 2,3, John V Garcia 1, Andrzej Kloczkowski 1, Robert L Jernigan 1,2,*
PMCID: PMC1948848  NIHMSID: NIHMS21980  PMID: 17710199

Abstract

Coarse-grained elastic network models have been successful in determining functionally relevant collective motions. The level of coarse-graining, however, has usually focused on the level of one point per residue. In this work, we compare the applicability of elastic network models over a broader range of representational scales. We apply normal mode analysis for multiple scales on a high-resolution protein data set using various cutoff radii to define the residues considered to be interacting, or the extent of cooperativity of their motions. These scales include the residue-, atomic-, proton-, and explicit solvent-levels. Interestingly, atomic, proton, and explicit solvent level calculations all provide similar results at the same cutoff value, with the computed mean-square fluctuations showing only a slightly higher correlation (0.61) with the experimental temperature factors from crystallography than the results of the residue-level coarse-graining. The qualitative behavior of each level of coarse graining is similar at different cutoff values. The correlations between these fluctuations and the number of internal contacts improve with increased cutoff values. Our results demonstrate that atomic level elastic network models provide an improved representation for the collective motions of proteins compared to the coarse-grained models.

Keywords: Gaussian network model, elastic network model, protein motions, coarse-grained protein models

Introduction

Elastic Network Models13 have been quite successful in predicting the large-scale motions of proteins and other biological structures, even for such large complexes as the ribosome46. These models originated from the theory of polymer networks7,8 using the pioneering idea of Tirion3, who proposed a single uniform spring constant parameter for all atom-atom contacts used in a normal mode analysis. Elastic Network applications have usually focused on coarse-grained representations of proteins, using mostly Cα-atoms and relying upon Cα-Cα proximity for placement of springs. The predicted position fluctuations of amino acids in proteins obtained from Elastic Network Models usually give quite good agreement with experimental B-factors measured by crystallographers, but as we will see here more detailed atomic models yield similar, if slightly better results. This is an important finding that may be particularly important for developing mixed coarse-grained models wherein the functionally important part of the protein is represented by atoms and the remainder of the structure is rendered in lesser detail. The only information utilized in Elastic Network Models is the structure of the protein, from the Protein Data Bank (PDB)9, but this approach can also be applied to hypothetical protein models based on sequence similarities or other techniques. The essential aspect of these models is a representation of proteins as highly interconnected structures, which represents well their cohesive and cooperative nature. It has been shown that fluctuations of residues in proteins depend mostly on the packing density and that the slowest modes corresponding to the motions of large domains depend essentially on the protein shape10,11. Elastic Network Models have been useful in studies of protein binding12 and the analysis of the binding pocket flexibility13.

One of the strengths of the Gaussian Network Model is its success in the determination of functionally significant collective motions in proteins with an extremely simple model based only on packing density and geometry. However, does such a simple model, which does not differentiate between various bonded and non-bonded interactions (such as covalent and hydrogen bonds), produce physically meaningful results? There is strong evidence that it actually does. First, the accumulated normal mode analysis results demonstrate clearly that GNM produces experimentally-verifiable results, e.g. for X—ray analysis2,14, NMR15, hydrogen-exchange16, and cryo-EM4,17,18 experiments. Second, the normal mode results correlate well with results of molecular dynamics (MD) simulations19 based on detailed atomistic force fields. These studies have proven that the normal mode analysis using coarse-grained models is extremely useful, and that collective motions derived from the equilibrium structure depend largely on the shape of the protein, rather than on particular types of interactions10,11. A lack of any dependence on discriminating between bonded and non-bonded interactions is most likely due to the large number of interactions inside compact structures of biomolecules that leads to their cohesiveness and cooperativity. Essentially for large compact structures the number of covalent bonds is small compared to the number of non-bonded interactions. Note that this conclusion does not negate the differential importance of certain types of interactions for protein stability or for the folding process.

Although elastic network models have proven to provide a good description of protein collective motions, the effect of coarse-graining over the full range of scales has not been thoroughly explored. Jernigan and co-workers have mostly analyzed one end of the spectrum - coarser-grained models of proteins - and have observed that even when 40 residues of hemagglutinin A, are represented by a single node, the global motions are only slightly affected20,21 in comparison to more detailed models. Here we will explore the other end of the spectrum, and study the effect of more detailed representations of proteins for the elastic network models. We will analyze the effect of scaling in elastic network models by comparing results obtained at varying levels of coarse-graining. These levels will include one point per residue, one point per atom for heavy atoms alone, and the case when protons are also included. Additionally we will investigate the effect of explicit inclusion of solvent molecules insofar as they are reported for high-resolution protein structures. For the residue-level coarse-graining, a single node (located at Cα) is assigned to each residue. For the atomic-level representation, each heavy atom in the protein is assigned a node, and hydrogen atoms are neglected. For the proton-level additional nodes for each hydrogen atom in the protein are included. Finally, in the explicit solvent-level representation oxygen and hydrogen atoms of the water molecules reported in the crystallographic data are also taken into account and each of these atoms is represented by a node. Our study will allow us to analyze the effects of scaling at various levels of accuracy and present a multi-scale picture of the normal mode of protein dynamics.

Previously22 we had observed a strong correlation between the entropies computed from the elastic network models with the number of internal contacts in the given protein. This corresponds to a simple view of protein stabilities, in which the number of contacts (stabilizing energy) compensates directly for the extent of motions within the structure (motional entropy). Conceivably such a simple relationship could also depend upon the level of cooperativity in the model, i.e. the cutoff distance defining both the number of contacts and their restraining effects on the motions of the protein. We have investigated this correlation for the same set of proteins and at different levels of coarse-graining with the same elastic network models.

Although normal mode analyses provide a remarkable tool for probing protein dynamics, they have some limitations: every interaction is treated identically for all contacts regardless of the contact distance or type of interaction. We have observed however that the results obtained by using residue-type specific potentials23,24 at the residue-level coarse-graining (unpublished results) or adjusted springs based on number of contacts25 are not substantially different from those obtained by using a harmonic potential with a single uniform spring constant. Furthermore, elastic network model results are comparable to those of molecular dynamics based on AMBER potential19. Here, we take a different route and explore the effect of assigning a harmonic potential with a single uniform spring constant for each pair of nodes being in contact regardless of the type of the interaction, at all of the different scales of coarse-graining.

There are other important reasons to introduce more detailed atomic level elastic network models. For other types of studies such as enzyme mechanisms26, unraveling the details of molecular hinges or detailed investigations of residue conservation around hinges, further detail is likely to be important. One potential outcome from the atomic elastic networks could be the identification of specific conserved atomic groups, in more detail than residue conservation, relating to critical functional motions and flexibility, within molecular hinges, enzyme active sites or other functional loci. This could be information of importance for protein design. One of the appealing aspects of the atomic models is that they can be conveniently combined with other more coarsely grained parts of the structure (mixed coarse-graining), as has been demonstrated previously20,21,27.

Methods

Dataset

We used search tools available on the PDB web site to find proteins with resolution better than 0.8 Å and with less than 50% sequence similarity to one another. We narrowed our list for this initial study to only eight mostly single chain proteins whose lengths range from 64 to 158 amino acids. These proteins, listed in order by their increasing size are: Type III antifreeze protein rd1 (pdb id: 1ucs) (64 residues), syntenin Pdz2 domain (1r6j) (82 residues), high-potential iron-sulfur protein (1iua) (83 residues), Lys-49 phospholipase A2 homologue (lysine 49 PLA2) (1mc2) (122 residues), cobratoxin (1v6p) (2 chains 62 residues each), bacterial photoreceptor pyp (1nwz) (125 residues), carbohydrate Binding Domain Cbm36 (1w0n) (131 residues), and E. Coli pyrophosphokinase HPPK (1f9y) (158 residues).

Multi-scale representations

Our defined models are: “Residue-level models” include only Cα atoms; “Atomic-level models” include every atom in a protein except hydrogen atoms; “Proton-level models” include every atom in a protein including hydrogen atoms; and finally, “explicit solvent-level models” include every protein atom and also every oxygen and hydrogen atom of water molecules in the crystallographic data provided in the protein PDB. If the positions of hydrogen atoms are not found in the pdb file, Accelrys DS ViewerPro is used to generate locations of missing hydrogen atoms. Ligands are removed from the protein structures and are not included in the present analyses.

Gaussian network models

The details of the Gaussian Network Model2 (GNM) and its extension considering the directionalities of fluctuations - the Anisotropic Network Model1 can be found elsewhere. The GNM originates from the theory of rubber-like elasticity7,8 and Tirion's approach of using a uniform spring constant parameter in the harmonic analysis of protein motions3. The cohesiveness of the protein structure in the elastic network model is represented by assuming that all pairs of nodes separated by less than a certain cutoff distance are connected by uniform springs. In the standard coarse-grained version, each residue is represented by a single point ( node) positioned at its Cα atom, but we will also use an atomic version here where the points represent atoms. There are two parameters in the model: the cutoff distance Rc and the spring constant γ. The cutoff distance Rc determines whether two residues are connected by a spring, i.e. are in contact, without differentiating between bonded and non-bonded interactions. These contacts are mathematically expressed as the contact (Kirchhoff) matrix, Γ, where the ij-the element of the matrix is −1 if nodes i and j are connected by a spring, and zero otherwise, and the diagonal elements are the sums of non-diagonal elements in a given row (or column) taken with the negative sign. Because of this definition the matrix Γ is singular (its determinant is zero) and only the pseudoinverse of Γ can by calculated by using the singular value decomposition (SVD) method. It can be shown that the zero eigenvalues of Γ that are eliminated by using SVD correspond to the six external rigid body degrees of freedom. The equilibrium correlations < ΔRi · ΔRj> between fluctuations of residues i and j are proportional to the ij-th element of the inverse of Γ,

<ΔRiΔRj>=3kBT2γ(Γ1)ij (1)

where ΔRi and ΔRj are the vectors representing the instantaneous displacements of the ith and the jth nodes from their mean positions. Here kB is Boltzmann's constant, T is temperature and γ is is the spring constant. The mean square fluctuation < (ΔRi)2 > of the i-th node is then given by the i-th diagonal element [Γ−1]ii of the matrix Γ−1. The mean square fluctuations may be compared directly with the experimental crystallographic Debye-Waller temperature factors (B-factors) usually available in the pdb files by the equation:

Bi=8π2<(ΔRi)2>/3  (2)

The pseudoinverse matrix Γ−1 can be expanded in the series of eigenvalues λk and eigenvectors uk of the contact matrix Γ as follows:

Γ1=kλk1ukukT  (3)

where zero eigenvalues (that physically correspond to motions of the center of mass of the system) are excluded from the summation. This eigen-expansion has a direct physical meaning by showing contributions from individual modes associated with the eigenvalues of Γ.28 The ith component of the eigenvector uk (corresponding to the kth normal mode) specifies the magnitude of the mean square fluctuations of the ith node in the kth mode. It can also be shown that all eigenvalues of Γ are non-negative. If we order eigenvalues according to their ascending values starting from zero, then the most important contributions in Eq. 3 are given by the smallest non-zero eigenvalues λk, that correspond to the large-scale, slow, collective modes. Slowest modes play a dominant role in the fluctuational dynamics of protein structures, because their contributions to the mean-square fluctuations scale with λk−1. It has been shown that the most important motions of proteins2931 or large biological structures (such as the ribosome)46,32,33 that are associated with their biological function can be clearly identified with a few slowest modes of GNM. The large-scale changes of protein conformations between ‘open’ and ‘closed’ forms, or domain swapping in proteins can be also well represented with elastic network models34. Reviews of elastic network applications can be found in References 16,35.

Correlation coefficients

The usual criterion for choosing parameters is based upon achieving the best agreement between the computed fluctuations and the experimental B-factors. For this purpose, here we use the linear correlation coefficient:

C=i=1N(xix¯)(yiy¯)i=1N(xix¯)2.i=1N(yiy¯)2  (4)

In this equation, N is the number of nodes, xi and are the mean-square fluctuations of the ith node calculated by GNM and their mean over all nodes, respectively. Similarly, yi and ȳ are the experimentally determined B-factor for the ith node and the mean over all nodes. The linear correlation coefficient is a straightforward way to analyze the extent of linear dependence between any two quantities. Its value can range between 1 and −1, where the limiting values 1 and −1 correspond to perfect correlation and perfect anti-correlation.

Overlaps

Absolute overlap between two eigenvectors, each representing specific motions, is defined as

|cosθ|=|inxiyi||x||y|  (5)

In this equation, x and y are two eigenvectors, xi and yi denote their ith components and θ is the angle between x and y. If two eigenvectors are exactly collinear, then their absolute overlap equals 1. If they are orthogonal to each other, than the absolute overlap is zero, and the angle between the two eigenvectors is 90°. This provides a measure of the extent of similarity in the directions of motions for different modes.

Entropy

In the Gaussian Network Model fluctuations of residues about their mean positions obey the Gaussian distribution

W(ΔRi)=Aexp{3(ΔRi)2/2<(ΔRi)2>}  (6)

The conformational entropy change ΔSi resulting from fluctuations in the position of the ith residue can be obtained from the equation

ΔSi=kBlnW(ΔRi)=γ(ΔRi)2/(2T[Γ1]ii)  (7)

Eq. 1 for the case i=j was applied in the above derivation. Equation 7 can be used to calculate the free energy increase of entropic origin contributed by the ith residue, upon distortion ΔRi of its coordinates

ΔGi=TΔSi=γ2(ΔRi)2/[Γ1]ii  (8)

This free energy change is inversely proportional to < (ΔRi)2 >. Physically, this signifies a stronger resistance to deformation, including unfolding, of residues subject to smaller amplitude fluctuations in the folded state.16

Results and Discussion

Choosing spring constants for different resolution scales

The Gaussian Network Model requires specification of two parameters: the spring constant that defines the strength of interactions and the cutoff distance that defines whether two given nodes are in contact or not. The spring constant ultimately scales the amplitudes of motions calculated from the contact matrix. When comparing results obtained at different scales, the spring constant should be adjusted to reflect the scale at which the protein is modeled27. Here, the spring constants at each scale are calculated for each protein by comparing fluctuations predicted by GNM with experimentally determined B-factors, as this method has proven to be generally successful in the past.

Choosing cutoff radii for different resolution scales

Correlations between the GNM-derived mean-square fluctuations and crystallographic B-factors calculated from Eq. 4 clearly show the extent to which GNM results represent actual protein motions. Phillips and co-workers14 showed that GNM coarse-grained at the residue-level has a correlation of about 0.6 with these experimental data, depending on the cutoff radius and on the extent of inclusion of neighboring molecules packed in the crystal. Although 60% correlation at the residue-level is rather impressive, here we are studying the effect of including other atoms together with solvent molecules in the crystal on these correlations. Table 1 shows the correlation coefficients for Cα-atoms calculated at the residue, atomic, proton, and the explicit solvent levels for various cutoffs.

Table 1. The average correlation coefficients between computed mean square fluctuations and experimental B-factors for four different resolution levels of coarse-graining as a function of the cutoff distance.

A correlation of 1 shows perfect correlation and 0 the lack of correlation (maxima are indicated in bold).

Cutoff (Å) Residue Level Atomic Level Proton Level Solvent Level
1 -- -- -- --
2 -- 0.17 0.37 0.27
3 -- 0.46 0.59 0.51
4 0.17 0.61 0.58 0.42
5 0.38 0.59 0.59 0.48
6 0.38 0.57 0.59 0.52
7 0.51 0.60 0.60 0.56
8 0.55 0.60 0.61 0.56
9 0.52 0.60 0.61 0.57
10 0.56 0.60 0.60 0.58
11 0.56 0.61 0.60 0.59
12 0.54 0.60 0.60 0.59
13 0.55 0.59 0.59 0.59
14 0.55 0.58 0.59 0.60
15 0.54 0.57 0.58 0.59

The results in Table 1 show that at the residue level, the correlation increases with increasing cutoff radius reaching a peak around 11 Å as shown in Figure 1. However, the average correlation coefficient never exceeds 0.56. Although the value of this correlation is close to the result (∼0.6) obtained by Phillips14, the optimum cutoff radius (11 Å ) found here is much larger than the Phillips' optimum cutoff of 7.3 Å. One major difference is that we have neglected intermolecular contacts due to packing in crystal. It is also important to note that the number of proteins in our data set is quite limited (8 proteins only). For further comparison with the Phillips group's results14, we repeated the average correlation coefficient calculations as a function of cutoff distance with their data set of 113 proteins. These results shown in Fig. 1 indicate that for the 113-protein data set, another peak around 11.1 Å is also clearly visible. Figure 1 also demonstrates that although the 8-protein set consistently exhibits lower correlations than the 113-protein set, the average correlation coefficients of both sets have similar patterns; thus the 8-protein set seems to be sufficiently representative to make comparisons at various radii.

Figure 1. Average correlation coefficients as a function of the cutoff radius for the 8-protein set used in this study and the 113-protein set used by Phillips14.

Figure 1

The correlation coefficients between the results of residue-level coarse-grained model and experimental B-factors for both data sets suggest two optimal cutoff radii around 7.3 Å and 11.1 Å.

Table 2 lists the optimum cutoff distances for all eight proteins for each of the four different resolution level models studied here. The correlation coefficients are also given in Table 2 in parentheses. A real surprise comes upon examination of average correlation coefficients obtained at better resolution with more detailed scales. The inclusion of other atoms in the normal mode analysis increases the average correlation coefficient for the fluctuations of the Cα-atoms by 0.05 to 0.61. This is highly interesting, because although all interactions are treated similarly, a better correlation is obtained. The inclusion of all heavy atoms clearly provides a superior representation of protein structure and protein dynamics. Interestingly, the further inclusion of protons or even atoms of the solvent does not enhance these correlations, and only shifts the optimum cutoff radius. The optimum cutoffs for various scales differ: for atomic and proton-level calculations, the optimum cutoff values are 4 Å and 9 Å, respectively, and for the explicit solvent level the optimum cutoff is 14 Å. It is worth emphasizing that the inclusion of atoms redefines the packing density critical for protein dynamics. While the consideration of protons in protein structure is associated with small uncertainties such as the ionization state of histidine, the inclusion of atoms of the explicit solvent is much more uncertain. At least it is encouraging that there is no visible loss of correlation when these possibly incomplete sets of solvent atoms are included.

Table 2. The optimum cutoff radii (Å) for eight proteins in the data set for four different resolution level models.

The correlation coefficients are given in parentheses.

1ucs 1iua 1r6j 1w0n 1mc2 1nwz 1v6p 1f9y
Residue 8 (0.65) 13 (0.54) 14 (0.76) 12 (0.48) 7 (0.60) 10 (0.54) 6 (0.63) 19 (0.78)
Atom 4 (0.67) 5 (0.56) 14 (0.72) 8 (0.58) 5 (0.73) 4 (0.67) 7 (0.64) 22 (0.78)
Proton 3 (0.66) 5 (0.54) 15 (0.71) 8 (0.59) 5 (0.68) 3 (0.63) 7 (0.67) 23 (0.78)
Solvent 15 (0.59) 15 (0.53) 18 (0.69) 9 (0.51) 5 (0.62) 10 (0.66) 7 (0.64) 23 (0.78)

Atomic and proton resolution level models give better results than the residue-level models

To analyze the effect of the resolution scale of the model, we have chosen one of the proteins from the data set lysine 49 PLA2 (pdb code: 1mc2) for a more detailed presentation of the results. A schematic representation of the protein backbone colored according to the magnitude of mean-square fluctuations of residues derived from the experimental data, and from residue-level, and explicit solvent-level models is shown in Figures 2a–c, respectively. The residue-level model computations were performed with the cutoff radius 7 Å, and the atomic-level calculations with the cutoff 5 Å. Figure 3 shows the computed mean-square fluctuations of Cα-atoms for the residue-level and the atomic-level models. B-factors are also provided for comparison. The predicted fluctuations are calculated by summing over all internal normal modes. The mean square fluctuations obtained for the residue-level model have a correlation of 0.60 with B-factors, whereas the atomic-level model calculations with 5 Å cutoff give a correlation 0.73 with the experimental data. Figure 3 shows that mean square fluctuations predicted from the atomic-level model are significantly closer to the experimental B-factors, both qualitatively and quantitatively.

Figure 2. The schematic picture of lysine 49 PLA2.

Figure 2

(PDB id: 1mc2). The backbone is colored according to the magnitude of mean-square fluctuations obtained (a) experimentally, (b) computed from the residue-level GNM, and (c) calculated from the atomic-level GNM. Most mobile regions are colored with red, less mobile regions with green, and finally, almost immobile regions with blue.

Figure 3. The mean-square fluctuations for lysine 49 PLA2 computed from the residue-level and the atomic-level models using optimal cutoffs.

Figure 3

Results are shown for Cα atoms only.

What is the source of the discrepancy between theoretical predictions and the experimental data? For further analysis, we focus on the PDZ2 domain of syntenin (1r6j). PDZ domains are mainly involved in the regulation of intracellular signaling and in the assembly of large protein complexes36. The structure of the PDZ2 domain of syntenin was resolved with a resolution 0.73 Å, allowing determination of coordinates of the hydrogen atoms in the crystal37. The PDZ2 domain contains 82 residues and 1867 atoms (including solvent atoms and hydrogen atoms). Figure 4 shows the dependence of the absolute value of the difference between predicted mean square fluctuations and experimental B-factors as a function of the number of contacts in the protein structure. An inverse relationship can clearly be seen between this difference and the number of neighbors (contacts). Since nodes inside the protein core have more contacts, Figure 4 shows that the GNM predictions are generally less accurate on the protein surface. This implies that atoms on the protein surface should perhaps be treated in a more cooperative way than atoms of residues inside the core.

Figure 4. The absolute differences between atomic-level model predictions and experimental B-factors for the PDZ2 domain.

Figure 4

The calculations are performed at the cutoff 5 Å as a function of the number of contacts (neighbors).

Since the GNM is mainly used to analyze cooperative global motions with functional relevance, a detailed analysis of slowest normal modes is of critical importance. For this purpose, we show in Figure 5 the overlaps of the eigenvectors computed for the residue-level and proton-level models. The overlap is defined by Eq. 5 as the absolute value of the cosine of the angle between these two eigenvectors. The absolute value of the overlap is used because the term ukuTk in Eq. 3 does not depend on the direction of the eigenvector uk, and the use of absolute cosine ensures that a the 180° rotation still specifies the same type of motion. The overlap is calculated only for the eigenvector components corresponding to the Cα-atoms. Figures 5a to 5d illustrate these overlaps for four different proteins: (5a) 1ucs, (5b) 1r6j, (5c) 1w0n, and (5d) 1f9y.

Figure 5. The absolute overlaps, |cosθ|, between eigenvectors obtained for the residue-level and the proton-level models for (a) Type III antifreeze protein rd1 (1ucs), (b) syntenin Pdz2 domain (1r6j), (c) carbohydrate Binding Domain Cbm36 (1w0n), and (d) E. Coli pyrophosphokinase HPPK (1f9y).

Figure 5

The calculations were performed by using optimum cutoffs for each protein for a given model. Proteins are arranged from (a) to (d) according to increasing protein size.

Each point in Figures 5ad shows a pair of eigenvectors, one computed from the residue-level model and the other from the proton-level model that have an absolute overlap of at least 0.4. The results were obtained by using optimum cutoff radii for each level of resolution for various proteins according to Table 2. For the case of syntenin, the eigenvectors corresponding to the first 10 slowest modes in both the residue-level and proton-level models have overlap higher than 0.4. However, this correspondence does not always hold, for example, for the case of pyrophosphokinase HPPK, this overlap is less good. More detailed studies are needed to conclude whether there may be certain regularities in the overlaps of modes in protein multi-scale models.

Figure 5 shows scattered, sporadic, rather weak overlaps for 1ucs (5a) but not for other proteins (5b–d): The small (64 residues) Type III antifreeze protein rd1 (pdb id: 1ucs) indeed shows very scattered overlaps, but for the larger proteins, there is a strong overlap between corresponding eigenvectors (around the diagonal of the plot), and very weak overlap between dissimilar eigenvectors (far from the diagonal). This high overlaps between these two different scales can be due to the protein size, which is indirectly related to packing density (the larger the protein, then the larger is its core having high packing density). Since the successes of Elastic Network Models depend on having an adequate representation of protein packing, larger proteins in general might be expected to exhibit better multi-scale overlaps.

The effect of fluctuations in elastic network models on protein entropy

We have calculated the correlation coefficient (defined by Equation 4) between the free energy change of entropic origin given by Equation 8 and the numbers of contacts for alpha-carbons of each residue at four different levels of coarse graining. The results have been averaged over the set of eight proteins and are shown in Table 3 as the function of the cutoff distance used for defining contacts. It is interesting that Table 3 strongly resembles Table 1. This resemblance originates from the fluctuational nature of these free energy changes.

Table 3. The average correlation coefficients between the free energy change due to fluctuations (entropy) and the contact number (energy) as a function of the cutoff distance for four different resolution level models.

Correlation coefficients have been averaged over the set of eight proteins. High values are achieved for the three more detailed models at lower cutoff values, as is also seen in Table 1.

Cutoff (Å) Residue Level Atomic Level Proton Level Solvent Level
1 -- -- -- --
2 -- −0.32 −0.03 0.01
3 -- 0.15 0.50 0.50
4 0.19 0.76 0.91 0.89
5 0.61 0.95 0.99 0.97
6 0.73 0.99 1.00 0.99
7 0.89 1.00 1.00 1.00
8 0.95 1.00 1.00 1.00
9 0.98 1.00 1.00 1.00
10 0.99 1.00 1.00 1.00

Figure 6 shows plots of the absolute value of the entropy of fluctuations as a function of the total number of contacts for 3 different proteins: 1f9y, 1iua and 1mc2. The calculations have been performed for the standard residue-level coarse-grained GNM. We used six different values of the cutoff radius defining contacts, ranging from 5Å to 10Å with increments of 1Å. Each of these six cutoffs is represented by a marked point in Fig. 6 starting from 5Å on the left to 10Å on the right. The linearity of the plots in Figure 6 reemphasizes the dependence of entropy on packing density. A related study was also done by us38 and by Halle39, where an inverse relationship between mean-square fluctuations and contact densities can be seen. It is also worth noting that entropy depends on the size of the protein. The largest of the three proteins 1f9y (158 residues) has the smallest entropies, and the smallest one 1iua (83 residues) has also the largest entropies for the same number of contacts, as seen in Fig. 6. This means that the fluctuation entropy per contact is smaller for larger proteins, i.e., large proteins exhibit more cooperative motions.

Figure 6. The absolute value of the entropy of fluctuations as a function of the total number of contacts for 3 different proteins.

Figure 6

The residue-level coarse-grained model was used. For each protein, there are 6 points corresponding to 6 different cutoffs varying from 5Å on the left to 10Å on the right in increments of 1Å.

Conclusions

We have applied normal mode analysis with multi-scale coarse-graining to high-resolution protein structures. The atomic, proton, and explicit solvent level models all provide quite similar results, showing significantly higher correlations of the predicted fluctuations of Cα-atoms with the experimental B-factors, than the residue level GNM. At the residue-level coarse-graining, the optimum cutoff radius is ∼11 Å, which is significantly larger than the value 7.3 Å obtained by Phillips and coworkers14. This suggests that the optimum cutoff radius may depend on the specific protein structure, and the inclusion of intermolecular contacts in the crystal seems to be necessary at the residue level resolution. The absence of these intermolecular contacts in our model must be compensated by an increased cutoff that increases the number of springs and leads to better agreement with experimental data. The inclusion of atoms in our models significantly improves predictions of fluctuations of Cα-atoms and gives better correlations with experimental B-factors. Additionally better resolution atomic scale models require small cutoff radius (4 Å). However, there is a second maximum in the correlation values appearing at 11 Å, notably the same cutoff distance where the maximum occurs for the residue-level models. More detailed atomic resolution level elastic network models are likely to provide a better representation of motions in proteins. Our results also show that small proteins may require atomic scale resolution models to achieve a good representation of their dynamics. However, the atomic level GNM computations for larger proteins require significantly larger computer resources than those for the residue-level GNM. An alternative that offers a compromise might be mixed coarse-grained modeling of proteins proposed by Doruker and Jernigan20,21,27 - to include a high level of detail for the most important parts of the protein structure and less detail for other parts. Our analysis shows that the multi-scale normal mode analysis can be useful for understanding and predicting the collective motions in proteins.

Acknowledgments

The authors thank Lei Yang for a critical reading of the manuscript. The authors acknowledge the financial support provided by the NIH grants R01GM072014 and R33GM066387. JVG was a NIH-NSF BBSI Summer Institute in Bioinformatics and Computational Biology fellow.

References

  • 1.Atilgan AR, Durell SR, Jernigan RL, Demirel MC, Keskin O, Bahar I. Biophys J. 2001;80:505–515. doi: 10.1016/S0006-3495(01)76033-X. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Bahar I, Atilgan AR, Erman B. Folding & Design. 1997;2:173–181. doi: 10.1016/S1359-0278(97)00024-2. [DOI] [PubMed] [Google Scholar]
  • 3.Tirion MM. Physical Review Letters. 1996;77:1905–1908. doi: 10.1103/PhysRevLett.77.1905. [DOI] [PubMed] [Google Scholar]
  • 4.Tama F, Valle M, Frank J, Brooks CL. Proc Natl Acad Sci USA. 2003;100:9319–9323. doi: 10.1073/pnas.1632476100. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Wang YM, Rader AJ, Bahar I, Jernigan RL. J Struct Biol. 2004;147:302–314. doi: 10.1016/j.jsb.2004.01.005. [DOI] [PubMed] [Google Scholar]
  • 6.Wang YM, Jernigan RL. Biophys J. 2005;89:3399–3409. doi: 10.1529/biophysj.105.064840. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Flory PJ. Proceedings of the Royal Society of London Series A-Mathematical Physical and Engineering Sciences. 1976;351:351–380. [Google Scholar]
  • 8.Kloczkowski A, Mark JE, Erman B. Macromolecules. 1989;22:1423–1432. [Google Scholar]
  • 9.Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. Nucleic Acids Res. 2000;28:235–242. doi: 10.1093/nar/28.1.235. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Doruker P, Jernigan RL. Proteins-Structure Function and Genetics. 2003;53:174–181. doi: 10.1002/prot.10486. [DOI] [PubMed] [Google Scholar]
  • 11.Lu MY, Ma JP. Biophys J. 2005;89:2395–2401. doi: 10.1529/biophysj.105.065904. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Tobi D, Bahar I. PNAS. 2005 0507603102. [Google Scholar]
  • 13.Zheng W, Brooks BR. Biophys J. 2005;89:167–178. doi: 10.1529/biophysj.105.063305. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Kundu S, Melton JS, Sorensen DC, Phillips GN. Biophys J. 2002;83:723–732. doi: 10.1016/S0006-3495(02)75203-X. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Haliloglu T, Bahar I. Proteins-Structure Function and Genetics. 1999;37:654–667. doi: 10.1002/(sici)1097-0134(19991201)37:4<654::aid-prot15>3.0.co;2-j. [DOI] [PubMed] [Google Scholar]
  • 16.Bahar I, Rader AJ. Curr Opin Struct Biol. 2005;15:586–592. doi: 10.1016/j.sbi.2005.08.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Ming D, Kong YF, Lambert MA, Huang Z, Ma JP. Proc Natl Acad Sci USA. 2002;99:8620–8625. doi: 10.1073/pnas.082148899. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Beuron F, Flynn TC, Ma JP, Kondo H, Zhang XD, Freemont PS. J Mol Biol. 2003;327:619–629. doi: 10.1016/s0022-2836(03)00178-5. [DOI] [PubMed] [Google Scholar]
  • 19.Micheletti C, Carloni P, Maritan A. Proteins-Structure Function and Bioinformatics. 2004;55:635–645. doi: 10.1002/prot.20049. [DOI] [PubMed] [Google Scholar]
  • 20.Doruker P, Jernigan RL, Bahar I. Journal of Computational Chemistry. 2002;23:119–127. doi: 10.1002/jcc.1160. [DOI] [PubMed] [Google Scholar]
  • 21.Kurkcuoglu O, Jernigan RL, Doruker P. Qsar & Combinatorial Science. 2005;24:443–448. [Google Scholar]
  • 22.Bahar I, Wallqvist A, Covell DG, Jernigan RL. Biochemistry (Mosc) 1998;37:1067–1075. doi: 10.1021/bi9720641. [DOI] [PubMed] [Google Scholar]
  • 23.Miyazawa S, Jernigan RL. Macromolecules. 1985;18:534–552. [Google Scholar]
  • 24.Miyazawa S, Jernigan RL. J Mol Biol. 1996;256:623–644. doi: 10.1006/jmbi.1996.0114. [DOI] [PubMed] [Google Scholar]
  • 25.Sen TZ, Jernigan RL. In: In Normal Mode Analysis: Theory and Applications to Biological and Chemical Systems. Cui Q, Bahar I, editors. Chapter 9. CRC; Boca Raton FL: 2005. pp. 171–186. [Google Scholar]
  • 26.Kurkcuoglu O, Jernigan RL, Doruker P. Biochemistry (Mosc) 2006;45:1173–1182. doi: 10.1021/bi0518085. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Doruker P, Jernigan RL, Navizet I, Hernandez R. International Journal of Quantum Chemistry. 2002;90:822–837. [Google Scholar]
  • 28.Haliloglu T, Bahar I, Erman B. Physical Review Letters. 1997;79:3090–3093. [Google Scholar]
  • 29.Keskin O, Durell SR, Bahar I, Jernigan RL, Covell DG. Biophys J. 2002;83:663–680. doi: 10.1016/S0006-3495(02)75199-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Keskin O, Bahar I, Flatow D, Covell DG, Jernigan RL. Biochemistry (Mosc) 2002;41:491–501. doi: 10.1021/bi011393x. [DOI] [PubMed] [Google Scholar]
  • 31.Navizet I, Lavery R, Jernigan RL. Proteins-Structure Function and Genetics. 2004;54:384–393. doi: 10.1002/prot.10476. [DOI] [PubMed] [Google Scholar]
  • 32.Rader AJ, Wang YM, Bahar I, Jernigan RL. Biophys J. 2004;86:190A. [Google Scholar]
  • 33.Trylska J, Konecny R, Tama F, Brooks CL, McCammon JA. Biopolymers. 2004;74:423–431. doi: 10.1002/bip.20093. [DOI] [PubMed] [Google Scholar]
  • 34.Kundu S, Jernigan RL. Biophys J. 2004;86:3846–3854. doi: 10.1529/biophysj.103.034736. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Ma JP. Structure. 2005;13:373–380. doi: 10.1016/j.str.2005.02.002. [DOI] [PubMed] [Google Scholar]
  • 36.Sheng M, Sala C. Annu Rev Neurosci. 2001;24:1–29. doi: 10.1146/annurev.neuro.24.1.1. [DOI] [PubMed] [Google Scholar]
  • 37.Kang BS, Devedjiev Y, Derewenda U, Derewenda ZS. J Mol Biol. 2004;338:483–493. doi: 10.1016/j.jmb.2004.02.057. [DOI] [PubMed] [Google Scholar]
  • 38.Liao H, Yeh W, Chiang D, Jernigan RL, Lustig B. Protein Engineering Design & Selection. 2005;18:59–64. doi: 10.1093/protein/gzi009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Halle B. Proc Natl Acad Sci USA. 2002;99:1274–1279. doi: 10.1073/pnas.032522499. [DOI] [PMC free article] [PubMed] [Google Scholar]

RESOURCES