Author manuscript; available in PMC: 2011 Oct 12.
Published in final edited form as: J Chem Theory Comput. 2010 Oct 12;6(10):3249–3258. doi: 10.1021/ct1001413

Models to Approximate the Motions of Protein Loops

Aris Skliros 1, Robert L Jernigan 1, Andrzej Kloczkowski 1
PMCID: PMC2963458  NIHMSID: NIHMS238334  PMID: 21031141

Abstract

We approximate the loop motions of various proteins by using a coarse-grained model and the theory of rubberlike elasticity of polymer chains. The loops are treated as chains in which only the first and last residues are tethered by their connections to the main structure, while the residues within the loop are connected only to their sequence neighbors. We applied these approximate models to five proteins. Our approximation shows that the loop motions can usually be computed locally, which indicates that these motions are robust and not random. Most interestingly, the new method presented here can be used to compute the likely motions of loops that are missing in the structures.

Keywords: loop reconfiguration, loop mobility, loop forms, elastic network models, Gaussian Network Model

INTRODUCTION

Coarse-grained Elastic Network Models (ENM) have been extremely successful in predicting the large-scale motions of proteins, RNA and other biological structures, even for the largest complexes such as the ribosome. The predicted fluctuations of the positions of amino acids in the coarse-grained representations usually give excellent agreement with the experimental B-factors reported by crystallographers,1–3 the ensembles reported by NMR scientists,4 and the variability in structures manifested in the known multiple structures of the same protein.4,5 The only information required in the ENMs is the structure of the protein, to furnish the coarse-grained coordinates of essential atoms, usually those of the Cα atoms (but they could be other points representing the amino acids, such as centers of mass of side chains) for residue-level coarse graining. It has been shown that fluctuations of residues in proteins depend mostly on the protein’s shape.6 Because of this, the ENMs give excellent results even for relatively low-resolution structural data, such as electron micrographs.

Modeling the conformations of external protein loops is an important problem. Loops are generally the most mobile parts of a protein, are located on its surface, and are the functional sites for many protein activities, particularly for encounter complexes and binding. Because of their relatively high mobilities they are often unresolved in crystal structures, particularly the larger loops. As a result, the PDB coordinates for loop residues are frequently either missing or have alternative positions. In an experimental structure, the loops are often the most uncertain parts.

Functions of biomolecules depend on their structures and are often exerted through functional motions. This makes understanding loop motions in proteins a particularly critical problem. Interactions with other proteins or ligands can often lead to apparent rearrangements of loops to accommodate functional ligands. For drug binding, the reconfiguration of target protein loops is relevant. Thus the frequent involvement of external loops in function makes the prediction of their conformations essential for many structural applications in molecular biology, medicine, pharmacy and drug design. The motions of loops computed by using ENMs have been intriguing, since the loops often move together with the large-scale domains, either as rigid parts of domains or as separate parts moving in an anticorrelated way, but controlled by the domain motions. We have observed that the functionally meaningful loops most often appear to move under the control of the entire structure and its domain motions. Protein loops have been the focus of many previous studies. Modeling of loops has a significant role in making comparisons between protein structures,7 since loops may be found in one structure and missing in the other. The field of structural genomics often requires building loop structures.8 Methods for the automated classification of the structures of protein loops have been developed.9 In principle, the study of loops can aid the understanding of protein evolution.10 Panchenko and Madej10 noted that protein loops are far from being random coils, regardless of their size. Changes in the conformations of protein loops have also been a subject of some specific study.11 The importance of protein loops for protein function has been widely acknowledged.12 Conformations of loops play a large role in protein docking, as has been pointed out in References 13–16.
The motion of protein loops, especially where they are flexible, is an important factor for understanding the various roles that proteins play. The website https://simtk.org/home/looptk provides a toolkit to model the kinematics of protein loops. In Reference 17 a novel approach for loop prediction was presented and analyzed. Kolodny et al.18 describe an algorithm for generating conformations of candidate loops for a gap of a given size. Similar work also appeared in Reference 19. Protein loops are also essential for protein folding.20 Conformational evaluation of loops and their major role in protein design was discussed in Reference 21. The importance of loop prediction was also emphasized in Reference 22. In this paper we devise a method for specifying how a protein loop can adopt different configurations. Our simple approximate model accounts only for the sequential connections within the loop and the loop’s connections at the two ends, and this is a surprisingly successful model for generating loop forms. Other interactions of the loop with the body of the protein are not explicitly taken into account in this work.

To overcome the difficulties with loop predictions, here we have applied the analytical theory of fluctuations in Gaussian Phantom Networks originally developed in the rubber-like elasticity theory of polymers by James and Guth,23–26 and others.27–33 The theory assumes that polymer chains are phantom-like, i.e. they can pass freely through one another, so that excluded volume effects are completely neglected. It is also assumed that the distributions of the end-to-end vectors for polymer chains are Gaussian. This means that mechanically the network behaves as a collection of nodes (junctions) connected by simple Hookean springs, and both chains and junctions fluctuate harmonically around their mean positions. Kloczkowski et al.28 obtained analytical solutions for this model by assuming that all junctions in the network have the same connectivity φ (i.e. each junction is connected to φ other chains) and that a polymer network has the topology of an infinite tree. The theory provides analytical expressions for fluctuations of chains and junctions in such networks and for correlations of instantaneous fluctuations of two different points within the network.

The theory of phantom elastomeric networks was successfully adapted to treat protein motions originally as the Gaussian Network Model (GNM) by Bahar and Erman1 and others.3,6,34–38 Their coarse-grained model was based on an earlier work of Tirion,39 who proposed that both non-bonded and covalently bonded atomic contacts in proteins could be modeled using a universal single spring constant in a harmonic analysis of protein dynamics. She assumed that two atoms are connected by a spring if they are separated by a distance smaller than a specified cutoff value. This defines a connectivity matrix for a system of nodes connected by springs. The coarse-grained GNM model enables computation of fluctuations of residues around their mean positions in protein structures directly from this connectivity matrix. Fluctuations of residues are simply expressed by the diagonal elements of the inverse of the connectivity matrix. Theoretical predictions are usually in quite good agreement with crystallographic temperature factors (B-factors) that measure the extent of disorder in crystallographically-determined positions of atoms resulting from thermal motions. Several variations of the elastic network approach to treat protein dynamics have been proposed recently,4,35,36,40,41 that additionally improve agreement of theoretical results with B-factors.

In the present paper we will apply analytical results from the theory of polymer networks obtained originally for tree-like networks to the external loops in proteins, and then we compare these results with results from GNM computations (based on the known packing details within a protein structure). It is worthwhile mentioning that both the original theory of phantom polymer networks and the elastic network models of proteins are based on the assumption that excluded volume effects are completely negligible. The theory of phantom Gaussian networks, although developed for polymer networks with the topology of an ideal infinite tree, works well for real polymer networks that contain many loops. This means that the detailed topology of the network is really not so essential for studies of individual chains.

The theory of phantom Gaussian networks provides analytical expressions for fluctuations of chains and junctions in a polymer network having constant connectivity φ (where φ is the number of polymer chains connected at each junction). However, since each end of the chain in an exterior loop of a protein can be connected to a different number of springs, the original theory28 had to be modified to allow the junctions at the two opposite ends of the loop to have different functionalities φ1 and φ2. We analytically compute the mean-square fluctuations and correlations of the instantaneous fluctuations for junctions and points along the polymer chains in such a tree-like network,42 and here apply these analytical results to treat protein loops. The comparison of analytical predictions with the results of GNM computations for proteins with known crystallographic coordinates of loops shows overall excellent agreement. Our results demonstrate that it is possible to model theoretically the motions of protein loops using the Gaussian model from polymer network theory, without knowing the structural details of the loop itself.

The structure of the present paper is as follows. First we describe briefly the Gaussian theory of random polymer networks, and show how fluctuations of chains and junctions (cross-links) and covariances among them can be computed analytically for a network with an ideal tree-like topology. In the next section we discuss the Gaussian Network Model (GNM) of proteins, and its relationship to the earlier discussed theory of random polymer networks. Later, for several proteins with large external loops of known structure, we compare the results from the GNM with the analytical results based on the theory of random polymer networks and with the experimental B-factors. Our method could also be applied to loops in nucleic acids and to other loops in large biological structures, such as the ribosome, as well as to the prediction of protein function and to drug design.

METHODS

Theory of Random Polymer Networks

The first theory of rubber elasticity was proposed by Kuhn in the late 1930s.43 The theory was further developed by Treloar.30,31 It was based on the assumption that the rubber network consists of ν freely-jointed Gaussian chains, which are cross-linked. It was also assumed that positions of the junctions (points of the chemical cross-links) deform affinely upon mechanical deformation of the rubber. The theory of phantom networks was developed in the 1940s by James and Guth.23–26,44 They also considered the network to be composed of cross-linked Gaussian chains. Additionally they assumed that there are two types of network junctions. Junctions which are at the surface of the rubber are fixed and deform affinely with the macroscopic strain, while the junctions inside the network are free to fluctuate around their mean positions. They assumed that the behavior of the network is determined only by the connectivities of network chains and neglected the effect of the excluded volume of the chains. The chains in their model are phantom-like; i.e. they may pass freely through one another.

Chain dimensions and fluctuations in random elastomeric networks were studied by Flory,27 and by Kloczkowski, Mark and Erman,28 who examined in detail the behavior of phantom Gaussian networks in the undeformed state. It can be shown that the mean-square fluctuations in the position of a junction i, $\langle(\Delta R_i)^2\rangle$, are related to the element $\Gamma^{-1}_{ii}$ of the inverse of the connectivity matrix Γ, and more generally the covariances in positions of points i and j, $\langle\Delta R_i\cdot\Delta R_j\rangle$, are related to $\Gamma^{-1}_{ij}$ as

$$\langle\Delta R_i\cdot\Delta R_j\rangle = \frac{3\langle r^2\rangle_0}{2}\,\Gamma^{-1}_{ij} \qquad (1)$$

The elements of the inverse matrix have been calculated analytically for a network with the topology of an infinite tree, composed of chains of equal length (unimodal network), with equal mean-square end-to-end distances $\langle r^2\rangle_0$ in the undeformed state. It is assumed that the network has functionality φ, i.e. that each free junction connects exactly φ chains. Examples of unifunctional networks, recurrence relations between fluctuations of junctions in the neighboring tiers of the tree, recurrence relations between fluctuations of two junctions m and n separated by d other junctions along the path joining m and n, recurrence relations between fluctuations of points along the chains in the network and covariances of fluctuations among such points, and recurrence relations between the elements of the inverse connectivity matrix $\Gamma^{-1}$ are given by Kloczkowski et al. in Reference 28 and presented briefly in Appendix A in the Supplementary Materials.

Because most models of real polymer networks use the phantom network as a reference state for the construction of the real network models, these results are significant for rubber elasticity. The Gaussian network model has also been extended to proteins, as will be described later.

Theory of Random Polymer Networks with alternating functionality

The theory was developed in Skliros et al.42,45 and is presented briefly in Appendix B in the Supplementary Materials. To study fluctuations of points along the chain we follow the method proposed in reference 46, and assume that all chains consist of n equal length segments and of n − 1 junctions of functionality 2, which connect these segments. Fig. 1 illustrates this approach, and the method of numbering all junctions for a tree-like network with alternating multifunctional functionalities composed of two tiers.

Figure 1. Tree-like network with alternating functionalities separating each chain into four segments of equal length by three additional 2-functional junctions.

Although we have obtained the most general solution of the problem when two points i and j can be separated by several multifunctional junctions,42 we show here only the results when these two points belong to the same chain, i.e. there are no multifunctional junctions between them. Additionally since the network is assumed to be infinite we will concentrate on the case for the central first tier shown in the center of Fig. 1.

The positions of the 2-functional junctions i and j can be expressed as fractions of the chain between the φ1-functional and φ2-functional junctions, counted from the closest φ1-functional junction on the left of points i or j in Fig. 1: $\zeta = (i-1)/n$ and $\theta = (j-1)/n$.

The final result is:

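$$
\begin{bmatrix}
\langle(\Delta R_i)^2\rangle & \langle\Delta R_i\cdot\Delta R_j\rangle \\
\langle\Delta R_i\cdot\Delta R_j\rangle & \langle(\Delta R_j)^2\rangle
\end{bmatrix}
= \frac{3n}{2\gamma_0}\times
\begin{bmatrix}
\dfrac{\varphi_2(\varphi_1-1)}{\varphi_1(\varphi_1\varphi_2-\varphi_1-\varphi_2)} + \dfrac{\zeta(1-\zeta)(\varphi_1\varphi_2-\varphi_1-\varphi_2)+\zeta(\varphi_1-\varphi_2)}{\varphi_1\varphi_2}
&
\dfrac{\varphi_2(\varphi_1-1)}{\varphi_1(\varphi_1\varphi_2-\varphi_1-\varphi_2)} + \dfrac{(\varphi_1\varphi_2-\varphi_1-\varphi_2)}{\varphi_1\varphi_2}\left[\min(\zeta,\theta)-\zeta\theta\right] + \dfrac{\min(\zeta,\theta)}{\varphi_2} - \dfrac{\max(\zeta,\theta)}{\varphi_1}
\\[2ex]
\dfrac{\varphi_2(\varphi_1-1)}{\varphi_1(\varphi_1\varphi_2-\varphi_1-\varphi_2)} + \dfrac{(\varphi_1\varphi_2-\varphi_1-\varphi_2)}{\varphi_1\varphi_2}\left[\min(\zeta,\theta)-\zeta\theta\right] + \dfrac{\min(\zeta,\theta)}{\varphi_2} - \dfrac{\max(\zeta,\theta)}{\varphi_1}
&
\dfrac{\varphi_2(\varphi_1-1)}{\varphi_1(\varphi_1\varphi_2-\varphi_1-\varphi_2)} + \dfrac{\theta(1-\theta)(\varphi_1\varphi_2-\varphi_1-\varphi_2)+\theta(\varphi_1-\varphi_2)}{\varphi_1\varphi_2}
\end{bmatrix}
\qquad (2)
$$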
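The entries of Eq. 2 are simple enough to evaluate directly. The following is a minimal sketch, not the authors' implementation: it reads the off-diagonal entry of Eq. 2 as a junction term, a term proportional to min(ζ,θ) − ζθ, and an asymmetry term min(ζ,θ)/φ2 − max(ζ,θ)/φ1, and returns the covariance in units of the prefactor 3n/(2γ0). The function names and the choice of evaluating the n + 1 points ζ = 0, 1/n, ..., 1 are illustrative assumptions.

```python
def loop_covariance(zeta, theta, phi1, phi2):
    """Reduced covariance <dR_i . dR_j> / (3n / (2*gamma0)) for two points on one chain.

    zeta, theta: fractional positions (0..1) counted from the phi1-functional junction.
    phi1, phi2: functionalities of the two end junctions.
    """
    d = phi1 * phi2 - phi1 - phi2
    if d <= 0:
        # The expression diverges or changes sign when d <= 0 (e.g. phi1 = phi2 = 2).
        raise ValueError("phi1*phi2 - phi1 - phi2 must be positive")
    lo, hi = min(zeta, theta), max(zeta, theta)
    junction = phi2 * (phi1 - 1) / (phi1 * d)          # fluctuation of the phi1 junction
    chain = d / (phi1 * phi2) * (lo - zeta * theta)    # contribution along the chain
    asymmetry = lo / phi2 - hi / phi1                  # vanishes when phi1 == phi2 and zeta == theta
    return junction + chain + asymmetry


def loop_covariance_matrix(n_segments, phi1, phi2):
    """Covariances for the n_segments + 1 equally spaced points from zeta = 0 to zeta = 1."""
    points = [k / n_segments for k in range(n_segments + 1)]
    return [[loop_covariance(z, t, phi1, phi2) for t in points] for z in points]


# Hypothetical usage with the end functionalities quoted for one loop (phi1 = 4, phi2 = 11).
cov = loop_covariance_matrix(6, 4, 11)
```

Two limiting cases give a quick consistency check on this reading: for φ1 = φ2 = φ the junction term reduces to (φ − 1)/(φ(φ − 2)), and at ζ = θ = 1 the expression equals the φ2-junction analogue φ1(φ2 − 1)/(φ2(φ1φ2 − φ1 − φ2)).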

Gaussian Network Model of Proteins

The Gaussian Network Model (GNM) was originally developed for the theory of rubber-like elasticity of random polymer networks27,28 to calculate fluctuations of junctions and chains inside the network. That physical situation is quite different from the one prevailing in a protein, because the polymer chains have random forms whereas a protein has a more fixed form. The model was adapted to coarse-grained proteins in 1997 by Bahar and Erman,1,47 based on the earlier result of Tirion39 with a single harmonic force parameter, which successfully described the large-scale motions in proteins.

The GNM is based on coarse-grained modeling of protein structure, with a single site representing each residue. Positions of these sites are usually identified with the coordinates of Cα atoms in proteins, and it is assumed that both bonded and non-bonded contacts in the protein structure are connected by uniform massless harmonic springs. Significantly, the atomic version gives only slightly better results than the coarse-grained model,3 indicating that the motions are mostly representative of the overall structure, and not so much of its details.

To define which sites are in contact, a uniform cutoff distance Rc is used.1,38,40,47 Residues separated by a distance of Rc or less (including neighbors along the sequence) are assumed to be in contact, and are connected with identical springs. This leads to an elastic network representation of the folded protein that bears a resemblance to a random polymer network. While this model of a protein is closely similar to that of a rubbery network, the main difference is that in the rubber the coordinations are defined by covalent links, whereas in the GNMs and ENMs the connections are primarily non-bonded contacts arising from close packing within the structure. While the GNM formally neglects the excluded volume, regions with a higher density of atoms are represented by a higher density of springs, while less dense regions are represented by few springs.
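The contact rule above can be sketched in a few lines. This is a minimal illustration, assuming the usual Kirchhoff convention for the connectivity matrix (off-diagonal element −1 for each pair in contact, diagonal element equal to the number of contacts, so that every row sums to zero). The function name, the input format (a list of Cα coordinate tuples in Å) and the default cutoff of 7 Å, which is the value quoted in the Results, are assumptions of this sketch.

```python
import math


def connectivity_matrix(coords, cutoff=7.0):
    """Build the connectivity matrix for sites given as (x, y, z) tuples.

    Off-diagonal entries are -1 for pairs separated by `cutoff` or less;
    each diagonal entry counts the contacts of that site.
    """
    n = len(coords)
    gamma = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1
                gamma[i][i] += 1
                gamma[j][j] += 1
    return gamma


# Four hypothetical sites on a straight line, 3.8 Å apart: only sequence neighbors are in contact.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
toy_gamma = connectivity_matrix(toy_coords)
```

For a real structure the coordinates would come from the Cα records of a PDB file; parsing is left out here to keep the sketch self-contained.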

The distance vector between the ith and jth sites is $R_{ij}$, with $\Delta R_{ij}$ being the instantaneous displacement of $R_{ij}$ from its mean value $R_{ij}^0$, and $(\Delta R_{ij})^2$ is given by the scalar product $\Delta R_{ij}^T\cdot\Delta R_{ij}$. The reference structure is usually the crystal structure taken from the Protein Data Bank (PDB), but could be a modeled structure or even the shape of a structure from an electron micrograph, which was filled with lattice points.48 It can be shown27,28 that

$$\langle\Delta R_i\cdot\Delta R_j\rangle = \frac{3k_BT}{2\gamma}\,(\Gamma^{-1})_{ij} \qquad (3)$$

where $(\Gamma^{-1})_{ij}$ is the ij-th element of the inverse of the connectivity matrix Γ.

It should be noted that the connectivity matrix Γ has been defined so that all elements in every row (or column) sum to zero. Because of this, det Γ = 0, the matrix is singular, and only the pseudoinverse of Γ can be computed, by the use of the singular value decomposition method. The pseudoinverse of Γ may be written as $\Gamma^{-1} = U\Lambda^{-1}U^T$, where U is the matrix composed of the eigenvectors $u_i$ (1 ≤ i ≤ N) of Γ, and Λ is the matrix having the eigenvalues of Γ on the diagonal and zeros off-diagonal. Additionally, it can be proven that all eigenvalues $\lambda_i$ of Γ are non-negative.
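As an illustration of what the pseudoinverse looks like for a small network, the sketch below avoids an SVD routine and instead uses a different but equivalent route for a connected network: the identity Γ⁺ = (Γ + J/N)⁻¹ − J/N, where J is the N × N matrix of ones, combined with plain Gauss-Jordan elimination. This is a substitute technique chosen only to keep the example dependency-free; it is not the method used in the paper, and the function name is hypothetical.

```python
def laplacian_pseudoinverse(gamma):
    """Pseudoinverse of a connectivity (Laplacian) matrix of a connected network.

    Uses pinv(G) = inv(G + J/N) - J/N, valid when the zero eigenvalue is non-degenerate.
    """
    n = len(gamma)
    shift = 1.0 / n
    # Augmented matrix [G + J/N | I] for Gauss-Jordan elimination.
    aug = [
        [gamma[i][j] + shift for j in range(n)] + [1.0 if i == j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("network is not connected; the identity does not apply")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [[aug[i][n + j] - shift for j in range(n)] for i in range(n)]


# Three sites connected in a chain.
chain = [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]
chain_pinv = laplacian_pseudoinverse(chain)
```

The diagonal of `chain_pinv` (5/9, 2/9, 5/9) shows the expected pattern: the two end sites, each held by one spring, fluctuate more than the middle site, which is held by two.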

Mean-square fluctuations of each Cα computed from Eq. 3 can be compared with the Debye-Waller factors for the Cα atoms. These temperature factors are frequently measured by X-ray crystallography for all heavy atoms in the protein structure and are deposited in the Protein Data Bank as temperature B-factors. The computed B-factors for the i-th residue are given by:

$$B_i = 8\pi^2\langle(\Delta R_i)^2\rangle/3 \qquad (4)$$
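Eq. 4 is a direct unit conversion, shown here as a short sketch. The function names are illustrative; the mean-square fluctuation is assumed to be in Å², which gives B-factors in Å² as deposited in PDB files.

```python
import math


def b_factor_from_msf(msf):
    """Convert a mean-square fluctuation <(dR_i)^2> into a B-factor using Eq. 4."""
    return 8.0 * math.pi ** 2 * msf / 3.0


def msf_from_b_factor(b_factor):
    """Inverse of Eq. 4: recover <(dR_i)^2> from an experimental B-factor."""
    return 3.0 * b_factor / (8.0 * math.pi ** 2)
```

Because the GNM spring constant γ is not known a priori, computed fluctuations are usually compared with experimental B-factors only up to an overall scale factor.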

The B-factors computed by the GNM usually are in excellent agreement with experimental data,2 although even better agreement is found when compared with the averages of internal distances from NMR ensembles.4,49

The matrix $\Gamma^{-1}$ can be written as the sum of contributions from individual modes50

$$\Gamma^{-1} = \sum_k \lambda_k^{-1}\,u_k u_k^T \qquad (5)$$

where the zero eigenvalues (physically corresponding to motions of the center of mass of the system) are excluded from the sum. The i-th component of the eigenvector $u_k$ (corresponding to the k-th normal mode) specifies the magnitude of fluctuational motions of the i-th residue in the protein exerted by the k-th mode. If the eigenvalues are ordered in ascending order starting from zero, then the most meaningful contributions in Eq. 5 are given by the smallest non-zero eigenvalues $\lambda_k$, which correspond to the large-scale slow modes. The slowest modes play a dominant role in the fluctuational dynamics of structures, because their contributions to the mean-square fluctuations scale with $\lambda_k^{-1}$. It has been shown that the most essential motions of proteins51–53 or large biological structures such as the ribosome,54–57 which are associated with their biological function, are clearly identifiable within a few of the slowest modes of the GNM or ENM. The large-scale changes of protein conformations between ‘open’ and ‘closed’ forms, or domain swapping in proteins, can also be explained well by these ENMs.58,59 The Gaussian Network Model is the simplest version of several different ENMs. It has been extended to treat anisotropic fluctuations with vector directions for the motions,40 and hierarchical35 or mixed36 levels of coarse graining.
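A small worked case makes the dominance of the slow modes in Eq. 5 concrete. The three-site chain below is a hypothetical toy network, not one of the proteins studied here.

```latex
\begin{align*}
\Gamma &= \begin{pmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{pmatrix},
\qquad \lambda_1 = 0,\quad \lambda_2 = 1,\quad \lambda_3 = 3, \\[1ex]
u_2 &= \tfrac{1}{\sqrt{2}}\,(1,\,0,\,-1)^T,
\qquad u_3 = \tfrac{1}{\sqrt{6}}\,(1,\,-2,\,1)^T, \\[1ex]
\Gamma^{-1} &= \frac{1}{1}\,u_2 u_2^T + \frac{1}{3}\,u_3 u_3^T
= \frac{1}{9}\begin{pmatrix} 5 & -1 & -4 \\ -1 & 2 & -1 \\ -4 & -1 & 5 \end{pmatrix}.
\end{align*}
```

For the two end sites the slow mode ($\lambda_2 = 1$) contributes 1/2 of the total 5/9, i.e. 90% of the mean-square fluctuation, while the middle site, which does not move in that mode, receives its entire 2/9 from the faster mode.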

RESULTS AND DISCUSSION

Prediction of motions of loops in proteins

We have studied in detail the motions of external loops in five different proteins: tubulin (PDB code 1tub, tubulin α/β dimer), reverse transcriptase (1n5y), triose phosphate isomerase (1tph, human triose phosphate isomerase), protease (1j71, extracellular aspartic proteinase from Candida tropicalis yeast) and myoglobin (2v1k, ferrous deoxymyoglobin at pH 6.8). Tubulin, reverse transcriptase and triose phosphate isomerase are composed of two monomers and have 867, 910, and 496 residues, respectively. Protease and myoglobin each contain single chains with 338 and 163 residues, respectively. We first locate the loops of these proteins that are on the surface and compute the mean-square fluctuations of all the residues in these loops, and the cross-correlations (covariances) between fluctuations of two different residues, by using both the GNM (Eq. 4) and the analytical formula (Eq. 2) derived for a polymer network with alternating functionality. More specifically, we consider the loops as chains where the first and the last residues are junctions with functionalities equal to the actual connectivities of these residues with the remainder of the protein for a given cutoff distance (7 Å); the other residues of the loop are considered as junctions having functionality two. Since the functionalities of the two terminal loop residues may be different, we use Equation 2 to find the auto- and cross-covariances of the loop residues. For the GNM, we find the connectivity matrix (for the whole protein, including PDB data for the residues forming loops) and then compute its pseudoinverse by using Singular Value Decomposition. The fluctuations and the covariances of the residues of each loop are found based on Equation 4.

We identify protein loops by first excluding helices and beta-strands in the protein structure, leaving only coils. The criterion for a loop is the requirement that four or more consecutive coil residues are located on the protein surface. We illustrate these loops in protein structures for three of the studied proteins by coloring them blue in Fig. 2.
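The loop criterion can be sketched as a scan for runs of surface-exposed coil residues. This is an illustration under stated assumptions, not the authors' procedure: the secondary structure is assumed to be given as a string with one symbol per residue (with `"C"` marking coil), surface exposure as a parallel list of booleans obtained by some separate accessibility calculation, and a loop is read as a maximal run of at least four consecutive residues that are both coil and on the surface. All names are hypothetical.

```python
def find_surface_loops(secondary, on_surface, min_length=4, coil_symbol="C"):
    """Return (start, end) index pairs (0-based, inclusive) of candidate loops."""
    if len(secondary) != len(on_surface):
        raise ValueError("secondary and on_surface must have the same length")
    loops = []
    start = None
    for i, (ss, exposed) in enumerate(zip(secondary, on_surface)):
        if ss == coil_symbol and exposed:
            if start is None:
                start = i          # a new run of exposed coil residues begins
        else:
            if start is not None and i - start >= min_length:
                loops.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the chain.
    if start is not None and len(secondary) - start >= min_length:
        loops.append((start, len(secondary) - 1))
    return loops


# Hypothetical 20-residue assignment: H = helix, E = strand, C = coil.
example_ss = "HHHCCCCCEEECCCHHCCCC"
example_loops = find_surface_loops(example_ss, [True] * len(example_ss))
```

With every residue flagged as exposed, the five-residue and four-residue coil stretches qualify, while the three-residue stretch is rejected as too short.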

Figure 2. Loops (colored in blue) of reverse transcriptase (A), triose phosphate isomerase (B) and protease (C).

We calculated covariances of instantaneous fluctuations of residues in loops both from Equation 2 (polymer theory of rubberlike elasticity) and from Equation 4 (GNM computations based on the complete protein structure). The results obtained are shown in Figs. 3–7. Curves with squares show covariances calculated from Equation 2 using only information on the connectivity of the terminal residues of protein loops (functionality of their junctions) and the length of a loop, while curves with dots display covariances calculated from Equation 4 for the GNM applied to the whole protein. The pattern for the residue-residue indexing is as follows: initially the index shows the covariance of the first residue in the loop with itself and with all others, then of the second residue with itself and with all others (except the first one), etc. For a loop composed of n residues the residue-residue index runs from 1 to n(n + 1)/2.
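The residue-residue indexing described above corresponds to walking the upper triangle of the covariance matrix row by row. A minimal sketch, with a hypothetical function name and 1-based residue numbers within the loop:

```python
def residue_pair_order(n):
    """List the residue pairs (i, j), i <= j, in the order used for the pair index.

    The pair index of an entry is its position in the returned list plus one,
    so it runs from 1 to n * (n + 1) / 2.
    """
    pairs = []
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            pairs.append((i, j))
    return pairs


pairs_for_three = residue_pair_order(3)
# [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]
```

For a loop of n = 7 residues this gives 28 pairs, so the abscissa of the corresponding covariance plot would run from 1 to 28.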

Figure 3. Covariances of instantaneous fluctuations calculated by using the theory of rubberlike elasticity (Equation 2) (squares) and the GNM (Equation 4) (dots) for the following individual loops: (a) Reverse transcriptase loop number 4, (b) Tubulin loop number 4, (c) Triose phosphate isomerase loop number 3, (d) Protease loop number 7 and (e) Myoglobin loop number 4. The abscissa shows the index for pairs of residues.

Figure 7. Covariances of instantaneous fluctuations of the first residue of the loop with other residues in the same loop as a function of the sequence distance between them for: (a) reverse transcriptase loop no. 4, (b) tubulin loop no. 4, (c) triose phosphate isomerase loop no. 3, (d) protease loop no. 7 and (e) myoglobin loop no. 4. Results computed from the polymer theory of rubberlike elasticity (squares) and from the GNM (dots) are compared.

Figure 3 shows the values of covariances calculated from polymer rubberlike elasticity theory (squares) and from GNM (dots) for the following loops (for indexing of all loops see Appendix C in Supplementary Materials): (a) reverse transcriptase loop number 4 (n = 7), (b) tubulin loop number 4 (n = 5), (c) triose phosphate isomerase loop number 3 (n = 5), (d) protease loop number 7 (n = 6) and (e) myoglobin loop number 4 (n = 5). We see that for these cases the local approximation based on Eq. 2 provides an excellent result that very well approximates the whole structure result based on the GNM.

In addition to covariances of the instantaneous fluctuations it is interesting to analyze correlations among them defined by

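$$\mathrm{Corr} = \frac{\langle\Delta R_i\cdot\Delta R_j\rangle}{\sqrt{\langle(\Delta R_i)^2\rangle\,\langle(\Delta R_j)^2\rangle}} \qquad (6)$$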
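The normalization in Eq. 6 can be applied to a whole covariance matrix at once. The sketch below assumes that Eq. 6 is the usual normalized covariance, with the square root of the product of the two variances in the denominator; the function name and the toy input are illustrative.

```python
import math


def correlation_from_covariance(cov):
    """Normalize a covariance matrix so that every diagonal element becomes 1."""
    n = len(cov)
    return [
        [cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n)]
        for i in range(n)
    ]


# Toy 2 x 2 covariance matrix.
toy_corr = correlation_from_covariance([[4.0, 2.0], [2.0, 9.0]])
```

Unlike the covariances themselves, the correlations do not depend on the overall scale factor (the prefactor in Eq. 2 or the spring constant in the GNM), which makes the two models directly comparable.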

Figure 4 shows the correlations obtained by the GNM and the polymer elastic theory for the loops analyzed earlier in Fig. 3.

Figure 4. Correlations of instantaneous fluctuations computed by using the theory of rubberlike elasticity (Equation 2) (squares) and the GNM (Equation 4) (dots) for the following individual loops: (a) Reverse transcriptase loop number 4, (b) Tubulin loop number 4, (c) Triose phosphate isomerase loop number 3, (d) Protease loop number 7 and (e) Myoglobin loop number 4. The abscissa shows the index for pairs of residues.

Figure 5 shows the values of covariances computed for four loops of reverse transcriptase with asymmetry in the connectivities φ1, φ2 of the terminal junctions for loops: (a) loop number 4 (n = 7, φ1 = 4, φ2 = 11), (b) number 5 (n = 5, φ1 = 8, φ2 = 3), (c) number 11 (n = 6, φ1 = 5, φ2 = 10), and (d) number 14 (n = 4, φ1 = 5, φ2 = 13). We see that the local approximations based on the polymer network model are more successful for longer loops than for short ones.

Figure 5. Covariances of the instantaneous fluctuations calculated by using the theory of rubber elasticity (Equation 2) (squares) and the GNM (Equation 4) (dots) for loops number 4, 5, 11, and 14 of reverse transcriptase. The abscissa shows the index for pairs of residues, with indexing described in the text.

It is also relevant to examine the relationships between the fluctuations of the loop residues computed from the polymer rubberlike elasticity model and from the GNM, and the experimental B-factors. Figure 6 shows plots of the mean-square fluctuations of loop residues obtained from these three different sources for all 16 loops of the protease. For purposes of comparison all three quantities are normalized. (For example, $B_i^{norm} = (B_i - B_{min})/(B_{max} - B_{min})$, where i is the residue index for the particular loop. The same normalization is carried out for the fluctuations computed from the GNM and from the polymer phantom network model.) We see that for some of these cases the local approximation based on our model approximates the GNM results very well. Whether the local method approximates the GNM results well or poorly depends mainly on the connectivities of the residues of the loops.
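The normalization used for this comparison is a min-max rescaling within each loop. A minimal sketch with a hypothetical function name and made-up input values:

```python
def normalize_min_max(values):
    """Rescale values to the range [0, 1] as (v - min) / (max - min)."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("all values are equal; the normalization is undefined")
    return [(v - lowest) / (highest - lowest) for v in values]


# Made-up B-factors for the residues of one loop.
normalized = normalize_min_max([10.0, 20.0, 30.0])
```

The same function would be applied separately to the experimental B-factors, the GNM fluctuations and the polymer-network fluctuations of each loop, so that the three profiles share a common scale.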

Figure 6. Fluctuations of the residues for all loops of protease. Fluctuations computed from the polymer theory of rubberlike elasticity (squares) and from the GNM (dots) are compared with B-factors (triangles). All fluctuations have been normalized. The abscissa is the residue index in the loop.

Another crucial issue is whether the covariances of instantaneous fluctuations decay similarly with respect to the sequence distance between the residues. To address this problem we have plotted in Fig. 7 the covariances of the first residue in each loop with respect to the other residues of the same loop as a function of the sequence distance between these two residues.

Figures 3–7 indicate that the covariances of instantaneous fluctuations obtained by considering the loops as individual entities (theoretical model) and as part of the whole structure (GNM) are closely similar. We computed the correlations of these covariances for all loops for all the proteins studied here. Our computations indicate that the average correlation of covariances (averaged over all loops in a given protein) is the largest (0.70) for triose phosphate isomerase, 0.66 for reverse transcriptase, and the smallest (0.38) for myoglobin, which has only four very short loops. All profiles shown in these figures show a close resemblance between the behavior of our new loop modeling and the whole-protein modeling with the Gaussian Network Model. This is at first surprising, since previous results with the Gaussian Network Model indicated that the whole structure was needed in order to compute the motions of any part. What we are seeing here is that the individual loops and their simplified representations are generally sufficient to compute the relative mobilities for the individual parts of the loop. The information corresponding to Figs. 3–7 for additional loops of the proteins is provided in Appendix D in the Supplementary Materials. Appendix E gives the correlations of the covariances for every loop of each protein we have studied as a function of the sum of the functionalities of the two terminal junctions in each loop. We have not noticed any apparent relationship between these two quantities. Appendix F shows results of computations of covariances of instantaneous fluctuations of residues belonging to helices for the proteins studied, similar to what was done earlier for loops. Our computations show that the polymer network approximation does not work as well for helices as for the loops.
Appendix G shows the computed correlation coefficients between the predicted amplitudes of fluctuations of loop residues (using both the present analytical model and GNM theory) and the experimental B-factors.
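The agreement between two profiles (for example, the covariances of one loop from the analytical model and from the GNM, or predicted fluctuations and experimental B-factors) can be summarized by a single correlation coefficient. The text does not state which coefficient was used; the sketch below assumes the ordinary Pearson coefficient, and the function name and inputs are illustrative.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)
```

Because the coefficient is invariant under rescaling and shifting of either profile, it does not matter whether the raw or the min-max normalized values are compared.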

Discussion

We have presented a comparison of the loop motions using both the GNM and a new approach based on theory of rubberlike elasticity of polymer networks. For the latter, the loop is modeled by assuming that the two ends have functionalities φ1 and φ2 connecting them with the remainder of the protein structure, whereas the intermediate residues of the loop uniformly have the functionality of two, connecting them only to their sequence neighbors. We then calculated the mean square fluctuations (variances) and covariances for the loop residues by using formulas derived analytically by us for a tree-like polymer network with alternating functionalities. We then applied this new approach to all external loops in five different proteins (reverse transcriptase, triose phosphate isomerase, tubulin, protease and myoglobin) and compared analytical results for the mean square fluctuations and the covariances with GNM computations based on the coordinates for all residues of the loops. For each loop we have plotted the covariances between instantaneous fluctuations of pairs of residues of the loop, and the mean square fluctuations of individual loop residues. We have also compared the covariance of instantaneous fluctuations between two residues of the loop as a function of the distance between them. The comparisons between these two models show that the local approximation of the loops that describes the motion of the loop residues based on polymer tree-like network topology independently from the rest of the protein structure closely approximates the results obtained from GNM where the whole protein structure is taken into account.

Loop mobilities have long been considered to be important for function. Loops on the surfaces of proteins are often the most uncertain parts of structures. It is quite likely that many loops are artificially immobilized and compressed against the body of the protein in the crystal environment. What has been done in the present work is to develop and compare two simple models for loop mobilities. These two quite different approaches yield similar results. Our finding, that computed loop motions are similar whether the loops are treated as independent or in the context of the whole protein structure, implies that the loop motions occur along relatively well-defined pathways, which likely could be mapped out in future computations. This approach may be used not only for loops of proteins, but also for the large loops frequently occurring in nucleic acid stem-loop structures, as well as the much larger loops originating in double-stranded DNA when multiple-subunit proteins bind at widely separated positions. This critical problem for transcription regulation requires coarse-graining of the structure, since the loops can be thousands of base pairs in length. The more difficult problem may be how to introduce additional interactions within these DNA loops when supercoiling is introduced.

Utility of this approach for describing the motions of loops with unknown structures

Loops are often missing from protein crystal structures. The approach described in this paper could be used to predict the probable motions of these missing loops, since the only requirements are the connectivity numbers (functionalities) of the two ends and the number of residues in the loop.

In the present work we have considered only the relative magnitudes of these motions, which compare favorably. In the future it will be essential to develop a vector version of the present loop modeling to define actual pathways, to compare with the ANM and the anisotropic temperature factors. These pathways will be presented in future work, where we also investigate the effects of interactions between loop residues and the body of the protein. The results from the present work indicate that our new approach based on polymer elasticity theory provides a close approximation to the GNM model that includes the effects of the whole structure.

The polymer rubberlike elasticity model predicts only a relatively simple pattern of fluctuations of residues across a loop: a convex profile in which the residues close to the center of the loop have higher amplitudes. However, according to this model the central residue of the loop does not always have the maximum amplitude of fluctuations. The functionality of the junctions at both ends of the loop also has an effect: the maximum is shifted towards the junction with the lower functionality. This effect is also observed in the experimental B-factors of loop residues, although there are sometimes exceptions to this rule, due to significant interactions of loop residues with the remainder of the protein. Our predictions are limited somewhat by the simplicity of the theoretical model, and by the assumption that the motions of loops are unobstructed by the remaining part of the protein, but nevertheless they do enable predicting the basic features of these motions.
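The qualitative behavior described above, a single-peaked fluctuation profile whose maximum moves toward the junction of lower functionality, can be reproduced with a very simple numerical stand-in. The sketch below is not the analytical tree-network result used in this work: it assumes, for illustration only, that an end residue of functionality φ is held by (φ − 1) unit springs attached to fixed points that represent the rest of the protein, while interior residues are bonded only to their two sequence neighbors. The function name and the example values are hypothetical. Like the model in the text, it needs only the two end functionalities and the number of loop residues as input.

```python
import numpy as np

def loop_covariance(n_residues, phi1, phi2):
    """Covariance matrix (up to a constant factor) of a tethered linear chain.

    Assumption for illustration: each end residue with functionality phi has
    one bond into the loop and (phi - 1) unit springs to fixed anchor points.
    """
    if n_residues < 2 or phi1 < 2 or phi2 < 2:
        raise ValueError("need n_residues >= 2 and both functionalities >= 2")
    n = n_residues
    # Interior residues: functionality two (bonded to both sequence neighbors).
    gamma = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    # End residues: one bond into the loop plus (phi - 1) tethers.
    gamma[0, 0] = float(phi1)
    gamma[-1, -1] = float(phi2)
    # The tethers remove the zero mode, so an ordinary inverse exists.
    return np.linalg.inv(gamma)

# Mean square fluctuations are the diagonal of the covariance matrix.
msf_low_left = np.diag(loop_covariance(10, phi1=2, phi2=8))
msf_low_right = np.diag(loop_covariance(10, phi1=8, phi2=2))
msf_symmetric = np.diag(loop_covariance(11, phi1=4, phi2=4))

print("peak residue, lower functionality on the left: ", int(np.argmax(msf_low_left)))
print("peak residue, lower functionality on the right:", int(np.argmax(msf_low_right)))
print("peak residue, equal functionalities:           ", int(np.argmax(msf_symmetric)))
```

In this stand-in the shift arises because the end with fewer tethers is held less stiffly, so residues nearer that end can fluctuate more; interactions between loop residues and the body of the protein are ignored, as in the model discussed in the text.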

Supplementary Material

1_si_001

Acknowledgments

We are pleased to acknowledge the financial support provided by the National Institutes of Health through grants R01GM081680, R01GM072014, and R01GM073095.

Footnotes

Description of the Available Supporting Information: In Appendix A we provide a synopsis of the theory of random polymer networks. In Appendix B we present a summary of the theory of random polymer networks with alternating functionality. In Appendix C we show tables listing the loops and the loop residues for all the proteins of this study. In Appendix D we provide information (supplementary to that contained in Figs. 3–7) for more loops for all the proteins studied. In Appendix E we show the calculated correlation of covariances of instantaneous fluctuations for every loop of each protein as a function of the sum of the functionalities of their terminal junctions. In Appendix F we show the calculated covariances of instantaneous fluctuations of residues belonging to helices for all proteins studied, as we did earlier for the loops. Appendix G lists the computed correlation coefficients between the predicted amplitudes of fluctuations (using both the present analytical model and GNM theory) and the experimental B-factors for protein loops.

This information is available free of charge via the Internet at http://pubs.acs.org.

Reference List

  • 1. Bahar I, Atilgan AR, Erman B. Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential. Folding & Design. 1997;2(3):173–181. doi: 10.1016/S1359-0278(97)00024-2.
  • 2. Kundu S, Melton JS, Sorensen DC, Phillips GN. Dynamics of proteins in crystals: Comparison of experiment with simple models. Biophysical Journal. 2002;83(2):723–732. doi: 10.1016/S0006-3495(02)75203-X.
  • 3. Sen TZ, Feng YP, Garcia JV, Kloczkowski A, Jernigan RL. The extent of cooperativity of protein motions observed with elastic network models is similar for atomic and coarser-grained models. Journal of Chemical Theory and Computation. 2006;2(3):696–704. doi: 10.1021/ct600060d.
  • 4. Yang L, Song G, Carriquiry A, Jernigan RL. Close correspondence between the motions from principal component analysis of multiple HIV-1 protease structures and elastic network modes. Structure. 2008;16(2):321–330. doi: 10.1016/j.str.2007.12.011.
  • 5. Yang L, Song G, Jernigan RL. How well can we understand large-scale protein motions using normal modes of elastic network models? Biophysical Journal. 2007;93(3):920–929. doi: 10.1529/biophysj.106.095927.
  • 6. Lu MY, Ma JP. The role of shape in determining molecular motions. Biophysical Journal. 2005;89(4):2395–2401. doi: 10.1529/biophysj.105.065904.
  • 7. Fiser A, Do RKG, Sali A. Modeling of loops in protein structures. Protein Science. 2000;9(9):1753–1773. doi: 10.1110/ps.9.9.1753.
  • 8. Espadaler J, Fernandez-Fuentes N, Hermoso A, Querol E, Aviles FX, Sternberg MJE, Oliva B. ArchDB: automated protein loop classification as a tool for structural genomics. Nucleic Acids Research. 2004;32:D185–D188. doi: 10.1093/nar/gkh002.
  • 9. Oliva B, Bates PA, Querol E, Aviles FX, Sternberg MJE. An automated classification of the structure of protein loops. Journal of Molecular Biology. 1997;266(4):814–830. doi: 10.1006/jmbi.1996.0819.
  • 10. Panchenko AR, Madej T. Structural similarity of loops in protein families: toward the understanding of protein evolution. BMC Evolut Biology. 2005;5. doi: 10.1186/1471-2148-5-10. Art. No. 10.
  • 11. Groban ES, Narayanan A, Jacobson MP. Conformational changes in protein loops and helices induced by post-translational phosphorylation. Plos Computational Biology. 2006;2(4):238–250. doi: 10.1371/journal.pcbi.0020032.
  • 12. Hu XZ, Wang HC, Ke HM, Kuhlman B. High-resolution design of a protein loop. Proceedings of the National Academy of Sciences of the United States of America. 2007;104(45):17668–17673. doi: 10.1073/pnas.0707977104.
  • 13. Bos C, Lorenzen D, Braun V. Specific in vivo labeling of cell surface-exposed protein loops: Reactive cysteines in the predicted gating loop mark a ferrichrome binding site and a ligand-induced conformational change of the Escherichia coli FhuA protein. Journal of Bacteriology. 1998;180(3):605–613. doi: 10.1128/jb.180.3.605-613.1998.
  • 14. Li C, Banfield MJ, Dennison C. Engineering copper sites in proteins: Loops confer native structures and properties to chimeric cupredoxins. Journal of the American Chemical Society. 2007;129(3):709–718. doi: 10.1021/ja0661562.
  • 15. Smith JW, Tachias K, Madison EL. Protein loop grafting to construct a variant of tissue-type plasminogen activator that binds platelet integrin alpha(IIb)beta(3). Journal of Biological Chemistry. 1995;270(51):30486–30490. doi: 10.1074/jbc.270.51.30486.
  • 16. Sudarsanam S, Dubose RF, March CJ, Srinivasan S. Modeling Protein Loops Using A Phi-I+1, Psi-I Dimer Database. Protein Science. 1995;4(7):1412–1420. doi: 10.1002/pro.5560040715.
  • 17. van Vlijmen HWT, Karplus M. PDB-based protein loop prediction: Parameters for selection and methods for optimization. Journal of Molecular Biology. 1997;267(4):975–1001. doi: 10.1006/jmbi.1996.0857.
  • 18. Kolodny R, Guibas L, Levitt M, Koehl P. Inverse kinematics in biology: The protein loop closure problem. International Journal of Robotics Research. 2005;24(2–3):151–163.
  • 19. Gerstein M, Chothia C. Analysis of Protein Loop Closure - 2 Types of Hinges Produce One Motion in Lactate-Dehydrogenase. Journal of Molecular Biology. 1991;220(1):133–149. doi: 10.1016/0022-2836(91)90387-l.
  • 20. Krieger F, Fierz B, Axthelm F, Joder K, Meyer D, Kiefhaber T. Intrachain diffusion in a protein loop fragment from carp parvalbumin. Chemical Physics. 2004;307(2–3):209–215.
  • 21. Li WZ, Liu ZJ, Lai LH. Protein loops on structurally similar scaffolds: Database and conformational analysis. Biopolymers. 1999;49(6):481–495. doi: 10.1002/(SICI)1097-0282(199905)49:6<481::AID-BIP6>3.0.CO;2-V.
  • 22. Burke DF, Deane CM. Improved protein loop prediction from sequence alone. Protein Engineering. 2001;14(7):473–478. doi: 10.1093/protein/14.7.473.
  • 23. James HM, Guth E. Theory of the Increase in Rigidity of Rubber During Cure. Journal of Chemical Physics. 1947;15(9):669–683.
  • 24. James HM. Statistical Properties of Networks of Flexible Chains. Journal of Chemical Physics. 1947;15(9):651–668.
  • 25. James HM, Guth E. Simple Presentation of Network Theory of Rubber, with A Discussion of Other Theories. Journal of Polymer Science. 1949;4(2):153–182.
  • 26. James HM, Guth E. Statistical Thermodynamics of Rubber Elasticity. Journal of Chemical Physics. 1953;21(6):1039–1049.
  • 27. Flory PJ. Statistical Thermodynamics of Random Networks. Proceedings of the Royal Society of London Series A-Mathematical Physical and Engineering Sciences. 1976;351(1666):351–380.
  • 28. Kloczkowski A, Mark JE, Erman B. Chain Dimensions and Fluctuations in Random Elastomeric Networks .1. Phantom Gaussian Networks in the Undeformed State. Macromolecules. 1989;22(3):1423–1432.
  • 29. Kloczkowski A, Mark JE, Frisch HL. The relaxation spectrum for Gaussian Networks. Macromolecules. 1990;23:3481–3490.
  • 30. Treloar LRG. The Elasticity of A Network of Long-Chain Molecules .3. Transactions of the Faraday Society. 1946;42(1–2):83–94.
  • 31. Treloar LRG. The Statistical Length of Long-Chain Molecules. Transactions of the Faraday Society. 1946;42(1–2):77–82.
  • 32. Kloczkowski A, Mark JE, Erman B. Fluctuations, Correlations, and Small-Angle Neutron-Scattering from End-Linked Gaussian Chains in Regular Bimodal Networks. Macromolecules. 1991;24(11):3266–3275.
  • 33. Kloczkowski A, Mark JE, Erman B. A Diffused-Constraint Theory for the Elasticity of Amorphous Polymer Networks .1. Fundamentals and Stress-Strain Isotherms in Elongation. Macromolecules. 1995;28(14):5089–5096.
  • 34. Bahar I, Rader AJ. Coarse-grained normal mode analysis in structural biology. Current Opinion in Structural Biology. 2005;15(5):586–592. doi: 10.1016/j.sbi.2005.08.007.
  • 35. Doruker P, Jernigan RL, Bahar I. Dynamics of large proteins through hierarchical levels of coarse-grained structures. Journal of Computational Chemistry. 2002;23(1):119–127. doi: 10.1002/jcc.1160.
  • 36. Kurkcuoglu O, Jernigan RL, Doruker P. Collective dynamics of large proteins from mixed coarse-grained elastic network model. Qsar & Combinatorial Science. 2005;24(4):443–448.
  • 37. Tama F, Gadea FX, Marques O, Sanejouand YH. Building-block approach for determining low-frequency normal modes of macromolecules. Proteins-Structure Function and Genetics. 2000;41(1):1–7. doi: 10.1002/1097-0134(20001001)41:1<1::aid-prot10>3.0.co;2-p.
  • 38. Tama F, Sanejouand YH. Conformational change of proteins arising from normal mode calculations. Protein Engineering. 2001;14(1):1–6. doi: 10.1093/protein/14.1.1.
  • 39. Tirion MM. Large amplitude elastic motions in proteins from a single-parameter, atomic analysis. Physical Review Letters. 1996;77(9):1905–1908. doi: 10.1103/PhysRevLett.77.1905.
  • 40. Atilgan AR, Durell SR, Jernigan RL, Demirel MC, Keskin O, Bahar I. Anisotropy of fluctuation dynamics of proteins with an elastic network model. Biophysical Journal. 2001;80(1):505–515. doi: 10.1016/S0006-3495(01)76033-X.
  • 41. Song G, Jernigan RL. vGNM: A better model for understanding the dynamics of proteins in crystals. Journal of Molecular Biology. 2007;369(3):880–893. doi: 10.1016/j.jmb.2007.03.059.
  • 42. Skliros A, Mark JE, Kloczkowski A. Chain Dimensions and Fluctuations in Elastomeric Networks in which the Junctions Alternate Regularly in their Functionality. J Chem Phys. 2009;130:19. doi: 10.1063/1.3063115. Art. No. 064905.
  • 43. Kuhn W. Relationship between molecular size, static molecular shape and elastic properties of high polymer materials. Kolloid-Zeitschrift. 1936;76:258.
  • 44. Jensen JH, Gordon MS. An approximate formula for the intermolecular Pauli repulsion between closed shell molecules. II. Application to the effective fragment potential method. Journal of Chemical Physics. 1998;108(12):4772–4782.
  • 45. Skliros A, Mark JE, Kloczkowski A. Small-Angle Neutron Scattering from Elastomeric Networks in which the Junctions Alternate Regularly in their Functionality. Macromolecular Theory and Simulations. 2009;18(9):537–544.
  • 46. Kloczkowski A, Mark JE, Erman B. Chain Dimensions and Fluctuations in Random Elastomeric Networks .1. Phantom Gaussian Networks in the Undeformed State. Macromolecules. 1989;22(3):1423–1432.
  • 47. Haliloglu T, Bahar I, Erman B. Gaussian dynamics of folded proteins. Physical Review Letters. 1997;79(16):3090–3093.
  • 48. Doruker P, Jernigan RL. Functional motions can be extracted from on-lattice construction of protein structures. Proteins-Structure Function and Genetics. 2003;53(2):174–181. doi: 10.1002/prot.10486.
  • 49. Yang LW, Eyal E, Chennubhotla C, Jee J, Gronenborn AM, Bahar I. Insights into equilibrium dynamics of proteins from comparison of NMR and X-ray data with computational predictions. Structure. 2007;15(6):741–749. doi: 10.1016/j.str.2007.04.014.
  • 50. Haliloglu T, Bahar I, Erman B. Gaussian dynamics of folded proteins. Physical Review Letters. 1997;79(16):3090–3093.
  • 51. Keskin O, Durell SR, Bahar I, Jernigan RL, Covell DG. Relating molecular flexibility to function: A case study of tubulin. Biophysical Journal. 2002;83(2):663–680. doi: 10.1016/S0006-3495(02)75199-0.
  • 52. Keskin O, Bahar I, Flatow D, Covell DG, Jernigan RL. Molecular mechanisms of chaperonin GroEL-GroES function. Biochemistry. 2002;41(2):491–501. doi: 10.1021/bi011393x.
  • 53. Navizet I, Lavery R, Jernigan RL. Myosin flexibility: Structural domains and collective vibrations. Proteins-Structure Function and Genetics. 2004;54(3):384–393. doi: 10.1002/prot.10476.
  • 54. Wang YM, Rader AJ, Bahar I, Jernigan RL. Global ribosome motions revealed with elastic network model. Journal of Structural Biology. 2004;147(3):302–314. doi: 10.1016/j.jsb.2004.01.005.
  • 55. Wang YM, Jernigan RL. Comparison of tRNA motions in the free and ribosomal bound structures. Biophysical Journal. 2005;89(5):3399–3409. doi: 10.1529/biophysj.105.064840.
  • 56. Yan AM, Wang YM, Kloczkowski A, Jernigan RL. Effects of Protein Subunits Removal on the Computed Motions of Partial 30S Structures of the Ribosome. Journal of Chemical Theory and Computation. 2008;4(10):1757–1767. doi: 10.1021/ct800223g.
  • 57. Kurkcuoglu O, Doruker P, Sen TZ, Kloczkowski A, Jernigan RL. The ribosome structure controls and directs mRNA entry, translocation and exit dynamics. Physical Biology. 2008;5(4). doi: 10.1088/1478-3975/5/4/046005.
  • 58. Kundu S, Jernigan RL. Molecular mechanism of domain swapping in proteins: An analysis of slower motions. Biophysical Journal. 2004;86(6):3846–3854. doi: 10.1529/biophysj.103.034736.
  • 59. Feng YP, Yang L, Kloczkowski A, Jernigan RL. The energy profiles of atomic conformational transition intermediates of adenylate kinase. Proteins-Structure Function and Bioinformatics. 2009;77(3):551–558. doi: 10.1002/prot.22467.
