Abstract
Structural fluctuations of a protein are essential for a protein to function and fold. By using molecular dynamics (MD) simulations of the model α/β protein VA3 in its native state, the coupling between the main-chain (MC) motions [represented by coarse-grained dihedral angles (CGDAs) γn based on four successive Cα atoms (n - 1, n, n + 1, n + 2) along the amino acid sequence] and its side-chain (SC) motions [represented by CGDAs δn formed by the virtual bond joining two consecutive Cα atoms (n, n + 1) and the bonds joining these Cα atoms to their respective Cβ atoms] was analyzed. The motions of SCs (δn) and MC (γn) over time occur on similar free-energy profiles and were found to be subdiffusive. The fluctuations of the SCs (δn) and those of the MC (γn) are generally poorly correlated on a ps time-scale with a correlation increasing with time to reach a maximum value at about 10 ns. This maximum value is close to the correlation between the δn(t) and γn(t) time-series extracted from the entire duration of the MD runs (400 ns) and varies significantly along the amino acid sequence. High correlations between the SC and MC motions [δ(t) and γ(t) time-series] were found only in flexible regions of the protein for a few residues which contribute the most to the slowest collective modes of the molecule. These results are a possible indication of the role of the flexible regions of proteins for the biological function and folding.
Keywords: dihedral principal component analysis, free-energy landscape, power law, subdiffusion
The motions of the side chains and of the backbone of a protein are coupled to each other in protein folding. The removal of the nonpolar side chains from the solvent in protein folding orients the backbone locally (1), and the formation of the backbone hydrogen bonds between residues in the secondary structures induces displacements of their side chains. Therefore, in order to understand protein folding, it is necessary to understand how the motions of the side chains are propagated to the backbone and vice versa. As a prerequisite, it is necessary to understand how the motions of the side chains are coupled to those of the main chain in the native state of a protein. To address this question, five 80 ns all-atom molecular dynamics (MD) simulations of a 46-residue α/β model protein VA3 (PDB: 1ED0) (2) in its native state were analyzed. Protein VA3 was chosen here because it is highly homologous to the well-studied protein crambin, but unlike crambin (3), it is soluble in water (2).
To express the correlation between the main-chain and the side-chain motions quantitatively, two coarse-grained dihedral angles (CGDAs) were defined for each residue n (Fig. S1A). The angles characterizing the fluctuations of the main chain are the CGDAs γn (4–6) formed by the virtual bonds joining four consecutive Cα atoms (n - 1, n, n + 1 and n + 2) along the amino acid sequence with 2 ≤ n ≤ N - 2 and N being the number of residues (Fig. S1A). The angles γn are coordinates used to describe large changes of protein conformation (7) and are part of coarse-grained models of proteins (8, 9). The angles characterizing the fluctuations of the side chains are defined by the CGDAs δn (1 ≤ n ≤ N - 1) formed by the virtual bond joining two consecutive Cα atoms (n, n + 1) and the bonds joining these Cα atoms to their respective Cβ atoms (Fig. S1A). For the residue glycine, the side-chain H atom was defined as a pseudo-Cβ atom.
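The two definitions above can be sketched in code. This is a minimal illustration, not the authors' implementation: the function names `dihedral` and `cgda`, the 0-based array layout, and the assumption that pseudo-Cβ positions for glycine have already been placed in the Cβ array are all hypothetical choices made here.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (radians) defined by four points."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.arctan2(y, x)

def cgda(ca, cb):
    """Coarse-grained dihedral angles for one structure.

    ca, cb: arrays of shape (N, 3) with the C-alpha and C-beta
    (or pseudo-C-beta) coordinates, residue n stored at index n - 1.
    Returns gamma (residues 2..N-2) and delta (one per consecutive pair).
    """
    n_res = len(ca)
    gamma = np.array([dihedral(ca[i - 1], ca[i], ca[i + 1], ca[i + 2])
                      for i in range(1, n_res - 2)])
    delta = np.array([dihedral(cb[i], ca[i], ca[i + 1], cb[i + 1])
                      for i in range(n_res - 1)])
    return gamma, delta
```

Applied frame by frame to an MD trajectory, such a routine would yield the γn(t) and δn(t) time series analyzed in the rest of the text.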
To monitor the fluctuations of the side chains, the displacements of the Cβ atoms (and pseudo-Cβ in glycine) of the residues were chosen because they are the atoms of the side chains nearest to the backbone atoms. Consequently, their motions should be strongly related to those of the main chain. Assuming rigid covalent bonds, this coupling can be understood as follows. Because the Cα(n) atom is an sp3-hybridized atom, motion of the Cα(n)–Cβ(n) bond of residue n (modifying the CGDA δn) induces the displacements of the N(n)–Cα(n) and Cα(n)–C′(n) backbone bonds (C′ and N atoms are defined in Fig. S1A). Because of the rigidity of the peptide plane and the partial double bond character of the C′(n - 1)–N(n) and C′(n)–N(n + 1) bonds, the motion of the Cα(n)–Cβ(n) bond propagates at least to residues n - 1 and n + 1 (modifying the CGDA γn). The coupling between the motions of the Cα–Cβ bonds and those of the amide and carbonyl bonds of the backbone also depends on the bond-length and bond-angle deformations, on the noncovalent interactions between the atoms, on the nature of the residue, and on the interactions of the protein with the solvent. Therefore, the magnitude of the correlation between the side-chain motions (measured here by δn) and the main-chain motions (measured here by γn) should vary along the amino acid sequence.
In order to quantify and to understand the coupling between the side chains and the main chain of a protein in its native state, the effective 1-D free-energy profiles (FEPs) V(δn) and V(γn) (5, 6) along the amino acid sequence of the model protein VA3 (Fig. 1) were first compared. The FEPs were computed over the whole MD runs by using V(δn) = -kT ln[P(δn)] and V(γn) = -kT ln[P(γn)], where k and T = 300 K are the Boltzmann constant and temperature, respectively, and P(δn) and P(γn) are the probability distribution functions (PDFs) of each CGDA δn and γn (5, 6). As shown in Fig. S2, the FEPs V(δn) and V(γn) of each residue n were found to be highly correlated [the standard correlation coefficient R (10) averaged along the amino acid sequence is 0.90; see Results section]. This result, as expected, reflects the local constraints imposed by the sp3 hybridization of the Cα atoms on the δn and γn motions.
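The relation V = -kT ln[P] can be illustrated with a histogram estimate of the PDF. The sketch below rests on assumptions that are not in the text: the bin count, the energy unit (kcal/mol), the shift of the minimum to zero, and the function name `free_energy_profile` are illustrative choices only.

```python
import numpy as np

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K); unit choice is an assumption

def free_energy_profile(angles_deg, temperature=300.0, n_bins=72):
    """Effective 1-D free-energy profile V = -kT ln P from sampled angles.

    angles_deg: 1-D array of dihedral angles in degrees, in [-180, 180].
    Returns bin centers and V shifted so that its minimum is zero.
    Empty bins get V = +inf.
    """
    pdf, edges = np.histogram(angles_deg, bins=n_bins,
                              range=(-180.0, 180.0), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(divide="ignore"):
        v = -K_B * temperature * np.log(pdf)
    v -= v[np.isfinite(v)].min()
    return centers, v
```

Shifting each profile to a common zero is one simple way to prepare V(δn) and V(γn) for a shape comparison; it is not necessarily the alignment procedure used for Fig. S2.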
The similarity between the equilibrium properties V(δn) and V(γn) does not imply that each point of these FEPs is explored at the same time; i.e., this comparison does not provide any information about the possible dynamical coupling between δn and γn. To examine the dynamical correlation, the analogy between the fluctuations over time of a dihedral angle (for example, γn) and Brownian motion on a unit circle (5, 6) (Fig. S1B) is exploited. Since each dihedral angle is coupled to a huge number of microscopic degrees of freedom, its temporal evolution is a stochastic process. The time-evolution of each γn(t) [or δn(t)] extracted from each MD run is interpreted as the random walk of a fictitious particle on a unit circle; the particle makes a random angular jump of amplitude Δγn(t) = γn(t + δt) - γn(t) [or Δδn(t) = δn(t + δt) - δn(t)] at each time step δt for which the MD run is recorded (δt = 1 ps in refs. 5 and 6 and in the present work). The mechanism of transport of the particle along the circle, due to random fluctuating torques, is characterized by its mean-square displacement (MSD) and rotational correlation functions (RCFs) (6). The RCFs associated with γn are related to the position un(t) of the fictitious particle on the circle, where un(t) = { cos[γn(t)], sin[γn(t)]} (Fig. S1B). For circular geometry, the RCFs are defined (6, 11) as
Cl,n(t) = ⟨Tl(xn)⟩ = ⟨Tl[un(t′)•un(t′ + t)]⟩, [1]
where Tl(xn) is a Tchebychev polynomial of order l (11). In the present work, the second-order RCFs, T2(x) = 2x^2 - 1, were analyzed [results for T1(x) = x contain similar information (6)]. In Eq. 1, xn ≡ un(t′)•un(t′ + t) is the cosine of the angular displacement Δγn of un between t′ and t′ + t, and Tl is averaged over all possible initial orientations (at all times t′). The RCFs of δn are defined similarly by Eq. 1 in which the vector un(t) = { cos[δn(t)], sin[δn(t)]}.
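Because Tl(cos x) = cos(lx) for Tchebychev polynomials, the RCF of order l reduces to the time average of cos(l Δγn). A minimal sketch of that estimate follows; the function name `rcf` and the choice of lags in units of the recording step are assumptions made for illustration.

```python
import numpy as np

def rcf(angle_rad, max_lag, order=2):
    """Rotational correlation function of a dihedral-angle time series.

    Uses T_l(cos x) = cos(l x), so that
    C_l(t) = < cos(l * [theta(t' + t) - theta(t')]) > averaged over t'.
    angle_rad: 1-D array of angles in radians, one value per recorded step.
    Returns C_l at lags 0, 1, ..., max_lag (in units of the recording step).
    """
    theta = np.asarray(angle_rad, dtype=float)
    c = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        displacement = theta[lag:] - theta[:len(theta) - lag]
        c[lag] = np.mean(np.cos(order * displacement))
    return c
```

The cosine makes the estimate insensitive to 2π jumps in the stored angles, so no unwrapping is needed for this quantity.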
It was previously established that the MSD of the random walker, associated with the temporal fluctuations of each γn, is a power law of time: MSD(t) = 2Dα t^α, with a diffusion constant Dα and an exponent α < 0.4 depending on n (5, 6). As shown in ref. 6, the RCFs associated with un decay as stretched exponentials (SEs): Cl,n(t) = exp(-l^2 Dα t^α) (6). The quantity 2Dα is the variance of the distribution of the random angular jumps Δγn(t) of the random walker, and the exponent α is related to the speed of the diffusion (6). An exponent α < 1 corresponds to a subdiffusive regime (12–15) previously observed in fluorescence experiments (16) and in MD simulations (5, 6, 17–19) of proteins. The subdiffusion means that the random walker has a memory of its past (13). In other words, the random torques which push the walker back and forth on the circle are correlated in time (16).
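The power-law form of the MSD can be checked numerically on an angle time series. The sketch below is an illustration under assumptions not stated in the text: the angle is unwrapped with `np.unwrap` before displacements are taken, and α and Dα are obtained from a straight-line fit in log-log space; the function names are hypothetical.

```python
import numpy as np

def msd(angle_rad, lags):
    """Mean-square angular displacement at the given integer lags (>= 1)."""
    theta = np.unwrap(np.asarray(angle_rad, dtype=float))  # remove 2*pi jumps
    return np.array([np.mean((theta[m:] - theta[:-m]) ** 2) for m in lags])

def fit_power_law(lags, msd_values, dt=1.0):
    """Fit MSD(t) = 2 * D_alpha * t**alpha by linear regression in log-log space.

    dt is the recording step (1 ps in the text); returns (alpha, D_alpha).
    """
    t = np.asarray(lags, dtype=float) * dt
    alpha, intercept = np.polyfit(np.log(t), np.log(msd_values), 1)
    d_alpha = 0.5 * np.exp(intercept)
    return alpha, d_alpha
```

A fitted α below 1 would indicate subdiffusion in the sense used above; α = 1 corresponds to ordinary Brownian diffusion.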
It is found here that the side-chain motions, characterized by δn, are also subdiffusive. The RCFs of δn computed from MD runs (Eq. 1, l = 2) decay as an SE, i.e., C2,n(t) = exp(-4Dα t^α) (Fig. S3), with α and Dα varying along the amino acid sequence (Fig. 2). The side chains and the main chain diffuse in quite a similar way: the correlation coefficient R computed between the exponents α of the RCFs of the γn and those of the RCFs of the δn angles, averaged along the amino acid sequence of VA3, is R = 0.87. The correlation coefficient R computed between the diffusion constants Dα of the RCFs of these CGDAs, averaged along the amino acid sequence of VA3, is R = 0.55.
The FEPs (Fig. 1) and the parameters characterizing the diffusion on these FEPs (Fig. 2) are quite similar for the side chains (δn) and the main chain (γn). One could conclude naively that the two random walkers, moving on a circle and representing the temporal fluctuations of γn and δn, respectively (Fig. S1B), are most likely making steps in the same direction at the same time. Consequently, one could expect that the time series γn(t) and δn(t) extracted from the MD runs (referred to here, for short, as the γn(t) and δn(t) trajectories) should be highly correlated. To express this correlation of the motions between the side chain and the backbone at residue n quantitatively, the coefficient R between the steps that the walkers are performing on the δn and γn circles (Fig. S1B) is computed in the five MD runs. For each MD run, the correlation coefficient R between the steps of the walkers, averaged over the amino acid sequence, is not very high and varies widely between residues (Ravg = 0.42, Rmin = 0.1 for n = 3 (CYS), and Rmax = 0.63 for n = 36 (SER) in MD run 1, for example). In contrast, the correlation between the γn(t) and δn(t) trajectories, computed over 80 ns for each of the five MD runs and averaged over the amino acid sequence, is quite high, but also varies significantly between residues (Ravg = 0.68, Rmin = 0.33 for n = 7 (THR), and Rmax = 0.99 for n = 38 (SER) in MD run 1, for example). In fact, it was found that the trajectories of the side chains [δn(t)] and of the main chain [γn(t)] are highly correlated (R ∼ 1) only for a few residues n along the sequence. It is demonstrated here that the correlation between the side-chain and main-chain motions around a residue n depends on its location and not on its nature.
The question arises as to whether residues with a high correlation between the motions of their side chains (δn) and of the main chain (γn), which are scattered along the amino acid sequence of the protein, also have motions that are coupled to each other through space. To answer this question, a dihedral principal component analysis (dPCA) of the trajectories of the CGDAs (20) was carried out. The dPCA facilitates a projection of the dihedral-angle coordinates of a protein on a few relevant coordinates along which the FEPs and the collective modes of the protein can be analyzed (20–22). By using dPCA, it is demonstrated that the trajectories of the side chains [δn(t)] follow those of the main chain [γn(t)] (i.e., R ∼ 1) for the residues which contribute the most to the slowest modes of the protein. This conclusion is particularly interesting because the slowest modes of a protein are the ones important for its biological function (23–27). Because the slowest modes of a protein in its native state are the easiest to excite thermally, they should also play a role in the early events of the thermal unfolding of a protein.
Results
Comparison of the FEPs of the CGDAs (Fig. 1).
The minima of the FEPs, V(δn) and V(γn), are in good agreement with the values of the CGDAs computed from the NMR-derived structures (2) and x-ray structures (PDB: 1OKH) (28) (Fig. S4). The largest deviations compared with the experimental data were found for δ1, δ3, δ28, δ35, γ36, γ37, γ38, and γ42 (compared with NMR), and for γ6, γ35, γ37, γ38, δ38, γ42, and δ42 (compared with x-ray); the CGDAs at n = 1, 35–38 are the angles which are significantly different in the NMR and x-ray structures (Fig. S4). The potentials in helices (Fig. 1) are found to be rather harmonic (except n = 15, 25, 26) with a minimum at approximately 50° for V(γn) and approximately 60° for V(δn), as expected for a canonical right-handed α-helix [for an α-helix (ϕ = -57°, ψ = -47°), γ = 52° and δ = 70°]. The potentials of residues in the antiparallel β-sheet have a minimum close to ± 180°, as for a canonical β-sheet [for an antiparallel β-sheet (ϕ = -139°, ψ = 135°), γ = δ = 178°]. Several CGDAs γn and δn for n = 26, 34, 35, 37, 38, 39, 41 move on a multiple-minima potential; they are located in loops/turns except for n = 26 and 34 (6). In addition, the N-terminal potential V(δ1) has two minima. As shown in Fig. 1, the FEP of each CGDA δn is quite similar to the FEP of the corresponding angle γn: the anharmonic potentials with single or multiple minima were found for the same n. In order to quantify the similarity between V(δn) and V(γn) along the amino acid sequence, these two potentials were aligned on their deepest minimum for each value of n (Fig. S2). The correlation coefficient R (10) between the aligned FEPs and an index of similarity h between the corresponding PDFs were computed. The index h varies between 1 (similar) and 0 (dissimilar). The index h was adapted from ref. 29 and is defined in the SI Text: Calculation of the Pearson Correlation Coefficient R and Similarity Index h for the FEPs.
Whereas the coefficient R (10) measures the similarity of the shapes of two functions (a function scaled by a constant factor is perfectly correlated to itself), h rather measures their absolute similarity (29). Because of the definition of h (SI Text), the low values of the PDFs (the less statistically relevant data extracted from MD corresponding to the highest values of the potentials) have a small weight in the computation of the similarity. The FEPs were found to be highly correlated [R ∼ 0.8–1.0 (Fig. S2), except n = 3 (R = 0.65)] and their PDFs highly similar [h ∼ 1 (Fig. 1), except n = 36 (h = 0.85) and n = 38 (h = 0.81)].
Comparison of the RCFs of the CGDAs.
All the RCFs of the 43 angles γn (6) and of the 44 angles δn of VA3 were computed up to 1 ns by using Eq. 1 as in ref. 6. The RCFs are converged on this time-scale (6). Three typical RCFs found for the CGDAs δn are identified and are represented in Fig. S3 for the MD run 1 [representative of all runs (6)]. As for the CGDAs γn (6), the RCFs are grouped into three types corresponding to three types of FEP V(δn) (all shown in Fig. S4). As shown in Fig. S3, the RCF of the CGDA δn converges in a few ps to a value close to 1 for a stiff harmonic FEP (as for δ11) (first type) or to a value between 0.7 and 0.9 for a wide or anharmonic single-minimum FEP (as for δ20) (second type), and does not converge on a time-scale of 1 ns for a multiple-minima FEP (as for δ39) (third type). As for the CGDAs of the main chain (6), the RCFs of the CGDAs of the side chains are also extremely well represented by an SE (dashed line in Fig. S3).
Typical results for the parameters [α, Dα] of the SE fitted on the RCFs up to 1 ns are shown in Fig. 2 for the MD run 1 for each CGDA γn and δn. The exponent α, measuring the speed of the spread of the PDF of the CGDA with the time elapsed (6), varies in an extremely similar manner along the amino acid sequence for the CGDAs γn and δn (correlation coefficient R = 0.87). The exponent α associated with side-chain diffusion is the largest for a multiple-minima V(δn) potential (n = 1, 26, 33 and 39, Fig. 2A and Fig. S4) and the lowest for strongly harmonic potentials located in the secondary structures (for example, n = 11, Fig. 2A and Fig. S4), as observed previously for the diffusion of the main chain (6). The exponent α is smaller for δn than γn for most of the residues n (Fig. 2A), which means (13, 14) that the successive jumps of the side chains (δn) are more correlated than the segmental motions of the main chain (γn) for these residues. Fig. 2 demonstrates that both the side-chain and the main-chain motions correspond to subdiffusion (α < 1) (13). The smallest diffusion constants of the side-chain and main-chain CGDAs are found in helices (Fig. 2B). Except for n = 5 and 6, the diffusion constant Dα (Fig. 2B) is larger for the side-chain motions than for the main-chain motions, indicating that the side chains are making larger angular displacements than the main chain, which is more constrained. The highest diffusion constants Dα correspond to the CGDAs δn located at n = 8, 19, and 36 (they involve the motions of Gly9, Gly20 and Gly37).
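The parameters [α, Dα] of a stretched exponential can be extracted from an RCF in several ways. The sketch below uses a double-logarithm linearization, ln[-ln C(t)] = ln(l^2 Dα) + α ln t, which is an assumption made here for illustration and not necessarily the fitting procedure of ref. 6; the function name `fit_stretched_exponential` is hypothetical.

```python
import numpy as np

def fit_stretched_exponential(t, c, order=2):
    """Fit C(t) = exp(-order**2 * D_alpha * t**alpha).

    t: times (> 0), c: RCF values; only points with 0 < c < 1 are used,
    because the double logarithm is undefined elsewhere.
    Returns (alpha, D_alpha).
    """
    t = np.asarray(t, dtype=float)
    c = np.asarray(c, dtype=float)
    mask = (t > 0) & (c > 0) & (c < 1)
    slope, intercept = np.polyfit(np.log(t[mask]),
                                  np.log(-np.log(c[mask])), 1)
    alpha = slope
    d_alpha = np.exp(intercept) / order ** 2
    return alpha, d_alpha
```

A nonlinear least-squares fit on C(t) itself would weight the data differently, so the two approaches need not give identical parameters on noisy RCFs.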
Dynamical Correlation Between the Main-Chain (γn) and Side-Chain (δn) Motions.
The CGDAs of the main chain and of the side chains diffuse in a similar way (the values of α and Dα of their RCFs are similar) and on very similar FEPs (h between their FEPs is very high). Based on the definition of δn and γn (Fig. S1) and on the sp3 hybridization of the Cα atom (see Introduction), one could imagine that each δn angle follows the motion of the corresponding γn angle in the course of time, and that both CGDAs thus explore the same region of their respective FEPs at the same time. In order to test this hypothesis, the correlation coefficient R between the time series of the angular steps Δγn(t) and Δδn(t), extracted every ps from the MD runs (see Introduction), was computed for each value of n (see Eq. S1). The results are represented in Fig. 3A. The value of R calculated between the time series Δγn(t) and Δδn(t) varies from Rmin = 0.12 (n = 3, CYS) to Rmax = 0.62 (n = 36, SER) with a low average value Ravg = 0.41 (the standard deviation σ computed over the amino acid sequence is 0.12). By contrast with these results, the average correlation coefficient R computed between the γn(t) and δn(t) trajectories is rather high, Ravg = 0.67 (Fig. 3A), but also varies widely along the primary sequence: Rmin = 0.24 (n = 7, THR) and Rmax = 0.98 (n = 38, SER) (σ computed over the amino acid sequence is 0.19). The coefficients R computed between the δn(t) and γn(t) trajectories and the Δγn(t) and Δδn(t) time series follow the same trends qualitatively along the amino acid sequence, with exceptions, for example, between n = 32 and 34 (Fig. 3A). Regions of the protein for which R computed between the δn(t) and γn(t) trajectories is larger than Ravg + σ (shown by a dashed line in Fig. 3A) are located in loops (except n = 26, 33, and 34). One concludes that the angular steps Δγn(t) and Δδn(t) are poorly correlated compared with the trajectories δn(t) and γn(t), which are highly correlated only for a few residues.
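Correlating two angular time series requires some treatment of periodicity. The formula used for the trajectories is given in the SI Text and is not reproduced here; as a stand-in, the sketch below uses the circular correlation coefficient of Jammalamadaka and SenGupta, a common choice for angular data. The function names are hypothetical.

```python
import numpy as np

def circular_mean(a):
    """Mean direction of a set of angles (radians)."""
    return np.arctan2(np.mean(np.sin(a)), np.mean(np.cos(a)))

def circular_correlation(a, b):
    """Circular correlation coefficient between two angle series (radians).

    Deviations from the mean direction enter through their sine, so the
    result is unchanged when any sample is shifted by a multiple of 2*pi.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    sa = np.sin(a - circular_mean(a))
    sb = np.sin(b - circular_mean(b))
    return np.sum(sa * sb) / np.sqrt(np.sum(sa ** 2) * np.sum(sb ** 2))
```

For narrowly distributed angles this coefficient approaches the ordinary Pearson coefficient, while for angles near ±180° it avoids the artificial jumps that a naive Pearson calculation would pick up.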
Discussion
Correlation Between the CGDA Trajectories and Steps.
How can the correlation coefficient between the CGDA trajectories (0.24 < R < 0.98) be significantly higher than the correlation coefficient (0.12 < R < 0.67) between the coarse-grained angular steps computed every ps (in particular, the values of R between n = 32 and 34 between the CGDA trajectories are larger by about 0.6 than the values of R computed between the CGDA steps; see Fig. 3A)? The values of δn(t) and γn(t) at time t are, respectively, the result of the sum of successive random steps Δδn(τ) and Δγn(τ) from time τ = 0 up to time t. If the correlation R is higher for the CGDA trajectories δn(t) and γn(t) than for the steps Δδn(t) and Δγn(t) computed at the same time t, it would mean that the correlation coefficient between the trajectories increases as they carry out more steps. In other words, they are following the same “average” trajectories as time progresses. To test this hypothesis, two new sets of random steps Δγn(t; M) = γn(t + Mδt) - γn(t) and Δδn(t; M) = δn(t + Mδt) - δn(t) (δt = 1 ps) were generated from the MD runs for each value of M = 10, 30, 100, 500, 1,000, 2,000, 10,000, and each value of n represented in Fig. 3 (see SI Text: Calculation of the Pearson Correlation Coefficient R for the Dihedral Angular Steps for details). The time series Δγn(t) and Δδn(t) discussed in the Results section (R between these time series is shown in Fig. 3A) are, by definition, equal to Δγn(t; 1) and Δδn(t; 1). The data Δγn(t; M ≠ 1) [Δδn(t; M ≠ 1)] correspond to a new time series for which each step (every ps) is in fact the displacement of the angle γn [δn] over an interval of M ps. The values of R computed between Δγn(t; M) and Δδn(t; M) increase with M and converge to the values of R computed independently between the γn(t) and δn(t) trajectories [the variations of Rn(M) are shown for Rn(M)avg, Rn(M)min and Rn(M)max in Fig. 3B and for each value of n in Fig. S5].
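The construction of the lagged steps Δγn(t; M) and Δδn(t; M) and of the coefficient Rn(M) can be sketched as follows. The handling of periodicity (wrapping each step into [-π, π) before applying the standard Pearson formula) is an assumption made here; the procedure actually used is described in the SI Text. The function names are hypothetical.

```python
import numpy as np

def wrap(angle):
    """Map angles (radians) into the interval [-pi, pi)."""
    return (angle + np.pi) % (2.0 * np.pi) - np.pi

def lagged_steps(angle_rad, m):
    """Displacements theta(t + m*dt) - theta(t), wrapped to [-pi, pi)."""
    theta = np.asarray(angle_rad, dtype=float)
    return wrap(theta[m:] - theta[:-m])

def step_correlation(gamma, delta, lags):
    """Pearson R between the lagged steps of two angle series, per lag M."""
    return np.array([np.corrcoef(lagged_steps(gamma, m),
                                 lagged_steps(delta, m))[0, 1]
                     for m in lags])
```

As a usage example, two series that share a slow component but carry independent fast noise give a small R at M = 1 and a large R at large M, which is the qualitative behavior described above:

```python
rng = np.random.default_rng(0)
slow = np.cumsum(rng.normal(0.0, 0.01, 20000))   # shared slow drift
gamma = slow + rng.normal(0.0, 0.02, 20000)      # independent fast noise
delta = slow + rng.normal(0.0, 0.02, 20000)
r = step_correlation(gamma, delta, [1, 10, 100, 1000])
```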
The convergence of Rn(M) is reached at M = 10,000 (i.e., 10 ns) for all residues n, with small deviations between Rn(10 ns) and R computed between the trajectories for n = 2, 8, 9, 19, 25, 32–36 (Fig. S5). The slope of Rn(M) varies along the amino acid sequence (Fig. S5); for example, Rn(M) reaches a plateau at M = 30 (30 ps) for n = 9 and only at M = 10,000 for n = 27. As shown by Eq. S3 in the SI Text, Rn(M) is related to the cross-correlation function C(0; M) of the steps, with α and α′ being the exponents of the time in the power-law MSDs of the CGDA γn and δn, respectively (5, 6). The average in C(0; M) is taken over all times t′ in the whole MD run. It is worth noting that the exponents of the MSD were shown to be close to those of the RCFs given in Fig. 2 (6). For most of the residues (except n = 7, 14, 24, 35, 36), Rn(M) was found here to be approximated by a power law of time: Rn(M) ∼ M^β. The exponent β was computed from a fit of Rn(M) by a power law up to M = 1,000, for each residue n for which the power law was found to be a good approximation. The approximate relation β ≈ α′ ≈ α was found for most of the residues along the primary sequence (Fig. 2 and Fig. S5). This qualitative relation can be explained as follows. The most subdiffusive CGDAs (low exponents in Fig. 2) are the CGDAs for which the RCFs, which involve the correlation between the steps [Eq. 1], converge quickly (< 1 ns) to a constant (for example, n = 11 and 20 in Fig. S3). For these CGDAs, the correlation coefficient between the steps Δγn(t′; M) and Δδn(t′; M) [Cn(0)] also converges to a constant quickly (small β, Fig. 2 and Fig. S5). Conversely, the less subdiffusive CGDAs (large exponents in Fig. 2) are those for which the RCFs do not converge to a constant in the ns time-scale (for example, n = 39 in Fig. S3). For these CGDAs, the correlation coefficient between the steps Δγn(t′; M) and Δδn(t′; M) [Cn(0)] does not converge to a plateau in the ns time-scale (high β, Fig. 2 and Fig. S5).
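One way to see why the MSD exponents α and α′ enter a relation between Rn(M) and the cross-correlation of the steps is sketched below. This is an illustration under assumptions that are not taken from the SI Text: the steps are assumed to have zero mean, the MSDs are assumed to follow their power laws exactly, and the symbols D′α′ (diffusion constant of δn) and Cn(0; M) (unnormalized cross-correlation at zero delay) are notation introduced here. It is not a reproduction of Eq. S3.

```latex
\begin{align}
R_n(M) &= \frac{\langle \Delta\gamma_n(t';M)\,\Delta\delta_n(t';M)\rangle_{t'}}
               {\sqrt{\langle \Delta\gamma_n(t';M)^2\rangle_{t'}\,
                      \langle \Delta\delta_n(t';M)^2\rangle_{t'}}} \\
       &= \frac{C_n(0;M)}
               {\sqrt{2D_\alpha (M\,\delta t)^{\alpha}\;
                      2D'_{\alpha'} (M\,\delta t)^{\alpha'}}}
        = \frac{C_n(0;M)}
               {2\sqrt{D_\alpha D'_{\alpha'}}\;(M\,\delta t)^{(\alpha+\alpha')/2}}
\end{align}
```

In this form, the growth of Rn(M) with M is governed by how fast the cross-correlation of the steps grows relative to the geometric mean of the two MSDs.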
Main-Chain and Side-Chain Motions, Anharmonic FEPs, and Mean-Square-Fluctuations (MSF) of the CGDAs.
Why does the correlation coefficient R between the trajectories of the side chains [δn(t)] and of the main chain [γn(t)] vary so widely along the amino acid sequence (Fig. 3A)? The correlation coefficient does not depend on the nature of the residues. For example, the rotations of the side chains around the Cα atoms of the virtual bonds CYS4-PRO5 and CYS40-PRO41 correspond to different values of R: R = 0.4 (n = 4) and R = 0.76 (n = 40) (Fig. 3A). In fact, the correlation coefficient between the CGDA trajectories is very high (R > Ravg + σ = 0.85) only for a few residues: n = 22, 26, 33–35, 37–39, 41, and 43. The FEPs of these highly correlated regions correspond either to wide single-minimum FEPs (n = 22, 33, 43) or multiple-minima FEPs (n = 26, 34, 35, 37–39, and 41) (Fig. 1). Wide FEPs correspond to values of R above the average; for example, n = 21 (R = 0.81) or n = 40 (R = 0.76). They therefore correspond to regions of large structural fluctuations of the protein (“flexible” regions), as demonstrated in Fig. 4 where we show the MSF of each vector un = { cos[γn(t)], sin[γn(t)]} (MSFn) along the amino acid sequence (see SI Text: The Dihedral Principal Component Analysis). The motions of the main chain (γn) and of the side chains (δn) are less constrained in the flexible regions than in the rigid regions of the protein. This explains why the main chain can follow, on average, the side-chain motions and vice versa in these flexible regions. The correlation between the trajectories γn(t) and δn(t) is very low for a few residues (n = 4, 7, 9, 11, 12, 18 and 28), which are all located in a helix (except 4) and have low MSFn (Fig. 4). Because the motions of the main chain are generally strongly restricted by the amide hydrogen bonds in a helix (the FEPs are stiff harmonic potentials as for n = 12 in Fig. 1), the main chain cannot follow the motion of the side chain, on average, in regions of low structural fluctuations of the protein.
Main-Chain Motions, Side-Chain Motions, and Collective Modes.
The CGDA structural fluctuations MSFn can be decomposed into collective modes by using dPCA (20–22). The modes have “frequencies” and directions corresponding to the eigenvalues and eigenvectors of the covariance matrix of the vectors un (20) (see SI Text: The Dihedral Principal Component Analysis). The modes with the largest eigenvalues λk (named slow modes) correspond to the modes which contribute the most to the structural fluctuations of the protein. The contribution of each CGDA n to a mode k is the so-called influence νk,n, and the MSFn is the sum over all modes k of the contributions λkνk,n (20–22).
As the highest correlation coefficients between the δn(t) and γn(t) trajectories were observed for residues n with high MSFn (Fig. 4), one wonders if these residues contribute to the same slow collective modes. To answer this question, dPCA was applied to the vectors un(t) = { cos[γn(t)], sin[γn(t)]}. Only the two slowest modes with the largest eigenvalues λ1 and λ2 which contribute 37% and 13%, respectively, to the total MSF of the protein (i.e., the sum over n of the MSFn) were considered. The other modes make much smaller contributions to the total MSF (for example, modes 3 and 4 contribute 7% and 3%, respectively).
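The decomposition of the MSFn into mode contributions can be sketched as follows. The exact covariance matrix is defined in the SI Text and is not reproduced here; this sketch assumes an N × N matrix built from the dot products of the fluctuations of the vectors un, which gives one mode per angle (consistent with the 43 modes quoted in the Methods for the 43 angles γn). The function name `dpca_modes` and the definition of the influence as the squared eigenvector component are illustrative assumptions.

```python
import numpy as np

def dpca_modes(angles_rad):
    """Mode decomposition of the fluctuations of u_n = (cos, sin) vectors.

    angles_rad: array of shape (n_frames, n_angles), in radians.
    Returns eigenvalues (descending), influence[k, n] of angle n in mode k,
    and msf[n], with msf[n] = sum_k eigenvalues[k] * influence[k, n].
    """
    cos_dev = np.cos(angles_rad) - np.cos(angles_rad).mean(axis=0)
    sin_dev = np.sin(angles_rad) - np.sin(angles_rad).mean(axis=0)
    n_frames = angles_rad.shape[0]
    # cov[n, m] = < (u_n - <u_n>) . (u_m - <u_m>) >
    cov = (cos_dev.T @ cos_dev + sin_dev.T @ sin_dev) / n_frames
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]          # largest eigenvalue first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    influence = (eigenvectors ** 2).T              # rows: modes, columns: angles
    msf = np.diag(cov).copy()
    return eigenvalues, influence, msf
```

With these quantities, the fraction of the total MSF carried by mode k is `eigenvalues[k] / eigenvalues.sum()`, and the contribution of the two slowest modes to MSFn is `eigenvalues[0] * influence[0] + eigenvalues[1] * influence[1]`.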
In mode 1, the CGDAs γn contributing to the MSFn (as λ1ν1,n) are located at n = 2 and n = 32–41, as shown in Fig. 4; there are also small contributions at n = 25 and 26 which are hardly visible. The CGDAs n for which the trajectories γn(t) and δn(t) are highly correlated (R > Ravg + σ = 0.85), namely n = 22, 26, 32–35, 37–39, 41, and 43, are, remarkably, also those which contribute to the slowest mode (except n = 22, 26, and 43). The highest correlation coefficient between the motions of the side chains and of the main chain was found for n = 38 (Fig. 3A), which corresponds to the residue for which the amplitude of the CGDA γn is maximum in the slowest mode 1 (Fig. 4) and for which the FEP has the largest activation barrier (Fig. 1). Interestingly, mode 1 at n = 36 has a low amplitude (Fig. 4) compared with the other residues in the loop in which it is located. Similarly, the correlation coefficient between the side-chain and main-chain trajectories at n = 36 is low compared with the correlation coefficients of the other residues in the loop 34–44 (Fig. 3A).
The contribution of the first two modes to the MSFn of the vectors un(t) is λ1ν1,n + λ2ν2,n and is also shown in Fig. 4. The most important contributions of mode 2 to the MSF occur for n = 32 to 39 and 41 to 44 (Fig. 4). Most of the CGDAs contributing either to mode 1 or mode 2 move on multiple-minima FEPs (Fig. 1), namely n = 34, 35, 37, 38, 39, and 41, and on wide anharmonic FEPs for n = 2, 32, 33, 36, 43, and 44. Thus, it is expected that modes 1 and 2 should correspond to jumps between the different substates of the multiple-minima FEPs along the amino acid sequence (Fig. 1). To test this hypothesis, the 2-D projection of the free-energy-landscape (FEL) of the protein along the directions of the eigenvectors of modes 1 and 2 (20–22) was calculated. The projections of the trajectories of the vectors un(t) extracted from the MD runs along the eigenvectors of modes 1 and 2 are the principal components PC1(t) and PC2(t), respectively (20–22) (SI Text). From the 2-D PDFs of PC1 and PC2 computed from the MD runs, the potential V1,2 = -kT ln[P(PC1,PC2)], shown in the inset of Fig. 4, was calculated. This energy surface can be divided schematically into three basins, states 1, 2, and 3 in Fig. 4. In each basin, the most probable structure of VA3 in the concatenated MD runs was selected, and its CGDAs γ and δ were computed along the amino acid sequence and placed on the 1-D potentials shown in Fig. 1. This projection of the collective PC energy basins on a sequence of 1-D potentials is a powerful approach to interpret the protein dynamics and folding (30). State 1 in Fig. 4 corresponds to the deepest minima of the FEPs V(δn) and V(γn). The metastable substates of the FEPs V(δn) and V(γn) for n = 34, 35, 37, 38, 39, and 41 are occupied in state 2. State 3 is an intermediate state with a displacement of the dihedral angles from their most stable positions on the FEPs towards state 2 for n = 1, 2, 32–39, 41, 43, and 44.
There is, therefore, a large motion of loop 35–44 coupled to a motion of δ1. The whole motion is represented in Fig. S6 A and B. State 1 is stabilized by two amide backbone hydrogen bonds between Lys1 and Gly37 and between Gly37 and Thr39 (Fig. S6C). The displacement of the loop towards states 2 and 3 is due to the breaking of these two hydrogen bonds (Fig. S6 D and E). In this particular protein, the fluctuations of the long Lys side chain of the N-terminal residue induce the fluctuations of the main chain and of the side chains of a loop located in the C-terminal part. Because the loops and the N- and C-terminal parts are more flexible, the side-chain motions and main-chain motions are strongly correlated in this collective motion. This collective motion could contribute to the biological function of VA3, which is still poorly understood (31).
Conclusions
The FEPs of the main-chain and of the side-chain CGDAs are remarkably similar along the primary sequence of the VA3 protein (Fig. 1). It is demonstrated that the side-chain motions (δn) are subdiffusive with stretched exponential RCFs (Fig. S3), like the main-chain motions (γn) (6). The side chains diffuse slower than the main chain (i.e., the exponent α is smaller) on their FEPs with a larger diffusion constant (Fig. 2). The fluctuations of the coarse-grained steps Δγn(t) and Δδn(t) (recorded every ps from the MD runs) were poorly correlated on average along the amino acid sequence (Fig. 3A). The correlation coefficient between the displacements of the CGDA γn and δn after a time t increases with t and converges to the correlation coefficient between the γn(t) and δn(t) trajectories. The increase of the correlation coefficient of the fluctuations between the displacements of the CGDAs is faster for those which are more subdiffusive (more correlated to their past). In spite of that, the CGDAs γn and δn move on very similar FEPs along the entire primary sequence of the protein, and the correlation coefficient between their trajectories varies widely along the amino acid sequence (Fig. 3). The highest correlation between the γn(t) and δn(t) trajectories occurs for residues n located in the most flexible regions of the protein, where the FEPs are generally multiple-minima. In addition, the highest correlated trajectories γn(t) and δn(t) correspond to residues n which contribute to the slowest collective mode of the protein. This explains the heterogeneity of the correlation coefficient R between the trajectories of the CGDAs along the primary sequence (Fig. 3A). In the particular case of VA3, the side chain of the N-terminal part of the protein hence is coupled to the motion of a loop in the nearby C-terminal part. In these modes, the side chains and main chain follow each other on average in the course of time. 
The slow collective motions are important for the biological functions of proteins and could play a role in protein folding. Indeed, in an unfolded protein, most of the regions of the protein are flexible and, from our findings, it can be expected that the side-chain motions and the main-chain motions of these regions should follow each other on average. When part of the protein is locked (folded), the residues involved cease to contribute to collective modes and the side chains cease to follow the main-chain motion. The implications of this change in the correlation between the side-chain motions and the main-chain motion for protein entropy and protein folding, when part of the protein is folded, are beyond the scope of the present study but are worth pursuing in the future.
Methods
Five all-atom MD simulations of VA3 in explicit water, each of a duration of 80 ns, were carried out previously (6). Details of all MD simulations were given in ref. 6. In the present work, these five MD runs were joined to each other to form a single long (400 ns) MD run. The results of the calculations performed on the five joined MD runs and on each individual MD run are similar (see SI Text: Comparison Between Calculations Performed Over 400 ns (Five MD Runs) and Over 80 ns (for Each MD Run)). The correlation coefficient R was computed according to the standard formula (10), taking into account the periodicity of the angle variables (ref. 5 and SI Text: Calculation of the Pearson Correlation Coefficient R for the Dihedral Angular Steps). The similarity index h, defined in the SI Text and adapted from ref. 29, was applied to the aligned PDFs of the CGDAs. The dPCA was applied to the covariance matrix of the vectors un(t) of γn. For the CGDAs γn, there are 43 modes, each mode k contributing to the MSFn by λkvk,n (19–21) (see SI Text: The Dihedral Principal Component Analysis). The dPCA was also applied to the covariance matrix of the vectors un(t) = { cos[δn(t)], sin[δn(t)]}. The results were similar to those discussed for the modes of the CGDAs γn. The main difference is that δ33 has a larger contribution to the slowest mode 1 (Fig. S7) than γ33 (Fig. 4). More details about the methods and amino acid sequence of VA3 can be found in the SI Text.
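As an illustration of the dPCA step, the following minimal sketch builds the vectors {cos, sin} for each angle, diagonalizes their covariance matrix, and splits the mean-square fluctuation of each angle over the modes. It is not the implementation used in this work: the function name `dihedral_pca`, the synthetic input, and the definition of the contribution (the eigenvalue times the squared cosine and sine eigenvector components of an angle) are assumptions made for this example, and the precise definitions are those of the SI Text. The sketch returns 2N eigenvectors for N angles and does not attempt to reproduce the mode counting used above.

```python
import numpy as np


def dihedral_pca(angles):
    """dPCA of an (n_frames, n_angles) array of dihedral angles in radians.

    Returns the eigenvalues (descending), the eigenvectors (as columns), and
    contrib[n, k], the share of mode k in the fluctuation of angle n.
    """
    n_angles = angles.shape[1]
    # One (cos, sin) pair per angle removes the 2*pi periodicity problem.
    u = np.hstack([np.cos(angles), np.sin(angles)])
    cov = np.cov(u, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    # Assumed definition: lambda_k * (v_cos^2 + v_sin^2) for each angle n.
    weight = eigvec[:n_angles] ** 2 + eigvec[n_angles:] ** 2
    contrib = weight * eigval
    return eigval, eigvec, contrib


# Hypothetical data: four angles with small noise; angle 0 also hops
# between two states, so it should dominate the largest-variance mode.
rng = np.random.default_rng(0)
n_frames = 5000
angles = rng.normal(0.0, 0.1, (n_frames, 4))
angles[:, 0] += rng.choice([-1.0, 1.0], n_frames)

eigval, eigvec, contrib = dihedral_pca(angles)
print("angle contributing most to mode 1:", int(np.argmax(contrib[:, 0])))
```

By construction, summing `contrib` over all modes for a given angle recovers the variance of its cosine plus the variance of its sine, which is one way to check the decomposition. For real data, `angles` would hold one column per CGDA (γn or δn) extracted from the trajectory.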
ACKNOWLEDGMENTS.
Y.C. thanks the Centre National de la Recherche Scientifique (CNRS) and the Conseil Regional de Bourgogne for a PhD fellowship. This research was supported by grants from the National Institutes of Health (GM-14312) and the National Science Foundation (MCB10-19767), and was conducted by using the resources of the 736-processor Beowulf cluster at the Baker Laboratory of Chemistry and Chemical Biology, Cornell University and the HPC resources from DSI-CCUB (Université de Bourgogne).
Footnotes
The authors declare no conflict of interest.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1207083109/-/DCSupplemental.
References
- 1. Matheson RR, Jr, Scheraga HA. A method for predicting nucleation sites for protein folding based on hydrophobic contacts. Macromolecules. 1978;11:1819–1829.
- 2. Romagnoli S, et al. NMR structural determination of Viscotoxin A3 from Viscum album L. Biochem J. 2000;350:569–577.
- 3. Bonvin AMJJ, Rullmann JAC, Lamerichs RMJN, Boelens R, Kaptein R. “Ensemble” iterative relaxation matrix approach: A new NMR refinement protocol applied to the solution structure of crambin. Proteins. 1993;15:385–400. doi: 10.1002/prot.340150406.
- 4. Nishikawa K, Momany FA, Scheraga HA. Low-energy structures of two dipeptides and their relationship to bend conformations. Macromolecules. 1974;7:797–806. doi: 10.1021/ma60042a020.
- 5. Senet P, Maisuradze GG, Foulie C, Delarue P, Scheraga HA. How main-chains of proteins explore the free-energy landscape in native states. Proc Natl Acad Sci USA. 2008;105:19708–19713. doi: 10.1073/pnas.0810679105.
- 6. Cote Y, Senet P, Delarue P, Maisuradze GG, Scheraga HA. Nonexponential decay of internal rotational correlation functions of native proteins and self-similar structural fluctuations. Proc Natl Acad Sci USA. 2010;107:19844–19849. doi: 10.1073/pnas.1013674107.
- 7. Liwo A, Khalili M, Scheraga HA. Ab initio simulations of protein-folding pathways by molecular dynamics with the united-residue model of polypeptide chains. Proc Natl Acad Sci USA. 2005;102:2362–2367. doi: 10.1073/pnas.0408885102.
- 8. Liwo A, et al. Modification and optimization of the united-residue (UNRES) potential-energy function for canonical simulations. I. Temperature dependence of the effective energy function and tests of the optimization method with single training proteins. J Phys Chem B. 2007;111:260–285. doi: 10.1021/jp065380a.
- 9. Korkuta A, Hendrickson WA. A force field for virtual atom molecular mechanics of proteins. Proc Natl Acad Sci USA. 2009;106:15667–15672. doi: 10.1073/pnas.0907674106.
- 10. Pearson K. Mathematical contributions to the theory of evolution. III Regression, heredity and panmixia. Phil Trans R Soc Lond A. 1896;187:253–318.
- 11. Perrin F. Etude mathématique du mouvement Brownien de rotation. (Mathematical study of rotational Brownian motion) Ann Sci Ecole Norm S. 1928;45:1–51. Originally in French.
- 12. Mandelbrot BB, Van Ness JW. Fractional Brownian motion, fractional noises and applications. SIAM Rev. 1968;10:422–437.
- 13. Bouchaud JP, Georges A. Anomalous diffusion in disordered media: Statistical mechanisms and physical applications. Phys Rep. 1990;195:127–293.
- 14. Lim SC, Muniandy SV. Self-similar Gaussian process for modeling anomalous diffusion. Phys Rev E. 2002;66:021114. doi: 10.1103/PhysRevE.66.021114.
- 15. Wu J, Berland KM. Propagators and time-dependent coefficients for anomalous diffusion. Biophys J. 2008;95:2049–2052. doi: 10.1529/biophysj.107.121608.
- 16. Min W, Luo G, Cherayil BJ, Kou SC, Xie XS. Observation of a power-law memory kernel for fluctuations within a single protein molecule. Phys Rev Lett. 2005;94:198302. doi: 10.1103/PhysRevLett.94.198302.
- 17. Kneller GR, Hinsen K. Fractional Brownian dynamics in proteins. J Chem Phys. 2004;121:10278–10283. doi: 10.1063/1.1806134.
- 18. Luo G, Andricioaei I, Xie XS, Karplus M. Dynamic distance disorder in proteins is caused by trapping. J Phys Chem B. 2006;110:9363–9367. doi: 10.1021/jp057497p.
- 19. Liu L, Gronenborn AM, Bahar I. Longer simulations sample larger subspaces of conformations while maintaining robust mechanisms of motion. Proteins. 2012;80:616–625. doi: 10.1002/prot.23225.
- 20. Altis A, Nguyen PH, Hegger R, Stock G. Dihedral angle principal component analysis of molecular dynamics simulation. J Chem Phys. 2007;126:244111. doi: 10.1063/1.2746330.
- 21. Kitao A, Gō N. Investigating protein dynamics in collective coordinate space. Curr Opin Struct Biol. 1999;9:164–169. doi: 10.1016/S0959-440X(99)80023-2.
- 22. Maisuradze GG, Leitner DM. Free energy landscape of a biomolecule in dihedral principal component space: Sampling convergence and correspondence between structures and minima. Proteins. 2007;67:569–578. doi: 10.1002/prot.21344.
- 23. McCammon JA, Gelin BR, Karplus M, Wolynes P. The hinge-bending mode in lysozyme. Nature. 1976;262:325–326. doi: 10.1038/262325a0.
- 24. Brooks BR, Karplus M. Normal modes for specific motions of macromolecules: application to the hinge-bending mode of lysozyme. Proc Natl Acad Sci USA. 1985;82:4995–4999. doi: 10.1073/pnas.82.15.4995.
- 25. Tobi D, Bahar I. Structural changes involved in protein binding correlate with intrinsic motions of proteins in the unbound state. Proc Natl Acad Sci USA. 2005;102:18908–18913. doi: 10.1073/pnas.0507603102.
- 26. Nicolay S, Sanejouand YH. Functional modes of proteins are among the most robust. Phys Rev Lett. 2006;96:078104. doi: 10.1103/PhysRevLett.96.078104.
- 27. Henzler-Wildman K, Kern D. Dynamic personalities of proteins. Nature. 2007;450:964–972. doi: 10.1038/nature06522.
- 28. Debreczeni JE, Girmann B, Zeeck A, Sheldrick GM. Structure of viscotoxin A3: Disulfide location from weak SAD data. Acta Crystallogr D. 2003;59:2125–2132. doi: 10.1107/s0907444903018973.
- 29. Hodgkin EE, Richards WG. Molecular similarity based on electrostatic potential and electric field. Int J Quantum Chem. 1987;14:105–110.
- 30. Maisuradze GG, Senet P, Czaplewski C, Liwo A, Scheraga HA. Investigation of protein folding by coarse-grained molecular dynamics with the unres force field. J Phys Chem A. 2010;114:4471–4485. doi: 10.1021/jp9117776.
- 31. Giudici M, et al. Antifungal effects and mechanism of action of Viscotoxin A3. FEBS J. 2006;273:72–83. doi: 10.1111/j.1742-4658.2005.05042.x.