Abstract
Protein structural fluctuation, typically measured by Debye-Waller factors, or B-factors, is a manifestation of protein flexibility, which strongly correlates to protein function. The flexibility-rigidity index (FRI) is a newly proposed method for the construction of atomic rigidity functions required in the theory of continuum elasticity with atomic rigidity, which is a new multiscale formalism for describing excessively large biomolecular systems. The FRI method analyzes protein rigidity and flexibility and is capable of predicting protein B-factors without resorting to matrix diagonalization. A fundamental assumption used in the FRI is that protein structures are uniquely determined by various internal and external interactions, while the protein functions, such as stability and flexibility, are solely determined by the structure. As such, one can predict protein flexibility without resorting to the protein interaction Hamiltonian. Consequently, bypassing the matrix diagonalization, the original FRI has a computational complexity of . This work introduces a fast FRI (fFRI) algorithm for the flexibility analysis of large macromolecules. The proposed fFRI further reduces the computational complexity to . Additionally, we propose anisotropic FRI (aFRI) algorithms for the analysis of protein collective dynamics. The aFRI algorithms permit adaptive Hessian matrices, from a completely global 3N × 3N matrix to completely local 3 × 3 matrices. These 3 × 3 matrices, despite being calculated locally, also contain non-local correlation information. Eigenvectors obtained from the proposed aFRI algorithms are able to demonstrate collective motions. Moreover, we investigate the performance of FRI by employing four families of radial basis correlation functions. Both parameter optimized and parameter-free FRI methods are explored. Furthermore, we compare the accuracy and efficiency of FRI with some established approaches to flexibility analysis, namely, normal mode analysis and Gaussian network model (GNM). The accuracy of the FRI method is tested using four sets of proteins, three sets of relatively small-, medium-, and large-sized structures and an extended set of 365 proteins. A fifth set of proteins is used to compare the efficiency of the FRI, fFRI, aFRI, and GNM methods. Intensive validation and comparison indicate that the FRI, particularly the fFRI, is orders of magnitude more efficient and about 10% more accurate overall than some of the most popular methods in the field. The proposed fFRI is able to predict B-factors for α-carbons of the HIV virus capsid (313 236 residues) in less than 30 seconds on a single processor using only one core. Finally, we demonstrate the application of FRI and aFRI to protein domain analysis.
I. INTRODUCTION
Proteins provide the structural basis for living organisms and are essential research subjects of the biological sciences. Although the original sequence-structure-function dogma2 has been seriously challenged,8,32 the protein structure, in either folded or unfolded form, still determines its function. Therefore, the understanding of the structure of a protein holds the key to the prediction of the protein's function.27,45 Unfortunately, it remains a major challenge in the biological sciences to predict a protein's functions from its known structure.
There are a few essential factors, including protein geometry, electrostatics, and flexibility, that strongly correlate to protein function. There is no need to elaborate on the importance of protein geometry and electrostatics to protein function and dynamics. However, the impact of protein flexibility on protein function is often underestimated or even overlooked. In general, protein flexibility is the ability to deform from the equilibrium state under external forces, such as docking of ligands, docking with other proteins or random bombardments of small molecules in liquid and/or gas phases and lattice phonons in a solid phase.13 Under physiological conditions, proteins experience everlasting motion or structural fluctuation in a wide variety of spatiotemporal scales because of their flexibility and uninterrupted external forces. However, at absolute zero temperature, there is no protein motion or fluctuation. Therefore, protein motion or fluctuation is just the molecule's response to the external stimuli, while protein flexibility is an intrinsic property of the structure. In fact, protein flexibility varies from protein to protein and is a signature of the protein. In a given protein, flexibility can be different from atom to atom, from residue to residue and from domain to domain. Typically, the performance of a theoretical model for protein flexibility analysis can be validated via its prediction of protein structural fluctuation.
X-ray crystallography is one of the most important techniques for protein flexibility analysis. The atomic mean-square-fluctuations are reflected in X-ray diffraction or other diffraction data and can be estimated in terms of the Debye-Waller factor, also known as the B-factor or temperature factor. Typically, reported B-factors are not corrected for the variations in atomic diffraction cross sections and chemical stability during the diffraction data collection, which perhaps contributes to the fact that all-atom models usually do not work as well as coarse-grained models in the B-factor prediction.28 Nuclear magnetic resonance (NMR) is another important method, which is particularly valuable for flexibility analysis under physiological conditions. NMR is able to investigate protein flexibility at a variety of time scales.
Apart from experimental approaches, a number of theoretical methods have been developed for flexibility analysis and B-factor prediction in the past. Protein collective motion and fluctuation can be elucidated by molecular dynamics (MD), which has considerably expanded our understanding of the conformational landscapes of proteins. However, the excessively large number of degrees of freedom associated with the all-atom representation and long time integration becomes computationally inefficient with increasing size of the system and can obscure larger scale motions. Alternative time-independent approaches, such as normal mode analysis (NMA),7,15,23,39 graph theory,19 and elastic network model (ENM)41 theories, including Gaussian network model (GNM)4,5,12 and anisotropic network model (ANM),3 have been developed for protein flexibility analysis in the past few decades. These methods can be derived from their corresponding Newton's equations by using the time-harmonic approximation.28 The low order eigenmodes computed from diagonalizing the Kirchhoff matrix or the Hessian matrix can shed light on the long-time behavior of the protein dynamics beyond the reach of MD simulations.5,7,23,39,41 Coarse-grained based ENM and GNM approaches have become popular recently due to their simplified potential and computational efficiency.3–5,16,24,37 It was shown that the GNM is about one order more efficient than most other approaches.48 Improvements to these approaches have been developed for many aspects, including crystal periodicity and cofactor corrections,17,21,22,35 and density – cluster rotational – translational blocking.11 These approaches have been applied to the study of large proteins or protein complexes, such as, hemoglobin,47 F1 ATPase,10,50 chaperonin GroEL,20,49 viral capsids,29,36 and ribosome.38,43 Flexibility also plays an important role in stability25 and docking analysis.14 For further detail in their status and application, the reader is referred to recent review papers.9,26,34,48
Recently, we have developed a new multiscale formalism called continuum elasticity with atomic rigidity (CEWAR) for the elastic analysis of excessively large macromolecules. In the CEWAR approach, a continuous atomic rigidity function is required to characterize the shear modulus in the stress tensor of elasticity equations. To this end, a simple method, called flexibility-rigidity index (FRI), is introduced to evaluate macromolecular flexibility and rigidity.46 We noted after the publication of our earlier work46 that the name of “flexibility index” was proposed independently by von der Lieth et al.42 and Jacobs et al.19 for two different quantities to describe bond strengths. Both of these flexibility indices are distinct from our proposed FRI. The FRI is a solely structural based algorithm that does not reconstruct any protein interaction Hamiltonian. Only elementary arithmetics is needed in the FRI method for proteins. In particular, the FRI prediction of protein B-factors does not require a stringently minimized structure and time consuming matrix diagonalization or matrix decomposition, nor does it involve any training procedure. Two types of monotonically decaying correlation functions, namely, exponential type and Lorentz type, have been utilized previously for the construction of protein correlation matrix. Parameter ranges for the FRI have been extensively tested and the performance of the FRI for protein B-factor prediction has been carefully validated with a set of 263 proteins.46 It is found that for residue based B-factor prediction, the FRI can be made parameter free. However, it is not clear how the FRI compares to alternative approaches in the field, particularly the state of the art methods such as GNM and NMA.
One of the objectives of the present work is to introduce a fast FRI (fFRI) algorithm by using appropriate data structures. Computational efficiency is a central issue. The computational complexity of the proposed fFRI is of , compared to that of for the original FRI algorithm and of for the GNM, where N is the number of atoms. We use a cell lists approach1 to reduce the computational complexity. Another objective is to introduce anisotropic FRI (aFRI) algorithms for the motion analysis of biomolecules. Unlike ANM,3,28 which is completely global and has 3N × 3N elements in its Hessian matrix, the proposed aFRI algorithms have adaptive Hessian matrices, which vary from completely global to completely local. Despite the localization, there are collective motions in three sets of eigenvectors. The other objective of the present work is to further analyze the performance of the FRI methods for protein B-factor prediction. To this end, we examine the accuracy of FRI algorithms associated with four families of correlation functions and carry out a comparative study of the FRI and fFRI vs. other cutting edge approaches, namely, NMA and GNM. Our investigation concerns three issues, i.e., accuracy, reliability, and efficiency in the protein B-factor prediction. Finally, we also demonstrate the applications of the FRI and aFRI to protein domain analysis.
The rest of this paper is organized as follows. Section II is devoted to methods and algorithms. To establish notation and facilitate further discussion, the FRI approach is briefly discussed. We present a new simplified version of the FRI that is relevant to the B-factor prediction and visualization analysis studied in this work. Anisotropic FRI algorithms are proposed via two different ways. The fFRI algorithm is developed by using appropriate data structures to avoid the evaluation of insignificant correlation matrix elements, which leads to a sparse fFRI matrix. In Sec. III, we first analyze the behavior of a few FRI correlation functions. Additionally, we examine the parameter dependence of some FRI correlation functions. Moreover, the performance of fFRI is investigated. Further, we provide a comprehensive comparison of our FRI and other established methods. We adopted three protein sets corresponding to relatively small-, medium-, and large-sized structures, proposed in the literature.28 We also utilize an extended set of 365 proteins to further evaluate the performance of various methods. Furthermore, the computational complexities of FRI, fFRI, and GNM are compared over a set of 44 proteins. Finally, we show that the FRI offers a distinguished visualization of biomolecular structure and interaction. In Sec. IV, we demonstrate the usefulness of the FRI and aFRI by carrying out an in-depth study of protein domains. This paper ends with concluding remarks.
II. THEORY AND ALGORITHM
In the CEWAR model, the dynamics of a biomolecular system under the given force f is governed by the equation of motion46
(1) |
where ρ is the density of the macromolecule, w is the displacement, and λ = λ(r) and μ = μ(r) are, respectively, bulk modulus and shear modulus. The FRI algorithm was proposed to evaluate the shear modulus, i.e., rigidity.
This section describes the theory and algorithm underpinning the FRI method. We first briefly review the FRI theory to establish notation. Then, two anisotropic FRI algorithms are introduced for the analysis of the anisotropic motions of biomolecules. Finally, A fast FRI algorithm is proposed to reduce the computational complexity of the original FRI.
A. Flexibility-rigidity index
We consider proteins as examples to illustrate our FRI algorithm, although other biomolecules, such as DNA and RNA, can be similarly treated with a minor modification of our algorithm. We are particularly interested in a coarse-grained representation. However, methods for a full atom description can be formulated as well.
We seek a structure based algorithm to convert protein geometry into protein topology. To this end, we consider a protein with N Cα atoms. Their locations are represented by . We denote ‖ri − rj‖ the Euclidean space distance between ith Cα atom and the jth Cα atom. The distance geometry of protein Cα atoms is utilized to establish the topology connectivity by using monotonically decreasing radial basis functions,
(2) |
where ηij is a characteristic distance between particles, and Φ(‖ri − rj‖; ηij) is a correlation function, which is, in general, a real-valued monotonically decreasing function. As a correlation function, it satisfies
(3) |
(4) |
Delta sequences of the positive type discussed in an earlier work44 are all good choices. For example, one can use generalized exponential functions
(5) |
and generalized Lorentz functions
(6) |
Essentially, the correlation between any two particles should decay according to their distance. Therefore, many other alternatives can be used and some of them are investigated in Sec. III.
The correlation map or cross correlation is an important quantity for the GNM. We can define a similar correlation map by setting C = {Cij}, i, j = 1, 2, …, N. The correlation map measures the connectivity of Cαs in the protein. The similarity and difference of the present correlation map and that of the GNM are studied in Sec. III.
We define an atomic rigidity index μi as the summation of topological connectivity
(7) |
where wij is a weight function related to the atomic type. The atomic rigidity index μi manifests the rigidity or stiffness at the ith atom. In a general sense, the atomic rigidity index reflects the total interaction strength, including both bonded and non-bonded contributions. It is quite straightforward to define the averaged molecular rigidity index as a summation of atomic rigidity indices
(8) |
The averaged molecular rigidity index can be used to predict molecular thermal stability, bulk modulus, density (compactness), boiling points of isomers, the ratio of surface area over volume, surface tension, etc. A detailed investigation of these aspects is beyond the scope of the present work.
We are now ready to define a position dependent shear modulus
(9) |
where wj(r) is a weight function, r is in the proximity of ri, and ΩE is the macromolecular domain. In order to determine wj(r), we define an average rigidity (or averaged rigidity index function) by
(10) |
where V is the volume of the macromolecule. If wj(r) is a constant, its value can be uniquely determined by a comparison of with experimental shear modulus33 for a given macromolecule and correlation function.
We also define an atomic flexibility index as
(11) |
Since the flexibility at each atom is proportional to its temperature fluctuation, we can express B-factors as
(12) |
where are theoretically predicted B-factors, and a and b are two constants to be determined by a simple linear regression.
We can also define the averaged molecular flexibility index (MFI) as a summation of atomic flexibility indices
(13) |
MFI should correlate with molecular stability and energy.
For the purpose of visualization, we define a continuous atomic flexibility function as
(14) |
where Ψ(‖r − rj‖) is a general interpolation function for scattered data. Wavelets, spline functions, and modified Shepard's method30,40 can be employed for the interpolation. One can map f(r) to the molecular surface to visualize the protein flexibility.46 Alternatively, one can compute the continuous atomic flexibility function by
(15) |
B. Anisotropic flexibility-rigidity index
In this section, we propose a new anisotropic model based on our FRI theory. In existing anisotropic methods, the Hessian matrix is always global, i.e., the matrix contains all the 3N × 3N elements for N particles in a molecule. In our aFRI model, the Hessian matrix is inherently local and adaptive. Its size may vary from 3 × 3 for a completely local aFRI to 3N × 3N for a completely global aFRI, depending on the focus of a physical problem.
Let us partition all the N particles in a molecule into a total of M clusters {c1, c2, …, ck, …, cM}. Cluster ck has Nk particles or atoms so that . A cluster may be of physical interest, i.e., an alpha helix, a domain, or a binding site of a protein. One of two extreme cases is that there is only one particle in each cluster. We therefore have N clusters. The other case is that there is only one cluster, i.e., the whole molecule. The essential idea is to develop a Hessian matrix for each cluster individually without the information about other cluster properties. (However, information for nearby particles outside the cluster is still required.) For example, if we are interested in the thermal fluctuation of a particular cluster ck with Nk particles or atoms, we can find 3Nk eigenvectors for the cluster. Let us keep in mind that each position vector in has three components, i.e., r = (x, y, z). We denote
(16) |
Note that for each given ij, we define as a local anisotropic matrix
(17) |
Since rigidity and flexibility can both be anisotropic, it is natural to propose two different aFRI algorithms based on rigidity Hessian matrix and flexibility Hessian matrix, respectively.
1. Anisotropic rigidity
The anisotropic rigidity is defined by a rigidity Hessian matrix for an arbitrary cluster ck. Let us denote a rigidity Hessian matrix for cluster ck. Its elements are chosen as
(18) |
(19) |
(20) |
Hessian matrix is of 3Nk × 3Nk dimensions. Note that the diagonal part, , has built in information from all the particles in the system, even if the cluster is completely localized, i.e., Nk = 1, ∀k.
An immediate test of the anisotropic rigidity is to check if it works for the B-factor prediction. To this end, we collect the diagonal terms of the rigidity Hessian matrix
(21) |
(22) |
We then define a set of anisotropic rigidity (AR) based flexibility indices by
(23) |
B-factors can be predicted with a set of by using the linear regression in Eq. (12).
2. Anisotropic flexibility
To analyze biomolecular anisotropic motions in parallel to ANM, we need to examine their anisotropic flexibility. To this end, we further define a flexibility Hessian matrix F(ck) for cluster ck as
(24) |
(25) |
(26) |
where (Φij)−1 denote the unscaled inverse of matrix Φij such that Φij(Φij)−1 = |Φij|. Similar to the anisotropic rigidity, the diagonal part Fii(ck) has built in information from all particles in the system. Therefore, even if the partition of clusters is completely localized (i.e., N clusters), certain correlation among atomic motions is retained. By diagonalizing F(ck), we obtain 3Nk eigenvectors for the Nk particles in cluster ck. Since the selection of ck is arbitrary, eigenvectors of all other clusters can be attained using the same procedure.
To obtain the B-factor prediction from this anisotropic flexibility, we define a set of anisotropic flexibility (AF) based flexibility indices by
(27) |
(28) |
Then Eq. (12) is employed to obtain B-factor predictions.
In this work, we only consider the coarse-grained model in which each residue is represented by its Cα. To further simplify the model, the differences between residues are ignored. The parameter wij is assumed to be 1 and ηij is set to a constant η.
C. Fast FRI algorithm
As discussed in our earlier work,46 the original FRI algorithm has the computational complexity of , mainly due to the construction of the correlation matrix. In the present work, we propose a fFRI algorithm, which computes only the significant elements of the correlation matrix and at the same time maintains the accuracy of our method. As a result, the computational complexity of our fFRI algorithm is of .
The essential idea is to partition the residues in a protein into cubic boxes according to their spatial locations. For each residue in a given box, we only compute its correlation matrix elements with all residues within the given box and with all residues in the adjacent 26 boxes. The accuracy and efficiency of this approach are determined by the box dimension. We select a box size of R such that
(29) |
where ɛ > 0 is a given truncation error. Therefore, for generalized exponential functions (5), we have
(30) |
If we set ɛ = 10−2, we have R ≈ 4.6η for κ = 1 and R ≈ 2.15η for κ = 2. Note that different κ values have different optimal η values. The higher the κ value is, the larger the optimal η is.
Similarly, for generalized Lorentz functions (6), we choose the box size
(31) |
Again, if we set ɛ = 10−2, we have R ≈ 10η for υ = 2 and R ≈ 4.6η for υ = 3.
An optimal R should balance the accuracy and efficiency. In Sec. III B, it is found that the selection of R = 12 Å is near optimal for both exponential and Lorentz functions. In Algorithm 1, we present a pseudocode to illustrate the truncation algorithm of the fFRI.
Input: atoms(N) | ▷ XYZ coordinates from PDB file |
mincoor ← minval(atoms) | ▷ Compute dimensions of bounding box |
maxcoor ← maxval(atoms) | |
R ← boxsize | ▷ Set size of grid |
Nbox ← ceiling((maxcoor − mincoor)/R) | ▷ Compute number of boxes in each direction |
for ii ← 1, Natoms do | |
i, j, k ← ceiling((atoms(ii) − mincoor)/R) | ▷ Count the number of atoms in each box |
Natoms(i, j, k) ← Natoms(i, j, k) + 1 | |
end for | |
for k ← 1, Nbox[3] do | |
for j ← 1, Nbox[2] do | |
for i ← 1, Nbox[1] do | |
allocate(box(i, j, k)) | ▷ Allocate space for each box |
end for | |
end for | |
end for | |
for ii ← 1, Natoms do | ▷ Copy coordinates to appropriate box based on 3D coordinates |
i, j, k ← ceiling((atoms(ii) − mincoor)/R) | |
box(i, j, k) ← atoms(ii) | |
end for | |
for k ← 1, Nbox[3] do | ▷ Iterate over boxes |
for j ← 1, Nbox[2] do | |
for i ← 1, Nbox[1] do | |
for na ← 1, Natoms(i, j, k) do | ▷ Iterate over atoms in current box |
for n ← k − 1, k + 1 do | ▷ Iterate over adjacent boxes |
for m ← j − 1, j + 1 do | |
for l ← i − 1, i + 1 do | |
for nb ← 1, Natoms(l, m, n) do | ▷ Iterate over atoms in adjacent boxes |
dist ← distance(box(i, j, k)(na), box(l, m, n)(nb)) | |
FRI(na) ← kernel(dist) | |
end for | |
end for | |
end for | |
end for | |
end for | |
end for | |
end for | |
end for |
III. NUMERICAL EXPERIMENTS
In this section, we validate the FRI approach for protein B-factor prediction by a comparison of its performance with that of two established methods, namely, NMA and GNM. We consider the accuracy, reliability and efficiency of these methods. It is well known that the computational complexity of matrix diagonalization is asymptotically close to , while that of correlation map construction and two-parameter linear regression given in Eq. (12) is asymptotically . The computational complexity of the proposed fFRI algorithm is further reduced to . Therefore, there is a dramatic reduction in the computational complexity. We demonstrate that the FRI method outperforms other methods in computational efficiency and is potentially useful for the flexibility analysis of excessively large macromolecules.
To test FRI against GNM and NMA, five sets of structures are utilized. Among them, three sets were used by Park, Jernigan, and Wu in their comparative study.28 These include relatively small-, medium-, and large-sized sets of structures. A fourth set of 44 structures was created to test the efficiency of each algorithm. This set was created by randomly selecting protein-only structures from the Protein Data Bank database with varying size. The number of residues for proteins in this set range from 125 to 313 236 residues. The final set, called a superset, is a combination of sets including the three sets used by Park el al.,28 the first 40 structures of the efficiency set and a set of 263 high-resolution structures used in earlier tests of the FRI method.46 The total number of structures in the superset is 365 after the removal of duplicate structures.
To quantitatively assess the performance of the proposed FRI model for the B-factor prediction, we consider the correlation coefficient
(32) |
where are a set of predicted B-factors by using the proposed method and are a set of experimental B-factors read from the PDB file. Here and the statistical averages of theoretical and experimental B-factors, respectively.
A. Analysis of FRI correlation functions
In order to further explore the FRI method, we test four types of correlation functions. Apart from the Lorentz and exponential functions, two alternative functions are employed in our study. All correlation functions equal the unit at the origin and are monotonically decreasing with respect to increasing distance (r). Each correlation function is tested with a range of parameter values for each of 365 structures as listed in Table I. The performance of the new correlation functions comes close to that of the exponential and Lorentz functions with the product of these two having the highest average correlation coefficient among alternative functions.
Table I.
Correlation function | Parameter range | Average correlation coefficient |
---|---|---|
0.676 | ||
0.673 | ||
0.670 | ||
0.577 |
B. Analysis of fFRI algorithms
To analyze the best parameter for Lorentz and exponential functions, we study their behavior in Fig. 1, where each function is tested over a range of parameters. For exponential type of functions, κ = 1 and η = 3 Å give rise to a near optimal parameter-free FRI. Similarly, for Lorentz type of functions, υ = 3, and η = 3 Å offer near optimal results. It is seen from Fig. 1 that exponential functions are quite sensitive to η values, while Lorentz functions are relatively robust with respect to η. This study provides a basis for the selection of parameter free FRI (pfFRI) schemes.
It is interesting to analyze the performance of the proposed fFRI in terms of accuracy and efficiency. To this end, we first explore the impact of box size on the correlation coefficients of a few fFRI schemes in Fig. 2. For each given κ and υ, the best η found in Fig. 1 is employed. It is seen from Fig. 2 that both exponential and Lorentz types of functions are able to achieve their near optimal performance at R = 12 Å. Therefore, we recommend R = 12 Å, η = 3 Å and κ = 1 for the exponent type of fFRI method. Similarly, R = 12 Å, η = 3 Å and υ = 3 are near optimal for Lorentz type of fFRI methods.
C. Comparison of B-factor predictions
1. FRI vs GNM and NMA
In order to compare the FRI and GNM, we re-analyzed the structures from Park et al.28 with the GNM method with a cutoff value of 7 Å, the same value used by the authors. It was found that some correlation coefficients were artificially low for GNM due to multiple coordinates for some Cα atoms in some PDB data and missing Cα atoms in others. To ensure a fair comparison between the FRI and GNM we re-analyzed the structures using GNM after processing the PDB files to fix these issues. We removed all but the highest occupancy coordinates for each atom and used every Cα atom from the original PDB files to run the GNM B-factor prediction code and calculate corrected correlation coefficients. In Tables II–IV, optimal and parameter free FRI is compared to the GNM data reported by Park et al.28 The newly calculated correlation coefficient is shown only if there is a significant improvement using our processed PDB files. On the other hand, Table V lists all correlation coefficients for GNM from our own tests using our processed PDB files. These correlation coefficients are typically the same as those reported by Park et al.28 although some have changed. The use of our processed PDB files leads to a slight increase in the average scores for the GNM in our analysis.
Table II.
PDB ID | N | opFRI | pfFRI | GNMa | NMAa |
---|---|---|---|---|---|
1AIE | 31 | 0.588 | 0.416 | 0.155 | 0.712 |
1AKG | 16 | 0.373 | 0.350 | 0.185 | −0.229 |
1BX7 | 51 | 0.726 | 0.623 | 0.706 | 0.868 |
1ETL | 12 | 0.710 | 0.609 | 0.628 | 0.355 |
1ETM | 12 | 0.544 | 0.393 | 0.432 | 0.027 |
1ETN | 12 | 0.089 | 0.023 | −0.274 | −0.537 |
1FF4 | 65 | 0.718 | 0.613 | 0.674 | 0.555 |
1GK7 | 39 | 0.845 | 0.773 | 0.821 | 0.822 |
1GVD | 52 | 0.781 | 0.732 | 0.591 | 0.570 |
1HJE | 13 | 0.811 | 0.686 | 0.616 | 0.562 |
1KYC | 15 | 0.796 | 0.763 | 0.754 | 0.784 |
1NOT | 13 | 0.746 | 0.622 | 0.523 | 0.567 |
1O06 | 20 | 0.910 | 0.874 | 0.844 | 0.900 |
1OB4 | 16 | 0.776 | 0.763 | 0.750b | 0.930 |
1OB7 | 16 | 0.737 | 0.545 | 0.652b | 0.952 |
1P9I | 29 | 0.754 | 0.742 | 0.625 | 0.603 |
1PEF | 18 | 0.888 | 0.826 | 0.808 | 0.888 |
1PEN | 16 | 0.516 | 0.465 | 0.270 | 0.056 |
1Q9B | 43 | 0.746 | 0.726 | 0.656 | 0.646 |
1RJU | 36 | 0.517 | 0.447 | 0.431 | 0.235 |
1U06 | 55 | 0.474 | 0.429 | 0.434 | 0.377 |
1UOY | 64 | 0.713 | 0.653 | 0.671 | 0.628 |
1USE | 40 | 0.438 | 0.146 | −0.142 | −0.399 |
1VRZ | 21 | 0.792 | 0.695 | 0.677b | −0.203 |
1XY2 | 8 | 0.619 | 0.570 | 0.562 | 0.458 |
1YJO | 6 | 0.375 | 0.333 | 0.434 | 0.445 |
1YZM | 46 | 0.842 | 0.834 | 0.901 | 0.939 |
2DSX | 52 | 0.337 | 0.333 | 0.127 | 0.433 |
2JKU | 35 | 0.805 | 0.695 | 0.656 | 0.850 |
2NLS | 36 | 0.605 | 0.559 | 0.530 | 0.088 |
2OL9 | 6 | 0.909 | 0.904 | 0.689 | 0.886 |
2OLX | 4 | 0.917 | 0.888 | 0.885 | 0.776 |
6RXN | 45 | 0.614 | 0.574 | 0.594 | 0.304 |
GNM and NMA values are taken from the coarse-grained (Cα) GNM and NMA results reported in Park et al.28
Table III.
PDB ID | N | opFRI | pfFRI | GNMa | NMAa |
---|---|---|---|---|---|
1ABA | 87 | 0.727 | 0.698 | 0.613 | 0.057 |
1CYO | 88 | 0.751 | 0.702 | 0.741 | 0.774 |
1FK5 | 93 | 0.590 | 0.568 | 0.485 | 0.362 |
1GXU | 88 | 0.748 | 0.634 | 0.421 | 0.581 |
1I71 | 83 | 0.549 | 0.516 | 0.549 | 0.380 |
1LR7 | 73 | 0.679 | 0.657 | 0.620 | 0.795 |
1N7E | 95 | 0.651 | 0.609 | 0.497 | 0.385 |
1NNX | 93 | 0.795 | 0.789 | 0.631 | 0.517 |
1NOA | 113 | 0.622 | 0.604 | 0.615 | 0.485 |
1OPD | 85 | 0.555 | 0.409 | 0.398 | 0.796 |
1QAU | 112 | 0.678 | 0.672 | 0.620 | 0.533 |
1R7J | 90 | 0.789 | 0.621 | 0.368 | 0.078 |
1UHA | 83 | 0.726 | 0.665 | 0.638b | 0.308 |
1ULR | 87 | 0.639 | 0.594 | 0.495 | 0.223 |
1USM | 77 | 0.832 | 0.809 | 0.798 | 0.780 |
1V05 | 96 | 0.629 | 0.599 | 0.632 | 0.389 |
1W2L | 97 | 0.691 | 0.564 | 0.397 | 0.432 |
1X3O | 80 | 0.600 | 0.559 | 0.654 | 0.453 |
1Z21 | 96 | 0.662 | 0.638 | 0.433 | 0.289 |
1ZVA | 75 | 0.756 | 0.579 | 0.690 | 0.579 |
2BF9 | 36 | 0.606 | 0.554 | 0.680b | 0.521 |
2BRF | 100 | 0.795 | 0.764 | 0.710 | 0.535 |
2CE0 | 99 | 0.706 | 0.598 | 0.529 | 0.628 |
2E3H | 81 | 0.692 | 0.682 | 0.605 | 0.632 |
2EAQ | 89 | 0.753 | 0.690 | 0.695 | 0.688 |
2EHS | 75 | 0.720 | 0.713 | 0.747 | 0.565 |
2FQ3 | 85 | 0.719 | 0.692 | 0.348 | 0.508 |
2IP6 | 87 | 0.654 | 0.578 | 0.572 | 0.826 |
2MCM | 113 | 0.789 | 0.713 | 0.639 | 0.643 |
2NUH | 104 | 0.835 | 0.691 | 0.771 | 0.685 |
2PKT | 93 | 0.162 | 0.003 | −0.193b | −0.165 |
2PLT | 99 | 0.508 | 0.484 | 0.509b | 0.187 |
2QJL | 99 | 0.594 | 0.584 | 0.594 | 0.497 |
2RB8 | 93 | 0.727 | 0.614 | 0.517 | 0.485 |
3BZQ | 99 | 0.532 | 0.516 | 0.466 | 0.351 |
5CYT | 103 | 0.441 | 0.421 | 0.331 | 0.102 |
GNM and NMA values are taken from the coarse-grained (Cα) GNM and NMA results reported in Park et al.28
Table IV.
PDB ID | N | opFRI | pfFRI | GNMa | NMAa |
---|---|---|---|---|---|
1AHO | 64 | 0.698 | 0.625 | 0.562 | 0.339 |
1ATG | 231 | 0.613 | 0.578 | 0.497 | 0.154 |
1BYI | 224 | 0.543 | 0.491 | 0.552 | 0.133 |
1CCR | 111 | 0.580 | 0.512 | 0.351 | 0.530 |
1E5K | 188 | 0.746 | 0.732 | 0.859 | 0.620 |
1EW4 | 106 | 0.650 | 0.644 | 0.547 | 0.447 |
1IFR | 113 | 0.697 | 0.689 | 0.637 | 0.330 |
1NKO | 122 | 0.619 | 0.535 | 0.368 | 0.322 |
1NLS | 238 | 0.669 | 0.530 | 0.523b | 0.385 |
1O08 | 221 | 0.562 | 0.333 | 0.309 | 0.616 |
1PMY | 123 | 0.671 | 0.654 | 0.685 | 0.702 |
1PZ4 | 114 | 0.828 | 0.781 | 0.843 | 0.844 |
1QTO | 122 | 0.543 | 0.520 | 0.334 | 0.725 |
1RRO | 112 | 0.435 | 0.372 | 0.529 | 0.546 |
1UKU | 102 | 0.665 | 0.661 | 0.742 | 0.720 |
1V70 | 105 | 0.622 | 0.492 | 0.162 | 0.285 |
1WBE | 204 | 0.591 | 0.577 | 0.549 | 0.574 |
1WHI | 122 | 0.601 | 0.539 | 0.270 | 0.414 |
1WPA | 107 | 0.634 | 0.577 | 0.417 | 0.380 |
2AGK | 233 | 0.705 | 0.694 | 0.512 | 0.514 |
2C71 | 205 | 0.658 | 0.649 | 0.560 | 0.584 |
2CG7 | 90 | 0.551 | 0.539 | 0.379 | 0.308 |
2CWS | 227 | 0.647 | 0.640 | 0.696 | 0.524 |
2HQK | 213 | 0.824 | 0.809 | 0.365 | 0.743 |
2HYK | 238 | 0.585 | 0.575 | 0.510 | 0.593 |
2I24 | 113 | 0.593 | 0.498 | 0.494 | 0.441 |
2IMF | 203 | 0.652 | 0.625 | 0.514 | 0.401 |
2PPN | 107 | 0.677 | 0.638 | 0.668 | 0.468 |
2R16 | 176 | 0.582 | 0.495 | 0.618b | 0.411 |
2V9V | 135 | 0.555 | 0.548 | 0.528 | 0.594 |
2VIM | 104 | 0.413 | 0.393 | 0.212 | 0.221 |
2VPA | 204 | 0.763 | 0.755 | 0.576 | 0.594 |
2VYO | 210 | 0.675 | 0.648 | 0.729 | 0.739 |
3SEB | 238 | 0.801 | 0.712 | 0.826 | 0.720 |
3VUB | 101 | 0.625 | 0.610 | 0.607 | 0.365 |
GNM and NMA values are taken from the coarse-grained (Cα) GNM and NMA results reported in Park et al.28
Table V.
PDB ID | N | opFRI | pfFRI | GNM | PDB ID | N | opFRI | pfFRI | GNM |
---|---|---|---|---|---|---|---|---|---|
1ABA | 87 | 0.727 | 0.698 | 0.613 | 1PEF | 18 | 0.888 | 0.826 | 0.808 |
1AGN | 1492 | 0.331 | 0.051 | 0.170 | 1PEN | 16 | 0.516 | 0.465 | 0.270 |
1AHO | 64 | 0.698 | 0.625 | 0.562 | 1PMY | 123 | 0.671 | 0.654 | 0.685 |
1AIE | 31 | 0.588 | 0.416 | 0.155 | 1PZ4 | 114 | 0.828 | 0.781 | 0.843 |
1AKG | 16 | 0.373 | 0.350 | 0.185 | 1Q9B | 43 | 0.746 | 0.726 | 0.656 |
1ATG | 231 | 0.613 | 0.578 | 0.497 | 1QAU | 112 | 0.678 | 0.672 | 0.620 |
1BGF | 124 | 0.603 | 0.539 | 0.543 | 1QKI | 3912 | 0.809 | 0.751 | 0.645 |
1BX7 | 51 | 0.726 | 0.623 | 0.706 | 1QTO | 122 | 0.543 | 0.520 | 0.334 |
1BYI | 224 | 0.543 | 0.491 | 0.552 | 1R29 | 122 | 0.650 | 0.631 | 0.556 |
1CCR | 111 | 0.580 | 0.512 | 0.351 | 1R7J | 90 | 0.789 | 0.621 | 0.368 |
1CYO | 88 | 0.751 | 0.702 | 0.741 | 1RJU | 36 | 0.517 | 0.447 | 0.431 |
1DF4 | 57 | 0.912 | 0.889 | 0.832 | 1RRO | 112 | 0.435 | 0.372 | 0.529 |
1E5K | 188 | 0.746 | 0.732 | 0.859 | 1SAU | 114 | 0.742 | 0.671 | 0.596 |
1ES5 | 260 | 0.653 | 0.638 | 0.677 | 1TGR | 104 | 0.720 | 0.711 | 0.714 |
1ETL | 12 | 0.710 | 0.609 | 0.628 | 1TZV | 141 | 0.837 | 0.820 | 0.841 |
1ETM | 12 | 0.544 | 0.393 | 0.432 | 1U06 | 55 | 0.474 | 0.429 | 0.434 |
1ETN | 12 | 0.089 | 0.023 | −0.274 | 1U7I | 267 | 0.778 | 0.762 | 0.691 |
1EW4 | 106 | 0.650 | 0.644 | 0.547 | 1U9C | 221 | 0.600 | 0.577 | 0.522 |
1F8R | 1932 | 0.878 | 0.859 | 0.738 | 1UHA | 83 | 0.726 | 0.665 | 0.638 |
1FF4 | 65 | 0.718 | 0.613 | 0.674 | 1UKU | 102 | 0.665 | 0.661 | 0.742 |
1FK5 | 93 | 0.590 | 0.568 | 0.485 | 1ULR | 87 | 0.639 | 0.594 | 0.495 |
1GCO | 1044 | 0.766 | 0.693 | 0.646 | 1UOY | 64 | 0.713 | 0.653 | 0.671 |
1GK7 | 39 | 0.845 | 0.773 | 0.821 | 1USE | 40 | 0.438 | 0.146 | -0.142 |
1GVD | 52 | 0.781 | 0.732 | 0.591 | 1USM | 77 | 0.832 | 0.809 | 0.798 |
1GXU | 88 | 0.748 | 0.634 | 0.421 | 1UTG | 70 | 0.691 | 0.610 | 0.538 |
1H6V | 2927 | 0.488 | 0.429 | 0.306 | 1V05 | 96 | 0.629 | 0.599 | 0.632 |
1HJE | 13 | 0.811 | 0.686 | 0.616 | 1V70 | 105 | 0.622 | 0.492 | 0.162 |
1I71 | 83 | 0.549 | 0.516 | 0.549 | 1VRZ | 21 | 0.792 | 0.695 | 0.677 |
1IDP | 441 | 0.735 | 0.715 | 0.690 | 1W2L | 97 | 0.691 | 0.564 | 0.397 |
1IFR | 113 | 0.697 | 0.689 | 0.637 | 1WBE | 204 | 0.591 | 0.577 | 0.549 |
1K8U | 89 | 0.553 | 0.531 | 0.378 | 1WHI | 122 | 0.601 | 0.539 | 0.270 |
1KMM | 1499 | 0.749 | 0.744 | 0.558 | 1WLY | 322 | 0.695 | 0.679 | 0.666 |
1KNG | 144 | 0.547 | 0.536 | 0.512 | 1WPA | 107 | 0.634 | 0.577 | 0.417 |
1KR4 | 110 | 0.635 | 0.612 | 0.466 | 1X3O | 80 | 0.600 | 0.559 | 0.654 |
1KYC | 15 | 0.796 | 0.763 | 0.754 | 1XY1 | 18 | 0.832 | 0.645 | 0.447 |
1LR7 | 73 | 0.679 | 0.657 | 0.620 | 1XY2 | 8 | 0.619 | 0.570 | 0.562 |
1MF7 | 194 | 0.687 | 0.681 | 0.700 | 1Y6X | 87 | 0.596 | 0.524 | 0.366 |
1N7E | 95 | 0.651 | 0.609 | 0.497 | 1YJO | 6 | 0.375 | 0.333 | 0.434 |
1NKD | 59 | 0.750 | 0.703 | 0.631 | 1YZM | 46 | 0.842 | 0.834 | 0.901 |
1NKO | 122 | 0.619 | 0.535 | 0.368 | 1Z21 | 96 | 0.662 | 0.638 | 0.433 |
1NLS | 238 | 0.669 | 0.530 | 0.523 | 1ZCE | 146 | 0.808 | 0.757 | 0.770 |
1NNX | 93 | 0.795 | 0.789 | 0.631 | 1ZVA | 75 | 0.756 | 0.579 | 0.690 |
1NOA | 113 | 0.622 | 0.604 | 0.615 | 2A50 | 457 | 0.564 | 0.524 | 0.281 |
1NOT | 13 | 0.746 | 0.622 | 0.523 | 2AGK | 233 | 0.705 | 0.694 | 0.512 |
1O06 | 20 | 0.910 | 0.874 | 0.844 | 2AH1 | 939 | 0.684 | 0.593 | 0.521 |
1O08 | 221 | 0.562 | 0.333 | 0.309 | 2B0A | 186 | 0.639 | 0.603 | 0.467 |
1OB4 | 16 | 0.776 | 0.763 | 0.750 | 2BCM | 413 | 0.555 | 0.551 | 0.477 |
1OB7 | 16 | 0.737 | 0.545 | 0.652 | 2BF9 | 36 | 0.606 | 0.554 | 0.680 |
1OPD | 85 | 0.555 | 0.409 | 0.398 | 2BRF | 100 | 0.795 | 0.764 | 0.710 |
1P9I | 29 | 0.754 | 0.742 | 0.625 | 2C71 | 205 | 0.658 | 0.649 | 0.560 |
2CE0 | 99 | 0.706 | 0.598 | 0.529 | 2OLX | 4 | 0.917 | 0.888 | 0.885 |
2CG7 | 90 | 0.551 | 0.539 | 0.379 | 2PKT | 93 | 0.162 | 0.003 | −0.193 |
2COV | 534 | 0.846 | 0.823 | 0.812 | 2PLT | 99 | 0.508 | 0.484 | 0.509 |
2CWS | 227 | 0.647 | 0.640 | 0.696 | 2PMR | 76 | 0.693 | 0.682 | 0.619 |
2D5W | 1214 | 0.689 | 0.682 | 0.681 | 2POF | 440 | 0.682 | 0.651 | 0.589 |
2DKO | 253 | 0.816 | 0.812 | 0.690 | 2PPN | 107 | 0.677 | 0.638 | 0.668 |
2DPL | 565 | 0.596 | 0.538 | 0.658 | 2PSF | 608 | 0.526 | 0.500 | 0.565 |
2DSX | 52 | 0.337 | 0.333 | 0.127 | 2PTH | 193 | 0.822 | 0.784 | 0.767 |
2E10 | 439 | 0.798 | 0.796 | 0.692 | 2Q4N | 153 | 0.711 | 0.667 | 0.740 |
2E3H | 81 | 0.692 | 0.682 | 0.605 | 2Q52 | 412 | 0.756 | 0.748 | 0.621 |
2EAQ | 89 | 0.753 | 0.690 | 0.695 | 2QJL | 99 | 0.594 | 0.584 | 0.594 |
2EHP | 248 | 0.804 | 0.804 | 0.773 | 2R16 | 176 | 0.582 | 0.495 | 0.618 |
2EHS | 75 | 0.720 | 0.713 | 0.747 | 2R6Q | 138 | 0.603 | 0.540 | 0.529 |
2ERW | 53 | 0.461 | 0.253 | 0.199 | 2RB8 | 93 | 0.727 | 0.614 | 0.517 |
2ETX | 389 | 0.580 | 0.556 | 0.632 | 2RE2 | 238 | 0.652 | 0.613 | 0.673 |
2FB6 | 116 | 0.791 | 0.786 | 0.740 | 2RFR | 154 | 0.693 | 0.671 | 0.753 |
2FG1 | 157 | 0.620 | 0.617 | 0.584 | 2V9V | 135 | 0.555 | 0.548 | 0.528 |
2FN9 | 560 | 0.607 | 0.595 | 0.611 | 2VE8 | 515 | 0.744 | 0.643 | 0.616 |
2FQ3 | 85 | 0.719 | 0.692 | 0.348 | 2VH7 | 94 | 0.775 | 0.726 | 0.596 |
2G69 | 99 | 0.622 | 0.590 | 0.436 | 2VIM | 104 | 0.413 | 0.393 | 0.212 |
2G7O | 68 | 0.785 | 0.784 | 0.660 | 2VPA | 204 | 0.763 | 0.755 | 0.576 |
2G7S | 190 | 0.670 | 0.644 | 0.649 | 2VQ4 | 106 | 0.680 | 0.679 | 0.555 |
2GKG | 122 | 0.688 | 0.646 | 0.711 | 2VY8 | 149 | 0.770 | 0.724 | 0.533 |
2GOM | 121 | 0.586 | 0.584 | 0.491 | 2VYO | 210 | 0.675 | 0.648 | 0.729 |
2GXG | 140 | 0.847 | 0.780 | 0.520 | 2W1V | 548 | 0.680 | 0.680 | 0.571 |
2GZQ | 191 | 0.505 | 0.382 | 0.369 | 2W2A | 350 | 0.706 | 0.638 | 0.589 |
2HQK | 213 | 0.824 | 0.809 | 0.365 | 2W6A | 117 | 0.823 | 0.748 | 0.647 |
2HYK | 238 | 0.585 | 0.575 | 0.510 | 2WJ5 | 96 | 0.484 | 0.440 | 0.357 |
2I24 | 113 | 0.593 | 0.498 | 0.494 | 2WUJ | 100 | 0.739 | 0.598 | 0.598 |
2I49 | 398 | 0.714 | 0.683 | 0.601 | 2WW7 | 150 | 0.499 | 0.471 | 0.356 |
2IBL | 108 | 0.629 | 0.625 | 0.352 | 2WWE | 111 | 0.692 | 0.582 | 0.628 |
2IGD | 61 | 0.585 | 0.481 | 0.386 | 2X1Q | 240 | 0.534 | 0.478 | 0.443 |
2IMF | 203 | 0.652 | 0.625 | 0.514 | 2X25 | 168 | 0.632 | 0.598 | 0.403 |
2IP6 | 87 | 0.654 | 0.578 | 0.572 | 2X3M | 166 | 0.744 | 0.717 | 0.655 |
2IVY | 88 | 0.544 | 0.483 | 0.271 | 2X5Y | 171 | 0.718 | 0.705 | 0.694 |
2J32 | 244 | 0.863 | 0.848 | 0.855 | 2X9Z | 262 | 0.583 | 0.578 | 0.574 |
2J9W | 200 | 0.716 | 0.705 | 0.662 | 2XHF | 310 | 0.606 | 0.591 | 0.569 |
2JKU | 35 | 0.805 | 0.695 | 0.656 | 2Y0T | 101 | 0.778 | 0.774 | 0.798 |
2JLI | 100 | 0.779 | 0.613 | 0.622 | 2Y72 | 170 | 0.780 | 0.754 | 0.766 |
2JLJ | 115 | 0.741 | 0.720 | 0.527 | 2Y7L | 319 | 0.928 | 0.797 | 0.747 |
2MCM | 113 | 0.789 | 0.713 | 0.639 | 2Y9F | 149 | 0.771 | 0.762 | 0.664 |
2NLS | 36 | 0.605 | 0.559 | 0.530 | 2YLB | 400 | 0.807 | 0.807 | 0.675 |
2NR7 | 194 | 0.803 | 0.785 | 0.727 | 2YNY | 315 | 0.813 | 0.804 | 0.706 |
2NUH | 104 | 0.835 | 0.691 | 0.771 | 2ZCM | 357 | 0.458 | 0.422 | 0.420 |
2O6X | 306 | 0.814 | 0.799 | 0.651 | 2ZU1 | 360 | 0.689 | 0.672 | 0.653 |
2OA2 | 132 | 0.571 | 0.456 | 0.458 | 3A0M | 148 | 0.807 | 0.712 | 0.392 |
2OCT | 192 | 0.567 | 0.550 | 0.540 | 3A7L | 128 | 0.713 | 0.663 | 0.756 |
2OHW | 256 | 0.614 | 0.539 | 0.475 | 3AMC | 614 | 0.675 | 0.669 | 0.581 |
2OKT | 342 | 0.433 | 0.411 | 0.336 | 3AUB | 116 | 0.614 | 0.608 | 0.637 |
2OL9 | 6 | 0.909 | 0.904 | 0.689 | 3B5O | 230 | 0.644 | 0.629 | 0.601 |
3BA1 | 312 | 0.661 | 0.624 | 0.621 | 3MD4 | 12 | 0.860 | 0.781 | 0.914 |
3BED | 261 | 0.845 | 0.820 | 0.684 | 3MD5 | 12 | 0.649 | 0.413 | -0.218 |
3BQX | 139 | 0.634 | 0.481 | 0.297 | 3MEA | 166 | 0.669 | 0.669 | 0.600 |
3BZQ | 99 | 0.532 | 0.516 | 0.466 | 3MGN | 348 | 0.205 | 0.119 | 0.193 |
3BZZ | 100 | 0.485 | 0.450 | 0.600 | 3MRE | 383 | 0.661 | 0.641 | 0.567 |
3DRF | 547 | 0.559 | 0.549 | 0.488 | 3N11 | 325 | 0.614 | 0.583 | 0.517 |
3DWV | 325 | 0.707 | 0.661 | 0.547 | 3NE0 | 208 | 0.706 | 0.645 | 0.659 |
3E5T | 228 | 0.502 | 0.489 | 0.296 | 3NGG | 94 | 0.696 | 0.689 | 0.719 |
3E7R | 40 | 0.706 | 0.687 | 0.642 | 3NPV | 495 | 0.702 | 0.653 | 0.677 |
3EUR | 140 | 0.431 | 0.427 | 0.577 | 3NVG | 6 | 0.721 | 0.617 | 0.597 |
3F2Z | 149 | 0.824 | 0.792 | 0.740 | 3NZL | 73 | 0.627 | 0.583 | 0.506 |
3F7E | 254 | 0.812 | 0.803 | 0.811 | 3O0P | 194 | 0.727 | 0.706 | 0.734 |
3FCN | 158 | 0.640 | 0.606 | 0.632 | 3O5P | 128 | 0.734 | 0.698 | 0.630 |
3FE7 | 91 | 0.583 | 0.533 | 0.276 | 3OBQ | 150 | 0.649 | 0.645 | 0.655 |
3FKE | 250 | 0.525 | 0.476 | 0.435 | 3OQY | 234 | 0.698 | 0.686 | 0.637 |
3FMY | 66 | 0.701 | 0.655 | 0.556 | 3P6J | 125 | 0.774 | 0.767 | 0.810 |
3FOD | 48 | 0.532 | 0.440 | -0.126 | 3PD7 | 188 | 0.770 | 0.723 | 0.589 |
3FSO | 221 | 0.831 | 0.817 | 0.793 | 3PES | 165 | 0.697 | 0.642 | 0.683 |
3FTD | 240 | 0.722 | 0.713 | 0.634 | 3PID | 387 | 0.537 | 0.531 | 0.642 |
3FVA | 6 | 0.835 | 0.825 | 0.789 | 3PIW | 154 | 0.758 | 0.744 | 0.717 |
3G1S | 418 | 0.771 | 0.700 | 0.630 | 3PKV | 221 | 0.625 | 0.597 | 0.568 |
3GBW | 161 | 0.820 | 0.747 | 0.510 | 3PSM | 94 | 0.876 | 0.790 | 0.745 |
3GHJ | 116 | 0.732 | 0.511 | 0.196 | 3PTL | 289 | 0.543 | 0.541 | 0.468 |
3HFO | 197 | 0.691 | 0.670 | 0.518 | 3PVE | 347 | 0.718 | 0.667 | 0.568 |
3HHP | 1234 | 0.720 | 0.716 | 0.683 | 3PZ9 | 357 | 0.709 | 0.709 | 0.678 |
3HNY | 156 | 0.793 | 0.723 | 0.758 | 3PZZ | 12 | 0.945 | 0.922 | 0.950 |
3HP4 | 183 | 0.534 | 0.500 | 0.573 | 3Q2X | 6 | 0.922 | 0.904 | 0.866 |
3HWU | 144 | 0.754 | 0.748 | 0.841 | 3Q6L | 131 | 0.622 | 0.577 | 0.605 |
3HYD | 7 | 0.966 | 0.950 | 0.867 | 3QDS | 284 | 0.780 | 0.745 | 0.568 |
3HZ8 | 192 | 0.617 | 0.502 | 0.475 | 3QPA | 197 | 0.587 | 0.442 | 0.503 |
3I2V | 124 | 0.486 | 0.441 | 0.301 | 3R6D | 221 | 0.688 | 0.669 | 0.495 |
3I2Z | 138 | 0.613 | 0.599 | 0.317 | 3R87 | 132 | 0.452 | 0.419 | 0.286 |
3I4O | 135 | 0.735 | 0.714 | 0.738 | 3RQ9 | 162 | 0.510 | 0.403 | 0.242 |
3I7M | 134 | 0.667 | 0.635 | 0.695 | 3RY0 | 128 | 0.616 | 0.606 | 0.470 |
3IHS | 169 | 0.586 | 0.565 | 0.409 | 3RZY | 139 | 0.800 | 0.784 | 0.849 |
3IVV | 149 | 0.817 | 0.797 | 0.693 | 3S0A | 119 | 0.562 | 0.524 | 0.526 |
3K6Y | 227 | 0.586 | 0.535 | 0.301 | 3SD2 | 86 | 0.523 | 0.421 | 0.237 |
3KBE | 140 | 0.705 | 0.704 | 0.611 | 3SEB | 238 | 0.801 | 0.712 | 0.826 |
3KGK | 190 | 0.784 | 0.775 | 0.680 | 3SED | 124 | 0.709 | 0.658 | 0.712 |
3KZD | 85 | 0.647 | 0.611 | 0.475 | 3SO6 | 150 | 0.675 | 0.666 | 0.630 |
3L41 | 220 | 0.718 | 0.716 | 0.669 | 3SR3 | 637 | 0.619 | 0.611 | 0.624 |
3LAA | 169 | 0.827 | 0.647 | 0.659 | 3SUK | 248 | 0.644 | 0.633 | 0.567 |
3LAX | 106 | 0.734 | 0.730 | 0.584 | 3SZH | 697 | 0.817 | 0.815 | 0.697 |
3LG3 | 833 | 0.658 | 0.614 | 0.589 | 3T0H | 208 | 0.808 | 0.775 | 0.694 |
3LJI | 272 | 0.612 | 0.608 | 0.551 | 3T3K | 122 | 0.796 | 0.748 | 0.735 |
3M3P | 249 | 0.584 | 0.554 | 0.338 | 3T47 | 141 | 0.592 | 0.527 | 0.447 |
3M8J | 178 | 0.730 | 0.728 | 0.628 | 3TDN | 357 | 0.458 | 0.419 | 0.240 |
3M9J | 210 | 0.639 | 0.574 | 0.296 | 3TOW | 152 | 0.578 | 0.556 | 0.571 |
3M9Q | 176 | 0.591 | 0.510 | 0.471 | 3TUA | 210 | 0.665 | 0.658 | 0.588 |
3MAB | 173 | 0.664 | 0.591 | 0.451 | 3TYS | 75 | 0.853 | 0.800 | 0.791 |
3U6G | 248 | 0.635 | 0.632 | 0.526 | 4DT4 | 160 | 0.776 | 0.738 | 0.716 |
3U97 | 77 | 0.753 | 0.736 | 0.712 | 4EK3 | 287 | 0.680 | 0.680 | 0.674 |
3UCI | 72 | 0.589 | 0.526 | 0.495 | 4ERY | 318 | 0.740 | 0.701 | 0.688 |
3UR8 | 637 | 0.666 | 0.652 | 0.597 | 4ES1 | 95 | 0.648 | 0.625 | 0.551 |
3US6 | 148 | 0.698 | 0.586 | 0.553 | 4EUG | 225 | 0.570 | 0.529 | 0.405 |
3V1A | 48 | 0.531 | 0.487 | 0.583 | 4F01 | 448 | 0.633 | 0.372 | 0.688 |
3V75 | 285 | 0.604 | 0.596 | 0.491 | 4F3J | 143 | 0.617 | 0.598 | 0.551 |
3VN0 | 193 | 0.840 | 0.837 | 0.812 | 4FR9 | 141 | 0.671 | 0.655 | 0.501 |
3VOR | 182 | 0.602 | 0.557 | 0.484 | 4G14 | 15 | 0.467 | 0.323 | 0.356 |
3VUB | 101 | 0.625 | 0.610 | 0.607 | 4G2E | 151 | 0.760 | 0.755 | 0.758 |
3VVV | 108 | 0.833 | 0.741 | 0.753 | 4G5X | 550 | 0.786 | 0.754 | 0.743 |
3VZ9 | 163 | 0.785 | 0.749 | 0.695 | 4G6C | 658 | 0.591 | 0.590 | 0.528 |
3W4Q | 773 | 0.737 | 0.725 | 0.649 | 4G7X | 194 | 0.688 | 0.587 | 0.624 |
3ZBD | 213 | 0.651 | 0.516 | 0.632 | 4GA2 | 144 | 0.528 | 0.485 | 0.406 |
3ZIT | 152 | 0.430 | 0.404 | 0.392 | 4GMQ | 92 | 0.678 | 0.628 | 0.550 |
3ZRX | 221 | 0.590 | 0.562 | 0.391 | 4GS3 | 90 | 0.544 | 0.522 | 0.547 |
3ZSL | 138 | 0.691 | 0.687 | 0.526 | 4H4J | 236 | 0.810 | 0.806 | 0.689 |
3ZZP | 74 | 0.524 | 0.460 | 0.448 | 4H89 | 168 | 0.682 | 0.588 | 0.596 |
3ZZY | 226 | 0.746 | 0.709 | 0.728 | 4HDE | 168 | 0.745 | 0.728 | 0.615 |
4A02 | 166 | 0.618 | 0.516 | 0.303 | 4HJP | 281 | 0.703 | 0.649 | 0.510 |
4ACJ | 167 | 0.748 | 0.746 | 0.759 | 4HWM | 117 | 0.638 | 0.622 | 0.499 |
4AE7 | 186 | 0.724 | 0.717 | 0.717 | 4IL7 | 85 | 0.446 | 0.404 | 0.316 |
4AM1 | 345 | 0.674 | 0.619 | 0.460 | 4J11 | 357 | 0.620 | 0.562 | 0.401 |
4ANN | 176 | 0.551 | 0.536 | 0.470 | 4J5O | 220 | 0.793 | 0.757 | 0.777 |
4AVR | 188 | 0.680 | 0.605 | 0.650 | 4J5Q | 146 | 0.742 | 0.742 | 0.689 |
4AXY | 54 | 0.700 | 0.623 | 0.720 | 4J78 | 305 | 0.658 | 0.648 | 0.608 |
4B6G | 558 | 0.765 | 0.756 | 0.669 | 4JG2 | 185 | 0.746 | 0.736 | 0.543 |
4B9G | 292 | 0.844 | 0.816 | 0.763 | 4JVU | 207 | 0.723 | 0.697 | 0.553 |
4DD5 | 387 | 0.615 | 0.596 | 0.351 | 4JYP | 534 | 0.688 | 0.682 | 0.538 |
4DKN | 423 | 0.781 | 0.761 | 0.539 | 4KEF | 133 | 0.580 | 0.530 | 0.324 |
4DND | 95 | 0.763 | 0.750 | 0.582 | 5CYT | 103 | 0.441 | 0.421 | 0.331 |
4DPZ | 109 | 0.730 | 0.726 | 0.651 | 6RXN | 45 | 0.614 | 0.574 | 0.594 |
4DQ7 | 328 | 0.690 | 0.683 | 0.376 |
To directly compare the FRI with GNM and NMA, we calculated the correlation coefficient of Cα B-factor predictions for the three structure sets taken from Park et al.28 To further compare the FRI and GNM, we also calculated the accuracy of these two methods on a superset of 365 structures.
Two versions of the FRI are used for these tests. The first, optimal FRI (opFRI), searches a wide range of parameters for the highest scoring parameter and the second, parameter free FRI (pfFRI), uses υ = 3 and η = 3 Å in all cases. The correlation coefficients for three sets proposed by Park et al.28 are reported in Tables II–IV for FRI, GNM, and NMA. The results of the B-factor predictions for the superset are shown in Fig. 3. Using the top left chart as an example, both axises are correlation coefficients. For each circle, its x-coordinate is its correlation coefficient for pfFRI, while its y-coordinate is its correlation coefficient for opFRI. Since all circles are located above the diagonal line, opFRI always outperforms pfFRI. The average correlation scores for optimal FRI, parameter free FRI, GNM, and NMA for each set of structures are listed in Table VI. As shown in Table VI and Fig. 3, opFRI outperforms pfFRI in many cases although the majority of structures have little difference in their score for each method. Both optimal and parameter free FRI methods outperform GNM and NMA for most structures. B-factor prediction with the FRI is most accurate for smaller structures (<70 residues). All three methods tend to perform worse as the structures get larger except in the case of NMA where the medium-sized structures scored slightly lower than the large-sized structures. This behavior is expected because as proteins get larger their structures become more complex and may include structural co-factors and more amino acid side chain interactions that contribute to the protein's stability. The coarse-grained Cα representation used in these methods is unable to capture these kinds of details. The average increase in correlation coefficients when using the FRI over GNM on the superset of 365 proteins is 0.096 for opFRI and 0.059 for pfFRI. Additionally, opFRI and pfFRI are more accurate on average than GNM and NMA for all three sets of structures used by Park et al.28 From these results we conclude that both FRI and pfFRI are more accurate on average than either GNM or NMA.
Table VI.
PDB set | opFRI | pfFRI | GNM | NMA |
---|---|---|---|---|
Small | 0.667 | 0.594 | 0.541 | 0.480 |
Medium | 0.664 | 0.605 | 0.550 | 0.482 |
Large | 0.636 | 0.591 | 0.529 | 0.494 |
Superset | 0.673 | 0.626 | 0.565 | NA |
2. fFRI vs GNM
Table VII lists the average correlation coefficients of B-factor prediction for 365 proteins using fFRI schemes at a given truncation (R = 12 Å). It is seen that the proposed fFRI schemes implemented in either exponential (η = 3 Å and κ = 1) or Lorentz (η = 3 Å and υ = 3) are at least 10% more accurate than the GNM.
Table VII.
Exponential parameters | Avg. CC | Lorentz parameters | Avg. CC |
---|---|---|---|
κ = 0.5, η = 0.5 Å | 0.615 (8.8%) | υ = 2.5, η = 2.0 Å | 0.622 (10.1%) |
κ = 1.0, η = 3.0 Å | 0.623 (10.3%) | υ = 3.0, η = 3.0 Å | 0.626 (10.8%) |
κ = 1.5, η = 6.0 Å | 0.619 (9.6%) | υ = 3.5, η = 4.0 Å | 0.623 (10.3%) |
D. Efficiency comparison for FRI, fFRI, and GNM
This section concerns the computational efficiency of the FRI method. The efficiency of the FRI and the fFRI is compared with that of the GNM.
Computational efficiency in B-factor prediction becomes important for large proteins and for repeated predictions in molecular dynamics simulation and flexible docking analysis. High efficiency in rigidity analysis is also a requirement for CEWAR dynamics, where atomic rigidity functions are to be evaluated during the time evolution. The previously described set of 44 proteins as listed in Table VIII are used to test the computational complexity of the FRI, fFRI, and GNM algorithms. The method used to obtain the structure of the HIV virus capsid, which has more than 313 000 amino acid residues, does not provide experimental B-factors. To ensure a fair test, we have added some random noise to the predicted B-factors. The resulting B-factors of the HIV structure are employed in our efficiency test as if they were real experimental data.
Table VIII.
PDB ID | N | FRI | fFRI | GNM |
---|---|---|---|---|
3P6J | 125 | * | * | 0.141 |
3R87 | 132 | * | * | 0.156 |
3KBE | 140 | * | * | 0.187 |
1TZV | 141 | * | * | 0.203 |
2VY8 | 149 | * | * | 0.219 |
3ZIT | 152 | * | * | 0.234 |
2FG1 | 157 | * | * | 0.265 |
2X3M | 166 | * | * | 0.312 |
3LAA | 169 | * | * | 0.327 |
3M8J | 178 | * | * | 0.375 |
2GZQ | 191 | * | * | 0.468 |
4G7X | 194 | * | * | 0.499 |
2J9W | 200 | * | * | 0.546 |
3TUA | 210 | * | * | 0.655 |
1U9C | 221 | * | * | 0.733 |
3ZRX | 221 | * | * | 0.718 |
3K6Y | 227 | * | * | 0.765 |
3OQY | 234 | * | * | 0.873 |
2J32 | 244 | * | * | 0.967 |
3M3P | 249 | * | * | 1.029 |
1U7I | 267 | * | * | 1.263 |
4B9G | 292 | * | * | 1.669 |
4ERY | 318 | * | * | 2.122 |
3MGN | 348 | * | * | 2.902 |
2ZU1 | 360 | * | * | 3.136 |
2Q52 | 412 | * | * | 4.696 |
4F01 | 448 | * | * | 6.178 |
3DRF | 547 | 0.062 | * | 11.154 |
3UR8 | 637 | 0.07 | * | 17.409 |
2AH1 | 939 | 0.156 | * | 61.012 |
1GCO | 1044 | 0.187 | * | 75.801 |
1AGN | 1492 | 0.343 | * | 298.632 |
1F8R | 1932 | 0.655 | * | 654.127 |
1H6V | 2927 | 1.545 | * | 2085.842 |
1QKI | 3912 | 2.699 | * | 6365.668 |
3KGV | 4064 | 2.949 | * | 6194.518 |
1K32 | 6138 | 6.755 | * | * |
1JZ0 | 8168 | 11.87 | * | * |
4BGR | 8949 | 14.056 | 0.889 | * |
1VSZ | 12 012 | 25.413 | 1.248 | * |
GroEL | 14 700 | * | 1.467 | * |
B Gal | 16 336 | * | 1.716 | * |
1VRI | 18 540 | * | 1.934 | * |
HIV | 313 236 | * | 29.344 | * |
Table VIII and Figure 4 list the running times for each method in our FORTRAN implementations of GNM and FRI. Tests were conducted using a single core of an AMD Phenom II X6 1100T processor and include the entire GNM and FRI algorithm leaving out only the time it takes to load PDB files. As expected, GNM has a computational complexity close to due to the matrix decomposition, while the FRI is approximately of , mainly because of the computation of correlation functions. As for the fFRI, its computational complexity is of due to the nature of its sparse matrix. The lines of best fit for CPU time (t) are: t = (4 × 10−8) * N3.09 for GNM, t = (2 × 10−7) * N1.98 for FRI and t = (1 × 10−7) * N0.975 for fFRI. Some of the 44 structures used for efficiency testing were excluded from the final analysis of the FRI and fFRI methods because they required so little time to run that it was not possible to get an accurate measure of execution time. A few of the largest structures were only tested with FRI and fFRI methods because they require much more CPU time to run with GNM and the efficiency data are already sufficient to show that GNM scales at approximately . For a protein of seven thousand amino acid residues, it takes close to 10 000 s for GNM and only a few seconds for the FRI to predict the B-factors in our test. The fFRI is significantly faster than other methods. It takes less than 30 s for the fFRI to predict the B-factors of the HIV virus capsid with 313 236 residues.
E. Visualization of correlation maps
The correlation functions used to generate FRI correlation maps are based on monotonically decreasing radial basis functions. NMA, GNM, and related tools, on the other hand, use either a Kirchhoff (i.e., contact matrix) or a cross correlation map for connectivity between atoms. A Kirchhoff matrix is similar to our correlation map except the values are set to −1 for pairs of atoms within a cutoff distance and set to 0 for pairs of atoms outside the cutoff distance. The downside of using the Kirchhoff matrix is that, by definition, it treats all bonds within a certain cutoff distance the same and neglects all interactions outside that distance. We know that the closest atoms, such as covalently bonded atoms, will contribute more to the rigidity of a particular atom. With a rapidly decaying distance function, such as those used in our correlation functions, the impact of covalently bonded atoms is emphasized over the interactions that are just slightly more distant. This relationship is important because while structural features at a distance can play a significant role in stability and flexibility of a molecule, local structure has a much greater impact. Additionally, the cross correlation map in GNM and other methods does not reflect the atomic distance relations in a direct manner. In contrast, given a correlation matrix and a function from the FRI method it is easy to reconstruct the position of each Cα atom in our coarse-grained model. The FRI correlation maps on the right side of Fig. 5 display distance values along side the correlation values to reflect this property of the FRI. These maps were calculated using the Lorentz function with υ = 2.5. The second case (PDB: 1AIE), a single α-helix, is a good example of how the distance based correlation map reflects secondary structure information. The width of the band of high correlation is four amino acids, approximately the number of amino acids in one turn of an α-helix. For atoms within one turn of the helix, correlation values to nearby atoms follow a predictable pattern based on their distance. The cross-correlation matrix for 1AIE from GNM shows a similar overall pattern but without the same kind of atomic detail from the distance based correlation functions.
The other structures in Fig. 5 are from larger proteins and they show how the FRI and GNM represent complex arrangements of secondary structures. The FRI correlation maps clearly indicate where secondary structures are in the protein and what other residues they interact closely with. In these correlation maps, secondary structures are typically shown as small bands of relatively high correlation (yellow to red) while interactions between them appear in green and they often appear as regularly spaced green spots. These spots have space between them because they involve interactions with one face of an α-helix or similar fold and so the residues on the far side have a lower correlation. The cross-correlation matrix from GNM also gives some indication of where secondary structures are, α-helix is a square of green and beta sheets are lines, however these shapes are less defined on the atomic scale. Similarly the interactions between secondary structures are harder to pinpoint atomically as these interactions appear as a green smear in the matrices while in the correlation maps of the FRI method there are discrete spots with individual correlation values for atom to atom interactions. In the last example of Fig. 5 we can also see in the top left and bottom right corners how each map displays the interaction between two images of a structure from a single X-ray crystallography experiment.
IV. PROTEIN DOMAIN ANALYSIS WITH FRI AND aFRI
A. FRI for protein identification
Structural domains, as opposed to functional domains, are the basic unit of protein structures. Typically, these portions of a protein fold independently and are stable on their own. The way various domains are assembled in a protein dictates its shape and function and is therefore interesting to biologists. One simple way to identify these domains and their interactions is by looking at the correlation matrix. The FRI correlation matrix or correlation map allows us to identify protein structural domains because the values of a correlation function directly reflect the pairwise distances between Cα-Cα atoms. Therefore FRI correlation maps are also distance maps, which have been shown to be an effective tool for identifying protein domains as early as 1974.31 This method involves generating a Cα-Cα pairwise distance map and identifying dense, triangular areas of contacts near the diagonal. The correlation matrix generated for FRI is similar to a basic Cα-Cα distance map with the benefit of a rapidly decaying distance function. This results in more pronounced separation between domains in the map, potentially making it easier to identify distinct domains. Figure 6 demonstrates how domains are identified on a Cα-Cα distance map or a FRI correlation matrix for three proteins of increasing size and complexity. Each protein is divided into two structural domains outlined with red boxes for clarity.
In more complex cases, we see larger domains made up of multiple subdomains. Figure 7 shows the HIV capsid protein (PDB ID: 1E6J) consisting of six distinct structural domains. Alongside the protein's secondary structure are the contact matrix used in GNM and the correlation maps from the FRI. By adjusting the distance cutoff in the GNM and correlation function parameters in the FRI we arrive at similar results. The six compact structural subdomains can be identified visually from these matrices by finding areas with high correlation or large numbers of contacts in a triangular shape near the diagonal. Some interactions between subdomains can also be seen in the matrices. Subdomains with a significant level of correlation or contacts between them can be considered as single larger domain. In the example of Figure 7, there are contacts in the matrices between five subdomains. The last subunit (in green) shows little or no correlation or contacts to the other subdomains and is therefor a separate domain and expected to move fully independently. However, predicting domain motions with the FRI requires an alternative formulation, the anisotropic FRI.
B. aFRI for protein identification
Anisotropic FRI, like ANM, predicts the amplitudes and directions of atomic fluctuations. To test the accuracy of this new method we can compare aFRI fluctuation predictions to experimental B-factors as we have done with FRI and GNM. Both anisotropic rigidity based flexibility and anisotropic flexibility based flexibility () are examined for their B-factor predictions and their average correlation coefficients are 0.602 (υ = 2 and η = 9 Å) and 0.572 (υ = 2 and η = 18 Å) for the superset of 365 structures. This means that aFRI is more accurate than GNM (average correlation coefficient 0.565 for the superset). However, aFRI is slightly less accurate than pfFRI (average correlation coefficient 0.626 for the superset), which is similar to the fact that ANM is not as accurate as GNM.
A major utility of the proposed aFRI theory is the prediction of protein motions by using the anisotropic flexibility. Since aFRI is adaptive, its cluster can be as large as the whole molecule and as small as a single particle. The completely global aFRI has a Hessian matrix of 3N × 3N elements and produces 3N eigenmodes. Depending on symmetry, 5 or 6 modes are due to the translational and rotational motions. Therefore, the remaining 3N − 5 or 3N − 6 vibrational modes can be obtained. To validate the proposed aFRI theory for normal mode analysis, we have computed the vibrational modes for a few simple molecules, namely, H2O, CO2, and CH2O, whose vibrational modes are well-known. Our results are displayed in Figure 8. In addition to these vibrational modes, appropriate translational and rotational modes are also observed from our calculations, but are omitted in our presentation. Therefore, the proposed aFRI works well for the analysis of small molecular translation, rotation, and vibration.
Having established our aFRI for small molecular vibrational analysis, we are interested in examining its behavior for macromolecules. Anisotropic normal mode analysis of large biomolecules can be very expensive because the computational complexity of the global matrix scales as . As such, the adaptive cluster analysis option provided by the aFRI algorithm can be useful. In this work, we explore two extreme aFRI options, i.e., the completely global cluster and completely local clusters, for protein vibrational analysis. Similar to the ANM, the completely global aFRI algorithm has a Hessian matrix of 3N × 3N elements and produces 3N eigenmodes. The motions predicted by these eigenmodes are typically very similar to those produced by ANM. In contrast, the completely localized aFRI has only a total of N 3 × 3 Hessian matrices and gives rise to 3N eigenmodes for N particles. We assemble these 3N eigenmodes into 3 modes for the molecule and weight the amplitude of each eigenmode by the corresponding B-factor for the particle. Due to the non-local correlation built in the aFRI matrices, these three modes of motion obtained by the completely local aFRI algorithm are often similar to certain low-order modes calculated by ANM for a protein. The first three modes of motion for the HIV capsid protein are shown in Figure 9 for two aFRI algorithms and ANM. It is seen that three eigenmodes obtained from the completely global aFRI resemble those calculated by the ANM. Although three modes produced by the completely local aFRI algorithm show different motions, it is amazing to note that there is much collective motion in these modes.
Figure 10 depicts three modes for phosphate active transport receptor protein generated by using two aFRI algorithms and ANM. Once again, we see a good similarity between eigenmodes calculated by using the completely global aFRI algorithm and those computed by using the ANM. However, the modes generated with the completely localized aFRI demonstrate somewhat different motions. In each method, the relative motion of two domains can be clearly identified. The domain relative motions in the eigenmodes of the completely global aFRI and the ANM exhibit a better synergistic effect in general. Whereas, modes from the completely local aFRI are slightly less collective. Since there is no standard answer to domain fluctuations, it is difficult to say which one is right or wrong. An interesting observation is that although aFRI matrices can be completely local, they have built in non-local correlation and thus are able to simulate highly collective protein motions.
V. CONCLUDING REMARKS
The fundamental challenges that hinder the current quantitative understanding of biomolecular systems are their tremendous complexity and excessively large number of degrees of freedom. A multiscale approach, CEWAR, provides a new method for the reduction of the number of degrees of freedom in biomolecular systems.46 The performance of the CEWAR method relies on the accurate and efficient evaluation of a continuous atomic rigidity function. The FRI is proposed as a potential algorithm for such an evaluation. The underlying assumption of the FRI is that protein interactions uniquely determine the protein structure which, in turn, determines the protein functions, such as stability and flexibility. Therefore, one just needs the structural information to predict protein B-factors without reconstructing the protein interaction Hamiltonian. In particular, we assume that biomolecular flexibility and rigidity are local structural properties. Therefore, the flexibility at an atom is completely determined by its local environment, namely, local geometry and local topological connectivity. We treat the (local) flexibility as an inverse of the (local) rigidity. As a consequence, we do not need to solve the (global) eigenvalue problem of the Hamiltonian. The first step of the FRI method is to measure protein topological connectivity from the distance geometry via smooth and monotonically decreasing radial basis functions. The atomic rigidity index is then associated with the total connectivity or interaction strength at each residue. Consequently, the atomic flexibility index, which is the inverse function of the atomic rigidity index, is associated with protein B-factors.
Protein flexibility is an intrinsic property that strongly correlates with protein functions. The analysis of protein flexibility is a crucial task in computational biophysics. Many established methods, including NMA7,15,23,39 and ENM41 such as Gaussian network model,4,5,12 have been developed in the past. These approaches typically depend on the Hamiltonian mechanics of elastic interactions and matrix decomposition of the interaction Hamiltonian. The present work examines the performance of the proposed FRI for protein flexibility analysis in a comparison with some cutting edge methods, specifically, GNM and NMA.
We calibrate the accuracy, reliability and computational efficiency of the FRI method by using GNM and NMA on three sets of proteins, including the relatively small-sized, medium sized, and large sized structures employed by Park et al.28 in a recent study. Additional calibration with the GNM is carried out on extended set. As a result, a total of 365 proteins is studied in the present comparative study. As an internal validation, the FRI method is realized by using three families of correlation functions. Correlation functions of generalized exponential type and Lorentz type are found to deliver better results. In particular, correlation functions of Lorentz type are simple and can be made parameter free, which is desirable for general use. Although GNM and NMA may outperform the proposed FRI method on certain proteins in terms of the accuracy of the B-factor prediction, the FRI method is able to improve on the average correlation coefficient of GNM and NMA on all three sets of proteins. Additionally, the FRI is found to significantly outperform the GNM on the extended superset of 365 structures as well.
A possible reason for the FRI to outperform the existing methods is that GNM and NMA are essentially global methods in a sense that they rely on the solution of the global eigenvalue problem to predict local atomic properties, e.g., B-factors. In contrast, the FRI is a local method and utilizes the local geometric information to predict local atomic properties. In parallel, there are (global) band theory of solids and (local) atomic orbital model of solids. The former is good for describing many global physical properties, such as electrical conductivity and thermal lattice motions in terms of excitations, while the latter is more powerful for explaining localized chemical reactivity and catalysis of solids.
The GNM is known for its superb computational efficiency.48 The matrix diagonalization is of in computational complexity, where N is the number of residues. The computational complexity of our original FRI is of . In the present work, we propose a fast FRI (fFRI) algorithm, which further reduces the computational complexity of FRI to . Both FRI and fFRI do not involve the time consuming matrix decomposition. As a result, it takes less than 30 s for the fFRI to predict the B factors of a HIV virus structure with more than three hundred thousands of residues, which otherwise requires many years for the GNM to compute. Additionally, both the exponential based parameter-free fFRI and the Lorentz based parameter-free fFRI are about 10% more accurate than the GNM in the B-factor prediction of 365 proteins.
Anisotropic motions between protein domains are known to correlate with protein functions. To describe protein anisotropic fluctuations, we introduce anisotropic FRI (aFRI) algorithms. We introduce an adaptive aFRI method which partitions the molecule into many clusters with variable sizes. We specifically examine two extreme cases, i.e., a one-cluster partition and N-cluster partition, which result in a completely global 3N × 3N Hessian matrix and N completely localized 3 × 3 Hessian matrices, respectively. The computational complexity of aFRI varies from to . Although aFRI Hessian matrices can be completely local, they still contain non-location correlation. As such, all of three protein modes predicted by the completely local aFRI exhibit highly collective global motions. The eigenmodes obtained from the completely global aFRI closely resemble those of the ANM.3,6 However, modes constructed from the completely local aFRI show different collective motion patterns. Since there is no analytical solution for collective motions, it is not possible to judge whose collective motions are more correct. In general, the eigenmodes of ANM and the completely global aFRI exhibit a slightly better synergistic effect than modes generated by using the completely local aFRI.
The proposed FRI has a few visual applications. First, the correlation maps of the FRI are capable of revealing both short- and long-distance interactions or connectivities. Since correlation map elements are directly related to the original distances by a known radial basis function, the distances can be labeled on the map as well. Additionally, the predicted B-factors can be plotted as the radii of residues to visualize the amplitude of thermal fluctuations. This plot becomes even more interesting when atomic spheres are colored with the electrostatics.46 The close correlation between flexibility and large electrostatic potentials can be unveiled, which sheds light on protein intrinsic structural properties. Moreover, the predicted B-factors can be plotted with secondary structures to have an overall picture of structural flexibility. Finally, as continuous functions, the atomic rigidity function and atomic flexibility function can be projected onto protein molecular surfaces or other surface representations to analyze flexibility.
Another application of FRI and aFRI is the analysis of protein domains. Existing methods, such as GNM and ANM, are well known for domain analysis. The present FRI provides a clear correlation map for domain identifications. It is found that aFRI gives rise to highly collective domain motion patterns, although not all parts of a domain move uniformly in our aFRI representations.
ACKNOWLEDGMENTS
This work was supported in part by NSF Grant Nos. IIS-1302285 and DMS-1160352, NIH Grant No. R01GM-090208, and MSU Center for Mathematical Molecular Biosciences Initiative. K.O. thanks Dr. Minxin Chen for useful discussions about the data structure of cell lists. The authors acknowledge the Mathematical Biosciences Institute for hosting valuable workshops.
REFERENCES
- 1.Allen M. P. and Tildesley D. J., Computer Simulation of Liquids (Clarendon Press, Oxford, 1987). [Google Scholar]
- 2.Anfinsen C. B., “Einfluss der configuration auf die wirkung den,” Science 181, 223–230 (1973). 10.1126/science.181.4096.223 [DOI] [PubMed] [Google Scholar]
- 3.Atilgan A. R., Durrell S. R., Jernigan R. L., Demirel M. C., Keskin O., and Bahar I., “Anisotropy of fluctuation dynamics of proteins with an elastic network model,” Biophys. J. 80, 505–515 (2001). 10.1016/S0006-3495(01)76033-X [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Bahar I., Atilgan A. R., Demirel M. C., and Erman B., “Vibrational dynamics of proteins: Significance of slow and fast modes in relation to function and stability,” Phys. Rev. Lett. 80, 2733–2736 (1998). 10.1103/PhysRevLett.80.2733 [DOI] [Google Scholar]
- 5.Bahar I., Atilgan A. R., and Erman B., “Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential,” Folding Des. 2, 173–181 (1997). 10.1016/S1359-0278(97)00024-2 [DOI] [PubMed] [Google Scholar]
- 6.Bakan A., Meireles L. M., and Bahar I., “Prody: Protein dynamics inferred from theory and experiments,” Bioinformatics 27, 1575–1577 (2011). 10.1093/bioinformatics/btr168 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Brooks B. R., Bruccoleri R. E., Olafson B. D., States D., Swaminathan S., and Karplus M., “Charmm: A program for macromolecular energy, minimization, and dynamics calculations,” J. Comput. Chem. 4, 187–217 (1983). 10.1002/jcc.540040211 [DOI] [Google Scholar]
- 8.Chiti F. and Dobson C. M., “Protein misfolding, functional amyloid, and human disease,” Annu. Rev. Biochem. 75, 333–366 (2006). 10.1146/annurev.biochem.75.101304.123901 [DOI] [PubMed] [Google Scholar]
- 9.Cui Q. and Bahar I., Normal Mode Analysis: Theory and Applications to Biological and Chemical Systems (Chapman and Hall/CRC, 2010). [Google Scholar]
- 10.Cui Q., Li G. J., Ma J., and Karplus M., “A normal mode analysis of structural plasticity in the biomolecular motor f(1)-atpase,” J. Mol. Biol. 340(2), 345–372 (2004). 10.1016/j.jmb.2004.04.044 [DOI] [PubMed] [Google Scholar]
- 11.Demerdash O. N. A. and Mitchell J. C., “Density-cluster NMA: A new protein decomposition technique for coarse-grained normal mode analysis,” Proteins 80(7), 1766–1779 (2012). 10.1002/prot.24072 [DOI] [PubMed] [Google Scholar]
- 12.Flory P. J., “Statistical thermodynamics of random networks,” Proc. R. Soc. London, Ser. A 351, 351–378 (1976). 10.1098/rspa.1976.0146 [DOI] [Google Scholar]
- 13.Frauenfelder H., Slihar S. G., and Wolynes P. G., “The energy landscapes and motion of proteins,” Science 254(5038), 1598–1603 (1991). 10.1126/science.1749933 [DOI] [PubMed] [Google Scholar]
- 14.Gerek Z. N. and Ozkan S. B., “A flexible docking scheme to explore the binding selectivity of PDZ domains,” Protein Sci. 19, 914–928 (2010). 10.1002/pro.366 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Go N., Noguti T., and Nishikawa T., “Dynamics of a small globular protein in terms of low-frequency vibrational modes,” Proc. Natl. Acad. Sci. U.S.A. 80, 3696–3700 (1983). 10.1073/pnas.80.12.3696 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Hinsen K., “Analysis of domain motions by approximate normal mode calculations,” Proteins 33, 417–429 (1998). [DOI] [PubMed] [Google Scholar]
- 17.Hinsen K., “Structural flexibility in proteins: Impact of the crystal environment,” Bioinformatics 24, 521–528 (2008). 10.1093/bioinformatics/btm625 [DOI] [PubMed] [Google Scholar]
- 18.Humphrey W., Dalke A., and Schulten K., “VMD – Visual molecular dynamics,” J. Mol. Graphics 14(1), 33–38 (1996). 10.1016/0263-7855(96)00018-5 [DOI] [PubMed] [Google Scholar]
- 19.Jacobs D. J., Rader A. J., Kuhn L. A., and Thorpe M. F., “Protein flexibility predictions using graph theory,” Proteins: Struct., Funct., Genet. 44(2), 150–165 (2001). 10.1002/prot.1081 [DOI] [PubMed] [Google Scholar]
- 20.Keskin O., Bahar I., Flatow D., Covell D. G., and Jernigan R. L., “Molecular mechanisms of chaperonin GroEL-GroES function,” Biochemistry 41, 491–501 (2002). 10.1021/bi011393x [DOI] [PubMed] [Google Scholar]
- 21.Kondrashov D. A., Van Wynsberghe A. W., Bannen R. M., Cui Q., and Phillips J. G. N., “Protein structural variation in computational models and crystallographic data,” Structure 15, 169–177 (2007). 10.1016/j.str.2006.12.006 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Kundu S., Melton J. S., Sorensen D. C., and Phillips J. G. N., “Dynamics of proteins in crystals: Comparison of experiment with simple models,” Biophys. J. 83, 723–732 (2002). 10.1016/S0006-3495(02)75203-X [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Levitt M., Sander C., and Stern P. S., “Protein normal-mode dynamics: Trypsin inhibitor, crambin, ribonuclease and lysozyme,” J. Mol. Biol. 181(3), 423–447 (1985). 10.1016/0022-2836(85)90230-X [DOI] [PubMed] [Google Scholar]
- 24.Li G. H. and Cui Q., “A coarse-grained normal mode approach for macromolecules: An efficient implementation and application to Ca(2+)-ATPase,” Biophys. J. 83, 2457–2474 (2002). 10.1016/S0006-3495(02)75257-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Livesay D. R., Dallakyan S., Wood G. G., and Jacobs D. J., “A flexible approach for understanding protein stability,” FEBS Lett. 576, 468–476 (2004). 10.1016/j.febslet.2004.09.057 [DOI] [PubMed] [Google Scholar]
- 26.Ma J. P., “Usefulness and limitations of normal mode analysis in modeling dynamics of biomolecular complexes,” Structure 13, 373–180 (2005). 10.1016/j.str.2005.02.002 [DOI] [PubMed] [Google Scholar]
- 27.Onuchic J. N., Luthey-Schulten Z., and Wolynes P. G., “Theory of protein folding: The energy landscape perspective,” Annu. Rev. Phys. Chem. 48, 545–600 (1997). 10.1146/annurev.physchem.48.1.545 [DOI] [PubMed] [Google Scholar]
- 28.Park J. K., Jernigan R., and Wu Z., “Coarse grained normal mode analysis vs. refined gaussian network model for protein residue-level structural fluctuations,” Bull. Math. Biol. 75, 124–160 (2013). 10.1007/s11538-012-9797-y [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Rader A. J., Vlad D. H., and Bahar I., “Maturation dynamics of bacteriophage hk97 capsid,” Structure 13, 413–421 (2005). 10.1016/j.str.2004.12.015 [DOI] [PubMed] [Google Scholar]
- 30.Renka R. J., “Multivariate interpolation of large sets of scattered data,” ACM Trans. Math. Software 14(2), 139–148 (1988). 10.1145/45054.45055 [DOI] [Google Scholar]
- 31.Rossman M. G. and Liljas A., “Recognition of structural domains in globular proteins,” J. Mol. Biol. 85, 177–181 (1974). 10.1016/0022-2836(74)90136-3 [DOI] [PubMed] [Google Scholar]
- 32.Schroder M. and Kaufman R. J., “The mammalian unfolded protein response,” Annu. Rev. Biochem. 74, 739–789 (2005). 10.1146/annurev.biochem.73.011303.074134 [DOI] [PubMed] [Google Scholar]
- 33.Sept D. and MacKintosh F. C., “Microtubule elasticity: Connecting all-atom simulations with continuum mechanics,” Phys. Rev. Lett. 104(1), 018101 (2010). 10.1103/PhysRevLett.104.018101 [DOI] [PubMed] [Google Scholar]
- 34.Skjaerven L., Hollup S. M., and Reuter N., “Normal mode analysis for proteins,” J. Mol. Struct.: THEOCHEM 898, 42–48 (2009). 10.1016/j.theochem.2008.09.024 [DOI] [Google Scholar]
- 35.Song G. and Jernigan R. L., “vGNM: A better model for understanding the dynamics of proteins in crystals,” J. Mol. Biol. 369(3), 880–893 (2007). 10.1016/j.jmb.2007.03.059 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Tama F. and Brooks C. K. III, “Diversity and identity of mechanical properties of icosahedral viral capsids studied with elastic network normal mode analysis,” J. Mol. Biol. 345, 299–314 (2005). 10.1016/j.jmb.2004.10.054 [DOI] [PubMed] [Google Scholar]
- 37.Tama F. and Sanejouand Y. H., “Conformational change of proteins arising from normal mode calculations,” Protein Eng. 14, 1–6 (2001). 10.1093/protein/14.1.1 [DOI] [PubMed] [Google Scholar]
- 38.Tama F., Valle M., Frank J., and Brooks C. K. III, “Dynamic reorganization of the functionally active ribosome explored by normal mode analysis and cryo-electron microscopy,” Proc. Natl. Acad. Sci. U.S.A. 100(16), 9319–9323 (2003). 10.1073/pnas.1632476100 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Tasumi M., Takenchi H., Ataka S., Dwidedi A. M., and Krimm S., “Normal vibrations of proteins: Glucagon,” Biopolymers 21, 711–714 (1982). 10.1002/bip.360210318 [DOI] [PubMed] [Google Scholar]
- 40.Thacker W. I., Zhang J., Watson L. T., Birch J. B., Iyer M. A., and Berry M. W., “Algorithm 905: SHEPPACK: Modified Shepard algorithm for interpolation of scattered multivariate data,” ACM Trans. Math. Software 37(3), 1–20 (2010). 10.1145/1824801.1824812 [DOI] [Google Scholar]
- 41.Tirion M. M., “Large amplitude elastic motions in proteins from a single-parameter, atomic analysis,” Phys. Rev. Lett. 77, 1905–1908 (1996). 10.1103/PhysRevLett.77.1905 [DOI] [PubMed] [Google Scholar]
- 42.von der Lieth C. W., Stumpf-Nothof K., and Prior U., “A bond flexibility index derived from the constitution of molecules,” J. Chem. Inf. Comput. Sci. 36, 711–716 (1996). 10.1021/ci9501204 [DOI] [Google Scholar]
- 43.Wang Y., Rader A. J., Bahar I., and Jernigan R. L., “Global ribosome motions revealed with elastic network model,” J. Struct. Biol. 147, 302–314 (2004). 10.1016/j.jsb.2004.01.005 [DOI] [PubMed] [Google Scholar]
- 44.Wei G. W., “Wavelets generated by using discrete singular convolution kernels,” J. Phys. A: Math. Gen. 33, 8577–8596 (2000). 10.1088/0305-4470/33/47/317 [DOI] [Google Scholar]
- 45.White S. H. and Wimley W. C., “Membrane protein folding and stability: Physical principles,” Annu. Rev. Biophys. Biomol. Struct. 28, 319–365 (1999). 10.1146/annurev.biophys.28.1.319 [DOI] [PubMed] [Google Scholar]
- 46.Xia K. L., Opron K., and Wei G. W., “Multiscale multiphysics and multidomain models: Flexibility and rigidity,” J. Chem. Phys. 139, 194109 (2013). 10.1063/1.4830404 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Xu C., Tobi D., and Bahar I., “Allosteric changes in protein structure computed by a simple mechanical model: Hemoglobin t ↔ r2 transition,” J. Mol. Biol. 333, 153–168 (2003). 10.1016/j.jmb.2003.08.027 [DOI] [PubMed] [Google Scholar]
- 48.Yang L. W. and Chng C. P., “Coarse-grained models reveal functional dynamics–I. Elastic network models–Theories, comparisons and perspectives,” Bioinf. Biol. Insights 2, 25–45 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Zheng W., Brooks B. R., and Thirumalai D., “Allosteric transitions in the chaperonin groel are captured by a dominant normal mode that is most robust to sequence variations,” Biophys. J. 93, 2289–2299 (2007). 10.1529/biophysj.107.105270 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Zheng W. J. and Doniach S., “A comparative study of motor-protein motions by using a simple elastic-network model,” Proc. Natl. Acad. Sci. U.S.A. 100(23), 13253–13258 (2003). 10.1073/pnas.2235686100 [DOI] [PMC free article] [PubMed] [Google Scholar]