The structure of protein dynamic space

S Rackovsky; Harold A Scheraga

doi:10.1073/pnas.2008873117

2020 Aug 5;117(33):19938–19942.

PMCID: PMC7443874 PMID: 32759212

Significance

Protein dynamic properties have been computationally accessible only by means of molecular-dynamic simulations, or network models, of specific molecules. We have developed a bioinformatic approach to dynamics, which makes it possible to delineate the dynamic characteristics of large numbers of sequences simultaneously. In this work we report an analysis of the large-scale dynamic structure of protein space. It is demonstrated that proteins of different structural classes have different dynamic behaviors, and that all-helical proteins occur with two distinct types of dynamic behavior. One subset of helical proteins is characterized by localized, helix-based dynamics, while the complementary subset exhibits dynamics of a more three-dimensional nature. This information has not been available through the application of more traditional methods.

Keywords: B factor, protein dynamics, structure–dynamics relationships, Fourier transform

Abstract

We use a bioinformatic description of amino acid dynamic properties, based on residue-specific average B factors, to construct a dynamics-based, large-scale description of a space of protein sequences. We examine the relationship between that space and an independently constructed, structure-based space comprising the same sequences. It is demonstrated that structure and dynamics are only moderately correlated. It is further shown that helical proteins fall into two classes with very different structure–dynamics relationships. We suggest that dynamics in the two helical classes are dominated by distinctly different modes: pseudo–one-dimensional, localized helical modes in one case, and pseudo–three-dimensional (3D) global modes in the other. Sheet/barrel and mixed-α/β proteins exhibit more conventional structure–dynamics relationships. It is found that the strongest correlation between structure and dynamic properties arises when the latter are represented by the sequence average of the dynamic index, which corresponds physically to the overall mobility of the protein. None of these results are accessible to bioinformatic methods hitherto available.

Protein structure and evolution have been intensively studied for many years, and vast bodies of sequence and structure data have been accumulated and analyzed using tools of bioinformatics. This approach is usually referred to as “knowledge-based,” to distinguish it from an equally impressive body of computational studies based on simulation, using physically motivated empirical energy functions, of actual physical processes. One central area of protein science, however, has thus far resisted knowledge-based study. Protein dynamic characteristics have only been available computationally from two frameworks. These are molecular-dynamic simulations and elastic network models, processor-intensive approaches which limit studies to single proteins, or to comparisons of small groups of molecules of interest. This situation arises from the fact that no informatic parameter has been available which adequately represents the dynamics of individual amino acids.

In recent work (1), we have developed a measure of the dynamic properties of amino acids in protein sequences which is suitable for bioinformatic use. This property is the residue-specific average value of the B factor (2), determined from a large database of protein structures. We denote the average B factor for amino acid X as <B(X)>. This quantity plays the same general role with respect to dynamics that a hydrophobicity index plays with respect to solvent exposure. It is not the case that every hydrophilic amino acid is in actual contact with solvent, nor does every amino acid with a high value of <B(X)> exhibit high mobility. Rather, <B(X)> is a measure of the tendency of the amino acid X to be in motion. The information carried by <B(X)> becomes important in the context of a complete sequence, as is also true of hydrophobicity indices.

It was shown (1) that the values of <B(X)> differ between amino acids in a statistically significant manner. Using statistical, signal processing, and information theoretic methods, we demonstrated several properties of <B(X)>:

1) Values of <B(X)> are partly, but not exclusively, determined by the values of the intrinsic physical properties of the amino acids, as represented by a complete and orthogonal set of property factors (3, 4);
2) <B(X)> also encodes information about the influence of protein fold and other residue-external factors on dynamics;
3) Using Fourier techniques, the global, whole-sequence dynamic properties of sequences can be represented;
4) A substantial fraction of the information encoded in the global representation of dynamic properties originates from the part of <B(X)> which does not arise from single-amino acid physical properties, and is therefore not accessible from any representation based on static amino acid properties;
5) Groups of proteins which fold to different architectures differ from one another in their behavior in a detectable and statistically significant manner, when represented by global dynamic parameters.

The availability of a well-characterized bioinformatic quantity derived from the dynamic properties of amino acids makes possible study of the dynamic properties of proteins on a large scale, rather than anecdotally. We wish to understand the relationship between the space of protein structures and a parallel, distinct space determined by the dynamic properties of the same proteins. We demonstrate the following results:

1) The relationship between the two spaces is characterized, in part, by an anomalous dependence of dynamic distance on structure difference.
2) This anomaly arises from unexpected behavior of all-helical proteins, which exhibit two distinct types of behavior in dynamic space.
3) We suggest that these behaviors correspond to physically different dynamic regimes within the universe of all-helical proteins.
4) Structure–dynamics correlations in proteins are encoded in the overall mobility of the structure, rather than in more localized descriptions of chain dynamic properties.

Results and Discussion

In the present work, we examine a basic, but hitherto inaccessible question about proteins: whether structure and dynamics are related in a simple way. We proceed as follows.

1) We construct a dynamics-based distance function between proteins, based on the global properties of <B(X)>, and apply it to a large protein database. This generates a protein space determined by sequence dynamic properties.
2) We construct a structure-based distance function between proteins which allows rapid, optimization-free comparison of molecular architectures. We demonstrate that, despite the fact that this function is based on low-resolution information, it accurately describes the organization of structure space. This distance function generates a second, independent space determined by the structures in the same large protein database.
3) We analyze the relationship between these two spaces, and ask what information this relationship carries about protein physics.

It should be noted that there is reason to question whether structure and dynamics are of necessity related. Even in cases where structural homology and functional similarity are both obtained, dynamic differences have been demonstrated (5).

In order to understand the relationship between the spaces associated with our database (the structure space, which we denote by S, and the dynamic space, which we denote by D), we consider the correlation between corresponding pairwise distances in the two spaces. Naively, one expects to find that distances in the two spaces should be positively correlated, because, as structures become less similar, the dynamic characteristics of the molecules diverge in some reasonable sense. We shall examine whether this is, in fact, the case.

We begin by considering the distance function in S. We use a low-resolution representation of protein structures which we have shown (6, 7) to give a structure space essentially equivalent to that obtained by high-resolution methods, while making possible extremely rapid comparisons of structure. This leads to a Euclidean sequence distance function in a three-dimensional space, shown in Eq. 3 in Methods. We demonstrate there that the organization of the space generated by this metric corresponds precisely to what would be expected based on physical intuition.

We turn next to the construction of the distance function in D. The proteins in our database are labeled by values of the four indices C, A, T, and H, which together classify entries in the CATH database (8). We focus first on the identifier C, which specifies structural class. It will be remembered that C = 1 denotes helical architecture, C = 2 sheet/barrel architecture, and C = 3 mixed-α/β architecture. The sequences of the 5,719 proteins in our database are written in terms of <B(X)>, giving a numerical string for each sequence (which we denote as the dynamic sequence), and we ask in what way sequences belonging to the three C classes differ from one another. We answer this question by Fourier analyzing the dynamic sequences, and carrying out an ANOVA of the distributions of the resulting Fourier coefficients. (Details of the procedure are given in Methods.) We require a high degree of statistical significance (9), and find that there are 11 values of the wave number k at which the distributions of sine or cosine Fourier coefficients of sequences belonging to the three classes differ from one another with P < 0.0001. We measure distance in D using a weighted Euclidean distance function based on these 11 Fourier coefficients, as shown in Eq. 4 in Methods. The weighting allows us to measure independently the contribution of each of the significant wave numbers to any structure–dynamics correlation we observe.

Given these two functions, the correlation between distances in the two spaces can be determined for any protein in the database, and for any set of values of the dynamic weighting factors. We denote this correlation coefficient by R(S_m,D_m;{w_i}), where distances are measured from protein m, and {w_i|i = 1,2...11} is the set of weighting factors used in the dynamic distance function. We have carried out this calculation for every protein in the database, and for all 2,047 nontrivial binary assignments of the 11 weights. The results are summarized in Fig. 1, in which we show a side-by-side boxplot of the average value, maximum, and minimum of R(S_m,D_m;{w_i}). It will be seen that one specific choice of {w_i} gives an exceptionally large average and range for R. This weighting, {w_i} = (1,0,0...,0), corresponds to a distance measured solely by the values of the cos(k = 0) Fourier coefficient. The cos(k = 0) Fourier coefficient is the average value of <B(X)>, measured over the sequence. This coefficient, it should be noted, contains no information about the actual linear arrangement of residues along the sequence. In physical terms, it measures the average tendency of all of the residues in the sequence to be in motion. We shall refer to this coefficient as the mobility of the sequence.

The results embodied in Fig. 1 are of interest for several reasons.

1) The values of R(S_m,D_m;{w_i}) are not large, even for the exceptional case {w_i} = (1,0....,0). (The actual distribution of R for the optimal weighting set is shown in Fig. 2.) This confirms the anecdotal observation that molecular architecture does not strongly dictate dynamic behavior.
2) Both positive and negative values of R are found, in contrast to intuitive expectation.
3) The strongest correlation between dynamic and structural distances occurs when global dynamics are described by a function which depends only on the average tendency of residues in each sequence to be mobile.

Fig. 2. A histogram of the values of R_ALL, the correlation between structure and dynamic distances, for all 5,719 proteins in the data set.

We shall refer to the R > 0 and R < 0 subsets of the database as normal and anomalous, respectively. The normal regime corresponds physically to “expected” behavior, in which dynamics diverge as structures do. The complementary, anomalous regime corresponds physically to an extended region of dynamic similarity in structure space. It is instructive to examine the relationship between these two types of behavior and molecular architecture. We find that the distribution of C values in the anomalous region is very different from that in the sample as a whole. Fully 65% of proteins with R < 0 are α-helical. All-β (17%) and α/β (18%) structures are found with much lower frequency in this region. In the database as a whole the corresponding percentages are 24, 26, and 50%.

A statistical analysis of the observed distribution sheds further light on this result. We ask to what extent the observed distribution of structure classes in the anomalous region differs from that in the entire database. For this purpose we subdivide the structural classes, by labeling each protein with values of both C, the CATH parameter which indicates structural class, and A, the parameter which indicates architecture. For each CA group, we ask whether the number of representatives in the R < 0 region is significantly larger than would be expected on a random basis. The answer to this question is summarized in Table 1. The only structural classes which occur in the anomalous region with significantly greater than random probability (corresponding to Z > 0 with a significant value of P) are those with C = 1, and all C = 1 classes are represented in this region.

Table 1.

Comparative structural properties of database and anomalous region

C	A	Nneg	Nall	fNeg	fAll	Z	P^*
1	10	332	899	0.3701	0.1572	15.2357385	∼0
1	20	207	415	0.2308	0.0726	15.0944806	∼0
1	25	20	58	0.0223	0.0101	3.13571099	<0.0017
1	40	2	2	0.0022	0.0003	2.12962067	<0.04
1	50	19	24	0.0212	0.0042	5.88583513	<0.00001
2	10	17	110	0.0190	0.0192	−0.0572426
2	20	4	28	0.0045	0.0049	−0.1752547
2	30	30	152	0.0334	0.0266	1.16903786
2	40	45	311	0.0502	0.0544	−0.5199053
2	60	54	683	0.0602	0.1194	−5.2417886	<<0.0001
2	70	2	44	0.0022	0.0077	−1.8310616
2	80	1	36	0.0011	0.0063	−1.9341994
2	160	1	34	0.0011	0.0059	−1.8541518
2	170	1	25	0.0011	0.0044	−1.4493877
3	10	30	269	0.0334	0.0470	−1.8219082
3	20	6	234	0.0067	0.0409	−5.0973756	<<0.0001
3	30	56	766	0.0624	0.1339	−6.0365754	<<0.0001
3	40	51	1,136	0.0569	0.1986	−10.289245	<<0.0001
3	50	4	41	0.0045	0.0072	−0.9180483
3	80	1	20	0.0011	0.0035	−1.1793186
3	90	14	282	0.0156	0.0493	−4.5394363	<<0.0001

C and A are CATH labels (see text); Nneg and Nall are the number of occurrences of each CA group in the anomalous region and full database, respectively; fNeg and fAll are the corresponding composition fractions; Z is the Z score for the comparison of fractional occurrences; P is the associated probability of randomly observing the difference in fractions.

*Values of P are given only for cases where the occurrence is significantly different from random.
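
The Z scores in Table 1 can be checked numerically. The sketch below assumes a pooled two-proportion z statistic. That form is not stated in the text, but it appears to reproduce the tabulated values for the rows tried here. The function name is introduced for illustration only, and the total of 897 anomalous proteins is obtained by summing the Nneg column.

```python
import math

def two_proportion_z(n_neg, total_neg, n_all, total_all):
    """Pooled two-proportion z statistic (an assumed form, not stated in the text)."""
    f_neg = n_neg / total_neg          # fNeg in Table 1
    f_all = n_all / total_all          # fAll in Table 1
    pooled = (n_neg + n_all) / (total_neg + total_all)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / total_neg + 1.0 / total_all))
    return (f_neg - f_all) / std_err

TOTAL_NEG = 897    # sum of the Nneg column of Table 1
TOTAL_ALL = 5719   # number of proteins in the full database

z_c1_a10 = two_proportion_z(332, TOTAL_NEG, 899, TOTAL_ALL)   # row C = 1, A = 10
z_c2_a60 = two_proportion_z(54, TOTAL_NEG, 683, TOTAL_ALL)    # row C = 2, A = 60
print(round(z_c1_a10, 2), round(z_c2_a60, 2))
```

Under this assumption, a positive Z marks a CA group that is overrepresented in the anomalous region and a negative Z marks one that is underrepresented.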

The same C = 1 classes are also represented in the normal region. We ask what the dynamic difference is between C = 1 proteins which occur in the anomalous region of dynamic space and those in the normal region. Table 2 shows average values of the mobility distributions in the two regions. Comparison of the two averages gives |Z| = 33.5, corresponding to distributions of R(S_m,D_m;{w_i}) which differ with p ∼ 0. The two distributions differ with very high significance, despite the structural similarities between members of each CA class.

Table 2.

Dynamic comparison between helical proteins with R ≤ 0 and R > 0

	N	<cos(0)>	σ
R ≤ 0	580	27.04	0.24
R > 0	818	28.15	0.26

N is the number of proteins in each region. <cos(0)> is the average value of the k = 0 Fourier coefficient in the region, and σ is the associated SD.

Helical proteins in the anomalous region are seen to have lower mobility than those in the normal region, and the sensitivity of those mobilities to structural change is greatly reduced. The insensitivity in this region to overall structural differences suggests that mobility is significantly influenced by structural subfeatures which are common to all of the proteins in question, irrespective of changes in overall architecture. We speculate that the dynamics of molecules in the R < 0 regime is dominated by modes which are characteristic of helical segments, and are quasi–one-dimensional in nature, rather than those which depend in a 2- or 3D way on the manner in which helical segments are assembled.

Further inspection of Table 1 indicates that, whereas sheet/barrel proteins are generally distributed randomly between the two regions of dynamic space, mixed-α/β proteins show a strong tendency to avoid the anomalous region. A majority of C = 3 groups (comprising 88% of all α/β proteins in the dataset) are found to prefer the normal region, with very high significance. The peak in the histogram of Fig. 2 at high positive values of R(S_m,D_m;{w_i}) contains only proteins with C = 2 and C = 3.

In Table 3 we show the average values of the mobility for the three structural classes. Differences between the averages are significant with P << 0.0001. The highest average mobility is exhibited by helical proteins, and the lowest by sheet/barrel structures. However, the average value for helical proteins is a composite. As is clear from Table 2, the subset of helical proteins with R > 0 exhibit higher average mobility, and the subset with R < 0 lower average mobility, than any other class of structures. From a dynamic perspective, there are two separate classes of helical proteins.

Table 3.

Average mobility for different structural classes

C	<cos(0)>
1	27.97
2	27.79
3	27.84

The fact that the greatest correlation with structure is given by a dynamic description which depends only on the cos(k = 0) Fourier coefficient is suggestive. The k = 0 coefficient arises from an equal weighting of <B(X)> at all sequence positions and is a global measure of dynamic characteristics. Dynamic distance functions which include contributions from coefficients with higher values of k, and are therefore more localized in nature, underweight some regions of the molecule, and apparently obscure information which connects dynamics with overall structure.

Several questions are raised by our results. One would like to know what structural features differentiate helical proteins with R < 0 from those with R > 0. What detailed molecular motions lead to the observed behaviors? Is there a difference in the character of intramolecular interactions between the two regimes? Another intriguing question is whether one observes a similar dichotomy of behavior in groups of proteins which are known to be structurally very similar. What behavior will be observed if the analysis is extended to topological (CAT) subgroups of the C = 1 proteins? We are addressing these questions in ongoing work.

Methods

Average B Factors.

In this work, we use values of the residue-specific average B factor which have been adjusted from our previous work (1), to account for the removal from our database of some defective sequences. The new values of <B(X)> are correlated with the previous values with R = 0.98, and all conclusions of our previous work remain. The adjusted values are shown in Table 4.

Table 4.

Values of <B(X)>

	N	<B>	σ
ALA	70,827	26.27	18.58
ASP	49,858	29.89	20.36
CYS	12,640	26.27	18.60
GLU	57,792	31.92	21.21
PHE	34,385	25.31	17.48
GLY	65,405	27.45	19.18
HIS	19,436	26.98	18.99
ILE	48,947	25.86	17.92
LYS	51,027	31.30	20.83
LEU	77,160	26.88	18.44
MET	18,205	27.65	19.42
ASN	37,037	28.52	20.14
PRO	39,407	28.77	19.58
GLN	31,524	29.54	20.65
ARG	42,687	29.47	20.16
SER	51,065	28.32	19.64
THR	48,400	27.06	19.32
VAL	62,648	25.67	18.03
TRP	11,966	23.98	16.41
TYR	30,558	25.07	17.41

N is the number of occurrences of each amino acid, <B> is the value of the residue-specific average B factor, and σ is the associated SD.
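
As an illustration of how these values are used, the following minimal sketch converts an amino acid sequence into its dynamic sequence and computes the sequence average, which the text identifies with the cos(k = 0) coefficient (the mobility). The <B(X)> values are copied from Table 4. The one-letter keys, the function names, and the four-residue peptide are assumptions made for this example.

```python
# Residue-specific average B factors <B(X)> from Table 4,
# keyed by one-letter amino acid code (the keying is a choice made here).
AVERAGE_B = {
    "A": 26.27, "D": 29.89, "C": 26.27, "E": 31.92, "F": 25.31,
    "G": 27.45, "H": 26.98, "I": 25.86, "K": 31.30, "L": 26.88,
    "M": 27.65, "N": 28.52, "P": 28.77, "Q": 29.54, "R": 29.47,
    "S": 28.32, "T": 27.06, "V": 25.67, "W": 23.98, "Y": 25.07,
}

def dynamic_sequence(sequence):
    """Rewrite a one-letter amino acid sequence as a list of <B(X)> values."""
    return [AVERAGE_B[residue] for residue in sequence.upper()]

def mobility(sequence):
    """Sequence average of <B(X)>, i.e. the cos(k = 0) coefficient."""
    values = dynamic_sequence(sequence)
    return sum(values) / len(values)

# Hypothetical four-residue peptide, used only to exercise the functions.
example_mobility = mobility("ACDW")
print(round(example_mobility, 4))
```

Because the mobility is a plain average, it depends only on composition: any reordering of the same residues gives the same value.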

Database.

The dataset we used for these studies is a subset of our standard dataset (8, 10). The basic data set is drawn from the CathDomainSeqs.S60.ATOM.v.3.2.0 dataset (ref. 11; www.cathdb.info), and the sequences therein exhibit no more than 60% pairwise sequence identity. The subset utilized in this work was selected to contain only structures for which reliable B factors are given. It contains 5,719 structures.

Fourier Analysis.

The details of the Fourier approach have been extensively described in previous work (10, 12–17). The methods which were developed to study the 10 static property factors in that work carry over unchanged to the analysis of the dynamic (B-factor) sequence. As in previous work, the sequences are described in terms of the Z functions:

Z_{k} = \frac{c_{k} - \langle c_{k} \rangle_{N}}{\sigma(c_{k})},

[1]

where the bracket indicates an average of the Fourier coefficient c_k over all permutations of the wild-type sequence of the protein in question, and σ is the associated SD. We have shown (14) that these statistical quantities can be calculated analytically. As was noted previously (13), the effect of this normalization is to remove any dependence on sequence composition alone, so that the Z function explicitly encodes information about the specific linear arrangement of amino acids in the wild-type sequence. Sequence composition information is encoded in the k = 0 cosine Fourier coefficient, whose value is independent of the linear arrangement of amino acids.
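
The normalization in Eq. 1 can be sketched numerically. The text notes that the permutation average and SD can be calculated analytically (14); the code below instead estimates them by sampling random permutations, purely as an illustration. The Fourier convention used, c_k = (1/N) Σ_j b_j cos(2πkj/N), is an assumption chosen so that k = 0 gives the sequence average, and all function and parameter names are hypothetical.

```python
import math
import random

def cos_coefficient(values, k):
    """Cosine Fourier coefficient at wave number k under the assumed
    convention (1/N) * sum_j b_j * cos(2*pi*k*j/N); k = 0 gives the mean."""
    n = len(values)
    return sum(b * math.cos(2.0 * math.pi * k * j / n)
               for j, b in enumerate(values)) / n

def z_function(values, k, n_permutations=2000, seed=0):
    """Estimate Z_k of Eq. 1 by sampling permutations of the sequence."""
    rng = random.Random(seed)
    observed = cos_coefficient(values, k)
    shuffled = list(values)
    samples = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)                      # a random reordering
        samples.append(cos_coefficient(shuffled, k))
    mean = sum(samples) / len(samples)
    sd = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
    if sd < 1e-9:
        # Happens at k = 0: the coefficient depends on composition only.
        raise ValueError("coefficient does not depend on residue order")
    return (observed - mean) / sd

# Hypothetical eight-residue dynamic sequence built from Table 4 values.
example_values = [26.27, 29.89, 31.92, 25.31, 27.45, 23.98, 31.30, 26.88]
print(z_function(example_values, k=1))
```

The guard for a vanishing SD reflects the statement above that the k = 0 cosine coefficient is independent of the linear arrangement of amino acids, so no order-dependent Z value exists for it in this sketch.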

Structure–Distance Function.

We have shown in previous work (8, 9) that the structure of a protein can be represented numerically by a low-resolution representation which is four-dimensional. Each protein can be written as a point in this space with coordinates (E_L, E_0, E_R, A_R), where the coordinates are the fractional occurrence of 4-C_α fragments in one of three extended structure types or in an α-helical conformation, respectively. Because the four fractions are normalized (E_L + E_0 + E_R + A_R = 1), there are only three independent variables, and a principal component analysis shows the structure space S to be 3D under the following transformation:

\begin{aligned}
\Xi_{1} &= -0.515\,E_{L} - 0.518\,E_{0} - 0.34\,E_{R} + 0.592\,A_{R} \\
\Xi_{2} &= -0.099\,E_{L} - 0.358\,E_{0} + 0.92\,E_{R} + 0.128\,A_{R} \\
\Xi_{3} &= -0.780\,E_{L} + 0.6\,E_{0} + 0.158\,E_{R} - 0.065\,A_{R}
\end{aligned}

[2]

Each protein is now represented in structure space by a three-vector Ξ = (Ξ₁, Ξ₂, Ξ₃). The three principal components account for 100% of the variance of the four coordinates.

A meaningful representation of structure space should exhibit a physically sensible separation between proteins belonging to the three structural classes. In Fig. 3 this is shown to be the case. We find that the two extremes of S are occupied by helical (C = 1) and sheet/barrel (C = 2) structures, and the region intermediate between those extremes contains mixed-α/β (C = 3) structures, which are indeed structurally intermediate between the two other classes. The organization of structure space is precisely that which would be expected on the basis of physical intuition.

Fig. 3. The organization of the structure space S, arising from the structure–distance function in Eq. 3. Helical proteins are shown in red, sheet/barrel proteins in blue, and mixed-α/β proteins in black.

With this result established, we use a straightforward Euclidean structure distance function. The structural distance Ω between proteins P and Q is given by

\Omega(P,Q) = \left[ \sum_{i=1}^{3} \left( \Xi_{i}(P) - \Xi_{i}(Q) \right)^{2} \right]^{1/2}.

[3]
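
Eqs. 2 and 3 can be combined into a short numerical sketch: project the four fragment fractions onto the three principal components, then take the Euclidean distance. The coefficients are those of Eq. 2. The function names and the two idealized input proteins (one purely helical, one purely of the E_0 extended type) are hypothetical and serve only to exercise the arithmetic.

```python
import math

# Rows of the transformation in Eq. 2; columns are (E_L, E_0, E_R, A_R).
PC_MATRIX = (
    (-0.515, -0.518, -0.34, 0.592),
    (-0.099, -0.358, 0.92, 0.128),
    (-0.780, 0.6, 0.158, -0.065),
)

def project(fractions):
    """Map normalized fractions (E_L, E_0, E_R, A_R) to (Xi_1, Xi_2, Xi_3)."""
    if len(fractions) != 4 or abs(sum(fractions) - 1.0) > 1e-6:
        raise ValueError("expected four fractions that sum to 1")
    return tuple(sum(c * f for c, f in zip(row, fractions)) for row in PC_MATRIX)

def structure_distance(fractions_p, fractions_q):
    """Euclidean structure distance Omega(P, Q) of Eq. 3."""
    xi_p = project(fractions_p)
    xi_q = project(fractions_q)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi_p, xi_q)))

# Idealized, hypothetical inputs: all-helical versus all-E_0 fragments.
all_helix = (0.0, 0.0, 0.0, 1.0)
all_e0 = (0.0, 1.0, 0.0, 0.0)
print(round(structure_distance(all_helix, all_e0), 4))
```

No structural superposition or optimization is involved, which is what makes comparisons of this kind fast across a large database.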

ANOVA.

In the present paper, we are interested in determining at which values of the wave-number k distributions of Fourier coefficients of the dynamic sequences differ with statistical significance between the three structural classes. We therefore carry out an ANOVA comparison of these distributions, in the same manner as our previous work (17). Only structure groups with at least 20 representatives were used in the analysis, in order to guarantee statistical reliability. We require that differences between distributions be significant with P < 0.001. It is found that differences at this level of significance exist at 11 wave numbers, which are shown in Table 5. These 11 wave numbers are used to construct the dynamic distance function discussed next.

Table 5.

Values of k included in dynamic distance function

k	sin	cos
0		X
2	X
3	X
4	X
5	X
15		X
18	X
46		X
54		X
55		X
60		X
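
The ANOVA comparison described above rests on the one-way F statistic, computed separately for each Fourier coefficient across the three structural classes. The following is a minimal sketch of that statistic only. Converting F to a P value requires the F distribution, which is left to a statistics library and is not shown. The function name and the toy groups are hypothetical.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of numeric values."""
    n_groups = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means about the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the values about their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    return ms_between / ms_within

# Toy coefficient values for three classes (not data from this work).
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
print(one_way_anova_f(toy_groups))
```

A large F at a given wave number indicates that the class means of that coefficient differ by more than the within-class scatter would suggest.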

Dynamic Distance Function.

The distance Δ between two proteins P and Q in the dynamic space D is written as a weighted Euclidean function:

\Delta(P,Q,\{w_{i}\}) = \left[ \sum_{i=1}^{11} w_{i} \left( Z_{i}(P) - Z_{i}(Q) \right)^{2} \right]^{1/2},

[4]

where the Z are defined in Eq. 1, the k values indexed by i are shown in Table 5, and the weighting factors w_i take the values 0 or 1.
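
A minimal sketch of Eq. 4 follows. Each protein is assumed to be supplied as a list of its 11 coefficient values, ordered as in Table 5 with the cos(k = 0) term first. That ordering, the function name, and the toy vectors are assumptions made for illustration.

```python
import math

def dynamic_distance(z_p, z_q, weights):
    """Weighted Euclidean distance Delta(P, Q, {w_i}) of Eq. 4."""
    if not (len(z_p) == len(z_q) == len(weights)):
        raise ValueError("coefficient and weight vectors must have equal length")
    if any(w not in (0, 1) for w in weights):
        raise ValueError("weights must be 0 or 1")
    return math.sqrt(sum(w * (a - b) ** 2
                         for w, a, b in zip(weights, z_p, z_q)))

# Toy 11-component vectors (not data from this work).
toy_p = [1.0] * 11
toy_q = [0.0] * 11

mobility_only = (1,) + (0,) * 10   # the weighting {w_i} = (1,0,...,0)
all_terms = (1,) * 11
print(dynamic_distance(toy_p, toy_q, mobility_only),
      dynamic_distance(toy_p, toy_q, all_terms))
```

Setting a weight to 0 removes the corresponding wave number from the distance, so each binary weighting isolates the contribution of a chosen subset of coefficients.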

Structure–Dynamic Correlation.

We calculated the correlation coefficient R(S_m,D_m;{w_i}) between Δ(P,Q,{w_i}) and Ω(P,Q) (Eqs. 3 and 4) for all proteins m in the data set, and all possible weighting sets {w_i|w_i = 0,1; i = 1,…,11}, a total of 2,047 sets (excluding the trivial set {w_i| w_i = 0 ∀i}). The results of that calculation are shown in Figs. 1 and 2. It should be noted that, given the size of our dataset, values of |R(S_m,D_m;{w_i})|≥0.03 are statistically significant, for any {w_i}.
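
The calculation just described can be sketched as follows: enumerate the nontrivial binary weight sets, and for a reference protein m correlate its structure distances to all other proteins with its dynamic distances to the same proteins. The use of the Pearson coefficient is an assumption (the text says only "correlation coefficient"), and the function names and toy data are hypothetical.

```python
import itertools
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def weight_sets(n_coefficients=11):
    """All binary weight vectors except the trivial all-zero one."""
    return [w for w in itertools.product((0, 1), repeat=n_coefficients) if any(w)]

def correlation_from(m, structure_vectors, dynamic_vectors, weights):
    """R(S_m, D_m; {w_i}): correlate distances measured from protein m."""
    others = [q for q in range(len(structure_vectors)) if q != m]
    omega = [math.dist(structure_vectors[m], structure_vectors[q]) for q in others]
    delta = [math.sqrt(sum(w * (a - b) ** 2 for w, a, b in
                           zip(weights, dynamic_vectors[m], dynamic_vectors[q])))
             for q in others]
    return pearson(omega, delta)

# Toy data: four proteins spaced along one structural axis, with a first
# dynamic coefficient that either follows or reverses that ordering.
toy_structure = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
toy_normal = [[float(v)] + [0.0] * 10 for v in (0, 1, 2, 3)]
toy_anomalous = [[float(v)] + [0.0] * 10 for v in (0, 3, 2, 1)]
mobility_only = (1,) + (0,) * 10

print(len(weight_sets()))
print(correlation_from(0, toy_structure, toy_normal, mobility_only))
print(correlation_from(0, toy_structure, toy_anomalous, mobility_only))
```

In the toy data the first case gives a positive R (dynamic distance grows with structure distance) and the second a negative R, mirroring the normal and anomalous regimes defined in the text.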

Acknowledgments

We thank Dr. Gia Maisuradze and Dr. Khatuna Kachlishvili for enlightening and helpful discussions. We thank the referees for helpful comments, which have contributed to this work. This work was supported by NIH/NIGMS Grant GM14312.

Footnotes

The authors declare no competing interest.

Data Availability.

All study data are included in the article.

References

1. Scheraga H. A., Rackovsky S., Sequence-specific dynamic information in proteins. Proteins 87, 799–804 (2019).
2. Sun Z., Liu Q., Qu G., Feng Y., Reetz M. T., Utility of B-factors in protein science: Interpreting rigidity, flexibility, and internal motion and engineering thermostability. Chem. Rev. 119, 1626–1665 (2019).
3. Kidera A., Konishi Y., Oka M., Ooi T., Scheraga H. A., Statistical analysis of the physical properties of the 20 naturally occurring amino acids. J. Protein Chem. 4, 23–55 (1985).
4. Kidera A., Konishi Y., Ooi T., Scheraga H. A., Relation between sequence similarity and structural similarity in proteins: Role of important properties of amino acids. J. Protein Chem. 4, 265–297 (1985).
5. He Y. et al., Sequence-, structure-, and dynamics-based comparisons of structurally homologous CheY-like proteins. Proc. Natl. Acad. Sci. U.S.A. 114, 1578–1583 (2017).
6. Scheraga H. A., Rackovsky S., Homolog detection using global sequence properties suggests an alternate view of structural encoding in protein sequences. Proc. Natl. Acad. Sci. U.S.A. 111, 5225–5229 (2014).
7. Rackovsky S., Quantitative organization of the known protein x-ray structures. I. Methods and short-length-scale results. Proteins 7, 378–402 (1990).
8. Dawson N. L. et al., CATH: An expanded resource to predict protein function through structure and sequence. Nucleic Acids Res. 45, D289–D295 (2017).
9. Johnson V. E., Revised standards for statistical evidence. Proc. Natl. Acad. Sci. U.S.A. 110, 19313–19317 (2013).
10. Scheraga H. A., Rackovsky S., Global informatics and physical property selection in protein sequences. Proc. Natl. Acad. Sci. U.S.A. 113, 1808–1810 (2016).
11. Orengo C. A. et al., CATH–A hierarchic classification of protein domain structures. Structure 5, 1093–1108 (1997).
12. Rackovsky S., “Hidden” sequence periodicities and protein architecture. Proc. Natl. Acad. Sci. U.S.A. 95, 8580–8584 (1998).
13. Rackovsky S., Characterization of architecture signals in proteins. J. Phys. Chem. B 110, 18771–18778 (2006).
14. Rackovsky S., Sequence physical properties encode the global organization of protein structure space. Proc. Natl. Acad. Sci. U.S.A. 106, 14345–14348 (2009).
15. Rackovsky S., Global characteristics of protein sequences and their implications. Proc. Natl. Acad. Sci. U.S.A. 107, 8623–8626 (2010).
16. Rackovsky S., Spectral analysis of a protein conformational switch. Phys. Rev. Lett. 106, 248101 (2011).
17. Rackovsky S., Sequence determinants of protein architecture. Proteins 81, 1681–1685 (2013).
