Figure - PMC



1998 Apr 28;95(9):4976–4981. doi: 10.1073/pnas.95.9.4976


Copyright © 1998, The National Academy of Sciences


Analysis for the CMBF superfamily. Ninety-eight families were used (the list is available from the authors on request). All listed proteins are structurally homologous to CheY with Z > 3 and RMSD < 4 Å, according to the families of structurally similar proteins (FSSP) database (19). We used a coarse-grained six-letter amino acid alphabet in which amino acids were grouped by their physical properties into the following six classes: aliphatic + Cys: A, L, I, V, M, C; aromatic: F, Y, W, H; small nonpolar: G, P; polar: T, S, Q, N; basic: R, K; and acidic: E, D. The analysis using all 20 amino acid types gives qualitatively similar results. Horizontal axes denote position in CheY, which was taken as the reference. (a, circles) CoC analysis: intrafamily sequence entropy averaged over all 98 families (excluding gaps), calculated as $S_{\mathrm{CoC}}(l) = \frac{1}{M}\sum_{F=1}^{M} S_{\mathrm{intra}}^{F}(l)$, where the sum runs over all 98 families used in the analysis (M = 98), excluding gaps. The intrafamily sequence entropy at position l for a given family F is calculated as $S_{\mathrm{intra}}^{F}(l) = -\sum_{i=1}^{6} p_i^{F}(l)\,\log p_i^{F}(l)$, where $p_i^{F}(l)$ is the normalized frequency of observing a residue of class i (i = 1–6) at position l in all homologous sequences belonging to family F. The sum is taken over all six residue classes. (a, squares) Sequence entropy calculated across all families. To obtain this quantity, we evaluated the frequency of occurrence of amino acids of each class i at each position l over all families, $p_i^{\mathrm{across}}(l)$, and then calculated the sequence entropy at position l as $S_{\mathrm{across}}(l) = -\sum_{i=1}^{6} p_i^{\mathrm{across}}(l)\,\log p_i^{\mathrm{across}}(l)$. (b) The probability that an equal or lower $S_{\mathrm{CoC}}$ will be observed under the null hypothesis that conservation of a residue in the structure is related primarily to its degree of buriedness.
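The two entropies in panel (a) can be illustrated with a minimal Python sketch. It is not the authors' code and rests on several assumptions that the legend does not settle: sequences within each family are already aligned to the CheY reference, the logarithm is natural, a family whose column contains only gaps is left out of the average, and the across-family frequencies are obtained by pooling all sequences with equal weight. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter

# Six residue classes as listed in the legend.
CLASSES = {
    "aliphatic + Cys": "ALIVMC",
    "aromatic": "FYWH",
    "small nonpolar": "GP",
    "polar": "TSQN",
    "basic": "RK",
    "acidic": "ED",
}
RESIDUE_TO_CLASS = {
    aa: idx for idx, letters in enumerate(CLASSES.values()) for aa in letters
}


def column_entropy(residues):
    """Entropy (natural log, an assumption) of the six-class distribution.

    Gaps and unrecognized symbols are skipped. Returns None when the
    column holds no classifiable residue.
    """
    counts = Counter(RESIDUE_TO_CLASS[r] for r in residues if r in RESIDUE_TO_CLASS)
    total = sum(counts.values())
    if total == 0:
        return None
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def s_coc(families, pos):
    """Average of the intrafamily entropies at one position.

    Families that have only gaps at this position are excluded from the
    average (one reading of "excluding gaps").
    """
    values = [column_entropy(seq[pos] for seq in fam) for fam in families]
    values = [v for v in values if v is not None]
    return sum(values) / len(values) if values else None


def s_across(families, pos):
    """Entropy at one position after pooling sequences from all families."""
    return column_entropy(seq[pos] for fam in families for seq in fam)


# Toy alignment: two hypothetical families, two aligned positions each.
families = [["AD", "LD"], ["KE", "KE"]]
coc_0 = s_coc(families, 0)
across_0 = s_across(families, 0)
```

In the toy alignment, position 0 is aliphatic in every sequence of the first family and basic in every sequence of the second, so each intrafamily entropy is zero while the pooled distribution is split evenly between two classes. The sketch therefore gives a zero S_CoC and an S_across of log 2 at that position, which is the kind of contrast between the circles and squares that panel (a) plots.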