Proceedings of the National Academy of Sciences of the United States of America. 2009 Feb 13;106(9):3107–3112. doi: 10.1073/pnas.0811262106

Scaling and self-organized criticality in proteins I

J C Phillips
PMCID: PMC2651243  PMID: 19218446

Abstract

The complexity of proteins is substantially simplified by regarding them as archetypical examples of self-organized criticality (SOC). To test this idea and elaborate on it, this article applies the Moret–Zebende (MZ) SOC hydrophobicity scale to the large-scale scaffold repeat protein of the HEAT superfamily, PR65/A. Hydrophobic plasticity is defined and used to identify docking platforms and hinges from repeat sequences alone. The difference between the MZ scale and conventional hydrophobicity scales reflects long-range conformational forces that are central to protein functionality.

Keywords: hydrophobicity, repeat, scaling


There are many scales (hydrophobicity, charge, helix-forming propensity, etc.) that bioinformatically describe various aspects of amino acid interactions in proteins and their relations to sequence, structure, stability, and functionality, all justified by combining plausible constructs with parameters adjusted through statistical searches. Are all such scales equally useful, and are the substantial differences among them attributable to their inevitably heuristic character? Here, I discuss differences in methodology, with special emphasis on holistic vs. reductionist approaches. I suggest that a better understanding of the limitations and potential of scaling analysis of proteins will result from systematic studies of simple cases first, such as repeat proteins and their associates. Scaling analysis may prove to be a valuable tool in designing proteins with desired functions.

Many physical systems exhibit power-law distributions over limited ranges (hence the popularity of log–log graph paper), and power-law distributions are the characteristic feature of the modern theory of phase transitions near a critical point. Self-organized criticality (SOC) is a methodology that attempts to explain why so many complex systems exhibit power-law distributions and appear to be “accidentally” located near critical points. It is argued that the critical points are dynamical fixed points (“tipping points”) toward which the system evolves without tuning of external parameters (1). The critical points are extrema in some property (or properties) with respect to which the system has been optimized, especially with respect to long-range, highly cooperative interactions, such as conformational changes.

Given the widespread occurrence of power laws, SOC has great intuitive appeal: it has achieved enduring popularity among theorists (>2,500 papers discussing SOC, >25 current books from a single publisher), notably in modeling the critical stability of sand piles against avalanches, but its concrete applications have been limited largely to seismic phenomena. A cautionary remark: power laws, especially in the context of dissipative reactions, have often been interpreted in terms of SOC, but a simpler interpretation involves only slow sweeping of a control parameter toward a global extremum, as could occur in commercial product optimization (2). More recently, however, Boolchand and coworkers (3) discovered a reversibility window associated with an intermediate phase (IP) in the phase diagrams of both chalcogenide and oxide network glasses; near-reversibility strongly suggests that the concepts of equilibrium statistical mechanics apply. Such networks resemble proteins not only geometrically but also in their exponential (Vogel–Fulcher viscosity) slowing near the glass transition, which mimics the slowness of protein conformational changes, regarded as microscopic phase transitions. The reversibility window is bounded by edges that become more abrupt (4) in the presence of nonbridging oxygen (analogous to H bonds in proteins). In the reversibility window, the internal stress and aging of the networks also nearly vanish (3), further heightening the similarity to unstressed (nearly ideally compacted) proteins.

Are these abstract SOC–IP models relevant to folded proteins, or should they simply be regarded as yet another inexplicably successful fractal approach to complexity? It has been argued that SOC is a misnomer in the context of life-origin models (5), but bioinformatic studies of graphs of protein domain structures have shown scale-free (and stress-free) properties consistent with SOC (6). These issues have been brought into much sharper focus by the Moret–Zebende (MZ) discovery that solvent-accessible surface areas (SASA) of protein fragments containing (2N + 1) amino acids contract according to power laws in (2N + 1) (1 ≤ N ≤ 17), with different (centered, amino acid-specific) exponents (7). These exponents, derived from bioinformatic scans of 5,526 high-resolution protein fragments, show that each amino acid induces long-range changes in local backbone curvatures, which are smaller (larger) for hydrophilic (hydrophobic) residues. In general, the fluctuations associated with SOC extrema are expected to involve long wavelengths, such as large-scale conformational changes in microscopic protein phase transitions.
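As a minimal sketch of how such an exponent can be extracted, assume that the SASA of a fragment of length L = 2N + 1 centered on a given amino acid follows A·L^(−b); the slope of an ordinary least-squares fit of log SASA against log L then gives b. The data below are synthetic, noise-free numbers, not MZ data, and the function name `fit_power_law` is hypothetical. How b maps onto the sign convention of the tabulated MZ values −γ(aa) is not specified here.

```python
import math


def fit_power_law(lengths, areas):
    """Fit areas = A * lengths**(-b) by least squares in log-log space.

    Returns (b, A). Assumes all lengths and areas are positive.
    """
    xs = [math.log(x) for x in lengths]
    ys = [math.log(y) for y in areas]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    prefactor = math.exp(mean_y - slope * mean_x)
    return -slope, prefactor


# Synthetic example: fragment lengths 2N + 1 for 1 <= N <= 17 (the range quoted for MZ).
lengths = [2 * n + 1 for n in range(1, 18)]
areas = [150.0 * length ** -0.15 for length in lengths]
b, A = fit_power_law(lengths, areas)
print(round(b, 3), round(A, 1))  # prints 0.15 150.0
```

With real data the fit would be restricted to the range over which the log–log plot is actually linear, since power laws hold only over limited ranges.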

The hierarchical ordering of these dimensionless holistic exponents (not energies) is qualitatively similar (7) to that of reductionist hydrophobicity scales based on ordering the transference energies of isolated amino acids (an MZ fragment with N = 0) from water to organic solvents (8, 9), which involve the shortest wavelengths. Reductionist isolation methods have often been criticized on logical grounds, but in practice replacing them with reliable and transferable holistic results is difficult. Thus, the entirely unexpected discovery of the MZ self-similar (power-law) hydrophobicity scale based on SASA is of great interest not only for applications but also for scaling methods in general and for connections between SOC and evolution. A deeper discussion of SASA and of the geometrical aspects of the highly original MZ scale is given elsewhere (10).

Previously, it has been noted that weak, large-scale interactions (like water–protein interactions) are generally more important than strong (intraprotein) ones in determining properties and functions of large, complex systems; the classic example is the Hueckel model of (aromatic) polycyclic planar hydrocarbons (benzene, anthracene, naphthalene, etc.), where weak out-of-plane π polarizability interactions dominate strong in-plane σ interactions, even feeding back to determine variations of in-plane σ bond lengths (10). In general, however, one cannot find classical force field (CFF) screening parameters to describe the effects of weak water interactions on strong bare amino acid interactions. Even in a pair lattice model with only 2 kinds of interactions and 2 kinds of amino acids, the 2 adjustable parameters are fixed by the diagonal interactions, leaving no flexibility to treat the off-diagonal interactions (11, 12), a situation that rapidly deteriorates with 20 amino acids interacting in virial clusters of 3 or more residues. Power laws and SOC (7, 10) close this apparently insoluble (open-ended or divergent) series-expansion problem by treating conformational changes as microscopic phase transitions.
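The counting behind this argument can be made explicit. With n residue types, the number of distinct symmetric k-body interaction parameters equals the number of unordered multisets of size k, C(n + k − 1, k). The helper below is a hypothetical illustration of that arithmetic only, not part of any cited model.

```python
from math import comb


def interaction_parameter_count(n_types, cluster_size):
    """Distinct symmetric k-body parameters for n residue types: C(n + k - 1, k)."""
    return comb(n_types + cluster_size - 1, cluster_size)


for k in (2, 3, 4):
    print(k, interaction_parameter_count(2, k), interaction_parameter_count(20, k))
# 2 3 210
# 3 4 1540
# 4 5 8855
```

With 2 residue types there are 3 pair parameters (2 diagonal, 1 off-diagonal), so 2 adjustable parameters fixed by the diagonal terms leave the off-diagonal term undetermined; with 20 types there are already 210 pair parameters, and triplet clusters add 1540 more.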

Given that SOC is a plausible model for relating protein sequences to some combination of structure, stability, and functionality (10), and that MZ have discovered a long-range hydrophobicity scale compatible with SOC, what should we look for? We should look first for proteins with large SASA, where interactions with water are critical to functionality, such as nonglobular superhelical repeat proteins (10, 13). We should not look first at transition states associated with unfolding, leading to dysfunctional aggregation and amyloid formation; the latter depends primarily on at least 3 factors: hydrophobicity, charge, and helix-forming propensity (14). Nor should we search bioinformatically for a “universal” model that ignores functionality entirely, because each protein family with similar functions in all likelihood shares a specific SOC that differs from those of other families with different functionalities. As for methods, the reader should not look for additional bioinformatic statistics; these are already contained in the MZ hydrophobicity scale, based on 5,526 ultra-high resolution helical fragments (7). In effect, this (already bioinformatic) scale enables us to analyze long-range water–protein interactions holistically in a multiply curved (or winding) amino acid sequence space, without the limitations imposed by reductionist deconstructions based on individual water–organic solvent transference.

An Instructive Precedent: An Absolute Chemical Scale

By restricting our attention to the functionalities of protein families dominated by interactions with water, often (but not necessarily) sharing similar secondary structures, we maximize the knowledge to be gained from scaling. There is ample precedent for this restrictive approach in semiconductor science, a very limited context, yet technologically an incomparably successful field with a database second only to proteins. There, a clever combination of deductive and inductive methods produced extremely accurate results 40 years ago, which have not been replicated since by deductive or inductive methods alone (10), not even with the largest, fastest, and most modern supercomputers. The discussion was further restricted to the context of all 70 simple binary octet compounds A^N B^(8−N), where there are 2 broad classes to be considered (10). In 6-fold coordinated salts with ionic bonding (NaCl, CaO), increasing binding energy increases the average energy gap and reduces polarizability, but in 4-fold coordinated semiconductors with covalent bonding (Si, GaAs), increasing binding energy reduces the average energy gap and increases polarizability (because of quantum interference of atomic orbitals in the bonding overlap region). By exploiting the deductive f-sum rule, the dielectric theory of ionicity derives a chemical configuration coordinate that converts the thermodynamically first-order phase transition into one that is second order in this coordinate. This chemical configuration coordinate is the dimensionless fractional ionicity: it is 100% successful in predicting the chemical phase transition involved in the structural separation of these compounds, using as inductive input only the valence electron density and the electronic polarizability.
The theory also explains chemical trends in many other properties of these (technologically extremely successful, hence well-studied) materials, including the second-order collapse of shear resistance (called enhanced conformational flexibility by protein scientists) in the AgX salts (X = Cl, Br, I) as the ionicity increases to its critical value. It shows (10) that these chemical trends (previously known qualitatively) can be predicted with high accuracy (errors usually <10%, for example in heats of formation and elastic constants) by using a configuration coordinate that is accurate to better than 1%.
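For readers unfamiliar with the dielectric theory, its configuration coordinate can be written down compactly. In that theory the average gap is split as E_g² = E_h² + C², where E_h is the homopolar (covalent) gap and C the heteropolar (ionic) gap, and the fractional ionicity is f_i = C²/E_g²; compounds with f_i below the critical value 0.785 are 4-fold coordinated and those above it are 6-fold coordinated. The sketch below encodes only this classification rule; the function names are hypothetical and the gap values in the usage lines are made-up inputs, not tabulated data for real compounds.

```python
# Critical fractional ionicity separating 4-fold (covalent) from 6-fold (ionic) structures.
CRITICAL_IONICITY = 0.785


def fractional_ionicity(e_h, c):
    """f_i = C^2 / (E_h^2 + C^2); homopolar gap E_h and ionic gap C in the same units."""
    return c * c / (e_h * e_h + c * c)


def predicted_coordination(e_h, c):
    """Return 4 below the critical ionicity and 6 above it."""
    return 4 if fractional_ionicity(e_h, c) < CRITICAL_IONICITY else 6


# Hypothetical gaps (eV), chosen only to land on either side of the boundary.
print(fractional_ionicity(3.0, 4.0), predicted_coordination(3.0, 4.0))
print(fractional_ionicity(1.0, 3.0), predicted_coordination(1.0, 3.0))
```

The point of the construction is that a single dimensionless number, rather than a difference of two large computed energies, decides the structure.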

The essentially complete success of the dielectric theory of ionicity rests on several subtle points. Had the energies of the ionic salt and covalent semiconductor structures been calculated separately by conventional reductionist methods, the large cancellations in most of the energy differences would have amplified the effects of small errors in either first-principles quantum calculations (even though the separate energies are nearly 100% accurate) or (much worse) parameterized CFF models, leading to ≈70% accuracy at best in predicting chemical trends in properties. [These reductionist cancellation problems were already evident in CFF (Born–Mayer) models of binary salts in the 1930s, which is why they were abandoned in favor of quantum theories (10).] The use of the electronic polarizability to construct covalent and ionic energy gaps turned out to be especially felicitous, because the polarizability is measured at optical wavelengths and holistically weights long-range weak interactions more heavily than short-range nearest-neighbor strong ones. In proteins, energy cancellations are especially challenging, because protein folding energies typically average <1 K/(degree of freedom), whereas errors in CFF energies are typically of order 100 K/(degree of freedom).
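The cancellation problem is simple arithmetic. If two large energies E1 and E2 each carry a relative error ε, the worst-case absolute error in their difference is ε(|E1| + |E2|), so the relative error of ΔE = E1 − E2 grows as the difference shrinks. The function below is a hypothetical illustration of that bound with arbitrary numbers.

```python
def worst_case_relative_error(e1, e2, rel_err):
    """Worst-case relative error of e1 - e2 when each energy has relative error rel_err."""
    absolute_error = rel_err * (abs(e1) + abs(e2))
    return absolute_error / abs(e1 - e2)


# Two energies of about 100 units that differ by 1 unit, each known to 1%:
print(worst_case_relative_error(101.0, 100.0, 0.01))  # about 2.01, i.e. ~200% error in the difference
```

The same ratio logic applied to the per-degree-of-freedom figures quoted above (folding energies <1 K, CFF errors of order 100 K) gives errors that exceed the signal by roughly 2 orders of magnitude.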

Today, the successes of the dielectric scale of ionicity are so extensive that it is regarded as the Kelvin scale (analogous to his absolute temperature scale) of quantum chemical calculations, which presently are able to reproduce its results with 90% accuracy for the 30 easiest A^N B^(8−N) compounds (10). Here, I argue that a properly constructed (holistic) hydrophobicity scale (restricted to protein contexts) also exists that is dimensionless and can be expected to be almost equally successful on the much larger protein–water canvas, where the need for it is correspondingly greater. Of course, the success of such a scale must be confirmed in practice, but it would be wrong to attempt to do so on a large number of proteins with widely different structures and functionalities: a case-by-case approach is necessary to establish the contexts for which this scale is optimal and to recognize distinctive peak-to-valley hydrophobicity patterns associated with specific functionalities.

Hydrophobic Order Parameters

When the secondary structure of helices, strands, and loops is known, it is convenient to introduce averages over these structural features; these averages are analogous to order parameters defined by averages over sublattices in mixed magnetic crystals. Given the MZ alphanumeric table for hydrophobicity, −γ(aa), one can define a suitable average hydrophobicity Ψ(M, S) = 〈−γ(aa)〉_M, where the average is taken over M consecutive residues, over part of a secondary element, or over an entire secondary element (such as a helix S). This average is similar to the one used by Kyte and Doolittle for membrane proteins (8). A second, intersecondary (especially interhelical) quantity Φ is a measure of hydrophobic plasticity or flexibility:

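$$\Phi(S) = \frac{1}{M}\sum_{R}\left[\gamma_{aa}(R_{S+1}) - \gamma_{aa}(R_{S})\right]^{2}$$

(R_S and R_{S+1} are the amino acids at the same aligned site of the helices of repeats S and S + 1.)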

Here, R denotes an amino acid and γaa(R) is its hydrophobicity. For applications to helical repeats, consecutive repeats are aligned in the standard matrix tableau dictated by the consensus set; the sum is over matched helices of maximum length M, so that both Ψ and Φ are normalized. Note that consensus sites usually contribute 0 to Φ, whereas alignment of other sites may either increase or decrease Φ; several tests have shown that Φ is minimized by standard consensus alignments in cases where the structures of repeat proteins are known. (Conversely, Φ can also be used to align sequences by minimizing the mismatch that Φ measures, thus overcoming some of the ambiguities associated with residue insertions or deletions.) Φ could potentially be useful even for adjacent, nearly parallel elements that are not repeats and share no consensus sites, such as helices or merely H-bonded β strands. A general remark: overall there is a tendency toward hydroneutrality, but individual proteins often separate into distinct hydrophobic (inside) and hydrophilic (outside) regions, not merely into volume and surface, and these regions are easily identified with accurate values of Ψ.
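A minimal sketch of the two order parameters follows. It assumes one-letter sequences, a dictionary mapping each residue to its −γ(aa) value, and a Φ of the form of a mean squared hydrophobicity mismatch between aligned sites of consecutive repeats, normalized by the number M of aligned positions (consistent with consensus sites contributing 0). `TOY_SCALE` holds placeholder numbers, not the MZ table, and the function names are hypothetical.

```python
# Placeholder hydrophobicities (NOT the MZ values); replace with the published -gamma(aa) table.
TOY_SCALE = {"A": 0.15, "D": 0.10, "G": 0.14, "K": 0.11, "L": 0.18, "V": 0.17}


def psi(segment, scale):
    """Psi: average hydrophobicity over a segment (M residues or a whole helix)."""
    return sum(scale[aa] for aa in segment) / len(segment)


def phi(helix, next_helix, scale):
    """Phi: mean squared hydrophobicity mismatch between aligned helices of repeats S and S + 1.

    Gaps from insertions or deletions are not handled; both helices must already be aligned.
    """
    if len(helix) != len(next_helix):
        raise ValueError("helices must be aligned to the same length M")
    m = len(helix)
    return sum((scale[a] - scale[b]) ** 2 for a, b in zip(helix, next_helix)) / m


print(psi("LLD", TOY_SCALE))
print(phi("LAVD", "LAVD", TOY_SCALE))  # identical residues at every site contribute 0
print(phi("LAVD", "LADD", TOY_SCALE))
```

Because Φ is a sum of squared mismatches, it can also serve as an alignment score: among candidate alignments of two repeats, the one with the smallest Φ minimizes the mismatch.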

A remark about 〈Ψ3〉, the average Ψ over a centered window of 3 consecutive residues, is in order here. At first sight this average appears to involve nothing more than rectangular smoothing to reduce noise, but it has a much deeper justification. First, as regards self-similarity, the MZ exponents begin with N = 1 in 2N + 1 residue groups, so 〈Ψ3〉 is a natural choice. Moreover, the i, j = i ± 4 (3) α-helical (β-strand) periodicity is such that 〈Ψ3〉(i) never overlaps 〈Ψ3〉(j); thus the moving average contains hidden spiral information. It is probably for these 2 reasons, together with the general tendency toward hydroneutrality, that these applications have consistently shown interesting 〈Ψ3〉 peak-to-valley patterns. I have examined other patterns (based on wider smoothing, such as 〈Ψ5〉) and found that they merely reduce the high resolution of the MZ scale.
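The non-overlap argument can be checked directly. Assuming 〈Ψ3〉(i) averages residues i − 1, i, and i + 1 (a centered window, the 2N + 1 = 3 case), its window shares no residue with the window at j = i ± 3 or i ± 4, whereas 5-residue windows at i and i + 4 share residue i + 2. The helpers below are hypothetical and only illustrate that index arithmetic and the centered moving average itself.

```python
def centered_window(i, width):
    """Residue indices covered by a centered window of odd width 2N + 1 around residue i."""
    if width % 2 == 0:
        raise ValueError("width must be odd (2N + 1)")
    half = width // 2
    return set(range(i - half, i + half + 1))


def windows_overlap(i, j, width):
    """True if the centered windows at residues i and j share at least one residue."""
    return bool(centered_window(i, width) & centered_window(j, width))


def moving_average(values, width=3):
    """Centered moving average keyed by center index; ends without a full window are skipped."""
    if width % 2 == 0:
        raise ValueError("width must be odd (2N + 1)")
    half = width // 2
    return {i: sum(values[i - half:i + half + 1]) / width
            for i in range(half, len(values) - half)}


print(windows_overlap(10, 14, 3), windows_overlap(10, 13, 3))  # False False
print(windows_overlap(10, 14, 5))  # True
print(moving_average([0.12, 0.15, 0.18, 0.11]))
```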

Repeat Proteins

There are many families of proteins that one could choose to study, but the HEAT and ARM repeat proteins have many attractive features that are especially suitable for studying long-range water-mediated interactions. They have large SASA that are separable into patches with distinct functionalities that are easily recognized with the phantom order parameters, much like liquid crystal domains. This becomes clearer if we compare these spiral “open” repeats with more compact and rigid β solenoidal repeats (13).

HEAT and ARM repeats may use their large SASA to perform multiple functions (15), a task whose importance increases from archaea (<1 repeat %) to metazoa (>5 repeat %). These repeats exhibit functionality both mechanically and hydrophobically, in terms of the linear Ψ and quadratic Φ order parameters defined above. The simplest mechanical aspects involve the formation of subhelices in each repeat. HEAT repeats (≈39 residues) typically have 9 amino acid-specific consensus sites, of which leucine occupies 5; ARM repeats typically have 8 amino acid-specific consensus sites, of which leucine occupies 4. The predominance of leucine at consensus sites in repeats can be explained in terms of water-mediated interrepeat bridging switches, which exploit the binary (on-off) character of the CH2 terminal orientation relative to the helical axes (10). This explanation requires that the protein network be critically organized, so that the individual switches themselves function critically (and supereffectively). Note that attempts to explain amino acid distributions proteomically in terms of codon numbers have achieved only limited success, and these do not extend to leucine predominance (16), including its spectacular maximal occurrence (9.1%) in all proteins, which is immediately explained by its 2-fold CH2 terminal symmetry, unique among the 20 protein amino acids (10).

Unlike leucine zippers and leucine-rich repeats (17), the helices of HEAT and ARM repeats split into subhelices, labeled A and B for HEAT repeats. In the ARM repeats the A helices split further into H1,2 helices, and the B helices are relabeled H3 helices. The A and B helices form an L-shape (like chopsticks), separated by a short (usually 3 residues) β-turn (18). The A arms are more hydrophilic and function as stabilizing rudders that support the B arms, which function as protein docks or scaffolds. The differences in functionality of HEAT and ARM repeats arise mainly from interrepeat variations, and these appear differently in the spatial and hydrostructures. Thus, these 2 families of repeats provide a severe test of whether it is the observed “real” spatial structure, or the “phantom” hydrostructure, or both, that determines functionality.

Scaffolding HEAT Protein

As the first application, I consider the structure of PR65/A, a 588-residue subunit of protein phosphatase 2A, which contains 15 tandem repeats of ≈39 residues each (19). PR65/A is unusual because the A and B arms are symmetrical (17 aa each). This rigid and symmetrical structure is ideal for a scaffold supporting other subunits. There are bends at the centers of both A and B arms in isolated PR65/A that increase the volumes of their hydrophobic cores. Weakening or softening of the helical structure is most pronounced in the off-center repeats 8–10, where 3–4 amino acids are unwound. All repeats are nearly aligned, with the exception of the insertion of 4 hydrophilic amino acids between repeats 8 and 9. The repeats form 3 blocks, with aberrant nonparallel packing between repeats 3 and 4, and 12 and 13, apparently caused by divergences from the consensus sequence; these divergences are described in terms of individual amino acid salt bridges, wedges, and other variations. In addition to leucines, a signature-motif sequence of the HEAT repeats of the PR65/A subunit and other HEAT motif proteins is the presence of conserved Asp and Arg (or occasionally Lys) residues; these are the 3 most hydrophilic amino acids on the long-range MZ scale.

The Ψ(A,B) and Φ(A,B) patterns of the A and B helical arms of PR65/A, shown in Fig. 1, are strikingly different and reflect different functionalities. Generally speaking, hydrophilic (hydrophobic) interactions with water soften (harden) helices; thus the left (N) half of PR65/A (Fig. 1A) exhibits hard comb-like Ψ(B) docking arms and relatively soft Ψ(A) “rudder” arms, with a hydrohinge (both arms hydrophilically softened below the hydroneutral value of 0.155) at the central repeat S = 8. The A arms are associated with the exposed convex outer surface, whereas the B arms belong to the concave inner surface, which functions as the scaffolding support for catalytic and regulatory domains in the tumor suppressor protein phosphatase PP2A (18, 20, 21).
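Read this way, a hydrohinge is a repeat at which both arm averages fall below the hydroneutral value. The sketch below, assuming per-repeat lists of Ψ(A) and Ψ(B), flags such repeats; the function name is hypothetical, and the profile values in the example are invented for illustration, not the PR65/A data of Fig. 1.

```python
HYDRONEUTRAL = 0.155  # hydroneutral value of Psi quoted in the text


def hydrohinges(psi_a, psi_b, threshold=HYDRONEUTRAL):
    """Repeat labels S (1-based) where both the A and the B arm lie below the threshold."""
    return [s for s, (a, b) in enumerate(zip(psi_a, psi_b), start=1)
            if a < threshold and b < threshold]


# Invented profiles for 9 repeats: A arms mostly hydrophilic, B arms mostly hydrophobic.
psi_a = [0.150, 0.148, 0.152, 0.149, 0.151, 0.150, 0.153, 0.146, 0.151]
psi_b = [0.162, 0.165, 0.160, 0.163, 0.166, 0.161, 0.164, 0.150, 0.162]
print(hydrohinges(psi_a, psi_b))  # [8]
```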

Fig. 1.


Hydrophobicity Ψ (A) and hydroplasticity Φ (B) repeat diagrams for the scaffold protein PR65/A (in color online). A helical arms are shown as blue diamonds, and B helical arms as red squares; their average is shown as green triangles. The A arms are more hydrophilic and function as rudders. Note that Φ(B) displays 3 hydroplastically soft docking peaks. The peak at repeats 2–7 binds the regulatory subunit B, the catalytic subunit C is bound to repeats 11–15, and the central hydroplastically soft Φ peak (repeats 8–9) is associated with the interfacial space between these 2 subunits (see Fig. 2). The abscissa S labels helical repeats. All of the Ψ ordinates have been multiplied by 10^3, and all of the Φ ordinates by 10^6.

The regulatory subunit B56γ1 (itself a 16-repeat protein) attaches (19) to the B arms of the 6 repeats 2–7. The catalytic subunit attaches to the B arms of repeats 11–15, which are associated with large oscillations in Ψ. Thus, the docking structure of the scaffold–regulatory–catalytic heterotrimer, shown in Fig. 2 (again from ref. 20, for the reader's convenience), partially reflects the interrepeat spatial structure of the scaffold alone (the weakening of the helical structure in repeats 8–10 corresponds to the interface or hydrohinge between the regulatory and catalytic units). At the same time, however, the aberrant nonparallel packing between repeats 3 and 4, and between 12 and 13, apparently caused by divergences from the consensus sequence, probably reflects only the marginal stability of isolated PR65/A. These “breaks” received detailed attention in discussions of the isolated structure, including many amino acid-specific packing interactions (18, 19), but they appear to be irrelevant to the heterotrimer interfaces, which, of course, are tailored by functionality and stabilized in the complex. The break at repeat 12 disappears in the heterotrimeric structure, in which the overall shape of PR65/A changes from a twisted hook to a more regular horseshoe (20).

Fig. 2.


Diagram of the overall structure of the heterotrimeric PP2A holoenzyme, from ref. 20. The scaffold subunit A (importin β), the regulatory subunit B, and the catalytic subunit C are in blue, green, and orange, respectively. The details of the matching between repeats A2–A7 and the B subunit, and A11–A15 and the C subunit, are shown in figures 3 and 2, respectively, of ref. 20. [Reproduced with permission from ref. 20 (Copyright 2007, Macmillan Publishers: Nature).]

Next, I turn to the phantom (A, B) hydroplasticity Φ(A, B) patterns (Fig. 1B), which are quite different from the spatial packing patterns. Apart from softness near the N end, there is little structure in the outward-facing hydrophilic A arms, but the inward-facing hydrophobic B arms show distinct fragility Φ(B) peaks, at S = 13, 4, and 12 (in that order). One of the helical consensus sites in the B arm is position 24, which is occupied by Val in 11 of the 15 repeats. This hydrophobic amino acid is missing from repeats 1 (N end), 4, 12, and 13, an essentially perfect correlation with the fragility peaks at 13, 4, and 12. The marginal fragility of the B arms at S = 8 and 9 is presumably correlated with the hydrohinge seen in Ψ in Fig. 1A. Overall, it is clear that the phantom Ψ(A, B) and Φ(A, B) long-range adaptive patterns of the helical arms of PR65/A, shown in Fig. 1, provide a better interpretation of the locations and functions of the heterotrimer protein interfaces (20) than the interrepeat spatial patterns (18, 19) of isolated PR65/A. The hydroflexibility of the helical arms enables the intrarepeat loops, which are in close proximity to the regulatory B and catalytic C subunits (20), to adopt optimal conformations. This is also apparent from the narrowing of the A horseshoe in the trimeric complex compared with the free A subunit.
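The comparison made here, between the repeats with the largest Φ(B) and the repeats that lack Val at consensus site 24, amounts to comparing two sets of repeat labels. In the hypothetical sketch below, "high" means Φ above a user-chosen threshold (an assumption; the text identifies the peaks by inspection of Fig. 1B), and the input lists are toy values, not the PR65/A sequence.

```python
def high_phi_repeats(phi_values, threshold):
    """1-based repeat labels with Phi above threshold, ranked from largest to smallest Phi."""
    hits = [(value, s) for s, value in enumerate(phi_values, start=1) if value > threshold]
    return [s for _, s in sorted(hits, reverse=True)]


def repeats_lacking(residues_at_site, expected):
    """1-based repeat labels whose residue at one consensus site differs from `expected`."""
    return [s for s, aa in enumerate(residues_at_site, start=1) if aa != expected]


# Toy data for 7 repeats (Phi in units of 1e-6).
phi_b = [5.0, 1.0, 2.0, 9.0, 1.0, 7.0, 8.0]
site_24 = ["A", "V", "V", "I", "V", "T", "S"]
peaks = high_phi_repeats(phi_b, threshold=4.0)
missing = repeats_lacking(site_24, "V")
print(peaks, missing, set(peaks) == set(missing))  # [4, 7, 6, 1] [1, 4, 6, 7] True
```

A threshold rather than a strict local-maximum test is used so that adjacent soft repeats (such as 12 and 13 in the text) can both count as peaks.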

It should be stressed that the essentially perfect matching of the large-scale hydroplasticity patterns in Fig. 1B with the heterotrimer structure (20) can scarcely be accidental, and that this matching is not easily recognized by using conventional methods based on CFF of any kind (10). This success implies that the large-scale structure of the heterotrimer interface is determined both implicitly by the self-organized evolutionary factors that designed the 3 proteins involved and explicitly by long-range protein–water interactions that are accurately described by the MZ scale. By looking closely at the short-range scaffold contacts of intrarepeat turns to the regulatory B and catalytic C subunits shown in figures 3 and 2 of ref. 20, one can see that they can be described as an alternating mixture of rigid hydrophilic salt interactions (anionic Glu and Asp with cationic Arg and Lys) with hydrophobic spacers; the latter could make the large-scale contacts more flexible and adaptive.

The 3-aa turns at the vertex of the groove formed by the intrarepeat turns connecting the A and B arms of PR65/A are solvent-exposed, and it is useful to note that they form 2 distinct hydrophilic groups. Thus, 10 of the turns are strongly hydrophilic, 〈Ψ3〉 ≈0.10, whereas 5 are close to hydroneutral (〈Ψ3〉 ≈0.14, repeats 3, 4, 8, 10, and 14; note that 4 of 5 of these turns are located near 1 of the 3 ΦB peaks). The latter may promote stability of the marginally stable scaffolding structure in the unbound state. It appears that these vertex turns determine a characteristic length in repeat proteins that makes 〈Ψ3〉 a valuable configuration coordinate for repeats composed of L-shaped building blocks.

Four oncogenic site mutations associated with lung and colon tumors have been discussed in terms of the static structure (19), but here one can propose an alternative interpretation. Two of the mutations occur in helical sites, and these 2 (P65 → S and L101 → P) involve substantial destabilizing decreases in hydrophobicity by reducing helical rigidity. The other 2 occur in the short turns, D504 → G (repeat 13) and V545 → A (repeat 14). The former increases 〈Ψ3〉(504), thus hydrophobically strengthening the typically hydrophilic repeat 13 turn, whereas the latter reduces 〈Ψ3〉(545) and hydrophilically weakens the atypically nearly hydroneutral repeat 14 turn; both mutations regress 〈Ψ3〉 for these turns toward the mean value (0.118) for all 15 turns. All of these small changes could combine to favor more rapid production of the oncogenic unbound protein, while disrupting its functionality, by averaging barrier heights. Although the individual mutagenic changes are small, acting cooperatively they holistically (collectively) smooth water–protein interactions at both the helices and the solvent-exposed turns and could easily accelerate oncogenic production; this cooperative phantom mechanism is spatially invisible and is also inaccessible to conventional CFF simulations.
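The size of such a turn-site effect follows from the window arithmetic: replacing one residue inside a 3-residue window changes 〈Ψ3〉 by one third of the difference between the new and old hydrophobicities. The sketch below assumes 0-based positions, a centered window that lies fully inside the sequence, and a placeholder scale; the toy values only respect the orderings implied here (G above D, A below V) and are not MZ numbers. Function names are hypothetical.

```python
# Placeholder hydrophobicities (NOT the MZ table); ordering chosen so that G > D and V > A.
TOY_SCALE = {"A": 0.15, "D": 0.10, "G": 0.14, "K": 0.11, "L": 0.18, "V": 0.17}


def window_mean(seq, center, scale, width=3):
    """<Psi_width> at a 0-based center position (centered window)."""
    half = width // 2
    if center - half < 0 or center + half >= len(seq):
        raise ValueError("window must lie inside the sequence")
    window = seq[center - half:center + half + 1]
    return sum(scale[aa] for aa in window) / width


def point_mutation_shift(seq, center, new_aa, scale):
    """Change in <Psi3> at `center` after substituting `new_aa` at that position."""
    mutated = seq[:center] + new_aa + seq[center + 1:]
    return window_mean(mutated, center, scale) - window_mean(seq, center, scale)


print(point_mutation_shift("KDL", 1, "G", TOY_SCALE))  # positive: D -> G raises <Psi3>
print(point_mutation_shift("KVL", 1, "A", TOY_SCALE))  # negative: V -> A lowers <Psi3>
```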

In another article (33) we will see that large-scale binding of transport repeat proteins is driven both hydrophobically and by charge, but note here that the (K + R − D − E) charges on the A and B arms of PR65/A are small [near ±1 for (B, A) helical arms] and there are no long basic loops: large-scale charge effects are weak for proteins binding to this scaffold. Another repeat with mechanically driven binding interactions and the characteristic L shape with A and B helical arms is the ankyrin D34 section. Ankyrins are superhelical springs or multifunctional adaptors that link specific proteins to the membrane-associated spectrin–actin cytoskeleton (22); in D34 the A and B arm charges are small and show almost no correlation. Hydrophobic analysis of D34 has been given elsewhere (10, 23). Solenoidal repeats also function as scaffolds and support a wide variety of functions through gradually more divergent sequences (13, 24). The spatial curvature created by the superhelical nature of these proteins combines with hydroflexibility to predetermine the target proteins that can bind to them. This appears to be a profitable area for further sequential hydroanalysis with both linear Ψ and quadratic Φ profiles.
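The charge bookkeeping used here, (K + R − D − E) per helical arm, is a simple count. The helper below is a hypothetical illustration; like the formula, it counts only these 4 residues, so His, chain termini, and pH effects are ignored.

```python
def arm_charge(segment):
    """Net charge estimate K + R - D - E for a one-letter sequence segment."""
    positive = segment.count("K") + segment.count("R")
    negative = segment.count("D") + segment.count("E")
    return positive - negative


print(arm_charge("LKRDEEL"))  # -1
```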

Discussion

There is growing interest in analogies between protein dynamics and the phase diagrams of molecular and network glasses (25). The latter have a long history, stretching back to the recognition in the 1920s of the exponential growth of viscosity in supercooled liquids as the glass transition is approached (now called “funnels” in biophysical energy landscapes; energy landscapes were already introduced into glass science in the 1960s). There are many general treatments of protein docking (26), but even the apparently simple questions of whether hydrophobic interactions are important, and whether they should be treated as short- or long-range, remain unresolved. Contact interactions in enzymes are often dominated by hydrophilic residues, whereas sugar sites are overall strongly hydrophobic, facilitating water displacement in the early stage of docking (27, 28). The present approach, based on SOC and the MZ discovery (7) of amino acid-specific power-law scaling of SASA, is exact in the long-range limit. Before MZ, the only connection to criticality was through the reversibility window of network glasses. Exponents for the 2 phase transitions at the edges of this window have been observed (3), but this connection to power laws is too abstract to be of practical value for proteins. More recently, an elegant example of an aggregated (large-scale) protein stress phase transition has been obtained by studying hydrogel lysozyme solutions, where long-range hydrophobic forces apparently modulate the formation of β-sheet fibrils (29). A break in the slope of an elastic plateau modulus as a function of lysozyme concentration is seen on a log–log plot, suggesting either a connectivity transition or an α/β order–disorder transition.

The functionality of proteins is a challenging problem because it is open-ended; it involves protein–protein interactions on ever-widening scales. These interactions are always mediated by water films, but most of the properties of such films on scales of hundreds of residues are too complex and long-range to be accessible quantitatively to simulations using necessarily inaccurate CFF (10). The MZ discovery that SASA exhibit power-law scaling (7), with exponents hierarchically ordered from hydrophilic to hydrophobic, offers theory great opportunities. Specifically, it appears that evolution has encoded many aspects of long-range functionality in amino acid sequences, and that (as here) this functionality is well separated from the short-range functionality of intraprotein and interprotein spatial contacts. Because protein sequences exhibit the effects of SOC, the strengths of water-mediated long-range interactions can often be inferred from patterns and/or chemical trends in sequential MZ hydrophobicity. For larger proteins (and even some as small as 30 residues), these long-range holistic trends and patterns weaken or disappear when short-range reductionist hydrophobicity scales based on isolated amino acid transference energies are used, which we take as evidence for their origin in SOC (1) and network stress-free reversibility windows (3). Sequential MZ hydrophobicity is a powerful tool that offers a new dimension to homology analysis, because by comparing holistic and reductionist hydroprofiles, we can separate long-range from short-range interactions and mechanical from electrostatic ones.

Here, I have discussed PR65/A, a scaffold repeat protein, as a simple example that generates easily recognized long-range functional peak-and-valley patterns. [The easiest way to recognize these patterns is to study large-scale features that facilitate docking (for example) in members of homologous families.] Both PR65/A and ankyrin D34 exhibit large-scale hydroplasticity effects that are easily recognized with the MZ scale. Because of their large-scale structural simplicity, repeat proteins are probably the easiest large proteins to design. Plueckthun and collaborators (30) have constructed libraries of such proteins, stabilized primarily by a consensus design strategy for hydrophobic cores, which is similar to optimizing the interrepeat interactions represented here by Φ. The designed proteins contained up to 8 internal repeats plus 2 capping repeats (≈400 residues). They found that designed repeat proteins could exhibit high thermodynamic stability, monomeric states, and high levels of soluble expression in Escherichia coli. Mutants from this library can also exhibit high binding specificities, and it is hoped that they can be further refined by pair mutations to function as antibodies based on specific interactions with peptide side chains. Thus, this interesting program for designing protein scaffolds has a strongly reductionist spatial aspect, based on analysis of individual side-chain interactions.

The present hydroanalysis suggests an alternative, holistic approach to protein design, based on MZ SOC-IP hydrophobicities and on comparisons of protein families through patterns of phantom hydro extrema (stabilizing hydrophobic maxima and binding hydrophilic minima). Protein design is very challenging, and the program of Plueckthun and collaborators (30) is well advanced. However, whether either approach (or better, both) will succeed in designing antibodies through repeat scaffolds, or other proteins with other functions, remains to be seen.
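
Locating the hydro extrema themselves is the mechanical part of such a comparison; recognizing which of them are functional still requires the judgment discussed under Methods. Below is a minimal sketch, assuming a window-averaged profile in which larger values mean more hydrophobic. It returns strict interior local maxima (candidate stabilizing hydrophobic peaks) and minima (candidate binding hydrophilic valleys). The function name and the `min_step` filter for ignoring small wiggles are hypothetical illustrations.

```python
def hydro_extrema(profile, min_step=0.0):
    """Return (maxima, minima): indices of interior points that exceed (or fall
    below) both neighbors by more than min_step. Flat plateaus are not reported."""
    maxima, minima = [], []
    for i in range(1, len(profile) - 1):
        left, mid, right = profile[i - 1], profile[i], profile[i + 1]
        if mid - max(left, right) > min_step:
            maxima.append(i)          # hydrophobic peak
        elif min(left, right) - mid > min_step:
            minima.append(i)          # hydrophilic valley
    return maxima, minima


# Toy profile (hypothetical values).
print(hydro_extrema([0.1, 0.3, 0.2, 0.05, 0.25, 0.2]))  # → ([1, 4], [3])
```

Raising `min_step` keeps only the more pronounced peaks and valleys, which is one crude way to focus on large-scale patterns rather than residue-level noise.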

Postscript

Because some single mutations are known to be oncogenic, it has often been assumed that reductionist accuracy is sufficient for their analysis. Unbiased whole-genome sequencing of healthy and cancerous tissues from a single patient revealed 10 genes with acquired mutations: 2 were previously described mutations thought to contribute to tumor progression, and 8 were new mutations, present in virtually all tumor cells at presentation and relapse, whose function is not yet known (31). Most of the 8 new mutations are associated with conventional signaling proteins, but because 10 (not merely 2) mutations are involved, holistic accuracy may be necessary for their analysis: many of the individual mutations may have small effects, whereas in combination their effects may accumulate to produce decisively rapid growth (see the discussion above for PR65/A).

Methods

The MZ hydrophobicity scale (7) has been tabulated elsewhere (10, 23), together with several other hydrophobicity scales. The other scales have been rescaled so that their ranges and hydroneutral midpoints match those of the MZ scale, which facilitates comparing their effectiveness with that of the SOC-based MZ scale. Obtaining profiles for sequential hydroanalysis is computationally trivial and can be done with nothing more than EXCEL [more convenient and complete than earlier web-based profiling software (32)]. Pattern recognition requires judgment in the context of known functionalities, so the greatest effort goes into surveying the (abundant) literature for the broadest protein families. In compensation, the simplicity of hydroanalysis facilitates such surveys and pattern recognition, even for very long sequences that may be inaccessible to other methods.
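
The rescaling described above amounts to an affine map, the same arithmetic one would set up in a spreadsheet column. The sketch below assumes that “range” means maximum minus minimum over the residues in the scale and that each scale's hydroneutral midpoint is supplied as an input (how a given midpoint is fixed is not specified here). The function name is hypothetical, and the three-letter toy scale is invented for illustration; it is not the MZ scale or any published scale.

```python
def rescale_to_reference(scale, src_mid, ref_range, ref_mid):
    """Affinely map a per-residue scale so that its range (max - min) equals
    ref_range and its hydroneutral midpoint src_mid lands on ref_mid."""
    values = list(scale.values())
    src_range = max(values) - min(values)
    if src_range == 0:
        raise ValueError("scale has zero range")
    factor = ref_range / src_range
    return {aa: ref_mid + (v - src_mid) * factor for aa, v in scale.items()}


# Hypothetical toy scale and reference parameters.
toy_scale = {"A": 0.0, "B": 2.0, "C": 4.0}
print(rescale_to_reference(toy_scale, src_mid=2.0, ref_range=1.0, ref_mid=0.5))
# → {'A': 0.0, 'B': 0.5, 'C': 1.0}
```

A scale whose sign convention is opposite (hydrophilic positive) would first need to be negated, which this sketch does not do.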

The advantages of the holistic SOC-based scale, by far the most accurate hydrophobicity scale, over previous and only approximate reductionist scales are brought out by comparing the results discussed here and elsewhere (10, 23) with those summarized in ref. 32, which used a “neutral” scale obtained by averaging many different transference scales. At that time, rectangular window averages were generally taken only over helices (as here), and usually over sequences with M no smaller than 7 (〈Ψ₇〉), because shorter averages scattered excessively depending on the organic solvent used to measure the transference energies. For most nonrepeat proteins such large-scale averages are useful for identifying qualitative trends, but they cannot resolve subtle features.
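
A rectangular window average 〈Ψ_M〉 can be sketched as a sliding mean of per-residue hydropathy values over M consecutive residues. The sketch below assumes the scale is given as a dictionary from one-letter residue codes to values and reports one average per complete window, so entry j covers residues j through j + M − 1. The function name is hypothetical, the default M = 7 simply echoes the older practice described above, and the two-letter toy scale is invented; MZ values are not reproduced here.

```python
def window_profile(sequence, scale, m=7):
    """Rectangular-window averages of per-residue hydropathy.
    Entry j is the mean of scale values for residues j .. j + m - 1."""
    if m < 1 or m > len(sequence):
        raise ValueError("window length must be between 1 and len(sequence)")
    values = [scale[aa] for aa in sequence]
    window_sum = sum(values[:m])
    out = [window_sum / m]
    for j in range(1, len(values) - m + 1):
        # Slide the window one residue: add the new right end, drop the old left end.
        window_sum += values[j + m - 1] - values[j - 1]
        out.append(window_sum / m)
    return out


# Hypothetical toy scale: "A" hydrophobic (1.0), "B" hydrophilic (0.0).
print(window_profile("AABBB", {"A": 1.0, "B": 0.0}, m=3))
```

The running sum keeps the cost linear in sequence length, which matters only for very long sequences; for a few hundred residues a direct mean over each window is equally practical.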

Acknowledgments.

This article benefited from many comments by G. F. Zebende and detailed, constructive advice from B. Kobe.

Footnotes

The author declares no conflict of interest.

References

1. Bak P, Tang C, Wiesenfeld K. Self-organized criticality: An explanation of the 1/f noise. Phys Rev Lett. 1987;59:381–384. doi: 10.1103/PhysRevLett.59.381.
2. Sornette D. Sweeping of an instability: An alternative to self-organized criticality to get power laws without parameter tuning. J Phys I [French]. 1994;4:209–221.
3. Chakravarty S, Georgiev DG, Boolchand P, Micoulaut M. Aging, fragility, and the reversibility window in bulk alloy glasses. J Phys Condens Matter. 2005;17:L1–L7. doi: 10.1088/0953-8984/17/1/L01.
4. Rompicharla K, et al. Abrupt boundaries of intermediate phases and space filling in oxide glasses. J Phys Condens Matter. 2008;20:202101. doi: 10.1088/0953-8984/20/20/202101.
5. Abel DL, Trevors JT. Self-organization vs. self-ordering events in life-origin models. Phys Life Rev. 2006;3:211–228.
6. Dokholyan NV. The architecture of the protein domain universe. Gene. 2005;347:199–206. doi: 10.1016/j.gene.2004.12.020.
7. Moret MA, Zebende GF. Amino acid hydrophobicity and accessible surface area. Phys Rev E. 2007;75:011920. doi: 10.1103/PhysRevE.75.011920.
8. Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. J Mol Biol. 1982;157:105–132. doi: 10.1016/0022-2836(82)90515-0.
9. Chandler D. Interfaces and the driving force of hydrophobic assembly. Nature. 2005;437:640–647. doi: 10.1038/nature04162.
10. Phillips JC. Protein sequence, structure, stability, and functionality. 2008. http://xxx.lanl.gov/abs/0802.3641.
11. Salvi G, De Los Rios P. Effective interactions cannot replace solvent effects in a lattice model of proteins. Phys Rev Lett. 2003;91:258102. doi: 10.1103/PhysRevLett.91.258102.
12. Seno F, Trovato A, Banavar JR, Maritan A. Maximum entropy approach for deducing amino acid interactions in proteins. Phys Rev Lett. 2008;100:078102. doi: 10.1103/PhysRevLett.100.078102.
13. Kobe B, Kajava AV. When protein folding is simplified to protein coiling: The continuum of solenoid protein structures. Trends Biochem Sci. 2000;25:509–515. doi: 10.1016/s0968-0004(00)01667-4.
14. Pawar AP, DuBay KF, Zurdo J, Vendruscolo M, Dobson CM. Prediction of “aggregation-prone” and “aggregation-susceptible” regions in proteins associated with neurodegenerative diseases. J Mol Biol. 2005;350:379–392. doi: 10.1016/j.jmb.2005.04.016.
15. Andrade M, Petosa C, O'Donoghue S. Comparison of ARM and HEAT protein repeats. J Mol Biol. 2001;309:1–18. doi: 10.1006/jmbi.2001.4624.
16. Cutter AD, Wasmuth JD, Blaxter ML. The evolution of biased codon and amino acid usage in nematode genomes. Mol Biol Evol. 2006;23:2303–2315. doi: 10.1093/molbev/msl097.
17. Kobe B, Kajava AV. The leucine-rich repeat as a protein recognition motif. Curr Opin Struct Biol. 2001;11:725–732. doi: 10.1016/s0959-440x(01)00266-4.
18. Kobe B, Gleichmann T, Horne J, Jennings IG, Scotney PD, Teh T. Turn up the HEAT. Structure (London). 1999;7:R91–R97. doi: 10.1016/s0969-2126(99)80060-4.
19. Groves MR, Hanlon N, Turowski P, Hemmings BA, Barford D. The structure of the protein phosphatase 2A PR65/A subunit reveals the conformation of its 15 tandemly repeated HEAT motifs. Cell. 1999;96:99–110. doi: 10.1016/s0092-8674(00)80963-0.
20. Cho US, Xu WQ. Crystal structure of a protein phosphatase 2A heterotrimeric holoenzyme. Nature. 2007;445:53–57. doi: 10.1038/nature05351.
21. Mumby M. PP2A: Unveiling a reluctant tumor suppressor. Cell. 2007;130:21–24. doi: 10.1016/j.cell.2007.06.034.
22. Michaely P, Tomchick DR, Machius M, Anderson RGW. Crystal structure of a 12 ANK repeat stack from human ankyrinR. EMBO J. 2002;21:6387–6396. doi: 10.1093/emboj/cdf651.
23. Phillips JC. Functionality and protein–water interactions. 2008. http://xxx.lanl.gov/abs/0803.0091.
24. Karpenahalli MR, Lupas AN, Soding J. TPRpred: A tool for prediction of TPR-, PPR- and SEL1-like repeats from protein sequences. BMC Bioinform. 2007;8:2. doi: 10.1186/1471-2105-8-2.
25. Lubchenko V. Competing interactions create functionality through frustration. Proc Natl Acad Sci USA. 2008;105:10635–10636. doi: 10.1073/pnas.0805716105.
26. Shakhnovich E. Protein folding thermodynamics and dynamics: Where physics, chemistry, and biology meet. Chem Rev. 2006;106:1559–1588. doi: 10.1021/cr040425u.
27. Israelachvili JN, Pashley RM. The hydrophobic interaction is long-range, decaying exponentially with distance. Nature. 1982;300:341–342. doi: 10.1038/300341a0.
28. Yamaotsu N, Oda A, Hirono S. Determination of ligand-binding sites on proteins using long-range hydrophobic potential. Biol Pharm Bull. 2008;31:1552–1558. doi: 10.1248/bpb.31.1552.
29. Yan H, et al. Thermoreversible lysozyme hydrogels: Properties and an insight into the gelation pathway. Soft Matter. 2008;4:1313–1325. doi: 10.1039/b716966c.
30. Parmeggiani F, et al. Designed armadillo repeat proteins as general peptide-binding scaffolds: Consensus design and computational optimization of the hydrophobic core. J Mol Biol. 2008;376:1282–1304. doi: 10.1016/j.jmb.2007.12.014.
31. Ley TJ, et al. DNA sequencing of a cytogenetically normal acute myeloid leukemia genome. Nature. 2008;456:66–72. doi: 10.1038/nature07485.
32. Hofmann K, Stoffel W. PROFILEGRAPH: An interactive graphical tool for protein sequence analysis. Bioinformatics. 1992;8:331–337. doi: 10.1093/bioinformatics/8.4.331.
33. Phillips JC. Scaling and self-organized criticality in proteins II. Proc Natl Acad Sci USA. 2009. doi: 10.1073/pnas.0811308106.
