Abstract
In living cells, functional protein–protein interactions compete with a much larger number of nonfunctional, or promiscuous, interactions. Several cellular properties contribute to avoiding unwanted protein interactions, including regulation of gene expression, cellular compartmentalization, and high specificity and affinity of functional interactions. Here we investigate whether other mechanisms exist that shape the sequence and structure of proteins to favor their correct assembly into functional protein complexes. To examine this question, we project evolutionary and cellular abundance information onto 397, 196, and 631 proteins of known 3D structure from Escherichia coli, Saccharomyces cerevisiae, and Homo sapiens, respectively. On the basis of amino acid frequencies in interface patches versus the solvent-accessible protein surface, we define a propensity or “stickiness” scale for each of the 20 amino acids. We find that the propensity to interact in a nonspecific manner is inversely correlated with abundance. In other words, high abundance proteins have less sticky surfaces. We also find that stickiness constrains protein evolution, whereby residues in sticky surface patches are more conserved than those found in nonsticky patches. Finally, we find that the constraint imposed by stickiness on protein divergence is proportional to protein abundance, which provides mechanistic insights into the correlation between protein conservation and protein abundance. Overall, the avoidance of nonfunctional interactions significantly influences the physico-chemical and evolutionary properties of proteins. Remarkably, the effects observed are consistently larger in E. coli and S. cerevisiae than in H. sapiens, suggesting that promiscuous protein–protein interactions may be freer to accumulate in the human lineage.
Keywords: promiscuity, protein structure, interaction potential
The interior of cells is a highly crowded environment where proteins continuously encounter each other (1). Thus, for cells to function properly, it is important that casual encounters do not outweigh functional ones. Statistically, the competition from nonfunctional interactions should be severe (2–4), given that the huge number of possible interactions far outweighs the comparatively small number of functional interactions: the Escherichia coli proteome contains about 4,200 proteins, yielding over 8,000,000 potential distinct pairwise interactions. Eukaryotic proteomes are even larger and require additional mechanisms to minimize the impact of nonfunctional interactions (3, 5, 6). For example, Zhang et al. showed that, in Saccharomyces cerevisiae, the average concentration of coexpressed and colocalized proteins is close to the upper tolerable limit (3), implying that compartmentalization of proteins in time and space was crucial to allow the expansion of eukaryotic protein repertoires.
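The pair count quoted above follows from the standard formula for unordered pairs, n(n − 1)/2. A minimal Python check is sketched below; the function name is ours, and we assume that self-interactions are excluded from the count.

```python
def distinct_pairs(n_proteins: int) -> int:
    """Number of unordered pairs of distinct proteins: n * (n - 1) / 2."""
    return n_proteins * (n_proteins - 1) // 2


# About 4,200 proteins in the E. coli proteome (figure quoted in the text).
print(distinct_pairs(4200))  # 8817900
```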
In addition to cellular mechanisms such as compartmentalization and regulation of protein abundance, shown to be important for intrinsically unstructured proteins, for example (7), specific physico-chemical properties contribute to minimizing nonfunctional protein-protein interactions (PPIs). This has been observed within the protein core (8) and within interface patches (9), which, due to their hydrophobic character, have a potential to mediate nonfunctional interactions. Pechmann et al. showed that interface regions are often aggregation-prone but protected by strategically placed disulfide bonds and salt bridges (9). Such aggregation-prone regions have also been shown to be less frequent among highly expressed proteins, which, according to the law of mass action, are potentially more deleterious to the cell than lowly expressed proteins (10). Importantly, in these studies aggregation is measured along the protein sequence and therefore reflects the potential for aggregation of the unfolded state.
Most previous studies have highlighted “negative-design” principles at known binding regions (9) or examined nonfunctional interactions through aggregation (10–13). In contrast, here we concentrate on the surface regions of proteins in their folded state. Specifically, we ask if the folded state of proteins is evolutionarily constrained by nonfunctional interactions. This means, in particular, that we consider surface residues but not amino acids buried in the protein core, as these cannot be involved in protein–protein interactions. In a molecular evolution-oriented study, Yang et al. recently observed that such surface-specific evolutionary constraints exist in yeast (14). Here we present a complementary analysis that places the emphasis on the physico-chemical properties of proteins associated with constraints from nonfunctional interactions and describe these properties in two additional species to better cover the tree of life. We thus assembled three datasets of proteins of known structure in their biological state (“biological unit”), resulting in 397, 196, and 631 proteins for E. coli, S. cerevisiae, and Homo sapiens, respectively.
Results
Defining an Interaction Propensity Scale.
To investigate the impact of promiscuous interactions, we first define an interaction propensity scale to use as a proxy for an amino acid “stickiness” scale. We derive this scale purely from structural data by taking the log ratio of amino acid frequencies observed at the protein surface versus in protein–protein interfaces, as previously defined (15–17) and as illustrated in Fig. 1B. As we consider protein structures in terms of biological units, surface amino acids as defined here are not involved in interfacial protein–protein contacts in the crystal structure. This scale thus reflects a trade-off between the probability of finding a given amino acid in a solvated environment versus the residue being involved in an interaction with another protein. For example, lysine is frequent at the surface (∼15% of amino acids) but rare in interface core regions (<5% of amino acids), which makes it an interaction-resistant or “nonsticky” amino acid (17). We used only E. coli proteins to derive this scale, but our conclusions are not dependent on the organism used because the scales based on S. cerevisiae and H. sapiens proteins are almost identical to that of E. coli (Rcoli-yeast = 0.94, Rcoli-human = 0.97; Fig. S2).
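As an illustration of how such a scale can be computed, the sketch below takes pooled residue lists for interface and surface regions and returns a log ratio of frequencies per amino acid. The function name, the use of the natural logarithm, and the pseudocount are assumptions of this sketch, not details taken from the study; the interface-over-surface direction follows the Fig. 1B legend.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def interface_propensity(interface_residues, surface_residues, pseudocount=1.0):
    """Return {amino acid: ln(f_interface / f_surface)}.

    Both arguments are iterables of one-letter residue codes pooled over all
    proteins. The pseudocount (an assumption of this sketch) avoids log(0)
    for amino acids that are absent from one of the regions.
    """
    interface_counts = Counter(interface_residues)
    surface_counts = Counter(surface_residues)
    n_interface = sum(interface_counts[aa] + pseudocount for aa in AMINO_ACIDS)
    n_surface = sum(surface_counts[aa] + pseudocount for aa in AMINO_ACIDS)
    scale = {}
    for aa in AMINO_ACIDS:
        f_interface = (interface_counts[aa] + pseudocount) / n_interface
        f_surface = (surface_counts[aa] + pseudocount) / n_surface
        scale[aa] = math.log(f_interface / f_surface)
    return scale


# Toy input: lysine (K) is enriched at the surface, leucine (L) at the interface.
toy_scale = interface_propensity("KLLL", "KKKL")
print(toy_scale["K"] < 0 < toy_scale["L"])  # True
```

With the lysine frequencies quoted above (about 15% at the surface, at most about 5% in interface cores), the same log ratio would be ln(0.05/0.15), or roughly −1.1 or lower, which is what makes lysine a nonsticky residue on this kind of scale.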
Fig. 1.
The solvent-accessible surfaces of high-abundance proteins are enriched in nonsticky amino acids compared with low-abundance proteins. (A) Illustration of the approach taken in this study. (B) We first define a stickiness scale for each amino acid using its interface propensity. The propensity is defined by the log ratio of amino acid frequencies at interfaces versus surfaces. The definition of the structural regions used is explained in more detail in Fig. S1. (C and D) We calculate a stickiness score by averaging interface propensity scores of residues in the region considered (surface or interior). We then plot this score against the abundance of the protein and indicate the Spearman rank correlation coefficients of the relationships, as well as the P value associated with the linear association obtained by analysis of variance. The contour lines mark the 2/6, 3/6, 4/6, and 5/6 percentile of the density function range.
Chemical Constraints on Surfaces of Highly Abundant Proteins.
Nonfunctional interactions are, on average, detrimental to fitness because they sequester interaction partners (18). According to the law of mass action, the number of nonfunctional interactions that a protein participates in should be proportional to its abundance (19). Therefore, an abundant protein with a sticky surface is expected to be more deleterious than a low-abundance protein with the same surface stickiness. If cellular crowding and its associated promiscuous interactions were a constraint in cellular systems, we would expect an anticorrelation between protein surface stickiness and protein abundance. We quantified the stickiness of a protein surface as the average of interface-propensity scores, thus reflecting the tendency of its solvent-accessible residues to interact with other protein surfaces (Fig. 1). For all three organisms, we used all of the available experimental data on protein abundance provided by the PaxDb database (http://pax-db.org) (20). These values are linearly proportional to protein copy numbers in cells.
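The surface stickiness score described above is an average of per-residue scale values. A minimal sketch follows; the function name and the toy scale values are hypothetical and chosen only to illustrate the averaging, not to reproduce the values in Table S2.

```python
from statistics import mean


def region_stickiness(residues, scale):
    """Mean stickiness of a structural region given its one-letter residue codes.

    Residues missing from the scale (e.g. nonstandard codes) are skipped,
    which is a choice made for this sketch.
    """
    scores = [scale[aa] for aa in residues if aa in scale]
    if not scores:
        raise ValueError("no scorable residues in region")
    return mean(scores)


# Hypothetical toy scale: only the signs (K, D nonsticky; L sticky) are illustrative.
toy_scale = {"K": -1.0, "D": -0.5, "L": 1.0}
print(region_stickiness("KKDL", toy_scale))  # -0.375
```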
Plotting surface stickiness against protein abundance reveals a significant anticorrelation in all three organisms studied (pcoli = 9 × 10⁻¹⁰, pyeast = 7 × 10⁻⁷, phuman = 2 × 10⁻⁵; Fig. 1C; these and subsequent P values associated with correlations were calculated using the F-statistic obtained by analysis of variance of the linear association between abundance and stickiness). However, the magnitude of the anticorrelation, as measured by the Spearman rank correlation coefficient, varies greatly. The strongest anticorrelation is found in E. coli (R = −0.48), followed by yeast (R = −0.36), followed by human (R = −0.25).
This result shows that the surface of highly abundant proteins has adapted to become less sticky and more soluble than for lowly abundant proteins, especially in E. coli and, to a smaller extent, in yeast and humans. This weaker signal might reflect the fact that eukaryotic cells are more compartmentalized than bacterial cells, which may introduce a bias in the measure of protein concentration approximated here with abundance. An analysis of protein stickiness as a function of localization indeed reveals significant differences across different cellular compartments. Interestingly, nuclear proteins are more sticky than the rest of the proteome taken as an average (pcerevisiae = 0.023; psapiens = 0.016) whereas mitochondrial proteins are less sticky (pcerevisiae = 0.0021, psapiens = 0.0045). Remarkably, in H. sapiens the gene ontology (GO) term most enriched in nonsticky proteins is “soluble fraction” (psapiens = 3.6 × 10⁻⁵; Fig. S3).
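For readers who wish to reproduce this type of rank correlation, a dependency-free sketch of the Spearman coefficient is given below (Pearson correlation applied to tie-averaged ranks). The function names and toy data are ours; the analysis-of-variance P values reported in the text are not reproduced by this sketch.

```python
from statistics import mean


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (hypothetical): stickiness falls monotonically as abundance rises.
abundance = [1.5, 12.0, 80.0, 950.0]
stickiness = [0.20, 0.11, 0.05, -0.30]
rho = spearman(abundance, stickiness)
```

Because the coefficient depends only on ranks, it is unchanged by any monotonic transformation of abundance, such as taking logarithms.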
The amino acid potential provided in this analysis yields results that are significantly different from those obtained on the basis of the commonly used hydrophobicity scale of Kyte and Doolittle (21). When considering this hydrophobicity scale, the association described in Fig. 1 disappears in S. cerevisiae and H. sapiens and greatly weakens in E. coli (Fig. 2 and Fig. S4). We further tested 71 additional scales associated with “hydrophobicity” from the AAindex database (22) (Table S1). Interestingly, the scale of Wimley and White (23) yields the best correlation (R = −0.44) in E. coli, and is based on the transfer of amino acids from a hydrophobic environment (lipid bilayer interface) to water. This is different from the Kyte and Doolittle scale, which is based on measures of transfers of amino acids between two polar environments (e.g., ethanol and water). The similarity between the stickiness scale and the Wimley and White scale may reflect the fact that an interaction resembles more a transfer from water to a hydrophobic environment than a transfer between two relatively polar environments. Fig. S5 provides a comparison of these three scales, and Table S2 presents the values for our stickiness scale.
Fig. 2.
Protein hydrophobicity is less strongly tuned as a function of abundance than stickiness. We calculate a “hydrophobicity score” for the surface and interior regions of a protein by averaging Kyte and Doolittle hydrophobicity scores of residues in the region (21). We then plot this score against the abundance of the protein and indicate the Spearman rank correlation coefficients of the relationships, as well as the P value associated with the linear association obtained by analysis of variance. The hydrophobicity analysis for all species and surface as well as interior regions is shown in Fig. S4.
Current views of protein evolution emphasize stability, which must be maintained to avoid misfolding and thereby prevent loss of function or aggregation (24, 25). To assess the extent to which the anticorrelation observed here is linked to the unfolded state of the protein, we reproduce the same plots but now consider amino acids at the protein interior instead of the surface (Fig. 1D). For two organisms, the correlation disappears almost entirely when we consider amino acids at the interior. The surface–interior difference is most marked in E. coli, where the correlation vanishes almost completely (R = −0.08) and becomes insignificant (P = 0.4). In humans, the weaker anticorrelation observed in Fig. 1C is also lost with interior amino acids (R = −0.07, P = 0.33), whereas in yeast a weak correlation persists (R = −0.26, P = 4 × 10⁻²).
Considering protein length provides a further piece of evidence showing that misassembly rather than misfolding is responsible for the anticorrelation between surface stickiness and abundance. It is known that the small hydrophobic core of short proteins (26) requires compensating mechanisms (27) that increase their stability. In line with this, we find an increase in interior stickiness among small proteins relative to larger proteins for all three species (Fig. S6; pcoli = 2 × 10⁻⁵; pyeast = 0.015; phuman = 2 × 10⁻⁸). The increased stickiness associated with the core of small proteins suggests that a strong amino acid interaction potential can lead to an increase in stability. Comparatively, however, the lack of association between surface stickiness and protein length (Fig. S6; pcoli = 0.05; pyeast = 0.89; phuman = 0.15) implies that stability is unlikely to drive the evolution of protein surfaces toward nonsticky amino acids.
Taken together, these results suggest that, in addition to selection against misfolding and aggregation of polypeptide chains, avoidance of nonfunctional interactions by folded proteins is an important constraint that is proportional to abundance. Moreover, adaptation to this constraint is achieved through a bias in surface amino acid composition toward nonsticky amino acids.
Surface Stickiness Is an Evolutionary Constraint.
To assess whether nonfunctional interactions place a constraint on protein evolution, we study conservation at the amino acid level. We ask whether, within a protein, amino acids surrounded by a sticky environment are more conserved than amino acids surrounded by a nonsticky environment. We computed rates of evolution for each amino acid for all three species and projected these data onto protein structures of each organism (Materials and Methods). In parallel, we calculated a surrounding stickiness score for every surface amino acid of each protein (Fig. 3A). This score is calculated from the amino acid composition of the 400-Å² surface patch surrounding the residue of interest by averaging the stickiness values of its amino acids (note that the stickiness of the central residue is independent from that of the patch). Residues are then binned into five “surrounding stickiness” classes of equal size for each organism, and evolutionary conservation is compared across the five classes (Fig. 3B). We reason that residues in more sticky environments are expected to have a higher probability of triggering nonfunctional interactions upon mutation and on average should be more constrained than those in less sticky environments.
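The binning step can be sketched as a rank-based assignment into five classes of (nearly) equal size. The function name and the toy scores below are ours, and the handling of ties (arbitrary but deterministic, following sort order) is an assumption of this sketch.

```python
def equal_size_bins(scores, n_bins=5):
    """Assign each score a class index 0..n_bins-1, from least to most sticky.

    Classes are built from ranks, so each holds the same number of residues
    when len(scores) is a multiple of n_bins.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for rank, idx in enumerate(order):
        labels[idx] = rank * n_bins // len(scores)
    return labels


# Hypothetical surrounding-stickiness scores for ten surface residues.
patch_scores = [0.9, 0.1, 0.5, 0.3, 0.7, -0.2, 0.0, 0.4, 0.6, 0.8]
print(equal_size_bins(patch_scores))  # [4, 1, 2, 1, 3, 0, 0, 2, 3, 4]
```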
Fig. 3.
The relative evolutionary rate of an amino acid is influenced by the stickiness of its environment. (A) Illustration of the procedure used to calculate the stickiness score of a residue’s environment. We use this score as a proxy for the probability of the central residue to trigger a promiscuous interaction upon mutation. Note that, although the central residue is classified according to its context, its chemical composition remains independent of the context and follows an average surface composition, even for the most sticky category of patches (Fig. S7). (B) An evolutionary conservation ratio is calculated for each surface amino acid. The ratio is equal to the median evolutionary rate of the entire protein divided by the evolutionary rate of the residue. We bin all residues into five classes of equal size and increasing stickiness and show the boxplot distribution of evolutionary rates for each class. In all three organisms, the stickier the environment of a residue, the more the residue is conserved relative to the rest of the protein. Note that in this analysis we consider the conservation of the central residue and not that of the patch surrounding it. P values are calculated using the Wilcoxon test.
Importantly, in Fig. 3 the evolutionary rate of each residue is normalized: we divide the rate of the protein by that of each residue. Therefore, the larger the ratio, the more conserved the residue relative to the protein. This shows the clear effect of a residue’s environment stickiness on its degree of conservation relative to the protein: residues in nonsticky environments (left-most bin) are 35%, 65%, and 12% freer to evolve than residues in stickier environments (right-most bin) for E. coli, S. cerevisiae, and H. sapiens, respectively. Because these values are obtained after a normalization per protein, they reflect the impact of stickiness on conservation relative to the conservation of the protein. This normalization is necessary to single out the effect of stickiness because lowly expressed proteins are poorly conserved (28) but also carry most of the sticky patches, as shown in Fig. 1C. Interestingly, the weaker adaptation of human proteins against nonfunctional interactions observed in Fig. 1C is reproduced here, as differences in evolutionary conservation across the five probability classes are weakest in the human data set.
It can be argued that the conservation of residues found in sticky surface patches is due to those patches being unknown biological interfaces. However, several pieces of evidence suggest otherwise. First, if this were the case, we would not expect to see such a difference in signal between species (i.e., decreasing signal strength from E. coli to H. sapiens) because functional interfaces should, on average, be conserved in all species. Second, we would expect the central residue within sticky patches to resemble interface amino acids. To assess this, we compared the frequency distribution of amino acids in sticky patches (Fsticky) with that of amino acids at the interface (Finterface) and surface (Fsurface). Because amino acids such as cysteine are rare in all regions, we normalized these distributions by the average frequencies observed in all regions (Ftotal). As expected, the linear regression between (Fsticky/Ftotal) and (Finterface/Ftotal) was not significant (pcoli = 0.27, pcerevisiae = 0.66, psapiens = 0.48). Residues found in sticky patches are in fact nearly identical in their composition to surface residues, as reflected by the highly significant linear regression between (Fsticky/Ftotal) and (Fsurface/Ftotal): pcoli = 3.1 × 10⁻¹⁴, pcerevisiae = 4.1 × 10⁻¹⁴, psapiens = 9.2 × 10⁻¹², as obtained by analysis of variance. These results are detailed in Fig. S7 and show that for biological units in the Protein Data Bank (PDB) the surfaces are largely solvent-exposed as opposed to being involved in cryptic stable interfaces. Considering isolated subunits, however, we observe the opposite because the sticky patches include genuine interfaces. For this data set, the distribution of residues at the center of sticky patches is closer to interface amino acids (Fig. S7, pcoli = 0.019, pcerevisiae = 0.16, psapiens = 0.028) than to surface ones; for the latter, the regression slopes are actually negative (slopecoli = −0.85, slopecerevisiae = −1.02, slopesapiens = −0.19).
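The per-protein normalization can be written down in a few lines. This is a minimal sketch assuming strictly positive per-residue rates (rates reported on a centered scale would need to be converted first); the function name and toy rates are hypothetical.

```python
from statistics import median


def conservation_ratios(residue_rates):
    """Median evolutionary rate of the protein divided by each residue's rate.

    A ratio above 1 means the residue evolves more slowly (is more conserved)
    than the protein as a whole.
    """
    if any(rate <= 0 for rate in residue_rates):
        raise ValueError("this sketch assumes strictly positive rates")
    protein_median = median(residue_rates)
    return [protein_median / rate for rate in residue_rates]


# Hypothetical rates for a three-residue toy protein.
print(conservation_ratios([0.5, 1.0, 2.0]))  # [2.0, 1.0, 0.5]
```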
Finally, the increasing conservation of residues in increasingly sticky environments holds true even at known interfaces, both at the rim and at the core (Fig. S8), showing that, even within protein–protein contact regions, stickiness is controlled. The latter observation supports the notion of negative design (29) in sensitive interface regions (8, 9, 30). Although unknown biological interfaces must exist, these observations make us confident that they are unlikely to contribute significantly to the signal observed.
Nonfunctional Interactions Might Contribute to the Differential Conservation Between Highly and Lowly Expressed Proteins.
In the first part of this study, we observed an anticorrelation between protein abundance and protein surface stickiness. Subsequently we saw that stickiness is correlated with conservation within a protein. This prompts us to ask whether protein stickiness might be involved in the well-established correlation between protein abundance and evolutionary conservation. Thus, we would expect low-copy proteins to be more tolerant than abundant proteins to amino acid substitutions that significantly change their surface stickiness.
To test this hypothesis, we took advantage of the properties of two pairs of charged amino acids: aspartic (D) and glutamic (E) acids have similar stickiness scores, whereas arginine (R) and lysine (K) do not (Fig. 1B) (15, 17, 31). Arginine is more frequently found at protein–protein interfaces than lysine, making it a stickier amino acid according to our definition. This characteristic enables us to make the following prediction: among high-copy proteins, where significant changes in stickiness have a greater impact, substitutions between K and R should be less frequent than substitutions between D and E. Also, because K, R, E, and D are mostly present at protein surfaces (15, 17), we do not need to restrict ourselves to proteins of known structure and can measure substitution rates from whole proteomes.
We thus measured the substitution frequencies between K and R (fK↔R) as well as between D and E (fD↔E) among orthologs of three species pairs: E. coli–Salmonella typhimurium, S. cerevisiae–Saccharomyces paradoxus, and H. sapiens–Mus musculus, as detailed in Table S3. Fig. 4 shows ratios of these frequencies (fD↔E/fK↔R) as a function of protein abundance. Substitutions between K and R are rare among abundant proteins relative to substitutions between D and E. In contrast, among low-copy proteins, both substitution types occur at more comparable frequencies. Interestingly, the magnitude of the effect observed, again, decreases in strength from E. coli (160% change between lowest and highest abundance classes) to yeast (78% change) and to humans (13% change).
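To make the ratio concrete, the sketch below counts substitutions in a pair of aligned ortholog sequences. How the substitution frequency is normalized is an assumption of this sketch (here, swapped columns divided by all columns in which both sequences carry one of the two residues); the function names and toy sequences are hypothetical.

```python
def swap_frequency(seq_a, seq_b, pair):
    """Fraction of eligible aligned columns in which the two residues differ.

    A column is eligible when both aligned residues belong to `pair`
    (e.g. "DE" or "KR"). Gaps and all other residues are ignored.
    """
    eligible = swapped = 0
    for a, b in zip(seq_a, seq_b):
        if a in pair and b in pair:
            eligible += 1
            if a != b:
                swapped += 1
    if eligible == 0:
        raise ValueError(f"no aligned columns for pair {pair!r}")
    return swapped / eligible


def de_over_kr_ratio(seq_a, seq_b):
    """Ratio f(D<->E) / f(K<->R) for one pair of aligned sequences."""
    f_kr = swap_frequency(seq_a, seq_b, "KR")
    if f_kr == 0:
        raise ValueError("no K<->R substitutions; ratio is undefined")
    return swap_frequency(seq_a, seq_b, "DE") / f_kr


# Toy alignment: 2 of 4 D/E columns are swapped, 1 of 4 K/R columns is swapped.
print(de_over_kr_ratio("DDEEKKKR", "EDEDKKRR"))  # 2.0
```

In practice the counts would be pooled over all proteins in an abundance bin before taking the ratio, rather than computed per protein.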
Fig. 4.
The strength of selection against changes in protein stickiness is proportional to protein abundance. (A) Ratio of frequencies of two substitution types: one between charged residues of equal stickiness (D and E) and one between charged residues with a change in stickiness (K and R). The ratio is plotted for five bins of increasing protein abundance, each containing the same number of these charged residues. The sixth bin contains the top 5% most abundant proteins. The ratio, r, defined in the figure, increases by 160%, 78%, and 13% in E. coli, S. cerevisiae, and H. sapiens, respectively, for the most abundant proteins relative to the least abundant ones. Thus, substitutions between K and R become less frequent than substitutions between D and E among highly abundant proteins. The red intervals show the SD of the ratios r obtained from 1,000 datasets where abundance data are randomized. (B) Scheme illustrating the constraints from misfolding and promiscuous interactions. Selection against misfolding provides an explanation for the relationship between protein abundance and evolutionary conservation for residues buried in the interior because the deleterious effects of misfolded aggregates increase with abundance. Avoidance of promiscuous interactions provides a further mechanism that explains negative selection proportional to abundance for residues on the solvent-accessible surface of proteins.
Taken together, these observations provide mechanistic insights into the well-established correlation between protein abundance and evolutionary conservation. Although this correlation has been known for over a decade (28), the biological mechanisms associated with it are still not entirely clear. Selection against misfolding can explain part of the correlation (24, 25), where the assumption is that toxicity of misfolded proteins is proportional to their abundance. Our results support the notion that avoidance of promiscuous interactions, or negative pleiotropy (32), represents an additional mechanistic explanation (Fig. 4B).
Discussion
It has been shown previously that mutations tend to arise faster at the protein surface than in the interior (33, 34). In fact, Toth-Petroczy and Tawfik recently showed that mutations at the interior accumulate more rapidly once the surface has drifted sufficiently (35). Therefore, by lowering the tolerance for mutations at the surface, the divergence of the entire protein becomes constrained (35). Promiscuous interactions, which constrain mutations at the surface, could thereby limit the evolutionary rate of the entire protein. This is consistent with the results of a recent study by Yang et al. showing, in a theoretical molecular evolutionary model using S. cerevisiae, that protein misinteraction represents an evolutionary constraint (14).
Considering two additional species and taking a complementary approach placing more emphasis on the physico-chemical properties of proteins, we also find that protein misinteractions represent an evolutionary constraint. We provide a physico-chemical rationalization of nonfunctional interactions through the stickiness scale. This scale is significantly different from the Kyte and Doolittle hydrophobic scale, which is commonly used as in, e.g., Yang et al. (14). Our stickiness scale is more similar to the Wimley and White scale, although differences, e.g., between lysine and arginine, suggest that it is important to consider the “interaction” potential of amino acids in interpreting nonfunctional interactions. Interestingly, lysine underrepresentation at nonbiological crystal contacts also supports the notion that lysine and arginine have different potentials to be involved in nonfunctional interactions (17, 36). We thus hope that the stickiness scale proposed here will help to refine models that couple protein chemistry to cellular crowding (5). Furthermore, taken together, the work by Yang et al. and our work suggest that proteins are constrained to avoid nonfunctional interactions, adding to the commonly accepted stability and solubility constraints on the amino acid composition of proteins.
Finally, the impact of promiscuous interactions appears most prominent among the unicellular organisms E. coli and S. cerevisiae. It is thus tempting to speculate that nonfunctional interactions may have accumulated in the human lineage (37) in a similar fashion to the accumulation of noncoding DNA (38). In a further analogy to noncoding DNA, nonfunctional interactions may represent the raw material for exploring and ultimately selecting functional interactions (39, 40) through mechanisms such as colocalization (41). These speculations should nevertheless be considered with care, as the weaker signal observed for H. sapiens may also result from the ill-defined nature of protein abundance in multicellular organisms. Future studies will thus be needed to explore these ideas further and better understand the properties of proteomes across the tree of life.
Methods
Sequence Data.
Sequences of proteins and their respective orthologs were aligned with MUSCLE (42). Orthology information was taken from ref. 43 for E. coli and from ENSEMBL v.48 (44) for H. sapiens. Multiple sequence alignments of S. cerevisiae proteins with their orthologs were taken from Wapinski et al. (45). The details of the species used are in Table S4. Protein multiple alignments were concatenated to obtain three proteome-wide multiple alignments (one for each species). These were used to calculate amino acid evolutionary rates using Rate4Site (46).
Structural Data.
Species-specific structures were retrieved by sequence homology. We searched for structures where the sequence from the SEQRES field was similar to proteins from E. coli, S. cerevisiae, or H. sapiens proteomes. We imposed a minimal sequence identity of 90% and a minimum overlap of 70%. We used protein structures from the PDB (47), and the dataset includes all structures present in the second release of 3DComplex (48). All structures for which the biological state was manually annotated in the PiQSi database (49) as “error,” “probable error,” or “undefined” were discarded, as well as all DNA-binding and membrane proteins. Finally, we kept only structures with a resolution below 3 Å. A summary of the number of structures per organism and complex type is given in Table S5. Structural regions were defined as in Levy (15). The environment stickiness for a given residue was calculated based on its surrounding residues, i.e., residues with the Cα within a 400-Å² patch centered on the Cα of the residue of interest.
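The selection criteria listed above can be summarized as a simple filter. The sketch below is illustrative only: the record type, its field names, the identifiers, and the "ok" annotation value are hypothetical and do not correspond to the actual file formats of the PDB, 3DComplex, or PiQSi.

```python
from dataclasses import dataclass

# PiQSi annotations that lead to a structure being discarded (from the text).
EXCLUDED_ANNOTATIONS = {"error", "probable error", "undefined"}


@dataclass
class StructureHit:
    """Hypothetical record describing one candidate structure."""
    identifier: str
    sequence_identity: float  # fraction of identical residues, 0 to 1
    overlap: float            # fraction of the sequence covered, 0 to 1
    resolution: float         # in Å
    piqsi_annotation: str
    dna_binding: bool = False
    membrane: bool = False


def passes_filters(hit: StructureHit) -> bool:
    """Apply the thresholds stated in the text to one candidate structure."""
    return (
        hit.sequence_identity >= 0.90
        and hit.overlap >= 0.70
        and hit.resolution < 3.0
        and hit.piqsi_annotation not in EXCLUDED_ANNOTATIONS
        and not hit.dna_binding
        and not hit.membrane
    )


candidates = [
    StructureHit("hit_a", 0.98, 0.95, 2.1, "ok"),
    StructureHit("hit_b", 0.85, 0.95, 1.8, "ok"),              # identity too low
    StructureHit("hit_c", 0.99, 0.80, 3.2, "ok"),              # resolution too poor
    StructureHit("hit_d", 0.95, 0.90, 2.5, "probable error"),  # flagged annotation
    StructureHit("hit_e", 0.97, 0.90, 2.0, "ok", membrane=True),
]
kept = [hit.identifier for hit in candidates if passes_filters(hit)]
print(kept)  # ['hit_a']
```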
Abundance Data.
Protein abundance data were taken from PaxDb (20) (http://pax-db.org). Because of the uncertainty associated with very low abundance proteins, we discarded all proteins with an abundance unit below 1. Statistical analyses and plots were done with R. Data used in this study are available at www.tinyurl.com/structuralregions.
Acknowledgments
We thank Dan Tawfik, Joël Janin, Eugene Shakhnovich, Sergei Maslov, David Liberles, Joseph Marsh, Eviatar Natan, Gideon Schreiber and Peter Tompa for their comments on the manuscript. We also thank the two anonymous referees for their constructive comments that significantly helped improve the paper. E.D.L. acknowledges the Human Frontier Science Project for financial support through a long-term fellowship; Stephen Michnick and Université de Montréal for hosting part of this research; and the Weizmann Institute of Science for hosting part of this research. S.D. acknowledges support from the University of Colorado School of Medicine and the National Cancer Institute Physical Sciences Oncology Center initiative (U54-CA143798). E.D.L. and S.A.T. were supported by the Medical Research Council (file Reference U105161047).
Footnotes
The authors declare no conflict of interest.
*This Direct Submission article had a prearranged editor.
Data deposition: The data processed in this paper are available at: www.tinyurl.com/structuralregions.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1209312109/-/DCSupplemental.
References
- 1. McGuffee SR, Elcock AH. Diffusion, crowding & protein stability in a dynamic molecular model of the bacterial cytoplasm. PLOS Comput Biol. 2010;6(3):e1000694. doi: 10.1371/journal.pcbi.1000694.
- 2. Janin J. Quantifying biological specificity: The statistical mechanics of molecular recognition. Proteins. 1996;25(4):438–445. doi: 10.1002/prot.4.
- 3. Zhang J, Maslov S, Shakhnovich EI. Constraints imposed by non-functional protein-protein interactions on gene expression and proteome size. Mol Syst Biol. 2008;4:210. doi: 10.1038/msb.2008.48.
- 4. Tompa P, Rose GD. The Levinthal paradox of the interactome. Protein Sci. 2011;20(12):2074–2079. doi: 10.1002/pro.747.
- 5. Heo M, Maslov S, Shakhnovich E. Topology of protein interaction network shapes protein abundances and strengths of their functional and nonspecific interactions. Proc Natl Acad Sci USA. 2011;108(10):4258–4263. doi: 10.1073/pnas.1009392108.
- 6. Johnson ME, Hummer G. Nonspecific binding limits the number of proteins in a cell and shapes their interaction networks. Proc Natl Acad Sci USA. 2011;108(2):603–608. doi: 10.1073/pnas.1010954108.
- 7. Gsponer J, Futschik ME, Teichmann SA, Babu MM. Tight regulation of unstructured proteins: From transcript synthesis to protein degradation. Science. 2008;322(5906):1365–1368. doi: 10.1126/science.1163581.
- 8. Fleishman SJ, Baker D. Role of the biomolecular energy gap in protein design, structure, and evolution. Cell. 2012;149(2):262–273. doi: 10.1016/j.cell.2012.03.016.
- 9. Pechmann S, Levy ED, Tartaglia GG, Vendruscolo M. Physicochemical principles that regulate the competition between functional and dysfunctional association of proteins. Proc Natl Acad Sci USA. 2009;106(25):10159–10164. doi: 10.1073/pnas.0812414106.
- 10. Tartaglia GG, Pechmann S, Dobson CM, Vendruscolo M. A relationship between mRNA expression levels and protein solubility in E. coli. J Mol Biol. 2009;388(2):381–389. doi: 10.1016/j.jmb.2009.03.002.
- 11. Hamada D, et al. Competition between folding, native-state dimerisation and amyloid aggregation in beta-lactoglobulin. J Mol Biol. 2009;386(3):878–890. doi: 10.1016/j.jmb.2008.12.038.
- 12. Tartaglia GG, Pechmann S, Dobson CM, Vendruscolo M. Life on the edge: A link between gene expression levels and aggregation rates of human proteins. Trends Biochem Sci. 2007;32(5):204–206. doi: 10.1016/j.tibs.2007.03.005.
- 13. Münch C, Bertolotti A. Exposure of hydrophobic surfaces initiates aggregation of diverse ALS-causing superoxide dismutase-1 mutants. J Mol Biol. 2010;399(3):512–525. doi: 10.1016/j.jmb.2010.04.019.
- 14. Yang JR, Liao BY, Zhuang SM, Zhang J. Protein misinteraction avoidance causes highly expressed proteins to evolve slowly. Proc Natl Acad Sci USA. 2012;109(14):E831–E840. doi: 10.1073/pnas.1117408109.
- 15. Levy ED. A simple definition of structural regions in proteins and its use in analyzing interface evolution. J Mol Biol. 2010;403(4):660–670. doi: 10.1016/j.jmb.2010.09.028.
- 16. Lo Conte L, Chothia C, Janin J. The atomic structure of protein-protein recognition sites. J Mol Biol. 1999;285(5):2177–2198. doi: 10.1006/jmbi.1998.2439.
- 17. Janin J, Bahadur RP, Chakrabarti P. Protein-protein interaction and quaternary structure. Q Rev Biophys. 2008;41(2):133–180. doi: 10.1017/S0033583508004708.
- 18. Vavouri T, Semple JI, Garcia-Verdugo R, Lehner B. Intrinsic protein disorder and interaction promiscuity are widely associated with dosage sensitivity. Cell. 2009;138(1):198–208. doi: 10.1016/j.cell.2009.04.029.
- 19. Levy ED, Michnick SW, Landry CR. Protein abundance is key to distinguish promiscuous from functional phosphorylation based on evolutionary information. Philos Trans R Soc Lond B Biol Sci. 2012;367(1602):2594–2606. doi: 10.1098/rstb.2012.0078.
- 20. Wang M, et al. PaxDb, a database of protein abundance averages across all three domains of life. Mol Cell Proteomics. 2012;11(8):492–500. doi: 10.1074/mcp.O111.014704.
- 21. Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. J Mol Biol. 1982;157(1):105–132. doi: 10.1016/0022-2836(82)90515-0.
- 22. Kawashima S, et al. AAindex: Amino acid index database, progress report 2008. Nucleic Acids Res. 2008;36(Database issue):D202–D205. doi: 10.1093/nar/gkm998.
- 23. Wimley WC, White SH. Experimentally determined hydrophobicity scale for proteins at membrane interfaces. Nat Struct Biol. 1996;3(10):842–848. doi: 10.1038/nsb1096-842.
- 24. Yang JR, Zhuang SM, Zhang J. Impact of translational error-induced and error-free misfolding on the rate of protein evolution. Mol Syst Biol. 2010;6:421. doi: 10.1038/msb.2010.78.
- 25. Drummond DA, Wilke CO. Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell. 2008;134(2):341–352. doi: 10.1016/j.cell.2008.05.042.
- 26. Chothia C. Structural invariants in protein folding. Nature. 1975;254(5498):304–308. doi: 10.1038/254304a0.
- 27. Pereira de Araújo AF, Gomes AL, Bursztyn AA, Shakhnovich EI. Native atomic burials, supplemented by physically motivated hydrogen bond constraints, contain sufficient information to determine the tertiary structure of small globular proteins. Proteins. 2008;70(3):971–983. doi: 10.1002/prot.21571.
- 28. Pál C, Papp B, Hurst LD. Highly expressed genes in yeast evolve slowly. Genetics. 2001;158(2):927–931. doi: 10.1093/genetics/158.2.927.
- 29. Doye JP, Louis AA, Vendruscolo M. Inhibition of protein crystallization by evolutionary negative design. Phys Biol. 2004;1(1–2):9–13. doi: 10.1088/1478-3967/1/1/P02.
- 30. Levin KB, et al. Following evolutionary paths to protein-protein interactions with high affinity and selectivity. Nat Struct Mol Biol. 2009;16(10):1049–1055. doi: 10.1038/nsmb.1670.
- 31. MacCallum JL, Tieleman DP. Hydrophobicity scales: A thermodynamic looking glass into lipid-protein interactions. Trends Biochem Sci. 2011;36(12):653–662. doi: 10.1016/j.tibs.2011.08.003.
- 32. Liberles DA, Tisdell MD, Grahnen JA. Binding constraints on the evolution of enzymes and signalling proteins: The important role of negative pleiotropy. Proc Biol Sci. 2011;278(1714):1930–1935.
- 33. Sasidharan R, Chothia C. The selection of acceptable protein mutations. Proc Natl Acad Sci USA. 2007;104(24):10080–10085. doi: 10.1073/pnas.0703737104.
- 34. Franzosa EA, Xia Y. Structural determinants of protein evolution are context-sensitive at the residue level. Mol Biol Evol. 2009;26(10):2387–2395. doi: 10.1093/molbev/msp146.
- 35. Tóth-Petróczy A, Tawfik DS. Slow protein evolutionary rates are dictated by surface-core association. Proc Natl Acad Sci USA. 2011;108(27):11151–11156. doi: 10.1073/pnas.1015994108.
- 36. Cieślik M, Derewenda ZS. The role of entropy and polarity in intermolecular contacts in protein crystals. Acta Crystallogr D Biol Crystallogr. 2009;65(Pt 5):500–509. doi: 10.1107/S0907444909009500.
- 37. Fernández A, Lynch M. Non-adaptive origins of interactome complexity. Nature. 2011;474(7352):502–505. doi: 10.1038/nature09992.
- 38. Lynch M. The Origins of Genome Architecture. Sunderland, MA: Sinauer Associates, Inc.; 2007. p. 494.
- 39. Tawfik DS. Messy biology and the origins of evolutionary innovations. Nat Chem Biol. 2010;6(10):692–696. doi: 10.1038/nchembio.441.
- 40. Nobeli I, Favia AD, Thornton JM. Protein promiscuity and its implications for biotechnology. Nat Biotechnol. 2009;27(2):157–167. doi: 10.1038/nbt1519.
- 41. Kuriyan J, Eisenberg D. The origin of protein interactions and allostery in colocalization. Nature. 2007;450(7172):983–990. doi: 10.1038/nature06524.
- 42. Edgar RC. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32(5):1792–1797. doi: 10.1093/nar/gkh340.
- 43. Moreno-Hagelsieb G, Janga SC. Operons and the effect of genome redundancy in deciphering functional relationships using phylogenetic profiles. Proteins. 2008;70(2):344–352. doi: 10.1002/prot.21564.
- 44. Flicek P, et al. Ensembl 2008. Nucleic Acids Res. 2008;36(Database issue):D707–D714. doi: 10.1093/nar/gkm988.
- 45. Wapinski I, Pfeffer A, Friedman N, Regev A. Natural history and evolutionary principles of gene duplication in fungi. Nature. 2007;449(7158):54–61. doi: 10.1038/nature06107.
- 46. Pupko T, Bell RE, Mayrose I, Glaser F, Ben-Tal N. Rate4Site: An algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues. Bioinformatics. 2002;18(Suppl 1):S71–S77. doi: 10.1093/bioinformatics/18.suppl_1.s71.
- 47. Berman HM, et al. The Protein Data Bank. Acta Crystallogr D Biol Crystallogr. 2002;58(Pt 6 No 1):899–907.
- 48. Levy ED, Pereira-Leal JB, Chothia C, Teichmann SA. 3D complex: A structural classification of protein complexes. PLOS Comput Biol. 2006;2(11):e155. doi: 10.1371/journal.pcbi.0020155.
- 49. Levy ED. PiQSi: Protein quaternary structure investigation. Structure. 2007;15(11):1364–1367. doi: 10.1016/j.str.2007.09.019.