Nonadaptive origins of interactome complexity

Ariel Fernández; Michael Lynch

doi:10.1038/nature09992

Author manuscript; available in PMC: 2011 Dec 23.

Published in final edited form as: Nature. 2011 May 18;474(7352):502–505. doi: 10.1038/nature09992


PMCID: PMC3121905 NIHMSID: NIHMS279320 PMID: 21593762

Abstract

The boundaries between prokaryotes, unicellular eukaryotes, and multicellular eukaryotes are accompanied by orders-of-magnitude reductions in effective population size, with concurrent amplifications of the effects of random genetic drift and mutation¹. The resultant decline in the efficiency of selection appears to be sufficient to influence a wide range of attributes at the genomic level in a nonadaptive manner². A key remaining question concerns the extent to which variation in the power of random genetic drift is capable of influencing phylogenetic diversity at the subcellular and cellular levels²⁻⁴. Should this be the case, population size would have to be considered as a potential determinant of the mechanistic pathways underlying long-term phenotypic evolution. Here we demonstrate a phylogenetically broad inverse relationship between the power of drift and the structural integrity of protein subunits. This leads to the hypothesis that the accumulation of mildly deleterious mutations in populations of small size induces secondary selection for protein-protein interactions that stabilize key gene functions. By this means, the complex protein architectures and interactions essential to the genesis of phenotypic diversity may initially emerge by nonadaptive mechanisms.

Here we examine whether established gene orthologies reveal a role for drift in phylogenetic patterns of protein structural evolution. Although evolutionary change at the structural level is unlikely to greatly destabilize the native fold of an essential protein, as the complete loss of function would generally be unbearable, the drift hypothesis predicts a negative relationship between population size (N) and the accumulation of mildly deleterious amino-acid substitutions. The following examination of the structures of orthologous proteins from vastly different lineages suggests that the enhanced power of drift in eukaryotes (multicellular species in particular) results in a qualitative reduction in the stability of protein-water interfaces (PWIs) via the partial exposure of paired backbone polar groups (amides and carbonyls) that are otherwise protected in prokaryotes. In effect, the reduced efficiency of selection in small-N species encourages the accumulation of mild structural deficiencies in the form of solvent-accessible backbone hydrogen bonds (SABHBs), which lead to protein structures that are more “open” and vulnerable to fold-disruptive hydration (Fig. 1a) and create protein-water interfacial tension (PWIT, Supplementary Fig. 1)⁵ by hindering the hydrogen-bonding capabilities of nearby water molecules.

a. Hydration of exposed polar backbone induces interfacial tension by causing water molecules near the defect to relinquish part of their coordination (g < 4) relative to the level in surrounding bulk solvent (g = 4). White represents hydrogen; red, oxygen; blue, nitrogen; and black, carbon atoms; and the larger purple circles denote side chains for amino acids. Hydrogen bonds are denoted by dashed lines. Thick grey lines outline the external surface of the overall protein molecule, and the underlying structure represents two amino acids made adjacent by the protein architecture and bound by a hydrogen bond between the backbone amide (blue:white) of one amino acid and carbonyl (red:black) of the other. Water molecules are shown as angular red and white segments, with the coordination number g denoting the number of hydrogen bonds associated with a water molecule (g = 4 for bulk water; g < 4 for confined interfacial water). In the center, the structure of the protein causes local exposure and unfavorable hydration of the polar backbone, whereas the absence of such local interactions between water molecules and the well-wrapped proteins on the left and right reduces interfacial tension (interfacial water is bulk-like, retaining the maximum coordination g = 4).

b. Comparison of orthologous proteins with different levels of homo-oligomerization reveals that the PWIT is an indicator of the propensity for cooperative improvement/refinement of protein function through complexation. The ratio of P-P interfaces (small to large) was determined for pairs of orthologous proteins with different levels of oligomerization in different species (Supplementary Table 2) and plotted against the ratio of PWITs for the respective free subunits. The tight correlation (r²=0.94) reveals that interspecific differences in PWIT accompany differences in levels of oligomerization, thus providing a measure of potential allosteric or cooperative improvement of basic protein function. Complexes with cyclic rotational symmetry (C2, C3,…) can further oligomerize into complexes with dihedral (D2, D3,…) symmetry, as shown in the idealized diagrams in the lower right. For example, C2-complexes can dimerize into D2-complexes, trimerize into D3 complexes, etc., while a D3 complex can also be obtained by dimerization of C3-complex. For the PP interface and PWIT ratios examined, the interface for the subunit in the complex with lower-order symmetry is compared with that in the complex with higher-order symmetry, yielding analyses based on protein pairs contrasted within three groupings: C2 vs. D2, C2 vs. D3, and C3 vs. D3.

c. The SABHB patterns from two haemoglobins with different oligomerization levels in their native states are compared. In the bottom panels, the protein backbone is represented by virtual bonds in blue joining α-carbons, with well-protected BHBs shown as light grey and SABHBs as green lines joining the α-carbons of the paired residues. The ribbon representations of the human complex and dissociated subunit (chain A in PDB.2DN2, left and center, respectively) are included as aids to the eye, representing the structuring of the backbone in each subunit. The free subunit isolated from the tetramer in *Homo sapiens* (PDB.2DN2, chain A, center) has seven excess SABHBs (denoted by asterisks) when compared with the subunit within the tetrameric complex, where they are well-protected intermolecularly, alleviating interfacial tension. As a consequence of this better wrapping, the overall extent of structural deficiency (ν-value) for the subunit within the human complex is identical to that of the natively monomeric haemoglobin from the trematode *Fasciola hepatica* (PDB.2VYW). This raises the possibility that the accumulation of structural deficiencies in the mammalian haemoglobin subunit promoted the emergence of an oligomeric association as a means for reducing excess interfacial tension. The structural displays were obtained by uploading the PDB text files into the program YAPview, a displayer of local backbone desolvation of soluble proteins that can be downloaded from the link “Dehydron Calculator” at the site http://www.owlnet.rice.edu/~arifer/.

We argue that the emergence of unfavorable PWIs promotes the secondary recruitment of novel protein-protein (PP) associations that restore structural stability by reducing PWI. Under this hypothesis, complex organisms may frequently develop PP interactions not as immediate vehicles for novel adaptive functions, but as compensatory mechanisms for retaining key gene functions. Once in place, such physical contact between interacting proteins may provide a selective environment for the further emergence of entirely novel PP interactions underlying cellular and organismal complexities. Our suggestion that the hallmark of eukaryotic evolution, the origin of interactome complexity, may have arisen in part as a passive consequence of the enhanced power of drift reduces the need to invoke direct long-term selective advantages of phenotypic complexity⁶.

To gain insight into the evolution of interactome complexity, we derived quantitative measures of the PWIT as indicators of potential molecular interactivity. To estimate the PWIT of a protein, we computationally equilibrated the protein structure in surrounding water, using the function g(r) to represent the time-averaged coordination (number of hydrogen bonds) associated with a water molecule at position r (Figure 1a), and integrating over the entire protein surface all water molecules within a 10 angstrom radius (the thickness of four layers of water molecules). Compared with bulk water (where g = 4), interfacial water molecules may have reduced hydrogen-bonding opportunities (g < 4) and often counterbalance these losses by interacting with polar groups on the protein surface. Thus, the PWIT parameter integrates information on unfavorable local decreases in g and favorable polarization contributions from the protein to yield the free-energy cost, ΔG_if, of spanning the P-W interface (Methods). A high PWIT signals a high propensity for PP associations, which reduce the PWI area.

To validate the utility of PWIT as a measure of interactivity, we examined an exhaustive catalog of contact topologies for protein complexes with 1 to 6 subunits, with each topology being evaluated with one or more nonhomologous complexes using structures in the Protein Data Bank (PDB) (Supplementary Table 1). For each complex, we computed the total PP interface area after identifying the residues engaged in intermolecular contacts⁷. For each protein subunit, the PP interface is contained within the PWI region that generates tension in the free subunit, and there is a tight correlation between the surface areas for both regions, implying that regions on the protein surface generating PWIT (i.e., those with g < 4 for nearby water) actually promote associations (Supplementary Figs. 1, 2a). Next, we verified that protein surface regions generating PWIT coincide with the affinity-contributing regions at PP interfaces. To this end, we tested the value of PWIT as a promoter of protein associations by focusing on the interface for the 1:1 human growth hormone (hGH)-receptor complex⁸ (Supplementary Fig. 2b), for which the consequences of amino-acid substitutions have been extensively evaluated. Our analysis reveals a strong correlation between the change in PWIT induced by site-specific mutagenesis of interfacial residues and the association free energy difference induced in the interface between the hormone and its receptor (Supplementary Fig. 2c).

Comparison of orthologous proteins engaging in different levels of homo-oligomerization in different species⁹ further supports the view that PWIT serves as a measure of the propensity for PP association. The ratio of PP interfaces (lower to higher degrees of complexation; Supplementary Table 2) exhibits a strong positive correlation with the ratio of PWITs for the respective free subunits (Fig. 1b). As complexes with higher degrees of oligomerization arise from lower-order complexes, this implies that the degree of cooperativity among subunits correlates with the PWIT of the basic subunit.

Hydrophobic regions on protein surfaces obviously contribute to PWIT, but analysis of proteins exhibiting association propensity (Supplementary Table 2) shows that the regions generating 73±5% of the PWIT (Supplementary Information, Supplementary Figure 1) arise from solvent-accessible backbone hydrogen bonds (SABHBs). The resultant hydration of backbone polar groups (amides and carbonyls) causes a loss of coordination for local water molecules, which increases surface tension and creates an unstable PWI, as the cavities cannot accommodate a bulk-like water molecule¹⁰. As an example of how such a structural deficiency can be alleviated through a protein association, an isolated α-subunit of the human haemoglobin tetramer has seven SABHBs that become protected within the tetrameric complex, such that the ratio (ν) of SABHBs to total BHBs in the complex-associated subunit is the same as that in the natively monomeric unit for haemoglobin from the trematode Fasciola hepatica (Fig. 1c).

To evaluate whether the accumulation of structural deficiencies of proteins is generally encouraged by random genetic drift, and in turn enhances the propensity for establishing protein complexes, we examined a set of 106 orthologous water-soluble proteins (sequence identity >30%)¹¹,¹² with PDB-reported structures for at least two species. We considered 36 species with vastly different population sizes¹,², each containing proteins in at least 90 of the 106 orthologous groups (Supplementary Tables 3–5). Template-based three-dimensional structures for orthologs lacking PDB-reported structures were constructed by homology threading¹³,¹⁴, and evaluated, ranked, and selected according to the energetic proximity between template and model¹⁵. The accuracy of this homology-based prediction of PWIT was determined with a test set of proteins with PDB-reported structures from two species, subjecting one member of each orthologous pair to homology threading through the other. Comparison of the indirect and direct estimates of PWIT demonstrates that when sequence identities are >35%, the predicted PWIT diverges <10% from the more direct estimate for the same protein (Supplementary Fig. 3).

For each protein structure, g(r) was obtained as described in Methods, and the relative propensities for protein association across orthologs were then determined by assessing differences in the free energy cost ΔG_if among species. We estimated the relative complexation propensity M_j,n of a protein in ortholog group j (1, …,106) from species n (1, …, 36) by adopting E. coli as a reference species (n = 1):

$$M_{j,n} = \frac{(\Delta G_{if})_{j,n} - (\Delta G_{if})_{j,1}}{(\Delta G_{if})_{j,1}} \tag{1}$$

With this index, M_j,1 = 0 for all proteins in E. coli, and taxa with less well-wrapped proteins (and hence greater propensity for complexation) have positive values.
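As a minimal sketch of how the index in Eq. 1 and its species-level mean could be tabulated, the following Python fragment computes M for each ortholog group relative to a reference species and then averages over groups. The function names, the dictionary layout, and the ΔG_if numbers are hypothetical illustrations (arbitrary units), not values or code from this study.

```python
from statistics import mean

def complexation_propensity(dg_if, reference="E. coli"):
    """Return M[j][n] = (dG[j][n] - dG[j][ref]) / dG[j][ref] for each ortholog group j."""
    m = {}
    for group, by_species in dg_if.items():
        ref = by_species[reference]  # free-energy cost in the reference species
        m[group] = {sp: (dg - ref) / ref for sp, dg in by_species.items()}
    return m

def species_means(m):
    """Average M over the ortholog groups in which each species is represented."""
    per_species = {}
    for by_species in m.values():
        for sp, value in by_species.items():
            per_species.setdefault(sp, []).append(value)
    return {sp: mean(values) for sp, values in per_species.items()}

# Hypothetical free-energy costs (arbitrary units); NOT data from the paper.
dg_if = {
    "group_1": {"E. coli": 10.0, "C. elegans": 12.0, "H. sapiens": 15.0},
    "group_2": {"E. coli": 20.0, "H. sapiens": 25.0},
}
m = complexation_propensity(dg_if)
means = species_means(m)
```

By construction the reference species scores zero in every group, and a species missing from a group simply contributes no term to its mean, which mirrors the requirement that each species be represented in most, but not necessarily all, ortholog groups.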

The mean value of species-specific estimates of M_j,n over all proteins evaluated is negatively correlated with the approximate effective population sizes of species (Fig. 2a), given that the average ranking of the latter is prokaryotes > unicellular eukaryotes > invertebrates > vertebrates and land plants¹,². A specific example of a trend toward increasing structural openness with reduced population sizes is illustrated in Fig. 2b, where the SABHB patterns and ν-values for orthologs of the enzyme superoxide dismutase are compared across three species.

a. Potential for interactome complexity of 36 species with diverse population sizes (Supplementary Table 2), relative to *Escherichia coli*. To highlight the relative power of random genetic drift, bars are color-coded to reflect groupings of species in broad population-size categories.

b. Overall structural deficiency of orthologs of the enzyme superoxide dismutase (Mn), revealing a progressive accumulation of SABHBs in the orthologs of the bacterium *E. coli*, the nematode *C. elegans*, and *H. sapiens*. The upper ribbon representations illustrate the structural conservation across orthologs (respective PDB accession numbers 3ot7, 3dc6, 2adq). The conventional color coding is: red, blue, magenta and light blue for helix, β-strand, loop, and turn, respectively.

c. Average structural deficiency (ν-value) of protein orthologs for intracellular and free-living bacterial species. Species identities, progressing from left to right: α-Proteobacteria – Rickettsia typhi, Orientia tsutsugamushi, Anaplasma centrale str. Israel, *Wolbachia* sp. wRi, *Rhodospirillum centenum* SW, Orientia tsutsugamushi str. Ikeda, Magnetospirillum magneticum, Silicibacter TM1040, Erythrobacter litoralis; γ-Proteobacteria – *Buchnera aphidicola, Wigglesworthia brevipalpis*, Candidatus *Blochmannia pennsylvanicus, Marinomonas* MWYL1, *Escherichia coli, Pseudomonas aeruginosa.* Only proteins with orthologs across the full set of species within each group were considered for analysis (Supplementary Tables 6, 7).

The results from Fig. 2a and an additional analysis (Supplementary Fig. 4) support the hypothesis that large organisms with small population sizes experience a significant enough increase in the power of random genetic drift to magnify the accumulation of mild structural deficiencies in the form of SABHBs, resulting on average in proteins with a more solvent-exposed or “open” structure. By contrast, mutations to SABHBs are more frequently excluded by selection in species with larger population sizes (e.g., prokaryotes). Thus, because SABHBs are the main determinants of interfacial tension (Supplementary Fig. 1), the proteins of large organisms have a greater inherent tendency to form novel protein-protein associations (Fig. 1a). This suggests that increases in protein-network complexity in multicellular species may in part owe their origins to modifications to the intracellular selective environment induced by nonadaptive structural degradation of individual proteins.

One concern with the preceding interpretation is the order of events: does an initial degradation of architectural integrity of individual proteins in response to random genetic drift induce secondary selection for the recruitment of interacting partners, or does the emergence of cellular complexity (and increased protein interactivity) precede secondary changes in protein sequence to accommodate such interactions? One way to evaluate this matter is to compare proteins from related species that have experienced relatively recent divergences in effective population sizes but no major modifications in intracellular complexity or emergence of multicellularity.

To achieve this task, we compared orthologous genes from endosymbiotic/intracellular bacteria and their free-living relatives, as the former are thought to have experienced substantial reductions in effective population sizes¹⁶. Previous suggestions that intracellular bacteria experience elevated levels of random genetic drift have been based on ratios of substitution rates at silent and replacement sites, which can be biased indicators of the efficiency of selection if there is selection on silent sites. Although the lack of protein structural information for endosymbiotic species requires a sequence-based identification of SABHBs derived from reliable scores of native disorder propensity (Methods), the resultant analyses are broadly consistent with the hypothesis that an increase in the power of drift in microbes encourages the accumulation of structural defects in protein architecture (Fig. 2c). Free-living species, with larger effective population sizes, have consistently smaller ν-values for orthologous genes in both α- and γ-Proteobacteria. [Application of the same sort of analysis of disorder propensity across a set of 105 species and 541 proteins corroborates this result (Supplementary Figs. 5–7)].

Taken together, our analyses support the hypothesis that the range of population sizes experienced by natural populations is sufficient to induce significantly different patterns of evolution at the level of protein architecture. The resultant changes in the intracellular environment in small-N species provide an opportunity for the recruitment of stabilizing protein-protein interactions, yielding a plausible mechanism for the emergence of molecular complexities prior to their exploitation in phenotypic divergence⁹,¹⁷. This hypothesis does not deny a potentially significant role for natural selection in utilizing such novelties subsequent to their establishment, nor does it deny the fact that intramolecular compensatory mutations can alleviate some structural defects associated with SABHBs. However, our results do raise questions about the necessity of invoking an intrinsic advantage to organismal complexity, and provide a strong rationale for expanding comparative studies in molecular evolution beyond linear sequence analysis to evaluations of molecular structure.

METHODS SUMMARY

We determined the propensity of proteins to be engaged in associations that reduce the protein-water interface (PWI) by computing the protein-water interfacial tension (PWIT). This thermodynamic parameter gives ΔG_if, the free energy cost of spanning the PWI. The PWIT is computed as:

$$\Delta G_{if} = \frac{1}{2} \int \left\{ a\,|\nabla g|^{2} - |P[g(r)]|^{2} \right\} dr, \tag{2}$$

where the term ½a|∇g|², with a = 9.02 mJ/m at T = 298 K (Methods), accounts for tension-generating reductions in water coordination, and the polarization P[g(r)] accounts for dipole-electrostatic field interactions (Methods). For a given protein structure or template-based structural model, the field g = g(r) used in the numerical integration of Eq. 2 was determined by equilibrating the water-embedded structure within an NPT ensemble (with fixed parameters N = number of particles, P = pressure, and T = temperature; Methods)¹⁰,¹⁸,¹⁹. From structural coordinates, we determined the structural deficiencies (solvent-accessible backbone hydrogen bonds or SABHBs, Methods)²⁰ that generate 73±5% of the PWIT (Supplementary Information). We examined 106 groups of orthologous proteins identified using OrthoMCL¹¹,¹² for which there are PDB representatives from at least two species (usually E. coli and H. sapiens, Supplementary Tables 3–5). We considered 36 representative species, each containing proteins in at least 90 of the 106 ortholog groups. Template-based three-dimensional structures for orthologs lacking a PDB-reported structure¹⁴ were constructed using MODELLER¹³, with side chains directly positioned with SCWRL²¹. The template and resulting model were evaluated, ranked, and finally selected using ProSA¹⁵. The accuracy of homology models is shown in Supplementary Fig. 3. In cases where orthologous structural templates were unavailable, as with the comparison of endosymbionts to free-living species, a sequence-based inference of SABHBs was performed based on an established anticorrelation between backbone protection and disorder propensity (Supplementary Fig. 5)²². The cross validation of homology-based and disorder-based estimations of ν-values is given in Supplementary Fig. 6.

METHODS

Computation of protein-water interfacial tension

The parameter a in Eq. 2 is obtained from the interfacial tension of a large nonpolar sphere with radius θ in the limit θ/1 nm → ∞ (nm = nanometer). Thus, we get: a = 9.02 mJ/m = lim_{θ/1 nm → ∞} [γ(4πθ²)/∫½|∇g|²dr], where γ = 72 mJ/m² is the bulk surface tension of water at 298 K, and ∫|∇g|²dr = O(θ²) since ∇g ≠ 0 only in the vicinity of the interface. To determine the g-dependence of the polarization P = P(r), we adopt the Fourier-conjugate frequency space (ω-space) and represent the dipole correlation kernel K_p(ω) and the electrostatic field E = E(r) in this space. In contrast with other treatments²³, we note that P and E are indeed proportional but the proportionality constant is ω-dependent²⁴. Thus, in ω-space we get:

$$F(P)(\omega) = K_{p}(\omega)\, F(E)(\omega) \tag{3}$$

where F denotes the 3D Fourier transform, F(f)(ω) = (2π)^(−3/2) ∫ e^(iω·r) f(r) dr, and the kernel K_p(ω) is the Lorentzian K_p(ω) = (ε_b − ε_o)/(1 + (τ(r)c)²|ω|²), with τ(r)c the position-dependent dielectric relaxation scale (≈ 3 cm for τ = τ_b ≈ 100 ps; c = speed of light), ε_b the bulk permittivity, and ε_o the vacuum permittivity. Because P(r) satisfies the Debye relation ∇·(ε_oE + P)(r) = ρ(r), where ρ(r) is the charge density, Eq. 3 yields the following equation in r-space²⁵:

$$\nabla \cdot \left[ \int F^{-1}(K)(r - r')\, E(r')\, dr' \right] = \rho(r), \tag{4}$$

with K(ω)=ε_o+K_p(ω). The convolution ∫F⁻¹(K)(r−r')E(r')dr' captures the correlation of the dipoles with the electrostatic field. Note that Eq. 4 is not the Poisson-Boltzmann equation, which requires a proportionality between the fields E and P under the ad hoc assumption K(ω)≡constant.

Upon water confinement, the dielectric relaxation undergoes a frequency redshift arising from the reduction in hydrogen-bond partnerships that translates to a reduction in dipole orientation possibilities. Thus, at position r, the relaxation time is τ(r) = τ_b exp(B(g(r))/k_BT), where the kinetic barrier B(g(r)) = −k_BT ln(g(r)/4) yields τ(r) = τ_b (g(r)/4)⁻¹. Thus, for the charge distribution:

$$\rho(r) = \sum_{m \in L} 4\pi q_{m}\, \delta(r - r_{m}), \tag{5}$$

with L=set of charges on the protein surface labeled by index m, the g-dependent polarization is obtained from Eq. 4 (Supplementary Information):

$$\begin{aligned} P(r) &= \int F^{-1}(K_{p})(r - r')\, E(r')\, dr' \\ &= (2\pi)^{-3} \sum_{m \in L} \int dr'\, F^{-1}(K_{p})(r - r')\, \nabla_{r'} \int d\omega\, e^{-i\omega \cdot (r' - r_{m})}\, \frac{4\pi q_{m}}{|\omega|^{2} K(\omega)}. \end{aligned} \tag{6}$$
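The dependence of the dielectric response on water coordination can be illustrated numerically. The sketch below evaluates the relaxation time τ(r) = τ_b (g(r)/4)⁻¹, the corresponding relaxation scale τ(r)c, and the Lorentzian kernel K_p(ω) defined above. The function names and the choice of units (ps, cm, cm⁻¹) are assumptions made for illustration; the permittivities are left as caller-supplied parameters because no numerical values are given in the text.

```python
SPEED_OF_LIGHT_CM_PER_PS = 0.0299792458  # c expressed in cm per picosecond

def relaxation_time_ps(g, tau_bulk_ps=100.0):
    """tau(r) = tau_b * (g/4)**-1, with g the local water coordination (0 < g <= 4)."""
    if not 0.0 < g <= 4.0:
        raise ValueError("coordination g must lie in (0, 4]")
    return tau_bulk_ps * 4.0 / g

def relaxation_scale_cm(g, tau_bulk_ps=100.0):
    """Dielectric relaxation scale tau(r) * c, in cm."""
    return relaxation_time_ps(g, tau_bulk_ps) * SPEED_OF_LIGHT_CM_PER_PS

def lorentzian_kernel(omega_per_cm, g, eps_bulk, eps_vacuum, tau_bulk_ps=100.0):
    """K_p(omega) = (eps_b - eps_o) / (1 + (tau(r) c)^2 |omega|^2)."""
    scale = relaxation_scale_cm(g, tau_bulk_ps)
    return (eps_bulk - eps_vacuum) / (1.0 + (scale * omega_per_cm) ** 2)

# Bulk water (g = 4) keeps tau = tau_b; halving the coordination doubles tau.
tau_bulk = relaxation_time_ps(4.0)       # 100 ps
tau_confined = relaxation_time_ps(2.0)   # 200 ps
scale_bulk = relaxation_scale_cm(4.0)    # close to 3 cm
```

For bulk water the relaxation scale comes out at roughly 3 cm, matching the estimate quoted in the text, and any reduction in g lengthens τ and therefore suppresses the kernel at a fixed nonzero ω.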

Spatially dependent coordination g=g(r)

The time-averaged scalar field g=g(r) was obtained from classical trajectories generated by molecular dynamics (MD). The computations started with the PDB structure of a free (uncomplexed) protein molecule embedded in a pre-equilibrated cell of explicitly represented water molecules and counterions¹⁸,¹⁹. The MD trajectories were generated by adopting an integration time step of 2 fs in an NPT ensemble with box size 10³ nm³ and periodic boundary conditions²⁶. The box size was calibrated so that the solvation shell extended at least 10 Å from the protein surface at all times. The long-range electrostatics were treated using the Particle Mesh Ewald (PME) summation method²⁷. A Nosé-Hoover thermostat²⁸ was used to maintain the temperature at 300 K, and a TIP3P water model with the OPLS (Optimized Potentials for Liquid Simulations) force field was adopted¹⁸,¹⁹. The pressure was held constant at 1 atm by a dedicated barostat routine using a weak-coupling algorithm²⁹. After 300 ns of equilibration, g values averaged over a time span of 100 ns were determined for each point in space.
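Once g(r) has been sampled on a grid, the integral in Eq. 2 can be approximated numerically. The following is a deliberately simplified one-dimensional sketch, assuming a uniform grid, finite-difference gradients, and a trapezoidal rule; the actual calculation is three-dimensional, and the function names, grid, and input profiles here are hypothetical.

```python
def gradient(values, h):
    """Finite-difference derivative on a uniform grid (one-sided at the ends)."""
    n = len(values)
    grad = [0.0] * n
    grad[0] = (values[1] - values[0]) / h
    grad[-1] = (values[-1] - values[-2]) / h
    for i in range(1, n - 1):
        grad[i] = (values[i + 1] - values[i - 1]) / (2.0 * h)
    return grad

def trapezoid(values, h):
    """Trapezoidal-rule integral of samples on a uniform grid with spacing h."""
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))

def interfacial_free_energy_1d(g, p, h, a):
    """1D analogue of Eq. 2: 0.5 * integral of (a * |dg/dx|^2 - |P|^2) dx."""
    dg = gradient(g, h)
    integrand = [a * d * d - q * q for d, q in zip(dg, p)]
    return 0.5 * trapezoid(integrand, h)

# Toy profile: coordination rising linearly from 3 to 4 across a unit interval.
h = 0.1
g_profile = [3.0 + i * h for i in range(11)]
no_polarization = [0.0] * 11
dg_if = interfacial_free_energy_1d(g_profile, no_polarization, h, a=2.0)
```

In this toy profile the gradient term alone raises the free-energy cost, while a nonzero polarization array lowers it, reflecting the opposite signs of the two contributions in Eq. 2.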

PWIT as promoter of protein-protein associations

The PWIT computed using Eqs. 2, 6 is generated by interfacial hot spots of red-shifted dielectric relaxation (g(r)<4, τ(r)>τ_b). The most common spots involve hindered polar hydration generated by SABHBs (Fig. 1a). Taken collectively, the SABHBs contribute 73±5% to the interfacial tension (Supplementary Information). The results are validated by showing that the inferred patches of interfacial tension promote protein associations, a conclusion supported by the tight correlation (r²=0.83) between the total area of surface patches begetting PWIT (increasing the value of the integral in Eq. 2) in free complex subunits, and the total P-P interfacial area of protein complexes (Supplementary Fig. 2a). The relevance of PWIT as a molecular determinant of protein-protein interactions is further validated by showing that inferred tension patches actually coincide with hot spots at complex interfaces experimentally identified by mutational scanning (Supplementary Figs. 2b,c).

Identification of SABHBs in soluble proteins

The extent of protection of a backbone hydrogen bond, ζ, was computed directly from PDB structural coordinates by determining the number of side-chain nonpolar groups contained within a desolvation domain around the bond²⁰,²². This domain was defined as two intersecting spheres of fixed radius (~thickness of three water layers) centered at the α-carbons of the residues paired by the hydrogen bond. In structures of soluble proteins, backbone hydrogen bonds are protected on average by ζ = 26.6 ± 7.5 nonpolar groups for a desolvation sphere radius of 6 Å. SABHBs lie in the lower tail of the distribution, i.e., their microenvironment contains 19 or fewer nonpolar groups (ζ ≤ 19), so their ζ-value is below the mean minus one standard deviation.
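The counting rule described above can be sketched in a few lines of Python. This is an illustration only: it assumes the desolvation domain is the union of the two spheres, takes pre-extracted coordinates (in Å) as plain tuples rather than parsing a PDB file, and uses hypothetical function names.

```python
from math import dist

def protection(ca_i, ca_j, nonpolar_groups, radius=6.0):
    """zeta: nonpolar groups inside either sphere centered at the two alpha-carbons."""
    return sum(
        1
        for atom in nonpolar_groups
        if dist(atom, ca_i) <= radius or dist(atom, ca_j) <= radius
    )

def is_sabhb(zeta, threshold=19):
    """A backbone hydrogen bond counts as solvent-accessible when zeta <= 19."""
    return zeta <= threshold

def nu_value(zetas, threshold=19):
    """nu: ratio of SABHBs to all backbone hydrogen bonds in a structure."""
    return sum(1 for z in zetas if is_sabhb(z, threshold)) / len(zetas)

# Hypothetical coordinates in angstroms, NOT taken from any PDB entry.
ca_i, ca_j = (0.0, 0.0, 0.0), (5.0, 0.0, 0.0)
groups = [(1.0, 0.0, 0.0), (8.0, 0.0, 0.0), (20.0, 0.0, 0.0), (2.5, 5.0, 0.0)]
zeta = protection(ca_i, ca_j, groups)
nu = nu_value([10, 19, 20, 30])
```

A group lying within both spheres is counted once, which is why the test is written as a logical "or" over the two distances rather than as a sum of two per-sphere counts.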

Sequence-based identification of SABHBs

SABHBs represent structural vulnerabilities that have been characterized as belonging to a twilight zone between order and native disorder. This characterization is justified by a strong correlation between intramolecular hydrogen-bond protection, ζ, and propensity for structural disorder (f_d) (Supplementary Fig. 5). The correlation reveals that the inability to exclude water intramolecularly from pre-formed hydrogen bonds is causative of the loss of structural integrity. The disorder propensity is accurately quantified by a sequence-based score generated by the program PONDR-VLXT³⁰, a predictor of native disorder that takes into account residue attributes such as hydrophilicity, aromaticity, and their distribution within the window interrogated. The disorder score (0 ≤ f_d ≤ 1) is assigned to each residue within a sliding window, representing the predicted propensity of the residue to be in a disordered region (f_d = 1, certainty of disorder; f_d = 0, certainty of order). Only 6% of 1100 nonhomologous PDB proteins gave false positive predictions of disorder in sequence windows of 40 amino acids²²,³⁰. The strong correlation (Supplementary Fig. 5) between the disorder score of a residue and extent of protection of the hydrogen bond engaging the residue (if any) provides a sequence-based method of inference of SABHBs and supports the picture that such bonds belong to an order-disorder twilight zone²². Thus, SABHBs can be safely inferred in regions where the disorder score lies in the range 0.35 ≤ f_d < 0.95, which corresponds to a marginal BHB protection with 7 ≤ ζ ≤ 19 (Supplementary Fig. 5).
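The thresholding step at the end of this procedure can be illustrated as follows. The sketch takes a list of per-residue disorder scores as given (it does not call PONDR-VLXT) and flags the residues whose score falls in the twilight range 0.35 ≤ f_d < 0.95; the function name and the example scores are hypothetical.

```python
def twilight_residues(disorder_scores, low=0.35, high=0.95):
    """Indices of residues whose disorder score f_d satisfies low <= f_d < high."""
    for f_d in disorder_scores:
        if not 0.0 <= f_d <= 1.0:
            raise ValueError("disorder scores must lie in [0, 1]")
    return [i for i, f_d in enumerate(disorder_scores) if low <= f_d < high]

# Hypothetical per-residue scores, NOT output of an actual predictor run.
scores = [0.10, 0.35, 0.50, 0.95, 0.99]
flagged = twilight_residues(scores)
```

The lower bound is inclusive and the upper bound exclusive, following the range stated in the text, so residues predicted to be fully disordered (f_d ≥ 0.95) are not treated as candidates for SABHBs.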

Evaluation of homology models

The homology models based on template PDB structures from orthologous proteins were evaluated, ranked, and ultimately selected using ProSA¹⁵, based on the minimization of (Z_mod − Z_temp)/Z_temp, where Z_mod, Z_temp are the Z-scores of model and template. The Z-score of a structure or template-based model is the energetic gap between the structure and an average over an ensemble of random conformations for the protein chain¹⁵.
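The selection criterion can be sketched as a simple ranking. The fragment below assumes that "minimization" refers to the magnitude of the relative Z-score gap (an interpretation, since the sign convention is not stated) and that Z-scores have already been obtained from ProSA; the function names and the numbers are hypothetical.

```python
def relative_z_gap(z_model, z_template):
    """|(Z_mod - Z_temp) / Z_temp|: relative energetic distance of model from template."""
    return abs((z_model - z_template) / z_template)

def rank_models(candidates):
    """Sort (name, Z_mod, Z_temp) tuples from smallest to largest relative gap."""
    return sorted(candidates, key=lambda c: relative_z_gap(c[1], c[2]))

# Hypothetical Z-scores, NOT values from the study.
candidates = [
    ("model_a", -6.0, -8.0),
    ("model_b", -7.5, -8.0),
    ("model_c", -9.5, -8.0),
]
ranking = rank_models(candidates)
best = ranking[0]
```

Ranking the full list rather than returning only the minimum keeps the evaluated, ranked, and selected steps visible as separate stages.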

Supplementary Material

NIHMS279320-supplement-1.pdf (787.8 KB, pdf)

Acknowledgments

A. F. was supported by National Institutes of Health grant R01GM072614, and by the Institute of Biophysical Dynamics and the Department of Computer Science at The University of Chicago. M. L. was supported by National Institutes of Health grant R01GM036827 and National Science Foundation grant EF-0827411.

References

1. Lynch M, Conery JS. The origins of genome complexity. Science. 2003;302:1401–1404. doi: 10.1126/science.1089370.
2. Lynch M. The Origins of Genome Architecture. Sinauer Assocs., Inc.; Sunderland, MA: 2007.
3. Stoltzfus A. On the possibility of constructive neutral evolution. J. Mol. Evol. 1999;49:169–181. doi: 10.1007/pl00006540.
4. Gray MW, Lukes J, Archibald JM, Keeling PJ, Doolittle WF. Cell biology. Irremediable complexity? Science. 2010;330:920–921. doi: 10.1126/science.1198594.
5. Rowlinson JS, Widom B. Molecular Theory of Capillarity. Oxford University Press; New York: 1982.
6. Lynch M. The frailty of adaptive hypotheses for the origins of organismal complexity. Proc. Natl. Acad. Sci. USA. 2007;104(Suppl):8597–8604. doi: 10.1073/pnas.0702207104.
7. Levy ED, Pereira-Leal JB, Chothia C, Teichmann SA. 3D Complex: a structural classification of protein complexes. PLoS Comput. Biol. 2006;2:e155. doi: 10.1371/journal.pcbi.0020155.
8. Clackson T, Ultsch MH, Wells JA, de Vos AM. Structural and functional analysis of the 1:1 growth hormone:receptor complex reveals the molecular basis for receptor affinity. J. Mol. Biol. 1998;277:1111–1128. doi: 10.1006/jmbi.1998.1669.
9. Levy ED, Boeri Erba E, Robinson CV, Teichmann SA. Assembly reflects evolution of protein complexes. Nature. 2008;453:1262–1265. doi: 10.1038/nature06942.
10. Fenimore PW, Frauenfelder H, McMahon BH, Young RD. Bulk solvent and hydration-shell fluctuations, similar to α- and β-fluctuations in glasses, control protein motions and functions. Proc. Natl. Acad. Sci. USA. 2004;101:14408–14413. doi: 10.1073/pnas.0405573101.
11. Ostlund G, et al. InParanoid 7: new algorithms and tools for eukaryotic orthology analysis. Nuc. Acids Res. 2010;38:D196–203. doi: 10.1093/nar/gkp931.
12. Gabaldon T, et al. Joining forces in the quest for orthologs. Genome Biol. 2009;10:403. doi: 10.1186/gb-2009-10-9-403.
13. Sali A, Blundell TL. Comparative protein modeling by satisfaction of spatial restraints. J. Mol. Biol. 1993;234:779–815. doi: 10.1006/jmbi.1993.1626.
14. Zhou H, Skolnick J. Improving threading algorithms for remote homology modeling by combining fragment and template comparisons. Proteins. 2010;78:2041–2048. doi: 10.1002/prot.22717.
15. Wiederstein M, Sippl MJ. ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins. Nuc. Acids Res. 2007;35:W407–410. doi: 10.1093/nar/gkm290.
16. Moran NA. Accelerated evolution and Muller's ratchet in endosymbiotic bacteria. Proc. Natl. Acad. Sci. USA. 1996;93:2873–2878. doi: 10.1073/pnas.93.7.2873.
17. Kuriyan J, Eisenberg D. The origin of protein interactions and allostery in colocalization. Nature. 2007;450:983–990. doi: 10.1038/nature06524.
18. Rizzo RC, Jorgensen WL. OPLS all-atom model for amines: resolution of the amine hydration problem. J. Am. Chem. Soc. 1999;121:4827–4836.
19. Jorgensen WL, Chandrasekhar J, Madura J, Impey RW, Klein ML. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 1983;79:926–935.
20. Fernández A, Berry RS. Golden rule for buttressing vulnerable soluble proteins. J. Proteome Res. 2010;9:2643–2648. doi: 10.1021/pr100089t.
21. Canutescu AA, Shelenkov A, Dunbrack RL. A graph-theory algorithm for rapid protein side-chain prediction. Protein Sci. 2003;12:2001–2014. doi: 10.1110/ps.03154503.
22. Pietrosemoli N, Crespo A, Fernández A. Dehydration propensity of order-disorder intermediate regions in soluble proteins. J. Proteome Res. 2007;6:3519–3526. doi: 10.1021/pr070208k.
23. Schutz CN, Warshel A. What are the dielectric constants of proteins and how to validate electrostatic models? Proteins: Str. Funct. Genet. 2001;44:400–417. doi: 10.1002/prot.1106.
24. Scott R, Boland M, Rogale K, Fernández A. Continuum equations for dielectric response to macromolecular assemblies at the nanoscale. J. Phys. A. 2004;37:9791–9803.
25. Fernández A, Sosnick TR, Colubri A. Dynamics of hydrogen-bond desolvation in folding proteins. J. Mol. Biol. 2002;321:659–675. doi: 10.1016/s0022-2836(02)00679-4.
26. Lindahl E, Hess B, Van der Spoel D. GROMACS 3.0: a package for molecular simulation and trajectory analysis. J. Mol. Model. 2001;7:302–317.
27. Darden T, York D, Pedersen L. Particle mesh Ewald: an N·log(N) method for Ewald sums in large systems. J. Chem. Phys. 1993;98:10089–10092.
28. Hoover WG. Canonical dynamics: equilibrium phase-space distributions. Phys. Rev. A. 1985;31:1695–1697. doi: 10.1103/physreva.31.1695.
29. Berendsen HJ, Postma JP, van Gunsteren WF, DiNola A, Haak JR. Molecular dynamics with coupling to an external bath. J. Chem. Phys. 1984;81:3684–3690.
30. Li X, Romero P, Rani M, Dunker AK, Obradovic Z. Predicting protein disorder for N-, C-, and internal regions. Genome Informatics. 1999;10:30–40.
