Biophysical Journal. 2014 Sep 2;107(5):1217–1225. doi: 10.1016/j.bpj.2014.07.020

Smooth Functional Transition along a Mutational Pathway with an Abrupt Protein Fold Switch

Christian Holzgräfe, Stefan Wallin
PMCID: PMC4156676  PMID: 25185557

Abstract

Recent protein design experiments have demonstrated that proteins can migrate between folds through the accumulation of substitution mutations without visiting disordered or nonfunctional points in sequence space. To explore the biophysical mechanism underlying such transitions we use a three-letter continuous protein model with seven atoms per amino acid to provide realistic sequence-structure and sequence-function mappings through explicit simulation of the folding and interaction of model sequences. We start from two 16-amino-acid sequences folding into an α-helix and a β-hairpin, respectively, each of which has a preferred binding partner with 35 amino acids. We identify a mutational pathway between the two folds, which features a sharp fold switch. By contrast, we find that the transition in function is smooth. Moreover, the switch in preferred binding partner does not coincide with the fold switch. Discovery of new folds in evolution might therefore be facilitated by following fitness slopes in sequence space underpinned by binding-induced conformational switching.

Introduction

Most proteins fold robustly into a single, stable three-dimensional structure determined by their amino-acid sequence. There is a growing number of examples, however, of proteins with a unique ability to undergo a global conformational transition from one fold to another (1). This ability to switch between folds can be triggered by changes in the environment and allows these proteins to perform an alternative set of biological functions. The switch of lymphotactin, for example, from a monomeric chemokine fold to a dimeric β-sandwich also changes its preferred binding partner (2). In contrast to the irreversible conformational transitions of protein misfolding and aggregation, fold switching is a reversible process. In the case of lymphotactin, the relative population of the two folds can be controlled by changing the temperature or salt concentration (2).

Important insight into fold switching has come from protein design experiments. A notable example is a series of experiments by Alexander et al. (3, 4) and He et al. (5), who took as a starting point two different domains of protein G with little sequence similarity, GA and GB, folding into two different structures (3α and α + 4β, respectively). The authors mutated both domains, making them increasingly similar in sequence yet maintaining their respective folds. Remarkably, they reached a point where a GA variant and a GB variant differed in only one amino-acid position, thus demonstrating that there is a mutational pathway between the two folds along which the structure switches in response to a single-point mutation.

The possibility of triggering proteins to switch their folds by point mutations has implications for protein evolution and the organization of sequence-structure space. In particular, it makes plausible the idea that proteins can migrate between folds mainly by accumulating substitution mutations, along the lines originally proposed by Smith (6). This view is supported by recent identification of naturally occurring homologous proteins with different structures (7, 8, 9, 10). However, the biophysical mechanisms underlying evolutionary transitions between the neutral nets of different folds, i.e., the connected network of sequences folding to the same structure (11, 12), remain poorly understood (13, 14, 15, 16, 17, 18, 19). For instance, it remains unclear how evolution is able to guide proteins through the narrow passages in sequence space between folds while maintaining functionally relevant proteins.

Here we examine the properties of a mutational pathway between two elementary folds within a self-contained modeling framework. As a biophysical basis for scoring stability and function, we use a reduced continuous protein model with three amino-acid types (20), which explicitly captures key aspects of folding, including secondary structure and hydrophobic core formation (21, 22), and yet allows the thermodynamic behavior of many sequences to be determined.

Our starting point is two ideally amphipathic sequences folding into an α-helix and a β-hairpin, respectively. We find that there is a direct mutational pathway between the sequences, and it does not pass through unstable or nonfunctional sequences. Importantly, the switch points for fold and function do not coincide and we therefore distinguish between neutral and functional nets, as illustrated in Fig. 1. In contrast to neutral nets, we find that functional nets can overlap in sequence space. Taken together, this means that potential evolutionary bridge sequences with a multifunctional character might be located elsewhere than at the fold switch point. Across the switch in fold, we find a gradual change of function, which might facilitate evolutionary transitions between folds.

Figure 1.


Schematic view of our selected mutational pathway between the two model sequences N1 and A1, folding into a β-hairpin and an α-helix, respectively, and the associated (A) neutral and (B) functional nets. Along the pathway, neighboring sequences (circles) differ in a single position and are connected (solid lines). The pathway exhibits a fold switch between P1 and P2, where the native structure changes abruptly. In a neutral net, all sequences share the same fold and are connected via point mutations (11). Similarly, in a functional net, all sequences share the same binding partner and are connected. As natural binding partners to N1 and A1, we use the model proteins TN and TA, respectively. In contrast to the abrupt fold switch, the functional transition along the path is smooth and, moreover, it occurs within the α-helix neutral net. An overlap of the two functional nets represents bifunctional sequences (e.g., P5). P5 undergoes a helix-to-hairpin fold switch upon binding to TN. (Ribbon-and-stick representation) Minimum-energy structures for (A) N1 and A1 and (B) TN and TA; (red) h, (gray) p, and (green) t amino-acid types; and (spheres) Cβ atoms of the h amino acids.

Materials and Methods

Protein model

The calculations in this work are performed using the three-letter (p, h, and t) continuous protein model developed and described in detail in Bhattacherjee and Wallin (20). Geometrically, the model combines a detailed backbone representation, including explicit N, Cα, C′, O, Hα1, and Hα2 atoms, with a simplified side-chain representation consisting of a single, enlarged Cβ atom. Bond lengths and angles are held fixed at standard values obtained from a statistical analysis of experimentally solved protein structures, such that the conformation C of an N-amino-acid chain is determined by the 2N backbone torsion angles ϕi and ψi, i = 1,…, N. The amino-acid types p (polar) and h (hydrophobic) include the Cβ atom and are geometrically related to alanine, whereas t (turn) instead includes an Hα2 atom and is thus essentially a glycine residue. The energy function can be written as a sum of four terms:

$$E(C) = E_{\mathrm{ev}} + E_{\mathrm{loc}} + E_{\mathrm{hb}} + E_{\mathrm{hp}}. \tag{1}$$

The first term, Eev, implements soft (σev/r)^12 repulsions between all pairs of atoms using typical van der Waals atom radii, except for Cβ-Cβ pairs, for which σev = 5.0 Å. The second term, Eloc, represents interactions between partial charges on neighboring peptide planes and is included to give a good local description of the polypeptide chain. The last two terms, Ehb and Ehp, represent backbone-backbone hydrogen bonding and effective hydrophobic attractions, respectively, and drive both folding and association in the model. A hydrogen bond is modeled by a directionally dependent attraction between NH and CO groups, with the energy minimum occurring at the distance rOH = 2.0 Å and the angles αNHO = αCOH = 180°. The Cβ atoms of hydrophobic amino acids attract each other pairwise through a Lennard-Jones-like term with the energy minimum at rCβCβ = 5.0 Å, whereas all other Cβ-Cβ pairs are purely repulsive through the excluded-volume term.
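The two Cβ-Cβ terms can be illustrated with a minimal Python sketch. It is not the model's implementation: the exact functional forms and energy scales are not given here, so the sketch assumes a standard 12-6 Lennard-Jones form εhp[(r0/r)^12 − 2(r0/r)^6] for h-h pairs, which has its minimum −εhp at r = r0 = 5.0 Å, and the soft repulsion εev(σev/r)^12 with σev = 5.0 Å for all other pairs. The function name `cb_pair_energy` and the prefactors `eps_ev` and `eps_hp` are hypothetical.

```python
def cb_pair_energy(r, type_i, type_j, sigma_ev=5.0, r_min=5.0, eps_ev=1.0, eps_hp=1.0):
    """Hypothetical Cβ-Cβ pair energy (model units) at separation r (Å).

    Assumed forms: a 12-6 Lennard-Jones term with its minimum at r_min for
    h-h pairs, and the soft (sigma_ev / r)**12 repulsion for all other pairs.
    The prefactors eps_ev and eps_hp are placeholders, not fitted values.
    """
    if "t" in (type_i, type_j):
        raise ValueError("t amino acids carry no Cβ atom")
    if type_i == "h" and type_j == "h":
        x6 = (r_min / r) ** 6
        return eps_hp * (x6 * x6 - 2.0 * x6)  # equals -eps_hp at r = r_min
    return eps_ev * (sigma_ev / r) ** 12      # purely repulsive


# The h-h attraction is deepest at 5.0 Å; a p-h pair only repels.
print(cb_pair_energy(5.0, "h", "h"))  # -1.0
print(cb_pair_energy(5.0, "p", "h"))  # 1.0
```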

Monte Carlo folding simulations

Calculations of the folding thermodynamics of various sequences σ are carried out using simulated tempering (ST) (23, 24, 25), an expanded-ensemble Monte Carlo (MC) method that enhances conformational sampling by making the temperature T a dynamic variable. ST samples the joint probability distribution

$$P(C,k) \propto \exp\left[-\frac{E(C)}{k_B T_k} + g_k\right], \tag{2}$$

where k = 0,…, M − 1 is a temperature index. The simulation parameters gk determine the marginal distribution P(k) and are selected such that P(k) ≈ 1/M. In ST, MC updates operate either on the temperature (k → k′) or on the chain conformation (C → C′), and each update is subject to a Metropolis accept/reject step. We use two different types of conformational updates: a pivot move, in which a single ϕi or ψi angle is selected and assigned a new random value, and a semilocal move, in which eight consecutive ϕi, ψi angles are turned in a coordinated way to yield approximately local chain deformations (26). We use M = 8 temperatures between kBT0 = kBTlow = 0.43 and kBT7 = 0.65. For each of the 144 sequences in our restricted sequence space, we performed four independent runs, each of 4 × 10^8 elementary MC steps. For the sequences A1 and N1, we performed five independent runs of 10^9 steps each. For TA and TN, we used kBT0 = 0.47 and kBT7 = 0.7, and performed at least three independent simulations of 10^9 steps each.
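As a rough illustration of the ST updates described above, the following Python sketch samples a hypothetical one-dimensional double-well energy instead of a protein chain (kB = 1). Temperature moves at fixed conformation are accepted with probability min{1, exp[−(1/Tk′ − 1/Tk)E + gk′ − gk]}, which follows directly from Eq. 2. The geometric spacing of the eight temperatures between 0.43 and 0.65 is an assumption, and the weights gk are simply set to zero; in a real run they would be tuned so that P(k) ≈ 1/M, a step not shown here. All function names are hypothetical.

```python
import math
import random


def st_temperature_acceptance(E, beta_k, beta_new, g_k, g_new):
    """Acceptance probability for k -> k' at fixed conformation, given
    P(C, k) proportional to exp[-beta_k * E(C) + g_k] (Eq. 2, k_B = 1)."""
    log_ratio = -(beta_new - beta_k) * E + (g_new - g_k)
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)


def simulated_tempering(energy, x0, temps, g, n_steps, step=0.5, seed=0):
    """Toy ST sampler for a one-dimensional coordinate x (k_B = 1).

    Each elementary step is either a conformational update x -> x' at the
    current temperature or an attempted move to a neighbouring index k +/- 1.
    Returns the number of steps spent at each temperature index.
    """
    rng = random.Random(seed)
    betas = [1.0 / T for T in temps]
    x, k = x0, 0
    E = energy(x)
    visits = [0] * len(temps)
    for _ in range(n_steps):
        if rng.random() < 0.5:
            # Conformational update, accepted with probability min(1, exp(-beta_k * dE))
            x_new = x + rng.uniform(-step, step)
            E_new = energy(x_new)
            if rng.random() < math.exp(min(0.0, -betas[k] * (E_new - E))):
                x, E = x_new, E_new
        else:
            # Temperature update; proposals outside 0..M-1 are rejected
            k_new = k + rng.choice((-1, 1))
            if 0 <= k_new < len(temps):
                if rng.random() < st_temperature_acceptance(E, betas[k], betas[k_new], g[k], g[k_new]):
                    k = k_new
        visits[k] += 1
    return visits


M = 8
# Hypothetical geometric ladder between k_B T_0 = 0.43 and k_B T_7 = 0.65.
temps = [0.43 * (0.65 / 0.43) ** (k / (M - 1)) for k in range(M)]
# Hypothetical double-well energy with untuned weights g_k = 0.
visits = simulated_tempering(lambda x: (x * x - 1.0) ** 2, 1.0, temps, [0.0] * M, 20_000)
```

Proposing k ± 1 symmetrically and rejecting proposals that fall off the ladder keeps detailed balance, so no extra correction is needed at the two end temperatures.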

Binding simulations

To determine the probability of selected sequences σ being bound to the two different targets TA and TN at Tlow, we used a variant of ST that works in the following way:

We first split the energy function into two parts, E = E0 + E1, where E1 includes all (inter- and intrachain) hydrophobic and hydrogen bond terms involving the ligand σ and E0 includes all remaining terms. We then simulate the probability distribution

$$P(C,k) \propto \exp\left[-\frac{E_0}{k_B T_0} - \frac{E_1}{k_B T_k} + g_k\right], \tag{3}$$

where Tk can be seen as a pseudo temperature. We use either M = 8 or M = 10, putting T0 = Tlow and TM−1 = Tmax. Hence, only conformations generated at T0 are physically meaningful. However, frequent visits to pseudo temperatures k > 0 ensure that bound and unbound, as well as unfolded, ligand conformations are properly sampled while the target remains folded, allowing the relative weight of bound and unbound states to be accurately determined. In particular, visiting unfolded ligand conformations is necessary because some σ switch fold upon binding, and a rather large free energy barrier separates the two folds at Tlow. We use kBTmax = 0.73 (M = 10) or 0.65 (M = 8). Simulations are performed in a box with side length L = 150 Å and periodic boundaries, and are started with the target in a folded conformation and σ in a random conformation. The MC move set is extended to include single-chain rigid-body translations and rotations. For each target and sequence pair, we performed 10 runs of 10^10 elementary MC steps each.
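Because Eq. 3 weights E0 at the fixed temperature T0 and only E1 at the pseudo temperature Tk, the energy E0 cancels from the acceptance ratio of a k → k′ move. The minimal Python sketch below, with hypothetical function names and a hypothetical four-rung ladder, makes this explicit (kB = 1). At k = 0 the weight reduces to the physical Boltzmann factor at T0, which is why only those conformations are used for observables.

```python
import math


def log_weight(E0, E1, k, T0, temps, g):
    """log P(C, k) up to a constant for Eq. 3 (k_B = 1): E0 is always
    weighted at the physical temperature T0, E1 at the pseudo temperature T_k."""
    return -E0 / T0 - E1 / temps[k] + g[k]


def pseudo_temperature_acceptance(E1, k, k_new, temps, g):
    """Acceptance probability for k -> k' at fixed conformation; E0 cancels."""
    log_ratio = -E1 * (1.0 / temps[k_new] - 1.0 / temps[k]) + g[k_new] - g[k]
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)


def conformational_acceptance(E0_old, E1_old, E0_new, E1_new, k, T0, temps, g):
    """Acceptance probability for C -> C' at fixed pseudo-temperature index k."""
    d = log_weight(E0_new, E1_new, k, T0, temps, g) - log_weight(E0_old, E1_old, k, T0, temps, g)
    return 1.0 if d >= 0.0 else math.exp(d)


# Hypothetical ladder with T_0 = T_low and untuned weights g_k = 0.
temps = [0.43, 0.50, 0.58, 0.65]
g = [0.0] * len(temps)
```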

Observables

The fractions of native contacts, Qα and Qβ, are calculated based on the two sets of hydrogen bonds present in the A1 and N1 native structures, respectively. For the α-helix, the set of contacts (i,j) between the CO group of amino acid i and the NH group of amino acid j is (2,6), (3,7), (4,8), (5,9), (6,10), (7,11), (8,12), (9,13), (10,14), and (11,15), and for the β-hairpin it is (3,14), (5,12), (7,10), (10,7), (12,5), and (14,3). We note that there is no overlap between the two contact sets. In the calculation of Qα and Qβ, we consider a hydrogen bond formed if (1) both the αNHO and αHOC angles are >90°, and (2) the distance rHO < 2.7 Å. Root-mean-square deviations (RMSD) and center-of-mass distances, RCM, are calculated over all Cα atoms. In defining unbound and bound states in our binding simulations, we use the fact that the probability distribution P(RCM) is bimodal (see Fig. S1 in the Supporting Material). The cutoff value RCM = 20 Å was selected because it lies close to the minimum between the two peaks in P(RCM), thereby naturally separating bound and unbound states.
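The native-state classification can be sketched in Python as follows, using the contact sets and hydrogen-bond criterion given above with 1-based amino-acid numbering. The function names are hypothetical, and the sketch assumes that the geometric test on each CO-NH pair has already produced a list of formed bonds.

```python
# Native hydrogen bonds (i, j): CO of amino acid i to NH of amino acid j (1-based).
ALPHA_CONTACTS = {(i, i + 4) for i in range(2, 12)}  # (2,6) ... (11,15)
BETA_CONTACTS = {(3, 14), (5, 12), (7, 10), (10, 7), (12, 5), (14, 3)}


def hbond_formed(r_ho, angle_nho, angle_hoc):
    """Observable H-bond criterion: both angles > 90 degrees and r_HO < 2.7 Å."""
    return angle_nho > 90.0 and angle_hoc > 90.0 and r_ho < 2.7


def fraction_native(formed, native):
    """Fraction of the native contact set present among the formed H-bonds."""
    return len(native & set(formed)) / len(native)


def native_state(formed, cutoff=0.8):
    """Classify a conformation as 'alpha', 'beta', or None (neither native state)."""
    if fraction_native(formed, ALPHA_CONTACTS) >= cutoff:
        return "alpha"
    if fraction_native(formed, BETA_CONTACTS) >= cutoff:
        return "beta"
    return None


# A helix missing its two N-terminal bonds still has Q_alpha = 0.8.
print(native_state(sorted(ALPHA_CONTACTS)[2:]))  # alpha
```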

Results

Continuous three-letter protein model

As a biophysical basis for our investigation into the properties of the sequence-structure and sequence-function relationships, we use the continuous protein chain model described in Materials and Methods (see also Bhattacherjee and Wallin (20)). The model has three amino-acid types: polar (p), hydrophobic (h), and turn (t). The driving forces for folding and binding are backbone hydrogen bonding and pairwise Cβ-Cβ attraction of h-amino acids. The model was developed to fold a set of sequences into diverse native structures (20)—meaning, in particular, it lacks an inherent preference for either α- or β-secondary structure. In combination with efficient MC conformational sampling techniques, the model therefore provides a self-contained framework for exploring protein fold switching.

Thermodynamics of N1 and A1

We take as a starting point two 16-amino-acid sequences, N1 and A1, which are shown in Table 1. The sequences are constructed using basic protein design principles for amphipathic β- and α-secondary structure, respectively (27, 28, 29). Accordingly, A1 is made up of h- and p-amino acids organized such that the h-amino acids are repeated at every three or four positions along the sequence (e.g., phpphh), allowing for the formation of an amphipathic α-helix. The N1 sequence has two regions with alternating p- and h-amino acids (phphph), making it possible to form a β-hairpin with all h side chains on the same side. At the center of N1, two flexible t-amino acids (no Cβ atom) accommodate a tight β-hairpin turn. Both sequences contain a total of six h-amino acids.
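One standard way to quantify the helical amphipathicity of A1 is an Eisenberg-style hydrophobic moment, here with a binary hydrophobicity (1 for h, 0 otherwise) and 100° per residue on a helical wheel. This is an illustration only, not the design procedure used for A1 and N1, and the function name is hypothetical. The Python sketch shows that the h positions of A1 cluster on one face of the helix, whereas those of N1 cancel exactly at this periodicity. The β-hairpin amphipathicity of N1 is not captured by a single-period moment, because its two strands run antiparallel across the turn.

```python
import cmath
import math

A1 = "pphpphhpphpphhpp"
N1 = "phphphpttphphphp"


def hydrophobic_moment(seq, delta_deg=100.0):
    """Mean hydrophobic moment of the h positions, placing residue n at
    angle n * delta on a helical wheel (100 degrees for an ideal α-helix)."""
    delta = math.radians(delta_deg)
    vectors = [cmath.exp(1j * n * delta) for n, aa in enumerate(seq) if aa == "h"]
    return abs(sum(vectors)) / len(vectors)


print(round(hydrophobic_moment(A1), 2))  # 0.75: h positions cluster on one face
print(round(hydrophobic_moment(N1), 2))  # 0.0: no helical periodicity
```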

Table 1.

Model proteins studied

Protein N Sequence
A1 16 pphpphhpphpphhpp
N1 16 phphphpttphphphp
TA 35 A1′-ttt-A1′
TN 35 A1-ttt-N1
A1′ 16 pphhphhpphhphhpp

Amino-acid sequences of A1 and N1 and their respective binding partners, TA and TN. In our reduced continuous protein model there are three types of amino acids: p (polar), h (hydrophobic), and t (turn). N is the number of amino acids.

In our model, the N1 and A1 sequences fold at low temperatures into a stable β-hairpin and α-helix, respectively, as can be seen from Fig. 2. Although the α-helix is slightly more stable than the β-hairpin, the folding of both N1 and A1 is nearly complete at the lowest studied temperature, Tlow. The N1 and A1 native state populations are Pβ ≈ 80% and Pα ≈ 85%, respectively. In defining the two native states, we use the minimum-energy conformations found in each case (see Fig. 1 A), which we refer to as the N1 and A1 native structures. A conformation is considered part of the α-helix native state if the fraction of native hydrogen bonds, Qα, is at least 80%, and similarly for the β-hairpin native state (Qβ ≥ 80%). We do not require all native contacts to be present because bonds close to the chain ends are easily broken by thermal fluctuations.

Figure 2.


Thermodynamic behavior of A1 and N1. (A) Temperature dependence of the specific heat capacity Cv = (〈E^2〉 − 〈E〉^2)/(NkBT^2), where E is the total energy, kB is the Boltzmann constant, N is the number of amino acids, and T is the temperature. (B) Fraction of configurations in the α-helix (Pα) and β-hairpin (Pβ) native states for A1 and N1, respectively, where a conformation belongs to the native state if ≥80% of the native structure hydrogen bonds are formed (see the min-E structures in Fig. 1 A). (C and D) Two-dimensional free energy surfaces F(E, RMSDα/β)/kBT = −ln P(E, RMSDα/β) for (C) N1 and (D) A1, where the root-mean-square deviations RMSDβ and RMSDα are calculated against the N1 and A1 native structures, respectively, and the probabilities P(E, RMSDα/β) are taken at kBTlow = 0.43.
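The two quantities plotted in Fig. 2 can be estimated from sampled energies and coordinates as in this minimal Python sketch (kB = 1 by default). It assumes equally weighted samples drawn at a single temperature T; the function names and the simple rectangular binning are hypothetical choices.

```python
import math
from collections import Counter


def heat_capacity(energies, T, N, kB=1.0):
    """Cv = (<E^2> - <E>^2) / (N kB T^2) from energies sampled at temperature T."""
    n = len(energies)
    mean_E = sum(energies) / n
    mean_E2 = sum(e * e for e in energies) / n
    return (mean_E2 - mean_E ** 2) / (N * kB * T ** 2)


def free_energy_surface(xs, ys, dx, dy):
    """F/kBT = -ln P on a 2D grid with bin widths dx, dy.
    Empty bins (F = +inf) are simply omitted from the returned dict."""
    counts = Counter((math.floor(x / dx), math.floor(y / dy)) for x, y in zip(xs, ys))
    total = sum(counts.values())
    return {bin_: -math.log(c / total) for bin_, c in counts.items()}


print(heat_capacity([0.0, 2.0], T=1.0, N=1))  # 1.0
```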

Restricted sequence-structure space

Next, we address the question of whether there are pathways in sequence space connecting A1 and N1 along which intermediate sequences maintain a high Pα or Pβ. This requires an exploration of the sequence-structure space. In searching for such mutational pathways, a natural sequence space to consider is the one spanned by A1 and N1. Because A1 and N1 differ at 10 amino-acid positions, i.e., their Hamming distance is H = 10, this space includes 3^10 = 59,049 sequences and is too large to explore fully even with our simplified model. To limit the number of sequences, we impose further restrictions on this space. We include sequences for which

  • 1.

    The amino-acid type, at each position, is taken either from N1 or A1;

  • 2.

    There is a total of six h-amino acids; and

  • 3.

    The h-amino acids are divided equally between the first and second halves, which leaves 144 sequences.

We calculated the thermodynamic behavior of these sequences using the same MC methods as for A1 and N1. Note that, due to the second restriction, there are many closely related sequences with ΔH = 2. We start by describing some general features of the sequence-structure space.
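The size of the restricted space follows from a simple count. In each half, A1 and N1 share one h-amino acid, so two more h-amino acids must be placed among the four positions where one of the two sequences has an h (6 ways), while the single p/t position in that half is free (2 ways). This gives 6 × 2 = 12 choices per half and 12 × 12 = 144 sequences. The Python sketch below (hypothetical helper names) enumerates the space directly as a check.

```python
from itertools import product

A1 = "pphpphhpphpphhpp"
N1 = "phphphpttphphphp"


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def restricted_space(a=A1, b=N1):
    """Sequences satisfying restrictions 1-3: each position taken from a or b,
    six h in total, and three h in each half."""
    half = len(a) // 2
    seqs = set()
    for pick in product((a, b), repeat=len(a)):
        s = "".join(src[i] for i, src in enumerate(pick))
        if s.count("h") == 6 and s[:half].count("h") == 3:
            seqs.add(s)
    return sorted(seqs)


space = restricted_space()
print(hamming(A1, N1), 3 ** hamming(A1, N1), len(space))  # 10 59049 144
```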

Neutral net is larger for the α-helix than for the β-hairpin

As a first step, we order the 144 sequences by their Hamming distance to N1, and calculate their Pα and Pβ values, at Tlow. The result is shown in Fig. 3 A. It turns out that there are more sequences with high Pα than with high Pβ. In fact, only a few mutational steps from N1 will render the stability of the β-hairpin rather low. By contrast, A1 can sustain several consecutive mutations without the α-helix becoming unstable.

Figure 3.


Restricted sequence-structure space and selected mutational pathway. (A) Populations of the α-helix (Pα) and β-hairpin (Pβ) native states for the 144 different sequences in our restricted sequence space and for the selected mutational pathway, as a function of Hamming distance to N1 (H), at Tlow. (Thin lines) Upper and lower limits for the obtained Pα- and Pβ-values, respectively; (thick lines) selected mutational pathway. The minimum and maximum native-state stabilities along the mutational pathway occur for H = 1 (Pβ = 0.69 ± 0.03) and H = 9 (Pα = 0.90 ± 0.02), respectively. (B) Individual amino-acid sequences of the pathway, from N1 to A1, and the names of intermediate sequences. (Blue circle) The position mutated from the previous sequence in the line above. (C) Free energy surface F(RMSDα, RMSDβ)/kBT = −ln P(RMSDα, RMSDβ) for (left) P1 and (right) P2, where the probabilities P(RMSDα, RMSDβ) are taken at Tlow. The step from P1 to P2 leads to a sharp population shift from β-hairpin to α-helix.

Neutral sets and nets have proved useful for describing the mutational robustness of protein folds, in particular for lattice model studies (11, 12). A neutral set contains all sequences with the same configuration as their unique ground state. The neutral set can often be split into components, each of which consists of a subset of sequences connected by single-point mutations. The largest such component is called the neutral net of the structure. Extending the concept of neutral nets to our continuous model can be readily done, e.g., by requiring a native state population P > Pcut for some cutoff Pcut, in lieu of ground state uniqueness. A glimpse of the full neutral nets is offered by our restricted sequence space and the result in Fig. 3 A indicates that the neutral net of the α-helix is larger than that of the β-hairpin.
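Defining a neutral net through a population cutoff amounts to a connected-components computation. The Python sketch below is hypothetical (names and toy populations are illustrative only): it keeps the sequences with P > Pcut and returns the largest component under a maximum Hamming distance, where `max_dist=2` would mirror the 2-edges used for our restricted space.

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def neutral_net(populations, p_cut, max_dist=1):
    """Largest connected component of {s : P(s) > p_cut}, where two sequences
    are linked if their Hamming distance is at most max_dist."""
    members = [s for s, p in populations.items() if p > p_cut]
    seen, largest = set(), set()
    for start in members:
        if start in seen:
            continue
        component, stack = set(), [start]
        seen.add(start)
        while stack:  # depth-first search over the thresholded sequences
            u = stack.pop()
            component.add(u)
            for v in members:
                if v not in seen and hamming(u, v) <= max_dist:
                    seen.add(v)
                    stack.append(v)
        if len(component) > len(largest):
            largest = component
    return largest


toy = {"ppp": 0.9, "pph": 0.85, "phh": 0.2, "hhh": 0.9}  # hypothetical P values
print(sorted(neutral_net(toy, p_cut=0.5)))  # ['pph', 'ppp']
```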

Bistable sequences have low total stability

Bistable sequences have been suggested to facilitate transitions between folds (30, 31). They are characterized by populating, to similar extents, two different folded structures. It is a priori not clear how the occurrence of structurally degenerate native states will impact the overall stability of a protein. A naive guess would be that the summed population of the two folds of a bistable sequence might be as high as, or perhaps higher than, the population of the single native state of an ordinary monostable protein. Fig. 4 shows, however, that this is likely not the case. The 12 sequences in our restricted data set that populate the α-helix and β-hairpin native states to similar extents (relative bistability B > 0.8; see Fig. 4 legend for definition) exhibit a relatively low total native state population (Ptot ≲ 0.5). Moreover, the 31 sequences with the highest stability (Ptot > 0.8) all have B < 0.05. In short, the most stable sequences exhibit little bistability and sequences with high relative bistability are not very stable. A similar trend can be seen between B and the stability of the most stable native state, S = max(Pα,Pβ). This parameter may be of more direct evolutionary relevance than Ptot, because proteins in a neutral net are typically under selective pressure to maintain at least a minimal level of stability of the fold (32).

Figure 4.


Native state degeneracy versus thermodynamic stability. For the 144 sequences in our restricted sequence space, we show (A) overall native state population Ptot = Pα + Pβ and (B) stability of the most stable native state S = max(Pα,Pβ), against a measure of relative bistability, B = 1 − |Pα − Pβ|/Ptot, obtained at Tlow. A sequence with B = 1 populates the two native states equally and a sequence with B = 0 populates only one of the states. The quantity S has a B-dependent upper limit, Su = 1 − B/2 (solid line). We find that sequences with high B are thermodynamically unstable, as measured by either Ptot or S.
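The upper limit Su follows in one line: for Pα ≥ Pβ, B = 2Pβ/Ptot, so S = Pα = Ptot − Pβ = Ptot(1 − B/2) ≤ 1 − B/2, since Ptot ≤ 1. The Python sketch below (hypothetical function name) computes the Fig. 4 quantities from a pair of native-state populations.

```python
def bistability_stats(p_alpha, p_beta):
    """Return (P_tot, B, S, S_u) as defined in the Fig. 4 caption."""
    p_tot = p_alpha + p_beta
    if p_tot <= 0.0:
        raise ValueError("B is undefined when neither native state is populated")
    B = 1.0 - abs(p_alpha - p_beta) / p_tot
    S = max(p_alpha, p_beta)
    S_u = 1.0 - B / 2.0  # upper bound on S, reached only when P_tot = 1
    return p_tot, B, S, S_u


# With P_tot = 1 the bound is saturated: S equals S_u.
print(bistability_stats(0.75, 0.25))  # (1.0, 0.5, 0.75, 0.75)
```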

Mutational pathway with abrupt fold switch

We turn now to the task of determining a mutational pathway between N1 and A1 along which maximal stability is maintained. To this end, we first organize our 144 sequences into a graph in which each sequence is represented by a vertex and any two vertices are connected by an edge if ΔH ≤ 2. Including 2-edges (with ΔH = 2) is necessary to obtain a fully connected graph in our restricted sequence space: because any single p↔h mutation changes the number of h-amino acids, the only 1-edges are p↔t mutations. A 2-edge corresponds to swapping a p- and h-pair within either the first or the second half of the sequence. We can now consider different paths between sequences in the graph. The shortest paths from N1 to A1 are taken in six steps (four 2-edges and two 1-edges, giving a distance 2 × 1 + 4 × 2 = 10). As it turns out, there are 2880 such paths.
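The path count can be reproduced with a breadth-first search that also counts shortest paths, as in the Python sketch below. The helper names are hypothetical, and the sketch assumes, following the description of 2-edges above, that a ΔH = 2 edge must be a p/h swap within one half. The simultaneous change at the two p/t positions, one in each half, also has ΔH = 2; admitting it as an edge would open five-step routes, so it is excluded here.

```python
from collections import deque
from itertools import product

A1 = "pphpphhpphpphhpp"
N1 = "phphphpttphphphp"
HALF = len(A1) // 2


def restricted_space(a=A1, b=N1):
    """Restrictions 1-3 of the text: positions from a or b, 3 + 3 h-amino acids."""
    seqs = set()
    for pick in product((a, b), repeat=len(a)):
        s = "".join(src[i] for i, src in enumerate(pick))
        if s.count("h") == 6 and s[:HALF].count("h") == 3:
            seqs.add(s)
    return seqs


def is_edge(s, t):
    """1-edge: one differing position. 2-edge (assumed): two differing
    positions in the same half, i.e. a p/h swap within that half."""
    diff = [i for i, (x, y) in enumerate(zip(s, t)) if x != y]
    if len(diff) == 1:
        return True
    return len(diff) == 2 and (diff[0] < HALF) == (diff[1] < HALF)


def count_shortest_paths(seqs, start, goal):
    """Breadth-first search that also counts shortest paths (in steps)."""
    nbrs = {s: [t for t in seqs if t != s and is_edge(s, t)] for s in seqs}
    dist, count = {start: 0}, {start: 1}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in nbrs[s]:
            if t not in dist:
                dist[t], count[t] = dist[s] + 1, count[s]
                queue.append(t)
            elif dist[t] == dist[s] + 1:
                count[t] += count[s]
    return dist[goal], count[goal]


print(count_shortest_paths(restricted_space(), N1, A1))  # (6, 2880)
```

The count also has a closed form: each half admits 2 × 2 = 4 ordered pairs of swaps, and the 2 + 2 + 1 + 1 operations can be interleaved in 6!/(2! 2!) = 180 ways, giving 4 × 4 × 180 = 2880.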

To select a path with optimal stability, we proceed in the following way:

  • 1.

    For each of the 2880 paths, we determine the average stability 〈S〉, where 〈〉 denotes an average over the sequences in the path. We select one of the top-scoring pathways, for which 〈S〉 = 0.81, meaning that all included sequences fold into either a stable β-hairpin or a stable α-helix.

  • 2.

    The 2-edges of the selected pathway are replaced by two consecutive point mutations. For each 2-edge, this can be done in two different ways, either going through an hh- or pp-pair. We keep the path providing the highest 〈S〉.
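Step 2 of the procedure above can be sketched in Python as follows (hypothetical names). Replacing a swap s → t by two point mutations passes through an intermediate with either seven (hh) or five (pp) h-amino acids, which therefore lies outside the restricted space defined by restriction 2. Because the rest of the path is fixed and both routes have the same length, choosing the intermediate with the larger S also maximizes 〈S〉; the stability table `S` is assumed to be given.

```python
def swap_intermediates(s, t):
    """For a 2-edge s -> t (a p/h swap), return both single-mutation detours,
    keyed 'hh' (h added first) and 'pp' (h removed first)."""
    i, j = [k for k, (x, y) in enumerate(zip(s, t)) if x != y]
    mids = [s[:k] + t[k] + s[k + 1:] for k in (i, j)]
    return {("hh" if m.count("h") > s.count("h") else "pp"): m for m in mids}


def refine_two_edge(s, t, S):
    """Choose the intermediate with the larger stability S[m]; with the rest of
    the path fixed, this also maximises the path average <S>."""
    return max(swap_intermediates(s, t).values(), key=lambda m: S[m])


N1 = "phphphpttphphphp"
step = "pphhphpttphphphp"  # N1 with the h at position 2 moved to position 3 (1-based)
detours = swap_intermediates(N1, step)
```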

The sequences making up the chosen mutational pathway are shown in Fig. 3 B. The path from N1 to A1 includes an abrupt switch in fold from β-hairpin to α-helix in response to the point mutation t8p. This mutation acts by both destabilizing the β-hairpin turn and stabilizing the α-helix fold.

It is of interest to further investigate the thermodynamic behavior of the switch point, consisting of the two sequences P1 and P2 (see Fig. 3 B). Both sequences exhibit a clearly dominant fold. The alternative folds, i.e., an α-helix for P1 and a β-hairpin for P2, are populated to some extent but the suppression is relatively strong (≈ 3–4 kBT), as shown in Figs. 3 C and S2. The sharpness of the fold switch can also be seen by examining the very different probability distributions of the hydrogen bond and hydrophobicity energy terms exhibited by P1 and P2 (see Fig. S3).

Scoring function with the targets TN and TA

As a way to quantify the functional capabilities of various sequences, we constructed two 35-amino-acid target sequences, TN and TA. These sequences fold at low T into α-β-β and α-α structures, respectively, as illustrated in Fig. 5 and by their minimum-energy conformations in Fig. 1 B. The idea behind these constructions is that the TA and TN folds will provide natural scaffolds to which an α-helix and a β-hairpin can preferentially bind, respectively. Motivated by this, we define PT(σ) as the probability of a sequence σ to be properly bound to a target T. In defining properly bound, we use different criteria for TA and TN: binding to TA requires a tightly packed α-helix (RCM < 20 Å and Qα ≥ 80%, where RCM is the center-of-mass distance between the two chains), whereas binding to TN requires a bound β-hairpin (RCM < 20 Å and Qβ ≥ 80%). To quantify the functional activity for a given target, we use the score fT(σ) = ln(PT/PU), where PU(σ) is the probability for σ to be unbound (RCM > 20 Å). The f-score can thus be seen as a binding free energy with reversed sign, in units of kBT, such that a natural criterion for significant functional activity is fT > 0, corresponding to PT > PU.
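A minimal Python sketch of the f-score follows, assuming equally weighted (RCM, Q) samples collected at Tlow (in the binding simulations, only the k = 0 samples). The minimum-image treatment of RCM in the periodic 150 Å box is an assumption about how distances are measured, and all names and the toy samples are hypothetical. Since PT and PU share the same normalization, their ratio is simply a ratio of counts.

```python
import math


def min_image(d, L):
    """Minimum-image displacement along one axis of a periodic box of side L."""
    return d - L * round(d / L)


def com_distance(com_a, com_b, L=150.0):
    """Center-of-mass distance R_CM between two chains in a periodic cubic box."""
    return math.sqrt(sum(min_image(a - b, L) ** 2 for a, b in zip(com_a, com_b)))


def functional_score(samples, q_cut=0.8, r_cut=20.0):
    """f_T = ln(P_T / P_U) from (R_CM, Q) samples taken at T_low.

    Q is Q_alpha when scoring against T_A and Q_beta when scoring against T_N.
    Samples with R_CM < r_cut but Q < q_cut count as neither bound nor unbound.
    """
    samples = list(samples)
    n_bound = sum(1 for r, q in samples if r < r_cut and q >= q_cut)
    n_unbound = sum(1 for r, q in samples if r > r_cut)
    if n_bound == 0 or n_unbound == 0:
        raise ValueError("need both properly bound and unbound samples")
    return math.log(n_bound / n_unbound)


toy = [(12.0, 0.9)] * 3 + [(45.0, 0.1)] + [(15.0, 0.3)] * 2  # hypothetical samples
print(functional_score(toy))  # ln(3 / 1)
```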

Figure 5.


Thermodynamic behavior of the N1 and A1 binding partners, TN and TA. Temperature dependence of the (A) specific heat capacity, Cv, and the (B) average root-mean-square deviations for TN and TA calculated to their respective native structures (see min-E structures in Fig. 1B), 〈RMSDαββ〉 and 〈RMSDαα〉. Folding temperatures, defined as the maximum in Cv, are kBT = 0.52 and 0.57 for TN and TA, respectively. (C and D) Free energy surfaces (C) F(E, RMSDαββ) for TN and (D) F(E, RMSDαα) for TA, at kBT = 0.47.

To estimate the functional scores fTA(σ) and fTN(σ) for different sequences σ, we use a procedure that is straightforward in principle, simply allowing the two chains to interact in a simulation box without any constraint on either chain. It turns out that the targets bind some sequences very tightly at Tlow, such that PT ≈ 1, making proper statistical sampling nontrivial. To overcome this challenge, we developed an expanded-ensemble MC method in which a dynamic pseudo temperature affects part of the energy function, including interchain attractions, allowing representative sampling of both bound and unbound states. Using this procedure, we calculate the functional scores of N1 and A1 and find that they bind their intended targets with comparable strengths, fTN(N1) = 5.4 ± 0.5 and fTA(A1) = 4.3 ± 0.1, respectively. By contrast, the functional scores for the cross-interactions are very low, i.e., fTA(N1) and fTN(A1) ≪ 0.

Summarizing our results so far, we have shown that within our theoretical framework N1 and A1 fold into two different native structures, and that TN and TA act as their respective preferred binding partners.

The switch points for fold and function do not coincide

What are the functional capabilities in the sequence space between N1 and A1? This question is relevant for potential evolutionary transitions between the two proteins, A1 and N1. To explore this question, we calculated fTA and fTN for the sequences along our N1-A1 mutational pathway. The result is shown in Fig. 6. We find that the functional scores tend to be lower at intermediate points along the path than at the start points, A1 and N1. There is, however, no entirely nonfunctional sequence; all sequences maintain f > 2 for at least one of the two targets throughout the path.

Figure 6.


Functional scores along the N1-A1 mutational pathway at Tlow. (A) The scores fTN and fTA (see text) as a function of Hamming distance H from N1. Native α-helix and β-hairpin populations for the sequences following the switch to an α-helix (H = 2–5 or P2–P5), when (B) in spatial proximity to TN (RCM < 20 Å) and (C) in isolation. The P2–P5 sequences exhibit binding-induced fold switching from an α-helix to a β-hairpin upon binding to TN.

It is especially interesting to compare the switch in fold from β- to α-structure, which is abrupt, to the switch from TN to TA function, which occurs gradually. Another difference is that there is a point along the path that exhibits significant bifunctionality (P5 at H = 5). Hence, a feature emerging from our model is that the evolutionary potentials underlying the TA and TN functions are smooth, in contrast to the sharp switch in fold. From a structural and dynamical perspective, the bifunctional sequence P5 is of particular interest. Although it folds into a stable α-helix on its own, it undergoes a binding-induced fold switch into a β-hairpin when associating with the target TN.

Discussion

There are several mechanisms by which protein evolution can occur, among them gene duplication and recombination. Here we have explored the biophysical properties of the simplest possible evolutionary mechanism, namely the accumulation of substitution mutations. To this end, we studied the sequence-structure-function space using a three-letter continuous protein model that folds model sequences into realistic protein structures and allows a natural way to score function through the binding strengths between different sequences.

We examined possible mutational pathways between two elementary protein folds, an α-helix and a β-hairpin. We find the following:

  • 1.

    A direct mutational pathway connects the two folds,

  • 2.

    The switch in fold does not coincide with the switch in function, and

  • 3.

    The functional transition along the mutational pathway is smooth in contrast to the switch in fold, which is abrupt.

Our results provide some insight into how new folds and functions can evolve. It is interesting to consider a hypothetical evolutionary transition between A1 and N1, which can be viewed in two different ways. From a fold-centric perspective, the transition is completed in a single mutational step (P1 to P2, or vice versa), and involves a direct passage between the α-helix and β-hairpin neutral nets (see Fig. 1 A). The transition is abrupt in the sense that it does not pass through a sequence for which there is a balance in the populations of the two folds. In taking instead a function-centric perspective, it is useful to define, in analogy with neutral nets, the functional net of a target T as the set of interconnected sequences that are functionally active with respect to T, e.g., with a binding affinity above some threshold. This reveals a rather different view of the A1-N1 transition, as illustrated in Fig. 1 B, where we have defined the functional nets of TA and TN using the criteria fTA > 0 and fTN > 0, respectively. In contrast to the neutral nets, we find that the two functional nets overlap in sequence space. The bi- or multifunctional sequences in regions where functional nets overlap might facilitate evolutionary transitions between different folds. Alternatively, they can be stable evolutionary points in themselves under circumstances where multifunctionality is beneficial.

The organization of sequence-structure space has been the target of several previous theoretical studies (11, 12, 13, 14, 15, 16, 17, 18, 19, 30, 31). In particular, the question of the frequency of connections between neutral nets has been addressed. Cao and Elber (19) investigated the flow of sequences in a network of structures from a Protein Data Bank sample and found that, on average, each fold is connected by a small number of mutations to tens of others. In a simple H/P lattice model, where sequence-structure space can be exactly enumerated, direct connections between neutral nets are common, but not highly frequent (18). These results raise the question of how the links between different folds are found by evolution. Our results suggest that smooth gradients in function, underpinned by binding-induced conformational switching, might provide a biophysical mechanism for locating the sharp and hard-to-find transitions between different protein folds.

The overlap of the two functional nets occurs in our system within the neutral net of one of the folds, i.e., where a single native structure dominates. This highlights a difference from recent work on lattice proteins (30, 31), which suggested that bistable sequences, whose native state encompasses two distinct structures, may serve as evolutionary bridges between folds. Bistable sequences correspond to overlaps of neutral nets (30). In our model, we find no sign of an overlap of the α-helix and β-hairpin neutral nets. Although many sequences in our restricted set do exhibit a high relative bistability B, their overall native state stabilities are generally so low that they belong to neither net. It is important to point out, however, that the fold switch studied in this work represents a rather dramatic change in structure, such that all native contacts are replaced upon switching. How the ability to support bistability depends on the degree of structural similarity between the two folds remains to be investigated.

The character of the fold transition in this work is reminiscent of that of the GA-GB system (3, 4, 5). Several different mutational paths between the GA and GB folds have been found (5). In all cases, the transition was sharp and did not pass through a bistable sequence. Further, as in our model system, the observed switch in preferred binding partner along the GA-GB mutational pathways does not occur at the fold switch point. The designed sequence GB98-T25I, for example, adopts a 3α fold but has lost its ability to bind the preferred 3α-fold target, HSA. Instead, GB98-T25I binds to the alternative target IgG, presumably switching to a 4β + α fold upon binding (5). In a similar vein, some of the sequences along our pathway (P2–P4) fold into a stable α-helix on their own but bind to TN as a β-hairpin, and bind only weakly to TA, emphasizing that membership in a neutral net is not a sufficient criterion for determining functional behavior. Furthermore, similar to our results, there are individual sequences along the GA-GB mutational paths that exhibit bifunctionality. GA98, for example, binds both the IgG and HSA targets, with slightly decreased affinity (5). The experimental results on the GA-GB system are thus in line with the notion that functional nets are more prone than neutral nets to overlap in sequence space.
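Once per-sequence fold populations and binding scores are available along a pathway, locating the fold switch and the switch in preferred binding partner is a matter of finding where each label first changes. The minimal Python sketch below assumes, as a simplification, that the dominant fold is the more populated of the two and that the preferred partner is the target with the higher score; `switch_points` and `first_switch` are hypothetical names, and the inputs are placeholders rather than data from the GA-GB studies or from our model.

```python
from typing import Optional, Sequence


def first_switch(labels: Sequence[str]) -> Optional[int]:
    """Return the first index whose label differs from the previous one, or None."""
    for k in range(1, len(labels)):
        if labels[k] != labels[k - 1]:
            return k
    return None


def switch_points(pop_alpha, pop_beta, f_ta, f_tn):
    """Return (fold_switch_index, binding_switch_index) along a mutational path.

    pop_alpha, pop_beta: populations of the two folds for each sequence on the path.
    f_ta, f_tn: binding scores of each sequence for the two targets.
    """
    dominant_fold = ["alpha" if a >= b else "beta" for a, b in zip(pop_alpha, pop_beta)]
    preferred_target = ["TA" if x >= y else "TN" for x, y in zip(f_ta, f_tn)]
    return first_switch(dominant_fold), first_switch(preferred_target)
```

Two different indices in the returned pair correspond to the situation discussed above, in which the preferred binding partner changes at a different point on the path than the dominant fold.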

Multifunctionality mediated by conformational flexibility in proteins has been observed in several cases (33, 34). Experimental characterization of structural diversity in proteins is therefore key to further insight into the emergence of new folds and functions. The gradual shift in functional strength seen here extends to sequences (e.g., P5) with a relatively stable single native state and only a minor population of an alternative fold. Because such minor populations may be hard to detect structurally, determining whether functional landscapes can indeed be smooth, and thereby drive fold transitions, might require direct assessment of functional activity. It would also be of particular interest to use our model to explore the character of functional landscapes for natively unstructured or intrinsically disordered proteins, which lack a stable native structure altogether. The function of disordered proteins is typically underpinned by a coupled folding-binding transition, a biophysical mechanism similar to binding-induced fold switching. It is therefore possible that this class of proteins can follow evolutionary pathways through potentially smooth functional landscapes without the constraint of staying within neutral nets.
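Why a minor conformer can dominate function can be illustrated with a textbook linked-equilibrium (conformational selection) argument, sketched here in Python. This is not the model's scoring function f; it assumes that the unbound sequence interconverts between an α and a β conformer with equilibrium constant K_conf = [β]/[α], and that each conformer binds the target with its own association constant. Summing the two bound species, [αT] = K_α[α][T] and [βT] = K_β[β][T], gives an apparent association constant K_app = P_α K_α + P_β K_β, with P_β = K_conf/(1 + K_conf). The function name is hypothetical.

```python
def apparent_affinity(k_conf, k_alpha, k_beta):
    """Apparent association constant of a two-conformer protein for one target.

    k_conf:  [beta]/[alpha] in the unbound state (conformational equilibrium constant).
    k_alpha: association constant of the alpha conformer for the target.
    k_beta:  association constant of the beta conformer for the target.
    """
    p_beta = k_conf / (1.0 + k_conf)  # population of the beta conformer when unbound
    p_alpha = 1.0 - p_beta
    # Total bound / (total free protein * free target) = population-weighted affinities.
    return p_alpha * k_alpha + p_beta * k_beta
```

For instance, with a 5% β population (K_conf = 1/19) and arbitrary illustrative values K_α = 10^3 and K_β = 10^6, the result is about 5.1 × 10^4, so the minor β conformer accounts for roughly 98% of the binding. Under these assumptions, function can vary smoothly with P_β even where the dominant fold does not change.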

Although our biophysical model for protein folding and binding provides a mapping between sequence and realistic protein structures, it ignores many details of protein physics, such as side-chain packing and protein-solvent interactions, and should be viewed as a simplified model. One might expect such omissions to lead to exaggerated native state stabilities. However, the parameterization strategy used in developing the model, specifically the requirement that a set of sequences fold spontaneously to diverse native structures, constrains key parameters, such as the strengths of hydrogen bonds and hydrophobic attractions (20), to reasonable values. Indeed, many of the 144 sequences in our restricted sequence set exhibit low populations of both the α-helix and the β-hairpin folds (see Fig. 4), showing that any overstabilization is not severe. We focused in this work on relatively short sequences to allow a systematic exploration of sequence-structure space, but more realistic protein sizes can also be studied in our model at additional computational cost. It is, however, interesting that the observed transition from α- to β-structure is sharp despite the short chain length studied, indicating that a qualitative comparison with experimental data on longer proteins is not unreasonable.

Summary and Conclusion

We have, within a continuous protein model, explored function and stability properties along a mutational pathway with an abrupt fold switch between two basic structural elements. Our results suggest that evolutionary transitions between protein folds can be guided by smooth gradients in a densely connected functional space. These gradients are underpinned by a biophysical mechanism based on binding-induced conformational switching. The model used, although simplified, strikes a balance between realistic features of protein folding and computational tractability; in particular, the function of sequences can be scored naturally. It therefore opens the way to extensions exploring transitions between different functions of unstable or disordered protein sequences.

Acknowledgments

This work was supported in part by the Swedish Research Council. Computational resources were provided by the Swedish National Infrastructure for Computing (SNIC) through the Lunarc facility.

Editor: Nathan Baker.

Supporting Material

Document S1. Three figures
mmc1.pdf (200.1KB, pdf)
Document S2. Article plus Supporting Material
mmc2.pdf (1.5MB, pdf)

References

  • 1. Bryan P.N., Orban J. Proteins that switch folds. Curr. Opin. Struct. Biol. 2010;20:482–488. doi: 10.1016/j.sbi.2010.06.002.
  • 2. Tuinstra R.L., Peterson F.C., Volkman B.F. Interconversion between two unrelated protein folds in the lymphotactin native state. Proc. Natl. Acad. Sci. USA. 2008;105:5057–5062. doi: 10.1073/pnas.0709518105.
  • 3. Alexander P.A., He Y., Bryan P.N. The design and characterization of two proteins with 88% sequence identity but different structure and function. Proc. Natl. Acad. Sci. USA. 2007;104:11963–11968. doi: 10.1073/pnas.0700922104.
  • 4. Alexander P.A., He Y., Bryan P.N. A minimal sequence code for switching protein structure and function. Proc. Natl. Acad. Sci. USA. 2009;106:21149–21154. doi: 10.1073/pnas.0906408106.
  • 5. He Y., Chen Y., Orban J. Mutational tipping points for switching protein folds and functions. Structure. 2012;20:283–291. doi: 10.1016/j.str.2011.11.018.
  • 6. Smith J.M. Natural selection and the concept of a protein space. Nature. 1970;225:563–564. doi: 10.1038/225563a0.
  • 7. Davidson A.R. A folding space odyssey. Proc. Natl. Acad. Sci. USA. 2008;105:2759–2760. doi: 10.1073/pnas.0800030105.
  • 8. Roessler C.G., Hall B.M., Cordes M.H.J. Transitive homology-guided structural studies lead to discovery of Cro proteins with 40% sequence identity but different folds. Proc. Natl. Acad. Sci. USA. 2008;105:2343–2348. doi: 10.1073/pnas.0711589105.
  • 9. Sadreyev R.I., Kim B.-H., Grishin N.V. Discrete-continuous duality of protein structure space. Curr. Opin. Struct. Biol. 2009;19:321–328. doi: 10.1016/j.sbi.2009.04.009.
  • 10. Grishin N.V. Fold change in evolution of protein structures. J. Struct. Biol. 2001;134:167–185. doi: 10.1006/jsbi.2001.4335.
  • 11. Bornberg-Bauer E. How are model protein structures distributed in sequence space? Biophys. J. 1997;73:2393–2403. doi: 10.1016/S0006-3495(97)78268-7.
  • 12. Bornberg-Bauer E., Chan H.S. Modeling evolutionary landscapes: mutational stability, topology, and superfunnels in sequence space. Proc. Natl. Acad. Sci. USA. 1999;96:10689–10694. doi: 10.1073/pnas.96.19.10689.
  • 13. Deeds E.J., Dokholyan N.V., Shakhnovich E.I. Protein evolution within a structural space. Biophys. J. 2003;85:2962–2972. doi: 10.1016/S0006-3495(03)74716-X.
  • 14. Xia Y., Levitt M. Funnel-like organization in sequence space determines the distributions of protein stability and folding rate preferred by evolution. Proteins. 2004;55:107–114. doi: 10.1002/prot.10563.
  • 15. Bloom J.D., Labthavikul S.T., Arnold F.H. Protein stability promotes evolvability. Proc. Natl. Acad. Sci. USA. 2006;103:5869–5874. doi: 10.1073/pnas.0510098103.
  • 16. Wroe R., Chan H.S., Bornberg-Bauer E. A structural model of latent evolutionary potentials underlying neutral networks in proteins. HFSP J. 2007;1:79–87. doi: 10.2976/1.2739116.
  • 17. Pascual-García A., Abia D., Bastolla U. Quantifying the evolutionary divergence of protein structures: the role of function change and function conservation. Proteins. 2010;78:181–196. doi: 10.1002/prot.22616.
  • 18. Holzgräfe C., Irbäck A., Troein C. Mutation-induced fold switching among lattice proteins. J. Chem. Phys. 2011;135:195101. doi: 10.1063/1.3660691.
  • 19. Cao B., Elber R. Computational exploration of the network of sequence flow between protein structures. Proteins. 2010;78:985–1003. doi: 10.1002/prot.22622.
  • 20. Bhattacherjee A., Wallin S. Coupled folding-binding in a hydrophobic/polar protein model: impact of synergistic folding and disordered flanks. Biophys. J. 2012;102:569–578. doi: 10.1016/j.bpj.2011.12.008.
  • 21. Kauzmann W. Some factors in the interpretation of protein denaturation. Adv. Protein Chem. 1959;14:1–63. doi: 10.1016/s0065-3233(08)60608-7.
  • 22. Du D., Tucker M.J., Gai F. Understanding the mechanism of β-hairpin folding via Φ-value analysis. Biochemistry. 2006;45:2668–2678. doi: 10.1021/bi052039s.
  • 23. Marinari E., Parisi G. Simulated tempering: a new Monte Carlo scheme. Europhys. Lett. 1992;19:451–458.
  • 24. Lyubartsev A.P., Martsinovski A.A., Vorontsov-Velyaminov P.N. New approach to Monte Carlo calculation of the free energy: method of expanded ensembles. J. Chem. Phys. 1992;96:1776–1783.
  • 25. Irbäck A., Potthast F. Studies of an off-lattice model for protein folding: sequence dependence and improved sampling at finite temperature. J. Chem. Phys. 1995;103:10298–10305.
  • 26. Favrin G., Irbäck A., Sjunnesson F. Monte Carlo update for chain molecules: biased Gaussian steps in torsional space. J. Chem. Phys. 2001;114:8154–8158.
  • 27. Regan L., DeGrado W.F. Characterization of a helical protein designed from first principles. Science. 1988;241:976–978. doi: 10.1126/science.3043666.
  • 28. Xiong H., Buckwalter B.L., Hecht M.H. Periodicity of polar and nonpolar amino acids is the major determinant of secondary structure in self-assembling oligomeric peptides. Proc. Natl. Acad. Sci. USA. 1995;92:6349–6353. doi: 10.1073/pnas.92.14.6349.
  • 29. Hecht M.H., Das A., Wei Y. De novo proteins from designed combinatorial libraries. Protein Sci. 2004;13:1711–1723. doi: 10.1110/ps.04690804.
  • 30. Sikosek T., Bornberg-Bauer E., Chan H.S. Evolutionary dynamics on protein bi-stability landscapes can potentially resolve adaptive conflicts. PLOS Comput. Biol. 2012;8:e1002659. doi: 10.1371/journal.pcbi.1002659.
  • 31. Sikosek T., Chan H.S., Bornberg-Bauer E. Escape from adaptive conflict follows from weak functional trade-offs and mutational robustness. Proc. Natl. Acad. Sci. USA. 2012;109:14888–14893. doi: 10.1073/pnas.1115620109.
  • 32. Taverna D.M., Goldstein R.A. Why are proteins marginally stable? Proteins. 2002;46:105–109. doi: 10.1002/prot.10016.
  • 33. James L.C., Tawfik D.S. Conformational diversity and protein evolution—a 60-year-old hypothesis revisited. Trends Biochem. Sci. 2003;28:361–368. doi: 10.1016/S0968-0004(03)00135-X.
  • 34. Meier S., Özbek S. A biological cosmos of parallel universes: does protein structural plasticity facilitate evolution? BioEssays. 2007;29:1095–1104. doi: 10.1002/bies.20661.

Articles from Biophysical Journal are provided here courtesy of The Biophysical Society