Abstract
A very small number of natural proteins have folded configurations in which the polypeptide backbone is knotted. Relatively little is known about the folding energy landscapes of such proteins, or how they have evolved. We explore those questions here by designing a unique knotted protein structure. Biophysical characterization and X-ray crystal structure determination show that the designed protein folds to the intended configuration, tying itself in a knot in the process, and that it folds reversibly. The protein folds to its native, knotted configuration approximately 20 times more slowly than a control protein, which was designed to have a similar tertiary structure but to be unknotted. Preliminary kinetic experiments suggest a complicated folding mechanism, providing opportunities for further characterization. The findings illustrate a situation where a protein is able to successfully traverse a complex folding energy landscape, though the amino acid sequence of the protein has not been subjected to evolutionary pressure for that ability. The success of the design strategy—connecting two monomers of an intertwined homodimer into a single protein chain—supports a model for evolution of knotted structures via gene duplication.
Keywords: Anfinsen, energy landscape, folding kinetics, protein folding, topology
Thousands of distinct protein folds have been observed in nature, yet only a handful possess a knotted protein backbone (1). These rare cases present intriguing opportunities for studying the mechanisms of protein folding (2). Folding into a knot requires complex contortions of the protein chain. For instance, one part of the protein might have to pass through a loop formed by another part at some point during the folding process, like a thread through the eye of a needle. Before the first deeply knotted protein structure was identified ten years ago (3), the apparent lack of knotted proteins was cited as evidence that this type of threading event might be impossible (4, 5). However, there are now roughly ten distinct knotted protein folds known (1), some of which are quite deep, proving that some proteins can and do spontaneously fold into knotted structures.
Knotted structures present challenges to current theories of protein folding, which have been developed mainly based on small proteins with simple folding kinetics (6). For such proteins, it has been proposed that the folding energy landscape resembles a funnel (7–9), implying that the native state can be reached by moving toward lower energy from any of a vast ensemble of denatured configurations. The topological constraint of having to thread a knotted protein, on the other hand, would appear to greatly restrict the conformational space available for productive folding, which raises the question of how the folding pathway through this restricted conformational space is encoded in the amino acid sequence of the protein. A general model of protein folding must be able to account for topologically complex proteins, such as those containing knots.
Much of the recent research on knotted proteins, both experimental and computational, has naturally turned to investigating exactly how and when threading occurs during folding. Jackson et al. have carried out a series of experiments to characterize the complex folding pathways of two structurally related, knotted methyltransferases (10–13). Work on the methyltransferase model system led to the proposal that threading can occur early in folding reactions (14), producing a knotted protein in a loose, denatured-like state, followed by normal folding to the native structure. A recent demonstration that the methyltransferases tend to remain knotted even under strongly denaturing conditions (15) further supports the view that threading occurs early for that particular knotted fold. Computational simulations, on the other hand, have suggested various scenarios for threading, including mechanisms where the knot is acquired in later stages of folding (16, 17).
Other important questions posed by the existence of knotted proteins have not yet been addressed experimentally. For instance, does a knotted topology have any effect on the stability, folding, or rigidity of a protein? The difficulty in attacking such a question lies in a lack of suitable controls. In order to specifically address the effects of a knot in a protein, a nearly identical, yet unknotted protein must be available for comparison. Because such pairs of knotted/unknotted proteins do not exist naturally, they must be designed or engineered (2).
Here we describe the design of a unique knotted protein and its structural and biophysical characterization. Despite having to navigate a presumably complex energy landscape, the protein folds reversibly in vitro to the target, knotted configuration. The engineering of a control protein that is unknotted but otherwise nearly identical in sequence and structure allows the effects of protein knotting to be examined directly.
Results
Designing a Unique Protein Knot by Domain Duplication.
We sought to design a unique knotted protein by minimally modifying a naturally existing, unknotted protein. Our design strategy was motivated by the prior observation that some knotted protein folds display internal pseudosymmetry (1, 3). This phenomenon is seen in five of the ten knotted folds that have been elucidated. The construction of these proteins from internally duplicated motifs or domains suggests that they evolved by gene duplication and fusion, potentially from ancestral proteins that were oligomeric in nature. In one illustrative example, the hypothetical ancestral homodimer and the knotted tandem-domain monomer can be found in extant proteins: the knotted Agrobacterium tumefaciens protein VirC2 is a fusion of two ribbon-helix-helix DNA-binding domains in a configuration resembling the Arc repressor dimer (Fig. 1A). Motivated by the evidence of domain duplication as a naturally occurring, evolutionary pathway for the creation of knotted folds, we engineered a unique knotted protein by genetically fusing a tandem repeat of the gene for the unknotted, dimeric protein HP0242 from Helicobacter pylori (PDB codes 2ouf and 2bo3; Fig. 1B). The two subunits of this protein of unknown function (referred to hereafter as 2ouf-wt) intertwine in such a manner that connecting the C terminus of the first subunit to the N terminus of the second by a nine-residue, glycine-rich linker yields a knotted, monomeric protein, which we refer to as 2ouf-knot (Fig. 1D). The knot in 2ouf-knot is a left-handed trefoil (three-crossing) knot, and is 28 residues deep on the N-terminal side and 57 residues deep on the C-terminal side. A third protein, 2ouf-ds, was constructed in which the two chains of the dimer are linked by an intermolecular disulfide bond that does not introduce a knot (Fig. 1C). 
Because 2ouf-knot and 2ouf-ds are both unimolecular and nearly identical other than their differing topologies, comparing the genetically fused, knotted protein to the disulfide-linked, unknotted protein allows for direct interrogation of the effects of knotted topologies on protein stability and folding.
Fig. 1.
Experimental design and crystal structures of 2ouf-ds and 2ouf-knot. (A) A natural example where gene duplication and fusion of an intertwined dimer has led to a knot. The naturally knotted VirC2 protein (right) is a tandem repeat of ribbon-helix-helix domains, like those found in the dimeric, unknotted Arc repressor (left). A single ribbon-helix-helix domain in each protein is highlighted in cyan. (B) Crystal structure (left; PDB code 2ouf) and simplified schematic (right) of the intertwined, dimeric protein HP0242. The two chains of the dimer are colored green and blue. (C) Schematic (left) and crystal structure (right) of 2ouf-ds. A disulfide bond, engineered to produce a unimolecular but unknotted construct, is shown in orange. (D) Schematic (left) and crystal structure (right) of 2ouf-knot, a knotted protein created by linking the two intertwined domains in tandem. The eight linker residues not modeled in the crystal structure are represented by the dashed gray line. Alignments of one subunit of the wild-type protein (blue) to the crystal structures of (C) 2ouf-ds and (D) 2ouf-knot are shown. In all structures, tryptophans 18 and 18′ (18 and 107 in 2ouf-knot) are shown as sticks.
Structural Characterization of 2ouf-ds and 2ouf-knot.
(His)₆-tagged constructs of 2ouf-wt, 2ouf-ds, and 2ouf-knot were expressed recombinantly in Escherichia coli and purified to homogeneity. Following oxidation of the designed disulfide in 2ouf-ds, the mutant proteins were crystallized (see SI Text). Crystal structures of 2ouf-ds and 2ouf-knot were determined by molecular replacement and refined to resolutions of 2.9 Å and 2.3 Å, respectively (Table S1). Although the protein molecules are packed differently in these two crystals and in the crystals of the wild-type protein, the refined models of all three structures are effectively superimposable. Pairwise alignments of the three structures (the wild-type homodimer, 2ouf-knot, and 2ouf-ds) over backbone atoms covering residues 13–92 in the wild-type sequence yield rms differences of at most 0.74 Å (Fig. 1 C and D). The designed 2ouf-knot protein is therefore folded as intended into a knotted configuration.
Unfortunately, although not unexpectedly, electron density was observed for only one residue of the flexible linker used to link the two domains of 2ouf-knot (sequence SGSGSGSSG). To verify that the linker was still intact in the crystallized protein, we subjected washed crystals of 2ouf-knot to nonreducing SDS-PAGE alongside solutions of the purified proteins (Fig. S1A). We did not observe cleavage products of the proteins in any lane of the gel, confirming that our crystal structure of 2ouf-knot is of the intact, full-length, tandem repeat protein.
Biophysical Characterization of 2ouf-wt, 2ouf-ds, and 2ouf-knot.
Analytical size exclusion chromatography was used to confirm that the designed proteins exhibited the correct quaternary structures in solution (Fig. 2A). The wild-type protein was previously shown to be a dimer in solution (18). As expected, 2ouf-ds eluted at precisely the same volume as 2ouf-wt, while 2ouf-knot eluted very slightly later, consistent with its marginally smaller molecular weight (see SI Text). We next compared the spectroscopic properties of the folded and unfolded states of the proteins using CD and intrinsic fluorescence. The far-UV CD spectra of 2ouf-wt, 2ouf-ds, and 2ouf-knot overlap nearly perfectly (Fig. 2B), as do their fluorescence spectra (Fig. 2C), indicating that the molecules are in structurally equivalent states in solution. Unfolding the proteins in high concentrations of guanidinium chloride (GdmCl) resulted in the expected loss of helical signal in their CD spectra, although 2ouf-ds and 2ouf-knot retained slightly more signal than 2ouf-wt (Fig. 2B). Similarly, a large decrease in the fluorescence intensity and a red-shift of the emission maximum of each protein was observed in the presence of GdmCl, owing to the single, buried tryptophan residue at position 18 (or two pseudosymmetrically related copies of that residue in 2ouf-knot; Fig. 1) becoming exposed to solvent upon unfolding (Fig. 2C). Importantly, proteins that were unfolded and subsequently refolded by dilution of the GdmCl exhibited CD and fluorescence spectra nearly identical to those obtained under native conditions (Fig. 2 B and C), suggesting that all three proteins, even the synthetically knotted 2ouf-knot, unfold reversibly in vitro in the absence of molecular chaperones.
Fig. 2.
Biophysical characterization of 2ouf-wt, 2ouf-ds, and 2ouf-knot. Data for 2ouf-wt are colored black; 2ouf-ds, red; and 2ouf-knot, blue. (A) Size exclusion chromatography elution profiles of the three proteins. The elution profiles for 2ouf-wt and 2ouf-ds are nearly indistinguishable. (B) Far-UV CD spectra of the three proteins. Spectra for the folded proteins (in buffer supplemented with 0.25 M GdmCl) are shown as solid lines, unfolded proteins (7.44 M GdmCl) as dash-dot lines, and proteins refolded from 6.14 M GdmCl (0.25 M GdmCl) as dashed lines. (C) Fluorescence emission spectra of the three proteins, using an excitation wavelength of 290 nm. Spectra are represented as in B, and have been normalized for comparison.
Equilibrium Denaturation of 2ouf-wt, 2ouf-ds, and 2ouf-knot.
To confirm the reversibility of unfolding, and to compare the thermodynamic stability of a knotted protein to an unknotted control, we collected equilibrium denaturation curves of 2ouf-wt, 2ouf-ds, and 2ouf-knot. We used GdmCl as denaturant, and collected data at low (2.5 μM 2ouf-wt/ds, 1.25 μM 2ouf-knot) and high (100 μM 2ouf-wt/ds, 50 μM 2ouf-knot) protein concentration at 25 °C. The unfolding and refolding curves for each protein were superimposable whether fluorescence or CD was used as a probe, demonstrating that unfolding is reversible for all three proteins (Fig. 3). However, the two spectroscopic probes yielded substantially different curves for each protein. When monitoring fluorescence emission at 320 nm at low protein concentration, a single transition is observed for all three proteins, with midpoints of denaturation between 2–3 M GdmCl. The CD curves at low protein concentration, in contrast, contain two readily identifiable transitions, the first between 2–3 M GdmCl, and the second between 5–7 M GdmCl. For 2ouf-wt, the higher [GdmCl] transition became more pronounced at higher protein concentration, while the lower [GdmCl] transition was largely unchanged (Fig. 3A). The curves at low and high protein concentration were superimposable for 2ouf-ds and 2ouf-knot (Fig. 3 B and C), consistent with the unimolecular nature of these two proteins.
Fig. 3.
Equilibrium denaturation of (A) 2ouf-wt, (B) 2ouf-ds, and (C) 2ouf-knot. CD data are represented by filled symbols, fluorescence data by open symbols. All data have been normalized to a native signal of 0 and a denatured signal of 1. Circles and squares represent unfolding and refolding data, respectively, at low protein concentration. Triangles represent unfolding data at high protein concentration. Lines represent fits of the CD (solid) and fluorescence (dashed) data to three-state and two-state models, respectively.
Together, the equilibrium data indicate that the higher [GdmCl] transition observed by CD for all three proteins is due to an equilibrium intermediate populated at moderate [GdmCl], and that the 2ouf-wt intermediate is dimeric, because it is more populated at higher protein concentration. The analogous 2ouf-ds and 2ouf-knot intermediates are highly populated even at low protein concentration; in these proteins the two domains forming the intermediates are covalently linked, so their stabilities are independent of protein concentration. Finally, the intermediates for all three proteins appear to be structurally similar, each containing a significant amount of secondary structure (moderate CD signal), but without the tryptophan residues in their buried, native states (diminished fluorescent signal). We also monitored the equilibrium denaturation of 2ouf-wt at 100 μM protein by fluorescence, and found it to be indistinguishable from the curve collected at 2.5 μM protein (Fig. 3A).
We fitted the CD equilibrium denaturation data at low protein concentration for 2ouf-ds and 2ouf-knot to a three-state N↔I↔U model (as described in SI Text) to extract thermodynamic parameters (Table 1). For 2ouf-wt, we fitted the low and high protein concentration CD data individually to a three-state N₂↔I₂↔2U dimer denaturation model with a dimeric intermediate, as described previously (19) with minor modification (see SI Text). We independently fitted the fluorescence data for all proteins to a two-state N↔I model (N₂↔I₂ for 2ouf-wt) because the intermediates exhibit no detectable fluorescence signal. For each protein, the parameters extracted from the CD and fluorescence data for the native-to-intermediate transition were in close agreement. Unexpectedly, the 2ouf-knot intermediate is significantly (5.8 kcal mol⁻¹) more stable than that of 2ouf-ds, whereas the native state of 2ouf-knot is more stable than that of 2ouf-ds by a lesser amount (2.7 kcal mol⁻¹). Consequently, the stability of the native state relative to the intermediate state is lower for 2ouf-knot (2.3 kcal mol⁻¹) than for 2ouf-ds (5.4 kcal mol⁻¹). This observation could reflect the entropic cost of constraining the nine-residue, glycine-rich linker in a relatively low-entropy conformation (20). Alternatively, the intermediate state of 2ouf-knot may be more native-like than that of 2ouf-ds, as suggested by its significantly enhanced stability and much lower m-value for the native-to-intermediate transition.
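The shape of a three-state denaturation curve can be illustrated with a minimal sketch. It assumes the common linear extrapolation form, ΔG([GdmCl]) = ΔG° − m[GdmCl], for each unimolecular step; the model actually fitted is the one described in SI Text and may differ in detail. The function name, the gas-constant value, and the signal normalization (native = 0, denatured = 1, as in Fig. 3) are choices made for this illustration only.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1

def three_state_signal(gdmcl, dg_ni, m_ni, dg_iu, m_iu, y_i,
                       temp_k=298.15, y_n=0.0, y_u=1.0):
    """Normalized signal for N <-> I <-> U at a given [GdmCl] (M).

    Assumes each free energy varies linearly with denaturant
    (linear extrapolation); y_i is the intermediate's signal amplitude.
    """
    rt = R_KCAL * temp_k
    k_ni = math.exp(-(dg_ni - m_ni * gdmcl) / rt)  # [I]/[N]
    k_iu = math.exp(-(dg_iu - m_iu * gdmcl) / rt)  # [U]/[I]
    partition = 1.0 + k_ni + k_ni * k_iu
    return (y_n + y_i * k_ni + y_u * k_ni * k_iu) / partition

# 2ouf-ds CD parameters as reported in Table 1.
ds = dict(dg_ni=5.4, m_ni=1.80, dg_iu=7.1, m_iu=1.14, y_i=0.59)
curve = [(c, three_state_signal(c, **ds)) for c in (0.0, 2.0, 4.0, 6.0, 8.0)]
```

With these inputs the signal stays near 0 in buffer, plateaus near the intermediate amplitude around 4 M GdmCl, and approaches 1 at 8 M, which is the two-transition shape described for the CD curves.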
Table 1.
Thermodynamic parameters for 2ouf-wt, 2ouf-ds, and 2ouf-knot, extracted from fits of equilibrium denaturation data
| Protein | Probe | YI | ΔG°(N₂↔I₂) | m(N₂↔I₂) | ΔG°(I₂↔2U) | m(I₂↔2U) | ΔG°(N₂↔2U)* | m(N₂↔2U)† |
|---|---|---|---|---|---|---|---|---|
| 2ouf-wt | CD | 0.5 | 4.6 ± 0.3 | 1.71 ± 0.11 | 13.4 ± 0.3 | 1.22 ± 0.06 | 18.0 ± 0.4 | 2.93 ± 0.13 |
| | CD‡ | 0.5 | 4.9 ± 0.6 | 1.83 ± 0.21 | 13.5 ± 0.7 | 1.45 ± 0.13 | 18.4 ± 0.9 | 3.29 ± 0.24 |
| | Fluor. | - | 4.3 ± 0.2 | 1.74 ± 0.09 | - | - | - | - |
| | | | ΔG°(N↔I) | m(N↔I) | ΔG°(I↔U) | m(I↔U) | ΔG°(N↔U)* | m(N↔U)† |
| 2ouf-ds | CD | 0.59 ± 0.02 | 5.4 ± 0.4 | 1.80 ± 0.14 | 7.1 ± 0.8 | 1.14 ± 0.12 | 12.5 ± 0.9 | 2.94 ± 0.18 |
| | Fluor. | - | 5.3 ± 0.4 | 1.90 ± 0.13 | - | - | - | - |
| 2ouf-knot | CD | 0.52 ± 0.01 | 2.3 ± 0.1 | 0.96 ± 0.04 | 12.9 ± 0.6 | 1.99 ± 0.10 | 15.2 ± 0.6 | 2.95 ± 0.10 |
| | Fluor. | - | 2.1 ± 0.1 | 1.05 ± 0.03 | - | - | - | - |
The parameter YI refers to the CD signal amplitude of the intermediate state of each protein. All ΔG° values are in units of kcal mol⁻¹; the standard state concentration (which enters into the wild-type analysis) is 1 M. All m values are in units of kcal mol⁻¹ M⁻¹; errors quoted are the standard errors for the fits calculated by the program R.
*ΔG°(N₂↔2U) = ΔG°(N₂↔I₂) + ΔG°(I₂↔2U); ΔG°(N↔U) = ΔG°(N↔I) + ΔG°(I↔U).
†m(N₂↔2U) = m(N₂↔I₂) + m(I₂↔2U); m(N↔U) = m(N↔I) + m(I↔U).
‡100 μM protein. All other parameters are from fits to low protein concentration curves.
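The derived columns and the stability differences quoted in the text follow from simple arithmetic on the fitted values, which the short sketch below reproduces for the two unimolecular proteins. The midpoint estimate, ΔG°/m, holds only under a linear dependence of free energy on denaturant, which is assumed here for illustration; the dictionary layout and function names are likewise illustrative.

```python
# CD fits at low protein concentration, copied from Table 1.
# Units: kcal mol^-1 for dg_*, kcal mol^-1 M^-1 for m_*.
fits = {
    "2ouf-ds":   {"dg_ni": 5.4, "m_ni": 1.80, "dg_iu": 7.1,  "m_iu": 1.14},
    "2ouf-knot": {"dg_ni": 2.3, "m_ni": 0.96, "dg_iu": 12.9, "m_iu": 1.99},
}

def summarize(p):
    """Totals per the table footnotes, plus midpoints (assumed: Cm = dG/m)."""
    return {
        "dg_nu": p["dg_ni"] + p["dg_iu"],   # footnote *: sum of the two steps
        "m_nu": p["m_ni"] + p["m_iu"],      # footnote dagger: sum of the two m values
        "cm_ni": p["dg_ni"] / p["m_ni"],    # midpoint of N <-> I, in M GdmCl
        "cm_iu": p["dg_iu"] / p["m_iu"],    # midpoint of I <-> U, in M GdmCl
    }

summary = {name: summarize(p) for name, p in fits.items()}

# Extra stability of 2ouf-knot relative to 2ouf-ds, measured from U.
ddg_intermediate = fits["2ouf-knot"]["dg_iu"] - fits["2ouf-ds"]["dg_iu"]
ddg_native = summary["2ouf-knot"]["dg_nu"] - summary["2ouf-ds"]["dg_nu"]
```

The estimated midpoints come out at roughly 2.4 to 3.0 M for the first transition and 6.2 to 6.5 M for the second, in line with the transitions seen in the CD curves, and the two differences reproduce the 5.8 and 2.7 kcal mol⁻¹ values discussed in the text.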
Folding Kinetics of 2ouf-wt, 2ouf-ds, and 2ouf-knot.
We collected and analyzed stopped-flow kinetic traces of 2ouf-wt, 2ouf-ds, and 2ouf-knot refolding to examine the effects of knotted topologies on protein folding rates. Monitoring single-jump refolding from the unfolded state (7.29 M GdmCl) by fluorescence emission at 320 nm yielded kinetic traces that reached completion on the time scale of several seconds (Fig. 4). However, a significant fraction of the total signal change between the unfolded and folded states was complete in less than the dead time of the instrument (∼15 ms) for all three proteins, suggesting the presence of a burst phase intermediate during folding. Nevertheless, it is clear from visual inspection of the kinetic traces that the knotted 2ouf-knot folds much more slowly than the unknotted 2ouf-ds. Numerical analysis of the refolding traces revealed that the observable signal change for each protein could be adequately described by a sum of two exponentials (Fig. 4, residuals). However, refolding to several final concentrations of GdmCl (Fig. S2) and plotting the natural logs of rate constants extracted from either single or double exponential fits as a function of [GdmCl] (chevron analysis) did not result in linear folding limbs as typically observed (Fig. S3); severe rollover is observed at low [GdmCl]. This anomalous behavior indicates that the refolding reactions are too complex to be explained by a simple biexponential function. Given the stable intermediates observed at equilibrium, the presence of an apparent burst phase in the kinetic refolding data, and the rollover observed at low [GdmCl], it seems probable that the proteins are folding through one or more intermediate species or pathways.
Fig. 4.
Single-jump refolding kinetic traces of (A) 2ouf-wt, (B) 2ouf-ds, and (C) 2ouf-knot. For each protein, the data are normalized to a native signal of 1 and a denatured signal of 0. Residuals are for fits to first-order equations with two (top) or one (bottom) exponential terms. The double exponential fits are shown as red lines. Insets show the protein concentration dependence of the rate constants extracted from single (filled circles) and double (open squares and triangles) exponential fits.
Nevertheless, the appearance of fluorescent signal in the refolding traces clearly monitors attainment of native structure during refolding of the proteins, and therefore allows a quantitative comparison of their folding rates. Fitting the kinetic data to a single exponential function, while not capturing fine details of the kinetic traces, provides a reasonable assessment of the overall folding efficiencies (Fig. S2). Comparing these estimated rates for the three proteins reveals that, at a final [GdmCl] of 0.66 M, 2ouf-knot folds approximately 20 times more slowly (0.9 s⁻¹) than the unknotted 2ouf-ds designed as a control (17.7 s⁻¹) and twice as slowly as the dimeric 2ouf-wt (1.9 s⁻¹; the units here reflect that the observed phenomenon monitors the appearance of N₂, which is first-order). The folding rates of 2ouf-ds and 2ouf-knot were found to be independent of protein concentration, while 2ouf-wt refolds faster as protein concentration increases, as expected for a bimolecular reaction (Fig. 4, insets).
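To put the estimated rates in perspective, the sketch below converts them to half-times and to an apparent difference in barrier height. The barrier estimate, RT ln(k_fast/k_slow), assumes that both proteins share the same pre-exponential factor, and the temperature of 25 °C is carried over from the equilibrium experiments as an assumption; neither is established by the kinetic data, so the number is a rough guide only.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal mol^-1 K^-1
TEMP_K = 298.15     # assumed: 25 degrees C, as in the equilibrium experiments

# Single-exponential refolding rates at 0.66 M GdmCl, in s^-1 (from the text).
rates = {"2ouf-wt": 1.9, "2ouf-ds": 17.7, "2ouf-knot": 0.9}

def half_time(k):
    """Half-time (s) of a first-order process with rate constant k (s^-1)."""
    return math.log(2.0) / k

def apparent_barrier_difference(k_fast, k_slow, temp_k=TEMP_K):
    """RT*ln(k_fast/k_slow) in kcal/mol; assumes equal prefactors."""
    return R_KCAL * temp_k * math.log(k_fast / k_slow)

slowdown = rates["2ouf-ds"] / rates["2ouf-knot"]
half_times = {name: half_time(k) for name, k in rates.items()}
ddg_barrier = apparent_barrier_difference(rates["2ouf-ds"], rates["2ouf-knot"])
```

Under these assumptions the roughly 20-fold slowdown corresponds to a half-time of about 0.8 s for 2ouf-knot versus about 0.04 s for 2ouf-ds, and to an apparent barrier difference of slightly less than 2 kcal mol⁻¹.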
Refolding kinetics were also analyzed starting with proteins equilibrated in 4 M GdmCl, conditions under which the equilibrium intermediate of each protein is well populated. 2ouf-knot was found to refold rapidly from this state, yielding estimated folding rates comparable to those of 2ouf-ds under the same conditions (Figs. S3 and S4B), in sharp contrast to the much slower rate of overall folding for 2ouf-knot beginning from the fully denatured state (Figs. S3 and S4A). This result suggests that the intermediate observed at equilibrium may be populated as a kinetic intermediate on the folding pathway of 2ouf-knot, and that the slower overall folding rate of 2ouf-knot when refolding is initiated from the unfolded state can be attributed to the transition from the unfolded to the intermediate state.
We also monitored unfolding reactions starting with the proteins in their native states, and observed that at high [GdmCl] the fluorescent signal was lost very rapidly (< 100 ms) for all three proteins (Fig. S2). Unfolding traces for 2ouf-wt and 2ouf-ds were adequately described by a single exponential function, while those for 2ouf-knot required two exponential terms, except at the highest final [GdmCl] tested (6 M). As with the folding data, it was found that single exponential fits could reasonably approximate the apparent unfolding rate of 2ouf-knot. The natural logs of estimated unfolding rates extracted from the single exponential fits for each protein were found to depend linearly on the final [GdmCl] of the unfolding reaction (Fig. S3). This analysis revealed that although the apparent unfolding rates observed for 2ouf-knot are similar to the rates of refolding from the intermediate state, they are much higher than the overall rates of refolding from the fully unfolded state. Therefore, fluorescence appears to monitor a reversible, relatively rapid transition between the native and intermediate states of 2ouf-knot. Furthermore, although unfolding from the intermediate to the fully unfolded state is invisible when monitoring by fluorescence, it appears to be the rate-limiting step of 2ouf-knot unfolding based on the above kinetic and equilibrium data.
The folding and unfolding data are most consistent with a model in which 2ouf-knot folds through a kinetic intermediate, similar in nature to the stable intermediate observed at equilibrium, that is also populated on the unfolding pathway. Furthermore, the data suggest that a higher activation energy barrier exists between the intermediate and the fully unfolded states of 2ouf-knot compared to 2ouf-ds during folding and, most likely, also during unfolding. We propose that this higher barrier is related to knot formation and unthreading, respectively, because the key difference between 2ouf-knot and 2ouf-ds is the presence or absence of the knot, although, as discussed below, other factors arising from differences in the way the two domains are connected in each protein may also be partially responsible.
Computational Search for Other Potential Domain Fusion Knots.
To our knowledge, there has not been a systematic search for dimeric proteins that would become knotted upon fusion of the two subunits. We used the PISA server (21) to download a set of PDB files containing predicted homodimeric assemblies of small proteins (< 200 residues), computationally connected their termini, and evaluated them for knottedness. Out of 4,192 homodimers searched, we found only five distinct, globular, dimeric protein folds that could be knotted by fusion of the two subunits (Table S2 and Fig. S5). In addition to HP0242, the subject of the present study, our search also identified several dimeric ribbon-helix-helix proteins, as expected given the topology of the naturally knotted VirC2, which is a tandem duplication of ribbon-helix-helix domains as discussed above. The final three dimeric folds revealed by our search were the YejL-like domain, the OsmC-like domain, and a lambda-repressor-like DNA-binding domain with intertwined C-terminal extensions. These proteins could provide additional targets for designing unique knotted proteins by tandem duplication. We considered whether proteins comprising tandem duplications of these domains (which could be knotted) might already exist in nature, but initial sequence searches did not identify any such cases (see SI Text).
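One common way to evaluate a fused chain for knottedness is chain reduction in the style of Koniaris, Muthukumar, and Taylor: interior points are deleted whenever the triangle they form with their neighbors is not crossed by any other segment, and a chain that cannot be reduced to its two termini is flagged as potentially knotted. The sketch below illustrates that idea on synthetic coordinates. It is not necessarily the procedure used in this work; all function names are hypothetical, and coplanar edge cases are simply treated as non-intersecting.

```python
import math

def _sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def _dot(a, b): return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def segment_hits_triangle(s0, s1, v0, v1, v2, eps=1e-12):
    """Moller-Trumbore test restricted to the segment s0-s1."""
    d, e1, e2 = _sub(s1, s0), _sub(v1, v0), _sub(v2, v0)
    h = _cross(d, e2)
    a = _dot(e1, h)
    if abs(a) < eps:               # parallel segment or degenerate triangle
        return False
    s = _sub(s0, v0)
    u = _dot(s, h) / a
    if u < 0.0 or u > 1.0:
        return False
    q = _cross(s, e1)
    v = _dot(d, q) / a
    if v < 0.0 or u + v > 1.0:
        return False
    t = _dot(e2, q) / a            # position of the hit along the segment
    return 0.0 <= t <= 1.0

def reduce_chain(points):
    """Delete interior points whose neighbor triangle no other segment crosses."""
    pts = list(points)
    changed = True
    while changed and len(pts) > 2:
        changed = False
        i = 1
        while i < len(pts) - 1:
            tri = (pts[i - 1], pts[i], pts[i + 1])
            blocked = any(
                segment_hits_triangle(pts[j], pts[j + 1], *tri)
                for j in range(len(pts) - 1)
                if j + 1 < i - 1 or j > i + 1   # skip segments touching the triangle
            )
            if blocked:
                i += 1
            else:
                del pts[i]
                changed = True
    return pts

def is_knotted(points):
    return len(reduce_chain(points)) > 2

def open_trefoil(n=60, gap=0.1):
    """Synthetic trefoil cut at an outermost point, termini pulled far outside."""
    ts = [math.pi + gap + k * (2 * math.pi - 2 * gap) / (n - 1) for k in range(n)]
    core = [(math.sin(t) + 2 * math.sin(2 * t),
             math.cos(t) - 2 * math.cos(2 * t),
             -math.sin(3 * t)) for t in ts]
    return [(core[0][0], -20.0, core[0][2])] + core + [(core[-1][0], -20.0, core[-1][2])]

def half_turn_arc(n=30):
    """Synthetic unknotted control: half a turn of a helix."""
    return [(math.cos(math.pi * k / (n - 1)), math.sin(math.pi * k / (n - 1)),
             0.1 * math.pi * k / (n - 1)) for k in range(n)]
```

For a real search, the input would be the Cα trace of one subunit followed by that of the second, joined by whatever linker model is chosen. Because the termini are held fixed, shallow knots near the chain ends may be sensitive to how the ends are treated, so a hit from this kind of test is best confirmed by visual inspection.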
Discussion
We have found that a designed knotted protein, 2ouf-knot, successfully folds to the target knotted configuration, demonstrating that a protein sequence can overcome substantial topological barriers on the way to reaching its minimum free energy structure even when it has not evolved to do so. However, consistent with the topological problems associated with knotting, we find that our designed protein knot has a more complex folding energy landscape than an unknotted control protein. This conclusion is supported by a recent study in which dimeric variants of the p53 tetramerization domain were engineered such that threading of one linear chain of the dimer through a cyclized second chain could be specifically monitored. It was found that the threaded dimer folded about an order of magnitude more slowly than the wild-type protein, in which threading is not required during folding since both chains of the dimer are linear (22). Our observation of a complex energy landscape for 2ouf-knot is also consistent with recent experimental investigations of the folding pathways of naturally knotted proteins (11, 23); all three of the proteins studied have been found to fold very slowly, with complex kinetic behaviors involving multiple intermediates. The lack of unknotted controls for those proteins has prevented direct assessment of the role played by their knotted topologies. Further kinetic analysis of model systems of the type we present here should allow additional insights into the roles of knots in proteins.
Interestingly, 2ouf-knot may not be the first case where the design of a tandem repeat protein led to a knotted structure. Nearly 15 years ago, to study the effects of linking dimeric proteins into single chains, Robinson and Sauer genetically fused the two chains of the Arc repressor dimer (Fig. 1A), resulting in a protein they dubbed Arc-L1-Arc (24). Although we cannot be certain because the crystal structure of Arc-L1-Arc was never determined, it is likely that the protein was in fact knotted, though this was apparently not recognized. A retrospective evaluation of the biophysical characterization of Arc-L1-Arc suggests that, in contrast to the present study, the putative knot, if present, had little effect on the complexity of the folding pathway of the protein; Arc-L1-Arc exhibited two-state behavior and folded and unfolded more quickly than both wild-type Arc (24) and an unknotted, disulfide-linked Arc dimer (25) at all concentrations of denaturant. Those results suggest that the effects of knots on the folding energy landscapes of proteins could depend on the particular protein fold.
We cannot conclusively identify the specific mechanistic events that lead to complexities in the folding energy landscape for 2ouf-knot. However, folding simulations of knotted proteins have given rise to various proposed mechanisms for threading during protein folding, each of which involves complex movements that would be expected to result in entropic barriers to folding and constricted energy landscapes. For instance, simulations from two groups have revealed that folding can proceed through “slipknotted” intermediates, in which the portion of the protein chain being threaded is initially in a hairpin-like conformation that goes through and then comes back out of the threading loop (16, 17, 26). A somewhat different mechanism involving the “flipping” of large segments of the protein over a semifolded core has also been proposed (1).
Alternatively, slow migration of the knot along the protein chain after an initial collapse of the protein during folding may be responsible for the slower folding rate of 2ouf-knot. Collapse of a “prethreaded” denatured state could give rise to a conformation bearing a knot in a nonnative location. The ensuing steps in folding would involve migration of the knot along the threaded portion of the polypeptide chain. Interactions between the threaded portion of the chain and the collapsed loop surrounding it, amounting to internal friction in the folding molecule, could result in a rugged energy landscape for 2ouf-knot. Simulations on slipknotted structures support the potential significance of chain friction (27), and the recent experimental demonstration of a rough energy landscape arising from internal friction (28), in that case due to helices pairing in nonnative registers on the way to the native state, shows that such effects can be important in protein folding reactions. This mechanism may account for the slow, complex folding of the knotted methyltransferases studied by Jackson et al., for which initial threading has been suggested to not be a kinetically limiting step (12, 15).
In addition, when considering explanations for slow folding, we note that a significant correlation has been observed between contact order and folding rate for a number of small proteins with simple folding kinetics, providing a link between the native structures of proteins and their mechanisms of folding (9, 29). We calculated the absolute contact order of 2ouf-ds, counting the sequence separation of intersubunit contacts through the disulfide bond (30), to be 10.8, while the absolute contact order of 2ouf-knot was calculated to be 20.9. These values indicate that contacting residues in 2ouf-ds are separated by approximately 11 residues on average, while the same contacts in 2ouf-knot are separated by about 21 residues on average. This large difference is consistent with a natural tendency for proteins that have complex topologies to have high contact order. It is probable that the higher fraction of nonlocal contacts in 2ouf-knot compared to 2ouf-ds can explain, at least in part, the slower folding of the knotted protein. To the extent that knotted proteins must have high contact order, this presents a potentially important complication for knotted proteins in general. However, it is important to note that 2ouf-knot folds to its native state even more slowly at low [GdmCl] than 2ouf-wt, which, on account of being dimeric, has an effectively infinite contact order. Evidently the nonlocality of contacts alone cannot fully explain the slower folding of 2ouf-knot.
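The effect of chain connectivity on absolute contact order can be illustrated with a toy calculation, in which the same three contacts are scored under the two connection schemes. The offset of 89 for the second domain of 2ouf-knot follows from tryptophan 18′ being residue 107 in that construct. The cysteine position (45), the rule of counting the disulfide bond as one step, and the contact list itself are hypothetical choices for illustration, so the output does not reproduce the values of 10.8 and 20.9 quoted above.

```python
KNOT_OFFSET = 89      # residue j of the second domain is j + 89 in 2ouf-knot (107 - 18)
HYPOTHETICAL_CYS = 45 # assumed position of the engineered cysteine in each chain

def sep_knot(a, b):
    """Sequence separation when the two domains are fused in tandem."""
    (chain_a, res_a), (chain_b, res_b) = a, b
    pos_a = res_a + (KNOT_OFFSET if chain_a == "B" else 0)
    pos_b = res_b + (KNOT_OFFSET if chain_b == "B" else 0)
    return abs(pos_a - pos_b)

def sep_disulfide(a, b, cys=HYPOTHETICAL_CYS):
    """Sequence separation with intersubunit paths routed through the disulfide."""
    (chain_a, res_a), (chain_b, res_b) = a, b
    if chain_a == chain_b:
        return abs(res_a - res_b)
    # Assumed rule: walk to the cysteine, cross the bond (one step), walk on.
    return abs(res_a - cys) + 1 + abs(res_b - cys)

def absolute_contact_order(contacts, separation):
    """Mean sequence separation over all contacting residue pairs."""
    return sum(separation(a, b) for a, b in contacts) / len(contacts)

# Toy contacts: two intrasubunit pairs and one intersubunit pair.
toy_contacts = [
    (("A", 20), ("A", 30)),
    (("B", 20), ("B", 30)),
    (("A", 25), ("B", 60)),
]

aco_knot = absolute_contact_order(toy_contacts, sep_knot)       # 48.0
aco_ds = absolute_contact_order(toy_contacts, sep_disulfide)    # 56/3
```

Only the intersubunit contact changes between the two schemes, which is why a protein with many contacts across the domain interface can show a large difference in contact order even when its tertiary structure is unchanged.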
In summary, our results suggest that although there is no insurmountable barrier to threading during protein folding, knotted proteins have more complex or constricted folding energy landscapes than unknotted proteins with similar tertiary structures. A common view in the protein folding community is that most proteins are subject to selective pressure to fold cooperatively, without highly populated intermediate states (6). This selective pressure is thought to arise from highly populated nonnative species having an increased risk of misfolding or aggregating, which could deleteriously affect the health of the cell (31). Our results imply that knotted proteins possess complex folding landscapes, leading to increased folding times and populated intermediates that could be selected against during evolution. It is interesting, however, that although 2ouf-knot folds 20 times more slowly than the unknotted control, it still folds within a few seconds, which is faster than some small proteins with simple folding kinetics (6). Apparently, despite the topological complications, some knotted structures are able to fold quickly and cooperatively enough to minimize deleterious misfolding and aggregation events. Nonetheless, it is likely that for many potentially knotted tertiary structures the landscape is sufficiently complex or constricted to provide a strong disadvantage, which could explain the apparent rarity of knotted folds in nature.
Materials and Methods
Proteins were expressed recombinantly in E. coli and purified by metal affinity chromatography and size exclusion chromatography. Crystals of 2ouf-ds and 2ouf-knot were obtained using the hanging drop vapor diffusion method. Crystal structures of the proteins were determined by molecular replacement using X-ray diffraction data collected in-house and at the Advanced Photon Source beamline 24-ID-C. An analysis of pseudosymmetry from the diffraction intensities is illustrated in Fig. S6. CD and fluorescence spectroscopy were performed in buffers containing varying amounts of GdmCl to perturb the energy landscapes of the proteins. The kinetics of folding and unfolding reactions were monitored using a stopped-flow device equipped with excitation and emission monochromators set to measure native tryptophan fluorescence. An endpoint amplitude analysis is shown in Fig. S7. Detailed methods can be found in the SI Text.
Acknowledgments.
The authors thank Katelyn Connell and Susan Marqusee for assistance with folding experiments, Martin Phillips for assistance with stopped-flow fluorimetry, Inna Pashkov for technical assistance, and Sophie Jackson for helpful comments on the manuscript. This work was supported by award R01GM081652 from the National Institutes of Health.
Footnotes
The authors declare no conflict of interest.
*This Direct Submission article had a prearranged editor.
Data deposition: The atomic coordinates and structure factors reported in this paper have been deposited in the Protein Data Bank, www.pdb.org.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1007602107/-/DCSupplemental.
References
1. Bölinger D, et al. A Stevedore’s protein knot. PLoS Comput Biol. 2010;6:e1000731. doi: 10.1371/journal.pcbi.1000731.
2. Yeates TO, Norcross TS, King NP. Knotted and topologically complex proteins as models for studying folding and stability. Curr Opin Chem Biol. 2007;11:595–603. doi: 10.1016/j.cbpa.2007.10.002.
3. Taylor WR. A deeply knotted protein structure and how it might fold. Nature. 2000;406:916–919. doi: 10.1038/35022623.
4. Mansfield ML. Are there knots in proteins? Nat Struct Biol. 1994;1:213–214. doi: 10.1038/nsb0494-213.
5. Mansfield ML. Fit to be tied. Nat Struct Biol. 1997;4:166–167. doi: 10.1038/nsb0397-166.
6. Jackson SE. How do small single-domain proteins fold? Fold Des. 1998;3:R81–91. doi: 10.1016/S1359-0278(98)00033-9.
7. Leopold PE, Montal M, Onuchic JN. Protein folding funnels: a kinetic approach to the sequence-structure relationship. Proc Natl Acad Sci USA. 1992;89:8721–8725. doi: 10.1073/pnas.89.18.8721.
8. Dill KA, Chan HS. From Levinthal to pathways to funnels. Nat Struct Biol. 1997;4:10–19. doi: 10.1038/nsb0197-10.
9. Wolynes PG. Recent successes of the energy landscape theory of protein folding and function. Q Rev Biophys. 2005;38:405–410. doi: 10.1017/S0033583505004075.
10. Mallam AL, Jackson SE. Probing nature’s knots: the folding pathway of a knotted homodimeric protein. J Mol Biol. 2006;359:1420–1436. doi: 10.1016/j.jmb.2006.04.032.
11. Mallam AL, Jackson SE. A comparison of the folding of two knotted proteins: YbeA and YibK. J Mol Biol. 2007;366:650–665. doi: 10.1016/j.jmb.2006.11.014.
12. Mallam AL, Onuoha SC, Grossmann JG, Jackson SE. Knotted fusion proteins reveal unexpected possibilities in protein folding. Mol Cell. 2008;30:642–648. doi: 10.1016/j.molcel.2008.03.019.
13. Mallam AL, Morris ER, Jackson SE. Exploring knotting mechanisms in protein folding. Proc Natl Acad Sci USA. 2008;105:18740–18745. doi: 10.1073/pnas.0806697105.
14. Mallam AL. How does a knotted protein fold? FEBS J. 2009;276:365–375. doi: 10.1111/j.1742-4658.2008.06801.x.
15. Mallam AL, Rogers JM, Jackson SE. Experimental detection of knotted conformations in denatured proteins. Proc Natl Acad Sci USA. 2010;107:8189–8194. doi: 10.1073/pnas.0912161107.
16. Wallin S, Zeldovich KB, Shakhnovich EI. The folding mechanics of a knotted protein. J Mol Biol. 2007;368:884–893. doi: 10.1016/j.jmb.2007.02.035.
17. Sułkowska JI, Sułkowski P, Onuchic J. Dodging the crisis of folding proteins with knots. Proc Natl Acad Sci USA. 2009;106:3119–3124. doi: 10.1073/pnas.0811147106.
18. Tsai JY, et al. Crystal structure of HP0242, a hypothetical protein from Helicobacter pylori with a novel fold. Proteins. 2006;62:1138–1143. doi: 10.1002/prot.20864.
19. Mallam AL, Jackson SE. Folding studies on a knotted protein. J Mol Biol. 2005;346:1409–1421. doi: 10.1016/j.jmb.2004.12.055.
20. Robinson CR, Sauer RT. Optimizing the stability of single-chain proteins by linker length and composition mutagenesis. Proc Natl Acad Sci USA. 1998;95:5929–5934. doi: 10.1073/pnas.95.11.5929.
21. Krissinel E, Henrick K. Inference of macromolecular assemblies from crystalline state. J Mol Biol. 2007;372:774–797. doi: 10.1016/j.jmb.2007.05.022.
22. Blankenship JW, Dawson PE. Threading a peptide through a peptide: protein loops, rotaxanes, and knots. Protein Sci. 2007;16:1249–1256. doi: 10.1110/ps.062673207.
23. Andersson FI, Pina DG, Mallam AL, Blaser G, Jackson SE. Untangling the folding mechanism of the 5(2)-knotted protein UCH-L3. FEBS J. 2009;276:2625–2635. doi: 10.1111/j.1742-4658.2009.06990.x.
24. Robinson CR, Sauer RT. Equilibrium stability and sub-millisecond refolding of a designed single-chain Arc repressor. Biochemistry. 1996;35:13878–13884. doi: 10.1021/bi961375t.
25. Robinson CR, Sauer RT. Striking stabilization of Arc repressor by an engineered disulfide bond. Biochemistry. 2000;39:12494–12502. doi: 10.1021/bi001484e.
26. King NP, Yeates EO, Yeates TO. Identification of rare slipknots in proteins and their implications for stability and folding. J Mol Biol. 2007;373:153–166. doi: 10.1016/j.jmb.2007.07.042.
27. Sułkowska JI, Sułkowski P, Onuchic JN. Jamming proteins with slipknots and their free energy landscape. Phys Rev Lett. 2009;103:268103. doi: 10.1103/PhysRevLett.103.268103.
28. Wensley BG, et al. Experimental evidence for a frustrated energy landscape in a three-helix-bundle protein family. Nature. 2010;463:685–688. doi: 10.1038/nature08743.
29. Plaxco KW, Simons KT, Baker D. Contact order, transition state placement and the refolding rates of single domain proteins. J Mol Biol. 1998;277:985–994. doi: 10.1006/jmbi.1998.1645.
30. Parrini C, et al. The folding process of acylphosphatase from Escherichia coli is remarkably accelerated by the presence of a disulfide bond. J Mol Biol. 2008;379:1107–1118. doi: 10.1016/j.jmb.2008.04.051.
31. Dobson CM. Protein folding and misfolding. Nature. 2003;426:884–890. doi: 10.1038/nature02261.