Abstract
Calculating protein stability and predicting stabilizing mutations remain exceedingly difficult tasks, largely due to the inadequacy of potential functions, the difficulty of modeling entropy and the unfolded state, and challenges of sampling, particularly of backbone conformations. Yet, computational design has produced some remarkably stable proteins in recent years, apparently owing to near ideality in structure and sequence features. With caveats, computational prediction of stability can be used to guide mutation, and mutations derived from consensus sequence analysis, especially when improved by recent co-variation filters, are very likely to stabilize without sacrificing function. The combination of computational and statistical approaches with library approaches, including new technologies such as deep sequencing and high-throughput stability measurements, points to a very exciting near-term future for stability engineering, even with difficult computational issues remaining.
There is considerable evidence that some proteins, or domains, function in an intrinsically disordered state [1], but the majority of proteins act from a highly ordered folded state [2]. Mutational and combinatorial studies have shown us that these highly ordered states are not common in ‘sequence space’ [3,4], meaning that most polypeptide sequences are not well folded like natural proteins. In fact, because the folded state of proteins is only 5–15 kcal mol−1 more stable than the unfolded state, even a single mutation can significantly destabilize or unfold a protein. Although most proteins from mesophiles have melting temperatures far below those of corresponding proteins from thermophiles—which is to say, their folds can usually be stabilized—the overwhelming majority of mutations to natural proteins are neutral to unfavorable [5,6].
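The thermodynamic point can be made concrete with the two-state relation ΔG_unf = −RT ln K, where K = [U]/[F]. The sketch below is a minimal illustration, assuming a simple two-state protein at 25 °C; the function name and the 5 kcal mol−1 starting stability are hypothetical choices, not values from any cited study.

```python
import math

R = 1.987e-3  # gas constant, kcal mol^-1 K^-1


def fraction_unfolded(dg_unfold, temp_k=298.15):
    """Two-state fraction unfolded.

    dg_unfold: G(unfolded) - G(folded) in kcal/mol; positive means the
    folded state is more stable.
    """
    k_unf = math.exp(-dg_unfold / (R * temp_k))  # K = [U]/[F]
    return k_unf / (1.0 + k_unf)


# Hypothetical protein with 5 kcal/mol of stability, hit by mutations with
# increasing destabilizing ddG values (kcal/mol).
for ddg in (0.0, 1.0, 3.0, 5.0):
    f_u = fraction_unfolded(5.0 - ddg)
    print(f"ddG = {ddg:.1f} kcal/mol -> fraction unfolded = {f_u:.2e}")
```

Under these assumptions, a protein at the low end of the 5–15 kcal mol−1 range is essentially fully folded, yet a single 5 kcal mol−1 destabilizing mutation leaves half of it unfolded at equilibrium.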
At a minimum, this is an inconvenience for protein scientists. The instability of natural proteins or their variants makes them difficult to purify, handle, and study. A well-meaning mutation to probe the function of some residue must always be analyzed in light of the high likelihood of unfavorable consequences on the folding or stability of the protein. Protein stability and instability also underlie biology and disease. Many mutations may reduce function or promote disease simply through destabilization, such as many of the mutations of tumor suppressors like p53 [7] or mutations of SOD1 that may be related to ALS [8]. We continue to find new uses for proteins as therapeutics due to their exquisite specificities, but their uses in the clinic are significantly limited by difficulty in handling, poor storage stability, and aggregation [9].
The solution sounds simple: stabilize the protein. Stabilizing mutations may be rare, but a great many have been found. The problem has been attacked from virtually every imaginable angle: random mutagenesis, rational design, bioinformatics, and computational design. The sheer number of approaches highlights an objective reality: we are not that good at it. This is especially vexing because the key forces that underlie protein stability are fairly well understood. They include the burial and tight packing of hydrophobic residues, the release of ordered solvent, the formation of hydrogen bonds and other electrostatic interactions, conformational entropy, and bond strain (such as backbone angle strain) [10–12]. The dominance of core packing in protein stability [13], which encompasses several of these parameters at once, simplifies the problem. More subtle effects, such as the burial of charged residues [14] and the role of surface electrostatics [15], have become much better understood in recent years.
But the challenges remain numerous. For one thing, some of these factors are much easier to calculate than others [16]. We are very good at calculating geometric parameters, for example to maximize hydrophobic surface burial or minimize bond strain. But electrostatics calculations are greatly hampered by the difficulty of treating solvation, in particular polarizability. There is no way to compute entropy directly from the force field itself, so conformational effects depend on long simulations and accurate sampling, both of which are challenging. Matthews’s work on T4 lysozyme taught us that proteins respond to mutations more by subtle movements of the backbone than by adopting unfavorable side chain rotamers [17], but non-discretized backbone conformational space is much more difficult to explore. Moreover, subtle changes can sometimes cause proteins to settle into very different regions of conformational space, as seen in topological changes caused by seemingly conservative mutations of a protein’s hydrophobic core [18]. The gain in solvent entropy that largely underlies the hydrophobic effect is not explicitly included in these calculations. Finally, the ΔG of folding is the free energy difference between the folded and unfolded states, but our knowledge of how to model the unfolded state is so scant that we generally do not model it at all. It is also very difficult to model the effects of misfolding or to account for alternative conformations.
Despite these daunting challenges, progress has been made in engineering highly stabilized proteins in recent years. Here I will briefly examine some important advances and ongoing challenges in computational and statistical stabilization of proteins.
Rosetta and other computational design
There have been numerous contributors to the modern state of computational protein design [19], and several successful implementations of design programs, but none has had as broad an impact as Rosetta [20], emanating from David Baker’s lab (with considerable development by many collaborators). The Rosetta modelling and design suite of applications has produced remarkable successes, from stable folds not seen in nature [21], to de novo design of enzymes for reactions like the Kemp elimination [22], the retro-aldol reaction [23], and the Diels-Alder reaction [24]. Two recent papers have focused on the use of Rosetta to design folds, and in particular the effects of designing ‘ideal’ versions of particular structures. The results have included some of the most stable proteins observed or designed to date.
Koga et al. focused on the design of proteins with steep folding funnels, built from structurally optimized elements containing only local interactions (that is, interactions between residues that are close in primary structure), which were then assembled in a way that strongly favors a single tertiary structure [25]. Fundamental rules for ββ, βα and αβ elements were used to discover emergent rules for larger units, like ββα, which were then assembled into folds such as the ferredoxin-like βαββαβ. Genes for a total of 54 designs for 5 folds were synthesized, of which 45 expressed and were soluble, 32 had the expected CD spectra, and 25 had a Tm greater than 95 °C. No single reason emerges for the high stability of these variants, and some factors outside of the idealized elements, such as the selection of large hydrophobic residues to strongly favor burial, are likely important, but the inference is that the ideality of the elements cumulatively favors folding. That is, getting all the details right, not a single magic bullet, best explains the successful results.
Huang et al. recently used a parametric approach along with Rosetta design tools to control the oligomeric state and handedness of designed helical bundles [26]. An antiparallel 3-helix bundle had a denaturation midpoint of 7 M GdnHCl at 80 °C, corresponding to a ΔG of folding of over 60 kcal mol−1 at 25 °C. A designed five-helix bundle did not melt at 95 °C in PBS; a four-helix bundle did not melt at 95 °C in 8 M GdnHCl. Baker and colleagues note that the 3- and 4-helix proteins are extreme stability outliers in the ProTherm database [27], and attribute the stability to ideal side chain complementarity as well as minimal backbone strain.
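Chemical denaturation midpoints are usually converted to folding free energies with the linear extrapolation model, ΔG_unf([D]) = ΔG_H2O − m[D], so that ΔG_H2O = m·C_m at the midpoint, where ΔG_unf = 0. The sketch below illustrates only this generic textbook relation at a single temperature, with a hypothetical m-value and midpoint; it is not a reconstruction of how Huang et al. obtained their value, which also involved extrapolation in temperature.

```python
import math

R = 1.987e-3  # kcal mol^-1 K^-1


def dg_unfold_lem(denaturant_m, dg_water, m_value):
    """Linear extrapolation model: dG_unf([D]) = dG_H2O - m * [D], in kcal/mol."""
    return dg_water - m_value * denaturant_m


def fraction_unfolded_lem(denaturant_m, dg_water, m_value, temp_k=298.15):
    """Two-state fraction unfolded at a given denaturant concentration (M)."""
    k = math.exp(-dg_unfold_lem(denaturant_m, dg_water, m_value) / (R * temp_k))
    return k / (1.0 + k)


def dg_water_from_midpoint(c_mid, m_value):
    """At the midpoint dG_unf = 0, so dG_H2O = m * C_m."""
    return m_value * c_mid


# Hypothetical protein: m = 2.0 kcal mol^-1 M^-1, midpoint at 3.5 M GdnHCl.
dg0 = dg_water_from_midpoint(3.5, 2.0)
print(dg0)                                   # 7.0
print(fraction_unfolded_lem(3.5, dg0, 2.0))  # 0.5
```

Because ΔG_H2O is the product of m and C_m, an unusually high denaturation midpoint signals unusually high stability.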
Despite these truly remarkable results, it is sobering to note that several related designed proteins that appeared every bit as ideal as the successes failed to express, to be soluble, or to give well-dispersed NMR spectra. The exact nature of the differences between successes and failures is difficult to discern because it likely arises from multiple inconspicuous nonidealities. To this point, Murphy et al. redesigned the core of CheA, a four-helix bundle, using four different approaches to backbone flexibility and core repacking [28]. Two of the designs were very successful, with Tm > 140 °C and a ΔG of folding of 15–16 kcal mol−1. But one did not express, and one had wild-type-like stability. Kuhlman and colleagues attempted to discern the differences among the designs by examining Ramachandran and side chain torsion angle preferences and hydrophobic burial, but no simple answer emerged.
The very high stability achievable in idealized proteins compared with what is observed in natural proteins, even from thermophiles, raises the question of whether this kind of ideality is incompatible with function. My guess is that it is not; natural, active proteins seem more likely to have only adequate stability because that is all natural selection demands of them. This question seems akin to whether thermostable proteins can be active at low temperature, which Arnold and colleagues convincingly showed to be possible through directed evolution [29]. But it remains to be seen whether dynamic features, binding sites, hydrophobic patches and other features of functional proteins are actually compatible with these idealized frameworks, or whether functionalization will necessarily degrade their ideality.
Similarly, despite remarkable successes in de novo enzyme design, the activity of the designed enzymes has had to be improved by directed evolution in most cases. In at least one case, a combination of statistically derived consensus mutations (see below) and directed evolution was needed to improve the best computationally designed Kemp eliminase, KE59, which was too unstable at the outset to evolve [30]. It has become clear that most of the mutations that accumulate in a directed evolution campaign destabilize the subject protein, so starting with more stable homologs is advisable [6]. Similar considerations may prove useful in computational protein design.
Related issues of positive design (for affinity) and negative design (for specificity) arise in the computational design of protein interactions, wherein one must find the co-optimum that maximizes the interaction of a protein with its intended partner and minimizes its interaction with itself or with other near-partner sequences. Design algorithms that specifically address this tradeoff have been implemented successfully, for example for coiled coils [31,32], and the topic has been reviewed by Chen & Keating [33]. Accurate estimates of the affinities are essential for these calculations, and the success of data-driven over physics-driven approaches [34] highlights the difficulty of achieving such accuracy even for very simple proteins.
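The positive/negative design tradeoff can be illustrated with a toy selection rule: require a candidate to bind its target at least as tightly as some threshold (positive design), then pick the candidate with the largest gap between its target energy and its best competing self or off-target energy (negative design). This is a minimal sketch with hypothetical sequence names, energies and threshold; published coiled-coil design methods [31,32] use their own scoring functions and search strategies.

```python
from typing import Dict, Optional

Energies = Dict[str, float]  # partner name -> predicted binding energy (kcal/mol; lower = tighter)


def specificity_gap(energies: Energies, target: str) -> float:
    """Best competing (non-target) energy minus target energy; larger = more specific."""
    competitors = [e for partner, e in energies.items() if partner != target]
    return min(competitors) - energies[target]


def choose_design(candidates: Dict[str, Energies], target: str,
                  max_target_energy: float) -> Optional[str]:
    """Positive design: keep candidates binding the target at least as tightly as
    max_target_energy. Negative design: among those, maximize the specificity gap."""
    passing = {name: e for name, e in candidates.items() if e[target] <= max_target_energy}
    if not passing:
        return None
    return max(passing, key=lambda name: specificity_gap(passing[name], target))


# Hypothetical predicted energies for three candidate sequences.
candidates = {
    "seqA": {"target": -12.0, "self": -11.5, "offB": -6.0},  # tight, but homodimerizes
    "seqB": {"target": -10.0, "self": -5.0, "offB": -4.0},   # tight and specific
    "seqC": {"target": -7.0, "self": -2.0, "offB": -1.0},    # too weak for the target
}
print(choose_design(candidates, "target", max_target_energy=-8.0))  # seqB
```

The tightest binder (seqA) loses here because it binds itself almost as well as its target, which is exactly the failure mode negative design is meant to catch.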
Computational prediction of stabilizing mutations
A number of programs have been written or used specifically to estimate the ΔΔG of folding for protein mutants. Among the best known, in addition to Rosetta (which was not actually designed for this purpose), are FoldX [35], Eris [36], and CC/PBSA [37]. Potapov et al. set out to evaluate these (as well as EGAD, I-Mutant2.0, and Hunter) on a set of over 2,000 single-residue protein mutants with known stability effects [38]. For all methods, the correlation coefficient was positive (about 0.4–0.6) between the predicted and experimental results, but less comfortingly, the ΔΔG errors were close to 1 ± 1 kcal mol−1 for all mutations, which appears similar to the distribution of experimental ΔΔG values in the database. Even the sign of ΔΔG was incorrect about 25% of the time for all methods.
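The three summary statistics discussed above (correlation, error magnitude, and sign agreement) are straightforward to compute for any predictor. A minimal sketch, assuming paired lists of predicted and experimental ΔΔG values in kcal mol−1 with a consistent sign convention; the numbers are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_abs_error(pred, expt):
    """Mean absolute ddG error, in the units of the inputs (kcal/mol here)."""
    return sum(abs(p - e) for p, e in zip(pred, expt)) / len(pred)


def sign_agreement(pred, expt):
    """Fraction of mutants where predicted and experimental ddG share a sign
    (zero is grouped with negative values in this simple version)."""
    return sum((p > 0) == (e > 0) for p, e in zip(pred, expt)) / len(pred)


# Hypothetical predicted vs. experimental ddG values (kcal/mol).
predicted = [1.2, -0.5, 2.0, 0.3, 1.5]
experimental = [1.0, 0.4, 2.5, -0.2, 0.9]
print(f"r = {pearson_r(predicted, experimental):.2f}")
print(f"MAE = {mean_abs_error(predicted, experimental):.2f} kcal/mol")
print(f"sign agreement = {sign_agreement(predicted, experimental):.0%}")
```

A respectable correlation can coexist with errors as large as the ΔΔG values themselves, which is why sign agreement and absolute error are worth reporting alongside r.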
In part because Rosetta performed somewhat worse than most of the other algorithms in the study by Potapov, Schreiber and colleagues, Kellogg et al. examined the causes in detail [39]. Part of the issue arose from the way the calculation was carried out in Rosetta (increased sampling and dampening of the repulsive term brought Rosetta in line with the other methods), but the authors also noted poor performance for hydrophobic/hydrophilic exchanges, surface mutations, volume changes, and mutations where unfolded state effects are observed (as evidenced by changes in m values). A separate study by Das concluded that Rosetta’s failure on four structure prediction problems involving small proteins or RNAs likely stems mostly from the energy function, not from sampling [40]. In general, backbone flexibility appears to help computational methods predict stability, particularly for mutations with large volume changes [41]. It is worth noting that the available training sets for stability prediction algorithms are biased by researcher interest, resulting in a prevalence of Ala mutations (> 25% of the ProTherm database [27]), which mostly result in underpacking.
Surface electrostatics
One of the difficult areas noted above, changes to solvent-exposed residues, has received increased attention as a means of computational stabilization. The effects of protein surface mutations in general are difficult to predict because of our limitations in calculating solvation and attenuated electrostatic effects. And in general, it is the hydrophobic core of the protein and not the surface that is critical for determining the fold and overall stability [13]. On the other hand, structural bioinformatics has demonstrated that proteins from thermophiles do not have markedly different cores or numbers of disulfide bonds than proteins from mesophiles, but they do have different surface charge distributions [42]. In particular, they appear to have more favorable electrostatic interactions.
Makhatadze and Sanchez-Ruiz speculated, based on differences in the pH dependence of chemical and thermal denaturation of ubiquitin, that surface charge optimization could be a viable route to rational protein stabilization [43]. Using Tanford-Kirkwood electrostatics calculations, they showed reasonable agreement between prediction and experiment for both stabilizing and destabilizing mutations on the surface of ubiquitin [44]. Makhatadze further demonstrated this approach for other proteins, such as acylphosphatase and Cdc42, with little change in activity [45], and has suggested that the stabilization arises mostly from increased folding rates due to the elimination of folding frustration in the native sequences [46].
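As a rough illustration of why surface charge rearrangements can be stabilizing, the sketch below sums pairwise charge–charge energies with a screened Coulomb (Debye–Hückel) term. This is a deliberately simplified stand-in, not the Tanford-Kirkwood model used in [44], which treats the protein as a low-dielectric sphere and scales interactions by solvent accessibility. The effective dielectric (40), the Debye length (about 7.9 Å, roughly 150 mM ionic strength at 25 °C) and the coordinates are assumptions.

```python
import math
from itertools import combinations

COULOMB_KCAL = 332.06  # kcal Å mol^-1 e^-2


def charge_energy(sites, epsilon=40.0, debye_length=7.9):
    """Total pairwise screened-Coulomb energy in kcal/mol.

    sites: list of (charge, (x, y, z)) with charges in units of e and
    coordinates in Å. epsilon and debye_length are assumed values.
    """
    total = 0.0
    for (q1, r1), (q2, r2) in combinations(sites, 2):
        d = math.dist(r1, r2)
        total += COULOMB_KCAL * q1 * q2 / (epsilon * d) * math.exp(-d / debye_length)
    return total


# Hypothetical surface charges (e, Å): two Lys and one Glu.
wild_type = [(+1, (0.0, 0.0, 0.0)), (+1, (6.0, 0.0, 0.0)), (-1, (0.0, 6.0, 0.0))]
# Charge-reversal mutant at the second site (e.g., Lys -> Glu).
mutant = [(+1, (0.0, 0.0, 0.0)), (-1, (6.0, 0.0, 0.0)), (-1, (0.0, 6.0, 0.0))]

print(f"wild type: {charge_energy(wild_type):.2f} kcal/mol")
print(f"mutant:    {charge_energy(mutant):.2f} kcal/mol")
```

In this toy case, the charge reversal trades a repulsion at 6 Å for a weaker, longer-range one, so the total charge–charge energy becomes more favorable.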
Buried ionizable groups are rare but sometimes are required for protein function. Garcia-Moreno and colleagues have found that 25 buried Lys residues have apparent pKa values from 5.3 to 10.4 [14], and structural perturbations were observed for many of the variants when the Lys is protonated [47]. This makes the stability effects of buried polar residues extremely challenging to predict.
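The energetic stakes of a buried charge can be estimated from the pKa shifts alone. A minimal sketch, assuming a reference Lys pKa of 10.4 (the top of the range quoted above) and 25 °C: the Henderson–Hasselbalch equation gives the charged fraction, and ΔG = 2.303·RT·ΔpKa gives the free energy associated with the shift. Function names are hypothetical.

```python
import math

R = 1.987e-3  # kcal mol^-1 K^-1


def fraction_charged_base(pka, ph):
    """Henderson-Hasselbalch for a basic group such as Lys: fraction protonated (charged)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def pka_shift_energy(pka_observed, pka_reference=10.4, temp_k=298.15):
    """Free energy (kcal/mol) corresponding to a pKa shift: 2.303 * R * T * |dpKa|."""
    return math.log(10) * R * temp_k * abs(pka_reference - pka_observed)


for pka in (10.4, 7.0, 5.3):
    print(f"pKa {pka}: charged at pH 7 = {fraction_charged_base(pka, 7.0):.3f}, "
          f"shift energy = {pka_shift_energy(pka):.1f} kcal/mol")
```

Under these assumptions, a buried Lys with an apparent pKa of 5.3 is about 98% neutral at pH 7, and the 5.1-unit shift corresponds to roughly 7 kcal mol−1, comparable to the entire stability of many proteins.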
Sequence consensus
Computational approaches to predicting protein stability (or protein stability changes) are generally physics-based, knowledge-based, or a hybrid. Knowledge-based approaches include machine learning (e.g., SVM) applications, but as noted above, the relative paucity of experimental data limits these approaches. An alternative knowledge-based approach for predicting stabilizing mutations is the concept of consensus: the idea that mutation to the most common amino acid at a given position in an alignment of structural homologs is likely to be stabilizing.
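The basic consensus idea can be sketched in a few lines: for each alignment column, find the most common residue (ignoring gaps) and propose it wherever the query sequence differs. A minimal sketch, assuming the query is already aligned to the MSA and positions are reported as alignment columns; the function name and toy alignment are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def consensus_mutations(msa: List[str], query: str, gap: str = "-") -> List[Tuple[int, str, str, float]]:
    """Propose query -> consensus substitutions from an aligned MSA.

    Returns (1-based alignment column, query residue, consensus residue,
    consensus frequency among non-gap residues) for every differing column.
    Ties are broken by first occurrence in the column.
    """
    proposals = []
    for i, query_res in enumerate(query):
        column = [seq[i] for seq in msa if seq[i] != gap]
        if query_res == gap or not column:
            continue
        cons_res, count = Counter(column).most_common(1)[0]
        if cons_res != query_res:
            proposals.append((i + 1, query_res, cons_res, count / len(column)))
    return proposals


# Hypothetical toy alignment and query (already aligned).
msa = ["MKVLA", "MKILA", "MRVLA", "MKVLG"]
print(consensus_mutations(msa, "MRILG"))
# [(2, 'R', 'K', 0.75), (3, 'I', 'V', 0.75), (5, 'G', 'A', 0.75)]
```

Reporting the consensus frequency alongside each proposal makes it easy to apply conservation filters such as those discussed below.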
Steipe et al. used this idea to stabilize immunoglobulin domains, finding that individual consensus mutations were stabilizing about half the time [48,49]. Wyss found that making full consensus variants (i.e., changing every residue to the most common residue in the alignment) of fungal phytases resulted in highly stabilized proteins [50,51]. This concept has since been used to generate very stable repeat protein architectures that maintain the original fold, which have proved useful for engineering binders to a wide variety of targets [52,53]. Consensus mutations have also been combined with computational approaches to stabilize proteins [54].
Effects of sequence correlation
Although it seems intuitive that the most common amino acids might be stabilizing, it is somewhat more surprising when one realizes that the MSAs used for these studies are dominated by proteins from mesophiles. So why are consensus mutations stabilizing? The likely explanation is that drift allows each individual protein to sample only some of the stabilizing mutations needed for adequate stability. Effectively, consensus mutations compound stabilizing interactions that no single protein needs to amass. A similar effect has been observed for mutations in directed evolution [55]. But it is also easy to think of reasons why this will not always work. Some sites are not well conserved, so the most common residue carries little obvious information, and some sites are likely to be coupled. Indeed, Ranganathan and colleagues have demonstrated that designs of WW domains are improved in soluble expression and stability by accounting for co-variation of residues [56]. The role of correlated residues is even more poorly understood than that of consensus mutations, but they have been implicated in allosteric regulation in several instances [57–59]. Magliery et al. elucidated the source of high charge in consensus TPR motifs using co-variation analysis, demonstrating that natural TPRs have many charge-balancing interactions at weakly conserved positions [60]. We have recently shown that correlated mutations in the LID domain of adenylate kinase, originally identified through a sequence correlation visualization modality called MAVL/StickWRLD [61], are critical to enzymatic activity by modulating protein dynamics (Nicholas Callahan, Deepa Perera, unpublished results).
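A simple way to detect co-variation between two alignment columns is their mutual information (MI). The sketch below computes raw MI from an aligned MSA; it is offered only as an illustration, is not the statistic used in the studies cited above, and in practice needs corrections for phylogenetic bias and low column entropy (for example, the average product correction). The toy alignment, with an E/K pair that swaps in a charge-balancing way, is hypothetical.

```python
import math
from collections import Counter


def column_mi(msa, i, j):
    """Raw mutual information (in nats) between aligned columns i and j (0-based)."""
    pairs = [(seq[i], seq[j]) for seq in msa]
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


# Hypothetical alignment: columns 1 and 2 swap E/K together (charge balancing),
# while column 3 varies independently of them.
msa = ["AEKA", "AKEA", "AEKG", "AKEG"]
print(round(column_mi(msa, 1, 2), 3))  # 0.693
print(round(column_mi(msa, 1, 3), 3))  # 0.0
```

The charge-swapping columns share ln 2 ≈ 0.69 nats of information, while a column that varies independently shares none; note that neither column is conserved, so a plain consensus analysis would see nothing at these positions.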
Sullivan et al. demonstrated that two consensus variants of triosephosphate isomerase differ dramatically in their activity and physical properties despite differing at only a small number of unconserved residues [62]. Many of these positions overlapped with positions participating in a network of statistically interacting residues, seen especially in metazoans. Subsequent work in our lab has shown that narrowing the phylogenetic scope of the database allows one to capture these networks, resulting in well-folded, active proteins (Venuka Durani, unpublished results).
These observations also suggested an algorithm to improve consensus design: make consensus mutations at more-conserved positions and eliminate mutations at highly correlated positions [63,64]. In yeast TIM, the result was a 90+% success rate in identifying stabilizing mutations, and large numbers of the mutations could be productively combined. Every individual mutant, as well as the algorithmic aggregate mutant, retained high activity. Subsequent studies in our lab on TIMs from humans and bacteria, as well as two adenylate kinases from bacteria, found similar success with this approach (Nicholas Callahan, Sidharth Mohan, unpublished results), although the optimal phylogenetic distribution of the source MSA remains to be explored. These modified consensus methods are much simpler for non-experts to deploy than computational methods, and they do not require structural information.
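The filtering logic described above can be sketched as follows, assuming per-position consensus frequencies and a set of positions flagged as strongly co-varying have already been computed by some separate analysis. The conservation cutoff, names and example data are hypothetical; the published algorithm [63,64] uses its own criteria.

```python
from typing import Iterable, List, Set, Tuple

Proposal = Tuple[int, str, str, float]  # (position, query residue, consensus residue, consensus frequency)


def filter_consensus(proposals: Iterable[Proposal],
                     correlated_positions: Set[int],
                     min_frequency: float = 0.5) -> List[Proposal]:
    """Keep consensus substitutions at well-conserved, non-co-varying positions.

    min_frequency is a hypothetical conservation cutoff; correlated_positions
    would come from a separate co-variation analysis.
    """
    kept = []
    for pos, query_res, cons_res, freq in proposals:
        if freq < min_frequency:
            continue  # weakly conserved: the consensus residue carries little information
        if pos in correlated_positions:
            continue  # co-varying: the consensus residue may not suit its network partners
        kept.append((pos, query_res, cons_res, freq))
    return kept


# Hypothetical proposals and a hypothetical co-varying position (57).
proposals = [(12, "S", "A", 0.82), (30, "K", "E", 0.41), (57, "D", "N", 0.74), (88, "V", "I", 0.66)]
print(filter_consensus(proposals, correlated_positions={57}))
# [(12, 'S', 'A', 0.82), (88, 'V', 'I', 0.66)]
```

Both filters only remove candidates, so the method trades some missed stabilizing mutations for a higher success rate among those that remain.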
Reconstruction of ancestral proteins bears considerable resemblance to these methods, often resulting in similar mutations [65,66]. Recent work by Sanchez-Ruiz suggests that differences between ancestral and consensus mutations are more likely at unconserved positions, and ancestral reconstructions worked better at least in the limited cases tested [67]. It seems possible that ancestral reconstruction is effectively capturing both sequence consensus information in well-conserved sites and some sequence correlation information in poorly-conserved sites.
Experimental approaches
Directed evolution remains a very important tool for stabilizing proteins, and for enzymes and other functional proteins it has the advantage that activity and stability can be screened in the same experiment. The success of totally random methods supports the distributed role of positions in stabilizing the protein, and the improvement provided by recombinogenic methods (like DNA shuffling) illustrates that many single mutations can be so unfavorable that methods are needed to remove them. Experimental approaches to screening for stabilization have been reviewed elsewhere [68–70]. Methods of screening or selecting for stabilizing mutations are generally either inferential, meaning that they actually screen for a correlate of protein stability such as protease resistance [71], or direct, such as chemical modification (H/D exchange or Cys reactivity [72]) or dye binding, as in differential scanning fluorimetry (DSF) and its high-throughput implementation HTTS [73]. Direct methods tend to be more information-rich and avoid “you get what you select for” sorts of variants (e.g., unstable but protease-resistant sequences), but they also tend to be harder to deploy and lower in throughput.
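Direct dye-binding methods such as DSF report a melting curve, from which Tm is commonly estimated as the temperature of the steepest transition (the maximum of dF/dT) or from a sigmoidal fit. A minimal sketch of the derivative approach, assuming a clean two-state transition with no baselines, a hypothetical ΔH of 100 kcal mol−1 and a hypothetical Tm of 55 °C; real DSF data need baseline handling and are affected by dye–protein interactions.

```python
import math

R = 1.987e-3  # kcal mol^-1 K^-1


def two_state_unfolded(temp_c, tm_c, dh_kcal=100.0):
    """Fraction unfolded for a two-state protein, using the van 't Hoff form
    dG = dH * (1 - T/Tm) (heat capacity change assumed to be zero)."""
    t, tm = temp_c + 273.15, tm_c + 273.15
    dg = dh_kcal * (1.0 - t / tm)
    k = math.exp(-dg / (R * t))
    return k / (1.0 + k)


def tm_from_derivative(temps, signal):
    """Estimate Tm as the temperature with the largest central-difference dF/dT."""
    best_t, best_slope = None, -math.inf
    for i in range(1, len(temps) - 1):
        slope = (signal[i + 1] - signal[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_t, best_slope = temps[i], slope
    return best_t


# Hypothetical ramp from 25 to 85 C in 0.5 C steps; the dye signal is assumed
# to track the unfolded fraction exactly.
temps = [25.0 + 0.5 * i for i in range(121)]
signal = [two_state_unfolded(t, tm_c=55.0) for t in temps]
print(tm_from_derivative(temps, signal))
```

The derivative method needs no model fitting, which is one reason it suits high-throughput formats, but its precision is limited by the temperature step and by noise in the signal.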
Computational and experimental methods can be used together to generate stable proteins with other selected properties. Computation ideally allows one to search in a more targeted region of sequence space, increasing rates of desired selectants and thus reducing library sizes. One successful method along those lines is SCHEMA, which seeks to break proteins down into fragments, the structure of which will not be perturbed upon recombination into another protein [74]. The method was used to engineer thermostable cellulases from a small number of parent sequences, and the stability effects linearly deconvoluted from the recombinant proteins were shown to be largely additive (i.e., the schema were in fact largely independent as predicted) [75]. The deconvolution uncovered a single mutation that stabilized recombinants by nearly 10 °C. Snow and colleagues recently produced a more cautionary tale on the concerted use of consensus, FoldX, Rosetta, molecular dynamics, and SCHEMA, in which additivity was not observed for schema from endoglucanase E1 [76]. While some stabilizing mutations were identified by multiple methods, one destabilizing mutation looked favorable by FoldX, Rosetta and consensus, raising red flags only in MD simulation.
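The linear deconvolution of block contributions can be illustrated with a simple additive model. This sketch assumes a complete (balanced) factorial set of chimeras from two parents with three blocks, each block taken from parent 0 or parent 1; in that case the least-squares effect of each block equals the difference in mean Tm between chimeras carrying the parent-1 and parent-0 versions of that block. The Tm values are hypothetical and generated from an additive model; real chimera sets are usually incomplete and call for a general regression.

```python
from itertools import product
from statistics import mean


def block_effects(chimeras):
    """Estimate additive block effects from a complete two-parent factorial.

    chimeras: dict mapping a tuple of parent labels (0 or 1, one per block)
    to a measured Tm. For a balanced full factorial, the least-squares effect
    of each block equals mean(Tm | block from parent 1) - mean(Tm | parent 0).
    """
    n_blocks = len(next(iter(chimeras)))
    effects = []
    for b in range(n_blocks):
        from_1 = [tm for combo, tm in chimeras.items() if combo[b] == 1]
        from_0 = [tm for combo, tm in chimeras.items() if combo[b] == 0]
        effects.append(mean(from_1) - mean(from_0))
    return effects


def additive_tm(combo, base_tm, effects):
    """Predicted Tm of a chimera under the additive model."""
    return base_tm + sum(e for bit, e in zip(combo, effects) if bit)


# Hypothetical data: parent 0 melts at 60 C; blocks shift Tm by +4, -2, +1 C.
true_effects = [4.0, -2.0, 1.0]
data = {c: additive_tm(c, 60.0, true_effects) for c in product((0, 1), repeat=3)}
estimated = block_effects(data)
print(estimated)
```

Large residuals between measured and additively predicted Tm values would flag the kind of non-additivity reported for the endoglucanase E1 schema [76].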
A number of approaches have been taken to guide library design with computation, mostly to narrow down sequence space, such as library score optimization [77], amino acid profiles derived from families of computational design solutions [78], and probabilistic definitions of amino acid profiles [79] (see the review by Chen & Keating [33] for more detail). These codon or amino acid profiles may capture both functional and structural aspects (including stability) of a particular design problem, depending on what is simulated.
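Amino acid profiles are often implemented experimentally with degenerate codons. A small helper that enumerates what a degenerate codon encodes under the standard genetic code is a useful sanity check when translating a computed profile into a library; the helper names are hypothetical, and this is not one of the cited library-optimization algorithms [77–79].

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons in the order TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_codon(degenerate):
    """Count how many codons of a degenerate codon encode each amino acid ('*' = stop)."""
    choices = [IUPAC[base] for base in degenerate.upper()]
    return Counter(CODON_TABLE["".join(c)] for c in product(*choices))


print(sorted(expand_codon("NDT")))
# ['C', 'D', 'F', 'G', 'H', 'I', 'L', 'N', 'R', 'S', 'V', 'Y']
```

For example, NNK covers all 20 amino acids plus one stop codon (TAG) in 32 codons, whereas the reduced NDT codon encodes 12 amino acids with no stops.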
One of the most promising new experimental tools for engineering and understanding stability is the use of next-generation or deep sequencing technologies in combination with selections. These experiments are no different from other inferential methods of screening for stability by function (cellular selection, phage or yeast display, etc.), but rather than collapsing down to a small number of “best” selectants, the entire library can be sequenced over rounds of selection. Fields [80] and Bolon [81] have used this approach to examine protein fitness landscapes, and Sidhu [82] and Keating [83] have used it to examine binding. Kelly, Fields and colleagues attempted to identify stabilizing mutations as those showing positive epistasis scores in multiple mutant backgrounds [84]. They screened a library of single and double mutants of a WW domain for binding to its cognate ligand. The method identified 15 likely stabilizing mutations, of which three were already known to be stabilizing. Of six of the previously uncharacterized mutants that were synthesized, two were very destabilized, one slightly destabilized, one neutral, and two stabilized. Interestingly, none of the 15 candidate mutations was identified as stabilizing by FoldX. It remains to be seen whether other scoring algorithms can be developed to isolate stability effects more cleanly, but the prospects are exciting.
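Epistasis in deep mutational scanning data is commonly scored on a log scale as the deviation of a double mutant's fitness from the product of the single mutants' fitnesses. The sketch below, assuming fitness values normalized to wild type (= 1), averages that score for one candidate mutation over several backgrounds; the numbers and names are hypothetical, and the cited study's exact scoring and thresholds may differ.

```python
import math
from statistics import mean


def epistasis(w_double, w_a, w_b):
    """Log-scale epistasis: ln(w_AB) - ln(w_A) - ln(w_B), fitness relative to wild type."""
    return math.log(w_double) - math.log(w_a) - math.log(w_b)


def mean_epistasis(w_candidate, backgrounds):
    """Average epistasis of one candidate mutation across several backgrounds.

    backgrounds: list of (w_background_single, w_double_with_candidate).
    """
    return mean(epistasis(w_double, w_candidate, w_bg) for w_bg, w_double in backgrounds)


# Hypothetical data: the candidate is neutral alone (w = 1.0) but partially
# rescues three destabilized backgrounds.
backgrounds = [(0.4, 0.8), (0.5, 0.9), (0.3, 0.6)]
print(round(mean_epistasis(1.0, backgrounds), 3))  # 0.658
```

A candidate that is nearly neutral alone but consistently rescues destabilized backgrounds, as in this toy example, has a positive average score and would be flagged as a likely stabilizer.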
Conclusion
Given the large role of solvent entropy in protein folding and our lack of knowledge of how to model the unfolded state, it is quite remarkable that we are able to predict stabilizing mutations at all. Evidently, tight core packing must be sufficiently correlated with other uncomputable factors, and the unfolded state sufficiently homogeneous, for protein stability computations to be useful. But it is easy to understand why there is a limit to the accuracy of current approaches. One thing we have learned in recent years is that very high protein stability depends more on getting many subtle factors right than on maximizing a single factor (although core packing remains the most important single effect). The role of backbone strain, for example, appears to have been underappreciated until now.
Similarly, it is surprising, given the many pressures on proteins throughout evolution and the non-randomness of our sampling of phylogeny, that stabilizing mutations can be inferred from sequence alignments as effectively as they are. Co-variation information is likely to become increasingly useful as more sequences accumulate and alignment accuracy improves. Both computational and statistical approaches are beginning to be integrated into library design, and reciprocally, library approaches are revealing and fixing the flaws in designs. Recent tools for large-scale sequence analysis and direct high-throughput measurement of stability stand to radically improve our knowledge of, and ability to predict, stability effects in the coming years.
Highlights.
Fundamental flaws in computing protein stability will be difficult to overcome.
Very stable idealized proteins have been designed by optimizing many factors at once.
Consensus mutations reliably produce active, stable mutants, especially when filtered for co-variation.
Combining these approaches with experimental libraries is feasible and powerful.
Acknowledgments
The author is grateful to the NIH (R01GM083114 and U54 NS058183), NSF (DBI1262469) and Enlyton, Ltd., for support for studies related to protein stability and engineering. Thanks to Nicholas Callahan and Brandon Sullivan for helpful comments.
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
References and recommended reading
- 1.Berlow RB, Dyson HJ, Wright PE. Functional advantages of dynamic protein disorder. FEBS Lett. 2015 doi: 10.1016/j.febslet.2015.06.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Anfinsen CB. The formation and stabilization of protein structure. Biochemical Journal. 1972;128:737–749. doi: 10.1042/bj1280737. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Keefe AD, Szostak JW. Functional proteins from a random-sequence library. Nature. 2001;410:715–718. doi: 10.1038/35070613. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Minard P, Scalley-Kim M, Watters A, Baker D. A “loop entropy reduction” phage-display selection for folded amino acid sequences. Protein Science. 2001;10:129–134. doi: 10.1110/ps.32401. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Kleina LG, Miller JH. Genetic studies of the lac repressor. XIII.Extensive amino acid replacements generated by the use of natural and synthetic nonsense suppressors. Journal of Molecular Biology. 1990;212:295–318. doi: 10.1016/0022-2836(90)90126-7. [DOI] [PubMed] [Google Scholar]
- 6.Bloom JD, Labthavikul ST, Otey CR, Arnold FH. Protein stability promotes evolvability. Proc Natl Acad Sci U S A. 2006;103:5869–5874. doi: 10.1073/pnas.0510098103. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Bullock AN, Fersht AR. Rescuing the function of mutant p53. Nature Reviews Cancer. 2001;1:68–76. doi: 10.1038/35094077. [DOI] [PubMed] [Google Scholar]
- 8.Kiernan MC, Vucic S, Cheah BC, Turner MR, Eisen A, Hardiman O, Burrell JR, Zoing MC. Amyotrophic lateral sclerosis. Lancet. 2011;377:942–955. doi: 10.1016/S0140-6736(10)61156-7. [DOI] [PubMed] [Google Scholar]
- 9.Lee CC, Perchiacca JM, Tessier PM. Toward aggregation-resistant antibodies by design. Trends Biotechnol. 2013;31:612–620. doi: 10.1016/j.tibtech.2013.07.002. [DOI] [PubMed] [Google Scholar]
- 10.Dill KA. Dominant forces in protein folding. Biochemistry. 1990;29:7133–7155. doi: 10.1021/bi00483a001. [DOI] [PubMed] [Google Scholar]
- 11.Rose GD, Wolfenden R. Hydrogen bonding, hydrophobicity, packing, and protein folding. Annu Rev Biophys Biomol Struct. 1993;22:381–415. doi: 10.1146/annurev.bb.22.060193.002121. [DOI] [PubMed] [Google Scholar]
- 12.Pace CN, Scholtz JM, Grimsley GR. Forces stabilizing proteins. FEBS Lett. 2014;588:2177–2184. doi: 10.1016/j.febslet.2014.05.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Cordes MH, Davidson AR, Sauer RT. Sequence space, folding and protein design. Current Opinion in Structural Biology. 1996;6:3–10. doi: 10.1016/s0959-440x(96)80088-1. [DOI] [PubMed] [Google Scholar]
- 14.Isom DG, Castaneda CA, Cannon BR, Garcia-Moreno B. Large shifts in pKa values of lysine residues buried inside a protein. Proc Natl Acad Sci U S A. 2011;108:5260–5265. doi: 10.1073/pnas.1010750108. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Sanchez-Ruiz JM, Makhatadze GI. To charge or not to charge? Trends Biotechnol. 2001;19:132–135. doi: 10.1016/s0167-7799(00)01548-1. [DOI] [PubMed] [Google Scholar]
- 16.Freddolino PL, Harrison CB, Liu Y, Schulten K. Challenges in protein folding simulations: Timescale, representation, and analysis. Nat Phys. 2010;6:751–758. doi: 10.1038/nphys1713. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Matthews BW. Studies on protein stability with T4 lysozyme. Advances in Protein Chemistry. 1995;46:249–278. doi: 10.1016/s0065-3233(08)60337-x. [DOI] [PubMed] [Google Scholar]
- 18.Gambin Y, Schug A, Lemke EA, Lavinder JJ, Ferreon AC, Magliery TJ, Onuchic JN, Deniz AA. Direct single-molecule observation of a protein living in two opposed native structures. Proc Natl Acad Sci U S A. 2009;106:10153–10158. doi: 10.1073/pnas.0904461106. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Samish I, MacDermaid CM, Perez-Aguilar JM, Saven JG. Theoretical and computational protein design. Annu Rev Phys Chem. 2011;62:129–149. doi: 10.1146/annurev-physchem-032210-103509. [DOI] [PubMed] [Google Scholar]
- 20.Das R, Baker D. Macromolecular modeling with rosetta. Annu Rev Biochem. 2008;77:363–382. doi: 10.1146/annurev.biochem.77.062906.171838. [DOI] [PubMed] [Google Scholar]
- 21.Kuhlman B, Dantas G, Ireton GC, Varani G, Stoddard BL, Baker D. Design of a novel globular protein fold with atomic-level accuracy. Science. 2003;302:1364–1368. doi: 10.1126/science.1089427.
- 22.Rothlisberger D, Khersonsky O, Wollacott AM, Jiang L, DeChancie J, Betker J, Gallaher JL, Althoff EA, Zanghellini A, Dym O, et al. Kemp elimination catalysts by computational enzyme design. Nature. 2008;453:190–195. doi: 10.1038/nature06879.
- 23.Jiang L, Althoff EA, Clemente FR, Doyle L, Rothlisberger D, Zanghellini A, Gallaher JL, Betker JL, Tanaka F, Barbas CF, 3rd, et al. De novo computational design of retro-aldol enzymes. Science. 2008;319:1387–1391. doi: 10.1126/science.1152692.
- 24.Siegel JB, Zanghellini A, Lovick HM, Kiss G, Lambert AR, St Clair JL, Gallaher JL, Hilvert D, Gelb MH, Stoddard BL, et al. Computational design of an enzyme catalyst for a stereoselective bimolecular Diels-Alder reaction. Science. 2010;329:309–313. doi: 10.1126/science.1190239.
- 25**.Koga N, Tatsumi-Koga R, Liu G, Xiao R, Acton TB, Montelione GT, Baker D. Principles for designing ideal protein structures. Nature. 2012;491:222–227. doi: 10.1038/nature11600. By design of small, ideal structural elements, larger proteins with exceptionally high stability were built using rules that emerged from connection of the idealized elements.
- 26**.Huang PS, Oberdorfer G, Xu C, Pei XY, Nannenga BL, Rogers JM, DiMaio F, Gonen T, Luisi B, Baker D. High thermodynamic stability of parametrically designed helical bundles. Science. 2014;346:481–485. doi: 10.1126/science.1257481. The combination of parametric design and Rosetta modeling not only yielded the intended oligomeric states and structures, but also some of the most stable proteins ever designed or observed.
- 27.Kumar MD, Bava KA, Gromiha MM, Prabakaran P, Kitajima K, Uedaira H, Sarai A. ProTherm and ProNIT: thermodynamic databases for proteins and protein-nucleic acid interactions. Nucleic Acids Res. 2006;34:D204–206. doi: 10.1093/nar/gkj103.
- 28.Murphy GS, Mills JL, Miley MJ, Machius M, Szyperski T, Kuhlman B. Increasing sequence diversity with flexible backbone protein design: the complete redesign of a protein hydrophobic core. Structure. 2012;20:1086–1096. doi: 10.1016/j.str.2012.03.026.
- 29.Giver L, Gershenson A, Freskgard PO, Arnold FH. Directed evolution of a thermostable esterase. Proc Natl Acad Sci U S A. 1998;95:12809–12813. doi: 10.1073/pnas.95.22.12809.
- 30.Khersonsky O, Kiss G, Rothlisberger D, Dym O, Albeck S, Houk KN, Baker D, Tawfik DS. Bridging the gaps in design methodologies by evolutionary optimization of the stability and proficiency of designed Kemp eliminase KE59. Proc Natl Acad Sci U S A. 2012;109:10358–10363. doi: 10.1073/pnas.1121063109.
- 31.Grigoryan G, Reinke AW, Keating AE. Design of protein-interaction specificity gives selective bZIP-binding peptides. Nature. 2009;458:859–864. doi: 10.1038/nature07885.
- 32.Havranek JJ, Harbury PB. Automated design of specificity in molecular recognition. Nat Struct Biol. 2003;10:45–52. doi: 10.1038/nsb877.
- 33.Chen TS, Keating AE. Designing specific protein-protein interactions using computation, experimental library screening, or integrated methods. Protein Sci. 2012;21:949–963. doi: 10.1002/pro.2096.
- 34*.Potapov V, Kaplan JB, Keating AE. Data-driven prediction and design of bZIP coiled-coil interactions. PLoS Comput Biol. 2015;11:e1004046. doi: 10.1371/journal.pcbi.1004046. The authors demonstrate that a machine learning approach yields superior results in designing coiled-coil specificity compared with previous physics-based approaches.
- 35.Schymkowitz J, Borg J, Stricher F, Nys R, Rousseau F, Serrano L. The FoldX web server: an online force field. Nucleic Acids Res. 2005;33:W382–388. doi: 10.1093/nar/gki387.
- 36.Yin S, Ding F, Dokholyan NV. Eris: an automated estimator of protein stability. Nat Methods. 2007;4:466–467. doi: 10.1038/nmeth0607-466.
- 37.Benedix A, Becker CM, de Groot BL, Caflisch A, Bockmann RA. Predicting free energy changes using structural ensembles. Nat Methods. 2009;6:3–4. doi: 10.1038/nmeth0109-3.
- 38.Potapov V, Cohen M, Schreiber G. Assessing computational methods for predicting protein stability upon mutation: good on average but not in the details. Protein Eng Des Sel. 2009;22:553–560. doi: 10.1093/protein/gzp030.
- 39.Kellogg EH, Leaver-Fay A, Baker D. Role of conformational sampling in computing mutation-induced changes in protein structure and stability. Proteins. 2011;79:830–838. doi: 10.1002/prot.22921.
- 40.Das R. Four small puzzles that Rosetta doesn’t solve. PLoS One. 2011;6:e20044. doi: 10.1371/journal.pone.0020044.
- 41.Smith CA, Kortemme T. Backrub-like backbone simulation recapitulates natural protein conformational variability and improves mutant side-chain prediction. J Mol Biol. 2008;380:742–756. doi: 10.1016/j.jmb.2008.05.023.
- 42.Kumar S, Tsai CJ, Nussinov R. Factors enhancing protein thermostability. Protein Eng. 2000;13:179–191. doi: 10.1093/protein/13.3.179.
- 43.Ibarra-Molero B, Loladze VV, Makhatadze GI, Sanchez-Ruiz JM. Thermal versus guanidine-induced unfolding of ubiquitin. An analysis in terms of the contributions from charge-charge interactions to protein stability. Biochemistry. 1999;38:8138–8149. doi: 10.1021/bi9905819.
- 44.Loladze VV, Ibarra-Molero B, Sanchez-Ruiz JM, Makhatadze GI. Engineering a thermostable protein via optimization of charge-charge interactions on the protein surface. Biochemistry. 1999;38:16419–16423. doi: 10.1021/bi992271w.
- 45.Gribenko AV, Patel MM, Liu J, McCallum SA, Wang C, Makhatadze GI. Rational stabilization of enzymes by computational redesign of surface charge-charge interactions. Proc Natl Acad Sci U S A. 2009;106:2601–2606. doi: 10.1073/pnas.0808220106.
- 46*.Tzul FO, Schweiker KL, Makhatadze GI. Modulation of folding energy landscape by charge-charge interactions: linking experiments with computational modeling. Proc Natl Acad Sci U S A. 2015;112:E259–266. doi: 10.1073/pnas.1410424112. Surface charge optimization was found to stabilize by accelerating protein folding, and modeling suggests this occurs by relief of native-sequence frustration in the folding reaction.
- 47*.Chimenti MS, Khangulov VS, Robinson AC, Heroux A, Majumdar A, Schlessman JL, Garcia-Moreno B. Structural reorganization triggered by charging of Lys residues in the hydrophobic interior of a protein. Structure. 2012;20:1071–1085. doi: 10.1016/j.str.2012.03.023. Structural studies show that buried, protonated Lys can cause global unfolding, local unfolding, or local perturbations that complicate the prediction of stability for burial of charged residues.
- 48.Steipe B, Schiller B, Pluckthun A, Steinbacher S. Sequence statistics reliably predict stabilizing mutations in a protein domain. J Mol Biol. 1994;240:188–192. doi: 10.1006/jmbi.1994.1434.
- 49.Wirtz P, Steipe B. Intrabody construction and expression III: engineering hyperstable V(H) domains. Protein Sci. 1999;8:2245–2250. doi: 10.1110/ps.8.11.2245.
- 50.Lehmann M, Loch C, Middendorf A, Studer D, Lassen SF, Pasamontes L, van Loon AP, Wyss M. The consensus concept for thermostability engineering of proteins: further proof of concept. Protein Eng. 2002;15:403–411. doi: 10.1093/protein/15.5.403.
- 51.Lehmann M, Pasamontes L, Lassen SF, Wyss M. The consensus concept for thermostability engineering of proteins. Biochim Biophys Acta. 2000;1543:408–415. doi: 10.1016/s0167-4838(00)00238-7.
- 52.Main ER, Lowe AR, Mochrie SG, Jackson SE, Regan L. A recurring theme in protein engineering: the design, stability and folding of repeat proteins. Curr Opin Struct Biol. 2005;15:464–471. doi: 10.1016/j.sbi.2005.07.003.
- 53.Varadamsetty G, Tremmel D, Hansen S, Parmeggiani F, Pluckthun A. Designed Armadillo repeat proteins: library generation, characterization and selection of peptide binders with high specificity. J Mol Biol. 2012;424:68–87. doi: 10.1016/j.jmb.2012.08.029.
- 54*.Wijma HJ, Floor RJ, Jekel PA, Baker D, Marrink SJ, Janssen DB. Computationally designed libraries for rapid enzyme stabilization. Protein Eng Des Sel. 2014;27:49–58. doi: 10.1093/protein/gzt061. FRESCO is a method of reducing the amount of screening that must be done to find stabilizing mutations by using computation to identify stabilizing point mutations and new disulfide bonds. These can then be screened and aggregated into a stable multimutant.
- 55.Jackel C, Bloom JD, Kast P, Arnold FH, Hilvert D. Consensus protein design without phylogenetic bias. J Mol Biol. 2010;399:541–546. doi: 10.1016/j.jmb.2010.04.039.
- 56.Socolich M, Lockless SW, Russ WP, Lee H, Gardner KH, Ranganathan R. Evolutionary information for specifying a protein fold. Nature. 2005;437:512–518. doi: 10.1038/nature03991.
- 57.Reynolds KA, McLaughlin RN, Ranganathan R. Hot spots for allosteric regulation on protein surfaces. Cell. 2011;147:1564–1575. doi: 10.1016/j.cell.2011.10.049.
- 58.Lee J, Natarajan M, Nashine VC, Socolich M, Vo T, Russ WP, Benkovic SJ, Ranganathan R. Surface sites for engineering allosteric control in proteins. Science. 2008;322:438–442. doi: 10.1126/science.1159052.
- 59.Hatley ME, Lockless SW, Gibson SK, Gilman AG, Ranganathan R. Allosteric determinants in guanine nucleotide-binding proteins. Proc Natl Acad Sci U S A. 2003;100:14445–14450. doi: 10.1073/pnas.1835919100.
- 60.Magliery TJ, Regan L. Beyond consensus: statistical free energies reveal hidden interactions in the design of a TPR motif. J Mol Biol. 2004;343:731–745. doi: 10.1016/j.jmb.2004.08.026.
- 61.Ozer HG, Ray WC. MAVL/StickWRLD: analyzing structural constraints using interpositional dependencies in biomolecular sequence alignments. Nucleic Acids Res. 2006;34:W133–136. doi: 10.1093/nar/gkl251.
- 62.Sullivan BJ, Durani V, Magliery TJ. Triosephosphate isomerase by consensus design: dramatic differences in physical properties and activity of related variants. J Mol Biol. 2011;413:195–208. doi: 10.1016/j.jmb.2011.08.001.
- 63*.Sullivan BJ, Nguyen T, Durani V, Mathur D, Rojas S, Thomas M, Syu T, Magliery TJ. Stabilizing proteins from sequence statistics: the interplay of conservation and correlation in triosephosphate isomerase stability. J Mol Biol. 2012;420:384–399. doi: 10.1016/j.jmb.2012.04.025. The authors demonstrate that filtering consensus mutations for highly coupled (covarying) positions results in a much higher rate of success in identifying stabilizing mutations, and also leads to additivity and high activity.
- 64.Durani V, Magliery TJ. Protein engineering and stabilization from sequence statistics: variation and covariation analysis. Methods Enzymol. 2013;523:237–256. doi: 10.1016/B978-0-12-394292-0.00011-4.
- 65.Watanabe K, Ohkuri T, Yokobori S, Yamagishi A. Designing thermostable proteins: ancestral mutants of 3-isopropylmalate dehydrogenase designed by using a phylogenetic tree. J Mol Biol. 2006;355:664–674. doi: 10.1016/j.jmb.2005.10.011.
- 66.Harms MJ, Thornton JW. Analyzing protein structure and function using ancestral gene reconstruction. Curr Opin Struct Biol. 2010;20:360–366. doi: 10.1016/j.sbi.2010.03.005.
- 67.Risso VA, Gavira JA, Gaucher EA, Sanchez-Ruiz JM. Phenotypic comparisons of consensus variants versus laboratory resurrections of Precambrian proteins. Proteins. 2014;82:887–896. doi: 10.1002/prot.24575.
- 68.Magliery TJ, Lavinder JJ, Sullivan BJ. Protein stability by number: high-throughput and statistical approaches to one of protein science’s most difficult problems. Curr Opin Chem Biol. 2011;15:443–451. doi: 10.1016/j.cbpa.2011.03.015.
- 69.Magliery TJ, Regan L. Combinatorial approaches to protein stability and structure. Eur J Biochem. 2004;271:1595–1608. doi: 10.1111/j.1432-1033.2004.04075.x.
- 70.Roodveldt C, Aharoni A, Tawfik DS. Directed evolution of proteins for heterologous expression and stability. Curr Opin Struct Biol. 2005;15:50–56. doi: 10.1016/j.sbi.2005.01.001.
- 71.Finucane MD, Tuna M, Lees JH, Woolfson DN. Core-directed protein design. I. An experimental method for selecting stable proteins from combinatorial libraries. Biochemistry. 1999;38:11604–11612. doi: 10.1021/bi990765n.
- 72.Geer MA, Fitzgerald MC. Energetics-based methods for protein folding and stability measurements. Annu Rev Anal Chem (Palo Alto Calif). 2014;7:209–228. doi: 10.1146/annurev-anchem-071213-020024.
- 73.Lavinder JJ, Hari SB, Sullivan BJ, Magliery TJ. High-throughput thermal scanning: a general, rapid dye-binding thermal shift screen for protein engineering. J Am Chem Soc. 2009;131:3794–3795. doi: 10.1021/ja8049063.
- 74.Voigt CA, Martinez C, Wang ZG, Mayo SL, Arnold FH. Protein building blocks preserved by recombination. Nat Struct Biol. 2002;9:553–558. doi: 10.1038/nsb805.
- 75.Heinzelman P, Snow CD, Smith MA, Yu X, Kannan A, Boulware K, Villalobos A, Govindarajan S, Minshull J, Arnold FH. SCHEMA recombination of a fungal cellulase uncovers a single mutation that contributes markedly to stability. J Biol Chem. 2009;284:26229–26233. doi: 10.1074/jbc.C109.034058.
- 76.Johnson LB, Gintner LP, Park S, Snow CD. Discriminating between stabilizing and destabilizing protein design mutations via recombination and simulation. Protein Eng Des Sel. 2015;28:259–267. doi: 10.1093/protein/gzv030.
- 77**.Chen TS, Palacios H, Keating AE. Structure-based redesign of the binding specificity of anti-apoptotic Bcl-x(L). J Mol Biol. 2013;425:171–185. doi: 10.1016/j.jmb.2012.11.009. Computational modeling is used to dramatically reduce the library sizes required to generate specific protein binders, and the method is put into practice to yield a Bcl-xL variant specific for binding the Bad peptide.
- 78.Guntas G, Purbeck C, Kuhlman B. Engineering a protein-protein interface using a computationally designed library. Proc Natl Acad Sci U S A. 2010;107:19296–19301. doi: 10.1073/pnas.1006528107.
- 79.Saven JG. Combinatorial protein design. Curr Opin Struct Biol. 2002;12:453–458. doi: 10.1016/s0959-440x(02)00347-0.
- 80.Fowler DM, Araya CL, Fleishman SJ, Kellogg EH, Stephany JJ, Baker D, Fields S. High-resolution mapping of protein sequence-function relationships. Nat Methods. 2010;7:741–746. doi: 10.1038/nmeth.1492.
- 81**.Hietpas RT, Jensen JD, Bolon DN. Experimental illumination of a fitness landscape. Proc Natl Acad Sci U S A. 2011;108:7896–7901. doi: 10.1073/pnas.1016024108. In one of the first studies to use deep sequencing to examine laboratory-selected populations, the authors suggest key methods and frameworks for interpretation that shed light on the potential uses of the technology in design.
- 82.Ernst A, Gfeller D, Kan Z, Seshagiri S, Kim PM, Bader GD, Sidhu SS. Coevolution of PDZ domain-ligand interactions analyzed by high-throughput phage display and deep sequencing. Mol Biosyst. 2010;6:1782–1790. doi: 10.1039/c0mb00061b.
- 83.Reich LL, Dutta S, Keating AE. SORTCERY-A High-Throughput Method to Affinity Rank Peptide Ligands. J Mol Biol. 2015;427:2135–2150. doi: 10.1016/j.jmb.2014.09.025.
- 84**.Araya CL, Fowler DM, Chen W, Muniez I, Kelly JW, Fields S. A fundamental protein property, thermodynamic stability, revealed solely from large-scale measurements of protein function. Proc Natl Acad Sci U S A. 2012;109:16858–16863. doi: 10.1073/pnas.1209751109. Although only a small number of stabilizing mutations were identified, the use of epistasis data from fitness selections to infer protein stability prefigures exciting new approaches for identifying physical properties from deep sequencing methods.