Abstract
Protein design methodology has made remarkable progress over the past decade. Historically, the low reliability of purely structure-based design methods limited their application. But recent strategies that combine structure- and sequence-based calculations, as well as machine-learning approaches, have dramatically improved success. These approaches have led to the design of increasingly complex structures and therapeutically relevant activities. Additionally, protein optimization methods have improved the stability and activity of complex eukaryotic proteins. Thanks to their heightened reliability, design methods have been applied to improve therapeutics and enzymes for green chemistry and to generate vaccines, antivirals, and drug-delivery nanostructures. These exciting developments bring protein design closer to becoming a mainstream approach in protein science and engineering. Moreover, they reflect an increased understanding of basic rules that govern the relationship between protein sequence, structure, and function. We note, however, that de novo design is still limited mostly to α-helix bundles, restricting its potential to generate sophisticated enzymes and diverse protein and small-molecule binders. Designing complex protein structures is a challenging next step if the field is to realize its objective of generating new-to-nature activities. Here, we review developments in computational protein optimization and de novo design of function and highlight pressing areas for future research.
Keywords: protein design, de novo design, protein optimization, protein engineering, protein structure
Introduction
Life depends on myriads of protein functions. Decades of efforts to understand the determinants of protein function have resulted in a deeper mechanistic understanding of life processes and the generation of new proteins, including life-saving drugs and enzymes that catalyze desirable chemical reactions in an environmentally friendly way1,2. Protein function is determined by amino acid sequence and structure (Fig. 1A), and understanding these determinants can dramatically improve our ability to control protein activity. Due to the complex relationships between sequence, structure, and function, however, protein-engineering methods have historically relied on experimental screening of natural proteins and iterative mutational processes to discover and optimize biochemical functions3–6. Although powerful, these approaches exhibit limited reliability, are time-consuming, and can mostly be applied to proteins with functions that are amenable to high-throughput screening7–9. Additionally, these methods rely on trial and error rather than on direct engineering of the desired properties. The fact that proteins that are critical to research, human health and industry are generated using random processes reveals critical gaps in our understanding of how function is encoded in proteins. One of the most important goals of the protein design field is to bridge these gaps.
Figure 1. Goals of protein design methodology.
A. The fundamental paradigm of protein science is that amino acid sequence determines structure which in turn determines function27. Protein design has been historically dubbed as the inverse folding problem20 of finding the sequence that will fold to a desired structure (step 3. Fold Design). By extension, the problem of finding the sequences and structures that can realize a desired function is the inverse function problem (step 4. Function Design). Computational methods for (1) structure prediction and (2) function annotation have recently made considerable advances. The inverse problems of fold (3) and function (4) design have also made important progress, but significant gaps remain in the design of complex folds and functions relative to those observed in nature. B. A fitness landscape is an abstract model that relates protein variants to their relative activities28. Nearby points on the landscape represent close sequences and the relative heights represent relative fitnesses or activities. Evolution and conventional methods in protein engineering iteratively explore nearby mutants (dashed black lines) starting from a naturally occurring protein (circle close to the viewer). Such methods may find local fitness optima but are restricted in their ability to find distant ones (peak in the background). New computational design methods may be able to reach such solutions which are unlikely to emerge through evolutionary approaches or would emerge only slowly.
The main challenge of computational protein design is to develop reliable models for protein generation and optimization that are based on computable features. Until a decade ago, such methods made only limited contributions to protein engineering due to their low accuracy and reliability6,10–14. Since then, however, the field has made significant progress, especially in the design and prediction of protein structures15 (Fig. 1A steps 1 and 3), and we refer the reader to recent reviews on the milestones that enabled these advances16–18. Given this progress, the next frontier for computational protein design can be dubbed the “inverse function” problem, expanding on the influential paradigm from the early 1990’s that cast design as the “inverse folding” problem (Fig. 1A step 4)19,20. Whereas the inverse folding problem asks which amino acid sequences fold into a desired three-dimensional structure, the inverse function problem asks how to develop strategies for generating new or improved protein functions. In addition to verifying our understanding of protein design principles, advancing this front will accelerate fundamental and applied discoveries (Fig. 1B) that are crucial to human health, industry, and environmental sustainability.
The advances made over the past decade are mainly due to methods that combine molecular and physical principles with data-based approaches. These have led to higher reliability and broad usefulness. Particularly, stability-design methods have become so reliable that they have been successfully applied to dozens of different protein families, including ones that did not yield to experimental optimization strategies21. In addition, de novo protein design, the generation of proteins from scratch, has reached remarkable accuracy and scope, allowing the routine design of small proteins and large self-assembling complexes22–24. De novo designed proteins have been further designed to generate new binders of proteins or small molecules23,25,26, advancing toward completely rational generation of new-to-nature activities.
Thus, protein design methodology has come a long way, and some strategies have become integrated in basic and applied protein engineering workflows. Here, we will review the main biophysical challenges that the protein design field has had to address to increase the reliability and scope of these methods. In addition, we will survey some of the opportunities that new methods have already begun to realize while underlining areas where they still struggle. Our perspective is divided into two broad categories of design goals: optimizing existing, typically natural, protein activities, and de novo design of fold and function. We will also review the increasing uses of machine learning in design and highlight areas where protein design methodology may lead to transformative advances in the coming years.
Protein Optimization
Natural evolution adapts proteins to the requirements of their host organisms. The requirements of research and application, however, may differ in important ways from those of the natural host. For instance, the expression levels, stability, specificity, or activity of the natural protein may fall short relative to those needed for human use6. Therefore, understanding the rules that govern protein stability and activity has been a long-standing goal of protein science. Protein optimization may be especially challenging because it typically strives to find a protein that excels in several different properties9,29. Furthermore, large gains in stability and activity often require many mutations30, demanding methods that can accurately predict how changes in sequence and structure impact activity.
According to the Thermodynamic Hypothesis, the protein native-state energy must be significantly lower than all other states, including misfolded and unfolded ones, for a significant fraction of the protein to fold uniquely into the native state27 (Figure 2A). Thus, general methods for protein design must often comprise elements of both positive and negative design, favoring the desired state and disfavoring competitors, respectively. The fundamental problem that all general protein design strategies face is that only the desired state is defined in atomic detail and is amenable to atomistic calculations, whereas the competing structural states are typically unknown10. To appreciate the magnitude of the negative-design problem, we may consider that the number of possible undesired states is likely to scale exponentially with the size of the protein31,32, and that the median size of proteins is 300 amino acids33. Given the astronomically large space of combinations of mutations and conformations, ensuring that the desired state exhibits significantly lower energy than any of the myriad unknown competing states is a daunting task. Indeed, numerous studies have demonstrated that design and engineering efforts often result in proteins that exhibit lower stability than the natural starting point and a tendency to aggregate, misfold, or exhibit conformationally flexible regions13,34–39.
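To make the scale of the negative-design problem concrete, the following minimal sketch computes the Boltzmann population of the native state when it competes with a number of alternative states. It rests on simplifying assumptions that are not part of the text above: every competing state is assigned the same energy gap relative to the native state, the temperature is fixed at 25 °C, and the function name `native_population` is hypothetical.

```python
import math

# Thermal energy RT in kcal/mol at 25 °C (R = 0.0019872 kcal/(mol*K)).
RT_KCAL = 0.0019872 * 298.15


def native_population(gap_kcal: float, n_competitors: int = 1) -> float:
    """Equilibrium fraction of molecules in the native state.

    gap_kcal: energy of each competing state minus the native-state energy
              (positive values favor the native state).
    n_competitors: number of competing states, all assumed to share the
              same gap (an illustrative simplification).
    """
    competitor_weight = n_competitors * math.exp(-gap_kcal / RT_KCAL)
    return 1.0 / (1.0 + competitor_weight)


if __name__ == "__main__":
    for n in (1, 10**3, 10**6):
        print(n, native_population(5.0, n))
```

In this toy model, a gap that makes the native state dominant against a single competitor no longer does so against a million competitors with the same gap, which is one way to see why stabilizing the desired state alone may not be sufficient.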
Figure 2. Computational stability design.
A. Schematic representation of the folding landscape of a marginally stable protein generated using physics-based design methods alone (top) versus physics and phylogeny-driven approaches (bottom). Physics-based methods have access only to the native state and may inadvertently lower the energy of competing (misfolded and unfolded) states, thereby not addressing negative design. Mutations that lower the energy of undesired states are likely to be purged by evolutionary selection; accordingly, a protein designed using a combination of atomistic and phylogenetic constraints may preferentially fold into the native state. E: energy, vdW: van der Waals interactions, elec: Coulomb electrostatic interactions, sol: solvation. B. Stability design of the SARS-CoV-2 Spike protein monomer (PDB entry: 6VYB). A design that encodes 20 mutations (S2D14) in the S2 Spike protein subunit (top left) increased protein yields 11-fold (top right) while improving pseudovirus neutralization titers (pVNT50) of several SARS-CoV-2 variants of concern (bottom)67. S-2P is a rationally designed Spike variant that encodes stabilizing mutations Lys986Pro and Val987Pro. Data for panel (B) generously provided by Wayne Harshbarger.
One solution to the problem of encoding negative-design elements is using additional data to guide structure-based calculations. For instance, it is likely that sequence elements that are prone to misfolding and aggregation have been eliminated through natural selection40. In an approach called evolution-guided atomistic design, the natural diversity of homologous sequences is analyzed at each position of the target protein to eliminate rare mutations from the design choices before the atomistic design step21,41. Such filtering implements some aspects of negative design while reducing the design sequence space by many orders of magnitude and focusing it on the space that is more likely to fold stably and accurately40,42,43. Subsequent atomistic design calculations stabilize the desired state within this reduced sequence space, thus implementing elements of positive design (Fig. 2A).
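The filtering step described above can be sketched as follows. This is a simplified stand-in, not the published procedure: it uses a plain per-position frequency cutoff on an alignment of homologs, whereas real workflows rely on position-specific scoring and subsequent atomistic energy calculations. The cutoff value and the helper names `allowed_identities` and `design_space_size` are assumptions made for illustration.

```python
from collections import Counter


def allowed_identities(alignment, target, min_freq=0.05):
    """Return, for each position, the set of amino acids kept for design.

    alignment: homologous sequences aligned to the target (same length,
               '-' marks a gap).
    target:    the sequence being designed.
    min_freq:  hypothetical cutoff; identities rarer than this among the
               homologs are removed from the design choices.
    """
    if any(len(seq) != len(target) for seq in alignment):
        raise ValueError("all aligned sequences must match the target length")
    allowed = []
    for pos, wild_type in enumerate(target):
        column = [seq[pos] for seq in alignment if seq[pos] != "-"]
        counts = Counter(column)
        keep = {aa for aa, n in counts.items() if n / len(column) >= min_freq}
        keep.add(wild_type)  # the wild-type identity is always retained
        allowed.append(keep)
    return allowed


def design_space_size(allowed):
    """Number of sequences spanned by the per-position choices."""
    size = 1
    for choices in allowed:
        size *= len(choices)
    return size
```

For a toy alignment such as `["ACD", "ACE", "AGD", "A-D"]` with target `"ACD"` and `min_freq=0.3`, only the second position retains an alternative identity, so the design space shrinks from 20^3 sequences to 2.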
Another successful approach that improves design success relies on machine-learning inferences applied to experimental data to predict mutations for further experimental screening44–46. These empirical approaches are powerful45,47,48 and may become even more so with the advent of deep-learning based Large Language Models (LLMs). However, they demand iterations of mutagenesis and screening for each target protein and apply only to proteins that are amenable to at least medium-throughput screening. The current perspective focuses on structure-based design methods that do not rely on experimentally determined mutation data, and we refer the interested reader to recent reviews on protein engineering by machine-learning methods applied to experimental datasets44,46.
Protein stability design
Many natural proteins exhibit only a small energy difference between the native state and the myriads of unfolded or misfolded states and are therefore marginally stable40. Marginal stability is often masked in the natural host of the protein due to dedicated cellular machines that promote folding, such as chaperones and proteases49. Furthermore, in some cases, marginal stability may be an evolutionarily selected property that ensures fast protein turnover50. Yet, marginal stability typically implies low expression levels in heterologous hosts51–53, limiting usefulness in basic and applied research. The problem of low heterologous expression levels is surprisingly prevalent, as the fraction of cytosolic proteins that are amenable to overexpression is estimated to be <50% of any proteome54,55, and membrane proteins are typically even less accommodating48,56,57. Furthermore, it is especially difficult to introduce mutations that improve activities in marginally stable proteins because these mutations may reduce stability below the threshold required for protein folding8,9,29,58. Therefore, marginal stability is perhaps the most general problem in protein engineering and design.
Due to the very broad impact of protein stability on research and engineering, several structure-based methods have been developed over the past decade to address stability design42,59–61. The native state of a protein relies on weak and transient forces, and it is the combined effect of thousands of such interactions that favors this state over all others40. Thus, stability optimization methods may suggest dozens of mutations relative to the wild-type protein to generate significant improvement in protein stability42.
As native-state stability correlates with heterologous expression levels51–53, stability design approaches often enhance the yields of functional protein. The impact on expression levels has sometimes been remarkable, generating variants with identical or improved functional properties that can be efficiently expressed rapidly and economically42,62–64. New methods for stability optimization allowed testing hypotheses on the activity of challenging proteins65, characterizing the activities of proteins that could not be functionally produced before66, reducing the costs of manufacturing therapeutics62,67, improving the resilience of proteins to environmental conditions68–71, and engineering improved functions9,72,73.
In the development of vaccine immunogens for clinical use, stability and protein production costs are significant bottlenecks that protein design may alleviate74–77. For instance, the protein RH5 from the parasite Plasmodium falciparum, the most virulent malaria-causing species, is the leading candidate for a vaccine that targets the parasite blood stage. But this protein can only be produced in expensive insect cells and denatures at approximately 40°C, whereas vaccines for the developing world require large-scale production and would preferably not depend on refrigerated transport. A mutant that was designed for higher native-state stability could be robustly expressed in E. coli and exhibited nearly 15°C higher thermal resistance while maintaining the immunogenicity of the wild-type RH5 in animal models. This design lowers production costs, may enable refrigeration-free transport62, and has recently entered Phase 2 clinical trials78. Similarly, a designed variant of the SARS-CoV-2 Spike protein exhibited approximately tenfold improved production levels in mammalian cells67 (Fig. 2B). Encouragingly, this design elicited a substantial increase in the titer of neutralizing antibodies against both the original SARS-CoV-2 Wuhan strain and subsequent variants of concern, including Omicron. The improvement in neutralization and breadth is likely due to increased conformational stability of the design relative to the natural Spike protein (Fig. 2A), highlighting the important functional implications of specifically stabilizing the protein native state. These results suggest that stability design methods may rapidly, economically, and effectively address both emerging and long-standing global-health challenges.
Stability design has also expanded the feasibility of using enzymes as therapeutics64,79. A notable example is the bacterial enzyme chondroitinase ABC, which has attracted biomedical research interest for its ability to induce nerve recovery following stroke and spinal-cord injury by degrading the proteoglycan in scar tissue79–81. The enzyme exhibits limited stability at physiological pH and body temperature, however, and attempts to improve these critical properties met with little success for a decade, perhaps due to its unusual size (>1,000 amino acids). By contrast, testing merely three designs generated by a web server yielded a stabilized variant with a functional half-life of >6 days, compared to under a day for the wild-type enzyme, and a threefold increase in proteoglycan degradation79.
Until recently, atomistic design methods could only be reliably applied to crystallographically determined protein structures due to the high sensitivity of design methods to atomic details82. New AI-based structure-prediction methods, such as AlphaFold283, however, have exhibited remarkable accuracy close to that obtained from crystallographic analysis. This exciting development increases the number of proteins that can be computationally engineered from under 200 thousand experimentally determined entries in the Protein Data Bank (PDB) to the hundreds of millions of protein sequences in genomic databases84. In principle, any protein sequence can now be used as a starting point for design to improve stability and expressibility66. This new and unexpected capability is especially useful where large numbers of newly discovered proteins must be optimized, such as in antibody85 or enzyme discovery workflows66 or where speed is crucial, such as in response to a pandemic threat75. This ability depends on the accuracy of the prediction tools, and we expect newer generations of structure-prediction methods that model bound ligands and cofactors86 to dramatically improve the scope and reliability of design workflows. Furthermore, a future improvement in structure prediction of long loops, especially in antibodies87, is likely to enhance design capabilities in this important therapeutic area. Recent design studies have also extended stability design to membrane proteins88,89 which comprise more than a quarter of every proteome and are typically extremely challenging for heterologous overexpression56,57 despite their high impact as drug targets.
Several stability design methods were implemented as automated web servers42,90–92 becoming accessible to scientists with no expertise in computational modeling. These tools have led to dozens of reports with successful case studies that required screening only a handful of designs. To be sure, not all proteins targeted for stability design exhibit improvements63, but given the broad usefulness of current methods, we may be close to the point at which stability design can be applied reliably and universally to any protein class. Beyond the practical implications for accelerating research and technology, this exciting development verifies our understanding of some of the basic rules that determine native-state stability and foldability40 even in large eukaryotic proteins that exhibit complex folds and functions.
Enhancing protein activity
Optimizing protein activity is a critical and challenging frontier of design methodology development. Improving affinity, catalytic rate, or substrate selectivity often requires mutations within the active site, but most active-site mutations lead to a significant loss in function. A further complication is that large gains in function often demand several mutations. Active-site positions are proximal, however, and the functional impact of mutations is likely to depend strongly on other mutations in a phenomenon known as epistasis93,94. Epistasis may severely restrict the emergence of new functions in natural or lab evolution because evolutionary processes rely on the stepwise accumulation of mutations each of which must at least preserve activity28. Desirable mutants that exhibit high activity may be unreachable by such processes if, due to epistasis among the mutations, these processes require passing through low-activity variants93–97 (Fig 1B). Another severe complication is that functional traits may be mutually antagonistic, such as when mutations that improve activity come at the cost of lower specificity or stability8,9,29,58. Protein dynamics are a significant source of additional uncertainty in optimizing function. The activity of many enzymes and binding proteins relies on changes in protein conformation between states with similar energy, and typically not all states are characterized in the atomic detail necessary for design98. Therefore, although the limitations of protein evolution provide a strong impetus to develop non-evolutionary computational design methods for enhancing activity, the complexities of active-site design entail that such methods are less developed and general than those for stabilizing proteins43,99,100.
FuncLib is an active-site design method, implemented as an online tool, that addresses some of the complexities of designing active sites43. Instead of accumulating mutations iteratively as evolutionary methods do, it searches for simultaneous combinations of active-site mutations that form stable, low-energy constellations. In practice, the designs exhibit shape and electrostatic diversity in the active site while ensuring that the mutations do not deform the catalytic constellation of residues. FuncLib has typically been applied to generate variants that exhibit diverse activity profiles starting from a natural or engineered enzyme43,72,101–103. In these cases, the method does not model any substrate in the active-site pocket, and yet several studies observed large improvements in activity. Typically, this approach leads to improvements in activities already observed in the parental enzyme, and encouragingly, in a handful of cases new specificities emerged that were not seen in the wild-type protein102,103.
In an application of FuncLib to a fungal unspecific peroxygenase (UPO) (Fig 3A), >70% of the experimentally screened designs were functional and several exhibited dramatic differences in enantio- and regioselectivities103 (Fig. 3B). The broad versatility of the set of designs was highlighted by two that exhibited opposite enantioselectivity for two different compounds. UPOs are attracting significant interest for their ability to economically oxidize a wide range of aliphatic and aromatic biomolecules104, and enzymes that exhibit new substrate specificities may find uses in research and the chemical industry. In another demonstration, designs of a bacterial phosphotriesterase (Fig. 3C) exhibited up to 4000-fold improvement in the hydrolysis rate of synthetic venomous nerve agents, reaching the enzymatic efficiency required for therapeutic use43. One of the designs that exhibited large gains in activity comprised four point mutations, and mutational analysis demonstrated that due to epistasis among the designed mutations, no evolutionarily plausible path could accumulate these mutations without going through intermediates that exhibit low activity (Fig. 3D).
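The notion that epistasis can block stepwise mutational paths can be illustrated with a short sketch that enumerates every order in which a set of mutations could accumulate and keeps only the orders along which activity never drops. The activity values below are invented toy numbers, not measurements from the phosphotriesterase study, and the "never decreases" criterion is one simple assumption about what makes a path evolutionarily plausible. The function and variable names are hypothetical.

```python
from itertools import combinations, permutations


def accessible_paths(activity, n_mutations):
    """Orders of single-mutation steps from the starting protein to the
    full multipoint mutant along which activity never decreases."""
    paths = []
    for order in permutations(range(n_mutations)):
        genotype = frozenset()
        current = activity[genotype]
        blocked = False
        for mutation in order:
            genotype = genotype | {mutation}
            if activity[genotype] < current:
                blocked = True  # this step lowers activity
                break
            current = activity[genotype]
        if not blocked:
            paths.append(order)
    return paths


def all_genotypes(n_mutations):
    """Every combination of the mutations, including the starting protein."""
    return [frozenset(c) for k in range(n_mutations + 1)
            for c in combinations(range(n_mutations), k)]


# Toy epistatic landscape: single mutants improve, double mutants decline
# relative to the singles, and the triple mutant is by far the most active.
epistatic = {
    frozenset(): 1.0,
    frozenset({0}): 2.0, frozenset({1}): 3.0, frozenset({2}): 2.0,
    frozenset({0, 1}): 1.5, frozenset({0, 2}): 1.5, frozenset({1, 2}): 1.5,
    frozenset({0, 1, 2}): 10.0,
}

# Toy additive landscape for comparison: each mutation adds one unit.
additive = {g: 1.0 + len(g) for g in all_genotypes(3)}

if __name__ == "__main__":
    print(len(accessible_paths(epistatic, 3)), len(accessible_paths(additive, 3)))
```

In the epistatic toy landscape every path is blocked at the double-mutant stage, whereas all orders are accessible in the additive one. Designing the mutations simultaneously sidesteps this restriction because no intermediate needs to be functional.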
Figure 3. Design of enhanced activity.
A. Amino acid residues (orange) in the UPO active site are in close proximity (PDB entry: 6EKZ). Computational design introduces combinations of simultaneous mutations to generate stable and preorganized active sites103. Heme shown in cyan; the substrate propranolol in purple. B. Select designs were experimentally tested for C−H oxyfunctionalization of a variety of styrenes. The yellow and blue bars indicate the fraction of enantiomers generated by the designs and starting enzyme (WT). Molecular structures of the products are shown at the top. Designs show striking changes in enantiospecificity (compare, for instance, WT and d28 in the right-most column). C. The active site of a bacterial phosphotriesterase (PDB entry: 1HZY). Wild-type residues are in gray sticks, and designed mutations in a quadruple mutant are in cyan. D. Mutations in the active site of the phosphotriesterase show strong epistasis in their impact on the measured activity (2-naphthyl acetate hydrolysis). Each circle represents a phosphotriesterase mutant, and the area of the circle is proportional to the specific activity of the design. The starting enzyme exhibits low specific activity (360 μM s⁻¹ mg⁻¹ protein). Each of the point mutants exhibits improved specific activity, but activity declines in the double mutants relative to the His257Trp single mutant. Last, the quadruple mutant (designed enzyme) substantially improves specific activity relative to all single or double mutants. Adapted from ref 43.
These results demonstrate that addressing the challenge posed by epistasis may open the way to dramatically improve existing activities and even generate new ones. Additionally, they show that designing stable and catalytically preorganized active sites is a prerequisite for designing efficient enzymes, in line with theoretical considerations105. Furthermore, the designs typically do not exhibit stability-activity trade-offs, suggesting that computational methods may access high-functioning variants that are unlikely to emerge from applying evolutionary methods. We note, however, that the design methods only increase the likelihood of obtaining highly functional variants that may exhibit substrate specificity differences; they are not sufficient for designing variants with predetermined activity profiles. A future complete computational activity-optimization solution will require advances in ligand docking86 and possibly also in transition-state modeling to generate structural models that bias the computational design method in favor of the desired catalytic outcome.
Learning from iterative high-throughput structure-based design
Over the past decade, computational methods have matured to the point that they can generate thousands and even millions of designs through automated pipelines106–109. These methods and advances in massive DNA synthesis, deep sequencing, experimental screening, and machine learning have enabled the application of statistical learning to draw general lessons about protein folding and activity.
High-throughput structure-based design methods have been used to map sequence and conformation spaces of “miniproteins”106, proteins typically ≤ 60 amino acids in size. To study the determinants of folding and stability, nearly a million miniproteins were screened for protein stability and foldability107 using high-throughput proteolytic screening and deep sequencing. This analysis quantified the contribution to stability of molecular properties, such as the compatibility of amino acid choices to the local secondary structure, large hydrophobic cores, charge stabilization, and epistatic mutations. This strategy was also used to design 20 thousand miniproteins to screen for binders of influenza hemagglutinin and botulinum neurotoxin B25. A machine-learning analysis distinguished binders from inert proteins by their stability and contact order at the interface, improving the binder recovery rate eightfold in a second design round. During the SARS-CoV-2 pandemic, this approach was used to generate novel binders of the Spike receptor binding domain that inhibited ACE2 binding26. This combination of high-throughput design and screening is opening exciting possibilities for complete automation of de novo binder design.
Natural proteins are typically larger than 200 amino acids33 and structurally more complex than miniproteins110. The combinatorial assembly and design of enzymes (CADENZ) method generates stable and diverse backbone fragments by selecting dozens of structural fragments from homologous enzymes and optimizing their sequences for maximal stability and compatibility with other fragments108. It then uses a machine-learning strategy to select designed fragments that would combine to form low-energy full-length proteins that exhibit stable catalytic constellations. CADENZ was used to design a million models of structurally diverse proteins based on glycoside hydrolase structures, and high-throughput activity screening isolated thousands of functional designs that exhibited high diversity (80-160 mutations from any natural enzyme including large insertions and deletions). Comparing features of functional and non-functional designs revealed different physicochemical preferences in active sites versus the rest of the protein. Functional designs were characterized by dense, yet low-energy constellations of active site residues, while the remainder of the protein required compactness, large hydrogen-bond networks and high compatibility between amino acid identity and the local backbone conformation. Implementing these features in a second-generation repertoire yielded more than 10 thousand functional enzymes from a single screen, with thousands of designs that exhibited different substrate specificity from one another108. Based on similar principles, htFuncLib generates large libraries of multipoint active site mutants that are predicted to form amino acid constellations that do not disrupt the positioning of functional groups in the active site109. 
Applied to 27 positions in the chromophore-binding pocket of GFP, htFuncLib generated a repertoire encoding approximately 10 million designs from which more than 16 thousand functionally fluorescent designs were retrieved, many of which exhibited more than five active-site mutations. A detailed biophysical analysis revealed dozens of designs with potentially useful changes in spectral properties, including large improvements or changes in brightness, excitation spectra, and thermal stability relative to the starting protein.
Thus, despite the high sensitivity of protein active sites to mutation, high-throughput design studies reveal that the space of active-site functional variants is likely to be large. For instance, despite the long-term interest in engineering GFP, fewer than 100 variants in the chromophore-binding pocket were documented111 before the htFuncLib study. It is important to note, however, that while protein active sites may accommodate thousands of variants, these numbers are dwarfed by the vast space of nonfunctional active-site variants (Fig 4). The ability of new energy-based design methods to effectively navigate the complex mutational space of active sites suggests that we have a sound understanding of some of the biophysical underpinnings needed to design improved activity. In particular, the analysis of the newly designed GFP functional variants highlighted the crucial role that protein stability and active-site residue preorganization play in determining whether a protein is functional. On a practical level, design calculations are largely automated, and gene-synthesis costs for such customized libraries are modest112 opening the way for new computational methods to find uses in areas of protein engineering that are amenable to high-throughput screening, such as antibody and binder discovery and optimization. Additionally, we anticipate that the large and diverse datasets that emerge from such screens that systematically relate sequence and structure to stability and activity will drive further improvements in design methodology, including in AI-based design of function.
Figure 4. Design methods must navigate an astronomically large sequence space that is extremely sparse in functional proteins.
(I) The theoretical sequence space of an average-sized protein is huge: for a 350 amino acid protein, 20^350 > 10^455 sequences. By designing enzyme domains as in the CADENZ approach, the hypothetical space of possibilities approaches such numbers. Even when designing amino acids only in a functional site (II), as in the case of GFP design (htFuncLib), the hypothetical sequence space is orders of magnitude greater than the number of viral particles in the world113, and no experimental screening method could search it adequately. In the case of GFP, phylogenetic data and physics-based calculations focus the search and reduce the sequence space by 18 orders of magnitude (III). This still leaves a sequence space that cannot be assayed even by the most high-throughput methods currently available, such as ribosome display. A machine-learning procedure can filter out mutations that do not combine with one another to form stable and foldable proteins, leading to the machine learning (ML) restricted space (IV). This restriction enables constructing an effective library that is enriched with stable and potentially functional designs. From this sequence space of ~10^7 variants, 10^4 functional fluorescent proteins were recovered (V)109 compared to fewer than 100 active-site variants recorded in the fluorescent proteins database111.
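The orders of magnitude quoted in this caption can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name is ours, and it assumes that all 20 canonical amino acids are allowed at every designed position, which overstates the space that a real design calculation would consider.

```python
import math


def log10_sequence_space(n_positions: int, alphabet_size: int = 20) -> float:
    """Return log10 of the number of sequences over n_positions.

    Working in log space avoids printing astronomically large integers.
    """
    return n_positions * math.log10(alphabet_size)


# (I) A full-length protein of 350 amino acids: 20^350 sequences.
full_protein = log10_sequence_space(350)

# (II) The 27 positions of the GFP chromophore-binding pocket, assuming
# (for illustration only) that any amino acid is allowed at each position.
active_site = log10_sequence_space(27)

print(f"20^350 is about 10^{full_protein:.1f}")
print(f"20^27  is about 10^{active_site:.1f}")
```

Under these assumptions, the full-length space exceeds 10^455, consistent with the caption, and even the 27-position active-site space is on the order of 10^35, far beyond the reach of any experimental screen.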
De novo design of structure and function
All natural proteins are the product of evolution. Due to the limitations of evolutionary processes, new molecular activities only slowly emerge through natural evolution, and many desirable activities may not have been found114. The rationale of de novo design is that these fundamental limitations of evolutionary processes can be overcome by tailoring a protein structure for each desired activity without recourse to natural starting points17,22. Thus, developing general methods for de novo function design is the ultimate goal of the “inverse-function” problem (Figure 1A). Although this goal has been pursued for decades, the approaches to achieve it have changed dramatically16–18. The manual application of protein design principles to generate new structures115–117 was followed by a period of physics-based approaches10,22 and, most recently, by the application of generative AI23,118. As the field develops, we are witnessing a surge in the structural complexity and diversity of de novo-designed proteins in step with the adoption of new approaches (Figure 5). While this progress is notable, the complexity of de novo-designed folds still pales in comparison to the natural repertoire of structures (Figure 5a). In fact, a very large fraction of designed proteins adopt simple all-α-helix topologies, whereas natural proteins sample a much greater diversity of secondary structures and topologies (Figure 5b). We argue that the simplicity of designed folds constrains the types of functions that they can present. In the following sections, we provide an overview of the achievements in de novo design of fold and function and discuss obstacles that must be overcome to bridge the gap to natural proteins.
Figure 5. Comparison of structural features in natural and de novo designed proteins.
A) De novo proteins are yet to reach the structural complexity of natural ones as measured by size and relative contact order. Structures of a de novo α-helix bundle (PDB: 7CBC) are highlighted versus two natural proteins (PDBs: 3NF4 and 3ZQJ). B) Secondary Structure Element (SSE) content in natural and de novo proteins. De novo proteins are biased towards high α-helix content whereas natural ones are structurally diverse. C) Relative contact order of designed protein structures found in the PDB plotted against their time of publication. Each design is also labeled according to the class of design generation method (PDBs: 1AL1, 1QYS, 3QA9, 3NF4). D) SSE content of natural and de novo binder interfaces (excluding antibodies). De novo binders are biased towards presenting helices at interaction sites compared to natural binders. Antibody binding surfaces, which are not represented in this figure, are dominated by loop regions and almost entirely devoid of helices. E) Number of interfacial residues in natural protein interactions compared to de novo protein binders. F) Hydrogen bonds in natural interfaces compared to de novo interfaces. G) Examples of de novo designed protein-protein interactions: botulinum neurotoxin binder25, influenza binder38, PD-L1 binder and SARS-CoV-2 receptor binding domain (RBD) binder150. Targets are colored green and binders in red.
De novo fold design
The de novo design problem is typically formulated in two steps: backbone generation and sequence search24,119 (Fig. 1A). To generate a plausible backbone, most design methods use regular structural motifs of α helices and β sheets connected by short loops to create idealized topologies dominated by densely packed secondary structures22–24,119. In such simple topologies, the sequence-structure rules are well understood, simplifying the design process24,120–122. Moreover, to increase the likelihood of achieving the desired structure, fold design can use features that exclude alternative conformations, for instance, by using short segments to connect secondary structure elements, by ensuring high compatibility between amino acid type and secondary structure, and by introducing mutations that relieve backbone and residue strain in the desired state106,107,120,122.
A guiding principle for backbone design is the notion of “backbone designability”123,124. A designable backbone is one for which it is possible to find a sequence of natural amino acids that yields a unique structure. Designable backbones usually contain common structural motifs found in nature, both at the level of the regular secondary elements and the 3D arrangement of those elements, increasing the likelihood of producing a stable protein125,126 without explicitly encoding negative-design principles24. The resulting protein structures can be thought of as “idealized” versions of natural protein folds. In these versions, the secondary structures are canonical, well-packed, and unstrained to promote stability and foldability. The ability to automatically design thousands of such proteins106,107 is a remarkable demonstration of the success of protein design methods in capturing some of the features that are critical for protein folding. Nevertheless, idealized designs lack the structural irregularities that are hallmarks of functional motifs in natural proteins, such as surface cavities, kinked secondary structure elements, and desolvated polar groups110. Because of their regularity, many designed proteins exhibit high stability, but methods for designing sophisticated activities in de novo proteins are limited. Part of the functional design challenge is that natural proteins are much larger and topologically more complex than those that have been the subject of design studies110. Moreover, function in natural proteins often depends on large and structured loop regions110,127, and such regions continue to pose a severe challenge to structure prediction and design methods87.
Following backbone design, a compatible sequence is generated. Physics-based methods have been widely applied in this field. They compute an energetically favorable amino acid for each position of a computed backbone structure. Given the high dimensionality of the sequence search problem, it is impossible to enumerate the full space of sequences128, and heuristic sampling approaches129,130 are used to find low-energy sequences. Physics-based methods have enabled several landmark designs, including proteins with folds not previously found in nature24, idealized folds based on natural ones120, a large number of miniproteins and peptides131, and the exploration of new geometries of secondary structure elements in protein folds132.
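To make the idea of heuristic sequence sampling concrete, the following minimal sketch runs a Metropolis Monte Carlo search with simulated annealing over sequences on a fixed backbone. It is not the algorithm of any specific design package: the scoring function `toy_energy` is a hypothetical stand-in for a physical energy function (it merely rewards hydrophobic residues at buried positions and polar residues at exposed ones), and all names and parameter values are our own assumptions.

```python
import math
import random
from typing import Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWC")


def toy_energy(sequence: str, buried: Sequence[bool]) -> float:
    """Toy score (NOT a physical force field): -1 for each position whose
    hydrophobicity matches its burial state, +1 otherwise."""
    energy = 0.0
    for aa, is_buried in zip(sequence, buried):
        energy += -1.0 if (aa in HYDROPHOBIC) == is_buried else 1.0
    return energy


def anneal_sequence(buried: Sequence[bool], n_steps: int = 5000,
                    t_start: float = 2.0, t_end: float = 0.05,
                    seed: int = 0) -> Tuple[str, float]:
    """Search for a low-energy sequence by single-site Metropolis moves."""
    rng = random.Random(seed)
    current = [rng.choice(AMINO_ACIDS) for _ in buried]
    current_e = toy_energy("".join(current), buried)
    best, best_e = list(current), current_e
    for step in range(n_steps):
        # Geometric cooling schedule from t_start down to t_end.
        frac = step / max(n_steps - 1, 1)
        temperature = t_start * (t_end / t_start) ** frac
        # Propose a random substitution at a random position.
        pos = rng.randrange(len(current))
        old = current[pos]
        current[pos] = rng.choice(AMINO_ACIDS)
        new_e = toy_energy("".join(current), buried)
        delta = new_e - current_e
        # Metropolis criterion: always accept downhill moves, accept
        # uphill moves with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current_e = new_e
            if current_e < best_e:
                best, best_e = list(current), current_e
        else:
            current[pos] = old  # reject the move
    return "".join(best), best_e


# Hypothetical burial pattern for a 20-residue segment.
pattern = [True, False] * 10
designed, energy = anneal_sequence(pattern)
print(designed, energy)
```

The search never enumerates the 20^20 possible sequences; it visits only a few thousand of them, which is the essential point of heuristic sampling. Real design calculations differ in many respects, for instance by sampling side-chain conformations and by using an all-atom energy function.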
In recent years, the toolbox for protein design has been expanded with deep learning methods. Typically, these methods learn structural and amino acid patterns from natural protein structures and use the learned distributions to generate new backbones and matching amino-acid sequences. In contrast to physics-based methods, deep learning approaches need huge amounts of high-quality data for training, and the basis for their design decisions is often opaque, making it challenging to interpret the results in biophysical terms based on molecular features. Nevertheless, machine learning methods are more efficient than physics-based ones and scale well with the number of amino acids. Similar to physics-based methods, most deep learning methods rely on discrete steps of structure and sequence design, though recent developments design structure and sequence in a single step133,134. Once sequences are generated, their quality is typically assessed by verifying that AlphaFold2 predicts a model structure that closely matches the designed one23,118,135–141.
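The final filtering step, comparing a predicted model with the designed one, reduces to a structural-similarity calculation. The sketch below is a simplified stand-in for such a check: it computes a distance-matrix RMSD (dRMSD) between two sets of Cα coordinates, which needs no superposition step. The function name and the toy coordinates are our own assumptions; this is not the metric used in any particular study, and coordinate RMSD after optimal superposition is an obvious alternative.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def distance_rmsd(designed: Sequence[Coord], predicted: Sequence[Coord]) -> float:
    """Root-mean-square difference between all intra-chain pairwise
    distances of two equally long coordinate sets (in the input units)."""
    if len(designed) != len(predicted):
        raise ValueError("coordinate sets must have the same length")
    n = len(designed)
    if n < 2:
        raise ValueError("at least two residues are required")
    total = 0.0
    n_pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            d_designed = math.dist(designed[i], designed[j])
            d_predicted = math.dist(predicted[i], predicted[j])
            total += (d_designed - d_predicted) ** 2
            n_pairs += 1
    return math.sqrt(total / n_pairs)


# Toy example: an idealized alpha-helical C-alpha trace (radius 2.3 A,
# rise 1.5 A and 100 degrees per residue) stands in for the design model.
helix = [(2.3 * math.cos(math.radians(100 * i)),
          2.3 * math.sin(math.radians(100 * i)),
          1.5 * i) for i in range(10)]

# A rotated and translated copy stands in for a perfect prediction.
moved = [(-y + 5.0, x - 3.0, z + 1.0) for x, y, z in helix]

# A copy with one displaced residue stands in for an imperfect prediction.
perturbed = list(helix)
perturbed[4] = (helix[4][0] + 2.0, helix[4][1], helix[4][2])

print(distance_rmsd(helix, moved))
print(distance_rmsd(helix, perturbed))
```

Because dRMSD depends only on internal distances, it is insensitive to how the two models are placed in space, but for the same reason it cannot distinguish a structure from its mirror image. Any acceptance cutoff applied to such a score is a design choice that has to be calibrated for the problem at hand.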
The deep learning-based de novo design approaches have demonstrated remarkable accuracy in generating protein folds, such as β-barrels133,142,143, immunoglobulin-like folds143, α-β folds23,133,143, water-soluble analogs of membrane proteins143, and new-to-nature structures23,118,136,144, including proteins with more than 300 amino acids23. This is a clear breakthrough relative to the achievements of the physics-based approaches, for which such challenging folds and large proteins were either extremely difficult or beyond their scope145. While it remains unclear which precise methodological aspects have led to these achievements, one possibility is that deep learning strategies optimize the probability of a sequence to fold into the desired conformation relative to alternatives, thereby implicitly capturing some negative design principles146.
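The notion of optimizing the probability of the desired conformation relative to alternatives can be illustrated with a textbook Boltzmann-population calculation. The sketch below is a deliberately simplified model under our own assumptions: each conformational state is represented by a single energy value, and the energies (in kcal/mol) are invented for illustration.

```python
import math
from typing import Sequence

KT = 0.593  # kcal/mol, approximately k_B * T at 298 K


def target_population(target_energy: float,
                      alternative_energies: Sequence[float],
                      kt: float = KT) -> float:
    """Boltzmann probability of the target state among all listed states."""
    energies = [target_energy, *alternative_energies]
    # Shift by the minimum energy for numerical stability.
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / kt) for e in energies]
    return weights[0] / sum(weights)


# Invented energies: one desired state and two competing misfolded states.
baseline = target_population(-10.0, [-9.5, -9.0])

# Positive design: stabilize the desired state.
positive = target_population(-12.0, [-9.5, -9.0])

# Negative design: destabilize the competing states.
negative = target_population(-10.0, [-7.5, -7.0])

print(f"baseline {baseline:.2f}, positive {positive:.2f}, negative {negative:.2f}")
```

In this toy picture, both lowering the energy of the desired state and raising the energies of the alternatives widen the energy gap and increase the population of the desired state. A method that scores sequences by this probability, rather than by the energy of the target state alone, accounts for the alternatives without modeling each of them explicitly.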
Two currently popular classes of generative models are diffusion-based models and protein language models. Diffusion-based models can rapidly sample many backbone conformations by gradually denoising a randomly generated constellation of backbone atoms23,147,148. By contrast, protein language models capture evolutionary patterns in sequences that can be used to generate novel sequences. Even though these models are trained on natural sequences, they may generalize beyond their training set to generate de novo proteins133. Notably, methods development is progressing quickly, and many new learning paradigms are being introduced149.
The automation of de novo design workflows enables statistical comparison of computed and natural proteins to understand the current limitations of design methodology (Figure 5 and Supplementary Box 1). Clearly, de novo-designed proteins tend to be smaller than natural ones, and their relative contact order, which quantifies fold complexity, is also smaller (Fig. 5a). Particularly, de novo designs are dominated by secondary structures and exhibit a much higher fraction of α helices compared to natural proteins (Figure 5b), as also observed in previous analyses17,18. The dominance of α helices can be explained by the high stability and foldability of these structural elements due to their highly predictable pattern of stabilizing hydrogen bonds and reduced degrees of freedom; these features facilitate accurate design in the absence of an explicit way to encode negative design principles. β-strands are often more difficult to generate due to the requirement to design non-canonical structural features that guard against the tendency of β strands to aggregate116,120. Additionally, designed folds exhibit lower loop content than natural proteins, despite the fact that loop regions often form essential parts of active sites110. Despite these critical gaps between de novo designed and natural proteins, we observe a clear trend towards greater fold complexity in designs over the years, in particular with the advent of deep learning-based design strategies around 2020 (Fig. 5c). This is an encouraging sign that perhaps some of the gaps in fold complexity between designed and natural folds (which are still dramatic) will be overcome with more sophisticated design methods.
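Relative contact order, used above as a measure of fold complexity, is the average sequence separation between contacting residues normalized by chain length: RCO = (1 / (L × N)) × Σ |i − j|, where L is the number of residues and the sum runs over the N residue pairs in contact. The sketch below is a simplified illustration under our own assumptions: contacts are defined from Cα atoms only with an arbitrary 8 Å cutoff, pairs closer than two positions in sequence are ignored, and the two toy chains are invented geometries rather than real proteins.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def relative_contact_order(ca_coords: Sequence[Coord],
                           cutoff: float = 8.0,
                           min_separation: int = 2) -> float:
    """Average sequence separation of contacting residue pairs divided
    by chain length. Returns 0.0 if no contacts are found."""
    n_res = len(ca_coords)
    total_separation = 0
    n_contacts = 0
    for i in range(n_res):
        for j in range(i + min_separation, n_res):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                total_separation += j - i
                n_contacts += 1
    if n_contacts == 0:
        return 0.0
    return total_separation / (n_res * n_contacts)


# Toy chain 1: a fully extended 20-residue chain (3.8 A spacing), in which
# only residues close in sequence are close in space.
extended = [(3.8 * i, 0.0, 0.0) for i in range(20)]

# Toy chain 2: the same chain folded back on itself as a crude hairpin,
# which brings residues far apart in sequence into contact.
hairpin = ([(3.8 * i, 0.0, 0.0) for i in range(10)]
           + [(3.8 * (9 - k), 5.0, 0.0) for k in range(10)])

print(relative_contact_order(extended))
print(relative_contact_order(hairpin))
```

The hairpin scores higher than the extended chain because its contacts span longer stretches of sequence. Helical bundles are dominated by local contacts within each helix, which is one reason why designs rich in α helices tend toward low relative contact order.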
De novo design of function
The ultimate goal of protein design is developing general methods that can be reliably applied to generate desired functions without recourse to natural starting points, thus fully addressing the “inverse function” problem (Fig. 1A). While this grand goal is far from being achieved, one important step toward it is embedding functional motifs from natural proteins into computationally designed ones. Furthermore, the de novo-designed parts of the protein can be sculpted to accommodate the functional site in an optimal way. This task has been achieved by both physics-based and deep learning-based approaches, resulting in proteins that present viral epitopes for vaccines, metal-binding proteins, enzymes, and protein-binding sites141,151,152.
Beyond incorporating a known functional motif into a designed protein, de novo design also aims to design functional proteins completely from scratch. The main focus in recent years has been on protein or small-molecule binders. In the case of the latter, success came from physics-based methods where ligands are modeled simultaneously with interacting side chains and then used to search for protein structures that can implement such arrangements, resulting in high-affinity binders153. The challenge for protein binders, on the other hand, is that protein-protein interactions must strike a delicate balance between opposing biophysical contributions, such as favorable electrostatic and hydrogen-bond interactions and the unfavorable desolvation penalty of shielding polar groups from solvent. Furthermore, the protein interaction surfaces must be carefully optimized to exhibit high shape and electrostatic complementarity and minimal strain154. Nonetheless, physics-based methods have been used to design protein binders for different targets, such as cell-surface receptors and viral proteins26,38,155. Recently, deep learning-based methods have also made important contributions to protein binder design. The MaSIF framework applies geometric deep learning to protein surfaces for prediction and design tasks156. This approach enabled accurate prediction of the most likely protein interaction sites on a protein surface and the design of protein binders150. Additionally, diffusion-based generative approaches showed remarkable success in the design of de novo binders, including several cases of high-affinity binders23.
Although the number of successfully designed binders is small, in the following we compare several structural features between designs and natural binders. Similar to our analysis of protein folds, here too, we find that designed binding surfaces rely heavily on regular secondary structures, particularly α helices (Figure 5d). Furthermore, these surfaces tend to be smaller (Figure 5e) and rely on fewer hydrogen bonds than natural ones, highlighting the difficulties in designing precise hydrogen-bond interactions across large and complementary protein surfaces (Figure 5f). We note that the distribution of natural proteins we analyzed excluded antibodies. The dominance of loop regions in antibody binding sites would greatly accentuate the differences we observe here.
We conclude that despite the significant progress made in de novo binder design in recent years, a dramatic gap remains between the types of molecular interactions that can be generated and those observed in nature, and this gap has not been closed over the past decade11. Closing this gap is likely to demand a dedicated focus on de novo design of nonideal structure elements, such as those seen in antibodies and enzyme active sites. Designing such elements, however, may come at the price of lowering the stability and foldability observed in de novo proteins and may require explicitly encoding negative design principles to ensure accuracy. We expect that some of the advantages of high stability observed in de novo designs relative to natural proteins may have to be sacrificed to design more complex structures that encode sophisticated functions.
Emerging applications for de novo designed proteins
Recent years have seen an explosion of novel applications of de novo designed proteins. In this section, we present a few that highlight innovations in the design approaches; these are summarized in Figure 6. One of the areas where de novo designs may have clear advantages over natural proteins is in protein-based therapeutics, particularly antivirals. In this area, the small size, high stability, and relatively economical and scalable production of designs compared to antibodies are all appealing and have inspired several successful design campaigns against influenza23,38 and SARS-CoV-2155,157. Designs were generated to act through several inhibitory mechanisms including conformational inhibition38 of target viral proteins and directly blocking the interactions between the virus and the human cell surface receptor155,157,158, demonstrating the versatility of design approaches.
Figure 6. Several practical applications of de novo protein design.
De novo designed proteins present a range of possibilities for the development of new molecules with biotechnological applications. Some popular areas of research are antivirals (PDB: 3R2X), cancer therapies (PDB: 7JH5), protein based therapies (PDB: 2B5I), protein switches (PDB: 6IWB), drug delivery vehicles (PDB: 6VFI), vaccines (PDB: 3LHP) and biosensors (PDB: 7AYE). Natural proteins (targets) are colored green and the designed proteins are colored red.
Immunomodulatory interleukins are another notable example of computationally designed protein-based drugs159. Designs for this task were generated by grafting a natural interleukin binding site followed by de novo design of the supporting scaffold. The resulting design exhibited enhanced affinity and selectivity and increased biological activity compared to natural interleukins in vivo. Immunomodulatory proteins, specifically cytokines, are an attractive target for design because the natural pleiotropy of interleukins across different receptors suggests many possibilities for fine-tuning their potency and specificity160.
Structure-based vaccine immunogen design is a very promising area, including for de novo design approaches151,161. Vaccine immunogen stabilization62, grafting of neutralization epitopes on stable de novo designed scaffolds141,151,162, and even targeting specific antibody germline responses through design have been recently explored163. Many of the designed immunogens were tested in non-human primates showing the induction of neutralizing antibodies, and germline targeting immunogens have since entered human clinical trials showing engagement of the desired antibodies164. One of the most impressive recent advances has been the generation of vaccine presentation and delivery vehicles in the form of self-assembled protein nanoparticles. These designed protein-based nanoparticles have shown a breadth of applications related to different viruses (RSV, HIV, SARS-CoV-2, and influenza), consistently improving the efficacy of the immune responses and enabling more economical manufacturing options165–168. Remarkably, a designed nanoparticle targeting SARS-CoV-2 has recently been approved for human use168,169.
Biosensing is another area for which protein design is particularly well suited. In such designs, the molecular recognition module must be designed so that binding of the analyte changes the conformation of the sensor in measurable ways. Currently, such approaches are more readily applicable to detecting proteins than small molecules, given that larger conformational changes are more readily triggered by protein-protein binding events. Nevertheless, some progress has also been made in small-molecule sensing, where ligand receptors can be embedded in multidomain architectures and used for quantifying drug dosage170, or an array of small protein sensors can be produced to mimic the multifactorial sensing of the olfactory system171.
One of the most innovative areas in modern biomedicine is cell-based therapeutics as a strategy to treat non-solid tumors. Cell-based therapies are a class of living drugs that can leverage protein design and synthetic biology. The engineering of chimeric antigen receptor T-cells (CAR-Ts) has been a dynamic field where many concepts that could benefit from sophisticated protein design have been attempted. Specific examples arise from the need to control the activity of CAR-T cells to avoid toxicity172. For instance, small molecule switches have been used for cancer therapies through design of receptor activation in vivo, thereby adding a critical safety switch for clinical usage173. Additionally, de novo designed transmembrane segments that were incorporated into receptors in CAR-T cells were shown to generate more specific signaling than the standard segments used in such constructs174. This led to lower release of inflammatory cytokines and may lead to safer CAR-T cell treatments. One of the biggest problems of CAR-T cells is the lack of specific antigens on the tumor cells. Using computational design strategies, multiple antigen binders were combined into “logic gates” that increased the tissue specificity of the engineered cells175.
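The logic-gate idea can be stated compactly in Boolean terms: an engineered cell should respond only to a particular combination of surface antigens rather than to any single one. The sketch below is purely conceptual and models nothing about the underlying protein mechanism; the antigen names, cell definitions, and function names are hypothetical.

```python
from typing import Set


def and_gate(surface_antigens: Set[str], first: str, second: str) -> bool:
    """Respond only if both antigens are present on the same cell."""
    return first in surface_antigens and second in surface_antigens


def and_not_gate(surface_antigens: Set[str], first: str, second: str,
                 veto: str) -> bool:
    """Respond to both antigens unless a third (veto) antigen is present."""
    return and_gate(surface_antigens, first, second) and veto not in surface_antigens


# Hypothetical cells described by the set of antigens on their surface.
tumor_cell = {"antigen_A", "antigen_B"}
healthy_cell_1 = {"antigen_A"}
healthy_cell_2 = {"antigen_A", "antigen_B", "antigen_C"}

for name, cell in [("tumor", tumor_cell),
                   ("healthy 1", healthy_cell_1),
                   ("healthy 2", healthy_cell_2)]:
    print(name,
          and_gate(cell, "antigen_A", "antigen_B"),
          and_not_gate(cell, "antigen_A", "antigen_B", "antigen_C"))
```

A single-antigen binder would also recognize the first hypothetical healthy cell, whereas the AND gate spares it, and the AND-NOT gate additionally spares the second. This is the sense in which combining binders can increase specificity when no single antigen is unique to the tumor.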
To conclude, de novo protein design is increasingly finding uses in areas at the forefront of therapeutic engineering. The inherent features of de novo designs provide improved stability and manufacturability compared to conventional modalities.
Outlook
Protein design has made huge progress over the past decade. From concentrating efforts on understanding the fundamental principles that govern sequence-structure relationships120, the field has matured to the point that thousands of unique protein structures can be designed using automated workflows25,106,107. In addition, despite concerns that design methods were not accurate enough for optimizing natural protein activities10,12,30, new methods that combine physics- and data-based calculations are making significant inroads into applied protein engineering. Recent algorithms can generate variants of natural proteins that encode dozens of simultaneous mutations to significantly increase stability and desirable activities42,43,108,109. Such methods have dramatically accelerated the development of proteins for research and therapeutic or industrially relevant applications62,67,79,102 and have been applied by scientists with no expertise in computational modeling. Encouragingly, the broad spectrum of proteins and functions that have been optimized using protein design suggests that one of the critical goals of protein engineering — i.e., a general strategy for protein optimization that can be universally applied — may be within reach.
Methods for designing protein function from scratch (the “inverse function problem”) are, however, limited in scope. Despite advances in de novo binder design, the resulting proteins still exhibit limited structural diversity, and targeting complex protein surfaces is still beyond reach. Furthermore, only rudimentary enzymatic activities can be designed de novo152. We argue that addressing these important challenges requires advancing beyond the idealized proteins that are typically generated today and approaching the complexity of natural binders and enzymes, such as antibodies, TIM-barrels, and β propellers. Designing complex structures without compromising stability and foldability is, in our view, the most pressing challenge for de novo design. To overcome this challenge, we may need methods that encode negative design principles either through new data-driven approaches or, more realistically, by combining AI-based methods and physics-based ones. A solution to this outstanding challenge will open numerous opportunities for addressing problems in research, drug development, and sustainable chemical manufacturing, among many other urgent basic and applied topics.
Supplementary Material
Glossary.
- Fold design
Designing a backbone that shares no significant sequence homology with natural proteins. Sometimes denoted de novo design.
- Function design
Implementing a new function into a protein scaffold.
- Structure-based design
Design based on computed or experimentally determined molecular structures using physical principles.
- Protein backbone
The protein mainchain of amino acids connected through covalent amide linkages. Also denoted scaffold.
- Protein optimization
Design with the goal of optimizing desired protein functional aspect(s), such as thermodynamic and kinetic stability, production yields, catalytic efficiency, binding affinity, and specificity.
- Stability design
Design with the goal of improving protein thermodynamic and kinetic stability.
- Sequence space
The theoretical space of possible combinations of protein sequence changes. This space is often too large for experimental or computational enumeration, and design methods must find ways to restrict and sample it efficiently.
- Positive design
Designing elements that improve the stability of a desired structural state.
- Negative design
Designing elements that destabilize undesired (e.g., nonfunctional or aggregation-prone) states.
- Epistasis
Non-additive effects of combinatorial mutations; for instance, when mutations are tolerated in combination, but not individually, or vice versa.
- Backbone generation
Generating a spatial arrangement of the protein backbone excluding the amino acid sidechains.
- Idealized topology
Simplified geometric representation of protein structure, mostly comprising secondary structure elements connected by short linkers.
- Backbone designability
The ability of amino acid sequences to fold into the desired backbone. A backbone that has many solutions is highly designable.
- Relative Contact Order (RCO)
Represents the relative complexity of a protein fold. Computed as the extent to which amino acids that are far in the primary sequence are physically close in the 3D structure.
Acknowledgments
We thank Ariel Tennenhouse for critical reading. Work in the Fleishman lab was funded by the Volkswagen Foundation grant 9474, the Israel Science Foundation grant 1844, the European Research Council through a Consolidator Award grant 815379, the Dr. Barry Sherman Institute for Medicinal Chemistry, and a donation in memory of Sam Switzer. Work in the Correia lab was supported by the Swiss National Foundation, the National Center of Competence in Molecular Systems Engineering and Fondation Leenaards. SJF and BEC are named inventors on patents relating to methods and designs described in the manuscript and consult on the application of protein design methods.
References
- 1. Arnold FH. Innovation by Evolution: Bringing New Chemistry to Life (Nobel Lecture). Angew Chem Int Ed Engl. 2019. doi: 10.1002/anie.201907729.
- 2. Winter G. Harnessing Evolution to Make Medicines (Nobel Lecture). Angew Chem Int Ed Engl. 2019;58:14438–14445. doi: 10.1002/anie.201909343.
- 3. Trudeau DL, Tawfik DS. Protein engineers turned evolutionists-the quest for the optimal starting point. Curr Opin Biotechnol. 2019;60:46–52. doi: 10.1016/j.copbio.2018.12.002.
- 4. Packer MS, Liu DR. Methods for the directed evolution of proteins. Nat Rev Genet. 2015;16:379–394. doi: 10.1038/nrg3927.
- 5. Arnold FH. The nature of chemical innovation: new enzymes by evolution. Q Rev Biophys. 2015;48:404–410. doi: 10.1017/S003358351500013X.
- 6. Arnold FH. Combinatorial and computational challenges for biocatalyst design. Nature. 2001;409:253–257. doi: 10.1038/35051731.
- 7. Tokuriki N, Stricher F, Serrano L, Tawfik DS. How protein stability and new functions trade off. PLoS Comput Biol. 2008;4:e1000002. doi: 10.1371/journal.pcbi.1000002.
- 8. Tokuriki N, et al. Diminishing returns and tradeoffs constrain the laboratory optimization of an enzyme. Nat Commun. 2012;3:1257. doi: 10.1038/ncomms2246.
- 9. Goldsmith M, et al. Overcoming an optimization plateau in the directed evolution of highly efficient nerve agent bioscavengers. Protein Eng Des Sel. 2017;30:333–345. doi: 10.1093/protein/gzx003.
- 10. Fleishman SJ, Baker D. Role of the biomolecular energy gap in protein design, structure, and evolution. Cell. 2012;149:262–273. doi: 10.1016/j.cell.2012.03.016.
- 11. Stranges PB, Kuhlman B. A comparison of successful and failed protein interface designs highlights the challenges of designing buried hydrogen bonds. Protein Sci. 2013;22:74–82. doi: 10.1002/pro.2187.
- 12. Baker D. What has de novo protein design taught us about protein folding and biophysics? Protein Sci. 2019;28:678–683. doi: 10.1002/pro.3588.
- 13. Khare SD, Fleishman SJ. Emerging themes in the computational design of novel enzymes and protein-protein interfaces. FEBS Lett. 2013;587:1147–1154. doi: 10.1016/j.febslet.2012.12.009.
- 14. Baker D. An exciting but challenging road ahead for computational enzyme design. Protein Sci. 2010;19:1817–1819. doi: 10.1002/pro.481.
- 15. Baek M, Baker D. Deep learning and protein structure modeling. Nat Methods. 2022;19:13–14. doi: 10.1038/s41592-021-01360-8.
- 16. Pan X, Kortemme T. Recent advances in de novo protein design: Principles, methods, and applications. J Biol Chem. 2021;296:100558. doi: 10.1016/j.jbc.2021.100558.
- 17. Korendovych IV, DeGrado WF. De novo protein design, a retrospective. Q Rev Biophys. 2020;53:e3. doi: 10.1017/S0033583519000131.
- 18. Woolfson DN. A Brief History of De Novo Protein Design: Minimal, Rational, and Computational. J Mol Biol. 2021;433:167160. doi: 10.1016/j.jmb.2021.167160.
- 19. Yue K, Dill KA. Inverse protein folding problem: designing polymer sequences. Proc Natl Acad Sci USA. 1992;89:4163–4167. doi: 10.1073/pnas.89.9.4163.
- 20. Bowie JU, Lüthy R, Eisenberg D. A method to identify protein sequences that fold into a known three-dimensional structure. Science. 1991;253:164–170. doi: 10.1126/science.1853201.
- 21. Weinstein J, Khersonsky O, Fleishman SJ. Practically useful protein-design methods combining phylogenetic and atomistic calculations. Curr Opin Struct Biol. 2020;63:58–64. doi: 10.1016/j.sbi.2020.04.003.
- 22. Huang P-S, Boyken SE, Baker D. The coming of age of de novo protein design. Nature. 2016;537:320–327. doi: 10.1038/nature19946.
- 23. Watson JL, et al. De novo design of protein structure and function with RFdiffusion. Nature. 2023. doi: 10.1038/s41586-023-06415-8. [Applying diffusion models to backbone generation yields large de novo designed proteins and assemblies. Available as a colab-notebook]
- 24. Kuhlman B, et al. Design of a novel globular protein fold with atomic-level accuracy. Science. 2003;302:1364–1368. doi: 10.1126/science.1089427.
- 25. Chevalier A, et al. Massively parallel de novo protein design for targeted therapeutics. Nature. 2017. doi: 10.1038/nature23912.
- 26. Cao L, et al. Design of protein-binding proteins from the target structure alone. Nature. 2022;605:551–560. doi: 10.1038/s41586-022-04654-9. [Repertoires of miniprotein binders for 12 different antigens are designed based solely on the structure of the target antigenic site]
- 27. Anfinsen CB. Principles that Govern the Folding of Protein Chains. Science. 1973;181:223–230. doi: 10.1126/science.181.4096.223.
- 28. Smith JM. Natural selection and the concept of a protein space. Nature. 1970;225:563–564. doi: 10.1038/225563a0.
- 29. Bershtein S, Segal M, Bekerman R, Tokuriki N, Tawfik DS. Robustness-epistasis link shapes the fitness landscape of a randomly drifting protein. Nature. 2006;444:929–932. doi: 10.1038/nature05385.
- 30. Zhao H, Arnold FH. Directed evolution converts subtilisin E into a functional equivalent of thermitase. Protein Eng. 1999;12:47–53. doi: 10.1093/protein/12.1.47.
- 31. Levinthal C. Are there pathways for protein folding? J Chim Phys. 1968;65:44–45.
- 32. Dill KA. Polymer principles and protein folding. Protein Sci. 1999;8:1166–1180. doi: 10.1110/ps.8.6.1166.
- 33. Brocchieri L, Karlin S. Protein length in eukaryotic and prokaryotic proteomes. Nucleic Acids Res. 2005;33:3390–3400. doi: 10.1093/nar/gki615.
- 34. Johansson KE, et al. Computational Redesign of Thioredoxin Is Hypersensitive toward Minor Conformational Changes in the Backbone Template. J Mol Biol. 2016;428:4361–4377. doi: 10.1016/j.jmb.2016.09.013.
- 35. Cherny I, et al. Engineering V-type nerve agents detoxifying enzymes using computationally focused libraries. ACS Chem Biol. 2013;8:2394–2403. doi: 10.1021/cb4004892.
- 36.Baran D, et al. Principles for computational design of binding antibodies. Proc Natl Acad Sci USA. 2017;114:10900–10905. doi: 10.1073/pnas.1707171114. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Murphy PM, Bolduc JM, Gallaher JL, Stoddard BL, Baker D. Alteration of enzyme specificity by computational loop remodeling and design. Proc Natl Acad Sci USA. 2009;106:9215–9220. doi: 10.1073/pnas.0811070106. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Fleishman SJ, et al. Computational design of proteins targeting the conserved stem region of influenza hemagglutinin. Science. 2011;332:816–821. doi: 10.1126/science.1202617. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Whitehead TA, et al. Optimization of affinity, specificity and function of designed influenza inhibitors using deep sequencing. Nat Biotechnol. 2012;30:543–548. doi: 10.1038/nbt.2214. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Goldenzweig A, Fleishman SJ. Principles of Protein Stability and Their Application in Computational Design. Annu Rev Biochem. 2018;87:105–129. doi: 10.1146/annurev-biochem-062917-012102. [DOI] [PubMed] [Google Scholar]
- 41.Khersonsky O, Fleishman SJ. Why reinvent the wheel? Building new proteins based on ready-made parts. Protein Sci. 2016;25:1179–1187. doi: 10.1002/pro.2892. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Goldenzweig A, et al. Automated Structure- and Sequence-Based Design of Proteins for High Bacterial Expression and Stability. Mol Cell. 2016;63:337–346. doi: 10.1016/j.molcel.2016.06.012. [ Combining phylogenetic analysis with atomistic design calculations improves expressibility and stability in diverse proteins. Available as a webserver ] [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Khersonsky O, et al. Automated Design of Efficient and Functionally Diverse Enzyme Repertoires. Mol Cell. 2018;72:178–186.e5. doi: 10.1016/j.molcel.2018.08.033. [ An evolution-guided atomistic design method enhances enzyme activity levels. Available as a webserver ] [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Hanning KR, Minot M, Warrender AK, Kelton W, Reddy ST. Deep mutational scanning for therapeutic antibody engineering. Trends Pharmacol Sci. 2022;43:123–135. doi: 10.1016/j.tips.2021.11.010. [DOI] [PubMed] [Google Scholar]
- 45.Fox RJ, et al. Improving catalytic function by ProSAR-driven enzyme evolution. Nat Biotechnol. 2007;25:338–344. doi: 10.1038/nbt1286. [DOI] [PubMed] [Google Scholar]
- 46.Yang KK, Wu Z, Arnold FH. Machine-learning-guided directed evolution for protein engineering. Nat Methods. 2019;16:687–694. doi: 10.1038/s41592-019-0496-6. [DOI] [PubMed] [Google Scholar]
- 47.Taft JM, et al. Deep mutational learning predicts ACE2 binding and antibody escape to combinatorial mutations in the SARS-CoV-2 receptor-binding domain. Cell. 2022;185:4008–4022.e14. doi: 10.1016/j.cell.2022.08.024. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Bedbrook CN, Yang KK, Rice AJ, Gradinaru V, Arnold FH. Machine learning to design integral membrane channelrhodopsins for efficient eukaryotic expression and plasma membrane localization. PLoS Comput Biol. 2017;13:e1005786. doi: 10.1371/journal.pcbi.1005786. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Balchin D, Hayer-Hartl M, Hartl FU. In vivo aspects of protein folding and quality control. Science. 2016;353:aac4354. doi: 10.1126/science.aac4354. [DOI] [PubMed] [Google Scholar]
- 50.McLendon G, Radany E. Is protein turnover thermodynamically controlled? J Biol Chem. 1978;253:6335–6337. [PubMed] [Google Scholar]
- 51.Kwon WS, Da Silva NA, Kellis JT., Jr Relationship between thermal stability, degradation rate and expression yield of barnase variants in the periplasm of Escherichia coli. Protein Eng. 1996;9:1197–1202. doi: 10.1093/protein/9.12.1197. [DOI] [PubMed] [Google Scholar]
- 52.Parsell DA, Sauer RT. The structural stability of a protein is an important determinant of its proteolytic susceptibility in Escherichia coli. J Biol Chem. 1989;264:7590–7595. [PubMed] [Google Scholar]
- 53.Shusta EV, Kieke MC, Parke E, Kranz DM, Wittrup KD. Yeast polypeptide fusion surface display levels predict thermal stability and soluble secretion efficiency. J Mol Biol. 1999;292:949–956. doi: 10.1006/jmbi.1999.3130. [DOI] [PubMed] [Google Scholar]
- 54.Christendat D, et al. Structural proteomics: prospects for high throughput sample preparation. Prog Biophys Mol Biol. 2000;73:339–345. doi: 10.1016/s0079-6107(00)00010-9. [DOI] [PubMed] [Google Scholar]
- 55.Mehlin C, et al. Heterologous expression of proteins from Plasmodium falciparum: results from 1000 genes. Mol Biochem Parasitol. 2006;148:144–160. doi: 10.1016/j.molbiopara.2006.03.011. [DOI] [PubMed] [Google Scholar]
- 56.Klenk C, Ehrenmann J, Schütz M, Plückthun A. A generic selection system for improved expression and thermostability of G protein-coupled receptors by directed evolution. Sci Rep. 2016;6:21294. doi: 10.1038/srep21294. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.Andréll J, Tate CG. Overexpression of membrane proteins in mammalian cells for structural studies. Mol Membr Biol. 2013;30:52–63. doi: 10.3109/09687688.2012.703703. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Bloom JD, Labthavikul ST, Otey CR, Arnold FH. Protein stability promotes evolvability. Proc Natl Acad Sci USA. 2006;103:5869–5874. doi: 10.1073/pnas.0510098103. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Rosace A, et al. Automated optimisation of solubility and conformational stability of antibodies and proteins. Nat Commun. 2023;14:1937. doi: 10.1038/s41467-023-37668-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Wijma HJ, Fürst MJLJ, Janssen DB. A Computational Library Design Protocol for Rapid Improvement of Protein Stability: FRESCO. Methods Mol Biol. 2018;1685:69–85. doi: 10.1007/978-1-4939-7366-8_5. [DOI] [PubMed] [Google Scholar]
- 61.Musil M, et al. FireProt: web server for automated design of thermostable proteins. Nucleic Acids Res. 2017;45:W393–W399. doi: 10.1093/nar/gkx285. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Campeotto I, et al. One-step design of a stable variant of the malaria invasion protein RH5 for use as a vaccine immunogen. Proc Natl Acad Sci USA. 2017;114:998–1002. doi: 10.1073/pnas.1616903114. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Peleg Y, et al. Community-Wide Experimental Evaluation of the PROSS Stability-Design Method. J Mol Biol. 2021;433:166964. doi: 10.1016/j.jmb.2021.166964. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Pokorna S, et al. Design of a stable human acid-β-glucosidase: towards improved Gaucher disease therapy and mutation classification. FEBS J. 2023 doi: 10.1111/febs.16758. [DOI] [PubMed] [Google Scholar]
- 65.Borgert SR, et al. Moonlighting chaperone activity of the enzyme PqsE contributes to RhlR-controlled virulence of Pseudomonas aeruginosa. Nat Commun. 2022;13:7402. doi: 10.1038/s41467-022-35030-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Barber-Zucker S, et al. Stable and Functionally Diverse Versatile Peroxidases Designed Directly from Sequences. J Am Chem Soc. 2022;144:3564–3571. doi: 10.1021/jacs.1c12433. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.Williams JA, et al. Structural and computational design of a SARS-CoV-2 spike antigen with improved expression and immunogenicity. Sci Adv. 2023;9:eadg0330. doi: 10.1126/sciadv.adg0330. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68.Mao G, et al. A sustainable approach for degradation and detoxification of malachite green by an engineered polyphenol oxidase at high temperature. J Clean Prod. 2021:129437 [Google Scholar]
- 69.Lambert AR, Hallinan JP, Werther R, Glów D, Stoddard BL. Optimization of Protein Thermostability and Exploitation of Recognition Behavior to Engineer Altered Protein-DNA Recognition. Structure. 2020 doi: 10.1016/j.str.2020.04.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 70.Khersonsky O, et al. Stable Mammalian Serum Albumins Designed for Bacterial Expression. J Mol Biol. 2023;435:168191. doi: 10.1016/j.jmb.2023.168191. [DOI] [PubMed] [Google Scholar]
- 71.Sherkhanov S, et al. Isobutanol production freed from biological limits using synthetic biochemistry. Nat Commun. 2020;11:4292. doi: 10.1038/s41467-020-18124-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 72.Allouche-Arnon H, et al. Computationally designed dual-color MRI reporters for noninvasive imaging of transgene expression. Nat Biotechnol. 2022 doi: 10.1038/s41587-021-01162-5. [DOI] [PubMed] [Google Scholar]
- 73.Doble MV, et al. Engineering Thermostability in Artificial Metalloenzymes to Increase Catalytic Activity. ACS Catal. 2021;11:3620–3627. [Google Scholar]
- 74.Hsieh C-L, et al. Stabilized coronavirus spike stem elicits a broadly protective antibody. Cell Rep. 2021;37:109929. doi: 10.1016/j.celrep.2021.109929. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 75.Higgins MK. Can We AlphaFold Our Way Out of the Next Pandemic? J Mol Biol. 2021;433:167093. doi: 10.1016/j.jmb.2021.167093. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 76.Graham BS, Gilman MSA, McLellan JS. Structure-Based Vaccine Antigen Design. Annu Rev Med. 2019;70:91–104. doi: 10.1146/annurev-med-121217-094234. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77.Hsieh C-L, McLellan JS. Protein engineering responses to the COVID-19 pandemic. Curr Opin Struct Biol. 2022;74:102385. doi: 10.1016/j.sbi.2022.102385. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78.A study to test experimental blood stage malaria vaccine in Burkina Faso. US National Library of Medicine; 2023. https://clinicaltrials.gov/study/NCT05790889 . [Google Scholar]
- 79.Hettiaratchi MH, et al. Reengineering biocatalysts: Computational redesign of chondroitinase ABC improves efficacy and stability. Sci Adv. 2020;6:eabc6378. doi: 10.1126/sciadv.abc6378. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 80.Rosenzweig ES, et al. Chondroitinase improves anatomical and functional outcomes after primate spinal cord injury. Nat Neurosci. 2019;22:1269–1275. doi: 10.1038/s41593-019-0424-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 81.Busch SA, Horn KP, Silver DJ, Silver J. Overcoming macrophage-mediated axonal dieback following CNS injury. J Neurosci. 2009;29:9967–9976. doi: 10.1523/JNEUROSCI.1151-09.2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 82.Schueler-Furman O, Wang C, Bradley P, Misura K, Baker D. Progress in modeling of protein structures and interactions. Science. 2005;310:638–642. doi: 10.1126/science.1112160. [DOI] [PubMed] [Google Scholar]
- 83.Jumper J, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021 doi: 10.1038/s41586-021-03819-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 84.Tunyasuvunakool K, et al. Highly accurate protein structure prediction for the human proteome. Nature. 2021;596:590–596. doi: 10.1038/s41586-021-03828-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 85.Tennenhouse A, et al. Computational optimization of antibody humanness and stability by systematic energy-based ranking. Nat Biomed Eng. 2023 doi: 10.1038/s41551-023-01079-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 86.Krishna R, et al. Generalized Biomolecular Modeling and Design with RoseTTAFold All-Atom. bioRxiv. 2023:2023.10.09.561603. doi: 10.1101/2023.10.09.561603. [DOI] [PubMed] [Google Scholar]
- 87.Abanades B, et al. ImmuneBuilder: Deep-Learning models for predicting the structures of immune proteins. bioRxiv. 2022:2022.11.04.514231. doi: 10.1101/2022.11.04.514231. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 88.Zelnik ID, et al. Computational design and molecular dynamics simulations suggest the mode of substrate binding in ceramide synthases. Nat Commun. 2023;14:2330. doi: 10.1038/s41467-023-38047-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 89.Weinstein JJ, et al. One-shot design elevates functional expression levels of a voltage-gated potassium channel. bioRxiv. 2022:2022.12.28.522065. doi: 10.1101/2022.12.28.522065. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 90.Schymkowitz J, et al. The FoldX web server: an online force field. Nucleic Acids Res. 2005;33:W382–8. doi: 10.1093/nar/gki387. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 91.Bednar D, et al. FireProt: Energy- and Evolution-Based Computational Design of Thermostable Multiple-Point Mutants. PLoS Comput Biol. 2015;11:e1004556. doi: 10.1371/journal.pcbi.1004556. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 92.Marques SM, Planas-Iglesias J, Damborsky J. Web-based tools for computational enzyme design. Curr Opin Struct Biol. 2021;69:19–34. doi: 10.1016/j.sbi.2021.01.010. [DOI] [PubMed] [Google Scholar]
- 93.Breen MS, Kemena C, Vlasov PK, Notredame C, Kondrashov FA. Epistasis as the primary factor in molecular evolution. Nature. 2012;490:535–538. doi: 10.1038/nature11510. [DOI] [PubMed] [Google Scholar]
- 94.Weinreich DM, Watson RA, Chao L. Perspective: Sign epistasis and genetic constraint on evolutionary trajectories. Evolution. 2005;59:1165–1174. [PubMed] [Google Scholar]
- 95.Starr TN, Thornton JW. Epistasis in protein evolution. Protein Sci. 2016;25:1204–1218. doi: 10.1002/pro.2897. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 96.Yang G, et al. Higher-order epistasis shapes the fitness landscape of a xenobiotic-degrading enzyme. Nat Chem Biol. 2019;15:1120–1128. doi: 10.1038/s41589-019-0386-3. [DOI] [PubMed] [Google Scholar]
- 97.Goldsmith M, Tawfik DS. Enzyme engineering: reaching the maximal catalytic efficiency peak. Curr Opin Struct Biol. 2017;47:140–150. doi: 10.1016/j.sbi.2017.09.002. [DOI] [PubMed] [Google Scholar]
- 98.Corbella M, Pinto GP, Kamerlin SCL. Loop dynamics and the evolution of enzyme activity. Nat Rev Chem. 2023;7:536–547. doi: 10.1038/s41570-023-00495-w. [DOI] [PubMed] [Google Scholar]
- 99.Sumbalova L, Stourac J, Martinek T, Bednar D, Damborsky J. HotSpot Wizard 3.0: web server for automated design of mutations and smart libraries based on sequence input information. Nucleic Acids Res. 2018;46:W356–W362. doi: 10.1093/nar/gky417. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 100.Stourac J, et al. Caver Web 1.0: identification of tunnels and channels in proteins and analysis of ligand transport. Nucleic Acids Res. 2019;47:W414–W422. doi: 10.1093/nar/gkz378. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 101.Klaus M, Buyachuihan L, Grininger M. Ketosynthase domain constrains the design of polyketide synthases. ACS Chem Biol. 2020;15:2422–2432. doi: 10.1021/acschembio.0c00405. [DOI] [PubMed] [Google Scholar]
- 102.Ospina F, et al. Selective Biocatalytic N-Methylation of Unsaturated Heterocycles. Angew Chem Int Ed Engl. 2022 doi: 10.1002/anie.202213056. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 103.Gomez de Santos P, et al. Repertoire of Computationally Designed Peroxygenases for Enantiodivergent C-H Oxyfunctionalization Reactions. J Am Chem Soc. 2023 doi: 10.1021/jacs.2c11118. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 104.Beltrán-Nogal A, et al. Surfing the wave of oxyfunctionalization chemistry by engineering fungal unspecific peroxygenases. Curr Opin Struct Biol. 2022;73:102342. doi: 10.1016/j.sbi.2022.102342. [DOI] [PubMed] [Google Scholar]
- 105.Warshel A. Electrostatic origin of the catalytic power of enzymes and the role of preorganized active sites. J Biol Chem. 1998;273:27035–27038. doi: 10.1074/jbc.273.42.27035. [DOI] [PubMed] [Google Scholar]
- 106.Rocklin GJ, et al. Global analysis of protein folding using massively parallel design, synthesis, and testing. Science. 2017;357:168–175. doi: 10.1126/science.aan0693. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 107.Tsuboyama K, et al. Mega-scale experimental analysis of protein folding stability in biology and design. Nature. 2023 doi: 10.1038/s41586-023-06328-6. [ More than a million miniproteins were designed and screened to learn the determinants of foldability and stability ] [DOI] [PMC free article] [PubMed] [Google Scholar]
- 108.Lipsh-Sokolik R, et al. Combinatorial assembly and design of enzymes. Science. 2023;379:195–201. doi: 10.1126/science.ade9434. [DOI] [PubMed] [Google Scholar]
- 109.Weinstein JY, et al. Designed active-site library reveals thousands of functional GFP variants. Nat Commun. 2023;14:2890. doi: 10.1038/s41467-023-38099-z. [ Millions of active-site variants were designed in the GFP active site and used to learn molecular determinants of activity ] [DOI] [PMC free article] [PubMed] [Google Scholar]
- 110.Khersonsky O, Fleishman SJ. What Have We Learned from Design of Function in Large Proteins? BioDesign Research. 2022;2022:1–11. doi: 10.34133/2022/9787581. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 111.Lambert TJ. FPbase: a community-editable fluorescent protein database. Nat Methods. 2019;16:277–278. doi: 10.1038/s41592-019-0352-8. [DOI] [PubMed] [Google Scholar]
- 112.Hoch SY, Weinstein JY, Netzer R, Hakeny K, Fleishman SJ. GGAssembler: Economical Design of Gene Libraries with Precise Control over Mutations. bioRxiv. 2023:2023.05.18.541394 doi: 10.1101/2023.05.18.541394. [DOI] [Google Scholar]
- 113.Mushegian AR. Are There 10^31 Virus Particles on Earth, or More, or Fewer? J Bacteriol. 2020;202 doi: 10.1128/JB.00052-20. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 114.Povolotskaya IS, Kondrashov FA. Sequence space and the ongoing expansion of the protein universe. Nature. 2010;465:922–926. doi: 10.1038/nature09105. [DOI] [PubMed] [Google Scholar]
- 115.Ho SP, DeGrado WF. Design of a 4-helix bundle protein: synthesis of peptides which self-associate into a helical protein. J Am Chem Soc. 1987;109:6751–6758. [Google Scholar]
- 116.Richardson JS, et al. Looking at proteins: representations, folding, packing, and design. Biophysical Society National Lecture, 1992. Biophys J. 1992;63:1185–1209. [PMC free article] [PubMed] [Google Scholar]
- 117.Broome BM, Hecht MH. Nature disfavors sequences of alternating polar and non-polar amino acids: implications for amyloidogenesis. J Mol Biol. 2000;296:961–968. doi: 10.1006/jmbi.2000.3514. [DOI] [PubMed] [Google Scholar]
- 118.Anishchenko I, et al. De novo protein design by deep network hallucination. Nature. 2021;600:547–552. doi: 10.1038/s41586-021-04184-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 119.Dahiyat BI, Mayo SL. De novo protein design: fully automated sequence selection. Science. 1997;278:82–87. doi: 10.1126/science.278.5335.82. [DOI] [PubMed] [Google Scholar]
- 120.Koga N, et al. Principles for designing ideal protein structures. Nature. 2012;491:222–227. doi: 10.1038/nature11600. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 121.Marcos E, et al. De novo design of a non-local β-sheet protein with high stability and accuracy. Nat Struct Mol Biol. 2018;25:1028–1034. doi: 10.1038/s41594-018-0141-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 122.Dou J, et al. De novo design of a fluorescence-activating β-barrel. Nature. 2018;561:485–491. doi: 10.1038/s41586-018-0509-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 123.Shakhnovich EI. Protein design: a perspective from simple tractable models. Fold Des. 1998;3:R45–58. doi: 10.1016/S1359-0278(98)00021-2. [DOI] [PubMed] [Google Scholar]
- 124.McMillan PF, Clary DC, Wolynes PG. Energy landscapes and solved protein–folding problems. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences. 2004;363:453–467. doi: 10.1098/rsta.2004.1502. [DOI] [PubMed] [Google Scholar]
- 125.Govindarajan S, Goldstein RA. Why are some proteins structures so common? Proc Natl Acad Sci USA. 1996;93:3341–3345. doi: 10.1073/pnas.93.8.3341. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 126.Helling R, et al. The designability of protein structures. J Mol Graph Model. 2001;19:157–167. doi: 10.1016/s1093-3263(00)00137-6. [DOI] [PubMed] [Google Scholar]
- 127.Tóth-Petróczy A, Tawfik DS. The robustness and innovability of protein folds. Curr Opin Struct Biol. 2014;26:131–138. doi: 10.1016/j.sbi.2014.06.007. [DOI] [PubMed] [Google Scholar]
- 128.Pierce NA, Winfree E. Protein design is NP-hard. Protein Eng. 2002;15:779–782. doi: 10.1093/protein/15.10.779. [DOI] [PubMed] [Google Scholar]
- 129.Kuhlman B, Bradley P. Advances in protein structure prediction and design. Nat Rev Mol Cell Biol. 2019;20:681–697. doi: 10.1038/s41580-019-0163-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 130.Street AG, Mayo SL. Computational protein design. Structure. 1999;7:R105–9. doi: 10.1016/s0969-2126(99)80062-8. [DOI] [PubMed] [Google Scholar]
- 131.Bhardwaj G, Mulligan VK, Bahl CD, Gilmore JM. Accurate de novo design of hyperstable constrained peptides. Nature. 2016 doi: 10.1038/nature19791. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 132.Pan X, et al. Expanding the space of protein geometries by computational design of de novo fold families. Science. 2020;369:1132–1136. doi: 10.1126/science.abc0881. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 133.Verkuil R, et al. Language models generalize beyond natural proteins. bioRxiv. 2022:2022.12.21.521521 doi: 10.1101/2022.12.21.521521. [DOI] [Google Scholar]
- 134.Lisanza SL, et al. Joint Generation of Protein Sequence and Structure with RoseTTAFold Sequence Space Diffusion. bioRxiv. 2023:2023.05.08.539766 doi: 10.1101/2023.05.08.539766. [DOI] [Google Scholar]
- 135.Dauparas J, et al. Robust deep learning-based protein sequence design using ProteinMPNN. Science. 2022;378:49–56. doi: 10.1126/science.add2187. [ An AI-based sequence design method improves design success rate relative to previous physics-based approaches. Available as a colab-notebook ] [DOI] [PMC free article] [PubMed] [Google Scholar]
- 136.Wicky BIM, et al. Hallucinating protein assemblies. bioRxiv. 2022:2022.06.09.493773 doi: 10.1101/2022.06.09.493773. [DOI] [Google Scholar]
- 137.Huang B, et al. A backbone-centred energy function of neural networks for protein design. Nature. 2022;602:523–528. doi: 10.1038/s41586-021-04383-5. [DOI] [PubMed] [Google Scholar]
- 138.Anand N, et al. Protein sequence design with a learned potential. Nat Commun. 2022;13:746. doi: 10.1038/s41467-022-28313-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 139.Harteveld Z, et al. Deep sharpening of topological features for de novo protein design. 2022.
- 140.Eguchi RR, Choe CA, Huang P-S. Ig-VAE: Generative modeling of protein structure by direct 3D coordinate generation. PLoS Comput Biol. 2022;18:e1010271. doi: 10.1371/journal.pcbi.1010271. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 141.Wang J, et al. Scaffolding protein functional sites using deep learning. Science. 2022;377:387–394. doi: 10.1126/science.abn2100. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 142.Kim DE, et al. De novo design of small beta barrel proteins. Proc Natl Acad Sci USA. 2023;120:e2207974120. doi: 10.1073/pnas.2207974120. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 143.Goverde CA, et al. Computational design of soluble analogues of integral membrane protein structures. bioRxiv. 2023:2023.05.09.540044 doi: 10.1101/2023.05.09.540044. [DOI] [Google Scholar]
- 144.Harteveld Z, et al. Exploring ‘dark matter’ protein folds using deep learning. bioRxiv. 2023:2023.08.30.555621 doi: 10.1101/2023.08.30.555621. [DOI] [Google Scholar]
- 145.Huang P-S, et al. De novo design of a four-fold symmetric TIM-barrel protein with atomic-level accuracy. Nat Chem Biol. 2016;12:29–34. doi: 10.1038/nchembio.1966. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 146.Norn C, et al. Protein sequence design by conformational landscape optimization. Proc Natl Acad Sci USA. 2021;118 doi: 10.1073/pnas.2017228118. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 147.Lee JS, Kim J, Kim PM. Score-based generative modeling for de novo protein design. Nat Comput Sci. 2023;3:382–392. doi: 10.1038/s43588-023-00440-3. [DOI] [PubMed] [Google Scholar]
- 148.Ingraham JB, et al. Illuminating protein space with a programmable generative model. Nature. 2023;623:1070–1078. doi: 10.1038/s41586-023-06728-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 149.Yim J, et al. Fast protein backbone generation with SE(3) flow matching. arXiv [q-bio.QM] 2023 [Google Scholar]
- 150.Gainza P, et al. De novo design of protein interactions with learned surface fingerprints. Nature. 2023;617:176–184. doi: 10.1038/s41586-023-05993-x. [ Designing binders of four target proteins using an AI-based approach that predicts putative binding sites ] [DOI] [PMC free article] [PubMed] [Google Scholar]
- 151.Sesterhenn F, et al. De novo protein design enables the precise induction of RSV-neutralizing antibodies. Science. 2020;368 doi: 10.1126/science.aay5051. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 152.Yeh AH-W, et al. De novo design of luciferases using deep learning. Nature. 2023;614:774–780. doi: 10.1038/s41586-023-05696-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 153.Polizzi NF, DeGrado WF. A defined structural unit enables de novo design of small-molecule–binding proteins. Science. 2020;369:1227–1233. doi: 10.1126/science.abb8330. [ Computational design of small-molecule binding sites using a precomputed, low-energy constellation of ligand and interacting amino acids ] [DOI] [PMC free article] [PubMed] [Google Scholar]
- 154.Marchand A, Van Hall-Beauvais AK, Correia BE. Computational design of novel protein-protein interactions - An overview on methodological approaches and applications. Curr Opin Struct Biol. 2022;74:102370. doi: 10.1016/j.sbi.2022.102370. [DOI] [PubMed] [Google Scholar]
- 155.Linsky TW, et al. De novo design of potent and resilient hACE2 decoys to neutralize SARS-CoV-2. Science. 2020;370:1208–1214. doi: 10.1126/science.abe0075. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 156.Gainza P, et al. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat Methods. 2020;17:184–192. doi: 10.1038/s41592-019-0666-6. [DOI] [PubMed] [Google Scholar]
- 157.Cao L, et al. De novo design of picomolar SARS-CoV-2 miniprotein inhibitors. Science. 2020;370:426–431. doi: 10.1126/science.abd9909. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 158.Strauch E-M, et al. Computational design of trimeric influenza-neutralizing proteins targeting the hemagglutinin receptor binding site. Nat Biotechnol. 2017;35:667–671. doi: 10.1038/nbt.3907. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 159.Silva D-A, et al. De novo design of potent and selective mimics of IL-2 and IL-15. Nature. 2019;565:186–191. doi: 10.1038/s41586-018-0830-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 160.Hafler DA. Cytokines and interventional immunology. Nat Rev Immunol. 2007;7:423. [Google Scholar]
- 161.Correia BE, et al. Proof of principle for epitope-focused vaccine design. Nature. 2014;507:201–206. doi: 10.1038/nature12966. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 162.Azoitei ML, et al. Computation-guided backbone grafting of a discontinuous motif onto a protein scaffold. Science. 2011;334:373–376. doi: 10.1126/science.1209368. [DOI] [PubMed] [Google Scholar]
- 163.Sesterhenn F, et al. Boosting subdominant neutralizing antibody responses with a computationally designed epitope-focused immunogen. PLoS Biol. 2019;17:e3000164. doi: 10.1371/journal.pbio.3000164. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 164.Jardine JG, et al. HIV-1 broadly neutralizing antibody precursor B cells revealed by germline-targeting immunogen. Science. 2016;351:1458–1463. doi: 10.1126/science.aad9195. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 165.Marcandalli J, et al. Induction of Potent Neutralizing Antibody Responses by a Designed Protein Nanoparticle Vaccine for Respiratory Syncytial Virus. Cell. 2019;176:1420–1431.e17. doi: 10.1016/j.cell.2019.01.046. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 166.Kanekiyo M, et al. Self-assembling influenza nanoparticle vaccines elicit broadly neutralizing H1N1 antibodies. Nature. 2013;499:102–106. doi: 10.1038/nature12202. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 167.Abbott RK, et al. Precursor Frequency and Affinity Determine B Cell Competitive Fitness in Germinal Centers, Tested with Germline-Targeting HIV Vaccine Immunogens. Immunity. 2018;48:133–146.e6. doi: 10.1016/j.immuni.2017.11.023. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 168.Arunachalam PS, et al. Adjuvanting a subunit COVID-19 vaccine to induce protective immunity. Nature. 2021;594:253–258. doi: 10.1038/s41586-021-03530-2. [DOI] [PubMed] [Google Scholar]
- 169.Walls AC, et al. Elicitation of Potent Neutralizing Antibody Responses by Designed Protein Nanoparticle Vaccines for SARS-CoV-2. Cell. 2020;183:1367–1382.e17. doi: 10.1016/j.cell.2020.10.043. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 170.Griss R, et al. Bioluminescent sensor proteins for point-of-care therapeutic drug monitoring. Nat Chem Biol. 2014;10:598–603. doi: 10.1038/nchembio.1554. [DOI] [PubMed] [Google Scholar]
- 171.Dawson WM, et al. Differential sensing with arrays of de novo designed peptide assemblies. Nat Commun. 2023;14:383. doi: 10.1038/s41467-023-36024-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 172.Lim WA, June CH. The Principles of Engineering Immune Cells to Treat Cancer. Cell. 2017;168:724–740. doi: 10.1016/j.cell.2017.01.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 173.Giordano-Attianese G, et al. Author Correction: A computationally designed chimeric antigen receptor provides a small-molecule safety switch for T-cell therapy. Nat Biotechnol. 2020;38:503. doi: 10.1038/s41587-020-0461-z. [DOI] [PubMed] [Google Scholar]
- 174.Elazar A, et al. De novo-designed transmembrane domains tune engineered receptor functions. Elife. 2022;11 doi: 10.7554/eLife.75660. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 175.Lajoie MJ, et al. Designed protein logic to target cells with precise combinations of surface antigens. Science. 2020;369:1637–1643. doi: 10.1126/science.aba6527. [DOI] [PMC free article] [PubMed] [Google Scholar]