Practically every biological process in our cells relies on the function of proteins, acting either as single molecules or as parts of larger assemblies. Proteins (from the Greek prōteios, of the first order) are the most diverse group of macromolecules. Following their synthesis on ribosomes as amino acid polymer chains, typically hundreds of residues in length, most proteins must fold into well-defined three-dimensional structures in order to become biologically active. While the three-dimensional (3D) structures of proteins are of remarkable complexity (Fig. 1), the folding process is driven mainly by a multiplicity of noncovalent, energetically weak intramolecular interactions. Therefore, how the amino acid sequence, encoded in the genetic information, determines the folded state of a protein (1) remains a fundamental problem in biology, driven by the long-standing desire to predict the fold from the amino acid sequence.
Fig. 1.
The remarkable diversity and complexity of protein structures. Credit to “RCSB Protein Data Bank” (CC-BY-4.0 license).
Importantly, because the number of conformational states a protein chain can adopt is astronomically large and the folding energy landscape is often rugged (2), exhibiting local minima (kinetic traps), the folding process is error-prone. Proteins with complex structures require chaperones to navigate the folding energy landscape successfully (3). A subset of proteins, when misfolding spontaneously or due to mutation, can convert into aberrant aggregates forming highly stable fibrillar assemblies with cross-β structure. Deposition of these “amyloid-like” aggregates within and around cells is associated with a range of human diseases, including neurodegenerative pathologies like Alzheimer’s and Parkinson’s disease. Clearly, a better understanding of the pathways of protein folding and misfolding, as well as of the assembly of proteins into higher-order structures, is fundamentally important for both basic and translational research. Discoveries from experiment and theory over several decades have revealed the intricate 3D structures of proteins and their conformational transitions, providing insight into how these complex molecules have evolved the ability to fold spontaneously or with the help of molecular chaperones (3, 4). However, until recently, it has not been possible to reliably predict a protein’s folded state from its sequence.
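The phrase “astronomically large” can be made concrete with a back-of-the-envelope estimate in the spirit of the classic random-search argument. The sketch below is illustrative only: the chain length (100 residues), the number of states per residue (3), and the sampling rate (10^13 conformations per second) are assumptions chosen for the arithmetic, not values taken from this article, and the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def random_search_time_years(n_residues, states_per_residue, samples_per_second):
    """Estimate how long an exhaustive, unguided conformational search would take.

    Assumes every residue independently adopts one of `states_per_residue`
    states, so the chain has states_per_residue ** n_residues conformations.
    """
    n_conformations = states_per_residue ** n_residues
    seconds = n_conformations / samples_per_second
    return n_conformations, seconds / SECONDS_PER_YEAR


# Assumed, illustrative parameters (not measured values).
n_conf, years = random_search_time_years(
    n_residues=100, states_per_residue=3, samples_per_second=1e13
)
print(f"{n_conf:.2e} conformations, about {years:.1e} years of random search")
```

Under these assumptions an unguided search would take on the order of 10^27 years, which is why folding must proceed along a biased energy landscape rather than by exhaustive sampling.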
Recent breakthrough developments, using AI-based methods, now allow accurate predictions of native protein structures from primary sequence information. DeepMind’s program AlphaFold (5) enables scientists to predict the 3D shape of proteins and thus gain insight into their functions. The program is easy to use and does not require specific expertise in structural biology. Since their development, AlphaFold and related approaches such as RoseTTAFold (6) have already made tremendous contributions to the life sciences by accurately predicting the structures of proteins involved in cancer development, neurodegeneration, and infectious diseases, including research on COVID-19, to name only a few of their applications. Progress resulting from these developments is truly breathtaking.
The “AI & Protein Folding” Special Feature in PNAS discusses the impact of these recent advances, with a particular focus on cross-fertilization between experimental approaches and computer simulation methods. A collection of eight articles by leading scientists illuminates the different aspects of the protein folding problem, including the physico-chemical rules that govern folding and misfolding, the rules of protein evolution, and aspects of the biological machineries of protein quality control. Of particular interest is how advanced experimental techniques, applied to molecules and cells, can be utilized in combination with computational methods and AI approaches to drive the field forward.
The issue begins with a perspective by Kovalevsky et al. (7), describing the development of AlphaFold at DeepMind. AlphaFold is an AI learning system designed to predict the 3D structure of proteins based solely on their amino acid sequences. It does this by training on and extracting patterns from the database of myriad experimentally determined protein structures (8, 9) combined with information from multiple sequence alignments of homologous proteins. The extracted patterns are used as a “parts list”, taking physical and geometrical constraints into consideration. Once trained, AlphaFold can accurately predict protein structures, often achieving results comparable to experimental methods such as X-ray crystallography or cryo-electron microscopy. While not solving the problem of how a protein actually folds (i.e., its folding mechanism) (10), the development of AlphaFold represents a major advance in the field of computational biology and has the potential to significantly accelerate scientific research and the development of therapies for various diseases. Two years on from the initial release (11), AlphaFold is not only used widely as a prediction tool, having predicted millions of structures (8), but is also being adapted for related applications in a variety of inventive ways. The article provides an overview of some of these exciting applications. It also addresses a number of challenges facing the field in areas like protein dynamics and predicting the effect of point mutations on protein structure.
How protein folding works and how “useful” protein sequences have evolved from simpler molecules is the subject of the contribution by Kocher and Dill (12). Focusing on fundamental physical aspects of the problem, the authors provide a perspective that proteins and their folding processes have been critical drivers in the early stages of the origin of life. They discuss the principles of protein folding that have emerged over recent years from theoretical analysis, specifically how local structural preferences sampled stochastically contribute to the formation of the global native fold; the underlying kinetic mechanisms; how chaperone machineries assist in maintaining a balanced proteome and why proteostasis balance is lost in aging cells; and, finally, how the physico-chemical principles of protein folding control protein evolution.
Lipsh-Sokolik and Fleishman (13) elaborate further on a fundamental aspect of protein evolution that provides an important constraint. They describe how an ensemble of computational methods and AI approaches can be used in defining the epistatic relationship between multiple mutations in a gene. Epistatic theory predicts that a specific mutation is tolerated only if other mutations have been fixed previously. In some cases, a set of mutations is tolerated only when multiple mutations are introduced simultaneously. As Darwinian evolution generally operates by introducing, and functionally selecting, mutations one at a time, a protein requiring multiple simultaneous mutations to evolve would find itself in an evolutionarily trapped state, akin to a kinetically trapped folding intermediate in the rugged energy landscape of protein folding. This problem of epistasis constitutes a major constraint for the evolution of functional properties in a protein, such as a different enzymatic function. The authors cover the theoretical basis of epistasis in proteins and provide examples from bioinformatics and AI-based methods to explore this phenomenon.
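The evolutionary trap described here can be illustrated with a toy two-site fitness landscape. The following is a minimal sketch, not the authors' method: the fitness values are invented for illustration, and the helper names (`single_step_neighbors`, `adaptive_walk`, `is_trapped`) are hypothetical. Each single mutant is less fit than the wild type, while the double mutant is the fittest genotype, so a walk that accepts only one beneficial mutation at a time never leaves the wild type.

```python
# Hypothetical fitness values for a two-site protein (0 = wild-type residue,
# 1 = mutant residue). Illustrative assumptions, not data from the article.
FITNESS = {
    (0, 0): 1.0,  # wild type
    (1, 0): 0.6,  # single mutant A: deleterious on its own
    (0, 1): 0.7,  # single mutant B: deleterious on its own
    (1, 1): 1.5,  # double mutant: beneficial only in combination
}


def single_step_neighbors(genotype):
    """Return all genotypes that differ from `genotype` at exactly one site."""
    neighbors = []
    for i in range(len(genotype)):
        flipped = list(genotype)
        flipped[i] = 1 - flipped[i]
        neighbors.append(tuple(flipped))
    return neighbors


def adaptive_walk(start, fitness):
    """Greedy walk that accepts one mutation at a time, only if strictly fitter."""
    current = start
    path = [current]
    while True:
        better = [n for n in single_step_neighbors(current)
                  if fitness[n] > fitness[current]]
        if not better:
            return path
        current = max(better, key=fitness.get)
        path.append(current)


def is_trapped(start, fitness):
    """True if the one-step walk from `start` ends below the global optimum."""
    end = adaptive_walk(start, fitness)[-1]
    return end != max(fitness, key=fitness.get)


print(adaptive_walk((0, 0), FITNESS), is_trapped((0, 0), FITNESS))
```

In this toy landscape the wild type is a local optimum: reaching the fitter double mutant would require both mutations at once, or passage through a less fit intermediate, which is the situation the kinetic-trap analogy refers to.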
Next, Zhang et al. (14) focus on the ability of certain proteins to phase-separate, forming liquid-like biomolecular condensates. This phenomenon has more recently been recognized as an important principle of subcellular organization with implications in many biological functions and in disease. It has been challenging, however, to investigate the composition of biomolecular condensates in the cellular environment, because suitable experimental methods are still under development. To fill this gap, the authors report a method of predicting the co-condensation propensity of proteins. This method, called CoDropleT (Co-condensation into Droplet Transformer), is based on the assumption that the co-condensation propensity of proteins depends on their conformational properties and that such conformational properties can be extracted from the amino acid sequences by deep learning methods based on the transformer architecture. CoDropleT may find use in accelerating the discovery of proteins involved in protein phase separation, as well as in exploring diagnostic and therapeutic opportunities offered by a quantitative knowledge of the composition of protein condensates.
The article by Williams et al. (15) deals with an important aspect of protein folding in cells, where chaperones and other machineries act to prevent or correct misfolding. The authors provide an example of the experimental validation of a protein complex structure that was predicted using AlphaFold, the large uridine diphosphate-glucose:glycoprotein glucosyltransferase (UGGT) complex of the endoplasmic reticulum (ER) and its enigmatic co-chaperone SEP15, involved in protein quality control. The ER serves as a protein-folding factory where elaborate quality and quantity control systems monitor efficient and accurate production of secretory proteins, many of which are glycosylated and/or contain disulfide bonds. UGGT selectively glycosylates misfolded proteins and facilitates their interaction with chaperones that assist in proper folding. Based on structural information obtained with AlphaFold, the researchers successfully designed mutations that disrupted the interaction of UGGT with the small selenoprotein SEP15, which likely contributes to recruiting certain client proteins to UGGT.
The following three studies address various aspects of protein misfolding and aggregation, including the formation of amyloid-like aggregates as the most dangerous outcome. Dewison et al. (16) report experiments aiming at a better understanding of α-synuclein (αSyn) aggregation, which is a critical driver of pathology in Parkinson’s disease. They demonstrate that the N-terminal region of αSyn (residues 2 to 7) modulates the formation of αSyn fibrils, although it does not change the final fibril structure. Interestingly, deletion of residues 2 to 7 (αSynΔN7) slows the rate of fibril formation in vitro and in a nematode model. Furthermore, αSynΔN7 has a reduced capacity to be recruited by wild-type αSyn fibril seeds, a process thought to underlie the spreading of aggregate pathology in the brains of patients. These results identify the N-terminal sequence of αSyn as a new target for the design of amyloid inhibitors.
The article by Kozell et al. (17) explores the effect of external forces, such as mechanical perturbations by ultrasound, on protein fold stability. Ultrasound is routinely used in studies of amyloid aggregation, for example, to generate fragments from mature fibrils to be used as aggregate seeds. However, ultrasound can cause temperature effects and extreme shear forces and may also induce free radicals. The study shows that when precisely adjusted, ultrasound can trigger amyloid fibril growth directly from natively folded monomers. The authors present evidence that low ultrasonic energy can induce subtle conformational changes in a protein, resulting in primary aggregate nucleation, as confirmed using molecular dynamics simulations. Thus, ultrasound may be more generally useful in protein chemistry to modulate conformational states.
Finally, Jäger et al. (18) explore the role of evolution in shaping the energy landscape of unfolding of the oligomeric protein transthyretin (TTR), which circulates in blood and cerebrospinal fluid. Dissociation of the homotetrameric TTR complex, followed by misfolding and aggregation of TTR monomers, underlies the disease TTR amyloidosis, which typically affects the heart in elderly individuals. To prevent amyloidosis, the TTR complex needs to be kinetically stable (with destabilizing mutations causing disease). Using sophisticated biophysical techniques, the authors studied the dissociation/unfolding pathways of human TTR (hTTR) and a transthyretin-related protein of Escherichia coli (EcTRP), an evolutionary ancestor of TTR with a completely different function. Interestingly, EcTRP is as kinetically stable as hTTR under native conditions, indicating that kinetic stability, serving to avoid amyloidosis in hTTR, is an ancient feature that predates the evolution of TRPs into circulating TTR proteins. However, there are differences between the proteins in how stability is achieved, related to the structural changes that occurred during evolution from the enzyme EcTRP to the thyroid hormone-binding TTR.
The collection of original research and perspective articles in this Special Feature on protein folding emphasizes the interdisciplinary spirit that unites the large community of scientists interested in protein structure and function. The recent advances in predicting protein structures are enormously helpful in addressing a range of problems from basic research to applications in biotechnology and medicine. Theory and experiment, working hand-in-hand, will drive rapid progress. A bright future is predicted for protein science.
Acknowledgments
Author contributions
U.S. and F.U.H. analyzed data; and wrote the paper.
Competing interests
The authors declare no competing interest.
Footnotes
U.S. and F.U.H. are organizers of this Special Feature.
Contributor Information
Ulyana Shimanovich, Email: ulyana.shimanovich@weizmann.ac.il.
F. Ulrich Hartl, Email: uhartl@biochem.mpg.de.
References
- 1. Anfinsen C. B., Principles that govern the folding of protein chains. Science 181, 223–230 (1973).
- 2. Levinthal D. A., Adaptation on rugged landscapes. Manage. Sci. 43, 895–1045 (1997).
- 3. Balchin D., Hayer-Hartl M., Hartl F. U., In vivo aspects of protein folding and quality control. Science 353, aac4354 (2016).
- 4. Hingorani K. S., Gierasch L. M., Comparing protein folding in vitro and in vivo: Foldability meets the fitness challenge. Curr. Opin. Struct. Biol. 24, 81–90 (2014).
- 5. Jumper J., et al., Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
- 6. Baek M., et al., Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).
- 7. Kovalevsky O., Mateos-Garcia J., Tunyasuvunakool K., AlphaFold two years on: Validation and impact. Proc. Natl. Acad. Sci. U.S.A. 121, e2315002121 (2023).
- 8. Varadi M., et al., AlphaFold protein structure database in 2024: Providing structure coverage for over 214 million protein sequences. Nucleic Acids Res. 52, D368–D375 (2024).
- 9. Berman H. M., et al., The protein data bank. Nucleic Acids Res. 28, 235–242 (2000).
- 10. Chen S.-J., et al., Protein folds vs. protein folding: Differing questions, different challenges. Proc. Natl. Acad. Sci. U.S.A. 120, e2214423119 (2022).
- 11. Senior A. W., et al., Improved protein structure prediction using potentials from deep learning. Nature 577, 706–710 (2020).
- 12. Kocher C. D., Dill K. A., Origins of Life: The Protein Folding Problem all over again? Proc. Natl. Acad. Sci. U.S.A. 121, e2315000121 (2023).
- 13. Lipsh-Sokolik R., Fleishman S., Addressing epistasis in the design of protein function. Proc. Natl. Acad. Sci. U.S.A. 121, e2314999121 (2023).
- 14. Zhang A., Lim C., Occhetta M., Vendruscolo M., AlphaFold2-based prediction of the cocondensation propensity of proteins. Proc. Natl. Acad. Sci. U.S.A. 121, e2315005121 (2023).
- 15. Williams R. V., et al., Insights into the interaction between UGGT, the gatekeeper of folding in the ER, and its partner, the selenoprotein SEP15. Proc. Natl. Acad. Sci. U.S.A. 121, e2315009121 (2023).
- 16. Dewison K. M., et al., Residues 2–7 of α-synuclein regulate amyloid formation via lipid-dependent and -independent pathways. Proc. Natl. Acad. Sci. U.S.A. 121, e2315006121 (2023).
- 17. Kozell A., et al., Sound-mediated nucleation and growth of amyloid fibrils. Proc. Natl. Acad. Sci. U.S.A. 121, e2315510121 (2023).
- 18. Jäger M., Kelly J. W., Gruebele M., Conservation of kinetic stability, but not the unfolding mechanism, between human transthyretin and a transthyretin-related enzyme. Proc. Natl. Acad. Sci. U.S.A. 121, e2315007121 (2023).

