Abstract
Two years on from the initial release of AlphaFold, we have seen its widespread adoption as a structure prediction tool. Here, we discuss some of the latest work based on AlphaFold, with a particular focus on its use within the structural biology community. This encompasses use cases like speeding up structure determination itself, enabling new computational studies, and building new tools and workflows. We also look at the ongoing validation of AlphaFold, as its predictions continue to be compared against large numbers of experimental structures to further delineate the model’s capabilities and limitations.
Keywords: AlphaFold, structural biology, protein structure prediction
The first experimental protein structures were determined in the 1950s using X-ray crystallography, proving that protein chains fold into well-defined 3D shapes (1). Anfinsen went on to assert that each sequence of amino acids adopts a specific 3D structure (2). This raised an important question: Can we predict a protein’s structure given only its amino acid sequence?
Even in the early days of structural biology, when few protein structures were available, it became clear that proteins with similar sequences adopt similar shapes. This quickly led to the idea of homology modeling—predicting a structure based on its sequence similarity to known structures, see, e.g., ref. 3. As more experimental data accumulated, the Protein Data Bank (PDB) was established in the 1970s (4). This was crucial for the field, making possible open sharing of structural data, facilitating its analysis, and laying the foundation for all future structure prediction efforts.
In 1994, the critical assessment of methods for protein structure prediction (CASP) was started to encourage the development of more accurate prediction methods (5). CASP consists of blind prediction challenges: Participants predict nontrivial protein structures for which experimental results have not yet been made public. With the growth in computing power and in the PDB during the late 1990s and 2000s, novel computational methods flourished. Fragment-based methods used short protein fragments extracted from experimental structures as building blocks to construct a prediction (6). More recently, methods incorporating evolutionary information and contact prediction showed great promise (7–9). However, predictions rarely met the bar for near-experimental quality (a GDT_TS score > 90) before CASP14, when the machine learning system AlphaFold2 (referred to in this paper simply as AlphaFold) achieved this level of accuracy on the majority of CASP targets (10, 11).
Since AlphaFold’s release in 2021, it has seen rapid and widespread adoption. Based on data from OpenAlex, the paper describing the method has now received over 10,000 citations (10). Originating as a repository of 360,000 predicted protein structures from 21 organisms including humans, the AlphaFold Database has since grown substantially, encompassing 214,000,000 structures in 2022. The AlphaFold Database has had 1.6 million unique visitors from more than 190 countries, and the whole archive has been downloaded over 23,000 times as of January 2024 (12–14). Finally, our own analysis of PDB depositions up to January 2023 found ~850 entries associated with a paper that uses AlphaFold (of which over 60% were cryo-EM structures), evidence of the method’s widespread use among experimental structural biologists. Here, we review how AlphaFold is being used today, with a particular focus on that community. We also discuss recent work on evaluating AlphaFold by comparing its predictions against experimental results on a larger scale. Given the rapid pace of the field, we do not expect this review to be complete.
AlphaFold in Structural Biology
Impact on Experimental Structure Determination.
A major outcome of improvements in structure prediction has been to accelerate the work of experimental structural biologists by simplifying certain steps in their workflow.
This first became apparent in X-ray crystallography. To determine a structure by X-ray crystallography or microcrystal electron diffraction, it is necessary to reconstruct the phase information lost during the diffraction experiment. Molecular replacement is a technique to reconstruct the phases that requires no additional experimental work, but it does rely on having a search model that closely resembles the target structure. There have now been numerous reports of successful molecular replacement using AlphaFold predictions (15–18), including challenging cases where all search models derived from PDB had failed (19, 20), where the target had a novel fold (21) or was a de novo design (22). In fact, work by Terwilliger et al. (discussed later) suggests that a high percentage of structures can now be phased largely automatically using AlphaFold (23).
The community has developed a variety of tools to support this workflow, making a further valuable contribution to accelerating the process. Both major software suites for macromolecular crystallography, CCP4 (24, 25) and PHENIX (26), now include import procedures that convert AlphaFold’s pLDDT* confidence metric into an estimated B-factor and remove low-confidence regions. CCP4Cloud, a cloud-based environment for crystallographic computations (27), offers online access to AlphaFold modeling (25). Widely used automatic tools like MRBUMP (28) and MRPARSE (29) can now search for templates and fetch predictions from the AlphaFold Database with minimal user intervention. In some cases, it is better to split a prediction into smaller regions before attempting molecular replacement. Software like Slice’n’Dice in CCP4 (30) and PHENIX’s process_predicted_model (31) can split an AlphaFold prediction into domains based on its PAE plot† or on spatial clustering, while ARCIMBOLDO (32), an ab initio phasing tool, can extract fragments from AlphaFold models (33). The Low Resolution Structure Refinement pipeline (LORESTR) has also been updated to automatically fetch models from the AlphaFold Database and use them for restraints generation (25, 34).
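The import step described above, converting pLDDT into an estimated B-factor and trimming low-confidence regions, can be sketched as follows. This is a minimal illustration and not the CCP4 or PHENIX implementation: the empirical mapping from pLDDT to an rmsd estimate, the pLDDT cutoff of 70, and the function names are all assumptions introduced here.

```python
import math


def plddt_to_bfactor(plddt):
    """Convert a pLDDT score (0-100) into an estimated B-factor in square angstroms.

    Assumed empirical mapping (not taken from the review):
    rmsd ~ 1.5 * exp(4 * (0.7 - pLDDT / 100)), then B = (8 * pi^2 / 3) * rmsd^2.
    """
    rmsd = 1.5 * math.exp(4.0 * (0.7 - plddt / 100.0))
    return (8.0 * math.pi ** 2 / 3.0) * rmsd ** 2


def prepare_search_model(residues, min_plddt=70.0):
    """Drop low-confidence residues and attach an estimated B-factor to the rest.

    `residues` is a list of (residue_number, plddt) pairs. The cutoff of 70
    is an illustrative assumption, not a documented default.
    """
    return [
        (number, plddt_to_bfactor(plddt))
        for number, plddt in residues
        if plddt >= min_plddt
    ]
```

Under this assumed mapping, confident residues receive low B-factors and therefore carry more weight in molecular replacement, while residues below the cutoff are removed from the search model altogether.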
AlphaFold has also had a substantial impact on structure determination by cryo-EM. Since the “Resolution Revolution” (35) it is relatively common to obtain detailed electron density maps with resolution better than 3.5 Å. Still, many reconstructions suffer from data collection problems leading to lower resolution in some regions. Cryo-ET and subtomogram averaging are also now used to visualize large assemblies or parts of whole cells and can yield lower resolution data. Combining cryo-EM with AlphaFold predictions can give the best of both worlds, with the experimental data serving to validate the prediction and reveal domain arrangements, while the prediction provides fine atomic details.
A pioneering example of this integrative approach was work on the nuclear pore complex, which fit AlphaFold models for individual proteins and small subcomplexes into electron density maps with resolution 12 to 23 Å, reconstituting the majority of this enormous ~120 MDa assembly (36, 37). Since then, we have seen numerous other integrative cryo-EM examples, elucidating the structures of the intraflagellar train (38–40); the augmin complex (41, 42); components of the yeast small subunit processome (43); and components of the eukaryotic lipid transport machinery (44). In the case of Retriever (part of a 0.5 MDa endosomal trafficking complex), particles suffered from preferred orientations, leading to one direction being poorly resolved with an overestimated overall resolution of 4.3 Å. However, the close agreement with AlphaFold’s prediction made it easy to fit the model into experimental maps, and the authors went on to reconstruct the whole Commander complex, combining information from experimental structures and predictions (45).
Recognizing the utility of this approach, some of the major model building and fitting programs used in cryo-EM have added support for AlphaFold predictions. COOT (46) can import predictions from the AlphaFold Database, while ChimeraX (47) includes an option to generate new predictions in ColabFold (48). Again the community has built on AlphaFold to produce useful automated workflows. For example, an iterative procedure for model building has been developed that begins by fitting an initial AlphaFold prediction into the experimental density using PHENIX tooling (49). In subsequent iterations, the latest fitted structure is provided to AlphaFold as a template, producing a prediction that more closely matches the density. This iterative procedure improves the resulting structures beyond simple rebuilding against experimental data. Another automated solution uses a deep learning–based quality score (DAQ) to identify low-quality regions and then rebuilds these in a targeted fashion with AlphaFold (50). Interestingly, an analysis using the ML-based validation tool checkMySequence has highlighted at least one example where a deposited cryo-EM structure appears to suffer from a register shift, while a rebuilt model guided by an AlphaFold prediction is in good agreement with the experimental density map (51). Another ML tool conkit-validate specifically uses AlphaFold predictions to derive interresidue contacts and distances for identification of register shifts (52).
A particularly novel use of AlphaFold in cryo-EM has been identifying unknown densities via structural search. In one example, researchers were working to solve the structure of the mycobacterial lipid transporter Mce1 (53). Their density maps revealed a previously unknown subunit of the complex, in sufficient detail to build a polyalanine model of the protein. They were then able to perform a structural search of the model against a large number of predictions in the AlphaFold Database, which returned a hit for MSMEG_3032/LucB. The assignment was subsequently validated by checking that LucB and the rest of the Mce1 system copurify. This method in particular is only possible thanks to the availability of large prediction databases.
Predicting Protein–Protein Interactions.
Although AlphaFold initially was not trained to predict protein–protein complexes, it became apparent that even a monomer version of AlphaFold is capable of predicting them (54). A specially trained AlphaFold-Multimer was released later in 2021, facilitating the discovery and characterization of new protein–protein (including protein–peptide) interactions (55). Computational methods are useful in this context, as they can scale to screening millions of protein pairs. One of the first examples was work by Humphreys et al., which used a combination of RoseTTAFold (56) and AlphaFold to screen 8.3 million protein pairs from Saccharomyces cerevisiae (54). Searching for complexes that might be broadly conserved across eukaryotes, they identified 1,505 novel interactions and proposed predicted structures for 912 assemblies. Other large-scale interaction prediction efforts have explored the human proteome (57) and the proteome of Bacillus subtilis (58), using experimental data and prior knowledge to narrow down the set of protein pairs to process.
More recent work has used AlphaFold-Multimer to better understand specific biological pathways on a mechanistic level (55). For example, Gu et al. had identified the largely uncharacterized protein midnolin as a novel mediator of proteasomal degradation, involved in regulating levels of transcription factors like EGR1, FosB, and c-Fos (59). They used AlphaFold-Multimer to predict the structure of midnolin in complex with several of its substrates, including IRF4. The complex prediction suggested a mechanism of action in which two Catch domains in midnolin come together to capture a β-strand portion of the substrate. This hypothesis was tested for several midnolin substrates, either by introducing targeted mutations into the predicted β-strand region or deleting it altogether, after which the interaction with midnolin no longer occurred.
In a separate example, Lim et al. were studying the protein DONSON, which is necessary for the assembly of CMG helicase in vertebrates (60). However, exactly how DONSON mediated helicase assembly was unclear. The authors used AlphaFold-Multimer to screen 70 core DNA replication factors for possible interactions with DONSON. Based on the most confidently predicted complexes, they were able to build up a structural model of a pre-Loading Complex, in which DONSON interacts with GINS, TOPBP1, and Pol ε. Experimental evidence for the model was subsequently obtained from coimmunoprecipitation and site-directed mutagenesis. Analogously, Sifri et al. investigated a system for DNA double-strand break repair, employing AlphaFold-Multimer to predict all possible pairwise protein combinations within the 53BP1-RIF1-shieldin-CST pathway. Their analysis revealed a novel binding interface between RIF1 and SHLD3 and provided structural information for seven previously characterized interactions; these findings were subsequently confirmed experimentally (61). These examples illustrate how multimeric structure prediction can shed light on protein–protein interactions, both through large-scale screens and more targeted structure modeling.
Today, software is available that aims to simplify and accelerate interaction screening with AlphaFold-Multimer, notably AlphaPulldown (62). Besides large-scale screening, other supported use cases include locating the binding interface between two proteins by screening pairs of sequence fragments, and identifying which subunits of a complex are in direct contact via all-to-all screening. Given a list of pairwise subunit interactions, other tools like MoLPC can attempt to build out a model of the full complex (63). MoLPC uses Monte Carlo tree search to explore possible orders in which to assemble the chains, stopping when there are too many clashes and scoring each output to identify the most promising assembly. The latest update of AlphaFold-Multimer also supports higher residue and chain limits, meaning that complexes of up to 20 chains may now be predicted directly.
Use in Protein Design.
While AlphaFold is not intended as a protein design model, it has been used by the community as a component in design pipelines. For example, Wicky et al. used AlphaFold to generate novel “hallucinated” complexes by beginning with a random sequence and number of copies and then performing Monte Carlo search using AlphaFold confidence plus a cyclic symmetry metric as the loss function (64). This generated topologically diverse 1- to 7-mers that were subsequently used as targets for design with ProteinMPNN (65). A more recent investigation has shown that both AlphaFold and RoseTTAFold are useful for filtering protein designs, with the inclusion of a structure prediction step increasing success rates significantly over a purely energy-based pipeline (66). After exploring several filtering approaches, the authors used AlphaFold average interchain PAE < 10 as a selection criterion in their prospective analysis, finding that this yielded a higher success rate for binder designs by between eightfold and 30-fold.
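The selection criterion quoted above (average interchain PAE < 10) can be illustrated with a short sketch. It assumes the PAE matrix is available as a square list of lists ordered chain by chain and that "average" means the plain mean over all residue pairs from different chains; the function names are hypothetical, and the cited study may compute the average differently.

```python
def mean_interchain_pae(pae, chain_lengths):
    """Average PAE over all residue pairs that belong to different chains.

    `pae` is a square list of lists (in angstroms) ordered chain by chain;
    `chain_lengths` gives the number of residues in each chain.
    """
    # Label every residue index with the chain it belongs to.
    chain_of = []
    for chain_index, length in enumerate(chain_lengths):
        chain_of.extend([chain_index] * length)
    if len(chain_of) != len(pae):
        raise ValueError("chain lengths do not match the PAE matrix size")

    total, count = 0.0, 0
    for i, row in enumerate(pae):
        for j, value in enumerate(row):
            if chain_of[i] != chain_of[j]:
                total += value
                count += 1
    if count == 0:
        raise ValueError("need at least two chains to compute interchain PAE")
    return total / count


def passes_filter(pae, chain_lengths, cutoff=10.0):
    """Apply the interchain PAE cutoff of 10 quoted in the text."""
    return mean_interchain_pae(pae, chain_lengths) < cutoff
```

Because PAE is not symmetric, the sketch averages both off-diagonal blocks rather than only one of them.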
Another way AlphaFold has been used in design is simply to generate a starting prediction for a protein with no experimental structure, which can then be used to guide design efforts and suggest which domains or residues to edit. For example, in recent work by Kreitz et al. the authors aimed to re-engineer a contractile injection system from the bacterium Photorhabdus asymbiotica so that it would target human cells. As no experimental structure was available, they modeled the trimeric distal tip protein with AlphaFold, revealing a globular domain that appeared to be responsible for target recognition. By replacing this domain with alternative binding proteins, they were able to alter the injection system to target human and mouse cells, demonstrating its potential as a delivery system for therapeutics (67).
Enabling New Computational Work.
A key advantage of computational methods is their ability to scale. Experimental structural biology may take months to solve one protein structure and so cannot keep pace with the rapid accumulation of known protein sequences. Structure prediction tools can keep up with modern sequencing, and enable the construction of extremely large prediction databases. Examples include the AlphaFold Database (12, 13), which now covers over 200M UniProt sequences (68), and ESM Metagenomic Atlas (69), which covers 600M metagenomic sequences. The community has further enriched these prediction databases with information from other sources. For example, AlphaFill adds ligands to predictions by “transplanting” them in from similar PDB structures (70), while TmAlphaFold and AFTM use software to add predicted membrane planes (71, 72).
The availability of large structure databases has spurred on the development of efficient algorithms like FoldSeek (73) and FoldSeek cluster (74), which can group structurally similar entries or identify proteins similar to a query structure. We have already mentioned how structural search can be used to identify unknown densities in cryo-EM maps. It can also be applied in other situations where researchers would previously have relied on sequence homology, e.g., for functional annotation (75), or for identifying parasite proteins that are molecular mimics of a host protein (76).
Large structure prediction databases also contribute to the effort to catalog protein folds. CATH, a hierarchy-based structural classification of protein domains, now incorporates AlphaFold predictions from 21 model organisms. Analysis of ~370,000 predicted domains, filtered based on confidence and geometrical quality, assigned 92% of them to existing CATH superfamilies. Nevertheless, AlphaFold predictions contained considerable structural novelty: 25 new superfamilies and a 36% increase in the number of unique “global” folds (77). More recent analyses of the full AlphaFold Structure Database have looked at the properties of its structural clusters (74) and its potential for use in function annotation (78).
Other computational work builds on AlphaFold by leveraging its confidence metrics rather than its ability to scale. Two particularly interesting examples explore AlphaFold’s ability to rank its own predictions. First, a method called AFSample has been developed which generates and ranks ~5,000 AlphaFold-Multimer predictions for any given input (79). Diversity is boosted by enabling dropout and by varying a range of settings (e.g., whether templates are used and the number of recycling iterations). AFSample demonstrates that ranking large numbers of diverse predictions using a pTM-based score‡ often succeeds in picking out higher accuracy models, and the method ranked top in the protein assembly category of CASP15, with a +0.13 higher DockQ score than default AlphaFold-Multimer.
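Ranking a pool of predictions by a pTM-based score, as described above, might look like the following sketch. The weighting of 0.8 on ipTM and 0.2 on pTM is an assumption introduced here for illustration (the review only says the score is pTM-based), and the dictionary layout and function names are hypothetical rather than AFSample's own interface.

```python
def ranking_score(ptm, iptm, iptm_weight=0.8):
    """Combine pTM and ipTM into one score; the 0.8/0.2 weighting is assumed."""
    return iptm_weight * iptm + (1.0 - iptm_weight) * ptm


def rank_predictions(predictions):
    """Sort predictions from highest to lowest combined score.

    `predictions` is a list of dicts with the keys "name", "ptm" and "iptm".
    """
    return sorted(
        predictions,
        key=lambda p: ranking_score(p["ptm"], p["iptm"]),
        reverse=True,
    )
```

With thousands of sampled models, the highest-scoring entry of the sorted list would be taken as the final prediction.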
Meanwhile, Roney and Ovchinnikov have investigated whether AlphaFold can rank predictions made in the absence of any evolutionary information from a multiple sequence alignment (80). They generated multiple predictions for a given target structure, each using a different “decoy” structure as the input template. A “composite confidence score” based on several AlphaFold outputs was able to rank the resulting predictions with a mean Spearman’s coefficient of 0.925, identifying the decoys closest to the target. Composite confidence outperformed ROSETTA’s energy function (81) and DeepAccNet (82) at this task. The authors proposed based on the results that AlphaFold has “learned an energy function that can assess sequence–structure agreement, but needs coevolution data or templates to help search for optimal structures.”
This observation resonates with another finding: AlphaFold's low confidence scores (low pLDDT and high predicted aligned error) correlate strongly with intrinsic protein disorder, making it a state-of-the-art predictor of protein disordered regions. This was initially observed during large-scale protein structure modeling for the AlphaFold Database (13) and later confirmed by independent researchers (17, 83, 84).
Although AlphaFold was originally designed to predict only one conformation for a particular sequence of amino acids, it seems that it is often possible to induce AlphaFold to generate alternative structural states of a protein. One way of doing this is subsampling or clustering the MSA based on sequence similarity (85, 86). Another strategy is combining a shallow MSA with a template corresponding to the specific state (87, 88). If the MSA signal is strong, AlphaFold tends to ignore structural information from the template; artificially weakening the MSA signal by reducing the number of sequences in the MSA increases the contribution from the template and pushes the AlphaFold prediction toward the conformation specified by the template. This can be easily done by specifying the corresponding parameters of ColabFold, a popular, community-driven front-end to AlphaFold (48).
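The idea of weakening the MSA signal by reducing the number of sequences can be sketched as a simple random subsampling step. This is an illustration under stated assumptions and not the ColabFold implementation: the MSA is assumed to be a list of aligned sequences with the query first, sampling is uniform at random, and the function name and `seed` parameter are hypothetical.

```python
import random


def subsample_msa(sequences, max_sequences, seed=0):
    """Return a shallower MSA: the query plus a random subset of the other rows.

    `sequences` is a list of aligned sequences whose first entry is the query.
    """
    if max_sequences < 1:
        raise ValueError("max_sequences must be at least 1")
    query, others = sequences[0], sequences[1:]
    rng = random.Random(seed)
    keep = min(max_sequences - 1, len(others))
    # Sample row indices, then sort so the original row order is preserved.
    chosen = sorted(rng.sample(range(len(others)), keep))
    return [query] + [others[i] for i in chosen]
```

Running the prediction several times with different seeds and small values of `max_sequences` would give the template, or a different subset of the coevolution signal, more influence on each run.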
Experimental Validation of AlphaFold Models
CASP14 provided the initial evidence for AlphaFold’s accuracy. However, the evaluation of any computational method is necessarily an ongoing process, involving continued comparison of predictions against new experimental results. This section looks at recent investigations that compare AlphaFold predictions against experimental results on a larger scale, to evaluate different aspects of the method.
Single Chains.
Molecular replacement, a technique to solve the phase problem in macromolecular crystallography, requires a search model that closely resembles the actual contents of the target crystal. Typically this technique works well if the rmsd between model and target is <1.5 Å (89), and it may work at <2 Å rmsd on 50% of atoms (90). Therefore, if AlphaFold predictions can be used for molecular replacement, it indicates that they closely resemble the crystal structures. A comprehensive investigation into the use of AlphaFold models for molecular replacement has now been conducted by Terwilliger et al. (23). They took 215 recent PDB structures solved by experimental phasing (indicating that molecular replacement attempts by the original depositors likely failed). They then tried to solve the structures using an AlphaFold search model followed by a fully automated iterative refinement process. Molecular replacement succeeded for 208/215 cases, and further automated refinement went on to yield a high-quality model for 87% of the structures (at least 50% of C-alpha atoms matching the deposited model to within 2 Å).
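The quality criterion quoted above (at least 50% of C-alpha atoms within 2 Å of the deposited model) can be expressed as a short check. The sketch assumes that both models are already placed in the same coordinate frame and that residues are matched by a shared identifier; handling of crystallographic symmetry and origin shifts is omitted, and the function names are hypothetical.

```python
import math


def fraction_ca_within(model_ca, reference_ca, cutoff=2.0):
    """Fraction of reference C-alpha atoms that the model places within `cutoff`.

    Both arguments map a residue identifier to an (x, y, z) tuple in angstroms
    and are assumed to share one coordinate frame. Reference residues that are
    missing from the model count as unmatched.
    """
    if not reference_ca:
        raise ValueError("reference has no C-alpha atoms")
    matched = 0
    for residue_id, ref_xyz in reference_ca.items():
        model_xyz = model_ca.get(residue_id)
        if model_xyz is not None and math.dist(model_xyz, ref_xyz) <= cutoff:
            matched += 1
    return matched / len(reference_ca)


def is_high_quality(model_ca, reference_ca):
    """Apply the 50% within 2 angstroms criterion quoted in the text."""
    return fraction_ca_within(model_ca, reference_ca, cutoff=2.0) >= 0.5
```

Counting missing residues as unmatched means that an accurate but very incomplete model does not pass the check.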
A second investigation used the same benchmark set of 215 structures to study how closely AlphaFold predictions matched: 1) density maps from the automated refinement process and 2) deposited PDB models (91). The mean map-model correlation was 0.56 for AlphaFold predictions vs. 0.86 for the deposited models. AlphaFold predictions had a median Cα rmsd to the corresponding PDB structure of 1.0 Å. To put this in context, the median rmsd for another PDB structure of the same protein crystallized in a different space group would be 0.6 Å. Confidence scores were predictive of the level of agreement with deposited models, highlighting the importance of referring to these when interpreting predictions. Regions with low confidence (pLDDT < 70) had a median rmsd of 3.5 Å, while for high-confidence regions (pLDDT > 90) the median rmsd was only 0.6 Å. An analysis of side chains indicated that 20% of AlphaFold-predicted side chains are substantially different from the map data and 7% are incompatible with the data; corresponding values for PDB structures in a different space group were 6% and 2%. The authors concluded that AlphaFold predictions are “valuable hypotheses” that “accelerate but do not replace experimental structure determination.”
Complexes.
Evidence for predicted complexes can be obtained at scale using nonstructural experimental techniques like cross-linking mass-spectrometry (XL-MS). In this method, chemical cross-linkers are used to covalently fix amino acids that are spatially close. After a protease digestion step, the short cross-linked peptides can be identified by mass spectrometry, providing structural constraints on a protein or protein–protein interface. In one study, in-cell XL-MS was used to search for protein–protein interactions in B. subtilis. Based on AlphaFold-Multimer modeling, novel high-confidence structures were proposed for 153 dimeric and 14 trimeric protein assemblies. The authors then looked at the cross-link violation rate of AlphaFold-Multimer predictions as a function of the ipTM confidence metric. They found that models with ipTM > 0.85 show especially low rates of cross-link violation and tend to agree with experimental interresidue distances identified in situ (58). Models in the lower confidence 0.55 to 0.85 ipTM range showed a wide range of violation rates and would particularly benefit from checking against independent experimental data.
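A cross-link violation rate of the kind discussed above can be computed from C-alpha coordinates as in the following sketch. The 30 Å distance limit is an illustrative assumption (the appropriate value depends on the cross-linker used and is not given in this review), cross-links involving unmodeled residues are skipped, and the function name and data layout are hypothetical.

```python
import math


def crosslink_violation_rate(ca_coords, crosslinks, max_distance=30.0):
    """Fraction of evaluable cross-links whose C-alpha separation is too long.

    `ca_coords` maps (chain, residue_number) to an (x, y, z) tuple in angstroms.
    `crosslinks` is a list of pairs of such keys. Cross-links with a residue
    absent from the model are ignored. `max_distance` is an assumed,
    linker-dependent limit.
    """
    evaluated, violated = 0, 0
    for site_a, site_b in crosslinks:
        if site_a not in ca_coords or site_b not in ca_coords:
            continue
        evaluated += 1
        if math.dist(ca_coords[site_a], ca_coords[site_b]) > max_distance:
            violated += 1
    if evaluated == 0:
        raise ValueError("no cross-link could be evaluated against the model")
    return violated / evaluated
```

Computing this rate separately for models in different ipTM ranges would reproduce the kind of comparison described in the text.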
Biologically Relevant States.
The majority of data used to train AlphaFold come from crystal structures, and there is a long-standing discussion in the field about how representative these are of proteins in solution and in cells (92).
Cross-linking mass spectrometry is one method that can deliver information about protein states in situ. A study by Bartolec et al. generated a high-coverage cross-link dataset for HEK293 cells and then looked at whether these cross-links were satisfied in PDB structures as well as in AlphaFold predictions (93). Of note, 92% of intrachain cross-links were satisfied in high-confidence AlphaFold models for proteins without an experimental structure. This compares favorably with the corresponding cross-link satisfaction rate for PDB structures (89 to 99%). Another study looked at the 100 best-sampled proteins from intact Tetrahymena thermophila cilia, cross-linked in situ (94). AlphaFold models satisfied 86.2% of the experimental cross-links, with 43% of proteins showing no cross-linking violations at all. Observed violations tended to occur in low-confidence regions or between domains. Based on this, the authors concluded that AlphaFold “predicts biologically relevant protein conformations.”
Protein structures solved by NMR are another interesting point of comparison for AlphaFold models. In contrast to crystallography, NMR provides information about solution-state protein structures, in the form of spectra and derived coupling constants, e.g., chemical shifts. Fowler and Williamson have reported that AlphaFold outputs are sometimes a better fit to the underlying NMR data than deposited ensembles, based on quality metrics like ANSURR (95). Their investigation looked at 904 human proteins and found that the AlphaFold model had a significantly higher quality score in 30% of cases, while the NMR ensemble was preferred in 2% of cases. Specifically they suggest that AlphaFold can produce more correct hydrogen bonds that persist in solution. A separate focused study on nine small monomeric proteins, absent from the AlphaFold training set, concluded that AlphaFold predictions fit the NMR data almost as well as, or in some cases even better than, experimental structures (96). An investigation of short peptide NMR structures produced more mixed results, with AlphaFold outperforming the other computational methods tested, but showing weaknesses on highly solvated peptides and helix-turn-helix structures (97).
Conclusions and Future Directions
The arrival of AlphaFold has been a transformative event in the field of structural biology. We have reviewed some of the many ways the method has been applied in the field, from accelerating experimental structure determination and protein design to discovering and understanding protein–protein interactions. AlphaFold has also enabled breakthroughs in closely related areas like genetics and molecular biology, for instance allowing development of AlphaMissense, which classifies the pathogenicity of human genetic variants (98). The ability of AlphaFold to scale, along with its self-reported confidence metrics, has enabled a range of new computational work. We also touched on the ongoing process of evaluating AlphaFold predictions by comparing them against newly deposited structures and experimental data from other methods. A common theme is the importance of interpreting predictions in the light of their confidence metrics, with higher confidence models more likely to prove accurate. In lower confidence bands, a prediction can still provide a useful starting hypothesis, but it is even more important to seek independent experimental data to validate conclusions.
Current AlphaFold limitations can be viewed as an action plan for future development: modeling protein–DNA and protein–RNA complexes, predicting all functional states of a protein, elucidating the effects of point mutations, predicting the binding of ligands/ions, and modeling posttranslational modifications; first steps in some of these directions have already been reported (99, 100). Other challenges include predicting larger complexes than ever before, improving predictions for antigen–antibody interactions and orphan proteins, and improving domain positioning for membrane proteins. The inclusion of new categories in CASP15 in 2022 is a sign that the wider field of structure prediction is now moving beyond single protein chains. It also reflects the growing interest in structure prediction and its broader application. We are excited to see what new research future models might enable.
We would like to thank the multitude of people who have contributed to the development and adoption of AlphaFold. This includes not only the team at Google DeepMind but the many scientists who contributed to the training data, have used AlphaFold in their research, integrated AlphaFold into their tools, expanded its functionality, and provided constructive feedback. The method’s widespread adoption and successful use is thanks in large part to the work of this wider community.
Recent progress in structure prediction is a testament to the power of AI; we believe that this is only the beginning. The future of AI for science is full of promise, and we are only starting to scratch the surface of what is possible.
Acknowledgments
Author contributions
O.K., J.M.-G., and K.T. analyzed data and wrote the paper.
Competing interests
All authors at the time of writing are employees of Google DeepMind, which has filed non-provisional patent applications: 16/701,070; PCT/EP2020/084238; PCT/EP2021/072552; EP21766396.2; US18/025,689; CN202180067160.6; PCT/EP2021/082684; EP21816053.9; US18/026,376; CN202180068415.0; PCT/EP2021/082696; EP21816436.6; 18/034,989; PCT/EP2021/082698; EP21816056.2; 18/034,006; CN202180069629.X; PCT/EP2021/082707; EP21816437.4; 18/034,280; and CN202180069819.1. Each of the foregoing are filed in the name of DeepMind Technologies Limited, each pending, and each relating to machine learning for predicting protein structures.
Footnotes
This article is a PNAS Direct Submission. U.S. is a guest editor invited by the Editorial Board.
*pLDDT (predicted Local Distance Difference Test) is a per-residue score between 0 and 100 that reflects AlphaFold’s confidence in the local structure of a domain.
†PAE (Predicted Aligned Error) is an AlphaFold confidence metric reported for each pair of residues, and reflects confidence in their relative positions. A low PAE implies high confidence.
‡pTM stands for predicted TM-score, and is a measure of AlphaFold’s expected global accuracy on a protein or complex. An interface-only version of pTM can be computed for complexes, called ipTM.
Data, Materials, and Software Availability
There are no data underlying this work.
References
- 1. Kendrew J. C., et al., A three-dimensional model of the myoglobin molecule obtained by X-ray analysis. Nature 181, 662–666 (1958).
- 2. Anfinsen C. B., Principles that govern the folding of protein chains. Science 181, 223–230 (1973).
- 3. Browne W. J., et al., A possible three-dimensional structure of bovine alpha-lactalbumin based on that of hen’s egg-white lysozyme. J. Mol. Biol. 42, 65–86 (1969).
- 4. SpringerNature.com, Crystallography: Protein Data Bank. Nat. New Biol. 233, 223 (1971).
- 5. Moult J., Pedersen J. T., Judson R., Fidelis K., A large-scale experiment to assess protein structure prediction methods. Proteins 23, ii–v (1995).
- 6. Simons K. T., Kooperberg C., Huang E., Baker D., Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J. Mol. Biol. 268, 209–225 (1997).
- 7. Marks D. S., et al., Protein 3D structure computed from evolutionary sequence variation. PLoS One 6, e28766 (2011).
- 8. Jones D. T., Buchan D. W. A., Cozzetto D., Pontil M., PSICOV: Precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics 28, 184–190 (2012).
- 9. Shindyalov I. N., Kolchanov N. A., Sander C., Can three-dimensional contacts in protein structures be predicted by analysis of correlated mutations? Protein Eng. 7, 349–358 (1994).
- 10. Jumper J., et al., Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
- 11. Kryshtafovych A., Schwede T., Topf M., Fidelis K., Moult J., Critical assessment of methods of protein structure prediction (CASP)-Round XIV. Proteins 89, 1607–1617 (2021).
- 12. Varadi M., et al., AlphaFold protein structure database: Massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 50, D439–D444 (2022).
- 13. Tunyasuvunakool K., et al., Highly accurate protein structure prediction for the human proteome. Nature 596, 590–596 (2021).
- 14. Varadi M., et al., AlphaFold protein structure database in 2024: Providing structure coverage for over 214 million protein sequences. Nucleic Acids Res. 52, D368–D375 (2023), 10.1093/nar/gkad1011.
- 15. McCoy A. J., Sammito M. D., Read R. J., Implications of AlphaFold2 for crystallographic phasing by molecular replacement. Acta Crystallogr. D Struct. Biol. 78, 1–13 (2022).
- 16. Millán C., et al., Assessing the utility of CASP14 models for molecular replacement. Proteins 89, 1752–1769 (2021).
- 17. Akdel M., et al., A structural biology community assessment of AlphaFold2 applications. Nat. Struct. Mol. Biol. 29, 1056–1067 (2022).
- 18. Danelius E., Porter N. J., Unge J., Arnold F. H., Gonen T., MicroED Structure of a protoglobin reactive carbene intermediate. J. Am. Chem. Soc. 145, 7159–7165 (2023).
- 19. Barbarin-Bocahu I., Graille M., The X-ray crystallography phase problem solved thanks to AlphaFold and RoseTTAFold models: A case-study report. Acta Crystallogr. D Struct. Biol. 78, 517–531 (2022).
- 20. Obita T., Inaka K., Kohda D., Maita N., Crystal structure of the PX domain of Vps17p from Saccharomyces cerevisiae. Acta Crystallogr. Sect. F Struct. Biol. Cryst. Commun. 78, 210–216 (2022).
- 21. Hu L., et al., Novel fold of rotavirus glycan-binding domain predicted by AlphaFold2 and determined by X-ray crystallography. Commun. Biol. 5, 1–8 (2022).
- 22. Cao L., et al., Design of protein-binding proteins from the target structure alone. Nature 605, 551–560 (2022).
- 23. Terwilliger T. C., et al., Accelerating crystal structure determination with iterative AlphaFold prediction. Acta Crystallogr. D Struct. Biol. 79, 234–244 (2023).
- 24. Agirre J., et al., The CCP4 suite: Integrative software for macromolecular crystallography. Acta Crystallogr. D Struct. Biol. 79, 449–461 (2023).
- 25. Simpkin A. J., et al., Predicted models and CCP4. Acta Crystallogr. D Struct. Biol. 79, 806–819 (2023).
- 26. Liebschner D., et al., Macromolecular structure determination using X-rays, neutrons and electrons: Recent developments in Phenix. Acta Crystallogr. D Struct. Biol. 75, 861–877 (2019).
- 27. Krissinel E., et al., CCP4 Cloud for structure determination and project management in macromolecular crystallography. Acta Crystallogr. D Struct. Biol. 78, 1079–1089 (2022).
- 28. Keegan R. M., Winn M. D., MrBUMP: An automated pipeline for molecular replacement. Acta Crystallogr. D Biol. Crystallogr. 64, 119–124 (2008).
- 29. Simpkin A. J., Thomas J. M. H., Keegan R. M., Rigden D. J., MrParse: Finding homologues in the PDB and the EBI AlphaFold database for molecular replacement and more. Acta Crystallogr. D Struct. Biol. 78, 553–559 (2022).
- 30. Simpkin A. J., et al., Slice’N’Dice: Maximising the value of predicted models for structural biologists. bioRxiv [Preprint] (2022). 10.1101/2022.06.30.497974 (Accessed 18 October 2023).
- 31. Oeffner R. D., et al., Putting AlphaFold models to work with phenix.process_predicted_model and ISOLDE. Acta Crystallogr. D Struct. Biol. 78, 1303–1314 (2022).
- 32. Millán C., Sammito M., Usón I., Macromolecular ab initio phasing enforcing secondary and tertiary structure. IUCrJ 2, 95–105 (2015).
- 33. Medina A., et al., Verification: Model-free phasing with enhanced predicted models in ARCIMBOLDO_SHREDDER. Acta Crystallogr. D Struct. Biol. 78, 1283–1293 (2022).
- 34. Kovalevskiy O., Nicholls R. A., Murshudov G. N., Automated refinement of macromolecular structures at low resolution using prior information. Acta Crystallogr. D Struct. Biol. 72, 1149–1161 (2016).
- 35. Kühlbrandt W., Biochemistry. The resolution revolution. Science 343, 1443–1444 (2014).
- 36. Mosalaganti S., et al., AI-based structure prediction empowers integrative structural analysis of human nuclear pores. Science 376, eabm9506 (2022).
- 37. Fontana P., et al., Structure of cytoplasmic ring of nuclear pore complex by integrative cryo-EM and AlphaFold. Science 376, eabm9326 (2022).
- 38. McCafferty C. L., et al., Integrative modeling reveals the molecular architecture of the intraflagellar transport A (IFT-A) complex. Elife 11, e81977 (2022), 10.7554/eLife.81977 (March 24, 2023).
- 39. Hesketh S. J., Mukhopadhyay A. G., Nakamura D., Toropova K., Roberts A. J., IFT-A structure reveals carriages for membrane protein transport into cilia. Cell 185, 4971–4985.e16 (2022).
- 40. Ma Y., et al., Structural insight into the intraflagellar transport complex IFT-A and its assembly in the anterograde IFT train. Nat. Commun. 14, 1–12 (2023).
- 41. Gabel C. A., et al., Molecular architecture of the augmin complex. Nat. Commun. 13, 1–13 (2022).
- 42. Zupa E., et al., The augmin complex architecture reveals structural insights into microtubule branching. Nat. Commun. 13, 1–14 (2022).
- 43. Zhao Y., Rai J., Xu C., He H., Li H., Artificial intelligence-assisted cryoEM structure of Bfr2-Lcp5 complex observed in the yeast small subunit processome. Commun. Biol. 5, 1–10 (2022).
- 44. Cai S., et al., In situ architecture of the lipid transport protein VPS13C at ER-lysosome membrane contacts. Proc. Natl. Acad. Sci. U.S.A. 119, e2203769119 (2022).
- 45. Healy M. D., et al., Structure of the endosomal Commander complex linked to Ritscher-Schinzel syndrome. Cell 186, 2219–2237.e29 (2023).
- 46. Emsley P., Lohkamp B., Scott W. G., Cowtan K., Features and development of Coot. Acta Crystallogr. D Biol. Crystallogr. 66, 486–501 (2010).
- 47. Pettersen E. F., et al., UCSF ChimeraX: Structure visualization for researchers, educators, and developers. Protein Sci. 30, 70–82 (2021).
- 48. Mirdita M., et al., ColabFold: Making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).
- 49. Terwilliger T. C., et al., Improved AlphaFold modeling with implicit experimental information. Nat. Methods 19, 1376–1382 (2022).
- 50. Terashi G., Wang X., Kihara D., Protein model refinement for cryo-EM maps using AlphaFold2 and the DAQ score. Acta Crystallogr. D Struct. Biol. 79, 10–21 (2023).
- 51. Chojnowski G., Sequence-assignment validation in cryo-EM models with checkMySequence. Acta Crystallogr. D Struct. Biol. 78, 806–816 (2022).
- 52. Sánchez Rodríguez F., Chojnowski G., Keegan R. M., Rigden D. J., Using deep-learning predictions of inter-residue distances for model validation. Acta Crystallogr. D Struct. Biol. 78, 1412–1427 (2022).
- 53. Chen J., et al., Structure of an endogenous mycobacterial MCE lipid transporter. Nature 620, 445–452 (2023).
- 54. Humphreys I. R., et al., Computed structures of core eukaryotic protein complexes. Science 374, eabm4805 (2021).
- 55. Evans R., et al., Protein complex prediction with AlphaFold-Multimer. bioRxiv [Preprint] (2021). 10.1101/2021.10.04.463034 (Accessed 18 October 2023).
- 56. Baek M., et al., Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).
- 57. Burke D. F., et al., Towards a structurally resolved human protein interaction network. Nat. Struct. Mol. Biol. 30, 216–225 (2023).
- 58. O’Reilly F. J., et al., Protein complexes in cells by AI-assisted structural proteomics. Mol. Syst. Biol. 19, e11544 (2023).
- 59. Gu X., et al., The midnolin-proteasome pathway catches proteins for ubiquitination-independent degradation. Science 381, eadh5021 (2023).
- 60. Lim Y., et al., In silico protein interaction screening uncovers DONSON’s role in replication initiation. Science 381, eadi3448 (2023).
- 61. Sifri C., Hoeg L., Durocher D., Setiaputra D., An AlphaFold2 map of the 53BP1 pathway identifies a direct SHLD3-RIF1 interaction critical for shieldin activity. EMBO Rep. 24, e56834 (2023).
- 62. Yu D., Chojnowski G., Rosenthal M., Kosinski J., AlphaPulldown—a python package for protein–protein interaction screens using AlphaFold-Multimer. Bioinformatics 39, btac749 (2022).
- 63. Bryant P., et al., Predicting the structure of large protein complexes using AlphaFold and Monte Carlo tree search. Nat. Commun. 13, 1–14 (2022).
- 64. Wicky B. I. M., et al., Hallucinating symmetric protein assemblies. Science 378, 56–61 (2022).
- 65. Dauparas J., et al., Robust deep learning-based protein sequence design using ProteinMPNN. Science 378, 49–56 (2022).
- 66. Bennett N. R., et al., Improving de novo protein binder design with deep learning. Nat. Commun. 14, 2625 (2023).
- 67. Kreitz J., et al., Programmable protein delivery with a bacterial contractile injection system. Nature 616, 357–364 (2023).
- 68. UniProt Consortium, UniProt: The universal protein knowledgebase in 2021. Nucleic Acids Res. 49, D480–D489 (2021).
- 69. Lin Z., et al., Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
- 70. Hekkelman M. L., de Vries I., Joosten R. P., Perrakis A., AlphaFill: Enriching AlphaFold models with ligands and cofactors. Nat. Methods 20, 205–213 (2022).
- 71. Dobson L., et al., TmAlphaFold database: Membrane localization and evaluation of AlphaFold2 predicted alpha-helical transmembrane protein structures. Nucleic Acids Res. 51, D517–D522 (2022).
- 72. Pei J., Cong Q., AFTM: A database of transmembrane regions in the human proteome predicted by AlphaFold. Database 2023, baad008 (2023).
- 73. van Kempen M., et al., Fast and accurate protein structure search with Foldseek. Nat. Biotechnol. 42, 243–246 (2023), 10.1038/s41587-023-01773-0 (September 21, 2023).
- 74. Barrio-Hernandez I., et al., Clustering-predicted structures at the scale of the known protein universe. Nature 622, 637–645 (2023), 10.1038/s41586-023-06510-w.
- 75. Ruperti F., et al., Cross-phyla protein annotation by structural prediction and alignment. Genome Biol. 24, 113 (2023).
- 76. Muthye V., Wasmuth J. D., Proteome-wide comparison of tertiary protein structures reveals molecular mimicry in Plasmodium-human interactions. Front. Parasitol. 2, 1162697 (2023).
- 77. Bordin N., et al., AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms. Commun. Biol. 6, 1–12 (2023).
- 78. Durairaj J., et al., Uncovering new families and folds in the natural protein universe. Nature 622, 646–653 (2023), 10.1038/s41586-023-06622-3.
- 79. Wallner B., Improved multimer prediction using massive sampling with AlphaFold in CASP15. Proteins 91, 1734–1746 (2023), 10.1002/prot.26562.
- 80. Roney J. P., Ovchinnikov S., State-of-the-art estimation of protein model accuracy using AlphaFold. Phys. Rev. Lett. 129, 238101 (2022).
- 81. Alford R. F., et al., The Rosetta all-atom energy function for macromolecular modeling and design. J. Chem. Theory Comput. 13, 3031–3048 (2017).
- 82. Hiranuma N., et al., Improved protein structure refinement guided by deep learning based accuracy estimation. Nat. Commun. 12, 1340 (2021).
- 83. Alderson T. R., Pritišanac I., Kolarić Đ., Moses A. M., Forman-Kay J. D., Systematic identification of conditionally folded intrinsically disordered regions by AlphaFold2. Proc. Natl. Acad. Sci. U.S.A. 120, e2304302120 (2023).
- 84. Piovesan D., Monzon A. M., Tosatto S. C. E., Intrinsic protein disorder and conditional folding in AlphaFoldDB. Protein Sci. 31, e4466 (2022).
- 85. Wayment-Steele H. K., et al., Predicting multiple conformations via sequence clustering and AlphaFold2. Nature 625, 832–839 (2023), 10.1038/s41586-023-06832-9.
- 86. Del Alamo D., Sala D., Mchaourab H. S., Meiler J., Sampling alternative conformational states of transporters and receptors with AlphaFold2. Elife 11, e75751 (2022).
- 87. Sala D., Engelberger F., Mchaourab H. S., Meiler J., Modeling conformational states of proteins with AlphaFold. Curr. Opin. Struct. Biol. 81, 102645 (2023).
- 88. Casadevall G., Duran C., Osuna S., AlphaFold2 and deep learning for elucidating enzyme conformational flexibility and its application for design. JACS Au 3, 1554–1562 (2023).
- 89. Scapin G., Molecular replacement then and now. Acta Crystallogr. D Biol. Crystallogr. 69, 2266–2275 (2013).
- 90. Abergel C., Molecular replacement: Tricks and treats. Acta Crystallogr. D Biol. Crystallogr. 69, 2167–2173 (2013).
- 91. Terwilliger T. C., et al., AlphaFold predictions are valuable hypotheses, and accelerate but do not replace experimental structure determination. bioRxiv [Preprint] (2023). 10.1101/2022.11.21.517405 (Accessed 18 October 2023).
- 92. Sikic K., Tomic S., Carugo O., Systematic comparison of crystal and NMR protein structures deposited in the protein data bank. Open Biochem. J. 4, 83–95 (2010).
- 93. Bartolec T. K., et al., Cross-linking mass spectrometry discovers, evaluates, and corroborates structures and protein-protein interactions in the human cell. Proc. Natl. Acad. Sci. U.S.A. 120, e2219418120 (2023).
- 94. McCafferty C. L., Pennington E. L., Papoulas O., Taylor D. W., Marcotte E. M., Does AlphaFold2 model proteins’ intracellular conformations? An experimental test using cross-linking mass spectrometry of endogenous ciliary proteins. Commun. Biol. 6, 421 (2023).
- 95. Fowler N. J., Williamson M. P., The accuracy of protein structures in solution determined by AlphaFold and NMR. Structure 30, 925–933.e2 (2022).
- 96. Li E. H., et al., Blind assessment of monomeric AlphaFold2 protein structure models with experimental NMR data. bioRxiv [Preprint] (2023). 10.1101/2023.01.22.525096 (Accessed 18 October 2023).
- 97. McDonald E. F., Jones T., Plate L., Meiler J., Gulsevin A., Benchmarking AlphaFold2 on peptide structure prediction. Structure 31, 111–119.e2 (2023).
- 98. Cheng J., et al., Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 381, eadg7492 (2023).
- 99. Google DeepMind AlphaFold Team and Isomorphic Labs Team, “Performance and structural coverage of the latest, in-development AlphaFold model”. Google DeepMind and Isomorphic Labs. https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/a-glimpse-of-the-next-generation-of-alphafold/alphafold_latest_oct2023.pdf. Accessed 18 October 2023.
- 100. Krishna R., et al., Generalized biomolecular modeling and design with RoseTTAFold all-atom. bioRxiv [Preprint] (2023). 10.1101/2023.10.09.561603 (Accessed 18 October 2023).