Author manuscript; available in PMC: 2025 Mar 31.
Published in final edited form as: Nat Chem Biol. 2024 Jun 21;20(8):950–959. doi: 10.1038/s41589-024-01638-w

The power and pitfalls of AlphaFold2 for structure prediction beyond rigid globular proteins

Vinayak Agarwal 1,2,*, Andrew C McShan 1,*
PMCID: PMC11956457  NIHMSID: NIHMS2066019  PMID: 38907110

Abstract

Artificial intelligence-driven advances in protein structure prediction in recent years have raised the question: has the protein structure prediction problem been solved? Here, with a focus on non-globular proteins, we highlight the many strengths and potential weaknesses of DeepMind’s AlphaFold2 (AF2) in the context of its biological and therapeutic applications. We summarize the subtleties associated with evaluation of AF2 model quality and reliability using predicted local distance difference test (pLDDT) and predicted aligned error (PAE) values. We highlight various classes of proteins that AF2 can be applied to, and the caveats involved. Concrete examples of how AF2 models can be integrated with experimental data in the form of SAXS, solution NMR, cryo-EM, and X-ray diffraction are discussed. Finally, we highlight the need to move beyond structure prediction of rigid, static structural snapshots towards conformational ensembles and alternate biologically relevant states. The overarching theme is that AF2-generated models should be used with careful consideration to generate testable hypotheses and structural models, rather than being treated as de facto ground truth structures.

Main

DeepMind’s AlphaFold2 (AF2) has revolutionized structural biology with its deep-learning algorithm that enables accurate prediction of three-dimensional protein structures from only the target amino acid sequence, potentially solving the half-century-old protein structure prediction problem: how to predict 3D structures from only sequence information1–4. AF2 has opened the door to understanding protein folds, structures, interactions, and function at organismal levels through modeling of 98.5% of the human proteome5. One common critique of AF2 is that it requires significant computational resources to run the software locally (up to 3 terabytes of disk space and a modern NVIDIA GPU with gigabytes of memory). Several efforts have alleviated these limitations, including the AlphaFold Protein Structure Database6,7, which houses over 200 million pre-run AF2 predictions, as well as ColabFold8 and OpenFold9, which allow users to run a modified AF2 protocol on open access servers in minutes. These platforms have allowed the public, industry, and academics without computational resources to model and analyze structures of their favorite target using AF2 with just a few clicks of a button. Another critique of AF2 concerns whether it has truly solved the protein structure prediction problem. Several groups have proposed that AF2 has only learned how to estimate three-dimensional structures using patterns extracted from known folds in the Protein Data Bank (PDB) and coevolutionary information between residues rather than the underlying physical and chemical basis of protein folding3,10–12. This is strictly true since current versions of AF2 do not use energy functions that seek to identify native-like protein conformations, unlike its competitor Rosetta13. Others suggest that AF2’s algorithm may have indirectly learned a similar function14. Finally, some critics question the accuracy of the standard implementation of AF2 against different types of non-globular molecular targets, which could limit its potential applications15,16. Nevertheless, overwhelming evidence suggests that machine learning software like AF2, RoseTTAFold, ESMFold, and related approaches are the best and most accurate answer to the structure prediction problem to date1,17–19.

AlphaFold2’s AI-driven revolution

The basic workflow of AF2 is outlined in Fig. 1a. Users input the primary amino acid sequence of the target protein as one-letter code in FASTA format. When more than one input sequence is provided, AlphaFold-Multimer or AF2Complex is utilized20,21. Lower and upper limits for input sequence lengths are defined by difficulties in generating reliable multiple sequence alignments (MSAs) for short (<10 amino acids) sequences and by graphics processing and memory limitations for long (>3,000 amino acids) sequences, respectively. Protein sequences can be obtained from annotated public databases, such as UniProt. The full details of the AF2 workflow have been discussed previously1, but are briefly outlined below. Using the input sequence(s), AF2 first queries several databases to construct a pair representation and an MSA representation of the target. The pair representation is a matrix of pairwise interactions between amino acids that are likely to be spatially related (i.e., close to each other in space). The MSA representation is a collection of sequences that are evolutionarily related to the target sequence and provides mutational covariance information utilized by AF2. The pair and MSA representations are then passed through the Evoformer, a neural network block that exchanges information within the MSA and pair representations to establish spatial and evolutionary relationships. Next, the Structure Module parses information from the Evoformer to convert the representations into a three-dimensional protein structure. The entire process undergoes several rounds of iterative recycling to produce the final refined models. For each output, AF2 generates a per-residue confidence score, the predicted Local Distance Difference Test (pLDDT) score, which is stored in the B-factor column of the model coordinate file (.pdb, .mmCIF, or related formats) and ranges from 0 to 100, with higher values indicating higher confidence in the model (Fig. 1b)1,22. AF2 also generates a Predicted Aligned Error (PAE) matrix, which evaluates the relative orientation and position of different parts (i.e., domains) of the model6. Higher PAE values correspond to lower confidence for the relative position and orientation of two parts of the protein in the model. Users should be especially careful not to assign biological or structural relevance to regions with low pLDDT (< 70) or high PAE values (> 5 Å)5,23. However, as discussed below, high pLDDT or low PAE metrics, indicating high confidence in the prediction, do not guarantee agreement with native protein conformations, but instead estimate a likelihood for local and global coordinate positions and/or orientations.
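Because pLDDT is written to the B-factor column, per-residue confidence can be read from a model file with a few lines of code. The following is a minimal sketch, assuming a standard fixed-width PDB file. The helper names and the toy three-residue model are hypothetical, and the band labels follow the coloring convention of the AlphaFold Protein Structure Database rather than anything defined by AF2 itself.

```python
def atom_line(serial, name, resname, chain, resseq, xyz, bfactor):
    """Format one fixed-width PDB ATOM record (toy helper for this example)."""
    x, y, z = xyz
    return (f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{bfactor:6.2f}")


def read_plddt(pdb_text):
    """Return {(chain, residue_number): pLDDT} using the B-factor of CA atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        # PDB columns (1-based): atom name 13-16, chain 22,
        # residue number 23-26, B-factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[(line[21], int(line[22:26]))] = float(line[60:66])
    return scores


def confidence_band(plddt):
    """Map a pLDDT value to a qualitative band (assumed database convention)."""
    if plddt >= 90:
        return "very high"
    if plddt >= 70:
        return "high"
    if plddt >= 50:
        return "low"
    return "very low"


# Hypothetical toy model: one N atom and three CA atoms.
toy_pdb = "\n".join([
    atom_line(1, " N  ", "MET", "A", 1, (0.0, 0.0, 0.0), 95.1),
    atom_line(2, " CA ", "MET", "A", 1, (1.458, 0.0, 0.0), 95.1),
    atom_line(3, " CA ", "GLY", "A", 2, (5.2, 0.0, 0.0), 62.3),
    atom_line(4, " CA ", "SER", "A", 3, (9.0, 0.0, 0.0), 41.0),
])

plddt = read_plddt(toy_pdb)
bands = {residue: confidence_band(score) for residue, score in plddt.items()}
# Residues below the pLDDT < 70 cutoff discussed in the text.
low_confidence = [residue for residue, score in plddt.items() if score < 70]
```

Residues collected in `low_confidence` are the ones that, per the guidance above, should not be assigned biological or structural relevance without further evidence.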

Fig. 1: Overview of AlphaFold2.

a, The general workflow for an AF2 prediction is shown (derived from Jumper et al1). The input is the primary amino acid sequence. The AF2 model of Ganglioside GM2 activator protein (https://alphafold.ebi.ac.uk/entry/P17900) is shown. The model is colored based on pLDDT value. b, Left: AF2 prediction for OSBP1 (https://alphafold.ebi.ac.uk/entry/P22059) is shown. The AF2 model is colored based on pLDDT value. The domains of the protein are noted: pleckstrin homology (PH), two phenylalanines in an acidic tract motif (FFAT), OSBP-related ligand-binding domain (ORD), and coiled-coil domain (CC). Right: The AF2 PAE plot shows the predicted relative position error for each residue pair in the sequence, with low confidence values in white and high confidence values in green. The domains of OSBP1 have been manually annotated on the PAE graph.

An example AF2 model of oxysterol-binding protein 1 (OSBP1), a lipid transfer protein, obtained from the AlphaFold Protein Structure Database is shown in Fig. 1b. The pLDDT values plotted onto the model highlight that the PH, CC, and ORD domain structures are assigned very high to high predicted confidence, while the FFAT domain is predicted with very low confidence. The PAE graph reveals that the model has low confidence in the placement of the PH, CC, FFAT, and ORD domains relative to each other.
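The domain-level reading of a PAE graph described above can be made quantitative by averaging the PAE matrix over blocks of residues. The sketch below assumes the PAE matrix is already available as a nested list of values in Å; the domain names, residue ranges, and PAE values are hypothetical toy data, not the OSBP1 prediction.

```python
def mean_pae(pae, rows, cols):
    """Average PAE (in Å) over the block defined by two residue index ranges."""
    values = [pae[i][j] for i in rows for j in cols]
    return sum(values) / len(values)


def interdomain_pae(pae, domains):
    """Return {(domain_a, domain_b): mean PAE} for every ordered domain pair."""
    return {
        (name_a, name_b): mean_pae(pae, range_a, range_b)
        for name_a, range_a in domains.items()
        for name_b, range_b in domains.items()
    }


# Hypothetical six-residue protein with two three-residue domains
# (0-based residue indices).
domains = {"domain_1": range(0, 3), "domain_2": range(3, 6)}

# Toy PAE matrix: 2 Å within a domain, 20 Å between domains.
toy_pae = [
    [2.0 if (i < 3) == (j < 3) else 20.0 for j in range(6)]
    for i in range(6)
]

table = interdomain_pae(toy_pae, domains)

# Domain pairs whose mean PAE exceeds the 5 Å cutoff quoted in the text.
uncertain_pairs = sorted(pair for pair, value in table.items() if value > 5.0)
```

Both orderings of a domain pair are kept because a PAE matrix is not required to be symmetric. In this toy case each domain is internally confident while the relative placement of the two domains is flagged as uncertain, which mirrors the pattern described for OSBP1.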

A critical evaluation of AlphaFold2’s applications

There are several open areas of research concerning AF2 in the context of its biological and therapeutic applications. The first is evaluating the accuracy of AF2 models relative to the different types of protein folds present in the PDB, especially for new structures as they are released. The second is expanding the types of systems that AF2 can be applied to, either through benchmarking the default AF2 pipeline on new types of targets or through modifications of the AF2 protocol. A recent structural biology community assessment reports that on average AF2 generates models with quality near experimental structures across diverse target folds and applications. These types of studies affirm AF2’s utility with good reason but may give the impression that AF2 is without limitations. Below we summarize several potential applications of AF2 and provide examples where the predicted model deviates from the experimental structure. Cases where AF2’s performance is compromised are especially important to help us understand its limitations and provide opportunities to refine its deep learning-based algorithm in future iterations. These deviations can be broadly categorized into cases with i) inaccurate secondary, tertiary, and/or quaternary structure in regions where AF2 predicts low to moderate confidence, ii) inaccurate structure in regions where AF2 predicts high model confidence, iii) correct backbone structure but incorrect fine details (i.e., side chain rotamer placements), and iv) correct backbone structure for individual domains but inaccurate placement of domains relative to each other (Fig. 2a–l). For cases i and iv, low confidence in pLDDT scores and PAE graphs alerts users to interpret structures with caution; no such warning is apparent in cases ii and iii.

Fig. 2: Example applications of AlphaFold2 predictions that deviate from the experimental structure.

Superpositions of the AF2 model and experimental structure for several classes of peptides/proteins are shown. AF2 models are colored according to the pLDDT value with overlaid experimental structures colored in pink. The PDB IDs of the experimental structures used for comparison are noted. AF2 models were fetched from the AlphaFold Protein Structure Database or derived from the literature: a, PRNP (https://alphafold.ebi.ac.uk/entry/P23907), b, Insulin (https://alphafold.ebi.ac.uk/entry/P01308), c, Polycystin 2 (https://alphafold.ebi.ac.uk/entry/Q13563), d, PqqL (https://alphafold.ebi.ac.uk/entry/P31828), e, Complement Component C6 (https://alphafold.ebi.ac.uk/entry/P13671), f, ChRmine, g, At2g23090 (https://alphafold.ebi.ac.uk/entry/O64818), h, SecA (https://alphafold.ebi.ac.uk/entry/P28366), i, L-PGDS K59A/C65A, j, NT-9, k, A12 nanobody/HIV C186 gp120 complex, l, Beta-endorphin amyloid fibril. Red arrows denote areas where the AF2 model deviates from the experimental structure. It is important to note that predicted structures, gleaned from the literature or the AlphaFold Protein Structure Database, have been generated using different versions of the AF2 software and/or with different input parameters and thus cannot be directly compared.

The success of AF2 in predicting protein structures raises the question as to whether it can also accurately predict peptide structures (or, in some cases, lack of a well-defined structure). Peptide structure prediction poses additional challenges: the set used to train AF2 excluded peptides, robust MSAs are difficult to generate for short sequences, and many peptides exist in solution as conformational ensembles rather than a single static conformation1,24,25. McDonald et al performed a benchmark of 588 peptides revealing that AF2 predicts many α-helical and β-hairpin peptide structures with surprising accuracy25. However, AF2 was challenged by mixed secondary structure membrane and soluble peptides, such as the prion protein PRNP (Fig. 2a)25. It was also shown that the best ranked AF2 models (selected on the basis of high pLDDT score) often did not exhibit the lowest Cα root-mean-square deviation (RMSD) relative to the experimental structure, suggesting the pLDDT metric used by AF2 to assess protein models is not optimal for classification of peptide conformations25. In a separate study, Tsaban et al showed that AF2 can be adapted to accurately model peptide/protein complexes irrespective of peptide length, although the results seemed biased towards helical structures and peptides that do not undergo large structural rearrangements upon binding24. New methods are fine-tuning AF2-based pipelines for specific types of peptide/protein complexes (e.g., peptide/MHC)26. These studies provide compelling evidence that AF2 can be applied across peptides, proteins, and peptide/protein complexes, albeit with several limitations and caveats. Refinement of AF2-derived models with nuclear magnetic resonance (NMR) derived restraints, such as chemical shift values, torsion angles, residual dipolar couplings (RDCs), and nuclear Overhauser effect (NOE) data, could help improve the accuracy of peptide modeling15.
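The mismatch between pLDDT ranking and Cα RMSD noted above can be illustrated with a small calculation. This is a minimal sketch, assuming the model and reference Cα coordinates have already been superposed and have matching residue order (no fitting step is performed here). All coordinates, model names, and mean pLDDT values are hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Cα RMSD (Å) between two pre-superposed, equally ordered coordinate lists."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical experimental Cα trace (three residues, 3.8 Å apart).
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]

# Two hypothetical predictions with made-up mean pLDDT values.
models = {
    "model_a": {
        "mean_plddt": 80.0,
        # Every atom displaced by 1 Å along y.
        "coords": [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)],
    },
    "model_b": {
        "mean_plddt": 90.0,
        # Only the last atom displaced, by 3 Å along z.
        "coords": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 3.0)],
    },
}

rmsd = {name: ca_rmsd(m["coords"], reference) for name, m in models.items()}
best_by_plddt = max(models, key=lambda name: models[name]["mean_plddt"])
best_by_rmsd = min(rmsd, key=rmsd.get)
```

In this toy case the model with the higher mean pLDDT is not the one closest to the reference, which is the situation reported for many peptides. A real comparison would first superpose the structures with a fitting algorithm before computing the RMSD.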

To date, the most common types of protein folds benchmarked in AF2 assessments are globular and extended / repeat proteins1,5,17. NMR structural ensembles offer a unique validation metric to assess the accuracy of predicted AlphaFold2 models since AlphaFold2 was trained on a subset of the PDB that excluded NMR data1,27–30. While AF2 performs exceptionally well on these types of folds on average, Fowler et al revealed that NMR ensembles can be more accurate than static AF2 models for dynamic proteins28. As an example, the AF2 model of insulin deviates significantly from its experimental NMR structure (Fig. 2b), potentially due to the inability of AF2 to orient cysteine pairs for disulfide bond formation31. Another consideration is that AF2 models of many globular proteins, especially enzymes and metalloproteins, lack functionally relevant co-factors, prosthetic groups, or ligands. The authors of AF2 note that since it is trained on both apo and holo structures from the PDB, models may still be consistent with the expected structure in the presence of ligands or cofactors despite their absence in the AF2 workflow1,5. However, whether the modeled structure resembles the apo or holo form of the protein is not immediately clear from analysis of pLDDT scores or PAE graphs32. Furthermore, deviations of AF2 models from experimental structures also occur when co-factors, prosthetic groups, or ligands induce structural changes, either locally or allosterically. For example, the NMR structure of Ca2+-bound polycystin 2 deviates from the AF2 model, potentially due to conformational changes upon calcium binding (Fig. 2c). Likewise, the AF2 model of the zinc protease PqqL deviates from the open, highly extended conformation determined by X-ray crystallography (Fig. 2d). New algorithms, such as AlphaFill, are actively being developed that could improve AF2 structure prediction and refinement for co-factor, prosthetic group, or ligand-bound proteins33. These modifications could help AF2 identify new therapeutic candidates34. AF2 may also exhibit difficulties in structure prediction for extended proteins or proteins with repeat elements35. In the case of the extended Complement C6 protein, AF2 predicts the structure of individual domains well but deviates in placement of domains relative to each other (Fig. 2e). For large macromolecules, users may be able to estimate the likelihood that AF2 correctly placed domains relative to each other by visualization of confidence scores in the PAE graph. However, it is important to remember that the PAE values are only confidence estimates. Furthermore, the accuracy of PAE graphs for inter-domain prediction has not been as extensively benchmarked as for intra-domain contacts36.

Evaluation of membrane protein structure is another important application of AF2 (refs. 5,37). Benchmarking AF2 against membrane proteins represents a challenge since the membrane environment, which includes lipids and other proteins, is not directly considered by current versions of AlphaFold38. Furthermore, membrane proteins represent less than 3% of total structures in the PDB15, meaning that the training set used by AF2 was highly biased towards soluble proteins39. Hegedűs et al benchmarked several membrane proteins not included in the original AF2 training set and concluded that on average AF2 predicts transmembrane proteins as well as soluble proteins38. However, the authors note two important limitations. First, AF2 models with transmembrane region lengths corresponding to non-physiological membrane thickness values can exhibit very high pLDDT scores (high model confidence), suggesting pLDDT scores alone are not sufficient to select native membrane protein conformations. Second, AF2 performs poorly for targets embedded in membranes with thickness outside the 15–35 Å range as well as targets with novel features not commonly present in the PDB. In agreement with these findings, Azzaz et al have shown that AF2 has difficulty modeling membrane proteins due to “epigenetic” factors (i.e., lipid environment, co-receptor induced structural changes, post-translational modifications) that control protein structure beyond the amino acid sequence40. As an example, while the AF2 model of the channelrhodopsin ChRmine captures its overall fold, the modeled N-terminal region and extracellular loops deviate from its experimental high-resolution cryogenic electron microscopy (cryo-EM) structure (Fig. 2f), likely due to ChRmine’s unique covalent Schiff base feature41. It will be imperative to evaluate AF2 against membrane protein structures as they become more readily available as a result of high-resolution cryo-EM and advances in NMR spectroscopy.
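Since a high pLDDT does not rule out a transmembrane region of non-physiological length, a crude independent sanity check can be useful. The sketch below is not the analysis used by Hegedűs et al; it simply assumes an ideal α-helix with a rise of 1.5 Å per residue, an optional user-supplied tilt angle, and the 15–35 Å thickness range quoted above. Function names and example helix lengths are hypothetical.

```python
import math

HELIX_RISE_PER_RESIDUE = 1.5          # Å per residue for an ideal α-helix
PLAUSIBLE_THICKNESS = (15.0, 35.0)    # Å, range quoted in the text


def helix_span(n_residues, tilt_degrees=0.0):
    """Approximate length of an ideal α-helix projected on the membrane normal."""
    length = n_residues * HELIX_RISE_PER_RESIDUE
    return length * math.cos(math.radians(tilt_degrees))


def span_is_plausible(span):
    """True if the span falls inside the assumed membrane thickness range."""
    low, high = PLAUSIBLE_THICKNESS
    return low <= span <= high


# Hypothetical predicted transmembrane helices of different lengths.
for n_residues in (8, 20, 30):
    span = helix_span(n_residues)
    status = "plausible" if span_is_plausible(span) else "check manually"
    print(f"{n_residues} residues: {span:.1f} Å ({status})")
```

A helix flagged by this heuristic is not necessarily wrong (tilted, kinked, or re-entrant helices are common), but it marks a region where the confidence score alone should not be trusted.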

Another unknown is how AF2 performs on intrinsically disordered proteins (IDPs) and proteins with intrinsically disordered regions (IDRs)27,42. IDPs represent a challenge for AF2 since it is difficult to identify evolutionary constraints from MSAs of IDPs and IDRs due to sequence hypervariability. In addition, like peptides, IDPs and IDRs are best thought of as sampling diverse conformational ensembles rather than a single static conformation42. Preliminary studies suggest that the majority of targets with very low confidence scores (pLDDT < 50) assigned by AF2 are likely to be IDPs / IDRs rather than well-folded structures that AF2 fails to predict42–44. However, for many targets, AF2 models with predicted disorder may not be relevant for structure and function analysis other than for assigning the likelihood for conformational heterogeneity27. As an example, the NMR structure of the IDR-containing protein At2g23090 deviates from the AF2 model despite the confident pLDDT score (Fig. 2g). A study by Ruff et al showed that radius of gyration values of IDPs / IDR-containing proteins calculated using static AF2 models deviate significantly from those experimentally obtained by small-angle X-ray scattering (SAXS)42. Future benchmarks should continue to evaluate AF2 against panels of IDPs and IDR-containing proteins using novel Critical Assessment of Protein Intrinsic Disorder (CAID) targets44. Efforts are also underway to establish whether AF2 can be used to predict alternative conformations or conformational ensembles of folded proteins (discussed in detail below). Several groups have suggested that the default AF2 pipeline has difficulty in modeling alternative conformations45. For example, AF2 fails to predict the “open” activated conformation of the ATPase SecA (Fig. 2h). Interestingly, several groups have shown that modifications of AF2 have the potential to generate models that significantly deviate from each other, allowing for sampling of conformational landscapes. A study by del Alamo et al modified the AF2 pipeline by reducing the number of recycles and restricting the depth of randomly subsampled MSAs to sample functionally relevant alternative conformations of transporters and G-protein-coupled receptors46. Similarly, Wayment-Steele et al found that clustering MSAs by sequence similarity enables AF2 to sample known alternative states of KaiB, RfaH, and Mad247. Further benchmarking of modified AF2 protocols against IDPs, IDR-containing proteins, and alternative conformations is required to establish prediction strengths and limitations for these types of systems43,48.
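The idea of restricting MSA depth by random subsampling can be sketched independently of any prediction software. The code below is an illustration under stated assumptions, not the implementation used by del Alamo et al: the MSA is represented as a plain list of aligned sequences with the query first, and the function name, toy sequences, depth, and seed values are hypothetical.

```python
import random


def subsample_msa(msa, max_depth, seed=0):
    """Keep the query (first row) and randomly sample the other rows.

    Returns at most `max_depth` sequences in total. Different seeds give
    different shallow alignments, which could then be passed to separate
    prediction runs.
    """
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    query, others = msa[0], msa[1:]
    rng = random.Random(seed)
    n_keep = min(max_depth - 1, len(others))
    return [query] + rng.sample(others, n_keep)


# Hypothetical toy alignment: the query followed by five homologs.
toy_msa = [
    "MKTAYIAK",
    "MKTAYLAK",
    "MRTAYIAK",
    "MKSAYIGK",
    "MKTGYIAR",
    "MKTAFIAK",
]

# Three shallow alignments drawn with different seeds.
shallow_msas = [subsample_msa(toy_msa, max_depth=3, seed=s) for s in (1, 2, 3)]
```

Each shallow alignment carries a weaker and slightly different coevolutionary signal, which is the rationale given above for why predictions from subsampled MSAs can diverge from one another and from the default model.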

There are several other challenging structural modeling problems in biology and therapeutics that AF2 is tasked with. One of the most sought-after applications of AF2 is predicting the effect of mutations on protein structure and/or stability17. The AF2 authors note that “AlphaFold has not been trained or validated for predicting the effect of mutations.” In support of this, studies have reported an inability of AF2 to predict the effects of mutations on protein structure and stability14,49,50, which may be due to a training bias on stable structures or an inability to extract signal from small mutations through MSAs. As an example, structural perturbations induced by the K59A/C65A double mutation in Prostaglandin D Synthase are not accurately captured by AF2 (Fig. 2i). A recent adaptation of AF, AlphaMissense, does not explicitly determine the structural effects of a mutation on a protein but provides the probability of a missense variant being pathogenic51. Other groups have suggested that developing AF2 workflows that are less dependent on MSAs could be beneficial14.

Another challenge is modeling of novel three-dimensional folds that are either completely absent or not commonly represented in the PDB, such as de novo designed proteins. In these cases, AF2 has had little or no exposure to the target topology during training. Furthermore, extraction of coevolutionary information from MSAs using de novo designed targets may be difficult since the amino acid sequences of de novo designed proteins deviate from naturally observed sequences. For some, this is the ultimate test of whether AlphaFold may have solved the protein structure prediction problem. Interestingly, Moffat et al showed that AF2 performs well on the de novo designed proteins Top7, Peak6, Foldit1, and Ferredog-Diesel52. Slight deviations in tertiary structure are noted, such as for the nuclear transport factor 2 derived de novo designed protein NT-9, but the overall structure is well described (Fig. 2j). For targets without known homologs, such as computationally designed proteins, increasing the number of recycling iterations can improve the quality of the prediction8. An inverted version of AlphaFold, called AlphaDesign, has been used for de novo protein design with some success53.

Evaluation of protein/protein interactions, including oligomerization, is another major potential application of AF2 that continues to be explored17. Yin et al benchmarked 152 heterodimeric protein complexes, revealing that AF2 and AF2-Multimer had a 51% success rate54,55. The authors note that AF2 had difficulty modeling antibody/protein complexes, such as the A12 nanobody/HIV gp120 complex (Fig. 2k). A separate study by Bryant et al reported a 63% success rate for heterodimeric complexes56. Both studies suggest that a robust MSA coevolutionary signal is required for accurate complex modeling. Preliminary reports also suggest that AF2 may be able to predict oligomeric states of proteins and amyloids17,57. However, care must be taken when interpreting predictions, as highlighted by the incorrect AF2 model of the beta-endorphin amyloid fibril relative to its experimental solid-state NMR structure58 (Fig. 2l).

Evaluation metrics and model reliability

As stated above, AlphaFold provides error categorizations in the form of pLDDT scores and PAE values to estimate confidence of its predictions and to evaluate overall model quality/reliability. For the majority of globular proteins, AF2 provides accurate, reliable models with high pLDDT (> 70) or low PAE values (< 5 Å), indicating confidence in the predicted positions of the atomic coordinates, which match experimentally determined native structures5,17,23,59. In other cases, if the “best” AF2 model exhibits many residues with low pLDDT (< 70) or high PAE values (> 5 Å), the likelihood that the backbone structure matches the native conformation is very low and the model cannot be reliably interpreted. Previous analysis suggests that AF2 predicts on average ~50% of residues across all proteins with high confidence17,59. Users can attempt to increase model quality (better pLDDT and PAE values) by generating a series of predictions with different parameters (number of recycles, number of random seeds, number of ensembles)1 or by integrating experimental data60. However, cases in which the AF2 evaluation metrics are good but the model does not match the experimental structure (Fig. 2a,d,e,k) show that pLDDT and PAE metrics should not be trusted blindly. The most dramatic case is when AF2 provides excellent evaluation metrics despite complete disagreement of the model’s backbone structure with an experimental structure. Terwilliger et al estimate that ~10% of residues predicted by AF2 with high confidence deviate by more than 2 Å from the backbone of native conformations observed in experimental structures61. There are also cases in which AF2 generates models with high confidence whose backbone structure is correct, but fine details, such as side-chain rotamer placement, are inaccurate. Jumper et al note that a rotamer is generally classified as correct if the predicted torsion angle is within 40° of the experimental torsion angle, which is correlated with pLDDT scores > 90 (ref. 1). However, as noted by several groups, high pLDDT at a specific residue does not always indicate that the correct rotamer has been modeled23. Cases can also exist where AF2 predicts the correct backbone structure for individual domains but misplaces domains relative to each other, which should be recognizable in the output PAE matrices. Several groups have reported cases of AF2 models with low PAE values (< 5 Å) that deviated from experimental data62,63. While no precise mechanism exists to identify these cases, some groups have employed molecular dynamics (MD) simulations to further evaluate stability and quality of AF2 models64,65. Some groups have used MD simulations to suggest that pLDDT and PAE metrics provide information on dynamics and disorder42,65. However, other reports have compared pLDDT scores with crystallographic B-factors to suggest AF2 confidence metrics are unable to provide direct information on local flexibility66. The determinants driving cases where AF2 models are associated with high confidence but deviate from experimental structure are currently unknown and should be thoroughly evaluated in future studies, especially in the context of non-globular proteins, towards quantitatively defining limits of AlphaFold’s evaluation and error categorization metrics. Updated and refined approaches for error categorization may provide better methods for model quality assessment relative to pLDDT and PAE metrics14,67.
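The 40° rotamer criterion quoted above requires comparing angles on a circle, so that 175° and −175° count as 10° apart rather than 350°. The following is a minimal sketch of that comparison; the function names and the example torsion values are hypothetical, and the check is applied to a single torsion angle per residue as an assumption.

```python
ROTAMER_TOLERANCE = 40.0  # degrees, criterion quoted in the text


def angle_difference(angle_a, angle_b):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    return abs((angle_a - angle_b + 180.0) % 360.0 - 180.0)


def rotamer_is_correct(predicted, experimental, tolerance=ROTAMER_TOLERANCE):
    """True if the predicted torsion is within `tolerance` of the experimental one."""
    return angle_difference(predicted, experimental) <= tolerance


# Hypothetical (predicted, experimental) side-chain torsion angles in degrees.
examples = {
    "residue_10": (175.0, -175.0),  # differs by 10° across the ±180° boundary
    "residue_11": (60.0, -60.0),    # differs by 120°: a different rotamer
    "residue_12": (-65.0, -70.0),   # differs by 5°
}

results = {
    name: rotamer_is_correct(predicted, experimental)
    for name, (predicted, experimental) in examples.items()
}
fraction_correct = sum(results.values()) / len(results)
```

Tabulating `fraction_correct` separately for residues in different pLDDT ranges is one way to examine, for a given model and experimental structure, how far high pLDDT actually tracks side-chain accuracy.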

Integration of AF2 models with experimental data

In cases where no experimental data is available (i.e., in vitro recombinant protein production or in situ characterization is not possible), insights into the structure and function of proteins may be primarily guided by AF2 predictions supplemented with molecular dynamics simulations to further evaluate model stability64,65. In cases where recombinant protein can be prepared in the milligram quantities required for biophysical characterization, AF2 models can be integrated with experimental data, typically in the form of SAXS, NMR, X-ray crystallography, and cryo-EM (Figure 3a–d). Here, experimental results are directly compared and contrasted against a series of AF2 models to evaluate which prediction, if any, adequately fits the data. AF2 models are increasingly used as initial templates to fit experimental data. The models subsequently undergo iterative refinement against the data to generate data-driven structural models. Another possibility is the use of implicit experimental data to guide and restrain AF2 predictions (i.e., AF2 models are refined to best fit experimental data)60. The integration of AF2 models with experiments is especially useful for cases where template structures or homologous models are lacking. The use of AF models in structure determination protocols has been shown to reduce the time and effort required relative to ab initio model building68–70.

Fig. 3: Integration of AlphaFold models with experimental data.

a, Schematic of using AF2 models together with either small-angle X-ray scattering (SAXS) or small-angle neutron scattering (SANS) data. In this example, AF2 models are compared against SAXS data in the form of the pair distribution function, P(r), and log (Iq) vs q graphs. The SAXS envelope is fit together with AF2 models in an iterative fashion and refined to generate the final structure. b, Schematic of using AF2 models together with X-ray diffraction data. In the absence of an experimental template structure, AF2 models are used iteratively during the molecular replacement / phasing stages to process and fit the diffraction data. When the proper solution is found, the model is refined to generate the final structure. c, Schematic of using AF2 models together with solution NMR data. In one pathway, AF2 models are used together with experimental distance restraints (in the form of nuclear Overhauser effects – NOEs, residual dipolar couplings – RDCs, paramagnetic relaxation enhancements – PREs, and/or pseudocontact shifts – PCSs) towards automated NMR resonance assignment via the predicted structure (in this case a 2D 1H-13C methyl HMQC spectrum). In another pathway, predicted distances in the AF2 models are compared to those obtained experimentally. If the restraints match, the AF2 model is validated and refined. If the experimental restraints do not match, the AF2 model can be refined/recalculated using those restraints. d, Schematic of using AF2 models together with cryo-EM data. 2D class averages obtained from cryo-EM experiments are reconstructed into 3D density maps. AF2 models are iteratively fitted into the cryo-EM density map and refined to generate the final structure. For panels a–d, all graphs / maps shown are conceptual (i.e., not real data). For all theoretical examples shown, human glycolipid transfer protein (PDB ID 1SWX) was used.

As one example, theoretical SAXS profiles for a series of AF2 models can be predicted from the 3D coordinates and directly compared with experimental SAXS data in the form of P(r) vs r or log (Iq) vs q plots, where χ² values provide a goodness-of-fit measure for AF2 models relative to the solution state structure, which is time and ensemble averaged in SAXS71–73. The best matching AF2 model is fitted into the experimental SAXS-derived envelope using a variety of software for further refinement73 (Fig. 3a). Preliminary comparison of SAXS-derived versus AF calculated P(r) curves revealed that for many cases a static AF2 model does not adequately describe solution state structures42,72. Recent methods have shown that fitting of SAXS data significantly improves when an ensemble of AF predicted structures is used rather than a static AF model71, highlighting the importance of integrating AF models with experimental data. An important caveat is that one must be wary of overfitting AF2 models to SAXS envelopes, especially for lower resolution data74. Typically, χ² values of less than one are indicative of overfitting, and additional strategies such as the combination of Vc, Qr, χ²free, and Rsas metrics have been proposed as more robust evaluation metrics75.
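The χ² comparison between a model-derived and an experimental scattering profile can be sketched as follows. This is a simplified illustration under stated assumptions: a single least-squares scale factor is fitted, the reduced χ² is normalized by N − 1, and no hydration-layer or background terms are included, unlike dedicated SAXS software. All intensities, errors, and model names are hypothetical toy values.

```python
def fit_scale(i_exp, i_model, sigma):
    """Least-squares scale factor c that minimizes sum(((I_exp - c*I_model)/sigma)^2)."""
    numerator = sum(e * m / s ** 2 for e, m, s in zip(i_exp, i_model, sigma))
    denominator = sum(m ** 2 / s ** 2 for m, s in zip(i_model, sigma))
    return numerator / denominator


def reduced_chi2(i_exp, i_model, sigma):
    """Reduced chi-squared of a scaled model profile against experimental data."""
    c = fit_scale(i_exp, i_model, sigma)
    total = sum(((e - c * m) / s) ** 2 for e, m, s in zip(i_exp, i_model, sigma))
    return total / (len(i_exp) - 1)


# Hypothetical experimental intensities and errors at three q values.
i_exp = [20.0, 10.0, 4.0]
sigma = [1.0, 1.0, 1.0]

# Hypothetical theoretical profiles computed from two candidate models.
profiles = {
    "model_a": [10.0, 5.0, 2.0],   # same shape as the data, different scale
    "model_b": [10.0, 8.0, 6.0],   # different shape
}

chi2 = {name: reduced_chi2(i_exp, prof, sigma) for name, prof in profiles.items()}
best_model = min(chi2, key=chi2.get)
```

In this toy data the first profile matches the experiment exactly after scaling, so its χ² is zero. With real, noisy data a value far below one would instead be a warning sign of overfitting, as noted above, and a well-fitting model is expected to give a value near one.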

AF2 models have also been increasingly used during molecular replacement and phasing of X-ray diffraction data obtained from protein crystals76–79 (Fig. 3b). Standard molecular replacement strategies require 3D coordinates of a template/homologous structure and work best when the template is < 2 Å Cα RMSD from the target structure80. Recently, an AF2 integrated iterative procedure for molecular replacement has been developed where AF models are utilized during the initial structure-solution cycle, followed by data guided cycles of AF structure prediction and model rebuilding60,69. This iterative procedure works extremely well as demonstrated in a benchmark where 187 of 215 structures were solved by AF-guided molecular replacement; the success was shown to be dependent on high confidence scores associated with the AF prediction69. The use of AF models in molecular replacement can be further enhanced by downweighting or removing low confidence regions23.

Another burgeoning area where AF models are integrated with experimental data is solution-state NMR15,27,28,30,81–85. A series of AF2 models can be compared with experimental NMR data in the form of distance- and conformation-sensitive structural restraints obtained from NOEs81, RDCs29,86, paramagnetic relaxation enhancements (PREs)87,88, and/or pseudocontact shifts (PCSs)89 (Fig. 3c). If NMR-derived restraints match the AF2 model, the structure can be refined. Otherwise, the NMR-derived restraints can be used to recalculate the structure using the AF2 model as a template28. Moreover, in the absence of NMR resonance assignments, AF2 models can be used as structural templates for automated assignment90. This is especially helpful for large biological assemblies where methyl side-chain labeling affords an increase in signal and resolution91,92. Here, NMR assignments for methyl side-chain groups can be obtained using only methyl-methyl NOEs from 3D NMR experiments and the atomic coordinates of a structure (or AF2 predicted structure) as input, with software such as MAUS, MAGIC, and methylFLYA91,93,94.
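
The restraint check described above can be sketched in Python for the simplest case of NOE-derived upper distance bounds. The atom labels, coordinates, the 0.5 Å tolerance, and the function names are hypothetical choices for illustration; real workflows also handle ambiguous assignments, r⁻⁶ averaging, and ensemble models.

```python
import math


def noe_violations(coords, restraints, tolerance=0.5):
    """Return restraints whose model distance exceeds the upper bound plus a tolerance.

    coords: dict mapping an atom label to (x, y, z) in angstroms
    restraints: list of (atom_i, atom_j, upper_bound) in angstroms
    """
    violations = []
    for atom_i, atom_j, upper in restraints:
        d = math.dist(coords[atom_i], coords[atom_j])
        if d > upper + tolerance:
            violations.append((atom_i, atom_j, round(d - upper, 2)))
    return violations


def fraction_violated(coords, restraints, tolerance=0.5):
    """Fraction of restraints violated by the model (0.0 means full agreement)."""
    return len(noe_violations(coords, restraints, tolerance)) / len(restraints)


# Hypothetical proton coordinates taken from an AF2 model.
coords = {
    "A10.HN": (0.0, 0.0, 0.0),
    "A14.HN": (3.0, 4.0, 0.0),
    "A50.HA": (0.0, 0.0, 12.0),
}
# Hypothetical NOE upper bounds from experiment.
restraints = [("A10.HN", "A14.HN", 5.0), ("A10.HN", "A50.HA", 6.0)]

violations = noe_violations(coords, restraints)
print(violations, fraction_violated(coords, restraints))
```

A low violation fraction would support refining the AF2 model directly, whereas many violations would argue for recalculating the structure from the restraints.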

AF models have also been used extensively together with single particle cryo-EM data60,76,95,96 (Fig. 3d). 2D class averages generated from tens of thousands of particle images are used as the input for 3D classification and reconstruction. A series of AF2 models is fitted into the 3D cryo-EM density maps, and each is evaluated for goodness-of-fit96 and can be refined to generate a final structure97. As with X-ray diffraction studies, implicit incorporation of AF2 models, which are iteratively rebuilt on the basis of cryo-EM data, enables swift and robust structure modeling relative to ab initio model building60,98. Here, the resolution of the cryo-EM data is essential for accurate model building. Reports suggest that AF2 models should not be fit into cryo-EM maps with resolution worse (numerically greater) than 6 Å98,99.

Beyond static snapshots: ensembles and conformational landscapes

Native conformations of proteins are often described as time-averaged ensembles of conformations with Boltzmann-type distributions, especially for IDPs and proteins with IDRs100,101. Apart from IDPs, well-folded globular proteins, such as G-protein coupled receptors and kinases, also sample a wide range of conformations to carry out their biological function102,103. The standard implementation of AlphaFold performs well only in detecting a single structural snapshot (the “ground state” structure), likely due to the lack of a large set of redundant protein conformers in the training set16,32. Thus, several groups have worked to extend AlphaFold to include predictions of structural ensembles and excited state structures102,104. Initial efforts to enhance sampling of different conformations have involved reducing the number of sequences used in order to generate shallower MSA representations, masking coevolutionary information provided by MSAs, and splitting conflicting coevolutionary signals by clustering MSAs46,47,105. Another approach is to enable dropout layers in the neural network at inference time; these layers are normally active only during neural network training8,106,107. These approaches have shown great promise in increasing the ability of AF2 to predict alternative conformations, although benchmarking has been limited by the small number of structures in the PDB solved with multiple conformations. The use of experimental data, especially SAXS, NMR, and cryo-EM, has also been described to guide modeling of ensembles and alternative conformations71,82,108.
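
The idea of generating shallower MSA representations can be sketched in Python as random subsampling of an alignment, with the query sequence always retained. This is a simplified illustration rather than the procedure of any cited method; the function names, the placeholder sequences, and the choice of depth are hypothetical. Each shallow alignment would then be passed to a separate AF2 run, which is not shown.

```python
import random


def subsample_msa(msa, depth, seed):
    """Keep the query (first row) and draw depth-1 other sequences without replacement."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    query, others = msa[0], msa[1:]
    rng = random.Random(seed)                 # seeded for reproducible replicas
    k = min(depth - 1, len(others))
    return [query] + rng.sample(others, k)


def shallow_msas(msa, depth, n_replicas):
    """Generate several independent shallow MSAs, one per random seed."""
    return [subsample_msa(msa, depth, seed) for seed in range(n_replicas)]


# Placeholder alignment: one query followed by 50 homolog labels (not real sequences).
msa = ["QUERY"] + [f"HOMOLOG_{i}" for i in range(50)]
replicas = shallow_msas(msa, depth=8, n_replicas=3)
for replica in replicas:
    print(replica)
```

Different seeds expose the network to different subsets of the coevolutionary signal, which is the intended source of conformational diversity in this family of approaches.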

Conclusions and outlook

AlphaFold and other machine learning based structure prediction software represent a giant leap forward in our understanding of protein function and structures. However, they are not yet “one-size-fits-all” solutions to the protein structure prediction problem. Current implementations of AF2 can provide highly accurate working models for most rigid, well-folded globular proteins, but may have issues predicting other classes of proteins. Nevertheless, as suggested by recent work, we expect incredible progress in other classes of proteins in the coming years33,37,47,51. Machine learning approaches are also expected to be applied towards structure prediction of other biomolecules, including nucleic acids109, carbohydrates110, and lipids. The case studies highlighted here reveal why caution must be taken in naive interpretation of AF2 models, even for cases with reasonable pLDDT and PAE confidence metrics (Fig. 2a–l). We expect that future studies will enable further refinement of error categorizations by teasing out fine details of cases with good evaluation metrics that do not match experimental results. We also expect to see increased integration of experimental data with AF2 predictions. Several of the studies mentioned here also show that simple modifications in the AF2 workflow can further extend its accuracy and applications into new horizons. On average, more than 10,000 protein structures are released in the PDB per year (https://www.rcsb.org/stats/growth/growth-protein). AF2 will continue to be evaluated against new experimental structures to further identify areas for improvement. Even in the face of an impressive display of accuracy, AlphaFold2 is still best utilized to complement and extend interpretation of experimental data at both structural and functional levels.

Acknowledgements

A.C.M. acknowledges start-up funds from the Georgia Institute of Technology. V.A. acknowledges support from the National Science Foundation (CHE-2238650) and the National Institutes of Health (R35GM142882).

Footnotes

Ethics declarations

The authors declare no competing interests.

Data Availability

PyMOL sessions comparing AlphaFold models (extracted from the literature or the AlphaFold Database) with experimental structures, together with the Python script used to color-code structures based on pLDDT values, are freely available at https://github.com/mcshanlab/AlphaFold_Models_Agarwal_McShan.

References

  • 1. Jumper J et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
  • 2. Bertoline LMF, Lima AN, Krieger JE & Teixeira SK. Before and after AlphaFold2: An overview of protein structure prediction. Front Bioinform 3, 1120370 (2023).
  • 3. Perrakis A & Sixma TK. AI revolutions in biology. EMBO reports 22, e54046 (2021).
  • 4. Bouatta N, Sorger P & AlQuraishi M. Protein structure prediction by AlphaFold2: are attention and symmetries all you need? Acta Crystallogr D Struct Biol 77, 982–991 (2021).
  • 5. Tunyasuvunakool K et al. Highly accurate protein structure prediction for the human proteome. Nature 596, 590–596 (2021).
  • 6. Varadi M et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Research 50, D439–D444 (2022).
  • 7. Varadi M et al. AlphaFold Protein Structure Database in 2024: providing structure coverage for over 214 million protein sequences. Nucleic Acids Res gkad1011 (2023) doi: 10.1093/nar/gkad1011.
  • 8. Mirdita M et al. ColabFold: making protein folding accessible to all. Nat Methods 19, 679–682 (2022).
  • 9. Ahdritz G et al. OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. bioRxiv 2022.11.20.517210 (2023) doi: 10.1101/2022.11.20.517210.
  • 10. Chen S-J et al. Protein folds vs. protein folding: Differing questions, different challenges. Proceedings of the National Academy of Sciences 120, e2214423119 (2023).
  • 11. Skolnick J, Gao M, Zhou H & Singh S. AlphaFold 2: Why It Works and Its Implications for Understanding the Relationships of Protein Sequence, Structure, and Function. J Chem Inf Model 61, 4827–4831 (2021).
  • 12. Outeiral C, Nissley DA & Deane CM. Current structure predictors are not learning the physics of protein folding. Bioinformatics 38, 1881–1887 (2022).
  • 13. Alford RF et al. The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design. J Chem Theory Comput 13, 3031–3048 (2017).
  • 14. Roney JP & Ovchinnikov S. State-of-the-Art Estimation of Protein Model Accuracy Using AlphaFold. Phys Rev Lett 129, 238101 (2022).
  • 15. Laurents DV. AlphaFold 2 and NMR Spectroscopy: Partners to Understand Protein Structure, Dynamics and Function. Frontiers in Molecular Biosciences 9, (2022).
  • 16. Chakravarty D & Porter LL. AlphaFold2 fails to predict protein fold switching. Protein Sci 31, e4353 (2022).
  • 17. Akdel M et al. A structural biology community assessment of AlphaFold2 applications. Nat Struct Mol Biol 29, 1056–1067 (2022).
  • 18. Baek M et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).
  • 19. Lin Z et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
  • 20. Evans R et al. Protein complex prediction with AlphaFold-Multimer. bioRxiv 2021.10.04.463034 (2022) doi: 10.1101/2021.10.04.463034.
  • 21. Gao M, Nakajima An D, Parks JM & Skolnick J. AF2Complex predicts direct physical interactions in multimeric proteins with deep learning. Nat Commun 13, 1744 (2022).
  • 22. Mariani V, Biasini M, Barbato A & Schwede T. lDDT: a local superposition-free score for comparing protein structures and models using distance difference tests. Bioinformatics 29, 2722–2728 (2013).
  • 23. Oeffner RD et al. Putting AlphaFold models to work with phenix.process_predicted_model and ISOLDE. Acta Crystallogr D Struct Biol 78, 1303–1314 (2022).
  • 24. Tsaban T et al. Harnessing protein folding neural networks for peptide–protein docking. Nat Commun 13, 176 (2022).
  • 25. McDonald EF, Jones T, Plate L, Meiler J & Gulsevin A. Benchmarking AlphaFold2 on peptide structure prediction. Structure 31, 111–119.e2 (2023).
  • 26. Mikhaylov V et al. Accurate modeling of peptide-MHC structures with AlphaFold. Structure 32, 228–241.e4 (2024).
  • 27. Alderson TR, Pritišanac I, Kolarić Đ, Moses AM & Forman-Kay JD. Systematic identification of conditionally folded intrinsically disordered regions by AlphaFold2. Proc Natl Acad Sci U S A 120, e2304302120 (2023).
  • 28. Fowler NJ & Williamson MP. The accuracy of protein structures in solution determined by AlphaFold and NMR. Structure 30, 925–933.e2 (2022).
  • 29. Zweckstetter M. NMR hawk‐eyed view of AlphaFold2 structures. Protein Sci 30, 2333–2337 (2021).
  • 30. Tejero R, Huang YJ, Ramelot TA & Montelione GT. AlphaFold Models of Small Proteins Rival the Accuracy of Solution NMR Structures. Frontiers in Molecular Biosciences 9, (2022).
  • 31. Thornton JM, Laskowski RA & Borkakoti N. AlphaFold heralds a data-driven revolution in biology and medicine. Nat Med 27, 1666–1669 (2021).
  • 32. Saldaño T et al. Impact of protein conformational diversity on AlphaFold predictions. Bioinformatics btac202 (2022) doi: 10.1093/bioinformatics/btac202.
  • 33. Hekkelman ML, de Vries I, Joosten RP & Perrakis A. AlphaFill: enriching AlphaFold models with ligands and cofactors. Nat Methods 20, 205–213 (2023).
  • 34. Karelina M, Noh JJ & Dror RO. How accurately can one predict drug binding modes using AlphaFold models? Elife 12, RP89386 (2023).
  • 35. Diwan GD, Gonzalez-Sanchez JC, Apic G & Russell RB. Next Generation Protein Structure Predictions and Genetic Variant Interpretation. J Mol Biol 433, 167180 (2021).
  • 36. David A, Islam S, Tankhilevich E & Sternberg MJE. The AlphaFold Database of Protein Structures: A Biologist’s Guide. J Mol Biol 434, 167336 (2022).
  • 37. Jambrich MA, Tusnady GE & Dobson L. How AlphaFold2 shaped the structural coverage of the human transmembrane proteome. Sci Rep 13, 20283 (2023).
  • 38. Hegedűs T, Geisler M, Lukács GL & Farkas B. Ins and outs of AlphaFold2 transmembrane protein structure predictions. Cell Mol Life Sci 79, 73 (2022).
  • 39. Topitsch A, Schwede T & Pereira J. Outer membrane β-barrel structure prediction through the lens of AlphaFold2. Proteins (2023) doi: 10.1002/prot.26552.
  • 40. Azzaz F, Yahi N, Chahinian H & Fantini J. The Epigenetic Dimension of Protein Structure Is an Intrinsic Weakness of the AlphaFold Program. Biomolecules 12, 1527 (2022).
  • 41. Kishi KE et al. Structural basis for channel conduction in the pump-like channelrhodopsin ChRmine. Cell 185, 672–689.e23 (2022).
  • 42. Ruff KM & Pappu RV. AlphaFold and Implications for Intrinsically Disordered Proteins. Journal of Molecular Biology 433, 167208 (2021).
  • 43. Wilson CJ, Choy W-Y & Karttunen M. AlphaFold2: A Role for Disordered Protein/Region Prediction? Int J Mol Sci 23, 4591 (2022).
  • 44. Piovesan D, Monzon AM & Tosatto SCE. Intrinsic protein disorder and conditional folding in AlphaFoldDB. Protein Sci 31, e4466 (2022).
  • 45. Lane TJ. Protein structure prediction has reached the single-structure frontier. Nat Methods 20, 170–173 (2023).
  • 46. Del Alamo D, Sala D, Mchaourab HS & Meiler J. Sampling alternative conformational states of transporters and receptors with AlphaFold2. Elife 11, e75751 (2022).
  • 47. Wayment-Steele HK et al. Predicting multiple conformations via sequence clustering and AlphaFold2. Nature 625, 832–839 (2024).
  • 48. Zhao B, Ghadermarzi S & Kurgan L. Comparative evaluation of AlphaFold2 and disorder predictors for prediction of intrinsic disorder, disorder content and fully disordered proteins. Computational and Structural Biotechnology Journal 21, 3248–3258 (2023).
  • 49. Buel GR & Walters KJ. Can AlphaFold2 predict the impact of missense mutations on structure? Nat Struct Mol Biol 29, 1–2 (2022).
  • 50. Pak MA et al. Using AlphaFold to predict the impact of single mutations on protein stability and function. PLoS One 18, e0282689 (2023).
  • 51. Cheng J et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 381, eadg7492 (2023).
  • 52. Moffat L, Greener JG & Jones DT. Using AlphaFold for Rapid and Accurate Fixed Backbone Protein Design. bioRxiv 2021.08.24.457549 (2021) doi: 10.1101/2021.08.24.457549.
  • 53. Goverde CA, Wolf B, Khakzad H, Rosset S & Correia BE. De novo protein design by inversion of the AlphaFold structure prediction network. Protein Sci 32, e4653 (2023).
  • 54. Yin R, Feng BY, Varshney A & Pierce BG. Benchmarking AlphaFold for protein complex modeling reveals accuracy determinants. Protein Sci 31, e4379 (2022).
  • 55. Yin R & Pierce BG. Evaluation of AlphaFold antibody-antigen modeling with implications for improving predictive accuracy. Protein Sci 33, e4865 (2024).
  • 56. Bryant P, Pozzati G & Elofsson A. Improved prediction of protein-protein interactions using AlphaFold2. Nat Commun 13, 1265 (2022).
  • 57. Jeppesen M & André I. Accurate prediction of protein assembly structure by combining AlphaFold and symmetrical docking. Nat Commun 14, 8283 (2023).
  • 58. Pinheiro F, Santos J & Ventura S. AlphaFold and the amyloid landscape. J Mol Biol 433, 167059 (2021).
  • 59. Binder JL et al. AlphaFold Illuminates Half of the Dark Human Proteins. Curr Opin Struct Biol 74, 102372 (2022).
  • 60. Terwilliger TC et al. Improved AlphaFold modeling with implicit experimental information. Nat Methods 19, 1376–1382 (2022).
  • 61. Terwilliger TC et al. AlphaFold predictions are valuable hypotheses and accelerate but do not replace experimental structure determination. Nat Methods 21, 110–116 (2024).
  • 62. McCafferty CL, Pennington EL, Papoulas O, Taylor DW & Marcotte EM. Does AlphaFold2 model proteins’ intracellular conformations? An experimental test using cross-linking mass spectrometry of endogenous ciliary proteins. Commun Biol 6, 1–10 (2023).
  • 63. Motmaen A et al. Peptide-binding specificity prediction using fine-tuned protein structure prediction networks. Proc Natl Acad Sci U S A 120, e2216697120 (2023).
  • 64. Jussupow A & Kaila VRI. Effective Molecular Dynamics from Neural Network-Based Structure Prediction Models. J Chem Theory Comput 19, 1965–1975 (2023).
  • 65. Guo H-B et al. AlphaFold2 models indicate that protein sequence determines both structure and dynamics. Sci Rep 12, 10696 (2022).
  • 66. Carugo O. pLDDT Values in AlphaFold2 Protein Models Are Unrelated to Globular Protein Local Flexibility. Crystals 13, 1560 (2023).
  • 67. Zhu W, Shenoy A, Kundrotas P & Elofsson A. Evaluation of AlphaFold-Multimer prediction on multi-chain protein complexes. Bioinformatics 39, btad424 (2023).
  • 68. Fontana P et al. Structure of cytoplasmic ring of nuclear pore complex by integrative cryo-EM and AlphaFold. Science 376, eabm9326 (2022).
  • 69. Terwilliger TC et al. Accelerating crystal structure determination with iterative AlphaFold prediction. Acta Crystallogr D Struct Biol 79, 234–244 (2023).
  • 70. Blanc M et al. Designed Ankyrin Repeat Proteins provide insights into the structure and function of CagI and are potent inhibitors of CagA translocation by the Helicobacter pylori type IV secretion system. PLoS Pathog 19, e1011368 (2023).
  • 71. Brookes E, Rocco M, Vachette P & Trewhella J. AlphaFold-predicted protein structures and small-angle X-ray scattering: insights from an extended examination of selected data in the Small-Angle Scattering Biological Data Bank. J Appl Crystallogr 56, 910–926 (2023).
  • 72. Brookes E & Rocco M. A database of calculated solution parameters for the AlphaFold predicted protein structures. Sci Rep 12, 7349 (2022).
  • 73. Chinnam NB et al. Combining small angle X-ray scattering (SAXS) with protein structure predictions to characterize conformations in solution. Methods Enzymol 678, 351–376 (2023).
  • 74. Da Vela S & Svergun DI. Methods, development and applications of small-angle X-ray scattering to characterize biological macromolecules in solution. Curr Res Struct Biol 2, 164–170 (2020).
  • 75. Rambo RP & Tainer JA. Accurate assessment of mass, models and resolution by small-angle scattering. Nature 496, 477–481 (2013).
  • 76. Kryshtafovych A et al. Computational models in the service of X-ray and cryo-EM structure determination. Proteins 89, 1633–1646 (2021).
  • 77. Chai L et al. AlphaFold Protein Structure Database for Sequence-Independent Molecular Replacement. Crystals 11, 1227 (2021).
  • 78. McCoy AJ, Sammito MD & Read RJ. Implications of AlphaFold2 for crystallographic phasing by molecular replacement. Acta Crystallogr D Struct Biol 78, 1–13 (2022).
  • 79. Barbarin-Bocahu I & Graille M. The X-ray crystallography phase problem solved thanks to AlphaFold and RoseTTAFold models: a case-study report. Acta Crystallogr D Struct Biol 78, 517–531 (2022).
  • 80. Abergel C. Molecular replacement: tricks and treats. Acta Crystallogr D Biol Crystallogr 69, 2167–2173 (2013).
  • 81. Chiliveri SC et al. Experimental NOE, Chemical Shift, and Proline Isomerization Data Provide Detailed Insights into Amelotin Oligomerization. J Am Chem Soc 145, 18063–18074 (2023).
  • 82. Abdollahi H, Prestegard JH & Valafar H. Computational modeling multiple conformational states of proteins with residual dipolar coupling data. Curr Opin Struct Biol 82, 102655 (2023).
  • 83. Sedinkin SL, Burns D, Shukla D, Potoyan DA & Venditti V. Solution Structure Ensembles of the Open and Closed Forms of the ~130 kDa Enzyme I via AlphaFold Modeling, Coarse Grained Simulations, and NMR. J Am Chem Soc 145, 13347–13356 (2023).
  • 84. Li EH et al. Blind assessment of monomeric AlphaFold2 protein structure models with experimental NMR data. J Magn Reson 352, 107481 (2023).
  • 85. Ma P, Li D-W & Brüschweiler R. Predicting protein flexibility with AlphaFold. Proteins (2023) doi: 10.1002/prot.26471.
  • 86. Robertson AJ, Courtney JM, Shen Y, Ying J & Bax A. Concordance of X-ray and AlphaFold2 Models of SARS-CoV-2 Main Protease with Residual Dipolar Couplings Measured in Solution. J Am Chem Soc 143, 19306–19310 (2021).
  • 87. Lenard AJ, Mulder FAA & Madl T. Solvent paramagnetic relaxation enhancement as a versatile method for studying structure and dynamics of biomolecular systems. Prog Nucl Magn Reson Spectrosc 132–133, 113–139 (2022).
  • 88. Koehler Leman J & Künze G. Recent Advances in NMR Protein Structure Prediction with ROSETTA. Int J Mol Sci 24, 7835 (2023).
  • 89. Zhu W, Yang DT & Gronenborn AM. Ligand-Capped Cobalt(II) Multiplies the Value of the Double-Histidine Motif for PCS NMR Studies. J Am Chem Soc 145, 4564–4569 (2023).
  • 90. Klukowski P, Riek R & Güntert P. Time-optimized protein NMR assignment with an integrative deep learning approach using AlphaFold and chemical shift prediction. Sci Adv 9, eadi9323 (2023).
  • 91. McShan AC. Utility of methyl side chain probes for solution NMR studies of large proteins. Journal of Magnetic Resonance Open 14–15, 100087 (2023).
  • 92. Ruschak AM & Kay LE. Methyl groups as probes of supra-molecular structure, dynamics and function. Journal of Biomolecular NMR 46, 75–87 (2009).
  • 93. Pritišanac I, Würz JM, Alderson TR & Güntert P. Automatic structure-based NMR methyl resonance assignment in large proteins. Nat Commun 10, 4922 (2019).
  • 94. Clay MC, Saleh T, Kamatham S, Rossi P & Kalodimos CG. Progress toward automated methyl assignments for methyl-TROSY applications. Structure 30, 69–79.e2 (2022).
  • 95. Giri N, Roy RS & Cheng J. Deep learning for reconstructing protein structures from cryo-EM density maps: Recent advances and future directions. Curr Opin Struct Biol 79, 102536 (2023).
  • 96. Hryc CF & Baker ML. AlphaFold2 and CryoEM: Revisiting CryoEM modeling in near-atomic resolution density maps. iScience 25, 104496 (2022).
  • 97. Reggiano G, Lugmayr W, Farrell D, Marlovits TC & DiMaio F. Residue-level error detection in cryo-electron microscopy models. Structure 31, 860–869.e4 (2023).
  • 98. Dai X, Wu L, Yoo S & Liu Q. Integrating AlphaFold and deep learning for atomistic interpretation of cryo-EM maps. Brief Bioinform 24, bbad405 (2023).
  • 99. Alshammari M, He J & Wriggers W. Appraisal of AlphaFold2-Predicted Models in Cryo-EM Map Interpretation. Microsc Microanal 29, 977–978 (2023).
  • 100. Lindorff-Larsen K & Kragelund BB. On the Potential of Machine Learning to Examine the Relationship Between Sequence, Structure, Dynamics and Function of Intrinsically Disordered Proteins. J Mol Biol 433, 167196 (2021).
  • 101. Wei G, Xi W, Nussinov R & Ma B. Protein Ensembles: How Does Nature Harness Thermodynamic Fluctuations for Life? The Diverse Functional Roles of Conformational Ensembles in the Cell. Chem Rev 116, 6516–6551 (2016).
  • 102. Sala D, Hildebrand PW & Meiler J. Biasing AlphaFold2 to predict GPCRs and kinases with user-defined functional or structural properties. Front Mol Biosci 10, 1121962 (2023).
  • 103. Heo L & Feig M. Multi-state modeling of G-protein coupled receptors at experimental accuracy. Proteins 90, 1873–1885 (2022).
  • 104. Sala D, Engelberger F, Mchaourab HS & Meiler J. Modeling conformational states of proteins with AlphaFold. Curr Opin Struct Biol 81, 102645 (2023).
  • 105. Stein RA & Mchaourab HS. SPEACH_AF: Sampling protein ensembles and conformational heterogeneity with Alphafold2. PLoS Computational Biology 18, (2022).
  • 106. Wallner B. AFsample: improving multimer prediction with AlphaFold using massive sampling. Bioinformatics 39, btad573 (2023).
  • 107. Johansson-Åkhe I & Wallner B. Improving peptide-protein docking with AlphaFold-Multimer using forced sampling. Front Bioinform 2, 959160 (2022).
  • 108. Ramelot TA, Tejero R & Montelione GT. Representing structures of the multiple conformational states of proteins. Current Opinion in Structural Biology 83, 102703 (2023).
  • 109. Townshend RJL et al. Geometric deep learning of RNA structure. Science 373, 1047–1051 (2021).
  • 110. Bojar D & Lisacek F. Glycoinformatics in the Artificial Intelligence Era. Chem Rev 122, 15971–15988 (2022).
