Structural Dynamics. 2024 May 17;11(3):034701. doi: 10.1063/4.0000251

Identifying protein conformational states in the Protein Data Bank: Toward unlocking the potential of integrative dynamics studies

Joseph I J Ellaway 1, Stephen Anyango 1, Sreenath Nair 1, Hossam A Zaki 2, Nurul Nadzirin 1, Harold R Powell 3, Aleksandras Gutmanas 4, Mihaly Varadi 1, Sameer Velankar 1,a)
PMCID: PMC11106648  PMID: 38774441

Abstract

Studying protein dynamics and conformational heterogeneity is crucial for understanding biomolecular systems and treating disease. Despite the deposition of over 215 000 macromolecular structures in the Protein Data Bank and the advent of AI-based structure prediction tools such as AlphaFold2, RoseTTAFold, and ESMFold, static representations are typically produced, which fail to fully capture macromolecular motion. Here, we discuss the importance of integrating experimental structures with computational clustering to explore the conformational landscapes that manifest protein function. We describe the method developed by the Protein Data Bank in Europe – Knowledge Base to identify distinct conformational states, demonstrate the resource's primary use cases through examples, and discuss the need for further efforts to annotate protein conformations with functional information. Such initiatives will be crucial in unlocking the potential of protein dynamics data, expediting drug discovery research, and deepening our understanding of macromolecular mechanisms.

INTRODUCTION

As of February 2024, the Protein Data Bank (PDB),1 the global repository of experimentally determined structures, hosts over 215 000 macromolecular structures. Recent advances in protein structure prediction (made by the new generation of AI-based tools such as AlphaFold2,2 RoseTTAFold,3 and ESMFold4) have predicted almost 1 × 10⁹ further structures, archived in the AlphaFold Protein Structure Database (AFDB),5 the ESM Metagenomic Atlas,4 and the Model Archive.6 Although significant work is ongoing to generate ensemble models, these tools generally predict a single structure per sequence.7

To realize the relationship between protein sequence, structure, and function, we must consider their dynamics: the relative movements between residues. The structure of a protein navigates a high-dimensional conformational landscape, where stable conformations occupy free energy minima.8 The transitions between these minima represent conformation changes, often crucial for protein function, both under physiological conditions and in disease progression.9,10 Changes to the landscape's topology may be induced via ligand association, solvent packing, oligomerization, pH changes, or post-translational modification11–16 [Fig. 1(a), right]. On the far end of the conformational flexibility spectrum are the intrinsically disordered proteins (IDPs), whose free energy landscapes lack deep energy minima [Fig. 1(b)], instead being littered with shallow dips that could become more favorable upon environment changes.17–19 Investigating these landscapes requires diverse experimental techniques, each contributing unique insights into conformational states or motion of proteins20–22 [Fig. 1(c)].

FIG. 1.

Illustration of functional protein conformation changes. (a) Hypothetical free-energy landscape (top) of adenylate kinase's coordination state before (left) and after (right) ligand binding. A dominant minimum is plotted in the ligand-free environment, facilitating one apo conformation (PDB: 6S36, green). Addition of ADP changes the landscape to accommodate a second minimum for the adoption of a closed conformation (PDB: 8CRG, orange), while permitting the existence of the original conformation (PDB: 6F7U, indigo). An energy barrier between these states must be overcome to transition between the conformations. (b) Hypothetical free energy landscape of the human nuclear pore complex protein Nup153—an IDP (PDB: 2EBV). (c) Conformational states of E. coli transcription factor RfaH binding the NusG N-terminal domain (NGN, left), the NusG C-terminal domain (KOW, middle), and when bound to an operon polarity suppressor (ops) DNA sequence in the transcription elongation complex (opsEC, right).33 KOW-bound structure is truncated—solved for the N-terminal domain only. PDB: 2OUG, blue; PDB: 2LCL, purple; PDB: 6C6S, green.

X-ray crystallography has been instrumental in providing atomic-resolution models of proteins. Despite its tendency to capture proteins in static states due to crystal packing, advancements such as temperature-jump and time-resolved serial femtosecond crystallography (SFX) can observe local dynamic processes within crystallized proteins.23–27 In contrast, small-angle x-ray scattering (SAXS) is a low-resolution method for studying larger, global conformation changes in solution.28–31 Combined with experimentally derived or predicted atomic models, integrative SAXS models can offer impressively comprehensive views of macromolecular states and dynamics.24,32

In recent years, cryogenic electron microscopy (cryoEM) has dramatically improved to solve thousands of macromolecules—particularly those that are difficult to crystallize.34 Like x-ray crystallography, cryoEM has traditionally produced single, static models from three-dimensional (3D) projections of many images.35,36 However, advances in direct-electron detectors and image classification software facilitate the reconstruction of conformational ensembles,37–41 offering views of proteins in states closer to their physiological conditions. Furthermore, the study of individual molecules from cryo-electron tomography (cryoET) reveals the conformation of macromolecules in situ.42–47 Such physiological insight was once the preserve of nuclear magnetic resonance (NMR) spectroscopy, which excels at detailing structure and dynamics over a range of timescales in near-physiological conditions.48–51 NMR can detect both transient states52 and intrinsically disordered regions,53,54 providing insight into protein movements and interactions crucial for biological function. However, its application is typically limited to smaller proteins and complexes—complementing the data collected by cryoEM, which struggles to resolve smaller macromolecules.38,48

The success of AI-based tools at predicting protein structures from amino acid sequences has marked a significant milestone in structural biology.2–4 However, modeling the conformational states of proteins remains a frontier,55–59 as demonstrated by the general tendency of AlphaFold2 to predict structures in similar conformations.56,60–65 Innovations have emerged where modifications to the multiple sequence alignments (MSAs), a key input for many structure prediction tools, enable the exploration of more diverse protein conformations.7,61,64,66,67 For example, the AF-Cluster technique has demonstrated through experimental validation that AlphaFold can predict multiple states of the fold-switching protein KaiB.65,68

While structure prediction tools can help investigate conformational heterogeneity, molecular dynamics (MD) simulations remain indispensable for probing the theoretical dynamic behavior of macromolecules, complementing the generally static models provided by AI-based predictions and experimental data.69–71 Despite their computational cost and the challenges associated with force field accuracy, MD simulations are invaluable tools for exploring the conformation space and potential biological activities of proteins, helping to identify novel ligand-binding sites crucial for drug discovery.63,72–74

Here, we describe the method the Protein Data Bank in Europe – Knowledge Base75 (PDBe-KB) uses to aggregate and cluster protein conformational states, primarily from x-ray, cryoEM, and NMR structures deposited in the PDB.

METHODS

The first step of the clustering process is to collate polypeptide chains from the PDB with 100% sequence identity into groups called segments [Fig. 2(a)]. A single segment will contain only structures mapping to a contiguous section of their corresponding UniProt sequence, potentially resulting in multiple segments per UniProt sequence (such as truncated N- or C-terminal domains). Each polypeptide in the PDB archive is mapped to a corresponding UniProt sequence using the SIFTS annotation tool.76,77 Only chains within segments are subsequently considered for clustering.
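The segment assignment described above can be sketched as interval grouping. This is a minimal illustration, not the PDBe-KB implementation: the input tuple format, the function name `assign_segments`, and the rule that any overlap in UniProt residue range places chains in the same segment are all assumptions made for the example.

```python
from collections import defaultdict


def assign_segments(chains):
    """Group chains into segments of overlapping UniProt residue ranges.

    `chains` is a list of (chain_id, accession, start, end) tuples, where
    start and end are inclusive UniProt residue numbers. This input format
    is hypothetical and chosen only for illustration.
    """
    by_accession = defaultdict(list)
    for chain_id, accession, start, end in chains:
        by_accession[accession].append((start, end, chain_id))

    segments = {}
    for accession, ranges in by_accession.items():
        ranges.sort()  # process chains from N- to C-terminal start position
        groups, current, current_end = [], [], None
        for start, end, chain_id in ranges:
            # A chain starting beyond the running end opens a new segment.
            if current and start > current_end:
                groups.append(current)
                current, current_end = [], None
            current.append(chain_id)
            current_end = end if current_end is None else max(current_end, end)
        if current:
            groups.append(current)
        segments[accession] = groups
    return segments
```

Sorting by start position first means each chain only needs to be compared against the running end of the current segment, so the grouping is a single pass per accession.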

FIG. 2.

Automated identification of protein conformational states across the PDB archive. (a) All chains of a given UniProt accession (100% sequence identity) are assigned to segments based on their overlap with the reference UniProt sequence. Non-overlapping sequences are grouped into separate segments. (b) Chains are superposed to all other chains within their assigned segment. (c) Chain–chain GLOCON scores are calculated for all polypeptides within a segment (refer to Ref. 89 for formal definition) before (d) agglomerative clustering is performed. The results are displayed in 3D on PDBe-KB aggregated views of proteins pages.

Next, we calculate the Euclidean distances between Cα atoms per residue pair, leading to a transformation-independent Cα distance matrix for each chain. Polypeptides are compared pairwise by calculating the absolute difference between their Cα distance matrices, capturing the chain–chain differences in Cα position independently of the chains' original Cartesian coordinates. This difference matrix is filtered by setting elements below 3 Å to zero, removing small discrepancies in Cα placement between structures. To condense the filtered difference matrix, its upper-triangle elements are summed and normalized by multiplication with the fraction of modeled residues, penalizing any gaps in the structures [Fig. 2(b)]. This measure captures the GLObal CONformation (GLOCON) difference as a dissimilarity score between chains.
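The scoring steps in this paragraph can be sketched in a few lines. The sketch follows the prose literally and is not the formal GLOCON definition (given in Ref. 89). The function name `glocon_like_score`, the use of `None` for unmodeled residues, and the choice to count a residue as modeled only when it is present in both chains are assumptions of the example.

```python
import math


def glocon_like_score(coords_a, coords_b, cutoff=3.0):
    """Dissimilarity between two chains from their C-alpha distance matrices.

    `coords_a` and `coords_b` are equal-length lists indexed by the same
    reference residue positions; each item is an (x, y, z) tuple in
    angstroms, or None where the residue is not modeled (assumed format).
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("chains must be aligned to the same residue index")
    n = len(coords_a)
    modeled = [
        i for i in range(n)
        if coords_a[i] is not None and coords_b[i] is not None
    ]
    if len(modeled) < 2:
        return 0.0

    total = 0.0
    for pos, i in enumerate(modeled):
        for j in modeled[pos + 1:]:  # upper triangle only
            d_a = math.dist(coords_a[i], coords_a[j])
            d_b = math.dist(coords_b[i], coords_b[j])
            diff = abs(d_a - d_b)
            if diff >= cutoff:  # differences below the cutoff are zeroed
                total += diff
    # Multiply by the fraction of modeled residues, as the text describes.
    return total * len(modeled) / n
```

Because only intra-chain distances enter the calculation, translating or rotating either chain leaves the score unchanged, which is why no superposition is needed at this stage.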

Next, we use UPGMA agglomerative clustering to group chains based on their GLOCON scores, splitting the segment into clusters that approximate potential conformational states [Figs. 2(c) and 2(d)]. The GLOCON dissimilarity score allows this clustering method to detect small structural differences (such as changes in loop position), for example in the manganese ABC transporter's Leu127-Lys135 region (UniProt accession: P0A4G2). However, small differences could be obscured where both small and large differences (such as domain movements or fold switches) occur within the same segment. Reasonable separation into clusters is generally achievable at 70% of the maximum GLOCON score, although this threshold could be further optimized per segment. All chains are superposed (independently of the clustering step) using GESAMT, which identifies structurally conserved regions between possibly heterogeneous structures.78 Where NMR structures are clustered, the first model of the ensemble is selected as a reference. PDBe runs this pipeline weekly,75 predicting conformations for the entire PDB archive.
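To make the clustering step concrete, the following is a small pure-Python sketch of UPGMA (average-linkage) clustering cut at a fraction of the maximum pairwise score. It is an illustration under stated assumptions rather than the pipeline's code: the function name `upgma_clusters`, the dense list-of-lists input, and stopping as soon as the closest pair of clusters exceeds the cutoff are choices made for this example. In practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` would likely be preferred.

```python
def upgma_clusters(labels, dist, cutoff_fraction=0.7):
    """Cluster items by average linkage, cutting at a fraction of the max score.

    `labels` names the chains and `dist` is a symmetric matrix (list of
    lists) of pairwise dissimilarity scores in the same order.
    """
    n = len(labels)
    max_score = max(
        (dist[i][j] for i in range(n) for j in range(i + 1, n)), default=0.0
    )
    cutoff = cutoff_fraction * max_score
    clusters = [[i] for i in range(n)]

    def average_linkage(a, b):
        # UPGMA distance: mean of all member-to-member scores.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > cutoff:  # remaining clusters are all further apart than the cut
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)

    return sorted(sorted(labels[i] for i in c) for c in clusters)
```

For four chains where two pairs score 1 and 2 internally and 10 to 12 across pairs, the cutoff is 0.7 × 12 = 8.4, so the two tight pairs merge and the final merge at 10.75 is rejected, leaving two clusters.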

Alongside the experimentally derived structures, our process allows users to superpose the corresponding AlphaFold2 model, supplementing the cluster results. The root-mean-squared deviation (RMSD) of the AlphaFold2 model from each cluster's representative chain is calculated and displayed. This comparison allows users to quickly identify the conformational state predicted by the full-length AlphaFold2 model, potentially expediting functional characterization.
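As a rough illustration of this comparison, the sketch below computes the RMSD between a predicted model and each cluster representative and reports the closest cluster. It assumes the coordinates have already been superposed and matched residue by residue; the text does not specify how the RMSD is computed, and the names `rmsd` and `closest_cluster` are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-squared deviation between two matched coordinate lists.

    Both inputs are equal-length lists of (x, y, z) tuples that are assumed
    to be already superposed and paired residue by residue.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal length")
    squared = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))


def closest_cluster(model_coords, representatives):
    """Return the cluster name with the lowest RMSD to the model, plus all scores.

    `representatives` maps a cluster name to its representative chain's
    coordinates (hypothetical structure for this example).
    """
    scores = {
        name: rmsd(model_coords, coords)
        for name, coords in representatives.items()
    }
    return min(scores, key=scores.get), scores
```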

To test the clustering pipeline, we manually curated a benchmark dataset of polypeptide chains in the PDB archive that adopt open or closed conformations,89 similar to previous datasets characterizing distinct secondary structure changes during fold switching.79 An initial search identified 630 unique entries whose PDB entry titles contained “open” or “closed”; the results were then filtered to remove spurious substring matches (e.g., cyclopentadienyl). Publications for the remaining 315 entries were read to designate labels of conformational states. The dataset comprises a range of structural variations at different scales, such as a ∼5 Å loop movement in α-fucosidase (UniProt accession: J9UN47), a set of intra-domain residue rearrangements in NMR structures (e.g., PDB code: 6qeb) of human carbonic anhydrase (UniProt accession: P00918), and a ∼20 Å C-terminal domain movement at 5′-deoxynucleotidase's Glu332 hinge (UniProt accession: P21589). We make the dataset available through the PDBe-KB's FTP server and Kaggle.
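The title search and substring filtering can be illustrated with a word-boundary regular expression. This is a swapped-in technique that performs both steps at once, not the authors' procedure, and the entry identifiers, titles, and function name `label_candidates` are invented for the example.

```python
import re

# Match "open" or "closed" only as whole words, so that substrings inside
# longer words (for example "cyclopentadienyl") are ignored.
STATE_PATTERN = re.compile(r"\b(open|closed)\b", re.IGNORECASE)


def label_candidates(titles):
    """Return entries whose title mentions an open or closed state.

    `titles` maps an entry identifier to its title; the result maps each
    matching identifier to the sorted list of states named in the title.
    """
    hits = {}
    for entry_id, title in titles.items():
        states = {match.lower() for match in STATE_PATTERN.findall(title)}
        if states:
            hits[entry_id] = sorted(states)
    return hits
```

Candidates found this way would still need the manual reading of publications described above, since a title can mention a state without the deposited chain adopting it.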

All the data from the clustering process are openly accessible from the PDBe-KB FTP area, through API end points in the PDBe Aggregated API and via the PDBe-KB aggregated views of proteins. The code is open source and available on GitHub under the Apache 2.0 license.

RESULTS: NOTABLE EXAMPLES FROM THE ARCHIVE

The PDB provides a rich sampling of protein conformation space, where independently solved structures have identical sequences. Although a significant portion of the biologically meaningful conformation space has been captured, it is non-trivial to identify distinct conformations across all PDB entries.80,81 For example, hexokinase from Sulfurisphaera tokodaii (UniProt accession: Q96Y14) is the first glycolytic enzyme that initializes respiration and is essential during anaerobic conditions [Fig. 3(a)]. The kinase is moderately promiscuous to sugar substrates,82 allowing it to associate with glucose, mannose, glucosamine, xylose, and N-acetylglucosamine. Hexokinase adopts an open or a closed conformation, dependent on sugar binding, although ADP binding has a marginal effect on the protein's shape. Our automated pipeline can discern between the open and closed states, even identifying the open and closed chains solved within the asymmetric unit of 2E2Q [Fig. 3(a)].

FIG. 3.

Notable examples of predicted conformational states by the PDBe-KB. (a) Clustering results in dendrogram (left) and structures (right) of the open–closed conformation change made by UniProt: Q96Y14. XYP (red) denotes β-D-xylopyranose and ADP (light green) denotes adenosine diphosphate, both bound to 2E2Q chain A. 2E2N chain A is an apo-form of the polypeptide. RMSD calculated between AlphaFold2 model and experimental structures. (b) Substrate promiscuity illustrated by consistent binding of diverse ligands (magenta), despite the polypeptide (UniProt: P15121) adopting a consistent conformation. Mean RMSD displayed for the collection of ligand-bound structures (top left), the AlphaFold2 structure to the two representative chains (top right), and between representative chains (bottom right). Structural variation between ligand-bound structures is relatively low, with a standard deviation in RMSD of 0.16 Å. Ligand-free structure (yellow) has a displaced loop in the Pro211-Asp230 region. (c) Fold-switch protein (UniProt: Q79V61) transitioning to control day–night cycle. Clustering dendrogram (left) with AlphaFold2 structure superposed alongside experimentally determined models. RMSD calculated between AlphaFold2 model and experimental structures.

Additionally, human aldose reductase (UniProt accession: P15121) accepts a diverse range of carbonyl-based substrates, reducing them to alcohol products using NADH as an electron source [Fig. 3(b)]. Many structures of this protein have been independently solved with a variety of ligands, providing information on the conformational heterogeneity within the holo state.83 Individual PDB entries fail to capture the structural heterogeneity in the β-sheet region spanning Val121-Arg156, but our pipeline can separate the only non-liganded structure in the PDB (1XGD) from all other ligand-bound chains. Superposition of all chains highlights a structural deviation in the unliganded structure within the Pro211-Asp230 loop.

Finally, the circadian rhythm protein KaiB (UniProt: Q79V61) helps regulate the day–night cycle in cyanobacteria and has been previously characterized as a fold-switch protein79 [Fig. 3(c)]. Associating with KaiA and KaiC, KaiB from Thermosynechococcus vestitus partakes in a concerted cycle of complex formation, autophosphorylation, and autodephosphorylation of KaiC, completing each oscillation every ∼24 h.68 KaiB adopts a homotetrameric ground state during the day and a thioredoxin-like “fold-switch” state at night. The fold-switch state is ordinarily stabilized upon oligomerization with KaiC and KaiB subunits, forming a multimeric complex.68 The clustering method described here identifies the structures solved in these two states and highlights that the protein's AlphaFold2 model from the AFDB is closer in conformation to the night-dominant fold-switch state.

DISCUSSION

Exploring protein dynamics and conformational heterogeneity is essential for understanding molecular mechanisms and disease progression. However, capturing the full range of biologically relevant conformations—even for a single polypeptide—poses significant challenges beyond solving or predicting a static structure.22,81 Numerous experimental and computational methods characterize macromolecular dynamics, but a lack of standardization hinders comprehensive data integration. The next generation of integrative methodologies promises to combine diverse experimental data and computational techniques to achieve accurate and meaningful representations of conformational heterogeneity.22 Here, we have presented the method of clustering the static structures archived in the PDB. These clusters may depict some of the most stable, highly populated protein conformations (at 100% sequence identity) but cannot represent the complete free-energy topology nor the pathways traversed during conformational state transitions.

Nevertheless, even a high-accuracy representation of structural dynamics will be of limited value in answering biological questions unless contextualized with functional information. Attributing biological significance to conformational differences becomes much more challenging without annotations, such as ligand binding, oligomeric state, post-translational modifications, and point mutations, to name a few.11–16 When comparing more distantly related proteins, ontological annotations and domain mappings from resources such as CATH84 and SCOP85 can help systematically explore sequence–structure–function relationships. Automated annotation methods, utilizing structural motifs, domain composition, and comparative modeling, will be useful for predicting functions of uncharacterized proteins and their distinct conformations. Tools such as DALI,86 SSAP,87 and Foldseek88 are currently available for the identification of evolutionary relationships and functional similarities via structural comparison. As more conformations are determined experimentally, improving ensemble-model prediction algorithms, high-quality functional annotations must be integrated to enable systematic analysis of structural diversity across different structure data archives. The PDBe-KB superposition and clustering pipeline presented here is a step toward this goal, but the collation of annotations is now needed before biological relevance can be systematically mapped to distinct protein conformations.

CONCLUSION

Here, we present a deterministic data pipeline that clusters all proteins in the PDB archive based on model coordinates, independently of superposition. We demonstrated that the process can automatically identify distinct conformations, which, due to lack of standardized labeling in the archive, would otherwise be non-trivial to find in the PDB.

However, the lack of systematic, high-quality conformational state annotations impedes our understanding of the biological implications of protein dynamics. As such functional annotations become available, high-throughput mapping to conformations could be driven by initiatives such as the PDBe-KB consortium, which has laid the groundwork for creating unified data access mechanisms and standard data exchange formats for a broad range of functional annotations.

Unlocking the potential of protein dynamics involves a multifaceted approach to understand their roles in biological mechanisms. It demands application of the innovative multimodal approaches seen in integrative modeling, combined with continued infrastructure improvement for high-throughput annotation and data access. As the field advances, these efforts will support the development of novel therapeutic strategies and help us realize the relationship between protein sequence, structure, and function.

ACKNOWLEDGMENTS

The authors would like to thank the UKRI-Biotechnology and Biological Sciences Research Council for providing funding under the FunCLAN (No. BB/V016113/1) project and the European Molecular Biology Laboratory-European Bioinformatics Institute for supporting development of the service.

Note: Paper published as part of the special topic Tribute to Olga Kennard (1924-2023).

AUTHOR DECLARATIONS

Conflict of Interest

The authors have no conflicts to disclose.

Author Contributions

Joseph I. J. Ellaway: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Writing – original draft (equal); Writing – review & editing (equal). Stephen Anyango: Resources (equal); Software (equal). Sreenath Nair: Resources (equal); Software (equal). Hossam A. Zaki: Conceptualization (equal); Methodology (supporting); Software (supporting); Writing – review & editing (supporting). Nurul Nadzirin: Software (equal). Harold R. Powell: Methodology (equal); Software (equal); Writing – review & editing (supporting). Aleksandras Gutmanas: Conceptualization (equal); Software (equal); Writing – review & editing (equal). Mihaly Varadi: Conceptualization (equal); Funding acquisition (equal); Investigation (supporting); Methodology (supporting); Project administration (lead); Supervision (lead); Writing – original draft (equal); Writing – review & editing (lead). Sameer Velankar: Conceptualization (equal); Funding acquisition (equal); Investigation (equal); Methodology (equal); Project administration (equal); Supervision (equal); Writing – review & editing (equal).

DATA AVAILABILITY

The data that support the findings of this study are available within the article and its supplementary material.

References

  • 1.wwPDB consortium, “ Protein Data Bank: The single global archive for 3D macromolecular structure data,” Nucl. Acids Res. 47, D520–D528 (2019). 10.1093/nar/gky949 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. Jumper J. et al. , “ Highly accurate protein structure prediction with AlphaFold,” Nature 596, 583–589 (2021). 10.1038/s41586-021-03819-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Baek M. et al. , “ Accurate prediction of protein structures and interactions using a three-track neural network,” Science 373, 871–876 (2021). 10.1126/science.abj8754 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Lin Z. et al. , “ Evolutionary-scale prediction of atomic-level protein structure with a language model,” Science 379, 1123–1130 (2023). 10.1126/science.ade2574 [DOI] [PubMed] [Google Scholar]
  • 5. Varadi M. et al. , “ AlphaFold Protein Structure Database: Massively expanding the structural coverage of protein-sequence space with high-accuracy models,” Nucl. Acids Res. 50, D439–D444 (2022). 10.1093/nar/gkab1061 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Schwede T. et al. , “ Outcome of a workshop on applications of protein models in biomedical research,” Structure 17, 151–159 (2009). 10.1016/j.str.2008.12.014 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Jing B., Berger B., and Jaakkola T., “ AlphaFold meets flow matching for generating protein ensembles,” preprint arXiv:2402.04845 (2024).
  • 8. Henzler-Wildman K. and Kern D., “ Dynamic personalities of proteins,” Nature 450, 964–972 (2007). 10.1038/nature06522 [DOI] [PubMed] [Google Scholar]
  • 9. Xue L. et al. , “ Visualizing translation dynamics at atomic detail inside a bacterial cell,” Nature 610, 205–211 (2022). 10.1038/s41586-022-05255-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Weng C., Faure A. J., Escobedo A., and Lehner B., “ The energetic and allosteric landscape for KRAS inhibition,” Nature 626, 643–652 (2024). 10.1038/s41586-023-06954-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11. Pozzati G. et al. , “ Limits and potential of combined folding and docking,” Bioinformatics 38, 954–961 (2022). 10.1093/bioinformatics/btab760 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12. Høie M. H., Cagiada M., Frederiksen A. H. B., Stein A., and Lindorff-Larsen K., “ Predicting and interpreting large-scale mutagenesis data using analyses of protein stability and conservation,” Cell Rep. 38, 110207 (2022). 10.1016/j.celrep.2021.110207 [DOI] [PubMed] [Google Scholar]
  • 13. Fu T.-M. et al. , “ Cryo-EM structure of caspase-8 tandem DED filament reveals assembly and regulation mechanisms of the death-inducing signaling complex,” Mol. Cell 64, 236–250 (2016). 10.1016/j.molcel.2016.09.009 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Rimmerman D. et al. , “ Revealing fast structural dynamics in pH-responsive peptides with time-resolved x-ray scattering,” J. Phys. Chem. B 123, 2016–2021 (2019). 10.1021/acs.jpcb.9b00072 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. Zimmermann N., Noga A., Obbineni J. M., and Ishikawa T., “ ATP‐induced conformational change of axonemal outer dynein arms revealed by cryo‐electron tomography,” EMBO J. 42, e112466 (2023). 10.15252/embj.2022112466 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. Adamoski D. et al. , “ Molecular mechanism of glutaminase activation through filamentation and the role of filaments in mitophagy protection,” Nat. Struct. Mol. Biol. 30, 1902–1912 (2023). 10.1038/s41594-023-01118-0 [DOI] [PubMed] [Google Scholar]
  • 17. Abyzov A., Blackledge M., and Zweckstetter M., “ Conformational dynamics of intrinsically disordered proteins regulate biomolecular condensate chemistry,” Chem. Rev. 122, 6719–6748 (2022). 10.1021/acs.chemrev.1c00774 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Thomasen F. E. and Lindorff-Larsen K., “ Conformational ensembles of intrinsically disordered proteins and flexible multidomain proteins,” Biochem. Soc. Trans. 50, 541–554 (2022). 10.1042/BST20210499 [DOI] [PubMed] [Google Scholar]
  • 19. Qin S. and Zhou H.-X., “ Effects of macromolecular crowding on the conformational ensembles of disordered proteins,” J. Phys. Chem. Lett. 4, 3429–3434 (2013). 10.1021/jz401817x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Schröder G. F., “ Hybrid methods for macromolecular structure determination: Experiment with expectations,” Curr. Opin. Struct. Biol. 31, 20–27 (2015). 10.1016/j.sbi.2015.02.016 [DOI] [PubMed] [Google Scholar]
  • 21. van den Bedem H. and Fraser J. S., “ Integrative, dynamic structural biology at atomic resolution—It's about time,” Nat. Methods 12, 307–318 (2015). 10.1038/nmeth.3324 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22. Grandori R., “ Protein structure and dynamics in the era of integrative structural biology,” Front. Biophys. 1, 1219843 (2023). 10.3389/frbis.2023.1219843 [DOI] [Google Scholar]
  • 23. Wolff A. M. et al. , “ Mapping protein dynamics at high spatial resolution with temperature-jump X-ray crystallography,” Nat. Chem. 15, 1549–1558 (2023). 10.1038/s41557-023-01329-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24. Du S. et al. , “ Refinement of multiconformer ensemble models from multi-temperature X-ray diffraction data,” Methods Enzymol. 688, 223–254 (2023). 10.1016/bs.mie.2023.06.009 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25. Nogly P. et al. , “ Retinal isomerization in bacteriorhodopsin captured by a femtosecond X-ray laser,” Science 361, eaat0094 (2018). 10.1126/science.aat0094 [DOI] [PubMed] [Google Scholar]
  • 26. Coquelle N. et al. , “ Chromophore twisting in the excited state of a photoswitchable fluorescent protein captured by time-resolved serial femtosecond crystallography,” Nat. Chem. 10, 31–37 (2018). 10.1038/nchem.2853 [DOI] [PubMed] [Google Scholar]
  • 27. Oda K. et al. , “ Time-resolved serial femtosecond crystallography reveals early structural changes in channelrhodopsin,” eLife 10, e62389 (2021). 10.7554/eLife.62389 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Rambo R. P. and Tainer J. A., “ Accurate assessment of mass, models and resolution by small-angle scattering,” Nature 496, 477–481 (2013). 10.1038/nature12070 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Cho H. S. et al. , “ Dynamics of quaternary structure transitions in R-state carbonmonoxyhemoglobin unveiled in time-resolved X-ray scattering patterns following a temperature jump,” J. Phys. Chem. B 122, 11488–11496 (2018). 10.1021/acs.jpcb.8b07414 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Josts I. et al. , “ Photocage-initiated time-resolved solution X-ray scattering investigation of protein dimerization,” IUCrJ 5, 667–672 (2018). 10.1107/S2052252518012149 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Caporaletti F. et al. , “ Small-angle x-ray and neutron scattering of MexR and its complex with DNA supports a conformational selection binding model,” Biophys. J. 122, 408–418 (2023). 10.1016/j.bpj.2022.11.2949 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Narayanan T. et al. , “ A multipurpose instrument for time-resolved ultra-small-angle and coherent X-ray scattering,” J. Appl. Crystallogr. 51, 1511–1524 (2018). 10.1107/S1600576718012748 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Kang J. Y. et al. , “ Structural basis for transcript elongation control by NusG family universal regulators,” Cell 173, 1650–1662.e14 (2018). 10.1016/j.cell.2018.05.017 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34. Nwanochie E. and Uversky V. N., “ Structure determination by single-particle cryo-electron microscopy: Only the sky (and intrinsic disorder) is the limit,” Int. J. Mol. Sci. 20(17), 4186 (2019). 10.3390/ijms20174186 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35. Punjani A., Zhang H., and Fleet D. J., “ Non-uniform refinement: Adaptive regularization improves single-particle cryo-EM reconstruction,” Nat. Methods 17, 1214–1221 (2020). 10.1038/s41592-020-00990-8 [DOI] [PubMed] [Google Scholar]
  • 36. Gupta H., McCann M. T., Donati L., and Unser M., “ CryoGAN: A new reconstruction paradigm for single-particle cryo-EM via deep adversarial learning,” IEEE Trans. Comput. Imaging 7, 759–774 (2021). 10.1109/TCI.2021.3096491 [DOI] [Google Scholar]
  • 37. Zhong E. D., Lerer A., Davis J. H., and Berger B., “ CryoDRGN2: Ab initio neural reconstruction of 3D protein structures from real cryo-EM images,” in 2021 IEEE/CVF International Conference on Computer Vision (ICCV) ( IEEE, 2021), pp. 4046–4055. [Google Scholar]
  • 38. Tang W. S., Zhong E. D., Hanson S. M., Thiede E. H., and Cossio P., “ Conformational heterogeneity and probability distributions from single-particle cryo-electron microscopy,” Curr. Opin. Struct. Biol. 81, 102626 (2023). 10.1016/j.sbi.2023.102626 [DOI] [PubMed] [Google Scholar]
  • 39. Kinman L. F., Powell B. M., Zhong E. D., Berger B., and Davis J. H., “ Uncovering structural ensembles from single-particle cryo-EM data using cryoDRGN,” Nat. Protoc. 18, 319–339 (2023). 10.1038/s41596-022-00763-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. Chen M. and Ludtke S. J., “ Deep learning-based mixed-dimensional Gaussian mixture model for characterizing variability in cryo-EM,” Nat. Methods 18, 930–936 (2021). 10.1038/s41592-021-01220-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41. Chen M., Toader B., and Lederman R., “ Integrating molecular models into cryoEM heterogeneity analysis using scalable high-resolution deep gaussian mixture models,” J. Mol. Biol. 435, 168014 (2023). 10.1016/j.jmb.2023.168014 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Rangan R. et al., “Deep reconstructing generative networks for visualizing dynamic biomolecules inside cells,” preprint arXiv:18.553799 (2023).
  • 43. Zhang H. et al., “A method for restoring signals and revealing individual macromolecule states in cryo-ET, REST,” Nat. Commun. 14, 2937 (2023). doi: 10.1038/s41467-023-38539-w
  • 44. Zeng X. et al., “High-throughput cryo-ET structural pattern mining by unsupervised deep iterative subtomogram clustering,” Proc. Natl. Acad. Sci. U. S. A. 120, e2213149120 (2023). doi: 10.1073/pnas.2213149120
  • 45. Hagen W. J. H., Wan W., and Briggs J. A. G., “Implementation of a cryo-electron tomography tilt-scheme optimized for high resolution subtomogram averaging,” J. Struct. Biol. 197, 191–198 (2017). doi: 10.1016/j.jsb.2016.06.007
  • 46. Khavnekar S. et al., “Multishot tomography for high-resolution in situ subtomogram averaging,” J. Struct. Biol. 215, 107911 (2023). doi: 10.1016/j.jsb.2022.107911
  • 47. de Teresa-Trueba I. et al., “Convolutional networks for supervised mining of molecular patterns within cellular context,” Nat. Methods 20, 284–294 (2023). doi: 10.1038/s41592-022-01746-2
  • 48. Ramelot T. A., Tejero R., and Montelione G. T., “Representing structures of the multiple conformational states of proteins,” Curr. Opin. Struct. Biol. 83, 102703 (2023). doi: 10.1016/j.sbi.2023.102703
  • 49. Wapeesittipan P., Mey A. S. J. S., Walkinshaw M. D., and Michel J., “Allosteric effects in cyclophilin mutants may be explained by changes in nano-microsecond time scale motions,” Commun. Chem. 2, 41 (2019). doi: 10.1038/s42004-019-0136-1
  • 50. Karschin N., Becker S., and Griesinger C., “Interdomain dynamics via paramagnetic NMR on the highly flexible complex calmodulin/Munc13-1,” J. Am. Chem. Soc. 144, 17041–17053 (2022). doi: 10.1021/jacs.2c06611
  • 51. Overbeck J. H., Stelzig D., Fuchs A.-L., Wurm J. P., and Sprangers R., “Observation of conformational changes that underlie the catalytic cycle of Xrn2,” Nat. Chem. Biol. 18, 1152–1160 (2022). doi: 10.1038/s41589-022-01111-6
  • 52. Stiller J. B. et al., “Structure determination of high-energy states in a dynamic protein ensemble,” Nature 603, 528–535 (2022). doi: 10.1038/s41586-022-04468-9
  • 53. Jensen M. R., Zweckstetter M., Huang J., and Blackledge M., “Exploring free-energy landscapes of intrinsically disordered proteins at atomic resolution using NMR spectroscopy,” Chem. Rev. 114, 6632–6660 (2014). doi: 10.1021/cr400688u
  • 54. Camacho-Zarco A. R. et al., “NMR provides unique insight into the functional dynamics and interactions of intrinsically disordered proteins,” Chem. Rev. 122, 9331–9356 (2022). doi: 10.1021/acs.chemrev.1c01023
  • 55. Ahdritz G. et al., “OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization,” preprint arXiv:20.517210 (2022).
  • 56. Chakravarty D., Schafer J. W., Chen E. A., Thole J. R., and Porter L. L., “AlphaFold2 has more to learn about protein energy landscapes,” preprint arXiv:12.571380 (2023).
  • 57. Guo H.-B. et al., “AlphaFold2 models indicate that protein sequence determines both structure and dynamics,” Sci. Rep. 12, 10696 (2022). doi: 10.1038/s41598-022-14382-9
  • 58. Lane T. J., “Protein structure prediction has reached the single-structure frontier,” Nat. Methods 20, 170–173 (2023). doi: 10.1038/s41592-022-01760-4
  • 59. Sala D., Engelberger F., Mchaourab H. S., and Meiler J., “Modeling conformational states of proteins with AlphaFold,” Curr. Opin. Struct. Biol. 81, 102645 (2023). doi: 10.1016/j.sbi.2023.102645
  • 60. Stein R. A. and Mchaourab H. S., “SPEACH_AF: Sampling protein ensembles and conformational heterogeneity with Alphafold2,” PLOS Comput. Biol. 18, e1010483 (2022). doi: 10.1371/journal.pcbi.1010483
  • 61. del Alamo D., Sala D., Mchaourab H. S., and Meiler J., “Sampling alternative conformational states of transporters and receptors with AlphaFold2,” eLife 11, e75751 (2022). doi: 10.7554/eLife.75751
  • 62. Huang Y. J. et al., “Assessment of prediction methods for protein structures determined by NMR in CASP14: Impact of AlphaFold2,” Proteins Struct. Funct. Bioinform. 89, 1959–1976 (2021). doi: 10.1002/prot.26246
  • 63. Heo L. and Feig M., “Multi-state modeling of G-protein coupled receptors at experimental accuracy,” Proteins Struct. Funct. Bioinform. 90, 1873–1885 (2022). doi: 10.1002/prot.26382
  • 64. Saldaño T. et al., “Impact of protein conformational diversity on AlphaFold predictions,” Bioinformatics 38, 2742–2748 (2022). doi: 10.1093/bioinformatics/btac202
  • 65. Wayment-Steele H. K., Ovchinnikov S., Colwell L., and Kern D., “Prediction of multiple conformational states by combining sequence clustering with AlphaFold2,” preprint arXiv:17.512570 (2022).
  • 66. Wallner B., “AFsample: Improving multimer prediction with AlphaFold using massive sampling,” Bioinformatics 39, btad573 (2023). doi: 10.1093/bioinformatics/btad573
  • 67. Karamanos T. K., “Chasing long-range evolutionary couplings in the AlphaFold era,” Biopolymers 114, e23530 (2023). doi: 10.1002/bip.23530
  • 68. Tseng R. et al., “Structural basis of the day-night transition in a bacterial circadian clock,” Science 355, 1174–1180 (2017). doi: 10.1126/science.aag2516
  • 69. Banerjee A., Saha S., Tvedt N. C., Yang L.-W., and Bahar I., “Mutually beneficial confluence of structure-based modeling of protein dynamics and machine learning methods,” Curr. Opin. Struct. Biol. 78, 102517 (2023). doi: 10.1016/j.sbi.2022.102517
  • 70. Gupta A., Dey S., Hicks A., and Zhou H.-X., “Artificial intelligence guided conformational mining of intrinsically disordered proteins,” Commun. Biol. 5, 610 (2022). doi: 10.1038/s42003-022-03562-y
  • 71. Janson G., Valdes-Garcia G., Heo L., and Feig M., “Direct generation of protein conformational ensembles via machine learning,” Nat. Commun. 14, 774 (2023). doi: 10.1038/s41467-023-36443-x
  • 72. Park C. W. et al., “Accurate and scalable graph neural network force field and molecular dynamics with direct force architecture,” Npj Comput. Mater. 7, 73 (2021). doi: 10.1038/s41524-021-00543-3
  • 73. Tian J. et al., “Revealing the conformational dynamics of UDP-GlcNAc recognition by O-GlcNAc transferase via Markov state model,” Int. J. Biol. Macromol. 256, 128405 (2024). doi: 10.1016/j.ijbiomac.2023.128405
  • 74. Wang D. et al., “Efficient sampling of high-dimensional free energy landscapes using adaptive reinforced dynamics,” Nat. Comput. Sci. 2, 20–29 (2021). doi: 10.1038/s43588-021-00173-1
  • 75. Varadi M. et al., “PDBe and PDBe-KB: Providing high-quality, up-to-date and integrated resources of macromolecular structures to support basic and applied research and education,” Protein Sci. 31, e4439 (2022). doi: 10.1002/pro.4439
  • 76. Dana J. M. et al., “SIFTS: Updated structure integration with function, taxonomy and sequences resource allows 40-fold increase in coverage of structure-based annotations for proteins,” Nucl. Acids Res. 47, D482–D489 (2019). doi: 10.1093/nar/gky1114
  • 77. Velankar S. et al., “SIFTS: Structure integration with function, taxonomy and sequences resource,” Nucl. Acids Res. 41, D483–D489 (2012). doi: 10.1093/nar/gks1258
  • 78. Krissinel E., “Enhanced fold recognition using efficient short fragment clustering,” J. Mol. Biochem. 1, 76–85 (2012).
  • 79. Porter L. L. and Looger L. L., “Extant fold-switching proteins are widespread,” Proc. Natl. Acad. Sci. U. S. A. 115, 5968–5973 (2018). doi: 10.1073/pnas.1800168115
  • 80. Burra P. V., Zhang Y., Godzik A., and Stec B., “Global distribution of conformational states derived from redundant models in the PDB points to non-uniqueness of the protein structure,” Proc. Natl. Acad. Sci. U. S. A. 106, 10505–10510 (2009). doi: 10.1073/pnas.0812152106
  • 81. Miller M. D. and Phillips G. N., “Moving beyond static snapshots: Protein dynamics and the Protein Data Bank,” J. Biol. Chem. 296, 100749 (2021). doi: 10.1016/j.jbc.2021.100749
  • 82. Nishimasu H., Fushinobu S., Shoun H., and Wakagi T., “Crystal structures of an ATP-dependent hexokinase with broad substrate specificity from the hyperthermophilic archaeon Sulfolobus tokodaii,” J. Biol. Chem. 282, 9923–9931 (2007). doi: 10.1074/jbc.M610678200
  • 83. Sandner A. et al., “Which properties allow ligands to open and bind to the transient binding pocket of human aldose reductase?,” Biomolecules 11, 1837 (2021). doi: 10.3390/biom11121837
  • 84. Sillitoe I. et al., “CATH: Increased structural coverage of functional space,” Nucl. Acids Res. 49, D266–D273 (2021). doi: 10.1093/nar/gkaa1079
  • 85. Andreeva A., Kulesha E., Gough J., and Murzin A. G., “The SCOP database in 2020: Expanded classification of representative family and superfamily domains of known protein structures,” Nucl. Acids Res. 48, D376–D382 (2020). doi: 10.1093/nar/gkz1064
  • 86. Holm L., Laiho A., Törönen P., and Salgado M., “DALI shines a light on remote homologs: One hundred discoveries,” Protein Sci. 32, e4519 (2023). doi: 10.1002/pro.4519
  • 87. Orengo C. A. and Taylor W. R., “SSAP: Sequential structure alignment program for protein structure comparison,” in Methods in Enzymology (Academic Press, 1996), Vol. 266, pp. 617–635.
  • 88. van Kempen M. et al., “Fast and accurate protein structure search with Foldseek,” Nat. Biotechnol. 42, 243–246 (2024). doi: 10.1038/s41587-023-01773-0
  • 89. See the supplementary material (doi: 10.60893/figshare.sdy.c.7222863) for details. We include a copy of our manually curated benchmark dataset of 315 proteins across a range of conformational states and a supplementary methods document that formally describes the algorithm.

Data Availability Statement

The data that support the findings of this study are available within the article and its supplementary material.


Articles from Structural Dynamics are provided here courtesy of American Institute of Physics
