Abstract
Despite the success of AlphaFold methods in predicting single protein structures, these methods showed intrinsic limitations in the characterization of multiple functional conformations of allosteric proteins. The recent NMR-based structural determination of the unbound ABL kinase in the active state and discovery of the inactive low-populated functional conformations that are unique for ABL kinase present an ideal challenge for the AlphaFold2 approaches. In the current study, we employ several adaptations of the AlphaFold2 methodology to predict protein conformational ensembles and allosteric states of the ABL kinase including randomized alanine sequence scanning combined with the multiple sequence alignment subsampling proposed in this study. We show that the proposed new AlphaFold2 adaptation combined with local frustration profiling of conformational states enables accurate prediction of the protein kinase structures and conformational ensembles, also offering a robust approach for interpretable characterization of the AlphaFold2 predictions and detection of hidden allosteric states. We found that the large high frustration residue clusters are uniquely characteristic of the low-populated, fully inactive ABL form and can define energetically frustrated cracking sites of conformational transitions, presenting difficult targets for AlphaFold2. The results of this study uncovered previously unappreciated fundamental connections between local frustration profiles of the functional allosteric states and the ability of AlphaFold2 methods to predict protein structural ensembles of the active and inactive states. 
This study showed that integration of the randomized sequence scanning adaptation of AlphaFold2 with a robust landscape-based analysis allows for interpretable atomistic predictions and characterization of protein conformational ensembles, providing a physical basis for the successes and limitations of current AlphaFold2 methods in detecting functional allosteric states that play a significant role in protein kinase regulation.
INTRODUCTION
The remarkable progress of the AlphaFold2 (AF2) technology in the field of protein structure modeling has ushered in a transformative era in structural biology.1,2 AF2 utilizes evolutionary information by considering multiple sequence alignments (MSAs) derived from related protein sequences as input and incorporates the transformer architecture with self-attention mechanisms, which allows one to discern long-range dependencies and interactions within protein sequences. The AF2 architecture enables bidirectional information flow throughout the neural network where through self-consistent dynamic exchange of sequence and structure information the network converges to a robust inference.1,2 Self-supervised deep learning models, drawing inspiration from natural language processing architectures, and particularly approaches incorporating attention-based3 and transformer mechanisms,4 have proven to be powerful tools for predicting protein structures from individual sequences, eliminating the dependence on MSA information.5,6 The language models can learn directly evolutionary patterns of sequences linked to structure, eliminating the need for MSAs and templates, which improves the quality and speed of protein structure predictions. 
Among the latest breakthroughs in AI-based protein structure prediction is ESMFold, a high-accuracy end-to-end transformer protein language model for atomic-level structure prediction directly from the individual sequence of a protein.6 OmegaFold is another related approach that combines a protein language model with a geometry-guided transformer model, enabling robust protein structure predictions from individual sequences.7 The RoseTTAFold2 approach extends the original three-track architecture of RoseTTAFold8 over the full network by integrating a mechanism of updating pair features using a more computationally efficient structure-biased attention, producing high accuracy of protein structure predictions for both monomers and multimers as well as efficient computational scaling on large proteins and complexes.9 Recent studies employed variational autoencoders trained on the experimental protein structures and molecular dynamics (MD) simulation snapshots to convert protein structural data into a continuous, low-dimensional representation, followed by guided sampling in the latent space and rapid generation of accurate conformational ensembles with RoseTTAFold.10 A new protein design method, RoseTTAFold diffusion, integrates structure prediction networks and diffusion generative models, enabling design of complex functional proteins from simple molecular specifications.11 Despite the remarkable success of the AF2-based methods and self-supervised protein language models that excelled at predicting static protein structures, there are notable shortcomings related to their applicability and generality in accurately characterizing conformational dynamics, functional protein ensembles, conformational changes, and allosteric states.12 Conformational changes and allosteric states involve subtle shifts in protein structure and function and are crucial for understanding how proteins interact with other molecules and modulate cellular processes.
Several recent studies indicated that protein structure prediction capabilities of the AF2 methods are not trivially expandable for prediction of conformational ensembles and accurate mapping of allosteric landscapes.13-17 Efforts to optimize the AF2 methodology for predicting alternative conformational states of proteins have primarily centered on altering the MSA information, which is motivated by the recognition that MSAs can encode for coevolutionary signals of the thermodynamically probable protein states.13,14 In one of these approaches, MSAs are randomly subsampled to reduce depth, resulting in shallower MSAs that enhance the diversity of the AF2 protein models and can capture the alternative conformational states of proteins.13 Another AF2-based approach known as SPEACH_AF (Sampling Protein Ensembles and Conformational Heterogeneity with AlphaFold2) involves the manipulation of the MSA through in silico mutagenesis where replacing specific residues within the MSA can induce changes in the distance matrices, ultimately leading to the prediction of alternate protein conformations.14 In this approach, it is assumed that alanine mutations in the MSA could broaden an attention network mechanism within AF2 and ascertain distinct patterns of coevolved residues associated with alternative conformations.14 The observed limitations of the AF2 methods in predicting multiple protein conformations are associated with the intrinsic training biases toward the experimentally determined structures and evolutionary information derived from MSAs that drives inferences of the thermodynamically stable native protein states. 
In particular, the AF2 methodology was found to be systematically biased to predict only one conformation of fold-switching proteins, as 94% of the AF2 predictions captured the experimentally determined ground conformation but not the alternative structure.16 A recent study examined AF2 adaptations for sampling alternative states more systematically by generating >280,000 models of 93 fold-switching proteins with experimentally determined conformations.17 Combining all models, the AF2-predicted structures displayed only modest success in detecting the experimentally characterized conformations for most fold-switching proteins. Some of the recent attempts to expand the AF2 capabilities toward prediction of conformational ensembles combined shallow MSAs with state-annotated templates incorporating functional or structural properties of GPCRs and protein kinases.18 Another AF2 adaptation termed AF-Cluster used a simple MSA subsampling method for subsequent clustering of evolutionarily related or functionally similar sequences, enabling predictions of alternative protein states and showing promise in identifying a previously unknown fold-switched state that was later validated by NMR analysis.19 Recent application studies indicated that AF-Cluster predicts fold switching from single sequences by associating structures in its training set with homologous sequences, suggesting potential limitations for predicting multiple functional states of fold-switching proteins that are absent in the training set.20
Exploring biological mechanisms of action and understanding enzymatic functions is complicated and remains highly challenging as it requires characterization of the conformational ensembles that enzymes adopt in solution and in the functional complexes.21,22 In an illuminating pioneering study, Osuna and colleagues evaluated the potential of AF2 methodologies in assessing the effect of the mutations on the conformational landscape of the beta subunit of tryptophan synthase.21 Through elegant adaptations of AF2 by altering the MSAs depths and using MD-generated conformations as templates, this study accurately predicted protein heterogeneity and the impact of mutations on modulating the conformational landscape of the tryptophan synthase.21 Importantly, this study highlighted the importance of tuned template-based AF2 approaches for assessing the conformational heterogeneity of protein structures and the potential of these synergistic methods for computational enzyme design. A recent perspective on applications of AF2 and deep learning methods focused on AI-based predictions of protein functions and conformational dynamics in enzyme functions and evolution.22 The key lessons outlined in this fascinating analysis pointed to the advantages of the template-based AF2 predictions of protein conformational landscapes that are consistent with computationally expensive characterization of protein dynamics from multiple-walker metadynamics MD simulations, also presenting convincing showcases of successful AF2 template-based adaptations for conformational enzyme design.22
Several user-friendly powerful architectural pipelines that facilitate AF2 implementations were developed including ColabFold,23 which is a community implementation of a Colab for running AF2 applications and OpenFold,24 which is a retrainable implementation of AF2 that can integrate the experimental data derived from cross-linking experiments to guide monomeric and multimeric predictions. Another study introduced AF unmasked adaptation that can leverage information from templates containing quaternary structures without the need for retraining and can generate robust models of large complexes in one shot, allowing for robust integrative structure modeling and predictions of protein dynamics.25 The analysis of various AF2 pipelines reveals a significant focus of emerging adaptations on broadening the range of accessible protein conformations to enhance the ability to predict specific conformational states by adjusting the balance between genetic information obtained from MSAs and structural information derived from templates.26 Expanding existing tools for prediction of allosteric regulatory mechanisms would require a considerable step forward toward accurate mapping of the conformational ensembles and prediction of functional allosteric conformations.27 Recent analysis of AF2 predictions and direct comparison with the experimental crystallographic maps showed that high-confidence predictions could often differ from the experimental data, indicating that experimental structural determination remains the key step in validating the AF2-based predictions.28
The enormous amount of structural information on protein kinases29-34 and their complexes with ATP-bound inhibitors and allosteric modulators35,36 provides unprecedented opportunities for validating and evaluating the AF2 methods. The recent NMR-based structural determination of the unbound ABL kinase in the active state and two inactive functional conformations that are unique for the ABL kinase presents an ideal challenge for AF2 approaches (Figure 1).37,38 The thermodynamically dominant active conformation of the ligand-free ABL kinase and two short-lived inactive conformations, I1 and I2, that occur only 5% of the time are intrinsically present on the conformational landscape of the unbound kinase domain and are quite different from each other in the critical functional regions (Figure 1).38
Figure 1.
NMR solution structural ensembles of the thermodynamically stable fully active ground state of the ABL kinase domain (pdb id 6XR6) (A), the inactive I1 state (pdb id 6XR7) (B), and the closed inactive I2 state (pdb id 6XRG) (C). The NMR ensemble of DFG conformations in the active ABL state (D), the inactive I1 state (E), and the closed inactive I2 state (F). The ABL conformations are shown in ribbons. The structures point to similarities and differences in the key functional regions of the αC helix, the A-loop, and the P-loop. In particular, the A-loop in the inactive I2 state (C,F) adopts a completely different closed conformation.
Multidisciplinary studies that exploited synergies between structural and biophysical approaches and computational methods39-42 have been fruitful in uncovering the invisible dynamic aspects of protein kinases. Our recent study employed atomistic MD simulations with dimensionality reduction methods and Markov state models (MSM) to characterize the dynamics and kinetics of structural changes between the ABL conformational states.42 While MD simulations and MSM provided insights into the conformational dynamics of the NMR-determined ABL states, these approaches relied completely on the initial experimental structures. Several recent investigations explored the potential of AF2 methodologies for predicting conformational states in protein kinases. AF2-based modeling of 437 human protein kinases in the active form using shallow MSAs of orthologs and close homologues of the query protein demonstrated the robustness of AF2 methods, as the models selected for each kinase based on the prediction confidence scores of the activation loop residues conformed closely to the substrate-bound experimental structures.43 The ability of AF2 methods to predict kinase structures in different conformations at various MSA depths was examined, suggesting that the shallow MSAs allow for more efficient exploration of alternative kinase conformations, including identification of previously unseen conformations for 398 kinases.44 Another AF2 adaptation explored the conformational landscape of the ABL kinase domain by systematically manipulating the MSA subsampling parameters and predicted the relative populations of different ABL conformations.45 While this study suggested that AF2 with shallow MSAs may be useful for predicting conformational ensembles in protein kinases, accurate and reproducible prediction of functional allosteric conformations remains challenging and is highly sensitive to the architectural details of the AF2 pipeline.
In the current study, to predict protein conformational ensembles and allosteric states of the ABL kinase, we employ several recent AF2 adaptations including (a) MSA subsampling with shallow MSA depth; (b) SPEACH_AF approach in which alanine scanning is performed on generated MSAs by using different random alanine mutation positions in the MSAs; and (c) random alanine sequence scanning algorithm that iterates through each amino acid in the native sequence or specified functional regions to simulate random alanine substitution mutations. The results demonstrate that randomized alanine sequence scanning combined with shallow AF2 enables accurate and robust prediction of structural ensembles and conformational heterogeneity of the active and the intermediate inactive ABL state that represents one of the two common inactive forms found in the crystal structures of protein kinases.46,47 Principal Component Analysis (PCA) of the AF2-predicted ensembles and experimental NMR ensembles yields very similar patterns, confirming the validity of the proposed approach. By mapping the predicted AF2 ensembles of the ABL kinase with the equilibrium simulations and major ABL kinetic macrostates obtained from our previous MD and MSM studies,42 we demonstrate that the proposed AF2 adaptation can accurately capture conformational heterogeneity of the functional states and structural transformations between the active and inactive ABL conformations. Integration of the proposed AF2 adaptation with local frustration mapping of conformational states can enable robust prediction of the ABL active and inactive conformational ensembles, also offering an energy landscape framework for interpretable characterization of the AF2 predictions and limitations in detecting hidden allosteric states. 
The results reveal that the dominant minimal frustration pattern in the active ABL state and the inactive intermediate state provides funnel-like landscapes around these states, enabling accurate AF2 predictions of functional conformational ensembles. The emergence of interconnected high frustration residue clusters in the fully inactive state, which define the initiation “cracking” sites of allosteric changes, can present difficult targets for robust AF2 predictions. This study uncovered previously unappreciated, fundamental connections between distinct patterns of local frustration in functional kinase states and AF2 successes/limitations in detecting low-populated frustrated conformations, providing understanding and rationale for limitations of current AF2-based adaptations in the modeling of conformational ensembles.
MATERIALS AND METHODS
Protein Structure Modeling Using AF2 with MSA Shallow Subsampling Adaptation.
Structural predictions of the ABL kinase states were carried out using the AF2 framework1,2 within the ColabFold implementation23 using a range of MSA depths and MSA subsampling.13 We used the max_msa field to set two AF2 parameters in the format max_seqs:extra_seqs. These parameters determine the number of sequences subsampled from the MSA: max_seqs sets the number of sequences passed to the row/column attention track at the front end of the AF2 architecture, and extra_seqs sets the number of extra sequences additionally processed by the main Evoformer stack. Lower values encourage more diverse predictions but increase the number of misfolded models. The default MSAs are subsampled randomly to obtain shallow MSAs containing as few as five sequences (Figure 2). We set the max_msa value to 16:32. We additionally manipulated the num_recycles parameter to produce more diverse outputs. AF2 makes predictions using 5 models pretrained with different parameters and consequently with different weights. To generate more data, we set the number of recycles to 12, which produces 14 structures for each model: one for each recycle from recycle 0 to recycle 12, plus a final refined structure. Recycling is an iterative refinement process, with each recycled structure becoming more precise. With 14 structures from each of the 5 AF2 models, this amounts to 70 structures in total (Supporting Information Figure S1).
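The bookkeeping behind these settings can be illustrated with a minimal Python sketch. It only parses a max_msa string of the form max_seqs:extra_seqs and reproduces the structure count described above (recycles 0 through 12 plus one final refined structure, for each of 5 models). The class name SubsamplingConfig and its fields are hypothetical helpers introduced for illustration; they are not part of ColabFold or AF2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubsamplingConfig:
    """Hypothetical container for the shallow-MSA settings described in the text."""

    max_seqs: int          # sequences passed to the row/column attention track
    extra_seqs: int        # extra sequences processed by the Evoformer stack
    num_recycles: int = 12
    num_models: int = 5

    @classmethod
    def from_max_msa(cls, max_msa: str, **kwargs) -> "SubsamplingConfig":
        # The text gives the format as "max_seqs:extra_seqs", e.g. "16:32".
        max_seqs, extra_seqs = (int(part) for part in max_msa.split(":"))
        return cls(max_seqs, extra_seqs, **kwargs)

    @property
    def structures_per_model(self) -> int:
        # Recycles 0..num_recycles give num_recycles + 1 structures,
        # plus one final refined structure (as described in the text).
        return self.num_recycles + 2

    @property
    def total_structures(self) -> int:
        return self.num_models * self.structures_per_model


config = SubsamplingConfig.from_max_msa("16:32")
print(config.structures_per_model, config.total_structures)
```

With the values quoted in the text, this gives 14 structures per model and 70 structures overall.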
Figure 2.
Schematic representation of the AF2 protein structure prediction pipeline using full sequence randomized alanine scanning and MSA shallow subsampling.
In addition, we also predicted the ABL structure using AF2 with the default and “auto” parameters, which served as a baseline for prediction and variability analysis. The MSAs were generated using the MMSeqs2 library48,49 with the ABL1 sequence from residues 240 to 440 as input. We then set the num_seeds parameter to 1. This parameter sets the number of random seeds to iterate through, ranging from random_seed to random_seed + num_seeds; increasing num_seeds draws more predictions that sample the uncertainty of the model. We also enabled the use of the dropout parameter, meaning that dropout layers in the model would be active at prediction time, which further increases the variability within predictions. In summary, we input a protein sequence of the ABL1 kinase and predicted 70 unique structures using the shallow MSA technique.
Protein Structure Modeling Using AF2 with SPEACH_AF Adaptation.
In the SPEACH_AF approach,14 MSAs were first generated from the original full-length protein sequence using MMSeqs2.48,49 The method then applies random alanine masking to the generated MSA; that is, the MSA is generated first and alanine scanning is performed afterward. We generated multiple distinct MSAs using the SPEACH_AF notebook by varying the random alanine mutation positions in the MSAs as well as the number of mutations in the MSAs. We start with the native sequence, which we use to create an MSA. This MSA is then processed by the SPEACH_AF approach, where it undergoes alanine scanning of all sequences in the MSA, from which we take two distinct MSAs, with the first MSA having a lower frequency of mutation than the second one (Supporting Information Figure S2). The shallow MSA methodology was then employed to predict structural ensembles for different states of the ABL kinase. The predicted structures were generated from 6 recycles per model.
Randomized Alanine Sequence Scanning and Shallow MSA Subsampling for Prediction of Conformational Ensembles.
Protein models generated using shallow MSA subsampling and SPEACH_AF can increase the conformational heterogeneity by reducing the depth of the input MSAs through stochastic subsampling. The proposed randomized sequence scanning adaptation takes it a step further by constructing diverse MSAs based on the alanine-mutated sequences and probing conformational variability of the generated ensembles by directing the AF2 attention mechanism to different functional regions on the protein sequence. The randomized alanine scanning algorithm operates first on the pool of sequences and iterates through each amino acid in the native sequence to randomly substitute protein residues with alanine and probe different regions of the protein sequence and their resulting effect on the constructed MSAs. The initial input for the full sequence randomized alanine scanning (Figure 2) is the original full native sequence. The algorithm substitutes the residue at each position with alanine with a probability that is randomly generated between 0.05 and 0.15 for each sequence position. We ran the masking algorithm on the full native sequence 10 times to generate 10 alanine-scanned sequences, each with different frequencies and positions of alanine mutations (Figure 2). MSAs are then generated using each of the alanine-scanned full-length sequences as input to MMSeqs2.48,49 This alanine masking sequence algorithm can produce gradual and controllable perturbations of MSAs without mutating the sequences within the MSA, which is the biggest difference from the SPEACH_AF technique. A gradual diversification of the resulting MSAs enables the attention network module of the AF2 to discover functional sequence positions and distinct parts of the MSAs that determine conformational diversity and ensembles of functionally relevant protein states.
The randomized alanine sequence scanning protocol is followed by AF2 shallow subsampling on each of these MSAs with 12 recycles per model and a total of 70 predicted conformations for each of the MSAs constructed.
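The sequence-level masking step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the optional region argument (a zero-based, half-open index window for restricting the scan to a functional region), and the use of a uniform draw for the per-position probability are all choices made here for illustration. The demo sequence is an arbitrary placeholder, not the ABL1 sequence.

```python
import random


def alanine_scan(sequence, rng, p_min=0.05, p_max=0.15, region=None):
    """Return a copy of `sequence` with residues randomly replaced by alanine.

    For every position a substitution probability is drawn between p_min and
    p_max (assumed uniform here), and the residue is replaced by 'A' with that
    probability. `region` is a hypothetical (start, end) window, zero-based
    and half-open, limiting which positions may be mutated.
    """
    start, end = region if region is not None else (0, len(sequence))
    residues = list(sequence)
    for position in range(start, end):
        probability = rng.uniform(p_min, p_max)
        if rng.random() < probability:
            residues[position] = "A"
    return "".join(residues)


def scanned_pool(sequence, n_sequences=10, seed=0, **scan_kwargs):
    """Run the scan n_sequences times to build a pool of mutated sequences."""
    rng = random.Random(seed)
    return [alanine_scan(sequence, rng, **scan_kwargs) for _ in range(n_sequences)]


# Placeholder sequence for demonstration only.
demo_sequence = "MKTLLVGSERDFGLSRLMTGDTYT"
for mutated in scanned_pool(demo_sequence, n_sequences=3, seed=1):
    print(mutated)
```

Each sequence in the pool would then serve as a separate query for MSA construction, so the MSAs are perturbed through the query rather than by editing sequences inside an existing MSA.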
We start with the native sequence, to which we apply the alanine mutation algorithm, resulting in nine new sequences (eq 1). We then used these full sequences to generate our MSAs, from MSA 1 (eq 2) to MSA 10 (eq 3).
In addition to randomized alanine sequence scanning of the complete sequence, we also examined several variations of this approach, with targeted alanine masking of the ABL sequence space. In particular, we probed the effects of random alanine masking of sequence positions in the A-loop (residues 398–421) that is critical for conformational change between the active and inactive ABL forms. In a similar manner, we also examined the effect of targeted alanine masking in C-terminal regions (residues 420–480) that are known to be vulnerable to functional conformational changes during switching between active and inactive conformations. For each of these targeted alanine masking experiments, we generated 10 alanine-scanned sequences, each with different frequencies and positions of alanine mutations in the respective A-loop and C-terminal loop regions. By combining manipulation of sequence variations with MSA subsampling, we can efficiently probe the effects of different sequence regions and generate functional conformational ensembles with increased diversity while avoiding partially disordered and misfolded states. We performed comparative prediction of ABL protein conformational states and functional ensembles using a hierarchy of several AF2-based adaptations: AF2 default settings; AF2 methodology with shallow MSA depth; SPEACH_AF, in which the MSAs are manipulated using mutagenesis by replacing specific residues within the MSAs;14 randomized alanine sequence scanning of the complete sequence; and targeted alanine masking combined with MSA shallow subsampling (Supporting Information Figure S3).
Statistical and Structural Assessment of AF2-Generated Models.
AF2 models were ranked by predicted Local Distance Difference Test (pLDDT) scores (a per-residue estimate of the prediction confidence on a scale from 0 to 100), quantified by the fraction of predicted distances that lie within their expected intervals. The values correspond to the model’s predicted scores based on the lDDT-Cα metric, which is a local superposition-free metric that assesses the atomic displacements of the residues in the predicted model.1,2 Models were compared to the experimental structure using the structural alignment tool TM-align and calculating the TM-score as a quantitative measure of the overall accuracy of the predicted models. The TM-score ranges from 0 to 1, where a value of 1 indicates a perfect match between the predicted model and the reference structure. A TM-score >0.5 implies that the structures share the same fold and is often used as a threshold to determine whether the predicted model has a fold similar to the reference structure. The root-mean-square deviation (rmsd) of superimposed backbone atoms was calculated using ProFit (http://www.bioinf.org.uk/software/profit/).
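For readers unfamiliar with the TM-score, the following Python sketch evaluates the standard TM-score expression for a fixed superposition, given the distances between already aligned residue pairs. It is not a replacement for TM-align, which additionally searches for the alignment and superposition that maximize the score. The function names are hypothetical, and the lower bound of 0.5 Å on the distance scale d0 is an assumption added here as a guard for short chains.

```python
def tm_d0(target_length):
    """Length-dependent distance scale: d0 = 1.24 * (L - 15)^(1/3) - 1.8 (in Angstrom)."""
    if target_length <= 15:
        raise ValueError("d0 is only defined here for target lengths above 15 residues")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return max(d0, 0.5)  # assumed floor so that very short chains keep a positive scale


def tm_score(aligned_distances, target_length):
    """TM-score for one fixed superposition.

    aligned_distances: distances (Angstrom) between aligned residue pairs.
    target_length: number of residues in the reference structure, used for
    normalization, so unaligned reference residues lower the score.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in aligned_distances)
    return total / target_length


# Identical structures: every aligned distance is zero, so the score is 1.
print(tm_score([0.0] * 200, 200))
```

Because each aligned pair contributes a value between 0 and 1 that decays with distance, the score is less sensitive to a few badly placed residues than the rmsd.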
Local Frustration Analysis.
To quantify the role of molecular frustration in the allosteric control of the ABL kinase and evaluate local frustration patterns of the active and inactive ABL states, we employed the FrustratometeR tool.51,52 Configurational and mutational frustration of protein kinase residues are evaluated by computing the local energetic frustration index, in which the contribution of a residue to the energy in a given conformation is compared to the energies that would be found by mutating residues in the same native location or by changing the local conformational environment for the interacting pair.51,52 For mutational frustration, the decoy set was made by randomizing the identities of the interacting amino acids i and j, keeping all other interaction parameters at their native value. For configurational frustration, the decoy set involves randomizing residue identities and the distance between the interacting amino acids i and j. The frustration index was calculated by mutating the identities and distances between the interacting amino acids. We evaluate configurational and mutational molecular frustration by computing local densities of minimal, neutral, and high frustration contacts within a 5 Å sphere from a given protein residue.
The distributions of neutral, high, and minimal frustration are determined and compared with a more general frustration analysis that revealed the majority of local interactions as neutral (~50–60%) or minimally frustrated (30%).53-55 Our earlier studies of local frustration in the protein kinases established that the protein kinase regions undergoing large structural changes during activating transitions between the inactive and active forms could be enriched in clusters of highly frustrated residues.56 Here, we combine AF2 predictions of the ABL conformational ensembles using a hierarchy of different AF2-based adaptations with local frustration analysis of ABL structures to propose an approach for landscape-based analysis and interpretable characterization of the AF2 predictions.
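The idea of a local frustration index can be illustrated with a short Python sketch that compares a native contact energy with a set of decoy energies as a Z-score. Several points are assumptions made for illustration rather than details taken from this study: the sign convention (positive values mean the native contact is more favorable than the decoys), the use of the population standard deviation, and the cutoffs of 0.78 and −1 that are commonly quoted for minimally and highly frustrated contacts. The energies in the example are invented numbers, not output of FrustratometeR.

```python
import statistics


def frustration_index(native_energy, decoy_energies):
    """Z-score of the native energy against the decoy energy distribution.

    Sign convention assumed here: a positive index means the native contact
    has lower (more favorable) energy than the average decoy.
    """
    mean_decoy = statistics.fmean(decoy_energies)
    spread = statistics.pstdev(decoy_energies)
    if spread == 0:
        raise ValueError("decoy energies must not all be identical")
    return (mean_decoy - native_energy) / spread


def classify_contact(index, minimal_cutoff=0.78, high_cutoff=-1.0):
    """Assign a frustration class using commonly quoted cutoffs (assumed)."""
    if index > minimal_cutoff:
        return "minimal"
    if index < high_cutoff:
        return "high"
    return "neutral"


# Invented example: the native contact is clearly more favorable than the decoys.
example_index = frustration_index(-2.0, [0.0, 1.0, -1.0, 0.0])
print(round(example_index, 3), classify_contact(example_index))
```

Per-residue densities of each class would then be obtained by counting the classified contacts that fall within the 5 Å sphere around a residue.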
RESULTS AND DISCUSSION
Structural Analysis of the NMR Conformational Ensembles of the ABL Kinase.
First, we analyzed in more detail the NMR-generated ensemble of the ligand-free ABL kinase domain38 in its active conformational state (pdb id 6XR6) and two inactive conformational states I1 and I2 (pdb id 6XR7 and 6XRG, respectively) (Figure 1). Conformational transitions between the inactive and active kinase states are orchestrated by three conserved structural motifs in the catalytic domain: the αC-helix, the DFG-Asp motif (DFG-Asp in, active; DFG-Asp out, inactive), and the activation loop (A-loop open, active; A-loop closed, inactive), providing structural fingerprints that differentiate the active and inactive forms (Figure 1). Structural analysis showed that the NMR ensemble of the active conformation (Figure 1A) closely conforms to this thermodynamically dominant state, with some appreciable degree of conformational plasticity that is visible in the A-loop and the αC helix. Nonetheless, all conformations from the active ensemble are characterized by the “αC-in” position and a stable DFG-in orientation (Figure 1A,D). In the active “αC-in” state, a conserved αC-helix residue E305 forms an ion pair with K290 in the β3 strand that coordinates the α and β phosphates of the ATP. Interestingly, the NMR ensembles of the inactive ABL states are very different structurally and dynamically (Figure 1B,C). In the inactive I1 state (pdb id 6XR7), the αC helix moves from its active “αC-in” position to the intermediate αC-in/out position (Figure 1B,E). The structural alignment of the DFG conformations in the I1 ensemble highlighted a considerable variability in the regulatory DFG motif that adopts distinct “out” conformations exemplified by heterogeneity of the phenylalanine departures from the active “in” position (Figure 1E). In the I1 ensemble, the DFG motif is flipped 180° with respect to the active conformation, but the A-loop remains in an open and highly heterogeneous conformation, similar to the active conformation.
In this form, ABL adopts the DFG-out, αC-helix-in conformation, and thus, it is catalytically inactive (Figure 1B,E). The analysis of the NMR ensembles showed a contrast between the heterogeneous I1 ensemble (Figure 1B,E) and a more restricted inactive I2 ensemble, which is different structurally and dynamically (Figure 1C,F). The regulatory DFG motif adopts a distinct “out” conformation in the I2 state (Figure 1F), where only minor displacements were observed in the NMR ensemble of this inactive state. Although the DFG motif is in the “out” conformation in both the I1 and I2 states, the A-loop and the αC helix adopt vastly different conformations in these states. In the inactive I2 state, the A-loop undergoes a large conformational rearrangement to adopt a fully closed conformation, which is accompanied by a change to the αC-out position (Figure 1C,F). Structural changes associated with the transition to the I2 structure are large and are executed through coordinated massive rearrangements of the key structural elements, the P-loop, A-loop, and αC-helix. Similar to the I1 state, I2 adopts the DFG-out conformation where F401 of the DFG motif in the I2 state flips into the catalytic pocket and translates by more than 10 Å to occupy a hydrophobic pocket lined by L267, V275, A288, F336, and L403 (Figure 1C,F).
Shallow MSA Depth Increases the Structural Diversity of AF2-Generated Protein Models.
The important objective of this study was to explore the potential and limits of various AF2 adaptations while preserving the overall AF2 implementation architecture. First, we analyzed the AF2 results using shallow MSA depth settings, which showed accurate predictions of the ABL active state and excellent structural alignment with the experimentally determined active conformation, with high-confidence pLDDT values (Figure 3). Moreover, the top AF2-predicted models selected by pLDDT values displayed increased structural similarity to the experimental ABL1 active kinase structure. The analysis of the pLDDT profiles for the predicted top models showed high confidence values for most of the ABL kinase domain regions (pLDDT ~80–100), while the highly mobile A-loop (residues 395–421) displayed variability of pLDDT values (pLDDT ~65–85) (Figure 3A). In general, protein regions with pLDDT values of ~50–70 have lower confidence, while pLDDT <50 may indicate some level of disorder. Indeed, some of the low-confidence pLDDT values corresponded to disordered N-terminal residues that were revealed in the NMR ensembles.
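The pLDDT ranges quoted above can be turned into a simple per-residue annotation. The following is a minimal sketch, not the analysis code used in this study: the function names, the 70-point cutoff for flagging a segment, and the toy profile are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def confidence_band(plddt: float) -> str:
    """Map a per-residue pLDDT value (0-100) to a coarse band.

    Band edges follow the ranges quoted in the text; the labels are illustrative.
    """
    if plddt < 50:
        return "possible disorder"
    if plddt < 70:
        return "low"
    return "confident"


def low_confidence_segments(
    plddt: Sequence[float], first_residue: int = 1, cutoff: float = 70.0
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) residue ranges whose pLDDT is below `cutoff`."""
    segments: List[Tuple[int, int]] = []
    start = None
    for offset, value in enumerate(plddt):
        resid = first_residue + offset
        if value < cutoff:
            if start is None:
                start = resid  # open a new low-confidence segment
        elif start is not None:
            segments.append((start, resid - 1))  # close the current segment
            start = None
    if start is not None:  # profile ended inside a low-confidence segment
        segments.append((start, first_residue + len(plddt) - 1))
    return segments


# Toy profile of five residues numbered from 395 (values are made up).
toy_profile = [90.0, 65.0, 60.0, 85.0, 40.0]
toy_segments = low_confidence_segments(toy_profile, first_residue=395)
```

Applied to a real per-residue profile, the returned ranges can be compared directly with known mobile regions such as the A-loop.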
Figure 3.
Statistical analysis of the shallow MSA depth models for the ABL kinase structures. (A) The residue-based pLDDT profile for the top five models is obtained from AF2 predictions of ABL conformations. The distributions of structural model assessment for the ABL conformational ensemble are obtained from AF2-MSA shallow subsampling predictions. (B) Density distribution of the pLDDT estimate of the prediction confidence on a scale from 0 to 100. (C) PAE heatmaps for the top five models. The heat maps are provided for each top ranked model and show the PAE between each residue in the model. The color scale contains three colors to highlight the contrast between the high-confidence regions and the low-confidence regions.
The density distribution of the pLDDT values obtained for the AF2 shallow MSA structural ensemble showed a pronounced peak at pLDDT ~90 and several minor peaks at pLDDT ~80–85 and pLDDT ~75 (Figure 3B). The heat maps of the predicted alignment error (PAE) for the best five AF2 models (Figure 3C) highlighted differences between the high- and low-confidence regions, pointing to the increased heterogeneity of the A-loop (residues 395–421) in the bottom three models.
We computed the TM scores and rmsd values for the predicted ABL conformations and A-loop residues with respect to both the active and inactive experimental ABL structures (Figure 4). The most dominant peak was seen for TM values of ~0.95 with respect to the active ABL conformation; only a small fraction of the AF2 ensemble was similar to the I1 state, and the predicted conformations were different from the inactive I2 state (Figure 4A). The rmsd distribution highlighted a peak at rmsd ~0.2–0.4 Å from the active state and another peak at rmsd ~1.0 Å from the inactive I1 state (Figure 4B). By analyzing the density distributions of the rmsds for the A-loop residues with respect to the experimental ABL structures, we found three peaks at rmsd = 1.0, 2.0, and 3.0 Å from the active ABL structure (Figure 4C). These findings confirmed the ability of the AF2 shallow MSA approach to accurately predict the active ABL state and capture the intrinsic heterogeneity of the A-loop in the active form. The predicted A-loop conformations also displayed a shallow peak at rmsd ~3.0 Å from the inactive I1 state, which represented a fraction of the predicted AF2 conformations that were similar to the I1 state, while most of the AF2-predicted conformations were structurally different from the inactive I2 state (Figure 4C). Hence, AF2 ensembles generated using the shallow MSA depth method can accurately predict the active ABL kinase form and also capture the mobility of the A-loop seen in both the active and intermediate inactive forms.
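The rmsd values discussed above require an optimal superposition of each predicted model onto a reference structure. The text does not say which tool was used, so the following is a generic sketch of the standard Kabsch superposition with NumPy. It assumes that matching atoms (for example, Cα atoms of the kinase domain or of the A-loop residues) have already been extracted as two equally sized coordinate arrays; the function name and toy coordinates are illustrative.

```python
import numpy as np


def kabsch_rmsd(mobile, reference) -> float:
    """RMSD (same units as input) after optimal rigid-body superposition.

    Both inputs are (N, 3) arrays of matching atoms in the same order.
    """
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(reference, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("inputs must both have shape (N, 3)")

    # Remove translation by centering both coordinate sets.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)

    # Optimal rotation from the SVD of the covariance matrix.
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    P_rot = P @ R.T  # rotate the mobile set onto the reference
    return float(np.sqrt(np.mean(np.sum((P_rot - Q) ** 2, axis=1))))


# Toy check: a rotated and translated copy of the reference gives RMSD ~ 0.
ref = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3], [1, 1, 1]], dtype=float)
rot_z = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
moved = ref @ rot_z.T + np.array([5.0, -3.0, 2.0])
toy_rmsd = kabsch_rmsd(moved, ref)
```

Whether the A-loop rmsd is computed after superposing only the A-loop atoms or after a global superposition of the kinase domain changes the numbers, so the choice should be stated alongside any reported distribution.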
Figure 4.
Distribution of structural similarity metrics (TM scores and rmsd values) for the predicted ABL conformations and for the predicted A-loop conformations computed with respect to the experimental active and inactive ABL states. (A) Distribution density of TM scores for the AF2-predicted ABL conformations using shallow MSA depth with respect to the experimental active ABL state (orange filled bars) and the inactive states I1 (red filled bars) and I2 (blue filled bars). (B) Distribution density of rmsd scores for the AF2-predicted ABL conformations using shallow MSA depth with respect to the active ABL state (orange filled bars) and the inactive states I1 (red filled bars) and I2 (blue filled bars). (C) Distribution density of rmsd scores for the AF2-predicted A-loop conformations with respect to the A-loop residues in the active ABL state (orange filled bars) and the inactive states I1 (red filled bars) and I2 (blue filled bars). (D) Structural alignment of the AF2-predicted conformational ensemble obtained with the AF2 shallow MSA subsampling approach. (E) Structural overlay of the regulatory DFG motif conformations from the AF2-predicted conformational ensemble.
Structural alignment of the AF2-predicted structural ensemble pointed to a moderate degree of conformational heterogeneity, particularly in the A-loop (Figure 4D), while conformational differences in the DFG motif are modest and conform to the dominant DFG-in orientation in the active state, with only small displacements of the F401 residue (Figure 4E). These results are consistent with AF2 modeling of the active substrate-bound conformations for 437 catalytically competent human protein kinase domains, showing that the best models selected using pLDDT assessment of the A-loop residues can correctly single out the catalytically active protein kinase structure.43 A close correspondence was found between the best predicted A-loop conformations and the experimentally observed A-loop structures in the active and intermediate forms. Nonetheless, this analysis also highlighted limitations of this approach in capturing the full diversity of the ABL conformational states, as the key functional regions, the αC-helix and DFG motif, remained in their active αC-in and DFG-in orientations. The AF2 shallow MSA approach produced only a limited variability of the regulatory DFG-in conformation that remained confined to its active form. As a result, although reducing MSA depth can increase conformational heterogeneity around the ground active ABL state, this approach is unable to generate the experimentally observed low-populated inactive ABL conformations.
Prediction of the ABL Conformational Ensembles Using the SPEACH_AF Approach.
In the next round of computational experiments, we explored the SPEACH_AF method14 in which MSAs are first generated from the original full-length sequence using the MMSeqs2 library.48,49 The method then applies random alanine masking to the generated MSA by introducing alanine mutations at the same alignment positions in every sequence of the MSA (Figure S2). These experiments examined whether the SPEACH_AF approach can redirect the attention of the network to distinct parts of the MSA and detect distinct functional states and corresponding structural ensembles. We generated different MSAs with SPEACH_AF by varying both the positions and the number of random alanine mutations in the MSAs (Figure 5). The results illustrated a range of prediction scenarios that can be generated by the SPEACH_AF adaptation depending on the altered positions in the MSAs. First, we examined the results of SPEACH_AF experiments in which alanine masking with different numbers of mutations in the MSAs was applied to the functional kinase segments of the catalytic domain, including the substrate binding region in the C-terminal lobe, P-loop, αC-helix, and A-loop (Figure 5). By altering MSAs in these regions, it was assumed that SPEACH_AF can direct the attention mechanism of the network to these positions and facilitate conformational heterogeneity of the predicted ensembles. A heatmap representation of the MSA and the relative coverage of the sequence with respect to the total number of aligned sequences pointed to good sequence coverage and high quality of MSA alignment across all residue positions (Figure 5A). The pLDDT profiles for the models showed convergence of the predicted conformations, as pLDDT values >80 were seen for most of the ABL residues with the exception of the highly mobile N-terminal residues 240–290 and regions near the intrinsically dynamic A-loop (residues 398–420) (Figure 5B). 
The pLDDT density profile highlighted the top peaks corresponding to the high pLDDT values of ~85–90 for the ABL residues, showing some moderate reduction for the A-loop residues (Figure 5C). The distributions of TM scores with respect to the active and two inactive conformations showed a broad range of values between 0.2 and 0.9, confirming that this approach can produce a wide range of alternative structures (Figure 5D). Interestingly, significant peaks at TM ~0.6–0.9 values measuring similarities with the active form indicated that predictions continue to be dominated by active-like ABL conformations.
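The column-wise alanine masking that defines the SPEACH_AF-style experiments above can be illustrated on a toy alignment. This is a hedged sketch and not the SPEACH_AF code: the function names are hypothetical, the MSA is represented as a list of equal-length aligned strings, and leaving gap characters untouched is an assumption made here for illustration.

```python
import random
from typing import Iterable, List, Optional


def mask_msa_with_alanine(msa: List[str], positions: Iterable[int]) -> List[str]:
    """Replace the given 0-based alignment columns with 'A' in every sequence.

    Gap characters ('-') are kept as gaps (an assumption of this sketch).
    """
    cols = set(positions)
    masked = []
    for seq in msa:
        chars = [
            "A" if (i in cols and ch != "-") else ch
            for i, ch in enumerate(seq)
        ]
        masked.append("".join(chars))
    return masked


def random_columns(start: int, end: int, n_mut: int, seed: Optional[int] = None) -> List[int]:
    """Pick `n_mut` distinct columns in the half-open range [start, end)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(start, end), n_mut))


# Toy alignment: the query is the first row.
toy_msa = ["MKVLE", "M-VIE", "MRVLD"]
toy_masked = mask_msa_with_alanine(toy_msa, [1, 3])
```

Restricting `random_columns` to the index range of a functional segment (for example, the A-loop) corresponds to the targeted masking experiments, while sampling over the whole alignment corresponds to untargeted masking.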
Figure 5.
Statistical analysis of the SPEACH_AF models for the ABL kinase structures. (A) Heatmap representation of the MSA indicating all sequences mapped to the input sequences. The color scale indicates the identity score, and sequences are ordered from top (largest identity) to bottom (lowest identity). White regions are not covered, which occurs with subsequence entries in the database. The black line quantifies the relative coverage of the sequence with respect to the total number of aligned sequences. (B) The residue-based pLDDT profile for the models obtained from the SPEACH_AF predictions for ABL conformations. (C) Distribution density of pLDDT values for the SPEACH_AF models. (D) Distribution density of TM scores for the predicted ABL conformations with respect to the experimental structures. (E) Distribution density of rmsd scores for the predicted ABL conformations relative to the experimental structures. (F) Distribution density of rmsd scores for the predicted A-loop conformations with respect to the A-loop residues in the experimental structures. The densities in (D-F) for structural similarity metrics are shown with respect to the experimental active state (orange filled bars) and the inactive states I1 (red filled bars) and I2 (blue filled bars).
A strong peak of TM scores ~0.85–0.9 measuring structural similarities of the predicted conformations with the inactive I1 state revealed that the SPEACH_AF approach can recover the active and the intermediate inactive conformations (Figure 5D). A quantitative analysis of the rmsds for the complete ABL structure (Figure 5E) and the A-loop residues only (Figure 5F) revealed a broad distribution of rmsds with respect to the active ABL conformation (rmsd ~0.4–0.8 Å) and a sharp peak at rmsds ~1.0–1.5 Å for the A-loop. A broader peak at larger rmsds ~3–4 Å reflected variability of the predicted open form of the A-loop, showing that this approach can capture conformational heterogeneity of the active structure. The rmsd distribution for the A-loop residues displayed a small peak at rmsd ~1.5–2 Å with respect to the inactive I1 structure but relatively large rmsds of ~7.0–8.0 Å when compared to the flipped A-loop conformation in the fully inactive I2 state (Figure 5E,F).
After filtering out conformations with low pLDDT values (pLDDT <60), we analyzed the structural alignment of the predicted conformational ensemble (Figure 6A), which featured appreciable displacements of the dynamic functional regions: the P-loop, αC-helix, and A-loop. Notably, the diversity of the predicted conformations is also reflected in the distribution of DFG conformations, showing increased variability and extensive sampling of multiple intermediate DFG-out positions (Figure 6B). Consistent with the experimental analysis of the inactive I1 state, the predicted conformational ensemble featured the DFG motif flipped from the active DFG-in position by 180° and adopting the inactive DFG-out conformation. At the same time, the majority of the SPEACH_AF-generated conformations featured the A-loop in the open conformation, which is similar to the active form (Figure 6B).
Figure 6.
(A) Structural alignment of the AF2-predicted conformational ensemble obtained with the SPEACH_AF approach. (B) Structural overlay of the regulatory DFG motif conformations from the SPEACH_AF predicted conformational ensemble.
We also examined how alanine masking at different positions of the generated MSAs can direct the attention mechanism within AF2 to particular protein regions and affect the predictions of alternative conformations. Specifically, we conducted SPEACH_AF experiments in which alanine masking of MSAs was directed to the highly dynamic N-terminal region (residues 240–300). By randomly masking the N-terminal residue positions in the MSAs and applying the AF2 prediction apparatus, we can induce significant changes in the distance matrices and produce a highly heterogeneous conformational ensemble. The AF2 prediction analysis shows the expected “poor” coverage of sequences in residues 240–300, which were targeted by masking (Supporting Information, Figure S4). In these experiments, alanine mutagenesis of the MSAs produced a significant and broad increase in conformational variability across the entire ABL kinase domain, particularly featuring low-to-moderate pLDDT values of ~40–70 for the N-terminal residues and A-loop (Supporting Information, Figure S4). The heat maps of PAE values reflected these differences, showing a highly heterogeneous ensemble, with PAEs signaling a considerable level of disorder for the A-loop conformations from these targeted SPEACH_AF predictions (Figure S4). In these experiments, the increased conformational diversity of the ensemble comes at the expense of producing many disordered and misfolded conformations with low pLDDT values along with an appreciable number of folded kinase conformations with pLDDT >70 (Supporting Information, Figure S4). A filtering protocol was applied in which predicted models with pLDDT below 60–70 were removed as misfolded, while conformations with pLDDT >70 corresponded to correctly folded ABL conformations. Structural alignment of the filtered ABL conformations showed an interesting and instructive consequence of targeted alanine masking. 
Indeed, the predicted ensemble displays “uncontrollable” diversity in the N-terminal lobe, whereas the remaining kinase domain accurately reproduced the active-like and intermediate ABL conformations. The overlay of the DFG motif showed the DFG motif exploring both the active DFG-in position and the inactive DFG-out conformation (Supporting Information, Figure S4). Hence, the SPEACH_AF adaptation of AF2 can enhance conformational heterogeneity of the predicted kinase states but could often produce misfolded and partially unfolded conformations. Our results indicated that this method can only partly capture conformational heterogeneity of the ABL structures. Despite markedly increased variability of the AF2-predicted ABL conformations, this approach could not detect the functionally important inactive I2 structure, which is dramatically different from both the active state and the inactive I1 state. Our analysis also indicated that the SPEACH_AF approach can generate a significant number of outliers with low pLDDT values that may often correspond to disordered and misfolded structures.
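The pLDDT-based filtering step described above can be sketched as follows. This is an illustration under stated assumptions rather than the protocol used in the study: it assumes that each model is scored by the mean of its per-residue pLDDT values, and the function name, the input layout (a dictionary from model name to per-residue values), and the toy numbers are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence


def filter_models_by_plddt(
    models: Dict[str, Sequence[float]], cutoff: float = 70.0
) -> List[str]:
    """Return names of models whose mean pLDDT is at least `cutoff`.

    The result is ordered from the highest to the lowest mean pLDDT.
    """
    scored = {name: mean(values) for name, values in models.items()}
    kept = [name for name, score in scored.items() if score >= cutoff]
    return sorted(kept, key=lambda name: scored[name], reverse=True)


# Toy per-residue pLDDT values for three hypothetical models.
toy_models = {
    "model_1": [90.0, 80.0, 85.0],
    "model_2": [40.0, 55.0, 60.0],
    "model_3": [75.0, 70.0, 72.0],
}
toy_kept = filter_models_by_plddt(toy_models, cutoff=70.0)
```

Because the text quotes a cutoff range of 60–70, the `cutoff` argument is left adjustable; a stricter value trades ensemble diversity for fewer misfolded models.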
Randomized Alanine Sequence Scanning Combined with Shallow MSA Facilitates Prediction of Functional Structural Ensembles in Different Forms of the ABL1 Kinase.
Using a combination of randomized alanine sequence scanning and shallow MSA subsampling, we predicted structures and conformational ensembles of the ABL kinase. The central distinguishing feature of the proposed AF2 adaptation is that random alanine scanning is applied to the query sequence, either across the entire protein or within specific functional regions, so that the MSAs are perturbed systematically without introducing random mutations into the homologous sequences within the MSAs themselves. This is also the major difference from the SPEACH_AF approach (Figure 7).
Figure 7.
Statistical analysis of the predicted AF2 models using randomized alanine sequence scanning adaptation of AF2. (A-C) The residue-based pLDDT profiles of the top five ranked models were obtained in three different randomized sequence scanning AF2 experiments. These experiments highlight cases of consistent predictions of the active ABL state with high-confidence pLDDT values for the protein residues. (D-F) The PAE heatmaps are for the top five models in three different randomized sequence scanning AF2 experiments. The color scale contains three colors to highlight the contrast between the high confidence regions and the low confidence regions. The high pLDDT values and the patterns of PAE profiles highlight consistent and accurate prediction of the A-loop conformation (residues 398–421) and other functional regions for the active ABL state.
We report the results obtained from using the algorithm multiple times on the full native sequence, resulting in distinct sequences, each with a different frequency and position of alanine mutations. MSAs were then constructed for each of the mutated sequences using the alanine-scanned full-length sequences as input for the MMSeqs2 program, followed by shallow MSA AF2-based structure generation. The analysis of pLDDT distribution densities for the produced conformations revealed an expected and generally conserved pattern57 in which high-confidence pLDDT values are consistently found for the ABL kinase domain in the thermodynamically dominant active state (Figure 7A-C). Another key regulatory element of the ABL kinase is the αC-helix (residues 292–312), where the αC-in position is characteristic of the active state and the αC-out position is present in intermediate and inactive kinase forms. The AF2 experiments using randomized alanine scanning of the entire protein sequence produced consistently high pLDDT values for the functional kinase regions, including the A-loop and αC-helix, which typically reflected accurate predictions of the active ABL state (Figure 7). In addition, the AF2 predictions correctly reported the reduced confidence in the A-loop residues, as the A-loop conformation cannot be adequately represented by a single structure (Figure 7).
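The sequence-level perturbation described above, in which the query sequence rather than the MSA is mutated, can be sketched as follows. This is a minimal illustration and not the algorithm used in this study: the function name, the rule for drawing the number of mutations (a random count up to `max_fraction` of the eligible positions), and the toy sequence are assumptions.

```python
import random
from typing import List, Optional, Tuple


def alanine_scan_variants(
    sequence: str,
    n_variants: int,
    max_fraction: float = 0.1,
    region: Optional[Tuple[int, int]] = None,
    seed: Optional[int] = None,
) -> List[str]:
    """Generate query variants with random positions mutated to alanine.

    `region` is an optional half-open 0-based index range (start, end) that
    restricts mutations to a segment; None scans the whole sequence.
    Each variant gets its own randomly drawn number and set of positions.
    """
    rng = random.Random(seed)
    start, end = region if region is not None else (0, len(sequence))
    # Positions that are already alanine cannot be mutated.
    candidates = [i for i in range(start, end) if sequence[i] != "A"]
    if not candidates:
        raise ValueError("no non-alanine positions available in the selected range")
    max_mut = min(len(candidates), max(1, int(max_fraction * len(candidates))))

    variants = []
    for _ in range(n_variants):
        n_mut = rng.randint(1, max_mut)
        chosen = set(rng.sample(candidates, n_mut))
        variants.append(
            "".join("A" if i in chosen else aa for i, aa in enumerate(sequence))
        )
    return variants


# Arbitrary toy sequence used only to exercise the function.
toy_sequence = "MKVLEGDFGLSRLMTGDTYTAHAGAKFPIKWTAPESLAYNKF"
toy_variants = alanine_scan_variants(toy_sequence, n_variants=10, seed=0)
```

Each variant would then serve as a separate query for MSA construction, so the homologous sequences in the resulting MSA stay unmodified.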
Other randomized alanine scanning AF2 experiments (Figure S5) produced ABL conformations with reduced pLDDT values for the A-loop residues and a more considerable variation in the A-loop predictions among the top five ranked models. Moderate pLDDT values were observed for the αC-helix residues (Figure S5), with the models featuring both the active αC-in and inactive αC-out orientations. These randomized alanine scanning AF2 experiments showed the ability to generate a diverse ensemble of the inactive ABL conformations with a pronounced variability for the A-loop (residues 398–421). The PAE heatmaps between each residue in the model (Figure S5) underscored the differences between the high-confidence regions and the low-confidence regions, particularly demonstrating the increased heterogeneity of the A-loop and C-lobe residues in the predicted models. The cumulative density distribution of the pLDDT values (Figure 8A) displayed a strong peak at pLDDT ~85–90 and several small peaks at lower pLDDT values. The TM score values for the predicted conformations showed peaks at a TM score ~0.8–0.9 and structural similarities with the active and inactive conformations (Figure 8B). The distribution of the rmsd values for the A-loop residues showed distinct peaks mostly corresponding to the ensemble of active conformations (rmsd <1.0 Å) (Figure 8C,D). A peak at rmsd ~3.0–3.5 Å from the inactive I1 structure indicated a significant population of the predicted conformations that are structurally similar to the inactive I1 structure. The rmsd distribution of predicted conformations with respect to the inactive I2 conformation showed a peak at larger rmsd values but also a population of the predicted conformations with the A-loop residues within rmsd of ~3.0 Å from the inactive I2 state (Figure 8D). 
Hence, combining randomized alanine scanning with shallow MSAs can reproduce a functional ensemble of the heterogeneous inactive I1 conformations and also detect a minor population of the conformations that are similar to the fully inactive I2 structure.
Figure 8.
Analysis of AF2 predictions using a randomized alanine sequence scanning approach. The distribution of the pLDDT assessment metric and structural similarity metrics (TM scores and rmsd values) for the AF2-predicted structures and for the predicted conformations of the A-loop with respect to the active and inactive ABL states. (A) The cumulative residue-based pLDDT profile for the models obtained from nine different experiments using AF2 with a randomized alanine scanning approach. (B) Distribution density of TM scores for the AF2-predicted ABL conformations with respect to the experimental active ABL state (orange filled bars) and the inactive states I1 (red filled bars) and I2 (blue filled bars). (C) Distribution density of rmsd scores for the AF2-predicted ABL conformations relative to the active ABL state (orange filled bars) and the inactive states I1 (red filled bars) and I2 (blue filled bars). (D) Distribution density of rmsd scores for the AF2-predicted A-loop conformations with respect to the A-loop residues in the active ABL state (orange filled bars) and the inactive states I1 (red filled bars) and I2 (blue filled bars). (E) Structural alignment of the AF2-predicted conformational ensemble using randomized alanine sequence scanning. (F) Structural overlay of the regulatory DFG motif conformations from the AF2-predicted conformational ensemble.
Structural mapping of the AF2-predicted conformational ensemble demonstrated a functionally significant variability of the ABL kinase domain, suggesting that randomized alanine scanning combined with shallow MSA AF2 could produce a conformational ensemble reflecting functional heterogeneity of the kinase domain that is dominated by the active ABL conformation with the αC-in position, DFG-in orientation, and a highly flexible A-loop in its open conformational form (Figure 8E). By mapping the AF2-produced DFG conformations for the entire predicted ensemble, we illustrated the functionally relevant positional variability of the DFG motif and a “continuous” spectrum of movements between the active DFG-in position and inactive DFG-out conformations (Figure 8F). The variability of the DFG motif is particularly exemplified by the observed movements of the F401 residue, which samples a large number of intermediate states between DFG-in and DFG-out flipped by 180° (Figure 8F). In addition to the active ABL state, the AF2 ensemble generated a significant population of ABL inactive conformations that sample changes between αC-in and αC-out inactive positions as well as intermediate DFG-in to DFG-out orientations (Figure 8E,F).
The diversity of the predicted ABL conformations is illustrated by selecting AF2 models that are close to the inactive I1 state (rmsd <1.5 Å) (Supporting Information, Figure S6). Strikingly, the kinase core and the A-loop conformation shared by these models are similar and closely resemble the I1 state, but the DFG motif can adopt a number of intermediate DFG-out positions (Supporting Information, Figure S6). In addition, most of the AF2-predicted ABL states tend to shift the αC-helix toward the inactive αC-out orientation. To understand the functional relevance of the predicted inactive conformations, the AF2 ensemble was further examined using the recently proposed nomenclature for the structures of active and inactive kinases.46
Interestingly, a significant fraction of the predicted inactive conformations directly correspond to the “BLBplus” class (DFG-in/out, αC-helix-out) that represents one of the two common inactive kinase forms,46 where the αC-helix assumes an inactive αC-out conformation and the DFG-Phe motif samples intermediate “out” positions with F401 pointing upward and thus pushing the αC-helix outward (Figure 8, Supporting Information Figure S6B). According to the statistical analysis of the active and inactive kinase forms, 95% of kinase structures with the active DFG-in conformation also have the active αC-in position, while 77% of kinase structures with the inactive DFG-in/out upward conformation have the αC-helix-out conformation. In this conformation, the DFG-Phe ring is underneath the C-helix but pointing upward, forcing the C-helix outward.46 The important finding of this analysis is that the randomized alanine sequence scanning adaptation, when combined with shallow MSA AF2 modeling, enables prediction of functional conformational ensembles of ABL kinase, capturing the dominant population of the active ABL form along with largely heterogeneous inactive conformations.
Hence, the proposed alanine sequence scanning adaptation of the AF2 approach can produce a broad ensemble of both active and inactive states. The results also indicated that the A-loop and C-lobe regions become increasingly flexible in the intermediate inactive forms, which is consistent with the kinase regulation mechanisms where these functional regions undergo “cracking” to facilitate considerable structural changes during transitions between the inactive and active kinase states.58,59 Importantly, in contrast to other methods such as SPEACH_AF and AF-Cluster, the randomized sequence scanning adaptation can increase the conformational diversity of the AF2-predicted states while avoiding any misfolded predictions and capturing functional conformational heterogeneity of both active and inactive states.
Structural Analysis of the AF2-Generated Conformations and Equilibrium Macrostates of the ABL Landscape Reveals Functional Significance of the Predicted Ensembles.
The presented results can be better interpreted and understood in the context of the experimental structural studies and our previous atomistic simulations of the ABL kinase states. The results of our previous MD and MSM analysis42 suggested that the ABL kinase domain can utilize the inactive I1 state for pathways between the active form and the inactive I2 state. We compared the AF2 conformational ensembles produced using randomized alanine scanning with the equilibrium MD simulations, particularly focusing on mapping the predicted conformations onto the MSM-based conformational macrostates that define kinetic transitions between ABL states.42 According to our original study, the stationary distribution and transition probabilities resulted in a total of eight macrostates, with three macrostates defining the active kinase conformation, two macrostates being associated with the inactive structure I1, and three other macrostates defining the basin of the inactive structure I2.42 Here, we systematically computed the rmsds between the AF2-generated conformational ensembles and the corresponding macrostates and analyzed the corresponding rmsd density distributions (Figure 9). This analysis showed that AF2-produced conformations can fairly accurately capture the entire landscape of conformational change, as exemplified by a spectrum of conformational macrostates. Indeed, structural mapping of AF2 conformations on macrostates 1, 2, and 3 that define the active ABL form42 revealed strong peaks at rmsds ~2.5–3.0 Å, thus confirming coverage of the equilibrium ensemble for the active ABL structure (Figure 9A). Strikingly, structural projection of the AF2 ensemble on macrostates 4 and 6 that characterized the inactive I1 structure42 resulted in distributions with pronounced peaks at rmsds ~2.5–3.5 Å from these macrostates (Figure 9B). 
These findings supported the notion that the AF2 ensembles can reproduce the main characteristics of the equilibrium ensembles and capture conformational transitions between the active and inactive ABL forms. The results of structural mapping onto the macrostates representing the inactive I2 structure42 showed that a small fraction of the predicted conformational ensemble is similar to the unique inactive I2 conformations (Figure 9C). Structural mapping of the AF2 conformations onto distinct conformational macrostates of ABL derived from our previous study42 confirmed the prediction accuracy and ability of the approach to capture the functionally relevant heterogeneity of the generated ensemble. Interestingly, the results also reflected the dominant population of the active ABL form and a small fraction of the low-populated inactive structures (Figure 9D-F). The results of the AF2 predictions indicated that a broad ensemble of inactive intermediate BLBplus conformations (DFG-in/out, C-helix-out) may effectively connect the active form with the inactive I2 state (Figures 8 and 9). Importantly, the AF2 prediction results are also consistent with the recent unbiased MD simulations of ABL-Imatinib binding, which captured the conformational change from the DFG-in/A-loop open conformation to the DFG-out/A-loop closed conformation.59 Consistent with this study, the AF2-predicted conformations are associated with a predominantly active DFG-in/A-loop open conformation and the inactive conformation, where the αC-helix assumes the inactive “out” orientation and the DFG motif is “flipped” to adopt a “DFG-out” conformation, while the A-loop remains in the open, active-like form (Figure 8E,F). It is important to emphasize that even though AF2 predictions can often provide highly accurate static structures and capture functionally relevant conformations from the equilibrium ensembles, the AF2 ensembles cannot be equated with the thermodynamic equilibrium ensemble of conformations. 
However, it appears that integrating information from both methods may be beneficial for the robust characterization of the functional protein conformations.
Figure 9.
Comparison of AF2 predictions using a randomized alanine sequence scanning approach with the equilibrium macrostates obtained from MSM analysis. (A) Distribution density of the rmsd scores for the AF2-predicted ABL conformations relative to macrostates 1, 2, and 3 defining the active ABL form (shown in orange, red, and blue filled bars, respectively). (B) Distribution density of the rmsd scores for the AF2-predicted ABL conformations relative to macrostates 4 and 6 defining the inactive I1 form (shown in orange and red filled bars, respectively). (C) Distribution density of the rmsd scores for the AF2-predicted ABL conformations relative to macrostates 5, 7, and 8 defining the inactive I2 form (shown in orange, red, and blue filled bars, respectively). (D,E) Structural alignment of the AF2-predicted conformation with active-like macrostate 1 (orange ribbons) and macrostate 2 (red ribbons). (F,G) Structural alignment of the AF2-predicted conformation with inactive-like macrostate 4 (orange ribbons) and macrostate 6 (red ribbons). (H,I) Structural alignment of the AF2-predicted conformation with inactive-like macrostate 5 (orange ribbons) and macrostate 7 (red ribbons). The AF2 conformation is in hot-pink ribbons in (D-I).
Our results showed that applying randomized alanine sequence scanning together with the AF2 shallow MSA framework generates accurate conformational ensembles of the active and intermediate inactive states (αC-out, DFG-out, A-loop open) that play a central role in the ABL-drug recognition and binding process. Furthermore, the finding that predictions of the fully inactive state (DFG-out, αC-helix-out, and closed A-loop) are rare is echoed in observations that this ABL conformation is transient and “hidden” on the unbound ABL landscape and may become stable only upon Imatinib binding.59 Similar conclusions are reached in a recent analysis of fold-switching proteins, showing that AF2 predictions of these allosteric systems are strongly biased toward the more stable ground fold-switch state while often failing to detect less stable “excited” states.60
Probing Effects of Targeted Alanine Sequence Scanning in Functional Kinase Regions Responsible for Conformational Change on Predicted AF2 Ensembles.
To probe differences between randomized alanine scanning of the entire protein sequence and targeted random scanning of specific kinase regions, we also examined several variations of this approach with targeted alanine masking of the ABL sequence space. The effects of random alanine masking of sequence positions were probed in particular for regions involved in conformational changes between active and inactive ABL states, including the A-loop (residues 398–421), the αC-helix (residues 300–311), and the C-lobe regions including the P+1 motif (residues 420–440) and the helix spanning residues 460–480. For each of these targeted alanine masking experiments, we generated 10 alanine-scanned sequences, each with a different frequency and position of alanine mutations in the respective regions. By combining manipulation of sequence variations with MSA subsampling, we probed the effects of randomized alanine scanning in the functional regions on the diversity of the predicted AF2 ensembles. The generated AF2 ensembles showed considerable conformational heterogeneity while typically producing high-confidence structures with pLDDT values >70 (Supporting Information Figure S7). Although random scanning targeted the A-loop region, the vast majority of the AF2-generated conformations corresponded to the active and intermediate inactive states (Supporting Information Figure S7). The population of the fully inactive state is small, as only a few predicted conformations featured the fully flipped A-loop. Interestingly, alanine scanning of the full sequence produced more conformations that are similar to those of the fully inactive state.
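The targeted scanning procedure described above can be illustrated with a minimal Python sketch. It is not the authors' script: the function names, the `first_residue` offset, and the 5–15% range of mutation frequencies are assumptions chosen for illustration. Only the count of 10 variants per experiment and the residue ranges are taken from the text.

```python
import random

# Inclusive residue ranges quoted in the text; further regions can be added.
TARGET_REGIONS = {
    "A-loop": (398, 421),
    "P+1 motif": (420, 440),
}


def alanine_scan(sequence, regions, fraction, first_residue=1, rng=None):
    """Replace a random subset of residues inside `regions` with alanine.

    `regions` holds inclusive (start, end) residue numbers, `first_residue`
    is the residue number of sequence[0], and `fraction` is the share of
    targeted positions to mutate (an assumed, tunable parameter).
    """
    if rng is None:
        rng = random.Random()
    # Collect 0-based indices of targeted positions that fall inside the sequence.
    candidates = sorted({
        pos - first_residue
        for start, end in regions
        for pos in range(start, end + 1)
        if 0 <= pos - first_residue < len(sequence)
    })
    n_mutations = round(fraction * len(candidates))
    residues = list(sequence)
    for index in rng.sample(candidates, n_mutations):
        residues[index] = "A"
    return "".join(residues)


def make_variants(sequence, regions, n_variants=10, min_fraction=0.05,
                  max_fraction=0.15, first_residue=1, seed=0):
    """Generate variants that differ in both frequency and position of alanines."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        fraction = rng.uniform(min_fraction, max_fraction)
        variants.append(
            alanine_scan(sequence, regions, fraction, first_residue, rng)
        )
    return variants


if __name__ == "__main__":
    # Toy 50-residue sequence numbered 390-439 (hypothetical, not the ABL sequence).
    toy_sequence = "G" * 50
    regions = list(TARGET_REGIONS.values())
    for variant in make_variants(toy_sequence, regions, first_residue=390):
        print(variant)
```

In a workflow of this kind, each variant would then be submitted as the query sequence to an AF2 run with shallow MSA subsampling. Passing a single range covering the whole sequence turns the same function into a full-sequence scan.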
To facilitate a comparative analysis, we assembled the predicted AF2 ensembles for shallow MSA subsampling and SPEACH_AF randomized full and targeted alanine sequence scanning. PCA was performed on the AF2-predicted ensembles and compared against the experimental NMR ensembles of the active ABL state and the two inactive forms (Figure 10). The generated models from all methods were first filtered to exclude models with a pLDDT <60–70. These filtered sets of models were then evaluated by PCA using the MDAnalysis library. MDAnalysis was used to assemble all generated structures into trajectories, and the PCA algorithm was then employed to project the respective conformational ensembles onto two principal components (Figure 10). This allowed for direct comparison of the output AF2 structures obtained with the different methodologies to check the validity and diversity of the models. PCA projections of the NMR ensembles for the ABL states (Figure 10A-C) highlight distinctive structural signatures of these functional ABL forms, sampling both common and unique regions of the conformational space. Consistent with our analysis, the PCA of the ensemble generated by shallow MSA subsampling illustrated conformational heterogeneity, which is primarily associated with variability of the active ABL form corresponding to the densest region (Figure 10D).
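The pLDDT filtering step that precedes PCA can be sketched as follows. AF2 writes the per-residue pLDDT into the B-factor column of its output PDB files; everything else here is an assumption made for illustration. The function names are hypothetical, the score is averaged over Cα atoms, and the cutoff of 70 is one choice within the 60–70 range quoted above.

```python
def mean_plddt(pdb_text):
    """Mean pLDDT over C-alpha atoms, read from the B-factor column (61-66)."""
    values = [
        float(line[60:66])
        for line in pdb_text.splitlines()
        if line.startswith("ATOM") and line[12:16].strip() == "CA"
    ]
    if not values:
        raise ValueError("no C-alpha ATOM records found")
    return sum(values) / len(values)


def filter_models(models, threshold=70.0):
    """Return the names of models whose mean pLDDT reaches `threshold`.

    `models` maps a model name to the text of its PDB file.
    """
    return [name for name, text in models.items()
            if mean_plddt(text) >= threshold]


def make_ca_record(serial, residue_number, plddt):
    """Build a fixed-width C-alpha ATOM record (demo helper, dummy coordinates)."""
    return ("ATOM  {:5d}  CA  ALA A{:4d}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}"
            .format(serial, residue_number, 0.0, 0.0, 0.0, 1.00, plddt))


if __name__ == "__main__":
    confident = "\n".join(make_ca_record(i, i, 88.0) for i in range(1, 6))
    uncertain = "\n".join(make_ca_record(i, i, 52.0) for i in range(1, 6))
    models = {"model_a": confident, "model_b": uncertain}
    print(filter_models(models))  # only model_a passes the cutoff
```

The names returned by `filter_models` identify the structures that would be passed on to the PCA projection.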
Figure 10.
PCA of the NMR ensembles for the active ABL structure (A), the intermediate inactive structure (B), and the fully inactive structure (C). PCA of the generated ensembles using shallow MSA subsampling (D), the SPEACH_AF approach (E), randomized alanine scanning of the full sequence (F), and random scanning in the targeted functional regions of the A-loop and C-lobe (G).
Interestingly, the SPEACH_AF approach yielded the most diverse ensemble, spanning a more diffuse conformational space, but failed to differentiate between populations of different ABL conformations and to identify functionally relevant conformational clusters of the active and inactive states (Figure 10E). We also found that this approach could not detect the fully inactive structure, which is vastly different from both the active state and the intermediate inactive state. Moreover, the SPEACH_AF approach can generate a significant number of disordered and misfolded structures that need to be filtered out from the generated ensemble. Our results suggest that this approach can be used to generate a large number of distinct initial states that may be utilized to launch MD simulations and thereby enable more effective exploration of the conformational space.
Of particular interest is a comparison of PCA plots for conformational ensembles produced by randomized alanine scanning of the full sequence (Figure 10F) and random scanning in the targeted functional regions of the A-loop and C-lobe (Figure 10G). While there is considerable similarity between these ensembles, the PCA projections also reveal interesting differences. The key difference is that randomized alanine scanning of the full sequence can produce more distinct functional conformational clusters that are associated with the active and inactive ABL states, thus enabling some characterization of the populations of the different states. In addition, it appears that full sequence scanning can yield more conformations that are similar to the fully inactive state with the flipped closed conformation of the A-loop. On the other hand, random scanning of targeted regions can also yield meaningful conformational ensembles and detect the inactive intermediate states (Figure 10G). Although it may be counterintuitive at first glance, “localized” random sequence scanning in specific kinase regions, particularly the A-loop residues, may not necessarily force the AF2 attention network to switch bias from the most dominant active kinase structures to low-populated “hidden” inactive ABL conformations.
The important finding of this analysis is that combining randomized sequence scanning across the entire sequence with shallow MSA subsampling may present a simple and robust approach for the prediction of functional kinase states and ensembles that are consistent with the structural experiments. This comparative analysis also underscored that the key challenges of the emerging AF2 adaptations may be associated with more accurate predictions of relative conformational populations of distinct allosteric states rather than simply increasing the breadth of sampled conformations.
Local Frustration Analysis of the ABL Structures and AF2-Structural Ensembles.
Here, we propose a physical model for analyzing and interpreting the results of AF2 predictions of ABL allosteric states. We hypothesize that the accurate AF2 predictions of structural ensembles of the active and intermediate states, and the inability to consistently locate the fully inactive form, may be associated with the radical differences in local frustration patterns of these states. We examine local frustration patterns of the ABL states and show that the AF2-predicted thermodynamically dominant active and intermediate inactive states are characterized by minimally frustrated conformational landscapes that ensure both the stability and kinetic accessibility of these conformations. Our hypothesis for rationalization of the AF2 predictions is based on the notion that the minimally frustrated interactions of thermodynamically dominant states within a protein family can be conserved over evolutionary time scales, which implies their driving role in determining the foldability and coevolutionary signals that are learned and captured by the AF2 methodology. We propose that the inability of AF2 to accurately predict all experimental conformations of regulatory switches such as the ABL kinase may also be associated with the intrinsic differences in local frustration of conformational landscapes for the thermodynamically favorable and low-populated excited states. In this model, the highly frustrated interactions in alternative excited kinase states can be associated with specific coevolutionary signals that are not learned by the AF2 architecture, which forces AF2 predictions toward minimally frustrated and thermodynamically favorable states. The analysis is based on the profiling of the ABL residues in different states by a local frustratometer.50-52
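As a reminder of how contacts are sorted into frustration classes, the sketch below computes a frustration index as a Z-score of the native contact energy against a set of decoy energies and then applies the cutoffs conventionally used with the frustratometer (above 0.78 for minimally frustrated, below −1 for highly frustrated). The cutoffs, the sign convention, and all function names are assumptions stated here for illustration; they are not taken from this section, and the energies in the example are invented.

```python
from collections import Counter
from statistics import mean, pstdev

MINIMAL_CUTOFF = 0.78   # index above this: minimally frustrated (assumed convention)
HIGH_CUTOFF = -1.0      # index below this: highly frustrated (assumed convention)


def frustration_index(native_energy, decoy_energies):
    """Z-score of the native contact energy relative to decoy energies.

    Sign convention assumed here: the index is positive when the native
    contact is more favorable (lower in energy) than the average decoy.
    """
    spread = pstdev(decoy_energies)
    if spread == 0:
        raise ValueError("decoy energies must not all be identical")
    return (mean(decoy_energies) - native_energy) / spread


def classify_contact(index):
    """Map a frustration index to 'minimal', 'neutral', or 'high'."""
    if index > MINIMAL_CUTOFF:
        return "minimal"
    if index < HIGH_CUTOFF:
        return "high"
    return "neutral"


def class_fractions(indices):
    """Fraction of contacts in each frustration class."""
    counts = Counter(classify_contact(value) for value in indices)
    total = len(indices)
    return {label: counts[label] / total
            for label in ("minimal", "neutral", "high")}


if __name__ == "__main__":
    # Invented energies (arbitrary units) for a single contact.
    index = frustration_index(-2.0, [0.0, 1.0, -1.0, 0.0])
    print(round(index, 2), classify_contact(index))
    print(class_fractions([1.5, 0.2, -0.4, -1.8]))
```

Per-residue density profiles such as those discussed in this section are typically obtained by counting contacts of each class in the neighborhood of every residue; that step is omitted from this sketch.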
The conformational frustration profiles showed a moderate density of highly frustrated contacts for the active state (Figure 11A) and the intermediate inactive state (Figure 11B), in which the A-loop is fully extended or open. Importantly, the high frustration density for the A-loop (residues 398–421) is fairly small in both of these states, as the corresponding density profiles for the A-loop residues displayed local minima (Figure 11A,B), while the minimally frustrated density profile showed moderate peaks for the A-loop positions (Figure 11D,E). The shape of the local frustration density profiles for the active and intermediate inactive states is similar, with some exceptions associated with the high frustration of the P-loop region (residues 260–280) in the intermediate inactive form (Figure 11B). In this form, the DFG motif is flipped 180° with respect to the active conformation and adopts the DFG-out conformation, whereas the A-loop remains in an open conformation similar to that of the active state. The frustration profiles near the DFG motif and neighboring A-loop residues featured a considerable level of minimally frustrated contacts (Figure 11A,B). In contrast, for the fully inactive form, the density of highly frustrated contacts is increased significantly (Figure 11C). In particular, the high frustration density profile of this state revealed pronounced peaks associated with the αC-helix (residues 300–311), the A-loop (residues 402–421), and the adjacent region in the C-lobe (residues 420–440) that includes the P+1 motif, as well as the helix spanning residues 460–480 (Figure 11C). The P+1 segment is critical for substrate recognition and also serves as a hydrophobic glue holding the subdomains of the C-lobe together. The APE motif (residues 426–428) is anchored to three helices, linking the activation segment and C-terminal subdomains.
This can be contrasted with the active and intermediate inactive structures, in which the contacts formed by residues proximal to A-loop positions are mostly minimally frustrated (Figure 11D,E). The observed change in the local conformational frustration of the fully inactive form echoed our previous studies of local frustration in protein kinases, showing that conserved locally frustrated clusters could overlap with the kinase segments involved in conformational changes associated with the kinase function.56
Figure 11.
Local density of contact distributions of conformational frustration in the ABL structures. The residue-based high frustration density in the active ABL structure (A), the intermediate inactive state (B), and the fully inactive state (C). The high frustration density is shown as red lines. The residue-based minimal frustration density in the active ABL structure (D), the intermediate inactive state (E), and the fully inactive state (F).
Conformational frustration profiles for the AF2-predicted intermediate inactive states showed a moderate high frustration density (Supporting Information Figure S8) that is reminiscent of the intermediate inactive state and, in part, of the fully inactive state, exhibiting several specific peaks. The corresponding high frustration clusters map to the αC-helix (residues 300–311) and the C-lobe regions including the P+1 motif (residues 420–440) and the helix spanning residues 460–480 (Supporting Information Figure S8A-C). Importantly, however, these distributions revealed only moderate high frustration for the A-loop residues following the DFG motif (residues 403–421), which is more reminiscent of the frustration profile of the intermediate inactive structure but vastly different from the high frustration constellation of A-loop residues seen in the fully inactive, A-loop closed state (Supporting Information Figure S8A-C). We also performed both conformational and mutational frustration analysis for the ABL states and highlighted the highly frustrated, neutrally frustrated, and minimally frustrated local densities (Supporting Information Figure S9). The results of the conformational frustration analysis (Supporting Information Figure S9A-C) highlighted the prevalence of neutral frustration, which is consistent with previous studies of frustration in proteins.55 A direct comparison of high frustration and minimal frustration local densities is particularly revealing, showing the dominance of minimal frustration densities for the active state (Supporting Information Figure S9A,D). Interestingly, the profiles for the high and minimal frustration densities in the intermediate inactive state (Supporting Information Figure S9B,E) are quite similar to those of the active form, suggesting that both states are characterized by an overall minimally frustrated landscape. In contrast, conformational and mutational frustration densities for the fully inactive state displayed high frustration peaks in the A-loop and other regions (Supporting Information Figure S9C,F).
High Frustration Clusters in the Low-Populated Inactive ABL State Define Cracking Sites of Allosteric Changes and Present Difficult Targets for AF2 Assessment.
To provide structure-based analysis and interpretation of local frustration patterns in the ABL kinase, we mapped the top 10% of highly frustrated sites and minimally frustrated sites in the active state (Supporting Information Figure S10), the intermediate inactive state (Supporting Information Figure S11), and the fully inactive state (Supporting Information Figure S12). The high frustration sites in the active state are mostly localized in the helix spanning residues 460–480, with several additional isolated sites in the C-lobe (Supporting Information Figure S10A). Notably, the functional αC-helix and A-loop regions are minimally frustrated in the active form. The high frustration sites become more densely populated in the intermediate inactive state (Supporting Information Figure S11A) where, in addition to the C-lobe helix, some of the sites are located near the DFG motif and the αC-helix, which undergo in-out shifts. However, there is no high frustration density in the A-loop, which remains in the open form. A vastly different distribution of high frustration sites was observed in the fully inactive state (Supporting Information Figure S12). We found a considerable density of high frustration sites in the αC-helix and in the C-lobe, including portions of the P+1 loop (residues 420–440) and the helix spanning residues 460–480. Notably, and in contrast to the other states, we detected a significant density of high frustration positions in the closed A-loop, mostly in the flipped segment of the A-loop region (Supporting Information Figure S12A). Importantly, the highly frustrated sites from the A-loop, the P+1 substrate binding motif, and the C-lobe helix are clustered together, forming a large fraction of the C-lobe. Interestingly, the high frustration residues occupy critical regions of the fully inactive state and orchestrate conformational switches between the inactive and active states.
Hence, our results indicated that the ABL kinase regions undergoing large structural changes during inactive–active transitions could be enriched in clusters of highly frustrated residues. These findings are consistent with recent studies showing that identification of conserved highly frustrated contacts can determine residues involved in conformational transitions.56
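The selection of the top 10% of high frustration sites and the check for their enrichment in a functional region can be sketched in a few lines of Python. The function names and the synthetic density values are hypothetical and serve only to illustrate the bookkeeping; the A-loop range (residues 398–421) is the one quoted in the text.

```python
def top_fraction_sites(density_by_residue, fraction=0.10):
    """Residue numbers with the highest density, limited to the top `fraction`."""
    n_top = max(1, round(fraction * len(density_by_residue)))
    ranked = sorted(density_by_residue.items(),
                    key=lambda item: item[1], reverse=True)
    return sorted(residue for residue, _ in ranked[:n_top])


def sites_in_region(sites, start, end):
    """Subset of `sites` that falls inside the inclusive residue range."""
    return [residue for residue in sites if start <= residue <= end]


if __name__ == "__main__":
    # Synthetic high-frustration densities for residues 380-439 (invented values):
    # a flat background with an elevated stretch inside the A-loop.
    densities = {residue: 0.05 for residue in range(380, 440)}
    for residue in range(405, 411):
        densities[residue] = 0.40

    hotspots = top_fraction_sites(densities, fraction=0.10)
    a_loop_hotspots = sites_in_region(hotspots, 398, 421)
    print(hotspots)
    print(len(a_loop_hotspots), "of", len(hotspots), "hotspots lie in the A-loop")
```

Repeating the region check for each functional segment gives a simple count of how many top-ranked sites fall in regions that change conformation between states.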
The emergence of local frustration in these regions could also indicate the “initiation cracking points” of the inactive kinase form, ultimately facilitating global conformational transitions and shifting a dynamic equilibrium between kinase states toward the active form.58 The local cracking model in protein kinases often refers to the formation of unfavorable residue-residue interactions that can be compensated by increased local entropy in intermediate states during large conformational changes between inactive and active states.58,59,61 In this model, the formation of frustrated contacts and high strain may result in partial local unfolding (cracking) to facilitate large conformational transitions. Our local frustration analysis indicated in particular that a large cluster of high frustration sites in the fully inactive state may be involved in such a cracking mechanism during transitions to the active form. The highly frustrated clusters in the αC-helix, A-loop, and C-lobe regions could serve as the “initiation cracking points” that perturb the inactive state and promote dynamic functional transitions to the active kinase form.
Strikingly, AF2 adaptations often fail to correctly predict the structural arrangement of this highly frustrated region in the fully inactive state and thus cannot accurately reconstruct the unique structure of this form. The fully inactive state, which is characterized by a significant amount of local frustration and the presence of large structural clusters of frustrated unfavorable contacts, can feature coevolutionary signals that are masked in AF2 predictions. We argue that AF2 adaptations can predict distinct conformations that are associated with stable interactions and coevolutionary signals produced by minimally frustrated contacts observed in the native protein structures. This analysis supported the emerging notion that AF2 models cannot readily predict conformational mechanisms driven by frustration changes and allosteric transformations. The results of this study suggest that AF2 prediction pipelines can be biased toward minimally frustrated, thermodynamically favorable states, as coevolutionary signals inferred from these structures dominate AF2 training and inference networks. The significant dependence of AF2 algorithms on coevolutionary signals may explain the lack of robust prediction for the rare kinase states that are evolutionary byproducts with significant frustration content (both geometrical and energetic) that is not readily discernible by learning coevolutionary signals in protein families.
In this context, it is interesting to note that analysis of local frustration patterns in allosteric proteins with multiple functional states showed that the regions undergoing large conformational rearrangements can be enriched in patches of highly frustrated interactions.54,55 Although most interactions in allosteric proteins are not frustrated, evolution can use frustration in localized protein regions to allow for alternative structures, which can then become energetically favorable under differing conditions or upon ligand/protein binding.54 Importantly, however, local frustration patterns alone cannot predict allosteric states and mechanisms but rather represent some of the many characteristics of allosterically regulated systems. Understanding structural and dynamic signatures of allosteric interactions and pathways in protein systems can provide another dimension and level of physical insight required for improving AF2 methods to predict allosteric states.
Among a myriad of approaches, simulation-based modeling of allosteric correlation pathways and communications can be compared with NMR experimental data to reconstruct functional pathways and allosteric protein states. Recent studies established important linkages between allosteric pathways and long-range dynamic couplings between functional regions and binding sites.62 Incorporating dynamic and allosteric pathway-based characteristics of allosteric systems into the manipulation of sequence and structural modules of AF2 attention neural networks may be necessary to make another step toward robust prediction of allosteric states and functional ensembles. Combining AF2 methods for predicting protein ensembles and allosteric states with related adaptations for understanding enzymatic functions21,22 could enable quantitative characterization of enzyme catalysis and applications in enzyme design, particularly for generating new functional enzyme variants for target reactions within the biological constraints of the sequence space. Future applications of AF2 methodologies are likely to integrate innovative approaches for the modeling of protein dynamics and design into synergistic and adaptive AI-based platforms for manipulating protein functions and evolution.
CONCLUSIONS
In the current study, we employed several adaptations of the AlphaFold2 methodology to predict protein conformational ensembles and allosteric states of the ABL kinase. We showed that the MSA shallow subsampling approach can adequately predict the active ABL conformation and the conformational heterogeneity around the active form but is unable to reliably reproduce the low-populated inactive conformations. A comparative analysis reveals that the SPEACH_AF approach can increase the conformational diversity by sampling both open and partly closed A-loop conformations but may be sensitive to the mutation-generated MSAs and frequency of mutations in the MSAs, often producing highly disordered and misfolded conformations. The proposed randomized sequence scanning adaptation of the AF2 shallow MSA method can increase the conformational diversity of the predicted conformations and adequately describe relative populations of functional states while avoiding any misfolded predictions. We demonstrated that combining the proposed randomized sequence scanning with shallow MSA subsampling may present a simple and robust approach for the accurate prediction of functional kinase states and ensembles that are consistent with the NMR experiments. By mapping the predicted AF2 ensembles of the ABL kinase with the equilibrium simulations and major kinetic macrostates obtained from our previous studies, we found that the proposed AF2 adaptation can accurately capture conformational heterogeneity of the functional states and structural transformations between the active and inactive ABL conformations. We suggest that this approach can allow for gradual diversification of the attention network mechanism and discover distinct patterns of coevolved residues from the MSAs. 
The performed comparative analysis also underscored that the key challenges of the emerging AF2 adaptations may be associated with more accurate predictions of relative conformational populations of distinct allosteric states rather than simply increasing the breadth of sampled conformations. By employing local frustration analysis of the AF2-predicted conformations, this study unveiled previously unappreciated connections between local frustration patterns of conformational states and the ability of the AF2 methods to predict structural ensembles of the active and inactive states. We determined that the dominant minimal frustration patterns in the active ABL state and the inactive intermediate state provide broad and funnel-like landscapes around these states, allowing AF2 predictions to accurately capture structural ensembles of these functional ABL conformations. In contrast, the emergence of interconnected high frustration residue clusters in the inactive state that define the initiation “cracking” sites of allosteric changes presents difficult targets for robust AF2 predictions. This study proposed the energy landscape framework for interpretable characterization of the AF2 predictions and limitations in detecting allosteric states, suggesting that incorporation of local frustration information and attention-based learning of frustration patterns across protein folds may augment the predictive abilities of AF2-inspired methods.
Data Availability Statement
Data are fully contained within the article and Supporting Information. Crystal structures were obtained and downloaded from the Protein Data Bank (http://www.rcsb.org). The rendering of protein structures was done with the UCSF ChimeraX package (https://www.rbvi.ucsf.edu/chimerax/) and PyMOL (https://pymol.org/2/). The software tools used in this study are freely available at GitHub sites: https://github.com/deepmind/alphafold; https://github.com/sokrypton/ColabFold/; https://github.com/RSvan/SPEACH_AF; https://www.github.com/HWaymentSteele/AFCluster; https://github.com/smu-tao-group/protein-VAE. All the data obtained in this work, the software tools, and the in-house scripts are freely available at the ZENODO general-purpose open repository: https://zenodo.org/records/10656097.
ACKNOWLEDGMENTS
G.V. acknowledges support from Schmid College of Science and Technology at Chapman University for providing computing resources at the Keck Center for Science and Engineering at Chapman University.
Funding
This research was supported by the National Institutes of Health under Award 1R01AI181600-01 and Subaward 6069-SC24-11 to G.V. and by the National Institutes of Health under Award no. R15GM122013 to P.T.
Footnotes
Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jctc.4c00222.
Schematic representation of the AF2 protein structure prediction pipeline using MSA shallow subsampling; schematic representation of the AF2 protein structure prediction pipeline using SPEACH_AF adaptation; schematic overview of the three major AF2-based adaptations used in our study; statistical analysis of the SPEACH_AF experiments with alanine masking of MSAs in the N-terminal regions; statistical analysis of the predicted AF2 models for the ABL kinase structures using randomized alanine sequence scanning adaptation of AF2; structural alignment of the AF2-predicted conformations that are close to the inactive state; statistical analysis of the predicted AF2 models with random alanine masking of sequence positions for regions involved in conformational changes; local density of contacts distributions of conformational frustration in the ABL structures in the AF2-generated intermediate inactive structures; distributions of conformational and mutational frustration as local densities for the highly frustrated, neutrally frustrated, and minimally frustrated contacts in the active and inactive ABL states; structural analysis and mapping of high frustration hotspots and minimal frustration hotspots in the active ABL state; structural analysis and mapping of high frustration hotspots and minimal frustration hotspots in the intermediate inactive ABL state; and structural analysis of high frustration hotspots and minimal frustration hotspots in the intermediate inactive ABL state (PDF)
The authors declare no competing financial interest.
Contributor Information
Nishank Raisinghani, Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California 92866, United States.
Mohammed Alshahrani, Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California 92866, United States.
Grace Gupta, Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California 92866, United States.
Hao Tian, Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75275, United States.
Sian Xiao, Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75275, United States.
Peng Tao, Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75275, United States.
Gennady M. Verkhivker, Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California 92866, United States; Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, California 92618, United States; Department of Pharmacology, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
REFERENCES
- (1) Jumper J; Evans R; Pritzel A; Green T; Figurnov M; Ronneberger O; Tunyasuvunakool K; Bates R; Zídek A; Potapenko A; Bridgland A; Meyer C; Kohl SAA; Ballard AJ; Cowie A; Romera-Paredes B; Nikolov S; Jain R; Adler J; Back T; Petersen S; Reiman D; Clancy E; Zielinski M; Steinegger M; Pacholska M; Berghammer T; Bodenstein S; Silver D; Vinyals O; Senior AW; Kavukcuoglu K; Kohli P; Hassabis D Highly Accurate Protein Structure Prediction with AlphaFold. Nature 2021, 596, 583–589.
- (2) Tunyasuvunakool K; Adler J; Wu Z; Green T; Zielinski M; Zídek A; Bridgland A; Cowie A; Meyer C; Laydon A; Velankar S; Kleywegt GJ; Bateman A; Evans R; Pritzel A; Figurnov M; Ronneberger O; Bates R; Kohl SAA; Potapenko A; Ballard AJ; Romera-Paredes B; Nikolov S; Jain R; Clancy E; Reiman D; Petersen S; Senior AW; Kavukcuoglu K; Birney E; Kohli P; Jumper J; Hassabis D Highly Accurate Protein Structure Prediction for the Human Proteome. Nature 2021, 596, 590–596.
- (3) Bahdanau D; Cho K; Bengio Y Neural Machine Translation by Jointly Learning to Align and Translate. arXiv 2014, arXiv:1409.0473.
- (4) Vaswani A; Shazeer N; Parmar N; Uszkoreit J; Jones L; Gomez A; Kaiser L; Polosukhin I Attention is all you need. Advances in Neural Information Processing Systems; Curran Associates, Inc., 2017; Vol. 30, pp 5998–6008.
- (5) Rives A; Meier J; Sercu T; Goyal S; Lin Z; Liu J; Guo D; Ott M; Zitnick CL; Ma J; Fergus R Biological Structure and Function Emerge from Scaling Unsupervised Learning to 250 Million Protein Sequences. Proc. Natl. Acad. Sci. U.S.A 2021, 118, No. e2016239118.
- (6) Lin Z; Akin H; Rao R; Hie B; Zhu Z; Lu W; Smetanin N; Verkuil R; Kabeli O; Shmueli Y; dos Santos Costa A; Fazel-Zarandi M; Sercu T; Candido S; Rives A Evolutionary-Scale Prediction of Atomic-Level Protein Structure with a Language Model. Science 2023, 379, 1123–1130.
- (7) Wu R; Ding F; Wang R; Shen R; Zhang X; Luo S; Su C; Wu Z; Xie Q; Berger B; Ma J; Peng J High-Resolution de Novo Structure Prediction from Primary Sequence. bioRxiv 2022.
- (8) Baek M; DiMaio F; Anishchenko I; Dauparas J; Ovchinnikov S; Lee GR; Wang J; Cong Q; Kinch LN; Schaeffer RD; Millán C; Park H; Adams C; Glassman CR; DeGiovanni A; Pereira JH; Rodrigues AV; van Dijk AA; Ebrecht AC; Opperman DJ; Sagmeister T; Buhlheller C; Pavkov-Keller T; Rathinaswamy MK; Dalwadi U; Yip CK; Burke JE; Garcia KC; Grishin NV; Adams PD; Read RJ; Baker D Accurate Prediction of Protein Structures and Interactions Using a Three-Track Neural Network. Science 2021, 373, 871–876.
- (9) Baek M; Anishchenko I; Humphreys IR; Cong Q; Baker D; DiMaio F Efficient and Accurate Prediction of Protein Structure Using RoseTTAFold2. bioRxiv 2023.
- (10) Mansoor S; Baek M; Park H; Lee GR; Baker D Protein Ensemble Generation Through Variational Autoencoder Latent Space Sampling. J. Chem. Theory Comput 2024, 20, 2689–2695.
- (11) Watson JL; Juergens D; Bennett NR; Trippe BL; Yim J; Eisenach HE; Ahern W; Borst AJ; Ragotte RJ; Milles LF; Wicky BIM; Hanikel N; Pellock SJ; Courbet A; Sheffler W; Wang J; Venkatesh P; Sappington I; Torres SV; Lauko A; De Bortoli V; Mathieu E; Ovchinnikov S; Barzilay R; Jaakkola TS; DiMaio F; Baek M; Baker D De Novo Design of Protein Structure and Function with RFdiffusion. Nature 2023, 620, 1089–1100.
- (12) Fleishman SJ; Horovitz A Extending the New Generation of Structure Predictors to Account for Dynamics and Allostery. J. Mol. Biol 2021, 433, 167007.
- (13) Del Alamo D; Sala D; Mchaourab HS; Meiler J Sampling Alternative Conformational States of Transporters and Receptors with AlphaFold2. Elife 2022, 11, No. e75751.
- (14) Stein RA; Mchaourab HS SPEACH_AF: Sampling Protein Ensembles and Conformational Heterogeneity with Alphafold2. PLoS Comput. Biol 2022, 18, No. e1010483.
- (15) Saldano T; Escobedo N; Marchetti J; Zea DJ; Mac Donagh J; Velez Rueda AJ; Gonik E; García Melani A; Novomisky Nechcoff J; Salas MN; Peters T; Demitroff N; Fernandez Alberti S; Palopoli N; Fornasari MS; Parisi G Impact of Protein Conformational Diversity on AlphaFold Predictions. Bioinformatics 2022, 38, 2742–2748.
- (16) Chakravarty D; Porter LL AlphaFold2 Fails to Predict Protein Fold Switching. Protein Sci. 2022, 31, No. e4353.
- (17) Chakravarty D; Schafer JW; Chen EA; Thole JR; Porter LL AlphaFold2 Has More to Learn about Protein Energy Landscapes. bioRxiv 2023.
- (18) Sala D; Hildebrand PW; Meiler J Biasing AlphaFold2 to Predict GPCRs and Kinases with User-Defined Functional or Structural Properties. Front. Mol. Biosci 2023, 10, 1121962.
- (19) Wayment-Steele HK; Ojoawo A; Otten R; Apitz JM; Pitsawong W; Hömberger M; Ovchinnikov S; Colwell L; Kern D Predicting multiple conformations via sequence clustering and AlphaFold2. Nature 2023, 625, 832–839.
- (20) Porter LL; Chakravarty D; Schafer JW; Chen EA ColabFold Predicts Alternative Protein Structures from Single Sequences, Coevolution Unnecessary for AF-Cluster. bioRxiv 2023.
- (21) Casadevall G; Duran C; Estévez-Gay M; Osuna S Estimating conformational heterogeneity of tryptophan synthase with a template-based Alphafold2 approach. Protein Sci. 2022, 31, No. e4426.
- (22) Casadevall G; Duran C; Osuna S AlphaFold2 and Deep Learning for Elucidating Enzyme Conformational Flexibility and Its Application for Design. JACS Au 2023, 3, 1554–1562.
- (23).Mirdita M; Schötze K; Moriwaki Y; Heo L; Ovchinnikov S; Steinegger M ColabFold: Making Protein Folding Accessible to All. Nat. Methods 2022, 19, 679–682. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (24).Ahdritz G; Bouatta N; Floristean C; Kadyan S; Xia Q; Gerecke W; O’Donnell TJ; Berenberg D; Fisk I; Zanichelli N; Zhang B; Nowaczynski A; Wang B; Stepniewska-Dziubinska MM; Zhang S; Ojewole A; Guney ME; Biderman S; Watkins AM; Ra S; Lorenzo PR; Nivon L; Weitzner B; Ban Y-EA; Chen S; Zhang M; Li C; Song SL; He Y; Sorger PK; Mostaque E; Zhang Z; Bonneau R; AlQuraishi M OpenFold: Retraining AlphaFold2 Yields New Insights into Its Learning Mechanisms and Capacity for Generalization. Nat. Methods 2024. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (25).Mirabello C; Wallner B; Nystedt B; Azinas S; Carroni M Unmasking AlphaFold: Integration of Experiments and Predictions in Multimeric Complexes. bioRxiv 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (26).Sala D; Engelberger F; Mchaourab HS; Meiler J Modeling Conformational States of Proteins with AlphaFold. Curr. Opin. Struct. Biol 2023, 81, 102645. [DOI] [PubMed] [Google Scholar]
- (27).Nussinov R; Zhang M; Liu Y; Jang H AlphaFold, Artificial Intelligence (AI), and Allostery. J. Phys. Chem. B 2022, 126 (34), 6372–6383. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (28).Terwilliger TC; Liebschner D; Croll TI; Williams CJ; McCoy AJ; Poon BK; Afonine PV; Oeffner RD; Richardson JS; Read RJ; Adams PD AlphaFold Predictions Are Valuable Hypotheses and Accelerate but Do Not Replace Experimental Structure Determination. Nat. Methods 2024, 21, 110–116. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (29).Taylor SS; Keshwani MM; Steichen JM; Kornev AP Evolution of the Eukaryotic Protein Kinases as Dynamic Molecular Switches. Philos. Trans. R. Soc. London, Ser. B 2012, 367, 2517–2528. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (30).Taylor SS; Ilouz R; Zhang P; Kornev AP Assembly of Allosteric Macromolecular Switches: Lessons from PKA. Nat. Rev. Mol. Cell Biol 2012, 13, 646–658. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (31).Oruganty K; Kannan N Design Principles Underpinning the Regulatory Diversity of Protein Kinases. Philos. Trans. R. Soc. London, Ser. B 2012, 367, 2529–2539. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (32).Meharena HS; Chang P; Keshwani MM; Oruganty K; Nene AK; Kannan N; Taylor SS; Kornev AP Deciphering the Structural Basis of Eukaryotic Protein Kinase Regulation. PLoS Biol. 2013, 11, No. e1001680. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (33).Taylor SS; Kornev AP Protein Kinases: Evolution of Dynamic Regulatory Proteins. Trends Biochem. Sci 2011, 36, 65–77. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (34).Taylor SS; Wu J; Bruystens JGH; Del Rio JC; Lu TW; Kornev AP; Ten Eyck LF From Structure to the Dynamic Regulation of a Molecular Switch: A Journey over 3 Decades. J. Biol. Chem 2021, 296, 100746. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (35).Johnson TK; Bochar DA; Vandecan NM; Furtado J; Agius MP; Phadke S; Soellner MB Synergy and Antagonism between Allosteric and Active-Site Inhibitors of Abl Tyrosine Kinase. Angew. Chem., Int. Ed. Engl 2021, 60, 20196–20199. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (36).Kim C; Ludewig H; Hadzipasic A; Kutter S; Nguyen V; Kern D A Biophysical Framework for Double-Drugging Kinases. Proc. Natl. Acad. Sci. U.S.A 2023, 120, No. e2304611120. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (37).Saleh T; Rossi P; Kalodimos CG Atomic View of the Energy Landscape in the Allosteric Regulation of Abl Kinase. Nat. Struct. Mol. Biol 2017, 24, 893–901. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (38).Xie T; Saleh T; Rossi P; Kalodimos CG Conformational states dynamically populated by a kinase determine its function. Science 2020, 370, No. eabc2754. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (39).Stiller JB; Otten R; Häussinger D; Rieder PS; Theobald DL; Kern D Structure Determination of High-Energy States in a Dynamic Protein Ensemble. Nature 2022, 603, 528–535. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (40).Meng Y; Gao C; Clawson DK; Atwell S; Russell M; Vieth M; Roux B Predicting the Conformational Variability of Abl Tyrosine Kinase Using Molecular Dynamics Simulations and Markov State Models. J. Chem. Theory Comput 2018, 14, 2721–2732. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (41).Paul F; Thomas T; Roux B Diversity of Long-Lived Intermediates along the Binding Pathway of Imatinib to Abl Kinase Revealed by MD Simulations. J. Chem. Theory Comput 2020, 16, 7852–7865. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (42).Krishnan K; Tian H; Tao P; Verkhivker GM Probing Conformational Landscapes and Mechanisms of Allosteric Communication in the Functional States of the ABL Kinase Domain Using Multiscale Simulations and Network-Based Mutational Profiling of Allosteric Residue Potentials. J. Chem. Phys 2022, 157, 245101. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (43).Faezov B; Dunbrack RL Jr. AlphaFold2Models of the Active Form of All 437 Catalytically Competent Human Protein Kinase Domains. bioRxiv 2023. [Google Scholar]
- (44).Herrington NB; Stein D; Li YC; Pandey G; Schlessinger A Exploring the Druggable Conformational Space of Protein Kinases Using AI-Generated Structures. bioRxiv 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (45).Monteiro da Silva G; Cui JY; Dalgarno DC; Lisi GP; Rubenstein BM High-Throughput Prediction of Protein Conformational Distributions with Subsampled AlphaFold2. Nat. Commun 2024, 15, 2464. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (46).Modi V; Dunbrack RL Jr. Defining a New Nomenclature for the Structures of Active and Inactive Kinases. Proc. Natl. Acad. Sci. U.S.A 2019, 116, 6818–6827. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (47).Arter C; Trask L; Ward S; Yeoh S; Bayliss R Structural Features of the Protein Kinase Domain and Targeted Binding by Small-Molecule Inhibitors. J. Biol. Chem 2022, 298, 102247. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (48).Steinegger M; Söding J MMseqs2 Enables Sensitive Protein Sequence Searching for the Analysis of Massive Data Sets. Nat. Biotechnol 2017, 35, 1026–1028. [DOI] [PubMed] [Google Scholar]
- (49).van Kempen M; Kim SS; Tumescheit C; Mirdita M; Lee J; Gilchrist CLM; Söding J; Steinegger M Fast and Accurate Protein Structure Search with Foldseek. Nat. Biotechnol 2024, 42, 243–246. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (50).Zhang Y. TM-Align: A Protein Structure Alignment Algorithm Based on the TM-Score. Nucleic Acids Res. 2005, 33, 2302–2309. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (51).Ferreiro DU; Hegler JA; Komives EA; Wolynes PG Localizing frustration in native proteins and protein assemblies. Proc. Natl. Acad. Sci. U.S.A 2007, 104, 19819–19824. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (52).Parra RG; Schafer NP; Radusky LG; Tsai MY; Guzovsky AB; Wolynes PG; Ferreiro DU Protein Frustratometer 2: A Tool to Localize Energetic Frustration in Protein Molecules, Now With Electrostatics. Nucleic Acids Res. 2016, 44, W356–W360. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (53).Freiberger MI; Wolynes PG; Ferreiro DU; Fuxreiter M Frustration in Fuzzy Protein Complexes Leads to Interaction Versatility. J. Phys. Chem. B 2021, 125 (10), 2513–2520. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (54).Chen M; Chen X; Schafer NP; Clementi C; Komives EA; Ferreiro DU; Wolynes PG Surveying biomolecular frustration at atomic resolution. Nat. Commun 2020, 11, 5944. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (55).Freiberger MI; Ruiz-Serra V; Pontes C; Romero-Durana M; Galaz-Davison P; Ramírez-Sarmiento CA; Schuster CD; Marti MA; Wolynes PG; Ferreiro DU; Parra RG; Valencia A Local Energetic Frustration Conservation in Protein Families and Superfamilies. Nat. Commun 2023, 14, 8379. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (56).Dixit A; Verkhivker GM The Energy Landscape Analysis of Cancer Mutations in Protein Kinases. PLoS One 2011, 6, No. e26071. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (57).Lane TJ Protein Structure Prediction Has Reached the Single-Structure Frontier. Nat. Methods 2023, 20, 170–173. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (58).Whitford PC; Miyashita O; Levy Y; Onuchic JN Conformational Transitions of Adenylate Kinase: Switching by Cracking. J. Mol. Biol 2007, 366, 1661–1671. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (59).Ayaz P; Lyczek A; Paung Y; Mingione VR; Iacob RE; de Waal PW; Engen JR; Seeliger MA; Shan Y; Shaw DE Structural Mechanism of a Drug-Binding Process Involving a Large Conformational Change of the Protein Target. Nat. Commun 2023, 14, 1885. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (60).Schafer JW; Porter LL Evolutionary Selection of Proteins with Two Folds. Nat. Commun 2023, 14, 5478. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (61).Miyashita O; Onuchic JN; Wolynes PG Nonlinear Elasticity, Proteinquakes, and the Energy Landscapes of Functional Transitions in Proteins. [DOI] [PMC free article] [PubMed]; Miyashita O, Onuchic JN, Wolynes PG. Nonlinear elasticity, protein quakes, and the energy landscapes of functional transitions in proteins. Proc. Natl. Acad. Sci. U.S.A 2003, 100, 12570. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (62).Ray D; Quijano RN; Andricioaei I Point Mutations in SARS-CoV-2 Variants Induce Long-Range Dynamical Perturbations in Neutralizing Antibodies. Chem. Sci 2022, 13, 7224–7239. [DOI] [PMC free article] [PubMed] [Google Scholar]
Data Availability Statement
Data are fully contained within the article and Supporting Information. Crystal structures were downloaded from the Protein Data Bank (http://www.rcsb.org). Protein structures were rendered with the UCSF ChimeraX package (https://www.rbvi.ucsf.edu/chimerax/) and PyMOL (https://pymol.org/2/). The software tools used in this study are freely available on GitHub: https://github.com/deepmind/alphafold; https://github.com/sokrypton/ColabFold/; https://github.com/RSvan/SPEACH_AF; https://www.github.com/HWaymentSteele/AFCluster; https://github.com/smu-tao-group/protein-VAE. All data obtained in this work, the software tools, and the in-house scripts are freely available in the Zenodo general-purpose open repository: https://zenodo.org/records/10656097.