Abstract
This study reports a comprehensive analysis and comparison of several AlphaFold2 adaptations and OmegaFold and AlphaFlow approaches in predicting distinct allosteric states, conformational ensembles, and mutation-induced structural effects for a panel of state-switching allosteric ABL mutants. The results revealed that the proposed AlphaFold2 adaptation with randomized alanine sequence scanning can generate functionally relevant allosteric states and conformational ensembles of the ABL kinase that qualitatively capture a unique pattern of population shifts between the active and inactive states in the allosteric ABL mutants. Consistent with the NMR experiments, the proposed AlphaFold2 adaptation predicted that the G269E/M309L/T408Y mutant could induce population changes and sample a significant fraction of the fully inactive I2 form, which is a low-populated, high-energy state for the wild-type ABL protein. We also demonstrated that the other ABL mutants, G269E/M309L/T334I and M309L/L320I/T334I, which introduce a single activating T334I mutation, can reverse the equilibrium and populate exclusively the active ABL form. While the precise quantitative predictions of the relative populations of the active and various hidden inactive states in the ABL mutants remain challenging, our results provide evidence that AlphaFold2 adaptation with randomized alanine sequence scanning can adequately detect a spectrum of the allosteric ABL states and capture the equilibrium redistributions between structurally distinct functional ABL conformations. We further validated the robustness of the proposed AlphaFold2 adaptation for predicting the unique inactive architecture of the BSK8 kinase and structural differences between ligand-unbound apo and ATP-bound forms of BSK8.
The results of this comparative study suggested that AlphaFold2, OmegaFold, and AlphaFlow approaches may be driven by structural memorization of existing protein folds and are strongly biased toward predictions of the thermodynamically stable ground states of the protein kinases, highlighting limitations and challenges of AI-based methodologies in detecting alternative functional conformations, accurate characterization of physically significant conformational ensembles, and prediction of mutation-induced allosteric structural changes.
INTRODUCTION
AlphaFold2 (AF2) has achieved significant advancements in protein structure modeling, marking a pivotal moment in structural biology.1,2 AF2 utilizes evolutionary insights from multiple sequence alignments (MSAs) of related protein sequences and leverages the power of a hierarchical transformer architecture with self-attention mechanisms to effectively identify long-range dependencies and interactions within protein sequences.1,2 Self-supervised deep learning models, inspired by natural language processing (NLP) architectures, particularly those incorporating attention-based and transformer mechanisms, have emerged as powerful tools for capturing contextual spatial relationships through training on extensive sets of protein sequences.3,4 One of the latest breakthroughs in AI-based protein structure predictions is the development of ESMFold, an end-to-end family of transformer protein language models known for their high accuracy in predicting atomic-level protein structures directly from individual protein sequences eliminating the reliance on MSA information.5,6 OmegaFold represents another related approach, utilizing a hybrid model that combines a protein language model (PLM) with a geometry-guided transformer model that can learn single- and pairwise-residue embeddings as features, which are then fed into a vector geometry-inspired transformer block named Geoformer7 as opposed to the evolutionary variation of the AF2 Evoformer. Unlike AF2, neither OmegaFold nor ESMFold relies on MSAs or known structures as templates. 
A recent comparative analysis of 69 protein targets from the 15th Critical Assessment of Structure Prediction (CASP15) challenge revealed that PLM-based methods performed best, with OmegaFold, closely followed by ESMFold, outperforming the other methods, including AF2.8 A diffusion-based generative model, FoldingDiff, produces protein backbone structures via a procedure inspired by the natural folding process in which the predicted structures are denoised from a random, unfolded state toward a stable folded structure.9 Notably, this study also demonstrated that predicted folds using the OmegaFold and FoldingDiff approaches are similar. The RoseTTAFold2 approach extends the original three-track architecture of RoseTTAFold10,11 by using a more computationally efficient structure-biased attention, producing accuracy comparable to AF2 on monomers and to AF2-multimer on complexes while allowing for efficient computational scaling on large proteins and complexes.12 OmegaFold can often yield better precision than RoseTTAFold and AF2, particularly for orphan proteins and antibodies.13
Despite the remarkable achievements of AF2-based methods and self-supervised protein language models in predicting static protein structures, they face significant limitations in their ability to universally characterize conformational dynamics, functional protein ensembles, conformational changes, and allosteric states.14 Recent studies have highlighted that while AF2 methods excel in predicting individual protein structures, extending this capability to accurately capture conformational protein ensembles and allosteric landscapes remains a substantial challenge.15–17 The limitations of AF2 methods in predicting multiple protein conformations may stem from an inherent training bias toward experimentally determined, thermodynamically stable structures, and MSAs containing evolutionary information used to infer the ground protein states. Although AF2 methods showed robust performance in predicting the experimentally determined ground conformation for 98 fold-switching proteins, they typically failed to detect alternative structures suggesting the inherent AF2 network bias for the most probable conformer rather than an ensemble of relevant functional states.15,16
Several recent adaptations of the AF2 framework for predicting alternative conformational states of proteins involve reducing the depth of the MSA where only a subset of sequences is sampled, resulting in a shallower MSA.18 The SPEACH_AF (sampling protein ensembles and conformational heterogeneity with AlphaFold2) approach incorporates in silico alanine mutagenesis within MSAs to enhance the attention network mechanism of AF2.19 This enables the exploration of different patterns of coevolved residues linked to alternative protein conformations. Another adaptation of AF2, the AF-Cluster method, includes subsampling MSAs and subsequently clustering sequences that share evolutionary or functional similarities.20 Using SPEACH_AF and AF-Cluster approaches, a recent study generated conformational ensembles for 93 fold-switching proteins with experimentally determined structures, resulting in only a modest success rate of 25% in predicting functional conformations, thereby highlighting limitations of these tools for predicting alternative states for the majority of fold-switching proteins.21 AF2 adaptations that combined sequence and evolutionary content discerned from MSAs with structural information were particularly successful in applications to protein kinases and GPCR proteins.22 The latest analysis of the AF2 methods tested the predictive ability on fold-switching proteins, showing relatively weak reproducibility of experimental fold switching and failure to discriminate between low- and high-energy conformations, and suggesting a strong bias toward ground states arising from structural memorization of training-set structures rather than from understanding of protein thermodynamics.23
AF2 and ESMFold approaches were recently repurposed to build a flow matching framework AlphaFlow which is a sequence-conditioned generative denoising model that introduces flexibility into structural models and can predict conformational ensembles which are superior to the ones obtained by MSA subsampling adaptations.24 A feature-conditioned generative model AlphaFlow-Lit is an adaptation of AlphaFlow for efficient protein ensemble generation that eliminates the heavy reliance on MSAs encoding blocks and utilizes computed features to produce a diverse range of conformations.25 Despite various enhancements that bring some diversity to the AF2 framework, the key challenges of the emerging AF2 adaptations are associated with accurate predictions of functional conformations and relative populations of distinct allosteric states rather than simply increasing the conformational diversity of the predicted ensembles.
The abundance of structural data available for protein kinases and their complexes26,27 in a variety of functional states offers unparalleled opportunities for validating and testing AF2 methods. Molecular dynamics (MD) simulations combined with Markov state models (MSM) explored in detail the free energy landscape of ABL kinase, revealing that long simulations and rigorous kinetic analysis could detect hidden metastable conformations accessible to tyrosine kinases.28–30 Recent unbiased MD simulations of ABL-Imatinib binding captured the conformational change between the active and inactive states, showing that binding involved several primary conformational states that differ in structural arrangements of the key functional regions.31 Despite the wealth of structural data on protein kinases, experimental atomistic characterization of the functionally relevant transient states was lacking until recently because of the large conformational transformations and short-lived kinase intermediates involved in the kinetics of the allosteric shifts.
High-resolution structure determination of low-populated states by coupling NMR spectroscopy-derived pseudocontact shifts with Carr–Purcell–Meiboom–Gill (CPMG) relaxation dispersion can characterize kinetics and thermodynamics of conformational changes between ground and excited states, leading to accurate structure determination of high-energy structures for some protein kinases.32 Recent NMR studies provided an atomistic view of the energy landscape underlying allosteric regulation of ABL kinase, showing how structural elements synergistically generate a multilayered allosteric mechanism that enables ABL kinase to switch between distinct allosteric states.33 Another groundbreaking study from the same lab employed NMR chemical exchange saturation transfer (CEST) experiments to determine the structure of the unbound ABL kinase in the active state and enable structural characterization of two short-lived hidden inactive conformations, I1 and I2, that are intrinsic to the kinase domain and differ from each other in the critical functional regions.34 This study showed that the equilibrium switching between the active and two distinct inactive states is exploited by mutants, ligands, post-translational modifications, and inhibitors to regulate kinase activity and function. Atomistic MD simulations and MSM kinetic modeling characterized the dynamics of structural changes between these ABL conformational states.35
Recent studies have examined the potential of AF2 methodologies in predicting conformational states of protein kinases. One such study applied AF2-based modeling to evaluate 437 human protein kinases in their active states, using shallow MSAs derived from orthologs and close homologues of the target proteins. Interestingly, AF2-generated models chosen for each kinase, based on prediction confidence scores for activation loop (A-loop) residues, closely resembled experimentally determined substrate-bound structures.36 Another study explored the ability of AF2 to predict kinase structures in various conformations across different MSA depths, showing that reducing the MSA depth can promote loosening of the coevolutionary constraints, leading to more efficient exploration of alternative kinase conformations.37
The main challenges of AF2 adaptations are associated with accurately predicting functional conformations and determining the relative populations of distinct allosteric states rather than merely increasing the diversity of the predicted ensembles. In addition, although AF2 methods are highly reliable and robust in predicting single protein structures corresponding to thermodynamically dominant ground states, these methods are limited and not sufficiently sensitive when predicting multiple functional conformations of allosteric proteins or accurately capturing the effects of local mutations that affect protein stability or allosteric mutants inducing large structural transformations and equilibrium switching between distinct functional conformations. By exploring different adaptations of the MSA subsampling architecture, a recent study attempted a systematic analysis of the AF2 approach for prediction and characterization of the conformational distributions of the ABL wild type (WT) kinase domain and various functional ABL mutants that can operate by altering the equilibrium distribution of states and inducing allosterically regulated population switches between the active and inactive kinase forms.38 Through extensive and family-targeted MSA spanning over 600,000 sequences of the ABL WT and by varying shallow MSA parameters, this study encouraged AF2 to generate a more complete ensemble of ABL conformations and reproduced the thermodynamically dominant ABL active state.
This pioneering approach qualitatively predicted the positive and negative effects of several ABL mutations on the active state populations but could not accurately reproduce the quantitative ratios of populations for some allosteric state-switching mutants and often misjudged the mutational effects on structural changes in the ABL kinase.38 Despite the intrinsic limitations, this approach demonstrated for the first time the potential and promise of the AF2 adaptations in predicting the effects of allosteric mutations on protein structures and their ability to detect structurally different allosteric states. In this context, it is worth noting that although AF2 and its shallow MSA adaptations are now recognized as highly reliable and robust tools for predicting protein structures, these methods are still believed to be limited in their ability to accurately capture conformational ensembles and the effects of single-point mutations, which can lead to allosteric structural changes.39–41
A recent analysis of AF2 methods for predicting the effects of point mutations showed that functionally relevant structural changes in the mutational models can be obtained when mutations are introduced in the entire MSA as compared to only the input sequence.42 Another interesting investigation demonstrated that AF2 may be able to predict single mutation effects with moderate structural changes.43 It was also demonstrated that AF2-predicted structures can encode information about the stability and destabilizing effects of single mutations that do not disrupt protein structure.44 In general, the challenge of accurately predicting the structural effects of mutations is further compounded by the fact that critical changes in stability may arise from subtle structural alterations and that many important disease-associated mutations are allosteric and could induce large structural transformations that are significantly underrepresented in the training data sets used for optimizing AF2 performance. A significant focus of emerging AF2 and OmegaFold adaptations is broadening the range of accessible protein conformations to enhance the ability to predict specific conformational states by adjusting the balance between evolutionary information obtained from MSAs and structural information derived from templates.
We recently introduced a novel adaptation of AF2 with randomized alanine sequence scanning (AF2 RASS) of the entire protein sequence that is combined with subsequent MSA subsampling, enabling more accurate characterization of the ABL conformational ensembles and interpretable atomistic predictions of the allosteric active and inactive states.45 In the current study, we conduct a detailed comparative analysis of the AF2 MSA subsampling adaptations, AF2 RASS approach, and OmegaFold and AlphaFlow methods focusing on predicting structures, conformational ensembles, and populations for a series of state-switching triple ABL mutants that modulate the equilibrium between the active ABL form and the inactive I1 and I2 states. Through a comparative analysis of AI-based structural modeling approaches, we demonstrate that AF2, OmegaFold, and AlphaFlow predictions of allosteric kinase states and mutants are generally strongly biased toward predictions of the thermodynamically stable ground active state and are unable to detect low-populated inactive ABL states and mutation-specific population shifts between the active and inactive forms. In this study, we show that combining AF2 RASS with shallow MSA subsampling can detect structurally different allosteric kinase states and qualitatively reproduce patterns of population shifts in the ABL allosteric mutants. We further validate the utility of the proposed AF2 adaptation for predicting the unique inactive architecture of the BSK8 kinase and structural differences between ligand-unbound apo and ATP-bound forms of BSK8. We argue that the AF2 RASS adaptation with systematic perturbation of the MSAs through iterative random scanning of the protein sequence can loosen coevolutionary constraints and reduce structural “memorization”, allowing for conformational sampling of alternative kinase states and detection of mutation-induced allosteric changes and state-switching transformations.
The results of this study point to the key challenges of AI-based methods in the prediction of protein conformational ensembles and the effects of mutations on large structural changes.
MATERIALS AND METHODS
AF2 with MSA Shallow Subsampling.
The accuracy of AF2 structure predictions hinges on the quality of the MSA where deeper MSA with thousands of sequences could generally lead to a better prediction of a single thermodynamically dominant protein structure. However, a shallow MSA with fewer than 100 sequences combined with dropout and random seeds can drive the AF2 deep network to sample more alternative functional protein conformations. We employed an AF2 adaptation with shallow MSA whereby reducing/varying the depth of the MSA through stochastic subsampling enables efficient generation of diverse protein conformations.18 Although AF2 adaptation with shallow MSA has gained significant traction and popularity,18–20 it is not obvious whether this approach can accurately predict structures and populations of functional allosteric states, i.e. capture functionally relevant heterogeneity of conformational ensembles. Here, we examined the predictive ability of this AF2 methodology for predicting functional ensembles of ABL kinase.
Structural predictions of the ABL states were conducted with the AF2 framework within the ColabFold implementation46 using a range of MSA depths and MSA subsampling. The MSAs were generated using the MMSeqs2 library47,48 with the ABL1 sequence from residues 240 to 440 as input. We used the max_msa field in ColabFold to set two AF2 parameters in the following format: max_seqs:extra_seqs. These parameters determine the number of sequences subsampled from the MSA where max_seqs sets the number of sequences passed to the row/column attention track and extra_seqs is the number of sequences additionally processed by the main Evoformer stack. The default MSAs are subsampled randomly to obtain shallow MSAs containing as few as five sequences. We ran simulations with max_seqs:extra_seqs values of 16:32, 32:64, 64:128, 128:256, 256:512, and 512:1024 and report the results at max_seqs:extra_seqs 16:32, which produced the greatest diversity. Lower values encourage more diverse predictions but increase the number of misfolded models. We additionally manipulated the num_recycles parameter to produce more diverse outputs. To generate more data, we set num_recycles to 12, which produces 14 structures starting from recycle 0 to recycle 12 and generates a final refined structure. Recycling is an iterative refinement process, with each recycled structure getting more precise. AF2 makes predictions using five models pretrained with different parameters and, consequently, with different weights. Each of these models generates 14 structures, amounting to 70 structures in total. We then set the num_seed parameter to 1. This parameter quantifies the number of random seeds to iterate through, ranging from random_seed to random_seed + num_seed. We also enabled the use of the dropout parameter, meaning that dropout layers in the model would be active during the time of the predictions.
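The bookkeeping behind these settings can be illustrated with a short sketch. It only parses the max_seqs:extra_seqs strings listed above and reproduces the count of 70 structures; it does not call ColabFold. The function names are hypothetical, and the count per model assumes one reading of the text, namely one structure for each of recycles 0 to 12 plus one final refined structure.

```python
# Hypothetical helper names; no ColabFold call is made here.
MAX_MSA_SETTINGS = ["16:32", "32:64", "64:128", "128:256", "256:512", "512:1024"]
NUM_MODELS = 5       # five pretrained AF2 models
NUM_RECYCLES = 12    # value of num_recycles used in the text


def parse_max_msa(setting):
    """Split a 'max_seqs:extra_seqs' string into two integers."""
    max_seqs, extra_seqs = (int(part) for part in setting.split(":"))
    return max_seqs, extra_seqs


def structures_per_model(num_recycles):
    """Assumed reading: recycles 0..num_recycles plus one final refined structure."""
    return (num_recycles + 1) + 1


def total_structures(num_models, num_recycles, num_seeds=1):
    """Total number of structures over all models and seeds."""
    return num_models * structures_per_model(num_recycles) * num_seeds


if __name__ == "__main__":
    for setting in MAX_MSA_SETTINGS:
        print(setting, parse_max_msa(setting))
    print("structures per run:", total_structures(NUM_MODELS, NUM_RECYCLES))
```

With five models, 12 recycles, and a single seed, this gives the 70 structures per MSA quoted above.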
AF2 with Randomized Alanine Sequence Scanning and Shallow Subsampling.
To address the intrinsic limitations of the AF2 methodologies in predicting functional conformational ensembles of allosteric proteins, we combined the recently introduced AF2 RASS approach45 with shallow MSA sampling.18,19 In addition, AF2 RASS adaptation is further expanded by also exploring targeted alanine masking of the ABL sequence space in functional regions critical for conformational changes between active and inactive kinase states. The enhanced randomization of the native protein sequences followed by construction of the MSAs and shallow subsampling of the MSA could produce gradual diversification of MSAs without mutating the homologous sequences within the MSA, which is the key difference from the SPEACH_AF technique.19 A gradual perturbation of the resulting MSAs drives the AF2 deep network to explore distinct parts of the MSAs that can enhance the detection of structurally different allosteric states of functionally relevant conformational ensembles of the protein kinase.
The original full native sequence is the initial input for the AF2 RASS approach in which the full sequence is iteratively perturbed through randomized alanine scanning. This technique utilizes an algorithm that iterates through each amino acid in the native sequence and randomly substitutes 5–15% of the residues with alanine to simulate random alanine substitution mutations.45 The algorithm substitutes the residue at each position with alanine with a probability randomly generated between 0.05 and 0.15 for each sequence position. We ran this algorithm multiple times (~10–50) on the full sequences for each mutant, resulting in a multitude of distinct sequences, each with different frequencies and positions of alanine mutations. MSAs are then constructed for each of these mutated sequences using the alanine-scanned full-length sequences as input for the MMSeqs2 program.47,48 The AF2 shallow MSA methodology is subsequently employed on these MSAs to predict protein structures. A total of 70 predicted structures were generated from 12 recycles per model. In addition to randomized alanine sequence scanning of the complete sequence, we also examined variations of this approach with targeted alanine masking of the ABL sequence space. In particular, we probed the effects of random alanine masking of sequence positions in the A-loop (residues 398–421) that is critical for conformational change between the active and inactive ABL forms. For each of these targeted alanine-masking experiments, we generate 10 alanine-scanned sequences, each with a different frequency and position of alanine mutations in the respective A-loop and C-terminal loop regions. The AF2 shallow MSA methodology is subsequently employed on these MSAs to predict protein structures as described previously.
The randomized alanine sequence scanning protocol is followed by AF2 shallow subsampling on each of these MSAs with 12 recycles per model and a total of 70 predicted conformations for each of the MSAs constructed. This alanine masking sequence algorithm can produce gradual perturbations of MSAs without mutation of the sequences within the MSA. A gradual diversification of the resulting MSAs enables the AF2 to attenuate coevolutionary and structural memorization biases and detect distinct parts of the MSAs that can determine conformational ensembles of functionally relevant distinct protein states.45
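A minimal sketch of the randomized alanine scanning step described above is given below. It is an illustration under stated assumptions and not the authors' implementation: the function names are hypothetical, the substitution probability is drawn uniformly from [0.05, 0.15] independently at every position (one reading of the description), and the optional `region` argument is an assumed way to restrict masking to a sequence window such as the A-loop.

```python
import random


def alanine_scan(sequence, p_min=0.05, p_max=0.15, region=None, rng=None):
    """Return a copy of `sequence` with random alanine substitutions.

    region: optional (start, end) pair of 0-based indices, end exclusive;
            positions outside the window are left unchanged.
    """
    if rng is None:
        rng = random.Random()
    start, end = region if region is not None else (0, len(sequence))
    scanned = []
    for index, residue in enumerate(sequence):
        if start <= index < end:
            # Assumption: a fresh probability in [p_min, p_max] per position.
            probability = rng.uniform(p_min, p_max)
            if rng.random() < probability:
                residue = "A"
        scanned.append(residue)
    return "".join(scanned)


def generate_variants(sequence, n_variants, seed=0, region=None):
    """Generate several alanine-scanned copies of one input sequence."""
    rng = random.Random(seed)
    return [alanine_scan(sequence, region=region, rng=rng) for _ in range(n_variants)]


if __name__ == "__main__":
    toy_sequence = "MKLVFGDEWYTRQ" * 15  # placeholder, not the ABL1 sequence
    for variant in generate_variants(toy_sequence, 3, seed=1):
        changed = sum(a != b for a, b in zip(toy_sequence, variant))
        print(changed, "substitutions")
```

If the input is the ABL1 segment starting at residue 240, the A-loop residues 398–421 would map to `region=(158, 182)` under this 0-based, end-exclusive convention. Each variant would then serve as the query for a separate MSA construction.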
Alanine Sequence Scanning and Latent Space Perturbation Adaptations of OmegaFold.
OmegaFold is a data-driven protein structure prediction tool that was trained on ~110,000 single-chain structures from the PDB and all single domains from the SCOP v1.75 database with at most 40% sequence identity, offering a balance of high speed and accuracy, particularly for proteins that share some sequence similarity with known structures.7 OmegaFold achieved prediction accuracy comparable to AF2 and showed versatility by successfully handling diverse and divergent sequences.7 However, to the best of our knowledge, there has not been any systematic study evaluating the performance of OmegaFold in the prediction of conformational ensembles and functional allosteric protein states. In this study, we compared the performance of various AF2 adaptations against OmegaFold in predicting the allosteric states of the ABL kinase and the effects of allosteric ABL mutants.
We first employed the default OmegaFold approach and applied the randomized alanine sequence scanning approach in the context of OmegaFold architecture. We ran this algorithm 100 times on the full native sequence, resulting in 100 distinct sequences, each with different frequencies and positions of alanine mutations. These sequences were then used as sequence input for OmegaFold, resulting in 100 distinct predicted structures. In addition, to further diversify generated ensembles, we employed OmegaFold in conjunction with latent space perturbation. In this OmegaFold adaptation, we introduce perturbations within the latent space, which is obtained after the full sequence is processed by the PLM. We identify the protein latent representation of the edges after the input sequence is processed by the PLM and multiply this representation matrix by a random value between 0 and 7. We only perturb the edges representation and not the nodes because we do not want to change the structural components of the amino acids themselves within the protein, only the relations between them.
We start with the native sequence $S$, to which we apply the alanine mutation algorithm to obtain new sequences $S_1, \dots, S_n$.
We then input these sequences into OmegaFold. After each sequence is processed by the PLM, we obtain the latent representations of the edges, $E_1, \dots, E_n$.
Each matrix $E_i$ is then perturbed by multiplying it by a random number $r$ between 0 and 7, giving $\tilde{E}_i = r E_i$.
The perturbed matrices $\tilde{E}_i$ are then used as inputs for the Geoformer module of OmegaFold to predict the final structures.
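The scaling step can be sketched as follows. This sketch does not use the OmegaFold code base: the edge representation is stood in for by a plain NumPy array of arbitrary shape, the function name is hypothetical, and the random factor is assumed to be drawn uniformly from [0, 7], which the text does not specify.

```python
import numpy as np


def perturb_edge_representation(edge_repr, low=0.0, high=7.0, rng=None):
    """Scale an edge (pair) representation by one random factor.

    Returns the perturbed array and the factor that was applied.
    The node representation is deliberately not touched.
    """
    if rng is None:
        rng = np.random.default_rng()
    # Assumption: a single scalar drawn uniformly from [low, high).
    scale = rng.uniform(low, high)
    return scale * edge_repr, scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder pair representation: (residues, residues, channels).
    edges = rng.normal(size=(8, 8, 4))
    perturbed, factor = perturb_edge_representation(edges, rng=rng)
    print("scale factor:", factor)
    print("shape preserved:", perturbed.shape == edges.shape)
```

In an actual pipeline, the perturbed array would replace the edge representation passed to the Geoformer, while the node representation would be forwarded unchanged.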
Statistical and Structural Assessment of AF2-Generated Models.
Assessing the AF2 models involves both statistical and structural evaluations to ensure their accuracy and reliability. This typically includes (a) statistical assessment using pLDDT (predicted local distance difference test), which indicates confidence in the predicted positions of amino acids; and (b) PAE (predicted aligned error), which provides an estimate of the expected error in the relative positions of residues.1,2 The generated AF2 models were ranked by pLDDT scores (a per-residue estimate of the prediction confidence on a scale from 0 to 100) and quantified by the fraction of predicted Cα distances that lie within their expected intervals. The values correspond to the model’s predicted scores based on the lDDT-Cα metric, which is a local superposition-free metric that assesses the atomic displacements of the residues in the predicted model.1,2 The global distance test score (GDT_TS)49 and template modeling TM-score50 (or the predicted template modeling pTM score) are often the two standard metrics to evaluate the model quality.
The predicted models are compared to the experimental structure using the structural alignment tool TM-align, an algorithm for sequence-independent protein structure comparison, to assess and compare the accuracy of protein structure predictions.50 TM-align involves optimizing the alignment of residues, refining the alignment through dynamic programming iterations, superposing the structures based on the alignment, and calculating the TM-score as a quantitative measure of the overall accuracy of the predicted models. We used TM-score, which is a metric for assessing the topological similarity of protein structures based on their given residue equivalency.51,52 TM-score ranges from 0 to 1, where a value of 1 indicates a perfect match between the predicted model and the reference structure. A TM-score of >0.5 implies that the structures share roughly the same fold and is often used as a threshold to determine whether the predicted model has significant structural resemblance to the reference structure.52 We ranked the AF2-generated conformations as high- and low-energy conformations using considerations similar to those outlined in previous studies.23,53 In this analysis, a composite score of the pLDDT metric and TM-score can be considered as an energy function that evaluates model quality based on the premise that high pLDDT and TM-score values for the generated conformations would imply more favorable energetics and enable the detection of thermodynamically stable conformations.53 Similar to the recent statistical analysis of the AF2 predictions,23 we grouped and separated the generated conformations according to the percentage of confident residues (pLDDT scores ≥70–80) and then compared them to the experimental NMR structures.
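For a fixed superposition and residue equivalency, the TM-score reduces to a short sum, sketched below. This is not TM-align itself, which additionally searches for the alignment and superposition that maximize the score. The function names are hypothetical, and the lower bound of 0.5 Å on the distance scale for short chains is an assumption of this sketch.

```python
def tm_score(distances, target_length):
    """TM-score for one superposition.

    distances: distances (in angstroms) between equivalent C-alpha atoms
               of the model and the reference after superposition.
    target_length: number of residues in the reference structure.
    """
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5
    d0 = max(d0, 0.5)  # assumed floor for short chains
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


def same_fold(distances, target_length, threshold=0.5):
    """Apply the >0.5 fold-similarity criterion quoted in the text."""
    return tm_score(distances, target_length) > threshold


if __name__ == "__main__":
    n_residues = 200
    print(tm_score([0.0] * n_residues, n_residues))
    print(same_fold([10.0] * n_residues, n_residues))
```

Residues without an aligned partner simply contribute nothing to the sum, which is why the score is normalized by the reference length rather than by the number of aligned pairs.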
The predicted conformations in which >80% of residues featured pLDDT scores of ≥80 were considered as high-confidence predicted conformations that would correspond to the thermodynamically stable states. Accordingly, these conformations were grouped together for each of the ABL mutants to describe the thermodynamically stable states. The root-mean-square deviation (RMSD) superposition of backbone atoms was calculated using ProFit (http://www.bioinf.org.uk/software/profit/).
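The high-confidence criterion above (more than 80% of residues with pLDDT ≥ 80) can be expressed as a small filter. The function names and the example scores are hypothetical and serve only to illustrate the rule.

```python
def confident_fraction(plddt_scores, cutoff=80.0):
    """Fraction of residues whose pLDDT is at or above `cutoff`."""
    if not plddt_scores:
        raise ValueError("pLDDT list is empty")
    confident = sum(1 for score in plddt_scores if score >= cutoff)
    return confident / len(plddt_scores)


def is_high_confidence(plddt_scores, cutoff=80.0, min_fraction=0.8):
    """True when more than `min_fraction` of residues reach the pLDDT cutoff."""
    return confident_fraction(plddt_scores, cutoff) > min_fraction


def split_by_confidence(models):
    """Partition {name: per-residue pLDDT list} into high- and low-confidence names."""
    high = [name for name, scores in models.items() if is_high_confidence(scores)]
    low = [name for name in models if name not in high]
    return high, low


if __name__ == "__main__":
    # Made-up scores for illustration only.
    example = {
        "model_a": [92.0] * 9 + [55.0],
        "model_b": [92.0] * 6 + [55.0] * 4,
    }
    print(split_by_confidence(example))
```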
Principal Component Analysis (PCA).
Principal component analysis (PCA) is a powerful dimensionality reduction tool for analyzing protein structures and conformational ensembles that enables the interpretation and comparison of the main structural changes. In this study, we performed PCA of the AF2-generated conformational ensembles for the ABL WT kinase and ABL mutants and compared the results with the experimental NMR ensembles of the active and inactive ABL states. PCA is employed to characterize functional conformational changes in the AF2-generated ensembles, highlight correlated motions, and classify structural ensembles. Through systematic PCA comparison of the AF2-computed ensembles and NMR ensembles, we can describe conformational variations and characterize population shifts between the inactive and active ABL states in the AF2-generated ensembles. Using this approach, we analyze whether mutation-induced dynamic changes can alter the populations of the active and inactive ABL states. To enable rigorous comparison, PCA was done by merging the NMR ensembles and AF2-generated ensembles and projecting the PCA maps onto the common principal components of the ABL WT protein. Using the projected principal components (eigenvectors) of the ABL WT as the common reference coordinate system, we generated PCA maps for the AF2 ensembles that are compared with the respective PCA maps of the experimentally determined NMR ensembles. Python package ProDy was used to perform PCA for C-alpha Cartesian atoms for the systems.54 This program allows for the characterization of structural variations in heterogeneous data sets of NMR structures as well as for the comparison of these variations with AF2-generated protein conformations. The AF2-generated models with a pLDDT of <70 were excluded from PCA computations to remove a bias of partially misfolded or disordered conformational states.
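The idea of projecting several ensembles onto a common set of principal components can be sketched with plain NumPy. This swaps in a direct singular value decomposition in place of the ProDy routines used in the study, assumes that all conformations have already been superposed onto a common frame and contain the same Cα atoms, and uses hypothetical function names and random placeholder coordinates.

```python
import numpy as np


def fit_pca(reference_coords, n_components=2):
    """Fit PCA on a reference ensemble of shape (n_conformations, n_atoms, 3).

    Returns the mean coordinate vector and the leading principal components.
    """
    flat = reference_coords.reshape(len(reference_coords), -1)
    mean = flat.mean(axis=0)
    # Rows of vt are orthonormal principal directions of the centered data.
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(coords, mean, components):
    """Project conformations onto the reference principal components."""
    flat = coords.reshape(len(coords), -1)
    return (flat - mean) @ components.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(20, 50, 3))   # placeholder reference ensemble
    predicted = rng.normal(size=(70, 50, 3))   # placeholder predicted ensemble
    mean, components = fit_pca(reference)
    print(project(reference, mean, components).shape)
    print(project(predicted, mean, components).shape)
```

Because both ensembles are projected with the same mean and components, their two-dimensional maps share one coordinate system and can be overlaid directly.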
RESULTS AND DISCUSSION
Predicting Structures and Conformational Ensembles of the ABL Kinase Mutants Using AF2 Adaptation with Shallow MSA Subsampling.
ABL kinase functions as a regulatory switch and is characterized by multiple functional states of the catalytic domain operating through dynamic equilibrium between the active and inactive states. Conformational transitions between the kinase states are orchestrated by allosterically regulated couplings between conserved structural motifs in the catalytic domain, including the P-loop, the αC-helix, the 400-DFG-402 motif in ABL (DFG-in, active; DFG-out, inactive), and the activation loop (A-loop open, active; A-loop closed, inactive) (Figure 1A–C).
Figure 1.
Thermodynamically stable fully active ground state of the ABL kinase domain (PDB ID: 6XR6) (A), the inactive state I1 (PDB ID: 6XR7) (B), and the closed inactive state I2 (PDB ID: 6XRG) (C). The R-spine residues M309, L320, H380, and F401 in the active state and inactive state I1 are shown in spheres, respectively. The ABL structures are shown in orange ribbons. The αC-helix (residues 291–311) is shown in red ribbons, the A-loop (residues 398–421) is shown in blue ribbons, and the 400-DFG-402 motif is shown in yellow sticks. The R-spine residues M309, L320, H380, F401, and D440 are also shown in spheres. The experimental ABL structures are NMR ensembles with 20 structures each. The structures presented for each of the ABL states correspond to the first conformation in the respective NMR ensemble. NMR solution structural ensembles of the active ground state of the ABL kinase domain (PDB ID: 6XR6) (D), the inactive state I1 (PDB ID: 6XR7) (E), and the closed inactive state I2 (PDB ID: 6XRG) (F). The NMR ensembles are depicted in orange, pink, and blue ribbons for the active and inactive I1 and I2 ABL forms, respectively.
The NMR ensemble of the active conformations (PDB ID: 6XR6) is characterized by the “αC-in” position and stable DFG-in orientation (Figure 1D), while in the inactive I1 state (PDB ID: 6XR7), the αC helix moves to the intermediate αC-out position and the DFG motif is flipped 180°, with respect to the active conformation (Figure 1E). In the inactive I2 state, the regulatory DFG motif assumes a distinct “out” conformation and the A-loop swings to a fully closed conformation (Figure 1F). The transition of the P-loop from the kinked conformation observed in the active state (Figure 1A) to the stretched conformation in the I2 state (Figure 1C) is necessary to accommodate the inhibited closed conformation of the A-loop in the inactive I2 conformation. In the fully inactive I2 conformation, the key structural elements A-loop, DFG motif, αC helix, and P-loop adopt conformations that are not compatible with substrate binding or ATP binding (Figure 1C,F). The concerted structural rearrangements by the αC helix, the A-loop, and the P-loop enable conformational switches between the ABL states.
The ABL kinase domain exists predominantly in the active state (~88%), but the triple ABL mutant G269E/M309L/T408Y that involves modifications in the P-loop (G269E), R-spine (M309L), and A-loop (T408Y) shifts the equilibrium from the active state toward the I2 state.34 At the same time, several triple mutants sharing the T334I gatekeeper mutation can alter the dynamic equilibrium and begin to populate the active form.34 The NMR studies of the ABL-T334I single mutant showed that the T334I substitution shifts the equilibrium toward the active state. Because the ABL kinase domain already exists predominantly in the active state (pA ≈ 88%), the experiments used the triple mutant G269E/M309L/T408Y variant, which populates primarily the I2 state (pI2 ≈ 90%), and through the introduction of T334I shifted the equilibrium toward the active state away from the fully inactive form, while the M309L/L320I/T334I triple mutant depletes the inactive population entirely and reverses the equilibrium to the catalytically active state (~93%).34
By using different AF2 adaptations, we examined the predictive ability and limitations of these approaches to detect different functional states and characterize mutation-specific shifts between the active and inactive kinase states. First, we employed the AF2 approach with shallow MSA subsampling to predict structures and conformational ensembles of the ABL triple mutants. To gain quantitative insight into the AF2 predictions, we constructed the pLDDT density distribution for the predicted conformational ensembles of the ABL mutants (Figure 2A–C).
Figure 2.
Statistical analysis of the shallow MSA depth ensemble models for the ABL kinase mutants. Density distributions of the pLDDT values for the predicted conformations (in black filled bars) of the G269E/M309L/T408Y mutant (A), G269E/M309L/T334I mutant (B), and M309L/L320I/T334I mutant (C). Distribution density of TM-scores for the predicted ABL mutant conformations with respect to the experimental active ABL state for the G269E/M309L/T408Y mutant (D), G269E/M309L/T334I mutant (E), and M309L/L320I/T334I mutant (F). The TM-score distributions are shown in blue-colored filled bars.
Interestingly, by filtering the AF2-predicted conformations according to the percentage of confident residues (>70% residues with pLDDT scores of ≥80), we found that these stable ABL states typically have high pLDDT scores (pLDDT ≈ 85–90) and conform with high precision to the experimental NMR structure of the active ABL for all triple mutants (Figure 2A–C). The overall pLDDT distributions are similar for all triple mutants, which is indicative of similar conformational ensembles produced by the shallow MSA approach (Figure 2A–C).
The distributions of TM-scores reciprocated the pLDDT profiles, displaying characteristic peaks at a TM-score of ~0.8–0.9, further confirming strong convergence of all predicted stable conformations to the active ABL form for all triple mutant ABL proteins (Figure 2D–F). It is noteworthy that the TM-scores computed with respect to the active ABL structure featured a sharp peak at a TM-score of ~0.9, pointing to exceedingly high structural similarity to the active ABL form. The AF2-generated MSAs are also summarized as a heatmap indicating all sequences mapped to the input sequences, pointing to the statistically significant sequence coverage that was seen for the kinase core regions and C-lobe residues (Supporting Information, Figure 1A–C). The pLDDT profiles of the top five models showed high confidence pLDDT values for the ABL kinase core regions, while the A-loop (residues 395–421) displayed moderately reduced pLDDT values of ~65–75. This is consistent with the evidence that the A-loop conformation cannot be adequately represented by a single structure and undergoes considerable conformational changes during equilibrium switching in the ABL1 kinase (Supporting Information, Figure 1D,E). The low confidence pLDDT values corresponded to disordered N-terminal residues that were revealed in the NMR ensembles (Supporting Information, Figure 1D,E). Of particular interest are the predicted pLDDT values for the Gly-rich P-loop (residues 268-GGGQYG-273), the A-loop (residues 398–421), the DFG motif (residues 400–402), and the αC-helix (residues 300–311). The profiles showed only moderate pLDDT values for the P-loop, which is consistent with the fact that the P-loop is one of the most flexible elements in the catalytic core. These results suggest a relationship between conformational flexibility and the pLDDT metric, showing high pLDDT values for the kinase core and moderately lowered pLDDT values for more dynamic functional regions. Similar observations were made in a recent analysis of the correlation between pLDDT and the B-factors for other kinases.55
To provide a more quantitative and granular analysis of the predicted ABL conformations, we also computed the RMSD distributions of the generated ensembles with respect to the experimental structures using the complete kinase domain (Figure 3A–C) and the functionally important A-loop region only (Figure 3D–F). The RMSD densities showed a separation between the RMSDs computed with respect to different ABL states, particularly highlighting the dominant peaks for RMSDs < 1.0 Å relative to the active form (Figure 3A–C). We observed contributions of conformations that are similar to the intermediate inactive I1 form (RMSD ≈ 2.0 Å), but these similarities largely reflect the intermediate nature of the inactive I1 state in which the A-loop remains in an open conformation and the N-terminal part of the A-loop is similar in the active and I1 states. The RMSD densities obtained for the A-loop residues only confirmed that the majority of the ensemble conformations are highly similar to the active ABL form (Figure 3D–F). Notably, the RMSDs of the generated conformations relative to the fully inactive I2 structure are large, showing a distinct peak at an RMSD of ~7.0 Å for the A-loop residues (Figure 3D–F). Accordingly, the AF2-produced ensembles with shallow MSA subsampling for all triple ABL mutants populate almost exclusively the active ABL state, with no measurable population of the fully inactive state.
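For readers who want to reproduce this kind of comparison, the RMSD after optimal superposition can be sketched with the Kabsch algorithm. The RMSD values in this study were computed with ProFit; the NumPy function below is only an illustrative stand-in, and the coordinates and the index range used for the "loop" subset are synthetic and hypothetical.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (n_atoms, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)            # remove translation
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc                      # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc               # rotate P onto Q and compare
    return float(np.sqrt((diff ** 2).sum() / len(P)))

# Synthetic check: Q is P rotated by 30 degrees about z and translated.
rng = np.random.default_rng(1)
P = rng.normal(size=(30, 3))
theta = np.deg2rad(30.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
Q = P @ Rz.T + np.array([5.0, -2.0, 1.0])

full_rmsd = kabsch_rmsd(P, Q)                 # whole "domain"
loop_rmsd = kabsch_rmsd(P[10:20], Q[10:20])   # hypothetical loop subset
```

Restricting the atom selection, as in the last line, corresponds to computing the RMSD over a region such as the A-loop only.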
Figure 3.
Analysis of the conformational ensembles for ABL mutants using a shallow MSA subsampling approach. Density distributions of the RMSD values for the predicted conformational ensembles of the G269E/M309L/T408Y mutant (A), G269E/M309L/T334I mutant (B), and M309L/L320I/T334I mutant (C). Density distributions of the RMSD values for the A-loop (residues 395–421) in the predicted ensembles for G269E/M309L/T408Y (D), G269E/M309L/T334I (E), and M309L/L320I/T334I mutants (F). The distribution density of RMSD scores for the AF2-predicted conformations is shown relative to the active ABL state (orange filled bars) and the inactive states I1 (red filled bars) and I2 (blue filled bars).
By constructing scatter plots of the pLDDT values against TM-scores (Figure 4A–C) and against RMSDs (Figure 4D–F) with respect to the active form, we observed strong correlations for all the ABL mutants. The scatter plots revealed a dominant density of the stable conformations, which are defined as ABL states where the large fraction of residues (>70%) featured high confidence values of pLDDT of ~80–90 and therefore correspond to the functionally relevant structures (Figure 4). Instructively, for all mutants, these stable conformations are within an RMSD of <0.5–0.7 Å from the experimental active ABL structure (Figure 4D–F).
Figure 4.
Scatter plots of pLDDT with TM-scores and pLDDT with RMSDs for the conformational ensembles of the ABL mutants obtained with the AF2 shallow MSA subsampling approach. The scatter plots between pLDDT and TM-scores for the G269E/M309L/T408Y mutant (A), G269E/M309L/T334I mutant (B), and M309L/L320I/T334I mutant (C). The scatter plots between pLDDT and RMSD values for the respective ABL triple mutants are shown in panels (D–F).
This analysis illustrated that AF2 shallow MSA subsampling is strongly biased toward predicting only the active ABL structure as the stable ABL conformation (using pLDDT-based criteria for energetically dominant states) while failing to detect the inactive ABL states or any other structurally alternative functional conformations with high pLDDT values. Hence, this approach cannot differentiate between distinct structural effects induced by the ABL mutants and, in particular, cannot capture the radical structural change toward the inactive I2 form incurred by the G269E/M309L/T408Y mutations.34
Structural mapping of the predicted stable ABL conformations that were filtered from the generated ensemble using a pLDDT of >80 criterion illustrated the convergence to the active ABL state (DFG-in and A-loop in the open extended form) for all triple mutants (Figure 5). The analysis of the high-confidence stable ABL conformations generated by AF2 with MSA subsampling clearly showed that the DFG motif remained in its fully active DFG-in form for all ABL triple mutants (Figure 5D–F), thus failing to detect and recognize the inactive I2 state among thermodynamically stable conformations for the G269E/M309L/T408Y mutant. Structural inspection revealed the dominant active form of the kinase domain fold with moderate A-loop fluctuations only for the G269E/M309L/T408Y mutant (Supporting Information, Figure S2). The alignment of the DFG conformations showed only a minor fraction of DFG-in/out conformations for the G269E/M309L/T408Y mutant (Supporting Information, Figure S2D), while only DFG-in active conformations can be seen for the G269E/M309L/T334I mutant (Supporting Information, Figure S2E) and the M309L/L320I/T334I mutant (Supporting Information, Figure S2F).
Figure 5.
Structural alignment of the AF2-predicted ensembles of stable conformations (>70% of residues with a pLDDT of >80.0) using the shallow MSA subsampling approach. Structural overlay of the kinase catalytic domain conformations and DFG motif from the AF2-predicted conformational ensemble are shown for the G269E/M309L/T408Y mutant (A, D), G269E/M309L/T334I mutant (B, E), and M309L/L320I/T334I mutant (C, F).
The generated ensemble was examined using the recently proposed nomenclature for the structures of active and inactive kinases.56 Interestingly, a small fraction of generated conformations for the G269E/M309L/T408Y mutant corresponded to the “BLBplus” class (DFG-in/out, αC-helix-out) that represents one of the two common inactive kinase forms56 where the αC-helix assumes the inactive αC-out conformation and the DFG-Phe motif samples intermediate “out” positions and is not completely flipped to the DFG-out state (Supporting Information, Figure S2A,D). These ABL conformations featured reduced pLDDT values, suggesting that they could likely represent “intermediate” states that bear some similarity to highly heterogeneous I1 inactive forms rather than functionally stable ABL structures.34 It should be noted that the DFG motif is flipped by 180° and adopts the DFG-out conformation in the NMR structure of the I1 form with the A-loop remaining in an open conformation.34 To summarize, although subsampling of MSA may increase conformational heterogeneity around the ground active ABL state, this approach cannot detect inactive I2 conformations as a stable functional state of the triple mutant G269E/M309L/T408Y as established by NMR experiments.34 The analysis demonstrated that AF2 with shallow MSA subsampling can be strongly biased toward predicting only the active ABL structure and cannot differentiate between distinct structural effects induced by the ABL mutants; in particular, it cannot capture the radical structural change toward the inactive I2 form incurred by the G269E/M309L/T408Y mutations.
AF2 RASS Adaptation Can Detect Functional Allosteric States of ABL and Capture Distinct Patterns of Population Shifts in State-Switching ABL Triple Mutants.
We employed a recently developed randomized alanine scanning adaptation of the AF2 methodology in which the algorithm operates first on the pool of sequences and iterates through each amino acid in the native sequence to randomly substitute 5–15% of the residues with alanine, thus emulating random alanine mutagenesis.45 In the proposed protocol, randomized alanine sequence scanning is performed for the entire protein sequence or specific kinase regions involved in conformational changes, followed by the construction of corresponding MSAs and then by AF2 shallow subsampling applied on each of these MSAs. For each of these targeted alanine-masking experiments, we generate an ensemble of alanine-scanned sequences, each with different frequencies and positions of alanine mutations in the respective regions.
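The sequence-level step of this protocol can be illustrated with a short sketch. It is one possible reading of the description above, not the published implementation: here a substitution fraction is drawn uniformly between 5% and 15% for each variant and that many positions are replaced with alanine, optionally restricted to a region. The function name, the `region` argument, and the toy sequence (which is not the ABL sequence) are hypothetical.

```python
import random

def random_alanine_scan(sequence, min_frac=0.05, max_frac=0.15, region=None, rng=None):
    """Return a copy of `sequence` with a random 5-15% of positions set to alanine.

    region: optional (start, end) half-open index range to restrict substitutions.
    Assumption: the fraction is drawn once per variant, then positions are sampled.
    """
    rng = rng or random.Random()
    start, end = region if region is not None else (0, len(sequence))
    positions = list(range(start, end))
    frac = rng.uniform(min_frac, max_frac)
    n_mut = max(1, round(frac * len(positions)))
    residues = list(sequence)
    for i in rng.sample(positions, n_mut):
        residues[i] = "A"
    return "".join(residues)

# Toy 100-residue sequence without alanine (hypothetical, for illustration only).
native = "MGQWERTYKLPNDFGHSCVI" * 5
rng = random.Random(42)
variants = [random_alanine_scan(native, rng=rng) for _ in range(10)]
loop_only = random_alanine_scan(native, region=(40, 60), rng=rng)
```

Each variant would then serve as the query for MSA construction and shallow subsampling, as described in the protocol.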
By reranking the predicted conformations according to the percentage of confident residues,23 we selected the stable conformations, which are defined as ABL states where a large fraction of residues (>70%) featured high confidence pLDDT values of ~80–90 and therefore are assumed to be functionally relevant stable states. The pLDDT distributions for the predicted ABL ensembles are similar for all triple mutants, showing broadly distributed peaks at pLDDT values of ~85–90 and ~75–80 (Figure 6A–C). Importantly, the pLDDT distributions for ensembles of stable conformations obtained with the AF2-RASS approach are appreciably broader than the ones derived from AF2 with shallow MSA sampling, which may reflect the ability of our approach to identify functionally relevant conformational heterogeneity of the ABL states. To analyze the structural signatures of the predicted stable ABL conformations, the RMSD values were computed between the predicted conformations and the experimental active and inactive ABL structures (Figure 6D–F). Importantly, in this case, the results showed a significant overlap in the distributions for the G269E/M309L/T408Y triple mutant (Figure 6D). Notably, the RMSD distribution of predicted conformations for the G269E/M309L/T408Y mutant with respect to the inactive I2 conformation showed a peak at an RMSD of ~2.0 Å, signaling a significant (~25%) population of the inactive I2 form. Although our results detected the experimentally observed inactive state in the equilibrium distribution of states, the quantitative predictions of the population for the inactive I2 form underestimate the more drastic experimental shift in the G269E/M309L/T408Y mutant populating ~80% of the I2 state.34 In contrast, the RMSD distributions for the G269E/M309L/T334I mutant (Figure 6E) and M309L/L320I/T334I mutant (Figure 6F) are dominated by sharp peaks at an RMSD of ~1.0 Å from the active state, showing that the predicted ensembles for these mutants are populated almost exclusively (~90%) by the active conformations.
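One simple way to turn per-model RMSDs into approximate state populations is to assign each conformation to the reference state it is closest to and count the fractions. The text does not specify how the reported populations were obtained, so the nearest-state rule below is an assumption made for illustration; the function name and the RMSD values are hypothetical.

```python
import numpy as np

def state_populations(rmsd_table, state_names):
    """Fraction of models assigned to each state by smallest RMSD.

    rmsd_table: (n_models, n_states) RMSDs of each model to each reference state.
    Assumption: a model belongs to the reference state with the lowest RMSD.
    """
    rmsd_table = np.asarray(rmsd_table, dtype=float)
    nearest = rmsd_table.argmin(axis=1)
    counts = np.bincount(nearest, minlength=len(state_names))
    return {name: float(c) / len(rmsd_table) for name, c in zip(state_names, counts)}

# Hypothetical RMSDs (in angstroms) of four models to the active, I1, and I2 references.
rmsds = [
    [0.8, 2.1, 7.0],
    [0.9, 2.0, 6.8],
    [1.0, 2.2, 7.1],
    [6.5, 5.0, 1.9],
]
pops = state_populations(rmsds, ["active", "I1", "I2"])
print(pops)
```

A stricter variant could leave a model unassigned when its lowest RMSD exceeds a chosen cutoff, which would avoid counting poorly matching conformations toward any state.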
Figure 6.
Analysis of AF2 predictions using randomized alanine sequence scanning approach. The pLDDT distribution density for the AF2-predicted ABL conformational ensembles of the G269E/M309L/T408Y mutant (A), G269E/M309L/T334I mutant (B), and M309L/L320I/T334I mutant (C). The distribution density of RMSD scores for the AF2-predicted conformational ensembles is shown on panels (D–F) for the respective mutants. The RMSD density relative to the active ABL state is shown in orange filled bars, relative to the inactive state I1 in red filled bars and relative to the inactive I2 in blue filled bars.
By constructing scatter plots between pLDDT and RMSD values for the predicted ensembles, we could gain some additional insight into the specifics of mutation-induced structural changes (Figure 7). The most revealing aspect of this analysis concerns the nature of the predicted ensemble for the G269E/M309L/T408Y mutant (Figure 7A–C) that is known to induce allosteric transformation from the active to fully inactive state.34 We observed statistically significant correlations between pLDDT and RMSDs for this mutant in all of the functional states. Notably, the predicted ensemble featured the population of the active, inactive I1, and fully inactive I2 forms. Interestingly, the scatter plot between the pLDDT and RMSDs with respect to the inactive I2 form displayed a measurable fraction of the conformations forming a “funnel”, leading to this fully inactive conformation (Figure 7C). In contrast, the scatter plots between the pLDDT and RMSD values for G269E/M309L/T334I and M309L/L320I/T334I mutants showed meaningful correlations only for the active and I1 states that dominate the ensembles (Figure 7D–I). The observed “isolated funnel” may reflect the intrinsic physical nature of the fully inactive ABL form that occupies an isolated and kinetically poorly accessible region on the conformational landscape that is separated from the dominant basins populated by the thermodynamically stable active form and active-like state I1. We argue that this may be one of the fundamental reasons behind difficulties in the AF2 approaches to detect this hidden allosteric form of ABL kinase. These findings are also consistent with and complement our previous MD/MS simulation studies of the ABL states35 in which it was shown that ABL kinase can transition following the path from the active state to the inactive state I1 while direct transitions from the active state to the inactive state I2 are kinetically highly unfavorable. 
Moreover, experimental studies confirmed a complex dynamic equilibrium between the ground active state G and inactive states G ↔ I1 ↔ I2 where the kinetic exchange between the active and inactive state I2 is extremely slow due to the DFG flip and dramatic rearrangements in the A-loop.34
Figure 7.
Scatter plots of pLDDT with TM-scores and pLDDT with RMSDs for the conformational ensembles of the ABL mutants obtained with the AF2 alanine sequence scanning approach. The scatter plots between pLDDT and RMSDs with respect to the NMR structures of the active form (PDB ID: 6XR6), inactive I1 state (PDB ID: 6XR7) and inactive I2 state (PDB ID: 6XRG) are shown for the G269E/M309L/T408Y mutant (A–C), G269E/M309L/T334I mutant (D–F), and M309L/L320I/T334I mutant (G–I).
To gain a functionally relevant view of the predicted ensembles, we filtered the AF2 RASS-generated conformations through reranking and selected the stable conformations in which >70% of the kinase domain residues featured a high pLDDT of ~80–90 (Figure 8). Structural mapping of these ABL conformations highlighted functional variations of the A-loop in the G269E/M309L/T408Y mutant that samples both the open (active) and closed (inactive) I2 conformation as well as population switching between DFG-in and DFG-out conformations (Figure 8A,D). The stable ABL conformations reflected a generally more heterogeneous nature of the conformational landscape for the G269E/M309L/T408Y mutant, which is consistent with the NMR data.34 Instructively, structural analysis of this ensemble showed that the population of the inactive conformations can fluctuate between the DFG-out form that is found in many fully inactive kinase structures (“BBAminus” nomenclature cluster) and the “BLBplus” class (DFG-in/out, αC-helix-out) that is also among common inactive kinase forms56 in which DFG tends to adopt a narrow spectrum of DFG-Phe upward intermediate positions that are compatible with the inactive closed form and prevent assembly of the R-spine.
Figure 8.
Structural alignment of the AF2-predicted ensembles of stable conformations (>70% of residues with a pLDDT of >80.0) and DFG motif in the ABL mutants obtained using AF2 RASS. Structural overlay of the AF2-predicted conformational ensemble for G269E/M309L/T408Y mutant (A), G269E/M309L/T334I mutant (B), and M309L/L320I/T334I mutant (C). Structural overlay of the DFG motifs is shown in panels (D–F) for the corresponding mutants.
Hence, the AF2 RASS approach enabled the detection of the allosteric ABL states and predicted population shifts between the fully active form and the fully closed inactive conformations, which contribute ~25% of the entire population (Figure 8A,D). It should be noted that AF2 shallow sampling failed to detect any alternative functional states of ABL and always converged only to the active ABL structure. Noticeably, conformational variability of the predicted ensemble for stable native states (defined by a pLDDT of >80) for triple mutants G269E/M309L/T334I (Figure 8B,E) and M309L/L320I/T334I (Figure 8C,F) is mostly confined to functionally relevant fluctuations around the active ABL form. For the G269E/M309L/T334I mutant, DFG-in assumes a classical active BLAminus conformation according to the comprehensive structural nomenclature of kinase conformations.56
We also observed that the ensemble of the active conformations is more heterogeneous for the M309L/L320I/T334I mutant, where fluctuations around the DFG-in configuration could be seen (Figure 8C,F). A more detailed inspection of the DFG fluctuations in this mutant indicated that DFG adopts the BLAplus conformations that are also attributes of the active kinase state in which the assembled R-spine architecture is preserved,56 owing mainly to a stabilizing role of the T334I mutation that stimulates kinase activation. These findings are consistent with the NMR studies, showing that the introduction of T334I can reverse the population of the G269E/M309L/T408Y mutant from a predominantly inactive I2 state to a dominant active form (with 93% of the total population).34 Structural mapping of all generated ABL conformations illustrated important differences between ABL mutants (Supporting Information, Figure S3). Significant variations of the A-loop that sampled both the open (active) and closed (inactive) conformations were seen in the G269E/M309L/T408Y mutant (Supporting Information, Figure S3A,D). Conformational variability of the predicted ensemble for triple mutants G269E/M309L/T334I and M309L/L320I/T334I is mostly evident for the flexible N-lobe regions, while the enhanced fluctuations of the key functional regions αC-helix and A-loop are largely around their active positions αC-in and A-loop open (Supporting Information, Figure S3B,C).
Overall, our results demonstrated that AF2 RASS adaptation for the generation of diverse kinase conformations and reranking them based on high pLDDT and pTM scores can identify distinct functional ABL structures that have been observed experimentally but cannot be detected by other AF2 implementations. However, it should be acknowledged that the precise quantitative predictions of the relative populations of the active and inactive states in the ABL mutants remain challenging for the AF2 adaptations as the NMR-measured population of the inactive I2 state in the G269E/M309L/T408Y mutant becomes dominant (~80%) with only 20% of the active state, thus showing a more radical switch in the equilibrium distribution.34
It is also instructive to compare our results with the earlier pioneering study that developed different AF2 implementations and applied them to four mutations that are known to decrease the population of the ground active state (M309L, L320I, F401V, M309L/L320I) and four mutations that are expected to increase it (E274V, T334I, F401L, E274V/T315I).38 Although the results of this illuminating study can sometimes qualitatively predict both the positive and negative effects of mutations on the active state populations, the predicted effects often diverged from the experimental observations and could not reproduce the quantitative ratios of populations.38 For example, AF2 shallow MSA can correctly predict the activating effect of the T334I mutation and the deactivating effect of the F401V mutant but incorrectly predicts the effect of the inactivating M309L mutation. Additionally, the M309L/L320I mutation, which is known to induce a switch in the kinase from a fully active state (~90%) to an almost fully inactive state (~8%),34 was predicted by subsampled AF2 to slightly increase the population of the ground state.38
We argue that accurate quantitative predictions of population shifts upon allosteric mutations would require new training paradigms in which coevolutionary constraints and structural memorization biases could be properly modulated and assigned optimal weights to enable exploration of the functional states and relevant conformational landscapes.
PCA Comparison of the NMR and AF2-Generated Conformational Ensembles Using Shallow MSA Subsampling and RASS Approaches.
Of particular interest is a comparison of PCA plots for conformational ensembles derived from NMR ensembles and produced by AF2 adaptations (Figure 9). This analysis confirmed that the conformational ensembles generated using the AF2 RASS approach can localize the alternative ABL states and reflect the functionally significant heterogeneity of the distinct allosteric forms of the ABL kinase (Figure 9). To enable a relevant comparison, PCA was performed by merging the NMR ensembles for both active and inactive states together with AF2-generated ensembles for ABL triple mutants produced by shallow MSA adaptation (Figure 9A–C) and AF2 RASS approaches (Figure 9D–F). PCA of the combined ensembles enabled a direct comparison of differences between the predicted and experimental ensembles through projection on a common reference coordinate system. PCA plots showed a separation between the active ABL ensemble and the NMR ensembles of the inactive I1 and I2 states. The projection of the NMR ensemble on the two principal components showed a separation of the thermodynamically stable active form and fully closed inactive I2 state, while the distribution for the inactive I1 ensemble is more localized (Figure 9).
Figure 9.
PCA for the merged ensemble consisting of NMR ensembles of the active and inactive ABL states combined with AF2-generated ensembles for the ABL mutants. The PCA plot of the merged ensemble that combines the experimentally determined NMR ensemble and shallow MSA AF2-generated ensembles for G269E/M309L/T408Y mutant (A), G269E/M309L/T334I mutant (B), and M309L/L320I/T334I mutant (C). The PCA plot of the merged ensemble consisting of the experimentally determined NMR ensemble and AF2 RASS generated ensembles for G269E/M309L/T408Y mutant (D), G269E/M309L/T334I mutant (E), and M309L/L320I/T334I mutant (F). PCA projection for the NMR ABL active state (6XR6) is shown in green filled circles, for the NMR of the inactive I1 (in blue filled circles), and the inactive state I2 (in red filled circles). PCA projection maps for the AF2-generated ensembles for the G269E/M309L/T408Y mutant are shown in light blue filled circles, for the G269E/M309L/T334I mutant are in yellow-colored filled circles, and for the M309L/L320I/T334I mutant are shown in purple filled circles.
We analyzed the overlaps between PCA distributions for AF2-generated ensembles and NMR ensembles to quantify whether the predicted ensembles can sample alternative ABL forms. The AF2-generated models obtained with shallow MSA subsampling and RASS were processed to include stable conformations where >70% of residues featured a pLDDT of >80. The AF2-predicted ensembles using shallow MSA adaptation are relatively narrow and pointed to a limited sampling of the conformational space showing moderate overlaps with the active NMR conformations (Figure 9A–C). These observations suggested that although AF2 shallow MSA converged to the active ABL structure, the generated ensembles deviated from the NMR ensemble of the active conformations. Accordingly, this approach showed a considerable structural bias toward the prediction of the dominant active conformation but may be less suitable for mapping the conformational landscape and fluctuations around functional states.
In contrast, considerably enhanced sampling and functionally significant coverage of the conformational space were evident in PCAs of the RASS-generated ensembles for the ABL mutants (Figure 9D–F). The distributions of sampled conformations for G269E/M309L/T408Y (Figure 9D) showed an overlap with the NMR ensembles for both the active and inactive states. It is particularly instructive to underscore a dense overlap between the PCA distribution for the G269E/M309L/T408Y mutant and the PCA map for the NMR ensemble of the fully closed inactive I2 state (Figure 9D). The analysis showed that the AF2-generated ensemble for the G269E/M309L/T408Y mutant can identify the inactive allosteric form I2 and predict the experimentally observed inactive state as a functionally significant form populating ~25% in the equilibrium distribution of states. While the precise quantitative predictions of the relative populations of the active and various hidden inactive states in the ABL mutants remain challenging, our results provide evidence that AF2 RASS can detect a spectrum of experimental allosteric ABL states and observe equilibrium redistributions between structurally distinct functional ABL conformations.
PCA distributions obtained for the G269E/M309L/T334I (Figure 9E) and M309L/L320I/T334I triple mutants (Figure 9F) showed dense concentrations of conformations in the active state amounting to ~90% of the entire population, with 5% of the intermediate I1 form with an active A-loop conformation and only a negligible fraction of the inactive I2-like states with a partially flipped A-loop. Our results confirm that the T334I mutation may promote shifts to a predominantly active form and are consistent with the NMR experiments showing that the M309L/L320I/T334I triple mutant depletes the inactive population and reverses the equilibrium almost completely to the active state (~93%).34 Together with our previous analyses, the results demonstrate the ability of the proposed approach to capture physically significant population shifts and produce distinct functional conformational clusters that are associated with the active and inactive ABL states.
Despite certain limitations in accurately reproducing the ensembles of hidden states and precisely quantifying the changes in relative populations of the active and inactive states, we showed that combining alanine sequence masking with MSA construction and shallow MSA subsampling can capture hidden inactive states and mutation-specific patterns of population shifts. These results are also consistent with and expand upon our latest analysis of the ABL kinase showing the feasibility of the AF2 adaptations that integrate alanine sequence masking with shallow MSA sampling for detecting ABL states and mutation-induced allosteric changes.57
OmegaFold and AlphaFlow Methods for Prediction of Allosteric ABL States and Conformational Ensembles for the ABL Mutants: A Comparison with AF2 Approaches.
To compare the AF2 predictions of the ABL state-switching mutants with other approaches, we explored OmegaFold by running an alanine sequence scanning algorithm 100 times on the mutant sequences, resulting in 100 distinct masked sequences for each of the mutants. These sequences were then used as sequence input for OmegaFold, resulting in 100 distinct predicted structures (Figure 10A). In addition, to facilitate diversification of the predicted structures, we introduced perturbations within the latent space, which is obtained after the full sequence is processed by the OmegaFold PLM. The protein latent representation of the edges is perturbed by multiplying the representation matrix by a random value between 0 and 7. The perturbed matrices are then used as inputs for the Geoformer module of OmegaFold to predict final structures (see Materials and Methods for further details) (Figure 10B–D). For this analysis, in order to provide a detailed assessment of the OmegaFold performance and enable a quantitative comparison with the AF2 predictions of this study and earlier investigations,38,57 we considered an extended panel of ABL state-switching mutants that included M309L/L320I (mutant 1), M309L/H415P (mutant 2), F378V/T408Y (mutant 3), L389M/T408Y (mutant 4), T408Y/H415P (mutant 5), G269E/M309L/T408Y (mutant 6), G269E/M309L/T334I (mutant 7), M309L/L320I/T334I (mutant 8), and the quadruple mutant G269E/E274V/M309L/T408Y (mutant 9) (Figure 10). These mutants induce allosteric structural changes, including M309L/L320I, which increases the population of the hidden I2 state, and M309L/H415P, which moves the equilibrium to the I1 form.34 Some of these ABL mutants, including the double mutants M309L/L320I, M309L/H415P, and T408Y/H415P, were explored for the prediction of conformational ensembles in our latest study.57
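The two diversification steps described above can be sketched as follows. This is a minimal illustration and not the in-house implementation: the masking fraction, the option to protect selected positions (for example, the mutated sites), the random seed, and all function names are assumptions, and the latent representation is shown as a plain nested list rather than an actual OmegaFold tensor.

```python
import random


def alanine_mask(sequence, fraction, rng, protected=()):
    """Replace a random subset of non-alanine residues with alanine.

    `fraction` (share of eligible positions to mask) and `protected`
    (0-based positions that are never masked) are illustrative assumptions.
    """
    candidates = [i for i, aa in enumerate(sequence)
                  if aa != "A" and i not in protected]
    n_mask = round(fraction * len(candidates))
    chosen = set(rng.sample(candidates, n_mask))
    return "".join("A" if i in chosen else aa
                   for i, aa in enumerate(sequence))


def masked_variants(sequence, n_variants=100, fraction=0.1, seed=0,
                    protected=()):
    """Generate up to `n_variants` distinct alanine-masked sequences."""
    rng = random.Random(seed)
    variants, seen, attempts = [], set(), 0
    # Cap the number of attempts so that short sequences cannot loop forever.
    while len(variants) < n_variants and attempts < 100 * n_variants:
        attempts += 1
        variant = alanine_mask(sequence, fraction, rng, protected)
        if variant not in seen:
            seen.add(variant)
            variants.append(variant)
    return variants


def perturb_latent(edge_repr, rng, low=0.0, high=7.0):
    """Scale an edge (pair) representation by one random factor in [low, high]."""
    factor = rng.uniform(low, high)
    return [[factor * x for x in row] for row in edge_repr], factor
```

Each masked sequence would then be passed to the structure predictor as an independent input, while the scaled edge representation would replace the unperturbed one before the structure module is run.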
Figure 10.
Analysis of OmegaFold predictions for state-switching ABL kinase mutants. The following mutants are considered: M309L/L320I (mutant 1), M309L/H415P (mutant 2), F378V/T408Y (mutant 3), L389M/T408Y (mutant 4), T408Y/H415P (mutant 5), G269E/M309L/T408Y (mutant 6), G269E/M309L/T334I (mutant 7), M309L/L320I/T334I (mutant 8), and the quadruple mutant G269E/E274V/M309L/T408Y (mutant 9). (A) Structural alignment of the OmegaFold-predicted ensembles for ABL kinase mutants obtained with an alanine sequence scanning algorithm. The NMR structure of the active ABL is shown in black ribbons. (B) The average TM-score values for a panel of ABL kinase mutants obtained from OmegaFold-predicted structures using latent perturbation adaptation. TM-scores with respect to the active state are in orange bars, TM-scores relative to the I1 structure are in red bars, and TM-scores relative to the I2 structure are in blue bars. (C) Average RMSD values for a panel of ABL kinase mutants obtained from OmegaFold-predicted structures using latent perturbation adaptation. (D) Structural alignment of the OmegaFold-predicted ensembles for ABL kinase mutants obtained with the latent perturbation model. The NMR structure of the active ABL state is shown in black ribbons.
Here, we provide a more complete comparison of the AF2 and OmegaFold approaches using a panel of ABL mutants to examine the ability of OmegaFold to detect a spectrum of the functional ABL structures and probe the sensitivity of this approach to mutational effects. The results revealed a strong convergence of the OmegaFold-generated models to the active ABL conformation with RMSDs of ~0.5 Å from the experimental active structure (PDB ID: 6XR6) for all mutants (Figure 10A). It is evident that even with masking ABL sequences, the generated ensemble produced extremely limited diversity, as most of the A-loop conformations and even N-terminal regions converged to the active ABL structure (Figure 10A). Interestingly, OmegaFold displayed an even stronger prediction bias toward the native active ABL form as compared to AF2 shallow MSA results, as the entire population of generated conformations converged to the same structure and showed the absence of any conformational heterogeneity in the ensemble (Figure 10A).
By introducing perturbations within the latent space, we probed an OmegaFold adaptation that attempted to diversify the predictions. However, the results revealed that OmegaFold-Latent predictions for all ABL mutants consistently converged to the same active ABL state, displaying TM-scores of ~0.95–0.98 (Figure 10B) and RMSDs of <1.0 Å with respect to the native ABL WT structure (Figure 10C). Considerably larger RMSD values of ~4.0–4.5 Å were obtained when the predicted conformations were compared with the I1 structure, while RMSD values of ~7.0–7.2 Å with respect to the fully inactive I2 state clearly indicated the failure of OmegaFold to detect any inactive or inactive-like ABL conformations for any of the ABL mutants (Figure 10C). Structural alignment of the generated ensembles with the native active ABL form illustrates a close structural similarity for all mutants, showing only marginal flexibility in the N-terminal and C-terminal lobe regions (Figure 10D). It should be noted that, in contrast to the triangle multiplicative and triangle attention modules applied in the AF2 Evoformer, OmegaFold combines a PLM, which enables predictions from single sequences, with a geometry-inspired transformer model, Geoformer, which is trained on protein structures and maintains geometric consistency in the high-dimensional space as well as in the Euclidean space. We argue that the Geoformer module of OmegaFold, which imposes strict geometric consistency among single/pair embeddings, may overcorrect the generated PLM diversity and force the generated conformations to converge to the dominant native active state. Our results strongly indicated that OmegaFold predictions may be driven by structural “memorization” biases23 that are likely to be further amplified by geometry constraints learned from structural databases.
Hence, while OmegaFold may present an excellent tool for predicting native structures of single-domain proteins, Geoformer’s advantages to reduce the noise of multiple conformations may become counterproductive when applied to modeling protein ensembles and exploring allosteric structural effects of mutants.
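For readers less familiar with the two similarity metrics used above, the following minimal sketch evaluates the RMSD and the TM-score for two equal-length sets of Cα coordinates that are assumed to be already superposed. It is a simplified illustration: standard TM-score programs additionally search for the superposition that maximizes the score, and the lower bound of 0.5 Å on the distance scale d0 is a convention assumed here. The toy coordinates are hypothetical.

```python
import math


def squared_distances(coords_a, coords_b):
    """Per-residue squared distances between two pre-superposed structures."""
    return [sum((a - b) ** 2 for a, b in zip(p, q))
            for p, q in zip(coords_a, coords_b)]


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation without any further superposition."""
    d2 = squared_distances(coords_a, coords_b)
    return math.sqrt(sum(d2) / len(d2))


def tm_score(coords_model, coords_ref):
    """TM-score for a given superposition, normalized by the reference length.

    TM = (1/L) * sum_i 1 / (1 + (d_i / d0)^2),
    d0 = 1.24 * (L - 15)^(1/3) - 1.8, bounded below by 0.5 Angstrom here.
    """
    length = len(coords_ref)
    d0 = 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8 if length > 15 else 0.5
    d0 = max(d0, 0.5)
    d2 = squared_distances(coords_model, coords_ref)
    return sum(1.0 / (1.0 + d / d0 ** 2) for d in d2) / length


# Toy example: a straight 100-residue trace and a copy shifted by 1 Angstrom.
reference = [(3.8 * i, 0.0, 0.0) for i in range(100)]
shifted = [(x, y + 1.0, z) for x, y, z in reference]
toy_rmsd = rmsd(shifted, reference)
toy_tm = tm_score(shifted, reference)
```

Because each residue contributes a bounded term to the TM-score, a uniform 1 Å shift lowers it only slightly, whereas large local deviations such as a flipped A-loop affect the RMSD much more strongly than the TM-score.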
We also employed the latest AlphaFlow approach, a sequence-conditioned denoising model that receives noisy structures as templates and samples the protein ensembles under a flow field.24 According to the original study, AlphaFlow, when trained and evaluated on the PDB structures, can arguably combine the advantages of accurate structure prediction with the generative capability for conformation sampling, providing enhancement for modeling conformational ensembles compared to AF2 methods.24,25 To provide an additional comparison, we employed the AlphaFlow approach that is trained on PDB structures to model experimental ensembles from X-ray crystallography or cryo-EM.24 This approach is fine-tuned on different protein MD trajectories as a regression model using loss functions similar to those in the original AF2. Using this AlphaFlow implementation, we generated conformational ensembles for the state-switching triple ABL kinase mutants G269E/M309L/T408Y, G269E/M309L/T334I, and M309L/L320I/T334I, which enabled a direct comparison with the AF2 adaptations and OmegaFold results. The AlphaFlow ensembles were obtained using 100 generated conformations for each mutant (Figure 11). The relatively high TM-scores for the predicted ensembles with respect to both the active ABL state and the heterogeneous inactive I1 form (Figure 11A–C) indicated that this approach may sample predominantly the active-like and intermediate ABL states in which the A-loop remains in an open active conformation and the N-terminal part of the A-loop is similar in the active and I1 states. The average RMSD values between the AlphaFlow-predicted conformations and the NMR structures highlighted the convergence of the generated ensembles toward the active ABL conformation and intermediate I1 states across all triple mutants (Figure 11D,E). Strikingly, the results also demonstrated that none of the generated conformations were structurally similar to those of the fully inactive I2 state (Figure 11).
Figure 11.
Analysis of AlphaFlow predictions for state-switching triple ABL kinase mutants. (A–C) The average TM-score values for the predicted structures of G269E/M309L/T408Y, G269E/M309L/T334I, and M309L/L320I/T334I mutants. AlphaFlow generated 100 conformations for each mutant. TM-scores with respect to the active state are in orange bars, TM-scores relative to the I1 structure are in red bars and TM-scores relative to the I2 structure are in blue bars. (D, E) Average RMSD values for the predicted structures of G269E/M309L/T408Y, G269E/M309L/T334I, and M309L/L320I/T334I mutants. RMSDs between the predicted conformations and the active state are in orange bars, RMSDs with respect to the I1 structure are in red bars, and RMSDs with respect to the I2 structure are in blue bars.
We illustrate the conformational diversity of AlphaFlow-generated ensembles using structural alignment of the predicted conformations (Supporting Information and Figure S4). Interestingly, for all triple mutants, the ensembles depicted functionally relevant fluctuations of the active ABL form. Notably, the P-loop adopted the extended stretched conformation, which is seen in the inactive states and is different from the kinked P-loop conformation in the active structure that protrudes into the ATP cleft (Supporting Information, Figure S4A–C). Importantly, structural alignment of the predicted DFG conformations and the DFG motif from the active ABL form showed only small displacements around the dominant DFG-in position, which is an important structural signature of the active form, while a DFG-out flip is necessary to transition to the inactive kinase forms. Overall, AlphaFlow simulations produced conformational ensembles that are reminiscent of the active ABL and I1 forms and captured the conformational heterogeneity of the A-loop regions in the open active form. Hence, AlphaFlow can sample active and intermediate I1 conformations but fails to detect the functionally relevant inactive I2 form. In this context, it may be worth noting that according to NMR studies, I1 lies between the active and the I2 state but is unlikely to be an obligatory intermediate form along the functional transitions, and therefore its biological role remains unknown.34
To summarize, the results of the comparison between different AI-inspired methods and implementations further underscored the key finding that, despite the success of AF2, OmegaFold, and AlphaFlow adaptations in predicting single protein structures and probing conformational flexibility around native structures, these methods are still limited in predicting structurally distinct functional conformations of allosteric proteins and capturing the effects of mutations that induce significant structural changes. These results are particularly significant as AF2 predictions of fold-switching allosteric systems including protein kinases also showed structural memorization-driven biases toward predictions of the thermodynamically stable ground states and typically failed to detect less stable “excited” states.23,38
Our results suggested that AF2, OmegaFold, and AlphaFlow may be largely driven by structural memorization23 and are often unable to readily detect functional allosteric states that are structurally different and could include low-populated functional conformations. According to our analysis, a reasonable approach for capturing mutational effects on shifting conformational equilibrium is to introduce random sequence masking across the entire sequence or in specific functional regions, followed by MSA construction and shallow MSA subsampling. We suggest that the improved predictive ability of AF2 RASS adaptation in capturing the population of alternative states may arise from loosening and modulation of coevolutionary “memorization” constraints allowing AF2 to diversify structure-biased attention and increase conformational sampling of protein landscapes. Another potentially promising adaptation of AF2 methods for predicting the effects of point mutations is based on introducing mutations in the entire MSA as compared to only the input sequence.42
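The idea of shallow MSA subsampling mentioned above can be illustrated conceptually as follows. In this minimal sketch, the query sequence is always retained and a small random subset of the remaining alignment rows is drawn; the depth value, the seed, and the function name are hypothetical, and the sketch does not reproduce the internal MSA clustering and sampling performed by AF2 itself.

```python
import random


def subsample_msa(msa, max_seq, rng):
    """Return a shallow MSA: the query (first row) plus random other rows.

    `msa` is a list of aligned sequences with the query first;
    `max_seq` is the total number of rows to keep (an illustrative parameter).
    """
    query, others = msa[0], msa[1:]
    n_keep = min(max(max_seq - 1, 0), len(others))
    return [query] + rng.sample(others, n_keep)


# Toy alignment with hypothetical rows; real MSAs contain aligned sequences.
toy_msa = ["QUERY"] + [f"SEQ{i:02d}" for i in range(50)]
shallow = subsample_msa(toy_msa, max_seq=8, rng=random.Random(0))
```

Repeating the draw with different seeds yields different shallow alignments, which is what weakens the coevolutionary signal seen by the network in any single prediction.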
Prediction of Unique Apo and Holo States of the Constitutively Inactive Protein Kinase BSK8: A Comparative Analysis of AF2 RASS, OmegaFold, and AlphaFlow Approaches.
Brassinosteroid signaling kinases (BSKs) are plant-specific receptor-like cytoplasmic protein kinases that represent constitutively inactive protein kinases that regulate signal transfer through an allosteric mechanism.58 The crystal structures of the catalytic domain of BSK8 kinase and BSK8 with 5-adenylyl imidodiphosphate (AMP-PNP) revealed unique conformational arrangements of the nucleotide phosphate groups and catalytic key motifs, typically not observed for active protein kinases.59 Strikingly, structural studies discovered that the A-loop adopts an inactive, partly closed conformation. Moreover, other unique features of the structure of BSK8 are the presence of the alanine gatekeeper and the conformation of the CFG motif that does not resemble any known DFG-in or DFG-out conformations, as well as the preservation of the closed inactive A-loop conformation in the complex with AMP-PNP, indicating that BSK8 kinase is a constitutively inactive protein kinase similar to known pseudokinases. We used this kinase with unprecedented structural architecture to evaluate the ability of the AF2 RASS approach to predict the apo and holo inactive forms of BSK8 (Figure 12). The pLDDT distribution density featured major peaks at a pLDDT of ~75–80 (Figure 12A), suggesting some degree of conformational flexibility in the predicted ensemble. The overlap of the RMSD distributions for the apo and holo BSK8 forms showed pronounced peaks at an RMSD of ~1.0–2.0 Å for the apo structure, while the RMSD density for the holo BSK8 state shifted moderately toward RMSDs of ~2.5–2.8 Å (Figure 12B). Hence, the predicted ensembles conformed fairly closely to the crystal structures.
The scatter plots of pLDDT values against RMSDs from the apo and holo forms showed correlations between these metrics for both apo (Pearson correlation coefficient R = −0.7) and holo BSK8 states (R ≈ 0.65), indicating a strong relationship between conformational deviations from the crystallographic states and pLDDT values (Figure 12C,D). Similar observations were made in a recent analysis of the correlation between pLDDT and the B-factors for other protein kinase structures.55 Structural alignment of the predicted BSK8 ensemble in the unbound apo form illustrated conservation of the kinase domain fold (Figure 12E), showing the expected variability of the N-terminal regions and particularly the A-loop (Figure 12F). The AF2 ensemble featured A-loop conformations that conformed to the unique inactive arrangement seen in the crystal structure while exhibiting moderate fluctuations around the native structure (Figure 12F). Similarly, the predicted structures in the ensemble reproduced the P-loop conformation that binds into the ATP-binding cleft located between the two lobes (Figure 12G).
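The Pearson correlation coefficient between per-model pLDDT values and RMSDs, as reported above, can be computed with a few lines of code. The sketch below uses only the standard library; the function name and the toy values are hypothetical and serve only to show the calculation, not to reproduce the data of Figure 12.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-model values: higher confidence paired with lower RMSD.
toy_plddt = [82.0, 79.0, 76.0, 73.0, 70.0]
toy_rmsd = [1.0, 1.5, 2.0, 2.5, 3.0]
r_value = pearson_r(toy_plddt, toy_rmsd)
```

In this toy example the two series are perfectly anticorrelated, so the coefficient is at its negative limit; real ensembles would show scatter around the trend.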
Figure 12.
Analysis of AF2 predictions for the BSK8 kinase apo and holo structures. (A) pLDDT distribution density for the AF2-predicted BSK8 conformational ensembles. (B) Distribution density of the RMSD values for the predicted conformations with respect to the apo BSK8 crystal structure (in orange-colored filled bars) and the AMP-PNP bound BSK8 complex (in maroon-colored filled bars). (C) Scatter plot between pLDDT values and RMSDs between the predicted conformations and the BSK8 apo crystal structure (PDB ID: 4I92). (D) Scatter plot between pLDDT values and RMSDs between the generated conformations and the BSK8 holo crystal structure (PDB ID: 4I94). (E) Structural alignment of the predicted BSK8 ensemble with the apo crystal structure (shown as black ribbons). (F) Close-up of the predicted A-loop conformations aligned with the crystallographic A-loop conformation (shown as black ribbons). (G) Close-up of the predicted P-loop conformations aligned with the crystallographic P-loop conformation (shown as black ribbons).
The BSK8 holo structure exhibited a similar overall shape compared to the apo form, but structural rearrangements caused by the nucleotide are located in the A-loop and P-loop. The AF2-predicted ensemble showed reduced mobility in the holo BSK8 complex, as the majority of the generated conformations closely match the crystal structure (Supporting Information, Figure S5A). The predicted A-loop conformations reproduced almost precisely the unusual inactive arrangement also observed in the holo BSK8 structure (Supporting Information, Figure S5A,B). Interestingly, in the AMP-PNP complex, the P-loop formed a unique “omega” shape and moved away from the phosphate groups (Supporting Information, Figure S5C). This arrangement is vastly different from that of the active ABL kinase, which features the kinked conformation pointing inward and occupying the cleft where the phosphate group is located. These unique P-loop and A-loop conformations are characteristic of the inactive form, which is precisely what makes BSK8 a constitutively inactive protein kinase. The AF2-predicted conformations showed an excellent alignment with the crystallographic conformation, reproducing the “omega-like” shape of the P-loop and featuring modest fluctuations around the structure (Supporting Information, Figure S5C).
We also compared the AF2 RASS predictions of the apo and holo BSK8 structures (Figure 12) with the corresponding predictions using the conventional AF2 implementation with shallow MSA and OmegaFold (Supporting Information, Figure S6). Instructively, in both approaches the RMSD distributions for the generated conformations shifted toward larger RMSD values from the experimental structures, showing peaks at an RMSD of ~4–5 Å (Supporting Information, Figure S6). This can be contrasted with the more accurate AF2 RASS predictions of the crystal structures, which produced RMSD values of ~1.0–2.0 Å (Figure 12B). Similarly to ABL kinase, both AF2 and OmegaFold approaches tend to assign an active A-loop conformation and predict active-like BSK8 conformations due to the presence of an overwhelmingly larger set of active conformations in the structural databases used for training of these models. These observations further strengthen our arguments suggesting that these approaches cannot accurately predict unique allosteric kinase states and structures with a unique architecture that combines elements of fully active and inactive templates, as can be seen in the structural modeling of the BSK8 functional forms. The analysis points to the need to incorporate specific patterns of coevolutionary signals together with allosteric-centric structural information on long-range couplings and attention-based learning of allosteric couplings across homologous folds, which may help to augment the predictive abilities of AF2-based methods.
CONCLUSIONS
In the current study, we performed a comprehensive analysis and comparison of several AF2 adaptations and OmegaFold and AlphaFlow approaches in predicting distinct allosteric states, conformational ensembles, and mutation-induced structural effects for a panel of state-switching allosteric ABL mutants. The comparison of various AI-inspired methods and implementations emphasized a key insight: despite the success of AF2, OmegaFold, and AlphaFlow in predicting single protein structures in their dominant ground states and exploring conformational flexibility around native structures, these methods still face significant limitations in predicting structurally distinct functional conformations of allosteric proteins, such as ABL kinase, and are generally unable to accurately predict structural effects of allosteric mutations that lead to population shifts and significant structural changes in the ABL kinase. Our results revealed that combining AF2 RASS with shallow MSA subsampling can partially address some of these challenges and enable us to (a) capture functionally relevant conformational diversity of the equilibrium kinase ensembles while avoiding the generation of misfolded or partially folded conformations; (b) detect functional allosteric states for ABL and BSK8 kinases including low-populated, hidden inactive conformations; and (c) qualitatively reproduce both the positive and negative experimentally observed effects of state-switching ABL triple mutants on the conformational equilibrium. The precise quantitative predictions of the relative populations of the active and various hidden inactive states in the ABL state-switching mutants remain problematic for all AI-based approaches due to the hidden nature of the inactive ABL forms on the conformational landscape and the lack of direct physical energetics in driving the predictions.
Nonetheless, we showed that the AF2 RASS approach correctly predicted structural effects of the G269E/M309L/T408Y mutant that can induce population changes and sample a significant fraction of the fully inactive I2 form. We also demonstrated that the G269E/M309L/T334I and M309L/L320I/T334I mutants that share a single activating T334I mutation can reverse the equilibrium and populate exclusively the active ABL form. We further validated and compared the proposed AF2 RASS adaptation with other methods in predicting the unique conformation of the BSK8 kinase, which is a constitutively inactive protein kinase and is characterized by an unprecedented structural architecture of the apo and holo inactive forms. Although AF2 RASS predictions can capture functionally relevant conformations, the generated ensembles cannot be directly compared with the thermodynamic equilibrium ensemble of conformations. Combining the proposed AF2 adaptation using randomized or targeted alanine sequence masking with existing AI-based platforms and tools that show promise in describing structures and conformational ensembles, alongside atomistic simulations of the predicted stable states,60 may enable a more realistic characterization of the protein landscapes and underlying molecular mechanisms. Our analysis argues for introducing new training and architectural paradigms in which coevolutionary constraints and structural memorization biases could be properly modulated within the transformer networks to enable exploration of the allosteric conformational landscapes and characterization of multiple functional states and relevant structural ensembles.
ACKNOWLEDGMENTS
G.V. acknowledges support from Schmid College of Science and Technology at Chapman University for providing computing resources at the Keck Center for Science and Engineering at Chapman University.
Funding
This research was supported by the National Institutes of Health under Award 1R01AI181600–01 and Subaward 6069-SC24–11 to G.V. and the National Institutes of Health under Award R15GM122013 to P.T.
Footnotes
The authors declare no competing financial interest.
Complete contact information is available at: https://pubs.acs.org/10.1021/acs.jpcb.4c04985
Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jpcb.4c04985.
(Figure S1) Statistical analysis of the AF2 experiments with shallow MSA subsampling for the ABL triple mutants; (Figure S2) structural alignment of the AF2-predicted ensembles using the shallow MSA subsampling approach; (Figure S3) structural alignment of the conformational ensembles for the kinase domain and DFG motif in the ABL mutants obtained using the AF2 RASS approach; (Figure S4) structural analysis and alignments of AlphaFlow predictions for state-switching triple ABL kinase mutants with the NMR structures; (Figure S5) analysis of AF2 predictions for the BSK8 kinase holo structure; (Figure S6) RMSD distribution analysis of AF2 shallow MSA and OmegaFold predicted ensembles for the apo and holo BSK8 structures (PDF)
Contributor Information
Nishank Raisinghani, Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California 92866, United States.
Mohammed Alshahrani, Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California 92866, United States.
Grace Gupta, Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California 92866, United States.
Hao Tian, Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75275, United States.
Sian Xiao, Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75275, United States.
Peng Tao, Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75275, United States.
Gennady Verkhivker, Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California 92866, United States; Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, California 92618, United States; Department of Pharmacology, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States.
Data Availability Statement
Data is fully contained within the article and Supporting Information material. Crystal structures were obtained from the Protein Data Bank (http://www.rcsb.org). The rendering of protein structures was done with the UCSF ChimeraX package (https://www.rbvi.ucsf.edu/chimerax/) and PyMOL (https://pymol.org/2/). The software tools used in this study are available at GitHub sites: https://github.com/deepmind/alphafold; https://github.com/sokrypton/ColabFold/; https://github.com/RSvan/SPEACH_AF; https://www.github.com/HWaymentSteele/AFCluster; https://github.com/bjing2016/alphaflow; https://github.com/HeliXonProtein/OmegaFold; https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/omegafold.ipynb . All the data obtained in this work, the software tools, and the in-house scripts are freely available at the ZENODO general-purpose open repository: https://zenodo.org/records/11204773
REFERENCES
- (1) Jumper J; Evans R; Pritzel A; Green T; Figurnov M; Ronneberger O; Tunyasuvunakool K; Bates R; Žídek A; Potapenko A; et al. Highly Accurate Protein Structure Prediction with AlphaFold. Nature 2021, 596, 583–589.
- (2) Tunyasuvunakool K; Adler J; Wu Z; Green T; Zielinski M; Žídek A; Bridgland A; Cowie A; Meyer C; Laydon A; et al. Highly Accurate Protein Structure Prediction for the Human Proteome. Nature 2021, 596, 590–596.
- (3) Bahdanau D; Cho K; Bengio Y Neural Machine Translation by Jointly Learning to Align and Translate. arXiv 2014.
- (4) Vaswani A; Shazeer N; Parmar N; Uszkoreit J; Jones L; Gomez A; Kaiser L; Polosukhin I Attention is all you need. Adv. Neural Inf. Process. Syst 2017, 30, 5998–6008.
- (5) Rives A; Meier J; Sercu T; Goyal S; Lin Z; Liu J; Guo D; Ott M; Zitnick CL; Ma J; et al. Biological Structure and Function Emerge from Scaling Unsupervised Learning to 250 Million Protein Sequences. Proc. Natl. Acad. Sci. U.S.A 2021, 118, No. e2016239118.
- (6) Lin Z; Akin H; Rao R; Hie B; Zhu Z; Lu W; Smetanin N; Verkuil R; Kabeli O; Shmueli Y; et al. Evolutionary-Scale Prediction of Atomic-Level Protein Structure with a Language Model. Science 2023, 379, 1123–1130.
- (7) Wu R; Ding F; Wang R; Shen R; Zhang X; Luo S; Su C; Wu Z; Xie Q; Berger B; et al. High-Resolution de Novo Structure Prediction from Primary Sequence. bioRxiv 2022.
- (8).Moussad B; Roche R; Bhattacharya D The Transformative Power of Transformers in Protein Structure Prediction. Proc. Natl. Acad. Sci. U. S. A 2023, 120, No. e2303499120. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (9).Wu KE; Yang KK; van den Berg R; Alamdari S; Zou JY; Lu AX; Amini AP Protein Structure Generation via Folding Diffusion. Nat. Commun 2024, 15, 1059. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (10).Baek M; DiMaio F; Anishchenko I; Dauparas J; Ovchinnikov S; Lee GR; Wang J; Cong Q; Kinch LN; Schaeffer RD; et al. Accurate Prediction of Protein Structures and Interactions Using a Three-Track Neural Network. Science 2021, 373, 871–876. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (11).Marchal I RoseTTAFold Expands to All-Atom for Biomolecular Prediction and Design. Nat. Biotechnol 2024, 42, 571.
- (12).Krishna R; Wang J; Ahern W; Sturmfels P; Venkatesh P; Kalvet I; Lee GR; Morey-Burrows FS; Anishchenko I; Humphreys IR; et al. Generalized Biomolecular Modeling and Design with RoseTTAFold All-Atom. Science 2024, 384, No. eadl2528.
- (13).Valdés-Tresanco MS; Valdés-Tresanco ME; Jiménez-Gutiérrez DE; Moreno E Structural Modeling of Nanobodies: A Benchmark of State-of-the-Art Artificial Intelligence Programs. Molecules 2023, 28, 3991.
- (14).Sala D; Engelberger F; Mchaourab HS; Meiler J Modeling Conformational States of Proteins with AlphaFold. Curr. Opin. Struct. Biol 2023, 81, No. 102645.
- (15).Saldaño T; Escobedo N; Marchetti J; Zea DJ; Mac Donagh J; Velez Rueda AJ; Gonik E; García Melani A; Novomisky Nechcoff J; Salas MN; et al. Impact of Protein Conformational Diversity on AlphaFold Predictions. Bioinformatics 2022, 38, 2742–2748.
- (16).Chakravarty D; Porter LL AlphaFold2 Fails to Predict Protein Fold Switching. Protein Sci. 2022, 31, No. e4353.
- (17).Versini R; Sritharan S; Aykac Fas B; Tubiana T; Aimeur SZ; Henri J; Erard M; Nüsse O; Andreani J; Baaden M; et al. A Perspective on the Prospective Use of AI in Protein Structure Prediction. J. Chem. Inf. Model 2024, 64, 26–41.
- (18).Del Alamo D; Sala D; Mchaourab HS; Meiler J Sampling Alternative Conformational States of Transporters and Receptors with AlphaFold2. Elife 2022, 11, No. e75751.
- (19).Stein RA; Mchaourab HS SPEACH_AF: Sampling Protein Ensembles and Conformational Heterogeneity with Alphafold2. PLoS Comput. Biol 2022, 18, No. e1010483.
- (20).Wayment-Steele HK; Ojoawo A; Otten R; Apitz JM; Pitsawong W; Hömberger M; Ovchinnikov S; Colwell L; Kern D Predicting Multiple Conformations via Sequence Clustering and AlphaFold2. Nature 2023, 625, 832–839.
- (21).Chakravarty D; Schafer JW; Chen EA; Thole JR; Porter LL AlphaFold2 Has More to Learn about Protein Energy Landscapes. bioRxiv 2023.
- (22).Sala D; Hildebrand PW; Meiler J Biasing AlphaFold2 to Predict GPCRs and Kinases with User-Defined Functional or Structural Properties. Front. Mol. Biosci 2023, 10, 1121962.
- (23).Chakravarty D; Schafer JW; Chen EA; Thole JF; Ronish LA; Lee M; Porter LL AlphaFold Predictions of Fold-Switched Conformations Are Driven by Structure Memorization. Nat. Commun 2024, 15, 7296.
- (24).Jing B; Berger B; Jaakkola T AlphaFold Meets Flow Matching for Generating Protein Ensembles. arXiv 2024.
- (25).Li S; Li M; Wang Y; He X; Zheng N; Zhang J; Heng P-A Improving AlphaFlow for Efficient Protein Ensembles Generation. arXiv 2024.
- (26).Taylor SS; Wu J; Bruystens JGH; Del Rio JC; Lu T-W; Kornev AP; Ten Eyck LF From Structure to the Dynamic Regulation of a Molecular Switch: A Journey over 3 Decades. J. Biol. Chem 2021, 296, No. 100746.
- (27).Johnson TK; Bochar DA; Vandecan NM; Furtado J; Agius MP; Phadke S; Soellner MB Synergy and Antagonism between Allosteric and Active-Site Inhibitors of Abl Tyrosine Kinase. Angew. Chem., Int. Ed. Engl 2021, 60, 20196–20199.
- (28).Meng Y; Gao C; Clawson DK; Atwell S; Russell M; Vieth M; Roux B Predicting the Conformational Variability of Abl Tyrosine Kinase Using Molecular Dynamics Simulations and Markov State Models. J. Chem. Theory Comput 2018, 14, 2721–2732.
- (29).Paul F; Thomas T; Roux B Diversity of Long-Lived Intermediates along the Binding Pathway of Imatinib to Abl Kinase Revealed by MD Simulations. J. Chem. Theory Comput 2020, 16, 7852–7865.
- (30).Paul F; Meng Y; Roux B Identification of Druggable Kinase Target Conformations Using Markov Model Metastable States Analysis of Apo-Abl. J. Chem. Theory Comput 2020, 16, 1896–1912.
- (31).Ayaz P; Lyczek A; Paung Y; Mingione VR; Iacob RE; de Waal PW; Engen JR; Seeliger MA; Shan Y; Shaw DE Structural Mechanism of a Drug-Binding Process Involving a Large Conformational Change of the Protein Target. Nat. Commun 2023, 14, 1885.
- (32).Stiller JB; Otten R; Häussinger D; Rieder PS; Theobald DL; Kern D Structure Determination of High-Energy States in a Dynamic Protein Ensemble. Nature 2022, 603, 528–535.
- (33).Saleh T; Rossi P; Kalodimos CG Atomic View of the Energy Landscape in the Allosteric Regulation of Abl Kinase. Nat. Struct. Mol. Biol 2017, 24, 893–901.
- (34).Xie T; Saleh T; Rossi P; Kalodimos CG Conformational States Dynamically Populated by a Kinase Determine Its Function. Science 2020, 370, No. eabc2754.
- (35).Krishnan K; Tian H; Tao P; Verkhivker GM Probing Conformational Landscapes and Mechanisms of Allosteric Communication in the Functional States of the ABL Kinase Domain Using Multiscale Simulations and Network-Based Mutational Profiling of Allosteric Residue Potentials. J. Chem. Phys 2022, 157, No. 245101.
- (36).Faezov B; Dunbrack RL Jr. AlphaFold2 Models of the Active Form of All 437 Catalytically Competent Human Protein Kinase Domains. bioRxiv 2023.
- (37).Herrington NB; Stein D; Li YC; Pandey G; Schlessinger A Exploring the Druggable Conformational Space of Protein Kinases Using AI-Generated Structures. bioRxiv 2023.
- (38).Monteiro da Silva G; Cui JY; Dalgarno DC; Lisi GP; Rubenstein BM High-Throughput Prediction of Protein Conformational Distributions with Subsampled AlphaFold2. Nat. Commun 2024, 15, 2464.
- (39).Yang Z; Zeng X; Zhao Y; Chen R AlphaFold2 and Its Applications in the Fields of Biology and Medicine. Signal Transduct Target Ther. 2023, 8, 115.
- (40).Buel GR; Walters KJ Can AlphaFold2 Predict the Impact of Missense Mutations on Structure? Nat. Struct Mol. Biol 2022, 29, 1–2.
- (41).Pak MA; Markhieva KA; Novikova MS; Petrov DS; Vorobyev IS; Maksimova ES; Kondrashov FA; Ivankov DN Using AlphaFold to Predict the Impact of Single Mutations on Protein Stability and Function. PLoS One 2023, 18, No. e0282689.
- (42).Stein RA; Mchaourab HS Rosetta Energy Analysis of AlphaFold2 Models: Point Mutations and Conformational Ensembles. bioRxiv 2024.
- (43).McBride JM; Polev K; Abdirasulov A; Reinharz V; Grzybowski BA; Tlusty T AlphaFold2 Can Predict Single-Mutation Effects. Phys. Rev. Lett 2023, 131, No. 218401.
- (44).McBride JM; Tlusty T AI-Predicted Protein Deformation Encodes Energy Landscape Perturbation. Phys. Rev. Lett 2024, 133, No. 098401.
- (45).Raisinghani N; Alshahrani M; Gupta G; Tian H; Xiao S; Tao P; Verkhivker GM Integration of a Randomized Sequence Scanning Approach in AlphaFold2 and Local Frustration Profiling of Conformational States Enable Interpretable Atomistic Characterization of Conformational Ensembles and Detection of Hidden Allosteric States in the ABL1 Protein Kinase. J. Chem. Theory Comput 2024, 20, 5317–5336.
- (46).Mirdita M; Schütze K; Moriwaki Y; Heo L; Ovchinnikov S; Steinegger M ColabFold: Making Protein Folding Accessible to All. Nat. Methods 2022, 19, 679–682.
- (47).Steinegger M; Söding J MMseqs2 Enables Sensitive Protein Sequence Searching for the Analysis of Massive Data Sets. Nat. Biotechnol 2017, 35, 1026–1028.
- (48).van Kempen M; Kim SS; Tumescheit C; Mirdita M; Lee J; Gilchrist CLM; Söding J; Steinegger M Fast and Accurate Protein Structure Search with Foldseek. Nat. Biotechnol 2023, 42, 243–246.
- (49).Zemla A LGA: A Method for Finding 3D Similarities in Protein Structures. Nucleic Acids Res. 2003, 31, 3370–3374.
- (50).Zhang Y TM-Align: A Protein Structure Alignment Algorithm Based on the TM-Score. Nucleic Acids Res. 2005, 33, 2302–2309.
- (51).Zhang Y; Skolnick J Scoring Function for Automated Assessment of Protein Structure Template Quality. Proteins 2004, 57, 702–710.
- (52).Xu J; Zhang Y How Significant Is a Protein Structure Similarity with TM-Score = 0.5? Bioinformatics 2010, 26, 889–895.
- (53).Roney JP; Ovchinnikov S State-of-the-Art Estimation of Protein Model Accuracy Using AlphaFold. Phys. Rev. Lett 2022, 129, No. 238101.
- (54).Bakan A; Meireles LM; Bahar I ProDy: Protein Dynamics Inferred from Theory and Experiments. Bioinformatics 2011, 27, 1575–1577.
- (55).Al-Masri C; Trozzi F; Lin S-H; Tran O; Sahni N; Patek M; Cichonska A; Ravikumar B; Rahman R Investigating the Conformational Landscape of AlphaFold2-Predicted Protein Kinase Structures. Bioinform Adv. 2023, 3, vbad129.
- (56).Modi V; Dunbrack RL Jr. Defining a New Nomenclature for the Structures of Active and Inactive Kinases. Proc. Natl. Acad. Sci. U. S. A 2019, 116, 6818–6827.
- (57).Raisinghani N; Alshahrani M; Gupta G; Verkhivker G Predicting Mutation-Induced Allosteric Changes in Structures and Conformational Ensembles of the ABL Kinase Using AlphaFold2 Adaptations with Alanine Sequence Scanning. Int. J. Mol. Sci 2024, 25, 10082.
- (58).Sreeramulu S; Mostizky Y; Sunitha S; Shani E; Nahum H; Salomon D; Hayun LB; Gruetter C; Rauh D; Ori N; et al. BSKs Are Partially Redundant Positive Regulators of Brassinosteroid Signaling in Arabidopsis. Plant J. 2013, 74, 905–919.
- (59).Grütter C; Sreeramulu S; Sessa G; Rauh D Structural Characterization of the RLCK Family Member BSK8: A Pseudokinase with an Unprecedented Architecture. J. Mol. Biol 2013, 425, 4455–4467.
- (60).Brown BP; Stein RA; Meiler J; Mchaourab HS Approximating Projections of Conformational Boltzmann Distributions with AlphaFold2 Predictions: Opportunities and Limitations. J. Chem. Theory Comput 2024, 20, 1434–1447.
Data Availability Statement
All data are fully contained within the article and the Supporting Information. Crystal structures were obtained from the Protein Data Bank (http://www.rcsb.org). Protein structures were rendered with the UCSF ChimeraX package (https://www.rbvi.ucsf.edu/chimerax/) and PyMOL (https://pymol.org/2/). The software tools used in this study are available on GitHub: https://github.com/deepmind/alphafold; https://github.com/sokrypton/ColabFold/; https://github.com/RSvan/SPEACH_AF; https://www.github.com/HWaymentSteele/AFCluster; https://github.com/bjing2016/alphaflow; https://github.com/HeliXonProtein/OmegaFold; https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/omegafold.ipynb. All data obtained in this work, the software tools, and the in-house scripts are freely available at the Zenodo general-purpose open repository: https://zenodo.org/records/11204773