Graphical abstract
Keywords: Abelson 1 kinase, Mutations, Drug resistance, Structure-guided machine learning, Graph-based signatures
Abstract
Kinases play crucial roles in cellular signalling and biological processes with their dysregulation associated with diseases, including cancers. Kinase inhibitors, most notably those targeting ABeLson 1 (ABL1) kinase in chronic myeloid leukemia, have had a significant impact on cancer survival, yet emergence of resistance mutations can reduce their effectiveness, leading to therapeutic failure. Limited effort, however, has been devoted to developing tools to accurately identify ABL1 resistance mutations, as well as providing insights into their molecular mechanisms. Here we investigated the structural basis of ABL1 mutations modulating binding affinity of eight FDA-approved drugs. We found mutations impair affinity of type I and type II inhibitors differently and used this insight to developed a novel web-based diagnostic tool, SUSPECT-ABL, to pre-emptively predict resistance profiles and binding free-energy changes (ΔΔG) of all possible ABL1 mutations against inhibitors with different binding modes. Resistance mutations in ABL1 were successfully identified, achieving a Matthew’s Correlation Coefficient of up to 0.73 and the resulting change in ligand binding affinity with a Pearson’s correlation of up to 0.77, with performances consistent across non-redundant blind tests. Through an in silico saturation mutagenesis, our tool has identified possibly emerging resistance mutations, which offers opportunities for in vivo experimental validation. We believe SUSPECT-ABL will be an important tool not just for improving precision medicine efforts, but for facilitating the development of next-generation inhibitors that are less prone to resistance. We have made our tool freely available at http://biosig.unimelb.edu.au/suspect_abl/.
1. Introduction
Kinases can modulate protein activities through phosphorylation, acting as an essential on/off switch in many cellular signalling pathways. In chronic myelogenous leukemia (CML), ABeLson 1 (ABL1) kinase is constitutively overactivated as a consequence of chromosomal translocation with the breakpoint cluster region, and the induced abnormal cellular environment triggers malignant cell growth, initiating cancers [1]. The approval of imatinib in 2001, a first generation tyrosine kinase inhibitor (TKI), significantly improved cancer patient prognosis compared to the standard interferon alpha and cytarabine combination, highlighting the importance of targeting kinases in cancer treatment [2]. At the cellular level, imatinib competes with ATP to bind to, and consequently inhibit ABL1 kinase [3], [4], [5]. However, with increased clinical use over time, the therapeutic relevance of imatinib decreased due to the accumulation of missense mutations within its target, ABL1 kinase, especially those residues participating in drug recognition and binding. To overcome imatinib resistance, efforts were directed towards the development of second- and third-generation TKIs, such as dasatinib, nilotinib, bosutinib, ponatinib, and axitinib.
While several resistance mutation hotspots in ABL1 kinase (at gatekeeper, P-loop, αC-helix, and A-loop regions) modulating the binding affinity of different drugs have been well characterised [6], the increased mutation rate in cancers leads to the introduction of variants whose effects on therapeutic efficacy have not been previously studied. Pre-emptively identifying kinase mutations leading to potential therapeutic failure for specific drugs could lead to better and more personalised patient treatment and management.
One challenge for tackling ABL1 drug resistance is that mutations at the same position may modulate the drug affinity to a different extent, depending not only on the type of mutant amino acids but also on the drugs being used. Although most approved kinase inhibitors are ATP-competitive, they can be sensitive to different changes of the residue environment brought by different mutations, as each of them can have distinctive interactions with the kinase based on their favourable binding poses: (1) type I inhibitors bind to the active kinase (DFG-in), mainly occupying the hinge region where the adenine ring of the ATP binds; (2) type II inhibitors bind to the inactive kinase (DFG-out), extending to a hydrophobic back pocket while maintaining interactions with the ATP binding site. Apart from altering the drug affinity directly through the local atomic changes, mutations can also impact protein stability as well as dynamics, which may trigger conformational changes and impact drug recognition and interactions [7]. Moreover, these unknown structural changes may even cause favorable interactions with the endogenous ATP rather than the drugs, which could lead to the loss of drug competency and off-target effects. The broad repertoire of mutation effects and binding modes makes the characterization of molecular mechanisms of the resistance profiles a challenging task.
Experimental methods studying the effect of mutations on therapeutic efficacy, such as cell-based mutagenesis screening and yeast-based flow-cytometry [8], are labour intensive. Computational methods, on the other hand, have been proven useful and cost-effective for characterising the consequence of mutations. Although several predictive models have been developed to identify potential ABL drug resistance mutations [9], [10], [11], their performances were not sufficient to aid clinical decision support nor are they freely available for use or evaluation. Furthermore, limited insights into the molecular mechanism of drug resistance have been inferred from current models. Thus, there is a demand to develop computational tools capable of accurately predicting and understanding kinase drug resistance mutations.
We have previously shown that a suite of structural- and sequence-based computational tools which characterise the molecular consequence of mutations on protein dynamics, flexibility, stability, as well as ligand binding affinity, can successfully predict anti-tuberculosis drug resistance [12], [13], [14], [15], [16]. Additionally, we have also shown that the concept of graph-based signatures could be used to model both protein and small molecule structures, capturing both physicochemical and geometric properties [17], [18], [19]. Incorporating these two approaches, here we developed a novel, web-based diagnostic tool, SUSPECT-ABL (StrUctural Susceptibility PrEdiCTion for ABeLson 1 kinase), to pre-emptively predict the binding free-energy changes (ΔΔG) and resistance profiles of all possible ABL mutations against eight FDA-approved drugs. The newly identified resistance mutations via in silico saturation mutagenesis offer opportunities for prioritization of in vivo experimental validation.
2. Results and discussion
The methodology of this project is summarised in Fig. 1 by the following four steps: (1) data and structural curation, which involved molecular docking of inhibitors in the absence of experimental co-crystallized structures; (2) feature engineering, which involved the generation and evaluation of a set of features capturing the structural, geometric and physicochemical properties of the protein and ligands; (3) machine learning, where structural insights obtained were used to train and test supervised machine learning algorithms to accurately predict single-point mutations in ABL leading to resistance against eight inhibitors and the corresponding changes in Gibbs free energy of binding (ΔΔG); (4) web server development, where computational saturation mutagenesis results for eight FDA-approved drugs and the 3-dimensional visualization of ABL-ligand interactions were made freely available through a web server.
2.1. Structural information on ABL-ligand complexes
We curated 144 binding affinity effects (given as ΔΔG values) for 31 ABL mutations against eight FDA-approved drugs [9]. The corresponding ABL-ligand co-crystallized structures were curated from the RCSB Protein Data Bank [20] as available or generated by virtual docking [21], [22] (Table S1). The 31 mutations are located throughout the kinase domain, with some present at the ATP binding site, and multiple mutations clustering at the same position (Fig. 2A). Among them, six mutations were observed to cause resistance (as denoted by an experimental ΔΔG ≥ 1.36 Kcal/mol, leading to more than 10-fold affinity loss) to at least one drug. Although these mutations were all located within the ATP binding pocket, we found the mutated residues are not essential for the binding of ATP, as shown by a lack of disruption to ATP-kinase interactions [23]. This may suggest that to cause drug resistance, the mutations should not impair the binding ability of ATP drastically.
Overall, the Abl-ligand complexes employed in this study are structurally diverse. This enables thorough analysis and predictions of resistance mutations throughout the kinase domain against different ligands with respective binding modes. The eight drugs include both type I (Axitinib, Bosutinib, Dasatinib, Erlotinib, and Gefitinib) and type II (Imatinib, Nilotinib, and Ponatinib) inhibitors. Superimposing the structures, we confirmed that the major conformational difference between the two binding modes is at the activation loop (Fig. 2B).
In the absence of Erlotinib- and Gefitinib-bound crystal structures, we docked these two inhibitors in the Bosutinib-bound structure (PDB ID: 3UE4) due to the shared type I binding mode and structural similarity among the three ligands. Similar to the quinoline group of Bosutinib, the recurring quinazoline group of Erlotinib and Gefitinib was thought to be the main determinant for binding, as it mimics the adenine ring of ATP. To confirm that the obtained poses of Erlotinib and Gefitinib were favourable, we calculated the intermolecular interactions using Arpeggio [23]. We found the quinazoline groups form hydrogen bonds with the hinge residue Met318, and Carbon-π interactions with Ala269 and Leu370, which is a shared characteristic with Bosutinib (Fig. S1). Moreover, the benzene rings connected to the quinazolines form Carbon-π interactions with the ABL1, which coincides with their interactions with epidermal growth factor receptor (EGFR) – the protein for which the Erlotinib and Gefitinib were originally developed as type I inhibitors.
2.2. Structural, biophysical, and evolutionary consequences of ABL mutations against different inhibitors
Although all resistance mutations in the dataset are located near the ATP site, the same mutation can modulate the binding affinity of type I and II inhibitors to a different extent due to their distinctive binding modes. We found some significant differences between resistant (ΔΔG ≥ 1.36 Kcal/mol, n = 19) and susceptible (ΔΔG <1.36 Kcal/mol, n = 125) mutations specific to each type of inhibitor.
To better capture the cellular environments, we calculated the effects of mutations on both inhibitors and the endogenous ligand (ATP). Interestingly, resistance mutations against type I inhibitors were specifically found to be more associated with ATP modulation due to similar binding sites, even though the mutated residues do not directly interact with ATP. These mutations are more likely to impair an Amide-Ring interaction between residues, but fewer Carbon-π interactions were observed between the wild-type residues and ATP. Moreover, they have lower SIFT scores [24], which means the amino acid substitutions are more intolerant to the accumulation of mutations, resulting in detrimental effects on protein function (Fig. 3A, ‘Mann-Whitney’ test p-values <0.005). These findings suggest that resistance mutations against type I inhibitors may occur within the hinge region where the sequence is evolutionary conserved, but they may reduce the affinity of drugs more than the ATP. On the other hand, the affinity of type II inhibitors has been observed to be less sensitive to these mutations, as they extend to the extra pockets and reduce contacts with the hinge region (Fig. S2).
Resistance mutations against type II inhibitors, on the other hand, occur frequently in the phosphate-binding loop (P-loop). They usually involve more van der Waals clashes in the wild-type environment, destabilize the kinase to a lower extent, and have lower residual solvent accessibility (Fig. 3B, ‘Mann-Whitney’ test p-values <0.05). This can be caused by impairing the distinctive “kinked” conformation of the P-loop in the inactive ABL (Fig. S2), which is maintained by van der Waals clashes between several essential residues, and therefore decreasing the surface complementarity with type II inhibitors, favoring the ATP binding [25].
While drug binding affinity can be modulated by different structural, biophysical and evolutionary changes depending on the binding modes, we found that the gatekeeper mutation T315I can cause resistance to both types I and II inhibitors, with the exception of Axitinib (type I) and Ponatinib (type II). It has been shown that T315I causes resistance subject to a steric hindrance by a substantial conformational change from the inactive ABL (DFG-out) to the active form (DFG-in) [26], [27]. However, axitinib can still inhibit the T315I DFG-in mutant structure potently due to its distinct binding pattern – without close contacts with the activation loop [28]; and Ponatinib can effectively inhibit the conformational change of the activation loop, locking the T315I mutant complex at the DFG-out status without affecting ligand interactions [29], [30]. The overall higher efficacy of these two inhibitors across all the mutations can be associated with a common characteristic – having the lowest topological polar surface area (TPSA) within the respective binding modes, which makes them less sensitive to polarity change brought by mutations. Therefore, to better capture the impact of mutations on drug binding affinity, it is important to describe the residue environment using both structural and ligand information.
2.3. Quantifying the changes in binding affinity upon single point ABL mutations
Based on the patterns observed through structural analysis of the ABL-ligand complexes and the molecular properties of ligands, we next sought to quantitatively predict the changes in ΔΔG upon missense mutation in response to each of the drugs using supervised machine learning. Different machine learning algorithms from python sklearn library [31] were used to train and optimize regression models through bottom-up greedy feature selection on 204 ΔΔG values, which included 102 ΔΔGs obtained from hypothetical reverse mutations, generated to better balance the distribution of values, which are naturally biased towards mutations decreasing binding affinity. Of these, the Extra Tree regressor was chosen based on better and consistent performance between cross-validation and external blind test set, measured through Pearson’s correlation. The final model, SUSPECT-ABL was made up of 10 features, which broadly describe the protein local environmental changes upon mutation, ligand properties and sequence-derived evolutionary properties (Table S3).
On a low-redundancy validation scheme, SUSPECT-ABL was able to achieve Pearson's, Spearman's and Kendall's correlations of 0.77, 0.77 and 0.59 respectively (RMSE = 0.68 Kcal/mol) on the training set, under leave-one-position-out cross-validation; and 0.74, 0.57 and 0.42 (RMSE = 0.42 Kcal/mol) on the blind test (Fig. 4), significantly outperforming other models which use data-driven or physics-based computational approaches (Table 1). After removing the 10% outliers, Pearson’s correlations increased to 0.84 on cross-validation and 0.83 on the blind test. These outlier mutations were all located in beta strands. Among them, the hypothetical reverse mutations lay mostly below the best-fit line (underestimated), whereas the predictions for their respective original forward mutations lay above (overestimated). This can be caused by compromising several strong resistance mutations in beta strands. Unlikely, the 10% outliers in the blind test set are all located at position 317 in a loop, which is thought to be a result of its overrepresentation (64%) in the blind test set. The performance when including only the forward mutations, and using 10-fold cross-validation are shown in Table S2 for comparison purposes.
Table 1.
Name | Method | Pearson | Spearman | Kendall | RMSE (Kcal/mol) |
---|---|---|---|---|---|
SUSPECT-ABL | Machine Learning | 0.74 | 0.57 | 0.42 | 0.40 |
mCSM-lig [49] | Machine Learning | 0.43 | 0.33 | 0.24 | 0.75 |
Aldeghi et al.[11] | Machine Learning | 0.06 | 0.21 | 0.15 | 0.64 |
Hauser et al.[9] | Molecular Dynamics | 0.65 | 0.38 | 0.30 | 0.69 |
A99 [11] | Molecular Dynamics | 0.40 | 0.26 | 0.17 | 0.68 |
A99L | Molecular Dynamics | 0.60 | 0.37 | 0.26 | 0.55 |
A99DC | Molecular Dynamics | 0.58 | 0.45 | 0.31 | 0.55 |
A14 | Molecular Dynamics | 0.37 | 0.37 | 0.26 | 0.70 |
R15[11] | Rosetta | 0.72 | 0.50 | 0.38 | 0.49 |
R16 | Rosetta | 0.47 | 0.29 | 0.20 | 0.70 |
Among the 10 selected features (Table S3), 7 are graph-based signatures that capture the geometric properties of atoms within the wild-type local residue environment, describing the 3-dimensional inter- and intra-atomic arrangements in both ABL-drug complexes and the ABL-ATP complex. Additionally, there are two ligand features, positively ionizable atom counts and the number of rotatable bonds, which potentially distinguish type I and II inhibitors as well as the higher effectiveness of axitinib and ponatinib in general. Lastly, a sequence-based feature that relies on the optimal substitution matrix, incorporating both multiple sequence alignment and the evolutionary information in the context of secondary structure, KOSJ950100_SST (aaindex[32], [33]), could potentially capture the impact of mutations on protein structure.
Based on such diverse features ranging from the geometric properties of ABL-ligand structures, ligand properties, and the structural impact of mutations, we further investigated whether SUSPECT-ABL shows similar performance on each of the binding modes. By expanding the prediction results, we found that although the Pearson’s correlations are inconsistent across the two binding modes on training and blind test sets, this could be caused by limited sample size and high similarity between samples in the blind test set (Fig. S3). On the other hand, the model achieved reasonable RMSE for both types I (0.74 Kcal/mol on cross-validation, 0.46 Kcal/mol on blind test) and II (0.61 Kcal/mol on cross-validation, 0.34 Kcal/mol on blind test) inhibitors. Therefore, we still expect that SUSPECT-ABL could generalize well on novel data for both binding modes.
2.4. Classifying resistant and susceptible ABL mutations
While the binding free-energy changes can give us insights into how strongly the mutations affect drug affinity, for clinical applications, it can also be valuable to directly predict whether a given mutation could lead to drug resistance. Rather than transforming the predictions from our regressor to resistant (ΔΔG ≥ 1.36 Kcal/mol) and susceptible (ΔΔG <1.36 Kcal/mol) mutations using the defined 10-fold affinity loss threshold, we decided to train a separate classifier to improve the model robustness. To keep consistency, the dataset was split into training and blind test in the same way. Since only a few mutations increase the drug binding affinity (ΔΔG <0), the inclusion of hypothetical reverse mutations does not mitigate the data imbalance (19 resistant and 125 susceptible mutations). With such limited samples of resistance mutations, however, SUSPECT-ABL still achieved Matthew’s correlation coefficient (MCC) of 0.73 (AUC = 0.84, Fig. 5) on the training set under stratified leave-one-position-out cross-validation, and 0.63 (AUC = 0.89) on the blind test, using the Random Forest classification algorithm. The performance when including the hypothetical reverse mutations, and using 10-fold cross-validation are shown in Table S4 for comparison. Notably, the nine selected features in the classifier overlap significantly with the key features distinguishing resistant and susceptible mutations detected previously in Fig. 3 (TPSA, Δ Amide-Ring interaction, and DynaMut2[34], [35]), which further highlights their importance (Table S5).
Compared to several other models, the state-of-the-art molecular dynamics method by Hauser et al. [9] and the Rosetta method R15 achieved higher MCCs on the blind test set (0.77 and 0.76, respectively), but their significantly lower MCCs on our training set (0.30 and 0.43, respectively) indicates that these two methods can be highly sensitive to different mutations (Table 2). By analysing the false negatives (resistance mutations which are predicted as susceptible), we found SUSPECT-ABL, Hauser et al. [9] and R15 were not able to identify the mutation E255V, which is resistant to imatinib and nilotinib, as well as the mutation E255K being resistant to only imatinib.
Table 2.
Name | Method | MCC | F1 | BACC |
---|---|---|---|---|
SUSPECT-ABL | Machine Learning | 0.63 | 0.67 | 0.79 |
Hauser et al.[9] | Molecular Dynamics | 0.77(0.30 on our training set) | 0.80 | 0.89 |
A99/A99L/A99DC [11] | Molecular Dynamics | 0.42 | 0.33 | 0.6 |
A14 | Molecular Dynamics | −0.06 | NaN | 0.49 |
R15 | Rosetta | 0.76(0.43 on our training set) | 0.77 | 0.96 |
R16 | Rosetta | 0.55 | 0.60 | 0.77 |
The molecular interactions calculated by Arpeggio [23] revealed that E255 has no direct contact with the ligands, instead, its carboxylate group in the side chain forms an electrostatic triad with K247 and Y257, stabilizing the P-loop, which was proven to be required for binding of Type II inhibitors [25] (Fig. S4). Disruption of the triad interactions, however, can reduce the surface complementarity with drugs, and cause distributed allosteric effects which destabilize the activation loop. This rare and indirect contribution to drug resistance increases the challenge for detection. Interestingly, after we changed the resistance threshold from 10-fold affinity loss to 5-fold (ΔΔG = 0.95 Kcal/mol), the model was able to detect five out of six resistance cases of E255 mutations, while still predicting the susceptible cases correctly. Moreover, the overall performance of the 5-fold affinity loss classifier (MCCs of 0.75 on training, 0.67 on blind test) improved compared to the 10-fold (Table S6), which can be caused by the increased sample size of resistance cases (from 19 to 27). Therefore, the indirect contributions of resistance can still be detected while relaxing the classification threshold.
2.5. Deploying the model for saturation mutagenesis
We performed in silico saturation mutagenesis within the kinase domain to pre-emptively identify ABL mutations leading to therapeutic failures. Although the resistance mutations (more than 10-fold affinity loss) in the dataset were clustered within the ATP binding pocket, several studies [25], [36] have shown that distant mutations may also impair drug binding, despite many of them occurring less frequently. To test whether our classification model could generalise to those unseen mutations, we thoroughly analysed the newly identified resistance mutations that occur at different positions with those in the dataset created by Hauser et al. [9], and found that many of them coincide with clinically observed or in vitro screened resistance mutations from various sources [25], [36]. These include mutations at residues close to the ATP site, which have no direct contact with ATP but shields the corresponding drugs from solvent (V268, A269, V270, and G321); as well as mutations at allosteric sites to ATP, which destabilize the ABL-drug complex by disrupting the gatekeeper residue or the activation loop (F283, M290, and F382). Thus the resistance mutations predicted by SUSPECT-ABL through saturation mutagenesis are not restricted to the residues within the ATP pocket which directly interact with the drugs, but extend to more complicated resistance mechanisms by including biophysical features in the model.
2.6. SUSPECT-ABL web server
We have implemented SUSPECT-ABL as a user-friendly and freely available web server at http://biosig.unimelb.edu.au/suspect_abl/, which is a database for all possible ABL variants within the kinase domain (from residue number 242 to 493). The web server allows the predictions of phenotypes (resistant or susceptible) and the changes in ΔΔG upon missense mutation, as well as the visualization of molecular interactions within the wild-type and mutant residue environment. Users can select one or more FDA-approved tyrosine kinase inhibitors, specify a single missense mutation in the format T315I (where threonine is the wild-type residue, 315 is the residue position, isoleucine is the mutant residue), or upload a list of missense mutations in the same format. A step-by-step help page with illustrative figures can be found at http://biosig.unimelb.edu.au/suspect_abl/help.
Fig. 6 shows a snapshot of the output page for the single mutation T315I while selecting axitinib and imatinib. The web server displays the results in a table, where rows are inhibitors, columns include the predictions of phenotypes, ΔΔGs, information about the wild-type environment (residual solvent accessibility, secondary structure, dihedral angles, residual depth [37], and distance to ligand) and several other predicted parameters (DynaMut2 [34], [35], mCSM [17], MTR3D [38], MTRv2 [39], [40], and MTRX scores). Below the table, there is information about the mutation, its conservation scores predicted by PROVEAN [41] and SNAP2 [42], as well as the resulting pharmacophore changes. Additionally, there is a downloadable 3D interactive viewer built using NGL [43], which allows users to analyse the non-covalent intermolecular interactions for the residue specified in the input calculated using Arpeggio [23], for both wild-type and mutant structures across different ABL-inhibitor complexes. The results for the mutation list input are summarized in a downloadable table, where rows are drugs and columns are mutations, which allows users to compare the predicted phenotypes and ΔΔGs for different drugs across input mutations, and access the corresponding result pages for single mutations through links. There is also a 3D viewer at the bottom of the page in which the residues in the input mutation list are mapped to the ABL-ligand complexes, and colored according to the predicted phenotype (Fig. S5).
3. Conclusions
Here we present SUSPECT-ABL, a web server that integrates our well-established graph-based signatures concept with dynamics properties, conservation scores and small-molecule properties to accurately predict how missense mutations in ABL1 kinase modulate the binding affinity of eight FDA-approved drugs and thereby lead to resistance. Our method has shown to be robust when evaluated using different cross-validation methods, and outperformed the existing tools on a non-redundant blind test set. Since the ATP sites are highly conserved across different kinases, this approach can also be potentially applied to identify resistance mutations in other kinases against any given ATP-competitive kinase inhibitor, using either the experimental kinase-ligand structures or docked structures. We anticipate expanding this tool to predict the impact of kinome-wide missense mutations on modulating the inhibition ability of a large library of small molecule kinase inhibitors. Finally, due to its consideration of ABL1-inhibiting drugs with different clinical indications, our tool, SUSPECT-ABL offers opportunities to both clinical and research fields. This is particularly due to its applicability in drug repurposing, and hit prioritization according to resistance potential detectable through large-scale comparisons across mutations followed by in vivo experimental validation. In this way, our tool could reduce cost and time commitment in the pipeline of improving precision medicine, and eventually providing patients with tailored chemotherapies.
4. Methods
4.1. Dataset
The binding affinity effects of 144 mutations (ΔΔG, given in Kcal/mol) were obtained from Hauser et al. [9], including 19 resistant (ΔΔG ≥ 1.36 Kcal/mol) and 125 susceptible (ΔΔG <1.36 Kcal/mol) mutations defined by the 10-fold affinity loss cut-off. Within this dataset, mutations were phenotyped according to their response to eight different TKIs: imatinib, dasatinib, nilotinib, bosutinib, ponatinib, erlotinib, gefitinib and axitinib. Of these, all drugs have shown strong inhibition to ABL1 kinase in vitro [28], [44], while imatinib, dasatinib, nilotinib, bosutinib and ponatinib are indicated clinically for use in CML. Further to that, given the limited sample size of the original dataset, we introduced hypothetical reverse mutations in the training set as an over-sampling strategy to better balance the range of mutation effects and increase model robustness for the regression task. These reverse mutations were computed based on the premise that the binding free energy change of a mutation from the wild-type ABL to its mutant (ΔΔGWT→MT) is equivalent to the opposite change in binding free energy of the hypothetical reverse mutation (-ΔΔGMT→WT) [35], [45].
4.2. ABL-ligand structures
The six available ABL-ligand co-crystallized structures were curated from the RCSB PDB [20] (Table S1), and preprocessed in Maestro to fill in missing residues and atoms. Of these, the imatinib-bound crystallized structure (PDB: 1OPJ), represented the mouse homolog, and was used as a template for homology modelling of the human ABL1 kinase. Additionally, when processing the ATP-bound structure (PDB: 2G1T), we removed the peptide conjugate present with ATP, as this was required for crystallization, but not for ATP binding. Finally, as the crystallographic structures of ABL1 kinase bound to erlotinib and gefitinib were not available, these drugs were docked based on the bound coordinates of co-crystallized Bosutinib (PDB: 3UE4), due to similarities in chemical moieties as calculated using ChemMine Tools, and binding modalities (type I inhibitors). During docking, the methods EasyVS [22] and Glide from the Maestro package [21] were used for comparison purposes. An initial redocking of Bosutinib within its crystal structure was used as a control to assess the potential utility of each tool within the target ABL1 kinase, and yielded comparable poses to the co-crystallized ligand. Comparisons between all binding poses were carried out using Pymol, where the final chosen poses were generated using Maestro, and retained comparable contacts between the drug and receptor, calculated using Arpeggio.
4.3. Feature generation
The structural consequences of all ABL mutations were captured by considering a total of 1,013 features, which were broadly divided into descriptors of (1) the molecular properties and pharmacophore modelling of different drugs; (2) the physicochemical properties and local flexibility of ABL around the mutation sites; (3) the interatomic interactions between ligands and the kinase; and (4) the distance patterns between pharmacophore pairs within the local residue environment using the concept of graph-based signatures (Fig. 1B).
The first feature subset described the different ligands bound to ABL1. It includes molecular properties (logP, number of rings, rotatable bonds, surface area and others) calculated by the Python RDKit toolkit based on SMILES, and the atomic pharmacophore frequency counts [18], [19], [46], [47], [48]. The distances from ligands to the mutation sites in Angstrom (Å) and ΔΔG predicted by mCSM-lig [49] were also included.
The second feature subset considered the unbound ABL kinase, capturing the impacts of mutations on physicochemical properties at the binding site (aaindex [32] and changes in pharmacophore count[17]), protein stability changes to account for the local conformational and flexibility changes (predicted by Dynamut [34], Dynamut2 [35], mCSM [17], DUET [50], and SDM[45]), and functional changes predicted by SIFT [24]. Furthermore, other features considering the local residue environment in wild-type structures included residue depth [37], residue solvent accessibility, deformation and fluctuation energies [51].
The third feature subset included the local interatomic contacts of ligand-kinase complexes calculated by Arpeggio, both in the wild-type and the differences between the wild-type and the mutant. These features were calculated for each bound ligand, and compared to endogenous ligand ATP.
The final subset is the graph-based signatures [17], which capture the geometric properties within the local residue environment under different distance thresholds. The atoms with pharmacophore labels (Hydrophobic, Positively charged, Negatively charged, Hydrogen Acceptor, Hydrogen Donor, Aromatic and Neutral) are modelled as nodes, and atomic interactions are modelled as edges. From this graph representation, distance patterns between pharmacophore pairs around the mutation site are captured as a cumulative distribution.
4.4. Machine learning and performance evaluation
To ensure low-redundancy, the dataset was split in a way that mutations at the same site occur exclusively in the training (102 ΔΔG values for 19 mutations across 13 sites) or in a non-redundant blind test set (42 ΔΔG values for 12 mutations across 6 sites). We trained and evaluated the models using different controls and comparisons for both the regressor and the classifier: with and without reverse mutations; 10-fold versus leave-one-position-out cross-validations within the training set; as well as different machine learning algorithms from the python sklearn library [31] (random forest, extra trees, multilayer perceptrons, and support vector machines). The ones with the best performance were selected as our final models.
Moreover, a bottom-up greedy feature selection method was used to reduce the noise, according to Pearson's correlation (r) for the regressor, and Matthew’s correlation coefficient (MCC) for the classifiers. The final models were chosen based on consistent performance between train and test. The model performances were further evaluated by different metrics: Kendall's and Spearman's correlation coefficients, and root mean squared error (RMSE) for regression; F1 score, balanced accuracy (BACC) and Area Under receiver operating characteristic Curve (AUC) for classification.
4.5. Web server development
The SUSPECT-ABL server front-end was built using materialize CSS framework version 1.0.0, while the backend was built in Python via the Flask framework (version 0.12.2). It is hosted on a Linux server running Apache.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Funding
S.P. was funded by a Melbourne Research Scholarship. D.B.A. and D.E.V.P. were funded by a Newton Fund RCUK-CONFAP Grant awarded by The Medical Research Council (MR/M026302/1). D.B.A. was supported by the Jack Brockhoff Foundation (No. JBF 4186, 2016), Wellcome Trust (200814/Z/16/Z) and an Investigator Grant from the National Health and Medical Research Council (NHMRC) of Australia (No. GNT1174405). Supported in part by the Victorian Government’s Operational Infrastructure Support Program. For the purpose of open access, the author has applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission.
Footnotes
Supplementary data to this article can be found online at https://doi.org/10.1016/j.csbj.2021.09.016.
Contributor Information
Thanh-Binh Nguyen, Email: thanhbinh.nguyen@unimelb.edu.au.
Douglas E.V. Pires, Email: douglas.pires@unimelb.edu.au.
David B. Ascher, Email: david.ascher@unimelb.edu.au.
Appendix A. Supplementary data
The following are the Supplementary data to this article:
References
- 1.Quintas-Cardama A., Cortes J. Kinase inhibitors in chronic myelogenous leukemia. Clin Adv Hematol Oncol. 2006;4(5):365. [PubMed] [Google Scholar]
- 2.O'Brien S.G., Guilhot F., Larson R.A., Gathmann I., Baccarani M., Cervantes F. Imatinib compared with interferon and low-dose cytarabine for newly diagnosed chronic-phase chronic myeloid leukemia. N Engl J Med. 2003;348(11):994. doi: 10.1056/NEJMoa022457. [DOI] [PubMed] [Google Scholar]
- 3.Cuellar S., Vozniak M., Rhodes J., Forcello N., Olszta D. BCR-ABL1 tyrosine kinase inhibitors for the treatment of chronic myeloid leukemia. J Oncol Pharm Pract. 2018;24(6):433. doi: 10.1177/1078155217710553. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Elnair R., Galal A. Finding the right BCR-ABL1 tyrosine kinase inhibitor: a case report of successful treatment of a patient with chronic myeloid leukemia and a V299L mutation using nilotinib. BMC Cancer. 2018;18(1):1097. doi: 10.1186/s12885-018-5004-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Rossari F., Minutolo F., Orciuolo E. Past, present, and future of Bcr-Abl inhibitors: from chemical development to clinical efficacy. Journal of Hematology & Oncology. 2018;11(1):84. doi: 10.1186/s13045-018-0624-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Kim P., Li H., Wang J., Zhao Z. Landscape of drug-resistance mutations in kinase regulatory hotspots. Brief Bioinform. 2021;22(3) doi: 10.1093/bib/bbaa108. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Wang D.D., Ou-Yang L., Xie H., Zhu M., Yan H. Predicting the impacts of mutations on protein-ligand binding affinity based on molecular dynamics simulations and machine learning methods. Comput Struct Biotechnol J. 2020;18:439. doi: 10.1016/j.csbj.2020.02.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Taft J.M., Georgeon S., Allen C., Reckel S., DeSautelle J., Hantschel O. Rapid Screen for Tyrosine Kinase Inhibitor Resistance Mutations and Substrate Specificity. ACS Chem Biol. 2019;14(9):1888. doi: 10.1021/acschembio.9b00283. [DOI] [PubMed] [Google Scholar]
- 9.Hauser K., Negron C., Albanese S.K., Ray S., Steinbrecher T., Abel R. Predicting resistance of clinical Abl mutations to targeted kinase inhibitors using alchemical free-energy calculations. Commun Biol. 2018;1:70. doi: 10.1038/s42003-018-0075-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Liu J., Pei J., Lai L. A combined computational and experimental strategy identifies mutations conferring resistance to drugs targeting the BCR-ABL fusion protein. Commun Biol. 2020;3(1):18. doi: 10.1038/s42003-019-0743-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Aldeghi M., Gapsys V., de Groot B.L. Predicting Kinase Inhibitor Resistance: Physics-Based and Data-Driven Approaches. ACS Cent Sci. 2019;5(8):1468. doi: 10.1021/acscentsci.9b00590. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Karmakar M., Globan M., Fyfe J.A.M., Stinear T.P., Johnson P.D.R., Holmes N.E. Analysis of a Novel pncA Mutation for Susceptibility to Pyrazinamide Therapy. Am J Respir Crit Care Med. 2018;198(4):541. doi: 10.1164/rccm.201712-2572LE. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Karmakar M., Rodrigues C.H.M., Holt K.E., Dunstan S.J., Denholm J., Ascher D.B. Empirical ways to identify novel Bedaquiline resistance mutations in AtpE. PLoS One. 2019;14(5) doi: 10.1371/journal.pone.0217169. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Karmakar M., Rodrigues C.H.M., Horan K., Denholm J.T., Ascher D.B. Structure guided prediction of Pyrazinamide resistance mutations in pncA. Sci Rep. 2020;10(1):1875. doi: 10.1038/s41598-020-58635-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Portelli S., Myung Y., Furnham N., Vedithi S.C., Pires D.E.V., Ascher D.B. Prediction of rifampicin resistance beyond the RRDR using structure-based machine learning approaches. Sci Rep. 2020;10(1):18120. doi: 10.1038/s41598-020-74648-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Portelli S., Phelan J.E., Ascher D.B., Clark T.G., Furnham N. Understanding molecular consequences of putative drug resistant mutations in Mycobacterium tuberculosis. Sci Rep. 2018;8(1):15356. doi: 10.1038/s41598-018-33370-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Pires D.E., Ascher D.B., Blundell T.L. mCSM: predicting the effects of mutations in proteins using graph-based signatures. Bioinformatics. 2014;30(3):335. doi: 10.1093/bioinformatics/btt691. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Pires D.E., Blundell T.L., Ascher D.B. pkCSM: Predicting Small-Molecule Pharmacokinetic and Toxicity Properties Using Graph-Based Signatures. J Med Chem. 2015;58(9):4066. doi: 10.1021/acs.jmedchem.5b00104. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Pires D.E.V., Ascher D.B. mycoCSM: Using Graph-Based Signatures to Identify Safe Potent Hits against Mycobacteria. J Chem Inf Model. 2020;60(7):3450. doi: 10.1021/acs.jcim.0c00362. [DOI] [PubMed] [Google Scholar]
- 20.Berman H.M., Westbrook J., Feng Z., Gilliland G., Bhat T.N., Weissig H. The Protein Data Bank. Nucleic Acids Res. 2000;28(1):235. doi: 10.1093/nar/28.1.235. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Friesner R.A., Banks J.L., Murphy R.B., Halgren T.A., Klicic J.J., Mainz D.T. Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J Med Chem. 2004;47(7):1739. doi: 10.1021/jm0306430. [DOI] [PubMed] [Google Scholar]
- 22.Pires D.E.V., Veloso W.N.P., Myung Y., Rodrigues C.H.M., Silk M., Rezende P.M. EasyVS: a user-friendly web-based tool for molecule library selection and structure-based virtual screening. Bioinformatics. 2020;36(14):4200. doi: 10.1093/bioinformatics/btaa480. [DOI] [PubMed] [Google Scholar]
- 23.Jubb H.C., Higueruelo A.P., Ochoa-Montano B., Pitt W.R., Ascher D.B., Blundell T.L. Arpeggio: A Web Server for Calculating and Visualising Interatomic Interactions in Protein Structures. J Mol Biol. 2017;429(3):365. doi: 10.1016/j.jmb.2016.12.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Sim, N.-L. Kumar, P. Hu, J. Henikoff, S. Schneider, G. Ng, P. C. SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 2012. 40 (Web Server issue). W452. [DOI] [PMC free article] [PubMed]
- 25.Barouch-Bentov R., Sauer K. Mechanisms of drug resistance in kinases. Expert Opin Investig Drugs. 2011;20(2):153. doi: 10.1517/13543784.2011.546344. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Rosenzweig S.A. Acquired Resistance to Drugs Targeting Tyrosine Kinases. Adv Cancer Res. 2018;138:71. doi: 10.1016/bs.acr.2018.02.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Reddy E.P., Aggarwal A.K. The ins and outs of bcr-abl inhibition. Genes Cancer. 2012;3(5–6):447. doi: 10.1177/1947601912462126. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Pemovska T., Johnson E., Kontro M., Repasky G.A., Chen J., Wells P. Axitinib effectively inhibits BCR-ABL1(T315I) with a distinct binding conformation. Nature. 2015;519(7541):102. doi: 10.1038/nature14119. [DOI] [PubMed] [Google Scholar]
- 29.O'Hare T., Shakespeare W.C., Zhu X., Eide C.A., Rivera V.M., Wang F. AP24534, a pan-BCR-ABL inhibitor for chronic myeloid leukemia, potently inhibits the T315I mutant and overcomes mutation-based resistance. Cancer Cell. 2009;16(5):401. doi: 10.1016/j.ccr.2009.09.028. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Zhou T., Commodore L., Huang W.S., Wang Y., Thomas M., Keats J. Structural mechanism of the Pan-BCR-ABL inhibitor ponatinib (AP24534): lessons for overcoming kinase inhibitor resistance. Chem Biol Drug Des. 2011;77(1):1. doi: 10.1111/j.1747-0285.2010.01054.x. [DOI] [PubMed] [Google Scholar]
- 31.Pedregosa F., Varoquaux G., Gramfort A., Michel V., Thirion B., Grisel O. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011;12(null):2825. [Google Scholar]
- 32.Kawashima S., Kanehisa M. AAindex: amino acid index database. Nucleic Acids Res. 2000;28(1):374. doi: 10.1093/nar/28.1.374. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Koshi J.M., Goldstein R.A. Context-dependent optimal substitution matrices. Protein Eng Des Sel. 1995;8(7):641. doi: 10.1093/protein/8.7.641. [DOI] [PubMed] [Google Scholar]
- 34.Rodrigues C.H., Pires D.E., Ascher D.B. DynaMut: predicting the impact of mutations on protein conformation, flexibility and stability. Nucleic Acids Res. 2018;46(W1):W350. doi: 10.1093/nar/gky300. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Rodrigues C.H.M., Pires D.E.V., Ascher D.B. DynaMut2: Assessing changes in stability and flexibility upon single and multiple point missense mutations. Protein Sci. 2021;30(1):60. doi: 10.1002/pro.3942. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Krishnamurty R., Maly D.J. Biochemical mechanisms of resistance to small-molecule protein kinase inhibitors. ACS Chem Biol. 2010;5(1):121. doi: 10.1021/cb9002656. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Tan, K. P. Nguyen, T. B. Patel, S. Varadarajan, R. Madhusudhan, M. S. Depth: a web server to compute depth, cavity sizes, detect potential small-molecule ligand-binding cavities and predict the pKa of ionizable residues in proteins. Nucleic acids research 2013. 41 (Web Server issue). W314. [DOI] [PMC free article] [PubMed]
- 38.Silk M., Pires D.E.V., Rodrigues C.H.M., D'Souza E.N., Olshansky M., Thorne N. MTR3D: identifying regions within protein tertiary structures under purifying selection. Nucleic Acids Res. 2021;49(W1):W438. doi: 10.1093/nar/gkab428. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Silk M., Petrovski S., Ascher D.B. MTR-Viewer: identifying regions within genes under purifying selection. Nucleic Acids Res. 2019;47(W1):W121. doi: 10.1093/nar/gkz457. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Traynelis J., Silk M., Wang Q., Berkovic S.F., Liu L., Ascher D.B. Optimizing genomic medicine in epilepsy through a gene-customized approach to missense variant interpretation. Genome Res. 2017;27(10):1715. doi: 10.1101/gr.226589.117. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Choi Y., Chan A.P. PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels. Bioinformatics. 2015;31(16):2745. doi: 10.1093/bioinformatics/btv195. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Hecht M., Bromberg Y., Rost B. Better prediction of functional effects for sequence variants. BMC Genomics. 2015;16(Suppl 8):S1. doi: 10.1186/1471-2164-16-S8-S1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Rose A.S., Bradley A.R., Valasatava Y., Duarte J.M., Prlic A., Rose P.W. NGL viewer: web-based molecular graphics for large complexes. Bioinformatics. 2018;34(21):3755. doi: 10.1093/bioinformatics/bty419. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Watanabe K., Kage H., Nagoshi S., Toyama K., Ohno Y., Shinozaki-Ushiku A. Dual EGFR and ABL Tyrosine Kinase Inhibitor Treatment in a Patient with Concomitant EGFR-Mutated Lung Adenocarcinoma and BCR-ABL1-Positive CML. Case Reports in Oncological Medicine. 2020;2020:4201727. doi: 10.1155/2020/4201727. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Pandurangan A.P., Ochoa-Montano B., Ascher D.B., Blundell T.L. SDM: a server for predicting effects of mutations on protein stability. Nucleic Acids Res. 2017;45(W1):W229. doi: 10.1093/nar/gkx439. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Al-Jarf R., de Sa A.G.C., Pires D.E.V., Ascher D.B. pdCSM-cancer: Using Graph-Based Signatures to Identify Small Molecules with Anticancer Properties. J Chem Inf Model. 2021 doi: 10.1021/acs.jcim.1c00168. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Pires, D. E. V. Stubbs, K. A. Mylne, J. S. Ascher, D. B. Designing safe and potent herbicides with the cropCSM online resource. bioRxiv 2020. DOI:10.1101/2020.11.01.364240 10.1101/2020.11.01.364240. 2020.11.01.364240.
- 48.Pires D.E., Ascher D.B. CSM-lig: a web server for assessing and comparing protein-small molecule affinities. Nucleic Acids Res. 2016;44(W1):W557. doi: 10.1093/nar/gkw390. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Pires D.E., Blundell T.L., Ascher D.B. mCSM-lig: quantifying the effects of mutations on protein-small molecule affinity in genetic disease and emergence of drug resistance. Sci Rep. 2016;6:29575. doi: 10.1038/srep29575. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Pires, D. E.; Ascher, D. B.; Blundell, T. L. DUET: a server for predicting effects of mutations on protein stability using an integrated computational approach. Nucleic Acids Res 2014, 42 (Web Server issue), W314. [DOI] [PMC free article] [PubMed]
- 51.Grant B.J., Rodrigues A.P., ElSawy K.M., McCammon J.A., Caves L.S. Bio3d: an R package for the comparative analysis of protein structures. Bioinformatics. 2006;22(21):2695. doi: 10.1093/bioinformatics/btl461. [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.