Summary
Tyrosine phosphorylations are a prominent characteristic of numerous diseases, yet it is challenging to identify potentially (dys)functional phosphorylations among thousands of phospho-proteins. Here, we propose a machine learning method to predict the thermodynamic stability change resulting from tyrosine phosphorylation. Our approach, based on the prediction of phosphomimetic stability (ΔΔG) from structural features, strongly correlates with experimental phosphorylation stability and mutational scanning cDNA proteolysis data (R = 0.55–0.67). We apply our approach to predict the potential destabilizing effects of all 384,858 tyrosine residues from the AlphaFold2 database, of known phosphotyrosine sites from the PhosphoSitePlus database, and of phosphosites in a pan-cancer phosphoproteomics dataset with 11 cancer subtypes. We predict destabilizing phosphorylations in both oncogenes and tumor suppressors, and ΔΔG values and local protein circuit topology features are able to distinguish phospho-proteins that are known to be dysregulated in cancer. Our approach can enable rapid screening of destabilizing phosphorylations and phosphomimetic mutations.
Keywords: protein stability, post-translational modifications, computational biophysics, phosphorylation, systems biology
Highlights
• Model trained on phosphomimetic ΔΔG predicts destabilizing tyrosine phosphorylations
• Rapid ML method uses easily computable protein structural features to predict ΔΔG
• Destabilizing phosphorylations in cancer are related to cell cycle and cell death
• Phosphorylations of network peripheral proteins are predicted to be most destabilizing
Motivation
Protein phosphorylation is a crucial aspect of cellular regulation and is dysregulated in numerous diseases. An under-explored consequence of phosphorylation is a change in protein thermodynamic stability. Hundreds of proteins are phosphorylated at any given point in a human cell, and it is challenging to study their destabilizing effects experimentally, as experimental methods are expensive and their results are sometimes inconsistent with physical intuition. There is a need for a biophysically interpretable machine-learning method that predicts the destabilizing effects of phosphorylation both accurately and more rapidly than current experimental and computational approaches.
Protein destabilization by tyrosine phosphorylation is an under-explored mechanism of cellular regulation. Woodard et al. use computational biophysics-informed machine learning to predict the effect of phosphorylation on protein stability. They screen thousands of normal and cancer-related phosphorylation sites and identify systems-level and structural signatures of destabilizing phosphorylations.
Introduction
The dysregulation of protein phosphorylation is a central theme in the pathology of many human diseases, including cancer and Alzheimer’s disease.1,2 One of the predominant mechanisms driving oncogenesis is the sustained and inappropriate activation of signaling pathways. In cancer cells, genetic mutations or the overexpression of key kinases often lead to a state of constitutive signaling. This bypasses the need for external growth factors and decouples proliferative pathways, such as the MAPK and PI3K/AKT cascades, from normal homeostatic control.3 While numerically less abundant than phosphorylation on serine or threonine residues, tyrosine phosphorylation holds a particularly critical role in cellular signaling due to its unique structural consequences. The phosphorylation of a tyrosine residue creates a distinct molecular signature that is specifically recognized by specialized protein modules, most notably, the Src Homology 2 (SH2) domain.4 This interaction forms the basis of a powerful signal transduction mechanism centered on inducible protein recruitment.5 Upon phosphorylation by a tyrosine kinase, the phosphotyrosine residue serves as a high-affinity docking site, rapidly assembling signaling complexes by recruiting specific SH2 domain-containing proteins, such as adaptors, scaffolds, and enzymes. This allows for the precise and localized activation of downstream pathways directly at the site of the initial signal. Unlike the more general regulatory effects often associated with serine/threonine phosphorylation, this modular recruitment system is fundamental to high-fidelity signal transmission for critical cellular decisions, including growth, differentiation, and cytoskeletal rearrangement.5 The tight regulation and lower prevalence of tyrosine phosphorylation reflect its specialized function as a high-impact switch controlling these essential processes.
While the role of phosphorylation in mediating protein-protein interactions is well characterized, a more fundamental consequence is its direct impact on the thermodynamic stability of the protein fold.6,7,8 The stability of a protein’s native fold represents a delicate balance of intramolecular interactions, a balance that can be significantly perturbed by phosphorylation. Foundational insights into this process come from studies on simplified model compounds, which allow for the systematic dissection of specific structural effects.9,10 These studies reveal that the consequences of introducing a bulky, dianionic phosphate group are not uniform but are instead highly dependent on the local stereochemical environment. For instance, within model α-helical peptides, positioning a phosphoserine at the N terminus is markedly stabilizing (free energy change (ΔΔG) ≈ −2.3 kcal/mol), a result of favorable electrostatic interactions with the helix dipole and hydrogen bonding with the backbone.11 Conversely, placing the same modified residue within the peptide’s interior induces significant instability (ΔΔG ≈ +1.2 kcal/mol), a penalty attributed to the energetic cost of desolvating the charge.11 This principle—that the outcome is dictated by the precise geometry of the fold—was also demonstrated in model β-hairpin peptides, where stability effects are governed by specific side-chain interactions and proximity to turn structures.12,13,14 Therefore, evidence from these model systems establishes that phosphorylation can act as a potent, context-dependent remodeling agent, capable of either reinforcing or disrupting local architecture based on its precise atomic placement.
While destabilizing mutations represent a pathological loss of protein function, it is plausible that cells have harnessed this same physical principle for physiological regulation. This would provide a powerful mechanism to control protein availability and activity in response to specific cellular cues. For instance, destabilization leading to partial or complete unfolding results in a loss of the protein’s native structure, which is almost invariably linked to a loss of its function. Thus, phosphorylation at such sites can act as a potent “off-switch,” rapidly terminating a protein’s activity.15,16 Partial unfolding of a monomer due to phosphorylation might expose cryptic degradation signals (degrons) or render the protein more susceptible to proteasomal or lysosomal pathways. Indeed, studies suggest that certain “cryptic” phosphosites, which may become exposed during protein folding intermediates, can destabilize native structures upon modification and thereby trigger rapid polypeptide degradation.17,18 However, while this represents an elegant form of physiological control, such a potent system is also vulnerable to pathogenic hijacking. The extent to which this mechanism is utilized in normal protein homeostasis, and how its dysregulation by aberrant kinase activity contributes to diseases like cancer, remains a topic of significant debate.17
A key challenge in biophysics is to obtain precise, proteome-wide measurements of the energetic contributions of phosphorylation to protein stability. While high accuracy ΔΔG has been determined for phosphoserine in model peptides, these simplified helical and hairpin systems do not capture the complex tertiary interactions of a globular fold. Conversely, cellular thermal profiling methods provide large-scale data but measure an apparent stability that convolutes the intrinsic stability of the monomer with the thermodynamics of protein-protein interactions.6,7 A measured stability change could signal direct monomer unfolding, or it could reflect the dissociation of a complex,19 as proposed for PUP2 Ser56 phosphorylation (Figure 1).6 Computational biophysics offers a way to deconvolve these effects. Previous computational efforts to define “functional” phosphorylation have been largely correlative, linking sequence to systems-level phenotypes rather than predicting direct, mechanistic changes in protein stability.2,20,21,22 To our knowledge, there is currently no computational method designed to rapidly predict the ΔΔG of folding upon phosphorylation from first principles or structural features.
Figure 1.
Mechanisms and structural contexts of phosphorylation-induced stability changes
(A) Thermodynamic cycle illustrating how measured stability changes in a cellular context can be convoluted. Cellular thermal profiling methods measure an apparent stability that can reflect a change in the intrinsic stability of a protein monomer (right) or a change in the stability of a protein-protein interface (left), both of which can be affected by phosphorylation (P).
(B–E) Examples of the top two most destabilizing and two most stabilizing phosphorylation sites. The most stabilizing and destabilizing phosphorylations occur in exposed, disordered regions (B) and (D) or at protein-protein interaction surfaces (C) and (E).
Results
Machine learning model trained on phosphomimetic data accurately predicts phosphorylation stability
The overarching goal of this study is to predict functional or maladaptive phosphorylation using ΔΔG prediction programs and machine learning (ML). However, developing such a model is complicated by a lack of ideal training data. Large-scale cellular thermal profiling data are ill-suited for training a model of intrinsic stability due to confounding interaction effects (Figure 1), while high-accuracy biophysical data are too sparse to train on alone. As an alternative to interrogating phosphorylation directly, we explore phosphomimetic modifications (mutating tyrosine, serine, or threonine to glutamate or aspartate; Figure 2), which can be more accurate than direct phosphorylation modeling.23 We employ a hybrid strategy: we train the model on large-scale phosphomimetic data from cDNA display proteolysis24 and validate it using the limited but precise biophysical measurements of actual phosphorylation. This contrasts with previous computational modeling studies of phosphorylation and phosphomimetics, which have been limited to a small number of examples.23,25
Figure 2.
Determination of thermodynamic stability, ΔΔG, of phosphomimetic mutations from Tsuboyama et al. data
(A) Diagram depicting phosphomimetic mutations, along with equations for calculating ΔG and ΔΔG.
(B) Distribution of ΔΔG by residue type for mutation to glutamate from the Tsuboyama et al. dataset. This dataset, comprising over 776,000 measurements, frames stability changes (ΔΔG) within a two-state model, where a positive value signifies destabilization. Representative protein structures are shown that reveal high numbers of contacts with the phosphorylated residue, leading to greater destabilization. Pictures show the residue prior to phosphorylation, with the complete residue in sphere representation. Residues within 5 angstroms are shown in stick representation. Color is used to indicate the position along the protein chain; in addition, within the sphere and stick representations, oxygen and phosphorus are colored red, nitrogen is colored blue, and sulfur is colored yellow.
Although the phosphomimetic dataset is extensive, comprising over 776,000 measurements, the experimental approach used to generate it has constraints: it assumes fully cooperative folding at equilibrium and has a limited dynamic range of approximately 5 kcal/mol for stability measurements. Our proposed model is then validated using precise, albeit sparse, biophysical measurements of actual phosphorylation, with all stability changes (ΔΔG) framed in a two-state model where a positive value signifies destabilization.24
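The two-state framing and sign convention can be made concrete with a minimal sketch. It assumes a folding equilibrium constant K = [folded]/[unfolded] and defines ΔΔG as the mutant folding free energy minus the wild-type value, so that a positive result means destabilization, as in the text. The function names and the folded fractions are hypothetical and are not taken from the Tsuboyama et al. pipeline.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g_fold(fraction_folded, temperature_k=298.15):
    """Two-state folding free energy (kcal/mol) from the folded fraction.

    K_fold = [folded]/[unfolded] and dG_fold = -RT ln K_fold, so a more
    negative value corresponds to a more stable fold.
    """
    if not 0.0 < fraction_folded < 1.0:
        raise ValueError("fraction_folded must lie strictly between 0 and 1")
    k_fold = fraction_folded / (1.0 - fraction_folded)
    return -R_KCAL * temperature_k * math.log(k_fold)


def delta_delta_g(frac_folded_wt, frac_folded_mut, temperature_k=298.15):
    """ddG = dG_fold(mutant) - dG_fold(wild type); positive = destabilizing."""
    return (delta_g_fold(frac_folded_mut, temperature_k)
            - delta_g_fold(frac_folded_wt, temperature_k))


# Hypothetical folded fractions for a wild-type domain and its Y->E mutant.
ddg = delta_delta_g(0.99, 0.60)
```

With these illustrative inputs the mutant is less folded than the wild type, so `ddg` is positive, matching the convention that positive values signify destabilization.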
To generate features for our ML model, we required an energy description capable of accommodating both large-scale phosphomimetic data and the precise biophysical data from model systems. The nature of the validation set, which primarily uses small, designed proteins, precludes the use of evolutionary information and necessitates a physics-based potential. Furthermore, given the known importance of electrostatic interactions in determining phosphorylation’s stability effects, special attention must be paid to this energetic term. For these reasons, we chose to generate features using FoldX.26 FoldX is an empirical force field that uses a full atomic description to calculate the free energy of a protein as a weighted sum of terms, including van der Waals forces, solvation energy, hydrogen bonding, water bridges, electrostatics, and the entropic cost of fixing the main chain and side chains. To calculate the effect of a mutation, it iteratively explores new side-chain rotamers for the mutated residue and its neighbors while keeping the backbone structure fixed. A key advantage of FoldX is its computational speed, which is a direct result of this fixed-backbone approach. This efficiency, combined with its purely physical energy description, makes it well suited for our diverse datasets and specific focus on the energetic drivers of stability change. To supplement the features from FoldX, we also included the relative solvent accessible area of the mutated residue and its secondary structure context. 
Because FoldX uses a simple coulombic potential, we incorporated an empirical categorical position term for α-helices to better capture known positional electrostatic effects.11,13 Finally, to model the local chemical environment and implicitly account for minor conformational changes, we included features based on the number of hydrophobic, polar, and total atoms within a 10 Å radius of the mutation.27 The features generated from FoldX and the additional structural descriptors were used to train an ML model using the CatBoost gradient boosting framework, chosen due to its predictive performance in datasets containing both categorical and continuous features (Figure 3).28 To build and robustly assess the model, we employed a 5-fold cross-validation scheme. To ensure the model generalizes to unseen proteins and prevent data leakage, all data splits for the training, test, and validation sets were performed at the protein level. Model performance was evaluated using two distinct validation sets: an internal validation set of phosphomimetic data held out from the main training corpus, and a critical external validation set consisting of the high-accuracy biophysical measurements from the model protein systems with actual phosphorylation (Tables S1; S2).
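Protein-level splitting can be sketched as follows. This is a minimal, dependency-free illustration rather than the authors' code: it assigns every protein, together with all of its mutations, to exactly one fold, so that no protein contributes rows to both the training and the test side of a split. The function name and the toy identifiers are hypothetical; in practice a grouped splitter from an ML library would serve the same purpose.

```python
from collections import defaultdict


def protein_level_folds(protein_ids, n_folds=5):
    """Return (train_rows, test_rows) pairs with proteins kept intact.

    Proteins are placed, largest first, into the currently smallest fold,
    which keeps the folds roughly balanced in number of mutations.
    """
    rows_by_protein = defaultdict(list)
    for row, pid in enumerate(protein_ids):
        rows_by_protein[pid].append(row)
    if len(rows_by_protein) < n_folds:
        raise ValueError("need at least as many proteins as folds")

    folds = [[] for _ in range(n_folds)]
    ordered = sorted(rows_by_protein,
                     key=lambda p: (-len(rows_by_protein[p]), p))
    for pid in ordered:
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(rows_by_protein[pid])

    splits = []
    for f in range(n_folds):
        test = sorted(folds[f])
        train = sorted(r for g in range(n_folds) if g != f for r in folds[g])
        splits.append((train, test))
    return splits


# Hypothetical mutation table: one protein identifier per mutation row.
protein_ids = ["P1", "P1", "P2", "P3", "P3", "P3", "P4", "P5", "P5", "P6"]
splits = protein_level_folds(protein_ids, n_folds=5)
```

Splitting by row instead would let mutations of the same protein appear on both sides of a split, which is the leakage the protein-level scheme is meant to prevent.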
Figure 3.
Representative structures of residues that are highly destabilized and not destabilized upon phosphorylation in the phosphomimetic dataset
(A) Residues that are highly destabilized upon phosphorylation are generally more interior and have more contacts compared with structures that are not destabilized (B).
(C–G) Schematic representation of features used to train the CatBoost regression model. In addition to the (C) physics-based energy terms from FoldX, the model included (D) secondary structure context, (E) relative solvent accessible area (SASA), (F) an empirical categorical position term for α-helices to capture positional electrostatic effects, and (G) local environment features describing the number of hydrophobic, polar, and total atomic contacts within a 10 Å radius.
The performance of the model is summarized in Figure 4. The overall accuracy of our CatBoost model on the internal phosphomimetic hold-out validation set, as measured by the Pearson correlation coefficient, was r = 0.61. The model achieved a similar accuracy of r = 0.58 on the external validation set composed of actual phosphorylation data, supporting the viability of using phosphomimetic data as a training proxy. To further justify this phosphomimetic proxy, we performed a direct computational comparison using FoldX to predict ΔΔG for both Y→E and Y→pY mutations on the same protein set. We found a strong correlation between the stability effects of the mimic and the actual phosphorylation (r = 0.87, Figure S1), suggesting that the phosphomimetic mutation captures a substantial component of the energetic perturbation of phosphorylation within this physics-based model.
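Because accuracy is reported throughout as a Pearson correlation coefficient, a short sketch of that calculation may be useful. The ΔΔG values below are hypothetical and serve only to show the mechanics. The final line illustrates why a predictor that uses the opposite sign convention yields a correlation of equal magnitude and opposite sign.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical experimental and predicted ddG values (kcal/mol).
experimental = [0.1, 0.8, 1.5, 2.9, 3.4]
predicted = [0.3, 0.5, 1.9, 2.2, 3.0]

r = pearson_r(predicted, experimental)
# A predictor reporting destabilization as negative gives the mirrored value.
r_flipped = pearson_r([-p for p in predicted], experimental)
```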
Figure 4.
Performance and interpretation of the structural machine learning model
(A) Accuracy of the prediction model: scatterplot showing the correlation between the predicted ΔΔG from the regression model and experimental ΔΔG values for the internal phosphomimetic hold-out validation set.
(B) Feature importance ranking the FoldX energy term as the most predictive feature, followed by relative solvent accessible area (SASA) and contact-based metrics.
(C) Scatterplot showing the correlation between predicted ΔΔG and experimental ΔΔG for the external validation set, which consists of model peptides with actual phosphorylation.
(D) Receiver operating characteristic (ROC) curve for the classifier’s performance on the phosphomimetic dataset.
See also Figures S1–S4, and Tables S1, S2, and S3.
On the phosphomimetic dataset, predictions for tyrosine mutations (r = 0.67) were more accurate than for serine (r = 0.48) and threonine (r = 0.27; Figure S2). Threonine is unique in that it undergoes a dramatic disorder-to-order transition upon phosphorylation, forming a compact, highly ordered, cyclic structure stabilized by a strong intra-residue hydrogen bond.29 This highly specific and strained conformation is not well captured by our model’s more general physical features and the discrete rotamer approximation, resulting in poor predictive accuracy. The model also performed significantly better on buried residues (r = 0.64) than exposed residues (r = 0.48) (Figure S2). Accuracy for residues in α-helices (r = 0.73) and β-sheets (r = 0.68) was substantially higher than that for residues in coil regions (r = 0.21) (Figure S2). While the ML method slightly outperformed FoldX alone on the phosphomimetic data, its predictions were strongly correlated with the FoldX energy terms (r = 0.77), indicating that, on this dataset, the model primarily learned to be a more refined version of the underlying physics-based potential (Figure S3).
Surprisingly, this trend did not hold for the external validation set. On this set of actual phosphorylation data, the ML model (r = 0.59) performed significantly better than FoldX (r = 0.34), and there was essentially no correlation observed between the two methods' predictions (Figure S3). This divergence is a key finding, as it suggests that to predict the effects of actual phosphorylation, the model learned biophysical principles beyond the FoldX energy function. The custom features, such as the helix position term and local atomic environment, likely enabled the model to capture the unique steric and electrostatic consequences of a true phosphate group that is not fully represented by the FoldX potential. This result ultimately validates our hybrid strategy; the phosphomimetic dataset, while a biophysically imperfect proxy, served as an effective scaffold that allowed the model to learn foundational stability principles that could then be successfully extrapolated to predict the more complex energetic consequences of actual phosphorylation.
Feature reduction eliminating FoldX calculation improves speed without substantially compromising accuracy
Building on our initial findings, we developed a rapid prediction tool specifically for tyrosine phosphomimetic mutations. Tyrosine phosphorylation was selected as it represents a critical hub in cellular signaling and a frequent point of dysregulation in human cancers.5 As the least common and most specialized type of phosphorylation, phosphotyrosine sites are more frequently located in structured, partially buried regions compared with serine or threonine; only ∼40% of phosphotyrosine sites are in intrinsically disordered regions, compared with ∼70% for phosphoserine and ∼60% for phosphothreonine.30 This structural context means that phosphorylation-induced stability changes are often larger and more likely to function as high-impact “on-off” regulatory switches; indeed, analysis of the Tsuboyama et al. phosphomimetic dataset shows that tyrosine to glutamate mutation tends to be much more destabilizing on average than the mutation of either serine or threonine to glutamate (Figure 2B).24
This biological significance is complemented by a strong predictive signal in our analysis. Tyrosine mutations were the most accurately predicted in our model (r = 0.67), a result explained by their unique biophysical properties. Unlike serine or threonine, tyrosine’s bulky side chain is sterically restrained from the side chain-to-backbone hydrogen bonding that is a major factor in their stability changes. Instead, its stability effects are strongly correlated with changes in solvent accessible surface area (SASA; r = 0.65), far more so than for serine (r = 0.32) or threonine (r = 0.20). In contrast, the smaller stability effects for serine and the unique structural transitions of threonine phosphorylation are not as well captured by our current model, leading to lower predictive accuracy.
To build a rapid ML model, we assessed the performance of the method after eliminating its most computationally intensive step: the calculation of ΔΔG using FoldX, which demands about 10–24 h of computational time per 1,000 mutations on large proteins. We therefore constructed a simplified model consisting of only the residues in contact with the phosphorylated residue (any atom within 5 Å), the SASA, and the phi and psi angles. This simplified model proved to be both rapid, with a calculation time of approximately 3 min per 1,000 mutations, and reasonably accurate: its 5-fold cross-validated predictive performance (r = 0.55) approached that of our original full-feature model (r = 0.67), and its output strongly correlates with that of the full-feature model (r = 0.86; Figure S4 and Table S3). For the phosphomimetic substitution, we continued to use glutamate due to its larger size, noting that its stability effects are highly correlated with those of aspartate (r = 0.94) in the Tsuboyama et al. dataset.
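The contact definition used by the simplified model (any atom within 5 Å of the residue of interest) can be illustrated with a brute-force sketch. It assumes atoms are supplied as (residue identifier, coordinates) pairs. The function name, residue labels, and coordinates are hypothetical, and a real structure would normally be parsed with a structural biology library and searched with a spatial index.

```python
import math


def residues_in_contact(atoms, target_residue, cutoff=5.0):
    """Residues with any atom within `cutoff` angstroms of the target residue.

    `atoms` is a list of (residue_id, (x, y, z)) tuples.
    """
    target_atoms = [xyz for rid, xyz in atoms if rid == target_residue]
    if not target_atoms:
        raise ValueError("target residue not found")
    contacts = set()
    for rid, xyz in atoms:
        if rid == target_residue:
            continue
        if any(math.dist(xyz, t) <= cutoff for t in target_atoms):
            contacts.add(rid)
    return contacts


# Hypothetical toy coordinates (angstroms); not taken from a real structure.
atoms = [
    ("Y10", (0.0, 0.0, 0.0)),
    ("Y10", (1.5, 0.0, 0.0)),
    ("L11", (4.0, 0.0, 0.0)),
    ("E50", (0.0, 6.5, 0.0)),
    ("K72", (20.0, 0.0, 0.0)),
]

close_contacts = residues_in_contact(atoms, "Y10")            # 5 A cutoff
wider_shell = residues_in_contact(atoms, "Y10", cutoff=10.0)  # 10 A shell
```

The same routine with a 10 Å cutoff gives a residue-level analogue of the local-environment counts used in the full feature model, which are defined over atoms rather than residues.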
As shown in Table 1, Figure S3, and Table S4, the rapid method’s performance approximately matches or exceeds that of other common predictors, including physics-based estimators like FoldX (r = 0.56) and EvoEF (r = 0.53),31 the neural network predictor DDMut (r = 0.53 after flipping its sign to match the convention used here),32 AlphaMissense (r = −0.1),33 and sequence conservation metrics (r = 0.04).34 Notably, the full feature ML model (full method) outperforms all other predictors (r = 0.67) and was used for all subsequent analyses, with the exception of the large-scale exploration of AlphaFold2 and PhosphoSitePlus data.
Table 1.
Accuracy benchmarks for tyrosine mutations for the Tsuboyama et al. dataset
| Method | Correlation (r) with experimental ΔΔG |
|---|---|
| FoldX | 0.56 |
| EvoEF | 0.53 |
| AlphaMissense | −0.1 (NS) |
| DDMut | 0.53 |
| Sequence conservation | 0.04 (NS) |
| Full method | 0.67 |
| Rapid method | 0.55 |
Accuracy benchmarks of the rapid and full method on tyrosine mutations in the Tsuboyama dataset. Performance of the “full method” (with FoldX) and the “rapid method” (FoldX-free) compared with other computational predictors on tyrosine mutations in the Tsuboyama dataset, including an empirical physics-based estimator (FoldX), a physics-based estimator that uses the CHARMM forcefield (EvoEF), a neural network predictor (DDMut), a sequence conservation metric (JS Divergence), and a pathogenicity predictor (AlphaMissense). For AlphaMissense, we note that not all UniProt IDs were contained in their database, and the vast majority of mutations from Y to E in these proteins were predicted to be pathogenic. For DDMut, the sign for destabilizing mutations was flipped to match the convention used in this study. NS: not significant. See also Table S4.
Destabilizing cancer-associated tyrosine phosphorylation sites, but not serine or threonine, are linked to cell growth and death pathways
We next applied the rapid, FoldX-free prediction model to three distinct, large-scale datasets to assess the landscape of phosphomimetic-induced stability changes (Figure 5; Tables S5–S7). The datasets included all 384,858 tyrosine residues from the human AlphaFold2 database, representing a comprehensive baseline35; 36,468 known phosphorylated tyrosine residues from the PhosphoSitePlus database36; and a curated set of tyrosine phosphosites known to be dysregulated across 1,110 patients in 11 cancer types.16
Figure 5.
Distributions of predicted ΔΔG and pathway enrichment of destabilizing phosphorylations
(A) Violin plots showing the distributions of predicted ΔΔG by the rapid method. Included are all 384,858 tyrosine residues from the human AlphaFold2 database; 36,468 known phosphorylated tyrosine residues from the PhosphoSitePlus database; and a curated set of tyrosine phosphosites known to be dysregulated across 1,110 patients in 11 cancer types from Geffen et al.
(B) GO enrichment of highly destabilized (ΔΔG > 2 kcal/mol) tyrosine phosphorylation sites vs. all tyrosine phosphorylation sites in the Geffen pan-cancer dataset, based on the full feature method. There was no significant association for any biological process for serine or threonine.
The distribution of predicted stability changes (ΔΔG) was roughly bimodal across all datasets, with a large cluster of sites predicted to have a neutral effect (ΔΔG ≈0 kcal/mol) and a second cluster predicted to be strongly destabilizing (ΔΔG ≈ +2.5 kcal/mol). This destabilizing population was most prominent in the comprehensive AlphaFold dataset. A direct comparison of the biological datasets revealed a significant trend: the set of tyrosine phosphosites known to be dysregulated in cancer contained a significantly larger fraction of strongly destabilizing sites than the set of all known human phosphosites. This finding suggests a potential link between phosphorylation-induced protein destabilization and oncogenic processes.
To systematically assess the biological functions associated with phosphorylation-induced instability, we performed Gene Ontology (GO) enrichment analysis on the Geffen et al. pan-cancer phosphorylation dataset.16 The analysis was conducted separately for tyrosine, serine, and threonine residues to account for their distinct biological functions and biophysical profiles.30 For each residue type in the pan-cancer phosphosite dataset, sites predicted by our model to have a destabilization potential greater than 2 kcal/mol (ΔΔG > 2 kcal/mol), which constituted 29% of tyrosine (N = 145 destabilized), 3% of serine (N = 162), and 4% of threonine (N = 70) phosphosites, were compared for functional enrichment against the remaining background set of that residue type using ShinyGO.37 Strikingly, the analysis of destabilizing tyrosine phosphosites revealed a modest but statistically significant enrichment for proteins involved in the regulation of cell death, apoptosis, and mitotic cell cycle phase transition (false discovery rate [FDR] < 0.1; Figure 5B). In contrast, no GO terms for any biological process were found to be significantly enriched for destabilizing serine or threonine phosphorylation sites at the FDR = 0.1 level. These results suggest that destabilizing tyrosine phosphorylation could represent a distinct, non-genomic mechanism for modulating pathways of cell survival, a mode of regulation that appears to be specific to tyrosine, although this may be influenced by the higher accuracy of our ΔΔG prediction model for tyrosine than for serine and threonine phosphorylation.
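The logic of an enrichment analysis with FDR control can be sketched generically. The example assumes a one-sided hypergeometric test per GO term followed by Benjamini-Hochberg adjustment. This is a common formulation and is not claimed to reproduce the exact procedure of the tool used in the study. All counts and term names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n_drawn, n_annotated, n_total):
    """P(X >= k) annotated sites among n_drawn sampled from n_total."""
    upper = min(n_drawn, n_annotated)
    total_ways = comb(n_total, n_drawn)
    hits = sum(comb(n_annotated, i) * comb(n_total - n_annotated, n_drawn - i)
               for i in range(k, upper + 1))
    return hits / total_ways


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: 500 sites in total, 145 of them with ddG > 2 kcal/mol.
n_total, n_destabilized = 500, 145
# term -> (annotated sites among destabilized, annotated sites overall)
term_counts = {"term_A": (22, 40), "term_B": (12, 40), "term_C": (5, 30)}

terms = list(term_counts)
p_values = [hypergeom_upper_tail(term_counts[t][0], n_destabilized,
                                 term_counts[t][1], n_total) for t in terms]
fdr = dict(zip(terms, benjamini_hochberg(p_values)))
significant = [t for t in terms if fdr[t] < 0.1]
```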
Proteins destabilized by phosphorylation occupy peripheral network locations
To link protein structural properties with systems-level molecular network properties, we asked whether systems network centralities of proteins harboring highly destabilizing phosphorylations differ from those of proteins with less destabilizing phosphorylations. Specifically, we quantified centralities for metabolic reactions carried out by the proteins of interest, where reactions were counted as nodes and metabolites as edges. The network data were derived from a genome-scale metabolic model, with centralities derived as part of a previous study.21 Metabolic networks were chosen as they are among the most well studied and characterized cellular networks. We found that degree, closeness, betweenness, and PageRank are all lower for phosphorylations with ΔΔG greater than 1 kcal/mol than for phosphorylations with lower ΔΔG (Figure 6). This suggests that cancer cells preferentially destabilize proteins in the network periphery. This is consistent with a previous pan-cancer analysis of genomic and transcriptomic data with metabolic networks, which found preferential dysregulation of enzymes and transporters in the network periphery.38 Perturbation of hub enzymes may be detrimental to the overall network function. ΔΔG predicted using the full method was significantly lower (p = 0.02) for essential genes vs. non-essential genes.39 Modeled growth rate upon gene knockout was not significantly different for proteins with highly destabilizing phosphorylations (ΔΔG > 1 kcal/mol) vs. proteins with less destabilizing phosphorylations.
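The network construction described above, with reactions as nodes linked through shared metabolites, can be illustrated on a toy example. The sketch computes only degree centrality and compares medians between the two ΔΔG groups. The reactions, metabolites, and ΔΔG values are hypothetical, and in a real analysis a rank-sum test (for example, SciPy's `mannwhitneyu`) would be used to assess the difference.

```python
from itertools import combinations
from statistics import median


def reaction_graph(reaction_metabolites):
    """Reactions are nodes; two reactions are linked if they share a metabolite."""
    neighbors = {r: set() for r in reaction_metabolites}
    for a, b in combinations(reaction_metabolites, 2):
        if reaction_metabolites[a] & reaction_metabolites[b]:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def degree_centrality(neighbors):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(neighbors)
    return {r: len(nb) / (n - 1) for r, nb in neighbors.items()}


# Hypothetical toy network: reaction -> set of metabolites it involves.
reaction_metabolites = {
    "R1": {"A", "B"},
    "R2": {"B", "C"},
    "R3": {"B", "D"},
    "R4": {"D", "E"},
    "R5": {"E", "F"},
}
# Hypothetical predicted ddG (kcal/mol) for the enzyme of each reaction.
predicted_ddg = {"R1": 0.2, "R2": 0.4, "R3": 0.1, "R4": 0.8, "R5": 2.3}

centrality = degree_centrality(reaction_graph(reaction_metabolites))
high = [centrality[r] for r, g in predicted_ddg.items() if g > 1.0]
low = [centrality[r] for r, g in predicted_ddg.items() if g <= 1.0]
median_high, median_low = median(high), median(low)
```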
Figure 6.
Metabolic network GEM centralities for machine learning predicted ΔΔG less than or equal to, or greater than, 1 kcal/mol
p value corresponds to a Wilcoxon test. Proteins predicted to be highly destabilized by phosphorylation consistently have lower network centrality.
(A) Degree.
(B) Closeness.
(C) Betweenness.
(D) PageRank.
ΔΔG and circuit topology identify proteins with a role in cancer
We asked whether the predicted ΔΔG of phosphorylation can distinguish proteins with an oncogenic, tumor suppressor, or driver role from other proteins within the Geffen et al. pan-cancer dataset.16 There is a significant difference in ΔΔG for proteins annotated either as tumor suppressors or as both tumor suppressor and oncogene (p = 0.02; STAR Methods). We next asked whether circuit topology may be able to improve such predictions (Figure S5). Circuit topology is a mathematical framework by which pairs of contacts along a linear chain are viewed in relation to one another.40 Three relations are possible: parallel, where one contact is “inside” another; series, where the contacts are in tandem; and cross, where the intervals of the contacts partially overlap. Circuit topology was shown to have relevance to folding rates theoretically and in actual proteins.40,41,42,43 The inverse parallel and cross-relations were found to be important in determining the pathogenicity of mutations.44 We find that the inverse parallel relation can determine oncogene, tumor suppressor, and driver status to a greater extent than ΔΔG. We see significance in the inverse parallel relation for all four cancer-associated protein categories, with the greatest significance (p = 0.0005) seen for the combined tumor suppressor and oncogene annotation. Such information may be useful to include in ML models to predict tumor suppressor, oncogene, and driver status, although such improvement is beyond the scope of the current article. We note that the direction of the difference is such that proteins with a role in cancer have a lower stability change and a smaller number of inverse parallel relations than other proteins. Additionally, we found that the number of cross-relations and inverse relations correlates with predicted ΔΔG by the rapid method, with r = 0.57 and 0.46, respectively, thus linking ΔΔG with protein circuit topology.
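The three circuit topology relations can be written down compactly for pairs of contacts given as chain positions. In this sketch, labeling the nested case as "parallel" when the second contact lies inside the first, and as "inverse parallel" in the opposite case, is an assumed convention. Contacts that share a site are deliberately left unhandled, and the contact list is hypothetical.

```python
from collections import Counter
from itertools import combinations


def contact_relation(a, b):
    """Relation of contact a to contact b; each contact is a pair of positions."""
    (i, j), (k, l) = sorted(a), sorted(b)
    if len({i, j, k, l}) < 4:
        raise ValueError("contacts sharing a site are not handled in this sketch")
    if j < k or l < i:
        return "series"            # intervals do not overlap (tandem)
    if i < k and l < j:
        return "parallel"          # b lies inside a
    if k < i and j < l:
        return "inverse parallel"  # a lies inside b
    return "cross"                 # intervals partially overlap


def relation_counts(contacts):
    """Count relations over all unordered pairs, taken in list order."""
    return Counter(contact_relation(a, b) for a, b in combinations(contacts, 2))


# Hypothetical residue-residue contacts along a chain.
contacts = [(1, 10), (3, 7), (12, 20), (15, 25)]
counts = relation_counts(contacts)
```

Per-protein counts of this kind are the sort of quantity that could be compared with predicted ΔΔG or with cancer annotations, as described in the text.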
Case studies of destabilizing phosphorylations: PI3K and HSP60
As a case study, we examined phosphorylation within the p85 subunit of phosphatidylinositol 3-kinase (PI3K), a critical regulator of cell growth and survival. Of proteins with the most destabilizing tyrosine phosphorylation (in the top 85th percentile; Figure S6), the PI3K inhibitory p85α subunit (PIK3R1 gene) stands out, in that the relative orientation of two phosphorylated tyrosine residues closely resembles that of the receptor tyrosine kinase (RTK; Figures S6A and S6B). Mutations within the PI3K domain that is phosphorylated are known to cause cancer. Phosphorylation of Y580 (orange in Figure S6A), in addition to being implicated in cancer, occurs in normal cells via the insulin receptor.45 It has been proposed that Y580 is a passenger phosphorylation in cancer.46 We suggest the alternative hypothesis that phosphorylation of this residue relieves inhibition of the interacting catalytic subunit, activating PI3K. Such activation may be functional in normal cells but is likely co-opted in cancer cells to promote cell growth and proliferation. Joint phosphorylation of the nearby tyrosine 452 in cancer may increase this effect.
A second case study was the cancer-related phosphorylation of HSP60 (HSPD1), a member of the heat shock protein family. HSP60 can be phosphorylated at Y227 and Y243, which is crucial for its surface activation. We find HSPD1 Y243 to be one of the top hits of our CatBoost model, in the 93rd percentile. The phosphorylation occurs at a residue in contact with a large loop, which may exhibit phosphorylation-dependent conformational changes. HSP60 is present in the plasma membrane of human leukemic CD4+ T cells and is phosphorylated by protein kinase A (PKA). HSP60 physically associates with histone 2B (H2B) in its dephosphorylated form; by contrast, PKA-catalyzed phosphorylation causes dissociation of H2B from HSP60, potentially due to its destabilizing effect, and thereby regulates the attachment of HSP60 to H2B.47
We next asked how evolutionary mutations at cancer-phosphorylated sites affect protein function. While we utilized mutation to glutamate as a phosphomimetic in this study, in evolution, tyrosine is more frequently mutated to aspartate, because tyrosine and aspartate codons differ by a single nucleotide, whereas mutation to glutamate requires two substitutions. We mined the HuVarBase database of rare variants for cases in which tyrosine residues phosphorylated in cancer are mutated to aspartate.48 We found three such cases (Figures S6C and S6D). All three are associated with pathogenicity to different extents. Of the three, two mutations are at sites of high predicted destabilization due to phosphorylation; the third likely affects protein-protein interactions. Notably, for the proteins hemoglobin beta (HBB) and Shp-2 (PTPN11; Figure S6C), the Tyr to Asp mutation confers increased cancer risk in genetic disease.49 Histograms of ΔΔG values with locations of example proteins labeled are shown in Figures S6E and S6F.
Discussion
Here, we predict the potential destabilizing effects of protein tyrosine phosphorylations with an ML approach based on easily accessible protein features. The ML model was trained on a large-scale dataset measuring the stability change of phosphomimetic mutations. Phosphomimetic approaches are commonly employed experimentally to interrogate functional phosphorylations. Numerous studies have demonstrated that phosphomimetics are often suitable representations.25,50,51 This suggests that phosphomimetics may also be employed computationally, taking advantage of optimized strategies to model mutations between amino acid types. We note that phosphomimetics are an inherently limited approach, due to the lesser negative charge and smaller size of the mimetic relative to the phosphorylated residue, and there is still some debate on their accuracy in reflecting the destabilizing effects of phosphorylation.25,52 However, the ML model presented here is consistent with known phosphorylation stability changes and can be re-trained using large-scale direct phosphorylation stability measurements when available.
With our ML method, we obtain performance on phosphomimetic mutations exceeding that of commonly employed neural network-based approaches, such as DDMut,32 while being substantially faster than these approaches. Our method employs a CatBoost model with readily obtainable and mechanistic features, in addition to calculations that represent composites of mineable energy terms, and so is easy to interpret. We also find that the phosphorylations predicted by phosphomimetics to be the most destabilizing can be rationalized by examination of the protein structure. In addition to simply modeling the phosphomimetic mutation with traditional ΔΔG prediction tools, we employed ML strategies to recapitulate the experimental results of a widely used mutational scanning dataset. Previous experimental studies did not identify a difference in ΔΔG of tyrosine phosphorylation vs. serine or threonine.6,53 However, such studies did not include a large number of high-confidence tyrosine phosphorylations, and results were likely skewed toward less destabilizing phosphorylations.
Surprisingly, we observed a general lack of consistency between SASA values and the change in melting temperature upon phosphorylation, ΔTm, from the study by Potel et al., which developed a method to assess the thermodynamic stability change due to phosphorylation.6 In theory, ΔTm should correlate substantially with SASA; phosphorylations further within the core of the protein should be more prone to destabilization. However, upon carrying out SASA calculations on residues from AlphaFold2 structures of the proteins, we found a correlation of less than 0.1 for each of serine, threonine, and tyrosine phosphorylation. Therefore, we utilized phosphomimetic data from the mutational scanning dataset by Tsuboyama et al. for constructing our ML models.24 We cannot rule out that phosphomimetics may not be accurate representations of phosphorylated states for larger proteins that possess properties that preclude prediction based on the smaller proteins of the Tsuboyama et al. dataset. However, the identification of visually “obvious” destabilizations by our phosphomimetic approach suggests that this approach may be an adequate strategy.
We predicted that many phosphorylations in human cancers destabilize the protein beyond the often-cited threshold of 1 kcal/mol above which mutations often affect protein function.54,55 While most proteins have stabilities of a few kcal/mol or more, kinetic factors and the possibility of local unfolding push the threshold to lower values. Our results suggest that tumor suppressors can be depleted by thermodynamically destabilizing phosphorylation. Drugs that prevent phosphorylation or enzyme replacement therapy with a stable protein that cannot be phosphorylated may be beneficial in such cases. Among the cancer-associated phosphorylations, the p85 subunit of PI3K illustrates a case in which the structural motif found in RTK is reused in a different context. Similar to the relief of autoinhibition within the same protein chain via partial unfolding seen in RTK, destabilization of p85 likely abolishes the chain’s role in the inhibition of the catalytic subunit. In both cases, two interacting tyrosine residues with similar respective orientations are both phosphorylated. However, the structural details surrounding the tyrosine-tyrosine interaction are quite different in these two cases. This is one example of a structural motif that is transferable to other contexts, a common occurrence in structural biology.
While this study focused on applications to cancer, we note that destabilizing phosphorylation will likely be applicable to other disease states, including diabetes, as forecast by the PI3K case. However, there may be selection against destabilizing phosphorylations in healthy individuals,56 as evidenced by the relative lack of destabilization in the PhosphoSitePlus data (Figure 5A). Hence, cancers may be particularly prone to harbor destabilizing phosphorylations. Further, while we focused on single phosphorylations, a limitation of our study is that many proteins contain multiple phosphorylation sites that may interact to determine overall effects. Autoinhibited proteins, for instance, often have two tyrosine phosphorylations side by side, as seen for PI3K. Phosphorylation at both of these sites likely leads to enhanced local or global destabilization of the domain. It should be possible to model such multiple phosphorylations using either phosphomimetics or direct phosphorylation approaches in the future. An important question moving forward is whether other post-translational modification types may also exhibit stability alteration. A future extension would be acetylation, for which mimetics also exist. In fact, stability alteration due to acetylation has been proposed and verified experimentally.57,58,59 Methylation may be an example for which prediction is less straightforward, but results may also be determinable by rigorous computational approaches. Finally, a thermodynamic and structural analysis of PTMs such as glycosylation and SUMOylation via molecular dynamics and topological methods would likely reveal interesting results.44 Ultimately, our goal is to combine systems-level approaches with structural approaches to develop a comprehensive picture of PTM signaling.
In sum, based on our ML analysis, we find that stability change (ΔΔG) of phosphorylations and phosphomimetic mutations can be predicted from structural features. We identify protein circuit topology as a tool to further prioritize functional phosphorylations in cancer. We propose adding destabilization of tumor suppressors due to phosphorylation to the repertoire of effects with potential to affect cancer progression. We prioritize predicted destabilizing phosphorylations for further experimental study.
Limitations of the study
The predictive accuracy of the ML models, while useful for screening, is inherently constrained by the available training data. A primary limitation of this study is that our model’s calibration relies heavily on the phosphomimetic assumption, a strategy necessitated by the overall paucity of large-scale experimental data for the stability effects of direct phosphorylation. While we attempted to validate this approach using an external dataset of actual phosphorylation events, we acknowledge that this validation is also constrained by the available data. This data scarcity is particularly acute for phosphotyrosine, for which there are few high-confidence stability measurements. Therefore, while our model demonstrates a promising ability to predict the effects of true phosphorylation, its accuracy is expected to improve as larger and more diverse experimental datasets become available for future training and refinement.
Resource availability
Lead contact
Requests for further information and resources should be directed to and will be fulfilled by the lead contact, Sriram Chandrasekaran (csriram@umich.edu).
Materials availability
This study did not generate new unique reagents.
Data and code availability
• This paper analyzes existing, publicly available data. The details of all datasets used in this study are provided in the key resources table.
• All original code has been deposited at Zenodo and is publicly available at https://doi.org/10.5281/zenodo.16458054 and https://github.com/sriram-lab/phosphorylation as of the date of publication.
• Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
Acknowledgments
We thank Dr. Sumaiya Iqbal, Dr. Matthew O’Meara, and members of the Chandrasekaran lab for helpful discussions. This work was supported by faculty start-up funds from the University of Michigan (UM), the Camille and Henry Dreyfus Foundation, the Rogel Cancer Center at UM, and R35GM137795 from the NIH to S.C. J.W. is supported by the NIH Predoctoral and Postdoctoral Multidisciplinary Training Program in Benign Kidney, Urology, and Hematology Disease, University of Michigan (U2C/TL1 Training, grant no. U2CDK129445/TL1DK136046).
Author contributions
Conceptualization: J.W. and S.C.; methodology: J.W., J.R.B., Z.L., A.M.C., J.T., and R.B.; investigation: J.W., J.R.B., Z.L., A.M.C., J.T., and R.B.; visualization: J.W. and J.R.B.; supervision: A.M., S.P., S.C., and J.T.; writing – original draft: J.W. and J.R.B.; and writing – review & editing: S.C., J.W., J.R.B., A.M., and J.T.
Declaration of interests
The authors declare no competing interests.
STAR★Methods
Key resources table
Method details
Machine learning prediction of ΔΔG of phosphomimetic mutation
To predict the thermodynamic stability change (ΔΔG) resulting from protein phosphorylation, machine learning models were developed using phosphomimetic mutations as a proxy, thereby overcoming the sparsity of high-accuracy, direct phosphorylation stability data available for training. A feature set was generated for each mutation from a combination of structural features and energy calculations. Prior to analysis, PDB structures were aligned to the experimental protein sequence using the Needleman-Wunsch algorithm.68
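The alignment step can be illustrated with a minimal sketch of Needleman-Wunsch global alignment. The scoring values (match = 1, mismatch = -1, linear gap = -2) and the helper `map_positions` are assumptions chosen for illustration; the scoring scheme actually used for the structure-to-sequence alignment is not specified here.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def map_positions(aligned_a, aligned_b):
    """Map 0-based indices in sequence a to indices in sequence b (hypothetical helper)."""
    mapping, ia, ib = {}, 0, 0
    for ca, cb in zip(aligned_a, aligned_b):
        if ca != "-" and cb != "-":
            mapping[ia] = ib
        ia += ca != "-"
        ib += cb != "-"
    return mapping
```

A mapping of this kind lets a residue number in the PDB structure be translated to the corresponding position in the experimental sequence, so that features are computed for the intended residue.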
The FoldX software suite (foldx_20251231.exe) was used to perform energy calculations.26,62,63 Each wild-type PDB structure was first prepared using the RepairPDB command to correct potential structural issues, and was subsequently processed with the Optimize command to minimize its conformational energy. Following this preparation, the BuildModel command was used to introduce the specified phosphomimetic mutations into the optimized wild-type structures. During the BuildModel step, water molecules were ignored, and the simulation was set to pH = 7.3 and ionic strength = 0.15.
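The FoldX steps described above could be scripted roughly as follows. This is a minimal sketch rather than the pipeline used in this study: the file names, the chain identifier, and both helper functions are hypothetical, and the option names (`--mutant-file`, `--pH`, `--ionStrength`, `--water`) are assumptions based on the FoldX command-line interface that should be checked against the documentation of the installed version. The sketch only assembles the commands and does not run FoldX.

```python
def foldx_mutation_code(wild_type, chain, resnum, mutant="E"):
    """Format one mutation as wild-type residue, chain, position, mutant residue."""
    return f"{wild_type}{chain}{resnum}{mutant};"


def foldx_args(command, pdb, executable="foldx", **options):
    """Build an argument list for a FoldX call (assumed option syntax)."""
    args = [executable, f"--command={command}", f"--pdb={pdb}"]
    args += [f"--{key}={value}" for key, value in options.items()]
    return args


# Placeholder file names; the real output names depend on the FoldX version.
repair = foldx_args("RepairPDB", "protein.pdb")
optimize = foldx_args("Optimize", "protein_Repair.pdb")
build = foldx_args(
    "BuildModel",
    "Optimized_protein_Repair.pdb",
    **{
        "mutant-file": "individual_list.txt",  # one mutation code per line
        "pH": 7.3,
        "ionStrength": 0.15,
        "water": "-IGNORE",  # assumed value for ignoring water molecules
    },
)

# A tyrosine-to-glutamate phosphomimetic on a hypothetical chain A
mutation_line = foldx_mutation_code("Y", "A", 580)
```

Each argument list could then be passed to `subprocess.run`, with the mutation line written to the mutant file before the BuildModel call.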
Additional structural features were calculated using custom Python scripts. These features included: secondary structure context and relative SASA determined by DSSP66; the number of hydrophobic, polar, and total atoms within a 10 Å radius of the mutation; and a categorical position term (AltPosition). The AltPosition feature quantifies a residue’s location within its secondary structure element; for α-helices and turns, it is numbered from the N-terminus and capped at a value of 4 to capture electrostatic effects, while for β-sheets, it is numbered relative to the connecting turn. The classification of atoms as "hydrophobic" or "polar" was based on a predefined scheme derived from the YRB charge and hydrophobicity scale. Hydrophobic atoms were defined specifically as side-chain carbon atoms that are not directly bonded to a nitrogen or oxygen atom. The broader category of polar atoms comprised all main-chain nitrogen and carbonyl oxygen atoms, side-chain atoms with formal charges such as the NZ atom in Lysine, and other non-charged polar side-chain atoms including the hydroxyl oxygens in Serine (OG) and the sulfur atoms in Cysteine (SG).
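The atom-count features can be sketched as below. This is an illustrative simplification, not the custom scripts used in this study: the atom records (dictionaries with `name`, `element`, `xyz`, and `bonded` keys), the choice of a single center coordinate for the mutation site, and the treatment of all side-chain nitrogen and oxygen atoms as polar are assumptions made for the example.

```python
import math

BACKBONE_NAMES = {"N", "CA", "C", "O"}


def classify_atom(name, element, bonded_elements):
    """Return 'hydrophobic', 'polar', or 'other' for one heavy atom."""
    is_side_chain = name not in BACKBONE_NAMES
    # Hydrophobic: side-chain carbon not directly bonded to N or O
    if element == "C" and is_side_chain and not {"N", "O"} & set(bonded_elements):
        return "hydrophobic"
    # Polar: main-chain N and carbonyl O, side-chain N/O, and cysteine SG
    if element in ("N", "O") or (element == "S" and name == "SG"):
        return "polar"
    return "other"


def count_neighbors(center, atoms, radius=10.0):
    """Count hydrophobic, polar, and total atoms within `radius` (in angstroms) of `center`."""
    counts = {"hydrophobic": 0, "polar": 0, "total": 0}
    for atom in atoms:
        if math.dist(center, atom["xyz"]) <= radius:
            counts["total"] += 1
            kind = classify_atom(atom["name"], atom["element"], atom["bonded"])
            if kind != "other":
                counts[kind] += 1
    return counts


# Toy input: coordinates are arbitrary and only illustrate the bookkeeping.
example_atoms = [
    {"name": "CA", "element": "C", "xyz": (1.0, 0.0, 0.0), "bonded": ["N", "C", "C"]},
    {"name": "CB", "element": "C", "xyz": (3.0, 0.0, 0.0), "bonded": ["C", "C"]},
    {"name": "OG", "element": "O", "xyz": (0.0, 5.0, 0.0), "bonded": ["C"]},
    {"name": "CB", "element": "C", "xyz": (20.0, 0.0, 0.0), "bonded": ["C", "C"]},
]
example_counts = count_neighbors((0.0, 0.0, 0.0), example_atoms)
```

In practice, the bonded-element lists would come from residue templates or a structure-parsing library, and the center would be a defined atom of the mutated residue.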
The machine learning model was built in Python using the CatBoost gradient boosting framework.28 A 5-fold cross-validation scheme was employed for development, with data splits performed at the protein level to prevent data leakage between folds. Hyperparameter tuning was performed using the Optuna framework over 10 trials to identify the optimal hyperparameters.65
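Splitting at the protein level means that all mutations from one protein fall in the same fold, so no protein contributes to both training and validation within a split. The following is a minimal standard-library sketch of such a grouped split; the function name, the round-robin assignment, and the random seed are assumptions for illustration, and the CatBoost training and Optuna tuning steps are omitted.

```python
import random
from collections import defaultdict


def protein_level_folds(protein_ids, n_folds=5, seed=0):
    """Return n_folds lists of sample indices; each protein appears in exactly one fold."""
    proteins = sorted(set(protein_ids))
    random.Random(seed).shuffle(proteins)
    # Assign whole proteins to folds in round-robin order after shuffling
    fold_of = {protein: k % n_folds for k, protein in enumerate(proteins)}
    folds = defaultdict(list)
    for index, protein in enumerate(protein_ids):
        folds[fold_of[protein]].append(index)
    return [folds[k] for k in range(n_folds)]


def train_validation_splits(protein_ids, n_folds=5, seed=0):
    """Yield (train_indices, validation_indices) pairs for cross-validation."""
    folds = protein_level_folds(protein_ids, n_folds, seed)
    for k, validation in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, validation
```

Library implementations of grouped cross-validation, such as `GroupKFold` in scikit-learn, provide the same guarantee and additionally balance fold sizes.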
The final model’s performance was assessed using two distinct validation sets. The first was an internal validation set, which consisted of a 20% hold-out portion of the phosphomimetic training data, with the split performed at the protein level to ensure generalization to unseen proteins. The second was a critical external validation set, which was composed of high-accuracy experimental biophysical measurements from model protein systems with actual (not phosphomimetic) phosphorylation.
Application to the pan-cancer dataset
To apply our approach in the context of cancer phosphorylation, we applied our machine learning predictions to a recent pan-cancer dataset.16 Sequences flanking the phosphorylation site were aligned to FASTA sequences using the Needleman-Wunsch algorithm, with gap and extension parameters altered to promote proper alignment; we found that 21% of aligned residue numbers did not conform to the numbering system employed in the publication, and these sites were discarded. Phosphorylations were considered individually in cases where multiple phosphorylations occurred in the same protein. AlphaFold2 structures were referenced.35 Features were extracted as above. In addition, direct phosphorylation modeling, or phosphorylation performed with the FoldX energy function, was carried out in FoldX, and the correlation between the two methods of ΔΔG determination was assessed.
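The numbering consistency check can be illustrated with a simplified sketch. Here an exact substring search stands in for the Needleman-Wunsch alignment described above, which is a deliberate simplification; the function name, its arguments, and the toy sequence are hypothetical.

```python
def check_site(protein_seq, flank, reported_pos, site_offset, residue="Y"):
    """Locate a phosphosite through its flanking peptide.

    Returns (found_position, is_consistent), where positions are 1-based and
    site_offset is the 0-based index of the phosphosite within `flank`.
    Only the first occurrence of the flanking peptide is considered.
    """
    start = protein_seq.find(flank)
    if start == -1:
        return None, False
    found_pos = start + site_offset + 1
    is_expected_residue = protein_seq[found_pos - 1] == residue
    return found_pos, is_expected_residue and found_pos == reported_pos


toy_sequence = "MKTAYIAKQRQISFVKSHFSRQ"
consistent = check_site(toy_sequence, "KTAYIAK", reported_pos=5, site_offset=3)
inconsistent = check_site(toy_sequence, "KTAYIAK", reported_pos=6, site_offset=3)
```

Sites for which the located position disagrees with the reported residue number would be discarded, mirroring the filtering described above.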
Metabolic gene centralities
We sought to determine whether certain types of network locations were prone to harbor destabilizing phosphorylation. Degree, betweenness, closeness, and PageRank of metabolic reactions were extracted from Smith et al.,21 based on the Recon1 Genome Scale Metabolic Model (GEM).69 For genes that catalyzed multiple reactions, centralities were averaged over reactions for the gene of interest. Boxplots of centrality were generated for proteins predicted to be destabilized by greater than 1 kcal/mol and for those that were not.
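The averaging and grouping steps can be sketched as follows. The dictionary-based inputs, the function names, and the toy values are assumptions for illustration only and do not reproduce the data from Smith et al.

```python
from statistics import mean


def gene_centrality(reaction_centrality, gene_to_reactions):
    """Average a reaction-level centrality over all reactions catalyzed by each gene."""
    averaged = {}
    for gene, reactions in gene_to_reactions.items():
        values = [reaction_centrality[r] for r in reactions if r in reaction_centrality]
        if values:  # skip genes with no reaction present in the network
            averaged[gene] = mean(values)
    return averaged


def split_by_destabilization(gene_ddg, centrality, threshold=1.0):
    """Group gene centralities by whether predicted ddG exceeds the threshold (kcal/mol)."""
    destabilized, other = [], []
    for gene, ddg in gene_ddg.items():
        if gene in centrality:
            (destabilized if ddg > threshold else other).append(centrality[gene])
    return destabilized, other


# Toy values for illustration only
toy_centrality = gene_centrality(
    {"R1": 2.0, "R2": 4.0, "R3": 1.0},
    {"G1": ["R1", "R2"], "G2": ["R3"], "G3": ["R9"]},
)
toy_groups = split_by_destabilization({"G1": 1.5, "G2": 0.2}, toy_centrality)
```

The two returned lists correspond to the two groups compared in each boxplot.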
Circuit topology of cancer-associated phosphoproteins
We aimed to determine whether proteins with a role in cancer can be distinguished from other proteins on the basis of topological parameters from circuit topology. Briefly, circuit topology describes pairs of contacts by parallel, inverse parallel, series, and cross relations.40 Local circuit topology is the number of contacts with a given relation to any contact formed with a residue of interest (e.g., the phosphorylated residue; Figure S5). We extracted local circuit topology using a state-of-the-art, published program,67 for all proteins in the pan-cancer dataset, and we labeled each protein as to whether it has a CancerMine annotation of (separately) oncogene, tumor suppressor, or driver status, or oncogene and tumor suppressor combined (e.g., in different cancer subtypes).61 Distributions for proteins with and without a role in cancer were determined for each category and compared using a two-tailed t test.
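The counting of local circuit topology relations can be illustrated with a minimal sketch. This is not the published program used in this study: contacts are represented as pairs of residue indices, the direction convention (a contact nested inside the reference contact is "parallel", and a contact enclosing the reference contact is "inverse parallel") is an assumption, and contact pairs that share a residue are left uncounted because conventions for them vary.

```python
from collections import Counter


def relation(a, b):
    """Relation of contact b relative to contact a; each contact is a pair of residue indices."""
    i, j = sorted(a)
    k, l = sorted(b)
    if j < k or l < i:
        return "series"            # intervals do not overlap
    if i < k and l < j:
        return "parallel"          # b is nested inside a (assumed convention)
    if k < i and j < l:
        return "inverse_parallel"  # a is nested inside b (assumed convention)
    if i < k < j < l or k < i < l < j:
        return "cross"             # intervals partially overlap
    return "shared"                # contacts share an endpoint residue


def local_circuit_topology(contacts, residue):
    """Count relations between contacts formed by `residue` and all other contacts."""
    counts = Counter()
    for a in contacts:
        if residue not in a:
            continue
        for b in contacts:
            if b != a:
                counts[relation(a, b)] += 1
    counts.pop("shared", None)
    return counts


toy_contacts = [(2, 10), (4, 6), (1, 12), (8, 14), (20, 25)]
toy_counts = local_circuit_topology(toy_contacts, residue=2)
```

In this toy example, residue 2 forms one contact, and the four remaining contacts each stand in a different relation to it.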
| Cancer-related genes and comparison group | Local P | Local IP | Local X | Local S | ΔΔG |
|---|---|---|---|---|---|
| Oncogene (298) to non-oncogene (186) | 0.05 | 0.01 | NS | NS | NS |
| Tumor Suppressor (187) to non-tumor suppressor (297) | NS | 5 × 10⁻⁴ | NS | NS | 0.04 |
| Driver (144) to non-driver (340) | NS | 0.002 | NS | NS | NS |
| Tumor Suppressor & Oncogene (164) to other (320) | NS | 5 × 10⁻⁴ | 0.05 | NS | 0.02 |
Significance (p values) of local circuit topological relations and ΔΔG for proteins with a specific role in cancer vs. other proteins, based on AlphaFold structure predictions of phosphorylations from Geffen et al. and CancerMine annotation. P, parallel; IP, inverse parallel; X, cross; S, series; NS, not significant. Related to STAR Methods.
Quantification and statistical analysis
All statistical analyses were conducted using GraphPad Prism (Version 10.4.1, GraphPad Software) and custom scripts written in Python. The specific statistical tests used are detailed in the corresponding figure legends and results sections, and include Pearson correlation coefficients, two-tailed T-tests, and Wilcoxon tests. Significance was defined as a p-value less than 0.05 or a False Discovery Rate (FDR) of less than 0.1. The exact value of n is indicated for each experiment in the figures. Data are presented as mean ± SD. For the development of machine learning models, a 5-fold cross-validation scheme was employed where data splits were performed at the protein level to prevent data leakage. No specific methods were used for sample size estimation. Details regarding the inclusion and exclusion of data can be found in the relevant methods sections.
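The two significance criteria can be sketched in code. The procedure used to compute the FDR is not specified above, so the Benjamini-Hochberg adjustment shown here is an assumption, as are the function names and the example p values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity of adjusted values
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def is_significant(p_value=None, fdr=None, p_cutoff=0.05, fdr_cutoff=0.1):
    """Apply the stated thresholds: p < 0.05 or FDR < 0.1."""
    by_p = p_value is not None and p_value < p_cutoff
    by_fdr = fdr is not None and fdr < fdr_cutoff
    return by_p or by_fdr


example_q = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

For the four example p values, the adjusted values are 0.04, 0.0533, 0.0533, and 0.20 (rounded), so the first three would pass an FDR cutoff of 0.1.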
Published: September 15, 2025
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.crmeth.2025.101169.
Supplemental information
Contains stability predictions from the full method.
Contains stability predictions from the full method.
Contains stability predictions from the rapid method.
References
- 1. Peng Y., Liu J., Inuzuka H., Wei W. Targeted protein posttranslational modifications by chemically induced proximity for cancer therapy. J. Biol. Chem. 2023;299. doi: 10.1016/j.jbc.2023.104572.
- 2. Reimand J., Bader G.D. Systematic analysis of somatic mutations in phosphorylation signaling predicts novel cancer drivers. Mol. Syst. Biol. 2013;9:637. doi: 10.1038/msb.2012.68.
- 3. Hanahan D., Weinberg R.A. The hallmarks of cancer. Cell. 2000;100:57–70. doi: 10.1016/s0092-8674(00)81683-9.
- 4. Kaneko T., Joshi R., Feller S.M., Li S.S. Phosphotyrosine recognition domains: the typical, the atypical and the versatile. Cell Commun. Signal. 2012;10:32. doi: 10.1186/1478-811X-10-32.
- 5. Blume-Jensen P., Hunter T. Oncogenic kinase signalling. Nature. 2001;411:355–365. doi: 10.1038/35077225.
- 6. Potel C.M., Kurzawa N., Becher I., Typas A., Mateus A., Savitski M.M. Impact of phosphorylation on thermal stability of proteins. Nat. Methods. 2021;18:757–759. doi: 10.1038/s41592-021-01177-5.
- 7. Smith I.R., Hess K.N., Bakhtina A.A., Valente A.S., Rodríguez-Mias R.A., Villén J. Identification of phosphosites that alter protein thermal stability. Nat. Methods. 2021;18:760–762. doi: 10.1038/s41592-021-01178-4.
- 8. Pullen K., Rajagopal P., Branchini B.R., Huffine M.E., Reizer J., Saier M.H., Jr., Scholtz J.M., Klevit R.E. Phosphorylation of serine-46 in HPr, a key regulatory protein in bacteria, results in stabilization of its solution structure. Protein Sci. 1995;4:2478–2486. doi: 10.1002/pro.5560041204.
- 9. Szilak L., Moitra J., Krylov D., Vinson C. Phosphorylation destabilizes alpha-helices. Nat. Struct. Biol. 1997;4:112–114. doi: 10.1038/nsb0297-112.
- 10. Signarvic R.S., DeGrado W.F. De novo design of a molecular switch: phosphorylation-dependent association of designed peptides. J. Mol. Biol. 2003;334:1–12. doi: 10.1016/j.jmb.2003.09.041.
- 11. Andrew C.D., Warwicker J., Jones G.R., Doig A.J. Effect of phosphorylation on alpha-helix stability as a function of position. Biochemistry. 2002;41:1897–1905. doi: 10.1021/bi0113216.
- 12. Riemen A.J., Waters M.L. Dueling post-translational modifications trigger folding and unfolding of a beta-hairpin peptide. J. Am. Chem. Soc. 2010;132:9007–9013. doi: 10.1021/ja101079z.
- 13. Riemen A.J., Waters M.L. Positional effects of phosphoserine on beta-hairpin stability. Org. Biomol. Chem. 2010;8:5411–5417. doi: 10.1039/c0ob00202j.
- 14. Riemen A.J., Waters M.L. Controlling peptide folding with repulsive interactions between phosphorylated amino acids and tryptophan. J. Am. Chem. Soc. 2009;131:14081–14087. doi: 10.1021/ja9047575.
- 15. Bolduc D., Rahdar M., Tu-Sekine B., Sivakumaren S.C., Raben D., Amzel L.M., Devreotes P., Gabelli S.B., Cole P. Phosphorylation-mediated PTEN conformational closure and deactivation revealed with protein semisynthesis. eLife. 2013;2. doi: 10.7554/eLife.00691.
- 16. Geffen Y., Anand S., Akiyama Y., Yaron T.M., Song Y., Johnson J.L., Govindan A., Babur Ö., Li Y., Huntsman E., et al. Pan-cancer analysis of post-translational modifications reveals shared patterns of protein regulation. Cell. 2023;186:3945–3967. doi: 10.1016/j.cell.2023.07.013.
- 17. Gasparotto D., Zanon A., Bonaldo V., Marchiori E., Casagranda M., Di Domenico E., Copat L., Asquini T.F., Rigoli M., Feltrin S.V., et al. Mapping Cryptic Phosphorylation Sites in the Human Proteome. Preprint at bioRxiv. 2025. doi: 10.1101/2024.12.03.626562.
- 18. Gong E.Y., Hernández B., Nielsen J.H., Smits V.A.J., Freire R., Gillespie D.A. Chk1 KA1 domain auto-phosphorylation stimulates biological activity and is linked to rapid proteasomal degradation. Sci. Rep. 2018;8. doi: 10.1038/s41598-018-35616-9.
- 19. Jimenez J.L., Hegemann B., Hutchins J.R., Peters J.M., Durbin R. A systematic comparative and structural analysis of protein phosphorylation sites based on the mtcPTM database. Genome Biol. 2007;8:R90. doi: 10.1186/gb-2007-8-5-r90.
- 20. Xiao Q., Miao B., Bi J., Wang Z., Li Y. Prioritizing functional phosphorylation sites based on multiple feature integration. Sci. Rep. 2016;6. doi: 10.1038/srep24735.
- 21. Smith K., Shen F., Lee H.J., Chandrasekaran S. Metabolic signatures of regulation by phosphorylation and acetylation. iScience. 2022;25. doi: 10.1016/j.isci.2021.103730.
- 22. English N., Torres M. In: Computational Methods for Predicting Post-Translational Modification Sites. Kc D.B., editor. Springer US; 2022. Enhancing the Discovery of Functional Post-Translational Modification Sites with Machine Learning Models – Development, Validation, and Interpretation; pp. 221–260.
- 23. Sora V., Laspiur A.O., Degn K., Arnaudi M., Utichi M., Beltrame L., De Menezes D., Orlandi M., Stoltze U.K., Rigina O., et al. RosettaDDGPrediction for high-throughput mutational scans: From stability to binding. Protein Sci. 2023;32. doi: 10.1002/pro.4527.
- 24. Tsuboyama K., Dauparas J., Chen J., Laine E., Mohseni Behbahani Y., Weinstein J.J., Mangan N.M., Ovchinnikov S., Rocklin G.J. Mega-scale experimental analysis of protein folding stability in biology and design. Nature. 2023;620:434–444. doi: 10.1038/s41586-023-06328-6.
- 25. Perez-Mejias G., Velazquez-Cruz A., Guerra-Castellano A., Banos-Jaime B., Diaz-Quintana A., Gonzalez-Arzola K., Angel De la Rosa M., Diaz-Moreno I. Exploring protein phosphorylation by combining computational approaches and biochemical methods. Comput. Struct. Biotechnol. J. 2020;18:1852–1863. doi: 10.1016/j.csbj.2020.06.043.
- 26. Schymkowitz J., Borg J., Stricher F., Nys R., Rousseau F., Serrano L. The FoldX web server: an online force field. Nucleic Acids Res. 2005;33:W382–W388. doi: 10.1093/nar/gki387.
- 27. Pires D.E.V., Ascher D.B., Blundell T.L. mCSM: predicting the effects of mutations in proteins using graph-based signatures. Bioinformatics. 2014;30:335–342. doi: 10.1093/bioinformatics/btt691.
- 28. Prokhorenkova L., Gusev G., Vorobev A., Dorogush A.V., Gulin A. CatBoost: unbiased boosting with categorical features. Preprint at arXiv. 2017. doi: 10.48550/arXiv.1706.09516.
- 29. Pandey A.K., Ganguly H.K., Sinha S.K., Daniels K.E., Yap G.P.A., Patel S., Zondlo N.J. An Inherent Difference between Serine and Threonine Phosphorylation: Phosphothreonine Strongly Prefers a Highly Ordered, Compact, Cyclic Conformation. ACS Chem. Biol. 2023;18:1938–1958. doi: 10.1021/acschembio.3c00068.
- 30. Ramasamy P., Vandermarliere E., Vranken W.F., Martens L. Panoramic Perspective on Human Phosphosites. J. Proteome Res. 2022;21:1894–1915. doi: 10.1021/acs.jproteome.2c00164.
- 31. Pearce R., Huang X., Setiawan D., Zhang Y. EvoDesign: Designing Protein-Protein Binding Interactions Using Evolutionary Interface Profiles in Conjunction with an Optimized Physical Energy Function. J. Mol. Biol. 2019;431:2467–2476. doi: 10.1016/j.jmb.2019.02.028.
- 32. Zhou Y., Pan Q., Pires D.E.V., Rodrigues C.H.M., Ascher D.B. DDMut: predicting effects of mutations on protein stability using deep learning. Nucleic Acids Res. 2023;51:W122–W128. doi: 10.1093/nar/gkad472.
- 33. Cheng J., Novati G., Pan J., Bycroft C., Žemgulytė A., Applebaum T., Pritzel A., Wong L.H., Zielinski M., Sargeant T., et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science. 2023;381. doi: 10.1126/science.adg7492.
- 34. Capra J.A., Singh M. Predicting functionally important residues from sequence conservation. Bioinformatics. 2007;23:1875–1882. doi: 10.1093/bioinformatics/btm270.
- 35. Jumper J., Evans R., Pritzel A., Green T., Figurnov M., Ronneberger O., Tunyasuvunakool K., Bates R., Žídek A., Potapenko A., et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596:583–589. doi: 10.1038/s41586-021-03819-2.
- 36. Hornbeck P.V., Kornhauser J.M., Latham V., Murray B., Nandhikonda V., Nord A., Skrzypek E., Wheeler T., Zhang B., Gnad F. 15 years of PhosphoSitePlus(R): integrating post-translationally modified sites, disease variants and isoforms. Nucleic Acids Res. 2019;47:D433–D441. doi: 10.1093/nar/gky1159.
- 37. Ge S.X., Jung D., Yao R. ShinyGO: a graphical gene-set enrichment tool for animals and plants. Bioinformatics. 2020;36:2628–2629. doi: 10.1093/bioinformatics/btz931.
- 38. Oruganty K., Campit S.E., Mamde S., Lyssiotis C.A., Chandrasekaran S. Common biochemical properties of metabolic genes recurrently dysregulated in tumors. Cancer Metab. 2020;8:5. doi: 10.1186/s40170-020-0211-1.
- 39. Gurumayum S., Jiang P., Hao X., Campos T.L., Young N.D., Korhonen P.K., Gasser R.B., Bork P., Zhao X.M., He L.J., Chen W.H. OGEE v3: Online GEne Essentiality database with increased coverage of organisms and human cell lines. Nucleic Acids Res. 2021;49:D998–D1003. doi: 10.1093/nar/gkaa884.
- 40. Mashaghi A., van Wijk R.J., Tans S.J. Circuit topology of proteins and nucleic acids. Structure. 2014;22:1227–1237. doi: 10.1016/j.str.2014.06.015.
- 41.Scalvini B., Sheikhhassani V., Mashaghi A. Topological principles of protein folding. Phys. Chem. Chem. Phys. 2021;23:21316–21328. doi: 10.1039/d1cp03390e. [DOI] [PubMed] [Google Scholar]
- 42.Heidari M., Schiessel H., Mashaghi A. Circuit Topology Analysis of Polymer Folding Reactions. ACS Cent. Sci. 2020;6:839–847. doi: 10.1021/acscentsci.0c00308. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Mugler A., Tans S.J., Mashaghi A. Circuit topology of self-interacting chains: implications for folding and unfolding dynamics. Phys. Chem. Chem. Phys. 2014;16:22537–22544. doi: 10.1039/c4cp03402c. [DOI] [PubMed] [Google Scholar]
- 44.Woodard J., Iqbal S., Mashaghi A. Circuit topology predicts pathogenicity of missense mutations. Proteins. 2022;90:1634–1644. doi: 10.1002/prot.26342. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Hayashi H., Nishioka Y., Kamohara S., Kanai F., Ishii K., Fukui Y., Shibasaki F., Takenawa T., Kido H., Katsunuma N., et al. The alpha-type 85-kDa subunit of phosphatidylinositol 3-kinase is phosphorylated at tyrosines 368, 580, and 607 by the insulin receptor. J. Biol. Chem. 1993;268:7107–7117. [PubMed] [Google Scholar]
- 46.Nussinov R., Zhang M., Tsai C.J., Jang H. Phosphorylation and Driver Mutations in PI3Kalpha and PTEN Autoinhibition. Mol. Cancer Res. 2021;19:543–548. doi: 10.1158/1541-7786.MCR-20-0818. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Khan I.U., Wallin R., Gupta R.S., Kammer G.M. Protein kinase A-catalyzed phosphorylation of heat shock protein 60 chaperone regulates its attachment to histone 2B in the T lymphocyte plasma membrane. Proc. Natl. Acad. Sci. USA. 1998;95:10425–10430. doi: 10.1073/pnas.95.18.10425. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Ganesan K., Kulandaisamy A., Binny Priya S., Gromiha M.M. HuVarBase: A human variant database with comprehensive information at gene and protein levels. PLoS One. 2019;14 doi: 10.1371/journal.pone.0210475.
- 49.Martinelli S., Nardozza A.P., Delle Vigne S., Sabetta G., Torreri P., Bocchinfuso G., Flex E., Venanzi S., Palleschi A., Gelb B.D., et al. Counteracting effects operating on Src homology 2 domain-containing protein-tyrosine phosphatase 2 (SHP2) function drive selection of the recurrent Y62D and Y63C substitutions in Noonan syndrome. J. Biol. Chem. 2012;287:27066–27077. doi: 10.1074/jbc.M112.350231.
- 50.Kliche J., Garvanska D.H., Simonetti L., Badgujar D., Dobritzsch D., Nilsson J., Davey N.E., Ivarsson Y. Large-scale phosphomimetic screening identifies phospho-modulated motif-based protein interactions. Mol. Syst. Biol. 2023;19 doi: 10.15252/msb.202211164.
- 51.Luwang J.W., Natesh R. Phosphomimetic Mutation Destabilizes the Central Core Domain of Human p53. IUBMB Life. 2018;70:1023–1031. doi: 10.1002/iub.1914.
- 52.Jonson P.H., Petersen S.B. A critical view on conservative mutations. Protein Eng. 2001;14:397–402. doi: 10.1093/protein/14.6.397.
- 53.Huang J.X., Lee G., Cavanaugh K.E., Chang J.W., Gardel M.L., Moellering R.E. High throughput discovery of functional protein modifications by Hotspot Thermal Profiling. Nat. Methods. 2019;16:894–901. doi: 10.1038/s41592-019-0499-3.
- 54.Peng Y., Alexov E. Investigating the linkage between disease-causing amino acid variants and their effect on protein stability and binding. Proteins. 2016;84:232–239. doi: 10.1002/prot.24968.
- 55.Brender J.R., Zhang Y. Predicting the Effect of Mutations on Protein-Protein Binding Interactions through Structure-Based Interface Profiles. PLoS Comput. Biol. 2015;11 doi: 10.1371/journal.pcbi.1004494.
- 56.Bradley D., Hogrebe A., Dandage R., Dubé A.K., Leutert M., Dionne U., Chang A., Villén J., Landry C.R. The fitness cost of spurious phosphorylation. EMBO J. 2024;43:4720–4751. doi: 10.1038/s44318-024-00200-7.
- 57.Incani F., Serra M., Meloni A., Cossu C., Saba L., Cabras T., Messana I., Rosatelli M.C. AIRE acetylation and deacetylation: effect on protein stability and transactivation activity. J. Biomed. Sci. 2014;21:85. doi: 10.1186/s12929-014-0085-z.
- 58.Narita T., Weinert B.T., Choudhary C. Functions and mechanisms of non-histone protein acetylation. Nat. Rev. Mol. Cell Biol. 2019;20:156–174. doi: 10.1038/s41580-018-0081-3.
- 59.Shimizu K., Gi M., Suzuki S., North B.J., Watahiki A., Fukumoto S., Asara J.M., Tokunaga F., Wei W., Inuzuka H. Interplay between protein acetylation and ubiquitination controls MCL1 protein stability. Cell Rep. 2021;37 doi: 10.1016/j.celrep.2021.109988.
- 60.Varadi M., Bertoni D., Magana P., Paramval U., Pidruchna I., Radhakrishnan M., Tsenkov M., Nair S., Mirdita M., Yeo J., et al. AlphaFold Protein Structure Database in 2024: providing structure coverage for over 214 million protein sequences. Nucleic Acids Res. 2024;52:D368–D375. doi: 10.1093/nar/gkad1011.
- 61.Lever J., Zhao E.Y., Grewal J., Jones M.R., Jones S.J.M. CancerMine: a literature-mined resource for drivers, oncogenes and tumor suppressors in cancer. Nat. Methods. 2019;16:505–507. doi: 10.1038/s41592-019-0422-y.
- 62.Guerois R., Nielsen J.E., Serrano L. Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J. Mol. Biol. 2002;320:369–387. doi: 10.1016/S0022-2836(02)00442-4.
- 63.Delgado J., Radusky L.G., Cianferoni D., Serrano L. FoldX 5.0: working with RNA, small molecules and a new graphical interface. Bioinformatics. 2019;35:4168–4169. doi: 10.1093/bioinformatics/btz184.
- 64.Cock P.J., Antao T., Chang J.T., Chapman B.A., Cox C.J., Dalke A., Friedberg I., Hamelryck T., Kauff F., Wilczynski B. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25:1422. doi: 10.1093/bioinformatics/btp163.
- 65.Akiba T., Sano S., Yanase T., Ohta T., Koyama M. Optuna: A Next-Generation Hyperparameter Optimization Framework. KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2019:2623–2631. doi: 10.1145/3292500.333070.
- 66.Kabsch W., Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers: Original Research on Biomolecules. 1983;22:2577–2637. doi: 10.1002/bip.360221211.
- 67.Moes D., Banijamali E., Sheikhhassani V., Scalvini B., Woodard J., Mashaghi A. ProteinCT: An implementation of the protein circuit topology framework. MethodsX. 2022;9 doi: 10.1016/j.mex.2022.101861.
- 68.Needleman S.B., Wunsch C.D. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 1970;48:443–453. doi: 10.1016/0022-2836(70)90057-4.
- 69.Rolfsson O., Palsson B.Ø., Thiele I. The human metabolic reconstruction Recon 1 directs hypotheses of novel human metabolic functions. BMC Syst. Biol. 2011;5:155. doi: 10.1186/1752-0509-5-155.
Associated Data
Supplementary Materials
Contains stability predictions from the full method.
Contains stability predictions from the full method.
Contains stability predictions from the rapid method.
Data Availability Statement
- This paper analyzes existing, publicly available data. The details of all datasets used in this study are provided in the key resources table.
- All original code has been deposited at Zenodo and is publicly available at https://doi.org/10.5281/zenodo.16458054 and https://github.com/sriram-lab/phosphorylation as of the date of publication.
- Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.






