Abstract
Kinases are pivotal cell signaling regulators and prominent drug targets. Short peptide substrates are widely used in kinase activity assays essential for investigating kinase biology and drug discovery. However, designing substrates with high activity and specificity remains challenging. Here, we present Subtimizer (substrate optimizer), a streamlined computational pipeline for structure-guided kinase peptide substrate design using AlphaFold-Multimer for structure modeling, ProteinMPNN for sequence design, and AlphaFold2-based interface evaluation. When the pipeline was applied to five kinases, designed peptides showed substantially improved activity (up to 350%) for four of them. Kinetic analyses revealed >2-fold reductions in Michaelis constant (Km), indicating improved enzyme-substrate affinity. Two designed peptides exhibited >5-fold improvement in selectivity. This study demonstrates AI-driven structure-guided protein design as an effective approach for developing potent and selective kinase substrates, facilitating assay development for drug discovery and functional investigation of the kinome.
1. Introduction
Protein kinases, comprising over 500 members, are central regulators of cellular processes including metabolism, signal transduction, cell growth, and differentiation [1–4]. These enzymes catalyze the transfer of a phosphate group from ATP to specific serine, threonine, or tyrosine residues on their target protein substrates, thereby modulating protein function and complex signaling pathways [4–6]. Consequently, the dysregulation of kinase activity is implicated in numerous diseases, including cancer, inflammatory disorders, and metabolic syndromes, making kinases important targets for both fundamental research and therapeutic intervention [2,7]. Over 85% of the human kinome is implicated in various diseases, establishing kinases as one of the most important and extensively pursued classes of drug targets [4,8,9].
This has led to the development and clinical success of numerous kinase inhibitors, with over 100 small-molecule inhibitors approved for clinical use [6,8]. However, these drugs target only about 10% of the human kinome, with the majority belonging to the tyrosine kinase family, leaving most kinases underexplored and underutilized in clinical contexts [8–11]. Robust kinase activity assays are indispensable tools for studying kinase biology and driving drug discovery efforts [12,13], yet more than 50% of the over 500 known kinases lack established high-throughput assays because the necessary tools, including optimal and assay-suitable substrates, are not available [3,7,14–16]. This lack of assays hinders research on these kinases and the development of potential therapeutics targeting them.
In practice, researchers often use short synthetic peptide substrates to quantify kinase activity, enabling the study of kinase function and inhibitor development [17–19]. Compared to full-length protein substrates, peptides offer several advantages, including ease of synthesis, purification, and storage. They are also more cost-effective and readily adaptable to various assay formats, including high-throughput screening [7,17–19]. However, most existing kinase substrates are promiscuous, which leads to high background and limited specificity and hampers accurate activity measurement, especially in complex mixtures like cell lysates or tissue extracts [7,18,20,21]. Traditional substrate discovery and optimization methods, such as positional scanning peptide libraries or phosphoproteomic profiling, are costly, time-consuming, and labor-intensive [15,19,22,23]. These methods often involve synthesizing libraries of peptides with systematic substitutions, followed by experimental determination of phosphorylation efficiency. While these approaches have been successful in identifying promiscuous substrates or minimal recognition motifs surrounding phosphosites, they are often limited by their inability to fully explore the vast sequence space beyond simple substitutions [15,23]. Thus, scalable, generalizable approaches to kinase substrate design and optimization are needed to meet the growing demand for tools to explore uncharacterized kinases, elucidate new mechanisms, and support kinase-targeted drug discovery [9].
Existing computational efforts have primarily focused on identifying phosphorylation sites on protein substrates [24,25], predicting kinase-protein substrate pairs [26], or predicting kinases responsible for known phosphosites [15,27]. Despite the long-standing interest and practical importance of designing optimal peptide substrates for kinases [28], computational design strategies have remained largely unexplored [24,25]. One notable existing design method is KINATEST-ID, developed to generate peptide substrates for use in chelation-enhanced fluorescence (CHEF) assays for tyrosine kinases [17]. However, KINATEST-ID’s reliance on kinase compatibility with phosphorylation-dependent lanthanide ion coordination (specifically terbium, Tb3+) limits its generalizability to diverse assay formats beyond Tb3+-sensitization CHEF assays [17].
Recent advances in AI-based protein modeling methods, propelled by the groundbreaking achievement of AlphaFold2 (AF2) for protein structure prediction [29] and followed by related methods including RoseTTAFold [30], ESMFold [31], and AF-Multimer [32], have revolutionized structural biology and protein engineering [33–35]. Additionally, the field of protein sequence design (predicting amino acid sequences that fold into a given protein structure) has seen a rapid emergence of innovative AI models with exceptional performance, such as ProteinMPNN [36], ESM-IF1 [37], and ABACUS-R [38]. The transformative impact of these advances was recognized with the 2024 Nobel Prize in Chemistry awarded to David Baker for computational protein design and jointly to Demis Hassabis and John Jumper for protein structure prediction [39–41].
In this study, we define the design of optimal peptide substrates for kinases as a protein design problem and present Subtimizer, a pipeline that integrates Nobel Prize-winning AI-based protein design and structure prediction methods. We previously demonstrated that established protein design tools, such as the ABACUS2 statistical energy function [42,43] and the RosettaDesign physics-based method [44,45], can reprogram protease-substrate interactions [46]. We hypothesized that recent advances in AI-based protein modeling and design would overcome the limitations of physics-based and statistically learned empirical energy functions in selectively optimizing enzyme-substrate interactions [46]. To this end, we describe a streamlined structure-guided kinase substrate design workflow that utilizes AF-Multimer to predict the structure of a kinase in complex with a starting peptide substrate, ProteinMPNN for sequence optimization, and AF2-based interface prediction and evaluation [47]. As a proof of concept, we optimized known but suboptimal peptide substrates for a set of kinases, and the resulting designed peptides were experimentally evaluated.
2. Results
2.1. An AI-driven computational pipeline for designing optimal kinase peptide substrates
The Subtimizer pipeline (Figure 1A) begins with the amino acid sequences of a target kinase and a starting peptide substrate, which could be a known, literature-reported, or kinase family-related substrate sourced from a phosphoproteomic database. Multiple 3D structure models of the kinase-peptide complex are then predicted using AF-Multimer. Given that the accuracy of the predicted complex structure is critical for the success of subsequent design steps, we implemented a filtering step based on the interface Predicted Template Modeling (ipTM) score. The top 5 complexes with ipTM > 0.75, indicating high-confidence prediction of the kinase-peptide interface, were retained for further processing. The predicted complex structures serve as input for the fixed-backbone ProteinMPNN design step, where novel peptide sequences are generated on the backbone of the peptide in each complex. The amino acid identity of the phosphosite (Ser/Thr/Tyr) and the kinase sequence are fixed. The goal is to generate novel sequences predicted to optimize the interaction of the complex, with the assumption that improved binding would potentially enhance catalysis.
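The ipTM-based filtering step can be illustrated with a minimal sketch. The `ComplexModel` container, the function name, and the example scores are hypothetical and are not taken from the Subtimizer code; only the ipTM > 0.75 cutoff and the top-5 limit come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComplexModel:
    """Hypothetical record for one predicted kinase-peptide complex."""
    name: str
    iptm: float


def select_confident_models(models, iptm_cutoff=0.75, top_n=5):
    """Keep models with ipTM strictly above the cutoff, best first, at most top_n."""
    confident = [m for m in models if m.iptm > iptm_cutoff]
    confident.sort(key=lambda m: m.iptm, reverse=True)
    return confident[:top_n]


# Illustrative ipTM values only (not data from the study).
example_scores = [0.91, 0.62, 0.80, 0.77, 0.75, 0.88, 0.83, 0.79]
models = [ComplexModel(f"model_{i}", s) for i, s in enumerate(example_scores)]
selected = select_confident_models(models)
print([m.name for m in selected])
```

Note that a model scoring exactly 0.75 is excluded here because the text states a strict inequality.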
Figure 1:
Computational workflow and validation of the Subtimizer pipeline for kinase peptide substrate design. (A) Schematic overview of the Subtimizer computational pipeline integrating AlphaFold-Multimer, ProteinMPNN, and AlphaFold2-based structural evaluation for kinase substrate optimization. (B) Percentage of parental substrate sequence recovered by designed peptides across representative kinase-substrate pairs, demonstrating the pipeline’s ability to capture known functional features. (C) Sequence logos showing amino acid preferences at each position for designed peptides compared to parental substrates for selected kinase-substrate pairs. (D) Distribution of interface predicted aligned error (ipAE) scores for designed peptides (colored dots) versus parental substrates (bigger red dots) across all tested kinases, with lower scores indicating better predicted binding (interface) quality. (E) Correlation plots between ipAE scores and combined pTM and ipTM confidence metrics (0.2 · pTM + 0.8 · ipTM) for representative kinase-substrate pairs, with peptide confidence (pLDDT) shown by color gradient. In each plot, the starting peptide is shown as a starred dot. Encircled dots are the highest confidence designed peptides from which sequences were selected for experimental testing.
The newly generated peptide sequences are then paired with the kinase sequence and subjected to a second round of kinase-peptide structure modeling using AF-Multimer. The modeled complexes are then evaluated for kinase-peptide interface and binding quality using a modified version of AF2 (AF2 with initial guess) described by Bennett et al. (2023) [47]. Designed peptides with average predicted aligned error of interchain residue pairs (ipAE) score ≤ 10 are considered high-confidence binders [47]. When ΔipAE > 0 relative to the parent peptide, they are considered computationally improved. Additional evaluation metrics utilized include peptide pLDDT (predicted local distance difference test) [29] and a weighted sum of pTM and interface pTM (0.2 · pTM + 0.8 · ipTM) [32].
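The evaluation criteria above can be sketched as follows. The function names are hypothetical, and the sign convention for ΔipAE is an assumption: because lower ipAE indicates a better interface, the sketch defines ΔipAE as the parent ipAE minus the design ipAE, so that a positive value means the design scores better than the parent. The text does not state the convention explicitly.

```python
def combined_confidence(ptm, iptm):
    """Weighted confidence metric from the text: 0.2 * pTM + 0.8 * ipTM."""
    return 0.2 * ptm + 0.8 * iptm


def classify_design(design_ipae, parent_ipae, ipae_cutoff=10.0):
    """Flag a designed peptide as a high-confidence binder and/or improved.

    ASSUMPTION: delta_ipae = parent - design, so delta_ipae > 0 means the
    design has a lower (better) ipAE than the parent peptide.
    """
    delta_ipae = parent_ipae - design_ipae
    return {
        "high_confidence": design_ipae <= ipae_cutoff,
        "improved": delta_ipae > 0,
        "delta_ipae": delta_ipae,
    }


# Illustrative scores only (not data from the study).
print(combined_confidence(ptm=0.9, iptm=0.8))
print(classify_design(design_ipae=8.5, parent_ipae=12.0))
```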
To validate this workflow, we applied it to 47 kinase-substrate pairs experimentally validated in-house. This set comprised 25 unique kinases, each paired with one or more validated but suboptimal peptide substrates. We evaluated how well the Subtimizer workflow recovers residues present in the experimentally validated substrates. Figure 1B shows the percentage substrate recovery for a select set of kinases (see Figure S1 for all pairs). Figure 1C shows sequence logos depicting amino acid preferences for the designed peptides for a few representative kinase-substrate pairs (see Figure S2 for all pairs). Predictions for over half of the evaluated kinase-peptide pairs successfully recovered at least 70% (and up to 90% in some cases) of the residues in the validated substrates (Figures S1 and S2), demonstrating the pipeline’s ability to generate sequences with features consistent with known substrates.
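As an illustration of the recovery metric, the sketch below computes the percentage of positions at which a designed peptide matches its parent. It assumes a simple position-by-position identity over equal-length sequences and counts the fixed phosphosite as a match; the exact definition used for Figure 1B may differ. The example pair is the ALK parent peptide and the alk-s2 design reported in Section 2.2.

```python
def percent_recovery(parent, designed):
    """Percentage of positions where the designed residue equals the parent residue.

    ASSUMPTION: sequences are aligned position by position with no gaps,
    and the fixed phosphosite is counted like any other position.
    """
    if len(parent) != len(designed):
        raise ValueError("parent and designed peptides must have equal length")
    matches = sum(p == d for p, d in zip(parent, designed))
    return 100.0 * matches / len(parent)


recovery = percent_recovery("KSRGDYMTMQIG", "PSPEDYMWMEIG")
print(f"{recovery:.1f}% of parent residues recovered")
```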
For each kinase-peptide pair, structural evaluation metrics, including ipTM and ipAE scores, were calculated using AF-Multimer and AF2 with initial guess [47]. The distribution of ipAE scores for representative kinase complexes of the designed and parental peptides is shown in Figure 1D (see Figure S3 for all kinase-peptide pairs). For all 47 kinase-peptide pairs except SGK1–1aktide, there are designed peptide complexes with ipAE lower than that of the parent peptide (Figures 1D and S3). In addition, most kinases have designed peptides with ipAE < 10 for at least one substrate type. As shown in Figures 1E and S4, lower ipAE scores correlate with high ipTM and pLDDT scores, indicating that these metrics are suitable for evaluating prediction confidence. These structural evaluation metrics were used to filter the designed sequences down to a smaller set of high-confidence designs to be prioritized for experimental testing.
2.2. Designed Peptide Substrates Showed Improved Potency In Vitro
To experimentally evaluate the computationally optimized peptide sequences, we chose five kinases based on commercial availability. Using the ipAE, pLDDT, and ipTM scores, we ranked the Subtimizer-designed sequences, and for each kinase we selected two to five peptides for synthesis and experimental testing. We evaluated the activity of the peptides using the ADP-Glo kinase assay, a luminescence-based method that quantifies ADP production as a measure of kinase activity. First, we measured the ADP-Glo signals for the parental peptide and the baseline (no enzyme background) for the five kinases. As shown in Figure 2A, we established that all five kinases are active under the assay conditions and that the signal generated from the phosphorylation reaction is significantly above background noise. The varying signal intensities across different kinases reflect differences in their specific activities and the efficiency of the parent peptide substrates.
Figure 2:
Experimental validation of designed peptides shows improved kinase activity. (A) ADP-Glo assay baseline measurements showing kinase activity with parental substrates and background controls for all five tested kinases. (B–F) Relative percentage change in kinase activity for designed peptides compared to parental substrates for ALK (B), MET (C), ROS1 (D), EGFR L858R (E), and SRC (F). Designed peptides are color-coded with sequence information provided. Phosphosite residues that were fixed during design are highlighted in green. Four out of five kinases showed substantial activity improvements with at least one designed peptide.
Figures 2B–F show, for ALK, MET, ROS1, EGFR, and SRC, respectively, the relative percentage change in kinase activity for the designed peptides compared to their respective parent peptides. As shown in Figure 2B, four out of five peptides designed for ALK showed improved activity compared to the parent peptide KSRGDYMTMQIG. Notably, peptide alk-s2 (PSPEDYMWMEIG) demonstrated a remarkable increase in activity, exceeding 300% relative to the parental axltide substrate. Alk-s5 (HSPEDYMWVEME) and alk-s4 (PDPDDYMLMEIG) also showed substantial improvements of 146% and 130%, respectively. For MET, two designed peptides met-s1 (EEPAYVALPAK) and met-s2 (EEPLYVALPAP) were tested as shown in Figure 2C. Compared to the parent peptide EEPLYWSFPAK, both met-s1 and met-s2 exhibited enhanced activity, showing relative improvements of approximately 41% and 11%, respectively.
For ROS1 (Figure 2D), only ros1-s2 (KEPFYMRLKPK) showed improved activity relative to the parent EEPLYWSFPAK, demonstrating an increase of about 35%. Evaluation of designed substrates for EGFR L858R (Figure 2E) revealed that two out of four peptides, egfr-s1 (EPEDYAPV) and egfr-s2 (EPEGYAPL), showed improved activity compared to the parent peptide KEEIYFFF. Egfr-s1 exhibited an increase in activity of about 10%, while egfr-s2 showed approximately 5% increase. Unlike ALK, MET, ROS1, and EGFR L858R, none of the five designed peptides tested for SRC activity (Figure 2F) showed better activity compared to the parent peptide. Rather, all designed SRC peptides exhibited reduced activity, with decreases ranging from approximately 57% to 80%. Overall, experimental validation using the ADP-Glo assay confirmed that the Subtimizer pipeline can design peptide substrates with substantially improved activity. The tests also showed the high success rate of the pipeline, as optimized peptides can be identified by experimentally testing as few as two peptides.
2.3. Improved Activity of Designed Peptides Correlates with Enhanced Binding Affinity
To further characterize the improved designed peptide substrates and gain insights into the mechanism underlying their enhanced activity, we determined the Michaelis constant (Km) for the most potent designed peptides and their corresponding parent substrates using the mass spectrometry-based LIMS-kinase assay previously developed in our group [16]. Figures 3A and 3B illustrate the schematic of a kinase assay reaction and the multiple reaction monitoring (MRM) phosphorylation detection system employed in the LIMS-kinase assay. The MRM setup comprises a RapidFire liquid chromatography sampler (Agilent) coupled to a triple quadrupole mass spectrometer (AB Sciex 6500). Quadrupole 1 (Q1) acts as a mass filter, selecting precursor ions based on their mass-to-charge ratios. Q2 functions as a collision cell, fragmenting these precursor ions into product ions. Q3 serves as an additional mass filter, specifically monitoring desired product ion fragments originating from the Q1-selected parent ion (Figure 3B).
Figure 3:

Kinetic characterization reveals improved binding affinity of designed peptides. (A) Schematic of kinase phosphorylation reaction monitored by the LIMS-kinase assay. (B) Multiple reaction monitoring (MRM) setup showing the mass spectrometry detection pathway for ADP quantification. (C–E) Michaelis-Menten kinetic analysis comparing parental and optimized peptides for ROS1 (C), MET (D), and ALK (E). Left panels show time-course data at varying substrate concentrations; right panels show saturation curves with fitted Km values and 95% confidence intervals. The designed peptides demonstrated lower Km values, indicating improved apparent binding affinity.
To determine kinase activity, we directly measured ADP production by following specific ADP ion fragments rather than quantifying phosphorylated peptide fragments as in our original published LIMS-kinase assay [16]. We chose to measure ADP production rather than the phosphorylated peptide product to avoid potential variations in ionization efficiency or fragmentation patterns that can arise from differences in amino acid sequences and physicochemical properties of different peptide substrates. This ensures not only a more direct and unbiased comparison of potency across different peptides, but also broad applicability of optimized substrates across different assay platforms. Initial tests confirmed that we were able to selectively detect an ADP fragment in kinase/peptide reactions (Figure S5). The ADP signal was enzyme concentration- and time-dependent, consistent with the expected response of a functional assay. We then optimized the LIMS-kinase assay for ROS1, MET, ALK, and EGFR L858R. We evaluated two enzyme concentrations with the objective of selecting the concentration where activity was linear to 60 minutes with a strong signal (Figure S5).
We ran LIMS-kinase assays using a range of peptide concentrations. We plotted the initial reaction rate for each concentration to generate Michaelis-Menten curves. The curves were used to calculate Km values for the parental and most potent Subtimizer-designed peptides for ROS1, MET, and ALK (Figures 3C–E, Figures S6–S8). We could not run a successful Km study with EGFR L858R due to the insolubility of the parental MS-csktide1 peptide at high concentrations, similar to our previous observation [16]. For ROS1 (Figure 3C, Figure S6), the parent peptide (MS-srctide) exhibited a Km of 47.6 μM while the optimized peptide ros1-s2 showed a lower Km of 20.3 μM. This more than two-fold reduction in Km indicates that the designed peptide has a substantially higher apparent affinity for ROS1 compared to the parent substrate.
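To make the Km estimation step concrete, the following dependency-free sketch recovers Km and Vmax from initial rates. The fitting procedure used in the study is not described here, so the sketch substitutes a Hanes-Woolf linearization ([S]/v plotted against [S], with slope 1/Vmax and intercept Km/Vmax) solved by ordinary least squares; nonlinear regression is generally preferred for noisy data. The rates are synthetic, generated from the reported ROS1 parental Km of 47.6 μM and a hypothetical Vmax of 1.0.

```python
def michaelis_menten(s, vmax, km):
    """Initial rate predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


def fit_hanes_woolf(substrate, rates):
    """Estimate (vmax, km) from the line [S]/v = [S]/vmax + km/vmax."""
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals 1 / vmax
    intercept = mean_y - slope * mean_x  # equals km / vmax
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


# Synthetic, noise-free rates (hypothetical Vmax; Km taken from the text).
concs_um = [5.0, 10.0, 25.0, 50.0, 100.0, 200.0]
rates = [michaelis_menten(s, vmax=1.0, km=47.6) for s in concs_um]
vmax_fit, km_fit = fit_hanes_woolf(concs_um, rates)
print(f"Vmax = {vmax_fit:.3f}, Km = {km_fit:.1f} uM")
```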
For MET (Figure 3D, Figure S7), the parent peptide substrate MS-srctide had a Km of 288 μM, whereas the designed peptide met-s1 had a Km of 134 μM, an over 2-fold reduction. For ALK (Figure 3E, Figure S8), the parent peptide MS-axltide exhibited a Km of 18.4 μM while the designed peptide alk-s3 showed a lower Km of 14.1 μM, a more modest reduction of about 1.3-fold. This decrease in Km suggests improved apparent affinity of the designed peptide for ALK, consistent with the enhanced activity observed in the ADP-Glo assay. The consistent observation of lower Km values for the Subtimizer-designed peptides with improved activity across multiple kinases provides evidence that the computational pipeline optimizes substrates for kinase-substrate interaction, generating peptide sequences that bind more tightly to the kinase active site.
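The fold reductions in Km follow directly from the reported values, as the short calculation below shows. The dictionary layout and variable names are illustrative; the Km values are those stated in the text.

```python
# Reported Km values in micromolar: (parent peptide, designed peptide).
KM_UM = {
    "ROS1": (47.6, 20.3),   # MS-srctide vs ros1-s2
    "MET": (288.0, 134.0),  # MS-srctide vs met-s1
    "ALK": (18.4, 14.1),    # MS-axltide vs alk-s3
}

# Fold reduction = parent Km divided by designed Km.
fold_reduction = {kinase: parent / designed
                  for kinase, (parent, designed) in KM_UM.items()}

for kinase, fold in fold_reduction.items():
    print(f"{kinase}: {fold:.2f}-fold lower Km")
```

The reductions for ROS1 and MET are about 2.3-fold and 2.1-fold, while the reduction for ALK is about 1.3-fold.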
2.4. Subtimizer-Optimized Substrates Showed Improved Selectivity for Target Kinases
Beyond improving activity, the enhancement of substrate selectivity for a target kinase over other kinases is critically important in practical applications such as assays in complex environments like cells or cell lysates. This is particularly relevant for kinases with similar substrate recognition motifs or those belonging to the same family. To evaluate the selectivity of the Subtimizer-designed peptides, we assessed the activity of selected optimized peptides against both their intended target kinases and a related kinase that utilizes the same parent peptide. The MS-srctide peptide (EEPLYWSFPAK) is a substrate for many kinases including MET and ROS1 [16], and it served as the parent substrate for both the met-s1 peptide, optimized for MET, and the ros1-s2 peptide, optimized for ROS1 (Figure 4A). This common parentage provides a direct context for evaluating whether the design process introduced kinase-specific selectivity.
Figure 4:
Designed peptides exhibit enhanced selectivity for target kinases. (A) Schematic showing the common parental substrate MS-srctide optimized separately for MET (met-s1) and ROS1 (ros1-s2). (B) MET kinase activity with designed peptides ros1-s2 and met-s1, showing preferential activity with MET-optimized peptide. (C) ROS1 kinase activity with the same peptides, demonstrating preferential activity with ROS1-optimized peptide. The reciprocal selectivity pattern confirms successful kinase-specific optimization.
Figure 4B shows the activity of MET kinase with each of the designed peptides ros1-s2 and met-s1, as measured by the LIMS-kinase assay (quantifying ADP production). The MET kinase exhibited high activity with met-s1 peptide (designed for MET) as expected and consistent with the improved activity observed in the ADP-Glo assay (Figure 2C). In contrast, MET kinase showed over 4-fold reduction in activity with the ros1-s2 peptide (designed for ROS1). This indicates that the modifications introduced during optimization of ros1-s2 for ROS1 reduced its efficiency as a substrate for MET, even though both peptides originated from the same MS-srctide parent.
Conversely, when the same two peptides ros1-s2 and met-s1 were incubated with ROS1 kinase (Figure 4C), ROS1 kinase showed an over 11-fold reduction in activity with the met-s1 peptide (designed for MET) compared to the ros1-s2 peptide. However, with the ros1-s2 peptide, ROS1 demonstrated high activity as anticipated from the ADP-Glo activity data (Figure 2D) and the Km data (Figure 3C). This reciprocal pattern of activity demonstrates that the Subtimizer design process successfully introduced selectivity into the designed peptides. The MET-optimized peptide (met-s1) is preferentially phosphorylated by MET, while the ROS1-optimized peptide (ros1-s2) is preferred by ROS1, despite their common origin from the less selective MS-srctide peptide.
2.5. Structural and Computational Analyses of Optimized Kinase-Peptide Complexes
To gain a deeper understanding of the molecular mechanisms underlying the observed improvements in activity, binding affinity (Km), and selectivity of the designed peptide substrates, we performed structural and computational analyses of the predicted kinase-peptide complexes of the most potent peptides compared to the parental peptides. We performed a physics-based refinement of the AF-Multimer models of the complexes using Rosetta FlexPepDock [48]. FlexPepDock is a high-resolution, sub-angstrom quality protein-peptide docking protocol implemented as a module within the Rosetta framework and is capable of refining protein-peptide complex models to near-native structures [48]. We analyzed the kinase-peptide interactions observed in the refined structures (Figure 5). We also analyzed the computed Rosetta energy and interface scores of the refined structures to quantitatively assess the predicted binding interactions (Table 1).
Figure 5:
Structural analysis reveals molecular basis for improved activity. (A-D) Comparison of AlphaFold-Multimer predicted structures and hydrogen bonding networks for parental (green) versus designed (yellow) peptides in complex with ALK (A), ROS1 (B), MET (C), and EGFR L858R (D). Kinase structures are shown in gray ribbon representation. Green boxes highlight hydrogen bond interactions formed by parental peptides; yellow boxes show interactions formed by designed peptides. Designed peptides generally form additional or optimized hydrogen bonds that correlate with improved experimental activity.
Table 1:
Rosetta FlexPepDock scoring analysis of kinase-peptide complexes
| kinase | peptide | total_sc ↓ | reweighted_sc ↓ | hb_I ↑ | I_sc ↓ | hb_bb_sc ↓ | hb_sc ↓ | pep_sc ↓ | fa_atr ↓ | fa_elec ↓ | fa_rep ↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ALK | axltide (parental) KSRGDYMTMQIG | −647.6 | −722.8 | 8 | −53.5 | −65.6 | −24.09 | −21.811 | −1806.3 | −507.8 | 273.7 |
| | alk-s3 HDPEDYMWVEMR | −695.3 | −776.3 | 9 | −56.7 | −68.6 | −26.87 | −24.214 | −1824.0 | −514.3 | 257.1 |
| ROS1 | srctide (parental) EEPLYWSFPAK | −778.7 | −858.0 | 7 | −58.2 | −62.1 | −23.16 | −21.126 | −1866.4 | −530.3 | 219.2 |
| | ros1-s2 EEPAYMLMPAK | −784.3 | −879.0 | 9 | −65.5 | −64.1 | −25.66 | −29.145 | −1848.6 | −526.0 | 218.9 |
| EGFR | csktide (parental) KEEIYFFF | −729.2 | −784.6 | 5 | −41.6 | −42.7 | −24.67 | −13.81 | −1732.2 | −479.8 | 206.8 |
| | egfr-s2 EPEDYAPI | −788.8 | −849.1 | 7 | −39.4 | −46.3 | −24.48 | −20.88 | −1743.3 | −474.2 | 220.5 |
| MET | srctide (parental) EEPLYWSFPAK | −494.8 | −566.6 | 5 | −51.2 | −56.9 | −19.76 | −20.612 | −1719.0 | −446.8 | 308.1 |
| | met-s1 EEPLYVALPAP | −559.6 | −630.3 | 5 | −46.6 | −62.3 | −17.03 | −24.12 | −1734.9 | −433.4 | 277.9 |
total_sc: Total score of the complex (REU: Rosetta energy units). reweighted_sc: Reweighted score of the complex (REU) with interface residues given double weight and peptide residues given triple weight. hb_I: Number of hydrogen bonds at the interface (calculated with HBPlus). I_sc: Interface score (sum over energy contributed by interface residues of both partners). hb_bb_sc: sc-bb hbond energy. hb_sc: sc-sc hbond energy. pep_sc: Peptide score (sum over energy contributed by the peptide to the total score; consists of internal peptide energy and interface energy). AF_hb: Number of hbonds in AF-multimer model used as input for Rosetta FlexPepDock. fa_atr: Lennard-Jones attractive energy between atoms in different residues. fa_rep: Lennard-Jones repulsive energy between atoms in different residues. fa_elec: Coulombic electrostatic potential with a distance-dependent dielectric.
Figure 5A–D presents structural representations and schematic diagrams highlighting key polar interactions between the kinases and their parental or optimized peptide substrates for ALK, ROS1, MET, and EGFR L858R. By comparing the hydrogen bonding patterns and overall predicted binding poses of the parental peptides with their optimized counterparts, we evaluated potential structural determinants that contribute to the enhanced functional properties. Although the parental peptide MS-axltide (green) forms several hydrogen bonds with residues of ALK (gray) in the structure model shown in Figure 5A, the modeled complex with the optimized peptide alk-s3 (yellow) shows a slightly different hydrogen bonding pattern, including one additional hydrogen bond with ALK. While many of the hydrogen bonds are retained in alk-s3, new interactions were made at the substituted peptide residues K1H, G4E, and G12R with kinase residues N151, M199, and N232. These structural changes, particularly the modifications in the hydrogen bonding pattern, are predicted to lead to more favorable binding interactions, consistent with the enhanced activity and lower Km observed for ALK-optimized peptides (Figures 2B and 3E).
In the case of ROS1 (Figure 5B), the two additional hydrogen bonds observed in the predicted model of ROS1 kinase with the optimized ros1-s2 substrate were formed by the unchanged residues E2 and K11. This indicates that the sequence alterations changed the binding mode of the peptide in a way that optimized the interactions of ROS1 with ros1-s2 for improved binding (Figure 3C) and higher activity (Figure 2D). Although the MET complexes of the parental MS-srctide (green) and optimized met-s1 (yellow) peptide substrates have the same number of hydrogen bonds, the unchanged met-s1 peptide residues E2 and Y5 make new hydrogen bonds with the kinase residues T222 and D137, respectively (Figure 5C).
Furthermore, ros1-s2 residues P3, M8, and K11 form unique hydrogen bonds with ROS1 whereas no corresponding hydrogen bonds exist in met-s1 with MET. Similarly, met-s1 residue E1 makes a unique hydrogen bond with MET compared to the ROS1-ros1-s2 complex. These unique structural differences of the met-s1 and ros1-s2 complexes suggest that the design process introduced specific interactions tailored to the structural features of the binding pocket of the corresponding kinase. This indicates that these unique interactions are likely key determinants of the enhanced selectivity of met-s1 for MET over ROS1 (Figure 4B) and ros1-s2 for ROS1 over MET (Figure 4C).
For EGFR L858R, the unchanged residues E1 and E3 of the optimized egfr-s1 peptide make two new hydrogen bonds with the kinase residues R99 and K209, consistent with the improvement in activity (Figure 5D). Although one hydrogen bond between the parental MS-csktide1 peptide residue E2 and the kinase residue R99 was lost in egfr-s1, the lost bond was compensated for by the new bond between the egfr-s1 peptide residue E1 and the kinase residue K209. These changes in hydrogen bonding pattern suggest potential selectivity of the optimized egfr-s1 peptide for EGFR L858R as opposed to the promiscuous MS-csktide1, which is phosphorylated by several other kinases, including MET, RET M918T, and ROS1 [16].
The Rosetta energy and interface scores calculated for each kinase-peptide complex offer a quantitative assessment of the predicted binding interactions of the parental and the most potent designed peptides (Table 1). For all kinases, the Rosetta scores “total_sc” (overall energy of the complex) and “reweighted_sc” (reweighted energy prioritizing interface and peptide residues) of the designed peptides were lower (better) than those of the parental peptides (Table 1), indicating that the designed peptides formed more energetically favorable and stable kinase-peptide interactions. Similarly, the designed alk-s3, ros1-s2, egfr-s2, and met-s1 peptides all have lower (better) Rosetta “pep_sc” (peptide score – overall peptide energy) than their corresponding parental peptides, indicating the designed peptides have better stability and contribute more favorably to the total energy scores of their complexes.
The number of kinase-peptide interface hydrogen bonds (hb_I) for the designed peptides in complex with ALK, ROS1, and EGFR was higher than that of the parental peptides, as shown in Table 1 and Figure 5. The increased number of polar interactions might play an important role in the improved activity of the designed alk-s3, ros1-s2, and egfr-s2 peptides. However, for MET, the number of interface hydrogen bonds is the same for both the parental and designed peptides. Although the hydrogen bonding pattern differs in the MET complexes of the parental and designed peptides, the improved activity of met-s1 over the parental MS-srctide might not be attributable to polar interactions alone. The improvement might also be due to the overall stability of the met-s1 peptide and its complex with MET compared to the parental peptide, as indicated by met-s1 having better Rosetta energy scores, including total_sc, reweighted_sc, pep_sc, fa_atr (LJ attractive energy), and fa_rep (LJ repulsive energy) (Table 1).
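The comparisons drawn from Table 1 can be reproduced with a short calculation of designed-minus-parental differences. The data structure and function name are illustrative; the values are transcribed from Table 1 for four of its columns.

```python
# Values transcribed from Table 1 (Rosetta energy units; hb_I is a count).
TABLE1 = {
    "ALK": {
        "parent": {"total_sc": -647.6, "reweighted_sc": -722.8, "hb_I": 8, "pep_sc": -21.811},
        "designed": {"total_sc": -695.3, "reweighted_sc": -776.3, "hb_I": 9, "pep_sc": -24.214},
    },
    "ROS1": {
        "parent": {"total_sc": -778.7, "reweighted_sc": -858.0, "hb_I": 7, "pep_sc": -21.126},
        "designed": {"total_sc": -784.3, "reweighted_sc": -879.0, "hb_I": 9, "pep_sc": -29.145},
    },
    "EGFR": {
        "parent": {"total_sc": -729.2, "reweighted_sc": -784.6, "hb_I": 5, "pep_sc": -13.81},
        "designed": {"total_sc": -788.8, "reweighted_sc": -849.1, "hb_I": 7, "pep_sc": -20.88},
    },
    "MET": {
        "parent": {"total_sc": -494.8, "reweighted_sc": -566.6, "hb_I": 5, "pep_sc": -20.612},
        "designed": {"total_sc": -559.6, "reweighted_sc": -630.3, "hb_I": 5, "pep_sc": -24.12},
    },
}


def score_deltas(pair):
    """Designed minus parental value for each score term.

    Negative deltas are favorable for energy terms; positive is favorable for hb_I.
    """
    return {term: pair["designed"][term] - pair["parent"][term]
            for term in pair["parent"]}


deltas = {kinase: score_deltas(pair) for kinase, pair in TABLE1.items()}
for kinase, d in deltas.items():
    print(kinase, {term: round(value, 3) for term, value in d.items()})
```

All four kinases show negative differences for total_sc, reweighted_sc, and pep_sc, and only MET shows no change in hb_I.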
3. Discussion
In this study, we present Subtimizer, a structure-guided computational workflow that approaches the design of potent and selective kinase peptide substrates as a protein design problem. The workflow integrates AF-Multimer for predicting kinase-peptide complex structures, ProteinMPNN for designing novel peptide sequences within this structural context, and structural metrics (ipTM, pTM, ipAE, and pLDDT) for evaluating the confidence and predicted quality of the designed peptides. Unlike previous studies on kinase-substrate relationships that mainly addressed the phosphorylation prediction task, the Subtimizer workflow combines both structure prediction and sequence design tasks to generate optimal peptide substrates for kinases.
We experimentally validated the Subtimizer pipeline by testing designed peptides for five kinases using a luminescence ADP-Glo kinase assay. For four out of the five kinases tested, we successfully identified designed peptides that demonstrated substantially improved kinase activity compared to their parental substrates, with the magnitude of improvement reaching over 300%. Kinetic characterization revealed that the improved activity of the most potent designed peptides for ALK, ROS1, and MET is primarily due to lower Km values (enhanced binding affinity). The improved binding affinity is consistent with the structure-guided design approach, which aims to optimize kinase-peptide interactions within the substrate binding pocket. We also demonstrated that the Subtimizer pipeline generates novel substrates with improved selectivity for the target kinase by showing that substrates designed for ROS1 and MET, while originating from the same parental peptides, exhibited 4-fold and 11-fold improvement in selectivity for their target kinase, respectively. High substrate affinity and selectivity are paramount for developing sensitive kinase assays suitable for low enzyme concentrations and complex biological samples like cell lysates and tissue extracts, as well as direct measurement in vivo [7,18,20,21].
The computational confidence and quality evaluation steps of the workflow allowed for rapid prioritization of high-confidence designs for experimental validation, as only two to five designed peptides needed to be tested to identify peptides with improved activity. This significantly reduces the experimental burden compared to traditional screening, highlighting both the predictive power and practical efficiency of Subtimizer. Thus, the structure-guided protein design workflow offers a viable approach toward developing the necessary tools to enable comprehensive studies of the human kinome. The ability to generate high-affinity, selective substrates is crucial not only for fundamental research but also for accelerating drug discovery efforts, particularly for the large number of understudied kinases [8,10,11,15,16].
While the Subtimizer pipeline demonstrated high success rates for four (ALK, EGFR L858R, MET, and ROS1) of the five kinases tested, designed peptides tested for SRC showed reduced activity compared to the parental MS-srctide peptide. This indicates that the workflow may not work for all kinases, highlighting potential limitations. However, such limitations can be overcome by introducing further improvements to the Subtimizer workflow. The fields of AI-driven protein design and structure prediction are rapidly evolving, with innovative methods emerging frequently. This, in principle, presents an opportunity to enhance the workflow from two angles: by improving the accuracy of (1) the kinase-substrate structure models, and (2) the sequence design step.
First, we envision that the Subtimizer workflow could be improved by replacing AF-Multimer with the recently published AlphaFold 3 [49] (or comparable models like Boltz-1 [50], Protenix [51], or Chai-1 [52]). AlphaFold 3 is the latest iteration of AF and can predict structures of protein complexes with nucleic acids, small molecules, ions, and modified residues. Second, LigandMPNN, the recently developed ligand-aware version of ProteinMPNN that excels at designing residues near ligands and cofactors [53], could be used in the sequence design step. Since kinases rely on metal cofactors and nucleotides for phosphorylation, implementing these ligand-aware modifications could improve the performance of the peptide design workflow. For example, such modifications could make it possible to treat both substrate binding and catalysis as design objectives, potentially overcoming the Km-Vmax tradeoff seen with peptides designed for ROS1 and MET (Figure 3C–D).
Another potential limitation of the workflow is a case where the AF-Multimer structure prediction step fails to generate a complex structure that passes the threshold of the confidence metric. In such a case, it might be helpful to either incorporate an automated MD simulation tool for dynamics refinement [54], or utilize methods such as AlphaRED [55] that significantly improved the success rate and accuracy of AF-Multimer on difficult cases by integrating ReplicaDock (a physics-based replica exchange docking algorithm).
In summary, this work demonstrates that AI-driven structure-guided protein design holds tremendous potential for scaling up kinase assay development by enabling rapid design of optimal kinase-specific peptide substrates for a larger set of kinases, including high-priority dark kinome members lacking validated substrates. More broadly, results from this study indicate that applications of recent advancements in AI-driven structure-guided protein engineering could generalize to other enzyme-substrate specificity engineering applications beyond kinases, such as proteases [46] and phosphatases [56,57].
4. Methods
4.1. Workflow for Computational Design of Kinase Peptide Substrates
4.1.1. Kinase-peptide complex prediction
The 3D structures of kinase-peptide complexes were predicted with AF-Multimer [32] using the ColabFold [58] implementation. The code for local installation of ColabFold was obtained from the repository https://github.com/sokrypton/ColabFold. Sequences of the kinase catalytic domain obtained from UniProt and the starting peptide substrate were used as input to AF-Multimer. For the proof-of-concept study, 25 kinases were each paired with one or more starting peptide substrates that have previously been experimentally validated in-house, making a total of 45 kinase-peptide pairs. For each pair, five rounds of AF-Multimer predictions were run with different seeds, generating five models per round. The number of AF-Multimer recycles and Amber relax cycles were set to 10 and 3, respectively. The top-ranking model from each round was selected, resulting in a total of 5 predictions. Only predictions with an ipTM > 0.75 were considered high-confidence and used for the downstream design step.
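The model selection described above (top-ranking model from each of five seeded rounds, then an ipTM cutoff of 0.75) can be illustrated with a minimal sketch. The `Prediction` record, its field names, and the example scores are hypothetical and are not taken from ColabFold output or from the Subtimizer code; only the cutoff and the five-by-five layout come from the text.

```python
from dataclasses import dataclass

IPTM_THRESHOLD = 0.75  # cutoff stated in the text


@dataclass
class Prediction:
    seed: int        # identifies the prediction round (hypothetical field)
    model_rank: int  # 1 = top-ranking model within the round
    iptm: float      # interface pTM reported for the model


def select_high_confidence(predictions):
    """Keep the top-ranked model of each round, then apply the ipTM cutoff."""
    top_per_round = [p for p in predictions if p.model_rank == 1]
    return [p for p in top_per_round if p.iptm > IPTM_THRESHOLD]


# Made-up scores for one kinase-peptide pair: 5 rounds x 5 models per round.
example = [
    Prediction(seed=s, model_rank=r, iptm=round(base - 0.02 * (r - 1), 2))
    for s, base in enumerate([0.82, 0.74, 0.79, 0.68, 0.91])
    for r in range(1, 6)
]
selected = select_high_confidence(example)
print([(p.seed, p.iptm) for p in selected])
```

In this made-up example, three of the five top-ranking models pass the cutoff and would be carried forward to the design step.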
4.1.2. Peptide sequence design
The AF-Multimer models were passed to ProteinMPNN, which was used to design novel sequences on the backbone of the starting peptide while keeping the kinase residues fixed and preserving the amino acid identity of the phosphosite (Ser/Thr/Tyr) in the peptide. For the AKT2-gsk3tide complex, for which two crystal structures are available (PDB IDs 1O6K and 1O6L), these structures were also included as separate pairs (a total of 47 pairs) for use as input for ProteinMPNN. For each input kinase-peptide model, 480 sequences were designed (a total of 2400 across the five AF-Multimer models of each kinase-peptide pair) with a sampling temperature of 0.1 and a batch size of 32. The designed sequences were clustered using CD-Hit [59] at 100% sequence identity to eliminate redundant sequences. The ProteinMPNN code was obtained from the repository https://github.com/dauparas/ProteinMPNN.
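The bookkeeping behind the design step can be sketched as follows. The counts come from the text; the peptide sequences are invented for illustration, and the exact-duplicate filter is a stand-in for CD-Hit at 100% identity, which for equal-length peptides should amount to roughly the same thing. The phosphosite check is only a sanity check, since the residue is held fixed during design.

```python
SEQS_PER_MODEL = 480   # sequences designed per input model (from the text)
BATCH_SIZE = 32        # ProteinMPNN batch size (from the text)
MODELS_PER_PAIR = 5    # AF-Multimer models kept per kinase-peptide pair

batches_per_model = SEQS_PER_MODEL // BATCH_SIZE  # 15 batches of 32
seqs_per_pair = SEQS_PER_MODEL * MODELS_PER_PAIR  # 2400 sequences per pair


def deduplicate(sequences):
    """Drop exact duplicates while preserving first-seen order."""
    return list(dict.fromkeys(sequences))


def phosphosite_preserved(parent, design, site_index):
    """True if the design keeps the parental residue at the phosphosite."""
    return len(design) == len(parent) and design[site_index] == parent[site_index]


# Invented peptides; the phosphoacceptor Tyr sits at index 4 of the parent.
parent = "GSAEYAAPLK"
designs = ["GDEEYVAPLK", "GDEEYVAPLK", "GSDEYIEPLR", "GDEEFVAPLK"]

unique = deduplicate(designs)
valid = [d for d in unique if phosphosite_preserved(parent, d, site_index=4)]
print(batches_per_model, seqs_per_pair, valid)
```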
4.1.3. Structure prediction and evaluation of designed sequences
The newly designed peptide sequences were paired with their target kinase sequences, and their complex structures were predicted again using AF-Multimer with minimal parameters (2 AF recycles, 4 prediction models, and no Amber relaxation). The top-ranking AF-Multimer model for each pair was used as initial guess for the interface prediction and evaluation using a modified version of AF2 (AF2 with-initial-guess) as described by Bennett et al. (2023) [47]. The code for AF2 with-initial-guess interface prediction was obtained from https://github.com/nrbennet/dl_binder_design. The ipAE, ipTM, pTM, and pLDDT scores were used to evaluate and rank the ProteinMPNN-generated sequences to identify a set of high-confidence designs for experimental validation. For the kinases (ALK, MET, ROS1, EGFR L858R, and SRC) selected for experimental validation, two to five designed peptides with low ipAE and high ipTM and pLDDT scores were chosen for experimental tests.
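One way to turn the four confidence metrics into a shortlist is sketched below. The text states only that designs with low ipAE and high ipTM and pLDDT were chosen; the lexicographic ordering used here (ipAE first, then ipTM, then pLDDT) is an assumption made for illustration, as are the `DesignScore` record and the example values.

```python
from typing import NamedTuple


class DesignScore(NamedTuple):
    name: str
    ipae: float   # interface predicted aligned error (lower is better)
    iptm: float   # interface pTM (higher is better)
    plddt: float  # predicted lDDT (higher is better)


def rank_designs(scores):
    """Sort by ascending ipAE, breaking ties by descending ipTM then pLDDT."""
    return sorted(scores, key=lambda s: (s.ipae, -s.iptm, -s.plddt))


def shortlist(scores, n=5):
    """Return the n best-ranked designs for experimental testing."""
    return rank_designs(scores)[:n]


# Made-up scores for four hypothetical designs.
scores = [
    DesignScore("design_a", ipae=6.2, iptm=0.88, plddt=91.0),
    DesignScore("design_b", ipae=4.8, iptm=0.90, plddt=89.5),
    DesignScore("design_c", ipae=4.8, iptm=0.93, plddt=92.1),
    DesignScore("design_d", ipae=9.5, iptm=0.71, plddt=80.3),
]
print([s.name for s in shortlist(scores, n=2)])
```

A weighted composite score or hard per-metric thresholds would be equally plausible choices; the sketch only shows the direction in which each metric is read.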
4.1.4. Structural analysis and energy evaluation
Structure refinement and quantitative assessment of predicted binding interactions were performed using the Rosetta FlexPepDock protocol [48]. AF-Multimer models of the kinase complexes of the most potent designed peptides and their parental counterparts were used as input for FlexPepDock refinement and scoring using the Rosetta energy function. The refined structure with the lowest total Rosetta energy score was used for interaction analysis. Hydrogen bond interactions between kinase and peptide residues were calculated with HBPLUS 3.2 [60] using angle and distance criteria of D-H-A > 125° and D-A < 3.45 Å. Predicted kinase-peptide complex structures and key interactions were visualized using PyMOL (Schrödinger, LLC). Plots were generated using Python or GraphPad Prism.
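The two geometric criteria quoted above can be made concrete with a short sketch. This is not HBPLUS, which also places hydrogens and applies further checks; the function names and coordinates are hypothetical, and only the two thresholds come from the text.

```python
import math

MIN_DHA_ANGLE_DEG = 125.0  # donor-hydrogen-acceptor angle cutoff (from the text)
MAX_DA_DISTANCE = 3.45     # donor-acceptor distance cutoff in angstroms (from the text)


def angle_deg(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(x * x for x in v2))
    cos_angle = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cos_angle))


def is_hbond(donor, hydrogen, acceptor):
    """Apply the distance and angle criteria to one donor/hydrogen/acceptor triple."""
    close_enough = math.dist(donor, acceptor) < MAX_DA_DISTANCE
    linear_enough = angle_deg(donor, hydrogen, acceptor) > MIN_DHA_ANGLE_DEG
    return close_enough and linear_enough


# Invented coordinates (angstroms) for three donor/hydrogen/acceptor geometries.
donor, hydrogen = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
print(is_hbond(donor, hydrogen, (2.9, 0.2, 0.0)))  # near-linear and close
print(is_hbond(donor, hydrogen, (0.5, 2.8, 0.0)))  # close but strongly bent
print(is_hbond(donor, hydrogen, (4.0, 0.0, 0.0)))  # linear but too far
```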
4.2. ADP-Glo Kinase Assays
ADP-Glo Kit (ADP-Glo™ Kinase Assay, Cat#V6930) was obtained from Promega. The kinases ALK (#08–518), MET (#08–151), ROS1 (#08–163), and EGFR L858R (#08–502) were products of Carna Biosciences. SRC was expressed and purified as previously reported [61,62]. Substrate peptides were obtained from Biomatik Corporation (Wilmington, DE, USA). All components were equilibrated to 25°C prior to setting up reactions in 384-well microplates (White ProxiPlate 384-shallow well Plus). Final working concentrations were enzyme: 1.25–10 nM, ATP 10 μM, peptide 1 μM. All reagents were added manually and incubated for 90 minutes. Experiments were repeated twice with results expressed as mean ± standard deviation. Luminescence was detected on a Synergy Neo2 plate reader.
4.3. LIMS-Kinase Assays – Enzyme Optimization
All components were equilibrated to 25°C prior to setting up reactions in 384-well microplates (Costar 3657 round bottom polypropylene). All reagents were added manually, and final working concentrations were enzyme 0.63–10 nM, ATP 100 μM, and peptide 1–10 μM. Enzyme was first added to the plate, followed by the peptide/ATP mix in buffer to initiate the reaction. Reactions were quenched at 20, 40, 60, 80, and 100 minutes with formic acid at a final concentration of 1%. Experiments were repeated twice, with results expressed as mean ± standard deviation.
4.4. LIMS-Kinase Assays – Kinetic Analysis
Final working concentrations of enzyme varied by kinase. ATP was at 100 μM, and peptide concentrations started at 300–400 μM followed by 2-fold or 3-fold serial dilutions. All components were equilibrated to 25°C prior to setting up reactions. Peptide dilutions were added to 384-well microplates (Costar 3657 round bottom polypropylene). All reagents were added manually. A buffer, enzyme, and ATP mixture was added to initiate the reaction. Reactions were quenched at set time-points with 1% formic acid (final). Activity was evaluated by measuring the MS signal of an ADP-specific fragment ion. Experiments were repeated twice, with results expressed as mean ± standard deviation.
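As a small illustration of the peptide titration, the sketch below generates a serial dilution series. The starting concentrations and dilution factors are those given in the text; the number of points in each series is not stated and is assumed here.

```python
def dilution_series(top_conc_um, fold, n_points):
    """Concentrations (in micromolar) of a serial dilution from top_conc_um."""
    return [top_conc_um / fold**i for i in range(n_points)]


# n_points=8 and n_points=6 are assumptions; the text gives only the
# starting concentrations and the dilution factors.
two_fold = dilution_series(400, 2, n_points=8)
three_fold = dilution_series(300, 3, n_points=6)
print(two_fold)
print([round(c, 2) for c in three_fold])
```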
4.5. Calculation of Km
Initial velocities were estimated by linear regression of the first three time-points at each peptide concentration, with the slope of each fit taken as the initial velocity. Initial velocity was then plotted against peptide concentration, and Km was estimated by nonlinear regression using the Michaelis-Menten equation in GraphPad Prism.
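The two-stage calculation can be sketched in plain Python. The fitting in this study was done in GraphPad Prism; the routine below is not Prism's algorithm but a simple substitute that exploits the fact that, for a fixed Km, the least-squares Vmax has a closed form, leaving a one-dimensional golden-section search over log10(Km). The function names, search bounds, and synthetic data are assumptions for illustration.

```python
import math


def initial_velocity(times, signals, n_points=3):
    """Least-squares slope of signal versus time over the first n_points."""
    t, s = times[:n_points], signals[:n_points]
    mt, ms = sum(t) / len(t), sum(s) / len(s)
    sxy = sum((ti - mt) * (si - ms) for ti, si in zip(t, s))
    sxx = sum((ti - mt) ** 2 for ti in t)
    return sxy / sxx


def michaelis_menten(s, vmax, km):
    """Michaelis-Menten rate: v = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)


def fit_michaelis_menten(concs, velocities, km_lo=1e-3, km_hi=1e4, iters=80):
    """Return (Vmax, Km) minimizing the sum of squared residuals."""

    def best_vmax(km):
        x = [s / (km + s) for s in concs]
        return sum(v * xi for v, xi in zip(velocities, x)) / sum(xi * xi for xi in x)

    def sse(km):
        vmax = best_vmax(km)
        return sum((v - michaelis_menten(s, vmax, km)) ** 2
                   for s, v in zip(concs, velocities))

    # Golden-section search on log10(Km); assumes a single minimum in range.
    invphi = (math.sqrt(5) - 1) / 2
    a, b = math.log10(km_lo), math.log10(km_hi)
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    fc, fd = sse(10 ** c), sse(10 ** d)
    for _ in range(iters):
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = sse(10 ** c)
        else:
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = sse(10 ** d)
    km = 10 ** ((a + b) / 2)
    return best_vmax(km), km


# Synthetic, noise-free data generated with Vmax = 100 and Km = 20 (made up).
concs = [400 / 2**i for i in range(8)]
velocities = [michaelis_menten(s, 100.0, 20.0) for s in concs]
vmax_fit, km_fit = fit_michaelis_menten(concs, velocities)

# A made-up progress curve: linear over the first three points, then slowing.
v0 = initial_velocity([0, 20, 40, 60, 80], [0.0, 10.0, 20.0, 26.0, 30.0])
print(round(vmax_fit, 3), round(km_fit, 3), v0)
```

With real, noisy data the recovered parameters will carry uncertainty that this sketch does not estimate; a dedicated fitting package would also report confidence intervals.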
4.6. Peptide Selectivity Assays
For the MET selectivity assay, reagent final working concentrations were MET 5 nM, peptides 134 μM (Km for met-s1 with MET), and ATP 100 μM. For the ROS1 selectivity assay, reagent final working concentrations were ROS1 0.63 nM, peptides 20.3 μM (Km for ros1-s2 with ROS1), and ATP 100 μM. All components were equilibrated to 25°C prior to setting up reactions. All reagents were added manually. Peptides were added to 384-well microplates (Costar 3657 round bottom polypropylene). A pre-mixed buffer, enzyme, and ATP solution was added to initiate the reaction. The reaction was quenched with 1% formic acid (final) following a 30-minute incubation at room temperature. Kinase activity was evaluated by measuring the MS signal of an ADP-specific fragment ion. Experiments were repeated twice, with results expressed as mean ± standard deviation. Time 0 (background) values were subtracted from subsequent time-points.
4.7. RapidFire Chromatography and Mass Spectrometry
RapidFire liquid chromatography and mass spectrometry methods were performed as previously described [16].
4.8. Statistical Analysis
All experiments were performed in duplicate with results expressed as mean ± standard deviation. Time 0 background values were subtracted from subsequent time-points where indicated.
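The data reduction described above can be sketched as follows. The signal values are invented, and the use of the sample standard deviation (n − 1 denominator) for duplicates is an assumption, since the text does not specify which estimator was used.

```python
from statistics import mean, stdev


def subtract_background(time_course):
    """Subtract the time 0 signal from every subsequent time-point."""
    background = time_course[0]
    return [signal - background for signal in time_course[1:]]


def summarize(replicates):
    """Mean and sample standard deviation across replicate measurements."""
    return mean(replicates), stdev(replicates)


# Invented duplicate time courses; the first value of each is the time 0 signal.
rep1 = subtract_background([100.0, 500.0, 900.0])
rep2 = subtract_background([120.0, 560.0, 1000.0])

summary = [summarize(pair) for pair in zip(rep1, rep2)]
for avg, sd in summary:
    print(f"{avg:.1f} ± {sd:.1f}")
```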
Supplementary Material
Acknowledgements
We thank the University of Texas Southwestern Medical Center and the Simmons Comprehensive Cancer Center for institutional support. This work was supported by Welch Foundation I-1829 (K.D.W.), NIH UM1CA294119 (K.D.W.), P50CA070907 (K.D.W.), and P30CA142543.
Code Availability
The code of the Subtimizer pipeline is publicly available at https://github.com/abeebyekeen/subtimizer.
References
- [1]. Soleymani Saber, Gravel Nathan, Huang Liang-Chin, Bendzunas Nathaniel G., Kochut Krzysztof J., and Kannan Natarajan. Dark kinase annotation, mining, and visualization using the protein kinase ontology. PeerJ, 11:e16087, December 2023.
- [2]. Berginski Matthew E, Moret Nienke, Liu Changchang, Goldfarb Dennis, Sorger Peter K, and Gomez Shawn M. The dark kinase knowledgebase: an online compendium of knowledge and experimental results of understudied kinases. Nucleic Acids Research, 49(D1):D529–D535, October 2021.
- [3]. Manning G., Whyte D. B., Martinez R., Hunter T., and Sudarsanam S. The protein kinase complement of the human genome. Science, 298(5600):1912–1934, December 2002.
- [4]. Wilson Leah J., Linley Adam, Hammond Dean E., Smith Paul D., Eyers Patrick A., and Prior Ian A. New perspectives, opportunities, and challenges in exploring the human protein kinome. Cancer Research, 78(1):15–29, January 2018.
- [5]. Hunter Tony. Protein kinase classification. In Hunter T and Sefton BM, editors, Methods in Enzymology, pages 3–37. Academic Press, 1991.
- [6]. Li Jiahao, Gong Chen, Zhou Haiting, Liu Junxia, Xia Xiaohui, Ha Wentao, Jiang Yizhi, Liu Qingxu, and Xiong Huihua. Kinase inhibitors and kinase-targeted cancer therapies: Recent advances and future perspectives. International Journal of Molecular Sciences, 25(10):5489, May 2024.
- [7]. Wu Ding, Sylvester Juliesta E., Parker Laurie L., Zhou Guangchang, and Kron Stephen J. Peptide reporters of kinase activity in whole cell lysates. Peptide Science, 94(4):475–486, January 2010.
- [8]. Attwood Misty M., Fabbro Doriano, Sokolov Aleksandr V., Knapp Stefan, and Schiöth Helgi B. Trends in kinase drug discovery: targets, indications and inhibitor design. Nature Reviews Drug Discovery, 20(11):839–861, August 2021.
- [9]. Wu Peng, Nielsen Thomas E., and Clausen Mads H. FDA-approved small-molecule kinase inhibitors. Trends in Pharmacological Sciences, 36(7):422–439, July 2015.
- [10]. Anderson Brian, Rosston Peter, Ong Han Wee, Hossain Mohammad Anwar, Davis-Gilbert Zachary W., and Drewry David H. How many kinases are druggable? A review of our current understanding. Biochemical Journal, 480(16):1331–1363, August 2023.
- [11]. Fabbro Doriano. 25 years of small molecular weight kinase inhibitors: Potentials and limitations. Molecular Pharmacology, 87(5):766–775, May 2015.
- [12]. Jafari Rozbeh, Almqvist Helena, Axelsson Hanna, Ignatushchenko Marina, Lundbäck Thomas, Nordlund Pär, and Molina Daniel Martinez. The cellular thermal shift assay for evaluating drug target interactions in cells. Nature Protocols, 9(9):2100–2122, August 2014.
- [13]. Warner Greg, Illy Chantal, Pedro Liliana, Roby Philippe, and Bosse Roger. AlphaScreen™ kinase HTS platforms. Current Medicinal Chemistry, 11(6):721–730, March 2004.
- [14]. Perez Minervo, Blankenhorn John, Murray Kevin J., and Parker Laurie L. High-throughput identification of FLT3 wild-type and mutant kinase substrate preferences and application to design of sensitive in vitro kinase assay substrates. Molecular & Cellular Proteomics, 18(3):477–489, March 2019.
- [15]. Johnson Jared L., Yaron Tomer M., Huntsman Emily M., Turk Benjamin E., Yaffe Michael B., and Cantley Lewis C. An atlas of substrate specificities for the human serine/threonine kinome. Nature, 613(7945):759–766, January 2023.
- [16]. Meyer Cynthia, McCoy Melissa, Li Lianbo, Posner Bruce, and Westover Kenneth D. LIMS-Kinase provides sensitive and generalizable label-free in vitro measurement of kinase activity using mass spectrometry. Cell Reports Physical Science, 4(10):101599, October 2023.
- [17]. Lipchik Andrew M., Perez Minervo, Bolton Scott, Steven B. … Ouellette, Cui Wei, and Parker Laurie L. KINATEST-ID: A pipeline to develop phosphorylation-dependent terbium sensitizing kinase assays. Journal of the American Chemical Society, 137(7):2484–2494, February 2015.
- [18]. Fan Mengqi Jonathan, Mehra Misha, Yang Kunwei, Chadha Rahuljeet S., Anber Sababa, and Kovarik Michelle L. Cross-species applications of peptide substrate reporters to quantitative measurements of kinase activity. ACS Measurement Science Au, 4(5):546–555, August 2024.
- [19]. Proctor Angela, Wang Qunzhao, Lawrence David S., and Allbritton Nancy L. Development of a peptidase-resistant substrate for single-cell measurement of protein kinase B activation. Analytical Chemistry, 84(16):7195–7202, August 2012.
- [20]. Shults Melissa D, Janes Kevin A, Lauffenburger Douglas A, and Imperiali Barbara. A multiplexed homogeneous fluorescence-based assay for protein kinase activity in cell lysates. Nature Methods, 2(4):277–284, March 2005.
- [21]. Yeh Ren-Hwa, Yan Xiongwei, Cammer Michael, Bresnick Anne R., and Lawrence David S. Real time visualization of protein kinase activity in living cells. Journal of Biological Chemistry, 277(13):11527–11532, March 2002.
- [22]. Xiao Di, Lin Michael, Liu Chunlei, Geddes Thomas A, Burchfield James G, Parker Benjamin L, Humphrey Sean J, and Yang Pengyi. SnapKin: a snapshot deep learning ensemble for kinase-substrate prediction from phosphoproteomics data. NAR Genomics and Bioinformatics, 5(4), October 2023.
- [23]. Ljungdahl Thomas, Veide-Vilg Jenny, Wallner Fredrik, Tamás Markus J., and Grøtli Morten. Positional scanning peptide libraries for kinase substrate specificity determinations: Straightforward and reproducible synthesis using pentafluorophenyl esters. Journal of Combinatorial Chemistry, 12(5):733–742, July 2010.
- [24]. Esmaili Farzaneh, Pourmirzaei Mahdi, Ramazi Shahin, Shojaeilangari Seyedehsamaneh, and Yavari Elham. A review of machine learning and algorithmic methods for protein phosphorylation site prediction. Genomics, Proteomics & Bioinformatics, 21(6):1266–1285, October 2023.
- [25]. Audagnotto Martina and Dal Peraro Matteo. Protein post-translational modifications: In silico prediction tools and molecular modeling. Computational and Structural Biotechnology Journal, 15:307–319, 2017.
- [26]. Esmaili Farzaneh, Qin Yongfang, Wang Duolin, and Xu Dong. Kinase-substrate prediction using an autoregressive model. Computational and Structural Biotechnology Journal, 27:1103–1111, 2025.
- [27]. Ma Hongli, Li Guojun, and Su Zhengchang. KSP: an integrated method for predicting catalyzing kinases of phosphorylation sites in proteins. BMC Genomics, 21(1), August 2020.
- [28]. Kemp Bruce E. and Pearson Richard B. Design and use of peptide substrates for protein kinases. In Hunter T. and Sefton B. M., editors, Methods in Enzymology, volume 200, pages 121–134. Academic Press, 1991.
- [29]. Jumper John, Evans Richard, Pritzel Alexander, Kavukcuoglu Koray, Kohli Pushmeet, and Hassabis Demis. Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873):583–589, July 2021.
- [30]. Baek Minkyung, DiMaio Frank, Anishchenko Ivan, Adams Paul D., Read Randy J., and Baker David. Accurate prediction of protein structures and interactions using a three-track neural network. Science, 373(6557):871–876, August 2021.
- [31]. Lin Zeming, Akin Halil, Rao Roshan, Sercu Tom, Salvatore Candido, and Rives Alexander. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science, 379(6637):1123–1130, March 2023.
- [32]. Evans Richard, O’Neill Michael, Pritzel Alexander, Kohli Pushmeet, Jumper John, and Hassabis Demis. Protein complex prediction with AlphaFold-Multimer. bioRxiv, October 2021.
- [33]. Read Randy J., Baker Edward N., Bond Charles S., Garman Elspeth F., and van Raaij Mark J. AlphaFold and the future of structural biology. Acta Crystallographica Section D Structural Biology, 79(7):556–558, July 2023.
- [34]. Varadi Mihaly, Bertoni Damian, Magana Paulyna, Steinegger Martin, Hassabis Demis, and Velankar Sameer. AlphaFold protein structure database in 2024: providing structure coverage for over 214 million protein sequences. Nucleic Acids Research, 52(D1):D368–D375, November 2023.
- [35]. Bertoline Letícia M. F., Lima Angélica N., Krieger Jose E., and Teixeira Samantha K. Before and after AlphaFold2: An overview of protein structure prediction. Frontiers in Bioinformatics, 3, February 2023.
- [36]. Dauparas J., Anishchenko I., Bennett N., Bera A. K., King N. P., and Baker D. Robust deep learning–based protein sequence design using ProteinMPNN. Science, 378(6615):49–56, October 2022.
- [37]. Hsu Chloe, Verkuil Robert, Liu Jason, Lin Zeming, Hie Brian, Sercu Tom, Lerer Adam, and Rives Alexander. Learning inverse folding from millions of predicted structures. bioRxiv, April 2022.
- [38]. Liu Yufeng, Zhang Lu, Wang Weilun, Li Houqiang, Chen Quan, and Liu Haiyan. Rotamer-free protein sequence design based on deep learning and self-consistency. Nature Computational Science, 2(7):451–462, July 2022.
- [39]. The Nobel Prize. Nobel Prize in Chemistry 2024. Available from https://www.nobelprize.org/prizes/chemistry/2024/press-release/, 2024. Accessed 10 January 2025.
- [40]. Abriata Luciano A. The Nobel Prize in Chemistry: past, present, and future of AI in biology. Communications Biology, 7(1), October 2024.
- [41]. Lu Peilong and Liu Lei. Computational protein design and structure prediction—the 2024 Nobel Prize in Chemistry. Science China Chemistry, 68(3):812–814, December 2025.
- [42]. Xiong Peng, Hu Xiuhong, Huang Bin, Zhang Jiahai, Chen Quan, and Liu Haiyan. Increasing the efficiency and accuracy of the ABACUS protein sequence design method. Bioinformatics, 36(1):136–144, June 2019.
- [43]. Xiong Peng, Wang Meng, Zhou Xiaoqun, Zhang Tongchuan, Zhang Jiahai, Chen Quan, and Liu Haiyan. Protein design with a comprehensive statistical energy function and boosted by experimental selection for foldability. Nature Communications, 5(1), October 2014.
- [44]. Leaver-Fay Andrew, Tyka Michael, Lewis Steven M, Sheffler Will, et al. Rosetta3: an object-oriented software suite for the simulation and design of macromolecules. In Methods in Enzymology, volume 487, pages 545–574. Elsevier, 2011.
- [45]. Koehler Leman Julia, Weitzner Brian D., Lewis Steven M., Watkins Andrew, Zimmerman Lior, and Bonneau Richard. Macromolecular modeling and design in Rosetta: recent methods and frameworks. Nature Methods, 17(7):665–680, June 2020.
- [46]. Yekeen Abeeb Abiodun, Zhang Lu, Liu Haiyan, and Chen Quan. Simultaneous positive and negative selection of proteases in bacterium based on cell suicide and antibiotic resistance. Biotechnology Journal, 18(6), April 2023.
- [47]. Bennett Nathaniel R., Coventry Brian, Goreshnik Inna, De Munck Steven, Savvides Savvas N., and Baker David. Improving de novo protein binder design with deep learning. Nature Communications, 14(1), May 2023.
- [48]. Raveh Barak, London Nir, and Schueler-Furman Ora. Sub-angstrom modeling of complexes between flexible peptides and globular proteins: Sub-angstrom modeling of flexible peptides. Proteins: Structure, Function, and Bioinformatics, 78(9):2029–2040, March 2010.
- [49]. Abramson Josh, Adler Jonas, Dunger Jack, Jaderberg Max, Hassabis Demis, and Jumper John M. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, 630(8016):493–500, May 2024.
- [50]. Wohlwend Jeremy, Corso Gabriele, Passaro Saro, Silterra Jacob, Jaakkola Tommi, and Barzilay Regina. Boltz-1 democratizing biomolecular interaction modeling. bioRxiv, November 2024.
- [51]. Chen Xinshi, Zhang Yuxuan, Lu Chan, Shi Bo, Shi Shaochen, and Xiao Wenzhi. Protenix - advancing structure prediction through a comprehensive AlphaFold3 reproduction. bioRxiv, January 2025.
- [52]. Boitreaud Jacques, Dent Jack, McPartlon Matthew, Meier Joshua, Reis Vinicius, Rogozhnikov Alex, and Wu Kevin. Chai-1: Decoding the molecular interactions of life. bioRxiv, October 2024.
- [53]. Dauparas Justas, Lee Gyu Rie, Pecoraro Robert, An Linna, Anishchenko Ivan, Glasscock Cameron, and Baker David. Atomic context-conditioned protein sequence design using LigandMPNN. Nature Methods, 22(4):717–723, March 2025.
- [54]. Yekeen Abeeb Abiodun, Durojaye Olanrewaju Ayodeji, Idris Mukhtar Oluwaseun, Muritala Hamdalat Folake, and Arise Rotimi Olusanya. CHAPERONg: A tool for automated GROMACS-based molecular dynamics simulations and trajectory analyses. Computational and Structural Biotechnology Journal, 21:4849–4858, 2023.
- [55]. Harmalkar Ameya, Lyskov Sergey, and Gray Jeffrey J. Reliable protein-protein docking with AlphaFold, Rosetta, and replica-exchange. eLife, February 2025.
- [56]. Tian Chaoyu, Yang Jiangang, Liu Cui, Ma Hongwu, Sun Yuanxia, and Ma Yanhe. Engineering substrate specificity of HAD phosphatases and multienzyme systems development for the thermodynamic-driven manufacturing sugars. Nature Communications, 13(1), June 2022.
- [57]. Lu Jiangong, Lv Xueqin, Yu Wenwen, Du Guocheng, Chen Jian, and Liu Long. Reshaping phosphatase substrate preference for controlled biosynthesis using a “design–build–test–learn” framework. Advanced Science, 11(22), March 2024.
- [58]. Mirdita Milot, Schütze Konstantin, Moriwaki Yoshitaka, Heo Lim, Ovchinnikov Sergey, and Steinegger Martin. ColabFold: making protein folding accessible to all. Nature Methods, 19(6):679–682, May 2022.
- [59]. Li Weizhong and Godzik Adam. CD-HIT: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics, 22(13):1658–1659, May 2006.
- [60]. McDonald Ian K. and Thornton Janet M. Satisfying hydrogen bonding potential in proteins. Journal of Molecular Biology, 238(5):777–793, May 1994.
- [61]. Du Guangyan, Rao Suman, Gurbani Deepak, Westover Kenneth D., Zhang Tinghu, and Gray Nathanael S. Structure-based design of a potent and selective covalent inhibitor for SRC kinase that targets a P-loop cysteine. Journal of Medicinal Chemistry, 63(4):1624–1641, January 2020.
- [62]. Gurbani Deepak, Du Guangyan, Henning Nathaniel J., Rao Suman, Bera Asim K., Zhang Tinghu, Gray Nathanael S., and Westover Kenneth D. Structure and characterization of a covalent inhibitor of SRC kinase. Frontiers in Molecular Biosciences, 7, May 2020.