Abstract
Protein kinases are central regulators of cell signaling and play pivotal roles in a wide array of diseases, most notably cancer and autoimmune disorders. The clinical success of kinase inhibitors—such as imatinib and osimertinib—has firmly established kinases as valuable drug targets. However, the development of selective, potent inhibitors remains challenging due to the conserved nature of the ATP-binding site, off-target effects, resistance mutations, and patient-specific variability. Recent advances in artificial intelligence (AI) and machine learning (ML) offer transformative solutions to these obstacles across the drug discovery pipeline. This review explores how AI/ML methods, including deep learning, graph neural networks, and generative models, are revolutionizing the design, optimization, and repurposing of kinase inhibitors. We detail applications in target identification, virtual screening, structure–activity relationship modeling, resistance prediction, and clinical trial design. Representative case studies—such as AI-optimized BTK and EGFR inhibitors—highlight real-world impact. We also examine current limitations, including data sparsity, model interpretability, and translational gaps between in silico and experimental results. Finally, we discuss emerging directions such as federated learning, personalized kinase inhibitors, and AI-enabled combination therapies. By integrating computational innovation with medicinal chemistry, AI/ML holds immense promise to accelerate and refine the next generation of kinase-targeted therapeutics.
This review highlights the integration of artificial intelligence (AI) and machine learning (ML) in the discovery of kinase inhibitors, showcasing recent advances, challenges, and the transformative potential of AI in precision drug design.
1. Introduction
Protein kinases are enzymes that transfer a phosphate group to target proteins, thereby regulating cell growth, survival and immune signaling.1,2 Dysregulated kinase signaling drives many diseases: activating mutations or overexpression of kinases underlie many cancers,1 while kinases such as JAKs and SYK control inflammatory and autoimmune pathways.2 Kinase inhibitors have become among the most successful targeted therapies.3 Kinases are the second most targeted family of drug targets after GPCRs. Over 80 small-molecule kinase inhibitors (e.g. BCR–ABL, EGFR, HER2 inhibitors) are FDA-approved, and many more are in clinical trials.4 These targeted drugs have significantly improved patient outcomes (e.g. in CML and lung cancer). Kinase inhibitors also treat other diseases: JAK inhibitors (tofacitinib, baricitinib) are approved for rheumatoid arthritis and related autoimmune disorders. These successes underscore the broad relevance of kinases as drug targets.5
Despite these advances, inhibitor development faces significant challenges. Chief among these is selectivity: most ATP-competitive inhibitors bind a conserved active site shared across the kinome, so they often hit multiple kinases and cause off-target toxicity.1 For example, first-generation kinase inhibitors caused cardiotoxicity and other side effects by inhibiting unintended kinases.1 Designing molecules with both high potency and high target specificity is therefore difficult. Moreover, patient-specific factors (tumor microenvironment, epigenetics) can modulate response to kinase drugs, adding further complexity to efficacy.6 Given the complexity of kinase selectivity and patient heterogeneity, novel computational approaches are needed to streamline drug discovery and enhance precision.
Traditional medicinal chemistry struggles to fully overcome these selectivity and toxicity challenges. In response, artificial intelligence (AI) and machine learning (ML) have emerged as powerful tools to streamline kinase drug discovery. By learning from large datasets, AI/ML models can predict inhibitor selectivity, optimize lead compounds, and propose novel molecules with improved specificity, thus complementing and accelerating traditional approaches (Fig. 1).
Fig. 1. Diagram summarizing kinase signaling and the challenges in inhibitor design. Kinases propagate signals via phosphorylation cascades, but the conserved ATP-binding pocket (highlighted) makes selective inhibition difficult, and cancer cells can acquire resistance mutations. The figure conceptually illustrates a kinase activation pathway with common obstacles (selectivity, mutation-driven resistance) indicated.
AI and ML methods offer data-driven ways to address these problems. Intuitively, ML algorithms can learn complex structure–activity patterns from large datasets that are hard to grasp manually.7,8 For example, supervised models (deep neural networks, random forests, etc.) trained on known bioactivity data can predict which new compounds are likely to inhibit a given kinase or avoid off-target interactions.9 Graph neural networks (GNNs) operate on molecular graphs (atoms connected by bonds), allowing them to capture structural features.10 Deep generative models can propose entirely new chemical structures by learning from existing libraries.11 These methods have already shown promise: for instance, a recent study combined a generative deep learning model with reinforcement learning to design novel EGFR kinase inhibitors, identifying compounds whose activity was later validated experimentally.12 Ultimately, these AI/ML proposals are hypotheses – the promising candidates generated by algorithms must still be synthesized and tested in the laboratory. Nevertheless, by prioritizing likely leads and predicting selectivity profiles, AI/ML approaches can augment traditional medicinal chemistry and help focus experimental efforts.
2. Background
2.1. Kinase inhibitors: a primer
Protein kinases share a conserved catalytic domain (∼250–300 amino acids) composed of two lobes: an N-terminal lobe (mostly β-sheets) and a larger C-terminal lobe (mostly α-helices). The ATP-binding cleft lies between the lobes, and conformational elements like the activation loop (with its DFG motif) regulate whether the kinase is “on” or “off.” Many kinases also have additional regulatory domains (e.g. SH2, PH domains), but most small-molecule drugs target the core catalytic site. Subtle structural differences among kinases (e.g. at the gatekeeper residue or hidden allosteric pockets) can be exploited to achieve selectivity.
Kinase inhibitors are often classified by binding mode. ATP-competitive inhibitors occupy the conserved ATP pocket. Type I inhibitors bind the active conformation, whereas type II inhibitors (like imatinib) bind an inactive “DFG-out” conformation.5 Type II binding engages a unique hydrophobic pocket and often enhances selectivity.13,14 Allosteric inhibitors bind outside the ATP site (type III/IV), at kinase-specific pockets.15,16 These can achieve high selectivity by exploiting features unique to a particular kinase. Covalent inhibitors form irreversible bonds (typically to a nucleophilic cysteine) in the active site.17 Examples include afatinib (EGFR inhibitor) and ibrutinib (BTK inhibitor), which covalently modify their targets. Covalent binding often yields prolonged inhibition and, since only a subset of kinases has the reactive cysteine, can improve selectivity.18
These strategies have yielded many clinical successes. Imatinib (BCR–ABL inhibitor) dramatically improved outcomes in chronic myeloid leukemia.19,20 EGFR inhibitors (gefitinib, osimertinib) revolutionized EGFR-mutant lung cancer therapy, and HER2-targeted agents (lapatinib, trastuzumab) are effective in breast cancer.21 BRAF inhibitors (vemurafenib) and MEK allosteric inhibitors (trametinib) have transformed melanoma treatment.22 In the inflammatory/immunology arena, JAK inhibitors (tofacitinib, baricitinib) are approved for rheumatoid arthritis and related autoimmune diseases.23 Even monoclonal antibodies like trastuzumab (against HER2) and cetuximab (against EGFR) exploit kinase biology to block signaling. These examples show how diverse inhibitor types (ATP-competitive, allosteric, covalent, antibody) can be harnessed clinically.
Ongoing challenges remain. Resistance invariably emerges: secondary mutations (EGFR T790M/C797S, BCR–ABL T315I) and alternative signaling can bypass single-agent therapies.24 Achieving absolute isoform specificity is still difficult, so many inhibitors retain activity on multiple kinases and risk off-target effects.24 In practice, medicinal chemists must balance potency, selectivity and drug-like properties. Novel strategies (e.g. bivalent inhibitors targeting two sites, proteolysis-targeting chimeras) are under study to overcome these limitations. Ultimately, even successful kinase drugs often need combination or sequential therapies to address tumor heterogeneity and adaptive resistance.
2.2. Basics of AI/ML in drug discovery
Core AI/ML concepts relevant to drug discovery include supervised learning, unsupervised learning, deep learning, reinforcement learning, and generative models.
• Supervised learning involves training models (neural networks, random forests, etc.) on labeled data—such as molecules annotated with IC50 values—to predict biological activity or binding affinity. Once trained, these models can virtually screen large compound libraries and flag likely actives.
• Unsupervised learning uncovers hidden patterns in unlabeled data. Techniques like clustering, principal component analysis, and autoencoders can group compounds by structural similarity or bioactivity, revealing chemotype families or highlighting novel scaffolds.
• Deep learning uses multi-layer neural architectures (e.g. convolutional, recurrent, graph-based networks) to learn complex features from raw chemical representations such as SMILES strings or molecular graphs. These models can identify substructures or spatial patterns that correlate with biological activity and support end-to-end prediction workflows.
• Reinforcement learning (RL) enables iterative compound optimization. Here, an algorithm proposes modifications to a molecule and receives a “reward” signal if predicted properties (e.g. potency, selectivity) improve. Over time, this strategy can produce optimized candidates, and has been successfully used to generate improved kinase inhibitors.12
• Graph neural networks (GNNs) represent molecules as graphs of atoms (nodes) and bonds (edges) and iteratively update atomic features based on neighboring atoms. GNNs excel in chemistry because they naturally encode molecular connectivity and local structure.10
• Generative models, including autoencoders, variational autoencoders (VAEs), and generative adversarial networks (GANs), can produce novel chemical structures once trained on large libraries of known compounds. Generative deep learning has already produced new kinase inhibitor scaffolds in silico,11 illustrating its promise for lead discovery. Recent methods have also adapted transformer-based architectures (from natural language processing) for molecular generation (treating SMILES as a ‘language’ of chemistry).25
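To make the GNN bullet above concrete, the following minimal sketch runs sum-aggregation message passing on a toy molecular graph. It is an illustration under simplifying assumptions, not the architecture of any cited model: the three-atom graph, the one-hot features, and the function name `message_passing_round` are hypothetical, and real GNNs additionally apply learned weights and nonlinearities at each step.

```python
# Toy message passing: each atom adds the sum of its neighbors' features to its own.
# Hypothetical example; real GNN layers also use learned weights and activations.

def message_passing_round(features, bonds):
    """Return updated per-atom feature vectors after one aggregation round."""
    neighbors = {i: [] for i in range(len(features))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    updated = []
    for i, own in enumerate(features):
        aggregated = list(own)  # start from the atom's own features
        for j in neighbors[i]:
            for k, value in enumerate(features[j]):
                aggregated[k] += value
        updated.append(aggregated)
    return updated

# Heavy atoms of ethanol (C-C-O); features are one-hot [is_carbon, is_oxygen].
atoms = [[1, 0], [1, 0], [0, 1]]
bonds = [(0, 1), (1, 2)]

round1 = message_passing_round(atoms, bonds)
round2 = message_passing_round(round1, bonds)
print(round1)  # the terminal carbon only "sees" its carbon neighbor
print(round2)  # after two rounds it also carries information about the oxygen
```

Each additional round widens the neighborhood an atom's features summarize by one bond, which is the sense in which such models encode local structure.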
Common software frameworks and libraries (TensorFlow, PyTorch, RDKit) implement these methods, but the key is understanding data. For kinase drug discovery, relevant data sources include structural and bioactivity databases. The abundant publicly available data on kinases make ML especially promising. For example, thousands of kinase structures have been solved (available in the PDB) and databases like ChEMBL contain millions of measured kinase-inhibitor activities. ML models can be trained on these datasets to predict binding and selectivity for new compounds, complementing traditional docking and QSAR approaches. Genomic and transcriptomic datasets (e.g. TCGA cancer profiles, LINCS gene-expression signatures) offer context on disease biology and drug response. In summary, these diverse data allow AI/ML models to relate molecular structures to biological outcomes, aiding kinase inhibitor design (Fig. 2). In practice, AI/ML helps focus experimental efforts on the most promising candidates, potentially accelerating the design–synthesis–test cycle.
Fig. 2. Overview of AI/ML techniques in kinase drug discovery.
3. AI/ML applications in kinase inhibitor development
3.1. Target identification and validation
While kinases are a well-known target class for medicinal chemists, identifying the most relevant kinase among hundreds remains a major challenge—especially as cancers evolve and adapt.26 Modern AI/ML approaches can sift through complex biological data to pinpoint dysregulated kinases and other actionable targets. By integrating multi-omics and prior knowledge, these methods generate ranked lists of candidate targets (e.g. kinases) that warrant further experimental validation. In simple terms, these tools act like intelligent filters—narrowing down massive datasets into the most promising drug targets, including kinases that show abnormal behavior in disease. Below we highlight representative tools and strategies for AI-enabled kinase target discovery and validation, with a focus on their inputs, algorithms, and outputs (Fig. 3).
Fig. 3. Overview of AI/ML approaches for kinase target identification.
3.1.1. Integrative multi-omics platforms
Input data and methods
Multi-omics integration platforms combine diverse molecular data from the same patient cohort to identify disease-specific targets. For example, Multiomics2Targets (Deng et al., 2024)27 is a web platform that accepts matched transcriptomic, proteomic and phosphoproteomic profiles from cancer cohorts. It applies a series of enrichment and network analyses using established tools (Enrichr,28 ChEA3,29 KEA3,30 Expression2Kinases/X2K,31 TargetRanger32) to rank proteins, genes and transcripts as potential targets.
Output
The output is an automated report (figures, tables, text) highlighting the highest-scoring candidates. In a pan-cancer CPTAC analysis, Multiomics2Targets successfully recovered subtype-specific targets across tumor types.27 Such integrative tools help chemists by reducing hundreds of dysregulated signals to a manageable list of prioritized kinase targets, facilitating experimental follow-up.
Limitations and applicability
Beyond Multiomics2Targets, related pipelines leverage curated databases and statistical models. For instance, Expression2Kinases (X2K) infers upstream kinases from gene expression signatures,31 while TargetRanger ranks drug-target genes from multi-omic input.32 These methods effectively translate complex omics patterns into candidate drug targets (often upstream kinases) by exploiting known gene–protein and protein–protein networks. Importantly, they can highlight kinases that are disproportionately implicated by the data (“actionable nodes”) and warrant medicinal chemistry attention. However, an inherent limitation of these enrichment-based approaches is their dependence on existing knowledge: novel or context-specific kinase drivers not represented in pathway databases may be missed. They also typically provide broad candidate lists that require further filtering or validation. On the positive side, their use of established networks makes results highly interpretable and user-friendly (e.g. Multiomics2Targets automatically generates a publication-style report for easy examination). In practice, integrative multi-omics pipelines are most precise when rich, matched datasets are available, and they excel at quickly flagging well-known pathways or kinases implicated across data types. It is important to be aware that these tools may over-prioritize well-studied kinases due to database bias, and their recommendations should be cross-checked for truly novel insights.
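The idea behind enrichment-based kinase ranking can be sketched with a generic over-representation test. The example below is a minimal sketch and assumes a plain hypergeometric tail probability. It is not the scoring scheme of KEA3, X2K, or any other tool named above, and the kinase names, site identifiers, and universe size are hypothetical.

```python
from math import comb

def hypergeom_pvalue(universe_size, set_size, query_size, overlap):
    """P(X >= overlap) when query_size items are drawn from a universe
    that contains set_size annotated substrates."""
    upper = min(set_size, query_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(universe_size, query_size)

def rank_kinases(substrate_sets, query, universe_size):
    """Rank kinases by how strongly their known substrates are over-represented in the query."""
    results = []
    for kinase, substrates in substrate_sets.items():
        overlap = len(substrates & query)
        p = hypergeom_pvalue(universe_size, len(substrates), len(query), overlap)
        results.append((kinase, overlap, p))
    return sorted(results, key=lambda r: r[2])

# Hypothetical kinase-substrate annotations and a list of dysregulated phosphosites.
substrate_sets = {
    "KIN_A": {"s1", "s2", "s3", "s4", "s5"},
    "KIN_B": {"s9", "s10", "s11", "s12", "s13"},
}
dysregulated = {"s1", "s2", "s3", "s9"}

ranked = rank_kinases(substrate_sets, dysregulated, universe_size=100)
for kinase, overlap, p in ranked:
    print(kinase, overlap, f"{p:.2e}")
```

The sketch also shows the database bias discussed above: a kinase with no annotated substrates never appears in `substrate_sets` and therefore cannot be ranked, whatever the data show.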
3.1.2. Network and graph-based models
Input data and methods
Another strategy uses network inference and graph-based learning to identify key kinases. The SPHINKS algorithm (substrate PHosphosite-based inference for network of kinaseS)33 integrates quantitative proteomic and phosphoproteomic data to construct a kinase–substrate interaction network. SPHINKS scores kinase–phosphosite pairs across samples and identifies “master kinases” whose activity best explains the observed phosphoproteome. Output: in glioblastoma, SPHINKS predicted subtype-specific drivers (e.g. PKCδ, DNA-PK), some of which were experimentally validated as regulators of tumor growth.33 Limitations and applicability: this approach effectively prioritizes kinases by exploiting the pattern of substrate phosphorylation, and it was shown to be robust to missing data (via cross-validation on known kinase–substrate interactions).33 In practice, SPHINKS narrows the kinome to a few candidate “master” enzymes per cancer subtype, guiding further target validation studies. Because SPHINKS relies on specialized phosphoproteomic input, its utility is highest in contexts like proteogenomic projects (e.g. CPTAC) where phosphorylation data are plentiful. It may be less applicable when only transcriptomic or genomic data are available, but its focus on functional post-translational activity provides a level of mechanistic precision that expression-only methods (like X2K) might lack. The strength of SPHINKS lies in its mechanistic network inference (highlighting direct kinase–substrate relationships), although it could overlook potential drivers that do not produce clear phosphosite signals in the dataset.
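The general principle of inferring kinase activity from substrate phosphorylation can be illustrated with a much simpler stand-in than SPHINKS itself. This sketch assumes that a kinase's activity in a sample can be approximated by the mean z-score of its known substrate phosphosites. That averaging rule, the site names, and the kinase labels are all assumptions for illustration and do not reproduce the published algorithm.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize one phosphosite across samples (all zeros if it does not vary)."""
    mu, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]

def kinase_activity(phospho, kinase_substrates):
    """phospho: {site: [intensity per sample]}.
    Returns {kinase: [mean substrate z-score per sample]} for kinases
    with at least one measured substrate site."""
    z = {site: zscores(vals) for site, vals in phospho.items()}
    n_samples = len(next(iter(phospho.values())))
    scores = {}
    for kinase, sites in kinase_substrates.items():
        measured = [s for s in sites if s in z]  # unmeasured sites are skipped
        if not measured:
            continue
        scores[kinase] = [mean(z[s][i] for s in measured) for i in range(n_samples)]
    return scores

# Hypothetical data: three samples, three measured phosphosites.
phospho = {
    "siteA": [1.0, 1.0, 3.0],
    "siteB": [2.0, 2.0, 6.0],
    "siteC": [5.0, 5.0, 5.0],
}
kinase_substrates = {"KIN_X": ["siteA", "siteB"], "KIN_Y": ["siteC", "siteD"]}

activity = kinase_activity(phospho, kinase_substrates)
print(activity["KIN_X"])  # highest in the third sample, where both substrates rise
print(activity["KIN_Y"])  # flat: its only measured substrate does not change
```

The `KIN_Y` case mirrors the limitation noted above: a kinase whose substrates are missing from the dataset, or show no clear signal, produces no usable activity estimate.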
Graph neural networks (GNNs) are another powerful class of models. Input data and methods: CancerOmicsNet, for example, is a deep-learning framework that embeds multi-modal cancer data (e.g. cell-line genomics, chemical features of drugs) into a graph and applies graph convolution to predict cell viability after kinase inhibitor treatment.26 Output: crucially, CancerOmicsNet uses explainable AI techniques: a customized saliency map reveals which kinases (nodes) most influence each prediction. Limitations and applicability: in practice, high-saliency kinases often correspond to known oncogenic drivers or drug targets. Singha et al. demonstrated that CancerOmicsNet recapitulated essential kinases across cancers and suggested novel candidates by comparing the saliency-weighted kinases to literature.26 Such GNN approaches can integrate heterogeneous data and explicitly highlight kinase nodes as outputs, making them attractive for target discovery. They typically include rigorous validation (e.g. cross-validation on held-out cell-line datasets) and benchmark their predictions against experimental responses. However, deep graph-based models have their own trade-offs: they require large, well-annotated training datasets (e.g. drug screening results across many cell lines) and substantial computational resources. Their predictions can be less transparent than simpler statistical methods, so explainability methods (like saliency maps) are essential to interpret why a particular kinase is deemed important. In addition, a GNN like CancerOmicsNet excels in the context of predicting drug response or essential kinases in in vitro models, but applying it directly to patient data can be challenging unless similar multi-modal datasets are available.
In short, graph-based AI offers high predictive power and the ability to uncover complex multi-gene patterns, but users must be mindful of its black-box nature and the need for careful interpretation and validation of any suggested targets.
3.1.3. Pathway-informed deep learning
Input data and methods
Incorporating prior biological knowledge into AI models can improve interpretability.34 Recent transformer-based models like DeePathNet integrate curated pathway information with omics data. DeePathNet encodes known cancer-specific pathways into its neural architecture, enabling it to capture how genomic alterations perturb signaling cascades.35 This approach uses the body's existing biological roadmaps (known signaling pathways) to make AI predictions more trustworthy and aligned with known mechanisms. While originally developed for drug-response and subtype classification, the same idea can highlight pathways (and their constituent kinases) that are most perturbed in a tumor. Output: in general, pathway-aware models identify subnetworks of the kinome that are dysregulated, which guides chemists to focus on pathway-central kinases. Other examples (e.g. Pathformer) similarly improve prediction accuracy by constraining models to operate on established signaling pathways.36 Limitations and applicability: by using prior knowledge, these deep-learning approaches gain interpretability and often boost precision, as the model's attention is directed to real biological interactions. However, a key limitation is the bias toward known pathways – novel kinase interactions or off-pathway effects will not be captured if they are absent from the curated knowledge base. Thus, pathway-informed models are extremely useful for confirming and explicating known disease mechanisms (making their outputs more explainable to researchers), but they might miss entirely new kinase targets that lie outside canonical pathways.
These deep-learning approaches are often paired with feature-attribution methods. For instance, attention weights or saliency scores can point to specific pathways or genes that drive model decisions. When applied to kinase signaling, this can uncover novel or unexpected kinases for experimental follow-up, while still grounding the findings in biological context. In summary, pathway-informed AI models strike a balance between discovery and interpretability – they leverage what is known to ensure predictions are credible and biologically relevant, yet researchers should remain cautious about blind spots for biology that isn't yet in the databases.
3.1.4. Validation and prioritization
All AI-based target predictions require validation. It's important to note that no model is perfect; predictions must always be checked in the lab or against trusted datasets. Typical strategies include in silico benchmarking (cross-validation, test sets) and comparison with orthogonal data (e.g. genetic screens, literature). For example, CancerOmicsNet's top-ranked kinases were supported by extensive literature in many cases and some novel suggestions were consistent with known cancer biology.26 Similarly, the SPHINKS study performed cross-validation on known kinase–substrate pairs and experimentally confirmed the top kinase hits in cell assays.33 In practice, predicted kinases are often tested by CRISPR knockouts or small-molecule perturbations in relevant cancer models to confirm their functional importance. Comparative consideration: it is also crucial to consider the precision differences between methods: a simple multi-omics enrichment might return several dozen candidates (with some false positives due to correlated data), whereas a tailored deep model might output a more focused prediction (which could still be biased by training data). Therefore, any AI-generated target list should be prioritized and filtered using domain knowledge and additional experiments. Ultimately, these AI/ML methods for kinase target identification bring together heterogeneous data and advanced algorithms to highlight actionable kinases. Each approach comes with distinct strengths and limitations – from the ease-of-use and interpretability of enrichment platforms to the complexity and depth of graph neural networks – so the choice of tool should match the research context and data at hand. These approaches, ranging from multi-omics enrichment pipelines to network inference and deep-learning models, produce ranked candidate targets along with measures of confidence. Coupled with experimental and literature validation, they provide medicinal chemists with a data-driven shortlist of dysregulated kinases for drug development.
To facilitate comparison, Table 1 summarizes the data inputs, techniques, and types of outputs each tool provides. This can serve as a quick reference guide for selecting the most suitable approach for a given task.
Table 1. Representative AI/ML methods for kinase target identification, showing inputs, approaches, and outputs for target prioritization.
Tool/method | Input data | Algorithm/approach | Output/use | Ref. |
---|---|---|---|---|
Multiomics2Targets | Cohort transcriptomics, proteomics, phosphoproteomics | Integrative pipeline using enrichment tools (Enrichr, ChEA3, KEA3, X2K, TargetRanger) | Prioritized gene/protein targets; subtype-specific target lists | (Deng et al., 2024)27 |
SPHINKS | Phosphoproteomic (± proteomic) profiles | Kinase–substrate network inference (machine-learning scoring of kinase–phosphosite pairs) | Master kinases per cancer subtype (drivers of signaling) | (Migliozzi et al., 2023)33 |
CancerOmicsNet | Cell-line genomic data, drug features | Graph neural network + explainability (saliency mapping) | Predicted drug responses; essential kinases as explainable targets | (Singha et al., 2023)26 |
DeePathNet (Pathformer) | Transcriptomics (+ other omics); pathway databases | Transformer-based deep learning with pathway embeddings | Cancer type/drug response classification; highlights critical pathways | (Cai et al., 2022)35 |
Each method balances data input and interpretability differently. Multiomics2Targets relies on established enrichment tools for direct interpretability. SPHINKS uncovers latent kinase activities from phosphoproteomes. CancerOmicsNet and DeePathNet leverage deep learning to model complex patterns, with built-in explainability to recover key kinases. By understanding these nuances, researchers can better appreciate the opportunities and caveats presented by AI/ML tools in kinase target discovery, and apply them in the appropriate contexts to maximize success in drug research.
3.1.5. Repurposing kinase inhibitors
Repurposing kinase inhibitors has emerged as a promising strategy to expand their therapeutic applications beyond their originally intended targets. AI-driven methodologies, particularly network-based approaches, have played a pivotal role in identifying new indications for existing kinase inhibitors. A notable example is the repurposing of baricitinib, a JAK inhibitor, for COVID-19 treatment based on network pharmacology and AI-driven prediction models.37 By leveraging large-scale transcriptomic and drug-target interaction data, AI facilitates the identification of novel uses for kinase inhibitors, optimizing their therapeutic potential (Fig. 4).38
Fig. 4. Study design. Gene expression data from the LINCS project were mapped to 12 MeSH therapeutic use categories. A deep neural network (DNN) was trained independently on two input types: gene expression levels for 977 “landmark genes” and pathway activation scores for significantly perturbed samples, corresponding to input layers of 977 and 271 neural nodes, respectively (reproduced with permission Copyright 2024, Springer).
Machine learning models trained on pharmacogenomic datasets further enhance the repurposing process by predicting drug synergy and patient-specific responses. Recent studies have demonstrated that explainable ML models can uncover expression signatures associated with synergistic drug responses, enabling more precise drug repurposing efforts.39 This aligns with AI-driven clinical trial optimization strategies, where real-world data and predictive modeling refine patient selection and improve trial efficiency.40,41
Moreover, deep learning frameworks have accelerated large-scale drug discovery and repurposing by integrating chemical, biological, and clinical data.42 These models enable the identification of previously unrecognized kinase-inhibitor interactions, aiding in the expansion of therapeutic indications for FDA-approved drugs. Such approaches complement AI-driven resistance mutation prediction and next-generation inhibitor design, creating a comprehensive framework for kinase inhibitor development.
In addition to AI-driven predictions, computational modeling approaches such as molecular docking and dynamic simulations have provided structural insights into kinase-inhibitor interactions, offering a mechanistic rationale for repurposing efforts. These techniques allow researchers to evaluate the binding affinity of kinase inhibitors to off-target proteins, facilitating the identification of alternative disease indications. The integration of AI-generated hypotheses with experimental validation further enhances the reliability of repurposing strategies, expediting the transition from computational predictions to clinical applications.
The future of kinase inhibitor repurposing is poised to benefit from the convergence of AI, multi-omics data, and precision medicine. Federated learning and generative AI are expected to refine predictive accuracy while preserving patient privacy and data security. Furthermore, the combination of AI with high-throughput screening technologies may uncover novel kinase targets across a spectrum of diseases, expanding therapeutic possibilities and improving patient outcomes. As AI continues to advance, its role in kinase inhibitor repurposing will likely become increasingly indispensable, driving innovation in drug development and personalized medicine.
3.2. Virtual screening and hit generation
Virtual screening (VS) and hit generation are critical steps in kinase inhibitor discovery, enabling computational triage of large libraries to identify molecules likely to bind a target. The approaches fall into structure-based methods (docking) and ligand-based methods (similarity/QSAR), with emerging de novo generative strategies. AI and ML techniques have been increasingly integrated to augment traditional VS workflows. In the kinase context, screening must account for challenges such as highly conserved ATP pockets and the need for selectivity across the kinome. This section reviews classical and ML-enhanced VS strategies—structure-based, ligand-based, and generative—with an emphasis on cancer-related kinases and representative examples (Fig. 5).43,44
Fig. 5. AI-enhanced virtual screening workflows in kinase inhibitor discovery: a unified pipeline for structure-based, ligand-based, and generative strategies (the kinase protein shown in the figure is PDGFRA (PDB ID: 6JOL)).
3.2.1. Structure-based virtual screening
Input data and methods
Structure-based VS (SBVS) uses 3D structures of kinase targets to dock candidate ligands. Traditional docking programs (e.g. AutoDock Vina, Glide) sample ligand orientations and score fits with empirical or physics-based functions. Recent AI-driven methods accelerate and improve SBVS. For example, Deep Docking (DeepDock) combines docking and ML: only a subset of molecules is docked, and an ML model trained on those results predicts docking scores for the rest of the library. This strategy can achieve ∼100-fold speed-ups in screening billion-scale libraries.45 Likewise, Luttens et al. demonstrated that a classifier trained on 1 million docked compounds could guide screening of a 3.5 billion-molecule library, reducing computational cost by ∼1000-fold and yielding novel ligands with multi-target activity.46
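The dock-a-subset, predict-the-rest idea can be sketched in a few lines. This is a heavily simplified illustration, not the published Deep Docking protocol: `mock_dock` is a hypothetical stand-in for a real docking run, molecules are reduced to a single made-up descriptor, and a nearest-neighbor lookup replaces the trained neural network.

```python
import random

def mock_dock(descriptor):
    """Hypothetical stand-in for an expensive docking run; lower scores are better."""
    return (descriptor - 0.3) ** 2 - 1.0

def predict_nearest(train, descriptor):
    """Surrogate model: copy the score of the closest already-docked molecule."""
    nearest = min(train, key=lambda pair: abs(pair[0] - descriptor))
    return nearest[1]

def surrogate_screen(library, sample_size, keep_fraction, seed=0):
    """Dock a random sample, predict scores for the rest, then dock only the
    fraction of remaining molecules with the best predicted scores."""
    rng = random.Random(seed)
    sampled = rng.sample(range(len(library)), sample_size)
    docked = {i: mock_dock(library[i]) for i in sampled}
    train = [(library[i], score) for i, score in docked.items()]
    sampled_set = set(sampled)
    rest = [i for i in range(len(library)) if i not in sampled_set]
    ranked = sorted(rest, key=lambda i: predict_nearest(train, library[i]))
    shortlist = ranked[: int(len(rest) * keep_fraction)]
    for i in shortlist:
        docked[i] = mock_dock(library[i])  # real docking only for the shortlist
    return docked, sampled

library = [i / 999 for i in range(1000)]  # toy one-dimensional "chemical space"
docked, sampled = surrogate_screen(library, sample_size=100, keep_fraction=0.1)
best_index = min(docked, key=docked.get)
print(len(docked), "of", len(library), "molecules docked; best index:", best_index)
```

In this toy setting only 190 of 1000 molecules are ever docked. The speed-ups quoted above come from applying the same principle to far larger libraries with far smaller docked fractions.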
Output
Such workflows enable rapid screening of vast libraries and can uncover novel chemical matter for kinases. GNINA, which augments docking with 3D convolutional neural networks trained on known complexes, raises pose prediction success (top-1 accuracy) from ∼58% to 73% on a test set of protein–ligand structures.47 Graph-based neural networks can predict docking scores directly: ScoreFormer is a graph-transformer model that learns to predict docking scores across chemical space, outperforming prior GNN approaches in hit recovery while running ∼1.65× faster.48 Such ML scorers can wrap around any docking engine, enabling faster rescoring or triage of candidates.
Limitations and applicability
Training accurate models requires large structural datasets: Huang et al. reported that training a pose predictor on an expanded “BindingNet v2” dataset (∼689 k protein–ligand complexes) greatly improved docking pose prediction success compared to training on the smaller PDBBind set.49
AlphaFold2 (AF2) has greatly expanded structural coverage of kinases,50 but standard AF2 models usually represent a single (often active/DFGin) conformation. Kinases can adopt multiple binding-site shapes (DFGin vs. DFGout, αC-helix positions, etc.), which affects ligand binding. To address this, Song et al. introduced an AF2 multi-state modeling (MSM) protocol: using state-specific templates, they generated alternative kinase conformations. MSM models achieved comparable structural accuracy to standard AF2 but led to better docking pose predictions and more chemically diverse hits in virtual screening.44 This demonstrates that accounting for kinase flexibility (e.g. screening multiple kinase conformers from crystal structures, molecular dynamics, or AF2 variants) can broaden scaffold discovery and overcome biases toward a single kinase state.
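One simple way to use several kinase conformations in a screen is to dock each ligand into every receptor model and keep its best score. The sketch below assumes lower scores are better and uses hypothetical scores. Best-score consensus is only one of several possible aggregation rules and is not claimed to be the MSM protocol's own.

```python
def ensemble_best_scores(scores_by_state):
    """Collapse per-conformation docking scores to the best (lowest) score per ligand.

    scores_by_state: {state_name: {ligand_id: score}}
    Returns {ligand_id: (best_score, state_that_gave_it)}.
    """
    best = {}
    for state, scores in scores_by_state.items():
        for ligand, score in scores.items():
            if ligand not in best or score < best[ligand][0]:
                best[ligand] = (score, state)
    return best

# Hypothetical docking scores for three ligands against two receptor states
scores_by_state = {
    "DFGin":  {"lig_A": -9.1, "lig_B": -6.0, "lig_C": -7.2},
    "DFGout": {"lig_A": -7.4, "lig_B": -9.8, "lig_C": -7.0},
}
best = ensemble_best_scores(scores_by_state)
ranking = sorted(best, key=lambda lig: best[lig][0])  # best-scoring ligand first
```

In this toy data `lig_B` ranks first only because the DFGout model is included; a screen against the DFGin structure alone would have ranked it last.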
3.2.2. Ligand-based virtual screening
Input data and methods
Ligand-based VS (LBVS) relies on known inhibitors to guide screening. Traditional approaches include 2D fingerprint similarity search or pharmacophore matching. ML has enhanced LBVS by learning activity patterns from large datasets. For kinases, extensive public activity data (e.g. ChEMBL) are available. A recent kinome-wide study trained a multi-task deep neural network on >650 000 kinase–ligand bioactivity points (covering 342 human kinases) and found it outperformed classical single-task models.51 Graph neural networks can also predict bioactivity from structure: e.g., VirtuDockDL employs a GNN and combines ligand and protein information.
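The classical 2D similarity search mentioned above reduces to a Tanimoto comparison between fingerprints. The minimal sketch below represents each fingerprint as a set of on-bit indices. The compound names, bit sets, and the 0.5 threshold are hypothetical, and in practice the fingerprints would come from a cheminformatics toolkit.

```python
def tanimoto(a, b):
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    inter = len(a & b)
    union = len(a) + len(b) - inter
    return inter / union if union else 0.0

def similarity_search(query, library, threshold=0.5):
    """Return (name, similarity) pairs at or above the threshold, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, sim) for name, sim in scored if sim >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Hypothetical on-bit sets for a known inhibitor (query) and four library compounds
query = {1, 4, 9, 16, 25, 36}
library = {
    "cmpd_1": {1, 4, 9, 16, 25, 49},     # shares 5 of 7 distinct bits with the query
    "cmpd_2": {1, 4, 9, 100, 121, 144},  # shares 3 of 9 distinct bits
    "cmpd_3": {2, 3, 5, 7},              # shares none
    "cmpd_4": {1, 4, 9, 16, 25, 36},     # identical to the query
}
hits = similarity_search(query, library)
```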
Output
In benchmarks, VirtuDockDL achieved ∼99% accuracy (AUC ≈ 0.99) on a HER2 kinase inhibitor dataset, surpassing a DeepChem ML baseline (89% accuracy) and AutoDock Vina docking alone (82%).43 It outperformed methods focusing solely on docking (RosettaVS) or ligand features (PyRMD) by integrating both modalities.43 These results illustrate that ML-enhanced LBVS can greatly improve hit identification when ample data exist.
Limitations and applicability
LBVS remains limited to the chemical space represented in the training set. While it offers high accuracy and speed, it may not generalize well to novel scaffolds or poorly annotated targets. Careful validation and post-screening filtering are essential.
3.2.3. De novo design and generative models
De novo design creates novel molecules without requiring a starting compound: rather than selecting from an existing library, the model proposes new structures. Classical methods include fragment linking and evolutionary algorithms.
Input data and methods
Modern AI-driven generative models have shown promise for kinase drug discovery. For example, REINVENT 4 is an open-source RNN-based framework (with transformer components) that generates molecular graphs or SMILES and optimizes them via reinforcement learning.52 Similarly, MolDQN (Zhou et al., 2019) uses a Deep Q-Network to sequentially apply chemically valid modifications, achieving 100% chemical validity and enabling multi-objective optimization (e.g. improving drug-likeness while maintaining similarity to a lead).53 Insilico Medicine's generative tensorial reinforcement learning (GENTRL) framework integrates a tensor-train-decomposed variational autoencoder with reinforcement learning and Kohonen self-organizing map (SOM)-based reward functions to bias generation toward novel and synthetically feasible DDR1 (discoidin domain receptor 1) kinase inhibitors;54 in a 21-day challenge it produced six novel DDR1 inhibitors, four of which were biochemically active (two also in cell-based assays) and one of which exhibited favorable pharmacokinetics in mice.
Generative models can also be conditioned on protein structure. RELATION (Wang et al., 2022) is a 3D structure-based generative model that encodes the protein pocket geometry and pharmacophore using a BiTL network, then performs docking-guided sampling to generate ligands.56 DeepFrag uses a CNN to suggest optimal fragment additions to a given ligand for lead optimization (Green et al., 2021).55
Output
These tools can propose molecules with desired properties tailored to kinase binding pockets. RELATION designed candidate inhibitors for AKT1 and CDK2 with favorable predicted binding affinities and pharmacophoric features.56 GENTRL yielded experimentally confirmed DDR1 kinase inhibitors, one of which showed favorable pharmacokinetics in mice.54
Limitations and applicability
These generative approaches require careful choice of reward/objective functions and post-filtering for synthetic feasibility. While powerful, they may propose chemically valid but non-synthesizable or toxic molecules. Diffusion models have emerged as a leading class for 3D structure generation, but they still face challenges in accurately predicting pharmacological properties or ensuring synthetic accessibility.57,58
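How strongly the reward definition steers a generator can be seen with a weighted-sum (scalarized) reward of the kind used for multi-objective lead optimization, where similarity to the lead is traded off against drug-likeness. The candidate names and property values below are hypothetical; in a real pipeline they would be computed by a cheminformatics toolkit or a predictive model.

```python
def scalarized_reward(similarity, drug_likeness, weight=0.5):
    """Weighted sum of two objectives in [0, 1]; weight is the emphasis on similarity."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * similarity + (1.0 - weight) * drug_likeness

def best_candidate(candidates, weight):
    """Pick the candidate with the highest scalarized reward for a given weight."""
    return max(candidates,
               key=lambda name: scalarized_reward(*candidates[name], weight=weight))

# Hypothetical (similarity to lead, drug-likeness) values for three proposed modifications
candidates = {
    "mod_1": (0.90, 0.55),
    "mod_2": (0.60, 0.80),
    "mod_3": (0.30, 0.95),
}
conservative = best_candidate(candidates, weight=0.8)  # favours staying close to the lead
exploratory = best_candidate(candidates, weight=0.2)   # favours drug-likeness
```

The same three candidates give different winners under the two weights, which is why reward weights deserve the same scrutiny as the model itself.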
3.2.4. Comparative analysis and practical considerations
The discussion below helps medicinal chemists weigh which approach to use based on available data, objectives, and experimental constraints.
• Screening strategy: structure-based VS requires a 3D target structure (crystal or predicted) and a compound library; it can discover new chemotypes if the model is accurate.44,59 Ligand-based VS uses known actives (requiring annotated data) and is computationally fast but limited to analogues of training compounds.43,51 Generative design creates novel molecules without existing leads, but its candidates must be validated by docking or experiment.
• Data requirements: docking needs accurate receptor models and curated compound libraries. LBVS-ML needs large kinase activity datasets (e.g. ChEMBL/BindingDB) and careful train/test splitting (often by target or scaffold) to gauge generalization. Generative models need extensive chemical training sets (often millions of compounds) and defined scoring functions (docking, drug likeness, etc.). Large structural datasets (PDBBind, BindingNet) support training of ML scoring functions.49
• Tools and performance: classical docking tools (Glide, GOLD, Vina) are well-validated but can be slow for ultra-large screens. AI-accelerated tools (DiffDock, EquiBind, TankBind, etc.) achieve much faster docking with comparable accuracy.59 Hybrid ML pipelines (e.g. VirtuDockDL) combine ligand and structure information to boost hit rates.43 Surrogate ML models (e.g. ScoreFormer) can reduce compute by predicting docking scores directly.48 Each approach has trade-offs: physics-based docking is generally more robust to novel targets, while learned models excel when similar complexes are in the training domain.
• Limitations: even the best AI models are not foolproof. It's important to combine these predictions with medicinal chemistry intuition and follow up with wet-lab validation. All methods can yield false positives – predicted binders may fail experimentally. Docking scores are approximate and ignore factors like solvation and entropy. Ligand-based ML cannot identify chemotypes far from the training set. Generative models may propose chemically valid but non-synthesizable or toxic molecules. In kinases, the high conservation of the ATP site can lead to off-target activity; selectivity filters (for example, using KLIFS-derived kinase pocket fingerprints)60 or cross-screening against multiple kinases are needed. Careful model validation is essential, typically using retrospective benchmarks (e.g. DUD-E,61 custom kinase test sets62,63) and metrics like ROC AUC64 and early enrichment.65 For example, a recent PoseX benchmark found that AI-based docking models (including DiffDock) achieved higher accuracy on known complexes, whereas traditional docking generalized better to unseen proteins.66
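The two validation metrics named in the last bullet, ROC AUC and early enrichment, can be computed directly from a ranked list. The sketch below uses hypothetical scores and labels and assumes higher scores mean "more likely active", so docking scores, where lower is better, would need their sign flipped first.

```python
def roc_auc(scores, labels):
    """ROC AUC as the fraction of (active, inactive) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def enrichment_factor(scores, labels, fraction=0.1):
    """Hit rate in the top-scoring fraction divided by the hit rate of the whole set."""
    n_top = max(1, round(fraction * len(scores)))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_hits = sum(y for _, y in ranked[:n_top])
    return (top_hits / n_top) / (sum(labels) / len(labels))

# Hypothetical retrospective benchmark: 3 actives (1) among 10 compounds
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
auc = roc_auc(scores, labels)
ef20 = enrichment_factor(scores, labels, fraction=0.2)
```

ROC AUC summarizes the whole ranking, whereas the enrichment factor looks only at the top of the list, which is the part that would actually be purchased and tested.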
Because kinase binding sites change shape between active and inactive forms, selecting the right structure to dock into is crucial. Docking against multiple receptor models (ensembles of crystal and AF2 structures) is one way to capture this flexibility. The KLIFS database (available at https://klifs.net), a structural resource that systematically aligns 85 conserved binding-site residues across all human kinase domains to standardize comparisons of inhibitor interactions, is widely used for analyzing pocket conformations and designing selectivity filters.60 For example, a kinase-focused virtual screening study leveraged KLIFS-derived interaction fingerprints to prioritize compounds targeting specific hinge and αC-helix interactions, leading to the discovery of JAK1-selective inhibitors (JAK1 IC50 ∼ 1.0 μM vs. TYK2 4.5 μM).67 Multi-task ML models can also be trained to predict kinase polypharmacology directly using multi-kinase data.51
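The JAK1/TYK2 figures quoted above translate into a selectivity window as follows. This is plain arithmetic on the two IC50 values from the text; the function names are illustrative.

```python
import math

def fold_selectivity(ic50_off_target, ic50_on_target):
    """Ratio of off-target to on-target IC50 (same units); larger means more selective."""
    return ic50_off_target / ic50_on_target

def pic50(ic50_molar):
    """Negative base-10 logarithm of an IC50 expressed in mol/L."""
    return -math.log10(ic50_molar)

jak1_ic50_um, tyk2_ic50_um = 1.0, 4.5  # values quoted in the text, in micromolar
fold = fold_selectivity(tyk2_ic50_um, jak1_ic50_um)
delta_pic50 = pic50(jak1_ic50_um * 1e-6) - pic50(tyk2_ic50_um * 1e-6)
```

Here `fold` is 4.5 and `delta_pic50` is about 0.65 log units, so the reported JAK1 preference is real but narrow.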
To summarize the models discussed above, Table 2 provides an overview of representative AI/ML tools applied in kinase-focused virtual screening and hit generation.
Table 2. Representative AI/ML tools applied in kinase-based virtual screening and hit generation.
Tool/method | Input data | Algorithm/approach | Output/use | Ref. |
---|---|---|---|---|
DeepDock | Protein structure; small subset of docked library compounds | Deep neural network (QSAR) trained on docking scores to iteratively filter ultra-large libraries | Ranks and enriches likely binders from billion-scale libraries, accelerating VS | Gentile et al., 2022 (ref. 45) |
GNINA | Protein 3D structure; candidate ligand structures (SMILES/3D) | Convolutional neural network (3D CNN) scoring function integrated into docking (AutoDock Vina) | Predicts binding poses and affinity scores with higher accuracy than classical scoring | McNutt et al., 2021 (ref. 47) |
ScoreFormer | Ligand molecular graphs with known docking scores (training data) | Graph transformer with principal neighborhood aggregation and learnable RW positional encodings | Surrogate scoring model for HTVS | Ciudad-Serrano et al., 2024 (ref. 48) |
AlphaFold2 MSM | Kinase amino-acid sequence; state-specific template structures | AlphaFold2 with tailored templates to predict multiple kinase conformations | Ensemble of kinase structures in different states to enhance docking & VS | Song et al., 2024 (ref. 44) |
VirtuDockDL | Target protein structure; compound library with known actives/inactives | Graph neural network combining ligand-based and structure-based screening | Identifies likely hits in large libraries; >99% accuracy in HER2 validation | Noor et al., 2024 (ref. 43) |
REINVENT 4 | Large SMILES dataset; optional constraints (scaffolds, linkers, etc.) | RNN with transformer and reinforcement/transfer learning for molecule design | De novo drug design generating molecules with certain criteria | Loeffler et al., 2024 (ref. 52) |
MolDQN | Initial lead molecule (SMILES or graph); reward function | Deep Q-learning (double DQN) modifying molecular graphs | Lead optimization ensuring 100% valid molecules | Zhou et al., 2019 (ref. 53) |
GENTRL | Drug-like molecules; activity prediction model | Conditional VAE + reinforcement learning for goal-directed generation | De novo hit generation with optimized potency and novelty | Zhavoronkov et al., 2019 (ref. 54) |
RELATION | 3D structure of target binding site | 3D conditional generator designing molecules within receptor active site | Structure-based de novo design with high predicted affinity | Wang et al., 2022 (ref. 56) |
DeepFrag | 3D protein-ligand complex with a ligand fragment removed | 3D CNN trained on pocket-ligand grids to suggest fragment additions | Fragment-based lead optimization to improve ligand affinity | Green et al., 2021 (ref. 55) |
DiffDock | Protein and ligand structures (SMILES or 3D) | Diffusion generative model with SE(3)-equivariance | Structure-based docking predicting ligand pose without exhaustive search | Corso et al., 2022 (ref. 59) |
EquiBind | Protein and ligand structures (geometry or generated) | SE(3)-equivariant GNN predicting pose and binding site | Ultra-fast binding pose prediction without iterative sampling | Stärk et al., 2022 (ref. 68) |
TankBind | Protein and ligand structures | Trigonometry-aware geometric GNN with distance-angle constraints | Predicts protein–ligand complex structure and affinity | Lu et al., 2022 (ref. 69) |
3.3. Lead optimization and selectivity profiling
Kinase inhibitor optimization demands specialized AI/ML methods that account for the large kinase family and its conserved active site. Kinases share a common ATP-binding fold, so off-target activity is common unless selectivity is explicitly modeled. Recent strategies therefore leverage kinome-wide data and multi-task modeling to predict on- and off-target activities simultaneously. For example, Tang et al. compiled the KIBA dataset (∼52 500 compounds × 467 kinases) by integrating IC50, Ki and Kd data.70 Such large-scale kinase bioactivity matrices enable training of multi-output QSAR models. In practice, models trained on kinome data can rapidly profile a compound's selectivity fingerprint across hundreds of kinases (the “kinome”). These approaches also incorporate ADMET objectives that are especially relevant for kinase chemotypes. Overall, AI-driven SAR modeling for kinases has shifted from single-target QSAR to proteochemometric and multi-objective frameworks. These frameworks go beyond potency: they can simultaneously assess drug-likeness, toxicity, and selectivity, mimicking the multi-parameter optimization that medicinal chemists perform (Fig. 6).71,72
Fig. 6. Workflow for multi-objective optimization in selective inhibitor design. This workflow consists of two parts: molecular structure generation and generated structure evaluation. ChemTS, a reinforcement learning-based tool, generates molecular structures, which are evaluated for selectivity, pharmacokinetics, and drug-likeness. The D-score function calculates rewards based on objective values, guiding the generator to optimize molecules through repeated cycles (reproduced with permission Copyright 2022, American Chemical Society).
3.3.1. Kinome-wide profiling with multi-task models
Input data and methods
Multiple groups have demonstrated that multi-task neural networks outperform traditional single-target models in kinome profiling. Schürer et al. showed that a multi-task deep neural network (MTDNN) trained on hundreds of thousands of ChEMBL and private kinase measurements (∼668 k data points, 342 kinases) achieved ∼85% accuracy in classifying compound activity across the kinome.51 The MTDNN exploited shared hidden representations to improve prediction for sparsely annotated kinases. This is similar to learning multiple related skills at once—AI can learn common patterns that apply across different kinases, improving performance even on kinases with limited data. Similarly, Ma et al. developed KinomePro-DL, a multi-task DNN trained on a cleaned compendium of 191 kinases (from 6 public datasets), which achieved auROC ∼0.95 and PR-AUC ∼0.92 for selectivity profile classification.73 KinomePro-DL was applied in a machine learning–guided virtual screening to discover novel CDK2 inhibitors with favorable selectivity.73
Graph neural networks have also been applied to kinome profiling. Bao et al. introduced AMGU, an auxiliary multi-task graph isomorphism network with uncertainty weighting, trained on activity data for 204 kinases.74 AMGU outperformed conventional descriptor-based and single-task GNN models on internal and external kinase test sets, indicating better generalizability. It was released as the KIP web service for kinome-wide polypharmacology prediction.74 The MTDNN study cited above (Hu et al., 2024) also reported external validation: the kinome-wide classifier again outperformed single-task methods, achieving ∼85% accuracy (ROC-AUC ∼0.8) on held-out LINCS KINOMEscan data.51
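Kinome-wide activity matrices are sparse, since most compounds have been measured against only a few kinases. A common way to train a multi-task classifier on such data is to compute the loss only over measured pairs. The sketch below illustrates that masking idea with hypothetical predictions and labels; it is an assumption about typical practice, not a description of how the cited models were implemented.

```python
import math

def masked_bce(pred_probs, labels):
    """Mean binary cross-entropy over measured compound-kinase pairs only.

    pred_probs, labels: lists of rows (one row per compound, one column per kinase).
    A label of None marks an unmeasured pair and contributes nothing to the loss.
    """
    total, n_measured = 0.0, 0
    for p_row, y_row in zip(pred_probs, labels):
        for p, y in zip(p_row, y_row):
            if y is None:
                continue
            p = min(max(p, 1e-7), 1 - 1e-7)  # clip to avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            n_measured += 1
    return total / n_measured if n_measured else 0.0

# Hypothetical 2 compounds x 3 kinases; only three of the six pairs were measured
labels = [[1, None, 0], [None, None, 1]]
preds = [[0.9, 0.5, 0.2], [0.3, 0.7, 0.6]]
loss = masked_bce(preds, labels)
```

Because every kinase column shares the same upstream representation in a multi-task network, the few measured labels of a sparsely annotated kinase still benefit from features learned on data-rich kinases.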
Output
These web-accessible tools (KinomePro-DL, KIP, etc.) allow users to input ligand structures and obtain predicted inhibition profiles across the kinome. In all of these studies, multi-task learning leverages the fact that chemical features predictive of activity on one kinase often transfer to related kinases, improving selectivity modeling overall.51
Limitations and applicability
AI models for kinome profiling must be trained on extensive bioactivity databases. In addition to KIBA, public sources like ChEMBL, BindingDB, and proprietary KinaseProfiler results are routinely mined. For example, Hu et al. built their dataset from ChEMBL and the Kinase Knowledge Base (KKB), covering 315 000 compounds and 342 kinases.51 Park et al. curated ∼288 k bioactivities for 391 kinases and 155 000 compounds from three reference databases.75 These large-scale kinome matrices form the basis for predictive models of selectivity. The IDG-DREAM drug–kinase binding prediction challenge further illustrated ML's power: crowdsourced models trained on such data accurately ranked activities of new kinase inhibitors across many targets, underscoring ML's advantage over classical QSAR.71
Input data and methods
At the modeling level, diverse architectures have been used. Convolutional and recurrent neural networks (CNNs, RNNs) have been applied to sequence-derived or graph-based representations of ligands and kinases.71 For example, AiKPro is an attention-based deep model that fuses structure-validated multiple sequence alignments (svMSA) of kinase domains with ensemble 3D descriptors of ligands. Other groups have incorporated docking or protein structure directly: Schifferstein et al. docked a comprehensive set of kinase inhibitors into high-quality X-ray structures and trained a “kinase-specific scoring function” neural network using docking features.76
Output
These models achieved robust predictive performance for kinase–ligand interactions. AiKPro achieved Pearson's r ≈ 0.88 on held-out kinase–ligand pairs, while the docking-informed model achieved R² ≈ 0.63–0.74 on unseen inhibitors (kinome-wide), illustrating how 3D pose features can augment ML predictions (Fig. 7).76
Fig. 7. Dataset composition and KinCo generation process. A) Dataset overview. The DTC dataset includes over 130 000 kinase-compound pairs with binding constants but no crystal structures (green). PDBBind2018-kinase contains 2244 kinase-compound pairs with binding affinities and corresponding crystal structures (orange). Kinome-wide binding constants from DTC were used to train structure-based affinity prediction models through docking and iterative training. B) Docking workflow. Homology models were generated for each target kinase using mammalian kinase structures as templates. Models with over 40% sequence identity to the target kinase were selected, and compounds were docked into these models, producing over 11 000 docked poses across various kinase conformations for each kinase-compound pair. KinCo comprises over 137 000 kinase-compound pairs with docked poses and experimental binding constants. C) Iterative training. Autodock Vina's scoring function selected the pose with the highest predicted binding affinity for each kinase-compound pair, which was paired with its experimental binding affinity to train the initial model, KinCoNet-M1. This model predicted poses with the highest affinity for training the next model, KinCoNet-M2. This process can be repeated n times to develop KinCoNet-Mn (reproduced with permission Copyright 2023, American Chemical Society).
3.3.2. Multi-objective optimization and kinase-specific ADMET
Input data and methods
Lead optimization for kinases often must balance potency and drug-like properties. AI/ML strategies thus frequently employ multi-objective frameworks. Yoshizawa et al. developed a de novo molecular generator (ChemTS) guided by reinforcement learning to optimize 18 objectives simultaneously.72 These included inhibitory activities against nine EGFR-family kinases plus three pharmacokinetic endpoints (e.g. solubility, metabolic stability) and drug-likeness metrics.
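A multi-objective reward of the kind described here is often built from per-objective desirabilities combined by a geometric mean. The sketch below is a generic desirability-style score with hypothetical property values and bounds; it is not claimed to reproduce the exact D-score used in the cited study.

```python
import math

def desirability(value, low, high, maximize=True):
    """Linear desirability in [0, 1] between a lower and an upper bound."""
    d = (value - low) / (high - low)
    d = min(max(d, 0.0), 1.0)
    return d if maximize else 1.0 - d

def d_score(desirabilities):
    """Geometric mean; any objective with zero desirability zeroes the whole score."""
    if any(d == 0.0 for d in desirabilities):
        return 0.0
    return math.exp(sum(math.log(d) for d in desirabilities) / len(desirabilities))

# Hypothetical predicted properties for one generated molecule
on_target = desirability(8.0, low=5.0, high=9.0)                   # pIC50, higher is better
off_target = desirability(5.5, low=5.0, high=9.0, maximize=False)  # pIC50, lower is better
solubility = desirability(-4.0, low=-6.0, high=-2.0)               # logS, higher is better
score = d_score([on_target, off_target, solubility])
```

The geometric mean penalizes a molecule that fails badly on any one objective, which matches the intent of rewarding balanced profiles over single-property optima.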
Output
The model successfully generated novel tyrosine kinase inhibitors that maximized on-target potency while minimizing off-target activity and satisfying PK/ADMET criteria.72 More generally, “kinase-specific ADMET” models are built by training on known kinase-targeted drugs. For instance, ML classifiers and regressors for hERG inhibition, microsomal clearance, or CYP liability can be tuned using kinase chemotypes. These ADMET predictions can be incorporated into lead optimization loops or multi-criteria virtual screening.
Limitations and applicability
Some approaches merge ADMET predictors with potency models in Pareto-optimal design pipelines.72 The accuracy of ML-generated candidates depends on training data and the appropriateness of objective functions. These models are best viewed as augmentation tools, not stand-alone solutions. In summary, kinase-focused lead optimization leverages ML not only to maximize target inhibition, but also to flag promiscuity and predict absorption, distribution, metabolism and toxicity (ADMET) outcomes relevant to kinase inhibitor scaffolds.
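Pareto-optimal selection keeps every candidate that is not beaten on all objectives by some other candidate. The following sketch assumes two objectives that are both maximized and uses hypothetical predicted values.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Return the non-dominated subset of {name: objective_tuple}."""
    return {
        name: objectives
        for name, objectives in candidates.items()
        if not any(dominates(other, objectives)
                   for other_name, other in candidates.items() if other_name != name)
    }

# Hypothetical (predicted pIC50, predicted metabolic stability score) per compound
candidates = {
    "cmpd_A": (8.5, 0.30),
    "cmpd_B": (7.9, 0.70),
    "cmpd_C": (7.5, 0.60),  # worse than cmpd_B on both objectives
    "cmpd_D": (6.8, 0.90),
}
front = pareto_front(candidates)
```

Unlike a weighted sum, the Pareto front does not require choosing weights up front; it leaves the final potency-versus-ADMET trade-off to the project team.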
3.3.3. Structural data in kinase ML models
Input data and methods
Incorporating kinase structural information into ML workflows further enhances selectivity modeling. The KLIFS database provides curated kinase pocket alignments and interaction fingerprints from >2900 kinase–ligand crystal structures.67 By using KLIFS residue indices and residue-based features, models can explicitly account for binding site variation. For example, one may encode the presence of specific gatekeeper or hinge residues (from KLIFS) as part of the input vector. These are key residues known to influence inhibitor binding and selectivity, so including them in the model helps fine-tune predictions for kinase-specific activity.
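Encoding a pocket residue as model input can be as simple as appending a one-hot vector to the ligand fingerprint. The sketch below is a minimal illustration: the 8-bit fingerprint is hypothetical, and the gatekeeper dictionary is typed in by hand here, whereas in practice the residues would be read from the KLIFS pocket alignment.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

def one_hot_residue(residue):
    """One-hot encode a single amino acid (one-letter code) as a 20-element list."""
    if residue not in AMINO_ACIDS:
        raise ValueError(f"unknown residue: {residue!r}")
    return [1 if aa == residue else 0 for aa in AMINO_ACIDS]

def build_input(ligand_bits, pocket_residues):
    """Concatenate ligand fingerprint bits with one-hot codes of selected pocket residues."""
    vector = list(ligand_bits)
    for residue in pocket_residues:
        vector.extend(one_hot_residue(residue))
    return vector

# Illustrative gatekeeper residues: wild-type EGFR (Thr), the T790M mutant (Met), CDK2 (Phe)
gatekeeper = {"EGFR": "T", "EGFR_T790M": "M", "CDK2": "F"}
ligand_bits = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical 8-bit ligand fingerprint
x_wt = build_input(ligand_bits, [gatekeeper["EGFR"]])
x_mut = build_input(ligand_bits, [gatekeeper["EGFR_T790M"]])
```

The same ligand now yields different input vectors for the wild-type and mutant pocket, which is what lets a single model learn residue-dependent activity differences.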
Output
Tools like the kinase inhibitor complex (KinCo) dataset go further: Liu et al. generated in silico kinase–ligand complexes by docking compounds into predicted kinase structures (homology models, as outlined in Fig. 7) and pairing the docked poses with experimental binding data.77
Limitations and applicability
The resulting “KinCo” dataset enabled training of a structure-based ML model that outperformed sequence-only models, especially when generalizing to distant kinases and novel scaffolds.77 In essence, KinCo shows that computationally predicted kinase structures can serve as proxies to derive protein–ligand features.
More broadly, proteochemometric models can integrate any structural descriptors. Oliveira et al. demonstrated for GPCRs that one can extract 3D “protein fingerprints” from AlphaFold-predicted structures and feed them (together with ligand descriptors) into ML models.78 Applying a similar strategy to kinases would allow ML models to distinguish targets by their unique pocket geometry or allosteric sites. Indeed, the success of the docking-informed and KinCo approaches implies that structural descriptors (docking scores, contact fingerprints, pocket descriptors) are valuable inputs for kinase ML.
Table 3 summarizes machine learning models used for kinase profiling, listing the number of kinases each covers, the input features it requires, and the output it predicts.
Table 3. Machine learning models and input data used for kinase profiling.
Model/tool | Kinases (tasks) | Input features | Predicted output |
---|---|---|---|
KinomePro-DL (ref. 73) | 191 (panel) | Molecular fingerprints (SMILES) | Multi-task classification: inhibitor vs. inactive on 191 kinases |
AiKPro (ref. 75) | 391 (w/ data) | Ligand 3D conformers + kinase sequence alignments | Regression: kinase–ligand binding affinity (pKi) across kinases |
MTDNN (ref. 51) | 342 (kinome subset) | Morgan fingerprints | Multi-task classification (KA/KI vs. PI method) for activity on 342 kinases |
AMGU (GIN) (ref. 74) | 204 (PKIS kinases) | Molecular graph + node/edge attributes | Multi-task classification: predicted inhibition (0/1) per kinase |
KinomeMETA (ref. 79) | 661 (WT and mutant kinases) | Ligand descriptors + meta-learning parameters | Polypharmacology score/probability for each kinase task |
In the table above, all models were trained on kinome-wide bioactivity data. The inputs vary: some use only ligand features (fingerprints or graphs), while others incorporate target features (sequence motifs or structural descriptors). Outputs range from binary activity predictions to continuous affinity values across multiple kinases. Notably, models like AiKPro and KinCo explicitly integrate kinase features (sequence or structure) to improve generalization to novel kinases.75,77
3.3.4. Overcoming resistance mutations
Resistance mutations remain a major challenge in targeted kinase inhibitor therapies, often leading to treatment failure. AI-driven approaches offer promising solutions by predicting resistance mechanisms and guiding the design of next-generation inhibitors.
3.3.4.1. AI-driven prediction of resistance mechanisms
AI and ML algorithms are being employed to predict resistance mechanisms, such as gatekeeper mutations in kinase domains that hinder drug binding. Computational histopathology has revealed mutation patterns across different cancer types, allowing for earlier identification of resistance pathways. Fu et al. demonstrated that AI can detect tumor composition and mutation status using histological data, providing insights into resistance evolution.80
Similarly, AI-based analysis of routine histology slides can pre-screen FGFR3 mutational status in muscle-invasive bladder cancer, enabling early intervention.81,82 Additionally, BRAF mutation testing has been improved by AI-assisted decision-making, supporting more accurate mutation profiling and therapy selection.83
3.3.4.2. Designing next-generation inhibitors using adaptive ML models
AI-driven models are being used to design novel inhibitors that can bypass resistance mutations. Advanced ML algorithms analyze conformational flexibility in mutated proteins, facilitating the rational design of inhibitors with higher binding affinity. AI-based conformational prediction, such as the AlphaFold2 subsampled approach, has enabled the modeling of resistant kinase structures, aiding drug discovery (Fig. 8).84
Fig. 8. Overview of the subsampled AlphaFold2 (AF2) workflow used in the reported study. A) Standard AF2 predicts protein structures using a multiple sequence alignment (MSA), typically producing similar structures across multiple independent predictions (seeds). B) This study demonstrates that subsampling deep MSAs enables AF2 to predict diverse conformations of the same protein, with the predicted frequency of each conformation (based on varied random seeds) correlating strongly with its experimentally determined relative state population (reproduced with permission Copyright 2024, Springer).
Ataxia telangiectasia mutated (ATM) kinase inhibitors have also seen advancements through AI-based analytical approaches, optimizing drug targeting strategies.83 These innovations enhance drug efficacy and minimize the likelihood of further resistance development. By leveraging AI, researchers can predict resistance mutations more accurately and design inhibitors that remain effective against evolving targets. Future directions include integrating AI with high-throughput screening platforms and molecular dynamics simulations to refine drug design further. As AI capabilities continue to expand, overcoming resistance mutations in kinase inhibitors will become increasingly feasible, ultimately improving long-term treatment success.
3.4. Clinical trial design and optimization
Artificial intelligence (AI) and machine learning (ML) are revolutionizing clinical trial design, particularly for kinase inhibitors, by enhancing patient cohort selection and optimizing trial outcomes. Traditional clinical trial methodologies often face challenges related to patient heterogeneity, long recruitment times, and suboptimal dosing regimens. AI-driven approaches provide solutions by leveraging vast datasets, including patient genomics, histopathology, and real-world evidence (RWE), to improve trial efficiency and therapeutic success.
3.4.1. AI for patient cohort selection
One of the most critical factors in the success of clinical trials is the accurate selection of patient cohorts. AI models facilitate this process by identifying molecular biomarkers and histopathological patterns predictive of drug response. Deep learning techniques have been instrumental in this regard. For instance, Skrede et al. demonstrated that deep learning models could predict colorectal cancer outcomes with high accuracy based on histopathological images.85 Similarly, Wang et al. applied machine learning-based analysis to identify extrachromosomal DNA variations, providing insights into tumor biology and patient stratification for targeted therapies.86 These AI-driven methodologies streamline patient selection, ensuring that clinical trials enroll individuals who are most likely to benefit from kinase inhibitors (Fig. 9).
Fig. 9. Framework for identifying and analyzing cancer extrachromosomal DNA (ecDNA) amplification using whole-exome sequencing data. a) Study schematic. b) Feature importance of the final XGBOOST model for ecDNA cargo gene prediction, built with 11 features over 1000 independent runs to optimize hyperparameters. c) Performance (auPRC, mean ± SD) of the ecDNA cargo gene prediction model during training and evaluation with 10-fold cross-validation, with early stopping after 10 non-improving rounds (n = 386 tumor samples). d) Performance metrics (auPRC, auROC, precision, sensitivity, specificity) for sample-level ecDNA amplification identification. Source data provided. XGBOOST: eXtreme gradient boosting; total_cn: total copy number; minor_cn: minor allele copy number; cna_burden: copy number alteration burden; pLOH: genome percentage with loss of heterozygosity; AScore: aneuploidy score (reproduced with permission Copyright 2024, Springer).
Another example is the use of deep learning in predicting microsatellite instability directly from histology, as shown by Kather et al. in gastrointestinal cancers.87 Such AI-based patient stratification can significantly improve the precision of trial designs by reducing variability and increasing treatment efficacy. Moreover, AI models integrating spatial interactions of tumor-infiltrating lymphocytes (TILs) and cancer nuclei have been employed to predict responses to immune checkpoint inhibitors, further refining patient selection.88
3.4.2. Predicting trial outcomes and dose optimization
Beyond patient selection, AI and ML facilitate trial outcome prediction by analyzing historical clinical data and real-world patient responses. AI-based models trained on multi-omics data have been developed to predict treatment responses and survival outcomes. For example, deep learning-based histopathological analysis has been used to predict HER2 status and trastuzumab response in breast cancer patients (Fig. 10).89
Fig. 10. Datasets and study design for HER2 status and trastuzumab response classification. a) Datasets created for model training and testing. b) Number of tiles per class, showing tumor region tiles for the TCGA-BRCA independent test set and response model. c) Key preprocessing and model training steps. Pathology experts annotated HER2+, HER2−, and non-tumor regions on slides. Slides were divided into 512 × 512 pixel tiles, with background tiles removed. Data were split (70% training, 30% testing) for Yale cohorts, with TCGA-BRCA used for independent validation of HER2 status prediction. Data augmentation and color normalization enhanced reproducibility. Classes were balanced using down- and up-sampling. Tiles were converted to TFRecords for training an inception v3-based model, and performance was evaluated on test data, with predictions visualized as heatmaps on whole-slide images (reproduced with permission Copyright 2024, Elsevier).
Additionally, AI is being integrated into digital twin models for trial simulation and personalized dosing regimens. Digital twins create virtual patient populations, allowing researchers to test different dosing strategies in silico before applying them in real-world trials.90 Chebanov and Misyurin (2023) explored the application of digital twin technology in oncology trials and reported that predictive modeling could help optimize trial designs.41 Such AI-driven simulations may accelerate trial execution while limiting patient risk and reducing reliance on traditional dose-finding studies.
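To make the idea of testing dosing strategies in silico more concrete, a minimal sketch follows. It assumes a one-compartment oral pharmacokinetic model with first-order absorption and elimination and complete bioavailability; the parameter values, the log-normal variability, and all function names are hypothetical illustrations and are not taken from the cited digital twin studies.

```python
import math
import random


def concentration(dose_mg, t_h, ka, ke, vd_l):
    """Plasma concentration (mg/L) at time t_h after a single oral dose.

    One-compartment model, first-order absorption (ka, 1/h) and
    elimination (ke, 1/h), volume of distribution vd_l (L).
    Bioavailability is assumed to be 1 (illustrative simplification).
    """
    if abs(ka - ke) < 1e-12:
        raise ValueError("ka and ke must differ for this closed form")
    scale = (dose_mg * ka) / (vd_l * (ka - ke))
    return scale * (math.exp(-ke * t_h) - math.exp(-ka * t_h))


def virtual_population(n, seed=0):
    """Draw n virtual patients with log-normal variability (hypothetical values)."""
    rng = random.Random(seed)
    return [
        {
            "ka": 1.0 * math.exp(rng.gauss(0, 0.2)),
            "ke": 0.1 * math.exp(rng.gauss(0, 0.3)),
            "vd_l": 50.0 * math.exp(rng.gauss(0, 0.2)),
        }
        for _ in range(n)
    ]


def fraction_in_window(population, dose_mg, t_h, low, high):
    """Fraction of virtual patients whose concentration at t_h lies in [low, high]."""
    hits = sum(low <= concentration(dose_mg, t_h, **p) <= high for p in population)
    return hits / len(population)


if __name__ == "__main__":
    cohort = virtual_population(500, seed=42)
    for dose in (50.0, 100.0, 200.0):
        share = fraction_in_window(cohort, dose, t_h=4.0, low=0.5, high=2.0)
        print(f"dose {dose:>5.0f} mg: {share:.2f} of virtual patients in window")
```

Comparing the in-window fraction across candidate doses is one simple way such a simulation could rank regimens before a trial; real digital twin platforms use far richer patient models.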
Furthermore, AI-assisted evaluation of clinical trial eligibility criteria using real-world data enhances enrollment efficiency and trial feasibility assessments. Liu et al. (2021) demonstrated that AI-based models could optimize eligibility criteria, ensuring better patient representation and faster trial execution.40 Another significant advancement is federated learning, which enables AI models to be trained on decentralized patient data while preserving privacy. This approach has been utilized in predicting histological responses to neoadjuvant chemotherapy in triple-negative breast cancer.91
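The core aggregation step of federated learning can be sketched in a few lines. The example below assumes the widely used federated averaging scheme, in which each site trains locally and only model parameters (never patient records) are sent to a coordinator; the function name and the toy numbers are hypothetical, and the cited breast cancer study may have used a different protocol.

```python
def federated_average(client_weights, client_sizes):
    """Sample-size-weighted average of per-site model parameters.

    client_weights: list of parameter vectors, one per hospital/site.
    client_sizes:   number of local training samples at each site.
    Only these parameter vectors leave the sites; raw data stay local.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one non-empty weight vector per client size")
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


if __name__ == "__main__":
    # Two hypothetical sites with 1 and 3 samples respectively.
    print(federated_average([[1.0, 0.0], [3.0, 2.0]], [1, 3]))
```

Weighting by local sample count keeps large sites from being diluted by small ones, at the cost of letting them dominate the shared model.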
As AI continues to evolve, its integration into clinical trial design will likely expand. Future advancements may include multi-omics AI models capable of integrating genomic, proteomic, and metabolomic data for even more precise patient stratification. Additionally, real-time AI-driven adaptive trials will enable dynamic adjustments to trial protocols based on emerging patient responses, improving efficiency and cost-effectiveness.
While AI holds significant promise, challenges such as data bias, regulatory compliance, and model interpretability must be addressed. The practical implementation of AI in medicine requires robust validation, as highlighted by He et al. in their analysis of AI technologies in healthcare.92 Addressing these challenges will be essential to fully harness AI's potential in optimizing clinical trials for kinase inhibitors.
Kinase inhibitors face distinct challenges in clinical trials due to the rapid emergence of resistance mutations and the need for robust biomarkers to predict therapeutic response. For instance, mutations such as EGFR T790M or BCR-ABL T315I can significantly alter treatment efficacy, necessitating real-time monitoring to adapt trial designs.93 Additionally, identifying reliable biomarkers for patient stratification is critical to optimize trial outcomes for kinase-targeted therapies.
AI-driven approaches offer tailored solutions to these challenges, distinct from generic drug discovery methods:
• Dynamic monitoring of resistance mutations: AI models, particularly those integrating longitudinal genomic data, enable real-time tracking of resistance mutations during clinical trials. Machine learning algorithms, such as those based on recurrent neural networks, can analyze circulating tumor DNA (ctDNA) sequencing data to detect emerging mutations like T790M in EGFR, allowing adaptive trial designs that adjust dosing or combine therapies to overcome resistance.94 Unlike traditional trial designs, which rely on static endpoints, AI facilitates dynamic adjustments based on real-time molecular insights.
• Biomarker development and patient stratification: AI techniques, including supervised learning and multi-omics integration, enhance biomarker discovery by identifying predictive signatures for kinase inhibitor response. For example, random forest models have been used to correlate kinase expression profiles with clinical outcomes, enabling better patient stratification in trials for drugs like imatinib.95 This contrasts with conventional approaches, which often rely on predefined biomarkers and may miss complex, non-linear patterns in patient data.
• Trial optimization with predictive modeling: AI-driven predictive models can simulate trial outcomes by integrating patient-specific data, such as kinase mutation profiles and pharmacokinetic parameters, to optimize dosing regimens and trial cohorts. Reinforcement learning approaches have shown promise in designing adaptive trials that prioritize patient subgroups likely to respond to kinase inhibitors, reducing trial costs and timelines.96,97
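The dynamic monitoring idea described above can be illustrated with a deliberately simple baseline. Instead of a recurrent neural network, the sketch below uses a plain threshold rule on the variant allele fraction (VAF) of a resistance mutation across serial ctDNA samples; the threshold, the requirement of two consecutive positive samples, and the read counts are hypothetical choices for illustration only.

```python
def variant_allele_fraction(alt_reads, total_reads):
    """Fraction of sequencing reads that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def first_emergence(timepoints, vaf_threshold=0.005, consecutive=2):
    """Return the label of the sample at which the mutation is called emergent.

    timepoints: ordered iterable of (label, alt_reads, total_reads).
    A call is made once VAF >= vaf_threshold in `consecutive` samples in a
    row, which guards against a single noisy measurement. Returns None if
    the rule is never satisfied.
    """
    run = 0
    for label, alt_reads, total_reads in timepoints:
        if variant_allele_fraction(alt_reads, total_reads) >= vaf_threshold:
            run += 1
            if run >= consecutive:
                return label
        else:
            run = 0
    return None


if __name__ == "__main__":
    # Hypothetical serial ctDNA samples for one patient.
    series = [
        ("week0", 0, 4000),
        ("week4", 3, 4000),
        ("week8", 30, 4000),
        ("week12", 90, 4500),
    ]
    print(first_emergence(series))
```

A learned sequence model would replace the fixed rule with a prediction from the whole trajectory, but the input (serial VAF measurements) and the output (a trigger for adapting the trial arm) have the same shape.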
3.5. Case studies
3.5.1. FGFR2 inhibitor (RLY-4008) in cholangiocarcinoma
Fibroblast growth factor receptor 2 (FGFR2) alterations, including fusions and rearrangements, are well-established oncogenic drivers in intrahepatic cholangiocarcinoma (iCCA). Targeting FGFR2 aberrations has been a major focus of precision oncology, with several inhibitors developed to selectively block its signaling pathway. Among these, RLY-4008 (Fig. 11) has emerged as the first highly selective and irreversible FGFR2 inhibitor, demonstrating promising clinical efficacy in patients with FGFR2-altered cancers.
Fig. 11. Characterization of RLY-4008 as a potent, selective, irreversible FGFR2 inhibitor. A) Sequence alignment of FGFR1–4 kinase domains shows high similarity, with RLY-4008 binding site residues boxed and differences from FGFR2 highlighted in pink (FGFR2 IIIc isoform numbering). B) Chemical structure of RLY-4008: N-(4-(4-amino-5-(3-fluoro-4-((4-methylpyrimidin-2-yl)oxy)phenyl)-7-methyl-7H-pyrrolo[2,3-d]pyrimidin-6-yl)phenyl)methacrylamide. C) Crystal structure of RLY-4008 bound to FGFR2 (PDB: 8STG), with protein in green, inhibitor carbons in magenta, and Cys491 sulfur (gold) forming a covalent adduct. D and E) Covalent labeling rates of RLY-4008 (D) and futibatinib (E) on FGFR2 (red) and FGFR1 (blue), measured by intact mass over time (triplicate biological replicates). F) Concentration-dependent modification rates of RLY-4008 for FGFR2 (red) and FGFR1 (blue): FGFR2: kinact = 6.45 × 10⁻² s⁻¹, KI = 1.87 μmol L⁻¹, kinact/KI = 3.45 × 10⁻² s⁻¹/(μmol L⁻¹); FGFR1: kinact = 2.33 × 10⁻³ s⁻¹, KI = 6.14 μmol L⁻¹, kinact/KI = 3.79 × 10⁻⁴ s⁻¹/(μmol L⁻¹). G) Fold change in biochemical IC50 values for RLY-4008 and other inhibitors across FGFR1–4, averaged from three experiments with two biological replicates each (error bars: SD). H) TREEspot visualization of RLY-4008 selectivity against 468 kinases (KINOMEscan, DiscoverX) at 500 nmol L⁻¹, with >75% inhibition for FGFR2 (94.1%), MEK5 (92.4%), and MKNK2 (89%) (reproduced with permission Copyright 2023, American Association for Cancer Research).
The identification of FGFR2 fusions and rearrangements is crucial for patient selection in clinical trials evaluating FGFR2 inhibitors. A novel circulating cell-free DNA (cfDNA) algorithm has been developed to detect these genetic alterations with high sensitivity, facilitating the enrollment of eligible patients for treatment with RLY-4008.98 This approach enhances personalized treatment strategies by ensuring that only patients harboring FGFR2-driven tumors receive targeted therapy, thereby improving clinical outcomes and minimizing off-target effects.
RLY-4008 distinguishes itself from earlier FGFR inhibitors by its unprecedented selectivity for FGFR2, which minimizes off-target toxicities commonly associated with pan-FGFR inhibition. Preclinical studies and early-phase clinical trials have demonstrated its potent activity across a range of FGFR2 alterations, including acquired resistance mutations that often emerge during treatment with first-generation FGFR inhibitors.99 By covalently binding to FGFR2, RLY-4008 effectively suppresses oncogenic signaling while sparing other FGFR isoforms, reducing the risk of dose-limiting toxicities such as hyperphosphatemia.
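The selectivity claim can be checked against the kinetic constants reported in the Fig. 11 caption. For a covalent inhibitor, the ratio kinact/KI is the usual second-order measure of inactivation efficiency, and the ratio of these efficiencies between two kinases gives a kinetic selectivity estimate. The short calculation below uses only the caption values; the function and variable names are illustrative.

```python
def covalent_efficiency(kinact_per_s, ki_umol_per_l):
    """Second-order inactivation efficiency kinact/KI, in s^-1 per (umol/L)."""
    if ki_umol_per_l <= 0:
        raise ValueError("KI must be positive")
    return kinact_per_s / ki_umol_per_l


# Values from the Fig. 11F caption.
fgfr2_efficiency = covalent_efficiency(6.45e-2, 1.87)
fgfr1_efficiency = covalent_efficiency(2.33e-3, 6.14)

# Kinetic selectivity of RLY-4008 for FGFR2 over FGFR1.
fgfr2_over_fgfr1 = fgfr2_efficiency / fgfr1_efficiency

if __name__ == "__main__":
    print(f"FGFR2 kinact/KI: {fgfr2_efficiency:.3e}")
    print(f"FGFR1 kinact/KI: {fgfr1_efficiency:.3e}")
    print(f"FGFR2/FGFR1 selectivity: {fgfr2_over_fgfr1:.1f}-fold")
```

The two efficiencies reproduce the caption values (3.45 × 10⁻² and 3.79 × 10⁻⁴), and their ratio corresponds to roughly 91-fold kinetic selectivity for FGFR2 over FGFR1.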
The discovery and development of RLY-4008 represent a significant advancement in targeted kinase inhibitor therapy. Structural and biochemical analyses have elucidated its mechanism of action, confirming its irreversible binding mode and selectivity for FGFR2.100 This specificity is particularly valuable for overcoming resistance mechanisms, as many patients with FGFR2-altered cancers develop secondary mutations that confer resistance to earlier inhibitors. The ongoing phase II clinical trial of RLY-4008 aims to further validate its efficacy and safety profile, potentially setting a new standard of care for FGFR2-driven malignancies.
Overall, RLY-4008 exemplifies the power of AI and precision medicine in kinase inhibitor development. The integration of advanced computational tools for mutation detection and predictive modeling enhances the drug development pipeline, ensuring that targeted therapies are both highly selective and effective against resistance mutations. As research progresses, RLY-4008 may pave the way for next-generation FGFR inhibitors, offering new hope for patients with FGFR2-driven cancers.
3.5.2. Small-molecule inhibitor of MEK1 and MEK2 (REC-4881) in phase II
REC-4881 is an orally bioavailable, non-ATP-competitive allosteric inhibitor targeting mitogen-activated protein kinase kinases 1 and 2 (MEK1 and MEK2). Developed by Recursion Pharmaceuticals, this small-molecule inhibitor is under investigation for its potential to treat various conditions, including familial adenomatous polyposis (FAP) and certain cancers characterized by specific genetic mutations.101
REC-4881 is a product of Recursion Pharmaceuticals' AI-driven drug discovery platform. Recursion integrates artificial intelligence (AI), machine learning, and high-throughput screening to accelerate the identification and optimization of therapeutic candidates. This platform analyzes extensive biological datasets to uncover novel drug mechanisms, enhance precision, and streamline drug development processes. The application of AI in REC-4881's discovery exemplifies the increasing role of computational tools in modern drug development, allowing for faster identification of promising molecules and improved predictions of efficacy and safety profiles.
By selectively binding to MEK1 and MEK2, REC-4881 inhibits the activation of downstream effector proteins within the RAS/RAF/MEK/ERK signaling pathway. This pathway is crucial for cell proliferation and survival, and its dysregulation is implicated in numerous cancers. Inhibiting MEK1/2 activity may suppress tumor growth and proliferation in malignancies driven by aberrations in this signaling cascade.
The U.S. Food and Drug Administration (FDA) has granted REC-4881 both Fast Track and Orphan Drug designations for the potential treatment of FAP, a hereditary condition leading to colorectal cancer due to the development of numerous polyps in the colon and rectum. These designations aim to expedite the development and review processes for therapies addressing unmet medical needs in serious conditions.
REC-4881 is currently being evaluated in a phase II clinical trial targeting patients with unresectable, locally advanced, or metastatic solid tumors harboring AXIN1 or APC mutations. This open-label, multicenter study investigates the safety, efficacy, and pharmacokinetics of REC-4881 administered at a daily oral dose of 12 mg. Approximately 60 participants are enrolled, divided equally between cohorts with AXIN1 or APC mutations.
The trial's primary objectives include assessing the drug's safety profile and preliminary efficacy in these genetically defined patient populations. Secondary objectives encompass evaluating pharmacokinetic parameters to understand the drug's behavior within the body. This study represents a significant step toward personalized medicine, as targeting specific genetic mutations may enhance treatment efficacy and minimize adverse effects.101
REC-4881 exemplifies the advancement of targeted therapies in oncology and genetic disorders. Its AI-driven development underscores the importance of precision medicine approaches, focusing on specific molecular aberrations to improve patient outcomes. Ongoing clinical trials will elucidate its therapeutic potential and safety, potentially offering new treatment avenues for patients with limited options.101
3.5.3. A small-molecule TNIK inhibitor targeting fibrosis
Recent preclinical and clinical research has highlighted the therapeutic potential of a small-molecule inhibitor targeting Traf2 and Nck-interacting kinase (TNIK), a serine/threonine kinase involved in Wnt signaling and fibrosis pathogenesis. This investigational TNIK inhibitor has demonstrated significant antifibrotic activity in models of idiopathic pulmonary fibrosis (IPF) and other fibrotic conditions. By attenuating fibrogenic signaling pathways, the compound reduces extracellular matrix deposition and mitigates tissue scarring. Its translation into early-phase clinical studies represents a novel approach to targeting fibrotic diseases, where current treatment options remain limited. This case reinforces the broader applicability of kinase inhibitors beyond oncology and exemplifies how AI-assisted discovery and mechanistic targeting can address unmet clinical needs in complex pathologies like fibrosis.102
3.6. The role of data in AI/ML-based kinase inhibitor development
Data is the essential substrate for all artificial intelligence (AI) and machine learning (ML) applications in kinase inhibitor research. Whether the goal is target identification, virtual screening, lead optimization, or resistance modeling, the choice and quality of the input data largely determine the success of the model. In the context of kinase biology and drug development, data spans multiple modalities—including transcriptomic, proteomic, phosphoproteomic, structural, and pharmacological datasets—each playing a critical role in different stages of the pipeline. For instance, multi-omics datasets (e.g. CPTAC, TCGA) enable AI models to uncover dysregulated kinase pathways in cancer, while structural databases such as PDB and KLIFS inform molecular docking and binding site modeling. Bioactivity repositories like ChEMBL, BindingDB, and KIBA provide the quantitative interaction data needed to train and benchmark ML models for kinase profiling, selectivity prediction, and activity regression. Similarly, datasets like LINCS provide drug-response signatures that are crucial for drug repurposing or synergy prediction. The increasing availability of AlphaFold2-predicted kinase structures has also opened new avenues for structure-based design, even in the absence of crystallographic data.
Despite this wealth of resources, challenges remain. Many datasets are unevenly annotated, biased toward well-studied kinases, or limited in scope (e.g. covering only wild-type isoforms). Data heterogeneity—different formats, quality, or missing modalities—also limits integration across platforms. For example, kinase inhibitors annotated with IC50 in one database might lack consistent assay conditions or target confirmation in another. Additionally, deep learning models often require large, balanced, and diverse datasets, yet many kinase families remain underrepresented in current resources. To address these limitations, researchers increasingly rely on multi-task modeling, transfer learning, and pathway-informed architectures that make better use of sparse or noisy data. Still, the success of such approaches ultimately depends on the availability, accessibility, and quality of foundational datasets. To that end, Table 4 summarizes the most commonly used kinase-related datasets—both experimental and computational—that underpin AI/ML tools in this field. These resources serve as the empirical backbone for virtually every method discussed in this review.
Table 4. Key experimental and computational kinase datasets supporting AI/ML-driven drug discovery.
Name | Data type | Size | Public | Year | URL |
---|---|---|---|---|---|
CPTAC (Clinical Proteomic Tumor Analysis Consortium)103 | Proteomic, phosphoproteomic & transcriptomic (proteogenomic) data | ∼10 cancer types, ≈1000 tumor samples profiled | Yes (open data) | 2016 | https://targets.linkedomics.org/ |
TCGA (The Cancer Genome Atlas)104 | Genomic, transcriptomic, epigenomic (DNA methylation, etc.) | >11 000 patients across 33 cancer types | Yes | 2008 | https://staff.stat.sinica.edu.tw/chyeang/IHAS/ |
LINCS (Library of Integrated Network-based Cellular Signatures)105 | Transcriptomic drug-response signatures (L1000 profiles) | 1.3 M+ gene expression profiles for ∼20 000 perturbagens (small molecules & genetic) | Yes | 2011 | https://clue.io/data |
ChEMBL | Bioactivity database (IC50, Ki, Kd, etc.) | >2.2 million compounds; >18 million activity measurements | Yes | 2009 | https://www.ebi.ac.uk/chembl/ |
BindingDB106 | Binding affinities (IC50, Kd, Ki) | ∼3.1 million binding data points for ∼9600 protein targets (≈1.3 M compounds) | Yes | 2000 | https://www.bindingdb.org/rwd/bind/index.jsp |
KIBA70 | Integrated kinase inhibitor bioactivity matrix | 52 498 compounds × 467 kinases (246 088 affinity entries) | Yes | 2014 | https://github.com/paperswithcode/paperswithcode-data |
PDB (Protein Data Bank) | 3D structures of proteins (experimental) | >200 000 experimentally-determined structures archived (∼1.1 TB data) | Yes | 1971 | https://www.rcsb.org/ |
AlphaFold DB107 | Predicted 3D protein structures (AI-based) | >200 million protein structures predicted (includes all human kinases) | Yes | 2021 | https://alphafold.ebi.ac.uk/ |
KLIFS (Kinase–Ligand Interaction Fingerprints and Structures)60 | Aligned kinase binding pocket structures & interactions | 85 conserved pocket residues tracked in each kinase; >2900 kinase–ligand crystal structures (human/mouse) at initial release (expanded to >6000 complexes in recent updates) | Yes | 2014 | https://klifs.net/ |
PKIS/PKIS2 (Published Kinase Inhibitor Sets)108 | Kinome-wide inhibitor screening panels | ∼367 kinase inhibitors tested across 224 kinases (PKIS1); PKIS2 adds ∼645 inhibitors (for expanded kinome coverage) | Yes | 2014 | https://github.com/openkinome |
Kinase Knowledge Base (KKB) | Proprietary kinase bioactivity data | ∼2.8 million activity data points (curated literature & patents) covering ∼426 k compounds and 574 kinases | No (commercial) | ∼2000s | https://eidogen-sertanty.com/kinasekb.php |
KinomePro-DL73 | Curated kinome bioactivity matrix for deep learning | Cleaned matrix of inhibitors vs. 191 kinases (integrated from 6 public datasets); thousands of ligand–kinase pairs | Yes (upon request) | 2024 | https://kinomepro-dl.pharmablock.com/ |
KinCo77 | Kinase–ligand docking complexes (homology models) | ∼137 778 kinase–inhibitor complexes (docked poses) with experimental activities | Yes (upon request) | 2023 | https://lsp.connect.hms.harvard.edu/ikinco/ |
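Part of the heterogeneity discussed above is mechanical: the same compound–kinase pair may appear in several repositories with IC50 values in different units. A minimal harmonization sketch is shown below. It converts each IC50 to pIC50 (the negative base-10 logarithm of the molar concentration) and takes the median over duplicate measurements. The record layout and function names are hypothetical, and pooling values measured under different assay conditions is itself a simplifying assumption that a real curation pipeline would need to examine.

```python
import math
from collections import defaultdict
from statistics import median

# Conversion factors from common concentration units to mol/L.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def pic50(value, unit):
    """Convert an IC50 given in `unit` to pIC50 = -log10(IC50 in mol/L)."""
    if value <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(value * UNIT_TO_MOLAR[unit])


def harmonize(records):
    """Collapse duplicate measurements into one median pIC50 per pair.

    records: iterable of (compound_id, kinase, ic50_value, unit).
    Returns {(compound_id, kinase): median pIC50}.
    """
    grouped = defaultdict(list)
    for compound_id, kinase, value, unit in records:
        grouped[(compound_id, kinase)].append(pic50(value, unit))
    return {pair: median(values) for pair, values in grouped.items()}


if __name__ == "__main__":
    # Hypothetical records for one compound drawn from two sources.
    demo = [
        ("cmpd-1", "EGFR", 10.0, "nM"),
        ("cmpd-1", "EGFR", 0.1, "uM"),
        ("cmpd-1", "ABL1", 2.0, "uM"),
    ]
    for pair, value in harmonize(demo).items():
        print(pair, round(value, 2))
```

Working on the logarithmic scale before averaging avoids a single high IC50 dominating the summary, which is why the median of pIC50 values is used rather than the median of raw concentrations.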
4. Limitations and challenges in AI-driven drug discovery: current bottlenecks and future prospects
The integration of artificial intelligence (AI), machine learning (ML), and deep learning (DL) into drug discovery has revolutionized the pharmaceutical landscape, offering unprecedented opportunities for accelerating and optimizing the development of therapeutics. However, despite notable successes, several limitations and challenges persist that hinder the full realization of AI's potential in this domain.9–11,20
4.1. AI-driven solutions for kinase-specific hurdles
Kinases share highly conserved ATP-binding sites, which complicates the design of selective inhibitors that can distinguish between closely related kinase isoforms while maintaining potency. Additionally, resistance mutations, such as those in the gatekeeper residues (e.g., T790M in EGFR), further challenge the development of effective and durable kinase inhibitors.109 These issues necessitate innovative approaches to achieve both selectivity and robustness against resistance.
AI models address these challenges in several unique ways compared to traditional drug discovery methods:
• Enhanced selectivity prediction: AI-driven approaches, particularly deep learning models, leverage large datasets of kinase-inhibitor interactions to predict selectivity profiles with high accuracy. For instance, structure-based AI models, such as those using graph neural networks (GNNs), can model the subtle structural differences in kinase binding pockets, enabling the design of inhibitors with improved specificity.110 Unlike traditional high-throughput screening, which often struggles with the combinatorial complexity of kinase targets, AI models efficiently prioritize compounds with favorable selectivity profiles.
• Addressing resistance mutations: AI models can integrate genomic and structural data to predict the impact of resistance mutations on inhibitor binding. For example, machine learning approaches have been used to design inhibitors that retain activity against resistant kinase mutants by targeting allosteric sites or exploiting mutation-specific vulnerabilities.111,112 This contrasts with traditional medicinal chemistry approaches, which often rely on iterative synthesis and testing, a process that is time-consuming and less adaptable to rapidly evolving resistance profiles.
• Data-driven optimization: AI models excel at integrating diverse datasets, including kinase sequence data, structural information, and clinical outcomes, to optimize lead compounds. Techniques such as reinforcement learning and generative AI can propose novel chemical scaffolds that traditional methods might overlook, particularly for challenging targets like kinases with conserved binding sites.54 This data-driven approach accelerates the identification of selective and mutation-robust inhibitors compared to conventional structure–activity relationship (SAR) studies.
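Whatever model produces a predicted or measured kinome profile, selectivity is often summarized with a simple selectivity score: the fraction of tested kinases inhibited beyond a chosen threshold. The sketch below computes this score from a dictionary of percent-inhibition values. The kinase labels and numbers in the usage example are hypothetical placeholders, not experimental data.

```python
def selectivity_score(percent_inhibition, threshold=75.0):
    """Return (score, hits) for a kinome profiling panel.

    percent_inhibition: {kinase_name: percent inhibition at one concentration}.
    score: fraction of tested kinases with inhibition above `threshold`;
           lower values indicate a more selective compound.
    hits:  sorted names of the kinases above the threshold.
    """
    if not percent_inhibition:
        raise ValueError("profile must contain at least one kinase")
    hits = sorted(k for k, v in percent_inhibition.items() if v > threshold)
    return len(hits) / len(percent_inhibition), hits


if __name__ == "__main__":
    # Hypothetical four-kinase panel.
    profile = {"KIN_A": 95.0, "KIN_B": 80.0, "KIN_C": 30.0, "KIN_D": 5.0}
    score, hits = selectivity_score(profile)
    print(score, hits)
```

Applied to the panel described in the Fig. 11H caption, and assuming the three listed kinases are the only ones above 75% inhibition, the score would be 3/468, or about 0.006.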
4.2. Data quality and availability
A fundamental challenge in AI-driven drug discovery is the reliance on high-quality, comprehensive datasets. The performance of AI models is intrinsically linked to the quality of the data they are trained on. Issues such as data inconsistency, bias, and scarcity can significantly impair model accuracy and generalizability.1,10,19 For instance, many publicly available biological datasets are not adequately curated for AI applications, leading to potential mispredictions and unreliable outcomes. Moreover, the lack of standardized data formats and protocols across different research institutions exacerbates these issues, making data integration and comparison challenging.2,17,22
4.3. Model interpretability and explainability
The “black box” nature of many AI and DL models poses a significant barrier to their adoption in drug discovery. The complexity of these models often makes it difficult to interpret how specific predictions are made, which is particularly problematic in a field where understanding the rationale behind a decision is crucial for validation and regulatory approval. Explainable AI (XAI) has emerged as a solution to this problem, aiming to make AI decisions more transparent and understandable.6,10,18 However, implementing XAI in drug discovery remains a work in progress, with ongoing research needed to develop methods that can effectively elucidate the decision-making processes of complex models.3,7,8
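One model-agnostic way to probe a black-box predictor is permutation importance: shuffle one input feature at a time and record how much the model's score drops. The sketch below is a minimal pure-Python version, offered as one example of an XAI technique rather than the method used in any cited study; the toy model, data, and function names are hypothetical.

```python
import random


def neg_mse(y_true, y_pred):
    """Negative mean squared error, so that higher is better."""
    return -sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, targets, n_features, metric, seed=0):
    """Score drop caused by shuffling each feature column in turn.

    predict: function mapping one feature row (list) to a prediction.
    rows:    list of feature rows (lists of equal length).
    Returns a list with one importance value per feature; values near zero
    suggest the model does not rely on that feature.
    """
    rng = random.Random(seed)
    baseline = metric(targets, [predict(r) for r in rows])
    importances = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        score = metric(targets, [predict(r) for r in permuted])
        importances.append(baseline - score)
    return importances


if __name__ == "__main__":
    # Toy model that uses only the first feature.
    rows = [[float(i), float(10 - i)] for i in range(8)]
    targets = [r[0] for r in rows]
    print(permutation_importance(lambda r: r[0], rows, targets, 2, neg_mse))
```

In the toy example the second feature receives an importance of zero because the model ignores it, which is the kind of sanity check that makes such attributions useful to medicinal chemists reviewing a prediction.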
4.4. Generalization and robustness
AI models often struggle with generalizing beyond the specific datasets they were trained on. This limitation is particularly concerning in drug discovery, where models need to predict the behavior of novel compounds accurately. Overfitting to training data can lead to models that perform well in silico but fail in experimental or clinical settings. Addressing this issue requires the development of models that are not only accurate but also robust and capable of generalizing across diverse chemical spaces.11,15,16,21
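A common safeguard against over-optimistic estimates is to split data by group, for example by chemical scaffold, so that closely related compounds never appear in both training and test sets. The sketch below implements such a group-aware split with the standard library only. The scaffold labels are assumed to be precomputed (for instance with a cheminformatics toolkit), and the function name and example data are hypothetical.

```python
import random
from collections import defaultdict


def group_split(items, group_of, test_fraction=0.2, seed=0):
    """Split items so that no group appears in both train and test.

    items:    list of records (e.g. compounds).
    group_of: function returning a sortable group key (e.g. scaffold id).
    Whole groups are assigned to the test set until it holds at least
    test_fraction of all items; remaining groups go to training.
    """
    groups = defaultdict(list)
    for item in items:
        groups[group_of(item)].append(item)
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)
    target = test_fraction * len(items)
    train, test = [], []
    for key in keys:
        bucket = test if len(test) < target else train
        bucket.extend(groups[key])
    return train, test


if __name__ == "__main__":
    # Hypothetical (compound, scaffold) pairs.
    compounds = [
        ("c1", "s1"), ("c2", "s1"), ("c3", "s2"),
        ("c4", "s3"), ("c5", "s3"), ("c6", "s4"),
    ]
    train, test = group_split(compounds, lambda c: c[1], test_fraction=0.3)
    print("train:", train)
    print("test:", test)
```

Performance measured on such a split is usually lower than on a random split, and the gap gives a rough indication of how much a model depends on memorizing near-duplicates.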
4.5. Integration with experimental validation
While AI can significantly accelerate the identification of potential drug candidates, experimental validation remains an essential step in the drug discovery process. Bridging the gap between computational predictions and laboratory experiments is critical for translating AI-generated insights into viable therapeutics. This integration necessitates close collaboration between computational scientists and experimental researchers to ensure that AI predictions are grounded in biological reality and can be effectively tested and validated in the lab.4,8,17,23
4.6. Ethical and regulatory considerations
The use of AI in drug discovery also raises ethical and regulatory concerns. Issues such as data privacy, especially when dealing with patient data, and the potential for algorithmic bias must be carefully managed. Regulatory frameworks need to evolve to address the unique challenges posed by AI-driven drug development, ensuring that these technologies are used responsibly and that their outputs are subject to appropriate oversight and validation.5,9,12,19 Notably, the U.S. Food and Drug Administration (FDA) has announced plans to deploy AI tools internally across all its centers, aiming to streamline the drug approval process. This move underscores the growing recognition of AI's potential in regulatory settings (Reuters).
4.7. Future perspectives
Despite these challenges, the future of AI in drug discovery is promising. Advancements in data sharing and standardization, such as the development of community-driven data repositories and standardized protocols, can enhance data quality and availability.10,12,16,18 Improvements in model interpretability, through the advancement of XAI techniques, will make AI tools more transparent and trustworthy.6,7 Moreover, the integration of AI with other emerging technologies, such as quantum computing and advanced robotics, could further revolutionize the drug discovery process. Collaborative efforts between academia, industry, and regulatory bodies will be essential in addressing current limitations and unlocking the full potential of AI in developing safe and effective therapeutics.2,3,21,24
In conclusion, while AI, ML, and DL offer transformative potential for drug discovery, realizing this potential requires addressing significant challenges related to data quality, model interpretability, generalization, experimental integration, and ethical considerations. Through concerted efforts to overcome these hurdles, the pharmaceutical industry can harness AI's capabilities to accelerate the development of new drugs and improve patient outcomes.
Conflicts of interest
The authors declare no competing financial interest.
Acknowledgments
The authors disclose that the AI tool ChatGPT was employed for language polishing. This work was supported by the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (Grant No. RS-2024-00409793) and JSPS KAKENHI (Grant No. 24k17681).
Data availability
No primary research results, software or code have been included and no new data were generated or analyzed as part of this review.
References
- Bhullar K. S. Lagarón N. O. McGowan E. M. Parmar I. Jha A. Hubbard B. P. Rupasinghe H. V. Mol. Cancer. 2018;17:1–20. doi: 10.1186/s12943-018-0804-2.
- Zarrin A. A. Bao K. Lupardus P. Vucic D. Nat. Rev. Drug Discovery. 2021;20:39–63. doi: 10.1038/s41573-020-0082-8.
- Ferguson F. M. Gray N. S. Nat. Rev. Drug Discovery. 2018;17:353–377. doi: 10.1038/nrd.2018.21.
- Roskoski Jr R. Pharmacol. Res. 2025:107723. doi: 10.1016/j.phrs.2025.107723.
- Zhang J. Yang P. L. Gray N. S. Nat. Rev. Cancer. 2009;9:28–39. doi: 10.1038/nrc2559.
- Hartmann J. T. Haap M. Kopp H.-G. Lipp H.-P. Curr. Drug Metab. 2009;10:470–481. doi: 10.2174/138920009788897975.
- Laufer S. Bajorath J. J. Med. Chem. 2021;65:891–892. doi: 10.1021/acs.jmedchem.1c02126.
- Xerxa E. Miljković F. Bajorath J. J. Med. Chem. 2023;66:7657–7665. doi: 10.1021/acs.jmedchem.3c00621.
- Xerxa E. Bajorath J. Eur. J. Med. Chem. 2024:116413. doi: 10.1016/j.ejmech.2024.116413.
- Reiser P. Neubert M. Eberhard A. Torresi L. Zhou C. Shao C. Metni H. van Hoesel C. Schopmans H. Sommer T. Commun. Mater. 2022;3:93. doi: 10.1038/s43246-022-00315-6.
- Li Y. Zhang L. Wang Y. Zou J. Yang R. Luo X. Wu C. Yang W. Tian C. Xu H. Nat. Commun. 2022;13:6891. doi: 10.1038/s41467-022-34692-w.
- Korshunova M. Huang N. Capuzzi S. Radchenko D. S. Savych O. Moroz Y. S. Wells C. I. Willson T. M. Tropsha A. Isayev O. Commun. Chem. 2022;5:129. doi: 10.1038/s42004-022-00733-0.
- Salem M. S. H. Aziz Y. M. A. Elgawish M. S. Said M. M. Abouzid K. A. Bioorg. Chem. 2020;94:103472. doi: 10.1016/j.bioorg.2019.103472.
- Gardouh A. R. Srag El-Din A. S. Salem M. S. Moustafa Y. Gad S. Drug Des., Dev. Ther. 2021:3071–3093. doi: 10.2147/DDDT.S321962.
- Lamba V. Ghosh I. Curr. Pharm. Des. 2012;18:2936–2945. doi: 10.2174/138161212800672813.
- Eglen R. Reisine T. Pharmacol. Ther. 2011;130:144–156. doi: 10.1016/j.pharmthera.2011.01.007.
- Cohen M. S. Zhang C. Shokat K. M. Taunton J. Science. 2005;308:1318–1321. doi: 10.1126/science1108367. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhao Z. Liu Q. Bliven S. Xie L. Bourne P. E. J. Med. Chem. 2017;60:2879–2889. doi: 10.1021/acs.jmedchem.6b01815. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Viganò I. Di Giacomo N. Bozzani S. Antolini L. Piazza R. Gambacorti Passerini C. Am. J. Hematol. 2014;89:E184–E187. doi: 10.1002/ajh.23804. [DOI] [PubMed] [Google Scholar]
- Baran Y. Saydam G. J. Blood Med. 2012:139–150. doi: 10.2147/JBM.S29132. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Piper-Vallillo A. J. Sequist L. V. Piotrowska Z. J. Clin. Oncol. 2020;38:2926–2936. doi: 10.1200/JCO.19.03123. [DOI] [PubMed] [Google Scholar]
- El-Damasy A. K., Salem M. S., Sebaiy M. M. and Elgawish M. S., in Current Molecular Targets of Heterocyclic Compounds for Cancer Therapy, Elsevier, 2024, pp. 219–254 [Google Scholar]
- Taylor P. C. Rheumatology. 2019;58:i17–i26. doi: 10.1093/rheumatology/key225. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yang Y. Li S. Wang Y. Zhao Y. Li Q. Signal Transduction Targeted Ther. 2022;7:329. doi: 10.1038/s41392-022-01168-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wang J. Hsieh C.-Y. Wang M. Wang X. Wu Z. Jiang D. Liao B. Zhang X. Yang B. He Q. Nat. Mach. Intell. 2021;3:914–922. [Google Scholar]
- Singha M. Pu L. Srivastava G. Ni X. Stanfield B. A. Uche I. K. Rider P. J. Kousoulas K. G. Ramanujam J. Brylinski M. Cancers. 2023;15:4050. doi: 10.3390/cancers15164050. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Deng E. Z. Marino G. B. Clarke D. J. Diamant I. Resnick A. C. Ma W. Wang P. Ma'ayan A. Cells Rep. Methods. 2024;4:100839. doi: 10.1016/j.crmeth.2024.100839. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kuleshov M. V. Jones M. R. Rouillard A. D. Fernandez N. F. Duan Q. Wang Z. Koplev S. Jenkins S. L. Jagodnik K. M. Lachmann A. Nucleic Acids Res. 2016;44:W90–W97. doi: 10.1093/nar/gkw377.
- Keenan A. B. Torre D. Lachmann A. Leong A. K. Wojciechowicz M. L. Utti V. Jagodnik K. M. Kropiwnicki E. Wang Z. Ma'ayan A. Nucleic Acids Res. 2019;47:W212–W224. doi: 10.1093/nar/gkz446.
- Kuleshov M. V. Xie Z. London A. B. Yang J. Evangelista J. E. Lachmann A. Shu I. Torre D. Ma'ayan A. Nucleic Acids Res. 2021;49:W304–W316. doi: 10.1093/nar/gkab359.
- Chen E. Y. Xu H. Gordonov S. Lim M. P. Perkins M. H. Ma'ayan A. Bioinformatics. 2012;28:105–111. doi: 10.1093/bioinformatics/btr625.
- Marino G. B. Ngai M. Clarke D. J. Fleishman R. H. Deng E. Z. Xie Z. Ahmed N. Ma'ayan A. Nucleic Acids Res. 2023;51:W213–W224. doi: 10.1093/nar/gkad399.
- Migliozzi S. Oh Y. T. Hasanain M. Garofano L. D'angelo F. Najac R. D. Picca A. Bielle F. Di Stefano A. L. Lerond J. Nat. Cancer. 2023;4:181–202. doi: 10.1038/s43018-022-00510-x.
- Xie J. Chen Y. Luo S. Yang W. Lin Y. Wang L. Ding X. Tong M. Yu R. Cell Rep. Methods. 2024;4:100797. doi: 10.1016/j.crmeth.2024.100797.
- Cai Z. Poulos R. C. Aref A. Robinson P. J. Reddel R. R. Zhong Q. Cancer Res. Commun. 2024;4:3151–3164. doi: 10.1158/2767-9764.CRC-24-0285.
- Liu X. Tao Y. Cai Z. Bao P. Ma H. Li K. Li M. Zhu Y. Lu Z. J. Bioinformatics. 2024;40:btae316. doi: 10.1093/bioinformatics/btae316.
- Zhou Y. Wang F. Tang J. Nussinov R. Cheng F. Lancet Digital Health. 2020;2:e667–e676. doi: 10.1016/S2589-7500(20)30192-8.
- Aliper A. Plis S. Artemov A. Ulloa A. Mamoshina P. Zhavoronkov A. Mol. Pharmaceutics. 2016;13:2524–2530. doi: 10.1021/acs.molpharmaceut.6b00248.
- Janizek J. D. Dincer A. B. Celik S. Chen H. Chen W. Naxerova K. Lee S.-I. Nat. Biomed. Eng. 2023;7:811–829. doi: 10.1038/s41551-023-01034-0.
- Liu R. Rizzo S. Whipple S. Pal N. Pineda A. L. Lu M. Arnieri B. Lu Y. Capra W. Copping R. Nature. 2021;592:629–633. doi: 10.1038/s41586-021-03430-5.
- Chebanov D. K. and Misyurin V. A., medRxiv, 2023, preprint, 2023.09.11.23295380, 10.1101/2023.09.11.23295380
- Yu M. Li W. Yu Y. Zhao Y. Xiao L. Lauschke V. M. Cheng Y. Zhang X. Wang Y. Nat. Comput. Sci. 2024;4:600–614. doi: 10.1038/s43588-024-00679-4.
- Noor F. Junaid M. Almalki A. H. Almaghrabi M. Ghazanfar S. Tahir ul Qamar M. Sci. Rep. 2024;14:28321. doi: 10.1038/s41598-024-79799-w.
- Song J. Ha J. Lee J. Ko J. Shin W.-H. Sci. Rep. 2024;14:25167. doi: 10.1038/s41598-024-75400-6.
- Gentile F. Yaacoub J. C. Gleave J. Fernandez M. Ton A.-T. Ban F. Stern A. Cherkasov A. Nat. Protoc. 2022;17:672–697. doi: 10.1038/s41596-021-00659-2.
- Luttens A. Cabeza de Vaca I. Sparring L. Brea J. Martínez A. L. Kahlous N. A. Radchenko D. S. Moroz Y. S. Loza M. I. Norinder U. Nat. Comput. Sci. 2025:1–12. doi: 10.1038/s43588-025-00777-x.
- McNutt A. T. Francoeur P. Aggarwal R. Masuda T. Meli R. Ragoza M. Sunseri J. Koes D. R. J. Cheminf. 2021;13:43. doi: 10.1186/s13321-021-00522-2.
- Ciudad Á., Morales-Pastor A., Malo L., Filella-Mercè I., Guallar V. and Molina A., arXiv, 2024, preprint, arXiv:2406.09346, 10.48550/arXiv.2406.09346
- Zhu H. Li X. Chen B. Huang N. NPJ Drug Discov. 2025;2:1.
- Yang Z. Zeng X. Zhao Y. Chen R. Signal Transduction Targeted Ther. 2023;8:115. doi: 10.1038/s41392-023-01381-z.
- Hu J. Allen B. K. Stathias V. Ayad N. G. Schürer S. C. Int. J. Mol. Sci. 2024;25:2538. doi: 10.3390/ijms25052538.
- Loeffler H. H. He J. Tibo A. Janet J. P. Voronov A. Mervin L. H. Engkvist O. J. Cheminf. 2024;16:20. doi: 10.1186/s13321-024-00812-5.
- Zhou Z. Kearnes S. Li L. Zare R. N. Riley P. Sci. Rep. 2019;9:10752. doi: 10.1038/s41598-019-47148-x.
- Zhavoronkov A. Ivanenkov Y. A. Aliper A. Veselov M. S. Aladinskiy V. A. Aladinskaya A. V. Terentiev V. A. Polykovskiy D. A. Kuznetsov M. D. Asadulaev A. Nat. Biotechnol. 2019;37:1038–1040. doi: 10.1038/s41587-019-0224-x.
- Green H. Koes D. R. Durrant J. D. Chem. Sci. 2021;12:8036–8047. doi: 10.1039/d1sc00163a.
- Wang M. Hsieh C.-Y. Wang J. Wang D. Weng G. Shen C. Yao X. Bing Z. Li H. Cao D. J. Med. Chem. 2022;65:9478–9492. doi: 10.1021/acs.jmedchem.2c00732.
- Wang L., Song C., Liu Z., Rong Y., Liu Q. and Wu S., arXiv, 2025, preprint, arXiv:2502.09511, 10.48550/arXiv.2502.09511
- Alakhdar A. Poczos B. Washburn N. J. Chem. Inf. Model. 2024;64:7238–7256. doi: 10.1021/acs.jcim.4c01107.
- Corso G., Stärk H., Jing B., Barzilay R. and Jaakkola T., arXiv, 2022, preprint, arXiv:2210.01776, 10.48550/arXiv.2210.01776
- Kooistra A. J. Kanev G. K. van Linden O. P. Leurs R. de Esch I. J. de Graaf C. Nucleic Acids Res. 2016;44:D365–D371. doi: 10.1093/nar/gkv1082.
- Mysinger M. M. Carchia M. Irwin J. J. Shoichet B. K. J. Med. Chem. 2012;55:6582–6594. doi: 10.1021/jm300687e.
- Metz J. T. Johnson E. F. Soni N. B. Merta P. J. Kifle L. Hajduk P. J. Nat. Chem. Biol. 2011;7:200–202. doi: 10.1038/nchembio.530.
- Davis M. I. Hunt J. P. Herrgard S. Ciceri P. Wodicka L. M. Pallares G. Hocker M. Treiber D. K. Zarrinkar P. P. Nat. Biotechnol. 2011;29:1046–1051. doi: 10.1038/nbt.1990.
- Triballeau N. Acher F. Brabet I. Pin J.-P. Bertrand H.-O. J. Med. Chem. 2005;48:2534–2547. doi: 10.1021/jm049092j.
- Truchon J.-F. Bayly C. I. J. Chem. Inf. Model. 2007;47:488–508. doi: 10.1021/ci600426e.
- Jiang Y., Li X., Zhang Y., Han J., Xu Y., Pandit A., Zhang Z., Wang M., Wang M. and Liu C., arXiv, 2025, preprint, arXiv:2505.01700, 10.48550/arXiv.2505.01700
- Kanev G. K. de Graaf C. Westerman B. A. de Esch I. J. Kooistra A. J. Nucleic Acids Res. 2021;49:D562–D569. doi: 10.1093/nar/gkaa895.
- Stärk H., Ganea O., Pattanaik L., Barzilay R. and Jaakkola T., EquiBind: Geometric Deep Learning for Drug Binding Structure Prediction, Proceedings of the 39th International Conference on Machine Learning, PMLR, 2022, vol. 162, pp. 20503–20521
- Lu W., Wu Q., Zhang J., Rao J., Li C. and Zheng S., Advances in Neural Information Processing Systems, 2022, vol. 35, pp. 7236–7249
- Tang J. Szwajda A. Shakyawar S. Xu T. Hintsanen P. Wennerberg K. Aittokallio T. J. Chem. Inf. Model. 2014;54:735–743. doi: 10.1021/ci400709d.
- Shahin R. Jaafreh S. Azzam Y. Future Sci. OA. 2025;11:2483631. doi: 10.1080/20565623.2025.2483631.
- Yoshizawa T. Ishida S. Sato T. Ohta M. Honma T. Terayama K. J. Chem. Inf. Model. 2022;62:5351–5360. doi: 10.1021/acs.jcim.2c00787.
- Ma W. Hu J. Chen Z. Ai Y. Zhang Y. Dong K. Meng X. Liu L. J. Chem. Inf. Model. 2024;64:7273–7290. doi: 10.1021/acs.jcim.4c00595.
- Bao L. Wang Z. Wu Z. Luo H. Yu J. Kang Y. Cao D. Hou T. Acta Pharm. Sin. B. 2023;13:54–67. doi: 10.1016/j.apsb.2022.05.004.
- Park H. Hong S. Lee M. Kang S. Brahma R. Cho K.-H. Shin J.-M. Sci. Rep. 2023;13:10268. doi: 10.1038/s41598-023-37456-8.
- Schifferstein J. Bernatavicius A. Janssen A. P. J. Chem. Inf. Model. 2024;64:9196–9204. doi: 10.1021/acs.jcim.4c01260.
- Liu C. Kutchukian P. Nguyen N. D. AlQuraishi M. Sorger P. K. J. Chem. Inf. Model. 2023;63:5457–5472. doi: 10.1021/acs.jcim.3c00347.
- Oliveira P. F. Guedes R. C. Falcao A. O. Sci. Rep. 2024;14:8252. doi: 10.1038/s41598-024-58394-z.
- Li Z. Qu N. Zhou J. Sun J. Ren Q. Meng J. Wang G. Wang R. Liu J. Chen Y. Nucleic Acids Res. 2024;52:W489–W497. doi: 10.1093/nar/gkae380.
- Fu Y. Jung A. W. Torne R. V. Gonzalez S. Vöhringer H. Shmatko A. Yates L. R. Jimenez-Linan M. Moore L. Gerstung M. Nat. Cancer. 2020;1:800–810. doi: 10.1038/s43018-020-0085-8.
- Bannier P.-A. Saillard C. Mann P. Touzot M. Maussion C. Matek C. Klümper N. Breyer J. Wirtz R. Sikic D. Nat. Commun. 2024;15:10914. doi: 10.1038/s41467-024-55331-6.
- Webster J. Ghith J. Penner O. Lieu C. H. Schijvenaars B. J. JCO Precis. Oncol. 2024;8:e2300685. doi: 10.1200/PO.23.00685.
- Rameshkumar A. ArunPrasanna V. Mahalakshmi V. Raja M. R. Gopinath K. Process Biochem. 2024;144:142–159.
- Monteiro da Silva G. Cui J. Y. Dalgarno D. C. Lisi G. P. Rubenstein B. M. Nat. Commun. 2024;15:2464. doi: 10.1038/s41467-024-46715-9.
- Skrede O.-J. De Raedt S. Kleppe A. Hveem T. S. Liestøl K. Maddison J. Askautrud H. A. Pradhan M. Nesheim J. A. Albregtsen F. Lancet. 2020;395:350–360. doi: 10.1016/S0140-6736(19)32998-8.
- Wang S. Wu C.-Y. He M.-M. Yong J.-X. Chen Y.-X. Qian L.-M. Zhang J.-L. Zeng Z.-L. Xu R.-H. Wang F. Nat. Commun. 2024;15:1515. doi: 10.1038/s41467-024-45479-6.
- Kather J. N. Pearson A. T. Halama N. Jäger D. Krause J. Loosen S. H. Marx A. Boor P. Tacke F. Neumann U. P. Nat. Med. 2019;25:1054–1056. doi: 10.1038/s41591-019-0462-y.
- Wang X. Barrera C. Bera K. Viswanathan V. S. Azarianpour-Esfahani S. Koyuncu C. Velu P. Feldman M. D. Yang M. Fu P. Sci. Adv. 2022;8:eabn3966. doi: 10.1126/sciadv.abn3966.
- Farahmand S. Fernandez A. I. Ahmed F. S. Rimm D. L. Chuang J. H. Reisenbichler E. Zarringhalam K. Mod. Pathol. 2022;35:44–51. doi: 10.1038/s41379-021-00911-w.
- Zhang K. Zhou H.-Y. Baptista-Hon D. T. Gao Y. Liu X. Oermann E. Xu S. Jin S. Zhang J. Sun Z. Patterns. 2024;5:101028. doi: 10.1016/j.patter.2024.101028.
- Ogier du Terrail J. Leopold A. Joly C. Béguier C. Andreux M. Maussion C. Schmauch B. Tramel E. W. Bendjebbar E. Zaslavskiy M. Nat. Med. 2023;29:135–146. doi: 10.1038/s41591-022-02155-w.
- He J. Baxter S. L. Xu J. Xu J. Zhou X. Zhang K. Nat. Med. 2019;25:30–36. doi: 10.1038/s41591-018-0307-0.
- Cowan-Jacob S. W. Fendrich G. Floersheimer A. Furet P. Liebetanz J. Rummel G. Rheinberger P. Centeleghe M. Fabbro D. Manley P. W. Acta Crystallogr., Sect. D: Biol. Crystallogr. 2007;63:80–93. doi: 10.1107/S0907444906047287.
- Blakely C. M. Watkins T. B. Wu W. Gini B. Chabon J. J. McCoach C. E. McGranahan N. Wilson G. A. Birkbak N. J. Olivas V. R. Nat. Genet. 2017;49:1693–1704. doi: 10.1038/ng.3990.
- Druker B. J. Guilhot F. O'Brien S. G. Gathmann I. Kantarjian H. Gattermann N. Deininger M. W. Silver R. T. Goldman J. M. Stone R. M. N. Engl. J. Med. 2006;355:2408–2417. doi: 10.1056/NEJMoa062867.
- Askin S. Burkhalter D. Calado G. El Dakrouni S. Health Technol. 2023;13:203–213. doi: 10.1007/s12553-023-00738-2.
- Harrer S. Shah P. Antony B. Hu J. Trends Pharmacol. Sci. 2019;40:577–591. doi: 10.1016/j.tips.2019.05.005.
- Schram A. Borad M. Sahai V. Kamath S. Kim R. Liao C.-Y. A. Oh D. Y. Ponz-Sarvisé M. Yachnin J. Shell S. Eur. J. Cancer. 2022;174:S116.
- Subbiah V. Sahai V. Maglic D. Bruderek K. Touré B. B. Zhao S. Valverde R. O'Hearn P. J. Moustakas D. T. Schönherr H. Cancer Discovery. 2023;13:2012–2031. doi: 10.1158/2159-8290.CD-23-0475.
- Schönherr H. Ayaz P. Taylor A. M. Casaletto J. B. Touré B. B. Moustakas D. T. Hudson B. M. Valverde R. Zhao S. O'Hearn P. J. Proc. Natl. Acad. Sci. U. S. A. 2024;121:e2317756121. doi: 10.1073/pnas.2317756121.
- Pharmaceutical Technology, REC-4881 by Recursion Pharmaceuticals for colorectal cancer: Likelihood of approval [Internet], Data Insights, 2024. Apr 23 [cited 2025 Aug 30], Available from: https://www.pharmaceutical-technology.com/data-insights/rec-4881-recursion-pharmaceuticals-colorectal-cancer-likelihood-of-approval/?cf-view
- Ren F. Aliper A. Chen J. Zhao H. Rao S. Kuppe C. Ozerov I. V. Zhang M. Witte K. Kruse C. Nat. Biotechnol. 2025;43:63–75. doi: 10.1038/s41587-024-02143-0.
- Savage S. R. Yi X. Lei J. T. Wen B. Zhao H. Liao Y. Jaehnig E. J. Somes L. K. Shafer P. W. Lee T. D. Cell. 2024;187:4389–4407. doi: 10.1016/j.cell.2024.05.039.
- Tiong K.-L. Sintupisut N. Lin M.-C. Cheng C.-H. Woolston A. Lin C.-H. Ho M. Lin Y.-W. Padakanti S. Yeang C.-H. PLOS Digit. Health. 2022;1:e0000151. doi: 10.1371/journal.pdig.0000151.
- Musa A. Tripathi S. Kandhavelu M. Dehmer M. Emmert-Streib F. PLoS One. 2018;13:e0201937. doi: 10.1371/journal.pone.0201937.
- Liu T. Hwang L. Burley S. K. Nitsche C. I. Southan C. Walters W. P. Gilson M. K. Nucleic Acids Res. 2025;53:D1633–D1644. doi: 10.1093/nar/gkae1075.
- Jumper J. Evans R. Pritzel A. Green T. Figurnov M. Ronneberger O. Tunyasuvunakool K. Bates R. Žídek A. Potapenko A. Nature. 2021;596:583–589. doi: 10.1038/s41586-021-03819-2.
- Rata S., Gruver J. S., Trikoz N., Lukyanov A., Vultaggio J., Ceribelli M., Thomas C., Gujral T. S., Kirschner M. W. and Peshkin L., bioRxiv, 2020, preprint, 2020.09.26.312348, 10.1101/2020.09.26.312348
- Yun C.-H. Mengwasser K. E. Toms A. V. Woo M. S. Greulich H. Wong K.-K. Meyerson M. Eck M. J. Proc. Natl. Acad. Sci. U. S. A. 2008;105:2070–2075. doi: 10.1073/pnas.0709662105.
- Jiménez J. Doerr S. Martínez-Rosell G. Rose A. S. De Fabritiis G. Bioinformatics. 2017;33:3036–3042. doi: 10.1093/bioinformatics/btx350.
- Lee J. Y. Gebauer E. Seeliger M. A. Bahar I. Curr. Opin. Struct. Biol. 2024;84:102770. doi: 10.1016/j.sbi.2023.102770.
- Singh S. Gapsys V. Aldeghi M. Schaller D. Rangwala A. M. White J. B. Bluck J. P. Scheen J. Glass W. G. Guo J. J. Phys. Chem. B. 2025;129:2882–2902. doi: 10.1021/acs.jpcb.4c07794.
Data Availability Statement
No primary research results, software or code have been included and no new data were generated or analyzed as part of this review.