AI In Action: Redefining Drug Discovery and Development

Anshul Kanakia; Mark Sale; Liang Zhao; Zhu Zhou

doi:10.1111/cts.70149

2025 Feb 6;18(2):e70149. doi: 10.1111/cts.70149

PMCID: PMC11800368 PMID: 39912678

Artificial intelligence (AI) is a field integrating computer science, statistics, and engineering to develop systems capable of performing tasks that typically require human intelligence. AI applications for healthcare span from drug discovery to postmarket safety surveillance and advanced pharmaceutical manufacturing. This perspective provides insights into the application of AI in drug discovery, translational and late‐phase development and summarizes the perspectives of the clinical pharmacology community based on survey results, highlighting potential impacts on future research. [Correction added on 10 February 2025, after first online publication: the introductory paragraph was inadvertently published as part of the Acknowledgments section, but is now in the correct location in this version.]

1. Introduction

The 2024 Nobel Prize in Chemistry was awarded to David Baker, Demis Hassabis, and John Jumper for their groundbreaking work in using AI to predict protein structures and design functional proteins. The development of the AlphaFold model has solved a long‐standing challenge in biology by accurately predicting the complex structures of proteins, which are crucial for understanding their function. AlphaFold enhances our ability to design new proteins with specific functions and accelerates drug discovery and development by providing detailed insights into protein behavior and interactions. The recognition of this work underscores the transformative potential of AI in the life sciences and its critical role in future drug research and development (R&D).

2. The Role of AI in Drug Discovery

AI has revolutionized the drug discovery space in recent years, with applications ranging from highly accurate structure predictions of proteins [1] to the design and optimization of both small and large molecules [2]. Several large foundation models have been developed to encode functional information about proteins in support of the drug development pipeline [3, 4]. Figure 1 highlights the areas in the pipeline where AI now plays a significant role and is poised to disrupt traditional experimental techniques. The culmination of AI‐driven discovery is de novo design, in which the entire preclinical pipeline can be performed in silico. This could save billions of dollars in R&D costs, which would translate into lower medication costs and higher clinical success rates through the optimization of safer, more developable molecules that show strong efficacy against well‐selected targets.

FIGURE 1. An overview of drug discovery augmented with AI at each primary step of the pipeline.

While de novo design is as‐yet unproven, the success rate of the 21 AI‐developed drugs that have completed Phase I trials as of December 2023 is 80%–90%, significantly higher than ~40% for traditional methods [5]. The number of AI‐developed candidate drugs entering clinical stages continues to rise at a roughly exponential rate, from 3 in 2016 to 17 in 2020 and 67 in 2023 [5].
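As a quick check on the growth described above, the short sketch below computes the compound annual growth rate implied by the three counts quoted from [5]. The function name and the choice of a compound‐growth model are illustrative assumptions, not part of the cited analysis.

```python
# Counts of AI-developed candidates entering clinical stages, as quoted from [5].
counts = {2016: 3, 2020: 17, 2023: 67}


def annual_growth_rate(start_year, end_year):
    """Compound annual growth rate between two of the quoted years."""
    ratio = counts[end_year] / counts[start_year]
    return ratio ** (1 / (end_year - start_year)) - 1


# Compare the two sub-periods with the full period.
for start, end in [(2016, 2020), (2020, 2023), (2016, 2023)]:
    print(f"{start}-{end}: {annual_growth_rate(start, end):.1%} per year")
```

The implied rate is similar in both sub‐periods (roughly 54% and 58% per year), which is what a roughly exponential trend would look like, although three data points are too few to establish one.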

The intersection between high‐quality data access across life science modalities like imaging, multi‐omics, DMRs, and very large protein repertoires, and recent advancements in the scaling and architecture of large deep learning models has led to an explosion in AI applications for healthcare. While some of this data is publicly available, much of it is proprietary and under the control of large pharmaceutical companies, partly due to regulatory and privacy concerns. Conversely, innovation in AI for drug discovery is being led by academic and industry research laboratories, often resulting in highly funded spin‐off ventures like Genentech, Recursion, Absci, and more recently, Evolutionary Scale. Such AI‐first life sciences companies have found success in synergistic partnerships with large pharmaceutical companies, thereby gaining access to the large proprietary datasets upon which to apply their AI expertise. Some of these partnerships have led to acquisitions such as the 2009 purchase of Genentech by Roche for approximately $46.8 billion, highlighting the value that AI internalization brings to large pharmaceutical companies.

3. The Role of AI in Translational, Knowledge‐Based Management

The use of AI is poised to cover the full life cycle of a drug product, including drug discovery, drug development, and application assessment in a regulatory setting. Recent research from the Food and Drug Administration (FDA) included two distinct case studies. The first case exemplifies the use of conventional machine learning (ML) approaches through a project aimed at decoding kinase–adverse event associations for small molecule kinase inhibitors (SMKIs). By constructing a multi‐domain dataset from 4638 patients in registrational trials of 16 FDA‐approved SMKIs, ML models such as Random Survival Forests (RSF), Artificial Neural Networks (ANNs), and DeepHit were utilized to find potential associations between 442 kinases and 2145 adverse events. This information was made publicly accessible via an interactive web application, “Identification of Kinase‐Specific Signal” (https://gongj.shinyapps.io/ml4ki). This platform aids experimentalists in identifying and verifying kinase‐inhibitor adverse event pairs and serves as a precision‐medicine tool to mitigate individual patient safety risks by forecasting clinical safety signals [6].

In general, the credibility of AI models in extrapolation and generalization heavily depends on the diversity and comprehensiveness of the training data. Future studies integrating richer datasets with detailed genomic, phenotypic, and demographic information could further improve the precision of such associations and help refine the applicability of these models to specific patient subgroups. For future research, while Multi‐Input Neural Networks were not employed in this study, they represent a promising architecture for integrating heterogeneous datasets, such as kinase activity, demographic data, and clinical outcomes, into a unified predictive framework. Additionally, hybrid approaches combining neural networks with Markov Chains could be explored to capture sequential dependencies in disease progression and improve the robustness of predictions across diverse patient cohorts.
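To make the Markov chain component of such a hybrid concrete, the following is a minimal sketch of a discrete‐time chain over disease states. The state names and transition probabilities are invented for illustration and do not come from the cited study. In a hybrid model, a neural network might supply patient‐specific transition probabilities in place of the fixed matrix used here.

```python
# Hypothetical disease states; not taken from the cited study.
STATES = ["stable", "progressed", "adverse_event"]

# Hypothetical per-visit transition probabilities (each row sums to 1).
# Row i gives the probabilities of moving from STATES[i] to each state.
TRANSITIONS = [
    [0.85, 0.10, 0.05],
    [0.00, 0.80, 0.20],
    [0.00, 0.00, 1.00],  # treated as an absorbing state in this sketch
]


def step(dist, matrix):
    """Advance a probability distribution over states by one visit."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def propagate(dist, matrix, n_steps):
    """Apply the transition matrix n_steps times."""
    for _ in range(n_steps):
        dist = step(dist, matrix)
    return dist


# Every patient starts in the "stable" state.
start = [1.0, 0.0, 0.0]
after_three_visits = propagate(start, TRANSITIONS, 3)
for name, prob in zip(STATES, after_three_visits):
    print(f"{name}: {prob:.3f}")
```

Because each visit depends only on the state at the previous visit, the chain captures the sequential dependency mentioned above in its simplest form.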

The second case study showcases the application of generative AI methods through the development of PharmBERT, a domain‐specific large language model (LLM) for drug labels [7]. Leveraging the foundational BERT architecture, PharmBERT was pre‐trained on textual data extracted from 138,924 raw drug labels sourced from DailyMed. This pre‐training on domain‐specific text significantly improved the model's performance in extracting pharmacokinetic information from drug labeling. PharmBERT demonstrated superior performance in tasks such as adverse drug reaction (ADR) detection and ADME (absorption, distribution, metabolism, and excretion) classification, surpassing other models like ClinicalBERT and BioBERT. This advancement underscores the potential of LLMs to enhance the efficiency of text‐related regulatory work and improve the extraction of critical information from complex drug labels.
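The pre‐training step can be illustrated with a small sketch of the token corruption used by the masked language modeling objective of the original BERT. It assumes the standard BERT recipe (about 15% of tokens selected, of which 80% become a mask token, 10% a random token, and 10% stay unchanged); the exact PharmBERT settings, the example sentence, and the tiny vocabulary are assumptions for illustration only.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (corrupted_tokens, labels) for masked language modeling.

    labels[i] holds the original token where a prediction is required
    and None everywhere else.
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)
            roll = rng.random()
            if roll < 0.8:
                corrupted.append(MASK)               # usual case: mask token
            elif roll < 0.9:
                corrupted.append(rng.choice(vocab))  # random replacement
            else:
                corrupted.append(tok)                # kept unchanged
        else:
            corrupted.append(tok)
            labels.append(None)
    return corrupted, labels


# Made-up label-style sentence and vocabulary, for illustration only.
sentence = "the drug is primarily metabolized by CYP3A4 and excreted in urine".split()
vocab = sorted(set(sentence))
corrupted, labels = mask_tokens(sentence, vocab, seed=1)
print(corrupted)
print(labels)
```

During pre‐training the model is asked to recover the original token at every position where the label is not None, which is how it picks up the vocabulary and phrasing of drug labels without manual annotation.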

Together, these case studies illustrate the transformative impact of AI on drug development and regulatory science. Traditional AI methods provide robust frameworks for specific, structured data analyses, while generative AI methods offer expansive capabilities for handling unstructured data and developing generalized intelligence. Both approaches are crucial for advancing personalized medicine and optimizing drug development processes.

4. The Role of AI in Late‐Phase Development

AI methods have been used across a range of disciplines in late‐phase development. One of the greatest opportunities for these methods is to reduce the time from last‐subject last‐visit to application filing. Some examples of the tasks to be accomplished in this phase are:

Regulatory document authoring
Pharmacometrics

4.1. Regulatory Document Authoring

AI for document authoring offers an opportunity to save time between the last subject's last visit and filing. The Generative Pre‐trained Transformer (GPT) architecture is promising for this task, and one general‐purpose GPT‐based tool for authoring regulatory documents has been described by Bouton [8]. Two challenges remain. First, an adequate training dataset, consisting of clinical study results, clinical protocols, and final reports, is difficult to find. Second, given the sensitivity and high stakes of regulatory documents, it is crucial to ensure that the model does not generate inaccuracies, commonly referred to as ‘hallucinations.’

4.2. Pharmacometrics Modeling

Work on constructing model code has met with variable success. Shin et al. [9] had modest success generating initial NONMEM code with common GPT platforms; however, in all cases the output required correction of errors by humans. pyDarwin is a more general approach to using ML for pharmacometrics (PMX) model selection. It makes available a number of algorithms to search for the optimal pharmacometric model, including pharmacokinetic and pharmacodynamic models. This ML approach identifies the optimal combination of user‐defined model features, such as the number of compartments, covariate relationships, and random effects, based on user‐defined criteria. This method has been shown to be superior to the traditional manual forward addition/backward elimination method, resulting in considerable time savings and a more robust model [10].
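The idea of searching over combinations of user‐defined model features can be sketched as follows. This is not the pyDarwin API. The feature names, the toy fitness function, and all numbers are hypothetical stand‐ins for a real model fit and a user‐defined penalized criterion, and a brute‐force enumeration stands in for the search algorithm.

```python
from itertools import product

# Hypothetical user-defined model features and their candidate options.
SEARCH_SPACE = {
    "n_compartments": [1, 2, 3],
    "weight_on_clearance": [False, True],
    "iiv_on_volume": [False, True],
}


def toy_fitness(model):
    """Hypothetical penalized criterion (lower is better).

    A real workflow would fit each candidate model and read its objective
    function value; the numbers here are invented for illustration.
    """
    ofv = {1: 1250.0, 2: 1180.0, 3: 1178.0}[model["n_compartments"]]
    if model["weight_on_clearance"]:
        ofv -= 25.0
    if model["iiv_on_volume"]:
        ofv -= 4.0
    n_params = (2 * model["n_compartments"]
                + model["weight_on_clearance"]
                + model["iiv_on_volume"])
    return ofv + 10.0 * n_params  # parsimony penalty per parameter


def exhaustive_search(space, fitness):
    """Score every combination of options and return the best one."""
    names = list(space)
    candidates = (dict(zip(names, combo)) for combo in product(*space.values()))
    return min(candidates, key=fitness)


best = exhaustive_search(SEARCH_SPACE, toy_fitness)
print(best, toy_fitness(best))
```

Enumeration is only feasible for small spaces such as this one with 12 candidates. For larger spaces, a search heuristic would replace `exhaustive_search` while the fitness criterion stays the same.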

5. Summary Views From Clinical Pharmacology Community

Figure 2 summarizes the results from two survey questions posed during the “When AI Meets Drug Development” session at the 2024 American Society of Clinical Pharmacology and Therapeutics Annual Meeting. The first question evaluated views on AI's potential to bring significant change to drug R&D. Notably, 80% of participants recognized AI's significant impact, while 12% were unconvinced. No participants were unaware of AI's application in drug R&D, suggesting a high level of awareness within the clinical pharmacology community. A small minority (6%) were uncertain about AI's current capabilities, and 2% selected an unspecified option. Regarding AI's impact over the next 5–10 years, 45% of participants favored its application in molecule design and optimization, followed by clinical trials and development (28%), target discovery and validation (20%), and preclinical testing and screening (7%). The results highlight the current familiarity, usage, and perceptions of AI among the clinical pharmacology community, indicating a strong interest and optimism about AI's role in the future of drug development.

FIGURE 2. Summary of survey questions and results from the 2024 American Society of Clinical Pharmacology and Therapeutics Annual Meeting “When AI Meets Drug Development” Session.

6. Future Direction

Looking ahead, the integration of AI in drug R&D is poised to accelerate, driven by advancements from leading tech companies. NVIDIA's powerful GPUs and AI frameworks are enabling faster and more efficient generative drug discovery processes. Google Health is leveraging its expertise in data analytics and ML to enhance predictive modeling and patient data analysis. Apple Health is contributing through its health data ecosystem, facilitating personalized medicine and real‐time health monitoring. OpenAI's cutting‐edge language models are revolutionizing the way researchers generate hypotheses and analyze scientific literature. These innovations collectively promise to streamline the drug development pipeline, reduce costs, and improve clinical outcomes, heralding a new era of precision medicine.

As global investment in AI for drug discovery accelerates, so does the expectation of improved outcomes for drug programs. As of 2024, there are no on‐market medications that have been developed using an AI‐first pipeline. Future drivers for AI, particularly in healthcare, need to show disruption to existing business processes and tangible financial gains. This could happen via the launch of the first AI‐developed medication or AI‐based clinical pipeline improvements that significantly reduce the lead time from first patient in to regulatory approval.

Conflicts of Interest

M.S. is an employee of Certara. A.K. is an employee of AstraZeneca. All other authors declared no competing interests for this work.

Acknowledgments

The content of this perspective was presented at the 2024 American Society of Clinical Pharmacology and Therapeutics Annual Meeting.

Funding: Z.Z. was supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R16GM146679. M.S. was funded by an FDA/NIH grant (Development of a model selection method for population pharmacokinetics analysis by deep‐learning‐based reinforcement learning; Unique Federal Award Identification Number: 1U01FD007355). The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding agencies.

Anshul Kanakia, Mark Sale, Liang Zhao and Zhu Zhou contributed equally to this work.

Contributor Information

Liang Zhao, Email: liang.zhao2@ucsf.edu.

Zhu Zhou, Email: zzhou1@york.cuny.edu.

References

1. Jumper J., Evans R., Pritzel A., et al., “Highly Accurate Protein Structure Prediction With AlphaFold,” Nature 596, no. 7873 (2021): 583–589, 10.1038/s41586-021-03819-2.
2. Watson J. L., Juergens D., Bennett N. R., et al., “De Novo Design of Protein Structure and Function With RFdiffusion,” Nature 620, no. 7976 (2023): 1089–1100, 10.1038/s41586-023-06415-8.
3. Lin Z., Akin H., Rao R., et al., “Evolutionary‐Scale Prediction of Atomic‐Level Protein Structure With a Language Model,” Science 379, no. 6637 (2023): 1123–1130, 10.1126/science.ade2574.
4. Madani A., Krause B., Greene E. R., et al., “Large Language Models Generate Functional Protein Sequences Across Diverse Families,” Nature Biotechnology 41, no. 8 (2023): 1099–1106, 10.1038/s41587-022-01618-2.
5. Kp Jayatunga M., Ayers M., Bruens L., Jayanth D., and Meier C., “How Successful Are AI‐Discovered Drugs in Clinical Trials? A First Analysis and Emerging Lessons,” Drug Discovery Today 29, no. 6 (2024): 104009, 10.1016/j.drudis.2024.104009.
6. Gong X., Hu M., Liu J., et al., “Decoding Kinase‐Adverse Event Associations for Small Molecule Kinase Inhibitors,” Nature Communications 13, no. 1 (2022): 4349, 10.1038/s41467-022-32033-5.
7. ValizadehAslani T., Shi Y., Ren P., et al., “PharmBERT: A Domain‐Specific BERT Model for Drug Labels,” Briefings in Bioinformatics 24, no. 4 (2023): 226, 10.1093/bib/bbad226.
8. Bouton C., “Generative AI Tools for Regulatory Writing,” accessed July 30, 2024, https://www.certara.com/blog/generative‐ai‐tools‐for‐regulatory‐writing/?utm_source=linkedin&utm_medium=organic_social&utm_campaign=Accelerating_medical_writing_efficiency_blog.
9. Shin E., Yu Y., Bies R. R., and Ramanathan M., “Evaluation of ChatGPT and Gemini Large Language Models for Pharmacometrics With NONMEM,” Journal of Pharmacokinetics and Pharmacodynamics 51, no. 3 (2024): 187–197, 10.1007/s10928-024-09921-y.
10. Li X., Sale M., Nieforth K., et al., “pyDarwin Machine Learning Algorithms Application and Comparison in Nonlinear Mixed‐Effect Model Selection and Optimization,” Journal of Pharmacokinetics and Pharmacodynamics 51, no. 6 (2024): 785–796, 10.1007/s10928-024-09932-9.
