Abstract
The global pharmaceutical drug delivery market is forecasted to grow to USD 2546.0 billion by 2029. The expanding pharmaceutical market urgently needs a more efficient drug research and development paradigm. Artificial intelligence (AI) is revolutionizing drug delivery by offering alternatives to traditional trial-and-error experimental approaches. This review systematically traces the technological evolution from early simple models to current advanced AI algorithms in various applications, ranging from formulation optimization to the prediction of critical formulation parameters and de novo material design. To enhance the reliability of AI applications in drug delivery, we present comprehensive guidelines and “Rule of Five” (Ro5) principles to systematically direct researchers in utilizing AI in formulation development. This “Ro5” includes the following criteria: a formulation dataset containing at least 500 entries, coverage of a minimum of 10 drugs and all significant excipients, appropriate molecular representations for both drugs and excipients, inclusion of all critical process parameters, and utilization of suitable algorithms and model interpretability. The review concludes with insights into emerging trends and future directions, including the utilization of large language models, multidisciplinary collaboration opportunities, talent development, and culture transformation, aimed at facilitating a paradigm shift toward AI-driven drug formulation development.
Key words: Drug delivery, Rational formulation design, Artificial intelligence, Machine learning, Deep learning, Formulation prediction, Rule of five, Multidisciplinary integration
Graphical abstract
This review summarizes the evolution of AI in drug delivery and highlights the importance of advanced models, multidisciplinary integration, and talent training for the future.

1. Introduction
Drug delivery is a critical step in transforming active pharmaceutical ingredients (APIs) into clinically applicable dosage forms, which helps to optimize the pharmacokinetic (PK) and pharmacodynamic (PD) properties of drugs1,2. As the difficulty and cost of producing new molecular entities (NMEs) continue to increase3, the importance and promise of drug delivery continue to grow. The global pharmaceutical drug delivery market is forecasted to grow from USD 1949.4 billion in 2024 to USD 2546.0 billion by 2029, a compound annual growth rate (CAGR) of 5.5% over that period4.
Modern drug delivery technology has evolved significantly over the seven decades since 1952, when SmithKline introduced the first 12-h controlled-release formulation using Spansule® technology5. The evolution can be observed in three key dimensions. First, therapeutic agents have evolved from traditional small-molecule APIs to include biomolecules such as peptides, proteins, and nucleic acids, as well as cell therapies. Second, delivery systems have diversified, from conventional tablets, capsules, and injections to advanced delivery systems such as microspheres, liposomes, and nanoparticles. Third, delivery goals now encompass not only optimizing controlled drug release but also meeting the requirements of targeted and personalized delivery.
The principle of drug delivery is multi-task optimization over a high-dimensional space defined by material attributes and process parameters6, with the formulation design space estimated at between 10^25 and 10^30. However, the drug delivery paradigm still largely depends on traditional trial-and-error experimental approaches. This inefficient methodology relies heavily on researchers' experience and intuition to explore a minute fraction of a vast design space, resulting in significantly prolonged development timelines and huge expenses. From 2010 to 2019, drug development averaged 8.7 years from Investigational New Drug (IND) filing to New Drug Application (NDA) approval7. The mean cost to develop a new drug was estimated at $879.3 million (2018 dollars)8. Furthermore, while the pharmaceutical industry has accumulated a wealth of valuable data, traditional research and development (R&D) methods lack effective tools to leverage it fully, potentially overlooking critical information. Given these challenges, more cost-effective development strategies are urgently required to accelerate drug R&D processes.
Artificial intelligence (AI), which refers to the simulation of human intelligence by machines that can learn from existing data and adapt to new inputs, has developed significantly since its origin in the 1950s. Although it experienced several downturns, AI research has boomed since 2012, when the AlexNet model won the ImageNet competition9. Powered by the convergence of big data, advanced algorithms, and computing resources, AI has achieved remarkable success across various fields. The 2024 Nobel Prizes in Physics and Chemistry were both awarded for AI-related work, highlighting AI's profound impact on science10. Attracted by the immense potential of AI techniques, a growing number of pharmaceutical companies are setting up AI divisions11. Major technology companies such as Google and Microsoft, along with AI-focused startups, are also leveraging their expertise to pursue opportunities in biomedical fields12. The U.S. Food and Drug Administration (FDA) has further validated this technological transformation by recognizing and supporting AI's role in drug discovery and development13.
In recent years, interest in the transformative potential of AI in drug delivery research has grown dramatically. Fig. 1 shows the number of publications on AI applications in drug delivery from 2000 to 2024, based on the Web of Science (WOS) database. Before 2018, publications related to AI technologies remained few; the count then rose steeply, showing exponential growth over the following years and exceeding 500 publications in 2024. Table 1 presents the rankings of the top 10 countries, affiliations with departments, journals, and hot topics in AI for drug delivery research based on publication output. Various AI techniques have been successfully applied to predict drug-excipient interactions14,15, optimize formulations for various dosage forms16,17, predict critical process parameters18, and efficiently screen delivery materials19.
Figure 1.
The bar chart illustrates the temporal evolution of publication counts indexed in Web of Science from 2000 to 2024 (accessed January 4, 2025) using the following keyword setting: ALL=(“artificial intelligence” OR “machine learning” OR “deep learning” OR “neural network” OR “expert system”) AND ALL=(“drug formulation” OR “pharmaceutical formulation” OR “drug delivery” OR “pharmaceutics”).
Table 1.
Rankings of the top 10 countries, affiliations with departments, journals, and hot topics in AI-related pharmaceutics publications from the WOS database (based on the same search keyword setting as Fig. 1).
| Rank^a | Country | Affiliation with the department | Journal | Hot topic |
|---|---|---|---|---|
| 1 | USA (448) | Institute of Chinese Medical Science, University of Macau, China (23) | Pharmaceutics (167) | Solid dispersion (274) |
| 2 | China (431) | Faculty of Pharmacy, University of Belgrade (17) | International Journal of Pharmaceutics (158) | Gene delivery (103) |
| 3 | India (222) | School of Engineering, Massachusetts Institute of Technology (16) | Molecular Pharmaceutics (121) | Stratum corneum (63) |
| 4 | England (164) | Faculty of Pharmacy, King Abdulaziz University (15) | Advanced Drug Delivery Reviews (46) | Protein folding (41) |
| 5 | Iran (121) | College of Pharmacy, The University of Texas at Austin (14) | Journal of Drug Delivery Science and Technology (45) | Hydrogels (32) |
| 6 | Saudi Arabia (115) | Faculty of Engineering, University of Waterloo (14) | European Journal of Pharmaceutics and Biopharmaceutics (42) | Exosomes (28) |
| 7 | Germany (96) | Faculty of Science, University of Waterloo (14) | Journal of Controlled Release (30) | Gene expression data (26) |
| 8 | South Korea (91) | College of Pharmacy, King Saud University (13) | Journal of Molecular Liquids (20) | Dry powder inhaler (25) |
| 9 | Spain (88) | College of Design and Engineering, National University of Singapore (13) | Scientific Reports (20) | Silver nanoparticles (25) |
| 10 | Canada (74) | School of Pharmacy, Tehran University of Medical Sciences (13) | Advanced Materials (14) | Supercritical carbon dioxide (23) |
^a The rankings of countries, affiliations with departments, journals, and hot topics are independent of each other and there is no direct correspondence across columns. The numbers in parentheses indicate the publication count for each entry.
To address the growing importance of AI in drug delivery, this review systematically summarizes the evolution of AI applications in drug delivery, illustrating how AI is transforming the traditional research paradigm (Fig. 2). As AI applications in this field continue to expand rapidly, establishing robust methodological standards becomes crucial for ensuring reproducibility and enabling effective comparison of different AI techniques. This review proposes comprehensive guidelines to address this need, including a “Rule of Five” (Ro5) principle for developing reliable AI models in formulation prediction. Beyond these practical guidelines, this review further explores emerging trends and future directions, including the utilization of large language models, opportunities for multidisciplinary collaboration, talent development, and culture transformation. By providing both practical methodological guidance and forward-looking perspectives, this review is a valuable starting point for pharmaceutical researchers seeking to incorporate advanced AI techniques into their research.
Figure 2.
Evolution of computational methods in drug delivery.
2. Early stage: initial applications of artificial intelligence before 2018
The applications of AI in drug delivery can be traced back to the early 1970s20. Over the following decades, researchers gradually explored AI's potential in pharmaceutical formulation development, which laid the groundwork for future development. This section provides an overview of these early applications, while Table 2 (refs. 20–30) summarizes these efforts, highlighting how various computational tools were applied to different drug delivery systems during this formative stage.
Table 2.
Early representative computational applications in drug delivery.
| Year | Computational method | Dosage form | Dataset | Objective | Ref. |
|---|---|---|---|---|---|
| 1973 | Factorial design | Tablet | 27 data (1 API and 1 excipient) | Predict disintegration time, tablet hardness, friability, weight, thickness, porosity, mean pore diameter, and dissolution (% at 30 min). | 20 |
| 1990 | Expert system | Aerosols, capsules, granulates, injection solutions, and tablets | Not mentioned | Carrying out ‘theoretical experiments’ by the computer using galenical knowledge before testing drug products in practical experiments | 21 |
| 1991 | RSM and ANN | Matrix capsule | 23 data (1 API and 4 excipients) | Predict release exponent N and the dissolution half-time T0.5 | 22 |
| 1998 | ANN | Sustained-release matrix tablets | 3 data (1 API and 1 excipient) | Establish in vitro–in vivo correlation (IVIVC) | 23 |
| 2000 | ANN | Osmotic pump tablets | 30 data (1 API and 2 excipients) | The drug release rate and the correlation coefficient | 24 |
| 2002 | MLR and PLS | Pure drug | 17 data (17 drugs) | Predict intrinsic solubility | 25 |
| 2003 | PLS | Pure drug | 23 data (23 drugs) | Predict solubility and permeability | 26 |
| 2006 | Neurofuzzy logic and neural networks | Immediate-release tablet | 205 data (1 API and 4 excipients) | Predict tablet tensile strength, disintegration time, friability, capping, and drug dissolution rate (%) at 15, 30, 45, and 60 min. | 27 |
| 2011 | Expert system | Osmotic pump tablets | Hundreds of PPOP data | Establish a formulation design model based on the prediction of release behavior | 28 |
| 2014 | RSM and ANN | Solid dispersions | 46 data (6 APIs and 1 excipient) | Predict yield, outlet temperature, and mean particle size | 29 |
| 2015 | RSM and ANN | Nanoparticles | 18 data (1 API and 4 excipients) | Predict particle size and loading efficiency | 30 |
API, active pharmaceutical ingredient; ANN, artificial neural network; RSM, response surface methodology; MLR, multiple linear regression; PLS, partial least squares; PPOP, push–pull osmotic pump tablets.
2.1. Statistical models
Pharmaceutical research and development often revolve around solving optimization problems. The need for efficiency in pharmaceutical development drove the transition from empirical practices to systematic approaches in drug delivery. Statistical models provided researchers with tools to identify relationships between formulation variables and outcomes.
A pioneering 1973 study20 exemplified the early adoption of statistical models in drug delivery, employing factorial design and regression analysis to optimize tablet formulations. This study used a dataset of 27 samples with five input variables: diluent ratio, compression force, disintegrant level, binder level, and lubricant level. Through regression analysis, second-order polynomial predictive equations were derived and optimized using feasibility and grid search methods. The predictions showed excellent agreement with experimental results in disintegration time, tablet hardness, dissolution rate in 30 min, and thickness, illustrating the effectiveness of integrating statistical modeling with computational optimization tools. Using graphical techniques such as response curves and contour plots further enhanced the understanding of the formulation system. Regression analysis was also employed to optimize the formulation variables of griseofulvin/hydroxypropyl cellulose solid dispersions and flufenamic acid/polyvinylpolypyrrolidone/methyl cellulose solid dispersions for high dissolution rates and stability31,32.
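The fit-then-search workflow described above can be illustrated with a minimal sketch. It is not the 1973 study's actual model or data: it assumes a hypothetical three-factor, three-level factorial design in coded units (the study used five variables), a synthetic response, and helper names (`quadratic_features`, `best_setting`) introduced only for illustration.

```python
import itertools
import numpy as np

def quadratic_features(X):
    """Design matrix for a full second-order model: 1, x_i, and x_i*x_j (i <= j)."""
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(k)]
    cols += [X[:, i] * X[:, j] for i in range(k) for j in range(i, k)]
    return np.column_stack(cols)

# Hypothetical 3-factor, 3-level full factorial design in coded units (27 runs).
levels = [-1.0, 0.0, 1.0]
X = np.array(list(itertools.product(levels, repeat=3)))

# Synthetic response standing in for, e.g., disintegration time (not real data).
rng = np.random.default_rng(0)
y = (10 + 2 * X[:, 0] - 1.5 * X[:, 1] + 1.0 * X[:, 2]
     + 0.5 * X[:, 0] ** 2 + 0.8 * X[:, 0] * X[:, 1]
     + rng.normal(0, 0.1, size=len(X)))

# Least-squares fit of the second-order polynomial (10 coefficients).
coef, *_ = np.linalg.lstsq(quadratic_features(X), y, rcond=None)

# Grid search over the coded design space for the lowest predicted response.
axis = np.linspace(-1, 1, 21)
grid = np.array(list(itertools.product(axis, repeat=3)))
pred = quadratic_features(grid) @ coef
best_setting = grid[np.argmin(pred)]
```

With real data, the grid would typically be restricted to settings that also satisfy constraints on the other responses (the "feasibility" step), which this sketch omits.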
Partial least squares (PLS) regression can effectively handle multicollinearity in datasets while simultaneously modeling multiple response variables and managing situations where predictors outnumber observations. Bergström et al.25 first introduced experimental and computational screening models for predicting aqueous drug solubility. They generated high-quality experimental data by developing a miniaturized shake-flask method to measure the intrinsic solubility of 17 compounds. These data were then analyzed using PLS to establish correlations between solubility and molecular descriptors such as lipophilicity (ClogP) and molecular surface area, resulting in a predictive model with a coefficient of determination (R^2) of 0.91.
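To make the idea concrete, the following is a minimal sketch of single-response PLS (PLS1, NIPALS algorithm) applied to correlated descriptors. The data are synthetic and merely mimic the scale of a 17-compound study; the function name `pls1_fit`, the six descriptors, and the two latent factors are assumptions for illustration, not the cited authors' setup.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 (single response) via NIPALS; returns coefficients and intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                     # weight: direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                        # scores for this component
        p = Xk.T @ t / (t @ t)            # X loadings
        q.append((yk @ t) / (t @ t))      # y loading
        Xk = Xk - np.outer(t, p)          # deflate X before the next component
        W.append(w)
        P.append(p)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, y_mean - x_mean @ coef

# Synthetic data: 17 "compounds", 6 collinear descriptors driven by 2 latent factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(17, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(17, 6))
y = latent @ np.array([1.0, -0.5]) + 0.1 * rng.normal(size=17)

coef, intercept = pls1_fit(X, y, n_components=2)
pred = X @ coef + intercept
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
```

Because the six descriptors are nearly collinear, ordinary least squares on them would be poorly conditioned, whereas two PLS components summarize the shared variation. The `r2` here is a training-set value; a realistic assessment would use cross-validation or an external test set.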
PLS has also been used in predicting oral drug absorption classification based on molecular surface properties26. Based on a structurally diverse dataset of 23 drug molecules, the researchers combined experimental data such as pKa, logPoct, and Caco-2 monolayer permeability with computationally derived molecular surface area descriptors to simultaneously predict solubility and permeability, enabling theoretical classification of drug absorption profiles. The resulting model demonstrated that these surface-based descriptors could predict solubility and permeability with high accuracy, achieving 87% prediction accuracy for the solubility-permeability profile of the 23 compounds. Furthermore, the model achieved a prediction accuracy of 77% on an external test set comprising FDA-recommended standard compounds. These studies highlighted the value of integrating experimental and computational approaches to improve early-stage solubility and permeability predictions.
2.2. Expert systems
In the 1980s, expert systems were applied in drug delivery. Expert systems were designed to mimic human decision-making by integrating domain-specific knowledge into rule-based frameworks. These systems relied on predefined rules derived from expert insights to generate predictions or recommendations.
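A rule-based framework of this kind can be sketched in a few lines. The rules, thresholds, and the `ApiProfile` fields below are hypothetical placeholders chosen for illustration; they are not taken from any of the systems discussed in this section and are not validated formulation guidance.

```python
from dataclasses import dataclass

@dataclass
class ApiProfile:
    """Preformulation properties of a hypothetical API."""
    solubility_mg_ml: float
    hygroscopic: bool
    dose_mg: float

# Each rule pairs a condition with a recommendation. Thresholds and advice are
# illustrative placeholders, not validated formulation guidance.
RULES = [
    (lambda api: api.hygroscopic,
     "Prefer dry processing (e.g., direct compression) over wet granulation."),
    (lambda api: api.solubility_mg_ml < 0.1,
     "Consider a solubilizing or wetting excipient."),
    (lambda api: api.dose_mg > 500,
     "High drug load: keep excipient mass low to limit tablet size."),
]

def recommend(api):
    """Fire every rule whose condition holds and collect the recommendations."""
    return [advice for condition, advice in RULES if condition(api)]

advice = recommend(ApiProfile(solubility_mg_ml=0.02, hygroscopic=True, dose_mg=50))
```

The knowledge lives entirely in the hand-written `RULES` list, which is why such systems are transparent but only as good as the expert knowledge encoded in them.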
One of the earliest documented applications of expert systems in pharmaceutical formulation was introduced in 1989, when Zeneca Pharmaceuticals UK and Logica UK developed the Product Formulation Expert System (PFES)33. Similar systems followed in the 1990s. For instance, the Cadila System facilitated tablet formulation by leveraging knowledge of API properties, such as solubility, hygroscopicity, and dissolution rate34. Similarly, the Capsule System and Sanofi System35 were designed to optimize hard gelatin capsule formulations based on specific preformulation data. Zeneca Pharmaceuticals further extended the application of PFES to create expert systems for tablets, parenteral formulations, and film coating, demonstrating its versatility across various dosage forms36, 37, 38. Around the same period, the Boots Company introduced an expert system to aid in the formulation of creams and lotions, expanding the scope of AI in pharmaceutical development.
As technology evolved, expert systems expanded to include immediate-release and controlled-release formulations. For example, push-pull osmotic pump tablets (PPOPTs) benefited from AI-assisted expert systems that integrate predictive models with knowledge-based rules, enabling rapid prototyping and efficient exploration of formulation options28. SeDeM Expert System, known as “Sediment delivery model”, was an innovative tool developed in 2005. Designed for direct compression tablets, it incorporated nearly all the critical physical parameters required to evaluate the compressibility of powdered substances39. The SeDeM expert system has since been widely applied to the preformulation study of oral tablets such as cefuroxime axetil and paracetamol40. A notable extension of SeDeM is the SeDeM-ODT variant, explicitly tailored for orally disintegrating tablets (ODTs). This system assesses excipient and API mixtures for compressibility and orodispersibility, introducing indices like the index of good compressibility and orodispersibility (IGCB). Such tools facilitated the optimization of APIs, including ibuprofen41, 42, 43. Importantly, SeDeM-ODT enabled simultaneous optimization of direct compression and disintegration properties, further enhancing formulation precision.
Another key development in this era was the Ontology-Based Expert System for Immediate-Release Tablets (OXPIRT)44. The OXPIRT system supported pharmacists by offering ingredient lists, manufacturing processes, lab-scale production steps, and equivalence validation with original drugs. It combined domain knowledge from guidebooks and patents, structured in ontology format, with production rules for calculations and process recommendations, effectively bridging traditional expertise with computational intelligence.
2.3. Artificial neural networks
Inspired by biological neural networks, artificial neural networks (ANNs) consist of interconnected layers of nodes that process information through weighted connections. This structure allows ANNs to model intricate input-output dynamics and capture complex, nonlinear relationships. ANNs can learn patterns from data, potentially offering advantages in handling certain types of variability in the data. In pharmaceutical development, ANNs have provided additional tools for modeling complex systems, complementing traditional approaches in areas such as formulation optimization and process control.
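The layered, weighted structure described above can be shown with a minimal one-hidden-layer network trained by gradient descent. This is a sketch on synthetic data: the two inputs stand in for hypothetical formulation variables, and the layer size, learning rate, and function names are assumptions made for illustration.

```python
import numpy as np

def init_params(n_in, n_hidden, rng):
    """Random weights for one hidden layer and a single output node."""
    return {"W1": rng.normal(0, 0.5, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
            "W2": rng.normal(0, 0.5, (n_hidden, 1)), "b2": np.zeros(1)}

def forward(params, X):
    """Weighted sums passed through a tanh hidden layer, then a linear output."""
    hidden = np.tanh(X @ params["W1"] + params["b1"])
    return hidden, hidden @ params["W2"] + params["b2"]

def mse(params, X, y):
    return float(np.mean((forward(params, X)[1] - y) ** 2))

def train(params, X, y, lr=0.05, steps=2000):
    """Full-batch gradient descent on the mean squared error (backpropagation)."""
    for _ in range(steps):
        hidden, pred = forward(params, X)
        d_pred = 2 * (pred - y) / len(X)
        d_hidden = (d_pred @ params["W2"].T) * (1 - hidden ** 2)
        grads = {"W2": hidden.T @ d_pred, "b2": d_pred.sum(axis=0),
                 "W1": X.T @ d_hidden, "b1": d_hidden.sum(axis=0)}
        for name in params:
            params[name] -= lr * grads[name]
    return params

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(64, 2))       # two hypothetical formulation variables
y = np.sin(2 * X[:, :1]) * X[:, 1:]        # synthetic nonlinear response, shape (64, 1)

params = init_params(2, 8, rng)
initial_loss = mse(params, X, y)
params = train(params, X, y)
final_loss = mse(params, X, y)
```

The response here depends on an interaction between the two inputs, which a purely linear model cannot represent; the hidden layer's nonlinearity is what lets the network reduce the error.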
A significant early application of ANNs in pharmaceutical development was reported in 1991, modeling and optimizing controlled-release hydrophilic matrix capsules containing mixtures of anionic and non-ionic cellulose ether polymers22. This study provided a comparative analysis of ANN and response surface methodology (RSM) to understand the relationships between formulation variables and drug release parameters. In their case study, ANN showed higher predictive accuracy for dissolution half-time in the validation datasets.
ANNs were also applied to establish in vitro‒in vivo correlation (IVIVC). In 1999, a study23 used dissolution profiles from two extended-release formulations to predict their corresponding in vivo PK behavior. Twenty-nine different ANN architectures were evaluated, including feedforward neural networks, recurrent neural networks, and generalized regression neural networks (GRNNs). The feedforward neural networks and GRNNs demonstrated the highest predictive accuracy in modeling the relationship between dissolution and in vivo PK profiles. This study underscored the potential of ANNs in capturing the intricate dynamics of drug release and absorption.
Neurofuzzy logic represents a hybrid computational approach combining the pattern-recognition capabilities of neural networks with the interpretability of fuzzy logic. A comparative analysis of neurofuzzy logic and neural networks was conducted to model 205 experimental data from immediate-release tablet formulations27. Both methods effectively predicted tablet tensile strength and drug dissolution profiles. While neural networks exhibited a slight advantage in predicting unseen data, neurofuzzy logic provided an additional benefit by generating interpretable “if-then” rules, offering deeper insights into formulation performance. In a subsequent study45, neurofuzzy logic was compared with decision trees for knowledge extraction from the same dataset. Both techniques successfully generated useful insights, in the form of “if-then” rules or decision trees, respectively. In a comparative modeling study46 on the development of direct compression formulations, neurofuzzy logic was evaluated against multiple linear regression using data from factorial design experiments. Neurofuzzy logic achieved lower normalized error rates and superior prediction accuracy for five output variables. Additionally, the derived fuzzy rules quantified the nonlinear relationships between formulation variables. These findings were consistent with statistical results while also revealing novel insights.
2.4. Limitations of early AI applications in drug delivery
Despite introducing innovative perspectives to the drug delivery field, early AI techniques failed to gain widespread attention at that stage due to several significant limitations, as summarized in Table 3.
Table 3.
Comparison of early and current AI applications in drug delivery.
| Aspect | Early AI | Current AI |
|---|---|---|
| Data volume | Smaller datasets (typically <100 data samples) | Larger datasets (typically ≥500 data samples) |
| Formulation scope | Limited to a few drugs and excipients | ≥ 10 drugs and all important excipients |
| Data representation | Simple representation of drugs and excipients (e.g., basic molecular descriptors) | Advanced molecular representations, including molecular descriptors, molecular fingerprints, 3D conformations, molecular graphs, and text-based embeddings |
| Algorithms | Basic statistical methods, expert systems, and simple neural networks | Advanced AI algorithms, including classic machine learning (e.g., LightGBM), deep neural networks, and advanced architectures such as transformers and generative models |
| Generalization | Poor generalization; focused on formulation optimization | Better generalization; focused on formulation prediction |
| Interpretability | Limited interpretability for neural networks | Advanced algorithms and tools for model interpretability |
| Computational resources | Restricted by limited computational power and infrastructure | Supported by cloud computing and high-performance GPUs |
3D, three-dimensional; AI, artificial intelligence; GPUs, graphics processing units.
One of the most critical barriers was data scarcity. In the early stage, the absence of publicly available formulation databases necessitated reliance on internal laboratory data. Many predictive models developed during this time were based on limited experimental data (typically fewer than 100 formulations, each involving only a few drugs and excipients). The representation of drugs and excipients in these models was also insufficient. Early studies often relied on basic molecular descriptors and simple physicochemical parameters, failing to capture the intricate interactions between APIs, excipients, and environmental factors such as pH and temperature. These limitations made it difficult for models to accurately predict phenomena like dissolution profiles, stability, and pharmacokinetic behavior across diverse scenarios. Moreover, the early AI models were inherently constrained by their design focus on specific datasets and narrowly defined problems, and rarely progressed beyond proof-of-concept studies in real-world pharmaceutical development.
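The two quantitative thresholds that separate early from current datasets in Table 3 (at least 500 entries and at least 10 drugs) are easy to check mechanically. The sketch below assumes a hypothetical record layout with a `drug` key and an illustrative function name; the remaining criteria (molecular representation, process parameters, algorithm choice, interpretability) are qualitative and are not covered.

```python
def check_dataset_scale(records, min_entries=500, min_drugs=10):
    """Report whether a formulation dataset meets the size and drug-coverage thresholds."""
    drugs = {record["drug"] for record in records}
    return {
        "n_entries": len(records),
        "n_drugs": len(drugs),
        "enough_entries": len(records) >= min_entries,
        "enough_drugs": len(drugs) >= min_drugs,
    }

# Hypothetical datasets: a small early-style study versus a larger modern one.
early = [{"drug": "drug_A", "excipient": f"excipient_{i % 4}"} for i in range(27)]
modern = [{"drug": f"drug_{i % 12}", "excipient": f"excipient_{i % 30}"}
          for i in range(600)]

early_report = check_dataset_scale(early)
modern_report = check_dataset_scale(modern)
```

A dataset like `early`, with 27 entries for a single drug, fails both checks, which mirrors the data-scarcity limitation described above.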
3. Current stage: era of AI-powered drug delivery since 2018
Since 2018, AI has been ushering in a new era in drug delivery. Powered by the exponential growth in pharmaceutical data availability, advanced AI algorithms, and unprecedented computational capacities, AI drives a critical research paradigm shift from traditional empirical methods to data-driven approaches. This transformation is first characterized by continuously expanding data sources and structures, where diverse datasets spanning tabular, image, and text formats are integrated to enhance drug delivery strategies. The emergence of advanced learning algorithms has empowered researchers to harness this wealth of data in unprecedented ways, offering innovative solutions to persistent challenges in pharmaceutical development. This section systematically examines the current state of AI-driven drug delivery, highlighting how the convergence of data growth and technological maturity is accelerating pharmaceutical research and development.
3.1. Data expansion and advances in data processing
With the advancements of AI-driven drug delivery, the diversity in data sources and structures continues to expand. In early applications, researchers tended to rely on curated in-house experimental data. Such datasets were often expensive to generate and limited to exploring narrow chemical or formulation spaces. With the development of data mining and big data analytics tools, compiling publicly available data, including literature, patents, and books, has become a common approach to expand datasets. Database research continues to increase as the importance of data surges in the era of AI. Classic databases in AI-driven drug development, such as DrugBank47, PubChem48, ChEMBL49, and OCHEM50, primarily focus on drug substances. In recent years, databases specific to drug delivery have also emerged, such as cyclodextrin-drug inclusion complex databases51, 52, 53, nanoparticle-related databases54,55, drug-excipient interaction database56, self-emulsifying drug delivery system dataset57, and cross-linked polyester implants dataset58. A recent notable work is a drug product database developed by Murray et al.59 Recognizing that existing large-scale databases primarily focus on drug substances rather than pharmaceutical products, the authors employed a semi-automated approach to extract information on small-molecule drugs from the European Public Assessment Reports (EPARs) and constructed a machine-readable database. This database includes details such as administration route, dosage form, formulation information, and maximum clinical dose for each drug product. Furthermore, leveraging the constructed database, the authors developed AI models to evaluate drug-likeness, select excipients, and predict oral absorption fractions, thereby testing the utility of the dataset in providing valuable insights.
To reduce data costs and improve data homogeneity, novel experimental techniques are utilized to provide low-cost and high-quality data for AI model development18,60,61. Compared to traditional drug formulation experiments, emerging experimental techniques—especially micro-scale high-throughput experiments—require minimal material, feature high automation, and can generate large volumes of highly homogeneous data62. These techniques also excel in data-intensive acquisition and flexible experimental design, making them well-suited for integration with deep learning (DL) optimization and design frameworks. Common methods include microfluidics, continuous flow systems, multiwell-plate-based parallel reactor systems, and additive manufacturing63,64. Among these, microfluidics is widely applied in nanomedicine research. Microfluidic technology enables precise manipulation of fluids at the micrometer scale through microchannels or chambers to control, mix, react, and separate liquids65. These systems, typically made of silicon, glass, or polymers, are cost-effective, require low sample volumes, and offer high efficiency. Compared to conventional experimental methods, data generated by microfluidics are more consistent and reproducible, providing higher-quality training data for AI algorithms. Its high-throughput nature significantly reduces data acquisition costs, facilitating the construction of larger datasets. For example, Eugster et al.18 used microfluidics to generate a dataset comprising over 1300 liposome formulations and developed an XGBoost model to predict liposome formation and size under varying process parameters. The rapid experimental capabilities of microfluidic platforms also shorten the validation cycle for AI predictions. Drug delivery strategies predicted by AI can be immediately tested in vitro using microfluidics, enabling quick adjustments to algorithms and model parameters. 
Therefore, such a closed-loop feedback mechanism can greatly enhance the efficiency of AI-driven drug delivery. For instance, Ortiz-Perez et al.66 combined microfluidic formulation techniques, high-content imaging, and active learning strategies to design an integrated workflow. Using this modular platform, they developed poly(lactic-co-glycolic acid)-polyethylene glycol nanoparticles with high uptake rates in human breast cancer cells. Further insights into machine learning (ML) applications in microfluidics can be found in the review by Dedeloudi et al.67 Beyond micro-scale high-throughput experimental technologies, other emerging techniques, such as organ-on-a-chip systems68, can also play critical roles in exploring data for understanding drug delivery. These technologies share a common trait of overcoming the limitations of traditional experiments by generating high-throughput, high-precision, and highly sensitive experimental data with unique perspectives, thus facilitating AI-driven drug delivery.
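The closed-loop idea (predict, test, update, repeat) can be sketched as a simple model-guided selection loop. This is a deliberately simplified stand-in, not the workflow of the cited studies: `run_experiment` simulates a measurement, the surrogate is a quadratic least-squares model, and candidates are chosen greedily by predicted score. Practical active learning usually adds an uncertainty or exploration term to the selection rule.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_experiment(x):
    """Stand-in for a rapid experiment (hypothetical): noisy score peaking at (0.3, 0.7)."""
    return float(-(x[0] - 0.3) ** 2 - (x[1] - 0.7) ** 2 + rng.normal(0, 0.01))

def features(X):
    """Quadratic surrogate features for two coded formulation variables."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1 ** 2, x2 ** 2, x1 * x2])

pool = rng.uniform(0, 1, size=(200, 2))    # candidate formulations
tested = [int(i) for i in rng.choice(len(pool), size=8, replace=False)]
scores = [run_experiment(pool[i]) for i in tested]
initial_best = max(scores)

for _ in range(5):                         # five predict-test-update rounds
    coef, *_ = np.linalg.lstsq(features(pool[tested]), np.array(scores), rcond=None)
    pred = features(pool) @ coef
    pred[tested] = -np.inf                 # never repeat an experiment
    batch = np.argsort(pred)[-4:]          # four most promising untested candidates
    for i in batch:
        tested.append(int(i))
        scores.append(run_experiment(pool[i]))

final_best = max(scores)
```

Each round refits the surrogate on all results so far, so every new batch of measurements directly changes which candidates are proposed next.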
Since drug delivery involves complex multiscale processes, advanced imaging technologies also provide multiscale data support. Imaging technologies refer to techniques that utilize principles of physics, chemistry, and biology to acquire structural, functional, and dynamic information within biological systems69. Commonly used imaging technologies include high-content imaging, spectroscopic analysis, and non-invasive biological quantification techniques. For example, high-content imaging is a technology that integrates automated fluorescence microscopy, image analysis, and data processing70. It can provide rich quantitative information at the single-cell or tissue level, precisely evaluating drug delivery systems’ targeting capability, release behavior, and biological effects. In the study utilizing active learning for high-throughput nanoparticle design66, high-content imaging was employed to automatically process wide-field fluorescence images to quantify nanoparticle uptake. Based on scattering caused by molecular vibrations, rotations, and other low-frequency modes, Raman spectroscopy provides insights into molecular structure and composition. It plays a critical role in the characterization and real-time monitoring of drug release processes71. Abdalla et al.72 were the first to use Raman spectroscopy to characterize polysaccharides for building a machine learning model to predict drug release from polysaccharide matrices in the colonic environment. They found that the Raman peaks of glycosidic bonds were key features for predicting drug release. Other non-invasive imaging techniques include magnetic resonance imaging (MRI), computed tomography (CT), and positron emission tomography (PET). These methods allow dynamic, quantitative imaging of biological systems without disrupting tissue structures73. 
Such technologies are particularly suitable for long-term in vivo observation of drug delivery processes, enabling real-time monitoring of drug behavior within organisms, reducing the use of experimental animals, and lowering data acquisition costs. In short, advanced imaging technologies offer multidimensional information about the in vivo behavior of drugs and carriers and quantitatively assess drug delivery efficacy, which is critical for training and optimizing AI models.
As unstructured and multimodal data in multi-scale drug delivery have emerged and accumulated, efficient data processing methods have been developed to support data integration and analysis. Drug delivery involves multilevel, multiscale biological processes, ranging from drug-carrier interactions to the interplay between drug delivery systems and biological systems. Such complexity results in highly nonlinear, multidimensional characteristics in drug delivery data, which include, but are not limited to, tabular data74 (describing formulation, process, and experimental condition information), molecular graphs75 (describing molecular structures), text formats76 (describing drug structures or sequences), images77 (depicting the appearance of drug products or in vivo drug distribution), and time-series data78 (such as drug release kinetics and pharmacokinetic curves). Traditional feature extraction and machine learning methods often struggle to comprehensively analyze unstructured and multimodal data, limiting drug formulation design and optimization. Through the flexible combination of neural networks, deep learning can automatically learn high-dimensional feature representations from large-scale complex data, making it particularly suitable for deciphering complex drug delivery processes. For example, natural language processing (NLP)79 and computer vision (CV)80 technologies are used to process text and image data, respectively. In 2019, our group74 pioneered the application of deep learning in pharmaceutical formulation predictions, achieving over 80% accuracy in predicting the disintegration time of orally fast-disintegrating films and the dissolution profiles of sustained-release matrix tablets. Furthermore, data alignment and integration techniques, cross-modal learning, and multimodal deep learning frameworks have been developed to handle multimodal data81.
For example, our group82 combined graph-based networks and generative adversarial networks based on tabular representations to predict organic crystal structure, which is a critical solid-state property for pharmaceutical development. Beyond this, deep learning methods have been effectively used to predict the structure of materials83, RNA84, and proteins85, which form the foundation of drug delivery research. Drug carrier development, involving material selection, structural design, and optimization, is one of the core tasks for drug delivery. Using convolutional neural networks (CNNs), researchers can analyze carrier morphology and structural characteristics from image data, identifying correlations between microstructures and drug release performance. For example, Hornick et al.86 introduced the On-Demand Solid Texture Synthesis (STS) architecture, which generates 3D volumetric textures based on 2D exemplar texture images. This approach was applied to design formulations with desired critical quality attributes by leveraging representations of their microstructural features. The proposed AI method was validated using oral tablets and long-acting implantable formulations as examples. Drug release mechanisms are influenced by material structure, drug properties, and physiological environments, making it difficult for traditional kinetic models to describe complex drug release processes accurately. Deep learning offers new solutions to this challenge87, 88, 89. Data-driven drug release analysis methods can not only uncover the primary mechanisms of drug release but also provide theoretical support for the design of intelligent drug release systems. The in vivo behavior of drug delivery systems includes carrier distribution in blood circulation, accumulation in target tissues, cellular uptake, and release. Due to the complexity of biological systems, these processes are challenging to analyze comprehensively using traditional modeling approaches.
Deep learning, combined with bioimaging techniques, provides powerful tools for studying the in vivo behavior of drug delivery systems. Deep learning-based image segmentation and feature extraction techniques can automatically identify the distribution of carriers in different tissues from in vivo imaging data and quantify their localization and drug release at the cellular level. For instance, Liu et al.90 designed a 3D tumor-mimicking model with in vitro‒in vivo correlation for drug release. By analyzing spatiotemporal images of the in vitro model, a dual-attention U-Net trained GAN was employed for vessel segmentation and quantitative drug analysis to evaluate the spatiotemporal dynamics of drug release within solid tumors. In short, deep learning offers novel tools and perspectives for analyzing the complexities of drug delivery systems, driving the design and optimization of these systems to new heights.
3.2. Advanced learning strategies for limited data scenarios
Data sparsity is one of the biggest challenges in drug delivery due to the high cost of experiments, long data collection cycles, and the diversity of chemical space. These factors often lead to poor model generalization and unstable predictions. To address these limitations, various advanced learning strategies have been developed to enhance model accuracy even with limited training data.
Ensemble learning is one of the effective strategies to address these challenges. Ensemble learning combines the predictions of multiple sub-models to reduce the bias and variance of individual models91. By leveraging diversity, it mitigates the risk of overfitting and improves the overall predictive performance, robustness, and generalization capability of the model to unseen data, laying the foundation for the maturity of AI-driven drug delivery research. Several studies have reported that ensemble learning algorithms, such as Random Forest, XGBoost, and LightGBM, often exhibit superior predictive performance and stability when handling tabular data in drug delivery92. In a study on drug-excipient compatibility prediction93, the authors used a stacking-based model integration strategy, demonstrating that the stacked model outperformed individual models in predictive capability. Similarly, Deng et al.94 compared the performance of 14 machine learning algorithms in developing predictive models for the dissolution curves of microsphere formulations. They identified four models based on different assumptions that offered superior predictive performance. Furthermore, the authors employed a voting-based ensemble strategy to construct a consensus model, effectively reducing prediction errors in both the initial release phase and the plateau phase of the dissolution curves.
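The voting idea described above can be illustrated with a minimal sketch. The release values and the three sub-model outputs below are invented toy numbers, not data from the cited studies, and the unweighted average is only one of several possible consensus rules.

```python
from statistics import mean

def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))

def consensus(predictions):
    """Unweighted voting for regression: average sub-model outputs point by point."""
    return [mean(point) for point in zip(*predictions)]

# Toy values (illustrative only): cumulative release (%) at five time points.
observed = [12.0, 30.0, 55.0, 78.0, 90.0]
sub_model_predictions = [
    [15.0, 33.0, 52.0, 80.0, 93.0],  # hypothetical sub-model A
    [10.0, 27.0, 58.0, 75.0, 88.0],  # hypothetical sub-model B
    [13.0, 34.0, 51.0, 81.0, 86.0],  # hypothetical sub-model C
]

individual_errors = [mae(observed, p) for p in sub_model_predictions]
consensus_error = mae(observed, consensus(sub_model_predictions))
```

Because the absolute error is convex, the consensus error can never exceed the average error of the sub-models; how much it improves on the best single model depends on how uncorrelated their errors are.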
To overcome data scarcity challenges, transfer learning and multitask learning enhance model performance by sharing information across domains and related tasks. Transfer learning leverages knowledge learned from pre-trained models in source domains and transfers it to target domains, making it particularly suitable for tasks with limited data but similar structures or patterns95. For example, the pre-trained molecular representation models can capture deep insights into molecular structures from large-scale molecular databases like PubChem or ChEMBL, which are then fine-tuned for specific tasks, such as molecular property prediction96. Multitask learning can simultaneously optimize objectives for multiple related tasks, enhancing performance on each task through shared representation learning97. Related tasks such as stability, drug loading efficiency, and targeted release performance can be jointly modeled in drug delivery. By sharing information across tasks, multitask learning enables robust predictions even under data-sparse conditions. Demonstrating the synergistic potential of these approaches, our group98 developed a unified framework in 2018 that integrated both transfer learning and multitask learning for predicting key pharmacokinetic parameters.
Active learning significantly accelerates model development by strategically selecting the most informative samples for labeling, achieving high model performance with minimal experimental data requirements99. In drug delivery, active learning can start with just a few experimental data points and guide experimental design by prioritizing the selection of the most representative or uncertain drug molecules for experimental validation. This approach incrementally expands high-value datasets, reduces data wastage, and improves model accuracy and reliability. Rakhimbekova et al.100 compared various active learning protocols in designing peptide-binding polymers and studied factors such as the initial training set size and task complexity on active learning performance. Using the best-performing active learning method, the authors efficiently designed novel peptide-binding polymers and validated them experimentally. This comprehensive and detailed work is a benchmark for using active learning to accelerate drug delivery system development.
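A minimal sketch of such a loop is given below, using query-by-committee as the uncertainty measure. It is not the protocol of the cited study: `run_experiment` is a hypothetical stand-in for a laboratory measurement, the single formulation variable and the cubic surrogate model are simplifying assumptions, and the committee is built from bootstrap-resampled least-squares fits.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_experiment(x):
    """Hypothetical stand-in for a wet-lab measurement (assumption, not real data)."""
    return np.sin(3.0 * x) + 0.05 * rng.normal()

def features(x):
    """Cubic polynomial features for a 1-D formulation variable."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(x), x, x**2, x**3])

def committee_std(x_lab, y_lab, x_pool, n_models=20):
    """Fit bootstrap-resampled least-squares models; return the spread of their pool predictions."""
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(x_lab), size=len(x_lab))
        coef, *_ = np.linalg.lstsq(features(x_lab[idx]), y_lab[idx], rcond=None)
        preds.append(features(x_pool) @ coef)
    return np.std(preds, axis=0)

pool = np.linspace(0.0, 2.0, 50)   # candidate formulations not yet tested
labeled_idx = [0, 25, 49]          # small initial training set
y = {i: run_experiment(pool[i]) for i in labeled_idx}

for _ in range(5):                 # five rounds of query, experiment, retrain
    x_lab = pool[labeled_idx]
    y_lab = np.array([y[i] for i in labeled_idx])
    spread = committee_std(x_lab, y_lab, pool)
    spread[labeled_idx] = -np.inf  # never re-query a tested candidate
    query = int(np.argmax(spread)) # most uncertain candidate
    y[query] = run_experiment(pool[query])
    labeled_idx.append(query)
```

Each round adds exactly one new experiment, chosen where the committee members disagree most; in practice the surrogate model, batch size, and stopping rule would be adapted to the task.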
Additionally, in drug delivery, challenges may arise from missing data labels, such as in positive-unlabeled learning scenarios, where negative samples are unavailable. In our recent work101, a semi-supervised learning framework was designed to address the issue of missing negative samples in formulation strategy decision-making tasks, and based on this framework, we developed the first AI system named FormulationDT for drug formulation strategy design, as illustrated in Fig. 3. FormulationDT covers multiple decision-making steps with a total of 12 machine learning classification models, achieving an average area under the receiver operating characteristic curve (ROC_AUC) score above 0.90. In short, through efficient adaptation, knowledge transfer, chemical space navigation, and task collaboration, advanced AI learning strategies not only improve model performance but also lay the foundation for a paradigm shift towards computer-driven drug development.
Figure 3.
The AI formulation strategy decision route and the application scenarios of FormulationDT. Adapted from Ref. 101 with permission from Elsevier; copyright © 2025 Elsevier.
3.3. Deep learning driving the paradigm shift of AI-powered drug delivery
Drug delivery research essentially involves multi-objective optimization (e.g., delivery efficiency, side effects, stability, cost-effectiveness) in a high-dimensional space composed of material properties and process parameters to achieve the ideal response from biological systems102. In early studies, statistics-based Design of Experiments (DoE) methods were commonly used by formulation scientists as a tool for formulation optimization. However, such methods are limited to explaining linear or low-dimensional nonlinear relationships between variables and responses within a narrow design space, making them ineffective in navigating high-dimensional nonlinear spaces. The powerful feature extraction capability and flexible neural network architectures of deep learning enable it to adapt to multimodal data and diverse tasks, providing strong driving forces for solving complex problems in the drug delivery field.
Deep learning-based methods can efficiently navigate the complex design space of pharmaceutical formulations and processes to identify optimal solutions. A typical example is the work by Li et al.30, who optimized Verapamil hydrochloride polymer‒lipid hybrid nanoparticles (PLNs). In this study, neural networks demonstrated superior data fitting performance compared to response surface methodology and were further combined with a continuous genetic algorithm to optimize the drug loading efficiency and mean particle size of the PLNs. Sano et al.103 reviewed the application of Bayesian optimization in drug development, demonstrating its effectiveness in reducing experimental trials and improving optimization efficiency compared to traditional DoE approaches. Reinforcement learning (RL) is a machine learning paradigm specifically designed for optimization tasks104, aiming to learn a policy through interactions between an agent and the environment to maximize cumulative rewards. Currently, RL is more commonly used for optimizing drug administration methods and dosage regimens105,106. Based on the principles of reinforcement learning, it can also be applied to optimizing drug delivery systems107 and dynamically adjusting manufacturing processes108, provided that a suitable environment is designed to enable efficient and low-cost interactions. For instance, researchers have developed algorithms using reinforcement learning strategies to plan the optimal path for nanorobots delivering drugs to tumor sites107. The proposed method can dynamically optimize delivery paths when tumor locations in patients change, demonstrating high decision-making efficiency and low error rates.
In addition to prediction and optimization tasks, deep learning holds potential for further drug delivery design. The design tasks in drug delivery aim to deduce the ideal carriers or formulation combinations from desired properties (e.g., delivery efficiency, patient response). Compared to predictive tasks and optimization tasks confined within limited design spaces, design tasks in drug delivery have more disruptive innovation potential, encompassing scenarios such as functional excipients design and innovative formulations exploration. Functional excipients are auxiliary substances in pharmaceutical formulations that serve specific functions, playing a crucial role in optimizing drug performance or improving formulation quality. These include, but are not limited to, cyclodextrins, lipids, and polymers109. AI techniques have emerged as powerful tools in mRNA-LNP formulation development, particularly in designing ionizable lipids. For example, a recent work from our group110 performed AI-driven virtual screening on a large-scale library of nearly 20 million lipids. By using developed machine learning models to predict two key properties of mRNA-LNPs (delivery efficiency and apparent pKa), two iterations of screening were conducted. All six molecules from the second round matched or exceeded the benchmark DLin-MC3-DMA's performance, with one achieving in vivo delivery efficiency comparable to the benchmark SM-102 lipid used in the marketed mRNA-LNP vaccines. This demonstrated the potential of AI technology for efficient screening of ionizable lipids from large-scale virtual libraries. The convergence of AI techniques with combinatorial chemistry is another design strategy for discovering novel ionizable lipids, aiming to enhance the delivery efficiency, safety, and organ specificity of LNPs19,111, 112, 113. Beyond screening virtual libraries, deep generative models offer a promising solution for inverse design. 
Common generative models include Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and diffusion models. GANs operate through a competitive process between a generator, which generates data to mimic real samples, and a discriminator, which distinguishes real data from generated ones. High-quality data can be produced by the generator through such an adversarial training process114. VAEs map data into a latent space and reconstruct it by optimizing the distribution of latent variables using variational inference, enabling the generation of diverse and continuous samples115. Diffusion models simulate the gradual noise addition and removal process, reverse-engineering realistic distributions of real samples from random noise116. Deep generative models have been widely applied in drug molecule design117, 118, 119, which also indicates their potential for functional excipient design120,121. For instance, Yue et al.122 conducted a benchmark study exploring the use of six common deep generative models (Variational Autoencoder, Adversarial Autoencoder, Objective-Reinforced GANs, Character-level Recurrent Neural Network, REINVENT, and GraphINVENT) for de novo polymer design, aiming to design polymers with high glass transition temperatures. Similarly, Liu et al.123 trained a molecular graph generative model based on invertible regularized flows on a dataset of 250k polymers to design polymers with a high glass-transition temperature (Tg) and a wide bandgap. Beyond functional excipient design, generative models can also create drug formulations with desired performance. The aforementioned work86, combining the Continuous-Conditional GAN method and the On-Demand Solid Texture Synthesis (STS) architecture to design implants with controlled particle size and drug loading, is a typical example. Elbadawi et al.124 trained conditional GANs (cGANs) on a dataset of over 1437 3D-printed formulations. 
They explored 27 different cGAN architectures to generate 270 formulations and selected a model with a balanced capability to generate novel yet realistic formulations, successfully printing one of the generated formulations. Additionally, reinforcement learning can be integrated with deep generative models to optimize molecule generation through reward mechanisms125. Overall, deep learning-based design demonstrates the potential to drive a new paradigm of target-oriented, design-driven research in drug delivery.
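For reference, the three families of generative models discussed in this section are commonly written with the following textbook objectives. These are the standard forms, not necessarily the exact variants used in the cited studies; the symbols (generator $G$, discriminator $D$, encoder $q_\phi$, decoder $p_\theta$, noise schedule $\beta_t$) are introduced here for illustration.

```latex
% GAN: generator G and discriminator D play a minimax game
\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]

% VAE: maximize the evidence lower bound (ELBO)
\mathcal{L}(\theta, \phi; x) =
  \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right]
  - D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right)

% Diffusion model: one step of the forward (noise-adding) process
q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t I\right)
```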
3.4. AI platform deployment for model applicability enhancement
Developing user-friendly AI platforms can enhance the practical value of AI models while facilitating their continuous improvement and broader adoption. Although the development of AI models is crucial, their actual impact depends on whether they can be efficiently and conveniently utilized by a wider range of users, such as pharmaceutical researchers and clinicians. Deploying AI models on public platforms significantly lowers the technical barriers for drug developers to integrate AI into their workflows. Furthermore, public deployment can drive the iterative advancement of AI models. Under data privacy protection, AI models can be fine-tuned and updated using real-world data and user feedback, thereby improving their predictive performance and generalizability. Additionally, AI platforms can promote collaboration among various parties within and beyond the pharmaceutical industry. Academia, industry, and regulatory agencies can leverage shared AI platforms to exchange data, validate model performance, and assess the applicability of emerging AI algorithms. This collaborative framework not only accelerates drug development but also facilitates the regulatory integration of AI in drug delivery. Regulatory authorities can utilize these platforms to evaluate the reliability of AI in formulation design, quality control, and risk assessment, laying the foundation for the incorporation of AI into pharmaceutical regulatory frameworks.
AI platform construction encompasses multiple steps, including model optimization and packaging, application programming interface development and integration, user-friendly front-end interface design, cloud computing and scalability, data privacy and security protection, and the integration of a continuous feedback mechanism. These processes can be referenced by several drug delivery AI web-platforms from our group, which span preformulation studies, formulation strategy design, and formulation prediction. In 2021, our group126 developed the first drug delivery AI platform, PharmSD, to predict solid dispersions' stability and dissolution. Following this, preformulation AI platforms, PharmDE56 and FormulationBCS127, were developed to evaluate drug-excipient compatibility and predict Biopharmaceutics Classification System (BCS) categories, respectively. FormulationAI92 is the first integrated AI platform for drug formulation prediction, covering 16 key formulation characteristics across six formulation types: cyclodextrin inclusions, solid dispersions, phospholipid complexes, nanocrystals, self-emulsifying drug delivery systems, and liposomal formulations. By simply inputting the basic information of drugs and excipients, users can efficiently perform AI-powered excipient selection, and formulation & process parameter design. The recently launched FormulationDT101, the first data-driven and knowledge-guided AI platform for small molecule formulation strategy design, serves as a crucial decision-making tool upstream of formulation development, adding an important component to the new paradigm of computer-driven drug development128. Moving forward, AI platforms in drug delivery are expected to become increasingly standardized, and the functionalities of AI platforms will further expand. 
For example, the automated machine learning (AutoML) platforms129 developed in recent years can enable researchers without a data science background to easily access advanced learning algorithms for building AI models in drug delivery. With the rapid advancement of large language models, developing AI assistants for drug delivery is another viable approach to lowering the barriers to AI usage. Furthermore, optimizing models through data sharing can create an integrated “data-model-user” ecosystem, fostering continuous positive feedback for AI-driven drug delivery. Overall, the deployment of AI platforms not only enhances the accessibility of AI in the pharmaceutical industry but also provides strong support for scientific innovation and industry upgrades, accelerating the pace of intelligent drug development.
4. Methodology for reliable AI-driven drug delivery research
The expansion of data and advancements in AI technology have driven the widespread adoption of AI in drug delivery. However, the modeling approaches employed by different researchers vary, posing challenges to the reproducibility and reliability of the research. To support and ensure the quality, reliability, and reproducibility of implementing AI in drug delivery research, we propose a comprehensive set of guidelines that should be followed throughout the research lifecycle (Fig. 4). The detailed guidelines address four critical aspects: (1) problem definition for determining research objectives and data requirements; (2) data engineering for improving data quality; (3) model development for robust models; and (4) model sharing and deployment for promoting utility. Each component has been carefully designed to help researchers avoid common pitfalls in AI-driven drug delivery studies and improve model reliability and reproducibility. Specifically, based on the guidelines and our previous experience, the “Rule of Five” (Ro5) was proposed as the essential requirements for reliable AI applications in formulation prediction.
• Sufficient dataset volume, preferably ≥500 entries;
• Component diversity, preferably ≥10 drugs and coverage of critical excipients;
• Inclusion of all critical process parameters;
• Proper molecular representation for both drugs and excipients, e.g., molecular descriptors and fingerprints;
• Suitable algorithms and model interpretability.
Figure 4.
Guidelines for implementing AI in current drug delivery research.
Here we present two representative case studies demonstrating our proposed “Rule of Five”. The first case study examined the stability prediction of solid dispersions130. The researchers established a dataset comprising 646 formulations, including 50 drug molecules and 47 polymer excipients. The model inputs included molecular descriptors of both drugs and polymers, along with critical process parameters such as preparation methods and process temperatures. Comparative evaluation of eight machine learning algorithms revealed that the random forest model achieved the highest prediction accuracy of 82.5% on the test set. Analysis of the 20 most important features identified key patterns in the model's decision-making process. The second case study investigated machine learning applications in drug/cyclodextrin systems16. This research constructed a dataset of 3000 formulations, including 1320 guest molecules and 8 cyclodextrins. Among three different algorithms evaluated, the LightGBM model performed best, achieving an R² value of 0.86 on the test set. Feature importance analysis revealed the key molecular descriptors and physicochemical properties that predominantly influenced the model's predictions, providing valuable insights for future formulation design. Notably, this study evaluated the impact of dataset size on model performance by progressively reducing the training data volume. Reducing the dataset from 3000 to 500 samples led to substantial performance degradation, with mean absolute error (MAE) increasing from 1.38 to 2.28 kJ/mol and R² decreasing from 0.86 to 0.58. These findings empirically demonstrated the importance of adequate dataset size in AI-driven drug delivery research.
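As a rough illustration, the Ro5 criteria can be turned into a simple pre-modeling checklist. In the sketch below, only the two numeric thresholds (≥500 entries, ≥10 drugs) come from the rule itself; the class name, field names, and boolean flags are hypothetical, and the flags are self-reported judgements rather than automated checks.

```python
from dataclasses import dataclass

@dataclass
class DatasetSummary:
    n_formulations: int
    n_drugs: int
    covers_critical_excipients: bool            # judgement call by the researcher
    includes_critical_process_parameters: bool
    has_molecular_representation: bool          # e.g., descriptors or fingerprints
    has_interpretability_analysis: bool         # e.g., feature importance

def check_ro5(s: DatasetSummary) -> dict:
    """Map each Ro5 criterion to True/False for a dataset summary."""
    return {
        "dataset volume (>=500 entries)": s.n_formulations >= 500,
        "component diversity (>=10 drugs, critical excipients)":
            s.n_drugs >= 10 and s.covers_critical_excipients,
        "critical process parameters": s.includes_critical_process_parameters,
        "molecular representation": s.has_molecular_representation,
        "suitable algorithm and interpretability": s.has_interpretability_analysis,
    }

# First case study as described in the text: 646 formulations, 50 drugs.
# The boolean flags are our reading of that description (assumption).
solid_dispersion = DatasetSummary(646, 50, True, True, True, True)
report = check_ro5(solid_dispersion)

# A hypothetical small pilot dataset fails the two quantitative criteria.
pilot = check_ro5(DatasetSummary(80, 4, True, True, True, False))
```

Any criterion mapped to `False` flags a gap to address before the model is trained or its predictions are relied upon.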
4.1. Problem definition
Problem definition is a crucial first step in any data analysis project. At this stage, the study objective should be defined, such as predicting drug–excipient compatibility or drug release profiles. The next step is to determine what data needs to be collected to address the identified problem effectively through domain expertise. Careful consideration at this stage is crucial as it can prevent unnecessary rework and costly data recollection later in the project.
4.2. Data engineering
4.2.1. Data collection
When initiating data collection, public databases and repositories often serve as valuable initial data sources, potentially offering substantial datasets that can significantly reduce the burden of data collection. However, such public resources are not always available for specific drug delivery tasks. Consequently, researchers often need to extract data from patents and literature manually. Modern data mining approaches based on NLP and ML have provided advanced tools and techniques to assist in this data collection process, which helps to build semi-automated or fully automated pipelines to reduce manual labor requirements and accelerate data gathering processes131,132. It is worth noting that whether using public datasets or newly collected information, all data sources should be meticulously documented and verified to prevent recording errors during collection.
While the ideal scenario involves collecting extensive and diverse datasets, the reality is that the amount of data that can be collected is largely determined by the specific drug delivery task, ranging from tens to thousands of data points. Choosing an algorithm appropriate to the amount and type of data is therefore all the more important. For example, a range of advanced techniques developed specifically for small datasets, such as data augmentation and transfer learning, may help build robust models even with limited data resources. For a detailed discussion of addressing data sparsity through advanced learning strategies, please refer to Section 3.2.
4.2.2. Data cleaning and management
Data cleaning and preprocessing are key steps in the data engineering pipeline for further data quality improvement. The basic operations to clean data include duplicate removal and missing value handling. While removing features with missing values directly is possible, sometimes we wish to retain important features by filling in missing data instead. Either statistical methods (e.g., mean or median imputation) or ML-based imputation techniques (e.g., missForest133) can be used to deal with missing values. However, attention must be paid to the proportion of missing values, the missingness mechanism, and the missing data patterns when choosing appropriate imputation methods134. Many Python packages provide tools to automate these processes, which is especially beneficial when working with large datasets.
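A minimal sketch of median imputation with a check on the proportion of missing values is shown below. The 30% cut-off (`max_missing`) and the particle-size values are illustrative assumptions, and median imputation is only reasonable when the missingness mechanism does not depend on the missing values themselves.

```python
import math
from statistics import median

def missing_fraction(column):
    """Share of entries in a numeric column that are NaN."""
    return sum(math.isnan(v) for v in column) / len(column)

def impute_median(column, max_missing=0.3):
    """Replace NaN with the median of the observed values.

    max_missing is an illustrative cut-off (assumption); above it the
    column is rejected rather than imputed.
    """
    if missing_fraction(column) > max_missing:
        raise ValueError("too many missing values to impute reliably")
    observed = [v for v in column if not math.isnan(v)]
    fill = median(observed)
    return [fill if math.isnan(v) else v for v in column]

nan = float("nan")
particle_size_nm = [120.0, nan, 150.0, 135.0, 180.0]  # toy values
imputed = impute_median(particle_size_nm)
```

Recording which entries were imputed, alongside the method used, keeps the cleaning step traceable.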
Another important issue that must be treated with caution is the bias that arises when integrating data from multiple sources, which may obscure important patterns in the dataset. Such bias may arise from differences in experimental conditions, computational methods, or data collection protocols across different databases. Establishing a standardized data processing pipeline and documentation is required to identify and manage the potential bias. All steps throughout the cleaning and merging process should be meticulously documented and reported to ensure reproducibility and traceability.
4.2.3. Data representation
In addition to drug/excipient molecules, data representation in drug delivery also includes information such as formulation data and experimental conditions. This multifaceted nature requires careful consideration of various representation methods for effective model development while maintaining interpretability.
Drugs and excipients are primarily molecules. A common practice is to convert these molecules into their SMILES string representation and then calculate molecular descriptors or molecular fingerprints based on specialized software packages such as PaDEL135, Mordred136, and RDKit137. These molecular descriptors and fingerprints can be further filtered by feature engineering. When dealing with excipients such as polymers, the characterization process requires additional considerations beyond the molecular structure of the monomer, such as the degree of polymerization130. Information such as formulation and experimental conditions can often be represented as tabular features, which are normally set up empirically by the researchers.
Recently, with the expansion of available data types and the development of advanced AI algorithms, DL-based end-to-end representations have also been adopted. This approach directly processes various data types, including images and text, to automatically learn appropriate representations, thus reducing the need for extensive feature engineering. However, it is worth noting that DL models are “black boxes,” and their automatically learned representations often face interpretability challenges. For a detailed discussion of advances in data processing and representation, refer to Section 3.1.
The success of data representation methods often lies in finding the right balance between complexity and interpretability. While more complex representations might capture subtle patterns in the data, they may sacrifice interpretability and practical utility. Conversely, simpler representations might be more interpretable but could miss important patterns. This trade-off should be carefully considered based on the specific requirements of the drug delivery application and the intended use of the resulting models.
4.2.4. Data visualization
Data visualization is important for researchers to assess data quality before model development, as it helps identify potential issues that might not be immediately apparent in numerical or tabular formats.
Exploratory data analysis (EDA) through visualization begins with understanding the fundamental characteristics of the dataset. Basic statistical visualizations, such as histograms, box plots, and density plots, provide immediate insights into data distributions, helping researchers identify outliers, distribution shifts, and potential anomalies. More advanced visualization relies on dimensionality reduction methods such as principal component analysis (PCA)138, t-SNE139, and UMAP140, which transform high-dimensional complex datasets into low-dimensional representations. Visualizing these low-dimensional representations can enable researchers to identify data distributions and patterns that might be obscured in the original high-dimensional space.
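As a brief sketch of the dimensionality reduction step, PCA can be computed directly from a singular value decomposition. The random matrix below is a placeholder for a real descriptor table, and the function name is ours.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project samples onto the leading principal components via SVD."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                        # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T              # low-dimensional coordinates
    explained = (S**2) / np.sum(S**2)              # variance ratio per component
    return scores, explained[:n_components]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))                      # placeholder descriptor matrix
X[:, 1] = 2.0 * X[:, 0] + 0.1 * rng.normal(size=100)  # two correlated features
scores, explained = pca_project(X)
```

The two columns of `scores` can be passed to any scatter-plot routine; features on very different scales are usually standardized before the projection.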
4.3. Model development
4.3.1. Data preparation
Model development in drug delivery systems demands meticulous data preparation to ensure reliable and robust outcomes. This foundational phase encompasses several critical aspects that directly influence model performance and reliability, requiring careful consideration of various techniques and methodologies.
Data balancing is a critical challenge in drug delivery datasets, where class imbalance frequently occurs. This imbalance might manifest in various ways, such as disproportionate success rates in formulation studies. Such an imbalance can severely affect model performance, as the model may be biased towards the majority class and underperform on the minority class. To address this challenge, balanced datasets can be created by over-sampling the minority class and under-sampling the majority class141. Synthetic Minority Oversampling Technique (SMOTE)142 is a widely used method for generating synthetic samples for minority classes while preserving their underlying statistical properties and distribution patterns.
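The interpolation idea behind SMOTE can be sketched in a few lines. The example below is a simplified illustration rather than the reference SMOTE implementation: the feature values, class names, and the choice of k = 3 neighbours are assumptions made for the example.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        t = rng.random()
        synthetic.append([b + t * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature formulation data (e.g., drug loading %, polymer ratio).
failed = [[5.0, 0.2], [6.0, 0.3], [5.5, 0.25], [7.0, 0.35]]   # minority class
succeeded = [[20.0 + i, 0.6 + 0.01 * i] for i in range(12)]   # majority class

new_failed = smote_like_oversample(failed, n_new=len(succeeded) - len(failed))
balanced_minority = failed + new_failed
print(len(balanced_minority), len(succeeded))
```

Because each synthetic point lies on a segment between two real minority samples, it stays inside the region already occupied by that class. Oversampling of this kind should be applied to the training data only, so that validation and test sets contain real samples exclusively.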
Another factor to consider is data scaling. When dealing with various feature sets common in drug delivery studies, there is often a huge difference in the range of values taken by different features. Some algorithms (e.g., decision trees and LightGBM) are insensitive to the range of feature values, while others (e.g., support vector machines and neural networks) are sensitive to them, requiring appropriate scaling. When selecting these scaling methods, both the characteristics of the features and the requirements of the chosen modeling algorithm should be considered. Most importantly, the scaling parameters computed from the training set must be consistently applied to both the validation and test sets to prevent data leakage and ensure the reliability of the model evaluation.
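The leakage-free scaling rule described above can be illustrated as follows. This is a minimal sketch using standardization (zero mean, unit variance) as one possible scaling method; the feature names and values are hypothetical.

```python
import statistics


def fit_standard_scaler(rows):
    """Compute per-feature mean and standard deviation from the training rows only."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    # Fall back to 1.0 for constant features to avoid division by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    return means, stds


def apply_scaler(rows, means, stds):
    """Standardize rows with previously fitted parameters."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical features: molecular weight (g/mol) and logP.
train = [[180.2, 1.2], [250.3, 2.5], [320.4, 3.1], [410.5, 4.0]]
test = [[500.0, 5.2]]

means, stds = fit_standard_scaler(train)        # fitted on training data only
train_scaled = apply_scaler(train, means, stds)
test_scaled = apply_scaler(test, means, stds)   # reuses the training parameters
print(test_scaled)
```

The test row is transformed with the training mean and standard deviation, so no information from the test set influences the preprocessing step.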
Data augmentation is not mandatory, but it is a powerful tool to enhance model robustness with small datasets. For example, image transformations such as rotation and scaling can be implemented for image-based datasets to help improve model performance143. Additionally, SMILES enumeration can augment the representations of SMILES-based drugs/excipients by taking advantage of the fact that a single molecule can correspond to multiple SMILES representations144.
4.3.2. Model training
The ML model training process for drug delivery applications encompasses several interconnected stages from data splitting to model optimization.
Drug delivery often faces data challenges, including small data sizes, high noise levels, and high-dimensional features. Randomly splitting the dataset into training, validation, and test sets is a common approach in studies with abundant data. However, for the small datasets often encountered in drug delivery, a one-shot “Training-Validation-Test” split may severely compromise model performance, for example through overfitting, where models tend to learn noise or spurious patterns from limited data instead of capturing generalizable trends145. To address this, cross-validation146 is often a better choice. This method divides the dataset into k folds, where k‒1 folds are used for training while the remaining fold serves as the validation set. This process is repeated k times, with each fold serving as the validation set once. The final performance is reported as the mean across all iterations together with the standard deviation, providing a more robust assessment. Beyond this, specific data splitting methods can be adopted for certain scenarios. For example, stratified sampling can be incorporated into various splitting strategies to maintain consistent data distribution across all subsets. In addition, placing drug molecules and their formulations not seen during training into the test set allows evaluation of the model's generalization ability on unseen drugs. Temporal splitting takes the time factor into consideration: historical data is used for training and newer data is reserved for testing, which simulates the prospective application of the model147. For specific data and task types in the drug delivery field, tailored data splitting methods should be considered to enable more accurate model performance evaluation. For instance, in 2019, our group74 proposed a tailored automatic data splitting algorithm for drug formulation datasets to address the small and imbalanced data space. In microsphere dissolution prediction94, we employed group splitting to ensure that different time points from the same dissolution curve did not simultaneously appear in both the training and test sets. This approach was crucial, as the goal was to predict an entire dissolution curve. The key principle is to use data splitting methods aligned with the model's intended application scenarios.
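Group splitting can be combined with k-fold cross-validation, as in the sketch below. It is an illustrative example, not the splitting algorithm used in the cited studies; the curve labels, the number of time points, and k = 3 are assumptions.

```python
import random
from collections import defaultdict


def group_k_fold(groups, k=3, seed=0):
    """Yield (train_idx, val_idx) pairs so that all rows sharing a group label
    fall on the same side of every split."""
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    labels = sorted(by_group)
    random.Random(seed).shuffle(labels)
    # Deal the shuffled groups into k folds round-robin.
    folds = [labels[i::k] for i in range(k)]
    for held_out in folds:
        val = [i for g in held_out for i in by_group[g]]
        train = [i for g in labels if g not in held_out for i in by_group[g]]
        yield train, val


# Hypothetical dataset: six dissolution curves, four time points each.
groups = [f"curve_{c}" for c in range(6) for _ in range(4)]

splits = list(group_k_fold(groups, k=3))
for train_idx, val_idx in splits:
    train_groups = {groups[i] for i in train_idx}
    val_groups = {groups[i] for i in val_idx}
    # The intersection is empty: no curve is shared between the two subsets.
    print(len(train_idx), len(val_idx), train_groups & val_groups)
```

A model would be trained and scored once per split, and the k scores summarized as mean and standard deviation. Replacing the curve label with a drug identifier gives a split that tests generalization to unseen drugs.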
The next step is to select ML algorithms for model training. For beginners, it is recommended to start with widely used algorithms, such as random forest12. With more AI implementation experience, researchers can develop multiple models using different combinations of algorithms and representations, then select the model that best suits the task at hand. Hyperparameter tuning significantly impacts model performance. In traditional machine learning, this involves optimizing settings such as the number of trees in a random forest and the kernel type in support vector machines. The process becomes more complex in deep learning, encompassing choices such as network architecture, learning rate, and batch size. Commonly used hyperparameter search strategies are grid search, random search, and Bayesian optimization148. Further insights into machine learning algorithm selection can be found in the review by Vamathevan et al.149
After training individual models separately, using ensemble methods to combine the predictions of multiple models is expected to improve the prediction results further. Further insights into ensemble learning methods and applications can be found in the review by Cao et al150.
4.3.3. Model evaluation and selection
In addition to rational data splitting to improve the reliability of model validation, it is also necessary to establish requirements for model performance reporting to facilitate a thorough evaluation of the models.
• A comprehensive report of performance metrics is the basis for model evaluation. For classification tasks, common metrics include accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (ROC-AUC). Metrics such as mean square error (MSE), mean absolute error (MAE), and coefficient of determination (R²) are commonly used for regression tasks. Different metrics correspond to different aspects of model evaluation, and we recommend reporting model performance as comprehensively as possible for a thorough assessment.
• Report model performance based on cross-validation or repeated experiments. The generalization ability of the models should be assessed using methods such as “group splitting”, “scaffold splitting”, or “temporal splitting”.
• When proposing a new model, it should be compared with simpler baseline and state-of-the-art models to clarify improved model performance and the basis for model selection.
• Model selection should be guided by task requirements rather than model complexity alone. More sophisticated models don't necessarily outperform simpler alternatives, particularly when interpretability is crucial.
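Several of the metrics named in the reporting requirements above can be computed directly from predictions, as in the following sketch. The example values are hypothetical, and ROC-AUC is omitted because it requires predicted scores rather than class labels.

```python
def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, and R2 for paired true and predicted values."""
    n = len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "MSE": ss_res / n,
        "MAE": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "R2": 1.0 - ss_res / ss_tot if ss_tot else float("nan"),
    }


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "F1": f1}


# Hypothetical predictions for illustration only.
reg = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
clf = classification_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
print(reg)
print(clf)
```

In the classification example, accuracy (0.75) is higher than precision and recall (both 2/3) because the negative class dominates, which is one reason to report several metrics rather than accuracy alone.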
4.3.4. Model interpretability
Model interpretability in machine learning enhances the transparency and practical value of AI research in drug delivery by explaining how models arrive at their predictions in a human-understandable way. Interpretable AI models serve as a crucial bridge between computational predictions and pharmaceutical sciences. They can help understand complex formulation-performance relationships and accelerate the formulation optimization process.
Methods for providing model interpretability in drug delivery research can be primarily categorized into two approaches: transparent design and post hoc interpretation. Transparent design here refers to choosing models where the decision-making process is easy to understand during model construction, such as linear regression and tree-based algorithms (e.g., Decision Trees, Random Forest, and LightGBM). Post hoc interpretation methods are normally model-agnostic, which means they can be applied to various machine learning models, including complex deep neural networks. These methods include using SHAP (Shapley Additive Explanations)151 and LIME (Local Interpretable Model-agnostic Explanations)152 to analyze the significance of input features on model decisions. Notably, combining inherently interpretable algorithms with post hoc interpretation tools enables validation and complementary explanations from different perspectives, allowing for a deeper and more comprehensive exploration of formulation-performance relationships. For example, Mendes et al.153 implemented this complementary approach by combining tree-based models with SHAP analysis to investigate nanoparticle design principles in cancer treatment, offering robust and comprehensive insights into nanoparticle design-performance relationships. Further insights into model interpretability can be found in the review by Jiménez-Luna et al.154
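The Shapley values that underlie SHAP can be computed exactly for a model with only a few inputs. The sketch below is a brute-force illustration of the concept and does not use the SHAP library; the toy model, its coefficients, and the baseline formulation are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample. Features outside a coalition are
    replaced by their baseline value. Cost grows as 2^n, so this is only
    practical for a handful of features."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def toy_model(features):
    """Hypothetical particle size (nm) predicted from three formulation inputs."""
    polymer_conc, surfactant_conc, stir_speed = features
    return 100 + 20 * polymer_conc - 15 * surfactant_conc - 0.01 * stir_speed * polymer_conc


x = [2.0, 1.0, 500.0]          # formulation being explained
baseline = [1.0, 0.5, 300.0]   # reference formulation
phi = shapley_values(toy_model, x, baseline)
print(phi, sum(phi), toy_model(x) - toy_model(baseline))
```

The attributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the property that makes Shapley-based explanations additive. Practical tools approximate these values because exact enumeration is infeasible for many features.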
4.3.5. Uncertainty quantification
In drug delivery, ML models are typically trained on limited datasets, and their prediction reliability generally decreases when encountering compounds that differ significantly from the training examples. This raises a critical concern in pharmaceutical development: how much can we trust the predictions? The importance of quantifying prediction confidence rivals that of improving model accuracy itself. The challenges of small datasets, variable data quality, and potential out-of-distribution predictions make it essential to define model uncertainty for reliable real-world applications.
Sources of uncertainty in pharmaceutics typically arise from two primary types155: aleatoric uncertainty and epistemic uncertainty. Aleatoric uncertainty refers to the irreducible uncertainty caused by inherent randomness in the system, manifesting as noise and measurement errors in the data. Epistemic uncertainty, on the other hand, arises from incomplete knowledge or understanding, which in machine learning models is primarily reflected in the model's limited understanding of the data distribution.
Various uncertainty quantification methods have been applied in drug discovery, such as Bayesian-based, similarity-based, ensemble-based, and probabilistic modeling approaches156. Although not yet widely adopted, some machine learning studies in drug delivery have begun incorporating uncertainty quantification methods. For example, Deng et al.94 developed a consensus model for microsphere dissolution curve prediction and quantified uncertainty by reporting the range of predictions from the four sub-models in the ensemble. Defining the applicability domain (AD) is another approach: it delimits the range in which a model can make reliable predictions by measuring how similar new samples are to the training data. This approach has been applied in PharmSD126, a machine learning platform for predicting the stability and solubility of solid dispersions. This platform used a set of distance-based chemical structure similarity metrics to assess whether the input drug molecules fall within the model's applicability domain.
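Both ideas, an ensemble prediction range and a similarity-based applicability domain check, can be sketched briefly. The example below is illustrative only and does not reproduce the cited implementations: the four sub-models are placeholder functions, fingerprints are represented as sets of "on" bit indices, and the Tanimoto metric and the 0.5 threshold are assumptions.

```python
def ensemble_prediction(sub_models, x):
    """Return the mean prediction and the (min, max) range across sub-models."""
    preds = [m(x) for m in sub_models]
    return sum(preds) / len(preds), min(preds), max(preds)


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def in_applicability_domain(query_fp, training_fps, threshold=0.5):
    """Flag a query as in-domain if its nearest training neighbour is similar enough."""
    best = max(tanimoto(query_fp, fp) for fp in training_fps)
    return best >= threshold, best


# Hypothetical sub-models: cumulative release (%) as a function of time (days).
sub_models = [lambda t: 8.0 * t, lambda t: 9.0 * t, lambda t: 10.0 * t, lambda t: 11.0 * t]
mean, low, high = ensemble_prediction(sub_models, 3.0)
print(mean, low, high)

# Hypothetical fingerprints of training drugs and two query drugs.
training_fps = [{1, 2, 3, 4}, {2, 3, 5, 8}]
print(in_applicability_domain({1, 2, 3, 9}, training_fps))
print(in_applicability_domain({20, 21, 22}, training_fps))
```

A wide prediction range or a low nearest-neighbour similarity both signal that a prediction should be treated with caution, for example by prioritizing that formulation for experimental confirmation.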
4.4. Model sharing and deployment
For scientific research, data and code sharing help ensure that results can be accurately replicated. This sharing validates research results and enables knowledge transfer, allowing other researchers to create cumulative advances on existing work. Deploying AI models to public platforms can further transform them into practical tools, which lowers the programming barrier to using AI models. While deploying AI models often requires more programming knowledge beyond AI, tools such as Streamlit (https://streamlit.io/) offer a convenient solution for researchers to deploy their models with pure Python scripts quickly. For a detailed introduction to model deployment, please refer to Section 3.4.
5. Future stage: towards an AI-driven transformation in drug delivery after 2024
The future stage of drug delivery stands at the threshold of a revolutionary transformation, as we are moving towards an era where AI becomes an integral component of drug delivery system design and optimization. To explore this transformation, this section delineates the technological trajectory through several key areas, including leveraging large language models (LLMs) and the multidisciplinary integrations between AI techniques and other approaches. These technological convergences create new opportunities for more intelligent, adaptive, and personalized drug delivery systems that could dramatically improve therapeutic outcomes (Fig. 5). While technological advancement is crucial, the successful implementation of AI in drug delivery critically relies on talent development and cultural transformation. This transformation requires nurturing interdisciplinary expertise, fostering a data-driven mindset, and establishing collaborative ecosystems between AI experts and pharmaceutical scientists.
Figure 5.
Schematic illustration of future AI potentials in the drug delivery field.
5.1. Leverage the power of large language models
The emergence of large language models (LLMs), built upon transformer-based architectures such as BERT and GPT, represents a significant advancement in artificial intelligence. These models, trained on vast amounts of data, have shown remarkable potential across diverse fields, encompassing automated text analysis, knowledge extraction, and complex pattern recognition. ChatGPT's release in late 2022 particularly exemplified this potential. Unlike traditional ML models that require specific input formats and coding expertise, modern LLMs like ChatGPT utilize prompt engineering to accept natural language instructions, making them more accessible to researchers without extensive programming backgrounds.
Models specifically trained on biomedical and chemical data have shown superior performance in drug development tasks. Biomedical LLMs such as BioGPT157 and PubMedBERT158 excel at understanding medical literature and biological concepts. Chemical-oriented LLMs such as MolFormer159 and ChemBERTa160 specialize in molecular structure understanding and prediction. These LLM-based systems have demonstrated effectiveness in several key applications, including molecular property prediction161, understanding protein structures162, drug repurposing163, and automated screening of the literature164. In the field of drug delivery, the application of LLMs remains limited. Currently, they are primarily utilized in the molecular design of drug delivery materials165, while many research areas remain unexplored.
The applications of LLMs are expected to expand into broader areas, encompassing the entire workflow from literature review and database construction to critical formulation design and experimental result prediction. First, through automated analysis of vast literature, LLMs can help researchers rapidly extract key scientific information, such as molecular structures and properties, as well as formulation and processing parameters related to drug delivery. This efficient knowledge extraction capability further supports constructing and optimizing multidimensional databases, encompassing material characteristics, experimental conditions, and drug release behaviors, providing systematic references for research. Based on this data foundation, LLMs can generate preliminary formulation designs and experimental procedures through cross-database analysis, integrating multidimensional information including drug properties, excipient characteristics, and process parameters. This data-driven analysis not only provides formulation composition recommendations but also predicts potential experimental outcomes, offering valuable decision-making references for researchers. Meanwhile, LLMs' capabilities in code generation and experimental protocol design offer new approaches to improving efficiency. On the regulatory front, LLMs can streamline and accelerate review processes for IND and NDA documents by providing powerful integration capabilities and automated assistance166. Although these applications remain conceptual, their closed-loop capability encompasses the entire research workflow from knowledge extraction and database construction through design generation and result prediction. This comprehensive data-driven approach enables intelligent collaboration throughout the research cycle, potentially revolutionizing the efficiency of drug delivery research.
It is foreseeable that over the next few years, there will be an increased exploration of LLMs in drug delivery studies, which presents both opportunities and challenges. As the technology continues to evolve, collaboration between computational scientists, pharmaceutical researchers, and regulatory agencies will be critical in establishing standardized protocols for applying LLMs in drug delivery. Successful implementation of LLMs will require careful validation and particular vigilance against the problem of LLM hallucinations167,168—the tendency of LLMs to generate plausible but factually incorrect information, which could instead lead to additional risks in pharmaceutical research and development. These challenges underscore the importance of developing robust validation frameworks and maintaining human oversight in critical decision-making processes.
5.2. Cross-disciplinary artificial intelligence
While machine learning has brought numerous innovative opportunities to the field of drug delivery over the past few decades, its applications face challenges such as high data requirements, weak interpretability, and limited generalization capabilities. By deeply integrating machine learning with mathematically and physically based multiscale modeling, these challenges can potentially be addressed, paving the way for new scientific exploration. Over the past decade, “Computational Pharmaceutics” has emerged as a burgeoning discipline, introducing AI and multiscale modeling technologies into pharmaceutics and offering immense potential to disrupt traditional formulation development paradigms169,170. Representative methods in multiscale modeling include quantum mechanics (QM)171, molecular dynamics (MD) simulations172, mathematical modeling, physiologically based pharmacokinetic (PBPK) modeling173, and process simulation174. Our earlier review proposed the deep integration of AI with multiscale modeling to achieve computer-driven drug development6. In 2023, our group128 further introduced a computer-driven drug formulation design framework emphasizing the “understand-design-validate-optimize” cycle, as depicted in Fig. 6. This design-oriented framework implements the principles of Quality by Design (QbD), promising not only to improve formulation development efficiency significantly but also to open a promising path toward personalized medicine design. It should be recognized that the widespread application of this framework still faces numerous challenges, relying on the refinement of computational modeling technologies as well as continuous breakthroughs in cross-disciplinary artificial intelligence.
Figure 6.
The proposed computer-driven drug formulation design framework consisting of four steps: understand, design, validate, and optimize. Step 1: Combining in silico modeling with experimental approaches to deeply understand the physiological processes, disease mechanisms, biological effects, drug and formulation properties, as well as the microscopic details of drug delivery. Step 2: Developing AI-driven PBPK/PD models based on a systematic understanding of drug delivery to derive the desired formulation attributes based on the required drug exposure. Integrating machine learning with other computational modeling techniques to design or generate formulation and process parameters according to the desired critical formulation attributes. Step 3 and Step 4: Conducting in vivo efficacy and safety evaluations for the designed formulation, iteratively optimizing it until achieving the desired outcomes. Adapted from Ref. 128 with permission from Elsevier; copyright © 2023 Elsevier.
5.2.1. AI-PBPK modeling
Physiologically based pharmacokinetic (PBPK) modeling is a computational technique used to simulate the absorption, distribution, metabolism, excretion, and toxicity (ADMET) of drugs in the body175. This modeling approach integrates physiological and physicochemical data through mathematical equations to predict how drugs and other compounds behave in different tissues and organs. However, the complexity of integrating diverse physiological, biochemical, and physicochemical data presents significant challenges in model development and application. AI integration addresses these limitations by offering efficient processing of high-dimensional datasets, predictive capabilities for missing data, and enhanced model accuracy176,177.
AI-PBPK modeling has been extensively used in drug development to predict drug behavior, optimize dosing regimens, and assess drug‒drug interactions. Specifically, AI-enhanced PBPK models can facilitate virtual drug development by predicting drug behavior in various scenarios, including different populations and disease states. For example, our group178 recently developed an integrated AI-powered PBPK platform. This platform enables end-to-end prediction of human PK profiles and tissue distribution of candidate drugs by embedding AI models for eight key drug properties into the PBPK framework—without requiring any in vitro or in vivo experimental data. The platform was validated using over 600 clinical plasma PK profiles, demonstrating its ability to accurately predict systemic exposure and organ selectivity of candidate compounds. AI-PBPK models also assist in predicting the impact of physiological parameters like age and ethnicity on drug pharmacokinetics179. Obtaining these parameters would aid in the design of drug formulations. AI-PBPK models also support the development of quantitative adverse outcome pathways for assessing drug toxicity and efficacy, reducing the need for animal testing180.
Despite these advancements, the accuracy of AI-PBPK models still needs to be improved through larger datasets and more advanced algorithms. In current practice, AI techniques are used to predict key ADMET parameters from available datasets, such as plasma protein binding, cell permeability, and total plasma clearance, which are then incorporated into PBPK models181. This approach reduces the need for extensive in vitro and in vivo experiments. Moreover, AI can handle incomplete datasets by predicting missing values, thereby improving the robustness of PBPK models. For example, random forest models can predict tissue-to-plasma partition coefficients (Kp) even with sparse data182. The ability to predict PK parameters without complete experimental data thus accelerates early drug discovery and reduces the reliance on animal studies. Furthermore, neural networks, including neural ordinary differential equations, have shown better predictive capabilities for time-series PK profiles than traditional methods183. Applying such advanced algorithms, including deep learning, to AI-PBPK modeling is therefore expected to further benefit drug discovery and development.
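How predicted parameters enter a PBPK model can be illustrated with a deliberately reduced example. The sketch below assumes a perfusion-limited model with one plasma compartment and one lumped tissue compartment, integrated with a simple Euler scheme; it is not the platform described above, and the Kp and clearance values stand in for outputs of a hypothetical ML model. All volumes and flows are illustrative.

```python
def simulate_minimal_pbpk(dose_mg, kp, cl_l_per_h, q_l_per_h=60.0,
                          v_plasma_l=3.0, v_tissue_l=30.0,
                          t_end_h=24.0, dt_h=0.001):
    """Two-compartment, perfusion-limited sketch after an intravenous bolus.

    d(A_plasma)/dt = -Q * (C_p - C_t / Kp) - CL * C_p
    d(A_tissue)/dt =  Q * (C_p - C_t / Kp)
    """
    a_plasma, a_tissue, a_eliminated = float(dose_mg), 0.0, 0.0
    times = [0.0]
    c_plasma = [a_plasma / v_plasma_l]
    for step in range(1, int(round(t_end_h / dt_h)) + 1):
        c_p = a_plasma / v_plasma_l
        c_t = a_tissue / v_tissue_l
        uptake = q_l_per_h * (c_p - c_t / kp)   # net plasma-to-tissue flux (mg/h)
        elimination = cl_l_per_h * c_p          # clearance from plasma (mg/h)
        a_plasma -= (uptake + elimination) * dt_h
        a_tissue += uptake * dt_h
        a_eliminated += elimination * dt_h
        times.append(step * dt_h)
        c_plasma.append(a_plasma / v_plasma_l)
    return times, c_plasma, (a_plasma, a_tissue, a_eliminated)


# Kp and CL would come from a trained predictor; the values here are placeholders.
times, c_plasma, amounts = simulate_minimal_pbpk(dose_mg=100.0, kp=4.0, cl_l_per_h=5.0)
print(round(c_plasma[-1], 4), round(sum(amounts), 6))
```

The drug amounts in plasma, tissue, and the eliminated pool add up to the administered dose, which is a basic consistency check for any compartmental model. A full PBPK model repeats the tissue equation for each organ with organ-specific blood flows, volumes, and Kp values.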
AI-PBPK modeling is expected to play an important role in future drug regulatory assessment. Regulatory agencies like the European Medicines Agency (EMA) and FDA actively use PBPK models in various stages of drug evaluation to support decision-making184,185. These models are invaluable for addressing drug approval, labeling, and safety questions by providing mechanistic insights into drug behavior179. However, the “black box” nature of many AI-PBPK models makes it challenging to understand how predictions are generated, which can limit their acceptance in regulatory settings; future work should therefore also address AI-PBPK model interpretability. While regulatory support is growing, there is still a need for standardized guidelines and increased acceptance of AI-PBPK models in regulatory frameworks186. Moreover, regulatory agencies require rigorous validation of AI-PBPK models to ensure their reliability and accuracy for specific applications, such as predicting drug‒drug or drug‒food interactions, assessing formulation changes, and evaluating organ impairment scenarios187,188.
5.2.2. AI-QM/MD modeling
AI-QM/MD modeling refers to integrating AI with quantum mechanics (QM) and molecular dynamics (MD) simulations in the context of computational drug delivery. QM methods provide accurate descriptions of electronic states, enabling the study of chemical bonding, reactivity, and charge distribution at an atomic level, which is essential for understanding drug metabolism and interactions. Combining QM with molecular mechanics to balance accuracy and computational efficiency has broadened its application, and QM also helps in calculating descriptors and physicochemical properties that are vital for ADMET predictions171. MD simulates the motion of atoms and molecules over time to provide insights into the behavior and structure of drug delivery systems at the molecular level189, 190, 191, facilitating the optimization of drug loading, controlled release, and interactions with biological membranes192. AI techniques not only have powerful predictive capabilities but can also generate novel molecular structures with desired properties, accelerating drug discovery and targeted delivery. This hybrid approach leverages the strengths of AI to enhance the capabilities of QM and MD methods, which are crucial for understanding molecular interactions and predicting the behavior of biological systems6.
AI-QM/MD modeling substantially impacts the development of nanoscale drug delivery vehicles. For instance, cell-penetrating peptides (CPPs) hold significant therapeutic potential in drug delivery, yet the diversity of known CPPs remains relatively limited. In one study, a series of structurally diverse CPPs was generated using deep generative models, and MD simulations were employed to gain mechanistic insights and prioritize the AI-generated peptides for further analysis. The top-scoring peptides were then validated through wet-lab experiments, resulting in CPPs with better permeability and lower toxicity. This study not only demonstrates how MD simulations can support de novo peptide design but also proposes a low-cost, high-accuracy screening pipeline193. MD simulations have similarly been applied to the rational design of liposomal drug formulations, which are attractive for their superior biocompatibility, biodegradability, and ability to provide controlled release and targeted delivery194. The results showed that in both passive and active liposome loading systems, protonation of drug molecules reduces their binding to phospholipid membranes and alters vesicle morphology in multivesicular liposomes, while maintaining the orientation of hydrophobic parts inward and hydrophilic parts outward; however, in active loading systems, the presence of ions within the liposome cavity enhances drug retention and release profiles by promoting drug self-aggregation. Therefore, MD simulations can validate the optimal liposome formulation predicted by AI before experimental testing, which will significantly enhance mechanistic understanding while reducing experimental costs. Moreover, AI models can predict drug toxicity, bioactivity, and physicochemical properties, essential for designing effective drug delivery systems195, 196, 197. Therefore, combining AI with QM and MD methods will improve prediction accuracy by leveraging the strengths of each approach, accelerating the drug discovery and delivery process.
When developing AI-QM/MD modeling, several caveats must be considered to ensure accurate, reliable, and efficient outcomes. QM methods require highly sophisticated algorithms and substantial computational resources, remaining a significant hurdle to handling large datasets and complex molecular systems198,199. Current QM/MD simulations are limited by the short time scales they can cover, which restricts their application in studying long-term molecular interactions and dynamics200. Moreover, extensive validation is required to ensure the accuracy and reliability of AI-QM/MD models.
5.2.3. Self-driving AI laboratories (AI-SDLs)
Self-driving laboratories (SDLs) represent a breakthrough in scientific research, with significant applications in chemistry and drug delivery201. These laboratories integrate automation, artificial intelligence, and advanced computing to accelerate the discovery and development of new drug molecules and delivery materials202. Specifically, SDLs are equipped with automated experimental setups that can perform a wide range of tasks. By combining the advantages of robotics and advanced AI algorithms, SDLs can perform high-throughput experiments, exploring a larger chemical space more efficiently and with less labor-intensive processes. This capability is crucial for rapidly identifying promising candidates.
AI algorithms in SDLs generate hypotheses based on prior experiments, establishing a feedback loop that reduces the number of experiments required for discovery. Thus, this approach enhances the precision and effectiveness of research. For example, SDLs can autonomously optimize the DNA purification process with minimal human intervention, leading to significant improvements in yield and purity of the product203. SDLs can also facilitate collaboration by enabling distributed experimentation and data sharing across institutions. This is exemplified by projects like The World Avatar, which links laboratories globally for real-time collaborative optimization204. In the future, AI-powered labs can conduct iterative tests to optimize drug release kinetics for sustained or targeted delivery. Furthermore, AI-driven SDLs can rapidly design, execute, and analyze experiments to identify optimal formulations for drug delivery systems. For example, thousands of polymers or ionizable lipid types and conditions can be rapidly screened by AI-SDLs to determine the best carriers or formulation for controlled drug release.
Despite their autonomy, SDLs still require human oversight to ensure progress towards research goals. Integrating human intuition with AI's capabilities is crucial for the success of SDLs205. Another challenge is the time required to adapt SDLs to new studies. Effective SDLs must be designed to work faster than automation alone and be readily adaptable to new research areas206. In summary, the AI-powered SDLs are poised to transform scientific research by leveraging automation, AI, and advanced computing. While there are challenges to overcome, the potential benefits of accelerated discovery and efficiency make AI-SDLs a promising development in the scientific community.
5.2.4. AI-process simulation
Process simulation is widely used in the pharmaceutical industry to model, analyze, and optimize drug manufacturing processes. Simulating complex workflows and equipment behaviors helps reduce costs, improve efficiency, and ensure quality. However, traditional process simulation tools often face limitations such as requiring extensive domain expertise from formulation scientists, lacking standardized simulation methods for various formulations, and being constrained by computational inefficiencies, incomplete data, and expensive commercial software. To address these limitations, AI can enhance process simulation by enabling more accurate predictions through machine learning models trained on large datasets. It can optimize parameters and adapt simulations in real time. AI also reduces reliance on manual adjustments, accelerates decision-making, and improves handling of uncertainties, leading to more robust and efficient pharmaceutical manufacturing processes.
Some case studies of process simulation in the pharmaceutical field have been summarized207. Briefly, discrete element modelling (DEM) has been used to simulate diffusion-induced swelling and shrinkage of deformable particles, capturing the microstructural evolution of individual particles208. Despite these advantages, DEM struggles to scale to industrial systems with huge particle counts because computational demand rises steeply with the number of particles. Integrating DEM with computational fluid dynamics (CFD) is a powerful simulation approach for studying particle-fluid interactions at micro and macro scales, bridging the gap between lab-scale experiments and industrial-scale production209. For example, a coupled CFD-DEM approach was utilized to study powder dispersion mechanisms in pharmaceutical dry powder inhalers, with the Aerolizer® serving as a model device210. The study revealed that shear stress from turbulent flow did not significantly affect powder dispersion, and that agglomerate-agglomerate interactions occurred only after the agglomerates were ejected from the capsule. This work highlighted the effectiveness of CFD-DEM modeling in studying dispersion mechanisms and provided valuable insights for future improvements in inhaler device design. This hybrid technique is particularly valuable in pharmaceuticals, where processes such as granulation, mixing, drying, and fluidized bed operations involve complex particle-fluid interactions. However, high computational resources are still required, especially for large-scale systems with many particles, making accelerated computing and coarse-grid simulation necessary211.
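To make the scaling problem concrete, the sketch below contrasts two ways of finding touching particles, a step that is often a major cost in a DEM time step. It is an illustrative assumption, not code from the cited studies: particles are equal-radius discs in 2D, and all function names are hypothetical. Checking every pair costs N(N-1)/2 distance tests, whereas binning particles into cells one particle diameter wide restricts each test to neighbouring cells.

```python
import random
from collections import defaultdict
from itertools import combinations


def touching(p, q, radius):
    """True if two equal-radius discs centred at p and q overlap or touch."""
    dx, dy = p[0] - q[0], p[1] - q[1]
    return dx * dx + dy * dy <= (2.0 * radius) ** 2


def contacts_naive(points, radius):
    """All-pairs search: N*(N-1)/2 distance tests."""
    return {(i, j) for i, j in combinations(range(len(points)), 2)
            if touching(points[i], points[j], radius)}


def contacts_grid(points, radius):
    """Cell-list search: only particles in the same or adjacent cells are tested."""
    cell = 2.0 * radius  # cell edge equals the contact distance, so adjacent cells suffice
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(int(x // cell), int(y // cell))].append(idx)

    hits = set()
    for (cx, cy), members in grid.items():
        for ox in (-1, 0, 1):
            for oy in (-1, 0, 1):
                # .get() avoids creating empty cells while iterating
                for j in grid.get((cx + ox, cy + oy), ()):
                    for i in members:
                        if i < j and touching(points[i], points[j], radius):
                            hits.add((i, j))
    return hits


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.random(), rng.random()) for _ in range(2000)]  # synthetic positions
    n = len(pts)
    print("all-pairs distance tests:", n * (n - 1) // 2)
    print("contacts found (grid):", len(contacts_grid(pts, 0.005)))
```

Both functions return the same contact set; the grid version simply avoids testing pairs that are too far apart to touch, which is one reason production DEM codes rely on spatial partitioning and parallel hardware.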
The utilization of AI for accelerating process simulations has demonstrated significant potential to reduce simulation times while enhancing the accuracy and efficiency of predictions. For instance, one study proposed a Graph Neural Network-based Simulator (GNS) integrated with inverse design to optimize DEM parameters for granular flow simulations212. The GNS model, trained on high-fidelity DEM datasets, achieves superior predictive accuracy and generalization across solid dosage manufacturing process design213. Compared to traditional design of experiment methods, the GNS approach demonstrates enhanced computational efficiency and dynamic optimization of complex parameter interactions. A joint framework integrating AI, CFD, and PBPK modeling has also been proposed and applied to the development of various inhaler types, such as nebulizers, pressurized metered-dose inhalers, soft mist inhalers, and dry powder inhalers, as well as inhaled drug formulations214. This hybrid model shows great potential in predicting drug deposition in the human respiratory tract and, through PBPK modeling, in describing drug dissolution and absorption. Additionally, efforts have been made to relate the disintegration and dissolution behaviors of solid dosage forms to formulation optimization by leveraging advanced AI algorithms such as deep learning86,215. These initial results highlight the advantages of AI methods in computational speed and in handling complex systems, and represent a significant advance in computational techniques for process simulation and real-world problem-solving.
5.3. Challenges and future perspectives
5.3.1. Current challenges of applying AI models in drug delivery
Integrating AI with other advanced techniques is revolutionizing smart drug delivery, significantly enhancing precision, efficiency, and personalization in therapeutic interventions. By leveraging AI-driven insights, researchers can optimize drug formulations, design targeted delivery systems, and adapt treatments in real time, ultimately improving patient outcomes and minimizing side effects. Despite the promising outlook, AI still faces several challenges in achieving its goals, as mentioned earlier.
5.3.1.1. Challenges in data sharing and privacy preservation
Data quality and availability issues are critical to developing reliable AI models. Datasets in drug delivery are typically characterized by scarcity, imbalance, and high complexity, while simultaneously facing severe data fragmentation. Such datasets are distributed across various institutions and organizations, creating numerous data silos that prevent effective integration of valuable information for model development.
To address these data issues, multiple data-sharing platforms and initiatives have been established globally. For instance, the Global Alliance for Genomics and Health (GA4GH) promotes sharing and standardizing genomic and health data216. The European Common Data Space aims to unlock the vast potential of data-driven innovation by enabling secure and trusted data exchange across the EU217. In the United States, the National Cancer Institute's (NCI) Cancer Moonshot initiative has developed the Cancer Research Data Commons (CRDC), integrating cancer data from various institutions into a shared platform218. Scholars in related fields also advocate for open data219,220, and a growing number of researchers are either required or voluntarily choose to share their data and code transparently and accessibly. However, barriers to data sharing persist, including data heterogeneity, privacy concerns, and issues of ownership221.
Alongside the advancement of data sharing initiatives, ensuring the privacy and security of patient data is also a significant concern that needs to be addressed when implementing AI in drug delivery systems195. The sensitive nature of health data necessitates robust security measures to prevent unauthorized access and breaches. Implementing advanced encryption techniques and secure data storage solutions is fundamental to safeguarding patient information in AI-driven drug delivery systems. Regulatory bodies also play a crucial role in overcoming these challenges. For instance, the FDA introduced the Knowledge-Aided Assessment and Structured Application (KASA) initiative to promote structured information sharing222. Additionally, laws and regulations such as the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR) govern the privacy and security of personal health information. These frameworks mandate stringent data protection practices, including anonymization, pseudonymization, and obtaining informed patient consent, ensuring compliance with legal and ethical standards. Together, these measures aim to build a trustworthy environment for AI applications in drug delivery, which is essential for developing abundant, high-quality, standardized, and reliable datasets for further advancements in computational modeling223.
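As one concrete illustration of pseudonymization, the sketch below replaces a direct identifier with a keyed hash (HMAC-SHA256 from the Python standard library). The record layout, the `patient_id` field name, the example values, and the key handling are all assumptions for illustration. Keyed hashing alone does not make a dataset compliant with HIPAA or GDPR, which also depend on key management, remaining quasi-identifiers, and governance.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Map an identifier to a stable keyed-hash token (64 hex characters)."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_records(records, secret_key, id_field="patient_id"):
    """Return copies of the records with the identifier field replaced by its token."""
    return [{**rec, id_field: pseudonymize(rec[id_field], secret_key)} for rec in records]


if __name__ == "__main__":
    key = b"example-key-store-securely"   # hypothetical; use a managed secret in practice
    data = [                              # made-up records for demonstration only
        {"patient_id": "P-0001", "dose_mg": 50, "release_24h_pct": 81.2},
        {"patient_id": "P-0002", "dose_mg": 75, "release_24h_pct": 77.9},
    ]
    for row in pseudonymize_records(data, key):
        print(row)
```

Because the same identifier and key always yield the same token, records from one source can still be linked for modeling, while a party without the key cannot recompute the mapping from known identifiers.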
5.3.1.2. Challenges in model transparency and interpretability
The lack of transparency in AI models remains one of the major challenges for their clinical and commercial applications. Currently, many models are still considered “black boxes”. It is difficult for clinicians and regulatory authorities to understand the basis of model predictions, reducing their reliability and practical value. Enhancing model interpretability is thus crucial to establishing the trust necessary for deployment in clinical and commercial settings.
To address this challenge, advanced AI algorithms and tools with enhanced interpretability have been developed. Furthermore, the integration of AI with mechanistic modeling from other disciplines, such as AI-PBPK and AI-QM/MD modeling, holds significant potential for improving the interpretability of AI224. By combining the predictive power of AI with the mechanistic insights provided by these scientific models, researchers can better understand and validate the underlying processes driving AI predictions. This synergy not only improves the transparency of AI systems but also fosters trust by linking AI-driven outcomes to established scientific principles.
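One widely used model-agnostic interpretability technique is permutation feature importance: shuffle one input column and measure how much the prediction error grows. The sketch below is a minimal standard-library illustration, not a method taken from the cited works; the toy `model`, the three feature names, and the synthetic data are all hypothetical.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of the model over the given rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_features, seed=0):
    """Increase in MSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    importances = []
    for k in range(n_features):
        column = [r[k] for r in rows]
        rng.shuffle(column)  # break the link between feature k and the target
        shuffled = [r[:k] + (v,) + r[k + 1:] for r, v in zip(rows, column)]
        importances.append(mse(model, shuffled, targets) - baseline)
    return importances


def model(row):
    """Hypothetical fitted model: uses polymer ratio and drug loading, ignores batch id."""
    polymer_ratio, drug_loading, _batch_id = row
    return 3.0 * polymer_ratio + 0.5 * drug_loading


if __name__ == "__main__":
    rng = random.Random(42)
    rows = [(rng.random(), rng.random(), rng.random()) for _ in range(200)]
    targets = [model(r) for r in rows]  # synthetic targets, so the baseline error is zero
    names = ["polymer_ratio", "drug_loading", "batch_id"]
    for name, score in zip(names, permutation_importance(model, rows, targets, 3)):
        print(f"{name}: {score:.4f}")
```

A feature the model ignores receives an importance of exactly zero, while features with larger influence on the prediction show larger error increases. Rankings like this give formulation scientists a first check on whether a model's behavior agrees with known mechanisms.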
5.3.1.3. Challenges in AI model deployment and user accessibility
Despite the increasing application of AI technologies in drug delivery, current AI models face significant usability challenges. While many models are successfully developed, their broader application is often hindered by limited deployment capabilities, making it difficult for users to utilize them effectively. By integrating AI-driven drug delivery systems, patient monitoring, and personalized treatment planning225, user-friendly platforms may feature intuitive interfaces that allow patients and healthcare providers to access real-time insights into drug efficacy and health outcomes. Beyond individual platforms, an intelligent ecosystem covering the entire drug development pipeline is gradually taking shape. In the upstream drug discovery phase, advanced AI algorithms are extensively applied to molecular generation, target identification, and virtual screening, significantly enhancing the success rate and efficiency of drug development226. In the downstream phases of clinical trials and patient care, AI optimizes trial designs, patient stratification, and real-time data analysis, improving trial efficiency and the personalization of therapeutic strategies227. The molecular data generated in the upstream research stage serve as a foundation for AI-driven drug delivery. In turn, the vast amounts of real-world data (RWD) accumulated in downstream clinical trials become valuable resources for further refining drug delivery AI models. This ecosystem not only shortens drug development timelines and reduces costs but also lays a solid foundation for personalized medicine and precision therapy. It signifies the advent of a fully intelligent paradigm for AI-driven drug development, ushering in a new era of innovation.
Besides the limitations in data quality, model transparency, and usability, validating the effectiveness and safety of AI models takes substantial time. In drug delivery, the validation of model predictions typically relies on long-term real-world data, and acquiring and verifying such data requires rigorous clinical trials and regulatory approvals, which delays clinical implementation and commercialization. Nevertheless, regulatory authorities worldwide are taking proactive measures, as exemplified by the FDA, Health Canada, and the Medicines and Healthcare products Regulatory Agency (MHRA) jointly establishing ten guiding principles to support the safe and efficient application of AI in medical devices, with explicit emphasis on model transparency and interpretability228. As these regulatory frameworks continue to improve and technological innovation deepens, AI-based drug delivery systems are expected to gradually overcome these bottlenecks, providing robust technical support for precision medicine and personalized treatment.
5.3.2. Talent and education development
Currently, the pharmaceutical field faces a critical shortage of AI talent, creating an urgent need for comprehensive training programs to bridge this gap. Regarding talent cultivation, the demand for interdisciplinary professionals is skyrocketing, which requires knowledge input from machine learning, data science, and domain-specific pharmaceutical sciences. Professionals with traditional pharmaceutical backgrounds are supplementing their knowledge with expertise in computational science and artificial intelligence, while data scientists and AI engineers are learning about the unique requirements of the pharmaceutical industry through courses in medicinal chemistry and pharmacokinetics. Increasingly, universities and research institutions are offering courses or research projects related to AI-driven drug development, fostering a new generation of interdisciplinary talent with both theoretical foundations and industrial perspectives, injecting fresh energy into the sustainable development of the pharmaceutical industry. In brief, training should include practical applications of AI in drug formulation and clinical trials. Students should be exposed to AI platforms and tools used in the pharmaceutical industry, such as data mining, high-throughput screening, and simulation tools.
Additionally, introducing computational pharmaceutics courses in universities is important for the future of pharmaceutical science. As the pharmaceutical industry increasingly relies on advanced computational tools and AI to accelerate drug development, optimize formulations, and enhance drug delivery systems, equipping students with these skills is essential. For example, a graduate course named “Computational Pharmacy” has been offered at the University of Macau (China) since 2015, and a “Computational Pharmaceutics” course has been offered at Uppsala University in Sweden since 2021. Meanwhile, the second edition (2024) of the reference book on computational pharmaceutics, first published in 2015, has been released169,170. In 2024, top universities in China also began establishing artificial intelligence schools, equipped with professional faculty and facilities. Such efforts and courses prepare future researchers to harness computational models for predicting drug behavior, analyzing complex biological interactions, and designing innovative drug delivery systems. By integrating computational pharmaceutics into the curriculum, universities can nurture a new generation of experts capable of driving innovation and shaping the future of smart drug delivery and personalized medicine.
5.3.3. Culture and collaboration
Capital market enthusiasm has significantly accelerated the development of AI-driven drug delivery systems, as AI pharmaceutical companies increasingly capture the attention of investors eager to support innovative healthcare solutions. However, this rapid advancement has also brought various cultural and ethical issues, including concerns about data privacy and security, bias in AI models due to non-representative datasets or fake data, and the accountability and transparency of AI systems in medical decision-making229.
To address these issues, it is essential to implement robust regulations, develop diverse and inclusive datasets, and establish clear accountability for AI decision-making. Firstly, stringent regulations must be enacted to protect patient data, guaranteeing that personal information is handled securely and transparently. Secondly, a clear framework for accountability must be developed, delineating the responsibilities of AI developers, healthcare providers, and regulatory bodies in case of errors or adverse outcomes associated with AI systems. Thirdly, stakeholders should collaborate to create initiatives that promote equal access to AI-enhanced therapies, particularly for underserved populations. Additionally, global collaboration is essential for harmonizing ethical standards and regulations, while investments in education and training are crucial for bridging the gap between AI advancements and clinical practice. By focusing on these principles and measures, AI in smart drug delivery can achieve ethical integrity and cultural sensitivity, ensuring that these remarkable advancements benefit all patients equitably and responsibly.
Meanwhile, close collaboration among academia, industry, and regulatory agencies drives coordinated innovation across the sector. Academia supports cutting-edge theories and technological breakthroughs, industry translates these achievements into practical products, and regulatory agencies ensure the safety and efficacy of innovative outcomes through scientific policies and standards. This tripartite collaboration not only accelerates the maturation of AI-driven drug development technologies but also propels the entire industry toward greater standardization, scientific rigor, and globalization. Deepening this collaboration between AI researchers, pharmaceutical companies, and regulatory bodies can help address current challenges and accelerate the development of innovative drug delivery systems.
Despite the numerous challenges that remain, the future perspectives are promising. Addressing these challenges through innovative solutions and regulatory advancements is crucial for successfully integrating AI in smart drug delivery systems, ultimately leading to more efficient, personalized, and effective treatments in the future.
6. Conclusions
Through this review, we have explored how AI applications have evolved from simple predictive models to advanced algorithms capable of handling complex delivery challenges. AI techniques have become effective tools in modern pharmaceutical research. Driven by improvements in computational power, algorithms, and the expanding volume of pharmaceutical data, the synergy between AI and drug delivery research will continue to strengthen. Emerging technologies such as LLMs, together with the multidisciplinary integration of AI with other technologies, hold great promise for more efficient development pipelines and personalized drug delivery. To fully realize this potential, comprehensive talent training and education are essential. As AI tools become more accessible and usable, there has never been a better time for pharmacy researchers to embrace these technologies to enhance their research workflows.
Author contributions
Yiyang Wu: Conceptualization, Investigation, Data Curation, Writing - Original Draft, Visualization; Nannan Wang: Conceptualization, Investigation, Data Curation, Writing - Original Draft, Visualization; Ping Xiong: Conceptualization, Investigation, Data Curation, Writing - Original Draft; Ruifeng Wang: Conceptualization, Investigation, Data Curation, Writing - Original Draft; Jiayin Deng: Visualization; Defang Ouyang: Conceptualization, Supervision, Project administration, Funding acquisition, Writing - Review & Editing.
Conflicts of interest
The authors have no conflicts of interest to declare.
Acknowledgments
This work was financially supported by the Macau Science and Technology Development Fund (0071/2024/RIA1 and 001/2023/ALC, China), the University of Macau Multi-Year Research Grant (MYRG-GRG2023-00077-ICMS-UMDF, China), and the Macao Young Scholars Program (AM2024021, China).
Footnotes
This article is part of special issue entitled: Hot Topic Revs in Drug Delivery (II) published in Acta Pharmaceutica Sinica B.
Peer review under the responsibility of Chinese Pharmaceutical Association and Institute of Materia Medica, Chinese Academy of Medical Sciences.
References
- 1.Shah A.K., Agnihotri S.A. Recent advances and novel strategies in pre-clinical formulation development: an overview. J Control Release. 2011;156:281–296. doi: 10.1016/j.jconrel.2011.07.003.
- 2.Liu H., Guo S.L., Wei S.J., Liu J.Y., Tian B.R. Pharmacokinetics and pharmacodynamics of cyclodextrin-based oral drug delivery formulations for disease therapy. Carbohydr Polym. 2024;329 doi: 10.1016/j.carbpol.2023.121763.
- 3.Schlander M., Hernandez-Villafuerte K., Cheng C.Y., Mestre-Ferrandiz J., Baumann M. How much does it cost to research and develop a new drug? A systematic review and assessment. Pharmacoeconomics. 2021;39:1243–1269. doi: 10.1007/s40273-021-01065-y.
- 4.Pharmaceutical drug delivery market growth, drivers & opportunities. Available from: https://www.marketsandmarkets.com/Market-Reports/drug-delivery-technologies-market-1085.html
- 5.Park H., Otte A., Park K. Evolution of drug delivery systems: from 1950 to 2020 and beyond. J Control Release. 2022;342:53–65. doi: 10.1016/j.jconrel.2021.12.030.
- 6.Wang W., Ye Z.Y.F., Gao H.L., Ouyang D.F. Computational pharmaceutics―a new paradigm of drug delivery. J Control Release. 2021;338:119–136. doi: 10.1016/j.jconrel.2021.08.030.
- 7.Brown D.G., Wobst H.J. A decade of FDA-approved drugs (2010–2019): trends and future directions. J Med Chem. 2021;64:2312–2338. doi: 10.1021/acs.jmedchem.0c01516.
- 8.Sertkaya A., Beleche T., Jessup A., Sommers B.D. Costs of drug development and research and development intensity in the US, 2000‒2018. JAMA Netw Open. 2024;7 doi: 10.1001/jamanetworkopen.2024.15445.
- 9.Krizhevsky A., Sutskever I., Hinton G.E. ImageNet classification with deep convolutional neural networks. Commun ACM. 2017;60:84–90.
- 10.Li B., Gilbert S. Artificial intelligence awarded two Nobel Prizes for innovations that will shape the future of medicine. npj Digit Med. 2024;7:1–3. doi: 10.1038/s41746-024-01345-9.
- 11.Schuhmacher A., Gatto A., Hinder M., Kuss M., Gassmann O. The upside of being a digital pharma player. Drug Discov Today. 2020;25:1569–1574. doi: 10.1016/j.drudis.2020.06.002.
- 12.Schuhmacher A., Gatto A., Kuss M., Gassmann O., Hinder M. Big techs and startups in pharmaceutical R&D―a 2020 perspective on artificial intelligence. Drug Discov Today. 2021;26:2226–2231. doi: 10.1016/j.drudis.2021.04.028.
- 13.Artificial intelligence for drug development. Available from: https://www.fda.gov/about-fda/center-drug-evaluation-and-research-cder/artificial-intelligence-drug-development
- 14.Patel S., Patel M., Kulkarni M., Patel M.S. DE-INTERACT: a machine-learning-based predictive tool for the drug-excipient interaction study during product development-validation through paracetamol and vanillin as a case study. Int J Pharm. 2023;637 doi: 10.1016/j.ijpharm.2023.122839.
- 15.Cloutier T.K., Sudrik C., Mody N., Sathish H.A., Trout B.L. Machine learning models of antibody-excipient preferential interactions for use in computational formulation design. Mol Pharm. 2020;17:3589–3599. doi: 10.1021/acs.molpharmaceut.0c00629.
- 16.Zhao Q.Q., Ye Z.Y.F., Su Y., Ouyang D.F. Predicting complexation performance between cyclodextrins and guest molecules by integrated machine learning and molecular modeling techniques. Acta Pharm Sin B. 2019;9:1241–1252. doi: 10.1016/j.apsb.2019.04.004.
- 17.Li J.J., Gao H.L., Ye Z.Y.F., Deng J.Y., Ouyang D.F. In silico formulation prediction of drug/cyclodextrin/polymer ternary complexes by machine learning and molecular modeling techniques. Carbohydr Polym. 2022;275 doi: 10.1016/j.carbpol.2021.118712.
- 18.Eugster R., Orsi M., Buttitta G., Serafini N., Tiboni M., Casettari L., et al. Leveraging machine learning to streamline the development of liposomal drug delivery systems. J Control Release. 2024;376:1025–1038. doi: 10.1016/j.jconrel.2024.10.065.
- 19.Xu Y., Ma S.H., Cui H.T., Chen J.G., Xu S.F., Gong F.L., et al. AGILE platform: a deep learning powered approach to accelerate LNP development for mRNA delivery. Nat Commun. 2024;15:6305. doi: 10.1038/s41467-024-50619-z.
- 20.Schwartz J.B., Flamholz J.R., Press R.H. Computer optimization of pharmaceutical formulations I: general procedure. J Pharm Sci. 1973;62:1165–1170. doi: 10.1002/jps.2600620722.
- 21.Haux R., Wetter T., Stricker H., Flister J., Mann G., Oberhammer L. In: Medical informatics europe 1991. Adlassnig K.P., Grabner G., Bengtsson S., Hansen R., editors. vol. 45. Springer Berlin Heidelberg; Berlin: 1991. Knowledge-based galenical development of drug products: an overview on the design of the Galenical Development System Heidelberg; pp. 204–208. (Lecture notes in medical informatics).
- 22.Hussain A.S., Yu X.Q., Johnson R.D. Application of neural computing in pharmaceutical product development. Pharm Res. 1991;8:1248–1252. doi: 10.1023/a:1015843527138.
- 23.Dowell J.A., Hussain A., Devane J., Young D. Artificial neural networks applied to the in vitro–in vivo correlation of an extended-release formulation: initial trials and experience. J Pharm Sci. 1999;88:154–160. doi: 10.1021/js970148p.
- 24.Wu T., Pan W.S., Chen J.M., Zhang R.H. Formulation optimization technique based on artificial neural network in salbutamol sulfate osmotic pump tablets. Drug Dev Ind Pharm. 2000;26:211–215. doi: 10.1081/ddc-100100347.
- 25.Bergström C.A.S., Norinder U., Luthman K., Artursson P. Experimental and computational screening models for prediction of aqueous drug solubility. Pharm Res. 2002;19:182–188. doi: 10.1023/a:1014224900524.
- 26.Bergström C.A.S., Strafford M., Lazorova L., Avdeef A., Luthman K., Artursson P. Absorption classification of oral drugs based on molecular surface properties. J Med Chem. 2003;46:558–570. doi: 10.1021/jm020986i.
- 27.Shao Q., Rowe R.C., York P. Comparison of neurofuzzy logic and neural networks in modelling experimental data of an immediate release tablet formulation. Eur J Pharmaceut Sci. 2006;28:394–404. doi: 10.1016/j.ejps.2006.04.007.
- 28.Zhang Z.H., Dong H.Y., Peng B., Liu H.F., Li C.L., Liang M., et al. Design of an expert system for the development and formulation of push-pull osmotic pump tablets containing poorly water-soluble drugs. Int J Pharm. 2011;410:41–47. doi: 10.1016/j.ijpharm.2011.03.013.
- 29.Patel A.D., Agrawal A., Dave R.H. Investigation of the effects of process variables on derived properties of spray dried solid-dispersions using polymer based response surface model and ensemble artificial neural network models. Eur J Pharm Biopharm. 2014;86:404–417. doi: 10.1016/j.ejpb.2013.10.014.
- 30.Li Y.Q., Abbaspour M.R., Grootendorst P.V., Rauth A.M., Wu X.Y. Optimization of controlled release nanoparticle formulation of verapamil hydrochloride using artificial neural networks with genetic algorithm and response surface methodology. Eur J Pharm Biopharm. 2015;94:170–179. doi: 10.1016/j.ejpb.2015.04.028.
- 31.Takayama K., Nambu N., Nagai T. Computer optimization of formulation of flufenamic acid/polyvinylpolypyrrolidone/methyl cellulose solid dispersions. Chem Pharm Bull. 1983;31:4496–4507.
- 32.Takai T., Takayama K., Nambu N., Nagai T. Optimum formulation of griseofulvin/hydroxypropyl cellulose solid dispersions with desirable dissolution properties. Chem Pharm Bull. 1984;32:1942–1947.
- 33.Rowe R.C., Roberts R.J. Artificial intelligence in pharmaceutical product formulation: knowledge-based and expert systems. Pharmaceut Sci Technol Today. 1998;1:153–159.
- 34.Ramani K.V., Patel M.R., Patel S.K. An expert system for drug preformulation in a pharmaceutical company. Interfaces. 1992;22:101–108.
- 35.Bateman S.D., Verlin J., Russo M., Guillot M., Laughlin S.M. The development and validation of a capsule formulation knowledge-based system. Pharmaceut Technol. 1996;20:174–184.
- 36.Rowe R.C., Wakerly M.G., Roberts R.J., Grundy R.U., Upjohn N.G. Expert systems for parenteral development. PDA J Pharm Sci Technol. 1995;49:257–261.
- 37.Rowe R.C., Roberts R.J. Expert systems in pharmaceutical product development. Encycl Pharm Technol. 2002:1188–1210.
- 38.Rowe R.C. Film coating formulation using an expert system. Pharmaceut Technol. 1998;10:72–82.
- 39.Dai S.Y., Xu B., Shi G.L., Liu J.W., Zhang Z.Q., Shi X.Y., et al. SeDeM expert system for directly compressed tablet formulation: a review and new perspectives. Powder Technol. 2019;342:517–527.
- 40.Singh I., Kumar P. Preformulation studies for direct compression suitability of cefuroxime axetil and paracetamol: a graphical representation using SeDeM diagram. Acta Pol Pharm. 2012;69:87–93.
- 41.Aguilar-Díaz J.E., García-Montoya E., Pérez-Lozano P., Suñe-Negre J.M., Miñarro M., Ticó J.R. The use of the SeDeM Diagram expert system to determine the suitability of diluents–disintegrants for direct compression and their use in formulation of ODT. Eur J Pharm Biopharm. 2009;73:414–423. doi: 10.1016/j.ejpb.2009.07.001.
- 42.Aguilar-Díaz J.E., García-Montoya E., Suñe-Negre J.M., Pérez-Lozano P., Miñarro M., Ticó J.R. Predicting orally disintegrating tablets formulations of ibuprophen tablets: an application of the new SeDeM-ODT expert system. Eur J Pharm Biopharm. 2012;80:638–648. doi: 10.1016/j.ejpb.2011.12.012.
- 43.Sipos E., Oltean A.R., Szabó Z.-I., Rédai E.-M., Nagy G.D. Application of SeDeM expert systems in preformulation studies of pediatric ibuprofen ODT tablets. Acta Pharm. 2017;67:237–246. doi: 10.1515/acph-2017-0017.
- 44.Chalortham N., Ruangrajitpakorn T., Supnithi T., Leesawat P. In: Formulation tools for pharmaceutical development. Aguilar J.E., editor. Woodhead Publishing; Cambridge: 2013. 8 - OXPIRT: ontology-based eXpert system for production of a generic immediate release tablet; pp. 203–228.
- 45.Shao Q., Rowe R.C., York P. Comparison of neurofuzzy logic and decision trees in discovering knowledge from experimental data of an immediate release tablet formulation. Eur J Pharmaceut Sci. 2007;31:129–136. doi: 10.1016/j.ejps.2007.03.003.
- 46.Landín M., Rowe R.C., York P. Advantages of neurofuzzy logic against conventional experimental design and statistical analysis in studying and developing direct compression formulations. Eur J Pharmaceut Sci. 2009;38:325–331. doi: 10.1016/j.ejps.2009.08.004.
- 47.Knox C., Wilson M., Klinger C.M., Franklin M., Oler E., Wilson A., et al. DrugBank 6.0: the DrugBank Knowledgebase for 2024. Nucleic Acids Res. 2024;52:D1265–D1275. doi: 10.1093/nar/gkad976.
- 48.Kim S.H., Chen J., Cheng T.J., Gindulyte A., He J., He S., et al. PubChem 2023 update. Nucleic Acids Res. 2023;51:D1373–D1380. doi: 10.1093/nar/gkac956.
- 49.Mendez D., Gaulton A., Bento A.P., Chambers J., De Veij M., Félix E., et al. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res. 2019;47:D930–D940. doi: 10.1093/nar/gky1075.
- 50.Sushko I., Novotarskyi S., Körner R., Pandey A.K., Rupp M., Teetz W., et al. Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information. J Comput Aided Mol Des. 2011;25:533–554. doi: 10.1007/s10822-011-9440-2.
- 51.Esposito R., Ermondi G., Caron G. OpenCDLig: a free web application for sharing resources about cyclodextrin/ligand complexes. J Comput Aided Mol Des. 2009;23:669–675. doi: 10.1007/s10822-009-9290-3.
- 52.Hazai E., Hazai I., Demko L., Kovacs S., Malik D., Akli P., et al. Cyclodextrin KnowledgeBase a web-based service managing CD-ligand complexation data. J Comput Aided Mol Des. 2010;24:713–717. doi: 10.1007/s10822-010-9368-y.
- 53.Mixcoha E., Rosende R., Garcia-Fandino R., Piñeiro Á. Cyclo-lib: a database of computational molecular dynamics simulations of cyclodextrins. Bioinformatics. 2016;32:3371–3373. doi: 10.1093/bioinformatics/btw289.
- 54.Wang W.Y., Yan X.L., Zhao L.L., Russo D.P., Wang S.Q., Liu Y., et al. Universal nanohydrophobicity predictions using virtual nanoparticle library. J Cheminf. 2019;11:6. doi: 10.1186/s13321-019-0329-8.
- 55.Mathur D., Kaur H., Dhall A., Sharma N., Raghava G.P.S. SAPdb: a database of short peptides and the corresponding nanostructures formed by self-assembly. Comput Biol Med. 2021;133 doi: 10.1016/j.compbiomed.2021.104391.
- 56.Wang N.N., Sun H.M., Dong J., Ouyang D.F. PharmDE: a new expert system for drug-excipient compatibility evaluation. Int J Pharm. 2021;607 doi: 10.1016/j.ijpharm.2021.120962.
- 57.Zaslavsky J., Allen C. A dataset of formulation compositions for self-emulsifying drug delivery systems. Sci Data. 2023;10:914. doi: 10.1038/s41597-023-02812-w.
- 58.Jung S.M., Bufton J., Bao Z.Q., Cho W.J., Aguiar D., Allen C. Data characterizing a panel of biodegradable cross-linked polyester implants for sustained delivery of an anti-viral drug. Data Brief. 2025;58 doi: 10.1016/j.dib.2024.111182.
- 59.Murray J.D., Bennett-Lenane H., O'Dwyer P.J., Griffin B.T. Establishing a pharmacoinformatics repository of approved medicines: a database to support drug product development. Mol Pharm. 2025;22:408–423. doi: 10.1021/acs.molpharmaceut.4c00991.
- 60.Kamal M.M., Nazzal S. Development of a new class of sulforaphane-enabled self-emulsifying drug delivery systems (SFN-SEDDS) by high throughput screening: a case study with curcumin. Int J Pharm. 2018;539:147–156. doi: 10.1016/j.ijpharm.2018.01.045.
- 61.Bao Z.Q., Yung F., Hickman R.J., Aspuru-Guzik A., Bannigan P., Allen C. Data-driven development of an oral lipid-based nanoparticle formulation of a hydrophobic drug. Drug Deliv Transl Res. 2024;14:1872–1887. doi: 10.1007/s13346-023-01491-9.
- 62.Tomé I., Francisco V., Fernandes H., Ferreira L. High-throughput screening of nanoparticles in drug delivery. APL Bioeng. 2021;5 doi: 10.1063/5.0057204. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Mennen S.M., Alhambra C., Allen C.L., Barberis M., Berritt S., Brandt T.A., et al. The evolution of high-throughput experimentation in pharmaceutical development and perspectives on the future. Org Process Res Dev. 2019;23:1213–1242. [Google Scholar]
- 64.Mahjour B., Zhang R., Shen Y.N., McGrath A., Zhao R.H., Mohamed O.G., et al. Rapid planning and analysis of high-throughput experiment arrays for reaction discovery. Nat Commun. 2023;14:3924. doi: 10.1038/s41467-023-39531-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Whitesides G.M. The origins and the future of microfluidics. Nature. 2006;442:368–373. doi: 10.1038/nature05058. [DOI] [PubMed] [Google Scholar]
- 66. Ortiz-Perez A., van Tilborg D., van der Meel R., Grisoni F., Albertazzi L. Machine learning-guided high throughput nanoparticle design. Digit Discov. 2024;3:1280–1291.
- 67. Dedeloudi A., Weaver E., Lamprou D.A. Machine learning in additive manufacturing & microfluidics for smarter and safer drug delivery systems. Int J Pharm. 2023;636. doi: 10.1016/j.ijpharm.2023.122818.
- 68. Leung C.M., de Haan P., Ronaldson-Bouchard K., Kim G.A., Ko J., Rho H.S., et al. A guide to the organ-on-a-chip. Nat Rev Methods Primers. 2022. doi: 10.1038/s43586-022-00118-6.
- 69. Wallyn J., Anton N., Akram S., Vandamme T.F. Biomedical imaging: principles, technologies, clinical aspects, contrast agents, limitations and future trends in nanomedicines. Pharm Res. 2019;36:78. doi: 10.1007/s11095-019-2608-5.
- 70. Boutros M., Heigwer F., Laufer C. Microscopy-based high-content screening. Cell. 2015;163:1314–1325. doi: 10.1016/j.cell.2015.11.007.
- 71. Siddhanta S., Kuzmin A.N., Pliss A., Baev A.S., Khare S.K., Chowdhury P.K., et al. Advances in Raman spectroscopy and imaging for biomedical research. Adv Opt Photonics. 2023;15:318–384.
- 72. Abdalla Y., McCoubrey L.E., Ferraro F., Sonnleitner L.M., Guinet Y., Siepmann F., et al. Machine learning of Raman spectra predicts drug release from polysaccharide coatings for targeted colonic delivery. J Control Release. 2024;374:103–111. doi: 10.1016/j.jconrel.2024.08.010.
- 73. Kim B., Kim H., Kim S., Hwang Y.R. A brief review of non-invasive brain imaging technologies and the near-infrared optical bioimaging. Appl Microsc. 2021;51:9. doi: 10.1186/s42649-021-00058-7.
- 74. Yang Y.L., Ye Z.Y.F., Su Y., Zhao Q.Q., Li X.S., Ouyang D.F. Deep learning for in vitro prediction of pharmaceutical formulations. Acta Pharm Sin B. 2019;9:177–185. doi: 10.1016/j.apsb.2018.09.010.
- 75. Tan X.R., Liu Q.H., Fang Y.P., Zhu Y.L., Chen F., Zeng W.B., et al. Predicting peptide permeability across diverse barriers: a systematic investigation. Mol Pharm. 2024;21:4116–4127. doi: 10.1021/acs.molpharmaceut.4c00478.
- 76. Harmalkar A., Rao R., Xie Y.R., Honer J., Deisting W., Anlahr J., et al. Toward generalizable prediction of antibody thermostability using machine learning on sequence and structure features. mAbs. 2023;15. doi: 10.1080/19420862.2022.2163584.
- 77. Tang Y.X., Zhang J.L., He D.D., Miao W.F., Liu W., Li Y., et al. GANDA: a deep generative adversarial network conditionally generates intratumoral nanoparticles distribution pixels-to-pixels. J Control Release. 2021;336:336–343. doi: 10.1016/j.jconrel.2021.06.039.
- 78. Harrison P.J., Wieslander H., Sabirsh A., Karlsson J., Malmsjö V., Hellander A., et al. Deep-learning models for lipid nanoparticle-based drug delivery. Nanomedicine. 2021;16:1097–1110. doi: 10.2217/nnm-2020-0461.
- 79. Nadkarni P.M., Ohno-Machado L., Chapman W.W. Natural language processing: an introduction. J Am Med Inf Assoc. 2011;18:544–551. doi: 10.1136/amiajnl-2011-000464.
- 80. Ballard D.H., Brown C.M. Prentice-Hall; Englewood Cliffs: 1982. Computer vision.
- 81. Ramachandram D., Taylor G.W. Deep multimodal learning: a survey on recent advances and trends. IEEE Signal Process Mag. 2017;34:96–108.
- 82. Ye Z.Y.F., Wang N.N., Zhou J.T., Ouyang D.F. Organic crystal structure prediction via coupled generative adversarial networks and graph convolutional networks. Innovation. 2024;5. doi: 10.1016/j.xinn.2023.100562.
- 83. Merchant A., Batzner S., Schoenholz S.S., Aykol M., Cheon G., Cubuk E.D. Scaling deep learning for materials discovery. Nature. 2023;624:80–85. doi: 10.1038/s41586-023-06735-9.
- 84. Townshend R.J.L., Eismann S., Watkins A.M., Rangan R., Karelina M., Das R., et al. Geometric deep learning of RNA structure. Science. 2021;373:1047–1051. doi: 10.1126/science.abe5650.
- 85. Abramson J., Adler J., Dunger J., Evans R., Green T., Pritzel A., et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024;630:493–500. doi: 10.1038/s41586-024-07487-w.
- 86. Hornick T., Mao C., Koynov A., Yawman P., Thool P., Salish K., et al. In silico formulation optimization and particle engineering of pharmaceutical products using a generative artificial intelligence structure synthesis method. Nat Commun. 2024;15:9622. doi: 10.1038/s41467-024-54011-9.
- 87. Salma H., Melha Y.M., Sonia L., Hamza H., Salim N. Efficient prediction of in vitro piroxicam release and diffusion from topical films based on biopolymers using deep learning models and generative adversarial networks. J Pharm Sci. 2021;110:2531–2543. doi: 10.1016/j.xphs.2021.01.032.
- 88. Obeid S., Madžarević M., Krkobabić M., Ibrić S. Predicting drug release from diazepam FDM printed tablets using deep learning approach: influence of process parameters and tablet surface/volume ratio. Int J Pharm. 2021;601. doi: 10.1016/j.ijpharm.2021.120507.
- 89. Husseini G.A., Sabouni R., Puzyrev V., Ghommem M. Deep learning for the accurate prediction of triggered drug delivery. IEEE Trans NanoBioscience. 2024;24:102–112. doi: 10.1109/TNB.2024.3426291.
- 90. Liu X.Y., Wang X.Y., Luo Y.C., Wang M.J., Chen Z.J., Han X.Y., et al. A 3D tumor-mimicking in vitro drug release model of locoregional chemoembolization using deep learning-based quantitative analyses. Adv Sci. 2023;10. doi: 10.1002/advs.202206195.
- 91. Sagi O., Rokach L. Ensemble learning: a survey. WIREs Data Min Knowl. 2018;8.
- 92. Dong J., Wu Z., Xu H.L., Ouyang D.F. FormulationAI: a novel web-based platform for drug formulation design driven by artificial intelligence. Briefings Bioinf. 2024;25. doi: 10.1093/bib/bbad419.
- 93. Hang N.T., Long N.T., Duy N.D., Chien N.N., Van Phuong N. Towards safer and efficient formulations: machine learning approaches to predict drug-excipient compatibility. Int J Pharm. 2024;653. doi: 10.1016/j.ijpharm.2024.123884.
- 94. Deng J.Y., Ye Z.Y.F., Zheng W.W., Chen J., Gao H.S., Wu Z., et al. Machine learning in accelerating microsphere formulation development. Drug Deliv Transl Res. 2023;13:966–982. doi: 10.1007/s13346-022-01253-z.
- 95. Cai C.J., Wang S.W., Xu Y.J., Zhang W.L., Tang K., Ouyang Q., et al. Transfer learning for drug discovery. J Med Chem. 2020;63:8683–8694. doi: 10.1021/acs.jmedchem.9b02147.
- 96. Guo W.B., Dong Y.W., Hao G.F. Transfer learning empowers accurate pharmacokinetics prediction of small samples. Drug Discov Today. 2024;29. doi: 10.1016/j.drudis.2024.103946.
- 97. Zhang Y., Yang Q. A survey on multi-task learning. IEEE Trans Knowl Data Eng. 2022;34:5586–5609. doi: 10.1109/tkde.2020.3045924.
- 98. Ye Z.Y.F., Yang Y.L., Li X.S., Cao D.S., Ouyang D.F. An integrated transfer learning and multitask learning approach for pharmacokinetic parameter prediction. Mol Pharm. 2019;16:533–541. doi: 10.1021/acs.molpharmaceut.8b00816.
- 99. Wang L., Zhou Z.R., Yang X.X., Shi S.H., Zeng X.X., Cao D.S. The present state and challenges of active learning in drug discovery. Drug Discov Today. 2024;29. doi: 10.1016/j.drudis.2024.103985.
- 100. Rakhimbekova A., Lopukhov A., Klyachko N., Kabanov A., Madzhidov T.I., Tropsha A. Efficient design of peptide-binding polymers using active learning approaches. J Control Release. 2023;353:903–914. doi: 10.1016/j.jconrel.2022.11.023.
- 101. Wang N.N., Dong J., Ouyang D.F. AI-directed formulation strategy design initiates rational drug development. J Control Release. 2025;378:619–636. doi: 10.1016/j.jconrel.2024.12.043.
- 102. Wang N.N., Wang W., Zhong H., Ouyang D.F. In: Exploring computational pharmaceutics: AI and modeling in Pharma 4.0. Ouyang D.F., editor. John Wiley & Sons; Hoboken: 2024. Introduction to computational pharmaceutics; pp. 1–9.
- 103. Sano S., Kadowaki T., Tsuda K., Kimura S. Application of Bayesian optimization for pharmaceutical product development. J Pharm Innov. 2020;15:333–343.
- 104. Arulkumaran K., Deisenroth M.P., Brundage M., Bharath A.A. Deep reinforcement learning: a brief survey. IEEE Signal Process Mag. 2017;34:26–38.
- 105. Chen L., Zhang Y.W., Zhang S.C. Optimal drug dosage control strategy of immune systems using reinforcement learning. IEEE Access. 2023;11:1269–1279.
- 106. Padmanabhan R., Meskin N., Haddad W.M. Optimal adaptive control of drug dosing using integral reinforcement learning. Math Biosci. 2019;309:131–142. doi: 10.1016/j.mbs.2019.01.012.
- 107. Tabrizi S.P.H.P., Reza A., Jameii S.M. Enhanced path planning for automated nanites drug delivery based on reinforcement learning and polymorphic improved ant colony optimization. J Supercomput. 2021;77:6714–6733.
- 108. Kim J.W., Park B.J., Oh T.H., Lee J.M. Model-based reinforcement learning and predictive control for two-stage optimal control of fed-batch bioreactor. Comput Chem Eng. 2021;154.
- 109. van der Merwe J., Steenekamp J., Steyn D., Hamman J. The role of functional excipients in solid oral dosage forms to overcome poor drug dissolution and bioavailability. Pharmaceutics. 2020;12:393. doi: 10.3390/pharmaceutics12050393.
- 110. Wang W., Chen K.P., Jiang T., Wu Y.Y., Wu Z., Ying H., et al. Artificial intelligence-driven rational design of ionizable lipids for mRNA delivery. Nat Commun. 2024;15. doi: 10.1038/s41467-024-55072-6.
- 111. Li B.W., Raji I.O., Gordon A.G.R., Sun L.Z., Raimondo T.M., Oladimeji F.A., et al. Accelerating ionizable lipid discovery for mRNA delivery using machine learning and combinatorial chemistry. Nat Mater. 2024;23:1002–1008. doi: 10.1038/s41563-024-01867-3.
- 112. Witten J., Raji I., Manan R.S., Beyer E., Bartlett S., Tang Y.H., et al. Artificial intelligence-guided design of lipid nanoparticles for pulmonary gene therapy. Nat Biotechnol. 2024. doi: 10.1038/s41587-024-02490-y.
- 113. Bae S.H., Choi H., Lee J., Kang M.H., Ahn S.H., Lee Y.S., et al. Rational design of lipid nanoparticles for enhanced mRNA vaccine delivery via machine learning. Small. 2025;21. doi: 10.1002/smll.202405618.
- 114. Goodfellow I.J., Pouget-Abadie J., Mirza M., Xu B., Warde-Farley D., Ozair S., et al. Generative adversarial nets. Proceedings of the 28th International Conference on Neural Information Processing Systems. 2014;2:2672–2680. Cambridge: MIT Press.
- 115. Kingma D.P., Welling M. An introduction to variational autoencoders. Found Trends Mach Learn. 2019;12:307–392.
- 116. Yang L., Zhang Z.L., Song Y., Hong S.D., Xu R., Zhao Y., et al. Diffusion models: a comprehensive survey of methods and applications. ACM Comput Surv. 2023;56:1–39.
- 117. Liu Y.X., Xu C.C., Yang X.Y., Zhang Y.M., Chen Y.D., Liu H.C. Application progress of deep generative models in de novo drug design. Mol Divers. 2024;28:2411–2427. doi: 10.1007/s11030-024-10942-5.
- 118. Sanchez-Lengeling B., Aspuru-Guzik A. Inverse molecular design using machine learning: generative models for matter engineering. Science. 2018;361:360–365. doi: 10.1126/science.aat2663.
- 119. Tong X.C., Liu X.H., Tan X.Q., Li X.T., Jiang J.X., Xiong Z.P., et al. Generative models for de novo drug design. J Med Chem. 2021;64:14011–14027. doi: 10.1021/acs.jmedchem.1c00927.
- 120. McDonald S.M., Augustine E.K., Lanners Q., Rudin C., Brinson L.C., Becker M.L. Applied machine learning as a driver for polymeric biomaterials design. Nat Commun. 2023;14:4838. doi: 10.1038/s41467-023-40459-8.
- 121. Bhowmik D., Zhang P., Fox Z., Irle S., Gounley J. Enhancing molecular design efficiency: uniting language models and generative networks with genetic algorithms. Patterns. 2024;5. doi: 10.1016/j.patter.2024.100947.
- 122. Yue T.L., Tao L., Varshney V., Li Y. Benchmarking study of deep generative models for inverse polymer design. Digit Discov. 2025;4:910–926.
- 123. Liu D.F., Zhang Y.X., Dong W.Z., Feng Q.K., Zhong S.L., Dang Z.M. High-temperature polymer dielectrics designed using an invertible molecular graph generative model. J Chem Inf Model. 2023;63:7669–7675. doi: 10.1021/acs.jcim.3c01572.
- 124. Elbadawi M., Li H.X., Sun S.Y., Alkahtani M.E., Basit A.W., Gaisford S. Artificial intelligence generates novel 3D printing formulations. Appl Mater Today. 2024;36.
- 125. Korshunova M., Huang N., Capuzzi S., Radchenko D.S., Savych O., Moroz Y.S., et al. Generative and reinforcement learning approaches for the automated de novo design of bioactive compounds. Commun Chem. 2022;5:1–11. doi: 10.1038/s42004-022-00733-0.
- 126. Dong J., Gao H.L., Ouyang D.F. PharmSD: a novel AI-based computational platform for solid dispersion formulation design. Int J Pharm. 2021;604. doi: 10.1016/j.ijpharm.2021.120705.
- 127. Wu Z., Wang N.N., Ye Z.Y.F., Xu H.L., Chan G., Ouyang D.F. FormulationBCS: a machine learning platform based on diverse molecular representations for biopharmaceutical classification system (BCS) class prediction. Mol Pharm. 2025;22:330–342. doi: 10.1021/acs.molpharmaceut.4c00946.
- 128. Wang N.N., Zhang Y.S., Wang W., Ye Z.Y.F., Chen H.Y., Hu G.H., et al. How can machine learning and multiscale modeling benefit ocular drug development? Adv Drug Deliv Rev. 2023;196. doi: 10.1016/j.addr.2023.114772.
- 129. Alsharef A., Aggarwal K., Sonia, Kumar M., Mishra A. Review of ML and AutoML solutions to forecast time-series data. Arch Comput Methods Eng. 2022;29:5297–5311. doi: 10.1007/s11831-022-09765-0.
- 130. Han R., Xiong H., Ye Z.Y.F., Yang Y.L., Huang T.H., Jing Q.F., et al. Predicting physical stability of solid dispersions by machine learning techniques. J Control Release. 2019;311–312:16–25. doi: 10.1016/j.jconrel.2019.08.030.
- 131. Swain M.C., Cole J.M. ChemDataExtractor: a toolkit for automated extraction of chemical information from the scientific literature. J Chem Inf Model. 2016;56:1894–1904. doi: 10.1021/acs.jcim.6b00207.
- 132. Mavračić J., Court C.J., Isazawa T., Elliott S.R., Cole J.M. ChemDataExtractor 2.0: autopopulated ontologies for materials science. J Chem Inf Model. 2021;61:4280–4289. doi: 10.1021/acs.jcim.1c00446.
- 133. Stekhoven D.J., Bühlmann P. MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics. 2012;28:112–118. doi: 10.1093/bioinformatics/btr597.
- 134. Dong Y.R., Peng C.Y. Principled missing data methods for researchers. SpringerPlus. 2013;2:222. doi: 10.1186/2193-1801-2-222.
- 135. Yap C.W. PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints. J Comput Chem. 2011;32:1466–1474. doi: 10.1002/jcc.21707.
- 136. Moriwaki H., Tian Y.S., Kawashita N., Takagi T. Mordred: a molecular descriptor calculator. J Cheminf. 2018;10:4. doi: 10.1186/s13321-018-0258-y.
- 137. Landrum G., Tosco P., Kelley B., Rodriguez R., Cosgrove D., Vianello R., et al. 2024. RDKit: open-source cheminformatics. Available from: https://www.rdkit.org.
- 138. Hotelling H. Analysis of a complex of statistical variables into principal components. J Educ Psychol. 1933;24:417–441.
- 139. van der Maaten L., Hinton G. Visualizing data using t-SNE. J Mach Learn Res. 2008;9:2579–2605.
- 140. McInnes L., Healy J., Saul N., Großberger L. UMAP: uniform manifold approximation and projection. J Open Source Softw. 2018;3:861.
- 141. Krawczyk B. Learning from imbalanced data: open challenges and future directions. Prog Artif Intell. 2016;5:221–232.
- 142. Chawla N.V., Bowyer K.W., Hall L.O., Kegelmeyer W.P. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res. 2002;16:321–357.
- 143. Shorten C., Khoshgoftaar T.M. A survey on image data augmentation for deep learning. J Big Data. 2019;6:60. doi: 10.1186/s40537-019-0197-0.
- 144. Bjerrum E.J. SMILES enumeration as data augmentation for neural network modeling of molecules. arXiv:1703.07076 [Preprint]. 2017. Available from: https://arxiv.org/abs/1703.07076.
- 145. Cawley G.C., Talbot N.L.C. On over-fitting in model selection and subsequent selection bias in performance evaluation. J Mach Learn Res. 2010;11:2079–2107.
- 146. Browne M.W. Cross-validation methods. J Math Psychol. 2000;44:108–132. doi: 10.1006/jmps.1999.1279.
- 147. Deng S.W., Wu Y.Y., Ye Z.Y.F., Ouyang D.F. In silico prediction of metabolic stability for ester-containing molecules: machine learning and quantum mechanical methods. Chemometr Intell Lab Syst. 2025;257.
- 148. Bischl B., Binder M., Lang M., Pielok T., Richter J., Coors S., et al. Hyperparameter optimization: foundations, algorithms, best practices, and open challenges. WIREs Data Min Knowl. 2023;13.
- 149. Vamathevan J., Clark D., Czodrowski P., Dunham I., Ferran E., Lee G., et al. Applications of machine learning in drug discovery and development. Nat Rev Drug Discov. 2019;18:463–477. doi: 10.1038/s41573-019-0024-5.
- 150. Cao Y., Geddes T.A., Yang J.Y.H., Yang P.Y. Ensemble deep learning in bioinformatics. Nat Mach Intell. 2020;2:500–508.
- 151. Lundberg S.M., Lee S.-I. Proceedings of the 31st International Conference on neural information processing systems. Curran Associates Inc.; Red Hook, NY, USA: 2017. A unified approach to interpreting model predictions; pp. 4768–4777.
- 152. Ribeiro M.T., Singh S., Guestrin C. Proceedings of the 22nd ACM SIGKDD International Conference on knowledge discovery and data mining. Association for Computing Machinery; New York, USA: 2016. “Why should I trust you?”: explaining the predictions of any classifier; pp. 1135–1144.
- 153. Mendes B.B., Zhang Z.L., Conniot J., Sousa D.P., Ravasco J.M.J.M., Onweller L.A., et al. A large-scale machine learning analysis of inorganic nanoparticles in preclinical cancer research. Nat Nanotechnol. 2024;19:867–878. doi: 10.1038/s41565-024-01673-7.
- 154. Jiménez-Luna J., Grisoni F., Schneider G. Drug discovery with explainable artificial intelligence. Nat Mach Intell. 2020;2:573–584.
- 155. Hüllermeier E., Waegeman W. Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods. Mach Learn. 2021;110:457–506.
- 156. Clyde M., George E.I. Model uncertainty. Stat Sci. 2004;19:81–94.
- 157. Luo R.Q., Sun L.A., Xia Y.C., Qin T., Zhang S., Poon H.F., et al. BioGPT: generative pre-trained transformer for biomedical text generation and mining. Briefings Bioinf. 2022;23. doi: 10.1093/bib/bbac409.
- 158. Gu Y., Tinn R., Cheng H., Lucas M., Usuyama N.T., Liu X.D., et al. Domain-specific language model pretraining for biomedical natural language processing. ACM Trans Comput Healthc. 2022;3:1–23.
- 159. Ross J., Belgodere B., Chenthamarakshan V., Padhi I., Mroueh Y., Das P. Large-scale chemical language representations capture molecular structure and properties. Nat Mach Intell. 2022;4:1256–1264.
- 160. Chithrananda S., Grand G., Ramsundar B. ChemBERTa: large-scale self-supervised pretraining for molecular property prediction. 2020. Available from: https://arxiv.org/abs/2010.09885.
- 161. Jablonka K.M., Schwaller P., Ortega-Guerrero A., Smit B. Leveraging large language models for predictive chemistry. Nat Mach Intell. 2024;6:161–169.
- 162. Wang C., Fan H.H., Quan R.J., Yao L., Yang Y. Proceedings of the 48th International ACM SIGIR Conference on research and development in information Retrieval. Association for Computing Machinery; New York: 2025. ProtChatGPT: towards understanding proteins with hybrid representation and large language models; pp. 1076–1086.
- 163. Wei J.H., Zhuo L.L., Fu X.Z., Zeng X.X., Wang L., Zou Q., et al. DrugReAlign: a multisource prompt framework for drug repurposing based on large language models. BMC Biol. 2024;22:226. doi: 10.1186/s12915-024-02028-3.
- 164. Ronquillo J.G., Ye J., Gorman D., Lemeshow A.R., Watt S.J. Practical aspects of using large language models to screen abstracts for cardiovascular drug development: cross-sectional study. JMIR Med Inform. 2024;12. doi: 10.2196/64143.
- 165. Hu J.J., Wu P., Li Y.L., Li Q., Wang S.Y., Liu Y., et al. Discovering photoswitchable molecules for drug delivery with large language models and chemist instruction training. Pharmaceuticals. 2024;17:1300. doi: 10.3390/ph17101300.
- 166. Wu L.H., Xu J., Thakkar S., Gray M., Qu Y.Y., Li D.Y., et al. A framework enabling LLMs into regulatory environment for transparency and trustworthiness and its application to drug labeling document. Regul Toxicol Pharmacol. 2024;149. doi: 10.1016/j.yrtph.2024.105613.
- 167. Huang L., Yu W.J., Ma W.T., Zhong W.H., Feng Z.Y., Wang H.T., et al. A survey on hallucination in large language models: principles, taxonomy, challenges, and open questions. ACM Trans Inf Syst. 2024;43:1–55.
- 168. Farquhar S., Kossen J., Kuhn L., Gal Y. Detecting hallucinations in large language models using semantic entropy. Nature. 2024;630:625–630. doi: 10.1038/s41586-024-07421-0.
- 169. Ouyang D.F. John Wiley & Sons; Hoboken: 2015. Computational pharmaceutics: application of molecular modeling in drug delivery.
- 170. Ouyang D.F. John Wiley & Sons; Hoboken: 2024. Exploring computational pharmaceutics: AI and modeling in Pharma 4.0.
- 171. Silva-Júnior E.F., Aquino T.M., Araújo-Júnior J.X. Quantum mechanical (QM) calculations applied to ADMET drug prediction: a review. Curr Drug Metabol. 2017;18:511–526. doi: 10.2174/1389200218666170316094514.
- 172. Casalini T. Not only in silico drug discovery: molecular modeling towards in silico drug delivery formulations. J Control Release. 2021;332:390–417. doi: 10.1016/j.jconrel.2021.03.005.
- 173. Wang W., Ouyang D.F. Opportunities and challenges of physiologically based pharmacokinetic modeling in drug delivery. Drug Discov Today. 2022;27:2100–2120. doi: 10.1016/j.drudis.2022.04.015.
- 174. Yeom S.B., Ha E.S., Kim M.S., Jeong S.H., Hwang S.J., Choi D.H. Application of the discrete element method for manufacturing process simulation in the pharmaceutical industry. Pharmaceutics. 2019;11:414. doi: 10.3390/pharmaceutics11080414.
- 175. Zhuang X.M., Lu C. PBPK modeling and simulation in drug research and development. Acta Pharm Sin B. 2016;6:430–440. doi: 10.1016/j.apsb.2016.04.004.
- 176. Danishuddin, Kumar V., Faheem M., Lee K.W. A decade of machine learning-based predictive models for human pharmacokinetics: advances and challenges. Drug Discov Today. 2022;27:529–537. doi: 10.1016/j.drudis.2021.09.013.
- 177. Wu K.H., Li X., Zhou Z., Zhao Y.N., Su M., Cheng Z., et al. Predicting pharmacodynamic effects through early drug discovery with artificial intelligence-physiologically based pharmacokinetic (AI-PBPK) modelling. Front Pharmacol. 2024;15. doi: 10.3389/fphar.2024.1330855.
- 178. Wang W., Wang N.N., Wu Y.Y., Ye Z.Y.F., Zhao L., Chen X.F., et al. An integrated AI-PBPK platform for predicting drug in vivo fate and tissue distribution in human and inter-species extrapolation. Clin Pharmacol Ther. 2025. doi: 10.1002/cpt.3732.
- 179. Zhuang X., Lu C. PBPK modeling and simulation in drug research and development. Acta Pharm Sin B. 2016;6:430–440. doi: 10.1016/j.apsb.2016.04.004.
- 180. Deepika D., Kumar V. The role of “physiologically based pharmacokinetic model (PBPK)” new approach methodology (NAM) in pharmaceuticals and environmental chemical risk assessment. Int J Environ Res Publ Health. 2023;20:3473. doi: 10.3390/ijerph20043473.
- 181. Liu B., Li X. In: Exploring computational pharmaceutics: AI and modeling in Pharma 4.0. Ouyang D.F., editor. John Wiley & Sons; Hoboken: 2024. Application of PBPK modeling in formulation development; pp. 474–492.
- 182. Handa K., Yoshimura S., Kageyama M., Iijima T. Development of novel methods for QSAR modeling by machine learning repeatedly: a case study on drug distribution to each tissue. J Chem Inf Model. 2024;64:3662–3669. doi: 10.1021/acs.jcim.4c00046.
- 183. Chou W.C., Lin Z.M. Machine learning and artificial intelligence in physiologically based pharmacokinetic modeling. Toxicol Sci. 2023;191:1–14. doi: 10.1093/toxsci/kfac101.
- 184. Zhao P., Zhang L., Grillo J.A., Liu Q., Bullock J.M., Moon Y.J., et al. Applications of physiologically based pharmacokinetic (PBPK) modeling and simulation during regulatory review. Clin Pharmacol Ther. 2011;89:259–267. doi: 10.1038/clpt.2010.298.
- 185. Luzon E., Blake K., Cole S., Nordmark A., Versantvoort C., Berglund E.G. Physiologically based pharmacokinetic modeling in regulatory decision-making at the European Medicines Agency. Clin Pharmacol Ther. 2017;102:98–105. doi: 10.1002/cpt.539.
- 186. Meek M.E., Barton H.A., Bessems J.G., Lipscomb J.C., Krishnan K. Case study illustrating the WHO IPCS guidance on characterization and application of physiologically based pharmacokinetic models in risk assessment. Regul Toxicol Pharmacol. 2013;66:116–129. doi: 10.1016/j.yrtph.2013.03.005.
- 187. Danishuddin, Kumar V., Faheem M., Lee K.W. A decade of machine learning-based predictive models for human pharmacokinetics: advances and challenges. Drug Discov Today. 2022;27:529–537. doi: 10.1016/j.drudis.2021.09.013.
- 188. Frechen S., Solodenko J., Wendl T., Dallmann A., Ince I., Lehr T., et al. A generic framework for the physiologically-based pharmacokinetic platform qualification of PK-Sim and its application to predicting cytochrome P450 3A4–mediated drug–drug interactions. CPT Pharmacometrics Syst Pharmacol. 2021;10:633–644. doi: 10.1002/psp4.12636.
- 189. Wang R.F., Zhang Y.S., Zhong H., Zang J.Y., Wang W., Cheng H., et al. Understanding the self-assembly and molecular structure of mRNA lipid nanoparticles at real size: insights from the ultra-large-scale simulation. Int J Pharm. 2025;670. doi: 10.1016/j.ijpharm.2024.125114.
- 190. Zhong H., Lu T.S., Wang R.F., Ouyang D.F. Quantitative analysis of physical stability mechanisms of amorphous solid dispersions by molecular dynamic simulation. AAPS J. 2024;27:9. doi: 10.1208/s12248-024-01001-w.
- 191. Wang R.F., Zhu W., Dang M.Z., Deng X.Y., Shi X., Zhang Y.J., et al. Targeting lipid rafts as a rapid screening strategy for potential antiadipogenic polyphenols along with the structure–activity relationship and mechanism elucidation. J Agric Food Chem. 2022;70:3872–3885. doi: 10.1021/acs.jafc.2c00444.
- 192. Bunker A., Róg T. Understanding from molecular dynamics simulation in pharmaceutical research 1: drug delivery. Front Mol Biosci. 2020;7. doi: 10.3389/fmolb.2020.604770.
- 193. Tran D.P., Tada S., Yumoto A., Kitao A., Ito Y., Uzawa T., et al. Using molecular dynamics simulations to prioritize and understand AI-generated cell penetrating peptides. Sci Rep. 2021;11. doi: 10.1038/s41598-021-90245-z.
- 194. Wang R.F., Zang J.Y., Zhong H., Zhang Y.S., Ouyang D.F. Understanding the molecular insights of marketed liposomal drugs using molecular dynamics simulations of reduced scale coarse-grained models. Npj Drug Discov. 2025;2:11.
- 195. Gholap A.D., Uddin M.J., Faiyazuddin M., Omri A., Gowri S., Khalid M. Advances in artificial intelligence for drug delivery and development: a comprehensive review. Comput Biol Med. 2024;178. doi: 10.1016/j.compbiomed.2024.108702.
- 196. Agu P.C., Obulose C.N. Piquing artificial intelligence towards drug discovery: tools, techniques, and applications. Drug Dev Res. 2024;85. doi: 10.1002/ddr.22159.
- 197. Serrano D.R., Luciano F.C., Anaya B.J., Ongoren B., Kara A., Molina G., et al. Artificial intelligence (AI) applications in drug discovery and drug delivery: revolutionizing personalized medicine. Pharmaceutics. 2024;16:1328. doi: 10.3390/pharmaceutics16101328.
- 198.Bikku T., Malligunta K.K., Thota S., Surapaneni P.P. Improved quantum algorithm: a crucial stepping stone in quantum-powered drug discovery. J Electron Mater. 2025;54:3434–3443. [Google Scholar]
- 199.Kar R.K. Benefits of hybrid QM/MM over traditional classical mechanics in pharmaceutical systems. Drug Discov Today. 2023;28 doi: 10.1016/j.drudis.2022.103374. [DOI] [PubMed] [Google Scholar]
- 200.Raghavan B., Paulikat M., Ahmad K., Callea L., Rizzi A., Ippoliti E., et al. Drug design in the exascale era: a perspective from massively parallel QM/MM simulations. J Chem Inf Model. 2023;63:3647–3658. doi: 10.1021/acs.jcim.3c00557. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 201.Tom G., Schmid S.P., Baird S.G., Cao Y., Darvish K., Hao H., et al. Self-driving laboratories for chemistry and materials science. Chem Rev. 2024;124:9633–9732. doi: 10.1021/acs.chemrev.4c00055. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 202.Bayley O., Savino E., Slattery A., Noël T. Autonomous chemistry: navigating self-driving labs in chemical and material sciences. Matter. 2024;7:2382–2398. [Google Scholar]
- 203.Putz S., Döttling J., Ballweg T., Tschöpe A., Biniyaminov V., Franzreb M. Self-driving lab for solid-phase extraction process optimization and application to nucleic acid purification. Adv Intell Syst. 2025;7 [Google Scholar]
- 204.Bai J.R., Mosbach S., Taylor C.J., Karan D., Lee K.F., Rihm S.D., et al. A dynamic knowledge graph approach to distributed self-driving laboratories. Nat Commun. 2024;15:462. doi: 10.1038/s41467-023-44599-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 205.Hysmith H., Foadian E., Padhy S.P., Kalinin S.V., Moore R.G., Ovchinnikova O.S., et al. The future of self-driving laboratories: from human in the loop interactive AI to gamification. Dig Dis. 2024;3:621–636. [Google Scholar]
- 206.MacLeod B.P., Parlane F.G.L., Berlinguette C.P. How to build an effective self-driving laboratory. MRS Bull. 2023;48:173–178. [Google Scholar]
- 207.Hu J.W., Zhang L., Wu C.Y., Ouyang D.F. Exploring computational pharmaceutics―AI and modeling in Pharma 4.0. John Wiley & Sons; Hoboken: 2024. Computational modeling of dry powder inhalation; pp. 85–103. [Google Scholar]
- 208.Hu J.W., Li W., Zhang L., Wu C.Y. A microscopic diffusion-induced discrete element model for swellable particles. Chem Eng J. 2023;464 [Google Scholar]
- 209.Hu J.W., Li W., Zhang L., Tan Y.Q., Wu C.Y. DEM-CFD analysis of swelling behaviors of binary particle systems with a microscopic diffusion model. AIChE J. 2024;70 [Google Scholar]
- 210.Tong Z.B., Zheng B., Yang R.Y., Yu A.B., Chan H.K. CFD-DEM investigation of the dispersion mechanisms in commercial dry powder inhalers. Powder Technol. 2013;240:19–24. [Google Scholar]
- 211.Zheng C., Govender N., Zhang L., Wu C.Y. GPU-enhanced DEM analysis of flow behaviour of irregularly shaped particles in a full-scale twin screw granulator. Particuology. 2022;61:30–40. [Google Scholar]
- 212.Jiang Y., Byrne E., Glassey J., Chen X.Z. Integrating graph neural network-based surrogate modeling with inverse design for granular flows. Ind Eng Chem Res. 2024;63:9225–9235. [Google Scholar]
- 213.Chen X.Z., Liu K., Wang L.G., Li L., Luo Z.H. In: Exploring computational pharmaceutics―AI and modeling in Pharma 4.0. Ouyang D.F., editor. John Wiley & Sons; Hoboken: 2024. Multiscale models for tablet manufacturing process development; pp. 493–516. [Google Scholar]
- 214.Li R.J., Miao H., Zhou X.D., Zou R.P., Tong Z.B. In: Exploring computational pharmaceutics―AI and modeling in Pharma 4.0. Ouyang D.F., editor. John Wiley & Sons; Hoboken: 2024. Artificial intelligence and computational modeling in orally inhaled drugs; pp. 379–407. [Google Scholar]
- 215.Yang Y.C., Gengji J.J., Gong T., Zhang Z.R., Deng L. Time-lapse macro imaging with dissolution tests for exploring the interrelationship between disintegration and dissolution behaviors of solid dosages. Pharm Res. 2024;41:387–400. doi: 10.1007/s11095-024-03655-9. [DOI] [PubMed] [Google Scholar]
- 216.Rehm H.L., Page A.J.H., Smith L., Adams J.B., Alterovitz G., Babb L.J., et al. GA4GH: international policies and standards for data sharing across genomic research and healthcare. Cell Genom. 2021;1 doi: 10.1016/j.xgen.2021.100029. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 217.Otto B. A federated infrastructure for European data spaces. Commun ACM. 2022;65:44–45. [Google Scholar]
- 218.Agus D.B., Jaffee E.M., Van Dang C. Cancer Moonshot 2.0. Lancet Oncol. 2021;22:164–165. doi: 10.1016/S1470-2045(21)00003-6. [DOI] [PubMed] [Google Scholar]
- 219.Artrith N., Butler K.T., Coudert F.X., Han S., Isayev O., Jain A., et al. Best practices in machine learning for chemistry. Nat Chem. 2021;13:505–508. doi: 10.1038/s41557-021-00716-z. [DOI] [PubMed] [Google Scholar]
- 220.Greener J.G., Kandathil S.M., Moffat L., Jones D.T. A guide to machine learning for biologists. Nat Rev Mol Cell Biol. 2022;23:40–55. doi: 10.1038/s41580-021-00407-0. [DOI] [PubMed] [Google Scholar]
- 221.Al-Rubaie M., Chang J.M. Privacy-preserving machine learning: threats and solutions. IEEE S&P. 2019;17:49–58. [Google Scholar]
- 222.Yu L.X., Raw A., Wu L., Capacci-Daniel C., Zhang Y., Rosencrance S. FDA's new pharmaceutical quality initiative: knowledge-aided assessment & structured applications. Int J Pharm X. 2019;1 doi: 10.1016/j.ijpx.2019.100010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 223.Sheller M.J., Edwards B., Reina G.A., Martin J., Pati S., Kotrotsou A., et al. Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data. Sci Rep. 2020;10 doi: 10.1038/s41598-020-69250-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 224.Shahiwala A. In: A Handbook of artificial intelligence in drug delivery. Philip A., Shahiwala A., Rashid M., Md Faiyazuddin, editors. Academic Press; Cambridge: 2023. Chapter 5 - AI approaches for the development of drug delivery systems; pp. 83–96. [Google Scholar]
- 225.Raijada D., Wac K., Greisen E., Rantanen J., Genina N. Integration of personalized drug delivery systems into digital health. Adv Drug Deliv Rev. 2021;176 doi: 10.1016/j.addr.2021.113857. [DOI] [PubMed] [Google Scholar]
- 226.Jayatunga M.K.P., Xie W., Ruder L., Schulze U., Meier C. AI in small-molecule drug discovery: a coming wave?. Nat Rev Drug Discov. 2022;21:175–176. doi: 10.1038/d41573-022-00025-1. [DOI] [PubMed] [Google Scholar]
- 227.Harrer S., Shah P., Antony B., Hu J.Y. Artificial intelligence for clinical trial design. Trends Pharmacol Sci. 2019;40:577–591. doi: 10.1016/j.tips.2019.05.005. [DOI] [PubMed] [Google Scholar]
- 228.Transparency for machine learning-enabled medical devices: guiding principles. https://www.fda.gov/medical-devices/software-medical-device-samd/transparency-machine-learning-enabled-medical-devices-guiding-principles Available from:
- 229.Mazumdar H., Khondakar K.R., Das S., Halder A., Kaushik A. Artificial intelligence for personalized nanomedicine; from material selection to patient outcomes. Expet Opin Drug Deliv. 2025;22:85–108. doi: 10.1080/17425247.2024.2440618. [DOI] [PubMed] [Google Scholar]






