Yonago Acta Medica. 2026 Feb 15;69(1):1–13. doi: 10.33160/yam.2026.02.001

Next-Generation Artificial Intelligence for ADME Prediction in Drug Discovery: From Small Molecules to Biologics

Soyoka Tanihata *, Hiroaki Iwata *
PMCID: PMC12910220  PMID: 41709935

ABSTRACT

Pharmacokinetic (PK) behavior, which emerges from the underlying processes of absorption, distribution, metabolism, and excretion (ADME), is central to drug discovery and development, dose optimization, and safety assessment. Despite decades of experimental and computational research, early-stage prediction of human PK remains a major challenge, contributing to clinical attrition and inefficiency in pharmaceutical pipelines. Advances in artificial intelligence (AI) and machine learning (ML) have significantly improved ADME predictions, particularly for small molecules. Traditional descriptor-based quantitative structure–activity relationship and classical ML methods offer interpretability and robust performance on standardized datasets. In contrast, graph neural networks, deep learning architectures, and chemical language models facilitate the learning of complex nonlinear structure–property relationships and multitask predictions. Multimodal frameworks further integrate experimental measurements, structural data, and biological contexts, enhancing predictive accuracy under low-data and heterogeneous conditions. Emerging modalities, including peptides, oligonucleotides, and antibody-based therapeutics, pose additional challenges owing to their sequence-dependent stability, conformational flexibility, and mechanistically distinct determinants of ADME and toxicity (ADMET). AI approaches that incorporate sequence-, structure-, and mechanism-aware representations combined with multimodal data integration have demonstrated improved predictability for medium- and large-molecule therapeutics. Recent developments in foundation-model architectures offer unified representations across chemical, biological, and biophysical domains, enabling cross-modality ADMET modeling with enhanced generalization and mechanistic interpretability. 
In this review, we summarize the evolution of computational ADME- and PK-oriented prediction frameworks from small molecules to complex biologics, highlighting methodological advances, representative studies, and emerging trends in multimodal and foundation-model approaches. We also discuss the limitations and future perspectives of the practical implementation of AI-driven ADMET predictions to support rational drug design and development.

Keywords: ADME prediction, artificial intelligence, biologics, peptides, small molecules


Pharmacokinetics (PK) describes the system-level behavior of drugs in the body and arises from the underlying absorption, distribution, metabolism, and excretion (ADME) processes. Together, these processes determine drug exposure, therapeutic efficacy, and safety, making reliable PK characterization indispensable throughout all stages of drug development, from early discovery to clinical implementation. In particular, accurate prediction of human PK parameters, such as clearance, volume of distribution, and systemic exposure, is closely linked to clinical success rates and rational dose optimization. Although early-stage prediction of PK properties is essential, traditional experimental approaches entail high costs and long timelines and raise ethical concerns related to animal studies. Consequently, in silico methodologies that infer PK behavior by modeling ADME mechanisms from chemical and physicochemical information have emerged and become increasingly important.1, 2

Despite these technological advances, however, many drug candidates still fail in clinical development, often owing to limitations in early-stage human PK prediction.3, 4 Inadequate characterization of ADME and toxicity (ADMET)-related properties remains a leading cause of clinical attrition, underscoring the need for accurate and robust predictive technologies that improve R&D efficiency, reduce animal use, and enhance patient safety. In response, the pharmaceutical industry has increasingly adopted artificial intelligence (AI) and data science approaches, driving a broader movement toward “digital transformation” in drug discovery.5, 6

Historically, in silico ADME prediction originated from quantitative structure–activity relationship (QSAR) modeling developed in the 1960s and 1970s.7,8,9 QSAR approaches established statistical relationships between molecular descriptors and biological endpoints and became widely used owing to their simplicity and modest computational demands. Advances in regression modeling and multivariate analysis during the 1980s and 1990s enabled the handling of larger descriptor sets and facilitated broader application to ADME-related endpoints.10 Nonetheless, these early methods had limited ability to capture the nonlinear dependencies and complex molecular interactions underlying ADME behavior.11, 12

In parallel, physiologically based pharmacokinetic (PBPK) modeling, first conceptualized in 1937, advanced rapidly as computational resources improved in the late 1990s.13, 14 PBPK frameworks enable the mechanistic simulation of plasma concentration–time profiles, tissue distribution, and interspecies scaling, offering interpretability and extrapolation capabilities. However, their predictive accuracy is constrained by the availability and uncertainty of compound-specific and physiological parameters. In addition, the reliance on detailed mechanistic inputs limits flexibility, particularly when experimental data are incomplete or novel chemical structures are introduced.12

Similarly, traditional QSAR and other descriptor-based approaches are constrained by manually selected features and predefined model structures, often failing to capture the high-dimensional and nonlinear relationships governing ADME processes, such as membrane permeability, metabolic stability, and protein binding. Together, these limitations of PBPK and classical QSAR models have motivated the development of more flexible data-driven modeling strategies, where flexibility refers to the ability to learn representations directly from data without predefined descriptors or rigid model structures.

In contrast, modern AI-based methods, including deep learning and graph neural networks (GNNs), directly learn molecular features from structural representations, allowing for the modeling of highly nonlinear interactions and complex relationships intrinsic to ADME processes and the resulting PK behavior. Such nonlinear relationships often arise from complex, nonadditive interactions between molecular structures, physicochemical properties, and biological mechanisms, which are difficult to model using linear or manually engineered descriptor-based approaches. Recent advances in transformer architectures, self-supervised learning, and chemistry-oriented foundation models have further improved predictive performance, particularly in low-data regimes.5, 15,16,17 These developments have substantially improved the accuracy of PK-relevant predictions for small molecules and enabled the characterization of subtle structure–property relationships that were previously difficult to detect.

Beyond structural data, multimodal learning approaches that integrate experimental measurements, cellular or tissue images, and sequence information have expanded rapidly, allowing for a more comprehensive modeling of ADME mechanisms.18, 19 Recent studies, including ours, have demonstrated that multimodal integration and imputation of missing non-clinical measurements improve the utility of ADME and PK prediction in real-world decision-making contexts.20, 21

As ADME mechanisms and PK determinants vary across chemical classes, predictive modeling frameworks must evolve accordingly. Concurrently, the therapeutic landscape has broadened beyond small molecules to include peptides, cyclic peptides, oligonucleotides, and antibody-based modalities (Fig. 1).22, 23 This expansion has fundamentally altered the chemical space, molecular representation, and dominant ADMET determinants, necessitating modality-aware adaptations of AI-based models originally developed for small molecules. Extending QSAR- and AI-based methodologies across these emerging modalities represents a central challenge in next-generation ADMET prediction.24

Fig. 1. Overview of molecular modalities, key ADME challenges, and representative AI modeling approaches.

In this review, we summarize the advances in ADME- and PK-oriented modeling across small, medium, and large molecular modalities, highlighting the progress enabled by deep learning, multimodal integration, and chemistry-oriented foundation models. We also discuss opportunities to unify ADMET predictions across modalities, connect algorithmic strategies to the summary tables provided, and outline the remaining challenges and future prospects for practical implementation in drug discovery.

AI-BASED ADME PREDICTION FOR SMALL MOLECULES

Small molecules are the most extensively investigated class of molecules in PK and ADME studies, reflecting decades of in vitro, in vivo, and computational data. Consequently, AI- and machine learning (ML)-driven ADME predictions are the most mature in this chemical space, with numerous modeling approaches demonstrating both industrial relevance and regulatory impact. In this section, we summarize the major methodologies, representative studies, and emerging trends in small-molecule ADME modeling, with an explicit link to Table 1, which provides a comparative overview of the model types, input features, and reported performance.

Table 1. Summary of methodological trends and representative studies in AI-based ADME prediction for small molecules.

| Target for prediction | Dataset | Descriptor | Algorithm | Metrics | Reference |
|---|---|---|---|---|---|
| In vivo human intrinsic clearance (CLint) | 645 compounds with human intravenous PK data and corresponding in vitro measurements; 16 unpublished Pfizer compounds | Calculated physicochemical and structural descriptors based on chemical structure; in vitro experimental values; in silico–predicted parameters | XGBoost | AAFE; R² | Keefer C.E. et al., Mol. Pharm., 2023 |
| Small-molecule property prediction | 11 public datasets from MoleculeNet (regression: 3 tasks; classification: 8 tasks; single-task: 6; multi-task: 5); external test: HIV virtual screening | Molecular descriptors; PubChem fingerprints; substructure fingerprints; molecular graphs | Descriptor-based models: SVM, XGBoost, RF, DNN; graph-based models: GCN, GAT, MPNN, AttentiveFP | RMSE; ROC-AUC | Jiang D. et al., J. Cheminform., 2021 |
| Oral bioavailability (transfer learning) | Solubility dataset: 9,940 compounds; oral bioavailability dataset: 1,447 compounds | Graph structural representation | GNN with transfer learning | | Ng S.S.S. et al., J. Chem. Inf. Model., 2023 |
| Molecular property prediction (foundation model) | Over 91 million PubChem compounds | SMILES | SMI-TED289M family (encoder–decoder Transformer) | | Soares et al., Communications Chemistry, 2025 |
| Human clearance (CLtot) | 748 compounds with human clearance data | Molecular graphs | Deep Tensor model | GMFE | Iwata et al., J. Pharm. Sci., 2021 |

Traditional descriptor-based ML/QSAR

Historically, QSAR models have relied on curated molecular descriptors and classical ML algorithms such as random forests, gradient boosting (e.g., XGBoost), support vector machines, and fully connected neural networks. These models have been widely applied to predict clearance, permeability, tissue distribution, enzyme inhibition, and plasma protein binding.

Advantages:

  • High interpretability

  • Strong performance in well-standardized datasets

Limitations:

  • Dependence on handcrafted descriptors

  • Difficulty in capturing nonlinear interactions and higher-order structural features

Since these models rely on predefined descriptors and fixed feature spaces, their predictive performance strongly depends on prior chemical knowledge and is often degraded when extrapolated to novel chemical scaffolds or sparsely sampled regions of the chemical space.

Representative studies have demonstrated both the strengths and the limitations of these approaches. Keefer et al. compared mechanistic in vitro to in vivo extrapolation (IVIVE) with an XGBoost-based IVIVE model using 645 compounds. Integrating structural descriptors, in vitro measurements, and ML-predicted in vivo-like parameters improved predictive accuracy (AAFE 2.5 vs. 2.8 for mechanistic IVIVE; lower is better).25 (See Table 1 for the performance metrics.)
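AAFE, the metric quoted above, is commonly defined as 10 raised to the mean absolute log10 ratio of predicted to observed values, so an AAFE of 2.5 means predictions deviate from observations by about 2.5-fold on average; the GMFE reported for the Deep Tensor clearance model in Table 1 is usually computed with the same formula. The minimal sketch below assumes that definition (individual studies may differ in details). The function names are illustrative, and the two-fold accuracy helper is an extra metric often reported alongside AAFE, not one taken from the cited study.

```python
import math
from typing import Sequence


def aafe(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Absolute average fold error: 10 ** mean(|log10(predicted / observed)|).

    1.0 is perfect agreement; 2.0 means predictions are off by two-fold
    on average (in either direction). GMFE is commonly computed the same way.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(p <= 0 or o <= 0 for p, o in zip(predicted, observed)):
        raise ValueError("fold errors require strictly positive values")
    log_ratios = [abs(math.log10(p / o)) for p, o in zip(predicted, observed)]
    return 10 ** (sum(log_ratios) / len(log_ratios))


def fraction_within_fold(predicted: Sequence[float], observed: Sequence[float],
                         fold: float = 2.0) -> float:
    """Share of compounds whose prediction lies within `fold`-fold of observation."""
    hits = sum(1 for p, o in zip(predicted, observed) if 1 / fold <= p / o <= fold)
    return hits / len(observed)


# Hypothetical clearance values (mL/min/kg) for three compounds.
obs = [10.0, 10.0, 10.0]
pred = [10.0, 20.0, 5.0]
print(round(aafe(pred, obs), 3))        # → 1.587
print(fraction_within_fold(pred, obs))  # → 1.0
```

Because the log ratio is taken in absolute value, a two-fold over-prediction and a two-fold under-prediction are penalized equally instead of cancelling out.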

Graph neural networks

The advent of GNNs, including message-passing neural networks, graph convolutional networks (GCNs), graph attention networks (GATs), and hybrid architectures such as AttentiveFP, has marked a major step forward. By learning directly from molecular topologies, GNNs capture higher-order structural features, such as local stereochemistry, extended conjugation, and subgraph motifs, which are typically inaccessible to descriptor-based methods. This end-to-end representation learning enables GNNs to model nonadditive and context-dependent structure–property relationships, providing greater flexibility than descriptor-based QSAR models.
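The message-passing idea behind these architectures can be illustrated with a deliberately tiny, dependency-free sketch: each atom repeatedly aggregates its neighbors' states, applies a shared update, and a sum readout produces one molecule-level vector. The two-dimensional atom features, the fixed weight matrices, and the function names are hypothetical stand-ins; real GNNs learn the weights by backpropagation, use richer atom and bond features, and (as in GAT or AttentiveFP) may weight neighbors with attention rather than a plain sum.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def mpnn_embed(node_feats: Sequence[Sequence[float]],
               edges: Sequence[Tuple[int, int]],
               w_self: Matrix, w_neigh: Matrix, steps: int = 2) -> Vector:
    """Sum-aggregation message passing followed by a sum-pooling readout."""
    n = len(node_feats)
    neighbors: List[List[int]] = [[] for _ in range(n)]
    for a, b in edges:  # chemical bonds are undirected
        neighbors[a].append(b)
        neighbors[b].append(a)

    h = [list(f) for f in node_feats]
    for _ in range(steps):
        updated = []
        for i in range(n):
            # Message: sum of neighbor states (independent of atom ordering).
            msg = [0.0] * len(h[i])
            for j in neighbors[i]:
                msg = [m + v for m, v in zip(msg, h[j])]
            # Update: ReLU(W_self h_i + W_neigh m_i), shared across all atoms.
            pre = [s + t for s, t in zip(matvec(w_self, h[i]), matvec(w_neigh, msg))]
            updated.append([max(0.0, v) for v in pre])
        h = updated

    # Readout: summing over atoms gives a fixed-length molecular vector.
    return [sum(column) for column in zip(*h)]


# Hypothetical toy weights; ethanol heavy atoms C0-C1-O2, features [is_carbon, is_oxygen].
W_SELF = [[0.5, 0.0], [0.0, 0.5]]
W_NEIGH = [[0.2, 0.1], [0.1, 0.2]]
ethanol = mpnn_embed([[1, 0], [1, 0], [0, 1]], [(0, 1), (1, 2)], W_SELF, W_NEIGH)
print(ethanol)
```

Because both aggregation and readout are sums, relabeling the atoms leaves the molecular vector unchanged (up to floating-point rounding); this permutation invariance is what lets GNNs operate on molecules without a canonical atom order.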

Benchmark studies have shown that while classical descriptor-based models, such as SVMs, often outperform GNNs on individual tasks, GNNs demonstrate clear advantages in large-scale, multitask, and foundation-model-like settings. Jiang et al. have systematically evaluated 11 MoleculeNet datasets spanning physicochemical, biophysical, and ADME endpoints and reported that GNN performance surpassed classical methods when models were trained in shared representation and multitask regimes rather than single-task settings.17 This finding informs model selection strategies in industrial pipelines and underscores the importance of scalable representation learning frameworks for ADME prediction. (Refer to Table 1 for GNN-based model examples and dataset coverage.)

Deep learning and transformer-based models

Recent developments have extended representation learning to deep learning and transformer-based architectures, including chemical language models (such as SMILES-based models), 3D conformer ensembles, and geometric deep learning approaches. These models capture the stereoelectronic and spatial determinants of ADME properties, such as passive permeability, transporter affinity, and ligand–protein interaction profiles, even under low-data conditions. By leveraging large-scale pre-training and contextualized molecular embeddings, transformer-based models can improve data efficiency and generalization across diverse ADME endpoints.
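Chemical language models operate on token sequences rather than graphs, so the first step is to split a SMILES string into chemically meaningful tokens (for example, keeping `Cl`, `Br`, and bracketed atoms such as `[NH4+]` intact). The sketch below uses a regular expression of the kind widely used for SMILES-based transformers (popularized by the Molecular Transformer work); it is an assumption for illustration, not the tokenizer of SMI-TED289M or any other model discussed here, and the special tokens `<bos>`, `<eos>`, and `<unk>` are likewise hypothetical.

```python
import re
from typing import Dict, List

# Regex of the kind commonly used to tokenize SMILES for chemical language models
# (illustrative; individual models ship their own tokenizers).
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> List[str]:
    tokens = SMILES_REGEX.findall(smiles)
    # Guard against characters the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to integer IDs, wrapping the sequence in start/end markers."""
    unk = vocab["<unk>"]
    return [vocab["<bos>"]] + [vocab.get(t, unk) for t in tokens] + [vocab["<eos>"]]


aspirin = tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O")
vocab = {"<bos>": 0, "<eos>": 1, "<unk>": 2}  # hypothetical special tokens
for tok in aspirin:
    vocab.setdefault(tok, len(vocab))
ids = encode(aspirin, vocab)
```

The resulting integer sequence is what a transformer embeds and contextualizes during pre-training; the reversibility check guards against silently dropping characters the pattern does not recognize.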

Transfer learning has proven particularly effective. Ng et al. have pre-trained a GNN on ~10,000 solubility data points and fine-tuned it for oral bioavailability prediction using only 1,447 compounds, substantially improving reproducibility.26 Similarly, Soares et al. introduced SMI-TED289M, a chemical language model trained on 91 million SMILES strings, exemplifying foundation-model architectures capable of general-purpose molecular embeddings transferable across ADME endpoints.27 (Table 1 summarizes the models and their input features.)

Multimodal learning

Emerging multimodal learning frameworks integrate molecular structures with experimental data, including microsomal stability, permeability, plasma protein binding, and non-clinical measurements across species. This approach enables unified learning across chemistry and biology, thereby improving the prediction of complex endpoints such as hepatic clearance and oral absorption. Such integration allows models to disentangle multiple ADME-determining factors simultaneously, thus reducing the uncertainty caused by data sparsity, inter-laboratory variability, and species differences that limit single-modality approaches.
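One simple way to realize this kind of integration is early fusion: concatenating a structural representation with experimental measurements into a single input vector while recording which measurements are missing, so that a downstream model can still learn from incomplete records. The sketch below assumes precomputed fingerprint bits, hypothetical assay names, and mean imputation with binary missingness flags; it illustrates the data layout only and is not the imputation method of the studies cited in this section.

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical assay channels; a real pipeline would use its own endpoints.
ASSAYS = ("microsomal_stability", "permeability", "fraction_unbound")


def fuse_features(fingerprint: Sequence[int],
                  measurements: Dict[str, Optional[float]],
                  training_means: Dict[str, float]) -> List[float]:
    """Concatenate structure bits with assay values plus missingness flags."""
    features = [float(bit) for bit in fingerprint]
    for assay in ASSAYS:
        value = measurements.get(assay)
        missing = value is None or math.isnan(value)
        # Mean-impute missing values so every compound gets a fixed-length vector...
        features.append(training_means[assay] if missing else float(value))
        # ...and flag the imputation so the model can learn to discount it.
        features.append(1.0 if missing else 0.0)
    return features


# Hypothetical training-set means and one partially characterized compound.
means = {"microsomal_stability": 0.5, "permeability": 1.2, "fraction_unbound": 0.1}
x = fuse_features([1, 0, 1], {"microsomal_stability": 0.8, "permeability": None}, means)
```

Keeping the missingness flag next to each imputed value lets a model distinguish "measured at the average" from "not measured", which matters when data gaps are systematic rather than random.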

We have previously developed multitask GNN models capable of simultaneously predicting multiple ADME properties, with explainability analyses demonstrating mechanistically meaningful substructure capture.20 Additionally, the imputation of missing preclinical measurements expanded dataset coverage and improved generalization across chemical series, incorporating compounds previously excluded owing to incomplete data.21
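Multitask models trained on pooled ADME data must cope with the fact that most compounds are measured on only a few endpoints. A common approach, sketched below under that assumption, is to mask missing labels so that each task's loss is computed only over compounds that have measurements. The per-task averaging scheme and the function name are illustrative choices, not the loss used in the cited models.

```python
import math
from typing import List, Optional, Sequence, Tuple

Label = Optional[float]


def masked_multitask_mse(preds: Sequence[Sequence[float]],
                         targets: Sequence[Sequence[Label]]
                         ) -> Tuple[float, List[Optional[float]]]:
    """Mean squared error per task, ignoring missing (None or NaN) labels.

    Returns the average over tasks that have at least one label, plus the
    per-task losses (None for a task with no labels in this batch).
    """
    n_tasks = len(preds[0])
    per_task: List[Optional[float]] = []
    for t in range(n_tasks):
        errors = [(p[t] - y[t]) ** 2
                  for p, y in zip(preds, targets)
                  if y[t] is not None and not math.isnan(y[t])]
        per_task.append(sum(errors) / len(errors) if errors else None)
    observed = [loss for loss in per_task if loss is not None]
    if not observed:
        raise ValueError("batch contains no labels")
    # Equal weight per task, so sparsely measured endpoints are not drowned out.
    return sum(observed) / len(observed), per_task
```

Averaging within each task first and then across tasks is one design choice; weighting every observed label equally instead would let densely measured endpoints dominate training.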

Multimodal strategies thus enhance predictive accuracy and mitigate the data sparsity, inter-laboratory variability, and species differences that have traditionally constrained QSAR and mechanistic IVIVE approaches. Table 1 lists representative models, highlighting the integration of structural and experimental features.

Summary

These studies illustrate the trajectory of small-molecule ADME modeling toward increasingly integrated, data-efficient, and biologically grounded approaches. AI-based methods complement and enhance QSAR and mechanistic IVIVE frameworks, offering avenues to address persistent challenges such as data sparsity, inter-laboratory noise, and species differences. As datasets continue to expand and foundation-model methodologies become more prevalent, small-molecule ADME prediction is poised to transition from task-specific modeling to unified, multi-source frameworks that support early drug design decisions with greater accuracy and mechanistic insight.

TRANSITION TO PEPTIDES AND MEDIUM-SIZED MOLECULES

Peptide therapeutics and medium-sized molecules, typically spanning 500–2000 Da, constitute a rapidly expanding class of modalities that bridge the physicochemical and PK properties of small molecules and biologics. Unlike classical small molecules, these compounds often exhibit limited passive membrane permeability, pronounced susceptibility to proteolytic degradation, and a dependence on active transport mechanisms for cellular uptake. Their ADME behavior is further shaped by features unique to peptide chemistry, including backbone flexibility, secondary structure formation, and interactions with carrier proteins, resulting in complex PK profiles. These complexities highlight the inadequacy of traditional QSAR models and standard GNN architectures trained primarily on small-molecule datasets, prompting intensified interest in AI approaches that integrate sequence, structure, and mechanism-aware information for ADME prediction.

Conformational complexity and 3D-aware modeling

Medium molecules exhibit greater conformational freedom than small molecules, which introduces additional modeling challenges. Often, conformational ensembles, rather than single low-energy structures, determine key ADME properties such as permeability, metabolic stability, and receptor engagement. Unlike small molecules, whose ADME behavior can often be approximated using static representations, peptides require explicit consideration of their dynamic conformational distributions.
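A common way to turn a conformational ensemble into a single model input is to Boltzmann-weight per-conformer properties by their relative energies. The sketch below assumes that conformer energies (kcal/mol) and a per-conformer property such as 3D polar surface area have already been computed by a conformer generator. The numbers in the example are hypothetical, chosen to mimic a cyclic peptide that can adopt a "closed", intramolecularly hydrogen-bonded shape and an "open" shape.

```python
import math
from typing import List, Sequence

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def boltzmann_weights(energies_kcal: Sequence[float],
                      temperature: float = 310.15) -> List[float]:
    """Relative population of each conformer at the given temperature (K)."""
    e_min = min(energies_kcal)  # shift energies for numerical stability
    raw = [math.exp(-(e - e_min) / (R_KCAL * temperature)) for e in energies_kcal]
    z = sum(raw)
    return [r / z for r in raw]


def ensemble_average(energies_kcal: Sequence[float], values: Sequence[float],
                     temperature: float = 310.15) -> float:
    weights = boltzmann_weights(energies_kcal, temperature)
    return sum(w * v for w, v in zip(weights, values))


# Hypothetical: closed conformer (lower energy, shielded polar groups) vs open one.
energies = [0.0, 1.0]      # relative energies, kcal/mol
psa_3d = [120.0, 190.0]    # per-conformer 3D polar surface area, Å^2
print(round(ensemble_average(energies, psa_3d), 1))
```

A single lowest-energy structure would report only the closed-state value, whereas the weighted average reflects the population-level exposure of polar groups, which is closer to what governs passive permeability for flexible molecules.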

To address this challenge, emerging AI models have incorporated the following:

  • Three-dimensional descriptors and learned geometric representations

  • Topological information capturing long-range intramolecular interactions

  • Generative frameworks, such as equivariant neural networks and diffusion-based molecular generators, to explore chemical and conformational spaces

These 3D-aware approaches improve predictive performance by explicitly modeling spatial arrangements and intramolecular constraints that govern permeability and stability. The availability of high-accuracy protein structure prediction engines, such as AlphaFold28 and ESMFold,29 has further accelerated the integration of structural information into ADME modeling, particularly for cyclic peptides, stapled peptides, and peptide–protein fusion constructs.

Data scarcity and multimodal learning

Despite methodological advances, progress in medium-molecule ADME prediction remains constrained by the scarcity of systematically annotated datasets. Key endpoints such as plasma stability, renal clearance, metabolic half-life, and extrahepatic tissue degradation are often underrepresented owing to experimental costs and chemical diversity. This data sparsity amplifies uncertainty and limits the applicability of single-modality, structure-only prediction models.

Multimodal learning strategies, originally developed for small molecules, are valuable in addressing these limitations. By integrating sequences, structures, physicochemical descriptors, experimental measurements, and biological contexts, these approaches enable the following:

  • Effective utilization of limited datasets

  • Robustness against experimental variability

  • Improved generalization across peptide chemotypes

By jointly learning from heterogeneous data sources, multimodal models reduce overfitting and improve robustness in the low-data regimes typical of peptide ADME studies. Representative AI methodologies for medium-sized molecules are summarized in Table 2.

Table 2. Summary of methodological trends and representative studies in AI-driven ADMET prediction for peptides and medium-sized molecules.

| Prediction target | Dataset | Descriptor(s) | Algorithm | Metrics | Reference |
|---|---|---|---|---|---|
| Drug-likeness and TIM-3 peptide bioactivity prediction | Pre-training: ChEMBL / PeptideAtlas; fine-tuning: TIM-3 dataset | Molecular graphs; SMILES | Reinforcement learning (actor–critic) + GAT | Validity; novelty; diversity; scaffold similarity; FCD | Wang et al. (2024) |
| hGLP-1R activity; hSCTR activity; solubility; plasma concentration profiles; in vivo efficacy | 2,688 peptides; in vitro functional assays, solubility, fibrillation, rat PK, and other endpoints | z-scale descriptors; one-hot encoding | Gaussian process (GP); CNN; random forest | RMSE; R² | Nielsen et al. (2024) |
| Melanin binding | Pilot peptide microarray (119 peptides) | Composition, transition, distribution, autocorrelation, conjoint triad, quasi-sequence-order descriptors, pseudo amino acid and amphiphilic pseudo amino acid composition (1,094 variables) | Random forest | OOB classification error | Hsueh et al. (2023) |
| Melanin binding affinity | 5,483 peptides (7–12 aa) | Same peptide-level descriptors as above; low-information variables excluded | Nested CV; NN, GBM, XGBoost, GLM, DRF, XRT; Super Learner | R², normalized MAE, normalized RMSE, AIC, rank-sum scoring | Hsueh et al. (2023) |
| Cell permeability | 460 cell-permeable and 462 non-permeable peptides | Same peptide-level descriptors as above; low-information variables excluded | Nested CV; 100 NN, 100 GBM, 100 XGBoost, DRF, XRT; Super Learner | Log loss, MCC, F1, balanced accuracy (rank-sum scoring) | Hsueh et al. (2023) |
| Cytotoxicity | 1,777 toxic and 3,522 non-toxic peptides | Composition, transition, distribution, autocorrelation, conjoint triad, quasi-sequence-order, pseudo-amino acid composition, and amphiphilic pseudo-amino acid composition descriptors (1,094 variables in total) | Nested CV; 100 NN, 100 GBM, 100 XGBoost, GLM, DRF, XRT; Super Learner | F1 score | Hsueh et al. (2023) |
| Peptide aggregation propensity (regression and classification) | 62,159 coarse-grained MD-derived penta- to decapeptides | 1D vectors / sequence / graph representations | 1D: SVM, RF, MLP; sequence: RNN, LSTM, Bi-LSTM, Transformer; graph: GCN, GAT, GraphSAGE | MAE; MSE; R²; accuracy; precision; recall; F1 | Liu et al. (2023) |
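Two of the simplest sequence-level inputs behind Table 2 can be computed directly from a peptide sequence: one-hot encoding (used by Nielsen et al.) and amino acid composition, the starting point for the pseudo amino acid composition descriptors listed for Hsueh et al. The sketch below covers the 20 canonical residues only. z-scale descriptors are omitted because they require published per-residue parameter tables, and handling of non-canonical residues or cyclization, common in medium-sized molecules, is left out.

```python
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical residues only
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence: str, max_len: int) -> List[List[int]]:
    """Position-by-residue matrix, zero-padded to max_len rows."""
    if len(sequence) > max_len:
        raise ValueError("sequence longer than max_len")
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(sequence):
        if aa not in INDEX:
            raise ValueError(f"non-canonical residue {aa!r} at position {pos}")
        matrix[pos][INDEX[aa]] = 1
    return matrix


def aa_composition(sequence: str) -> Dict[str, float]:
    """Fraction of each canonical residue; order-independent and length-normalized."""
    if not sequence:
        raise ValueError("empty sequence")
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for aa in sequence:
        if aa not in counts:
            raise ValueError(f"non-canonical residue {aa!r}")
        counts[aa] += 1
    return {aa: c / len(sequence) for aa, c in counts.items()}
```

One-hot encoding preserves residue order (useful for CNNs and sequence models), whereas composition discards it in exchange for a fixed-length vector that tree ensembles and Gaussian processes can consume regardless of peptide length.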

ML-guided peptide optimization

AI-driven peptide optimization has advanced considerably, with early success in leveraging GNNs combined with reinforcement learning to generate novel peptide sequences. Unlike traditional trial-and-error optimization, these approaches directly explore high-dimensional sequence spaces under explicit ADME-related constraints.

In these systems:

  • Amino acids are represented as graph nodes, enabling flexible modeling of linear and cyclic peptides.

  • Multi-objective reward functions incorporate activity, solubility, stability, and drug-like physicochemical properties.

This paradigm searches high-dimensional sequence spaces while preserving structural diversity, thereby facilitating the identification of peptide candidates with improved ADME and pharmacological profiles.
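A multi-objective reward of the kind described above is often implemented as a weighted combination of normalized property scores. The sketch below assumes each property has already been predicted by some model, min-max scales it to [0, 1] using user-chosen bounds, and clips out-of-range values. The property names, bounds, and weights are hypothetical, and the reward design in the cited reinforcement-learning work may differ (for example, by using thresholds or multiplicative terms).

```python
from typing import Dict, Tuple


def multi_objective_reward(props: Dict[str, float],
                           weights: Dict[str, float],
                           bounds: Dict[str, Tuple[float, float]]) -> float:
    """Weighted mean of property scores, each scaled to [0, 1].

    bounds[name] is (worst, best). For a lower-is-better property, pass
    worst > best; the same formula then rewards smaller values.
    """
    total = 0.0
    for name, weight in weights.items():
        worst, best = bounds[name]
        score = (props[name] - worst) / (best - worst)
        total += weight * min(1.0, max(0.0, score))  # clip to [0, 1]
    return total / sum(weights.values())


# Hypothetical objectives for a peptide generator's reward.
weights = {"activity": 2.0, "solubility": 1.0, "stability": 1.0}
bounds = {"activity": (5.0, 9.0),     # predicted pEC50, higher is better
          "solubility": (0.0, 2.0),   # mg/mL, higher is better
          "stability": (0.0, 24.0)}   # plasma half-life (h), higher is better
reward = multi_objective_reward(
    {"activity": 7.0, "solubility": 1.0, "stability": 12.0}, weights, bounds)
```

Clipping keeps a single extreme prediction from dominating the reward, which otherwise tends to push a generator toward exploiting one objective at the expense of the others.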

Integrated peptide design platforms have also emerged, as exemplified by systems for optimizing GLP-1 receptor agonists.30 These frameworks combine iterative cycles of design, synthesis, experimental evaluation, and downstream ML analysis, integrating multiple endpoints (receptor potency, solubility, metabolic stability, and in vivo PK) to enable rational, multi-objective optimization. Such closed-loop learning frameworks outperform conventional sequential optimization by explicitly balancing competing ADME and efficacy requirements. The resulting candidates demonstrated enhanced biological activity and favorable in vivo efficacy, illustrating the practical benefits of ML-assisted peptide drug discovery.
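Closed-loop frameworks must repeatedly decide which designs to carry forward when endpoints such as potency, solubility, and metabolic stability pull in different directions. One simple, widely used tool for that decision is Pareto filtering, which keeps only candidates that no other candidate matches or beats on every endpoint while strictly beating it on at least one. The sketch assumes all objectives are oriented so that higher is better and uses hypothetical candidate names and values; the cited platform may rank candidates differently (for example, with model-based acquisition functions).

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Sequence[float]]) -> List[str]:
    """Names of non-dominated candidates (all objectives maximized)."""
    front = []
    for name, objs in candidates.items():
        if not any(dominates(other, objs)
                   for other_name, other in candidates.items() if other_name != name):
            front.append(name)
    return sorted(front)


# Hypothetical (potency pEC50, solubility mg/mL, plasma half-life h) per design.
designs = {
    "pep-01": (8.1, 0.4, 6.0),
    "pep-02": (7.6, 1.5, 9.0),
    "pep-03": (7.4, 0.3, 5.0),   # worse than pep-01 on every endpoint
    "pep-04": (8.3, 0.2, 12.0),
}
print(pareto_front(designs))  # → ['pep-01', 'pep-02', 'pep-04']
```

Here pep-03 drops out because pep-01 beats it on every endpoint, while the three survivors each represent a different trade-off that the next design–make–test cycle can explore.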

Additionally, ML has been applied to engineer long-acting ophthalmic peptides.31 By integrating microarray-derived melanin-binding data with cellular uptake and toxicity measurements, ensemble models predicted multi-property profiles, yielding peptide conjugates with prolonged ocular retention and improved delivery characteristics. This example highlights how ML can uncover non-obvious relationships among binding, distribution, and retention that are difficult to capture with rule-based design.
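Table 2 lists Super Learner ensembles for the melanin-binding, permeability, and cytotoxicity endpoints. In its common form, a Super Learner combines base models (such as NN, GBM, and XGBoost) using weights fitted to their cross-validated, out-of-fold predictions. The sketch below assumes those out-of-fold predictions already exist and searches a coarse grid of convex weights (non-negative, summing to one) for the lowest squared error. Practical implementations typically use a constrained least-squares solver rather than a grid, and the function name and example data are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple


def convex_stack(oof_preds: Sequence[Sequence[float]], y: Sequence[float],
                 step: float = 0.1) -> Tuple[List[float], float]:
    """Grid-search convex weights over base-model out-of-fold predictions.

    oof_preds[m][i] is model m's cross-validated prediction for sample i.
    Returns (weights, mse) for the best blend found on the grid.
    """
    n_models, n_steps = len(oof_preds), round(1 / step)
    best_w: List[float] = []
    best_mse = float("inf")
    for combo in product(range(n_steps + 1), repeat=n_models):
        if sum(combo) != n_steps:  # keep only weights that sum to one
            continue
        w = [c / n_steps for c in combo]
        blend = [sum(w[m] * oof_preds[m][i] for m in range(n_models))
                 for i in range(len(y))]
        mse = sum((b - t) ** 2 for b, t in zip(blend, y)) / len(y)
        if mse < best_mse:
            best_w, best_mse = w, mse
    return best_w, best_mse


# Hypothetical out-of-fold predictions from three base learners.
y_true = [1.0, 2.0, 3.0, 4.0]
oof = [[1.2, 1.8, 3.1, 4.3],   # e.g., a neural network
       [0.9, 2.2, 2.8, 3.9],   # e.g., gradient boosting
       [2.0, 2.0, 2.0, 2.0]]   # a weak constant baseline
weights, mse = convex_stack(oof, y_true)
```

Fitting the weights on out-of-fold rather than in-sample predictions is the key step: it prevents the ensemble from simply favoring whichever base model overfits the training data most.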

Representation learning for peptide biophysical properties

Systematic investigations have compared representation strategies for peptide biophysical endpoints. Using coarse-grained molecular dynamics datasets, sequential models (RNNs, LSTMs, and Transformers) were compared with graph-based architectures (GCN, GAT, and GraphSAGE) for predicting peptide self-association.32 These comparative studies clarify how different architectures encode sequence order, long-range interactions, and collective behavior relevant to ADME outcomes.

Transformers consistently outperformed alternative architectures across a wide range of sequence lengths and chemotypes, suggesting that the determinants of aggregation and assembly can largely be inferred from primary sequences. Because aggregation influences solubility, stability, and tissue retention, such sequence-level insights are directly relevant to predicting aggregation-dependent ADME endpoints.

Summary

Collectively, peptides and medium-sized molecules constitute a dynamic methodological frontier for predicting ADME. This field is transitioning from small-molecule-centric QSAR and GNN approaches to integrated sequence- and structure-aware frameworks that incorporate multimodal biochemical, biophysical, and structural information. These advances address the key limitations of traditional models by capturing the conformational flexibility, data sparsity, and multi-property trade-offs intrinsic to peptide ADME. As datasets expand and foundation-model-inspired architectures become increasingly accessible, AI technologies should play a central role in the rational design, multi-property optimization, and enhanced predictability of ADME behavior for this modality class.

EXTENDING TO OLIGONUCLEOTIDES AND BIOLOGICS

Nucleic acid therapeutics, including siRNAs, antisense oligonucleotides (ASOs), and emerging formats such as circular RNA, as well as antibody and protein therapeutics, have ADMET profiles governed by biological mechanisms distinct from those of small molecules. Key factors such as tissue permeability, cellular uptake pathways, interactions with serum and tissue proteins, and innate immune activation often dominate their PK behavior. Unlike small molecules, whose ADMET properties are largely structure-driven, these modalities exhibit PK behavior determined primarily by sequence-dependent and system-level biological processes. Consequently, AI models for these modalities must integrate information beyond chemical structure by combining sequence features, chemical modifications, structural descriptors, and biological context. This has motivated the development of multimodal learning frameworks capable of capturing both molecular- and system-level ADMET determinants.

Nucleic acid therapeutics: sequence- and modification-aware modeling

For oligonucleotide therapeutics, critical determinants of ADME include:

  • Chemical modification patterns (such as phosphorothioate linkages, 2′-O-methyl, or 2′-fluoro substitutions)

  • Duplex versus single-stranded conformations

  • Interactions with RNA-binding proteins

Collectively, these features influence nuclease stability, plasma protein binding, endosomal trafficking, and in vivo distribution. As these effects arise from hierarchical interactions between sequences, chemical modifications, and higher-order structures, single-representation QSAR models are insufficient to capture oligonucleotide ADMET behavior. Recent AI approaches have employed hybrid architectures that combine the following features:

  • Sequence encodings

  • Graph-based or 3D structural representations

These frameworks capture the hierarchical nature of the oligonucleotide structure, enabling improved predictions of stability, distribution, and cellular uptake. By jointly modeling sequence-level and structural features, these hybrid models outperform conventional linear or descriptor-based approaches in predicting in vivo-relevant ADME endpoints.
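A hybrid representation of this kind starts from a per-nucleotide encoding that keeps sequence, sugar chemistry, and backbone linkage as separate channels, which a graph or sequence model can then consume. The sketch below is a hypothetical featurization that uses only the modifications named above (phosphodiester versus phosphorothioate linkages; unmodified, 2′-O-methyl, or 2′-fluoro sugars). It emits node features plus backbone edges carrying the linkage type, and it leaves out duplex pairing and conjugates.

```python
from typing import List, Sequence, Tuple

# Hypothetical vocabularies for illustration.
BASES = ("A", "C", "G", "T", "U")
SUGARS = ("unmodified", "2'-OMe", "2'-F")
LINKAGES = ("PO", "PS")  # phosphodiester, phosphorothioate


def _one_hot(item: str, vocabulary: Sequence[str]) -> List[int]:
    if item not in vocabulary:
        raise ValueError(f"unknown token {item!r}")
    return [1 if item == v else 0 for v in vocabulary]


def encode_oligo(sequence: str, sugars: Sequence[str], linkages: Sequence[str]
                 ) -> Tuple[List[List[int]], List[Tuple[int, int, List[int]]]]:
    """Node features (base + sugar one-hot) and backbone edges (linkage one-hot)."""
    if len(sugars) != len(sequence) or len(linkages) != len(sequence) - 1:
        raise ValueError("need one sugar per nucleotide and one linkage per gap")
    nodes = [_one_hot(b, BASES) + _one_hot(s, SUGARS)
             for b, s in zip(sequence, sugars)]
    # Linkage i joins nucleotide i to nucleotide i + 1 (5' to 3').
    edges = [(i, i + 1, _one_hot(link, LINKAGES)) for i, link in enumerate(linkages)]
    return nodes, edges


# Hypothetical 5-mer with modified wings and a fully phosphorothioate backbone.
nodes, edges = encode_oligo(
    "GCTAC", ["2'-OMe", "2'-OMe", "unmodified", "2'-OMe", "2'-OMe"], ["PS"] * 4)
```

Keeping chemistry on nodes and linkage type on edges means two oligonucleotides with the same sequence but different modification patterns receive different representations, which a sequence-only encoding would miss.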

A representative example is ASOptimizer, a two-stage AI-driven framework for oligonucleotide design in which initial linear-model screening is followed by refinement with an edge-augmented graph transformer (EGT) that balances knockdown potency against toxicity.33, 34 Experimental validation targeting indoleamine 2,3-dioxygenase 1 (IDO1) demonstrated enhanced activity over traditional PS-ASO and gapmer designs, while also revealing differences in immunomodulatory effects, such as macrophage differentiation. This study illustrates how multi-objective AI optimization can simultaneously address the efficacy- and ADMET-related constraints unique to nucleic acid therapeutics. (Table 3 shows representative performance metrics.)
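The two-stage idea (a cheap model to shortlist, a more expressive model to re-rank under a potency–toxicity trade-off) can be expressed compactly. In the sketch below, the scoring functions, the linear penalty on toxicity, and the candidate data are hypothetical placeholders, not ASOptimizer's actual models or objective.

```python
from typing import Callable, List, Sequence, Tuple


def two_stage_screen(candidates: Sequence[str],
                     cheap_score: Callable[[str], float],
                     refined_predict: Callable[[str], Tuple[float, float]],
                     keep_top: int,
                     tox_penalty: float = 1.0) -> List[str]:
    """Shortlist with a fast model, then re-rank by potency minus penalized toxicity."""
    shortlist = sorted(candidates, key=cheap_score, reverse=True)[:keep_top]

    def objective(candidate: str) -> float:
        potency, toxicity = refined_predict(candidate)
        return potency - tox_penalty * toxicity

    return sorted(shortlist, key=objective, reverse=True)


# Hypothetical stage-1 scores and stage-2 (potency, toxicity) predictions.
linear = {"aso-1": 0.9, "aso-2": 0.1, "aso-3": 0.8, "aso-4": 0.7}
refined = {"aso-1": (0.9, 0.8), "aso-2": (1.0, 0.0),
           "aso-3": (0.8, 0.1), "aso-4": (0.6, 0.1)}
ranked = two_stage_screen(list(linear), linear.get, refined.get, keep_top=3)
```

In this example aso-2 never reaches the second stage despite the best refined profile, which illustrates the usual cost of a cheap first filter: throughput is gained at some risk of discarding good candidates early.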

Table 3. Summary of methodological trends and representative studies in AI-driven ADMET prediction for oligonucleotides and biologics.

| Target for prediction | Dataset | Descriptor | Algorithm | Metrics | Reference |
|---|---|---|---|---|---|
| ASO-mediated mRNA suppression efficiency (IDO1) | 187,090 ASO experimental records (full dataset D) + 155-entry IDO1 subset (D_IDO1) | Thermodynamic features (ΔG); mean off-target ΔG; secondary-structure accessibility; molecular graph of ASOs | Linear regression model + Edge-augmented Graph Transformer | Linear regression performance; transformer-based predictive accuracy | Hwang et al. (2024) |
| Compound stability in human plasma (classification) | In vitro stability data; internal dataset: 932 compounds; public datasets (PubChem, ChEMBL): 2,166 compounds | Molecular graphs; molecular fingerprints | Hybrid CMPNN + self-attention + fingerprint model (PredPS); baselines: RF, SVM, DMPNN, CMPNN | Pearson correlation; MSE (sequence task); pairwise ranking accuracy (chemical task) | Jang et al. (2023) |
| Molecular property prediction (HOMO/LUMO gap, NPA charge, yield, stereoselectivity) | Pretraining: ChEMBL-25 (1.87M molecules); downstream: MoleculeNet (4 tasks) + 4 catalysis-related datasets (DHBD, NHC, phosphines-yield, phosphines-selectivity) | Molecular images; structure pseudo-labels; image augmentations | MoleCLIP (OpenAI CLIP ViT-B/16 backbone) with contrastive pretraining + 3-layer MLP prediction head | MAE; ROC-AUC; distribution-shift robustness | Harnik et al. (2025) |

Biologics: protein- and antibody-focused ADMET modeling

Although this section focuses on protein- and antibody-related ADMET challenges, some representative AI methods discussed here were originally developed for small molecules or modified compounds and are highlighted for their conceptual relevance and potential applicability to biologic-associated ADMET tasks. The ADMET behavior of monoclonal antibodies (mAbs) and engineered proteins depends on several factors, including:

  • FcRn binding affinity

  • Glycosylation patterns

  • Aggregation propensity

  • Immune system interactions

Advances in protein language models and deep-learning-based structure prediction have enabled the estimation of stability, solubility, and binding profiles, gradually linking these properties to PK outcomes such as half-life and tissue penetration. These approaches improve upon traditional empirical models by directly learning sequence–structure–function relationships that govern antibody stability and disposition.

Although many of these determinants are specific to protein- and antibody-level biology, current AI models often address only a subset of them, particularly those related to molecular stability, degradation, and distribution. Accordingly, the representative models discussed below are presented as conceptually relevant approaches that capture specific ADMET-related aspects, rather than as comprehensive models of antibody PK determinants.

Representative models illustrating such partial yet informative approaches include:

  • MoleCLIP,35 which applies foundation-model pre-training to large collections of molecular images within a contrastive language–image pre-training (CLIP)-based framework and delivers robust predictions on tasks with limited data. This highlights the potential of visual molecular representations to complement conventional descriptors. Although it was developed for small-molecule and modified-compound representations rather than proteins, MoleCLIP exemplifies how representation learning can enhance generalization under the data-scarce conditions relevant to biologic-related ADMET tasks.

  • PredPS,36 a structure-only model built on a communicative message-passing neural network (CMPNN) with self-attention that emphasizes substructures relevant to chemical stability. Combined with Morgan fingerprints, it outperforms prior classifiers for plasma stability prediction. This approach demonstrates how attention-based message passing can identify stability-relevant motifs, a concept transferable to degradation and clearance modeling in biologics. (The model itself targets the chemical stability of small or modified molecules rather than protein backbones.)
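To make the hybrid idea in the PredPS entry concrete, the sketch below shows one common way to combine an attention-weighted readout over per-atom embeddings with a fixed-length fingerprint before a classification head. It is a NumPy illustration under loud assumptions, not the PredPS architecture: the atom embeddings, fingerprint bits, and weight vectors are random placeholders, and a single logistic output stands in for the stable/unstable classifier.

```python
from typing import Tuple

import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_readout(atom_emb: np.ndarray, w_att: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Pool per-atom embeddings (n_atoms x d) into a single molecule vector (d,).

    Each atom receives a scalar score atom_emb @ w_att; softmax turns the scores into
    weights, so highly weighted atoms (and their substructures) dominate the pooled vector.
    """
    weights = softmax(atom_emb @ w_att)
    return weights @ atom_emb, weights

def hybrid_stability_prob(
    atom_emb: np.ndarray,
    fingerprint: np.ndarray,
    w_att: np.ndarray,
    w_out: np.ndarray,
    b_out: float,
) -> Tuple[float, np.ndarray]:
    """Concatenate the attention readout with fingerprint bits and apply a logistic head."""
    pooled, weights = attention_readout(atom_emb, w_att)
    features = np.concatenate([pooled, fingerprint.astype(float)])
    logit = float(features @ w_out + b_out)
    return 1.0 / (1.0 + np.exp(-logit)), weights

rng = np.random.default_rng(0)
n_atoms, d, n_bits = 12, 16, 64                # toy sizes; real fingerprints are usually much longer
atom_emb = rng.normal(size=(n_atoms, d))       # stand-in for message-passing outputs
fingerprint = rng.integers(0, 2, size=n_bits)  # stand-in for Morgan fingerprint bits
w_att = rng.normal(size=d)                     # hypothetical attention weights
w_out = 0.1 * rng.normal(size=d + n_bits)      # hypothetical classifier weights
prob, weights = hybrid_stability_prob(atom_emb, fingerprint, w_att, w_out, b_out=0.0)
print(f"P(stable) = {prob:.3f}; most-attended atom = {int(weights.argmax())}")
```

The returned attention weights are what makes this kind of model inspectable: atoms with the largest weights point to the substructures driving the prediction.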

Summary and outlook

Collectively, these studies illustrate a shift toward multimodal and biologically informed ADMET modeling for oligonucleotides and biologics. AI methodologies increasingly integrate sequence, structural, and experimental data to predict PK behavior and to support candidate prioritization and rational design. By explicitly accounting for modality-specific biological mechanisms, these models address key limitations of small-molecule-centric ADME prediction frameworks. As these modalities continue to expand across therapeutic areas, such approaches will become increasingly essential for reliable and mechanistically grounded ADMET prediction (see Table 3 for a summary of the methods and performance benchmarks).

EMERGING TRENDS: MULTIMODAL LEARNING AND FOUNDATION MODELS

The rapid evolution of cross-modal ADMET predictions has brought multimodal learning and foundation models to the forefront of computational PK. Traditional QSAR-style approaches, which rely heavily on handcrafted descriptors, are increasingly complemented or, in some cases, replaced by architectures that can integrate multiple heterogeneous data streams. These include:

  • Structural representations

  • Sequence information

  • Physicochemical properties

  • High-throughput in vitro assay results

  • Preclinical in vivo measurements

  • Early clinical data

By capturing complementary aspects of molecular and biological behavior, multimodal learning provides greater robustness in settings with sparse, noisy, or modality-specific datasets. Unlike single-modality models, these approaches can exploit partial information across data types, reducing sensitivity to missing values and experimental variability. Our previous studies support this trend: by partially imputing missing non-clinical measurements and integrating them with molecular features, we demonstrated improved estimation of human PK parameters, highlighting the potential of multimodal strategies to bridge gaps across experimental systems and chemical modalities.20, 21
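The impute-then-integrate strategy described above can be illustrated with a deliberately simple sketch. This is not the pipeline used in our previous studies: it assumes column-mean imputation plus binary missingness indicators for the non-clinical block, concatenates the result with molecular descriptors, and fits a closed-form ridge regression. All arrays are synthetic toy data, and both the imputation rule and the ridge model are illustrative choices.

```python
import numpy as np

def impute_with_indicators(nonclinical: np.ndarray) -> np.ndarray:
    """Replace NaNs with column means and append 0/1 flags marking which values were missing."""
    missing = np.isnan(nonclinical)
    col_means = np.nanmean(nonclinical, axis=0)
    filled = np.where(missing, col_means, nonclinical)
    return np.hstack([filled, missing.astype(float)])

def fit_ridge(X: np.ndarray, y: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Closed-form ridge, w = (X^T X + alpha I)^-1 X^T y, with an unpenalized intercept."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

def predict_ridge(w: np.ndarray, X: np.ndarray) -> np.ndarray:
    return np.hstack([np.ones((X.shape[0], 1)), X]) @ w

rng = np.random.default_rng(42)
n = 50
molecular = rng.normal(size=(n, 5))    # toy stand-in for molecular descriptors or embeddings
nonclinical = rng.normal(size=(n, 3))  # toy stand-in for animal or in vitro measurements
y = molecular[:, 0] + 0.5 * nonclinical[:, 0] + rng.normal(scale=0.1, size=n)  # toy target
nonclinical[rng.random((n, 3)) < 0.3] = np.nan  # hide roughly 30% of non-clinical values

X = np.hstack([molecular, impute_with_indicators(nonclinical)])
w = fit_ridge(X, y, alpha=1.0)
rmse = float(np.sqrt(np.mean((predict_ridge(w, X) - y) ** 2)))
print(f"features: {X.shape[1]}, training RMSE: {rmse:.3f}")
```

The missingness flags let the model learn that an imputed value is less trustworthy than a measured one, which a plain mean fill would hide.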

Foundation models for cross-modality ADMET prediction

Since 2023, foundation models capable of jointly processing small molecules, proteins, and nucleic acids have emerged as a major technological shift.33, 37 These models unify tasks such as molecular property prediction, protein structure inference, interaction modeling, and antibody optimization within a single representational framework. By learning shared latent representations across modalities, foundation models enable knowledge transfer between molecular classes and prediction tasks, which were previously treated independently.

Key advantages include:

  • Extraction of cross-domain knowledge from large-scale chemical, biological, and biophysical datasets

  • Learning representations that generalize across chemical structures, biomolecular sequences, and assay modalities

  • Potential to serve as a basis for next-generation ADMET modeling

In contrast to task-specific QSAR and GNN models, foundation models reduce the need for extensive retraining and feature engineering when applied to new endpoints or molecular modalities. The representative methods illustrating these capabilities are summarized in Table 4.
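A common way to reuse a pre-trained backbone without retraining it is a linear probe: embeddings from a frozen encoder are computed once, and only a small head is fitted for each new endpoint. The sketch below assumes a hypothetical `frozen_encoder` (a fixed random projection standing in for a foundation model) and two synthetic endpoints; it illustrates the reuse pattern only and does not reproduce any specific model in Table 4.

```python
import numpy as np

rng = np.random.default_rng(7)
INPUT_DIM, EMB_DIM = 32, 8
# Hypothetical stand-in for a pre-trained backbone: a fixed (frozen) nonlinear projection.
_W_FROZEN = rng.normal(size=(INPUT_DIM, EMB_DIM)) / np.sqrt(INPUT_DIM)

def frozen_encoder(x: np.ndarray) -> np.ndarray:
    """Map raw inputs to embeddings; these weights are never updated."""
    return np.tanh(x @ _W_FROZEN)

def fit_linear_probe(emb: np.ndarray, y: np.ndarray, alpha: float = 1e-2) -> np.ndarray:
    """Fit only a ridge-regularized linear head (with a bias column) on frozen embeddings."""
    A = np.hstack([emb, np.ones((emb.shape[0], 1))])
    return np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ y)

x = rng.normal(size=(200, INPUT_DIM))
emb = frozen_encoder(x)  # computed once and shared by every endpoint

# Two synthetic endpoints, defined in embedding space so that a linear head can fit them.
endpoints = {
    "endpoint_a": emb @ rng.normal(size=EMB_DIM),
    "endpoint_b": emb @ rng.normal(size=EMB_DIM) + 1.0,
}
heads = {name: fit_linear_probe(emb, y) for name, y in endpoints.items()}

scores = {}
for name, w in heads.items():
    pred = np.hstack([emb, np.ones((emb.shape[0], 1))]) @ w
    resid = endpoints[name] - pred
    scores[name] = float(1.0 - resid.var() / endpoints[name].var())  # training R^2
print(scores)
```

Adding a third endpoint costs one small linear solve on the cached embeddings; the backbone is never touched, which is the practical sense in which retraining is reduced.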

Table 4.  Summary of methodological trends and representative studies in multimodal and foundation model approaches for ADMET prediction.

| Target for prediction | Dataset | Descriptors | Algorithm | Metrics | Reference |
| --- | --- | --- | --- | --- | --- |
| Molecular properties | MoleculeNet (Lipo, ESOL, FreeSolv, BACE); Drugs-75K; Kraken | SMILES; molecular graphs; 3D conformers | MolMix: Transformer for SMILES; message-passing neural network for 2D graphs; equivariant neural network for 3D conformers; optimized with FlashAttention and bfloat16 (bf16) precision | Mean absolute error | Manolache et al. (2024) |
| Molecular properties | Pre-training: NMRShiftDB-2; downstream: MoleculeNet (11 datasets); virtual screening benchmarks: DUD-E, LIT-PCBA | ECFP; SMILES; NMR spectra; molecular images | MMFRL (Multimodal Fusion with Relational Learning) | ROC-AUC; RMSE | Zhou et al. (2025) |

Representative multimodal architectures

Multimodal Fusion with Relational Learning (MMFRL) demonstrates how multimodal architectures can leverage relational structure across different data modalities.34 Key features include:

  • Integration of diverse descriptors (SMILES, molecular fingerprints, 2D images, and NMR spectra) into graph neural network embeddings.

  • Use of relative similarity structures, rather than absolute feature alignment, to obtain consistent latent representations despite incomplete data.

  • Intermediate fusion strategies that perform strongly across multiple MoleculeNet benchmarks, particularly under low-data conditions.

By modeling the relationships between modalities rather than relying on complete feature availability, MMFRL is more robust than conventional concatenation-based multimodal models. More importantly, MMFRL does not require the auxiliary modalities at inference time, which reduces operational costs and facilitates its adoption in industrial settings.
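The idea of aligning relative similarity structures, rather than forcing each modality's features to match directly, can be illustrated with a small loss function. The NumPy sketch below is an assumption, not MMFRL's published objective: for a batch of molecules it builds a cosine-similarity matrix within each modality, turns each row into a softmax distribution over the other molecules, and averages the KL divergence between modalities. The temperature `tau` and the choice of KL divergence are illustrative.

```python
import numpy as np

def cosine_similarity_matrix(z: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarities between rows (molecules) of an embedding matrix."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return z @ z.T

def neighbor_distributions(z: np.ndarray, tau: float) -> np.ndarray:
    """Row-wise softmax over the similarities to the other molecules (self excluded)."""
    s = cosine_similarity_matrix(z) / tau
    n = s.shape[0]
    off = s[~np.eye(n, dtype=bool)].reshape(n, n - 1)  # drop the diagonal
    off = off - off.max(axis=1, keepdims=True)          # numerical stability
    e = np.exp(off)
    return e / e.sum(axis=1, keepdims=True)

def relational_kl(z_a: np.ndarray, z_b: np.ndarray, tau: float = 0.1) -> float:
    """Mean KL(P_a || P_b) between the neighbor distributions of two modalities."""
    p = neighbor_distributions(z_a, tau)
    q = neighbor_distributions(z_b, tau)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=1)))

rng = np.random.default_rng(0)
n_mols, d = 8, 6
z_graph = rng.normal(size=(n_mols, d))               # e.g., graph-encoder embeddings
rotation, _ = np.linalg.qr(rng.normal(size=(d, d)))  # random orthogonal matrix
# Same relational structure, different basis and width (zero-padded to 10 dims).
z_image = np.hstack([z_graph @ rotation, np.zeros((n_mols, 4))])
z_random = rng.normal(size=(n_mols, 10))             # unrelated embeddings

print(relational_kl(z_graph, z_image))   # ~0: similarity structure preserved
print(relational_kl(z_graph, z_random))  # > 0: structures disagree
```

Because the loss compares n x n similarity matrices, the two modalities may have different embedding widths, and a rotated copy of one embedding incurs essentially no penalty. This is the sense in which relational alignment is looser than absolute feature matching.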

MolMix provides a complementary, structure-centric approach.38 Its framework:

  • Separately encodes SMILES sequences, 2D molecular graphs, and multiple 3D conformers

  • Concatenates tokenized outputs into a unified Transformer for joint representation learning

  • Exploits cross-dimensional structural correlations without complex fusion mechanisms

This design enables MolMix to capture complementary structural information across representation levels, improving generalization beyond that achievable with any single structural modality. MolMix demonstrated strong generalization across MoleculeNet and MARCEL tasks, with notable improvements on the BACE and Kraken datasets, and its effective transfer learning highlights its potential as a practical foundation for structure-driven ADMET prediction.
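The "concatenate, then attend" design can be sketched in a few lines. This NumPy sketch is not the MolMix implementation and rests on stated assumptions: random token embeddings stand in for the SMILES, 2D-graph, and 3D-conformer encoder outputs; a hypothetical modality-type embedding is added to each token; and a single-head self-attention layer with random weights replaces the full Transformer. It shows how one attention pass lets every token attend across all three representation levels.

```python
import numpy as np

rng = np.random.default_rng(3)
D = 16  # shared token width after each modality-specific encoder

# Stand-ins for encoder outputs, (num_tokens, D) per modality; extra conformers add tokens.
tokens = {
    "smiles": rng.normal(size=(20, D)),
    "graph2d": rng.normal(size=(9, D)),
    "conformers3d": rng.normal(size=(2 * 9, D)),  # e.g., 2 conformers x 9 atoms
}
type_emb = {name: rng.normal(size=D) for name in tokens}  # hypothetical modality-type embeddings

def self_attention(x: np.ndarray, wq: np.ndarray, wk: np.ndarray, wv: np.ndarray):
    """Single-head scaled dot-product self-attention; returns outputs and attention weights."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores -= scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ v, attn

# Tag each token with its modality, then concatenate everything into one sequence.
seq = np.vstack([t + type_emb[name] for name, t in tokens.items()])
wq, wk, wv = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3))
out, attn = self_attention(seq, wq, wk, wv)
molecule_vec = out.mean(axis=0)  # simple mean pooling into one molecule representation

# Average attention mass that SMILES tokens place on 3D-conformer tokens.
n_s, n_g = len(tokens["smiles"]), len(tokens["graph2d"])
print(seq.shape, molecule_vec.shape, attn[:n_s, n_s + n_g:].sum(axis=1).mean().round(3))
```

No dedicated fusion module is needed here: cross-modal interaction happens inside ordinary self-attention because all tokens share one sequence.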

Summary and outlook

Together, these developments reflect a broad shift toward models that integrate information and reasoning across multiple biological and chemical modalities. As datasets increasingly encompass chemical structures, omics profiles, experimental measurements, and clinical observations, these approaches are likely to play a central role in advancing ADMET prediction across molecular classes. By combining multimodal integration with foundation-model pre-training, emerging frameworks can overcome key limitations of traditional QSAR and single-task deep learning, particularly in data-sparse and cross-modality settings, while improving both predictive performance and mechanistic insight to support better-informed drug discovery and development decisions.

CONCLUSIONS AND FUTURE PERSPECTIVES

The landscape of ADMET prediction is undergoing a rapid and transformative expansion. Although AI methodologies for small molecules have matured substantially over the past decade, emerging modalities, including peptide, nucleic acid, and antibody therapeutics, pose new challenges that exceed the capacity of conventional QSAR frameworks. In this context, recent progress in large-scale multimodal learning and foundation models represents a pivotal turning point. These models, initially developed for materials science and molecular informatics, are beginning to exhibit broad generalization across chemical and biological spaces, enabling knowledge transfer into previously data-sparse domains. Recent reviews in Chemical Science and JACS Au report that pre-training on heterogeneous, large-scale chemical datasets can yield zero-shot performance surpassing traditional supervised baselines, highlighting the disruptive potential of such approaches.39, 40

A key advantage of these foundation models is their ability to learn unified representations from highly diverse sources, including chemical structures, spectroscopic data, biomolecular sequences, protein structures, and molecular dynamics trajectories. Through self-supervised objectives, these models capture latent principles governing molecular behavior, allowing downstream ADMET tasks to be addressed with minimal task-specific fine-tuning. Studies published in npj Computational Materials and Chemical Science further underscore that integrating millions to hundreds of millions of multimodal data points can yield robust cross-domain embeddings with strong transferability.35, 41 Collectively, these findings suggest that a single universal backbone model for PK and toxicological prediction, long envisioned but technically elusive, may be within reach.

Nevertheless, several issues require careful consideration before such models can be incorporated into mainstream drug development workflows. First, although pre-trained representations offer broad generalization, their reliability in highly novel chemical spaces and clinical edge cases must be systematically evaluated. Second, the lack of standardized benchmarks, harmonized metadata, and transparent reporting practices impedes cross-study comparability. Third, the ethical implications of increasingly automated decision-making, particularly concerning animal use reduction, data-driven bias mitigation, and appropriate human oversight, must be addressed as model complexity increases. Progress in these areas is essential to ensure that the advantages of AI-driven ADMET prediction are realized responsibly and reproducibly.
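One simple and widely used way to flag the "highly novel chemical space" problem is a nearest-neighbor applicability-domain check: a query whose maximum similarity to the training set falls below a threshold is treated as out of domain, and its prediction is reported with lower confidence. The sketch below represents fingerprints as sets of on-bit indices and uses Tanimoto similarity. The bit sets and the 0.4 threshold are illustrative assumptions, not values recommended by the studies cited here.

```python
from typing import Iterable, Set, Tuple

def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    if not a and not b:
        return 1.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)

def applicability_check(
    query: Set[int], training: Iterable[Set[int]], threshold: float = 0.4
) -> Tuple[bool, float]:
    """Return (in_domain, max_similarity) based on the nearest training neighbor."""
    best = max((tanimoto(query, fp) for fp in training), default=0.0)
    return best >= threshold, best

train_fps = [{1, 4, 9, 16, 25}, {2, 4, 8, 16, 32}, {3, 6, 9, 12, 15}]
near_query = {1, 4, 9, 16, 30}  # shares 4 of 5 bits with the first training fingerprint
novel_query = {100, 200, 300}   # shares no bits with any training fingerprint
print(applicability_check(near_query, train_fps))   # (True, 0.666...)
print(applicability_check(novel_query, train_fps))  # (False, 0.0)
```

In practice the fingerprints would come from a cheminformatics toolkit and the threshold would be calibrated against observed prediction error; the same check can be applied to learned embeddings with cosine similarity.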

The trajectory of methodological development points toward an “in silico-first” paradigm, in which human ADMET properties are inferred directly from molecular structures or sequences, enabling drastic reductions in non-clinical experimentation and better-informed early-stage design. As foundation models continue to evolve toward broader chemical–biological universality, they are poised to redefine the computational toolkit of ADMET science and the conceptual boundaries of drug discovery itself. The next decade is likely to witness the consolidation of these advances into practical pipelines, marking the transition from auxiliary computational tools to a new central framework for predictive pharmacology.

Footnotes

DECLARATION OF GENERATIVE AI AND AI-ASSISTED TECHNOLOGIES IN THE WRITING PROCESS: During the preparation of this work, the author used ChatGPT (OpenAI) in order to improve the readability of the manuscript and to translate text from Japanese to English. After using this tool, the author reviewed and edited the content as needed and takes full responsibility for the content of the published article.

The authors declare no conflict of interest.

REFERENCES

1. Ekins S, Lane TR, Urbina F, Puhl AC. In silico ADME/tox comes of age: twenty years later. Xenobiotica. 2024;54:352-8. doi:10.1080/00498254.2023.2245049
2. Stokes JM, Yang K, Swanson K, Jin W, Cubillos-Ruiz A, Donghia NM, et al. A deep learning approach to antibiotic discovery. Cell. 2020;180:688-702.e13.
3. Waring MJ, Arrowsmith J, Leach AR, Leeson PD, Mandrell S, Owen RM, et al. An analysis of the attrition of drug candidates from four major pharmaceutical companies. Nat Rev Drug Discov. 2015;14:475-86. doi:10.1038/nrd4609
4. Morgan RE, van Staden CJ, Chen Y, Kalyanaraman N, Kalanzi J, Dunn RT II, et al. A multifactorial approach to hepatobiliary transporter assessment enables improved therapeutic compound development. Toxicol Sci. 2013;136:216-41. doi:10.1093/toxsci/kft176
5. Schneider G. Mind and machine in drug design. Nat Mach Intell. 2019;1:128-30. doi:10.1038/s42256-019-0030-7
6. Vamathevan J, Clark D, Czodrowski P, Dunham I, Ferran E, Lee G, et al. Applications of machine learning in drug discovery and development. Nat Rev Drug Discov. 2019;18:463-77. doi:10.1038/s41573-019-0024-5
7. Dearden JC. The history and development of quantitative structure-activity relationships (QSARs). International Journal of Quantitative Structure-Property Relationships. 2016;1:1-44. doi:10.4018/IJQSPR.2016010101
8. Selassie C, Verma RP. History of quantitative structure-activity relationships. Burger’s Medicinal Chemistry and Drug Discovery. 2003;1:1-48.
9. Hansch C, Maloney PP, Fujita T, Muir RM. Correlation of biological activity of phenoxyacetic acids with Hammett substituent constants and partition coefficients. Nature. 1962;194:178-80. doi:10.1038/194178b0
10. Li J, Zhao T, Yang Q, Du S, Xu L. A review of quantitative structure-activity relationship: the development and current status of data sets, molecular descriptors and mathematical models. Chemom Intell Lab Syst. 2025;256:105278. doi:10.1016/j.chemolab.2024.105278
11. Cherkasov A, Muratov EN, Fourches D, Varnek A, Baskin II, Cronin M, et al. QSAR modeling: where have you been? Where are you going to? J Med Chem. 2014;57:4977-5010. doi:10.1021/jm4004285
12. Jones HM, Rowland-Yeo K. Basic concepts in physiologically based pharmacokinetic modeling in drug discovery and development. CPT Pharmacometrics Syst Pharmacol. 2013;2:1-12. doi:10.1038/psp.2013.41
13. Teorell T. Kinetics of distribution of substances administered to the body, I: the extravascular modes of administration. Arch Int Pharmacodyn Ther. 1937;57:205-25.
14. Huang H, Zhao W, Qin N, Duan X. Recent progress on physiologically based pharmacokinetic (PBPK) model: A review based on bibliometrics. Toxics. 2024;12:433. doi:10.3390/toxics12060433
15. Chen Z, Li D, Liu M, Liu J. Graph neural networks with molecular segmentation for property prediction and structure–property relationship discovery. Comput Chem Eng. 2023;179:108403. doi:10.1016/j.compchemeng.2023.108403
16. Schwaller P, Probst D, Vaucher AC, Nair VH, Kreutter D, Laino T, et al. Mapping the space of chemical reactions using attention-based neural networks. Nat Mach Intell. 2021;3:144-52. doi:10.1038/s42256-020-00284-w
17. Jiang D, Wu Z, Hsieh CY, Chen G, Liao B, Wang Z, et al. Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models. J Cheminform. 2021;13:12. doi:10.1186/s13321-020-00479-8
18. Ohnuki Y, Akiyama M, Sakakibara Y. Deep learning of multimodal networks with topological regularization for drug repositioning. J Cheminform. 2024;16:103. doi:10.1186/s13321-024-00897-y
19. Lu X, Xie L, Xu L, Mao R, Xu X, Chang S. Multimodal fused deep learning for drug property prediction: integrating chemical language and molecular graph. Comput Struct Biotechnol J. 2024;23:1666-79. doi:10.1016/j.csbj.2024.04.030
20. Iwata H, Matsuo T, Mamada H, Motomura T, Matsushita M, Fujiwara T, et al. Prediction of Total Drug Clearance in Humans Using Animal Data: Proposal of a Multimodal Learning Method Based on Deep Learning. J Pharm Sci. 2021;110:1834-41. doi:10.1016/j.xphs.2021.01.020
21. Iwata H, Matsuo T, Mamada H, Motomura T, Matsushita M, Fujiwara T, et al. Predicting Total Drug Clearance and Volumes of Distribution Using the Machine Learning-Mediated Multimodal Method through the Imputation of Various Nonclinical Data. J Chem Inf Model. 2022;62:4057-65. doi:10.1021/acs.jcim.2c00318
22. Luo X, Chen H, Song Y, Qin Z, Xu L, He N, et al. Advancements, challenges and future perspectives on peptide-based drugs: focus on antimicrobial peptides. Eur J Pharm Sci. 2023;181:106363. doi:10.1016/j.ejps.2022.106363
23. Bjørnsdottir I, Lotz R, Lindmark B, Hood S, Christensen JK. Meeting report: DMDG peptide and oligonucleotide ADME workshop 2024. Xenobiotica. 2025;55:277-82. doi:10.1080/00498254.2025.2506702
24. Tang Y, Cao Y. Modeling pharmacokinetics and pharmacodynamics of therapeutic antibodies: progress, challenges, and future directions. Pharmaceutics. 2021;13:422. doi:10.3390/pharmaceutics13030422
25. Keefer CE, Chang G, Di L, Woody NA, Tess DA, Osgood SM, et al. The comparison of machine learning and mechanistic in vitro–in vivo extrapolation models for the prediction of human intrinsic clearance. Mol Pharm. 2023;20:5616-30. doi:10.1021/acs.molpharmaceut.3c00502
26. Ng SSS, Lu Y. Evaluating the use of graph neural networks and transfer learning for oral bioavailability prediction. J Chem Inf Model. 2023;63:5035-44. doi:10.1021/acs.jcim.3c00554
27. Soares E, Vital Brazil E, Shirasuna V, Zubarev D, Cerqueira R, Schmidt K. An open-source family of large encoder-decoder foundation models for chemistry. Commun Chem. 2025;8:193. doi:10.1038/s42004-025-01585-0
28. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596:583-9. doi:10.1038/s41586-021-03819-2
29. Lin Z, Akin H, Rao R, Hie B, Zhu Z, Lu W, et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science. 2023;379:1123-30. doi:10.1126/science.ade2574
30. Nielsen JC, Hjørringgaard C, Nygaard MM, Wester A, Elster L, Porsgaard T, et al. Machine-learning-guided peptide drug discovery: development of GLP-1 receptor agonists with improved drug properties. J Med Chem. 2024;67:11814-26. doi:10.1021/acs.jmedchem.4c00417
31. Hsueh HT, Chou RT, Rai U, Liyanage W, Kim YC, Appell MB, et al. Machine learning-driven multifunctional peptide engineering for sustained ocular drug delivery. Nat Commun. 2023;14:2509. doi:10.1038/s41467-023-38056-w
32. Liu Z, Wang J, Luo Y, Zhao S, Li W, Li SZ. Efficient prediction of peptide self-assembly through sequential and graphical encoding. Brief Bioinform. 2023;24:bbad409. doi:10.1093/bib/bbad409
33. Morehead A, Ruffolo J, Bhatnagar A, Madani A. Towards joint sequence-structure generation of nucleic acid and protein complexes with SE(3)-discrete diffusion. arXiv preprint arXiv:2401.06151. 2023.
34. Zhou Z, Li Y, Hong P, Xu H. Multimodal fusion with relational learning for molecular property prediction. Commun Chem. 2025;8:200. doi:10.1038/s42004-025-01586-z
35. Harnik Y, Shalit Peleg H, Bermano AH, Milo A. Data efficient molecular image representation learning using foundation models. Chem Sci (Camb). 2025;16:10833-41. doi:10.1039/D5SC00907C
36. Jang WD, Jang J, Song JS, Ahn S, Oh KS. PredPS: attention-based graph neural network for predicting stability of compounds in human plasma. Comput Struct Biotechnol J. 2023;21:3532-9. doi:10.1016/j.csbj.2023.07.008
37. Shoshan Y, Raboh M, Ozery-Flato M, Ratner V, Golts A, Weber JK, et al. MAMMAL–Molecular Aligned Multi-Modal Architecture and Language. arXiv preprint arXiv:2410.22367. 2024.
38. Manolache A, Tantaru D, Niepert M. MolMix: a simple yet effective baseline for multimodal molecular representation learning. arXiv preprint arXiv:2410.07981. 2024.
39. Ramos MC, Collison CJ, White AD. A review of large language models and autonomous agents in chemistry. Chem Sci (Camb). 2025;16:2514-72. doi:10.1039/D4SC03921A
40. Choi J, Nam G, Choi J, Jung Y. A perspective on foundation models in chemistry. JACS Au. 2025;5:1499-518. doi:10.1021/jacsau.4c01160
41. Pyzer-Knapp EO, Manica M, Staar P, Morin L, Ruch P, Laino T, et al. Foundation models for materials discovery – current state and future directions. NPJ Comput Mater. 2025;11:61. doi:10.1038/s41524-025-01538-0

Articles from Yonago Acta Medica are provided here courtesy of Tottori University Medical Press
