CPT: Pharmacometrics & Systems Pharmacology. 2025 Oct 28;14(12):2210–2220. doi: 10.1002/psp4.70128

LSTM‐Based Prediction of Human PK Profiles and Parameters for Intravenous Small Molecule Drugs Using ADME and Physicochemical Properties

Pingyao Luo 1,2, Rong Chen 1,2, Zhisong Wu 2, Yaou Liu 2,3, Tianyan Zhou 1,2
PMCID: PMC12706403  PMID: 41147737

ABSTRACT

Accurate prediction of human pharmacokinetics (PK) for lead compounds is one of the critical determinants of successful drug development. Traditional methods for PK parameter prediction, such as in vitro to in vivo extrapolation and physiologically based pharmacokinetic modeling, often require extensive experimental data and time‐consuming calibration of parameters. Machine learning (ML) has been widely applied to predict ADME and physicochemical properties (ADMEP descriptors), but studies focusing on concentration‐time (C‐t) profile prediction remain limited. In this study, we developed a Long Short‐Term Memory (LSTM) based ML framework to predict C‐t profiles following intravenous (IV) bolus drug administration in humans. The model used ADMEP descriptors generated by ADMETlab 3.0 and dose information as input. A total of 40 drugs were used for training and 18 for testing, with concentration data simulated from published PK models. Our approach achieved an R 2 of 0.75 across all C‐t profiles, with 77.8% of C max, 55.6% of clearance, and 61.1% of volume of distribution predictions within a 2‐fold error range, demonstrating predictive performance comparable to previously published ML methods. Furthermore, model performance was found to be associated with the input dose level and ADMEP descriptors, suggesting that the accuracy and confidence of a prediction may be anticipated in advance from these descriptors. This LSTM‐based framework, which uses a small number of compounds, enables efficient prediction of human PK profiles with IV dosing, offering a practical alternative to traditional PK prediction models. It holds promise for improving early‐phase prioritization of lead compounds and reducing reliance on animals in drug development.

Keywords: concentration‐time prediction, long short‐term memory (LSTM) networks, machine learning, pharmacokinetics


Study Highlights.

  • What is the current knowledge on the topic?
    • Machine learning (ML) approaches have been widely applied to predict pharmacokinetic (PK) and ADME parameters.
    • However, few studies focus on predicting full concentration‐time (C‐t) profiles in humans. Existing approaches often rely either on large compound datasets (ML) or on detailed experimental data (PBPK) for prediction.
  • What question did this study address?
    • Can an autoregressive Long Short‐Term Memory (LSTM) model trained using simulated IV bolus concentration‐time data, dose, and ADME/physicochemical properties accurately predict human intravenous C‐t profiles without extensive experimental inputs?
    • Can dense, simulated concentration data per compound substitute for a large number of compounds in ML training?
  • What does this study add to our knowledge?
    • This study demonstrates that an LSTM‐based model can predict human IV C‐t profiles with accuracy comparable to published PBPK and other ML‐based methods, while making minimal mechanistic assumptions and requiring fewer compounds.
  • How might this change drug discovery, development, and/or therapeutics?
    • The framework can serve as a practical tool for early PK prediction in humans with IV dosing and may help reduce reliance on animal testing in early‐stage development.

1. Introduction

In the early stages of drug discovery, understanding the absorption, distribution, metabolism, and excretion (ADME) properties of a compound is crucial for predicting pharmacokinetic (PK) parameters and concentration‐time profiles (C‐t) in humans. Poor PK properties remain one of the major causes of clinical trial failures [1], which highlights the importance of early and reliable PK evaluation to improve the success rate of drugs entering Phase I clinical trials [2].

Traditional methods, such as in vitro to in vivo extrapolation (IVIVE) [3] and physiologically based pharmacokinetic (PBPK) modeling [4, 5, 6, 7], predict PK parameters and C‐t profiles by integrating ADME parameters from in vitro assays and, if necessary, in vivo animal data, together with physicochemical properties and human physiological data. However, obtaining comprehensive in vitro or in vivo data is time‐consuming and costly, and such data typically become available only at later stages of drug development. In addition, the pharmaceutical field and regulatory agencies such as the FDA are promoting alternative testing methods to reduce reliance on animal data and enhance animal welfare [8, 9].

For these reasons, machine learning (ML) approaches have gained significant attention for PK prediction, particularly for their ability to provide rapid and experimental‐data‐free solutions in the early stages of drug design and screening. Current ML methods exhibit considerable accuracy in predicting ADME and PK parameters from chemical structures, such as fraction unbound (fu), plasma protein binding (PPB), clearance (CL), volume of distribution (Vss), half‐life (T1/2), and area under the curve (AUC) [10, 11, 12]. Some researchers have further integrated PBPK modeling with ML‐predicted ADME and physicochemical properties to predict C‐t profile [13, 14, 15, 16].

Although ML techniques have been applied to C‐t profile prediction, this area remains relatively underexplored. Classical ML methods such as random forest (RF) [16, 17], eXtreme Gradient Boosting (XGBoost) [16, 18], support vector regression (SVR) [18] and other algorithms have shown promising results in C‐t profile prediction tasks. In parallel, deep learning (DL) frameworks such as Alchemite [19] and DeepCt [20] enable direct C‐t prediction from chemical structures via data‐driven imputation and compartmental parameter inference, respectively. However, most of these models were trained on C‐t data from hundreds to thousands of compounds. Collecting high‐quality data at that scale is labor intensive, and variations in data quality or experimental conditions may compromise model robustness. Furthermore, with the exception of the study based on human data by Jia et al. [16], the majority of existing studies relied on preclinical animal data, limiting both their translational relevance and clinical applicability.

The Long Short‐Term Memory (LSTM) network is a variant of recurrent neural networks (RNNs) in DL, designed with memory cells that enable retention of long‐range dependencies and effective capture of temporal dynamics, making them particularly well‐suited for time series prediction tasks [21]. Recent studies have demonstrated that RNN‐based architectures, including LSTM networks, are well‐suited for modeling pharmacological time‐series data, such as pharmacokinetic‐pharmacodynamic (PK‐PD) relationships, and capturing the temporal relationships between drug exposure and response [22, 23]. This application highlights the potential of LSTM‐based models for direct prediction of C‐t profiles. Given that PK profiles change over time in a step‐by‐step way, models tend to perform better when each time point is predicted based on information from previous ones. Autoregressive training strategies leverage this temporal structure to improve learning efficiency, particularly under limited data conditions. By segmenting time‐series data into overlapping subsequences using a fixed‐length sliding window (i.e., window size), these strategies augment the training set and enhance generalization.
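The sliding‐window segmentation described above can be sketched as follows. This is a minimal illustration rather than the study's implementation: the function name `sliding_windows` and the toy log‐concentration values are hypothetical, and each training pair is assumed to consist of a fixed‐length window followed by the next value to be predicted.

```python
def sliding_windows(series, window_size):
    """Split a sequence into (input window, next value) training pairs."""
    if window_size < 1 or window_size >= len(series):
        raise ValueError("window_size must be in [1, len(series) - 1]")
    pairs = []
    for start in range(len(series) - window_size):
        window = series[start:start + window_size]   # inputs seen by the model
        target = series[start + window_size]         # value to predict next
        pairs.append((window, target))
    return pairs


# Toy log-concentration series (hypothetical values, for illustration only).
log_conc = [2.0, 1.8, 1.6, 1.4, 1.2, 1.0]
pairs = sliding_windows(log_conc, window_size=3)
for window, target in pairs:
    print(window, "->", target)
```

A series of length n yields n minus window_size overlapping pairs, which is how a single dense profile can contribute many training examples.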

Owing to the availability of mature tools for structure‐based ADME and physicochemical properties prediction, this study focused on C‐t profile prediction. An LSTM‐based model was developed and trained using estimated properties and simulated C‐t data derived from published PK models. The model was applied to forecast C‐t profiles following intravenous (IV) drug administration in humans, and its performance was systematically evaluated and compared with existing modeling approaches. To our knowledge, this is the first study to apply an autoregressive LSTM framework to predict intravenous human C‐t profiles. This approach shows promise in reducing reliance on animal experiments in early drug development and in providing valuable insights into drug exposure in humans.

2. Method

2.1. Data Collection

Population pharmacokinetic (PopPK) and compartmental models describing IV bolus, IV infusion, or combined administration routes were retrieved through systematic literature searches in PubMed, Google Scholar, and Embase. A diverse and representative dataset was assembled by selecting drugs from various therapeutic classes—such as antibiotics, anticancer agents, and CNS drugs—with a wide range of ADME and physicochemical properties. This diversity was intended to improve the generalizability of the model and better reflect the compound variability encountered in early‐phase drug development.

The majority of selected studies involved healthy adult populations, primarily from Europe and the United States. In cases where data from these populations were unavailable, studies conducted in other regions were considered. PK models based on patient populations were included if the subjects had mild or stable conditions—such as pain, psychiatric disorders, cardiovascular diseases, or mild hepatic/renal impairment. Studies involving severe hepatic or renal dysfunction were excluded. Studies including mixed populations (e.g., adults and the elderly) were accepted if the reported PK parameters were considered representative of healthy adults.

For each compound, typical parameter values reported in these models were used to simulate IV bolus C‐t profiles over a 24‐h period at 0.1‐h intervals using the PKPDsim package in R [24]. To augment the dataset and account for inter‐dose variability, three dose levels were established for each drug based on literature data. For compounds from multiple‐dose studies, the three doses matched the original modeling doses. For single‐dose studies, the intermediate dose was designated as the reference dose, with high and low doses defined within a 2‐fold range. These simulations provided target concentration data for ML model training and testing.
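To make the simulation grid concrete, the sketch below generates an IV bolus profile over 24 h at 0.1‐h intervals for three dose levels. It is only an illustration under stated assumptions: the study used the PKPDsim package in R with the structure of each published model, whereas this sketch uses the closed‐form one‐compartment solution in Python. The function name, the parameter values, and the choice of half and double the reference dose are all hypothetical.

```python
import math


def simulate_iv_bolus(dose_mg, cl_l_per_h, v_l, t_end_h=24.0, dt_h=0.1):
    """One-compartment IV bolus: C(t) = dose / V * exp(-(CL / V) * t)."""
    n_steps = round(t_end_h / dt_h)
    k_el = cl_l_per_h / v_l                      # elimination rate constant (1/h)
    times = [i * dt_h for i in range(n_steps + 1)]
    conc = [dose_mg / v_l * math.exp(-k_el * t) for t in times]
    return times, conc


# Hypothetical typical values, not taken from any model used in the study.
reference_dose = 100.0
doses = [reference_dose / 2, reference_dose, reference_dose * 2]
profiles = {d: simulate_iv_bolus(d, cl_l_per_h=5.0, v_l=50.0) for d in doses}

times, conc = profiles[reference_dose]
print(len(times), conc[0])
```

With both end points included, each profile contains 241 time points.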

Small molecule structures were represented using SMILES strings. From each SMILES string, ADME‐related features were generated using ADMETlab 3.0 (https://admetlab3.scbdd.com/) [12]. For each compound, 39 descriptors were extracted to represent its ADME and physicochemical characteristics (ADMEP descriptors). These descriptors are categorized into 5 groups: 1–5 represent physicochemical properties such as logS and logP, 6–14 are absorption‐related parameters such as human intestinal absorption and MDCK permeability, 15–23 are distribution‐related parameters such as fu and PPB, 24–36 are metabolism‐related parameters such as the probability of being a CYP inhibitor, and 37–39 are excretion‐related parameters such as plasma CL and T1/2 (Table S1). The ADMEP descriptors were used as input variables for the ML model.

2.2. Model Architectures and Implementation

The proposed architectures are shown in Figure 1. After generating ADMEP descriptors and simulating concentration data, we constructed an ML architecture comprising three sequential modules: an Initial Model (IM) based on a Multi‐Layer Perceptron (MLP) and two C‐t profile prediction modules—Combined Model 1 and Combined Model 2 (CM1 and CM2). The two Combined Models integrate an LSTM sub‐network for handling temporal concentration data and an MLP sub‐network for processing static ADMEP descriptors and dose information.

FIGURE 1.

Overview of the study design and model architecture. (1) Chemical structures of the selected drugs were collected and represented as SMILES strings. (2) Based on the SMILES strings, ADME‐related descriptors were generated using ADMETlab 3.0. (3) Human PK model parameters were extracted from published literature and used to simulate C‐t profiles. (4) The ADMEP descriptors, dose, and simulated PK data were then used to train a machine learning framework containing MLP and LSTM networks. (5) The trained models used ADMEP descriptors and dose to predict full C‐t profiles.

2.2.1. Initial Condition Estimation (Initial Model)

The IM estimates the drug concentration at time zero (C max) from ADMEP descriptors. The output was generated in a dose‐normalized form and subsequently multiplied by the actual dose to obtain the final estimated concentration. This prediction served as the initial condition for the subsequent time‐series prediction.

2.2.2. C‐t Profile Prediction (Combined Models)

For both CM1 and CM2, all input concentration data was log‐transformed before training. The trained CM1 (window size = 1) takes the log‐transformed C max (from the IM), ADMEP descriptors, and dose as input, and predicts early concentration values until a predefined time point (first “X” time steps). The trained CM2 (window size = “X”) then uses ADMEP descriptors, dose, and the sequence of concentrations predicted by IM and CM1 to complete the remainder of the C‐t profile.
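The hand‐off between the three modules can be sketched as a simple rollout loop. The sketch below is an assumption‐laden illustration, not the authors' code: `predict_profile` and the three `toy_*` callables are hypothetical stand‐ins for the trained networks, and it assumes that the first X points (including the time‐zero value from the IM) are produced before CM2 takes over.

```python
import math


def predict_profile(initial_model, cm1, cm2, features, dose, n_steps, x):
    """Roll out log-concentrations: IM -> CM1 (window 1) -> CM2 (window x)."""
    # IM returns a dose-normalized C max; scale by dose, then log-transform.
    log_conc = [math.log(initial_model(features) * dose)]
    # CM1 predicts each early point from the single previous point.
    while len(log_conc) < min(x, n_steps):
        log_conc.append(cm1(log_conc[-1:], features, dose))
    # CM2 predicts each later point from the previous x points.
    while len(log_conc) < n_steps:
        log_conc.append(cm2(log_conc[-x:], features, dose))
    return log_conc


# Hypothetical placeholder models (the real ones are trained MLP/LSTM networks).
def toy_im(features):
    return 0.02                                   # dose-normalized C max (1/L)


def toy_cm1(window, features, dose):
    return window[-1] - 0.05                      # fixed decrement per step


def toy_cm2(window, features, dose):
    slope = (window[-1] - window[0]) / (len(window) - 1)
    return window[-1] + slope                     # continue the window's trend


profile = predict_profile(toy_im, toy_cm1, toy_cm2, features={}, dose=100.0,
                          n_steps=241, x=5)
print(len(profile), profile[0], profile[-1])
```

Because every CM2 input after the hand‐off is itself a prediction, the quality of the IM and CM1 outputs directly affects the remainder of the profile.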

The window size of CM2 was selected empirically by comparing model performance across different window lengths using metrics such as the coefficient of determination (R 2), root mean square error (RMSE), and mean absolute percentage error (MAPE), and choosing the configuration that yielded the best overall accuracy (details of metrics calculation and model architectures are provided in Method Sections S1 and S2).

Models were implemented in Python (v3.9.18) using the PyTorch (v2.1.1) [25] framework and trained using the standard deep learning workflow. Details of training settings, including optimizer, learning rate scheduling, loss functions, and batch size, are provided in the Method Section S2.

2.3. Feature Selection

A total of 40 features (39 ADMEP descriptors and dose information) could lead to model instability. To enhance robustness and mitigate overfitting, feature selection was performed using SHAP (SHapley Additive exPlanations) analysis [26]. SHAP assigns each feature an importance value by quantifying its contribution to the model's predictions. To determine the optimal number of features, the cumulative importance of ranked SHAP values was calculated. Based on cumulative importance, the minimal subset of features was selected such that their cumulative SHAP importance remained below a predefined threshold “P” (details of calculating SHAP values are provided in Method Section S3).
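A minimal sketch of the cumulative‐importance selection is given below. It follows the wording above literally, keeping top‐ranked features while their cumulative share of importance stays at or below P; a common alternative convention instead keeps the smallest set that reaches P, and the supplement should be consulted for the rule actually used. The function name and the mean absolute SHAP values are hypothetical.

```python
def select_by_cumulative_importance(importance, threshold):
    """Keep top-ranked features while the cumulative share stays <= threshold."""
    total = sum(importance.values())
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, value in ranked:
        share = value / total
        # Always keep at least one feature; stop before exceeding the threshold.
        if selected and cumulative + share > threshold:
            break
        selected.append(name)
        cumulative += share
    return selected, cumulative


# Hypothetical mean |SHAP| values, for illustration only.
mean_abs_shap = {"dose": 0.40, "logP": 0.25, "fu": 0.15,
                 "PPB": 0.12, "logS": 0.05, "MDCK": 0.03}
selected, covered = select_by_cumulative_importance(mean_abs_shap, threshold=0.90)
print(selected, round(covered, 2))
```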

2.4. PK Parameter Derivation

For each drug and dose level, concentration values were either observed or predicted at 0.1‐h intervals over a 24‐h period. Based on non‐compartmental analysis (NCA), key pharmacokinetic parameters [T1/2, mean residence time (MRT), CL, and Vss] were estimated from both observed and predicted profiles. For each compound, parameter values were calculated for all three dose levels and subsequently averaged to obtain a single representative value per compound. All calculations were performed using the PKNCA package in R [27].
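The standard NCA relationships behind these four parameters can be sketched as follows. This is a simplified illustration, not a reproduction of PKNCA: it assumes linear trapezoidal integration, a terminal slope fitted to the last 20 points, and extrapolation to infinity, all of which are choices made here for the example. The function name and the test profile are hypothetical.

```python
import math


def nca_iv_bolus(times, conc, dose, n_terminal=20):
    """Estimate T1/2, MRT, CL, and Vss from an IV bolus profile."""
    # Terminal slope (lambda_z): least-squares fit of ln(C) vs. t on the tail.
    t_tail = times[-n_terminal:]
    y_tail = [math.log(c) for c in conc[-n_terminal:]]
    t_mean = sum(t_tail) / n_terminal
    y_mean = sum(y_tail) / n_terminal
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in zip(t_tail, y_tail))
             / sum((t - t_mean) ** 2 for t in t_tail))
    lambda_z = -slope

    # AUC and AUMC up to the last time point (linear trapezoidal rule).
    auc = aumc = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        auc += dt * (conc[i] + conc[i - 1]) / 2
        aumc += dt * (times[i] * conc[i] + times[i - 1] * conc[i - 1]) / 2

    # Extrapolate both integrals to infinity using the terminal slope.
    t_last, c_last = times[-1], conc[-1]
    auc_inf = auc + c_last / lambda_z
    aumc_inf = aumc + t_last * c_last / lambda_z + c_last / lambda_z ** 2

    mrt = aumc_inf / auc_inf          # valid for IV bolus input
    cl = dose / auc_inf
    return {"t_half": math.log(2) / lambda_z, "MRT": mrt, "CL": cl, "Vss": cl * mrt}


# Hypothetical mono-exponential profile: dose 100 mg, V = 10 L, k = 0.2 1/h.
times = [i * 0.1 for i in range(241)]
conc = [100.0 / 10.0 * math.exp(-0.2 * t) for t in times]
params = nca_iv_bolus(times, conc, dose=100.0)
print({name: round(value, 3) for name, value in params.items()})
```

For this test profile the true values are known analytically (CL = 2 L/h, Vss = 10 L, MRT = 5 h, T1/2 = ln 2 / 0.2 h), which makes it a convenient check of the calculation.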

2.5. Assessment of Prediction Performance

The primary PK endpoints included the C‐t profile, C max, and four NCA‐derived parameters. ML‐based predictions for these endpoints were compared with observed data in the testing dataset. Performance was assessed using R 2, median fold error (MFE), geometric mean fold error (GMFE), and the proportion of predictions within a twofold error range (details of metrics calculation are provided in Method Section S1). For C max and PK parameters, observed‐to‐predicted ratio plots were also provided as a supplementary evaluation. To benchmark model performance, we compared the results with those from representative published studies employing ML‐, PBPK‐, or mechanism‐based approaches. Since each metric captures a different aspect of performance, we interpreted them jointly to assess whether our model achieved accuracy comparable to published studies. Furthermore, to assess the impact of dose level on prediction accuracy, dose‐normalized predictions were generated for each compound at four input dose levels (1, 10, 100, and 1000 mg) and compared with observed data.
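The fold‐error metrics can be sketched as below. The exact formulas used in the study are given in Method Section S1, which is not reproduced here, so the definitions in this sketch are commonly used ones and should be read as assumptions: a symmetric fold error (always at least 1), MFE as its median, and GMFE as its geometric mean. The function name and the example values are hypothetical.

```python
import math
from statistics import median


def fold_error_metrics(observed, predicted):
    """Summarize accuracy with symmetric fold errors (each value is >= 1)."""
    fold_errors = [max(p / o, o / p) for o, p in zip(observed, predicted)]
    log_fe = [math.log10(fe) for fe in fold_errors]
    return {
        "MFE": median(fold_errors),
        "GMFE": 10 ** (sum(log_fe) / len(log_fe)),
        "within_2_fold": sum(fe <= 2.0 for fe in fold_errors) / len(fold_errors),
    }


# Hypothetical observed and predicted values (e.g., clearance in L/h).
observed = [1.0, 2.0, 4.0, 8.0]
predicted = [1.0, 4.0, 1.0, 8.0]
metrics = fold_error_metrics(observed, predicted)
print(metrics)
```

Unlike R 2, these metrics are computed on ratios, so a single large miss affects them less strongly than it affects a squared‐residual statistic.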

All statistical metrics and plots were generated using R software (version 4.1.3).

3. Result

3.1. Data Collection

Dose information, PK model results, and ADMEP descriptors of 58 small molecule drugs were collected. Of these drugs, 40 were assigned to the training dataset and 18 to the testing dataset (listed in Table S2). The distributions of molecular weight and physicochemical properties in the testing set were within the range of the training set, as shown in Figure S1. For each drug, concentration data were simulated at 0.1‐h intervals over 0–24 h at three dose levels, yielding approximately 40,000 data points in total.
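The size of the simulated dataset can be checked with a short calculation. The sketch assumes that both end points (0 and 24 h) are included in each profile, which gives 241 time points per profile; the variable names are illustrative.

```python
n_time_points = round(24 / 0.1) + 1   # 0, 0.1, ..., 24.0 h -> 241 points
n_doses = 3

total_points = 58 * n_doses * n_time_points      # all compounds
training_points = 40 * n_doses * n_time_points   # training compounds only
testing_points = 18 * n_doses * n_time_points    # testing compounds only

print(total_points, training_points, testing_points)
```

Under this assumption the full dataset contains 41,934 points, of which 28,920 come from the 40 training compounds.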

3.2. CM2 Architecture Determination

To optimize the window size in CM2, we compared the predictive performance across different values (2, 3, 5, 7, and 10). As shown in Table 1, although smaller windows also performed reasonably well, a window size of 5 achieved the best overall performance, with the highest R 2 (0.88) and lowest RMSE (1.57 mg/L). In contrast, larger sizes tended to introduce more error or instability. Therefore, a window size of 5 was selected to balance temporal feature extraction and predictive robustness.

TABLE 1.

Summary of predictive performance for different window sizes in Combined Model 2.

| Window size | R 2 | RMSE (mg/L) | MAPE (%) |
|---|---|---|---|
| 2 | 0.87 | 1.63 | 1505 |
| 3 | 0.78 | 2.13 | 4007 |
| 5 | 0.88 | 1.57 | 3515 |
| 7 | 0.79 | 2.08 | 2589 |
| 10 | 0.84 | 1.85 | 2558 |

3.3. Feature Selection

Different thresholds (P) of cumulative SHAP importance were set to select the features for the IM and the CMs. For the IM, P was 80%, which selected 19 features. For the two CMs, P was set to 90%; CM1 and CM2 identified 20 and 21 features, respectively, and the two sets were not completely identical (Table S1). The union of the two feature sets (27 features) was then used as the final set of training features for the CMs.
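The reported counts also fix how many features the two CMs shared. Writing the two selected sets as F_CM1 and F_CM2 (symbols introduced here only for this calculation), inclusion–exclusion gives:

```latex
\[
|F_{\mathrm{CM1}} \cap F_{\mathrm{CM2}}|
  = |F_{\mathrm{CM1}}| + |F_{\mathrm{CM2}}| - |F_{\mathrm{CM1}} \cup F_{\mathrm{CM2}}|
  = 20 + 21 - 27 = 14
\]
```

That is, 14 features were selected by both models, 6 only by CM1, and 7 only by CM2.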

3.4. Evaluation of C‐t Profiles

Figure 2 presented the predicted C‐t profiles in the testing dataset. Since each drug had three dose levels, the plotted profiles represented the average dose‐normalized concentrations to enhance visual clarity. Overall, the predicted and observed profiles were well aligned across most compounds, with an MFE of 2.50, R 2 of 0.75, and 41.6% of the profiles falling within a 2‐fold error margin, which indicated reasonably good predictive performance (Table 2).

FIGURE 2.

Predicted dose‐normalized C‐t profiles in the testing set. Each profile represents the average of three dose levels per compound. Solid lines indicate observed values; dashed lines indicate predicted values.

TABLE 2.

Summary of predictive performance for our model and selected published studies using ML‐, PBPK‐, and mechanism‐based approaches.

Research Main method Metrics Profile C max T1/2 MRT CL Vss
ML

This research

n = 40 (training)/18 (test)

LSTM‐based R 2 0.75 0.26 −0.66 −0.14 0.13 −0.12
GMFE 4.61 1.85 2.01 1.49 1.91 1.98
MFE 2.50 1.42 1.38 1.37 1.49 1.64
Within 2‐fold 41.6% 77.8% 55.6% 83.3% 55.6% 61.1%

Becker (time split, rat, i.v. + p.o.)

n = 21,000 (total) a

Derivation/Direct*

DeepCt [20] R 2 0.66 0.07/0.15 −0.04/0.19 0.32/0.31 −0.11/0.24
GMFE
MFE 2.03 1.87/1.84 1.96/1.83 1.83/1.87 2.13/1.75
Within 2‐fold 54%/55% 51%/56% 56%/55% 45%/60%

Pillai b (rat, i.v.)

n = 316/61

XGBoost [18] R 2 0.65
GMFE 3.01
MFE 2.35
Within 2‐fold 41.9%

Handa c (rat, i.v.)

N = 970 (total)

RF [17] R 2 0.18–0.61 0.47 0.63
GMFE
MFE
Within 2‐fold 49.8%–91.2% 76.1% 75.7%

Obrezanova (rat, i.v. + p.o.)

n = 5895/632

Derivation/Direct*

Alchemite [19] R 2 0.46/0.42 0.30/0.39 0.54/0.57 0.28/0.45
GMFE
MFE d 1.62/1.70 125/85.1 1.95/1.91 2.19/2.00
Within 2‐fold e 87% 55% 78% 78%
ML + PBPK/ML

Jia (human, i.v.)

n = 667/106

ML‐PBPK hybrid model and hierarchical ML frameworks [16] R 2 0.59 −0.26 0.48 0.60
GMFE 2.33 2.14 2.00 1.88
MFE
Within 2‐fold 55% 59% 64% 62%
PBPK

Mao f (human, i.v. + p.o.)

n = 18 (total)

Individualized PBPK [6] R 2
GMFE
MFE 2.16
Within 2‐fold 44.4% 94% 64% 64%

Geci f (human, i.v. + p.o.)

n = 160 (total)

high‐throughput

PBPK [14]

R 2 0.43
GMFE 3.57 3.03 3.51
MFE 2.62 2.18 2.44
Within 2‐fold 37.5% 44.1% 36.6%
Mechanism‐based

Himstedt f (human, p.o.)

n = 40 (total)

Mechanism‐based [3] R 2 0.77 0.40 0.53 0.76 0.73
GMFE 1.83 1.90 1.73 1.77 1.81
MFE 1.84 1.59 1.69 1.81 1.60 1.70
Within 2‐fold 52.5% 65% 67.5% 62.5% 67.5% 70.0%
* “Derivation” indicates that parameters were calculated from the predicted C‐t profiles, while “Direct” refers to parameters predicted directly using a specific model.

a “Total” refers to PK experiments (not compounds); data split: 70% training, 5% validation, 25% test.

b Except for R 2, all metrics were calculated using observed concentrations extracted from published figures via digitization, based on available time points.

c For C‐t profile predictions, the value of R 2 represented the range of performance metrics across 8 selected time points.

d As only the RMSE in the log‐transformed scale was reported in the original study, MFE was approximated by exponentiating the RMSE (MFE ≈ 10^RMSE).

e The “within 2‐fold” metric represents the best‐performing model among eight model architectures in the study; other metrics were derived from the same model architecture.

f The results were summarized and reported in Himstedt et al., and are cited directly from their publication.

Figure 3a showed the predicted versus observed concentrations across all time points. Most values fell within the 5‐fold error range and clustered closely around the line of identity (dashed red). Figure 3b illustrated the log2‐transformed FE over time, where prediction errors remained low in the early phase but gradually increased in the terminal phase. Together, both plots indicate that model performance was more accurate during early time periods and at higher drug concentrations, with moderate bias emerging toward later time points and at lower concentrations.

FIGURE 3.

Predicted versus observed concentrations (mg/L) in the testing dataset on a log–log scale (a). log 2‐transformed fold error (FE) over time (b). The red dashed line represents line of identity indicating a 1:1 correlation, while the black dotted and solid lines indicate the 2‐fold and 5‐fold error bounds, respectively.

The scatter plot of predicted versus observed concentrations from the IM, representing dose‐normalized C max predictions, was shown in Figure 4a. Most data points were distributed around the line of identity, with 77.8% falling within the 2‐fold error range, and the model exhibited relatively low bias (MFE = 1.42, Table 2; median observed‐to‐predicted ratio ≈1, Figure 4b), suggesting that the model effectively captured the trends of C max.

FIGURE 4.

Predicted versus observed values for C max and PK parameters in the testing set on a log–log scale (a). The red dashed line represents the line of identity indicating a 1:1 correlation, while the black dotted and solid lines indicate the 2‐fold and 5‐fold error bounds, respectively. Distribution of observed‐to‐predicted ratios for C max and PK parameters (b). The red dashed line represents a ratio of 1 (perfect prediction), and the black dotted lines indicate 0.5‐ and 2‐fold bounds.

In Figure 5a, the 39 ADMEP descriptors of 58 compounds were projected into a two‐dimensional space using t‐distributed stochastic neighbor embedding (t‐SNE). Notably, compounds in the lower‐right region of the t‐SNE space were almost exclusively associated with better prediction accuracy (R 2 ≥ 0.6).

FIGURE 5.

t‐SNE plot of 58 compounds based on ADME descriptors (a). Point color indicates dataset (blue: Training, orange: Testing), and shape indicates prediction accuracy (cross: R 2 < 0.4, hollow: 0.6 > R 2 ≥ 0.4, solid: R 2 ≥ 0.6). t‐SNE plot of compounds based on ADME descriptors, clustered by k‐means (b). Each point shape indicates the best‐performance dose at which the model achieved optimal prediction: ● = 1 mg, ▲ = 10 mg, ■ = 100 mg, + = 1000 mg.

3.5. Evaluation of PK Parameters

Most predictions fell within a 5‐fold error margin, and over 50% were within 2‐fold error (Figure 4a). Among all parameters, MRT achieved the highest prediction accuracy, with 83.3% of results within 2‐fold error (Table 2). Figure 4b presented the distribution of observed‐to‐predicted ratios for each parameter. Ratios were generally centered around 1 and mostly within the commonly accepted 2‐fold range (0.5–2). A slight upward bias was observed in MRT and Vss, and a slight downward shift in CL, indicating minor systematic deviations. Although the R 2 values were low for all parameters, the GMFE values were generally below 2, and all MFE values except for Vss were below 1.5 (Table 2), suggesting reasonable agreement between predicted and observed values, without significant systematic bias.

3.6. Comparison of Prediction Performance

The predictive performance of our model was evaluated on six key PK endpoints: the C‐t profile, C max, T1/2, MRT, CL, and Vss. We compared these results with some published studies applying various modeling strategies, including ML, PBPK‐based, and mechanism‐based approaches. Since not all publications reported quantitative performance metrics for the profiles, some results were estimated by digitizing figures or summarized based on information provided by Himstedt et al. [3]. The comparison across different studies is presented in Table 2.

Despite being trained on a relatively small dataset, our model achieved the highest R 2 for C‐t profile predictions among the studies reviewed. In terms of other metrics, its performance was comparable to the DeepCt model [20], which was trained on data from more than 20,000 PK experiments (e.g., within 2‐fold: MRT: 83.3% vs. 56%, Vss: 61.1% vs. 60%). Accuracy for full‐profile prediction was also on par with models using XGBoost [18], which achieved 41.9% of profile predictions within 2‐fold error, compared to 41.6% in our study. Another study comparing the ML‐PBPK hybrid model to hierarchical ML frameworks found that the latter performed better for overall C‐t profiles (55% within 2‐fold) [16], albeit with higher GMFE values for other endpoints than our result. Other methods showed better results for some of the endpoints. For example, the RF‐based model achieved slightly higher within‐2‐fold accuracy for CL and Vss (~76%), and greater precision of concentrations at specific time points (~90%), though it did not predict the entire 24‐h profile [17]. The deep learning model based on the Alchemite method also showed good performance in rat PK prediction (e.g., R 2 up to 0.63 for CL, 87% of C max within 2‐fold error), and demonstrated reasonable accuracy in C‐t profile prediction, but exact results for C‐t profiles were not fully disclosed [19].

Compared with PBPK approaches, our model also demonstrated competitive accuracy. Individualized PBPK frameworks, integrating in vitro and in vivo data from multiple species, achieved excellent predictive accuracy (e.g., 94% of C max within 2‐fold error) [6], but often required more complex inputs. In contrast, the high‐throughput PBPK framework [14] based on in silico data offered broad applicability but slightly lower predictive accuracy, with MFE > 2 and GMFE > 3 across all endpoints. For C‐t profile prediction, the proportion of predictions within a 2‐fold error range in two studies (44.4% and 37.5%) was similar to that observed in our study (41.6%).

Additionally, a comprehensive approach combining empirical data, expert knowledge, and modeling pipelines demonstrated strong performance, with 52.5% of C‐t profiles within a 2‐fold error range and high R 2 for all parameters [3]. Despite this, our model achieved comparable MFE and GMFE values.

3.7. Influence of Dose Setting

Figure S2 presents representative examples of predicted dose‐normalized C‐t profiles with four doses input to CMs (1, 10, 100, and 1000 mg) to illustrate the model's performance under different dose settings. The prediction accuracy across dose levels varied among compounds: for instance, comp44 was best predicted at low dose, comp39 at medium dose, comp56 at high dose, while comp20 showed similar performance across all doses. A complete summary of the model's prediction performance across all dose levels and compounds is available in Table S2. For most compounds, the best‐performance doses were close to the actual doses. To further explore the relationship between compound features and predictive performance across input doses, all compounds were projected into a t‐SNE plot based on ADMEP descriptors and clustered using k‐means (Figure 5b). The resulting pattern revealed dose‐related clustering tendencies: green‐labeled compounds (Cluster 3) tended to correspond to higher dose settings (100–1000 mg), red‐labeled (Cluster 1) to medium doses (10–100 mg), and blue‐labeled (Cluster 2) to lower doses, where 1 or 10 mg appeared more frequently.

4. Discussion

In this study, we developed an LSTM‐based framework to predict C‐t profiles for IV bolus administered drugs using ADME and physicochemical properties. By relying on these features rather than raw chemical structures and avoiding explicit mechanistic assumptions, our model offers a flexible and data‐driven alternative that is particularly suited for early‐stage drug development, where rapid and efficient prediction is essential. ADMEP descriptors can be interpreted as a lower‐dimensional, PK‐relevant embedding of structure information. This approach allows the model to focus on features more directly associated with PK behavior, thereby improving prediction accuracy and reducing the computational cost of processing raw structural data. Similar strategies have been employed in previous studies, such as those by Pillai et al. [18] and Jia et al. [16], which incorporated features such as CL, Vss, fu, pKa (acid/base), and logP. These features were also retained in our model following SHAP‐based feature selection, which additionally identified other relevant features for C‐t profile prediction, such as PPB (Table S1). Across six PK endpoints, our model showed good agreement with observed values, particularly for C max and MRT. Although most predicted PK parameters fell within acceptable error margins and exhibited relatively low MFE, the corresponding R 2 values were generally low across all endpoints except for the C‐t profiles. This can be partly attributed to the sensitivity of R 2 to outliers and to the small sample size. A few predictions with large residuals, or endpoints with highly skewed distributions, may reduce R 2, even when fold‐level prediction performance remains acceptable. In addition, we compared our model's predictions of CL and Vss with values generated from ADMETlab 3.0 (Figure S3 and Table S3). Although ADMETlab achieved higher R 2 values (0.59 for CL and 0.58 for Vss), our model exhibited lower MFE, along with median observed‐to‐predicted ratios closer to 1. Paired t‐tests also showed no significant differences between our predictions and the observed values, whereas the ADMETlab 3.0 predictions differed significantly from them (this study vs. ADMETlab 3.0: p = 0.32 vs. 0.03 for CL; p = 0.76 vs. 0.006 for Vss), suggesting that our model may offer improved alignment with actual values.

In Figure 2, for a few compounds, slight fluctuations were observed in the early phase of prediction. This may be due to inconsistencies between CM1 and CM2, especially at the point where CM1 passes prediction to CM2. The distribution of CM1‐generated predictions used as input for CM2 during prediction may differ from the data distribution seen during training, which may lead to unstable predictions [28, 29]. As shown in Figure 3, model bias tended to increase at later time points and lower concentrations. Therefore, the high R 2 value for C‐t profiles primarily reflects accurate predictions in high‐concentration regions, while relatively poor performance in low‐concentration and terminal phase could lead to a low proportion of concentration predictions within the 2‐fold error range and low R 2 for T1/2. This, in turn, indirectly impaired the accuracy of CL and Vss. A key factor is the autoregressive nature of the model, where each prediction depends on the output of the previous time step. This structure allows early prediction errors to propagate and accumulate over time, especially in the terminal phase of the profile [28]. Another possible reason is that low concentration data change very little over time. This can make the model learn a too‐smooth curve and miss small changes, which adds bias. These issues may also reflect limitations of the current weighted L1–MSE loss, which could be refined (e.g., adaptive weighting or weighted MSE) to remain sensitive to both low‐ and high‐concentration data [30, 31, 32]. In contrast, PBPK [6] and mechanism‐based [3] models embed physiological or mechanistic constraints, making them more reliable for terminal‐phase extrapolation than purely data‐driven ML approaches.
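The way autoregressive errors accumulate toward the terminal phase can be illustrated with a toy calculation. The sketch rests on a deliberately simple assumption that is not taken from the study: every step is off by the same small multiplicative factor in the same direction, so the errors compound rather than cancel. The function name and the 0.5% per‐step figure are hypothetical.

```python
def compounded_fold_error(per_step_fold_error, n_steps):
    """Fold error after n_steps if every step is off by the same factor."""
    return per_step_fold_error ** n_steps


# Hypothetical 0.5% overprediction at each 0.1-h step of a 24-h profile.
for hours in (1, 6, 12, 24):
    steps = round(hours / 0.1)
    print(hours, "h:", round(compounded_fold_error(1.005, steps), 2))
```

Under this assumption, an error that is negligible in the first hour grows to roughly 3.3‐fold by 24 h (240 steps), which is consistent with accuracy degrading mainly at late time points.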

t‐SNE‐based clustering revealed that compounds with similar ADME and physicochemical properties tended to exhibit consistent predictive performance and similar best‐performance doses. In practical scenarios where the dose is unknown, a compound's position and cluster assignment in the ADMEP descriptor space may help identify the “best‐performance” input dose, especially for red‐ or green‐cluster compounds. Although some blue‐labeled compounds performed best above 10 mg (Figure 5b), most of them had actual doses below this best‐performance dose or showed comparable accuracy at both high and low doses (Table S2).

While offering strong mechanistic interpretability, PBPK and mechanism‐based approaches often require high‐quality data and careful calibration of parameters, which limits their utility in early‐stage drug development. ML approaches provide an alternative, but many require very large datasets. Although trained on a relatively small set of compounds, our model achieved comparable accuracy. This was partly because the PK simulation yielded a dense concentration‐time dataset with approximately 30,000 data points, which provided sufficient information for training. These data were generated primarily from linear PK models, and concentrations were log‐transformed prior to training. Therefore, the log‐transformed C‐t profiles tended to show linear trends over time [33]. Within each autoregressive window, the linear trend led to relatively consistent proportional changes between time points, allowing the model to learn regular transitions instead of complex or random fluctuations. These factors may partially explain the improved model efficiency and the reduced number of compounds required.
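The regularity described above can be checked numerically for the simplest case. The sketch below assumes a one‐compartment IV bolus model, C(t) = (Dose/V)·exp(−(CL/V)·t), with hypothetical parameter values; the published models used for simulation may have more compartments, in which case the log‐scale profile is only approximately linear within each phase.

```python
import math

def iv_bolus_conc(dose, v, cl, t):
    """One-compartment IV bolus: C(t) = dose / V * exp(-(CL / V) * t)."""
    return dose / v * math.exp(-(cl / v) * t)

# Hypothetical values: dose (mg), volume (L), clearance (L/h), time step (h).
dose, v, cl, dt = 100.0, 50.0, 5.0, 0.5

times = [i * dt for i in range(9)]  # uniform grid, as an LSTM input would use
log_c = [math.log(iv_bolus_conc(dose, v, cl, t)) for t in times]

# Change in ln C between consecutive time points.
steps = [b - a for a, b in zip(log_c, log_c[1:])]
print([round(s, 6) for s in steps])
```

Every step equals −(CL/V)·Δt, so on the log scale the transition from one time point to the next is the same throughout the profile, which is the kind of regular pattern a sequence model can learn from few compounds.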

Our study also has several limitations. First, the dataset is relatively small and focused solely on IV administration. Future work should include additional compounds and explore other routes such as oral delivery. Second, the model exhibited instability at the transition between CM1 and CM2, and deviations were observed during the terminal (elimination) phase. Further optimization is needed to improve prediction performance. Third, the LSTM architecture used in our model requires uniformly spaced time steps, so extending the model to handle irregularly sampled PK data would require further methodological adaptation. In recent years, emerging ML approaches such as Neural Ordinary Differential Equations (NODEs) [34, 35] and the PKsciML framework [36, 37] have shown growing potential in PK modeling. These methods attempt to incorporate prior knowledge or model structure into the machine learning process. In the future, combining such methods with compound structural or ADME features may further enhance early PK prediction.
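One simple workaround for the uniform‐spacing requirement is to resample irregular observations onto a regular grid before they reach the model. The sketch below uses log‐linear interpolation; it is an illustrative assumption rather than a method used or evaluated in this study, and the function name, sampling times, and concentrations are hypothetical.

```python
import math

def resample_log_linear(times, concs, dt):
    """Interpolate ln(C) linearly onto a uniform grid starting at times[0].

    Assumes strictly increasing times and strictly positive concentrations.
    """
    if len(times) != len(concs) or len(times) < 2:
        raise ValueError("need at least two paired (time, concentration) points")
    log_c = [math.log(c) for c in concs]
    n_steps = int(math.floor((times[-1] - times[0]) / dt + 1e-9))
    grid = [times[0] + i * dt for i in range(n_steps + 1)]
    out = []
    j = 0  # index of the left end of the current sampling interval
    for t in grid:
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        t0, t1 = times[j], times[j + 1]
        w = (t - t0) / (t1 - t0)
        out.append(math.exp(log_c[j] + w * (log_c[j + 1] - log_c[j])))
    return grid, out

# Hypothetical irregular sampling times (h) for a mono-exponential profile.
obs_times = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
obs_concs = [2.0 * math.exp(-0.1 * t) for t in obs_times]

grid, resampled = resample_log_linear(obs_times, obs_concs, dt=0.25)
print(len(grid), round(resampled[-1], 4))
```

Log‐linear interpolation is exact only for mono‐exponential segments and smooths over sparse regions, so it does not replace approaches designed for continuous time, such as the NODE methods cited above.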

5. Conclusion

In summary, our LSTM‐based autoregressive framework, driven by ADMEP descriptors and dosing information, provides a flexible, data‐driven approach for predicting full human PK profiles. Compared with existing PBPK‐based and ML methods, our model offers a simplified alternative and achieves comparable predictive performance without relying on large numbers of compounds. This framework has potential for early‐phase PK evaluation and virtual compound screening.

Author Contributions

P.L., R.C., Z.W., and T.Z. wrote the manuscript; P.L., R.C., and T.Z. designed the research; P.L., Z.W., and Y.L. performed the research; P.L. and Y.L. analyzed the data.

Conflicts of Interest

The authors declare no conflicts of interest.

Supporting information

Data S1: psp470128‐sup‐0001‐Supinfo1.pdf.

PSP4-14-2210-s003.pdf (266.8KB, pdf)

Table S1: Summary of feature (ADMEP descriptor) names and feature selection.

PSP4-14-2210-s004.xlsx (16.9KB, xlsx)

Table S2: Summary of drug names and the model's prediction performance across all dose levels and drugs.

PSP4-14-2210-s002.xlsx (15.6KB, xlsx)

Table S3: Summary of predictive performance in CL and Vss for our model and ADMETlab 3.0.

PSP4-14-2210-s006.docx (16.5KB, docx)

Data S2: psp470128‐sup‐0005‐Supinfo2.py.

PSP4-14-2210-s001.py (19.5KB, py)

Data S3: psp470128‐sup‐0006‐Supinfo3.xlsx.

PSP4-14-2210-s007.xlsx (32KB, xlsx)

Data S4: psp470128‐sup‐0007‐Supinfo4.docx.

PSP4-14-2210-s005.docx (29.5KB, docx)

Acknowledgments

The authors thank all contributors who provided insightful feedback and support during the study. The authors further acknowledge the use of ChatGPT (https://openai.com/chatgpt/) to improve the grammar and clarity of the manuscript.

Funding: This study was supported by the National Key Research and Development Program of China (2022YFF1203003).

Data Availability Statement

The datasets analyzed during the current study are available from the corresponding author on reasonable request.

References

  • 1. Kola I. and Landis J., “Can the Pharmaceutical Industry Reduce Attrition Rates?,” Nature Reviews Drug Discovery 3 (2004): 711–716.
  • 2. Paul S. M., Mytelka D. S., Dunwiddie C. T., et al., “How to Improve R&D Productivity: the Pharmaceutical Industry's Grand Challenge,” Nature Reviews Drug Discovery 9 (2010): 203–214.
  • 3. Himstedt A., Rapp H., Stopfer P., et al., “Beyond CL and VSS: A Comprehensive Approach to Human Pharmacokinetic Predictions,” Drug Discovery Today 29 (2024): 104238.
  • 4. Davies M., Jones R. D. O., Grime K., et al., “Improving the Accuracy of Predicted Human Pharmacokinetics: Lessons Learned From the AstraZeneca Drug Pipeline Over Two Decades,” Trends in Pharmacological Sciences 41 (2020): 390–408.
  • 5. Sager J. E., Yu J., Ragueneau‐Majlessi I., and Isoherranen N., “Physiologically Based Pharmacokinetic (PBPK) Modeling and Simulation Approaches: A Systematic Review of Published Models, Applications, and Model Verification,” Drug Metabolism and Disposition 43 (2015): 1823–1837.
  • 6. Mao J., Ma F., Yu J., et al., “Shared Learning From a Physiologically Based Pharmacokinetic Modeling Strategy for Human Pharmacokinetics Prediction Through Retrospective Analysis of Genentech Compounds,” Biopharmaceutics & Drug Disposition 44 (2023): 315–334.
  • 7. Zhang T., Heimbach T., Lin W., Zhang J., and He H., “Prospective Predictions of Human Pharmacokinetics for Eighteen Compounds,” Journal of Pharmaceutical Sciences 104 (2015): 2795–2806.
  • 8. Stokes W. S., “Animals and the 3Rs in Toxicology Research and Testing: The Way Forward,” Human & Experimental Toxicology 34 (2015): 1297–1303.
  • 9. Congress of the United States, “H.R.2565 – FDA Modernization Act of 2021; 117th Congress (2021–2022),” 2021 [cited 2025 01–02], https://www.congress.gov/bill/117th‐congress/house‐bill/2565/text?r=33&s.
  • 10. Schneckener S., Grimbs S., Hey J., et al., “Prediction of Oral Bioavailability in Rats: Transferring Insights From in Vitro Correlations to (Deep) Machine Learning Models Using in Silico Model Outputs and Chemical Structure Parameters,” Journal of Chemical Information and Modeling 59 (2019): 4893–4905.
  • 11. Wang Y., Liu H., Fan Y., et al., “In Silico Prediction of Human Intravenous Pharmacokinetic Parameters With Improved Accuracy,” Journal of Chemical Information and Modeling 59 (2019): 3968–3980.
  • 12. Fu L., Shi S., Yi J., et al., “ADMETlab 3.0: An Updated Comprehensive Online ADMET Prediction Platform Enhanced With Broader Coverage, Improved Performance, API Functionality and Decision Support,” Nucleic Acids Research 52 (2024): W422–W431.
  • 13. Ren H.‐C., Sai Y., and Chen T., “Evaluation of Generic Methods to Predict Human Pharmacokinetics Using Physiologically Based Pharmacokinetic Model for Early Drug Discovery of Tyrosine Kinase Inhibitors,” European Journal of Drug Metabolism and Pharmacokinetics 44 (2019): 121–132.
  • 14. Geci R., Gadaleta D., de Lomana M. G., et al., “Systematic Evaluation of High‐Throughput PBK Modelling Strategies for the Prediction of Intravenous and Oral Pharmacokinetics in Humans,” Archives of Toxicology 98 (2024): 2659–2676.
  • 15. Mavroudis P. D., Teutonico D., Abos A., and Pillai N., “Application of Machine Learning in Combination With Mechanistic Modeling to Predict Plasma Exposure of Small Molecules,” Frontiers in Systems Biology 3 (2023): 1180948.
  • 16. Jia X., Teutonico D., Dhakal S., et al., “Application of Machine Learning and Mechanistic Modeling to Predict Intravenous Pharmacokinetic Profiles in Humans,” Journal of Medicinal Chemistry 68 (2025): 7737–7750.
  • 17. Handa K., Wright P., Yoshimura S., Kageyama M., Iijima T., and Bender A., “Prediction of Compound Plasma Concentration–Time Profiles in Mice Using Random Forest,” Molecular Pharmaceutics 20 (2023): 3060–3072.
  • 18. Pillai N., Abos A., Teutonico D., and Mavroudis P. D., “Machine Learning Framework to Predict Pharmacokinetic Profile of Small Molecule Drugs Based on Chemical Structure,” Clinical and Translational Science 17 (2024): e13824.
  • 19. Obrezanova O., Martinsson A., Whitehead T., et al., “Prediction of in Vivo Pharmacokinetic Parameters and Time–Exposure Curves in Rats Using Machine Learning From the Chemical Structure,” Molecular Pharmaceutics 19 (2022): 1488–1504.
  • 20. Beckers M., Yonchev D., Desrayaud S., Gerebtzoff G., and Rodríguez‐Pérez R., “DeepCt: Predicting Pharmacokinetic Concentration–Time Curves and Compartmental Models From Chemical Structure Using Deep Learning,” Molecular Pharmaceutics 21 (2024): 6220–6233.
  • 21. Hochreiter S. and Schmidhuber J., “Long Short‐Term Memory,” Neural Computation 9 (1997): 1735–1780.
  • 22. Liu X., Liu C., Huang R., et al., “Long Short‐Term Memory Recurrent Neural Network for Pharmacokinetic‐Pharmacodynamic Modeling,” International Journal of Clinical Pharmacology and Therapeutics 59 (2021): 138–146.
  • 23. Tang A., “Machine Learning for Pharmacokinetic/Pharmacodynamic Modeling,” Journal of Pharmaceutical Sciences 112 (2023): 1460–1475.
  • 24. Keizer R., Hughes J., Tong D., and Woo K., “PKPDsim: Tools for Performing Pharmacokinetic‐Pharmacodynamic Simulations,” (2023).
  • 25. Paszke A., “Automatic Differentiation in PyTorch,” (2017).
  • 26. Lundberg S. M. and Lee S.‐I., “A Unified Approach to Interpreting Model Predictions,” Advances in Neural Information Processing Systems 30 (2017).
  • 27. Denney W. S., Duvvuri S., and Buckeridge C., “Simple, Automatic Noncompartmental Analysis: The PKNCA R Package,” Journal of Pharmacokinetics and Pharmacodynamics 42 (2015): 11–107. S165.
  • 28. Bengio S., Vinyals O., Jaitly N., and Shazeer N., “Scheduled Sampling for Sequence Prediction With Recurrent Neural Networks,” Advances in Neural Information Processing Systems 28 (2015).
  • 29. Wen R., Torkkola K., Narayanaswamy B., and Madeka D., “A Multi‐Horizon Quantile Recurrent Forecaster,” arXiv Preprint arXiv:1711.11053 (2017).
  • 30. Jaiswal R. and Singh B., “A Comparative Study of Loss Functions for Deep Neural Networks in Time Series Analysis,” in Big Data, Machine Learning, and Applications, ed. Borah M. D., Laiphrakpam D. S., Auluck N., and Balas V. E. (Springer Nature Singapore, 2024), 147–163.
  • 31. Wang Q., Ma Y., Zhao K., and Tian Y., “A Comprehensive Survey of Loss Functions in Machine Learning,” Annals of Data Science 9 (2022): 187–212.
  • 32. Barron J. T., “A General and Adaptive Robust Loss Function,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2019 15–20 June 2019 (IEEE, 2019), 4326–4334.
  • 33. Fagiolino P. and Guevara N., “Linear and Nonlinear Pharmacokinetics,” in The ADME Encyclopedia: A Comprehensive Guide on Biopharmacy and Pharmacokinetics (Springer International Publishing, 2021), 1–6.
  • 34. Bräm D. S., Nahum U., Schropp J., Pfister M., and Koch G., “Low‐Dimensional Neural ODEs and Their Application in Pharmacokinetics,” Journal of Pharmacokinetics and Pharmacodynamics 51 (2024): 123–140.
  • 35. Lu J., Deng K., Zhang X., Liu G., and Guan Y., “Neural‐ODE for Pharmacokinetics Modeling and Its Advantage to Alternative Machine Learning Models in Predicting New Dosing Regimens,” iScience 24 (2021): 102804.
  • 36. Valderrama D., Ponce‐Bobadilla A. V., Mensing S., Fröhlich H., and Stodtmann S., “Integrating Machine Learning With Pharmacokinetic Models: Benefits of Scientific Machine Learning in Adding Neural Networks Components to Existing PK Models,” CPT: Pharmacometrics & Systems Pharmacology 13 (2024): 41–53.
  • 37. Valderrama D., Teplytska O., Koltermann L. M., et al., “Comparing Scientific Machine Learning With Population Pharmacokinetic and Classical Machine Learning Approaches for Prediction of Drug Concentrations,” CPT: Pharmacometrics & Systems Pharmacology 14 (2025): 759–769.


Articles from CPT: Pharmacometrics & Systems Pharmacology are provided here courtesy of Wiley
