Abstract
Recent research has increasingly focused on using machine learning for covariate selection in population pharmacokinetics (PPK) analysis. However, few studies have explored predicting drug plasma concentration profiles by combining nonlinear mixed-effect models with machine learning, and in particular the prediction accuracy and applicability of such approaches across diverse patient populations and dosing conditions remain poorly validated. This study addresses these gaps by using remifentanil as a model drug and applying machine learning models to predict plasma concentration profiles based on virtual and real-world data. For the virtual data, we used clustering to create training data sets that varied in size and in their similarity to the test data set. Our results demonstrated high prediction accuracy for both the virtual and real-world data sets using Random Forest models. These results suggest that machine learning models are effective for large-scale data sets and for real-world data with variable dosing times and amounts per patient. Considering its efficiency, machine learning offers a fit-for-purpose approach alongside traditional PPK methods, potentially enhancing future pharmacokinetic and pharmacodynamic studies.
Keywords: population pharmacokinetics, machine learning, remifentanil, virtual data set, real-world data set


Introduction
The research and development of new drugs has a significant impact on society because of the persistent unmet medical needs across various diseases. However, pharmaceutical companies face substantial challenges, with drug development taking over 10 years and costing between $161 million and $4.54 billion (in 2019 US$) per drug. To address these challenges, model-informed drug discovery and development (MID3) offers a quantitative approach that spans all phases of drug development, from biomarker selection in translational medicine to pharmacoeconomic assessments. In drug development, particularly in pharmacokinetics, rational dose and regimen selection through mathematical models is essential for advancing new drugs to clinical trials. Population pharmacokinetics (PPK) modeling has been the primary modeling method used. Building a PPK model involves complex procedures such as data handling, model selection with covariates, and validation. Given these difficulties, the increasing availability of clinical data and the growing demand for data science have led to the development and application of various machine learning (ML) methods in clinical pharmacokinetics.
The conventional stepwise covariate modeling (SCM) algorithm has been the gold standard for covariate selection. However, SCM has limitations, including multiplicity issues when numerous covariates are involved, and it is impractical for exploring all possible functional forms (e.g., linear and exponential). Recent advancements have introduced several ML methods to address these limitations. For example, a study compared classical methods with ML techniques, such as random forest (RF), neural networks (NN), and support vector regression (SVR), applied to NONMEM empirical Bayes estimates. The ML methods demonstrated comparable or superior statistical performance to SCM and were notably faster in computation, with SCM being approximately 30–100 times slower than NN and SVR.
Model structure selection in PPK modeling also presents challenges. Selecting the appropriate model from a library of existing models requires consideration of factors such as the administration route (e.g., intravenous, oral, or subcutaneous), absorption process (e.g., first-order, zero-order, or with lag time), elimination mechanism (e.g., linear or Michaelis–Menten), and number of compartments. Systematic ML-based selection methods have been developed to improve efficiency in model selection, especially for large data sets or complex models. For instance, a study utilizing genetic algorithms and NN found that ML methods significantly enhance the efficiency of pharmacokinetic model selection and can expedite the initial model selection, which can then be refined using conventional pharmacometric methods.
Integrating covariate and model structure selection enables the creation of predictive PPK models. These models are particularly valuable in therapeutic drug monitoring (TDM), where individualized dosing is crucial for drugs with narrow therapeutic windows. In traditional TDM, a patient’s blood sample is collected for drug concentration measurement, and this concentration is then used in PPK modeling to estimate individual pharmacokinetic parameters by Bayesian estimation. Given the complexities of PPK modeling, ML presents an emerging alternative for TDM. Instead of relying solely on traditional PPK modeling, ML can predict key pharmacokinetic parameters and concentrations from clinical data. Notable examples include studies predicting individual pharmacokinetic parameters for drugs like vancomycin [20–22], cyclosporine [23, 24], and tacrolimus [25–28]. In pharmacokinetic and pharmacodynamic (PK/PD) analysis, the full time–concentration profile of a drug is crucial, as it accounts for subtle temporal changes that are not captured by exposure–response analyses. Classical PK/PD parameters, such as the ratio of maximum concentration to minimum inhibitory concentration (MIC) and the time for which the concentration remains above the MIC, have been used extensively since the 1980s. Recent developments include utilizing the complete time–concentration profile for analysis.
In the nonclinical stage, several methods for predicting drug time–concentration profiles in rodents using ML models have recently been published, utilizing chemical structures as descriptors suited to the drug discovery stages. Surprisingly, even before the recent AI boom, prediction methods for time–concentration profiles in human plasma had been published in the 1990s and 2000s for tobramycin and remifentanil. Although these studies did not meet modern statistical validation criteria, such as external validation, their prediction accuracy was higher than that of PPK modeling. Despite their relatively high accuracy, these ML models were not widely adopted; a review article suggested that this might be due to the “black box” nature of predictive inference, as researchers and clinicians tend to design doses and regimens in easily understandable ways. However, the increasing amount of digitized clinical data and the development of interpretable ML technology have changed the landscape. Moreover, considering ML’s much higher processing speed compared to PPK modeling, the current usefulness of ML for predicting concentration–time profiles is evident. Recent publications have demonstrated this for olanzapine, rosuvastatin, and valproate.
However, to the best of our knowledge, no study has evaluated ML for predicting plasma concentration–time profiles with respect to the quality and quantity of the data sets. In this study, we validated a prediction method for concentration–time profiles using remifentanil, for which clinical data sets of time–concentration profiles and PPK models have been published. Additionally, because this is a first step in validating ML for PPK, the problem setting should be kept as simple as possible; since remifentanil is an intravenous drug, complexities such as absorption do not arise in its analysis. Although one previous study attempted to build an ML model for remifentanil concentration–time profiles in human plasma, it was limited by a small number of patients and similar dosing conditions, which restricted its flexibility. Therefore, we employed two validation approaches: first, using the published PPK model, we sampled a large virtual data set for ML validation to investigate sample size limits; second, we used a clinical data set with various dosing regimens to explore dosing limits.
Materials and Methods
Overall Workflow
In this study, we prepared virtual and real-world data sets of remifentanil and built ML models using these data sets. Figure 1 illustrates this workflow.
Figure 1. Overall workflow of the validation study. Virtual data set (A) and real-world data set (B) were validated using different processes.
Data Set
Virtual Data Set
Generation of Data Set
A large virtual data set was prepared by sampling from the published equations of a PPK study (Figure A-1). Sampling followed Equations 1–8 of the published model, which define the clearances and volumes of distribution of the 3-compartment model as functions of age and lean body mass, together with their inter-individual variability (equations not reproduced here).
The virtual data set comprised 10,000 PK parameter sets (virtual subjects) (Figure A-2). These PK parameters were used to simulate remifentanil plasma concentration (Cp)–time profiles with a 3-compartment model. For this virtual data set, the dosing rate and infusion time were fixed at 227 μg/min/body and 10 min, respectively, based on observed clinical settings (Figure A-3). Sampling and simulations were performed using the RxODE library in R 4.3.3.
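As a rough illustration of this simulation step (the study itself used RxODE in R), the SciPy-based sketch below generates one virtual subject's Cp–time profile from a 3-compartment model with a zero-order infusion. The parameter values, variability term, and function names are placeholders, not the published covariate equations:

```python
# Illustrative sketch only (the study used RxODE in R 4.3.3): simulate one virtual
# subject's Cp-time profile with a 3-compartment model and a zero-order infusion.
# All parameter values below are placeholders, not the published covariate equations.
import numpy as np
from scipy.integrate import solve_ivp

def three_cmt_infusion(t, a, cl, q2, q3, v1, v2, v3, rate, t_inf):
    """Amounts in the central (a[0]) and two peripheral (a[1], a[2]) compartments."""
    infusion = rate if t <= t_inf else 0.0             # zero-order input during infusion
    c1, c2, c3 = a[0] / v1, a[1] / v2, a[2] / v3
    return [infusion - cl * c1 - q2 * (c1 - c2) - q3 * (c1 - c3),
            q2 * (c1 - c2),
            q3 * (c1 - c3)]

rng = np.random.default_rng(0)
# Hypothetical individual parameters (L, L/min) with log-normal variability on clearance.
pars = dict(cl=2.5 * np.exp(rng.normal(0, 0.2)), q2=2.0, q3=0.08, v1=5.0, v2=10.0, v3=5.4)
rate, t_inf = 227.0, 10.0                              # ug/min/body and min, as in the study

times = np.array([5.0, 10.0, 15.0, 30.0, 40.0, 50.0, 100.0])
sol = solve_ivp(three_cmt_infusion, (0.0, times[-1]), [0.0, 0.0, 0.0], t_eval=times,
                args=(*pars.values(), rate, t_inf), max_step=0.1)
cp = sol.y[0] / pars["v1"]                             # central-compartment concentration
print(dict(zip(times.tolist(), np.round(cp, 2))))
```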
Virtual Subjects Splitting into Test and Training Sets by Clustering
To evaluate model dependency on the data set, we created two different training data sets against one test data set through a clustering process (Figure A-4). First, the 10,000 virtual subjects were divided into 10 clusters by hierarchical clustering on age, lean body mass (LBM), Cp at 10 min, and Cp at 40 min, using the agglomerative clustering function in scikit-learn 1.3.0 (Python 3). These properties were chosen because age and LBM are significant covariates in the PPK equations, and Cp at 10 and 40 min reflect the end of infusion and the elimination phase of remifentanil, respectively.
Next, we randomly selected a test data set, named Cluster 1 (Figure A-5), and identified the training data set furthest from Cluster 1, named Cluster 2 (far cluster) (Figure A-6). The training data set nearest to Cluster 1 was named Cluster 3 (near cluster) (Figure A-7).
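A minimal sketch of how such a cluster-based split could be implemented with scikit-learn's AgglomerativeClustering is shown below. The feature scaling, Euclidean centroid distance, and helper names are assumptions; the report specifies only the clustering features and the number of clusters:

```python
# Sketch of the cluster-based split (assumed implementation): hierarchical clustering on
# age, LBM, Cp(10 min), and Cp(40 min), then selection of the training clusters nearest
# to and farthest from the test cluster by centroid distance.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

def split_by_clusters(features, test_cluster=0, n_clusters=10):
    """features: (n_subjects, 4) array -> cluster labels plus near/far training cluster ids."""
    X = StandardScaler().fit_transform(features)
    labels = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(X)
    centroids = np.vstack([X[labels == k].mean(axis=0) for k in range(n_clusters)])
    dists = np.linalg.norm(centroids - centroids[test_cluster], axis=1)
    dists[test_cluster] = np.nan                       # exclude the test cluster itself
    return labels, int(np.nanargmin(dists)), int(np.nanargmax(dists))

# Example with 2,000 synthetic subjects (the study used 10,000): age, LBM, Cp10, Cp40.
rng = np.random.default_rng(1)
demo = np.column_stack([rng.uniform(20, 85, 2000), rng.uniform(35, 80, 2000),
                        rng.lognormal(1.5, 0.3, 2000), rng.lognormal(0.5, 0.3, 2000)])
labels, near_cluster, far_cluster = split_by_clusters(demo, test_cluster=0)
```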
Number of Subjects in the Training Data Set
Since clinical data sets can be of variable sizes, it is important to understand how the model works under different data set sizes. Consequently, we investigated model dependency on the size of the training data sets. For each training data set, we selected 1,000, 500, 100, 50, and 10 subjects. For the test data set, we randomly selected 100 subjects from Cluster 1.
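A brief sketch of this subsampling step, assuming simple random sampling without replacement (the variable names are illustrative):

```python
# Sketch of the size-dependency setup (illustrative variable names): random subsamples
# of a training cluster and a fixed random test set of 100 subjects from Cluster 1.
import numpy as np

rng = np.random.default_rng(0)
TRAIN_SIZES = [1000, 500, 100, 50, 10]

def subsample(subject_ids, n):
    return rng.choice(subject_ids, size=n, replace=False)

# test_ids = subsample(cluster1_ids, 100)
# training_sets = {n: subsample(cluster2_ids, n) for n in TRAIN_SIZES}
```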
Real-World Data Set
The real-world data set was extracted from a previous report, selecting the time, Cp, age, and LBM of 64 clinical subjects. Because remifentanil is rapidly metabolized by esterases, acetonitrile was added to the blood samples immediately upon collection to deactivate esterase activity. One subject with insufficient Cp time points (ID No. 11) was removed from the original 65 subjects (Figure B-1). The statistical properties of the subjects are summarized in Table 1.
Table 1. Statistical Properties of the Subjects.
| Metrics | Dose (μg) | Rate (μg/min/body) | Age (years) | LBM (kg) | TINFCAT (min) |
|---|---|---|---|---|---|
| Average | 2431 | 227 | 47 | 55 | 10 |
| Standard deviation | 2296 | 101 | 21 | 10 | 4 |
| Maximum | 15,000 | 750 | 85 | 76 | 20 |
| Median | 1863 | 216 | 41 | 57 | 10 |
| Minimum | 675 | 63 | 20 | 36 | 4 |
Details of Data Set
Each subject had a different number and timing of Cp samples, as is typical in the clinical setting. Sampling times are generally defined either as fixed clock times or as multiples of the infusion duration. To ensure a fair analysis, we therefore extracted time points to create two data sets. The first data set included the subject properties and Cp at the samples nearest to fixed time points (5, 15, 30, 50, and 100 min), at the termination time of infusion (TINFCAT), and at 4× TINFCAT. The second data set included the subject properties and Cp at 0.5, 1.5, 3, 5, and 10 times TINFCAT, at TINFCAT, and at 4× TINFCAT. Finally, each data set was divided into test and training sets using 5-fold cross-validation (CV) (Figure B-2, 3).
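A sketch of this extraction step is shown below, assuming a long-format table with one row per sample; the column names, the nearest-sample rule, and the per-subject CV split are assumptions rather than the authors' exact implementation:

```python
# Sketch of the time-point extraction (assumed long-format input with columns
# 'subject', 'time', 'cp', 'tinfcat', 'age', 'lbm', 'rate').
import pandas as pd
from sklearn.model_selection import KFold

FIXED_TIMES = [5, 15, 30, 50, 100]        # min; first data set
REL_MULTIPLES = [0.5, 1.5, 3, 5, 10]      # multiples of TINFCAT; second data set

def nearest_cp(subject_df, target_times):
    """Return the Cp sample closest in time to each requested target time."""
    return {t: subject_df.loc[(subject_df["time"] - t).abs().idxmin(), "cp"]
            for t in target_times}

def build_dataset(raw, use_relative=False):
    rows = []
    for sid, sdf in raw.groupby("subject"):
        tinfcat = sdf["tinfcat"].iloc[0]
        targets = [m * tinfcat for m in REL_MULTIPLES] if use_relative else FIXED_TIMES
        cps = nearest_cp(sdf, list(targets) + [tinfcat, 4 * tinfcat])
        for t in targets:
            rows.append({"subject": sid, "target_time": t, "cp": cps[t],
                         "rate": sdf["rate"].iloc[0], "age": sdf["age"].iloc[0],
                         "lbm": sdf["lbm"].iloc[0], "tinfcat": tinfcat,
                         "cp_tinfcat": cps[tinfcat], "cp_4x": cps[4 * tinfcat]})
    return pd.DataFrame(rows)

# Assumed 5-fold CV split at the subject level, so no subject appears in both folds:
# subjects = dataset["subject"].unique()
# for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(subjects):
#     ...
```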
Proposed Machine Learning Method
In this study, we implemented eight ML methods: least absolute shrinkage and selection operator, ridge regression, k-nearest neighbors, support vector machine, RF, gradient boosting (GB), extreme gradient boosting (XGB), and neural network, using scikit-learn 1.3.0 in Python 3. We set aside 20% of the training data as validation data and tuned the parameters of each ML method through a grid search. Performance evaluation was then carried out with the optimal parameters. The eight ML methods and the parameter ranges used in the grid search are provided in Table S1; the optimal parameters were selected based on the performance metrics. The training and test data sets are listed in Table 2. We used seven explanatory variables: target time, infusion rate, age, LBM, TINFCAT, Cp at TINFCAT, and Cp at 4× TINFCAT.
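The following sketch illustrates this tuning scheme for one of the methods (RF), assuming a single 80/20 train/validation split and an illustrative hyperparameter grid; the actual search ranges are those listed in Table S1:

```python
# Sketch of the tuning scheme described above (assumed details): 20% of the training data
# held out as a validation set and a grid search per model. The grid below is illustrative.
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import ParameterGrid, train_test_split

def tune_random_forest(X_train, y_train, grid=None, seed=0):
    grid = grid or {"n_estimators": [100, 300], "max_depth": [None, 5, 10]}
    X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train,
                                                test_size=0.2, random_state=seed)
    best_params, best_mse = None, float("inf")
    for params in ParameterGrid(grid):
        model = RandomForestRegressor(random_state=seed, **params).fit(X_tr, y_tr)
        mse = mean_squared_error(y_val, model.predict(X_val))
        if mse < best_mse:
            best_params, best_mse = params, mse
    # Refit on the full training data with the best hyperparameters before evaluation.
    return RandomForestRegressor(random_state=seed, **best_params).fit(X_train, y_train)
```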
Table 2. Conditions for Each Machine Learning Model.
| Objective Variables | Explanatory Variables | Data Set Origin | Target Time | Test Data Set | Training Data Set | Number of Training Subjects |
|---|---|---|---|---|---|---|
| Cp at target time | Target time, infusion rate, age, LBM, TINFCAT, Cp at TINFCAT, Cp at 4× TINFCAT | Virtual | 5, 15, 30, 50, 100 min | Cluster 1 | Cluster 2 (far cluster) | 1000 |
| | | | | | | 500 |
| | | | | | | 100 |
| | | | | | | 50 |
| | | | | | | 10 |
| | | | | | Cluster 3 (near cluster) | 1000 |
| | | | | | | 500 |
| | | | | | | 100 |
| | | | | | | 50 |
| | | | | | | 10 |
| | | Real-world | 5, 15, 30, 50, 100 min | 5-fold CV | 5-fold CV | |
| | | | 0.5, 1.5, 3, 5, 10 × TINFCAT | 5-fold CV | 5-fold CV | |
For the target time, we assumed sampling time points that could be obtained in early clinical stages, such as Phase I studies. For Cp at TINFCAT and at 4× TINFCAT, we assumed sampling points obtainable in later clinical stages, such as Phase II or III, where PPK studies are generally performed. The resulting prediction workflow is shown in Figure 2.
Figure 2. Prediction workflow for this study.
This figure shows the prediction workflow of the proposed ML method with a real data set. The ML model was built from the training data set, and remifentanil plasma concentration–time profiles were predicted from the specified times and the other explanatory variables: infusion rate, age, LBM, TINFCAT, Cp at TINFCAT, and Cp at 4× TINFCAT. Consequently, any specified time can be input as the target time to predict Cp at that time.
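A minimal sketch of this prediction step is given below; the feature order and field names are assumptions, used only to show how Cp at an arbitrary time is queried by varying the target-time feature:

```python
# Minimal sketch of the prediction step in Figure 2 (feature order and field names
# are assumptions): query Cp at any specified time by varying only the target-time feature.
import numpy as np

def predict_profile(model, subject, target_times):
    """subject: dict with rate, age, lbm, tinfcat, cp_tinfcat, cp_4x (illustrative keys)."""
    X = np.array([[t, subject["rate"], subject["age"], subject["lbm"],
                   subject["tinfcat"], subject["cp_tinfcat"], subject["cp_4x"]]
                  for t in target_times])
    return dict(zip(target_times, model.predict(X)))

# Example: predict_profile(rf_model, subject, target_times=[5, 15, 30, 50, 100])
```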
Compared Methods
Average Method
To assess the effectiveness of the proposed ML method on the virtual data set, we employed the average method as a control. This method is commonly used in ML research. It is also consistent with the naïve average data approach, one of the classical PPK methods used when the sampling time points and doses are the same across patients. Predicted values were calculated as the average values at specific time points in the training data set. Although this method does not account for individual differences in Cp, it is computationally efficient and suitable for large virtual data sets.
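A minimal sketch of this baseline, assuming a long-format data frame with one row per subject and target time (column names are illustrative):

```python
# Minimal sketch of the average-method baseline: the prediction at each target time is
# the mean training Cp at that time, identical for every test subject.
import pandas as pd

def average_method(train_df, test_df):
    """Both data frames have 'target_time' and 'cp' columns (one row per subject and time)."""
    mean_cp = train_df.groupby("target_time")["cp"].mean()
    return test_df["target_time"].map(mean_cp)
```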
Population Pharmacokinetics Model
Constructing a PPK model for a large virtual data set is challenging because of the sample size. However, for smaller, real-world data sets, PPK modeling is feasible. We therefore compared the proposed ML model with a PPK model using the real-world data. To build the PPK model, we included all Cp data at TINFCAT and at 4× TINFCAT, without separating training and test sets. Only Cp at TINFCAT and at 4× TINFCAT were used; Cp at the time points to be predicted was excluded. This approach prevents information leakage, which could occur if the values to be predicted were included in the PPK analysis when comparing against the ML model predictions on the test data set.
A robust PPK model requires sufficient subjects; a small sample size can lead to poor model performance. The PK structure and covariates were based on a 3-compartment model with a multiplicative error model, following a previous report. Stepwise methods were used to determine the equations and coefficients for the clearance and volume of distribution. The parameters and equations used are listed in Table S2.
Metrics
To evaluate and compare the models, we calculated the R²-value and mean squared error (MSE) using the following equations:

$$R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}} \tag{9}$$

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \tag{10}$$

where $SS_{\mathrm{res}}$ is the sum of squared residuals, $SS_{\mathrm{tot}}$ is the total sum of squares, $y_i$ and $\hat{y}_i$ are the observed and predicted values, and $n$ is the number of observations. These calculations were performed manually using Microsoft Excel (Office 365 version).
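For reference, the same metrics can be computed as follows (an equivalent Python sketch; the study itself used Excel):

```python
# Equivalent computation of eqs 9 and 10 in Python (the study used Excel); shown for clarity.
import numpy as np

def r2_and_mse(y_obs, y_pred):
    y_obs, y_pred = np.asarray(y_obs, dtype=float), np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_obs - y_pred) ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot, np.mean((y_obs - y_pred) ** 2)
```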
Results and Discussion
Performance Evaluation Using Virtual Data Set
Dependency on the Training Data Set’s Distance and Size Against the Test Set
To assess the dependency of the proposed ML model on the diversity and size of the training data set, various training data sets were prepared from a generated virtual data set. We implemented eight ML models. The results of the clustering are listed in Table S3, showing that each cluster has over 500 virtual subjects. Cluster 1 was assigned as the test cluster, whereas Clusters 2 and 3 were assigned as the furthest (distance: 56.52) and nearest (distance: 20.05) training clusters, respectively. Random sampling from Clusters 2 and 3 was performed to prepare training data sets of different sizes, including 1,000, 500, 100, 50, and 10 virtual subjects. Tables S4 and S5 list the MSE and R² values for each setting.
The predictivity (MSE) of each ML model against the 100 test subjects randomly selected from Cluster 1 is shown in Figure 3A,B. Here, we discuss the three ML models with the highest accuracy: RF, GB, and XGB. The results indicate that predictivity using the near training cluster was superior to that using the far training cluster, regardless of the training size (average MSE and R² values: RF: 0.0225 and 0.9895, GB: 0.0173 and 0.9919, and XGB: 0.0368 and 0.9828 for the near cluster; RF: 0.0645 and 0.9699, GB: 0.0599 and 0.9721, and XGB: 0.0695 and 0.9675 for the far cluster). All three ML models outperformed the average method (average MSE and R² values: 0.2739 and 0.8721 for the near cluster; 0.3164 and 0.8523 for the far cluster).
Figure 3. Dependency of predictivity of the proposed ML models on the distance and size of the training data set in the virtual data set, using the near (A) and far (B) training data sets.
Focusing on the MSE for the near cluster, the MSE for 10 training subjects was considerably higher (MSE: RF: 0.0867, GB: 0.0691, and XGB: 0.1632) than for the other settings (MSE: RF: 0.0099 for 50, 0.0096 for 100, 0.0033 for 500, and 0.0029 for 1000 training subjects; GB: 0.0082, 0.0043, 0.0023, and 0.0025; and XGB: 0.0090, 0.0062, 0.0034, and 0.0023). This indicates that 50 or more subjects are sufficient for precise predictions when the training data set is similar to the test data set. For the far cluster, the MSE for 10 training subjects was also much higher (MSE: RF: 0.1143, GB: 0.1017, and XGB: 0.1340). However, the MSE decreased steadily with an increasing number of training subjects (MSE: RF: 0.0711 for 50, 0.0612 for 100, 0.0404 for 500, and 0.0357 for 1000 training subjects; GB: 0.0767, 0.0551, 0.0353, and 0.0305; and XGB: 0.0954, 0.0495, 0.0373, and 0.0312). This trend indicates that even when the training subjects are distant from the test data, a larger training size (over 100 subjects) can still yield an effective predictive ML model.
The MSE was compared according to the size of the training data set (from 1000 to 10) against the test data set.
Dependency on the Target Time Points
To understand the features of the proposed ML models, we analyzed differences in predictivity at various time points after administration. Based on the previous results, ML models using 100 virtual training subjects were compared with the average method (Figure S1). The MSE and R²-values for each setting are listed in Tables S6 and S7, respectively. Figure 4A,B shows the MSE values for the near and far training clusters, respectively. The MSE and R² values across all target time points using the near cluster were better (RF: 0.0096 and 0.9955, GB: 0.0043 and 0.9980, and XGB: 0.0062 and 0.9971, respectively) than those using the far cluster (RF: 0.0612 and 0.9714, GB: 0.0551 and 0.9743, and XGB: 0.0495 and 0.9769, respectively). Both the near and far training clusters had much higher predictivity than the average method (MSE and R²: 0.2752 and 0.8715 for the near cluster and 0.3259 and 0.8478 for the far cluster, respectively).
Figure 4. MSE of the proposed ML models at each time point when using the near (A) and far (B) training clusters, with a training size of 100.
Focusing on the MSE, the MSE at late time points (50–100 min: RF: 0.0023–0.0418, GB: 0.0013–0.0164, and XGB: 0.0020–0.0240 for the near training model and RF: 0.0116–0.2512, GB: 0.0089–0.2162, and XGB: 0.0163–0.1804 for the far training model) was higher than at early time points (5–30 min: RF: 0.0011–0.0017, GB: 0.0010–0.0015, and XGB: 0.0014–0.0025 for the near training model and RF: 0.0102–0.0172, GB: 0.0087–0.0243, and XGB: 0.085–0.0258 for the far training model). The Cp values were also analyzed, leading to a similar conclusion that predictivity was better at higher Cp values. This indicates that predictivity worsens over time, consistent with the increased difficulty in predicting the terminal phase due to complex elimination processes. However, even at 100 min, the MSEs of the near and far training cluster ML models were substantially lower than those of the average method.
The MSE was compared according to the sampling time points (from 5 to 100 min) when using 100 training samples from the near and far data sets against the test data set.
Performance Evaluation Using Real-World Data Set
Predictivity of Real Plasma Concentration–Time Profiles at Exact Target Time Points
In addition to the validation with the virtual data set, we developed an ML model using the remifentanil real-world data set under 5-fold CV and compared its performance with that of the PPK model built in this study. We implemented three ML models: RF, GB, and XGB. However, we focus our discussion on RF, as it showed relatively high accuracy with the virtual data set; results for the other ML methods are provided in the Supporting Information. The MSE and R²-values for each setting are listed in Tables S8 and S9, respectively. Notably, the overall MSE for the RF model (0.0405 ± 0.0132) was slightly lower than that for the PPK model (0.0508), as shown in Figure 5A. Similarly, the R²-values were 0.9573 for the PPK model and 0.9606 ± 0.0117 for the RF model.
Figure 5. MSE (A) and plot (B) of the proposed ML and PPK models at each target time point on the basis of the exact time.
Focusing on the MSE at specific time points, at 5 and 15 min the MSEs for the PPK model (0.0186 and 0.0374) were higher than those for the RF model (0.0068 ± 0.0012 and 0.0307 ± 0.0188). Conversely, at 30 and 50 min, the MSEs for the PPK model (0.0154 and 0.0479) were lower than those for the RF model (0.0574 ± 0.0224 and 0.0792 ± 0.0297). At 100 min, the MSE for the PPK model (0.1349) was substantially higher than that for the RF model (0.0283 ± 0.0046). Figure 5B shows a plot of log-transformed predicted versus observed values for both models. Visually, the RF model predicted better at lower concentrations (−1.0 to −0.5); however, in the middle range (−0.5 to 1.0), the RF model’s predictions showed a larger spread. These results suggest that the RF model performs better at earlier and later time points with lower plasma concentrations, whereas the PPK model performs better at intermediate time points with higher plasma concentrations.
(A) The MSE was compared according to the sampling time points (from 5 to 100 min) in the clinical data set. (B) Scatter plot of the data from (A) (PPK in green, ML in orange); the observed (x-axis) and predicted (y-axis) concentrations are log-transformed.
Predictivity of Real Plasma Concentration–Time Profiles on Infusion Time
We also compared the predictivity of the RF and PPK models based on the infusion time, which varied across subjects (Table 1 and Figure 6). The MSE and R²-values for each setting are shown in Tables S10 and S11. Overall, the RF model’s performance (MSE: 0.0422 ± 0.0113, R²: 0.9569 ± 0.0105) was equivalent to that of the PPK model (MSE: 0.0491, R²: 0.9531). During infusion, both models demonstrated high predictivity. Immediately after infusion (TINFCAT × 1.5), the MSE of the RF model (0.0308 ± 0.0078) was lower than that of the PPK model (0.0641). At longer times after infusion (TINFCAT × 3 and × 5), the MSEs of the PPK model (0.0149 and 0.0157) were lower than those of the RF model (0.0573 ± 0.0180 and 0.0425 ± 0.0166). At a much longer time after infusion (TINFCAT × 10), the MSE of the PPK model (0.1428) was substantially higher than that of the RF model (0.0696 ± 0.0177). These findings are consistent with those in the previous section: the RF model performs well at earlier and later time points with lower plasma concentrations, whereas the PPK model performs well at intermediate time points with higher plasma concentrations.
Figure 6. MSE (A) and plot (B) of the proposed ML and PPK models at each time point based on the infusion time.
In conclusion, based on these predictive results (Figures 5 and 6), the RF model proposed in this study shows high predictivity for plasma concentrations. This points toward a new generation of prediction methods for clinical plasma drug concentrations, with RF offering faster analysis than PPK. At the same time, the PPK model’s predictivity remains robust even when built from a sparse data set (only Cp at TINFCAT and at 4× TINFCAT per subject) so as to maintain a sufficient number of subjects. The time–concentration profile of remifentanil is relatively uncomplicated, partly because it is administered by infusion, which bypasses the absorption process. Predicting concentration–time profiles becomes more challenging for drugs involving absorption and factors such as first-pass hepatic metabolism. This study also has several limitations. First, the real-world data set is relatively small, which might affect the generalizability of the findings. Second, the study focuses on remifentanil, a drug with a straightforward pharmacokinetic profile, limiting the applicability of the results to drugs with more complex profiles. Additionally, the RF model’s performance was evaluated under specific conditions, and its predictive accuracy may vary with different data characteristics or clinical settings. Therefore, to conclusively determine the superior method, further research should address drugs with more complex pharmacokinetics, such as those showing double peaks. Future studies should also consider larger and more diverse data sets to enhance the robustness and generalizability of the findings.
(A) The MSE was compared according to the sampling time points (from TINFCAT × 0.5 to TINFCAT × 10) in the clinical data set. (B) Scatter plot of the data from (A) (PPK in green, ML in orange); the observed (x-axis) and predicted (y-axis) concentrations are log-transformed.
Conclusion
This study utilized remifentanil as a model to evaluate the effectiveness of ML models in predicting plasma concentration profiles, validated using virtual and real data sets. Our findings indicate that the ML model achieved higher prediction accuracy with nearby training clusters and larger training sample sizes. Even with distant training clusters, the accuracy remained acceptable with increased training samples. When applied to real data, the ML model demonstrated prediction accuracy equal to or surpassing that of the PPK model, indicating its robustness for individual patient PK predictions. Therefore, ML models are suitable for large-scale data sets and real-world data with variable dosing times and amounts. Given the speed and efficiency of ML, it offers a significant practical advantage over traditional PPK models, particularly for extensive data sets. However, the difference in prediction accuracies should be further investigated, as this study did not include a sufficient number of real samples for PPK modeling. Future studies should further validate this technique by applying it to compounds with more complex PK profiles, such as those with oral administration or double plasma peaks. This research marks a potential shift toward more rapid and accurate PK predictions using ML, tailored to specific analytical needs.
Supplementary Material
Acknowledgments
The authors thank Ryutaro Iwama, Daisuke Ishii, and Kimie Nishida in TEIJIN Pharma Ltd. for helpful discussions.
Glossary
Abbreviations
- PPK
population pharmacokinetics
- PK/PD
pharmacokinetic and pharmacodynamic
- MID3
model-informed drug discovery and development
- ML
machine learning
- SCM
stepwise covariate modeling
- RF
random forest
- NN
neural networks
- SVR
support vector regression
- TDM
therapeutic drug monitoring
- MIC
minimum inhibitory concentration
- TINFCAT
termination time of infusion
- CV
cross-validation
- LBM
lean body mass
- Cp
remifentanil plasma concentration
- MSE
mean squared error
- GB
gradient boosting
- XGB
extreme gradient boosting
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.molpharmaceut.4c01431.
Table S1: the list of 8 machine learning methods and grid search range; Table S2: PPK model built in this study using a real-world data set; Table S3: cluster sizes and distances of virtual data sets; Table S4: MSE of the average method and the proposed ML model using virtual data sets; Table S5: R² values for the average method and proposed ML model using the virtual data sets; Table S6: MSE of the average method and proposed ML model using virtual data sets of size 100; Table S7: R² values of the average method and proposed ML model using virtual data sets with a sample size of 100; Table S8: MSE of PPK and proposed ML model using real data sets for each specified time point and overall; Table S9: R² values of PPK and proposed ML models using real data sets for each specified time point and overall; Table S10: MSE of PPK and proposed ML models using real data sets for each fold of time points during the infusion period and overall; Table S11: R² value of PPK and proposed ML models using real data sets for each fold of time points during the infusion period and overall; Figure S1: comparison of predictivity at different Cp values using 100 training subjects from the near (A) and far (B) clusters (PDF)
H.I.: data curation, formal analysis, funding acquisition, investigation, methodology, visualization, writing – review and editing, and writing – original draft. M.K.: writing – review and editing. K.H.: conceptualization, data curation, formal analysis, investigation, methodology, visualization, writing – review and editing, and supervision.
This work was supported by a Grant-in-Aid for Transformative Research Areas (A) “Latent Chemical Space” [JP24H01771] for HI from the Ministry of Education, Culture, Sports, Science and Technology, Japan, and by the Japan Research Foundation for Clinical Pharmacology.
The authors declare no competing financial interest.
References
- Kusynova Z., Pauletti G. M., van den Ham H. A., Leufkens H. G. M., Mantel-Teeuwisse A. K.. Unmet Medical Need as a Driver for Pharmaceutical Sciences - A Survey Among Scientists. J. Pharm. Sci. 2022;111(5):1318–1324. doi: 10.1016/j.xphs.2021.10.002. [DOI] [PubMed] [Google Scholar]
- Schlander M., Hernandez-Villafuerte K., Cheng C. Y., Mestre-Ferrandiz J., Baumann M.. How Much Does It Cost to Research and Develop a New Drug? A Systematic Review and Assessment. Pharmacoeconomics. 2021;39(11):1243–1269. doi: 10.1007/s40273-021-01065-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Marshall S. F., Burghaus R., Cosson V., Cheung S. Y., Chenel M., DellaPasqua O., Frey N., Hamren B., Harnisch L.. et al. Good Practices in Model-Informed Drug Discovery and Development: Practice, Application, and Documentation. CPT: pharmacometrics Syst. Pharmacol. 2016;5(3):93–122. doi: 10.1002/psp4.12049. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Marshall S., Madabushi R., Manolis E., Krudys K., Staab A., Dykstra K., Visser S. A. G.. Model-Informed Drug Discovery and Development: Current Industry Good Practice and Regulatory Expectations and Future Perspectives. CPT: pharmacometrics Syst. Pharmacol. 2019;8(2):87–96. doi: 10.1002/psp4.12372. [DOI] [PMC free article] [PubMed] [Google Scholar]
- a Beal S. L., Sheiner L. B.. Estimating population kinetics. Crit. Rev. Biomed. Eng. 1982;8(3):195–222. [PubMed] [Google Scholar]; b Watanabe-Uchida M., Narukawa M.. Utilization of population pharmacokinetics in drug development and provision of the results to healthcare professionals. Int. J. Clin Pharmacol Ther. 2017;55(1):25–31. doi: 10.5414/CP202696. [DOI] [PubMed] [Google Scholar]
- US Food and Drug Administration Population pharmacokinetics: Guidance for industry. US Food And Drug Administration, 2022. [Google Scholar]
- Ota R., Yamashita F.. Application of machine learning techniques to the analysis and prediction of drug pharmacokinetics. J. Controlled Release. 2022;352:961–969. doi: 10.1016/j.jconrel.2022.11.014. [DOI] [PubMed] [Google Scholar]
- Janssen A., Bennis F. C., Mathot R. A. A.. Adoption of Machine Learning in Pharmacometrics: An Overview of Recent Implementations and Their Considerations. Pharmaceutics. 2022;14(9):1814. doi: 10.3390/pharmaceutics14091814. [DOI] [PMC free article] [PubMed] [Google Scholar]
- a Ribbing J., Niclas Jonsson E.. Power, selection bias and predictive performance of the population pharmacokinetic covariate model. J. Pharmacokinet. Pharmacodyn. 2004;31:109–134. doi: 10.1023/B:JOPA.0000034404.86036.72. [DOI] [PubMed] [Google Scholar]; b Ahamadi M., Largajolli A., Diderichsen P. M., de Greef R., Kerbusch T., Witjes H., Chawla A., Davis C. B., Gheyas F.. Operating characteristics of stepwise covariate selection in pharmacometric modeling. J. Pharmacokinet. Pharmacodyn. 2019;46(3):273–285. doi: 10.1007/s10928-019-09635-6. [DOI] [PubMed] [Google Scholar]
- a Bram D. S., Nahum U., Atkinson A., Koch G., Pfister M.. Evaluation of machine learning methods for covariate data imputation in pharmacometrics. CPT: pharmacometrics Syst. Pharmacol. 2022;11(12):1638–1648. doi: 10.1002/psp4.12874. [DOI] [PMC free article] [PubMed] [Google Scholar]; b Zhu X., Zhang M., Wen Y., Shang D.. Machine learning advances the integration of covariates in population pharmacokinetic models: Valproic acid as an example. Front. Pharmacol. 2022;13(13):994665. doi: 10.3389/fphar.2022.994665. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sibieude E., Khandelwal A., Hesthaven J. S., Girard P., Terranova N.. Fast screening of covariates in population models empowered by machine learning. J. Pharmacokinet. Pharmacodyn. 2021;48(4):597–609. doi: 10.1007/s10928-021-09757-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sibieude E., Khandelwal A., Girard P., Hesthaven J. S., Terranova N.. Population pharmacokinetic model selection assisted by machine learning. J. Pharmacokinet. Pharmacodyn. 2022;49(2):257–270. doi: 10.1007/s10928-021-09793-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- a Lee S., Song M., Han J., Lee D., Kim B. H.. Application of Machine Learning Classification to Improve the Performance of Vancomycin Therapeutic Drug Monitoring. Pharmaceutics. 2022;14(5):1023. doi: 10.3390/pharmaceutics14051023. [DOI] [PMC free article] [PubMed] [Google Scholar]; b Jaber M. M., Yaman B., Sarafoglou K., Brundage R. C.. Application of Deep Neural Networks as a Prescreening Tool to Assign Individualized Absorption Models in Pharmacokinetic Analysis. Pharmaceutics. 2021;13(6):797. doi: 10.3390/pharmaceutics13060797. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Del Valle-Moreno P., Suarez-Casillas P., Mejias-Trueba M., Ciudad-Gutierrez P., Guisado-Gil A. B., Gil-Navarro M. V., Herrera-Hidalgo L.. Model-Informed Precision Dosing Software Tools for Dosage Regimen Individualization: A Scoping Review. Pharmaceutics. 2023;15(7):1859. doi: 10.3390/pharmaceutics15071859. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Overgaard R. V., Ingwersen S. H., Tornoe C. W.. Establishing Good Practices for Exposure-Response Analysis of Clinical Endpoints in Drug Development. CPT: pharmacometrics Syst. Pharmacol. 2015;4(10):565–575. doi: 10.1002/psp4.12015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Landersdorfer C. B., Nation R. L.. Limitations of Antibiotic MIC-Based PK-PD Metrics: Looking Back to Move Forward. Front. Pharmacol. 2021;12:770518. doi: 10.3389/fphar.2021.770518. [DOI] [PMC free article] [PubMed] [Google Scholar]
- a Seeger J., Guenther S., Schaufler K., Heiden S. E., Michelet R., Kloft C.. Novel Pharmacokinetic/Pharmacodynamic Parameters Quantify the Exposure-Effect Relationship of Levofloxacin against Fluoroquinolone-Resistant Escherichia coli . Antibiotics. 2021;10(6):615. doi: 10.3390/antibiotics10060615. [DOI] [PMC free article] [PubMed] [Google Scholar]; b Negus S. S., Banks M. L.. Pharmacokinetic–pharmacodynamic (PKPD) analysis with drug discrimination. Behav. Neurosci. Drug Discrim. 2016;39:245–259. doi: 10.1007/7854_2016_36. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Handa K., Wright P., Yoshimura S., Kageyama M., Iijima T., Bender A.. Prediction of Compound Plasma Concentration-Time Profiles in Mice Using Random Forest. Mol. Pharm. 2023;20(6):3060–3072. doi: 10.1021/acs.molpharmaceut.3c00071. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Obrezanova O., Martinsson A., Whitehead T., Mahmoud S., Bender A., Miljkovic F., Grabowski P., Irwin B., Oprisiu I., Conduit G.. et al. Prediction of In Vivo Pharmacokinetic Parameters and Time-Exposure Curves in Rats Using Machine Learning from the Chemical Structure. Mol. Pharm. 2022;19(5):1488–1504. doi: 10.1021/acs.molpharmaceut.2c00027. [DOI] [PubMed] [Google Scholar]
- Chow H.-H., Tolle K. M., Roe D. J., Elsberry V., Chen H.. Application of neural networks to population pharmacokinetic data analysis. J. Pharm. Sci. 1997;86(7):840–845. doi: 10.1021/js9604016. [DOI] [PubMed] [Google Scholar]
- Poynton M. R., Choi B. M., Kim Y. M., Park I. S., Noh G. J., Hong S. O., Boo Y. K., Kang S. H.. Machine learning methods applied to pharmacokinetic modelling of remifentanil in healthy volunteers: A multi-method comparison. J. Int. Med. Res. 2009;37(6):1680–1691. doi: 10.1177/14732300090370060. [DOI] [PubMed] [Google Scholar]
- Jiménez-Luna J., Grisoni F., Schneider G.. Drug discovery with explainable artificial intelligence. Nat. Mach. Intell. 2020;2(10):573–584. doi: 10.1038/s42256-020-00236-4. [DOI] [Google Scholar]
- Khusial R., Bies R. R., Akil A.. Deep Learning Methods Applied to Drug Concentration Prediction of Olanzapine. Pharmaceutics. 2023;15(4):1139. doi: 10.3390/pharmaceutics15041139. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Xu Y., Lou H., Chen J., Jiang B., Yang D., Hu Y., Ruan Z.. Application of a Backpropagation Artificial Neural Network in Predicting Plasma Concentration and Pharmacokinetic Parameters of Oral Single-Dose Rosuvastatin in Healthy Subjects. Clin Pharmacol. Drug Dev. 2020;9(7):867–875. doi: 10.1002/cpdd.809. [DOI] [PubMed] [Google Scholar]
- Soeorg H., Sverrisdottir E., Andersen M., Lund T. M., Sessa M.. Artificial Neural Network vs. Pharmacometric Model for Population Prediction of Plasma Concentration in Real-World Data: A Case Study on Valproic Acid. Clin. Pharmacol. Ther. 2022;111(6):1278–1285. doi: 10.1002/cpt.2577. [DOI] [PubMed] [Google Scholar]
- Minto C. F., Schnider T. W., Egan T. D., Youngs E., Lemmens H. J., Gambus P. L., Billard V., Hoke J. F., Moore K. H., Hermann D. J.. Influence of age and gender on the pharmacokinetics and pharmacodynamics of remifentanil: I. Model development. J. American Soc. Anesthesiol. 1997;86(1):10–23. doi: 10.1097/00000542-199701000-00004. [DOI] [PubMed] [Google Scholar]
- Traynard P., Ayral G., Twarogowska M., Chauvin J.. Efficient Pharmacokinetic Modeling Workflow With the MonolixSuite: A Case Study of Remifentanil. CPT: pharmacometrics Syst. Pharmacol. 2020;9(4):198–210. doi: 10.1002/psp4.12500. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Remifentanil (FDA Approved Labeling Text). https://www.accessdata.fda.gov/drugsatfda_docs/label/2004/20630se5-005_ultiva_lbl.pdf accessed 6 March 2025.
- Miljkovic F., Martinsson A., Obrezanova O., Williamson B., Johnson M., Sykes A., Bender A., Greene N.. Machine Learning Models for Human In Vivo Pharmacokinetic Parameters with In-House Validation. Mol. Pharm. 2021;18(12):4520–4530. doi: 10.1021/acs.molpharmaceut.1c00718. [DOI] [PubMed] [Google Scholar]
- Ette E. I., Williams P. J.. Population pharmacokinetics II: Estimation methods. Ann. Pharmacother. 2004;38(11):1907–1915. doi: 10.1345/aph.1E259. [DOI] [PubMed] [Google Scholar]
- Kapoor S., Narayanan A.. Leakage and the reproducibility crisis in machine-learning-based science. Patterns. 2023;4(9):100804. doi: 10.1016/j.patter.2023.100804. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yang W., Mak W., Gwee A., Gu M., Wu Y., Shi Y., He Q., Xiang X., Han B., Zhu X.. Establishment and Evaluation of a Parametric Population Pharmacokinetic Model Repository for Ganciclovir and Valganciclovir. Pharmaceutics. 2023;15(7):1801. doi: 10.3390/pharmaceutics15071801. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Atkinson, A. J. Chapter 13 - Pharmacokinetics. In Pharmacology and Therapeutics; Waldman, S. A., Terzic, A., Egan, L. J., Elghozi, J.-L., Jahangir, A., Kane, G. C., Kraft, W. K., Lewis, L. D., Morrow, J. D., Zingman, L. V., Eds.; W.B. Saunders, 2009; pp 193–202. [Google Scholar]
- Davies N. M., Takemoto J. K., Brocks D. R., Yáñez J. A.. Multiple peaking phenomena in pharmacokinetic disposition. Clin. Pharmacokinet. 2010;49:351–377. doi: 10.2165/11319320-000000000-00000. [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
Data is provided in Supplementary_Data_File.xlsx.






