Abstract
The pharmaceutical industry has increasingly adopted model‐informed drug discovery and development (MID3) to enhance productivity in drug discovery and development. Quantitative systems pharmacology (QSP), which integrates drug action mechanisms and disease complexities to predict clinical endpoints and biomarkers, is central to MID3. QSP modeling has proven successful in metabolic and cardiovascular diseases and has expanded into oncology, immunotherapy, and infectious diseases. Despite its benefits, QSP model validation through clinical trial simulations using virtual patients (VPs) is challenging because of parameter variability and high computational costs. To address these challenges, this study proposes a hybrid method that combines Bayesian optimization with machine learning for efficient parameter screening. Our approach achieved an acceptance rate of 27.5% in QSP simulations, in sharp contrast with the 2.8% rate of conventional random search methods, an approximately 10‐fold improvement in efficiency. By enabling faster generation of more diverse VPs, this method promises to advance clinical trial simulations and accelerate drug development in pharmaceutical research.
Study Highlights.
WHAT IS THE CURRENT KNOWLEDGE ON THE TOPIC?
Current knowledge highlights the role of quantitative systems pharmacology (QSP) models in integrating drug mechanisms and disease complexity to predict clinical outcomes across various therapeutic areas.
WHAT QUESTION DID THIS STUDY ADDRESS?
This study aimed to improve the efficiency of virtual patient generation in QSP modeling by investigating a hybrid Bayesian optimization and machine learning approach compared with conventional methods.
WHAT DOES THIS STUDY ADD TO OUR KNOWLEDGE?
This study demonstrates a 10‐fold increased efficiency of virtual patient generation using a hybrid approach, significantly reducing computational costs while ensuring diverse and physiologically plausible models.
HOW MIGHT THIS CHANGE DRUG DISCOVERY, DEVELOPMENT AND/OR THERAPEUTICS?
By accelerating virtual patient generation, this approach enables cost‐effective and rapid clinical trial simulations, potentially revolutionizing the design of optimized regimens in clinical trials and decision‐making in pharmaceutical research and development.
INTRODUCTION
To improve the low productivity of drug discovery and development, the pharmaceutical industry has increasingly adopted a modeling and simulation approach known as model‐informed drug discovery and development (MID3). 1 , 2 Quantitative systems pharmacology (QSP), a comprehensive mathematical model that integrates drug action mechanisms and disease considerations to enable analysis and prediction, is at the core of the MID3 approach. QSP models are based on the mechanism of action of the target drug and account for the biological complexity of humans and diversity within patient populations. 3 They can predict quantitative changes in clinical end points and pharmacodynamic biomarkers for various dosing regimens in both healthy individuals and patients. The effectiveness of these models has been reported for metabolic diseases such as hypercholesterolemia and cardiovascular conditions such as hypertension. 4 Recent reviews have indicated that since 2019, the scope of QSP modeling has expanded to include areas such as oncology, cancer immunotherapy, and infectious diseases. 5 , 6 Moreover, QSP models are now used both retrospectively to interpret clinical trial results and prospectively for clinical prediction and risk assessment of adverse effects. 6 According to FDA reports, regulatory submissions incorporating QSP model analysis have steadily increased since 2013, with 157 submissions acknowledged by 2020. 7 These reviews indicate that QSP modeling is poised to continue expansion as a tool for clinical prediction in drug development, with further research and implementation anticipated in the pharmaceutical industry.
Clinical trial simulations using alternative parameterizations of QSP models, known as virtual patients (VPs), are a common strategy for validating a constructed QSP model and ensuring the reliability of its predictive outcomes. 8 These simulations use VPs to account for patient diversity and heterogeneity by introducing parameter variability, thereby assessing its impact on specific biomarkers and clinical endpoints. In the context of QSP models that encapsulate complex pathophysiological mathematical frameworks, the scarcity of relevant human data often hampers the determination of parameter distributions. Consequently, techniques have been proposed to select VPs that align with data from clinical trials. 9 This approach enables the generation of robust VP models by effectively exploring a constrained multidimensional parameter space using limited human data. However, the computational cost remains the most significant constraint, as obtaining a sample size of VPs appropriate for the study objective requires extensive iterative calculations.
With rapid advancements in artificial intelligence (AI) and machine learning (ML) technologies in recent years, these techniques are increasingly being applied to complex QSP models, which are often described by numerous ordinary differential equations. 10 The integrated approach that combines QSP with AI/ML aims to address four primary categories: (1) parameter estimation, (2) model structure, (3) dimensionality reduction, and (4) uncertainty and virtual population. Given that QSP modeling typically requires running tens of thousands of simulations, the need for efficiency through AI/ML is significant. For instance, in the generation of VPs, methods such as Markov chain Monte Carlo (MCMC) sampling and related techniques have been proposed to explore VP populations that align with the desired data density. 3 , 11 , 12 These inverse problems involving the inference of mechanistic model parameters are often formulated as Bayesian inference problems. Approaching the parameter inference for VP models as an inverse problem ensures that the final model population comprises physiologically plausible models constrained by a range of observed characteristics. However, a tradeoff remains between maintaining diversity among VPs and efficiently exploring the space within a constrained multidimensional parameter space. Moreover, the reduction in computational cost remains insufficiently addressed. 12
In this study, we aimed to reduce the computational costs associated with generating VPs by proposing a method that uses ML models as surrogate models to screen large parameter spaces. We also evaluated the computational efficiency of our proposed method, a hybrid method combining Bayesian optimization with ML. The proposed method achieved the highest efficiency, significantly reducing computational costs while successfully obtaining diverse parameter sets for VP models. By accelerating the generation of VPs, which has historically been constrained by high computational costs, this study facilitates easier implementation of clinical trial simulations using VP models. We anticipate that this advancement will further promote the use of VP modeling in clinical research and drug development within pharmaceutical companies.
METHODS
Workflow
Figure 1 illustrates the workflow proposed in this study. Traditionally, random simulations using QSP models were conducted based on the clinical data distribution to select VP models. In contrast, our proposed method employs ML techniques. The workflow comprises the following steps: (1) setting the clinical data distribution and conducting 10,000 QSP model simulations to construct an ML model; (2) using the data obtained from step (1) as a training dataset to build the ML model; (3) predicting VP models that generate normal profiles using the ML model constructed in step (2); and (4) conducting QSP model simulations for models with high prediction scores from the ML model. Detailed explanations of each stage are provided below.
FIGURE 1.

Workflow of our proposed method. (i) Define clinical data distribution and run 10,000 QSP model simulations; (ii) Use this data to train a machine learning model; (iii) Predict normal VP models using the machine learning model; (iv) Perform QSP simulations for models with high prediction scores.
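The four steps above can be sketched as a surrogate-assisted screening loop. This is a minimal illustration, not the study's implementation: `toy_qsp_accepts` is a hypothetical stand-in for the QSP simulation plus the AP-waveform acceptance criteria.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def toy_qsp_accepts(params):
    # Hypothetical stand-in for the QSP simulation and waveform criteria:
    # accept parameter sets whose total deviation from baseline (1.0) is small.
    return np.abs(params - 1.0).sum() < 4.0

# (i) initial simulations: 11 scaling parameters sampled in [0, 2]
X_train = rng.uniform(0.0, 2.0, size=(2000, 11))
y_train = np.array([toy_qsp_accepts(p) for p in X_train])

# (ii) train the surrogate classifier on the simulation outcomes
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# (iii) score a large candidate pool; rank by predicted acceptance probability
X_pool = rng.uniform(0.0, 2.0, size=(20_000, 11))
order = np.argsort(clf.predict_proba(X_pool)[:, 1])[::-1]

# (iv) run the expensive simulator only on top-ranked candidates, stopping
# once the target number of accepted virtual patients is reached
accepted, n_runs, target = [], 0, 100
for params in X_pool[order]:
    n_runs += 1
    if toy_qsp_accepts(params):
        accepted.append(params)
    if len(accepted) >= target:
        break
print(f"accepted {len(accepted)} VPs in {n_runs} simulator calls")
```

Because the simulator is only called on the most promising candidates, far fewer calls are needed than under random screening of the same pool.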
Generation of training data for machine learning
To construct the ML model, we generated a training dataset using two simulation methods, random search and Bayesian optimization search, both utilizing the QSP model. Each method involved 10,000 simulations. First, in the random search approach, as in traditional methods, we randomly set 11 parameters and conducted simulations using the QSP model. The QSP model (a mathematical biophysical model of human cardiomyocytes for evaluating the arrhythmogenic effects of drug candidates) and the varied model parameters for VP models used in this study are described in the next section. The objective variable score was calculated as the Euclidean distance of the 10 scored features of the action potential (AP) waveforms from the range of variation in healthy subjects (see Table S1 and Figure S1 in the quantitative systems pharmacology simulation section). Second, a Bayesian optimization search was conducted using PHYSBO. 13 PHYSBO (Optimization Tools for PHYSics Based on Bayesian Optimization) is a Python library designed for efficient and scalable Bayesian optimization. It builds upon the framework of COMBO (Common Bayesian Optimization) 14 and is tailored specifically for researchers in the field of materials science (https://issp‐center‐dev.github.io/PHYSBO/manual/master/en/introduction.html).
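The objective score can be written down directly from this definition. The sketch below uses illustrative placeholder ranges; the actual healthy bounds are those of Table S1 and should not be read off these numbers.

```python
import math

# Illustrative healthy ranges (lower, upper) for the 10 AP-waveform features.
# These are placeholder values for demonstration; the study's bounds come from
# human experimental data (Table S1).
HEALTHY_RANGES = {
    "APD20": (85.0, 320.0), "APD50": (110.0, 350.0), "APD90": (180.0, 440.0),
    "APamp": (95.0, 130.0), "dV/dt": (100.0, 1000.0), "RMP": (-95.0, -80.0),
    "CaTttp": (20.0, 80.0), "T50": (100.0, 300.0), "T90": (200.0, 600.0),
    "CaTamp": (0.1, 1.0),
}

def deviation_score(features):
    """Euclidean distance of the 10 features from their healthy ranges.

    A feature inside its range contributes zero; outside, it contributes its
    distance to the nearest bound. A score of 0 marks an acceptable waveform.
    """
    total = 0.0
    for name, value in features.items():
        lo, hi = HEALTHY_RANGES[name]
        d = max(lo - value, 0.0, value - hi)
        total += d * d
    return math.sqrt(total)
```

A score of zero identifies an acceptable AP waveform, and the Bayesian optimization search minimizes this score to steer sampling toward acceptable regions.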
Using the simulation results, two types of datasets were created for each search method. The first was the raw dataset, comprising the data directly obtained from the simulations. The second, a balanced dataset, was created by sampling an equal number of abnormal and normal AP waveform VP models from the original dataset, yielding four training datasets in total (Table 1).
Construction of a machine learning model
The ML model employed the random forest algorithm (scikit‐learn 1.3.0 in Python), 15 which has been demonstrated to be effective in various studies. Using the four training datasets generated in step (1), four ML models were constructed. The explanatory variables comprised 11 features, whereas the response variable was a binary classification indicating whether the AP waveform was normal or abnormal. The hyperparameters were determined via fivefold cross‐validation. Four hyperparameters were considered, and a grid search was conducted within the following ranges to find the optimal values: ‘n_estimators’: [50, 100, 200], ‘max_depth’: [None, 10, 20], ‘min_samples_split’: [2, 5, 10], ‘min_samples_leaf’: [1, 2, 4].
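This grid search maps directly onto scikit-learn's `GridSearchCV`. The sketch below uses synthetic stand-in data (the real explanatory variables are the 11 varied model parameters, and the response is the normal/abnormal AP waveform label):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the training dataset: 11 features, imbalanced labels
# mimicking the rarity of normal AP waveforms.
X, y = make_classification(n_samples=300, n_features=11, weights=[0.9, 0.1],
                           random_state=0)

# The hyperparameter grid quoted in the text.
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 10, 20],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

# Fivefold cross-validated grid search over all 81 combinations.
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5, n_jobs=-1)
search.fit(X, y)
print(search.best_params_)
```

`search.best_estimator_` then provides the tuned model used for subsequent predictions.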
Prediction using the machine learning model
To evaluate the performance of the ML surrogate models in efficiently selecting acceptable VPs, a validation dataset consisting of 500,000 parameter sets and the corresponding QSP simulation results (14,001 with normal AP waveforms and 485,999 with abnormal AP waveforms) was created in the same manner as the training dataset. Using this validation dataset, predictions were made with the ML models, and a prediction score was assigned to each parameter set. In this process, an ML model serves as a surrogate for the QSP model simulation.
Implementation of the QSP model
Using the parameter sets scored in step (3), simulations were conducted with the QSP model in descending order of prediction score. Once 5000 normal AP waveform VPs were obtained, the QSP simulations were concluded.
Quantitative systems pharmacology simulation
We employed the O'Hara‐Rudy (ORd) dynamic model developed by O'Hara et al., 16 a mathematical biophysical model of human ventricular cardiomyocytes commonly used for proarrhythmic assessment, as a use case of QSP modeling. This model constitutes the most validated human ventricular AP model to date, developed using extensive experimental data from healthy human hearts. The model parameters as the baseline value for VP generation were adopted from the model presented by a research group at the University of Oxford. 17 A population of healthy AP models accounting for biological variability was constructed by assuming that variability is mostly caused by inter‐cellular differences in ion channel density rather than kinetics, as in previous research by Britton et al. 18 , 19 VPs were generated by varying 11 parameters of the endocardial model of healthy human ventricular myocytes, which represent individual differences. These 11 parameters were the ion channel densities of the main ionic currents, pumps, and exchangers characterizing the human ventricular AP: Ca2+ uptake into the endoplasmic reticulum by SERCA, Na+/K+ pump (NaK), the release of Ca2+ from the sarcoplasmic reticulum by ryanodine receptor (RyR), the transient outward potassium current (Ito), the rapid delayed rectifier potassium current (IKr), the slow delayed rectifier potassium current (IKs), the inwardly rectifying potassium current (IK1), Na+/Ca2+ exchanger (NCX), the L‐type calcium current (ICaL), the late sodium current (INaL), and the peak sodium current (INa). All parameters for individual differences were randomly sampled within the range of 0%–200% of the original settings in the endocardial model of healthy human ventricular myocytes (N = 500,000). We selected a wide range of 0–2 times the baseline value of each ionic current conductance to allow models with a wide variety of underlying ionic current configurations. 
This limit was chosen, following a previous study by Britton et al., 19 to allow the investigation of abnormal ionic profiles that produce normal APs under control conditions but may have increased susceptibility to repolarization abnormalities under channel block.
Adoption criteria for healthy human VPs required that 10 features derived from AP waveforms fell within the range of healthy variability of human experimental data 17 (Table S1): action potential duration (APD) computed at 20%, 50%, and 90% of repolarization (APD20, APD50, and APD90, respectively), AP amplitude (APamp), mean upstroke velocity (dV/dt), resting membrane potential (RMP), calcium transient time to peak (CaTttp), calcium transient relaxation time from peak computed at 50% and 90% of calcium transient decay (T50 and T90), and calcium transient amplitude (CaTamp). AP waveforms were accepted as healthy human VPs only if they independently satisfied all criteria for the 10 waveform shape features, ensuring that these values were controlled within normal ranges defined by experimental data from non‐diseased human ventricular cardiomyocytes. Simulations were performed in which the AP waveform of an individual parameter set was stabilized by 500 repeated stimulations to reach steady state. The stimulation conditions for the myocardial cells were pacing cycle length = 1000 ms (1 Hz), amplitude = −80 μA/μF, and duration = 1.0 ms.
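The all-criteria acceptance rule is a simple conjunction over the feature ranges. A minimal sketch, with placeholder ranges standing in for the Table S1 bounds:

```python
# Illustrative acceptance check: a VP is adopted only if every feature
# independently falls within its healthy range. Placeholder ranges are shown
# for three features; the actual bounds (for all 10) are those of Table S1.
HEALTHY_RANGES = {
    "APD90": (180.0, 440.0),   # ms
    "APamp": (95.0, 130.0),    # mV
    "RMP": (-95.0, -80.0),     # mV
}

def is_acceptable(features, ranges=HEALTHY_RANGES):
    """True only if every listed feature lies within its healthy range."""
    return all(ranges[k][0] <= features[k] <= ranges[k][1] for k in ranges)

print(is_acceptable({"APD90": 300.0, "APamp": 110.0, "RMP": -88.0}))
```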
A QSP model of drug‐induced arrhythmia was developed using MATLAB (MathWorks Inc., Natick, MA, USA). All simulations utilized a variable‐step, variable‐order stiff ODE solver based on numerical differentiation formulas of orders 1–5, known as the ode15s function in MATLAB. 20
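For readers working outside MATLAB, a rough analogue of ode15s is SciPy's BDF integrator. The toy stiff system below is only a stand-in for the cardiomyocyte equations, to show the solver call:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy stiff linear system with fast (~e^-1000t) and slow (~e^-2t) modes,
# standing in for the stiff cardiomyocyte ODEs. MATLAB's ode15s corresponds
# roughly to method="BDF" in scipy's solve_ivp.
def rhs(t, y):
    return [-1000.0 * y[0] + y[1], y[0] - 2.0 * y[1]]

sol = solve_ivp(rhs, (0.0, 5.0), [1.0, 0.0], method="BDF",
                rtol=1e-6, atol=1e-9)
print(sol.success, sol.y[:, -1])
```

An explicit solver would need step sizes on the order of the fastest time constant; the implicit BDF method takes far larger steps once the fast transient has decayed, which is why stiff solvers are the standard choice for ionic models.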
RESULTS
Generation simulation for the training dataset of machine learning
Figure 2 illustrates the results of generating a training dataset for ML. The figure shows the number of normal AP waveform VPs obtained from 10,000 simulations using two methods: random search and Bayesian optimization search. The random search yielded 281 normal AP waveform VPs from 10,000 simulations, whereas the Bayesian optimization search yielded ~1.5 times more (specifically, 416) normal AP waveform VPs (Table 1).
FIGURE 2.

Results of the quantitative systems pharmacology (QSP) model simulations. Simulations were conducted using a random search and Bayesian optimization search for the construction of machine learning models. The x‐axis represents the number of simulations and the y‐axis represents the number of normal AP waveform virtual patients.
TABLE 1.
List of datasets for building machine learning models.
| Dataset | Search method | Model | Number of acceptances | Number of rejections |
|---|---|---|---|---|
| Raw dataset | Random search | RandML | 281 | 9719 |
| Raw dataset | Bayesian optimization search | BOML | 416 | 9584 |
| Balanced dataset | Random search | Balanced RandML | 281 | 281 |
| Balanced dataset | Bayesian optimization search | Balanced BOML | 416 | 416 |
Performance evaluation of the AI models
We constructed ML models to predict whether VPs corresponded to normal AP waveforms using the four generated training datasets. Initially, datasets from both the random search (normal AP waveform VPs: 281, abnormal AP waveform VPs: 9719) and Bayesian optimization search (normal AP waveform VPs: 416, abnormal AP waveform VPs: 9584) were used to build ML models, referred to as RandML and BOML, respectively (Table 1). Because imbalanced datasets can impair ML model accuracy, we balanced the numbers of normal and abnormal AP waveform VPs by randomly sampling the abnormal AP waveform VPs (random search: 281 normal and 281 abnormal AP waveform VPs; Bayesian optimization search: 416 normal and 416 abnormal AP waveform VPs) and constructed balanced ML models, termed Balanced RandML and Balanced BOML (Table 1). We then evaluated the performance of these models using fivefold cross‐validation and summarized the results in Figure 3, which presents the receiver operating characteristic (ROC) curve and precision–recall (PR) curve for the fivefold cross‐validation of each method.
FIGURE 3.

Evaluation results of the fivefold cross‐validation for the four training models. The models were constructed using two training data generation methods (random search and Bayesian optimization search) and two datasets (raw and balanced). The ROC‐AUC and PR‐AUC scores for each model are presented.
Comparing the models trained on the entire dataset obtained from the searches, BOML exhibited slightly better accuracy than RandML. This is likely because the Bayesian optimization search produced more acceptances, resulting in a more balanced ratio of normal to abnormal AP waveform VPs. Next, we compared the performances of Balanced RandML and Balanced BOML, trained on datasets containing equal numbers of normal and abnormal AP waveform VPs. The results showed improved accuracy when the numbers of normal and abnormal AP waveform VPs in the training datasets were balanced, for both the random and Bayesian optimization searches. In particular, the area under the PR curve (PR‐AUC) improved significantly with balanced data. A high PR‐AUC indicates well‐balanced precision and recall, where the model maintains a high true‐positive rate and a low false‐positive rate. This suggests that with an equal ratio of normal to abnormal AP waveform VPs, the models can identify a large number of positive samples with high precision and effectively retrieve the positive samples.
Furthermore, as an additional evaluation of ML models, we randomly sampled 1,000 normal AP waveform VPs and 1,000 abnormal AP waveform VPs from a separate dataset of 500,000 data points, distinct from those in the 30,000 training datasets, and conducted 10 independent validation experiments (Figure S2). The results showed no significant differences between methods. Across all methods, the area under the ROC curve (ROC‐AUC) values were ~0.89, while the PR‐AUC values were ~0.87, indicating that accurate models were consistently constructed.
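Both summary metrics used here are available in scikit-learn. A sketch on synthetic balanced data (PR-AUC computed as average precision over the score ranking; the data and model are stand-ins, not the study's):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for an equal-sized normal/abnormal validation draw.
X, y = make_classification(n_samples=2000, n_features=11,
                           weights=[0.5, 0.5], random_state=1)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5,
                                            stratify=y, random_state=1)

clf = RandomForestClassifier(n_estimators=100, random_state=1).fit(X_tr, y_tr)
scores = clf.predict_proba(X_val)[:, 1]

# ROC-AUC and PR-AUC (average precision) on the held-out balanced sample.
roc_auc = roc_auc_score(y_val, scores)
pr_auc = average_precision_score(y_val, scores)
print(round(roc_auc, 3), round(pr_auc, 3))
```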
Results of applying machine learning models
Using the four constructed ML models, we predicted 500,000 data points, sorted them by prediction score, and executed QSP simulations starting from the highest‐scoring individuals. Together with conventional random‐search‐based QSP execution, this yielded five exploration methods, as shown in Figure 4a. The number of individuals for which the QSP model was executed until 5000 normal AP waveform VPs were obtained was 21,239 for RandML, 18,206 for BOML, 20,805 for Balanced RandML, 18,620 for Balanced BOML, and 176,520 for the random search. The approach requiring the fewest QSP model executions to reach 5000 normal AP waveform VPs was BOML (the ML model trained on data from the Bayesian optimization search). The corresponding acceptance rates were 23.5%, 27.5%, 24.0%, 26.9%, and 2.8%, with the highest rate observed for BOML at 27.5% (Figure 4b). The proposed method thus achieved an ~10‐fold increase in acceptance rate compared with the conventional random search. As shown in Figure 2, this search efficiency far exceeds the 1.5‐fold improvement observed when comparing the Bayesian optimization search alone with the conventional random method.
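These acceptance rates follow directly from dividing the 5000 accepted VPs by each method's simulation count, which also recovers the roughly 10-fold speedup:

```python
# Simulation counts reported in the text until 5000 normal AP waveform VPs
# were obtained for each exploration method.
runs = {
    "RandML": 21239,
    "BOML": 18206,
    "Balanced RandML": 20805,
    "Balanced BOML": 18620,
    "Random search": 176520,
}
target = 5000

# Acceptance rate (%) = accepted VPs / simulations executed.
rates = {name: 100 * target / n for name, n in runs.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1f}%")

# Speedup of the best surrogate method over plain random search.
speedup = rates["BOML"] / rates["Random search"]
print(f"BOML vs random search: {speedup:.1f}x")
```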
FIGURE 4.

Exploration results using 500,000 samples scored by machine learning models. (a) Simulations were conducted starting from samples with the highest prediction scores, based on 500,000 data points predicted by machine learning models. The x‐axis represents the number of simulations, and the y‐axis represents the number of normal AP waveform virtual patients (VPs). The results from a random search on the 500,000 samples are also shown. (b) The rate of acceptances is graphed at the point where 5000 normal AP waveform VPs are reached.
DISCUSSION
The computational time required for QSP model simulations and the low acceptance rates were identified as significant challenges. To address them, we proposed a method using ML surrogate models. As shown in Figure 4, the exploration results demonstrated an ~10‐fold acceleration compared with conventional methods. Our proposed method, which utilizes a hybrid surrogate model of Bayesian optimization and ML, generated VPs faster than the method using ML alone. The purpose of our method is to obtain a diverse set of QSP model parameters that can generate acceptable AP waveforms. This approach can be considered an inverse problem, where we aim to determine the distribution range of intrinsic parameters of potential VPs. The advantage of obtaining a diverse range of VP candidates is twofold: (1) it allows for the definition of a population of VPs without the need for extensive simulation calculations across multiple clinical trials, and (2) it enables more in‐depth scientific investigation by potentially including pathological risk factors, such as those seen in long‐QT syndrome. This approach is particularly useful for elucidating the mechanistic features of patients, such as the differences in mechanisms between patients with chronic atrial fibrillation and those with sinus rhythm in dilated cardiomyopathy 11 ; our method can further accelerate such research. Studies like ours, which utilize AI/ML technologies to accelerate virtual clinical trials using QSP models, are advancing rapidly, 10 including research on parameter estimation, 21 dimensionality reduction of QSP models, 22 , 23 , 24 and the generation of VPs using GANs. 25 The number of research cases in virtual clinical trials is expected to continue increasing.
Herein, we developed a method that explicitly divides the traditional VP generation process into two stages, allowing for the acquisition of a VP population that can more effectively account for patient heterogeneity in the first stage. The QSP model used in this study for proarrhythmia assessment is designed for evaluating potential arrhythmic risks in a clinical context and thus does not need to be strictly tailored to specific clinical trials. Instead, it is intended to identify a broader range of potential risk patients, making it an appropriate evaluation method. In future studies, we plan to focus on clinical pharmacological themes such as type 2 diabetes and immuno‐oncology and conduct a technical investigation including the optimization process in the second stage for reconstructing multiple clinical trials as a VP.
Recently, reports from other research groups have indicated an ~11‐fold acceleration using ML surrogate models for VP generation. 26 This underscores the effectiveness of VP generation using ML surrogate models, both in the QSP model employed in this study and in various other contexts, highlighting it as a generally effective approach.
A limitation of our approach lies in its dependence on training data because of the use of ML models. This implies that the exploration results may have been influenced by the initial composition of the training data. To address this, we examined the overlap of VPs explored using the four proposed methods (RandML, BOML, Balanced RandML, and Balanced BOML) (Figure 5a). In total, 1828 individuals were common to all four methods, accounting for approximately one‐third of the explored individuals; the remaining approximately two‐thirds differed between the methods. Additionally, we compared the overlap between the numbers of normal and abnormal VPs in RandML versus Balanced RandML, and in BOML versus Balanced BOML. There were 2976 common individuals between RandML and Balanced RandML and 3219 common individuals between BOML and Balanced BOML, indicating that more than half of the 5000 exploration results were common, while ~2000 individuals were unique to each method, ensuring a reasonable level of diversity. A comparison was also conducted between the distributions of parameters for the accepted VPs obtained from the initial exploration and those obtained from the ML score (Figure S3). The results indicated that a population with a distribution comparable to that of the initial exploration was successfully identified, providing evidence that the number of VP models has indeed increased.
FIGURE 5.

Analysis of the diversity of acquired virtual patient (VP) models. (a) Venn diagram showing the overlap of VPs obtained from four different models. (b) UMAP visualization of normal VPs acquired using four methods and abnormal VPs. (c) Hierarchical clustering analysis displaying VPs on the horizontal axis and 11 parameters on the vertical axis.
Additionally, to visualize the distribution of normal and abnormal AP waveform VPs obtained using the four methods, we mapped the 10 features derived from the AP waveforms (APD20, APD50, APD90, APamp, dV/dt, RMP, CaTttp, T50, T90, and CaTamp) onto a two‐dimensional space using UMAP 27 (Figure 5b). The results showed that each method explored a diverse range of individuals rather than being biased toward certain types of individuals. This finding, consistent with the results shown in Figure 5a, reinforces the idea that a reasonable level of diversity was maintained during the exploration.
Furthermore, we performed a hierarchical clustering analysis (Figure 5c) on all 14,001 normal AP waveform VPs included in the 500,000 parameter sets using individual difference parameters and prediction scores from the proposed methods. The analysis suggested that BOML and Balanced BOML achieved a broader range of normal VP judgments because of the higher number of initial explorations, indicating the potential for more diverse VP models to be collected more quickly and efficiently. To efficiently generate diverse VP models, methods for pruning the exploration space can be considered, such as excluding parameter spaces that represent physiologically impossible phenotypes or imposing stricter constraints tailored to patient backgrounds. Previous studies have indicated the existence of parameter spaces indicating phenotypes that are physiologically implausible in nonlinear QSP model VP generation and have confirmed, through simulated data analyses, that the implementation of such constraints has certain beneficial effects. 28
Based on the above, our proposed method achieved a 10‐fold increase in exploration efficiency compared with conventional methods for generating VPs. Additionally, the generated VPs were demonstrated to exhibit a reasonable degree of diversity, affirming the utility of our approach. Although the effectiveness of QSP models has been demonstrated previously, our method suggests that their application in clinical settings can be accelerated.
AUTHOR CONTRIBUTIONS
H.I. and R.S. wrote the manuscript and designed the research; R.S. and H.I. performed the research; and H.I. and R.S. analyzed the data.
FUNDING INFORMATION
This work was supported by a Grant‐in‐Aid for Transformative Research Areas (A) “Latent Chemical Space” [JP24H01771] for HI from the Ministry of Education, Culture, Sports, Science and Technology, Japan, and by the Japan Research Foundation for Clinical Pharmacology.
CONFLICT OF INTEREST STATEMENT
The authors declare no competing interests for this work.
Supporting information
Table S1
Figure S1
Figure S2
Figure S3
ACKNOWLEDGMENTS
This study was conducted as part of the Life Intelligence Consortium (LINC).
Iwata H, Saito R. Accelerating virtual patient generation with a Bayesian optimization and machine learning surrogate model. CPT Pharmacometrics Syst Pharmacol. 2025;14:486‐494. doi: 10.1002/psp4.13288
Contributor Information
Hiroaki Iwata, Email: iwata.hiroaki@tottori-u.ac.jp.
Ryuta Saito, Email: saitou.ryuuta@mc.mt-pharma.co.jp.
REFERENCES
- 1. EFPIA MID3 Workgroup, Marshall SF, Burghaus R, et al. Good practices in model‐informed drug discovery and development: practice, application, and documentation. CPT Pharmacometrics Syst Pharmacol. 2016;5:93‐122.
- 2. Marshall S, Madabushi R, Manolis E, et al. Model‐informed drug discovery and development: current industry good practice and regulatory expectations and future perspectives. CPT Pharmacometrics Syst Pharmacol. 2019;8:87‐96.
- 3. Cheng Y, Thalhauser CJ, Smithline S, et al. QSP toolbox: computational implementation of integrated workflow components for deploying multi‐scale mechanistic models. AAPS J. 2017;19:1002‐1016.
- 4. Bai JPF, Earp JC, Pillai VC. Translational quantitative systems pharmacology in drug development: from current landscape to good practices. AAPS J. 2019;21:72.
- 5. Aghamiri SS, Amin R, Helikar T. Recent applications of quantitative systems pharmacology and machine learning models across diseases. J Pharmacokinet Pharmacodyn. 2022;49:19‐37.
- 6. Saito R, Nakada T. Insights into drug development with quantitative systems pharmacology: a prospective case study of uncovering hyperkalemia risk in diabetic nephropathy with virtual clinical trials. Drug Metab Pharmacokinet. 2024;56:101019.
- 7. Bai JPF, Earp JC, Florian J, et al. Quantitative systems pharmacology: landscape analysis of regulatory submissions to the US Food and Drug Administration. CPT Pharmacometrics Syst Pharmacol. 2021;10:1479‐1484.
- 8. Allen RJ, Rieger TR, Musante CJ. Efficient generation and selection of virtual populations in quantitative systems pharmacology models. CPT Pharmacometrics Syst Pharmacol. 2016;5:140‐146.
- 9. Rieger TR, Allen RJ, Bystricky L, et al. Improving the generation and selection of virtual populations in quantitative systems pharmacology models. Prog Biophys Mol Biol. 2018;139:15‐22.
- 10. Zhang T, Androulakis IP, Bonate P, et al. Two heads are better than one: current landscape of integrating QSP and machine learning: an ISoP QSP SIG white paper by the working group on the integration of quantitative systems pharmacology and machine learning. J Pharmacokinet Pharmacodyn. 2022;49:5‐18.
- 11. Lawson BA, Drovandi CC, Cusimano N, Burrage P, Rodriguez B, Burrage K. Unlocking data sets by calibrating populations of models to data density: a study in atrial electrophysiology. Sci Adv. 2018;4:e1701676.
- 12. Kolesova G, Stepanov A, Lebedeva G, Demin O. Application of different approaches to generate virtual patient populations for the quantitative systems pharmacology model of erythropoiesis. J Pharmacokinet Pharmacodyn. 2022;49:511‐524.
- 13. Motoyama Y, Tamura R, Yoshimi K, Terayama K, Ueno T, Tsuda K. Bayesian optimization package: PHYSBO. Comput Phys Commun. 2022;278:108405.
- 14. Ueno T, Rhone TD, Hou Z, Mizoguchi T, Tsuda K. COMBO: an efficient Bayesian optimization library for materials science. Mater Discov. 2016;4:18‐21.
- 15. Breiman L. Random forests. Mach Learn. 2001;45:5‐32.
- 16. O'Hara T, Virag L, Varro A, Rudy Y. Simulation of the undiseased human cardiac ventricular action potential: model formulation and experimental validation. PLoS Comput Biol. 2011;7:e1002061.
- 17. Passini E, Mincholé A, Coppini R, et al. Mechanisms of pro‐arrhythmic abnormalities in ventricular repolarisation and anti‐arrhythmic therapies in human hypertrophic cardiomyopathy. J Mol Cell Cardiol. 2016;96:72‐81.
- 18. Britton OJ, Bueno‐Orovio A, Van Ammel K, et al. Experimentally calibrated population of models predicts and explains intersubject variability in cardiac cellular electrophysiology. Proc Natl Acad Sci USA. 2013;110:E2098‐E2105. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19. Britton OJ, Bueno‐Orovio A, Virag L, Varro A, Rodriguez B. The electrogenic Na(+)/K(+) pump is a key determinant of repolarization abnormality susceptibility in human ventricular cardiomyocytes: a population‐based simulation study. Front Physiol. 2017;8:278. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20. Shampine LF, Reichelt MW. The matlab ode suite. SIAM J Sci Comput. 1997;18:1‐22. [Google Scholar]
- 21. Lu Y, Lee MY, Zhu S, Sinno T, Diamond SL. Multiscale simulation of thrombus growth and vessel occlusion triggered by collagen/tissue factor using a data‐driven model of combinatorial platelet signalling. Math Med Biol. 2017;34:523‐546. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22. Snowden TJ, van der Graaf PH, Tindall MJ. Model reduction in mathematical pharmacology: integration, reduction and linking of PBPK and systems biology models. J Pharmacokinet Pharmacodyn. 2018;45:537‐555. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23. Hasegawa C, Duffull SB. Selection and qualification of simplified QSP models when using model order reduction techniques. AAPS J. 2017;20:2. [DOI] [PubMed] [Google Scholar]
- 24. Derbalah A, Al‐Sallami HS, Duffull SB. Reduction of quantitative systems pharmacology models using artificial neural networks. J Pharmacokinet Pharmacodyn. 2021;48:509‐523. [DOI] [PubMed] [Google Scholar]
- 25. Parikh J, Rumbell T, Butova X, et al. Generative adversarial networks for construction of virtual populations of mechanistic models: simulations to study Omecamtiv Mecarbil action. J Pharmacokinet Pharmacodyn. 2022;49:51‐64. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26. Myers RC, Augustin F, Huard J, Friedrich CM. Using machine learning surrogate modeling for faster QSP VP cohort generation. CPT Pharmacometrics Syst Pharmacol. 2023;12:1047‐1059. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27. McInnes L, Healy J, Melville J. Umap: uniform manifold approximation and projection for dimension reduction. arXiv preprint 2018;arXiv:180203426.
- 28. Duffull S, Gulati A. Potential issues with virtual populations when applied to nonlinear quantitative systems pharmacology models. CPT Pharmacometrics Syst Pharmacol. 2020;9:613‐616. [DOI] [PMC free article] [PubMed] [Google Scholar]
Supplementary Materials
Table S1
Figure S1
Figure S2
Figure S3
