Abstract
Physiologically based pharmacokinetic (PBPK) models are useful tools in drug development and risk assessment of environmental chemicals. PBPK model development requires the collection of species-specific physiological parameters and chemical-specific absorption, distribution, metabolism, and excretion (ADME) parameters, which can be a time-consuming and expensive process. This raises a need for computational models capable of predicting input parameter values for PBPK models, especially for new compounds. In this review, we summarize an emerging paradigm for integrating PBPK modeling with machine learning (ML) or artificial intelligence (AI)-based computational methods. This paradigm includes 3 steps: (1) obtain time-concentration PK data and/or ADME parameters from publicly available databases, (2) develop ML/AI-based approaches to predict ADME parameters, and (3) incorporate the ML/AI models into PBPK models to predict PK summary statistics (eg, area under the curve and maximum plasma concentration). We also discuss a neural network architecture, the neural ordinary differential equation (Neural-ODE), which is capable of providing better predictive capabilities than other ML methods when used to directly predict time-series PK profiles. To support applications of ML/AI methods for PBPK model development, several challenges should be addressed: (1) as more data become available, it is important to expand the training set to cover a greater structural diversity of compounds and thereby improve the prediction accuracy of ML/AI models; (2) due to the black box nature of many ML models, lack of sufficient interpretability is a limitation; and (3) Neural-ODE has great potential to generate time-series PK profiles for new compounds with limited ADME information, but its application remains to be explored. Despite existing challenges, ML/AI approaches will continue to facilitate the efficient development of robust PBPK models for a large number of chemicals.
Keywords: artificial intelligence, machine learning, physiologically based pharmacokinetic (PBPK) modeling, risk assessment, in vitro to in vivo extrapolation (IVIVE), pharmacometrics
Physiologically based pharmacokinetic (PBPK) modeling is a valuable computational tool that characterizes the pharmacokinetics (PK) or toxicokinetics (TK) of a chemical and/or its metabolites in animals and humans by describing the processes of absorption, distribution, metabolism, and excretion (ADME) (Fisher et al., 2020). PBPK models have been widely applied to support dosing recommendations and optimize the design of clinical trials in drug development (Abouir et al., 2021; Xiong et al., 2022; Zhao et al., 2011), interpret human biomonitoring data in epidemiological studies (Andersen et al., 2021; Campbell et al., 2016; Ruark et al., 2017), support in vitro to in vivo extrapolation (IVIVE) of TK or toxicity data (Chen et al., 2022a; Chou and Lin, 2020; Martin et al., 2015), estimate drug tissue residues and withdrawal times in food animals (Chou et al., 2022b; Lin et al., 2016; Zhou et al., 2021), and extrapolate TK and toxicity data across different species (Chou and Lin, 2019; Lee et al., 2020) and different life stages (Chou and Lin, 2021; Mallick et al., 2020) to estimate the population risk in environmental chemical risk assessment. Creating PBPK models for new chemicals, however, is time- and resource-intensive because the models are relatively complex and contain many parameters. Some of these parameters have unknown values, either because they are difficult to measure or because they have not yet been measured, and so they must be estimated by fitting the model to observed data from in vivo studies. However, animal studies are time-consuming and expensive and raise animal welfare concerns, so it is unrealistic to conduct them for every chemical that needs a PBPK model. 
Therefore, the use of machine learning (ML) and artificial intelligence (AI) approaches to predict PK parameters as inputs to PBPK models of new chemical entities is increasingly appealing. These approaches can speed up the development of robust PBPK models for drugs and environmental chemicals, save substantial resources, and have the potential to become an alternative to traditional PBPK modeling based on in vivo data.
AI, a subdiscipline of computer science, is one of the most rapidly advancing technologies, with the goal of developing machines or computational approaches that can solve various cognitive tasks at a level similar to or even exceeding human intelligence (Davidovic et al., 2021). ML is a subarea of AI that applies mathematical or computational algorithms to perform complex tasks by automatically learning from past data or knowledge (Ghahramani, 2015). Generally, there are 3 main types of ML methods: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning methods train a model on known input and output data with the goal of predicting new outputs from new inputs; unsupervised learning techniques allow the trained model to cluster data in meaningful ways and identify intrinsic patterns or structures when the relationships between input and output data are unknown; and reinforcement learning is a feedback-based approach used to learn the actions in an environment that yield the maximum reward (LeCun et al., 2015; Matsuo et al., 2022). A newer class of ML methods, the deep neural network, often referred to as deep learning, enables the creation of more complex models with a logic structure similar to that of the human brain (LeCun et al., 2015). ML and deep learning algorithms constitute essential building blocks of AI systems. These algorithms provide a data-driven approach to the evaluation of chemical/drug ADME and toxicity properties. Commonly used ML and deep learning models in the field of toxicology and a brief description of each method are provided in the accompanying review paper (Lin and Chou, 2022).
In recent years, ML algorithms have been utilized to develop quantitative structure-activity relationship (QSAR) models, including but not limited to the prediction of ADME properties and solving complex PK profiles of drugs and nanoparticles (Chou et al., 2022a; Danishuddin et al., 2022; Maltarollo et al., 2015). By leveraging multiple features/descriptors extracted from chemical (eg, open source molecular description ontologies such as ToxPrint and PaDEL, and a proprietary molecular descriptor calculator Dragon), biological (eg, open source databases ToxCast and Tox21), and kinetic (eg, the open source database PK-DB) descriptor databases or ontologies, ML-based methods can broaden the arsenal of QSAR models and are useful for the establishment of relationships between molecular descriptors and several critical PK parameters, such as volume of distribution at steady state (Vd) (Murad et al., 2021), clearance (Cl) (Dawson et al., 2021), terminal half-life (t1/2) (Wang et al., 2019), and fraction unbound in plasma (fu) (Watanabe et al., 2018). In one of the earlier examples, Gleeson et al. (2006) built a comprehensive QSAR model, by integrating Bayesian neural networks, classification and regression trees, and partial least squares, to predict Vd in rats and humans. In addition, one of the most notable ML methods, support vector machines, was first applied as a potential tool in QSAR by Liu et al. (2005) to predict the tissue/blood partition coefficients (Kp) of organic compounds in different tissues. This was followed by several studies that used ML methods to predict various PK parameters, such as human intestinal absorption (Kamiya et al., 2021b), Cl (Kosugi and Hosea, 2020), and Kp (Golmohammadi et al., 2012). Overall, ML methods have been successfully incorporated into QSAR models to predict different ADME properties.
With rapid advancements in theory, computational power, and optimized algorithms in recent years, the deep learning model, that is, the artificial neural network model with multiple hidden layers, was introduced as a viable computational tool in drug development (Chen et al., 2018a). Because of its advantages in identifying higher-level features and handling multiple tasks, the deep learning model has been gaining popularity in drug development. In particular, the recent development of neural ordinary differential equations (Neural-ODEs), a new family of deep neural network models, provides a novel ODE modeling paradigm that can learn the governing ODEs directly from PK data sets to simulate the time-course concentration profile of a drug (Lu et al., 2021a,b).
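As a brief illustration of what learning the governing ODEs means, the general Neural-ODE formulation can be written as below. This is the generic form, not the specific model of Lu et al.; the symbols are assumptions introduced here: z(t) is a hidden state (for example, a latent representation of drug concentration), f_θ is a neural network with trainable weights θ, and t_0 and t_1 are two observation times.

```latex
\frac{d\mathbf{z}(t)}{dt} = f_{\theta}\bigl(\mathbf{z}(t),\, t\bigr),
\qquad
\mathbf{z}(t_1) = \mathbf{z}(t_0) + \int_{t_0}^{t_1} f_{\theta}\bigl(\mathbf{z}(t),\, t\bigr)\, dt
```

In this form, the right-hand side of the ODE is learned from data rather than specified mechanistically, and a numerical ODE solver evaluates the integral to produce the state at any requested time point.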
In this review article, we summarize an emerging research paradigm, based on recently published studies, to integrate ML/AI-based approaches with PBPK models (Figure 1). This paradigm includes 3 steps: (1) obtain time-concentration PK data and/or ADME parameters from publicly available databases (Figure 1A), (2) develop ML/AI-based approaches to predict ADME parameters (Figure 1B), and (3) incorporate the ML/AI models into PBPK models to predict PK summary statistics [eg, area under the curve (AUC) and maximum plasma concentration (Cmax)] (Figure 1C). Based on this paradigm, we provide a comprehensive review of (1) publicly available databases that contain time-concentration PK data and/or ADME parameters, (2) recently reported ML/AI-based approaches to the prediction of ADME parameters for pharmaceutical and non-pharmaceutical compounds, and (3) available methods to integrate ML/AI approaches with PBPK modeling to predict PK summary statistics. Additionally, we highlight the potential use of Neural-ODE in the prediction of PK parameters based on recent studies.
Figure 1.
An emerging research paradigm to integrate machine learning and artificial intelligence approaches with physiologically based pharmacokinetic modeling. A, A database consisting of in vivo PK profiles and in vitro ADME assays (eg, permeability) is obtained from the literature and is then used to establish an ML/DL-based model. B, ML/DL algorithms (eg, SVM, RF, and DNN) can be used to estimate ADME parameters by training with chemical descriptors and the properties of the molecules. These ADME parameters can be used as input parameters (eg, F, Cl, fu) for the development of a generic PBPK model. C, The ML-generic PBPK model can be used to generate the secondary PK parameters, including AUC, Cmax, and Vd, and can subsequently be evaluated with in vivo PK data. Once a PBPK model is evaluated to be adequate or acceptable at Step C, it can then be used to generate simulated time-concentration data, which in turn can be incorporated into existing databases or become a new database for Step A. Abbreviations: ADME, absorption, distribution, metabolism, and excretion; AUC, area under the curve; Cl, clearance; Cmax, maximum plasma concentration; DL, deep learning; DNN, deep neural network; F, bioavailability; ML, machine learning; PBPK, physiologically based pharmacokinetic; PK, pharmacokinetic; RF, random forest; SVM, support vector machine; Vd, volume of distribution.
Application of machine learning/artificial intelligence-based methods to predict pharmacokinetic parameters
One of the major challenges in building a PBPK model is finding adequate drug- or chemical-specific parameters, especially for newly developed drugs or compounds, because measured values for most of these parameters are not available. To fill this gap, some researchers (Kamiya et al., 2019, 2020, 2021a, 2022; Pradeep et al., 2020; Schneckener et al., 2019; Wambaugh et al., 2015) proposed integrating a simplified PBPK model with an ML-based QSAR model to estimate plasma and tissue concentrations, as well as several PK parameters (eg, Vd, Cl, and t1/2). This emerging integrated research paradigm can be used to characterize, optimize, and predict the fundamental ADME parameters in several steps (Figure 1). In brief, the first step (Figure 1A) is to create or find databases consisting of in vivo time-concentration data and PK parameters (Cao et al., 2012; Grzegorzewski et al., 2021; Lombardo et al., 2018; Sayre et al., 2020; Wang et al., 2019), in vitro assays on ADME parameters, and the structural and physicochemical properties of selected drugs and environmental chemicals (Gaulton et al., 2012; Pihan et al., 2012). Next (Figure 1B), ML/AI-based computational models can be established on the basis of these databases and applied to predict various ADME parameters (eg, Kp, Cl, and fu) from a compound’s physicochemical and structural properties by training with the data collected in Step 1 (Agatonovic-Kustrin et al., 2001; Antontsev et al., 2021; Baranwal et al., 2020; Deconinck et al., 2007; Gombar and Hall, 2013; Hou et al., 2007; Hsiao et al., 2013; Iwata et al., 2021; Jia et al., 2020; Kosugi and Hosea, 2020; Liu et al., 2005; Niwa, 2003; Paine et al., 2010; Paixao et al., 2010; Sarigiannis et al., 2017; Shen et al., 2010; Talevi et al., 2011; Wambaugh et al., 2015; Wang et al., 2017; Yun et al., 2014; Zhang et al., 2008). 
These ADME parameters can then be incorporated into a generic PBPK model (Wambaugh et al., 2015). Step 3 (Figure 1C) is to use the integrated ML-based PBPK model to predict the time-concentration profiles in plasma and tissues and to calculate relevant PK parameters such as AUC and Cmax (Kamiya et al., 2019, 2021a, 2022; Lu et al., 2021a,b; Sipes et al., 2017; Wambaugh et al., 2015). These PBPK-derived data will then be evaluated with in vivo PK data. Once a PBPK model is evaluated to be adequate or acceptable at Step 3 (Figure 1C), it can be used to generate simulation data, which in turn can be incorporated into existing databases or become a new database for Step 1. Each of these steps is described in further detail in the following sections.
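Step 3 can be illustrated with a minimal sketch. It assumes that a one-compartment model with first-order absorption stands in for the generic PBPK model, which is a deliberate simplification, and that the ADME inputs (F, absorption rate constant ka, Cl, and Vd) have already been predicted by an ML/QSAR model. The function names and all numerical values are hypothetical and are not taken from any of the cited studies.

```python
import math


def concentration(t, dose, F, ka, cl, vd):
    """Plasma concentration (mg/L) at time t (h) after a single oral dose (mg).

    One-compartment model with first-order absorption; requires ka != cl / vd.
    """
    ke = cl / vd  # first-order elimination rate constant (1/h)
    return F * dose * ka / (vd * (ka - ke)) * (math.exp(-ke * t) - math.exp(-ka * t))


def pk_summary(dose, F, ka, cl, vd):
    """Return (AUC from zero to infinity, Cmax, Tmax) using closed-form expressions."""
    ke = cl / vd
    auc = F * dose / cl                   # mg*h/L
    tmax = math.log(ka / ke) / (ka - ke)  # time of peak concentration (h)
    cmax = concentration(tmax, dose, F, ka, cl, vd)
    return auc, cmax, tmax


# Hypothetical ML-predicted inputs: F (unitless), ka (1/h), Cl (L/h), Vd (L)
predicted = {"F": 0.8, "ka": 1.0, "cl": 5.0, "vd": 50.0}
auc, cmax, tmax = pk_summary(dose=100.0, **predicted)
print(f"AUC = {auc:.2f} mg*h/L, Cmax = {cmax:.2f} mg/L, Tmax = {tmax:.2f} h")
```

With these illustrative inputs the sketch gives an AUC of 16 mg·h/L and a Cmax of about 1.24 mg/L at about 2.56 h. In the paradigm described above, such predicted summary statistics would then be compared with values observed in vivo.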
Publicly available databases on PK data and/or parameters
ML/AI-based modeling approaches are by definition data-driven and empirical, and thus rely heavily on the quality and size of data sets. In particular, the development of ML/AI-based QSAR models for the prediction of PK parameters usually requires a database with a relatively large amount of chemical information, such as molecular descriptors and corresponding PK parameters. The size of the database needed to develop an ML/AI model depends on the complexity of the model, and the number of chemicals can range from dozens (Antontsev et al., 2021) to thousands (Baranwal et al., 2020). Such curated databases relate chemical structures to biological targets and serve as the training data set for the model. Multiple well-developed PK-related databases (mainly in humans) are available for designing ML-based prediction models. Table 1 lists publicly available databases that contain time-concentration PK data and/or relevant PK parameters for different compounds. We briefly describe recent advances in PK databases in the following paragraph.
Table 1.
A list of databases that contain pharmacokinetic data for machine learning analyses
| Database name | Number of compounds | PK parameters | Description | Website | References |
|---|---|---|---|---|---|
| PK-DB | 676 | Cl, t1/2, AUC, Cmax, Kel and PK time-courses data | PK-DB is a comprehensive database, which contains data from human clinical trials and provides curated PK information on characteristics of studied patient cohorts, applied interventions, PK parameters, and PK time-courses data. | https://pk-db.com | Grzegorzewski et al. (2021) |
| PK/DB | 1203 | HIA, F, fu, BBB, Vd, Cl, t1/2 | PK/DB is a robust database for PK studies and in silico ADME prediction. | www.pkdb.ifsc.usp.br | Moda et al. (2008) |
| PKKB | 1685 | HIA, fu, Vd, Cl, LD50 | Pharmacokinetic Knowledge Base (PKKB) is a comprehensive database of PK and toxic properties for drugs. | http://cadd.suda.edu.cn/admet | Cao et al. (2012) |
| e-Drug3D | 1852 | Vd, Cl, t1/2, PPB, F, Cmax, and Tmax | e-Drug3D is a database of 1852 FDA-approved drugs with 3-D chemical structures and information on PK parameters. | https://chemoinfo.ipmc.cnrs.fr/MOLDB/index.php | Pihan et al. (2012) |
| ChEMBL | >1M | Not available | Open-access database containing ADME and toxic information for numerous drug-like compounds | www.ebi.ac.uk/chembl/ | Gaulton et al. (2012) |
| Lombardo's database | 1352 | Vd, Cl, MRT, fu, t1/2 | A human intravenous PK data set derived from the literature. | Not available | Lombardo et al. (2018) |
| Wang's database | 970 | HIA | A human intestinal absorption data set consists of 970 compounds, and 9 different types of descriptors. | Not available | Wang et al. (2017) |
| CvT | 144 | PK time-course data | A public database of chemical time-series concentration data for 144 environmentally relevant chemicals and their metabolites | https://github.com/USEPA/CompTox-PK-CvTdb | Sayre et al. (2020) |
Abbreviations: AUC, area under curve; BBB, blood brain barrier; Cl, clearance; Cmax, maximum concentration; F, oral bioavailability; fu, fraction unbound in plasma; HIA, human intestinal absorption; Kel, elimination rate; LD, lethal dose; MRT, mean residence time; PK, pharmacokinetic; PPB, plasma protein binding; t1/2, terminal half-life; Tmax, time to peak drug concentration; Vd, volume of distribution.
PK-DB provides high-quality PK data for 676 compounds collected from experimental and clinical studies, including information on studied patients in the cohorts (eg, age, bodyweight, smoking status, and genetic variants), dosing regimen (eg, dosing, substance, and administration route), PK parameters (eg, Cl, t1/2, and AUC), and PK time-course data (Grzegorzewski et al., 2021). PK/DB is a web-based database containing PK information on 1203 compounds (Moda et al., 2008). This online database also provides 5 in silico models for ADME predictions. PKKB is an online PK-information database that offers structural property data sets, including data on permeability across the blood-brain barrier, P-glycoprotein inhibition, human intestinal absorption, and oral bioavailability (Cao et al., 2012). e-Drug3D is a curated database consisting of 1852 FDA-approved drugs with 3-D chemical structures and information on PK parameters (Pihan et al., 2012). This database includes experimental PK parameters from drug labels, including Vd, Cl, plasma protein binding, terminal t1/2, and oral bioavailability. ChEMBL is a manually curated open-access database of bioactive molecules covering over 1 million chemicals (Gaulton et al., 2012). This database is maintained by the European Molecular Biology Laboratory and provides functional and ADME-tox information for drug-like molecules. In this database, 74 004 records relevant to ADME-tox information can be retrieved for the development of ML models. In addition to these publicly available databases, some researchers have collected data from the literature and regulatory agencies and shared their own data sets. For example, Lombardo et al. (2018) created a data set that contains human PK parameters, such as Vd, Cl, mean residence time, fu, and t1/2, for 1352 drugs. Wang et al. (2017) developed a human intestinal absorption database consisting of 970 compounds and 9 different types of descriptors for QSAR modeling. 
Due to the need for verification of in vitro assays and in silico models for environmental chemicals, Sayre et al. (2020) presented a public database CvT including PK time-series concentration data from 567 studies in humans or test animals for 144 environmentally relevant chemicals and their metabolites.
Pharmacokinetic parameter predictions for pharmaceutical compounds
A PBPK model can estimate ADME properties of pharmaceutical compounds and simulate the concentration versus time profiles in plasma and different tissues or organs to support dosing optimization. Apart from information on physiology, other critical input parameters in a PBPK model include the absorption rate constant following extravascular routes of administration, Kp, fu, and Cl. The conventional methods to obtain these input parameters are either based on in vivo or in vitro studies or calculated from mechanistic models, all 3 of which require drug-specific information. Recently, several researchers have leveraged advances in ML/AI-based methods for the structure-based prediction of critical PK parameters to establish a PBPK model. These ML/AI-based models can predict the general ADME properties of new compounds without the need for animal studies and therefore can also benefit the evaluation of the toxicity for emerging chemicals. In this section, we review recently developed predictive models for several critical ADME parameters related to the development of PBPK models for pharmaceutical compounds and summarize the progress of these predictive models. A list of these models is provided in Table 2.
Table 2.
A list of representative studies that used machine learning and artificial intelligence approaches in the predictions of absorption, distribution, metabolism, and excretion properties for pharmaceutical compounds
| References | N | Predict target | Descriptor types | Modeling method | Performance (a) |
|---|---|---|---|---|---|
| Absorption | | | | | |
| Agatonovic-Kustrin et al. (2001) | 86 | HIA | 0D–3D theoretical descriptors | ANN, RBF, GNN | Training set: R2 = 0.82, RMSE = 0.59; Test set: RMSE = 0.90 |
| Deconinck et al. (2007) | 67 | HIA | 1D–3D theoretical descriptors plus one of Abraham’s solvation parameters | MARS | Whole data set: RMSE = 7.2%, R2 = 0.93 |
| Niwa (2003) | 86 | HIA | 0D–1D theoretical descriptors | GRNN, PNN | Training set: RMSE = 6.5; Test set: RMSE = 22.8 |
| Talevi et al. (2011) | 120 | HIA | 0D–3D Dragon theoretical descriptors | MLR, ANN, SVM | Training set: R2 = 0.8, RMSE = 0.18; Test set: R2 = 0.66, RMSE = 0.21 |
| Yan et al. (2008) | 52 | HIA | Adriana Code and Cerius2 0D–2D theoretical descriptors | GA, PLS, SVM | Training set: R2 = 0.66, RMSE = 12.5; Test set: R2 = 0.77, RMSE = 16 |
| Shen et al. (2010) | 1593 | HIA | 1D–2D theoretical descriptors | SVM | Training set: Q = 98.5%; Test set: Q = 99% |
| Kamiya et al. (2021b) | 184 | Papp | Chemical descriptors (not specified) | SVM, PLS, RBF | Whole data set: R = 0.84–0.85 |
| Ghafourian et al. (2012) | 310 | HIA | A total of 215 descriptors (not specified) | MLR | Training set: RMSE = 14.54; Test set: RMSE = 23.84 |
| Hou et al. (2007) | 648 | HIA | 0D–2D theoretical descriptors | MARS, GA | Training set: R2 = 0.97.3; Test set: R2 = 0.98 |
| Wang et al. (2017) | 970 | HIA | 2D–3D descriptors, molecular fingerprints, and structural fragments | RF | Training set: SE = 0.89, SP = 0.85, Q = 0.89; Test set: SE = 0.88, SP = 0.81, Q = 0.87 |
| Distribution | | | | | |
| Antontsev et al. (2021) | 21 | Kp | Not explained in the study | BIOiSIM | Test set: AFE = 0.96 (Cmax), 0.89 (AUC), 0.69 (Vd); AAFE = 1.2 (Cmax), 1.30 (AUC), 1.71 (Vd); R2 = 0.99 (Cmax), 0.98 (AUC), 0.99 (Vd) |
| Golmohammadi et al. (2012) | 310 | Kp | 3D descriptors and molecular structural information | SVM, GA, PLS | Training set: R2 = 0.98, RMSE = 0.117; Test set: R2 = 0.98, RMSE = 0.118 |
| Liu et al. (2005) | 208 | Kp | Constitutional, topological, geometrical, electrostatic, and quantum chemical descriptors | SVM | Training set: R2 = 0.97, RMSE = 0.02; Test set: R2 = 0.974, RMSE = 0.0289 |
| Yun et al. (2014) | 122 | Kp | LogP, pKa, fu | DT, RF | Whole data set: Q = 72% |
| Metabolism | | | | | |
| Athersuch et al. (2013) | 15 | Classify the metabolic pathways of test compounds | | PCA, PLS | Whole data set: R2 = 0.96, Q = 77.5% |
| Baranwal et al. (2020) | 6669 | Classify the metabolic pathways of test compounds | | RF, GCN | Test set: Q = 98.99% |
| Jia et al. (2020) | 5682 | Classify the metabolic pathways of test compounds | | RF | Whole data set: Q = 94% |
| Zhang et al. (2008) | 44 | Vmax, Km | Molecular fingerprints | ANN | Whole data set: R2 = 0.6–0.9 (Km), R2 = 0.6–0.7 (Vmax), RMSE = 0.3–0.5 (Km), RMSE = 0.4–0.7 (Vmax) |
| Sarigiannis et al. (2017) | 54 | Vmax, Km | Physicochemical properties based on Abraham's solvation equation | ANN, NLR | Test set: R2 = 0.82 (Km), R2 = 0.99 (Vmax) |
| Elimination | | | | | |
| Hsiao et al. (2013) | 244 | Clint | Molecular fingerprints, physicochemical properties, and 3D quantum chemical descriptors | PLS, RF, PCA | Whole data set: R2 = 0.96, Q = 48% |
| Iwata et al. (2021) | 748 | Cltotal | The chemical structure was represented as graph data | DL | Test set: GMFE = 2.68 |
| Kosugi and Hosea (2020) | 1114 | Cltotal | 2D SMARTS-based descriptors | RF, RBF | Whole data set: R2 = 0.55, RMSE = 0.332 |
| Paine et al. (2010) | 349 | Clrenal | 195 descriptors | RF | Training set: R2 = 0.93, RMSE = 0.32; Test set: R2 = 0.63, RMSE = 0.63 |
| Paixao et al. (2010) | 112 | Clint | 233 molecular descriptors | ANN | Training set: R2 = 0.953, RMSE = 0.236; Test set: R2 = 0.804, RMSE = 0.544 |
| Wang et al. (2019) | 1352 | Cltotal | 2D and 3D descriptors, and 49 fingerprints | SVM, GBM, XGBoost | Training set: R2 = 0.882, RMSE = 0.239; Test set: R2 = 0.875, RMSE = 0.103 |
| Gombar and Hall (2013) | 525 | Cltotal | 89 descriptors calculated from electro-topological state (E-state) fingerprints | SVM, MLR | Test set: R2 = 0.70 |
Abbreviations: AAFE, absolute average fold error; AFE, average fold error; ANN, artificial neural networks; Clint, intrinsic metabolic clearance; Clrenal, renal clearance; Cltotal, total plasma clearance; DL, deep learning; DT, decision tree; GA, genetic algorithm; GBM, gradient boosting machine; GCN, graph convolutional network; GMFE, geometric mean fold error; GNN, general neural network; GRNN, general regression neural network; F, oral bioavailability; HIA, human intestinal absorption; Km, Michaelis constant; MARS, multivariate adaptive regression splines; MLR, multiple linear regression; NLR, nonlinear regression; Papp, apparent membrane permeability coefficients; PCA, principal component analysis; PLS, partial least squares; PNN, probabilistic neural network; Q, prediction accuracy; R2, squared Pearson’s correlation coefficient; RBF, radial basis function; RF, random forest; RMSE, root-mean-square error; SE, sensitivity; SP, specificity; SVM, support vector machine; Vmax, maximal reaction rate; XGBoost, eXtreme Gradient Boosting.
(a) Performance of the best model.
Prediction of absorption parameters
The absorption parameters related to oral administration, such as oral bioavailability and the gastrointestinal absorption rate constant, are essential attributes in drug development and are of high importance in the establishment of a PBPK model. Multiple QSAR models have been established to predict these oral absorption-related parameters based either on in vitro assays, such as human colon adenocarcinoma cell lines and Madin Darby canine kidney (MDCK) cells, or on in vivo studies. These models were constructed with different ML/AI-based methods, including artificial neural networks (Agatonovic-Kustrin et al., 2001; Niwa, 2003; Talevi et al., 2011), support vector machine (Yan et al., 2008), multivariate adaptive regression splines (Deconinck et al., 2007), and stepwise regression (Ghafourian et al., 2012). Several earlier models were generated on the basis of a small number of compounds (<100) (Agatonovic-Kustrin et al., 2001; Niwa, 2003). Later on, a large expert-curated database containing human intestinal absorption factors for 578 compounds was established (Hou et al., 2007). Based on this curated database, Hou et al. (2007) and Shen et al. (2010) built classification models using decision tree and support vector machine methods, respectively, to distinguish poorly absorbed from well-absorbed compounds with an accuracy of >0.95. Subsequently, Wang et al. (2017) extended the Hou et al. database to include 970 compounds and used this relatively large data set to build a random forest classification model with a predictive accuracy of 0.89 in the training set and 0.87 in the testing set. More recently, Kamiya et al. (2021b) applied a light gradient boosting machine learning model with 19 chemical descriptors to predict the membrane permeability coefficients of 219 different compounds in human colorectal carcinoma cells. The predicted values were well correlated with the observed values, with correlation coefficients ranging from 0.83 to 0.84.
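The classification metrics reported for these absorption models can be computed as in the minimal sketch below. It assumes that SE and SP denote sensitivity and specificity and that Q denotes overall prediction accuracy; the labels (1 for well absorbed, 0 for poorly absorbed) and the example data are hypothetical and are not taken from any of the cited studies.

```python
def classification_metrics(observed, predicted):
    """Return (SE, SP, Q) for binary labels: 1 = well absorbed, 0 = poorly absorbed."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)  # true positives
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)  # true negatives
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)  # false positives
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)  # false negatives
    se = tp / (tp + fn)         # sensitivity: fraction of well-absorbed compounds recovered
    sp = tn / (tn + fp)         # specificity: fraction of poorly absorbed compounds recovered
    q = (tp + tn) / len(pairs)  # overall accuracy
    return se, sp, q


# Hypothetical observed and predicted absorption classes for 10 compounds
observed = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]
se, sp, q = classification_metrics(observed, predicted)
print(f"SE = {se:.2f}, SP = {sp:.2f}, Q = {q:.2f}")
```

For this toy data set the sensitivity is 0.8, the specificity is 0.6, and the accuracy is 0.7. Reporting all three is useful because absorption data sets tend to contain more well-absorbed than poorly absorbed compounds, so accuracy alone can hide poor performance on the minority class.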
Prediction of distribution parameters
Kp, a parameter that characterizes the tissue distribution of a chemical, is an essential input parameter in PBPK models. Kp indicates the extent of distribution or accumulation of a chemical in a tissue at steady-state conditions and represents the relative exposure of a chemical between different tissues. It can be determined from in vivo studies in mice, rats, and dogs. However, these experiments are expensive and time-consuming (Lin et al., 2003). As such, several in silico models, including but not limited to ML/AI-based methods, have been developed (Andersen et al., 2021; Golmohammadi et al., 2012; Liu et al., 2005; Yun et al., 2014). Previous studies (Pearce et al., 2017; Poulin and Theil, 2002; Rodgers et al., 2005; Rodgers and Rowland, 2006; Schmitt, 2008) developed several mechanistic models that predict Kp in different tissues or organs by combining tissue composition information, such as the fractions of proteins, lipids, and phospholipids in a tissue, with the chemical’s physicochemical characteristics, such as lipophilicity, unbound fraction in plasma, and octanol-to-water partition coefficient. Among ML/AI-based methods, support vector machine (Golmohammadi et al., 2012; Liu et al., 2005) and decision tree (Yun et al., 2014) methods have been used to develop QSAR or quantitative structure-property relationship (QSPR) models for the prediction of Kp. Antontsev et al. (2021) leveraged ML and deep learning methods to develop a structure-based prediction of Kp on a novel ML/AI-based pharmacokinetic/pharmacodynamic (PK/PD) modeling platform, BIOiSIM. They compared the calculated PK outputs (eg, AUC, Cmax, and Vd) obtained with Kp predicted by an ML/AI-based method against the simulated results obtained with a traditional mechanistic model (Rodgers et al., 2005; Rodgers and Rowland, 2006). 
The accuracy of simulation results was evaluated by absolute average fold error, average fold error, and R2 value across all PK outputs (eg, AUC, Cmax, and Vd) for a test set consisting of 21 chemicals. Their study showed that the performance of the predicted PK outputs with the ML/AI-predicted Kp (geometric mean fold-error = 1.53) was better than the results based on the Kp derived using the Rodgers et al. method, especially for the output Vd (eg, absolute average fold error: 1.71 vs. 2.63, R2: 0.99 vs. 0.77).
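The fold-error metrics used in this comparison can be computed as in the minimal sketch below. It uses commonly applied definitions, with the average fold error (AFE) as a measure of bias and the absolute average fold error (AAFE) as a measure of precision. The cited studies may use slightly different variants, and the geometric mean fold error is often defined in the same way as AAFE. The function name and the example values are hypothetical.

```python
import math


def fold_error_metrics(predicted, observed):
    """Return (AFE, AAFE) for paired predicted and observed PK values (all > 0)."""
    log_ratios = [math.log10(p / o) for p, o in zip(predicted, observed)]
    n = len(log_ratios)
    afe = 10 ** (sum(log_ratios) / n)                   # bias: 1 means no systematic over- or underprediction
    aafe = 10 ** (sum(abs(x) for x in log_ratios) / n)  # precision: 1 means perfect agreement
    return afe, aafe


# Hypothetical example: one twofold overprediction and one twofold underprediction
afe, aafe = fold_error_metrics(predicted=[2.0, 0.5], observed=[1.0, 1.0])
print(f"AFE = {afe:.2f}, AAFE = {aafe:.2f}")
```

In this example the two errors cancel in the AFE, which equals 1.0, while the AAFE of 2.0 shows that predictions are off by twofold on average. This is why the two metrics are usually reported together.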
Prediction of metabolism parameters
Understanding a chemical’s metabolic processes is an important step in the development of a PBPK model because metabolism is essential in the overall elimination of most chemicals, and it is necessary to predict the extent of formation of metabolite(s) and whether they are biologically toxic moieties. However, chemical metabolism is very complicated because a variety of enzymes, such as cytochromes P450 (CYPs) during Phase I metabolism and glutathione S-transferase during Phase II metabolism, are involved in the metabolic processes. Since metabolic reactions are, in general, mediated by enzymes, several research groups have employed ML methodologies to determine whether a compound can be metabolized by a specific metabolic pathway. Athersuch et al. (2013) developed a quantitative structure-metabolism relationship (QSMR) model based on several methods, such as principal component analysis, soft independent modeling of class analogy, and partial least squares, to classify whether tested compounds can be metabolized by N-acetylation and subsequent N-oxanilic acid formation. Baranwal et al. (2020) proposed a hybrid ML framework that integrates random forest and graph convolutional neural networks to predict the classes of metabolic pathways to which a tested compound belongs. The graph convolutional network was used to extract molecular shape features as input to the random forest model (ie, a hybrid model), and the authors compared the performance of their hybrid model with that of a random forest model that used traditional molecular descriptors (eg, molecular weight, octanol-water partition coefficient) as input. A total of 4545 compounds were split randomly into training (3635 compounds) and testing (910 compounds) data sets. Their hybrid model predicted the metabolic pathway class of the 910 tested compounds more accurately (prediction accuracy Q = 98.99%) than random forest models with traditional molecular features (Q = 57.76%). 
However, Baranwal’s model can only identify metabolic pathway types of compounds rather than specific metabolic pathways. Therefore, Jia et al. (2020) developed a similarity-based classification model with random forest algorithms to pair the compounds with specific metabolic pathways. Their classification model was able to output “YES” or “NO” for each pair and identify whether the compound can be metabolized by a specific metabolic pathway.
The Michaelis-Menten constant (Km) and the maximum metabolic rate (Vmax) are 2 critical kinetic parameters in PBPK models; they describe the affinity of the enzyme for the substrate and the maximum velocity of a metabolic reaction, respectively. Determining Km and Vmax values is often difficult and time-consuming. QSAR models have been used to estimate Km/Vmax values to support PBPK model development (Pirovano et al., 2015; Price and Krishnan, 2011; Sarigiannis et al., 2017; Sweeney and Sterner, 2022). A few studies have also attempted to predict the kinetic parameters with ML/AI-based models. For example, Zhang et al. (2008) developed artificial neural network models to predict the Km and Vmax values for 5 important human CYP450 enzymes: 1A2, 2C9, 2C19, 2D6, and 3A4. Their models performed well in the prediction of Km (R2: 0.6–0.9; root mean square error [RMSE]: 0.3–0.5) and Vmax (R2: 0.6–0.7; RMSE: 0.4–0.7) for the different CYP450 enzymes. Sarigiannis et al. (2017) applied similar methods to establish QSAR models based on artificial neural network and nonlinear regression models for the prediction of Km/Vmax values of 54 volatile organic chemicals. The artificial neural network models were trained with training data (70% of the total data) and validation data (15% of the total data) and were then evaluated with testing data (15% of the total data). When evaluated with the testing data set, the artificial neural network models (Km: R2 = 0.82, Vmax: R2 = 0.99) outperformed the nonlinear regression model (Km: R2 = 0.31, Vmax: R2 = −0.42) in predicting Km/Vmax values.
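To make the roles of Km and Vmax concrete, the following is a minimal sketch of the standard Michaelis-Menten rate equation, v = Vmax × C / (Km + C), as it is commonly used for a saturable metabolic term in PBPK models. The function names and the numerical values in the usage note are hypothetical and are not taken from any of the cited studies.

```python
def michaelis_menten_rate(conc, vmax, km):
    """Saturable metabolic rate v = Vmax * C / (Km + C).

    conc : substrate concentration at the site of metabolism
    vmax : maximum metabolic rate (same time basis as the returned rate)
    km   : Michaelis-Menten constant (same units as conc)
    """
    if km <= 0 or vmax < 0 or conc < 0:
        raise ValueError("km must be > 0; vmax and conc must be >= 0")
    return vmax * conc / (km + conc)


def intrinsic_clearance(vmax, km):
    """Low-concentration limit of the rate per unit concentration (Vmax / Km)."""
    if km <= 0:
        raise ValueError("km must be > 0")
    return vmax / km
```

With illustrative values Vmax = 10 and Km = 5, the rate at C = Km is half of Vmax, and at concentrations far below Km the rate is approximately (Vmax/Km) × C, which is why Vmax/Km is often used as an intrinsic clearance term.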
Prediction of excretion parameters
Clearance (Cl), by definition, is the proportionality factor that relates the rate of elimination of a compound from the body to its concentration in plasma. Elimination of a chemical from the body may involve processes occurring in the kidney, liver, lung, or other organs. Therefore, total body clearance (Cltotal) of a chemical can be calculated as the sum of the respective clearances for each of these organs. Several ML-based models for the prediction of Cl have already been published based on data sets derived from either in vivo studies or in vitro assays. Based on human renal clearance data, Paine et al. (2010) trained ML models, including partial least squares and random forest algorithms, on a database of 349 compounds to predict human renal clearance. In their study, the random forest model showed superior performance compared with the other models across all statistical indicators (eg, Training data: R2 = 0.93, RMSE = 0.32; Test data: R2 = 0.63, RMSE = 0.63). Hsiao et al. (2013) applied several QSAR models based on random forest and other ML methods, such as orthogonal partial least squares and multiple linear regression, to predict the intrinsic metabolic clearance (Clint) in humans. Their data set consisted of 244 drugs derived from an extensive human PK data set (Varma et al., 2010). The random forest model showed better performance (R2 = 0.87) than the other 2 models (R2 = 0.59 for orthogonal partial least squares and R2 = 0.48 for multiple linear regression). Wang et al. (2019) developed different ML models, including support vector machines, random forest, gradient-boosting machine, and extreme gradient boosting, using a large human intravenous PK database consisting of 1268 drugs to predict human Cl. Among these models, the random forest model showed the best performance, with an R2 of 0.875 and an RMSE of 0.103 in the test data set.
High-throughput in vitro metabolic assays have recently been considered one of the most promising tools to determine the metabolic clearance of new drugs. Based on such data, some computational models have been developed to convert the predicted in vitro Cl to in vivo Cl by using physiologically based scaling factors for the early selection process of new drugs (Iwata et al., 2021; Kosugi and Hosea, 2020). On the basis of experimentally determined in vitro metabolic clearance for 1114 compounds, Kosugi and Hosea (2020) compared the total plasma clearance (Cltotal) predicted from ML-based QSAR models and from a conventional IVIVE approach. The ML-based QSAR models were generated with the commercial software StarDrop (StarDrop v6.5.0, Optibrium Ltd., Cambridge, UK) using several ML techniques, such as partial least squares, radial basis function fitting, random forest regression, and Gaussian process models, to predict Cltotal with physicochemical descriptors and fingerprints calculated from chemical structure information as input features. Their results showed that the majority of Cltotal values predicted by the random forest and radial basis function fitting models were within a 2-fold difference of the observed values in the test data set (random forest: 67%; radial basis function: 72%) and that these models had better predictivity than the conventional IVIVE approach (R2 for radial basis function: 0.55 vs. R2 for IVIVE: 0.297). To address the black box problem of ML-based QSAR models, which cannot sufficiently interpret the contribution of the chemical structure to the predictive target, Iwata et al. (2021) developed a new Cltotal prediction method by using “Deep Tensor”. The deep tensor model used in their study is a deep learning model that can process the graph data representing the connections between atoms in a chemical structure. 
Their study showed that, after model training, the deep tensor model achieved a geometric mean fold error of 2.68 and 48.5% of predictions with errors of 2-fold or less (% of 2-fold error) in the test data set, compared with the support vector regression model (geometric mean fold error: 2.88; % of 2-fold error: 43.9%) and the conventional animal scale-up method (geometric mean fold error: 2.65; % of 2-fold error: 47.2%). Overall, the deep tensor model can not only convert graphical data (eg, chemical structures) into interpretable descriptors but also extract the important features automatically to improve predictive performance. These characteristics are not available in the conventional animal scale-up method or in machine learning approaches such as support vector machines (Fuji et al., 2019; Iwata et al., 2021). In addition, the deep tensor model is capable of showing the reasons behind the model-generated results and making them explainable by using an inference method that can learn from several interpretable models (eg, linear regression model) and knowledge-based graphs (Fuji et al., 2019; Iwata et al., 2021).
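The fold-error statistics quoted above can be computed as in the following minimal sketch. It assumes one commonly used definition of the geometric mean fold error, 10 raised to the mean of |log10(predicted/observed)|; the cited studies may define the metric slightly differently, and the function names are hypothetical.

```python
import math


def fold_errors(predicted, observed):
    """Per-compound fold error: max(pred/obs, obs/pred), always >= 1."""
    return [max(p / o, o / p) for p, o in zip(predicted, observed)]


def geometric_mean_fold_error(predicted, observed):
    """GMFE = 10 ** mean(|log10(pred / obs)|) (assumed definition)."""
    logs = [abs(math.log10(p / o)) for p, o in zip(predicted, observed)]
    return 10 ** (sum(logs) / len(logs))


def percent_within_fold(predicted, observed, fold=2.0):
    """Percentage of predictions whose fold error is <= `fold`."""
    errors = fold_errors(predicted, observed)
    return 100.0 * sum(1 for e in errors if e <= fold) / len(errors)
```

Both functions expect strictly positive clearance values, because the fold error is undefined for zero or negative inputs.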
Pharmacokinetic parameter predictions for nonpharmaceutical chemicals
Several in silico models have been developed with broad data sets that include nonpharmaceutical chemicals, such as pesticides and environmental and industrial chemicals, to predict TK parameters that can serve as input for high-throughput toxicokinetic (HTTK) and PBPK models (Dawson et al., 2021; Ingle et al., 2016; Papa et al., 2018; Watanabe et al., 2018; Yun et al., 2021). A summary of previous efforts is listed in Table 3. In one of the early studies, Wambaugh et al. (2015) used a random forest algorithm to develop a QSAR model based on data from the literature (Wetmore, 2015; Wetmore et al., 2014) to predict the transporter affinity for 271 environmental chemicals. Ingle et al. (2016) constructed QSAR models based on multiple ML algorithms, including k nearest neighbors, support vector machines, and random forest, from a large training set of 1045 pharmaceuticals for the prediction of the fraction of chemical unbound by human plasma proteins. After model training, the models were evaluated with independent test sets of 200 pharmaceuticals and 406 environmental chemicals from the ToxCast library. The ToxCast project, implemented by the US Environmental Protection Agency (EPA), includes data generated by in vitro assays for nonpharmaceutical compounds such as pesticides, food additives, consumer products, and industrial products (Dix et al., 2007). When evaluated against the test data sets, these models produced adequate predictability for both pharmaceuticals (mean absolute errors: 0.15–1.18; RMSE: 0.23–0.24) and environmental chemicals (mean absolute errors: 0.10–1.6; RMSE: 0.18–0.23). Based on a training set of over 1000 pharmaceuticals, Papa et al. (2018) developed several QSAR models with multiple linear regression equations and then selected the best model with a genetic algorithm to identify the persistent, bio-accumulative, and toxic properties of over 1300 compounds, including both pharmaceuticals and nonpharmaceuticals. 
In the external validation sets (ie, test set), the RMSE and concordance correlation coefficient of the best-performing model were 0.66 and 0.87, respectively.
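Because the studies summarized in this section are compared mainly through RMSE, mean absolute error, and R2, the following minimal sketch shows how these statistics are typically computed. It assumes that R2 is the coefficient of determination (1 − SSres/SStot), which can be negative when a model predicts worse than the mean of the observations; some studies instead report the squared Pearson correlation coefficient, which cannot be negative. The function names are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mean_absolute_error(observed, predicted):
    """Mean of the absolute prediction errors."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

Under this definition, a perfect model gives R2 = 1, a model that always predicts the observed mean gives R2 = 0, and a model that is worse than the mean gives a negative value.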
Table 3.
A list of representative studies that used machine learning and artificial intelligence approaches in the predictions of toxicokinetic parameters for nonpharmaceutical compounds
| References | N | Predicted target | Descriptor types | Modeling method | Performance^a |
|---|---|---|---|---|---|
| Wambaugh et al. (2015) | 271 | Transporter affinity | NA | RF | NA |
| Ingle et al. (2016) | 1651 | Fub | 2D molecular descriptors | kNN, SVM, RF | Training set: R2 = 0.82; RMSE = 0.59. Test set: R2 = 0.51; RMSE = 0.218 |
| Watanabe et al. (2018) | 2738 | Fub | 2D molecular descriptors | kNN, SVM, RF, PLS | Test set: R2 = 0.728; RMSE = 0.145 |
| Papa et al. (2018) | 1000 | Clint | 2–3D molecular descriptors | PLS | Whole data set: R2 = 0.80; RMSE = 0.62 |
| Pradeep et al. (2020) | 1487 | Fub, Clint | 0–3D molecular descriptors | SVM, RF, ANN | |
| Dawson et al. (2021) | 6484 | Fub, Clint | 1–3D molecular descriptors | RF | |
| Yun et al. (2021) | 818 | Fub | 2D molecular descriptors | kNN, SVM, RF, PLS | Test set: R2 = 0.52; Mean absolute error = 12.6 |
Abbreviations: ANN, artificial neural networks; Clint, intrinsic metabolic clearance; Fub, fraction unbound in plasma; kNN, k nearest neighbors; PLS, partial least squares; PNN, probabilistic neural network; Q, prediction accuracy; R2, squared Pearson’s correlation coefficient; RF, random forest; RMSE, root mean square error; SVM, support vector machine.
^a The performance from the best model.
Several studies have reported that the differences in chemical properties between pharmaceuticals and environmental or other nonpharmaceutical chemicals might influence the ADME properties of chemicals (Sipes et al., 2017; Wambaugh et al., 2018; Wambaugh et al., 2015). Therefore, Dawson et al. (2021) expanded the training set from the Ingle et al. (2016) database by including more nonpharmaceutical chemicals (eg, pesticides and industrial chemicals) to train random forest models for the prediction of Clint and fub, and used a similar test set of ToxCast chemicals for model evaluation. The Dawson et al. model (R2 = 0.56) outperformed the model by Ingle et al. (2016) (R2 = 0.39) for the prediction of fub in the test set of environmental chemicals, but the 2 models had similar performance in the test set of pharmaceuticals (R2 = 0.56 for the Dawson et al. model and R2 = 0.62 for the Ingle et al. model). For the prediction of Clint, the performance of the Dawson et al. model (R2 = 0.52) was better than that of a previous model developed by Sipes et al. (2017) (R2 = 0.20), which was trained on a data set of pharmaceuticals. Pradeep et al. (2020) performed a similar study using a training data set of 1487 environmental chemicals extracted from the published literature (Rotroff et al., 2010; Wetmore et al., 2013). Multiple ML algorithms, such as lasso regression, support vector machine, random forest, and neural network models, were used to build QSAR models. Their results showed that, for the external test set, the best model for fub had RMSE = 0.80 and R2 = 0.57, and the best model for Clint had RMSE = 0.40 and R2 = 0.16. The relatively low performance in the prediction of Clint might be due in part to the large uncertainty in the Clint test data.
A more recent study (Yun et al., 2021) compared the prediction accuracy of fub for human plasma proteins using 3 available QSAR models: those from Ingle et al. (2016) and Watanabe et al. (2018), and the ADMET Predictor software program (Xiong et al., 2021). The predictive performance was evaluated across 3 data categories: (1) highly binding or low-to-moderately binding compounds, (2) environmentally relevant and pharmaceutical compounds, and (3) acid-base properties (acid, base, neutral, and zwitterion). Their results showed that the prediction accuracy of all evaluated QSAR models was lower for environmentally relevant compounds than for pharmaceuticals. In addition, the prediction of fub from these models was more uncertain for highly binding and acidic compounds than for other types of chemicals. The study demonstrated that the structural differences between different types of chemicals, such as pharmaceuticals and nonpharmaceutical compounds, might contribute to the discrepancy and uncertainty in the accuracy of fub predicted by QSAR models (Yun et al., 2021).
Machine learning and artificial intelligence-based methods in PK/PBPK modeling
Machine learning-based methods in PBPK modeling
Kamiya et al. reported a series of studies (Kamiya et al., 2019, 2020, 2021a, 2022) that integrated a simplified PBPK model with an ML-based QSAR model to estimate plasma and tissue concentrations as well as several PK parameters (eg, AUC, Cmax, and t1/2) in animals and humans. This simplified PBPK model developed by Kamiya et al. (2019) needs only 3 input parameters: the absorption rate constant (ka), the volume of the systemic circulation (V1), and the hepatic intrinsic clearance (Clh, int), which can be estimated and optimized by an ensemble learning method, light gradient boosting machine (LightGBM) (Zhang et al., 2019). The LightGBM model was trained with 1718 molecular descriptors based on a data set of 246 disparate chemicals, and the predicted PK parameter values (ka, V1, and Cl) and some PK metrics (Cmax and AUC) in rats were in good agreement with observed values, with correlation coefficients of 0.6–0.8 (Kamiya et al., 2020, 2021a). More recently, Kamiya et al. (2022) collected time-dependent plasma concentration data in humans after oral dosing for 212 chemicals from the literature and used the same methodology to predict the plasma concentrations of these chemicals in humans. Their model exhibited good correlation between predicted and measured values for Cmax (r = 0.85) and AUC (r = 0.80) (Kamiya et al., 2022). Similarly, Antontsev et al. (2021) applied an ML/AI-based PK/PD modeling platform, BIOiSIM (Maharao et al., 2020), to calculate PK outputs such as AUC, Cmax, Vd, and mean residence time. BIOiSIM is an AI biosimulation software that includes a PBPK model with 14 individual compartments representing major organs. This model was integrated with existing in vivo data sets to estimate unknown PK parameters (Kp, blood-to-plasma ratio, and first-order absorption rate constant [h−1]), which were subsequently used in the PBPK model simulations. 
Their results showed R2 values of 0.6–0.99 for PK outputs (eg, Cmax and AUC) across all the compounds.
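To illustrate how a small set of predicted parameters can be turned into PK summary statistics, the following minimal sketch uses a one-compartment model with first-order oral absorption. This is a deliberately simpler structure than the simplified PBPK model of Kamiya et al. and is not their implementation: a single systemic clearance `cl` stands in for the hepatic intrinsic clearance, the bioavailability `f` is an added assumption, and all function names and numerical values are hypothetical.

```python
import math


def concentration(t, dose, ka, v1, cl, f=1.0):
    """Plasma concentration at time t after a single oral dose.

    One-compartment model with first-order absorption (ka) and
    first-order elimination (ke = cl / v1).
    """
    ke = cl / v1
    if math.isclose(ka, ke):
        raise ValueError("closed-form solution requires ka != ke")
    scale = f * dose * ka / (v1 * (ka - ke))
    return scale * (math.exp(-ke * t) - math.exp(-ka * t))


def pk_summary(dose, ka, v1, cl, f=1.0):
    """Analytical Tmax, Cmax, and AUC(0-infinity) for the model above."""
    ke = cl / v1
    tmax = math.log(ka / ke) / (ka - ke)
    return {
        "tmax": tmax,
        "cmax": concentration(tmax, dose, ka, v1, cl, f),
        "auc_inf": f * dose / cl,
    }


def auc_trapezoid(times, concs):
    """Numerical AUC by the linear trapezoidal rule."""
    return sum(
        (t1 - t0) * (c0 + c1) / 2.0
        for t0, t1, c0, c1 in zip(times, times[1:], concs, concs[1:])
    )
```

In an ML-PBPK workflow of the kind described above, the values passed as `ka`, `v1`, and `cl` would be the outputs of the trained QSAR model, and the resulting Cmax and AUC would be compared with observed values.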
Deep learning for pharmacokinetic modeling
Recent advances in deep learning algorithms have generated substantial interest in applying deep learning-based approaches to drug development and regulation (Liu et al., 2020). For example, scientists at the US FDA have recently developed a PK/PD model based on a long short-term memory network, a type of recurrent neural network that is capable of learning order or time dependence, to simulate the PK/PD profiles of a hypothetical drug (Liu et al., 2021). Specifically, they developed several long short-term memory network models with different numbers of hidden layers and perceptrons, training the models with time sequences of plasma concentrations to simulate the PD response under a specific dosing schedule. Their results showed that the model was able to predict the PD responses of individual patients in the training data but could not do so accurately for patients under different dosing regimens, such as twice- or thrice-daily dosing scenarios. This suggests that there are still difficulties and challenges in applying recurrent neural network-based PK/PD models (Liu et al., 2020, 2021). Recurrent neural network models used for time-series data, such as data from PK studies, have disadvantages in model architecture and capabilities that lead to less efficient and less accurate predictions (Neil, 2016). Because of constraints on sampling rate and frequency, recurrent neural network models process a continuous time series as a discrete-time sequence, which comes with high computational load and memory usage. In addition, if the time gap between 2 observations is too large (ie, irregularly sampled time-series data), model predictions can be negatively affected because each step of a recurrent neural network depends on the computation of previous steps. 
These weaknesses might negatively affect the efficiency of recurrent neural network-based PK models because both dosing and measurement times can be irregular; such weaknesses are not seen in the recently developed Neural-ODE model (Lu et al., 2021a).
Since the concept of the Neural-ODE model was published in 2018 (Chen et al., 2018b), several studies have demonstrated the strength and capability of Neural-ODE models for time-series analysis using deep-learning approaches (Bonnaffe et al., 2021; Chen et al., 2022b). The Neural-ODE model improves predictive capability because it can process irregular time-series data and generates the input-output mapping by numerically integrating an ODE system described by a neural network. The Neural-ODE algorithm is therefore well suited to PK modeling. Lu et al. (2021a) developed the first Neural-ODE PK model with clinical PK data for trastuzumab emtansine (Boyraz et al., 2013), a conjugated monoclonal antibody drug that has been approved for the treatment of breast cancers. Across 2 different dosing regimens, their Neural-ODE PK model showed substantially better performance than several commonly used ML-based methods (LightGBM and long short-term memory network) and a traditional population PK model with a nonlinear mixed effects approach (Lu et al., 2021a). In addition, Lu et al. (2021b) trained a Neural-ODE PK/PD model and applied it to analyze the relationship between drug concentration and platelet response based on a clinical data set. The performance of the novel Neural-ODE PK/PD model indicates advantages over traditional methods and thus warrants further development. Although the Neural-ODE models used in the Lu et al. studies (Lu et al., 2021a,b) only made PK predictions, they introduce a novel paradigm that could enable the integration of a PBPK model with a deep neural network approach (ie, Neural-ODE algorithms). 
Based on the general structure of the Neural-ODE (Figure 2), the Lu et al. studies used 5 input features: the time interval between doses, the time since the start of treatment, the dosing cycle number, the dosing amount, and the PK observations from cycle 1. The ODE solver in the Neural-ODE model then integrated the learned system of governing ODEs, and a decoder output the time-series PK predictions (Figure 2). In the future, this Neural-ODE model could integrate different features, which would enhance its ability to extrapolate to generic PBPK simulations with varying drug properties and experimental settings.
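The core idea of a Neural-ODE, a neural network that defines the derivative of a hidden state and is integrated numerically to arbitrary time points, can be sketched as follows. This is an untrained toy illustration and not the model of Lu et al.: it omits the encoder, the decoder, the dosing inputs, and the training procedure, and it uses a fixed-step fourth-order Runge-Kutta integrator as an assumed stand-in for the ODE solver. All names, layer sizes, and time points are hypothetical.

```python
import math
import random


def make_mlp(n_state, n_hidden, seed=0):
    """Random (untrained) weights for a one-hidden-layer network f_theta(h, t)."""
    rng = random.Random(seed)
    return {
        # Each hidden unit sees the state vector plus the current time.
        "w1": [[rng.gauss(0.0, 0.5) for _ in range(n_state + 1)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [[rng.gauss(0.0, 0.5) for _ in range(n_hidden)] for _ in range(n_state)],
        "b2": [0.0] * n_state,
    }


def dynamics(params, t, h):
    """dh/dt = f_theta(h, t): the network plays the role of the ODE right-hand side."""
    x = list(h) + [t]
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(params["w1"], params["b1"])
    ]
    return [
        sum(w * z for w, z in zip(row, hidden)) + b
        for row, b in zip(params["w2"], params["b2"])
    ]


def rk4_step(f, t, h, dt):
    """One classical fourth-order Runge-Kutta step for dh/dt = f(t, h)."""
    k1 = f(t, h)
    k2 = f(t + dt / 2, [hi + dt / 2 * k for hi, k in zip(h, k1)])
    k3 = f(t + dt / 2, [hi + dt / 2 * k for hi, k in zip(h, k2)])
    k4 = f(t + dt, [hi + dt * k for hi, k in zip(h, k3)])
    return [
        hi + dt / 6 * (a + 2 * b + 2 * c + d)
        for hi, a, b, c, d in zip(h, k1, k2, k3, k4)
    ]


def integrate(params, h0, times, max_dt=0.1):
    """Return the hidden state at each (possibly irregular) time in `times`."""
    f = lambda t, h: dynamics(params, t, h)
    t, h = times[0], list(h0)
    trajectory = [list(h)]
    for t_next in times[1:]:
        # Split each observation gap into equal substeps no longer than max_dt.
        n_sub = max(1, math.ceil((t_next - t) / max_dt))
        dt = (t_next - t) / n_sub
        for i in range(n_sub):
            h = rk4_step(f, t + i * dt, h, dt)
        t = t_next
        trajectory.append(list(h))
    return trajectory
```

Because the state is obtained by integration rather than by stepping through a fixed-length sequence, the observation times passed to `integrate` (eg, 0, 0.5, 0.7, 3, and 24 h) do not need to be evenly spaced, which is the property that makes this class of models attractive for PK data.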
Figure 2.
Schematic of the neural ordinary differential equation (Neural-ODE) model. The Neural-ODE model consists of encoder, ODE solver, and decoder parts to predict the time-series PK profiles. The TFDS, TIME, CYCL, AMT, and PK cycle 1 observation are used as the input features in the RNN encoder part. Then ODE solvers are used to incorporate dosing information into the time sequence before the decoder generates the predictions. Abbreviations: AMT, the dosing amount in milligrams; CYCL, the current dosing cycle number; PK, pharmacokinetic; ODE, ordinary differential equation; RNN, recurrent neural network; TFDS, the time in hours between each dose; TIME, the time in hours since the start of the treatment. This figure was adapted based on Lu et al. (2021a) with permission from the publisher.
Conclusions and future perspectives
In order to support the development of PBPK models to effectively provide sufficient and accurate information during drug development and environmental chemical risk assessment processes with minimal in vivo or in vitro testing, an accurate and interpretable computational predictive model is needed. Approaches on how to integrate ML/AI algorithms with PBPK modeling have been reported in several recent studies (Gao et al., 2021; Kamiya et al., 2019, 2020, 2021a, 2022; Schneckener et al., 2019) and also summarized in the present review. However, there are still some challenges and practical constraints in this area.
Firstly, existing models lack sufficient interpretability. Because of the black-box nature of ML algorithms, an ML-based model neither allows full interpretation of the contributions of chemical structural properties nor identifies the key features behind the prediction of the target. To address this limitation, researchers need to design inherently interpretable models, rather than trying to explain black box models (Rudin, 2019). Recently, Ciallella et al. (2021) developed a knowledge-based deep neural network to identify estrogen mimetics. Because the model architecture was built on the explainable adverse outcome pathway representing the signaling pathway initiated by estrogen receptor alpha (ERα), their model can make interpretable end-to-end predictions. Such a knowledge- and mechanism-based approach is a potential universal strategy for developing interpretable ML models.
Secondly, many existing ML-based QSAR models rely on a vast number of chemical structural properties, including “fingerprint” descriptors, to predict PK parameters. During the training process, ML models are very likely to suffer from overfitting because of too many input features and small training data sets. Overfitting causes the model to predict data outside of the training set inaccurately and may misrepresent potential hazard when the misleading information is applied to the prediction of toxicity endpoints (Luechtefeld et al., 2018). In addition, during model development, it is important to apply the principle of parsimony, choosing a small number of necessary parameters to obtain the best fit while avoiding over-parameterization (Basak and Vracko, 2020; Chiu et al., 2007; Clewell and Clewell, 2008). Thus, to achieve high performance and prevent overfitting, researchers need to substantially reduce the number of chemical descriptors. However, this risks excluding crucial descriptors, which in turn results in a potential loss of predictive information. Advances in deep learning approaches might mitigate this challenge and facilitate the development of better predictive models. With the advantages of automatic feature extraction and techniques such as L1 or L2 regularization, dropout, and early stopping to prevent overfitting (Ying, 2019), deep learning models have become a more viable method for QSAR model predictions (Goh et al., 2017). Our recent study to predict the delivery efficiency of nanomedicines based on several ML and deep-learning models showed that the random forest model had better predictive performance than the other ML models used in our study, but the deep neural network model had the best predictive performance among all selected methods (Lin et al., 2022). In addition, Iwata et al. (2021) used the graph convolutional network model, which provided the added benefit of converting the chemical structure to graph inputs and gained more accuracy than other models. Deep learning-based models used in the prediction of PK parameters have been demonstrated to have higher predictability in multiple studies (Chen et al., 2018a; Iwata et al., 2021).
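The effect of L2 regularization in a setting with more descriptors than training compounds can be sketched with a toy linear model fitted by gradient descent. The data below are synthetic and the setup is an illustrative assumption, not a reproduction of any cited QSAR model; the names `make_toy_data`, `fit_linear`, and `l2_norm` are hypothetical.

```python
import random


def make_toy_data(n_samples=10, n_features=30, seed=1):
    """Synthetic 'descriptors' X and response y; only feature 0 is informative."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [2.0 * row[0] + rng.gauss(0.0, 0.1) for row in X]
    return X, y


def fit_linear(X, y, l2=0.0, lr=0.01, n_iter=500):
    """Gradient descent on mean squared error / 2 plus (l2 / 2) * ||w||^2."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        residuals = [sum(wj * xj for wj, xj in zip(w, row)) - yi for row, yi in zip(X, y)]
        grad = [
            sum(r * row[j] for r, row in zip(residuals, X)) / n + l2 * w[j]
            for j in range(p)
        ]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def l2_norm(w):
    """Euclidean length of the coefficient vector."""
    return sum(v * v for v in w) ** 0.5


X, y = make_toy_data()
w_plain = fit_linear(X, y, l2=0.0)   # no penalty: free to fit noise
w_ridge = fit_linear(X, y, l2=1.0)   # penalty shrinks the coefficients
```

With 30 descriptors and only 10 compounds, the unpenalized fit can reproduce the training data using many uninformative descriptors, whereas the L2 penalty shrinks the coefficient vector; whether this improves external predictions must still be checked on held-out data.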
Another challenge of using QSAR models to predict PK parameters is the lack of quality experimental data and of structural diversity among compounds. Most existing QSAR models are built only on training data consisting of pharmaceutical compounds, which do not represent a diversity of compounds, in part because of the lack of experimental data for nonpharmaceutical compounds. However, the chemical properties of nonpharmaceutical compounds, such as environmental and industrial chemicals, differ from those of pharmaceuticals, which can contribute to discrepancies in the prediction accuracy of QSAR models. For example, pharmaceuticals are designed to be absorbed following oral administration, which is not the case for nonpharmaceutical compounds. In addition, environmental chemicals are, in many cases, more lipophilic and have fewer functional groups than pharmaceuticals. A previous study has indicated that models developed to predict absorption for pharmaceuticals cannot predict well for environmental chemicals (Wambaugh et al., 2018). Recent studies also demonstrated that the prediction accuracy of QSAR models built with inadequate data sets (ie, imbalanced data sets between different types of compounds) is uncertain for nonpharmaceutical compounds (Dawson et al., 2021; Yun et al., 2021). It is necessary, therefore, to expand the training set by including compounds of greater structural diversity when building ML-based QSAR models.
Finally, the development of mechanistically credible PBPK models for new compounds with insufficient prior knowledge is challenging because the chemical ADME pathways are neither well characterized nor mathematically formulated. With the recent development of Neural-ODE algorithms, which can learn the governing ODEs algorithmically and directly from PK data without well-characterized prior information (Chen et al., 2018b), it may become possible to generate PBPK simulations for a new drug based on its features. Overall, advances in ML/AI approaches, particularly deep neural network models, could solve some of the current challenges, thereby helping to improve the performance of PK and PBPK modeling and simulations to support drug discovery and development, as well as human health risk assessment of environmental chemicals.
Funding
The United States Department of Agriculture (USDA) National Institute of Food and Agriculture (NIFA) for the Food Animal Residue Avoidance Databank (FARAD) Program (2021-41480-35271); the United States National Institutes of Health (NIH) National Institute of Biomedical Imaging and Bioengineering (NIBIB) Research Grant Program (R01EB031022); and the New Faculty Start-up Funds from the University of Florida.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Contributor Information
Wei-Chun Chou, Department of Environmental and Global Health, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, USA; Center for Environmental and Human Toxicology, University of Florida, Gainesville, FL 32608, USA.
Zhoumeng Lin, Department of Environmental and Global Health, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, USA; Center for Environmental and Human Toxicology, University of Florida, Gainesville, FL 32608, USA.
References
- Abouir K., Samer C. F., Gloor Y., Desmeules J. A., Daali Y. (2021). Reviewing data integrated for PBPK model development to predict metabolic drug-drug interactions: Shifting perspectives and emerging trends. Front. Pharmacol. 12, 708299.
- Agatonovic-Kustrin S., Beresford R., Yusof A. P. M. (2001). Theoretically-derived molecular descriptors important in human intestinal absorption. J. Pharm. Biomed. Anal. 25, 227–237.
- Andersen M. E., Mallick P., Clewell H. J. 3rd, Yoon M., Olsen G. W., Longnecker M. P. (2021). Using quantitative modeling tools to assess pharmacokinetic bias in epidemiological studies showing associations between biomarkers and health outcomes at low exposures. Environ. Res. 197, 111183.
- Antontsev V., Jagarapu A., Bundey Y., Hou H., Khotimchenko M., Walsh J., Varshney J. (2021). A hybrid modeling approach for assessing mechanistic models of small molecule partitioning in vivo using a machine learning-integrated modeling platform. Sci. Rep. 11, 11143.
- Athersuch T. J., Wilson I. D., Keun H. C., Lindon J. C. (2013). Development of quantitative structure-metabolism (QSMR) relationships for substituted anilines based on computational chemistry. Xenobiotica 43, 792–802.
- Baranwal M., Magner A., Elvati P., Saldinger J., Violi A., Hero A. O. (2020). A deep learning architecture for metabolic pathway prediction. Bioinformatics 36, 2547–2553.
- Basak S. C., Vracko M. G. (2020). Parsimony principle and its proper use/application in computer-assisted drug design and QSAR. Curr. Comput. Aided Drug Des. 16, 1–5.
- Bonnaffe W., Sheldon B., Coulson T. (2021). Neural ordinary differential equations for ecological and evolutionary time-series analysis. Methods Ecol. Evol. 12, 1301–1315.
- Boyraz B., Sendur M. A. N., Aksoy S., Babacan T., Roach E. C., Kizilarslanoglu M. C., Petekkaya I., Altundag K. (2013). Trastuzumab emtansine (T-DM1) for HER2-positive breast cancer. Curr. Med. Res. Opin. 29, 405–414.
- Campbell J. L., Andersen M. E., Hinderliter P. M., Yi K. D., Pastoor T. P., Breckenridge C. B., Clewell H. J. (2016). PBPK model for atrazine and its chlorotriazine metabolites in rat and human. Toxicol. Sci. 150, 441–453.
- Cao D., Wang J., Zhou R., Li Y., Yu H., Hou T. (2012). ADMET evaluation in drug discovery. 11. Pharmacokinetics knowledge base (PKKB): A comprehensive database of pharmacokinetic and toxic properties for drugs. J. Chem. Inf. Model. 52, 1132–1137.
- Chen H. M., Engkvist O., Wang Y. H., Olivecrona M., Blaschke T. (2018a). The rise of deep learning in drug discovery. Drug Discov. Today 23, 1241–1250.
- Chen Q., Chou W. C., Lin Z. (2022a). Integration of toxicogenomics and physiologically based pharmacokinetic modeling in human health risk assessment of perfluorooctane sulfonate. Environ. Sci. Technol. 56, 3623–3633.
- Chen T. Q., Rubanova Y., Bettencourt J., Duvenaud D. K. (2018b). Neural ordinary differential equations. In Advances in neural information processing systems 31 (NeurIPS 2018), 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada, pp. 6571–6583.
- Chen X., Araujo F. A., Riou M., Torrejon J., Ravelosona D., Kang W., Zhao W., Grollier J., Querlioz D. (2022b). Forecasting the outcome of spintronic experiments with neural ordinary differential equations. Nat. Commun. 13, 1016.
- Chiu W. A., Barton H. A., DeWoskin R. S., Schlosser P., Thompson C. M., Sonawane B., Lipscomb J. C., Krishnan K. (2007). Evaluation of physiologically based pharmacokinetic models for use in risk assessment. J. Appl. Toxicol. 27, 218–237.
- Chou W. C., Lin Z. (2020). Probabilistic human health risk assessment of perfluorooctane sulfonate (PFOS) by integrating in vitro, in vivo toxicity, and human epidemiological studies using a Bayesian-based dose-response assessment coupled with physiologically based pharmacokinetic (PBPK) modeling approach. Environ. Int. 137, 105581.
- Chou W. C., Lin Z. (2019). Bayesian evaluation of a physiologically based pharmacokinetic (PBPK) model for perfluorooctane sulfonate (PFOS) to characterize the interspecies uncertainty between mice, rats, monkeys, and humans: Development and performance verification. Environ. Int. 129, 408–422.
- Chou W. C., Lin Z. (2021). Development of a gestational and lactational physiologically based pharmacokinetic (PBPK) model for perfluorooctane sulfonate (PFOS) in rats and humans and its implications in the derivation of health-based toxicity values. Environ. Health Perspect. 129, 37004.
- Chou W. C., Cheng Y. H., Riviere J. E., Monteiro-Riviere N. A., Kreyling W. G., Lin Z. (2022a). Development of a multi-route physiologically based pharmacokinetic (PBPK) model for nanomaterials: A comparison between a traditional versus a new route-specific approach using gold nanoparticles in rats. Part. Fibre Toxicol. 19, 47.
- Chou W. C., Tell L. A., Baynes R. E., Davis J. L., Maunsell F. P., Riviere J. E., Lin Z. (2022b). An interactive generic physiologically based pharmacokinetic (igPBPK) modeling platform to predict drug withdrawal intervals in cattle and swine: A case study on flunixin, florfenicol, and penicillin G. Toxicol. Sci. 188, 180–197.
- Ciallella H. L., Russo D. P., Aleksunes L. M., Grimm F. A., Zhu H. (2021). Revealing adverse outcome pathways from public high-throughput screening data to evaluate new toxicants by a knowledge-based deep neural network approach. Environ. Sci. Technol. 55, 10875–10887. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Clewell R. A., Clewell H. J. (2008). Development and specification of physiologically based pharmacokinetic models for use in risk assessment. Regul. Toxicol. Pharmacol. 50, 129–143. [DOI] [PubMed] [Google Scholar]
- Davidovic L. M., Laketic D., Cumic J., Jordanova E., Pantic I. (2021). Application of artificial intelligence for detection of chemico-biological interactions associated with oxidative stress and DNA damage. Chem. Biol. Interact. 345, 109533. [DOI] [PubMed] [Google Scholar]
- Dawson D. E., Ingle B. L., Phillips K. A., Nichols J. W., Wambaugh J. F., Tornero-Velez R. (2021). Designing QSARs for parameters of high-throughput toxicokinetic models using open-source descriptors. Environ. Sci. Technol. 55, 6505–6517. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Deconinck E., Ates H., Callebaut N., Van Gyseghem E., Vander Heyden Y. (2007). Evaluation of chromatographic descriptors for the prediction of gastro-intestinal absorption of drugs. J. Chromatogr. A 1138, 190–202. [DOI] [PubMed] [Google Scholar]
- Dix D. J., Houck K. A., Martin M. T., Richard A. M., Setzer R. W., Kavlock R. J. (2007). The toxcast program for prioritizing toxicity testing of environmental chemicals. Toxicol. Sci. 95, 5–12. [DOI] [PubMed] [Google Scholar]
- Fisher J. W., Gearhart J., Lin Z. (2020). Physiologically Based Pharmacokinetic (PBPK) Modeling - Methods and Applications in Toxicology and Risk Assessment, 1st ed, pp. 1-346. Elsevier, Amsterdam, Netherlands. [Google Scholar]
- Fuji M., Morita H., Goto K., Maruhashi K., Anai H., Igata N. (2019). Explainable AI through combination of deep tensor and knowledge graph. Fujitsu Sci. Tech. J. 55, 58–64. [Google Scholar]
- Gao H., Wang W., Dong J., Ye Z., Ouyang D. (2021). An integrated computational methodology with data-driven machine learning, molecular modeling and PBPK modeling to accelerate solid dispersion formulation design. Eur. J. Pharm. Biopharm. 158, 336–346. [DOI] [PubMed] [Google Scholar]
- Gaulton A., Bellis L. J., Bento A. P., Chambers J., Davies M., Hersey A., Light Y., McGlinchey S., Michalovich D., Al-Lazikani B., et al. (2012). ChEMBL: A large-scale bioactivity database for drug discovery. Nucleic Acids Res. 40, D1100–1107. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ghafourian T., Freitas A. A., Newby D. (2012). The impact of training set data distributions for modelling of passive intestinal absorption. Int. J. Pharm. 436, 711–720. [DOI] [PubMed] [Google Scholar]
- Ghahramani Z. (2015). Probabilistic machine learning and artificial intelligence. Nature 521, 452–459. [DOI] [PubMed] [Google Scholar]
- Gleeson M. P., Waters N. J., Paine S. W., Davis A. M. (2006). In silico human and rat vss quantitative structure-activity relationship models. J. Med. Chem. 49, 1953–1963. [DOI] [PubMed] [Google Scholar]
- Goh G. B., Hodas N. O., Vishnu A. (2017). Deep learning for computational chemistry. J. Comput. Chem. 38, 1291–1307. [DOI] [PubMed] [Google Scholar]
- Golmohammadi H., Dashtbozorgi Z., Acree W. E. (2012). Quantitative structure-activity relationship prediction of blood-to-brain partitioning behavior using support vector machine. Eur. J. Pharm. Sci. 47, 421–429. [DOI] [PubMed] [Google Scholar]
- Gombar V. K., Hall S. D. (2013). Quantitative structure-activity relationship models of clinical pharmacokinetics: Clearance and volume of distribution. J. Chem. Inf. Model. 53, 948–957. [DOI] [PubMed] [Google Scholar]
- Grzegorzewski J., Brandhorst J., Green K., Eleftheriadou D., Duport Y., Barthorscht F., Köller A., Ke D. Y. J., De Angelis S., König M. (2021). PK-DB: Pharmacokinetics database for individualized and stratified computational modeling. Nucleic Acids Res. 49, D1358–D1364. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hou T. J., Wang J. M., Zhang W., Xu X. J. (2007). ADME evaluation in drug discovery. 7. Prediction of oral absorption by correlation and classification. J. Chem. Inf. Model. 47, 208–218. [DOI] [PubMed] [Google Scholar]
- Hsiao Y. W., Fagerholm U., Norinder U. (2013). In silico categorization of in vivo intrinsic clearance using machine learning. Mol. Pharm. 10, 1318–1321. [DOI] [PubMed] [Google Scholar]
- Ingle B. L., Veber B. C., Nichols J. W., Tornero-Velez R. (2016). Informing the human plasma protein binding of environmental chemicals by machine learning in the pharmaceutical space: Applicability domain and limits of predictability. J. Chem. Inf. Model. 56, 2243–2252. [DOI] [PubMed] [Google Scholar]
- Iwata H., Matsuo T., Mamada H., Motomura T., Matsushita M., Fujiwara T., Kazuya M., Handa K. (2021). Prediction of total drug clearance in humans using animal data: Proposal of a multimodal learning method based on deep learning. J. Pharm. Sci. 110, 1834–1841. [DOI] [PubMed] [Google Scholar]
- Jia Y. J., Zhao R., Chen L. (2020). Similarity-based machine learning model for predicting the metabolic pathways of compounds. IEEE Access 8, 130687–130696. [Google Scholar]
- Kamiya Y., Handa K., Miura T., Ohori J., Kato A., Shimizu M., Kitajima M., Yamazaki H. (2022). Machine learning prediction of the three main input parameters of a simplified physiologically based pharmacokinetic model subsequently used to generate time-dependent plasma concentration data in humans after oral doses of 212 disparate chemicals. Biol. Pharm. Bull. 45, 124–128. [DOI] [PubMed] [Google Scholar]
- Kamiya Y., Handa K., Miura T., Yanagi M., Shigeta K., Hina S., Shimizu M., Kitajima M., Shono F., Funatsu K., et al. (2021a). In silico prediction of input parameters for simplified physiologically based pharmacokinetic models for estimating plasma, liver, and kidney exposures in rats after oral doses of 246 disparate chemicals. Chem. Res. Toxicol. 34, 507–513. [DOI] [PubMed] [Google Scholar]
- Kamiya Y., Omura A., Hayasaka R., Saito R., Sano I., Handa K., Ohori J., Kitajima M., Shono F., Funatsu K., et al. (2021b). Prediction of permeability across intestinal cell monolayers for 219 disparate chemicals using in vitro experimental coefficients in a pH gradient system and in silico analyses by trivariate linear regressions and machine learning. Biochem. Pharmacol. 192, 114749. [DOI] [PubMed] [Google Scholar]
- Kamiya Y., Otsuka S., Miura T., Takaku H., Yamada R., Nakazato M., Nakamura H., Mizuno S., Shono F., Funatsu K., et al. (2019). Plasma and hepatic concentrations of chemicals after virtual oral administrations extrapolated using rat plasma data and simple physiologically based pharmacokinetic models. Chem. Res. Toxicol. 32, 792–792. [DOI] [PubMed] [Google Scholar]
- Kamiya Y., Otsuka S., Miura T., Yoshizawa M., Nakano A., Iwasaki M., Kobayashi Y., Shimizu M., Kitajima M., Shono F., et al. (2020). Physiologically based pharmacokinetic models predicting renal and hepatic concentrations of industrial chemicals after virtual oral doses in rats. Chem. Res. Toxicol. 33, 1736–1751. [DOI] [PubMed] [Google Scholar]
- Kosugi Y., Hosea N. (2020). Direct comparison of total clearance prediction: Computational machine learning model versus bottom-up approach using in vitro assay. Mol. Pharm. 17, 2299–2309. [DOI] [PubMed] [Google Scholar]
- Kumar V., Faheem M., Woo Lee K. (2022). A decade of machine learning-based predictive models for human pharmacokinetics: Advances and challenges. Drug Discov. Today 27, 529–537. [DOI] [PubMed] [Google Scholar]
- Le Cun Y., Bengio Y., Hinton G. (2015). Deep learning. Nature 521, 436–444. [DOI] [PubMed] [Google Scholar]
- Lee J. B., Zhou S., Chiang M., Zang X., Kim T. H., Kagan L. (2020). Interspecies prediction of pharmacokinetics and tissue distribution of doxorubicin by physiologically-based pharmacokinetic modeling. Biopharm. Drug Dispos. 41, 192–205. [DOI] [PubMed] [Google Scholar]
- Lin J., Sahakian D. C., de Morais S. M. F., Xu J. H., Polzer R. J., Winter S. M. (2003). The role of absorption, distribution, metabolism, excretion and toxicity in drug discovery. Curr. Top. Med. Chem. 3, 1125–1154. [DOI] [PubMed] [Google Scholar]
- Lin Z., Chou W. C. (2022). Machine learning and artificial intelligence in toxicological sciences. Toxicol. Sci. 189, 7–19. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lin Z., Chou W. C., Cheng Y. H., He C., Monteiro-Riviere N. A., Riviere J. E. (2022). Predicting nanoparticle delivery to tumors using machine learning and artificial intelligence approaches. Int. J. Nanomed. 17, 1365–1379. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lin Z., Gehring R., Mochel J. P., Lave T., Riviere J. E. (2016). Mathematical modeling and simulation in animal health - Part II: Principles, methods, applications, and value of physiologically based pharmacokinetic modeling in veterinary medicine and food safety assessment. J. Vet. Pharmacol. Ther. 39, 421–438. [DOI] [PubMed] [Google Scholar]
- Liu H. X., Yao X. J., Zhang R. S., Liu M. C., Hu Z. D., Fan B. T. (2005). Prediction of the tissue/blood partition coefficients of organic compounds based on the molecular structure using least-squares support vector machines. J. Comput. Aided Mol. Des. 19, 499–508. [DOI] [PubMed] [Google Scholar]
- Liu Q., Zhu H., Liu C., Jean D., Huang S. M., ElZarrad M. K., Blumenthal G., Wang Y. N. (2020). Application of machine learning in drug development and regulation: Current status and future potential. Clin. Pharmacol. Ther. 107, 726–729. [DOI] [PubMed] [Google Scholar]
- Liu X. Y., Liu C., Huang R. H., Zhu H., Liu Q., Mitra S., Wang Y. N. (2021). Long short-term memory recurrent neural network for pharmacokinetic-pharmacodynamic modeling. Int. J. Clin. Pharmacol. Ther. 59, 138–146. [DOI] [PubMed] [Google Scholar]
- Lombardo F., Berellini G., Obach R. S. (2018). Trend analysis of a database of intravenous pharmacokinetic parameters in humans for 1352 drug compounds. Drug Metab. Dispos. 46, 1466–1477. [DOI] [PubMed] [Google Scholar]
- Lu J., Deng K. W., Zhang X. Y., Liu G. B., Guan Y. F. (2021a). Neural-ode for pharmacokinetics modeling and its advantage to alternative machine learning models in predicting new dosing regimens. Iscience 24, 102804. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lu J. M., Bender B., Jin J. Y., Guan Y. F. (2021b). Deep learning prediction of patient response time course from early data via neural-pharmacokinetic/pharmacodynamic modelling. Nat. Mach. Intell. 3, 696–704. [Google Scholar]
- Luechtefeld T., Rowlands C., Hartung T. (2018). Big-data and machine learning to revamp computational toxicology and its use in risk assessment. Toxicol. Res. (Camb.) 7, 732–744. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Maharao N., Antontsev V., Hou H., Walsh J., Varshney J. (2020). Scalable in silico simulation of transdermal drug permeability: Application of BIOiSIM platform. Drug Des. Devel. Ther. 14, 2307–2317. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mallick P., Moreau M., Song G., Efremenko A. Y., Pendse S. N., Creek M. R., Osimitz T. G., Hines R. N., Hinderliter P., Clewell H. J., et al. (2020). Development and application of a life-stage physiologically based pharmacokinetic (PBPK) model to the assessment of internal dose of pyrethroids in humans. Toxicol. Sci. 173, 86–99. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Maltarollo V. G., Gertrudes J. C., Oliveira P. R., Honorio K. M. (2015). Applying machine learning techniques for ADME-Tox prediction: A review. Expert Opin. Drug Metab. Toxicol. 11, 259–271. [DOI] [PubMed] [Google Scholar]
- Martin S. A., McLanahan E. D., Bushnell P. J., Hunter E. S., El-Masri H. (2015). Species extrapolation of life-stage physiologically-based pharmacokinetic (PBPK) models to investigate the developmental toxicology of ethanol using in vitro to in vivo (IVIVE) methods. Toxicol. Sci. 143, 512–535. [DOI] [PubMed] [Google Scholar]
- Matsuo Y., LeCun Y., Sahani M., Precup D., Silver D., Sugiyama M., Uchibe E., Morimoto J. (2022). Deep learning, reinforcement learning, and world models. Neural Netw. 152, 267–275. [DOI] [PubMed] [Google Scholar]
- Moda T. L., Torres L. G., Carrara A. E., Andricopulo A. D. (2008). PK/DB: Database for pharmacokinetic properties and predictive in silico ADME models. Bioinformatics 24, 2270–2271. [DOI] [PubMed] [Google Scholar]
- Murad N., Pasikanti K. K., Madej B. D., Minnich A., McComas J. M., Crouch S., Polli J. W., Weber A. D. (2021). Predicting volume of distribution in humans: Performance of in silico methods for a large set of structurally diverse clinical compounds. Drug Metab. Dispos. 49, 169–178. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Neil, D., , Pfeiffer M., Liu S.-C. (2016). Phased LSTM - Accelerating recurrent network training for long or event-based sequences. In Proceedings of the 30th international conference on neural information processing systems (NIPS 2016), Barcelona, Spain, pp. 3889–3897. [Google Scholar]
- Niwa T. (2003). Using general regression and probabilistic neural networks to predict human intestinal absorption with topological descriptors derived from two-dimensional chemical structures. J. Chem. Inf. Comput. Sci. 43, 113–119. [DOI] [PubMed] [Google Scholar]
- Paine S. W., Barton P., Bird J., Denton R., Menochet K., Smith A., Tomkinson N. P., Chohan K. K. (2010). A rapid computational filter for predicting the rate of human renal clearance. J. Mol. Graph. Model. 29, 529–537. [DOI] [PubMed] [Google Scholar]
- Paixao P., Gouveia L. F., Morais J. A. G. (2010). Prediction of the in vitro intrinsic clearance determined in suspensions of human hepatocytes by using artificial neural networks. Eur. J. Pharm. Sci. 39, 310–321. [DOI] [PubMed] [Google Scholar]
- Papa E., Sangion A., Arnot J. A., Gramatica P. (2018). Development of human biotransformation QSARs and application for PBT assessment refinement. Food Chem. Toxicol. 112, 535–543. [DOI] [PubMed] [Google Scholar]
- Pearce R. G., Setzer R. W., Davis J. L., Wambaugh J. F. (2017). Evaluation and calibration of high-throughput predictions of chemical distribution to tissues. J. Pharmacokinet. Pharmacodyn. 44, 549–565. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pihan E., Colliandre L., Guichou J. F., Douguet D. (2012). E-Drug3D: 3D structure collections dedicated to drug repurposing and fragment-based drug design. Bioinformatics 28, 1540–1541. [DOI] [PubMed] [Google Scholar]
- Pirovano A., Brandmaier S., Huijbregts M. A. J., Ragas A. M. J., Veltman K., Hendriks A. J. (2015). The utilisation of structural descriptors to predict metabolic constants of xenobiotics in mammals. Environ. Toxicol. Pharmacol. 39, 247–258. [DOI] [PubMed] [Google Scholar]
- Poulin P., Theil F. P. (2002). Prediction of pharmacokinetics prior to in vivo studies. 1. Mechanism-based prediction of volume of distribution. J. Pharm. Sci. 91, 129–156. [DOI] [PubMed] [Google Scholar]
- Pradeep P., Patlewicz G., Pearce R., Wambaugh J., Wetmore B., Judson R. (2020). Using chemical structure information to develop predictive models for in vitro toxicokinetic parameters to inform high-throughput risk-assessment. Comput. Toxicol. 16, 100136. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Price K., Krishnan K. (2011). An integrated QSAR-PBPK modelling approach for predicting the inhalation toxicokinetics of mixtures of volatile organic chemicals in the rat. SAR QSAR Environ. Res. 22, 107–128. [DOI] [PubMed] [Google Scholar]
- Rodgers T., Leahy D., Rowland M. (2005). Physiologically based pharmacokinetic modeling 1: Predicting the tissue distribution of moderate-to-strong bases. J. Pharm. Sci. 94, 1259–1276. [DOI] [PubMed] [Google Scholar]
- Rodgers T., Rowland M. (2006). Physiologically based pharmacokinetic modelling 2: Predicting the tissue distribution of acids, very weak bases, neutrals and zwitterions. J. Pharm. Sci. 95, 1238–1257. [DOI] [PubMed] [Google Scholar]
- Rotroff D. M., Wetmore B. A., Dix D. J., Ferguson S. S., Clewell H. J., Houck K. A., Lecluyse E. L., Andersen M. E., Judson R. S., Smith C. M., et al. (2010). Incorporating human dosimetry and exposure into high-throughput in vitro toxicity screening. Toxicol. Sci. 117, 348–358. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ruark C. D., Song G., Yoon M., Verner M.-A., Andersen M. E., Clewell H. J., Longnecker M. P. (2017). Quantitative bias analysis for epidemiological associations of perfluoroalkyl substance serum concentrations and early onset of menopause. Environ. Int. 99, 245–254. [DOI] [PubMed] [Google Scholar]
- Rudin C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1, 206–215. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sarigiannis D. A., Papadaki K., Kontoroupis P., Karakitsios S. P. (2017). Development of QSARs for parameterizing physiology based toxicokinetic models. Food Chem. Toxicol. 106, 114–124. [DOI] [PubMed] [Google Scholar]
- Sayre R. R., Wambaugh J. F., Grulke C. M. (2020). Database of pharmacokinetic time-series data and parameters for 144 environmental chemicals. Sci. Data 7, 122. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schmitt W. (2008). General approach for the calculation of tissue to plasma partition coefficients. Toxicol. In Vitro 22, 457–467. [DOI] [PubMed] [Google Scholar]
- Schneckener S., Grimbs S., Hey J., Menz S., Osmers M., Schaper S., Hillisch A., Goller A. H. (2019). Prediction of oral bioavailability in rats: Transferring insights from in vitro correlations to (deep) machine learning models using in silico model outputs and chemical structure parameters. J. Chem. Inf. Model. 59, 4893–4905. [DOI] [PubMed] [Google Scholar]
- Shen J., Cheng F. X., Xu Y., Li W. H., Tang Y. (2010). Estimation of adme properties with substructure pattern recognition. J. Chem. Inf. Model. 50, 1034–1041. [DOI] [PubMed] [Google Scholar]
- Sipes N. S., Wambaugh J. F., Pearce R., Auerbach S. S., Wetmore B. A., Hsieh J. H., Shapiro A. J., Svoboda D., DeVito M. J., Ferguson S. S. (2017). An intuitive approach for predicting potential human health risk with the Tox21 10k library. Environ. Sci. Technol. 51, 10786–10796. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sweeney L. M., Sterner T. R. (2022). Prediction of mammalian maximal rates of metabolism and Michaelis constants for industrial and environmental compounds: Revisiting four quantitative structure activity relationship (QSAR) publications. Comput. Toxicol. 21, 100214. [Google Scholar]
- Talevi A., Goodarzi M., Ortiz E. V., Duchowicz P. R., Bellera C. L., Pesce G., Castro E. A., Bruno-Blanch L. E. (2011). Prediction of drug intestinal absorption by new linear and non-linear QSPR. Eur. J. Med. Chem. 46, 218–228. [DOI] [PubMed] [Google Scholar]
- Varma M. V. S., Obach R. S., Rotter C., Miller H. R., Chang G., Steyn S. J., El-Kattan A., Troutman M. D. (2010). Physicochemical space for optimum oral bioavailability: Contribution of human intestinal absorption and first-pass elimination. J. Med. Chem. 53, 1098–1108. [DOI] [PubMed] [Google Scholar]
- Wambaugh J. F., Hughes M. F., Ring C. L., MacMillan D. K., Ford J., Fennell T. R., Black S. R., Snyder R. W., Sipes N. S., Wetmore B. A., et al. (2018). Evaluating in vitro-in vivo extrapolation of toxicokinetics. Toxicol. Sci. 163, 152–169. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wambaugh J. F., Wetmore B. A., Pearce R., Strope C., Goldsmith R., Sluka J. P., Sedykh A., Tropsha A., Bosgra S., Shah I., et al. (2015). Toxicokinetic triage for environmental chemicals. Toxicol. Sci. 147, 55–67. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wang N. N., Huang C., Dong J., Yao Z. J., Zhu M. F., Deng Z. K., Lv B., Lu A. P., Chen A. F., Cao D. S. (2017). Predicting human intestinal absorption with modified random forest approach: A comprehensive evaluation of molecular representation, unbalanced data, and applicability domain issues. RSC Adv. 7, 19007–19018. [Google Scholar]
- Wang Y. C., Liu H. C., Fan Y. R., Chen X. Y., Yang Y., Zhu L., Zhao J. N., Chen Y. D., Zhang Y. M. (2019). In silico prediction of human intravenous pharmacokinetic parameters with improved accuracy. J. Chem. Inf. Model. 59, 3968–3980. [DOI] [PubMed] [Google Scholar]
- Watanabe R., Esaki T., Kawashima H., Natsume-Kitatani Y., Nagao C., Ohashi R., Mizuguchi K. (2018). Predicting fraction unbound in human plasma from chemical structure: Improved accuracy in the low value ranges. Mol. Pharm. 15, 5302–5311. [DOI] [PubMed] [Google Scholar]
- Wetmore B. A. (2015). Quantitative in vitro-to-in vivo extrapolation in a high-throughput environment. Toxicology 332, 94–101. [DOI] [PubMed] [Google Scholar]
- Wetmore B. A., Allen B., Clewell H. J., Parker T., Wambaugh J. F., Almond L. M., Sochaski M. A., Thomas R. S. (2014). Incorporating population variability and susceptible subpopulations into dosimetry for high-throughput toxicity testing. Toxicol. Sci. 142, 210–224. [DOI] [PubMed] [Google Scholar]
- Wetmore B. A., Wambaugh J. F., Ferguson S. S., Li L., Clewell H. J., Judson R. S., Freeman K., Bao W., Sochaski M. A., Chu T.-M., et al. (2013). Relative impact of incorporating pharmacokinetics on predicting in vivo hazard and mode of action from high-throughput in vitro toxicity assays. Toxicol. Sci. 132, 327–346. [DOI] [PubMed] [Google Scholar]
- Xiong G., Wu Z., Yi J., Fu L., Yang Z., Hsieh C., Yin M., Zeng X., Wu C., Lu A., et al. (2021). ADMETlab 2.0: An integrated online platform for accurate and comprehensive predictions of ADMET properties. Nucleic Acids Res. 49, W5–W14. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Xiong Y., Fan J., Kitabi E., Zhang X., Bi Y., Grimstein M., Yang Y., Earp J. C., Zheng N., Liu J., et al. (2022). Model-informed drug development approaches to assist new drug development in the covid-19 pandemic. Clin. Pharmacol. Ther. 111, 572–578. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yan A. X., Wang Z., Cai Z. Y. (2008). Prediction of human intestinal absorption by GA feature selection and support vector machine regression. Int. J. Mol. Sci. 9, 1961–1976. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ying X. (2019). An overview of overfitting and its solutions. J. Phys. Conf. Ser. 1168, 022022. [Google Scholar]
- Yun Y. E., Cotton C. A., Edginton A. N. (2014). Development of a decision tree to classify the most accurate tissue-specific tissue to plasma partition coefficient algorithm for a given compound. J. Pharmacokinet. Pharmacodyn. 41, 1–14. [DOI] [PubMed] [Google Scholar]
- Yun Y. E., Tornero-Velez R., Purucker S. T., Chang D. T., Edginton A. N. (2021). Evaluation of quantitative structure property relationship algorithms for predicting plasma protein binding in humans. Comput. Toxicol. 17, 100142. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhang J., Mucs D., Norinder U., Svensson F. (2019). LightGBM: An effective and scalable algorithm for prediction of chemical toxicity-application to the Tox21 and mutagenicity data sets. J. Chem. Inf. Model. 59, 4150–4158. [DOI] [PubMed] [Google Scholar]
- Zhang J. H., Fraczkiewicz R., Bolger M. B., Waldman M., Woltosz W. S., Enslein K. (2008). Predicting kinetic parameters km and vmax for substrates of human cytochrome p450 1a2, 2c9, 2c19, 2d6, and 3a4. Drug Metab. Rev. 40, 119–119. [Google Scholar]
- Zhao P., Zhang L., Grillo J. A., Liu Q., Bullock J. M., Moon Y. J., Song P., Brar S. S., Madabushi R., Wu T. C., et al. (2011). Applications of physiologically based pharmacokinetic (PBPK) modeling and simulation during regulatory review. Clin. Pharmacol. Ther. 89, 259–267. [DOI] [PubMed] [Google Scholar]
- Zhou K., Liu A., Ma W., Sun L., Mi K., Xu X., Algharib S. A., Xie S., Huang L. (2021). Apply a physiologically based pharmacokinetic model to promote the development of enrofloxacin granules: Predict withdrawal interval and toxicity dose. Antibiotics (Basel) 10, 955. [DOI] [PMC free article] [PubMed] [Google Scholar]


