Abstract
The extent of plasma protein binding is an important compound-specific property that influences a compound’s pharmacokinetic behavior and is a critical input parameter for predicting exposure in physiologically based pharmacokinetic (PBPK) modeling. When experimentally determined fraction unbound in plasma (fup) data are not available, quantitative structure-property relationship (QSPR) models can be used for prediction. Because available QSPR models were developed from training sets containing pharmaceutical-like compounds, we compared their prediction accuracy for environmentally relevant and pharmaceutical compounds. Fup values were calculated using the models of Ingle et al. and Watanabe et al. and ADMET Predictor (Simulations Plus). The test set included 818 pharmaceutical and environmentally relevant compounds with fup values ranging from 0.01 to 1. Overall, the three QSPR models over-predicted fup for highly binding compounds and under-predicted it for low or moderately binding compounds. For highly binding compounds (0.01 ≤ fup ≤ 0.25), Watanabe et al. performed better than the other methods, with a lower mean absolute error (MAE) of 6.7% and a lower mean absolute relative prediction error (RPE) of 171.7%. For low to moderately binding compounds, both Ingle et al. and ADMET Predictor performed better than Watanabe et al., with lower MAE and RPE values. The partial positive surface area, the number of basic functional groups and lipophilicity were the most important chemical descriptors for predicting fup. This study demonstrated that the prediction of fup was most uncertain for highly binding compounds, which suggests that QSPR-predicted fup values should be used with caution in PBPK modeling.
Keywords: Plasma protein binding, quantitative structure-property relationship (QSPR) models, environmentally relevant chemicals, human health risk assessment
1. Introduction
Many endogenous and exogenous substances reversibly bind to plasma proteins such as albumin, alpha1-acid glycoprotein (AAG) and lipoproteins. The extent of binding, often expressed as the fraction unbound in plasma (fup), is a function of protein concentration and protein binding affinity and, to a lesser degree, of displacement by other molecules [1]. The extent of plasma protein binding is an important compound-specific property because it affects a compound’s distribution, metabolism and elimination. Therefore, the characterization of protein binding is critical to predicting the pharmacokinetics (PK) of a compound and, within a physiologically based toxicokinetic (PBTK) modeling framework [2], essential to estimating human exposure to environmental toxicants.
The fraction unbound in plasma can be determined experimentally by, for example, ultrafiltration or equilibrium dialysis [3–6]. When experimentally determined protein binding information is not available, in silico methods such as predictive quantitative structure-property relationship (QSPR) modeling can be used. QSPR models identify relationships between chemical structure and a chemical property, such as the degree of protein binding in plasma [7]. Chemical descriptors that capture the structural properties and characteristics of compounds are used as predictors, while protein binding information is used as the response variable. The learned relationship between protein binding and chemical descriptors can then provide an estimate of fup for a new compound.
In one of the recently developed QSPR models, Ingle et al. [8] included both pharmaceutical and environmentally relevant compounds (ERCs) in their training and test sets. For the pharmaceuticals, previously curated data from the literature [9–12] were included. For the ERCs, experimentally derived protein binding data for ToxCast compounds from Wetmore et al. [13, 14] were included and used as a test set. The ToxCast program, implemented by the US Environmental Protection Agency (EPA), included in vitro assessment of fup for pesticides, food additives, consumer products, and industrial products [15]. Molecular Operating Environment (MOE, Chemical Computing Group) was used to calculate the input chemical descriptors. To construct a predictive model, several machine learning techniques such as k-nearest neighbours, support vector machines and random forest were employed. A consensus model gave the best predictive performance, with mean absolute error (MAE) values of 0.15 and 0.11 for pharmaceuticals and ERCs, respectively.
Watanabe et al. [16] developed two kinds of predictive models using machine learning techniques. The first kind was a classifier that predicts the degree of protein binding of a new compound, using a fup cut-off of 0.05 for the binary classifier (i.e. high or low) and cut-offs of 0.05 and 0.2 for the three-class classifier (i.e. high, moderate and low). The second kind was a regression model that predicts the fup value itself. Chemical descriptors were calculated with the Mordred [17] and PaDEL [18] programs. The training set included 2192 compounds from the ChEMBL [19] and PharmaPendium [20] databases. The test set included 546 compounds from KEGG DRUG [21–23]. The classifier resulted in a true positive rate of 0.83 for the low fup class. Predictive performance was compared with that of the S+PrUnbnd model from ADMET Predictor 8.1 (Simulations Plus, Inc.). The Watanabe et al. [16] model (MAE: 0.32) resulted in higher predictive accuracy than the S+PrUnbnd model (MAE: 0.43) [16]. The online calculator provides both a fup estimate based on the regression algorithm and a classification of the compound’s degree of binding based on the multi-state classifier.
The predictive performance of the different QSPR models has been evaluated using relatively small datasets that do not fully encompass the structural diversity of compounds. In addition, the QSPR models have not been evaluated with the same dataset, so the prediction accuracies reported in the individual QSPR studies cannot be compared directly. Furthermore, available QSPR models have been developed based on training sets containing mainly pharmaceutical compounds. It is necessary, therefore, to compare the prediction accuracy of the QSPR models for both environmentally relevant compounds (ERCs) and pharmaceutical compounds.
In this study, we (i) evaluate the predictive performance of QSPR models for predicting fup values in humans for ERCs and pharmaceuticals, comparing the prediction accuracy of the Ingle et al. [8] and Watanabe et al. [16] models with that of the commercially available program ADMET Predictor; (ii) identify the chemical characteristics that most strongly influence the predictive performance of each QSPR model; and (iii) identify the regions of chemical space that differ between the QSPR training sets and ERCs.
2. Materials and Methods
2.1. Construction of the test dataset
Fup values in humans were obtained from the literature [8, 16, 24–27]. The workflow for the construction of the test set is illustrated in Figure 1. The training and test sets of Ingle et al. [8] and Watanabe et al. [16] were obtained from the respective supplemental materials. Because each dataset used different chemical identifiers, all identifiers were translated to a single type, the PubChem compound ID (CID), using the PubChem Identifier Exchange Service (https://pubchem.ncbi.nlm.nih.gov/idexchange/idexchange.cgi) [28]. The test sets of Watanabe et al. [16] and Ingle et al. [8] were then combined with the literature data [24–27] that were not included in these datasets (Figure 1A). All the datasets were merged using R (version 3.6). To prevent overlap in compounds between the training and test sets, compounds that were used in the training set of either Ingle et al. [8] or Watanabe et al. [16] were removed from the test set. Using the obtained PubChem CIDs as input, 2-dimensional structure-data files (SDF) were downloaded using the PubChem Download Service (https://pubchem.ncbi.nlm.nih.gov/pc_fetch/pc_fetch.cgi). To verify the ID conversion, the simplified molecular-input line-entry system (SMILES) identifiers in the original datasets were visually compared to the PubChem canonical SMILES using the SMILES checker (http://www.cheminfo.org/flavor/malaria/Utilities/SMILES_generator___checker/index.html). For consistency, PubChem canonical SMILES and SDF files were used as inputs for the QSPR models and for the calculation of fup values.
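The overlap-removal step described above can be sketched as a set operation on PubChem CIDs. This is a minimal Python illustration, not the authors' R script: the CIDs, fup values and the helper name `remove_training_overlap` are hypothetical placeholders.

```python
# Hypothetical candidate test records: (PubChem CID, observed fup).
# The CIDs and fup values are placeholders, not data from the study.
candidate_test = [(101, 0.32), (102, 0.01), (103, 0.80), (104, 0.64)]

# Hypothetical CIDs found in the two QSPR training sets.
ingle_training_cids = {102}
watanabe_training_cids = {103, 999}


def remove_training_overlap(candidates, *training_cid_sets):
    """Keep only candidates whose CID appears in none of the training sets."""
    seen_in_training = set().union(*training_cid_sets)
    return [(cid, fup) for cid, fup in candidates if cid not in seen_in_training]


test_set = remove_training_overlap(
    candidate_test, ingle_training_cids, watanabe_training_cids
)
print(test_set)
```

Matching on a single identifier type only works once every dataset has been translated to CIDs, which is why the identifier conversion precedes the merge.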
2.2. Selection criteria for non-commercial QSPR models for predicting fup
Among the many available QSPR models, non-commercial models were selected for this study based on the following criteria: (i) the training dataset used to build the QSPR model included structurally diverse compounds (i.e. the training set included more than 1000 compounds), and (ii) either a freely available calculator or the source code for the model was available online. The selected non-commercial QSPR models were compared for prediction accuracy amongst themselves as well as against the commercial software ADMET Predictor (Simulations Plus).
2.3. Calculation of adult fup based on QSPR methods
With the constructed test set, fup values were calculated (Figure 1B). For the ADMET Predictor, SDF files were imported into the program, and human fup values were calculated using the ADMET predict function (ADMET Predictor™ software provided by Simulations Plus, Inc., Lancaster, California, USA). For fup calculation using Watanabe et al. [16], SDF files that were obtained from PubChem were used as input in the online fup calculator (https://drumap.nibiohn.go.jp/fup/). For the fup calculation using Ingle et al. [8], using SMILES as an input, chemical descriptors that were required to predict a fup were calculated using MOE. The consensus model output was evaluated for the prediction accuracy. The calculated fup values based on each method were gathered and imported into R (version 3.6) for further analysis.
2.4. Predictive performance of QSPR models
To assess the predictive performance of the QSPR models, the prediction error (Eqn 1), relative prediction error (Eqn 2), mean absolute relative prediction error (Eqn 3), mean absolute error (Eqn 4), root mean squared error (Eqn 5) and coefficient of determination (r2, Eqn 6) were calculated. Scatterplots of prediction errors against fup values were created using the ggplot2 package in R. The prediction error is the difference between the predicted and observed fup (Eqn 1). The relative prediction error (RPE) expresses the prediction error relative to the observed value (Eqn 2). The mean absolute RPE indicates the overall relative prediction deviation regardless of under- or over-prediction (Eqn 3). The mean absolute error indicates the average deviation of the prediction from the observed value (Eqn 4). An observed fup value of 0 was assumed to be 0.001. The evaluation metrics were calculated for compounds with observed fup values ranging from 0.01 to 1. Because of the uncertainty associated with experimental measurements of protein binding for highly bound compounds [29], compounds with fup < 0.01 were excluded when evaluating the prediction performance of the QSPR models.
To evaluate the predictive performance of the QSPR models by type of compound, the test set was subdivided according to three criteria: (i) highly binding (i.e. 0.01 ≤ observed fup ≤ 0.25) versus low-to-moderately binding compounds (i.e. observed fup > 0.25), (ii) pharmaceuticals versus ERCs, and (iii) acid-base properties. For the acid-base classification, the same criteria used in Ingle et al. [8] were applied to assign compounds as acid, base, neutral or zwitterion.
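Eqn 1. $\mathrm{PE}_i = fu_{p,\mathrm{pred},i} - fu_{p,\mathrm{obs},i}$
Eqn 2. $\mathrm{RPE}_i = \dfrac{fu_{p,\mathrm{pred},i} - fu_{p,\mathrm{obs},i}}{fu_{p,\mathrm{obs},i}} \times 100\%$
Eqn 3. $\text{Mean absolute RPE} = \dfrac{1}{n}\sum_{i=1}^{n} \left|\mathrm{RPE}_i\right|$
Eqn 4. $\mathrm{MAE} = \dfrac{1}{n}\sum_{i=1}^{n} \left|\mathrm{PE}_i\right|$
Eqn 5. $\mathrm{RMSE} = \sqrt{\dfrac{1}{n}\sum_{i=1}^{n} \mathrm{PE}_i^{2}}$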
Eqn 6. |
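As a worked illustration of the error metrics described in Section 2.4, the following Python sketch computes the MAE, RMSE and mean absolute RPE for two binding classes. The (predicted, observed) pairs are invented for illustration, and the function names are hypothetical; the 0.01 and 0.25 cut-offs are those stated in the text.

```python
import math


def relative_prediction_error(pred, obs):
    """Prediction error as a percentage of the observed value."""
    return (pred - obs) / obs * 100.0


def summarize(pairs):
    """pairs: list of (predicted, observed) fup values, observed > 0."""
    n = len(pairs)
    errors = [pred - obs for pred, obs in pairs]
    rpes = [relative_prediction_error(pred, obs) for pred, obs in pairs]
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "mean_abs_rpe": sum(abs(r) for r in rpes) / n,
    }


# Made-up (predicted, observed) pairs; the third has observed fup < 0.01.
pairs = [(0.05, 0.02), (0.10, 0.20), (0.03, 0.005), (0.50, 0.80), (0.90, 0.60)]

# Keep observed values in [0.01, 1], then split at the 0.25 cut-off.
evaluated = [(p, o) for p, o in pairs if 0.01 <= o <= 1.0]
highly_binding = summarize([(p, o) for p, o in evaluated if o <= 0.25])
low_moderate = summarize([(p, o) for p, o in evaluated if o > 0.25])

print(highly_binding)
print(low_moderate)
```

In this toy example the highly binding class has the smaller MAE but the larger mean absolute RPE, because a small absolute deviation is large relative to a small observed fup. This is why the two metrics are best read together.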
2.5. Identification of important chemical descriptors on QSPR model prediction performance
Using PubChem 2D and 3D SDF files as inputs, chemical descriptors were obtained using the PaDEL-Descriptor program (version 2.21) [18] (Figure 1C). A total of 1045 2D chemical descriptors and 431 3D chemical descriptors were generated for the test set. Additional chemical information such as lipophilicity and polar surface area was obtained from the PubChem SDF files. Chemical descriptors with near-zero variance (i.e. descriptors with one unique value, or with a relatively small number of unique values compared to the sample size) were removed using the nearZeroVar function of the caret package in R [30]. The chemical descriptors were standardized before statistical testing (Eqn 7). Pearson’s correlation test was performed between the logarithm of the RPE for each QSPR method and each chemical descriptor. The RPE was log-transformed because of the skewed distribution of RPE values. The correlation coefficient (r) and p-value were obtained for each chemical descriptor, and descriptors with a p-value exceeding 0.05 were removed. The remaining descriptors were then ranked by Pearson’s correlation coefficient to identify the chemical descriptors most correlated with the RPE values of each model.
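Eqn 7. $z = \dfrac{x - \bar{x}}{s}$, where $x$ is a descriptor value, $\bar{x}$ is the mean and $s$ the standard deviation of that descriptor across the test set.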
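The descriptor screening in Section 2.5 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' R workflow: the descriptor names and values are invented, the zero-variance filter is a crude stand-in for caret's nearZeroVar, the absolute RPE and a base-10 logarithm are assumed for the transformation, and the p-value step is omitted because it requires a t-distribution.

```python
import math


def zscore(values):
    """Standardize to zero mean and unit sample standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd for v in values]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Made-up absolute RPE values (%) for five compounds, log-transformed.
abs_rpe = [400.0, 200.0, 100.0, 50.0, 25.0]
log_rpe = [math.log10(v) for v in abs_rpe]

# Made-up descriptor columns (hypothetical names).
descriptors = {
    "n_basic_groups": [0, 1, 2, 3, 4],
    "logp": [5.0, 4.1, 3.2, 1.5, 0.4],
    "constant_flag": [1, 1, 1, 1, 1],
}

# Drop descriptors with a single unique value (simplified variance filter).
kept = {name: vals for name, vals in descriptors.items() if len(set(vals)) > 1}

# Standardization does not change r; it is kept to mirror the workflow.
correlations = {name: pearson_r(zscore(vals), log_rpe) for name, vals in kept.items()}
ranked = sorted(correlations.items(), key=lambda kv: abs(kv[1]), reverse=True)
print(ranked)
```

Ranking by the absolute value of r keeps strongly negative and strongly positive associations side by side, which matches how both lipophilicity and the number of basic groups can appear among the top descriptors.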
2.6. Comparison of chemical structures between training sets of QSPR models and environmentally relevant compounds
For the training sets of Ingle et al. [8] and Watanabe et al. [16], 2- and 3-dimensional chemical descriptors were calculated using PaDEL. The chemical descriptors of the compounds in those training sets were compared to those of the environmentally relevant compounds in the test set of this study. Each chemical descriptor was compared between each training set and the ERC test set using a two-sample t-test with unequal variances (i.e. Welch’s t-test) to identify descriptors that differed significantly. The descriptors that differed significantly (p-value less than 0.05) between a training set and the ERC test set were then ranked by p-value.
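For readers unfamiliar with Welch's t-test, the sketch below computes the test statistic and the Welch-Satterthwaite degrees of freedom for one descriptor. The two samples are invented, and the function name `welch_t` is hypothetical. Converting the statistic to a p-value needs the t-distribution, which the Python standard library does not provide; in practice a statistics package would be used (for example `scipy.stats.ttest_ind` with `equal_var=False`).

```python
import math


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    na, nb = len(sample_a), len(sample_b)
    mean_a = sum(sample_a) / na
    mean_b = sum(sample_b) / nb
    var_a = sum((v - mean_a) ** 2 for v in sample_a) / (na - 1)
    var_b = sum((v - mean_b) ** 2 for v in sample_b) / (nb - 1)
    # Squared standard errors of the two sample means.
    se2_a, se2_b = var_a / na, var_b / nb
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Made-up lipophilicity values for a training set and an ERC set.
training_logp = [1.0, 2.0, 3.0, 4.0, 5.0]
erc_logp = [4.0, 6.0, 8.0]

t_stat, dof = welch_t(training_logp, erc_logp)
print(round(t_stat, 3), round(dof, 3))
```

Unlike the pooled-variance Student's t-test, this form does not assume that the training set and the ERC set have equal variances, which matters when the two groups differ greatly in size.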
3. Results
3.1. Prediction performance of QSPR models for estimating fup
Data from a total of 1026 compounds was gathered. Among 208 compounds with observed fup values less than 0.01, only 0 %, 8.2 % and 1.4 % of fup values were calculated within the range of 0.001– 0.01 by ADMET Predictor, Watanabe et al. [16], and Ingle et al. [8], respectively (Figure S1). The test set for evaluating the predictive performance of QSPR models included a total of 818 compounds with fup values ranging from 0.01 to 1 of which 69% were pharmaceutical and 31% were environmentally relevant. The predicted fup values based on QSPR models were compared to the observed data. Overall, the three QSPR models resulted in over-prediction of fup for highly binding compounds and under-prediction for low or moderately binding compounds (Figure 2 A–B). All QSPR models resulted in higher relative prediction error for highly binding compounds (i.e. observed fup ranging from 0.01 to 0.25) (Figure 3 A–C). The highly deviating predictions by all QSPR models were observed for both types of compounds, namely pharmaceutical and environmentally relevant compounds (Figure 3D).
In terms of the overall predictive performance for both ERC and pharmaceuticals, ADMET Predictor and Watanabe et al. [16] resulted in the better predictive performance with lower mean absolute RPE, lower MAE and RMSE values than those of Ingle et al. [8] (Table 1). For highly binding compounds (i.e. 0.01 ≤ fup ≤ 0.25), Watanabe et al. [16] performed better with a lower MAE of 6.7% and a lower mean absolute RPE of 171.7 % than other QSPR methods. For low or moderately binding compounds (fup > 0.25), both Ingle et al. [8] and ADMET Predictor performed better than Watanabe et al. [16] with superior MAE and mean absolute RPE values. For both pharmaceuticals and ERCs, ADMET Predictor and Watanabe et al. [16] performed better than Ingle et al. [8] with lower MAE and mean absolute RPE values. For all QSPR models, higher RPEs were observed for acids compared to those of other types of compounds. Based on RPE values, Watanabe et al. [16] performed better for bases, neutrals and zwitterions.
Table 1.
Metric | ADMET Predictor | Watanabe et al. | Ingle et al. |
---|---|---|---|
All compounds (n=818) | |||
RMSE | 0.21 | 0.22 | 0.24 |
R2 | 0.52 | 0.48 | 0.37 |
Mean absolute error | 12.6 | 14.3 | 15.9 |
Mean absolute RPE | 149.3 | 131.4 | 243.9 |
Median absolute error | 6.5 | 7.2 | 9.5 |
Median absolute RPE | 58.2 | 55.3 | 67.1 |
Highly binding compounds (0.01 ≤ fup ≤ 0.25, n=552) | |||
Mean absolute error | 7 | 6.7 | 12 |
Mean absolute RPE | 202.1 | 171.7 | 341.6 |
Median absolute error | 4.9 | 4 | 6.4 |
Median absolute RPE | 95.3 | 64 | 116.4 |
Low or moderately binding compounds (fup > 0.25, n=266) | |||
Mean absolute error | 24.3 | 30 | 24 |
Mean absolute RPE | 39.6 | 47.8 | 41.1 |
Median absolute error | 17.5 | 26.5 | 18.6 |
Median absolute RPE | 36.6 | 49.2 | 36.2 |
Pharmaceuticals (n=565) | |||
Mean absolute error | 13 | 13.2 | 16.7 |
Mean absolute RPE | 146.3 | 109 | 269.9 |
Median absolute error | 6.6 | 6.1 | 10.1 |
Median absolute RPE | 60 | 51.4 | 71.4 |
Environmentally relevant compounds (n=253) | |||
Mean absolute error | 11.8 | 16.6 | 14.1 |
Mean absolute RPE | 155.9 | 181.5 | 185.9 |
Median absolute error | 6.5 | 10.2 | 8.2 |
Median absolute RPE | 56.1 | 65 | 53.1 |
Acids (n=177) | |||
Mean absolute error | 12.5 | 14.4 | 16.9 |
Mean absolute RPE | 174.5 | 213.6 | 364.4 |
Median absolute error | 6.2 | 7.8 | 11 |
Median absolute RPE | 77.2 | 64 | 99.7 |
Bases (n=221) | |||
Mean absolute error | 13.1 | 14.4 | 18.3 |
Mean absolute RPE | 150.2 | 91.3 | 277.5 |
Median absolute error | 8.1 | 7.7 | 11.5 |
Median absolute RPE | 52.8 | 49.6 | 62.3 |
Neutrals (n=397) | |||
Mean absolute error | 12 | 13.3 | 13.9 |
Mean absolute RPE | 139.7 | 120 | 177.4 |
Median absolute error | 5.7 | 6.2 | 7.2 |
Median absolute RPE | 57.3 | 57.4 | 62.5 |
Zwitterions (n=23) | |||
Mean absolute error | 20.1 | 29.5 | 21.6 |
Mean absolute RPE | 111.6 | 82.7 | 142.4 |
Median absolute error | 12.3 | 28.8 | 15.4 |
Median absolute RPE | 49 | 62.8 | 38 |
3.2. Prediction accuracy as a function of chemical structure
Chemical descriptors that exhibited a significant correlation with RPE (p-value < 0.05) based on the Pearson correlation test were ranked. For ADMET Predictor, the number of basic functional groups, the fractional charge-weighted partial positive surface area and lipophilicity were most correlated with the RPE (Figure 4 A–B). For Watanabe et al. [16] and Ingle et al. [8], the partial positive surface area, the number of basic functional groups and lipophilicity were the most important parameters (Figure 4 C–F). Taken together, for all three QSPR models, the partial positive surface area, the number of basic functional groups and lipophilicity were the most important chemical descriptors for predicting fup. Highly hydrophobic compounds with fewer basic functional groups were found to have high RPEs (>200%) or fup values below the prediction limit (fup < 0.01). On the other hand, the QSPR models showed relatively low prediction error (RPE < 200%) for hydrophilic compounds with basic functional groups.
3.3. Identification of significantly different chemical characteristics between ERCs and QSPR training set compounds
Environmentally relevant compounds were more lipophilic than the QSPR training set compounds (Figure 5A). Structurally, ERCs contained fewer rings, fewer basic functional groups and more halogens than the compounds in the training sets (Figure 5 B–D). In terms of 3D chemical descriptors, the partial positive surface areas of ERCs (i.e. the sum of the surface area on the electropositive portion of a molecule, PPSA-2) were significantly lower than those in the QSPR training sets (Figure 5 E). The ERCs were smaller in size, with significantly smaller geometrical radii (Figure 5 F) and geometrical diameters.
4. Discussion
The degree of plasma protein binding is an important property of a compound that influences toxicokinetics (TK) and is a key parameter in PBTK modeling. The volume of distribution, which affects the maximum concentration and the half-life, is directly proportional to fup. Further, the freely available portion of a compound elicits the pharmacological or toxicological response [1]. Clearance determines overall exposure (i.e. area under the curve (AUC)) and is almost proportional to fup for compounds with a low hepatic extraction ratio (EH). The importance of fup in in vitro-in vivo extrapolation (IVIVE) has been demonstrated with trichloroethylene [31], where in vitro CLH is measured with a hepatocyte uptake assay using isolated hepatocytes in a medium without plasma proteins, and fup is applied as in Equation 8. The determined CLH information can be incorporated into a PBPK model.
Eqn 8. |
Q: blood flow; RBP: blood-to-plasma ratio; CLint: in vitro intrinsic clearance; SF: scaling factor based on 20 g liver/kg and 45 mg protein/g liver in humans [32].
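The statement that clearance is almost proportional to fup for low-EH compounds can be illustrated numerically. The Python sketch below assumes the commonly used well-stirred liver model, which may not match Eqn 8 term by term; the function name, the blood flow of 90 L/h and the intrinsic clearance values are illustrative assumptions, with CLint taken as already scaled to the whole liver.

```python
def hepatic_clearance(q_h, fup, cl_int, rbp=1.0):
    """Well-stirred liver model (assumed form), blood clearance in L/h.

    q_h:    hepatic blood flow (L/h)
    fup:    fraction unbound in plasma
    cl_int: intrinsic clearance already scaled to the whole liver (L/h)
    rbp:    blood-to-plasma ratio
    """
    unbound_cl = (fup / rbp) * cl_int
    return q_h * unbound_cl / (q_h + unbound_cl)


Q_H = 90.0  # L/h, illustrative value only

# Low-extraction compound: small intrinsic clearance.
low_eh_ratio = hepatic_clearance(Q_H, 0.2, 10.0) / hepatic_clearance(Q_H, 0.1, 10.0)

# High-extraction compound: very large intrinsic clearance.
high_eh_ratio = hepatic_clearance(Q_H, 0.2, 1e5) / hepatic_clearance(Q_H, 0.1, 1e5)

print(round(low_eh_ratio, 3), round(high_eh_ratio, 3))
```

Under these assumptions, doubling fup nearly doubles the clearance of the low-extraction compound but changes the clearance of the high-extraction compound by less than 1%, so an error in a predicted fup propagates almost fully into the predicted exposure of low-EH compounds.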
An accurate determination of fup is needed for use in PBTK models for human health risk assessment, so that fup values in potentially sensitive populations, such as diseased, pediatric or elderly individuals, can be extrapolated by adjusting for altered plasma protein concentrations [33, 34]. In a PBTK model, virtual individuals are built based on known trajectories of anatomy, biochemistry and physiology across age, and compounds are defined by physicochemical properties [35, 36]. Sensitivity analyses found that fup is one of the most critical input parameters for PBPK model outputs [2, 37], and when extrapolating across age or disease state, the reference value of fup defines the extrapolated fup. For example, Yun and Edginton [2] found that, among 10 pediatric PBPK models that were extrapolated from adult models, the sensitivity coefficients of fup for predicting AUC were higher for compounds with low EH than for compounds with high EH. Therefore, in many cases, the precision of the adult fup must be ensured if there is to be confidence in the pediatric model outcomes.
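To show how the reference value of fup carries through to an extrapolated value, the sketch below uses a widely cited protein-concentration scaling relationship. It is an assumption that this is the form applied in references [33, 34]; the function name and the child-to-adult protein concentration ratio of 0.8 are illustrative only.

```python
def scale_fup(fup_adult, protein_ratio):
    """Scale adult fup to another population.

    protein_ratio: binding protein concentration in the target population
                   divided by the adult concentration (assumed known).
    """
    bound_to_free = (1.0 - fup_adult) / fup_adult
    return 1.0 / (1.0 + protein_ratio * bound_to_free)


# Same hypothetical compound with a measured and an over-predicted adult fup.
fup_child_from_measured = scale_fup(0.02, 0.8)
fup_child_from_predicted = scale_fup(0.05, 0.8)

print(round(fup_child_from_measured, 4), round(fup_child_from_predicted, 4))
```

With these inputs, a 2.5-fold over-prediction of the adult value (0.05 instead of 0.02) yields a pediatric fup of about 0.062 instead of about 0.025, so the error in the reference value is carried almost unchanged into the extrapolated value.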
In this study, we evaluated the performance of QSPR models for predicting protein binding as well as the chemical descriptors that were most associated with the resulting prediction errors. In terms of the overall predictive performance for all compounds, the RMSE and MAE values were the lowest for ADMET Predictor although Watanabe et al. [16] provided similar metrics. A clear distinguishing feature was that there was increased predictive performance for compounds having a fup >0.25 regardless of the model used.
For extremely highly binding compounds (i.e. observed fup < 0.01), only a small fraction of fup values were predicted to be below 0.01, indicating that the QSPR models may not be suitable for predicting fup values for such compounds. The reason is that, during data curation and training set development, when a compound was stated to have protein binding higher than 99%, its fup value was assumed to be 0.01 [16]. This assumption is in line with U.S. Food and Drug Administration guidance, which states that when an experimentally determined value of fup is less than 0.01, the fup is set to 0.01 because of uncertainties in the protein binding measurements [29]. However, this limitation makes the QSPR models inherently incapable of predicting a very high degree of protein binding.
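The effect of flooring training labels at 0.01 can be illustrated with a toy k-nearest-neighbour regressor. This Python sketch uses invented data and a single hypothetical descriptor; it is not a reimplementation of any of the evaluated models. The argument applies directly to learners whose output is an average of training labels (such as k-nearest neighbours or random forest); other regressors may behave differently.

```python
FLOOR = 0.01

# Made-up measured fup values and one descriptor (e.g. lipophilicity).
measured_fup = [0.002, 0.004, 0.03, 0.2]
descriptor = [6.0, 5.5, 4.0, 2.0]

# Curation step: values below 0.01 are replaced by 0.01.
labels = [max(value, FLOOR) for value in measured_fup]


def knn_predict(x, k=2):
    """Average the labels of the k training compounds closest to x."""
    order = sorted(range(len(descriptor)), key=lambda i: abs(descriptor[i] - x))
    nearest = order[:k]
    return sum(labels[i] for i in nearest) / k


# A new, very lipophilic compound whose true fup may be well below 0.01.
prediction = knn_predict(6.2)
print(prediction)
```

Because every label is at least 0.01, any average of labels is also at least 0.01, so this kind of model cannot return a value below the floor regardless of how extreme the new compound is.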
In terms of important chemical descriptors that were associated with the prediction errors of QSPR models, the commonly observed chemical characteristics of highly binding compounds were also correlated with a high prediction error. Lipophilicity was a critical chemical characteristic that was positively correlated with RPEs such that the compounds that had a high prediction error or below the limit of prediction (fup <0.01) tended to be highly lipophilic. This was expected because the lipophilicity of a compound is known to have a high correlation with plasma protein binding [38] and QSPR models, in general, resulted in poor prediction performance for highly binding compounds. In contrast, the number of basic functional groups and positive partial surface area were negatively correlated with RPEs.
The negative correlation between descriptors of positively charged states and the prediction error is consistent with what is known about the albumin binding sites. There are three binding sites for drug-like molecules in albumin, namely the warfarin (Site I), benzodiazepine (Site II), and digitoxin sites [1, 39–43]. The affinity of a compound for each binding site depends on the functional groups of that compound. Binding site II of albumin, the major carrier protein, has positively charged groups on its surface [44]. The cationic center of the binding site and the positively charged portion of a compound are therefore likely to repel each other electrostatically. This is in accordance with an earlier finding that the presence of positive charges in a compound precludes binding to binding site II [44].
For the most part, QSPR models are built with pharmaceutical compounds as the training set, primarily because experimental fup values are more readily available for these compounds than for ERCs. Therefore, it is necessary to evaluate the predictive performance of QSPR models for non-pharmaceutical compounds. For all evaluated QSPR models, the prediction accuracy was lower for ERCs than for pharmaceuticals. This suggested that the structural differences between the two types of compounds may have contributed to the discrepancy in the prediction accuracy of the QSPR models. The ERCs were more lipophilic and smaller in size; furthermore, they had fewer basic functional groups and rings than the training set compounds (Figure 5). These characteristics of ERCs lead to a high binding affinity towards plasma proteins. This suggests that QSPR models are less well equipped to predict fup for ERCs that have the chemical characteristics listed above.
Some of the ERCs were highly halogenated compared to the QSPR training set compounds (Figure 5D). The highly halogenated compounds included organochlorines, pyrethroids, perfluoroalkyl and polyfluoroalkyl substances. The presence of halogens increases binding affinity to proteins through halogen bonding (e.g. halothane [45]). The majority (18 of 28 compounds) of highly halogenated compounds (i.e. the number of halogens > 5) were highly protein binding with the observed fup values less than 0.01. The absence of highly halogenated compounds in the QSPR training sets implies that the relationship between high halogenation and protein binding may not be well captured. This suggests that QSPR models may not be suitable to make predictions for the highly halogenated ERCs.
As regression models, QSPR models are suitable for predicting the target property within or near the chemical space of a training set [46]. For predictions outside the intended chemical space, Tan et al. [46] suggested re-parameterizing the model or creating a new one. According to the previous findings of Yin et al. [47] and Ingle et al. [8], the chemical spaces of pharmaceutical and ToxCast compounds [14, 48] overlap, and the applicability domain [49] of the Ingle et al. [8] model covered the chemical properties of ERCs with a few exceptions. This supported the use of pharmaceutical data to predict fup values for ERCs. However, our study identified several chemical descriptors that were significantly different between ERCs and pharmaceuticals (Figure 5). Expanding a training set to include ERC data may therefore improve the prediction performance of a QSPR model. In addition, different sets of chemical descriptors and different machine learning techniques may result in different prediction performance [50]. With this in mind, multiple alternative QSPR models can be developed and consensus prediction can be applied [50, 51]. When predictions from multiple QSPR models converge, confidence in the output increases; moving forward, this multiple-model prediction approach should be considered.
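One simple way to operationalize a consensus prediction is to average the outputs of several models and flag compounds for which the models disagree. The Python sketch below is only an illustration of that idea: the model names, the predicted values and the agreement threshold `max_spread` are hypothetical, and references [50, 51] may use a different consensus scheme.

```python
from statistics import mean


def consensus(predictions, max_spread=0.1):
    """Return (mean prediction, spread, converged flag).

    predictions: dict of model name -> predicted fup
    max_spread:  arbitrary illustrative limit on (max - min) for agreement
    """
    values = list(predictions.values())
    spread = max(values) - min(values)
    return mean(values), spread, spread <= max_spread


# Made-up predictions for two hypothetical compounds.
agreeing = {"model_a": 0.10, "model_b": 0.12, "model_c": 0.14}
diverging = {"model_a": 0.02, "model_b": 0.30, "model_c": 0.10}

print(consensus(agreeing))
print(consensus(diverging))
```

A compound flagged as non-converged is a natural candidate for experimental determination, since the disagreement itself signals that at least one model is extrapolating.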
A critical concern in the use of PBTK modeling for human health risk assessment is the availability of input parameters [35]. When an experimentally determined fup is not available, the use of QSPR models for predicting fup seems the most viable option and has been accepted as a de facto standard [46, 52]. This study suggests that the use of QSPR models for fup prediction in humans, and for further extrapolations using PBTK modeling or IVIVE, may not be an optimal choice, especially for highly binding ERCs. To improve the prediction of fup, a better mechanistic understanding of the relationship between protein binding and chemical structure is needed. Also, the uncertainty associated with experimental determination for highly binding compounds should be reduced [53], as this uncertainty is carried forward into the QSPR models. Prediction of fup values using the QSPR approach is an alternative to experiments; however, if certainty is required, experimental determination is necessary.
Supplementary Material
Highlights
- Prediction performance of QSPR models is most variable for highly binding compounds
- Lipophilicity and acid-base properties are critical for protein binding prediction
- Chemical space differs between pharmaceuticals in training sets and environmentally relevant compounds
- If precision is necessary, experimental determination is required
5. Acknowledgements
The authors would like to thank Moriah Pellowe, Elaina M. Kenyon and David A. Olson for helpful comments on the manuscript. This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) [funding reference number 2016-01382]. This manuscript has been subject to the U.S. Environmental Protection Agency’s administrative review and approved for publication; the presented work is that of the authors and does not necessarily represent Agency policy. Mention of trade names, products, or services does not convey, and should not be interpreted as conveying, official EPA approval, endorsement, or recommendation. The authors declare no competing financial interest.
Footnotes
Declaration of Interest
The authors have no conflict of interest, financial or otherwise.
8 Reference List
- 1. Burton ME, et al., Applied pharmacokinetics & pharmacodynamics: principles of therapeutic drug monitoring. 4th ed. 2006: Lippincott Williams & Wilkins.
- 2. Yun YE and Edginton AN, Model qualification of the PK-Sim® pediatric module for pediatric exposure assessment of CYP450 metabolized compounds. Journal of Toxicology and Environmental Health, Part A, 2019: p. 1–26.
- 3. Pacifici GM and Viani A, Methods of determining plasma and tissue binding of drugs. Pharmacokinetic consequences. Clin Pharmacokinet, 1992. 23(6): p. 449–68.
- 4. Bowers WF, Fulton S, and Thompson J, Ultrafiltration vs equilibrium dialysis for determination of free fraction. Clin Pharmacokinet, 1984. 9 Suppl 1: p. 49–60.
- 5. Oravcova J, Bohs B, and Lindner W, Drug-protein binding sites. New trends in analytical and experimental methodology. J Chromatogr B Biomed Appl, 1996. 677(1): p. 1–28.
- 6. Bohnert T and Gan LS, Plasma Protein Binding: From Discovery to Development. Journal of Pharmaceutical Sciences, 2013. 102(9): p. 2953–2994.
- 7. Lambrinidis G, Vallianatou T, and Tsantili-Kakoulidou A, In vitro, in silico and integrated strategies for the estimation of plasma protein binding. A review. Adv Drug Deliv Rev, 2015. 86: p. 27–45.
- 8. Ingle BL, et al., Informing the Human Plasma Protein Binding of Environmental Chemicals by Machine Learning in the Pharmaceutical Space: Applicability Domain and Limits of Predictability. Journal of Chemical Information and Modeling, 2016. 56(11): p. 2243–2252.
- 9. Obach RS, Lombardo F, and Waters NJ, Trend analysis of a database of intravenous pharmacokinetic parameters in humans for 670 drug compounds. Drug Metab Dispos, 2008. 36(7): p. 1385–405.
- 10. Zhu XW, et al., The Use of Pseudo-Equilibrium Constant Affords Improved QSAR Models of Human Plasma Protein Binding. Pharmaceutical Research, 2013. 30(7): p. 1790–1798.
- 11. Votano JR, et al., QSAR modeling of human serum protein binding with several modeling techniques utilizing structure-information representation. Journal of Medicinal Chemistry, 2006. 49(24): p. 7169–7181.
- 12. Moda TL, et al., PK/DB: database for pharmacokinetic properties and predictive in silico ADME models. Bioinformatics, 2008. 24(19): p. 2270–1.
- 13. Wetmore BA, et al., Incorporating High-Throughput Exposure Predictions With Dosimetry-Adjusted In Vitro Bioactivity to Inform Chemical Toxicity Testing. Toxicol Sci, 2015. 148(1): p. 121–36.
- 14. Wetmore BA, et al., Integration of dosimetry, exposure, and high-throughput screening data in chemical toxicity assessment. Toxicol Sci, 2012. 125(1): p. 157–74.
- 15. US-EPA, Toxicity Forecaster (ToxCast) Fact Sheet. https://www.epa.gov/sites/production/files/2019-01/documents/toxcast_factsheet_dec2018.pdf.
- 16. Watanabe R, et al., Predicting Fraction Unbound in Human Plasma from Chemical Structure: Improved Accuracy in the Low Value Ranges. Mol Pharm, 2018. 15(11): p. 5302–5311.
- 17. Moriwaki H, et al., Mordred: a molecular descriptor calculator. Journal of Cheminformatics, 2018. 10(1): p. 4.
- 18. Yap CW, PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints. J Comput Chem, 2011. 32(7): p. 1466–74.
- 19. Gaulton A, et al., The ChEMBL database in 2017. Nucleic Acids Res, 2017. 45(D1): p. D945–D954.
- 20. SA RELX Intellectual Properties, PharmaPendium. https://www.pharmapendium.com. 2016, Elsevier.
- 21. Kanehisa M, et al., KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res, 2017. 45(D1): p. D353–D361.
- 22. Kanehisa M and Goto S, KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res, 2000. 28(1): p. 27–30.
- 23. Kanehisa M, et al., New approach for understanding genome variations in KEGG. Nucleic Acids Res, 2019. 47(D1): p. D590–D595.
- 24. Kalvass JC, Maurer TS, and Pollack GM, Use of plasma and brain unbound fractions to assess the extent of brain distribution of 34 drugs: comparison of unbound concentration ratios to in vivo p-glycoprotein efflux ratios. Drug Metabolism and Disposition, 2007. 35(4): p. 660–666.
- 25. Li GF, et al., Quantitative Estimation of Plasma Free Drug Fraction in Patients With Varying Degrees of Hepatic Impairment: A Methodological Evaluation. J Pharm Sci, 2018. 107(7): p. 1948–1956.
- 26. Patsalos PN, et al., Serum protein binding of 25 antiepileptic drugs in a routine clinical setting: A comparison of free non-protein-bound concentrations. Epilepsia, 2017. 58(7): p. 1234–1243.
- 27. Sethi PK, et al., Ontogeny of plasma proteins, albumin and binding of diazepam, cyclosporine, and deltamethrin. Pediatr Res, 2016. 79(3): p. 409–15.
- 28. Kim S, et al., PubChem 2019 update: improved access to chemical data. Nucleic Acids Research, 2019. 47(D1): p. D1102–D1109.
- 29. Food and Drug Administration, In vitro metabolism- and transporter-mediated drug-drug interaction studies: Guidance for industry. Center for Drug Evaluation and Research, US Food and Drug Administration, US Department of Health and Human Services, Rockville, MD, 2017.
- 30. Kuhn M, Building predictive models in R using the caret package. J Stat Softw, 2008. 28(5): p. 1–26.
- 31. Lipscomb JC, et al., In vitro to in vivo extrapolation for trichloroethylene metabolism in humans. Toxicol Appl Pharmacol, 1998. 152(2): p. 376–87.
- 32. Heuberger J, Schmidt S, and Derendorf H, When is protein binding important? Journal of Pharmaceutical Sciences, 2013. 102(9): p. 3458–3467.
- 33. McNamara PJ and Alcorn J, Protein binding predictions in infants. AAPS PharmSci, 2002. 4(1): p. E4.
- 34. McNamara PJ and Meiman D, Predicting Drug Binding to Human Serum Albumin and Alpha One Acid Glycoprotein in Diseased and Age Patient Populations. J Pharm Sci, 2019. 108(8): p. 2737–2747.
- 35. Clewell HJ 3rd, The application of physiologically based pharmacokinetic modeling in human health risk assessment of hazardous substances. Toxicol Lett, 1995. 79(1–3): p. 207–17.
- 36. Edginton AN, Schmitt W, and Willmann S, Development and evaluation of a generic physiologically based pharmacokinetic model for children. Clinical Pharmacokinetics, 2006. 45(10): p. 1013–1034.
- 37. Zhou WD, et al., Predictive Performance of Physiologically Based Pharmacokinetic (PBPK) Modeling of Drugs Extensively Metabolized by Major Cytochrome P450s in Children. Clinical Pharmacology & Therapeutics, 2018. 104(1): p. 188–200.
- 38. Gleeson MP, Plasma protein binding affinity and its relationship to molecular structure: an in-silico analysis. J Med Chem, 2007. 50(1): p. 101–12.
- 39. Sjoholm I, et al., The specificity of three binding sites as studied with albumin immobilized in microparticles. Mol Pharmacol, 1979. 16.
- 40. Sudlow G, Birkett DJ, and Wade DN, The characterization of two specific drug binding sites on human serum albumin. Mol Pharmacol, 1975. 11(6): p. 824–32.
- 41. Sjoholm I, et al., Binding of drugs to human serum albumin: XI. The specificity of three binding sites as studied with albumin immobilized in microparticles. Mol Pharmacol, 1979. 16(3): p. 767–77.
- 42. Tillement JP, et al., Binding of digitoxin, digoxin and gitoxin to human serum albumin. Eur J Drug Metab Pharmacokinet, 1980. 5(3): p. 129–34.
- 43. Sengupta A and Hage DS, Characterization of minor site probes for human serum albumin by high-performance affinity chromatography. Anal Chem, 1999. 71(17): p. 3821–7.
- 44. Wanwimolruk S, Birkett DJ, and Brooks PM, Structural requirements for drug binding to site II on human serum albumin. Mol Pharmacol, 1983. 24(3): p. 458–63.
- 45. Xu Z, et al., Halogen bond: its role beyond drug-target binding affinity for drug discovery and development. J Chem Inf Model, 2014. 54(1): p. 69–78.
- 46. Tan Y-M, et al., Reconstructing human exposures using biomarkers and other “clues”. J Toxicol Environ Health, 2012. 15(1): p. 22–38.
- 47. Yin Y, et al., Essential set of molecular descriptors for ADME prediction in drug and environmental chemical space. 2014, Research.
- 48. Thomas RS, et al., Incorporating new technologies into toxicity testing and risk assessment: moving from 21st century vision to a data-driven framework. Toxicol Sci, 2013. 136(1): p. 4–18.
- 49. Tropsha A, Gramatica P, and Gombar VK, The importance of being earnest: Validation is the absolute essential for successful application and interpretation of QSPR models. QSAR & Combinatorial Science, 2003. 22(1): p. 69–77.
- 50. Tropsha A, Application of predictive QSAR models to database mining. Chemoinformatics in Drug Discovery, 2004. 23: p. 437–455.
- 51. Kovatcheva A, et al., Combinatorial QSAR of ambergris fragrance compounds. Journal of Chemical Information and Computer Sciences, 2004. 44(2): p. 582–595.
- 52. Ekins S, et al., Towards a new age of virtual ADME/TOX and multidimensional drug discovery. Mol Divers, 2002. 5(4): p. 255–75.
- 53. Wang H, et al., Understanding and reducing the experimental variability of in vitro plasma protein binding measurements. J Pharm Sci, 2014. 103(10): p. 3302–9.