Designing QSARs for parameters of high throughput toxicokinetic models using open-source descriptors

Daniel Dawson; Brandall L Ingle; Katherine A Phillips; John W Nichols; John F Wambaugh; Rogelio Tornero-Velez

doi:10.1021/acs.est.0c06117

. Author manuscript; available in PMC: 2022 May 4.

Published in final edited form as: Environ Sci Technol. 2021 Apr 15;55(9):6505–6517. doi: 10.1021/acs.est.0c06117

Designing QSARs for parameters of high throughput toxicokinetic models using open-source descriptors

Daniel Dawson ¹, Brandall L Ingle ², Katherine A Phillips ¹, John W Nichols ¹, John F Wambaugh ¹, Rogelio Tornero-Velez ^1,^*

PMCID: PMC8548983 NIHMSID: NIHMS1703517 PMID: 33856768

Abstract

The intrinsic metabolic clearance rate (Cl_int) and fraction of chemical unbound in plasma (f_up) serve as important parameters for high throughput toxicokinetic models, but experimental data are limited for many chemicals. Open-source quantitative structure-activity relationship (QSAR) models for both parameters were developed to offer reliable in silico predictions for a diverse set of chemicals regulated under U.S. law, including pharmaceuticals, pesticides, and industrial chemicals. As a case study to demonstrate their utility, model predictions served as inputs to the TK component of a risk-based prioritization approach based on Bioactivity: Exposure Ratios (BER), in which a BER < 1 indicates exposures are predicted to exceed a biological activity threshold. When applied to a subset of the Tox21 screening library (6484 chemicals) we found that the proportion of chemicals with BER <1 was similar using either in silico (1137/6484; 17.53%) or in vitro (148/848; 17.45%) parameters. Further, when considering only the chemicals in the Tox21 set with in vitro data, there was a high concordance of chemicals classified with either BER < 1 or > 1 using either in silico or in vitro parameters (768/848, 90.5%). Thus, the presented QSARs may be suitable for prioritizing the risk posed by many chemicals for which measured in vitro TK data are lacking.

Keywords: Metabolic clearance, Fraction unbound in plasma, machine learning, toxicokinetics, QSAR, Chemical risk prioritization

Graphical Abstract

graphic file with name nihms-1703517-f0001.jpg

1. INTRODUCTION

Environmental contamination associated with industrial processes and household products results in a complex chemical exposure that can impact public health.^{1, 2} Regulation of non-pesticidal environmental contaminants within the United States occurs under the Toxic Substances Control Act, as recently re-authorized by the Frank R. Lautenberg Chemical Safety for the 21st Century Act.^{3, 4} These regulations require the U.S. Environmental Protection Agency to evaluate risks to human health and the environment associated with both new and existing chemicals. With >86,000 chemicals in this domain⁵, traditional in vivo methods used to assess toxicity for individual chemicals are not feasible.⁶ Instead, a need exists for high throughput methods, including in vitro assays and in silico models, to prioritize chemicals posing the greatest potential public health risk for further review^6–12. High throughput in vitro screening initiatives such as the U.S. Federal Government Toxicology in the 21^st Century (Tox21) and Toxicity Forecaster (ToxCast) use batteries of assays to identify potential hazards across curated libraries of chemicals of interest¹³. A recent publication from the U.S. National Toxicology Program showed the value of such high throughput methods coupled with in silico methods in a risk prioritization framework¹⁴.

Toxicokinetic (TK) models that account for chemical absorption, distribution, metabolism, and elimination (ADME properties), sometimes referred to as physiological or physiologically-based toxicokinetic models (PBTK), are widely used to predict a compound’s concentration time-course in selected tissues based on a known or forecast exposure.^15–18 With these models, a measured in vitro effect concentration at an assumed relevant tissue can also be related to a corresponding environmental exposure in a process known as reverse dosimetry.^{19, 20} For most environmental contaminants, in vivo TK information is unavailable, so in vitro measurements and generic TK models are often used.²¹ The intrinsic hepatic clearance rate (Cl_int) and fraction of chemical unbound in plasma by proteins (f_up) are two key chemical-specific inputs for such modeling efforts. In vitro measured values for these parameters have already proven themselves useful for the risk prioritization of hundreds of non-pharmaceuticals in the ToxCast library.^{8, 14, 20, 22–24} Currently, however, similar data are unavailable for thousands of other chemicals of concern.^{14, 20, 25} Fortunately, in silico models can potentially fill this data gap in a fast and cost-effective manner.^{6, 12, 21, 26}

In silico models have shown potential for predicting these toxicokinetic parameters with approaches ranging from simple physicochemical property based equations to quantitative structure-activity relationships (QSAR) to isoenzyme specific binding models.^25–36 However, most models have focused exclusively on predicting these parameters for pharmaceuticals,^{28, 37–40} with prediction outside this domain relatively limited. Additionally, many existing models for Cl_int and f_up are constructed with proprietary data, descriptors, or software, which limits access to the models and evaluation of prediction quality within a desired chemical space.

In the present study, machine learning algorithms were used to construct accessible, open-source QSAR models for f_up and Cl_int with a focus on a broad set of chemical domains including pharmaceuticals, pesticides, and industrial chemicals. These models were then incorporated into an existing TK-based framework to evaluate the utility of the QSARs in high throughput public health risk prioritization.

2. MATERIALS AND METHODS

2.1. Data Collection and Curation

2.1.1. Hepatic Clearance: Cl_int

In vitro Cl_int values were collected from two sources. First, ChEMBL (version 21) ^41–43 is a “a manually curated database of bioactive molecules with drug-like properties”. Second, the ToxCast screening program⁴⁴, hereafter known as the “ToxCast dataset” included chemicals from a variety of domains that were reported in the literature.^{8–10, 24} Assembled values included both hepatic cell assays as well as assays using microsomes, and only included chemicals with 1^st order dynamics (See Supplemental Information (S1.1.1) for further details). All values were standardized into units of μL/min/10⁶ cells, with Cl_int values based on microsomes (2163/2917) converted to units of μL/min/10⁶ cells using an extrapolation factor: 1 mg/ml microsomal protein to 1 × 10⁶ cells⁴⁵. Lastly, chemical binding by biological substrates such as lipids and proteins can interfere with the calculation of clearance rates calculated via in vitro assays⁴⁵. To adjust for this, Cl_int values were divided by an assay specific (hepatic or microsomal) correction factor based on the lipophilicity characteristics of the chemical. See Supplemental Information (S1.1.2) for further details on chemical binding correction factors.

Next, we decided to use a classification approach to model Cl_int. Hepatic clearance is accomplished through enzymatic activity, carried out largely through an array of cytochrome P450 (CYP450) enzymes. Although targeting CYP450 activity has been a frequent approach for QSAR modeling of Cl_int⁴⁰, broadly modeling Cl_int is challenging because enzymes vary in their activity by chemical group⁴⁶, and cellular processes such as efflux mechanisms may influence measured Cl_int rates⁴⁷. However, a previous general QSAR model for Cl_int by Kirman et al. (2015)²⁶ fit over a broad domain of chemicals using regression was encouraging (R²=0.39). Because heteroskedastic distributions of model residuals impacted different regions of Cl_int values (e.g., fast vs. slow), we reasoned that model performance could be improved by recasting clearance prediction as a classification problem. By accepting a lower level of predictive precision imposed by classification bins of clearance rates, we expected to gain an overall increase in accuracy. Furthermore, we sought to broaden the model domain compared to Kirman et al. (2015) (VOCs, ToxCast Phase I and Phase II compounds) by including a nominally equal representation of pharmaceuticals randomly selected from ChEMBL.

Preliminary work showed that assembled in vitro Cl_int values for chemicals could be grouped by quartiles into 4 bins (See Supplemental Information (S1.1.3)), including “very slow”(< 3.9 μL/min/10⁶cells), “slow” (3.9–9.3 μL/min/10⁶), “fast” (9.3–21.7 μL/min/10⁶cells), and “very fast” (>21.7 μL/min/10⁶cells). In addition, the transition point between the slow and fast rates of metabolism (9.3 μL/min/10⁶cells) is biologically relevant, as the rate of average blood flow to the human liver is similar to this value when converted to units of μL/min/10⁶ cells (8.45)^48–50. See Supporting Information (S2.1) for a detailed accounting of this conversion. Because the rate of hepatic clearance cannot exceed the rate of blood flow to the tissue, predicted hepatic clearance becomes increasingly rate-limited by liver blood flow as Cl_int approaches and then exceeds this transitional range. However, because blood-flow rates vary, it is difficult to ascribe a specific threshold above which Cl_int equals to blood flow. Thus, in a parallel with the 4-bin classification described above, we created a 3-bin classification in which the “fast” and “very fast” clearing chemicals are combined into a single “fast” bin.

To construct and assess models for Cl_int, the overall dataset was split into a training set consisting of equal representation of both ToxCast and ChEMBL data (1600), with each bin of the 4-Bin training set explicitly balanced by clearance rate (very slow, slow, fast, very fast) and data source. For the 3-Bin training set, a combined fast bin was constructed by combining the fast and very fast bins of the 4-Bin training set. Two independent validation sets consisting of chemicals from either ToxCast (99) or ChEMBL (2113) were also assembled. The ChEMBL test set was relatively large (e.g., each bin could contain at least 248 chemicals), and it was explicitly balanced by bin. Because the ToxCast test was relatively small, it was not balanced by classification bin. However, each bin made up at least 8% of the dataset. See Supporting Information (S1.1.3) for a detailed description of how training and test sets for Cl_int were assembled.

2.1.2. Fraction Unbound in Plasma: f_up

In vitro values for f_up were assembled starting from the dataset of Ingle et al.²⁵, a collection of 1651 (1245 pharmaceuticals and 406 ToxCast) compounds curated from the literature ^{10, 22, 28, 51} This dataset was predominantly made up of pharmaceuticals, and we sought to train and test models over a broader chemical domain. Thus, this dataset was augmented with additional compounds from the more recent Wambaugh et al.²⁴, and reassembled into training and test sets. The training set consisted of a combined, balanced training set of 1305 chemicals (650 ToxCast, 655 pharmaceuticals), and test sets included 2 pharmaceutical test sets (199, 454), and a third test set consisting of the same 99 ToxCast chemicals used to evaluate the Cl_int model described above. See the Supporting Information (S1.2) for additional details on the assembly of training and test sets for f_up.

The remainder of the Methods and Materials section describe QSAR model development, evaluation, and application. Please see Figure 1 for a visual reference of these details.

Fig 1. — Overall methods workflow of study, with A) Random Forest model development and selection for f_up and Cl_int using assembled training sets; B) Model evaluation of source-specific optimal f_up and Cl_int models against test sets (following removal of chemicals outside applicability domain) and previously published models; C) Generation of bioactivity-exposure ratios (BER) using reverse dosimetry with the *httk* R package to evaluate model outputs in bioactivity-exposure space with the ToxCast test set D) Generation of BER’s for 6484 chemicals of the Tox21 set using the same methodology as (C), only 884 of which have known values for f_up and Cl_int.

2.2. QSAR Descriptor Calculation and Evaluation

2.2.1. Cl_int Model Development and Selection

Random Forest (RF) Classification was used to construct a QSAR model of binned values of Cl_int using descriptors from 4 open-source descriptor sets: PaDEL,⁵² OPERA,⁵³ ToxPrints, ^{54, 55} and MACCS.^{56, 57} Of these descriptors sets, PaDEL, ToxPrints, and MACCS are different systems of describing chemicals through chemical structure fingerprints. In contrast, OPERA represents a collection of physicochemical properties predicted by CDK-structural and PaDEL fingerprints.⁵⁸ We also combined descriptor sets together into an “All Descriptor sets” set. Following a pruning process to remove uninformative low variance (<5%) and strongly correlated (>95%) descriptors, the final descriptor sets included 710 PaDEL descriptors, 17 OPERA descriptors, 79 ToxPrints, 116 MACCS fingerprints, and 917 descriptors in the “All Descriptors sets” set. All descriptors in each descriptor set were centered and scaled to 1 standard deviation with respect to the training set.

To prevent overfitting while maintaining model generalizability, a recursive feature elimination (incorporating 5-fold cross validation) was used with the descriptors from each source to find the minimum subset of descriptors that allows for similar prediction accuracy as the full set of descriptors for that source. See the Supporting Information (S1.3) for a detailed description of this process. In brief, for each descriptor source a model was fit with the total number of available descriptors. Because this model had the highest accuracy for a set of descriptors, it was the source-specific best model. Next, a series of model subsets with decreasing numbers of descriptors (down to 1 descriptor) were iteratively fit based on the importance values of descriptors in the previous subset; that is, the least important descriptors in one model subset were excluded from next model subset. The source-specific optimal model was then defined as the model with the least number of descriptors that had an average accuracy within the 95% Confidence Interval of the source-specific best model.

The R package randomForest^{56, 59} was used to fit models for the recursive feature elimination, and average model accuracy was assessed using the caret^{60, 61} package. Thus, a total of 10 source-specific optimal models were produced using the above described process (5 descriptor source sets * 2 binning sets (3 bins, 4 bins)). Out of all the source/bin-specific optimal models, the one with the highest average accuracy was defined as the best overall QSAR model for Cl_int. This best overall model was utilized for all downstream in silico Cl_int predictions. The statistical significance of the accuracy of prediction was assessed against the no-information rate, which equaled the proportion of chemicals in the largest bin to the overall set.⁶⁰

Following the selection of the best overall model, the applicability domain (AD) of all best source/bin-specific models were determined using a methodology described by Roy et al. 2015.⁶² This methodology uses the ranges of standardized values of model descriptors for chemicals in the training set to determine whether a novel chemical would be reasonably described by the model. Chemicals from the test sets that were determined to be outside the AD of each model were removed. Then, the accuracy of the overall best model and each of the other source/bin-specific optimal models was calculated using the assembled test sets. In addition to the accuracy, which calculates the overall ability to classify the test, we calculated the multi-class weighted average precision of each model.^63–65 This metric takes the proportion of the bin of the test set into account and indicates the average probability of a classified chemical belonging to the assigned bin.

2.2.2. Cl_int Model Evaluation

Most recently published models for Cl_int have been structured as regression models instead of classification models¹⁴, making direct comparison with the present classification-based QSAR models (producing binned values) problematic. As an alternative, we compared the impact of Cl_int predictions within a TK model context, as this is a primary application of the new Cl_int models. The TK model outputs (e.g. steady state plasma concentrations (C_ss)) that depend on Cl_int values were evaluated utilizing in vitro data and in silico predictions from the new classification models and a previously published regression model. This analysis was conducted using a simplified TK model (“3compartmentss”) implemented in the high throughput toxicokinetics (httk) R package, version 2.0.1.⁵⁰ This model also includes passive glomerular filtration as a secondary clearance mechanism, and uses a Monte Carlo approach to simulate a distribution of C_ss values for chemicals across a population. It assumes an intravenous, continuously administered reference dose of 1 μg/kg/day. See Pearce et al. (2017)⁵⁰ for complete details of this model. The C_ss values for in-domain chemicals in the ToxCast test set were determined with this TK model using Cl_int predictions from the best overall model by substituting the medians of in vitro Cl_int values within the predicted bin. We repeated this exercise using Cl_int predictions made by a model by Sipes et al (2017).¹⁴ That model focused on modeling cumulative CYP450 enzyme activity, and used proprietary software (ADMET Predictor 7.2, Simulations Plus Inc, Lancaster, CA) to produce log P and pka parameters for 8758 chemicals. For consistency in our comparison, in vitro values for f_up available in the ToxCast dataset were utilized to estimate C_ss values. The two sets of C_ss predictions were compared against C_ss values produced using the in vitro Cl_int data using linear models of log₁₀ transformed data. The coefficient of determination (R²) of each model was used to compare how well the modeling approaches approximated in vitro Cl_int parameter values within the context of the TK model.

2.2.3. f_up Model Development and Selection

Random Forest (RF) Regression was used to construct a QSAR model of f_up using the descriptors from the same 4 open-source descriptor sets used for Cl_int: PaDEL, OPERA, ToxPrints, and MACCS. PaDEL descriptors were incalculable for some compounds (primarily those with large ring structures) resulting in 1305 (from 1310) drug/ToxCast training samples, 99 ToxCast test samples, 199 (from 200) drug test compounds (I), and a second drug test set of 454 (from 458) compounds (II). Pruning of descriptor sets to remove uninformative low variance (<5%) and strongly correlated (>95%) descriptors resulted in 648 PaDEL, 13 OPERA, 120 ToxPrints, and 83 MACCS features to be used as source-specific model descriptors. These descriptors were also compiled together into an “All Descriptors set” containing 864 descriptors. All descriptor sets were centered and scaled by 1 standard deviation with respect to their values in the training set. To account for a skew towards compounds with high protein binding affinity we considered several transformations, including converting experimental f_up into pseudo equilibrium constants (ln K_a),^{25, 66, 67} the logarithmic transformation (log f_up),^{36, 68} or by the square root transformation (√ f_up).⁶⁹ Each transformation was evaluated following model selection (see next section) for best fit. See Supplemental Information (S1.4) for more details of f_up transformation.

To determine an overall best model for f_up, we used a process similar to that described for Cl_int, with recursive feature elimination carried out using the caret^{60, 61} package of R. That is, for each descriptor set we compared models built with all descriptors (the source-specific best model) against models built with decreasing numbers of descriptors to find a subset that produced a model with similar performance (the source-specific optimal model). For f_up, source-specific optimal models were selected as the models with the smallest number of descriptors in each descriptor set for which there was no statistical difference with respect to the average root mean squared error (RMSE) compared to the source-specific best model. For comparative purposes, mean absolute error (MAE) was also calculated for all models. Then, from the source-specific optimal models, a best overall QSAR model for f_up was selected as the model with the highest overall average coefficient of determination (Q², from 5-fold CV) produced from the 5 descriptor sets. This model was utilized for all downstream in silico f_up predictions. The ADs of all source-specific best models (including the best overall model) were calculated using the same methodology as described in section 2.2.1, and applied to the assembled test sets to remove chemicals outside the model domains. Finally, the fit of the source-specific optimal models, including the best overall QSAR model for f_up, was assessed against the AD-filtered test sets.

2.3. Bioactivity: Exposure Ratio (BER)

The Bioactivity: Exposure Ratio (BER) is a metric for risk prioritization that compares a compound’s relative bioactive dose and expected external exposure. A bioactive exposure equivalent of the bioactive dose, expressed as an oral equivalent dose (OED; mg/kg body weight/day), can be determined by reverse dosimetry using TK models and measured in vitro toxicity information. The bioactive exposure equivalent can then be compared to the known or anticipated exposure concentration (also expressed as OED in mg/kg body weight/day) to provide a dimensionless, chemical-specific metric of risk, the BER. Given that TK model information is sparser than toxicity information, we propose expediting TK model development using in silico parameters to estimate BER for new compounds. In this study, BER was defined as:

B E R = O E D_{B i o a c t i v e D o s e} / O E D_{E x p e c t e d E x p o s u r e} .

[1]

In equation 1, OED_{Bioactive Dose} is the oral equivalent dose of a bioactive concentration from an in vitro toxicity screening assay, and OED_{Expected Exposure} is the oral equivalent dose at the highest estimated exposure dose rate. The bioactive dose for a chemical was defined as the lowest 10^th percentile (Q10) of the half-maximal activity concentrations (AC₅₀) across the ToxCast screening assays for that chemical.⁷⁰ The Q10 represents activation of a sufficient number of assays such that a corresponding in vivo effect is plausible. A BER < 1 (or on a log10 scale, < 0) indicates that the highest expected exposure level for a chemical exceeds the bioactive concentration; thus, chemicals with a lower BER may be of higher risk relative to chemicals with higher BER values.

The current study assessed the utility of the developed QSARs in a risk prioritization context by calculating BERs for the ToxCast test set of 99 chemicals according to several different scenarios. These scenarios used measured in vitro data and/or in silico data from the best overall QSAR models, and included: 1) in vitro f_up and Cl_int values, 2) in vitro f_up and binned in vitro Cl_int values ; 3) in silico values for f_up and Cl_int ; 4) in silico f_up values and y-randomized Cl_int values; 5) y-randomized f_up values and in silico Cl_int values, and 6) y-randomized f_up and Cl_int values. Comparisons with y-randomized values employed in vitro f_up and/or Cl_int values that were randomly sorted to assess the relative sensitivity of BER estimation to Cl_int and f_up parameters. First, to evaluate how predicted values from our QSAR models influenced calculated estimates of steady state chemical concentrations in plasma (C_ss), C_ss values were calculated using each of the scenarios above for the 99 chemicals in the set using a TK model in the R package httk.^{50, 71} In this instance, the TK model assumed an OED of 1 mg/kg per day. Then, we regressed C_ss values calculated using in vitro parameter values data (i.e., scenario 1), against C_ss values generated using parameter values described in scenarios 2–6 above.

To extend the evaluation of the QSAR’s into bioactivity space, the ToxCast dataset was used to identify the Q10 of plasma concentrations known to cause bioactivity⁷² for each chemical. These values were then converted into OEDs via reverse dosimetry using the same TK model as above. The TK model utilized the combinations of Cl_int and f_up parameter listed for scenarios 1–6 above. These OED’s were divided by a combination of empirical and modeled exposure estimates^{50, 73, 74} to produce BER values. Further details on the data sources and methods used to prepare toxicological data and exposure data for this analysis are found in the Supporting Information (S1.5). Following the calculation of BERs, those generated from in vitro parameter values (i.e., scenario 1) and scenarios 2–6 were compared in two ways, including 1) calculating Spearman’s rank correlation coefficients (ρ)⁷⁵ to assess the general of the order of BERs and 2) calculating RMSE to quantify the difference in BER values. These approaches allowed for the evaluation of both Cl_int and f_up QSAR’s together and separately, and in both a toxicokinetic and bioactivity context.

To evaluate the potential of these QSAR models to provide TK estimates for chemicals for which empirical data are lacking, BERs were calculated for chemicals in the Tox21 dataset.⁷² The entire dataset is comprised of approximately 10,000 chemicals across a wide domain of applicability, including industrial chemicals, pesticides, and consumer products.¹³ The evaluated dataset was limited to a subset of 6493 chemicals for which both exposure and toxicological data were available, sufficient physicochemical property information was available to calculate the OED using the toxicokinetic model in httk⁷¹, and chemicals were in the AD’s of both QSAR models. For these chemicals, OEDs and corresponding BERs were calculated using QSAR-predicted Cl_int and f_up. For comparison, OEDs and BERs were also calculated for a subset of 848 chemicals (also within the AD of the QSAR models) using available in vitro Cl_int and f_up values.

Lastly, the Organization for Economic Cooperation and Development (OECD) recommends that QSAR models intended for regulatory purposes meet 5 criteria. We discuss how our models meet these criteria.

3. RESULTS AND DISCUSSION

3.1. Random Forest Classification model for Cl_int

3.1.1. Model description and evaluation

The model selection process determined that the overall best model was the 3-Bin All Descriptor sets model with 30 descriptors (i.e., “3-Bin All Descriptors (30)”). See Supporting Information (S2.2) for the training data set for this model. This model had an average accuracy = 58.5% when applied to the training set. As expected, the accuracies of source-specific 3-Bin optimal models were higher than their corresponding 4-Bin models, with all models being significantly better than random chance (Table 1 (top)). In addition, for each Source/Bin-specific optimal model, the number of descriptors needed to achieve a statistically similar level of predictive ability as the Source/Bin-specific best model was markedly fewer than the total available for the source. See the Supporting Information for graphical depictions of the model selection process (S2.6), and more detailed description of the descriptors included in this model (S2.7).

Table 1.

Source-specific optimal QSAR models for Cl_int applied to training and test sets


Training Set

Model Bins	Descriptor Source	Descriptors Considered	Descriptors Selected	Ratio of selected to considered	Accuracy	95% CI Boundary	NIR	P-value

4	a	917	30	0.033	0.511	0.024	0.25	<0.001
4	t	79	30	0.380	0.407	0.022
4	m	116	40	0.345	0.475	0.022
4	p	710	20	0.028	0.495	0.022
4	o	17	10	0.588	0.470	0.028

3	a	917	30	0.033	0.585	0.025	0.5	<0.001
3	t	79	20	0.253	0.516	0.015
3	m	116	50	0.431	0.539	0.021
3	p	710	30	0.042	0.572	0.023
3	o	17	10	0.588	0.566	0.025

			Test Sets

		ChEMBL				ToxCast

Model Bins	Dataset	Accuracy	NIR	P-value	Average Multiclass Precision	Accuracy	NIR	P-value	Average Multiclass Precision
4	a	0.376	0.250	<0.001	0.620	0.551	0.388	0.001	0.488
4	t	0.335		<0.001	0.405	0.465	0.384	0.062	0.434
4	m	0.314		<0.001	0.509	0.535	0.384	0.002	0.501
4	p	0.336		<0.001	0.537	0.541	0.388	0.002	0.507
4	o	0.341		<0.001	0.434	0.537	0.389	0.003	0.546

3	a	0.489	0.330	<0.001	0.620	0.694	0.531	0.001	0.714
3	t	0.419		<0.001	0.490	0.616	0.535	0.065	0.625
3	m	0.447		<0.001	0.604	0.606	0.535	0.095	0.603
3	p	0.470		<0.001	0.620	0.643	0.531	0.016	0.665
3	o	0.437		<0.001	0.489	0.684	0.526	0.001	0.689

Open in a new tab

Top Panel: The number descriptors available, number of descriptors selected, accuracy, 95% confidence boundary, and p-value of source-specific optimal models based on the model selection process. Bottom Panel: Source-specific optimal models applied to the ChEMBL and ToxCast Test sets.

NIR=No-information rate; proportion of the class with the largest proportion of cases. Note that the test sets (ChEMBL and ToxCast) only included chemicals that fell into the applicability domain of the corresponding model.

After filtering the test sets with the applicability domain of the model (Supplementary Information (S2.3)), the model had the highest accuracy when applied to the ToxCast (69.4%) test set. The multiclass precision metric was higher for the ChEMBL test set (0.615) but was still lower than the metric for the ToxCast test set (0.714). A closer look at the confusion matrices for this model against the test sets (Supplemental Information S2.8), shows that the apparent worse performance of the model with the ChEMBL test set was likely due to poor performance of chemicals in the “very slow” category (11% accuracy). In contrast, the “very slow” category was predicted reasonably well for the ToxCast test set (55.3%), and both sets predicted chemicals in the combined “fast” and “very fast” category with high accuracy (88.5% in ToxCast, 81.4% in ChEMBL). The imbalance in the ToxCast test set (e.g., the “slow” bin contained only 8% of the set) may have resulted in an accuracy metric that is biased high. However, this concern is offset by the similar values of the multiclass precision metric between the ToxCast and ChEMBL test sets, which takes bin proportion into account. Overall, the evaluation of the model performance suggests that the model is less consistently accurate for more slowly clearing chemicals.

Lastly, an evaluation of the best overall model against a recently published model by Sipes et al.¹⁴ showed that in silico inputs from the Sipes model resulted in a linear model with an R² = 0.249, while in silico Cl_int values from our model had an R² = 0.55. Thus, our in silico Cl_int model is a marked improvement over the Sipes model when applied to the chemicals of the ToxCast test set. See Supporting Information (S2.11) for details on these linear models. Previously published QSAR models constructed specifically for pharmaceuticals have better applicability across a range of Cl_int rates^{76, 77}. However, many pharmaceuticals are known to be metabolized by a relatively few CYP450 enzymes,⁴⁰ with some QSAR models designed for chemicals metabolized by specific enzymes⁴⁶. The Sipes model is somewhat mechanistic as well, with each parameter representing activity by different CYP450 liver enzymes known to influence clearance. By training a mix of pharmaceutical and non-pharmaceuticals against a wide set of descriptors, the Cl_int model presented here represents a tradeoff in accuracy and precision compared to pharmaceutical models, but still an improvement for general chemical general applicability. In addition, the model presented heres uses open source descriptors, while the Sipes model utilized proprietary descriptors.

3.2. QSAR model for f_up

3.2.1. Model description and evaluation.

Using recursive feature elimination with all descriptors, comparison of transformations of experimental f_up indicated that the square root transformation provided the best fit (Supplemental Information (S2.14)). Using the √ transformation, the model selection process (see Supplemental Information (S2.9) for a graphic depiction) showed that the best overall model was constructed from all descriptor sources and using 30 descriptors, hereafter referred to as the “All (30)” model. See Supplementary Information (S2.4) for all training data utilized for this model. After filtering the test sets with the applicability domain of this model and applying it to the remaining chemicals (Supplementary Information (S2.5)), the model was further supported as the best overall model (Table 2). The model consisted of mostly PaDEL descriptors (26) but showed a modest benefit from the inclusion of 4 OPERA descriptors (Q²=0.58) relative to the PaDEL-specific model (Q²= 0.56). The top two OPERA descriptors in the All (30) model were associated with water/octanol partitioning and water solubility and ranked, respectively, 1^st and 4^th in importance. Similarly, 2 of 26 PaDEL descriptors in the All (30) model associated with octanol-water partitioning, CrippenLogP and XLogP, ranked highly (2 & 5). Models based on MACCS or ToxPrints were less predictive and did not contribute among the top 30 descriptors with Q² of 0.403 and 0.437, respectively. The All (30) model outperformed the other source-specific models for the ToxCast and Drug I and II test sets (Table 2). All descriptors used in the model and their ranks are shown in the Supporting Information (S.2.10). The All (30) model was used in all subsequent BER analysis and Tox21 predictions.

Table 2.

Random Forest Performance Metrics for f_up Predictions

DataSet	Metric	All (30) 30 of 864	PaDEL (30) 30 of 648	MACCS (30) 30 of 120	ToxPrint(30) 30 of 83	OPERA (11) 11 of 13
Training^a	MAE	0.131	0.139	0.171	0.166	0.146
	RMSE	0.206	0.213	0.249	0.242	0.223
	Q²	0.584	0.556	0.403	0.427	0.520
	N	1305	1305	1305	1305	1305

Drug I	MAE	0.164	0.166	0.203	0.211	0.186
	RMSE	0.228	0.233	0.274	0.293	0.262
	R²	0.560	0.549	0.423	0.342	0.464
	N	189	189	199	199	198

Drug II	MAE	0.157	0.158	0.201	0.204	0.189
	RMSE	0.219	0.222	0.269	0.279	0.259
	R²	0.613	0.599	0.438	0.387	0.475
	N	432	431	454	454	451

ToxCast	MAE	0.112	0.118	0.154	0.141	0.130
	RMSE	0.187	0.191	0.241	0.218	0.203
	R²	0.591	0.562	0.283	0.463	0.495
	N	97	97	99	99	97

Open in a new tab

Training set metrics are the mean values across the 5 testing sets from 5-fold CV; mean absolute error (MAE) and root-mean-square error (RMSE) for random forest are shown. b Q2 is the cross-validated R² as per Golbraikh and Tropsha (2002). ⁷⁸ Column headers indicate descriptor set and number of best descriptors employed. Samples sizes (N) for test sets (Drug I, Drug II, and ToxCast) indicate number of chemicals included in the applicability domain as per Roy et al.(2015).⁶²

3.2.2. Comparison of f_up QSAR model to recently published models.

Several QSAR models of plasma protein binding and f_up published in recent years were either trained on pharmaceuticals, involved the use of proprietary predictors, or both.^{28, 36, 69, 79, 80} Ghafourian and Amin (2013)⁷⁹ employed proprietary software and descriptors to construct QSAR models for f_up on a moderately sized pharmaceutical dataset (662 training set , 132 test set), and reported R² values ranging from 0.58–0.72 (training) and 0.58–0.65 (test) for different methods including random forest regression. Toma et al. (2019)⁶⁹ also employed proprietary software and descriptors to model f_up on a set of 670 pharmaceuticals, and reported a Q² of 0.6. Like our study, these authors observed superior fits for the √f_up transformation compared with other assessed transformations (Supplemental Information (S2.14)). In this study, we expanded the chemical training domain employed by Ingle et al. (2016) and used similar test sets (Drug I test set was equivalent between the two studies). While our model (R²=0.59) performed better than Ingle et al.²⁵(R²=0.39 and 0.56) for ToxCast-based testing data, Ingle et al. (2016)²⁵ (R² =0.62) performed better for the Drug I test set(this paper, R²=0.56). This likely reflects the predominant influence of pharmaceutical compounds in Ingle et al. (2016)²⁵ training data relative to the more balanced ToxCast dataset utilized here. In addition, the model given by Ingle et al. (2016)²⁵ was constructed using a proprietary prediction software (MOE), while our model utilizes only open-source descriptors. Similarly, the model provided by Watanabe et. al. (2019),³⁶ was also constructed using open-source descriptors (including PaDEL), however, we did not observe a benefit from applying the log(f_up) transformation as they report. While our best model for f_up had slightly a lower R² than the best log-based open-source models of Watanabe et al. (2018)³⁶ (R²=0.691), the training set used here encompasses a more diverse chemical space of both pharmaceutical and environmental chemicals.

3.2.3. Compliance with OECD guidelines and model interpretation

Our models generally meet the guidelines suggested by the Organization for Economic Cooperation and Development (OECD) for the development of QSAR models used for regulatory purposes (Table 3). Although a mechanistic interpretation of the model (criteria 5), is less straightforward here than using methods such as multiple linear regression³⁵ due to the lack of directionality and the large number of descriptors, general inferences can be made about the relative importance of particular mechanisms as suggested by high-ranking descriptors⁵³. For Cl_int, 4 of the top 10 descriptors (including 3 of the top 5) are measures of lipophilicity, while 5 are electronic and/or topological descriptors (See Supplemental Information S2.7). Lipophilicity measures are likely important here because they influence the transport of chemicals through cell membranes,⁴⁶ and because they are indicators of differences in substrate affinities for particular CYP450 enzymes.⁸¹ Likewise, electronic and topological descriptors may capture binding affinities on active residues of CYP450 enzymes. In the f_up QSAR, lipophilicity descriptors (QSAR-predicted LogP, and PaDEL descriptors CrippenLogP, and XLogP) and water solubility made up the top four descriptors, with electronic/topographical descriptors making up the remaining 5 of the top 10 descriptors. Lipophilicity influences the partitioning of chemicals between the aqueous and protein components of plasma, and is known to be a primary driver of plasma protein binding.⁶⁷ Concurrently, electronic/topographical descriptors reflect the influence of molecular structure and charge interactions on plasma protein binding.⁶⁷ Thus, the QSAR model appears to have captured critical aspects governing f_up.

Table 3:

OECD Guidelines for QSAR models to be used in regulatory contexts

Number	OECD Guidelines	Met?	Explanation
1	A defined endpoint	Yes	Models explicitly predict Cl_int and f_up
2	An unambiguous algorithm	Yes	Algorithms are publicly available and published in R packages
3	A defined domain of applicability3	Yes	Used method by Roy et al. (2015)⁶²
4	Appropriate measures of goodness-of–fit, robustness and predictivity	Yes	5-fold cross validation were used to construct models (robustness), accuracy and RSME metrics were applied to models (goodness of fit), independent test sets used to evaluate models (predictivity)
5	A mechanistic interpretation, if possible	Yes	Highly ranked descriptors with mechanistic interpretations; see text for specific details

Open in a new tab

Organization for Economic Cooperation and Development (OECD) guidelines for QSAR models to be used for regulatory purposes, along with if and how each are met by the models presented here.

3.3. Risk-Based Prioritization Approach

3.3.1. Evaluation of QSAR model produced C_ss values.

Linear regression analysis (in log₁₀ space) was used to evaluate QSAR-produced predictions translated to a TK context (See section 2.3 for details). First, C_ss values produced using in vitro parameters were generally well represented when in vitro values of Cl_int were binned (R² = 0.615). Thus, if reasonably accurate values for Cl_int (via medians of bin) can be predicted from structure, it would be possible to make reasonable predictions of C_ss. Next, C_ss values produced using in vitro parameters was predicted with low-to-moderate success using in silico parameters (R²=0.195). Lastly, when either f_up (R²=0.006), Cl_int (R² = 0.127) or both parameters (R²=0.038) were supplied with y-randomized values, predicted C_ss were much less predictive of in-in vitro data. Complete details on the linear models discussed here are presented in the Supplemental Information (S2.11).

3.3.2. Evaluation of Bioactivity-Exposure Relationship using the ToxCast Test Set

For scenarios 2–6 described in 2.3, BERs were calculated and assessed against BERs calculated with continuous in vitro parameters (i.e., scenario 1) using correlation (Spearman’s ρ) and RMSE with the ToxCast test set (See Supplementary Information S2.12). The highest correlation/lowest RMSE (ρ=0.938, RMSE=17704) was when the in vitro data were binned. A slightly lower correlation/moderately higher RMSE occurred (ρ=0.852, RMSE=44115) when the QSAR models supplied the parameter values. Of the 91 chemicals in the ToxCast test set with active assay data, 16 had exposure estimates that exceeded the Q10 of the estimated oral equivalent dose (OED) when continuous in vitro parameters were used, and thus were negative on a log₁₀ scale (Fig 2A). When Cl_int and f_up were replaced with QSAR supplied values, negative log₁₀ BER values were predicted for all but 3 chemicals (that is, an 18.75% false negative rate). Meanwhile, negative log₁₀ BER values (that is, false positives) were predicted for only 7 additional chemicals (corresponding to a false positive rate of 9.3 %) (Fig 2B).

Interestingly, only modest reductions in correlation occurred when either Cl_int (ρ = 0.68), f_up (ρ = 0.765), or both (ρ = 0.72) were y-randomized (Fig 2C). The replacement of Cl_int with y-randomized values resulted in 3 false negatives (18.75% false negative rate), while the replacement of f_up or both with y-randomized values resulted in 5 false negatives (31.25% false negative rate). In addition, replacing either or both with y-randomized values resulted in 8–13 false positives (corresponding to 10–17.3% false positive rate). In contrast, while the y-randomization of f_up (RMSE=48866) or both f_up and Cl_int (RMSE=39752) resulted in RMSEs of similar magnitude to the completely QSAR supplied model, the RMSE was an order of magnitude when Cl_int (RMSE=234798) alone was y-randomized. Overall, these results demonstrate that if values for f_up and Cl_int are chosen that simply lie within the range of biological plausibility (i.e., y-randomized points were selected at random from in vitro values), a reasonable ordering of chemicals by bioactivity and subsequent risk may be made. However, parameter values provided by the QSAR models resulted in a better relative ordering of chemicals by BER, fewer wrongly classified BER estimates (either < or > 0), and estimates that were closer to those produced using measured in vitro values, particular for Cl_int. Thus, the QSAR models presented can significantly contribute toward chemical prioritization.

3.3.3. Bioactivity-Exposure Relationship Risk Prioritization Applied to Tox21 Dataset

Lastly, a significant portion of the Tox21 dataset used here (see Supplemental Information S.13) is currently missing either in vitro Cl_int or f_up values (5636/6484, 87.4%). However, the proportions of chemicals with httk-calculated BER values < 1 (that is, where expected exposures exceed bioactive doses) was similar between the subset of chemicals using available in vitro parameter values (148/848; 17.45%)(Fig 3A) and when all chemicals used in silico parameter values (1137/6484; 17.53%)(Fig 3B). In addition, for the chemicals for which in vitro data were available, the overall concordance in BER classification (i.e., either > or < 1) between the in vitro and in silico predicted values was high (90.5%). Fifty-five chemicals (6.48% of in vitro dataset) chemicals that were predicted to have BERs < 1 when in vitro data are used were predicted to have BERs > 1 when in silico values are used. This corresponded to a false negative rate of 37.16%. Conversely, 25 chemicals (3.0% of in vitro dataset) predicted by in vitro data to have BERs > 1 where predicted to have BERS < 1 when in silico parameter values were used. This corresponded to a false positive rate of 3.6%. A closer look at the false negatives showed that 20% had in silico predicted BERs of between 1–2, while about 28% of false positives had in vitro predicted BERs between 1–2; thus, these were near misses. Increasing the precision of future classifications may be important for better risk prioritization of some chemicals. On the other hand, very rapid in vitro Cl_int estimates may greatly exceed reasonable expectations for liver blood flow rates which provide an upper limit on possible hepatic clearance and therefore reduce the impact of changes in accuracy of predictions at the upper extreme. Overall, however, the relatively low proportion of false negatives and positives may indicate that the current precision of the Cl_int QSAR model adequately provided predictions in bioactivity space.

Fig 3. — Expected exposure concentrations (blue) versus bioactivity concentrations (red) for 6484 chemicals of the Tox21 dataset. The source of Cl_int and f_up values provided as inputs into C_SS in the *httk* package differed, representing (A) continuous *in vitro* parameter values and (B) QSAR-predicted parameter values. Chemicals in both (A) and (B) are ordered by increasing bioactivity-exposure ratio (BER)as predicted in (B) (that is, for all 6484 chemicals). White points in (A) are drawn as placeholders for all the chemicals without available toxicokinetic data (5636) with which to estimate bioactivity concentrations. These points are positioned at concentrations 25% greater than the highest exposure concentration estimated for that chemical. Black and yellow points mark 10^th quantile toxicological values (Q10) calculated using *in vitro* (A) or QSAR-predicted (B) parameter values. Black points indicate chemicals for which Q10 values ARE NOT exceeded by maximum exposures. Yellow points indicate chemicals for which Q10 values ARE exceeded by maximum exposure values. Note: for figure clarity, chemicals with exposure values < 10⁻⁷ (9) or bioactivity concentrations > 10⁷ (17 A, 77 B) were excluded from the figures. A list of chemicals and data used to construct these plots is available in Supplemental Information section S2.13.

BERs of concern are driven by either high predicted exposure or potent bioactivity. In both cases, there can be significant uncertainty, as indicated by wide distributions in both bioactivity and exposure for many chemicals in the Tox21 dataset (Fig 3). This uncertainty in large part derives from the lack of empirical data used in model-based approaches of estimating exposure and bioactivity. In the case of bioactivity, raw Cl_int and f_up values in databases, themselves produced from experimentally variable assays, were processed into comparable values suitable for modeling using several simplifying assumptions (see Section 2.1). Further, the underlying suite of parameter values employed in the reverse-dosimetry TK modeling methodology are often modeled estimates themselves, and thus subject to their own uncertainty. In the case of exposure, uncertainty is derived from the empirical data and methodology available from which exposure estimates are made. As described in the Supporting Information (S1.5.2), some estimates of exposure (~10%) were derived from empirical data (The National Health and Nutrition Examination Survey(NHANES)⁷⁴), but the majority of exposure predictions for the chemicals in the ToxCast test set (as well as the Tox21 set) chemicals were derived from a consensus-based modeling approach using the Systematic Empirical Evaluation Model (SEEM) Framework.⁷³ When considering the Tox21 set, only about ~1.3% of the 6484 chemicals included in the BER analysis had empirical exposure data included in NHANES. Thus, while there is variation in range and quality of bioactivity ranges for individual chemicals, uncertainty in the aggregate is also driven by a lack of empirical exposure data.

Several factors compensate for uncertainty in the values of Cl_int and f_up used to construct models and the assumptions used to calculate bioactivity and exposure. First, by employing several independent test sets to evaluate Cl_int and f_up, and explicitly comparing the use of in vitro versus in silico parameter values within the TK model, we were able to quantify individual and combined model performance in both TK and risk prioritization contexts. Next, the TK model used for all C_ss and reverse dosimetry calculations (httk) uses a Monte Carlo approach to estimate probabilistic distributions of OEDs, allowing us to be conservative in the values we choose for inclusion in the BER. Finally, these models are intended to inform risk prioritization rather than risk assessment. That is, while there in insufficient confidence in BER’s estimated here to warrant regulatory activity, a larger magnitude of exceedance of exposure to bioactivity may warrant prioritization for risk assessment attention.

Finally, limited human data, the high cost of in vivo animal studies, and large knowledge gaps for current and emerging pollutants makes even risk prioritization of environmental contaminants challenging. When used in conjunction with human exposure estimates and in vitro toxicity data, TK models can be employed to evaluate risk across large chemical inventories as part of a strategy to focus limited in vivo testing resources.^{8, 12, 14} The models provided by the httk package of program R is an example of a TK modeling platform that aims to make reasonable estimates of in vivo chemical concentrations with minimal information. At a minimum, httk requires only values for Cl_int and f_up to estimate steady-state dynamics. By combining this with available environmental exposure and toxicological estimates, chemical risk prioritization capability is within reach for an increasing number of chemicals. Unfortunately, in vitro Cl_int and f_up values are still not available for a significant number of chemicals, as evidenced by the 88% of the Tox21 dataset used in this study without available data. The Tox21 library is itself a subset of the larger Toxic Substances Chemical Control Inventory^3–5 chemical list that must be prioritized to identify chemicals of concern.¹³ Until TK data are available for these chemicals it is critical that reliable in silico-based estimations are available. These QSAR models meet this need in a way that in both transparent and publicly accessible. While the limitations of these models are currently unsuitable for setting regulatory limits, they are well suited to help inform regulatory processes, including the implementation of safety standards mandated through the Toxic Substances Control Act.⁸²

Supplementary Material

Supplement1

NIHMS1703517-supplement-Supplement1.docx^{(164.6KB, docx)}

Supplement2

NIHMS1703517-supplement-Supplement2.xlsx^{(8.8MB, xlsx)}

SYNOPSIS:

Open source QSAR models that produce in silico parameters for toxicokinetic models can contribute to transparent, high throughput risk prioritization of chemicals over a broad chemical domain.

ACKNOWLEDGEMENT

Many thanks go out to Brandon Veber for help with development of this project, Woody Setzer for thoughtful discussions, and Cecilia Tan and Barbara Wetmore for providing feedback on the manuscript during the internal review process.

FUNDING SOURCES

The U.S. Environmental Protection Agency through its Office of Research and Development funded and managed the research described here.

Footnotes

SUPPORTING INFORMATION

Additional details used in the methods are found in the Microsoft Document files “S1_Dawson et al._Supporting_Information.docx”. Additional information, including datasets and graphical results are found in the Microsoft Excel file “S2_Dawson et al. Supporting Information.xlsx”. This is an MS Excel file with the following sheets: S2.1 illustrates Cl_int hepatic flow calculations, S2.2 – 5 include training and test data sets; S2.6–7 include figures illustrating Cl_int model selection criteria and assemblages of model descriptors; S2.8 includes confusion matrices for evaluation Cl_int model, S2.9–10 include figures illustrating f_up model selection criteria and assemblages of model descriptors (with ranges); S2.11 includes tables of model assessments of the Cl_int test set, S2.12 includes information relevant to BER calculations for the ToxCast test set, S2.13 includes information relevant to BER calculations for Tox21 chemicals, and S2.14 provides information on different transformations for f_up.

DECLARATION OF INTEREST

This publication was subjected to an administrative review and approved for submission by the U.S. Environmental Protection Agency. The research presented here are the work of the author and does not necessarily represent Agency policy or opinion. The authors declare no competing financial interests.

REFERENCES

1.Egeghy PP; Judson R; Gangwal S; Mosher S; Smith D; Vail J; Hubal EAC, The Exposure Data Landscape for Manufactured Chemicals. Sci. Total Environ 2012, 414, 159–166. [DOI] [PubMed] [Google Scholar]
2.Muir DCG; Howard PH, Are There Other Persistent Organic Pollutants? A Challenge for Environmental Chemists. Environ. Sci. Technol 2006, 40, (23), 7157–7166. [DOI] [PubMed] [Google Scholar]
3.“Toxic Substances Control Act”(Tsca). Public Law 94–469, October 11, 1976. In Congress, U. S., Ed. 1976. [Google Scholar]
4.Frank R Lautenberg Chemical Safety for the 21st Century Act. Public Law. 114–182 June, 22, 2016. In Congress, U. S., Ed. [Google Scholar]
5.Tsca Chemical Substance Inventory In US Environmental Protection Agency,: 2021. [Google Scholar]
6.Wambaugh JF; Wetmore BA; Pearce R; Strope C; Goldsmith R; Sluka JP; Sedykh A; Tropsha A; Bosgra S; Shah I; Judson R; Thomas RS; Setzer RW, Toxicokinetic Triage for Environmental Chemicals. Toxicol Sci 2015, 147, (1), 55–67. [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Phillips MB; Leonard JA; Grulke CM; Chang DT; Edwards SW; Brooks R; Goldsmith MR; El-Masri H; Tan YM, A Workflow to Investigate Exposure and Pharmacokinetic Influences on High-Throughput in Vitro Chemical Screening Based on Adverse Outcome Pathways. Environ. Health Perspect 2016, 124, (1), 53–60. [DOI] [PMC free article] [PubMed] [Google Scholar]
8.Rotroff DM; Wetmore BA; Dix DJ; Ferguson SS; Clewell HJ; Houck KA; LeCluyse EL; Andersen ME; Judson RS; Smith CM; Sochaski MA; Kavlock RJ; Boellmann F; Martin MT; Reif DM; Wambaugh JF; Thomas RS, Incorporating Human Dosimetry and Exposure into High-Throughput in Vitro Toxicity Screening. Toxicol Sci 2010, 117, (2), 348–358. [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Wetmore BA; Wambaugh JF; Ferguson SS; Sochaski MA; Rotroff DM; Freeman K; Clewell HJ; Dix DJ; Andersen ME; Houck KA; Allen B; Judson RS; Singh R; Kavlock RJ; Richard AM; Thomas RS, Integration of Dosimetry, Exposure, and High-Throughput Screening Data in Chemical Toxicity Assessment. Toxicol Sci 2012, 125, (1), 157–174. [DOI] [PubMed] [Google Scholar]
10.Wetmore BA; Wambaugh JF; Allen B; Ferguson SS; Sochaski MA; Setzer RW; Houck KA; Strope CL; Cantwell K; Judson RS; LeCluyse E; Clewell HJ; Thomas RS; Andersen ME, Incorporating High-Throughput Exposure Predictions with Dosimetry-Adjusted in Vitro Bioactivity to Inform Chemical Toxicity Testing. Toxicol Sci 2015, 148, (1), 121–136. [DOI] [PMC free article] [PubMed] [Google Scholar]
11.Knaak JB; Dary CC; Zhang XF; Gerlach RW; Tornero-Velez R; Chang DT; Goldsmith R; Blancato JN, Parameters for Pyrethroid Insecticide Qsar and Pbpk/Pd Models for Human Risk Assessment. Rev Environ Contam T 2012, 219, 1–114. [DOI] [PubMed] [Google Scholar]
12.Thomas RS; Philbert MA; Auerbach SS; Wetmore BA; Devito MJ; Cote I; Rowlands JC; Whelan MP; Hays SM; Andersen ME; Meek ME; Reiter LW; Lambert JC; Clewell HJ; Stephens ML; Zhao QJ; Wesselkamper SC; Flowers L; Carney EW; Pastoor TP; Petersen DD; Yauk CL; Nong A, Incorporating New Technologies into Toxicity Testing and Risk Assessment: Moving from 21st Century Vision to a Data-Driven Framework. Toxicol Sci 2013, 136, (1), 4–18. [DOI] [PMC free article] [PubMed] [Google Scholar]
13.Tice RR; Austin CP; Kavlock RJ; Bucher JR, Improving the Human Hazard Characterization of Chemicals: A Tox21 Update. Environ Health Perspect 2013, 121, (7), 756–65. [DOI] [PMC free article] [PubMed] [Google Scholar]
14.Sipes NS; Wambaugh JF; Pearce R; Auerbach SS; Wetmore BA; Hsieh JH; Shapiro AJ; Svoboda D; DeVito MJ; Ferguson SS, An Intuitive Approach for Predicting Potential Human Health Risk with the Tox21 10k Library. Environ. Sci. Technol 2017, 51, (18), 10786–10796. [DOI] [PMC free article] [PubMed] [Google Scholar]
15.Espie P; Tytgat D; Sargentini-Maier ML; Poggesi I; Watelet JB, Physiologically Based Pharmacokinetics (Pbpk). Drug Metab. Rev 2009, 41, (3), 391–407. [DOI] [PubMed] [Google Scholar]
16.Coecke S; Pelkonen O; Leite SB; Bernauer U; Bessems JG; Bois FY; Gundert-Remy U; Loizou G; Testai E; Zaldívar J-M, Toxicokinetics as a Key to the Integrated Toxicity Risk Assessment Based Primarily on Non-Animal Approaches. Toxicol. Vitro 2013, 27, (5), 1570–1577. [DOI] [PubMed] [Google Scholar]
17.McLanahan ED; El-Masri HA; Sweeney LM; Kopylev LY; Clewell HJ; Wambaugh JF; Schlosser PM, Physiologically Based Pharmacokinetic Model Use in Risk Assessment-Why Being Published Is Not Enough. Toxicol Sci 2012, 126, (1), 5–15. [DOI] [PubMed] [Google Scholar]
18.O’ Flaherty E, J., Toxicants and Drugs: Kinetics and Dynamics John Wiley & Sons. : 1981. [Google Scholar]
19.Tan Y-M; Liao KH; Clewell HJ, Reverse Dosimetry: Interpreting Trihalomethanes Biomonitoring Data Using Physiologically Based Pharmacokinetic Modeling. Journal of exposure science & environmental epidemiology 2007, 17, (7), 591–603. [DOI] [PubMed] [Google Scholar]
20.Wetmore BA; Wambaugh JF; Allen B; Ferguson SS; Sochaski MA; Setzer RW; Houck KA; Strope CL; Cantwell K; Judson RS; LeCluyse E; Clewell HJ; Thomas RS; Andersen ME, Incorporating High-Throughput Exposure Predictions with Dosimetry-Adjusted in Vitro Bioactivity to Inform Chemical Toxicity Testing. Toxicol Sci 2015, 148, (1), 121–36. [DOI] [PMC free article] [PubMed] [Google Scholar]
21.Bell SM; Chang X; Wambaugh JF; Allen DG; Bartels M; Brouwer KLR; Casey WM; Choksi N; Ferguson SS; Fraczkiewicz G; Jarabek AM; Ke A; Lumen A; Lynn SG; Paini A; Price PS; Ring C; Simon TW; Sipes NS; Sprankle CS; Strickland J; Troutman J; Wetmore BA; Kleinstreuer NC, In Vitro to in Vivo Extrapolation for High Throughput Prioritization and Decision Making. Toxicol. Vitro 2018, 47, 213–227. [DOI] [PMC free article] [PubMed] [Google Scholar]
22.Wetmore BA; Wambaugh JF; Ferguson SS; Sochaski MA; Rotroff DM; Freeman K; Clewell HJ 3rd; Dix DJ; Andersen ME; Houck KA; Allen B; Judson RS; Singh R; Kavlock RJ; Richard AM; Thomas RS, Integration of Dosimetry, Exposure, and High-Throughput Screening Data in Chemical Toxicity Assessment. Toxicol Sci 2012, 125, (1), 157–74. [DOI] [PubMed] [Google Scholar]
23.Tonnelier A; Coecke S; Zaldívar J-M, Screening of Chemicals for Human Bioaccumulative Potential with a Physiologically Based Toxicokinetic Model. Arch. Toxicol 2012, 86, (3), 393–403. [DOI] [PMC free article] [PubMed] [Google Scholar]
24.Wambaugh JF; Wetmore BA; Ring CL; Nicolas CI; Pearce R; Honda G; Dinallo R; Angus D; Gilbert J; Sierra T; Badrinarayanan A; Snodgrass B; Brockman A; Strock C; Setzer W; Thomas RS, Assessing Toxicokinetic Uncertainty and Variability in Risk Prioritization In 2019. [DOI] [PMC free article] [PubMed]
25.Ingle BL; Veber BC; Nichols JW; Tornero-Velez R, Informing the Human Plasma Protein Binding of Environmental Chemicals by Machine Learning in the Pharmaceutical Space: Applicability Domain and Limits of Predictability. J Chem Inf Model 2016, 56, (11), 2243–2252. [DOI] [PubMed] [Google Scholar]
26.Kirman CR; Aylward LL; Wetmore BA; Thomas RS; Sochaski M; Ferguson SS; Csiszar SA; Jolliet O, Quantitative Property–Property Relationship for Screening-Level Prediction of Intrinsic Clearance: A Tool for Exposure Modeling for High-Throughput Toxicity Screening Data. Applied In Vitro Toxicology 2015, 1, (2), 140–146. [Google Scholar]
27.Votano JR; Parham M; Hall LM; Hall LH; Kier LB; Oloff S; Tropsha A, Qsar Modeling of Human Serum Protein Binding with Several Modeling Techniques Utilizing Structure-Information Representation. J. Med. Chem 2006, 49, (24), 7169–7181. [DOI] [PubMed] [Google Scholar]
28.Zhu XW; Sedykh A; Zhu H; Liu SS; Tropsha A, The Use of Pseudo-Equilibrium Constant Affords Improved Qsar Models of Human Plasma Protein Binding. Pharm Res-Dordr 2013, 30, (7), 1790–1798. [DOI] [PMC free article] [PubMed] [Google Scholar]
29.Hallifax D; Foster JA; Houston JB, Prediction of Human Metabolic Clearance from in Vitro Systems: Retrospective Analysis and Prospective View. Pharm Res-Dordr 2010, 27, (10), 2150–2161. [DOI] [PubMed] [Google Scholar]
30.Poulin P; Haddad S, Toward a New Paradigm for the Efficient in Vitro-in Vivo Extrapolation of Metabolic Clearance in Humans from Hepatocyte Data. J. Pharm. Sci 2013, 102, (9), 3239–3251. [DOI] [PubMed] [Google Scholar]
31.Varma MV; Steyn SJ; Allerton C; El-Kattan AF, Predicting Clearance Mechanism in Drug Discovery: Extended Clearance Classification System (Eccs). Pharm Res-Dordr 2015, 32, (12), 3785–3802. [DOI] [PubMed] [Google Scholar]
32.Long A; Walker JD, Quantitative Structure-Activity Relationships for Predicting Metabolism and Modeling Cytochrome P450 Enzyme Activities. Environmental toxicology and chemistry 2003, 22, (8), 1894–1899. [DOI] [PubMed] [Google Scholar]
33.Peyret T; Krishnan K, Quantitative Property-Property Relationship for Screening-Level Prediction of Intrinsic Clearance of Volatile Organic Chemicals in Rats and Its Integration within Pbpk Models to Predict Inhalation Pharmacokinetics in Humans. J Toxicol 2012, 2012, 286079. [DOI] [PMC free article] [PubMed] [Google Scholar]
34.Liu R; Schyman P; Wallqvist A, Critically Assessing the Predictive Power of Qsar Models for Human Liver Microsomal Stability. J Chem Inf Model 2015, 55, (8), 1566–75. [DOI] [PubMed] [Google Scholar]
35.Pirovano A; Brandmaier S; Huijbregts MA; Ragas AM; Veltman K; Hendriks AJ, Qsars for Estimating Intrinsic Hepatic Clearance of Organic Chemicals in Humans. Environ Toxicol Pharmacol 2016, 42, 190–7. [DOI] [PubMed] [Google Scholar]
36.Watanabe R; Esaki T; Kawashima H; Natsume-Kitatani Y; Nagao C; Ohashi R; Mizuguchi K, Predicting Fraction Unbound in Human Plasma from Chemical Structure: Improved Accuracy in the Low Value Ranges. Mol Pharm 2018, 15, (11), 5302–5311. [DOI] [PubMed] [Google Scholar]
37.de Groot MJ, Designing Better Drugs: Predicting Cytochrome P450 Metabolism. Drug Discovery Today 2006, 11, (13), 601–606. [DOI] [PubMed] [Google Scholar]
38.Kirchmair J; Göller AH; Lang D; Kunze J; Testa B; Wilson ID; Glen RC; Schneider G, Predicting Drug Metabolism: Experiment and/or Computation? Nature Reviews Drug Discovery 2015, 14, (6), 387–404. [DOI] [PubMed] [Google Scholar]
39.Valerio LG, In Silico Toxicology for the Pharmaceutical Sciences. Toxicol. Appl. Pharmacol 2009, 241, (3), 356–370. [DOI] [PubMed] [Google Scholar]
40.Kazmi SR; Jun R; Yu MS; Jung C; Na D, In Silico Approaches and Tools for the Prediction of Drug Metabolism and Fate: A Review. Comput Biol Med 2019, 106, 54–64. [DOI] [PubMed] [Google Scholar]
41.Bento AP; Gaulton A; Hersey A; Bellis LJ; Chambers J; Davies M; Kruger FA; Light Y; Mak L; McGlinchey S; Nowotka M; Papadatos G; Santos R; Overington JP, The Chembl Bioactivity Database: An Update. Nucleic Acids Res 2014, 42, (Database issue), D1083–90. [DOI] [PMC free article] [PubMed] [Google Scholar]
42.Gaulton A; Bellis LJ; Bento AP; Chambers J; Davies M; Hersey A; Light Y; McGlinchey S; Michalovich D; Al-Lazikani B; Overington JP, Chembl: A Large-Scale Bioactivity Database for Drug Discovery. Nucleic Acids Res 2012, 40, (Database issue), D1100–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
43.Gaulton A; Hersey A; Nowotka M; Bento AP; Chambers J; Mendez D; Mutowo P; Atkinson F; Bellis LJ; Cibrian-Uhalte E; Davies M; Dedman N; Karlsson A; Magarinos MP; Overington JP; Papadatos G; Smit I; Leach AR, The Chembl Database in 2017. Nucleic Acids Res 2017, 45, (D1), D945–D954. [DOI] [PMC free article] [PubMed] [Google Scholar]
44.Richard AM; Judson RS; Houck KA; Grulke CM; Volarath P; Thillainadarajah I; Yang C; Rathman J; Martin MT; Wambaugh JF, Toxcast Chemical Landscape: Paving the Road to 21st Century Toxicology. Chem. Res. Toxicol 2016, 29, (8), 1225–1251. [DOI] [PubMed] [Google Scholar]
45.Kilford PJ; Gertz M; Houston JB; Galetin A, Hepatocellular Binding of Drugs: Correction for Unbound Fraction in Hepatocyte Incubations Using Microsomal Binding or Drug Lipophilicity Data. Drug Metab Dispos 2008, 36, (7), 1194–7. [DOI] [PubMed] [Google Scholar]
46.Lewis DFV; Dickens M, Factors Influencing Rates and Clearance in P450-Mediated Reactions: Qsars for Substrates of the Xenobiotic-Metabolizing Hepatic Microsomal P450s. Toxicology 2002, 170, 45–53. [DOI] [PubMed] [Google Scholar]
47.Keefer C; Chang G; Carlo A; Novak JJ; Banker M; Carey J; Cianfrogna J; Eng H; Jagla C; Johnson N; Jones R; Jordan S; Lazzaro S; Liu J; Scott Obach R; Riccardi K; Tess D; Umland J; Racich J; Varma M; Visswanathan R; Di L, Mechanistic Insights on Clearance and Inhibition Discordance between Liver Microsomes and Hepatocytes When Clearance in Liver Microsomes Is Higher Than in Hepatocytes. Eur J Pharm Sci 2020, 155, 105541. [DOI] [PubMed] [Google Scholar]
48.Brown RP; Delp MD; Lindstedt SL; Rhomberg LR; Beliles RP, Physiological Parameter Values for Physiologically Based Pharmacokinetic Models. Toxicol Ind Health 1997, 13, (4), 407–84. [DOI] [PubMed] [Google Scholar]
49.Ito K; Houston JB, Comparison of the Use of Liver Models for Predicting Drug Clearance Using in Vitro Kinetic Data from Hepatic Microsomes and Isolated Hepatocytes. Pharm Res-Dordr 2004, 21, (5), 785–792. [DOI] [PubMed] [Google Scholar]
50.Pearce RG; Setzer RW; Strope CL; Wambaugh JF; Sipes NS, Httk: R Package for High-Throughput Toxicokinetics. J Stat Softw 2017, 79, (4), 1–26. [DOI] [PMC free article] [PubMed] [Google Scholar]
51.Obach RS; Lombardo F; Waters NJ, Trend Analysis of a Database of Intravenous Pharmacokinetic Parameters in Humans for 670 Drug Compounds. Drug Metab. Dispos 2008, 36, (7), 1385–1405. [DOI] [PubMed] [Google Scholar]
52.Yap CW, Padel-Descriptor: An Open Source Software to Calculate Molecular Descriptors and Fingerprints. J Comput Chem 2011, 32, (7), 1466–74. [DOI] [PubMed] [Google Scholar]
53.Mansouri K; Grulke CM; Judson RS; Williams AJ, Opera Models for Predicting Physicochemical Properties and Environmental Fate Endpoints. J Cheminform 2018, 10, (1), 10. [DOI] [PMC free article] [PubMed] [Google Scholar]
54.Wang J; Hallinger DR; Murr AS; Buckalew AR; Lougee RR; Richard AM; Laws SC; Stoker TE, High-Throughput Screening and Chemotype-Enrichment Analysis of Toxcast Phase Ii Chemicals Evaluated for Human Sodium-Iodide Symporter (Nis) Inhibition. Environ. Int 2019, 126, 377–386. [DOI] [PMC free article] [PubMed] [Google Scholar]
55.Yang C; Tarkhov A; Marusczyk J; Bienfait B; Gasteiger J; Kleinoeder T; Magdziarz T; Sacher O; Schwab CH; Schwoebel J; Terfloth L; Arvidson K; Richard A; Worth A; Rathman J, New Publicly Available Chemical Query Language, Csrml, to Support Chemotype Representations for Application to Data Mining and Modeling. J Chem Inf Model 2015, 55, (3), 510–28. [DOI] [PubMed] [Google Scholar]
56.Guha R, Chemical Informatics Functionality in R. Journal of Statistical Software 2007, 6, (18). [Google Scholar]
57.MDL Information Systems Inc. Maccs Keys , 14600 Catalina Street, San Leandro, CA: 94577. [Google Scholar]
58.Willighagen EL; Mayfield JW; Alvarsson J; Berg A; Carlsson L; Jeliazkova N; Kuhn S; Pluskal T; Rojas-Cherto M; Spjuth O; Torrance G; Evelo CT; Guha R; Steinbeck C, The Chemistry Development Kit (Cdk) V2.0: Atom Typing, Depiction, Molecular Formulas, and Substructure Searching. J Cheminform 2017, 9, (1), 33. [DOI] [PMC free article] [PubMed] [Google Scholar]
59.Liaw A; Wiener M, Classification and Regression by Randomforest. R News 2002, 2, (3), 18–22. [Google Scholar]
60.Kuhn M Caret: Classification and Regression Training, 2019.
61.Kuhn M, Caret: Classification and Regression Training. R Package Version 6.0–85 In R Project, https://CRAN.R-project.org/package=caret: 2020. [Google Scholar]
62.Roy K; Kar S; Ambure P, On a Simple Approach for Determining Applicability Domain of Qsar Models. Chemometrics and Intelligent Laboratory Systems 2015, 145, 22–29. [Google Scholar]
63.Kuhn M; Vaughan D Yardstick: Tidy Characterizations of Model Performance: R Package Version 0.0.7, CRAN: 2020. [Google Scholar]
64.Buckland M; Gey F, The Relationship between Recall and Precision. Journal of the American Society for Information Science 1994, 45, (`), 12–19. [Google Scholar]
65.Powers D Evaluation: From Precision, Recall and F Factor to Roc, Informedness, Markedness and Correlation; Flinders University: 2007. [Google Scholar]
66.Zhu XW; Sedykh A; Zhu H; Liu SS; Tropsha A, The Use of Pseudo-Equilibrium Constant Affords Improved Qsar Models of Human Plasma Protein Binding. Pharm Res 2013, 30, (7), 1790–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
67.Lambrinidis G; Vallianatou T; Tsantili-Kakoulidou A, In Vitro, in Silico and Integrated Strategies for the Estimation of Plasma Protein Binding. A Review. Adv Drug Deliv Rev 2015, 86, 27–45. [DOI] [PubMed] [Google Scholar]
68.Yun YE; Tornero-Velez R; Purucker ST; Chang DT; Edginton AN, Evaluation of Quantitative Structure Property Relationship Algorithms for Predicting Plasma Protein Binding in Humans. Computational Toxicology 2021, 17, 100142. [DOI] [PMC free article] [PubMed] [Google Scholar]
69.Toma C; Gadaleta D; Roncaglioni A; Toropov A; Toropova A; Marzo M; Benfenati E, Qsar Development for Plasma Protein Binding: Influence of the Ionization State. Pharm Res 2019, 36, (2), 28. [DOI] [PMC free article] [PubMed] [Google Scholar]
70.Filer DL; Kothiya P; Setzer RW; Judson RS; Martin MT, Tcpl: The Toxcast Pipeline for High-Throughput Screening Data. Bioinformatics 2017, 33, (4), 618–620. [DOI] [PubMed] [Google Scholar]
71.Pearce RG; Setzer RW; Strope CL; Sipes NS; Wambaugh JF, Httk: R Package for High-Throughput Toxicokinetics. Journal of Statistical Software 2017, 79, (4). [DOI] [PMC free article] [PubMed] [Google Scholar]
72.U.S. EPA Toxcast & Tox21 Data Spreadsheet for in-Vitro Dbv3.2 https://www.epa.gov/chemical-research/toxicity-forecaster-toxcasttm-data (02/October/2020),
73.Ring CL; Arnot JA; Bennett DH; Egeghy PP; Fantke P; Huang L; Isaacs KK; Jolliet O; Phillips KA; Price PS; Shin HM; Westgate JN; Setzer RW; Wambaugh JF, Consensus Modeling of Median Chemical Intake for the U.S. Population Based on Predictions of Exposure Pathways. Environ Sci Technol 2019, 53, (2), 719–732. [DOI] [PMC free article] [PubMed] [Google Scholar]
74.Centers for Disease Control and Prevention (CDC). National Center for Health Statistics (NCHS), National Health and Nutrition Examination Survey Data In U.S. Department of Health and Human Services, Centers for Disease Control and Prevention: Hyattsville, MD., 2009–2010. [Google Scholar]
75.Hollander M; Wolfe DA; Chicken E, Nonparametric Statistical Methods John Wiley & Sons: 2013; Vol. 751. [Google Scholar]
76.Paixao P; Gouveia LF; Morais JA, Prediction of the in Vitro Intrinsic Clearance Determined in Suspensions of Human Hepatocytes by Using Artificial Neural Networks. Eur J Pharm Sci 2010, 39, (5), 310–21. [DOI] [PubMed] [Google Scholar]
77.Nikolic K; Agababa D, Prediction of Hepatic Microsomal Intrinsic Clearance and Human Clearance Values for Drugs. J Mol Graph Model 2009, 28, (3), 245–52. [DOI] [PubMed] [Google Scholar]
78.Golbraikh A; Tropsha A, Predictive Qsar Modeling Based on Diversity Sampling of Experimental Datasets for the Training and Test Set Selection. Molecular Diversity 2002, 5, 231–243. [DOI] [PubMed] [Google Scholar]
79.Ghafourian T; Amin Z, Qsar Models for the Prediction of Plasma Protein Binding. Bioimpacts 2013, 3, (1), 21–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
80.Kosugi Y; Hosea N, Prediction of Oral Pharmacokinetics Using a Combination of in Silico Descriptors and in Vitro Adme Properties. Mol Pharm 2021. [DOI] [PubMed]
81.Lewis DFV, On the Recognition of Mammalian Microsomal Cytochrome P450 Substrates and Their Characteristics. Biochem. Pharmacol 2000, 60, 293–306. [DOI] [PubMed] [Google Scholar]
82.Zeeman M; Auer CM; Clements RG; Nabholz JV; Boethling RS, Epa US Regulator Perspectives on the Use of Qsar for New and Existing Chemicla Evaluations. SAR QSAR Environ. Res 1995, 3, (3), 179–201. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplement1

NIHMS1703517-supplement-Supplement1.docx^{(164.6KB, docx)}

Supplement2

NIHMS1703517-supplement-Supplement2.xlsx^{(8.8MB, xlsx)}

[R1] 1.Egeghy PP; Judson R; Gangwal S; Mosher S; Smith D; Vail J; Hubal EAC, The Exposure Data Landscape for Manufactured Chemicals. Sci. Total Environ 2012, 414, 159–166. [DOI] [PubMed] [Google Scholar]

[R2] 2.Muir DCG; Howard PH, Are There Other Persistent Organic Pollutants? A Challenge for Environmental Chemists. Environ. Sci. Technol 2006, 40, (23), 7157–7166. [DOI] [PubMed] [Google Scholar]

[R3] 3.“Toxic Substances Control Act”(Tsca). Public Law 94–469, October 11, 1976. In Congress, U. S., Ed. 1976. [Google Scholar]

[R4] 4.Frank R Lautenberg Chemical Safety for the 21st Century Act. Public Law. 114–182 June, 22, 2016. In Congress, U. S., Ed. [Google Scholar]

[R5] 5.Tsca Chemical Substance Inventory In US Environmental Protection Agency,: 2021. [Google Scholar]

[R6] 6.Wambaugh JF; Wetmore BA; Pearce R; Strope C; Goldsmith R; Sluka JP; Sedykh A; Tropsha A; Bosgra S; Shah I; Judson R; Thomas RS; Setzer RW, Toxicokinetic Triage for Environmental Chemicals. Toxicol Sci 2015, 147, (1), 55–67. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R7] 7.Phillips MB; Leonard JA; Grulke CM; Chang DT; Edwards SW; Brooks R; Goldsmith MR; El-Masri H; Tan YM, A Workflow to Investigate Exposure and Pharmacokinetic Influences on High-Throughput in Vitro Chemical Screening Based on Adverse Outcome Pathways. Environ. Health Perspect 2016, 124, (1), 53–60. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R8] 8.Rotroff DM; Wetmore BA; Dix DJ; Ferguson SS; Clewell HJ; Houck KA; LeCluyse EL; Andersen ME; Judson RS; Smith CM; Sochaski MA; Kavlock RJ; Boellmann F; Martin MT; Reif DM; Wambaugh JF; Thomas RS, Incorporating Human Dosimetry and Exposure into High-Throughput in Vitro Toxicity Screening. Toxicol Sci 2010, 117, (2), 348–358. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R9] 9.Wetmore BA; Wambaugh JF; Ferguson SS; Sochaski MA; Rotroff DM; Freeman K; Clewell HJ; Dix DJ; Andersen ME; Houck KA; Allen B; Judson RS; Singh R; Kavlock RJ; Richard AM; Thomas RS, Integration of Dosimetry, Exposure, and High-Throughput Screening Data in Chemical Toxicity Assessment. Toxicol Sci 2012, 125, (1), 157–174. [DOI] [PubMed] [Google Scholar]

[R10] 10.Wetmore BA; Wambaugh JF; Allen B; Ferguson SS; Sochaski MA; Setzer RW; Houck KA; Strope CL; Cantwell K; Judson RS; LeCluyse E; Clewell HJ; Thomas RS; Andersen ME, Incorporating High-Throughput Exposure Predictions with Dosimetry-Adjusted in Vitro Bioactivity to Inform Chemical Toxicity Testing. Toxicol Sci 2015, 148, (1), 121–136. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R11] 11.Knaak JB; Dary CC; Zhang XF; Gerlach RW; Tornero-Velez R; Chang DT; Goldsmith R; Blancato JN, Parameters for Pyrethroid Insecticide Qsar and Pbpk/Pd Models for Human Risk Assessment. Rev Environ Contam T 2012, 219, 1–114. [DOI] [PubMed] [Google Scholar]

[R12] 12.Thomas RS; Philbert MA; Auerbach SS; Wetmore BA; Devito MJ; Cote I; Rowlands JC; Whelan MP; Hays SM; Andersen ME; Meek ME; Reiter LW; Lambert JC; Clewell HJ; Stephens ML; Zhao QJ; Wesselkamper SC; Flowers L; Carney EW; Pastoor TP; Petersen DD; Yauk CL; Nong A, Incorporating New Technologies into Toxicity Testing and Risk Assessment: Moving from 21st Century Vision to a Data-Driven Framework. Toxicol Sci 2013, 136, (1), 4–18. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R13] 13.Tice RR; Austin CP; Kavlock RJ; Bucher JR, Improving the Human Hazard Characterization of Chemicals: A Tox21 Update. Environ Health Perspect 2013, 121, (7), 756–65. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R14] 14.Sipes NS; Wambaugh JF; Pearce R; Auerbach SS; Wetmore BA; Hsieh JH; Shapiro AJ; Svoboda D; DeVito MJ; Ferguson SS, An Intuitive Approach for Predicting Potential Human Health Risk with the Tox21 10k Library. Environ. Sci. Technol 2017, 51, (18), 10786–10796. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R15] 15.Espie P; Tytgat D; Sargentini-Maier ML; Poggesi I; Watelet JB, Physiologically Based Pharmacokinetics (Pbpk). Drug Metab. Rev 2009, 41, (3), 391–407. [DOI] [PubMed] [Google Scholar]

[R16] 16.Coecke S; Pelkonen O; Leite SB; Bernauer U; Bessems JG; Bois FY; Gundert-Remy U; Loizou G; Testai E; Zaldívar J-M, Toxicokinetics as a Key to the Integrated Toxicity Risk Assessment Based Primarily on Non-Animal Approaches. Toxicol. Vitro 2013, 27, (5), 1570–1577. [DOI] [PubMed] [Google Scholar]

[R17] 17.McLanahan ED; El-Masri HA; Sweeney LM; Kopylev LY; Clewell HJ; Wambaugh JF; Schlosser PM, Physiologically Based Pharmacokinetic Model Use in Risk Assessment-Why Being Published Is Not Enough. Toxicol Sci 2012, 126, (1), 5–15. [DOI] [PubMed] [Google Scholar]

[R18] 18.O’ Flaherty E, J., Toxicants and Drugs: Kinetics and Dynamics John Wiley & Sons. : 1981. [Google Scholar]

[R19] 19.Tan Y-M; Liao KH; Clewell HJ, Reverse Dosimetry: Interpreting Trihalomethanes Biomonitoring Data Using Physiologically Based Pharmacokinetic Modeling. Journal of exposure science & environmental epidemiology 2007, 17, (7), 591–603. [DOI] [PubMed] [Google Scholar]

[R20] 20.Wetmore BA; Wambaugh JF; Allen B; Ferguson SS; Sochaski MA; Setzer RW; Houck KA; Strope CL; Cantwell K; Judson RS; LeCluyse E; Clewell HJ; Thomas RS; Andersen ME, Incorporating High-Throughput Exposure Predictions with Dosimetry-Adjusted in Vitro Bioactivity to Inform Chemical Toxicity Testing. Toxicol Sci 2015, 148, (1), 121–36. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R21] 21.Bell SM; Chang X; Wambaugh JF; Allen DG; Bartels M; Brouwer KLR; Casey WM; Choksi N; Ferguson SS; Fraczkiewicz G; Jarabek AM; Ke A; Lumen A; Lynn SG; Paini A; Price PS; Ring C; Simon TW; Sipes NS; Sprankle CS; Strickland J; Troutman J; Wetmore BA; Kleinstreuer NC, In Vitro to in Vivo Extrapolation for High Throughput Prioritization and Decision Making. Toxicol. Vitro 2018, 47, 213–227. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R22] 22.Wetmore BA; Wambaugh JF; Ferguson SS; Sochaski MA; Rotroff DM; Freeman K; Clewell HJ 3rd; Dix DJ; Andersen ME; Houck KA; Allen B; Judson RS; Singh R; Kavlock RJ; Richard AM; Thomas RS, Integration of Dosimetry, Exposure, and High-Throughput Screening Data in Chemical Toxicity Assessment. Toxicol Sci 2012, 125, (1), 157–74. [DOI] [PubMed] [Google Scholar]

[R23] 23.Tonnelier A; Coecke S; Zaldívar J-M, Screening of Chemicals for Human Bioaccumulative Potential with a Physiologically Based Toxicokinetic Model. Arch. Toxicol 2012, 86, (3), 393–403. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R24] 24.Wambaugh JF; Wetmore BA; Ring CL; Nicolas CI; Pearce R; Honda G; Dinallo R; Angus D; Gilbert J; Sierra T; Badrinarayanan A; Snodgrass B; Brockman A; Strock C; Setzer W; Thomas RS, Assessing Toxicokinetic Uncertainty and Variability in Risk Prioritization In 2019. [DOI] [PMC free article] [PubMed]

[R25] 25.Ingle BL; Veber BC; Nichols JW; Tornero-Velez R, Informing the Human Plasma Protein Binding of Environmental Chemicals by Machine Learning in the Pharmaceutical Space: Applicability Domain and Limits of Predictability. J Chem Inf Model 2016, 56, (11), 2243–2252. [DOI] [PubMed] [Google Scholar]

[R26] 26.Kirman CR; Aylward LL; Wetmore BA; Thomas RS; Sochaski M; Ferguson SS; Csiszar SA; Jolliet O, Quantitative Property–Property Relationship for Screening-Level Prediction of Intrinsic Clearance: A Tool for Exposure Modeling for High-Throughput Toxicity Screening Data. Applied In Vitro Toxicology 2015, 1, (2), 140–146. [Google Scholar]

[R27] 27.Votano JR; Parham M; Hall LM; Hall LH; Kier LB; Oloff S; Tropsha A, Qsar Modeling of Human Serum Protein Binding with Several Modeling Techniques Utilizing Structure-Information Representation. J. Med. Chem 2006, 49, (24), 7169–7181. [DOI] [PubMed] [Google Scholar]

[R28] 28.Zhu XW; Sedykh A; Zhu H; Liu SS; Tropsha A, The Use of Pseudo-Equilibrium Constant Affords Improved Qsar Models of Human Plasma Protein Binding. Pharm Res-Dordr 2013, 30, (7), 1790–1798. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R29] 29.Hallifax D; Foster JA; Houston JB, Prediction of Human Metabolic Clearance from in Vitro Systems: Retrospective Analysis and Prospective View. Pharm Res-Dordr 2010, 27, (10), 2150–2161. [DOI] [PubMed] [Google Scholar]

[R30] 30.Poulin P; Haddad S, Toward a New Paradigm for the Efficient in Vitro-in Vivo Extrapolation of Metabolic Clearance in Humans from Hepatocyte Data. J. Pharm. Sci 2013, 102, (9), 3239–3251. [DOI] [PubMed] [Google Scholar]

[R31] 31.Varma MV; Steyn SJ; Allerton C; El-Kattan AF, Predicting Clearance Mechanism in Drug Discovery: Extended Clearance Classification System (Eccs). Pharm Res-Dordr 2015, 32, (12), 3785–3802. [DOI] [PubMed] [Google Scholar]

[R32] 32.Long A; Walker JD, Quantitative Structure-Activity Relationships for Predicting Metabolism and Modeling Cytochrome P450 Enzyme Activities. Environmental toxicology and chemistry 2003, 22, (8), 1894–1899. [DOI] [PubMed] [Google Scholar]

[R33] 33.Peyret T; Krishnan K, Quantitative Property-Property Relationship for Screening-Level Prediction of Intrinsic Clearance of Volatile Organic Chemicals in Rats and Its Integration within Pbpk Models to Predict Inhalation Pharmacokinetics in Humans. J Toxicol 2012, 2012, 286079. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R34] 34.Liu R; Schyman P; Wallqvist A, Critically Assessing the Predictive Power of Qsar Models for Human Liver Microsomal Stability. J Chem Inf Model 2015, 55, (8), 1566–75. [DOI] [PubMed] [Google Scholar]

[R35] 35.Pirovano A; Brandmaier S; Huijbregts MA; Ragas AM; Veltman K; Hendriks AJ, Qsars for Estimating Intrinsic Hepatic Clearance of Organic Chemicals in Humans. Environ Toxicol Pharmacol 2016, 42, 190–7. [DOI] [PubMed] [Google Scholar]

[R36] 36.Watanabe R; Esaki T; Kawashima H; Natsume-Kitatani Y; Nagao C; Ohashi R; Mizuguchi K, Predicting Fraction Unbound in Human Plasma from Chemical Structure: Improved Accuracy in the Low Value Ranges. Mol Pharm 2018, 15, (11), 5302–5311. [DOI] [PubMed] [Google Scholar]

[R37] 37.de Groot MJ, Designing Better Drugs: Predicting Cytochrome P450 Metabolism. Drug Discovery Today 2006, 11, (13), 601–606. [DOI] [PubMed] [Google Scholar]

[R38] 38.Kirchmair J; Göller AH; Lang D; Kunze J; Testa B; Wilson ID; Glen RC; Schneider G, Predicting Drug Metabolism: Experiment and/or Computation? Nature Reviews Drug Discovery 2015, 14, (6), 387–404. [DOI] [PubMed] [Google Scholar]

[R39] 39.Valerio LG, In Silico Toxicology for the Pharmaceutical Sciences. Toxicol. Appl. Pharmacol 2009, 241, (3), 356–370. [DOI] [PubMed] [Google Scholar]

[R40] 40.Kazmi SR; Jun R; Yu MS; Jung C; Na D, In Silico Approaches and Tools for the Prediction of Drug Metabolism and Fate: A Review. Comput Biol Med 2019, 106, 54–64. [DOI] [PubMed] [Google Scholar]

[R41] 41.Bento AP; Gaulton A; Hersey A; Bellis LJ; Chambers J; Davies M; Kruger FA; Light Y; Mak L; McGlinchey S; Nowotka M; Papadatos G; Santos R; Overington JP, The Chembl Bioactivity Database: An Update. Nucleic Acids Res 2014, 42, (Database issue), D1083–90. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R42] 42.Gaulton A; Bellis LJ; Bento AP; Chambers J; Davies M; Hersey A; Light Y; McGlinchey S; Michalovich D; Al-Lazikani B; Overington JP, Chembl: A Large-Scale Bioactivity Database for Drug Discovery. Nucleic Acids Res 2012, 40, (Database issue), D1100–7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R43] 43.Gaulton A; Hersey A; Nowotka M; Bento AP; Chambers J; Mendez D; Mutowo P; Atkinson F; Bellis LJ; Cibrian-Uhalte E; Davies M; Dedman N; Karlsson A; Magarinos MP; Overington JP; Papadatos G; Smit I; Leach AR, The Chembl Database in 2017. Nucleic Acids Res 2017, 45, (D1), D945–D954. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R44] 44.Richard AM; Judson RS; Houck KA; Grulke CM; Volarath P; Thillainadarajah I; Yang C; Rathman J; Martin MT; Wambaugh JF, Toxcast Chemical Landscape: Paving the Road to 21st Century Toxicology. Chem. Res. Toxicol 2016, 29, (8), 1225–1251. [DOI] [PubMed] [Google Scholar]

[R45] 45.Kilford PJ; Gertz M; Houston JB; Galetin A, Hepatocellular Binding of Drugs: Correction for Unbound Fraction in Hepatocyte Incubations Using Microsomal Binding or Drug Lipophilicity Data. Drug Metab Dispos 2008, 36, (7), 1194–7. [DOI] [PubMed] [Google Scholar]

[R46] 46.Lewis DFV; Dickens M, Factors Influencing Rates and Clearance in P450-Mediated Reactions: Qsars for Substrates of the Xenobiotic-Metabolizing Hepatic Microsomal P450s. Toxicology 2002, 170, 45–53. [DOI] [PubMed] [Google Scholar]

[R47] 47.Keefer C; Chang G; Carlo A; Novak JJ; Banker M; Carey J; Cianfrogna J; Eng H; Jagla C; Johnson N; Jones R; Jordan S; Lazzaro S; Liu J; Scott Obach R; Riccardi K; Tess D; Umland J; Racich J; Varma M; Visswanathan R; Di L, Mechanistic Insights on Clearance and Inhibition Discordance between Liver Microsomes and Hepatocytes When Clearance in Liver Microsomes Is Higher Than in Hepatocytes. Eur J Pharm Sci 2020, 155, 105541. [DOI] [PubMed] [Google Scholar]

[R48] 48.Brown RP; Delp MD; Lindstedt SL; Rhomberg LR; Beliles RP, Physiological Parameter Values for Physiologically Based Pharmacokinetic Models. Toxicol Ind Health 1997, 13, (4), 407–84. [DOI] [PubMed] [Google Scholar]

[R49] 49.Ito K; Houston JB, Comparison of the Use of Liver Models for Predicting Drug Clearance Using in Vitro Kinetic Data from Hepatic Microsomes and Isolated Hepatocytes. Pharm Res-Dordr 2004, 21, (5), 785–792. [DOI] [PubMed] [Google Scholar]

[R50] 50.Pearce RG; Setzer RW; Strope CL; Wambaugh JF; Sipes NS, Httk: R Package for High-Throughput Toxicokinetics. J Stat Softw 2017, 79, (4), 1–26. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R51] 51.Obach RS; Lombardo F; Waters NJ, Trend Analysis of a Database of Intravenous Pharmacokinetic Parameters in Humans for 670 Drug Compounds. Drug Metab. Dispos 2008, 36, (7), 1385–1405. [DOI] [PubMed] [Google Scholar]

[R52] 52.Yap CW, Padel-Descriptor: An Open Source Software to Calculate Molecular Descriptors and Fingerprints. J Comput Chem 2011, 32, (7), 1466–74. [DOI] [PubMed] [Google Scholar]

[R53] 53.Mansouri K; Grulke CM; Judson RS; Williams AJ, Opera Models for Predicting Physicochemical Properties and Environmental Fate Endpoints. J Cheminform 2018, 10, (1), 10. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R54] 54.Wang J; Hallinger DR; Murr AS; Buckalew AR; Lougee RR; Richard AM; Laws SC; Stoker TE, High-Throughput Screening and Chemotype-Enrichment Analysis of Toxcast Phase Ii Chemicals Evaluated for Human Sodium-Iodide Symporter (Nis) Inhibition. Environ. Int 2019, 126, 377–386. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R55] 55.Yang C; Tarkhov A; Marusczyk J; Bienfait B; Gasteiger J; Kleinoeder T; Magdziarz T; Sacher O; Schwab CH; Schwoebel J; Terfloth L; Arvidson K; Richard A; Worth A; Rathman J, New Publicly Available Chemical Query Language, Csrml, to Support Chemotype Representations for Application to Data Mining and Modeling. J Chem Inf Model 2015, 55, (3), 510–28. [DOI] [PubMed] [Google Scholar]

[R56] 56.Guha R, Chemical Informatics Functionality in R. Journal of Statistical Software 2007, 6, (18). [Google Scholar]

[R57] 57.MDL Information Systems Inc. Maccs Keys , 14600 Catalina Street, San Leandro, CA: 94577. [Google Scholar]

[R58] 58.Willighagen EL; Mayfield JW; Alvarsson J; Berg A; Carlsson L; Jeliazkova N; Kuhn S; Pluskal T; Rojas-Cherto M; Spjuth O; Torrance G; Evelo CT; Guha R; Steinbeck C, The Chemistry Development Kit (Cdk) V2.0: Atom Typing, Depiction, Molecular Formulas, and Substructure Searching. J Cheminform 2017, 9, (1), 33. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R59] 59.Liaw A; Wiener M, Classification and Regression by Randomforest. R News 2002, 2, (3), 18–22. [Google Scholar]

[R60] 60.Kuhn M Caret: Classification and Regression Training, 2019.

[R61] 61.Kuhn M, Caret: Classification and Regression Training. R Package Version 6.0–85 In R Project, https://CRAN.R-project.org/package=caret: 2020. [Google Scholar]

[R62] 62.Roy K; Kar S; Ambure P, On a Simple Approach for Determining Applicability Domain of Qsar Models. Chemometrics and Intelligent Laboratory Systems 2015, 145, 22–29. [Google Scholar]

[R63] 63.Kuhn M; Vaughan D Yardstick: Tidy Characterizations of Model Performance: R Package Version 0.0.7, CRAN: 2020. [Google Scholar]

[R64] 64.Buckland M; Gey F, The Relationship between Recall and Precision. Journal of the American Society for Information Science 1994, 45, (`), 12–19. [Google Scholar]

[R65] 65.Powers D Evaluation: From Precision, Recall and F Factor to Roc, Informedness, Markedness and Correlation; Flinders University: 2007. [Google Scholar]

[R66] 66.Zhu XW; Sedykh A; Zhu H; Liu SS; Tropsha A, The Use of Pseudo-Equilibrium Constant Affords Improved Qsar Models of Human Plasma Protein Binding. Pharm Res 2013, 30, (7), 1790–8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R67] 67.Lambrinidis G; Vallianatou T; Tsantili-Kakoulidou A, In Vitro, in Silico and Integrated Strategies for the Estimation of Plasma Protein Binding. A Review. Adv Drug Deliv Rev 2015, 86, 27–45. [DOI] [PubMed] [Google Scholar]

[R68] 68.Yun YE; Tornero-Velez R; Purucker ST; Chang DT; Edginton AN, Evaluation of Quantitative Structure Property Relationship Algorithms for Predicting Plasma Protein Binding in Humans. Computational Toxicology 2021, 17, 100142. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R69] 69.Toma C; Gadaleta D; Roncaglioni A; Toropov A; Toropova A; Marzo M; Benfenati E, Qsar Development for Plasma Protein Binding: Influence of the Ionization State. Pharm Res 2019, 36, (2), 28. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R70] 70.Filer DL; Kothiya P; Setzer RW; Judson RS; Martin MT, Tcpl: The Toxcast Pipeline for High-Throughput Screening Data. Bioinformatics 2017, 33, (4), 618–620. [DOI] [PubMed] [Google Scholar]

[R71] 71.Pearce RG; Setzer RW; Strope CL; Sipes NS; Wambaugh JF, Httk: R Package for High-Throughput Toxicokinetics. Journal of Statistical Software 2017, 79, (4). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R72] 72.U.S. EPA Toxcast & Tox21 Data Spreadsheet for in-Vitro Dbv3.2 https://www.epa.gov/chemical-research/toxicity-forecaster-toxcasttm-data (02/October/2020),

[R73] 73.Ring CL; Arnot JA; Bennett DH; Egeghy PP; Fantke P; Huang L; Isaacs KK; Jolliet O; Phillips KA; Price PS; Shin HM; Westgate JN; Setzer RW; Wambaugh JF, Consensus Modeling of Median Chemical Intake for the U.S. Population Based on Predictions of Exposure Pathways. Environ Sci Technol 2019, 53, (2), 719–732. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R74] 74.Centers for Disease Control and Prevention (CDC). National Center for Health Statistics (NCHS), National Health and Nutrition Examination Survey Data In U.S. Department of Health and Human Services, Centers for Disease Control and Prevention: Hyattsville, MD., 2009–2010. [Google Scholar]

[R75] 75.Hollander M; Wolfe DA; Chicken E, Nonparametric Statistical Methods John Wiley & Sons: 2013; Vol. 751. [Google Scholar]

[R76] 76.Paixao P; Gouveia LF; Morais JA, Prediction of the in Vitro Intrinsic Clearance Determined in Suspensions of Human Hepatocytes by Using Artificial Neural Networks. Eur J Pharm Sci 2010, 39, (5), 310–21. [DOI] [PubMed] [Google Scholar]

[R77] 77.Nikolic K; Agababa D, Prediction of Hepatic Microsomal Intrinsic Clearance and Human Clearance Values for Drugs. J Mol Graph Model 2009, 28, (3), 245–52. [DOI] [PubMed] [Google Scholar]

[R78] 78.Golbraikh A; Tropsha A, Predictive Qsar Modeling Based on Diversity Sampling of Experimental Datasets for the Training and Test Set Selection. Molecular Diversity 2002, 5, 231–243. [DOI] [PubMed] [Google Scholar]

[R79] 79.Ghafourian T; Amin Z, Qsar Models for the Prediction of Plasma Protein Binding. Bioimpacts 2013, 3, (1), 21–7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R80] 80.Kosugi Y; Hosea N, Prediction of Oral Pharmacokinetics Using a Combination of in Silico Descriptors and in Vitro Adme Properties. Mol Pharm 2021. [DOI] [PubMed]

[R81] 81.Lewis DFV, On the Recognition of Mammalian Microsomal Cytochrome P450 Substrates and Their Characteristics. Biochem. Pharmacol 2000, 60, 293–306. [DOI] [PubMed] [Google Scholar]

[R82] 82.Zeeman M; Auer CM; Clements RG; Nabholz JV; Boethling RS, Epa US Regulator Perspectives on the Use of Qsar for New and Existing Chemicla Evaluations. SAR QSAR Environ. Res 1995, 3, (3), 179–201. [DOI] [PubMed] [Google Scholar]

PERMALINK

Designing QSARs for parameters of high throughput toxicokinetic models using open-source descriptors

Daniel Dawson

Brandall L Ingle

Katherine A Phillips

John W Nichols

John F Wambaugh

Rogelio Tornero-Velez

Abstract

Graphical Abstract

1. INTRODUCTION

2. MATERIALS AND METHODS

2.1. Data Collection and Curation

2.1.1. Hepatic Clearance: Clint

2.1.2. Fraction Unbound in Plasma: fup

Fig 1.

2.2. QSAR Descriptor Calculation and Evaluation

2.2.1. Clint Model Development and Selection

2.2.2. Clint Model Evaluation

2.2.3. fup Model Development and Selection

2.3. Bioactivity: Exposure Ratio (BER)

3. RESULTS AND DISCUSSION

3.1. Random Forest Classification model for Clint

3.1.1. Model description and evaluation

Table 1.

3.2. QSAR model for fup

3.2.1. Model description and evaluation.

Table 2.

3.2.2. Comparison of fup QSAR model to recently published models.

3.2.3. Compliance with OECD guidelines and model interpretation

Table 3:

3.3. Risk-Based Prioritization Approach

3.3.1. Evaluation of QSAR model produced Css values.

3.3.2. Evaluation of Bioactivity-Exposure Relationship using the ToxCast Test Set

Fig 2.

3.3.3. Bioactivity-Exposure Relationship Risk Prioritization Applied to Tox21 Dataset

Fig 3.

Supplementary Material

SYNOPSIS:

ACKNOWLEDGEMENT

FUNDING SOURCES

Footnotes

REFERENCES

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases

2.1.1. Hepatic Clearance: Cl_int

2.1.2. Fraction Unbound in Plasma: f_up

2.2.1. Cl_int Model Development and Selection

2.2.2. Cl_int Model Evaluation

2.2.3. f_up Model Development and Selection

3.1. Random Forest Classification model for Cl_int

3.2. QSAR model for f_up

3.2.2. Comparison of f_up QSAR model to recently published models.

3.3.1. Evaluation of QSAR model produced C_ss values.