Abstract
BACKGROUND:
Toxicokinetic (TK) data needed for chemical risk assessment are not available for most chemicals. To support a greater number of chemicals, the U.S. Environmental Protection Agency (EPA) created the open-source R package “httk” (High Throughput ToxicoKinetics). The “httk” package provides functions and data tables for simulation and statistical analysis of chemical TK, including a population variability simulator that uses biometrics data from the National Health and Nutrition Examination Survey (NHANES).
OBJECTIVE:
Here we modernize the “HTTK-Pop” population variability simulator based on the currently available data and literature. We provide explanations of the algorithms used by “httk” for variability simulation and uncertainty propagation.
METHODS:
We updated and revised the population variability simulator in the “httk” package with the most recent NHANES biometrics (up to the 2017–18 NHANES cohort). Model equations describing glomerular filtration rate (GFR) were revised to more accurately represent physiology and population variability. The model output from the updated “httk” package was compared with the current version.
RESULTS:
The revised population variability simulator in the “httk” package now provides refined, more relevant, and better justified estimations.
SIGNIFICANCE:
This work supports the U.S. EPA’s mission to provide open-source data and models for evaluations and applications by the broader scientific community, and continuously improves the accuracy of the “httk” package based on the currently available data and literature.
Keywords: Generic physiologically-based toxicokinetic (PBTK) models, New approach methodologies (NAMs), Modeling software tools, Environmental justice, Model variability and uncertainty, Population simulator
INTRODUCTION
There is a need for swift assessments of potential chemical risks in our environment, but most chemicals lack sufficient toxicokinetic (TK) data to conduct assessments [1–3]. While therapeutic chemicals may undergo clinical trials, non-therapeutic chemicals traditionally rely on data from animal models. New approach methodologies (NAMs) provide tools that are potentially faster for determining hazard and more human relevant than animal studies by using in silico or in vitro methods [4–6]. High-throughput TK (HTTK) methods are one of the NAMs that can be used to characterize large numbers of chemicals by combining in vitro measurements and in silico predictions of chemical-specific TK properties with generic TK models [7–10]. To assess and screen the potential chemical hazards among the wider scientific community, the U.S. Environmental Protection Agency (EPA) provides HTTK methods through the freely available R package “httk” with open-source data and models [7, 11]. Chemical regulators in multiple nations have opened chemical risk assessments using NAMs, including a recent Health Canada report [12]. The Health Canada report advocates the use of HTTK methodology and the “httk” package for future screening level assessments under the Canadian Environmental Protection Act [12]. They and others have determined that points of departure based on in vitro testing can serve as protective surrogates for traditional, animal testing-based hazard data [12, 13]. To fulfill these needs, researchers at the U.S. EPA work to continuously improve the prediction accuracy of the “httk” package based on the currently available data and literature [7].
The U.S. EPA is required to consider “potentially exposed or susceptible subpopulations” in chemical risk assessment [14]. To address this need, the “httk” package integrates Monte Carlo simulation of population variability in TK, with the HTTK-Pop module [7, 15]. HTTK-Pop simulates population variability in TK by predicting distributions of physiological parameters based on distributions of demographic and biometric quantities from the National Health and Nutrition Examination Survey (NHANES), conducted by the U.S. Centers for Disease Control and Prevention (CDC) [15]. Every 2 years since 1999, NHANES surveys a representative sample of the American population (n = ~10,000) to collect various health-related information, including demographics and biometrics [7]. As of June 2022, the most recent NHANES cohort with fully published data is 2017–18. However, when HTTK-Pop was published in 2017, the most recent available NHANES cohort was 2011–12 [15]. As new NHANES cohort data are published, there is a need to modernize HTTK-Pop with the updated data. For a better representation of the current U.S. population, we updated HTTK-Pop with the most recent NHANES data available.
One key physiological parameter estimated by HTTK-Pop is the glomerular filtration rate (GFR) which characterizes the speed with which the kidney filters chemicals from the blood. HTTK-Pop uses the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation [16], a regression equation which predicts GFR based on serum creatinine measurements, age, sex, and whether race is “black” or “non-black”. The estimated GFR is ~16% higher in “black” than “non-black” persons [16]. Recent publications have questioned whether this “race coefficient” is appropriate for use in estimating GFR. In 2019, Eneanya et al. [17] laid out criteria for whether race is justified in making clinical decisions based on balancing benefits with harms, and determined that the GFR “race coefficient” fails the test. In 2020, the University of Washington (UW) Department of Medicine announced the removal of the GFR “race coefficient” used in clinical practice [18]. The same year, a task force was established by the National Kidney Foundation (NKF) and American Society of Nephrology (ASN) to broadly reassess the GFR “race coefficient” and its clinical implications [19]. In February 2022, the joint NKF-ASN task force recommended that GFR be estimated using equations without the “race coefficient” [20]. Although “httk” and HTTK-Pop are not intended for making clinical health decisions for individuals, the “race coefficient” in the CKD-EPI equation implemented in HTTK-Pop may result in underestimating potential risk for black populations, communities already experiencing environmental racism and its consequent health disparities [21]. Environmental justice is one of the missions of the U.S. EPA [22], and the use of HTTK-Pop is intended to make “httk” more appropriate for “potentially exposed or susceptible subpopulations” [14]. Therefore, we stopped using the CKD-EPI “race coefficient” in HTTK-Pop, and investigated the results of this change.
As we updated “httk” and HTTK-Pop, we also revised certain algorithms and provided more detailed justifications and explanations of the algorithms. Specifically, we updated: (1) the uncertainty propagation for chemical-specific TK parameters measured in vitro, (2) the GFR model for HTTK-Pop, (3) the default GFR in the physiologically-based toxicokinetic (PBTK) model, (4) the physicochemical properties data, and (5) the NHANES data for HTTK-Pop. With greater clarity and transparency, we hope to better serve the goal of providing open-source data and models to public health decision makers and the broader scientific community.
METHODS
We updated and revised the population variability and uncertainty propagation simulator in the “httk” package. The open-source R package “httk” version 2.1.0 served as the unchanged reference for the analysis presented here, while the new version 2.2.0 incorporates all changes described. The model outputs from the updated “httk” package were compared with version 2.1.0 for the pharmaceutical and non-pharmaceutical chemicals with environmental relevance (Supplementary Table S1) described in Wambaugh et al. [23]. Detailed modeling approaches and the population variability and uncertainty simulator of the current version of “httk” are described in Pearce et al. [11], Ring et al. [15], and Wambaugh et al. [24], respectively. It should be noted that we distinguish between the general methodology of high throughput toxicokinetics (HTTK) and the specific R package (“httk”) via capitalization, except that the “httk” population simulator HTTK-Pop is capitalized.
Monte Carlo simulation for uncertainty and variability
To simulate variability and propagate uncertainty, the R package “httk” uses Monte Carlo methods. The overview of the key functions involved in Monte Carlo uncertainty and variability simulation in “httk” is depicted as a flow diagram in Fig. 1, with descriptions for each function in Supplementary Table S2. Additional details are provided in Breen et al. [7] and Ring et al. [15].
Fig. 1. Overview of the key functions involved in Monte Carlo uncertainty and variability simulation in “httk”.

“create_mc_samples” is the primary function and creates a table of samples with a column for each parameter and a row for each “draw” of the Monte Carlo simulation. “httk-pop” simulates population variability in physiological parameters and is controlled by the function “httkpop_mc”. Since httk-pop predicts correlated human population variation in more than 50 biometrics, modelers may include a custom function “convert_httkpop_[MODEL]” for their PBTK model. Measurement uncertainty and population variability in chemical-specific parameters are controlled by “invitro_mc”.
TK model parameter values are sampled from distributions representing parameter uncertainty and variability, producing a matrix wherein each column corresponds to one parameter and each row corresponds to one sample. “httk” then uses each row of this table to parameterize and solve the same user-specified toxicokinetic model, generating model outputs corresponding to the sampled sets of parameter values. For example, to simulate variability for ten individuals, “httk” would generate a matrix with ten rows (one for each individual) where each row is a complete set of parameters describing that individual in terms of the relevant “httk” model, solve the TK model for each set of parameters, and return another matrix with ten rows; each row containing the TK model output for the corresponding set of parameters.
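The row-per-draw workflow described above can be sketched in a few lines. This is a minimal illustration in Python, not the “httk” implementation: the function names, the two sampled parameters, their distributions, and the toy steady-state model are all hypothetical and chosen only to show one parameter set per row and one model solution per row.

```python
import math
import random


def sample_parameter_table(n_draws, seed=0):
    """Return one complete parameter set (a dict) per Monte Carlo draw."""
    rng = random.Random(seed)
    table = []
    for _ in range(n_draws):
        table.append({
            # Hypothetical distributions, chosen only for illustration.
            "body_weight_kg": rng.lognormvariate(math.log(70.0), 0.2),
            "clearance_l_per_h": rng.lognormvariate(math.log(5.0), 0.3),
        })
    return table


def toy_css_mg_per_l(params, dose_mg_per_kg_per_day=1.0):
    """Toy steady-state model: dosing rate divided by total clearance."""
    dose_mg_per_h = dose_mg_per_kg_per_day * params["body_weight_kg"] / 24.0
    return dose_mg_per_h / params["clearance_l_per_h"]


def run_monte_carlo(n_draws, seed=0):
    """Solve the same model once for every row of the parameter table."""
    table = sample_parameter_table(n_draws, seed)
    return table, [toy_css_mg_per_l(row) for row in table]


# Ten simulated individuals in, ten model outputs out.
table, outputs = run_monte_carlo(10)
print(len(table), len(outputs))
```

Keeping the sampler and the model solver separate mirrors the design described in the text, where the sampled table can also be generated and inspected on its own.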
The “httk” function “calc_mc_css” implements a Monte Carlo simulation for the steady-state solutions of “httk” models. The function “calc_mc_css” generates the matrix of Monte Carlo-sampled model parameters using the function “create_mc_samples”, which can also be called directly by the user if desired. We aim to make any function in “httk” that accepts chemical identities (name, CAS-RN, DTXSID) also work when passed a complete vector of parameters describing the chemical in the terms needed by the function. This allows Monte Carlo methods to vary all parameters. All parameters can be varied according to normal distributions using the function “monte_carlo”.
Although “httk” allows variation of any parameter, we focus on two groups of parameters with specific, non-normal, distributions: (1) chemical-independent physiological parameters (tissue masses, blood flows, hematocrit, hepatocellularity, kidney function) and (2) chemical-specific TK parameters based on in vitro measurements (fraction unbound in plasma, fup, and intrinsic hepatic clearance, Clint). Monte Carlo sampling of these two parameter groups is implemented in two separate functions called by “create_mc_samples”. The function “httkpop_mc” simulates population physiological variability according to the HTTK-Pop methods of Ring et al. [15], and the function “invitro_mc” simulates in vitro measurement uncertainty and population variability according to the methods of Wambaugh et al. [24].
The function “httkpop_mc” uses HTTK-Pop, a population physiology simulator for TK modeling that generates a table of chemical-independent TK-relevant physiological parameters for a simulated population. First, HTTK-Pop samples quantities from the demographics, body measures, and laboratory data available from NHANES. Then, HTTK-Pop uses these NHANES quantities to predict other TK-relevant physiological quantities using regression equations from the literature [15]. This table of physiological quantities can be generated directly using the function “httkpop_generate”. Finally, the various physiological parameters from HTTK-Pop are converted into the parameters expected by the HTTK toxicokinetics models, using the function “httkpop_biotophys_default”.
A Monte Carlo simulation of uncertainty for chemical-specific in vitro TK data must handle a variety of different reporting formats for in vitro measurements and associated uncertainty. Tables of in vitro TK data built into “httk” come from a variety of sources, including data from the open literature (for example, Tonnelier et al. [25]), data from EPA contractors and collaborators (for example, Wetmore et al. [26]), and data analyzed using Bayesian methods to characterize uncertainty (Wambaugh et al. [24]). Some of these sources report only point estimates, while other sources report confidence intervals and/or p values. Wambaugh et al. [24] reports median and bounds of a 95% credible interval.
Uncertainty and variability are simulated in different ways by the function “invitro_mc” according to the in vitro measurements and uncertainty metrics available for the chemical of interest. The logic of “invitro_mc” is depicted as a set of flow diagrams in Supplementary Fig. S1 through S5. For both fup (Supplementary Fig. S1) and Clint (Supplementary Fig. S4), “invitro_mc” first simulates uncertainty. The range of values from the uncertainty propagation are then treated as population means for the population variability simulation (Supplementary Figs. S2, S3 and S5).
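The two-stage idea (uncertainty first, then variability around each uncertain value) can be illustrated with a nested draw. The sketch below is written in Python and is only an assumption-laden illustration: the function name is hypothetical, both stages are assumed log-normal, the 95% interval is assumed symmetric on the log scale, and the 30% coefficient of variation is a placeholder rather than a value taken from “invitro_mc”. The actual branching logic is given in Supplementary Figs. S1 through S5.

```python
import math
import random


def simulate_in_vitro_parameter(median, lower95, upper95, n_draws,
                                variability_cv=0.3, seed=0):
    """Nested Monte Carlo: measurement uncertainty, then population variability."""
    rng = random.Random(seed)
    # Stage 1 (uncertainty): log-scale SD implied by a 95% interval,
    # assuming the interval is symmetric on the log scale.
    uncertainty_sd = (math.log(upper95) - math.log(lower95)) / (2 * 1.96)
    # Stage 2 (variability): log-scale SD implied by a coefficient of variation.
    variability_sd = math.sqrt(math.log(1.0 + variability_cv ** 2))
    draws = []
    for _ in range(n_draws):
        # The uncertain value becomes the central value for this draw...
        central = rng.lognormvariate(math.log(median), uncertainty_sd)
        # ...and individual variability is drawn around that central value.
        draws.append(rng.lognormvariate(math.log(central), variability_sd))
    return draws


# Hypothetical measurement: median 0.1 with a 95% interval of 0.05 to 0.2.
samples = simulate_in_vitro_parameter(0.1, 0.05, 0.2, n_draws=5000)
```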
Revision of chemical-specific uncertainty propagation
In vitro TK measurements of non-therapeutic compounds are often performed with methods of chemical analysis (that is, the determination of chemical composition and concentration of samples) that are not fully optimized. An occurrence that especially complicates uncertainty analysis is when the median estimated value is below the limit of detection, but the upper 95th percentile estimate is above that limit. This situation for in vitro data occurs both with Clint and fup. Sampling from this distribution is not straightforward. Previously this was handled with various approaches throughout the Monte Carlo sampler software implementation, but we have now created a single function “rmed0non095” (see Supplementary Material).
PBTK GFR racial factor correction
The “race coefficient” in the CKD-EPI equation is now set to 1 by default (that is, the “non-black” value) for all individuals. The user can control this behavior with the argument “ckd_epi_race_coeff” of the function “httkpop_generate”. (Arguments to “httkpop_generate” can be provided to “calc_mc_css” or “create_mc_samples” as a named list in the argument “httkpop.generate.arg.list”.) To keep the coefficient fixed at 1, set “ckd_epi_race_coeff = FALSE” (the default, which also applies when the user does not supply a value); to use the original implementation with a coefficient of 1.159 for simulated individuals with NHANES race/ethnicity category “non-Hispanic black”, set “ckd_epi_race_coeff = TRUE”.
The CKD-EPI equation is used to estimate GFR only for simulated individuals with age 18 years and older because this equation has only been validated in adults. For simulated individuals under age 18, GFR is estimated using the equation of Johnson et al. [27], which estimates GFR based only on body surface area. Therefore, the change to the CKD-EPI equation affects only simulated individuals age 18 or older.
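For readers who want to see the effect of the coefficient directly, the following Python sketch evaluates the 2009 CKD-EPI creatinine equation with an optional race coefficient. It is not the “httk” code: the function and argument names are hypothetical, serum creatinine is assumed to be in mg/dl, and the under-18 branch (the Johnson et al. equation) is deliberately left out.

```python
def ckd_epi_2009(serum_creatinine_mg_dl, age_years, female,
                 black=False, use_race_coeff=False):
    """Estimated GFR (ml/min/1.73 m^2) from the 2009 CKD-EPI creatinine equation."""
    if age_years < 18:
        raise ValueError("CKD-EPI is applied to adults only in this sketch")
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = serum_creatinine_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209
            * 0.993 ** age_years)
    if female:
        egfr *= 1.018
    # The coefficient is applied only when explicitly requested,
    # mirroring a default of "coefficient fixed at 1".
    if use_race_coeff and black:
        egfr *= 1.159
    return egfr


# Same hypothetical individual, with and without the coefficient.
without_coeff = ckd_epi_2009(1.0, 40, female=False, black=True)
with_coeff = ckd_epi_2009(1.0, 40, female=False, black=True, use_race_coeff=True)
```

With all other inputs held fixed, the two values differ by exactly the factor 1.159, which is the roughly 16% difference discussed in the Introduction.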
In addition to the “race coefficient” change, we added residual population variability to the GFR estimated by both the CKD-EPI equation and the Johnson et al. [27] equation. This residual population variability is additional GFR variability not already explained by variability in serum creatinine, age, and sex (for CKD-EPI) or variability of body surface area (for Johnson et al. [27]). For the Johnson et al. [27] equation, residual variability was simply modeled as a lognormal distribution with a 30% coefficient of variation, as originally reported by Johnson et al. [27]. For the CKD-EPI equation, residual variability was reported only in the form of summary statistics for the residuals in Appendix Table 6 of Levey et al. [16]. For the “development” data set (n = 5504), the median residual was 0.4 and the interquartile range (IQR) of residuals was 14.7 on the natural scale. The percentage of estimated GFR within 30% of measured GFR (P30) was 85.6%. On the log scale, root mean squared error (RMSE) was 0.231. We used these summary statistics to reconstruct the full distribution of residual errors. The process is briefly described below.
Under the assumption that the residuals on the log scale are zero-mean with constant variance (a standard assumption for regression analyses) and notating the estimated GFR from the CKD-EPI equation as eGFR, it can be shown that the natural-scale residuals must obey a log-normal distribution with log-scale mean equal to eGFR and unknown log-scale standard deviation (logSD). The correct logSD can be found by finding the value that produces a distribution of residuals most closely matching the reported summary statistics for residuals.
This log-normal residual distribution is conditional on the value of eGFR, but the residual summary statistics are reported for the full distribution marginalized over all eGFR values. Therefore, to find a value of logSD that reproduces the residual summary statistics, we also needed to reproduce the distribution of eGFR values. The only information about the distribution of eGFR values in Levey et al. [16] is the reported percentiles of eGFR predicted by the CKD-EPI equation in the “external validation” dataset (Table 4 in Levey et al. [16]). We assume that the eGFR distribution is roughly the same in the “development” and “external validation” datasets. In the latter, 3.7% of patients had eGFR <15 ml/min/1.73 m²; 12.1% had 15–29; 33.2% had 30–59; 25.5% had 60–89; and 25.4% had >90. The minimum eGFR is 0 ml/min/1.73 m². To estimate a maximum eGFR, we visually examined Fig. 1 of Levey et al. [16], which shows an approximate maximum of 150. We used this information to approximate an empirical cumulative distribution function (CDF) for eGFR.
Using this empirical CDF for eGFR, we used R’s “optim” function to find a value of logSD that most closely reproduced the reported residual summary statistics in Levey et al. [16] Appendix Table 6. The result was logSD = 0.206.
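The fitting step can be sketched as follows. This Python illustration is not the authors’ code and makes several assumptions that should be read as such: a simple grid search stands in for R’s “optim”; the empirical CDF is interpolated linearly between the reported category edges; measured GFR is assumed to equal eGFR times a log-normal error with log-scale mean zero; and the loss (sum of squared relative errors across the four reported statistics) is a hypothetical choice. Because of these choices, the value it returns need not equal 0.206.

```python
import bisect
import math
import random
import statistics

# eGFR category edges (ml/min/1.73 m^2) and patient shares quoted in the text;
# 150 is the visually estimated maximum. Shares sum to 0.999 due to rounding.
EDGES = [0.0, 15.0, 30.0, 60.0, 90.0, 150.0]
SHARES = [0.037, 0.121, 0.332, 0.255, 0.254]
# Reported residual summary statistics for the development data set.
TARGET = {"median": 0.4, "iqr": 14.7, "p30": 0.856, "rmse_log": 0.231}


def make_inverse_cdf(edges, shares):
    """Return a function mapping u in [0, 1] to eGFR by linear interpolation."""
    total = sum(shares)
    cum = [0.0]
    for share in shares:
        cum.append(cum[-1] + share / total)
    cum[-1] = 1.0  # guard against floating-point drift

    def inverse(u):
        i = min(bisect.bisect_right(cum, u), len(cum) - 1)
        frac = (u - cum[i - 1]) / (cum[i] - cum[i - 1])
        return edges[i - 1] + frac * (edges[i] - edges[i - 1])

    return inverse


def summary_stats(log_sd, egfr, z):
    """Residual summaries implied by log_sd, for fixed eGFR and normal draws."""
    measured = [e * math.exp(log_sd * zi) for e, zi in zip(egfr, z)]
    residuals = [m - e for m, e in zip(measured, egfr)]
    q1, median, q3 = statistics.quantiles(residuals, n=4)
    p30 = sum(abs(e - m) <= 0.3 * m for e, m in zip(egfr, measured)) / len(egfr)
    rmse_log = math.sqrt(sum((log_sd * zi) ** 2 for zi in z) / len(z))
    return {"median": median, "iqr": q3 - q1, "p30": p30, "rmse_log": rmse_log}


def fit_log_sd(n=5000, seed=1):
    """Grid search for the log-scale SD that best matches the reported statistics."""
    rng = random.Random(seed)
    inverse = make_inverse_cdf(EDGES, SHARES)
    egfr = [inverse(rng.random()) for _ in range(n)]
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # reused for every candidate

    def loss(log_sd):
        stats = summary_stats(log_sd, egfr, z)
        return sum(((stats[k] - TARGET[k]) / TARGET[k]) ** 2 for k in TARGET)

    grid = [0.05 + 0.005 * i for i in range(91)]  # candidates from 0.05 to 0.50
    return min(grid, key=loss)


best_log_sd = fit_log_sd()
```

Reusing the same random draws for every candidate value keeps the loss smooth in logSD, so differences between candidates reflect the parameter rather than sampling noise.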
Ultimately, the estimated GFR values were assigned to simulated adult individuals in a two-stage process. First, the CKD-EPI equation was used to estimate GFR based on individual serum creatinine, age, and sex. Second, residual variability was drawn for each individual from a log-normal distribution with log-scale mean equal to that individual’s eGFR, and log-scale SD = 0.206.
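The second stage of this process can be sketched in Python as below. The sketch assumes that “log-scale mean equal to eGFR” means the log-normal distribution is centered at log(eGFR), so that the median of the simulated values equals the CKD-EPI estimate; the function name is hypothetical and the code is not taken from “httk”.

```python
import math
import random


def add_residual_variability(egfr_values, log_sd=0.206, seed=None):
    """Draw one GFR per individual from a log-normal centered on that eGFR.

    Assumption: the log-scale mean is log(eGFR), so the median of the
    draws for a given individual equals the CKD-EPI estimate.
    """
    rng = random.Random(seed)
    return [rng.lognormvariate(math.log(egfr), log_sd) for egfr in egfr_values]


# Hypothetical population in which every individual has eGFR = 100.
simulated_gfr = add_residual_variability([100.0] * 20000, seed=0)
```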
Revision of PBTK GFR physiological model
In addition, through the investigation of the GFR estimation issues in the “httk” package, we found recent publications with an alternate physiological modeling structure of GFR [28, 29]. We examined the current model structure of the GFR in the “httk” package and identified and corrected an oversight. The current model description shows renal clearance is the product of GFR and the concentration of chemical unbound in plasma leaving the kidney tissue, but physiologically the blood is filtered before entering the kidney tissue [28–30]. We have revised the model structure of the GFR to more accurately represent the kidney physiology of the renal clearance.
The revised physiological model of GFR described herein is a modification of the previously-described generic physiologically based toxicokinetic (PBTK) model in the “httk” package [11]. The generic PBTK model contains separate tissue compartments for the gut, liver, lungs, arteries, veins, kidneys, and rest of body, and describes the time-dependent amount of a substance in each compartment. The GFR is the volume of plasma that can be completely filtered per unit time, and only the unbound fraction of a chemical (that is, chemical bound neither to plasma proteins nor to red blood cells) is available to be removed via glomerular filtration. In the current model, the mass-balance differential equation of the kidney compartment defines the renal clearance as the GFR multiplied by the concentration of chemical unbound in plasma leaving the kidney tissue (last term of Eq. (1)):
| (1) |
where:
We revised the renal clearance model for chemicals to be eliminated as proportional to the amount of chemical unbound in the arterial plasma instead of being proportional to the amount of chemical unbound in plasma leaving the kidney tissue, which is a more accurate description of kidney physiology since blood is filtered prior to entering the kidney tissue (Fig. 2). The mass-balance differential equation of the kidney compartment, which defines the renal clearance (Eq. (1)), is modified by applying GFR to the concentration unbound in arterial plasma:
| (2) |
Fig. 2. Representation of “httk” PBTK model structure.

The generic PBTK model contains separate tissue compartments for the gut, liver, lungs, arteries, veins, kidneys, and rest of body, and describes the time-dependent amount of a substance in each compartment. Blood flow rates are represented with Q, tissue compartments and associated blood supplies are represented with rectangles. Some chemical is cleared from the system through liver metabolism and glomerular filtration. The renal clearance model was revised for chemicals to be eliminated as proportional to the amount of chemical unbound in the arterial plasma instead of being proportional to the amount of chemical unbound in plasma leaving the kidney tissue, which is a more accurate description of kidney physiology since blood is filtered prior to entering the kidney tissue.
The change in the amount of chemical in the kidney compartment is the difference in the amount that enters the kidney and the amount that leaves. The revised renal clearance model changes the amount entering the kidney compartment to be the amount of chemical entering from the arterial blood (proportional to the kidney blood flow rate) minus the amount of chemical cleared via GFR (last term of Eq. (2)).
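The difference between the two kidney mass balances can be written out explicitly. The following is a sketch reconstructed only from the verbal description in this section, not a transcription of Eqs. (1) and (2); all symbols are assumptions introduced here: $A_{kidney}$ is the amount of chemical in the kidney, $Q_{kidney}$ the kidney blood flow, $Q_{GFR}$ the glomerular filtration rate, $C_{art}$ the arterial blood concentration, $C_{kidney}$ the kidney tissue concentration, $K_{kidney2pu}$ the kidney-to-unbound-plasma partition coefficient, $f_{up}$ the fraction unbound in plasma, and $R_{b2p}$ the blood-to-plasma concentration ratio.

```latex
% Previous form: filtration acts on unbound plasma leaving the kidney tissue
\frac{dA_{kidney}}{dt}
  = Q_{kidney}\,C_{art}
  - Q_{kidney}\,\frac{C_{kidney}\,R_{b2p}}{K_{kidney2pu}\,f_{up}}
  - Q_{GFR}\,\frac{C_{kidney}}{K_{kidney2pu}}

% Revised form: filtration acts on unbound arterial plasma
\frac{dA_{kidney}}{dt}
  = Q_{kidney}\,C_{art}
  - Q_{kidney}\,\frac{C_{kidney}\,R_{b2p}}{K_{kidney2pu}\,f_{up}}
  - Q_{GFR}\,\frac{f_{up}}{R_{b2p}}\,C_{art}
```

Under these assumed definitions, only the last term differs: $C_{kidney}/K_{kidney2pu}$ is the unbound plasma concentration in equilibrium with kidney tissue, whereas $f_{up}\,C_{art}/R_{b2p}$ is the unbound plasma concentration in arterial blood.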
Physico-chemical properties update
The acid dissociation constant (pKa) characterizes the equilibrium in solution between a compound and its charged components, if any. The charge of a compound influences how it partitions into tissues. The chemical-specific pKa values used by “httk” were updated from the predictions of the proprietary pKa model in ChemAxon (Cambridge, MA, USA) as provided by the Supplementary Material of Strope et al. [31] (“httk” version 2.1.0) to the OPEn structure-activity Relationship App (OPERA) 2.7 predictions [32]. OPERA is a publicly-available software package to predict physicochemical properties, environmental fate parameters, and toxicity endpoints using quantitative structure-activity relationship (QSAR) models [33].
NHANES biometrics update
The update of the population variability simulator in the “httk” package was performed using the modeling approaches described by Ring et al. [15] with the most recent NHANES biometrics (up to the 2017–18 NHANES cohort). NHANES data were processed by combining data from the three most recent NHANES cycles (2013–2014, 2015–2016, 2017–2018). To handle the complex NHANES survey structure, the R “survey” package was used for the analysis. As in our previous analysis, we only used data for respondents with a health examination, and we used height, weight, serum creatinine, and hematocrit measurements for each cycle for the analysis [15]. To combine data from three cycles, the cycle-specific sample weight of each NHANES respondent was divided by three (the number of NHANES cycles used). This scaling follows the NHANES data analysis documentation; each sample weight represents the number of individuals in the total U.S. population represented by that NHANES respondent in that cycle.
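The weight rescaling is simple enough to show directly. The Python sketch below uses hypothetical field names and made-up weights; it only illustrates dividing each respondent’s cycle-specific weight by the number of pooled cycles so that the pooled weights sum to an average, rather than a triple count, of the represented population.

```python
def combine_cycle_weights(respondents, n_cycles=3):
    """Attach a pooled weight: the cycle-specific weight divided by n_cycles."""
    return [
        {**r, "combined_weight": r["cycle_weight"] / n_cycles}
        for r in respondents
    ]


# Made-up respondents, one per cycle, each "representing" 30,000 people.
respondents = [
    {"id": 1, "cycle": "2013-2014", "cycle_weight": 30000.0},
    {"id": 2, "cycle": "2015-2016", "cycle_weight": 30000.0},
    {"id": 3, "cycle": "2017-2018", "cycle_weight": 30000.0},
]
pooled = combine_cycle_weights(respondents)
total_represented = sum(r["combined_weight"] for r in pooled)
```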
The updated analysis resulted in 28,061 unique respondents, slightly reduced from the previous analysis with 29,353 unique respondents. In addition, we excluded 4441 respondents with missing information including age (for example, participants with age marked as “80”, since NHANES records all respondents 80 years and older as “80”), height, weight, hematocrit data, and those 12 years or older with missing serum creatinine data. The final numbers of unique respondents included in the HTTK-Pop dataset were 23,620 and 24,546 for the updated and previous analysis, respectively. The summary and comparison of these participants by race/ethnicity, sex, and age are provided in Supplementary Tables S3 and S4.
RESULTS
The model outputs from the updated “httk” package were compared with the current version for the 45 compounds with both in vitro HTTK data and in vivo TK data described in Wambaugh et al. [23], including the pharmaceutical and environmental chemicals (for example, pesticides and plasticizers) to represent the diversity of the chemicals. The chemical names, abbreviations, CAS-RN, and DTXSID are summarized in Supplementary Table S1.
Revision of chemical-specific uncertainty propagation
Supplementary Figs. S1–S5 provide logic diagrams explaining how parameters are varied depending on the type of data (that is, point estimates vs. quantiles) available for each chemical. These diagrams describe a process that has been streamlined and improved to better handle the wide range of in vitro TK parameter data. Two pathological cases (that is, “bugs”) were identified. First, when the median reported Clint was zero but the upper 95th percentile was non-zero, a small number of draws are now appropriately simulated as non-zero. Second, when median fup was below the limit of detection, only a single draw was being used for uncertainty rather than a range of values from the interval [min. fup, limit of detection]. Supplementary Table S5 summarizes the minimum, maximum, and mean values of the absolute and relative change in Monte Carlo steady-state plasma concentration (Css) at the upper 95th percentile before and after the revision of chemical-specific uncertainty propagation with the “httk” PBTK model. All 42 chemicals were affected with a mean absolute change of −5.24 mg/l and a relative change of −22.5%. Chlorpyrifos showed a minimum absolute change of −134.4 mg/l, Permethrin showed a minimum relative change of −99.4%, and Imazalil showed a maximum absolute change of 19.60 mg/l and a maximum relative change of 10.9%.
PBTK GFR racial factor correction
To evaluate the effect of setting the CKD-EPI race coefficient to 1, and of adding residual variability to the estimated GFR values, we simulated a population of 10,000 non-Hispanic black adults (ages 18–79) using HTTK-Pop in “direct resampling” mode (sample with replacement from actual NHANES data). We then assigned each simulated individual an estimated eGFR using the CKD-EPI equation under both the previous status quo (CKD-EPI race coefficient set to 1.159 for non-Hispanic black simulated individuals; no residual variability added) and the new defaults (CKD-EPI race coefficient set to 1 for all simulated individuals; residual variability added). All other physiology parameters were kept the same for each simulated individual. The result was that each simulated individual was assigned two calculated values of eGFR: one using the old status quo, and one using the new defaults. We compared the distributions of these two eGFR values for this simulated population (Fig. 3A). Unsurprisingly, the mode of the distribution shifts lower for the new defaults, because setting the race coefficient to 1 produces a lower eGFR for the same combination of serum creatinine, age, and sex. However, the new defaults also result in a heavier upper tail, reflecting the addition of residual variability. For this simulated population, ~76% of individuals were assigned lower eGFR using the new defaults vs. the old status quo, and 24% of individuals were assigned higher eGFR using the new defaults vs. the old status quo. If residual variability were not added, then every simulated individual would have had a lower eGFR when the race coefficient was set to 1. Their simulated age, sex, race, and serum creatinine remained the same, so their CKD-EPI eGFR would only have changed (i.e., decreased) because of the change in the race coefficient. 
However, with the addition of residual variability, their eGFR is the CKD-EPI estimated value, plus or minus some randomly-drawn amount of individual residual variability. For 24% of the simulated population, that individual residual variability was positive and large enough to move their eGFR higher than their original CKD-EPI eGFR (calculated with the race coefficient taking its original value). But for 76% of the population, that individual residual variability was either negative, or positive but not large enough to move their eGFR higher than their original CKD-EPI eGFR. The result is that there is a 76% chance that a simulated non-Hispanic black adult would have lower eGFR under the new default than under the old status quo.
Fig. 3. Evaluation of PBTK GFR racial factor correction.

A Kernel-density-estimated distributions of eGFR across a simulated population of 10,000 non-Hispanic black individuals, simulated using HTTK-Pop in “direct resampling” mode. eGFR estimated with the CKD-EPI equation using two methods. Green curve: race coefficient 1.159 for black individuals, and no residual variability added (old status quo). Orange curve: race coefficient set to 1 for all individuals, and residual variability added (new defaults). B Histogram across 943 chemicals of the % change in equivalent dose that results from the change in eGFR based on using the new defaults vs. the old status quo, for the same simulated population of 10,000 non-Hispanic black individuals shown in A. Binwidth of half a percentage point.
Further, we evaluated how much these changes to GFR estimation would affect “httk” estimated equivalent doses calculated using the 3-compartment steady-state model, across all 943 chemicals for which “httk” has built-in data to parameterize the 3-compartment steady-state model (Fig. 3B). Administered equivalent dose (AED) represents the daily oral dose (mg/kg/day) that would produce a Css equal to a specified target concentration. For the 3-compartment steady-state model, the relation between AED and target concentration is linear:
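$$\mathrm{AED} = \frac{C_{\mathrm{target}}}{C_{ss}(1\ \mathrm{mg/kg/day})} \times 1\ \mathrm{mg/kg/day} \tag{3}$$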
If “Css” is adopted as shorthand for “Css for 1 mg/kg/day”, then the percent change in AED (AED%) between old and new GFR estimation methods is:
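$$\mathrm{AED\%} = \frac{\mathrm{AED}_{\mathrm{new}} - \mathrm{AED}_{\mathrm{old}}}{\mathrm{AED}_{\mathrm{old}}} \times 100\% = \frac{C_{ss,\mathrm{old}} - C_{ss,\mathrm{new}}}{C_{ss,\mathrm{new}}} \times 100\% \tag{4}$$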
For each simulated individual in the simulated population, Css was calculated for each of 943 chemicals using the 3-compartment steady-state model, first using eGFR calculated with the old status quo (resulting in Css_old), and then using eGFR calculated with the new defaults (resulting in Css_new). Then, for each of the 943 chemicals, the 95th percentile of both Css_old and Css_new were calculated across the simulated population. 95th percentile Css reflects the most-sensitive 5% of individuals (500 in this simulated population). Then, the AED% was calculated for each chemical, and the resulting distribution of AED% across the 943 chemicals is illustrated as a histogram in Fig. 3B.
The AED% was zero or negative for nearly all chemicals (901/943). Interestingly, the distribution is bimodal. The lower peak represents chemicals with little or no hepatic clearance, such that nearly all clearance is controlled by the GFR parameter (see Supplementary Fig. S6). The upper peak represents chemicals where clearance is primarily hepatic, such that total clearance is not so sensitive to the GFR parameter (see Supplementary Fig. S6). These results suggest that these changes in the GFR estimation method will mainly affect chemicals whose clearance is primarily passive renal. For these chemicals, the changes in the GFR estimation method will tend to predict higher Css for non-Hispanic black simulated individuals. However, for chemicals whose clearance is primarily hepatic, the changes in the GFR estimation method will have little effect on the predicted Css for non-Hispanic black simulated individuals.
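The percentile-based comparison and the two clearance regimes can be illustrated with a small simulation. The Python sketch below is not the “httk” 3-compartment steady-state model: it assumes a commonly used steady-state expression (renal clearance as GFR times fup, plus well-stirred hepatic clearance), and every number in it (liver blood flow, the GFR distribution, fup, Clint) is hypothetical. The old GFR values are taken to include the 1.159 coefficient, and the new values remove it and add log-normal residual variability with logSD = 0.206.

```python
import math
import random
import statistics


def css_steady_state(gfr, fup, clint, q_liver=90.0, dose_rate=1.0):
    """Assumed steady-state model: dose rate / (renal + hepatic clearance)."""
    renal = gfr * fup
    hepatic = q_liver * fup * clint / (q_liver + fup * clint)
    return dose_rate / (renal + hepatic)


def percentile_95(values):
    """95th percentile: the last of the 19 cut points dividing data into 20 parts."""
    return statistics.quantiles(values, n=20)[-1]


def aed_percent_change(gfr_old, gfr_new, fup, clint):
    """Percent change in AED, using AED proportional to 1 / (95th percentile Css)."""
    css_old = percentile_95([css_steady_state(g, fup, clint) for g in gfr_old])
    css_new = percentile_95([css_steady_state(g, fup, clint) for g in gfr_new])
    return 100.0 * (css_old / css_new - 1.0)


rng = random.Random(42)
# Hypothetical adult GFR values (l/h) under the old status quo.
gfr_old = [rng.lognormvariate(math.log(6.0), 0.2) for _ in range(10000)]
# New defaults: remove the race coefficient, add residual variability.
gfr_new = [g / 1.159 * rng.lognormvariate(0.0, 0.206) for g in gfr_old]

renal_only = aed_percent_change(gfr_old, gfr_new, fup=0.5, clint=0.0)
mostly_hepatic = aed_percent_change(gfr_old, gfr_new, fup=0.5, clint=1000.0)
print(f"renal-only chemical:     {renal_only:.1f}%")
print(f"mostly hepatic chemical: {mostly_hepatic:.1f}%")
```

In this toy setting the renally cleared chemical shows a clearly negative change in AED, while the hepatically cleared chemical changes very little, which is the qualitative pattern behind the two peaks of the histogram.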
Revisions of PBTK GFR physiological model
We modified the GFR physiological modeling structure of the generic PBTK model in “httk” to describe kidney physiology more accurately. The modification did not affect Monte Carlo Css for the upper 95th percentile. Figure 4A shows the Css using the updated GFR model with the HTTK PBTK model vs. estimation based on in vivo data for 31 chemicals. The solid black line indicates the identity line, while the dashed light blue and red lines indicate the linear regression (log-scale) trend lines for pharmaceuticals (R2 of 0.3, MSE of 4.04 mg/l) and other chemicals (R2 of 0.38, MSE of 8.98 mg/l), respectively. Figure 4B shows the comparison of the TK clearance estimated from the in vivo data with the clearance predictions made using HTTK data for 38 chemicals. The solid black line indicates the identity line, while the dashed light blue and red lines indicate the linear regression (log-scale) trend lines for pharmaceuticals (R2 of 0.17, MSE of 2.5 mg/l/h) and other chemicals (R2 of 0.6, MSE of 3.13 mg/l/h), respectively.
Fig. 4. Evaluation of PBTK GFR physiological model revision.

A Css using the updated GFR model with the “httk” PBTK model vs. estimation based on in vivo data for 31 chemicals. The solid black line indicates the identity line, while the dashed light blue and red lines indicate the linear regression (log-scale) trend lines for pharmaceuticals and other chemicals, respectively. B Comparison of the TK clearance estimated from the in vivo data with the clearance predictions made using “httk” data for 38 chemicals. The solid black line indicates the identity line, while the dashed light blue and red lines indicate the linear regression (log-scale) trend lines for pharmaceuticals and other chemicals, respectively.
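For readers who want to reproduce this kind of evaluation, the sketch below fits an ordinary least-squares line to log10-transformed predicted and observed values and reports R2 and the mean squared error of the residuals. It is a generic illustration, not the authors' evaluation code: the function name is hypothetical, the choice of predicted values as the x-axis is an assumption, and computing MSE on the log10 scale is also an assumption, since the exact convention behind the reported values is not stated here.

```python
import math


def log10_regression_stats(predicted, observed):
    """Fit log10(observed) = intercept + slope * log10(predicted) by ordinary
    least squares and return slope, intercept, R2, and residual MSE."""
    x = [math.log10(v) for v in predicted]
    y = [math.log10(v) for v in observed]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return {
        "slope": slope,
        "intercept": intercept,
        "r2": 1.0 - ss_res / ss_tot,
        "mse": ss_res / n,  # mean squared residual on the log10 scale
    }


if __name__ == "__main__":
    # Made-up values: observations are exactly twice the predictions,
    # so the log-scale fit is a perfect line with slope 1.
    predicted = [0.1, 1.0, 10.0, 100.0]
    observed = [2.0 * p for p in predicted]
    print(log10_regression_stats(predicted, observed))
```

A trend line that is parallel to the identity line but offset from it, as in the toy data, indicates a consistent multiplicative bias rather than scatter.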
Physico-chemical properties update
The pKa values were updated from the predictions based on Strope et al. [31] to those from OPERA [32]. Figure 5A shows Monte Carlo Css for the upper 95th percentile using the pKa predictions accompanying Strope et al. [31] vs. updated predictions using OPERA 2.7 [32] for 33 chemicals with the “httk” PBTK model. The dashed blue line indicates the identity line. Figure 5B shows the Css using the OPERA estimated pKa values vs. estimation based on in vivo data for 31 chemicals. The solid black line indicates the identity line, while the dashed light blue and red lines indicate the linear regression (log-scale) trend lines for pharmaceuticals (R2 of 0.32, MSE of 3.92 mg/l) and other chemicals (R2 of 0.4, MSE of 8.75 mg/l), respectively. Supplementary Table S6 summarizes the minimum, maximum, and mean values of the absolute and relative change in Monte Carlo Css for the upper 95th percentile between the previous “httk” pKa data and the updated OPERA pKa data with the “httk” PBTK model. Out of 42 chemicals, 33 were affected, with a mean absolute change of 2.26 mg/l and a mean relative change of 32.3%. Imazalil showed the minimum absolute change (−98.8 mg/l) and the minimum relative change (−49.4%), Ondansetron showed the maximum absolute change (3.05 mg/l), and Imipramine showed the maximum relative change (1250.0%).
Fig. 5. Evaluation of physico-chemical properties update.

A Monte Carlo Css for the upper 95th percentile using the pKa predictions accompanying Strope et al. vs. updated predictions using the OPERA 2.7 for 33 chemicals with the “httk” PBTK model. The dashed blue line indicates the identity line. B Css using the OPERA pKa data vs. estimation based on in vivo data for 31 chemicals. The solid black line indicates the identity line, while the dashed light blue and red lines indicate the linear regression (log-scale) trend lines for pharmaceuticals and other chemicals, respectively.
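The summary statistics reported for this comparison (minimum, maximum, and mean of the absolute and relative change) can be computed as in the following sketch. The function name, the chemical labels, and the Css values are hypothetical. The sketch uses signed changes throughout, which is an assumption; the exact convention behind the reported means is given in Supplementary Table S6, not here.

```python
def change_summary(css_before, css_after):
    """Summarize per-chemical change in Css between two model versions.

    Both arguments map chemical name -> upper 95th percentile Css (mg/L).
    Absolute change is after - before; relative change is in percent.
    """
    abs_change = {c: css_after[c] - css_before[c] for c in css_before}
    rel_change = {c: 100.0 * abs_change[c] / css_before[c] for c in css_before}
    n = len(abs_change)
    return {
        "n_affected": sum(1 for v in abs_change.values() if v != 0.0),
        "min_abs": min(abs_change.items(), key=lambda kv: kv[1]),
        "max_abs": max(abs_change.items(), key=lambda kv: kv[1]),
        "mean_abs": sum(abs_change.values()) / n,
        "min_rel": min(rel_change.items(), key=lambda kv: kv[1]),
        "max_rel": max(rel_change.items(), key=lambda kv: kv[1]),
        "mean_rel": sum(rel_change.values()) / n,
    }


if __name__ == "__main__":
    # Made-up values for three placeholder chemicals.
    before = {"chem_A": 10.0, "chem_B": 2.0, "chem_C": 4.0}
    after = {"chem_A": 5.0, "chem_B": 3.0, "chem_C": 4.0}
    for key, value in change_summary(before, after).items():
        print(key, value)
```

Reporting both measures matters because they can single out different chemicals: a large absolute change may be a modest relative change for a chemical with a high Css, and the reverse holds for a chemical with a very low Css.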
NHANES biometrics update
We updated four NHANES biometrics (age, weight, height, and serum creatinine) with data from the three most recent cohorts (2013–2018) to predict the population distribution of our key physiological TK model parameters (hepatocellularity, GFR, portal vein flow, liver mass, and hematocrit). Table 1 compares the mean and median weight and height of the previous cohort (2007–12 NHANES) and the updated cohort (2013–18 NHANES) for five subgroups: Total, Male, Female, Adults, and Youth. To compare the mean weight and height of the previous and updated cohorts, we performed a two-sample t-test; the 95% confidence interval (CI) for each of the five subgroups is shown in Table 1. For three subgroups (Total, Female, and Adults), mean body weight increased in the updated cohort relative to the previous cohort, while mean body height showed no significant change. Both mean body weight and mean body height increased for the Youth subgroup, whereas neither changed significantly for the Male subgroup. To show the weight and height distributions of the previous and updated cohorts, scatter plots of weight vs. age and height vs. age for Total, Male, and Female are provided in Supplementary Figs. S7 and S8, respectively. Scatter plots of weight vs. height for Total, Male, and Female are provided in Supplementary Fig. S9.
Table 1.
Comparison of mean/median weight and height of the previous cohort (2007–2012 NHANES cohort) and updated cohort (2013–2018 NHANES cohort) for five subgroups: Total, Male, Female, Adults (age 20+ years old), and Youth (age 2–19 years old).
| Cohort | Total, median | Total, mean | Male, median | Male, mean | Female, median | Female, mean | Adults, median | Adults, mean | Youth, median | Youth, mean |
|---|---|---|---|---|---|---|---|---|---|---|
| Weight (kg) | | | | | | | | | | |
| NHANES 2007–2012 | 67.6 | 64.2 | 73.0 | 67.5 | 63.0 | 60.9 | 79.0 | 82.0 | 30.8 | 37.7 |
| NHANES 2013–2018 | 67.8 | 65.2 | 72.9 | 68.2 | 63.5 | 62.4 | 79.2 | 82.7 | 32.2 | 38.8 |
| Two-sample t-test (95% CI) | | [0.5, 1.6] | | [−0.1, 1.6] | | [0.7, 2.2] | | [0.2, 1.3] | | [0.4, 1.9] |
| Height (cm) | | | | | | | | | | |
| NHANES 2007–2012 | 161.9 | 151.6 | 169.9 | 156.0 | 157.6 | 147.2 | 167.3 | 167.6 | 132.6 | 127.7 |
| NHANES 2013–2018 | 161.2 | 151.7 | 169.4 | 156.0 | 157.2 | 147.5 | 166.4 | 166.8 | 133.8 | 128.8 |
| Two-sample t-test (95% CI) | | [−0.5, 0.6] | | [−0.8, 0.9] | | [−0.4, 1.0] | | [−1.1, −0.6] | | [0.1, 2.1] |
To compare the mean weight and height of the previous cohort and the updated cohort, we performed a two-sample t-test. The 95% confidence interval (CI) for each of the five subgroups is shown.
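As a reminder of how such an interval is read, the sketch below computes a 95% CI for the difference between two sample means and checks whether it excludes zero. It is a simplified illustration with several assumptions: it uses the unequal-variance (Welch) standard error with a normal quantile, which is adequate only for large samples such as NHANES cohorts; it ignores NHANES survey weights; and the function names and toy data are hypothetical. The exact test variant used in the study is not specified in this section.

```python
import math
from statistics import NormalDist, mean, variance


def mean_difference_ci(previous, updated, level=0.95):
    """Large-sample CI for mean(updated) - mean(previous), using the
    unequal-variance standard error and a normal quantile."""
    diff = mean(updated) - mean(previous)
    se = math.sqrt(variance(previous) / len(previous)
                   + variance(updated) / len(updated))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return diff - z * se, diff + z * se


def excludes_zero(ci):
    """True if the interval lies entirely above or entirely below zero,
    i.e. the difference in means is significant at the chosen level."""
    lower, upper = ci
    return lower > 0.0 or upper < 0.0


if __name__ == "__main__":
    # Made-up body weights (kg) for two tiny toy cohorts.
    previous = [61.0, 64.0, 67.0, 70.0, 73.0]
    updated = [62.0, 65.0, 68.0, 71.0, 74.0]
    ci = mean_difference_ci(previous, updated)
    print(ci, excludes_zero(ci))
```

An interval that contains zero, such as [−0.1, 1.6] for Male weight in Table 1, is read as no significant change, while an interval entirely above zero, such as [0.7, 2.2] for Female weight, indicates an increase.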
Figure 6 shows the comparison of Monte Carlo Css for the upper 95th percentile using the previous cohort vs. updated cohort for 42 chemicals with the “httk” PBTK model. The dashed blue line indicates the identity line. Supplementary Table S7 summarizes the minimum, maximum, and mean values of absolute and relative Monte Carlo Css change for the upper 95th percentile between the previous cohort (2007–12 NHANES cohort) and updated cohort (2013–18 NHANES cohort) with the “httk” PBTK model. Many chemicals had only a slight increase or decrease in the Monte Carlo Css, with a mean absolute change of 0.56 mg/l and a relative change of 3.57%. Ibuprofen showed a minimum absolute change of −3.65 mg/l, and Diclofenac showed a minimum relative change of −21.3%. Perfluorooctanoic acid showed a maximum absolute change of 11.2 mg/l, and Triclosan showed a maximum relative change of 23.4%.
Fig. 6. Evaluation of NHANES biometrics update.

Monte Carlo Css for the upper 95th percentile using the previous cohort (2007–12 NHANES cohort) versus updated cohort (2013–18 NHANES cohort) for 42 chemicals with the “httk” PBTK model. The dashed blue line indicates the identity line.
DISCUSSION
Here we modernize the “HTTK-Pop” population variability simulator based on the currently available data and literature. Monte Carlo simulation of human toxicokinetic variability based on the CDC NHANES cohort allows identification of individuals who are more sensitive to chemical exposure. The R package “httk” has been updated to reflect the most recent U.S. NHANES cohorts. Comparison of the previous cohort (2007–12 NHANES) and the updated cohort (2013–18 NHANES) indicates some increase in mean body weight, with no significant change in mean body height, for three subgroups: Total, Female, and Adults. Both mean body weight and mean body height increased for the Youth subgroup, whereas neither changed significantly for the Male subgroup. The generic (not chemical-specific or study-specific) PBTK model within “httk” has been updated to more accurately predict renal clearance by glomerular filtration, and the impact on chemical predictions has been examined. The regression model used by HTTK-Pop to predict kidney function (GFR) has been updated to remove dependence on race, following new clinical recommendations, and residual variability in GFR has been included. Also, the pKa values have been updated to predictions from the OPERA QSAR models. The revised NHANES biometric data within HTTK-Pop, GFR population prediction and variability, the GFR physiological modeling structure, and the revised pKa data all produce refined TK estimates.
When evaluating the risk of chemical exposures, it is important not only to consider biological variability but also to evaluate the uncertainty of chemical-specific measurements. To quantitatively assess the impact of experimental uncertainty in the measured chemical-specific parameters, a Monte Carlo simulator for propagation of uncertainty was recently included in the “httk” package, with an accompanying publication describing the methods [24]. In this paper, we provide detailed descriptions of the algorithms and processes, which can handle a wide range of in vitro TK parameter data.
Wambaugh et al. [23] reported that HTTK could make reasonable predictions of in vivo TK, which they characterized using R2 and MSE: R2 of 0.48 and MSE of 6.48 mg/l for Css; R2 of 0.19 and MSE of 2.44 mg/l/h for total clearance of pharmaceuticals; and R2 of 0.5 and MSE of 2.44 mg/l/h and 2.93 mg/l/h for total clearance of non-pharmaceutical chemicals (“httk” version 1.8 in 2018). In this study, we observed similar results for two revisions of “httk”: the revision of GFR in the PBTK model and the updated pKa values.
We have also removed the CKD-EPI “race coefficient” by default in “httk”/HTTK-Pop. Many experts have recommended removing the “race coefficient” when estimating GFR for clinical or public health decision-making [17, 18, 20, 34–36]. In “httk”, the “race coefficient” may result in underestimating potential risk for black populations, communities already experiencing environmental racism and its consequent health disparities [21]. Environmental justice is one of the missions of the U.S. EPA [22], and the use of HTTK-Pop is intended to make “httk” more appropriate for “potentially exposed or susceptible subpopulations” [14]. Although using the CKD-EPI equation with “race coefficient” set to 1 may slightly underestimate GFR (and thus overestimate risk) in black populations [37, 38], we consider this more-cautious approach to be appropriate, given the substantial criticisms of the “race coefficient”. We note that the NKF-ASN Task Force recommends using the CKD-EPI equation completely refit without the use of race as a predictor, rather than using the original equation with the race coefficient fixed at 1 [20]. This recommendation was published while this manuscript was in the final stages of preparation, so we have not yet been able to evaluate implementing it in HTTK-Pop. Future updates for GFR estimation in HTTK-Pop will continue to follow the most up-to-date recommendations from clinical and public health experts.
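For concreteness, the sketch below implements the published 2009 CKD-EPI creatinine equation [16] with a switch for the “race coefficient”. The coefficients are those of the published equation; the function name and the `use_race_coefficient` argument are hypothetical and do not correspond to the “httk” interface. The equation applies to adults, and how HTTK-Pop handles other age groups is not covered here.

```python
def ckd_epi_2009(scr_mg_dl, age_years, female, black=False,
                 use_race_coefficient=False):
    """Estimated GFR (mL/min/1.73 m^2) from the 2009 CKD-EPI creatinine equation.

    scr_mg_dl is serum creatinine in mg/dL. With use_race_coefficient=False
    (the default here), the race coefficient is fixed at 1 for everyone,
    so the black flag has no effect.
    """
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** (-1.209)
            * 0.993 ** age_years)
    if female:
        egfr *= 1.018
    if black and use_race_coefficient:
        egfr *= 1.159
    return egfr


if __name__ == "__main__":
    # Same hypothetical individual, with and without the race coefficient.
    with_coef = ckd_epi_2009(1.0, 50, female=False, black=True,
                             use_race_coefficient=True)
    without_coef = ckd_epi_2009(1.0, 50, female=False, black=True)
    print(round(with_coef, 1), round(without_coef, 1))
```

Fixing the coefficient at 1 lowers the estimated GFR for Black individuals by a factor of 1/1.159 (about 14%), which in turn raises predicted Css for chemicals cleared mainly by passive renal filtration.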
Furthermore, we have incorporated data-driven simulation of GFR residual variability into HTTK-Pop, to more accurately simulate the most-sensitive portion of the population. We estimated the distribution of residual variability from the CKD-EPI regression using reported residual summary statistics and percentiles of estimated GFR in the study populations. These reported values, and therefore our estimates of residual variability, are based on the CKD-EPI equation with the “race coefficient,” yet we apply the estimated residual variability to CKD-EPI predictions made without the “race coefficient.” Despite this inconsistency, we still consider this the best available estimate of the distribution of residual variability in GFR estimated using the CKD-EPI equation.
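The general idea of adding residual variability to a regression prediction can be sketched as follows. This is a schematic illustration only: the normal residual distribution, its standard deviation, and the function name are placeholders chosen for the example, whereas HTTK-Pop uses a distribution estimated from the reported CKD-EPI residual statistics.

```python
import random


def simulate_gfr_with_residuals(predicted_gfr, residual_sd, rng, floor=1e-6):
    """Add a random residual to each predicted GFR value.

    Placeholder assumption: residuals are normal with mean 0 and standard
    deviation residual_sd. Draws that would give a non-positive GFR are
    rejected and redrawn, so the result is always above floor.
    """
    simulated = []
    for pred in predicted_gfr:
        value = pred + rng.gauss(0.0, residual_sd)
        while value <= floor:
            value = pred + rng.gauss(0.0, residual_sd)
        simulated.append(value)
    return simulated


if __name__ == "__main__":
    rng = random.Random(2022)   # fixed seed for reproducibility
    predictions = [90.0] * 10   # hypothetical eGFR values, mL/min/1.73 m^2
    print([round(v, 1) for v in simulate_gfr_with_residuals(predictions, 15.0, rng)])
```

Without the residual term, every simulated individual with the same age, sex, and serum creatinine would receive the same GFR, which narrows the simulated distribution and can understate the low-GFR tail that drives the upper percentiles of Css.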
These updates to HTTK-Pop—including the changes to the CKD-EPI equation; the updated NHANES biometric data; the updated physiological model of GFR; updated physico-chemical properties; and updates to the uncertainty/variability propagation algorithm—all strive toward improving the ability of HTTK-Pop and the “httk” R package to inform decision-making about potential chemical risks. In support of the U.S. EPA’s chemical prioritization research program, we aim to create open-source data and models for evaluations and applications that are transparent, accessible, driven by the best available data and science, protective of potentially highly-exposed and susceptible subpopulations, and consistent with the U.S. EPA’s mission of environmental justice.
ACKNOWLEDGEMENTS
The authors thank Drs. Xiaoqing Chang and Kristin Eccles for their helpful U.S. EPA internal reviews of the manuscript. We greatly appreciate Dr. Sarah Davidson-Fritz for support with software engineering for the “httk” R package. We thank Drs. Peter Egeghy and Risa Sayre for useful conversations. Although the manuscript was reviewed by the US EPA and approved for publication, it may not necessarily reflect official Agency policy. Mention of trade names or commercial products does not constitute endorsement or recommendation for use.
FUNDING
The United States Environmental Protection Agency (EPA) through its Office of Research and Development (ORD) funded the research described here. This project was supported by appointments to the Internship/Research Participation Program at ORD and administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and U.S. EPA.
Footnotes
COMPETING INTERESTS
The authors declare no competing interests.
Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41370-022-00491-0.
DATA AVAILABILITY
The data from this study are provided within the article and its Supplementary Files.
REFERENCES
1. Egeghy PP, Judson R, Gangwal S, Mosher S, Smith D, Vail J, et al. The exposure data landscape for manufactured chemicals. Sci Total Environ. 2012;414:159–66.
2. Breyer S. Breaking the vicious circle: toward effective risk regulation. Cambridge (MA): Harvard University Press; 2009.
3. Judson R, Richard A, Dix DJ, Houck K, Martin M, Kavlock R, et al. The toxicity data landscape for environmental chemicals. Environ Health Perspect. 2009;117:685.
4. Kavlock RJ, Bahadori T, Barton-Maclaren TS, Gwinn MR, Rasenberg M, Thomas RS. Accelerating the pace of chemical risk assessment. Chem Res Toxicol. 2018;31:287–90.
5. Dix DJ, Houck KA, Martin MT, Richard AM, Setzer RW, Kavlock RJ. The toxcast program for prioritizing toxicity testing of environmental chemicals. Toxicological Sci. 2006;95:5–12.
6. Collins FS, Gray GM, Bucher JR. Transforming environmental health protection. Science. 2008;319:906.
7. Breen M, Ring CL, Kreutz A, Goldsmith MR, Wambaugh JF. High-throughput PBTK models for in vitro to in vivo extrapolation. Expert Opin Drug Metab Toxicol. 2021;17:903–21.
8. Coecke S, Pelkonen O, Leite SB, Bernauer U, Bessems JG, Bois FY, et al. Toxicokinetics as a key to the integrated toxicity risk assessment based primarily on non-animal approaches. Toxicol Vitr. 2013;27:1570–7.
9. Bessems JG, Loizou G, Krishnan K, Clewell HJ III, Bernasconi C, Bois F, et al. PBTK modelling platforms and parameter estimation tools to enable animal-free risk assessment: recommendations from a joint EPAA–EURL ECVAM ADME workshop. Regulatory Toxicol Pharmacol. 2014;68:119–39.
10. Wambaugh JF, Bare JC, Carignan CC, Dionisio KL, Dodson RE, Jolliet O, et al. New approach methodologies for exposure science. Curr Opin Toxicol. 2019;15:76–92.
11. Pearce RG, Setzer RW, Strope CL, Sipes NS, Wambaugh JF. Httk: R package for high-throughput toxicokinetics. J Stat Softw. 2017;79:1–26.
12. Health Canada. Science approach document—bioactivity exposure ratio: application in priority setting and risk assessment. 2021.
13. Paul Friedman K, Gagne M, Loo LH, Karamertzanis P, Netzeva T, Sobanski T, et al. Utility of in vitro bioactivity as a lower bound estimate of in vivo adverse effect levels and in risk-based prioritization. Toxicol Sci. 2020;173:202–25.
14. U.S. Congress. Frank R. Lautenberg chemical safety for the 21st century act. Public Law 114–182 (114th Congress). Washington, DC: U.S. Congress; 2016.
15. Ring CL, Pearce RG, Setzer RW, Wetmore BA, Wambaugh JF. Identifying populations sensitive to environmental chemicals by simulating toxicokinetic variability. Environ Int. 2017;106:105–18.
16. Levey AS, Stevens LA, Schmid CH, Zhang YL, Castro AF 3rd, Feldman HI, et al. A new equation to estimate glomerular filtration rate. Ann Intern Med. 2009;150:604–12.
17. Eneanya ND, Yang W, Reese PP. Reconsidering the consequences of using race to estimate kidney function. JAMA. 2019;322:113–4.
18. UW Medicine to exclude race from calculation of eGFR (measure of kidney function) [press release]. Seattle (WA): University of Washington Department of Medicine; 2020.
19. Delgado C, Baweja M, Burrows NR, Crews DC, Eneanya ND, Gadegbeku CA, et al. Reassessing the inclusion of race in diagnosing kidney diseases: an interim report from the NKF-ASN task force. Am J Kidney Dis. 2021;78:103–15.
20. Delgado C, Baweja M, Crews DC, Eneanya ND, Gadegbeku CA, Inker LA, et al. A unifying approach for GFR estimation: recommendations of the NKF-ASN task force on reassessing the inclusion of race in diagnosing kidney disease. Am J Kidney Dis. 2022;79:268–88.e1.
21. Kaufman JD, Hajat A. Confronting environmental racism. Environ Health Perspect. 2021;129:51001.
22. U.S. Environmental Protection Agency. Environmental Justice. 2022. https://www.epa.gov/environmentaljustice.
23. Wambaugh JF, Hughes MF, Ring CL, MacMillan DK, Ford J, Fennell TR, et al. Evaluating in vitro-in vivo extrapolation of toxicokinetics. Toxicological Sci. 2018;163:152–69.
24. Wambaugh JF, Wetmore BA, Ring CL, Nicolas CI, Pearce RG, Honda GS, et al. Assessing toxicokinetic uncertainty and variability in risk prioritization. Toxicol Sci. 2019;172:235–51.
25. Tonnelier A, Coecke S, Zaldívar J-M. Screening of chemicals for human bioaccumulative potential with a physiologically based toxicokinetic model. Arch Toxicol. 2012;86:393–403.
26. Wetmore BA, Wambaugh JF, Allen B, Ferguson SS, Sochaski MA, Setzer RW, et al. Incorporating high-throughput exposure predictions with dosimetry-adjusted in vitro bioactivity to inform chemical toxicity testing. Toxicol Sci. 2015;148:121–36.
27. Johnson TN, Rostami-Hodjegan A, Tucker GT. Prediction of the clearance of eleven drugs and associated variability in neonates, infants and children. Clin Pharmacokinet. 2006;45:931–56.
28. Bernstein AS, Kapraun DF, Schlosser PM. A model template approach for rapid evaluation and application of physiologically based pharmacokinetic models for use in human health risk assessments: a case study on per- and polyfluoroalkyl substances. Toxicol Sci. 2021;182:215–28.
29. Kim SJ, Choi EJ, Choi GW, Lee YB, Cho HY. Exploring sex differences in human health risk assessment for PFNA and PFDA using a PBPK model. Arch Toxicol. 2019;93:311–30.
30. Kriz W, Bankir L. A standard nomenclature for structures of the kidney. The Renal Commission of the International Union of Physiological Sciences (IUPS). Kidney Int. 1988;33:1–7.
31. Strope CL, Mansouri K, Clewell HJ, Rabinowitz JR, Stevens C, Wambaugh JF. High-throughput in-silico prediction of ionization equilibria for pharmacokinetic modeling. Sci Total Environ. 2018;615:150–60.
32. Mansouri K, Cariello NF, Korotcov A, Tkachenko V, Grulke CM, Sprankle CS, et al. Open-source QSAR models for pKa prediction using multiple machine learning approaches. J Cheminformatics. 2019;11:1–20.
33. Mansouri K, Grulke CM, Judson RS, Williams AJ. OPERA models for predicting physicochemical properties and environmental fate endpoints. J Cheminformatics. 2018;10:10.
34. Williams WW, Hogan JW, Ingelfinger JR. Time to eliminate health care disparities in the estimation of kidney function. N Engl J Med. 2021;385:1804–6.
35. Duggal V, Thomas IC, Montez-Rath ME, Chertow GM, Kurella Tamura M. National estimates of CKD prevalence and potential impact of estimating glomerular filtration rate without race. J Am Soc Nephrol. 2021;32:1454–63.
36. Young BA. Removal of race from estimation of kidney function. Nat Rev Nephrol. 2022;18:201–2.
37. Hsu CY, Yang W, Parikh RV, Anderson AH, Chen TK, Cohen DL, et al. Race, genetic ancestry, and estimating kidney function in CKD. N Engl J Med. 2021;385:1750–60.
38. Levey AS, Titan SM, Powe NR, Coresh J, Inker LA. Kidney disease, race, and GFR estimation. Clin J Am Soc Nephrol. 2020;15:1203–12.
