Abstract
To fulfil the promise of reducing reliance on mammalian in vivo laboratory animal studies, new approach methods (NAMs) need to provide a confident basis for regulatory decision-making. However, previous attempts to develop in vitro NAMs-based points of departure (PODs) have yielded mixed results, with PODs from U.S. EPA’s ToxCast, for instance, appearing more conservative (protective) but poorly correlated with traditional in vivo studies. Here, we aimed to address this discordance by reducing the heterogeneity of in vivo PODs, accounting for species differences, and enhancing the biological relevance of in vitro PODs. However, we only found improved in vitro-to-in vivo concordance when combining the use of Bayesian model averaging-based benchmark dose modeling for in vivo PODs, allometric scaling for interspecies adjustments, and human-relevant in vitro assays with multiple induced pluripotent stem cell-derived models. Moreover, the available sample size was only 15 chemicals, and the resulting level of concordance was only fair, with correlation coefficients < 0.5 and prediction intervals spanning several orders of magnitude. Overall, while this study suggests several ways to enhance concordance and thereby increase scientific confidence in in vitro NAMs-based PODs, it also highlights challenges in their predictive accuracy and precision for use in regulatory decision making.
Keywords: point of departure (POD), in vitro-to-in vivo extrapolation (IVIVE), Bayesian modeling, new approach methods (NAMs), human induced pluripotent stem cell
Introduction
Existing animal-based toxicity testing approaches are deemed too time-consuming and resource-intensive to support current and future regulatory requirements and public health needs (Kavlock et al. 2018). Additionally, it has been suggested that findings from high-dose experiments in animals may not accurately reflect potential effects in humans. These limitations, combined with ethical concerns about animal experiments, have created an impetus to explore alternative approaches to toxicity testing (NRC 2007, 2017; Krewski et al. 2020). Accordingly, in vitro and in silico methods are being extensively explored with the promise of more efficient and accurate evaluations for a large number of chemicals, particularly for those that lack traditional toxicological data. In response to an increase in the availability of the data from “new approach methods” (NAMs), increasing attention has been devoted to the need to rigorously (i) evaluate novel in vitro and in silico approaches to increase scientific confidence in regulatory decision-making, (ii) assess their equivalence (or lack thereof) to mammalian in vivo testing, and (iii) determine their intended purpose and context of use in human risk assessment (van der Zalm et al. 2022; NASEM 2023). Specifically, greater transparency in defining the population, exposure, comparator, and outcomes (PECO) for in vitro models in parallel with their intended human “target” PECO was recommended to facilitate the incorporation of NAMs into risk assessment. Evaluation of the validity of NAMs in terms of biological relevance to human population and outcomes, as well as the accuracy of model predictions, will also strengthen their utility and relevance (NASEM 2023).
While in vitro and in silico methods offer many potential advantages, there is a need to address the concordance between traditional in vivo testing and NAMs. The correlation between in vivo and in vitro points of departure (PODs) has been expected by decision-makers as one prerequisite for using NAMs for deriving toxicity values for chemicals without traditional safety data. For example, the confidence in using animal studies-based PODs for human health decisions has been enhanced by the transition from no observed adverse effect level (NOAEL) or the lowest observed adverse effect level (LOAEL) to the benchmark dose (BMD) modeling and accounting for statistical uncertainty (U.S.EPA 2012). Additional enhancements for dose-response assessments include Bayesian model averaging (BMA) BMD approach (Shao and Shapiro 2018) and probabilistic dosimetric adjustment factors (WHO/IPCS 2018). These approaches help improve the confidence in the extrapolation from animal doses to human equivalent doses.
Dose-response analysis of NAMs data has benefited from more granular dose selection and several BMD-like approaches that are widely used for derivation of PODs. Because most NAMs assays are based on human cells, interspecies extrapolation is not needed; however, additional adjustment factors (Rusyn and Chiu 2022) and in vitro-to-in vivo extrapolation (IVIVE) (Bell et al. 2018) may be needed to convert in vitro PODs to oral equivalent doses. With hundreds of chemicals that have both traditional and NAMs data, efforts have been made to compare in vitro and in vivo PODs. It has been reported that in vitro PODs, when extrapolated from nominal concentrations to oral equivalent doses through IVIVE, appear to be more conservative than traditional PODs by around two orders of magnitude on average, and thus they have been proposed to serve as protective surrogates in prioritizing risk when the traditional in vivo data is lacking (Paul Friedman et al. 2020; Beal et al. 2022; Health Canada 2021). While ideally, cellular concentrations would be compared between in vivo experimental animal and in vitro human data, neither has been measured or modeled systematically across a large range of chemicals in both laboratory animal species and in vitro assays. Paul Friedman et al. (2020) did explore the inclusion of interspecies adjustment for in vivo PODs, which reduced the level of conservatism of ToxCast in vitro PODs (median of log10(in vivo POD/NAM POD) improved from 2 to 1.33). Nonetheless, the correlation between in vitro PODs, primarily based on ToxCast data (Williams et al. 2017), and in vivo PODs is generally quite poor (R2 ≤ 0.12), posing a confidence challenge with relying on the NAMs-derived dose-response data for establishing regulatory toxicity values for chemicals without traditional data (Fantke et al. 2021; Wignall et al. 2018; Paul Friedman et al. 2020).
To build confidence in using in vitro PODs for regulatory decision-making, it is essential to further explore the poor concordance between in vitro and in vivo PODs. One reason could be the type of cell models used. While ToxCast is the largest and most systematic dataset of NAMs (Williams et al. 2017), other studies have used additional human cell types for screening large chemical libraries. For example, human induced pluripotent stem cell (hiPSC)-derived models have emerged as a promising approach in toxicology and can model a range of organ systems (e.g., heart, brain, and liver) (Burnett, Blanchette, et al. 2021b; Chen et al. 2020; Anson, Kolaja, and Kamp 2011; Pang 2020). Some hiPSC-derived cells, such as cardiomyocytes, have recently been employed for population-based testing (Blanchette et al. 2022; Burnett et al. 2019; Burnett, Blanchette, et al. 2021b). Other population-based human cell models, such as lymphoblastoid cell lines (LCLs) (Abdo et al. 2015; Ford et al. 2022), have also been used to address inter-individual variability in responses to chemicals and mixtures (Eduati et al. 2015; Blanchette et al. 2020). Other reasons for the poor concordance could be the ways in which animal-based PODs were derived (Paul Friedman et al. 2020; Beal et al. 2022) or the fact that most available in vivo PODs were based on the NOAEL or LOAEL approach. Harmonizing in vivo and in vitro POD derivation may narrow the gap and provide more appropriate comparisons.
In this study, we aimed to address potential reasons for the limited concordance between in vitro and in vivo PODs by (1) reducing the heterogeneity of in vivo PODs through BMA BMD modeling, (2) accounting for species differences through interspecies allometric scaling, and (3) increasing the human biological relevance of in vitro PODs by including the data from additional human-tissue relevant in vitro assays. We hypothesized that implementing these approaches will improve in vivo to in vitro alignment specifically in the context of deriving health protective PODs for supporting regulatory toxicity values. In addition, in keeping with recommendations from NASEM (2023), we defined a “health protective” parallel PECO for each assay battery along with a corresponding target human PECO.
Materials and methods
Parallel PECO approach
As suggested by NASEM (2023), parallel PECO statements serve to identify the relevant information within toxicity testing methods that are assumed to be surrogates for the target human population of interest within the context of use. In non-cancer human dose-response assessment, the goal is to ensure the protection of the population by deriving non-cancer toxicity values based on the most sensitive endpoints (also known as “critical effects”). Thus, as shown in Table 1, our target “P” is the human population; target “E” is internal dose exposure expressed as steady state plasma concentration; target “C” is low or no exposure; and target “O” is health protective effects. As surrogates for the “target human PECO,” in vivo toxicity testing and in vitro models measure various phenotypes as endpoints, and the most sensitive values are considered to be protective. The first “parallel” PECO in Table 1 is the “traditional” approach, where the POD is derived from regulatory toxicological reviews of in vivo toxicity studies.
Table 1.
Parallel framework describing in vivo and in vitro toxicity testing methods and target human PECO statements evaluated in this study.
| Target human PECO | Toxicity testing method | Test method PECO |
|---|---|---|
| P: Human population E: Oral exposure to chemicals, converted to steady state plasma concentration C: No exposure O: Health protective effects | Regulatory toxicological review by EPA of in vivo toxicity studies | P: Human populations or experimental mammalian animals E: Oral exposure to chemicals, converted to human equivalent steady state plasma concentration C: No or lower exposure O: Critical effect determined by regulatory toxicological review |
| (as above) | ToxCast high throughput screening battery | P: ToxCast battery of cell or protein-based assays E: Chemicals dissolved in media with vehicles such as dimethyl sulfoxide (DMSO) C: DMSO in media O: Lower 5th percentile of responses measured as positive hit calls |
| (as above) | High throughput screening battery using four hiPSC-derived cell types (hepatocytes, neurons, cardiomyocytes, endothelial cells), human umbilical vein endothelial cells [HUVECs], and population-based lymphoblastoid cell lines (LCLs) | P: Battery of hiPSC-derived cells, HUVECs, and 146 human LCLs from four diverse subpopulations of European and African descent E: Chemicals dissolved in media with DMSO C: Negative controls: DMSO in media; Positive controls: known positive chemicals/drugs that were specific for each cell type. O: Most sensitive functional and cytotoxic phenotypes for each five-cell-type and decreased intracellular ATP concentration as cell viability for LCLs |
| (as above) | High throughput screening battery in hiPSC-derived population-based cardiomyocytes | P: hiPSC-derived cardiomyocytes from 5 donors E: Chemicals dissolved in media with DMSO C: Negative controls: DMSO in media; Positive controls: isoproterenol (maximum increase in beats per minute [BPM]), propranolol (EC50 in BPM), and cisapride (maximum increase in decay–rise ratio). O: Most sensitive functional or cytotoxic phenotypes: increased response for [+] chronotropy and decay–rise ratio or decreased response for [−] chronotropy; Asystole phenotype: decrease in peak frequency, decreased total number of cells |
The remaining “parallel” PECOs reflect the approach for each in vitro method used in our study. We include ToxCast as a comparison that has been used previously. The remaining approaches are based on previous work in our laboratories, where the selection of cell models, toxicity testing assays, and relevant parameters is detailed in the original studies (Ford et al. 2022; Chen et al. 2020; Burnett, Blanchette, et al. 2021a). The first is a battery of five cell types representing various organs and tissues that are toxicologically important (liver, nervous system, heart, vasculature), including four iPSC-derived bioassays, in addition to LCLs from multiple (>100) donors to address population variability. Across these assays, a multitude of phenotypes were evaluated, including those related to physiological function (e.g., neurite outgrowth in neurons, cell beating in cardiomyocytes) as well as measures of viability. This approach is intended to represent broader coverage of the biological domain, as suggested by NASEM (2023). Separately, we incorporated a larger screening dataset from iPSC-derived population-based cardiomyocytes alone to assess the ability of a “sensitive” cell type to serve as a surrogate for in vivo PODs. Additionally, we tried to be as consistent as possible with respect to PODs. All ToxCast bioactivity-based PODs are those reported by U.S. EPA, derived through their standard pipeline with a Hill model and gain-loss model at 50% response (AC50). For our own datasets, we only used previously published POD values, which were derived using a Hill model and response levels that were specific to each phenotype as previously reported.
Finally, we note that because there are insufficient human data to serve as a test of external validity, we herein compare to the “traditional” approaches based on in vivo experimental animal data, fully converted to human equivalent estimates through the use of allometric scaling and high-throughput toxicokinetic modeling.
Data Analysis Workflow
Based on the framework outlined by the parallel PECO statements, Figure 1 illustrates the workflow of our study design. Initially, we identified a total of 652 chemicals with noncancer reference dose (RfD) values from the U.S. EPA Regional Screening Levels (RSLs) database. Among these, 56 chemicals have published in vivo dose-response data, were tested in ToxCast in vitro bioactivity assays, and have toxicokinetic predictions for the human plasma steady state concentration (CSS) available from the high-throughput toxicokinetics (httk) R package. Within this subset, we previously tested 15 substances, including 3 PAHs, 3 intermediates, and 9 pesticides, in a battery of six human cell-based in vitro assays (i.e., four hiPSC models [cardiomyocytes, neurons, hepatocytes, endothelial cells], human umbilical vein endothelial cells [HUVECs], and LCLs) (Ford et al. 2022; Chen et al. 2020). These 15 substances, plus an additional 26 chemicals (3 flame retardants, 2 plasticizers, 2 surfactants, 1 food additive, 13 intermediates, and 5 pesticides), were also previously tested in a population-based hiPSC-derived cardiomyocyte assay (Burnett, Blanchette, et al. 2021a). A detailed list of the selected chemicals is provided in Table 2.
Figure 1.

Overview of chemical selection in this study and the derivation of in vivo and in vitro PODs. Abbreviations: CSS, steady state concentration; hiPSC, human induced pluripotent stem cell; POD, point of departure; AC50, 50% maximal activity concentration (used as the 5th percentile of its distribution); iPSC-CM, iPSC-derived cardiomyocyte; iPSC-Neu, iPSC-derived neuron; iPSC-Hep, iPSC-derived hepatocyte; iPSC-Endo, iPSC-derived endothelial cell; HUVEC, human umbilical vein endothelial cell; LCL, lymphoblastoid cell line.
Table 2.
Information of chemicals and classifications, regulatory in vivo PODs, and available in vitro POD datasets (denoted by X) used in this study.
| Chemical | CAS | Classification | ToxCast PODs | Six-cell-type PODs | Population Cardiomyocyte PODs |
|---|---|---|---|---|---|
| 1,2,3-Trichlorobenzene | 87-61-6 | Solvent | X | X | X |
| Acenaphthene | 83-32-9 | PAH | X | X | X |
| Aldrin | 309-00-2 | Pesticide | X | X | X |
| Azinphos-methyl | 86-50-0 | Pesticide | X | X | X |
| Benzidine | 92-87-5 | Chemical intermediate | X | X | X |
| Chlorpyrifos | 2921-88-2 | Pesticide | X | X | X |
| Dicofol | 115-32-2 | Pesticide | X | X | X |
| Dieldrin | 60-57-1 | Pesticide | X | X | X |
| Endosulfan | 115-29-7 | Pesticide | X | X | X |
| Fluoranthene | 206-44-0 | PAH | X | X | X |
| gamma-Hexachlorocyclohexane | 58-89-9 | Pesticide | X | X | X |
| Heptachlor | 76-44-8 | Pesticide | X | X | X |
| Naphthalene | 91-20-3 | PAH | X | X | X |
| p-Cresol | 106-44-5 | Chemical intermediate | X | X | X |
| Pentachlorophenol | 87-86-5 | Pesticide | X | X | X |
| Biphenyl | 92-52-4 | Chemical intermediate | X | | X |
| 1,2,4,5-Tetrachlorobenzene | 95-94-3 | Chemical intermediate | X | | X |
| 1,3-Dinitrobenzene | 99-65-0 | Chemical intermediate | X | | X |
| 1,4-Dichlorobenzene | 106-46-7 | Chemical intermediate | X | | X |
| 2,4,5-Trichlorophenoxyacetic acid | 93-76-5 | Pesticide | X | | X |
| 2-Mercaptobenzothiazole | 149-30-4 | Pesticide | X | | X |
| 3-Nitrotoluene | 99-08-1 | Chemical intermediate | X | | X |
| 4-Nitroaniline | 100-01-6 | Chemical intermediate | X | | X |
| 4-Nitrotoluene | 99-99-0 | Chemical intermediate | X | | X |
| Butyl Benzyl Phthalate | 85-68-7 | Plasticizer | X | | X |
| Cacodylic acid | 75-60-5 | Pesticide | X | | X |
| Caprolactam | 105-60-2 | Chemical intermediate | X | | X |
| di-N-octyl phthalate | 117-84-0 | Plasticizer | X | | X |
| Hexachlorocyclopentadiene | 77-47-4 | Chemical intermediate | X | | X |
| Methyl ethyl ketone | 78-93-3 | Food additive | X | | X |
| Mirex | 2385-85-5 | Pesticide | X | | X |
| Nitrobenzene | 98-95-3 | Chemical intermediate | X | | X |
| p,p’-DDE | 72-55-9 | Pesticide | X | | X |
| p-Chloroaniline | 106-47-8 | Chemical intermediate | X | | X |
| Perfluorononanoic acid (PFNA) | 375-95-1 | Surfactant | X | | X |
| Perfluorooctanesulfonic acid (PFOS) | 1763-23-1 | Surfactant | X | | X |
| Phenol | 108-95-2 | Chemical intermediate | X | | X |
| Phenothiazine | 92-84-2 | Chemical intermediate | X | | X |
| Potassium perfluorobutanesulfonate | 29420-49-3 | Flame retardant | X | | X |
| Tris(1,3-dichloro-2-propyl) phosphate | 13674-87-8 | Flame retardant | X | | X |
| Tris(2-chloroethyl)phosphate | 115-96-8 | Flame retardant | X | | X |
For each of the selected chemicals, we derived alternative in vivo PODs using the available in vivo dose-response data by (1) applying Bayesian benchmark dose modeling and (2) deriving human equivalent dose (HED)-PODs through deterministic (allometric scaling) or probabilistic interspecies adjustments. To appropriately compare with in vitro PODs, we converted in vivo oral PODs (mg/kg-day) to steady state plasma concentrations (μM) using the httk R package database. The ToxCast in vitro PODs were defined as the 5th percentile of the distribution of 50% maximal activity concentration (AC50) values. The six-cell-type in vitro POD was selected as the most sensitive POD across cell types and phenotypes. The population-based hiPSC-derived cardiomyocyte PODs were likewise selected as the most sensitive phenotypic POD for the population median.
Additional details for the derivation of each POD and subsequent analyses are provided as follows.
Derivation of in vivo PODs
Four alternative in vivo PODs were derived for each chemical:
The original regulatory in vivo POD in the experimental animal (Reg PODA) used to support the derivation of the regulatory toxicity value as collated in the U.S. EPA Regional Screening Levels (RSLs) database. Supplemental Table S1 lists the original regulatory POD studies, the study species, and in vivo PODs in animal doses for selected chemicals.
The human equivalent chronic dose for the original regulatory in vivo POD (Reg PODH), converted from animal dose using EPA default factors for allometric scaling from U.S.EPA (2011) and (if necessary) subchronic to chronic adjustment.
Bayesian model averaging-based benchmark dose (BMA BMDA), calculated using Bayesian BMD (BBMD) system (https://benchmarkdose.org) (Shao and Shapiro 2018), from the original dose-response data in the experimental animal supporting regulatory in vivo PODs. If multiple endpoints were the basis for the original LOAEL or NOAEL for a given chemical, we selected the endpoint with the minimum median BMD value as the most sensitive POD for further analysis.
The human equivalent chronic benchmark dose (BMA BMDH), derived from applying WHO/IPCS (2018) default distributions for probabilistic inter-species and duration adjustments to the BMA BMDA.
Additional details for each derivation are provided in Supplementary Materials and Methods.
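The deterministic interspecies adjustment used for Reg PODH can be illustrated with a short sketch. It assumes the body-weight-to-the-3/4-power convention, under which a dose expressed per kg body weight is multiplied by a dosimetric adjustment factor of (BW_animal/BW_human)^(1/4). The function name, the 80 kg human body weight, and the 0.25 kg rat body weight are illustrative assumptions rather than values taken from this study, and neither the subchronic-to-chronic adjustment nor the probabilistic WHO/IPCS adjustment is covered here.

```python
def human_equivalent_dose(animal_dose_mg_per_kg_day, bw_animal_kg, bw_human_kg=80.0):
    """Scale an animal oral dose (mg/kg-day) to a human equivalent dose.

    Assumes BW^(3/4) allometric scaling of the total dose; after dividing by
    body weight this leaves a factor of (BW_animal / BW_human) ** 0.25.
    The default human body weight is an illustrative assumption.
    """
    if bw_animal_kg <= 0 or bw_human_kg <= 0:
        raise ValueError("body weights must be positive")
    dosimetric_adjustment_factor = (bw_animal_kg / bw_human_kg) ** 0.25
    return animal_dose_mg_per_kg_day * dosimetric_adjustment_factor


# Hypothetical example: a POD of 10 mg/kg-day observed in a 0.25 kg rat.
hed = human_equivalent_dose(10.0, bw_animal_kg=0.25)
print(f"Human equivalent dose: {hed:.2f} mg/kg-day")
```

Because smaller animals have a body-weight ratio below 1, the factor is below 1 and the human equivalent dose is lower than the animal dose, which is why applying the adjustment shifts in vivo PODs toward the in vitro values.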
Conversion of in vivo oral POD to steady state concentration (CSS)
All in vivo oral PODs (mg/kg-day) were converted into plasma steady state concentration (μM) to facilitate the comparison with in vitro PODs. The human plasma CSS was obtained from the httk R package (v2.2.1) (Pearce et al. 2017) based on a 3-compartment steady state model, a constant dose rate of 1 mg/kg per day, and 100% bioavailability. The CSS values were derived programmatically using “calc_analytic_css” function with 50th quantile (which.quantile=0.5), species of human (species=“Human”), the default 3 compartment model (model=“3compartmentss”), and a unit of μM (output.units = “uM”). The conversion followed the equation
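$$\mathrm{POD}_{C_{SS}}\ (\mu\mathrm{M}) = \mathrm{POD}_{\mathrm{oral}}\ (\mathrm{mg/kg\text{-}day}) \times C_{SS}\ (\mu\mathrm{M}\ \text{per}\ 1\ \mathrm{mg/kg\text{-}day}) \tag{4}$$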
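As a minimal sketch of this conversion, the snippet below multiplies an oral POD by a steady state concentration predicted for a unit dose rate of 1 mg/kg-day, which assumes linear toxicokinetics. The function name and the numerical values are hypothetical; in the study the CSS values come from the httk R package, which is not called here.

```python
def oral_pod_to_css_um(pod_mg_per_kg_day, css_um_per_unit_dose):
    """Convert an oral POD (mg/kg-day) to a plasma steady state concentration (uM).

    css_um_per_unit_dose is the CSS (uM) predicted for a constant dose rate of
    1 mg/kg-day; linear scaling with dose is assumed.
    """
    if pod_mg_per_kg_day < 0 or css_um_per_unit_dose < 0:
        raise ValueError("inputs must be non-negative")
    return pod_mg_per_kg_day * css_um_per_unit_dose


# Hypothetical example: POD of 2.5 mg/kg-day, CSS of 4.0 uM per 1 mg/kg-day.
print(oral_pod_to_css_um(2.5, 4.0), "uM")
```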
Selection of in vitro NAM-based PODs
Supplemental Table S1 summarizes the in vitro PODs used for each chemical, with additional details for each dataset provided below.
ToxCast in vitro POD
ToxCast in vitro bioactivity data were obtained from the U.S. EPA CompTox Chemicals Dashboard (Williams et al. 2017). For each assay endpoint, the concentration-response was fitted by EPA using the Hill model and/or the gain-loss model to derive the AC50. The bioassay data used in our work were accessed from the ToxCast Summary files in October 2022 and April 2023. We only included bioassay data that had active hit calls from multi-concentration tests. In this study, the ToxCast POD was defined as the 5th percentile of the distribution of ToxCast AC50 values (μM), analogous to the approach previously used by Paul Friedman et al. (2020).
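The percentile-based definition can be sketched as follows. The helper below computes a percentile by linear interpolation between order statistics; whether the original analysis interpolated in this way, or worked on a log scale, is not stated in the text, so this choice is an assumption. The AC50 values are invented for illustration.

```python
import math


def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    if not values:
        raise ValueError("at least one value is required")
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def toxcast_pod(active_ac50_um):
    """5th percentile of AC50 values (uM) from assays with active hit calls."""
    return percentile(active_ac50_um, 5)


# Hypothetical AC50 values (uM) for one chemical across active assay endpoints.
ac50_values = [0.8, 3.2, 5.0, 12.5, 20.0, 31.0, 45.0, 60.0]
print(f"ToxCast POD: {toxcast_pod(ac50_values):.2f} uM")
```

Using a low percentile rather than the minimum makes the POD less sensitive to a single outlying assay while still reflecting the most potent end of the bioactivity distribution.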
Six-cell-type in vitro PODs
In vitro POD values from two previously published datasets (Chen et al. 2020; Ford et al. 2022) were merged into a battery of assays based on six human cell types. The Chen et al. (2020) dataset involves four organ/tissue types (i.e., hepatocytes, neurons, cardiomyocytes, and endothelial cells) as well as pooled HUVECs. In brief, cells were exposed to chemicals across a range of concentrations from 0.01 to 100 μM, spanning four orders of magnitude. Cytotoxicity and physiologically relevant phenotypes were assessed using high-content live cell imaging. All responses were normalized to vehicle controls for evaluation. The concentration-response of each treatment was then fitted using a nonlinear logistic model (Sirenko et al. 2013). The POD values were determined as the concentration at which the fitted curve exceeds one standard deviation above or below the mean of vehicle controls (U.S.EPA 2012).
The Ford et al. (2022) dataset used a population-based model of human LCLs. In brief, a total of 146 cell lines were included in the in vitro assay, consisting of four subpopulations (Utah residents with European ancestry, Tuscans in Italy, British from England and Scotland, and Yoruban from Ibadan, Nigeria) with a balanced sex ratio. Concentrations ranging from 0.01 to 100 μM were tested, spanning four orders of magnitude. Cell viability was measured as intracellular adenosine triphosphate (ATP) concentration after a 24 hr exposure to the test chemicals. Each cell line was evaluated in replicates within or between plates. A hierarchical Bayesian random effects Hill model was applied to fit the concentration-response for each chemical, following a “downward” Hill model as described in Chiu, Wright, and Rusyn (2017). The effective concentration producing a 10% change in cell viability relative to vehicle control (EC10), estimated for the median and 5th percentile individual, was taken as the chemical-specific POD.
We combined the hiPSC-derived, HUVEC, and LCL POD values and selected the minimum as the most sensitive POD of the six-cell-type dataset for each test chemical. In most cases, iPSC-derived cardiomyocytes exhibited the most sensitive in vitro POD values (Supplemental Fig. S1).
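The selection of the most sensitive POD amounts to taking a minimum over the available cell types. The sketch below shows this for a single chemical; the dictionary keys follow the abbreviations used in Figure 1, while the function name and the POD values are hypothetical.

```python
def most_sensitive_pod(pods_um):
    """Return (cell_type, POD) for the lowest available POD in uM.

    Entries set to None (no POD could be derived) are ignored.
    """
    available = {name: pod for name, pod in pods_um.items() if pod is not None}
    if not available:
        raise ValueError("no POD available for this chemical")
    cell_type = min(available, key=available.get)
    return cell_type, available[cell_type]


# Hypothetical PODs (uM) for one chemical across the six cell types.
pods = {
    "iPSC-CM": 0.9,
    "iPSC-Neu": 14.0,
    "iPSC-Hep": 55.0,
    "iPSC-Endo": None,
    "HUVEC": 32.0,
    "LCL": 21.0,
}
print(most_sensitive_pod(pods))
```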
Population hiPSC-derived cardiomyocyte in vitro PODs
As Chen et al. (2020) previously observed that hiPSC-derived cardiomyocytes appeared to exhibit more sensitive POD values, we also used a larger dataset of population-based cardiomyocyte screening from Burnett, Blanchette, et al. (2021a) to compare in vitro PODs and in vivo PODs for 41 chemicals. In brief, the iPSC-derived cardiomyocytes were obtained from 5 human donors, consisting of individuals of European, African, American, and mixed ancestry (3 females and 2 males), without diagnosed cardiovascular disease or a family history of cardiovascular disease. Cardiotoxicity-relevant phenotypes were assessed using Ca2+ flux measurements (Sirenko et al. 2017; Grimm et al. 2015; Grimm et al. 2018). Cell viability was then tested using high-content cell imaging. Concentration-response data were normalized to vehicle controls and fitted with a random-effects Hill model under a Bayesian approach. The PODs for functional phenotypes related to negative/positive chronotropy and QT prolongation were determined as the concentration at which the response changed 5% above or below the vehicle controls. For the asystole phenotype, PODs were derived from a 95% decrease of peak frequency compared to the vehicle controls. The PODs from cytotoxicity were determined based on a 10% decrease in the total number of cells compared to the vehicle controls. For each chemical, we selected the lowest POD value for the population median across different phenotypes as the most sensitive POD value for population hiPSC-derived cardiomyocytes.
Concordance analyses
In the analysis of the 15-chemical dataset, we compared each alternative in vivo POD with ToxCast PODs and the most sensitive six-cell-type PODs. For the larger 41-chemical dataset, each alternative in vivo POD was compared to ToxCast PODs and the most sensitive hiPSC-derived cardiomyocyte POD. We evaluated concordance using multiple statistical measures:
To evaluate the correlation between in vitro PODs and in vivo PODs, we calculated the Spearman correlation coefficient (ρ) based on a linear regression model. Because the BMA BMDs include an uncertainty estimate (unlike regulatory PODs), for comparisons involving BMA BMDA and BMA BMDH, we used inverse variance weighting in the linear regression model to account for differences in the degree of uncertainty for each data point.
To evaluate the accuracy and precision of in vitro PODs for predicting in vivo PODs, we used random effects models in a meta-analysis-based approach. Specifically, we defined the prediction error as the log10 POD ratio (in vitro POD/in vivo POD). A log10 POD ratio close to 0 indicates more accurate prediction, while values smaller than 0 indicate that the in vitro POD is more conservative, and values greater than 0 indicate that the in vitro POD is less conservative. The metafor R package (v4.0-0) (Viechtbauer 2010) was used to conduct random-effects meta-analyses on the log10 POD ratios for each dataset. The random-effects model was fitted with the “rma” function, pooling the chemical-specific estimates (observations) using the DerSimonian and Laird method (DerSimonian and Laird 1986). For Reg PODA and Reg PODH, which are fixed values, we assumed P95/P05 = 100 to derive their standard errors. For BMA BMDA and BMA BMDH, we used the 95th quantile and 5th quantile to calculate their standard errors. The formula used for the calculation was
$$SE = \frac{\log_{10}(P_{95}) - \log_{10}(P_{05})}{2z} \tag{5}$$
where z is 1.645 based on 90% confidence interval (CI) and SE represents standard error.
Forest plots were generated to present the overall pooled outcomes of the meta-analyses, treating each chemical as one observation. The I2 index represents the percentage of the total variability in effect size estimates that is attributable to heterogeneity (Higgins et al. 2003). The tau statistic represents the heterogeneity of the effect size estimates, reflecting the precision of the log10 POD ratio from a random chemical. The random-effects summary of the central estimate and 90% CI served as indicators to assess the accuracy of the log10 POD ratio from a random chemical. We also applied the “predict” function in the metafor package to obtain predicted log10 POD ratio values under the fitted random-effects model, which represent the expected variation of the log10 POD ratio for a new random chemical.
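To make the quantities in this analysis concrete, the following sketch implements the standard DerSimonian and Laird estimator in plain Python, together with the standard error calculation from the 5th and 95th percentiles (z = 1.645). It is an illustration of the technique, not the authors' code, which uses the metafor R package. The function names and the example log10 POD ratios are hypothetical, the prediction interval is assumed to use a normal quantile, and the standard error of the in vivo POD is assumed to stand in for the standard error of the log10 POD ratio.

```python
import math

Z90 = 1.645  # normal quantile for a two-sided 90% interval (value given in the text)


def se_from_quantiles(p05, p95, z=Z90):
    """Standard error of log10(POD) implied by its 5th and 95th percentiles."""
    return (math.log10(p95) - math.log10(p05)) / (2.0 * z)


def dersimonian_laird(y, se, z=Z90):
    """Random-effects pooling of estimates y with standard errors se."""
    k = len(y)
    if k < 2 or k != len(se):
        raise ValueError("need at least two estimates with matching standard errors")
    v = [s * s for s in se]                      # within-chemical variances
    w = [1.0 / vi for vi in v]                   # fixed-effect weights
    sw = sum(w)
    mu_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)           # between-chemical variance
    w_re = [1.0 / (vi + tau2) for vi in v]       # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_mu = math.sqrt(1.0 / sum(w_re))
    i2 = 100.0 * max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    pi_half = z * math.sqrt(se_mu ** 2 + tau2)   # half-width of prediction interval
    return {
        "pooled": mu,
        "ci": (mu - z * se_mu, mu + z * se_mu),
        "pi": (mu - pi_half, mu + pi_half),
        "tau": math.sqrt(tau2),
        "i2": i2,
    }


# Hypothetical log10 POD ratios for five chemicals, each with P95/P05 = 100.
ratios = [-2.1, -0.4, 0.3, -1.5, 0.9]
errors = [se_from_quantiles(1.0, 100.0)] * len(ratios)
print(dersimonian_laird(ratios, errors))
```

Under these assumptions, the prediction interval half-width is z times the square root of the squared standard error of the pooled estimate plus tau squared. For the Reg PODA versus ToxCast comparison in Table 3 (pooled −1.30, tau 1.43, and a standard error of about 0.40 implied by the CI), this gives roughly ±2.44, which is consistent with the reported interval of [−3.74, 1.14].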
Software
The statistical analyses were performed using R software (version 4.2.1) with RStudio as the interface. Code for the concordance analyses is available on GitHub (https://github.com/EnHsuan/Concordance-of-in-vitro-and-in-vivo-PODs).
Results
Derivation of in vivo PODs
The Supplementary Material File S1 shows the dose-response datasets, estimated individual BMD model fits, and model-averaged BMDs for each chemical and endpoint. The dose-response trend and model-averaged BMD were compared with Reg PODA to visually assess their differences. The detailed modeling summary, including comparison of BMA BMDA distributions with regulatory PODs for each dataset, can be found in Supplementary Material File S2. For each chemical and endpoint, the median and 90% CI of the BMA BMDA results are summarized in Supplemental Table S2.
To examine the effect of Bayesian BMD modeling, we compared Reg PODA with the most sensitive BMA BMDA in both the 15-chemical dataset and the 41-chemical dataset (Figure 2). In both datasets, the estimated BMA BMDA was highly correlated with Reg PODA (R2 > 0.85, p < 0.01). Additionally, we examined the log10 ratio of Reg PODA to the median of BMA BMDA. For the 15-chemical dataset, the log10 ratio ranged from −1.39 to 1.76, with 11 chemicals having a log10 ratio above 0, indicating that Reg PODA were typically higher (less conservative) than the BMA BMDA. Similarly, in the 41-chemical dataset, the log10 ratio ranged from −1.90 to 1.76, with 21 chemicals showing a log10 ratio greater than 0. Across all 41 chemicals, only 6 chemicals (heptachlor, p-cresol, 1,2,4,5-tetrachlorobenzene, pentachlorophenol, gamma-hexachlorocyclohexane, and p-chloroaniline) showed a log10 ratio more than 1 or less than −1 (difference greater than one order of magnitude).
Figure 2.

(A) Correlation of regulatory in vivo PODs in animal doses (Reg PODA) and Bayesian model average benchmark doses in animal doses (BMA BMDA); the black line represents the unit line (y = x). (B) log10 ratio of Reg PODA to BMA BMDA for 15-chemical dataset (red) and the dataset consisting of an additional 26 chemicals (black). The BMA BMDA used for each chemical is the median estimate for the most sensitive endpoint. The dashed reference line represents when Reg PODA equals BMA BMDA.
Concordance analysis for 15-chemical dataset
To test our hypotheses regarding how to improve the alignment of in vitro and in vivo PODs, we first compared alternative in vivo PODs for the 15 chemicals that had both ToxCast PODs and six-cell-type PODs. The results of the correlation analyses as well as the application of the random-effects model to the log10 POD ratio are summarized in Table 3. Spearman correlations (ρ) increased when using the BMA BMD approach as compared to Reg PODs on the in vivo side, and when using the six-cell-type PODs as compared to ToxCast PODs on the in vitro side (Supplemental Fig. S2). Additionally, use of the six-cell-type PODs instead of ToxCast PODs improved the accuracy of concordance as measured by the central estimate of the log10 POD ratio. While application of interspecies scaling, whether deterministic or probabilistic, did not consistently increase correlation, it did substantially reduce the bias and thus increase the accuracy of the in vitro PODs, shifting the log10 POD ratio closer to 0.
Table 3.
Meta-analysis results of prediction error (log10 POD ratio) and correlation statistics for six-cell-type dataset consisting of 15 chemicals. The correlation statistics are analyzed between in vitro PODs and in vivo PODs.
| In vivo POD | In vivo vs. ToxCast PODs: pooled log10 POD ratio (CI) [PI] | In vivo vs. ToxCast PODs: I2 (%) | In vivo vs. ToxCast PODs: tau | In vivo vs. ToxCast PODs: ρ | In vivo vs. six-cell-type PODs: pooled log10 POD ratio (CI) [PI] | In vivo vs. six-cell-type PODs: I2 (%) | In vivo vs. six-cell-type PODs: tau | In vivo vs. six-cell-type PODs: ρ |
|---|---|---|---|---|---|---|---|---|
| Reg PODA | −1.30 (−1.96, −0.64) [−3.74, 1.14] | 84.60 | 1.43 | −0.31 | −0.90 (−1.61, −0.19) [−3.56, 1.76] | 86.80 | 1.56 | 0.13 |
| Reg PODH | −0.47 (−1.08, 0.14) [−2.71, 1.77] | 82.20 | 1.31 | −0.26 | −0.07 (−0.67, 0.53) [−2.25, 2.11] | 81.40 | 1.27 | 0.11 |
| BMA BMDA | −1.33 (−2.12, −0.54) [−4.42, 1.75] | 97.60 | 1.81 | −0.04 | −0.94 (−1.73, −0.14) [−4.03, 2.15] | 97.60 | 1.82 | 0.53 |
| BMA BMDH | −0.57 (−1.35, 0.20) [−3.58, 2.43] | 95.50 | 1.78 | 0.08 | −0.18 (−0.93, 0.58) [−3.10, 2.75] | 95.20 | 1.72 | 0.41 |
Abbreviations: Reg PODA = original regulatory POD in animal dose units; Reg PODH = original regulatory POD converted to human equivalent dose units; BMA BMDA = Bayesian model averaging-based benchmark dose in animal dose units; BMA BMDH = Bayesian model averaging-based benchmark dose converted to human equivalent dose units; log10 POD ratio = log10(in vitro POD/in vivo POD); CI: 90% confidence interval on the average log10 POD ratio; PI: 90% prediction interval for a new random chemical; I2 = statistic describing the percentage of variation across chemicals that is due to inter-chemical variability rather than uncertainty in individual PODs; tau = estimated random effects standard deviation for the inter-chemical variability in log10 POD ratio; ρ = Spearman correlation coefficient of log10(in vitro POD) vs. log10(in vivo POD).
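The quantities reported in Table 3 (pooled estimate, CI, PI, I2, and tau) can be obtained from a random-effects meta-analysis. The sketch below assumes the DerSimonian-Laird moment estimator and normal-approximation intervals; these are assumptions made for illustration, and the exact estimator and software used in the study may differ. Inputs are per-chemical log10 POD ratios and their standard errors.

```python
import math

Z90 = 1.6449  # standard normal quantile for a two-sided 90% interval

def random_effects_dl(estimates, std_errors):
    """DerSimonian-Laird random-effects pooling of per-chemical log10 POD ratios."""
    k = len(estimates)
    w = [1.0 / se ** 2 for se in std_errors]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-chemical variance
    i2 = max(0.0, (q - (k - 1)) / q) * 100.0 if q > 0 else 0.0

    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))

    half_ci = Z90 * se_pooled
    half_pi = Z90 * math.sqrt(tau2 + se_pooled ** 2)  # interval for a new chemical
    return {
        "pooled": pooled,
        "ci": (pooled - half_ci, pooled + half_ci),
        "pi": (pooled - half_pi, pooled + half_pi),
        "tau": math.sqrt(tau2),
        "I2": i2,
    }

# Hypothetical inputs: three chemicals with small individual uncertainty.
result = random_effects_dl([-1.0, 0.0, 1.0], [0.1, 0.1, 0.1])
```

In this toy example almost all of the spread is between chemicals, so I2 is close to 100% and the prediction interval is much wider than the confidence interval, which is the same qualitative pattern seen in Table 3.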
Figure 3 shows a comparison between the log10 POD ratio based on BMA BMDH for the ToxCast dataset and the six-cell-type dataset. The six-cell-type dataset showed improved accuracy of concordance (−0.18 [90% CI: −0.93 – 0.58]) compared to the ToxCast dataset (−0.57 [90% CI: −1.35 – 0.20]). Additionally, most of the six-cell-type PODs (10/15 chemicals, 66.7%) were more conservative than BMA BMDH, a pattern similar to the results for ToxCast PODs (10/15 chemicals, 66.7%). Supplemental Figures S3–S5 provide similarly detailed results of meta-analyses based on Reg PODA, Reg PODH, and BMA BMDA. The heterogeneity of log10 POD ratio estimates, as measured by the tau and I2 statistics, was similar whether using ToxCast or six-cell-type PODs, implying that the precision of concordance was not significantly improved when using assays from six cell types.
Figure 3.

Meta-analysis results for prediction error (the ratio of NAM to traditional POD) based on (A) ToxCast PODs and (B) six-cell-type PODs for 15 chemicals. The log10 POD ratio is based on BMA BMDH using a random-effects model. Chemicals are ordered based on their central estimates in panel A.
Concordance analysis for 41-chemical dataset
The results of the correlation analyses and of the random-effects modeling for the expanded 41-chemical dataset are summarized in Table 4. We observed only a slightly improved correlation, as measured by Spearman ρ, between alternative in vivo PODs and hiPSC-derived cardiomyocyte PODs compared to the results with ToxCast PODs (Supplemental Fig. S6). Similar to the findings from the six-cell-type dataset, the incorporation of interspecies extrapolation did not improve the correlation results. However, in contrast to the six-cell-type dataset, the BMA BMD approach did not lead to improvements in correlations within the cardiomyocyte dataset.
Table 4.
Meta-analysis results of prediction error (log10 POD ratio) and correlation statistics for 41-chemical dataset. The correlation statistics are analyzed between in vitro PODs and in vivo PODs.
| In vivo POD | In vivo vs. ToxCast PODs: pooled log10 POD ratio (CI) [PI] | In vivo vs. ToxCast PODs: I2 (%) | In vivo vs. ToxCast PODs: tau | In vivo vs. ToxCast PODs: ρ | In vivo vs. cardiomyocyte PODs: pooled log10 POD ratio (CI) [PI] | In vivo vs. cardiomyocyte PODs: I2 (%) | In vivo vs. cardiomyocyte PODs: tau | In vivo vs. cardiomyocyte PODs: ρ |
|---|---|---|---|---|---|---|---|---|
| Reg PODA | −1.61 (−2.02, −1.19) [−4.09, 0.88] | 85.70 | 1.49 | −0.07 | −0.49 (−0.88, −0.10) [−2.83, 1.85] | 84.20 | 1.40 | 0.12 |
| Reg PODH | −0.75 (−1.15, −0.35) [−3.16, 1.66] | 85.00 | 1.45 | −0.07 | 0.37 (−0.03, 0.77) [−2.02, 2.76] | 84.70 | 1.43 | 0.1 |
| BMA BMDA | −1.73 (−2.20, −1.26) [−4.70, 1.23] | 98.9 | 1.78 | 0.17 | −0.63 (−1.08, −0.18) [−3.44, 2.18] | 98.80 | 1.69 | 0.08 |
| BMA BMDH | −0.92 (−1.38, −0.46) [−3.80, 1.96] | 97.3 | 1.73 | 0 | 0.18 (−0.27, 0.63) [−2.62, 2.97] | 97.20 | 1.68 | 0.11 |
Abbreviations: Reg PODA = original regulatory POD in animal dose units; Reg PODH = original regulatory POD converted to human equivalent dose units; BMA BMDA = Bayesian model averaging-based benchmark dose in animal dose units; BMA BMDH = Bayesian model averaging-based benchmark dose converted to human equivalent dose units; log10 POD ratio = log10(in vitro POD/in vivo POD); CI: 90% confidence interval on the average log10 POD ratio; PI: 90% prediction interval for a new random chemical; I2 = statistic describing the percentage of variation across chemicals that is due to inter-chemical variability rather than uncertainty in individual PODs; tau = estimated random effects standard deviation for the inter-chemical variability in log10 POD ratio; ρ = Spearman correlation coefficient of log10(in vitro POD) vs. log10(in vivo POD).
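As a worked check on how the intervals in Table 4 relate to one another, the sketch below back-calculates the standard error of the pooled estimate from the 90% CI and combines it with tau to form a 90% prediction interval. It assumes a normal approximation, PI = pooled ± 1.645 × sqrt(tau² + SE²), which is one common form and not necessarily the exact formula used in the analysis. The inputs are the cardiomyocyte versus BMA BMDH values from Table 4.

```python
import math

Z90 = 1.6449  # standard normal quantile for a two-sided 90% interval

def prediction_interval_from_summary(pooled, ci_low, ci_high, tau):
    """Approximate 90% prediction interval from a pooled estimate, its 90% CI, and tau."""
    se_pooled = (ci_high - ci_low) / (2.0 * Z90)  # back-calculated standard error
    half_width = Z90 * math.sqrt(tau ** 2 + se_pooled ** 2)
    return pooled - half_width, pooled + half_width

# Cardiomyocyte PODs vs. BMA BMD_H (Table 4): pooled 0.18, CI (-0.27, 0.63), tau 1.68.
pi_low, pi_high = prediction_interval_from_summary(0.18, -0.27, 0.63, 1.68)
```

Under these assumptions the computed interval falls within rounding error of the tabulated prediction interval, and it is dominated by tau rather than by the uncertainty in the pooled estimate.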
Figure 4 illustrates the meta-analysis results of log10 POD ratio based on BMA BMDH. The central estimate of log10 POD ratio using hiPSC-derived cardiomyocyte assays was close to 0 (0.18 [90% CI: −0.27 – 0.63]), implying that hiPSC-derived cardiomyocyte PODs accurately predicted BMA BMDH on average. In contrast, the central estimate of log10 POD ratio using ToxCast assays was −0.92 (90% CI: −1.38 – −0.46), suggesting that ToxCast PODs were more conservative on average than BMA BMDH. This pattern is also shown in the higher proportion of ToxCast assay PODs (28/41 chemicals, 68.3%) versus hiPSC-derived cardiomyocyte assay PODs (20/41 chemicals, 48.8%) with log10 POD ratio < 0.
Figure 4.

Meta-analysis results for prediction error (the ratio of NAM to traditional POD) based on (A) ToxCast PODs and (B) hiPSC-derived cardiomyocyte PODs for 41 chemicals. The log10 POD ratio is based on BMA BMDH using a random-effects model. Chemicals are ordered based on their central estimates in panel A.
Table 4 and Supplemental Figures S7–S9 also present parallel results of the meta-analyses for the cardiomyocyte dataset based on Reg PODA, Reg PODH, and BMA BMDA. Generally, the accuracy of the POD was improved by using hiPSC-derived cardiomyocyte PODs instead of ToxCast PODs. However, the heterogeneity of log10 POD ratio estimates, as measured by the tau and I2 statistics, was similar whether using ToxCast or hiPSC-derived cardiomyocyte PODs, implying that the precision of concordance was not significantly different.
Discussion
Since the seminal National Academies report Toxicity Testing in the 21st Century: A Vision and Strategy (NRC 2007), there has been extensive interest in the toxicology community in moving to an in vitro toxicity testing paradigm. However, a major challenge has been how to establish the appropriate benchmark for ensuring the accuracy, precision, and health-protectiveness of in vitro assays. As reviewed in the recent National Academies report Building Confidence in New Evidence Streams for Human Health Risk Assessment: Lessons Learned from Laboratory Mammalian Toxicity Tests (NASEM 2023), there are few high-quality systematic reviews to evaluate concordance of NAMs with human in vivo data; such analyses require alignment on factors such as exposure timing and duration, life stage, sex, and outcome. Therefore, most existing efforts compare in vitro predictions with in vivo data from laboratory mammalian studies. From a regulatory risk assessment viewpoint, this makes some sense based on the counterfactual statement that if adequate in vivo laboratory mammalian studies were to exist, then they would be accepted for use in risk-based decision-making. Thus, if a NAMs-based method were to provide an adequate surrogate for such in vivo laboratory mammalian studies, then it could be deemed acceptable for regulatory use.
Several recent studies have concluded that in vitro bioactivity data yield “conservative” or “protective” PODs that have utility for prioritizing chemicals when the traditional in vivo data is absent (Paul Friedman et al. 2020; Beal et al. 2022; Health Canada 2021; Paul Friedman et al. 2016; Nicolas et al. 2022). However, the correlation between in vitro and in vivo PODs is generally poor (Fantke et al. 2021; Wignall et al. 2018). There are several limitations in these previous concordance analyses. First, these comparisons used U.S. EPA’s ToxVal Database (Lowe and Williams 2021), which is a compilation of heterogeneous in vivo datasets, with high variation across chemicals in terms of the species tested, the types of PODs (NOEL, NOAEL, LOEL, etc.) reported, and the number of studies available. Several studies attempted to reduce this heterogeneity, such as only using “NOAEL” PODs (Nicolas et al. 2022) or applying allometric scaling to address species differences (Paul Friedman et al. 2020), but these were not consistently applied across all studies. Moreover, none of the previous efforts benchmarked in vivo data against the results of regulatory assessments, so there is no assurance that the in vivo data in ToxValDB would be considered adequate for public health decision-making.
However, previous studies have found that although on average in vitro PODs are lower than, and hence “protective” of, in vivo PODs, their low correlations imply that in vitro dose-response bioactivity data cannot distinguish between chemicals that are lower and higher potency in vivo. Thus, their value in regulatory decision-making is extremely limited. Therefore, in this study, we investigated several approaches hypothesized to improve the alignment between in vitro PODs and in vivo PODs. The specific context of use we envision, as outlined in our parallel PECO statement, is the derivation of protective public health PODs that can be used for regulatory decision-making. Thus, our in vivo benchmark consists of PODs underlying regulatory toxicity values that are already in use for health-based decision-making – namely EPA’s Regional Screening Levels, which are used “to help identify areas, contaminants, and conditions that require further federal attention at a particular site” and “as initial cleanup goals” (U.S. EPA 2023a).
For PODs based on ToxCast in vitro bioactivity, our results are consistent with findings from previous studies that (1) ToxCast bioactivity is “conservative” as compared to in vivo PODs by one to two orders of magnitude; (2) application of interspecies scaling reduces the “conservative” bias to one order of magnitude or less; and (3) there is a poor correlation between ToxCast PODs and in vivo PODs. As shown in Tables 3–4, the only case in which we found a notable improvement in both correlation (ρ increased from less than 0 to 0.41) and accuracy (from 3-fold to 1.5-fold difference between PODs on average) was when combining the use of (1) PODs based on BMA BMD modeling, (2) interspecies scaling, and (3) a six-cell-type battery of human-based in vitro assays, including four hiPSC-derived cell types. However, the generalizability of this result is unclear because it is based on only 15 chemicals. The attempt to increase sample size by using only a single cell type, in this case hiPSC-derived cardiomyocytes, retained the improvement in accuracy but decreased the correlation to close to zero. It is worth noting that in all cases, the precision did not show substantial improvement, and 90% prediction intervals spanned about two to three orders of magnitude in either direction.
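The fold differences quoted above follow directly from the pooled log10 POD ratios in Table 3. As a minimal sketch (the helper name is illustrative), a log10 ratio r corresponds to a 10^|r|-fold difference between the in vitro and in vivo PODs:

```python
def fold_difference(log10_ratio):
    """Convert a log10 POD ratio into an absolute fold difference between the two PODs."""
    return 10 ** abs(log10_ratio)

# Pooled log10 POD ratios taken from Table 3.
fold_toxcast_reg_pod_h = fold_difference(-0.47)    # ToxCast vs. Reg POD_H, about 3-fold
fold_six_cell_bma_bmd_h = fold_difference(-0.18)   # six-cell-type vs. BMA BMD_H, about 1.5-fold

# Half-width of a 90% prediction interval, e.g. the upper bound of 2.75 for the
# six-cell-type vs. BMA BMD_H comparison, expressed as a fold range.
fold_pi_upper = fold_difference(2.75)
```

The same conversion shows why a prediction interval bound near 2.75 corresponds to more than a 500-fold uncertainty for an individual chemical.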
To more directly compare our results with previously published studies, we re-analyzed several other previously published datasets using the same metrics for concordance (Figure 5). Supplemental Fig. S10A shows our re-analysis of data from Paul Friedman et al. (2020), revealing a poor correlation (ρ = 0.11) between traditional in vivo PODs and NAM-based PODs, consistent with our analysis comparing ToxCast with in vivo regulatory PODs. We derived a median log10 POD ratio of −1.26 (90% CI: −1.38 – −1.14) based on their data, which was also similar to our ToxCast analyses for a smaller dataset using regulatory PODs (Supplemental Table S3). Although Nicolas et al. (2022) reported an R2 of 0.38 for NOAEL- and ToxCast-based margins of exposure (MOEs), both MOEs are calculated using the same exposure estimate, which artificially increases their correlation because they share a common factor. By contrast, when comparing just the NOAEL and ToxCast PODs directly, the R2 falls to 0.03, with a correlation coefficient of 0.17 (Supplemental Fig. S10B). The pooled median log10 POD ratio of −1.95 (90% CI: −2.07 – −1.83) is also similar to that from Paul Friedman et al. (2020) (Supplemental Table S3). Overall, these previously published findings using larger samples of ToxCast PODs give results that are very similar to our analyses using ToxCast PODs from smaller subsets of chemicals.
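The effect of a shared exposure term on the correlation between two MOEs can be illustrated with a small simulation. The sketch below uses entirely synthetic, independent log10 PODs and a synthetic log10 exposure, all with unit variance; these choices are assumptions for illustration and are not the data of Nicolas et al. (2022).

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the illustration is reproducible
n = 5000

# Synthetic log10 values: two unrelated PODs and one exposure estimate per chemical.
log_pod_vivo = [rng.gauss(0.0, 1.0) for _ in range(n)]
log_pod_vitro = [rng.gauss(0.0, 1.0) for _ in range(n)]
log_exposure = [rng.gauss(0.0, 1.0) for _ in range(n)]

# log10(MOE) = log10(POD) - log10(exposure); both MOEs share the same exposure term.
log_moe_vivo = [p - e for p, e in zip(log_pod_vivo, log_exposure)]
log_moe_vitro = [p - e for p, e in zip(log_pod_vitro, log_exposure)]

r_pods = pearson(log_pod_vivo, log_pod_vitro)   # near 0 by construction
r_moes = pearson(log_moe_vivo, log_moe_vitro)   # inflated by the shared exposure
```

With equal variances, the expected correlation between the two MOEs is about 0.5 even though the underlying PODs are unrelated, which is why the PODs themselves are the more informative comparison.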
Figure 5.

Comparisons of concordance of New Approach Methods with in vivo PODs: (A) accuracy in terms of NAM/in vivo POD ratio, (B) precision in terms of 90% prediction interval width, and (C) rank correlation (ρ). Paul Friedman et al. (2020) and Nicolas et al. (2022) were both based on subsets of ToxCast; the present work is represented by “ToxCast,” “Six-cell-type assays,” and “Cardiomyocyte assays”; and ETAP refers to the (in vivo) U.S. EPA Transcriptomic Assessment Product (U.S. EPA 2023b).
We also compared our results with findings from the U.S. EPA Transcriptomic Assessment Product (ETAP) analysis (U.S. EPA 2023b), which is proposed as a means to derive a POD from a short-term rat in vivo study as an acceptable replacement for PODs derived from traditional chronic in vivo animal studies. ETAP-derived central estimates of log10 POD ratios (−0.16 [90% CI: −0.47 – 0.15]) (Supplemental Table S3) and correlation coefficient (ρ=0.43, p value=0.05) are similar to our results for six-cell-type PODs (Supplemental Fig. S11). Although the prediction interval for ETAP was much narrower, spanning only an order of magnitude in each direction, this result may be spurious because the ETAP dataset spanned a much narrower range of in vivo potency, only three orders of magnitude from the least to most potent chemical, while in our dataset, the regulatory in vivo PODs spanned more than six orders of magnitude.
Overall, as summarized in Figure 5, these comparisons show that in terms of accuracy, the best concordance between NAMs-derived PODs and traditional animal study-derived PODs used for deriving regulatory values was from the six-cell-type, hiPSC-derived cardiomyocyte, and ETAP approaches. All in vitro-based approaches had relatively wide prediction intervals, and the six-cell-type and ETAP approaches had the highest (and comparable) rank correlation. Thus, our results support the hypothesis that reducing heterogeneity in in vivo PODs, addressing species differences, and using a focused battery of in vitro assays representing multiple tissues including hiPSC-derived assays, or PODs derived from gene expression data in short-term rat studies, can lead to greater concordance between NAMs-based and traditional in vivo PODs. These data also suggest that hiPSC-derived cells, which are available for a number of human tissues, may provide PODs that are more suitable for regulatory decision-making. Moreover, as the ToxCast database has not yet expanded its analysis to hiPSC-derived assays, a fully human-based in vitro assay library may hold promise for the future. Nonetheless, while the six-cell-type battery we propose has the best performance in terms of correlation and accuracy, the precision is still relatively poor, which may still limit the utility of these in vitro PODs for regulatory decision-making.
We acknowledge certain limitations in our analysis that may constrain the generalizability of our results or may contribute to the relatively low precision of in vitro PODs. For instance, we note the relatively small sample sizes (in terms of the number of chemicals examined) of our datasets. In the original in vitro datasets, the six-cell-type dataset was tested with 42 chemicals (Chen et al. 2020; Ford et al. 2022) and the hiPSC-derived cardiomyocyte dataset was tested with over 1000 chemicals (Burnett, Blanchette, et al. 2021a). However, the smaller datasets of 15 and 41 chemicals in our analysis result from a deliberate selection of chemicals with sufficient in vivo data from regulatory toxicity values to conduct more comprehensive analyses. Despite the smaller sample size in our study, our results for ToxCast bioactivity-based PODs were very similar to those from our reanalysis of previous studies with larger sample sizes with respect to correlation, accuracy, and precision. Thus, we feel that our conclusions remain representative, but we acknowledge the need for additional testing to confirm and generalize our results. Additionally, our population-based cardiomyocyte data were based on the median estimates across 5 donors, but as a sensitivity analysis, we also examined concordance using the “standard” hiPSC-derived cardiomyocyte donor (Burnett, Blanchette, et al. 2021a). Although the accuracy of the single-donor hiPSC-derived cardiomyocyte assays was slightly worse as compared to the population-based hiPSC-derived cardiomyocyte assays, the correlations improved slightly to ρ = 0.17 – 0.23 while the precision remained about the same (Supplemental Table S4). Overall, the results were still not as concordant as with the six-cell-type PODs.
Several of the other limitations in this work are common to previous studies examining in vitro to in vivo concordance as well. First, existing analyses, including ours, have used nominal concentrations when deriving in vitro PODs, which may not fully characterize the effective concentration related to the biological response. We and others have previously shown that measured free concentrations in media may substantially differ from nominal concentrations (e.g., Valdiviezo et al. (2021)). Furthermore, in work with cardiomyocytes, we showed that adjusting to free concentrations in media and in plasma is an essential step toward showing quantitative concordance between in vitro PODs and human in vivo PODs (Blanchette et al. 2019). While measuring free concentrations in a high throughput screening assay is likely infeasible due to low sample volumes and analytical bottlenecks, in vitro mass balance modeling may be a feasible surrogate. Models such as In Vitro Mass Balance Model (IV-MBM) (Armitage et al. 2021) and others reviewed by Proenca et al. (2021) and Dimitrijevic et al. (2022) can offer a more accurate measurement of unbound free concentrations in in vitro systems. However, the ToxCast database currently does not have the required information about the test system configuration and cellular and media composition necessary to implement these models. For our six-cell type systems, we expect to be measuring the parameters required for implementing a mass balance model in our future work (in addition to testing additional chemicals) in order to refine these results.
Another limitation is that our in vitro PODs were not based on BMA BMD modeling, as ToxCast and the other previously published datasets (Chen et al. 2020; Ford et al. 2022; Burnett, Blanchette, et al. 2021a) used only the Hill model in their POD analyses, and ToxCast PODs were not derived using Bayesian methods. Because our comparison for in vivo PODs (Figure 2) showed little bias, it appears that this would mainly affect precision rather than accuracy. Additionally, our approach relies on the httk R package database to perform IVIVE. Rather than converting in vitro PODs to oral equivalent doses, we decided to convert in vivo oral PODs to concentrations in terms of CSS because it aligns more closely with the overall source-to-outcome continuum, although the results in terms of log10 POD ratio and correlation would be exactly the same either way. Nonetheless, because the CSS values for these particular chemicals have not been validated with in vivo toxicokinetic data, there may be substantial uncertainty contributing to the low precision found in our concordance analysis. Previous studies have suggested that httk predictions may be systematically higher than in vivo measurements (R2 = 0.34 – 0.47) (Wambaugh et al. 2018; Wambaugh et al. 2015), potentially resulting in an overestimation in our oral-to-concentration conversion. The availability of in vivo human toxicokinetic data would help refine these parameters. Additionally, sensitivity analyses could be performed using alternative httk-like models (Arnot et al. 2022).
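The equivalence of the two conversion directions for the log10 POD ratio can be seen in a short sketch. It assumes linear toxicokinetics, so that steady-state concentration is proportional to oral dose through a chemical-specific factor (here called css_per_unit_dose, a hypothetical name for CSS per unit dose); the numerical values are arbitrary placeholders.

```python
import math

def ratio_in_concentration_units(c_vitro, dose_vivo, css_per_unit_dose):
    """Convert the in vivo oral POD to a steady-state concentration, then take the ratio."""
    css_vivo = dose_vivo * css_per_unit_dose
    return math.log10(c_vitro / css_vivo)

def ratio_in_dose_units(c_vitro, dose_vivo, css_per_unit_dose):
    """Convert the in vitro POD to an oral equivalent dose, then take the ratio."""
    oral_equivalent_dose = c_vitro / css_per_unit_dose
    return math.log10(oral_equivalent_dose / dose_vivo)

# Arbitrary illustrative values: in vitro POD (uM), in vivo POD (mg/kg-day),
# and CSS per unit dose (uM per mg/kg-day).
r_conc = ratio_in_concentration_units(3.0, 12.0, 0.8)
r_dose = ratio_in_dose_units(3.0, 12.0, 0.8)
```

Both routes divide the same in vitro concentration by the same product of dose and CSS per unit dose, so the log10 POD ratio for each chemical is identical whichever side is converted.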
Additionally, similar to the Paul Friedman et al. (2020) study, we applied allometric body weight scaling for interspecies extrapolation, which is a default assumption applicable across different chemicals as recommended by the U.S. EPA (2011). To achieve more accurate quantification of interspecies uncertainty, we matched each endpoint by species, sex, and exposure duration to select the corresponding body weight for allometric scaling. However, the accuracy of interspecies extrapolations could be improved if chemical-specific human and animal toxicokinetic data were available. The utilization of in vitro hepatic models to derive interspecies toxicokinetic data, as demonstrated in Valdiviezo et al. (2022), is a promising strategy to explore more chemicals and may help improve the concordance between in vitro PODs and in vivo PODs.
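A minimal sketch of default allometric scaling is given below. It uses the body weight to the 3/4 power convention, under which the dosimetric adjustment factor (DAF) applied to a dose in mg/kg-day is (BW_animal / BW_human)^(1/4). The body weights shown (0.25 kg rat, 0.025 kg mouse, 70 kg human) are illustrative assumptions, not the endpoint-specific values selected in this study.

```python
def dosimetric_adjustment_factor(bw_animal_kg, bw_human_kg=70.0):
    """DAF for oral doses in mg/kg-day under BW^(3/4) scaling: (BW_a / BW_h)^(1/4)."""
    return (bw_animal_kg / bw_human_kg) ** 0.25

def human_equivalent_dose(animal_dose, bw_animal_kg, bw_human_kg=70.0):
    """Human equivalent dose (mg/kg-day) from an animal dose (mg/kg-day)."""
    return animal_dose * dosimetric_adjustment_factor(bw_animal_kg, bw_human_kg)

# Illustrative body weights only; study-specific values would replace these.
daf_rat = dosimetric_adjustment_factor(0.25)     # roughly 0.24
daf_mouse = dosimetric_adjustment_factor(0.025)  # roughly 0.14
hed_rat = human_equivalent_dose(10.0, 0.25)      # 10 mg/kg-day in rat
```

Because the DAF is below 1 for small laboratory animals, the human equivalent dose is lower than the animal dose, which shifts log10(in vitro POD/in vivo POD) upward, in the direction seen when moving from PODA to PODH in Tables 3 and 4.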
Furthermore, our analysis does not account for interspecies toxicodynamic differences, which are admittedly very challenging to address. The NRC (2007) report recommends utilizing in vitro data, preferably derived from human models, to address these differences. One approach proposed by Burnett, Karmakar, et al. (2021) used a concentration-response cytotoxicity screening in primary dermal fibroblasts from humans and 53 other diverse species to characterize interspecies toxicodynamic variability of 40 chemicals. This approach can potentially refine the concordance estimates if we are able to collect chemical-specific interspecies toxicodynamic data for the selected chemicals evaluated in our study. Another possibility would be to utilize in vivo human data instead of experimental animal data for comparison. However, the number of regulatory PODs based on human data is very small, and we could find none that had sufficient overlap with the chemicals in our in vitro datasets for analysis. NASEM (2023) recommended that, in the absence of authoritative assessments, it is necessary to utilize systematic reviews in order to confidently assess concordance in this manner, and such work remains to be done.
Conclusion
This study shows that it is possible to improve the concordance between in vitro PODs and in vivo PODs through the integration of hiPSC-derived assays, the BMA BMD approach, and the inclusion of an interspecies dosimetric adjustment factor. Notably, the highest level of concordance in terms of accuracy and correlation is observed between six-cell-type PODs and BMA BMDH. On the other hand, our most concordant results are based on a relatively small number of chemicals, and the precision of in vitro PODs as surrogates for in vivo PODs, even after these refinements, remains low, with prediction intervals spanning many orders of magnitude. Thus, further refinement is necessary, both in terms of expanding the range of chemicals analyzed and improving in vivo to in vitro extrapolation approaches. Specifically, in addition to the approaches we have implemented here, two major considerations include increasing the accuracy of high-throughput toxicokinetic modeling of external to internal dose and better characterizing in vitro bioavailable concentrations through mass balance modeling. Moreover, for both of these considerations, current approaches lack qualitative and quantitative uncertainty analysis that could be valuable to better characterize their accuracy and precision. Overall, challenges clearly remain in the utilization of NAM-based PODs, but this study offers valuable insights as to the potential path forward for incorporating in vitro PODs into regulatory risk assessment and decision-making.
Supplementary Material
Acknowledgments
This research was supported in part by the U.S. National Institute of Environmental Health Sciences (P42 ES027704, P30 ES029067, R41 TR002567, R42 ES032642), the U.S. Environmental Protection Agency (STAR RD83516602 and RD83580202), and institutional support from Texas A&M University.
Competing Interests
W.A.C. declares the following competing financial interest(s): The NIH Awards R41TR002567 and R42ES032642 were granted to DREAM Tech, LLC to develop and commercialize the BBMD modeling system. W.A.C. is affiliated with DREAM Tech, LLC, and may benefit from the success of the BBMD system.
Abbreviations
- AC50: 50% Maximal activity concentration
- ATP: Intracellular adenosine triphosphate
- BMA: Bayesian model averaging
- BMD: Benchmark dose
- BMA BMDA: Bayesian model average benchmark dose in animal dose
- BMA BMDH: Bayesian model average benchmark dose in human dose
- BMDL: Benchmark dose lower confidence limit
- BPM: Beats per minute
- CSS: Steady state concentration
- DAF: Dosimetric adjustment factor
- DMSO: Dimethyl sulfoxide
- ETAP: U.S. EPA Transcriptomic Assessment Product
- HED: Human equivalent dose
- HEAST: Health Effects Assessment Summary Tables
- hiPSC: Human induced pluripotent stem cell
- httk: High-throughput toxicokinetics
- IVIVE: In vitro-to-in vivo extrapolation
- IRIS: Integrated Risk Information System
- IV-MBM: In Vitro Mass Balance Model
- LCL: Lymphoblastoid cell line
- LED: Lower effective dose
- LOAEL: Lowest observed adverse effect level
- LOEL: Lowest observable effect level
- MCMC: Markov Chain Monte Carlo
- MOE: Margin of exposure
- NAM: New approach method
- NOAEL: No observed adverse effect level
- NOEL: No observable effect level
- OPP: Office of Pesticide Programs
- PECO: Population, exposure, comparator, and outcomes
- POD: Point of departure
- PPRTV: Provisional Peer-Reviewed Toxicity Values
- Reg PODA: Regulatory in vivo POD in animal dose
- Reg PODH: Regulatory in vivo POD in human dose
- RfD: Reference dose
- RSL: Regional Screening Level
- TK: Toxicokinetic
- UF: Uncertainty factor
Footnotes
CRediT author statement
En-Hsuan Lu: Methodology, Software, Formal analysis, Investigation, Data Curation, Writing – Original Draft, Writing – Review and Editing, Visualization; Lucie C. Ford: Data curation, Investigation, Writing – Review and Editing; Zunwei Chen: Data curation, Investigation; Sarah D. Burnett: Data curation, Investigation; Ivan Rusyn: Methodology, Writing – Original Draft, Writing – Review and Editing, Funding acquisition, Visualization; Weihsueh A. Chiu: Conceptualization, Methodology, Software, Formal analysis, Investigation, Writing – Original Draft, Writing – Review and Editing, Visualization, Supervision, Project administration, Funding acquisition.
References
- Abdo N, Xia M, Brown CC, Kosyk O, Huang R, Sakamuru S, Zhou YH, Jack JR, Gallins P, Xia K, Li Y, Chiu WA, Motsinger-Reif AA, Austin CP, Tice RR, Rusyn I, and Wright FA. 2015. ‘Population-based in vitro hazard and concentration-response assessment of chemicals: the 1000 genomes high-throughput screening study’, Environ Health Perspect, 123: 458–66. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Anson BD, Kolaja KL, and Kamp TJ. 2011. ‘Opportunities for use of human iPS cells in predictive toxicology’, Clin Pharmacol Ther, 89: 754–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Armitage JM, Sangion A, Parmar R, Looky AB, and Arnot JA. 2021. ‘Update and Evaluation of a High-Throughput In Vitro Mass Balance Distribution Model: IV-MBM EQP v2.0’, Toxics, 9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Arnot JA, Toose L, Armitage JM, Sangion A, Looky A, Brown TN, Li L, and Becker RA. 2022. ‘Developing an internal threshold of toxicological concern (iTTC)’, J Expo Sci Environ Epidemiol, 32: 877–84. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Beal MA, Gagne M, Kulkarni SA, Patlewicz G, Thomas RS, and Barton-Maclaren TS. 2022. ‘Implementing in vitro bioactivity data to modernize priority setting of chemical inventories’, ALTEX, 39: 123–39. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bell SM, Chang X, Wambaugh JF, Allen DG, Bartels M, Brouwer KLR, Casey WM, Choksi N, Ferguson SS, Fraczkiewicz G, Jarabek AM, Ke A, Lumen A, Lynn SG, Paini A, Price PS, Ring C, Simon TW, Sipes NS, Sprankle CS, Strickland J, Troutman J, Wetmore BA, and Kleinstreuer NC. 2018. ‘In vitro to in vivo extrapolation for high throughput prioritization and decision making’, Toxicol In Vitro, 47: 213–27. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Blanchette AD, Burnett SD, Grimm FA, Rusyn I, and Chiu WA. 2020. ‘A Bayesian Method for Population-wide Cardiotoxicity Hazard and Risk Characterization Using an In Vitro Human Model’, Toxicol Sci, 178: 391–403. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Blanchette AD, Burnett SD, Rusyn I, and Chiu WA. 2022. ‘A tiered approach to population-based in vitro testing for cardiotoxicity: Balancing estimates of potency and variability’, J Pharmacol Toxicol Methods, 114: 107154. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Blanchette AD, Grimm FA, Dalaijamts C, Hsieh NH, Ferguson K, Luo YS, Rusyn I, and Chiu WA. 2019. ‘Thorough QT/QTc in a Dish: An In Vitro Human Model That Accurately Predicts Clinical Concentration-QTc Relationships’, Clin Pharmacol Ther, 105: 1175–86. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Burnett SD, Blanchette AD, Chiu WA, and Rusyn I. 2021a. ‘Cardiotoxicity Hazard and Risk Characterization of ToxCast Chemicals Using Human Induced Pluripotent Stem Cell-Derived Cardiomyocytes from Multiple Donors’, Chem Res Toxicol, 34: 2110–24. [DOI] [PMC free article] [PubMed] [Google Scholar]
- ———. 2021b. ‘Human induced pluripotent stem cell (iPSC)-derived cardiomyocytes as an in vitro model in toxicology: strengths and weaknesses for hazard identification and risk characterization’, Expert Opin Drug Metab Toxicol, 17: 887–902. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Burnett SD, Blanchette AD, Grimm FA, House JS, Reif DM, Wright FA, Chiu WA, and Rusyn I. 2019. ‘Population-based toxicity screening in human induced pluripotent stem cell-derived cardiomyocytes’, Toxicol Appl Pharmacol, 381: 114711. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Burnett SD, Karmakar M, Murphy WJ, Chiu WA, and Rusyn I. 2021. ‘A new approach method for characterizing inter-species toxicodynamic variability’, J Toxicol Environ Health A, 84: 1020–39. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chen Z, Liu Y, Wright FA, Chiu WA, and Rusyn I. 2020. ‘Rapid hazard characterization of environmental chemicals using a compendium of human cell lines from different organs’, ALTEX, 37: 623–38. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chiu WA, Wright FA, and Rusyn I. 2017. ‘A tiered, Bayesian approach to estimating of population variability for regulatory decision-making’, ALTEX, 34: 377–88. [DOI] [PMC free article] [PubMed] [Google Scholar]
- DerSimonian R, and Laird N. 1986. ‘Meta-analysis in clinical trials’, Control Clin Trials, 7: 177–88. [DOI] [PubMed] [Google Scholar]
- Dimitrijevic D, Fabian E, Nicol B, Funk-Weyer D, and Landsiedel R. 2022. ‘Toward Realistic Dosimetry In Vitro: Determining Effective Concentrations of Test Substances in Cell Culture and Their Prediction by an In Silico Mass Balance Model’, Chem Res Toxicol, 35: 1962–73. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Eduati F, Mangravite LM, Wang T, Tang H, Bare JC, Huang R, Norman T, Kellen M, Menden MP, Yang J, Zhan X, Zhong R, Xiao G, Xia M, Abdo N, Kosyk O, NIEHS-NCATS-UNC DREAM Toxicogenetics Collaboration, Friend S, Dearry A, Simeonov A, Tice RR, Rusyn I, Wright FA, Stolovitzky G, Xie Y, and Saez-Rodriguez J. 2015. ‘Prediction of human population responses to toxic compounds by a collaborative competition’, Nat Biotechnol, 33: 933–40.
- Fantke P, Chiu WA, Aylward L, Judson R, Huang L, Jang S, Gouin T, Rhomberg L, Aurisano N, McKone T, and Jolliet O. 2021. ‘Exposure and Toxicity Characterization of Chemical Emissions and Chemicals in Products: Global Recommendations and Implementation in USEtox’, Int J Life Cycle Assess, 26: 899–915.
- Ford LC, Jang S, Chen Z, Zhou YH, Gallins PJ, Wright FA, Chiu WA, and Rusyn I. 2022. ‘A Population-Based Human In Vitro Approach to Quantify Inter-Individual Variability in Responses to Chemical Mixtures’, Toxics, 10.
- Grimm FA, Blanchette A, House JS, Ferguson K, Hsieh NH, Dalaijamts C, Wright AA, Anson B, Wright FA, Chiu WA, and Rusyn I. 2018. ‘A human population-based organotypic in vitro model for cardiotoxicity screening’, ALTEX, 35: 441–52.
- Grimm FA, Iwata Y, Sirenko O, Bittner M, and Rusyn I. 2015. ‘High-Content Assay Multiplexing for Toxicity Screening in Induced Pluripotent Stem Cell-Derived Cardiomyocytes and Hepatocytes’, Assay Drug Dev Technol, 13: 529–46.
- Health Canada. 2021. “Science approach document. Bioactivity Exposure Ratio: Application in Priority Setting and Risk Assessment.” Health Canada.
- Higgins JP, Thompson SG, Deeks JJ, and Altman DG. 2003. ‘Measuring inconsistency in meta-analyses’, BMJ, 327: 557–60.
- Kavlock RJ, Bahadori T, Barton-Maclaren TS, Gwinn MR, Rasenberg M, and Thomas RS. 2018. ‘Accelerating the Pace of Chemical Risk Assessment’, Chem Res Toxicol, 31: 287–90.
- Krewski D, Andersen ME, Tyshenko MG, Krishnan K, Hartung T, Boekelheide K, Wambaugh JF, Jones D, Whelan M, Thomas R, Yauk C, Barton-Maclaren T, and Cote I. 2020. ‘Toxicity testing in the 21st century: progress in the past decade and future perspectives’, Arch Toxicol, 94: 1–58.
- Lowe CN, and Williams AJ. 2021. ‘Enabling High-Throughput Searches for Multiple Chemical Data Using the U.S.-EPA CompTox Chemicals Dashboard’, J Chem Inf Model, 61: 565–70.
- NASEM, National Academies of Sciences, Engineering, and Medicine. 2023. “Building Confidence in New Evidence Streams for Human Health Risk Assessment: Lessons Learned from Laboratory Mammalian Toxicity Tests.” Washington, DC: The National Academies Press.
- Nicolas CI, Linakis MW, Minto MS, Mansouri K, Clewell RA, Yoon M, Wambaugh JF, Patlewicz G, McMullen PD, Andersen ME, and Clewell HJ III. 2022. ‘Estimating provisional margins of exposure for data-poor chemicals using high-throughput computational methods’, Front Pharmacol, 13: 980747.
- NRC, National Research Council. 2007. “Toxicity Testing in the 21st Century: A Vision and a Strategy.” Washington, DC: The National Academies Press.
- ———. 2017. “Using 21st century science to improve risk-related evaluations.” Washington, DC: National Academies Press.
- Pang L. 2020. ‘Toxicity testing in the era of induced pluripotent stem cells: A perspective regarding the use of patient-specific induced pluripotent stem cell–derived cardiomyocytes for cardiac safety evaluation’, Curr Opin Toxicol, 23-24: 50–55.
- Paul Friedman K, Gagne M, Loo LH, Karamertzanis P, Netzeva T, Sobanski T, Franzosa JA, Richard AM, Lougee RR, Gissi A, Lee JJ, Angrish M, Dorne JL, Foster S, Raffaele K, Bahadori T, Gwinn MR, Lambert J, Whelan M, Rasenberg M, Barton-Maclaren T, and Thomas RS. 2020. ‘Utility of In Vitro Bioactivity as a Lower Bound Estimate of In Vivo Adverse Effect Levels and in Risk-Based Prioritization’, Toxicol Sci, 173: 202–25.
- Paul Friedman K, Papineni S, Marty MS, Yi KD, Goetz AK, Rasoulpour RJ, Kwiatkowski P, Wolf DC, Blacker AM, and Peffer RC. 2016. ‘A predictive data-driven framework for endocrine prioritization: a triazole fungicide case study’, Crit Rev Toxicol, 46: 785–833.
- Pearce RG, Setzer RW, Strope CL, Wambaugh JF, and Sipes NS. 2017. ‘httk: R Package for High-Throughput Toxicokinetics’, J Stat Softw, 79: 1–26.
- Proenca S, Escher BI, Fischer FC, Fisher C, Gregoire S, Hewitt NJ, Nicol B, Paini A, and Kramer NI. 2021. ‘Effective exposure of chemicals in in vitro cell systems: A review of chemical distribution models’, Toxicol In Vitro, 73: 105133.
- Rusyn I, and Chiu WA. 2022. ‘Decision-Making with New Approach Methodologies: Time to Replace Default Uncertainty Factors with Data’, Toxicol Sci, 189: 148–49.
- Shao K, and Shapiro AJ. 2018. ‘A Web-Based System for Bayesian Benchmark Dose Estimation’, Environ Health Perspect, 126: 017002.
- Sirenko O, Cromwell EF, Crittenden C, Wignall JA, Wright FA, and Rusyn I. 2013. ‘Assessment of beating parameters in human induced pluripotent stem cells enables quantitative in vitro screening for cardiotoxicity’, Toxicol Appl Pharmacol, 273: 500–7.
- Sirenko O, Grimm FA, Ryan KR, Iwata Y, Chiu WA, Parham F, Wignall JA, Anson B, Cromwell EF, Behl M, Rusyn I, and Tice RR. 2017. ‘In vitro cardiotoxicity assessment of environmental chemicals using an organotypic human induced pluripotent stem cell-derived model’, Toxicol Appl Pharmacol, 322: 60–74.
- U.S. EPA. 2023a. ‘Regional Screening Levels (RSLs)-Frequent Questions. November 2023’, U.S. Environmental Protection Agency, Accessed 11/17. https://www.epa.gov/risk/regional-screening-levels-rsls-frequent-questions.
- ———. 2023b. “Scientific Studies Supporting Development of Transcriptomic Points of Departure for EPA Transcriptomic Assessment Products (ETAPs).” Edited by Office of Research and Development Center for Computational Toxicology and Exposure (CCTE) & Center for Public Health and Environmental Assessment (CPHEA). Washington, DC: U.S. Environmental Protection Agency.
- U.S. EPA. 2011. “Recommended Use of body weight ¾ as the Default Method in Derivation of the Oral Reference Dose.” Washington, DC: U.S. Environmental Protection Agency.
- ———. 2012. “Benchmark Dose Technical Guidance.” U.S. Environmental Protection Agency.
- Valdiviezo A, Brown GE, Michell AR, Trinconi CM, Bodke VV, Khetani SR, Luo YS, Chiu WA, and Rusyn I. 2022. ‘Reanalysis of Trichloroethylene and Tetrachloroethylene Metabolism to Glutathione Conjugates Using Human, Rat, and Mouse Liver in Vitro Models to Improve Precision in Risk Characterization’, Environ Health Perspect, 130: 117009.
- Valdiviezo A, Luo YS, Chen Z, Chiu WA, and Rusyn I. 2021. ‘Quantitative In Vitro-to-In Vivo Extrapolation for Mixtures: A Case Study of Superfund Priority List Pesticides’, Toxicol Sci, 183: 60–69.
- van der Zalm AJ, Barroso J, Browne P, Casey W, Gordon J, Henry TR, Kleinstreuer NC, Lowit AB, Perron M, and Clippinger AJ. 2022. ‘A framework for establishing scientific confidence in new approach methodologies’, Arch Toxicol, 96: 2865–79.
- Viechtbauer W. 2010. ‘Conducting Meta-Analyses in R with the metafor Package’, J Stat Softw, 36: 1–48.
- Wambaugh JF, Hughes MF, Ring CL, MacMillan DK, Ford J, Fennell TR, Black SR, Snyder RW, Sipes NS, Wetmore BA, Westerhout J, Setzer RW, Pearce RG, Simmons JE, and Thomas RS. 2018. ‘Evaluating In Vitro-In Vivo Extrapolation of Toxicokinetics’, Toxicol Sci, 163: 152–69.
- Wambaugh JF, Wetmore BA, Pearce R, Strope C, Goldsmith R, Sluka JP, Sedykh A, Tropsha A, Bosgra S, Shah I, Judson R, Thomas RS, and Setzer RW. 2015. ‘Toxicokinetic Triage for Environmental Chemicals’, Toxicol Sci, 147: 55–67.
- WHO/IPCS. 2018. “Guidance document on evaluating and expressing uncertainty in hazard characterization, 2nd edition.” Geneva, Switzerland: World Health Organization.
- Wignall JA, Muratov E, Sedykh A, Guyton KZ, Tropsha A, Rusyn I, and Chiu WA. 2018. ‘Conditional Toxicity Value (CTV) Predictor: An In Silico Approach for Generating Quantitative Risk Estimates for Chemicals’, Environ Health Perspect, 126: 057008.
- Williams AJ, Grulke CM, Edwards J, McEachran AD, Mansouri K, Baker NC, Patlewicz G, Shah I, Wambaugh JF, Judson RS, and Richard AM. 2017. ‘The CompTox Chemistry Dashboard: a community data resource for environmental chemistry’, J Cheminform, 9: 61.