HExpPredict: In Vivo Exposure Prediction of Human Blood Exposome Using a Random Forest Model and Its Application in Chemical Risk Prioritization

Fanrong Zhao; Li Li; Penghui Lin; Yue Chen; Shipei Xing; Huili Du; Zheng Wang; Junjie Yang; Tao Huan; Cheng Long; Limao Zhang; Bin Wang; Mingliang Fang

doi:10.1289/EHP11305

2023 Mar 13;131(3):037009. doi: 10.1289/EHP11305

PMCID: PMC10010393 PMID: 36913238

Abstract

Background:

Because the human exposome contains so many substances, there is a dearth of exposure and toxicity information available to assess potential health risks. Quantifying all trace organics in biological fluids appears infeasible and costly, even before the high interindividual exposure variability is considered. We hypothesized that the blood concentration ( $C_{B}$ ) of organic pollutants could be predicted via their exposure and chemical properties. Developing a prediction model on the annotation of chemicals in human blood can provide new insight into the distribution and extent of exposures to a wide range of chemicals in humans.

Objectives:

Our objective was to develop a machine learning (ML) model to predict blood concentrations ( $C_{B}$s ) of chemicals and prioritize chemicals of health concern.

Methods:

We curated the $C_{B}$s of compounds mostly measured at population levels and developed an ML model for chemical $C_{B}$ predictions by considering chemical daily exposure (DE) and exposure pathway indicators ( $δ_{i j}$ ), half-lives ( $t_{1/2}$ ), and volume of distribution ( $V_{d}$ ). Three ML models, including random forest (RF), artificial neural network (ANN), and support vector regression (SVR), were compared. The toxicity potential or prioritization of each chemical was represented as a bioanalytical equivalency (BEQ) and its percentage (BEQ%) estimated based on the predicted $C_{B}$ and ToxCast bioactivity data. We also retrieved the top 25 most active chemicals in each assay to further observe changes in the BEQ% after the exclusion of the drugs and endogenous substances.

Results:

We curated the $C_{B}$s of 216 compounds primarily measured at population levels. RF outperformed the ANN and SVR models, with root mean square error (RMSE) values of 1.66 and $2.07\ μM$ , mean absolute error (MAE) values of 1.28 and $1.56\ μM$ , mean absolute percentage error (MAPE) values of 0.29 and 0.23, and $R^{2}$ of 0.80 and 0.72 across the training and testing sets, respectively. Subsequently, the human $C_{B}$s of 7,858 ToxCast chemicals were successfully predicted, ranging from $1.29 \times 10^{- 6}$ to $1.79 \times 10^{- 2} μ M$ . The predicted $C_{B}$s were then combined with ToxCast in vitro bioassays to prioritize the ToxCast chemicals across 12 in vitro assays with important toxicological end points. Notably, the most active compounds were food additives and pesticides rather than widely monitored environmental pollutants.

Discussion:

We have shown that the accurate prediction of “internal exposure” from “external exposure” is possible, and this result can be quite useful in the risk prioritization. https://doi.org/10.1289/EHP11305

Introduction

Because many chemical substances have been developed and used in commerce over numerous recent decades, there is a dearth of exposure and toxicity information available to assess potential health risks of most of these chemicals to humans.^1,2 To address concerns over the potential health effects of untested chemicals, high-throughput screening (HTS) assessments that incorporate both exposure and toxicity data are needed for risk-based screening and prioritization.^1–4 The U.S. Environmental Protection Agency (U.S. EPA) has developed the ToxCast program to provide in vitro bioactivity data that may inform chemical toxicity.^5,6 However, to use the in vitro bioactivity data of ToxCast to evaluate the potential risk to human health, chemical blood concentration ( $C_{B}$ ) is essential to link the internal exposure to external human exposure.⁷

One challenge to chemical exposure and risk assessments has been the demand for a large number of chemical $C_{B}$ measurements.⁸ Clearly, experimental quantification is cumbersome and time-consuming. The standards used for analysis are also costly or difficult to obtain. In addition, the concentrations of most compounds are too low to be detectable.^9,10 Moreover, there is high variability in chemical levels between biospecimens from different people, sometimes even for samples collected from the same donors on different days in cases of exposure to rapidly metabolized chemicals.^11,12 The National Health and Nutrition Examination Survey (NHANES) has spent years monitoring several hundred chemicals, which is still insufficient for the evaluation of chemical exposure risk in the era of the exposome. Therefore, without extensive direct measurements of chemicals at the population level, there is an urgent need to explore whether we can develop in silico methods to predict the $C_{B}$s of chemicals. The U.S. EPA has also developed the ExpoCast program to predict human exposure to a large number of chemicals, with balanced accuracies of the source-based exposure pathway models ranging from 73% to 81% and a coefficient of determination ( $R^{2}$ ) between predictions and biomonitoring-based inferences of 0.8.³ However, ExpoCast can predict only intake rates, which are indicators of external exposure. 
Because different chemicals have different bioavailability and clearance, to assess health risks using ToxCast activity test data, it is necessary to convert the external exposure data into internal concentration in bodily fluids.⁷ Previous efforts built quantitative approaches to translate in vitro toxicity potencies to equivalent in vivo doses using in vitro−in vivo extrapolation (IVIVE) techniques.¹³ These approaches used pharmacokinetic equations to estimate steady-state plasma concentrations ( $C_{SS}$ ) using the High-Throughput Toxicokinetic (HTTK) open-source R package (version 4.2.1; R Development Core Team).¹³ However, the $C_{B}$ values predicted by the HTTK model were derived by assuming steady state and 100% oral bioavailability under a dose rate of $1\ mg/kg/d$ , which did not consider the exposure and the corresponding uncertainty; and chemicals such as perfluorooctanoic acid (PFOA) and perfluorooctanesulfonic acid (PFOS), which were thought to be actively resorbed by the kidney, were not captured by the current HTTK model.¹³ In addition, in recent studies, the high-throughput PROduction-To-EXposure (PROTEX-HT) model developed by Li et al. could already predict the $C_{SS}$ without assuming 100% oral absorption,² and the Physiologically based Toxicokinetic (PBTK) model developed by Armitage et al. could already capture the renal clearance and reabsorption of ions such as polyfluoroalkyl substances (PFAS).¹⁴ However, most of those theoretical methods used to predict the chemical $C_{SS}$ resulting from repeated daily exposure were limited to the oral route of exposure.^4,15,16

We hypothesized that the $C_{B}$ of organic pollutants could be predicted via their exposure and chemical properties, especially for those with similar exposure routes and physicochemical parameters. We sought to increase the prediction accuracy of $C_{B}$ using machine learning (ML) methods. In this study, we curated the $C_{B}$s of pollutants in the general population from available databases and literature and applied ML algorithms for $C_{B}$ predictions by optimizing the key parameters that mediate the $C_{B}$ . We compared three ML algorithms, including random forest (RF), artificial neural network (ANN), and support vector regression (SVR), based on the publicly available experimental data. The best-performing RF model was then used to predict the $C_{B}$s of $>7{,}500$ ToxCast chemicals. The predicted $C_{B}$ values were further combined with ToxCast in vitro bioassays to prioritize those ToxCast chemicals in terms of $C_{B}/AC_{50}$ ratios, using different assay end points. This advanced human internal exposure prediction (HExpPredict) approach provides the ability to evaluate and prioritize chemicals for potential risk to human health.

Methods

A detailed data processing workflow is depicted in Figure 1. Key parameters and models developed for this study are described in the following sections. According to pharmacokinetic and toxicokinetic models, factors that are known or expected to influence the relationship between external exposure and the chemical $C_{B}$ are the elimination half-life, bioavailability, volume of distribution ( $V_{d}$ ), dosage, and dosing interval.¹⁷ When the dosing interval is defined as 1 day, the maintenance dose corresponds to the daily exposure (DE, milligrams per kilogram body weight per day). Because of the lack of data, parameters such as renal clearance half-life and bioavailability were treated as unknown parameters to be learned by the ML model. As one of the major pathways of elimination, the predictable biotransformation half-life ( $t_{1/2}$ ) was included in our prediction model.
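To illustrate why these factors matter, the sketch below evaluates the textbook one-compartment relation for the average steady-state concentration, $C_{SS} = F \times DE / (k \times V_{d})$ with $k = \ln 2 / t_{1/2}$. This is an assumption introduced here for illustration only and is not the model developed in this study, which learns the relationship from data. The function names and the use of Python (the study itself used R) are likewise illustrative choices.

```python
import math


def steady_state_blood_conc(de_mg_per_kg_day, half_life_h, vd_l_per_kg,
                            bioavailability=1.0):
    """Average steady-state concentration (mg/L), one-compartment model.

    Assumes first-order elimination and a constant daily exposure; this is
    a textbook approximation, not the study's random forest model.
    """
    k_el_per_day = math.log(2) / (half_life_h / 24.0)  # elimination rate, 1/day
    clearance = k_el_per_day * vd_l_per_kg             # L/kg/day
    return bioavailability * de_mg_per_kg_day / clearance


def mg_per_l_to_micromolar(conc_mg_per_l, molar_mass_g_per_mol):
    """Convert a mass concentration (mg/L) to a molar concentration (uM)."""
    return conc_mg_per_l / molar_mass_g_per_mol * 1000.0


if __name__ == "__main__":
    # Hypothetical chemical: DE = 1e-4 mg/kg/d, t1/2 = 24 h, Vd = 10 L/kg, 300 g/mol
    css = steady_state_blood_conc(1e-4, 24.0, 10.0)
    print(mg_per_l_to_micromolar(css, 300.0))
```

Under this simplified relation, doubling the half-life doubles the steady-state concentration, whereas doubling $V_{d}$ halves it, which is consistent with including both as model features.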

Figure 1. Overview of framework for human $C_{B}$ prediction (HExpPredict) modeling and risk prioritization in this study. Panels: blood exposure database (left) and machine learning prediction model (right). Note: $C_{B}$ , blood concentration.

Chemical Selection

The chemicals were selected based on a subset of the ToxCast Database (version 3.0, publicly released October 2018) in this study, for which the exposure data and in vitro bioactivity assay data were readily available.¹⁸ The U.S. EPA’s ToxCast chemical list includes more than 9,000 compounds, including industrial chemicals, pesticides, consumer product ingredients, and pharmaceuticals. The full list of chemicals considered is available in Excel Table S1. All chemical descriptors including CAS registry number, chemical name, Simplified Molecular Input Line Entry System (SMILES), molecular formula, average mass, and monoisotopic mass are available through the U.S. EPA’s CompTox Chemicals Dashboard (version 2.1.1; https://comptox.epa.gov/dashboard/batch-search).¹⁹

Exposure Estimates

The median estimated DE level (milligrams per kilogram body weight per day) with uncertainty [95% confidence interval (CI)] for each ToxCast chemical, as shown in Excel Table S1, was acquired from the U.S. EPA’s ExpoCast exposure estimates, which were developed using the General Population Consensus Model (SEEM3).^3,20 The exposure pathway indicators ( $δ_{i j}$ ) for four source-based pathways (far-field pesticide use, nonpesticide dietary exposure, far-field industrial exposure, and consumer) in the SEEM3 model were also included in our prediction model.³ The $δ_{i j}$ is an estimated probability of whether a given pathway j is relevant to a given chemical i.

Chemical Biotransformation Half-Life Prediction

The predicted half-life values ( $t_{1/2}$ ) for the ToxCast chemicals were taken from the Human Exposome and Metabolite Database (HExpMetDB).²¹ The prediction was based on the quantitative structure−activity relationship (QSAR) approach called Iterative Fragment Selection (IFS).²²

The Distribution Volume ( $V_{d}$ ) Prediction

The $V_{d}$ values were predicted by a comprehensive exposure model named Risk Assessment, IDentification And Ranking-Indoor and Consumer Exposure (RAIDAR-ICE), according to a previous study.²³

Molecular Descriptors and QSAR Parameter Calculation

The QSAR parameters such as $Log K_{OW}$ and $Log K_{OA}$ were calculated using solute descriptors provided by the online UFZ-LSER Database.²⁴ Water solubility (WS) and substructure molecular descriptors were calculated by the Toxicity Estimation Software Tool (TEST, version 5.1.1).²⁵

Chemical $C_{B}$ Search

To investigate the occurrence and levels of xenobiotics in human blood, we conducted a database and literature search on chemicals in human blood. The measured $C_{B}$s of xenobiotics in this study were first retrieved from the NHANES 2003–2017,²⁶ the California Environmental Contaminant Biomonitoring Program (also known as Biomonitoring California),²⁷ or the Exposome-Explorer.²⁸ We excluded drugs and endogenous compounds by filtering the U.S. EPA’s CompTox Chemicals Dashboard Drugbank list (https://comptox.epa.gov/dashboard/chemical-lists/DRUGBANK) and manually searching the chemical category through PubChem (https://pubchem.ncbi.nlm.nih.gov/). When a given chemical was present in NHANES and in another of these databases, we used the NHANES concentrations. To further obtain concentration data for more compounds, we performed a literature search on typical pollutants that were not in the databases, based on the chemicals of concern previously summarized in our research.²⁹ The National Center for Biotechnology Information (NCBI) PubMed database (https://pubmed.ncbi.nlm.nih.gov/) was searched from the year 2005 to 2022. The keywords used to search the PubMed database included those describing sample types “blood,” “plasma,” or “serum” and terms for the typical pollutant classes summarized in our previous study,²⁹ including “perfluorinated compounds,” “volatile organic compound,” “pesticide,” “organophosphorus flame retardant,” or “polycyclic aromatic hydrocarbons,” together with keywords including “exposome,” “exposure,” “detection,” “level” or “concentration.” We included only the studies from healthy human populations using a mass spectrometry–based analytical method during our manual screening of the possible literature hits. We also excluded the studies from polluted areas or special environment areas. The $C_{B}$ of each compound was calculated based on the sample size–weighted geometric mean (GM, if provided) or median concentrations measured in serum, plasma, or blood. 
To develop models for different age and sex groups, we also collected the GMs of $C_{B}$s for different age and sex groups from the NHANES Database ( $n = 48$ ).
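As an illustration of the pooling step, the sketch below computes a sample size–weighted geometric mean across studies. The exact weighting formula is not given in the text, so the log-scale weighted average used here is an assumption (one common definition), and the function name and example values are hypothetical.

```python
import math


def weighted_geometric_mean(values, sample_sizes):
    """Sample size-weighted geometric mean of per-study concentrations.

    Computed as exp(sum(n_k * ln(x_k)) / sum(n_k)); this definition is an
    assumption, since the source does not spell out the formula.
    """
    if len(values) != len(sample_sizes) or not values:
        raise ValueError("values and sample_sizes must be non-empty and equal length")
    if any(v <= 0 for v in values):
        raise ValueError("concentrations must be positive")
    total_n = sum(sample_sizes)
    log_mean = sum(n * math.log(v) for v, n in zip(values, sample_sizes)) / total_n
    return math.exp(log_mean)


if __name__ == "__main__":
    # Hypothetical study-level values (uM) with their sample sizes
    print(weighted_geometric_mean([2.0e-4, 5.0e-4, 1.0e-3], [120, 45, 300]))
```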

ML Models

Random search and 5-fold cross-validation were used for parameter optimization to train three ML models (i.e., RF, ANN, and SVR) with various prediction features of DE, $δ_{i j}$ , $V_{d}$ , $t_{1/2}$ , and other chemical properties; the optimal parameters were evaluated by the root mean square error (RMSE). The publicly available data sets Exposome-Explorer database,²⁸ the Fourth National Report on Human Exposure to Environmental Chemicals,²⁶ and the California Environmental Contaminant Biomonitoring Program²⁷ were searched for experimentally measured human in vivo $C_{B}$ values. Literature mining was performed by manually searching reviews or articles as mentioned above. The measured $C_{B}$s were employed to train ML models for in silico $C_{B}$ prediction. We excluded the drug and endogenous compounds by filtering the U.S. EPA’s CompTox Chemicals Dashboard Drugbank list and manually searching the chemical category through PubChem (https://pubchem.ncbi.nlm.nih.gov/), because our model only considers the $C_{B}$s produced by external exposures. For the collected experimental $C_{B}$s and predicted $t_{1/2}$ , $V_{d}$ , and DE values, we unified their units into micromolar, day, L, and micromole per day, respectively, and normalized the right-skewed data by natural logarithmic transformation before feeding them to an ML model. The data were split 80:20 into training and testing sets for the RF, ANN, and SVR models. Training and testing set chemicals were randomly selected. In this work, the RMSE, mean absolute error (MAE), mean absolute percentage error (MAPE), and goodness of fit ( $R^{2}$ ) of the three models were compared. Finally, the trained model was used to predict $C_{B}$ for the ToxCast compounds. All analyses were performed in R (version 4.2.1; R Development Core Team). All chemical predictors are provided in Excel Table S1. 
To improve the applicability of our model, the R script and tutorial for users are also available in the Supplemental File HExpPredict_scripts.rar and Supplemental Material, “Text S1,” as well as at https://github.com/FangLabNTU/HExpPredict.
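The preprocessing and evaluation steps described above can be sketched as follows. This is a minimal Python illustration, not the authors' R script: the function names are hypothetical, the split is a plain random shuffle, and $R^{2}$ is computed as the coefficient of determination, which may differ from the definition used in the study.

```python
import math
import random


def ln_transform(values):
    """Natural-log transform for right-skewed, strictly positive data."""
    return [math.log(v) for v in values]


def train_test_split(items, test_fraction=0.2, seed=0):
    """Random split; the test set size is rounded up."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = math.ceil(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def regression_metrics(observed, predicted):
    """RMSE, MAE, MAPE, and R^2 between observed and predicted values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        # MAPE is undefined if any observed value is exactly zero
        "mape": sum(abs(e / o) for e, o in zip(errors, observed)) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


if __name__ == "__main__":
    train, test = train_test_split(range(216))
    print(len(train), len(test))
```

With 216 chemicals and a 20% test fraction rounded up, this split yields 172 training and 44 testing chemicals, matching the set sizes reported in this study.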

Monte Carlo (MC) Simulation and Parameter Distributions

MC simulation was implemented to simulate the impact of DE and $t_{1/2}$ uncertainty on the calculated $C_{B}$ , using a model similar to that in our previous studies.^21,30,31 Three separate MC simulations were performed, following previous studies: DE prediction uncertainty only, $t_{1/2}$ prediction uncertainty only, and both DE and $t_{1/2}$ prediction uncertainty.^20,21 For each chemical, the $C_{B}$ was calculated 10,000 times in each of the three MC simulations, allowing estimation of the 5th, median, and 95th percentiles.
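A minimal sketch of such a simulation is given below. It assumes that the DE and $t_{1/2}$ uncertainties are lognormal and that a 95% CI can be converted to a log-scale standard deviation; neither assumption is stated in this section. The `predict` argument is a hypothetical stand-in for the trained model, and all names are illustrative.

```python
import math
import random


def lognormal_from_ci(median, lower95, upper95):
    """Log-scale (mu, sigma) from a median and 95% CI (lognormal assumed)."""
    mu = math.log(median)
    sigma = (math.log(upper95) - math.log(lower95)) / (2 * 1.96)
    return mu, sigma


def monte_carlo_percentiles(predict, de_dist, half_life_dist,
                            n_draws=10_000, seed=0):
    """Propagate DE and t1/2 uncertainty through `predict`.

    `predict(de, half_life)` is a placeholder for the trained model.
    Returns the 5th percentile, median, and 95th percentile of the draws.
    """
    rng = random.Random(seed)
    draws = sorted(
        predict(rng.lognormvariate(*de_dist), rng.lognormvariate(*half_life_dist))
        for _ in range(n_draws)
    )

    def percentile(q):
        return draws[min(n_draws - 1, int(q * n_draws))]

    return percentile(0.05), percentile(0.50), percentile(0.95)


if __name__ == "__main__":
    # Toy predictor (hypothetical): concentration proportional to DE * t1/2
    toy = lambda de, t_half: de * t_half
    de = lognormal_from_ci(1e-4, 1e-6, 1e-2)
    hl = lognormal_from_ci(24.0, 6.0, 96.0)
    print(monte_carlo_percentiles(toy, de, hl))
```

Setting the log-scale standard deviation of one input to zero reproduces the single-source scenarios (DE uncertainty only or $t_{1/2}$ uncertainty only).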

In Vitro Bioactivity Data

All ToxCast in vitro HTS data (version 3.0, publicly released October 2018)¹⁸ were downloaded from the U.S. EPA’s CompTox Chemicals Dashboard (version 2.1) Assay Endpoints List (https://comptox.epa.gov/dashboard/assay-endpoints?filtered) to estimate the endocrine-related activity. The 12 targeted assays covering the estrogen receptor alpha ( $ER α$ ) (TOX21_Era_BLA_Agonist_ratio and TOX21_Era_BLA_Antagonist_ratio), androgen receptor (AR) (TOX21_AR_BLA_Agonist_ratio, Tox21_AR_LUC_MDAKB2_Agonist, TOX21_AR_BLA_Antagonist_ratio and TOX21_AR_LUC_MDAKB2_Antagonist_0.5nM_R1881), peroxisome proliferator–activated receptor gamma ( $PPAR γ$ ) (Tox21_PPARg_BLA_Agonist_ratio, TOX21_PPARg_BLA_Agonist_ch2, TOX21_PPARg_BLA_Antagonist_ch1 and TOX21_PPARg_BLA_antagonist_viability), and thyroid hormone receptor (TR) (TOX21_TR_LUC_GH3_Agonist and TOX21_TR_LUC_GH3_Antagonist) were chosen for further study. The bioactivity potential or prioritization of each chemical was represented as the ratio of $C_{B}$ to the concentration at 50% of maximum activity ( $C_{B}/AC_{50}$ ). We used the $AC_{50}$ estimates from the U.S. EPA’s CompTox Chemicals Dashboard (version 2.1) ToxCast Assay Endpoints List³² provided by the ToxCast program¹⁸ as well as the predicted $C_{B}$ to calculate the $C_{B}/AC_{50}$ ratios of ToxCast chemicals.

The relative ranking of $C_{B}/AC_{50}$ can be used for priority setting; that is, a higher $C_{B}/AC_{50}$ can be considered to be a higher priority. The toxicity potential or prioritization of each chemical was represented as a bioanalytical equivalency (BEQ). The BEQ values of each chemical and its percentage in the total BEQ (BEQ%) were estimated based on the following equations²⁹:

$BEQ_{i} = \frac{C_{B i}}{AC_{50 i}} \times AC_{50 ref}$ (1)

$BEQ_{i}\% = \frac{BEQ_{i}}{\sum_{i} BEQ_{i}} \times 100\%$, (2)

where $C_{B i}$ is the predicted blood concentration of compound i; $AC_{50 i}$ is the concentration of compound i that causes 50% response; and $AC_{50 ref}$ is the concentration of the reference compound (the compound with the minimum $AC_{50}$ for each assay) that causes 50% response. We further retrieved the applications of the top 25 most active chemicals of each assay from the NCBI PubMed databases (https://pubmed.ncbi.nlm.nih.gov).
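Equations 1 and 2 can be evaluated for a single assay as in the following sketch. The dictionary-based interface, the function name, and the example chemicals are illustrative assumptions; only the arithmetic follows the equations, with the reference compound taken as the one with the minimum $AC_{50}$ in the assay.

```python
def beq_percentages(cb_by_chemical, ac50_by_chemical):
    """Return (BEQ, BEQ%) dictionaries for one assay.

    BEQ_i  = C_B,i / AC50_i * AC50_ref          (Equation 1)
    BEQ_i% = BEQ_i / sum(BEQ) * 100             (Equation 2)
    AC50_ref is the minimum AC50 among the chemicals in the assay.
    All concentrations must share the same unit (e.g., uM).
    """
    ac50_ref = min(ac50_by_chemical.values())
    beq = {
        name: cb_by_chemical[name] / ac50 * ac50_ref
        for name, ac50 in ac50_by_chemical.items()
    }
    total = sum(beq.values())
    beq_pct = {name: value / total * 100.0 for name, value in beq.items()}
    return beq, beq_pct


if __name__ == "__main__":
    # Hypothetical predicted C_B and AC50 values (uM) for three chemicals
    cb = {"chem_a": 1.0e-3, "chem_b": 5.0e-4, "chem_c": 2.0e-5}
    ac50 = {"chem_a": 10.0, "chem_b": 0.5, "chem_c": 0.01}
    beq, pct = beq_percentages(cb, ac50)
    for name in sorted(pct, key=pct.get, reverse=True):
        print(name, round(pct[name], 1))
```

Because $AC_{50 ref}$ is a constant within an assay, it cancels in Equation 2, so BEQ% depends only on the $C_{B}/AC_{50}$ ratios.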

Results

A total of 7,858 chemicals were selected in this study from the 9,403 chemicals in the U.S. EPA’s ToxCast Database.¹⁸ The chemicals that were not selected (1,545) comprised those that did not have available DE data through ExpoCast SEEM3 and those that were categorized as ionogenic chemicals, organic mixtures, or chemicals with molecular weights over 1,000 Da and therefore could not be processed by the Iterative Fragment Selection (IFS) algorithm.

Chemical $C_{B}$ Search

To investigate the occurrence and levels of the selected chemicals in human blood, we conducted a database^26–28 and literature search,^10,33–37 extracting $C_{B}$ from identified data and studies. In total, the measured $C_{B}$s of 216 chemicals were documented for the further ML modeling. In general, the $C_{B}$s of the documented chemicals ranged from $1.65 \times 10^{- 8}$ to $1.59\ μM$ , spanning 8 orders of magnitude. The final list is presented in Excel Table S2, including CAS registry number, chemical name, formula, average mass, monoisotopic mass, weighted $C_{B}$ , and data sources for our RF model. Overall, the NHANES, the Exposome-Explorer Database, and the literature search were the dominant contributors to the data set, accounting for 23%, 39%, and 36% of it, respectively. Because population-level measured $C_{B}$ data were limited, we collected data for only 48 chemicals for which the age- and sex-specific geometric means of measured $C_{B}$ were available from the NHANES Database. The GM $C_{B}$ ranges for individuals age 12–19 y and those older than 20 y were $4.65 \times 10^{- 6} - 5.32 \times 10^{- 3} μ M$ and $1.11 \times 10^{- 5} - 9.00 \times 10^{- 3} μ M$ , respectively. The GM $C_{B}$ ranges were $8.31 \times 10^{- 6} - 1.07 \times 10^{- 2} μ M$ for males and $8.31 \times 10^{- 6} - 6.84 \times 10^{- 3} μ M$ for females (Excel Table S3).

Human Exposure Evaluation

The predicted exposure values of 7,858 chemicals were obtained from ExpoCast (Excel Table S1). The estimated human chemical DE ranged from $3.17 \times 10^{- 15}$ (95% CI: $3.82 \times 10^{- 17}$ , $4.19 \times 10^{- 13}$ ) to 4.92 (95% CI: $1.65 \times 10^{- 7}$ , $2.21 \times 10^{5}$ ) mg/kg body weight/d, spanning across 15 orders of magnitude. The $δ_{i j}$ values of 7,858 chemicals ranged from 0 to 1 for the four pathways (Excel Table S1; i.e., far-field pesticide use, nonpesticide dietary exposure, far-field industrial exposure, and consumer), with values near zero indicating low probability and values near one indicating high probability exposure to the chemical.

Chemical $t_{1/2}$ Evaluation

The $t_{1/2}$ values of 7,858 chemicals listed in Excel Table S1 were successfully predicted using the IFS approach. Of these 7,858 chemicals, the median $t_{1/2}$ was predicted to be 4.64 hours (h). Rolitetracycline was predicted to have the shortest $t_{1/2}$ of 0.05 h, and mirex was predicted to have the longest $t_{1/2}$ of 2,020,000 h, a range of about 8 orders of magnitude.

Chemical $V_{d}$ Prediction

We used the RAIDAR-ICE model to predict the $V_{d}$ values of 7,858 chemicals (Excel Table S1). The median $V_{d}$ was predicted to be $14.4\ L/kg$ whole blood. The $V_{d}$s span over 3 orders of magnitude, from $7.36 \times 10^{- 1}$ to $20.3\ L/kg$ whole blood.

$C_{B}$ Prediction ML Modeling

We developed a workflow to use experimental $C_{B}$ data to train and test ML models (Figure 1). Such models were then applied to the 7,858 chemicals from the U.S. EPA ToxCast Program for which in vitro bioactivity data were available. We collected available experimentally measured human $C_{B}$ values through publicly available databases and literature to train ML models for in silico $C_{B}$ prediction. After excluding the drug and endogenous compounds, a total of 216 experimental $C_{B}$ data points were included in the ML model (Figure 2A). We randomly divided the 216 data points into 172 compounds for training and 44 compounds for further testing (i.e., 80%:20%). We downloaded the chemical QSAR-ready SMILES from the U.S. EPA’s CompTox Chemicals Dashboard Batch Search (version 2.1.1),³⁸ which we used to predict the $V_{d}$ and $t_{1/2}$ . Chemical-specific inputs to ML models included DE, $V_{d}$ , $t_{1/2}$ , and $δ_{i j}$ for parameter tuning.

Figure 2. (A) Overlapping analysis of major sources for measured $C_{B}$ used in machine learning training. (B) Prediction performance of RF ML model for training ( $n = 172$ ) and testing ( $n = 44$ ) sets (referring to the data in Excel Table S4), plotted as ln(predicted $C_{B}$ ) against ln(observed $C_{B}$ ) in micromolar; black line is the $y = x$ line, and blue dotted lines are 10-fold boundaries. (C) Prediction performance of RF ML model for different groups of chemicals (referring to the data in Excel Table S2); black line is the $y = x$ line. (D) Violin plots for training and testing set prediction errors by calculating the ratio between measured and predicted concentration from RF ML model (referring to the data in Excel Table S4). Blue dashed lines are the medians, and red dotted lines are quartiles. Note: BC, Biomonitoring California; $C_{B}$ , blood concentration; EE, Exposome-Explorer; ML, machine learning; NHANES, National Health and Nutrition Examination Survey; OPFRs, organophosphorus flame retardants; OP, organochlorine pesticide; PAE, phthalate ester; PBDE, polybrominated diphenyl ether; PCB, polychlorinated biphenyl; PFC, perfluorinated compounds; PPCP, personal care and consumer product; RF, random forest; VOC, volatile organic compound.

Model Validation

To optimize the $C_{B}$ prediction model performance on the training set, tuning parameters including maximum depth (5–100), mtry ratio (0.2–0.8), number of trees (10–500), maximum tuning times (20), and tuning method (“random_search”) were executed using the learner “ranger” of the “mlr3” learning platform (https://github.com/mlr-org/mlr3). The RMSE, MAE, MAPE, and $R^{2}$ were calculated to compare the predicted and experimental $C_{B}$ in the test data set. We investigated three widely used ML models (RF, ANN, and SVR) for $C_{B}$ predictions with seven basic variables, including DE, $V_{d}$ , $t_{1/2}$ , and four $δ_{i j}$s . RF outperformed the other two models with RMSE values of 1.66 and $2.07\ μM$ , MAE of 1.28 and $1.56\ μM$ , MAPE of 0.29 and 0.23, and $R^{2}$ of 0.80 and 0.72 across training and testing predictions of $C_{B}$ , respectively (Table 1). In comparison, ANN and SVR showed less robustness, with RMSE values of 2.83 and $3.07\ μM$ , MAE of 2.13 and $2.56\ μM$ , MAPE of 0.39 and 0.35, and $R^{2}$ of 0.41 and 0.39 for ANN, and with RMSE values of 3.52 and $3.76\ μM$ , MAE of 2.81 and $3.22\ μM$ , MAPE of 0.69 and 0.47, and $R^{2}$ of 0.08 and 0.06 for SVR across training and testing sets, respectively (Table 1). Approximately 90% (165 of 172) and 84% (37 of 44) of the predicted $C_{B}$ values of the training and testing sets, respectively, were within the 10-fold boundary when compared with measured $C_{B}$ values (Figure 2B), showing much better regression than the ANN and SVR models (Figure S1, referring to the data in Excel Table S4). To further optimize the RF model, we considered adding sex, age, and variables of varying complexity including $Log K_{OW}$ , $Log K_{OA}$ , WS, and additional molecular descriptors to our RF model. However, the prediction performance was not dramatically improved when more parameters were included in the RF model. 
Detailed results were provided in the Supplemental Material, “Text S2.”
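The 10-fold boundary check reported above can be expressed compactly on the natural-log scale, where a prediction lies within an $f$-fold boundary when $|\ln(predicted) - \ln(observed)| \le \ln f$. The sketch below is illustrative; the function name and the inclusive treatment of values exactly on the boundary are assumptions.

```python
import math


def fraction_within_fold(ln_observed, ln_predicted, fold=10.0):
    """Fraction of predictions within a `fold`-fold boundary of observations.

    Inputs are natural-log concentrations; points exactly on the boundary
    are counted as within it (an assumption).
    """
    limit = math.log(fold)
    hits = sum(
        1 for obs, pred in zip(ln_observed, ln_predicted)
        if abs(pred - obs) <= limit
    )
    return hits / len(ln_observed)


if __name__ == "__main__":
    # Hypothetical ln-scale values: the last prediction is off by a factor > 10
    observed = [-8.0, -6.5, -10.2, -4.0]
    predicted = [-7.1, -6.9, -9.0, -7.0]
    print(fraction_within_fold(observed, predicted))
```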

Table 1.

The prediction performance of three $C_{B}$ prediction machine learning models.

Model	Training set ( $n = 172$ )				Testing set ( $n = 44$ )
	RMSE	MAE	MAPE	$R^{2}$	RMSE	MAE	MAPE	$R^{2}$
Random forest	1.66	1.28	0.29	0.80	2.07	1.56	0.23	0.72
Artificial neural network	2.83	2.13	0.39	0.41	3.07	2.56	0.35	0.39
Support vector regression	3.52	2.81	0.69	0.08	3.79	3.22	0.47	0.06

Note: $C_{B}$ , blood concentration; MAE, mean absolute error; MAPE, mean absolute percentage error; RMSE, root mean square error.
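For readers who want to reproduce this kind of comparison, the four metrics in Table 1 and the share of predictions within a 10-fold boundary can be computed as sketched below. The formulas are the standard textbook definitions and are an assumption here; the study may have applied them on a transformed scale. The concentrations are hypothetical and serve only to exercise the functions.

```python
import math


def regression_metrics(measured, predicted):
    """Return RMSE, MAE, MAPE (as a fraction), and R^2 for paired values."""
    n = len(measured)
    errors = [p - m for m, p in zip(measured, predicted)]
    ss_res = sum(e * e for e in errors)
    mean_m = sum(measured) / n
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": sum(abs(e / m) for e, m in zip(errors, measured)) / n,
        "r2": 1 - ss_res / ss_tot,
    }


def fraction_within_fold(measured, predicted, fold=10.0):
    """Share of pairs whose predicted/measured ratio lies in [1/fold, fold]."""
    hits = sum(1 for m, p in zip(measured, predicted) if 1 / fold <= p / m <= fold)
    return hits / len(measured)


# Hypothetical concentrations, for illustration only.
measured = [0.002, 0.05, 0.4, 1.5, 3.0]
predicted = [0.004, 0.03, 0.5, 1.2, 40.0]

metrics = regression_metrics(measured, predicted)
within_10_fold = fraction_within_fold(measured, predicted)
print(metrics, within_10_fold)
```

Only the last pair (ratio of about 13) falls outside the 10-fold boundary in this toy data set.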

Good prediction performance of the RF model was observed for some typical environmental pollutants, such as polychlorinated biphenyls (PCBs), dioxins, phthalate esters (PAEs), polycyclic aromatic hydrocarbons (PAHs), perfluorinated compounds (PFCs), organophosphorus flame retardants (OPFRs), and volatile organic compounds (VOCs) (Figure 2C), with RMSE values of 0.64, 0.70, 0.71, 0.73, 0.83, 0.85, and 0.86, respectively (Table S1). In contrast, some substances, such as personal care and consumer products (PPCPs) and organochlorine pesticides (OPs), showed relatively poor prediction performance, with RMSE values of 1.18 and 1.68, respectively. For 50% of the compounds, the ratio of predicted $C_{B}$ to experimental $C_{B}$ fell within 0.32–2.6 for the training set and 0.24–3.4 for the testing set (Figure 2D).

Using the final RF model, $C_{B} s$ were determined for each of the 7,858 ToxCast chemicals. In general, the predicted human blood $C_{B}$ of the 7,858 ToxCast chemicals ranged from $1.02 \times 10^{- 6}$ to $3.25 \times 10^{- 2} μ M$ (Excel Table S1), spanning more than four orders of magnitude (Figure 3).

Figure 3. The cumulative distribution of chemical predicted $C_{B}$ using the RF model ($n = 7858$). The bar indicates the median predicted $C_{B}$ for each chemical; the pink area represents the predicted $C_{B}$ range (5%−95%) derived from the Monte Carlo simulations. Some typical environmental pollutants are labeled. [Line graph: predicted $C_{B}$ (μM; $10^{-6}$ to $10^{-1}$; y-axis) versus chemical rank by $C_{B}$ (0 to 8,000 in increments of 1,000; x-axis), showing $C_{B}$ and its 95% confidence interval.] Note: $C_{B}$, blood concentration; RF, random forest.

Uncertainty Analysis

Three MC simulations (DE prediction uncertainty alone, $t_{1/2}$ prediction uncertainty alone, and both) were performed to determine the upper 95th percentile of the predicted $C_{B}$. The ratio of the 95th percentile $C_{B}$ to the median $C_{B}$ indicates the relative contribution to uncertainty, with larger ratios indicating greater uncertainty. The median ratio value for $t_{1/2}$ prediction uncertainty (1.17) was close to that for DE prediction uncertainty (1.28). The ratio value with both uncertainties included (2.17) was close to the sum of the $t_{1/2}$ and DE ratios, which indicated that the predictions of $t_{1/2}$ and DE contributed approximately the same degree of uncertainty to the prediction model.
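The 95th-percentile-to-median ratio used in this analysis can be illustrated with a small Monte Carlo sketch. Several assumptions apply: the predictor below is a hypothetical one-compartment steady-state formula standing in for the RF model, the input uncertainties are modeled as lognormal, and the spreads (`sigma_de`, `sigma_t`) and nominal inputs are arbitrary values, not the ones used in the study.

```python
import math
import random
import statistics


def steady_state_cb(de, t_half, vd):
    """Hypothetical stand-in predictor: C = DE * t_half / (ln 2 * Vd)."""
    return de * t_half / (math.log(2) * vd)


def p95_to_median(samples):
    """Ratio of the 95th percentile to the median of the samples."""
    p95 = statistics.quantiles(samples, n=100)[94]
    return p95 / statistics.median(samples)


def simulate(sigma_de, sigma_t, n=20000, seed=1):
    """Perturb DE and t_half with lognormal noise; return the p95/median ratio."""
    rng = random.Random(seed)
    de0, t0, vd0 = 1.0, 10.0, 1.0  # arbitrary nominal inputs
    samples = [
        steady_state_cb(
            de0 * rng.lognormvariate(0.0, sigma_de),
            t0 * rng.lognormvariate(0.0, sigma_t),
            vd0,
        )
        for _ in range(n)
    ]
    return p95_to_median(samples)


ratio_de = simulate(sigma_de=0.3, sigma_t=0.0)    # DE uncertainty alone
ratio_t = simulate(sigma_de=0.0, sigma_t=0.2)     # t_half uncertainty alone
ratio_both = simulate(sigma_de=0.3, sigma_t=0.2)  # both uncertainties
print(ratio_de, ratio_t, ratio_both)
```

Under these assumptions the combined ratio exceeds either single-source ratio, which is the qualitative pattern the three simulations are designed to expose.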

Chemical Prioritization Using the U.S. EPA’s ToxCast Database

We evaluated the bioactivity potential of each chemical across 12 in vitro assays from ToxCast using $C_{B}/AC_{50}$ ratios, calculated by dividing the predicted $C_{B}$ by the ToxCast $AC_{50}$. The 12 ToxCast in vitro HT screening assays,¹⁸ including the targets of $ER α$, AR, $PPAR γ$, and TR, were chosen as case studies. The 12 assays comprised two AR agonist, two AR antagonist, one $ER α$ agonist, one $ER α$ antagonist, two $PPAR γ$ agonist, two $PPAR γ$ antagonist, one TR agonist, and one TR antagonist assays (Excel Table S5). The $C_{B}/AC_{50}$ ratios across all 12 assays are listed in Excel Table S6, and the distribution (BEQ%) of each target assay result is shown in Excel Table S7. We found that each end point had a clearly different chemical toxicity prioritization and its own dominant contributor(s). For different assays of the same receptor toxicity end point, the results varied widely because the assays tested distinct sets of compounds. Notably, drugs or endogenous chemicals were the dominant contributors, with the top $C_{B}/AC_{50}$ ratios, for most assays. For example, salidroside and N-vinyl-2-pyrrolidone were the most dominant contributors for the Tox21_AR_LUC_MDAKB2_Agonist and TOX21_AR_LUC_MDAKB2_Antagonist_0.5nM_R1881 assays, respectively, with the high $C_{B}/AC_{50}$ ratios of 2,288, and BEQ% of 24.0% and 37.0%, respectively (Excel Table S7; Figure S2). Salidroside is a major component of Rhodiola rosea, which has been used in traditional Chinese medicine,³⁹ and N-vinyl-2-pyrrolidone is used for treatment of infectious conjunctivitis.⁴⁰ For the TOX21_PPARg_BLA_antagonist_viability assay, the top contributor was ribavirin ($C_{B}/AC_{50}$: 327, BEQ%: 79.1%), followed by ramipril (36.6, 8.84%) and diphenoxylate hydrochloride (31.0, 7.49%). Drugs such as acipimox, 5-methyl-1-phenyl-2(1H)-pyridone, piconol, triacetin, and hexylcaine hydrochloride were the most abundant chemicals, accounting for 9.17%, 19.9%, 11.5%, 10.7%, and 4.05% in the TOX21_AR_BLA_Agonist_ratio, TOX21_ERa_BLA_Agonist_ratio, TOX21_ERa_BLA_Antagonist_ratio, TOX21_PPARg_BLA_Agonist_ch2, and TOX21_TR_LUC_GH3_Agonist assays, respectively.
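The reported ratio and BEQ% pairs can be cross-checked with a few lines of Python. The sketch assumes that BEQ% is each chemical's share of the summed ratios within one assay; this reading is an inference from the reported numbers rather than a definition given in this section, and the helper name `implied_assay_total` is hypothetical. If the assumption holds, every chemical in the same assay should imply roughly the same assay-wide total.

```python
# (ratio of C_B to AC50, BEQ% in percent) as reported in the text for the
# TOX21_PPARg_BLA_antagonist_viability assay.
reported = {
    "ribavirin": (327.0, 79.1),
    "ramipril": (36.6, 8.84),
    "diphenoxylate hydrochloride": (31.0, 7.49),
}


def implied_assay_total(ratio, beq_percent):
    """Summed ratio over the assay implied by one chemical's ratio and BEQ%."""
    return ratio / (beq_percent / 100.0)


implied_totals = {
    name: implied_assay_total(ratio, beq) for name, (ratio, beq) in reported.items()
}
for name, total in implied_totals.items():
    print(f"{name}: implied assay total = {total:.1f}")
```

The three implied totals agree to within rounding of the reported values, which supports the share-of-total reading for this assay.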

Because the predicted $C_{B}$ in this study was based only on the internal $C_{B}$ generated by external exposure, we further excluded endogenous chemicals and drugs and performed the analysis on the remaining 4,893 chemicals. After this exclusion, methyl formate, di(2-methoxyethyl) phthalate, propylammonium nitrate, 2,3-butanedione, and (3,5-dimethyl-1H-pyrazol-1-yl)methanol became the most dominant chemicals in the TOX21_AR_BLA_Antagonist_ratio, TOX21_AR_LUC_MDAKB2_Antagonist_0.5nM_R1881, TOX21_PPARg_BLA_Antagonist_ch1, TOX21_PPARg_BLA_antagonist_viability, and TOX21_TR_LUC_GH3_Antagonist assays, with BEQ% of 22.1%, 23.8%, 51.7%, 61.4%, and 46.3%, respectively (Excel Table S8). 2-Acetylpyrrole, thiamine thiazole, and aminopyridine a showed the highest $C_{B}/AC_{50}$ ratios of 549, 494, and 401, respectively (Excel Table S6), and were the dominant contributors, with BEQ% of 10.3%, 9.31%, and 7.56%, for the TOX21_AR_BLA_Agonist_ratio assay (Excel Table S8), suggesting that they had a relatively high potential risk of androgen disruption. In the Tox21_AR_LUC_MDAKB2_Agonist assay, the largest contributors were 3,3′-(ethylenedioxy)dipropiononitrile ($C_{B}/AC_{50}$ ratio: 275, BEQ%: 10.3%), 1,3-dichloropropanone (226, 8.42%), and MCPB (214, 7.97%). The dominant contributor in the TOX21_AR_BLA_Antagonist_ratio assay was methyl formate (1218, 22.1%), followed by 1-bromoheptadecane (644, 11.7%) and 1,2-dimethylhydrazine dihydrochloride (488, 8.83%). In the TOX21_AR_LUC_MDAKB2_Antagonist_0.5nM_R1881 assay, di(2-methoxyethyl) phthalate (7.71, 23.8%), FD&C Yellow 5 (7.50, 23.1%), and acetone (7.43, 22.9%) contributed the most.

In the TOX21_ERa_BLA_Agonist_ratio assay, the major contributions came from 1,1':4',1''-terphenyl (538, 22.8%), sodium nicotinate (379, 16.0%), and sodium 2,5-dimethylbenzenesulfonate (361, 15.3%). In the TOX21_ERa_BLA_Antagonist_ratio assay, 2-bromo-1-ethanol (550, 13.6%), benzyl nicotinate (379, 9.32%), and ethyl bromoacetate (224, 5.51%) were the dominant contributors. The dominant contributors were 1-bromopentadecane (328, 24.1%), beta-nitrostyrene (221, 16.2%), and (6Z)-non-6-en-1-ol (216, 15.9%) for the Tox21_PPARg_BLA_Agonist_ratio assay; triacetin (1132, 18.2%), succinic anhydride (841, 17.2%), 2-(2-aminoethoxy)ethanol (680, 13.9%), and 2-pyrrolidinone (510, 10.5%) for the TOX21_PPARg_BLA_Agonist_ch2 assay; propylammonium nitrate (599, 51.7%), citronellol (246, 21.2%), geranyl formate (188, 16.3%), and isopentyl benzoate (39.5, 3.41%) for the TOX21_PPARg_BLA_Antagonist_ch1 assay; and 2,3-butanedione (7.47, 61.4%), 3-acetyldihydro-2(3H)-furanone (2.61, 21.4%), and 3-mercaptopropyltrimethoxysilane (0.75, 6.15%) for the TOX21_PPARg_BLA_antagonist_viability assay.

The top contributors in the TOX21_TR_LUC_GH3_Agonist assay were (4-methoxyphenyl)methanol (549, 6.20%), 2-butene-1,4-diol (528, 5.96%), 2,3-butanedione (523, 5.90%), and ethyl phthalyl ethyl glycolate (487, 5.50%). In the TOX21_TR_LUC_GH3_Antagonist assay, the dominant contributors were (3,5-dimethyl-1H-pyrazol-1-yl)methanol (638, 46.5%), phenethyl anthranilate (183, 13.4%), dimethyl isophthalate (89.5, 10.3%), and sodium 2-mercaptobenzothiolate (73.2, 6.91%).

We further retrieved the applications of the top 25 chemicals of each assay from the NCBI PubMed databases (https://pubmed.ncbi.nlm.nih.gov) (Excel Table S8), and we recalculated their BEQ% values after excluding drugs and endogenous substances: Food additives such as 2,3-butanedione, methyl formate, and FD&C Yellow 5 are used as flavoring agents or colorants, with the BEQ% values of 61.4%, 22.1%, and 23.1% in TOX21_PPARg_BLA_antagonist_viability, TOX21_AR_BLA_Antagonist_ratio, and TOX21_AR_LUC_MDAKB2_Antagonist_0.5nM_R1881 assay, respectively. Plasticizers such as dimethyl isophthalate (6.50%), diisobutyl phthalate (4.51%), and diethyl phthalate (4.16%), which are defined as U.S. Food and Drug Administration indirect additives used in food-contact substances, also showed significant activity after excluding drugs and endogenous substances in TOX21_TR_LUC_GH3_Antagonist, TOX21_PPARg_BLA_Agonist_ch2, and TOX21_AR_BLA_Antagonist_ratio assays, respectively. Chemicals such as propylammonium nitrate (51.7%) and (3,5-dimethyl-1H-pyrazol-1-yl)methanol (46.3%), used for solvents and cosmetic products, were the top contributors in TOX21_PPARg_BLA_Antagonist_ch1 and TOX21_TR_LUC_GH3_Antagonist assay, respectively.
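The effect of excluding drugs and endogenous substances before recalculating BEQ% can be sketched as a simple renormalization. The sketch assumes that BEQ% is a chemical's share of the summed $C_{B}/AC_{50}$ ratios in an assay; the function name `beq_percent`, the chemical names, and all ratio values are hypothetical placeholders, not data from the study.

```python
def beq_percent(ratios, excluded=frozenset()):
    """BEQ% per chemical after dropping excluded names and renormalizing."""
    kept = {name: r for name, r in ratios.items() if name not in excluded}
    total = sum(kept.values())
    return {name: 100.0 * r / total for name, r in kept.items()}


# Hypothetical ratios for a single assay; the names are placeholders.
ratios = {
    "drug_A": 300.0,
    "endogenous_B": 100.0,
    "additive_C": 60.0,
    "plasticizer_D": 40.0,
}

before = beq_percent(ratios)
after = beq_percent(ratios, excluded={"drug_A", "endogenous_B"})
print(before)
print(after)
```

In this toy case the share of `additive_C` rises from 12% to 60% once the two excluded entries are removed, which mirrors how food additives can become dominant contributors after the exclusion step.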

Discussion

The framework described in this study has several implications for HT chemical screening and prioritization. We used an HT machine learning algorithm for $C_{B}$ predictions with key parameters, including DE, $δ_{i j}$, $V_{d}$, and $t_{1/2}$. This HT HExpPredict approach can rapidly relate environmental chemical exposures to in vitro bioactivity, helping drive priorities based on risk potential.

Based on a direct comparison of RMSE, MAE, MAPE, and $R^{2}$ between models, we concluded that the RF model performed better than the other models. The ML model developed in this study was based on the physical and chemical properties and exposure of the chemicals. The input data of the ML models included only the key parameters DE, $δ_{i j}$, $V_{d}$, and $t_{1/2}$, and the ML models combined these variables to make predictions without other parameters, such as bioavailability and plasma protein binding data. We noted that only 10.3% and 15.9% of our evaluation chemicals were predicted to be over the 10-fold boundary for the RF training and testing sets, respectively, showing good predictive ability. To build this ML model, we applied several well-performing predictive models, including the IFS approach and SEEM3. Although these models have been evaluated and tested, they can continue to be improved as more data are generated, which could in turn improve our present ML model in the future.

Uncertainty in predicting $C_{B}$ can be accounted for in risk prioritization if the degree of uncertainty can be predicted for each chemical. According to the results of the three MC simulations, the prediction uncertainties of $t_{1/2}$ and DE contributed approximately equally to the uncertainty of the ML prediction model. However, the uncertainty of the ML model was underestimated because the $V_{d}$ uncertainty was not included. Although $t_{1/2}$ and DE contributed approximately the same degree of uncertainty overall, some chemicals outside the model’s applicability domain, such as chemicals that contain silicon, had large standard errors in the prediction, which led to high uncertainties for the $t_{1/2}$.

The $C_{B}$ of phthalates such as di(2-methoxyethyl) phthalate, dipentyl phthalate, dipropyl phthalate, dihexyl phthalate, and bis(2-butoxyethyl) phthalate were predicted to be $7.86 \times 10^{- 3}$ ( $2.75 \times 10^{- 3} - 1.99 \times 10^{- 2}$ ), $3.08 \times 10^{- 3}$ ( $7.31 \times 10^{- 4} - 5.55 \times 10^{- 3}$ ), $1.93 \times 10^{- 3}$ ( $4.71 \times 10^{- 4} - 3.01 \times 10^{- 3}$ ), $1.50 \times 10^{- 3}$ ( $2.95 \times 10^{- 4} - 3.01 \times 10^{- 3}$ ), and $9.07 \times 10^{- 5} (2.97 \times 10^{- 5} - 3.18 \times 10^{- 4}) μ M$, respectively. The phthalate metabolite monobutyl phthalate was predicted to have a $C_{B}$ of $2.12 \times 10^{- 3} (6.64 \times 10^{- 4} - 3.82 \times 10^{- 3}) μ M$ [i.e., 0.47 (0.15–0.85) ng/mL], which was consistent with the concentration of 0.5 ng/mL observed in a previous study.⁴¹ Because exposure to phthalates is usually characterized by monitoring the concentrations of their metabolites in urine, our model enables HT prediction of the $C_{B}$ of these easily metabolized substances, which is convenient for subsequent HT prioritization of their toxicity and risk. The $C_{B} s$ of bisphenol A (BPA) alternatives, such as bisphenol AF (BPAF), were predicted to be 0.020 (0.019–0.021) ng/mL, which was similar to the GM concentration of 0.01 ng/mL determined in a previous study.⁴² Perfluorinated compounds such as perfluorononanoic acid (PFNA) and perfluoroundecanoic acid (PFUnA) were predicted to have $C_{B} s$ of 0.21 (0.19–0.52) and 0.17 (0.13–0.27) ng/mL, respectively, which were within the GM concentration ranges of 0.11–1.88 and 0.07–1.38 ng/mL, respectively, as observed in the general populations in 13 Chinese cities.⁴³ However, for perfluorohexanoic acid (PFHxA), the predicted $C_{B}$ value (0.24; 95% CI: 0.21–0.26 ng/mL) was slightly higher than the GM concentration range of 0.02–0.21 ng/mL in the general populations of the 13 Chinese cities.⁴³ The estimated concentrations can be very useful for exposure or toxicity prioritization or even for assessing the mixture effect of the blood exposome.^31,44,45 In this study, the potential health effects and the causal compounds of ToxCast were summarized, revealing several key biomarker assays. A total of 12 ToxCast assays were used to assess the health effects of 4,893 chemicals, which showed different risk-based prioritization patterns. In addition to the top risk substances listed in the “Results” section, we found it interesting that some typical AR agonists, such as 2,3,7,8-tetrachlorodibenzo-p-dioxin, with a $C_{B}/AC_{50}$ ratio of 0.12 in the Tox21_AR_LUC_MDAKB2_Agonist assay, also showed relatively high AR agonist activity (ranked 97th of 4,893 chemicals) owing to an extremely low $AC_{50}$ ( $6.45 \times 10^{- 5} μ M$ ). However, because of its low $C_{B}$ ( $7.91 \times 10^{- 6} μ M$ ), its BEQ% was only 0.0045%. Nonylparaben showed relatively strong AR antagonist activity, with a $C_{B}/AC_{50}$ ratio of 2.49 and BEQ% of 0.05% in the TOX21_AR_BLA_Agonist_ratio assay, and diethyl phthalate showed very strong AR antagonist activity, with a $C_{B}/AC_{50}$ ratio of 230 and BEQ% of 3.59% in the TOX21_AR_BLA_Antagonist_ratio assay.
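The conversion between the micromolar and ng/mL values quoted above for monobutyl phthalate can be reproduced with a one-line formula, because µmol/L multiplied by g/mol gives µg/L, which equals ng/mL. The molar mass of 222.24 g/mol (C12H14O4) is supplied here as an input and does not appear in the text; the function name is hypothetical.

```python
MBP_MOLAR_MASS = 222.24  # g/mol for monobutyl phthalate (C12H14O4); assumed input


def micromolar_to_ng_per_ml(conc_um, molar_mass):
    """Convert µmol/L to ng/mL: µmol/L x g/mol = µg/L = ng/mL."""
    return conc_um * molar_mass


# Predicted median and range in µM, as quoted in the text.
median, low, high = (
    micromolar_to_ng_per_ml(c, MBP_MOLAR_MASS) for c in (2.12e-3, 6.64e-4, 3.82e-3)
)
print(round(median, 2), round(low, 2), round(high, 2))
```

Rounded to two decimals, the converted values match the bracketed ng/mL figures given for monobutyl phthalate.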

Due to the very low $AC_{50}$ values, some pesticides such as siduron and tribufos were observed to have relatively strong ER agonist activity in the TOX21_ERa_BLA_Agonist_ratio assay, with $C_{B}/AC_{50}$ ratios of 15.1 and 5.92 and BEQ% of 0.49% and 0.19%, respectively, and benzyl nicotinate and diallate were found to have strong antagonist activity in the TOX21_ERa_BLA_Antagonist_ratio assay, with $C_{B}/AC_{50}$ ratios of 379 and 19.5 and BEQ% of 9.26% and 0.48%, respectively. In the ER agonist and antagonist assays, the contributions of phthalates, BPA, and BPA alternatives were negligible because of their relatively higher $AC_{50}$ values. For example, the $C_{B}/AC_{50}$ ratios of BPA in the TOX21_ERa_BLA_Agonist_ratio assay and di(2-ethylhexyl) phthalate (DEHP) in the TOX21_ERa_BLA_Antagonist_ratio assay were only $4.22 \times 10^{- 3}$ and $5.07 \times 10^{- 4}$, respectively, because of their higher $AC_{50}$ values of 0.96 and $6.46\ μM$, respectively, although they had relatively high $C_{B}$ values of $4.06 \times 10^{- 3}$ and $3.27 \times 10^{- 3} μ M$, respectively. For the organophosphate compounds, the $C_{B}/AC_{50}$ ratios of dibenzyl phosphate and triisobutyl phosphate were 202 and 13.6, respectively, in the TOX21_TR_LUC_GH3_Agonist assay, showing strong TR agonist activity. Triisobutyl phosphate also showed strong PPAR agonist activity, with a $C_{B}/AC_{50}$ ratio of 41.2 in the Tox21_PPARg_BLA_Agonist_ratio assay.
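As a worked check, the ratios quoted in this discussion for 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD), BPA, and DEHP can be recomputed by dividing each predicted blood concentration by the corresponding ToxCast activity concentration at 50%. The short sketch below uses only the concentrations quoted in the text; the dictionary layout and variable names are illustrative.

```python
# (predicted C_B in µM, AC50 in µM) as quoted in the text.
quoted = {
    "TCDD": (7.91e-6, 6.45e-5),
    "BPA": (4.06e-3, 0.96),
    "DEHP": (3.27e-3, 6.46),
}

# Ratio of predicted blood concentration to AC50 for each chemical.
ratios = {name: cb / ac50 for name, (cb, ac50) in quoted.items()}
for name, ratio in ratios.items():
    print(f"{name}: C_B/AC50 = {ratio:.3g}")
```

The recomputed values agree with the reported ratios (0.12, about 4.2 × 10⁻³, and about 5.1 × 10⁻⁴) to within rounding, and they show why a chemical with a comparatively high blood concentration can still rank low when its AC50 is high.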

An interesting finding was that when drugs and endogenous substances were excluded, food additives were the major contributors of BEQ% in the majority of assays (Figure 4; Figure S3) because of their high exposure predicted by SEEM3. Food additives such as 2,3-butanedione, methyl formate, FD&C Yellow 5, and succinic anhydride showed a high potential receptor activity in AR or $PPAR γ$ (Figure 4). However, these substances are not typical pollutants, and there are few biomonitoring data for them in humans, raising concerns about their potential health risks. U.S. FDA indirect additives used in food-packaging materials, such as dimethyl isophthalate, diisobutyl phthalate, and diethyl phthalate, also showed modest potential receptor activity in TR, $PPAR γ$, and AR. The health risks of food additives and indirect food additives should be studied further. It should be noted that, besides nuclear receptors, adverse outcome pathways (AOPs) (https://aopkb.oecd.org) with more toxicological end points could be further considered in future risk prioritization.

Figure 4. Toxicity contributions (percentage) of ToxCast chemicals (excluding the drugs and endogenous compounds) in assays of AR and $PPAR γ$ as examples (referring to the data in Excel Table S8). The figure is a set of four pie charts. Top left, TOX21_AR_BLA_antagonist_ratio: 3.3% diethyl maleate (food additive); 4.2% diethyl phthalate (food indirect additive); 4.3% 2,3-dimethylimidazolium hexafluorophosphate; 4.5% ethyl hexadecanoate (food additive); 4.8% 3-acetyl-2,5-dimethylfuran (food additive); 6.1% bis(1-methylethyl) methylphosphonate; 22.1% methyl formate (food additive); 11.7% 1-bromoheptadecane; 8.8% 1,2-dimethylhydrazine dihydrochloride; 8% N,N-diisopropyl ethanolamine. Bottom left, TOX21_PPARg_BLA_antagonist_viability: 3.4% 2-pentyl-2-cyclopenten-1-one; 6.2% 3-mercaptopropyltrimethoxysilane; 21.5% 3-acetyldihydro-2(3H)-furanone; 61.5% 2,3-butanedione (food additive). Top right, TOX21_AR_LUC_MDAKB2_antagonist: 6.4% diallyl disulfide (food additive); 19.1% acetic acid, mercapto-, ethyl ester; 22.9% acetone; 23.8% di(2-methoxyethyl) phthalate; 23.2% FD&C Yellow 5 (food additive). Bottom right, TOX21_PPARg_BLA_agonist_ch2: 3.2% 2-butenoic acid (food additive); 4% R-(+)-pulegone (food additive); 4.1% 2-phenylmercaptoethanol; 4.1% (6Z)-non-6-en-1-ol (food additive); 4.5% diisobutyl phthalate (food indirect additive); 17.2% succinic anhydride (food additive); 13.9% 2-(2-aminoethoxy)ethanol; 10.5% 2-pyrrolidinone; 5% (Z)-5-octen-1-ol (food additive); 4.9% 1,2-benzenedicarbonitrile (food additive). Note: AR, androgen receptor; FA, food additive; FIA, food indirect additive; $PPAR γ$, peroxisome proliferator–activated receptor.

This study has several limitations. First, we could not predict the $t_{1/2}$ of chemicals with a metal atom or a molecular weight over 1,000 using the IFS approach. In addition, some chemicals, such as silicon-containing chemicals, had extreme properties that were outside the model’s applicability domain. For these chemicals, the standard errors were larger than the predicted means. To our knowledge, no computational model can handle silicon-containing molecules at this point. Second, the ExpoCast database did not cover all the ToxCast compounds, and ExpoCast represents only the historical exposure of typical Americans. Because the amounts of chemicals used vary from year to year, the variation in chemical exposure and the year of blood collection have a certain impact on the predicted results. Our predictions should be periodically updated to incorporate new estimated exposures and measured $C_{B} s$ of chemicals in the future. Third, models based on subsets of measured data for chemical groups were not considered in the prediction model because of limited measured data. In the future, more accurate prediction models based on different chemical subsets could be built if sufficient training data can be collected. In addition, we regarded the concentrations in blood, plasma, and serum as $C_{B} s$ and did not consider parameters such as plasma protein binding. Nonetheless, the predicted $C_{B} s$ in this study can still contribute to the concentration ranking of substances in human blood and the prioritization of potential biological effects. An accurate PBPK model could be combined with the $C_{B}$ prediction model of this study in the future to predict concentrations in other organs, and animal experiments should be conducted to validate the model. Fourth, although the $AC_{50}$ value has become a standard way to compare potencies of chemicals in in vitro pharmacology and toxicology studies, it may not be the best metric for prioritization or estimating toxicological risk based on well-designed in vitro tests. Fifth, the mode of toxic action (MOA), which was not considered in our prediction model, is related to the $C_{B}$ and metabolism of the chemical. The MOA could be considered in future work to refine the model. Last, the present prioritization results based on ToxCast data have limitations in predicting the toxicities of the chemicals because of the limited assays adopted by the ToxCast exercise and because different chemicals were tested in different assays. Therefore, it is still impossible to systematically evaluate the contribution of one chemical across different toxicological end points.

In conclusion, we curated the $C_{B} s$ of 216 compounds and developed ML algorithms for $C_{B}$ prediction, and our work improved HT risk prioritization for large numbers of environmental chemicals. Many of the high-risk chemicals in some assays were also unexpected. This study has implications for current efforts to overhaul existing chemical testing methods to address the disparity in the number of tested and untested chemicals. By using the HT method, chemicals could be screened in a cost-effective and efficient manner, which provides a better basis for informed decisions on chemical testing priorities and regulatory attention.

Acknowledgments

This work was funded by the National Key R&D Program (No. 2022YFC3702600 and 2022YFC3702601), the Singapore Ministry of Education Academic Research Fund Tier 1 (04MNP000567C120), and the Startup Grant of Fudan University (No. JIH 1829010Y).

In addition, to improve the applicability of our model, the R scripts are also provided at https://github.com/FangLabNTU/HExpPredict.

References

1. Shin H-M, Ernstoff A, Arnot JA, Wetmore BA, Csiszar SA, Fantke P, et al. 2015. Risk-based high-throughput chemical screening and prioritization using exposure models and in vitro bioactivity assays. Environ Sci Technol 49(11):6760–6771, 10.1021/acs.est.5b00498.
2. Li L, Sangion A, Wania F, Armitage JM, Toose L, Hughes L, et al. 2021. Development and evaluation of a holistic and mechanistic modeling framework for chemical emissions, fate, exposure, and risk. Environ Health Perspect 129(12):127006, 10.1289/EHP9372.
3. Ring CL, Arnot JA, Bennett DH, Egeghy PP, Fantke P, Huang L, et al. 2019. Consensus modeling of median chemical intake for the U.S. population based on predictions of exposure pathways. Environ Sci Technol 53(2):719–732, 10.1021/acs.est.8b04056.
4. Wambaugh JF, Bare JC, Carignan CC, Dionisio KL, Dodson RE, Jolliet O, et al. 2019. New approach methodologies for exposure science. Curr Opin Toxicol 15:76–92, 10.1016/j.cotox.2019.07.001.
5. Dix DJ, Houck KA, Martin MT, Richard AM, Setzer RW, Kavlock RJ. 2007. The ToxCast program for prioritizing toxicity testing of environmental chemicals. Toxicol Sci 95(1):5–12, 10.1093/toxsci/kfl103.
6. Honda GS, Pearce RG, Pham LL, Setzer RW, Wetmore BA, Sipes NS, et al. 2019. Using the concordance of in vitro and in vivo data to evaluate extrapolation assumptions. PLoS One 14(5):e0217564, 10.1371/journal.pone.0217564.
7. Blaauboer BJ. 2010. Biokinetic modeling and in vitro-in vivo extrapolations. J Toxicol Environ Health B Crit Rev 13(2–4):242–252, 10.1080/10937404.2010.483940.
8. Wambaugh JF, Wetmore BA, Pearce R, Strope C, Goldsmith R, Sluka JP, et al. 2015. Toxicokinetic triage for environmental chemicals. Toxicol Sci 147(1):55–67, 10.1093/toxsci/kfv118.
9. David A, Chaker J, Price EJ, Bessonneau V, Chetwynd AJ, Vitale CM, et al. 2021. Towards a comprehensive characterisation of the human internal chemical exposome: challenges and perspectives. Environ Int 156:106630, 10.1016/j.envint.2021.106630.
10. Rappaport SM, Barupal DK, Wishart D, Vineis P, Scalbert A. 2014. The blood exposome and its role in discovering causes of disease. Environ Health Perspect 122(8):769–774, 10.1289/ehp.1308015.
11. Jia S, Xu T, Huan T, Chong M, Liu M, Fang W, et al. 2019. Chemical isotope labeling exposome (CIL-EXPOSOME): one high-throughput platform for human urinary global exposome characterization. Environ Sci Technol 53(9):5445–5453, 10.1021/acs.est.9b00285.
12. Zhao F, Kang Q, Zhang X, Liu J, Hu J. 2019. Urinary biomarkers for assessment of human exposure to monomeric aryl phosphate flame retardants. Environ Int 124:259–264, 10.1016/j.envint.2019.01.022.
13. Sipes NS, Wambaugh JF, Pearce R, Auerbach SS, Wetmore BA, Hsieh J-H, et al. 2017. An intuitive approach for predicting potential human health risk with the Tox21 10k library. Environ Sci Technol 51(18):10786–10796, 10.1021/acs.est.7b00650.
14. Armitage JM, Hughes L, Sangion A, Arnot JA. 2021. Development and intercomparison of single and multicompartment physiologically-based toxicokinetic models: implications for model selection and tiered modeling frameworks. Environ Int 154:106557, 10.1016/j.envint.2021.106557.
15. Wetmore BA, Wambaugh JF, Allen B, Ferguson SS, Sochaski MA, Setzer RW, et al. 2015. Incorporating high-throughput exposure predictions with dosimetry-adjusted in vitro bioactivity to inform chemical toxicity testing. Toxicol Sci 148(1):121–136, 10.1093/toxsci/kfv171.
16. Wetmore BA, Wambaugh JF, Ferguson SS, Sochaski MA, Rotroff DM, Freeman K, et al. 2012. Integration of dosimetry, exposure, and high-throughput screening data in chemical toxicity assessment. Toxicol Sci 125(1):157–174, 10.1093/toxsci/kfr254.
17. Wadhwa R, Cascella M. Steady State Concentration. Treasure Island, FL: StatPearls Publishing. https://www.ncbi.nlm.nih.gov/books/NBK553132/ [accessed 12 December 2022].
18. U.S. EPA (U.S. Environmental Protection Agency). Previously Published ToxCast Data. Updated data released October 2018. https://epa.figshare.com/articles/dataset/Previously_Published_ToxCast_Data/6062551/3 [accessed 4 March 2023].
19. Williams AJ, Grulke CM, Edwards J, McEachran AD, Mansouri K, Baker NC, et al. 2017. The CompTox Chemistry Dashboard: a community data resource for environmental chemistry. J Cheminform 9(1):61, 10.1186/s13321-017-0247-6.
20. Wambaugh JF, Wetmore BA, Ring CL, Nicolas CI, Pearce RG, Honda GS, et al. 2019. Assessing toxicokinetic uncertainty and variability in risk prioritization. Toxicol Sci 172(2):235–251, 10.1093/toxsci/kfz205.
21. Zhao F, Li L, Chen Y, Huang Y, Keerthisinghe TP, Chow A, et al. 2021. Risk-Based chemical ranking and generating a prioritized human exposome database. Environ Health Perspect 129(4):47014, 10.1289/EHP7722.
22. Arnot JA, Brown TN, Wania F. 2014. Estimating screening-level organic chemical half-lives in humans. Environ Sci Technol 48(1):723–730, 10.1021/es4029414.
23. Li L, Westgate JN, Hughes L, Zhang X, Givehchi B, Toose L, et al. 2018. A model for risk-based screening and prioritization of human exposure to chemicals from near-field sources. Environ Sci Technol 52(24):14235–14244, 10.1021/acs.est.8b04059.
24. Ulrich N, Endo S, Brown TN, Watanabe N, Bronner G, Abraham MH, et al. 2017. UFZ-LSER database v 3.2.1. Leipzig, Germany: Helmholtz Centre for Environmental Research-UFZ.
25. U.S. EPA. 2022. Toxicity Estimation Software Tool (TEST). https://www.epa.gov/chemical-research/toxicity-estimation-software-tool-test [accessed 4 March 2023].
26. U.S. CDC (U.S. Centers for Disease Control and Prevention). 2022. National Report on Human Exposure to Environmental Chemicals. https://www.cdc.gov/exposurereport/ [accessed 8 December 2022].
27. Office of Environmental Health Hazard Assessment, California Department of Public Health. 2020. Explore Results: Biomonitoring California’s Results Database. https://biomonitoring.ca.gov/results/explore [accessed 5 December 2021].
28. Neveu V, Moussy A, Rouaix H, Wedekind R, Pon A, Knox C, et al. 2017. Exposome-Explorer: a manually-curated database on biomarkers of exposure to dietary and environmental factors. Nucleic Acids Res 45(D1):D979–D984, 10.1093/nar/gkw980.
29. Dong T, Zhang Y, Jia S, Shang H, Fang W, Chen D, et al. 2019. Human indoor exposome of chemicals in dust and risk prioritization using EPA’s ToxCast database. Environ Sci Technol 53(12):7045–7054, 10.1021/acs.est.9b00280.
30. Jia S, Sankaran G, Wang B, Shang H, Tan ST, Yap HM, et al. 2019. Exposure and risk assessment of volatile organic compounds and airborne phthalates in Singapore’s child care centers. Chemosphere 224:85–92, 10.1016/j.chemosphere.2019.02.120.
31. Zhang Y, Liu M, Peng B, Jia S, Koh D, Wang Y, et al. 2020. Impact of mixture effects between emerging organic contaminants on cytotoxicity: a systems biological understanding of synergism between tris(1,3-dichloro-2-propyl)phosphate and triphenyl phosphate. Environ Sci Technol 54(17):10722–10734, 10.1021/acs.est.0c02188.
32.U.S. EPA. 2022. CompTox Chemicals Dashboard (Version 2.1) Assay Endpoints List. https://comptox.epa.gov/dashboard/assay-endpoints?filtered= [accessed 8 December 2022].
33.Glynn A, Aune M, Darnerud PO, Cnattingius S, Bjerselius R, Becker W, et al. 2007. Determinants of serum concentrations of organochlorine compounds in Swedish pregnant women: a cross-sectional study. Environ Health 6:2, PMID: , 10.1186/1476-069X-6-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
34.Koukoulakis KG, Kanellopoulos PG, Chrysochou E, Koukoulas V, Minaidis M, Maropoulos G, et al. 2020. Leukemia and PAHs levels in human blood serum: preliminary results from an adult cohort in Greece. Atmos Pollut Res 11(9):1552–1565, 10.1016/j.apr.2020.06.018. [DOI] [Google Scholar]
35.Mathur H, Agarwal H, Johnson S, Saikia N. 2005. Analysis of Pesticide Residues in Blood Samples from Villages of Punjab. CSE/PML/PR-21, India, 1–15.
36.Mochalski P, King J, Klieber M, Unterkofler K, Hinterhuber H, Baumann M, et al. 2013. Blood and breath levels of selected volatile organic compounds in healthy volunteers. Analyst 138(7):2134–2145, PMID: , 10.1039/c3an36756h. [DOI] [PMC free article] [PubMed] [Google Scholar]
37.Zhao F, Wan Y, Zhao H, Hu W, Mu D, Webster TF, et al. 2016. Levels of blood organophosphorus flame retardants and association with changes in human sphingolipid homeostasis. Environ Sci Technol 50(16):8896–8903, PMID: , 10.1021/acs.est.6b02474. [DOI] [PubMed] [Google Scholar]
38.U.S. EPA. 2022. CompTox Chemicals Dashboard (Version 2.1.1) Batch Search. https://comptox.epa.gov/dashboard/batch-search [accessed 4 March 2023].
39.Zhang X, Xie L, Long J, Xie Q, Zheng Y, Liu K, et al. 2021. Salidroside: a review of its recent advances in synthetic pathways and pharmacological properties. Chem Biol Interact 339:109268, PMID: , 10.1016/j.cbi.2020.109268. [DOI] [PubMed] [Google Scholar]
40.National Library of Medicine, PubChem. PubChem Compound Summary for CID 6917, N-Vinyl-2-pyrrolidone. https://pubchem.ncbi.nlm.nih.gov/compound/N-Vinyl-2-pyrrolidone [accessed 22 January 2022].
41.Wang Y, Zhu H, Kannan K. 2019. A review of biomonitoring of phthalate exposures. Toxics 7(2):21, PMID: , 10.3390/toxics7020021. [DOI] [PMC free article] [PubMed] [Google Scholar]
42.Zhang B, He Y, Zhu H, Huang X, Bai X, Kannan K, et al. 2020. Concentrations of bisphenol A and its alternatives in paired maternal-fetal urine, serum and amniotic fluid from an e-waste dismantling area in China. Environ Int 136:105407, PMID: , 10.1016/j.envint.2019.105407. [DOI] [PubMed] [Google Scholar]
43.Zhang S, Kang Q, Peng H, Ding M, Zhao F, Zhou Y, et al. 2019. Relationship between perfluorooctanoate and perfluorooctane sulfonate blood concentrations in the general population and routine drinking water exposure. Environ Int 126:54–60, PMID: , 10.1016/j.envint.2019.02.009. [DOI] [PubMed] [Google Scholar]
44.Liu M, Jia S, Dong T, Zhao F, Xu T, Yang Q, et al. 2020. Metabolomic and transcriptomic analysis of MCF-7 cells exposed to 23 chemicals at human-relevant levels: estimation of individual chemical contribution to effects. Environ Health Perspect 128(12):127008, PMID: , 10.1289/EHP6641. [DOI] [PMC free article] [PubMed] [Google Scholar]
45.Fang M, Hu L, Chen D, Guo Y, Liu J, Lan C, et al. 2021. Exposome in human health: utopia or wonderland? Innovation (Camb) 2(4):100172, PMID: , 10.1016/j.xinn.2021.100172. [DOI] [PMC free article] [PubMed] [Google Scholar]


