Author manuscript; available in PMC: 2021 Dec 13.
Published in final edited form as: Regul Toxicol Pharmacol. 2019 Oct 29;109:104510. doi: 10.1016/j.yrtph.2019.104510

Development of a prioritization method for chemical-mediated effects on steroidogenesis using an integrated statistical analysis of high-throughput H295R data

Derik E Haggard 1,2, Woodrow Setzer 2, Richard Judson 2, Katie Paul Friedman 2
PMCID: PMC8667012  NIHMSID: NIHMS1742205  PMID: 31676319

Abstract

Synthesis of 11 steroid hormones in human adrenocortical carcinoma cells (H295R) was measured in a high-throughput steroidogenesis assay (HT-H295R) for 656 chemicals in concentration-response as part of the US Environmental Protection Agency’s ToxCast program. This work extends previous analysis of the HT-H295R dataset and model by examining the utility of a novel prioritization metric based on the Mahalanobis distance that reduced these 11-dimensional data to 1-dimension via calculation of a mean Mahalanobis distance (mMd) at each chemical concentration screened for all hormone measures available. Herein, we evaluated the robustness of mMd values, and demonstrate that covariance and variance of the hormones measured appear independent of the chemicals screened and are inherent to the assay; the Type I error rate of the mMd method is less than 1%; and, absolute fold changes (up or down) of 1.5 to 2-fold have sufficient power for statistical significance. As a case study, we examined hormone responses for aromatase inhibitors in the HT-H295R assay and found high concordance with other ToxCast assays for known aromatase inhibitors. Finally, we used mMd and other ToxCast cytotoxicity data to demonstrate prioritization of the most selective and active chemicals as candidates for further in vitro or in silico screening.

Keywords: Steroidogenesis, ToxCast, endocrine disruption, prioritization

Introduction

The US Environmental Protection Agency (USEPA) Toxicity Forecaster, or ToxCast, program (Dix et al., 2007; Kavlock et al., 2012) currently includes alternative, high-throughput screening (HTS) assays to evaluate the endocrine bioactivity potential for hundreds to thousands of chemicals, including activity at the estrogen receptor, androgen receptor, steroid hormone biosynthesis, and thyroid hormone homeostasis (Judson et al., 2018a). These HTS assays, and the models developed based on them (Browne et al., 2015; Haggard et al., 2018; Judson et al., 2017; Judson et al., 2015; Kleinstreuer et al., 2017), are part of a larger continued effort to use new approach methodologies (NAMs) for prioritization and initial hazard screening for safety assessment (ECHA, 2016; Health Canada, 2016; USEPA, 2018). The acceptance of NAMs for this purpose typically requires a fit-for-purpose validation approach, as NAMs present new challenges in terms of the methodologies employed and the approaches used for data interpretation and model building (Casati, 2018; Griesinger et al., 2016; Patlewicz et al., 2013; Rovida et al., 2015).

Disruption of hormone production can result in a wide range of diseases and adverse effects (as reviewed in Miller, 2017; Miller and Auchus, 2011). The USEPA Endocrine Disruptor Screening Program (EDSP) and the Organisation for Economic Co-operation and Development (OECD) have developed testing guidelines for an in vitro steroidogenesis assay using the H295R cell line to measure chemical-induced perturbations of testosterone or estradiol synthesis (Hecker et al., 2011; OECD, 2011; USEPA, 2009). The H295R adrenocortical carcinoma cell line expresses the enzymes necessary for the synthesis of four major classes of hormones, including progestogens, corticosteroids, androgens, and estrogens (Gazdar et al., 1990), which makes these cells useful for screening for chemical effects on steroidogenesis. However, these cells do not fully resemble any one steroidogenic tissue in vivo, and the toxicokinetics and toxicodynamics for H295R cells may be different compared to other model systems. For example, lower throughput models for steroidogenesis employing Leydig or ovarian cells ex vivo have demonstrated differing steroidogenic effects of chemical exposures when compared to H295R, which may be due to the presence of additional regulatory mechanisms (e.g. hypothalamic-pituitary-gonadal feedback mechanisms, lack of corticosteroid synthesis in gonadal tissues, etc.) or differences in toxicokinetics/dynamics between models (Botteri Principato et al., 2018; Pinto et al., 2018). H295R cells resemble zonally undifferentiated fetal adrenal cells in that the major classes of hormones are all synthesized, and the overall level of steroidogenic output can be controlled using different culture media conditions, e.g. angiotensin II and forskolin can increase output (Mangelis et al., 2016). H295R cells are useful as an in vitro model system to study chemical effects on steroidogenesis.

Due to the low-throughput nature of the USEPA and OECD-validated test guidelines, the H295R assay was adapted to a high-throughput format (termed HT-H295R) to increase screening efficiency (Haggard et al., 2018; Karmaus et al., 2016). This was, in part, spurred by the thousands of substances present in the environment for which there is no information regarding potential effects on steroidogenesis and the push to adopt alternative high-throughput technologies for endocrine-related screening by USEPA and other regulatory agencies (ECHA, 2016; Health Canada, 2016; Health Canada, 2018; Judson et al., 2009; USEPA, 2011). Further development of the HT-H295R assay may result in its use as a part of NAMs that enable rapid screening of large numbers of chemicals to identify and prioritize substances with the highest potential to disrupt steroidogenesis.

The HT-H295R assay has been used to screen 2012 chemicals in single-concentration and 656 chemicals in multi-concentration. Unlike the USEPA and OECD testing guidelines, which only require measurement of testosterone and estradiol, 13 hormones were quantified in the HT-H295R assay using liquid-chromatography and mass spectrometry methods; of these 13 hormones, 11 were consistently measured above the lower limit of quantitation and comprised the input data for a statistical model to interpret chemical effects on steroidogenesis in the HT-H295R assay (Haggard et al. 2018). Comparison of this approach with the reference chemical set used in the OECD interlaboratory validation study (Hecker et al., 2011) showed good reproducibility in the testosterone and estradiol responses. Further, inclusion of additional hormone measures provided not only more biological information in the H295R model, but also enabled a statistical model that provides a quantitative ranking measure for prioritization (Haggard et al., 2018; Hecker et al., 2011). Thus, it was demonstrated that the HT-H295R assay model represents a fit-for-purpose replacement for the low-throughput H295R assay, and that the HT-H295R model, including data from 11 hormones, may provide a means to identify chemicals that have impacts on steroidogenesis upstream of testosterone or estradiol production.

The HT-H295R assay and analysis approach are being considered for use as an alternative for the low-throughput H295R assay, subsequent to review by a Scientific Advisory Panel convened by the USEPA to review the performance of the HT-H295R assay and the statistical model that was implemented (USEPA, 2017). There were three main needs identified for technical refinement (USEPA, 2017): demonstration of the robustness and/or reproducibility of the methodology chosen using data simulations (with specific requests to understand the false positive rate and normality of the data); investigation of whether the assay can identify specific mechanisms of disruption; and, demonstration of how the results may have been confounded by mitochondrial function and/or cytotoxicity despite use of a parallel mitochondrial toxicity assay in H295R cells. The objective of the work herein is to provide more context for the use of this statistical model in prioritization or hazard screening by addressing these questions, which may help move this approach further towards regulatory acceptance.

A statistical model was needed for interpretation of the HT-H295R assay due to three major challenges: one, HT-H295R data are multivariate, i.e. there are 11 hormone responses to understand at any given chemical concentration; two, the production of each hormone in the system is not independent of all of the other hormones, i.e. covariances and correlations between hormone outputs are observed; and, three, the high diversity of patterns in the 11-hormone panel and time-dependence of these patterns indicated that a mechanistic, pathway-based model might not be feasible with the available dataset, and as such, an unbiased way of describing the magnitude of chemical-induced perturbation of the 11-hormone network was needed. Therefore, a novel statistical metric using the mean Mahalanobis distance (mMd) was used to quantify the effect of a given chemical on the overall steroidogenesis pathway measured in HT-H295R at each screened concentration; i.e. the mMd quantifies in a single metric (at each chemical concentration) the magnitude of the difference in the response of all 11 hormones compared to the DMSO control response. A primary justification for the use of the Mahalanobis distance for quantification of effects in the HT-H295R assay is its ability to control for the correlation and covariance that may be present in multivariate data (De Maesschalck et al., 2000). In our previous work, we estimated a covariance matrix for the steroid hormone responses using the HT-H295R dataset; however, it was unknown if minor changes in the dataset might introduce changes to this covariance matrix that would affect the calculation of the mMds.

As previously described (Haggard et al., 2018; Karmaus et al., 2016), a phased screening approach was used to identify a subset of the 2012 unique chemicals screened in single concentration for multi-concentration screening; many chemicals selected for multi-concentration screening demonstrated positive responses in at least four of the 11 hormone measures in the single-concentration study (Haggard et al., 2018; Karmaus et al., 2016). Our null hypothesis was that this heavy weighting of active chemicals may have resulted in a biased estimate for the covariance matrix. Here, we explore the reproducibility of the mMd method using covariance matrices derived from different subsets of the HT-H295R dataset using data simulation. In these simulations, we also included steroid response profiles indicative of a true negative response and simulated profiles with response patterns at different effect sizes compared to DMSO control to better characterize the Type I (false positive) error rate and power of the mMd metric to identify experimental significance.

Beyond the multivariate mMd model that characterizes the global response, specific hormones measured in the HT-H295R assay may provide information regarding effects on specific hormone classes, namely progestogens, corticosteroids, androgens, and estrogens. In this paper, a case study for aromatase inhibition is further explored because of the importance of this mechanism of steroidogenesis disruption within the context of EDSP Tier 1 screening and breast cancer therapeutics development. In the EDSP Tier 1 battery, there is an in vitro assay for a mechanism of hormone synthesis disruption: inhibition of the aromatase enzyme (CYP19A1), which results in decreased estradiol and estrone production. H295R cells have previously been used to examine this specific mechanism for a set of seven reference chemicals (Higley et al., 2010); however, this study was limited to only measures of testosterone, estradiol, and aromatase activity and did not include measurements for other hormones. Understanding specific effects on aromatase is difficult in the absence of a kinetic model to demonstrate enzyme inhibition. Differential effects on aromatase activity may manifest further upstream or along different terminal ends of steroidogenesis, e.g. corticosteroids, and other mechanisms such as estrogen receptor modulation may also affect estrogen output. We performed hormone response profiling of the 656 chemicals tested in multi-concentration to identify the chemical exposures associated with decreased estrone and estradiol to understand if prototypical aromatase inhibitors would be reflected in this set. This analysis aimed to answer whether the estrogen signal in the HT-H295R assay could also be used as a screen for effects on estrogen synthesis inhibition.

Finally, a demonstration of how to use cytotoxicity and mitochondrial bioactivity in concert with the mMd approach for prioritization was developed, as chemicals with specific effects on steroidogenesis and/or mitochondrial function might be of greater interest for endocrine activity than cytotoxic chemicals. Steroidogenesis is dependent on functional mitochondria, and so the HT-H295R assay should be particularly sensitive to mitochondrial toxicants or cytotoxic disruption of mitochondrial function. Here, a mitochondria-related toxicity assay, the MTT assay, was screened in parallel with the HT-H295R assay data, and these MTT data were used as a cell viability indicator filter on the HT-H295R data used in the mMd modeling approach (Haggard et al., 2018). However, the sensitivity of the MTT assay for changes in mitochondrial health and cell viability versus other available methods is unknown, and as such we sought to leverage the additional data available via ToxCast for cytotoxicity, with the expectation that this contextual information will be improved iteratively over time. Experience from high-throughput chemical screening programs has demonstrated that the frequency of positive activity across diverse assays increases at concentrations at or approaching activity in cytotoxicity assays (termed the cytotoxicity “burst”), reducing the ability to identify mechanistic bioactivity (Judson et al., 2016). Utilizing cytotoxicity information from the in vitro ToxCast screening data alongside the parallel MTT in the HT-H295R assay, we developed a selectivity and ranking metric to enable prioritization of chemical samples that had the greatest potency and efficacy in the HT-H295R assay. The novel evaluation of the HT-H295R assay and mMd approach further supports its use in identification of in vitro endocrine activity.

Materials and Methods

HT-H295R assay information

The following sections describe the HT-H295R assay in brief, as this assay has been described in previous publications (Haggard et al., 2018; Karmaus et al., 2016).

Chemical Library

As previously described (Haggard et al., 2018; Karmaus et al., 2016), 656 unique chemicals were tested in HT-H295R in concentration-response. These chemicals were derived from the ToxCast phase I, II, and endocrine 1000 (E1K) chemical lists and cover a wide chemical landscape with diverse bioactivities (Richard et al., 2016). Tested chemical concentrations ranged from 0.041 nM to 100 μM, depending on solubility in dimethyl sulfoxide (DMSO) and cell viability. All information regarding the chemicals within the ToxCast chemical library is publicly available (http://www.epa.gov/ncct/dsstox/ or https://comptox.epa.gov/dashboard/chemical_lists).

HT-H295R Assay

Cell culture, treatment, hormone quantification, and cell viability components of the HT-H295R assay have been described previously (Haggard et al., 2018; Karmaus et al., 2016). Cell culture, treatments, and cell viability assessments were performed by Cyprotex US, LLC (Kalamazoo, MI). In brief, H295R cells were seeded at 50–60% confluency in 96-well microwell plates and allowed to adhere overnight. Cells were then exposed to media containing 10 μM forskolin for 48 hours to stimulate steroidogenesis. Medium containing forskolin was then removed and replaced with media containing test chemical or experimental controls (10 μM forskolin, 3 μM prochloraz, 0.1% DMSO) to a final concentration of 0.1% DMSO for another 48 hours. After chemical exposure, medium was removed, stored at −80°C, and shipped to OpAns, LLC (Durham, NC) for hormone quantification; the remaining cells in each well were assessed for cell viability using the MTT assay. Any test concentrations that resulted in lower than 70% cell viability were removed prior to analysis. Frozen medium was thawed, and hormone concentrations were quantified using high-performance liquid chromatography tandem mass spectrometry (HPLC-MS/MS) after extraction using methyl tert-butyl ether, with an additional derivatization step using dansyl chloride to quantify estrone and estradiol concentrations. Thirteen hormone measurements were attempted (pregnenolone, 17α-hydroxypregnenolone, progesterone, 17α-hydroxyprogesterone, deoxycorticosterone, corticosterone, 11-deoxycortisol, cortisol, dehydroepiandrosterone, androstenedione, testosterone, estrone, 17β-estradiol), but pregnenolone and dehydroepiandrosterone were often below the lower limit of quantitation (53.1% and 69.5% of all measurements, respectively), and therefore these hormones were not included in this analysis, leaving 11 hormone measures as previously described (Haggard et al., 2018; Karmaus et al., 2016).

Each test chemical was assayed in technical duplicate, representing one biological replicate per chemical. However, as described previously (Haggard et al., 2018), 103 and four of the 656 test chemicals have two and three biological replicates, respectively.

Software

All data management, analysis, and figure generation were performed using the R statistical programming language (3.5.1) with RStudio (version 0.98.501) on Linux (Red Hat version 6.10). Data simulations were run in parallel using 34 cores. All the original code and source files are available on the USEPA FTP site (ftp://newftp.epa.gov/COMPTOX/NCCT_Publication_Data/FriedmanPaul_K/CompTox-ToxCast_EDSPsteroidogenesis_prioritization) and the USEPA GitHub site (https://github.com/USEPA/CompTox-ToxCast_EDSPsteroidogenesis_prioritization).

Overview of Computational Approach

In the sections below, we describe in detail the computational approach employed to answer several main questions. Answers to the first set of questions are meant to characterize the robustness of the mMd approach by evaluating: (1) how stable is the residual covariance matrix used in the mMd approach, and how do changes in the covariance matrix impact resulting mMd values compared to the mMd values calculated in the original analysis of Haggard et al. (2018); and, (2) what is the power of the mMd approach to detect fold-changes in hormones and, conversely, what is the observed false positive rate? The second set of questions strives to define use cases for the HT-H295R data and the mMd model of these data: (1) does the HT-H295R assay detect chemicals known to disrupt estrogen synthesis; and (2) do additional cytotoxicity and mitochondrial toxicity information help contextualize the mMd results from the HT-H295R assay?

The computational experiments to address the robustness of the mMd approach are illustrated in Figure 1. In brief, we created simulated datasets of control and treatment responses for each chemical by sampling from multivariate normal distributions defined by estimated mean responses and covariances from the original data. The sampling (including estimation of means and covariances) was performed within experimental assay block and within four bins of chemical samples: DMSO controls and samples having low, medium, or high maxmMd values (termed “DMSO”, “low”, “medium”, and “high”) (Figure 1A). Three sets of mMd values were then calculated for each simulated dataset using different covariance matrices estimated from the simulated data (Figure 1B): one from chemical samples with “low” maxmMd values, one from all chemical samples (termed “global”), and one from chemical samples with “high” maxmMd values. A comparison between the original HT-H295R analysis and the data simulations was used to address the question: was the original covariance matrix estimate a reasonable approximation? These data simulations were then used in power analyses and Type I error approximation. In total, these experiments address the robustness and appropriateness of the mMd approach. Additional details of the methods for the data simulation experiments follow in subsequent sections.

Figure 1. Workflow of data simulation study.

(A) Original level 0 data from the ToxCast database, invitrodb, were downloaded and converted into micromolar concentrations and were subset into four datasets to represent vehicle controls, low, medium, and high responses in the HT-H295R assay based on the maxmMd values from Haggard et al. (2018). These subsets were used to calculate 4 covariance matrices for each block of assay data (for a total of 32 covariance matrices) for MVN sampling. Using mean hormone response values by chemical sample and the appropriate covariance matrices, 2000 unique simulated datasets were generated. (B) For each simulated dataset, three sets of mMds were calculated for all simulated chemical samples using three covariance matrices estimated from either low responding chemicals, high responding chemicals, or the global dataset, for a total of 6000 sets of mMd values.

Source data

HT-H295R data for these analyses were downloaded using the ToxCast pipeline (tcpl) R package that is publicly available (https://cran.r-project.org/web/packages/tcpl/index.html) (Filer et al., 2017). Multi-concentration level 0 data from invitrodb (version 3.1) were downloaded and converted from μg/ml into micromolar concentrations prior to calculation of mMds and data simulation (Supplemental Data 1). The data structure consisted of eight different data blocks, which represent the eight separate HT-H295R assay experiments that were performed between 2013 and 2017. Due to the presence of block effects, analyses were performed within blocks. In total, 654 unique chemicals were screened in multi-concentration with some replication within and across blocks for a total of 766 unique chemical samples. Measured hormones are abbreviated throughout the manuscript as follows: PROG, progesterone; OH-PREG, 17α-hydroxypregnenolone; OH-PROG, 17α-hydroxyprogesterone; DOC, deoxycorticosterone; CORTIC, corticosterone; 11-DCORT, 11-deoxycortisol; CORT, cortisol; ANDR, androstenedione; TESTO, testosterone; E1, estrone; E2, 17β-estradiol.
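The unit conversion mentioned above is simple arithmetic and can be sketched as follows. This is a minimal illustration, not the authors' R code: the function name `ug_per_ml_to_um` is hypothetical, and the molecular weight used in the example (testosterone, approximately 288.42 g/mol) is supplied here as an assumption for demonstration only.

```python
def ug_per_ml_to_um(conc_ug_per_ml, mol_weight_g_per_mol):
    """Convert a mass concentration (ug/mL) to a molar concentration (uM).

    1 ug/mL equals 1 mg/L; dividing by the molecular weight (g/mol) gives
    mmol/L, and multiplying by 1000 gives umol/L (uM).
    """
    if mol_weight_g_per_mol <= 0:
        raise ValueError("molecular weight must be positive")
    return conc_ug_per_ml * 1000.0 / mol_weight_g_per_mol


# Hypothetical example: a testosterone reading of 0.005 ug/mL,
# using an approximate molecular weight of 288.42 g/mol (assumption).
testosterone_um = ug_per_ml_to_um(0.005, 288.42)
print(f"{testosterone_um:.4f} uM")
```

Because the molecular weight differs for each hormone, the conversion has to be applied per hormone before the 11 measures are combined.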

Mahalanobis distance overview

Mahalanobis distances were calculated as previously described (Haggard et al., 2018). As demonstrated previously, there is a large degree of covariance in the residuals of the hormones measured in the HT-H295R assay. Unlike Euclidean distance, which cannot control for positive or negative covariance between multivariate measures, the Mahalanobis distance corrects for correlation and covariance in multivariate data by including the inverse covariance matrix of all multivariate measures in the calculation of the distance values. As such, the multivariate measures are rotated and scaled to make the variables uncorrelated and have the same variance. An example of the differences between multivariate Euclidean and Mahalanobis distance is demonstrated in Figure 2. We employ the Mahalanobis distance as a measure of multivariate effect size of chemical treatment on the whole steroidogenesis pathway represented in the HT-H295R assay. This allows for the integration of all 11 hormone measures into a single value that represents the overall magnitude of effect of a specific concentration of a chemical in the HT-H295R assay. The Mahalanobis distance for a chemical at a given concentration is calculated as shown in Equation 1.

Figure 2. Example of Mahalanobis distance.

The difference between Euclidean distance and Mahalanobis distance using an example of two theoretical hormone measures with positive covariance. (A) Scatterplot for two theoretical hormone measures X and Y in Euclidean space. Grey points denote mean hormone measures for theoretical chemical samples. The error distribution, denoted by the dashed ellipse, shows the variance among measurements of hormone Y (standard deviation = 0.16) is greater than measurements of hormone X (standard deviation = 0.08), and these measures show positive covariance (correlation = 0.8). The points labeled conc 1, conc 2, and conc 3 represent the mean (natural logarithm) hormone concentrations at three concentrations of an example test chemical. In terms of hormone concentrations, the response at conc 3 for hormone X is roughly twice as far from conc 1 as is the response at conc 2 (and is a larger number of standard deviations away from the mean of the distribution for hormone X); however, the Euclidean distances of conc 2 and conc 3 to conc 1 are the same. (B) Scatterplot of the same hormone measures as in A after Mahalanobis distance transformation. Mahalanobis distance adjusts for the covariance between measurements, which rotates and scales the data so that the data are uncorrelated and have the same variance (as denoted by the error ellipse now being a circle). After transformation, we see that conc 3 is now around four times as distant from conc 1 as is conc 2, better reflecting the difference of conc 3 from the overall distribution of measures in hormones X and Y compared to conc 2.

$$\mathrm{mMd} = \sqrt{\frac{(y_c - y_1)^{T}\,\Sigma^{-1}\,(y_c - y_1)}{N_h}} \qquad (1)$$

Here, $y_c$ is the vector of natural log-transformed hormone concentrations for a test chemical at the $c$th exposure concentration and $y_1$ is the vector of natural log-transformed hormone concentrations for the matched DMSO control sample. $\Sigma$ is the covariance matrix, which is estimated by fitting the natural log-transformed HT-H295R hormone concentration data for all chemicals and control samples to a multivariate linear model, accounting for chemical and concentration effects. In this case, due to apparent block effects, a covariance matrix was estimated for each of the eight assay test date blocks, and $\Sigma$ was taken as the unweighted average of these block-level covariance matrices. $N_h$ is the number of hormones measured for the test chemical; dividing by $N_h$ yields the mean Mahalanobis distance (mMd), which allows for the best comparability of Mahalanobis distance values between chemicals that have different numbers of hormone measurements, e.g. due to missing data. It is important to note that the same covariance matrix, $\Sigma$, is used to calculate the mMd for all chemical samples.
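To make the calculation concrete, the following is a minimal sketch of a mean Mahalanobis distance in plain Python. It is not the authors' R implementation: the function names are hypothetical, the square-root form of the distance is assumed, and the two-hormone covariance matrix simply reuses the illustrative values from Figure 2 (standard deviations 0.08 and 0.16, correlation 0.8).

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def mean_mahalanobis(y_c, y_1, sigma):
    """sqrt((y_c - y_1)^T Sigma^-1 (y_c - y_1) / N_h) for log-scale vectors."""
    diff = [a - b for a, b in zip(y_c, y_1)]
    weighted = solve(sigma, diff)  # Sigma^-1 (y_c - y_1) without an explicit inverse
    return math.sqrt(sum(d * w for d, w in zip(diff, weighted)) / len(diff))


# Illustrative covariance from Figure 2: sd_X = 0.08, sd_Y = 0.16, correlation 0.8.
SIGMA = [[0.08 ** 2, 0.8 * 0.08 * 0.16],
         [0.8 * 0.08 * 0.16, 0.16 ** 2]]
control = [0.0, 0.0]
along = mean_mahalanobis([0.1, 0.2], control, SIGMA)     # shift follows the correlation
against = mean_mahalanobis([0.1, -0.2], control, SIGMA)  # shift opposes the correlation
print(along, against, against / along)
```

The two shifts have the same Euclidean length, yet the shift that runs against the positive correlation is three times as distant, since the ratio is sqrt((1 + 0.8) / (1 − 0.8)) = 3. This is the behavior that Figure 2 illustrates.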

The maximum mMd (maxmMd) value for each chemical concentration-response curve represents the overall magnitude of perturbation of the steroidogenesis pathway in HT-H295R for a chemical, which can be used to rank and prioritize chemicals for additional screening.

To determine the experimental significance of any given mMd value above background variability, we also calculated a critical value (Haggard et al., 2018) using the method developed by Nakamura and Imada (2005). This method is a multiple comparisons procedure, like Dunnett’s procedure for univariate comparisons, and is based on the similarity of the squared Mahalanobis distance to Hotelling’s T² statistic. The critical value is derived to approximate a nominal Type I error rate of 0.01.

Mahalanobis distance of different covariance matrices from original data

Prior to the data simulation study, we estimated additional sets of mMd values using covariance matrices derived from two data subsets of the HT-H295R data. The intent in doing this was to understand how the mMds might change using different covariance matrices based on the measured data themselves, thus serving as a check on the mMd values calculated using the covariance matrices derived via data simulation. “High” and “low” groups were defined by subsetting the data within assay date block into upper and lower halves based on maxmMd values calculated from the original HT-H295R dataset. For both the upper and lower groups, we estimated new covariance matrices following the same multivariate linear modeling approach as described above and recalculated the mMd values for all chemical samples using these covariance matrices.

Multivariate normal data simulation

Dataset generation

Data simulation was performed by sampling the HT-H295R data from a multivariate normal distribution (MVN) using the mvtnorm R package (https://cran.r-project.org/web/packages/mvtnorm/index.html). Figure 1 summarizes the workflow for the MVN data simulation (Figure 1A) and generation of new mMds (Figure 1B). Simulating hormone response measures of a chemical sample from an MVN distribution requires a vector of multivariate means (i.e. the mean hormone measures of a given concentration of chemical) and a covariance matrix that should reflect the multivariate covariance of that chemical sample. In this case, each block was subset into a total of four groups (DMSO controls, low maxmMd chemical samples, medium maxmMd chemical samples, and high maxmMd chemical samples), and covariance matrices were estimated. This resulted in estimation of a total of 32 covariance matrices, i.e. one for each of the four groups for all eight experimental assay blocks. For each concentration of a test chemical or DMSO control, the average hormone response of the two technical replicates (following the same dimensions of the experimental data) was calculated, and the covariance matrix from the set of 32 estimated covariance matrices that contains the chemical sample of interest was used as input to draw samples from an MVN distribution. The number of samples was equal to the number of technical replicates from the original mMd analysis so that each simulated dataset had the same dimensions as the original data. If the original hormone data for a test chemical was equal to or below the standardized lower limit of quantification (LLOQ) value, i.e. the LLOQ/√2 for a given hormone, it was set to that standardized value. In total, 2000 simulated datasets were generated from the original HT-H295R data for subsequent analysis (Figure 1A).
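The sampling step can be illustrated with a short sketch. The study used the mvtnorm R package; the Python below is only a stand-in that draws MVN samples through a Cholesky factor. All names and numbers are hypothetical, and applying the LLOQ substitution before log-transforming and averaging the replicates is an assumption made for the example.

```python
import math
import random


def cholesky(sigma):
    """Lower-triangular factor L with L L^T = sigma (sigma must be positive definite)."""
    n = len(sigma)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                low[i][j] = (sigma[i][j] - s) / low[j][j]
    return low


def sample_mvn(mean, sigma, n, rng):
    """Draw n vectors from MVN(mean, sigma) as mean + L z with z ~ N(0, I)."""
    low = cholesky(sigma)
    draws = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in mean]
        draws.append([mu + sum(low[i][k] * z[k] for k in range(i + 1))
                      for i, mu in enumerate(mean)])
    return draws


def floor_at_lloq(value, lloq):
    """Set values at or below LLOQ/sqrt(2) to that standardized value."""
    return max(value, lloq / math.sqrt(2))


# Hypothetical two-hormone example: two technical replicates at one concentration.
rng = random.Random(2019)
SIGMA = [[0.0064, 0.01024], [0.01024, 0.0256]]  # illustrative covariance, log scale
replicates = [[0.90, 4.10], [1.10, 0.02]]       # hypothetical concentrations (uM)
lloq = [0.05, 0.05]                             # hypothetical LLOQ per hormone
logged = [[math.log(floor_at_lloq(v, q)) for v, q in zip(rep, lloq)]
          for rep in replicates]
mean = [sum(col) / len(col) for col in zip(*logged)]
simulated = sample_mvn(mean, SIGMA, n=len(replicates), rng=rng)
print(simulated)
```

Drawing as many samples as there are technical replicates keeps each simulated dataset the same shape as the measured data, as described above.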

Derivation of covariance matrices from simulated data to examine stability of the mMd metric

To explore the possibility that covariances vary with treatment effect and may be different between highly responsive and non-responding chemicals (which would influence the calculation of the mMd), we generated three different covariance matrices to calculate new mMds (Figure 1B). For each simulated dataset, we first estimated a pooled covariance matrix using all of the simulated chemical data, termed the “global” covariance matrix, and recalculated the mMds using the same methodology as described above and in the original mMd analysis (Haggard et al., 2018). With these new mMds, we determined the maxmMd for the simulated chemical samples. The relative rank of chemicals by maxmMd was then used to subset the simulated data in half within assay date blocks into “high” maxmMd and “low” maxmMd chemical sample groups, i.e. strongly and weakly responding chemical sets, respectively. We then estimated pooled covariance matrices using the high and low data subsets and calculated mMds using the high and low covariance matrices. To summarize, for each of the 2000 simulated datasets, three separate covariance matrices were generated (global, low, and high) and subsequently used to calculate new mMds for a total of three sets of mMds for each data simulation (for a total of 6000 sets of mMds). The output from the simulations, including the data, covariance matrices, mMds, and maxmMds, can be found on the USEPA FTP site (ftp://newftp.epa.gov/COMPTOX/NCCT_Publication_Data/FriedmanPaul_K/CompTox-ToxCast_EDSPsteroidogenesis_prioritization).

Type I error rate computation

Using the simulated data described above, the rate of false positives, i.e. the Type I error, was also computed. This was done by generating a true null concentration-response profile based on a randomly selected DMSO sample during the MVN sampling procedure. All 2000 samples generated from this null profile would be expected to be negative; any positives arise from sampling noise, and their frequency provides an estimate of the true Type I error rate. For the mMd calculation, a threshold for a positive was defined by the critical value (described previously). For a chemical sample with 11 hormone measures and an N=2 (N based on technical replicates), the critical value is 1.64 assuming a Type I error rate with α = 0.01. This was the assumption of Type I error rate used in Haggard et al. (2018). If a Type I error rate with α = 0.05 is assumed, the critical value is 1.50 (when 11 hormone measures are available and N=2). The frequency of positives in the null data simulation was defined as the number of null simulations that exceeded the critical value divided by the total number of null simulations (2000).
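The frequency calculation in the last sentence can be sketched in a few lines. The function name and the list of null maxmMd values below are hypothetical; only the critical values (1.64 for α = 0.01 and 1.50 for α = 0.05, with 11 hormones and N=2) come from the text.

```python
def exceedance_rate(max_mmds, critical_value):
    """Fraction of simulated maxmMd values strictly above the critical value."""
    if not max_mmds:
        raise ValueError("at least one simulated value is required")
    return sum(1 for value in max_mmds if value > critical_value) / len(max_mmds)


# Hypothetical maxmMd values from a handful of null simulations.
null_max_mmds = [0.92, 1.10, 1.71, 0.85, 1.33, 1.02, 1.55, 0.97]
print(exceedance_rate(null_max_mmds, 1.64))  # estimate at the alpha = 0.01 threshold
print(exceedance_rate(null_max_mmds, 1.50))  # estimate at the alpha = 0.05 threshold
```

Applied to null simulations, this frequency estimates the Type I error rate; the same calculation applied to simulated profiles with a real effect gives the power.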

Power analyses

We generated ten theoretical hormone concentration response profiles with increasing effect sizes (1.1, 1.5, 2, and 2.5-fold increases compared to control) by amplifying the responses of the random DMSO sample used in the Type I error rate analysis above. These profiles were modeled after the following theoretical hormone changes: increases in all 11 measured hormones (all); increases in OH-PROG, 11-DCORT, CORT, ANDR, TESTO, E1, and E2 (some); increases in 11-DCORT and CORT (glucocorticoids); increases in DOC and CORTIC (mineralocorticoids); increases in ANDR (androstenedione); increases in TESTO (testosterone); increases in ANDR and TESTO (androgens); increases in E1 (estrone); increases in E2 (estradiol); and increases in E1 and E2 (estrogens). The mMd values across the theoretical profiles were calculated using the global, low, and high covariance matrices as described above. The frequency of simulated mMd values above the critical value (assuming a Type I error rate with α = 0.01) across all the simulations was taken as the power to detect experimental significance.
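
A minimal sketch of how such theoretical profiles could be built from a baseline DMSO response is given below. The profile membership follows the list above; the function name, the dictionary layout, the baseline values, and the choice to apply the fold change on the untransformed concentration scale at a single concentration are assumptions made for illustration.

```python
# Hormones increased in each named profile; "all" is handled separately below.
PROFILES = {
    "some": ["OH-PROG", "11-DCORT", "CORT", "ANDR", "TESTO", "E1", "E2"],
    "glucocorticoids": ["11-DCORT", "CORT"],
    "mineralocorticoids": ["DOC", "CORTIC"],
    "androstenedione": ["ANDR"],
    "testosterone": ["TESTO"],
    "androgens": ["ANDR", "TESTO"],
    "estrone": ["E1"],
    "estradiol": ["E2"],
    "estrogens": ["E1", "E2"],
}

def amplify(baseline, profile, fold):
    """Return a copy of baseline with the profile's hormones multiplied by fold."""
    targets = set(baseline) if profile == "all" else set(PROFILES[profile])
    return {h: (v * fold if h in targets else v) for h, v in baseline.items()}

# Hypothetical DMSO hormone concentrations (arbitrary units, subset of hormones).
baseline = {"ANDR": 2.0, "TESTO": 1.0, "E1": 0.5, "E2": 0.25, "DOC": 4.0}

estrogens_2x = amplify(baseline, "estrogens", 2.0)
all_1_5x = amplify(baseline, "all", 1.5)
```

Each amplified profile would then pass through the MVN sampling and mMd calculation, and the share of simulations with a maxmMd above the critical value gives the power.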

Steroid hormone pattern analysis of putative aromatase inhibitors

Micromolar steroid hormone concentration-response profiles for all chemical samples were analyzed using an analysis of variance (ANOVA) with post hoc Dunnett’s procedure at an α = 0.05. Directionality of response was determined by the sign of the slope of the fitted data. Positive hit calls for hormone perturbation were determined following the same logic used in the OECD test guideline (TG) 456 method (OECD, 2011) and the interlaboratory validation report (Hecker et al., 2011), where a hit call was defined as a statistically significant effect at two consecutive concentrations of test chemical or at the highest concentration tested. In this case, a chemical was coded as a hit in the positive direction (1), negative direction (−1), or a non-hit (0). A table of the hit calls can be found in Supplemental Data 2.
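
The hit-call rule described above can be written compactly. The sketch below assumes the Dunnett's significance flags are already available per concentration, ordered from lowest to highest; the function name and the treatment of a zero slope as a non-hit are assumptions.

```python
def hit_call(significant, slope):
    """Return 1 (positive hit), -1 (negative hit), or 0 (non-hit).

    significant: list of booleans, one per test concentration, ordered from
        lowest to highest, marking a significant difference from control.
    slope: slope of the fitted concentration-response data.
    """
    consecutive = any(a and b for a, b in zip(significant, significant[1:]))
    at_highest = bool(significant) and bool(significant[-1])
    if not (consecutive or at_highest) or slope == 0:
        return 0  # assumption: a zero slope is treated as a non-hit
    return 1 if slope > 0 else -1
```

For example, `hit_call([False, True, True, False], 0.8)` gives 1, and `hit_call([True, False, True, False], 1.0)` gives 0 because the significant concentrations are neither consecutive nor the highest tested.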

Since a putative aromatase inhibitor will theoretically decrease estrogen concentrations in HT-H295R, hit call data were filtered to only include chemicals that were significant hits in the negative direction for both E2 and E1 with an efficacy threshold of −1.5 fold change compared to plate-level DMSO controls. HT-H295R hormone profiles for Figure 6 were visualized using the ComplexHeatmap package (Gu et al., 2016) using Ward’s method.
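
One way to express this filter is shown below. It is a sketch under stated assumptions: the efficacy threshold is interpreted as a treated-to-control concentration ratio of at most 1/1.5 at some test concentration, and the function name and input dictionaries are hypothetical.

```python
FOLD_CUTOFF = 1.5  # efficacy threshold from the text

def is_putative_aromatase_inhibitor(hit_calls, min_ratio):
    """Check for negative hits on both E1 and E2 that meet the fold-decrease cutoff.

    hit_calls: hormone -> hit call (1, -1, or 0).
    min_ratio: hormone -> lowest treated/control concentration ratio observed
        (assumed summary of efficacy; a 1.5-fold decrease is a ratio of 1/1.5).
    """
    return all(
        hit_calls.get(h, 0) == -1 and min_ratio.get(h, 1.0) <= 1.0 / FOLD_CUTOFF
        for h in ("E1", "E2")
    )
```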

Figure 6. Heatmap of hormone response profiles.

Putative aromatase inhibitors were identified after filtering HT-H295R response profiles for chemical samples with statistically significant decreases in E1 and E2 concentrations compared to plate-level DMSO controls (as measured by one-way ANOVA) with a fold decrease of ≥1.5. Rows denote chemical samples and columns are hormones. Rows were clustered using Ward’s method only considering ANDR, TESTO, E1, and E2, and four prominent clusters were identified and are labeled. The aromatase relevant hormones, specifically ANDR, TESTO, E1, and E2, are presented to the left whereas the other hormones are presented to the right. The log10-maxmMd value is also denoted with an annotation bar to the right to indicate effect size for the 11 hormones. Reference chemicals (chrysin, clotrimazole, metyrapone, letrozole, and fadrozole hydrochloride) known to inhibit aromatase are specifically labeled to the right (Judson et al., 2018b). Grey denotes missing data due to sample loss during screening. Cluster numbers 1-4 referenced in the text are labeled.

Further comparison was performed to relate these data to a putative aromatase inhibitor list identified by a related high-throughput screening initiative, Tox21 (Thomas et al., 2018). As part of the Tox21 program, a cell-based high-throughput screening for aromatase inhibitors for the Tox21 10K chemical library (Chen et al., 2014), in combination with data filtering and target follow-up screening (Chen et al., 2015), was used to identify a list of 50 putative aromatase inhibitors. Previously, a biochemical, cell-free assay from NovaScreen (NVS) (Sipes et al., 2013) was used by the ToxCast program in a tiered screen for 1882 chemicals in single concentration and 248 chemicals in multi-concentration response. Fourteen and 17 of the 50 putative aromatase inhibitors from Tox21 were included in the ToxCast NVS assay and HT-H295R assays, respectively. A simple comparison, using the HT-H295R hit call logic defined above and based on the OECD TG 456 method, is presented to understand if the HT-H295R screen can recognize putative aromatase inhibitors identified from previous screening efforts. As the mMd incorporates signal from other hormones in the H295R cell, a comparison to hit calls for estrogens seemed to be a more equivalent comparison.

Selectivity scoring and potency estimation of mMd benchmark dose values

Concentration-response curves of the mMds for each test chemical were fit to a four-parameter logistic curve and a benchmark dose (BMD), defined as the log10 concentration of a test chemical where the mMd value is equal to the critical value, was determined as previously described (Haggard et al., 2018). Minimum and maximum BMD values were set to −4 on the log10 micromolar scale (0.0001 micromolar) and 3 on the log10 micromolar scale (1000 micromolar), respectively. A selectivity ranking metric was calculated for each test chemical as defined in Equation 2.

$$\mathrm{selectivity} = \min(\mathrm{cytoburst},\ \mathrm{MTT_{acc}}) - \mathrm{BMD} \tag{2}$$

Where cytoburst is the lower bound (median AC50 – 3*baseline median absolute deviation of the 26 ToxCast cytotoxicity assays) of log10 AC50 values of the set of cytotoxicity assays used in the ToxCast high-throughput screening program (Supplemental Table 1), and MTTacc is the concentration at which the cell viability response equaled the allowed cell viability threshold of 70%, determined by fitting concentration-response data of the HT-H295R cell viability assay using tcpl. The MTTacc value represents the concentration where potential confounding effects due to mitochondrial disruption may occur. The cytoburst values were calculated using the tcplCytoPt function from the tcpl package (Filer et al., 2017; Judson et al., 2016). If a chemical was not considered a hit in at least two of the cytotoxicity assays, the cytoburst was set to 3. Test chemicals with a selectivity score ≥0.5 log10 units were considered selective hits in the HT-H295R assay.
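
The selectivity score is the gap, in log10 micromolar units, between the lower of the two cytotoxicity-related values and the BMD. A minimal sketch follows; the function names and example values are hypothetical, while the default cytoburst of 3 and the 0.5 cutoff come from the text.

```python
NO_CYTOTOX_DEFAULT = 3.0   # cytoburst when fewer than two cytotoxicity assays are hits
SELECTIVITY_CUTOFF = 0.5   # log10 units

def resolve_cytoburst(lower_bound, n_cytotox_hits):
    """Use the computed lower bound only when at least two cytotoxicity assays hit."""
    return lower_bound if n_cytotox_hits >= 2 else NO_CYTOTOX_DEFAULT

def selectivity(bmd, cytoburst, mtt_acc):
    """Selectivity score: min(cytoburst, MTTacc) minus BMD, all in log10 micromolar."""
    return min(cytoburst, mtt_acc) - bmd

def is_selective(score):
    return score >= SELECTIVITY_CUTOFF

# Illustrative values only.
cytoburst = resolve_cytoburst(lower_bound=1.2, n_cytotox_hits=5)
score = selectivity(bmd=-0.5, cytoburst=cytoburst, mtt_acc=1.8)
```

Here the BMD sits about 1.7 log10 units below the cytoburst, so this hypothetical sample would be called selective.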

Potency and efficacy of chemical effects on the steroidogenesis pathway were compressed for relative prioritization by calculating the area under the curve (AUC) of the mMd concentration-response. AUC was calculated by integrating each curve using the model parameters from the four-parameter logistic curve fitting that was used to derive the BMD values for each chemical sample. The limits of integration were defined based on the lowest and highest exposure concentration for each test chemical. The calculated AUCs were used to rank chemical samples that passed the selectivity filtering described above.
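
The BMD and AUC calculations can be sketched from the fitted curve parameters. The Hill-type parameterization below, the trapezoidal integration over log10 concentration, and the handling of curves that never cross the critical value (clamping to the BMD bounds) are assumptions; the original analysis may differ in each of these details.

```python
import math

BMD_MIN, BMD_MAX = -4.0, 3.0  # log10 micromolar bounds stated in the text

def four_pl(x, bottom, top, log_ec50, hill):
    """Assumed four-parameter logistic form on the log10 concentration scale."""
    return bottom + (top - bottom) / (1.0 + 10.0 ** ((log_ec50 - x) * hill))

def bmd(critical_value, bottom, top, log_ec50, hill):
    """log10 concentration where the curve equals the critical value (hill > 0)."""
    if critical_value <= bottom:
        return BMD_MIN  # assumption: curve is above the threshold everywhere
    if critical_value >= top:
        return BMD_MAX  # assumption: curve never reaches the threshold
    ratio = (top - bottom) / (critical_value - bottom) - 1.0
    x = log_ec50 - math.log10(ratio) / hill
    return min(max(x, BMD_MIN), BMD_MAX)

def auc(log_lo, log_hi, bottom, top, log_ec50, hill, n=1000):
    """Trapezoidal area under the curve between the lowest and highest test concentrations."""
    step = (log_hi - log_lo) / n
    ys = [four_pl(log_lo + i * step, bottom, top, log_ec50, hill) for i in range(n + 1)]
    return step * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

# Hypothetical fitted parameters for one chemical sample.
params = dict(bottom=0.0, top=10.0, log_ec50=0.0, hill=1.0)
bmd_value = bmd(1.64, **params)      # critical value at alpha = 0.01
auc_value = auc(-2.0, 2.0, **params)
```

Because the AUC grows with both the height of the curve and how early it rises, it combines efficacy and potency in one number, which is the property used for ranking.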

Results

Stability of the mMd Metric with Different Covariance Matrices

To determine the stability of the mMd metric, the influence of the covariance matrix on the mMd calculation, and overall performance, we performed a data simulation study using MVN sampling of the original HT-H295R data. We generated a total of 2000 simulated HT-H295R datasets and recalculated mMds using three different covariance matrices. These covariance matrices were estimated using different subsets of the simulated data representing weak responding chemicals, all chemicals, and strong responding chemicals in the HT-H295R assay (termed low, global, and high, respectively; Figure 1). A comparison of the covariance structures of the three covariance matrices in the simulation, represented as the average covariance matrix of the three data subset groups, is summarized in Figure 3. The pairwise correlations of the hormone covariances, as well as the variances, were similar across the three subsets; the differences in the pairwise correlation of the hormone covariances ranged from 0 to 0.05. Running the same analysis using only the original HT-H295R data, partitioned in the same way as the simulated datasets, gave similar results (Supplemental Figure 1).
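
The comparison of covariance structures amounts to converting each covariance matrix to a correlation matrix and taking the largest absolute difference between corresponding off-diagonal entries. The sketch below uses two hypothetical 2 × 2 matrices in place of the 11 × 11 hormone matrices; the function names and values are illustrative assumptions.

```python
import math

def cov_to_corr(cov):
    """Convert a covariance matrix (list of lists) to a correlation matrix."""
    n = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]

def max_abs_corr_diff(cov_a, cov_b):
    """Largest absolute difference in pairwise correlation between two matrices."""
    corr_a, corr_b = cov_to_corr(cov_a), cov_to_corr(cov_b)
    n = len(corr_a)
    return max(abs(corr_a[i][j] - corr_b[i][j])
               for i in range(n) for j in range(i + 1, n))

# Made-up matrices: the "high" type has larger variances but a similar correlation.
low_cov = [[1.0, 0.8], [0.8, 1.0]]
high_cov = [[4.0, 3.0], [3.0, 4.0]]
diff = max_abs_corr_diff(low_cov, high_cov)  # 0.80 versus 0.75
```

The example also shows that two matrices can have nearly the same correlation structure while differing in the magnitude of their variances.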

Figure 3. Hormone variance and pairwise correlation of covariances.

The pairwise correlation of the mean pooled covariance matrix of the hormones for the low (italics), global (boldface), and high (normal) covariance types is shown. The diagonal of the matrix indicates the mean variance for each measured hormone. The correlation matrix is colored (red to blue; −1.0 to 1.0) based on the correlation coefficients of the global covariance matrices (boldface) and is clustered using Ward’s method.

For each data simulation, three sets of mMds were calculated, one set for each of the three covariance matrices. A scatterplot of mMd values for chemicals calculated from the HT-H295R data and the median mMd of the 2000 data simulations for the three covariance types showed that the median simulated mMds were very similar to the original, regardless of the covariance matrix type (low, global, or high) used in the mMd calculation (Figure 4A). Linear regression by covariance matrix type showed that both the low and global covariance matrices were the most accurate at estimating the original mMd with slopes equal to 0.98 and 0.97, respectively. In contrast, the high covariance matrix type estimated lower mMd values overall compared to the original with a slope of 0.91. This lower slope translates to slightly lower mMd values calculated when using the high covariance matrix type, which was calculated based on the hormone effects of “high-responding” chemicals only. This same general, but minor, trend is maintained when examining the mMd distributions of simulated hormone responses of prochloraz, butylparaben, and ethylene dimethanesulfonate (Figure 4B-D). In sum, the mMd values generated using 2000 data simulations and three distinct covariance matrix types were nearly identical, with only very slight differences observed for the mMd values generated using the high covariance type. The mMd-by-concentration boxplots for all chemical samples using the three covariance types can be found in Supplemental Figure 2.

Figure 4. Stability of mMd values calculated with different covariance matrices.

For each data simulation, three sets of mMds were calculated, one set for each of the three covariance types. (A) Scatterplot to compare the original mMd values calculated in Haggard et al. (2018) to the simulated mMd values calculated for each covariance type. Closed circles represent the median and the bars represent the 95% confidence interval of the mMd values for each covariance type. Dashed line indicates the identity line. (B) Boxplots of mMd concentration response values after prochloraz exposure in HT-H295R across the 2000 simulations and each covariance type. (C) Boxplots of mMd concentration response values after butylparaben exposure in HT-H295R across the 2000 simulations and each covariance type. (D) Boxplots of mMd concentration response values after ethylene dimethanesulfonate exposure in HT-H295R across the 2000 simulations and each covariance type. For all plots, red, green, and blue correspond to the low, global, and high covariance type, respectively. Filled triangles represent mMd values estimated from the original dataset using the three covariance types.

Specificity and Power of the mMd Metric

To determine a positive mMd response, we defined a critical value based on the method of Nakamura and Imada (2005). The critical value determines the minimum mMd needed for a multivariate response to be considered experimentally significant compared to the DMSO control response. However, the Type I error rate on which this critical value is based (α = 0.01) is considered approximate. Through this data simulation, we determined the observed Type I error rate of the mMd metric by counting the number of times the calculated maxmMd value from “true negative” DMSO control data was larger than the defined critical value at either an approximate α of 0.05 or 0.01. As shown in Table 1, the true negative concentration-response elicited a Type I error rate slightly below the approximate value defined by the critical value regardless of the covariance matrix type used in the mMd calculation. In general, the low and global covariance matrix types were closer to the approximate Type I error rate than the high covariance type.

Table 1. mMd Type I error rate of true negative response in HT-H295R

Columns give the observed error rate for each covariance type.

| Error Rate (α) | Low | Global | High |
|---|---|---|---|
| 0.05 | 0.015 | 0.019 | 0.012 |
| 0.01 | 0.007 | 0.008 | 0.006 |

Along with generating a true negative response, we also simulated ten diverse hormone response profiles at four different effect sizes (1.1, 1.5, 2, and 2.5-fold change in hormone concentrations compared to control) to better define the power of the mMd method to detect varying experimentally significant responses. Hormone responses with effect sizes of 2-fold or larger (interpreted bidirectionally, as an increase or decrease) had a power >0.99 to detect a maxmMd above the 0.01 critical value for all tested response profiles (Figure 5). For all response profiles except an effect on both E2 and E1, a 1.5-fold effect size yielded a power of 0.8 or greater. The power for most hormone response profiles was not affected by covariance type; however, the high covariance type had the largest impact on the power to detect experimental significance for several response profiles, including effects on both estrogens, both mineralocorticoids, E2 alone, E1 alone, and TESTO alone (Figure 5).

Figure 5. Power analysis.

As part of the MVN simulation, we generated ten different theoretical hormone response profiles at varying effect sizes (1.1, 1.5, 2, and 2.5-fold increases compared to a DMSO control sample), and calculated mMds for these profiles using the three covariance types. The percentage of maxmMd values above the pre-defined critical value at α = 0.01 (i.e. maxmMd ≥ 1.64) was considered the power to detect a maxmMd with experimental significance. Red, green, and blue correspond to the low, global, and high covariance type, respectively. Circles and triangles denote effect sizes of 1.1- and 1.5-fold, respectively. Note that effect sizes of 2.0- and 2.5-fold are not included in the figure because these effect sizes had a power of ≥0.999 for all response profiles. Dashed line indicates a power of 0.8. Definitions for the response profile types are as follows: All: all measured hormones increased; Some: CORT, 11-DCORT, ANDR, TESTO, E1, and E2 increased; Glucocorticoids: 11-DCORT and CORT increased; Mineralocorticoids: DOC and CORTIC increased; Androstenedione: ANDR increased; Testosterone: TESTO increased; Androgens: ANDR and TESTO increased; Estrone: E1 increased; Estradiol: E2 increased; Estrogens: E1 and E2 increased.

Response of Putative Aromatase Inhibitors in HT-H295R

Two approaches were taken to understand the capability of HT-H295R assay data to identify putative aromatase inhibitors. The first approach involved naïve identification of chemicals that decreased E1 and E2, followed by examination of whether these identified chemicals included recognized aromatase inhibitors. Note that chemicals that may have induced E1 and/or E2 are excluded from this analysis and visualization in order to isolate putative aromatase inhibitors for this example. The second approach integrated information from previous HTS assay data on putative aromatase inhibitors to characterize whether the HT-H295R assay identified experimentally-defined aromatase inhibitors as positives.

Based on significant decreases in E1 and E2 concentrations, using a 1.5-fold efficacy cutoff, we identified a set of 72 chemical samples (out of 766 samples total) that we considered putative aromatase inhibitors in HT-H295R. Fold changes in hormone concentrations for the 72 chemical samples were visualized as a heatmap (Figure 6). In this case, chemical samples were hierarchically clustered using only the hormones directly affected by the aromatase enzyme: ANDR, TESTO, E2, and E1. This clustering identified four prominent clusters: cluster 1, with strong negative effects on both estrogens and androgens; cluster 2, with strong negative effects on the androgens and decreases in the estrogens at higher concentrations; cluster 3, with more moderate negative effects on androgens and estrogens; and cluster 4, with modest decreases in estrogens and small decreases in the androgens. Five of the 72 chemical samples that elicited a phenotype indicative of aromatase inhibition are known reference aromatase inhibitors: chrysin, ketoconazole, clotrimazole, letrozole, and fadrozole hydrochloride (Judson et al., 2018b). Two of these, fadrozole hydrochloride and letrozole, are pharmacological inhibitors of aromatase and were in the cluster with strong inhibitory effects on both estrogens and androgens. Ketoconazole and clotrimazole are azole antifungals, with known effects on aromatase as well as other steroidogenesis-relevant enzymes further upstream such as CYP17A1, and were in the cluster with strong inhibitory effects on androgens and decreases in estrogens at high concentrations (Ayub and Levell, 1987; Trosken et al., 2006). The last reference chemical, chrysin, is a plant-derived compound and was a member of the cluster with moderate effects on androgens and estrogens (Kao et al., 1998). 
Expanding to the rest of the steroidogenesis pathway represented in HT-H295R, including progestogens and corticosteroids, we observed a mixture of different response patterns for these chemical samples. For the two chemical clusters with strong effects on estrogens and androgens, similar inhibitory patterns for OH-PROG, 11-DCORT, and CORT were apparent, particularly at high test concentrations.

In a second approach, we characterized how available HTS methods for aromatase inhibition behaved in aggregate. Previous Tox21 screening and confirmatory assay work (Chen et al., 2015) suggested a list of 50 putative aromatase inhibitors, of which only a small subset (17) were screened in the HT-H295R assay and the ToxCast NVS aromatase assay (NVS_ADME_hCYP19A1). As shown in Table 2, both HT-H295R and NVS assays identified the putative aromatase inhibitors. In the HT-H295R assay, 12 of 17 screened putative aromatase inhibitors were positive for ≥ 1.5-fold decreases in E2 and E1, one was positive for a ≥ 1.5-fold decrease in E2 only (imazalil), one was positive for a ≥ 1.5-fold decrease in E1 only (4,4’-oxydianiline), one had ≥ 1.5-fold increases in E2 and E1 (cyproconazole), and two did not have ≥ 1.5-fold increases or decreases in E2 and E1 (propiconazole and tetraconazole), though they were mMd positives, with maxmMds of 6.83 and 3.51, respectively. Of the 17 putative aromatase inhibitors in NVS and HT-H295R, 12 were positive in NVS. Thus, we may have increased confidence for chemicals with E2 and/or E1 decreases and an NVS_ADME_hCYP19A1 positive.

Table 2. Putative Aromatase Inhibitors from Tox21/ToxCast HTS and HT-H295R Assays.

Available HT-H295R and NVS hCYP19A1 assay data are summarized for seventeen aromatase positives from Tox21 screening

Name Cas No. maxmMd HT-H295R E1a HT-H295R E2a HT-H295R Fold Changeb HT-H295R Aromatase Positivec NVS Aromatase Positived Tox21 Aromatase Positivee
4,4’-Oxydianiline 101-80-4 18.35 - Yes Yes No Yes
Clotrimazole 23593-75-1 18.35 Yes Yes No Yes
Cyproconazole 94361-06-5 8.58 Yes No Yes Yes
Epoxiconazole 133855-98-8 14.68 Yes Yes Yes Yes
Fadrozole hydrochloride 102676-31-3 16.90 Yes Yes Yes Yes
Fenarimol 60168-88-9 16.70 Yes Yes No Yes
Flusilazole 85509-19-9 9.27 Yes Yes Yes Yes
Imazalil 35554-44-0 22.62 Yes Yes Yes Yes
Letrozole 112809-51-5 14.02 Yes Yes Yes Yes
Myclobutanil 88671-89-0 16.43 Yes Yes Yes Yes
Prochloraz 67747-09-5 18.56 Yes Yes Yes Yes
Propiconazole 60207-90-1 5.14 - - No No Yes Yes
SSR150106 NOCAS_47362 9.06 Yes Yes No Yes
Tetraconazole 112281-77-3 3.41 - No No Yes Yes
Triadimenol 55219-65-3 17.25 Yes Yes Yes Yes
Triflumizole 68694-11-1 17.19 Yes Yes Yes Yes
Vatalanib dihydrochloride 212141-51-0 8.34 Yes Yes No Yes
a. Arrows for E1 (estrone) and E2 (estradiol) indicate direction of effect on the hormone in the HT-H295R system based on a minimum 1.5-fold change (see Methods). A “-” indicates no change.

b. HT-H295R Fold-Change indicates that the chemical was positive (Yes) or negative (No) for meeting a statistically significant fold-change criterion (>1.5-fold change; ANOVA p-value ≤ 0.05).

c. HT-H295R Aromatase Positive indicates that the chemical would have been flagged for effects on estrogens consistent with an aromatase inhibitor on the basis of the HT-H295R data (i.e., E1 and/or E2 significantly decreased).

d. NVS aromatase positives were pulled from the level 5 information for the assay in invitrodb (where positive is equivalent to hitcall of 1).

e. All of the chemicals in this table were Tox21 Aromatase positives from Chen et al. (2015).

Rank Prioritization of Steroidogenesis Disruptors Using the mMd Metric and ToxCast Viability Assays

To determine which chemical samples had the most selective activity within the HT-H295R assay based on the mMd metric, and not due to non-specific bioactivity from mitochondrial toxicity or cytotoxicity, we compared benchmark dose (BMD) values calculated from the mMd concentration-response curves to the parallel HT-H295R cell viability assay as well as ToxCast/Tox21 cytotoxicity assays. A selectivity scoring metric was designed to prioritize chemical samples for which the BMD values were less than the minimum of the cytotoxicity burst value as defined by Judson et al. (2016) or the concentration at the allowed cell viability threshold (70%) in the parallel MTT assay. Chemical samples with a selectivity score ≥0.5 log10 units, i.e. the BMD was at least 0.5 log10 units lower than the cytotoxicity or MTT values, were considered selective for the HT-H295R assay. A total of 428 of the 709 chemical samples that were considered a maxmMd positive (maxmMd ≥ critical value at α = 0.01) and had sufficient ToxCast assay coverage to calculate the cytotoxicity burst value (Judson et al., 2016) were considered selective actives in HT-H295R. Since the BMD value is based on the calculated critical mMd value, there are some instances where all mMd values for a test chemical are above the critical value but the maxmMd is small. In this case, a test chemical would be considered an active having high selectivity, but low efficacy. To account for this possible discrepancy, we also estimated the AUC for the mMd concentration-response curves as a measure of efficacy. In general, we found that the maxmMd was more correlated with the AUC than with the selectivity score (Supplemental Table 1; Supplemental Figure 3). Chemical samples that passed the selectivity filter were then ranked according to the AUC (Supplemental Figure 4). The 25 most and least efficacious chemical samples that were also selective are shown in Figure 7. 
The chemical samples ranked the highest using this hybrid approach included hormones, pharmaceuticals, and insecticides, among others.
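
The two-step prioritization (filter on selectivity, then rank by AUC) can be sketched as follows. The record layout, function name, and sample values are hypothetical; only the 0.5 log10 unit cutoff comes from the text.

```python
def rank_selective(samples, cutoff=0.5):
    """Keep samples with selectivity >= cutoff and sort them by AUC, largest first.

    samples: list of dicts with "name", "selectivity", and "auc" keys
        (hypothetical record layout).
    """
    selective = [s for s in samples if s["selectivity"] >= cutoff]
    return sorted(selective, key=lambda s: s["auc"], reverse=True)

# Illustrative values only; these are not results from the study.
samples = [
    {"name": "chem_A", "selectivity": 2.1, "auc": 14.0},
    {"name": "chem_B", "selectivity": 0.2, "auc": 30.0},  # fails the selectivity filter
    {"name": "chem_C", "selectivity": 0.5, "auc": 22.5},
    {"name": "chem_D", "selectivity": 1.3, "auc": 3.2},
]
ranked_names = [s["name"] for s in rank_selective(samples)]
```

In this toy example chem_B has the largest AUC but is excluded because its activity falls too close to the cytotoxicity range.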

Figure 7. Selectivity, area under the curve, and maxmMd as context for prioritization.

The most (top 25) and least (bottom 25) efficacious selective chemicals are illustrated in Figure 7 (for all, see Supplemental Figure 5 and Supplemental Table 1). The BMD is the potency at the critical value for the mMd curve (red); the MTTacc is the potency at the threshold for a positive response in that assay (green); and the cytoburst is the potency value for the lower bound of a cytotoxicity distribution (blue). Selectivity (minimum of cytoburst or MTTacc minus the BMD), AUC (area under the curve for mMd versus concentration), and maxmMd are shown to the right.

Discussion

Developing screening strategies to rapidly and cost-effectively prioritize chemicals for effects on the endocrine system is an important challenge facing the global regulatory community, and high-throughput screening technologies are becoming a component of strategies to inform and/or prioritize chemical safety assessment (ECHA, 2016; ECHA, 2017; Friedman et al., 2019; Health Canada, 2016; USEPA, 2019). Thus far, endocrine testing programs have focused on chemical bioactivity at the level of the nuclear receptor, particularly for the estrogen (Browne et al., 2015; Judson et al., 2015) and androgen receptors (Browne et al., 2015; Kleinstreuer et al., 2017; Rotroff et al., 2013). In addition, the EDSP incorporates in vitro and in vivo steroidogenesis assays in its tiered screening strategy, including the USEPA H295R guideline assay (USEPA, 2009; USEPA, 2017). These assays are low-throughput, and the H295R guideline assay examines chemical disruption of TESTO and E2 only, omitting additional hormones in the steroidogenesis pathway that could bolster interpretation of changes in E2 and TESTO in H295R cells. The HT-H295R assay is a demonstration of an informative high-throughput method to rapidly screen large numbers of chemicals for effects on the steroidogenesis pathway, providing a readout of 11 hormones and intermediates that covers the four major classes of hormones produced in humans (Karmaus et al., 2016). A multivariate statistical approach utilizing the Mahalanobis distance was applied to HT-H295R data to provide a data-driven statistical metric for chemical prioritization (Haggard et al., 2018). 
Herein, we furthered this novel approach by addressing key questions regarding the stability and reproducibility of the mMd values, the sensitivity of the approach, the ability of the HT-H295R assay to detect aromatase inhibitors as a class of steroidogenesis disruptor, and a proof-of-concept on how to use the mMd metric with additional contextual information for prioritization of chemicals. We examined the robustness of the mMd approach via characterization of the effects on mMd of different covariance and correlation relationships among hormones. To better understand the false positive rate and sensitivity of this assay to fold-changes in hormones, the Type I error rate and power of the mMd method to determine experimental significance were evaluated. Furthermore, we examined whether the HT-H295R assay’s inclusion of hormones beyond TESTO and E2 allowed for the identification of specific response profiles indicative of aromatase inhibition. Lastly, we incorporated additional cytotoxicity measures, and combined efficacy and potency estimates for the mMd analysis, to better prioritize bioactivity of chemicals in HT-H295R.

To increase confidence that the mMd metric is useful for screening level assessment, we explored some key statistical assumptions underlying the method using a large data simulation; namely, we needed to understand how reproducible the mMd values are, what the power of the mMd approach is to detect fold-changes in hormones (increases and/or decreases), and what the observed Type I error rate (false positive rate) might be. One of the main assumptions for using the Mahalanobis distance to calculate multivariate distances is the presence of substantial correlation or covariance among the multivariate measures. As described earlier, the 656 chemicals tested in multi-concentration largely demonstrated significant activity for three or more hormones from the original single concentration HT-H295R chemical screen. Since this dataset was biased towards steroidogenesis positive chemicals due to the use of a tiered screening approach, a hypothesis to address was: did the disproportionately high number of positive chemicals in the dataset influence approximations of the covariance matrix used, which heavily influenced the mMd obtained? If we had screened a different chemical set, would the covariance matrix approximation have been different? Data simulation allowed us to explore the effects of different datasets on the covariance between hormones, and further how this may affect the mMd values. In short, this helped inform conclusions regarding the stability of the original mMd values from Haggard et al. (2018). As demonstrated in Figure 3 and Supplemental Figure 1 for the original HT-H295R data, the pairwise correlation of the hormone covariances, as well as the variances, were largely similar across the low (with a covariance matrix derived from “low”-responding actives), global (with a covariance matrix derived from all chemical samples), and high (with a covariance matrix derived from “high”-responding actives) covariance types. 
This suggests the underlying covariance of the hormone measures in the HT-H295R assay is not heavily influenced by the chemical effects observed in screening; rather, the covariance between hormones may reflect some fundamental biological aspect of the system of hormone synthesis in H295R cells. This similarity in covariance between hormone measures for the three data subsets of the original and simulated HT-H295R datasets was reflected in the calculated mMd values in which the median values, even in the presence of sampling noise, were still very good estimates of the original mMd values (Figure 4A). We did observe a minor overall decrease in mMd values when using the high covariance type. As the Mahalanobis distance tends to diminish the significance of changes in measures with very large covariances, it is possible that the high covariance type, despite having a similar correlation in the covariance of hormone measures, has an overall larger magnitude in covariances compared to the low and global covariance types. The diagonal of the correlation matrix shown in Figure 3 indicates the mean variance of the 11 hormone measures across the simulations. We did observe that the high covariance type had slightly larger variances for E2, E1, 11-DCORT, CORT, ANDR, TESTO, DOC, and PROG (Figure 3); and although these differences appear small, they could contribute to the slightly decreased mMd values we observed. Given that the covariance (and variance) observed is stable during data simulation experiments, we conclude that the Mahalanobis distance-based methods that rely on the covariance matrix appear to be appropriate.

The definition of a critical value as a threshold signifying experimental significance is important for the mMd as a prioritization metric; often, users of this HT-H295R mMd model will want to understand in binary whether a chemical was positive or negative, in addition to the mMd values obtained. Here, we utilized a multivariate multiple comparisons procedure first defined by Nakamura and Imada (2005) to establish the minimum mMd value to be considered statistically significant compared to the control response. To calculate this threshold, the variance-covariance matrix is assumed to be known and the sample size is set to the median sample size across all concentrations of a test chemical. Due to these requirements, the nominal Type I error rate is considered an approximation. As part of the data simulation study, we sought to determine how well the critical value approximated the Type I error rate at α = 0.05 and 0.01 by using a true null HT-H295R concentration-response profile. As shown in Table 1, across all covariance types and α levels, the observed Type I error rates were similar. As we found that the approximate Type I error rate (α = 0.01) was slightly higher than the observed false positive rate in the data simulation, this indicated that the use of a 1% error rate was reasonable for establishing the critical value that sets a threshold for positive mMd values. Thus far, most guideline and experimental studies using H295R cells to measure chemical effects on hormone production rely on parametric and non-parametric analyses of variance to identify significant differences compared to control, many incorporating some form of efficacy threshold for activity (Hecker et al., 2011; Higley et al., 2010; Ohlsson et al., 2009; Strajhar et al., 2017; Tremoen et al., 2014). 
Having established that the Type I error rate used in the calculation of the critical value is a good approximation, it is important to further define the level of change in hormone concentrations required to detect experimental significance in HT-H295R using the mMd approach.
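The logic of checking a critical value against a true null profile can be sketched as a small Monte Carlo experiment. The example below is a simplified stand-in, not the Nakamura and Imada (2005) procedure: it calibrates a critical value as an empirical quantile of the maximum distance under the null and then estimates the false positive rate on independent null simulations. The covariance values, number of concentrations, replicate count, and function names are all hypothetical.

```python
import math
import random

def cholesky(a):
    """Lower-triangular L with L * L^T == a (symmetric positive definite input)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L z = b by forward substitution."""
    z = []
    for i, v in enumerate(b):
        z.append((v - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def max_mmd_null(rng, L, n_conc, n_rep):
    """Maximum scaled distance over concentrations when no true effect exists."""
    p = len(L)

    def draw_mean():
        # Mean of n_rep draws from a multivariate normal with covariance L * L^T.
        total = [0.0] * p
        for _ in range(n_rep):
            z = [rng.gauss(0.0, 1.0) for _ in range(p)]
            for i in range(p):
                total[i] += sum(L[i][k] * z[k] for k in range(i + 1))
        return [t / n_rep for t in total]

    control = draw_mean()
    best = 0.0
    for _ in range(n_conc):
        delta = [t - c for t, c in zip(draw_mean(), control)]
        w = solve_lower(L, delta)
        best = max(best, math.sqrt(sum(v * v for v in w) / p))
    return best

def empirical_type1(alpha, n_sim=2000, seed=1):
    """Calibrate a critical value on null data, then test it on fresh null data."""
    rng = random.Random(seed)
    cov = [[0.04, 0.03, 0.00],
           [0.03, 0.04, 0.00],
           [0.00, 0.00, 0.09]]  # hypothetical, assumed known
    L = cholesky(cov)
    calib = sorted(max_mmd_null(rng, L, 6, 2) for _ in range(n_sim))
    crit = calib[int(math.ceil((1 - alpha) * n_sim)) - 1]
    hits = sum(max_mmd_null(rng, L, 6, 2) > crit for _ in range(n_sim))
    return crit, hits / n_sim

crit, rate = empirical_type1(0.01)
print(crit, rate)
```

With a well-calibrated threshold, the observed rate should sit near the nominal α; comparing the two is the same kind of check reported in Table 1, although the threshold there comes from the analytical procedure rather than from simulation.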

As the mMd reduces all hormone measures into a single value, it is necessary to examine a diverse set of possible hormone response patterns to determine whether the mMd metric is more responsive to certain hormone changes over others. We found that, across nine theoretical combinations (all, some, glucocorticoids, mineralocorticoids) or single hormones (ANDR, TESTO, E2, E1), there was sufficient power (power > 0.8) to detect a 1.5-fold change (increase or decrease) in hormone(s) (Figure 5). As the mMd approach adjusts for covariance, it is not surprising that some hormone responses have decreased power compared to others. The only combination with reduced power (~0.6) to detect a 1.5-fold change was the E1 and E2 combination. In general, the mMd calculation decreases the significance of changes in hormones that have large correlation in covariance. E2 and E1 had the largest correlation in covariance compared to all other measures; therefore, a larger fold-change is required to result in a significant mMd for these two hormones. However, a 2-fold change in any hormone, alone or in combination, demonstrated a power that approached 1. The OECD inter-laboratory validation of the low-throughput H295R assay suggested that, based on inter-laboratory variability, a threshold for positive responses for TESTO and E2 likely approached 1.5-fold and 2-fold, respectively (Hecker et al., 2011). As a result, the power of the HT-H295R mMd analysis is similar to the low-throughput version of the assay with only TESTO and E2 measures. An important conclusion is that the mMd approach for the HT-H295R dataset appears to have a reasonable false positive rate approximated at 1% and sufficient power to observe positive mMd responses when there are 1.5- to 2-fold changes.
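The reduced power for the E1 and E2 combination follows from how the Mahalanobis distance treats correlated measures. The sketch below uses the closed-form inverse of a two-by-two covariance matrix with equal variances; the variance, the correlation of 0.9, and the function name are hypothetical values chosen for illustration, not estimates from the assay.

```python
import math

def mahalanobis_2d(delta, var, rho):
    """Mahalanobis distance for two measures with equal variance and correlation rho.

    Uses the closed-form inverse of [[var, rho*var], [rho*var, var]].
    """
    d1, d2 = delta
    q = (d1 * d1 - 2.0 * rho * d1 * d2 + d2 * d2) / (var * (1.0 - rho * rho))
    return math.sqrt(q)

shift = math.log2(1.5)  # a 1.5-fold change on the log2 scale
var = 0.1               # hypothetical variance shared by both measures

# Both hormones change in the same direction.
d_uncorrelated = mahalanobis_2d((shift, shift), var, 0.0)
d_correlated = mahalanobis_2d((shift, shift), var, 0.9)
# The hormones change in opposite directions despite strong positive correlation.
d_opposed = mahalanobis_2d((shift, -shift), var, 0.9)

print(d_uncorrelated, d_correlated, d_opposed)
```

A concordant change in two strongly correlated measures lies along the direction in which the data already vary the most, so it yields a smaller distance than the same change in uncorrelated measures, whereas a discordant change yields a much larger one.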

The data generated using HT-H295R represents the largest chemical screen to date using H295R cells. This offers a unique opportunity to determine whether specific classes of steroidogenesis disruptors are detected in this model system. One widely studied mechanism of steroidogenesis disruption, which was incorporated into the EDSP Tier 1 screening strategy (USEPA, 2016), is inhibition of aromatase (CYP19A1) activity. Due to its regulatory and therapeutic relevance and because the synthesis of estrogens in HT-H295R is reliant on aromatase activity, we chose to examine the ability of the HT-H295R assay to identify putative aromatase inhibitors as a case study of how data from this assay can be used to classify chemicals with specific steroidogenic bioactivity. Reduction of the 11 hormone measures into a single value using the mMd method appears to be a good ranking metric and indicator of pathway-level effects but does not directly indicate mechanism or hormone class affected (though this may be inspected in the dataset). Inhibition of aromatase is a mechanism of action of interest, with two of the five in vitro assays in the EDSP Tier 1 screening battery targeting this enzyme, one of which is the low-throughput H295R guideline assay (USEPA, 2016). Within HT-H295R, aromatase inhibition should theoretically cause decreases in E2 and E1 concentrations and increases in ANDR and TESTO concentrations, with unknown effects on the remainder of the measured hormones. Filtering the HT-H295R dataset for this steroid response pattern only identified two chemical samples; therefore, we relaxed our search to only significant decreases in E2 and/or E1. We found that 72 of the 766 chemical samples matched this response pattern.
Contrary to what was expected, clustering the response profiles of these chemicals showed a prominent response of decreases in both the measured estrogens and androgens and appeared to group chemicals based on increasing potency and efficacy of disruption in these hormones; i.e. most potent disruptors in the bottom cluster and least potent disruptors in the top cluster (Figure 6). Another study of the activity of aromatase inhibitors, including letrozole and ketoconazole, observed similar changes in both E2 and T concentrations in H295R cells (Higley et al., 2010). As observed by Higley et al. (2010), except at their maximum tested concentration of 100 μM, letrozole did not have a strong effect on androgen levels. This contrasts with the other pharmacological aromatase inhibitor, fadrozole hydrochloride, as well as the remaining chemical samples within the bottom cluster of Figure 6, which resulted in significant decreases in both the estrogens and androgens in our study. It may be that an expanded concentration range is needed to better define the concentration-dependent bioactivity of letrozole on estrogen and androgen levels in HT-H295R, since significant bioactivity on E1 and E2 was observed at all exposure concentrations. The two conazole reference chemicals, ketoconazole and clotrimazole, are known to inhibit aromatase as well as other steroidogenic enzymes, including CYP17A1 (Ayub and Levell, 1987; Trosken et al., 2006). Although we highlight known reference aromatase inhibitors in Figure 6, it should be noted that there were many other conazoles, including myclobutanil, epoxiconazole, prochloraz, triadimenol, triadimefon, and tebuconazole, as well as other drugs such as danazol, metyrapone, and lynestrenol that were members of the two bottom clusters. This qualitative analysis demonstrates that known and suspected aromatase inhibitors appear to cause decreases in both estrogens and androgens, with variable effects for the other hormones.
The HT-H295R assay as implemented can identify aromatase inhibitors (i.e. maxmMd > critical value), but the myriad responses across the rest of the hormone measures for these chemicals suggest that proper mechanistic evaluation of chemical effects requires multiple timepoints to better model the complex and dynamic steroidogenesis pathway present in H295R cells (Breen et al., 2011; Breen et al., 2010; Saito et al., 2016). The lack of a standard pattern observed for putative aromatase inhibitors extends to all chemicals tested in HT-H295R. In total, 608 of the 766 unique chemical samples tested had unique hormone response patterns as measured by ANOVA (Supplemental Data 2), providing support that specific mechanistic hypotheses using the HT-H295R assay would require additional information regarding the kinetics of steroidogenesis and incorporation of orthogonal assays.
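The strict and relaxed response-pattern filters described in this case study can be sketched against the directional hit-call coding used in Supplemental Data 2 (1 for a significant increase, −1 for a significant decrease, 0 for no significant response). The sample names, the dictionary layout, and the function name below are hypothetical and do not reflect the actual file structure.

```python
def matches_aromatase_pattern(calls, strict=True):
    """Check a sample's directional hit calls against an aromatase-inhibition pattern.

    calls maps hormone abbreviations to -1, 0, or 1.
    strict:  E1 and E2 both decreased AND ANDR and TESTO both increased.
    relaxed: a significant decrease in E1 and/or E2.
    """
    estrogens_down = [calls.get(h, 0) == -1 for h in ("E1", "E2")]
    androgens_up = [calls.get(h, 0) == 1 for h in ("ANDR", "TESTO")]
    if strict:
        return all(estrogens_down) and all(androgens_up)
    return any(estrogens_down)

# Hypothetical hit-call records for three samples.
hit_calls = {
    "sample_A": {"E1": -1, "E2": -1, "ANDR": 1, "TESTO": 1, "CORT": 0},
    "sample_B": {"E1": -1, "E2": -1, "ANDR": -1, "TESTO": -1, "CORT": 0},
    "sample_C": {"E1": 0, "E2": 0, "ANDR": 0, "TESTO": 1, "CORT": -1},
}

strict_hits = [s for s, c in hit_calls.items() if matches_aromatase_pattern(c)]
relaxed_hits = [s for s, c in hit_calls.items()
                if matches_aromatase_pattern(c, strict=False)]
print(strict_hits)
print(relaxed_hits)
```

In this toy data, sample_B resembles the profile reported above for many putative inhibitors, with decreases in both estrogens and androgens, so it is captured only by the relaxed filter.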

The quantification of 11 hormones in the HT-H295R assay enables identification of chemicals with diverse bioactivity on steroidogenesis, more so than solely E2 and TESTO measures in the H295R guideline assays. Additionally, including E1 and ANDR added greater certainty to conclusions of chemical effects on estrogen and androgen production, especially with the observation that these intermediates had strong, positive covariance with their terminal steroid products (Figure 3; Supplemental Figure 1). Indeed, there are many non-guideline studies examining chemical effects across the steroidogenesis pathway present in H295R (Ahmed et al., 2018; Nielsen et al., 2012; Strajhar et al., 2017). Overall, this case study examining putative aromatase inhibitors highlights how HT-H295R data and the mMd metric presented here and previously (Haggard et al., 2018) can assist in identifying and prioritizing chemicals with specific bioactivities of interest such as disruption in glucocorticoid synthesis via disruption of CYP21A2 or HSD3B1, for example.

Development of a ranked prioritization metric should account for non-specific bioactivity and factor in potency and efficacy estimates for the target of interest, and so we demonstrate an approach for using the mMd prioritization metric in this way. As mentioned previously, the HT-H295R assay should be sensitive to cytotoxicity and mitochondrial dysfunction, and when the assay was conducted, a parallel MTT assay was performed to limit the upper concentration considered in the HT-H295R assay. The MTT assay, an indicator of mitochondrial toxicity and cell viability, may be confounded by potential dye reduction in other cellular compartments (Meyer et al., 2018). However, in ToxCast there are many cytotoxicity assays in a variety of cell types that could confirm a cytotoxicity threshold, and so these data were considered herein. A current limitation for this work is a lack of highly specific mitochondrial toxicity assays in ToxCast/Tox21 to identify mitochondrial toxicants at lower concentrations than cytotoxicity. The hybrid ranking approach used in this study incorporates the cytotoxicity “burst” estimates from ToxCast, the parallel HT-H295R cell viability assay (70%), and the BMD and AUC estimates from the mMd concentration-response curves of the chemical samples. We found that selectivity scoring alone was not a sufficient ranking metric. This appears to be a result of lack of coverage in the ToxCast cytotoxicity assays for some chemicals as well as artificially low BMD estimates that occur when the lowest exposure concentration has a mMd value above the critical value but the chemical demonstrated limited efficacy. However, filtering chemical samples that had BMDs within the cytotoxicity ‘burst’ range using the selectivity score, and then ordering by the AUC estimates, appeared to efficiently rank active chemicals. The highest ranked chemicals included chemicals that were identical to hormone measures in HT-H295R (e.g. ANDR, E1, and CORTIC), estrogen analogues (i.e. 17α-estradiol and the equine estrogen, equilin), the phytoestrogen daidzein, and drugs with reproductive health applications (i.e. danazol and mifepristone; Figure 7). Two statin drugs, pitavastatin calcium and cerivastatin sodium, cholesterol synthesis inhibitors that block HMG-CoA reductase activity, and melatonin, which helps regulate circulating hormone levels due to its roles in circadian rhythms (Qin et al., 2015; Wu et al., 2001), were also ranked highly. We did observe that two organophosphate pesticides, parathion and methyl parathion, were a part of this highly prioritized chemical set. Although there are some observations showing induction of aromatase activity (Laville et al., 2006) and anti-glucocorticoid receptor activity (Vrzal et al., 2015), these chemicals are a good example of how this prioritization metric can identify candidates to consider for additional study. It is important to note that the AUC and maxmMd appeared to be positively linearly correlated, in that a large maxmMd was predictive of a large AUC for a test chemical (Supplemental Figure 2). This suggests the maxmMd alone is a good indicator of both potency and efficacy, and that combining maxmMd with information on potential non-specific activity from cytotoxicity is sufficient for prioritization.
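The filter-then-order logic of this hybrid ranking can be sketched as follows. Selectivity is taken here as the minimum of the cytotoxicity burst and the parallel MTT potency minus the BMD, as described for Supplemental Figure 4; treating all potencies as log10 concentrations, keeping only samples with positive selectivity, and computing the AUC by the trapezoid rule are assumptions made for illustration, and all sample values are hypothetical.

```python
def trapezoid_auc(x, y):
    """Area under y versus x by the trapezoid rule (x in increasing order)."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0 for i in range(len(x) - 1))

def selectivity(bmd, cytoburst, mtt_acc):
    """min(cytoburst, mtt_acc) - bmd; None when no cytotoxicity estimate exists."""
    limits = [v for v in (cytoburst, mtt_acc) if v is not None]
    if not limits:
        return None
    return min(limits) - bmd

def rank_samples(samples):
    """Keep samples active below cytotoxicity, then order by AUC (largest first)."""
    kept = []
    for s in samples:
        sel = selectivity(s["bmd"], s["cytoburst"], s["mtt_acc"])
        if sel is not None and sel > 0:
            kept.append((trapezoid_auc(s["log_conc"], s["mmd"]), s["name"]))
    return [name for _, name in sorted(kept, reverse=True)]

log_conc = [-2.0, -1.0, 0.0, 1.0]  # hypothetical log10 concentrations
samples = [
    {"name": "sample_A", "log_conc": log_conc, "mmd": [0.5, 2.0, 6.0, 9.0],
     "bmd": -1.2, "cytoburst": 1.0, "mtt_acc": 1.5},
    {"name": "sample_B", "log_conc": log_conc, "mmd": [0.4, 1.0, 2.0, 3.0],
     "bmd": -0.5, "cytoburst": 0.8, "mtt_acc": None},
    {"name": "sample_C", "log_conc": log_conc, "mmd": [0.3, 0.5, 4.0, 12.0],
     "bmd": 0.6, "cytoburst": 0.2, "mtt_acc": 0.9},
]

ranking = rank_samples(samples)
print(ranking)
```

In this toy data, sample_C has a large AUC but is removed because its BMD falls above its cytotoxicity estimate, which shows why potency and efficacy summaries are combined with the cytotoxicity filter rather than used alone.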

Here, we demonstrated that the multivariate approach we previously established using the mMd to combine the hormone measures into a single value to prioritize chemicals is robust and stable. Due to the similarities in mMds, Type I error rate, and power to detect experimental significance for data with different covariance structures, it appears that the initial decisions for covariance estimation and assumptions were appropriate. The use of HT-H295R data to identify specific mechanisms of action across the whole steroidogenesis pathway remains a challenge. Hormone profiling of putative aromatase inhibitors showed a possible unique profile with respect to the direct substrates and products of aromatase, but after expanding to the rest of the hormone measures there was no obvious signature of aromatase inhibition. Further work is needed to understand the complex dynamics of the steroidogenesis pathway in H295R cells, both in the presence and absence of toxicants, to associate hormone response patterns to specific mechanistic perturbations. Despite some limitations in mechanistic interpretation of this dataset, the mMd approach for the HT-H295R data is a very useful method to prioritize chemicals for additional consideration. Accounting for non-specificity due to cytotoxicity alongside potency and efficacy further strengthens the use of the method and presents a data-driven strategy to identify chemicals with the highest potential for disruption of steroidogenesis. Together, these analyses of the HT-H295R assay suggest that the mMd is a robust, reproducible metric that can be used for prioritization, and that examination of effects on specific hormone classes, such as the estrogens, may provide a binary determinant for identification of putative hormone class disruptors. 
The novel evaluation of the HT-H295R assay and mMd approach further support its use in identification of in vitro endocrine activity, with potential to be incorporated into more comprehensive tiered strategies for screening level chemical safety assessment.

Supplementary Material

Supplement2

Supplemental Data 1. Source data Invitrodb level 0 data for the HT-H295R assay.

Supplemental Data 2. ANOVA hit call table with fold change direction. Concentration responses for hormone measures for all chemical samples were analyzed by ANOVA as previously described (Haggard et al., 2018). Binary response data were annotated based on the direction of response with a 1 indicating a significant fold increase, a −1 indicating a significant fold decrease, and 0 indicating no significant response.

Supplemental Figure 1. Variance and pairwise correlation of hormone data (based on collected data rather than data simulation). The pairwise correlation for each pairing of hormones is represented in the colored blocks of the heatmap, with positive correlation denoted by blue hues and negative correlation denoted by red hues, and high, global, and low covariance correlations listed from top to bottom in each block. The variance for each hormone using the high, global, and low subsets of data are similarly reported just under the hormone abbreviation.

Supplemental Figure 2. Boxplots of mMd distances calculated from the data simulation. Boxplots of mMd concentration response profiles across the data simulations, with high, global, and low covariance matrix types (large PDFs with one plot per chemical sample). For all plots, red, green, and blue correspond to the low, global, and high covariance type, respectively. Filled triangles represent mMd values estimated using the original dataset using the three covariance types.

Supplemental Figure 3. Comparison of prioritization metrics including area under the curve, log10-selectivity, and maxmMd. (A) Scatterplot of the maxmMd compared to the area under the curve (AUC) of the mMd concentration response by chemical sample, which shows a positive correlation of maxmMd with AUC. (B) Scatterplot of the maxmMd compared to the log10 selectivity by chemical sample. Little correlation between these values is observed, which may be indicative of chemicals with high potency but low efficacy using the selectivity metric alone.

Supplemental Figure 4. Selectivity for all tested chemicals. The BMD is the potency value at the critical value for the mMd curve (red); the MTTacc is the potency value at the threshold for a positive response in that assay (green); and the cytotoxicity burst is the potency value for the lower bound of a cytotoxicity distribution (blue). Selectivity (minimum of cytoburst or MTTacc minus the BMD), AUC (area under the curve for mMd versus concentration), and maxmMd are shown to the right.

Supplemental Table 1. Selectivity of HT-H295R bioactivity. Selectivity score based on comparison of HT-H295R active concentration to the minimum of the cytotoxicity burst or parallel MTT assay.

Highlights.

  • Data simulations suggest that covariances and variances among the 11 steroid hormones may be inherent to this assay.

  • The mean Mahalanobis distance (mMd) approach demonstrated a false positive rate of less than 1%.

  • The mMd approach has sufficient power to observe 1.5- to 2-fold changes in hormones and hormone combinations.

  • Reference aromatase inhibitors were identified using the HT-H295R assay.

  • A relative prioritization can be performed using cytotoxicity information and the maximum mMd.

Footnotes

Publisher's Disclaimer: The United States Environmental Protection Agency (U.S. EPA) through its Office of Research and Development has subjected this article to Agency administrative review and approved it for publication. Mention of trade names or commercial products does not constitute endorsement for use. The views expressed in this article are those of the authors and do not necessarily represent the views or policies of the US EPA.

1

Endocrine Disruptor Screening Program, EDSP; U.S. Environmental Protection Agency, USEPA; high-throughput H295R assay, HT-H295R; mean Mahalanobis distance, mMd; maximum mean Mahalanobis distance, maxmMd; new approach methodology, NAM; progesterone, PROG; 17α-hydroxypregnenolone, OH-PREG; 17α-hydroxyprogesterone, OH-PROG; deoxycorticosterone, DOC; corticosterone, CORTIC; 11-deoxycortisol, 11-DCORT; cortisol, CORT; androstenedione, ANDR; testosterone, TESTO; estrone, E1; estradiol, E2

2

As described in Karmaus et al. (2016) and Haggard et al. (2018), a forskolin pre-stimulation was used. However, despite pre-stimulation, chemical-mediated induction of steroid hormone synthesis is still observed with the HT-H295R assay. For example, comparison of the median DMSO control hormone response to the median response of the forskolin positive controls (10 µM) demonstrates positive fold-induction for 10 of 11 steroid hormones reported (1.99±0.16, 0.45±0.35, 1.25±0.20, 1.47±0.47, 2.93±0.50, 1.42±0.12, 1.86±0.21, 1.45±0.13, 1.42±0.09, 2.51±0.57, 2.59±0.40 log2 fold change in µM for OH-PREG, PROG, OH-PROG, DOC, CORTIC, 11-DCORT, CORT, ANDR, TESTO, E1, and E2, respectively).

3

No biological replicates were used in the data simulation because there were no biological replicates in the experimental dataset. In the experimental data, most chemicals had 1 biological replicate and 2 technical replicates, but approximately 16% of chemicals were screened with 2–3 biological replicates and these data were used to evaluate assay result reproducibility in a previous analysis (Haggard et al., 2018).

References

  1. Ahmed KEM, et al., 2018. LC-MS/MS based profiling and dynamic modelling of the steroidogenesis pathway in adrenocarcinoma H295R cells. Toxicol In Vitro. 52, 332–341.
  2. Ayub M, Levell MJ, 1987. Inhibition of testicular 17 alpha-hydroxylase and 17,20-lyase but not 3 beta-hydroxysteroid dehydrogenase-isomerase or 17 beta-hydroxysteroid oxidoreductase by ketoconazole and other imidazole drugs. J Steroid Biochem. 28, 521–31.
  3. Botteri Principato NL, et al., 2018. The use of purified rat Leydig cells complements the H295R screen to detect chemical-induced alterations in testosterone production. Biol Reprod. 98, 239–249.
  4. Breen M, et al., 2011. Mechanistic computational model of steroidogenesis in H295R cells: role of oxysterols and cell proliferation to improve predictability of biochemical response to endocrine active chemical--metyrapone. Toxicol Sci. 123, 80–93.
  5. Breen MS, et al., 2010. Computational model of steroidogenesis in human H295R cells to predict biochemical response to endocrine-active chemicals: model development for metyrapone. Environ Health Perspect. 118, 265–72.
  6. Browne P, et al., 2015. Screening Chemicals for Estrogen Receptor Bioactivity Using a Computational Model. Environ Sci Technol. 49, 8804–14.
  7. Casati S, 2018. Integrated Approaches to Testing and Assessment. Basic Clin Pharmacol Toxicol. 123 Suppl 5, 51–55.
  8. Chen S, et al., 2015. Cell-Based High-Throughput Screening for Aromatase Inhibitors in the Tox21 10K Library. Toxicol Sci. 147, 446–57.
  9. Chen S, et al., 2014. AroER tri-screen is a biologically relevant assay for endocrine disrupting chemicals modulating the activity of aromatase and/or the estrogen receptor. Toxicol Sci. 139, 198–209.
  10. De Maesschalck R, et al., 2000. The Mahalanobis distance. Chemometrics and Intelligent Laboratory Systems. 50, 1–18.
  11. Dix DJ, et al., 2007. The ToxCast program for prioritizing toxicity testing of environmental chemicals. Toxicol Sci. 95, 5–12.
  12. ECHA, New Approach Methodologies in Regulatory Science: Proceedings of a scientific workshop. In: Agency, E. C., (Ed.), 2016.
  13. ECHA, Non-animal approaches: Current status of regulatory applicability under the REACH, CLP and Biocidal Products regulations. 2017.
  14. Filer DL, et al., 2017. tcpl: the ToxCast pipeline for high-throughput screening data. Bioinformatics. 33, 618–620.
  15. Friedman KP, et al., 2019. Utility of In Vitro Bioactivity as a Lower Bound Estimate of In Vivo Adverse Effect Levels and in Risk-Based Prioritization. Toxicol Sci.
  16. Gazdar AF, et al., 1990. Establishment and characterization of a human adrenocortical carcinoma cell line that expresses multiple pathways of steroid biosynthesis. Cancer Res. 50, 5488–96.
  17. Griesinger C, et al., 2016. Validation of Alternative In Vitro Methods to Animal Testing: Concepts, Challenges, Processes and Tools. Adv Exp Med Biol. 856, 65–132.
  18. Gu Z, et al., 2016. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics. 32, 2847–9.
  19. Haggard DE, et al., 2018. High-Throughput H295R Steroidogenesis Assay: Utility as an Alternative and a Statistical Approach to Characterize Effects on Steroidogenesis. Toxicol Sci. 162, 509–534.
  20. Health Canada, Integrating New Approach Methodologies within the CMP: Identifying Priorities for Risk Assessment, Existing Substances Risk Assessment Program. In: Committee, C. M. P. C. S., (Ed.), 2016.
  21. Health Canada, Chemicals Management Plan Science Committee: Advancing consideration of endocrine-disrupting chemicals under the Canadian Environmental Protection Act, 1999. Vol. 2019, 2018.
  22. Hecker M, et al., 2011. The OECD validation program of the H295R steroidogenesis assay: Phase 3. Final inter-laboratory validation study. Environ Sci Pollut Res Int. 18, 503–15.
  23. Higley EB, et al., 2010. Assessment of chemical effects on aromatase activity using the H295R cell line. Environ Sci Pollut Res Int. 17, 1137–48.
  24. Judson R, et al., 2016. Editor’s Highlight: Analysis of the Effects of Cell Stress and Cytotoxicity on In Vitro Assay Activity Across a Diverse Chemical and Assay Space. Toxicol Sci. 152, 323–39.
  25. Judson R, et al., 2009. The toxicity data landscape for environmental chemicals. Environ Health Perspect. 117, 685–95.
  26. Judson RS, et al., 2017. On selecting a minimal set of in vitro assays to reliably determine estrogen agonist activity. Regul Toxicol Pharmacol. 91, 39–49.
  27. Judson RS, et al., 2015. Integrated Model of Chemical Perturbations of a Biological Pathway Using 18 In Vitro High-Throughput Screening Assays for the Estrogen Receptor. Toxicol Sci. 148, 137–54.
  28. Judson RS, et al., 2018a. New approach methods for testing chemicals for endocrine disruption potential. Current Opinion in Toxicology. 9, 40–47.
  29. Judson RS, et al., 2018b. Workflow for defining reference chemicals for assessing performance of in vitro assays. ALTEX.
  30. Kao YC, et al., 1998. Molecular basis of the inhibition of human aromatase (estrogen synthetase) by flavone and isoflavone phytoestrogens: A site-directed mutagenesis study. Environ Health Perspect. 106, 85–92.
  31. Karmaus AL, et al., 2016. High-Throughput Screening of Chemical Effects on Steroidogenesis Using H295R Human Adrenocortical Carcinoma Cells. Toxicol Sci. 150, 323–32.
  32. Kavlock R, et al., 2012. Update on EPA’s ToxCast program: providing high throughput decision support tools for chemical risk management. Chem Res Toxicol. 25, 1287–302.
  33. Kleinstreuer NC, et al., 2017. Development and Validation of a Computational Model for Androgen Receptor Activity. Chem Res Toxicol. 30, 946–964.
  34. Laville N, et al., 2006. Modulation of aromatase activity and mRNA by various selected pesticides in the human choriocarcinoma JEG-3 cell line. Toxicology. 228, 98–108.
  35. Mangelis A, et al., 2016. Computational analysis of liquid chromatography-tandem mass spectrometric steroid profiling in NCI H295R cells following angiotensin II, forskolin and abiraterone treatment. J Steroid Biochem Mol Biol. 155, 67–75.
  36. Meyer JN, et al., 2018. Mitochondrial Toxicity. Toxicol Sci. 162, 15–23.
  37. Miller WL, 2017. Disorders in the initial steps of steroid hormone synthesis. J Steroid Biochem Mol Biol. 165, 18–37.
  38. Miller WL, Auchus RJ, 2011. The molecular biology, biochemistry, and physiology of human steroidogenesis and its disorders. Endocr Rev. 32, 81–151.
  39. Nakamura T, Imada T, 2005. Multiple comparison procedure of Dunnett’s type for multivariate normal means. Journal of the Japanese Society of Computational Statistics. 18, 21–32.
  40. Nielsen FK, et al., 2012. H295R cells as a model for steroidogenic disruption: a broader perspective using simultaneous chemical analysis of 7 key steroid hormones. Toxicol In Vitro. 26, 343–50.
  41. OECD, 2011. Test No. 456: H295R Steroidogenesis Assay.
  42. Ohlsson A, et al., 2009. A biphasic effect of the fungicide prochloraz on aldosterone, but not cortisol, secretion in human adrenal H295R cells--underlying mechanisms. Toxicol Lett. 191, 174–80.
  43. Patlewicz G, et al., 2013. Use and validation of HT/HC assays to support 21st century toxicity evaluations. Regul Toxicol Pharmacol. 65, 259–68.
  44. Pinto CL, et al., 2018. Identification of candidate reference chemicals for in vitro steroidogenesis assays. Toxicol In Vitro. 47, 103–119.
  45. Qin F, et al., 2015. Inhibitory effect of melatonin on testosterone synthesis is mediated via GATA-4/SF-1 transcription factors. Reprod Biomed Online. 31, 638–46.
  46. Richard AM, et al., 2016. ToxCast Chemical Landscape: Paving the Road to 21st Century Toxicology. Chem Res Toxicol. 29, 1225–51.
  47. Rotroff DM, et al., 2013. Using in vitro high throughput screening assays to identify potential endocrine-disrupting chemicals. Environ Health Perspect. 121, 7–14.
  48. Rovida C, et al., 2015. Integrated Testing Strategies (ITS) for safety assessment. ALTEX. 32, 25–40.
  49. Saito R, et al., 2016. Estimation of the Mechanism of Adrenal Action of Endocrine-Disrupting Compounds Using a Computational Model of Adrenal Steroidogenesis in NCI-H295R Cells. J Toxicol. 2016, 4041827.
  50. Sipes NS, et al., 2013. Profiling 976 ToxCast chemicals across 331 enzymatic and receptor signaling assays. Chem Res Toxicol. 26, 878–95.
  51. Strajhar P, et al., 2017. Steroid profiling in H295R cells to identify chemicals potentially disrupting the production of adrenal steroids. Toxicology. 381, 51–63.
  52. Thomas RS, et al., 2018. The US Federal Tox21 Program: A strategic and operational plan for continued leadership. ALTEX. 35, 163–168.
  53. Tremoen NH, et al., 2014. Exposure to the three structurally different PCB congeners (PCB 118, 153, and 126) results in decreased protein expression and altered steroidogenesis in the human adrenocortical carcinoma cell line H295R. J Toxicol Environ Health A. 77, 516–34.
  54. Trosken ER, et al., 2006. Inhibition of human CYP19 by azoles used as antifungal agents and aromatase inhibitors, using a new LC-MS/MS method for the analysis of estradiol product formation. Toxicology. 219, 33–40.
  55. USEPA, Endocrine Disruptor Screening Program Test Guidelines: OPPTS 890.1550 : Steroidogenesis (human Cell Line-H295R). U.S. Environmental Protection Agency, Office of Chemical Safety and Pollution Prevention, 2009.
  56. USEPA, Endocrine Disruptor Screening Program for the 21st Century: (EDSP21 Work Plan) The Incorporation of In Silico Models and In Vitro High Throughput Assays in the Endocrine Disruptor Screening Program (EDSP) for Prioritization and Screening. U.S. Environmental Protection Agency, Endocrine Disruptor Screening Program, 2011.
  57. USEPA, Endocrine Disruptor Screening Program Tier 1 Battery of Assays. Vol. 2018, 2016.
  58. USEPA, Continuing Development of Alternative HighThroughput Screens to Determine Endocrine Disruption, Focusing on Androgen Receptor, Steroidogenesis, and Thyroid Pathways. U.S. Environmental Protection Agency, Endocrine Disruptor Screening Program, 2017.
  59. USEPA, Strategic Plan to Promote the Development and Implementation of Alternative Test Methods Within the TSCA Program. In: Prevention, O. o. C. S. a. P., (Ed.), 2018.
  60. USEPA, Directive to Prioritize Efforts to Reduce Animal Testing. 2019.
  61. Vrzal R, et al., 2015. Environmental pollutants parathion, paraquat and bisphenol A show distinct effects towards nuclear receptors-mediated induction of xenobiotics-metabolizing cytochromes P450 in human hepatocytes. Toxicol Lett. 238, 43–53.
  62. Wu CS, et al., 2001. Melatonin inhibits the expression of steroidogenic acute regulatory protein and steroidogenesis in MA-10 cells. J Androl. 22, 245–54.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplement2

Supplemental Data 1. Source data Invitrodb level 0 data for the HT-H295R assay.

Supplemental Data 2. ANOVA Hit call table with fold change direction Concentration responses for hormone measures for all chemical samples were analyzed by ANOVA as previously described (Haggard et al., 2018). Binary response data were annotated based on the direction of response with a 1 indicating a significant fold increase, a −1 indicating a significant fold decrease, and 0 indicating no significant response.

Supplemental Figure 1. Variance and pairwise correlation of hormone data (based on collected data rather than data simulation). The pairwise correlation for each pairing of hormones is represented in the colored blocks of the heatmap, with positive correlation denoted by blue hues and negative correlation denoted by red hues, and the correlations from the high, global, and low covariance matrices listed from top to bottom in each block. The variance for each hormone using the high, global, and low subsets of data is similarly reported just under the hormone abbreviation.

Supplemental Figure 2. Boxplots of mMd values calculated from the data simulation. Boxplots of mMd concentration-response profiles across the data simulations, with high, global, and low covariance matrix types (large PDFs with one plot per chemical sample). For all plots, red, green, and blue correspond to the low, global, and high covariance type, respectively. Filled triangles represent mMd values estimated from the original dataset with each of the three covariance types.

Supplemental Figure 3. Comparison of prioritization metrics including area under the curve, log10-selectivity, and maxmMd. (A) Scatterplot of the maxmMd compared to the area under the curve (AUC) of the mMd concentration response by chemical sample, which shows a positive correlation of maxmMd with AUC. (B) Scatterplot of the maxmMd compared to the log10-selectivity by chemical sample. Little correlation between these values is observed, which may indicate that some chemicals have high potency but low efficacy when judged by the selectivity metric alone.

Supplemental Figure 4. Selectivity for all tested chemicals. The BMD is the potency value at the critical value for the mMd curve (red); the MTTacc is the potency value at the threshold for a positive response in that assay (green); and the cytotoxicity burst is the potency value for the lower bound of a cytotoxicity distribution (blue). Selectivity (the minimum of the cytotoxicity burst or MTTacc, minus the BMD), AUC (area under the curve for mMd versus concentration), and maxmMd are shown to the right.

Supplemental Table 1. Selectivity of HT-H295R bioactivity. Selectivity score based on comparison of HT-H295R active concentration to the minimum of the cytotoxicity burst or parallel MTT assay.
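
The selectivity score described for Supplemental Figure 4 and Supplemental Table 1 can be illustrated with a minimal sketch. It assumes that all three potency values are already expressed on a log10 concentration scale, so that the subtraction yields a log10-selectivity; the function name, argument names, and example values are hypothetical and are not taken from the authors' analysis code.

```python
def log10_selectivity(bmd_log10, cytoburst_log10, mtt_acc_log10):
    """Sketch of the selectivity score: the lower of the two cytotoxicity
    potency estimates minus the BMD of the mMd curve.

    All inputs are assumed (not confirmed by the source) to be log10
    concentrations in the same unit. A larger positive result means the
    steroidogenesis response occurs further below cytotoxic concentrations.
    """
    # Use whichever cytotoxicity estimate is more potent (lower concentration).
    cytotoxicity_limit = min(cytoburst_log10, mtt_acc_log10)
    return cytotoxicity_limit - bmd_log10


# Illustrative values only; these are not data from the study.
example = log10_selectivity(bmd_log10=0.0, cytoburst_log10=2.0, mtt_acc_log10=1.5)
```

In this illustration the MTT-based value is the lower of the two cytotoxicity estimates, so it sets the limit against which the BMD is compared.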
