Abstract
Microarray profiling of chemical-induced effects is being increasingly used in medium- and high-throughput formats. Computational methods are described here to identify molecular targets from whole-genome microarray data using as an example the estrogen receptor α (ERα), often modulated by potential endocrine disrupting chemicals. ERα biomarker genes were identified by their consistent expression after exposure to 7 structurally diverse ERα agonists and 3 ERα antagonists in ERα-positive MCF-7 cells. Most of the biomarker genes were shown to be directly regulated by ERα as determined by ESR1 gene knockdown using siRNA as well as through chromatin immunoprecipitation coupled with DNA sequencing analysis of ERα-DNA interactions. The biomarker was evaluated as a predictive tool using the fold-change rank-based Running Fisher algorithm by comparison to annotated gene expression datasets from experiments using MCF-7 cells, including those evaluating the transcriptional effects of hormones and chemicals. Using 141 comparisons from chemical- and hormone-treated cells, the biomarker gave a balanced accuracy for prediction of ERα activation or suppression of 94% and 93%, respectively. The biomarker was able to correctly classify 18 out of 21 (86%) ER reference chemicals including “very weak” agonists. Importantly, the biomarker predictions accurately replicated predictions based on 18 in vitro high-throughput screening assays that queried different steps in ERα signaling. For 114 chemicals, the balanced accuracies were 95% and 98% for activation or suppression, respectively. These results demonstrate that the ERα gene expression biomarker can accurately identify ERα modulators in large collections of microarray data derived from MCF-7 cells.
Keywords: estrogen receptor, gene expression profiling, MCF-7 cell line, biomarker.
High-throughput screening (HTS) assays are an important component of chemical safety evaluation programs carried out by a number of organizations. The Environmental Protection Agency (EPA) ToxCast screening program (http://www.epa.gov/chemical-research/toxicity-forecasting) and the cross-agency Tox21 program (http://www.ncats.nih.gov/tox21) have screened more than 1800 chemicals in as many as 700 HTS assays representing approximately 350 molecular targets (Judson et al., 2014). Although the use of the HTS assays in the ToxCast screening program has proven useful in prioritizing chemicals for further testing, there is increased recognition that the assays do not sufficiently cover all potentially important pathways (Cox et al., 2014; Filer et al., 2014). To more completely assess the effects of chemicals on specific targets and pathways, approaches that better capture perturbations of molecular targets of regulatory interest in HTS formats are needed.
A complementary approach to multiple HTS assays is to use microarray-based gene expression profiling. The field of gene expression profiling is witnessing advances in methods that can readily assess partial or full-genome gene expression changes in HTS formats. A notable example is the screening program coordinated by the Broad Institute which has recently made public the Library of Integrated Network-based Cellular Signatures (LINCS) database consisting of approximately 4000 mostly pharmaceutical chemicals screened in approximately 17 cell lines using a platform that assesses the expression of approximately 1000 genes (http://www.lincsproject.org). Although the full impact of this effort on the field of chemical genomics is yet to be determined, the LINCS project was derived in part from an earlier project called the Connectivity Map (CMAP) in which a collection of genome-wide transcriptional expression data was collected from cultured human cells treated with approximately 1300 bioactive small molecules (Lamb, 2007; Lamb et al., 2006). The CMAP database and associated tools have proven useful in identification of drug candidates used to treat a number of diseases (Hurle et al., 2013). In the near future, full-genome gene expression assessment will likely be available in affordable, HTS formats. For example, the RNA-mediated oligonucleotide Annealing, Selection, and Ligation with Next-Gen sequencing (RASL-Seq) platform, which has been used to screen compounds for antiandrogenicity (Li et al., 2012), could theoretically be used to assess expression of all genes (Larman et al., 2014). Integrating gene expression profiling into HTS, if carried out in appropriate cell lines or organotypic cultures, would increase confidence that fewer chemically induced effects would be overlooked. Used in conjunction with in vitro to in vivo extrapolation approaches, points-of-departure could be derived from chemically induced perturbations in gene expression. 
Gene expression profiling could be used as a “Tier 0” assay to further prioritize targeted in vitro testing in the context of toxicity testing programs.
One of the major challenges of HTS gene expression profiling is to accurately identify modulation of specific molecular targets. Previous attempts at “connectivity mapping” or using gene expression profiles to identify biological states have had some success both in relation to drugs and diseases (Lamb et al., 2006) and to toxicology (Smalley et al., 2010). Smalley et al. (2010) developed a method to query the CMAP datasets with gene expression signatures for 3 chemical classes, including potentially endocrine-disrupting estrogens. As the identification of endocrine disrupting compounds (EDCs) is currently a high priority at the EPA, we have greatly expanded on this work and determined whether computational procedures could be developed which would identify potential EDCs that can interfere with normal endocrine signaling. One mechanism through which xenobiotics can act as EDCs is via inappropriate activation or repression of a subgroup of nuclear receptors for estrogen, testosterone and thyroid hormones. These receptors, including 2 estrogen receptors (ERα and ERβ), the androgen receptor and 2 thyroid hormone receptors (THRα and THRβ), act as ligand bound transcription factors that can be activated or repressed by chemicals resulting in altered gene expression in susceptible tissues. EDCs can also impact gene expression indirectly by interfering with the biosynthesis, metabolism or transport of activating hormones. Exposure to EDCs is a risk factor for oncogenesis and disruption of reproductive development in humans and wildlife (Diamanti-Kandarakis et al., 2009).
In the 1990s, increased recognition that man-made chemicals may interfere with endocrine functions in wildlife and humans led to legislation in the United States, eventually resulting in a mandate that the U.S. EPA develop a screening program for potential EDCs. In this program, approximately 10000 existing chemicals would be evaluated for their potential to disrupt the estrogen, androgen, and thyroid signaling systems (The Endocrine Disruptor Screening Program [EDSP]; http://www.epa.gov/endocrine-disruption). Under these guidelines, a battery of Tier 1 in vitro and short-term in vivo screening assays including those that assess nuclear receptor activity were developed to provide guidance for subsequent longer term, more definitive in vivo Tier 2 tests for endocrine disrupting activity. The EPA’s vision for the EDSP in the twenty-first century (EDSP21) includes utilization of in vitro HTS assays coupled with computational modeling to prioritize chemicals, and to eventually replace some or all of the current EDSP Tier 1 screening assays. Within the ToxCast battery, there are 18 HTS assays that have been used to evaluate the ability of chemicals to modulate ERα and ERβ (Judson et al., 2015).
ERα, like other nuclear receptor family members, regulates target gene expression through well-defined mechanisms. The classical pathway includes ligand binding by agonists followed by direct DNA binding to estrogen response elements (ERE) and modulation of gene regulation (Barone et al., 2010; Safe and Kim, 2008). Nonclassical pathways include post-translational modulation of ERα through upstream activation of a number of kinase-dependent signaling pathways. In addition, estrogens activate ERα-dependent transactivation through ERα interactions with Sp1, AP1, and other DNA-bound transcription factors (Safe and Kim, 2008; Wu et al., 2008a,b).
Previous studies have successfully linked specific gene expression profiles in developing rats to estrogenic activity as a proposed screening tool (Naciff et al., 2003; Naciff and Daston, 2004). To build on this idea of using genomic data to screen for EDCs, we developed a gene expression biomarker for ERα and tested its ability to identify estrogenic compounds in a database of microarray data. The predictive capabilities of the biomarker were determined by comparison to the expression profiles of known ERα active/inactive chemicals in the human breast cancer cell line, MCF-7. We determined whether the biomarker could serve as a surrogate for the in vitro HTS assays currently used to assess estrogenicity or antiestrogenicity of compounds through HTS ER screening programs (Judson et al., 2015; Rotroff et al., 2014).
MATERIALS AND METHODS
Strategy for identification of perturbants that modulate ERα in MCF-7 gene expression profiles
A summary of the methods used in this study is outlined in Figure 1. A screen for ERα modulators required a gene expression biomarker of ERα-dependent genes and an annotated database of gene expression profiles of statistically filtered genes (also called biosets). The ERα biomarker is a list of differentially expressed genes that are consistently altered in expression after exposure to ERα modulators. The biomarker includes fold-change values associated with each gene, derived from the average differences in expression across treatment by 7 agonists. A commercially available gene expression database (http://www.nextbio.com) facilitated the assembly of a gene expression compendium that, together with the ERα biomarker, could be used for chemical screening. The NextBio database contains over 123000 lists of statistically filtered genes from over 18800 microarray studies carried out in 16 species (as of June, 2015). Available information about each bioset was extracted from NextBio and used to populate a spreadsheet of experimental parameters. To facilitate analysis, each bioset was annotated for the general category of the perturbant (eg, hormone) and the specific name of the perturbant examined (eg, 17β-estradiol). Biosets generated from experiments in the human breast cancer cell line, MCF-7, were used in the analysis. MCF-7 cells were examined as a possible in vitro cell line for ERα screening because of the known expression levels of ER subtypes (ie, primarily ERα) and responsiveness to ERα modulators. The ERα gene biomarker was uploaded to the NextBio database and compared with all biosets in the database using the Running Fisher algorithm (Kupershmidt et al., 2010) to assess activation or suppression of ERα function. The method allows an assessment of the overlap in regulated genes between the biomarker and the bioset and whether those overlapping genes are significantly regulated in a similar or opposite manner.
Biosets which exhibit expression of biomarker genes that are positively correlated with the biomarker would be predicted to exhibit ERα activation. The activation could be due to direct agonism or occur indirectly (eg, increasing pools of estradiol). Biosets which exhibit expression of biomarker genes that are negatively correlated to the biomarker would be predicted to exhibit ERα suppression through direct or indirect mechanisms. Due to endogenous ERα activators in the growth media (eg, Sikora et al., 2012 and discussed in the results), MCF-7 cells exhibit some constitutive ERα activity that allows positively regulated genes to be downregulated in the presence of ERα antagonists. Results of the comparisons were exported and used to populate the annotated compendium with a Running Fisher test P value of each comparison and direction of correlation. Test results were used to determine the accuracy of predictions as described later. We have previously used this analysis strategy to accurately identify chemicals that activate or suppress other transcription factors (aryl hydrocarbon receptor [AhR], constitutive androstane receptor [CAR] and peroxisome proliferator-activated receptor alpha [PPARα]) (Oshida et al., 2015a,b,c).
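The Running Fisher algorithm itself is described in Kupershmidt et al. (2010) and is not reproduced here. As a much simpler illustration of the underlying idea, the following sketch scores the overlap between two gene sets with an ordinary one-sided Fisher exact (hypergeometric) test and counts whether shared genes agree in direction. The function names and the way the gene universe is supplied are assumptions made for illustration; the rank-based "running" component of the published method is omitted.

```python
from math import comb

def overlap_p_value(universe_size, set_a_size, set_b_size, overlap):
    """One-sided Fisher exact P value: probability of at least `overlap`
    shared genes if set B were drawn at random from the gene universe."""
    total = comb(universe_size, set_b_size)
    upper = min(set_a_size, set_b_size)
    tail = sum(
        comb(set_a_size, i) * comb(universe_size - set_a_size, set_b_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total

def direction_of_overlap(biomarker, bioset):
    """Count shared genes regulated in the same vs. opposite direction.
    Both arguments map gene symbol -> signed fold-change (assumed encoding)."""
    shared = biomarker.keys() & bioset.keys()
    same = sum(1 for gene in shared if biomarker[gene] * bioset[gene] > 0)
    return same, len(shared) - same
```

A bioset whose shared genes mostly move in the same direction as the biomarker would be read as positively correlated (putative ERα activation), and one whose shared genes mostly move in the opposite direction as negatively correlated (putative ERα suppression).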
Identification of differentially expressed genes in NextBio microarray datasets
All differentially regulated genes were identified using the criteria in the NextBio analysis pipeline, which are described in detail in Kupershmidt et al. (2010). Briefly, following platform-appropriate processing and normalization, statistical analysis to identify differentially expressed genes involved Welch or standard t tests with a P value cutoff of .05 (without multiple test correction) and a minimum absolute fold-change cutoff of 1.2. The CMAP database was downloaded as CMAP 2.0 build01 into NextBio. Even though there was only 1 biological replicate per chemical exposure (ie, 1 Affymetrix .cel file per treatment), statistically significant genes were identified by comparing each treatment with a group of control samples using a t test to calculate the P value with an assumption of equal variance between case and controls. Chemicals were excluded from the analysis if there was an insufficient number of corresponding controls matched to a treated sample. For the CMAP data, the 6 h treatment groups were analyzed in NextBio, which capture initial cellular responses to chemical exposure through ERα modulation.
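The two filters named above can be sketched as a single predicate. This is only an illustration of the stated cutoffs, not the NextBio pipeline: it assumes that P values have already been computed, that fold-changes are signed (for example, −1.5 for a 1.5-fold decrease), and that the cutoffs are inclusive, none of which is specified in the text.

```python
def is_differentially_expressed(p_value, fold_change,
                                p_cutoff=0.05, min_abs_fold_change=1.2):
    """True if a gene passes both the P value and the absolute fold-change filter.
    Inclusive comparisons are an assumption."""
    return p_value <= p_cutoff and abs(fold_change) >= min_abs_fold_change

def filter_genes(results):
    """results maps gene -> (p_value, signed fold_change).
    Returns gene -> fold_change for genes that pass both filters."""
    return {
        gene: fold_change
        for gene, (p_value, fold_change) in results.items()
        if is_differentially_expressed(p_value, fold_change)
    }
```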
Assembly of a compendium of gene expression experiments carried out in MCF-7 cells
Information in the NextBio database was used to build an annotated compendium of gene expression biosets derived from experiments carried out in MCF-7 cells. First, annotated information from NextBio about human-derived biosets was used to populate a master file with information about each bioset including Biodesign, Biosource, Chemical Name, Gene, Gene Mode, Phenotype, Tissue, and Study ID. Approximately 150 biosets were removed from subsequent annotation because the full name of the bioset was represented more than once in the database. The table was then filtered for biosets derived from MCF-7 cells, and these biosets were used to populate a separate table. Biosets from other cell lines treated with 17β-estradiol (E2) were also collected and used for additional subsequent comparisons in this study. Each bioset was annotated for category and name of the perturbant examined based on the name of the bioset. For example, the bioset called “MCF-7 cells + hexestrol, 14.8µM _vs_ DMSO vehicle” is in the category “Chemical” and the specific perturbant is “Hexestrol.” The bioset called “MCF-7 with siRNA disrupted ESR1_72hr _vs_ siRNA controls” is in the category “Gene” and the specific perturbant is “ESR1.” Biosets that examined more than 2 perturbants at 1 time (eg, exposure to 3 chemicals vs control) or that could not be interpreted were not used in any further analyses. The final compendium contained approximately 2200 biosets.
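The name-based annotation described above could be partly automated for biosets that follow the CMAP naming layout. The sketch below is hypothetical: the regular expression, the field names, and the decision to return `None` for names that do not match (leaving them for manual annotation) are all assumptions, and the text does not state how the annotation was actually performed.

```python
import re

# Assumed layout: "<cell line> cells + <chemical>, <dose>µM _vs_ <control>".
# Both Unicode micro signs (U+00B5 and U+03BC) are accepted.
CHEMICAL_NAME = re.compile(
    r"^(?P<cell_line>.+?) cells \+ (?P<perturbant>.+), "
    r"(?P<dose>\d+(?:\.\d+)?)\s*[\u00b5\u03bc]M _vs_ (?P<control>.+)$"
)

def annotate_bioset(name):
    """Return category and perturbant for a chemical bioset name, else None."""
    match = CHEMICAL_NAME.match(name)
    if match is None:
        return None  # not a recognized chemical name; annotate by hand
    return {
        "category": "Chemical",
        "perturbant": match.group("perturbant").capitalize(),
        "dose_uM": float(match.group("dose")),
        "cell_line": match.group("cell_line"),
    }
```

Names such as the ESR1 siRNA bioset quoted above do not fit this layout and would fall through to manual annotation.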
Identification of ERα biomarker genes
Lists of statistically filtered genes were used to derive a consensus gene expression biomarker for ERα. Biosets in NextBio used to create the biomarker include the following derived from the CMAP 2.0 dataset (Lamb et al., 2006):
“MCF-7 cells + alpha-estradiol, 0.01µM _vs_ DMSO vehicle”
“MCF-7 cells + genistein, 10µM _vs_ DMSO vehicle”
“MCF-7 cells + hexestrol, 14.8µM _vs_ DMSO vehicle”
“MCF-7 cells + mestranol, 12.8µM _vs_ DMSO vehicle”
“MCF-7 cells + estradiol, 0.01µM _vs_ DMSO vehicle”
“MCF-7 cells + diethylstilbestrol, 15µM _vs_ DMSO vehicle”
“MCF-7 cells + estrone, 14.8µM _vs_ DMSO vehicle”
“MCF-7 cells + fulvestrant, 1µM _vs_ DMSO vehicle”
“MCF-7 cells + raloxifene, 7.8µM _vs_ DMSO vehicle”
“MCF-7 cells + clomifene, 6.6µM _vs_ DMSO vehicle”
These biosets were selected because they exhibited robust gene expression changes (> 1700 statistically altered genes for each bioset); also the chemicals used were structurally diverse, and included both well-known agonists (first 7) and antagonists (last 3). The top 5000 genes with the greatest degree of overlap between all biosets were identified by the “meta-analysis” function in NextBio, and all data were exported. First, those genes which exhibited consistent expression behavior across the agonists were selected. For this filter, the genes had to consistently exhibit either up or down regulation in at least 6 of the 7 comparisons. The resulting gene list was then compared with the gene profiles of cells treated with antagonists and genes that exhibited the contrasting directionality from controls were selected. Those genes had to consistently exhibit either up or down regulation by at least 2 of the 3 comparisons. Thus, biomarker genes that were increased in expression after chemical exposure by agonists were decreased by antagonists, and genes that were decreased in expression after agonist exposure were increased by antagonists. The final list consisted of 46 genes. An average fold-change across all agonist treatments was calculated for each gene. These average fold-change values and gene abbreviations were imported into NextBio without any further filtering.
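The two-step consensus filter can be sketched as follows. The data layout (one signed fold-change per bioset, with 0 or `None` for a gene that was not statistically altered) and the choice to average over only those agonist biosets in which the gene was altered are assumptions; the text does not specify how unaltered genes entered the average.

```python
def direction(fold_change):
    """+1 for up-regulation, -1 for down-regulation, 0 if not altered."""
    return (fold_change > 0) - (fold_change < 0)

def select_biomarker_genes(agonist_fc, antagonist_fc,
                           min_agonists=6, min_antagonists=2):
    """agonist_fc / antagonist_fc map gene -> list of signed fold-changes,
    one per bioset. Returns gene -> average agonist fold-change for genes
    that are consistent across agonists and opposite under antagonists."""
    biomarker = {}
    for gene, values in agonist_fc.items():
        signs = [direction(v or 0) for v in values]
        for sign in (1, -1):
            if signs.count(sign) < min_agonists:
                continue
            opposite = [direction(v or 0) for v in antagonist_fc.get(gene, [])]
            if opposite.count(-sign) >= min_antagonists:
                altered = [v for v in values if v]  # assumption: skip unaltered
                biomarker[gene] = sum(altered) / len(altered)
    return biomarker
```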
Identification of ERα target genes
To determine putative target genes that may be directly regulated by ERα in our biomarker gene list, we analyzed multiple chromatin immunoprecipitation coupled with DNA sequencing (ChIP-seq) datasets (Cicatiello et al., 2010; Grober et al., 2011; Hu et al., 2010; Ross-Innes et al., 2010; Welboren et al., 2009) which were curated in ChIPBase (Yang et al., 2013). An additional ERα ChIP-seq dataset from MCF-7 cells, not included in ChIPBase but available in the published literature, was also examined (Joseph et al., 2010). To increase the likelihood of selecting regions that were ERα bound, we further filtered these bound regions for the presence of an ERα binding motif (Welboren et al., 2009) using the R Bioconductor package “MotifDb” (http://www.bioconductor.org/packages/release/bioc/html/MotifDb.html). In addition, we looked for distal ERα binding sites using data derived from a chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) (Fullwood et al., 2009), thereby considering non-cis ERα regulation of gene expression. The evaluation of evidence for linkage of ERα binding in association with target genes was a post hoc analysis and served to support, but not develop, the composition of the biomarker.
To determine if the level of ERα expression affects the expression of biomarker genes, biosets from experiments involving siRNA knockdown of ERα in MCF-7 cells were considered. Specifically, the expression level (fold-change) of the 46 biomarker genes was determined in the following biosets: “GSE27473: MCF7 breast cancer cells with estrogen receptor alpha siRNA_vs_control,” “GSE18431: MCF7 breast cancer cell line ESR1 shRNA_vs_control shRNA,” “GSE10061: MCF7 breast cancer cells transfected 72 hr with estrogen receptor a siRNA_vs_untransfected,” “GSE18431: MCF7 breast cancer cell line ESR1 shRNA_vs_luc1 shRNA,” “GSE10890: MCF7 with siRNA disrupted ESR1_18hr_vs_siRNA controls,” “GSE37820: Breast cancer MCF7 cell line—ESR1 siRNA_vs_control siRNA,” “GSE10890: MCF7 with siRNA disrupted ESR1_72hr_vs_siRNA controls.”
Comparison of the ERα biomarker to biosets in the MCF-7 compendium
The strategy for comparison of a biomarker to collections of biosets has been described in previous studies (Oshida et al., 2015a,b,c). Using the Running Fisher algorithm, the ERα biomarker was compared with each bioset in NextBio. The P value and direction of the correlation were exported. P values were converted to −log(P value)s and those with negative correlations were converted to negative numbers. The final list of −log(P value)s was used to populate the table containing the study characteristics of each bioset. This final master table enabled the determination of effects on ERα by categories of perturbants (eg, chemical) as well as individual perturbants (eg, genistein).
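The conversion to a signed score can be written in a few lines. The sketch assumes a base-10 logarithm (consistent with a P value cutoff of 10⁻⁴ corresponding to a score of 4) and assumes that the exported direction is encoded as the string "positive" or "negative"; the actual export format is not given in the text.

```python
import math

def signed_score(p_value, correlation):
    """Return -log10(P), negated when the bioset is negatively correlated
    with the biomarker. `correlation` is "positive" or "negative"."""
    score = -math.log10(p_value)
    return -score if correlation == "negative" else score
```

Under this convention, large positive scores indicate putative ERα activation and large negative scores indicate putative ERα suppression.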
Prediction accuracy of ERα function
Biosets from the following microarray experiments in which the MCF-7 cells were treated with hormones or chemicals with known activation of ERα were used to determine predictive accuracy: GSE10618, GSE11266, GSE11317, GSE11324, GSE11352, GSE11467, GSE11506, GSE11791, GSE13577, GSE14986, GSE15548, GSE15717, GSE20081, GSE22012, GSE2225, GSE22533, GSE23610, GSE23850, GSE24065, GSE24592, GSE25316, GSE26259, GSE26459, GSE26834, GSE27375, GSE28006, GSE30597, GSE30931, GSE32670, GSE33366, GSE3529, GSE35428, GSE38252, GSE39564, GSE39623, GSE4006, GSE4025, GSE42619, GSE43702, GSE4668, GSE46856, GSE46924, GSE48931, GSE48989, GSE50705, GSE5200, GSE5258, GSE53394, GSE57935, GSE5840, GSE6800, GSE8383, GSE8597, GSE9253, and GSE9936. Experiments are represented by their Gene Expression Omnibus (GEO) Series (GSE) number which identifies them in the GEO public data repository and also in NextBio. The number of biosets used to test for an increase in ERα function (ie, ERα “activation”) was 122 true positives (TPs) and 13 true negatives (TNs). The number of biosets used to test for a decrease in ERα function (ie, ERα “suppression”) was 19 TPs and 122 TNs. Prior studies with gene expression biomarkers for the xenobiotic receptors AhR, CAR, and PPARα showed that a cutoff of the Running Fisher algorithm P value ≤ 10⁻⁴ after a Benjamini Hochberg correction of α = .001 resulted in a balanced accuracy of 95%, 97%, and 98% for AhR, CAR, and PPARα, respectively (Oshida et al., 2015a,b,c). Applying a Benjamini Hochberg correction of α = .001 to the ERα biomarker predictions also resulted in a P value cutoff of 10⁻⁴. This cutoff resulted in a balanced accuracy of ERα activation or ERα suppression of 94% or 93%, respectively (see Results).
The values for predictive accuracy were calculated as follows: sensitivity (TP rate) = TP/(TP + FN); specificity (TN rate) = TN/(FP + TN); positive predictive value (PPV) = TP/(TP + FP); negative predictive value (NPV) = TN/(TN + FN); balanced accuracy = (sensitivity + specificity)/2; where FN = false negative and FP = false positive.
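These definitions translate directly into code. In the sketch below the function name and the returned dictionary keys are illustrative choices.

```python
def accuracy_metrics(tp, fp, tn, fn):
    """Compute the predictive accuracy measures defined above from the
    counts of true/false positives and true/false negatives."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (fp + tn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }
```

As a purely hypothetical example (the counts are invented, not taken from the study), `accuracy_metrics(tp=44, fp=0, tn=50, fn=6)` gives a sensitivity of 0.88, a specificity of 1.0, and a balanced accuracy of 0.94.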
Comparison of biomarker predictions to OECD reference chemicals
A list of ER reference chemicals was taken from the Organisation for Economic Co-operation and Development (OECD) TG457 BG1 guidance document (OECD, 2012). Of the 45 reference chemicals in the document, there were 21 chemicals represented by 98 biosets in the MCF-7 compendium, including 13 agonists, 3 antagonists, and 5 inactives. If biosets were available for more than 1 concentration for a single chemical, only the bioset with the highest |−log(P value)| was considered for this analysis.
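Selecting one bioset per chemical by the largest absolute score can be sketched as below. The record layout (chemical, bioset name, signed score) is an assumption for illustration.

```python
def strongest_bioset_per_chemical(records):
    """records: iterable of (chemical, bioset_name, signed_score).
    Keeps, for each chemical, the bioset with the largest |signed_score|."""
    best = {}
    for chemical, bioset, score in records:
        if chemical not in best or abs(score) > abs(best[chemical][1]):
            best[chemical] = (bioset, score)
    return best
```

Because the absolute value is used, a strongly negative score (suppression) is retained in preference to a weaker positive score for the same chemical.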
Comparison of biomarker predictions to ToxCast/Tox21 assay predictions
Comparisons were made between the predictions using the ERα biomarker and predictions from Judson et al. (2015) in which the results of 18 in vitro HTS assays were used to score chemicals for ER agonism or antagonism. These 18 assays included those for receptor binding, receptor dimerization, reporter gene assays and cell growth, implemented in a variety of cell types, and assay readout formats. The rationale for using this large battery of assays was to account for a variety of assay artifacts and assay interference issues that can arise when screening a very diverse set of chemicals, as well as testing chemicals up to concentrations at which cell stress and cytotoxicity can occur. Of the approximately 1800 chemicals examined in the Judson et al. (2015) study, 114 chemicals were also evaluated by transcript profiling in MCF-7 cells. Most of these biosets came from the CMAP 2.0 dataset. If there was more than 1 bioset evaluating a chemical, the bioset for the highest exposure concentration was selected to increase the probability of detection of ERα modulation using the biomarker. There were 6 biosets included as part of the evaluation that came from studies other than CMAP. Five came from GSE50705 (17α-ethinylestradiol [EE], 4-nonylphenol, bisphenol A [BPA], genistein, E2). One bioset came from GSE35428 in which 4-hydroxytamoxifen was evaluated for its ability to suppress E2-induced responses. ER scores were considered active if area under the concentration-response curves (AUC) ≥ 0.1, median-T > 50%, and median-Z-score > 3 according to Judson et al. (2015). ER scores were considered inactive if these criteria were not met.
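The activity criteria quoted from Judson et al. (2015) and the comparison with a biomarker call can be sketched as follows. The function names are hypothetical, median-T is assumed to be expressed in percent, and treating the HTS-based call as the reference when labelling each chemical is an assumption about how the comparison was tallied.

```python
def er_score_is_active(auc, median_t, median_z):
    """Criteria quoted in the text: AUC >= 0.1, median-T > 50 (percent),
    and median-Z-score > 3. Anything else is considered inactive."""
    return auc >= 0.1 and median_t > 50 and median_z > 3

def compare_calls(biomarker_active, hts_active):
    """Label one chemical, using the HTS-based call as the reference."""
    if hts_active:
        return "TP" if biomarker_active else "FN"
    return "FP" if biomarker_active else "TN"
```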
Evaluation of the effect of biomarker size on predictive ability
To determine how the size of the biomarker affects the predictive ability, shortened gene lists were derived. The 46 genes were ranked in order of decreasing absolute value of the fold-change. The genes with the lowest fold-change were removed, first removing 6 genes to create a list of 40, then removing 5 genes at a time to create lists of 35, 30, 25, 20, and 15 genes. Each new biomarker was queried against the MCF-7 compendium using the Running Fisher test. The 46 gene biomarker showed significant correlation (−log(P value) > 4) with 327 biosets; the correlation of each shortened biomarker to these 327 biosets was plotted. Linear, logarithmic and exponential trend lines were added to the graph in Excel to determine which fit resulted in the optimal R² values.
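Deriving the shortened gene lists amounts to ranking by absolute fold-change and truncating. A minimal sketch, assuming the biomarker is held as a mapping from gene symbol to average fold-change:

```python
def shortened_biomarkers(biomarker, sizes=(40, 35, 30, 25, 20, 15)):
    """biomarker maps gene -> average fold-change.
    Returns size -> sub-biomarker that keeps the genes with the largest
    absolute fold-change, i.e. the lowest-ranked genes are dropped first."""
    ranked = sorted(biomarker, key=lambda gene: abs(biomarker[gene]), reverse=True)
    return {n: {gene: biomarker[gene] for gene in ranked[:n]} for n in sizes}
```

Each shortened list is nested within the next larger one, so any loss in correlation can be attributed to the genes removed at that step.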
Additional computational analyses
Heat maps were generated using Treeview software (http://jtreeview.sourceforge.net; accessed 29 January 2014). The genes in the ERα biomarker were analyzed using the Ingenuity Pathways Analysis Core Analysis function (Qiagen). All results were exported as Excel files and filtered based on P value and ratio or activation z-score.
RESULTS
Assembly and Functional Characterization of an ERα Biomarker
To assemble a biomarker predictive of ERα modulation, gene expression comparisons (biosets) were utilized from chemically treated MCF-7 cells, a cell line which expresses ERα as the major ER subtype (Al-Bader et al., 2011; Li et al., 2014; Zivadinovic et al., 2005). As described in the Materials and Methods, the biomarker was built using biosets from MCF-7 cells treated for 6 h with 7 agonists or 3 antagonists from the CMAP 2.0 study (Lamb et al., 2006) (Figure 1). Putative ERα-regulated genes were first identified as those that exhibited consistent regulation across at least 6 of 7 agonists. The genes were further filtered to include only those that also exhibited opposite regulation by at least 2 of 3 antagonists. A total of 46 genes (32 with increased expression and 14 with decreased expression) were identified which exhibited consistent regulation by the chemicals (Figure 2). A number of the identified genes are well-known targets of ERα, including PGR (Petz et al., 2004), CXCL12 (Hall and Korach, 2003), EGR3 (Inoue et al., 2004), and SIAH2 (Frasor et al., 2005). The full list of genes in the biomarker is found in Supplementary File 1.
The biomarker genes were examined for evidence that they are direct targets of ERα regulation using published ChIP-Seq experiments, which identified direct interactions between ERα and the promoter regions of biomarker genes in MCF-7 cells. In addition, genes that may be regulated by ERα through long-range chromatin interactions were identified by ChIA-PET (Fullwood et al., 2009). A total of 32 genes in our biomarker had ERα-bound regions (7 by ChIP-Seq, 2 by ChIA-PET, and 23 by both; Figure 2, arrowheads), which suggests direct transcriptional regulation by ERα. Direct targets of ERα were expected given that we selected a short treatment time (6 h) to capture primary target genes of responsive transcription factors (Lamb et al., 2006). Overall, the results of the ChIP-Seq and ChIA-PET studies indicate that 70% of the genes in the ERα biomarker are under direct transcriptional control of ERα.
We also determined whether or not constitutive expression of the biomarker genes was influenced by the level of ERα. Gene expression was examined in MCF-7 cells after the ESR1 gene was knocked down using siRNA methods. Knockdown of ESR1 in 7 biosets from 5 studies showed generally consistent effects on the genes in the biomarker with decreased expression of the upregulated genes and increased expression of the downregulated genes (Figure 3).
The 46 ERα biomarker genes were examined for functional class enrichment by Ingenuity Pathway Analysis (IPA). The top 25 canonical pathways enriched with the biomarker genes included pathways associated with ER modulation: “Ovarian Cancer Signaling,” “Endometrial Cancer Signaling,” and “Estrogen-Dependent Breast Cancer Signaling” (Supplementary Figure 1). The upstream analysis function of IPA identified a number of transcription factors and chemicals that were predicted to regulate the biomarker genes (Supplementary Figure 2). The transcription factors and chemicals with significant z-scores (≥ 2.0) that activate the biomarker genes included those that were expected (β-estradiol, estrogen, and ESR1). Also identified were RNF31, an atypical ubiquitin ligase that increases the expression of ERα-regulated genes by stabilizing ERα in the cytoplasm (Zhu et al., 2014), and NCOA2, a nuclear receptor coactivator that is overexpressed in breast cancer and regulates ERα (Wagner et al., 2013). Upstream regulators that inhibit expression of biomarker genes (z-score ≤ −2.0) include the chemicals bexarotene, PD98059, and fulvestrant. Fulvestrant is a complete ERα antagonist. Bexarotene is a selective retinoid X receptor (RXR) agonist; RXR activation increases the expression of HSD17B2, which mediates the conversion of estradiol to the less potent estrone, thus decreasing estradiol pools (Cheng et al., 2008). PD98059, a MAPK/ERK kinase inhibitor, blocks phosphorylation of ERK1/2, and thus inhibits a pathway of nonclassical ER signaling (Alessi et al., 1995). Therefore, all of the significant upstream regulators identified by IPA have biologically plausible explanations for effects on ERα biomarker genes.
The Biomarker Accurately Predicts ERα Modulation in a Compendium of MCF-7 Biosets
The ability of the 46 gene biomarker to identify chemicals that modulate ERα was examined in a compendium of biosets derived from MCF-7 cells assembled as described in the Materials and Methods. The compendium contains biosets of gene expression differences between control and experimental states including chemical and hormone treatments. Most of the chemical comparisons were derived from the CMAP 2.0 study.
The Running Fisher algorithm (Kupershmidt et al., 2010), a fold-change rank-based pattern matching strategy, was used to predict modulation of ERα by chemicals. The algorithm calculates the significance of the correlation between the biomarker and biosets in the database. In previous studies, the Running Fisher algorithm coupled with derived gene expression biomarkers was found to be very accurate (balanced accuracy range: 95%–98%) in predicting the activation of xenobiotic-responsive transcription factors in a mouse liver compendium (P value ≤ 10⁻⁴) (Oshida et al., 2015a,b,c) (see Materials and Methods for details of derivation of the cutoff). Using these methods, the ERα biomarker was examined for correlation to the 10 biosets used to identify the biomarker genes. As expected, the biosets from agonist-treated cells exhibited statistically significant positive correlation to the biomarker (P values ≤ 10⁻¹²), and the biosets from treatments with the 3 antagonists exhibited significant negative correlation to the biomarker (P values ≤ 10⁻¹⁰) (Figure 4).
The biomarker was evaluated for the ability to predict ERα modulation by chemicals with known activity. A list of ER reference chemicals was taken from the OECD TG457 BG1 guidance document (OECD, 2012). Of the 45 reference chemicals listed in the document, there were 21 chemicals represented by 98 biosets in the MCF-7 compendium, including 13 agonists, 3 antagonists, and 5 inactives. The potency categories of the reference chemicals (ie, strong, moderate, weak, very weak, and inactive) are listed along with the −log(P value) to indicate correlation with the ERα biomarker (Table 1). The biomarker correctly identified chemicals that are classified as weak (BPA, daidzein, genistein) or very weak (apigenin, kaempferol, 4-nonylphenol [linear, CAS 104-40-5]) agonists. The very weak activator chrysin, represented by 1 bioset, was classified as inactive using the biomarker. Although 2 of the inactive chemicals, reserpine and flutamide, had no activity, it was surprising that 2 other OECD inactives, cycloheximide and corticosterone, exhibited significant suppression by the biomarker approach. (These compounds are examined in greater detail later.) The antagonists 4-hydroxytamoxifen, tamoxifen, and raloxifene showed ERα suppression as determined by significant negative correlation to the biomarker. The 1 reference compound classified as inactive for antagonism (progesterone) was also inactive using the biomarker. In summary, of the 21 compounds examined, the biomarker was able to correctly classify 12 of 13 agonists, all 3 antagonists, and 3 of 5 inactives. Importantly, the biomarker could identify weak and very weak agonists.
TABLE 1.
CASRN | Chemical Name | Classification | −Log(P value) |
---|---|---|---|
57-91-0 | 17alpha-Estradiol | moderate agonist | 19.7 |
84-16-2 | meso-Hexestrol | strong agonist | 18.4 |
50-28-2 | 17beta-Estradiol | strong agonist | 18 |
53-16-7 | Estrone | moderate agonist | 16.2 |
104-40-5 | p-n-Nonylphenol | very weak agonist | 16.1 |
446-72-0 | Genistein | weak agonist | 16 |
80-05-7 | BPA | weak agonist | 15.4 |
486-66-8 | Daidzein | weak agonist | 15.2 |
57-63-6 | 17alpha-EE | strong agonist | 13.4 |
520-18-3 | Kaempferol | very weak agonist | 13.4 |
56-53-1 | Diethylstilbestrol | strong agonist | 12.8 |
520-36-5 | Apigenin | very weak agonist | 4.96 |
480-40-0 | Chrysin | very weak agonist | 2.77 |
57-83-0 | Progesterone | inactive | 1.14 |
13311-84-7 | Flutamide | inactive | 0.87 |
50-55-5 | Reserpine | inactive | −2.6 |
66-81-9 | Cycloheximide | inactive | −7.28 |
68392-35-8 | 4-Hydroxytamoxifen (E/Z) | antagonist | −13.9 |
10540-29-1 | Tamoxifen | antagonist | −15.3 |
50-22-6 | Corticosterone | inactive | −21.7 |
82640-04-8 | Raloxifene | antagonist | −26.3 |
Twenty-one reference chemicals with known ERα activities were examined for correlation to the biomarker (−log(P value)). The classification refers to the agonist or antagonist potency as reported in the OECD reference list as described in the text. The 3 chemicals for which the ERα biomarker prediction and OECD classification do not agree are chrysin, cycloheximide, and corticosterone. CASRN, Chemical Abstracts Service Registry Number.
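The calls summarized in Table 1 can be reproduced with a short sketch. It assumes that the tabulated values are signed −log10(P value) scores (positive for positive correlation, negative for negative correlation) and that the P value ≤ 10−4 cutoff therefore corresponds to an absolute score of at least 4. The function and variable names are illustrative only.

```python
THRESHOLD = 4.0  # assumed: |-log10(P)| >= 4 is equivalent to P <= 1e-4


def call_from_score(signed_score, threshold=THRESHOLD):
    """Map a signed -log10(P) score to a biomarker call."""
    if signed_score >= threshold:
        return "activation"
    if signed_score <= -threshold:
        return "suppression"
    return "no effect"


# Call expected from the OECD classification (last word of the label).
EXPECTED = {
    "agonist": "activation",
    "antagonist": "suppression",
    "inactive": "no effect",
}

# (chemical, OECD classification, signed -log10(P)) as listed in Table 1.
TABLE_1 = [
    ("17alpha-Estradiol", "moderate agonist", 19.7),
    ("meso-Hexestrol", "strong agonist", 18.4),
    ("17beta-Estradiol", "strong agonist", 18.0),
    ("Estrone", "moderate agonist", 16.2),
    ("p-n-Nonylphenol", "very weak agonist", 16.1),
    ("Genistein", "weak agonist", 16.0),
    ("BPA", "weak agonist", 15.4),
    ("Daidzein", "weak agonist", 15.2),
    ("17alpha-EE", "strong agonist", 13.4),
    ("Kaempferol", "very weak agonist", 13.4),
    ("Diethylstilbestrol", "strong agonist", 12.8),
    ("Apigenin", "very weak agonist", 4.96),
    ("Chrysin", "very weak agonist", 2.77),
    ("Progesterone", "inactive", 1.14),
    ("Flutamide", "inactive", 0.87),
    ("Reserpine", "inactive", -2.6),
    ("Cycloheximide", "inactive", -7.28),
    ("4-Hydroxytamoxifen (E/Z)", "antagonist", -13.9),
    ("Tamoxifen", "antagonist", -15.3),
    ("Corticosterone", "inactive", -21.7),
    ("Raloxifene", "antagonist", -26.3),
]

# Chemicals whose biomarker call differs from the OECD-based expectation.
mismatches = [
    name
    for name, classification, score in TABLE_1
    if call_from_score(score) != EXPECTED[classification.split()[-1]]
]
n_correct = len(TABLE_1) - len(mismatches)
```

Under these assumptions, `mismatches` lists the chemicals for which the biomarker call and the OECD classification disagree, and `n_correct` counts the remainder.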
A classification analysis was performed on biosets from MCF-7 cells that were treated with chemicals or hormones with known effects on ERα including those discussed earlier. Classification of activation or suppression required a threshold P value ≤ 10−4. For prediction of activation, the ERα biomarker had a sensitivity of 88% and a specificity of 100%, with a balanced accuracy of 94% (Table 2). For prediction of suppression, the ERα biomarker had a sensitivity of 86% and a specificity of 100%, with a balanced accuracy of 93%. Overall, evaluation of the predictive power of the biomarker resulted in an excellent balanced accuracy to detect exposure conditions which lead to ERα activation or suppression.
TABLE 2.
Metric | Activation | Suppression |
---|---|---|
True positives | 122 | 19 |
True negatives | 13 | 122 |
False positives | 0 | 0 |
False negatives | 16 | 3 |
Sensitivity | 0.884 | 0.863 |
Specificity | 1.000 | 1.000 |
PPV | 1.000 | 1.000 |
NPV | 0.448 | 0.976 |
Balanced accuracy | 0.942 | 0.932 |
The biomarker was compared with biosets that are known positives or negatives for ERα activation including chemicals and E2. Separate tests for ERα activation (estrogenicity) and ERα suppression (antiestrogenicity) were carried out.
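The summary statistics in Table 2 follow from the four confusion-matrix counts using the standard definitions. The sketch below is illustrative only; the function name is hypothetical, and it assumes all denominators are nonzero, as they are for the counts in Table 2.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),  # positive predictive value
        "npv": tn / (tn + fn),  # negative predictive value
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Counts taken from Table 2.
activation = classification_metrics(tp=122, tn=13, fp=0, fn=16)
suppression = classification_metrics(tp=19, tn=122, fp=0, fn=3)
```

Balanced accuracy is the mean of sensitivity and specificity, which makes it less sensitive than raw accuracy to the unequal numbers of positives and negatives in the two tests.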
Comparison of ERα Biomarker Predictions With Those From 18 ER HTS Assays
Eighteen in vitro HTS assays which examined activity at different points in the ER pathway (receptor binding, receptor dimerization, reporter gene assays run in agonist and antagonist mode, and cell proliferation) have been used to evaluate the estrogenicity of approximately 1800 chemicals (Judson et al., 2015). A mathematical model was used to derive pathway-level concentration-response profiles for either agonism or antagonism. Efficacy values were normalized to agonist activity for 17β-estradiol (E2). Agonist and antagonist scores were calculated as the AUC for the chemical relative to the positive control. Thus, the higher the AUC, the higher was the predicted ER activity (combined potency and efficacy) for that chemical. Assay interference (ie, from cytotoxicity) is another important factor to consider when evaluating the HTS data for estrogenicity. Assay results are compared with the results of 35 cytotoxicity assays by the calculation of a Z-score to address this issue (Judson et al., 2015). Based on the analysis in Judson et al. (2015), we excluded chemicals with possible non-ER-specific activity based on their scores for maximum efficacy (T) and cytotoxicity (Z-score). For the comparisons in this article, chemicals were classified as active if their AUC ≥ 0.1, median-T > 50%, and median-Z-score > 3, and inactive if any of these conditions was not met.
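The activity rule applied to the HTS model output can be written as a small function. This is a sketch of the three cutoffs stated above; the function name, parameter names, and the example input values are hypothetical and are not taken from the Judson et al. (2015) data.

```python
def hts_call(auc, median_t, median_z, auc_min=0.1, t_min=50.0, z_min=3.0):
    """Return "active" only if all three cutoffs are satisfied.

    auc      -- pathway-level AUC score (active if >= auc_min)
    median_t -- median maximum efficacy in percent (active if > t_min)
    median_z -- median cytotoxicity Z-score (active if > z_min)
    """
    is_active = auc >= auc_min and median_t > t_min and median_z > z_min
    return "active" if is_active else "inactive"


# Hypothetical inputs illustrating each way a chemical can be called inactive.
examples = {
    "passes all cutoffs": hts_call(auc=0.30, median_t=80.0, median_z=5.0),
    "AUC too low": hts_call(auc=0.05, median_t=80.0, median_z=5.0),
    "efficacy too low": hts_call(auc=0.30, median_t=40.0, median_z=5.0),
    "possible cytotoxicity": hts_call(auc=0.30, median_t=80.0, median_z=2.0),
}
```

A low Z-score leads to an inactive call even when the AUC is high, because ER activity cannot then be separated from assay interference due to cytotoxicity.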
Figure 5A shows a comparison of the predictions based on the −log(P value)s from the ERα MCF-7 screen and the ER AUC from the Judson et al. study for the 114 overlapping chemicals. The −log(P value)s of the biomarker predictions were rank ordered and colored based on their predicted activity by the HTS assay model. Most of the compounds (83) had no activity as assessed by both methods (ie, an ER AUC of < 0.1 or median-Z-score < 3 and |−log(P value)| < 4). False positive chemicals are those that were predicted to modulate (activate or suppress) ERα in MCF-7 cells but were inactive in the Judson et al. model. The call of inactive could be due to an AUC < 0.1 (filled black circles: theobromine, 4-nonylphenol, methotrexate, niclosamide, digoxin, and corticosterone) or due to a Z-score < 3 (open black circles: norethindrone and cycloheximide). There was 1 false negative for activation, chrysin. Using the 18 in vitro assays as the reference dataset, an accuracy test for predictions of activation or suppression using the ERα biomarker gave balanced accuracies of 95% and 98%, respectively (Table 3). Thus, for the 114 compounds in common, there was excellent agreement between the 2 approaches.
TABLE 3.
Metric | Activation | Suppression |
---|---|---|
True positives | 16 | 6 |
True negatives | 93 | 104 |
False positives | 4 | 4 |
False negatives | 1 | 0 |
Sensitivity | 0.941 | 1.000 |
Specificity | 0.959 | 0.963 |
PPV | 0.800 | 0.600 |
NPV | 0.989 | 1.000 |
Balanced accuracy | 0.950 | 0.980 |
Summary of the sensitivity and specificity of the ERα biomarker compared with the predictions from the Judson et al. (2015) study. Separate tests for ERα activation (estrogenicity) and ERα suppression (antiestrogenicity) were carried out.
Differences in the predictions from the Judson et al. (2015) study and those using the biomarker were examined in greater detail. Figure 5B shows a heat map representing the fold-change of each gene in the biomarker for the false negative and false positive chemicals as well as the −log(P value) from comparison to the biomarker. The chemical chrysin was the only false negative. Chrysin was identified as having marginal estrogenicity in the HTS assays (AUC score = 0.134) but no significant activity with the biomarker (−log(P value) = 2.77). Examination of the heat map of chrysin indicated that the pattern lacked marked similarity with the biomarker (Figure 5B, left). Because chrysin was evaluated only at a single concentration after 6 h of exposure in MCF-7 cells, it is possible that this very weak agonist would have been identified in our MCF-7 screen if a full concentration-response analysis was carried out comparable to that in the HTS ER assays (ie, up to 100 µM) or if exposure time was increased.
The false positives are those 8 chemicals that had activity predicted by the biomarker but were considered inactive in the HTS ER model. For these chemicals, the gene expression biosets showed visible similarity to the ERα biomarker as expected due to the significant −log(P value)s (Figure 5B, right). Theobromine and methotrexate showed no activity in the agonist HTS assays and corticosterone showed no activity in the antagonist HTS assays. The other 5 false positive chemicals had an AUC > 0.1 or showed activity in at least 1 individual ER assay but had Z-scores < 3 suggesting that it is not possible to differentiate true ER activity from assay interference due to cytotoxicity. Thus, 5 of the 8 false positive chemicals had some activity in at least 1 of the 18 assays, raising the possibility that these chemicals could alter ERα in a subset of assays and cellular contexts, and also at concentrations that may be inducing overall cytotoxicity in the cells. The results of these comparisons were similar to the analysis using an earlier version of the ER model (Rotroff et al., 2014) in which 13 assays were used to provide predictions of ERα activity (data not shown).
Relationship Between Biomarker Gene Number and Prediction of ERα Modulation
Screening for candidate EDCs using HT gene expression profiling may utilize array platforms with considerably fewer genes than the full-genome arrays that were used in this study. For example, the L1000 platform used by the Broad Institute LINCS effort examines the expression of approximately 1000 representative genes from the human genome. We thus examined the relationships between the number of genes in the ERα biomarker and the ability to predict ERα modulation. Predictions within the MCF-7 compendium were carried out using biomarkers which lacked, in increments of 5, the bottom ranked genes (those with the lowest average |fold-change|) resulting in ERα biomarkers of 40, 35, 30, 25, 20, and 15 genes (Figure 6A, inset). Using the original 46 gene ERα biomarker as the reference, the changes in the number of biosets predicted to have activation or suppression of ERα were determined. Figure 6A shows the −log(P value)s from biosets which were predicted to have ERα activation (327) using the original 46 gene biomarker compared with the 6 shortened versions. Similar trends were observed for ERα suppression (data not shown). A linear trend line was used to determine the points at which there is crossover with a |−log(P value)| = 4 for the individual shortened versions of the biomarker. Linear trend lines resulted in the best representation of the data with R² values of 0.968–0.675 as compared with exponential (R² 0.986–0.237) or logarithmic (R² 0.896–0.627). Figure 6B summarizes the percent of biosets that would be misclassified as false negatives as a function of biomarker gene number. These results indicate that the biomarker could be reduced to 32 (activation) or 38 (suppression) genes while keeping the number of false negatives under 10%. However, if the goal is to utilize the biomarker as a Tier 0 screening strategy, the number of false negatives would need to be minimized to avoid misclassifying any chemical that may have effects. This gene expression screening strategy would require using the full 46 gene biomarker.
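The shortening procedure and the false-negative tally can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used in the study: gene labels, fold-change values, and bioset scores are hypothetical, and a bioset is counted as a false negative when it reaches |−log(P value)| ≥ 4 with the full biomarker but not with the shortened one.

```python
def shorten_biomarker(avg_fold_change, n_genes):
    """Keep the n_genes genes with the largest average |fold-change|."""
    ranked = sorted(
        avg_fold_change,
        key=lambda gene: abs(avg_fold_change[gene]),
        reverse=True,
    )
    return ranked[:n_genes]


def percent_false_negatives(full_scores, short_scores, threshold=4.0):
    """Percent of biosets called with the full biomarker that are lost.

    Both arguments map bioset identifiers to signed -log10(P) scores.
    """
    called = [b for b, score in full_scores.items() if abs(score) >= threshold]
    if not called:
        return 0.0
    lost = [b for b in called if abs(short_scores[b]) < threshold]
    return 100.0 * len(lost) / len(called)


# Hypothetical average fold-changes for a 4-gene toy biomarker.
toy_fold_changes = {"GENE_A": 3.0, "GENE_B": -5.0, "GENE_C": 1.2, "GENE_D": -0.5}
top_two = shorten_biomarker(toy_fold_changes, 2)

# Hypothetical scores for 4 biosets with the full and a shortened biomarker.
full = {"bioset_1": 12.0, "bioset_2": 5.0, "bioset_3": 1.0, "bioset_4": -6.0}
short = {"bioset_1": 9.0, "bioset_2": 3.5, "bioset_3": 0.5, "bioset_4": -4.2}
pct_lost = percent_false_negatives(full, short)
```

Repeating the tally for each shortened biomarker size would give the kind of curve summarized in Figure 6B.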
Evaluation of the Biomarker as a Potential Screening Tool in Different Cell Models
We determined if the biomarker developed using MCF-7 cell experiments could be used in conjunction with gene expression profiling in other cell lines. Biomarker behavior was first examined across E2-treated breast cancer cell lines with known ERα activity. The biomarker was able to identify significant ERα activation in 75 out of 80 biosets from E2-treated MCF-7 cells (Figure 7A). Short exposure times (1–4 h) may explain why E2 did not activate ERα in 5 of these biosets. All 5 biosets from 3 ERα positive cell lines (MDA-MB-134, SUM44PE, ZR-75-1) treated with E2 showed significant activation. In contrast, only 3 of the 14 biosets from the ERα positive T47D cells exposed to E2 showed significant ERα activation. Although these cells are generally ER-responsive, the very high levels of progesterone receptor expressed in T47D cells may inhibit E2-induced gene expression (Abdel-Hafiz et al., 2002; Horwitz et al., 1982). It should be noted that this cell line was used in the HTS ER screening program as a model of E2-induced growth (Judson et al., 2015; Rotroff et al., 2014). None of the 23 biosets from E2-treated ERα-negative cell line MDA-MB-231 showed significant ERα activation.
The biomarker was also evaluated to determine responsiveness in cell lines derived from tissues other than breast. ERα activation was observed after E2 treatment in 8 of 27 biosets from endometrium-derived cell lines (Figure 7B). Most of these biosets were generated using the Ishikawa cell line, which expresses ERα and ERβ (Hevir-Kene and Rizner, 2015). Only 1 of the 10 E2-treated biosets from the U-2 OS osteosarcoma cell line resulted in significant activation. E2-treated cell lines derived from blood, endothelial cells, leukocytes, liver, ovary, quadriceps muscle, raphe nuclei, skin, umbilical cord, and vagina did not exhibit significant activation. Lack of activation could be due either to little or no expression of ERα/ERβ or to ERα being expressed but regulating a different set of genes in these tissues. Thus, the biomarker appears to be most useful as a screening tool in MCF-7 cells and could possibly be used in a subset of ERα-positive breast cancer cell lines. However, the biomarker does not appear to be useful for screening chemicals in other cell lines.
DISCUSSION
HTS assays including those that are carried out as part of the ToxCast/Tox21 screening programs have proven useful in the identification of candidate EDCs and in providing information about their potential mechanisms of action (Judson et al., 2015). In this study, a testing strategy complementary to these HTS ER assays was evaluated to identify candidate EDCs using a gene expression biomarker in combination with transcript profiling. As a proof of principle, our efforts were focused on identification of chemicals that modulate ERα, arguably one of the most important and well-studied EDC targets, using a 46 gene biomarker derived from microarray profiles of ERα agonists and antagonists in MCF-7 cells. The ERα biomarker genes exhibited consistent regulation by structurally diverse agonists and opposite regulation by antagonists (Figure 2). Although our approach did not specifically screen for genes that were regulated by ERα, most of the biomarker genes are direct targets of ERα. Approximately 70% of the genes were found to have direct interactions with ERα in their promoter/enhancer regulatory regions as assessed by ChIP-Seq/ChIA-PET experiments (Figure 2). The regions bound by ERα all contain an ERE motif, so this interaction is likely a direct binding of ERα to the promoter (data not shown). Furthermore, many of the genes exhibited altered expression when the ESR1 gene itself was knocked down by siRNA methodologies (Figure 3). The biomarker genes exhibited expected changes in expression after E2 exposure in ERα-positive but not ERα-negative breast cancer cell lines (Figure 7). Our approach significantly expands on a study that used statistical procedures to identify commonly regulated genes in 10 published microarray experiments that involved E2 exposure in MCF-7 cells (Ochsner et al., 2009). Nine of those 10 datasets are included in our analysis (Figure 7) and 39 of our biomarker genes were identified in their study, supporting the validity of our approach. Taken together, our procedures identified ERα-regulated genes that could be useful in classifying chemicals for effects on ERα.
To explain the biomarker genes that do not have evidence for direct ERα binding, it should be noted that although the biomarker was built using gene expression profiles from chemicals that bind to ERα, the biomarker cannot distinguish between those chemicals which activate by the classical agonism mechanism and those chemicals that may activate through alternative mechanisms (eg, nonclassical activation or by increasing the availability of estrogens) (Chen et al., 2014). Thus, the term “activation” is used in this study to include all mechanisms that lead to increased activity of ERα. ERα “suppression” then includes true antagonism as well as decreases in background or estrogen-stimulated activation through other mechanisms (eg, depletion of pools of estrogen through alteration in metabolism). In fact, depletion of estrogens from serum affects the expression of the ERα biomarker genes in a manner similar to antagonist exposure (Supplementary Figure 3).
To provide an appropriate cellular context for testing the biomarker, a compendium of gene expression comparisons (also called biosets) was assembled from experiments carried out in the human breast cancer cell line MCF-7. This ERα-positive cell line has been extensively used as a model for breast cancer treatment strategies and to identify ERα modulating chemicals. A large number of biosets were identified and annotated from curated studies found in a commercially available gene expression database (NextBio). The final compendium contains over 1400 biosets from chemically treated cells, most of which came from the CMAP 2.0 drug study (Lamb et al., 2006), as well as hundreds of comparisons of hormone effects consisting mostly of E2 treatments used as a positive control in various experiments. The compendium also contains biosets from experiments which examined the effects of overexpression or knocking down expression of approximately 200 different genes. Applications of our screening approach for identification of novel CMAP chemicals and genes encoding proteins that modulate ERα will be described in future work and are not a focus of this study. The compendium, which will continue to grow in parallel with advances in genomic screening techniques, will be a useful database for future studies to link chemical exposure and genetic perturbation to molecular targets and pathway-level effects. The prediction of ERα modulation is the first application of this compendium.
To screen for chemicals that lead to alterations of ERα function, the biomarker was compared with individual biosets in the MCF-7 compendium using the fold-change rank-based nonparametric Running Fisher algorithm (Kupershmidt et al., 2010). The approach, somewhat analogous to the Gene Set Enrichment Analysis method (Lamb, 2007; Lamb et al., 2006), has proven useful in identifying novel treatment strategies for disease (Eriksson et al., 2015). The approach finds, in an unsupervised manner, biosets with expression patterns of biomarker genes with statistically significant positive or negative correlation corresponding to activation or suppression of ERα. Antagonist-like activity can most likely be detected because the MCF-7 cell line cultured under standard conditions exhibits a basal level of ERα activation that can be suppressed with antagonists. Indeed, ERα is suppressed under conditions of depletion of estrogens in the media by charcoal filtering (Supplementary Figure 3) and culture conditions for MCF-7 cells contain approximately 20 pM of E2 (Bindal and Katzenellenbogen, 1988). This level of E2 could likely modulate ERα-regulated genes.
The use of the gene expression biomarker resulted in excellent predictive accuracy for ERα modulation. Using 141 biosets from cells treated with E2 or chemicals with known ERα activity, a test for prediction of activation or suppression gave a balanced accuracy of 94% or 93%, respectively (Table 2). This high level of accuracy demonstrated the robustness of the computational procedures to identify ERα modulators despite the fact that the biosets were nonhomogeneous, consisting of a collection derived from experiments with various exposure conditions carried out in different labs that queried gene expression using different microarray platforms (data not shown). The high degree of accuracy using the ERα biomarker is consistent with our past experience identifying modulators of the transcription factors AhR, CAR, and PPARα that also resulted in excellent accuracy (95%, 97%, and 98% balanced accuracy, respectively) in a compendium of liver biosets (Oshida et al., 2015a,b,c). The computational approach used in these studies will be useful for the future assessment of chemical modulation of other transcription factors including those that are important mediators of endocrine disruption.
There was excellent concordance between the predictions using the ERα biomarker and those of other tests for estrogenicity/antiestrogenicity. The biomarker was able to correctly classify 18 of 21 OECD ER reference chemicals (Table 1), most notably identifying 3 of the 4 agonists classified as “very weak.” The very weak agonist chrysin gave a positive response but did not achieve significance (−log(P value) = 2.77). Two of the chemicals (corticosterone and cycloheximide) were misclassified as false positives for antiestrogenicity using the biomarker. Cycloheximide was likely confounded by cytotoxicity as the concentration used in the CMAP 2.0 study was 14.2 μM, a concentration that was cytotoxic in a number of HTS ER assays (Judson et al., 2015). We hypothesize that corticosterone, a glucocorticoid receptor (GR) agonist, may be influencing expression of ERα-responsive genes indirectly through GR-mediated increases in the expression of SULT1E1, a sulfotransferase that sulfonates and inactivates estrogen (Gong et al., 2008).
The biomarker predictions were compared with 18 in vitro HTS assays which examined different endpoints of ER activity carried out as part of the EDSP HTS program (Judson et al., 2015). Remarkably, the biomarker was able to correctly classify 105 of the 114 chemicals in common (Figure 5; Table 3). Accuracy tests for ERα activation or suppression gave balanced accuracies of 95% or 98%, respectively. Chrysin was the only false negative of the 114 chemicals tested and, in support of this finding, was a very weak modulator in the Judson et al. study with an agonism score of 0.134, very near the cutoff used. Of the 4 false positive chemicals for estrogenicity, theobromine was previously identified as an estrogenic chemical in a study of drug repurposing (Iskar et al., 2013), and 4-nonylphenol was shown to activate ERα (Vivacqua et al., 2003). Digoxin, which possesses a steroid structure, was 1 of the 4 false positive chemicals for ERα suppression. Digoxin has been linked to estrogenicity in other cellular contexts (Biggar, 2012), indicating that digoxin can act as a selective ERα modulator. Further experiments are warranted to confirm the activity of these compounds.
The concordance between the biomarker classifications and these other methods was remarkable considering the deficiencies inherent in most of the biosets used for classification in our study. In particular, the biosets from the CMAP 2.0 dataset included chemical comparisons in which statistically significant gene lists were derived using a t test comparison between 1 treated sample and multiple control samples at only 1 time point and concentration level (approximately 10 µM for most chemicals). As the concentration of a chemical and time of exposure are critical factors determining toxicity, evaluation of a range of concentrations and time points is necessary to reduce the risk of false negatives and false positives in toxicity testing. Therefore, when HT gene expression profiling is ultimately implemented within chemical screening programs, the ability to identify chemicals of concern should be greatly improved over the present analysis when multiple replicates, concentrations, and times of exposure are examined.
The excellent concordance between our method and the ER predictive model indicates that the MCF-7 cell line will be useful for future HT gene expression profiling. The MCF-7 cell line has been used as a model to examine genes and signaling pathways that determine ERα activation by classical and nonclassical mechanisms (Marino et al., 2006). Several signaling pathways that impact ER activation and are associated with cell growth and cancer are functional in MCF-7 cells including G protein-coupled receptor pathways, PI3K-Akt signaling, Wnt/β-catenin, and Notch signaling (Hu et al., 2011). In contrast to the MCF-7 cells, at least some of the assays used in the ER HTS program cannot identify chemicals that activate ERα through nonclassical mechanisms. The assays carried out in the human kidney cell line HEK293T and the human hepatoblastoma cell line HepG2 use hybrid proteins consisting of the ligand binding domain (LBD) of ERα in frame with the yeast GAL4 DNA binding domain. These systems are only responsive to activation/suppression mediated through the LBD and would not be responsive to activation by signaling in the ERα protein N-terminal to the LBD. In contrast to other systems, the human ovarian cell line BG1 used for agonist and antagonist assays, like the MCF-7 cells, depends on the endogenous full-length ERs for activity. It would be interesting to determine if chemicals that activate ER in BG1 and MCF-7 cells, but not in the HEK293T or HepG2 cells, act through nonclassical mechanisms of activation. Our screen with the biomarker identified a number of chemicals that were not identified in other assays (the false positives discussed earlier). It is possible that these chemicals activate ERα through nonclassical mechanisms but further work is needed to confirm this hypothesis.
There are a number of potential caveats of our approach for identification of ERα modulating chemicals. The approach does not reveal the underlying nature of the agonist-like or antagonist-like activity. Like the current strategy of using ER HTS assays, the methods described here will greatly reduce the number of chemicals for further testing, but additional tests would have to be carried out to determine how the chemicals are causing modulation. In addition, the MCF-7 cell line may not be appropriate to identify chemicals that alter ERα through effects on steroidogenesis that determine the level of E2. Two aromatase inhibitors are included in the compendium (letrozole, aminoglutethimide) but both had no effects on ERα, consistent with studies that show that endogenous aromatase is expressed at levels that do not allow inhibition effects to be seen (Zhou et al., 1990). To circumvent this problem, an MCF-7 cell line that constitutively expresses human aromatase, the MCF-7aro cell line, has been recently used as a screening model (Chen et al., 2014, 2015) and provides a solution to detect not only aromatase inhibitors but also chemicals that affect other steroidogenesis enzymes. Of note, the ERα biomarker would not necessarily identify chemicals that are ERα modulators in other cellular contexts due to the likelihood of tissue-specific differences in ERα target genes. Our examination of E2 responsiveness in an array of cell types indicated that the biomarker is only appropriate for screening in a subset of human breast cancer cell lines, namely those that are ERα-positive (Figure 7).
Another limitation is that the number of genes in the biomarker determines the sensitivity of the predictions (Figure 6). This aspect is important when considering that it may not be feasible to interrogate the full genome for high-throughput gene expression profiling. Platforms with smaller numbers of genes (eg, Broad Institute L1000) may allow only a subset of derived biomarker genes to be queried. Our analysis of the impact of the number of genes in the biomarker indicates that as the size of the biomarker decreases, there are increases in the number of false negatives for prediction of both ERα activation and suppression.
In summary, we have developed gene expression-based computational procedures to screen chemicals for ERα activity that closely replicate the results of 18 HTS assays without an increase in the number of false negatives. High-throughput transcript profiling in MCF-7 cells for ERα modulators could complement the current screening paradigm by serving as a Tier 0 screen which would be followed by more targeted assays to uncover the underlying mechanism of action. Although the experimental details have yet to be fully explored, the cost, time, and resource requirements of running a single gene expression experiment will undoubtedly provide savings over the current HTS assay platform. The procedures also have the advantage of simultaneously assessing agonist-like or antagonist-like activity in a single assay system. As detailed in a recent U.S. Federal Register Notice (https://www.federalregister.gov/articles/2015/06/19/2015-15182/use-of-high-throughput-assays-and-computational-tools-endocrine-disruptor-screening-program-notice; accessed 23 June 2015), 3 assays in the EDSP Tier I battery could be replaced by in vitro ER assays based on the ability of the assays used in the Judson et al. (2015) model to accurately predict uterotrophic results in mice and rats (Browne et al., 2015). Thus, the methods developed here could not only be used as a more streamlined alternative to the 18 ER ToxCast assays but also provide a general strategy for the identification of ER modulators that would meet the needs of a number of EDSP stakeholders.
SUPPLEMENTARY DATA
Supplementary data are available online at http://toxsci.oxfordjournals.org/.
FUNDING
The information in this document has been funded in part by the U.S. EPA and by the National Institute of Environmental Health Sciences.
ACKNOWLEDGMENTS
This study was carried out as part of the Environmental Protection Agency (EPA) High-Throughput Testing project within the Chemical Safety for Sustainability (CSS) Program. This research was supported in part by a postdoctoral appointment (for Ryan) to the Research Participation Program for the U.S. EPA, Office of Research and Development, administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and EPA. The views expressed in this article are those of the authors and do not necessarily reflect the statements, opinions, views, conclusions, or policies of the National Institute of Environmental Health Sciences, National Institutes of Health, or the U.S. government. Mention of trade names or commercial products does not constitute endorsement or recommendation for use. The authors declare they have no actual or potential competing financial interests. This study has been subjected to review by the National Health and Environmental Effects Research Laboratory and approved for publication. Approval does not signify that the contents reflect the views of the Agency, nor does mention of trade names or commercial products constitute endorsement or recommendation for use. The authors thank Dr Patience Brown, Dr Stephen Safe, and Dr Vickie Wilson for critical review of the article and Dr Susan Laws for helpful discussions.
REFERENCES
- Abdel-Hafiz H., Takimoto G. S., Tung L., Horwitz K. B. (2002). The inhibitory function in human progesterone receptor N termini binds SUMO-1 protein to regulate autoinhibition and transrepression. J. Biol. Chem. 277, 33950–33956.
- Al-Bader M., Ford C., Al-Ayadhy B., Francis I. (2011). Analysis of estrogen receptor isoforms and variants in breast cancer cell lines. Exp. Ther. Med. 2, 537–544.
- Alessi D. R., Cuenda A., Cohen P., Dudley D. T., Saltiel A. R. (1995). PD 098059 is a specific inhibitor of the activation of mitogen-activated protein kinase kinase in vitro and in vivo. J. Biol. Chem. 270, 27489–27494.
- Barone I., Brusco L., Fuqua S. A. (2010). Estrogen receptor mutations and changes in downstream gene expression and signaling. Clin. Cancer Res. 16, 2702–2708.
- Biggar R. J. (2012). Molecular pathways: Digoxin use and estrogen-sensitive cancers–risks and possible therapeutic implications. Clin. Cancer Res. 18, 2133–2137.
- Bindal R. D., Katzenellenbogen J. A. (1988). Bis(4-hydroxyphenyl)[2-(phenoxysulfonyl)phenyl]methane: Isolation and structure elucidation of a novel estrogen from commercial preparations of phenol red (phenolsulfonphthalein). J. Med. Chem. 31, 1978–1983.
- Browne P., Judson R. S., Casey W. M., Kleinstreuer N. C., Thomas R. S. (2015). Screening chemicals for estrogen receptor bioactivity using a computational model. Environ. Sci. Technol. 49, 8804–8814.
- Chen S., Hsieh J. H., Huang R., Sakamuru S., Hsin L. Y., Xia M., Shockley K. R., Auerbach S., Kanaya N., Lu H., et al. (2015). Cell-based high-throughput screening for aromatase inhibitors in the Tox21 10K library. Toxicol. Sci. 147, 446–457.
- Chen S., Zhou D., Hsin L. Y., Kanaya N., Wong C., Yip R., Sakamuru S., Xia M., Yuan Y. C., Witt K., et al. (2014). AroER tri-screen is a biologically relevant assay for endocrine disrupting chemicals modulating the activity of aromatase and/or the estrogen receptor. Toxicol. Sci. 139, 198–209.
- Cheng Y. H., Yin P., Xue Q., Yilmaz B., Dawson M. I., Bulun S. E. (2008). Retinoic acid (RA) regulates 17beta-hydroxysteroid dehydrogenase type 2 expression in endometrium: Interaction of RA receptors with specificity protein (SP) 1/SP3 for estradiol metabolism. J. Clin. Endocrinol. Metab. 93, 1915–1923.
- Cicatiello L., Mutarelli M., Grober O. M., Paris O., Ferraro L., Ravo M., Tarallo R., Luo S., Schroth G. P., Seifert M., et al. (2010). Estrogen receptor alpha controls a gene network in luminal-like breast cancer cells comprising multiple transcription factors and microRNAs. Am. J. Pathol. 176, 2113–2130.
- Cox L. A., Popken D., Marty M. S., Rowlands J. C., Patlewicz G., Goyak K. O., Becker R. A. (2014). Developing scientific confidence in HTS-derived prediction models: Lessons learned from an endocrine case study. Regul. Toxicol. Pharmacol. 69, 443–450.
- Diamanti-Kandarakis E., Bourguignon J. P., Giudice L. C., Hauser R., Prins G. S., Soto A. M., Zoeller R. T., Gore A. C. (2009). Endocrine-disrupting chemicals: An endocrine society scientific statement. Endocr. Rev. 30, 293–342.
- Eriksson A., Osterroos A., Hassan S., Gullbo J., Rickardson L., Jarvius M., Nygren P., Fryknas M., Hoglund M., Larsson R. (2015). Drug screen in patient cells suggests quinacrine to be repositioned for treatment of acute myeloid leukemia. Blood Cancer J. 5, e307.
- Filer D., Patisaul H. B., Schug T., Reif D., Thayer K. (2014). Test driving ToxCast: Endocrine profiling for 1858 chemicals included in phase II. Curr. Opin. Pharmacol. 19, 145–152.
- Frasor J., Danes J. M., Funk C. C., Katzenellenbogen B. S. (2005). Estrogen down-regulation of the corepressor N-CoR: Mechanism and implications for estrogen derepression of N-CoR-regulated genes. Proc. Natl. Acad. Sci. U.S.A. 102, 13153–13157.
- Fullwood M. J., Liu M. H., Pan Y. F., Liu J., Xu H., Mohamed Y. B., Orlov Y. L., Velkov S., Ho A., Mei P. H., et al. (2009). An oestrogen-receptor-alpha-bound human chromatin interactome. Nature 462, 58–64.
- Gong H., Jarzynka M. J., Cole T. J., Lee J. H., Wada T., Zhang B., Gao J., Song W. C., DeFranco D. B., Cheng S. Y., et al. (2008). Glucocorticoids antagonize estrogens by glucocorticoid receptor-mediated activation of estrogen sulfotransferase. Cancer Res. 68, 7386–7393.
- Grober O. M., Mutarelli M., Giurato G., Ravo M., Cicatiello L., De Filippo M. R., Ferraro L., Nassa G., Papa M. F., Paris O., et al. (2011). Global analysis of estrogen receptor beta binding to breast cancer cell genome reveals an extensive interplay with estrogen receptor alpha for target gene regulation. BMC Genomics 12, 36.
- Hall J. M., Korach K. S. (2003). Stromal cell-derived factor 1, a novel target of estrogen receptor action, mediates the mitogenic effects of estradiol in ovarian and breast cancer cells. Mol. Endocrinol. 17, 792–803.
- Hevir-Kene N., Rizner T. L. (2015). The endometrial cancer cell lines Ishikawa and HEC-1A, and the control cell line HIEEC, differ in expression of estrogen biosynthetic and metabolic genes, and in androstenedione and estrone-sulfate metabolism. Chem. Biol. Interact. 234, 309–319.
- Horwitz K. B., Mockus M. B., Lessey B. A. (1982). Variant T47D human breast cancer cells with high progesterone-receptor levels despite estrogen and antiestrogen resistance. Cell 28, 633–642.
- Hu M., Yu J., Taylor J. M., Chinnaiyan A. M., Qin Z. S. (2010). On the detection and refinement of transcription factor binding sites using ChIP-Seq data. Nucleic Acids Res. 38, 2154–2167.
- Hu Z. Z., Kagan B. L., Ariazi E. A., Rosenthal D. S., Zhang L., Li J. V., Huang H., Wu C., Jordan V. C., Riegel A. T., et al. (2011). Proteomic analysis of pathways involved in estrogen-induced growth and apoptosis of breast cancer cells. PLoS One 6, e20410.
- Hurle M. R., Yang L., Xie Q., Rajpal D. K., Sanseau P., Agarwal P. (2013). Computational drug repositioning: From data to therapeutics. Clin. Pharmacol. Ther. 93, 335–341.
- Inoue A., Omoto Y., Yamaguchi Y., Kiyama R., Hayashi S. I. (2004). Transcription factor EGR3 is involved in the estrogen-signaling pathway in breast cancer cells. J. Mol. Endocrinol. 32, 649–661.
- Iskar M., Zeller G., Blattmann P., Campillos M., Kuhn M., Kaminska K. H., Runz H., Gavin A. C., Pepperkok R., van Noort V., Bork P. (2013). Characterization of drug-induced transcriptional modules: Towards drug repositioning and functional understanding. Mol. Syst. Biol. 9, 662.
- Joseph R., Orlov Y. L., Huss M., Sun W., Kong S. L., Ukil L., Pan Y. F., Li G., Lim M., Thomsen J. S., et al. (2010). Integrative model of genomic factors for determining binding site selection by estrogen receptor-alpha. Mol. Syst. Biol. 6, 456.
- Judson R., Houck K., Martin M., Knudsen T., Thomas R. S., Sipes N., Shah I., Wambaugh J., Crofton K. (2014). In vitro and modelling approaches to risk assessment from the U.S. Environmental Protection Agency ToxCast programme. Basic Clin. Pharmacol. Toxicol. 115, 69–76.
- Judson R. S., Magpantay F. M., Chickarmane V., Haskell C., Tania N., Taylor J., Xia M., Huang R., Rotroff D. M., Filer D. L., et al. (2015). Integrated model of chemical perturbations of a biological pathway using 18 in vitro high-throughput screening assays for the estrogen receptor. Toxicol. Sci. 148, 137–154.
- Kupershmidt I., Su Q. J., Grewal A., Sundaresh S., Halperin I., Flynn J., Shekar M., Wang H., Park J., Cui W., et al. (2010). Ontology-based meta-analysis of global collections of high-throughput public data. PLoS One 5, e13066.
- Lamb J. (2007). The connectivity map: A new tool for biomedical research. Nat. Rev. Cancer 7, 54–60.
- Lamb J., Crawford E. D., Peck D., Modell J. W., Blat I. C., Wrobel M. J., Lerner J., Brunet J. P., Subramanian A., Ross K. N., et al. (2006). The connectivity map: Using gene-expression signatures to connect small molecules, genes, and disease. Science 313, 1929–1935.
- Larman H. B., Scott E. R., Wogan M., Oliveira G., Torkamani A., Schultz P. G. (2014). Sensitive, multiplex and direct quantification of RNA sequences using a modified RASL assay. Nucleic Acids Res. 42, 9146–9157.
- Li H., Qiu J., Fu X. D. (2012). RASL-seq for massively parallel and quantitative analysis of gene expression. Curr. Protoc. Mol. Biol. Chapter 4, Unit 4 13, 1–9.
- Li Y., Arao Y., Hall J. M., Burkett S., Liu L., Gerrish K., Cavailles V., Korach K. S. (2014). Research resource: STR DNA profile and gene expression comparisons of human BG-1 cells and a BG-1/MCF-7 clonal variant. Mol. Endocrinol. 28, 2072–2081.
- Marino M., Galluzzo P., Ascenzi P. (2006). Estrogen signaling multiple pathways to impact gene transcription. Curr. Genomics 7, 497–508.
- Naciff J. M., Daston G. P. (2004). Toxicogenomic approach to endocrine disrupters: Identification of a transcript profile characteristic of chemicals with estrogenic activity. Toxicol. Pathol. 32(Suppl. 2), 59–70.
- Naciff J. M., Overmann G. J., Torontali S. M., Carr G. J., Tiesman J. P., Richardson B. D., Daston G. P. (2003). Gene expression profile induced by 17 alpha-ethynyl estradiol in the prepubertal female reproductive system of the rat. Toxicol. Sci. 72, 314–330.
- Ochsner S. A., Steffen D. L., Hilsenbeck S. G., Chen E. S., Watkins C., McKenna N. J. (2009). GEMS (gene expression metasignatures), a Web resource for querying meta-analysis of expression microarray datasets: 17beta-estradiol in MCF-7 cells. Cancer Res. 69, 23–26.
- OECD. (2012). OECD Test No. 457: BG1Luc Estrogen Receptor Transactivation Test Method for Identifying Estrogen Receptor Agonists and Antagonists. OECD Guidelines for the Testing of Chemicals, Section 4, OECD Publishing, Paris. DOI: http://dx.doi.org/10.1787/9789264185395-en.
- Oshida K., Vasani N., Jones C., Moore T., Hester S., Nesnow S., Auerbach S., Geter D. R., Aleksunes L. M., Thomas R. S., et al. (2015a). Identification of chemical modulators of the constitutive activated receptor (CAR) in a gene expression compendium. Nucl. Recept. Signal. 13, e002.
- Oshida K., Vasani N., Thomas R. S., Applegate D., Gonzalez F. J., Aleksunes L. M., Klaassen C. D., Corton J. C. (2015b). Screening a mouse liver gene expression compendium identifies modulators of the aryl hydrocarbon receptor (AhR). Toxicology 336, 99–112.
- Oshida K., Vasani N., Thomas R. S., Applegate D., Rosen M., Abbott B., Lau C., Guo G., Aleksunes L. M., Klaassen C., et al. (2015c). Identification of modulators of the nuclear receptor peroxisome proliferator-activated receptor alpha (PPARalpha) in a mouse liver gene expression compendium. PLoS One 10, e0112655.
- Petz L. N., Ziegler Y. S., Schultz J. R., Kim H., Kemper J. K., Nardulli A. M. (2004). Differential regulation of the human progesterone receptor gene through an estrogen response element half site and Sp1 sites. J. Steroid. Biochem. Mol. Biol. 88, 113–122.
- Ross-Innes C. S., Stark R., Holmes K. A., Schmidt D., Spyrou C., Russell R., Massie C. E., Vowler S. L., Eldridge M., Carroll J. S. (2010). Cooperative interaction between retinoic acid receptor-alpha and estrogen receptor in breast cancer. Genes Dev. 24, 171–182.
- Rotroff D. M., Martin M. T., Dix D. J., Filer D. L., Houck K. A., Knudsen T. B., Sipes N. S., Reif D. M., Xia M., Huang R., et al. (2014). Predictive endocrine testing in the 21st century using in vitro assays of estrogen receptor signaling responses. Environ. Sci. Technol. 48, 8706–8716.
- Safe S., Kim K. (2008). Non-classical genomic estrogen receptor (ER)/specificity protein and ER/activating protein-1 signaling pathways. J. Mol. Endocrinol. 41, 263–275.
- Sikora M. J., Strumba V., Lippman M. E., Johnson M. D., Rae J. M. (2012). Mechanisms of estrogen-independent breast cancer growth driven by low estrogen concentrations are unique versus complete estrogen deprivation. Breast Cancer Res. Treat. 134, 1027–1039.
- Smalley J. L., Gant T. W., Zhang S. D. (2010). Application of connectivity mapping in predictive toxicology based on gene-expression similarity. Toxicology 268, 143–146.
- Vivacqua A., Recchia A. G., Fasanella G., Gabriele S., Carpino A., Rago V., Di Gioia M. L., Leggio A., Bonofiglio D., Liguori A., et al. (2003). The food contaminants bisphenol A and 4-nonylphenol act as agonists for estrogen receptor alpha in MCF7 breast cancer cells. Endocrine 22, 275–284.
- Wagner M., Koslowski M., Paret C., Schmidt M., Tureci O., Sahin U. (2013). NCOA3 is a selective co-activator of estrogen receptor alpha-mediated transactivation of PLAC1 in MCF-7 breast cancer cells. BMC Cancer 13, 570.
- Welboren W. J., van Driel M. A., Janssen-Megens E. M., van Heeringen S. J., Sweep F. C., Span P. N., Stunnenberg H. G. (2009). ChIP-Seq of ERalpha and RNA polymerase II defines genes differentially responding to ligands. EMBO J. 28, 1418–1428.
- Wu F., Khan S., Wu Q., Barhoumi R., Burghardt R., Safe S. (2008a). Ligand structure-dependent activation of estrogen receptor alpha/Sp by estrogens and xenoestrogens. J. Steroid. Biochem. Mol. Biol. 110, 104–115.
- Wu F., Xu R., Kim K., Martin J., Safe S. (2008b). In vivo profiling of estrogen receptor/specificity protein-dependent transactivation. Endocrinology 149, 5696–5705.
- Yang J. H., Li J. H., Jiang S., Zhou H., Qu L. H. (2013). ChIPBase: A database for decoding the transcriptional regulation of long non-coding RNA and microRNA genes from ChIP-Seq data. Nucleic Acids Res. 41(Database issue), D177–D187.
- Zhou D. J., Pompon D., Chen S. A. (1990). Stable expression of human aromatase complementary DNA in mammalian cells: A useful system for aromatase inhibitor screening. Cancer Res. 50, 6949–6954.
- Zhu J., Zhao C., Kharman-Biz A., Zhuang T., Jonsson P., Liang N., Williams C., Lin C. Y., Qiao Y., Zendehdel K., et al. (2014). The atypical ubiquitin ligase RNF31 stabilizes estrogen receptor alpha and modulates estrogen-stimulated breast cancer cell proliferation. Oncogene 33, 4340–4351.
- Zivadinovic D., Gametchu B., Watson C. S. (2005). Membrane estrogen receptor-alpha levels in MCF-7 breast cancer cells predict cAMP and proliferation responses. Breast Cancer Res. 7, R101–R112.