PLOS ONE. 2020 Sep 28;15(9):e0239367. doi: 10.1371/journal.pone.0239367

Mining a human transcriptome database for chemical modulators of NRF2

John P Rooney 1,2,¤, Brian Chorley 1, Steven Hiemstra 3, Steven Wink 3, Xuting Wang 4, Douglas A Bell 4, Bob van de Water 3, J Christopher Corton 1,*
Editor: Roberto Mantovani 5
PMCID: PMC7521735  PMID: 32986742

Abstract

Nuclear factor erythroid-2 related factor 2 (NRF2), encoded by the NFE2L2 gene, is a transcription factor critical for protecting cells from chemically-induced oxidative stress. We developed computational procedures to identify chemical modulators of NRF2 in a large database of human microarray data. A gene expression biomarker was built from statistically-filtered gene lists derived from microarray experiments in primary human hepatocytes and cancer cell lines exposed to NRF2-activating chemicals (oltipraz, sulforaphane, CDDO-Im) or in which the NRF2 suppressor Keap1 was knocked down by siRNA. Directionally consistent biomarker genes were further filtered for those dependent on NRF2 using a microarray dataset from cells after NFE2L2 siRNA knockdown. The resulting 143-gene biomarker was evaluated as a predictive tool using the correlation-based Running Fisher algorithm. Using 59 gene expression comparisons from chemically-treated cells with known NRF2 activating potential, the biomarker gave a balanced accuracy of 93%. The biomarker comprised many well-known NRF2 target genes (AKR1B10, AKR1C1, NQO1, TXNRD1, SRXN1, GCLC, GCLM), 69% of which were found to be bound directly by NRF2 using ChIP-Seq. NRF2 activity was assessed across ~9840 microarray comparisons from ~1460 studies examining the effects of ~2260 chemicals in human cell lines. A total of 260 and 43 chemicals were found to activate or suppress NRF2, respectively, most of which have not been previously reported to modulate NRF2 activity. Using an NRF2-responsive reporter gene in HepG2 cells, we confirmed the activity of a set of chemicals predicted using the biomarker. The biomarker will be useful for future gene expression screening studies of environmentally-relevant chemicals.

Introduction

Most of the vast number of chemicals used in industry have undergone only limited toxicity testing. The time and financial costs of screening these chemicals via traditional methods are prohibitive. These constraints have driven the development of high-throughput in vitro screening assays to aid in assessing chemical toxicity and to prioritize chemicals for traditional testing. Great strides have been made in this regard by United States governmental chemical screening programs, including the EPA’s ToxCast program and the cross-agency Tox21 program, which have screened over 1800 chemicals in as many as 700 assays [1]. Landmark studies have developed computational models that, for example, predict with remarkable accuracy the modulation of estrogen and androgen receptors, important targets for endocrine disrupting chemicals, based on results from in vitro assays [2–4]. Yet, despite the success of these programs in prioritization, there are still obstacles to overcome. Most high-throughput screening (HTS) assays are targeted to specific events in biological pathways, and while this aids in specificity, it also leaves a large biological space that is not covered by the current assay battery [5, 6]. To more completely assess the effects of chemicals on a wider range of molecular targets of regulatory interest, broader approaches are needed.

Gene expression profiling represents a robust complementary approach to HTS and has the potential to cover some of the biological space missed by HTS assays. The Library of Integrated Network-Based Cellular Signatures (LINCS) and the Connectivity Map (CMAP) projects have both made significant contributions to the field of large-scale gene expression profiling, using network-based approaches to link chemical exposure to gene expression and disease. The CMAP project screened ~1300 small molecules in 3 cell lines with whole transcriptome expression data [7, 8]. One of the major hurdles to these early gene expression profiling efforts was the inherently low throughput of microarray technologies. However, the field has seen great advancements in throughput in recent years. The aforementioned LINCS database uses the L1000 gene expression technology, which measures the expression of 1000 genes and, through computational inference, can predict transcriptional changes in nearly 80% of the genome. The LINCS project has generated over 1 million profiles using the L1000 technology from a large battery of human cell lines perturbed with chemicals and gene probes [9]. Furthermore, new RNA sequencing (RNA-seq) technologies, such as the TempO-Seq platform, show great promise in their ability to measure expression changes in both smaller, targeted gene sets and the whole transcriptome in a high-throughput manner [10]. Lastly, there is a wealth of microarray- and RNA-Seq-derived gene expression data currently available in multiple public repositories that can be used to develop procedures for predicting the molecular targets of chemicals.

A major challenge in the field of gene expression analysis is identifying signals that are indicative of modulation of specific molecular targets. One mechanism by which chemicals can exert toxic effects on a cell is by activating or repressing transcription factors, thus changing gene expression and altering normal cellular signaling events. We have previously developed gene expression biomarkers that can accurately identify chemicals that activate and/or suppress a number of transcription factors in the mouse or rat liver important in cancer and steatosis [11–21]. Our group has also characterized biomarkers that predict modulation of estrogen receptor (ERα) and androgen receptor or predict genotoxicity in human cells [22–24]. These biomarkers consist of short lists of genes (up to ~150) whose expression consistently changes as a result of exposure to structurally-diverse chemicals or other perturbants, indicating either activation or suppression of a specific transcription factor. Modulation of the transcription factors that our group has focused on is often a molecular initiating event (MIE) or key event (KE) in Adverse Outcome Pathways (AOPs). AOPs are defined as a series of mechanistically linked KEs starting with a MIE in which a chemical interacts with a target, culminating in an adverse outcome in a tissue [25]. Thus, gene expression biomarkers can be utilized to interpret microarray profiles with the goal of populating MIE/KE activity in AOP networks [26].

Nuclear factor erythroid-2 related factor 2 (NRF2) encoded by the NFE2L2 gene is a key transcription factor important in cellular responses to oxidative stress and xenobiotics. Under normal conditions, NRF2 is bound in the cytoplasm by Kelch-like ECH-associated protein 1 (Keap1), which results in ubiquitination and targeting of NRF2 for proteasomal degradation. When activated, NRF2 dissociates from Keap1, translocates to the nucleus and binds to genomic antioxidant response elements (AREs) as a heterodimer with Maf proteins, MafF, MafG, or MafK [27]. NRF2 binding to AREs promotes the transcription of a diverse battery of genes involved in the antioxidant response and detoxification, including many genes related to the cytoprotective processes of phase 2 metabolism [28]. NRF2 is intricately linked with carcinogenesis, as NRF2-nullizygous mice are more susceptible to many chemical carcinogens, yet paradoxically NRF2 and its target genes are also upregulated in many cancers [29, 30]. The activation status of NRF2 is also linked to hepatocyte steatosis [31], a condition in which there is an accumulation of triglycerides. The ability to readily identify chemicals that modulate NRF2 in microarray studies could help to build predictive models for cancer or steatosis.

In the present study, we developed computational methods for using a gene expression biomarker to predict NRF2 activation or suppression in human cells. We used this biomarker, coupled with an annotated database of gene expression profiling experiments, to perform an in silico screen for chemical perturbations that lead to NRF2 modulation. We validated our findings using an ARE-linked reporter system in HepG2 cells.

Methods

Construction and characterization of the NRF2 biomarker

The overall strategy used to computationally construct and characterize the NRF2 gene expression biomarker is depicted in Fig 1. The biomarker consists of a list of differentially regulated genes whose expression is consistently altered after exposure to chemicals that activate NRF2. Gene expression data were sourced from the commercially available database BaseSpace Correlation Engine (BSCE) (https://www.illumina.com/products/by-type/informatics-products/basespace-correlation-engine.html; formerly NextBio). Differential gene expression analysis protocols employed in BSCE are described in detail in Kupershmidt et al. [32]. Briefly, raw gene expression data (or preprocessed data if raw data are not available) are collected from GEO, ArrayExpress and other public repositories. Expression data are log transformed and normalized as appropriate (RMA, per-chip median or Lowess), and differential expression is calculated using a Welch’s or standard t-test, with a p value cut-off of 0.05 without multiple test correction, and a fold change cut-off of +/- 1.2 fold. Genes with expression values in the lower 20th percentile in both groups are removed. Lists of statistically filtered differentially expressed genes resulting from these comparisons are referred to as biosets.

Fig 1. Biomarker construction and screening strategy.

Fig 1

Left, NRF2 biomarker development. Differentially expressed genes (DEGs) from exposures to known NRF2 activators (sulforaphane, oltipraz, sulindac, quercetin) and from KEAP1 and NFE2L2 siRNA knockdown were assessed in the BSCE environment. Biomarker genes were identified as those consistently up- or down-regulated by the chemical exposures or KEAP1 knockdown and regulated in the opposite direction by NFE2L2 knockdown (further details in Results section). Ten biomarkers were initially created from biosets that varied based on chemical, time and dose of exposure, and tissue context.

Right, Biomarker Testing and Screening for Modulators. The biomarkers were imported into the BSCE environment and compared to all other human biosets via the Running Fisher algorithm for rank-ordered pairwise comparisons [32]. The p-values and correlation directions were exported and used to populate a master database of experimental details for all biosets. The biomarkers were tested for accuracy by comparing predictions with those from the ToxCast and Tox21 HTS NRF2 activation assays. The biomarker with the best predictive accuracy was used to identify chemicals in the database that activated or suppressed NRF2. Post-hoc analysis of the biomarker gene list included canonical pathway enrichment and ChIP-Seq analysis.

The NRF2 biomarker was constructed from lists of differentially expressed genes derived from chemical and genetic perturbations known to affect the activity of NRF2. The biosets used to construct the biomarker are listed in Table 1. Because our goal was to use the biomarker to identify conditions that activate NRF2 in many cell types, we used biosets derived from a variety of different cell lines to find a common set of genes. The biosets were from primary human hepatocytes and human cancer cell lines derived from liver (HepG2), breast (MCF7, MCF10A), and lung (A549). These gene sets were compared to identify those with consistent expression in a series of steps. First, genes were identified that exhibited consistent expression upon exposure to known NRF2-activating chemicals (e.g., sulforaphane, oltipraz, sulindac, quercetin) across the 6 chemical-treated biosets as well as a bioset derived from genetic knockdown of the NRF2 repressor Keap1. To pass this filter, genes had to be either up-regulated or down-regulated in all of the biosets in which the gene exhibited differential expression and had to be differentially expressed in the majority of the biosets (4 or more out of 7 biosets). Second, the gene lists were further filtered for those genes that were also regulated in the opposite direction by NRF2 knockdown, to filter for direct dependence on NRF2. Any gene that did not meet this criterion was removed. Finally, the genes were filtered for those with robust expression changes. The average fold-change across the 7 biosets in which NRF2 was activated had to have an absolute value ≥ 1.5-fold (actual fold-change, not Log2(fold-change)). The final biomarker consisted of 143 genes. The list of biomarker genes and associated fold-change values is found in S1 File.
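The three filtering steps can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the code used in the study: the function name, the input layout (one dictionary of signed fold changes per bioset), and the choice to average fold changes only over the biosets in which a gene was differentially expressed are all assumptions made for the example.

```python
from statistics import mean


def select_biomarker_genes(activating_biosets, knockdown_bioset,
                           min_biosets=4, min_abs_fold=1.5):
    """Return {gene: average fold change} for genes passing all three filters.

    activating_biosets: list of dicts mapping gene -> signed fold change,
        one dict per NRF2-activating bioset (only altered genes are present).
    knockdown_bioset: dict mapping gene -> signed fold change after
        NFE2L2 knockdown.
    """
    selected = {}
    all_genes = set().union(*activating_biosets)
    for gene in sorted(all_genes):
        changes = [b[gene] for b in activating_biosets if gene in b]
        is_up = all(fc > 0 for fc in changes)
        is_down = all(fc < 0 for fc in changes)
        # Filter 1: same direction wherever altered, and altered in a majority.
        if len(changes) < min_biosets or not (is_up or is_down):
            continue
        # Filter 2: NFE2L2 knockdown must move the gene in the opposite direction.
        kd = knockdown_bioset.get(gene)
        if kd is None or (kd > 0) == is_up:
            continue
        # Filter 3: robust average change (assumption: averaged over the
        # biosets in which the gene was differentially expressed).
        average = mean(changes)
        if abs(average) >= min_abs_fold:
            selected[gene] = average
    return selected
```

With seven activating biosets and one knockdown bioset as input, the default `min_biosets=4` corresponds to the "4 or more out of 7" majority rule described above.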

Table 1. Biosets used to build the NRF2 biomarker.

Bioset Name | Factor Examined | Cell type | Time of treatment (hr) | Concentration (uM) | Number of Differentially Expressed Genes | Study | NRF2 biomarker [-Log(p-value)]
Primary human hepatocytes + 50uM sulforaphane for 48hr _vs_ vehicle | Sulforafan | PHH | 48 | 50 | 2619 | GSE20479 | 37.444
MCF10A breast cell line + 15uM sulforaphane 24hr _vs_ vehicle control | Sulforafan | MCF10A | 24 | 15 | 2714 | GSE28813 | 21.824
Primary human hepatocytes + 30uM oltipraz for 48hr _vs_ vehicle | Oltipraz | PHH | 48 | 30 | 2851 | GSE20479 | 29.187
Hepatocytes of female donors treated 24hr with 600uM sulindac _vs_ 0uM | Sulindac | PHH | 24 | 600 | 1478 | TG-GATES | 11.854
HepG2 hepatocellular carcinoma cell line 50uM Que treated for 24hr _vs_ DMSO control | Quercetin | HepG2 | 24 | 50 | 1849 | GSE28878 | 19.155
HepG2 hepatocellular carcinoma cell line 50uM Que treated for 48hr _vs_ DMSO control | Quercetin | HepG2 | 48 | 50 | 3729 | GSE28878 | 26.481
MCF10A breast cell line + Keap1 siRNA 24hr _vs_ control siRNA | KEAP1 gene knockdown | MCF10A | NA | NA | 2267 | GSE28813 | 23.155
A549 lung adenocarcinoma cells expressing NRF2 siRNA _vs_ non-targeting (NS) siRNA | NFE2L2 gene knockdown | A549 | NA | NA | 3497 | GSE38332 | -40.398

PHH, primary human hepatocytes.

1. It should be noted that A549 cells possess a homozygous mutation in Keap1 that results in increased NRF2 activity [33], explaining the dramatic effect on NRF2-regulated genes by knocking down NFE2L2.

Building the experimental database

An annotated master database of gene expression profiles was created using BSCE, as previously described [11, 13, 22]. Briefly, BSCE contains ~22,900 highly curated, publicly available, omic-scale studies from 15 species including ~140,000 lists of statistically filtered genes (as of June 2019). All available information for each human bioset (45,163 total) was downloaded and assembled into a database of experimental parameters. Each entry in the database contained the bioset name, GEO accession number (where applicable), analysis platform, and tissue (when available), and was then annotated for category of perturbant (e.g., chemical or gene), specific perturbant studied, and for chemicals, dose and time of exposure. This database was then populated with -log(p-values) from the Running Fisher algorithm (performed in the BSCE environment) to assess the correlation in gene expression changes between each bioset and the NRF2 biomarker. The Running Fisher test is a platform-independent, rank-based comparison that is otherwise similar to Gene Set Enrichment Analysis [32]. P-values were exported, converted to -log(p-values), and biosets with negative correlations were assigned negative values. Based on our past studies [11, 12], we considered -log(p-values) ≥ 4 to indicate NRF2 activation and ≤ -4 to indicate NRF2 suppression. In these past studies, a p-value cutoff <0.001 with a Benjamini-Hochberg multiple test correction consistently resulted in a p-value cutoff of 1E-4, which gave balanced accuracies > 90% for biomarkers derived from human, mouse and rat transcript profiles. Only biosets examining the effects of individual chemicals (as opposed to more than one chemical) were evaluated in this study.
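The signed score and the ±4 call described above can be illustrated with a short Python sketch. The function names are hypothetical, and a base-10 logarithm is assumed because the text equates a p-value cutoff of 1E-4 with a score of 4; the Running Fisher p-values themselves come from BSCE and are not computed here.

```python
import math


def signed_neg_log_p(p_value, positive_correlation):
    """Convert a p-value to -log10(p), negated for negative correlations."""
    score = -math.log10(p_value)
    return score if positive_correlation else -score


def classify_nrf2(score, cutoff=4.0):
    """Call NRF2 activity from a signed -log10(p-value)."""
    if score >= cutoff:
        return "activation"
    if score <= -cutoff:
        return "suppression"
    return "no effect"
```

For example, the NFE2L2 knockdown bioset in Table 1 (score -40.398) would be called "suppression", and a bioset with a score of 3.7 would be called "no effect".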

Testing biomarker accuracy

To test for predictive accuracy, the biomarker was compared to gene expression biosets from chemically-treated HepG2 cells or primary human hepatocytes curated in the BSCE database. The biosets spanned exposures to 137 chemicals with known NRF2 activity based on ToxCast and Tox21 high-throughput NRF2 assay results. The comparisons were limited to HepG2 cells because this was the cell line in which the ToxCast and Tox21 assays were carried out, as well as primary hepatocytes to increase the number of chemicals with available biosets. True positive NRF2-activating chemicals were defined as those chemicals classified as NRF2 activators in both the ToxCast NRF2 (ATG_NRF2_ARE_CIS, Attagene Inc, Durham, NC) and the Tox21 NRF2 (Tox21_ARE_BLA_Agonist) assays (U.S. EPA, ToxCast & Tox21 Summary Files Released Dec. 2014, http://www.epa.gov/chemical-research/toxicity-forecaster-toxcasttm-data). True negatives were defined as those chemicals that were inactive in both assays. The biosets were filtered for exposure conditions more likely to result in NRF2 activation, i.e., times between 8 and 48 h, and doses no lower than the HTS-determined AC50 values (for positive chemicals) or between 10 and 400 uM (for negative chemicals). Thirty-five chemicals in 59 unique comparisons remained after using these filters. Comparisons were carried out using the Running Fisher algorithm. The biosets used to create the biomarker (essentially the training set) were excluded from the test. The values for predictive accuracy were calculated as follows: sensitivity (true positive rate) = TP/(TP+FN); specificity (true negative rate) = TN/(FP+TN); positive predictive value (PPV) = TP/(TP+FP); negative predictive value (NPV) = TN/(TN+FN); balanced accuracy = (sensitivity+specificity)/2. It should be noted that our true positive and true negative chemicals are based on HTS assay results carried out in HepG2 cells. There is evidence that the responses to perturbations will differ between cell types [34]. Thus, our true positive and true negative chemicals may be less applicable to cell types other than liver cells. However, as the biomarker was designed using multiple cell types, we carried out this and other analyses on both the HepG2/hepatocyte-filtered database and the entire human database.
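The accuracy formulas translate directly into code. The following minimal Python sketch (the function name is hypothetical) applies them to the counts reported in the Results section: 13 true positives, 42 true negatives, 3 false positives and 1 false negative across the 59 comparisons.

```python
def classifier_metrics(tp, tn, fp, fn):
    """Binary classifier summary statistics as defined in the text."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (fp + tn)  # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Counts taken from the Results section of this article.
metrics = classifier_metrics(tp=13, tn=42, fp=3, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity (13/14) and specificity (42/45) both come out at about 93%, which matches the reported balanced accuracy.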

Ingenuity pathway analysis

The NRF2 biomarker genes were analyzed using the canonical pathway and upstream analysis functions of Ingenuity Pathway Analysis (IPA, Qiagen Bioinformatics, Redwood City, California). IPA calculates significance using a right-tailed Fisher’s Exact test. The p-value is the probability that the observed overlap between the NRF2 biomarker gene list and the IPA pathway gene list would arise by chance. Pathways reported as significant have q-values < 1E-3. Upstream analysis uses the number of differentially expressed genes to predict upstream regulators of the biomarker genes. A Z-score is applied in the predictions of upstream analysis; an absolute Z-score > 2 is considered significant. Multiple test correction on p-values was carried out using the Benjamini-Hochberg method.
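For readers unfamiliar with the Benjamini-Hochberg method, the following Python sketch shows the standard step-up adjustment that turns a list of p-values into q-values. It is a generic textbook implementation with a hypothetical function name; it is not IPA's internal code, and the p-values in the usage note are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values
```

Calling `benjamini_hochberg([0.01, 0.04, 0.03, 0.005])` gives q-values of approximately 0.02, 0.04, 0.04 and 0.02; a pathway would then be reported if its q-value fell below the chosen cutoff.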

Identification of interactions between NRF2 and NRF2 biomarker genes using ChIP-Seq data

Analysis using human-based chromatin immunoprecipitation sequencing (ChIP-Seq) data was carried out to identify genes directly regulated by NRF2. Briefly, ChIP-seq data from experiments with activated NRF2 in human cells were obtained from Gene Expression Omnibus (3 biological replicates from a sulforaphane (SFN) treated human lymphoid cell line [GSM922966; GSM922967; GSM922968]; 2 biological replicates from SFN-treated human bronchial epithelial cell line BEAS-2B [GSM1968263; GSM1968264]; and 2 biological replicates from a human adenocarcinomic alveolar basal epithelial cell line A549 [GSM2423705; GSM2423706]). Raw sequencing reads downloaded from NCBI Sequence Read Archive (SRA) were mapped to the human genome (hg19) using the Burrows-Wheeler Aligner (BWA) [35]. Replicates were combined and NRF2 binding peaks were determined by MACS version 2.0 [36]. Genomic coordinates derived from these datasets were updated to human genome build GRCh38/hg38 using LiftOver [https://genome.ucsc.edu/cgi-bin/hgLiftOver]. Coordinates were annotated to bi-directional promoter features within 10 Kb using the R Bioconductor package “ChIPpeakAnno” (version 3.19.3) [37] based on the current UCSC TxDb database (2019-10-21). Additionally, we identified antioxidant responsive elements (AREs) based on a position weight matrix (PWM) defined by ChIP-seq binding data [38]. These regions were evaluated using the “matchPWM” function in the R Biostrings package (v2.46.0) [https://www.rdocumentation.org/packages/Biostrings/versions/2.40.2/topics/matchPWM]. Matches were based on a minimum of 80% relatedness. It should be noted that ARE binding in association with target genes was a post hoc analysis and served to support, but not develop, the composition of the biomarker (S1 File).
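The idea behind PWM matching with a minimum relatedness can be sketched in plain Python. This is only a conceptual illustration, not the Biostrings implementation: it assumes the 80% threshold is taken relative to the highest score the matrix can produce, scans the forward strand only, and the function name and the four-position matrix in the usage note are hypothetical (the ARE matrix from reference [38] is not reproduced here).

```python
def match_pwm(pwm, sequence, min_fraction=0.8):
    """Return (start, window, score) for windows scoring >= min_fraction of max.

    pwm: list of dicts, one per motif position, mapping base -> weight.
    sequence: DNA string scanned on the forward strand only.
    """
    width = len(pwm)
    # Highest achievable score: best base at every position.
    max_score = sum(max(column.values()) for column in pwm)
    cutoff = min_fraction * max_score
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col.get(base, 0) for col, base in zip(pwm, window))
        if score >= cutoff:
            hits.append((start, window, score))
    return hits
```

With a toy matrix that favors T, G, A and C at successive positions (weight 7 for the favored base and 1 for the others), scanning "AATGACGG" returns the single window "TGAC" at offset 2, because a window with one mismatch scores 22 of a possible 28, which is below the 80% cutoff.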

Identifying modulators of the NRF2 pathway

To screen for chemicals that modulate NRF2, the biomarker was compared to each of the human biosets in the BSCE database. The p-values and directions of correlation were exported, p-values were converted to -log(p-values), and those with negative correlations were given negative values. Our master experimental database was then populated with the -log(p-values) for each bioset, which were classified as NRF2 activating (-log(p-value) ≥ 4), NRF2 suppressing (-log(p-value) ≤ -4) or having no effect on NRF2 (-log(p-value) between -4 and 4). Using this database, several strategies were implemented to identify chemicals that consistently activated NRF2. First, a Fisher’s exact test was used to determine which chemicals were enriched in the group of NRF2 active biosets (p-value <0.05 after a Benjamini-Hochberg multiple test correction). For example, there are 239 biosets with “Tobacco Smoke” as the specific perturbant, 83 of which resulted in NRF2 activation and 156 of which did not, whereas only 1429 of the 45,163 total biosets resulted in activation. (It should be noted that this analysis included all biosets in the database, including those examining the effects of chemicals.) Thus, NRF2 active biosets were significantly enriched for the perturbant “Tobacco Smoke.” The same approach was taken with NRF2-suppressive biosets. This approach works well if there are a sufficient number (≥ 3) of biosets for each specific chemical; however, it does not take dose or exposure time into account.
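The enrichment test for a single perturbant can be sketched with the Python standard library. The sketch below computes a one-sided (right-tailed) Fisher's exact p-value as a hypergeometric tail probability; the choice of a one-sided test and the function name are assumptions, since the text does not state which tail was used. The counts are those given for “Tobacco Smoke”: 83 active biosets out of 239, against 1429 active biosets out of 45,163 overall.

```python
from math import comb


def enrichment_p_value(active_with, total_with, active_all, total_all):
    """One-sided Fisher's exact test p-value, P(X >= active_with).

    X is the number of active biosets expected among total_with biosets
    drawn at random from total_all biosets, of which active_all are active.
    """
    inactive_all = total_all - active_all
    upper = min(total_with, active_all)
    # Exact integer arithmetic avoids underflow for very small tails.
    tail = sum(
        comb(active_all, k) * comb(inactive_all, total_with - k)
        for k in range(active_with, upper + 1)
    )
    return tail / comb(total_all, total_with)


# "Tobacco Smoke" example from the text.
p_tobacco = enrichment_p_value(83, 239, 1429, 45163)
print(f"p = {p_tobacco:.3g}")
```

In a full screen, one such p-value would be computed per perturbant and the resulting list adjusted with the Benjamini-Hochberg method before applying the 0.05 cutoff.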

In another approach to characterize chemicals that activate NRF2, the maximum -log(p-value) for each specific chemical generated in HepG2 cells or primary human hepatocytes was determined, without accounting for dose or exposure time, and used to make an active or inactive call. The resulting list of active and inactive chemicals was then compared with the active and inactive chemicals from the ToxCast and Tox21 HTS assays described above. Perturbants that suppressed NRF2 could not be compared with the HTS assays because the assays were designed to measure only NRF2 activation; these perturbants were therefore considered inactive for activation in these comparisons.

Dose-response, time-course, and chemical-induced effects on datasets with listed oncogenic mutations were examined in a limited set of chemicals as proof of concept. The limited number and specific choices of chemicals were driven mainly by the data available for those chemicals. Where possible, data from multiple studies were combined to construct a -log(p-value) vs. dose or time relationship. In these cases, all data for a perturbant were considered, and individual dose- or time-response curves were constructed from studies carried out in a single cell type.

Validation experiments with the SRXN1-GFP reporter cell line

To investigate specific predictions based on the biomarker and differences between biomarker and HTS predictions, we employed a real-time, SRXN1-GFP fluorescence-based NRF2-activation reporter system [39]. Thirty chemicals were screened in single/concentration-response format, over time, for their ability to induce expression of an SRXN1-GFP fusion protein in the human hepatoma HepG2 cell line (ATCC, clone HB8065). The generation and qualification of the HepG2-BAC-GFP-SRXN1 reporter cell line has been previously described [39]. SRXN1-GFP cells were maintained and exposed in DMEM high glucose supplemented with 10% (v/v) Fetal Bovine Serum, 25 U/mL penicillin and 25 μg/mL streptomycin. During culturing of both the parental and reporter cell lines, mycoplasma was checked every 2 months using PCR-based testing. Three days prior to imaging, cells were seeded in Greiner black μ-clear 384-well plates at 20,000 cells per well. Cytoplasmic SRXN1-GFP accumulation and propidium iodide staining were monitored for 24 h using a Nikon TiE2000 confocal laser microscope (lasers: 408 nm, 488 nm and 561 nm), equipped with an automated stage and perfect focus system. Prior to imaging at 20x magnification, HepG2 cells were loaded for 45 min with 100 ng/mL Hoechst33342 to visualize the nuclei. The time interval was ~40 min for both SRXN1-GFP and propidium iodide (PI) staining. All chemicals came from the ToxCast chemical inventory (kindly provided by Dr. Ann Richard).

Single-cell SRXN1-GFP intensity levels were transformed by log(intensity + 0.001) to attain a more symmetrical distribution. GFP-positive cell counts (GFP_pos) were defined as the fraction of cells per treatment-concentration that were above the 0.75 quantile + 0.25 × the interquartile range of the matching vehicle control single-cell population GFP levels. Concentration-response fits of the GFP-positive fraction were modeled with the lm function in R as a function of the concentration using piecewise natural splines with 3 degrees of freedom: lm(GFP_pos ~ ns(dose_uM, df = 3)). The point-of-departure was calculated as the concentration where the fit intersects the threshold of 0.1 fraction of GFP-positive cells + the upper 90% confidence limit of the regression at each matching concentration to account for replicate variability. The PI-positive cells (PI_pos) represent the fraction of cells with at least 4 pixels of propidium iodide segmentation overlapping the cell segmentation. The cell counts (cell_count) represent the log fold-change in the number of cells compared to the matched vehicle controls and were defined by log10(cell count_plate i / vehicle control cell count_plate i).
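The GFP-positive call can be illustrated with a small Python sketch. It reads the threshold as the vehicle-control third quartile plus 0.25 times the interquartile range, both computed on log-transformed intensities. The function name, the natural-log base, and the "inclusive" quantile method are assumptions; the original analysis was done in R, and the spline fit is not reproduced here.

```python
import math
from statistics import quantiles


def gfp_positive_fraction(treated, vehicle, offset=0.001, iqr_weight=0.25):
    """Fraction of treated cells above the vehicle-derived GFP threshold.

    treated, vehicle: per-cell SRXN1-GFP intensities (non-negative numbers).
    """
    log_vehicle = [math.log(x + offset) for x in vehicle]
    log_treated = [math.log(x + offset) for x in treated]
    # Quartiles of the vehicle control population on the log scale.
    q1, _median, q3 = quantiles(log_vehicle, n=4, method="inclusive")
    threshold = q3 + iqr_weight * (q3 - q1)
    return sum(value > threshold for value in log_treated) / len(log_treated)
```

Because the threshold is a linear combination of log-scale quartiles, the resulting fraction does not depend on which logarithm base is used.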

Results

Building the NRF2 biomarker and assessment of predictive accuracy

The strategy for constructing the NRF2 biomarker, testing predictive accuracy, and screening for modulators is summarized in Fig 1. Our strategy for constructing the NRF2 biomarker was similar to our previous efforts at building biomarkers for other transcription factors in which we utilized archived microarray profiles generated from tissues or cells exposed to known chemical modulators as well as profiles generated from cells in which the gene encoding the transcription factor was either knocked out or knocked down (e.g., [11, 22, 23]). The present strategy utilized diverse profiles from human cells under conditions known to robustly activate NRF2 and included 24–48 hr exposures to the NRF2 activators sulforaphane (Gene Expression Omnibus (GEO) accession numbers: GSE20479 and GSE28813), oltipraz (GSE20479), sulindac (TG-GATES), and quercetin (GSE28878) (Table 1). Chemical activation of NRF2 was also compared to chemical-independent genetic activation by including a profile from siRNA knockdown of the NRF2 negative regulator Keap1 (GSE28813). Biomarker genes were selected based on a number of criteria. These included directionally consistent changes in expression in most or all of the biosets in which NRF2 was chemically- or genetically-activated with no expression changes in the opposite direction. A genetic filter was implemented resulting in only genes that exhibit opposite directional changes after NRF2 was knocked down by siRNA (GSE38332). Finally, genes were filtered for those with relatively robust average expression changes (fold change ≥ |1.5|) across the activating conditions.

The ability of the biomarker to accurately predict NRF2 activation was assessed by comparing the results of the biomarker predictions with those from two high-throughput screening (HTS) assays carried out as part of the ToxCast and Tox21 screening programs. These assays measured the activity of antioxidant response element (ARE)-linked reporters in HepG2 cells. To increase confidence in these classifications, only chemicals that were called positive or negative in both assays were used. There were 36 chemicals screened in these assays that overlapped with microarray studies in HepG2 or human primary hepatocytes in our compendium. The biosets used to create the biomarker (essentially the training set) were excluded from the test. The biomarker was compared to the microarray profiles using the Running Fisher test. Each pair-wise correlation resulted in a p-value which was converted to -Log(p-value), and those with negative correlation were converted to negative numbers. The balanced accuracy for the biomarker was calculated based on the 59 unique comparisons of biosets with corresponding HTS data (Fig 2A). Information about the biosets used in the test is provided in S1 File. The biomarker correctly classified 13 true positive and 42 true negative biosets, with 3 biosets as false positives (allyl alcohol, carbamazepine, coumarin) and one false negative (mefenamic acid), resulting in a sensitivity and specificity of 93%. The positive predictive value was 81%, and the negative predictive value was 98%. The expression of the biomarker genes after exposure to the incorrectly classified chemicals is shown in Fig 2B. In general, the expression patterns of the false positive chemicals were consistent with that of the biomarker itself (i.e., the positive biomarker genes were generally increased in expression whereas the negative biomarker genes were generally decreased in expression). The false negative chemical mefenamic acid approached significance (-Log(p-value) = 3.7) and exhibited a pattern in which all but one of the 11 altered genes were directionally consistent with the biomarker.

Fig 2. Predictive accuracy of the NRF2 biomarker.

Fig 2

A. Predictions using the NRF2 biomarker were compared to those from the HTS NRF2 activation assays. The dashed line represents the cutoff value of four for significant biomarker activation. Those chemicals that were active (red) or inactive (black) in the HTS assays are shown. Three false positives (blue) and one false negative (yellow) are indicated. Inset, binary classifier calculations based on NRF2 HTS assay results. B. Expression changes of the biomarker genes for the three false positives and one false negative. (Top) The bar graph shows the -Log(p-values) of the comparisons between the biomarker and the indicated chemical. (Bottom) The heatmap shows the expression of the biomarker genes.

Attempts were made to evaluate shorter lists of NRF2-regulated genes for prediction. A six-gene biomarker based on some of the most frequently cited NRF2 targets (HMOX1, NQO1, TXNRD1, SRXN1, GPX2, and AKR1B10) had a balanced accuracy of only 66% and a sensitivity of 36%. This analysis indicates that a biomarker with a more comprehensive set of NRF2-dependent genes is more predictive than shorter lists of hand-picked genes.


Characterization of the NRF2 biomarker genes

The 143 NRF2 biomarker genes exhibited consistent expression across the chemical and KEAP1 knockdown perturbations (Fig 3A). As expected, the biosets from chemical activator treated cells or cells in which KEAP1 was knocked down exhibited statistically significant positive correlation to the biomarker (p-values ≤ 10^-10). The bioset in which NFE2L2 was knocked down exhibited significant negative correlation to the biomarker (p-value ≤ 10^-40).

Fig 3. Characterization of the human NRF2 biomarker.

A. Identification of NRF2 biomarker genes was based on consistent directional changes in gene expression resulting from exposure to NRF2 activators sulforaphane (GSE20479 and GSE28813), oltipraz (GSE20479), sulindac (TG-GATES), and quercetin (GSE28878), and knockdown of the NRF2 negative regulator Keap1 (GSE28813). Changes in gene expression were required to be in the opposing direction in cells in which NFE2L2 (NRF2) expression was knocked down (GSE38332). The bars on the left represent the -Log(p-values) for the Running Fisher comparison test between the biomarker and the individual biosets used to construct it. The heatmap on the right depicts gene expression changes for the 143 genes in the biomarker across the individual biosets. B. Canonical pathway analysis of the biomarker genes. Multiple test correction on p-values derived from the right-tailed Fisher’s exact tests was carried out using the Benjamini-Hochberg method. The -Log(q-value)s are shown. C. Potential upstream regulators of NRF2 biomarker genes (|activation z-score| > 2). Only upstream regulators with q-values < 0.05 are shown. Both analyses were conducted using Ingenuity Pathway Analysis software.

The 68 upregulated and 75 downregulated genes in the biomarker included many well-known NRF2 targets (e.g., AKR1B10, AKR1C1, NQO1, TXNRD1, SRXN1, GCLC and GCLM) [28, 38]. Similar to our mouse NRF2 biomarker [20], HMOX1 did not pass the genetic filter. Many of these genes were annotated near genomic regions bound by human NRF2 in human cell lines. Of the 143 NRF2 biomarker genes, 38 (26.6%) were associated with NRF2-bound ChIP-Seq loci near gene promoter regions (within 10 Kb) in at least one dataset derived from cell lines that were treated with the NRF2-activating isothiocyanate SFN or that have constitutively active NRF2 [33, 38, 40]. This compares with a background rate of 13.7%: a total of 3553 of the approximately 26,000 identified transcribed genes in the human genome (hg38) contained NRF2-ChIP bound regions within 10 Kb of the TSS. Therefore, the NRF2 biomarker genes are significantly enriched for NRF2-bound regions (Fisher’s Exact test, p<0.05). Although these reference ChIP-seq loci were not derived from liver cells and could potentially miss regions that are liver-specific, we utilized these existing data from lung and lymphoblastoid cells to identify common NRF2-bound regions. Additionally, we identified 19 genes with evidence of NRF2-bound regions containing sequences that matched the ARE binding motif, supporting the conclusion that some of these biomarker genes are directly regulated by NRF2 in a cis-acting manner (S1 File). Most of these genes (17 of 19) were activated in the biomarker, indicating that these genes are being transcriptionally upregulated by NRF2.
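The enrichment comparison above (38 of 143 biomarker genes versus 3553 of roughly 26,000 genes genome-wide) can be sketched as a one-sided Fisher's exact test, which is equivalent to the upper tail of a hypergeometric distribution. The sketch below rests on assumptions not stated in the text: that the 143 biomarker genes are a subset of the ~26,000 background genes, that the background count is exactly 26,000, and that the test was one-sided. The function name is hypothetical.

```python
from math import comb


def fisher_upper_tail(k_obs: int, n_drawn: int, n_marked: int, n_total: int) -> float:
    """P(X >= k_obs) for X ~ Hypergeometric(n_total, n_marked, n_drawn).

    This is the one-sided (enrichment) Fisher's exact p-value for the 2x2
    table 'in gene set or not' x 'NRF2-bound or not'. Exact integer
    arithmetic is used, with a single division at the end.
    """
    k_min = max(k_obs, n_drawn - (n_total - n_marked), 0)
    k_max = min(n_drawn, n_marked)
    favourable = sum(
        comb(n_marked, k) * comb(n_total - n_marked, n_drawn - k)
        for k in range(k_min, k_max + 1)
    )
    return favourable / comb(n_total, n_drawn)


# Values from the text; 26_000 is the approximate background size (assumption).
biomarker_genes, biomarker_bound = 143, 38
background_genes, background_bound = 26_000, 3553

biomarker_rate = biomarker_bound / biomarker_genes      # about 26.6%
background_rate = background_bound / background_genes   # about 13.7%
p_enrich = fisher_upper_tail(
    biomarker_bound, biomarker_genes, background_bound, background_genes
)

print(f"biomarker rate {biomarker_rate:.1%}, background rate {background_rate:.1%}")
print(f"one-sided p-value below 0.05: {p_enrich < 0.05}")
```

Under these assumptions the p-value falls below 0.05, in line with the reported result; the exact value depends on how the 2x2 table was actually constructed.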

The NRF2 biomarker was evaluated for functional class enrichment via Ingenuity Pathway Analysis (IPA) (Fig 3B). The top canonical pathway identified as enriched with the biomarker genes was “NRF2-mediated oxidative stress”. Other significantly enriched NRF2-linked pathways included “Glutathione biosynthesis” and “Thioredoxin pathway”. NRF2 was identified as the top upstream transcription factor regulating the biomarker genes (Fig 3C). AKT and AhR were also significant upstream activators, both of which are upstream regulators of NRF2 [30, 41]. RARA, ESR1 and PDX1 were classified as upstream regulators that inhibit expression of the NRF2 biomarker genes. RARA (retinoic acid receptor alpha) and ESR1 (estrogen receptor alpha) signaling have previously been identified as inhibitory to NRF2 activation [42, 43]. PDX1 (pancreatic and duodenal homeobox 1) is a diabetes susceptibility gene involved in pancreas development and regulates mitochondrial DNA transcription in pancreatic β-cells [44]. No direct interaction with NRF2 signaling has yet been identified for PDX1; however, indirect links between insulin signaling, PI3K, and NRF2 do exist [45].

Screening for chemicals that modulate NRF2 in a gene expression compendium

The NRF2 biomarker was used to query the entire BSCE chemical database of ~9840 biosets, which examined the effects of ~2260 chemicals. Fig 4A shows the expression of the genes in the NRF2 biomarker across the biosets ranked by Running Fisher test significance. On the far left are those chemical comparisons that exhibited the greatest significant positive correlations to the biomarker. Some of the most significant chemicals included those used to create the biomarker as well as those that are well known to activate NRF2 (discussed below). These chemicals induced a pattern of expression of the biomarker genes markedly similar to that of the biomarker itself. On Fig 4A, far right, a much smaller number of chemicals exhibited significant negative correlations to the biomarker. These chemicals induced a pattern of gene expression that was opposite to that of the biomarker genes, similar to that found when the NFE2L2 gene was knocked down by RNAi (see above). These exposure conditions are thus thought to suppress the activity of NRF2.
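The scoring and ranking described above can be sketched as follows. The Running Fisher test itself is not reimplemented here; the sketch assumes each bioset already has a p-value and a direction of correlation, and it takes the logarithm as base 10, consistent with the cutoff of four corresponding to p-value = 10^-4 in the figure captions. The bioset names and p-values are hypothetical placeholders, not data from the compendium.

```python
import math

CUTOFF = 4.0  # |-log10(p)| >= 4, i.e. p <= 1e-4


def signed_score(p_value: float, positively_correlated: bool) -> float:
    """-log10(p), made negative when the bioset is anti-correlated to the biomarker."""
    score = -math.log10(p_value)
    return score if positively_correlated else -score


def nrf2_call(score: float, cutoff: float = CUTOFF) -> str:
    """Classify a signed score as activation, suppression, or no call."""
    if score >= cutoff:
        return "activated"
    if score <= -cutoff:
        return "suppressed"
    return "no call"


# Hypothetical biosets: (name, p-value, positively correlated?)
biosets = [
    ("bioset_A", 1e-12, True),
    ("bioset_B", 3e-2, True),
    ("bioset_C", 1e-7, False),
]

# Rank from the strongest positive to the strongest negative correlation,
# mirroring the left-to-right ordering described for Fig 4A.
ranked = sorted(
    ((name, signed_score(p, pos)) for name, p, pos in biosets),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: score {score:.2f} -> {nrf2_call(score)}")
```

With this rule a score of 3.7, as reported for mefenamic acid, falls just short of the activation cutoff.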

Fig 4. NRF2 activity across biosets from chemically treated human cells.

A. (Top) Biosets derived from microarray comparisons of human cells exposed to chemicals were rank ordered based on their correlation to the biomarker using the -Log(p-value) of the Running Fisher test. Biosets with positive correlation (red) to the biomarker are on the left and biosets with negative correlation (green) to the biomarker are on the right. The dashed lines denote the cutoff p-value = 10^-4. (Bottom) The heat map shows the expression of genes in the biomarker across the biosets. B, biomarker genes. The numbers refer to rank-ordered bioset number. B. Top 20 chemical biosets that activate NRF2. Right, heatmap depicting the gene expression changes for the 143 genes in the NRF2 biomarker in each bioset. The biomarker gene expression changes are represented across the bottom of the heatmap. Each bioset is represented by the chemical, time and concentration of exposure, cell line used, and annotated study. One study did not have all information available (GSE13818). Abbreviations: BaP, benzo[a]pyrene; 2CMP, 2-(Chloromethyl) pyridine hydrochloride; NPD, 4-Nitro-o-phenylenediamine; CLEFMA, 4-[3,5-bis(2-chlorobenzylidene-4-oxo-piperidine-1-yl)-4-oxo-2-butenoic acid]; SFN, sulforafan; I3C, indole-3-carbinol. C. Top 20 chemical biosets that suppress NRF2. Abbreviations: 4AAF, 4-acetylaminofluorene; 17DMAG, 17-(dimethylaminoethylamino) 17-demethoxygeldanamycin; CCl4, carbon tetrachloride; Mixture, a mixture of liver toxicants; DMBA, 7,12‑Dimethylbenz[a]anthracene.

There were 260 unique chemicals that exhibited NRF2 activation. The top 20 chemical biosets with the greatest correlation to the biomarker, excluding the chemical comparisons used to construct the biomarker, included well-known NRF2 activators (benzo[a]pyrene, sodium arsenite, indole-3-carbinol, menadione, and sulforafan) (Fig 4B). Far fewer chemicals suppressed than activated NRF2 signaling. There were 43 chemicals that exhibited suppression of NRF2. The top 20 chemical biosets with the most significant negative correlation to the biomarker are shown in Fig 4C. The chemicals included carbon tetrachloride, curcumin, and dexamethasone. Approximately half of all the activating chemicals were from experiments in cell types other than HepG2 cells or primary hepatocytes, providing evidence that the biomarker has predictive ability outside the context of liver cells (S1 File).

The microarray compendium includes multiple comparisons assessing the same chemical but in different cell lines at different doses and times of exposure carried out by different labs. To help organize the predictions, a Fisher’s Exact test was used to determine which chemicals were overrepresented in biosets with active NRF2 scores. Of the chemicals examined that had a minimum of three bioset comparisons, 38 were significantly enriched (p < 0.05, with Benjamini-Hochberg correction) for NRF2 activation (Table 2). The most highly enriched chemicals included tobacco smoke, benzo[a]pyrene, sulforafan, quercetin, mitomycin C, and sodium arsenite, all of which are known NRF2 activators or generate reactive oxygen species [46]. Also included in the enriched chemicals were indole-3-carbinol, dithiothreitol (DTT) and a naturally occurring steroid compound, withaferin A.

Table 2. Chemicals enriched in NRF2 active biosets.

Perturbant Total Biosets Active Biosets Inactive Biosets Fisher’s Exact Test P-value (Benjamini-Hochberg Corrected)
Tobacco smoke 239 84 155 1.39E-05
Benzo(A)Pyrene 27 15 12 2.78E-05
Sulforafan 10 10 0 4.17E-05
Quercetin 20 12 8 5.56E-05
Mitomycin 10 9 1 6.95E-05
Smoke 29 11 18 9.72E-05
Sodium Arsenite 11 7 4 1.11E-04
Indole-3-Carbinol 6 5 1 1.39E-04
Dithiothreitol 7 5 2 1.53E-04
Withaferin A 4 4 0 1.67E-04
Mln4924 8 5 3 1.81E-04
Azathioprine 20 7 13 1.94E-04
((1S,2S,4R)-4-(4-((1S)-2,3-Dihydro-1H-Inden-1-Ylamino)-7H-Pyrrolo(2,3-D)Pyrimidin-7-Yl)-2-Hydroxycyclopentyl)Methyl Sulphamate 5 4 1 2.08E-04
Arsenic Trioxide 5 4 1 2.22E-04
Siomycin A 5 4 1 2.36E-04
Phenobarbital 16 6 10 2.64E-04
Nitric Oxide 11 5 6 2.78E-04
Oxyquinoline 6 4 2 2.92E-04
Decitabine 146 16 130 3.06E-04
Tetrachlorodibenzodioxin 19 6 13 3.20E-04
4-(Acetoxymethylnitrosamino)-1-(3-Pyridyl)-1-Butanone 3 3 0 3.33E-04
Cresidine 3 3 0 3.47E-04
Menadione 3 3 0 3.61E-04
Phenylenediamines 3 3 0 3.75E-04
Pyridine Hydrochloride 3 3 0 3.89E-04
Rosemary Oil 3 3 0 4.03E-04
Tert-Butylhydroperoxide 7 4 3 4.45E-04
Azacitidine 24 6 18 4.58E-04
Oxidized-L-Alpha-1-Palmitoyl-2-Arachidonoyl-Sn-Glycero-3-Phosphorylcholine 4 3 1 4.86E-04
Vitamin K 3 10 4 6 5.14E-04
Risperidone 5 3 2 5.28E-04
Atorvastatin 12 4 8 5.42E-04
Propylthiouracil 12 4 8 5.56E-04
17b-estradiol 174 15 159 5.70E-04
Cisplatin 157 14 143 5.83E-04
Aflatoxin B1 13 4 9 5.97E-04
WY 14,643 6 3 3 6.25E-04

The number of total biosets for each chemical were counted, as were the number of biosets in which NRF2 was activated or suppressed. A Fisher’s exact test was used to determine which chemicals were significantly enriched in the group of biosets classified as NRF2 active.

There are several factors to consider when interpreting these results. First, experimental conditions other than the specific chemical (e.g., dose, time, cell line/type) are not included in this analysis, all of which can play a significant role in determining transcriptional responses. Second, the number of biosets a specific chemical is associated with can impact its significance in this test. For example, if a chemical is associated with only three biosets, all three must be positive for NRF2 activity for the chemical to be significantly enriched. Therefore, this strategy is helpful in identifying chemicals with numerous biosets that activate NRF2 or chemicals that consistently activate NRF2 in smaller numbers of comparisons.
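The multiple-testing step applied to the per-chemical Fisher's exact tests is the Benjamini-Hochberg procedure. The following is a minimal sketch of the standard step-up adjustment; the raw p-values are illustrative placeholders rather than values from Table 2, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative raw p-values for four hypothetical chemicals.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adj]
print(adj, significant)
```

A chemical would then be reported as enriched when its adjusted value falls below 0.05.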

Dose-, time-, and mutation-dependent NRF2 modulation

Dose- and time-dependent NRF2 activation relationships were examined in a few select cases as proof of concept that the biomarker could be used to uncover characteristics of NRF2 activation. Biosets from HepG2 cells exposed to the polycyclic aromatic hydrocarbon benzo[a]pyrene (BaP), a prototypical AhR agonist and NRF2 activator, at 2 μM from four separate studies (GSE28878, GSE36242, GSE36243, GSE40117) were examined in a time-course design (Fig 5A). Surprisingly, there was a strong negative, linear correlation between time of exposure (12 to 72 h) and -Log(p-value) (R² = 0.904), indicating that NRF2 activation decreases with time of exposure beyond 12 hours, likely due to sequestration of reactive metabolites of BaP over time. Consistent with this, DNA adduct formation in HepG2 cells by BaP peaks at 12 hours compared to exposures of 6 and 18 h [47]. In contrast, exposure to both 0.01 μM and 50 μM quercetin for 12–48 h in HepG2 cells from two studies (E-MEXP-2574, GSE28878) demonstrated strong positive, linear correlations (R² = 0.957 and 0.846, respectively) (Fig 5B). Diazepam exposure at 10, 50, and 250 μM for 2, 8 and 24 h in primary human hepatocytes (from the TG-GATES study) displayed strong positive, linear correlations (R² = 0.930, 0.964, and 0.999, respectively) (Fig 5C), demonstrating that prolonged diazepam exposures are required to activate NRF2. The time-dependent activation of NRF2 in lung fibroblasts by the known NRF2 activator, dithiothreitol, from GSE4301 was examined (Fig 5D). NRF2 was maximally activated by DTT at 16 hours.
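The R² values above summarize how well a straight line relates exposure time to the biomarker score. A minimal sketch of that calculation by ordinary least squares follows, assuming a simple linear fit was used. The time points and scores are synthetic placeholders chosen only to show a declining trend; they are not the values plotted in Fig 5.

```python
def linear_fit_r2(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Least-squares slope, intercept, and R^2 for y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared


# Synthetic example: scores that fall as exposure time increases.
hours = [12.0, 24.0, 48.0, 72.0]
scores = [20.0, 16.5, 12.0, 4.0]  # hypothetical -Log(p-value) values

slope, intercept, r2 = linear_fit_r2(hours, scores)
trend = "decreasing" if slope < 0 else "increasing"
print(f"slope per hour: {slope:.3f}, R^2: {r2:.3f}, trend: {trend}")
```

A negative slope with a high R² corresponds to the pattern described for benzo[a]pyrene, while a positive slope corresponds to the patterns described for quercetin and diazepam.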

Fig 5. Examples of dose-, time-, and mutation-dependent modulation of NRF2.

A. Response to 2 μM benzo[a]pyrene from 12 to 72 hours in HepG2 cells. Red circles indicate NRF2 activation, blue circles indicate inactivity. Data from GSE28878, GSE36242, GSE36243, GSE40117. B. Exposure of HepG2 cells to 0.01 μM and 50 μM quercetin from 12 to 48 hours. Data from E-MEXP-2574, GSE28878. C. Exposure of human primary hepatocytes to 10, 50, and 250 μM diazepam from 2 to 24 hours. Data from the TG-GATES study. D. Exposure of lung fibroblasts to dithiothreitol (2.5 mM) at different time points. Data from GSE4301. E. Examples of chemicals suppressing the activation of NRF2 in cancer cells expressing activated PI3K and EGFR. The activating mutations in PIK3CA (E545K and H1047R) and overexpression of EGFR in the presence of EGF (caEGFR) lead to activation of NRF2 compared to wild-type cells. HCC827 with EGFR Del15 cells possess an amplified EGFR allele with an activating in-frame deletion of 15 nucleotides in exon 19. Treatment of the cells with inhibitors for PI3K (LY294002) or EGFR (erlotinib) in the indicated cells suppresses background NRF2 activation.

Two chemicals were found to suppress NRF2 under conditions in which NRF2 exhibited constitutively higher background activity than in wild-type cells. Two activating mutations in the phosphatidylinositol-4,5-bisphosphate 3-kinase, catalytic subunit alpha gene (PIK3CA) (E545K and H1047R) (GSE33403) and overexpression of the epidermal growth factor receptor (EGFR) in the presence of EGF (caEGFR) (GSE3542) led to NRF2 activation compared to wild-type cells (Fig 5E). Treatment of cells carrying the activating mutations in PIK3CA with the PI3K inhibitor LY294002 led to suppression of NRF2. Treatment of HCC827 cells, which possess an amplified EGFR allele with an activating in-frame deletion of 15 nucleotides in exon 19, with the EGFR inhibitor erlotinib resulted in suppression of the background NRF2 activation.

Comparison between biomarker predictions and HTS studies: Validation of predictions with a real-time NRF2 activation assay

Ninety chemicals screened in the ToxCast and/or Tox21 NRF2-activation assays were found in the microarray compendium. The HTS assays and the biomarker agreed on NRF2 activity for 65 (72%) of the chemicals, comprising seventeen actives and 48 inactives (Fig 6). The biomarker identified thirteen actives that the HTS assays did not, and the HTS assays identified twelve actives that the biomarker did not.

Fig 6. Biomarker prediction agreement with HTS NRF2 assay results.

Maximum biomarker scores for each chemical were extracted from the database of HepG2 and hepatocyte experiments, and activity calls based on these scores were compared with activity determinations from ToxCast and Tox21 NRF2 HTS assays. Chemicals examined at a concentration above the HTS-determined cytotoxicity threshold for that chemical were excluded from the analysis. Lists represent those chemicals classified as NRF2-active by the biomarker only (left) or by HTS only (right). Biomarker score values shaded in red indicate results that are ≥ -Log(p-value) = 4 for activation of NRF2 signaling.

Thirty chemicals were selected for follow-up screening in the SRXN1-GFP assay. The SRXN1-GFP assay identifies chemicals that activate NRF2, detected by increased expression of the GFP reporter under control of the human SRXN1 promoter in HepG2 cells [39]. The chemicals were selected in part from the lists of NRF2 activators that differed between HTS assays and biomarker predictions (Table 3). Cells were treated at multiple concentrations and examined for GFP accumulation over a 24-h period. Thirteen chemicals (2-nitrofluorene, acetaminophen, allyl alcohol, atorvastatin, benzoin, carbamazepine, coumarin, diethylstilbestrol, disulfiram, mefenamic acid, resorcinol, tetracycline, tolbutamide) were selected that had a positive response based on the biomarker approach. Of these, seven (2-nitrofluorene, allyl alcohol, carbamazepine, diethylstilbestrol, mefenamic acid, resorcinol, tetracycline) induced NRF2 activity at one or more concentrations (Fig 7). The two positive controls quercetin and sulindac that were used to build the biomarker were positive in the assay. Of the remaining six inactive compounds, four (acetaminophen, benzoin, coumarin, tolbutamide) were tested in the microarray studies at concentrations exceeding what could be achieved in the GFP reporter assay. Thus, it is likely that NRF2 was not activated by these chemicals because of insufficient concentration (Table 3). Of the fourteen chemicals predicted to be inactive using the biomarker approach, nine (fenofibrate, progesterone, propiconazole, simazine, simvastatin, sulfisoxazole, tamoxifen, triclosan, valproic acid) were not active in the reporter assay. Four (danazol, hydroquinone, indomethacin, pentachlorophenol) of the remaining five chemicals predicted to be inactive using the biomarker were active in the reporter assay but were tested in the microarray studies at concentrations under most of the concentrations used in the reporter assay. Thus, these chemicals appear to be active at the concentrations tested in the reporter assays but not at the lower concentrations used in the microarray studies. Using the designations in Table 3 for the compounds studied, we determined predictive accuracy. The sensitivity, specificity, positive predictive value, and negative predictive value were 0.75, 0.82, 0.75 and 0.82, respectively. The balanced accuracy was 0.78.
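The reported accuracy values can be reproduced by tallying the last column of Table 3 as printed. The sketch below assumes that only rows designated TP, FP, TN, or FN enter the calculation, and that rows coded 1, 2, PC, or NU are set aside; this assumption is not stated explicitly in the text but yields the reported numbers.

```python
from collections import Counter

# Designations from the last column of Table 3, in row order.
designations = [
    "TP", "2", "TP", "FP", "2", "TP", "FN", "2", "1", "TP",
    "FP", "TN", "1", "FN", "NU", "1", "1", "TN", "TN", "PC",
    "TP", "TN", "TN", "TN", "PC", "TN", "TP", "2", "TN", "TN",
]

counts = Counter(designations)
tp, fp, tn, fn = counts["TP"], counts["FP"], counts["TN"], counts["FN"]

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)
npv = tn / (tn + fn)
balanced_accuracy = (sensitivity + specificity) / 2

print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
print(f"PPV={ppv:.2f} NPV={npv:.2f} balanced accuracy={balanced_accuracy:.2f}")
```

Only 19 of the 30 rows contribute to these statistics under this reading, so each value rests on a small number of chemicals.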

Table 3. Comparison of chemical effects using the biomarker and the SRXN1-GFP assay.

Chemical; CAS Number; Biomarker maximum (-log(p-value)); Highest no-effect concentration for NRF2 activation in microarray experiments; Range of NRF2-active concentrations in microarray studies; Maximum tested concentration in SRXN1-GFP assay (uM); Minimum tested concentration in SRXN1-GFP assay (uM); Summary of SRXN1-GFP assay; Prediction of SRXN1-GFP results by microarray
2-Nitrofluorene 607-57-8 6 32–18 uM 200 0.1 Active TP
Acetaminophen 103-90-2 6.522879 10–1 mM 200 0.1 Inactive 2
Allyl alcohol 107-18-6 7.022276 70uM 200 0.1 Active TP
Atorvastatin 134523-00-5 11.40894 40uM 190 0.1 Inactive FP
Benzoin 119-53-9 9.091515 345uM 200 0.1 Inactive 2
Carbamazepine 298-46-4 7.065502 300uM 200 0.1 Active TP
Coumaphos 56-72-4 1.793174 250uM 200 0.1 Active FN
Coumarin 91-64-5 6.69897 300uM 200 0.1 Inactive 2
Danazol 17230-88-5 3.39794 35uM 200 0.1 Active 1
Diethylstilbestrol 56-53-1 8.585027 5uM 200 0.1 Active TP
Disulfiram 97-77-8 5.823909 60uM 200 0.1 Inactive FP
Fenofibrate 49562-28-9 0.556737 30uM 200 0.1 Inactive TN
Hydroquinone 123-31-9 1.549751 150uM 200 0.1 Active 1
Indomethacin 53-86-1 3.69897 200uM 200 0.1 Active FN
LY294002 154447-36-6 -5.638 20uM 200 0.1 Active NU
Mefenamic acid 61-68-7 4 150uM 200 0.1 Active 1
Pentachlorophenol 87-86-5 1.752027 10uM 100 0.05 Active 1
Progesterone 57-83-0 0 6uM 200 0.1 Inactive TN
Propiconazole 60207-90-1 1.847712 10uM 200 0.1 Inactive TN
Quercetin 117-39-5 26.48149 10nM, 50uM 200 0.1 Active PC
Resorcinol 108-46-3 10.40894 2mM 200 0.1 Active TP
Simazine 122-34-9 -0.96981 50uM 200 0.1 Inactive TN
Simvastatin 79902-63-9 0.407601 30uM 200 0.1 Inactive TN
Sulfisoxazole 127-69-5 -2.92082 5uM 200 0.1 Inactive TN
Sulindac 38194-50-2 11.85387 0.6, 3mM 200 0.1 Active PC
Tamoxifen 10540-29-1 0 25uM 100 0.05 Inactive TN
Tetracycline 60-54-8 10.88606 2mM 200 0.1 Active TP
Tolbutamide 64-77-7 9.744727 2.1mM 200 0.1 Inactive 2
Triclosan 3380-34-5 -0.61834 22uM 200 0.1 Inactive TN
Valproic acid 99-66-1 1.171985 5mM 200 0.1 Inactive TN

Biomarker maximum (-log(p-value)): for any chemical represented by more than one bioset in the compendium, the top -Log(p-value) was selected. The range of concentrations assessed using the biomarker were from experiments in hepatocytes or hepatocyte-derived cell lines. In the column “Prediction of SRXN1-GFP results by microarray”: PC, positive control; 1 = concentration tested in the microarray studies was under that tested in the GFP assay; 2 = concentration tested in the microarray study exceeded that tested in the GFP assay; NU = not used in prediction. LY294002 was tested to determine if this compound suppresses NRF2 activity (described in Fig 5E).

Fig 7. NRF2 activity of selected chemicals in HepG2 cells.

Thirty chemicals were tested for confirmation in the SRXN1-GFP assay. Cells were treated at multiple concentrations and examined for GFP accumulation over a 24-h period (GFP_pos_1–10) for concentrations 1 to 10. The last two columns, PI_pos_10 and cell_count_10, are the Propidium Iodide positive fraction of cells (marker for necrosis/dead cells) and cell count compared to the negative control, respectively.

In addition, one of the test chemicals, the PI3K inhibitor LY294002, acted as a NRF2 suppressor in the context of a constitutively active PIK3CA in the human breast cancer cell line MCF10A when tested at 20 μM (see above). We predicted that this compound would suppress background levels of NRF2 activation, but instead discovered it to be an activator in the reporter assay, albeit at concentrations that concurrently caused cytotoxicity.

Discussion

High-throughput transcriptomic (HTTr) technologies have the potential to identify molecular targets in in vitro screens of environmental chemicals. In the present study, we used a computational approach to identify modulators of NRF2, the major regulator of cellular responses to oxidative stress. Using microarray data from chemical exposure and genetic modulation conditions known to either activate or suppress NRF2 activity, we identified 143 biomarker genes with an expression pattern consistent with NRF2 activation. When used in conjunction with a pattern-matching approach, our methods could readily identify, in a large human cell line gene expression compendium, chemical exposure experiments known to modulate NRF2. Using classifications from two HTS assays of NRF2 activation carried out in HepG2 cells, we found that the approach was highly predictive in identifying NRF2 activators (balanced accuracy of 93%). In an independent study of 29 chemicals, the predictions based on the NRF2 biomarker were validated for most of the chemicals in HepG2 cells encoding a NRF2-responsive GFP reporter (78% balanced accuracy). Our methods were used to perform a virtual screen of chemicals in a human gene expression compendium of ~2260 chemicals. We found that 260 chemicals activated NRF2, including several chemicals that were not also identified in the NRF2 HTS assays. We identified 43 chemicals that appeared to suppress NRF2, because exposure led to an expression pattern opposite that of the biomarker itself. The results indicate that our approach can be reliably used as a Tier 1 screen in the context of a larger HTTr profiling effort, similar to those ongoing in the ToxCast screening program [48].

Determination of predictive accuracy required identification of a set of chemicals with known (positive or negative) activity for NRF2. In our microarray compendium there were relatively few chemicals that could be considered “true” positive or negative NRF2 activators, because many chemical activators reported in the literature have been investigated only at the level of transcriptional induction of one or a few NRF2 target genes, far less than the full-genome analysis in the present study. Microarray studies of dose-response and time-course relationships are likewise relatively rare. Thus, the number of chemicals that can be confidently classified as true activators within the human compendium was too small to perform meaningful balanced accuracy calculations. We therefore chose to leverage the ToxCast and Tox21 HTS assay data to classify chemicals as NRF2 activators. Only chemicals consistently active or inactive in both assays were used to determine accuracy. These selection criteria yielded 35 chemicals with consensus NRF2 HTS activity. Using these designations as the truth, the balanced accuracy of the biomarker was 93%. There were three false positives (allyl alcohol, carbamazepine, coumarin) and one false negative (mefenamic acid). The microarray profiles of these compounds exhibited expression of biomarker genes that was consistent in direction and magnitude with the changes of the biomarker genes. Remarkably, carbamazepine was weakly active at 100 μM, and coumarin was active at 50 μM in earlier studies carried out in HepG2 cells [49, 50], indicating that despite the lack of activity in both HTS assays, these chemicals are likely NRF2 activators. Reclassification of these two chemicals as true positives improves balanced accuracy to 96%. The excellent predictive accuracy is similar to that of other biomarkers that we have characterized, including those that identify chemical modulators of xenobiotic-activated transcription factors in the mouse and rat liver [11–13, 16, 17, 19] and those that identify modulators of ER and AR in human cell lines [22, 23].

In addition to the chemicals discussed above, which had consistent NRF2-activating behavior in both HTS assays, we expanded our comparison to the larger set of chemicals in which there were differences between the two HTS assays. In general, our predictions of NRF2 activation compare favorably with those made from ToxCast and Tox21 in vitro HTS assays. After filtering out biosets exposed at concentrations higher than the HTS-derived cytotoxicity threshold, predictions for 65 chemicals agreed between the biomarker and HTS approaches (Fig 6). There were thirteen chemicals that the biomarker identified as activating NRF2 that were not called hits in the HTS assays. This may be the result of cell type differences, because primary hepatocyte data were included in the biomarker-based approach. HTS assays identified twelve active chemicals classified as inactive by the biomarker. However, these microarray studies were not optimal for comparison to the more standard HTS data. The chemicals included five in which exposures were below the HTS-determined AC50 (i.e., test concentrations insufficient to activate NRF2); four chemicals were tested using exposure durations that may not be optimal for NRF2 activation, including three that were exposed for 8 h or less and one for 72 h. Two chemicals had no dose information available. Thus, the microarray data available for these chemicals were suboptimal. One notable difference in the chemicals identified by HTS assays but not the biomarker was curcumin. Curcumin is a diarylheptanoid found in turmeric and is well known to increase NRF2 signaling [51, 52]. HTS assays correctly identified curcumin as a hit, whereas the biomarker did not. Surprisingly, gene expression changes in HepG2 cells exposed to 1 μM for 12, 24, or 48 hours (GSE28878) resulted in significant NRF2 suppression at 48 hours, exactly opposite of that expected. Regarding the chemicals that were classified as activating NRF2 by the biomarker approach but not in the HTS assays, it is possible that the conditions used to assay NRF2 in the HTS assays were not optimal. Most of the chemicals were either examined at doses above 100 μM (eight chemicals) or at time points greater than 24 hours (three chemicals), or both in the microarray studies. Only allyl alcohol was examined under conditions comparable to the HTS assays (70 μM and at 24 hours). Differences in chemical metabolism may also contribute to differences between assays. Thus, the differences in the classifications between the HTS assays and the biomarker could be explained in part by differences in the concentrations and times of exposure. These considerations are important given that there are differences between chemicals in the exposure times that produce maximal activation of NRF2 (Fig 5).

Given some inconsistencies between our biomarker approach and the HTS assays, we examined 29 chemicals in HepG2 cells that possess a NRF2-responsive reporter in which GFP is under control of the promoter from the SRXN1 gene. We found that when dose is considered, most of the predictions based on the biomarker approach could be confirmed using the reporter system. In any toxicity assay, false negatives are a concern. Out of the thirteen chemicals that were positive using the biomarker approach, six chemicals tested inactive in the reporter assay. However, all but two of the chemicals (atorvastatin, disulfiram) were active only at concentrations that could not be achieved using the reporter assay due to DMSO tolerance of the HepG2 cells coupled to stock solution concentrations. Excluding those chemicals in which sufficient test concentrations could not be achieved, the reporter assay was able to confirm seven of the nine chemicals that were identified as positive using the biomarker (Table 3). In summary, the SRXN1-GFP assay was able to predict the NRF2 activation status of most of the chemicals selected for rescreening (78% balanced accuracy).

We performed a virtual screen of the chemicals in the human microarray compendium and found many chemical-dose-time conditions that result in NRF2 activation. The biomarker correctly identified many known chemical NRF2 activators (e.g., sodium arsenite, phenylenediamines, azathioprine, dithiothreitol), as well as known environmental stressors that increase oxidative stress (tobacco smoke, nitric oxide). Furthermore, the procedures identified chemicals that interact with the aryl hydrocarbon receptor (benzo[a]pyrene, smoke) and whose CYP-mediated metabolism results in increased oxidative stress, activating NRF2 [41]. Importantly, the biomarker identified NRF2-activating chemicals in cell lines other than HepG2 or primary hepatocytes. This finding was not unexpected because the biomarker was built from data derived from multiple cell types. Although there are likely cell type-specific effects of NRF2 activation (e.g., [34, 53]), there are also core changes in gene expression that can be used as a marker for activation in a larger number of cell lines.

In addition to NRF2 activators, our method identified chemicals that suppress NRF2 activity. Forty-three chemicals elicited a pattern of gene expression with significant negative correlation to the biomarker, similar to the pattern observed with NRF2 knockdown. Although it is possible that fewer chemicals suppress than activate NRF2, it is likely that the experimental conditions across the compendium were not optimized to identify NRF2 suppressors. To effectively identify suppressors, an attempt to first stimulate NRF2 activity is needed, similar to running the “antagonist mode” in hormone receptor assays in which cells are pre-treated with a test compound before the addition of a known agonist. In fact, two compounds identified by the biomarker as NRF2 suppressors, the PI3K inhibitor LY294002 and the EGFR inhibitor erlotinib, did so in the context of cells harboring oncogenic mutations that led to constitutive activation of PI3K in MCF10A cells [54] or lung cancer cells expressing a constitutively active EGFR [55]. Examination of two constitutively active mutants in PIK3CA vs. wild-type cells or overexpression and activation of EGFR vs. wild-type cells showed that NRF2 was activated, supporting earlier work [30]. We tested the ability of the LY294002 compound to suppress NRF2 in our SRXN1-GFP assay, but instead we were surprised that NRF2 was activated. The activation may be an off-target effect since the NRF2-activating concentration of the chemical was much higher than that used to inhibit PI3K. Overall, while screening for NRF2 suppressors appears to be feasible using our approach, the microarray data in the compendium were not generated under conditions optimized to allow suppression of constitutively activated NRF2. In summary, the present study is to our knowledge the first to utilize gene expression data to identify potential human NRF2 modulators in a large screening study. Using a similar approach, our group has recently identified chemicals that activate NRF2 in the mouse liver and compared the activation to conditions in which a number of xenobiotic-activated transcription factors are chemically activated [20, 21, 56].

One focus of the ToxCast high-throughput screening program at EPA is to prioritize for further testing the vast number of chemicals used in industry with little toxicity information. Gene expression profiling represents a more global screening approach and an important complementary system to the current in vitro HTS assays that are at the forefront of the field. The historical challenge of low throughput of gene expression analysis has been partially addressed through advances in technology. Another major challenge since the beginning of toxicogenomics is linking changes in gene expression to specific molecular events and toxicological outcomes. Our work here on NRF2 (and, by extrapolation, on increases in oxidative stress), together with other publications examining endpoints important in endocrine disruption [22, 23] and DNA damage [24, 57], directly addresses this issue. Moving forward, biomarkers for activation of transcription factors can, to some extent, be developed using existing microarray data. However, there are many important targets of environmental chemicals that lack the chemical activator data required to construct predictive biomarkers. Cell-based experiments utilizing siRNA knockdown or CRISPR-Cas9-mediated knockout of transcription factors of interest, combined with exposure to chemical activators and gene expression profiling via current high-throughput approaches, have the potential to quickly fill these data gaps.

Supporting information

S1 File. Supporting information.

Contains 1) list of genes that comprise the NRF2 biomarker, 2) evidence from ChIP-Seq studies for direct interactions between biomarker genes and NRF2, and 3) biosets examined in this study.

(XLSX)

Acknowledgments

We thank two anonymous scientists for critical review of the manuscript, Dr. Ann Richard for the ToxCast chemicals, Drs. James Flynn and Joe Delaney for assistance with BSCE, and Chuck Gaul and Molly Windsor for assistance in making the figures.

Disclaimer: The information in this document has been funded in part by the U.S. Environmental Protection Agency. It has been subjected to review by the Center for Computational Toxicology and Exposure and approved for publication. Approval does not signify that the contents reflect the views of the Agency, nor does mention of trade names or commercial products constitute endorsement or recommendation for use.

Data Availability

All microarray or RNA-Seq data are available in Gene Expression Omnibus and ArrayExpress. The accession numbers are found in S1 File.

Funding Statement

This research was supported in part by a postdoctoral appointment to JPR to the Research Participation Program for the U.S. Environmental Protection Agency, Office of Research and Development, administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and EPA. There was no additional external funding received for this study.

References

  • 1. Judson R, Houck K, Martin M, Knudsen T, Thomas RS, Sipes N, et al. In vitro and modelling approaches to risk assessment from the U.S. Environmental Protection Agency ToxCast programme. Basic & clinical pharmacology & toxicology. 2014;115(1):69–76. doi: 10.1111/bcpt.12239
  • 2. Browne P, Judson RS, Casey WM, Kleinstreuer NC, Thomas RS. Screening Chemicals for Estrogen Receptor Bioactivity Using a Computational Model. Environmental science & technology. 2015;49(14):8804–14. Epub 2015/06/13. doi: 10.1021/acs.est.5b02641
  • 3. Judson RS, Magpantay FM, Chickarmane V, Haskell C, Tania N, Taylor J, et al. Integrated Model of Chemical Perturbations of a Biological Pathway Using 18 In Vitro High-Throughput Screening Assays for the Estrogen Receptor. Toxicological sciences: an official journal of the Society of Toxicology. 2015;148(1):137–54. Epub 2015/08/15. doi: 10.1093/toxsci/kfv168
  • 4. Kleinstreuer NC, Ceger P, Watt ED, Martin M, Houck K, Browne P, et al. Development and Validation of a Computational Model for Androgen Receptor Activity. Chemical research in toxicology. 2017;30(4):946–64. Epub 2016/12/10. doi: 10.1021/acs.chemrestox.6b00347
  • 5. Cox LA, Popken D, Marty MS, Rowlands JC, Patlewicz G, Goyak KO, et al. Developing scientific confidence in HTS-derived prediction models: lessons learned from an endocrine case study. Regulatory toxicology and pharmacology: RTP. 2014;69(3):443–50. Epub 2014/05/23. doi: 10.1016/j.yrtph.2014.05.010
  • 6. Filer D, Patisaul HB, Schug T, Reif D, Thayer K. Test driving ToxCast: endocrine profiling for 1858 chemicals included in phase II. Current opinion in pharmacology. 2014;19:145–52. Epub 2014/12/03. doi: 10.1016/j.coph.2014.09.021
  • 7. Lamb J, Crawford ED, Peck D, Modell JW, Blat IC, Wrobel MJ, et al. The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science (New York, NY). 2006;313(5795):1929–35. Epub 2006/09/30. doi: 10.1126/science.1132939
  • 8. Lamb J. The Connectivity Map: a new tool for biomedical research. Nature reviews Cancer. 2007;7(1):54–60. Epub 2006/12/23. doi: 10.1038/nrc2044
  • 9. Subramanian A, Narayan R, Corsello SM, Peck DD, Natoli TE, Lu X, et al. A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles. Cell. 2017;171(6):1437–52.e17. Epub 2017/12/02. doi: 10.1016/j.cell.2017.10.049
  • 10. Yeakley JM, Shepard PJ, Goyena DE, VanSteenhouse HC, McComb JD, Seligmann BE. A trichostatin A expression signature identified by TempO-Seq targeted whole transcriptome profiling. PloS one. 2017;12(5):e0178302. Epub 2017/05/26. doi: 10.1371/journal.pone.0178302
  • 11. Oshida K, Vasani N, Thomas RS, Applegate D, Rosen M, Abbott B, et al. Identification of modulators of the nuclear receptor peroxisome proliferator-activated receptor alpha (PPARalpha) in a mouse liver gene expression compendium. PloS one. 2015;10(2):e0112655. Epub 2015/02/18. doi: 10.1371/journal.pone.0112655
  • 12. Oshida K, Vasani N, Jones C, Moore T, Hester S, Nesnow S, et al. Identification of chemical modulators of the constitutive activated receptor (CAR) in a gene expression compendium. Nuclear receptor signaling. 2015;13:e002. Epub 2015/05/08. doi: 10.1621/nrs.13002
  • 13. Oshida K, Vasani N, Thomas RS, Applegate D, Gonzalez FJ, Aleksunes LM, et al. Screening a mouse liver gene expression compendium identifies modulators of the aryl hydrocarbon receptor (AhR). Toxicology. 2015;336:99–112. Epub 2015/07/29. doi: 10.1016/j.tox.2015.07.005
  • 14. Corton JC. Frequent Modulation of the Sterol Regulatory Element Binding Protein (SREBP) by Chemical Exposure in the Livers of Rats. Submitted. 2018.
  • 15. Corton JC. Frequent Modulation of the Sterol Regulatory Element Binding Protein (SREBP) by Chemical Exposure in the Livers of Rats. Computational toxicology (Amsterdam, Netherlands). 2019;10:113–29. Epub 2019/04/02. doi: 10.1016/j.comtox.2019.01.007
  • 16. Oshida K, Vasani N, Waxman DJ, Corton JC. Disruption of STAT5b-Regulated Sexual Dimorphism of the Liver Transcriptome by Diverse Factors Is a Common Event. PloS one. 2016;11(3):e0148308. Epub 2016/03/10. doi: 10.1371/journal.pone.0148308
  • 17. Oshida K, Waxman DJ, Corton JC. Chemical and Hormonal Effects on STAT5b-Dependent Sexual Dimorphism of the Liver Transcriptome. PloS one. 2016;11(3):e0150284. Epub 2016/03/10. doi: 10.1371/journal.pone.0150284
  • 18. Rooney J, Chorley B, Corton JC. A Gene Expression Biomarker Identifies Chemicals and Other Factors in the Mouse Liver That Modulate Sterol Regulatory Element Binding Protein (SREBP) Highlighting Differences in Targeted Regulation of Cholesterogenic and Lipogenic Genes. Submitted. 2018.
  • 19. Rooney J, Hill T, 3rd, Qin C, Sistare FD, Corton JC. Adverse outcome pathway-driven identification of rat liver tumorigens in short-term assays. Toxicology and applied pharmacology. 2018;356:99–113. Epub 2018/07/27. doi: 10.1016/j.taap.2018.07.023
  • 20. Rooney J, Oshida K, Vasani N, Vallanat B, Ryan N, Chorley BN, et al. Activation of Nrf2 in the liver is associated with stress resistance mediated by suppression of the growth hormone-regulated STAT5b transcription factor. PloS one. 2018;13(8):e0200004. Epub 2018/08/17. doi: 10.1371/journal.pone.0200004
  • 21. Rooney JP, Oshida K, Kumar R, Baldwin WS, Corton JC. Chemical Activation of the Constitutive Androstane Receptor Leads to Activation of Oxidant-Induced Nrf2. Toxicological sciences: an official journal of the Society of Toxicology. 2019;167(1):172–89. Epub 2018/09/12. doi: 10.1093/toxsci/kfy231
  • 22. Ryan N, Chorley B, Tice RR, Judson R, Corton JC. Moving Toward Integrating Gene Expression Profiling Into High-Throughput Testing: A Gene Expression Biomarker Accurately Predicts Estrogen Receptor alpha Modulation in a Microarray Compendium. Toxicological sciences: an official journal of the Society of Toxicology. 2016;151(1):88–103. Epub 2016/02/13. doi: 10.1093/toxsci/kfw026
  • 23. Rooney JP, Chorley B, Kleinstreuer N, Corton JC. Identification of Androgen Receptor Modulators in a Prostate Cancer Cell Line Microarray Compendium. Toxicological sciences: an official journal of the Society of Toxicology. 2018;166(1):146–62. Epub 2018/08/08. doi: 10.1093/toxsci/kfy187
  • 24. Corton JC, Williams A, Yauk CL. Using a gene expression biomarker to identify DNA damage-inducing agents in microarray profiles. Environmental and molecular mutagenesis. 2018;59(9):772–84. Epub 2018/10/18. doi: 10.1002/em.22243
  • 25. Ankley GT, Bennett RS, Erickson RJ, Hoff DJ, Hornung MW, Johnson RD, et al. Adverse outcome pathways: a conceptual framework to support ecotoxicology research and risk assessment. Environmental toxicology and chemistry / SETAC. 2010;29(3):730–41. doi: 10.1002/etc.34
  • 26. Corton JC, Kleinstreuer NC, Judson RS. Identification of potential endocrine disrupting chemicals using gene expression biomarkers. Toxicology and applied pharmacology. 2019:114683. Epub 2019/07/22. doi: 10.1016/j.taap.2019.114683
  • 27. Motohashi H, Katsuoka F, Engel JD, Yamamoto M. Small Maf proteins serve as transcriptional cofactors for keratinocyte differentiation in the Keap1-Nrf2 regulatory pathway. Proceedings of the National Academy of Sciences of the United States of America. 2004;101(17):6379–84. doi: 10.1073/pnas.0305902101
  • 28. Ma Q. Role of nrf2 in oxidative stress and toxicity. Annual review of pharmacology and toxicology. 2013;53:401–26. doi: 10.1146/annurev-pharmtox-011112-140320
  • 29. Lau A, Villeneuve NF, Sun Z, Wong PK, Zhang DD. Dual roles of Nrf2 in cancer. Pharmacological research. 2008;58(5–6):262–70. doi: 10.1016/j.phrs.2008.09.003
  • 30. Mitsuishi Y, Taguchi K, Kawatani Y, Shibata T, Nukiwa T, Aburatani H, et al. Nrf2 redirects glucose and glutamine into anabolic pathways in metabolic reprogramming. Cancer cell. 2012;22(1):66–79. Epub 2012/07/14. doi: 10.1016/j.ccr.2012.05.016
  • 31. Chambel SS, Santos-Goncalves A, Duarte TL. The Dual Role of Nrf2 in Nonalcoholic Fatty Liver Disease: Regulation of Antioxidant Defenses and Hepatic Lipid Metabolism. BioMed research international. 2015;2015:597134. Epub 2015/06/30. doi: 10.1155/2015/597134
  • 32. Kupershmidt I, Su QJ, Grewal A, Sundaresh S, Halperin I, Flynn J, et al. Ontology-based meta-analysis of global collections of high-throughput public data. PloS one. 2010;5(9). Epub 2010/10/12. doi: 10.1371/journal.pone.0013066
  • 33. Singh A, Misra V, Thimmulappa RK, Lee H, Ames S, Hoque MO, et al. Dysfunctional KEAP1-NRF2 interaction in non-small-cell lung cancer. PLoS medicine. 2006;3(10):e420. Epub 2006/10/06. doi: 10.1371/journal.pmed.0030420
  • 34. Simmons SO, Fan CY, Yeoman K, Wakefield J, Ramabhadran R. NRF2 Oxidative Stress Induced by Heavy Metals is Cell Type Dependent. Current chemical genomics. 2011;5:1–12. Epub 2011/06/07. doi: 10.2174/1875397301105010001
  • 35. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (Oxford, England). 2009;25(14):1754–60. Epub 2009/05/20. doi: 10.1093/bioinformatics/btp324
  • 36. Feng J, Liu T, Zhang Y. Using MACS to identify peaks from ChIP-Seq data. Current protocols in bioinformatics. 2011;Chapter 2:Unit 2.14. Epub 2011/06/03. doi: 10.1002/0471250953.bi0214s34
  • 37. Zhu LJ, Gazin C, Lawson ND, Pages H, Lin SM, Lapointe DS, et al. ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data. BMC bioinformatics. 2010;11:237. Epub 2010/05/13. doi: 10.1186/1471-2105-11-237
  • 38. Chorley BN, Campbell MR, Wang X, Karaca M, Sambandan D, Bangura F, et al. Identification of novel NRF2-regulated genes by ChIP-Seq: influence on retinoid X receptor alpha. Nucleic acids research. 2012;40(15):7416–29. Epub 2012/05/15. doi: 10.1093/nar/gks409
  • 39. Wink S, Hiemstra S, Herpers B, van de Water B. High-content imaging-based BAC-GFP toxicity pathway reporters to assess chemical adversity liabilities. Archives of toxicology. 2017;91(3):1367–83. Epub 2016/07/01. doi: 10.1007/s00204-016-1781-0
  • 40. Wang X, Campbell MR, Lacher SE, Cho HY, Wan M, Crowl CL, et al. A Polymorphic Antioxidant Response Element Links NRF2/sMAF Binding to Enhanced MAPT Expression and Reduced Risk of Parkinsonian Disorders. Cell reports. 2016;15(4):830–42. Epub 2016/05/07. doi: 10.1016/j.celrep.2016.03.068
  • 41. Yeager RL, Reisman SA, Aleksunes LM, Klaassen CD. Introducing the "TCDD-inducible AhR-Nrf2 gene battery". Toxicological sciences: an official journal of the Society of Toxicology. 2009;111(2):238–46. Epub 2009/05/29. doi: 10.1093/toxsci/kfp115
  • 42. Wang XJ, Hayes JD, Henderson CJ, Wolf CR. Identification of retinoic acid as an inhibitor of transcription factor Nrf2 through activation of retinoic acid receptor alpha. Proceedings of the National Academy of Sciences of the United States of America. 2007;104(49):19589–94. doi: 10.1073/pnas.0709483104
  • 43. Namani A, Li Y, Wang XJ, Tang X. Modulation of NRF2 signaling pathway by nuclear receptors: implications for cancer. Biochimica et biophysica acta. 2014;1843(9):1875–85. doi: 10.1016/j.bbamcr.2014.05.003
  • 44. Gauthier BR, Wiederkehr A, Baquie M, Dai C, Powers AC, Kerr-Conte J, et al. PDX1 deficiency causes mitochondrial dysfunction and defective insulin secretion through TFAM suppression. Cell metabolism. 2009;10(2):110–8. doi: 10.1016/j.cmet.2009.07.002
  • 45. Hayes JD, Dinkova-Kostova AT. The Nrf2 regulatory network provides an interface between redox and intermediary metabolism. Trends in biochemical sciences. 2014;39(4):199–218. doi: 10.1016/j.tibs.2014.02.002
  • 46. Pritsos CA, Sartorelli AC. Generation of reactive oxygen radicals through bioactivation of mitomycin antibiotics. Cancer research. 1986;46(7):3528–32.
  • 47. Souza T, Jennen D, van Delft J, van Herwijnen M, Kyrtoupolos S, Kleinjans J. New insights into BaP-induced toxicity: role of major metabolites in transcriptomics and contribution to hepatocarcinogenesis. Archives of toxicology. 2016;90(6):1449–58. Epub 2015/08/05. doi: 10.1007/s00204-015-1572-z
  • 48. Thomas RS, Bahadori T, Buckley TJ, Cowden J, Deisenroth C, Dionisio KL, et al. The Next Generation Blueprint of Computational Toxicology at the U.S. Environmental Protection Agency. Toxicological sciences: an official journal of the Society of Toxicology. 2019;169(2):317–32. Epub 2019/03/06. doi: 10.1093/toxsci/kfz058
  • 49. Prince M, Li Y, Childers A, Itoh K, Yamamoto M, Kleiner HE. Comparison of citrus coumarins on carcinogen-detoxifying enzymes in Nrf2 knockout mice. Toxicology letters. 2009;185(3):180–6. Epub 2009/01/20. doi: 10.1016/j.toxlet.2008.12.014
  • 50. Herpers B, Wink S, Fredriksson L, Di Z, Hendriks G, Vrieling H, et al. Activation of the Nrf2 response by intrinsic hepatotoxic drugs correlates with suppression of NF-kappaB activation and sensitizes toward TNFalpha-induced cytotoxicity. Archives of toxicology. 2016;90(5):1163–79. doi: 10.1007/s00204-015-1536-3
  • 51. Balogun E, Hoque M, Gong P, Killeen E, Green CJ, Foresti R, et al. Curcumin activates the haem oxygenase-1 gene via regulation of Nrf2 and the antioxidant-responsive element. The Biochemical journal. 2003;371(Pt 3):887–95. doi: 10.1042/BJ20021619
  • 52. Yang C, Zhang X, Fan H, Liu Y. Curcumin upregulates transcription factor Nrf2, HO-1 expression and protects rat brains against focal ischemia. Brain research. 2009;1282:133–41. doi: 10.1016/j.brainres.2009.05.009
  • 53. Deferme L, Briede JJ, Claessen SM, Cavill R, Kleinjans JC. Cell line-specific oxidative stress in cellular toxicity: A toxicogenomics-based comparison between liver and colon cell models. Toxicology in vitro: an international journal published in association with BIBRA. 2015;29(5):845–55. Epub 2015/03/25. doi: 10.1016/j.tiv.2015.03.007
  • 54. Hutti JE, Pfefferle AD, Russell SC, Sircar M, Perou CM, Baldwin AS. Oncogenic PI3K mutations lead to NF-kappaB-dependent cytokine expression following growth factor deprivation. Cancer research. 2012;72(13):3260–9. Epub 2012/05/04. doi: 10.1158/0008-5472.CAN-11-4141
  • 55. Rothenberg SM, Concannon K, Cullen S, Boulay G, Turke AB, Faber AC, et al. Inhibition of mutant EGFR in lung cancer cells triggers SOX2-FOXO6-dependent survival pathways. eLife. 2015;4. Epub 2015/02/17. doi: 10.7554/eLife.06132
  • 56. J R. Placeholder for the Nrf2-CAR study. 2018.
  • 57. Cho E, Buick JK, Williams A, Chen R, Li HH, Corton JC, et al. Assessment of the performance of the TGx-DDI biomarker to detect DNA damage-inducing agents using quantitative RT-PCR in TK6 cells. Environmental and molecular mutagenesis. 2019;60(2):122–33. Epub 2018/11/30. doi: 10.1002/em.22257

Decision Letter 0

Roberto Mantovani

1 May 2020

PONE-D-20-07833

Mining a human transcriptome database for chemical modulators of NRF2

PLOS ONE

Dear Dr. Corton,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Particular care should be given to addressing the concerns of Reviewer 2, who has been more critical.

We would appreciate receiving your revised manuscript by Jun 14 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Roberto Mantovani

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements:

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.plosone.org/attachments/PLOSOne_formatting_sample_main_body.pdf and http://www.plosone.org/attachments/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please provide additional information about the HepG2 cell line used in this work, including source, history, culture conditions and any quality control testing procedures (authentication, characterisation, and mycoplasma testing). For more information, please see http://journals.plos.org/plosone/s/submission-guidelines#loc-cell-lines.

3. Please note that PLOS does not permit references to “data not shown.” Authors should provide the relevant data within the manuscript, the Supporting Information files, or in a public repository. If the data are not a core part of the research study being presented, we ask that authors remove any references to these data.

4. To comply with PLOS ONE submission guidelines, in your Methods section, please provide additional information regarding your statistical analyses. For more information on PLOS ONE's expectations for statistical reporting, please see https://journals.plos.org/plosone/s/submission-guidelines#loc-statistical-reporting.

5. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

6. Thank you for stating the following in the Acknowledgments Section of your manuscript:

"The information in this document has been funded in part by the U.S. Environmental Protection Agency. This research was supported in part by a postdoctoral appointment to JPR to the Research Participation Program for the U.S. Environmental Protection Agency, Office of Research and Development, administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and EPA."

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

"The authors received no specific funding for this work."

7. Thank you for stating in your Funding Statement:

"The information in this document has been funded in part by the U.S. Environmental Protection Agency. This research was supported in part by a postdoctoral appointment to JPR to the Research Participation Program for the U.S. Environmental Protection Agency, Office of Research and Development, administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and EPA."

Please provide an amended statement that declares *all* the funding or sources of support (whether external or internal to your organization) received during this study, as detailed online in our guide for authors at http://journals.plos.org/plosone/s/submit-now.  Please also include the statement “There was no additional external funding received for this study.” in your updated Funding Statement.

Please include your amended Funding Statement within your cover letter. We will change the online submission form on your behalf.

8. We note that you have included the phrase “data not shown” in your manuscript. Unfortunately, this does not meet our data sharing requirements. PLOS does not permit references to inaccessible data. We require that authors provide all relevant data within the paper, Supporting Information files, or in an acceptable, public repository. Please add a citation to support this phrase or upload the data that corresponds with these findings to a stable repository (such as Figshare or Dryad) and provide any URLs, DOIs, or accession numbers that may be used to access these data. Or, if the data are not a core part of the research being presented in your study, we ask that you remove the phrase that refers to these data.

9. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Gene expression data:

Is it possible to make the expression data available for your biomarker gene set, or does PLOS One require the publication of this data?

A short explanation of the commercial database used here or how this database ensures the integration of the different gene expression data (normalization strategy, ...) would improve the transparency of the data analysis. Maybe the authors can add a short paragraph instead of citing an article. My answer to the question: "Has the statistical analysis been performed appropriately and rigorously? -> I don't know" is related to this part.

Characterization of the NRF2 biomarker genes (ChIP-seq analysis):

The authors analyzed and verified the final biomarker gene set in HepG2 cells. To what extent are the ChIP-seq data used suitable for making statements about regulatory areas in HepG2 cells? To what extent can direct (motif-defined) or indirect binding of NRF2 be expected?

How did the authors link the experimental ChIP-seq regions to the biomarker gene set? Maybe a motif analysis of a "defined" promoter region would be more meaningful here. Perhaps this analysis was already done (see link: supplemental file 1 and line 390 in the manuscript), but maybe the authors can add more details here.

In this paper the ChIP-seq analysis was done based on Ensembl database version 86 (Oct 2016). Why did the authors use this older version of the database in the current paper?

In figure 4 the term 'bioset' is used in the legend and in the paper. In the paper the term 'biomarker gene set' is used before. Do these two terms refer to the same gene set? A minor point, but I was confused by these two terms. Maybe it is a frequently used term that I just do not know.

Dose-, time-, and mutation-dependent NRF2 modulation:

The authors compare different chemicals screened in the ToxCast and/or Tox21 studies and relate these results to NRF2. First liver cells are shown, but fibroblast and breast cancer cell line data are also shown. The authors validate their findings using an ARE-linked reporter system in HepG2 cells. The comparison with non-liver-related results is confusing. How comparable are the findings between different cell types? Is there a possibility to estimate the completeness of this gene set?

-----

Figure 1: Biomarker construction and screening strategy.

-> In all other Figure description the authors use Figure X. Description of the figure.

Maybe the ":" is a typo.

Short comment on Figure 6:

Maybe the red color used should be defined in the legend of this figure.

Reviewer #2: The manuscript describes a very reasonable approach to using chemical perturbation transcriptional signatures to identify chemicals altering the activity of the transcription factor NRF2. I have several comments that I would like to see addressed by the authors.

Procedure used to construct the list of differentially expressed genes used as NRF2 biomarkers - Relevant sections: Methods (162-177), Results (334-338):

- I think some key details are missing from the methods. In general, the level of detail and tone of the methods section seems more adequate to the results section. Specifically:

- can you be more specific, and provide the logical rules that were used to construct these gene-sets, e.g. fold change ≥ 1.5 and adjusted p-value < ... in at least N/M NRF2 activation data-sets, and fold change < ... and adjusted p-value < ... in ... NRF2 activation data-set(s)?

- how was gene differential expression calculated? Please state what method was used (e.g. limma) and if a linear model was used, how were the experimental factors modelled? And what multiple test correction method. This applies to the analysis of other differential expression data-sets

- can you please specify if all genes were probed or only a subset? (some older microarray platforms only probed a subset of genes)

- how different were these biomarker gene-sets from each other? E.g. can you show a reciprocal overlap matrix, and report their sizes..?

Performance assessment:

- I would appreciate a bit more detail on the performance for all ten biomarker gene-sets. You used ToxCast and Tox21 data to pick the best biomarker gene-set, therefore these data-sets were in a sense part of your training set. You don't have a completely separate test set, and so the performance may be a bit inflated. In general, it is a good practice to use cross-validation in this type of situation. However, if the performances of the ten biomarker gene-sets are all quite similar, I would not be too worried about this (also recognizing that you don't have a lot of ToxCast and Tox21 datasets, and so implementing cross-validation may be a bit tricky, the set is small, so things may get noisy). For the same reason, it would be useful to know how different were these biomarker gene-sets from each other (as commented above). You could use the ChIP-seq data to argue that the selection of the best biomarker set is robust (see comment below).

- Is this PPV really general? Usually the PPV can be different from sensitivity and specificity if the number of real positives is swamped by the number of real negatives (e.g. a test detecting a disorder that's present in 0.1% of the population). You have 14 real positives and 45 real negatives, does this reflect the general distribution in reality, can you comment on this?

ChIP-seq and PWM analysis:

- the percentage of signature genes with a ChIP-seq mark is impressive (68.5%). I would like a bit more detail. For the annotation using ChIPpeakAnno, how far from a gene (transcription start or end) could a peak be? What fraction of genes had a peak before performing any intersection with the biomarker gene set? Can you show that this is much better than, e.g., taking genes that were upregulated in at least one NRF2 activation data-set? Can you show that the top biomarker gene-set also had better ChIP-seq overlap (following up on the issue of potential inflation in performance)?

- similar considerations for the PWM analysis

Ingenuity pathway analysis

- can you please use multiple-test adjusted p-values? (or if you already did, make that explicit)

- can you provide more details on how this analysis is performed "top upstream regulating transcription factor that regulated the biomarker genes"

Screening for chemicals that modulate NRF2

- can you explicitly comment on the anticipated false discovery rate? My back of the envelope calculation is: 9840 biosets * 1e-4 = 0.984 biosets, which is pretty stringent (this is very simplistic as it does not take into account correlations, which are probably abundant)

- for chemicals represented in multiple data-sets, how about using the median fold-change to collapse redundancy..?

Follow-up screening using the SRXN1-GFP assay

- in my mind, this experiment should be used as follows: from the results of 'Screening for chemicals that modulate NRF2', pick some that induce a positive effect, some that induce a negative effect, some that don't induce an effect; this selection should be unbiased, and it should exclude chemicals whose differential expression was used to construct and optimize the biomarker set (e.g. quercetin, sulindac, ...). Testing genes used to construct the biomarker gene-set would make sense only to validate this assay (e.g. it is known that quercetin activates NRF2). Doses should also be matched. Instead, it appears this experiment was designed to answer multiple questions (e.g. are negative results in the ToxCast and Tox21 used to select the best biomarker gene-set really negative, even when the biomarker gene-set suggests otherwise?), and it's harder to draw firmer conclusions; for instance, several of the chemicals used to construct and optimize the best biomarker gene-set were included.

- for these reasons, I am not supportive of this statement in the discussion: "In an independent study of 29 chemicals, the predictions based on the NRF2 biomarker were independently validated for most (81%) of the chemicals in HepG2 cells encoding a NRF2-responsive GFP reporter."

Discussion

- it is quite long-winded and should be shortened/summarized a bit

Other / General

- is it possible to have both effect size and p-value for biomarker signature correlations using Running Fisher..?

Table 1

- Is it possible to indicate the number of differentially expressed genes?

Figure 2

- figure 2A has resolution issues, please remake it

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Sep 28;15(9):e0239367. doi: 10.1371/journal.pone.0239367.r002

Author response to Decision Letter 0


12 Jun 2020

Response to reviewer comments for

Mining a human transcriptome database for chemical modulators of Nrf2

John P. Rooney1,2, Brian Chorley2, Steven Hiemstra3, Steven Wink3, Xuting Wang4, Douglas A. Bell4, Bob van de Water3, and J. Christopher Corton2,5

We thank the two reviewers for their suggestions that greatly improve the clarity and impact of the manuscript. Our responses to each comment are found below in bold.


Reviewer #1: Gene expression data:

Is it possible to make the expression data available for your biomarker gene set, or does PLOS One require the publication of this data?

We have provided the list of the biomarker genes and their associated fold-change values in Supplemental File 1. Also in this file, we provide a master table of the different microarray comparisons (biosets) examined in this study including the GEO or ArrayExpress numbers of the publicly available datasets. We hope this addresses your question.

A short explanation of the commercial database used here, or of how this database ensures the integration of the different gene expression data (normalization strategy, ...), would improve the transparency of the data analysis. Maybe the authors can add a short paragraph instead of citing an article. My answer to the question "Has the statistical analysis been performed appropriately and rigorously?" ("I don't know") is related to this part.

The reviewer makes an important point about the data analysis procedures, which should be clear in the paper. We have added text to the first paragraph of the Methods section describing the gene expression workflow used in BaseSpace Correlation Engine (BSCE), including the criteria for selecting significantly altered genes. It should be noted that our group has used this database for the last 5 years, resulting in >20 publications. The data are rigorously quality controlled. In fact, BSCE notes that it rejects ~30% of all studies because they do not meet its quality control criteria. We have great confidence that the methods used are rigorous and allow for appropriate comparisons between gene lists.

Characterization of the NRF2 biomarker genes (ChIP-seq analysis):

The authors analyzed and verified the final biomarker gene set in HepG2 cells. To what extent are the ChIP-seq data used suitable for making statements about regulatory areas in HepG2 cells? To what extent can direct (motif-defined) or indirect binding of NRF2 be expected?

No NRF2 ChIP datasets are available from NRF2-activated human hepatocytes or liver samples, so we had to rely on the limited published datasets available. Our previous studies in LCL (Chorley et al. NAR 2012) demonstrated that under basal conditions, without electrophilic or pro-oxidant stimulation, NRF2 has very low activation levels; we therefore attempted to tip the balance by using only “NRF2 activated” datasets. The three cell lines were NRF2-activated (chemical or genetic) human airway cells (A549, BEAS-2B) and lymphoblastoid cells (LCL). Although all are human-derived, there will be cell type- or tissue-specific differences in NRF2 targets among lung, blood cells, and liver; we nevertheless cast the widest net possible to include all detected ChIP-based targets in this analysis. The resulting comparison to liver-centric NRF2 targets may produce a higher rate of “false negatives”, and a liver-based model might have validated more of the genes. This point is now mentioned in the Results, page 11, lines 405-408. Given our resource-limited environment, we had to use previously published data. The additional analysis detecting an ARE in these regions indicates that these are likely bona fide target genes. Indeed, pull-down of genomic DNA through complex protein interactions and 3D chromatin conformation may result in NRF2 enrichment in genomic regions where NRF2 is not directly mediating transcription. To more clearly define likely cis-acting transcriptional mediation, we re-ran these datasets, limiting the NRF2-bound loci to those within 10 kb of bi-directional promoters using a new feature in “ChIPpeakAnno”. The combination of ARE identification and ChIP-identified loci within promoter regions gives the best available evidence that a gene is directly targeted (and presumably transactivated) by NRF2.

How did the authors link the experimental ChIP-seq regions to the biomarker gene set? Maybe a motif analysis of a "defined" promoter region would be more meaningful here. Perhaps this analysis was already done (see link: supplemental file 1 and line 390 in the manuscript), but maybe the authors can add more details here.

The original analysis simply looked for the gene region closest to each annotated peak. Based on your and Reviewer 2’s suggestions, we limited the loci to those that are likely cis-acting (e.g. within 10 kb of bi-directional promoter regions of genes and other transcribed regions). The combination of NRF2-bound peaks close to annotated gene promoters and the identification of bona fide ARE sequences within these peaks gives the best available evidence that these gene promoters are likely directly bound (and presumably regulated) by NRF2. This change is now noted on pg. 7, line 262.

In this paper the ChIP-seq analysis was based on Ensembl database version 86 (Oct 2016). Why did the authors use this older version of the database in a current paper?

For this re-run of the data, the current UCSC transcriptional database, last updated Oct 2019, was used to annotate the ChIP-seq peaks. This is noted on pg. 7, lines 263-264.

In figure 4 the term 'bioset' is used in the legend and in the paper. Earlier in the paper the term 'biomarker gene set' is used. Do these two terms refer to the same gene set? A minor point, but I was confused by these two terms. Maybe it is a frequently used term that I just do not know.

We are sorry this was not made clear. A bioset refers to a list of differentially expressed genes resulting from a specific comparison in the BSCE such as a chemical treatment vs control treatment. Text has been added to the first paragraph of the Methods to define this term.

Dose-, time-, and mutation-dependent NRF2 modulation:

The authors compare different chemicals screened in the ToxCast and/or Tox21 studies and relate these results to NRF2. First liver cells are shown, but fibroblast and breast cancer cell line data are also shown. The authors validate their findings using an ARE-linked reporter system in HepG2 cells. The comparison with non-liver-related results is confusing. How comparable are the findings between different cell types? Is there a possibility to estimate the completeness of this gene set?

The reviewer brings up some important points. The biomarker was constructed for evaluating liver cell line data but also for use with transcript profiles derived from other cell lines that express Nrf2. Thus, we used a number of biosets from liver and non-liver cells to construct the biomarker in order to find common genes. Non-liver cells were used in exploratory data analysis when liver cell data were not available. Lung fibroblasts and MCF cells are known to express Nrf2 and respond to oxidative stress, and while the biomarker was not formally tested for accuracy in lung cells, it is not surprising to see activation in these comparisons. The inclusion of the breast cancer cell line was based in part on the fact that suppression of Nrf2 signaling by siRNA was carried out in this cell line. As for the completeness of the gene set, we do not have an answer. It is very likely that there are other Nrf2-regulated genes that were not identified, but they may be more context specific. For example, there may be genes that are regulated by Nrf2 in chemical-specific and cell-specific manners, as we note in the manuscript. However, the goal of the project was not to evaluate the Nrf2 regulon in multiple tissues but to find a set of genes consistently altered when Nrf2 is perturbed that can be used for prediction. We are confident based on our findings that the biomarker that we constructed is able to predict Nrf2 activation in many cell types. We hope this addresses the questions.

-----

Figure 1: Biomarker construction and screening strategy.

-> In all other Figure description the authors use Figure X. Description of the figure.

Maybe the ":" is a typo.

Thank you for noticing this. This has been fixed.

Short comment on Figure 6:

Maybe the red color used should be defined in the legend of this figure.

A sentence has been added to the figure legend to describe the red shaded results.

Reviewer #2: The manuscript describes a very reasonable approach to using chemical perturbation transcriptional signatures to identify chemicals altering the activity of the transcription factor NRF2. I have several comments that I would like to see addressed by the authors.

Procedure used to construct the list of differentially expressed genes used as NRF2 biomarkers - Relevant sections: Methods (162-177), Results (334-338):

- I think some key details are missing from the methods. In general, the level of detail and tone of the methods section seems more adequate to the results section. Specifically:

- can you be more specific, and provide the logical rules that were used to construct these gene-sets, e.g. fold change ≥ 1.5 and adjusted p-value < ... in at least N/M NRF2 activation data-sets, and fold change < ... and adjusted p-value < ... in ... NRF2 activation data-set(s)?

Thank you for pointing out that this area needs further clarification. We have now added details to the Methods describing how we filtered the genes that ultimately ended up in the biomarker. These include a fold change cutoff of ≥1.5 averaged across all 7 biosets used to construct the biomarker in which Nrf2 is activated. The unadjusted p-value cutoff of p<0.05 is inherent to the BaseSpace Correlation Engine data processing (see comments from Reviewer #1). For genes to be included in the biomarker, expression changes had to be in the same direction in 4 of the 7 activation biosets, and none of the biosets could show a change in the opposite direction. Lastly, the direction of regulation had to be the opposite in the Nrf2 siRNA bioset. We have now added text to the Methods and the Results to better describe the criteria used to build the final biomarker used in the study.
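For illustration only, the filtering rules described in this response can be sketched in Python. This is not the code used in the study. The function name, the use of `None` for a gene that was not called as altered (p ≥ 0.05) in a bioset, the signed fold-change convention, the reading of "4 of the 7" as "at least 4", and the choice to average absolute fold change over the biosets in which the gene was called are all assumptions made for the sketch.

```python
def passes_biomarker_filter(activation_fc, sirna_fc, min_avg_fc=1.5, min_consistent=4):
    """Return True if a gene meets the (assumed) biomarker inclusion rules.

    activation_fc: signed fold changes in the 7 NRF2-activation biosets,
                   with None where the gene was not called as altered.
    sirna_fc:      signed fold change in the NRF2 siRNA bioset, or None.
    """
    called = [fc for fc in activation_fc if fc is not None]
    up = sum(1 for fc in called if fc > 0)
    down = sum(1 for fc in called if fc < 0)

    # No activation bioset may show a change in the opposite direction.
    if up and down:
        return False
    # Assumption: "4 of the 7" is read as "at least 4 of the 7".
    if max(up, down) < min_consistent:
        return False
    # Assumption: average the absolute fold change over the called biosets.
    avg_abs_fc = sum(abs(fc) for fc in called) / len(called)
    if avg_abs_fc < min_avg_fc:
        return False
    # The siRNA knockdown must move the gene in the opposite direction.
    if sirna_fc is None:
        return False
    direction = 1 if up else -1
    return sirna_fc * direction < 0
```

A gene induced in four activation biosets and repressed after NRF2 knockdown would pass, whereas a gene with even one opposing activation bioset would not.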

- how was gene differential expression calculated? Please state what method was used (e.g. limma) and if a linear model was used, how were the experimental factors modelled? And what multiple test correction method. This applies to the analysis of other differential expression data-sets

A paragraph has been added to the Methods section describing the gene expression workflow used in BaseSpace Correlation Engine (BSCE). The methods used by BSCE to identify genes are deliberately less stringent than other approaches in that no multiple test correction (MTC) is implemented. However, this works to our advantage when building a biomarker, as genes that would be filtered out by the MTC remain available as candidates for incorporation into the biomarker. The stringent cutoffs implemented in our biomarker construction procedures greatly reduce the number of possible Nrf2-regulated genes to those that are cell line and chemical agnostic, allowing the identification of a set of genes that are predictive of Nrf2 modulation.

- can you please specify if all genes were probed or only a subset? (some older microarray platforms only probed a subset of genes)

This is an important point. BSCE only incorporates data from studies that use genome-level gene expression microarrays or RNA-Seq. With that said, there are examples of microarrays that do not cover the entire genome including the U133A Affy chip. Each GEO or ArrayExpress number for each bioset is provided in Supplemental File 1 to allow answering further questions about each of the biosets if the reader so desires.

- how different were these biomarker gene-sets from each other? E.g. can you show a reciprocal overlap matrix, and report their sizes.?

Performance assessment:

- I would appreciate a bit more detail on the performance for all ten biomarker gene-sets. You used ToxCast and Tox21 data to pick the best biomarker gene-set, therefore these data-sets were in a sense part of your training set. You don't have a completely separate test set, and so the performance may be a bit inflated. In general, it is a good practice to use cross-validation in this type of situation. However, if the performances of the ten biomarker gene-sets are all quite similar, I would not be too worried about this (also recognizing that you don't have a lot of ToxCast and Tox21 datasets, and so implementing cross-validation may be a bit tricky, the set is small, so things may get noisy). For the same reason, it would be useful to know how different were these biomarker gene-sets from each other (as commented above). You could use the ChIP-seq data to argue that the selection of the best biomarker set is robust (see comment below).

- Is this PPV really general? Usually the PPV can be different from sensitivity and specificity if the number of real positives is swamped by the number of real negatives (e.g. a test detecting a disorder that's present in 0.1% of the population). You have 14 real positives and 45 real negatives, does this reflect the general distribution in reality, can you comment on this?

The reviewer asks some very relevant questions about our analysis. We realize that this is confusing and detracts from the presentation of the value of the final biomarker used. In response to these concerns, we have deleted the text mentioning the other biomarkers as these do not add to the paper. We feel that the manuscript is long enough to describe the process of choosing the biomarker studied. We have also changed Figure 1 accordingly. These modifications now make the paper consistent with our past publications including one in PLOS ONE in which we focus on the final biomarker studied, the criteria for filtering genes and accuracy determination as we have done for the biomarkers for the estrogen receptor, the androgen receptor, and MTF1.

Regarding the questions about prediction: first, we did not use the biosets used in making the biomarker in our prediction study. That is, we did not include the “training set”. Second, we had to use the Tox21/ToxCast data to provide a list of the true positives and negatives, and we feel that this was the best dataset that could be used. These assays were trans-activation assays and did not involve gene expression profiling, so including these identified chemicals, examined in the same cell line but in different labs and likely under different exposure conditions, provided a good test set for assessing predictive accuracy. The reviewer makes a good point that the test set was not balanced. Although the numbers of true positives and true negatives are not approximately equal, we decided to use the information we had to determine the predictive accuracy.

Regarding the point about using the ChIP-Seq data to help guide the selection of the genes, it could be possible, but we have not tried it. We feel that if we use the ChIP-Seq data to guide selection of the genes, we may end up with genes that are not necessarily good for prediction of Nrf2 activation when you query microarray profiles. However, looking for overlap after selection of the genes helps to validate that we have selected a set of genes enriched in those that are directly regulated by Nrf2.

Regarding the comment about the distribution of Nrf2 positives and negatives in the chemical universe, we are not sure anybody can answer that question with available data, as it is likely dependent on cellular context. However, screens for Nrf2 activation in the trans-activation studies could be used to answer that question. Our database is not set up to answer it systematically because of the inherent heterogeneity of the dataset. It is an interesting question, though.
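The dependence of PPV on the proportion of true positives, which is the point at issue in this exchange, can be illustrated with a short Python sketch. The sensitivity and specificity values below are hypothetical placeholders, not the study's results; only the 14 positives and 45 negatives come from the reviewer's comment.

```python
def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity; insensitive to class imbalance."""
    return (sensitivity + specificity) / 2


def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical test characteristics, chosen only for illustration.
SENS = 0.90
SPEC = 0.90

# Prevalence in the test set quoted by the reviewer: 14 positives, 45 negatives.
test_set_prevalence = 14 / (14 + 45)

ppv_test_set = ppv(SENS, SPEC, test_set_prevalence)
# The reviewer's example of a condition present in 0.1% of the population.
ppv_rare = ppv(SENS, SPEC, 0.001)
```

With the same sensitivity and specificity, the PPV computed at the test-set prevalence is far higher than at 0.1% prevalence, while the balanced accuracy is unchanged. This is why a PPV measured on the 59-comparison test set need not generalize to a chemical universe with a different fraction of NRF2 activators.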

ChIP-seq and PWM analysis:

- the percentage of signature genes with a ChIP-seq mark is impressive (68.5%). I would like a bit more detail. For the annotation using ChIPpeakAnno, how far from a gene (transcription start or end) could a peak be? What fraction of genes had a peak before performing any intersection with the biomarker gene set? Can you show that this is much better than, e.g., taking genes that were upregulated in at least one NRF2 activation data-set? Can you show that the top biomarker gene-set also had better ChIP-seq overlap (following up on the issue of potential inflation in performance)?

- similar considerations for the PWM analysis

Thank you for raising these concerns about potentially inflating the gene matches to the ChIP-seq datasets. In response, we re-evaluated these data to annotate only peaks within 10 kb of bi-directional promoter regions. This distance was chosen based on our previous finding that many NRF2 peaks with ARE regions fell within this range of gene transcriptional start sites (Chorley et al., Nucleic Acids Research 2012). Using this cut-off, we increase the likelihood of finding NRF2-bound loci that are potentially cis-acting. The previous analysis identified the features closest to each peak; however, these distances could be 100 kb or more. Although it is possible that conformational changes in the chromatin could create enhancer regions at these distances, we do not have the information to confirm this. Therefore, this re-evaluation enriched for loci near annotated human promoter regions. Disregarding the intersect with the biomarker gene set and using this new filter, the current analysis identified 2819, 301 and 5212 transcriptional features near peaks. Combined, this is 7633 unique features. Note that this is out of a possible 197,782 transcriptional features identified in the database. Of these features, 38 genes annotated with peaks were in common with the biomarker set. This lowered our match appreciably to 26.6%; however, we more likely captured the genes that are directly regulated by NRF2. Regarding enrichment of these peak-matched genes with other NRF2 data-sets, the purpose was not to validate that NRF2 was involved in the alteration of these genes (which would be captured by gene expression datasets, as suggested, and would predictably show a better overlap), but to provide supportive evidence that some of these biomarker genes are likely directly regulated by NRF2. While not of primary concern for generating the biomarker gene list (which could contain genes further downstream of target genes, i.e. non-direct), it does support that we are likely capturing some direct target genes of NRF2. Further refinement is achieved by demonstrating the presence of an ARE. With our new analysis we show that 19 of 38 genes with NRF2 peaks contain AREs (50%). This is a much better match than in the previous analysis, which found only 21 AREs associated with 98 peak-associated genes (21.4%), supporting the idea that we are enriching for regions with the potential for direct NRF2 binding. These changes are now reflected in the updated Supplemental File 1 and on pg. 11, lines 403-412 and pg. 15, lines 604-608.
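As a rough illustration of the distance filter and of the percentages quoted in this response, the following Python sketch may help. It is not the ChIPpeakAnno procedure itself (ChIPpeakAnno is an R/Bioconductor package). The function names and the treatment of a promoter as a single coordinate are simplifying assumptions; the counts are those given in the text, with 143 being the size of the biomarker.

```python
def within_window(peak_start, peak_end, tss, window=10_000):
    """True if a peak overlaps, or lies within `window` bp of, a TSS coordinate.

    Assumption: the promoter is represented by one coordinate (`tss`);
    the actual annotation works on promoter regions, not points.
    """
    if peak_start <= tss <= peak_end:
        return True
    return min(abs(peak_start - tss), abs(peak_end - tss)) <= window


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


BIOMARKER_SIZE = 143  # genes in the biomarker

# Counts quoted in the response.
new_peak_overlap = percent(38, BIOMARKER_SIZE)  # genes with a peak within 10 kb
old_peak_overlap = percent(98, BIOMARKER_SIZE)  # closest-feature annotation
new_are_fraction = percent(19, 38)              # peak-associated genes with an ARE
old_are_fraction = percent(21, 98)
```

The arithmetic reproduces the figures in the text: restricting to the 10 kb window lowers the peak overlap but raises the fraction of peak-associated genes that carry an ARE.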

Ingenuity pathway analysis

- can you please use multiple-test adjusted p-values? (or if you already did, make that explicit)

- can you provide more details on how this analysis is performed "top upstream regulating transcription factor that regulated the biomarker genes"

We have now performed a multiple test correction (MTC) on the IPA results using the Benjamini-Hochberg procedure. While the more stringent cutoff resulted in some of the canonical pathways no longer being significant, the upstream analysis results did not change. Accordingly, we have reconstructed one of the figures and indicate in the Methods and the legend that q-values were used.
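For readers unfamiliar with the Benjamini-Hochberg correction mentioned here, a minimal Python sketch of the standard step-up adjustment follows. It illustrates the general procedure only and is not the IPA implementation; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

A pathway or regulator is then called significant when its q-value, rather than its raw p-value, falls below the chosen threshold, which explains why some canonical pathways can drop out after correction.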

Screening for chemicals that modulate NRF2

- can you explicitly comment on the anticipated false discovery rate? My back of the envelope calculation is: 9840 biosets * 1e-4 = 0.984 biosets, which is pretty stringent (this is very simplistic as it does not take into account correlations, which are probably abundant)

- for chemicals represented in multiple data-sets, how about using the median fold-change to collapse redundancy..?

The reviewer makes an interesting point here about the false discovery rate. In past studies with other biomarkers we tried to determine what p-value cutoff to use for the Running Fisher test. We started with a p-value of 0.001, and after a Benjamini-Hochberg multiple test correction the p-value cutoff was set at 0.0001, which we have used in all of our biomarker studies. Using this as our threshold for determining activation, our predictive accuracy has always been above 90% and ranges up to 98%. Importantly, for all of our biomarkers we are able to identify known activators that have correlation p-values close to 0.0001.
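The reviewer's back-of-the-envelope estimate can be written out as a short Python sketch. It assumes independent tests with uniformly distributed null p-values, which, as the reviewer notes, is simplistic because the biosets are correlated; the function name is hypothetical.

```python
def expected_false_positives(n_tests, p_cutoff):
    """Expected number of null tests passing the cutoff, assuming independence."""
    return n_tests * p_cutoff


# Values quoted in the reviewer's comment: ~9840 biosets, cutoff of 1e-4.
expected = expected_false_positives(9840, 1e-4)
```

Under these assumptions the expectation is just under one bioset across the whole screen, which is the sense in which the reviewer calls the cutoff stringent.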

Regarding the point about collapsing the fold-change values, we had not considered that, and we are not sure we understand the reason for doing so. We feel that each bioset represents a unique exposure situation even if the same chemical is used. By not collapsing, we identify the chemical exposure conditions that lead to activation. By collapsing gene sets from the same chemical, we could increase the false positive and false negative rates due to changes in the expression of different sets of genes within the biomarker. We can only speculate, but we appreciate the idea, which may be used in future studies if we can develop code to do this in an automated fashion.

Follow-up screening using the SRXN1-GFP assay

- in my mind, this experiment should be used as follows: from the results of 'Screening for chemicals that modulate NRF2', pick some that induce a positive effect, some that induce a negative effect, some that don't induce an effect; this selection should be unbiased, and it should exclude chemicals whose differential expression was used to construct and optimize the biomarker set (e.g. quercetin, sulindac, ...). Testing genes used to construct the biomarker gene-set would make sense only to validate this assay (e.g. it is known that quercetin activates NRF2). Doses should also be matched. Instead, it appears this experiment was designed to answer multiple questions (e.g. are negative results in the ToxCast and Tox21 used to select the best biomarker gene-set really negative, even when the biomarker gene-set suggests otherwise?), and it's harder to draw firmer conclusions; for instance, several of the chemicals used to construct and optimize the best biomarker gene-set were included.

- for these reasons, I am not supportive of this statement in the discussion: "In an independent study of 29 chemicals, the predictions based on the NRF2 biomarker were independently validated for most (81%) of the chemicals in HepG2 cells encoding a NRF2-responsive GFP reporter."

The reviewer brings up some important points here for this validation exercise. We see the point about including the two chemicals used to make the biomarker in the analysis. We have now recategorized them as positive controls and left them out of the analysis. Using the new list of chemicals, we have performed a predictive accuracy determination to find that the balanced accuracy is 78%. We have changed the text in the Results section and the Discussion regarding this analysis.

Discussion

- it is quite long-winded and should be shortened/summarized a bit

We agree that there was some redundancy with the text in the Results. We have now shortened the discussion as suggested by cutting two paragraphs and removing additional text in other paragraphs.

Other / General

- is it possible to have both effect size and p-value for biomarker signature correlations using Running Fisher..?

The reviewer brings up an intriguing point. If by effect size you mean the number of overlapping genes on which the correlation is calculated, or the fold-change of the genes, we have not determined that. The problem with fold-change is that microarray platforms yield more compressed fold-change values than profiles generated from RNA-Seq datasets. The fold-change rank-based method used in CE compensates for these differences in fold-change between platforms. We have determined the effect of removing different sets of biomarker genes in other studies (the estrogen receptor biomarker and the TGx-DDI biomarker) and found that fewer genes generally result in lower sensitivity, especially for genes that have higher fold-change (higher rank).

We have considered the -Log(p-value) as an effect in itself, a concept not universally appreciated. In a paper that was accepted in Toxicological Sciences this week from our lab, we have determined whether there is a -Log(p-value) that is associated with liver tumor induction in rats for a set of 6 gene expression biomarkers. In fact, we can identify these thresholds and use these for prediction. The basis for this is that as the factor is first activated, the number of genes altered that overlap with the biomarker is low and thus the correlation is low with the associated p-value close to 0.0001. As the factor is activated to a greater extent, there are more genes altered that overlap with the biomarker, a greater correlation and lower p-value, etc. So in our hands, with all of our biomarkers we see a relationship over and over again of greater activation leads to greater correlation and lower p-value. Thus, we determined that the p-value can be used to get an estimate of the level of activation of the factor and in the mentioned study, we found that when the p-value is low enough there is a level of activation of a factor (say AhR or CAR) that in a chronic situation would lead to liver tumors.
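The relationship described above can be illustrated with a short calculation. This is a minimal sketch, assuming base-10 logarithms and a hypothetical cutoff of 4 (corresponding to p = 0.0001, the value mentioned above); the function names and the example p-values are illustrative only and are not taken from the study.

```python
import math


def neg_log10_p(p_value):
    """Convert a p-value in (0, 1] to -log10(p); smaller p gives a larger score."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in (0, 1]")
    return -math.log10(p_value)


def exceeds_cutoff(p_value, cutoff=4.0):
    """True when -log10(p) reaches the (assumed) cutoff."""
    return neg_log10_p(p_value) >= cutoff


# Illustrative p-values only: weaker to stronger overlap with a biomarker.
for p in (1e-2, 1e-6, 1e-12):
    print(p, round(neg_log10_p(p), 2), exceeds_cutoff(p))
```

On this scale, each ten-fold drop in the p-value adds one unit, which is why the transformed value is convenient to treat as a graded measure rather than a simple pass/fail result.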

Table 1

- Is it possible to indicate the number of differentially expressed genes?

We have now added a column for the number of differentially expressed genes in each of the biosets.

Figure 2

- figure 2A has resolution issues, please remake it

The reviewer is correct in that the resolution is not optimal. We have remade the figure to improve resolution.

Attachment

Submitted filename: ResponseToReviewers_Rooney_06_10_20.docx

Decision Letter 1

Roberto Mantovani

11 Aug 2020

PONE-D-20-07833R1

Mining a human transcriptome database for chemical modulators of NRF2

PLOS ONE

Dear Dr. Corton,

Thank you for submitting your manuscript to PLOS ONE. After further consideration, we feel that minor revisions are required before the manuscript fully meets PLOS ONE’s publication criteria. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised by one of the Reviewers during the second round of the process.

Please submit your revised manuscript by Sep 25 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Roberto Mantovani

Academic Editor

PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: The manuscript is much clearer. Only minor revisions required.

OVERALL

The official gene symbol is 'NFE2L2'; but the synonym 'NRF2' has probably been used broadly. To avoid confusion, can you state this at the beginning of the manuscript, and use only one of these two names in the figures and tables? For instance, it is confusing to have in figure 3C a title 'Nrf2 upstream regulators' and then 'NFE2L2' listed as one (but it's actually the same as NRF2)

METHODS

<< The average fold-change across the 7 biosets in which NRF2 was activated had to be > |+/- 1.5-fold|. >> -- is this a log2 (FC)..? Please state explicitly

<< The p-value is the probability of the overlap between the NRF2 biomarker gene list and the IPA pathway gene list. The smaller the p-value the less likely that the association is random. >> -- can you adopt a more rigorous explanation?

RESULTS

Building the NRF2 biomarker and assessment of predictive accuracy

In the response to one of the reviewer comments the authors state << First, we did not use the biosets used in making the biomarker in our prediction study. That is we did include the “training set”. >> -- I think the authors meant "we did *not* include the “training set”"; can you make this clear in the results and methods?

Fig 2A: can the authors add labels with the name of the chemicals, or add a table with this information?

Characterization of the NRF2 biomarker genes.

<< Of the 143 NRF2 biomarker genes, 38 (26.6% ) of these were associated with the NRF2- bound ChIP-Seq loci near gene promoter regions (within 10 Kb) in at least one dataset derived from cell lines treated with NRF2-activating isothiocyanate, SFN, or have constitutively active NRF2 >> -- can you report what's the background rate, when considering all genes?

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Sep 28;15(9):e0239367. doi: 10.1371/journal.pone.0239367.r004

Author response to Decision Letter 1


19 Aug 2020

Reviewer's Responses to Questions

Comments to the Author

Reviewer #2: The manuscript is much clearer. Only minor revisions required.

We thank the reviewer for providing suggestions that helped to make this a better paper.

OVERALL

The official gene symbol is 'NFE2L2'; but the synonym 'NRF2' has probably been used broadly. To avoid confusion, can you state this at the beginning of the manuscript, and use only one of these two names in the figures and tables? For instance, it is confusing to have in figure 3C a title 'Nrf2 upstream regulators' and then 'NFE2L2' listed as one (but it's actually the same as NRF2)

We have now made a number of changes suggested by the reviewer. These include stating that Nrf2 is encoded by the NFE2L2 gene in both the abstract and introduction when Nrf2 is first introduced, changing NFE2L2 to Nrf2 in Fig 3C, and adding two changes in Table 1 to indicate that either the Keap1 or the NFE2L2 gene is knocked down.

METHODS

<< The average fold-change across the 7 biosets in which NRF2 was activated had to be > |+/- 1.5-fold|. >> -- is this a log2 (FC)..? Please state explicitly

We have now changed the sentence to read “The average fold-change across the 7 biosets in which NRF2 was activated had to be ≥ |+/- 1.5-fold| (not Log2(fold-change)).” We hope this suffices.
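To make the distinction between the linear and the Log2 scale concrete, the sketch below applies a 1.5-fold cutoff to the mean of seven fold-change values. It is an illustration under stated assumptions: the signed fold-change convention (a halving is written as -2), the function names, and the example values are all hypothetical and are not taken from the manuscript.

```python
import math


def log2_to_signed_fold(log2_fc):
    """Convert a Log2 ratio to a signed linear fold-change (assumed convention)."""
    ratio = 2.0 ** log2_fc
    # Ratios below 1 are reported as negative fold-changes, e.g. 0.5 -> -2.0.
    return ratio if ratio >= 1.0 else -1.0 / ratio


def passes_fold_change_filter(fold_changes, cutoff=1.5):
    """True when the absolute mean linear fold-change is at least the cutoff."""
    mean_fc = sum(fold_changes) / len(fold_changes)
    return abs(mean_fc) >= cutoff


# Hypothetical linear fold-changes for one gene across 7 biosets.
gene_fc = [1.6, 1.4, 2.0, 1.7, 1.5, 1.9, 1.3]
print(passes_fold_change_filter(gene_fc))

# A 1.5-fold cutoff on the linear scale is about 0.585 on the Log2 scale,
# whereas a Log2 value of 1.5 would be roughly a 2.83-fold change.
print(round(math.log2(1.5), 3), round(2.0 ** 1.5, 2))
```

The last two printed numbers show why the scale matters: reading "1.5" as a Log2 value would impose a much stricter filter than the 1.5-fold linear cutoff stated in the revised sentence.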

<< The p-value is the probability of the overlap between the NRF2 biomarker gene list and the IPA pathway gene list. The smaller the p-value the less likely that the association is random. >> -- can you adopt a more rigorous explanation?

We removed the last sentence mentioned and added “Significant reported pathways have q-values < 1E-3.”.

RESULTS

Building the NRF2 biomarker and assessment of predictive accuracy

In the response to one of the reviewer comments the authors state << First, we did not use the biosets used in making the biomarker in our prediction study. That is we did include the “training set”. >> -- I think the authors meant "we did *not* include the “training set”"; can you make this clear in the results and methods?

Thanks for picking up this mistake. We have now changed the sentence on line 258 to read “The biosets used to create the biomarker (essentially the training set) were excluded from the test.” As suggested, we added the same sentence to the Results on line 409.

Fig 2A: can the authors add labels with the name of the chemicals, or add a table with this information?

We have now provided this information in Table S1. We now state on line 414 “Information about the biosets used in the test are provided in Table S1.”

Characterization of the NRF2 biomarker genes.

<< Of the 143 NRF2 biomarker genes, 38 (26.6% ) of these were associated with the NRF2- bound ChIP-Seq loci near gene promoter regions (within 10 Kb) in at least one dataset derived from cell lines treated with NRF2-activating isothiocyanate, SFN, or have constitutively active NRF2 >> -- can you report what's the background rate, when considering all genes?

We have now reported the background rate in the Results. The text now reads: “Of the 143 NRF2 biomarker genes, 38 (26.6% ) of these were associated with the NRF2-bound ChIP-Seq loci near gene promoter regions (within 10 Kb) in at least one dataset derived from cell lines treated with NRF2-activating isothiocyanate, SFN, or have constitutively active NRF2 [33, 38, 42]. This is in comparison to the background rate of 13.7%, where a total of 3553 genes contained NRF2-ChIP bound regions within 10 Kb of the TSS of the approximately 26,000 identified transcribed genes in the human genome (hg38). Therefore, the NRF2 biomarker genes are more significantly represented with NRF2-bound regions (Fisher’s Exact test, p<0.05).”
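The comparison with the background rate can be checked with a short calculation. The sketch below is not the authors' analysis script: it assumes the 143 biomarker genes are a subset of the approximately 26,000 background genes and uses a one-sided hypergeometric tail probability (equivalent to a one-sided Fisher's exact test under that assumption). The function name is hypothetical; the counts are those quoted above.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided P(X >= k) for X ~ Hypergeometric(N genes, K bound, n drawn)."""
    upper = min(n, K)
    # Count the ways to draw i bound genes and n - i unbound genes, for i >= k.
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    # Exact integer arithmetic; the final division yields a float.
    return numerator / comb(N, n)


biomarker_bound, biomarker_total = 38, 143
genome_bound, genome_total = 3553, 26000  # "approximately 26,000" in the text

print(f"biomarker rate:  {100 * biomarker_bound / biomarker_total:.1f}%")
print(f"background rate: {100 * genome_bound / genome_total:.1f}%")

p = enrichment_p_value(biomarker_bound, biomarker_total, genome_bound, genome_total)
print("enriched at p < 0.05:", p < 0.05)
```

A two-sided test or a different background definition would give a different p-value, so the result here should be read only as a consistency check on the reported percentages and significance threshold.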

Attachment

Submitted filename: Response to reviewer comments hNrf2 08_19_20.docx

Decision Letter 2

Roberto Mantovani

7 Sep 2020

Mining a human transcriptome database for chemical modulators of NRF2

PONE-D-20-07833R2

Dear Dr. Corton,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Roberto Mantovani

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Roberto Mantovani

15 Sep 2020

PONE-D-20-07833R2

Mining a human transcriptome database for chemical modulators of NRF2

Dear Dr. Corton:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Prof. Roberto Mantovani

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File. Supporting information.

    Contains 1) list of genes that comprise the NRF2 biomarker, 2) evidence from ChIP-Seq studies for direct interactions between biomarker genes and NRF2, and 3) biosets examined in this study.

    (XLSX)

    Data Availability Statement

    All microarray or RNA-Seq data is available in Gene Expression Omnibus and ArrayExpress. The accession numbers are found in S1 File.


    Articles from PLoS ONE are provided here courtesy of PLOS
