Author manuscript; available in PMC: 2026 Mar 17.
Published in final edited form as: Toxicol Sci. 2011 Aug 2;124(1):54–74. doi: 10.1093/toxsci/kfr202

Development and Evaluation of a Genomic Signature for the Prediction and Mechanistic Assessment of Nongenotoxic Hepatocarcinogens in the Rat

Mark R Fielden *,1, Alex Adai , Robert T Dunn II , Andrew Olaharski §, George Searfoss , Joe Sina , Jiri Aubrecht ∥|, Eric Boitier ∥∥, Paul Nioi #,2, Scott Auerbach **, David Jacobson-Kram ††, Nandini Raghavan a, Yi Yang b, Andrew Kincaid , Jon Sherlock c, Shen-Jue Chen d, Bruce Car d, on behalf of the Predictive Safety Testing Consortium, Carcinogenicity Working Group
PMCID: PMC12989971  NIHMSID: NIHMS2153729  PMID: 21813463

Abstract

Evaluating the risk of chemical carcinogenesis has long been a challenge owing to the protracted nature of the pathology and the limited translatability of animal models. Although numerous short-term in vitro and in vivo assays have been developed, they have failed to reliably predict the carcinogenicity of nongenotoxic compounds. Extending upon previous microarray work (Fielden, M. R., Nie, A., McMillian, M., Elangbam, C. S., Trela, B. A., Yang, Y., Dunn, R. T., II, Dragan, Y., Fransson-Stehen, R., Bogdanffy, M., et al. (2008). Interlaboratory evaluation of genomic signatures for predicting carcinogenicity in the rat. Toxicol. Sci. 103, 28–34), we have developed and extensively evaluated a quantitative PCR-based signature to predict the potential for nongenotoxic compounds to induce liver tumors in the rat as a first step in the safety assessment of potential nongenotoxic carcinogens. The training set was derived from liver RNA from rats treated with 72 compounds and used to develop a 22-gene signature on the TaqMan array platform, providing an economical and standardized assay protocol. Independent testing on over 900 diverse samples (66 compounds) confirmed the interlaboratory precision of the assay and its ability to predict known nongenotoxic hepatocarcinogens (NGHCs). When tested under different experimental designs, strains, time points, dose setting criteria, and other preanalytical processes, the signature sensitivity and specificity were estimated to be 67% (95% confidence interval [CI] = 38–88%) and 59% (95% CI = 44–72%), respectively, with an area under the receiver operating characteristic curve of 0.65 (95% CI = 0.46–0.83). Compounds were best classified using expression data from short-term repeat dose studies; however, the prognostic expression changes appeared to be preserved after longer term treatment. Exploratory evaluations also revealed that different modes of action for nongenotoxic and genotoxic compounds can be discriminated based on the expression of specific genes. These results support a potential early preclinical testing paradigm to catalyze broader understanding of putative NGHCs.

Keywords: nongenotoxic, carcinogenesis, biomarkers, safety evaluation, liver, systems toxicology, toxicogenomics, methods, predictive toxicology, in vitro, alternatives


The rodent cancer bioassay has been used for over 30 years to evaluate the human carcinogenic risk of chemicals. The bioassay requires exposing rats and mice to a test compound for most of their lifetime (~18 to 24 months) up to a maximum tolerated dose based on prior chronic dose-ranging studies. Because of the extensive resources required, only a small fraction of chemicals have undergone carcinogenicity testing relative to the tens of thousands of compounds identified on the U.S. Environmental Protection Agency’s Toxic Substances Control Act Inventory or registered by Registration, Evaluation, Authorization, and Restriction of Chemicals (Christensen et al., 2011). Additionally, it has been reported that ~31% of marketed drugs have not been tested according to present carcinogenicity testing guidelines (Brambilla and Martelli, 2009). In addition to resource constraints and ethical concerns, the high doses frequently used in the bioassay and the physiological differences between rodents and humans have led to considerable debate over the relevance of the rodent cancer bioassay for assessing human risk (Cohen, 2010; Jacobs, 2005; Maronpot et al., 2004; Melnick et al., 2008; Ward, 2008). As a result, improving upon the current carcinogenicity testing paradigm remains an active area of research.

Because DNA damage is considered a hallmark of carcinogenesis, it is assumed that DNA-damaging agents are likely to be carcinogenic. Thus, a number of in vitro and short-term in vivo genotoxicity assays have been developed and validated to detect the ability of chemicals and/or their metabolites to damage or mutate DNA and predict carcinogenic outcome (Kirkland et al., 2005). In contrast to the expectation that genotoxic chemicals are carcinogens, nongenotoxic chemicals cannot be assumed to be noncarcinogenic. Because most compounds that are mutagenic in the Ames test are excluded from drug development, the most frequent adverse outcome observed in rodent cancer bioassays is carcinogenicity initiated by nongenotoxic events. Due to the high maximum tolerated doses used in the rodent cancer bioassay, these carcinogenic events in rodents often occur at exposures above which carcinogenic risk to humans is assumed to be minimal. In addition, many examples exist for which nongenotoxic carcinogenicity in rodents has been conclusively shown not to be relevant for human risk, such as urinary bladder transitional cell carcinoma induced by saccharin or muraglitazar and renal carcinoma induced by D-limonene (Waites et al., 2007; Whysner and Williams, 1996a,b). As a result, there has been a stronger emphasis on understanding the chemical’s mode of action to better evaluate the risk and relevance of the findings to humans (Jacobs, 2005; Jacobs and Jacobson-Kram, 2004). However, the time and resources needed to determine the mechanism of action are considerable, and testing often occurs very late in drug development, if at all. Furthermore, the rodent bioassay typically does not provide the type of mechanistic insight needed to enable this evaluation.

Predicting carcinogenicity induced by nongenotoxic compounds is a challenge due to the many modes of action that have been described to contribute to tumor formation and the multistep process of carcinogenesis (Yamasaki et al., 1996). Nonetheless, numerous assays have been developed in an attempt to predict nongenotoxic carcinogens, including in silico quantitative structure activity relationship models (Contrera et al., 2003; Lee et al., 1995), in vitro mechanistic assays (Yamasaki et al., 1996), cell-based transformation systems (Mauthe et al., 2001; Vanparys et al., 2011), and various (sub)-chronic histological, histochemical, and biochemical indices (Allen et al., 2004; Elcombe et al., 2002; Kitchin et al., 1993; Tatematsu et al., 1987) and combinations thereof (Cohen, 2004; Kitchin et al., 1994). Given the modest predictivity of these approaches or the nature of the assays, there is general agreement that these short-term methods do not reliably predict tumor outcome or provide sufficient information to fully inform a human risk assessment (Jacobs, 2005). As a result, the rodent cancer bioassay remains the gold standard for assessing the human risk of chemical carcinogenesis. Therefore, novel assays or biomarkers that provide an early prediction of a carcinogenic outcome induced by nongenotoxic compounds could enable a more informed compound selection process for early-stage development. This approach could facilitate the proactive initiation of investigative studies to enable an early human risk assessment prior to initiating the rodent bioassay or provide a more efficient hazard identification approach to prioritize chemicals for carcinogenicity testing.

In response to these challenges, genomic or large-scale gene expression profiling has been extensively researched for its ability to predict long-term tumor outcome and/or provide mechanistic data to enable the risk assessment of carcinogens (Waters et al., 2010). The underlying premise of genomic profiling for carcinogenicity prediction is that gene expression changes in the target tissue precede and/or contribute to tumor development and that these changes can be monitored after a short-term in vivo treatment to predict longer term carcinogenic outcomes. To this end, numerous genomic biomarkers or signatures have been described to predict rat hepatocarcinogenicity induced by non-genotoxic compounds (Ellinger-Ziegelbauer et al., 2008; Fielden et al., 2007; Nie et al., 2006; Uehara et al., 2008). The liver is the most common site of tumor formation in the rodent bioassay (Gold et al., 2005) and a number of well-described mechanisms of liver tumor formation are amenable to evaluation based on hepatic gene expression (Waters et al., 2010), making it an ideal model system to evaluate the utility of genomics for carcinogenicity assessment.

Building on the genomic signatures originally described by Fielden et al. (2007) and Nie et al. (2006), we have previously demonstrated the statistical robustness of these proposed signatures for predicting nongenotoxic hepatocarcinogens (NGHCs) (Fielden et al., 2008). However, it was concluded that the published signatures lacked sufficient classification accuracy when used as is, likely because of experimental variables that differed across laboratories, including the microarray platform and study conditions such as time and dose. We reasoned that if the gene expression measurement platform were controlled for and the reproducibility of gene expression measurements enhanced, a signature could be derived and more thoroughly evaluated for its ability to predict NGHCs and to refine its boundaries of use for optimal classification. Furthermore, to enable broad utilization and evaluation across laboratories, the signature had to be commercially available, established on a reliable and readily available measurement platform, biologically interpretable, and thoroughly evaluated across hundreds of diverse samples. Because the signature is not intended to replace the chronic rodent bioassay but rather to guide internal decision making, allow prioritization of chemicals for formal testing, possibly reduce the reliance on longer term animal studies, and/or enable a more rapid understanding of mode of action, a rigorous validation of the signature as a replacement of chronic rodent studies was not an objective. Instead, the objectives were to develop a signature to enable an early evaluation of NGHCs and to make the signature and underlying data publicly available for broader testing.

MATERIALS AND METHODS

TaqMan array card design.

We chose to rederive the initial microarray-based signature using quantitative real-time PCR (qPCR) to provide a widely accessible higher throughput gene expression platform to support evaluation. To this end, we chose the TaqMan array platform (384-well microfluidic cards) to develop a custom array (Applied Biosystems, part of Life Technologies, Foster City, CA). In order to maximize sample throughput, it was desirable to create a TaqMan array with 32 primer pairs in order to permit the analysis of four samples per card in triplicate wells. The predictor genes considered for evaluation included 37 genes from the Iconix signature (Fielden et al., 2007) and six genes from the signature published by Nie et al. (2006). An additional 10 genes from the genotoxic carcinogen signature published by Bayer (Ellinger-Ziegelbauer et al., 2004) were included as it was considered desirable to distinguish nongenotoxic from genotoxic modes of action. Because it was not practical to evaluate all 53 genes, steps were taken to identify 11 genes from the original Iconix 37 gene signature that could provide similar predictive accuracy (data not shown). This resulted in the final selection of 27 unique genes. Three of these genes were evaluated using multiple primer pair sequences (Trnt1, EST AW143969, and Sel1I). Three normalizer genes were also selected to identify an appropriate transcript to normalize and assess input RNA quality (Table 1). Primers and probes were designed by Applied Biosystems according to published design rules (Applied Biosystems).
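The card layout described above follows from simple well arithmetic. The short sketch below reproduces it; the function name and constants are illustrative and are not part of the original protocol.

```python
# Layout arithmetic for a 384-well TaqMan array card (values taken from the text).
WELLS_PER_CARD = 384
SAMPLES_PER_CARD = 4
REPLICATE_WELLS = 3  # triplicate wells per assay per sample


def assays_per_card(wells, samples, replicates):
    """Number of distinct assays that fit when every sample is run on every assay in replicate."""
    wells_per_assay = samples * replicates
    if wells % wells_per_assay:
        raise ValueError("wells do not divide evenly among samples and replicates")
    return wells // wells_per_assay


n_assays = assays_per_card(WELLS_PER_CARD, SAMPLES_PER_CARD, REPLICATE_WELLS)
print(n_assays)  # 32
```

This is why a 32-assay design allows four samples per card: 4 samples × 3 wells × 32 assays fills all 384 wells.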

TABLE 1.

TaqMan Assays Used for qPCR Signature Development

Assay ID | Accession | Gene symbol | Gene name | Source | In final model
Rn03399817_g1 | AI232085.1 | Trnt1 | tRNA nucleotidyl transferase, CCA-adding, 1 | Fielden et al., 2007 | No
Rn03399820_s1 | AI232085.1 | Trnt1 | tRNA nucleotidyl transferase, CCA-adding, 1 | Fielden et al., 2007 | Yes
Rn03399816_s1 | AW143969.1 | EST | EST | Fielden et al., 2007 | No
Rn03399821_s1 | AW143969.1 | EST | EST | Fielden et al., 2007 | No
Rn03399822_s1 | AW143969.1 | EST | EST | Fielden et al., 2007 | Yes
Rn03399819_s1 | AW533663.1 | Prodh | Proline dehydrogenase | Fielden et al., 2007 | Yes
Rn03399815_s1 | AW915076.1 | Gpr146 | G protein-coupled receptor 146 | Fielden et al., 2007 | Yes
Rn03399814_s1 | BF553500.1 | Cited4 | Cbp/p300-interacting transactivator, with Glu/Asp-rich carboxy-terminal domain, 4 | Fielden et al., 2007 | Yes
Rn00680664_g1 | NM_012708.1 | Psmb9 | Proteasome (prosome, macropain) subunit, beta type 9 | Fielden et al., 2007 | Yes
Rn01452409_m1 | NM_030844.2 | Ica1 | Islet cell autoantigen 1 | Fielden et al., 2007 | Yes
Rn00587206_m1 | NM_053774.2 | Usp2 | Ubiquitin-specific peptidase 2 | Fielden et al., 2007 | Yes
Rn01475179_m1 | NM_138882.1 | Pla1a | Phospholipase A1 member A | Fielden et al., 2007 | Yes
Rn01424675_m1 | U53184 | Litaf | Lipopolysaccharide-induced TNF factor | Fielden et al., 2007 | Yes
Rn01432563_g1 | NM_001007629.1 | Nutf2 | Nuclear transport factor 2 | Nie et al., 2006 | No
Rn00689231_m1 | NM_012860.2 | Mat1a | Methionine adenosyltransferase I, alpha | Nie et al., 2006 | Yes
Rn02132590_g1 | NM_021766.1 | Pgrmc1 | Progesterone receptor membrane component 1 | Nie et al., 2006 | No
Rn00821759_g1 | NM_138826.4 | Mt1a | Metallothionein 1a | Nie et al., 2006 | Yes
Rn00756519_m1 | NM_173295.1 | Ugt2b17 | UDP glucuronosyltransferase 2 family, polypeptide B17 | Nie et al., 2006 | Yes
Rn01517723_m1 | NM_177933.2 | Sel1I | Sel-1 suppressor of lin-12-like (C. elegans) | Nie et al., 2006 | No
Rn00710081_m1 | NM_177933.2 | Sel1I | Sel-1 suppressor of lin-12-like (C. elegans) | Nie et al., 2006 | Yes
Rn03399818_s1 | AI639488.1 | Mdm2 | Mdm2 p53-binding protein homolog | Ellinger-Ziegelbauer et al., 2004 | No
Rn00563462_m1 | NM_012861.1 | Mgmt | O-6-methylguanine-DNA methyltransferase | Ellinger-Ziegelbauer et al., 2004 | Yes
Rn00566256_m1 | NM_013215.1 | Akr7a3 | Aldo-keto reductase family 7, member A3 | Ellinger-Ziegelbauer et al., 2004 | Yes
Rn00568504_m1 | NM_017259.1 | Btg2 | B-cell translocation gene 2, anti-proliferative | Ellinger-Ziegelbauer et al., 2004 | Yes
Rn01530533_g1 | NM_019905.1 | Anxa2 | Annexin A2 | Ellinger-Ziegelbauer et al., 2004 | Yes
Rn00755484_m1 | NM_022407.3 | Aldh1a1 | Aldehyde dehydrogenase 1 family, member A1 | Ellinger-Ziegelbauer et al., 2004 | Yes
Rn00709612_m1 | NM_032055 | Tap1 | Transporter 1, ATP-binding cassette, sub-family B | Ellinger-Ziegelbauer et al., 2004 | Yes
Rn01427989_s1 | NM_080782.3 | Cdkn1a | Cyclin-dependent kinase inhibitor 1A | Ellinger-Ziegelbauer et al., 2004 | Yes
Rn00592205_m1 | NM_133586.1 | Ces2 | Carboxylesterase 2 (intestine, liver) | Ellinger-Ziegelbauer et al., 2004 | Yes
Rn00690933_m1 | NM_017101.1 | Ppia | Peptidylprolyl isomerase A (cyclophilin A) | Housekeeping gene | Yes
Hs99999901_s1 | X03205.1 | 18S | 18S ribosomal RNA | Housekeeping gene | No
Rn99999916_s1 | X02231.1 | Gapdh | Glyceraldehyde 3-phosphate-dehydrogenase | Housekeeping gene | No

Note. C. elegans, Caenorhabditis elegans; TNF, tumor necrosis factor; UDP, uridine diphosphate.

TaqMan array card assay.

RNA concentration and quality were determined using a NanoDrop ND-1000 Spectrophotometer (Thermo Scientific, Wilmington, DE). A total of 220 ng of liver RNA from each animal was reverse transcribed using the High Capacity complementary DNA (cDNA) RT Kit according to the manufacturer’s instructions (Applied Biosystems). The cDNA was diluted to 2 ng/μl in water and 105 μl were mixed with an equal volume of 2× TaqMan Universal Master Mix (Applied Biosystems). One hundred microliters were then injected into each of two ports on the TaqMan array and analyzed on the Applied Biosystems 7900HT Real-Time PCR System according to the manufacturer’s instructions.
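The volumes and concentrations in this protocol can be checked with a few lines of arithmetic. The sketch below assumes that the stated cDNA concentration refers to the amount of input RNA (RNA-equivalent nanograms); the variable names are illustrative.

```python
# Protocol values taken from the text; "ng" is assumed to mean RNA-equivalent cDNA.
rna_input_ng = 220.0          # RNA reverse transcribed per animal
cdna_conc_ng_per_ul = 2.0     # concentration after dilution in water
cdna_volume_ul = 105.0        # diluted cDNA mixed with master mix
ports_loaded = 2
load_per_port_ul = 100.0

# Volume of diluted cDNA available from one reverse transcription reaction.
diluted_volume_ul = rna_input_ng / cdna_conc_ng_per_ul

# An equal volume of 2x master mix doubles the volume and halves the concentration.
mix_volume_ul = 2 * cdna_volume_ul
mix_conc_ng_per_ul = cdna_conc_ng_per_ul * cdna_volume_ul / mix_volume_ul

ng_per_port = mix_conc_ng_per_ul * load_per_port_ul
overage_ul = mix_volume_ul - ports_loaded * load_per_port_ul

print(diluted_volume_ul)   # 110.0 µl available, of which 105 µl is used
print(mix_conc_ng_per_ul)  # 1.0 ng/µl in the loaded mix
print(ng_per_port)         # 100.0 ng per port
print(overage_ul)          # 10.0 µl spare volume
```

Under that assumption, each port receives about 100 ng of RNA-equivalent cDNA, and the 210 µl mix leaves a 10 µl margin for pipetting loss.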

Liver RNA samples.

To develop a de novo signature from qPCR data to predict nongenotoxic hepatocarcinogenicity in the rat, we reanalyzed rat liver RNA samples that had previously been used to identify and evaluate the original Iconix microarray signature (Fielden et al., 2007). Briefly, these samples were derived from male Sprague-Dawley (SD) rats that were administered compound (NGHCs or nonhepatocarcinogens [NH]) or vehicle by oral gavage once daily for 1, 3, or 5 days (n = 3 per group). The considerations for compound classification are described below. The doses administered were considered maximally tolerated in a 5-day study and induced decreases in body weight gain or histological changes in target organs but did not induce severe clinical signs that may otherwise confound interpretation of gene expression changes. Rats were necropsied 24 h after the last dose and liver was stored frozen until RNA extraction according to Fielden et al. (2007). RNA samples were stored at −70°C and were checked to ensure sufficient material to permit cDNA synthesis, as some RNA samples had been depleted or were of low quality. Samples were selected to ensure at least two to three rats per treatment and control group. Vehicle control samples were matched based on common vehicle (aqueous or corn oil) and date that the study was run (i.e., year/quarter). In total, there were 415 RNA samples representing 121 treatment groups, which were analyzed on the TaqMan array. The analyzed log10 ratio data for all treatment groups are provided in Supplementary table 1.
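The text reports log10 ratios but does not describe here how they were computed. One common approach for qPCR data is the comparative Ct method, sketched below as an assumption rather than as the authors' procedure. It assumes 100% amplification efficiency and a single reference assay (Ppia is listed in Table 1 as the housekeeping assay retained in the final model); the function name and the Ct values are hypothetical.

```python
import math
from statistics import mean


def log10_ratio(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """Comparative-Ct log10 ratio of treated versus vehicle control.

    Each argument is a list of Ct values, one per animal in the group.
    Assumes a doubling of product per cycle (100% efficiency).
    """
    delta_treated = mean(ct_target_treated) - mean(ct_ref_treated)
    delta_control = mean(ct_target_control) - mean(ct_ref_control)
    delta_delta_ct = delta_treated - delta_control
    # A Ct that is one cycle lower corresponds to a twofold higher transcript level.
    return -delta_delta_ct * math.log10(2)


# Hypothetical Ct values for three treated and three control animals.
ratio = log10_ratio(
    ct_target_treated=[24.0, 24.0, 24.0],
    ct_ref_treated=[20.0, 20.0, 20.0],
    ct_target_control=[25.0, 25.0, 25.0],
    ct_ref_control=[20.0, 20.0, 20.0],
)
print(round(ratio, 3))  # 0.301, i.e. a twofold induction
```

The actual normalization used for Supplementary table 1 may differ, for example in how replicate wells and animals are averaged.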

For an independent sample set, we obtained over 900 rat liver RNA samples representing 178 treatment groups from a variety of studies performed at collaborators’ facilities as further described below. Each treatment group had its own vehicle-matched control and consisted of at least three animals per group. All original SDS files and the R script to execute the model are available upon request to the author.

Compound classification.

A chemical was classified as a hepatocarcinogen if it was (1) found to induce liver tumors in a 2-year carcinogenicity study in at least one strain or gender of rat or (2) reasonably expected to induce liver tumors based on a known class effect (e.g., peroxisome proliferator-activated receptor alpha [PPARα] agonists, steroid hormones). Due to the high false-positive rate of some in vitro genotoxicity assays, we decided to classify hepatocarcinogens as nongenotoxic if there was sufficient literature evidence that they induce liver tumors primarily through a nongenotoxic mechanism despite having a positive finding in an in vitro genotoxicity assay (e.g., phenobarbital, clofibrate). Although we cannot discount the involvement of genotoxic mechanisms in tumor formation for these chemicals, we chose to include these chemicals as NGHCs to improve our ability to identify nongenotoxic mechanisms that may lead to tumor formation.

A chemical was classified as negative for hepatocarcinogenicity if it was (1) found not to induce liver tumors in a 2-year rodent bioassay in both male and female rats or (2) not expected to induce liver tumors based on an antiproliferative mode of action. NHs with a positive finding in a genotoxicity assay were not expected to affect the ability of the signature to identify NGHCs and thus were not specifically excluded from the NH class. Because the assay was restricted to hepatic gene expression, tumor formation in other organs was not considered in the classification nor was the presence or absence of carcinogenic activity in the mouse. No differentiation among tumor types was made, and the term hepatocarcinogenicity is used throughout to refer to chemicals that have been identified to induce adenomas and/or carcinomas.
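The classification criteria in the two paragraphs above can be summarized as a small decision function. This is only a schematic reading of the rules: the field names are hypothetical, each flag stands for a literature judgment that the authors made case by case, and the handling of compounds that meet neither set of criteria ("unknown") is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical summary of the literature evidence for one compound."""
    liver_tumors_in_2yr_rat_study: bool      # in at least one strain or sex
    expected_by_class_effect: bool           # e.g. PPAR-alpha agonists, steroid hormones
    primarily_nongenotoxic: bool             # literature supports a nongenotoxic mechanism
    negative_in_both_sexes: bool             # no liver tumors in male and female rats
    antiproliferative_mode_of_action: bool


def classify(e):
    """Return 'NGHC', 'GHC', 'NH', or 'unknown' following the rules in the text."""
    if e.liver_tumors_in_2yr_rat_study or e.expected_by_class_effect:
        # Hepatocarcinogens are split by the weight of evidence on mechanism,
        # not by a single positive in vitro genotoxicity result.
        return "NGHC" if e.primarily_nongenotoxic else "GHC"
    if e.negative_in_both_sexes or e.antiproliferative_mode_of_action:
        return "NH"
    return "unknown"


# Hypothetical evidence records, not real compounds.
tumorigenic_nongenotoxic = Evidence(True, False, True, False, False)
tested_negative = Evidence(False, False, False, True, False)
untested = Evidence(False, False, False, False, False)

print(classify(tumorigenic_nongenotoxic))  # NGHC
print(classify(tested_negative))           # NH
print(classify(untested))                  # unknown
```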

Data on carcinogenicity outcomes were obtained from the Carcinogenic Potency Database (http://potency.berkeley.edu), the National Library of Medicine Chemical Carcinogenesis Research Information System (http://toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?CCRIS), the National Toxicology Program (NTP) Database (http://ntp-apps.niehs.nih.gov/ntp_tox/index.cfm), the Physicians’ Desk Reference (http://www.pdrhealth.com), or peer-reviewed publications (Brambilla and Martelli, 2009; Davies and Monro, 1995; Haseman et al., 1987). Findings reported in the literature were used without reinterpretation or reclassification with respect to their statistical significance or relationship to treatment. Furthermore, no attempt was made to segregate chemicals based on the incidence or severity of tumor formation because the doses used in the current study are likely higher than those used in the rodent bioassay, and biasing the training set toward only potent carcinogens may hinder the sensitivity of the biomarker toward weaker carcinogens that are still of regulatory concern. We recognize that alternative classification of chemicals is possible, given discrepancies in the literature or the rodent bioassay; however, the goal was to derive a signature that provides a sensitive means of identifying NGHCs and modes of action that are expected to contribute to tumor formation rather than recapitulate a specific rodent bioassay result.

Model development step 1: Process evaluation.

The modeling strategy is outlined in Figure 1. We define a model in this context as a specific combination of parameters that are integrated to produce a single score for a given compound at a given dose and time (see model development in Supplementary materials and methods for more information about the model parameters). The general process is to use one subset of data to find the optimal model and a second set of data to test the single optimal model. The first step was to evaluate the process for selecting the optimal model. This involved estimating the accuracy (in terms of area under the receiver operating characteristic [ROC] curve [AUC] and proportion classified correctly) of the model building process in an evaluation phase using 72 compounds profiled on day 5 from the Iconix data set (Table 2). The AUC is equal to the probability that a classification model will rank a randomly chosen positive sample higher than a randomly chosen negative sample and is commonly used to select optimal models independent of class distribution. An AUC of 1.0 reflects a perfect classifier with 100% classification accuracy. This first step also enabled us to evaluate the model-building process on a set of samples from the same site with animals treated under the same protocol. Compounds not profiled on day 5 were excluded because including them would have resulted in a skewed distribution of early (day 1) and late (days 3 and 5) samples in the training set, and previous experience indicated that a signature developed on day 5 samples provided the most robust classifier (Fielden et al., 2007). The data for the additional time points are nonetheless available as part of Supplementary table 1. Genotoxic hepatocarcinogens (GHCs) (aflatoxin B1, diethylnitrosamine, methyleugenol, and safrole) were excluded from model development and validation because they did not fit either class we intended to predict, and an insufficient number of GHC samples were available to adequately evaluate classification accuracy toward this class of compounds. However, they were included as part of an effort to test the ability of the genes to differentiate mode of action.
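The probabilistic interpretation of the AUC given above can be computed directly by comparing every positive sample with every negative sample. The sketch below does this for a handful of made-up scores; the function name and the data are illustrative only, and ties are counted as half a win by convention.

```python
def auc_by_ranking(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the positive scores higher.

    labels: 1 for the positive class (e.g. NGHC), 0 for the negative class (e.g. NH).
    Tied pairs contribute 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up scores: three positives and four negatives.
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0]
print(round(auc_by_ranking(scores, labels), 3))  # 0.833 (10 of 12 pairs ranked correctly)
```

Because the statistic depends only on how positives rank relative to negatives, it does not change when the proportion of the two classes changes, which is why it suits a training set with far more NH than NGHC compounds.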

FIG. 1.

Overview of model building and evaluation. The model development occurred in three steps. (A) Step 1 was used to evaluate the process for selecting a single model for validation on an independent test set based on training and test set definitions in Fielden et al. (2007). The model strategy was successful in selecting a single model with a similar AUC estimate to that previously published (see Supplementary Results). (B) The model from step 1 was promising but underpowered. As a result, all Iconix samples were used for training in step 2 to select a single top model to classify compounds in step 3. The strategy for model building and selection was identical to that implemented in step 1 with the qualification that the performance of the top model is not preferentially driven by correctly classifying training samples defined in the process evaluation (step 1). (C) Step 3 is the validation of the top qPCR based model from step 2 on an independent test set. The independent test set is composed of samples from multiple sites using different protocols.

TABLE 2.

Summary of Male SD Rat Compound Treatments Used for Signature Development as Part of the Evaluation Study and Final Model Development

Compound Vehicle Dose (mg/kg/day) Time point (days) Class Set^a
Anastrozole CMC 400 5 NGHC Training
Ethisterone CMC 1500 5 NGHC Training
Methapyrilene CMC 100 5 NGHC Training
Nafenopin Corn oil 338 5 NGHC Training
Norethindrone Corn oil 375 5 NGHC Training
Pentobarbital Water 70 5 NGHC Training
Phenobarbital Water 80 5 NGHC Training
Pirinixic acid CMC 364 5 NGHC Training
Pravastatin Corn oil 1200 5 NGHC Training
2,3,7,8-Tetrachlorodibenzo-p-dioxin CMC 0.02 5 NGHC Training
Acetaminophen Corn oil 972 5 NGHC Test
Beta-naphthoflavone CMC 1500 5 NGHC Test
Bezafibrate Corn oil 617 7 NGHC Test
Bis(2-ethylhexyl) phthalate Corn oil 500 5 NGHC Test
Carbamazepine CMC 490 5 NGHC Test
Carbimazole Water 400 5 NGHC Test
Chloroform Corn oil 600 5 NGHC Test
Diethylstilbestrol Corn oil 280 5 NGHC Test
Ethylestrenol CMC 390 5 NGHC Test
Fluconazole Corn oil 394 5 NGHC Test
Oxymetholone CMC 1170 5 NGHC Test
Spironolactone CMC 300 5 NGHC Test
Testosterone CMC 375 5 NGHC Test
Alfacalcidol CMC 0.04 5 NH Training
Amlodipine Corn oil 19 5 NH Training
Aspirin Corn oil 375 5 NH Training
Carvedilol Corn oil 2000 5 NH Training
Celecoxib Corn oil 400 5 NH Training
Ciprofloxacin Corn oil 450 5 NH Training
Citric acid Water 3000 5 NH Training
Clarithromycin Water 476 5 NH Training
Cortisone CMC 206 5 NH Training
Cycloheximide Water 0.25 5 NH Training
Dichlorvos Water 17 5 NH Training
Diclofenac Corn oil 10 5 NH Training
Ergocalciferol CMC 15 5 NH Training
Etodolac CMC 24 5 NH Training
Fluoxetine CMC 52 5 NH Training
Ketorolac Water 48 5 NH Training
Megestrol acetate CMC 132 5 NH Training
Methyldopa Water 325 5 NH Training
Pergolide CMC 1.1 5 NH Training
Perhexiline CMC 320 5 NH Training
Pioglitazone Corn oil 1500 5 NH Training
Praziquantel CMC 1200 5 NH Training
Promethazine Saline 113 5 NH Training
Propylthiouracil CMC 625 5 NH Training
Pyrazinamide CMC 1500 5 NH Training
Rabeprazole Water 1024 5 NH Training
Rifabutin CMC 1500 5 NH Training
Rofecoxib Corn oil 1550 5 NH Training
Rosiglitazone Corn oil 1800 5 NH Training
Roxithromycin CMC 312 5 NH Training
Ticlopidine CMC 223 5 NH Training
Tolazamide CMC 1500 5 NH Training
Troglitazone Corn oil 1200 5 NH Training
Valproic acid Water 1500 5 NH Training
1,1-Dichloroethene Water 600 5 NH Test
Amoxapine CMC 313 5 NH Test
Cholecalciferol CMC 8 5 NH Test
Citalopram Corn oil 90 5 NH Test
Clomiphene CMC 250 5 NH Test
Clomipramine Water 115 5 NH Test
Diazepam CMC 710 5 NH Test
Erythromycin CMC 1500 5 NH Test
Finasteride Corn oil 800 5 NH Test
Geraniol CMC 1500 5 NH Test
Pemoline CMC 70 5 NH Test
Phenothiazine Corn oil 386 5 NH Test
Primidone CMC 750 5 NH Test
Propylene glycol Water 2000 5 NH Test
Quetiapine CMC 500 5 NH Test

Note. CMC, carboxymethylcellulose.

^a Training and test refer to how the treatments were divided for model evaluation only. The final model included all 72 treatment groups. All treatments were by oral administration.

Although the original studies were performed with different microarray platforms, all work in this study was based on the TaqMan array platform. Consequently, the features used in the original papers may differ in predictive power in this study because of platform differences, although we expected comparable performance. Additional markers from other publications (e.g., Nie et al., 2006) were also provided so that the final feature list could be reconstituted. Because a portion of the genes used on the TaqMan array were chosen from the same samples based on results of prior modeling (e.g., Fielden et al., 2007), estimates of signature performance on the training data would be optimistically biased. Therefore, we chose to maintain the distinction of training and test samples as originally defined by Fielden et al. (2007) and estimate accuracy only on the test data that were not used for feature selection in the original study. The 72 compounds were thus split into a training set of 10 NGHC compounds and 34 NH compounds and a test set of 13 NGHC compounds and 15 NH compounds as shown in Table 2. For this particular training set, we performed 25 replications of fivefold cross-validation on the 44 training compounds (10 NGHC compounds and 34 NH compounds) to produce 25 × 44 = 1100 scores for each candidate model. For each model, we pooled all 1100 scores, each paired with its compound’s class membership, to estimate a single value for the area under the ROC curve. Although these AUC estimates are optimistically biased, ranking the candidates by AUC was still a reasonable procedure, albeit with some risk of selecting an overfitted model. The candidate with the top AUC estimate was selected to classify the samples in the held out test data.
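The repeated cross-validation scheme can be illustrated with a short sketch. It uses synthetic single-feature data and a toy midpoint scorer as a stand-in for a candidate model; neither is taken from the paper, and the fold assignment (simple random, not stratified) is an assumption. What it does reproduce is the bookkeeping: 25 replications of fivefold cross-validation on 44 compounds yield 1100 held-out scores, which are pooled into one AUC estimate.

```python
import random
from statistics import mean


def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroid(xs, ys):
    """Toy stand-in for a candidate model (NOT the paper's model).

    Scores a sample by its signed distance from the midpoint of the class means.
    """
    pos_mean = mean(x for x, y in zip(xs, ys) if y == 1)
    neg_mean = mean(x for x, y in zip(xs, ys) if y == 0)
    midpoint = (pos_mean + neg_mean) / 2
    sign = 1.0 if pos_mean >= neg_mean else -1.0
    return lambda x: sign * (x - midpoint)


def repeated_cv(features, labels, fit, n_reps=25, n_folds=5, seed=0):
    """Return pooled held-out scores and labels across all replications."""
    rng = random.Random(seed)
    pooled_scores, pooled_labels = [], []
    for _ in range(n_reps):
        order = list(range(len(labels)))
        rng.shuffle(order)
        folds = [order[k::n_folds] for k in range(n_folds)]
        for held_out in folds:
            held = set(held_out)
            train = [i for i in order if i not in held]
            model = fit([features[i] for i in train], [labels[i] for i in train])
            for i in held_out:
                pooled_scores.append(model(features[i]))
                pooled_labels.append(labels[i])
    return pooled_scores, pooled_labels


# Synthetic data with the class sizes from the text: 10 NGHC and 34 NH compounds.
data_rng = random.Random(1)
labels = [1] * 10 + [0] * 34
features = [data_rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]

pooled_scores, pooled_labels = repeated_cv(features, labels, fit_centroid)
print(len(pooled_scores))  # 1100 = 25 replications x 44 compounds
print(round(auc(pooled_scores, pooled_labels), 2))
```

Each compound is held out exactly once per replication, so every compound contributes 25 scores to the pooled estimate.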

Model development step 2: Final model development.

The AUC point estimate of the test data from the evaluation study appeared promising and was comparable to the AUC estimate in the original microarray-based classifier (see Supplementary materials and methods and Supplementary results). Therefore, we performed a second model building procedure using all 72 compounds in the data set to identify a final model suitable for further independent testing as described below. For the second and final model building process, we chose the model with the best pooled AUC estimate as the top model with the caveat that the AUC estimates stratified by the original training and test split are relatively balanced (in other words, we would not select a model with a high AUC estimate driven predominantly by the originally defined training data that were used in the feature selection). See Supplementary materials and methods and Supplementary results for more information.
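The selection rule for the final model can be sketched as a filter followed by a maximum. The balance tolerance (`max_gap`), the dictionary keys, and the candidate values below are all hypothetical; the text states only that the stratified AUC estimates should be "relatively balanced" and gives no numeric threshold.

```python
def select_final_model(candidates, max_gap=0.10):
    """Pick the candidate with the best pooled AUC among those with balanced stratified AUCs.

    candidates: list of dicts with keys 'name', 'pooled_auc',
                'auc_orig_train', and 'auc_orig_test'.
    max_gap:    hypothetical tolerance on |AUC(original training) - AUC(original test)|.
    """
    balanced = [
        c for c in candidates
        if abs(c["auc_orig_train"] - c["auc_orig_test"]) <= max_gap
    ]
    if not balanced:
        raise ValueError("no candidate meets the balance criterion")
    return max(balanced, key=lambda c: c["pooled_auc"])


# Made-up candidates for illustration.
candidates = [
    # Highest pooled AUC, but driven by the originally defined training compounds.
    {"name": "model_A", "pooled_auc": 0.88, "auc_orig_train": 0.95, "auc_orig_test": 0.70},
    {"name": "model_B", "pooled_auc": 0.82, "auc_orig_train": 0.84, "auc_orig_test": 0.80},
    {"name": "model_C", "pooled_auc": 0.78, "auc_orig_train": 0.78, "auc_orig_test": 0.77},
]
print(select_final_model(candidates)["name"])  # model_B
```

The filter guards against choosing a model whose apparent performance reflects the feature selection bias in the original training compounds.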

Model development step 3: Signature evaluation on independent data set.

The model building procedure in step 2 produced a single final model to classify samples in an independent set of samples. In order to estimate signature performance of the final model, yet also determine the boundaries of use of the derived signature, we tested it on a broad array of samples, several of which did not fall within the training set framework as a result of distinct study designs, rat strains, and/or compound classes. Our goal in doing so was not only to judge the sensitivity, specificity, and reproducibility of the signature but also to identify any factors that may result in poor signature performance so that a study design could be recommended to provide optimal classification accuracy. In total, we obtained over 900 liver RNA samples from a number of sources that tested a variety of chemicals under different conditions. A description of the treatment conditions (dose, time, strain, and route of administration) is provided in Table 3 and references therein. Expression data for these treatment groups are also provided in Supplementary table 1. Liver RNA was analyzed using the TaqMan array as described above for the training set samples. In total, there were 169 treatment groups representing 86 unique compounds, including NGHC, NH, and GHC, and several compounds of unknown or inconclusive carcinogenic outcome in the rat (alpha-naphthylisothiocyanate, butylated hydroxytoluene, ridogrel, prucalopride, chlorpromazine, hexachlorocyclohexane, and amitriptyline). For the purpose of the independent signature evaluation, we removed the unknown and GHC compounds from the analysis, and we removed any compounds that were used in the training of the final model. This produced an independent data set totaling 66 unique compounds that were evaluated under varying conditions.
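The construction of the independent evaluation set amounts to two exclusions: compounds that are not NGHC or NH, and compounds already used for training. A minimal sketch follows, using a few compound names that appear in this article; the class assigned to each entry in the `tested` dictionary and the inclusion of aflatoxin B1 in it are for illustration only.

```python
def independent_evaluation_set(tested, training_compounds):
    """Keep NGHC and NH compounds that were not used to train the final model.

    tested: dict mapping compound name to 'NGHC', 'NH', 'GHC', or 'unknown'.
    training_compounds: set of compound names used in final model training.
    """
    return {
        name: cls
        for name, cls in tested.items()
        if cls in {"NGHC", "NH"} and name not in training_compounds
    }


# Illustrative subset, not the full list of tested compounds.
tested = {
    "Isoniazid": "NGHC",
    "Atenolol": "NH",
    "Methapyrilene": "NGHC",      # also in the training set (Table 2)
    "Aspirin": "NH",              # also in the training set (Table 2)
    "Aflatoxin B1": "GHC",        # genotoxic, excluded
    "Chlorpromazine": "unknown",  # inconclusive outcome, excluded
}
training_compounds = {"Methapyrilene", "Aspirin"}

independent = independent_evaluation_set(tested, training_compounds)
print(sorted(independent))  # ['Atenolol', 'Isoniazid']
```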

TABLE 3.

Results of Independent Multisite Signature Evaluation

Johnson & Johnson (male SD rats); Nie et al., 2006
Treatment Vehicle Dose (mg/kg/day) Route of administration Time point (days) Class Score Predicted classa Confidence levelb
Butylated hydroxytoluene 5% MC 1000 PO 1 NH 0.79 NGHC 100
Cyproterone acetate 5% MC 200 PO 1 NGHC 0.624 NGHC 68.9
Ethinyl estradiol (experiment 1) 5% MC 500 PO 1 NGHC 0.851 NGHC 100
Ethinyl estradiol (experiment 2) 5% MC 500 PO 1 NGHC 0.864 NGHC 100
Isoniazid 5% MC 125 PO 1 NGHC 0.157 NH 100
Methapyrilenec 5% MC 200 PO 1 NGHC 0.87 NGHC 100
Monocrotaline 5% MC 30 PO 1 NGHC 0.749 NGHC 99.7
Piperonyl butoxide 5% MC 4000 PO 1 NGHC 0.548 NH 80.2
Progesterone 5% MC 100 PO 1 NGHC 0.359 NH 100
Simvastatin 5% MC 150 PO 1 NGHC 0.353 NH 100
Tamoxifen 5% MC 750 PO 1 NGHC 0.532 NH 87.1
Amiodarone 5% MC 600 PO 1 NH 0.468 NH 98.8
Amiodarone (experiment 1) 5% MC 1000 PO 1 NH 0.419 NH 99.9
Amiodarone (experiment 2) 5% MC 1000 PO 1 NH 0.508 NH 94
Aniline 5% MC 200 PO 1 NH 0.696 NGHC 96.1
Aspirinc 5% MC 600 PO 1 NH 0.37 NH 100
Atenolol 5% MC 1500 PO 1 NH 0.245 NH 100
Beta-hydroxypropyl-cyclodextrin Water 2000 PO 1 NH 0.724 NGHC 98.8
Bromocryptine 5% MC 200 PO 1 NH 0.19 NH 100
Buspirone 5% MC 100 PO 1 NH 0.661 NGHC 87.6
Captopril 5% MC 5000 PO 1 NH 0.401 NH 100
Clozapine 5% MC 150 PO 1 NH 0.517 NH 91.8
Dantrolene 5% MC 500 PO 1 NH 0.438 NH 99.7
Dapsone 5% MC 50 PO 1 NH 0.563 NH 71.7
Dexamethasone 5% MC 75 PO 1 NH 0.437 NH 99.7
Dieldrin 5% MC 30 PO 1 NH 0.62 NGHC 66.3
Dieldrin 5% MC 45 PO 1 NH 0.615 NGHC 63
Dipyridamole 5% MC 5000 PO 1 NH 0.659 NGHC 86.6
Disulfiram 5% MC 2000 PO 1 NH 0.91 NGHC 100
Enalapril 5% MC 1800 PO 1 NH 0.653 NGHC 84.2
Erythromycin estolate (experiment 1) 5% MC 1500 PO 1 NH 0.678 NGHC 92.5
Erythromycin estolate (experiment 2) 5% MC 1500 PO 1 NH 0.802 NGHC 100
Famotidine 5% MC 500 PO 1 NH 0.641 NGHC 78.8
Fluoxetinec 5% MC 50 PO 1 NH 0.584 NH 58.7
Fluoxetinec 5% MC 100 PO 1 NH 0.499 NH 95.6
Flutamide 5% MC 500 PO 1 NH 0.923 NGHC 100
Flutamide 5% MC 500 PO 1 NH 0.779 NGHC 99.9
Furosemide 5% MC 1500 PO 1 NH 0.284 NH 100
Glibenclamide 5% MC 3000 PO 1 NH 0.35 NH 100
Glibenclamide 5% MC 5010 PO 1 NH 0.479 NH 98
Lansoprazole 5% MC 200 PO 1 NH 0.768 NGHC 99.9
Ibuprofen 5% MC 500 PO 1 NH 0.571 NH 67.2
Indomethacin Saline 30 IP 1 NH 0.385 NH 100
Itraconazole 5% MC 200 PO 1 NH 0.584 NH 58.7
Ketoconazole 5% MC 150 PO 1 NH 0.456 NH 99.3
Mebendazole 5% MC 40 PO 1 NH 0.472 NH 98.5
Metformin 5% MC 750 PO 1 NH 0.443 NH 99.7
Methyldopac 5% MC 1000 PO 1 NH 0.293 NH 100
Metoprolol 5% MC 2000 PO 1 NH 0.712 NGHC 98
Mycophenolic acid 5% MC 500 PO 1 NH 0.429 NH 99.8
Naltrexone 5% MC 1000 PO 1 NH 0.792 NGHC 100
Niacin 5% MC 2505 PO 1 NH 0.779 NGHC 99.9
Niacin 5% MC 5010 PO 1 NH 0.125 NH 100
Nifedipine 5% MC 750 PO 1 NH 0.655 NGHC 85.2
Nitrofurantoin 5% MC 400 PO 1 NH 0.599 NGHC 52.3
Nizatidine 5% MC 1000 PO 1 NH 0.569 NH 68.5
Perhexilenec 5% MC 2000 PO 1 NH 0.522 NH 90.4
Perhexilenec 5% MC 2010 PO 1 NH 0.381 NH 100
Phenylephrine Saline 5 IP 1 NH 0.105 NH 100
Quercetin 5% MC 1995 PO 1 NH 0.316 NH 100
Quercetin 5% MC 4005 PO 1 NH 0.368 NH 100
Raloxifene 5% MC 700 PO 1 NH 0.235 NH 100
Ranitidine 5% MC 1000 PO 1 NH 0.302 NH 100
Rifampin 5% MC 600 PO 1 NH 0.49 NH 96.9
Rosiglitazonec 5% MC 30 PO 1 NH 0.179 NH 100
Rosiglitazonec 5% MC 100 PO 1 NH 0.207 NH 100
Rotenone 5% MC 4 PO 1 NH 0.702 NGHC 96.9
Rotenone 5% MC 100 PO 1 NH 0.533 NH 86.8
Sulfamethoxazole 5% MC 2000 PO 1 NH 0.681 NGHC 93.2
Tannic acid (experiment 1) 5% MC 3000 PO 1 NH 0.715 NGHC 98.2
Tannic acid (experiment 2) 5% MC 3000 PO 1 NH 0.639 NGHC 77.5
Tetracycline 5% MC 500 PO 1 NH 0.565 NH 71
Troglitazonec 5% MC 100 PO 1 NH 0.486 NH 97.3
Troglitazonec 5% MC 500 PO 1 NH 0.606 NGHC 56.8
Valproic acidc 5% MC 200 PO 1 NH 0.312 NH 100
Valproic acidc 5% MC 500 PO 1 NH 0.237 NH 100
Valproic acidc 5% MC 600 PO 1 NH 0.41 NH 99.9
Valproic acidc 5% MC 1000 PO 1 NH 0.735 NGHC 99.3
Verapamil 5% MC 75 PO 1 NH 0.377 NH 100
Vitamin A 5% MC 100 PO 1 NH 0.509 NH 93.7
Vitamin A 5% MC 200 PO 1 NH 0.406 NH 100
NTP (male F344 rats); Auerbach et al., 2010
Treatment Vehicle Dose (mg/kg/day) Route of administration Time point (days) Class Score Predicted class Confidence level
1-Amino-2,4-dibromoanthraquinone Feed 5000 ppm Dietary 2 NGHC 0.909 NGHC 100
1-Amino-2,4-dibromoanthraquinone Feed 5000 ppm Dietary 14 NGHC 0.868 NGHC 100
1-Amino-2,4-dibromoanthraquinone Feed 5000 ppm Dietary 90 NGHC 0.792 NGHC 100
Acetaminophenc Feed 3000 ppm Dietary 2 NGHC 0.861 NGHC 100
Acetaminophenc Feed 3000 ppm Dietary 14 NGHC 0.797 NGHC 100
Acetaminophenc Feed 3000 ppm Dietary 90 NGHC 0.672 NGHC 91.1
Methyleugenol 5% MC 150 PO 2 GHC 0.911 NGHC 100
Methyleugenol 5% MC 150 PO 14 GHC 0.869 NGHC 100
Methyleugenol 5% MC 150 PO 90 GHC 0.654 NGHC 84.7
Methyleugenol Corn oil 35.6 PO 2 GHC 0.632 NGHC 73.6
Methyleugenol Corn oil 35.6 PO 14 GHC 0.537 NH 85.2
Methyleugenol Corn oil 35.6 PO 90 GHC 0.565 NH 70.7
Methyleugenol Corn oil 356 PO 2 GHC 0.849 NGHC 100
Methyleugenol Corn oil 356 PO 14 GHC 0.757 NGHC 99.8
Methyleugenol Corn oil 356 PO 90 GHC 0.851 NGHC 100
Safrole Corn oil 32.4 PO 2 GHC 0.717 NGHC 98.3
Safrole Corn oil 32.4 PO 14 GHC 0.538 NH 84.9
Safrole Corn oil 32.4 PO 90 GHC 0.655 NGHC 88.6
Safrole Corn oil 324 PO 2 GHC 0.758 NGHC 99.8
Safrole Corn oil 324 PO 14 GHC 0.736 NGHC 99.3
Safrole Corn oil 324 PO 90 GHC 0.717 NGHC 98.3
Ascorbic acid Feed 25,000 ppm Dietary 2 NH 0.636 NGHC 76.1
Ascorbic acid Feed 25,000 ppm Dietary 14 NH 0.786 NGHC 100
Ascorbic acid Feed 25,000 ppm Dietary 90 NH 0.468 NH 98.8
Eugenol Corn oil 32.8 PO 2 NH 0.544 NH 82.2
Eugenol Corn oil 32.8 PO 14 NH 0.49 NH 96.9
Eugenol Corn oil 32.8 PO 90 NH 0.431 NH 99.8
Eugenol Corn oil 328 PO 2 NH 0.429 NH 99.8
Eugenol Corn oil 328 PO 14 NH 0.698 NGHC 96.4
Eugenol Corn oil 328 PO 90 NH 0.363 NH 100
Isoeugenol Corn oil 32.8 PO 2 NH 0.804 NGHC 100
Isoeugenol Corn oil 32.8 PO 14 NH 0.77 NGHC 99.9
Isoeugenol Corn oil 32.8 PO 90 NH 0.709 NGHC 97.7
Isoeugenol Corn oil 328 PO 2 NH 0.728 NGHC 99
Isoeugenol Corn oil 328 PO 14 NH 0.655 NGHC 85
Isoeugenol Corn oil 328 PO 90 NH 0.642 NGHC 79.3
l-tryptophan Feed 25,000 ppm Dietary 2 NH 0.876 NGHC 100
l-tryptophan Feed 25,000 ppm Dietary 14 NH 0.837 NGHC 100
l-tryptophan Feed 25,000 ppm Dietary 90 NH 0.647 NGHC 81.8
Aflatoxin B1 Feed 1 ppm Dietary 2 GHC 0.665 NGHC 88.9
Aflatoxin B1 Feed 1 ppm Dietary 14 GHC 0.812 NGHC 100
Aflatoxin B1 Feed 1 ppm Dietary 90 GHC 0.836 NGHC 100
Dimethylnitrosamine Water 5 ppm Water 2 GHC 0.653 NGHC 84.2
Dimethylnitrosamine Water 5 ppm Water 14 GHC 0.745 NGHC 99.6
Dimethylnitrosamine Water 5 ppm Water 90 GHC 0.759 NGHC 99.8
Pfizer (male SD rats)
Treatment Vehicle Dose (mg/kg/day) Route of administration Time point (days) Class Score Predicted class Confidence level
Acetaminophenc 5% MC 300 PO 4 NGHC 0.199 NH 100
Thioacetamide 5% MC 50 PO 4 NGHC 0.809 NGHC 100
Alpha-naphthylisothiocyanate 5% MC 30 PO 1 Unknown 0.42 NH 99.9
Alpha-naphthylisothiocyanate 5% MC 100 PO 1 Unknown 0.329 NH 100
Roche (male SD rats)
Treatment Vehicle Dose (mg/kg/day) Route of administration Time point (days) Class Score Predicted class Confidence level
Methapyrilenec Water 10 PO 2 NGHC 0.329 NH 100
Methapyrilenec Water 10 PO 6 NGHC 0.584 NH 58.7
Methapyrilenec Water 10 PO 10 NGHC 0.698 NGHC 96.4
Methapyrilenec Water 10 PO 14 NGHC 0.734 NGHC 99.3
Methapyrilenec Water 50 PO 2 NGHC 0.861 NGHC 100
Methapyrilenec Water 50 PO 6 NGHC 0.884 NGHC 100
Methapyrilenec Water 50 PO 10 NGHC 0.879 NGHC 100
Methapyrilenec Water 50 PO 14 NGHC 0.713 NGHC 98
Sanofi-aventis (male F344 rats); Michel et al., 2005—site 2
Treatment Vehicle Dose (mg/kg/day) Route of administration Time point (days) Class Score Predicted class Confidence level
Clofibrate Feed 5000 ppm Dietary 18 NGHC 0.821 NGHC 100
Clofibrate Feed 5000 ppm Dietary 264 NGHC 0.866 NGHC 100
Clofibrate (nontumorous) Feed 5000 ppm Dietary 607 NGHC 0.709 NGHC 97.8
Clofibrate (adjacent tumor) Feed 5000 ppm Dietary 607 NGHC 0.945 NGHC 100
Schering-Plough Research Institute (male SD rats); Nioi et al., 2008
Treatment Vehicle Dose (mg/kg/day) Route of administration Time point (days) Class Score Predicted class Confidence level
Acetaminophenc 4% MC 950 PO 1 NGHC 0.782 NGHC 99.9
Acetaminophenc 4% MC 950 PO 5 NGHC 0.866 NGHC 100
Butylated hydroxytoluene 4% MC 450 PO 1 NH 0.739 NGHC 99.4
Butylated hydroxytoluene 4% MC 450 PO 5 NH 0.685 NGHC 94.2
Methapyrilenec 4% MC 100 PO 1 NGHC 0.828 NGHC 100
Methapyrilenec 4% MC 100 PO 5 NGHC 0.895 NGHC 100
Phenobarbitalc Water 50 PO 1 NGHC 0.701 NGHC 96.8
Phenobarbitalc Water 50 PO 5 NGHC 0.649 NGHC 82.5
Fluoxetinec 4% MC 400 PO 1 NH 0.444 NH 99.6
Fluoxetinec 4% MC 400 PO 5 NH 0.484 NH 97.6
Ranitidine 4% MC 1000 PO 1 NH 0.54 NH 83.9
Ranitidine 4% MC 1000 PO 5 NH 0.541 NH 83.4
Iconix (male SD rats); Fielden et al., 2007
Treatment Vehicle Dose (mg/kg/day) Route of administration Time point (days) Class Score Predicted class Confidence level
Aflatoxin B1 0.5% CMC 0.3 PO 5 GHC 0.518 NH 91.5
Diethylnitrosamine Saline 34 PO 5 GHC 0.754 NGHC 99.7
Pregnenolone-16alpha-carbonitrile 0.5% MC 100 PO 5 NGHC 0.899 NGHC 100
Carbon tetrachloride Corn oil 1175 PO 3 NGHC 0.62 NGHC 66.6
Abbott (male and female SD rats)
Treatment Vehicle Dose (mg/kg/day) Route of administration Time point (days) Class Score Predicted class Confidence level
N-vinylpyrrolidone-2—male Saline 3000 PO 5 NGHC 0.782 NGHC 99.7
N-vinylpyrrolidone-2—female Saline 3000 PO 5 NGHC 0.725 NGHC 97.1
Rimonabant—male 0.2% HPMC 10 PO 5 NGHC 0.635 NGHC 71.9
Latrepirdine—male 0.2% HPMC 10 IP 6 NH 0.276 NH 100

Note. CMC, carboxymethylcellulose; HPMC, hydroxypropylmethylcellulose; MC, methylcellulose.

a Signature scores ≥ the classification threshold (0.596) were predicted as NGHCs.

b The confidence level provides an estimate of confidence for the two class predictions (NGHC or NH) and is described in the Supplementary Materials and Methods.

c Indicates compounds that were also used in the original training set.

Determination of classification threshold.

Classifying compounds as NGHC or NH required dichotomizing the classification scores into calls. Because we standardized all potential models to have probabilistic output with values 0–1 inclusive, we modeled the classification scores with beta distributions. After each replication of cross-validation on the training data, we chose to separately fit the classification scores into two separate beta probability densities; one beta distribution was fit for NH classification results and one beta distribution for NGHC classification results. The point that is equally likely to be in either NGHC or NH distribution was defined as the threshold or classification cut point. The 25 replications of cross-validation provided 25 estimates for the threshold for a given model. Because the threshold tended to be away from the 0 or 1 limits, the thresholds were approximately normally distributed, and this allowed for reasonable estimates of the variance associated with the threshold.
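The threshold definition above can be illustrated with a minimal sketch for a single cross-validation replicate. The fitting method is an assumption: the beta parameters are estimated here by the method of moments, which may differ from the procedure actually used, and the function names (`fit_beta_moments`, `equal_density_threshold`) and example scores are hypothetical. The cut point is located by bisection between the two class means, where the two fitted densities are equal.

```python
import math
from statistics import mean, variance

def fit_beta_moments(scores):
    """Method-of-moments estimate of beta shape parameters (an assumed fitting method)."""
    m = mean(scores)
    v = variance(scores)            # sample variance; must be < m * (1 - m)
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common

def beta_log_pdf(x, a, b):
    """Log density of Beta(a, b) at 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x)

def equal_density_threshold(nh_scores, nghc_scores, tol=1e-9):
    """Score at which the fitted NH and NGHC beta densities are equal."""
    a0, b0 = fit_beta_moments(nh_scores)
    a1, b1 = fit_beta_moments(nghc_scores)

    def diff(x):
        return beta_log_pdf(x, a1, b1) - beta_log_pdf(x, a0, b0)

    lo, hi = mean(nh_scores), mean(nghc_scores)
    if diff(lo) >= 0 or diff(hi) <= 0:
        raise ValueError("densities do not cross between the class means")
    while hi - lo > tol:            # bisection on the sign change of diff
        mid = (lo + hi) / 2
        if diff(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical cross-validated scores for one replicate (not data from the study)
threshold = equal_density_threshold([0.2, 0.3, 0.4], [0.6, 0.7, 0.8])
```

Repeating this over each replicate of cross-validation would give one threshold estimate per replicate, from which a mean and variance could be summarized.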

Interlaboratory precision study.

The interlaboratory precision of the model was evaluated by splitting each of 38 liver RNA samples among four laboratories and determining the reproducibility of the expression values and signature scores when measured in different laboratories. The 38 samples consisted of liver RNA from male F344 rats (n = 6–10 rats per group) treated with 5000 ppm of clofibrate in the diet for 18, 264, or 607 days (Michel et al., 2005). RNA samples from liver tumors, and adjacent normal tissue, were evaluated and compared on day 607. The time-matched control animals received diet only. The precision of the TaqMan array data was evaluated by comparing the variability of signature scores and expression ratios across the four sites.

Biological interpretation of biomarker genes and their regulation.

Two approaches were used to obtain gene function information on the 23 genes composing the final model: (1) a general biomedical literature search (PubMed) carried out on a gene-by-gene basis and (2) mining of the annotated knowledge-based databases found in the Ingenuity Pathway Analysis (IPA) software (Ingenuity Systems, Redwood City, CA) and the BIOBASE Knowledge Library (BIOBASE Corporation, Beverly, MA). The literature review focused on identifying functional associations between biomarker genes and the regulation of cell proliferation and carcinogenesis. IPA was used to identify pathways, biological processes, and networks that were statistically enriched in the signature genes. With both tools, the probability that the representation level of genes in the query set in each functional category, disease, or network process is due to chance alone was expressed as a p value. p Values less than 0.05 were considered significant. Detailed information on the statistical methods underlying the pathway and functional category enrichment and impact scoring can be found at the software provider’s web address (Ingenuity Systems, http://www.ingenuity.com/).

Using gene accession information, the genes composing the final model were uploaded into the BIOBASE analysis tool, the ExPlain data analysis system, which leverages the TRANSFAC and TRANSPATH databases to score for the presence of transcription factor response elements (TFREs) within the 1100-bp proximal promoter region of the member genes. To determine relative enrichment, the TFRE abundance in the query set was compared with that in a reference set of 400 rat housekeeping gene promoters. The likelihood of TFRE overrepresentation in the query set relative to the reference set is expressed as a p value representing the probability that the difference in TFRE representation is due solely to chance. A more detailed description of the BIOBASE ExPlain tool and the statistical methods underpinning the TFRE enrichment analysis can be found at the provider’s web address (http://www.biobase-international.com/).
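As a rough illustration of how overrepresentation in a query set relative to a reference set can be expressed as a p value, the sketch below uses a one-sided Fisher exact (hypergeometric upper-tail) test on promoter counts. This is a stand-in technique chosen for illustration and is not a description of the ExPlain scoring method; the function name and the example counts are hypothetical.

```python
from math import comb

def enrichment_p_value(query_hits, query_size, ref_hits, ref_size):
    """One-sided Fisher exact p value for overrepresentation in the query set.

    Computes the probability of seeing at least `query_hits` promoters with
    the response element in the query set, given the pooled number of
    promoters carrying the element (hypergeometric upper tail).
    """
    total = query_size + ref_size
    total_hits = query_hits + ref_hits
    denom = comb(total, query_size)
    max_hits = min(query_size, total_hits)
    tail = sum(
        comb(total_hits, k) * comb(total - total_hits, query_size - k)
        for k in range(query_hits, max_hits + 1)
    )
    return tail / denom

# Hypothetical counts: 9 of 23 query promoters vs 60 of 400 reference promoters
p = enrichment_p_value(9, 23, 60, 400)
```

A p value below 0.05 from such a test would indicate that the element occurs more often in the query promoters than expected from the pooled frequency.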

RESULTS

Classification Accuracy

The results of the initial model building in the evaluation study (model development step 1: process evaluation) produced an AUC of 0.84 on the test data, which was significantly different from the top model trained on the same data with the class labels randomly permuted (p = 0.012), indicating that the model and the underlying model building procedure identified a true signal that can differentiate NGHCs from NHs (see Supplementary Results). Based on these encouraging results, we proceeded to build a final model using all 72 compounds. This modeling resulted in a signature containing 22 genes, all normalized to peptidylprolyl isomerase A, using a random forest classifier with a classification threshold of 0.596. This signature was then evaluated on the independent data set detailed in Table 3. Figure 2 shows the principal component analysis visualization based on the delta-delta Ct values (expression ratios) for each data point (a compound measured at a given site, dose, and time point) in the independent data set. The compound classes tended to separate in the first two principal components, indicating that the separation of NGHCs and NHs is partially preserved on independent samples based solely on the expression of the 22 genes in the final signature. To estimate the overall sensitivity and specificity of the signature, an evaluation was done at the compound level by merging compounds measured at multiple sites, and at different doses or time points, into a single score based on the median signature score. Merging the replicates produced 66 unique compounds in the independent data set. This approach resulted in a sensitivity and specificity of 67% (95% confidence interval [CI] = 38–88%) and 59% (95% CI = 44–72%), respectively, with an AUC of 0.65 (95% CI = 0.46–0.83) (Fig. 4A and Supplementary figure 2). This conservative estimate provided a 1:1 mapping of compound to prediction in order to estimate the associations with class. 
In general, we found the data from multiple sites for the same compound to have correlated scores (see Supplementary figure 3). However, merging multiple doses in this manner risks conflating very different responses to individual compounds and should not be done in practice; it nonetheless provides a convenient means to estimate performance. The effect of site, which in this context is a proxy for study protocol, on these classification results is difficult to evaluate because most samples in the independent study came from single-dose (1 day) studies at Johnson & Johnson (J&J) (Fig. 3). If we confined our results to sites outside of the J&J samples, we estimated an improved AUC of 0.81 (95% CI = 0.5–1.0), whereas the J&J results provided an AUC of 0.49 (95% CI = 0.22–0.76) (Fig. 4B).
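The compound-level evaluation described above (median merging, dichotomizing at 0.596, then sensitivity, specificity, and AUC) can be sketched as follows. The function names are hypothetical, the AUC is computed here with the rank-based (Mann-Whitney) formulation as an assumption, and the four compounds are a handful of rows taken from Table 3 purely for illustration, so the outputs are not the reported estimates.

```python
from collections import defaultdict
from statistics import median

THRESHOLD = 0.596  # classification threshold reported for the final model

def merge_by_compound(records):
    """Collapse (compound, true_class, score) rows to one median score per compound."""
    grouped = defaultdict(list)
    classes = {}
    for compound, true_class, score in records:
        grouped[compound].append(score)
        classes[compound] = true_class
    return {c: (classes[c], median(s)) for c, s in grouped.items()}

def sensitivity_specificity(merged, threshold=THRESHOLD):
    """Scores >= threshold are called NGHC; all others are called NH."""
    tp = fn = tn = fp = 0
    for true_class, score in merged.values():
        positive = score >= threshold
        if true_class == "NGHC":
            tp += positive
            fn += not positive
        else:
            fp += positive
            tn += not positive
    return tp / (tp + fn), tn / (tn + fp)

def auc(merged):
    """Probability that a random NGHC outscores a random NH (ties count 0.5)."""
    pos = [s for c, s in merged.values() if c == "NGHC"]
    neg = [s for c, s in merged.values() if c != "NGHC"]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A few J&J rows from Table 3, used only as an illustrative subset
rows = [
    ("Ethinyl estradiol", "NGHC", 0.851), ("Ethinyl estradiol", "NGHC", 0.864),
    ("Isoniazid", "NGHC", 0.157),
    ("Amiodarone", "NH", 0.468), ("Amiodarone", "NH", 0.419), ("Amiodarone", "NH", 0.508),
    ("Flutamide", "NH", 0.923), ("Flutamide", "NH", 0.779),
]
merged = merge_by_compound(rows)
sens, spec = sensitivity_specificity(merged)
area = auc(merged)
```

Because each compound contributes exactly one median score, every compound maps to a single prediction, which is the 1:1 mapping the text refers to.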

FIG. 2.

Principal component analysis of the independent signature evaluation data. The 66 test compounds (including replicates) spanned by the 22 predictive genes in the model are projected in the first two principal components. The results are stratified vertically by compound classification accuracy and horizontally by compound class. Rug plots were added, so that compound positioning is more apparent with the color scaled according to the classifier score (darker marks suggest higher scores). In general, we see good separation between NGHC compounds (black points) and NH compounds (gray points), and this suggests the 22 predictive genes tended to separate the classes as expected. The NGHC compounds that are classified incorrectly (black points with gray borders) are generally in close proximity to the NH compounds, whereas the NH compounds that are classified incorrectly (gray points with black borders) have mixed proximity to other NGHC compounds.

FIG. 4.

Final model performance. (A) ROC plot for the evaluation signature set. The results on the independent signature set (multisite test set) are represented by the black line with points. Each point is derived from an identical compound generated at different sites and tested at different doses but summarized by the median signature score. Random chance is the diagonal dashed black line. The observed sensitivity and specificity derived from the independent test set is shown with the gray ‘X’. The black box captures the 95% through the 97.5% CI based on an exact test (Clopper-Pearson). FPR: false-positive rate. TPR: true-positive rate. Sensitivity and specificity curves are also evaluated in Supplementary figure 2. (B) 2 × 2 contingency tables for classification of independent and unique compounds. Sensitivity is the proportion of NGHCs correctly predicted positive. Specificity is the proportion of NHs correctly predicted negative. PPV, positive predictive value, is the proportion of samples with positive test results that are correctly diagnosed. NPV, negative predictive value, is the proportion of samples with negative test results that are correctly diagnosed. AUC with 95% confidence limits in brackets.
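For readers unfamiliar with the exact (Clopper-Pearson) interval mentioned in the caption, a minimal sketch is given below. It inverts the binomial tail probabilities by bisection using only the standard library; the function names are hypothetical and the counts in the usage line are illustrative, not the counts from the study.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _root_increasing(f, tol=1e-10):
    """Root on [0, 1] of a function that is negative at 0 and positive at 1."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for k successes in n trials."""
    # Lower limit: the p at which P(X >= k) equals alpha / 2
    lower = 0.0 if k == 0 else _root_increasing(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2
    )
    # Upper limit: the p at which P(X <= k) equals alpha / 2
    upper = 1.0 if k == n else _root_increasing(
        lambda p: alpha / 2 - binom_cdf(k, n, p)
    )
    return lower, upper

# Illustrative counts only: 8 correct calls out of 12 compounds
low, high = clopper_pearson(8, 12)
```

With small numbers of compounds per class the exact interval is wide, which is consistent with the broad intervals reported for sensitivity and specificity.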

FIG. 3.

Stratification of independent test compounds by class, day, and site. Each replicate from a given compound is represented by a single point, and class results are summarized using box plots. The boxes capture the middle 50% of the data. The classifier cutoff is represented by a horizontal line at 0.596. Compounds with a signature score greater than 0.596 are classified as positive (predicted NGHC). The plot shows that the independent data set is composed predominantly of samples from J&J at day 1.

To explore further the boundaries of use for the signature, we evaluated the dose and time dependence of the signature score. The effects of dose and time on signature predictions were best illustrated by the samples from a time-course study in male SD rats treated with methapyrilene at doses of 10 and 50 mg/kg/day. Methapyrilene administered at 10 mg/kg produced a time-dependent increase in the signature score; however, it was correctly predicted positive only at the later time points on days 10 and 14 (Table 3). By contrast, the high dose of methapyrilene at 50 mg/kg/day was correctly predicted positive at all time points. The signature score did not appear to increase over time as it was close to its maximum on day 2 and sustained above 0.7 throughout the course of treatment. Methapyrilene was also correctly predicted positive when tested by J&J at 200 mg/kg for 24 h (Table 3). These results suggest that the signature is sensitive to dose and time and that low-dose exposure and/or early time points may not be optimal to identify expression changes diagnostic of NGHCs. This is consistent with the fact that the training set was established using maximum tolerated doses and repeated daily doses for 5 days.

The ability of the signature to correctly classify samples from long-term treatments was investigated by evaluating the 90-day studies conducted by the National Toxicology Program (NTP). A comparison of time points within these long-term studies indicates that the 90-day samples typically produce classification results similar to those at the earlier time points (cf. days 2 or 14). Although many of the classification results from the 90-day NTP studies were incorrect (false positives), the consistency of the results suggests that the expression changes were conserved over time. Likewise, the long-term clofibrate diet study also indicated that the classification results and expression changes were preserved over the extended course of treatment (Table 3). These limited results suggest that samples from both short-term and long-term repeat dose studies may have applicability to the signature.

Some hepatocarcinogens are thought to cause tumors secondary to hepatotoxicity and regenerative proliferation, raising the concern that the signature may be sensitive to false positives as a result of liver injury. Therefore, it was of interest to determine if there was an association between the signature score and the degree of hepatic damage. Rats treated with methapyrilene at both 10 and 50 mg/kg showed no difference in the degree of hepatotoxicity at the early time points as both groups showed minimal single cell necrosis, yet the signature scores were clearly distinct on days 2 and 6. The increasing severity of hepatic necrosis in the high-dose group at later time points also did not correlate with the signature score. Both low and high doses produced minimal to mild spindle cell proliferation on days 6 through 14, including biliary hyperplasia in the most severe instances in the high dose group (data not shown), yet this was not correlated with the signature score. These results suggest that proliferating cells and hepatotoxicity are unlikely to influence the signature scores and lead to false positives with hepatotoxic treatments. While the lack of a complete histological evaluation of all test samples precludes a more comprehensive analysis of this hypothesis, the negative signature scores for other known hepatotoxic compounds such as alpha-naphthylisothiocyanate, rotenone, valproic acid, or aflatoxin B1 (Table 3) provide further evidence that hepatotoxic drug treatments are unlikely to produce false-positive predictions. This is consistent with results with the original Iconix microarray signature (Fielden et al., 2007), and the fact that hepatotoxic treatments were included in both classes of the training set to limit this possibility.

Signature Precision and Reproducibility

The precision of the TaqMan array was assessed by splitting RNA samples from a chronic clofibrate toxicity study into aliquots for evaluation at four different laboratories to assess site-to-site variation. As expected, the precision of the classifier score when measured across sites was excellent, as all four sites produced very similar expression results and signature scores (Supplementary figures 1A and 1B). In addition, it was of interest to determine the robustness of the predictive expression changes for compounds evaluated at different sites or dates; we expect a given compound to be classified identically assuming the same dose and experimental protocol were used. Five compounds were tested at the same dose level in separate studies but at the same site (J&J), thus permitting an evaluation of reproducibility within a single laboratory. In all five cases, the biomarker predictions were concordant (amiodarone, erythromycin, ethinyl estradiol, flutamide, and tannic acid; Table 3). These results provide confidence that signature predictions should be similar when assessed under similar study conditions. Ideally, the same compound should also be predicted the same regardless of where it was tested. However, the compounds evaluated at multiple sites were tested with different study designs and doses, so reproducibility across sites could not be compared directly. For example, acetaminophen was tested at three different sites: at 300 mg/kg for 4 days, at 950 mg/kg/day for 1 and 5 days, and at a dietary exposure of 3000 ppm for 2, 14, and 90 days. 
Acetaminophen was correctly predicted positive by the signature at 950 mg/kg/day and 3000 ppm at all time points, whereas the lower dose of 300 mg/kg for 4 days was predicted negative (Table 3). Additionally, the nongenotoxic hepatocarcinogen methapyrilene was correctly predicted positive by the signature at four different doses and across three different laboratories. The NH fluoxetine was also correctly predicted negative at three different doses and across two different laboratories (Table 3).

Evaluating Nongenotoxic Modes of Action

Previous microarray expression data on the Iconix samples (Fielden et al., 2007) demonstrated that hierarchical clustering of NGHCs across 37 signature genes could identify compounds with similar mode of action based on the similarity of their expression profiles. Although hierarchical clustering is an unsupervised clustering technique and therefore not a formal prediction, it can provide a visual but subjective means to evaluate novel compounds for potential modes of action that may contribute to a positive prediction and hepatocarcinogenicity. The 23 NGHCs in the training set were clustered across all 22 genes in the model (Fig. 5). A number of test compounds were included in the clustering to evaluate whether the signature genes could facilitate identification of known compounds with similar modes of action. The genotoxic hepatocarcinogens aflatoxin B1 and N-nitrosodiethylamine dosed orally for 5 days clustered together and were distinct from other treatments. The next most similar expression profiles were a number of hepatotoxicants such as acetaminophen and chloroform, which appeared to be driven by induction of the oxidative stress–responsive gene Akr7a3. The test compounds pregnenolone-16alpha-carbonitrile, phenobarbital, and butylated hydroxytoluene clustered among other PXR and CAR agonists as expected, whereas the P450 inducers and Ah receptor agonists, beta-naphthoflavone and 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD), clustered distinctly. Interestingly, fluconazole coclustered with diethylstilbestrol and norethindrone, which suggests fluconazole may have a similar mode of action for inducing hepatocarcinogenicity. It is notable that a number of PPARα agonists coclustered despite the fact that the 22 signature genes are not known for being associated with fatty acid metabolism. This cluster of PPARα agonists was also correlated to the profiles of bis(2-ethylhexyl)phthalate and pravastatin, which are also thought to activate PPARα (Chen et al., 2010). 
These results substantiate the utility of the 22 signature genes to identify putative modes of action for known or suspected hepatocarcinogens.

FIG. 5.

Hierarchical clustering of genotoxic and nongenotoxic hepatocarcinogens. The log10 ratios of the 22 signature genes were calculated by comparing the expression in the treated rats relative to time-matched vehicle control rats. The genes and expression profiles were then hierarchically clustered. Clustering method: complete linkage. Distance measure: correlation. Green = upregulation; red = downregulation; black = no change. Absolute magnitude of expression change is provided in Supplementary table 1. Note. See online version for color version.
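The clustering settings named in the caption (correlation distance with complete linkage) can be sketched in a few lines. This is a simplified, unoptimized agglomerative procedure written for illustration and is not the software used to produce the figure; the function names and the three example profiles are hypothetical.

```python
from math import sqrt

def correlation_distance(x, y):
    """1 - Pearson correlation; undefined for a constant profile (zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1 - cov / (sx * sy)

def complete_linkage(profiles):
    """Agglomerative clustering of {name: [log10 ratios]}.

    Returns the merges in order as (cluster_a, cluster_b, distance), where the
    distance between two clusters is the largest pairwise member distance.
    """
    clusters = [(name,) for name in profiles]

    def dist(c1, c2):
        return max(
            correlation_distance(profiles[a], profiles[b]) for a in c1 for b in c2
        )

    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters under complete linkage
        d, i, j = min(
            (dist(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical log10 ratio profiles across three genes
example = {
    "compound_1": [0.1, 0.5, 0.9],
    "compound_2": [0.2, 0.6, 1.1],
    "compound_3": [0.9, 0.5, 0.1],
}
merge_order = complete_linkage(example)
```

Because the distance depends only on correlation, two compounds with the same pattern of induction and repression cluster together even when the magnitudes of change differ.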

Evaluating Genotoxic Modes of Action

A number of genotoxic treatments were included in the test set to evaluate whether the signature detected expression changes that were common to all hepatocarcinogens regardless of mode of action. In male SD rats, oral administration of aflatoxin B1 resulted in a negative prediction, whereas N-nitrosodiethylamine was predicted positive. Dietary exposure of male F344 rats to aflatoxin and N-nitrosodimethylamine resulted in a consistent positive prediction on days 2 through 90. This was unexpected as the model was trained to identify NGHCs. Whether genotoxic hepatocarcinogens cause prognostic expression changes similar to NGHCs is unclear and will require evaluation of a broader set of genotoxic compounds.

A number of genes on the array were chosen based on a previous study (Ellinger-Ziegelbauer et al., 2004) that demonstrated a strong and consistent upregulation of expression in response to genotoxic hepatocarcinogens, which suggested that they could be used to differentiate genotoxic from nongenotoxic modes of action. These genes include a number of p53- and DNA damage–responsive genes such as BTG2, CDKN1A, and MGMT, as well as a number of xenobiotic metabolism genes such as CES2 and ALDH1A1. As shown in Figure 6A, these genes were significantly induced by aflatoxin B1 and diethylnitrosamine after 5 days of repeated daily dosing in male SD rats. By comparison, the NGHCs bezafibrate and TCDD did not consistently induce these genes after 5 days of repeated daily dosing (Fig. 6B), suggesting that these genes could be used to differentiate genotoxic modes of action. However, a number of NGHCs also induced many of these genes. Examples include hepatotoxic treatments such as methapyrilene and chloroform (Fig. 6C). The induction of these genes may be secondary to cytotoxicity and p53 activation rather than evidence of direct DNA damage. The NHs praziquantel and dichlorvos also induced a number of these genes (Fig. 6D). The weight of evidence suggests that these compounds are not genotoxic in vivo despite some conflicting reports (Booth et al., 2007; Montero and Ostrosky, 1997); however, there was no histological evidence of hepatotoxicity in these animals (data not shown). By evaluating the gene expression changes for these DNA damage–responsive genes, it may be possible to differentiate nongenotoxic from genotoxic modes of action. Histological changes in the samples would likely need to be taken into consideration when interpreting the potential of treatments to cause direct DNA damage in vivo based on the expression of these DNA damage–responsive genes.

FIG. 6.

Expression of DNA damage–responsive genes. Genes previously identified as being responsive to DNA damagers (Ellinger-Ziegelbauer et al., 2004; Table 1) were evaluated for their ability to differentiate genotoxic from nongenotoxic modes of action. Male SD rats were treated for 5 days with examples of (A) known genotoxic hepatocarcinogens, (B) known nongenotoxic hepatocarcinogens, (C) known nongenotoxic hepatocarcinogens at hepatotoxic doses, and (D) NHs with known (dichlorvos) or equivocal (praziquantel) genotoxic liabilities. Treatments were as described in Table 3. Fold induction was calculated by comparing the expression in the treated rats relative to vehicle-matched controls as described in the “Materials and Methods” section.
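For orientation, the relationship between delta-delta Ct values and fold induction can be sketched as below. The sketch assumes the standard comparative Ct calculation with a doubling of product per cycle (100% amplification efficiency) and a single reference gene (peptidylprolyl isomerase A is the normalizer named in the Results); the exact calculation used in the study is the one given in the Materials and Methods. Function names and Ct values are hypothetical.

```python
import math

def delta_delta_ct(ct_target_treated, ct_ref_treated,
                   ct_target_control, ct_ref_control):
    """ddCt = (target - reference) in treated minus (target - reference) in control."""
    d_treated = ct_target_treated - ct_ref_treated
    d_control = ct_target_control - ct_ref_control
    return d_treated - d_control

def fold_change(ddct):
    """Fold induction relative to control, assuming doubling per PCR cycle."""
    return 2.0 ** (-ddct)

def log10_ratio(ddct):
    """The same expression ratio on the log10 scale."""
    return -ddct * math.log10(2.0)

# Hypothetical Ct values: the target amplifies 2 cycles earlier after treatment
ddct = delta_delta_ct(24.0, 20.0, 26.0, 20.0)
fold = fold_change(ddct)
```

A negative delta-delta Ct therefore corresponds to induction, and each cycle of difference corresponds to a twofold change under the stated efficiency assumption.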

Role of Biomarker Genes in Neoplasia

A detailed gene literature survey using the BIOBASE Knowledge Library revealed 10 of 23 genes that were correlated or causally associated with neoplasia or cancer (Supplementary table 2), and 8 of the genes have “Cell growth/cycle/signal transduction” as the primary biological process category. A gene-by-gene characterization, though useful, may miss the possible interconnectivity of the signature genes. Therefore, IPA was used to analyze the 23 genes and generate enrichment scores (statistical significance) for a number of biological categories and canonical pathways as well as for deriving potential network relationships. This analysis revealed that of the top 10 significantly ranked biological categories associated with the 23 signature genes, 7 have an association with cell proliferation and cancer or processes that when dysregulated could theoretically lead to neoplasia (data available upon request to the author).

In order to investigate possible relationships among the 23 genes in the signature, potential transcriptional coregulation was examined. A response element enrichment analysis of the proximal promoter regions of all 23 genes revealed that a number of TFREs were significantly enriched. The four most significantly enriched TFREs relative to the reference set were, in order of significance (all p < 0.05), AP-1, PBX-1, NFKB, and AHR. Although AP-1, NFKB, and AHR are associated with a general response to cellular stress and to xenobiotics, the role of PBX1 (pre-B-cell leukemia homeobox 1) in the liver is unclear. This gene encodes a homeobox family transcription factor initially identified as a proto-oncogene associated with B-cell leukemia. It has been reported to be required for the maintenance of hematopoiesis in the fetal liver and has been implicated in promoting hematopoietic progenitor cell expansion (DiMartino et al., 2001); however, it has not been reported to play a role in hepatocarcinogenesis. The genes harboring a PBX response element include not only the DNA damage–responsive genes Akr7a3, Aldh1a1, Tap1, Cdkn1a, and Ces2 but also genes originally identified in the Iconix (Cited4, Ica2) and J&J (Sel1I) signatures (Supplementary table 2).
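To make the idea of response element enrichment concrete, the sketch below computes an upper-tail hypergeometric p-value, one common way to ask whether a motif occurs in more signature promoters than expected from a reference set. The text above does not state which test was used, so the choice of test, the function name, and every count in the example are assumptions for illustration only. The sketch also assumes that the signature genes are a subset of the reference set.

```python
from math import comb

def hypergeom_enrichment_p(hits_in_set, set_size, hits_in_reference, reference_size):
    """Upper-tail hypergeometric p-value, P(X >= hits_in_set).

    X is the number of promoters carrying the motif when set_size promoters
    are drawn without replacement from a reference of reference_size
    promoters, of which hits_in_reference carry the motif.
    """
    total = comb(reference_size, set_size)
    misses_in_reference = reference_size - hits_in_reference
    max_hits = min(set_size, hits_in_reference)
    favorable = sum(
        comb(hits_in_reference, k) * comb(misses_in_reference, set_size - k)
        for k in range(hits_in_set, max_hits + 1)
    )
    return favorable / total

# Hypothetical counts: 8 of 23 signature promoters carry the motif,
# against 2000 of 20000 promoters in the reference set.
print(hypergeom_enrichment_p(8, 23, 2000, 20000))
```

Because several response elements would be tested at once, a real analysis would normally also correct the resulting p-values for multiple testing.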

DISCUSSION

Our approach to improve human carcinogenicity risk assessment has focused on the development of biomarkers for the early prediction of NGHCs in rats and the simultaneous application of genomics to understand their potential modes of action, in order to enable a proactive human hepatocarcinogenicity risk assessment prior to initiation of the 2-year rodent bioassay. To this end, we have leveraged previously published genomic biomarker discovery efforts (Ellinger-Ziegelbauer et al., 2004; Fielden et al., 2007; Nie et al., 2006) to develop a signature on the TaqMan array card to facilitate prediction of NGHCs using data from short-term repeat dose rat toxicology studies. Together with the diagnostic expression profiles provided by the accompanying data set, the data also facilitate investigations into the potential modes of action.

Numerous efforts have attempted to discover and evaluate novel biomarkers to predict carcinogenicity of nongenotoxic carcinogens (Waters et al., 2010); however, the biomarkers were often derived from relatively small data sets and/or lacked adequate independent testing. As a result, these putative biomarkers may not be widely recognized or applicable outside their laboratory of origin. In response to these limitations, we have focused our efforts on deriving a signature using a large training set of 72 compounds and subsequently evaluating the performance of the signature on over 900 RNA samples representing 169 treatment groups (86 unique compounds, including 4 GHCs) from eight different research sites. This facilitated an estimation of the likely sensitivity and specificity when applied to different treatment protocols and allowed us to understand the strengths and limitations of the signature to help define its boundaries of use.

In general, a predictive model can only be expected to perform well based on the training information. The training set was derived from a homogenous data set that utilized a common rat strain (SD), gender (male), dose-setting criteria (maximum tolerated dose), time point (day 5) and RNA isolation procedure. Although the boundaries of use that are expected to maximize classification accuracy are likely to be defined by the training set, it was important to test these boundaries with an independent and heterogeneous data set as this would reflect real world application. In order to generate composite estimates of classification accuracy, it was convenient to merge compounds into a single score and remove overlapping compounds that were also utilized in the 72 compound training set. This resulted in 169 test treatments, generated with varying study protocols, which reduced to 66 unique predictions that included 15 NGHCs and 51 NHs. Using this method, the sensitivity, or true-positive rate, was 67% and the specificity, or true-negative rate, was 59%. Although the sensitivity may be considered acceptable, we were hampered by the relatively few (15) independent compounds available for testing and so these results should be viewed as preliminary. By contrast, a fair assessment of the true-negative rate was provided by 51 independent NH compounds. Although numerous false positives and negatives were identified, they appeared to be enriched in samples predominantly from the J&J and NTP data sets. This may not be surprising as the protocols used by these two sites differed dramatically from the training set. For example, the sensitivity and specificity of the signature against the J&J compounds alone were 38 and 61%, respectively, and a high number of false positives were observed when testing compounds in the NTP data set. 
The AUC for the J&J compounds is at random chance with a value of 0.49, but it is based on only eight positive samples, and the prevalence of NGHC compounds differs considerably between the J&J and non-J&J data sets. In the end, the overall sensitivity estimate is driven by the non-J&J compounds whereas the specificity estimate is driven by the J&J compounds. The reasons for the incorrect classifications in the J&J and NTP data sets are possibly numerous. For example, the false positives in the J&J data set may be a result of testing samples obtained only 24 h after a single high dose, as detailed in Nie et al. (2006). The acute transcriptional response after the first dose is expected to be highly variable and may include compensatory changes that do not uniquely reflect the predictive changes that persist during the course of repeated daily doses, as represented in the training set. Given that the training set is based on maximum doses that are tolerated for up to 5 days (see Fielden et al., 2007), it is likely that the optimal signature performance for this particular model would be obtained when following a similar dosing paradigm. This is exemplified by the 3/3 correct predictions from samples evaluated at Abbott, where compounds were dosed for 5 or 6 days in SD rats. However, it is important to consider that the use of a maximally tolerated dose in the training set may be a detriment when the signature is applied to samples that have not achieved such a dose level.
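The composite sensitivity and specificity discussed above can be reproduced with a short calculation. The Python sketch below computes each proportion with an exact (Clopper-Pearson) 95% confidence interval. Two things are assumptions: the counts of 10 of 15 NGHCs and 30 of 51 NHs correctly classified are inferred from the rounded 67% and 59% and are not stated in the text, and the interval method is chosen here for illustration because the text does not name the method that was used.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def _bisect_decreasing(f, target, iterations=100):
    """Solve f(p) = target on [0, 1] for a function f that decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies at a larger p
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for the proportion k/n."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _bisect_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _bisect_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper

# Counts inferred from the rounded percentages (assumption, see lead-in).
for name, correct, total in [("sensitivity", 10, 15), ("specificity", 30, 51)]:
    low, high = clopper_pearson(correct, total)
    print(f"{name}: {correct / total:.0%} (95% CI {low:.0%} to {high:.0%})")
```

The width of the sensitivity interval shows why an estimate based on only 15 independent NGHCs should be treated as preliminary.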

The reason for the high number of false positives in the NTP data set is unclear, but we cannot rule out the possibility that it is due to the use of male F344 rats or the inclusion of primarily nontherapeutic chemicals that may have unique modes of action that are difficult to classify with the current model. This latter hypothesis seems unlikely, however, given the large training set that includes nontherapeutic compounds of diverse modes of action. Differences in RNA isolation procedures may also have affected the results. Additionally, it does not appear that false positives are generated by hepatotoxic treatments, based on the differentiation of signature scores in the methapyrilene treatment groups and the results of other hepatotoxic treatments in the data set. One must also consider the score produced by the final model, as it is likely that performance improves if one considers results that are farther away from the threshold that distinguishes NGHC and NH compounds (see Supplementary figure 7). These data show that samples generated by protocols distinct from that of the training set can result in poor signature performance and reinforce the concept that classification accuracy is likely to be optimal when samples are generated using protocols most similar to that of the training set. The results from both the evaluation study and the independent data set reinforce the protocol established by the Iconix training set as constituting the optimal boundaries of use. Based on these findings, a recommended protocol would include repeat daily dosing in male SD rats for ~5 days to generate data most comparable to the training set and maximize the potential benefit of this predictive assay. The number of animals per treatment group is recommended to be at least three, although it is recognized that more biological replication for test samples should improve the overall precision of the prediction. 
The use of only two or three animals per group in the training set is unlikely to have negatively affected the performance of the signature because the initial model evaluation exercise resulted in a robust area under the ROC curve of 0.84. However, any increased precision afforded by more replication may improve the confidence in the predictions, particularly those close to the classification threshold.
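The point that predictions far from the classification threshold are likely to be more reliable can be examined with a simple calculation. The Python sketch below reports accuracy only for samples whose score lies at least a chosen margin away from the threshold, together with the fraction of samples retained. The function name, the threshold of 0, and the scores and labels are hypothetical and are not taken from the signature or its data set.

```python
def accuracy_outside_margin(scores, labels, threshold=0.0, margin=0.0):
    """Accuracy restricted to samples at least `margin` away from `threshold`.

    scores: signature scores, where a score above the threshold is a positive call.
    labels: True for a known positive, False for a known negative.
    Returns (accuracy, fraction_of_samples_retained); accuracy is None when
    no sample lies outside the margin.
    """
    kept = [(s, y) for s, y in zip(scores, labels) if abs(s - threshold) >= margin]
    if not kept:
        return None, 0.0
    correct = sum((s > threshold) == y for s, y in kept)
    return correct / len(kept), len(kept) / len(scores)

# Hypothetical scores: the two calls nearest the threshold are wrong.
scores = [-2.0, -0.1, 0.2, 1.5]
labels = [False, True, False, True]
for margin in (0.0, 1.0):
    print(margin, accuracy_outside_margin(scores, labels, margin=margin))
```

The trade-off is that a wider margin leaves more samples without a call, so both numbers need to be reported together.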

Comparison of the qPCR–based model to the microarray-based model from previous publications (Fielden et al., 2007; 2008) showed that performance is largely preserved across platforms. The most instructive comparison is the similarity of AUC estimates derived on the test data from the process evaluation step. In that step, the AUCs derived on the Iconix test data from the microarray and qRT-PCR–based classifiers are 0.89 and 0.84, respectively. We compared the microarray and qRT-PCR–based models using 45 compounds in the independent signature evaluation step. In that context, the AUC estimates of the microarray and qRT-PCR–based models were 0.65 and 0.66, respectively. Although the scores are less correlated with a Pearson correlation coefficient of 0.5, the classification calls per model have ~73% overall agreement. This suggests that the two models have similar performance characteristics (see Supplementary Results for more information).
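The three quantities used in this comparison (AUC, the Pearson correlation between model scores, and the agreement of classification calls) can each be computed in a few lines. The Python sketch below is an illustration only: the function names, the threshold of 0 for making a call, and the example scores are hypothetical, and the AUC is computed from its rank interpretation, which may differ from the software used in the study.

```python
from math import sqrt

def auc(pos_scores, neg_scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def call_agreement(x, y, threshold=0.0):
    """Fraction of samples given the same call (score > threshold) by both models."""
    same = sum((a > threshold) == (b > threshold) for a, b in zip(x, y))
    return same / len(x)

# Hypothetical scores for the same six compounds under two models.
array_scores = [1.2, 0.4, -0.3, -1.1, 0.8, -0.6]
qpcr_scores = [0.9, -0.2, 0.1, -0.7, 1.4, -0.9]
is_positive = [True, True, False, False, True, False]

for name, scores in [("microarray", array_scores), ("qPCR", qpcr_scores)]:
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    print(name, "AUC:", auc(pos, neg))
print("Pearson r:", pearson(array_scores, qpcr_scores))
print("call agreement:", call_agreement(array_scores, qpcr_scores))
```

Two models can reach similar AUC values while their scores are only moderately correlated, which is why the correlation and the call agreement are reported separately from the AUC.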

The available test samples did not permit an appropriate evaluation of the effect of gender or strain on signature performance; however, it is plausible that pharmacological mechanisms linked to tumor induction in the rat (e.g., PPARα induction) will be adequately maintained across genders and strains of rat. Where these variables impact pharmacokinetics, a quantitative or qualitative difference in the gene expression profile and signature outcome may be anticipated, although we have not formally tested this possibility. In one case, N-vinylpyrrolidone-2 was evaluated at the same dose in both male and female rats and the signature scores were highly similar. It is noteworthy that a number of compounds found to induce liver tumors in female rats only were positively identified by the signature despite using male expression data (e.g., carbon tetrachloride). The inclusion of expression data in the training set from male rats treated with female-specific hepatocarcinogens, such as diethylstilbestrol, 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD), or chloroform, may have helped in this regard and increased the sensitivity of the assay to detect potential NGHCs. The effect of gender or strain on signature performance should still be considered unknown and suitably accounted for in the interpretation of any testing.

In addition to evaluating the predictive accuracy of the signature, we demonstrated how the genomic data and similarities in expression profiles could generate hypotheses for potential modes of action for NGHCs. Numerous examples exist that demonstrate the utility of toxicogenomic data to help understand mechanisms of carcinogenesis (Waters et al., 2010); however, it was uncertain whether an expression profile of only 22 predictive genes would be sufficient to reveal information indicative of a compound’s mode of action. It was surprising then that the expression profiles of well-characterized NGHCs with similar modes of action maintained a high degree of similarity. For example, the two genotoxic carcinogens aflatoxin B1 and diethylnitrosamine were found to cluster most closely with each other even when clustered among NGHCs. This raises the possibility that compounds with genotoxic activity may be identified through clustering of expression profiles, in addition to specifically evaluating the induction of the DNA damage–responsive genes included on the TaqMan array. Although the interpretation of clustering patterns can be subjective, it nonetheless provides valuable clues to guide more definitive investigative work that can help explain the mode of action of a novel compound. More extensive evaluations using whole genome arrays can provide more data; however, this could make interpretation more difficult because a database of known reference expression profiles from which comparisons can be drawn would not be as readily available as it is with the TaqMan array data set described here.

To add weight of evidence for use of the signature as both a predictive and mechanistic tool, it was of interest to understand whether the 22 biomarker genes had a functional role in carcinogenesis, proliferation, and/or related phenotypes. Although the gene members of an algorithm-derived classifier are selected based on performance optimization and assay design, there is an underlying assumption that their classifying power depends on a not always obvious, but nonetheless real, connection to the underlying biology associated with the predicted phenotype. Failure to identify any known functional connection may cast doubt on the validity of the signature, although it is recognized that our knowledge of carcinogenesis and gene function is incomplete. Nonetheless, the combination of literature mining, pathway analysis, and transcription factor binding site analysis together provided support for a linkage between these genes and cellular processes associated with cell proliferation, growth regulation, injury repair, and cancer, all of which when dysregulated could lead to carcinogenesis. The possibility that compounds that induce liver tumors via a nongenotoxic mechanism may be eliciting a common transcriptional regulatory response, such as the possible activation of the homeobox transcription factor Pbx1, is an intriguing one that warrants further investigation. In any case, an incomplete understanding of the biology underlying the genes in the signature should not prohibit practical application of this tool.

By comparison to other approaches for predicting NGHCs, the predictive accuracy of the signature is greater than or comparable to that of histology-based endpoints that have been proposed and evaluated for their ability to predict carcinogenic outcome (Allen et al., 2004; Elcombe et al., 2002; Ito et al., 2003). The advantage of the current genomic approach is the ability to facilitate early and more efficient evaluation of molecules because it relies on a short-term repeat dose rat toxicity study rather than histological indices following chronic treatment (Allen et al., 2004; Elcombe et al., 2002) or a laborious initiation, partial hepatectomy, and promotion phase of treatment (Ito et al., 2003). This approach also provides a means to generate mechanistic information that other proposed predictive methods fail to provide (Contrera et al., 2003; Lee et al., 1995; Mauthe et al., 2001).

The challenge with evaluating this or other methods intended to predict carcinogenic outcome is the reliance on the rodent bioassay as the gold standard against which accuracy is defined. Due to the variable nature of the bioassay itself and the influences of dose, route of administration, strain, gender, and/or other experimental variables known to influence the outcome of the bioassay, the determined accuracy of the signature is subject to not only the intrinsic variation in the genomic assay but also the variation in the benchmark against which the signature is measured. As a result, we cannot discount the possibility that false positives reported by the signature are true signals or mechanistic events relevant to proliferative potential that did not happen to materialize into a phenotypic effect in the rodent bioassay due to differences in the aforementioned variables between the assays. Likewise, false negatives may arise when low doses or early time points are evaluated that do not produce drug exposures or cumulative effects sufficient to perturb the biomarker genes. Therefore, the sensitivity and specificity of the signature reported here are composite estimates and should be used as a guide rather than an absolute measure of performance. The same could be said of other attempts to derive signatures predictive of hepatocarcinogenic activity (Ellinger-Ziegelbauer et al., 2008; Uehara et al., 2008), which notably have not been reported to be 100% accurate either. Perhaps training and test sets composed of samples from longer term treatments would result in gene expression changes that are more prognostic of chronic lifetime changes in carcinogenic outcome, although this would limit the value of obtaining early predictions and mechanistic data as presented here. In practice, each compound should be evaluated individually in light of its dose-response, concurrent pathology, genotoxic potential, and any mechanistic data available.

A comparison of the approach presented here with alternative assays or approaches designed to predict nongenotoxic hepatocarcinogens reveals dramatic differences in the relative sensitivity and specificity for prediction, utility for screening, and the degree of mechanistic data provided. For example, the Ito Medium Term bioassay has a higher accuracy for prediction (92%) (Ito et al., 2003); however, it does not afford much mechanistic information or provide a means to rapidly screen compounds. Other methods relying on histological endpoints from chronic studies suffer from poor accuracy and low throughput and do not provide mechanistic insight (Allen et al., 2004; Elcombe et al., 2002). More recent efforts utilizing a similar genomic approach have reported favorable prediction accuracy (Ellinger-Ziegelbauer et al., 2008; Uehara et al., 2008); however, the reported performance should be viewed with caution because validation on a wider set of diverse samples has not been reported and the use of smaller training and test sets will increase the likelihood of bias in the performance estimates. Therefore, we believe the currently proposed assay system offers the advantages of reasonable predictive accuracy, moderate throughput, and a means to begin to understand mode of action.

Previous studies have illustrated the application of gene expression dose-response data to establish benchmark dose values for nongenotoxic carcinogens in order to determine a threshold, or point of departure, for risk assessment (Bercu et al., 2010; Thomas et al., 2007, 2010). These approaches utilized the dose-response of genes aggregated in pathways and Gene Ontology processes and assume that changes in these groups of genes are key events in the mode of action for these carcinogens. In a similar manner, the genes in the current signature could be used to establish benchmark doses from short-term dose-response studies to estimate points of departure for nongenotoxic events driving hepatocarcinogenicity. Further evaluation would be needed to assess this possibility. Therefore, the outcome of this predictive assay should be viewed solely as a hazard identification tool. In this context of use, it is advantageous to consider compounds in the training set that cause liver tumors in any strain, gender, or dose in order to increase the sensitivity of the assay. False positives could be better tolerated in a predictive tool because the outcome would not necessarily limit development of a positive compound. Instead, a positive result would initiate investigations or development strategies to build a weight of evidence for carcinogenic risk and understand the potential modes of action before obtaining results from the 2-year rodent bioassay. Considering the frequency with which liver weight elevation and hepatocellular hypertrophy are observed in preclinical drug discovery, this approach may enable a rapid understanding of the potential mechanism(s) and relevance of the finding for humans. In addition to prospective applications in drug discovery and development, the signature would also be of use retrospectively when tumors or preneoplastic lesions are observed in chronic toxicology studies and a mechanistic understanding is needed to inform the risk to humans. 
Additionally, it would be useful to differentiate and prioritize molecules when structurally related chemicals have been identified as having a hepatocarcinogenic risk.

Hepatic adenomas and carcinomas are the most frequent neoplastic lesions in the 2-year rodent bioassay (Gold et al., 2005); however, a broad range of tumor types is observed. Unfortunately, methods to predict carcinogenicity in tissues outside the liver remain limited, although genomic approaches have shown promise for the prediction of lung carcinogens (Thomas et al., 2009). Previous results, albeit limited, have suggested that hepatic gene expression data may be predictive of carcinogenic potential in extrahepatic tissues (Nie et al., 2006). Although the biological rationale for how hepatic expression could predict carcinogenic outcome in other tissues is currently unclear, this possibility was intriguing because it would significantly expand the utility of the current genomic signature. As the current data set included a number of nongenotoxic carcinogens that caused tumors in tissues outside the liver, we applied the current signature to these compounds to test this hypothesis. The results, however, indicate that the current model trained to detect hepatocarcinogens is unable to accurately predict carcinogens in other tissues (data not shown). Alternative models trained specifically on nongenotoxic carcinogens from Table 2, regardless of target tissue, also failed to appreciably predict carcinogens from the independent data set (data not shown). It is likely that alternative approaches will be needed to identify extrahepatic carcinogens.

In summary, we have developed and extensively evaluated a hepatic gene expression-based signature for NGHCs on a moderate-throughput, cost-effective, and well-validated TaqMan array platform using a training set derived from short-term rat toxicology studies and tested on a large heterogeneous test set. These results, in conjunction with previous publications demonstrating the predictive and mechanistic utility of the genes (Fielden et al., 2007, 2008; Nie et al., 2006), add to the weight of evidence demonstrating the practical application of genomic biomarkers for use in the assessment of potential hepatocarcinogens. The classification results on a large heterogeneous data set underscore the influence of protocol on the boundaries of use for the signature and the importance of utilizing samples that most closely follow the protocol established by the training set. Dissemination of the underlying expression data and commercial availability of the TaqMan array assay described here should facilitate further evaluation of this research tool.

Supplementary Material

Supplementary results
Supplementary table 1
Supplementary data file 004
Supplementary materials and methods

SUPPLEMENTARY DATA

Supplementary data are available online at http://toxsci.oxfordjournals.org/.

ACKNOWLEDGMENTS

The authors would like to acknowledge Iconix Biosciences (now Entelos) for donating the liver RNA samples; Asuragen Services for performing experiments; Applied Biosystems (Life Technologies) for providing custom TaqMan arrays; Deepa Eveleigh, Sandi Calhoun, Michael McMillian, Joanne Tran, Rong Hu, Marnie Higgins-Garn, Rita Ciurlionis, and Olimpia Disorbo for laboratory support; Cassandra Mtine, Lindsay Lehman, Phil Rossi, and Elizabeth Walker for administrative support; and many others within the Predictive Safety Testing Consortium for constructive feedback and encouragement. This article may be the work product of an employee or group of employees of the National Institute of Environmental Health Sciences (NIEHS), National Institutes of Health (NIH); however, the statements, opinions, or conclusions contained therein do not necessarily represent the statements, opinions, or conclusions of NIEHS, NIH, or the U.S. government. J.S. is an employee of Life Technologies, a company that sells the TaqMan array. A.A. and A.K. are employees of Asuragen, a company that offers gene expression and TaqMan array services.

FUNDING

Member contributions to the Predictive Safety Testing Consortium of the Critical Path Institute.

REFERENCES

  1. Allen DG, Pearse G, Haseman JK, and Maronpot RR (2004). Prediction of rodent carcinogenesis: an evaluation of prechronic liver lesions as forecasters of liver tumors in NTP carcinogenicity studies. Toxicol. Pathol 32, 393–401.
  2. Auerbach SS, Shah RR, Mav D, Smith CS, Walker NJ, Vallant MK, Boorman GA, and Irwin RD (2010). Predicting the hepatocarcinogenic potential of alkenylbenzene flavoring agents using toxicogenomics and machine learning. Toxicol. Appl. Pharmacol 243, 300–314.
  3. Bercu JP, Jolly RA, Flagella KM, Baker TK, Romero P, and Stevens JL (2010). Toxicogenomics and cancer risk assessment: a framework for key event analysis and dose-response assessment for nongenotoxic carcinogens. Regul. Toxicol. Pharmacol 58, 369–381.
  4. Booth ED, Jones E, and Elliott BM (2007). Review of the in vitro and in vivo genotoxicity of dichlorvos. Regul. Toxicol. Pharmacol 49, 316–326.
  5. Brambilla G, and Martelli A (2009). Update on genotoxicity and carcinogenicity testing of 472 marketed pharmaceuticals. Mutat. Res 681, 209–229.
  6. Chen HH, Chen TW, and Lin H (2010). Pravastatin attenuates carboplatin-induced nephrotoxicity in rodents via peroxisome proliferator-activated receptor alpha-regulated heme oxygenase-1. Mol. Pharmacol 78, 36–45.
  7. Christensen FM, Eisenreich SJ, Rasmussen K, Sintes JR, Sokull-Kluettgen B, and Van de Plassche EJ (2011). European experience in chemicals management: integrating science into policy. Environ. Sci. Technol 45, 80–89.
  8. Cohen SM (2004). Human carcinogenic risk evaluation: an alternative approach to the two-year rodent bioassay. Toxicol. Sci 80, 225–259.
  9. Cohen SM (2010). Evaluation of possible carcinogenic risk to humans based on liver tumors in rodent assays: the two-year bioassay is no longer necessary. Toxicol. Pathol 38, 487–501.
  10. Contrera JF, Matthews EJ, and Daniel Benz R (2003). Predicting the carcinogenic potential of pharmaceuticals in rodents using molecular structural similarity and E-state indices. Regul. Toxicol. Pharmacol 38, 243–259.
  11. Davies TS, and Monro A (1995). Marketed human pharmaceuticals reported to be tumorigenic in rodents. J. Amer. Coll. Toxicol 14, 90–107.
  12. DiMartino JF, Selleri L, Traver D, Firpo MT, Rhee J, Warnke R, O’Gorman S, Weissman IL, and Cleary ML (2001). The Hox cofactor and proto-oncogene Pbx1 is required for maintenance of definitive hematopoiesis in the fetal liver. Blood 98, 618–626.
  13. Elcombe CR, Odum J, Foster JR, Stone S, Hasmall S, Soames AR, Kimber I, and Ashby J (2002). Prediction of rodent nongenotoxic carcinogenesis: evaluation of biochemical and tissue changes in rodents following exposure to nine nongenotoxic NTP carcinogens. Environ. Health Perspect 110, 363–375.
  14. Ellinger-Ziegelbauer H, Gmuender H, Bandenburg A, and Ahr HJ (2008). Prediction of a carcinogenic potential of rat hepatocarcinogens using toxicogenomics analysis of short-term in vivo studies. Mutat. Res 637, 23–39.
  15. Ellinger-Ziegelbauer H, Stuart B, Wahle B, Bomann W, and Ahr HJ (2004). Characteristic expression profiles induced by genotoxic carcinogens in rat liver. Toxicol. Sci 77, 19–34.
  16. Fielden MR, Brennan R, and Gollub J (2007). A gene expression biomarker provides early prediction and mechanistic assessment of hepatic tumor induction by nongenotoxic chemicals. Toxicol. Sci 99, 90–100.
  17. Fielden MR, Nie A, McMillian M, Elangbam CS, Trela BA, Yang Y, Dunn RT II., Dragan Y, Fransson-Stehen R, Bogdanffy M, et al. (2008). Interlaboratory evaluation of genomic signatures for predicting carcinogenicity in the rat. Toxicol. Sci 103, 28–34.
  18. Gold LS, Manley NB, Slone TH, Rohrbach L, and Garfinkel GB (2005). Supplement to the Carcinogenic Potency Database (CPDB): results of animal bioassays published in the general literature through 1997 and by the National Toxicology Program in 1997–1998. Toxicol. Sci 85, 747–808.
  19. Haseman JK, Huff JE, Zeiger E, and McConnell EE (1987). Comparative results of 327 chemical carcinogenicity studies. Environ. Health Perspect 74, 229–235.
  20. Ito N, Tamano S, and Shirai T (2003). A medium-term rat liver bioassay for rapid in vivo detection of carcinogenic potential of chemicals. Cancer Sci 94, 3–8.
  21. Jacobs A (2005). Prediction of 2-year carcinogenicity study results for pharmaceutical products: how are we doing? Toxicol. Sci 88, 18–23.
  22. Jacobs A, and Jacobson-Kram D (2004). Human carcinogenic risk evaluation, part III: assessing cancer hazard and risk in human drug development. Toxicol. Sci 81, 260–262.
  23. Kirkland D, Aardema M, Henderson L, and Muller L (2005). Evaluation of the ability of a battery of three in vitro genotoxicity tests to discriminate rodent carcinogens and non-carcinogens I. Sensitivity, specificity and relative predictivity. Mutat. Res 584, 1–256.
  24. Kitchin KT, Brown JL, and Kulkarni AP (1993). Predicting rodent carcinogenicity of halogenated hydrocarbons by in vivo biochemical parameters. Teratog. Carcinog. Mutagen 13, 167–184.
  25. Kitchin KT, Brown JL, and Kulkarni AP (1994). Complementarity of genotoxic and nongenotoxic predictors of rodent carcinogenicity. Teratog. Carcinog. Mutagen 14, 83–100.
  26. Lee Y, Buchanan BG, Mattison DM, Klopman G, and Rosenkranz HS (1995). Learning rules to predict rodent carcinogenicity of nongenotoxic chemicals. Mutat. Res 328, 127–149.
  27. Maronpot RR, Flake G, and Huff J (2004). Relevance of animal carcinogenesis findings to human cancer predictions and prevention. Toxicol. Pathol 32(Suppl. 1), 40–48.
  28. Mauthe RJ, Gibson DP, Bunch RT, and Custer L (2001). The syrian hamster embryo (SHE) cell transformation assay: review of the methods and results. Toxicol. Pathol 29(Suppl.), 138–146.
  29. Melnick RL, Thayer KA, and Bucher JR (2008). Conflicting views on chemical carcinogenesis arising from the design and evaluation of rodent carcinogenicity studies. Environ. Health Perspect 116, 130–135.
  30. Michel C, Roberts RA, Desdouets C, Isaacs KR, and Boitier E (2005). Characterization of an acute molecular marker of nongenotoxic rodent hepatocarcinogenesis by gene expression profiling in a long term clofibric acid study. Chem. Res. Toxicol 18, 611–618.
  31. Montero R, and Ostrosky P (1997). Genotoxic activity of praziquantel. Mutat. Res 387, 123–139.
  32. Nie AY, McMillian M, Parker JB, Leone A, Bryant S, Yieh L, Bittner A, Nelson J, Carmen A, Wan J, et al. (2006). Predictive toxicogenomics approaches reveal underlying molecular mechanisms of nongenotoxic carcinogenicity. Mol. Carcinog 45, 914–933.
  33. Nioi P, Pardo ID, Sherratt PJ, Fielden MR, Gollub J, Nie A, and Snyder RD (2008). Prediction of non-genotoxic carcinogenesis in rats using changes in gene expression following acute dosing. Chem. Biol. Interact 176, 252–260.
  34. Tatematsu M, Tsuda H, Shirai T, Masui T, and Ito N (1987). Placental glutathione S-transferase (GST-P) as a new marker for hepatocarcinogenesis: in vivo short-term screening for hepatocarcinogens. Toxicol. Pathol 15, 60–68.
  35. Thomas RS, Allen BC, Nong A, Yang L, Bermudez E, Clewell HJ III., and Andersen ME (2007). A method to integrate benchmark dose estimates with genomic data to assess the functional effects of chemical exposure. Toxicol. Sci 98, 240–248.
  36. Thomas RS, Bao W, Chu TM, Bessarabova M, Nikolskaya T, Nikolsky Y, Andersen ME, and Wolfinger RD (2009). Use of short-term transcriptional profiles to assess the long-term cancer-related safety of environmental and industrial chemicals. Toxicol. Sci 112, 311–321.
  37. Thomas RS, Clewell HJ III., Allen BC, Wesselkamper SC, Wang NC, Lambert JC, Hess-Wilson JK, Zhao QJ, and Andersen ME (2010). Application of transcriptional benchmark dose values in quantitative cancer and noncancer risk assessment. Toxicol. Sci 120, 194–205.
  38. Uehara T, Hirode M, Ono A, Kiyosawa N, Omura K, Shimizu T, Mizukawa Y, Miyagishima T, Nagao T, and Urushidani T (2008). A toxicogenomics approach for early assessment of potential nongenotoxic hepatocarcinogenicity of chemicals in rats. Toxicology 250, 15–26.
  39. Vanparys P, Corvi R, Aardema M, Gribaldo L, Hayashi M, Hoffmann S, and Schechtman L (2011). ECVAM prevalidation of three cell transformation assays. ALTEX 28, 56–59.
  40. Waites CR, Dominick MA, Sanderson TP, and Schilling BE (2007). Nonclinical safety evaluation of muraglitazar, a novel PPARalpha/gamma agonist. Toxicol. Sci 100, 248–258.
  41. Ward JM (2008). Value of rodent carcinogenesis bioassays. Toxicol. Appl. Pharmacol 226, 212. [DOI] [PubMed] [Google Scholar]
  42. Waters MD, Jackson M, and Lea I (2010). Characterizing and predicting carcinogenicity and mode of action using conventional and toxicogenomics methods. Mutat. Res 705, 184–200. [DOI] [PubMed] [Google Scholar]
  43. Whysner J, and Williams GM (1996a). D-limonene mechanistic data and risk assessment: absolute species-specific cytotoxicity, enhanced cell proliferation, and tumor promotion. Pharmacol. Ther 71, 127–136. [DOI] [PubMed] [Google Scholar]
  44. Whysner J, and Williams GM (1996b). Saccharin mechanistic data and risk assessment: urine composition, enhanced cell proliferation, and tumor promotion. Pharmacol. Ther 71, 225–252. [DOI] [PubMed] [Google Scholar]
  45. Yamasaki H, Ashby J, Bignami M, Jongen W, Linnainmaa K, Newbold RF, Nguyen-Ba G, Parodi S, Rivedal E, Schiffmann D, et al. (1996). Nongenotoxic carcinogens: development of detection methods based on mechanisms: a European project. Mutat. Res 353, 47–63. [DOI] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Supplementary results
Supplementary table 1
Supplementary data file 004
Supplementary materials and methods