JNCI Journal of the National Cancer Institute. 2015 Aug 21;107(11):djv247. doi: 10.1093/jnci/djv247

Predicting Response to Histone Deacetylase Inhibitors Using High-Throughput Genomics

Paul Geeleher 1, Andrey Loboda 1, Divya Lenkala 1, Fan Wang 1, Bonnie LaCroix 1, Sanja Karovic 1, Jacqueline Wang 1, Michael Nebozhyn 1, Michael Chisamore 1, James Hardwick 1, Michael L Maitland 1, R Stephanie Huang 1
PMCID: PMC4643634  PMID: 26296641

Abstract

Background:

Many disparate biomarkers have been proposed as predictors of response to histone deacetylase inhibitors (HDI); however, all have failed when applied clinically. Rather than this being entirely an issue of reproducibility, response to the HDI vorinostat may be determined by the additive effect of multiple molecular factors, many of which have previously been demonstrated.

Methods:

We conducted a large-scale gene expression analysis using the Cancer Genome Project for discovery and generated another large independent cancer cell line dataset across different cancers for validation. We compared different approaches in terms of how accurately vorinostat response can be predicted on an independent out-of-batch set of samples and applied the polygenic marker prediction principles in a clinical trial.

Results:

Using machine learning, the small effects that aggregate, resulting in sensitivity or resistance, can be recovered from gene expression data in a large panel of cancer cell lines.

This approach can predict vorinostat response accurately, whereas single gene or pathway markers cannot. Our analyses recapitulated and contextualized many previous findings and suggest an important role for processes such as chromatin remodeling, autophagy, and apoptosis. As a proof of concept, we also discovered a novel causative role for CHD4, a helicase involved in the histone deacetylase complex that is associated with poor clinical outcome. As a clinical validation, we demonstrated that a common dose-limiting toxicity of vorinostat, thrombocytopenia, can be predicted (r = 0.55, P = .004) several days before it is detected clinically.

Conclusion:

Our work suggests a paradigm shift from single-gene/pathway evaluation to simultaneously evaluating multiple independent high-throughput gene expression datasets, which can be easily extended to other investigational compounds where similar issues are hampering clinical adoption.


There has been a recent resurgence of interest in broadly targeted cancer agents, including drugs that target fundamental cellular processes, such as histone deacetylase inhibitors (HDI) and proteasome inhibitors. These drugs indirectly affect many pathways and processes, impeding the evolution of cancer cell resistance to therapy (1). Currently, three HDIs, vorinostat (also known as suberanilohydroxamic acid [SAHA] or Zolinza), belinostat, and romidepsin, have been approved by the US Food and Drug Administration (FDA) for treatment of refractory cutaneous T-cell lymphoma (CTCL), with several other HDIs in development (2). Despite the positive findings in preclinical data and in CTCL, patient response to HDIs in other types of cancer has been highly variable and thus the efficacy has been insufficient to support other clinical indications. A better understanding of the mechanisms of vorinostat therapeutic activity and resistance could lead to more effective use of HDIs and may help predict who will benefit from this therapy in a broader context.

Indeed, a very large number of biomarkers have been proposed (3). Suggested resistance mechanisms include DNA repair (4,5), changes in apoptosis pathways (6,7), activation of NF-κB (8,9), overexpression of STAT signaling pathway genes (10,11), retinoic acid signaling (12,13), checkpoint kinase activation (14), reactive oxygen species production (15), and endoplasmic reticulum stress signaling (16). Furthermore, the mechanisms by which treatment with HDIs causes tumor cell death are not yet fully understood (17). Cell death has been associated with changes in gene expression, for example, the reinitiation of aberrantly silenced tumor suppressor and apoptosis genes (18,19), but many other mechanisms have also been proposed (20).

These diverse observations and the failure of any single marker in predicting clinical HDI effects might suggest reproducibility issues. An alternative hypothesis is that the relationship between the molecular mechanisms and drug sensitivity is highly polygenic; thus many of the previously observed associations might indeed be correct, but of an effect size that is not clinically useful as a single gene/pathway marker. If this is the case, the current literature is inadequate to determine whether the reported associations are clinically relevant, to assess their relative importance, or to gauge suitability as biomarkers across a range of different cancers. Here, we present the results from a large-scale gene expression analysis. One of the largest public cancer cell line screening projects (the Cancer Genome Project [CGP] [21]) was used for discovery, while we generated another large independent cancer cell line dataset across different cancers and used it for validation and to rigorously compare different approaches in terms of how accurately vorinostat response can be predicted on an independent out-of-batch set of samples. Furthermore, we applied the polygenic marker prediction principles in a clinical trial and showed that change in expression in whole blood following treatment is effective at predicting thrombocytopenia, a common dose-limiting toxicity for vorinostat.

Methods

Cell Line Screening

The cell line data used for discovery were obtained from the Cancer Genome Project (CGP). Vorinostat sensitivity data were obtained from the CGP website (www.cancerrxgene.org), which includes IC50 values evaluated in 663 cancer cell lines. The CGP expression data were obtained from ArrayExpress (accession number E-MTAB-783) and processed using robust multiarray average (RMA) (8), summarizing probes using the BrainArray (9) Entrez Gene remapped CDF file. Further details are available in the Supplementary Materials (available online).

Gene Set Enrichment Analysis

We assessed enrichment of gene ontology (GO) terms, KEGG pathways and transcription factor binding motifs using GSEA (12). Gene sets were obtained from version 4 of Molecular Signatures Database (MSigDB). When assessing enrichment of vorinostat sensitivity associated gene sets, we first removed the potential confounding effects of cancer type from the expression data by fitting a linear model for the expression of each gene against cancer type and used the residual of this model as the expression estimates in GSEA. For calculating enrichment scores, genes were ranked using Pearson’s correlation with vorinostat IC50 and P values were estimated using sample label permutations.
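
The adjustment and ranking step described above can be sketched as follows. This is a minimal illustration in Python, not the R code used in the study; the function names and the toy data are hypothetical. It relies on the fact that, for a single categorical covariate such as cancer type, the residual of a linear model is simply each value minus its group mean.

```python
from collections import defaultdict
from math import sqrt

def residualize_by_group(values, groups):
    """Residuals of a one-factor linear model: value minus its group mean."""
    sums, counts = defaultdict(float), defaultdict(int)
    for value, group in zip(values, groups):
        sums[group] += value
        counts[group] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def rank_genes(expression, cancer_type, ic50):
    """Rank genes by correlation of type-adjusted expression with IC50."""
    scores = {gene: pearson(residualize_by_group(values, cancer_type), ic50)
              for gene, values in expression.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy data: six cell lines from two cancer types.
cancer_type = ["lung", "lung", "lung", "colon", "colon", "colon"]
ic50 = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
expression = {
    "geneA": [5.0, 6.0, 7.0, 15.0, 16.0, 17.0],  # tracks IC50 within each type
    "geneB": [7.0, 6.0, 5.0, 17.0, 16.0, 15.0],  # opposes IC50 within each type
    "geneC": [5.0, 5.0, 5.0, 15.0, 15.0, 15.0],  # differs only by cancer type
}
ranked = rank_genes(expression, cancer_type, ic50)
print(ranked)
```

In this toy example geneC varies only between cancer types, so its adjusted expression carries no signal and it ranks between the positively and negatively associated genes. The ranked list is the kind of input a GSEA-style enrichment statistic would then be computed on.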

Predicting Drug Sensitivity in Cell Lines Using Lasso Model

The discovery and validation expression datasets were standardized using the ComBat() function from the SVA (13) library in R. A lasso regression model was fit in CGP with vorinostat IC50 as the response and the expression of every gene as predictors, with an interaction term included for each cancer type (encoded as a factor). This yields 137 920 total possible predictors, allowing an effect for every gene across all cancer types and within each cancer type. Of these, 78 genes were retained in the optimal model, which was chosen to minimize mean-squared error in 20-fold cross-validation using the cv.glmnet() function from the R package glmnet (14). The model was fit on 410 CGP cell lines and tested on the 171 cell lines of our validation set. Cell lines present in both sets were removed from the training set, because including the same biological samples in both sets would artificially inflate the estimated prediction accuracy.
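
To make the structure of this model concrete, the sketch below builds a design matrix with one pan-cancer column per gene plus one gene-by-cancer-type interaction column per gene and type, and fits a lasso by cyclic coordinate descent. It is an illustrative Python stand-in for glmnet, not the study's code: the function names and toy data are hypothetical, there is no intercept or standardization, and the exact encoding of the interaction terms used by the authors is an assumption. The objective minimized is (1/(2n))·RSS + λ·‖β‖₁, the form glmnet documents for the Gaussian lasso.

```python
def build_design(expr_rows, cancer_types):
    """One main-effect column per gene, plus gene x cancer-type interaction columns."""
    levels = sorted(set(cancer_types))
    design = []
    for row, ctype in zip(expr_rows, cancer_types):
        features = list(row)                       # pan-cancer effects
        for level in levels:                       # cancer-type-specific effects
            indicator = 1.0 if ctype == level else 0.0
            features.extend(value * indicator for value in row)
        design.append(features)
    return design, levels

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Lasso by cyclic coordinate descent (no intercept, no standardization)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)                                # y - X @ beta, with beta = 0
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(c * c for c in col) / n
            if z == 0:
                continue                           # skip all-zero columns
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / z
            delta = new - beta[j]
            if delta:
                resid = [r - c * delta for r, c in zip(resid, col)]
                beta[j] = new
    return beta

# Hypothetical example: 2 cell lines, 3 genes, 2 cancer types -> 3 * (1 + 2) columns.
design, levels = build_design([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], ["lung", "colon"])

# Hypothetical orthogonal toy problem: one strong and one weak predictor.
X = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
y = [2.0, -2.0, 0.1, -0.1]
beta = lasso_cd(X, y, lam=0.1)
print(len(design[0]), beta)
```

With G genes and T cancer types this encoding produces G × (1 + T) candidate predictors. In the toy fit the weak predictor is shrunk exactly to zero while the strong one is retained with a shrunken coefficient, which is the selection behavior that reduces a very large predictor set to a small model.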

Functional Validation of CHD4 Gene

Functional validation of CHD4 gene expression association with vorinostat sensitivity was accomplished in a human colon cancer cell line (HCT-116) and a human breast cancer cell line (CAMA1), obtained from the American Type Culture Collection (ATCC, Manassas, VA). Further details are given in the Supplementary Materials (available online).

Collection of Clinical Trial Data

A total of 34 treatment refractory cancer patients with advanced solid tumors were recruited at the University of Chicago Medical Center. Written, informed consent was obtained from each subject. The study was approved by the University of Chicago institutional review board (IRB) (IRB# 10-652B; ClinicalTrials.gov identifier: NCT01281176). The main goals of the clinical study were to evaluate the pharmacokinetics of different vorinostat dosing schedules and examine their effects on patient toxicities. Patients were assigned to treatment with 400 mg or higher doses (pharmacokinetically equivalent 1200 or 1600 mg) of vorinostat daily from days 1 through 3, followed by a recovery period of four days. They were then switched to either 1600 or 400 mg of vorinostat from days 8 through 10. Platelet counts were measured before vorinostat treatment and ten days after the beginning of the treatment regime. Gene expression was measured before drug treatment and eight hours thereafter on the third day of vorinostat treatment. Based on observations of dose/response effects, we controlled for drug dosing for all subsequent analyses.

Clinical Expression Change Analysis

Gene expression was measured using Illumina Human HT12 BeadArrays. RNA was isolated from whole blood before vorinostat treatment and eight hours thereafter. Microarrays were processed at the University of Chicago Genomics Core Facility. The data were quality assessed, quantile normalized, and corrected for possible batch effects using the ComBat() function from the SVA library in R. The dataset was restricted to genes that were detected above background (at a threshold of P < .05). Expression change between before-exposure and eight-hour timepoints was calculated for each gene in each patient by the difference in log-transformed normalized fluorescence intensity levels for each gene. Platelet count was measured for each patient before any vorinostat treatment and ten days thereafter. The difference in platelet count was calculated for each patient (platelet change) as a proxy for vorinostat-induced thrombocytopenia. The association between platelet change and expression was calculated using the limma library in R. We controlled for potential confounding variables such as age, batch, drug dose, and sex. Twenty-six subjects with all available data were analyzed. We estimated the false discovery rate using the Benjamini and Hochberg method. Prediction accuracy was estimated using leave-one-out cross-validation, whereby a lasso model was fit iteratively on 25 of the 26 available samples and platelet change predicted on the remaining sample. For each of the 26 rounds of cross validation, the lambda penalty parameter of the lasso model was re-estimated using a second round of LOOCV within the 25 sample training set; this model was then applied to the one remaining sample, which was repeated iteratively until a prediction was generated for all 26 samples. This analysis was performed in R, and all lasso models were fit using the glmnet library.
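
The nested cross-validation scheme described above can be sketched as follows. This is a hedged Python illustration, not the study's R/glmnet code: the helper names and toy data are hypothetical, and to keep the example short the model is a single-predictor lasso without intercept, for which the solution has a closed form. The point of the sketch is the control flow: the held-out sample is never used when choosing the penalty.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def fit_one_feature_lasso(X, y, lam):
    """Closed-form lasso coefficient for a single predictor, no intercept."""
    x = [row[0] for row in X]
    n = len(x)
    rho = sum(a * b for a, b in zip(x, y)) / n
    z = sum(a * a for a in x) / n
    return soft_threshold(rho, lam) / z

def predict(beta, row):
    return beta * row[0]

def loocv_error(X, y, lam, fit, predict_fn):
    """Mean squared leave-one-out error for one penalty value."""
    total = 0.0
    for i in range(len(y)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], lam)
        total += (predict_fn(model, X[i]) - y[i]) ** 2
    return total / len(y)

def nested_loocv(X, y, lambdas, fit, predict_fn):
    """Outer LOOCV predictions; lambda is re-tuned inside each training set."""
    predictions = []
    for i in range(len(y)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        # Inner LOOCV uses only the training samples to choose the penalty.
        best_lam = min(
            lambdas,
            key=lambda lam: loocv_error(X_train, y_train, lam, fit, predict_fn),
        )
        model = fit(X_train, y_train, best_lam)
        predictions.append(predict_fn(model, X[i]))
    return predictions

# Hypothetical toy data: six samples with an exact linear relationship.
x_rows = [[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]]
y = [2.0 * row[0] for row in x_rows]
predictions = nested_loocv(x_rows, y, [0.0, 0.5, 5.0], fit_one_feature_lasso, predict)
print(predictions)
```

With 26 patients the outer loop would run 26 times, each time tuning the penalty on the 25 remaining samples. Because the penalty is selected without access to the held-out sample, the correlation between the collected predictions and the observed values is not inflated by the tuning step.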

Statistical Analysis

All basic statistical analyses (linear regression models, analysis of variance [ANOVA], correlation tests) were performed using the base functions in R version 3.0.2. Reported P values were for two-sided tests unless otherwise stated. False discovery rates were estimated using the Benjamini and Hochberg method implemented in the p.adjust() function in R. Figures were created using the base graphics or the ggplot2 package in R and were finalized using the Inkscape software.
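
For readers unfamiliar with the Benjamini and Hochberg adjustment, the following Python sketch reproduces the step-up calculation that R's p.adjust() performs for method "BH". The function name and the example P values are illustrative only and are not taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    # Visit P values from largest to smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position                      # rank in ascending order (1..m)
        candidate = pvalues[idx] * m / rank      # p * m / rank
        running_min = min(running_min, candidate)  # enforce monotonicity, cap at 1
        adjusted[idx] = running_min
    return adjusted

# Hypothetical P values for four genes.
example = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(example)
print(adjusted)  # approximately [0.02, 0.04, 0.04, 0.02]
```

A gene is then reported at a false discovery rate below 0.05 when its adjusted value is below 0.05.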

Results

The In Vitro Vorinostat Drug Response Phenotype

Large cell line drug sensitivity screens have shown that the distribution of vorinostat drug response (Figure 1, A and B) is strikingly different from that of very highly targeted agents (eg, nilotinib) (Figure 1C). For vorinostat, as for cytotoxic drugs, no clear separation between sensitive and resistant cell lines is observed, which is consistent with a complex trait with a multifactorial etiology (Figure 1B).

Figure 1.

Cell lines from many different cancer types are sensitive to vorinostat. A) Box and whisker plot showing the IC50 values in cell lines from each of the 51 different tissues in the Cancer Genome Project (CGP). Boxes are colored by cancer type. In each box, the middle horizontal lines indicate the median value, the top and bottom horizontal lines indicate the interquartile range, and whiskers indicate the highest or lowest data point within 1.5 times the interquartile range of the upper or lower quartile, respectively. B) Scatter plot showing the IC50 values of vorinostat in CGP. Cell lines that achieved a measurable IC50 within the drug screening window are shown in green; those that did not are shown in red. The maximum and minimum drug screening concentrations are shown as a red dashed line. For clarity this plot shows 100 cell lines chosen at random. C) Scatter plot showing the IC50 values of nilotinib in CGP; figure details are similar to (B). D) Box and whisker plot of vorinostat IC50 values in our validation cohort by cancer type. E) Stripchart of vorinostat IC50 values in our validation cohort further stratified by various subtypes. The mean and standard error for each group are shown in gray. ALCL = anaplastic large cell lymphoma; AML = acute myeloid leukemia; CTCL = cutaneous T-cell lymphoma; CML = chronic myeloid leukemia; CNS = central nervous system; DLBCL = diffuse large B-cell lymphoma; ER = estrogen receptor; GI = gastrointestinal; HER2 = human epidermal growth factor receptor 2; NOS = not otherwise specified; NSCLC = non–small cell lung cancer; SCLC = small cell lung cancer.

In the CGP data, there were clear differences in vorinostat sensitivity based on tissue-of-origin (n = 473, P = 4×10⁻¹¹ from ANOVA) (Figure 1A), with results consistent with expectation given previous reports. For example, hematological neoplasms tended to be most sensitive. Tissue of origin also captured a substantial proportion of the variability in drug response (R² = 0.19). Vorinostat is currently approved for only CTCL, and CGP does not contain CTCL cell lines; however, we included five CTCL cell lines in our validation cohort of 171 lymphoma, breast, colon, or lung cancer cell lines (Figure 1, D and E). As expected, CTCL cell lines were comparatively sensitive to vorinostat (Figure 1E). Crucially, however, in both discovery and validation panels there were highly sensitive samples from many other cancer types. This observation is consistent with clinical findings, where the drug has been shown to sometimes be highly effective against several types of solid tumors (22).

Gene Expression Vorinostat Sensitivity Association

When controlling for tissue of origin in CGP, the expression levels of 52 genes remained significantly associated with vorinostat IC50 at a false discovery rate (FDR) of less than 0.05 (Figure 2A; Supplementary Table 1, available online). There was a strong enrichment of low P values (Figure 2B), suggesting that the gene expression data provide additional signal aside from tissue of origin. These results were reflected in our validation cohort, where 18 (of a possible 38 shared genes) were associated with IC50 at a P value of less than .05 (compared with 1.9 expected by chance) despite the different microarray platforms and experimental protocols. The level of enrichment was also similar in both cohorts (Figure 2C). Despite the robust associations, this set of top genes performed poorly as predictors of drug response in our validation cohort. In fact, a multivariable linear model fit from these 38 genes did not predict response better than a model fit using cancer type alone (r_s = 0.24 compared with r_s = 0.26).
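
The "expected by chance" figure follows from 38 tests at a threshold of .05 (38 × 0.05 = 1.9). As a rough back-of-the-envelope check, the Python sketch below also computes how unlikely 18 or more hits would be under a binomial model. This calculation is not reported in the paper, and it assumes the 38 tests are independent, which is an idealization because gene expression levels are correlated.

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

n_shared_genes = 38   # genes measured in both cohorts (from the text)
threshold = 0.05      # nominal P value cutoff (from the text)
n_replicated = 18     # genes below the cutoff in the validation cohort (from the text)

expected_by_chance = n_shared_genes * threshold
tail_probability = binomial_tail(n_replicated, n_shared_genes, threshold)
print(expected_by_chance, tail_probability)
```

Under this idealized model, observing 18 replicated genes when about 2 are expected is extremely improbable, which is in line with the text's description of the associations as robust.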

Figure 2.

Figure 2.

Gene expression is informative towards vorinostat sensitivity. A) Heatmap showing the expression levels of the 52 genes achieving false discovery rates of less than 0.05 for association with vorinostat sensitivity in the Cancer Genome Project (CGP) (when controlling for tissue-of-origin). Genes are clustered using hierarchical clustering using a Spearman correlation based distance metric. B) Histogram of P values for gene-drug associations in CGP. C) Histogram of P values for gene-drug associations in our validation cohort.

Gene Set Analysis Further Elucidates the Molecular Mechanisms of Vorinostat Sensitivity

Next, we investigated whether any of the pathways/processes previously implicated in vorinostat resistance/response could be identified in these data in an unsupervised manner. We used gene set enrichment analysis (GSEA) (23) to identify KEGG pathways, gene ontology terms, and transcription factor binding motifs associated with vorinostat IC50. Significance was estimated using a robust (24,25) permutation-based approach. We identified several highly significant gene sets following correction for multiple testing (Supplementary Tables 2–13, available online); however, a particularly strong enrichment was observed for five gene sets, for which an enrichment score as extreme as that observed in the original data could not be achieved from 10 000 sample label permutations (Table 1 and Figure 3A; for details see the Supplementary Materials and Supplementary Tables 14–18, available online). Critically, we subsequently reproduced four of these five gene sets in our validation cohort. Our validation cohort measures many additional transcripts and cell lines, supporting the robustness of the results.
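
The logic of a sample-label permutation test can be sketched as follows. This Python example is a simplified stand-in: it permutes labels for a single correlation rather than a GSEA enrichment score, and the function names, the add-one estimator, and the toy data are assumptions made for illustration rather than details taken from the paper.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def permutation_pvalue(x, y, n_perm=10000, seed=0):
    """Two-sided permutation P value for the correlation between x and y."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)                    # break the pairing of x and y
        if abs(pearson(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one estimator: the P value can never be reported as exactly zero.
    return hits, (hits + 1) / (n_perm + 1)

# Hypothetical toy data with a perfect linear relationship.
x = [float(i) for i in range(1, 11)]
y = [2.0 * value for value in x]
hits, p_value = permutation_pvalue(x, y, n_perm=2000, seed=1)
print(hits, p_value)
```

The resolution of such a test is limited by the number of permutations: when no permuted statistic is as extreme as the observed one, the result can only be stated as an upper bound of roughly one over the number of permutations.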

Table 1.

GO terms and KEGG pathways most strongly associated with vorinostat IC50*

Gene set name | P | CGP ES | FDR | FWER | P in validation set | Database
Chromatin remodeling | <5×10⁻⁶ | 0.77 | 0.01 | 0.03 | <5×10⁻⁵ | GO
Lysosome | <1×10⁻⁴ | 0.65 | 4×10⁻³ | 3×10⁻³ | 4×10⁻³ | KEGG
Ribonucleoprotein complex biogenesis & assembly | <5×10⁻⁵ | -0.71 | 9×10⁻³ | 0.01 | .03 | GO
Glycosaminoglycan degradation | <1×10⁻⁴ | 0.77 | 0.02 | 0.03 | .04 | KEGG
RNA binding | <5×10⁻⁵ | -0.6 | 0.01 | 7×10⁻³ | .38 | GO

*A positive enrichment score (ES) indicates that the expression of genes in this set is positively associated with vorinostat IC50; for example, high expression of genes in the “lysosome” pathway is associated with vorinostat resistance (ie, high IC50). ES = enrichment score; FDR = false discovery rate; FWER = family-wise error-rate.

Figure 3.

Figure 3.

Many pathways/processes are reproducibly associated with vorinostat sensitivity, but only a polygenic model provides accurate prediction of sensitivity. A) Enrichment of gene ontology biological processes “Chromatin Remodeling” and “Ribonucleoprotein Complex Biogenesis and Assembly” for association with vorinostat IC50 in the Cancer Genome Project (CGP). Enrichment for KEGG pathways “Lysosome” and “Glycosaminoglycan Degradation” for association with vorinostat IC50 in CGP. A positive enrichment score indicates that high expression of these genes is associated with resistance to vorinostat (ie, high IC50). B) Scatter plot showing the predicted drug sensitivity (using the 78 gene Lasso model) plotted as a function of the measured IC50 value in the validation cohort (r_s = 0.52, P = 2.6×10⁻¹³, n = 171). A linear regression line is shown in blue and confidence intervals in gray. C) Scatter plot showing the association between IC50 in our validation set and CGP for cell lines that were screened in both cohorts (r_s = 0.58, P = 2.9×10⁻⁷, n = 63). NB these 63 samples were not included when fitting the lasso model in CGP, as this would inflate the estimate of prediction accuracy. D) Bar chart for different cancer types showing the proportion of samples in The Cancer Genome Atlas for which CHD4 is amplified or deleted. ACyC = adenoid cystic carcinoma; CS = carcinosarcoma; FDR = false discovery rate; GBM = glioblastoma; MICH = University of Michigan; MSKCC = Memorial Sloan Kettering Cancer Center; TCGA = The Cancer Genome Atlas.

Interestingly, a multivariable linear model fit using the four highly significant gene sets did improve prediction of drug sensitivity over a model fit using cancer type alone (r_s = 0.31 compared with r_s = 0.26). This outperformed the results achieved using only the top associated genes and is not surprising given the evidence linking these pathways and processes to vorinostat response.

A Polygenic Machine Learning Algorithm for Vorinostat Sensitivity Prediction

The most important goal of assessing the association between any putative molecular marker and drug sensitivity is to develop robust predictors of drug response. Our results showed that while the expression of many genes and pathways/processes is reproducibly associated with vorinostat sensitivity, these markers (individually or collectively) are insufficient for predicting drug response. Hence, we developed a predictive model by applying lasso regression to the CGP cell line data. Our model regressed vorinostat IC50 on the expression of every gene and also included an interaction term for each cancer type. The optimal model was identified by 20-fold cross-validation (Supplementary Figure 1, available online). The final model contained 78 genes as predictors (Supplementary Table 19, available online). Of these, only seven genes were cancer type–specific and the effect size of those predictors was comparatively small. Thus, the best gene expression based predictors of sensitivity are highly similar across cancer types. Vorinostat is under clinical investigation for many hematological and solid tumors; thus, this finding suggests that similar biomarkers may be applicable across diverse cancer types. We tested this model in our validation cohort and the predicted drug response was highly correlated with the measured drug sensitivity across these cell lines (r_s = 0.52, P < .001, n = 171) (Figure 3B). To contextualize this result, we observed a similar correlation for the IC50 values derived from the subset of biologically identical cell lines that were screened in both CGP and our validation set (r_s = 0.58, P < .001, n = 63) (Figure 3C). These repeated measurements provided an approximate upper bound on the performance of the model; thus, it is clear that the polygenic model is, in a preclinical setting, capturing close to as much variability in vorinostat response as could realistically be captured by any approach based on molecular markers.
This provides a vast improvement upon selecting candidate genes or pathways and demonstrates that the small effects that cumulatively result in drug sensitivity or resistance can in principle be recovered using supervised machine learning. Crucially, the predicted and measured IC50 values were also positively correlated within all four cancer types of the validation cohort (lymphoma r_s = 0.53, P < .001, n = 44; lung r_s = 0.50, P < .001, n = 47; breast r_s = 0.21, P = .09, n = 39; colon r_s = 0.41, P = .03, n = 41), although the correlation in breast cancer cell lines did not reach statistical significance. In all cases, these results represent an improvement on the predictions obtained when the model is fit using a subset of the CGP data from only the same cancer type (lymphoma r_s = 0.48, lung r_s = 0.49, breast r_s = 0.08, colon r_s = -0.12), further supporting the argument that predictors of drug sensitivity in one cancer type are often informative in a broader context.
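
The prediction accuracy reported here is a Spearman rank correlation between predicted and measured IC50 values. The Python sketch below shows how that statistic is computed, with ties receiving average ranks. The function names are illustrative and the numbers are hypothetical toy values, not data from the study.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1              # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical measured and predicted IC50 values for five cell lines.
measured = [0.5, 1.2, 2.0, 3.1, 4.4]
predicted = [0.7, 1.0, 2.5, 2.4, 5.0]
rho = spearman(measured, predicted)
print(rho)  # approximately 0.9: one adjacent pair of cell lines is swapped in rank
```

Because it depends only on ranks, this measure is insensitive to differences in scale between predicted and measured values, which is convenient when the training and validation screens use different assay protocols.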

A Novel Marker of Vorinostat Sensitivity

The lasso algorithm applied above reduces the set of vorinostat-associated genes to an easily interpretable model. It is vital to understand that the lasso approach will tend to fit a model with as little redundancy as possible; the 78 genes fitted can be thought of as being independent signals that capture the disparate effects of diverse biological pathways and processes. While this does not guarantee that the selected genes are causative, in examining the model coefficients it is clear that some of the genes identified are strong candidates for a causative role (Supplementary Table 19, available online). Examples include BFAR and NR4A3, which regulate apoptosis (26,27), ABCB6, which is involved in multidrug resistance (28), and NKRF, which represses NF-κB (29), all processes that have previously been linked to vorinostat response. However, one of the most interesting findings may be CHD4, which had the second largest overall effect size. This gene is a helicase that is directly involved (Supplementary Figure 2, available online) in the histone deacetylase complex (30). TCGA has included CHD4 in its recent list of “highly significantly mutated” cancer genes, where it was the 24th most commonly mutated of all genes (31); it is also very commonly subject to copy number amplification (Figure 3D) (32,33). Importantly, CHD4 amplification is associated with altered expression and with poorer disease-free progression (Supplementary Figures 3 and 4, available online), suggesting that such patients may benefit from novel treatments. CHD4 expression has not been previously implicated in response to vorinostat. Additionally, this finding is consistent with our GSEA results that (broadly) implicated genes involved in chromatin remodeling. In CGP, CHD4 expression is also associated with sensitivity to the other HDIs screened (P = .016 for dacinostat and P = .057 for entinostat). 
Given the multiple lines of evidence and clear relevance to clinical cancers, we used siRNA knockdown to investigate whether CHD4 expression may play a causative role in vorinostat response. Following knockdown, we observed a statistically significant decrease in the sensitivity of the colon cancer cell line HCT-116 to vorinostat (P < .001) (Figure 4A). The same trend was observed in a CAMA1 breast cancer cell line (P = .008) (Figure 4B). These results suggest CHD4 as a pan-cancer biomarker of HDI sensitivity.

Figure 4.

Figure 4.

Effect of CHD4 knockdown on cellular sensitivity to vorinostat. Change in cell viability following vorinostat treatment was measured in (A) HCT-116, a colon cancer cell line and (B) a CAMA1 breast cancer cell line after CHD4 knockdown by siRNA. P values were derived from two-way analysis of variance when comparing scramble-treated cells to siRNA-treated cells.

A Polygenic Model for Vorinostat Adverse Event Prediction in a Clinical Study

We evaluated the performance of a polygenic drug sensitivity prediction model in vivo using data gathered in a clinical trial (IRB# 10-652B, manuscript in preparation) where the vorinostat-induced toxicity was carefully monitored through weekly platelet counts in a group of 26 cancer patients with solid tumors. We measured change in patients’ platelet count between day 0 (before treatment) and day 10 (following two cycles of vorinostat treatment). Change in platelet count was highly variable, continuous, and approximately normally distributed in the cohort (rather than strongly dichotomous), suggesting that this trait may also be influenced by multiple factors (Figure 5A). We also measured the change in in vivo gene expression levels in whole blood before vorinostat treatment and eight hours thereafter on day 3 of treatment. First, we assessed the association between pretreatment (baseline) gene expression levels in whole blood and platelet change at day 10. There was no enrichment of low P values in these data (Figure 5B) and no evidence that these baseline gene expression levels were predictive in this cohort. We hypothesized that drug treatment may introduce greater variability in whole-blood gene expression and may thus provide improved power to identify relevant molecular markers of drug response. Hence, for genes reliably expressed at both baseline and eight hours following treatment, we calculated change in gene expression level. Unlike the baseline expression data, we saw a strong enrichment of low P values (Figure 5C), with 676 genes reaching a P value of less than .05, compared with an expected number of 395 under a uniform distribution of P values. Critically, a leave-one-out cross-validation (LOOCV) analysis suggested that change in platelet count can be predicted from these expression change data (r = 0.55, P = .004) (Figure 5D). This provides a clinical validation with a quantitative marker of drug effect in a small sample size.
Overall, these results support further use and investigation of this approach in predicting adverse events and potential treatment outcomes for broadly targeting classes of drugs. The optimal polygenic (expression change based) model fit on all 26 samples is shown in Supplementary Table 20 (available online).

Figure 5.

Figure 5.

Vorinostat toxicity in cancer patients can be predicted using post-treatment gene expression change obtained from whole blood. A) Quantile-quantile plot showing the platelet change phenotype across all patients plotted against a theoretical normal distribution. The change in platelet count is approximately normally distributed. B) Histogram of the P values from association of baseline (before treatment) gene expression with platelet change. C) Histogram of the P values from association of gene expression change (between baseline and eight hours after treatment) with platelet change. D) Scatterplot showing the change in platelet count at ten days against the predicted change in platelet count as estimated from leave-one-out cross-validation.

Discussion

This investigation validated a complex polygenic modeling approach for predicting cellular sensitivity to HDAC inhibition in sets of cancer cell lines and in early clinical effects in patients treated with vorinostat. Previous studies have proposed numerous candidate biomarkers and potential mechanisms for HDI response (22,34); however, in clinical testing, no individual biomarker has been validated. One possible explanation for this is lack of reproducibility, which has plagued biological sciences (35). However, in this study we demonstrated that these issues may also arise because the response to HDIs is determined by an additive combination of disparate molecular effects. To show this, we have used high-throughput gene expression analysis on a very large set of preclinical data. Our analyses revealed novel genes robustly associated with drug response across many cancer types. We also identified pathways and processes that were reproducibly associated with HDI response. Our results recapitulated expectation, provided a highly plausible basis for future research, and suggested that previous findings (eg, the lysosome, MYC expression) may be more broadly relevant than formerly appreciated. Most importantly, given the broad nature of previously proposed markers for this drug, our findings suggest that response to this drug is mediated by many molecular factors. This is a major consideration in the development of clinical biomarkers for this and similar drugs, as it provides one possible explanation for why single gene or pathway markers (often identified through low-throughput approaches) may still fail in clinical studies, despite being reproducibly associated with drug response.

By considering the above findings, we created a very accurate predictive model of vorinostat response in a preclinical pan-cancer setting. This model could predict response in our independent validation cohort with close to the same level of accuracy as was achieved when drug sensitivity was remeasured. This also revealed that genes whose expression was informative of sensitivity were generally informative across every cancer type, as only a very small number of cancer type–specific predictors were selected in the optimal model, suggesting that biomarkers identified in one cancer type could be useful across many cancer types. Our model also allowed us to identify strong candidates for a causative role; for example, CHD4, a gene directly involved in the histone deacetylase complex (but never previously linked to HDI sensitivity), had the second largest effect size of any gene. Indeed, we validated the causative role of this gene using an siRNA knockdown experiment in breast and colon cancer cell lines. CHD4 is also commonly mutated and/or subject to copy number amplification in many primary human cancers; thus it may play an important role in the clinical setting.

Finally, by applying the principles outlined above to clinical data, we have demonstrated that prediction of toxicities resulting from vorinostat treatment is possible. This involves using gene expression microarrays to measure the change in gene expression in vivo in whole blood before vorinostat treatment and eight hours thereafter. By using a polygenic model, we showed that gene expression change captures a significant proportion of variability in change in platelet count (thrombocytopenia) after ten days of the treatment regimen. This model is derived from the clinical data itself and applied in a cross-validation framework because, despite some very encouraging recent results (36,37), there is still no consensus on the level of utility of cell lines for clinical prediction (38). Despite our small sample size (26 patients), the predictions were statistically significant. However, larger studies will be necessary to achieve a clinically useful level of accuracy. Nevertheless, the proposed approach is promising, particularly considering the lack of alternative effective methods.
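The cross-validated polygenic prediction described above can be illustrated with a minimal sketch. It assumes a lasso-type sparse linear model (the model type named in the limitations of this study), fitted by coordinate descent and evaluated by leave-one-out cross-validation, with the Pearson correlation between held-out predictions and observed values as the accuracy measure. The data are synthetic, and the function names, the number of genes, and the penalty value `alpha` are illustrative assumptions rather than the pipeline or parameters used in the study.

```python
import math
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t (the lasso shrinkage step)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_fit(X, y, alpha, n_iter=100):
    """Coordinate-descent lasso minimising
    (1/2n) * ||y - b0 - X b||^2 + alpha * ||b||_1.
    Returns (intercept, coefficients)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]  # residual when all coefficients are 0
    coefs = [0.0] * p
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of gene j with the residual that excludes gene j
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n
            rho += col_sq[j] * coefs[j]
            new = soft_threshold(rho, alpha) / col_sq[j]
            delta = new - coefs[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                coefs[j] = new
    intercept = y_mean - sum(c * m for c, m in zip(coefs, x_mean))
    return intercept, coefs


def predict(intercept, coefs, x):
    return intercept + sum(c * v for c, v in zip(coefs, x))


def leave_one_out_predictions(X, y, alpha):
    """Predict each sample from a model trained on all the other samples."""
    preds = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        b0, b = lasso_fit(X_train, y_train, alpha)
        preds.append(predict(b0, b, X[i]))
    return preds


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


# Synthetic stand-in data (an assumption): 26 "patients", 30 "genes",
# of which only the first three drive the simulated outcome.
rng = random.Random(0)
n_patients, n_genes = 26, 30
X = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_patients)]
y = [2.0 * row[0] - 1.5 * row[1] + 1.0 * row[2] + rng.gauss(0, 0.5)
     for row in X]

held_out = leave_one_out_predictions(X, y, alpha=0.1)
r = pearson(held_out, y)
print(f"Leave-one-out Pearson r on synthetic data: {r:.2f}")
```

Because every prediction comes from a model that never saw the sample being predicted, the resulting correlation estimates within-cohort predictive accuracy and not goodness of fit. In practice, the penalty would itself need to be chosen inside each training fold to avoid optimistic bias.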

This study also had some limitations, particularly the small size of the clinical cohort, which likely limits the number of predictive genes that could be recovered using the lasso model, thus affecting accuracy. The prediction reported is also for a cross-validation analysis, and typically accuracy would be expected to be lower in an out-of-batch set of samples, although this could be offset by using a larger training set (ie, more patients), whereby more predictive genes (those with smaller effect sizes) could likely be identified.

In conclusion, we have discussed several high-throughput analyses that have revealed a complex relationship between gene expression and vorinostat response. In a preclinical setting, vorinostat sensitivity can be predicted extremely accurately by accounting for this complexity using a polygenic model. We have identified a component of the histone deacetylase complex (CHD4) as one of the causative mechanisms in vorinostat response. Based on these principles, we have also proposed a method of predicting adverse events in patients. This study exemplifies a paradigm shift in biological research, whereby large high-throughput data are used to generate and inform the hypotheses driving conventional wet-lab biology.

Funding

This study was supported by the National Institutes of Health grant UO1GM61393, Circle of Service Foundation Early Career Investigator award, University of Chicago CTSA core subsidy grant, and a Conquer Cancer Foundation of ASCO Translational Research Professorship Award In Memory of Merrill J. Egorin, MD (awarded to Dr. M. J. Ratain). RSH also received support from National Institutes of Health grant K08GM089941, National Institutes of Health grant R21 CA139278, University of Chicago Support Grant (#P30 CA14599), Breast Cancer SPORE Career Development Award (CA125183), and the National Center for Advancing Translational Sciences of the National Institutes of Health (UL1RR024999). PG received support from the Chicago Biomedical Consortium grant PDR-020. Clinical samples were derived from NCT01281176, supported by the Investigational Drug Branch of the US National Cancer Institute by grant U01 CA069852 and agreement with Merck, Inc. SK and MLM additionally received support from UM1 CA186705.

Supplementary Material

Supplementary Data

We thank the University of Chicago Cancer Center Clinical Pharmacology Core Laboratory and Genomic core facility for collecting and processing clinical samples and for generating the gene expression data from the clinical study.

AL, MN, and MC are employed by Merck Research Laboratory. JH was an employee of Merck Research Laboratory while working on this project. The other authors have no conflicts of interest to disclose.

Articles from JNCI Journal of the National Cancer Institute are provided here courtesy of Oxford University Press
