Abstract
Objective:
Radiation therapy is among the most effective and widely used modalities of cancer therapy in current clinical practice. In this era of personalized radiation medicine, high-throughput data now provide the means to investigate novel biomarkers of radiation response. Large-scale efforts have identified several radiation response signatures, which poses two challenges, namely, their analytical validity and redundancy of gene signatures.
Methods:
To address these fundamental radiogenomics questions, we curated a database of gene expression signatures predictive of radiation response under oxic and hypoxic conditions. RadiationGeneSigDB contains 11 oxic and 24 hypoxic signatures, each with a standardized gene list annotated with gene symbol, Entrez gene ID, and gene function. We demonstrate the utility of this database in three ways: by identifying hypoxia-associated miRNA using a penalized multivariate model; by comparing breast cancer oxic signatures in cell line data vs patient data; and by comparing the similarity of head and neck cancer hypoxia signatures at the pathway level in clinical tumour data.
Results:
We obtained a set of miRNA highly associated, both positively and negatively, with the hypoxia gene signatures across cancer types. In addition, we identified moderate correlations between breast cancer oxic signatures in patient data, and significant differences across molecular subtypes. Moreover, we found that different sets of pathways were enriched in the head and neck hypoxia signatures, although the signatures were concordant when applied to the patient data.
Conclusion:
This valuable, curated repertoire of published gene expression signatures provides motivating case studies for how to search for similarities in radiation response for tumours arising from different tissues across model systems under oxic and hypoxic conditions, and how a well-curated set of gene signatures can be used to generate novel biological hypotheses about the functions of non-coding RNA.
Advances in knowledge:
We envision that the RadiationGeneSigDB database will help accelerate preclinical radiotherapeutic discovery pipelines in terms of the analytical validity of novel biomarkers of radiation response and the need for ensemble approaches to clinical genomic biomarkers.
Introduction
The two pillars driving the field of personalized radiation oncology are: (i) treatment delivery and dose conformity arising from technological improvements, which include particle therapies and advanced image guidance techniques; (ii) novel biomarker-guided tools, integrating concomitant chemotherapy.1 To tailor radiation therapy, it is crucial to build predictive assays that are more confidently able to stratify patients, and concomitantly have associated impactful radiotherapeutic regimens. This could augment the existing radiobiological treatment strategies to more biologically driven personalized radiation treatment to individual patients.2 The evolution of high throughput technologies and the continuous inflow of transcriptomic data have created new avenues to understand complex biological events induced by radiation, through data driven analysis, at a level beyond the gross clinical variables of an individual, and instead, at the individual tumour level. The ever-expanding arsenal of transcriptomic data holds great promise to investigate novel biomarkers that are predictive of radiation response. In the literature, several studies have attempted to associate radiosensitivity with molecular/genomic features,3,4 and this number is getting larger. Despite this, there have been no systematic efforts to build a database of radiation response gene expression signatures validated or designated for clinical use.
Even with efforts toward standardization and documentation, researchers continue to find it difficult to locate and utilize these resources effectively. Next-generation sequencing and personalized genomics will further enhance the amount of data available, complicating the search for efficacious and well-validated resources in this area. In spite of the challenges that have arisen with the growth of data and databases, the rewards and opportunities provided by this information have proven fruitful. Today, there is a wealth of data, facilitating new discoveries and uncovering new relationships between different disciplines, and particularly in cancer biology, finding new avenues and vulnerabilities to better treat this heterogeneous disease.
In the literature, many groups have conducted comprehensive gene expression profiling and built signatures for radiation response under oxic4–6 and hypoxic conditions.7–9 Two methods have been used to identify these gene signatures, namely, data-driven (bottom-up) and hypothesis-based (top-down) approaches. The performance of these transcriptomic signatures has been evaluated on various data sets with limited to no independent external validation. Complicating this picture is the observation that there is only minimal overlap between these gene signatures. This may be attributed to the different technological platforms (such as microarray or RNA-sequencing), training sets, and statistical tools used to create these signatures. Furthermore, in order to develop biomarkers that are reproducible and appropriate for clinical translation, a database must be built to house well-validated predictive models. At present, there is no radiation response signature database that could potentially address these fundamental questions. In this study, we manually curated 35 radiation response gene signatures from the literature into a database called RadiationGeneSigDB, and implemented it as an R package. The initial release of RadiationGeneSigDB provides 11 radiation response gene signatures under oxic conditions pertaining to breast cancer, carcinoma, head and neck squamous cell carcinoma, soft tissue sarcoma and pan-cancer. The oxic signatures were derived from cell line (n = 9) and patient (n = 2) gene expression data by employing classification- and regression-based methods (Supplementary Material 1). Furthermore, the database also contains 24 hypoxic gene signatures derived from gene expression data related to head and neck, breast, prostate, laryngeal, squamous, cervical, bladder, mammary epithelial, ER+ breast cancers and hepatocellular carcinoma.
The hypoxic signatures were derived from cell line (n = 16) and patient (n = 8) gene expression data by employing classification, regression, clustering and co-expression network methodologies (Supplementary Material 1). This database will enable users (i) to compare radiation response signatures across pan-cancer data sets; (ii) to investigate the prognostic value of these signatures on a compendium of clinical data sets using meta-analysis; and (iii) to investigate the tissue specificity of radiation response using these signatures across different transcriptomic platforms. Ultimately, the goal of this work is to improve signature development, to begin to establish validation as a criterion for the future use of signatures, and to spur a greater understanding of how transcriptomic signatures can be used to augment precision radiation oncology, leading to improved patient outcomes and more effectively designed clinical trials.
Methods and materials
Curation of RadiationGeneSigDB
Using PubMed, we first identified all papers likely to contain one or more radiation response gene signatures with a search of the form "radioresponse" OR "radiation response" OR "hypoxic gene signature" OR "Radiation Sensitivity Signature." Each article was then downloaded, and gene signatures were extracted from the manuscript or its supplementary materials, where they appeared as tables, as figures, or as PDF or Excel files. Each gene signature was given a signature identifier based on the name of the first author of the publication. Gene signatures were stored in Excel file format and as a data object file that can be easily loaded in R. The metadata for each gene in the signature are stored in the same file. We used genome-wide annotation based on Entrez Gene identifiers: each signature is saved both as gene symbols and as the corresponding Entrez Gene IDs, mapped using the org.Hs.eg.db R package. The methodology used to build each gene signature, along with its training and validation cohorts, is given in Supplementary Material 1. In our first release, we collected 11 oxic and 24 hypoxic signatures, which are described in Supplementary Material 1. The database is available from the GitHub repository (https://github.com/vmsatya/RadiationGeneSigDB), whose link is also provided in Supplementary Material 1, and can be downloaded as an Excel file or as an R data object. The original study, number of genes in the signature, histology, data type and model type used to build each gene signature, along with the validation data set, are reported for both oxic and hypoxic signatures in Supplementary Material 1.
Data sets
Cancer cell lines were profiled at the genomic level, and the processed data are available for download from a public database, the Cancer Cell Line Encyclopedia (CCLE).10 For this study, we used all breast cancer lines (26 in total) from the CCLE data set. For patient data, we used the METABRIC data set11 and selected only those patients treated with radiation therapy: of 1992 patients, 232 were reportedly treated with radiation therapy. We retrieved head and neck squamous cell carcinoma (HNSCC) primary tumour transcriptomic data from The Cancer Genome Atlas (TCGA),12 and selected only the patients treated with radiation (99 in total).
Molecular subtyping of breast cancer patients
Breast cancer represents a heterogeneous group of malignancies with different molecular features, prognosis and response to different therapies (including chemotherapeutic agents and targeted agents). Breast cancers are classified into four subtypes, namely, basal-like (triple negative breast cancer), HER-2, Luminal A, Luminal B. These molecularly heterogeneous subtypes have distinct gene expression patterns that are associated with variable outcome. For instance, Luminal A tumours are associated with a low risk of local and distant recurrence, while basal-like tumours have higher rates of locoregional failure and poor overall survival outcome.13,14 Efforts are currently underway in the breast radiation oncology community to personalize adjuvant radiotherapy treatment based on molecular subtype of each patient. Previous attempts to identify radiation-specific signatures to predict likelihood of benefit to adjuvant radiotherapy have either failed independent validation or have not been subtype-specific resulting in inconsistent performance in independent cohorts. Hence, it is important to investigate if the existing radioresponse gene signatures are subtype-specific.
We used the SCMOD2 model, as implemented in the genefu R package, to assign each tumour sample to one of the four established molecular subtypes of breast cancer: Basal-like (TNBC), Her2-enriched, Luminal A (ER-high) and Luminal B (ER-low).15 Each gene signature, when applied to a given sample, is first summarized into a single continuous value, called the “summary score”, which is taken as the representative value of that signature for that sample. The Spearman correlation coefficient is then used to compare the summary scores of two signatures across the set of samples considered. To assess the quality of gene signatures, we used our recently developed method sigQC.16
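The scoring and comparison step can be illustrated with a minimal Python sketch. It assumes the summary score is the median expression of the signature genes in each sample (the definition given later for the miRNA analysis); the gene names, expression values, and function names are hypothetical, and the published analysis was carried out in R.

```python
from statistics import median


def signature_scores(expression, signature):
    """Median expression of the signature genes, computed per sample.

    expression: dict mapping gene -> list of values (one per sample).
    signature: list of gene names; genes missing from the data are skipped.
    """
    genes = [g for g in signature if g in expression]
    if not genes:
        raise ValueError("no signature genes found in the expression data")
    n_samples = len(expression[genes[0]])
    return [median(expression[g][i] for g in genes) for i in range(n_samples)]


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: five genes measured in four samples.
expression = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 3.0, 5.0, 7.0],
    "G3": [0.5, 1.5, 2.5, 9.0],
    "G4": [9.0, 7.0, 4.0, 1.0],
    "G5": [8.0, 6.0, 5.0, 2.0],
}
scores_a = signature_scores(expression, ["G1", "G2", "G3"])
scores_b = signature_scores(expression, ["G4", "G5"])
rho = spearman(scores_a, scores_b)
```

Because the comparison is rank-based, it is unaffected by monotone rescaling of either score, which matters when two signatures were derived on different platforms.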
Pathway analysis
We performed the pathway analysis using the runGSAhyper function as implemented in the piano package.17 The genes in the signature of interest were compared against gene ontology (GO) gene sets, with protein-coding genes as the background set. Nominal p-values obtained for each pathway were corrected for multiple testing using the false discovery rate (FDR) approach.18
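As an illustration of this kind of over-representation analysis, the following Python sketch computes a one-sided hypergeometric p-value for a single gene set and then applies a Benjamini-Hochberg adjustment. It is not the piano implementation; the function names and the toy numbers are assumptions, and the choice of Benjamini-Hochberg as the FDR procedure is also an assumption, since the text does not name the specific method.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_set, n_signature, n_overlap):
    """One-sided p-value P(X >= n_overlap) for over-representation.

    n_background: number of genes in the background set.
    n_in_set: background genes annotated to the gene set (pathway).
    n_signature: number of signature genes drawn from the background.
    n_overlap: signature genes that fall in the gene set.
    """
    total = comb(n_background, n_signature)
    upper = min(n_in_set, n_signature)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_signature - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical example: 4 of 15 signature genes fall in a 100-gene set,
# against a background of 19000 genes.
p_single = hypergeom_enrichment_p(19000, 100, 15, 4)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Gene sets whose adjusted p-value falls below the chosen threshold (10% in the head and neck case study) would be reported as enriched.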
Predictive model to obtain hypoxia-associated miRNA
Genomic data comprising the mRNA and miRNA expression of a cohort of 7738 patient samples were downloaded from The Cancer Genome Atlas (TCGA) project, accessed through the Broad Institute Firebrowse portal at http://www.firebrowse.org.12 The data used were RSEM-normalized gene expression and normalized mature miRNA expression. Following the method of Dhawan et al,19 we considered all cancer types that were epithelial or glandular with respect to histology and had at least 200 unique patient samples with paired mRNA and miRNA-sequencing data (Supplementary Material 1).
Following Dhawan et al,19 mRNA gene signature scores were computed as the median expression of the genes in a given signature. These scores were used as the response variables in two series of predictive models. In the first, every miRNA with non-zero expression in at least 80% of all tumour samples was assessed for its correlation with the gene signature score across all samples of a given cancer type, and miRNAs showing at least moderate univariate predictive ability for the signature summary score were carried forward. In the second, the miRNAs passing this first filter were taken as predictors in a linear model fitted to the gene signature scores. Fitting followed the approach of Dhawan et al19: multivariable linear regression with L1/L2 penalty, optimized by 10-fold cross-validation, was used to identify the miRNAs with the greatest predictive ability for each hypoxia gene signature score in each of the cancer types considered (Figure 1). For each signature, the miRNAs showing the strongest positive or negative association with the gene signature scores were obtained by aggregating the linear model coefficients across cancer types using the rank product statistic. miRNAs whose coefficients were more consistently positive across cancer types than expected by chance were taken as significantly positively associated with a given gene signature.
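The first, univariate filtering stage can be sketched in Python as follows. This is only an illustration: the text does not specify the correlation measure or the cut-off for "moderate" predictive ability, so the use of the Pearson correlation and the 0.3 threshold are assumptions, as are the function names and toy values. The penalized regression stage is not shown.

```python
def nonzero_fraction(values):
    """Fraction of samples in which expression is non-zero."""
    return sum(1 for v in values if v != 0) / len(values)


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def prefilter_mirnas(mirna_expr, scores, min_nonzero=0.8, min_abs_r=0.3):
    """Keep miRNAs that are broadly expressed and correlate with the score.

    mirna_expr: dict mapping miRNA name -> expression values per sample.
    scores: signature summary score per sample, in the same sample order.
    min_abs_r: assumed cut-off for 'moderate' association (hypothetical).
    Returns a dict of retained miRNA name -> correlation with the score.
    """
    kept = {}
    for name, values in mirna_expr.items():
        if nonzero_fraction(values) < min_nonzero:
            continue  # expressed in too few samples
        r = pearson(values, scores)
        if abs(r) >= min_abs_r:
            kept[name] = r
    return kept


# Hypothetical toy data for one cancer type with five samples.
scores = [2, 4, 6, 8, 10]
mirna_expr = {
    "miR-a": [1, 2, 3, 4, 5],  # expressed everywhere, tracks the score
    "miR-b": [0, 0, 0, 1, 2],  # non-zero in only 40% of samples
    "miR-c": [2, 1, 2, 1, 2],  # expressed, but uncorrelated with the score
}
kept = prefilter_mirnas(mirna_expr, scores)
```

Only the miRNAs returned by this filter would enter the cross-validated penalized regression as candidate predictors.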
Figure 1.
Overview of the approach used to identify hypoxia-associated miRNA (figure adapted from Dhawan et al19). An overview of the linear model used in fitting, wherein each gene signature and cancer type are considered, and the miRNA significantly associated with a given signature across cancer types are stored. Subsequently, those miRNA which associate the strongest with all gene signatures are next considered before obtaining the final list of hypoxia-associated miRNA.
These significantly associated miRNA were then ranked in order of p-value obtained by rank product statistic, for each of the gene signatures considered. Subsequently, these miRNA were aggregated across all gene signatures, to identify those which were recurrently positively or negatively associated with each gene signature, using the rank product statistic taken over all gene signatures. Using the miRNA identified as consistently negatively associated with the hypoxia gene signatures, we sought to identify those signature genes which may show potential de-repression by their targeting miRNA in association with hypoxia. Thus, for each cancer type independently, the Spearman correlation was computed between the expression of all mRNA and negatively associated miRNA to the hypoxia gene signatures as identified above. Using the rank product test, those miRNA–mRNA pairs with the most consistently negative correlation value across cancer types were identified. This list of statistically significantly negatively correlated miRNA–mRNA pairs (Bonferroni-corrected p < 0.05, rank product statistic) was then subset to the list of mRNA occurring in the gene signatures to identify those with potential de-repression in hypoxia.
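To make the aggregation step concrete, the sketch below ranks miRNA coefficients within each cancer type, multiplies the ranks across cancer types, and estimates a p-value for each miRNA by permutation. It is a simplified stand-in rather than the procedure used in the study: the permutation scheme, the function names, and the toy coefficients are assumptions, and ties are broken by input order.

```python
import random
from math import prod


def descending_ranks(values):
    """Rank 1 for the largest value; ties broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def rank_products(rank_lists):
    """Product of each item's ranks across all lists (one list per cancer)."""
    n_items = len(rank_lists[0])
    return [prod(ranks[i] for ranks in rank_lists) for i in range(n_items)]


def rank_product_p_values(coefficients_by_cancer, n_perm=2000, seed=0):
    """Permutation p-values for consistently high (positive) coefficients.

    coefficients_by_cancer: one list of model coefficients per cancer type,
    with miRNAs in the same order in every list.
    A small rank product means a miRNA ranks near the top in every cancer.
    """
    rng = random.Random(seed)
    rank_lists = [descending_ranks(c) for c in coefficients_by_cancer]
    observed = rank_products(rank_lists)
    hits = [0] * len(observed)
    for _ in range(n_perm):
        # Null model: ranks are shuffled independently in each cancer type.
        shuffled = [rng.sample(ranks, len(ranks)) for ranks in rank_lists]
        for i, value in enumerate(rank_products(shuffled)):
            if value <= observed[i]:
                hits[i] += 1
    return [(h + 1) / (n_perm + 1) for h in hits]


# Hypothetical coefficients for three miRNAs in three cancer types.
coefficients = [
    [0.9, 0.1, -0.5],
    [0.8, 0.2, -0.4],
    [0.7, 0.3, -0.9],
]
p_values = rank_product_p_values(coefficients, n_perm=2000, seed=1)
```

Negatively associated miRNAs can be handled the same way after negating the coefficients, so that the most negative coefficient receives rank 1.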
Results
We present here three case studies that use the radiation response gene signature database and highlight its utility. All associated R code is provided in the GitHub repository linked in Supplementary Material 1.
Comparison of oxic breast gene signatures in cell lines and patients
The objective of this case study is to conduct an unbiased comparison of two prognostic breast cancer signatures (Piening et al20 and Speers et al6) predictive of radiation response under oxic conditions. We chose these two signatures because they were developed using different methodologies. Piening et al20 developed their gene signature from changes in gene expression in human lymphoblast cells from 12 patients irradiated with 5 Gy of radiation. They compared the treated cell lines with untreated cell lines and found 160 significantly induced and 59 significantly repressed genes, which together they termed a gene signature. The gene signature was then applied to two independent breast cancer cohorts. Unsupervised clustering identified two clusters of patients and, by Kaplan–Meier survival analysis, the radiation-induced and radiation-repressed gene signatures were significantly associated with local recurrence. A different methodology was employed by Speers et al,6 who used breast cancer cell line data with radiation response (obtained from clonogenic survival assays) as a continuous variable. Their gene signature was applied to two independent breast cancer cohorts of patients treated with surgery and radiotherapy, and identified patients who were likely to recur despite this treatment.
We compared the signature scores in cell line and patient data, and assessed the similarities and differences in their behaviour across these data sets. Importantly, the overlap of genes between these two signatures is minimal, attributable to the different platforms, training sets, and statistical methods used in their generation. We computed signature scores, defined as the weighted average of expression of the signature genes, across the 26 breast cancer cell lines from the CCLE RNA-seq database, as well as for all patient samples from the METABRIC data set. Before proceeding further with this analysis, we first assayed the quality of gene signature application on each of these data sets using sigQC.16 sigQC is an R package-implemented protocol for gene signature quality control recently developed by Dhawan et al, and has begun to be used in multiple centres as a method of testing quality of signature application on a given data set. Encompassed within this protocol are a number of statistical metrics describing the ability of the gene signature to faithfully represent the data set, such as expression and variability of signature genes, as well as co-correlation of signature genes etc. Interestingly, this preliminary analysis, as shown by summary radar plots (Figure 2), reveals significant variations in quality of application between the signatures on different data sets.
Figure 2.
Speers and Piening gene signatures show lower quality of application by the sigQC metrics on the ER-high and ER-low data sets, as compared to the TNBC and CCLE data sets. Radar plots produced for sigQC summary metric evaluation of Piening and Speers signatures on CCLE and METABRIC data sets. Each ray of the radar plot evaluates one of the summary metrics checked by sigQC for gene signature quality prior to application on a data set. Values closer to the edge of the radar plot indicate stronger quality of application for a given signature-data set combination. CCLE, Cancer Cell Line Encyclopedia.
That is, the signatures appeared to have the strongest quality of application by the sigQC metrics when applied on CCLE data or TNBC data, but not the ER-high or ER-low data sets. Briefly, this plot reveals that the Piening signature on the TNBC data set (solid red line) is among the highest-quality application, along with the Piening signature on the CCLE data set (solid purple line). Among the lowest performing signature-data set combinations are the Speers and Piening signatures on the ER-High and ER-Low data sets, as poorer performance is observed on nearly every metric considered. Signature-data set combinations are ranked in the legend by quality, with the numeric value in brackets representing the area contained within the radar chart lines for each signature-data set combination.
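The area-based ranking can be made concrete with a short Python sketch. It assumes the summary metrics are scaled to [0, 1] and plotted on equally spaced rays; how sigQC itself computes or normalizes the area is not stated here, so the formula choice, function names, and metric values below are illustrative assumptions.

```python
from math import pi, sin


def radar_area(radii):
    """Area of the polygon joining the points on k equally spaced rays.

    Each pair of adjacent rays spans a triangle of area
    0.5 * r_i * r_(i+1) * sin(2*pi/k); the polygon is their sum.
    """
    k = len(radii)
    if k < 3:
        raise ValueError("a radar polygon needs at least three rays")
    wedge = 0.5 * sin(2 * pi / k)
    return wedge * sum(radii[i] * radii[(i + 1) % k] for i in range(k))


def normalised_radar_area(radii, max_radius=1.0):
    """Area as a fraction of the polygon reached when every metric is maximal."""
    return radar_area(radii) / radar_area([max_radius] * len(radii))


# Hypothetical metric values for two signature-data set combinations.
combinations = {
    "signature A on data set X": [0.9, 0.8, 0.85, 0.9, 0.7, 0.8],
    "signature B on data set Y": [0.5, 0.4, 0.60, 0.3, 0.5, 0.4],
}
ranked = sorted(
    combinations,
    key=lambda name: normalised_radar_area(combinations[name]),
    reverse=True,
)
```

Note that the polygon area depends on the order in which metrics are placed around the chart, so such a summary is best read as a coarse ranking rather than an absolute measure.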
Next, we examined the correlation between the Piening and Speers signature scores in each of the data sets independently using the Spearman correlation (Figure 3). We observed a moderate Spearman correlation (~0.69) between the signature scores in the cell line data. However, we observed significant differences when comparing signature scores in patient data.
Figure 3.
Cell line vs patient model systems show differential information captured by Speers and Piening gene signatures, particularly among the ER-high subset of patients. Comparison of Spearman correlation between Speers and Piening breast cancer radiation response gene signature scores in CCLE breast cancer cell lines and in the METABRIC patient cohort. CCLE, Cancer Cell Line Encyclopedia.
We observed that for clinical data, when stratified by molecular subtype, the Spearman correlation had moderate value for triple negative breast cancer, HER2-amplified, and ER-low cancers. For the ER-high subtype, the Spearman correlation between the signatures is weak (~0.25), suggesting a difference in the information captured, despite a priori being expected to carry the same information about the samples. This is in line with the literature that supports the fact that radiation response is subtype-specific.21 Moreover, in vitro studies have shown that subtype-specific breast cancer cell lines exhibit varied sensitivities to radiation.22,23 The moderate correlation between the two breast cancer signatures may be attributed to: (i) different statistical tools implemented on different platforms, with a small overlap in genes (i.e. distinct signature derivation techniques and training data sets biasing the outcome signature, leading to reduced reproducible performance on ER-high samples); (ii) different experimental assays and protocols used to generate dose–response data (i.e. inconsistency in the initial data sets generated used for deriving each of the signatures); (iii) different experimental methodologies to generate radiation data across labs; (iv) signature development underpowered due to lack of sample size along with enough replicates. This could also have implications with regard to the evaluation of radiation response in an in vitro setting and translating it into clinical practice.21 Hence, there is a dire need to build robust radiation signatures that are independent of platforms and technology and generalizable across the data sets they are expected to be applied to.
Comparison of HNSCC hypoxia gene signatures in patients
In this case study, we chose three HNSCC hypoxic gene signatures because they were developed from patient data using different methodologies (two based on classification and one based on regression). The first signature of interest was derived by Toustrup et al, who identified a set of 15 pH-independent hypoxia-related genes.9 The signature was initially validated in mouse models as well as on an HNSCC cohort of 58 patients, in which hypoxia was previously determined using electrode measurements. The gene signature was then used to classify a cohort of 323 patients as having less or more hypoxic tumours. The second signature was designed by Eustace et al, and is based on hypoxia-associated genes measured on a microarray platform.24 Eustace et al tested this hypoxia signature, comprising 26 genes, in HNSCC (laryngeal) cancer patients and bladder cancer patients enrolled in prospective Phase III randomized trials. The third gene signature of interest for this study was derived by Lendahl et al, who carried out an in silico meta-analysis of published microarray gene expression data to identify a pan-cancer hypoxia signature.8 Each of these signatures was considered because of its wide citation in the literature, its derivation from patient-level data, and its potential for wider utility. Firstly, comparing the composition of these signatures, we noted that despite being predictive of the same outcome, radiation response, in a single cancer type, there is just a single gene common to the three signatures.
Next, we performed a quality analysis, to ensure validity when applying these signatures on the TCGA data set with sigQC and noted that all three signatures show good quality on this data set and are consistent between each other (Figure 4). Briefly, Figure 4 reveals that each of the three signatures has similar quality when applied on the TCGA data set, with the Eustace gene expression signature showing a slight reduction in quality, likely owing to differences in intrasignature gene score correlation, most likely due to model-specificity in its derivation. Signature-data set combinations are ranked in the legend by overall quality measure, with the numeric value in brackets representing the area contained within the radar chart lines for each signature-data set combination.
Figure 4.
Each of the signatures (Toustrup, Lendahl, and Eustace) shows high quality across multiple sigQC metrics, suggesting validity in application with the TCGA data set. Radar plots produced for sigQC summary metric evaluation of Toustrup, Lendahl, and Eustace signatures on the TCGA HNSCC data set. Each ray of the radar plot evaluates one of the summary metrics checked by sigQC16 for gene signature quality prior to application on a data set. HNSCC, head and neck squamous cell carcinoma; TCGA, The Cancer Genome Atlas.
We then examined the correlations between the signature scores themselves across the TCGA cohort of patients, using the Spearman correlation. We found that the Toustrup and Lendahl signatures, and the Toustrup and Eustace signatures, were strongly correlated (~0.8), whereas the Lendahl and Eustace signatures were weakly correlated (~0.46) (Figure 5, left panel), which is consistent with the study by Tawk et al.25 Having identified differences in the signatures’ gene composition, but relative similarity in their behaviour, we next asked whether similar biological processes could be encapsulated by these signatures. To achieve this, we performed pathway analysis using GO terms from the MSigDB database made available by the Broad Institute.26 At a false discovery rate < 10%, 13, 37 and 51 transcriptional pathways were found to be enriched among the genes of the Toustrup, Eustace and Lendahl signatures, respectively. Only three pathways were enriched in all three signatures (Figure 5, right panel), namely “GO:Oxidation Reduction Process,” “GO:Glucose Metabolic Process” and “GO:Monosaccharide Metabolic Process.” Intracellular oxidation reduction reactions can be affected by free radicals induced by ionizing radiation. These cellular reactions might contribute to the activation of protective or damaging processes that could modify the damaging effects of radiation,27 making the oxidation reduction pathway an important contributor to radiation response. Metabolic pathways also play a crucial role in altering the radiosensitivity of cancer cells. Several preclinical studies have shown that the glucose metabolism of cancer cells is likely to be involved in alterations of radioresistance.28,29 It has also been shown that interfering with the glucose metabolism of tumour cells to reduce the levels of antioxidants could potentially improve response to radiotherapy.28,29
Figure 5.
Comparison of hypoxia gene signatures in TCGA HNSCC. Left panel: Correlation of signature scores. Right panel: Venn diagram illustrating the transcriptional pathways enriched using the three head and neck hypoxia signatures (FDR < 10%). FDR, false discovery rate; HNSCC, head and neck squamous cell carcinoma; TCGA, The Cancer Genome Atlas.
Using hypoxia gene signatures to identify hypoxia-associated miRNA
mRNA gene expression signatures, as generated over the past decade, have enabled a greater understanding of cellular processes and phenotypic responses to environmental changes, such as hypoxia. Using RadiationGeneSigDB, the curated hypoxia gene signatures at the mRNA level were used to infer the function of miRNA using a robust pan-cancer statistical approach. Briefly, as described in the methods section, a graduated linear modelling approach with a pan-cancer data set was used to identify those miRNA with significant and recurring statistical association with the overall behaviour of the hypoxia gene signatures. A strong, broadly representative, and well-curated database of hypoxia gene signatures is essential to this approach, as it underlies the strength of the miRNA associations determined through the linear model.
As such, using the curated database of 24 hypoxia gene signatures and data from patient samples from 15 epithelial tumour types as summarized in Supplementary Material 1, we obtained a set of miRNA highly associated both positively and negatively to these hypoxia gene signatures. As a result of both the high quality of the signatures considered, as well as the robust statistical approach considered, the signature-associated miRNA obtained include many of those that have already been independently validated, as summarized in Table 1. Importantly, the strength of this approach is its versatility—with any such well-curated list of gene signatures, or indeed a different set of genomic data, novel associations of miRNA to phenotype can be discovered, underscoring the utility of this resource as a tool for biological discovery.
Table 1.
miRNA significantly positively and negatively associated with hypoxia gene signatures (top 10 displayed)
miRNA positively associated with hypoxia | Rank product p value | miRNA negatively associated with hypoxia | Rank product p value |
---|---|---|---|
hsa-miR-210-3p | 3.15E-42 | hsa-miR-374a-5p | 2.83E-31 |
hsa-miR-15b-5p | 1.98E-27 | hsa-miR-29c-5p | 3.04E-21 |
hsa-miR-130b-5p | 1.03E-21 | hsa-miR-1307-3p | 1.91E-18 |
hsa-miR-93-5p | 2.38E-18 | hsa-miR-30e-3p | 4.23E-15 |
hsa-miR-21-3p | 5.99E-16 | hsa-miR-30e-5p | 1.46E-14 |
hsa-miR-106b-5p | 1.94E-10 | hsa-let-7i-3p | 1.65E-12 |
hsa-miR-21-5p | 2.96E-10 | hsa-miR-26a-5p | 1.12E-11 |
hsa-miR-223-3p | 3.15E-10 | hsa-let-7a-3p | 1.27E-11 |
hsa-miR-15b-3p | 9.14E-10 | hsa-miR-362-5p | 8.40E-11 |
miRNA are identified as significantly associated by way of the linear modelling schema, as outlined in the methods section, and by considering the association between miRNA expression and hypoxia gene signature score across each of the 24 gene signatures curated into RadiationGeneSigDB.
We further extended this analysis to obtain not only the miRNA both positively and negatively associated with hypoxia, but also those genes which are in the hypoxia gene signatures, negatively correlated to the identified downregulated hypoxia miRNA (Figure 6).
Figure 6.
Top 28 miRNA–mRNA interactions for hypoxia gene signature associated miRNA and hypoxia gene signature mRNA, across 15 cancer types. Left half of the circos plot depicts the miRNA species and the right half of the circos plot depicts the hypoxia gene signature mRNA that show the greatest degree of negative correlation in association with miRNA negatively associated with gene signature score. That is, the miRNA–mRNA pairs with the strongest negative correlations across cancer types (i.e. de-repressed mRNA), when considering the hypoxia miRNA and hypoxia gene signature genes are depicted. For ease of interpretation, the top 28 pairs are shown in the above plot.
This analysis suggests a potential mechanism behind the association between miRNA and the genes of the gene signatures. That is, we considered the miRNA decreased in hypoxia and asked which genes of the gene signatures are strongly negatively associated with them, thereby identifying mRNA of the hypoxia gene signatures that are potentially de-repressed by miRNA, although functional validation would be required to confirm this. Specifically, this analysis points towards a role for decreased mRNA levels of LOX, TIMP2, OSTM1, and CCNF, in association with poor prognosis in breast,30 carcinoma,31 ovarian,32 and breast33 cancers, respectively.
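The pair selection described above can be sketched as follows. This is an illustrative Python example rather than the authors' code: it assumes Spearman correlation as the measure of association, and the function names, input dictionaries, and placeholder miRNA and gene labels are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def top_negative_pairs(mirna_expr, mrna_expr, n_pairs):
    """Return up to n_pairs (rho, miRNA, gene) tuples with the most negative rho.

    mirna_expr / mrna_expr: dicts mapping a name to per-sample expression values,
    with samples in the same order in both (assumed layout).
    """
    pairs = [
        (spearman(m_values, g_values), mirna, gene)
        for mirna, m_values in mirna_expr.items()
        for gene, g_values in mrna_expr.items()
    ]
    negative = [pair for pair in pairs if pair[0] < 0]
    return sorted(negative)[:n_pairs]
```

In a pan-cancer setting, one would presumably compute these correlations within each cancer type and then aggregate them; that aggregation step is not shown here.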
Discussion
In this era of personalized medicine, an area of current interest in radiation oncology is the study of changes in the transcriptome induced by radiation therapy, termed radiogenomics. With the evolution of transcriptomic sequencing technologies, radiogenomics has emerged as a new research field that can help decode the genetic events induced by radiation therapy and identify biomarkers predictive of radiation response. Accordingly, the number of available gene expression-based signatures built under oxic and hypoxic conditions is increasing. This poses two primary questions in the field: (i) how reliable are these signatures when applied across a compendium of data sets in different model systems representative of real-life heterogeneity; and (ii) is there redundancy among identified and well-validated gene signatures?1 To address these questions, we curated a database of 35 gene expression signatures predictive of radiation response under both oxic and hypoxic conditions. These signatures come from a variety of sources and encompass a number of derivation techniques (e.g. classification,4 regression,6 clustering,34 co-expression networks7) using gene expression data across different types of cancers such as breast, carcinoma, head and neck squamous cell carcinoma, soft tissue sarcoma, carcinoma and pan-cancer.
We demonstrated the utility of this database by using it to identify miRNA positively and negatively associated with hypoxia across a wide sample of 15 epithelial tumour types with patient-level genomic data. In addition, we investigated differences in the behaviour of these gene signatures using transcriptomic data from both cell lines and ex vivo patient samples. We identified moderate correlations between breast cancer oxic signatures in patient data and showed that, surprisingly, the correlation between signature scores depended on the molecular subtype of breast cancer.22,23 This is consistent with previous studies of gene signatures, in which single-cell analysis has revealed substantial genomic heterogeneity, thereby necessitating careful analysis of gene signatures to ensure consistent behaviour across all heterogeneous subsets in a given data set.35 Indeed, we suggest that secondary analysis of gene signatures on an independent and more heterogeneous data set is crucial for identifying when signatures diverge in their behaviour. Ultimately, such divergence suggests that there may have been some bias in the training data set or the signature derivation process, and that in a particular group of patients a given signature may perform better than others. Importantly, this implies that signatures must be carefully checked for both concordance and biological validity on heterogeneous data sets before clinical application, to better account for the wide variety of situations in which they will be used. Moreover, although the head and neck hypoxia gene signatures we tested were concordant in their computed scores,25 we found that a different set of biological processes was enriched for the genes of each signature when applied to patient data.
Re-analysis of existing signatures for overlapping biological pathways may facilitate a greater understanding of the mechanisms leading to phenotype, and ultimately may lead to improved, more mechanistic gene signatures.
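As an illustration of the subtype-stratified concordance check discussed above, the following minimal Python sketch scores two signatures per sample and correlates the scores within each subtype. It rests on stated assumptions rather than the authors' R code: the signature score is taken to be the per-sample median expression of the signature genes, Pearson correlation is used as the concordance measure, and all function names and input layouts are hypothetical.

```python
from collections import defaultdict
from statistics import median


def signature_score(expr, genes):
    """Per-sample median expression over the signature genes present in expr.

    expr: dict mapping gene -> list of per-sample expression values (assumed layout).
    Genes absent from expr are skipped.
    """
    present = [g for g in genes if g in expr]
    if not present:
        raise ValueError("none of the signature genes are in the expression data")
    n_samples = len(expr[present[0]])
    return [median(expr[g][i] for g in present) for i in range(n_samples)]


def pearson(x, y):
    """Pearson correlation; returns NaN if either vector has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / (var_x * var_y) ** 0.5


def correlation_by_subtype(score_a, score_b, subtypes, min_samples=3):
    """Correlate two signature scores separately within each subtype label.

    Subtypes with fewer than min_samples samples are left out of the result.
    """
    groups = defaultdict(list)
    for a, b, subtype in zip(score_a, score_b, subtypes):
        groups[subtype].append((a, b))
    return {
        subtype: pearson([a for a, _ in pairs], [b for _, b in pairs])
        for subtype, pairs in groups.items()
        if len(pairs) >= min_samples
    }
```

A large spread in the per-subtype correlations would flag a pair of signatures whose agreement depends on subtype, which is the kind of divergence the text argues should be checked before clinical application.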
Translation of research-based omics biomarkers to the clinical setting requires a rigorous development and validation process. Evaluating the readiness of biomarkers for clinical care requires careful consideration of their analytical and clinical validity, along with their potential clinical utility. In this regard, the Institute of Medicine conducted a review and proposed 30 criteria for the use of omics-based biomarkers in a clinical trial.35 Specifically, these are broadly categorized as follows: specimen issues; assay issues; model development, specification, and preliminary performance evaluation; clinical trial design; and ethical, legal, and regulatory issues.36 The analysis presented here through case studies for RadiationGeneSigDB provides a means of testing specification and performance evaluation on a diverse set of data sets. Hence, we hope that these criteria, in combination with tools such as RadiationGeneSigDB and the recently developed RadioGx platform (a compendium of radiogenomic data sets),37 will pave the way for researchers to develop robust, reproducible, and clinically translatable biomarkers that can improve patient outcomes.
Availability of data and material
RadiationGeneSigDB is implemented in R, as are the three case studies demonstrating the utility of the database. The source code of the package and the signatures can be downloaded from GitHub: https://github.com/vmsatya/RadiationGeneSigDB
Footnotes
Acknowledgment: The authors thank the scientific community for sharing their valuable data.
Contributor Information
Venkata SK Manem, Email: mail2mvskumar@gmail.com.
Andrew Dhawan, Email: dhawana@ccf.org.
REFERENCES
1. Baumann M, Krause M, Overgaard J, Debus J, Bentzen SM, Daartz J, et al. Radiation oncology in the era of precision medicine. Nat Rev Cancer 2016; 16: 234–49. doi: 10.1038/nrc.2016.18
2. Scott JG, Berglund A, Schell MJ, Mihaylov I, Fulp WJ, Yue B, et al. A genome-based model for adjusting radiotherapy dose (GARD): a retrospective, cohort-based study. Lancet Oncol 2017; 18: 202–11. doi: 10.1016/S1470-2045(16)30648-9
3. Abazeed ME, Adams DJ, Hurov KE, Tamayo P, Creighton CJ, Sonkin D, et al. Integrative radiogenomic profiling of squamous cell lung cancer. Cancer Res 2013; 73: 6289–98. doi: 10.1158/0008-5472.CAN-13-1616
4. Amundson SA, Do KT, Vinikoor LC, Lee RA, Koch-Paiz CA, Ahn J, et al. Integrating global gene expression and radiation survival parameters across the 60 cell lines of the National Cancer Institute anticancer drug screen. Cancer Res 2008; 68: 415–24. doi: 10.1158/0008-5472.CAN-07-2120
5. Torres-Roca JF, Eschrich S, Zhao H, Bloom G, Sung J, McCarthy S, et al. Prediction of radiation sensitivity using a gene expression classifier. Cancer Res 2005; 65: 7169–76. doi: 10.1158/0008-5472.CAN-05-0656
6. Speers C, Zhao S, Liu M, Bartelink H, Pierce LJ, Feng FY. Development and validation of a novel radiosensitivity signature in human breast cancer. Clin Cancer Res 2015; 21: 3667–77. doi: 10.1158/1078-0432.CCR-14-2898
7. Buffa FM, Harris AL, West CM, Miller CJ. Large meta-analysis of multiple cancers reveals a common, compact and highly prognostic hypoxia metagene. Br J Cancer 2010; 102: 428–35. doi: 10.1038/sj.bjc.6605450
8. Lendahl U, Lee KL, Yang H, Poellinger L. Generating specificity and diversity in the transcriptional response to hypoxia. Nat Rev Genet 2009; 10: 821–32. doi: 10.1038/nrg2665
9. Toustrup K, Sørensen BS, Nordsmark M, Busk M, Wiuf C, Alsner J, et al. Development of a hypoxia gene expression classifier with predictive impact for hypoxic modification of radiotherapy in head and neck cancer. Cancer Res 2011; 71: 5923–31. doi: 10.1158/0008-5472.CAN-11-1182
10. Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 2012; 483: 603–7. doi: 10.1038/nature11003
11. Curtis C, Shah SP, Chin S-F, Turashvili G, Rueda OM, Dunning MJ, et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 2012; 486: 346–52. doi: 10.1038/nature10983
12. Weinstein JN, Collisson EA, Mills GB, Shaw KRM, Ozenberger BA, Ellrott K, et al. The Cancer Genome Atlas pan-cancer analysis project. Nat Genet 2013; 45: 1113–20. doi: 10.1038/ng.2764
13. Panoff JE, Wright JL. Breast cancer subtypes and the risk of local and regional relapse. Yearbook of Oncology 2011; 2011: 57–60. doi: 10.1016/j.yonc.2011.10.002
14. Onitilo AA, Engel JM, Greenlee RT, Mukesh BN. Breast cancer subtypes based on ER/PR and HER2 expression: comparison of clinicopathologic features and survival. Clin Med Res 2009; 7(1-2): 4–13. doi: 10.3121/cmr.2008.825
15. Haibe-Kains B, Desmedt C, Loi S, Culhane AC, Bontempi G, Quackenbush J, et al. A three-gene model to robustly identify breast cancer molecular subtypes. J Natl Cancer Inst 2012; 104: 311–25. doi: 10.1093/jnci/djr545
16. Dhawan A, Barberis A, Cheng W-C, Domingo E, West C, Maughan T, et al. Guidelines for using sigQC for systematic evaluation of gene signatures. Nat Protoc 2019; 14: 1377–400. doi: 10.1038/s41596-019-0136-8
17. Väremo L, Nielsen J, Nookaew I. Enriching the gene set analysis of genome-wide data by incorporating directionality of gene expression and combining statistical hypotheses and methods. Nucleic Acids Res 2013; 41: 4378–91. doi: 10.1093/nar/gkt111
18. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological) 1995; 57: 289–300.
19. Dhawan A, Scott JG, Harris AL, Buffa FM. Pan-cancer characterisation of microRNA across cancer hallmarks reveals microRNA-mediated downregulation of tumour suppressors. Nat Commun 2018; 9: 5228. doi: 10.1038/s41467-018-07657-1
20. Piening BD, Wang P, Subramanian A, Paulovich AG. A radiation-derived gene expression signature predicts clinical outcome for breast cancer patients. Radiat Res 2009; 171: 141–54. doi: 10.1667/RR1223.1
21. Lee C-T, Zhou Y, Roy-Choudhury K, Siamakpour-Reihani S, Young K, Hoang P, et al. Subtype-specific radiation response and therapeutic effect of Fas death receptor modulation in human breast cancer. Radiat Res 2017; 188: 169–80. doi: 10.1667/RR14664.1
22. Smith L, Qutob O, Watson MB, Beavis AW, Potts D, Welham KJ, et al. Proteomic identification of putative biomarkers of radiotherapy resistance: a possible role for the 26S proteasome? Neoplasia 2009; 11: 1194–207. doi: 10.1593/neo.09902
23. Langlands FE, Horgan K, Dodwell DD, Smith L. Breast cancer subtypes: response to radiotherapy and potential radiosensitisation. Br J Radiol 2013; 86: 20120601. doi: 10.1259/bjr.20120601
24. Eustace A, Mani N, Span PN, Irlam JJ, Taylor J, Betts GNJ, et al. A 26-gene hypoxia signature predicts benefit from hypoxia-modifying therapy in laryngeal cancer but not bladder cancer. Clin Cancer Res 2013; 19: 4879–88. doi: 10.1158/1078-0432.CCR-13-0542
25. Tawk B, Schwager C, Deffaa O, Dyckhoff G, Warta R, Linge A, et al. Comparative analysis of transcriptomics based hypoxia signatures in head- and neck squamous cell carcinoma. Radiother Oncol 2016; 118: 350–8. doi: 10.1016/j.radonc.2015.11.027
26. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 2005; 102: 15545–50. doi: 10.1073/pnas.0506580102
27. Spitz DR, Azzam EI, Li JJ, Gius D. Metabolic oxidation/reduction reactions and cellular responses to ionizing radiation: a unifying concept in stress response biology. Cancer Metastasis Rev 2004; 23(3-4): 311–22. doi: 10.1023/B:CANC.0000031769.14728.bc
28. Hirschhaeuser F, Sattler UGA, Mueller-Klieser W. Lactate: a metabolic key player in cancer. Cancer Res 2011; 71: 6921–5. doi: 10.1158/0008-5472.CAN-11-1457
29. Meijer TWH, Kaanders JHAM, Span PN, Bussink J. Targeting hypoxia, HIF-1, and tumor glucose metabolism to improve radiotherapy efficacy. Clin Cancer Res 2012; 18: 5585–94. doi: 10.1158/1078-0432.CCR-12-0858
30. Barker HE, Cox TR, Erler JT. The rationale for targeting the LOX family in cancer. Nat Rev Cancer 2012; 12: 540–52. doi: 10.1038/nrc3319
31. Kai AK-L, Chan LK, Lo RC-L, Lee JM-F, Wong CC-L, Wong JC-M, et al. Down-regulation of TIMP2 by HIF-1α/miR-210/HIF-3α regulatory feedback circuit enhances cancer metastasis in hepatocellular carcinoma. Hepatology 2016; 64: 473–87. doi: 10.1002/hep.28577
32. Engqvist H, Parris TZ, Rönnerman EW, Söderberg EMV, Biermann J, Mateoiu C, et al. Transcriptomic and genomic profiling of early-stage ovarian carcinomas associated with histotype and overall survival. Oncotarget 2018; 9: 35162–80. doi: 10.18632/oncotarget.26225
33. Huang E, Cheng SH, Dressman H, Pittman J, Tsou MH, Horng CF, et al. Gene expression predictors of breast cancer outcomes. The Lancet 2003; 361: 1590–6. doi: 10.1016/S0140-6736(03)13308-9
34. Starmans MHW, Chu KC, Haider S, Nguyen F, Seigneuric R, Magagnin MG, et al. The prognostic value of temporal in vitro and in vivo derived hypoxia gene-expression signatures in breast cancer. Radiother Oncol 2012; 102: 436–43. doi: 10.1016/j.radonc.2012.02.002
35. Enge M, Arda HE, Mignardi M, Beausang J, Bottino R, Kim SK, et al. Single-cell analysis of human pancreas reveals transcriptional signatures of aging and somatic mutation patterns. Cell 2017; 171: 321–30. doi: 10.1016/j.cell.2017.09.004
36. McShane LM, Cavenagh MM, Lively TG, Eberhard DA, Bigbee WL, Williams PM, et al. Criteria for the use of omics-based predictors in clinical trials. Nature 2013; 502: 317–20. doi: 10.1038/nature12564
37. Manem VSK, Lambie M, Smirnov P, Kofia V, Freeman M, Koritzinsky M, et al. Modeling cellular response in large-scale radiogenomic databases to advance precision radiotherapy [Internet]. 2018.