Abstract
Relapse of Ewing sarcoma (ES) can occur months or years after initial remission, and salvage therapy for relapsed disease is usually ineffective. Thus, there is great need to develop biomarkers that can predict which patients are at risk for relapse so that therapy and post‐therapy evaluation can be adjusted accordingly. For this study, we performed whole genome expression profiling on two independent cohorts of clinically annotated ES tumours in an effort to identify and validate prognostic gene signatures. ES specimens were obtained from the Children's Oncology Group and whole genome expression profiling was performed using Affymetrix Human Exon 1.0 ST arrays. Lists of differentially expressed genes between survivors and non‐survivors were used to identify prognostic gene signatures. An independent cohort of tumours from the Euro‐Ewing cooperative group was similarly analysed as a validation cohort. Unsupervised clustering of gene expression data failed to segregate tumours based on outcome. Supervised analysis of survivors versus non‐survivors revealed a small number of differentially expressed genes and several statistically significant gene signatures. Gene set enrichment analysis demonstrated that integrin and chemokine genes were associated with survival in tumours where stromal contamination was present. Tumours that did not harbour stromal contamination showed no association of any genes or pathways with clinical outcome. Our results reflect the challenges of performing RNA‐based assays on archived bone tumour specimens. In addition, they reveal a key role for tumour stroma in determining ES prognosis. Future biological and clinical investigations should focus on elucidating the contribution of tumour:micro‐environment interactions to ES progression and response to therapy.
Keywords: Ewing sarcoma, gene expression profiling, prognostic signature
Introduction
Ewing sarcomas (ES) are highly malignant neoplasms of bone and soft tissue often affecting children, adolescents and young adults. Although metastatic ES is still usually fatal, intensification of multimodality therapy has improved outcomes for patients with localized disease 1. Patients with localized tumours treated on the experimental arm of the most recent Children's Oncology Group (COG) study (AEWS0031) experienced 5‐year event‐free survival (EFS) rates near 75% 2. Similar results were obtained for the Euro‐Ewing99‐R1 study within the European cooperative groups 3. Such intensive therapy results in significant and often life‐threatening short‐ and long‐term morbidities 4, 5. Moreover, despite dose intensification and aggressive local control, relapses can occur months or years after initial clinical remission, and salvage therapy is usually ineffective 1. Therefore, cure of ES is largely dependent on eradication of the disease during initial therapy, and biomarkers are needed that can predict relapse at the time of diagnosis. Although several copy number alteration and TP53 mutational studies have shown promise as prognostic biomarkers, none have yet been successfully validated prospectively 6.
There has been abundant research to evaluate whether gene expression profiling can be used to risk‐stratify cancer patients at diagnosis. First demonstrated to be feasible in breast cancer 7, this prognostic approach has been evaluated and validated in other human cancers 8, 9, including paediatric malignancies such as neuroblastoma 10, 11, 12, rhabdomyosarcoma 13, 14, 15, and leukaemia 16, 17. Several small ES genome‐wide profiling studies have been reported, and non‐overlapping candidate prognostic biomarkers were identified 18, 19, 20, 21. However, none of the candidate prognostic gene signatures has been prospectively validated in independent cohorts of equivalently treated patients.
For this study, we profiled gene expression in ES biopsies collected from patients on COG therapeutic studies. These gene profiles were used to identify differentially expressed genes and gene signatures that associated with clinical outcome. We also tested whether identified biomarkers could be validated in an independent set of tumours from patients treated on parallel European Cooperative group trials. Our findings reveal a key role for tumour–stromal interactions in determining prognosis‐associated genes in ES.
Materials and Methods
Sample accrual
Tumour specimens obtained from the COG Biorepository in Columbus, OH (Cooperative Human Tissue Network, CHTN) were prospectively acquired from patients on clinical trials INT‐0154 (CCG‐7942, POG‐9354) and AEWS0031, the two most recent protocols for localized ES. An independent set of tumours was obtained from the EuroEWING tumour biorepository in Münster, Germany. These were prospectively acquired from patients registered on European Intergroup Cooperative Ewing's Sarcoma Study (EICESS) 92 and Euro‐Ewing 99 3, 22. Criteria for inclusion of tumours in this molecular profiling study were confirmation of localized disease at presentation, registration on a clinical trial (as above), and availability of outcome data and frozen tumour tissue. Diagnosis of ES was reaffirmed by pathological review, and estimates of viable tumour cells relative to non‐tumour cells and of tumour necrosis were made for all samples using haematoxylin and eosin stained sections. Molecular analysis of COG and EuroEWING tumours was performed using RT‐PCR for EWS‐FLI1 and EWS‐ERG fusions, as previously reported 23, 24. All tumours were assigned an anonymous identifier, and deidentified specimens and clinical data were provided to the investigators. All samples and clinical correlative data were obtained in compliance with the Health Insurance Portability and Accountability Act. Review and approval by participating institutions was obtained in accordance with an assurance filed with and approved by the Department of Health and Human Services (US institutions) or European authorities. Informed consent for use of tumour samples for research was obtained from each subject or subject's guardian prior to collection and banking of the tissue.
RNA isolation and exon array pre‐processing
Total RNA was isolated using miRNeasy kits (Qiagen, Valencia, CA). RNA concentrations were measured using a Nanodrop ND‐1000 spectrophotometer (Nanodrop Technologies, Rockland, DE), and the RNA integrity number (RIN) was evaluated using the RNA 6000 Pico assay (Agilent Technologies, Santa Clara, CA). RNA samples with RIN values of <4.0 were subjected to an RNA cleanup step using the mRNAeasy kit (Qiagen, Valencia, CA). RNA samples with a RIN value of ≥4.0 were analysed using Affymetrix GeneChip Human Exon 1.0 ST arrays. Samples were processed in the Genomics Core at Children's Hospital Los Angeles according to Affymetrix protocols (Affymetrix, Santa Clara, CA). Affymetrix Power Tools (APT) was used to generate normalized gene‐level signal intensity estimates 25. Affymetrix library files and annotation files were downloaded from the company's website (www.affymetrix.com). Processing included background correction, normalization, log2 transformation and probeset summarization. Only core probesets uniquely mapped to the genome were used. Various quality control measures were assessed, including density plots and the mean absolute deviation of the residuals. Multi‐dimensional scaling was used to detect any sample outliers. Genes that were under‐detected in the samples of interest were filtered out by requiring detection of more than half of the probesets in a gene (detection above background, DABG, p‐value < 0.05) and retaining only genes detected in at least 60% of the samples in each group. The empirical Bayes ComBat algorithm was applied to remove any batch effects 26.
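The gene‐level detection filter described above can be illustrated with a minimal Python sketch. It is not the APT‐based pipeline used in the study; the function names, the dictionary layout and the reading of "at least 60% of the samples in each group" as a requirement that must hold in every group are assumptions made for illustration.

```python
def gene_detected(dabg_pvalues, alpha=0.05):
    """A gene counts as detected in one sample when more than half of its
    probesets have a DABG p-value below alpha."""
    n_detected = sum(1 for p in dabg_pvalues if p < alpha)
    return n_detected > len(dabg_pvalues) / 2


def keep_gene(dabg_by_sample, groups, min_fraction=0.6):
    """Decide whether one gene passes the filter.

    dabg_by_sample: {sample_id: [DABG p-value per probeset]} (hypothetical layout)
    groups: {group_name: [sample_id, ...]}, e.g. survivors and non-survivors
    The gene is kept only if it is detected in at least min_fraction of the
    samples in every group (assumed interpretation of the text).
    """
    for sample_ids in groups.values():
        n_hit = sum(gene_detected(dabg_by_sample[s]) for s in sample_ids)
        if n_hit < min_fraction * len(sample_ids):
            return False
    return True
```

Applying `keep_gene` to every gene would give the filtered gene list that enters downstream analysis.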
Consensus clustering and determination of differentially expressed genes
We used consensus clustering to determine if groups could be segregated based on gene expression patterns. Briefly, the top 5,000 most variable genes, as determined by median absolute deviation, were selected. Then 80% of the samples were re‐sampled 1,000 times. Each time, an agglomerative hierarchical clustering algorithm was applied to a (1 − Pearson correlation) distance matrix using the R package ConsensusClusterPlus 27. Differentially expressed genes were identified by applying the moderated t‐test implemented in the Bioconductor package limma 28. The p‐values were corrected for multiple hypothesis testing using the Benjamini‐Hochberg procedure 29. Genes with a corrected p‐value < 0.2 and absolute fold‐change > 1.3 were considered differentially expressed between survivors and non‐survivors. The Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.7 30 was used for the functional enrichment analysis of the differentially expressed genes. The false‐discovery rate (FDR) threshold for enriched gene ontology terms was set to 0.25.
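The multiple‐testing correction and thresholding step can be sketched as follows. This Python fragment assumes that per‐gene p‐values and log2 fold changes are already available (the moderated t‐test itself is not reproduced here), and the function names are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone in the rank of the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_expressed(log2_fc, pvalues, fdr=0.2, min_fold=1.3):
    """Flag genes with adjusted p < fdr and absolute fold change > min_fold.
    Fold changes are assumed to be supplied on the log2 scale."""
    adjusted = benjamini_hochberg(pvalues)
    cutoff = math.log2(min_fold)
    return [abs(fc) > cutoff and q < fdr for fc, q in zip(log2_fc, adjusted)]
```

Working on the log2 scale makes the fold‐change cut‐off symmetric, so a 1.3‐fold decrease is treated the same way as a 1.3‐fold increase.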
Generation of putative prognostic gene expression signatures
To assess the potential for identifying prognostic gene signatures by chance, we randomly selected sets of 5, 10 and 20 differentially expressed genes as candidate signatures and calculated the area under the receiver operating characteristic curve (AUC) for each 31. The same procedure was repeated 10,000 times for same‐sized random gene sets drawn from the 10,824 genes used in the study. The number of times the AUC from the random gene sets exceeded that from the differentially expressed gene sets was used to calculate the permutation p‐value. Gene sets with a permutation p‐value < 0.05 were considered putative prognostic gene signatures.
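A minimal Python sketch of this permutation test is given below. The text does not state how a signature score is computed from its member genes, so the mean expression across the signature is used here purely as an assumption. The function names and the coding of labels (1 = non‐survivor, 0 = survivor) are likewise hypothetical.

```python
import random


def auc(scores, labels):
    """Probability that a randomly chosen positive case scores higher than a
    randomly chosen negative case (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def signature_auc(expr, genes, labels):
    """expr: {gene: [expression per sample]}.
    Signature score = mean expression over the genes (assumption)."""
    n_samples = len(labels)
    scores = [sum(expr[g][j] for g in genes) / len(genes) for j in range(n_samples)]
    return auc(scores, labels)


def permutation_pvalue(expr, signature, labels, n_perm=10000, seed=0):
    """Fraction of same-sized random gene sets whose AUC exceeds that of the
    candidate signature."""
    rng = random.Random(seed)
    observed = signature_auc(expr, signature, labels)
    all_genes = sorted(expr)
    exceed = 0
    for _ in range(n_perm):
        random_set = rng.sample(all_genes, len(signature))
        if signature_auc(expr, random_set, labels) > observed:
            exceed += 1
    return exceed / n_perm
```

Because the candidate signatures were themselves selected on the same tumours, a small permutation p‐value from such a procedure indicates only that the signature outperforms random gene sets in this cohort, not that it will generalize.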
To rank the genes according to their contribution in discriminating the samples with different survival status, the variable importance measure (VIM) of each was calculated using an AUC‐based permutation measure derived from a random forest classification (Bioconductor package ctree, 32). The top 5, 10 and 20 genes with highest VIM values were selected, and their prognostic signature scores and the AUCs were computed and compared with those from randomly selected gene signatures of the same size.
Survival status prediction
To build a classification model, random forest, support vector machines and logistic regression were evaluated 33, 34, 35. Random forest outperformed the other methods and was chosen for classification modelling. Briefly, for each candidate signature, genes with pairwise absolute correlations above 0.75 were excluded to reduce redundancy. Next, samples were randomly split into training (2/3) and testing (1/3) sets. The ntree (number of trees) parameter was set to 500, and leave‐one‐out cross‐validation was used to evaluate the effect on performance of different values of mtry (the number of genes sampled at each split; mtry = 2, 3, 4, 5, 6). Using the R package caret 36, the model was then re‐trained on the training set using the optimal values to obtain a final prediction model for each gene signature.
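The two preparatory steps, pruning of correlated genes and the random 2/3 versus 1/3 split, can be sketched in Python as below. The greedy pruning rule is an assumption chosen for simplicity and is not necessarily the heuristic applied by caret. The random forest itself is not reproduced, and all function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def prune_correlated(expr, genes, threshold=0.75):
    """Greedy filter (assumed rule): keep a gene only if its absolute
    correlation with every gene already kept is at most the threshold."""
    kept = []
    for g in genes:
        if all(abs(pearson(expr[g], expr[k])) <= threshold for k in kept):
            kept.append(g)
    return kept


def split_train_test(sample_ids, train_fraction=2 / 3, seed=0):
    """Randomly split sample identifiers into training and testing sets."""
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]
```

With a greedy rule of this kind, the result depends on the order in which genes are visited, so ordering genes by importance before pruning would favour retention of the most informative member of each correlated group.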
Demographics and outcomes analyses
To determine if the analytic cohort was representative of patients registered on the therapeutic studies, each characteristic was checked separately by the exact conditional test of proportions 37. Age at enrollment was checked as a categorical variable (<10, 10–17, ≥18 years) and also as a continuous variable using the t‐test. EFS and overall survival (OS) were compared between each analytic group and corresponding study population. EFS was defined as time from enrollment until disease progression, diagnosis of a second malignant neoplasm, death or last patient contact, whichever occurred first. OS was taken to be the time from enrollment to death or last patient contact (at which time they were censored), whichever occurred first. EFS and OS were estimated by the method of Kaplan and Meier, and the relative risks for event and death were compared across the groups using the log‐rank test 38. Data from INT‐0154 (1995–1998) and AEWS0031 (2001–2008) current to July 2007 and March 2009, respectively, were used for analysis. The analyses were done in SAS 9.2 using PROC LIFETEST and PROC FREQ.
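For readers unfamiliar with the Kaplan–Meier method, the product‐limit estimate can be sketched in a few lines of Python. The study analyses were run in SAS, so this fragment is only an illustration of the estimator, and the function name and input layout are assumptions.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each patient
    events: 1 if the event was observed at that time, 0 if censored
    Returns [(time, survival probability)] at each distinct event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        n_events = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        n_leaving = sum(1 for ti in times if ti == t)
        if n_events > 0:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Patients with an event or censored at time t leave the risk set.
        at_risk -= n_leaving
    return curve
```

Censored patients contribute to the risk set up to their last contact but never lower the curve themselves, which is why EFS and OS as defined above can be estimated despite incomplete follow‐up.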
Results
Banked ES tumour specimens yield limited samples with quality RNA
The COG has collected ES specimens in its central biorepository for 20 years, and since 2002 this has been achieved through two sequential banking protocols 39. Microarray profiling is optimally performed on RNA that is extracted from fresh or freshly frozen tumour samples, and at the time of study initiation, protocols for RNA profiling of formalin‐fixed, paraffin‐embedded samples had not yet been sufficiently developed. Therefore, we restricted our study to tumours for which frozen tissue was available. As shown (Figure 1A), 287 frozen tumour specimens were identified and subjected to RNA extraction. Fewer than 25% of these samples yielded RNA of sufficient quality (RIN ≥ 4.0) to proceed with array‐based analysis (Figure 1A, B). In sum, 69 Affymetrix CEL files were generated from 67 patients. Clinical and pathological data review of the 67 cases resulted in the exclusion of 11 tumours: eight from patients with metastatic disease, two from patients not registered on a therapeutic study and one with less than 10% viable tumour cell content. Thus, despite the relative abundance of banked material, only limited RNA of sufficient quality and quantity was available for array profiling.
Figure 1.

(A) Flowchart detailing sample selection and RNA isolation for COG tumour specimens. (B) Frequency distribution of RNA integrity (RIN) values across the 142 samples. Only samples with a RIN ≥4.0 were used for HuEx array analysis. (C) Kaplan–Meier curve demonstrating that event‐free survival for the 48 analysed patients treated on COG AEWS0031 (dotted line) is similar to that of the study population as a whole (solid line). (D) Kaplan–Meier curves demonstrating that the EFS for the eight analysed patients treated on INT‐0154 is similar to that of the study population as a whole.
The analytic cohort is representative of the general ES patient population
Affymetrix CEL files were generated from 56 unique, clinically annotated tumours. All were obtained from patients who were registered on either AEWS0031 (N = 48) or INT‐0154 (N = 8). There were no statistically significant differences in demographics between patients in the analytic population and the remainder of patients enrolled on the two studies from which the analytic population was drawn. However, there was a relative dearth of extremity primary tumour sites (18.8 vs. 35.6%) and an excess of non‐extremity, non‐pelvis primary tumour sites (62.5 vs. 48.8%) in the AEWS0031 tumours (p = 0.053). Pelvic tumours are difficult to biopsy and primary extremity ES in paediatric patients are mainly in bone. In contrast, soft tissue tumours are more readily accessible for biopsy sampling and are more likely to be submitted for correlative biology studies 39. Also, the requirement for decalcification would have further diminished the availability of fresh bone tumour material for this study. Thus, both surgical and pathological issues contributed to the relative over‐representation of non‐extremity, non‐pelvic tumours in the analytic cohort.
Next, clinical outcomes were compared. EFS and OS for the 48 analytic cases from AEWS0031 were 61.8 and 72.1%, respectively, compared to 70 and 81.3% for the study as a whole (EFS p = 0.2, OS p = 0.1) (Figure 1C). EFS for the eight patients registered on INT‐0154 was 75% (compared to study EFS = 71%, p = 0.8; Figure 1D), and OS was 72.9% (compared to study OS = 78.7%, p = 0.8). The analytic population was thus deemed to be representative of the general ES population with respect to both demographics and outcome.
Unsupervised analysis fails to discriminate tumours on the basis of clinical or pathological parameters
Quality control assessment of the 56 tumour data files resulted in the identification of two outliers and eight cases that clustered together yet deviated significantly from the remainder of samples (Figure 2A). The eight samples that clustered together were processed on a different slide scanner, and the discrepancy in chip signal intensity between these eight chips and the remaining samples was determined to be a technical artefact that could not be corrected using the batch effect correction algorithm (see Materials and Methods section). The two outliers showed expression profiles that deviated significantly from the remaining tumours. Having failed rigorous quality control, these ten cases were excluded from further analysis.
Figure 2.

(A) Multi‐dimensional scaling was applied to the 5,000 genes with the greatest variance in expression. The dimensionality reduction is displayed in three dimensions, illustrating two outliers and a batch of eight samples that were scanned and processed on a different instrument. (B) Unsupervised hierarchical clustering of the 2,201 most variable gene transcripts does not demonstrate any grouping of samples based on any clinical metric. (C) Flowchart describing the identification of differentially expressed genes and (D) candidate prognostic gene sets.
The remaining 46 tumours that were subjected to outcomes analysis are summarized in Table 1. Raw data CEL files and normalized data for these and the validation set (see below) are available at GEO (GSE63157). Unsupervised analyses of the 46 tumour profiles were first performed to determine if ES naturally segregate into different clinical or pathological groups on the basis of differential gene expression. To achieve this, we performed hierarchical clustering using the most highly variable transcripts (N = 2,201 transcripts with coefficient of variation (CV) > 0.15 across all tumours). As shown in Figure 2B, no segregation of tumours into distinct groups was evident. In fact, tumours from survivors and non‐survivors, bone and soft tissue origins and pelvic and non‐pelvic sites did not cluster together but were widely dispersed with respect to gene expression. Similarly, there was no clustering of tumour samples based on timing of the biopsy (before or after induction chemotherapy) or on molecular translocation type. In two cases, no fusion was detected and in a third, molecular diagnostics were unavailable. The recent discovery of alternative fusions in rare Ewing‐like sarcomas raises the possibility that these three cases might have been Ewing‐like tumours rather than ES 40, 41. However, given their clinical and histological diagnosis of ES and their inclusion in ES therapeutic studies, these patients were retained for analysis of prognostic gene signatures. Thus, genome‐wide expression profiling of this small but representative cohort of tumours suggests that distinct clinical sub‐groups, as defined by differential expression of protein‐encoding genes, do not exist in ES.
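The variability filter used to select transcripts for clustering amounts to a simple threshold on the coefficient of variation, as in the following Python sketch. Whether the CV was computed on log2 or linear intensities is not stated in the text, so the scale of the input values, the use of the sample standard deviation and the function names are all assumptions.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def most_variable(expr, cutoff=0.15):
    """expr: {transcript: [expression per tumour]} (hypothetical layout).
    Return the transcripts whose CV across all tumours exceeds the cutoff."""
    return [t for t, values in expr.items() if coefficient_of_variation(values) > cutoff]
```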
Table 1.
Patient demographics and tumour pathology: 46 COG patients
| Clinical features | | Pathological features | |
|---|---|---|---|
| Age | | Source of specimen | |
| Average (range) | 11.8 yr (3 mo‐19 yr) | Pre‐chemo biopsy | 38 |
| Median | 13 yr | Post‐chemo resection | 8 |
| Gender | | Histology | |
| Male | 27 | >70% viable tumour | 33 |
| Female | 19 | 50–70% viable tumour | 6 |
| Tissue of origin | | <50% viable tumour | 4 |
| Bone | 26 | No info | 3 |
| Extra‐osseous | 19 | Fusion (RT‐PCR) | |
| Not specified | 1 | Positive | 43 |
| Therapeutic study | | Negative | 2 |
| AEWS0031 | 38 (24 standard, 14 IC) | Unknown | 1 |
| INT‐0154 | 8 | | |
IC, interval compression
Differentially expressed genes and gene signatures associate with clinical outcome
Next, we performed supervised analyses to compare gene expression between tumours derived from survivors and non‐survivors (Figure 2C, D). This analysis identified only 33 differentially expressed genes, and all but five were up regulated in non‐survivors (Table 2). Gene ontology analysis of these differentially expressed genes revealed a significant enrichment for biological processes involved in cell motility, cell migration and cell adhesion, implicating differential expression of metastasis‐associated programmes in disease relapse and progression (Table 3). Interestingly, low levels of GSTM2, a gene that encodes for a key enzyme in glutathione metabolism, were associated with worse outcome (Table 2). This finding corroborates earlier studies that demonstrated an association between expression of other glutathione metabolism genes and ES outcome 21, 42. In addition, lower level expression of TET1 in poor prognosis tumours (Table 2) is of interest given the key role that TET1 plays in DNA demethylation and the recent discovery of loss of function mutations in TET genes in human cancer 43, 44.
Table 2.
Differentially expressed genes—non‐survivors vs. survivors gene list (FDR < 0.2 and FC > 1.3)
| Gene symbol | Fold change | p | Gene symbol | Fold change | p |
|---|---|---|---|---|---|
| ANPEP | 1.65 | 0.0004 | PLXNA2 | 1.50 | 0.0006 |
| C10orf10 | 1.51 | 0.0003 | PODXL | 1.51 | 0.0004 |
| CCL18 | 3.38 | 0.0001 | PTPRB | 1.70 | 0.0001 |
| CDH5 | 1.52 | 0.0003 | RAPGEF5 | 1.56 | 0.0001 |
| CFI | 1.47 | 0.0004 | RGS16 | 1.36 | 0.0004 |
| CTSC | 1.82 | 0.0007 | RPS6KA2 | 1.59 | 0.0005 |
| DCBLD1 | 1.55 | 0.0001 | SLC29A1 | 1.60 | 0.0000 |
| DDIT3 | 1.52 | 0.0003 | TP53I11 | 1.32 | 0.0005 |
| EMR2 | 1.31 | 0.0005 | TSPAN15 | 1.40 | 0.0003 |
| ENG | 1.42 | 0.0002 | VEGFC | 1.38 | 0.0002 |
| HEY2 | 1.59 | 0.0005 | VWF | 1.64 | 0.0001 |
| ICAM1 | 1.44 | 0.0005 | FBXO15 | 0.68 | 0.0005 |
| IL6 | 1.41 | 0.0004 | GSTM2 | 0.63 | 0.0002 |
| ITGA9 | 1.58 | 0.0001 | LOC100132167 | 0.60 | 0.0004 |
| LYN | 1.59 | 0.0003 | NBPF3 | 0.47 | 0.0000 |
| NKAIN1 | 1.45 | 0.0005 | TET1 | 0.63 | 0.0005 |
| PLEK | 1.64 | 0.0006 | | | |
Table 3.
Differentially expressed genes—non‐survivors vs. survivors: Enriched gene ontologies in non‐survivors (biological process p < 0.01 and >4 genes/category)
| Term | # of Genes | p |
|---|---|---|
| GO:0016477∼cell migration | 6 | 0.0002 |
| GO:0048870∼cell motility | 6 | 0.0003 |
| GO:0051674∼localization of cell | 6 | 0.0003 |
| GO:0009611∼response to wounding | 7 | 0.0004 |
| GO:0001568∼blood vessel development | 5 | 0.0011 |
| GO:0001944∼vasculature development | 5 | 0.0013 |
| GO:0007155∼cell adhesion | 7 | 0.0018 |
| GO:0022610∼biological adhesion | 7 | 0.0018 |
| GO:0006928∼cell motion | 6 | 0.0019 |
| GO:0001775∼cell activation | 5 | 0.0021 |
| GO:0042127∼regulation of cell proliferation | 7 | 0.0032 |
| GO:0006955∼immune response | 6 | 0.0092 |
Individual genes are rarely useful as prognostic biomarkers, whereas gene signatures can more often be successful predictors of outcome. Therefore, we next generated candidate gene signatures that incorporated three or more of the differentially expressed genes and calculated their potential as prognostic signatures. As expected, given that these signatures were derived from this same group of tumours, the ability of the signatures to classify survivors and non‐survivors was excellent (Table 4).
Table 4.
Differentially expressed genes—non‐survivors vs. survivors, prognostic gene signatures from differentially expressed genes
| Signature | AUC | Accuracy | Sens | Spec |
|---|---|---|---|---|
| CFI, RGS16, CDH5, SLC29A1 | 0.925 | 0.929 | 1.000 | 0.900 |
| CTSC, ANPEP, ITGA9, DCBLD1 | 0.913 | 0.857 | 1.000 | 0.800 |
| SLC29A1, CFI, TSPAN15, DDIT3, EMR2 | 0.925 | 0.857 | 1.000 | 0.800 |
| DCBLD1, GSTM2, LYN, RAPGEF5 | 0.925 | 0.857 | 0.750 | 0.900 |
| ANPEP, C10orf10, LOC100132167, NBPF3, PLEK | 1.000 | 1.000 | 1.000 | 1.000 |
| CTSC, DCBLD1, RGS16, TET1, CCL18, SLC29A1 | 1.000 | 0.929 | 1.000 | 0.900 |
| LOC100132167, HEY2, TP53I11, SLC29A1, RGS16, VWF | 0.950 | 0.929 | 1.000 | 0.900 |
| IL6, EMR2, CCL18, GSTM2, TP53I11, CTSC, DDIT3, RGS16, SLC29A1, ITGA9, TET1, HEY2, ICAM1, RPS6KA2 | 1.000 | 1.000 | 1.000 | 1.000 |
| EMR2, NBPF3, CCL18, SLC29A1, ICAM1, LOC100132167, PODXL, NKAIN1, FBXO15, IL6, ANPEP, GSTM2, TET1 | 0.975 | 0.857 | 0.750 | 0.900 |
AUC, area under the curve; Sens, sensitivity; Spec, specificity
Gene Set Enrichment Analysis identifies stromal interactions and chemokine signalling as prognostic variables
To investigate the potential existence of multi‐gene programmes that were associated with outcome, we next performed gene set enrichment analysis (GSEA). Interestingly, this analysis identified both integrin pathway (Figure 3A) and chemokine receptor signalling (Figure 3B) gene programmes as being up regulated in tumours from non‐survivors. The discovery of upregulated chemokine signalling genes in poor prognosis tumours is consistent with a prior microarray‐based study of ES that identified increased expression of CXCR4 and CXCR7 as biomarkers of aggressive disease 18. In the COG cohort, high CXCR7 was associated with diminished survival (Figure 3C, D), whereas CXCR4 expression did not correlate with outcome (not shown).
Figure 3.

Gene set enrichment demonstrating that integrin (A) and chemokine receptor signalling (B) pathways are more highly expressed in tumours from subjects who succumbed to their disease. Kaplan–Meier curves demonstrating that event‐free survival (C) and overall survival (D) for patients with high levels of CXCR7 expression (above median) are worse than for patients with low level CXCR7 expression (below median).
Prognostic gene signatures were not validated in an independent cohort of patient tumours
For validation, we profiled a completely independent set of 39 tumour biopsy samples obtained from patients registered on the European collaborative group clinical trials (30 from EuroEwing 99 and 9 from EICESS 92). These studies were run in parallel to the COG trials and outcomes were comparable between the groups. Of the 39 patients evaluated, 28 were long‐term survivors. Unexpectedly, no differentially expressed genes were identified between survivors and non‐survivors in the European cohort (FDR < 0.2 and fold‐change > 1.3). In addition, GSEA of the European tumours also failed to identify significantly enriched gene sets (FDR < 0.25). Moreover, consistent with the absence of an intrinsic prognostic signature, the COG‐derived genes failed to classify the European tumours. Thus, despite their identification as potential prognostic biomarkers in the COG cohort, none of the genes or gene signatures that were identified in the COG discovery set could be validated in an independent group of clinically similar patients.
Stromal cell content impacts on gene expression and prognostic classification of ES tumours
The absence of independent prognostic genes or gene sets and the failure to validate the candidate prognostic biomarkers in the European tumour cohort led us to hypothesize that there may have been unappreciated differences between the groups. To address this, we reviewed their pathological features (Table 1) and noted that, while ten of the COG samples contained substantial stromal contamination, all of the tumours in the European cohort were composed of more than 70% viable tumour. The nature of non‐tumour stroma in the COG tumours varied but included both normal and reactive fibrovascular tissue as well as normal connective tissue into which the tumour cells had infiltrated (see representative H&E images in supplementary material, Figure 1). Given these histological distinctions, we reasoned that the ability to uniquely identify prognostic genes in the COG cohort might have been due to differences in non‐tumour stroma content. To address this, we repeated supervised analysis of the 43 COG tumours for which detailed information on stromal content was available. GSEA was independently performed on tumours that showed significant stromal content (N = 10; stromal content >30% of sample) and tumours that were primarily composed of viable tumour cells only (N = 33; stromal content <30% of sample). Interestingly, this analysis failed to identify any prognostic gene sets in the tumour‐rich samples (Table 5). In contrast, the prognosis‐associated gene sets that were identified in the cohort as a whole were mostly attributed to the stroma‐rich tumours (Table 6). These findings together demonstrate the appreciable contribution of tumour heterogeneity to prognostic biomarkers in ES and provide evidence that tumour–stromal interactions are critical determinants of tumour behaviour and response to therapy.
Table 5.
Prognostic gene sets are determined by stromal content: Comparison of gene set enrichment results in COG tumours with and without significant stromal contamination
| Outcome | <30% stroma: # of cases | <30% stroma: # of enriched gene sets | >30% stroma: # of cases | >30% stroma: # of enriched gene sets |
|---|---|---|---|---|
| Survival | | | | |
| Dead | 7 | 0 | 4 | 9 |
| Alive | 26 | 0 | 6 | 0 |
| EFS | | | | |
| Event (relapse/SMN) | 11 | 0 | 4 | 9 |
| No event | 22 | 0 | 6 | 0 |
SMN, secondary malignant neoplasm (1 case); relapse (10 cases)
Table 6.
Prognostic gene sets are determined by stromal content: Enriched pathways found in COG samples with stromal content
| Name | p‐Val | FDR q‐val |
|---|---|---|
| KEGG_ALLOGRAFT_REJECTION | 0.037 | 0.203 |
| PID_HES_HEYPATHWAY | 0.000 | 0.209 |
| PID_INTEGRIN_CS_PATHWAY | 0.000 | 0.211 |
| REACTOME_INTEGRIN_CELL_SURFACE_INTERACTIONS | 0.000 | 0.223 |
| REACTOME_INTEGRIN_ALPHAIIB_BETA3_SIGNALING | 0.000 | 0.228 |
| BIOCARTA_VEGF_PATHWAY | 0.000 | 0.234 |
| PID_INTEGRIN2_PATHWAY | 0.144 | 0.235 |
| PID_SYNDECAN_1_PATHWAY | 0.025 | 0.241 |
| BIOCARTA_FCER1_PATHWAY | 0.011 | 0.250 |
Discussion
ES is a highly aggressive bone and soft tissue tumour, which is associated with a high rate of recurrence and no clinically validated prognostic biomarkers. In this study, we report the findings of a multi‐centre, international effort designed to determine if gene expression profiling could be used to classify patients with localized ES into low‐ and high‐risk categories.
The quality of RNA isolated from the >250 patient samples in the COG biorepository was largely insufficient for Affymetrix HuEx‐based profiling, and we were able to generate data for only 59 patients with localized disease. Three tumours were excluded from analysis due to inadequate tumour content or a lack of available outcome data. An additional ten patients required exclusion for reasons of divergent chip signal intensity. Thus, we were able to generate quality gene expression profiles for only 46 patients, illustrating the challenges encountered when RNA‐based assays are used for analysis of banked sarcoma specimens, especially bone sarcomas. Moreover, these results show that, even with rigorous batch‐correcting algorithms, it is sometimes impossible to correct for technical differences that contribute significantly to variations in signal intensities, skewing results and adversely impacting interpretation of microarray data.
The strengths of this study include prospective tumour collection, cooperative group therapeutic trials and independent analysis of two distinct patient cohorts. In addition, we made use of rigorous statistical algorithms to ensure quality control, resulting in an 18% reduction in our sample size, essential to ascertain that the final clinical correlates analyses were not skewed by non‐biologic factors. Failure to appreciate technical variability and use of less rigorous bioinformatic analytic tools can lead to invalid conclusions from microarray data 45.
Unsupervised analysis of the 46 COG tumours revealed no separation into sub‐groups based on differences in overall gene expression. It should be noted, however, that this unsupervised approach would not necessarily be able to classify tumour sub‐groups that might exist due to differences in biologic pathways. Supervised analyses of the data using strategies designed to test specific pre‐determined hypotheses may uncover differences that would not be evident with unsupervised clustering methods. As examples, supervised comparison of gene expression profiles between BMI‐1 over‐expressing and BMI‐1 negative ES revealed differences in pathway activation not evident from unsupervised analyses agnostic to BMI‐1 status 46. Likewise, subtle differences in gene expression are detectable between tumours with different EWS‐ETS fusions, but these differences are only apparent when supervised analysis of the data is performed 23. Thus, going forward these microarray data from well‐annotated patient tumours will provide a rich resource for directed investigations into the role of specific biological pathways in ES pathogenesis.
The most striking finding in our study was that prognostic genes and gene signatures were not validated in an independent cohort. Indeed, we were unable to identify any genes or gene sets that were significantly associated with outcome in the European cohort. Interestingly, detailed analysis of the pathological profiles of the two cohorts revealed a potential explanation for these observations. The most enriched prognostic gene sets in the COG tumour group were largely associated with pathways involved in tumour–stroma or other tumour–host interactions. In particular, integrin and chemokine signalling were identified as contributing to prognosis. However, these gene sets were only enriched in COG tumours that contained an abundance of non‐tumour cell elements including reactive fibrovascular tissue, normal connective tissue or both. Tumours that were composed of relatively pure populations of viable tumour cells with little stroma showed no association of any genes or pathways with clinical outcome. What remains unclear from these studies is the precise source of the differential gene expression between survivors and non‐survivors in the stroma‐rich samples. While it is possible that altered gene expression in the tumour cells themselves accounted for the observed differences, it is equally possible that non‐malignant cells in the stroma‐rich samples contributed to the prognostic gene expression signatures. Future studies will need to address both of these possibilities. Specifically, immunohistochemical staining of candidate prognostic proteins will need to be performed to define the contribution of tumour and non‐tumour cells to chemokine‐ and integrin‐associated pathway activation in ES tumours.
Despite these caveats, it is interesting that recent studies support a key role for both integrin‐dependent and chemokine signalling in mediating ES progression. Specifically, high levels of activation of focal adhesion kinase (FAK), a central regulator of integrin signalling that promotes cell adhesion and migration, are evident in ES and inhibition of FAK attenuates tumour growth 47. In addition, the metastatic capacity of ES cells that is conferred by activation of the ERBB4 tyrosine kinase receptor is, in part, mediated by FAK 48. Several other studies have identified key roles for chemokines and their receptors in tumour growth and metastasis, in particular chemokine receptors CXCR4 and CXCR7 18, 49, 50, 51, 52. Both receptors use CXCL12 as their activating ligand, and CXCL12/CXCR4 interactions promote ES cell proliferation and invasion. Thus, it is revealing that chemokine signalling was among the prognostic gene sets, and that high levels of CXCR7 were associated with worse outcomes in the COG patient cohort.
These findings indicate that the relationship between ES cells and their host micro‐environment is critical to tumour pathogenesis, and that full understanding of the complex biology of ES progression will not be achieved by studying the tumour cells in isolation. In addition, the repeated observation that integrin and chemokine signalling contribute to the aggressive nature of ES supports further investigation of these pathways as novel therapeutic targets.
Author contributions
ERL and TJT conceived the study design. ERL, TJT, DAB, MK, RBW, AR, JP and UD performed data collection. SV, JLV, JA, LH, DAB and ERL analysed data and generated figures. SV, ERL, JLV, JA, LH, RBW and DAB were involved in writing the paper, and all authors had final approval of the submitted and published versions.
Supporting information
The following supplementary material may be found in the online version of this article.
Figure S1. Representative H&E images (×20) of four ES tumours from the COG tumour cohort that showed evidence of stromal contamination. Note that stroma consisted of reactive fibrosis, normal connective tissue, or both, with regions of infiltrating nests of ES tumour cells either in clusters or individually.
Acknowledgements
The authors would like to thank Drs. Michele Wing, Andreas Braeuninger, Richard Sposto and Diana Abdueva for technical and statistical support. Grant support: SPECS 1U01CA11475 (TJT, ERL), the Chair's Grant U10 CA98543 and Human Specimen Banking Grant U24 CA114766 of the Children's Oncology Group. Additional support for research is provided by a grant from the WWWW (QuadW) Foundation, Inc. (www.QuadW.org) to the Children's Oncology Group, by the SARC Sarcoma SPORE 5U54CA168512 (ERL), the St. Baldrick's Foundation (SLV) and the Daniel P. Sullivan Fund (RBW). The Euro‐EWING group received funding from German Cancer Aid (DKH‐108128), the Federal Ministry of Education and Research Germany (BMBF 01GM0869) and ERA NET (01KT1310).
The authors have no conflicts of interest to disclose.
References
- 1. Balamuth NJ, Womer RB. Ewing's sarcoma. Lancet Oncol 2010; 11: 184–192.
- 2. Womer RB, West DC, Krailo MD, et al. Randomized controlled trial of interval‐compressed chemotherapy for the treatment of localized Ewing sarcoma: a report from the Children's Oncology Group. J Clin Oncol 2012; 30: 4148–4154.
- 3. Le Deley MC, Paulussen M, Lewis I, et al. Cyclophosphamide compared with ifosfamide in consolidation treatment of standard‐risk Ewing sarcoma: results of the randomized noninferiority Euro‐EWING99‐R1 trial. J Clin Oncol 2014; 32: 2440–2448.
- 4. Ginsberg JP, Goodman P, Leisenring W, et al. Long‐term survivors of childhood Ewing sarcoma: report from the Childhood Cancer Survivor Study. J Natl Cancer Inst 2010; 102: 1272–1283.
- 5. Youn P, Milano MT, Constine LS, et al. Long‐term cause‐specific mortality in survivors of adolescent and young adult bone and soft tissue sarcoma: a population‐based study of 28,844 patients. Cancer 2014; 120: 2334–2342.
- 6. Shukla N, Schiffman J, Reed D, et al. Biomarkers in Ewing sarcoma: the promise and challenge of personalized medicine. A report from the Children's Oncology Group. Front Oncol 2013; 3: 141.
- 7. Carlson JJ, Roth JA. The impact of the Oncotype Dx breast cancer assay in clinical practice: a systematic review and meta‐analysis. Breast Cancer Res Treat 2013; 141: 13–22.
- 8. Erho N, Crisan A, Vergara IA, et al. Discovery and validation of a prostate cancer genomic classifier that predicts early metastasis following radical prostatectomy. PLoS One 2013; 8: e66855.
- 9. Trinquand A, Tanguy‐Schmidt A, Ben Abdelali R, et al. Toward a NOTCH1/FBXW7/RAS/PTEN‐based oncogenetic risk classification of adult T‐cell acute lymphoblastic leukemia: a Group for Research in Adult Acute Lymphoblastic Leukemia study. J Clin Oncol 2013; 31: 4333–4342.
- 10. Asgharzadeh S, Pique‐Regi R, Sposto R, et al. Prognostic significance of gene expression profiles of metastatic neuroblastomas lacking MYCN gene amplification. J Natl Cancer Inst 2006; 98: 1193–1203.
- 11. Oberthuer A, Hero B, Berthold F, et al. Prognostic impact of gene expression‐based classification for neuroblastoma. J Clin Oncol 2010; 28: 3506–3515.
- 12. Stricker TP, Morales La Madrid A, Chlenski A, et al. Validation of a prognostic multi‐gene signature in high‐risk neuroblastoma using the high throughput digital NanoString nCounter system. Mol Oncol 2014; 8: 669–678.
- 13. Davicioni E, Anderson JR, Buckley JD, et al. Gene expression profiling for survival prediction in pediatric rhabdomyosarcomas: a report from the Children's Oncology Group. J Clin Oncol 2010; 28: 1240–1246.
- 14. Wilson RA, Teng L, Bachmeyer KM, et al. A novel algorithm for simplification of complex gene classifiers in cancer. Cancer Res 2013; 73: 5625–5632.
- 15. Missiaglia E, Williamson D, Chisholm J, et al. PAX3/FOXO1 fusion gene status is the key prognostic molecular marker in rhabdomyosarcoma and significantly improves current risk stratification. J Clin Oncol 2012; 30: 1670–1677.
- 16. Cleaver AL, Beesley AH, Firth MJ, et al. Gene‐based outcome prediction in multiple cohorts of pediatric T‐cell acute lymphoblastic leukemia: a Children's Oncology Group study. Mol Cancer 2010; 9: 105.
- 17. Kang H, Chen IM, Wilson CS, et al. Gene expression classifiers for relapse‐free survival and minimal residual disease improve risk classification and outcome prediction in pediatric B‐precursor acute lymphoblastic leukemia. Blood 2010; 115: 1394–1405.
- 18. Bennani‐Baiti IM, Cooper A, Lawlor ER, et al. Intercohort gene expression co‐analysis reveals chemokine receptors as prognostic indicators in Ewing's sarcoma. Clin Cancer Res 2010; 16: 3769–3778.
- 19. Ohali A, Avigad S, Zaizov R, et al. Prediction of high risk Ewing's sarcoma by gene expression profiling. Oncogene 2004; 23: 8997–9006.
- 20. Schaefer KL, Eisenacher M, Braun Y, et al. Microarray analysis of Ewing's sarcoma family of tumours reveals characteristic gene expression signatures associated with metastasis and resistance to chemotherapy. Eur J Cancer 2008; 44: 699–709.
- 21. Scotlandi K, Remondini D, Castellani G, et al. Overcoming resistance to conventional drugs in Ewing sarcoma and identification of molecular predictors of outcome. J Clin Oncol 2009; 27: 2209–2216.
- 22. Paulussen M, Craft AW, Lewis I, et al. Results of the EICESS‐92 study: two randomized trials of Ewing's sarcoma treatment—cyclophosphamide compared with ifosfamide in standard‐risk patients and assessment of benefit of etoposide added to standard treatment in high‐risk patients. J Clin Oncol 2008; 26: 4385–4393.
- 23. van Doorninck JA, Ji L, Schaub B, et al. Current treatment protocols have eliminated the prognostic advantage of type 1 fusions in Ewing sarcoma: a report from the Children's Oncology Group. J Clin Oncol 2010; 28: 1989–1994.
- 24. Peter M, Gilbert E, Delattre O. A multiplex real‐time PCR assay for the detection of gene fusions observed in solid tumors. Lab Invest 2001; 81: 905–912.
- 25. Lockstone HE. Exon array data analysis using Affymetrix power tools and R statistical software. Brief Bioinform 2011; 12: 634–644.
- 26. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 2007; 8: 118–127.
- 27. Wilkerson MD, Hayes DN. ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics 2010; 26: 1572–1573.
- 28. Smyth GK. Limma: linear models for microarray data. In Gentleman R, Carey V, Dudoit S, Irizarry R, Huber W (eds.), Bioinformatics and Computational Biology Solutions using R and Bioconductor. Springer: New York, 2005; 397–420.
- 29. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B Stat Methodol 1995; 57: 289–300.
- 30. da Huang W, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 2009; 4: 44–57.
- 31. Starmans MHW, Fung G, Steck H, et al. A simple but highly effective approach to evaluate the prognostic performance of gene expression signatures. PLoS One 2011; 6: e28320.
- 32. Janitza S, Strobl C, Boulesteix A‐L. An AUC‐based permutation variable importance measure for random forests. BMC Bioinform 2013; 14: 119.
- 33. Breiman L. Random forests. Mach Learn 2001; 45: 5–32.
- 34. Cortes C, Vapnik V. Support‐vector networks. Mach Learn 1995; 20: 273–297.
- 35. Bishop CM. Pattern Recognition and Machine Learning. Springer: New York, 2006.
- 36. Kuhn M. Building predictive models in R using the caret package. J Stat Softw 2008; 28: 1–26.
- 37. Bishop YM, Fienberg SE, Holland PW. Discrete Multivariate Analysis. MIT Press: Cambridge, MA, 1975.
- 38. Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. Wiley: New York, 1980.
- 39. Borinstein SC, Beeler N, Block JJ, et al. A decade in banking Ewing sarcoma: a report from the Children's Oncology Group. Front Oncol 2013; 3: 57.
- 40. Marino‐Enriquez A, Fletcher CD. Round cell sarcomas—biologically important refinements in subclassification. Int J Biochem Cell Biol 2014; 53: 493–504.
- 41. Specht K, Sung YS, Zhang L, et al. Distinct transcriptional signature and immunoprofile of CIC‐DUX4 fusion‐positive round cell tumors compared to EWSR1‐rearranged Ewing sarcomas: further evidence toward distinct pathologic entities. Genes Chromosomes Cancer 2014; 53: 622–633.
- 42. Luo W, Gangwal K, Sankar S, et al. GSTM4 is a microsatellite‐containing EWS/FLI target involved in Ewing's sarcoma oncogenesis and therapeutic resistance. Oncogene 2009; 28: 4126–4132.
- 43. Abdel‐Wahab O, Mullally A, Hedvat C, et al. Genetic characterization of TET1, TET2, and TET3 alterations in myeloid malignancies. Blood 2009; 114: 144–147.
- 44. Figueroa ME, Abdel‐Wahab O, Lu C, et al. Leukemic IDH1 and IDH2 mutations result in a hypermethylation phenotype, disrupt TET2 function, and impair hematopoietic differentiation. Cancer Cell 2010; 18: 553–567.
- 45. Shieh AD, Hung YS. Detecting outlier samples in microarray data. Stat Appl Genet Mol Biol 2009; 8: 1–24.
- 46. Cooper A, van Doorninck J, Ji L, et al. Ewing tumors that do not overexpress BMI‐1 are a distinct molecular subclass with variant biology: a report from the Children's Oncology Group. Clin Cancer Res 2011; 17: 56–66.
- 47. Crompton BD, Carlton AL, Thorner AR, et al. High‐throughput tyrosine kinase activity profiling identifies FAK as a candidate therapeutic target in Ewing sarcoma. Cancer Res 2013; 73: 2873–2883.
- 48. Mendoza‐Naranjo A, El‐Naggar A, Wai DH, et al. ERBB4 confers metastatic capacity in Ewing sarcoma. EMBO Mol Med 2013; 5: 1019–1034.
- 49. Berghuis D, Schilham MW, Santos SJ, et al. The CXCR4‐CXCL12 axis in Ewing sarcoma: promotion of tumor growth rather than metastatic disease. Clin Sarcoma Res 2012; 2: 24.
- 50. Jin Z, Zhao C, Han X, et al. Wnt5a promotes Ewing sarcoma cell migration through upregulating CXCR4 expression. BMC Cancer 2012; 12: 480.
- 51. Hamdan R, Zhou Z, Kleinerman ES. Blocking SDF‐1alpha/CXCR4 downregulates PDGF‐B and inhibits bone marrow‐derived pericyte differentiation and tumor vascular expansion in Ewing tumors. Mol Cancer Ther 2014; 13: 483–491.
- 52. Krook MA, Nicholls LA, Scannell CA, et al. Stress‐induced CXCR4 promotes migration and invasion of Ewing sarcoma. Mol Cancer Res 2014; 12: 953–964.
