Author manuscript; available in PMC: 2015 Jun 1.
Published in final edited form as: Clin Cancer Res. 2014 Mar 25;20(11):2873–2884. doi: 10.1158/1078-0432.CCR-14-0205

A surprising cross-species conservation in the genomic landscape of mouse and human oral cancer identifies a transcriptional signature predicting metastatic disease

Michael D Onken 1, Ashley E Winkler 2, Krishna-Latha Kanchi 3, Varun Chalivendra 2, Jonathan H Law 2, Charles G Rickert 2, Dorina Kallogjeri 2, Nancy P Judd 2, Gavin P Dunn 6, Jay F Piccirillo 2, James S Lewis Jr 2,5, Elaine R Mardis 3,4, Ravindra Uppaluri 2,6,7
PMCID: PMC4096804  NIHMSID: NIHMS579580  PMID: 24668645

Abstract

Purpose

Improved understanding of the molecular basis underlying oral squamous cell carcinoma (OSCC) aggressive growth has significant clinical implications. Herein, cross-species genomic comparison of carcinogen-induced murine and human OSCCs with indolent or metastatic growth yielded results with surprising translational relevance.

Experimental Design

Murine OSCC cell lines were subjected to next-generation sequencing (NGS) to define their mutational landscape, to define novel candidate cancer genes and to assess for parallels with known drivers in human OSCC. Expression arrays identified a mouse metastasis signature and we assessed its representation in 4 independent human datasets comprising 324 patients using weighted voting and Gene Set Enrichment Analysis (GSEA). Kaplan-Meier analysis and multivariate Cox proportional hazards modeling were used to stratify outcomes. A qRT-PCR assay based on the mouse signature coupled to a machine-learning algorithm was developed and used to stratify an independent set of 31 patients with respect to metastatic lymphadenopathy.

Results

NGS revealed conservation of human driver pathway mutations in mouse OSCC including in Trp53, MAPK, PI3K, NOTCH, JAK/STAT and FAT1–4. Moreover, comparative analysis between The Cancer Genome Atlas (TCGA) and mouse samples defined AKAP9, MED12L and MYH6 as novel putative cancer genes. Expression analysis identified a transcriptional signature predicting aggressiveness and clinical outcomes, which were validated in 4 independent human OSCC datasets. Finally, we harnessed the translational potential of this signature by creating a clinically feasible assay that stratified OSCC patients with a 93.5% accuracy.

Conclusions

These data demonstrate surprising cross-species genomic conservation that has translational relevance for human oral squamous cell cancer.

Introduction

Aggressive carcinogen-induced oral squamous cell carcinomas (OSCC) are difficult to treat due to locoregional recurrences. In contrast, more indolent lesions can be treated with single modality surgical intervention with low morbidity and favorable outcomes. Histologic criteria such as perineural or lymphovascular invasion and tumor depth, harbingers of early spread to regional lymph nodes, are commonly used to predict tumor behavior (1, 2). Additionally, among clinical staging criteria, metastatic lymphadenopathy is one of the best predictors of a poor prognosis as it likely reflects aggressive primary tumor biology (3–5) (seer.cancer.gov/statfacts/html/oralcav.html). This staging is especially challenging in early disease as 20% of these patients have pathologically identifiable disease that is clinically undetectable. Thus, all “high risk” patients undergo neck dissection operations, which prove to be unnecessary in nearly 80% of clinically node negative patients. However, there is a dearth of studies delineating markers predictive of lymph node involvement, and genetic stratification approaches are at an early stage (6, 7). In addition, the molecular underpinnings of aggressive OSCC growth and metastasis remain largely undefined (5, 8).

Next generation sequencing (NGS) of human head and neck squamous cell carcinomas (HNSCC), of which OSCC are a significant subset, has confirmed previously identified aberrations (e.g. TP53 and CDKN2A) and has also defined novel NOTCH and FAT gene mutations along with frequent PI3K pathway mutations (9–14). In addition, other mitogenic cascades, such as RAS and JAK/STAT, are altered at lower frequencies. In contrast, mutations that distinguish indolent from aggressive human OSCC remain undefined. Genomic approaches to identify signatures that predict metastatic behavior in OSCC have been described but none have approached the clinical impact of tests available for breast cancer and ocular melanoma (15–19). Importantly, molecular clues reflecting metastatic regulators have not arisen from these biomarker studies.

To better understand the genomic basis of the aggressive OSCC phenotype, we employed our recently described carcinogen-induced mouse oral cancer (MOC) cell line model (20). These MOC lines, which parallel the distinct phenotypes seen in human disease, are either CD44low and indolent, or CD44high and aggressive/metastatic. Herein, we used genomic approaches to (1) define parallels to human OSCC, (2) understand the transcriptomic differences that underlie both phenotypes and (3) translate this information into a clinically relevant context. Remarkably, despite differences in species and carcinogen exposure, many of the same drivers implicated in humans were altered in MOC lines, revealing highly conserved pathways in OSCC tumorigenesis. In addition, we identified a gene expression signature associated with metastasis that was conserved from mouse to three distinct human datasets, uncovering potential promoters of aggressive OSCC. Finally, we successfully translated this signature into a platform for potential clinical application. Together, this analysis identifies novel pathways associated with aggressive growth and metastasis that may contribute functionally to cancer progression and lead to improved diagnostics.

Materials and Methods

Study Approval

Mouse studies were performed and human specimens were obtained under approved protocols of Washington University Animal Studies and the Human Research Protection Office, respectively.

MOC cell line model

Cell lines were generated, characterized and propagated as described (20). Further analysis since their initial description revealed that the MOC7 and MOC10 lines were derived from MOC2 and were thus renamed MOC2–7 and MOC2–10 (data not shown). MOC2LN was generated from a lymph node bearing metastatic MOC2. Primary C57BL/6 oral keratinocytes were generated by microdissection of oral mucosa from wild type mice (Taconic), generating single cell suspensions and growing to near confluence using keratinocyte media (CellNTec). Media was then changed to MOC line media for 24 hours prior to RNA isolation.

Exome Capture and Sequencing

Genomic DNA from MOC cells was extracted using DNeasy Blood & Tissue Kit (Qiagen) and was constructed into Illumina libraries according to the manufacturer’s protocol (Illumina Inc, San Diego, CA). One microgram of the size-fractionated Illumina library was hybridized to the Agilent mouse exome reagent. After the 24-hour, 42°C hybridization, we added DynaBeads Streptavidin-coated magnetic beads to selectively remove the biotinylated Agilent probes and hybridized cDNA library fragments. The beads were washed, and the captured library fragments were released into solution using NaOH. The recovered fragments were PCR amplified according to the manufacturer’s protocol using 11 PCR cycles. Illumina library quantification was completed using the KAPA SYBR FAST qPCR Kit (KAPA Biosystems, Woburn, MA). The qPCR result was used to determine the quantity of library necessary to produce 180,000 clusters on a single lane of the Illumina GAIIx. One lane of 100bp paired-end data was generated for each captured sample on the HiSeq2000 (Illumina).

Mutation Detection and Annotation

As normal tissue from the mice bearing the parental tumors was not available, these mutation calls were compared to the reference C57BL/6 genome for MOC1, 22 and 23 or to the CXCR3−/− exome that we generated in this analysis for MOC2 and 2–10. Sequence data from each tumor and the C57BL/6 genome were aligned independently to NCBI Build 37 of the mouse reference using BWA 0.5.9 and de-duplicated using Picard 1.29 (http://picard.sourceforge.net). Sample variants were called using Samtools (Version 0.1.7a (revision #599)). Somatic single nucleotide variants (SNV) were detected using VarScan 2 (http://varscan.sourceforge.net) with the following parameters: --min-coverage 30 --min-var-freq 0.08 --normal-purity 1 --p-value 0.10 --somatic-p-value 0.001 --validation 1, and using SomaticSniper. Somatic indels were extracted using GATK (Version 3 http://genome.cshlp.org/cgi/reprint/gr.107524.110v1) and Pindel. All predicted variants were filtered to remove false positives due to potential homopolymer artifacts (variants found in homopolymers with sequence length ≥ 5 were removed), strand specific sequence artifacts, ambiguously mapped data (the average mapping quality difference between the reference supporting reads and variant supporting reads >30), and low quality data at the beginning and end of reads (variants supported exclusively by bases observed in first or last 10% of the reads). Variants with an allele frequency <8% were removed. Initial variant transcript annotation was based on NCBI mouse build37. Due to lack of a true matched normal tissue, we had more somatic SNPs than expected, so we removed “clustered” SNPs using our internal cluster filter, which allowed a maximum of 2 variants per 0.5MB genome region and also filtered out mouse dbSNPs. To identify any sample specific mutations, variant allele frequency was calculated for all the SNVs using an internally developed tool Bam2ReadCount (unpublished), which counts the number of reads supporting the reference and variant alleles.
We accessed TCGA HNSCC mutational data from (http://gdac.broadinstitute.org/runs/analyses__latest/reports/cancer/HNSC-TP/MutSigNozzleReportCV/nozzle.html).
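
The post-calling filters described above can be illustrated with a short sketch. This is a minimal Python illustration and not the authors' pipeline: the record fields (`vaf`, `homopolymer_len`, `in_dbsnp`), the use of fixed non-overlapping 0.5 Mb bins, and the choice to discard every variant in an over-populated bin are assumptions made here, since the internal cluster filter is not described in detail.

```python
from collections import defaultdict

MIN_VAF = 0.08         # variants with allele frequency < 8% are removed
MAX_HOMOPOLYMER = 5    # variants in homopolymers of length >= 5 are removed
WINDOW = 500_000       # 0.5 Mb region used by the cluster filter (assumed fixed bins)
MAX_PER_WINDOW = 2     # at most 2 variants allowed per region


def passes_basic_filters(v):
    """Per-variant filters: allele frequency, homopolymer context, dbSNP membership."""
    return (v["vaf"] >= MIN_VAF
            and v["homopolymer_len"] < MAX_HOMOPOLYMER
            and not v["in_dbsnp"])


def cluster_filter(variants):
    """Assumption: drop all variants in any 0.5 Mb bin holding more than 2 variants."""
    bins = defaultdict(list)
    for v in variants:
        bins[(v["chrom"], v["pos"] // WINDOW)].append(v)
    kept = [v for members in bins.values()
            if len(members) <= MAX_PER_WINDOW for v in members]
    return sorted(kept, key=lambda v: (v["chrom"], v["pos"]))


def filter_variants(variants):
    return cluster_filter([v for v in variants if passes_basic_filters(v)])


def make(chrom, pos, vaf, hp=1, dbsnp=False):
    """Build a hypothetical variant record for the example below."""
    return {"chrom": chrom, "pos": pos, "vaf": vaf,
            "homopolymer_len": hp, "in_dbsnp": dbsnp}


variants = [
    make("chr1", 100_000, 0.45),          # passes every filter
    make("chr1", 900_000, 0.05),          # allele frequency below 8%
    make("chr2", 1_000_010, 0.30, hp=6),  # homopolymer run of length 6
    make("chr3", 10, 0.40),               # three variants in one 0.5 Mb bin
    make("chr3", 20, 0.40),
    make("chr3", 30, 0.40),
]
kept = filter_variants(variants)
print([(v["chrom"], v["pos"]) for v in kept])
```

Whether an over-populated region should lose all of its variants or keep the first two is a design choice; the stricter option is shown here only because it is the simpler one to state.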

Expression microarrays

MOC line and primary oral keratinocyte total RNA was isolated using the RNeasy kit (Qiagen) and subjected to gene expression profiling using Illumina MouseRef-8 Expression BeadChips (Illumina, San Diego, CA). Raw expression data were subjected to cubic spline normalization in GenomeStudio (version 2011.1). Microarray data are available in NCBIs GEO (GSE50041). Principal component analysis (PCA), ANOVA and hierarchical clustering were performed with Partek Genomics Suite (version 6.6) using a significance of p<0.01 as a threshold for gene inclusion. Significance Analysis of Microarrays (SAM), Version 4.0 was used to generate a ranked gene list, and a threshold of q<10% was then used to select the most highly significant genes that were up or down regulated in indolent versus aggressive mouse cell lines. These lists were used as signature gene sets for Gene Set Enrichment Analysis (GSEA). Human OSCC expression datasets were accessed via public databases and information regarding patient selection, demographics, tumor staging and treatment outcomes were reported in their original publications or on the TCGA data portal. For the TCGA dataset, we analyzed RNA-seq data from 134 cases of OSCC with sufficient annotation to determine regional lymph node involvement.

qRT-PCR

Total RNA was isolated from MOC cell lines (RNeasy, Qiagen) and converted to cDNA using the High Capacity cDNA Reverse Transcription Kit (ABI). Taqman gene expression assays with GAPDH controls were performed in duplicate using the Taqman Fast Advanced Master Mix (ABI) on an ABI Step One Plus. Relative expression for each probe was then calculated using the comparative Ct method.
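
For readers unfamiliar with the comparative Ct method, the calculation can be sketched as below. This is a generic Python illustration that assumes roughly 100% amplification efficiency (a doubling per cycle); the Ct values are hypothetical and are not data from this study.

```python
def relative_expression(ct_target_sample, ct_control_sample,
                        ct_target_calibrator, ct_control_calibrator):
    """Comparative Ct (2^-ddCt): fold change of a target gene in a sample
    relative to a calibrator, each normalized to an endogenous control."""
    delta_sample = ct_target_sample - ct_control_sample
    delta_calibrator = ct_target_calibrator - ct_control_calibrator
    return 2.0 ** -(delta_sample - delta_calibrator)


# Hypothetical example: target gene vs. GAPDH in an aggressive line (sample)
# relative to an indolent line (calibrator).
fold_change = relative_expression(22.0, 18.0, 27.0, 18.0)
print(fold_change)
```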

Iterative GSEA-based enrichment

Gene Set Enrichment Analysis software and a complete description of the algorithm are provided online by the Broad Institute (http://broadinstitute.org/GSEA, (21)). Each published OSCC dataset was formatted for GSEA and classified by regional lymph node involvement or stage. GSEA was applied to each dataset using the two lists of significantly up- and down-regulated genes in indolent versus aggressive mouse cell lines. The enrichment scores assigned by GSEA were then used to trim away genes that were oppositely enriched to produce two new, trimmed ranked gene lists derived from each human dataset. GSEA was performed again using the trimmed lists from each dataset against each of the other human datasets; e.g., the lists trimmed by the FHCRC dataset were tested against the MDA dataset, and vice versa, resulting in six pairs of lists (Figure 4A). This process was continued for another round, producing the final lists that had been trimmed based on enrichment of the mouse genes in all three human expression sets.
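
The logic of the iterative trimming can be sketched in simplified form. The Python below is not GSEA: it replaces the enrichment-score-based trimming with a plain sign test on a per-gene score, an assumption made purely to show how successive trims against each dataset converge on a consensus list. Gene names, scores, and the convention that a positive score means higher expression in aggressive or node-positive tumors are all hypothetical.

```python
def trim(genes, scores, direction):
    """Keep genes whose score in one dataset agrees with the expected direction.
    Genes absent from the dataset are treated as not enriched and are dropped."""
    sign = 1.0 if direction == "up" else -1.0
    return [g for g in genes if sign * scores.get(g, 0.0) > 0]


def iterative_trim(genes, datasets, direction):
    """Trim the list against each dataset in turn; the survivors agree in all."""
    for scores in datasets.values():
        genes = trim(genes, scores, direction)
    return genes


# Hypothetical per-gene scores in three human datasets.
datasets = {
    "FHCRC": {"Gene1": 1.2, "Gene2": 0.4, "Gene3": -0.3},
    "MDA": {"Gene1": 0.8, "Gene2": -0.1, "Gene3": 0.5},
    "TCGA": {"Gene1": 0.5, "Gene2": 0.9},
}
up_in_aggressive = ["Gene1", "Gene2", "Gene3", "Gene4"]
consensus_up = iterative_trim(up_in_aggressive, datasets, "up")
print(consensus_up)
```

In this simplified form the order in which the datasets are visited does not change the result, which mirrors the idea of a gene set common to all three datasets.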

Figure 4.

Derivation of OCAMP-B by enrichment analysis (A) Schematic of iterative GSEA showing selection of enrichment in each dataset (1st trim) and tandem enrichment in a 2nd trim that finally yields the 118-gene OCAMP-B signature. (B) Venn diagram of OCAMP-A enrichment on three datasets with 118 common genes defined as OCAMP-B. Note that 56 OCAMP-A genes did not enrich in any dataset. (C) Final GSEA on all three human datasets using OCAMP-B. (D) Kaplan-Meier analysis after OCAMP-B based weighted voting of MD dataset showing significant overall survival differences (p<0.001). (E) Kaplan-Meier analysis of MD dataset by stage compared to signature-based assignment (p=0.043 for Stage I/II and p=0.033 for Stage III/IV). (F) OCAMP assignment and pathologic node status are equivalent and importantly 18/18 patients who were cN0 but were pN+ were correctly identified.

Development of support vector machine (SVM) based clinical assay

Five FFPE sections (10 μm) each from surgically treated OSCC patients were obtained and tumor areas marked by a board-certified head and neck pathologist (JSL). These areas were microdissected and combined for each individual tumor. RNA was harvested using RecoverAll (Ambion) and converted to cDNA using the High Capacity cDNA Reverse Transcription Kit (ABI). Using pooled Taqman Gene Expression Assays (for 42 discriminating and 3 housekeeping genes (GAPDH, ACTIN and UBC)) and Taqman Pre-Amp Master Mix (ABI), all samples were pre-amplified for 14 cycles. Samples were then diluted 20-fold and assayed in duplicate for individual genes using Taqman probes and Gene Expression Master Mix on an ABI Step One Plus. ΔCt values were calculated by subtracting the geometric mean of the mean Ct values of the three endogenous control genes from the mean Ct of each discriminating gene. The 42 genes were refined into the 19-gene set by SVM analysis (http://www.chibi.ubc.ca/cgi-bin/nph-SVMsubmit.cgi) on a 17-tumor training set with known pathologic status. SVM was able to accurately classify the 17 tumors when submitted as unknowns using the 19-gene set data. With the trained SVM, we then submitted data from 13 independent tumors, again with known pathologic status, as unknowns for classification.
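
The ΔCt normalization against three endogenous controls can be written out as a small sketch. The Python below is illustrative only: the Ct values are hypothetical, and the function name and data layout are assumptions rather than part of the assay software.

```python
from statistics import geometric_mean, mean


def delta_ct(gene_replicates, control_replicates):
    """Mean Ct of a discriminating gene minus the geometric mean of the
    mean Ct values of the endogenous control genes."""
    control_means = [mean(cts) for cts in control_replicates.values()]
    return mean(gene_replicates) - geometric_mean(control_means)


# Hypothetical duplicate Ct measurements for one tumor.
controls = {
    "GAPDH": [16.0, 16.0],
    "ACTIN": [20.0, 20.0],
    "UBC": [25.0, 25.0],
}
dct = delta_ct([24.0, 26.0], controls)
print(round(dct, 3))
```

A geometric rather than arithmetic mean limits the influence of any single control gene whose Ct lies far from the others.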

For fresh biopsy samples, we acquired RNA prepared from freshly frozen OSCC tumor samples at surgery from the Siteman Cancer Center Tissue Procurement Core. RNA was processed as above for analysis with discriminating and housekeeping genes. We were able to refine the 42 genes into a 10-gene list for fresh samples. Using these probes, we trained the SVM with new data from 16 fresh biopsy tumors. Subsequently, 18 independent test set OSCCs were analyzed and stratified as above.

Statistics

Weighted voting was performed using GenePattern version 3.3.3 for classification of human tumor microarray data (http://www.broadinstitute.org/cancer/software/genepattern). For weighted voting, the gene expression data for the OCAMP-A signature genes were collected from the UW/FHCRC published dataset. Weighted voting was performed on the entire set, followed by leave-one-out cross-validation, which identified a subset of 26 tumors with correct calls and high confidence (>0.4). This subset was then used as a training set to re-classify the rest of the samples. After identifying the 118 genes of the OCAMP-B signature, leave-one-out cross-validation was performed using the weighted voting algorithm to re-classify samples according to the OCAMP-B signature. Kaplan-Meier survival analysis was performed on the re-classified UW/FHCRC and MDA samples using clinical follow-up data available with the datasets. Cross tabulation was used to explore the relationship of OCAMP with clinical TNM stage. The impact of gene signature on survival was evaluated using the product limit Kaplan-Meier method. The log rank test was used for comparison of survival curves and was evaluated at an alpha level of 0.05. Stratified analysis was employed to explore the role of the signature within TNM clinical stage categories. Statistical analysis was performed in IBM-SPSS (v20.0).
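
For orientation, a two-class weighted voting classifier of the kind commonly described in the literature (signal-to-noise gene weights, with a confidence equal to the normalized vote margin) can be sketched as follows. This Python sketch is a generic illustration and is not the GenePattern implementation; gene names and expression values are hypothetical, and it is an assumption that the confidence threshold of 0.4 refers to this margin.

```python
from statistics import mean, stdev


def train_weighted_voting(expr, labels):
    """expr: {gene: [value per sample]}; labels: class name per sample (two classes)."""
    class0, class1 = sorted(set(labels))
    model = {}
    for gene, values in expr.items():
        g0 = [v for v, lab in zip(values, labels) if lab == class0]
        g1 = [v for v, lab in zip(values, labels) if lab == class1]
        weight = (mean(g1) - mean(g0)) / (stdev(g0) + stdev(g1))  # signal-to-noise
        boundary = (mean(g0) + mean(g1)) / 2                      # decision boundary
        model[gene] = (weight, boundary)
    return (class0, class1), model


def predict(classes, model, sample):
    """Each gene votes weight * (x - boundary); positive votes favor classes[1]."""
    votes = {classes[0]: 0.0, classes[1]: 0.0}
    for gene, (weight, boundary) in model.items():
        vote = weight * (sample[gene] - boundary)
        votes[classes[1] if vote > 0 else classes[0]] += abs(vote)
    win, lose = sorted(votes.values(), reverse=True)
    call = max(votes, key=votes.get)
    confidence = (win - lose) / (win + lose) if win + lose else 0.0
    return call, confidence


# Hypothetical training data: two genes, four tumors.
expr = {"GENE_UP": [8.0, 9.0, 2.0, 3.0], "GENE_DOWN": [1.0, 2.0, 7.0, 8.0]}
labels = ["aggressive", "aggressive", "indolent", "indolent"]
classes, model = train_weighted_voting(expr, labels)

clear_call = predict(classes, model, {"GENE_UP": 8.5, "GENE_DOWN": 2.0})
mixed_call = predict(classes, model, {"GENE_UP": 7.0, "GENE_DOWN": 5.5})
print(clear_call, mixed_call)
```

In the second example the two genes disagree, so the call is made with a low margin; samples like this would fall below a confidence cutoff such as 0.4.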

Results

Murine OSCC model and NGS

Previously, we described a 7,12 dimethylbenzanthracene (DMBA)-induced mouse cell line model of OSCC where, upon transplantation, individual lines displayed fixed in vivo phenotypes (Fig. 1A). The indolent lines all formed tumors in RAG2−/− immunodeficient mice but only MOC1 and MOC22 grew in wild type mice. Of the aggressive lines, MOC2–7 and MOC2–10 were derived from the MOC2 line, but the MOC2–10 line was included in the current analysis because it uniquely displayed lung in addition to lymph node metastasis. Note that we used the flank model for ease of tumor measurement, but lymph node metastasis was also observed upon orthotopic transplantation (20). MOC growth behaviors were consistent with human OSCC clinical behavior leading us to investigate whether their somatic alterations were also congruent. NGS was performed on 3 indolent and 2 related aggressive/metastatic lines with excellent coverage depth (Fig. S1A).

Figure 1.

Next generation sequencing analysis of MOC cell lines (A) Overview of MOC cell line model illustrating biologic behavior upon flank injection of cells. Note MOC23 (italicized) only grows in RAG2−/− mice, (B) Number of SNVs in each MOC line, (C) Distribution of DMBA induced changes in 6 core driver pathways of HNSCC shown as a “signal flag” plot of mutation subtypes in the indolent and aggressive cell lines (numbers of genes in each driver pathway in parentheses). The boxed nucleotide changes (A→T, T→A, C→A and G→T) represent the most common DMBA induced alterations. Note, the underlined G:C→T:A change is typical for tobacco-induced mutations (11). (D) Selected Oncoprint of mutation rates within the TCGA cohort for AKAP proteins showing that 57/279 patients have alterations in indicated AKAPs. (E) Oncoprint for MED proteins showing that 41/279 patients have alterations in indicated MED components.

Mutation overview

Many somatic non-synonymous single nucleotide variants (nsSNVs) were identified in these lines as expected for carcinogen treated tumors (Fig. 1B, Fig. S1B, Tables S1–2 and (22)). Observed mutations were consistent with the known predilection of DMBA for first, A:T→T:A (range 48.7–59.3% of total) and second, G:C→T:A (range of 14.4–17.6%) transversions (Fig. 1C, Table S3) (23) overlapping with G:C→T:A mutations described for HNSCC (11).
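
Mutation spectra such as the one above are tallied by collapsing each substitution with its reverse complement, so that A→T and T→A are both counted as A:T→T:A. A minimal Python sketch of that bookkeeping is given below; the convention of writing the purine-containing strand first is an assumption chosen to match the notation used here, and the input list is hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def substitution_class(ref, alt):
    """Collapse a substitution and its reverse complement into one
    base-pair class, e.g. both A>T and T>A become 'A:T>T:A'."""
    if ref not in ("A", "G"):  # report relative to the purine on the reference side
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}:{COMPLEMENT[ref]}>{alt}:{COMPLEMENT[alt]}"


def spectrum(snvs):
    """Fraction of SNVs falling into each base-pair substitution class."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in snvs)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# Hypothetical SNV list given as (reference base, variant base).
example = [("A", "T"), ("T", "A"), ("C", "A"), ("G", "A")]
fractions = spectrum(example)
print(fractions)
```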

Conservation of candidate driver mutations between MOC lines and human HNSCC

We next compared the MOC line mutations with the 32 most significantly mutated genes from the TCGA HNSCC effort (Table S4). Surprisingly, as a group, the MOC lines bore mutations in many of these same genes with seven of the top ten genes altered in human HNSCC also carrying mutations in the MOC lines (Table 1). We also asked whether drivers described in human OSCC were present in MOC lines. Recent work has highlighted the NOTCH, PI3K, MAPK, JAK/STAT, FAT families and Trp53 as pathways critical for HNSCC (9–13). Again, MOC lines bore mutations in the same driver pathways described for HNSCC and changes were typical of the DMBA spectrum as described above (Fig. 1C, Table S5). Whereas all MOC lines had mutations in Trp53, MAPK and the FAT family of genes, only the indolent cell lines showed NOTCH, JAK/STAT and PI3K pathway mutations. Mutations in FAT1 (12%) and FAT4 (10%) have been identified in human HNSCC (13) and these genes in addition to FAT2 and FAT3 were altered in MOC lines. Other candidate driver mutations included CASP8 in MOC22, which is altered in 8–10% of human HNSCC typically in association with HRAS mutations (11, 13). Indels did not segregate into either indolent or aggressive growth categories (Table S6). Copy number and tumor heterogeneity could not be reliably evaluated, as normal tissue from the parental mice was not available. Thus, as a group, MOC lines had alterations in the most commonly mutated genes and driver pathways in HNSCC reflecting an unexpected conservation in the mutational landscape, despite differences in the species, specific carcinogen used to derive the lines and overall numbers of mutations.

Table 1.

MOC line conservation with common HNSCC TCGA mutations

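| Gene (TCGA rank) | MOC1 | MOC22 | MOC23 | MOC2/2–10 |
| --- | --- | --- | --- | --- |
| 1. CDKN2A | - | - | - | - |
| 2. TP53 | V170E and splice site | T122P, K384* | splice site | E225* |
| 3. JUB | - | - | - | - |
| 4. CASP8 | - | R88S | - | - |
| 5. PIK3CA | Other | Other | Other | Other |
| 6. NSD1 | - | - | N265I | - |
| 7. NOTCH1 | splice site | Other | G1195R + splice site | - |
| 8. FAT1 | E2054D | D2756V | K243* | W3896R |
| 9. MLL2 | - | S4691* | Q1695L | Q3647L |
| 10. EPHA2 | - | K946* | - | - |
| 16. HRAS | Q61L | Q61L | Q61L | Other |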

Other—indicates that mutations in other genes involved in this pathway were identified. See Table S5.

Novel candidate cancer genes

As common MOC line mutations may represent novel OSCC promoters, our analysis identified the A kinase anchoring protein Akap9, mediator complex component Med12l and Myh6 as potential candidates (Table S5, S7, S8). AKAP and MED protein families were mutated in the TCGA cohort (analyzed via cBio (24)) where we found that 9 members of the AKAP family were altered in 20.4% of tumors, with AKAP9 changes in 7% (Fig. 1D). Six components of the mediator complex were mutated in 14.7% of cases, with MED12L changes in 5% (Fig. 1E). Of note, MED1 mutations were previously identified in 5% of HNSCC (11). Importantly, MutSigCV analysis did not identify any of these genes as significantly mutated in TCGA when analyzed individually. However, together the mutations in several AKAP family members and mediator components suggest that these pathways may be relevant promoters. Further analysis identified 5% and 9% rates for AKAPs and 13% and 17.5% for MEDs in two independent HNSCC datasets ((12) and (11), respectively). Finally, very recent work using an RNAi in vivo screen identified MYH9 as a putative cancer gene in SCCA and we identified the related MYH6 gene as commonly mutated in MOC lines (25). The TCGA dataset shows equivalent mutation rates for both these genes. In addition, THSD7A, MUC5B, LRP2 and LAMA1 gene mutations were common in MOC lines (Tables S7, S8) and were also present in the TCGA cohort (Fig. S2). These alterations illustrate not only the conservation of structural parallels between mouse and human OSCC but also the ability of the mouse model to highlight novel tumor promoters.

Growth phenotype specific mutations

Although NGS confirmed MOC and human OSCC conservation, analysis of mutations specific to indolent or aggressive lines and lymph node versus lung metastatic lines was inconclusive likely due in part to the limited numbers of samples (Tables S7 and S9). We next approached this question by comparing our mouse sequencing data to mutations unique to lymph node metastasis negative (N0, 62 patients) versus positive (N+, 84 patients) OSCC samples from TCGA. We identified 3273 N0 and 3097 N+ mutations exclusive to each nodal status subset (Table S10). There was no significant difference in the average number of mutations between N0 and N+ patients regardless of smoking status (Fig. S3 and Table S11). This analysis showed 17 genes commonly mutated in mouse indolent and human N0 tumors and 55 common genes mutated in mouse aggressive and human N+ tumors (Table S12). However, none of these common genes were mutated at high frequency in the human N+ or N0 datasets (Tables S13 and S14). Finally, isolated analysis of the human TCGA data also showed that nodal status specific mutations occur infrequently in both the metastatic and non-metastatic tumors (from cBio portal, data not shown). Together, this analysis suggests that the aggressive OSCC phenotype is not clearly a result of somatic exome changes but rather may be driven instead by epigenetic or transcriptional alterations.

Expression microarray analysis

Next, we interrogated MOC lines and primary C57BL/6 oral keratinocytes to identify transcriptomic promoters of aggressiveness. As expected, principal component analysis (PCA) showed that all related aggressive lines clustered together. The indolent MOC1 and MOC22 clustered near each other and were only slightly separated from normal oral keratinocytes. In contrast, MOC23 showed a distinct distribution consistent with it being a unique subtype that grows only in RAG2−/− immunodeficient mice (Fig. 2A).

Figure 2.

Expression microarray analysis of MOC lines identifies a metastasis signature conserved in human OSCC (A) Principal component analysis (PCA) of MOC lines shows clustering of MOC1 and MOC22 near oral keratinocytes (OK). MOC23 is separated from all lines whereas the related aggressive lines all cluster together. (B) Unsupervised clustering of microarray data reveals a mouse signature of metastasis. (C) Microarray values (MA) and qRT-PCR analysis (Taqman) for Nkx2–3 and Foxa1 in MOC1 (indolent) and MOC2–10 (aggressive) lines showing dramatic upregulation in the aggressive line. (D) Hoxb7 and Bmp4 expression in MOC2 (LN-lymph node metastatic) and MOC2–10 (LN/lung metastatic).

Unsupervised hierarchical clustering demonstrated a metastasis signature (Fig. 2B) and significance analysis of microarrays (SAM) identified specific differentially expressed genes at a false-discovery rate of <10% (Fig. S4, Table S15), which were confirmed by ANOVA (p ≤ 0.01). The mouse signature was divided into significantly downregulated (260) or upregulated (218) genes in indolent versus aggressive lines (Table S16). Expression patterns for genes described in human metastatic tumors, such as MUC1, SLPI and TACSTD2, were conserved in mouse OSCC (26–28). We identified several upregulated transcription factors, including Eomes, Nkx2–3, Foxa1, Hnf1b, Meis1 and E2f4, that were previously not described in OSCC and may be central to controlling global programs of aggressiveness.

The dramatic differences in expression between indolent and aggressive lines for Nkx2–3 and Foxa1 were confirmed by qRT-PCR (Fig. 2C). Finally, Hoxb7 and Bmp4 were overexpressed in MOC2–10 versus MOC2 (Fig. 2D), implicating them as key candidate promoters of distant (lung) metastasis in the MOC model.

Cross species signature conservation

We next asked whether the mouse signature predicted outcomes in human OSCC patients. Using microarray data from a carcinogen induced, HPV-negative cohort of 97 OSCC patients (UW/FHCRC), we stratified patients based on enrichment of the mouse signature by weighted voting (16). Using Kaplan-Meier analysis, disease specific survival (DSS) was statistically significantly worse for subjects in the group with the more aggressive signature as compared to those with the less aggressive signature (50% versus 80% 5-year DSS, Figure 3A, p<0.01). Thus, we termed this signature the Oral Cancer Aggressiveness and Metastasis Predictor (OCAMP-A).
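
The product-limit (Kaplan-Meier) estimate underlying survival comparisons of this kind can be sketched as follows. This is a generic textbook implementation in Python with made-up follow-up times; it is not the SPSS analysis or the patient data used in this study.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.
    times: follow-up time per patient; events: True if the event (death)
    was observed, False if the patient was censored at that time.
    Returns [(time, survival probability)] at each observed event time."""
    curve = []
    survival = 1.0
    for t in sorted(set(times)):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
    return curve


# Hypothetical follow-up in months; the second patient is censored.
months = [12, 20, 30, 48]
died = [True, False, True, True]
curve = kaplan_meier(months, died)
print(curve)
```

Censored patients leave the risk set without lowering the curve, which is why the step at 30 months is computed over two patients rather than three.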

Figure 3.

The mouse signature predicts outcomes and is enriched in human datasets (A) Kaplan-Meier analysis of OCAMP-A on UW/FHCRC dataset showing significant disease specific survival difference based on enrichment of mouse metastasis signature (p<0.01). (B) GSEA plots showing significant enrichment of both up and down OCAMP-A transcripts in the UW/FHCRC 97-patient OSCC dataset (p<0.05). (C) GSEA plots on the 134 OSCC patients from the TCGA dataset (p<0.001). (D) GSEA plots on the 71 OSCC patients from the MD Anderson dataset (p=0.57, n.s.= not significant).

To identify overlap between mouse and human signatures, we used Gene Set Enrichment Analysis (GSEA), which allows comparison of data from different platforms and species (21). Three independent datasets of human OSCC (UW/FHCRC, 97 patients; MD Anderson (MD), 71 patients (29); and TCGA, 134 patients) were first classified by stage (UW/FHCRC) or regional lymph node metastasis as surrogate markers of tumor aggressiveness and then independently analyzed by GSEA with OCAMP-A. In all cases there was enrichment of OCAMP-A in human tumors (Fig. 3B, C, D) that was statistically significant for the TCGA (normalized enrichment score (NES)=1.6, nominal p-value<0.001) and UW/FHCRC (NES=1.43, nominal p-value<0.05) datasets but not for the MD dataset (NES=0.9, nominal p-value=0.57).

Despite the high p-value for the MD dataset, we noted substantial overlap of enriched genes among the three human datasets. We used iterative GSEA-based enrichment (Fig. 4A, Fig. S5A–F) to identify genes commonly enriched in all three human datasets and to eliminate mouse specific transcripts; the resulting 118-gene signature was designated OCAMP-B (Fig. 4B, C). Because the initial analysis of the MD dataset was not significant, we reassessed significance using OCAMP-B. Kaplan-Meier analysis showed statistically significantly worse overall and disease specific survival for patients with the aggressive OCAMP-B signature (Fig. 4D, p<0.001; Fig. S6, p<0.05). Similar analysis on OSCC patients from the TCGA dataset was limited by the availability of follow-up data. However, OCAMP-B was predictive of both OS and DSS in the UW/FHCRC dataset (p<0.01, Figure S7).

In current OSCC management, clinically node negative (cN0) patients, i.e. those with no suspicious neck lymph nodes by palpation or imaging, undergo neck dissection surgery depending on specific features of the primary tumor to pathologically identify occult nodal metastases. As this approach leads to unnecessary surgery in nearly 80% of patients, our goal is to identify gene expression in the primary tumor predictive of outcomes and occult metastatic disease among newly-diagnosed and untreated patients. Thus, we used clinical rather than pathological TNM staging (available only for the MD Anderson dataset) and found that OCAMP-B status defines unique prognostic subgroups within clinical stages 1/2 and 3/4 (Fig. 4E and Table S17A). Multivariate modeling showed a statistically significant independent effect of OCAMP-B such that patients with an aggressive signature were 3.9 times more likely to die (hazard ratio adjusted for TNM stage; 95% CI, 1.52 to 10.03; Table S17B). Next, we sought to compare the performance of the OCAMP-B signature to histopathological grading. Of 18 patients who were cN0 but pathologically N+ (pN+, i.e., clinically did not have nodal disease but harbored disease on pathologic analysis after neck dissection), all 18 had the aggressive gene signature. Additionally, of 24 cN+ and pN+ patients, all harbored the OCAMP-B aggressive signature. Finally, of 24 patients who were cN0 and pN0, all had the indolent signature (Figure 4F, Table S17C). Given that OCAMP-B was generated with overlaps between 3 datasets, the above stratification was not surprising. Towards independent confirmation of OCAMP-B performance, we used a 22 patient OSCC dataset from UPENN (30) and saw excellent stratification (21/22 tumors correct) with respect to lymph node metastatic status (Figure 5A, Table S18). Robust follow-up data for more complex analysis were not reported for this dataset.
Together, these findings demonstrate that OCAMP-B allows disease outcome stratification at initial presentation based on results from the primary tumor.
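
As a consistency check on the multivariate result, the reported hazard ratio and confidence interval can be related to one another with a few lines of arithmetic. The Python sketch below assumes the interval is a standard Wald interval, symmetric on the log scale; under that assumption the hazard ratio should sit near the geometric midpoint of the limits, and an approximate standard error, z statistic and two-sided p-value can be back-calculated. These derived quantities are approximations implied by the published numbers, not values reported in the study.

```python
import math


def wald_from_ci(hr, lower, upper, z_crit=1.959964):
    """Back-calculate the log-scale standard error, z statistic and
    two-sided p-value from a hazard ratio and its 95% Wald interval."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


hr, lower, upper = 3.9, 1.52, 10.03          # values reported in the text
midpoint = math.sqrt(lower * upper)          # geometric midpoint of the interval
se, z, p = wald_from_ci(hr, lower, upper)
print(round(midpoint, 2), round(se, 3), round(z, 2), round(p, 4))
```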

Figure 5.

Independent validation of OCAMP-B and development of a clinical assay for stratification of OSCCs (A) An independent UPENN dataset is classified with high accuracy (21/22 tumors) by OCAMP-B weighted voting (WV output) with respect to lymph node metastatic status (Path= known pathologic status), (B) Schematic illustrating the selection of 42 OCAMP genes and SVM processing on training set samples to identify the best discriminating genes. The final 19-gene list for FFPE is on right with asterisks on 10-gene list for fresh specimens. (C) Discriminant scores from SVM analysis showing successful stratification in 12/13 FFPE and 17/18 fresh biopsy test cases of metastatic nodal disease using a qRT-PCR assay.

A multigene assay to stratify OSCC by lymph node status

As knowledge of the lymph node metastatic status of OSCC is critical in clinical decision making, including whether to suggest neck surgery for early stage cancers, we next asked whether the mouse signature could be translated into a diagnostic test, as described for ocular melanoma (17) (Fig. 5B). For training sets, we used 17 formalin-fixed, paraffin-embedded (FFPE) and 16 fresh biopsy specimens from the primary tumors of Washington University OSCC patients with known pathologic status. Using a TaqMan platform and a support vector machine (SVM) learning algorithm (31), 42 discriminating genes were refined into 19 genes (FFPE) or 10 genes (fresh) that classified the corresponding training set with 100% accuracy. Test sets of 13 independent FFPE and 18 independent fresh biopsy tumors were then subjected to the assay and analyzed as unknowns by the trained SVM. Accurate lymph node classification was achieved for 12/13 FFPE and 17/18 fresh tumors (Fig. 5C, Table S19). Importantly, no N+ samples were classified as N0, and the two N0 samples classified as N+ were from larger T3 and T4 tumors. These data represent proof-of-principle that the OCAMP signature can be translated for clinical stratification of OSCC patients.
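The pooled accuracy implied by these test-set counts can be checked with simple arithmetic. The short Python sketch below uses only the counts reported in this paragraph; the variable and function names are illustrative. It does not compute specificity, because the number of N0 tumors in each test set is not given in the text.

```python
# Test-set results as reported in the text: (correctly classified, total) per specimen type.
test_sets = {"FFPE": (12, 13), "fresh": (17, 18)}

def accuracy(correct, total):
    """Fraction of tumors whose predicted nodal status matched pathology."""
    return correct / total

for name, (correct, total) in test_sets.items():
    print(f"{name}: {correct}/{total} = {accuracy(correct, total):.1%}")

pooled_correct = sum(c for c, _ in test_sets.values())
pooled_total = sum(n for _, n in test_sets.values())
pooled_accuracy = accuracy(pooled_correct, pooled_total)
print(f"pooled: {pooled_correct}/{pooled_total} = {pooled_accuracy:.1%}")

# Both misclassifications were N0 tumors called N+ (false positives);
# no N+ tumor was called N0, so there were no false negatives.
false_positives = pooled_total - pooled_correct
false_negatives = 0
```

The pooled figure, 29/31 (93.5%), matches the accuracy quoted in the Abstract, and the absence of false negatives means that no patient with nodal disease in these test sets would have been missed by the assay.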

Discussion

Here, we used genomic approaches, including exome sequencing and transcriptional profiling, to delineate the genetic basis of aggressive growth in the MOC model, focusing in particular on its fidelity with human OSCC. Two obvious constraints of our approach were the limited number of different lines and the fact that the aggressive lines were related. Despite these limitations, the MOC lines manifested the breadth of clinical scenarios observed in human OSCC. Our data showed that MOC lines as a group contained mutations in the majority of commonly mutated HNSCC genes and in driver pathways described in human OSCC and, in addition, highlighted potential new driver mutations. No recurrent mutations associated with aggressive growth were identified. However, transcriptomic analysis revealed a mouse metastasis signature that contained both known and novel candidates for promoters of aggressiveness. Although this signature was derived from a small number of cell lines, it was, surprisingly, conserved in three independent human datasets, including one from the TCGA effort. Using iterative GSEA, we then developed a consensus 118-transcript metastasis predictor. Finally, using the mouse signature, we were able to develop a preliminary, clinically applicable test for genetic stratification of OSCC. Thus, these data have significant potential implications for understanding the biology, prognosis and therapy of human OSCC.

Recent genomic studies defining distinct HNSCC oncogenic driver classes revealed a major functional role for the PI3K pathway (10, 12). We found PI3K-pathway mutations in MOC22 and MOC23, but their functional relevance has not been evaluated. As expected, all MOC lines shared RAS pathway mutations due to the predilection of DMBA for RAS mutations (20). Relevant to the HRAS mutant group of human OSCC (12), MOC22 was found to have both HRAS and CASP8 mutations. Interestingly, KRAS was mutated in aggressive MOC lines and NRAS in MOC23; however, these alleles are less common in human OSCC. Importantly, based on the initial description of enhanced ERK1/2 activation in CD44+ aggressive lines (20), we have initiated a MEK inhibitor (trametinib) clinical trial in patients with OSCC (NCT01553851). Future studies will address the functional contribution of putative drivers. Our finding of a conserved mouse-to-human transcriptional signature supports the existence of a distinct program of aggressiveness in MOC2, MOC2–10 and human OSCC that is independent of common driver mutations.

Analysis of the aggressiveness biomarker panel revealed several intriguing candidate promoters, most notably the lineage-specific transcription factor Nkx2–3, which is normally expressed in the developing tongue, floor of mouth and mandible, among other tissues (32). The NKX family of homeodomain transcription factors has been implicated in a variety of malignancies, with lung adenocarcinoma serving as a prime example in which Nkx2-1 has a dual role in tumor promotion and metastasis (33). Interestingly, recent work has shown that the Foxa1 pioneer factor partners with Nkx2-1 (34), and our analysis shows that Foxa1 is also upregulated in Nkx2–3 expressing aggressive tumors. With regard to MOC2–10, we identified Hoxb7, which has been implicated in poor outcomes in OSCC (35), and Bmp4, which promotes breast cancer metastasis (36), as candidate regulators of lung metastasis in OSCC. Thus, this approach of murine modeling is highly useful, and it supports the generation of additional lines to assess the frequency of recurrent mutations, to extend genotype-phenotype correlations, and to undertake further detailed mechanistic work. Moreover, although carcinogenesis with DMBA results in a high number of mutations, it clearly identifies conserved cross-species pathways, in contrast to defined oncogene-driven models, perhaps because it allows the natural biology of OSCC to emerge.

Several groups have used expression analyses on human OSCC specimens to develop predictive genetic biomarkers (15, 16, 19). Van Hooff et al. prospectively showed that their signature had 86% sensitivity and 44% specificity for metastasis in early stage OSCC (19). This signature had an 89% negative predictive value for metastasis in early stage OSCC lesions, but clinical application of the test would still result in either under- or overtreatment of significant numbers of patients. Thus, the exact utility of this assay in clinical practice remains to be defined, and more robust assays are desirable. The OCAMP signature offers a unique biomarker for human OSCC, as it does not overlap significantly with work described to date (Table S20). We successfully translated the OCAMP signature into a robust assay using a straightforward platform and anticipate rapid progression to larger sample sets and eventual prospective validation. Further work focused on defining the molecular basis of OSCC aggressiveness using the high-fidelity MOC platform may identify additional novel therapeutic approaches for human OSCC.
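To make the link between sensitivity, specificity and negative predictive value concrete, the following Python sketch applies the standard Bayes relationship to the 86% sensitivity and 44% specificity quoted above. The prevalence values for occult nodal metastasis are hypothetical inputs chosen for illustration and are not taken from the cited study; the function names are likewise illustrative.

```python
def negative_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a test-negative patient is truly free of metastasis."""
    true_negatives = specificity * (1 - prevalence)
    false_negatives = (1 - sensitivity) * prevalence
    return true_negatives / (true_negatives + false_negatives)

def fraction_test_positive(sensitivity, specificity, prevalence):
    """Share of all patients the test would flag as likely metastatic."""
    return sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

SENSITIVITY = 0.86  # quoted in the text for the signature in reference 19
SPECIFICITY = 0.44  # quoted in the text for the signature in reference 19

# Hypothetical prevalences of occult nodal metastasis (assumed, for illustration only).
for prevalence in (0.20, 0.30, 0.40):
    npv = negative_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    flagged = fraction_test_positive(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.0%}: NPV {npv:.1%}, test-positive {flagged:.1%}")
```

Under these assumptions, a prevalence near 30% gives a negative predictive value close to the reported 89%, while about two thirds of patients would still test positive. This is the overtreatment concern described above: low specificity leaves many node-negative patients flagged for surgery even when the negative predictive value is high.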

Supplementary Material

1
2
3

Translational Relevance.

In this study, we present our genomic analysis of a carcinogen-induced mouse model of oral cavity squamous cell carcinoma (OSCC) that shows surprising conservation with human OSCC and has significant translational implications. First, we found that, as a group, mouse OSCC cell lines contain the same driver mutations described in human OSCC. In addition, this analysis highlighted novel candidate cancer genes that may impact the biology of human OSCC. Second, we identified a mouse-based metastasis signature that was highly conserved in 4 independent human OSCC datasets. The translational relevance of these data for OSCC patients lies in the current deficiencies in risk stratification, which result in many patients receiving unnecessary surgery or chemotherapy. Therefore, we developed a proof-of-principle, clinically relevant assay from this mouse signature that has significant potential for human OSCC risk stratification.

Acknowledgements

We thank The Genome Institute at Washington University for sequencing and analysis via a Center Initiated Project (NIH/NHGRI 5U54HG003079 (Richard Wilson, PI)) and Dr. J. William Harbour for support during early phases of microarray analysis. NIH P30DC04665 supports the RCAVS histology (Brian Faddis and Dorothy Edwards) and biostatistics cores (D. Kallogjeri). We thank Siteman Cancer Center Tissue Procurement (NCI Cancer Center Support Grant #P30 CA91842) and GTAC (NCCR ICTS/CTSA UL1TR000448 and NCI-CA91842). This publication is solely the responsibility of the authors and does not necessarily represent the views of NIH/NCRR. MDO is supported by NIGMS #R01GM38542 (John Cooper, PI). RU thanks Dr. Richard Chole for sustained support.

Footnotes

Conflict of interest: A patent application for the metastasis signature is pending (Washington University/Dr. Uppaluri). All other authors declare no conflict of interest.

References

  • 1. Brandwein-Gensler M, Teixeira MS, Lewis CM, Lee B, Rolnitzky L, Hille JJ, et al. Oral squamous cell carcinoma: histologic risk assessment, but not margin status, is strongly predictive of local disease-free and overall survival. Am J Surg Pathol. 2005;29(2):167–178. doi: 10.1097/01.pas.0000149687.90710.21.
  • 2. Ganly I, Goldstein D, Carlson DL, Patel SG, O'Sullivan B, Lee N, et al. Long-term regional control and survival in patients with "low-risk," early stage oral tongue cancer managed by partial glossectomy and neck dissection without postoperative radiation: the importance of tumor thickness. Cancer. 2013;119(6):1168–1176. doi: 10.1002/cncr.27872.
  • 3. Kalnins IK, Leonard AG, Sako K, Razack MS, Shedd DP. Correlation between prognosis and degree of lymph node involvement in carcinoma of the oral cavity. Am J Surg. 1977;134(4):450–454. doi: 10.1016/0002-9610(77)90376-2.
  • 4. Myers JN, Greenberg JS, Mo V, Roberts D. Extracapsular spread. A significant predictor of treatment failure in patients with squamous cell carcinoma of the tongue. Cancer. 2001;92(12):3030–3036. doi: 10.1002/1097-0142(20011215)92:12<3030::aid-cncr10148>3.0.co;2-p.
  • 5. Allen CT, Law JH, Dunn GP, Uppaluri R. Emerging insights into head and neck cancer metastasis. Head Neck. 2012. doi: 10.1002/hed.23202.
  • 6. Monroe MM, Gross ND. Evidence-based practice: management of the clinical node-negative neck in early-stage oral cavity squamous cell carcinoma. Otolaryngol Clin North Am. 2012;45(5):1181–1193. doi: 10.1016/j.otc.2012.06.016.
  • 7. Mroz EA, Rocco JW. Gene expression analysis as a tool in early-stage oral cancer management. J Clin Oncol. 2012;30(33):4053–4055. doi: 10.1200/JCO.2012.44.8050.
  • 8. Rothenberg SM, Ellisen LW. The molecular pathogenesis of head and neck squamous cell carcinoma. J Clin Invest. 2012;122(6):1951–1957. doi: 10.1172/JCI59889.
  • 9. Agrawal N, Frederick MJ, Pickering CR, Bettegowda C, Chang K, Li RJ, et al. Exome sequencing of head and neck squamous cell carcinoma reveals inactivating mutations in NOTCH1. Science. 2011;333(6046):1154–1157. doi: 10.1126/science.1206923.
  • 10. Lui VW, Hedberg ML, Li H, Vangara BS, Pendleton K, Zeng Y, et al. Frequent mutation of the PI3K pathway in head and neck cancer defines predictive biomarkers. Cancer Discov. 2013. doi: 10.1158/2159-8290.CD-13-0103.
  • 11. Stransky N, Egloff AM, Tward AD, Kostic AD, Cibulskis K, Sivachenko A, et al. The mutational landscape of head and neck squamous cell carcinoma. Science. 2011;333(6046):1157–1160. doi: 10.1126/science.1208130.
  • 12. Pickering CR, Zhang J, Yoo SY, Bengtsson L, Moorthy S, Neskey DM, et al. Integrative Genomic Characterization of Oral Squamous Cell Carcinoma Identifies Frequent Somatic Drivers. Cancer Discov. 2013. doi: 10.1158/2159-8290.CD-12-0537.
  • 13. Morris LG, Kaufman AM, Gong Y, Ramaswami D, Walsh LA, Turcan S, et al. Recurrent somatic mutation of FAT1 in multiple human cancers leads to aberrant Wnt activation. Nat Genet. 2013;45(3):253–261. doi: 10.1038/ng.2538.
  • 14. India Project Team of the International Cancer Genome C. Mutational landscape of gingivo-buccal oral squamous cell carcinoma reveals new recurrently-mutated genes and molecular subgroups. Nat Commun. 2013;4:2873. doi: 10.1038/ncomms3873.
  • 15. Bhattacharya A, Roy R, Snijders AM, Hamilton G, Paquette J, Tokuyasu T, et al. Two distinct routes to oral cancer differing in genome instability and risk for cervical node metastasis. Clin Cancer Res. 2011;17(22):7024–7034. doi: 10.1158/1078-0432.CCR-11-1944.
  • 16. Lohavanichbutr P, Mendez E, Holsinger FC, Rue TC, Zhang Y, Houck J, et al. A 13-gene signature prognostic of HPV-negative OSCC: discovery and external validation. Clin Cancer Res. 2013;19(5):1197–1203. doi: 10.1158/1078-0432.CCR-12-2647.
  • 17. Onken MD, Worley LA, Tuscan MD, Harbour JW. An accurate, clinically feasible multi-gene expression assay for predicting metastasis in uveal melanoma. J Mol Diagn. 2010;12(4):461–468. doi: 10.2353/jmoldx.2010.090220.
  • 18. Paik S, Shak S, Tang G, Kim C, Baker J, Cronin M, et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med. 2004;351(27):2817–2826. doi: 10.1056/NEJMoa041588.
  • 19. van Hooff SR, Leusink FK, Roepman P, Baatenburg de Jong RJ, Speel EJ, van den Brekel MW, et al. Validation of a gene expression signature for assessment of lymph node metastasis in oral squamous cell carcinoma. J Clin Oncol. 2012;30(33):4104–4110. doi: 10.1200/JCO.2011.40.4509.
  • 20. Judd NP, Winkler AE, Murillo-Sauca O, Brotman JJ, Law JH, Lewis JS Jr, et al. ERK1/2 Regulation of CD44 Modulates Oral Cancer Aggressiveness. Cancer Res. 2012;72(1):365–374. doi: 10.1158/0008-5472.CAN-11-1831.
  • 21. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545–15550. doi: 10.1073/pnas.0506580102.
  • 22. Matsushita H, Vesely MD, Koboldt DC, Rickert CG, Uppaluri R, Magrini VJ, et al. Cancer exome analysis reveals a T-cell-dependent mechanism of cancer immunoediting. Nature. 2012;482(7385):400–404. doi: 10.1038/nature10755.
  • 23. Dipple A, Pigott M, Moschel RC, Costantino N. Evidence that binding of 7,12-dimethylbenz(a)anthracene to DNA in mouse embryo cell cultures results in extensive substitution of both adenine and guanine residues. Cancer Res. 1983;43(9):4132–4135.
  • 24. Cerami E, Gao J, Dogrusoz U, Gross BE, Sumer SO, Aksoy BA, et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2012;2(5):401–404. doi: 10.1158/2159-8290.CD-12-0095.
  • 25. Schramek D, Sendoel A, Segal JP, Beronja S, Heller E, Oristian D, et al. Direct in vivo RNAi screen unveils myosin IIa as a tumor suppressor of squamous cell carcinomas. Science. 2014;343(6168):309–313. doi: 10.1126/science.1248627.
  • 26. Cordes C, Hasler R, Werner C, Gorogh T, Rocken C, Hebebrand L, et al. The level of secretory leukocyte protease inhibitor is decreased in metastatic head and neck squamous cell carcinoma. Int J Oncol. 2011;39(1):185–191. doi: 10.3892/ijo.2011.1006.
  • 27. Nitta T, Sugihara K, Tsuyama S, Murata F. Immunohistochemical study of MUC1 mucin in premalignant oral lesions and oral squamous cell carcinoma: association with disease progression, mode of invasion, and lymph node metastasis. Cancer. 2000;88(2):245–254. doi: 10.1002/(sici)1097-0142(20000115)88:2<245::aid-cncr1>3.0.co;2-t.
  • 28. Wang J, Zhang K, Grabowska D, Li A, Dong Y, Day R, et al. Loss of Trop2 promotes carcinogenesis and features of epithelial to mesenchymal transition in squamous cell carcinoma. Mol Cancer Res. 2011;9(12):1686–1695. doi: 10.1158/1541-7786.MCR-11-0241.
  • 29. Lohavanichbutr P, Houck J, Doody DR, Wang P, Mendez E, Futran N, et al. Gene expression in uninvolved oral mucosa of OSCC patients facilitates identification of markers predictive of OSCC outcomes. PLoS One. 2012;7(9):e46575. doi: 10.1371/journal.pone.0046575.
  • 30. O'Donnell RK, Kupferman M, Wei SJ, Singhal S, Weber R, O'Malley B, et al. Gene expression signature predicts lymphatic metastasis in squamous cell carcinoma of the oral cavity. Oncogene. 2005;24(7):1244–1251. doi: 10.1038/sj.onc.1208285.
  • 31. Pavlidis P, Wapinski I, Noble WS. Support vector machine classification on the web. Bioinformatics. 2004;20(4):586–587. doi: 10.1093/bioinformatics/btg461.
  • 32. Biben C, Wang CC, Harvey RP. NK-2 class homeobox genes and pharyngeal/oral patterning: Nkx2–3 is required for salivary gland and tooth morphogenesis. Int J Dev Biol. 2002;46(4):415–422.
  • 33. Yamaguchi T, Hosono Y, Yanagisawa K, Takahashi T. NKX2-1/TTF-1: An Enigmatic Oncogene that Functions as a Double-Edged Sword for Cancer Cell Survival and Progression. Cancer Cell. 2013;23(6):718–723. doi: 10.1016/j.ccr.2013.04.002.
  • 34. Watanabe H, Francis JM, Woo MS, Etemad B, Lin W, Fries DF, et al. Integrated cistromic and expression analysis of amplified NKX2-1 in lung adenocarcinoma identifies LMO3 as a functional transcriptional target. Genes Dev. 2013;27(2):197–210. doi: 10.1101/gad.203208.112.
  • 35. De Souza Setubal Destro MF, Bitu CC, Zecchin KG, Graner E, Lopes MA, Kowalski LP, et al. Overexpression of HOXB7 homeobox gene in oral cancer induces cellular proliferation and is associated with poor prognosis. Int J Oncol. 2010;36(1):141–149.
  • 36. Pal A, Huang W, Li X, Toy KA, Nikolovska-Coleska Z, Kleer CG. CCN6 modulates BMP signaling via the Smad-independent TAK1/p38 pathway, acting to suppress metastasis of breast cancer. Cancer Res. 2012;72(18):4818–4828. doi: 10.1158/0008-5472.CAN-12-0154.
