Abstract
The study sought to identify genetic aberrations driving oral squamous cell carcinoma (OSCC) development among users of shammah, an Arabian preparation of smokeless tobacco. Twenty archival OSCC samples, 15 of which had a history of shammah exposure, were whole-exome sequenced at an average depth of 127×. Somatic mutations were identified using a novel, matched controls-independent filtration algorithm. CODEX and Exomedepth coupled with a novel, Database of Genomic Variants-based filter were employed to call somatic gene-copy number variations. Significantly mutated genes were identified with Oncodrive FM and the Youn and Simon method. Candidate driver genes were nominated based on Gene Set Enrichment Analysis. The observed mutational spectrum was similar to that reported by the TCGA project. In addition to confirming known genes of OSCC (TP53, CDKN2A, CASP8, PIK3CA, HRAS, FAT1, TP63, CCND1 and FADD) the analysis identified several candidate novel driver events including mutations of NOTCH3, CSMD3, CRB1, CLTCL1, OSMR and TRPM2, amplification of the proto-oncogenes FOSL1, RELA, TRAF6, MDM2, FRS2 and BAG1, and deletion of the recently described tumor suppressor SMARCC1. Analysis also revealed significantly altered pathways not previously implicated in OSCC including Oncostatin-M signalling pathway, AP-1 and C-MYB transcription networks and endocytosis. There was a trend for a higher number of mutations, amplifications and driver events in samples with a history of shammah exposure, particularly those that tested EBV positive, suggesting an interaction between tobacco exposure and EBV. The work provides further evidence for the genetic heterogeneity of oral cancer and suggests shammah-associated OSCC is characterized by extensive amplification of oncogenes.
Keywords: carcinoma, exome, genetics, head and neck neoplasms, high-throughput nucleotide sequencing, mouth, shammah, smokeless, squamous cell, tobacco
Oral squamous cell carcinoma (OSCC), a subset of head and neck squamous cell carcinomas (HNSCC), accounts for >90% of malignancies affecting the oral cavity. The disease continues to have poor prognosis with a 5-year survival rate of <50%.1 The incidence rate of lip and oral cancer combined ranks 16th world-wide and 12th in less developed countries, with a 2.2:1 male to female ratio.2 However, the incidence rates are much higher in certain countries/regions including South Asia, Papua New Guinea, France, Hungary2 and, according to preliminary data, south of the Arabian peninsula.3,4 The major environmental risk factors for OSCC are tobacco consumption, alcohol abuse and betel quid chewing. Infection with HPV 16/18 is also emerging as an important contributor to the development of OSCC particularly that of the tonsils.1 In the south of the Arabian Peninsula, OSCC is associated with the use of Arabian snuff, a highly carcinogenic form of smokeless tobacco (ST) locally referred to as shammah.5
Regardless of the nature of environmental exposure, HNSCC is known to result from clonal expansion of progenitors with cumulative genetic alterations that result in activation of oncogenes and/or inactivation of tumor suppressor genes, which disrupt cell cycle regulation. These alterations include single point mutations, indels, methylation and copy number variations (CNVs). Earlier genetic studies identified a number of genes involved in HNSCC including TP53, CDKN2A, PIK3CA, CCND1, EGFR, PTEN and HRAS.6 Recently, high throughput technologies such as whole genome/exome sequencing and copy number arrays have enabled comprehensive characterization of genetic alterations in individual tumors. Analysis of HNSCC samples with these technologies has, in addition to confirming previously known driver events, identified alterations in genes not previously implicated in HNSCC development such as NOTCH1, CASP8, FAT1 and TP63.7–11
The mutational landscape of oral cancer seems to differ depending on the environmental exposure associations. It is well established, for example, that the genetic aberrations found in tobacco-associated HNSCC are quantitatively and qualitatively different from those identified in HPV-associated tumors.9–11 However, there has been no attempt to assess differences in genetic alterations by type of tobacco exposure itself, e.g., cigarette smoking and ST. A recent study involving samples of gingivo-buccal OSCC collected in India,7 where half of the oral cancers are attributed to the use of ST, revealed a different mutational landscape than that reported by other whole-exome sequencing studies on OSCC/HNSCC samples obtained in the USA,8–11 where cigarette smoking is the major route of tobacco consumption.12 In fact, epidemiological evidence indicates that ST products used in some regions such as India,13 Sudan14 and south of the Arabian Peninsula5 are far more carcinogenic than cigarette smoking and American ST products due to different processing methods. It becomes plausible, therefore, to hypothesize that these products induce unique genetic aberrations.
The objective of this study was to identify coding sequence and copy number variants involved in the initiation and progression of shammah-associated OSCC. In addition to this unique exposure, the study involved a novel study population from the south of the Arabian Peninsula.
Material and methods
OSCC DNA samples
Twenty samples were selected from an archive of anonymized, leftover DNA extracts obtained from fresh OSCC biopsies as part of a previous study in which shammah use had been identified as the major risk factor for OSCC.5 The archive was contributed by Dr. Akram Nasher at Sana’a University, Yemen, who collected the biopsies between June 2009 and February 2011 in two major hospitals in Sana’a City as detailed in the original study.5 All DNA samples had been stored at −80°C since 2011. No normal control DNA samples were available or could be obtained. The study described herein was approved by the Biomedical Research Ethical Committee at the Medical Research Center, Jazan University in Saudi Arabia.
The quantity and quality of DNA were assessed using the NanoDrop 2000 (Thermo Fisher Scientific), Qubit® 2.0 Fluorimeter (Life Technologies) and 1% agarose gel electrophoresis. Extracts with a concentration of ≥ 50 ng/µl, an OD260/280 ≥ 1.8 and a total quantity of at least 2 µg were considered for sequencing. The 20 samples for the current study were selected so as to include 15 cases with and 5 without history of shammah use. As tested by real-time polymerase chain reaction in the original study, none of the selected extracts was HPV 16 or 18 positive while 14 (70%) were EBV-positive. The clinical characteristics of the selected OSCC cases are shown in Table 1.
Table 1.
Variable | No. (%) |
---|---|
Age | |
≤ 45 years | 7 (35) |
> 45 years | 13 (65) |
Males | 8 (40) |
Site | |
Tongue | 9 (45) |
Gum | 4 (20) |
Floor of the mouth | 3 (15) |
Lip | 2 (10) |
Other unspecified parts of the mouth | 2 (10) |
Stage1 | |
1 | 0 (00) |
2 | 5 (25) |
3 | 5 (25) |
4 | 10 (50) |
Shammah users | 15 (75) |
Smokers | 6 (30) |
EBV positive | 14 (70) |
Staging based on the TNM scores.
Exome sequencing
Exome sequencing was performed at the Beijing Genomics Institute in Hong Kong. The SureSelect reagent and Human All Exome V4 kits (Agilent) were used to prepare tagged libraries and capture exomic sequences from the DNA samples. The post-capture indexing protocol (SureSelect™ Target Enrichment) was employed to ensure the highest capture efficiency and coverage. Sequencing of the exome libraries was performed using 101 bp paired-end read chemistry on a HiSeq 2000 (Illumina) with a planned on-target sequencing depth of at least 100×.
Sequencing data processing
The HiSeq 2000 data were demultiplexed and submitted to the Sequence Read Archive (SRA) under project number SRP064776. The SOAPnuke software (http://soap.genomics.org.cn) was used to remove adaptor sequences and low-quality reads, defined as those with >1% ambiguous bases and/or >10% of bases with a quality score of <20. The clean reads were mapped to hg19 using the Burrows-Wheeler Aligner (BWA; http://bio-bwa.sourceforge.net/) and alignment information was stored in .sam (sequence alignment map) format. Duplicate reads were removed with the PICARD 1.60 software (http://picard.sourceforge.net). The SAMtools software (http://samtools.sourceforge.net/) was employed to convert SAM files to BAM files before merging them for individual samples.
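The read-level quality criteria described above can be written as a simple predicate. The following Python sketch is for illustration only and is not the SOAPnuke implementation; the function name `passes_quality_filter` and the Phred+33 quality encoding are assumptions.

```python
def passes_quality_filter(seq, qual, max_n_frac=0.01, max_lowq_frac=0.10,
                          min_q=20, phred_offset=33):
    """Return True if a read survives the two criteria stated in the text.

    seq  -- read bases, e.g. "ACGTN..."
    qual -- FASTQ quality string of the same length (Phred+33 assumed)
    """
    if not seq or len(seq) != len(qual):
        return False
    n = len(seq)
    # Count ambiguous base calls (N).
    ambiguous = sum(1 for base in seq if base in "Nn")
    # Count bases whose Phred score falls below the Q20 cutoff.
    low_quality = sum(1 for ch in qual if ord(ch) - phred_offset < min_q)
    # A read is discarded with >1% ambiguous bases and/or >10% bases below Q20.
    return ambiguous / n <= max_n_frac and low_quality / n <= max_lowq_frac
```

For paired-end data, a pair would presumably be dropped when either mate fails; that policy is not stated in the text and is left out of the sketch.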
Calling sequence and copy number variants
SNPs and indels were called using the SOAPSNP software (http://soap.genomics.org.cn/) and the Genome Analysis Toolkit (GATK; https://www.broadinstitute.org/gatk/index.php), respectively. Criteria used for calling a SNP were as follows: sequencing depth at the mutation site >20×; variant supported by ≥15% of reads from the region; variant at least five bases away from the end of the read; a mapping quality >30; and an average variant quality score of >20. Indels were called using the following criteria: sequencing depth at the mutation site >20×; variant supported by ≥25% of reads from the region; variant at least eight bases away from the end of the read; and the reads supporting the reference and variant sequences not significantly different as tested by rank sum test. The called variants were then annotated with SnpEff (http://snpeff.sourceforge.net/) against dbSNP 138 (http://www.ncbi.nlm.nih.gov/SNP/), dbSNP 141 common mutations, ClinVar (http://www.ncbi.nlm.nih.gov/clinvar/), Catalogue of Somatic Mutations in Cancer (COSMIC; http://cancer.sanger.ac.uk/cosmic), 1,000 Genomes (1000G) Phases 1 and 3 (http://www.1000genomes.org/), Exome Sequencing Project (ESP) 6500 Version 2 (http://evs.gs.washington.edu/EVS/), variants in the AGI-ASP population (http://www.ncbi.nlm.nih.gov/projects/SNP/snp_viewBatch.cgi?ibid=1051844) and variants recently described in a Kuwaiti Arab population (KAP).15 Variants with no match in any of these databases were considered novel.
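The calling criteria listed above amount to a set of threshold checks per candidate variant. The sketch below illustrates them in Python; it is not the SOAPSNP or GATK code. The `Candidate` record, its field names and the rank sum significance level `alpha=0.05` are hypothetical, since the text does not state the level used.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    kind: str                      # "snp" or "indel"
    depth: int                     # reads covering the site
    alt_reads: int                 # reads supporting the variant
    min_dist_to_read_end: int      # smallest distance of the variant to a read end
    mapping_quality: float = 0.0   # used for SNPs only
    mean_variant_quality: float = 0.0  # used for SNPs only
    rank_sum_p: float = 1.0        # ref-vs-alt rank sum p-value, used for indels only


def passes_calling_criteria(c, alpha=0.05):
    """Apply the depth, support, read-end distance and quality thresholds."""
    if c.depth <= 20:              # both variant types require depth > 20x
        return False
    support = c.alt_reads / c.depth
    if c.kind == "snp":
        return (support >= 0.15
                and c.min_dist_to_read_end >= 5
                and c.mapping_quality > 30
                and c.mean_variant_quality > 20)
    if c.kind == "indel":
        # alpha is an assumed cutoff: the reference- and variant-supporting
        # reads must not differ significantly.
        return (support >= 0.25
                and c.min_dist_to_read_end >= 8
                and c.rank_sum_p >= alpha)
    raise ValueError(f"unknown variant kind: {c.kind!r}")
```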
CNVs were identified with CODEX16 and Exomedepth,17 neither of which requires data from matched normal samples. The former algorithm uses a cross-sample, read depth normalization procedure and Poisson likelihood segmentation for calling of CNVs; it has low sensitivity but high specificity. The latter uses a beta-binomial model for read depth data across samples to generate an aggregate reference set on which CNV inference is based; in the current study, the test:reference read ratio threshold was set to 0.80 for deletions and 1.25 for amplifications. Exomedepth has a reported sensitivity of 75%.
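The ratio thresholds quoted above for Exomedepth can be illustrated with a short Python sketch. This is not the Exomedepth API (which is an R package); `classify_ratio` is a hypothetical helper, and treating the thresholds as inclusive is an assumption.

```python
def classify_ratio(test_reads, reference_reads,
                   del_threshold=0.80, amp_threshold=1.25):
    """Label a region from its test:reference read ratio."""
    if reference_reads <= 0:
        raise ValueError("reference read count must be positive")
    ratio = test_reads / reference_reads
    if ratio <= del_threshold:     # assumed inclusive cutoff
        return "deletion"
    if ratio >= amp_threshold:     # assumed inclusive cutoff
        return "amplification"
    return "neutral"
```

A region with 130 test reads against 100 expected reads has a ratio of 1.3 and would be labelled an amplification under these cutoffs.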
Enrichment of somatic variants
In the absence of normal control data, potential germline mutations were filtered out using a combination of common population variant and variant frequency filters (Figure 1). The former filter comprised dbSNP 141 common mutations, variants in ESP6500 v2 with all populations allele frequency of >0.0001 and all variants in 1000G Phases 1 and 3, AGI-ASP and KAP. Variants from coding sequences (CDS), including splice sites, were then extracted and subjected to the variant frequency filter whereby SNPs with frequency >15% and indels with frequency >10% were removed. The resultant final list of mutations was deemed as potentially somatic.
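The two-stage filter described above can be sketched as follows. The sketch rests on a stated assumption: "frequency" in the variant frequency filter is interpreted as the fraction of cohort samples carrying the variant, which the text does not spell out. The record layout and the function name `enrich_somatic` are hypothetical.

```python
def enrich_somatic(variants, common_variants, n_samples,
                   snp_max_freq=0.15, indel_max_freq=0.10):
    """Keep variants that pass the population and variant frequency filters.

    variants        -- list of dicts with keys "key", "kind" ("snp"/"indel"),
                       "in_cds" (bool) and "samples" (set of sample IDs)
    common_variants -- set of variant keys found in the population databases
    """
    kept = []
    for v in variants:
        if v["key"] in common_variants:   # common population variant filter
            continue
        if not v["in_cds"]:               # restrict to CDS, incl. splice sites
            continue
        # Assumed meaning of frequency: share of cohort samples with the variant.
        freq = len(v["samples"]) / n_samples
        limit = snp_max_freq if v["kind"] == "snp" else indel_max_freq
        if freq > limit:                  # variant frequency filter
            continue
        kept.append(v)
    return kept
```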
To estimate the fraction of truly somatic mutations captured, the filtration algorithm was tested on a training set comprising confirmed somatic mutations from glioblastoma multiforme (GBM)18 and breast and colorectal cancers (CRC-BC).19 The algorithm was further assessed by comparing the resultant mutational spectrum with that of the 172 OSCC cases analyzed in the TCGA project.9
A novel approach was used to identify potentially somatic CNVs as follows (see also Figure 1). First, the July 2015 release of the Database of Genomic Variants (DGV; supporting variants format; downloaded at http://dgv.tcag.ca/dgv/app/downloads?ref=) was analyzed at the gene level, pooling all 33,808 samples in the database to generate common population, gene-level CNV frequencies (Supporting Information Table 1). Second, CNVs identified in the samples by CODEX and Exomedepth were also subjected to gene-level analysis and filtered to include only genes with one type of CNV (amplification or deletion) identified by both methods and/or in two or more samples. Finally, these were searched against the generated DGV gene variants dataset (Supporting Information Table 1) to filter out CNVs with a population frequency ≥0.0005. The remaining CNVs were considered as potentially somatic.
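The gene-level CNV filtering steps above can be sketched in Python as follows. The sketch makes two assumptions that the text leaves open: "one type of CNV" is read as a gene showing only amplifications or only deletions across the cohort, and "identified by both methods" is read as both callers agreeing within the same sample. The input layout and the name `filter_cnvs` are hypothetical.

```python
from collections import defaultdict


def filter_cnvs(calls, dgv_freq, max_pop_freq=0.0005):
    """Return the set of (gene, cnv_type) pairs deemed potentially somatic.

    calls    -- iterable of (sample, gene, cnv_type, caller) tuples
    dgv_freq -- dict mapping (gene, cnv_type) to a DGV-derived population frequency
    """
    by_gene = defaultdict(list)
    for sample, gene, cnv_type, caller in calls:
        by_gene[gene].append((sample, cnv_type, caller))

    kept = set()
    for gene, records in by_gene.items():
        types = {cnv_type for _, cnv_type, _ in records}
        if len(types) != 1:            # gene shows both amplification and deletion
            continue
        cnv_type = next(iter(types))

        callers_by_sample = defaultdict(set)
        for sample, _, caller in records:
            callers_by_sample[sample].add(caller)
        both_methods = any(len(c) >= 2 for c in callers_by_sample.values())
        recurrent = len(callers_by_sample) >= 2
        if not (both_methods or recurrent):
            continue

        # Genes absent from the DGV table are treated as frequency 0 (assumption).
        if dgv_freq.get((gene, cnv_type), 0.0) >= max_pop_freq:
            continue
        kept.add((gene, cnv_type))
    return kept
```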
Identification of significantly mutated genes
To deal with the small sample size, two functional impact-based methods were used to identify significantly mutated genes (SMGs): Oncodrive FM20 and the method described by Youn and Simon (YS method).21 The former algorithm relies entirely on measurement of mutation functional impact bias to identify SMGs, even those with low recurrence, while the latter uses a combination of background mutation rate estimation and scoring of functional impact. The p-values were corrected for multiple testing using the false discovery rate (FDR) according to the Benjamini and Hochberg method.22 However, since Oncodrive FM was too conservative, FDR calculation was performed using the prioritized subset analysis23 to improve the power of identifying important driver genes; i.e., a subset of priority genes was preselected and the FDR correction was applied to it first and then to the remaining, non-priority subset. The priority subset (Supporting Information Table 2) included driver genes of HNSCC/OSCC described in the literature as well as those mutated in ≥5% of HNSCC or OSCC samples in COSMIC. The FDR cutoff value (q-value) was set at 0.001 for the YS method and 0.25 for Oncodrive FM. Mutations of SMGs identified were then functionally annotated with SIFT (http://sift.jcvi.org/), Polyphen (http://genetics.bwh.harvard.edu/pph2/) and Mutation Assessor (http://www.ngrl.org.uk/Manchester/page/mutation-assessor) and reassessed by Sanger sequencing.
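The multiple-testing correction described above can be illustrated with a minimal Python sketch. `benjamini_hochberg` implements the standard step-up adjustment; `prioritized_fdr` is a simplified reading of the prioritized subset idea in which the adjustment is applied separately to the priority and non-priority genes, and it may differ in detail from the published procedure. Both function names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def prioritized_fdr(gene_pvalues, priority_genes):
    """Adjust p-values within the priority subset and the remainder separately."""
    priority = {g: p for g, p in gene_pvalues.items() if g in priority_genes}
    rest = {g: p for g, p in gene_pvalues.items() if g not in priority_genes}
    qvalues = {}
    for subset in (priority, rest):
        genes = list(subset)
        adjusted = benjamini_hochberg([subset[g] for g in genes])
        qvalues.update(zip(genes, adjusted))
    return qvalues
```

Because the priority subset is small, a gene in it is penalized for far fewer tests than it would be in an exome-wide correction, which is the source of the gain in power.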
Gene pathway enrichment analysis—Driver genes
SMGs and a selected panel of the genes with potentially somatic CNVs were subjected to gene pathway enrichment analysis using Gene Set Enrichment Analysis (GSEA; http://www.broadinstitute.org/gsea/index.jsp). Driver genes were nominated taking into consideration significant pathways (FDR ≤ 0.1) with relevance to tumorigenesis, function of the genes within the affected pathways, functional impact of mutations, CNV type (deletion vs. amplification) and existing evidence supporting the role of individual genes in cancer.
Results and discussion
Sequencing, quality filtration and mapping
Exome sequencing generated an average of 14 giga bases of raw data per sample at an average read length of 99.5 bases; the stringent quality filters applied removed 17% of the reads leaving an average of 11.6 giga bases clean data per sample. In total, 99% of the clean reads successfully mapped to the reference genome; 73% uniquely mapped on target, providing an average, final sequencing depth of 127×. The average fraction of target with coverage >20× was 96.7%. The corresponding figures for the flanking regions (near target) were 25× and 42%, respectively. Detailed sequencing, quality-filtration and mapping statistics are provided in Supporting Information Table 3.
Variants filtration algorithms—Potentially somatic mutations and CNVs
A total of 1,543,306 mutations (SNPs + indels) were called, of which 398,125 (25.8%) were in the CDS region; i.e., an average per subject of 77,165 and 19,906 total and CDS mutations, respectively. The common population variant filter removed 97.67% of the called CDS mutations. At this stage 5,671 unique novel CDS mutations were identified, of which 132 were recurrent and thus re-sequenced by Sanger technology (Supporting Information Table 4). A further 0.37% of the CDS mutations were filtered out by the variant frequency filter, leaving a final of 7,823 potentially somatic mutations in 5,410 genes (Supporting Information Table 5). Detailed statistics of the mutation filtration algorithm are presented in Supporting Information Table 6. In assessing the filtration algorithm on the GBM and CRC-BC confirmed somatic mutations training set (see above), it was found to retain the vast majority (96.3%) of them (Supporting Information Table 7). Based on this, it can be concluded that the final list of mutations obtained by the filtration algorithm in this study included almost all truly somatic, in addition to very rare germline, mutations.
A total of 3,531 gene deletions and 6,633 gene amplifications were identified by either or both CODEX and Exomedepth, at an average per subject of 176.6 and 331.7 gene deletions and amplifications, respectively. Restricting to genes with one CNV type identified by both methods and/or in two or more samples filtered out 92% and 70% of the deletions and amplifications, respectively. Applying the novel DGV gene variant frequency filter removed a further 6% and 12.2%, leaving a final of 71 and 1,178 potentially somatic gene deletions and amplifications, respectively (Supporting Information Table 8). The amplifications predominantly involved 11q13, 11p11.2, 12q14-15, 20p13, 20q11, 9p13.3 and 9p24; around 34% of the amplified genes were located on Chromosome 11. The very high amplification to deletion ratio could be due to a lower sensitivity of CODEX and Exomedepth in identifying heterozygous deletions.
Mutational spectrum and SMGs
The mean number of potentially somatic mutations per subject was 391 (range: 290–622). Indels and single nucleotide variants (SNVs) accounted for 5% and 95% of the mutations, respectively. Of the SNVs, 2,590 (34.84%) were silent; 4,585 (61.67%), missense; 171 (2.30%), nonsense and 89 (1.20%), splice sites. Among the indels, 45% were frameshift. C > T transitions accounted for 56.6% of the SNVs, followed by T > C transitions (17.2%) and C > G transversions (10.3%). Enrichment of C > A transversions typical of tobacco exposure, as reported in lung cancer and some OSCC exome studies,7 was not observed in the current cohort, which is consistent with the findings by Agrawal et al.10
A comparison with the mutational spectrum in the TCGA OSCC cohort is presented in Figure 2. Almost double the average number of mutations per sample was observed in the current study; however, the maximum number of mutations per subject as well as mutation distribution by type and base substitution category were very comparable to those observed in the TCGA cohort. This, together with the findings from analysis of the training set (see above), indicates that the filtration algorithm used served as a reliable alternative to the standard tumor-normal pair analysis. In fact, comparable algorithms for filtration of exome data from tumor-only samples have been recently described.24,25 Unique to the current study, however, the algorithm was comprehensive enough to include all public databases of population variants in addition to the KAP variants. The latter was the only accessible set of exome variants from an Arab population, and including it as a last filter removed an average of 286 Arab-specific mutations per sample (Supporting Information Table 6).
Twenty-nine SMGs were identified with the YS method and 22 with Oncodrive FM, with an overlap of four genes (TP53, CDKN2A, CASP8 and HRAS). That is, a total of 47 SMGs were found. The 167 mutations identified in these genes, prediction of their functional impact and results of their validation by Sanger sequencing are presented in Supporting Information Table 9. Among these, 92 (55%) mutations were truncating or predicted as moderately to highly damaging; 44 (26.3%) were found in COSMIC and 98 (58.7%) were novel. The YS method and Oncodrive FM were chosen for identifying SMGs on the assumption that functional impact-based algorithms are more suitable for cohorts with small sample size and when the mutations analyzed are not purely somatic. In fact, we did make an attempt to identify SMGs with MutSigCV (http://www.broadinstitute.org/cancer/cga/mutsig), the algorithm used in all previous OSCC/HNSCC exome studies; however, the results obtained were not satisfactory (data not shown).
Driver SMGs
Thirteen SMGs were identified as drivers (Figure 3), including seven well-established driver genes of OSCC (CDKN2A, TP53, CASP8, PIK3CA, HRAS, FAT1 and TP63) in addition to six novel candidates: NOTCH3, CSMD3 (CUB and Sushi Multiple Domains 3), CRB1 (Crumbs Family Member 1), CLTCL1 (Clathrin, Heavy Chain-Like 1), OSMR (Oncostatin-M Receptor) and TRPM2 (Transient Receptor Potential Cation Channel, Subfamily M, Member 2). Forty-five (80.3%) of the mutations affecting these driver SMGs were either truncating or moderately to highly deleterious; 30 (55.6%) had been reported in COSMIC in association with carcinoma generally or HNSCC specifically (Supporting Information Table 9).
The tumor suppressor TP53 is the most frequently mutated gene in OSCC. In the current cohort, TP53 was mutated in 35% of the cases, a rate which, while much lower than that found in previous exome sequencing studies (62–72%),7–9,11 is comparable to that reported in COSMIC based on 1,522 samples (42%). CDKN2A is another tumor suppressor frequently inactivated in OSCC, mainly via homozygous deletions. In the current study, no CDKN2A deletions were identified; however, the gene was affected by inactivating mutations in 35% of the cases, which is the highest CDKN2A mutation rate reported in the literature for OSCC. CASP8 was mutated in 30% of tumors; four of the six mutations were truncating, consistent with loss of function. Excluding the Indian, gingivo-buccal study in which CASP8 was mutated in 32% of the samples,7 the mutation rates for CASP8 reported in previous exome studies did not exceed 10%. The oncogenes HRAS and PIK3CA were activated in 15% and 20% of the tumors, respectively, which is comparable to the highest rates reported in previous studies.7,9 All HRAS-mutant samples were also CASP8-mutant, in support of the observed correlation between the two genes.8 TP63 and FAT1, two recently described driver genes of HNSCC, were each mutated in 10% of the samples. FAT1 mutations were inactivating, which is in line with its proposed function as a tumor suppressor through the Wnt pathway.26 The nature of TP63 mutations, all of which were novel, could not be ascertained. However, given that the gene was also amplified in one sample, and in view of its reported interaction with NF-κB,27 the mutations may be speculated to be activating.
NOTCH1, an increasingly recognized driver gene in HNSCC, was not found to be significantly mutated in the current cohort. Instead, NOTCH3 was identified as a candidate novel driver gene, being mutated in 15% of the tumors. The three mutations were predicted as deleterious, one of which has already been reported in COSMIC in association with carcinoma. These may be assumed to be inactivating in line with those reported for NOTCH1;10,11 however, since there is also growing evidence to support an oncogenic role of the NOTCH pathway in HNSCC,28 the possibility that the NOTCH3 mutations identified here are activating cannot be excluded. CSMD3 was mutated in 25% of the tumors; three of the mutations were moderately to highly damaging, one of which has already been described in COSMIC. The function of this gene is not fully understood, with both tumor suppressor and oncogenic properties described.29,30 Since the gene was amplified in another two samples, the identified mutations may be activating. CRB1 was mutated in four samples (20%); three of the mutations were predicted as deleterious and two have already been reported in COSMIC. CRB1 is typically expressed in the retina and brain and its mutations are associated with degenerative retinal pathologies.31 However, the gene is shown in COSMIC to be mutated, amplified and/or overexpressed in a number of cancers. In addition, its homolog CRB3 has been recently implicated as a cancer suppressor possibly via promoting cytoplasmic localization of TAZ and YAP, the transcriptional coactivators and key effectors of the Hippo pathway.32
CLTCL1 was mutated in 15% of the tumors; all mutations were deleterious, two of which have been previously described in COSMIC. The gene encodes a major structural protein of the coated pits and vesicles involved in endocytosis. By adversely affecting trafficking of growth factor receptors and cycling of integrins and cadherins, derailed endocytosis is believed to play an important role in cancer.33 CLTCL1 was also amplified in two samples, suggesting that the mutations identified may be activating. Indeed, the gene is overexpressed in several cancer types. OSMR was affected by a novel truncating mutation in two of the cases (10%), which is consistent with a tumor suppressor role such as that described in colorectal cancer.34 TRPM2 was also inactivated by a novel nonsense mutation in 10% of the samples. This gene encodes an oxidative stress-activated ion channel, and there is increasing evidence to suggest that it functions as a tumor suppressor in cancer through up-regulation of caspases and PARP cleavage.35
A number of observations are worth mentioning. First, of the novel driver SMGs identified, CLTCL1, OSMR and TRPM2 were unique to the shammah users. Second, there was a trend for higher mutation rates in males and older patients (>45 years). For example, CDKN2A, TP53 and CASP8 were mutated in 50% of the male subjects, while TP53, HRAS and FAT1 were exclusively mutated in the older cohort (frequency of 58%, 23.3% and 15.4%, respectively). Finally, all the genes described above were mutated in both early and late stage tumors, in support of their nomination as drivers.
Driver CNVs
Sixteen driver gene CNVs were identified (Figure 3). The 11q13 amplification commonly found in HNSCC and other cancers was observed in 25% of the samples. However, in addition to CCND1 and FADD, which are known oncogenes in HNSCC, the amplified region in some samples also contained RELA (v-rel avian reticuloendotheliosis viral oncogene homolog A) and FOSL1 (FOS like antigen 1), both known oncogenes never implicated as driver genes in HNSCC. RELA was amplified in 2 (10%) of the samples. The gene encodes p65, a member of the NF-κB family with strong transcriptional activity, the expression of which has been shown to correlate with tumor dissemination and poor patient survival in HNSCC.36 Amplification has also been reported in a few early studies.37 FOSL1, also called FRA-1, was amplified in 15% of the tumors. The gene’s product is a member of the FOS family that dimerizes with JUN proteins to form the AP-1 transcription factor complex. FRA-1 overexpression has been reported in several solid tumors, including HNSCC,38 and seems to play an important role in tumor invasiveness.39 FGF3, FGF4 and FGF19 were also located within the amplified regions, but since these have not been found to be expressed in HNSCC, even when amplified, they are not likely to be driver genes.40
Amplification of 12q14/15, 11p12 and 9p12 was found in 5–10% of the samples. These segments carried the oncogenes TRAF6 (TNF Receptor Associated Factor 6), MDM2, FRS2 (Fibroblast Growth Factor Receptor Substrate 2) and BAG1 (BCL2-Associated Athanogene-1), none of which has been previously reported to be amplified in HNSCC. TRAF6 was amplified in 10% of the samples. This gene has been recently implicated as an oncogene in lung cancer through activation of NF-κB.41 MDM2 and FRS2 were co-amplified in one sample (5%). The former exerts its oncogenic effects through inhibition of p53; it is amplified in several soft tissue malignancies and frequently overexpressed in HNSCC.42 FRS2 plays a critical role in cancer development through FGFR signalling and has been described as an amplified oncogene in some cancer types.43 BAG1 was amplified in one sample (5%). This gene encodes an anti-apoptotic protein that is overexpressed in several malignancies including OSCC.44
The only deleted driver gene identified was SMARCC1, being deleted in 10% of the samples. This gene is a subunit of the SWI/SNF complex and has been recently experimentally shown to act as a tumor suppressor in colon and ovarian carcinomas.45 In fact, there is increasing evidence for the involvement of SWI/SNF complex subunits, which are chromatin modifiers, in cancer.46 The role is classically thought of as tumor suppressor; however, it is proposed that oncogenesis is not due to loss of tumor suppressor subunits per se but to gain of function of remaining subunits, as shown for SMARCA2.47 The latter gene was amplified in 10% of the tumors in the current cohort, and was, therefore, also nominated as a candidate novel driver gene.
Other candidate driver CNVs identified by pathway enrichment analysis included amplification of RBPJ (Recombination Signal Binding Protein for Immunoglobulin Kappa J Region), GDF5 (Growth Differentiation Factor 5) and CLTA (Clathrin, Light Chain A). RBPJ is a transcription factor downstream of the NOTCH signaling pathway and its inhibition has been shown to suppress growth of lung and prostate cancer.48,49 GDF5 has been shown to mediate TGFβ-dependent angiogenesis in breast carcinoma.50 It may also contribute to tumorigenesis through the Hippo pathway as suggested by the GSEA results. Amplification of CLTA, in addition to mutations and amplification of CLTCL1 (see above), provides further support for a possible involvement of derailed endocytosis in OSCC.
Driver pathways
Key significantly altered gene pathways identified and the driver genes involved in each are presented in Table 2. Among these were well-established pathways of HNSCC, including apoptosis and cell cycle as well as p53, PI3K-Akt, EGFR, MAPK, NF-κB, NOTCH and Hippo signaling pathways. Compared to previous studies, however, several novel genes were affected within these pathways in the current cohort, such as RELA in PI3K-Akt signaling, TRAF6 in NF-κB signaling, MDM2 in apoptosis, FRS2 in EGFR signaling and CRB1 in Hippo signaling. Furthermore, while the NOTCH signaling pathway has been repeatedly described as tumor suppressive in HNSCC, there is a possibility it had an oncogenic effect in the current cohort, particularly in view of RBPJ amplification. Additional pathways, previously implicated in the development of malignancies other than HNSCC, were identified including AP-1 and C-MYB transcription networks, Oncostatin-M signalling pathway and endocytosis.
Table 2.
Pathway | q-Value | Source | Driver genes involved1 | % samples affected |
---|---|---|---|---|
Apoptosis | 2.34E−05 | Wikipathways | TP63; CDKN2A; TP53; CASP8; FADD; MDM2; RELA | 65 |
p53 signaling pathway | 1.40E−05 | KEGG | CDKN2A; TP53; CCND1; CASP8; MDM2 | 60 |
Signaling by EGFR in cancer | 2.70E−05 | Reactome | PIK3CA; HRAS; FRS2; FGF19; CLTA; MDM2; FGF4; FGF3 | 50 |
Downstream signaling of activated FGFR | 3.78E−05 | Reactome | PIK3CA; HRAS; FRS2; FGF19; MDM2; FGF4; FGF3 | 50 |
AP-1 transcription factor network | 4.64E−05 | PID | BAG1; TP53; CCND1; CDKN2A; FOSL1 | 60 |
PI3K-Akt signaling pathway | 5.37E−05 | KEGG | TP53; CCND1; PIK3CA; HRAS; FGF19; OSMR; MDM2; FGF4; FGF3; RELA | 50 |
G1 to S cell cycle control | 0.0004 | Wikipathways | CCND1; CDKN2A; TP53; MDM2 | 50 |
Glucocorticoid receptor regulatory network | 0.0006 | PID | MDM2; SMARCC1; TP53; RELA | 45 |
C-MYB transcription factor network | 0.0009 | PID | CCND1; HRAS; CDKN2A; CLTA; SMARCA2 | 40 |
nf-kb signaling pathway | 0.001 | BioCarta | RELA; TRAF6; FADD | 25 |
Oncostatin M signaling pathway | 0.002 | Wikipathways | HRAS; OSMR; TP53; RELA | 40 |
MAPK signaling pathway | 0.003 | KEGG | TP53; TRAF6; HRAS; FGF19; RELA; FGF4; FGF3 | 45 |
Signaling by NOTCH | 0.009 | Reactome | CCND1; RBPJ; TP53; NOTCH3 | 55 |
Endocytosis | 0.016 | KEGG | CLTCL1; MDM2; CLTA; HRAS; TRAF6 | 40 |
Hippo signaling pathway | 0.07 | KEGG | CCND1; CRB1; GDF5 | 40 |
CSMD3, TRPM2 and FAT1 did not return significant pathway hits relevant to tumorigenesis; their nomination as driver genes was based on the functional impact of mutations affecting them as well as recent literature implicating them in cancer.
Trends by shammah and EBV exposure
Although the small sample size did not allow for reliable statistical comparisons among the exposure subgroups, there were consistent trends by shammah and EBV exposure worth noting. The average number of genetic aberrations, particularly amplifications, was much higher in the samples from the shammah users than in those from the non-shammah users (Figure 4). All HRAS, PIK3CA, FAT1, OSMR, TRPM2 and CLTCL1 mutations were seen exclusively in the shammah users, as were the vast majority of driver gene amplifications. Furthermore, among the shammah users themselves, samples positive for EBV tended to have a higher number of aberrations than those negative for EBV (Figure 3). Amplifications of FOSL1, RELA, MDM2, BAG1, GDF5 and FRS2 were observed exclusively in the shammah users/EBV+ group, which also accounted for all HRAS and TRPM2 mutations, 5 out of 6 CASP8 mutations and 3 out of 4 PIK3CA mutations. These results are suggestive of a possible interaction between shammah use and EBV infection, a scenario not previously reported in the literature and worth further investigation.
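The descriptive comparison above can be sketched as a simple per-subgroup average. The snippet below is a minimal illustration only: the record layout, the function name `mean_by_group` and all counts are hypothetical and are not the study's data. In keeping with the caveat about sample size, it reports subgroup means without attempting any significance test.

```python
from collections import defaultdict
from statistics import mean


def mean_by_group(records):
    """Average aberration count per (shammah, EBV) exposure subgroup."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["shammah"], rec["ebv"])].append(rec["n_aberrations"])
    return {key: mean(values) for key, values in groups.items()}


# Hypothetical per-sample records; illustrative values only, NOT study data.
toy_records = [
    {"shammah": True, "ebv": True, "n_aberrations": 12},
    {"shammah": True, "ebv": True, "n_aberrations": 8},
    {"shammah": True, "ebv": False, "n_aberrations": 5},
    {"shammah": False, "ebv": False, "n_aberrations": 3},
]

summary = mean_by_group(toy_records)
for (shammah, ebv), avg in sorted(summary.items(), reverse=True):
    print(f"shammah={shammah}, EBV+={ebv}: mean aberrations = {avg}")
```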
Conclusions
This is the first OSCC exome study to involve samples from the Middle East and to focus on the analysis of tumors associated with the use of ST in general and Arabian snuff (shammah) in particular. Although matched control samples were not available, the novel filtration algorithm described was successfully used to identify potentially somatic variants, revealing a mutational spectrum similar to that reported by TCGA. Despite the small sample size, the functional impact-based algorithms employed were powerful enough to identify SMGs. In addition to confirming known genes of OSCC, the current study identified several novel driver events and pathways not previously reported in this malignancy, providing further evidence for the genetic heterogeneity of oral cancer. Based on the findings, shammah-associated OSCC seems to be characterized by the involvement of novel SMGs and extensive amplification of oncogenes. The roles of the candidate novel driver genes identified, however, need to be validated experimentally.
What’s new?
The mutational landscape of oral cancer seems to differ depending on the associated environmental exposure. However, there has been no attempt to assess differences in genetic alterations by type of tobacco exposure. This study sought to identify genetic aberrations driving oral squamous cell carcinoma (OSCC) development among users of shammah, an Arabian preparation of smokeless tobacco. Twenty candidate novel driver genes were identified, some of which were previously implicated in other cancer types. The majority of these driver genes were amplified and may thus be targetable with drugs. The findings broaden our understanding of the genetic basis of oral cancer.
Acknowledgments
Grant sponsor: Al Muallem Mohammed Binladin Center for Knowledge and Education; Grant number: R14351
This work was administered by the Substance Abuse Research Center (SARC) at Jazan University.
Footnotes
Additional Supporting Information may be found in the online version of this article.
The authors declare that they have no conflicts of interest.
References
- 1. Warnakulasuriya S. Global epidemiology of oral and oropharyngeal cancer. Oral Oncol. 2009;45:309–16. doi: 10.1016/j.oraloncology.2008.06.002.
- 2. Ferlay J, Soerjomataram I, Ervik M, et al. GLOBOCAN 2012 v1.0, Cancer Incidence and Mortality Worldwide: IARC CancerBase No. 11 [Internet] Lyon, France: International Agency for Research on Cancer; 2013. Available from: http://globocan.iarc.fr, 2013.
- 3. Sawair FA, Al-Mutwakel A, Al-Eryani K, et al. High relative frequency of oral squamous cell carcinoma in Yemen: qat and tobacco chewing as its aetiological background. Int J Environ Health Res. 2007;17:185–95. doi: 10.1080/09603120701254813.
- 4. Brown A, Ravichandran K, Warnakulasuriya S. The unequal burden related to the risk of oral cancer in the different regions of the Kingdom of Saudi Arabia. Commun Dent Health. 2006;23:101–6.
- 5. Nasher AT, Al-Hebshi NN, Al-Moayad EE, et al. Viral infection and oral habits as risk factors for oral squamous cell carcinoma in Yemen: a case-control study. Oral Surg Oral Med Oral Pathol Oral Radiol. 2014;118:566–72. doi: 10.1016/j.oooo.2014.08.005.
- 6. Leemans CR, Braakhuis BJ, Brakenhoff RH. The molecular biology of head and neck cancer. Nat Rev Cancer. 2010;11:9–22. doi: 10.1038/nrc2982.
- 7. India Project Team of the International Cancer Genome Consortium. Mutational landscape of gingivo-buccal oral squamous cell carcinoma reveals new recurrently-mutated genes and molecular subgroups. Nat Commun. 2013;4:2873. doi: 10.1038/ncomms3873.
- 8. Pickering CR, Zhang J, Yoo SY, et al. Integrative genomic characterization of oral squamous cell carcinoma identifies frequent somatic drivers. Cancer Discov. 2013;3:770–81. doi: 10.1158/2159-8290.CD-12-0537.
- 9. The Cancer Genome Atlas Network. Comprehensive genomic characterization of head and neck squamous cell carcinomas. Nature. 2015;517:576–82. doi: 10.1038/nature14129.
- 10. Agrawal N, Frederick MJ, Pickering CR, et al. Exome sequencing of head and neck squamous cell carcinoma reveals inactivating mutations in NOTCH1. Science. 2012;333:1154–7. doi: 10.1126/science.1206923.
- 11. Stransky N, Egloff AM, Tward AD, et al. The mutational landscape of head and neck squamous cell carcinoma. Science. 2012;333:1157–60. doi: 10.1126/science.1208130.
- 12. Freedman ND, Abnet CC, Caporaso NE, et al. Impact of changing US cigarette smoking patterns on incident cancer: risks of 20 smoking-related cancers among the women and men of the NIH-AARP cohort. Int J Epidemiol. 2015. doi: 10.1093/ije/dyv175.
- 13. Boffetta P, Hecht S, Gray N, et al. Smokeless tobacco and cancer. Lancet Oncol. 2008;9:667–75. doi: 10.1016/S1470-2045(08)70173-6.
- 14. Idris AM, Ahmed HM, Malik MO. Toombak dipping and cancer of the oral cavity in the Sudan: a case-control study. Int J Cancer. 1995;63:477–80. doi: 10.1002/ijc.2910630402.
- 15. Alsmadi O, John SE, Thareja G, et al. Genome at juncture of early human migration: a systematic analysis of two whole genomes and thirteen exomes from Kuwaiti population subgroup of inferred Saudi Arabian tribe ancestry. PLoS One. 2014;9:e99069. doi: 10.1371/journal.pone.0099069.
- 16. Jiang Y, Oldridge DA, Diskin SJ, et al. CODEX: a normalization and copy number variation detection method for whole exome sequencing. Nucleic Acids Res. 2015;43:e39. doi: 10.1093/nar/gku1363.
- 17. Plagnol V, Curtis J, Epstein M, et al. A robust model for read count data in exome sequencing experiments and implications for copy number variant calling. Bioinformatics. 2012;28:2747–54. doi: 10.1093/bioinformatics/bts526.
- 18. The Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008;455:1061–8. doi: 10.1038/nature07385.
- 19. Sjoblom T, Jones S, Wood LD, et al. The consensus coding sequences of human breast and colorectal cancers. Science. 2006;314:268–74. doi: 10.1126/science.1133427.
- 20. Gonzalez-Perez A, Lopez-Bigas N. Functional impact bias reveals cancer drivers. Nucleic Acids Res. 2012;40:e169. doi: 10.1093/nar/gks743.
- 21. Youn A, Simon R. Identifying cancer driver genes in tumor genome sequencing studies. Bioinformatics. 2011;27:175–81. doi: 10.1093/bioinformatics/btq630.
- 22. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B (Methodological) 1995;57:289–300.
- 23. Lin WY, Lee WC. Improving power of genome-wide association studies with weighted false discovery rate control and prioritized subset analysis. PLoS One. 2012;7:e33716. doi: 10.1371/journal.pone.0033716.
- 24. Crompton BD, Stewart C, Taylor-Weiner A, et al. The genomic landscape of pediatric Ewing sarcoma. Cancer Discov. 2014;4:1326–41. doi: 10.1158/2159-8290.CD-13-1037.
- 25. Goecks J, El-Rayes BF, Maithel SK, et al. Open pipelines for integrated tumor genome profiles reveal differences between pancreatic cancer tumors and cell lines. Cancer Med. 2015;4:392–403. doi: 10.1002/cam4.360.
- 26. Morris LG, Kaufman AM, Gong Y, et al. Recurrent somatic mutation of FAT1 in multiple human cancers leads to aberrant Wnt activation. Nat Genet. 2013;45:253–61. doi: 10.1038/ng.2538.
- 27. Yang X, Lu H, Yan B, et al. DeltaNp63 versatilely regulates a Broad NF-kappaB gene program and promotes squamous epithelial proliferation, migration, and inflammation. Cancer Res. 2011;71:3688–700. doi: 10.1158/0008-5472.CAN-10-3445.
- 28. Sun W, Gaykalova DA, Ochs MF, et al. Activation of the NOTCH pathway in head and neck cancer. Cancer Res. 2014;74:1091–104. doi: 10.1158/0008-5472.CAN-13-1259.
- 29. Khan FH, Pandian V, Ramraj S, et al. Acquired genetic alterations in tumor cells dictate the development of high-risk neuroblastoma and clinical outcomes. BMC Cancer. 2015;15:514. doi: 10.1186/s12885-015-1463-y.
- 30. Liu P, Morrison C, Wang L, et al. Identification of somatic mutations in non-small cell lung carcinomas using whole-exome sequencing. Carcinogenesis. 2012;33:1270–6. doi: 10.1093/carcin/bgs148.
- 31. Laprise P. Emerging role for epithelial polarity proteins of the Crumbs family as potential tumor suppressors. J Biomed Biotechnol. 2011;2011:868217. doi: 10.1155/2011/868217.
- 32. Li P, Mao X, Ren Y, et al. Epithelial cell polarity determinant CRB3 in cancer development. Int J Biol Sci. 2015;11:31–7. doi: 10.7150/ijbs.10615.
- 33. Mellman I, Yarden Y. Endocytosis and cancer. Cold Spring Harb Perspect Biol. 2014;5:a016949. doi: 10.1101/cshperspect.a016949.
- 34. Hibi K, Goto T, Sakuraba K, et al. Methylation of OSMR gene is frequently observed in non-invasive colorectal cancer. Anticancer Res. 2011;31:1293–5.
- 35. Orfanelli U, Jachetti E, Chiacchiera F, et al. Antisense transcription at the TRPM2 locus as a novel prognostic marker and therapeutic target in prostate cancer. Oncogene. 2015;34:2094–102. doi: 10.1038/onc.2014.144.
- 36. Balermpas P, Michel Y, Wagenblast J, et al. Nuclear NF-kappaB expression correlates with outcome among patients with head and neck squamous cell carcinoma treated with primary chemoradiation therapy. Int J Radiat Oncol Biol Phys. 2013;86:785–90. doi: 10.1016/j.ijrobp.2013.04.001.
- 37. Rayet B, Gelinas C. Aberrant rel/nfkb genes and activity in human cancer. Oncogene. 1999;18:6938–47. doi: 10.1038/sj.onc.1203221.
- 38. Mangone FR, Brentani MM, Nonogaki S, et al. Overexpression of Fos-related antigen-1 in head and neck squamous cell carcinoma. Int J Exp Pathol. 2005;86:205–12. doi: 10.1111/j.0959-9673.2005.00423.x.
- 39. Milde-Langosch K. The Fos family of transcription factors and their role in tumourigenesis. Eur J Cancer. 2005;41:2449–61. doi: 10.1016/j.ejca.2005.08.008.
- 40. Gibcus JH. PhD thesis. Leeuwarden: University of Groningen; 2008. Characterization of the 11q13.3 amplicon in head and neck squamous cell carcinoma.
- 41. Starczynowski DT, Lockwood WW, Delehouzee S, et al. TRAF6 is an amplified oncogene bridging the RAS and NF-kappaB pathways in human lung cancer. J Clin Invest. 2011;121:4095–105. doi: 10.1172/JCI58818.
- 42. Denaro N, Lo Nigro C, Natoli G, et al. The Role of p53 and MDM2 in head and neck cancer. ISRN Otolaryngol. 2011;2011:931813. doi: 10.5402/2011/931813.
- 43. Luo LY, Kim E, Cheung HW, et al. The tyrosine kinase adaptor protein FRS2 is oncogenic and amplified in high-grade serous ovarian cancer. Mol Cancer Res. 2015;13:502–9. doi: 10.1158/1541-7786.MCR-14-0407.
- 44. Wood J, Lee SS, Hague A. Bag-1 proteins in oral squamous cell carcinoma. Oral Oncol. 2009;45:94–102. doi: 10.1016/j.oraloncology.2008.07.013.
- 45. DelBove J, Rosson G, Strobeck M, et al. Identification of a core member of the SWI/SNF complex, BAF155/SMARCC1, as a human tumor suppressor gene. Epigenetics. 2011;6:1444–53. doi: 10.4161/epi.6.12.18492.
- 46. Helming KC, Wang X, Roberts CW. Vulnerabilities of mutant SWI/SNF complexes in cancer. Cancer Cell. 2014;26:309–17. doi: 10.1016/j.ccr.2014.07.018.
- 47. Wilson BG, Helming KC, Wang X, et al. Residual complexes containing SMARCA2 (BRM) underlie the oncogenic drive of SMARCA4 (BRG1) mutation. Mol Cell Biol. 2014;34:1136–44. doi: 10.1128/MCB.01372-13.
- 48. Lv Q, Shen R, Wang J. RBPJ inhibition impairs the growth of lung cancer. Tumour Biol. 2015;36:3751–6. doi: 10.1007/s13277-014-3015-5.
- 49. Xue L, Li H, Chen Q, et al. Inhibition of recombining binding protein suppressor of hairless (RBPJ) impairs the growth of prostate cancer. Cell Physiol Biochem. 2015;36:1982–90. doi: 10.1159/000430166.
- 50. Margheri F, Schiavone N, Papucci L, et al. GDF5 regulates TGF-b dependent angiogenesis in breast carcinoma MCF-7 cells: in vitro and in vivo control by anti-TGF-b peptides. PLoS One. 2012;7:e50342. doi: 10.1371/journal.pone.0050342.