American Journal of Human Genetics. 2016 Apr 7;98(4):709–727. doi: 10.1016/j.ajhg.2016.02.021

Genomic Characterization of Esophageal Squamous Cell Carcinoma Reveals Critical Genes Underlying Tumorigenesis and Poor Prognosis

Hai-De Qin 1,2,7, Xiao-Yu Liao 1,7, Yuan-Bin Chen 1,7, Shao-Yi Huang 1, Wen-Qiong Xue 1, Fang-Fang Li 1, Xiao-Song Ge 1,3, De-Qing Liu 1, Qiuyin Cai 4, Jirong Long 4, Xi-Zhao Li 1, Ye-Zhu Hu 1, Shao-Dan Zhang 1, Lan-Jun Zhang 1, Benjamin Lehrman 2, Alan F Scott 5, Dongxin Lin 6, Yi-Xin Zeng 1, Yin Yao Shugart 2,∗, Wei-Hua Jia 1,∗∗
PMCID: PMC4833434  PMID: 27058444

Abstract

The genetic mechanisms underlying the poor prognosis of esophageal squamous cell carcinoma (ESCC) are not well understood. Here, we report somatic mutations found in ESCC from sequencing 10 whole-genome and 57 whole-exome matched tumor-normal sample pairs. Among the identified genes, we characterized mutations in VANGL1 and showed that they accelerated cell growth in vitro. We also found that five other genes, including three coding genes (SHANK2, MYBL2, FADD) and two non-coding genes (miR-4707-5p, PCAT1), were involved in somatic copy-number alterations (SCNAs) or structural variants (SVs). A survival analysis based on the expression profiles of 321 individuals with ESCC indicated that these genes were significantly associated with poorer survival. Subsequently, we performed functional studies, which showed that miR-4707-5p and MYBL2 promoted proliferation and metastasis. Together, our results shed light on somatic mutations and genomic events that contribute to ESCC tumorigenesis and prognosis and might suggest therapeutic targets.

Introduction

Esophageal squamous cell carcinoma (ESCC [MIM: 133239]) is a rapidly progressing cancer with poor prognosis.1 The low efficacy of current treatments underscores the need to better understand the genetic mechanisms driving tumor progression.2 Common germline variants3, 4, 5, 6, 7 and somatic mutations,8, 9 as well as copy-number alterations,10 have been implicated in ESCC. Recent studies on ESCC from high-risk areas in China11, 12, 13 have highlighted the roles of multiple recurrently altered genes in the pathogenesis of ESCC, including TP53 (MIM: 191170), NOTCH1 (MIM: 190198), CDKN2A (MIM: 600160), and RB1 (MIM: 614041). A recent next-generation sequencing study of ESCC identified mutations in the APOBEC family of cytidine deaminases.14 However, the landscape of genetic lesions remains incomplete, and additional genes are likely to play a role in this disease and its progression. The discovery of these genetic lesions could yield useful predictors of survival or therapeutic targets. We used a variety of methods, including whole-genome and whole-exome sequencing, gene expression profiling, cell biology assays, and a xenograft mouse model study, to identify and characterize functional genes in ESCC development.

Material and Methods

Study Subjects

Individuals with ESCC were selected from the biobank of the Sun Yat-Sen University Cancer Center (SYSUCC). In the whole-genome sequencing (WGS) study, the criteria for inclusion were as follows: (1) all subjects were histologically confirmed, (2) the histological types of ESCC were clearly determined, and (3) individuals did not receive chemotherapy or radiotherapy before surgery. Tumor location included the upper, middle, and lower thirds of the esophagus. Details on sample characteristics are shown in Table S1. Esophageal tumor tissues and adjacent normal esophageal tissues were dissected during surgery. The tissues were frozen at −80°C for subsequent DNA extraction. Frozen tissues were re-examined by at least one certified pathologist. Tissues were classified as “normal” if no tumor cells were detected or classified as “tumor” if they contained at least 80% malignant cells.

For the whole-exome sequencing (WES) study, we collected 60 tumor-normal pairs. The detailed information on the study subjects is described in Table S1. To investigate the relationship between gene expression and survival outcome, we evaluated 321 ESCC tumor-normal pairs, using the same criteria as described above. The majority of cohort subjects were male (78.5%), the mean age was 58.1 years (range: 34–84), and 81.3% (261) of the ESCC individuals were at a late clinical stage (TNM stage: IIb–IV). 142 of the 321 individuals were alive and 179 were deceased at the endpoint of a five-year follow-up. Detailed pathological characteristics of the individuals are provided in Table S2.

Clinical data of the studied subjects was collected from medical records at SYSUCC. The information on survival status, smoking, and alcohol consumption was collected via telephone interview with ESCC-affected individuals or, if deceased, through their first-degree relatives via a survey questionnaire. Survival status was also confirmed by linking to the information in the databases of the Death Registry of the Public Security Department.

This study was approved by the human ethics committee at the SYSUCC. All samples were collected with institutional review board approval and with documented informed consent by all participants.

DNA Extraction and Quality Assurance

DNA extraction from the frozen tissues was conducted with a QIAamp DNA Mini Kit (QIAGEN) according to the manufacturer’s protocol. DNA was quantified with Quant-iT PicoGreen dsDNA quantitation reagent as specified by the manufacturer (Invitrogen). The quality of DNA samples was confirmed by Complete Genomics (CG) and by the genome sequencing core facility at Vanderbilt University.

WGS, Data QC, and Variant Calling

We performed WGS on ten matched esophageal tumor-normal genomes by using the cancer sequencing service of CG. Data from the CG sequencer, based on iterative oligomer hybridization,15 were analyzed with the CG analysis pipeline (version 2.0). Reads were aligned to the NCBI Genome browser reference genome (build 37.2), and single-nucleotide variants (SNVs), indels, and substitutions were called. For alignment in WGS, reads were initially mapped to the reference genome with a fast algorithm, and these initial mappings were refined by local de novo assembly, which was applied to all regions that appeared to contain variants in the initial mappings. The process of alignment has been described in previous papers.16

For somatic variants (SNVs, small indels), each tumor sample was compared to the baseline sample within the pair. The somatic variants that were unique to the tumor genome and not present in the normal genome were identified by the statistic somaticScore, indicating the quality for the somatic mutation calls. Somatic scores of ≥−10 (equivalent to SQHIGH) were reported in our study. Somatic variants of SQHIGH indicated a high quality of calling (i.e., it was reported that, for somatic variants in the exome, the false positives had a 3% false discovery rate at SQHIGH). The process of variant calling has been described previously.15

For somatic copy-number variants (CNVs), coverage in the tumor genome was normalized to coverage for the same region in the matched normal genome. A hidden Markov model was used to calculate CNVs in the genome. Segment plots were generated with the Bioconductor R package called Gviz (version 1.10.3).17 Structural variants (SVs), including deletions, inversions, tandem-duplications, distal-duplications, inter-chromosomal conjunctions, and complex variants, were identified with the CG pipeline. Details on data quality control (QC), assembly, and variant calling techniques have been previously described.15, 16

Somatic Circos plots were generated with the CG Assembly Pipeline version 2.0, and merged Circos plots for all samples were generated with the RCircos R package18 based on the high-confidence conjunctions identified via the CG Pipeline. We excluded the SVs involving human genomic repeats.

WES, Data QC, and Variant Calling

A total of 60 tumor-normal pairs were exome-sequenced with paired-end DNA sequencing at the genome sequencing core facility at Vanderbilt University. The Agilent Sure-Select Human All Exon V4 plus UTRs reagent targeting 335,765 human exons per 71.3 Mb in 190,414 genomic intervals was used as recommended by the manufacturer to capture the exomes and adjacent UTRs. An Illumina HiSeq2000 instrument was used to sequence the target regions to generate paired-end reads (76 bp × 2) with approximately 70× coverage per sample.

The FASTQ reads were subjected to initial data QC via QPLOT to assess the performance of the sequence run. We also used FastQC19 to evaluate the quality of the raw reads. Using UCSC Genome Browser hg19 as the reference, we conducted sequence alignment for samples that passed initial QC from FastQC by using the Burrows-Wheeler aligner (BWA, version 0.6.2).20 Specifically, reads were aligned to hg19 with the BWA software in multi-threading mode with options “-q 20” to trim poor-quality sequence bases in reads to ensure high-quality mapping. We also used the option “-k 1 -n 5 -l 25” as the alignment mapping parameters (taking the first 25 bp subsequence as the seed, the maximum edit distance in the seed is 1 and the maximum edit distance is 5).

After alignment with BWA, we performed standard post-processing of the BAM (Binary Alignment/Map) files before somatic mutation calling. These steps included the removal of duplicates (MarkDuplicates: identifies reads with identical 5′ coordinates and orientations and marks them as duplicates), indel realignment (IndelRealigner: performs local realignment of reads around indels), and base recalibration (Base Quality Score Recalibration: recalibrates base quality scores) with the Picard tools21 and the GATK tool suite (the Genome Analysis Toolkit, Broad Institute).22 This approach has been widely used to improve variant calling accuracy.

We assessed the mapping quality and evaluated the concordance rate of known SNV calls. We further checked sample contamination (or sample swaps) and mismatches by evaluating the identical-by-descent matrix with germline variants (minor allele frequency > 0.05). One sample failed the initial QC, and two WES samples failed to pass the sample contamination QC and were removed. The remaining 57 WES samples were used for subsequent analysis.

For each sample, somatic SNVs and indels were called with VARSCAN2.23 The BAM files were processed into pileup format with SAMtools, skipping the alignments with a MAPQ (mapping quality score, reported on a Phred scale) smaller than 1 (option “-q 1”). We called somatic mutations with the default options of the software, setting the minimum depth at 8 reads for the normal and 6 reads for the tumors and the variant frequency to call a heterozygote at >0.1. Somatic mutations were further filtered to determine high-confidence calls via empirically derived parameters, including a variant allele frequency > 10% in the tumor and < 5% in the normal, and the one-tailed Fisher’s exact test p value < 0.07. The Fisher’s exact test was conducted by comparing the number of reference-supporting reads and the number of variant-supporting reads in the tumor and normal categories.23 We report only high-confidence somatic variants in this study.
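The high-confidence filter described above can be illustrated with a minimal Python sketch. It is not the VarScan2 implementation: the function names and the read-count inputs are hypothetical, and the one-tailed Fisher's exact test is computed directly from the hypergeometric distribution, testing whether variant-supporting reads are enriched in the tumor. The thresholds are those stated in the text.

```python
from math import comb

def fisher_one_tailed(tumor_ref, tumor_var, normal_ref, normal_var):
    """One-tailed Fisher's exact p value: probability of observing at least
    `tumor_var` variant reads in the tumor, given fixed table margins."""
    n_tumor = tumor_ref + tumor_var
    n_var = tumor_var + normal_var
    total = n_tumor + normal_ref + normal_var
    upper = min(n_tumor, n_var)
    # Sum hypergeometric probabilities over the upper tail.
    tail = sum(
        comb(n_var, k) * comb(total - n_var, n_tumor - k)
        for k in range(tumor_var, upper + 1)
    )
    return tail / comb(total, n_tumor)

def is_high_confidence(tumor_ref, tumor_var, normal_ref, normal_var,
                       min_tumor_depth=6, min_normal_depth=8,
                       min_tumor_vaf=0.10, max_normal_vaf=0.05, max_p=0.07):
    """Apply the depth, allele-frequency, and p value thresholds from the text."""
    tumor_depth = tumor_ref + tumor_var
    normal_depth = normal_ref + normal_var
    if tumor_depth < min_tumor_depth or normal_depth < min_normal_depth:
        return False
    if tumor_var / tumor_depth <= min_tumor_vaf:
        return False
    if normal_var / normal_depth >= max_normal_vaf:
        return False
    p = fisher_one_tailed(tumor_ref, tumor_var, normal_ref, normal_var)
    return p < max_p

# Hypothetical read counts: 10 of 30 tumor reads and 0 of 30 normal reads support the variant.
print(is_high_confidence(20, 10, 30, 0))
```

A site with a tumor variant allele frequency at or below 10% is rejected before the test is evaluated, which mirrors the order in which the thresholds are listed.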

Somatic copy-number alteration (SCNA) detection was based on the quantification of deviations from the log-ratio of sequence coverage depth within a tumor-normal pair.23 We used an empirical threshold of log ratio (logR) > 1.5 for genomic amplification and a log ratio < −1.5 for genomic deletion. Genome-wide SCNA plots were generated with the in-house R scripts based on the R package called gap.24
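As a rough illustration of the thresholding step, the following sketch labels a region from its tumor and normal coverage. It assumes a base-2 logarithm and depths that have already been normalized for library size; neither detail is stated in the text, and the function name is hypothetical.

```python
from math import log2

def classify_scna(tumor_depth, normal_depth, threshold=1.5):
    """Label a region by the log ratio of tumor to normal coverage.
    Assumes positive, library-size-normalized depths and a base-2 log."""
    log_ratio = log2(tumor_depth / normal_depth)
    if log_ratio > threshold:
        return "amplification"
    if log_ratio < -threshold:
        return "deletion"
    return "neutral"

# Hypothetical normalized depths for three regions.
for tumor, normal in [(300, 100), (100, 100), (30, 100)]:
    print(tumor, normal, classify_scna(tumor, normal))
```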

Somatic Mutation Annotation, Filtering, and Characterization

WGS variant data (masterVar/somaticVcfBeta) was manipulated with the CGAtools version 1.7.0 package from CG. CG variant files were converted to masterVar format and then to VCF format with CGAtools. R25 and Python scripts were used for data manipulation and figure preparation on our PowerWulf cluster computers (PSSC Lab). Variant annotation was conducted with ANNOVAR26 with publicly available databases, including the 1000 Genomes Project database (1000g2012apr_all, April 2012 version) and NHLBI exome sequencing project (NHLBI-ESP 6500, esp6500si_all).27 SNV annotation was based on dbSNP135 (snp135). Somatic variants were also annotated with COSMIC, the Catalogue Of Somatic Mutations In Cancer (version 64, cosmic64). PolyPhen-228 (ljb_pp2) and SIFT29 (ljb_sift) whole-exome scores were used to predict the potential biological function of the variants.

After annotation, somatic SNVs were classified into several categories based on genomic functional regions, including downstream, upstream, exonic, intergenic, intronic, 3′ UTR, 5′ UTR, splicing sites, and non-coding RNA (ncRNA), and on their potential functional impacts (non-frameshift deletion and insertion, non-synonymous and synonymous SNVs, frameshift indels, and stop-gain and stop-loss variants). Loss-of-function (LOF) mutations included frameshift indels, stop-gain and stop-loss variants, and variants within splicing sites. Damaging variants were defined as those predicted to be damaging by either PolyPhen-228 or SIFT2.29 We defined LOF mutations and damaging variants as deleterious in this study. In addition, the novelty of the variants was determined with information provided by dbSNP (dbsnp135).
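The classification rules in this paragraph can be written compactly. The sketch below is only an illustration: the class labels and boolean prediction flags are hypothetical stand-ins for the annotation output and do not reproduce the field names of any annotation tool.

```python
# Hypothetical labels for the functional classes that the text counts as loss of function.
LOF_CLASSES = {"frameshift_indel", "stop_gain", "stop_loss", "splicing"}

def classify_variant(functional_class, polyphen2_damaging=False, sift_damaging=False):
    """Return 'LOF', 'damaging', or 'other' following the definitions in the text."""
    if functional_class in LOF_CLASSES:
        return "LOF"
    # Damaging: predicted damaging by either of the two predictors.
    if polyphen2_damaging or sift_damaging:
        return "damaging"
    return "other"

def is_deleterious(functional_class, polyphen2_damaging=False, sift_damaging=False):
    """Deleterious = LOF or damaging."""
    label = classify_variant(functional_class, polyphen2_damaging, sift_damaging)
    return label in {"LOF", "damaging"}

print(classify_variant("stop_gain"))
print(classify_variant("nonsynonymous_snv", sift_damaging=True))
print(is_deleterious("synonymous_snv"))
```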

We also used SnpEff (version 4.0) for annotation and vcf2maf to generate a standard TCGA (The Cancer Genome Atlas project of the International Genomics Consortium) Mutation Annotation Format (MAF) file for the somatic SNVs. The MAF format data were used as input for the MutSigCV tool30 to identify potentially significant genes.

To explore the roles of somatic mutations in regulatory elements, we annotated the whole-genome somatic mutations by mapping the mutations to the non-coding regulatory regions from the ENCODE Project and then focused on 57 chromatin immunoprecipitation sequencing (ChIP-seq) datasets generated from histone-binding assays in six human cell lines (H1hesc, Hepg2, Huvec, K562, Helas3, and Jurkat). In addition, we extracted all somatic variants in flanking regions and UTRs of genes to identify those variants mapping to known regulatory elements in the genome. We used the original January 2011 ENCODE data freeze. We reanalyzed the ENCODE data with the RegulomeDB online tool to profile the somatic variants in regulatory elements identified in all of the immortalized cell lines in ENCODE.

The genomic regions included promoters (active for H3K4me3 and H3K9Ac binding and repressive for H3K27me3 and H3K9me3 binding), enhancers (active: H3K4me1 and H3K27Ac), and elongation binding sites (active: H3K36me3, H3K79me2, and H4K20me1).

Aggregation of Somatic Mutations, SCNAs, and Hierarchical Clustering

We collapsed somatic variants into genes (NCBI build 37) or the gene sets in known signaling pathways. Consecutive SCNAs, including genomic amplifications and duplications, were merged with Bedtools.31 Mutation frequency was calculated as the number of ESCC-affected individuals who harbored the observed somatic mutations divided by the total number of samples. We used the Kyoto Encyclopedia of Genes and Genomes (KEGG) knowledge database for the illustration of altered pathways. We used the R function hclust to conduct the hierarchical clustering for samples and somatic variants. Heatmap plots were generated with our in-house R scripts based on the heatmap.2 function in the R package called gplots.32 Hierarchical clustering was conducted with the hclust method implemented in heatmap.2 based on a Euclidean distance matrix.
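The per-gene mutation frequency can be sketched as follows. The input layout (one sample and gene pair per somatic call) and the sample identifiers are hypothetical; the point is that an individual is counted once per gene no matter how many mutations that gene carries.

```python
from collections import defaultdict

def mutation_frequency(calls, n_samples):
    """Fraction of samples carrying at least one somatic mutation in each gene.
    `calls` is an iterable of (sample_id, gene) pairs, one per somatic call."""
    carriers = defaultdict(set)
    for sample_id, gene in calls:
        carriers[gene].add(sample_id)  # a set counts each individual once per gene
    return {gene: len(ids) / n_samples for gene, ids in carriers.items()}

# Hypothetical calls from a cohort of four samples.
calls = [("S1", "TP53"), ("S1", "TP53"), ("S2", "TP53"), ("S3", "NOTCH1")]
print(mutation_frequency(calls, n_samples=4))
```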

Variant Validation

To validate the SNVs and small indels, we used the Sanger sequencing method with BigDye Terminator chemistry on an ABI3730XL automated sequencer (Applied Biosystems) according to the protocol provided by the manufacturer. The somatic variants of interest were amplified by high-fidelity PCR, and then the amplified PCR products were subjected to Sanger sequencing.

Gene Selection for RNAi-Screening Assay

We selected 19 genes that had at least one somatic mutation in the studied subjects. We focused on previously uncharacterized genes, excluding well-established oncogenes that have been extensively studied (e.g., TP53, CDKN2A, and PIK3CA [MIM: 171834]).

We hypothesized that genes with deleterious somatic mutations (LOF or damaging) would be more important in the pathogenesis of ESCC. We focused on those previously uncharacterized recurrent mutations with potential function. After excluding genes that have been comprehensively characterized in cancers (i.e., TP53, CDKN2A, and NFE2L2 [MIM: 600492]), we selected 13 genes with deleterious mutations, namely, PABPC1 (MIM: 604679), ASTN1 (MIM: 600904), AGAP3, VANGL1 (MIM: 610132), CDYL (MIM: 603778), DHX33 (MIM: 614405), PLXNA2 (MIM: 601054), HCN1 (MIM: 602780), SCUBE3 (MIM: 614708), DDX26B, NRXN1 (MIM: 600565), GIGYF2 (MIM: 612003), and SYNPO2. Among them, eight genes carried LOF mutations, namely, PABPC1, ASTN1, AGAP3, VANGL1, CDYL, DHX33, NRXN1, and GIGYF2. In addition, we included SOX11 (MIM: 600898) and SETBP1 (MIM: 611060), both of which showed a relatively high frequency of mutations in UTR regions (five individuals and three individuals affected, respectively). Lastly, we included AFF3 (MIM: 601464) (two individuals affected with two UTR mutations), WASF3 (MIM: 605068) (one individual affected with two mutations), and two other genes (RSPO1 [MIM: 609595] and RSU1 [MIM: 179555]) for their potential functions linked to cancer progression. Together, a total of 19 candidate genes of interest were selected for the RNAi assay.

Cell Lines

ESCC cell lines TE-1 and EC109 were purchased from the Cell Bank of the Chinese Academy of Sciences (Shanghai). ESCC cell lines KYSE30, KYSE140, KYSE180, KYSE410, and KYSE510 were generously provided by Professor Guan from SYSUCC. HEK293T cells were obtained from the State Key Laboratory of Oncology in South China (SYSUCC) and maintained in DMEM, and the other cell lines were maintained in RPMI-1640. All media contained 10% fetal bovine serum and 1 mM penicillin and streptomycin.

Gene Expression Profiling for WGS Samples

Total RNA was extracted from tumor tissues and adjacent normal tissues of ten samples also used for WGS. Each RNA sample was quantified with the NanoDrop ND-1000 and RNA integrity was assessed with denaturing agarose gel electrophoresis. The pooled tumor RNA sample was generated by aliquoting an equal amount of tumor RNA from these ten samples. The pooled control RNA was generated from the normal RNA via the same method. After initial quality assurance, mRNA was purified from total RNA after removal of rRNA (mRNA-ONLY Eukaryotic mRNA Isolation Kit, Epicenter). Then, each sample was amplified and transcribed into fluorescent cRNA along the entire length of the transcripts. The labeled cRNAs were hybridized onto the Human LncRNA Array v.2.0 (8 × 60 K, Arraystar) and read on an Agilent Scanner G2505C according to the manufacturer’s standard protocols. Images were analyzed with the Agilent Feature Extraction software (version 11.0.1.1). Data pre-processing was performed with the GeneSpring GX v.11.5.1 software package (Agilent Technologies), including background signal adjustment, poor signal filtering, quantile normalization, and log transformation. Differentially expressed long non-coding RNAs (lncRNAs) and mRNAs in the pooled WGS samples were assessed by log fold change (FC).

Analysis of Array-Based Gene Expression in ESCC

To explore the biological impacts of SCNAs, including recurrent amplifications and deletions, we evaluated the correlation of SCNAs with global gene expression. We used both the in-house generated dataset of mRNA and lncRNA expression data (see Gene Expression Profiling for WGS Samples) and two publicly available datasets in the NCBI GEO repository, including GEO: GSE23400 (n = 53, Affymetrix Human Genome U133 Set U133A)33 and GEO: GSE20347 (n = 17, Affymetrix HG-U133A 2.0).34 Both GEO datasets were derived from mRNA arrays used to profile gene expression in ESCC tumor tissue as compared to their matched normal tissue. We included these two microarray datasets for the following reasons: (1) both studies were conducted by the same lab with the same platform (Affymetrix Human Genome U133A 2.0 Array), and (2) QC, background correction, and normalization were conducted with the same normalization method, the Robust Multiarray Average (RMA) algorithm implemented in Bioconductor in R. Additionally, (3) the results have been validated by the authors, and also have been used or validated in several other studies. Therefore, the data generated might represent a subset of ESCC differentially expressed genes with a relatively low false-positive rate.

Data pre-processing was performed (for the un-preprocessed dataset) with GeneSpring GX (Agilent Technologies, see Gene Expression Profiling for WGS Samples above). Differentially expressed genes between tumor and normal samples were identified with a paired t test and fold change. Hierarchical clustering was performed based on the Euclidean distance matrix.

Pathway Analysis

To explore the information on somatically mutated genes and differentially expressed genes, we conducted pathway analysis with the DAVID Bioinformatics Functional Annotation Tools and visualized altered genes in significantly enriched pathways with the Bioconductor R package called pathview.

Specifically, we conducted an enrichment analysis by using the DAVID Bioinformatics Functional Annotation Tools for somatically mutated genes. Separately, we conducted enrichment analyses with differentially expressed genes (combinations of differential-expression gene sets from GSE23400, GSE20347, and an in-house microarray dataset) and SCNA-affected genes. We then overlaid the same pathways with the above-mentioned three layers of analyses by using the Pathview tool and highlighted the altered genes enriched in the pathways. This combined analytical approach enabled us to illustrate the altered pathways by using different types of genomic data and expression data that might reflect the biological nature of altered genes in the pathways.

Real-Time qPCR Assays

Total RNA from tissue and cell lines was extracted with Trizol (Invitrogen). Reverse transcription was performed with the PrimeScript RT reagent kit (Takara). We performed qPCR analysis with Platinum SYBR Green qPCR SuperMix-UDG (Invitrogen) by using a LightCycler480 Real-Time PCR Detection System (Roche). Relative expression of targets was calculated with the following equations: relative quantity $= 2^{-\Delta C_t}$, where $\Delta C_t = C_t(\text{target}) - C_t(\text{reference})$. The primers for the target coding genes and lncRNAs are listed in Table S3. Primers for microRNA reverse transcription and qPCR were purchased from RiboBio (Guangzhou).
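As a small worked example of the relative-quantity calculation, the sketch below evaluates 2 raised to the power of minus ΔCt for hypothetical Ct values. The function name and the numbers are illustrative only.

```python
def relative_quantity(ct_target, ct_reference):
    """Relative expression of a target versus a reference gene: 2 ** (-delta_ct)."""
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)

# Hypothetical Ct values: the target crosses threshold 5 cycles after the reference.
print(relative_quantity(ct_target=25.0, ct_reference=20.0))
```

Each additional cycle in ΔCt halves the relative quantity, so a target detected five cycles after the reference has 1/32 of its abundance under this model.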

siRNA and miRNA Transfection

For siRNA and miRNA transfection, Lipofectamine RNAiMAX Reagent (Invitrogen) was used according to the manufacturer’s instructions. siRNA and miRNA mimics were purchased from RiboBio (Guangzhou). The sequences are shown in Table S3.

Generation of Lentivirus-Mediated Overexpressing Cells

Overexpressing lentiviruses were generated by co-transfection of PCDH-copGFP (System Biosciences) and the packaging vectors psPAX2 and pMD2.G into HEK293T. After cells were infected, the GFP-positive cells were collected by fluorescence-activated cell sorting. Primers for plasmid construction are listed in Table S3.

Short-Term Cell Proliferation Assays

Cells were placed into 96-well plates at 3 × 10³ to 4 × 10³ cells per well and incubated for 3–7 days. The MTT (3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyl tetrazolium bromide) assay was performed as previously described.35

Colony Formation Assay

5 × 10² to 3 × 10⁴ cells were placed into six-well plates and incubated for two weeks. The number of clones was counted after staining with crystal violet.

Cell-Cycle Analysis

Cells were harvested after transfection (24 hr), washed with ice-cold PBS, and fixed in 75% ice-cold ethanol at −20°C for 1 hr. RNaseA (Takara) was added at a final concentration of 50 μg/ml, and the cells were incubated at 37°C for 30 min. For staining, cells were incubated in 20 μg/ml propidium iodide (PI, Life Technologies) for 30 min at 4°C. The treated cells were collected and analyzed on a flow cytometer (Cytomics FC500, Beckman Coulter).

Transwell Migration and Invasion Assay

The migration and invasion ability of cells was determined by the Transwell chamber assay as previously described.36 4 × 10⁴ to 10 × 10⁴ cells were seeded onto the top chamber of the 24-well polyethylene terephthalate filter with 8 μm pores (BD Falcon) while the bottom chamber was filled with 600 μl RPMI1640 with 10% fetal bovine serum. After 20 hr, cells that crossed the membranes were fixed with crystal violet and counted. For the invasion assay, a Matrigel-covered Transwell chamber (BD Falcon) was used.

TMAs and IHC

Formalin-fixed, paraffin-embedded tissue microarrays (TMAs), including 120 primary ESCC tumors and 85 normal tissues, were used for immunohistochemistry (IHC). TMA slides were deparaffinized in xylene, rehydrated with ethanol, and incubated for 10 min in 3% hydrogen peroxide to quench endogenous peroxidase activity. For antigen retrieval, slides were heated with citrate buffer in a pressure container for 5 min and cooled at room temperature. The TMAs were incubated overnight at 4°C with the antibodies listed in Table S3. The nucleus was counterstained with hematoxylin. The results were independently evaluated by two pathologists on the basis of the following criteria: (1) staining intensity: 0 (negative), 1 (weak), 2 (moderate), and 3 (strong) and (2) percentage of positive cells: 1 (≤25%), 2 (25%–50%), 3 (50%–75%), and 4 (75%–100%). A non-parametric Wilcoxon rank sum test was used for the comparison of protein amounts between tumor and normal tissues.

Western Blotting

Western blot analysis was performed according to the standard protocol. Antibodies used are shown in Table S3.

miRNA Targets Prediction and Luciferase Reporter Assay

miRNA targets were predicted with TargetScan37 and miRDB38, 39 software. The full-length 3′ UTRs of the target genes were cloned into the pmirGLO Dual-Luciferase miRNA Target Expression Vector (Promega) (Table S3). The mutated vectors were constructed with the QuikChange Multi Site-Directed Mutagenesis Kit (Agilent Technologies). The site-directed mutation primers used are shown in Table S3. For the luciferase assay, vectors were co-transfected into HEK293T cells with either a negative control or miRNA mimics. 24 hr later, the luciferase activity was determined by the Dual-Glo Luciferase Assay System (Promega) and a TD-20/20 luminometer (Turner Designs).

Luciferase Assays for TOP-flash

Measurement of Wnt signaling was performed with the TOP-flash assay.40 HEK293T cells were cotransfected with pcDNA3.1-Wnt1 and either pcDNA3.1-VANGL1 or mutants of pcDNA3.1-VANGL1. Luciferase activity was assessed with the Dual-Glo Luciferase Assay System (Promega) 24 hr after transfection.

Xenografts in BALB/C-nu/nu Mice

All animal experiments were conducted following the guidelines of the institutional animal care and use committee at SYSUCC. Six-week-old age-matched BALB/C-nu/nu nude mice were used for lung metastasis assays41 and lymph node metastasis assays.42 For the lung metastasis assay, 2 × 10⁶ cells, resuspended in RPMI1640, were injected into the tail vein. After 12 weeks, mice were sacrificed, the lungs were serially sectioned, and the extent of metastasis was determined by H&E staining and microscopic examination. For the lymph node metastasis assay, 2 × 10⁶ cells, resuspended in RPMI1640, were injected into the footpad. After 8 weeks, all the popliteal lymph nodes and inguinal lymph nodes were evaluated for metastases by H&E staining.

Survival Analysis

The Cox proportional-hazards model was used for survival analysis with the R package called survival. We used a p value cutoff of <0.05 as statistical significance. For gene expression, we dichotomized gene expression values into low- and high-level groups by using an optimal cutoff value that maximizes sensitivity and specificity in the receiver operating characteristic (ROC) curve.
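The dichotomization step can be illustrated with a short sketch. It assumes that "maximizes sensitivity and specificity" means maximizing their sum (Youden's J statistic), that the binary outcome is the event status, and that high expression predicts the event; none of these details is specified in the text, and the function names are hypothetical. Both outcome classes must be present in the input.

```python
def optimal_cutoff(values, events):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.
    `values` are expression levels; `events` are 1 (event) or 0 (no event).
    Samples with value >= cutoff are treated as predicted positive."""
    n_pos = sum(events)
    n_neg = len(events) - n_pos
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, e in zip(values, events) if v >= cutoff and e == 1)
        tn = sum(1 for v, e in zip(values, events) if v < cutoff and e == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # ties keep the lowest cutoff
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

def dichotomize(values, cutoff):
    """Assign each sample to the 'high' or 'low' expression group."""
    return ["high" if v >= cutoff else "low" for v in values]

# Hypothetical expression values and event indicators.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
events = [0, 0, 0, 1, 1, 1]
cutoff, j = optimal_cutoff(values, events)
print(cutoff, j, dichotomize(values, cutoff))
```

The resulting group labels would then enter the Cox model as a binary covariate alongside the adjustment variables.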

For SCNAs, ESCC-affected subjects with a given recurrent amplification or deletion were designated as “1,” whereas unaffected individuals were designated as “0.” To evaluate hazard ratios (HRs) for deregulated gene expression, we adjusted for sex, age, clinical stage, smoking, and alcohol drinking in the Cox regression models.

Results

In this study, 67 somatic genomes of ESCC were comprehensively characterized via WGS and WES. We described a landscape of genetic lesions, including somatic SNVs, small indels, and SVs and/or SCNAs. To investigate the molecular functions of the somatically altered genes in ESCC, we conducted functional analysis by using biological assays and evaluated the influence of their expression on prognosis in a large clinical cohort (n = 321). Finally, enrichment analysis was conducted to explore a potential joint effect of altered genes at the genomic and transcriptional levels on ESCC development. See Figure 1 for the workflow of our studies.

Figure 1.

Illustration of the Experimental Design and Analyses

Initially, we conducted WGS with 10 tumor-normal pairs and WES with 60 tumor-normal pairs. One sample failed the initial QC, and two WES samples failed to pass the sample contamination QC and were removed. The remaining 57 WES samples and 10 WGS samples were used for subsequent analysis (n = 67). We used the CG pipeline 2.0 for variant calling and filtering for WGS and VARSCAN for WES. We combined high-confidence somatic variants from WGS and WES somatic mutations within 71.3 Mb of targeted regions and applied these variants to annotation and summarization. For gene expression analysis, we used in-house microarray data and two ESCC microarray datasets from the NCBI GEO database (see Material and Methods) to generate a differential expression (DE) gene set. Finally, genes from genomic data analysis and expression data analysis were subjected to biological studies and/or analysis in a large cohort. Three types of analyses were performed: (1) functional analysis: RNAi for 19 genes with LOF mutations and identified as uncharacterized based on a literature search; (2) survival analysis (n = 321); and (3) pathway-based analysis for differentially expressed genes, mutated genes, and SCNA-affected genes.

Sequence Analysis and Somatic Mutation Identification

An average read coverage of approximately 50-fold was obtained in the ten pairs of WGS samples. For WES, we obtained approximately 72-fold read coverage in 57 tumor-normal pairs of samples. In the WES samples, 90% of genetic variants were SNVs, 5% were deletions, and 4% were insertions. Additionally, 27% of identified variants were novel. See Figure S1 for the Ti/Tv profiles in samples.

A total of 19,434 somatic mutations within 71.3 Mb of targeted regions (335,765 exons) were found in the combined WGS and WES data (Table S4). Table S5 lists the details of the somatic variants identified in exomes. Figure S2 depicts the somatic variant distributions across genomic functional regions (WGS). Sanger sequencing on a subset of variants confirmed 82% (83 variants) of somatic variants identified by WGS and 93% (84 variants) by WES. A list of somatic mutations identified by WGS and WES that have been validated by Sanger sequencing is provided in Table S6.

For exonic mutations, we identified 4,582 (24%) somatic mutations in coding regions. Previously reported genes in ESCC were frequent targets: TP53 (67% of cases have non-synonymous mutations or UTR/splicing-site mutations), TTN (21% [MIM: 188840]), NOTCH1 (19%), NFE2L2 (13% [MIM: 600492]), and CDKN2A (10%) (Figure 2A). We characterized mutations by either PolyPhen-228 or SIFT229 and found that 409 (2%) of the somatic mutations were predicted to result in loss of function and 1,381 (7%) were classified as damaging (Table S4).

Figure 2.

Top Recurrent Genes Harbored Exonic Somatic Mutations in 67 ESCC Samples

(A) Upper panel, bar plot for the somatic mutation rates per Mb in tumor genomes. Lower panel, occurrence of top-ranked somatically mutated genes identified by WGS and WES. Each vertical bar represents an individual with a somatic mutation in the specified gene. For the exonic category (green), we excluded silent substitutions, mutations in the upstream or 5′ flank, the downstream or 3′ flank, and intergenic and intronic regions (IGRs). LOF (red) mutations include frameshift mutations, stop-gain and stop-loss mutations, and mutations within splicing sites. LOF and predicted damaging mutations were designated as deleterious (orange). Right panel, the x axis indicates the proportion of individuals harboring somatic mutations in the specific gene.

(B) Five of the 19 mutated genes showed significant effects on cell growth in ESCC cell lines (KYSE30, EC109) in RNAi knockdown assays (see Material and Methods). The bars represent the mean ± SD; nc, the siRNA negative control with a nonsense and scrambled sequence; ∗t test p value < 0.05, ∗∗p value < 0.01, ∗∗∗p value < 0.001.

We also conducted bioinformatics analysis to identify significantly mutated genes in our genomic data. We used the MutSigCV software43 for gene identification because of its wide use in cancer studies. TP53, CDKN2A, NOTCH1, and NFE2L2 ranked at the top of the gene list, with q values < 0.05 (Table S7). Of note, given our sequencing coverage and limited sample size, the analysis might be underpowered to detect other significant genes with rare mutations.

For somatic mutations in regulatory regions, we analyzed mutations in flanking regions, UTRs, transcriptional binding sites (TBSs), histone binding sites, or other regulatory elements of genes. See Figure S2 for the distributions of the variants in WGS samples, and see Table S8 for details of the somatic mutations within flanking regions and UTRs that were mapped to the ENCODE regulatory database with the RegulomeDB tool.

To determine which somatic mutations were correlated with overall survival, we evaluated all genes with recurrent deleterious mutation frequencies greater than 7% in the WES samples. Of these, NOTCH1 provided the strongest evidence as a predictive marker for ESCC prognosis and was correlated with ESCC overall clinical stage (p = 0.0096) and WHO T stage (p = 0.047) (Figure S3A). Survival analysis showed that individuals with deleterious NOTCH1 mutations had a better outcome than individuals without deleterious mutations (Figure S3B), but these results were preliminary because of our limited sample size. We further investigated the potential correlation between NOTCH1 gene expression and ESCC survival by using a large sample set (n = 321). Individuals with lower expression of NOTCH1 had a higher 5-year overall survival (OS) rate than those with higher NOTCH1 levels (54.6% versus 39.7%, respectively), and the median OS was 50 months and 29.6 months, respectively (log-rank test p = 0.006) (Figure S3C). Univariate analysis using Cox’s proportional hazard model showed that NOTCH1 expression (along with other clinical characteristics, as shown in Table 1) was significantly correlated with OS, with an HR of 1.60 (95% confidence interval [CI] = 1.14–2.25, p = 0.007). Multivariate Cox regression analysis indicated that NOTCH1 expression, after adjustment for age, sex, tumor stage, smoking, and alcohol consumption, was significantly associated with OS (HR = 1.51, 95% CI = 1.06–2.15, p = 0.022) (Table 1). Our study suggests that loss of function of NOTCH1 by deleterious mutations or by downregulated expression prolongs survival of ESCC-affected individuals.

Table 1.

Survival Analysis for Clinical Demographics and Gene Expression in 321 Subjects with ESCC

| Analysis Variable | HR (95% CI) | p Value |
|---|---|---|
| Univariate | | |
| Gender (male/female) | 0.87 (0.60, 1.25) | 0.447 |
| Age (< 58 / ≥ 58) | 1.66 (1.22, 2.25) | 0.001^a |
| Tumor size (< 4.0 cm / ≥ 4.0 cm) | 0.96 (0.71, 1.29) | 0.765 |
| Tumor location (upper/middle/lower) | 0.70 (0.53, 0.91) | 0.009^a |
| Histologic grade (G1/G2/G3) | 1.29 (1.05, 1.58) | 0.017^b |
| T status (T1-2/T3-4) | 1.30 (0.88, 1.92) | 0.183 |
| Lymph node metastasis (N0/N1/2/3) | 1.65 (1.40, 1.95) | <0.001^c |
| M status (M0/M1) | 2.27 (1.29, 3.99) | 0.005^a |
| Stage 10 (I-IIa/IIb-IV) | 2.96 (1.82, 4.83) | <0.001^c |
| Smoking | 1.20 (0.86, 1.69) | 0.280 |
| Alcohol drinking | 1.02 (0.75, 1.37) | 0.922 |
| NOTCH1 expression (low/high) | 1.60 (1.14, 2.25) | 0.007^a |
| MYBL2 expression (low/high) | 1.60 (1.15, 2.23) | 0.005^a |
| FADD expression (low/high) | 1.58 (1.18, 2.12) | 0.002^a |
| FGF19 expression (low/high) | 1.25 (0.86, 1.82) | 0.236 |
| SHANK2 expression (low/high) | 2.48 (1.50, 4.10) | <0.001^c |
| PCAT1 expression (low/high) | 1.52 (1.03, 2.24) | 0.035^b |
| miR-4707-5p expression (low/high) | 1.46 (0.96, 2.21) | 0.077 |
| Multivariate^d | | |
| NOTCH1 expression (low/high) | 1.51 (1.06, 2.15) | 0.022^b |
| MYBL2 expression (low/high) | 1.56 (1.11, 2.19) | 0.010^b |
| FADD expression (low/high) | 1.63 (1.20, 2.21) | 0.002^a |
| FGF19 expression (low/high) | 1.08 (0.74, 1.60) | 0.685 |
| SHANK2 expression (low/high) | 2.51 (1.49, 4.22) | 0.001^a |
| PCAT1 expression (low/high) | 1.75 (1.15, 2.67) | 0.009^a |
| miR-4707-5p expression (low/high) | 1.71 (1.10, 2.65) | 0.016^b |

^a p < 0.01.

^b p < 0.05.

^c p < 0.001.

^d Adjusted for age, sex, stage, smoking, and alcohol.
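The HR, CI, and p value columns of Table 1 are linked: for a Cox model, a Wald p value can be approximated from the hazard ratio and its 95% CI alone. The sketch below shows that relationship for three rows of the table. It is a consistency check under a normal approximation on the log-HR scale, not the authors' software, and the function name `wald_p_from_ci` is ours.

```python
import math


def wald_p_from_ci(hr: float, lower: float, upper: float,
                   z_crit: float = 1.959964) -> float:
    """Approximate two-sided Wald p value from an HR and its 95% CI."""
    # On the log scale the CI is symmetric: log(HR) +/- z_crit * SE.
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(hr) / se
    # Two-sided tail probability of a standard normal variate.
    return math.erfc(abs(z) / math.sqrt(2.0))


# (HR, lower, upper) copied from Table 1; reported p values in comments.
rows = {
    "NOTCH1, univariate": (1.60, 1.14, 2.25),    # reported p = 0.007
    "NOTCH1, multivariate": (1.51, 1.06, 2.15),  # reported p = 0.022
    "FGF19, multivariate": (1.08, 0.74, 1.60),   # reported p = 0.685
}

for label, (hr, lower, upper) in rows.items():
    print(f"{label}: approximate p = {wald_p_from_ci(hr, lower, upper):.3f}")
```

Small differences from the tabulated p values are expected because the published HRs and CIs are rounded to two decimals.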

We also employed an RNAi screening assay with the ESCC cell lines KYSE30 and EC109 to knock down 19 previously uncharacterized genes with deleterious mutations identified in this study (see Gene Selection for RNAi-Screening Assay in Material and Methods for details). Both cell lines lacked mutations in these genes. We found that five genes were associated with ESCC cell growth: VANGL1 (MIM: 610132), SCUBE3 (MIM: 614708), SOX11 (MIM: 600898), RSU1 (MIM: 179555), and RSPO1 (MIM: 609595) (Figure 2B). Of these, VANGL1 showed the largest effect.

Characterization of the Roles of VANGL1 Somatic Mutations in ESCC

VANGL1 encodes a transmembrane protein that has been implicated in cancer. However, its roles and mechanisms in ESCC development have not been determined. Among our 67 samples, we identified two deleterious mutations in VANGL1 (GenBank: NM_001172412.1), a stop gain (c.1013C>A [p.Ser338∗]) and a nonsynonymous mutation (c.464C>G [p.Ser155Cys]). The p.Ser338∗ variant is predicted to cause premature termination in the cytosolic C-terminal domain. Three other somatic mutations of unknown significance were identified in the 3′ UTR, yielding a total of 6% of studied subjects with a VANGL1 mutation (Figure 3A).

Figure 3.

VANGL1 as a Functional Tumor Suppressor through the Inhibition of the Wnt/β-catenin Pathway in ESCC

(A) Diagram of somatic mutations within VANGL1 identified in WES study subjects.

(B) siRNA knockdown of VANGL1 expression increased KYSE30 and KYSE140 cell proliferation (MTT assay). Two different siRNAs (VANGL1-si1 and VANGL1-si2) were tested in triplicate.

(C) As compared to wild-type, VANGL1 mutants enhanced proliferation of KYSE510 cells.

(D) Luciferase reporter assays indicated that the wild-type VANGL1 gene inhibited the Wnt1 mediated activation of TOP-FLASH in a dose dependent manner as compared to its mutants (VANGL1-M1 and VANGL1-M2). 10 ng, 20 ng or 40 ng of VANGL1, VANGL1-M1, or VANGL1-M2 were co-transfected with 5 ng Wnt1 into HEK293T cells.

(E) Western blots of β-catenin and its target gene c-Myc in KYSE30 and KYSE140 cells transfected with siRNA. Histone H3 and GAPDH were used as loading controls.

(F) VANGL1 knockdown upregulated the protein level of β-catenin and promoted β-catenin translocation from the cytoplasm to the nucleus in KYSE30 and KYSE410 cell lines. β-catenin was stained with red, and nuclei were counterstained with DAPI (blue).

Scale bars, 100 μm. NC, the siRNA negative control with a nonsense and scrambled sequence. The experiment was conducted in triplicate. The bars shown in (B)–(D) represent mean ± SD; #t test p values < 0.05.

Suppression of VANGL1 with siRNA promoted the proliferation of KYSE30 and KYSE140 cells (Figure 3B), whereas exogenous expression of VANGL1 inhibited KYSE510 cell growth (Figure 3C). Furthermore, VANGL1 expression vectors containing either of the two deleterious mutations identified by sequencing, p.Ser155Cys (referred to as M1) and p.Ser338∗ (referred to as M2), accelerated proliferation of KYSE510 cells (Figure 3C). VANGL1 has been reported to inhibit the Wnt/β-catenin signaling pathway in Drosophila.44 In our study, a TOP-FLASH luciferase reporter assay showed that overexpression of wild-type VANGL1 inhibited Wnt1-mediated TOP-FLASH activation in a dose-dependent manner (Figure 3D). In contrast, the VANGL1 mutants showed the opposite effect as compared to the wild-type; notably, VANGL1-M2 (p.Ser338∗) substantially enhanced TOP-FLASH activation (Figure 3D). Subsequent immunofluorescence and Western blot assays showed that VANGL1 knockdown promoted β-catenin translocation from the cytoplasm to the nucleus (Figures 3E and 3F). Also, VANGL1 suppression upregulated the protein amount of c-Myc in KYSE30 and KYSE140 cells (Figure 3E). Our results indicate that VANGL1 functions as a tumor suppressor in ESCC by inhibiting Wnt/β-catenin signaling activity.

Analysis of Structural Variants and Tandem Duplications of MYBL2

We also analyzed genomic rearrangement events detected by the CG Pipeline 2.0 (Figure 4A and Figure S4) in the WGS tumors. A total of 739 (39 to 147 per sample) high-confidence somatic rearrangements were found, including 278 deletions (38%), 147 inversions (20%), 124 duplications (17%), 29 interchromosomal translocations (4%), and 161 complex structural variants (22%) (Table S9 and Figures 4B and 4C). Structural variants (SVs) of multiple cancer-related genes were observed, including deletions of tumor suppressor genes (TP53, CDKN2A, NFE2L2, LRP1B [MIM: 608766], FHIT [MIM: 601153], TGFBR2 [MIM: 190182], and FOXP1 [MIM: 605515]), fusions of oncogenes (RUNX1T1-PHACTR1 [MIM: 133435, 608723], MAML2-TTC28 [MIM: 607537, 615098], ASXL1-RNF170 [MIM: 612990, 614649], FGF19-SHANK2 [MIM: 603891, 603290]), and amplifications of oncogenes (e.g., tandem duplications of MYBL2 [MIM: 601415]) (Figure 4D and Table S9).
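The category percentages above can be recomputed from the reported counts. The sketch below does so and confirms that the five categories account for all 739 rearrangements; the dictionary keys are shorthand labels of ours.

```python
# High-confidence somatic rearrangements in the 10 WGS tumors, as reported.
sv_counts = {
    "deletion": 278,
    "inversion": 147,
    "duplication": 124,
    "interchromosomal translocation": 29,
    "complex": 161,
}

total = sum(sv_counts.values())

# Share of each category, rounded to whole percentages as in the text.
sv_percent = {kind: round(100 * n / total) for kind, n in sv_counts.items()}

for kind, pct in sv_percent.items():
    print(f"{kind}: {sv_counts[kind]} of {total} ({pct}%)")
```

Because each share is rounded independently, the printed percentages need not sum to exactly 100.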

Figure 4.

Structural Variants and Functional Studies of MYBL2

(A) The merged Circos plot of the somatic junctions identified in the WGS samples. Orange lines indicate inter-chromosomal junctions; gray lines indicate intra-chromosomal translocations.

(B) The vertical bar plot represents high confidence structural variation (SV) counts in 10 WGS study subjects.

(C) The pie chart illustrates the fractions of SVs in different categories, including deletion, inversion, tandem-duplication, distal-duplication, inter-chromosomal conjunction, and complex.

(D) The recurrent gene amplifications and deletions (≥2 cases observed) identified in WGS study subjects. Each square indicates one ESCC individual; green color indicates the individual carrying the gene deletion as indicated; red color indicates the individual carrying the gene duplication.

(E and F) The protein level and mRNA expression of MYBL2 were higher in ESCC tumor tissues as compared with their adjacent controls, measured by IHC (E) and qPCR assay (F), respectively.

(G) The positive correlation between MYBL2 DNA copy-number and its transcriptional expression.

(H) Higher DNA copy-number of MYBL2 was significantly associated with reduced survival of ESCC subjects.

(I) Upregulation of MYBL2 was significantly associated with poor survival of ESCC subjects.

(J–M) siRNA knockdown of MYBL2 in KYSE30 and KYSE410 cell lines decreased cell proliferation (J), colony formation (K), and migration (L). All experiments were conducted in triplicate. (M) Flow cytometry analysis of cell-cycle changes in KYSE30 and KYSE410 cells transfected with MYBL2 siRNAs. Experiments were conducted in triplicate. nc, the siRNA negative control with a nonsense and scrambled sequence. The bars shown in (J)–(M) represent the mean ± SD; ∗, #t test p value < 0.05; ∗∗, ##p value < 0.01.

MYBL2, a member of the MYB family of transcription factor genes, has been implicated as a nuclear protein involved in cell-cycle progression. In our WGS samples, we found seven of ten ESCC tumors had tandemly repeated duplications of MYBL2. MYBL2 protein amount and mRNA expression were also significantly upregulated in tumors (p < 0.001) (see Figure 4E for IHC staining and Figure 4F for qPCR). The correlation between MYBL2 expression and its DNA copy number was significant (correlation coefficient = 0.748, p < 0.001) (Figure 4G).

In the survival analysis, individuals with elevated copies of MYBL2 showed a worse prognosis (p = 0.001, n = 68) (Figure 4H), as did individuals with high MYBL2 expression (log-rank p value = 0.005, n = 321) (Figure 4I). The adjusted HR was 1.56 (95% CI = 1.11–2.19) (Table 1). Knockdown of MYBL2 expression with siRNA decreased cell proliferation, colony formation, and migration of KYSE30 and KYSE410 cells (Figures 4J–4L and Figures S6A–S6D). In addition, in vitro experiments suggested that MYBL2 regulates cell proliferation by modulating the cell cycle (Figure 4M and Figure S6E). Collectively, our findings implicate MYBL2 as an important oncogene in ESCC, as has been previously shown in other cancers.45

Identification of SCNAs in ESCC

SCNAs, including amplifications and deletions, are among the most important genomic events in tumors but are largely uncharacterized in ESCC. In this study, we used coverage-based algorithms to call SCNAs from WGS and WES samples.15, 23 In our WGS data we detected 1,239 SCNAs with relative coverage greater than 2 and 7,586 with an absolute log ratio greater than 2. The SCNAs identified in the WES samples were consistent with the WGS findings and included a series of distinctive gene duplications within 1p36, 2q16, 3p24.3, 8p23.3, 10q21, 11q13.2-11q13.4 (the “FGFs-FADD [MIM: 602457]-SHANK2” region), 11q22 (the “CASP [MIM: 116896]-MMPs” region), and 14q21.1 (Figures 5A and 5C). The ESCC genomes had a pattern of amplifications and deletions characteristic of C class tumors.46 The amplified SCNAs included TP73 (MIM: 601990) on 1p36, SOX2 (MIM: 184429)47 on 3p24.3, FGFs (FGF19, FGF4 [MIM: 164980], and FGF3 [MIM: 164950]), SHANK2 in 11q13.2-11q13.4, FOXA1 (MIM: 602294) on 14q21.1, and MYC (MIM: 190080) on 8p23.3.48, 49 In addition to amplifications, we also detected a group of specific recurring deletions including 9p21.3 (the “MTAP [MIM: 156540]-CDKN2A/2B-CDKN2B-AS1 [MIM: 613149]” region) (Figure 5A). Notably, CDKN2A also carried recurrent deleterious somatic mutations in our study subjects (Figure 2A).
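Coverage-based SCNA calling compares read depth in the tumor with read depth in the matched normal, bin by bin. The sketch below illustrates the general idea with a log ratio and the threshold of 2 mentioned above. It is a simplified illustration, not the authors' pipeline: the base-2 logarithm, the pseudocount, the assumption that depths are already normalized for library size, and the function names are all ours.

```python
import math


def log2_ratio(tumor_depth: float, normal_depth: float,
               pseudocount: float = 0.5) -> float:
    """Log2 tumor/normal depth ratio for one genomic bin.

    Assumes both depths are already normalized for library size.
    The pseudocount avoids division by zero in empty bins.
    """
    return math.log2((tumor_depth + pseudocount) / (normal_depth + pseudocount))


def classify_bin(ratio: float, threshold: float = 2.0) -> str:
    """Label a bin by comparing its log ratio with +/- threshold."""
    if ratio > threshold:
        return "amplification"
    if ratio < -threshold:
        return "deletion"
    return "neutral"


# Hypothetical bins (tumor depth, normal depth); not data from the study.
example_bins = {"bin_A": (400, 50), "bin_B": (5, 60), "bin_C": (55, 50)}

for name, (tumor, normal) in example_bins.items():
    ratio = log2_ratio(tumor, normal)
    print(f"{name}: log2 ratio = {ratio:.2f} -> {classify_bin(ratio)}")
```

In practice, adjacent bins are segmented before calls are made (see Material and Methods for the segmentation method used in this study); the per-bin classification here only illustrates the thresholding step.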

Figure 5.

Analysis of SCNAs in ESCC Genomes and Molecular Characterization of SCNA-Harboring Genes

(A) Recurrent amplifications and deletions identified in WES samples. x axis indicates the chromosome number (autosomes); y axis indicates the adjusted log ratios (see Material and Methods).

(B) Heatmap plot of the SCNAs identified in 57 WES study subjects. Upper panel presents the hierarchical clustering of the samples harboring the SCNAs; right panel indicates the proportion of ESCC subjects carrying the observed SCNA.

(C) Plots illustrating the CNV segmentations for chr3, chr8, chr11, and chr14 in WGS samples (see Material and Methods for details of the relative coverage algorithm and the CNV segmentation method).

In our 57 WES tissue samples, high-frequency recurrent SCNAs (i.e., >10% of cases affected) included 206 amplifications and 111 deletions. Figure 5B shows the top 10 amplifications and deletions identified. Of note, 47% of ESCC samples had recurrent SCNAs within 11q13.2–11q13.4 (the “FGFs-FADD-SHANK2” region), and 30% of samples had SCNAs within the CDKN2A deletion (9p21.3, chr9: 21,967,078–22,009,304) (Figures 5A and 5B). Interestingly, a previous study that investigated the temporal and spatial evolution of somatic chromosomal alterations in esophageal cancer (MIM: 133239) showed that the 9p21.3 deletion was shared by Barrett esophagus (MIM: 614266) and esophageal adenocarcinoma (EAC [MIM: 614266]),50 suggesting that 9p21.3 deletion might be an important early event common to different types of esophageal cancers. Together, approximately 65% of individuals were affected by an SCNA that included 11q13.2–q13.4 or 9p21.3. Of note, some cases had SCNAs involving both regions.
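The share of cases with SCNAs in both regions is not stated, but it can be estimated from the three reported percentages by inclusion-exclusion. The sketch below does this under the assumption that all three percentages refer to the same 57 WES samples; because the inputs are rounded, the result is approximate.

```python
N_WES = 57        # WES tumor samples

pct_11q13 = 47    # % of samples with SCNAs in 11q13.2-11q13.4
pct_9p21 = 30     # % of samples with SCNAs at 9p21.3
pct_either = 65   # % of samples with an SCNA in at least one region

# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|.
pct_both = pct_11q13 + pct_9p21 - pct_either

# Convert the percentage to an approximate number of cases.
approx_both_cases = round(N_WES * pct_both / 100)

print(f"both regions: about {pct_both}% (~{approx_both_cases} of {N_WES} cases)")
```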

Expression and Survival Analysis of SCNA-Harboring Genes in ESCC

To investigate the impact of SCNAs on gene expression, we evaluated the expression amounts of 20 protein-coding genes with qPCR within the amplified regions on 3q26.33–3q27.1, 8q24.21, 11q13.3–11q13.4, and 14q21.1–14q11.2. We observed that seven genes exhibited elevated expression in ESCC tumor tissues as compared with the adjacent normal control tissue (Figure 6A and Table S10). For non-coding genes, we identified 17 miRNAs and three lncRNA genes (SHANK2-AS1, LINC00888, and PCAT1 [MIM: 616043]) within SCNA regions with abnormal expression in tumors as compared to expression in adjacent normal tissue (Figure 6B and Table S10). Using an independent sample set (n = 50 tumor-normal pairs), we found that three coding genes (FADD, FGF19, SHANK2) (Figure 6C) and four non-coding genes (miR-4707-5p, miR-1224-3p, PCAT1, miR-4448) (Figure 6D) within SCNA regions were significantly overexpressed in tumors.

Figure 6.

Functional Analysis of Coding and Non-coding Genes Involved in SCNAs

(A) Relative expression levels for protein-coding genes that occurred within regions of recurrent amplifications measured by qPCR from tumor and adjacent normal tissues.

(B) Relative expression levels for non-coding RNA genes within recurrent amplifications by qPCR in tumor and adjacent normal tissues.

(C) Comparisons of gene expression levels of SHANK2, FGF19, LAMP3, and FADD between ESCC tumor tissues and adjacent controls (n = 50).

(D) Comparisons of gene expression levels of miR-1224-3p, miR-4707-5p, miR-4448, and PCAT1 between ESCC tumor tissues and adjacent controls (n = 50).

(E) Kaplan–Meier survival curves of ESCC subjects with different expression levels of SHANK2, FADD, miR-4707-5p, and PCAT1 (n = 321).

(F–H) Molecular characterization of three noncoding RNA genes, PCAT1, miR-1224-3p, and miR-4707-5p. (F) Cell proliferation (MTT assay) and colony formation for ESCC cell lines overexpressing the lncRNA PCAT1. (G) Cell proliferation and colony formation for ESCC cell lines overexpressing miR-1224-3p. (H) Cell migration and invasion assays for ESCC cell lines overexpressing miR-4707-5p. Experiments were conducted in triplicate. Note: the bars shown in (F)–(H) represent the mean ± SD; ∗t test p value < 0.05.

To investigate the roles of the deregulated genes within recurrent SCNA regions, using a large sample set (n = 321), we conducted a survival analysis with three coding genes (FADD, FGF19, SHANK2) and two non-coding genes (miR-4707-5p, PCAT1). We found that FADD, SHANK2, miR-4707-5p, and PCAT1 overexpression was significantly associated with poorer ESCC survival (Table 1 and Figure 6E).

SHANK2, located in the 11q13.3–11q13.4 amplified SCNA, had the largest effect size on ESCC survival. The adjusted HR was 2.51 (95% CI = 1.49–4.22) (Table 1). Kaplan-Meier curves and results of the log-rank tests indicated that individuals with higher expression of SHANK2 showed poorer survival (log-rank p value < 0.001). Two other genes in the same SCNA, FADD and FGF19, were also studied. For FADD, the group with higher expression showed a poorer survival (log-rank p value = 0.002), with an adjusted HR of 1.63 (95% CI = 1.20–2.21), but no association between FGF19 and ESCC survival was observed (Table 1).

For non-coding genes, survival analysis showed that individuals with higher PCAT1 expression had a worse prognosis (log-rank p value = 0.032); the adjusted HR was 1.75 (95% CI = 1.15–2.67). For miR-4707-5p, the high-expression group showed a trend toward poorer survival that did not reach significance in the log-rank test (log-rank p value = 0.074), but the association was significant after adjustment, with an adjusted HR of 1.71 (95% CI = 1.10–2.65, p = 0.016) (Table 1 and Figure 6E).
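The survival curves referred to in this section are Kaplan-Meier (product-limit) estimates. As a reminder of how such a curve is built, the sketch below implements the estimator for one group. The follow-up times are hypothetical toy values, not data from the 321-subject cohort, and the function name is ours.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit survival estimate for one group.

    times  -- follow-up time of each subject (e.g., months)
    events -- 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)      # deaths and censorings leaving the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Subjects censored at t still count as at risk at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical follow-up data for six subjects (months, event flag).
toy_times = [6, 12, 12, 20, 30, 40]
toy_events = [1, 1, 0, 1, 0, 1]

toy_curve = kaplan_meier(toy_times, toy_events)
for t, s in toy_curve:
    print(f"t = {t:>2} months: S(t) = {s:.3f}")
```

Comparing two such curves (e.g., low versus high expression) is what the log-rank test does; the estimator above only produces the curve for a single group.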

Functional Analyses of the ncRNA Genes

In cell culture, we observed that overexpression of PCAT1 and miR-1224-3p (Figures 6F and 6G and Figure S7) increased growth and colony formation, whereas overexpression of miR-4707-5p promoted cell migration and invasion in KYSE30 and KYSE180 cells (Figure 6H and Figure S8).

To determine the effect of miR-4707-5p on metastasis, we performed two different nude mouse xenograft experiments. In the first, a lymph node metastasis model, overexpression of miR-4707-5p promoted metastasis of KYSE30 and KYSE180 cells from the footpad to the popliteal lymph nodes (Figures 7A, 7B, and 7D). In the second, a lung metastasis model, overexpression of miR-4707-5p in KYSE30 and KYSE180 cells led to more lung micrometastases (Figures 7C and 7D). To elucidate the molecular mechanisms of miR-4707-5p, we conducted a series of experiments that showed that miR-4707-5p upregulated β-catenin (MIM: 116806) and downregulated E-cadherin (MIM: 192090) (Figure 7F and Figure S9A), the master regulators in the Wnt/β-catenin signaling pathways.51

Figure 7.

miR-4707-5p Promotes Tumor Metastasis and Regulates the Wnt/Beta-Catenin Signaling Pathway by Targeting ADARB1

(A–D) Analysis of the functions of miR-4707-5p in BALB/C-nu/nu nude mice. Experiments were conducted with lymph node models and a lung metastasis model. In (A)–(C), the qualitative representations of tumor burden in xenograft nude mice are shown (A and B, photographs of the lymph node metastasis model and HE staining of lymph nodes with metastases; C, HE staining of lung tissue with metastases). The arrows indicate the metastatic foci. Scale bars, 200 μm (B), 100 μm (C), 20 μm (inset of B and C). In (D), the number of mice with or without metastatic foci was recorded with cell lines transfected with lenti-miR-4707-5p.

(E) ADARB1 abolished the effects of miR-4707-5p in ESCC cell lines. In (E), miR-4707-5p downregulated the protein level of its target gene ADARB1 in KYSE30 and KYSE180.

(F) ADARB1 reversed miR-4707-5p-mediated upregulation of β-catenin and downregulation of E-cadherin. Lane 1, cells co-transfected with controls; lane 2, cells transfected with miR-4707-5p mimics; lane 3, cells transfected with ADARB1; lane 4, cells co-transfected with miR-4707-5p mimics and ADARB1. Mimic controls and pcDNA3.1 were used for background adjustment.

(G) Luciferase reporter TOP-FLASH assay in HEK293T cells shows increased TCF/β-catenin mediated gene transcription. The effect of miR-4707-5p was antagonized by co-transfection of ADARB1. 5 ng of Wnt1 were transfected as a positive control. Mimic controls and pcDNA3.1 were used for background adjustment.

(H) The effects of miR-4707-5p overexpression on cell migration were antagonized by ADARB1.

(I) Knockdown of ADARB1 mimicked the effects of miR-4707-5p.

(J) IHC staining for ADARB1 in ESCC tumor tissues and adjacent normal tissues.

(K) The difference in ADARB1 protein levels between miR-4707-5p low and high expression groups. Mann Whitney test p value is indicated.

(L) ADARB1 low-expression individuals have worse prognosis.

The bars shown in (G)–(I) represent the mean ± SD.

ADARB1 (MIM: 601218) encodes a site-specific adenosine deaminase that can edit target mRNAs and has been previously reported as a potential tumor suppressor in hepatocellular carcinoma (MIM: 114550).52 Using bioinformatic tools and luciferase reporter assays (Figure S9B), we identified ADARB1 as one of the important target genes of miR-4707-5p. In vitro, miR-4707-5p downregulated the protein amount of ADARB1 (Figure 7E) and reversed its functions (Figures 7F–7I and Figure S9A). In our clinical cohort (120 ESCC-affected individuals with paraffin sections, drawn from the cohort of 321 total subjects), the protein amount of ADARB1 and miR-4707-5p expression were inversely correlated (Figure 7K). Further, we showed that ADARB1 was downregulated in ESCC as compared with adjacent normal tissue, and individuals with tumors expressing lower ADARB1 had poorer survival rates (Figures 7J and 7L). The TOP-FLASH assay, a measure of Wnt signaling, was performed to better understand the function of miR-4707-5p and ADARB1 (Figure 7G). Transfection of miR-4707-5p into 293T cells activated β-catenin nuclear signaling, whereas co-transfection with ADARB1 largely reduced that effect, suggesting that miR-4707-5p acts, at least partially, by regulating ADARB1. Together, our results suggest that miR-4707-5p-driven depletion of ADARB1 contributes to E-cadherin downregulation and β-catenin nuclear signaling, and we will explore the detailed mechanisms in future studies.

Pathway Enrichment Analyses

We carried out pathway enrichment analyses to explore how multiple genomic alterations including SCNAs, point mutations, and differential gene expression might jointly contribute to ESCC development. These indicated that somatic mutations were strongly enriched in multiple functionally linked pathways known to be important in tumor cell proliferation (e.g., the p53 pathway and the cell-cycle signaling pathway) and migration (e.g., the Notch signaling pathway, the Wnt/β-catenin pathway, and the cadherin signaling pathway) (Figure S10A and Table S11). Notably, 93% of samples had somatic mutations in the cell-cycle signaling pathway, 78% of samples had genes altered in the p53 pathway, and 91% of samples exhibited exonic somatic mutations in genes involved in the Wnt/β-catenin pathway. When we clustered samples by their alterations in the core pathways, they formed a major group (Figure S10B). Furthermore, these core pathways were also the targets of the transcriptionally dysregulated genes impacted by recurrent SVs and/or SCNAs (e.g., MYBL2 tandem duplications, the recurrent deletions of CDKN2A, and the recurrent amplifications of FGFs; Figure S10A). In addition, multiple differentially expressed genes were found to be enriched in important cancer pathways such as the cell-cycle pathway and the p53 signaling pathway (Table S11), suggesting a joint effect of altered genes at the genomic and transcriptional levels on ESCC development.
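The enrichment method itself is not described in this section. A common choice for this kind of analysis is a one-sided hypergeometric (over-representation) test, which asks how likely it is to see at least the observed number of altered genes in a pathway by chance. The sketch below illustrates that test under this assumption; it is not necessarily the procedure the authors used, and the function name and example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe: int, n_pathway: int,
                           n_selected: int, n_overlap: int) -> float:
    """One-sided hypergeometric p value, P(X >= n_overlap).

    n_universe -- number of genes considered in total
    n_pathway  -- genes in the universe that belong to the pathway
    n_selected -- genes in the altered (e.g., mutated) set
    n_overlap  -- altered genes that belong to the pathway
    """
    upper = min(n_pathway, n_selected)
    favorable = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return favorable / comb(n_universe, n_selected)


# Hypothetical toy example: 10 genes, 4 in the pathway, 3 altered, 2 overlap.
toy_p = hypergeom_enrichment_p(10, 4, 3, 2)
print(f"P(overlap >= 2) = {toy_p:.4f}")
```

When many pathways are tested, the resulting p values would normally be corrected for multiple testing before pathways are called enriched.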

Discussion

In this study, we sequenced tumor and normal samples to characterize somatic events in the genomes of ESCC tumors in order to identify frequently altered genes and core pathways associated with tumorigenesis and prognosis. We identified multiple somatic mutations previously seen in known cancer pathways and identified candidate genes for ESCC, including VANGL1 and miR-4707-5p.

VANGL1, a human homolog of the Drosophila tissue polarity gene strabismus (stbm), was reported to promote JNK signaling and inhibit the Wnt/β-catenin signaling pathway in Drosophila.44 In this study, the results of in vitro functional experiments showed an anti-proliferative role for VANGL1. Our experiments demonstrated that VANGL1 inhibited the Wnt/β-catenin signaling pathway, consistent with its role in Drosophila.44 Previous studies have reported that VANGL1 promoted tumor metastasis in colon,53 gastric,54 oral,55 liver,56 and brain57 cancers. However, in our study, VANGL1 had no effect on the migration of ESCC cell lines (Figure S5), which might indicate that the biological functions of VANGL1 in ESCC differ from those in the above-mentioned tumor types.

We found two somatic mutations in VANGL1, one nonsynonymous variant (c.464C>G [p.Ser155Cys]) and one stop gain (c.1013C>A [p.Ser338∗]). We checked the genotypes for the somatic mutations in VANGL1 and confirmed that only one allele was affected by each somatic mutation. We did not find any SVs or small indels in this gene. However, we recognize that the purity of tumor samples can distort variant allele frequency because of heterogeneous tumor cell clones and mixtures of infiltrating normal cells. Single-cell sequencing could provide an alternative method to investigate the VANGL1 mutations in ESCC tumor cells but was not possible for these studies. The variant p.Ser155Cys might result in abnormal disulfide bond formation between C145 and C149, which could severely affect folding and thus the stability and normal activity of VANGL1. The variant p.Ser338∗ causes a predicted premature termination of VANGL1 at its cytosolic C-terminal domain, resulting in an expected non-functional truncation. The loss of the C-terminal domain is expected to abolish interaction with other proteins and subsequently cause a loss of VANGL1 activity. We believe our results show that VANGL1 serves as a growth suppressor in the development of ESCC by inhibiting the Wnt/β-catenin pathway and that reduced expression of VANGL1 caused by somatic mutations or other mechanisms is consistent with its potential role as a tumor suppressor.

At least some of the somatically mutated genes we identified in this study could have potential clinical relevance. The most promising is NOTCH1, which encodes the key protein in the Notch signaling pathway. Our study indicated that ESCC-affected individuals with deleterious NOTCH1 mutations or downregulated NOTCH1 expression had a better prognosis (Figures S3B and S3C), consistent with a previous smaller study.58 This finding indicates that NOTCH1 could be a promising target for therapeutics. In the NCI60 panel,59 NOTCH1 showed a significant response to 14 existing anti-cancer drugs including Cytarabine, Dasatinib, Methotrexate, and Vinblastine (Table S12). We also identified 19 other somatically mutated genes that were significantly associated with anti-cancer drug response (Table S12), suggesting that these genes could also be potential targets for anti-cancer chemotherapy of ESCC.

We believe that the comprehensive characterization of genes in SCNAs is one of the strengths of this study. Previous studies have identified oncogenes within the altered genomic regions, e.g., SOX2 amplification within 3q2647, 60 and FBXW7 loss within 4q31.3.61 However, there has not been a systematic evaluation of SCNAs and ESCC development and prognosis based on genome sequencing. Although there are some limitations of the SCNA analysis with the exome-sequencing data, we conducted real-time PCR quantification for validation of MYBL2, the miRNA genes, and the other coding genes (Figure 6). In this study, we found that several miRNA genes occurring within amplified SCNAs were significantly associated with ESCC survival, and the mouse xenograft study showed that one of these, miR-4707-5p, substantially affected metastasis to the lungs and lymph nodes. From this and other studies, it is increasingly clear that clinical sequencing of tumor genomes will provide novel or improved therapeutic strategies for the treatment of ESCC and other cancers.

In conclusion, we have identified candidate genes that might play a role in ESCC and confirmed that amplification of MYBL2 and decreased protein amounts of ADARB1 are correlated with shorter survival. We have shown that at least two of these candidates, VANGL1 and miR-4707-5p, have experimentally demonstrated roles in cell proliferation, invasion, and migration. Their involvement in ESCC points to potential new pathways and possible targets for therapy and could provide better tools for detection and for predicting prognosis. We have also shown that genomic events that lead to amplification or loss of genes, both coding (i.e., MYBL2, FADD, FGF19, SHANK2) and non-coding (miR-4707-5p, PCAT1), likely play essential roles in driving tumor development and contributing to ESCC prognosis.

Acknowledgments

We’d like to thank Profs. Xiao-ou Shu and Wei Zheng of the Vanderbilt-Ingram Cancer Center for their kind advice in the course of sample preparation, exon capture, and deep sequencing. We thank Prof. Song Gao for his kind help in protein structure prediction. We also thank the Bank of Tumor Resources, Cancer Center Sun Yat-sen University for providing the ESCC samples in this study. This work was supported by grants from the National Science Fund for Distinguished Young Scholars of China (grant no. 81325018), the Key Project for International Cooperation and Exchange of the National Natural Science Foundation of China (grant no. 81220108022), and the National Basic Research Program of China (973 program no. 2011CB504303). The authors gratefully acknowledge the support of the Intramural Research Program, National Institute of Mental Health (NIMH), NIH (IRP-NIMH-NIH, grant no. MH002930-05). The views expressed in this presentation do not necessarily represent the views of the NIMH, the NIH, the US Department of Health and Human Services, or the United States Government.

Published: April 7, 2016

Footnotes

Supplemental Data include ten figures and twelve tables and can be found with this article online at http://dx.doi.org/10.1016/j.ajhg.2016.02.021.

Contributor Information

Yin Yao Shugart, Email: kay1yao@mail.nih.gov.

Wei-Hua Jia, Email: jiawh@sysucc.org.cn.

Accession Numbers

The WGS and WES data reported in this paper have been deposited in the European Genome-phenome Archive (EGA) under accession number EGA: EGAS00001001723. The lncRNA microarray data reported in this study have been deposited in the NCBI GEO database under accession number GEO: GSE77531.

Web Resources

The URLs for data presented herein are as follows:

Supplemental Data

Document S1. Figures S1–S10 and Tables S4, S10, and S12
mmc1.pdf (8.9MB, pdf)
Document S2. Table S2
mmc2.xlsx (33.2KB, xlsx)
Document S3. Table S3
mmc3.xlsx (16.8KB, xlsx)
Document S4. Table S5
mmc4.xlsx (810.2KB, xlsx)
Document S5. Table S6
mmc5.xlsx (23.3KB, xlsx)
Document S6. Table S7
mmc6.xlsx (48.8KB, xlsx)
Document S7. Table S8
mmc7.xlsx (203.6KB, xlsx)
Document S8. Table S9
mmc8.xlsx (83.8KB, xlsx)
Document S9. Table S11
mmc9.xlsx (31.1KB, xlsx)
Document S10. Article plus Supplemental Data
mmc10.pdf (13.1MB, pdf)

References

  • 1. Chang D.T., Chapman C., Shen J., Su Z., Koong A.C. Treatment of esophageal cancer based on histology: a surveillance epidemiology and end results analysis. Am. J. Clin. Oncol. 2009;32:405–410. doi: 10.1097/COC.0b013e3181917158.
  • 2. Allum W.H., Stenning S.P., Bancewicz J., Clark P.I., Langley R.E. Long-term results of a randomized trial of surgery with or without preoperative chemotherapy in esophageal cancer. J. Clin. Oncol. 2009;27:5062–5067. doi: 10.1200/JCO.2009.22.2083.
  • 3. Bass A.J., Meyerson M. Genome-wide association study in esophageal squamous cell carcinoma. Gastroenterology. 2009;137:1573–1576. doi: 10.1053/j.gastro.2009.09.026.
  • 4. Wang L.D., Zhou F.Y., Li X.M., Sun L.D., Song X., Jin Y., Li J.M., Kong G.Q., Qi H., Cui J. Genome-wide association study of esophageal squamous cell carcinoma in Chinese subjects identifies susceptibility loci at PLCE1 and C20orf54. Nat. Genet. 2010;42:759–763. doi: 10.1038/ng.648.
  • 5. Wu C., Hu Z., He Z., Jia W., Wang F., Zhou Y., Liu Z., Zhan Q., Liu Y., Yu D. Genome-wide association study identifies three new susceptibility loci for esophageal squamous-cell carcinoma in Chinese populations. Nat. Genet. 2011;43:679–684. doi: 10.1038/ng.849.
  • 6. Wu C., Li D., Jia W., Hu Z., Zhou Y., Yu D., Tong T., Wang M., Lin D., Qiao Y. Genome-wide association study identifies common variants in SLC39A6 associated with length of survival in esophageal squamous-cell carcinoma. Nat. Genet. 2013;45:632–638. doi: 10.1038/ng.2638.
  • 7. Wang L.D., Zhou F.Y., Li X.M., Sun L.D., Song X., Jin Y., Li J.M., Kong G.Q., Qi H., Cui J. Genome-wide association study of esophageal squamous cell carcinoma in Chinese subjects identifies susceptibility loci at PLCE1 and C20orf54. Nat. Genet. 2010;42:759–763. doi: 10.1038/ng.648.
  • 8. Hollstein M.C., Metcalf R.A., Welsh J.A., Montesano R., Harris C.C. Frequent mutation of the p53 gene in human esophageal cancer. Proc. Natl. Acad. Sci. USA. 1990;87:9958–9961. doi: 10.1073/pnas.87.24.9958.
  • 9. Casson A.G., Tammemagi M., Eskandarian S., Redston M., McLaughlin J., Ozcelik H. p53 alterations in oesophageal cancer: association with clinicopathological features, risk factors, and survival. Mol. Pathol. 1998;51:71–79. doi: 10.1136/mp.51.2.71.
  • 10. Beroukhim R., Mermel C.H., Porter D., Wei G., Raychaudhuri S., Donovan J., Barretina J., Boehm J.S., Dobson J., Urashima M. The landscape of somatic copy-number alteration across human cancers. Nature. 2010;463:899–905. doi: 10.1038/nature08822.
  • 11. Lin D.C., Hao J.J., Nagata Y., Xu L., Shang L., Meng X., Sato Y., Okuno Y., Varela A.M., Ding L.W. Genomic and molecular characterization of esophageal squamous cell carcinoma. Nat. Genet. 2014;46:467–473. doi: 10.1038/ng.2935.
  • 12. Song Y., Li L., Ou Y., Gao Z., Li E., Li X., Zhang W., Wang J., Xu L., Zhou Y. Identification of genomic alterations in oesophageal squamous cell cancer. Nature. 2014;509:91–95. doi: 10.1038/nature13176.
  • 13. Gao Y.B., Chen Z.L., Li J.G., Hu X.D., Shi X.J., Sun Z.M., Zhang F., Zhao Z.R., Li Z.T., Liu Z.Y. Genetic landscape of esophageal squamous cell carcinoma. Nat. Genet. 2014;46:1097–1102. doi: 10.1038/ng.3076.
  • 14. Zhang L., Zhou Y., Cheng C., Cui H., Cheng L., Kong P., Wang J., Li Y., Chen W., Song B. Genomic analyses reveal mutational signatures and frequently altered genes in esophageal squamous cell carcinoma. Am. J. Hum. Genet. 2015;96:597–611. doi: 10.1016/j.ajhg.2015.02.017.
  • 15. Drmanac R., Sparks A.B., Callow M.J., Halpern A.L., Burns N.L., Kermani B.G., Carnevali P., Nazarenko I., Nilsen G.B., Yeung G. Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science. 2010;327:78–81. doi: 10.1126/science.1181498.
  • 16. Carnevali P., Baccash J., Halpern A.L., Nazarenko I., Nilsen G.B., Pant K.P., Ebert J.C., Brownley A., Morenzoni M., Karpinchyk V. Computational techniques for human genome resequencing using mated gapped reads. J. Comput. Biol. 2012;19:279–292. doi: 10.1089/cmb.2011.0201.
  • 17. Hahne, F., Durinck, S., Ivanek, R., Mueller, A., Lianoglou, S., Tan, G., and Parsa, L. (2014). Gviz: Plotting data and annotation information along genomic coordinates. R package version 1.10.3.
  • 18. Zhang H., Meltzer P., Davis S. RCircos: an R package for Circos 2D track plots. BMC Bioinformatics. 2013;14:244. doi: 10.1186/1471-2105-14-244.
  • 19. Andrews, S. (2010). FastQC: A quality control tool for high throughput sequence data.
  • 20. Li H., Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
  • 21. PicardTeam. (2009). Picard: A set of tools (in Java) for working with next generation sequencing data in the BAM format.
  • 22. McKenna A., Hanna M., Banks E., Sivachenko A., Cibulskis K., Kernytsky A., Garimella K., Altshuler D., Gabriel S., Daly M., DePristo M.A. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–1303. doi: 10.1101/gr.107524.110.
  • 23. Koboldt D.C., Zhang Q., Larson D.E., Shen D., McLellan M.D., Lin L., Miller C.A., Mardis E.R., Ding L., Wilson R.K. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 2012;22:568–576. doi: 10.1101/gr.129684.111.
  • 24. Zhao, J.H. (2014). gap: Genetic Analysis Package. R package version 1.1-16.
  • 25. R Development Core Team (2014). R: A language and environment for statistical computing.
  • 26. Wang K., Li M., Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38:e164. doi: 10.1093/nar/gkq603.
  • 27. Exome Variant Server. (2014). NHLBI GO Exome Sequencing Project (ESP).
  • 28. Adzhubei I.A., Schmidt S., Peshkin L., Ramensky V.E., Gerasimova A., Bork P., Kondrashov A.S., Sunyaev S.R. A method and server for predicting damaging missense mutations. Nat. Methods. 2010;7:248–249. doi: 10.1038/nmeth0410-248.
  • 29. Ng P.C., Henikoff S. Predicting deleterious amino acid substitutions. Genome Res. 2001;11:863–874. doi: 10.1101/gr.176601.
  • 30. Lawrence M.S., Stojanov P., Polak P., Kryukov G.V., Cibulskis K., Sivachenko A., Carter S.L., Stewart C., Mermel C.H., Roberts S.A. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature. 2013;499:214–218. doi: 10.1038/nature12213.
  • 31. Quinlan A.R., Hall I.M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033.
  • 32. Warnes, G.R., Bolker, B., Bonebakker, L., Gentleman, R., Liaw, W.H.A., Lumley, T., Maechler, M., Magnusson, A., Moeller, S., Schwartz, M., et al. (2014). gplots: Various R programming tools for plotting data.
  • 33. Su H., Hu N., Yang H.H., Wang C., Takikita M., Wang Q.H., Giffen C., Clifford R., Hewitt S.M., Shou J.Z. Global gene expression profiling and validation in esophageal squamous cell carcinoma and its association with clinical phenotypes. Clin. Cancer Res. 2011;17:2955–2966. doi: 10.1158/1078-0432.CCR-10-2724.
  • 34. Hu N., Clifford R.J., Yang H.H., Wang C., Goldstein A.M., Ding T., Taylor P.R., Lee M.P. Genome wide analysis of DNA copy number neutral loss of heterozygosity (CNNLOH) and its relation to gene expression in esophageal squamous cell carcinoma. BMC Genomics. 2010;11:576. doi: 10.1186/1471-2164-11-576.
  • 35. Lee D.H., Thoennissen N.H., Goff C., Iwanski G.B., Forscher C., Doan N.B., Said J.W., Koeffler H.P. Synergistic effect of low-dose cucurbitacin B and low-dose methotrexate for treatment of human osteosarcoma. Cancer Lett. 2011;306:161–170. doi: 10.1016/j.canlet.2011.03.001.
  • 36. Hashimoto Y., Ito T., Inoue H., Okumura T., Tanaka E., Tsunoda S., Higashiyama M., Watanabe G., Imamura M., Shimada Y. Prognostic significance of fascin overexpression in human esophageal squamous cell carcinoma. Clin. Cancer Res. 2005;11:2597–2605. doi: 10.1158/1078-0432.CCR-04-1378.
  • 37. Lewis B.P., Burge C.B., Bartel D.P. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell. 2005;120:15–20. doi: 10.1016/j.cell.2004.12.035.
  • 38. Wang X. miRDB: a microRNA target prediction and functional annotation database with a wiki interface. RNA. 2008;14:1012–1017. doi: 10.1261/rna.965408.
  • 39. Wang X., El Naqa I.M. Prediction of both conserved and nonconserved microRNA targets in animals. Bioinformatics. 2008;24:325–332. doi: 10.1093/bioinformatics/btm595.
  • 40. Wan X., Liu J., Lu J.F., Tzelepi V., Yang J., Starbuck M.W., Diao L., Wang J., Efstathiou E., Vazquez E.S. Activation of beta-catenin signaling in androgen receptor-negative prostate cancer cells. Clin. Cancer Res. 2012;18:726–736. doi: 10.1158/1078-0432.CCR-11-2521.
  • 41. Kong K.L., Kwong D.L., Chan T.H., Law S.Y., Chen L., Li Y., Qin Y.R., Guan X.Y. MicroRNA-375 inhibits tumour growth and metastasis in oesophageal squamous cell carcinoma through repressing insulin-like growth factor 1 receptor. Gut. 2012;61:33–42. doi: 10.1136/gutjnl-2011-300178.
  • 42. Ito T., Hashimoto Y., Tanaka E., Kan T., Tsunoda S., Sato F., Higashiyama M., Okumura T., Shimada Y. An inducible short-hairpin RNA vector against osteopontin reduces metastatic potential of human esophageal squamous cell carcinoma in vitro and in vivo. Clin. Cancer Res. 2006;12:1308–1316. doi: 10.1158/1078-0432.CCR-05-1611.
  • 43. Pickering C.R., Zhou J.H., Lee J.J., Drummond J.A., Peng S.A., Saade R.E., Tsai K.Y., Curry J.L., Tetzlaff M.T., Lai S.Y. Mutational landscape of aggressive cutaneous squamous cell carcinoma. Clin. Cancer Res. 2014;20:6582–6592. doi: 10.1158/1078-0432.CCR-14-1768.
  • 44. Park M., Moon R.T. The planar cell-polarity gene stbm regulates cell behaviour and cell fate in vertebrate embryos. Nat. Cell Biol. 2002;4:20–25. doi: 10.1038/ncb716.
  • 45. Papetti M., Augenlicht L.H. MYBL2, a link between proliferation and differentiation in maturing colon epithelial cells. J. Cell. Physiol. 2011;226:785–791. doi: 10.1002/jcp.22399.
  • 46. Ciriello G., Miller M.L., Aksoy B.A., Senbabaoglu Y., Schultz N., Sander C. Emerging landscape of oncogenic signatures across human cancers. Nat. Genet. 2013;45:1127–1133. doi: 10.1038/ng.2762.
  • 47. Bass A.J., Watanabe H., Mermel C.H., Yu S., Perner S., Verhaak R.G., Kim S.Y., Wardwell L., Tamayo P., Gat-Viks I. SOX2 is an amplified lineage-survival oncogene in lung and esophageal squamous cell carcinomas. Nat. Genet. 2009;41:1238–1242. doi: 10.1038/ng.465.
  • 48. Ying J., Shan L., Li J., Zhong L., Xue L., Zhao H., Li L., Langford C., Guo L., Qiu T. Genome-wide screening for genetic alterations in esophageal cancer by aCGH identifies 11q13 amplification oncogenes associated with nodal metastasis. PLoS ONE. 2012;7:e39797. doi: 10.1371/journal.pone.0039797.
  • 49. Miyawaki Y., Kawachi H., Ooi A., Eishi Y., Kawano T., Inazawa J., Imoto I. Genomic copy-number alterations of MYC and FHIT genes are associated with survival in esophageal squamous-cell carcinoma. Cancer Sci. 2012;103:1558–1566. doi: 10.1111/j.1349-7006.2012.02329.x.
  • 50. Li X., Galipeau P.C., Paulson T.G., Sanchez C.A., Arnaudo J., Liu K., Sather C.L., Kostadinov R.L., Odze R.D., Kuhner M.K. Temporal and spatial evolution of somatic chromosomal alterations: a case-cohort study of Barrett’s esophagus. Cancer Prev. Res. (Phila.) 2014;7:114–127. doi: 10.1158/1940-6207.CAPR-13-0289.
  • 51. Anastas J.N., Moon R.T. WNT signalling pathways as therapeutic targets in cancer. Nat. Rev. Cancer. 2013;13:11–26. doi: 10.1038/nrc3419.
  • 52. Chan T.H., Lin C.H., Qi L., Fei J., Li Y., Yong K.J., Liu M., Song Y., Chow R.K., Ng V.H. A disrupted RNA editing balance mediated by ADARs (Adenosine DeAminases that act on RNA) in human hepatocellular carcinoma. Gut. 2014;63:832–843. doi: 10.1136/gutjnl-2012-304037.
  • 53. Kho D.H., Bae J.A., Lee J.H., Cho H.J., Cho S.H., Lee J.H., Seo Y.W., Ahn K.Y., Chung I.J., Kim K.K. KITENIN recruits Dishevelled/PKC delta to form a functional complex and controls the migration and invasiveness of colorectal cancer cells. Gut. 2009;58:509–519. doi: 10.1136/gut.2008.150938.
  • 54. Ryu H.S., Park Y.L., Park S.J., Lee J.H., Cho S.B., Lee W.S., Chung I.J., Kim K.K., Lee K.H., Kweon S.S., Joo Y.E. KITENIN is associated with tumor progression in human gastric cancer. Anticancer Res. 2010;30:3479–3486.
  • 55. Yoon T.M., Kim S.A., Lee J.K., Park Y.L., Kim G.Y., Joo Y.E., Lee J.H., Kim K.K., Lim S.C. Expression of KITENIN and its association with tumor progression in oral squamous cell carcinoma. Auris Nasus Larynx. 2013;40:222–226. doi: 10.1016/j.anl.2012.07.006.
  • 56. Cho S.B., Park Y.L., Park S.J., Park S.Y., Lee W.S., Park C.H., Choi S.K., Heo Y.H., Koh Y.S., Cho C.K. KITENIN is associated with activation of AP-1 target genes via MAPK cascades signaling in human hepatocellular carcinoma progression. Oncol. Res. 2011;19:115–123. doi: 10.3727/096504011x12935427587722.
  • 57. Lee K.H., Ahn E.J., Oh S.J., Kim O., Joo Y.E., Bae J.A., Yoon S., Ryu H.H., Jung S., Kim K.K. KITENIN promotes glioma invasiveness and progression, associated with the induction of EMT and stemness markers. Oncotarget. 2015;6:3240–3253. doi: 10.18632/oncotarget.3087.
  • 58. Ogawa R., Ishiguro H., Kimura M., Funahashi H., Wakasugi T., Ando T., Shiozaki M., Takeyama H. NOTCH1 expression predicts patient prognosis in esophageal squamous cell cancer. Eur. Surg. Res. 2013;51:101–107. doi: 10.1159/000355674.
  • 59. Garnett M.J., Edelman E.J., Heidorn S.J., Greenman C.D., Dastur A., Lau K.W., Greninger P., Thompson I.R., Luo X., Soares J. Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature. 2012;483:570–575. doi: 10.1038/nature11005.
  • 60. Gen Y., Yasui K., Zen Y., Zen K., Dohi O., Endo M., Tsuji K., Wakabayashi N., Itoh Y., Naito Y. SOX2 identified as a target gene for the amplification at 3q26 that is frequently detected in esophageal squamous cell carcinoma. Cancer Genet. Cytogenet. 2010;202:82–93. doi: 10.1016/j.cancergencyto.2010.01.023.
  • 61. Yokobori T., Mimori K., Iwatsuki M., Ishii H., Tanaka F., Sato T., Toh H., Sudo T., Iwaya T., Tanaka Y. Copy number loss of FBXW7 is related to gene expression and poor prognosis in esophageal squamous cell carcinoma. Int. J. Oncol. 2012;41:253–259. doi: 10.3892/ijo.2012.1436.

Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics