Cancer Science. 2007 Mar 28;98(5):740–746. doi: 10.1111/j.1349-7006.2007.00454.x

Identification of a predictive gene expression signature of cervical lymph node metastasis in oral squamous cell carcinoma

Su Tien Nguyen 1,3, Shogo Hasegawa 2, Hitoshi Tsuda 4, Hirofumi Tomioka 2, Masaru Ushijima 5, Masaki Noda 3,7, Ken Omura 2,3, Yoshio Miki 1,3,6
PMCID: PMC11158652  PMID: 17391312

Abstract

An accurate assessment of the cervical lymph node metastasis status in oral cavity cancer not only helps predict the prognosis of patients, but also helps surgeons to perform the appropriate treatment. We investigated the use of microarray technology, focusing on the differences in gene expression profiles between primary tumors of oral squamous cell carcinoma that had metastasized to cervical lymph nodes and those that had not, in the hope of finding new biomarkers for the diagnosis and treatment of oral cavity cancer. We prepared two groups: a learning case group of 30 patients and a test case group of 13 patients. All tissue samples were subjected to laser captured microdissection to yield cancer cells, and RNA was isolated from the purified cancer cells. To identify a predictive gene expression signature, gene expression was compared between the groups with and without metastasis in the learning case (n = 30), and 85 differentially expressed genes were selected. Subsequently, to construct a more accurate prediction model, we further selected the genes with a high power for prediction from the 85 genes using the AdaBoost algorithm. The eight candidate genes, DCTD, IL‐15, THBD, GSDML, SH3GL3, PTHLH, RP5‐1022P6 and C9orf46, were selected to achieve the minimum error rate. Quantitative reverse transcription–polymerase chain reaction was carried out to validate the selected genes. A prediction model including the eight genes was then constructed and evaluated using the test case group. The results in 12 of 13 cases (∼92.3%) were predicted correctly. (Cancer Sci 2007; 98: 740–746)


In 2005, 335 870 Japanese people died from cancer. Of these, 5679 people had oral cavity cancer (http://www.mhlw.go.jp/toukei/saikin/hw/jinkou/suikei05/index.html) and the major cause of death by cancer was metastasis. An accurate assessment of the cervical lymph node metastasis status in oral cavity cancer not only helps predict the prognosis of patients, but also helps surgeons to carry out the appropriate treatment. When the disease is localized, surgical procedures can be used to remove the tumor in its entirety. For patients who are diagnosed clinically as cervical lymph node metastasis‐positive (N+), a surgical procedure, known as radical neck dissection (RND), is used to remove all lymph node groups from levels I, II, III, IV and V, which involves the sacrifice of the internal jugular vein, sternocleidomastoid muscle and spinal accessory nerve.

Clinical staging of the cervical lymph nodes is carried out by clinical examination of the neck region or by ultrasound, computed tomography and magnetic resonance imaging, but the sensitivity of these methods is still limited. Post‐operative histological examination shows that approximately 30% of clinically diagnosed metastasis‐negative (N0) patients have metastasis‐positive lymph nodes in the neck,( 1 ) and 10–20% of clinically diagnosed metastasis‐positive (N+) patients turn out to be metastasis‐free. Because the false‐negative rate is high in clinically diagnosed metastasis‐negative (N0) patients, most surgeons are reluctant to adopt a ‘wait and watch’ policy, which may allow metastasis to spread further. Thus, surgeons usually carry out a supraomohyoid neck dissection (SOHND) to remove lymph nodes at levels I, II and III to screen for metastasis. Although SOHND is not as stringent as RND and the technique of neck dissection has been perfected over the last century, surgeons still face minor and major complications during the surgical procedure,( 2 ) as well as sequelae such as chronic pain and limitation of shoulder movement due to a weak trapezius muscle.

Metastasis is a very complicated process. To metastasize, the cancer cells must break away from the tumor, increase their mobility and move through the extracellular matrix. Next they must invade the lymph vessels and grow in the lymph nodes or invade blood vessels and travel in the circulatory system. They then can pass through the vessel walls into surrounding tissue (distant metastasis). We think that the original genes controlling this process and the gene products of this process may be used as predictive markers of cervical lymph node metastasis. In the present study, microarray technology was used to investigate the differences in gene expression profiles between primary tumors of oral squamous cell carcinoma (OSCC) that metastasized to cervical lymph nodes and those that did not metastasize, in an effort to find new biomarkers that will provide more accurate diagnosis and more appropriate treatment for OSCC.

Materials and Methods

Tumor samples.  All of the primary oral cancer specimens were obtained from anonymous, previously untreated patients at the Faculty of Dentistry, Tokyo Medical and Dental University, and were confirmed as squamous cell carcinoma of the oral cavity by histopathology. Informed consent was obtained from all of the patients. The use of all clinical materials was approved by the ethics committee. The samples were embedded using Tissue‐Tek OCT Compound (Sakura Finetek USA) and stored at −80°C until use. The samples were divided into a metastasis group and a non‐metastasis group based on clinical diagnosis and histological examination. For the learning case, 30 samples were prepared (Table 1), including 13 samples from patients who were found to be N+ in the cervical lymph node and 17 samples from patients who were found to be N0 in the cervical lymph node. Those that remained metastasis‐free were monitored for at least 1 year after the primary tumor was removed. (The primary tumors were surgically removed between April 2004 and October 2005.) For the test case, 13 samples were prepared (Table 1), including seven samples found to be N+ in the cervical lymph node and six samples found to be N0 in the cervical lymph node. Those remaining metastasis‐free were monitored for at least 6 months after the primary tumor was removed. (The primary tumors were surgically removed between November 2005 and April 2006.) To determine the technical reproducibility, we prepared eight samples from the learning case for a replicate experiment.

Table 1.

Clinical and histological characteristics of individual patients

Case Sex Age (years) Primary site TN Differentiation Prediction score
Learning case
1 M 56 Lower gingival T2N0 Moderately −0.791
2 M 62 Buccal mucosa T1N0 Well −0.032
3 M 55 Upper gingival T2N0 Well −0.780
4 F 66 Tongue T2N0 Moderately −0.309
5 M 72 Hard palate T1N0 Moderately −0.481
6 M 58 Mouth floor T2N0 Well −0.716
7 M 80 Tongue T2N0 Well −0.564
8 M 30 Tongue T2N0 Well −0.609
9 F 81 Retromolar trigone T1N0 Well −0.507
10 F 60 Tongue T2N0 Well −0.264
11 M 59 Tongue T3N0 Moderately −0.481
12 M 68 Retromolar trigone T2N0 Moderately −0.428
13 M 54 Tongue T1N0 Well −0.534
14 M 56 Upper gingival T2N0 Well −0.303
15 M 43 Tongue T2N0 Moderately −0.716
16 M 46 Tongue T2N0 Well −0.564
17 M 58 Tongue T2N0 Well −0.282
18 F 64 Lower gingival T4aN1 Well  0.348
19 M 66 Tongue T1N2b Moderately  0.411
20 M 77 Lower gingival T2N2b Poor  0.577
21 F 74 Buccal mucosa T2N1 Moderately  0.780
22 M 78 Lower gingival T2N1 Well  0.499
23 M 71 Buccal mucosa T3N2b Poor  0.499
24 M 71 Lower gingival T3N2b Well  0.564
25 M 61 Tongue T2N2c Moderately  0.318
26 M 60 Lower gingival T4N2b Moderately  1.000
27 M 57 Upper gingival T4aN1 Poor  0.571
28 M 70 Tongue T3N1 Moderately  0.592
29 M 52 Lower gingival T4aN2b Moderately  0.817
30 M 37 Tongue T2N3 Poor  0.292
Test case
1 M 66 Lower gingival T2N0 Well −0.318
2 F 66 Upper gingival T2N0 Moderately −0.288
3 M 73 Tongue T2N0 Poor −0.507
4 M 32 Tongue T3N0 Moderately −0.260
5 M 58 Mouth floor T3N0 Moderately −0.053
6 M 72 Lower gingival T2N0 Well −0.165
7 M 66 Mouth floor T3N1 Moderately  0.162
8 M 68 Lower gingival T4aN2b Moderately  0.214
9 F 54 Tongue T2N2b Moderately  0.329
10 M 59 Mouth floor T4N1 Moderately  0.115
11 M 67 Mouth floor T4N1 Well  0.143
12 M 60 Tongue T4N2c Moderately  0.164
13 M 53 Tongue T2N2b Moderately −0.228

T1, tumor ≤2 cm in greatest dimension; T2, tumor >2 cm but ≤4 cm in greatest dimension; T3, tumor >4 cm in greatest dimension; T4, (lip) tumor invades through cortical bone, inferior alveolar nerve, floor of mouth, or skin of face (i.e. chin or nose); T4a, (oral cavity) tumor invades adjacent structures (e.g. through cortical bone, into deep [extrinsic] muscle of tongue [genioglossus, hyoglossus, palatoglossus, and styloglossus], maxillary sinus, and skin of face); T4b, tumor invades masticator space, pterygoid plates, or skull base and/or encases internal carotid artery; N0, no regional lymph node metastasis; N1, metastasis in a single ipsilateral lymph node, ≤3 cm in greatest dimension; N2, metastasis in a single ipsilateral lymph node, >3 cm but ≤6 cm in greatest dimension, or in multiple ipsilateral lymph nodes, ≤6 cm in greatest dimension, or in bilateral or contralateral lymph nodes, ≤6 cm in greatest dimension; N2a, metastasis in a single ipsilateral lymph node >3 cm but ≤6 cm in greatest dimension; N2b, metastasis in multiple ipsilateral lymph nodes, ≤6 cm in greatest dimension; N2c, metastasis in bilateral or contralateral lymph nodes, ≤6 cm in greatest dimension; N3, metastasis in a lymph node >6 cm in greatest dimension.

Laser captured microdissection.  All primary tumor specimens were cut into 9‐µm sections at −20°C using a LEICA cryostat model 3050S. The sections were mounted on a special slide for use in laser captured microdissection (LCM) and immediately stored at −80°C until use. First, the sections were fixed in cold ethanol for 3 min. They were then washed in deionized water for 30 s, stained with hematoxylin for 40 s, and again washed in deionized water for 30 s. The sections were dried in cold air for 2 or 3 min before LCM. Squamous cell carcinoma cells were then collected from the hematoxylin‐stained tissue sections by LCM.

RNA isolation and quality assessment.  Total RNA was extracted from the harvested cells using the RNeasy Micro Kit (Qiagen), and the concentration was measured using a NanoDrop ND‐1000 Spectrophotometer. All RNA samples were run on RNA 6000 Pico LabChip kits on the Agilent 2100 Bioanalyzer to assess the quality of the total RNA. Quality was assessed by the RNA integrity number (RIN),( 3 , 4 ) and samples with RIN values below 5( 5 ) were not used for the next step.

cRNA amplification and biotin labeling.  Total RNA (100 ng) of each sample was used as the starting material for the Two‐Cycle cDNA Synthesis and cRNA labeling protocol, following the recommendations of Affymetrix.( 6 ) The yield of biotin‐labeled cRNA was measured using a NanoDrop ND‐1000 Spectrophotometer and the quality was analyzed using an Agilent 2100 Bioanalyzer. We removed samples with a yield of less than 40 µg or with a median biotin‐labeled cRNA fragment size of less than 500 bp.
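The sample-level quality thresholds described above (RIN, cRNA yield and median fragment size) can be summarized as a simple filter. The following is a minimal sketch only: the class, field and function names are hypothetical, and the three example samples are invented values, not data from this study.

```python
from dataclasses import dataclass

# Thresholds quoted in the text; all names below are illustrative assumptions.
MIN_RIN = 5.0               # samples with RIN below 5 were excluded
MIN_CRNA_YIELD_UG = 40.0    # biotin-labeled cRNA yield, in micrograms
MIN_MEDIAN_SIZE_BP = 500.0  # median size of cRNA fragments, in bases


@dataclass
class SampleQC:
    sample_id: str
    rin: float
    crna_yield_ug: float
    median_size_bp: float


def passes_qc(s: SampleQC) -> bool:
    """Return True only if the sample clears all three thresholds."""
    return (s.rin >= MIN_RIN
            and s.crna_yield_ug >= MIN_CRNA_YIELD_UG
            and s.median_size_bp >= MIN_MEDIAN_SIZE_BP)


# Invented example values, not measurements from the study.
samples = [
    SampleQC("A", rin=7.2, crna_yield_ug=55.0, median_size_bp=800.0),
    SampleQC("B", rin=4.6, crna_yield_ug=60.0, median_size_bp=900.0),
    SampleQC("C", rin=6.1, crna_yield_ug=35.0, median_size_bp=700.0),
]
kept = [s.sample_id for s in samples if passes_qc(s)]
print(kept)
```

In this sketch a sample sitting exactly on a threshold is kept, because the text only states that samples below the limits were removed.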

Microarray production.  The Human Genome U133 Plus 2.0 array was purchased from the Affymetrix company in Japan. The array comprised 1 300 000 distinct oligonucleotides and featured over 47 000 transcripts and variants, including approximately 39 000 of the best‐characterized human genes.

Cocktail solution and microarray hybridization.  Before making the cocktail solution, 20 µg of biotin‐labeled cRNA was fragmented into pieces of 35–200 bases. Then, 15 µg of fragmented cRNA was used to make the cocktail solution, which was applied to a GeneChip HG U133 Plus 2.0 array and hybridized for 16 h at 45°C. After hybridization, the arrays were washed and stained using the Fluidics Station 450 with protocol EukGE‐WS2v5_450 and scanned using the Affymetrix GeneChip Scanner 3000.

Statistical analysis.  After scanning, the fluorescence intensity was measured using Affymetrix Microarray Suite 5.0 software, and an array was removed if its report showed a scale factor larger than 6, a 3′/5′ β‐actin ratio larger than 35 or a 3′/5′ glyceraldehyde‐3‐phosphate dehydrogenase ratio larger than 7. For low‐level analysis, the arrays were imported into the RMAExpress software (http://rmaexpress.bmbolstad.com) to carry out normalization and compute expression levels using the RMA algorithm,( 7 , 8 ) because the RMA algorithm gave the most reproducible results and showed the highest correlation coefficients with real‐time polymerase chain reaction data.( 9 , 10 ) After the expression levels were calculated, the array data were imported into DNA‐Chip analysis software (http://www.dchip.org) for high‐level analysis. Gene filtering was carried out using the variation across samples criterion (0.3 < standard deviation/mean < 100). For group comparison, two‐group t‐tests were used with a threshold of P < 0.04, an absolute value of the difference in mean expression between the two groups (Δ) > 100 intensity units and a fold change in mean expression of >1.5 or <0.66. In this way, 85 genes were selected (Table 2). From these 85 genes, which showed a difference in expression levels between the two groups, we then extracted a smaller set using software based on the AdaBoost algorithm.( 11 ) The software automatically selected the gene combination that separated the metastasis group from the non‐metastasis group with the lowest cross validation (CV) error rate. Eight genes (Table 3) with a higher power for prediction were extracted and used to evaluate the 13 samples from the test case.
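The gene filtering and group comparison criteria described above can be written out as a short sketch. This is not the DNA‐Chip analysis software itself: the function names are hypothetical, the P‐value is assumed to come from a separately computed two‐group t‐test, and the signed fold‐change convention (ratios below 1 shown as negative reciprocals) is an assumption about how the negative values in Table 2 should be read. The expression values are invented.

```python
from statistics import mean, stdev


def passes_variation_filter(values, low=0.3, high=100.0):
    """Variation filter: low < standard deviation / mean < high across samples."""
    return low < stdev(values) / mean(values) < high


def signed_fold_change(mean_met, mean_non):
    """Ratio of group means; ratios below 1 become negative reciprocals (assumed convention)."""
    ratio = mean_met / mean_non
    return ratio if ratio >= 1 else -1.0 / ratio


def is_differential(met, non, p_value, p_cut=0.04, delta_cut=100.0):
    """Apply the three group-comparison thresholds quoted in the text."""
    m_met, m_non = mean(met), mean(non)
    ratio = m_met / m_non
    return (p_value < p_cut                       # t-test P-value, computed elsewhere
            and abs(m_met - m_non) > delta_cut    # difference in mean intensity
            and (ratio > 1.5 or ratio < 0.66))    # fold change up or down


# Invented intensities for one hypothetical probe set.
met = [420.0, 380.0, 450.0, 400.0]   # metastasis group
non = [150.0, 210.0, 180.0, 160.0]   # non-metastasis group

print(passes_variation_filter(met + non))
print(is_differential(met, non, p_value=0.01))
print(round(signed_fold_change(mean(met), mean(non)), 2))
```

Note that the fold‐change condition must be read as "greater than 1.5 or less than 0.66", since no single ratio can satisfy both bounds at once.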

Table 2.

The 85 genes related to lymph node metastasis

Accession no. Gene symbol Description Fold change P‐value
Downregulated genes in the metastasis group
NM_000597 IGFBP2 Insulin‐like growth factor binding protein 2, 36 kDa −6.81 0.004005
NM_002276 KRT19 Keratin 19 −5.78 0.028855
BG401568 SLC16A9 Solute carrier family 16 (monocarboxylic acid transporters), member 9 −3.54 0.000823
NM_001387 DPYSL3 Dihydropyrimidinase‐like 3 −3.33 0.035334
NM_016140 CGI‐38 Brain specific protein /// brain specific protein −2.49 0.019781
NM_001823 CKB Creatine kinase, brain −2.42 0.006229
AF288571 LEF1 Lymphoid enhancer‐binding factor 1 −2.35 0.033874
NM_002820 PTHLH Parathyroid hormone‐like hormone −2.34 0.02038
AW451197 CDNA clone IMAGE:5278089 −2.32 0.035398
AI278995 Predicted: Homo sapiens similar to B230208J24Rik protein (LOC201501), mRNA −2.24 0.01941
M31157 PTHLH Parathyroid hormone‐like hormone −2.16 0.038689
NM_003027 SH3GL3 SH3‐domain GRB2‐like 3 −2.15 0.029178
AL567411 CDK5R1 Cyclin‐dependent kinase 5, regulatory subunit 1 (p35) −2.13 0.011045
BG434174 Stoned B‐like factor −2.11 0.023189
AI522132 Hypothetical protein LOC115749 −2.04 0.038373
BC005961 PTHLH Parathyroid hormone‐like hormone /// parathyroid hormone‐like hormone −2.00 0.022891
NM_001759 CCND2 Cyclin D2 −1.98 0.018311
BG290193 ZNF703 Zinc finger protein 703 −1.93 0.037675
AI189753 TM4SF1 Transmembrane 4 L six family member 1 −1.92 0.011249
AA143793 RAB11FIP1 RAB11 family interacting protein 1 (class I) −1.89 0.013964
BF111651 PPAPDC1B Phosphatidic acid phosphatase type 2 domain containing 1B −1.89 0.03133
AL137763 GRHL3 Grainyhead‐like 3 (Drosophila) −1.89 0.035038
BC000408 ACAT2 Acetyl‐Coenzyme A acetyltransferase 2 (acetoacetyl Coenzyme A thiolase) −1.88 0.006495
AW026491 CCND2 Cyclin D2 −1.87 0.032209
AL137629 KALRN Kalirin, RhoGEF kinase −1.82 0.031132
BF514585 SESN3 Sestrin 3 −1.77 0.021875
M90657 TM4SF1 Transmembrane 4 L six family member 1 −1.76 0.011547
AI458128 CBX6 Chromobox homolog 6 −1.73 0.018388
BF680438 LONRF1 LON peptidase N‐terminal domain and ring finger 1 −1.70 0.01133
AU154469 SLC11A2 Solute carrier family 11 (proton‐coupled divalent metal ion transporters), member 2 −1.69 0.018722
BG165333 CNKSR3 CNKSR family member 3 −1.68 0.016795
AI654238 B4GALNT3 β1,4‐N‐acetylgalactosaminyltransferase‐III −1.63 0.018618
AI346835 TM4SF1 Transmembrane 4 L six family member 1 −1.52 0.006659
Upregulated genes in the metastasis group
BE501952 SATL1 Spermidine/spermine N1‐acetyl transferase‐like 1  1.51 0.015987
AI809870 SMYD2 SET and MYND domain containing 2  1.54 0.003793
NM_017665 ZCCHC10 Zinc finger, CCHC domain containing 10  1.54 0.010698
BF214329 Mitochondrial fission regulator 1  1.56 0.008586
AI123233 RANBP6 RAN binding protein 6  1.58 0.0279
NM_016040 TMED5 Transmembrane emp24 protein transport domain containing 5  1.59 0.015651
NM_001889 CRYZ Crystallin, zeta (quinone reductase)  1.59 0.012586
NM_013322 SNX10 Sorting nexin 10  1.59 0.028971
NM_024699 ZFAND1 Zinc finger, AN1‐type domain 1  1.6 0.018314
U13700 CASP1 Caspase 1, apoptosis‐related cysteine peptidase (interleukin 1, beta, convertase)  1.61 0.02315
AW157773 ZFP62 Zinc finger protein 62 homolog (mouse)  1.61 0.016554
NM_016283 TAF9 TAF9 RNA polymerase II, TATA box binding protein (TBP)‐associated factor, 32 kDa  1.62 0.012154
BF439522 MGC23909 Hypothetical protein MGC23909  1.63 0.017789
NM_003187 TAF9 TAF9 RNA polymerase II, TATA box binding protein (TBP)‐associated factor, 32 kDa  1.65 0.006175
NM_014873 LPGAT1 Lysophosphatidylglycerol acyltransferase 1  1.65 0.010458
AK001947 RP5‐1022P6.2 Hypothetical protein KIAA1434  1.65 0.026428
NM_016576 GMPR2 Guanosine monophosphate reductase 2  1.66 0.039452
NM_024430 PSTPIP2 Proline‐serine‐threonine phosphatase interacting protein 2  1.67 0.016465
AW612657 LYPLAL1 Lysophospholipase‐like 1  1.67 0.021165
L12723 HSPA4 Heat shock 70 kDa protein 4  1.69 0.006059
BC001025 RCL1 RNA terminal phosphate cyclase‐like 1  1.71 0.007725
AI042152 TncRNA Trophoblast‐derived noncoding RNA  1.71 0.02217
AW962511 FLJ22531 Hypothetical protein FLJ22531  1.73 0.036123
AI634046 CFLAR CASP8 and FADD‐like apoptosis regulator  1.74 0.03599
AF183569 ARTS‐1 Type 1 tumor necrosis factor receptor shedding aminopeptidase regulator  1.76 0.009405
AI805560 ZMYM6 Zinc finger, MYM‐type 6  1.77 0.01367
AI347128 IGBP1 Immunoglobulin (CD79A) binding protein 1  1.77 0.001289
NM_002198 IRF1 Interferon regulatory factor 1  1.8 0.016707
NM_000361 THBD Thrombomodulin  1.81 0.007352
NM_000361 THBD Thrombomodulin  1.84 0.007352
NM_012485 HMMR Hyaluronan‐mediated motility receptor (RHAMM)  1.84 0.009623
BF735901 NUDCD2 NudC domain containing 2  1.86 0.00967
AL559202 Full‐length cDNA clone CS0DF034YI03 of fetal brain of Homo sapiens (human)  1.86 0.009237
AW973232 gb:AW973232 /DB_XREF=gi:8163078 /DB_XREF=EST385330 /FEA=EST /CNT=5 /TID=Hs.293553.0 /TIER=ConsEnd /STK=0 /UG=Hs.293553 /UG_TITLE=ESTs  1.89 0.008534
NM_004120 GBP2 Guanylate binding protein 2, interferon‐inducible /// guanylate binding protein 2, interferon‐inducible  1.89 0.029524
U29343 HMMR Hyaluronan‐mediated motility receptor (RHAMM)  1.92 0.004741
AW119113 THBD Thrombomodulin  1.92 0.010555
NM_018530 GSDML Gasdermin‐like  1.95 0.027119
NM_014349 APOL3 Apolipoprotein L, 3  1.95 0.018873
AI224133 Transcribed locus, weakly similar to XP_517454.1 PREDICTED: similar to hypothetical protein MGC45438[Pan troglodytes]  1.96 0.039636
AI928035 IRX2 Iroquois homeobox protein 2  1.99 0.022852
NM_018465 C9orf46 Chromosome 9 open reading frame 46  1.99 0.014382
AW003140 mRNA; cDNA DKFZp686K1098 (from clone DKFZp686K1098)  2.04 0.025115
AW613387 Endothelial cell growth factor 1 (platelet‐derived)  2.09 0.013803
NM_001657 AREG Amphiregulin (schwannoma‐derived growth factor)  2.10 0.013069
BC005254 CLEC2B C‐type lectin domain family 2, member B  2.11 0.024784
AI656493 DCTD dCMP deaminase  2.17 0.004514
NM_005415 SLC20A1 Solute carrier family 20 (phosphate transporter), member 1  2.17 0.022575
NM_004815 ARHGAP29 Rho GTPase activating protein 29  2.27 0.009215
AA976354 KIAA1618 KIAA1618  2.61 0.017377
NM_000585 IL15 Interleukin 15  2.80 0.00711
AI539443 STAT1 Signal transducer and activator of transcription 1, 91 kDa  3.00 0.033578

Table 3.

Genes selected for the prediction model

Accession Gene symbol Description Fold change P‐value
AI656493 DCTD dCMP deaminase  2.17 0.004514
NM_000585 IL15 Interleukin 15  2.80 0.00711
AW119113 THBD Thrombomodulin  1.92 0.010555
NM_018530 GSDML Gasdermin‐like  1.92 0.027119
NM_003027 SH3GL3 SH3‐domain GRB2‐like 3 −2.15 0.029178
BC005961 PTHLH Parathyroid hormone‐like hormone −2.00 0.022891
BE328402 RP5‐1022P6 Hypothetical protein KIAA1434  1.92 0.020426
NM_018465 C9orf46 Chromosome 9 open reading frame 46  1.99 0.014382

Quantitative reverse transcription–polymerase chain reaction analysis.  Quantitative reverse transcription–polymerase chain reaction (RT‐PCR)( 12 ) was used to validate the microarray results for the eight selected genes. For each sample, 100 ng of original total RNA was used to synthesize first‐strand cDNA by reverse transcriptase with an oligo dT primer, following the protocol recommended by Invitrogen (Superscript III First‐Strand Synthesis System for RT‐PCR). Primer sets for quantitative RT‐PCR (Table 4) were designed using PRIMER 3 software (http://www.genome.wi.mit.edu) and were synthesized by the Sigma Corporation. The PCR reaction was carried out using an ABI Prism 7900 Sequence Detection system with Power SYBR Green Master Mix (15 µL Power SYBR Green Master Mix, 0.3 µL of each primer at 5 µM, 5 µL cDNA, 9.4 µL water). For each sample, reactions were carried out in triplicate with the following program: denaturation for 15 s at 95°C, and annealing and extension for 60 s at 60°C. Cumulative fluorescence was measured at the end of the extension phase of each cycle. Quantification was based on standard curves from serial dilutions of human normal total RNA purchased from Stratagene Corporation. The results were normalized to actin and then compared with the microarray data for the eight genes.
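Standard‐curve quantification with normalization to actin, as outlined above, might be sketched as follows. This is an illustration under stated assumptions rather than the instrument software's procedure: it assumes that each reaction is summarized by a threshold cycle (Ct) value, that Ct is linear in log10 of the input amount, and that triplicates are averaged after conversion to quantities. All function names and all numbers are hypothetical.

```python
import math


def fit_standard_curve(quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ct_values) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    return slope, my - slope * mx


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to recover the input quantity."""
    return 10 ** ((ct - intercept) / slope)


def relative_expression(target_cts, actin_cts, target_curve, actin_curve):
    """Mean quantity of the target gene divided by the mean quantity of actin."""
    t = sum(quantity_from_ct(c, *target_curve) for c in target_cts) / len(target_cts)
    a = sum(quantity_from_ct(c, *actin_curve) for c in actin_cts) / len(actin_cts)
    return t / a


# Invented tenfold dilution series (arbitrary units) with idealized Ct values.
dilutions = [100.0, 10.0, 1.0, 0.1]
cts = [20.00, 23.32, 26.64, 29.96]
curve = fit_standard_curve(dilutions, cts)

# Invented triplicate Ct values for one sample; the same curve is reused for brevity.
ratio = relative_expression([23.32, 23.32, 23.32], [20.0, 20.0, 20.0], curve, curve)
print(curve, ratio)
```

In practice each gene would have its own standard curve; one curve is reused here only to keep the example short.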

Table 4.

Primers for the genes used in real‐time polymerase chain reaction

Gene symbol Product size (bps) Forward primer Reverse primer
DCTD 113 ctgcgaggctcctgtttaat aagcttttgactcggtctgc
IL15 103 acaaacatcactctgctgcttagac ctgatccaaggtctgatcatcttct
THBD 105 agcacttgtgttgtctggtggt tgtgcacacagagatagcatgaa
GSDML 149 tgaggcacgaattctctgtg ggcagtgaggacagactggt
SH3GL3 103 gcttcctgtcctaaaagtcattggt ctgaggaatataggccattcgttg
PTHLH 122 tgtggcttgtttatccttagctc cttgccctaggttgtgaact
RP5‐1022P6 104 caatgagctttgcacagtttga tagtcccttagcttttgcctcttg
C9orf46 121 cttcctggtcccgattgttc actcttttctgtttccagtatgtcctc
Actin 150 atgtggccgaggactttga tgtgtggacttgggagagga

Results

To identify a predictive gene expression signature, 30 primary tumor samples from the oral cavity region (learning case group) were analyzed. These included 13 samples from individuals who were found postoperatively to have metastasis in the lymph node of the neck, and 17 samples from individuals who were found postoperatively to have no metastasis in the lymph node of the neck and who remained metastasis‐free when monitored for at least 1 year after primary tumor removal. The cancer cells of the tumor were obtained by LCM technology. Total RNA was isolated and its quality checked. We removed samples with RIN values below 5. First, technical reproducibility was determined using eight samples from the learning case. The technical replicates of the same sample showed a high Pearson correlation coefficient in pairwise comparison; the lowest Pearson correlation coefficient was 0.9433 (Supplementary Fig. S1). This result indicated that the technical reproducibility of gene expression was high. To analyze the results of the 30 primary tumor samples, two‐group t‐tests were used with a threshold of P < 0.04 and an absolute value of the difference in mean expression between the two groups (Δ) > 100 intensity units, with a fold change in mean expression of >1.5 or <0.66. The 85 genes expressed differentially between the two patient groups with and without cervical lymph node metastasis were selected (Table 2), including 33 genes that were downregulated and 52 genes that were upregulated in the metastasis group. Next, hierarchical clustering of the 30 samples was carried out using the 85 genes, with Pearson's correlation as the distance metric and average linkage (Fig. 1). Two major cluster branches were created. One major cluster included 16 non‐metastasis samples and the other included 13 metastasis samples and one misclustered non‐metastasis sample.
Subsequently, to construct a more accurate and practical prediction model using a smaller number of genes, we further selected the genes with a high power for prediction from the 85 genes using the AdaBoost algorithm.( 11 ) In the AdaBoost algorithm, the optimal gene and its weight are determined in each boosting step, and the prediction model is constructed by weighted voting of the selected genes. We performed 1000 replicates of five‐fold cross validation for the learning cases. Eight candidate genes (DCTD, IL‐15, THBD, GSDML, SH3GL3, PTHLH, RP5‐1022P6 and C9orf46) were selected (Table 3), which achieved the minimum error rate. Next, a prediction score was established for each sample (Table 1). Prediction scores range from −1 to 1, and the borderline is 0. A positive score indicates cervical lymph node metastasis, whereas a negative score indicates that the sample is metastasis‐free. In the learning case, all 13 metastasis samples had positive scores and all 17 non‐metastasis samples had negative scores (Fig. 2A). Microarray is an excellent tool that can analyze the expression of tens of thousands of genes. However, it has some limitations in accuracy and general applicability. For a prediction system with high accuracy, we needed to verify that gene expression was measured accurately by this method. Thus, to confirm the prediction results, quantitative RT‐PCR of the eight selected genes was carried out and normalized to actin before being compared with the microarray data. The Pearson correlation values between microarray data and quantitative RT‐PCR data were calculated for the eight genes and were all over 0.73 (Table 5), showing a high correlation between microarray data and quantitative RT‐PCR data in this study. We evaluated the prediction model by using the test case group. A prediction score was calculated for each sample using the prediction model constructed in this study (Fig. 2B).
All six non‐metastasis samples and six of the seven metastasis samples (12 of 13, ∼92.3%) were predicted correctly by the prediction model. Only one case (∼7.7%) was misclassified (circled in red in Fig. 2B).
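The weighted‐voting idea behind the prediction score can be illustrated with a small, generic AdaBoost sketch. It is not the software used in this study: it assumes discrete AdaBoost with one‐gene threshold rules ("stumps") as the voters, class labels of +1 (metastasis) and −1 (metastasis‐free), and a score obtained by dividing the weighted vote by the total weight so that it falls between −1 and 1. The toy data and all names are invented for illustration.

```python
import math


def stump_predict(row, gene, threshold, sign):
    """One-gene vote: `sign` if expression exceeds the threshold, else `-sign`."""
    return sign if row[gene] > threshold else -sign


def best_stump(X, y, w):
    """Search all genes and thresholds for the lowest weighted error."""
    best = None
    for gene in range(len(X[0])):
        values = sorted({row[gene] for row in X})
        # Candidate thresholds: midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, gene, threshold, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, gene, threshold, sign)
    return best


def adaboost_fit(X, y, rounds):
    """Discrete AdaBoost; returns a list of (weight, gene, threshold, sign)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, gene, threshold, sign = best_stump(X, y, w)
        if err >= 0.5:            # no stump is better than chance
            break
        err = max(err, 1e-10)     # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, gene, threshold, sign))
        # Raise the weight of misclassified samples and lower the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, gene, threshold, sign))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def prediction_score(model, row):
    """Weighted vote divided by the total weight, so the score lies in [-1, 1]."""
    total = sum(alpha for alpha, _, _, _ in model)
    vote = sum(alpha * stump_predict(row, gene, threshold, sign)
               for alpha, gene, threshold, sign in model)
    return vote / total


# Invented toy data: rows are samples, columns are two hypothetical genes.
X = [[1.0, 1.0], [2.0, 2.0], [6.0, 1.5], [5.0, 1.8], [7.0, 6.0], [8.0, 7.0]]
y = [-1, -1, -1, 1, 1, 1]   # +1 = metastasis, -1 = metastasis-free

model = adaboost_fit(X, y, rounds=3)
scores = [prediction_score(model, row) for row in X]
print([round(s, 3) for s in scores])
```

No single threshold on either toy gene separates the two classes, but the weighted vote of three stumps does, which is the behavior that motivates combining several genes into one score.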

Figure 1.

Hierarchical clustering for 85 genes from 30 samples, including 13 metastasis samples (shown by pink color +) and 17 non‐metastasis samples (shown by green color –). Red color shows that the gene is upregulated and blue color shows that the gene is downregulated. Two major cluster branches were created. One major cluster included 16 non‐metastasis samples and the other one included 13 metastasis samples and one misclustered non‐metastasis sample.
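Average‐linkage hierarchical clustering with a Pearson correlation distance, as used for this figure, can be sketched in a few lines. This is a minimal illustration rather than the DNA‐Chip analysis software's implementation: the distance is assumed to be 1 − r, the tree is cut at two clusters, and the four expression profiles and all function names are invented.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_linkage(profiles, n_clusters=2):
    """Agglomerative clustering with distance 1 - r and average linkage."""
    dist = {(i, j): 1 - pearson(profiles[i], profiles[j])
            for i, j in combinations(range(len(profiles)), 2)}
    clusters = [[i] for i in range(len(profiles))]

    def cluster_distance(a, b):
        # Average linkage: mean of all pairwise distances between members.
        return mean(dist[min(i, j), max(i, j)] for i in a for j in b)

    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(sorted(a + b))
    return sorted(clusters)


# Invented profiles: each row is one sample measured on four hypothetical genes.
profiles = [
    [9.0, 8.0, 2.0, 1.0],
    [8.0, 9.0, 1.0, 2.0],
    [1.0, 2.0, 9.0, 8.0],
    [2.0, 1.0, 8.0, 9.0],
]
result = average_linkage(profiles)
print(result)
```

Samples 0 and 1 share one expression pattern and samples 2 and 3 share the opposite pattern, so the two branches recover those pairs.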

Figure 2.

The samples were rank‐ordered by their score (determined by the AdaBoost algorithm). A vertical line shows the total discriminant score. Samples with negative scores were predicted to be free of lymph node metastasis. Samples with positive scores were predicted to have metastasized to the cervical lymph node. (A) The prediction result of learning cases. All of the 17 samples in the non‐metastasis group were negative and all of the 13 samples in the metastasis group were positive. (B) The prediction results of test cases. The six samples in the non‐metastasis group were negative. Six of seven samples in the metastasis group were positive and one was negative (failure sample, circled in red).
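The test‐case outcome in panel B can be recomputed from the prediction scores listed for the 13 test cases in Table 1, using the rule that a positive score indicates metastasis. The sketch below transcribes those scores; the sensitivity and specificity figures are simple derived quantities that the text itself does not report, and the variable names are illustrative.

```python
# (prediction score, lymph node metastasis confirmed?) for test cases 1-13 in Table 1
test_cases = [
    (-0.318, False), (-0.288, False), (-0.507, False),
    (-0.260, False), (-0.053, False), (-0.165, False),
    (0.162, True), (0.214, True), (0.329, True), (0.115, True),
    (0.143, True), (0.164, True), (-0.228, True),
]


def predicted_positive(score):
    """A positive score is read as metastasis, a negative score as metastasis-free."""
    return score > 0


tp = sum(1 for s, met in test_cases if met and predicted_positive(s))
tn = sum(1 for s, met in test_cases if not met and not predicted_positive(s))
fn = sum(1 for s, met in test_cases if met and not predicted_positive(s))
fp = sum(1 for s, met in test_cases if not met and predicted_positive(s))

accuracy = (tp + tn) / len(test_cases)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(f"accuracy {accuracy:.1%}, sensitivity {sensitivity:.1%}, "
      f"specificity {specificity:.1%}")
```

The single false negative is test case 13 (T2N2b, score −0.228), which corresponds to the failure sample circled in the figure.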

Table 5.

Pearson correlation of expression values between microarray data and real‐time polymerase chain reaction data of the eight genes

Gene symbol Pearson correlation P‐value
DCTD 0.796 2.46 × 10⁻⁷
IL15 0.739 4.74 × 10⁻⁶
THBD 0.753 2.49 × 10⁻⁶
GSDML 0.868 1.06 × 10⁻⁹
SH3GL3 0.911 6.86 × 10⁻¹²
PTHLH 0.852 4.69 × 10⁻⁹
RP5‐1022P6 0.768 1.18 × 10⁻⁶
C9orf46 0.742 4.05 × 10⁻⁶

Discussion

At present, the methods for diagnosing the status of lymph node metastasis in oral cavity cancer are not accurate. Thus, two approaches have emerged for treating oral cavity cancer cases that are clinically diagnosed as free of cervical lymph node metastasis. The first is the ‘neck dissection’ policy and the other is the ‘wait and watch’ policy. However, neither policy provides appropriate treatment for the disease. Because the neck dissection policy can cause pain and discomfort, and in some cases leads to complications (such as chronic pain and shoulder palsy), it is important to ascertain whether a patient really is metastasis‐free. The alternative ‘wait and watch’ policy may allow an overlooked metastasis to spread widely in a patient who has micrometastasis. The goal of our study was to devise a novel diagnostic system that may improve the diagnosis of N status in oral cavity cancer. The model predicted 12 of 13 test cases (∼92.3%) correctly; only one case (∼7.7%) was misclassified. The misclassified metastasis case was a 53‐year‐old man with moderately differentiated tongue squamous cell carcinoma. It is difficult to explain why this case was missed, because we could not find any relationship between the clinicopathological features of the patient and the score. We also note that quantitative RT‐PCR data should not be used as input for this prediction model, because such data could not be predicted accurately. The reason is that the prediction scores were derived from microarray data normalized by the RMA algorithm, whereas the quantitative RT‐PCR data were normalized to actin; therefore, the expression value of each gene obtained by quantitative RT‐PCR differs from that obtained by microarray.

Of the eight genes identified, IL‐15 is of particular interest. IL‐15 is a cytokine that regulates T and natural killer cell activation and proliferation. Studies in mice suggest that IL‐15 may increase the expression of an apoptosis inhibitor. Recent studies have reported that IL‐15 expression plays an important role in cell proliferation, invasion and metastasis of human colorectal cancer.( 13 , 14 ) In the present study, we observed IL‐15 overexpression in the metastasis group (fold change [FC]: 2.8, P = 0.00711; Pearson correlation between microarray data and real‐time PCR, 0.739). This suggests that IL‐15 may also play a role in the metastasis of oral squamous cell carcinoma. Further study is required to learn more about the roles of IL‐15 in the metastasis of OSCC.

A second interesting gene is PTHLH. The protein encoded by this gene is a member of the parathyroid hormone family. This hormone regulates endochondral bone development and epithelial–mesenchymal interactions during formation of the mammary glands and teeth. Some articles have reported that upregulation of PTHLH may play a role in metastasis in breast cancer and prostate cancer cell lines.( 15 , 16 ) In our results, however, PTHLH was upregulated in the non‐metastasis group and downregulated in the metastasis group (FC: −2, P = 0.022891; Pearson correlation between array data and real‐time PCR, 0.852). It is difficult to explain why, but it may be that the PTHLH mechanism is different in vitro compared with in vivo, or that the role of PTHLH differs between cell types. It could also be that cancer cells produce PTHLH to prompt cancer cell migration and invasion, but when the metastasis process is finished PTHLH is no longer necessary and so was downregulated in the metastasis group. Further study may clarify the role of PTHLH in OSCC.

The novel diagnostic system based on gene sets may be applicable to the diagnosis of this disease. Furthermore, the system may also be applicable to other diseases in the future.

Supporting information

Supporting info item

CAS-98-740-s001.pdf (85.1KB, pdf)

Acknowledgments

Many thanks to Drs Takashi Shimoji, Koichi Nagasaki, Kiyotsugu Yoshida, Masaru Uekusa and Fumiyuki Uematsu for helpful discussion during the preparation of this article. We also thank Professor Marie Cosgrove for checking the English language of this paper.


References

  • 1. Jones AS, Phillips DE, Helliwell TR, Roland NJ. Occult node metastases in head and neck squamous carcinoma. Eur Arch Otorhinolaryngol 1993; 250: 446–9.
  • 2. Genden EM, Ferlito A, Shaha AR et al. Complications of neck dissection. Acta Otolaryngol 2003; 123: 795–801.
  • 3. Mueller O, Lightfoot S, Schroeder A. RNA integrity number (RIN) standardization of RNA quality control. Agilent Application Note, May 1 2004. Publication no. 5989‐1165EN. Available from URL: http://www.gene‐quantification.de/RIN.pdf
  • 4. Imbeaud S, Graudens E, Boulanger V et al. Towards standardization of RNA quality assessment using user‐independent classifiers of microcapillary electrophoresis traces. Nucleic Acids Res 2005; 33: e56.
  • 5. Lee J, Hever A, Willhite D, Zlotnik A, Hevezi P. Effects of RNA degradation on gene expression analysis of human postmortem tissues. FASEB J 2005; 19: 1356–8.
  • 6. Technical note, GeneChip Eukaryotic Small Sample Target Labeling Assay Version II. Available from URL: http://genomics.msu.edu/RTSF/small_sample_labeling.pdf
  • 7. Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res 2003; 31: e15.
  • 8. Li J, Spletter ML, Johnson JA. Dissecting tBHQ induced ARE‐driven gene expression through long and short oligonucleotide arrays. Physiol Genomics 2005; 21: 43–58.
  • 9. Irizarry RA, Hobbs B, Collin F et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 2003; 4: 249–64.
  • 10. Millenaar FF, Okyere J, May ST, Van Zanten M, Voesenek LA, Peeters AJ. How to decide? Different methods of calculating gene expression from short oligonucleotide array data will give different results. BMC Bioinformatics 2006; 7: 137.
  • 11. Freund Y, Schapire RE. A short introduction to boosting. J. Japan. Soc. Artif. Intel. 1999; 14: 771–80.
  • 12. Ginzinger DG. Gene quantification using real‐time quantitative PCR: an emerging technology hits the mainstream. Exp Hematol 2002; 30: 503–12.
  • 13. Kuniyasu H, Ohmori H, Sasaki T et al. Production of interleukin 15 by human colon cancer cells is associated with induction of mucosal hyperplasia, angiogenesis, and metastasis. Clin Cancer Res 2003; 9: 4802–10.
  • 14. Kuniyasu H, Oue N, Nakae D et al. Interleukin‐15 expression is associated with malignant potential in colon cancer cells. Pathobiology 2001; 69: 86–95.
  • 15. Shen X, Qian L, Falzon M. PTH‐related protein enhances MCF‐7 breast cancer cell adhesion, migration, and invasion via an intracrine pathway. Exp Cell Res 2004; 294: 420–33.
  • 16. Shen X, Falzon M. PTH‐related protein modulates PC‐3 prostate cancer cell adhesion and integrin subunit profile. Mol Cell Endocrinol 2003; 199: 165–77.

Articles from Cancer Science are provided here courtesy of Wiley
